Introduction

Figure 1. A high-level overview of the Saturn Vector Unit

This manual describes the Saturn Vector Unit, a parameterized and extensible vector microarchitecture that implements the RISC-V Vector (RVV) extension. Saturn was developed to address an academic need for a representative, compliant, and flexible generator of RISC-V vector units targeting deployment in domain-specialized cores. It is implemented as a parameterized Chisel RTL generator, which allows a range of Saturn configurations to be produced for different target deployment scenarios. This document discusses the microarchitectural details of all Saturn components.

  • Chapter 1 describes the motivation for Saturn and compares Saturn’s design approach to those of existing data-parallel microarchitecture archetypes

  • Chapter 2 discusses the system organization of Saturn

  • Chapter 3 describes the microarchitecture of Saturn’s vector frontend unit

  • Chapter 4 describes the microarchitecture of Saturn’s vector load-store unit

  • Chapter 5 describes the microarchitecture of Saturn’s datapath and vector instruction sequencers

  • Chapter 6 provides guidance on writing efficient vector code for Saturn

  • Chapter 7 discusses the historical context of Saturn within past academic vector units

Objectives

Saturn was developed with the following objectives:

  • Provide a representative baseline implementation of the RISC-V Vector specification

  • Support full compliance with the RVV specification, including virtual memory and precise faults

  • Target performant ASIC implementations, rather than FPGA deployments

  • Be sufficiently parameterized to support configurations across a wide power/performance/area design space

  • Demonstrate efficient scheduling of vector operations on a microarchitecture with a short hardware vector length

  • Implement a SIMD-style microarchitecture, comparable to existing SIMD datapaths in DSP microarchitectures

  • Integrate with existing efficient area-compact scalar cores, rather than high-IPC general-purpose cores

  • Support extensibility with custom vector instructions, functional units, and accelerators that leverage the baseline capability in the standard RVV ISA

  • Target deployment as part of a DSP core or similarly domain-specialized core, instead of general-purpose systems

Questions, bug reports, and requests for further documentation can be sent to jzh@berkeley.edu.

1. Background

This chapter discusses the deployment scenarios for data-parallel systems and the architectures that dominate the commercial space for such systems. Saturn, as an implementation of a modern scalable vector ISA targeting deployment in specialized cores, fills an underexplored niche in the space of open-source data-parallel microarchitectures. The chapter also compares Saturn’s microarchitectural philosophy with alternative vector approaches.

1.1. Data-parallel Workloads

We broadly categorize the deployment scenarios and workloads for programmable data-parallel systems into four domains, differentiated by power and area constraints and by workload characteristics. Figure 2 outlines these four domains, in which architectures that exploit data-level parallelism (DLP) are critical, along with the salient architectural and microarchitectural archetypes that have come to dominate each one.