15-418 Project Proposal: Simulating GPUs in CADSS

Authors: Ethan Lu (eblu) and Theo Kroening (tkroenin)

URL

Link to Andrew UserWeb: https://www.andrew.cmu.edu/user/tkroenin/15418/.

Summary

CADSS is an open-source computer architecture simulation framework written by Professor Railing for CMU’s 15-346 course. For our project, we will model GPUs in CADSS with configurable hardware layouts. Through benchmarking, we will explore the performance impacts and tradeoffs of different hardware designs.

Background

CADSS is designed to fully simulate a CPU-based system. CADSS is composed of the following main components: Engine, Processor, Cache, Coherence, Interconnects, and Memory.

The input to the CADSS simulator is a trace file, which can be thought of as something like an assembly file: each trace operation contains fields such as the opcode, rs1 (source register 1), rs2 (source register 2), and rd (destination register). The relevant output of the simulator is the number of ticks the entire simulation takes, which is printed at the end.
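To make the idea of a trace operation concrete, here is a minimal sketch of how one such record might be represented and parsed. The `TraceOp` struct, the function names, and the whitespace-separated line layout `<opcode> <rd> <rs1> <rs2>` are all assumptions made for illustration; the actual CADSS trace format may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one trace operation. Field names follow the
// description above; this is not the real CADSS data structure.
struct TraceOp {
    std::string opcode;
    int rd;   // destination register
    int rs1;  // source register 1
    int rs2;  // source register 2
};

// Parses an assumed line layout: "<opcode> <rd> <rs1> <rs2>".
// Returns false (leaving `out` untouched) if a field is missing or malformed.
bool parse_trace_line(const std::string& line, TraceOp& out) {
    std::istringstream in(line);
    TraceOp op;
    if (!(in >> op.opcode >> op.rd >> op.rs1 >> op.rs2)) {
        return false;
    }
    out = op;
    return true;
}

// Convenience wrapper for lines that are known to be well formed.
TraceOp parse_trace_line_checked(const std::string& line) {
    TraceOp op;
    bool ok = parse_trace_line(line, op);
    assert(ok && "malformed trace line");
    (void)ok;  // silences the unused-variable warning when NDEBUG is set
    return op;
}
```

Under these assumptions, a line such as `add 3 1 2` would yield an operation with opcode `add`, destination register 3, and source registers 1 and 2.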

Below is a brief description of the relevant components.

Below is an image of how the components interact with one another (taken from Professor Railing’s paper).

For our project, we will be simulating a GPU. At a high level, many pieces of the architecture remain the same. GPU “cores” still have cache and memory components, for example. We will need to adapt the simulator framework to model the scheduling of thread blocks onto execution resources and execution in warps. We will model a GPU as a drop-in replacement for the existing “Processor” component in CADSS.

In addition to the processor component, we may also need to lightly modify other components, such as the cache, memory, and interconnect, to better model how they work on GPUs.

We may omit some of the components from 15-346 for the sake of time and accuracy. For example, branch predictors and coherence are not as relevant to the project, so we may use trivial implementations of them or ignore them entirely.
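As an illustration of what "execution in warps" could mean inside the simulator, the sketch below splits a thread block into warps and records which lanes correspond to real threads. The names `Warp` and `split_into_warps` are hypothetical, and the warp size of 32 is assumed from NVIDIA hardware; a configurable design would make it a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed warp width (32 lanes, as on NVIDIA GPUs).
constexpr uint32_t kWarpSize = 32;

// Hypothetical per-warp bookkeeping for the simulated processor.
struct Warp {
    uint32_t first_thread;  // block-local index of the thread in lane 0
    uint32_t active_mask;   // bit i is set if lane i maps to a real thread
};

// Splits a thread block into warps. The last warp is only partially active
// when the block size is not a multiple of the warp size.
std::vector<Warp> split_into_warps(uint32_t threads_per_block) {
    assert(threads_per_block > 0);
    std::vector<Warp> warps;
    for (uint32_t base = 0; base < threads_per_block; base += kWarpSize) {
        uint32_t live = threads_per_block - base;
        if (live > kWarpSize) live = kWarpSize;
        // Avoid shifting by 32, which is undefined for a 32-bit value.
        uint32_t mask = (live == kWarpSize) ? 0xFFFFFFFFu : ((1u << live) - 1u);
        warps.push_back(Warp{base, mask});
    }
    return warps;
}
```

For example, a block of 100 threads would be split into four warps, with only the low four lanes of the last warp active.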

The Challenge

Below are the key challenges we have identified so far:

Designing Trace Files

The design of our trace files will be non-trivial. In CADSS, trace files are simple: they are an abstract assembly with operators, operands, and addresses. For GPUs, we will need to model kernel launches and SIMD execution. In Assignment 2 of 15-418, we saw that the programmer can specify the “structure” of their computation (grid dimensions, number of blocks, etc.). So, we will need to develop a new trace file format that can support multiple kernel launches with different parameters.
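One possible shape for a kernel-launch record in such a trace is sketched below. The `launch` keyword, the line layout `launch <name> <gx> <gy> <gz> <bx> <by> <bz>`, and all type and function names are assumptions for illustration only; the final format is one of the open design questions of this project.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct Dim3 {
    uint32_t x, y, z;
};

// Hypothetical header that would precede the per-thread operations of a kernel.
struct KernelLaunch {
    std::string name;
    Dim3 grid;   // number of thread blocks in each dimension
    Dim3 block;  // number of threads per block in each dimension
};

// Parses the assumed layout "launch <name> <gx> <gy> <gz> <bx> <by> <bz>".
// Returns false for any other line, so ordinary trace ops are left to the
// regular operation parser.
bool parse_launch(const std::string& line, KernelLaunch& out) {
    std::istringstream in(line);
    std::string tag;
    KernelLaunch k;
    if (!(in >> tag) || tag != "launch") return false;
    if (!(in >> k.name >> k.grid.x >> k.grid.y >> k.grid.z
             >> k.block.x >> k.block.y >> k.block.z)) {
        return false;
    }
    out = k;
    return true;
}

uint64_t num_blocks(const KernelLaunch& k) {
    return static_cast<uint64_t>(k.grid.x) * k.grid.y * k.grid.z;
}

uint64_t total_threads(const KernelLaunch& k) {
    uint64_t per_block = static_cast<uint64_t>(k.block.x) * k.block.y * k.block.z;
    assert(per_block > 0 && "a block must contain at least one thread");
    return num_blocks(k) * per_block;
}
```

Because each launch carries its own grid and block dimensions, a single trace file could contain several launches with different parameters, one header per kernel.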

Scheduling

The CUDA abstraction allows the programmer to simply specify what computation needs to be done on the GPU without thinking about how to map or schedule it onto physical resources. One critical question we need to answer is whether the responsibility for scheduling and mapping belongs to the compiler that generates traces or to the simulated hardware that runs the trace. For example, the programmer may launch more thread blocks than the GPU can handle at any given time.

If we model scheduling on the GPU side, we will need to think about how to capture the scheduling overhead of mapping thread blocks onto execution units.
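A minimal sketch of hardware-side scheduling, under heavy simplifying assumptions, is shown below: each thread block is dispatched in order to the earliest free block slot and pays a fixed dispatch overhead. The `GpuConfig` fields, the overhead value, and the assumption that blocks sharing an execution unit do not slow each other down are all hypothetical choices for illustration, not a description of real GPU hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical hardware parameters for the simulated GPU.
struct GpuConfig {
    uint32_t num_units;                // execution units that can host blocks
    uint32_t max_blocks_per_unit;      // blocks resident on one unit at a time
    uint64_t dispatch_overhead_ticks;  // assumed fixed cost to place one block
};

// Returns the tick at which the last block finishes, given the execution
// time of each block in launch order.
uint64_t simulate_block_dispatch(const GpuConfig& cfg,
                                 const std::vector<uint64_t>& block_ticks) {
    const uint64_t slots =
        static_cast<uint64_t>(cfg.num_units) * cfg.max_blocks_per_unit;
    assert(slots > 0 && "the GPU needs at least one block slot");

    // Min-heap holding the tick at which each slot next becomes free.
    std::priority_queue<uint64_t, std::vector<uint64_t>,
                        std::greater<uint64_t>> free_at;
    for (uint64_t i = 0; i < slots; ++i) free_at.push(0);

    uint64_t finish = 0;
    for (uint64_t ticks : block_ticks) {
        uint64_t start = free_at.top();  // earliest available slot
        free_at.pop();
        uint64_t end = start + cfg.dispatch_overhead_ticks + ticks;
        free_at.push(end);
        if (end > finish) finish = end;
    }
    return finish;
}
```

With two single-block units, a 10-tick overhead, and three 100-tick blocks, the third block has to wait for a slot, so this model reports 220 ticks rather than 110. That is the effect of launching more thread blocks than the GPU can hold at once.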

Overall Architecture

While there are many detailed resources describing the methodology for CPU design, it is much more difficult to find good guidance on GPU architecture beyond a high-level overview (e.g., threads, warps, blocks). We need to decide on a basic architecture for the GPU.

Resources

CADSS is freely available on GitHub, though the implementations of certain “reference” components are kept private. We have referenced Professor Railing’s paper on the design of CADSS, and will continue to do so throughout the project.

In general, CADSS can run without issue on any x86 machine, so we will not need access to any special hardware.

Currently, we are thinking of basing our trace files on the NVIDIA PTX ISA.

Goals & Deliverables

MVP (Plan to Achieve)

Extensions/Challenges (Hope to Achieve)

Platform Choice

Our platform is the CADSS framework, which can run locally on any x86 machine. We will build everything in C++, as we are familiar with it from 15-418.

Schedule

We assign dates for each of our deliverables: