# Artifact for "C3: CXL Coherence Controllers for Heterogeneous Architectures" (HPCA '26)
This repository contains the models and workloads for evaluating CXL-based cache coherence protocols using the gem5 simulator. The evaluated protocols are:
- MESI-MESI-MESI: Baseline MESI protocol
- MESI-CXL-MESI: Two MESI clusters connected through CXL coherence
- MESI-CXL-MOESI: MESI & MOESI clusters connected through CXL coherence
- MESI-CXL-MESIF: MESI & MESIF clusters connected through CXL coherence
Contents:

- Quick Start
- Prerequisites
- Install Dependencies
- Repository Structure
- Build gem5
- Build Benchmarks
- Functional Validation
- Generate Workload Configurations
- Run Experiments
## Quick Start

For simplicity, we provide Docker images:
- Option 1: Prebuilt image (~40 GB)
  ```bash
  $ just docker-prebuilt
  ```
  Ready to run experiments -- includes pre-compiled gem5 models for C3 and pre-compiled workloads.
Then follow from: Functional Validation
- Option 2: Base image (~1 GB)
  ```bash
  $ just docker-base
  ```
  Ready to compile the gem5 models and the workloads -- includes system dependencies.
Then follow from: Build gem5
- Option 3: Manual setup (~1.5-3 h; requires 30 GB of free disk space)
  Follow the steps below to set up the environment manually.
## Prerequisites

- OS: Ubuntu 22.04 LTS or 24.04 LTS
- Compiler: GCC 11.4.0
- Python: 3.10+
- SCons: 4.0+
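An optional sanity check that the toolchain matches the versions above (and that you have the ~30 GB of disk the manual setup requires):

```bash
# Optional: verify the prerequisites listed above.
gcc --version | head -n 1    # expect GCC 11.4.0 (or compatible)
python3 --version            # expect Python 3.10+
scons --version | head -n 3  # expect SCons 4.0+
df -h .                      # manual setup needs ~30 GB free
```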
## Install Dependencies

```bash
sudo apt-get update && sudo apt-get install -y \
    build-essential \
    cmake \
    g++ \
    git \
    python3 \
    python3-pip \
    python3-venv \
    scons \
    zlib1g-dev \
    libprotobuf-dev \
    protobuf-compiler \
    libgoogle-perftools-dev \
    libboost-all-dev \
    libhdf5-serial-dev \
    libpng-dev \
    libjemalloc-dev \
    pkg-config \
    wget \
    m4 \
    libtbb-dev \
    gettext \
    libgettextpo-dev \
    libglw1-mesa-dev \
    libxext-dev \
    libx11-dev \
    libxmu-dev \
    libglut-dev \
    libxi-dev \
    gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu
```

Install the Python packages used by the plotting scripts:

```bash
pip3 install pandas numpy matplotlib seaborn
```

## Repository Structure
```
C-3-Artifact/
├── gem5/                         # gem5 simulator source
├── slicc/                        # SLICC protocol definitions
├── benchmarks/                   # Benchmark suites
│   ├── parsec-benchmark/         # PARSEC 3.0 (X86)
│   ├── Splash-4/                 # Splash-4 (X86)
│   ├── phoenix/                  # Phoenix (X86)
│   ├── parsec-benchmark-arm/     # PARSEC 3.0 (ARM)
│   ├── Splash-4-arm/             # Splash-4 (ARM)
│   └── phoenix-arm/              # Phoenix (ARM)
├── script/                       # Build and run scripts
│   ├── build-gem5.sh             # Build gem5 for X86 and ARM
│   ├── build-benchmark.sh        # Build all benchmarks (X86 + ARM)
│   ├── build-benchmark-x86.sh    # Build X86 benchmarks only
│   ├── build-benchmark-arm.sh    # Build ARM benchmarks only
│   ├── run-functional.sh         # Functional validation
│   ├── create-configurations.sh  # Generate all configurations
│   ├── create-conf-x86.sh        # Generate X86 configurations
│   ├── create-conf-arm.sh        # Generate ARM configurations
│   ├── create-conf-litmus.sh     # Generate Litmus configurations
│   ├── run-all-fig.sh            # Run all experiments (Fig. 9, 10, 11)
│   ├── run-fig9.sh               # Run Figure 9 experiments (ARM MCM)
│   ├── run-fig10.sh              # Run Figure 10 experiments (X86)
│   ├── run-fig11.sh              # Run Figure 11 experiments
│   ├── run-litmus.sh             # Run ARM litmus tests
│   ├── extract-stats.sh          # Extract X86 statistics
│   ├── extract-stats-arm.sh      # Extract ARM statistics
│   ├── plot_fig9.py              # Generate Figure 9 plot
│   ├── plot_fig10.py             # Generate Figure 10 plot
│   └── plot_fig11.py             # Generate Figure 11 plot
├── setup/                        # Protocol setup scripts
└── data/                         # Output directory (created at runtime)
```
## Build gem5

Expected build time: ~1-2 h (depending on CPU and parallelism).

Build the gem5 simulator with all cache coherence protocols for both X86 and ARM architectures:

```bash
./script/build-gem5.sh
```

This builds gem5 for each protocol and architecture:
X86 builds:

```
gem5/build/X86_MESI_unord/gem5.opt
gem5/build/X86_MESI_unord_CXL/gem5.opt
gem5/build/X86_MESI_CXL_MOESI/gem5.opt
gem5/build/X86_MESI_CXL_MESIF/gem5.opt
```

ARM builds:

```
gem5/build/ARM_MESI_unord/gem5.opt
gem5/build/ARM_MESI_unord_CXL/gem5.opt
gem5/build/ARM_MESI_CXL_MOESI/gem5.opt
gem5/build/ARM_MESI_CXL_MESIF/gem5.opt
```
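build-gem5.sh drives all of these builds for you. If you ever need to rebuild a single target by hand, the standard gem5 invocation looks like the sketch below; this assumes the repository provides a build configuration for each target name above (which build-gem5.sh relies on), and the script may pass additional options:

```bash
# Hypothetical manual rebuild of one protocol/ISA target;
# ./script/build-gem5.sh is the supported entry point.
cd gem5
scons build/X86_MESI_unord/gem5.opt -j"$(nproc)"
```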
## Build Benchmarks

Expected build time: ~20-40 minutes.

Build all three benchmark suites (PARSEC, Splash-4, Phoenix) for both X86 and ARM:

```bash
./script/build-benchmark.sh
```

This script internally calls both build-benchmark-x86.sh and build-benchmark-arm.sh.
To build for a specific architecture only:

```bash
./script/build-benchmark-x86.sh   # Build X86 benchmarks only
./script/build-benchmark-arm.sh   # Build ARM benchmarks only
```

## Functional Validation

[Optional] Expected run time: ~8 minutes.
Before running the full experiments, you can validate that gem5 and the benchmarks are working correctly:

```bash
./script/run-functional.sh x86   # Validate X86 (Figure 10 protocols)
./script/run-functional.sh arm   # Validate ARM (Figure 9 protocols)
./script/run-functional.sh all   # Validate both X86 and ARM
```

This runs a deterministic benchmark (kmeans) and compares the simulated output against native execution.
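Conceptually the check is just an output diff; here is a hypothetical sketch of the idea (the real binary paths, arguments, and output locations are encoded inside run-functional.sh):

```bash
# Hypothetical sketch: compare a deterministic workload (kmeans)
# run natively against the same run under gem5. All paths here are
# placeholders, not the script's actual file names.
./kmeans > native.out             # placeholder native invocation
diff -q native.out simulated.out  # simulated.out: placeholder gem5 output
echo $?                           # 0 means the outputs match (PASS)
```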
## Generate Workload Configurations

Expected run time: < 1 minute.

Before running experiments, generate the configuration files that define each simulation:

```bash
./script/create-configurations.sh
```

This generates all configuration files at once. Alternatively, generate them individually:
```bash
./script/create-conf-x86.sh      # Generate X86 configurations (for Figure 10)
./script/create-conf-arm.sh      # Generate ARM configurations (for Figure 9)
./script/create-conf-litmus.sh   # Generate Litmus test configurations
```

Generated files:

- `benchmarks/configuration/commands.conf` - X86 experiment commands
- `benchmarks/configuration/commands-arm.conf` - ARM experiment commands
- `benchmarks/configuration/commands-litmus.conf` - Litmus test commands
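A quick way to see what was generated (this assumes, as the file names suggest, one experiment command per line):

```bash
# Count and preview the generated experiment commands.
wc -l benchmarks/configuration/commands*.conf
head -n 2 benchmarks/configuration/commands.conf
```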
## Run Experiments

To run all experiments at once (Figures 9, 10, and 11):

```bash
./script/run-all-fig.sh
```

This runs run-fig9.sh, run-fig10.sh, and run-fig11.sh sequentially.
### Figure 9: ARM Memory Consistency Models

Figure 9 evaluates heterogeneous memory consistency models (MCM) on the ARM architecture, comparing three configurations:
- ARM-ARM: Both clusters use the relaxed ARM memory model
- ARM-TSO: One cluster uses the relaxed ARM model, the other enforces TSO
- TSO-TSO: Both clusters use the TSO-enforced memory model
```bash
./script/run-fig9.sh
```

Filtering options:

```bash
./script/run-fig9.sh splash                  # Run Splash-4 only
./script/run-fig9.sh splash barnes           # Run a single application
./script/run-fig9.sh splash barnes arm_arm   # Run a specific MCM config
```

Output:
```
data/fig_9/
├── gem5.output/                 # Raw simulation outputs
│   └── {suite}/{app}/{protocol}/{mcm}/
├── summary/                     # Extracted statistics (CSV)
└── plot/
    └── fig9_arm_mcm.pdf
```
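The raw per-run statistics can be inspected directly; gem5 conventionally writes a stats.txt into each output directory, assumed here to sit under the layout above:

```bash
# Locate per-run gem5 statistics files under the Figure 9 output tree.
find data/fig_9/gem5.output -name 'stats.txt' | head
```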
Manual plot regeneration:

```bash
./script/extract-stats-arm.sh
python3 ./script/plot_fig9.py
```

### Figure 10: Execution Time Across Protocols

Figure 10 compares execution time across all protocols for the PARSEC, Splash-4, and Phoenix benchmarks.

```bash
./script/run-fig10.sh
```

The script runs all experiments in parallel, showing progress every 30 seconds. Upon completion, it automatically extracts statistics and generates the plot.
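The script already handles parallelism for you; if you ever need to drive the generated commands manually, something like the following sketch works, assuming one self-contained command per line in commands.conf:

```bash
# Hypothetical manual driver: run each line of commands.conf as a job,
# with as many parallel jobs as there are cores. run-fig10.sh implements
# its own driver with progress reporting and stats extraction.
xargs -d '\n' -P "$(nproc)" -I {} bash -c '{}' \
  < benchmarks/configuration/commands.conf
```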
Filtering options:

```bash
./script/run-fig10.sh parsec                           # Run PARSEC only
./script/run-fig10.sh parsec blackscholes              # Run a single application
./script/run-fig10.sh parsec blackscholes MESI_unord   # Run a specific configuration
```

Output:
```
data/fig_10/
├── gem5.output/                 # Raw simulation outputs
│   ├── parsec/
│   │   ├── blackscholes/
│   │   │   ├── MESI_unord/
│   │   │   ├── MESI_unord_CXL/
│   │   │   ├── MESI_CXL_MOESI/
│   │   │   └── MESI_CXL_MESIF/
│   │   └── ...
│   ├── splash/
│   └── phoenix/
├── summary/                     # Extracted statistics (CSV)
│   ├── MESI_unord/
│   │   ├── parsec.csv
│   │   ├── splash.csv
│   │   └── phoenix.csv
│   └── ...
└── plot/
    └── fig10_execution_time.pdf
```
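To skim the extracted statistics without regenerating the plot (column names are whatever extract-stats.sh emits; this just pretty-prints a simple CSV):

```bash
# Tabulate one of the extracted CSVs (columns depend on extract-stats.sh).
column -s, -t data/fig_10/summary/MESI_unord/parsec.csv | head
```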
Manual plot regeneration:

```bash
./script/extract-stats.sh
python3 ./script/plot_fig10.py
```

### Figure 11: Miss Latency Breakdown

Figure 11 shows the miss latency breakdown for four representative applications (Barnes, LU-Ncont, Histogram, Vips).
Note: Figure 11 uses the same simulation data as Figure 10, so run the Figure 10 experiments first.
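A quick check that the Figure 10 data is in place before plotting:

```bash
# Figure 11 reuses the Figure 10 simulation outputs; make sure they exist.
test -d data/fig_10/gem5.output \
  && echo "Figure 10 data found" \
  || echo "Run ./script/run-fig10.sh first"
```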
Generate the plot:

```bash
./script/run-fig11.sh
```

Output:

```
data/fig_11/
└── plot/
    └── fig11_miss_latency.pdf
```
### Litmus Tests

Litmus tests validate the correctness of the memory consistency model implementations on ARM.

```bash
./script/run-litmus.sh
```

Run a specific test:

```bash
./script/run-litmus.sh IRIW_atomic
./script/run-litmus.sh MP_dmb.sys
./script/run-litmus.sh SB_dmb.sy_po
```

List available tests:

```bash
./script/run-litmus.sh --list
```

Output:
Each test result (PASS/FAIL) is printed directly to stdout, with a summary after all tests complete.
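To scan for failures after the fact, assuming the PASS/FAIL verdict is also echoed into the per-test logs shown below:

```bash
# List any litmus tests whose log contains a FAIL verdict.
grep -l 'FAIL' data/litmus/logs/*.log || echo "no failures logged"
```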
Detailed output:
```
data/litmus/
├── gem5.output/                 # Raw simulation outputs per test
│   ├── IRIW_atomic/
│   ├── MP_dmb.sys/
│   └── ...
└── logs/                        # Execution logs
    ├── IRIW_atomic.log
    └── ...
```