C3: CXL Coherence Controllers for Heterogeneous Architectures

Artifact for "C3: CXL Coherence Controllers for Heterogeneous Architectures", HPCA '26

This repository contains the models and workloads for evaluating CXL-based cache coherence protocols using the gem5 simulator. The evaluated protocols are:

  • MESI-MESI-MESI: Baseline MESI protocol
  • MESI-CXL-MESI: Two MESI clusters connected through CXL coherence
  • MESI-CXL-MOESI: MESI & MOESI clusters connected through CXL coherence
  • MESI-CXL-MESIF: MESI & MESIF clusters connected through CXL coherence

Table of Contents

  • Quick Start
  • Prerequisites
  • Install Dependencies
  • Repository Structure
  • Build gem5
  • Build Benchmarks
  • Functional Validation
  • Generate Workload Configurations
  • Run Experiments
  • Litmus Tests

Quick Start

For simplicity, we provide Docker images:

  • Option 1: Prebuilt image

$ just docker-prebuilt

Ready to run experiments -- includes pre-compiled gem5 models for C3 and pre-compiled workloads.

Then follow from: Functional Validation

  • Option 2: Base image (~1 GB)

$ just docker-base

Ready to compile gem5 models and the workloads -- includes system dependencies.

Then follow from: Build gem5

  • Option 3: Manual set-up (~1h30-3h, 30 GB free space required)

Follow the next steps to manually set up the environment.
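
If just is not available on the host, the recipes can be run with plain Docker. A minimal sketch -- the image tag and mount point here are assumptions (the actual values are defined in the repository's justfile), not verified commands:

$ docker run -it --rm -v "$(pwd):/c3" -w /c3 c3-artifact:prebuilt   # image tag is an assumption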


Prerequisites

  • OS: Ubuntu 22.04 LTS or 24.04 LTS
  • Compiler: GCC 11.4.0
  • Python: 3.10+
  • SCons: 4.0+

Install Dependencies

System Dependencies

sudo apt-get update && sudo apt-get install -y \
    build-essential \
    cmake \
    g++ \
    git \
    python3 \
    python3-pip \
    python3-venv \
    scons \
    zlib1g-dev \
    libprotobuf-dev \
    protobuf-compiler \
    libgoogle-perftools-dev \
    libboost-all-dev \
    libhdf5-serial-dev \
    libpng-dev \
    libjemalloc-dev \
    pkg-config \
    wget \
    m4 \
    libtbb-dev \
    gettext \
    libgettextpo-dev \
    libglw1-mesa-dev \
    libxext-dev \
    libx11-dev \
    libxmu-dev \
    libglut-dev \
    libxi-dev \
    gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu

Python Dependencies (for plotting)

pip3 install pandas numpy matplotlib seaborn
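
To keep these packages out of the system Python, the python3-venv package installed above can provide an isolated environment instead:

python3 -m venv .venv                           # create an isolated environment in .venv/
source .venv/bin/activate                       # activate it for the current shell
pip install pandas numpy matplotlib seaborn     # same plotting dependencies as above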

Repository Structure

C-3-Artifact/
├── gem5/                    # gem5 simulator source
├── slicc/                   # SLICC protocol definitions
├── benchmarks/              # Benchmark suites
│   ├── parsec-benchmark/    # PARSEC 3.0 (X86)
│   ├── Splash-4/            # Splash-4 (X86)
│   ├── phoenix/             # Phoenix (X86)
│   ├── parsec-benchmark-arm/# PARSEC 3.0 (ARM)
│   ├── Splash-4-arm/        # Splash-4 (ARM)
│   └── phoenix-arm/         # Phoenix (ARM)
├── script/                  # Build and run scripts
│   ├── build-gem5.sh            # Build gem5 for X86 and ARM
│   ├── build-benchmark.sh       # Build all benchmarks (X86 + ARM)
│   ├── build-benchmark-x86.sh   # Build X86 benchmarks only
│   ├── build-benchmark-arm.sh   # Build ARM benchmarks only
│   ├── run-functional.sh        # Functional validation
│   ├── create-configurations.sh # Generate all configurations
│   ├── create-conf-x86.sh       # Generate X86 configurations
│   ├── create-conf-arm.sh       # Generate ARM configurations
│   ├── create-conf-litmus.sh    # Generate Litmus configurations
│   ├── run-all-fig.sh           # Run all experiments (Fig 9, 10, 11)
│   ├── run-fig9.sh              # Run Figure 9 experiments (ARM MCM)
│   ├── run-fig10.sh             # Run Figure 10 experiments (X86)
│   ├── run-fig11.sh             # Run Figure 11 experiments
│   ├── run-litmus.sh            # Run ARM litmus tests
│   ├── extract-stats.sh         # Extract X86 statistics
│   ├── extract-stats-arm.sh     # Extract ARM statistics
│   ├── plot_fig9.py             # Generate Figure 9 plot
│   ├── plot_fig10.py            # Generate Figure 10 plot
│   └── plot_fig11.py            # Generate Figure 11 plot
├── setup/                   # Protocol setup scripts
└── data/                    # Output directory (created at runtime)

Build gem5

Expected build time: ~1h-2h (depending on CPU and parallelism).

Build the gem5 simulator with all cache coherence protocols for both X86 and ARM architectures:

./script/build-gem5.sh

This builds gem5 for each protocol and architecture:

X86 builds:

  • gem5/build/X86_MESI_unord/gem5.opt
  • gem5/build/X86_MESI_unord_CXL/gem5.opt
  • gem5/build/X86_MESI_CXL_MOESI/gem5.opt
  • gem5/build/X86_MESI_CXL_MESIF/gem5.opt

ARM builds:

  • gem5/build/ARM_MESI_unord/gem5.opt
  • gem5/build/ARM_MESI_unord_CXL/gem5.opt
  • gem5/build/ARM_MESI_CXL_MOESI/gem5.opt
  • gem5/build/ARM_MESI_CXL_MESIF/gem5.opt
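
If only a single variant needs rebuilding (e.g., after modifying a protocol), gem5's standard SCons interface can target it directly. A sketch, assuming the repository ships matching build_opts entries for these variant names, as the build script relies on:

cd gem5
scons build/X86_MESI_unord/gem5.opt -j"$(nproc)"   # rebuild only the X86 baseline variant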

Build Benchmarks

Expected build time: ~20-40 minutes

Build all three benchmark suites (PARSEC, SPLASH-4, Phoenix) for both X86 and ARM:

./script/build-benchmark.sh

This script internally calls both build-benchmark-x86.sh and build-benchmark-arm.sh.

To build for a specific architecture only:

./script/build-benchmark-x86.sh   # Build X86 benchmarks only
./script/build-benchmark-arm.sh   # Build ARM benchmarks only
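
The ARM suites are cross-compiled, so the aarch64 toolchain from the system dependencies above must be on PATH. A quick sanity check before starting a long build:

aarch64-linux-gnu-gcc --version   # provided by the gcc-aarch64-linux-gnu package installed earlier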

Functional Validation

[Optional] Expected run time: ~8min

Before running the full experiments, you can validate that gem5 and benchmarks are working correctly:

./script/run-functional.sh x86    # Validate X86 (Figure 10 protocols)
./script/run-functional.sh arm    # Validate ARM (Figure 9 protocols)
./script/run-functional.sh all    # Validate both X86 and ARM

This runs a deterministic benchmark (kmeans) and compares the simulated output against native execution.
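
Assuming the script follows the usual convention of a non-zero exit code on a validation mismatch (an assumption, not verified here), the result can be checked directly from the shell:

./script/run-functional.sh all && echo "functional validation passed"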


Generate Workload Configurations

Expected run time: negligible (the scripts only write configuration files)

Before running experiments, generate the configuration files that define each simulation:

./script/create-configurations.sh

This generates all configuration files at once. Alternatively, generate them individually:

./script/create-conf-x86.sh       # Generate X86 configurations (for Figure 10)
./script/create-conf-arm.sh       # Generate ARM configurations (for Figure 9)
./script/create-conf-litmus.sh    # Generate Litmus test configurations

Generated files:

  • benchmarks/configuration/commands.conf - X86 experiment commands
  • benchmarks/configuration/commands-arm.conf - ARM experiment commands
  • benchmarks/configuration/commands-litmus.conf - Litmus test commands
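
Assuming the usual one-command-per-line layout for these files (an assumption, not verified), the number of scheduled simulations can be inspected before launching anything:

wc -l benchmarks/configuration/commands.conf       # X86 runs (Figure 10)
wc -l benchmarks/configuration/commands-arm.conf   # ARM runs (Figure 9)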

Run Experiments

Run All Figures

To run all experiments at once (Figure 9, 10, 11):

./script/run-all-fig.sh

This runs run-fig9.sh, run-fig10.sh, and run-fig11.sh sequentially.

Run Individual Figures

Figure 9: ARM Heterogeneous MCM

Figure 9 evaluates heterogeneous memory consistency models (MCM) on ARM architecture, comparing three configurations:

  • ARM-ARM: Both clusters use the ARM relaxed memory model
  • ARM-TSO: One cluster uses the ARM relaxed model, the other is TSO-enforced
  • TSO-TSO: Both clusters use the TSO-enforced memory model

./script/run-fig9.sh

Filtering options:

./script/run-fig9.sh splash                    # Run Splash-4 only
./script/run-fig9.sh splash barnes             # Run single application
./script/run-fig9.sh splash barnes arm_arm     # Run specific MCM config

Output:

data/fig_9/
├── gem5.output/           # Raw simulation outputs
│   └── {suite}/{app}/{protocol}/{mcm}/
├── summary/               # Extracted statistics (CSV)
└── plot/
    └── fig9_arm_mcm.pdf

Manual plot regeneration:

./script/extract-stats-arm.sh
python3 ./script/plot_fig9.py

Figure 10: Execution Time Comparison

Figure 10 compares execution time across all protocols for PARSEC, Splash-4, and Phoenix benchmarks.

./script/run-fig10.sh

The script runs all experiments in parallel, showing progress every 30 seconds. Upon completion, it automatically extracts statistics and generates the plot.

Filtering options:

./script/run-fig10.sh parsec                          # Run PARSEC only
./script/run-fig10.sh parsec blackscholes             # Run single application
./script/run-fig10.sh parsec blackscholes MESI_unord  # Run specific configuration
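
Since the full Figure 10 sweep is long, it can be detached from the terminal with plain shell tools (tmux or similar works equally well):

nohup ./script/run-fig10.sh > fig10.log 2>&1 &   # keep running after the terminal closes
tail -f fig10.log                                # follow the 30-second progress updates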

Output:

data/fig_10/
├── gem5.output/           # Raw simulation outputs
│   ├── parsec/
│   │   ├── blackscholes/
│   │   │   ├── MESI_unord/
│   │   │   ├── MESI_unord_CXL/
│   │   │   ├── MESI_CXL_MOESI/
│   │   │   └── MESI_CXL_MESIF/
│   │   └── ...
│   ├── splash/
│   └── phoenix/
├── summary/               # Extracted statistics (CSV)
│   ├── MESI_unord/
│   │   ├── parsec.csv
│   │   ├── splash.csv
│   │   └── phoenix.csv
│   └── ...
└── plot/
    └── fig10_execution_time.pdf

Manual plot regeneration:

./script/extract-stats.sh
python3 ./script/plot_fig10.py

Figure 11: Miss Latency Breakdown

Figure 11 shows the miss latency breakdown for four representative applications (Barnes, LU-Ncont, Histogram, Vips).

Note: Figure 11 uses the same simulation data from Figure 10. Run Figure 10 experiments first.

Generate the plot:

./script/run-fig11.sh

Output:

data/fig_11/
└── plot/
    └── fig11_miss_latency.pdf

Litmus Tests

Litmus tests validate the correctness of memory consistency model implementations on ARM.

./script/run-litmus.sh

Run a specific test:

./script/run-litmus.sh IRIW_atomic
./script/run-litmus.sh MP_dmb.sys
./script/run-litmus.sh SB_dmb.sy_po

List available tests:

./script/run-litmus.sh --list

Output:

Each test's result (PASS/FAIL) is printed to stdout, followed by a summary once all tests have completed.

Detailed Output:

data/litmus/
├── gem5.output/           # Raw simulation outputs per test
│   ├── IRIW_atomic/
│   ├── MP_dmb.sys/
│   └── ...
└── logs/                  # Execution logs
    ├── IRIW_atomic.log
    └── ...
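
For quick triage after a full run, the per-test logs can be searched for failures. This assumes the PASS/FAIL verdicts also appear in the log files, not only on stdout (an assumption based on the output format above):

grep -l FAIL data/litmus/logs/*.log   # lists failing tests; no output means none failed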
