This repository contains a production-ready implementation of Deep Kernel Gaussian Process (DKGP) models for population-level temporal data analysis, designed for medical imaging applications and biomarker trajectory prediction. This code will be used to integrate the models into the NiChart platform (https://neuroimagingchart.com/).
The DKGP model combines deep neural networks with Gaussian Processes to provide:
- Population-level modeling of temporal trajectories
- Uncertainty quantification, with intervals covering the central 95% of the posterior predictive distribution (see the sketch after this list)
- Deep feature learning for complex temporal patterns
- Production-ready inference for new subjects
- 8-year trajectory forecasting with 12-month intervals
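The bounds reported in the output CSVs are consistent with a Gaussian posterior predictive. As a rough illustration only (an assumption, not necessarily the exact computation performed by the repository's code), a central 95% interval can be recovered from a predicted mean and variance like this:

```python
import math

def gaussian_95_interval(predicted_value: float, variance: float):
    """Central 95% interval of a Gaussian posterior predictive
    with the given mean and variance (assumed convention)."""
    half_width = 1.96 * math.sqrt(variance)  # 1.96 ~ 97.5th percentile of N(0, 1)
    return predicted_value - half_width, predicted_value + half_width

# Illustrative numbers only (not real model output)
lower, upper = gaussian_95_interval(predicted_value=3150.0, variance=220.0)
print(f"95% interval: [{lower:.1f}, {upper:.1f}], width = {upper - lower:.1f}")
```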
Option A: Automated Setup (Recommended)
# Clone the repository
git clone git@github.com:CBICA/NiChart_DKGP.git
cd NiChart_DKGP
# Run the automated setup script
./setup_minimal_environment.sh
# Activate the environment
source dkgp-venv/bin/activate
Option B: Manual Setup
# Create virtual environment
python -m venv dkgp-venv
# Activate environment
source dkgp-venv/bin/activate
# Install dependencies
pip install -r requirements_minimal.txt
Hardware:
- GPU: NVIDIA RTX A6000 or similar (recommended)
- CPU: Multi-core processor (tested on 2x Intel Xeon Gold 6248R)
- RAM: 16GB+ (tested on 754GB)
- Storage: High-performance SSD recommended
Software:
- Python 3.8+
- CUDA 11.7+ (for GPU acceleration)
- PyTorch 2.4.1+
- GPyTorch 1.13+
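A quick way to confirm that the installed versions meet these minimums (a sketch; it only prints versions and whether CUDA is visible):

```python
import sys
import torch
import gpytorch

# Compare against the requirements listed above.
print(f"Python   : {sys.version.split()[0]} (need 3.8+)")
print(f"PyTorch  : {torch.__version__} (need 2.4.1+)")
print(f"GPyTorch : {gpytorch.__version__} (need 1.13+)")
print(f"CUDA     : {torch.version.cuda}, available={torch.cuda.is_available()}")
```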
# Activate environment
source dkgp-venv/bin/activate
# Test all dependencies
python -c "import torch; import gpytorch; import pandas; import numpy; import sklearn; print('✅ All dependencies installed successfully!')"
# Test inference
./run_inference.sh /output/folder/path/ hippocampus_right
The comprehensive inference script supports all biomarker types with a single command:
# Activate environment
source dkgp-venv/bin/activate
# Run the COMPLETE inference pipeline, including the preprocessing step (NiChart)
NiChart_Run_all.sh {INPUT_CSV_PATH} {OUTPUT_FOLDER_PATH}
# Run inference for specific biomarkers
./run_inference.sh /output/folder/path/ hippocampus_right
./run_inference.sh /output/folder/path/ spare_ad
./run_inference.sh /output/folder/path/ mmse
# Run inference for all biomarkers
./run_inference.sh /output/folder/path/ all
# Run inference for all 145 Volume ROIs (creates single CSV)
./run_inference.sh /output/folder/path/ volume_rois
Volume ROIs:
- `hippocampus_right` (Index 14) - Right Hippocampus
- `hippocampus_left` (Index 15) - Left Hippocampus
- `ventricle_right` (Index 16) - Right Lateral Ventricle
- `ventricle_left` (Index 17) - Left Lateral Ventricle
- `volume_rois` - All 145 Volume ROIs (single CSV output)
SPARE Scores:
- `spare_ad` (Index 0) - SPARE-AD Score
- `spare_ba` (Index 1) - SPARE-BA Score
Cognitive Scores:
- `mmse` (Index 0) - MMSE Cognitive Score
- `adas` (Index 0) - ADAS Cognitive Score
Combined Options:
- `all` - Run inference for all individual biomarkers
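If only a subset of biomarkers is needed, the same script can also be driven programmatically; a minimal sketch (the output path is a placeholder, and the keys are taken from the tables above):

```python
import subprocess

output_dir = "/output/folder/path/"                      # placeholder output path
biomarkers = ["hippocampus_right", "spare_ad", "mmse"]   # any keys listed above

for key in biomarkers:
    # Each call writes <key>_output.csv into output_dir (see the output format below).
    subprocess.run(["./run_inference.sh", output_dir, key], check=True)
```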
Results are saved to the output directory you pass to the script (e.g., /output/folder/path/):
output/
├── hippocampus_right_output.csv # Individual biomarker files
├── hippocampus_left_output.csv
├── lateral_ventricle_right_output.csv
├── lateral_ventricle_left_output.csv
├── spare_ad_output.csv
├── spare_ba_output.csv
├── mmse_output.csv
├── adas_output.csv
└── volume_rois_output.csv # All 145 ROIs in single file
Each CSV file contains:
- `PTID`: Subject identifier
- `time_months`: Time point (12, 24, 36, 48, 60, 72, 84, 96 months)
- `predicted_value`: DKGP prediction
- `variance`: Model uncertainty
- `lower_bound`, `upper_bound`: 95% confidence interval bounds
- `interval_width`: Width of the confidence interval
- `roi_idx`: ROI index
For volume_rois_output.csv:
- `PTID`, `Time`: Subject and time columns
- `DL_MUSE_0` to `DL_MUSE_144`: All 145 ROI predictions
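For downstream analysis, the per-biomarker CSVs can be loaded directly with pandas; a sketch using the columns documented above (the subject ID reuses the example ID shown later in this README):

```python
import pandas as pd

df = pd.read_csv("output/hippocampus_right_output.csv")

# One subject's 8-year forecast, ordered by horizon
subject = df[df["PTID"] == "002_S_1155"].sort_values("time_months")

for _, row in subject.iterrows():
    print(f"{int(row['time_months']):>3} mo: {row['predicted_value']:.2f} "
          f"[{row['lower_bound']:.2f}, {row['upper_bound']:.2f}]")
```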
# Activate environment
source dkgp-venv/bin/activate
# Validate inference quality and create publication-ready plots
python visualize_trajectories.py --csv_file output/hippocampus_right_output.csv
# Create specific plot types
python visualize_trajectories.py --csv_file output/spare_ad_output.csv --plot_type trajectory
python visualize_trajectories.py --csv_file output/mmse_output.csv --plot_type uncertainty
python visualize_trajectories.py --csv_file output/hippocampus_left_output.csv --plot_type summary
Plot Types:
- `trajectory` - Population mean trajectory with individual subject trajectories
- `uncertainty` - Model uncertainty validation plots
- `diversity` - Trajectory slope distribution analysis
- `summary` - Comprehensive validation summary (6-panel plot)
- `all` - All plot types (default)
# Plot random subject trajectory
python plot_single_subject.py --csv_file output/hippocampus_right_output.csv
# Plot specific subject
python plot_single_subject.py --csv_file output/spare_ad_output.csv --subject_id 002_S_1155
# Custom output directory
python plot_single_subject.py --csv_file output/mmse_output.csv --output_dir ./my_plots
DKGP provides fast inference for biomarker trajectory prediction:
| Biomarker Type | Per-Subject Time | Per-Prediction Time | Throughput |
|---|---|---|---|
| Single ROI (8 time points) | 6.5ms | 0.8ms | ~154 subjects/sec |
| All 145 ROIs (8 time points each) | 0.94s | 0.8ms | ~1.1 subjects/sec |
| SPARE Scores (8 time points) | 6.5ms | 0.8ms | ~154 subjects/sec |
| Cognitive Scores (8 time points) | 6.5ms | 0.8ms | ~154 subjects/sec |
| Scenario | Subjects | Total Time | Per-Subject Time |
|---|---|---|---|
| Single ROI (e.g., Hippocampus) | 617 | ~4 seconds | 6.5ms |
| All 145 Volume ROIs | 617 | ~10 minutes | 0.94s |
| SPARE Scores | 617 | ~4 seconds | 6.5ms |
| Cognitive Scores | 617 | ~4 seconds | 6.5ms |
- ⚡ Real-time prediction suitable for clinical applications
- 🎯 8-year trajectory generation in milliseconds per subject
- 📊 Uncertainty quantification with 95% confidence intervals
- 🔄 Batch processing optimized for population studies
Benchmarked on Intel Xeon Gold 6248R CPU @ 3.00GHz with 617 test subjects.
Note: GPU acceleration can provide 3-5x speedup for large-scale batch processing.
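To reproduce approximate throughput figures on your own hardware, a minimal timing wrapper around the inference script (a sketch; the output path and cohort size are placeholders):

```python
import subprocess
import time

output_dir = "/output/folder/path/"   # placeholder output path
n_subjects = 617                      # number of subjects in your input cohort

start = time.perf_counter()
subprocess.run(["./run_inference.sh", output_dir, "hippocampus_right"], check=True)
elapsed = time.perf_counter() - start

print(f"Total: {elapsed:.1f}s, per subject: {1000 * elapsed / n_subjects:.1f} ms")
```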
NiChart_DKGP/
├── NiChart_Run_all.sh # Main inference script
├── run_preprocess_data.sh # Data preprocessing from a CSV containing ROI volumes, clinical data, and demographics
├── run_inference.sh # Core inference script
├── pdkgp_future_inference.py # Core inference logic
├── visualize_trajectories.py # Validation and plotting
├── plot_single_subject.py # Single subject visualization
├── setup_minimal_environment.sh # Environment setup script
├── requirements_minimal.txt # Minimal dependencies
├── environment_minimal.yml # Conda environment (alternative)
├── README.md # This file
├── .gitignore # Git ignore rules
├── data/ # Input data (not tracked)
├── models/ # Trained models (not tracked)
└── output/ # Inference results (not tracked)
1. CUDA/GPU Issues:
# Check CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# If CUDA is not available, models will run on CPU (slower)
2. Memory Issues:
# For large datasets, reduce batch size in pdkgp_future_inference.py
# Or process fewer subjects at once
3. Environment Issues:
# Recreate environment
rm -rf dkgp-venv
./setup_minimal_environment.sh
4. Missing Dependencies:
# Reinstall requirements
source dkgp-venv/bin/activate
pip install -r requirements_minimal.txt --force-reinstall
If you encounter issues:
- Check the troubleshooting section above
- Verify your system meets the requirements
- Ensure all dependencies are installed correctly
- Check that data and model files are in the correct locations
If you use this code in your research, please cite:
@article{tassopoulouadaptive,
title={Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories},
author={Tassopoulou, Vasiliki and Shou, Haochang and Davatzikos, Christos}
}
@inproceedings{tassopoulou2022deep,
title={Deep kernel learning with temporal gaussian processes for clinical variable prediction in alzheimer’s disease},
author={Tassopoulou, Vasiliki and Yu, Fanyang and Davatzikos, Christos},
booktitle={Machine Learning for Health},
pages={539--551},
year={2022},
organization={PMLR}
}