Data pipeline to provide individual, combined and consensus filtered domain annotations for protein structures using Chainsaw, Merizo and UniDoc.
Clone the repo. https://github.com/UCLOrengoGroup/domain-annotation-pipeline
Install Nextflow
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install nextflow
Install Docker https://docs.docker.com/compose/install/
Build Docker containers and run
docker compose build
The following command runs the pipeline in debug mode, which uses the test data included in this repository:
nextflow run workflows/annotate.nf -profile debug,docker
The pipeline expects two inputs:
- a zip file containing PDB files
- a file containing all the ids that should be processed
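Before launching a run, it can help to confirm that the two inputs agree. The sketch below is not part of the pipeline: `missing_ids` is a hypothetical helper, and it assumes each ID corresponds to an `<id>.pdb` entry in the archive, as the `ids.txt` recipe in this README implies. It compares a listing of archive entries (one per line, for example the output of `zipinfo -1`) against the IDs file and prints any ID that has no matching PDB file.

```shell
# Print every ID in the IDs file that has no matching <id>.pdb entry
# in an archive listing (one entry per line).
# Usage: missing_ids LISTING_FILE IDS_FILE
missing_ids() {
  awk '
    FILENAME == ARGV[1] {
      sub(/.*\//, "")      # drop any directory prefix
      sub(/\.pdb$/, "")    # drop the trailing .pdb suffix
      seen[$0] = 1
      next
    }
    !($0 in seen)          # print IDs that have no PDB entry
  ' "$1" "$2"
}

# Small self-contained demonstration using throwaway files.
demo_dir=$(mktemp -d)
printf '%s\n' A0A3G5A0R2.pdb A0A8S5U119.pdb > "$demo_dir/listing.txt"
printf '%s\n' A0A3G5A0R2 A0A8S5U119 UPI001E716444 > "$demo_dir/ids.txt"
missing_ids "$demo_dir/listing.txt" "$demo_dir/ids.txt"
```

With real inputs, the listing could be produced with `zipinfo -1 pdb_files.zip > listing.txt`; an empty result means every ID has a PDB file.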
Given the following directory:
pdb_files/A0A3G5A0R2.pdb
pdb_files/A0A8S5U119.pdb
pdb_files/A0A0B5IZ33.pdb
pdb_files/UPI001E716444.pdb
pdb_files/A0A6C0N656.pdb
Create a zip file from all PDB files in this directory:
cd pdb_files
zip -r ../pdb_files.zip .
Create a file containing all the ids to process:
# list the files in the zip and remove the trailing `.pdb` suffix
zipinfo -1 pdb_files.zip | sed 's/\.pdb$//' > ids.txt
Pass these parameters to nextflow:
nextflow run workflows/annotate.nf \
--pdb_zip_file pdb_files.zip \
--uniprot_csv_file ids.txt
These instructions are specific to the HPC setup in UCL Computer Science:
- Clone the GitHub repository
- Request access to the Nextflow submit node: askey
- Log in to askey
Set the following Nextflow environment variables interactively, or add them to ~/.bashrc:
export NXF_OPTS='-Xms3g -Xmx3g'
export PATH=/share/apps/jdk-20.0.2/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/jdk-20.0.2/lib:$LD_LIBRARY_PATH
export JAVA_HOME=/share/apps/jdk-20.0.2
export PATH=/share/apps/genomics/nextflow-local-23.04.2:$PATH
Create a cache directory for Nextflow (not strictly necessary, but it prevents warnings):
mkdir ~/scratch
mkdir ~/scratch/nextflow_singularity_cache
export NXF_SINGULARITY_CACHEDIR=$HOME/scratch/nextflow_singularity_cache
Set the following Python environment variables interactively, or add them to ~/.bashrc:
export PATH=/share/apps/python-3.13.0a6-shared/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/python-3.13.0a6-shared/lib:$LD_LIBRARY_PATH
source /share/apps/source_files/python/python-3.13.0a6.source
Set up the Python virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
The latest containers are built and stored in the GitHub Container Registry (ghcr.io) as part of the automated build.
These can be downloaded as singularity images with singularity pull:
Note: the following requires setting up a GitHub personal access token
singularity pull --docker-login domain-annotation-pipeline-script_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-script:main-latest
singularity pull --docker-login domain-annotation-pipeline-cath-af-cli_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-cath-af-cli:main-latest
singularity pull --docker-login domain-annotation-pipeline-ted-tools_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-ted-tools:main-latest
The directory containing these singularity images can be added to your config file, or passed directly to nextflow:
nextflow run workflows/annotate.nf -profile singularity \
--singularity_image_dir "/path/to/singularity_images"
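The three `singularity pull` commands in this README differ only in the image name, so they can be generated in a loop. The following is a minimal sketch rather than part of the repository: the `image_dir` location, the `IMAGE_DIR` override, and the `pull_command` helper are hypothetical names. It only prints the commands, so they can be reviewed before being run.

```shell
# Hypothetical target directory for the .sif files; override with IMAGE_DIR.
image_dir="${IMAGE_DIR:-$HOME/singularity_images}"

# Print the pull command for one image name (e.g. "script").
pull_command() {
  printf 'singularity pull --docker-login %s/domain-annotation-pipeline-%s_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-%s:main-latest\n' \
    "$image_dir" "$1" "$1"
}

# Image names taken from the three pull commands in this README.
for name in script cath-af-cli ted-tools; do
  pull_command "$name"
done
```

Once the printed commands look right, they can be pasted into the shell, and the same directory can then be passed as `--singularity_image_dir`.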