Domain Annotation Pipeline

Data pipeline providing individual, combined, and consensus-filtered domain annotations for protein structures using Chainsaw, Merizo and UniDoc.

Install (with Docker)

Clone the repo: https://github.com/UCLOrengoGroup/domain-annotation-pipeline

Install Nextflow

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install nextflow
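
As a quick sanity check, assuming the nextflow launcher ends up on your PATH inside the venv (Nextflow itself also needs a Java runtime, as in the HPC setup below):

# confirm the Python package installed into the venv
pip show nextflow
# confirm the Nextflow launcher is callable and print its version
nextflow -version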

Install Docker (with the Compose plugin): https://docs.docker.com/compose/install/

Build the Docker containers

docker compose build
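
To see what was built, listing the local images is enough; the grep pattern below assumes Compose names the images after the project directory:

# list the images produced by docker compose build
docker image ls | grep domain-annotation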

Running the workflow

The following runs the pipeline in debug mode, using test data included in this repository.

nextflow run workflows/annotate.nf -profile debug,docker
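
The -resume flag and nextflow log are standard Nextflow features (not specific to this pipeline) that are handy when re-running after a failure:

# list previous runs with their status and session IDs
nextflow log

# re-run the workflow, reusing cached task results where possible
nextflow run workflows/annotate.nf -profile debug,docker -resume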

Preparing data

The pipeline expects two inputs:

  • a zip file containing PDB files
  • a file listing all the IDs to be processed

Given the following directory:

pdb_files/A0A3G5A0R2.pdb
pdb_files/A0A8S5U119.pdb
pdb_files/A0A0B5IZ33.pdb
pdb_files/UPI001E716444.pdb
pdb_files/A0A6C0N656.pdb

Create a zip file from all PDB files in this directory:

cd pdb_files
zip -r ../pdb_files.zip .

Create a file containing all the ids to process:

# list the files in the zip and remove the `.pdb` suffix
zipinfo -1 pdb_files.zip | sed 's/\.pdb$//' > ids.txt

Pass these files to Nextflow as parameters:

nextflow run workflows/annotate.nf \
    --pdb_zip_file pdb_files.zip \
    --uniprot_csv_file ids.txt
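
The same parameters can also be supplied via Nextflow's standard -params-file option; the file name below is just an example:

# write the parameters to a YAML params file (name is illustrative)
cat > params.yml <<'EOF'
pdb_zip_file: pdb_files.zip
uniprot_csv_file: ids.txt
EOF

nextflow run workflows/annotate.nf -params-file params.yml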

Running on HPC

Install (with Singularity)

These instructions are specific to the HPC setup in UCL Computer Science:

  • Clone the GitHub repository
  • Request access to the Nextflow submit node: askey
  • Log in to askey

Set the following Nextflow environment variables interactively or add them to ~/.bashrc.

export NXF_OPTS='-Xms3g -Xmx3g'
export PATH=/share/apps/jdk-20.0.2/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/jdk-20.0.2/lib:$LD_LIBRARY_PATH
export JAVA_HOME=/share/apps/jdk-20.0.2
export PATH=/share/apps/genomics/nextflow-local-23.04.2:$PATH
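
A quick, read-only check that the toolchain resolves as expected after setting these variables:

# confirm Java is picked up from the PATH set above
java -version
# confirm Nextflow is picked up from the PATH set above
nextflow -version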

Create a cache directory for Nextflow (not strictly necessary, but it prevents warnings).

mkdir -p ~/Scratch/nextflow_singularity_cache
export NXF_SINGULARITY_CACHEDIR=$HOME/Scratch/nextflow_singularity_cache

Set the following Python environment variables interactively or add them to ~/.bashrc.

export PATH=/share/apps/python-3.13.0a6-shared/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/python-3.13.0a6-shared/lib:$LD_LIBRARY_PATH
source /share/apps/source_files/python/python-3.13.0a6.source

Set up the Python virtual environment

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

The latest containers are built and stored in the GitHub Container Registry (ghcr.io) as part of the automated build.

These can be downloaded as Singularity images with singularity pull:

Note: the following requires setting up a GitHub personal access token

singularity pull --docker-login domain-annotation-pipeline-script_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-script:main-latest
singularity pull --docker-login domain-annotation-pipeline-cath-af-cli_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-cath-af-cli:main-latest
singularity pull --docker-login domain-annotation-pipeline-ted-tools_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-ted-tools:main-latest
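
The --docker-login flag prompts for credentials interactively. As an alternative sketch, Singularity can also read registry credentials from environment variables; use your GitHub username and the personal access token as the password:

# non-interactive login: credentials are read from the environment
export SINGULARITY_DOCKER_USERNAME=<github-username>
export SINGULARITY_DOCKER_PASSWORD=<personal-access-token>
singularity pull domain-annotation-pipeline-script_latest.sif docker://ghcr.io/uclorengogroup/domain-annotation-pipeline-script:main-latest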

The directory containing these Singularity images can be added to your config file or passed directly to Nextflow:

nextflow run workflows/annotate.nf -profile singularity \
    --singularity_image_dir "/path/to/singularity_images"
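
To keep the image directory in configuration instead of on the command line, a minimal sketch (assuming a nextflow.config in the project root; the path is a placeholder):

# append the parameter to the project's Nextflow config
cat >> nextflow.config <<'EOF'
params.singularity_image_dir = '/path/to/singularity_images'
EOF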
