xens25 commented Dec 1, 2025

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

categories:
- Sequence Analysis
- Metagenomics
- Quality Control

Suggested change
- Quality Control

@@ -0,0 +1,272 @@
<tool id="kneaddata" name="KneadData" version="0.12.1+galaxy0" python_template_version="3.5" profile="21.05">

Suggested change
<tool id="kneaddata" name="KneadData" version="0.12.1+galaxy0" python_template_version="3.5" profile="21.05">
<tool id="kneaddata" name="KneadData" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">

Please introduce the above tokens in the macros.xml file
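
A minimal sketch of what those tokens could look like in macros.xml. The values are placeholders: 0.12.3 is copied from the kneaddata requirement in this PR and 21.05 from the current `profile` attribute, so both should be checked against the release actually being wrapped.

```xml
<macros>
    <!-- Placeholder values; adjust to the wrapped KneadData release -->
    <token name="@TOOL_VERSION@">0.12.3</token>
    <token name="@VERSION_SUFFIX@">0</token>
    <token name="@PROFILE@">21.05</token>
</macros>
```

The tool file then needs `<macros><import>macros.xml</import></macros>` near the top so the tokens are expanded in the `<tool>` tag.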

Comment on lines 3 to 9
<requirements>
<requirement type="package" version="0.12.3">kneaddata</requirement>
<requirement type="package" version="0.40">trimmomatic</requirement>
<requirement type="package" version="2.5.4">bowtie2</requirement>
<requirement type="package" version="4.09.1">trf</requirement>
<requirement type="package" version="0.12.1">fastqc</requirement>
</requirements>

This can go in macros.xml. Are these dependencies already installed as part of the kneaddata package, or do they need to be declared explicitly?
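
As a rough sketch, the requirements could be wrapped in a macro like the one below, assuming the `@TOOL_VERSION@` token suggested for the `<tool>` tag is defined in the same macros.xml. Whether the extra pins are needed is exactly the open question above, so they are kept here only as they appear in the PR.

```xml
<macros>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="@TOOL_VERSION@">kneaddata</requirement>
            <!-- Keep these pins only if the kneaddata package does not already install them -->
            <requirement type="package" version="0.40">trimmomatic</requirement>
            <requirement type="package" version="2.5.4">bowtie2</requirement>
            <requirement type="package" version="4.09.1">trf</requirement>
            <requirement type="package" version="0.12.1">fastqc</requirement>
        </requirements>
    </xml>
</macros>
```

The tool file would then pull this in with `<expand macro="requirements" />`.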

Comment on lines 54 to 55
<option value="s">Single read</option>
<option value="p">Paired reads</option>

Suggested change
<option value="s">Single read</option>
<option value="p">Paired reads</option>
<option value="single">Single read</option>
<option value="paired">Paired reads</option>

</param>

<when value="s">
<param name="single_read" type="data" format="fastq" label="Single Read"/>

Suggested change
<param name="single_read" type="data" format="fastq" label="Single Read"/>
<param name="single_read" type="data" format="fastqsanger" label="Single Read"/>

Does it also support fastq.gz?

<when value="s">
<param name="single_read" type="data" format="fastq" label="Single Read"/>
</when>
<when value="p">

Suggested change
<when value="p">
<when value="paired">

Comment on lines 71 to 73
<param name="number_threads" type="integer" value="1" label="Number of threads"/>

<param name="number_processes" type="integer" value="1" label="Number of processes"/>

Suggested change
<param name="number_threads" type="integer" value="1" label="Number of threads"/>
<param name="number_processes" type="integer" value="1" label="Number of processes"/>

This is not required
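
The comment does not say what should replace these parameters. One common Galaxy convention, shown here only as an assumption about the intent, is to take the thread count from the `GALAXY_SLOTS` environment variable that Galaxy sets for each job instead of asking the user. This is a fragment, not a complete command block.

```xml
<command detect_errors="exit_code"><![CDATA[
    mkdir -p results/ &&
    kneaddata
        ## input arguments omitted in this fragment
        ## GALAXY_SLOTS is provided by Galaxy at runtime; fall back to 1 if it is unset
        --threads "\${GALAXY_SLOTS:-1}"
        -o results/
]]></command>
```

The backslash keeps Cheetah from interpreting `${...}` so that the shell expands it when the job runs.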


<section name="trimmomatic" title="Trimmomatic arguments" >

<param name="max_memory" type="text" value="500m" label="Maximum memory for Trimmomatic"/>

Suggested change
<param name="max_memory" type="text" value="500m" label="Maximum memory for Trimmomatic"/>


<conditional name="trimmomatic_options">
<param name="select_option" type="select" label="Trimmomatic settings">
<option value="d">Default settings</option>

Could these option values have more descriptive names?

quality filtering, and removal of host contamination using
Bowtie2/TRIMMOMATIC/TRF.
homepage_url: https://github.com/biobakery/kneaddata
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/kneaddata

Suggested change
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/kneaddata
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/kneaddata

owner: iuc
type: unrestricted
description: Quality control and contaminant removal for metagenomic data
long_description: >

Suggested change
long_description: >
long_description: |

remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/kneaddata
categories:
- Metagenomics
- Statistics

Suggested change
- Statistics
- Sequence Analysis

SaimMomin12 changed the title from "Add KneadData tool wrapper" to "New tool addition: KneadData" on Dec 1, 2025
<requirement type="package" version="0.12.1">fastqc</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
kneaddata

Suggested change
kneaddata
mkdir -p results/
kneaddata

-i1 "$read_type.forward_read"
-i2 "$read_type.backward_read"
#end if
-o "output_dir"

Suggested change
-o "output_dir"
-o results/

metagenomic and metatranscriptomic sequencing data, especially
data from microbiome experiments. It performs adapter trimming,
quality filtering, and removal of host contamination using
Bowtie2/TRIMMOMATIC/TRF.

Suggested change
Bowtie2/TRIMMOMATIC/TRF.
Bowtie2, TRIMMOMATIC and TRF.

Comment on lines 19 to 21
#if "$output_prefix"
--output-prefix "$output_prefix"
#end if

I would be in favor of removing this parameter

Comment on lines 22 to 23
--threads "$number_threads"
--processes "$number_processes"

Suggested change
--threads "$number_threads"
--processes "$number_processes"

Comment on lines 163 to 164
<param name="number_threads" value="1"/>
<param name="number_processes" value="1"/>

Suggested change
<param name="number_threads" value="1"/>
<param name="number_processes" value="1"/>

</section>
<section name="read_type">
<param name="select_read_type" value="s"/>
<param name="single_read" value="28C.single.fastq"/>

Suggested change
<param name="single_read" value="28C.single.fastq"/>
<param name="single_read" value="test_single.fastq"/>

<param name="forward_read" value="28C.R1.fastq"/>
<param name="backward_read" value="28C.R2.fastq"/>
</section>
<output name="paired_forward" file="paired_forward.fastq"/>

Suggested change
<output name="paired_forward" file="paired_forward.fastq"/>
<output name="paired_forward" file="test_paired_1.fastq"/>

<param name="backward_read" value="28C.R2.fastq"/>
</section>
<output name="paired_forward" file="paired_forward.fastq"/>
<output name="paired_backward" file="paired_backward.fastq"/>

Suggested change
<output name="paired_backward" file="paired_backward.fastq"/>
<output name="paired_backward" file="test_paired_2.fastq"/>

Comment on lines 201 to 259
usage: kneaddata [-h] [--version] [-v] [-i1 INPUT1] [-i2 INPUT2]
[-un UNPAIRED] -o OUTPUT_DIR
[-db REFERENCE_DB] [--bypass-trim] [--run-trim-repetitive]
[--output-prefix OUTPUT_PREFIX] [-t &lt;1&gt;] [-p &lt;1&gt;]
[-q {phred33,phred64}] [--run-bmtagger]
[--run-fastqc-start] [--run-fastqc-end] [--store-temp-output]
[--cat-final-output]
[--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--log LOG]
[--trimmomatic TRIMMOMATIC_PATH] [--max-memory MAX_MEMORY]
[--trimmomatic-options TRIMMOMATIC_OPTIONS]
[--bowtie2 BOWTIE2_PATH] [--bowtie2-options BOWTIE2_OPTIONS]
[--bmtagger BMTAGGER_PATH] [--trf TRF_PATH] [--match MATCH]
[--mismatch MISMATCH] [--delta DELTA] [--pm PM] [--pi PI]
[--minscore MINSCORE] [--maxperiod MAXPERIOD]
[--fastqc FASTQC_PATH]

KneadData

options:


-h, --help show this help message and exit
-v, --verbose additional output is printed
--version show program's version number and exit
-i INPUT, --input INPUT input FASTQ file (add a second argument instance to run with paired input files)
-o OUTPUT_DIR, --output OUTPUT_DIR directory to write output files
--db REFERENCE_DB, --reference-db REFERENCE_DB location of reference database
--run-trim-repetitive Option to trim repetitive/overrepresented sequences generated by FASTQC reports
--bypass-trim bypass the trim step
--output-prefix OUTPUT_PREFIX prefix for all output files [ DEFAULT : $SAMPLE_kneaddata ]
-t <1>, --threads <1> number of threads [ Default : 1 ]
-p <1>, --processes <1> number of processes [ Default : 1 ]
-q <quality>, --quality-scores <quality> quality scores [phred33|phred64] [DEFAULT: phred33]
--run-bmtagger run BMTagger instead of Bowtie2 to identify contaminant reads
--bypass-trf option to bypass the removal of tandem repeats
--run-fastqc-start run fastqc at the beginning of the workflow
--run-fastqc-end run fastqc at the end of the workflow
--store-temp-output store temp output files [ DEFAULT : temp output files are removed ]
--cat-final-output concatenate all final output files [ DEFAULT : final output is not concatenated ]
--log-level <DEBUG|INFO|WARNING|ERROR|CRITICAL> level of log messages [DEFAULT: DEBUG]
--log LOG log file [ DEFAULT : $OUTPUT_DIR/$SAMPLE_kneaddata.log ]
--trimmomatic TRIMMOMATIC_PATH path to trimmomatic [ DEFAULT : $PATH ]
--max-memory MAX_MEMORY max amount of memory [ DEFAULT : 500m ]
--trimmomatic-options TRIMMOMATIC_OPTIONS options for trimmomatic [ DEFAULT : SLIDINGWINDOW:4:20 MINLEN:50 ]
MINLEN is set to 50 percent of total input read length. The user can alternatively specify a length (in bases) for MINLEN.
--sequencer-source options for sequencer-source [ DEFAULT: NexteraPE] Available sequencers: ["NexteraPE","TruSeq2","TruSeq3"]
--bowtie2 BOWTIE2_PATH path to bowtie2 [ DEFAULT : $PATH ]
--bowtie2-options BOWTIE2_OPTIONS options for bowtie2 [ DEFAULT : --very-sensitive ]
--bmtagger BMTAGGER_PATH path to BMTagger [ DEFAULT : $PATH ]
--bypass-trf bypass the TRF step
--trf TRF_PATH path to TRF [ DEFAULT : $PATH ]
--mismatch MISMATCH mismatching penalty [ DEFAULT : 7 ]
--delta DELTA indel penalty [ DEFAULT : 7 ]
--pm PM match probability [ DEFAULT : 80 ]
--pi PI indel probability [ DEFAULT : 10 ]
--minscore MINSCORE minimum alignment score to report [ DEFAULT : 50 ]
--maxperiod MAXPERIOD maximum period size to report [ DEFAULT : 500 ]
--fastqc FASTQC_PATH path to fastqc [ DEFAULT : $PATH ]


A better help section would explain what the KneadData tool is, what it does, which inputs it requires, and which outputs it produces, perhaps with an explanation of a few important parameters.
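
A possible skeleton for such a help section, offered as a sketch rather than final wording. The descriptive text reuses the tool description and the defaults quoted in the usage output under review; the section headings and the exact list of outputs are assumptions to adapt to what the wrapper actually produces.

```xml
<help><![CDATA[
**What it does**

KneadData performs quality control on metagenomic and metatranscriptomic
sequencing data. It performs adapter trimming, quality filtering, and removal
of host contamination using Bowtie2, Trimmomatic and TRF.

**Inputs**

- Single-end or paired-end reads in FASTQ format
- A reference database of sequences to filter out

**Outputs**

- Cleaned reads in FASTQ format
- A log file

**Important parameters**

- *Trimmomatic options*: default ``SLIDINGWINDOW:4:20 MINLEN:50``
- *Bowtie2 options*: default ``--very-sensitive``
- *Bypass trim*: skips the trimming step
]]></help>
```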

fixed help section
added bmtagger
added all missing arguments
added additional outputs
Updated long description formatting and corrected repository URL.
- Updated tool XML with all arguments
- Updated macros.xml
</macros>
<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir -p results/ ;

Suggested change
mkdir -p results/ ;
mkdir -p results/

<expand macro="requirements" />
<command detect_errors="exit_code"><![CDATA[
mkdir -p results/ ;


Suggested change

$cat_final_output

]]></command>


Suggested change

<inputs>

<conditional name="read_type">
<param name="select_read_type" type="select" label="Read type" help="Specify whether your sequencing data is single-end (one file) or paired-end (two files).">

Suggested change
<param name="select_read_type" type="select" label="Read type" help="Specify whether your sequencing data is single-end (one file) or paired-end (two files).">
<param name="select_read_type" type="select" label="Read type" help="Specify your sequencing data type.">

</param>

<when value="single">
<param name="single_read" type="data" format="fastqsanger" label="Single Read"/>

Can the tool also accommodate fastqsanger.gz?
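
If KneadData turns out to read gzipped FASTQ directly (this still needs to be confirmed against the tool), the change could look roughly like the sketch below. The symlink step is an assumption: Galaxy stages datasets under generic file names, so tools that detect compression from the file extension usually need a link with a matching suffix. The variable `$single_input` and the link names are hypothetical, and the fragment leaves out the actual input flag.

```xml
<param name="single_read" type="data" format="fastqsanger,fastqsanger.gz" label="Single Read"/>

<!-- Fragment of the command block: give the staged file an extension that matches its datatype -->
<command detect_errors="exit_code"><![CDATA[
    #if $read_type.single_read.is_of_type('fastqsanger.gz')
        #set $single_input = 'input.fastq.gz'
    #else
        #set $single_input = 'input.fastq'
    #end if
    ln -s '$read_type.single_read' '$single_input' &&
    mkdir -p results/ &&
    kneaddata
        ## input flag and remaining arguments as in the existing command block, using '$single_input'
        -o results/
]]></command>
```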

<when value="single">
<param name="single_read" type="data" format="fastqsanger" label="Single Read"/>
</when>
<when value="paired">

You can implement a paired collection input instead of two separate paired inputs.
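
A minimal sketch of that approach, assuming the conditional is still named `read_type`. The parameter name `paired_reads` is hypothetical; `-i1` and `-i2` are the flags already used in this PR's command block.

```xml
<when value="paired">
    <!-- One paired collection replaces the separate forward_read and backward_read params -->
    <param name="paired_reads" type="data_collection" collection_type="paired" format="fastqsanger" label="Paired reads"/>
</when>

<!-- Corresponding fragment of the command block -->
<command detect_errors="exit_code"><![CDATA[
    mkdir -p results/ &&
    kneaddata
        -i1 '$read_type.paired_reads.forward'
        -i2 '$read_type.paired_reads.reverse'
        -o results/
]]></command>
```

The tests would then supply the input through a `<collection type="paired">` element with `forward` and `reverse` members rather than two plain `<param>` values.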

<conditional name="reference_genome">
<param name="source" type="select" label="Select reference database source" help="Select a built-in genome index or upload your own FASTA file">
<option value="indexed">Use built-in genome index</option>
<option value="history">Upload custom FASTA file</option>

This is not implemented in the command section. If you like, you can also remove it.

<conditional name="reference_genome">
<param name="source" type="select" label="Select reference database source" help="Select a built-in genome index or upload your own FASTA file">
<option value="indexed">Use built-in genome index</option>
<option value="history">Upload custom FASTA file</option>

Same here

Comment on lines +141 to +144
<param name="bmtagger_db" type="select" label="Select reference genome database"
help="Select the organism genome to filter out. This genome's reads will be removed as host contamination.">
<options from_data_table="bmtagger_indexes">
<filter type="sort_by" column="3" />

Please check this against the data tables used by the bmtagger tool.

Comment on lines +270 to +272
<param name="store_temp" type="boolean" truevalue="--store-temp-output" falsevalue="" label="Keep temporary files"
help="Save temporary/intermediate files generated during processing. This includes alignment files and trimmed intermediate reads. This will
increase the output file size."/>

This can be skipped
