It’s time for the monthly Nextflow release for March: edge version 19.03. This is another great release, with some cool new features, bug fixes and improvements.
This release introduces the long-awaited Sequence Read Archive (SRA) channel factory. The SRA is a key public repository for sequencing data. It is run jointly by the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ).
This feature dates all the way back to 2015 and was worked on during a 2018 Nextflow hackathon. It was brought to the fore again thanks to the release of Phil Ewels’ excellent SRA Explorer. The SRA channel factory lets users pull read data in FASTQ format directly from SRA by referencing a study, an accession ID or even a keyword. It works in a similar way to fromFilePairs, returning a sample ID and its files (a single file or a pair of files) for each sample.
The code snippet below creates a channel containing 24 samples from a chromatin dynamics study and runs FastQC on the resulting files.
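// Emit a sample ID and its FASTQ file(s) for each sample in study SRP043510
Channel
    .fromSRA('SRP043510')
    .set { reads }

// Run FastQC on each sample and collect one log directory per sample
process fastqc {
    input:
    set sample_id, file(reads_file) from reads

    output:
    file("fastqc_${sample_id}_logs") into fastqc_ch

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads_file}
    """
}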
See the documentation for more details. When combined with downstream processes, you can quickly open a firehose of data into your workflow!
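Before wiring the factory into processes, it can help to print what the channel actually emits. The minimal sketch below uses the view operator for this and puts fromSRA next to fromFilePairs, so you can compare the two (sample ID, files) shapes. It assumes network access to NCBI for the SRA query. The local pattern data/*_{1,2}.fastq.gz is a hypothetical path used only for illustration. If no files match it, that channel is simply empty.

```nextflow
// Sketch: inspect the items emitted by fromSRA and fromFilePairs.
// Assumes network access to NCBI SRA for the first channel.

// One item per sample: the sample ID followed by its FASTQ file(s)
Channel
    .fromSRA('SRP043510')
    .view()

// For comparison: the same (sample ID, files) shape built from local files.
// 'data/*_{1,2}.fastq.gz' is a hypothetical pattern; adjust it to your own data.
Channel
    .fromFilePairs('data/*_{1,2}.fastq.gz')
    .view()
```

Because both factories emit the same kind of tuple, a process that consumes one can usually be pointed at the other by swapping only the channel definition.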
Note that this is a monthly edge release. To use it, simply execute the following command before running Nextflow:
export NXF_VER=19.03.0-edge
Please don’t hesitate to use our very active Gitter channel or create a thread in the Google discussion group.
Experiencing issues introduced by this release? Please report them in our issue tracker. Make sure to fill in the fields of the issue template.
Special thanks to the contributors of this release:
None known.