Analysis
The analysis of the output to attain a consensus genome for each sample, as well as find the lineage information of each sample, is done using the ARTIC bioinformatics pipeline and pangolin.
The are 2 methods within the ARTIC pipeline.
- Nanopolish (uses both fastq and fast5 files)
- Medaka (uses only fastq files)
We mostly use medaka for our routine samples. Either method works well enough for contact tracing and lineage analysis.
Nanopolish
Basic pipeline
Gather
Collects the reads which pass quality and length filtering. The --min-length/--max-length
flags depend on if you used the ligation kit or rapid kit.
Single sample
Multiple samples
Medaka
Basic pipeline
Single sample
Multiple samples
folder structure
PKRP008209
├── base
├── medaka
├── nanopolish
└── PKRP008209
└── 1905-8
└── 20210113_0459_1C_PAF26262_d0402aba
├── fastq_fail
└── fastq_pass
Make folders
mkdir base medaka nanopolish
in the base folder
cd base
Set up variable for run prefix (in this example, it is PKRP008209)
RUN=PKRP008209
Gather
Collects the reads which pass quality and length filtering. The --min-length/--max-length
flags depend on if you used the ligation kit or rapid kit.
Ligation
ARTIC 400bp primers
artic gather --min-length 300 --max-length 600 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s
Midnight 1200bp primers
artic gather --min-length 1000 --max-length 1400 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s
Eden/Kirby 2500bp primers
artic gather --min-length 2300 --max-length 2700 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s
Rapid
ARTIC 400bp primers
Probably not a good idea to use the rapid kit on 400bp amplicons, as it will make many fragments shorter then 300bp. The basecallers struggle with sequences smaller than this.
Midnight 1200bp primers
artic gather --min-length 300 --max-length 1400 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s
Eden/Kirby 2500bp primers
artic gather --min-length 300 --max-length 2700 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s
Demultiplexing
Ligation
artic demultiplex --threads 8 ${RUN}_fastq_pass.fastq
Rapid
porechop --verbosity 2 --untrimmed -i "${RUN}_fastq_pass.fastq" -b ./ --rapid_barcodes --discard_middle --barcode_threshold 80 --threads 8 --check_reads 10000 --barcode_diff 5 > ${RUN}_fastq_pass.fastq.demultiplexreport.txt
# it wont name the files properly this way, so rename them with this
for file in BC*.fastq; do mv $file ${RUN}_fastq_pass-${file}; done
consensus with medaka
using a for loop, we can name each folder and prefix each file with the sample information. Using the syntax: for i in "barcode_number<space>sample_ID"<space>"..."
This will then take the fastq for each barcode_number, 01, 02, etc., and name it with the corresponding sample name, for example, if the sample name is R_nCoV_0013, then the command has the static portion R_nCoV_${a[1]}
, and inserts the variable portion with the number, say, 0013. The commands below are for 2 barcoded samples, from barcodes 01 and 02, with their corresponding sample numbers, 0013, 0014. Change this part in the for loop to match the samples you wish to analyse.
for the barcodes, if you used the ligation method, the barcodes are prefixed with NB, where the rapid barcodes are prefixed with BC. Change in the commands accordingly
move to the medaka folder
cd ../medaka
ARTIC 400bp primers
for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_NB${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_NB${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/artic-ncov2019/primer_schemes --read-file ../../base/${RUN}_fastq_pass-NB${a[0]}.fastq nCoV-2019/V3 R_nCoV_${a[1]}; cd ../; done
Midnight 1200bp primers
for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/SARS-CoV-2_GTG/protocols/Midnight/schemes --read-file ../../base/${RUN}_fastq_pass-BC${a[0]}.fastq nCoV-2019/V1 R_nCoV_${a[1]}; cd ../; done
Eden/Kirby 2500bp primers
for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/SARS-CoV-2_GTG/protocols/Kirby/schemes --read-file ../../base/${RUN}_fastq_pass-BC${a[0]}.fastq nCoV-2019/V1 R_nCoV_${a[1]}; cd ../; done
Example of medaka only output structure
PKRP008209/.
├── base
├── medaka
│ ├── kirby_nCoV_1905_PKRP008209_BC09_medaka
│ ├── kirby_nCoV_1906_PKRP008209_BC10_medaka
│ ├── kirby_nCoV_1907_PKRP008209_BC11_medaka
│ └── kirby_nCoV_1908_PKRP008209_BC12_medaka
└── PKRP008209
└── 1905-8
└── 20210113_0459_1C_PAF26262_d0402aba
├── fastq_fail
└── fastq_pass