Analysis

The analysis of the output to attain a consensus genome for each sample, as well as find the lineage information of each sample, is done using the ARTIC bioinformatics pipeline and pangolin.

The are 2 methods within the ARTIC pipeline.

  1. Nanopolish (uses both fastq and fast5 files)
  2. Medaka (uses only fastq files)

We mostly use medaka for our routine samples. Either method works well enough for contact tracing and lineage analysis.

Nanopolish

Basic pipeline

Gather

Collects the reads which pass quality and length filtering. The --min-length/--max-length flags depend on if you used the ligation kit or rapid kit.

Single sample

Multiple samples

Medaka

Basic pipeline

Single sample

Multiple samples

folder structure

PKRP008209
├── base
├── medaka
├── nanopolish
└── PKRP008209
    └── 1905-8
        └── 20210113_0459_1C_PAF26262_d0402aba
            ├── fastq_fail
            └── fastq_pass

Make folders

mkdir base medaka nanopolish

in the base folder

cd base

Set up variable for run prefix (in this example, it is PKRP008209)

RUN=PKRP008209

Gather

Collects the reads which pass quality and length filtering. The --min-length/--max-length flags depend on if you used the ligation kit or rapid kit.

Ligation

ARTIC 400bp primers

artic gather --min-length 300 --max-length 600 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s

Midnight 1200bp primers

artic gather --min-length 1000 --max-length 1400 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s

Eden/Kirby 2500bp primers

artic gather --min-length 2300 --max-length 2700 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s

Rapid

ARTIC 400bp primers

Probably not a good idea to use the rapid kit on 400bp amplicons, as it will make many fragments shorter then 300bp. The basecallers struggle with sequences smaller than this.

Midnight 1200bp primers

artic gather --min-length 300 --max-length 1400 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s

Eden/Kirby 2500bp primers

artic gather --min-length 300 --max-length 2700 --prefix ${RUN} --directory ../${RUN}/*/ --no-fast5s

Demultiplexing

Ligation

artic demultiplex --threads 8 ${RUN}_fastq_pass.fastq

Rapid

porechop --verbosity 2 --untrimmed -i "${RUN}_fastq_pass.fastq" -b ./ --rapid_barcodes --discard_middle --barcode_threshold 80 --threads 8 --check_reads 10000 --barcode_diff 5 > ${RUN}_fastq_pass.fastq.demultiplexreport.txt

# it wont name the files properly this way, so rename them with this

for file in BC*.fastq; do mv $file ${RUN}_fastq_pass-${file}; done

consensus with medaka

using a for loop, we can name each folder and prefix each file with the sample information. Using the syntax: for i in "barcode_number<space>sample_ID"<space>"..."

This will then take the fastq for each barcode_number, 01, 02, etc., and name it with the corresponding sample name, for example, if the sample name is R_nCoV_0013, then the command has the static portion R_nCoV_${a[1]}, and inserts the variable portion with the number, say, 0013. The commands below are for 2 barcoded samples, from barcodes 01 and 02, with their corresponding sample numbers, 0013, 0014. Change this part in the for loop to match the samples you wish to analyse.

for the barcodes, if you used the ligation method, the barcodes are prefixed with NB, where the rapid barcodes are prefixed with BC. Change in the commands accordingly

move to the medaka folder

cd ../medaka

ARTIC 400bp primers

for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_NB${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_NB${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/artic-ncov2019/primer_schemes --read-file ../../base/${RUN}_fastq_pass-NB${a[0]}.fastq nCoV-2019/V3 R_nCoV_${a[1]}; cd ../; done

Midnight 1200bp primers

for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/SARS-CoV-2_GTG/protocols/Midnight/schemes --read-file ../../base/${RUN}_fastq_pass-BC${a[0]}.fastq nCoV-2019/V1 R_nCoV_${a[1]}; cd ../; done

Eden/Kirby 2500bp primers

for i in "01 0013" "02 0014"; do a=( $i ); mkdir midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; cd midnight_R_nCoV_${a[1]}_${RUN}_BC${a[0]}_medaka; artic minion --minimap2 --medaka --normalise 200 --threads 8 --scheme-directory /data1/software/SARS-CoV-2_GTG/protocols/Kirby/schemes --read-file ../../base/${RUN}_fastq_pass-BC${a[0]}.fastq nCoV-2019/V1 R_nCoV_${a[1]}; cd ../; done

Example of medaka only output structure

PKRP008209/.
├── base
├── medaka
│   ├── kirby_nCoV_1905_PKRP008209_BC09_medaka
│   ├── kirby_nCoV_1906_PKRP008209_BC10_medaka
│   ├── kirby_nCoV_1907_PKRP008209_BC11_medaka
│   └── kirby_nCoV_1908_PKRP008209_BC12_medaka
└── PKRP008209
└── 1905-8
    └── 20210113_0459_1C_PAF26262_d0402aba
        ├── fastq_fail
        └── fastq_pass