Custom Primers
The underlying artic pipeline of InterARTIC lets you use any primer scheme designed for a virus, provided a few rules on file formatting and folder structure are followed. Here we describe these rules, how to build the required files, and how to use them with InterARTIC.
Please read the artic primer-scheme.md file for the most up-to-date descriptions of these formats.
The following is tailored towards using InterARTIC with these primer schemes. All credit to the artic team for building such a wonderful set of tools and protocols.
In the examples, we use nCoV-2019 as the virus name. It is best to keep naming consistent across both folders and files: we did not write the artic pipeline, and its tools make some assumptions about naming. If you are using a different virus, name the files in the same manner, for example my-virus.bed and my-virus.scheme.bed.
The following files are required for the underlying artic pipeline:
nCoV-2019.reference.fasta
nCoV-2019.scheme.bed
The folder structure should take the form primer-scheme/virus/version.
Here are a few examples:
primer-schemes
├── artic
│   └── nCoV-2019
│       ├── V1
│       │   ├── nCoV-2019.bed
│       │   ├── nCoV-2019.insert.bed
│       │   ├── nCoV-2019.log
│       │   ├── nCoV-2019.pdf
│       │   ├── nCoV-2019.pickle
│       │   ├── nCoV-2019.primer.bed
│       │   ├── nCoV-2019.reference.fasta
│       │   ├── nCoV-2019.reference.fasta.fai
│       │   ├── nCoV-2019.scheme.bed
│       │   ├── nCoV-2019_SMARTplex.tsv
│       │   ├── nCoV-2019.svg
│       │   └── nCoV-2019.tsv
│       ├── V2
│       │   ├── nCoV-2019.bed
│       │   ├── nCoV-2019.insert.bed
│       │   ├── nCoV-2019.primer.bed
│       │   ├── nCoV-2019.reference.fasta
│       │   ├── nCoV-2019.scheme.bed
│       │   └── nCoV-2019.tsv
│       ├── V3
│       │   ├── nCoV-2019.bed
│       │   ├── nCoV-2019.insert.bed
│       │   ├── nCoV-2019.primer.bed
│       │   ├── nCoV-2019.reference.fasta
│       │   ├── nCoV-2019.reference.fasta.fai
│       │   ├── nCoV-2019.scheme.bed
│       │   └── nCoV-2019.tsv
│       └── V4
│           ├── README
│           ├── SARS-CoV-2.design.fasta
│           ├── SARS-CoV-2.insert.bed
│           ├── SARS-CoV-2.primer.bed
│           ├── SARS-CoV-2.reference.fasta
│           └── SARS-CoV-2.scheme.bed
├── eden
│   └── nCoV-2019
│       └── V1
│           ├── nCoV-2019.reference.fasta
│           ├── nCoV-2019.reference.fasta.fai
│           ├── nCoV-2019.scheme.bed
│           └── nCoV-2019.scheme.bed.old
└── midnight
    └── nCoV-2019
        └── V1
            ├── nCoV-2019.bed
            ├── nCoV-2019.reference.fasta
            ├── nCoV-2019.reference.fasta.fai
            └── nCoV-2019.scheme.bed
As can be seen here, there are more files than just the 2 required files stated above. The extras are initial bed files kept from the translation to the *.scheme.bed files, the *.reference.fasta.fai files, which are index files created automatically by minimap2 when doing read alignment, or other intermediate files.
Note: Because minimap2 needs to write the .fai files to the same directory as the reference file, the directory housing these files must be writable.
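The layout rules above can be checked before starting a run. The following is a minimal sketch, not part of InterARTIC or artic: the function name check_scheme_dir and its prefix parameter are hypothetical, and it assumes the file prefix matches the virus folder name unless told otherwise (the V4 listing above, for example, uses a SARS-CoV-2 prefix inside an nCoV-2019 folder).

```python
import os
from pathlib import Path

# The two files the text above lists as required.
REQUIRED_SUFFIXES = (".reference.fasta", ".scheme.bed")


def check_scheme_dir(scheme_root, virus, version, prefix=None):
    """Return a list of problems; an empty list means the layout looks usable.

    Hypothetical helper. `prefix` is the file-name prefix and is assumed
    to equal the virus folder name when not given.
    """
    prefix = prefix or virus
    version_dir = Path(scheme_root) / virus / version
    if not version_dir.is_dir():
        return [f"missing directory: {version_dir}"]

    problems = []
    for suffix in REQUIRED_SUFFIXES:
        expected = version_dir / f"{prefix}{suffix}"
        if not expected.is_file():
            problems.append(f"missing file: {expected}")

    # The .fai index is written next to the reference, so the
    # directory itself has to be writable.
    if not os.access(version_dir, os.W_OK):
        problems.append(f"directory not writable: {version_dir}")
    return problems
```

For the tree above, check_scheme_dir("primer-schemes/eden", "nCoV-2019", "V1") would be the call for the eden scheme, and prefix="SARS-CoV-2" would be needed for artic V4.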
*.reference.fasta structure
The reference file is exactly that: a fasta file containing the reference genome of the target virus. In this case, it has 60nt per line.
Here are the first 9 lines of the nCoV-2019 (SARS-CoV-2) viral reference genome file.
>MN908947.3
ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT
GTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACT
CACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATC
TTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTT
CGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAAC
ACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGG
AGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGG
CTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAA
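The sequence ID on the header line (MN908947.3 here) is the same name that appears in the first column of the bed files, so it can be useful to read it back from a custom reference. A minimal sketch in plain Python, assuming a standard fasta layout; the function name read_fasta_lengths is hypothetical.

```python
def read_fasta_lengths(path):
    """Map each sequence ID in a fasta file to its total length in nt.

    Hypothetical helper. The ID is taken as the text between '>' and the
    first whitespace on the header line.
    """
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                # Sequence lines may be any width (60nt in the example above).
                lengths[current] += len(line)
    return lengths
```

The keys of the returned dictionary are the names that the reference column of the scheme bed file is expected to use.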
*.scheme.bed structure
The primer scheme bed file should be in the following form
reference start stop primer_name pool
Here are the first 6 lines of the eden ~2500bp primer set
MN908947.3 31 54 nCoV-2019_1_LEFT nCoV-2019_1
MN908947.3 2569 2592 nCoV-2019_1_RIGHT nCoV-2019_1
MN908947.3 1876 1897 nCoV-2019_2_LEFT nCoV-2019_2
MN908947.3 4429 4450 nCoV-2019_2_RIGHT nCoV-2019_2
MN908947.3 4295 4321 nCoV-2019_3_LEFT nCoV-2019_1
MN908947.3 6847 6873 nCoV-2019_3_RIGHT nCoV-2019_1
Here are the first 6 lines of the midnight ~1200bp primer set
MN908947.3 30 54 SARSCoV2120_1_LEFT nCoV-2019_1
MN908947.3 1205 1183 SARSCoV2120_1_RIGHT nCoV-2019_1
MN908947.3 1100 1128 SARSCoV2120_2_LEFT nCoV-2019_2
MN908947.3 2266 2244 SARSCoV2120_2_RIGHT nCoV-2019_2
MN908947.3 2153 2179 SARSCoV2120_3_LEFT nCoV-2019_1
MN908947.3 3257 3235 SARSCoV2120_3_RIGHT nCoV-2019_1
Here are the first 6 lines of the artic V3 ~400bp primer set
MN908947.3 30 54 nCoV-2019_1_LEFT nCoV-2019_1
MN908947.3 385 410 nCoV-2019_1_RIGHT nCoV-2019_1
MN908947.3 320 342 nCoV-2019_2_LEFT nCoV-2019_2
MN908947.3 704 726 nCoV-2019_2_RIGHT nCoV-2019_2
MN908947.3 642 664 nCoV-2019_3_LEFT nCoV-2019_1
MN908947.3 1004 1028 nCoV-2019_3_RIGHT nCoV-2019_1
And here are the first 6 lines of the artic V4 ~400bp primer set, which is a little different
MN908947.3 25 50 SARS-CoV-2_1_LEFT 1 +
MN908947.3 408 431 SARS-CoV-2_1_RIGHT 1 -
MN908947.3 324 344 SARS-CoV-2_2_LEFT 2 +
MN908947.3 705 727 SARS-CoV-2_2_RIGHT 2 -
MN908947.3 644 666 SARS-CoV-2_3_LEFT 1 +
MN908947.3 1017 1044 SARS-CoV-2_3_RIGHT 1 -
The artic nomenclature has changed in V4: the primer names use SARS-CoV-2, the pool field is a plain number, and an extra field has been added for direction (+ or -).
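Both layouts shown above (5 fields, and 6 fields with a direction) can be read with the same small parser. This is an illustrative sketch only, not how artic itself reads the file: the Primer type and parse_scheme_line function are hypothetical names, and fields are assumed to be separated by whitespace with no spaces inside primer names.

```python
from typing import NamedTuple, Optional


class Primer(NamedTuple):
    """One line of a *.scheme.bed file (hypothetical container)."""
    reference: str
    start: int
    stop: int
    name: str
    pool: str
    strand: Optional[str]  # '+' or '-' in the 6-field layout, else None


def parse_scheme_line(line):
    """Parse one scheme.bed line in either the 5- or 6-field layout."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")
    reference, start, stop, name, pool = fields[:5]
    strand = fields[5] if len(fields) == 6 else None
    return Primer(reference, int(start), int(stop), name, pool, strand)
```

For example, parsing the first artic V4 line above gives a Primer with pool "1" and strand "+", while the first artic V3 line gives pool "nCoV-2019_1" and strand None.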
Why the structure and naming matter
When the primer scheme is used within the artic minion command, it takes the following form (... stands for other arguments):
artic minion ... --scheme-directory ~/primer-schemes/artic nCoV-2019/V1 sample-name
Here nCoV-2019/V1 and sample-name are positional arguments.
As you can see, --scheme-directory is the top directory, and the positional argument nCoV-2019/V1 sets the virus and version as directory names. This is why the scheme/virus/version structure shown above must be followed.
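To make the split concrete, the sketch below takes the full path to a version folder and separates it into the two pieces the command expects. It is only an illustration of the directory convention described here; split_scheme_path is a hypothetical name, and the path is treated as a POSIX-style string without touching the filesystem.

```python
from pathlib import PurePosixPath


def split_scheme_path(version_dir):
    """Split '<scheme-directory>/<virus>/<version>' into its two parts.

    Hypothetical helper. Returns (scheme_directory, 'virus/version').
    """
    path = PurePosixPath(version_dir)
    if len(path.parts) < 3:
        raise ValueError(f"expected scheme/virus/version, got: {version_dir!r}")
    scheme_directory = path.parent.parent          # e.g. ~/primer-schemes/artic
    scheme_name = f"{path.parent.name}/{path.name}"  # e.g. nCoV-2019/V1
    return str(scheme_directory), scheme_name
```

Called with "~/primer-schemes/artic/nCoV-2019/V1", this returns the value for --scheme-directory and the nCoV-2019/V1 positional argument used in the command above.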
Optional *.genes.bed file
The optional *.genes.bed file is used for the QC table (and in the future, CoVar Plots). It allows extra information to be given about coverage of the various regions within a genome, as well as the variants they contain. If the file is not provided, these extra fields are left out.
Here is an example, nCoV-2019.genes.bed. The fields are:
genome_name start stop name
MN908947.3 265 21555 ORF1ab
MN908947.3 21562 25384 S
MN908947.3 25392 26220 ORF3a
MN908947.3 26244 26472 ORF4
MN908947.3 26522 27191 M
MN908947.3 27201 27387 ORF6
MN908947.3 27393 27759 ORF7a
MN908947.3 27755 27887 ORF7b
MN908947.3 27893 28259 ORF8
MN908947.3 28273 29533 ORF9
MN908947.3 29557 29674 ORF10
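As an illustration of how such a file relates positions to regions, the sketch below parses genes.bed text and looks up which genes cover a given position. This is not InterARTIC's own QC code: parse_genes_bed and genes_at are hypothetical names, and the intervals are assumed to be 0-based and half-open (start included, stop excluded), as is conventional for bed files.

```python
def parse_genes_bed(text):
    """Parse genes.bed text into (reference, start, stop, name) tuples.

    Hypothetical helper. Lines with fewer than 4 fields, or with
    non-numeric coordinates (such as a header), are skipped.
    """
    genes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            start, stop = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        genes.append((fields[0], start, stop, fields[3]))
    return genes


def genes_at(genes, reference, position):
    """Names of all genes whose interval covers `position`.

    Assumes half-open intervals: start <= position < stop.
    """
    return [
        name
        for ref, start, stop, name in genes
        if ref == reference and start <= position < stop
    ]
```

Because regions may overlap (ORF7a and ORF7b in the example above share a few bases), genes_at returns a list rather than a single name.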
Using custom primer scheme in InterARTIC
To use a custom primer scheme in InterARTIC (i.e., one not included in the defaults provided), first select Custom in the virus selection panel.
Then, in the "Please enter your custom primer here:" field, enter the name of your primer scheme (it can be anything). This name is used when naming the output folder.
Then enter the --scheme-directory path into the "Primer scheme directory:" field.
Then enter the virus/version info into the "Name of primer scheme:" field.
InterARTIC will take care of the rest.