Usage

Configuring interARTIC

You only need to configure InterARTIC the first time you use it. The configuration details are saved for all future runs.

  • On the interARTIC web interface, click on Set locations of input data.

  • Fill the first two fields:

1. Location of your input data. This should be the absolute path of a directory. Usually, this is simply the location where MinKNOW writes data from sequencing runs on your machine. On a GridION/PromethION the default location is the /data folder. If you use a MinION on a laptop, the current default location is /var/lib/minknow/data/ (although this may change in future versions of MinKNOW). Alternatively, you can provide a path to a custom directory where you have chosen to copy your sequencing data.

2. Location of your sample-barcode .csv files. This should be the absolute path of a directory. InterARTIC expects you to put the sample-barcode .csv manifest files directly inside this directory (not inside any subdirectories).

  • Click confirm to save the settings.
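As a quick sanity check before you confirm, the short Python sketch below verifies that both locations are absolute paths to existing directories, and that the .csv manifests sit directly inside the second one. This is only an illustration, not part of interARTIC; both paths are hypothetical placeholders.

from pathlib import Path

input_data = Path("/data")                  # location of your input data
manifests = Path("/data/sample-barcodes")   # hypothetical manifest directory

for d in (input_data, manifests):
    assert d.is_absolute() and d.is_dir(), f"{d} is not an existing absolute directory"

# Only .csv files directly inside the directory count (not in subdirectories).
csvs = sorted(manifests.glob("*.csv"))
print(f"Found {len(csvs)} manifest file(s):", ", ".join(c.name for c in csvs))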

Structure of input data

For interARTIC to work smoothly, both the directory structure of the nanopore sequencing data and the format of the sample-barcode .csv manifest files are important. InterARTIC assumes that the sequencing data is organised in the default structure created by MinKNOW with live basecalling enabled; if this is how you ran your experiment, there is no need to rename or restructure the data generated by MinKNOW.

Assuming the input data directory mentioned above is set to /data (default for GridION / PromethION) and two nanopore sequencing experiments have been run, the directory structure should look like this:

/data/
    experiment_group_1/
        sample_id_1/
            uuid_1/
                # nanopore sequencing data here
    experiment_group_2/
        sample_id_2/
            uuid_2/
                # nanopore sequencing data here

Note that this directory structure, which includes the experiment_group, sample_id and uuid levels (the uuid currently follows the format {start_time}_{device_ID}_{flow_cell_id}_{short_protocol_run_id}), is produced by MinKNOW.

Inside the uuid directory, the .fastq files, .fast5 files and the sequencing summary file should be available as in the example below:

uuid/
    fast5_pass/
        A_10.fast5
        A_11.fast5
    fast5_fail/
        A_0.fast5
        A_1.fast5
    fastq_pass/
        B_0.fastq
        B_1.fastq
    fastq_fail/
        B_10.fastq
        B_11.fastq
    sequencing_summary.txt
    ...

If you had live demultiplexing enabled during the run, the following folder structure is also valid, but be sure to select the Already demultiplexed with guppy? option.

uuid/
    fast5_pass/
        barcode01/
            A_0.fast5
            A_1.fast5
        barcode02/
            A_0.fast5
            A_1.fast5
        ...
    fast5_fail/
        barcode01/
            A_0.fast5
            A_1.fast5
        barcode02/
            A_0.fast5
            A_1.fast5
        ...
    fastq_pass/
        barcode01/
            A_0.fastq
            A_1.fastq
        barcode02/
            A_0.fastq
            A_1.fastq
        ...
    fastq_fail/
        barcode01/
            A_0.fastq
            A_1.fastq
        barcode02/
            A_0.fastq
            A_1.fastq
        ...
    sequencing_summary.txt
    ...

If you demultiplexed with guppy_barcoder instead, the structure is the same as above but without the fastq/fast5_pass/fail directories; the barcode## directories sit directly under the uuid directory, as in the sketch below. Again, ensure that Already demultiplexed with guppy? is selected for this structure.
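As an illustration (file names are hypothetical, and only .fastq files are shown, since guppy_barcoder demultiplexes basecalled reads):

uuid/
    barcode01/
        A_0.fastq
        A_1.fastq
    barcode02/
        A_0.fastq
        A_1.fastq
    ...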

Note that this is how MinKNOW currently produces data when live basecalling is enabled. If your data is not named in this format, please rename the corresponding directories/files to fast5_pass, fastq_pass and sequencing_summary.txt.

If you wish to analyse custom data that is not organised in the above structure, you will need to create a directory structure that adheres to it (giving alphanumeric names of your choice for experiment_group, sample_id and uuid) and rename the files accordingly. For the medaka pipeline, only the .fastq files are required: simply place one or more .fastq files under a subdirectory named fastq_pass. For the nanopolish pipeline, you must place the .fastq file(s) in fastq_pass, the corresponding .fast5 file(s) under fast5_pass, and the sequencing summary file under the name sequencing_summary.txt. The fast5_fail and fastq_fail directories are not required by either pipeline. A sketch of how this might be scripted is shown below.
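The following Python sketch illustrates one way to arrange custom reads for the medaka pipeline. It is only an example, not part of interARTIC: the names my_experiment, my_sample and run_1, and the source path /path/to/my/reads, are hypothetical placeholders.

import shutil
from pathlib import Path

data_root = Path("/data")  # the input data location set during configuration
fastq_pass = data_root / "my_experiment" / "my_sample" / "run_1" / "fastq_pass"
fastq_pass.mkdir(parents=True, exist_ok=True)  # experiment_group/sample_id/uuid/fastq_pass

# Copy the custom reads in; the medaka pipeline needs only these .fastq files.
for fq in Path("/path/to/my/reads").glob("*.fastq"):  # hypothetical source directory
    shutil.copy(fq, fastq_pass / fq.name)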

Sample-barcode manifest file

If your experiment contains multiple samples that were barcoded and run on a single flow cell, InterARTIC requires a sample-barcode manifest file in .csv format that links sample names to barcode numbers. This is the only necessary input file that is not automatically generated by MinKNOW; it must be created manually by the user and placed in the sample-barcode directory specified during the configuration step above.

Example manifest .csv files for a dataset with four multiplexed samples are shown below:

For Native ligation barcodes (NBXX):

sample-group-1.csv

sample1,NB03
sample2,NB04
sample3,NB05
sample4,NB06

For Rapid barcodes (RBXX):

sample-group-1.csv

sample1,RB03
sample2,RB04
sample3,RB05
sample4,RB06

The use of the NB/RB prefixes is historical, for easy compatibility with other software. If you prefer to generate the manifest programmatically, see the sketch below.
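This Python sketch writes the native-barcode example above with the standard csv module. It is only an illustration, not part of interARTIC; the sample names and barcodes are taken from the example.

import csv

samples = {"sample1": "NB03", "sample2": "NB04",
           "sample3": "NB05", "sample4": "NB06"}

# Matching the example above: each row is just sample_name,barcode with no header row.
with open("sample-group-1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for name, barcode in samples.items():
        writer.writerow([name, barcode])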

Adding a job

To begin the process of adding a job, click the 'Add Job' button located underneath the Jobs Queue on the home page. This will direct you to the parameter selection page where you can set up your job.

Input the necessary parameters (see Parameter Descriptions below). Required parameters are denoted with an asterisk (*).

Parameter Descriptions

You can customise the parameters by filling in the respective text boxes, radio buttons or check boxes. For text boxes, only UTF-8 alphanumeric characters and underscores are allowed, except that forward slashes (/) are also allowed in fields representing paths. No space characters are allowed.
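As an illustration of these character rules (this is a sketch, not interARTIC's actual validation code), the following Python snippet accepts ordinary fields and path fields:

import re

FIELD_RE = re.compile(r"^\w+$")     # alphanumeric characters and underscore; no spaces
PATH_RE = re.compile(r"^[\w/]+$")   # path fields may additionally contain forward slashes

print(bool(FIELD_RE.match("my_job_01")))    # True
print(bool(FIELD_RE.match("my job")))       # False: spaces are rejected
print(bool(PATH_RE.match("/data/run_1")))   # True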

Basic Parameters

  • Job name: A unique name for your job, so you may identify your output files with it.
  • Input directory: Your nanopore experiment directory.
    • When you click on the text box (need to double click on certain browsers), a list will appear that lists the contents inside the /data directory you set during the configuration.
    • You can select the experiment directory from this list.
  • Single or Multiple samples: Select the appropriate option for your experiment.
  • Sample-barcode metadata file: This is applicable only if you selected multiple samples in the previous option.
    • When you click on the field (double click on certain browsers), a list will appear showing the contents of the sample-barcode directory you set during the configuration.
    • You can select the corresponding .csv file for your experiment.
  • Output folder: This is an optional field.

    • If left empty, an output directory called output will be created inside the experiment_group directory.
    • If you provide a name, a directory under that name will be created inside the experiment_group directory.
    • You can also provide an absolute path for a custom location (for instance, if you do not have write access to the experiment_group directory). If you do this, ensure that the parent directory exists.
      • For example, if you are inputting this directory path /path/to/file/hello/world as the output directory path, the folder “hello” must already exist for the “world” folder to be created.
  • Override existing data: Select this if your output directory already contains files that must be overwritten.
    • WARNING: all files inside the output directory will be deleted. Please be careful.
  • Virus: Select one of the pre-set viruses bundled with interARTIC (SARS-CoV-2 and Ebola at the moment), or custom for analysing your own virus or custom scheme.
  • Primer scheme: If a pre-set virus was selected in the previous step, the pre-set primer schemes bundled with interARTIC for that virus will appear, and you can select the one you used for your nanopore sequencing run. (Please contact us on GitHub to have your favourite virus and schemes added.)
  • Primer scheme top directory: This is auto-filled if you selected a pre-set virus and a primer scheme. If you are analysing a custom virus or using a custom primer scheme, enter the path to the directory where your custom primer schemes are located.
  • Name of primer scheme: This is auto-filled if you selected a pre-set virus and a primer scheme. Otherwise, give a primer scheme name that adheres to the format virus_name/version (e.g., nCoV-2019/V1).

    • Inside the directory at primer_scheme_directory/primer_scheme_name, the corresponding reference (.fasta) and primer (.bed) files must exist; see the example layout after this list and the examples at https://github.com/Psy-Fer/interARTIC/tree/master/primer-schemes/artic
  • Demultiplexing: Select the Already demultiplexed with guppy? option to skip the demultiplexing step.

  • Library preparation method: Enter what you used for sample preparation.
    • This is only used for folder-naming purposes.
  • Pipeline: Select the pipeline within ARTIC that you wish to run your data files through (nanopolish or medaka currently).
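As an illustration, a custom scheme directory for a scheme named nCoV-2019/V1 might be laid out as follows. The file names here follow the ARTIC primer-schemes convention and are only an assumed example; check the repository linked above for the exact layout expected.

/path/to/primer-schemes/    (the Primer scheme top directory; path is a placeholder)
    nCoV-2019/
        V1/
            nCoV-2019.reference.fasta
            nCoV-2019.scheme.bed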

Advanced Parameters

  • Minimum length: This is auto-filled if you selected a pre-set virus and a primer scheme. Otherwise, set this to the minimum read length that should be used for the analysis, based on your primer lengths.
  • Maximum length: This is auto-filled if you selected a pre-set virus and a primer scheme. Otherwise, set this to the maximum read length that should be used for the analysis, based on your primer lengths.
  • Thread usage: Change the pre-filled values if you wish, based on the number of cores in your system you want to utilise. Note that not all tools inside the ARTIC pipeline are multi-threaded or thread-efficient.
  • Normalise depth: This is auto-filled if you selected a pre-set virus and a primer scheme. Otherwise, select the read depth (coverage) to which your data should be normalised.
  • Step start (DEBUG): This is intended for developer debugging. If a run was already completed by interARTIC but interARTIC was restarted (losing the link to the output page), you can add the job again with exactly the same parameters and choose which step to start processing from. If the run has not been demultiplexed, select 3 to start from the last step, making the plots (the safest step), and quickly reach the output screens. If it was already demultiplexed, start from step 2. If none of this makes sense, please ask the developers for help and leave this field blank.

After filling in the parameters carefully, click the “Submit Job(s)” button; you will then be redirected to the progress page.

Progress Page

The progress page displays the stream of standard error output being produced by your job run. Here you can see which commands are currently running and any errors that occur. Each job run has its own progress page which can be accessed via the home page or parameters page by clicking on the job name in the jobs queue.

For each job, the progress page will display:

  • The job name
  • The job's place in queue
  • An 'Abort Job' button
  • The overall progress of the job in the form of a progress bar and the number of steps remaining in the pipeline
  • A 'View Job Parameters here' button
  • The current standard output obtained from the job

The 'View Job Parameters here' button, when clicked, will display the job's basic parameters that have been entered by the user.

The 'Abort Job' button can be used to terminate the job. A confirmation window will appear when you click on the abort button. If you continue, you will then be asked to confirm whether you wish to delete the files created by the job. After this, you will then be directed back to the home page.

What happens if an error occurs during the run?

If an error occurs during a run, a red notification will appear. You can either let the job continue to run, or click the ‘Abort’ button. Harmless errors sometimes occur in the ARTIC pipeline (such as longshot failing), so it may be worth waiting for the run to finish and then assessing your output. If you have a negative control, it will fail, though in most cases it won't make it past the QC/demultiplexing steps; the rest of the samples should be analysed just fine and displayed on the output page.

A confirmation window will appear when you click on the ‘Abort’ button asking you to confirm that you wish to abort the current job and whether to delete the files created by the job.

What happens when a job is completed?

When a job is completed, a ‘Go to Output’ button will appear at the top of the page. Click the button to be redirected to the output page. This may take 10 or more seconds depending on how many samples were sequenced. The job will also be moved to the 'Completed Jobs' list on the home page where you can click on the job name and be redirected to the progress page for that job.

Output Page

The Output Page provides data visualisation to enable a fast quality check of the sample and to see the called variants. Fasta files and QC metrics can be downloaded with the corresponding buttons. At the bottom of the page, there is a 'Go to Progress' button that will redirect you to the progress page of the job.

Data visualisation

This section enables you to preview coverage depth profiles and the locations of detected variants across the viral genome for each sample, based on the outputs of the completed pipeline. To download a data plot, right-click on the image and select 'Save image as...'. If that does not work, right-click the image and select 'Open image in new tab', then try 'Save image as...' again. Otherwise, find the .png file located in the corresponding sample output folder. If no <sample_name>.pass.vcf.gz files are found in the output folder, the message “Vcf graph could not be made: No pass.vcf.gz file/s found in the output folder.” will be displayed; as no files of the suitable format were found, these outputs cannot be produced. This may be due to errors or problems during the pipeline, so checking the error messages in the progress page's standard error output section is important.

Variants Found

Beneath each of the graphs, a summarised version of the <sample_name>.pass.vcf.gz file is displayed in the form of a table. Each row corresponds to a different variant found, ordered by position in the viral genome.
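Since VCF is a plain-text format, you can also inspect a pass.vcf.gz file yourself. The Python sketch below is only an illustration, not part of interARTIC; sample1.pass.vcf.gz is a placeholder for a file in your output directory.

import gzip

with gzip.open("sample1.pass.vcf.gz", "rt") as vcf:  # placeholder file name
    for line in vcf:
        if line.startswith("#"):  # skip header and column-name lines
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        print(f"{chrom}\t{pos}\t{ref}->{alt}")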

Files Produced

To manually inspect the produced .vcf files, consensus sequence or any intermediate data, open a file browser and navigate to the output directory. By default, this is the directory called output inside the experiment directory, or otherwise whatever you set during the parameter set-up step above.