MotifSeq

Background

MotifSeq, the ctrl+f for signal, identifies raw signal traces that correspond to a given nucleotide sequence, such as an adapter, barcode or motif of interest. MotifSeq takes a query nucleotide sequence as input, converts it to a normalised signal trace using Scrappie, then performs a signal-level local alignment using a dynamic programming algorithm. MotifSeq outputs the location of a matching target in the raw signal with an associated distance value.
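To illustrate the idea of a signal-level local alignment, here is a minimal sketch using subsequence dynamic time warping (DTW). This is an assumption chosen for illustration: the description above only says that a dynamic programming algorithm is used, so this may not match what MotifSeq actually implements. The function names `znormalise` and `local_align` are hypothetical and are not part of MotifSeq.

```python
import math


def znormalise(signal):
    """Scale a signal to zero mean and unit standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on a flat signal
    return [(x - mean) / std for x in signal]


def local_align(query, target):
    """Subsequence DTW: find where `query` fits best inside `target`.

    Returns (start, end, distance); start and end are inclusive
    indices into `target`.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    # cost[i][j]: lowest cost of aligning query[0..i] so that it ends at target[j]
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: target index where that lowest-cost alignment began
    start = [[0] * m for _ in range(n)]

    # The match may begin anywhere in the target at no extra cost.
    for j in range(m):
        cost[0][j] = abs(query[0] - target[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: same target sample, or (if j > 0) the cells to the
            # left and on the diagonal.
            best_cost, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                for c, s in ((cost[i][j - 1], start[i][j - 1]),
                             (cost[i - 1][j - 1], start[i - 1][j - 1])):
                    if c < best_cost:
                        best_cost, best_start = c, s
            cost[i][j] = best_cost + abs(query[i] - target[j])
            start[i][j] = best_start

    # The match may also end anywhere: take the cheapest cell in the last row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return start[n - 1][end], end, cost[n - 1][end]


# Toy example: the query occurs exactly at target positions 3 to 5.
print(local_align([1, 3, 2], [0, 0, 0, 1, 3, 2, 0, 0]))  # → (3, 5, 0)
```

In practice both the query model and the raw signal would be normalised first (for example with `znormalise`) so that their values are comparable, and a lower distance indicates a closer match.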

Image demonstrating barcode identification using MotifSeq

Getting Started

MotifSeq requires a query signal to search for, either extracted from another read or generated from a nucleotide sequence with a tool such as Scrappie. Use the built-in visualisation option (-v) to help with parameter tuning.

Instructions for use

Nanopore adapter identification

Building an adapter model:

scrappie squiggle adapter.fa > adapter.model

Identify stalls in signal using segmenter:

python segmenter.py -s signals.tsv.gz -ku -j 100 > signals_stall_segments.tsv

Identify nanopore adapters in the signal upstream of the stalls found by segmenter:

python MotifSeq.py -s signals.tsv.gz --segs signals_stall_segments.tsv -a adapter.model > signals_adapters.tsv

Find kmer motif:

Building a kmer model:

FASTA format for Scrappie:

>my_kmer_name
ATCGATCGCTATGCTAGCATTACG
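If the query sequence is produced by a script, a small helper along these lines could write it in the format shown above. This is only a sketch: `format_fasta` is a hypothetical name, not part of MotifSeq or Scrappie, and restricting the sequence to A, C, G and T is an assumption.

```python
def format_fasta(name, sequence):
    """Return a single-record FASTA string with the sequence on one line."""
    seq = sequence.strip().upper()
    # Assumption: only unambiguous DNA bases are allowed in the query.
    invalid = sorted(set(seq) - set("ACGT"))
    if invalid:
        raise ValueError("unexpected characters in sequence: " + "".join(invalid))
    return ">" + name + "\n" + seq + "\n"


# Write the example record to my_kmer.fa, the file name used in this section.
with open("my_kmer.fa", "w") as handle:
    handle.write(format_fasta("my_kmer_name", "ATCGATCGCTATGCTAGCATTACG"))
```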

Build the model with Scrappie:

scrappie squiggle my_kmer.fa > scrappie_kmer.model

Find the best match to that kmer in the signal:

python MotifSeq.py -s signals.tsv -m scrappie_kmer.model > signals_kmer.tsv

Full usage

usage: MotifSeq.py [-h] [-f F5F | -p F5_PATH | -s SIGNAL] [-a ADAPT]
                   [-m MODEL] [-x] [--segs SEGS] [-v] [-scale_hi SCALE_HI]
                   [-scale_low SCALE_LOW]

MotifSeq - the Ctrl+f for signal. Signal-level local alignment of sequence
motifs.

optional arguments:
  -h, --help            show this help message and exit
  -f F5F, --f5f F5F     File list of fast5 paths
  -p F5_PATH, --f5_path F5_PATH
                        Fast5 top dir
  -s SIGNAL, --signal SIGNAL
                        Extracted signal file from SquigglePull
  -a ADAPT, --adapt ADAPT
                        Adapter model file - use to find nanopore adapter
  -m MODEL, --model MODEL
                        Query model file - use for kmer searching
  -x, --sig_extract     Extract signal of match
  --segs SEGS           [Optional] segmenter file, used with --adapt
  -v, --view            view each output
  -scale_hi SCALE_HI, --scale_hi SCALE_HI
                        Upper limit for signal outlier scaling
  -scale_low SCALE_LOW, --scale_low SCALE_LOW
                        Lower limit for signal outlier scaling