Installation

Requirements

Following a self-imposed guideline, most tools written here to handle nanopore data, or bioinformatics data in general, use as few third-party libraries as possible, aiming either for core libraries only or for all required files to be included in the package.

In the case of fast5_fetcher.py and batch_tater.py, only core Python libraries are used, so as long as Python 2.7+ is present, everything should work with no extra steps.

There is one catch: everything is written primarily for use with Linux. Because macOS is Unix-based, there should be minimal issues running it there, so long as the GNU tools are installed (see below). Windows, however, may require more massaging.

SquiggleKit tools are deliberately not shipped as executables, so that they can be used with varying Python environments on various operating systems. To make them executable, add a shebang (#!) line, such as #!/usr/bin/env python2.7, as the first line of each file, make sure each file has execute permission (for example with chmod +x), then add the SquiggleKit directory to the PATH variable in ~/.bashrc:

export PATH="$HOME/path/to/SquiggleKit:$PATH"
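The shebang step can also be scripted. The following is a minimal sketch, not part of SquiggleKit: the helper name make_executable and its default shebang are assumptions for illustration, and it is written for Python 3 on a Unix-like system. It prepends the shebang only when the file does not already start with one, then sets the execute bits.

```python
import os
import stat


def make_executable(path, shebang="#!/usr/bin/env python2.7"):
    """Prepend a shebang line (if missing) and mark the file executable.

    Hypothetical helper for illustration; not part of SquiggleKit.
    """
    with open(path, "r") as handle:
        content = handle.read()

    # Only add the shebang when the script does not already start with one.
    if not content.startswith("#!"):
        with open(path, "w") as handle:
            handle.write(shebang + "\n" + content)

    # Add execute permission for user, group and others (similar to chmod a+x).
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, make_executable("fast5_fetcher.py") could be run from inside the SquiggleKit directory. The PATH entry in ~/.bashrc still has to be added by hand.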

Install

git clone https://github.com/Psy-Fer/SquiggleKit.git
pip install numpy h5py sklearn matplotlib

Quick start

fast5_fetcher

If using macOS without Homebrew, install Homebrew first:

homebrew installation instructions

then install gnu-tar with:

brew install gnu-tar

Basic use on a local computer

fastq

python fast5_fetcher.py -q my.fastq.gz -s sequencing_summary.txt.gz -i name.index.gz -o ./fast5

paf

python fast5_fetcher.py -p my.paf -s sequencing_summary.txt.gz -i name.index.gz -o ./fast5

flat

python fast5_fetcher.py -f my_flat.txt.gz -s sequencing_summary.txt.gz -i name.index.gz -o ./fast5

sequencing_summary.txt only

python fast5_fetcher.py -s sequencing_summary.txt.gz -i name.index.gz -o ./fast5
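When the same extraction is repeated for many inputs, the four invocations above can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_fetcher_command is a hypothetical helper, not part of SquiggleKit, and it uses only the flags shown above (-q, -p, -f, -s, -i, -o). It returns an argument list that could be passed to subprocess.run.

```python
# Map each input mode shown above to its command-line flag.
MODE_FLAGS = {"fastq": "-q", "paf": "-p", "flat": "-f"}


def build_fetcher_command(summary, index, out_dir, mode=None, reads=None,
                          script="fast5_fetcher.py"):
    """Return the argument list for one fast5_fetcher.py run.

    Hypothetical helper for illustration. With mode=None only the
    sequencing summary is used, as in the last example above.
    """
    cmd = ["python", script]
    if mode is not None:
        if mode not in MODE_FLAGS:
            raise ValueError("unknown mode: %s" % mode)
        if reads is None:
            raise ValueError("mode %r needs an input file" % mode)
        # Input selector first, matching the order used in the examples.
        cmd += [MODE_FLAGS[mode], reads]
    cmd += ["-s", summary, "-i", index, "-o", out_dir]
    return cmd
```

Calling build_fetcher_command("sequencing_summary.txt.gz", "name.index.gz", "./fast5", mode="paf", reads="my.paf") reproduces the paf example as a list of arguments. Passing a list rather than a single string avoids shell quoting problems with unusual file names.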

SquigglePull

All raw data:

python SquigglePull.py -rv -p ~/data/test/reads/1/ -f all > data.tsv

Positional event data:

python SquigglePull.py -ev -p ./test/ -t 50,150 -f pos1 > data.tsv

SquigglePlot

Individual File full signal

python SquigglePlot.py -i ~/data/test.fast5

Plot all from top folder in green

python SquigglePlot.py -p ~/data/ --plot_colour -g

Plot the first 2000 data points of each read from a signal file and save as a PDF at 300 dpi

python SquigglePlot.py -s signals.tsv.gz --plot_colour teal -n 2000 --dpi 300 --no_show --save test.pdf --save_path ./test/plots/

segmenter

Stall identification

python segmenter.py -s signals.tsv.gz -ku -j 100 > signals_stall_segments.tsv

MotifSeq

(see full requirements for MLPY installation instructions)

Nanopore adapter identification

python MotifSeq.py -s signals.tsv.gz --segs signals_stall_segments.tsv -a adapter.model -t 120 -d 120 > signals_adapters.tsv

Full requirements

fast5_fetcher.py:

  • core python libraries

SquigglePull.py:

  • numpy
  • h5py
  • sklearn

pip install numpy h5py sklearn

SquigglePlot.py:

  • numpy
  • matplotlib
  • h5py

pip install numpy h5py matplotlib

segmenter.py:

  • numpy
  • matplotlib
  • h5py
  • sklearn

pip install numpy h5py sklearn matplotlib

MotifSeq.py:

  • numpy
  • h5py
  • sklearn
  • matplotlib
  • mlpy 3.5.0 (don't use pip for this)

pip install numpy h5py sklearn matplotlib
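To see which of the libraries listed above are still missing from the current environment, a quick check can be scripted. This is a minimal sketch, not part of SquiggleKit: the REQUIREMENTS table is copied from the lists above, missing_modules is a hypothetical helper, and the code assumes Python 3 (importlib.util.find_spec is not available in Python 2.7).

```python
import importlib.util

# Third-party modules per tool, copied from the lists above.
# mlpy is listed for MotifSeq.py but should not be installed with pip.
REQUIREMENTS = {
    "fast5_fetcher.py": [],
    "SquigglePull.py": ["numpy", "h5py", "sklearn"],
    "SquigglePlot.py": ["numpy", "matplotlib", "h5py"],
    "segmenter.py": ["numpy", "matplotlib", "h5py", "sklearn"],
    "MotifSeq.py": ["numpy", "h5py", "sklearn", "matplotlib", "mlpy"],
}


def missing_modules(tool, find_spec=importlib.util.find_spec):
    """Return the modules listed for `tool` that cannot be located."""
    return [name for name in REQUIREMENTS[tool] if find_spec(name) is None]


if __name__ == "__main__":
    for tool in REQUIREMENTS:
        missing = missing_modules(tool)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print("%-20s %s" % (tool, status))
```

The find_spec parameter exists only so the lookup can be replaced in tests. In normal use, missing_modules("MotifSeq.py") returns the names that still need installing.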

Installing mlpy: