SquigglePull

Background

Up until early 2019, Oxford Nanopore sequencing devices stored the raw current measurements and associated metadata for each read in a single Hierarchical Data Format (HDF5) file, called a fast5 file. From early 2019, devices were updated to produce multi-fast5 files, which contain multiple reads, usually 4000, in a single file.

Processing fast5 files has been troublesome for a number of reasons:

  • The format is unfamiliar to many people.
  • A 3rd party library is needed to open and read the files (h5py, pytables).
  • The libraries are not thread safe and lock files.
  • Single-read files don't compress well (multi-fast5 files compress better).

SquigglePull outputs a single tab separated value (.tsv) file where each row contains the signal for a single read along with selected metadata.

The current format is designed as an example of a more accessible file format for those wishing to get started with nanopore signal data. Columns and data inclusion are subject to change depending on use case.
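As a rough illustration of why a row-per-read .tsv is easy to work with, the sketch below parses such a file using only the Python standard library. The column layout is an assumption made for this example (two leading metadata columns followed by the signal values), as are the function name `read_squiggle_tsv` and the `n_meta` parameter; check the actual output of your SquigglePull version before relying on it.

```python
import csv
import io


def read_squiggle_tsv(handle, n_meta=2):
    """Yield (metadata, signal) for each row of a SquigglePull-style .tsv.

    n_meta is an ASSUMPTION: the number of leading metadata columns.
    Everything after those columns is treated as signal values.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        meta = row[:n_meta]
        signal = [float(value) for value in row[n_meta:] if value != ""]
        yield meta, signal


# Made-up rows, for illustration only (not real SquigglePull output).
example = (
    "read1.fast5\tid-0001\t512\t498\t505\n"
    "read2.fast5\tid-0002\t430\t441\n"
)
rows = list(read_squiggle_tsv(io.StringIO(example)))
```

With a real file, the same function would be called on `open("data.tsv", newline="")`, adjusting `n_meta` to match the columns actually present.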

Getting Started

SquigglePull can extract both raw current measurements and event data. It also has some basic arguments and code scaffolding for extraction profiles. This allows the user to implement their analysis methods into the extraction protocol for integration with pipelines, or just quick data trimming. For example, using the form -f pos1 and the target -t 20,110, only the signal values between 20 and 110 will be extracted.
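The pos1 trimming described above can be sketched as a simple slice. This is a minimal illustration, not SquigglePull's own code: the helper names `parse_target` and `trim_signal` are hypothetical, and the indexing convention (0-based start, end excluded) is an assumption, since the text does not say whether the boundaries are inclusive.

```python
def parse_target(target):
    """Split a comma delimited target string such as "20,110" into two ints."""
    start, end = (int(part) for part in target.split(","))
    if start < 0 or end < start:
        raise ValueError("target must be 'start,end' with 0 <= start <= end")
    return start, end


def trim_signal(signal, target):
    """Return the signal values between the two target positions.

    ASSUMPTION: positions are 0-based and the end position is excluded,
    following normal Python slicing.
    """
    start, end = parse_target(target)
    return signal[start:end]


# A stand-in signal of 200 values, for illustration only.
signal = list(range(200))
trimmed = trim_signal(signal, "20,110")
```

A new extraction profile would follow the same pattern: parse the target string according to the chosen form, then select or transform the signal accordingly.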

Inputs

SquigglePull takes 3 mandatory arguments:

  1. path - Top directory of fast5 files
  2. form - Format of targeting information (default: all)
  3. raw/event - Raw signal or event data

SquigglePull figure showing inputs

Instructions for use

Simply point SquigglePull to a top directory containing fast5 files, and the signal will be extracted to STDOUT.
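For readers wondering what "top directory" implies, the sketch below shows one way to collect every fast5 file beneath a directory, including subdirectories. It is an assumption about the general approach, not a copy of SquigglePull's implementation, and `find_fast5` is a hypothetical helper name.

```python
import os


def find_fast5(top):
    """Yield the path of every .fast5 file found under the directory `top`."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.endswith(".fast5"):
                yield os.path.join(dirpath, name)
```

Calling `list(find_fast5("test/R9_raw_data/"))` would return the files that an extraction run over that directory needs to open.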

Quick start

Extract all raw signal

python SquigglePull.py -r -p test/R9_raw_data/ > data.tsv

Extract events between positions 20 and 110

python SquigglePull.py -e -p test/R9_event_data/ -t 20,110 -f pos1 > data.tsv

Full usage

usage: SquigglePull.py [-h] [-p PATH] [-t TARGET] [-f {pos1,all}] [-r | -e]
                       [-v] [-s]

SquigglePull - extraction of raw/event signal from Oxford Nanopore fast5 files

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Top directory path of fast5 files
  -t TARGET, --target TARGET
                        Target information as comma delimited string
                        structured by format type
  -f {pos1,all}, --form {pos1,all}
                        Format of target information
  -r, --raw             Target raw signal
  -e, --event           Target event signal
  -v, --verbose         Engage higher output verbosity
  -s, --scale           Scale signal output for comparison
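The usage text above is consistent with an argparse parser along the following lines. This is a reconstruction from the help output, not the script's actual source: the `build_parser` name is hypothetical, and the `all` default for the form is taken from the Inputs section.

```python
import argparse


def build_parser():
    """Build a parser matching the documented SquigglePull options."""
    parser = argparse.ArgumentParser(
        prog="SquigglePull.py",
        description="SquigglePull - extraction of raw/event signal "
                    "from Oxford Nanopore fast5 files",
    )
    parser.add_argument("-p", "--path",
                        help="Top directory path of fast5 files")
    parser.add_argument("-t", "--target",
                        help="Target information as comma delimited string "
                             "structured by format type")
    parser.add_argument("-f", "--form", default="all", choices=["pos1", "all"],
                        help="Format of target information")
    # -r and -e appear as [-r | -e] in the usage line: mutually exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-r", "--raw", action="store_true",
                       help="Target raw signal")
    group.add_argument("-e", "--event", action="store_true",
                       help="Target event signal")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Engage higher output verbosity")
    parser.add_argument("-s", "--scale", action="store_true",
                        help="Scale signal output for comparison")
    return parser


# Parse the second quick start command as an example.
args = build_parser().parse_args(
    ["-e", "-p", "test/R9_event_data/", "-t", "20,110", "-f", "pos1"]
)
```

Because -r and -e sit in a mutually exclusive group, passing both on the same command line makes argparse exit with an error.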