Train a custom model
To train a custom model, you need a dataset in which each barcode is attached to a read that maps uniquely with respect to all other barcodes.
Some examples:
- Synthetic RNA controls (sequins) attached to each barcode
- Different species for each barcode
Any design works, so long as you can group the reads by their expected barcode after mapping with minimap2.
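The grouping step can be sketched as follows. This is a minimal illustration, not part of deeplexicon: it assumes minimap2 was run with its default PAF output (column 1 is the read name, column 6 the target name, column 12 the mapping quality). The function name `assign_barcodes`, the reference names `sequin_1` to `sequin_3`, and the `min_mapq` cut-off of 20 are all hypothetical choices made for the example.

```python
from collections import defaultdict


def assign_barcodes(paf_lines, ref_to_barcode, min_mapq=20):
    """Return {read_id: barcode} for reads that map to exactly one barcode.

    ref_to_barcode is a hypothetical mapping from reference (target) name
    to barcode number; min_mapq is an assumed quality cut-off.
    """
    hits = defaultdict(set)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read_id, target, mapq = fields[0], fields[5], int(fields[11])
        if mapq >= min_mapq and target in ref_to_barcode:
            hits[read_id].add(ref_to_barcode[target])
    # Keep only reads whose confident alignments agree on a single barcode.
    return {rid: next(iter(bcs)) for rid, bcs in hits.items() if len(bcs) == 1}


# Made-up PAF records for illustration only.
example_paf = [
    "read_a\t1000\t0\t950\t+\tsequin_1\t1200\t10\t960\t900\t950\t60",
    "read_b\t800\t0\t700\t+\tsequin_2\t1100\t5\t705\t650\t700\t60",
    "read_c\t900\t0\t400\t+\tsequin_1\t1200\t0\t400\t380\t400\t60",
    "read_c\t900\t400\t900\t+\tsequin_2\t1100\t0\t500\t470\t500\t60",
    "read_d\t500\t0\t450\t-\tsequin_3\t900\t0\t450\t300\t450\t3",
]
example_refs = {"sequin_1": 1, "sequin_2": 2, "sequin_3": 3}
assigned = assign_barcodes(example_paf, example_refs)
```

In this example `read_c` is dropped because it hits two different barcodes, and `read_d` is dropped because its mapping quality is below the cut-off. Only reads that survive this filter should go into the truth table.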
Input requirements
- Raw fast5 files
- Truth table of readID->barcode
The truth table should be in one-hot format, with one row per readID, i.e.
readID 1 0 0 0
readID 0 1 0 0
readID 0 0 1 0
readID 0 0 0 1
Each column is a binary flag for barcode 1, 2, 3, and 4 respectively. This can be extended to any number of barcodes, and across multiple runs.
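Building these rows from a readID->barcode mapping can be sketched like this. The helper `one_hot_rows` is hypothetical and not part of deeplexicon. The tab delimiter is an assumption based on the `.tsv` file names used in the training command; check the delimiter that deeplexicon.py actually expects before using it.

```python
def one_hot_rows(read_to_barcode, n_barcodes, sep="\t"):
    """Turn {read_id: barcode_number} into one-hot truth table rows.

    Barcodes are numbered from 1. The separator is an assumption
    (tab, to match the .tsv file names), not a confirmed format.
    """
    rows = []
    for read_id, barcode in sorted(read_to_barcode.items()):
        if not 1 <= barcode <= n_barcodes:
            raise ValueError(f"{read_id}: barcode {barcode} out of range")
        flags = ["1" if i == barcode else "0" for i in range(1, n_barcodes + 1)]
        rows.append(sep.join([read_id] + flags))
    return rows


# Made-up read IDs for illustration only.
example_rows = one_hot_rows({"read_b": 3, "read_a": 1}, n_barcodes=4)
```

Each returned string is one line of the truth table, with exactly one flag set per read.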
This should then be split into training, testing, and validation files.
Do this by placing:
- 80% of the reads into a training file (--train_truth)
- 10% of the reads into a testing file (--test_truth)
- 10% of the reads into a validation file (--val_truth)
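One way to produce the 80/10/10 split is a seeded random shuffle of the truth table rows. This is a sketch under assumptions: `split_rows` is a hypothetical helper, and a plain random split is used here. It does not balance barcodes across the three files, which you may want to check separately.

```python
import random


def split_rows(rows, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle rows reproducibly and split into (train, test, val) lists.

    Whatever is left after the train and test fractions goes to validation.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_test = round(len(rows) * test_frac)
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    val = rows[n_train + n_test:]
    return train, test, val


# Made-up rows for illustration only.
all_rows = [f"read_{i}\t1\t0\t0\t0" for i in range(100)]
train_rows, test_rows, val_rows = split_rows(all_rows)
```

Writing each list to its own file, one row per line, gives the three files to pass as --train_truth, --test_truth, and --val_truth.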
Running the training
Training requires a CUDA-compatible GPU with the required libraries installed.
Start training, with validation, using the following command:
python deeplexicon.py train --path /fast5/top/path/ --train_truth train.tsv --test_truth test.tsv --val_truth val.tsv
Full description
train.add_argument('-p', '--path', nargs='+',
                   help="Input path(s) of all used fast5s")
train.add_argument('-t', '--train_truth', nargs='+',
                   help="Training truth set(s) in one-hot format eg: readID, 0, 0, 1, 0 for barcode 3 of 4 ")
train.add_argument('-s', '--test_truth', nargs='+',
                   help="Testing truth set(s) in one-hot format eg: readID, 0, 0, 1, 0 for barcode 3 of 4 ")
train.add_argument('-u', '--val_truth', nargs='+',
                   help="Validation truth set(s) in one-hot format eg: readID, 0, 0, 1, 0 for barcode 3 of 4 ")
train.add_argument('-n', '--network', default="ResNet20",
                   help="Network to use (see table in docs)")
train.add_argument('--net_version', type=int, default=2,
                   help="Network version to use (see table in docs)")
train.add_argument('-e', '--epochs', type=int, default=40,
                   help="epochs to run")
train.add_argument('-v', '--verbose', action='count', default=0,
                   help="Verbose output [v/vv/vvv]")