-
Notifications
You must be signed in to change notification settings - Fork 111
Open
Description
This is a reproducible bug where certain sequences in the reference/subject file result in no alignments. It is one of three similar examples found in a set of fungal genome assembliess as per pyani-plus/pyani-plus#417 and affects both NUCmer (NUCleotide MUMmer) version 3.1 and version 4.0.1 as well. Tested on both macOS and Linux, with mummer installed from BioConda.
Setup
Download smaller query genome:
❯ curl -o "GCA_009806305.1.fna.gz" "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_009806305.1?download=true&gzip=true"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 11.6M 0 11.6M 0 0 1481k 0 --:--:-- 0:00:08 --:--:-- 1554k
Download larger genome where last two records are problematic:
❯ curl -o "GCA_011745365.1.fna.gz" "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_011745365.1?download=true&gzip=true"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 25.9M 0 25.9M 0 0 1203k 0 --:--:-- 0:00:22 --:--:-- 1165k
Checksums:
❯ md5sum GCA_*.fna.gz
6448363d6650a4a6a1db37000b1dde80 GCA_009806305.1.fna.gz
aa6bc2e22196bbdf26daaae2185fde88 GCA_011745365.1.fna.gz
Decompress and make subset of subject file without last two records:
❯ cat GCA_009806305.1.fna.gz | gunzip > query.fasta
❯ cat GCA_011745365.1.fna.gz | gunzip > subject59.fasta
❯ head -n 1461172 subject59.fasta > subject57.fasta
❯ grep -c "^>" query.fasta subject59.fasta subject57.fasta
query.fasta:35
subject59.fasta:59
subject57.fasta:57
Notice record 57 is complete:
❯ tail -n 2 subject57.fasta
TAAGATATTTATACCAAACATAATTAATTATTTTCTAAAATAAAGCTAGAATATTTATTA
AATTAAAGAATATAATTAAATATTTATAAT
Checksums:
❯ md5sum query.fasta subject59.fasta subject57.fasta
b028729332c5b2071eb3bba1be58e8ba query.fasta
74227e57109fede619e4f4ae473bf926 subject59.fasta
5b53388c322b5ebadc028ebc989af913 subject57.fasta
Running mummer4
With just 57 references, under 30s:
❯ nucmer --version
4.0.1
❯ time nucmer -p query_vs_subject57 --mum subject57.fasta query.fasta
(no output)
❯ cat query_vs_subject57.delta
/private/tmp/mum_bug/subject57.fasta /private/tmp/mum_bug/query.fasta
NUCMER
>ENA|WHUS01000032|WHUS01000032.1 ENA|RAGY01000030|RAGY01000030.1 2728 43253
2 2728 24861 27587 416 416 0
-111
9
1132
19
-4
-3
3
-267
536
14
-48
4
104
-150
6
-124
3
-143
-31
-20
0
With the full 59 references, similar run time:
❯ nucmer -p query_vs_subject59 --mum subject59.fasta query.fasta
(no output)
❯ cat query_vs_subject59.delta
/private/tmp/mum_bug/subject59.fasta /private/tmp/mum_bug/query.fasta
NUCMER
Expected output - similar or the same as the above.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels