nucmer gives no alignment with certain records in subject file

This is a reproducible bug where certain sequences in the reference/subject file result in no alignments. It is one of three similar examples found in a set of fungal genome assembliess as per https://github.com/pyani-plus/pyani-plus/issues/417 and affects both NUCmer (NUCleotide MUMmer) version 3.1 and version 4.0.1 as well. Tested on both macOS and Linux, with mummer installed from BioConda.

## Setup

Download smaller query genome:

```
❯ curl -o "GCA_009806305.1.fna.gz" "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_009806305.1?download=true&gzip=true"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.6M    0 11.6M    0     0  1481k      0 --:--:--  0:00:08 --:--:-- 1554k
```

Download larger genome where last two records are problematic:

```
❯ curl -o "GCA_011745365.1.fna.gz" "https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_011745365.1?download=true&gzip=true"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25.9M    0 25.9M    0     0  1203k      0 --:--:--  0:00:22 --:--:-- 1165k
```

Checksums:

```
❯ md5sum GCA_*.fna.gz
6448363d6650a4a6a1db37000b1dde80  GCA_009806305.1.fna.gz
aa6bc2e22196bbdf26daaae2185fde88  GCA_011745365.1.fna.gz
```

Decompress and make subset of subject file without last two records:

```
❯ cat GCA_009806305.1.fna.gz | gunzip > query.fasta
❯ cat GCA_011745365.1.fna.gz | gunzip > subject59.fasta
❯ head -n 1461172 subject59.fasta > subject57.fasta
```

```
❯ grep -c "^>" query.fasta subject59.fasta subject57.fasta
query.fasta:35
subject59.fasta:59
subject57.fasta:57
```

Notice record 57 is complete:

```
❯ tail -n 2 subject57.fasta
TAAGATATTTATACCAAACATAATTAATTATTTTCTAAAATAAAGCTAGAATATTTATTA
AATTAAAGAATATAATTAAATATTTATAAT
```

Checksums:

```
❯ md5sum query.fasta subject59.fasta subject57.fasta
b028729332c5b2071eb3bba1be58e8ba  query.fasta
74227e57109fede619e4f4ae473bf926  subject59.fasta
5b53388c322b5ebadc028ebc989af913  subject57.fasta
```

## Running mummer4

With just 57 references, under 30s:

```
❯ nucmer --version
4.0.1
❯ time nucmer -p query_vs_subject57 --mum subject57.fasta query.fasta
(no output)
❯ cat query_vs_subject57.delta
/private/tmp/mum_bug/subject57.fasta /private/tmp/mum_bug/query.fasta
NUCMER
>ENA|WHUS01000032|WHUS01000032.1 ENA|RAGY01000030|RAGY01000030.1 2728 43253
2 2728 24861 27587 416 416 0
-111
9
1132
19
-4
-3
3
-267
536
14
-48
4
104
-150
6
-124
3
-143
-31
-20
0
```

With the full 59 references, similar run time:

```
❯ nucmer -p query_vs_subject59 --mum subject59.fasta query.fasta
(no output)
❯ cat query_vs_subject59.delta
/private/tmp/mum_bug/subject59.fasta /private/tmp/mum_bug/query.fasta
NUCMER
```

Expected output - similar or the same as the above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nucmer gives no alignment with certain records in subject file #238

Setup

Running mummer4

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

nucmer gives no alignment with certain records in subject file #238

Description

Setup

Running mummer4

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions