inconsistent behaviour due to loading order #32

# Bug report: `MultiDBD.get()` returns fewer data points when a `MultiDBD` instance for a different file type is created first in the same Python session

**Library:** `dbdreader` v0.5.8  
**Python:** 3.14.2  
**OS:** Linux

---

## Summary

Creating a `MultiDBD` instance for `.dbd` (flight-computer) files and calling `.get()` on it **before** creating a separate `MultiDBD` instance for `.ebd` (science-computer) files causes the `.ebd` instance to return significantly fewer data points than when the `.ebd` instance is created first. The shortfall reproduces reliably whenever the `.dbd` instance is loaded first.

This leads to silently incomplete data and, because the effect size varies somewhat between Python sessions, **non-reproducible processing pipelines**.

---

## Minimal reproducible example

```python
import dbdreader

DATA = "/path/to/glider/hd/"   # contains echo*.dbd and echo*.ebd

# Case A: load EBD first, then DBD
gl_ebd = dbdreader.MultiDBD(pattern=DATA + "echo*.ebd")
t_a, _ = gl_ebd.get("sci_ctd41cp_timestamp")

gl_dbd = dbdreader.MultiDBD(pattern=DATA + "echo*.dbd")
_ = gl_dbd.get("m_gps_lat")

print(f"Case A (EBD first): {len(t_a):,} points")   # → 1,038,686

# Case B: load DBD first, then EBD (typical script order)
gl_dbd2 = dbdreader.MultiDBD(pattern=DATA + "echo*.dbd")
_ = gl_dbd2.get("m_gps_lat")

gl_ebd2 = dbdreader.MultiDBD(pattern=DATA + "echo*.ebd")
t_b, _ = gl_ebd2.get("sci_ctd41cp_timestamp")

print(f"Case B (DBD first): {len(t_b):,} points")   # → 1,028,677  (≈ 10k fewer)
```

---

## Observed behaviour

| Scenario | `sci_ctd41cp_timestamp` length | Notes |
|---|---|---|
| EBD only (no DBD in session) | 1,038,686 | consistent across runs |
| EBD after DBD loaded | 1,028,677 | consistent *within* a single Python session, but the exact count varies *between* separate Python sessions (observed range: ~964k – ~1,036k) |

In both scenarios the `.ebd` `MultiDBD` instance is built from the **same 281 `.ebd` files**; `len(gl.filenames)` reports 281 in all cases.
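To narrow down where the ~10k points go, a small diff of the two timestamp arrays could show whether the missing points fall into a few contiguous blocks (suggesting whole files are being dropped) or are scattered. The sketch below assumes NumPy; the function names, the rounding to 3 decimals and the 60 s gap threshold are our own illustrative choices, not part of dbdreader.

```python
import numpy as np


def missing_timestamps(t_full, t_short, decimals=3):
    """Return timestamps present in t_full but absent from t_short.

    Values are rounded to `decimals` places before comparison so tiny float
    differences are not counted as missing. np.setdiff1d also collapses
    duplicate timestamps and returns the result sorted.
    """
    full = np.round(np.asarray(t_full, dtype=float), decimals)
    short = np.round(np.asarray(t_short, dtype=float), decimals)
    return np.setdiff1d(full, short)


def summarise_gaps(missing, max_gap=60.0):
    """Group missing timestamps into contiguous runs.

    A new run starts whenever consecutive missing timestamps are more than
    `max_gap` seconds apart. Returns a list of (start, end, count) tuples.
    """
    missing = np.sort(np.asarray(missing, dtype=float))
    if missing.size == 0:
        return []
    breaks = np.where(np.diff(missing) > max_gap)[0] + 1
    runs = np.split(missing, breaks)
    return [(float(r[0]), float(r[-1]), int(r.size)) for r in runs]
```

With `t_a` and `t_b` from the example above, `summarise_gaps(missing_timestamps(t_a, t_b))` would list each block of missing data; comparing the block boundaries with the start and end times of individual `.ebd` files would show whether specific files are being skipped.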

---

## Impact

A data-processing script that (naturally) loads GPS positions from `.dbd` files before reading CTD data from `.ebd` files will receive up to **~74,000 fewer data points** than if the loading order is reversed. In practice we observed:

- The downstream dataset produced by the "DBD-first" script had ~964k time steps
- The same script with "EBD-first" order produced ~1,025k time steps
- The **extra ~61k points recovered by reordering** were not QC failures; they were valid science data

Because the magnitude of the shortfall varies between Python sessions (likely depending on whether certain `.ccc` cache files have already been decompressed to `.cac` in a prior run), the pipeline is **non-reproducible**: re-running the same script on the same input files can yield different output files.
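One way to check whether the `.ccc` → `.cac` decompression actually changes the cache directory between the two loads is to snapshot it before and after the `.dbd` instance is created. This is a minimal stdlib sketch; the helper names are hypothetical, and `CACHE_DIR` in the usage comments is a placeholder for wherever dbdreader keeps its cache files on your system (we have not confirmed the default location).

```python
from pathlib import Path


def cache_snapshot(cache_dir):
    """Map each *.cac / *.ccc file name in cache_dir to (size, mtime_ns)."""
    snap = {}
    for path in Path(cache_dir).iterdir():
        if path.suffix in (".cac", ".ccc") and path.is_file():
            st = path.stat()
            snap[path.name] = (st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(before, after):
    """Return cache files added, removed, or modified between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after) if before[n] != after[n])
    return {"added": added, "removed": removed, "changed": changed}


# Example usage (requires dbdreader; CACHE_DIR and DATA are placeholders):
# before = cache_snapshot(CACHE_DIR)
# gl_dbd = dbdreader.MultiDBD(pattern=DATA + "echo*.dbd")
# gl_dbd.get("m_gps_lat")
# print(diff_snapshots(before, cache_snapshot(CACHE_DIR)))
```

If `added` lists new `.cac` files after the `.dbd` load, that would support the on-disk explanation; an empty diff would point instead towards in-memory state.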

---

## Suspected cause

The issue appears to involve **shared state** between `MultiDBD` instances. Candidate locations in the source:

1. **`DBDCache.CACHEDIR` (class-level attribute):** This is shared across all `MultiDBD` instances. Reading `.dbd` files first triggers `decompress_file()` calls that convert `.ccc` → `.cac` files. On subsequent `.ebd` reads, the newly present `.cac` files change which files pass the `_safely_open_dbd_file` logic, potentially altering the set of files classified as `"ok"` vs `"failed"`.

2. **`DBDPatternSelect.cache = {}` (class-level dict):** This timestamp-keyed cache is shared across all instances and could mix up file-open-time metadata between DBD and EBD instances.

3. **`DBDCache` decompression race / state:** `.ccc` → `.cac` decompression during one instance's `__init__` modifies the filesystem in a way that changes what the next instance finds.
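Candidate 2 rests on a general Python behaviour: a mutable object assigned in a class body is created once and shared by every instance. The classes below are hypothetical stand-ins written to illustrate that mechanism, not dbdreader's code; whether `DBDPatternSelect.cache` actually mixes DBD and EBD entries depends on how its keys are built, which we have not verified.

```python
class PatternSelect:
    """Hypothetical class with a class-level cache (shared by all instances)."""

    cache = {}  # created once, when the class body executes

    def __init__(self, label):
        self.label = label

    def remember(self, filename, value):
        # self.cache resolves to the single dict on the class, so this
        # entry becomes visible to every other instance.
        self.cache[filename] = value


class PatternSelectIsolated:
    """Same idea, but each instance gets its own dict."""

    def __init__(self, label):
        self.label = label
        self.cache = {}

    def remember(self, filename, value):
        self.cache[filename] = value


dbd = PatternSelect("dbd")
ebd = PatternSelect("ebd")
dbd.remember("file_a.dbd", 123.0)
print(ebd.cache)  # {'file_a.dbd': 123.0}  (entry written by the DBD instance)
```

If the real cache turns out to be involved, a per-instance dict or keys that include the file type would be one way to keep DBD and EBD metadata apart.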

---

## Workaround

Load `.ebd` (science) files **before** `.dbd` (flight) files in the same Python session. After this reordering, `MultiDBD.get()` gives consistent, reproducible results across repeated runs.

---

## Steps to confirm

```python
# Verify consistency when EBD is always first:
import dbdreader

DATA = "/path/to/glider/hd/"   # contains echo*.dbd and echo*.ebd

for _ in range(5):
    gl = dbdreader.MultiDBD(pattern=DATA + "echo*.ebd")
    t, _ = gl.get("sci_ctd41cp_timestamp")
    print(len(t))   # prints 1038686 every time
```

```python
# Verify inconsistency when DBD comes first:
import dbdreader

DATA = "/path/to/glider/hd/"   # contains echo*.dbd and echo*.ebd

gl_dbd = dbdreader.MultiDBD(pattern=DATA + "echo*.dbd")
gl_dbd.get("m_gps_lat")
for _ in range(3):
    gl = dbdreader.MultiDBD(pattern=DATA + "echo*.ebd")
    t, _ = gl.get("sci_ctd41cp_timestamp")
    print(len(t))   # same value within a session, but differs between sessions
```
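The second check runs inside a single session, so it cannot tell whether the shared state lives in memory (candidate 2) or on disk (candidates 1 and 3). A possible way to separate the two, sketched below, is to run each count in a brand-new interpreter via `subprocess`. The helper name and snippet constants are our own; the data path is the same placeholder used above.

```python
import subprocess
import sys


def count_in_fresh_interpreter(snippet, timeout=None):
    """Run `snippet` in a new Python process and return the last int it prints.

    Each call starts a separate interpreter, so no module- or class-level
    state carries over from the calling session; only on-disk state
    (for example cache files) can persist between calls.
    check=True raises CalledProcessError if the snippet fails.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return int(result.stdout.strip().splitlines()[-1])


EBD_COUNT = """
import dbdreader
gl = dbdreader.MultiDBD(pattern="/path/to/glider/hd/echo*.ebd")
t, _ = gl.get("sci_ctd41cp_timestamp")
print(len(t))
"""

DBD_THEN_EBD = """
import dbdreader
DATA = "/path/to/glider/hd/"
gl_dbd = dbdreader.MultiDBD(pattern=DATA + "echo*.dbd")
gl_dbd.get("m_gps_lat")
gl = dbdreader.MultiDBD(pattern=DATA + "echo*.ebd")
t, _ = gl.get("sci_ctd41cp_timestamp")
print(len(t))
"""

# Example usage (requires dbdreader and real data paths):
# fresh_ebd = [count_in_fresh_interpreter(EBD_COUNT) for _ in range(3)]
# mixed = count_in_fresh_interpreter(DBD_THEN_EBD)
# ebd_after_mixed = count_in_fresh_interpreter(EBD_COUNT)
```

If `ebd_after_mixed` still matches the EBD-only count, the shortfall is tied to in-process state; if it drops as well, something written to disk by the DBD-first process is the more likely cause.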
