diff --git a/README.md b/README.md
index edfb7ee..01f0145 100644
--- a/README.md
+++ b/README.md
@@ -1,258 +1,305 @@
-# structflo-cser
+<h1 align="center">structflo.cser</h1>
+
+<p align="center">
+  <img src="./docs/images/example-1.png" alt="structflo.cser — detection and pairing example" width="700">
+</p>
+
+<p align="center">
+  <a href="https://pepy.tech/projects/structflo-cser"><img src="https://static.pepy.tech/personalized-badge/structflo-cser?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads" alt="PyPI Downloads"></a>
+  <a href="https://github.com/structflo/structflo-cser/actions"><img src="https://img.shields.io/github/actions/workflow/status/structflo/structflo-cser/ci.yml?label=tests" alt="Tests"></a>
+  <a href="https://github.com/structflo/structflo-cser/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-green.svg" alt="License"></a>
+  <a href="https://www.linkedin.com/in/sidxz/"><img src="https://img.shields.io/badge/LinkedIn-blue?logo=linkedin&logoColor=white" alt="LinkedIn"></a>
+  <a href="https://github.com/sidxz/"><img src="https://img.shields.io/badge/GitHub-black?logo=github&logoColor=white" alt="GitHub"></a>
+</p>
+
+<p align="center">
+  Chemical structure and label extraction from scientific documents.
+</p>
+
+<p align="center">
+  <a href="#installation">Installation</a> &bull;
+  <a href="#quick-start">Quick Start</a> &bull;
+  <a href="#step-by-step-pipeline">Step-by-Step</a> &bull;
+  <a href="#matchers">Matchers</a> &bull;
+  <a href="#downstream-processing">Downstream Processing</a> &bull;
+  <a href="#notebooks">Notebooks</a>
+</p>
 
-YOLO11l-based detector for chemical structures and their compound label IDs in scientific documents.
+---
 
-Part of the **structflo** library. Import as:
-```python
-from structflo.cser.pipeline import ChemPipeline
-```
+**structflo.cser** extracts chemical structure–label pairs from images and PDF pages. It uses a fine-tuned YOLO detector trained on synthetic chemical structure data to locate structures and compound labels on a page, then pairs them using either a Learned Pair Scorer (LPS) model or a simpler Hungarian Matcher.
 
-**Detection target:** A single bounding box (`compound_panel`) enclosing the union of a rendered chemical structure and its nearby label ID (e.g. `CHEMBL12345`).
+The extracted crops can be passed to any structure-to-SMILES converter (for example DECIMER or MolScribe) and to any OCR engine for the label text. DECIMER and EasyOCR are bundled for convenience, but other downstream tools can be swapped in.
 
----
+**Two-step process:**
+
+1. **Detect** — A fine-tuned YOLO detector finds all chemical structures and compound labels in the image
+2. **Match** — A matcher pairs each structure with its corresponding label, producing cropped image pairs
+
+|                   | `LearnedMatcher` (default)              | `HungarianMatcher`              |
+| ----------------- | --------------------------------------- | ------------------------------- |
+| Approach          | Neural Pair Scorer (LPS)                | Geometric (centroid distance)   |
+| Setup             | Auto-downloads weights                  | Zero config                     |
+| Speed             | Fast (GPU accelerated)                  | Instantaneous                   |
+| Accuracy          | Better for complex or crowded pages     | Good for simple layouts         |
+| Output            | `CompoundPair`                          | `CompoundPair` (identical)      |
 
 ## Installation
 
 ```bash
-uv pip install -e .
+pip install structflo-cser
 ```
 
-This installs all dependencies and registers the `sf-*` CLI commands on your PATH.
+```bash
+# or with uv
+uv add structflo-cser
+```
 
----
+This also installs DECIMER and EasyOCR for downstream SMILES and text extraction. The core pipeline does not depend on them — any extractor implementation can be swapped in.
 
-## Pipeline
+## Quick Start
 
-```
-1. Fetch SMILES          →  sf-fetch-smiles
-2. Download distractors  →  sf-download-distractors   (optional but recommended)
-3. Generate dataset      →  sf-generate
-4. Visualize labels      →  sf-viz                    (optional QA check)
-5. Train YOLO            →  sf-train
-6. Run inference         →  sf-detect
-7. Annotate real PDFs    →  sf-annotate               (optional)
-```
+One call from image to `(SMILES, label)` pairs:
 
----
+```python
+from structflo.cser.pipeline import ChemPipeline
+from structflo.cser.lps import LearnedMatcher
 
-## Commands
+pipeline = ChemPipeline(matcher=LearnedMatcher())
+results = pipeline.process("page.png")
 
-### 1. Fetch SMILES from ChEMBL
+for pair in results:
+    print(pair.smiles, pair.label_text)
+```
 
-Extracts ~20 k small-molecule SMILES from a local [ChEMBL SQLite database](https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/).
+Weights for both the detector and the LPS are auto-downloaded from HuggingFace Hub on first use.
 
-```bash
-sf-fetch-smiles \
-  --db chembl_35/chembl_35_sqlite/chembl_35.db \
-  --output data/smiles/chembl_smiles.csv \
-  --n 20000
+Export to a pandas DataFrame or JSON:
+
+```python
+df   = ChemPipeline.to_dataframe(results)
+data = ChemPipeline.to_json(results)
+```
+
+```
+   match_distance  match_confidence                              smiles     label_text
+0          135.19            0.9844  CN1CCC2=C(C1)SC(=N2)C(=O)NC3=...      7178-39-6
+1          208.40            0.9973  C1=CC(=CC=C1C2=C(C(=O)O)N=NN2...     72804-12-9
+2          126.25            0.9997  COC1=CC=C(C=C1)C=C2C(=O)N(C3=...   ZINC2978 720
 ```
 
-Output: `data/smiles/chembl_smiles.csv`
+### PDF input
 
----
+For PDFs, use `process_pdf()` — it renders each page and returns one result list per page:
+
+```python
+from structflo.cser.pipeline import ChemPipeline
+from structflo.cser.lps import LearnedMatcher
 
-### 2. Download distractor images
+pipeline = ChemPipeline(matcher=LearnedMatcher())
 
-Downloads real photographs from [Lorem Picsum](https://picsum.photos/) to use as hard-negative distractors during page generation.
+# Returns list[list[CompoundPair]] — one inner list per page
+all_pages = pipeline.process_pdf("paper.pdf")
 
-```bash
-sf-download-distractors --out data/distractors --count 1000
+for page_num, pairs in enumerate(all_pages):
+    print(f"Page {page_num + 1}: {len(pairs)} compound pairs")
+    for pair in pairs:
+        print(f"  {pair.label_text:20s}  {pair.smiles}")
 ```
 
----
+Pass `output_pdf` to save an annotated copy with bounding boxes and extracted data overlaid:
 
-### 3. Generate synthetic dataset
+```python
+pipeline.process_pdf("paper.pdf", output_pdf="paper_annotated.pdf")
+```
 
-Generates document-like pages (A4 @ 300 DPI or slide format) containing chemical structures, compound labels, and distractor elements.
+## Step-by-Step Pipeline
 
-```bash
-sf-generate \
-  --smiles data/smiles/chembl_smiles.csv \
-  --out data/generated \
-  --num-train 2000 --num-val 400 \
-  --fonts-dir data/fonts \
-  --distractors-dir data/distractors \
-  --dpi 96,144,200,300 \
-  --workers 0
-```
+For finer control, each stage is exposed individually.
 
-Key options:
+### 1. Create the pipeline
 
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--num-train` | 2000 | Number of training pages |
-| `--num-val` | 200 | Number of validation pages |
-| `--dpi` | `96,144,200,300` | DPI values randomly sampled per page |
-| `--grayscale` / `--no-grayscale` | on | Convert pages to grayscale |
-| `--workers` | 0 (all CPUs) | Parallel workers; use `1` to disable multiprocessing |
+```python
+from structflo.cser.pipeline import ChemPipeline
 
-**Output structure:**
-```
-data/generated/
-├── train/
-│   ├── images/         (JPEG pages)
-│   ├── labels/         (YOLO .txt — union bbox per compound panel)
-│   └── ground_truth/   (JSON with split struct_bbox / label_bbox / smiles)
-└── val/
-    ├── images/
-    ├── labels/
-    └── ground_truth/
+# Default: LearnedMatcher — auto-downloads LPS weights on first use
+pipeline = ChemPipeline(tile=False, conf=0.70)
 ```
 
----
-
-### 4. Visualize labels (QA)
+For a heuristic-based approach, use `HungarianMatcher`:
 
-Overlays YOLO bounding boxes on a random sample of generated pages.
+```python
+from structflo.cser.pipeline import ChemPipeline, HungarianMatcher
 
-```bash
-sf-viz --split both --n 30 --out data/viz
+pipeline = ChemPipeline(
+    tile=False,
+    conf=0.70,
+    matcher=HungarianMatcher(max_distance=500),
+)
 ```
 
-Green boxes = `chemical_structure`, blue boxes = `compound_label`.
+The pipeline is lazy — detector weights, DECIMER, and EasyOCR are loaded on first use only.
 
----
+### 2. Detect
 
-### 5. Train
+```python
+detections = pipeline.detect("page.png")
 
-Fine-tunes YOLO11l on the generated dataset.
+n_struct = sum(1 for d in detections if d.class_id == 0)
+n_label  = sum(1 for d in detections if d.class_id == 1)
+print(f"Found {n_struct} structures and {n_label} labels")
+# Found 6 structures and 6 labels
+```
 
-```bash
-sf-train --epochs 50 --imgsz 1280 --batch 8
+`class_id=0` = chemical structure &nbsp;|&nbsp; `class_id=1` = compound label
+
+### 3. Match
+
+```python
+pairs = pipeline.match(detections)
+# Matched 6 structure–label pairs
+#   Pair 0: distance=135px  structure@(490,421)  label@(489,285)
+#   Pair 1: distance=208px  structure@(258,194)  label@(466,195)
 ```
 
-Key options:
+### 4. Visualise
 
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--weights` | `yolo11l.pt` | Pretrained backbone |
-| `--imgsz` | 1280 | Training resolution |
-| `--batch` | 8 | Batch size (safe for A6000 48 GB) |
-| `--resume` | — | Path to `last.pt` to resume an interrupted run |
+```python
+from structflo.cser.viz import plot_detections, plot_pairs, plot_crops, plot_results
 
-**Output:** `runs/labels_detect/yolo11l_panels/weights/best.pt`
+fig = plot_detections(img, detections)   # green = structure, blue = label
+fig = plot_pairs(img, pairs)             # orange lines connect matched pairs
+fig = plot_crops(img, pairs)             # cropped structure and label regions
+fig = plot_results(img, results)         # final annotated output
+```
 
----
+![Detection and pairing visualisation](docs/images/example-2.png)
 
-### 6. Detect
+### 5. Enrich — SMILES and label text
 
-Runs the trained detector on images using sliding-window tiling (1536 px tiles, 20 % overlap).
+```python
+enriched = pipeline.enrich(pairs, "page.png")
 
-```bash
-# Single image
-sf-detect --image page.png
+for i, p in enumerate(enriched):
+    print(f"Pair {i}:")
+    print(f"  SMILES:     {p.smiles}")
+    print(f"  Label text: {p.label_text}")
+```
 
-# Directory of images
-sf-detect --image_dir data/real/images/ --out detections/
+```
+Pair 0:
+  SMILES:     CN1CCC2=C(C1)SC(=N2)C(=O)NC3=C(C=CC=C3)CNC(=O)C4=CC=CC(=C4)Cl
+  Label text: 7178-39-6
 
-# With Hungarian pairing of structures → labels
-sf-detect --image page.png --pair --max_dist 300
+Pair 1:
+  SMILES:     C1=CC(=CC=C1C2=C(C(=O)O)N=NN2C3=CC=C(C=C3)S(=O)(=O)N)Br
+  Label text: 72804-12-9
 ```
 
-Key options:
+## Matchers
 
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--weights` | `runs/.../best.pt` | Model weights |
-| `--conf` | 0.3 | Confidence threshold |
-| `--tile_size` | 1536 | Tile size in pixels |
-| `--no_tile` | off | Run on full image (skips tiling) |
-| `--grayscale` | off | Convert to grayscale before detection |
-| `--pair` | off | Hungarian match structures → labels |
+### Learned Pair Scorer — `LearnedMatcher` (default)
 
----
+A neural matcher trained to score structure–label compatibility using both visual crops and geometric features. It replaces the raw distance cost matrix with a learned association probability, then solves global assignment with the Hungarian algorithm.
 
-### 7. Annotate real PDFs (optional)
+Weights are auto-downloaded from HuggingFace Hub on first use — no manual setup needed. Models are hosted at:
 
-Web-based annotation tool for creating ground truth from real PDF documents.
+- Detector: [huggingface.co/sidxz/structflo-cser-detector](https://huggingface.co/sidxz/structflo-cser-detector)
+- LPS scorer: [huggingface.co/sidxz/structflo-cser-lps](https://huggingface.co/sidxz/structflo-cser-lps)
 
-```bash
-sf-annotate --out data/real --port 8000
-# then open http://127.0.0.1:8000 in a browser
+```python
+from structflo.cser.pipeline import ChemPipeline
+from structflo.cser.lps import LearnedMatcher
+
+pipeline = ChemPipeline(
+    matcher=LearnedMatcher(
+        min_score=0.5,      # drop pairs below this confidence
+        max_dist_px=None,   # optional centroid pre-filter to save compute
+    )
+)
 ```
 
----
+`min_score` — pairs scoring below this threshold are discarded as unlabelled structures.
 
-## Package layout
+### Hungarian Matcher — `HungarianMatcher` (fallback)
 
-```
-structflo/cser/              # importable package (from structflo.cser import ...)
-├── _geometry.py             # shared bbox utilities (boxes_intersect, try_place_box)
-├── config.py                # PageConfig dataclass + make_page_config()
-├── data/
-│   ├── smiles.py            # load_smiles(), fetch_smiles_from_chembl_sqlite()
-│   └── distractor_images.py # load_distractor_images(), download_picsum()
-├── rendering/
-│   ├── chemistry.py         # render_structure(), place_structure()
-│   └── text.py              # draw_rotated_text(), add_label_near_structure(), load_font()
-├── distractors/
-│   ├── charts.py            # bar / scatter / line / pie chart generators
-│   ├── shapes.py            # geometric shapes, noise patches, gradients
-│   └── text_elements.py     # prose blocks, captions, footnotes, arrows, tables
-├── generation/
-│   ├── page.py              # make_page(), make_negative_page(), apply_noise()
-│   └── dataset.py           # generate_dataset(), save_sample(), CLI entry point
-├── training/
-│   └── trainer.py           # train(), CLI entry point
-├── inference/
-│   ├── tiling.py            # generate_tiles()
-│   ├── nms.py               # nms()
-│   ├── pairing.py           # pair_detections() via Hungarian matching
-│   └── detector.py          # detect_tiled(), detect_full(), draw_boxes(), CLI
-└── viz/
-    └── labels.py            # visualize_split(), draw_boxes(), CLI entry point
-
-annotate/                    # Flask annotation tool (unchanged)
-config/
-├── data.yaml                # YOLO dataset paths
-└── pipeline.yaml
-data/                        # data files (gitignored)
-runs/                        # training checkpoints (gitignored)
+Pairs structures and labels by minimising total centroid-to-centroid distance. It needs no configuration and downloads no weights, which makes it useful for simple document layouts or as a fast sanity check.
+
+```python
+from structflo.cser.pipeline import ChemPipeline, HungarianMatcher
+
+pipeline = ChemPipeline(
+    matcher=HungarianMatcher(max_distance=500),
+)
 ```
 
----
+`max_distance` — maximum pixel distance for a valid pair. Increase for large pages; reduce to avoid false pairings on dense layouts.
 
-## Data directory layout
+## Downstream Processing
 
-```
-data/
-├── smiles/
-│   └── chembl_smiles.csv    # ~20 k SMILES from ChEMBL
-├── fonts/                   # TTF/OTF fonts for label rendering
-├── distractors/             # ~1 k real photos (sf-download-distractors output)
-├── generated/               # synthetic dataset (sf-generate output)
-│   ├── train/
-│   └── val/
-└── real/                    # manually annotated real pages (sf-annotate output)
-    ├── images/
-    ├── labels/
-    └── ground_truth/
+**structflo.cser** outputs cropped image pairs. Plug in any converter for SMILES and any OCR for label text.
+
+### SMILES extraction
+
+DECIMER is bundled by default. Swap for MolScribe or any custom `BaseSmilesExtractor`:
+
+```python
+from structflo.cser.pipeline.smiles_extractor import BaseSmilesExtractor
+
+class MyExtractor(BaseSmilesExtractor):
+    def extract(self, image) -> str:
+        return my_model.predict(image)
+
+pipeline = ChemPipeline(smiles_extractor=MyExtractor())
 ```
 
----
+### OCR
+
+EasyOCR is bundled by default. Swap for any custom `BaseOCR`:
 
-## YOLO label format
+```python
+from structflo.cser.pipeline.ocr import BaseOCR
 
-Each `.txt` label file contains one line per annotated object:
+class MyOCR(BaseOCR):
+    def extract(self, image) -> str:
+        return my_ocr.read(image)
 
+pipeline = ChemPipeline(ocr=MyOCR())
 ```
-<class_id> <cx> <cy> <w> <h>   (all normalised to [0, 1])
+
+## CLI
+
+Run extraction directly from the terminal:
+
+```bash
+# Detect and pair structures/labels in a directory of images
+sf-detect --image_dir data/test_images/ --conf 0.60 --no_tile --pair --max_dist 500
+
+# Full pipeline: detect → match → SMILES + OCR
+sf-extract page.png
 ```
 
-| class_id | name |
-|----------|------|
-| 0 | chemical_structure |
-| 1 | compound_label |
+All available commands:
 
-Ground-truth JSON files in `ground_truth/` contain raw pixel coordinates plus `smiles` and `label_text` for downstream analysis.
+| Command                   | Description                                |
+| ------------------------- | ------------------------------------------ |
+| `sf-detect`               | Run YOLO detection on images               |
+| `sf-extract`              | Full pipeline: detect → match → extract    |
+| `sf-generate`             | Generate synthetic training data           |
+| `sf-train`                | Train the YOLO detection model             |
+| `sf-train-lps`            | Train the Learned Pair Scorer              |
+| `sf-eval-lps`             | Evaluate LPS on a test set                 |
+| `sf-fetch-smiles`         | Fetch SMILES from a local ChEMBL database  |
+| `sf-download-distractors` | Download distractor images for generation  |
+| `sf-annotate`             | Launch the web annotation server           |
 
----
+## Notebooks
+
+| Notebook | Description |
+| -------- | ----------- |
+| [01-quickstart.ipynb](notebooks/01-quickstart.ipynb) | Step-by-step pipeline walkthrough: detect → match → enrich, then one-call convenience API |
+| [02-LPS.ipynb](notebooks/02-LPS.ipynb) | Using the Learned Pair Scorer for improved matching on complex document pages |
 
-## Key design decisions
+## License
 
-- **Union bounding box** — each compound panel is annotated as the union of structure + label (1 class for YOLO). The GT JSON preserves the individual boxes.
-- **No horizontal flips** — chemical handedness matters; `fliplr=0` is enforced during training.
-- **15 % negative pages** — pages with no structures teach the model to output nothing for non-chemistry content.
-- **Multi-DPI generation** — pages at {96, 144, 200, 300} DPI create scale variance, improving robustness to different scanning resolutions.
-- **Tiled inference** — A4 pages (2480 × 3508 px) are tiled into 1536 px chunks with 20 % overlap to stay within GPU memory.
+Apache License 2.0
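
Note on the Matchers section of the README above: the two pairing strategies it describes (minimum total centroid distance for `HungarianMatcher`, and a learned score matrix with a `min_score` cut-off for `LearnedMatcher`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function names, the `(x1, y1, x2, y2)` box format, and the hand-written score matrix are all hypothetical, and the optimal assignment is found by brute force with `itertools.permutations` instead of the Hungarian algorithm, so it only suits a handful of boxes.

```python
from itertools import permutations
from math import hypot


def centroid(box):
    """Centre of an (x1, y1, x2, y2) box. The box format is an assumption."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def best_assignment(cost):
    """Return the (row, col) pairs with the lowest total cost.

    Brute force over permutations: fine for a few boxes, whereas a real
    matcher would reach the same optimum with the Hungarian algorithm.
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    k = min(n_rows, n_cols)  # the smaller side is fully assigned
    if n_rows <= n_cols:
        candidates = (list(enumerate(p)) for p in permutations(range(n_cols), k))
    else:
        candidates = ([(r, c) for c, r in enumerate(p)]
                      for p in permutations(range(n_rows), k))
    best, best_total = [], float("inf")
    for pairs in candidates:
        total = sum(cost[r][c] for r, c in pairs)
        if total < best_total:
            best, best_total = pairs, total
    return best


def match_by_distance(structures, labels, max_distance=500.0):
    """Geometric pairing: minimise total centroid distance, then threshold."""
    s_pts = [centroid(b) for b in structures]
    l_pts = [centroid(b) for b in labels]
    cost = [[hypot(sx - lx, sy - ly) for lx, ly in l_pts] for sx, sy in s_pts]
    return [(s, l, cost[s][l]) for s, l in best_assignment(cost)
            if cost[s][l] <= max_distance]


def match_by_score(scores, min_score=0.5):
    """Learned-style pairing: maximise total score, then drop weak pairs.

    scores[s][l] stands in for a model's structure-label probability.
    """
    cost = [[1.0 - p for p in row] for row in scores]
    return [(s, l, scores[s][l]) for s, l in best_assignment(cost)
            if scores[s][l] >= min_score]


# Two structures side by side, with the labels listed in the opposite order.
structures = [(0, 0, 100, 100), (300, 0, 400, 100)]
labels = [(310, 110, 390, 130), (10, 110, 90, 130)]
distance_pairs = match_by_distance(structures, labels)
score_pairs = match_by_score([[0.1, 0.9], [0.3, 0.2]])
print(distance_pairs)
print(score_pairs)
```

In the score example, the second structure's only remaining candidate scores 0.3, which is below `min_score`, so it is left unpaired. That corresponds to the "unlabelled structure" case the README mentions. For realistic page sizes, an O(n^3) solver such as `scipy.optimize.linear_sum_assignment` would be the usual replacement for the brute-force search.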
diff --git a/docs/images/example-1.png b/docs/images/example-1.png
new file mode 100644
index 0000000..6f7226b
Binary files /dev/null and b/docs/images/example-1.png differ
diff --git a/docs/images/example-2.png b/docs/images/example-2.png
new file mode 100644
index 0000000..650e957
Binary files /dev/null and b/docs/images/example-2.png differ
diff --git a/structflo/cser/pipeline/pipeline.py b/structflo/cser/pipeline/pipeline.py
index cbca452..77b7ce6 100644
--- a/structflo/cser/pipeline/pipeline.py
+++ b/structflo/cser/pipeline/pipeline.py
@@ -12,7 +12,8 @@
 from structflo.cser.inference.detector import detect_full, detect_tiled
 from structflo.cser.weights import resolve_weights
 
-from structflo.cser.pipeline.matcher import BaseMatcher, HungarianMatcher
+from structflo.cser.lps import LearnedMatcher
+from structflo.cser.pipeline.matcher import BaseMatcher
 from structflo.cser.pipeline.models import BBox, CompoundPair, Detection
 from structflo.cser.pipeline.ocr import BaseOCR, EasyOCRExtractor
 from structflo.cser.pipeline.smiles_extractor import (
@@ -76,7 +77,8 @@ def __init__(
             weights:          Weights version tag (e.g. ``"v1.0"``) or path to a
                               local ``.pt`` file.  ``None`` auto-downloads the
                               latest published weights.
-            matcher:          Pairing strategy.  Defaults to HungarianMatcher.
+            matcher:          Pairing strategy.  Defaults to LearnedMatcher
+                              (auto-downloads weights from HuggingFace Hub).
             smiles_extractor: SMILES model.  Defaults to DecimerExtractor.
             ocr:              OCR engine.  Defaults to PaddleOCRExtractor.
             tile:             Use sliding-window tiling during detection.
@@ -86,7 +88,7 @@ def __init__(
                               Defaults to True to match training data distribution.
         """
         self._weights = weights  # version tag, local path str/Path, or None
-        self._matcher = matcher or HungarianMatcher()
+        self._matcher = matcher or LearnedMatcher()
         self._smiles = smiles_extractor or DecimerExtractor()
         self._ocr = ocr or EasyOCRExtractor()
         self.tile = tile
@@ -190,6 +192,64 @@ def process(self, image: ImageLike) -> list[CompoundPair]:
         pairs = self.match(detections, image=img)
         return self.enrich(pairs, img)
 
+    def process_pdf(
+        self,
+        pdf_path: Path | str,
+        *,
+        dpi: int = 150,
+        output_pdf: Path | str | None = None,
+    ) -> list[list[CompoundPair]]:
+        """Run the full pipeline on every page of a PDF.
+
+        Pages are processed one at a time so memory usage stays bounded
+        regardless of document length.
+
+        Args:
+            pdf_path:   Path to the input PDF.
+            dpi:        Rendering resolution.  150 dpi works well for typical
+                        journal pages; use 200-300 for small or dense text.
+            output_pdf: Optional path for an annotated output PDF.  When given,
+                        each page is rendered with bounding boxes, pairing
+                        lines, and extracted SMILES / label text, then saved
+                        as a multi-page PDF.
+
+        Returns:
+            A list with one entry per page; each entry is a list of
+            ``CompoundPair`` objects with ``smiles`` and ``label_text``
+            populated.
+        """
+        import fitz  # pymupdf — required dependency
+
+        doc = fitz.open(str(pdf_path))
+        mat = fitz.Matrix(dpi / 72, dpi / 72)
+        all_results: list[list[CompoundPair]] = []
+
+        if output_pdf is not None:
+            import matplotlib.pyplot as plt
+            from matplotlib.backends.backend_pdf import PdfPages
+            from structflo.cser.viz import plot_results
+
+            pdf_out: PdfPages | None = PdfPages(str(output_pdf))
+        else:
+            pdf_out = None
+
+        try:
+            for page in doc:
+                pix = page.get_pixmap(matrix=mat, colorspace=fitz.csRGB)
+                img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+                pairs = self.process(img)
+                all_results.append(pairs)
+                if pdf_out is not None:
+                    fig = plot_results(img, pairs)
+                    pdf_out.savefig(fig, bbox_inches="tight")
+                    plt.close(fig)
+        finally:
+            doc.close()
+            if pdf_out is not None:
+                pdf_out.close()
+
+        return all_results
+
     # ------------------------------------------------------------------
     # Output helpers  (static — can also be called on the class directly)
     # ------------------------------------------------------------------
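
Note on the `dpi` argument of `process_pdf` above: PDF page sizes are expressed in points (72 per inch), so the `fitz.Matrix(dpi / 72, dpi / 72)` scale multiplies each page dimension by `dpi / 72`. The sketch below works through that arithmetic in plain Python. The helper name is hypothetical, the A4 size of 595 x 842 pt is the usual rounded value, and the pixel counts a renderer actually reports may differ by a pixel because of rounding.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt, height_pt, dpi=150):
    """Approximate pixel size of a page rendered at `dpi` (hypothetical helper)."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


A4_PT = (595, 842)  # A4 in points, rounded
sizes = {dpi: rendered_size(*A4_PT, dpi=dpi) for dpi in (150, 200, 300)}
for dpi, (w, h) in sizes.items():
    print(f"{dpi} dpi -> {w} x {h} px ({w * h / 1e6:.1f} MP)")
```

Doubling the resolution from 150 to 300 dpi gives roughly four times as many pixels per page to render and pass to the detector, which is worth keeping in mind when following the docstring's advice to use 200-300 dpi for small or dense text.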