[AUTO-MERGE] Polished existing vector database notebook, added to common workflows #753
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #753   +/-   ##
=======================================
  Coverage   64.69%   64.69%
=======================================
  Files          61       61
  Lines        5894     5894
=======================================
  Hits         3813     3813
  Misses       2081     2081
```
aritraghsh09
left a comment
Looks good!
Note for the future for all our notebooks: Because we say we are an astronomy framework, we should really move away from CIFAR to something astronomy-related for all our notebooks, but we can do that later once we have basic working docs we are happy with!
Diff context: @@ -5,22 +5,22 @@ ("id": "f163b281")
Line #6: `# h.config["vector_db"]["qdrant"]["vector_size"] = 64  # must match the model's latent dimension`
Is the model's latent dimension something the user must know themselves, or is there some way to recover it from the model? If so, maybe it's good to mention this for users?
This is specific to the qdrant vector database. ChromaDB is more forgiving and detects the input size automatically. Qdrant requires it as an input configuration parameter. I'll add a note about it.
Also, the user would know what the latent dimension size is for a model that they created. It is also a configurable parameter for the HyraxAutoencoderV2 model. :)
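To illustrate the point above: since ChromaDB detects the embedding size automatically but Qdrant requires it up front, one way to recover the latent dimension programmatically is to encode a single sample and measure the output length. The `DummyEncoder` and `infer_vector_size` below are purely hypothetical stand-ins for a real model such as HyraxAutoencoderV2; Hyrax's actual API may differ.

```python
# Illustrative sketch only: DummyEncoder stands in for a real model
# (e.g. HyraxAutoencoderV2); it is not part of the Hyrax API.

class DummyEncoder:
    """Toy encoder with a fixed, configurable latent dimension."""

    def __init__(self, latent_dim: int = 64):
        self.latent_dim = latent_dim

    def encode(self, sample):
        # A real model would map the input to a latent vector; here we
        # just return a zero vector of the configured length.
        return [0.0] * self.latent_dim


def infer_vector_size(model, sample) -> int:
    """Recover the latent dimension by encoding one sample."""
    return len(model.encode(sample))


model = DummyEncoder(latent_dim=64)
vector_size = infer_vector_size(model, sample=[0.0, 1.0])
# Qdrant (unlike ChromaDB) needs this set explicitly, e.g.:
# h.config["vector_db"]["qdrant"]["vector_size"] = vector_size
```

This avoids a silent mismatch between the configured `vector_size` and what the model actually emits.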
Diff context: @@ -5,22 +5,22 @@ ("id": "f163b281")
Line #1: `conn = h.database_connection()`
Does this use the most recently timestamped directory? If so, is there some way to override the database used for the connection, like there is with save_to_database()? As with the latent space, maybe it's good to mention this for users if true?
Good question, yes it does, and yes, the user can specify a different database to connect to. I'll update the comments.
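The "newest timestamped directory by default, with an optional override" behavior described above can be sketched as a standalone helper. This is NOT Hyrax's implementation of `h.database_connection()`; `pick_results_dir` and the directory naming scheme are assumptions for illustration only.

```python
# Hedged sketch: not Hyrax's database_connection(), just an illustration
# of "pick the most recent timestamped run unless the user overrides it".
import tempfile
from pathlib import Path
from typing import Optional


def pick_results_dir(root: Path, override: Optional[str] = None) -> Path:
    """Return the override subdirectory if given, else the newest one."""
    if override is not None:
        return root / override
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        raise FileNotFoundError(f"no results directories under {root}")
    # Timestamped names like YYYYMMDD-HHMMSS sort lexicographically in
    # chronological order, so max() picks the most recent run.
    return max(subdirs, key=lambda p: p.name)


root = Path(tempfile.mkdtemp())
(root / "20240101-120000").mkdir()
(root / "20240302-090000").mkdir()

latest = pick_results_dir(root)                      # newest run by default
pinned = pick_results_dir(root, "20240101-120000")   # explicit override
```

Exposing the override in the notebook, as discussed above, lets a user pin a connection to an older run instead of whatever ran last.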
SamSandwich07
left a comment
Left a couple of questions I'm wondering about. Also, would it be good to include a sentence or two about when a user might want to use ChromaDB vs. Qdrant? Or is that already covered elsewhere, or something the user is expected to know?
Pull request overview
This PR polishes the existing pre-executed vector database demo notebook and adds it to the documentation’s “Common workflows” index, addressing issue #730 (“Demo how to create and query a vector database”).
Changes:
- Updated the vector DB demo notebook narrative and restructured sections (backend selection, database creation, querying, visualization).
- Adjusted demo configuration (model name, epochs, and data_request shape) and streamlined some notebook outputs.
- Added the notebook to docs/common_workflows.rst to make it discoverable.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/pre_executed/vector_db_demo.ipynb | Rewords and reorganizes the vector DB demo; updates config/code cells for model training, DB creation, queries, and visualization. |
| docs/common_workflows.rst | Adds the vector DB demo to the “Common workflows” toctree. |
```python
            "dataset_class": "HyraxCifarDataset",
            "data_location": "./data",
            "fields": ["image"],
            "primary_id_field": "object_id",
            "split_fraction": 1.0,
        },
    },
    "infer": {
        "data": {
            "dataset_class": "HyraxCifarDataset",
            "data_location": "./data",
            "fields": ["image", "object_id"],
            "primary_id_field": "object_id",
        },
    },
```
split_fraction is set for the train data_request group but not for the infer group while both share the same data_location ("./data"). Hyrax validates that if any group for a given data_location specifies split_fraction, then all groups for that data_location must specify it, and the fractions must sum to <= 1.0. As written, this configuration is expected to error at runtime when h.set_config("data_request", data_request) is called.
Suggested fix: remove split_fraction here, or switch fully to split_fraction-based partitioning by adding split_fraction to all relevant groups and ensuring the per-data_location fractions sum to <= 1.0.
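The validation rule described above can be sketched as a small standalone check. This `validate_split_fractions` helper is hypothetical, not Hyrax's actual validator; it only mirrors the stated rule: for each `data_location`, either no group sets `split_fraction` or all of them do, and the fractions must sum to at most 1.0.

```python
# Standalone sketch of the split_fraction rule described above;
# validate_split_fractions is NOT Hyrax's actual validation code.
from collections import defaultdict


def validate_split_fractions(data_request: dict) -> None:
    """Per data_location: all groups or none set split_fraction,
    and the fractions that are set must sum to <= 1.0."""
    by_location = defaultdict(list)
    for group in data_request.values():
        data = group.get("data", {})
        by_location[data.get("data_location")].append(data.get("split_fraction"))
    for location, fractions in by_location.items():
        set_fractions = [f for f in fractions if f is not None]
        if set_fractions and len(set_fractions) != len(fractions):
            raise ValueError(f"{location}: some groups set split_fraction, but not all")
        if sum(set_fractions) > 1.0:
            raise ValueError(f"{location}: split fractions sum to more than 1.0")


# Mirrors the config in the diff: "train" sets split_fraction, "infer"
# does not, and both share "./data" -- so this check raises.
bad = {
    "train": {"data": {"data_location": "./data", "split_fraction": 1.0}},
    "infer": {"data": {"data_location": "./data"}},
}
try:
    validate_split_fractions(bad)
    failed = False
except ValueError:
    failed = True
```

Either of the suggested fixes makes this check pass: drop `split_fraction` entirely, or set it on every group sharing the `data_location` with a total of at most 1.0.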
@aritraghsh09 The approach I've been taking (without talking about it) is to use CIFAR for the "common workflow" material, just because it's small and not specific to any one astronomy area. The science examples will then use real astro data. But to your point, if that seems wrong or unhelpful, we can definitely change it up.
Change Description
Closes #730
The vector database notebook was mostly ready to go; I just polished some of the wording. Also added it to the common workflows page.
FWIW, it's a pre-executed notebook, and we could probably export the inference results to a zipped file on Zenodo to save ourselves a bunch of cells at the beginning of the notebook. This could be an approach taken for a lot of notebooks...