Swap to using 'get_node_ids' that may be sharded or fetch a split, etc #467

kmontemayor2-sc · 2026-01-27T17:56:04Z

Scope of work done

Per the discussion in #438, this migrates to a more flexible get_node_ids, which can fetch a specific split and optionally shard the result.
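For illustration, a minimal sketch of the kind of interface described above, assuming node IDs are stored per split in a plain dictionary and that sharding is a contiguous split. The names `get_node_ids_sketch` and `_shard_contiguous`, the `split` argument, and the error on a half-specified rank/world_size pair are all hypothetical and are not GiGL's actual signatures or behavior; the real code operates on torch tensors.

```python
from typing import Dict, List, Optional, Sequence


def _shard_contiguous(ids: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Return the rank-th of world_size near-equal contiguous chunks (hypothetical helper)."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    base, extra = divmod(len(ids), world_size)
    # The first `extra` ranks each take one additional element.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(ids[start:end])


def get_node_ids_sketch(
    ids_by_split: Dict[str, Sequence[int]],
    split: Optional[str] = None,
    rank: Optional[int] = None,
    world_size: Optional[int] = None,
) -> List[int]:
    """Fetch node IDs for one split (or all splits), optionally sharded."""
    if split is None:
        # No split requested: concatenate every split in insertion order.
        nodes = [i for ids in ids_by_split.values() for i in ids]
    else:
        nodes = list(ids_by_split[split])
    # Shard only when both rank and world_size are provided.
    if rank is not None and world_size is not None:
        return _shard_contiguous(nodes, rank, world_size)
    # Assumption: passing only one of the two is treated as a caller error.
    if rank is not None or world_size is not None:
        raise ValueError("rank and world_size must be provided together")
    return nodes
```

Calling it with neither `rank` nor `world_size` returns the full list for the requested split, which mirrors the "optionally shard" wording in the description.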

Additionally, this breaks the dataset-building utils out into their own (tested) file, tests/test_assets/distributed/test_dataset.py, and adds remote_dist_dataset_test.py as unit tests for RDI.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

kmontemayor2-sc · 2026-01-27T17:56:12Z

/unit_test_py

kmontemayor2-sc · 2026-01-27T17:56:19Z

/integration_test

kmontemayor2-sc · 2026-01-27T17:56:23Z

/e2e_test

github-actions · 2026-01-27T17:56:25Z

GiGL Automation

@ 17:56:25UTC : 🔄 Python Unit Test started.

@ 19:10:59UTC : ✅ Workflow completed successfully.

github-actions · 2026-01-27T17:56:36Z

GiGL Automation

@ 17:56:36UTC : 🔄 Integration Test started.

github-actions · 2026-01-27T17:56:56Z

GiGL Automation

@ 17:56:55UTC : 🔄 E2E Test started.

@ 19:17:55UTC : ✅ Workflow completed successfully.

kmontemayor2-sc · 2026-01-28T23:31:01Z

/unit_test_py

kmontemayor2-sc · 2026-01-28T23:31:06Z

/integration_test

kmontemayor2-sc · 2026-01-28T23:31:10Z

/e2e_test

github-actions · 2026-01-28T23:31:16Z

GiGL Automation

@ 23:31:16UTC : 🔄 Python Unit Test started.

@ 00:39:48UTC : ✅ Workflow completed successfully.

github-actions · 2026-01-28T23:31:20Z

GiGL Automation

@ 23:31:19UTC : 🔄 Integration Test started.

github-actions · 2026-01-28T23:31:21Z

GiGL Automation

@ 23:31:21UTC : 🔄 E2E Test started.

@ 00:55:47UTC : ✅ Workflow completed successfully.

kmontemayor2-sc · 2026-01-28T23:32:35Z

/unit_test_py

kmontemayor2-sc · 2026-01-28T23:32:40Z

/integration_test

github-actions · 2026-01-28T23:32:48Z

GiGL Automation

@ 23:32:48UTC : 🔄 Python Unit Test started.

@ 00:51:43UTC : ✅ Workflow completed successfully.

github-actions · 2026-01-28T23:32:52Z

GiGL Automation

@ 23:32:52UTC : 🔄 Integration Test started.

kmontemayor2-sc · 2026-01-29T18:21:28Z

/unit_test_py

kmontemayor2-sc · 2026-01-29T18:21:34Z

/integration_test

kmontemayor2-sc · 2026-01-29T18:21:39Z

/e2e_test

github-actions · 2026-01-29T18:21:41Z

GiGL Automation

@ 18:21:41UTC : 🔄 Python Unit Test started.

@ 19:40:00UTC : ✅ Workflow completed successfully.

github-actions · 2026-01-29T18:21:50Z

GiGL Automation

@ 18:21:49UTC : 🔄 E2E Test started.

@ 19:43:14UTC : ✅ Workflow completed successfully.

github-actions · 2026-01-29T18:21:53Z

GiGL Automation

@ 18:21:52UTC : 🔄 Integration Test started.

mkolodner-sc

Thanks Kyle! Left a few small comments, generally LGTM.

In the future, it might be easier to review this if the testing utility changes were moved to a separate PR from the get_node_ids change here.

gigl/distributed/graph_store/remote_dist_dataset.py

mkolodner-sc · 2026-01-29T20:26:25Z

gigl/distributed/graph_store/storage_utils.py

-def get_node_ids_for_rank(
-    rank: int,
-    world_size: int,
-    node_type: Optional[NodeType] = DEFAULT_HOMOGENEOUS_NODE_TYPE,


Previously this was defaulted to DEFAULT_HOMOGENEOUS_NODE_TYPE. Does this mean that users will need to provide this now if they are operating in the "homogeneous_with_labeled_edge_type" setting?

Yeah, I updated it. I forgot that we do have truly homogeneous settings in GiGL (e.g. inference for homogeneous datasets).

We don't have any users so this should be a safe change?

mkolodner-sc · 2026-01-29T20:27:14Z

gigl/distributed/graph_store/storage_utils.py

-    return shard_nodes_by_process(nodes, rank, world_size)
+
+    if rank is not None and world_size is not None:
+        return shard_nodes_by_process(nodes, rank, world_size)


Thanks! Just to double check, our existing dataloader classes are also doing the sharding under the hood right now, right?

They are also doing sharding for the colocated mode. We should probably consolidate tbh, but even then this shard is for the (compute node, compute world size) world, not the (compute process, num compute processes) world.

Maybe a good idea to rename this to split tensor or something?
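To make the distinction in the reply concrete, here is a small sketch assuming contiguous sharding at two levels: node IDs are first split across compute nodes, and each compute node's shard is then split again across its local processes. The `shard` helper and the sizes are hypothetical; GiGL's `shard_nodes_by_process` may partition differently, and this thread does not show where (or whether) the second level happens in graph store mode.

```python
from typing import List, Sequence


def shard(ids: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Return the rank-th of world_size near-equal contiguous chunks (hypothetical helper)."""
    base, extra = divmod(len(ids), world_size)
    start = rank * base + min(rank, extra)
    return list(ids[start : start + base + (1 if rank < extra else 0)])


node_ids = list(range(10))
num_compute_nodes = 2  # "compute world size" in the reply above
procs_per_node = 2     # "num compute processes" per compute node

# Level 1: keyed by (compute node rank, number of compute nodes).
per_node = [shard(node_ids, n, num_compute_nodes) for n in range(num_compute_nodes)]

# Level 2: keyed by (local process rank, processes per compute node).
per_process = [
    [shard(node_shard, p, procs_per_node) for p in range(procs_per_node)]
    for node_shard in per_node
]
```

In this sketch every ID lands in exactly one process-level shard, which is the property either level of sharding needs to preserve.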

mkolodner-sc · 2026-01-29T20:28:38Z

tests/test_assets/distributed/test_dataset.py

+# =============================================================================
+
+USER: Final[NodeType] = NodeType("user")
+STORY: Final[NodeType] = NodeType("story")


I believe there was a prior comment that we should prefer an "item" type over "story" to be more generic?

man I can update but is there really no place for fun and whimsy in the world...

mkolodner-sc · 2026-01-29T20:29:40Z

tests/test_assets/distributed/test_dataset.py

+def create_homogeneous_dataset(
+    edge_index: torch.Tensor,
+    node_features: Optional[torch.Tensor] = None,
+    node_feature_dim: int = DEFAULT_HOMOGENEOUS_NODE_FEATURE_DIM,


Should node feature dim be inferred from node features?

I see that right now, if it's None, we are creating an empty tensor. I feel like None should mean my dataset has no node features, and if we want to indicate that there are no node features on this machine, an empty tensor should be provided as input.

Also, do you think it's worth adding an arg here for edge features while we are making this, as well as node_labels to be similar to the below function?
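A minimal sketch of the semantics suggested in this comment, assuming features are shaped [num_nodes, feature_dim]. The function name is hypothetical, and `feature_shape` stands in for `node_features.shape` so the example runs without torch.

```python
from typing import Optional, Tuple


def resolve_node_feature_dim(feature_shape: Optional[Tuple[int, int]]) -> Optional[int]:
    """Infer the feature dim from a [num_nodes, feature_dim] shape.

    None models `node_features=None`, i.e. the dataset has no node features
    at all, so there is no dimension to report.
    """
    if feature_shape is None:
        return None
    if len(feature_shape) != 2:
        raise ValueError(f"expected a 2-D shape, got {feature_shape}")
    _, feature_dim = feature_shape
    # An empty local shard such as (0, 8) still carries its feature dim.
    return feature_dim
```

Under this reading a separate `node_feature_dim` argument becomes unnecessary, since an empty tensor of shape (0, feature_dim) already distinguishes "no rows on this machine" from "no features in the dataset".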

mkolodner-sc · 2026-01-29T20:32:40Z

tests/test_assets/distributed/test_dataset.py

+        node_features: Mapping of NodeType -> feature tensor [num_nodes, feature_dim].
+            If None, creates zero features with the specified dimension.
+        node_labels: Mapping of NodeType -> label tensor [num_nodes, 1].
+            If None, creates labels equal to node indices.


For reusability in other tests, I feel like the None behavior again should be to default to no labels/features, not to auto-populate them.

mkolodner-sc · 2026-01-29T20:33:30Z

tests/test_assets/distributed/test_dataset.py

+    return dataset
+
+
+def create_heterogeneous_dataset_with_labels(


Might be worth specifying in the function header that these are ABLP labels, not node classification labels.

Swap to using 'get_node_ids' that may be sharded or fetch a split, etc

55e1a4f

Merge branch 'main' into kmonte/get-node-ids

81c46c7

more tests

78e05a8

kmonte added 4 commits January 29, 2026 00:41

updates

a75e65c

Merge branch 'main' into kmonte/get-node-ids

698e1c2

simplify

d105d69

update for sharding

d3a197e

mkolodner-sc reviewed Jan 29, 2026


fixes

957438e
