Skip to content

Conversation

@kmontemayor2-sc
Copy link
Collaborator

@kmontemayor2-sc kmontemayor2-sc commented Jan 27, 2026

Scope of work done

Per discussion in #438, migrating to some get_node_ids which is more flexible and can fetch a split and optionally shard, etc.

Additionally, breaking out the dataset building utils to their own (tested) file, tests/test_assets/distributed/test_dataset.py and then creating remote_dist_dataset_test.py as unit tests for RDI.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

@github-actions
Copy link
Contributor

github-actions bot commented Jan 27, 2026

GiGL Automation

@ 17:56:25UTC : 🔄 Python Unit Test started.

@ 19:10:59UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

GiGL Automation

@ 17:56:36UTC : 🔄 Integration Test started.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 27, 2026

GiGL Automation

@ 17:56:55UTC : 🔄 E2E Test started.

@ 19:17:55UTC : ✅ Workflow completed successfully.

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

@github-actions
Copy link
Contributor

github-actions bot commented Jan 28, 2026

GiGL Automation

@ 23:31:16UTC : 🔄 Python Unit Test started.

@ 24:39:48UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

GiGL Automation

@ 23:31:19UTC : 🔄 Integration Test started.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 28, 2026

GiGL Automation

@ 23:31:21UTC : 🔄 E2E Test started.

@ 24:55:47UTC : ✅ Workflow completed successfully.

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@github-actions
Copy link
Contributor

github-actions bot commented Jan 28, 2026

GiGL Automation

@ 23:32:48UTC : 🔄 Python Unit Test started.

@ 24:51:43UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

GiGL Automation

@ 23:32:52UTC : 🔄 Integration Test started.

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

@github-actions
Copy link
Contributor

github-actions bot commented Jan 29, 2026

GiGL Automation

@ 18:21:41UTC : 🔄 Python Unit Test started.

@ 19:40:00UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 29, 2026

GiGL Automation

@ 18:21:49UTC : 🔄 E2E Test started.

@ 19:43:14UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

GiGL Automation

@ 18:21:52UTC : 🔄 Integration Test started.

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle! Left a few small comments, generally LGTM.

In the future, it might be easier to review this if the testing utility changes were moved to a separate PR from the get_node_ids change here.

def get_node_ids_for_rank(
rank: int,
world_size: int,
node_type: Optional[NodeType] = DEFAULT_HOMOGENEOUS_NODE_TYPE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this was defaulted to DEFAULT_HOMOGENEOUS_NODE_TYPE. Does this mean that users will need to provide this now if they are operating in the "homogeneous_with_labeled_edge_type" setting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I updated, I forgot that we do have truly homogeneous settings in GiGL (e.g. inference for homogeneous datasets).

We don't have any users so this should be a safe change?

return shard_nodes_by_process(nodes, rank, world_size)

if rank is not None and world_size is not None:
return shard_nodes_by_process(nodes, rank, world_size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Just to double check, our existing dataloader classes are also doing the sharding under the hood right now right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are also doing sharding for the colocated mode, we should probably consolidate tbch but even then this shard is for the (compute node, compute world size), not (compute process, num compute processes) world.

Maybe a good idea to rename this to split tensor or something?

# =============================================================================

USER: Final[NodeType] = NodeType("user")
STORY: Final[NodeType] = NodeType("story")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there was a comment prior that we should prefer item type to story to be more generic?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

man I can update but is there really no place for fun and whimsy in the world...

def create_homogeneous_dataset(
edge_index: torch.Tensor,
node_features: Optional[torch.Tensor] = None,
node_feature_dim: int = DEFAULT_HOMOGENEOUS_NODE_FEATURE_DIM,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should node feature dim be inferred from node features?

I see that right now, if it's None, we are creating an empty tensor. I feel like if it's None, this should mean my dataset has no NodeFeatures, and if we want to indicate there's no node features on the machine, an empty tensor should be provided as input

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you think it's worth adding an arg here for edge features while we are making this, as well as node_labels to be similar to the below function?

node_features: Mapping of NodeType -> feature tensor [num_nodes, feature_dim].
If None, creates zero features with the specified dimension.
node_labels: Mapping of NodeType -> label tensor [num_nodes, 1].
If None, creates labels equal to node indices.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reusability in other tests, I feel like the None behavior again should be to default to no labels/features, not to auto-populate them.

return dataset


def create_heterogeneous_dataset_with_labels(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth specifying in function header that this is ABLP labels, not node classification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants