-
Notifications
You must be signed in to change notification settings - Fork 12
Swap to using 'get_node_ids' that may be sharded or fetch a split, etc #467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 17:56:25UTC : 🔄 @ 19:10:59UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:56:36UTC : 🔄 |
GiGL Automation@ 17:56:55UTC : 🔄 @ 19:17:55UTC : ✅ Workflow completed successfully. |
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 23:31:16UTC : 🔄 @ 24:39:48UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 23:31:19UTC : 🔄 |
GiGL Automation@ 23:31:21UTC : 🔄 @ 24:55:47UTC : ✅ Workflow completed successfully. |
|
/unit_test_py |
|
/integration_test |
GiGL Automation@ 23:32:48UTC : 🔄 @ 24:51:43UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 23:32:52UTC : 🔄 |
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 18:21:41UTC : 🔄 @ 19:40:00UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:21:49UTC : 🔄 @ 19:43:14UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:21:52UTC : 🔄 |
mkolodner-sc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kyle! Left a few small comments, generally LGTM.
In the future, it might be easier to review this if the testing utility changes were moved to a separate PR from the get_node_ids change here.
| def get_node_ids_for_rank( | ||
| rank: int, | ||
| world_size: int, | ||
| node_type: Optional[NodeType] = DEFAULT_HOMOGENEOUS_NODE_TYPE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this was defaulted to DEFAULT_HOMOGENEOUS_NODE_TYPE. Does this mean that users will need to provide this now if they are operating in the "homogeneous_with_labeled_edge_type" setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I updated, I forgot that we do have truly homogeneous settings in GiGL (e.g. inference for homogeneous datasets).
We don't have any users so this should be a safe change?
| return shard_nodes_by_process(nodes, rank, world_size) | ||
|
|
||
| if rank is not None and world_size is not None: | ||
| return shard_nodes_by_process(nodes, rank, world_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just to double check, our existing dataloader classes are also doing the sharding under the hood right now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are also doing sharding for the colocated mode, we should probably consolidate tbch but even then this shard is for the (compute node, compute world size), not (compute process, num compute processes) world.
Maybe a good idea to rename this to split tensor or something?
| # ============================================================================= | ||
|
|
||
| USER: Final[NodeType] = NodeType("user") | ||
| STORY: Final[NodeType] = NodeType("story") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there was a comment prior that we should prefer item type to story to be more generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
man I can update but is there really no place for fun and whimsy in the world...
| def create_homogeneous_dataset( | ||
| edge_index: torch.Tensor, | ||
| node_features: Optional[torch.Tensor] = None, | ||
| node_feature_dim: int = DEFAULT_HOMOGENEOUS_NODE_FEATURE_DIM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should node feature dim be inferred from node features?
I see that right now, if it's None, we are creating an empty tensor. I feel like if it's None, this should mean my dataset has no NodeFeatures, and if we want to indicate there's no node features on the machine, an empty tensor should be provided as input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you think it's worth adding an arg here for edge features while we are making this, as well as node_labels to be similar to the below function?
| node_features: Mapping of NodeType -> feature tensor [num_nodes, feature_dim]. | ||
| If None, creates zero features with the specified dimension. | ||
| node_labels: Mapping of NodeType -> label tensor [num_nodes, 1]. | ||
| If None, creates labels equal to node indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reusability in other tests, I feel like the None behavior again should be to default to no labels/features, not to auto-populate them.
| return dataset | ||
|
|
||
|
|
||
| def create_heterogeneous_dataset_with_labels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth specifying in function header that this is ABLP labels, not node classification
Scope of work done
Per discussion in #438, migrating to some
get_node_idswhich is more flexible and can fetch a split and optionally shard, etc.Additionally, breaking out the dataset building utils to their own (tested) file,
tests/test_assets/distributed/test_dataset.pyand then creatingremote_dist_dataset_test.pyas unit tests for RDI.Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO