Open
Description
The BERT task is currently being added to MLBench on this branch. Pre-processing works, and all pre-processed data is already stored in a bucket. However, pre-training requires scaling the data by 10x, which results in almost 370GB of data. Each worker cannot download this much data, as that would require a very large disk on every worker.
One way around this would be to mount the bucket containing all pre-processed shards and download them on demand.
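
A minimal sketch of what on-demand loading could look like, assuming the bucket is already mounted at a local path on each worker and that shards are flat files in that directory. The names `ShardCache`, `shards_for_worker`, `mount_dir`, `cache_dir`, and `max_shards` are hypothetical and not part of MLBench; the sketch only illustrates copying a shard to local disk when it is first needed and evicting the oldest one so disk usage stays bounded.

```python
import shutil
from collections import OrderedDict
from pathlib import Path


class ShardCache:
    """Copy shards from a mounted bucket to local disk on demand.

    At most `max_shards` shards are kept locally; the least recently
    used shard is deleted when the limit is exceeded.
    """

    def __init__(self, mount_dir, cache_dir, max_shards=2):
        self.mount_dir = Path(mount_dir)   # assumed mount point of the bucket
        self.cache_dir = Path(cache_dir)   # local scratch directory
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.max_shards = max_shards
        self._cached = OrderedDict()       # shard name -> local path, oldest first

    def fetch(self, name):
        """Return a local path for the shard, copying it first if needed."""
        if name in self._cached:
            self._cached.move_to_end(name)  # mark as most recently used
            return self._cached[name]
        local = self.cache_dir / name
        shutil.copyfile(self.mount_dir / name, local)
        self._cached[name] = local
        while len(self._cached) > self.max_shards:
            _, old = self._cached.popitem(last=False)
            old.unlink()                    # free local disk space
        return local


def shards_for_worker(all_shards, rank, world_size):
    """Round-robin split so each worker only touches its own shards."""
    return sorted(all_shards)[rank::world_size]
```

With this layout, a worker would call `shards_for_worker` once to get its list of shard names and then call `fetch` as training reaches each shard, so local disk usage is bounded by `max_shards` shards rather than the full 370GB.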