Description
PySDK Version
- PySDK V2 (2.x)
- PySDK V3 (3.x)
Describe the bug
In SageMaker 3.x, Processor has a tags parameter. However, ProcessingJob doesn't define a tags parameter. When a processing job is run with tags, the job succeeds, but the pydantic validation of the returned ProcessingJob fails, crashing the pipeline.
To reproduce
Create a simple processing job by instantiating Processor with the tags parameter, where tags = [{'Key': 'project', 'Value': 'tags-testing'}], then call processor.run(...).
The .run call throws the following error:
    def submit(request):
        try:
            logger.info("Creating processing-job with name %s", process_args["job_name"])
            logger.debug("process request: %s", json.dumps(request, indent=4))
            self.sagemaker_session.sagemaker_client.create_processing_job(**request)
        except Exception as e:
            troubleshooting = (
                "https://docs.aws.amazon.com/sagemaker/latest/dg/"
                "sagemaker-python-sdk-troubleshooting.html"
                "#sagemaker-python-sdk-troubleshooting-create-processing-job"
            )
            logger.error(
                "Please check the troubleshooting guide for common errors: %s", troubleshooting
            )
            raise e

    self.sagemaker_session._intercept_create_request(serialized_request, submit, "process")

    from sagemaker.core.utils.code_injection.codec import transform

    transformed = transform(serialized_request, "CreateProcessingJobRequest")
>   return ProcessingJob(**transformed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E pydantic_core._pydantic_core.ValidationError: 1 validation error for ProcessingJob
E tags
E Extra inputs are not permitted [type=extra_forbidden, input_value=[{'key': 'project', ....''}], input_type=list]
E For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
../../env/sagemaker-templates/py_312_1225/lib/python3.12/site-packages/sagemaker/core/processing.py:629: ValidationError
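The failure mode in the traceback can be illustrated without AWS or pydantic. The following is a minimal standard-library sketch, not the SDK's code: StrictJob and from_request are hypothetical names, and the payload is an assumed, simplified version of the snake_case request that transform() appears to produce. It only mimics the behavior of a model that forbids undeclared fields.

```python
from dataclasses import dataclass, fields


@dataclass
class StrictJob:
    """Hypothetical stand-in for a resource model that rejects undeclared fields."""

    processing_job_name: str
    role_arn: str

    @classmethod
    def from_request(cls, payload: dict) -> "StrictJob":
        declared = {f.name for f in fields(cls)}
        extra = sorted(set(payload) - declared)
        if extra:
            # Loosely mirrors pydantic's "extra_forbidden" validation error.
            raise ValueError(f"Extra inputs are not permitted: {extra}")
        return cls(**payload)


# Assumed shape of the transformed request: keys are snake_case and
# "tags" is present even though the model does not declare it.
transformed = {
    "processing_job_name": "processing-job-example",
    "role_arn": "arn:aws:iam::123456789012:role/example",
    "tags": [{"key": "project", "value": "tags-testing"}],
}

try:
    StrictJob.from_request(transformed)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

In this sketch the rejection happens while building the local object, which matches the report: the remote job has already been created by the time validation fails.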
Additional code sample:
# Assumed import path for SDK 3.x, inferred from the traceback (sagemaker/core/processing.py)
from sagemaker.core.processing import ScriptProcessor

# role: an IAM role ARN with SageMaker permissions, defined elsewhere

# Initialize the processor
# image_uri: the URI of your Docker image in Amazon ECR
processor = ScriptProcessor(
    image_uri='123456789012.dkr.ecr.us-west-2.amazonaws.com',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    command=['python3'],
    tags=[{'Key': 'project', 'Value': 'tags-testing'}],  # Pass the tags here
    base_job_name='processing-job-example',
)

# Submit the processing job
# Note: you can pass a local or S3 script to 'code'
processor.run(
    code='preprocess.py',
    wait=True,
)
Expected behavior
ProcessingJob should include a tags parameter.
After submission with tags, the processing job should not throw the following error:
E pydantic_core._pydantic_core.ValidationError: 1 validation error for ProcessingJob
E tags
E Extra inputs are not permitted [type=extra_forbidden, input_value=[{'key': 'project',
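The expected behavior can be sketched the same way. This is a hypothetical standard-library illustration, not the SDK's ProcessingJob: JobWithTags, Tag, and from_request are made-up names, and the payload is an assumed simplification of the transformed request. The only point is that once the model declares an optional tags field, the same payload is accepted while other undeclared fields are still rejected.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Tag:
    key: str
    value: str


@dataclass
class JobWithTags:
    """Hypothetical strict model that declares an optional tags field."""

    processing_job_name: str
    tags: Optional[List[Tag]] = None

    @classmethod
    def from_request(cls, payload: dict) -> "JobWithTags":
        declared = {f.name for f in fields(cls)}
        extra = sorted(set(payload) - declared)
        if extra:
            # Undeclared fields are still rejected.
            raise ValueError(f"Extra inputs are not permitted: {extra}")
        data = dict(payload)
        if data.get("tags") is not None:
            # Convert each tag dict into a typed Tag object.
            data["tags"] = [Tag(**tag) for tag in data["tags"]]
        return cls(**data)


# Assumed, simplified snake_case payload.
payload = {
    "processing_job_name": "processing-job-example",
    "tags": [{"key": "project", "value": "tags-testing"}],
}

job = JobWithTags.from_request(payload)
print(job.tags)
```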
Screenshots or logs
The fix should be made to the ProcessingJob class used in sagemaker/core/processing.py.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 3.x - currently tested on 3.3.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): NA / custom python script processing image
- Framework version: NA
- Python version: 3.12
- CPU or GPU: CPU
- Custom Docker image (Y/N): Y
Additional context
Our org relies on tags to organize and audit jobs. Without this fix, we are stuck with SageMaker SDK < 3.x.