SageMaker ProcessingJob fails pydantic validation when tags are passed as part of the job #5442

@sateeshmannar

Description

PySDK Version

  • PySDK V2 (2.x)
  • PySDK V3 (3.x)

Describe the bug
In SageMaker 3.x, Processor has a tags parameter, but ProcessingJob does not define one. When a processing job is run with tags, the job itself succeeds, but pydantic validation of the resulting ProcessingJob fails and crashes the pipeline.
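The failure mode can be reproduced in isolation with plain pydantic v2. The sketch below is not the SDK's code: `StrictJob` and its fields are hypothetical stand-ins, and it assumes the real ProcessingJob model forbids extra fields, which is what the `extra_forbidden` error type in the traceback suggests.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictJob(BaseModel):
    """Hypothetical stand-in for a resource model that rejects unknown fields."""

    model_config = ConfigDict(extra="forbid")

    processing_job_name: str
    role_arn: Optional[str] = None


# Payload shaped like a create request that also carries tags.
payload = {
    "processing_job_name": "processing-job-example",
    "tags": [{"key": "project", "value": "tags-testing"}],
}

try:
    StrictJob(**payload)
    error_types = []
except ValidationError as exc:
    # Each error entry records the offending field location and the error type.
    error_types = [(err["loc"], err["type"]) for err in exc.errors()]

print(error_types)
```

Because the model has no `tags` field and forbids extras, construction fails even though every declared field is valid, which matches the behavior reported here.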

To reproduce
Create a simple processing job by instantiating a Processor with the tags parameter, where tags = [{'Key': 'project', 'Value': 'tags-testing'}], and then call
processor.run(...)
The .run call throws the following error:

        def submit(request):
            try:
                logger.info("Creating processing-job with name %s", process_args["job_name"])
                logger.debug("process request: %s", json.dumps(request, indent=4))
                self.sagemaker_session.sagemaker_client.create_processing_job(**request)
            except Exception as e:
                troubleshooting = (
                    "https://docs.aws.amazon.com/sagemaker/latest/dg/"
                    "sagemaker-python-sdk-troubleshooting.html"
                    "#sagemaker-python-sdk-troubleshooting-create-processing-job"
                )
                logger.error(
                    "Please check the troubleshooting guide for common errors: %s", troubleshooting
                )
                raise e
    
        self.sagemaker_session._intercept_create_request(serialized_request, submit, "process")
    
        from sagemaker.core.utils.code_injection.codec import transform
    
        transformed = transform(serialized_request, "CreateProcessingJobRequest")
    
>       return ProcessingJob(**transformed)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       pydantic_core._pydantic_core.ValidationError: 1 validation error for ProcessingJob
E       tags
E         Extra inputs are not permitted [type=extra_forbidden, input_value=[{'key': 'project', ....''}], input_type=list]
E           For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden

../../env/sagemaker-templates/py_312_1225/lib/python3.12/site-packages/sagemaker/core/processing.py:629: ValidationError
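Note that the traceback reports the tag as `{'key': 'project', ...}` although it was passed as `{'Key': 'project', ...}`. This suggests that `transform` converts the PascalCase request keys to snake_case before the model is built. A minimal sketch of such a conversion follows; `to_snake_case` and `snake_case_keys` are hypothetical helpers written for illustration, not the SDK's actual implementation.

```python
import re
from typing import Any


def to_snake_case(name: str) -> str:
    """Convert a PascalCase or camelCase key to snake_case."""
    # Insert an underscore before every capital letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Recursively rename dict keys, leaving leaf values untouched."""
    if isinstance(value, dict):
        return {to_snake_case(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


# Request shaped like the one sent to create_processing_job.
request = {
    "ProcessingJobName": "processing-job-example",
    "Tags": [{"Key": "project", "Value": "tags-testing"}],
}

transformed = snake_case_keys(request)
print(transformed)
```

Under this assumption, the `Tags` entry of the request survives the transformation as `tags`, and it is this key that the ProcessingJob model then rejects.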

Additional code sample:
# Initialize the Processor
# image_uri: The URI of your Docker image in Amazon ECR
processor = ScriptProcessor(
    image_uri='123456789012.dkr.ecr.us-west-2.amazonaws.com',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    command=['python3'],
    tags=[{'Key': 'project', 'Value': 'tags-testing'}],  # Pass the tags here
    base_job_name='processing-job-example'
)

# Submit the processing job
# Note: You can pass a local or S3 script to 'code'
processor.run(
    code='preprocess.py',
    wait=True
)

Expected behavior
ProcessingJob should include a tags parameter.
Submitting a processing job with tags should not throw the following error:


E       pydantic_core._pydantic_core.ValidationError: 1 validation error for ProcessingJob
E       tags
E         Extra inputs are not permitted [type=extra_forbidden, input_value=[{'key': 'project', 

Screenshots or logs
The fix should be made to the ProcessingJob class in sagemaker/core/processing.py.
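One possible shape for the fix, sketched with plain pydantic v2 rather than the SDK's own classes: declare an optional `tags` field so that the transformed request validates. `Tag` and `JobWithTags` are hypothetical names used only for this illustration, and the real model may need a different field type.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict


class Tag(BaseModel):
    """Hypothetical tag model using the snake_case keys seen in the traceback."""

    model_config = ConfigDict(extra="forbid")

    key: str
    value: str


class JobWithTags(BaseModel):
    """Hypothetical job model that still forbids extras but declares tags."""

    model_config = ConfigDict(extra="forbid")

    processing_job_name: str
    tags: Optional[List[Tag]] = None  # optional, so untagged jobs keep working


job = JobWithTags(
    processing_job_name="processing-job-example",
    tags=[{"key": "project", "value": "tags-testing"}],
)
untagged = JobWithTags(processing_job_name="processing-job-example")

print(job.tags[0].key, job.tags[0].value)
```

Keeping the field optional with a default of `None` means existing callers that never pass tags are unaffected.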

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 3.x - currently tested on 3.3.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): NA / custom python script processing image
  • Framework version: NA
  • Python version: 3.12
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y

Additional context
Our org relies on tags to organize and audit jobs. Without this fix we are stuck on SageMaker SDK < 3.x.
