Conversation

@mgaido91 mgaido91 (Contributor) commented Jan 22, 2026

Why is it needed?

For sharing and reusing speech processors, it is convenient to be able to run speech processors in docker, so that the full environment is available. This is important e.g. for IWSLT campaigns, where organizers need to run participants' solutions.

What does the PR do?

It creates a new gRPC-based speech processor and an HTTP server that exposes a configured speech processor. In this way, the simulstream server/inference can be run by setting the HTTP-based speech processor and configuring it to communicate with the HTTP server, which can run in a Docker container.
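To make the setup concrete, below is a minimal sketch of what a client talking to the Dockerized HTTP server could look like. The endpoint path, field names, and `process_chunk` signature are assumptions for illustration, not the PR's actual API; only the base64/float32 waveform encoding mirrors what is visible in the PR's handler code.

```python
# Hedged sketch of an HTTP client for the Dockerized speech-processor server.
# Endpoint names and payload fields are ASSUMPTIONS, not the PR's real API.
import base64
import json
import urllib.request

import numpy as np


def encode_waveform(waveform: np.ndarray) -> str:
    """Serialize a float32 waveform as base64, as the server-side handler decodes it."""
    return base64.b64encode(waveform.astype(np.float32).tobytes()).decode("ascii")


def process_chunk(base_url: str, session_id: str, waveform: np.ndarray) -> dict:
    """POST one audio chunk to a (hypothetical) /process_chunk endpoint."""
    payload = json.dumps(
        {"session_id": session_id, "waveform": encode_waveform(waveform)}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/process_chunk",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # Expects a 200 response carrying a JSON body with the processor output.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The server side reconstructs the chunk with `np.frombuffer(base64.b64decode(waveform), dtype=np.float32)`, so the encoding above round-trips losslessly.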

How is this documented?

Updated documentation and added an example to build a docker in the repo.

How was the PR tested?

Manual runs.

@mgaido91 mgaido91 requested a review from sarapapi January 22, 2026 10:58
@mgaido91 mgaido91 self-assigned this Jan 22, 2026
@mgaido91 mgaido91 (Contributor, Author):

This depends on #12

@mgaido91 mgaido91 added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 22, 2026
@mgaido91 mgaido91 changed the title Enable using speech processor in Docker by using gRPC Enable using speech processor in Docker by using HTTP Jan 23, 2026
@sarapapi sarapapi (Contributor) left a comment


I just have a few questions/clarification comments, but in general it looks good to me.

```
docker run --rm --gpus=all -p 8080:8080 http_speech_processor
```

And then, you can use `simulstream` setting the proxy HTTP processor to access your
sarapapi (Contributor):

Suggested change:

```diff
- And then, you can use `simulstream` setting the proxy HTTP processor to access your
+ And then, you can use `simulstream` by setting the proxy HTTP processor to access your
```

mgaido91 (Contributor, Author):

Here `simulstream` is the command, not the tool.

@@ -0,0 +1,32 @@
# Example of Docker Speech Processor

This folder contains a Dockerfile that is a working example of how to build a Docker
sarapapi (Contributor):

Suggested change:

```diff
- This folder contains a Dockerfile that is a working example of how to build a Docker
+ This folder contains a [Dockerfile](examples/http_docker/Dockerfile) that is a working example of how to build a Docker
```

```
--metrics-log-file $YOUR_OUTPUT_JSONL_FILE
```

Please notice that this example Dockerfile runs a Canary sliding window speech processor.
sarapapi (Contributor):

Suggested change:

```diff
- Please notice that this example Dockerfile runs a Canary sliding window speech processor.
+ Please notice that [this Dockerfile example](examples/http_docker/Dockerfile) runs a Canary sliding window speech processor.
```

```python
        self.close_session(session_id)

    def shutdown(self) -> None:
        """
```
sarapapi (Contributor):

Either add comments like this to all methods, or drop this one, which is pretty redundant.


```python
    def get_speech_chunk_size(self, session_id):
        processor = self.speech_processor_manager.get(session_id)
        self._send_json_response(200, {"speech_chunk_size": processor.speech_chunk_size})
```
sarapapi (Contributor):

What is this 200?

mgaido91 (Contributor, Author):

These are standard HTTP status codes: 204 means No Content, and 200 just means OK ("everything good"). This is the standard HTTP protocol.
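One way to address the reviewer's readability concern without changing behavior would be to use the named status constants from Python's standard library instead of bare integers. A minimal sketch (not the PR's code):

```python
# http.HTTPStatus is an IntEnum: its members compare equal to the bare
# integers, so they can be passed anywhere a plain status code is expected.
from http import HTTPStatus

# 200: success, a response body follows.
assert HTTPStatus.OK == 200
# 204: success, but deliberately no response body.
assert HTTPStatus.NO_CONTENT == 204

# e.g. self._send_json_response(HTTPStatus.NO_CONTENT) reads as
# self-documenting, where a bare 204 prompted the question above.
```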

```python
        processor = self.speech_processor_manager.get(session_id)
        output = processor.process_chunk(
            np.frombuffer(base64.b64decode(waveform), dtype=np.float32))
        self._send_json_response(200, {
```
sarapapi (Contributor):

Same here

```python
    def put_source_language(self, session_id, language):
        processor = self.speech_processor_manager.get(session_id)
        processor.set_source_language(language)
        self._send_json_response(204)
```
sarapapi (Contributor):

Add a comment about the meaning of the response

```python
        yaml_config(args.speech_processor_config), server_config.pool_size, server_config.ttl
    )
    speech_processor_loading_time = time.time() - speech_processor_loading_time
    LOGGER.info(f"Loaded speech processor in {speech_processor_loading_time:.3f} seconds")
```
sarapapi (Contributor):

This is the time to load the model, also, right?

mgaido91 (Contributor, Author):

yes
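Since the measured interval includes model loading, the log message could say so explicitly. A sketch of the same timing pattern (not the PR's code; `time.perf_counter()` is used here because it is a monotonic, high-resolution clock, generally preferable to `time.time()` for measuring durations):

```python
# Sketch: time the processor construction and make the log message explicit
# that model loading is included in the measured interval.
import time

start = time.perf_counter()
# ... build the speech processor here; this includes loading model weights,
# which typically dominates the measured time ...
loading_time = time.perf_counter() - start
print(f"Loaded speech processor (incl. model weights) in {loading_time:.3f} seconds")
```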

Co-authored-by: sarapapi <57095209+sarapapi@users.noreply.github.com>
