This component is the FaaS equivalent of the Provision-Service. It is responsible for creating the resources needed to run jobs and (optionally) deleting them once the work is done. It also aggregates and stores logs from the other FaaS components. Unlike the Provision-Service, it uses plain HTTP instead of Kafka messages. It is an HTTP server with numerous endpoints.
The component itself is fairly generic: the resources it creates, as well as the order in which they are created, are all specified in its configuration. Not every combination of resources is valid, however. Whatever components are deployed must eventually notify the FaaS-Operator that their work has finished, otherwise it will not clean up the Kubernetes resources or its internal state. To be consistent with the terminology used in the code, we will use the word "function" to refer to one instance of the FaaS component in an EO4EU workflow. Each such instance must use the FaaS-Operator to run its code.
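As a rough illustration of what "resources and their order, specified in configuration" could look like, here is a hypothetical sketch in Go. The field names and the YAML schema are purely illustrative assumptions, not the actual faas-control schema:

```go
// Hypothetical model of the ordered resource list the FaaS-Operator
// might read from its ConfigMap. Field names are illustrative only.
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// ResourceStep names one manifest template to render, in creation order.
type ResourceStep struct {
	Name     string `yaml:"name"`     // e.g. "namespace", "gateway", "runner"
	Template string `yaml:"template"` // template file stored in the PVC
	PerBatch bool   `yaml:"perBatch"` // rendered once per batch (N times) if true
}

type Config struct {
	Resources []ResourceStep `yaml:"resources"`
}

func main() {
	raw := `
resources:
  - {name: namespace, template: namespace.yaml.tmpl}
  - {name: gateway,   template: gateway.yaml.tmpl}
  - {name: runner,    template: runner.yaml.tmpl, perBatch: true}
`
	var cfg Config
	if err := yaml.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	for i, step := range cfg.Resources {
		fmt.Printf("%d: create %s from %s (perBatch=%v)\n",
			i, step.Name, step.Template, step.PerBatch)
	}
}
```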
The state of the FaaS-Operator consists of the following resources:
- A Kubernetes ConfigMap that holds all the configuration options. Refer to the `faas-control` repository for examples.
- An SQLite3 database stored in a PVC. This, in turn, keeps track of:
  - The progress and resulting metadata of each running function
  - The logs from all the FaaS components
- A series of Go template files that are used to dynamically create Kubernetes YAML manifests (a rendering sketch follows this list). These are stored in the same PVC as the SQLite3 database and must be uploaded manually by the administrator using the `/admin/manifests` HTTP endpoint of the FaaS-Operator. Refer to the `faas-control` repository for examples.
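To make the template mechanism concrete, here is a minimal sketch of rendering one such Go template into a Kubernetes manifest with the standard `text/template` package. The template body and parameter fields are assumptions for illustration; see the `faas-control` repository for the real templates:

```go
// Render a (hypothetical) FaaS-Runner Job manifest from a Go template.
package main

import (
	"os"
	"text/template"
)

type ManifestParams struct {
	FunctionUUID string
	Namespace    string
	BatchID      int
}

const runnerTmpl = `apiVersion: batch/v1
kind: Job
metadata:
  name: faas-runner-{{ .FunctionUUID }}-{{ .BatchID }}
  namespace: {{ .Namespace }}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: runner
          image: faas-runner:latest
`

func main() {
	tmpl := template.Must(template.New("runner").Parse(runnerTmpl))
	params := ManifestParams{
		FunctionUUID: "1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed",
		Namespace:    "faas-1b9d6bcd",
		BatchID:      3,
	}
	// Print the rendered YAML; the operator would instead submit it
	// to the Kubernetes API.
	if err := tmpl.Execute(os.Stdout, params); err != nil {
		panic(err)
	}
}
```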
A properly configured FaaS function runs like this (a sketch of the submit/poll exchange follows the list):
- The FaaS-Proxy sends a `submit` request to the FaaS-Operator, containing the UUID of the workflow and all the necessary function parameters.
- If there are no currently running functions with the same UUID, the FaaS-Operator splits the data into N batches and then creates all the Kubernetes resources. By default, this consists of a new namespace, one FaaS-Gateway deployment and N FaaS-Runner jobs, along with auxiliary resources.
- The FaaS-Proxy periodically polls the FaaS-Operator using the `/poll` endpoint to see if the function has finished.
- Once N/N batches have sent a message to the `/end/{batch ID}` endpoint, the function is marked as finished and the namespace is deleted.
- The next poll request from the FaaS-Proxy returns a list of responses, one for each job, as well as metainfo containing the location of all the output files.
- The FaaS-Proxy aggregates the responses into a metainfo text and sends it to the next component via Kafka. The Notification-Manager is informed that the component is finished.
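The sketch below shows the FaaS-Proxy side of this exchange: one submit request followed by polling until the function finishes. The hostname, payload fields, and response shape are assumptions based on the description above, not the exact API:

```go
// Hypothetical FaaS-Proxy client: submit a function, then poll for it.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

const operator = "http://faas-operator:8080" // assumed address

func main() {
	uuid := "1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed"

	// Submit the function with its workflow UUID and parameters.
	body, _ := json.Marshal(map[string]any{
		"uuid":   uuid,
		"params": map[string]string{"input": "s3://bucket/input"},
	})
	resp, err := http.Post(operator+"/submit", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Poll until the operator reports the function as finished.
	for {
		time.Sleep(5 * time.Second)
		resp, err := http.Get(operator + "/poll?uuid=" + uuid)
		if err != nil {
			panic(err)
		}
		var result struct {
			Finished  bool     `json:"finished"`
			Responses []string `json:"responses"`
		}
		json.NewDecoder(resp.Body).Decode(&result)
		resp.Body.Close()
		if result.Finished {
			fmt.Println("responses:", result.Responses)
			break
		}
	}
}
```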
More granularly, provisioning and execution proceed like this (a sketch of the runner-side HTTP interactions follows the list):
- The FaaS-Operator creates a FaaS-Gateway deployment (refer to the `faas-gateway` repository) and waits for it to send a message to the `/ping/{function UUID}` endpoint. If it doesn't see a ping for some time, the function provisioning is considered failed.
- The FaaS-Operator creates N FaaS-Runner jobs (refer to the `faas-runner` repository) and waits for all of them to send a message to the `/ping/{function UUID}/{batch ID}` endpoint. If it doesn't see a ping for some time, the function provisioning is considered failed.
- A Kubernetes NetworkPolicy is created such that the FaaS-Runners can only reach endpoints in their own namespace. Any communication with the outside is handled by the FaaS-Gateway.
- Each runner requests the names of the user-specified packages from the gateway's `/pkg/list` endpoint. The gateway responds with HTTP 202 Accepted until it has downloaded all the packages. The packages are downloaded as wheel (`.whl`) files; source distributions are not supported.
- Upon receiving an HTTP 200 OK with a list of packages as content, each runner requests the actual package contents from the gateway's `/pkg/get` endpoint for each package name it is given.
- Each runner installs all the wheel files fetched from the gateway.
- Each runner requests the necessary data files from the gateway's `/s3/get` endpoint.
- The data files are downloaded on demand by the gateway.
- Each runner runs the user's code under a separate (non-root) Linux user.
- Each runner sends the output files to the gateway and requests that they be uploaded to S3 via the `/s3/set` endpoint.
- The gateway checks that the files have one of the accepted MIME types and then uploads them to S3.
- Each runner sends an `/end/{batch ID}` request to the operator, containing the batch's full response and metainfo.
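Two of the runner-side interactions above lend themselves to a short sketch: retrying the gateway's `/pkg/list` endpoint while it still answers 202 Accepted, and reporting completion to the operator's `/end/{batch ID}` endpoint. Hostnames, the batch ID, and the payload fields are assumptions for illustration:

```go
// Hypothetical runner-side client for the package-list retry loop and
// the final end-of-batch report.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	gateway := "http://faas-gateway:8080"   // assumed address
	operator := "http://faas-operator:8080" // assumed address

	// Keep asking for the package list until the gateway has finished
	// downloading all the wheels (202 Accepted means "not ready yet").
	var packages []string
	for {
		resp, err := http.Get(gateway + "/pkg/list")
		if err != nil {
			panic(err)
		}
		if resp.StatusCode == http.StatusOK {
			json.NewDecoder(resp.Body).Decode(&packages)
			resp.Body.Close()
			break
		}
		resp.Body.Close()
		time.Sleep(2 * time.Second)
	}
	fmt.Println("packages ready:", packages)

	// ... fetch wheels via /pkg/get, install them, run the user code,
	// upload outputs via /s3/set ...

	// Report this batch as done, with its response and metainfo.
	end, _ := json.Marshal(map[string]string{
		"response": "ok",
		"metainfo": "s3://bucket/output",
	})
	resp, err := http.Post(operator+"/end/3", "application/json", bytes.NewReader(end))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
}
```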
