
What is “production” Machine Learning?

August 16, 2022
Emmanuel Turlay
Founder, CEO

In traditional software development, “production” typically refers to instances of an application that are used by actual users – human or machine.

Whether it’s a web application, an application embedded in a device or machine, or a piece of infrastructure, production systems receive real-world traffic and are supposed to accomplish their mission without issues.

Production systems usually come with these guarantees:

  • Safety
    Before the application is deployed in production, it is thoroughly tested by an extensive suite of unit tests, integration tests, and sometimes manual tests. Scenarios covering happy-paths and corner cases are checked against expected results.
  • Traceability
    A record is kept of exactly what code was deployed, by whom, at what time, with what configurations, and to what infrastructure.
  • Observability
    Once the system is in production, it is possible to access logs, observe real-time resource usage, and get alerted when certain metrics veer outside of acceptable bounds.
  • Scalability
    The production system is able to withstand the expected incoming traffic and then some. If necessary, it is capable of scaling up or down based on demand and cost constraints.

What is a production ML system?

Some ML models are deployed and served behind an endpoint (e.g. REST, gRPC), or embedded directly inside a larger application. They generate inferences on demand for each new set of features sent their way (i.e. real-time inferencing).

Others are developed for the purpose of generating a set of inferences and persisting them in a database table as a one-time task (i.e. batch inferencing). For example, a model can predict customers’ lifetime values and write them to a table for consumption by other systems (metrics dashboards, downstream models, production applications, etc.).

Whether built for real-time or batch inferencing, a production ML system refers to the end-to-end training and inferencing pipeline: the entire chain of transformations that turns raw data sitting in a data warehouse into a trained model, which is then used to generate inferences.
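To make the contrast between the two serving modes concrete, here is a minimal sketch using a hypothetical toy lifetime-value model (the function names and the scoring formula are illustrative, not from any real system):

```python
def predict_ltv(customer):
    """Hypothetical model: a toy lifetime-value estimate from two features."""
    return round(customer["orders"] * customer["avg_spend"] * 1.1, 2)

# Real-time inferencing: one inference per request, on demand.
def handle_request(payload):
    return {"customer_id": payload["id"], "ltv": predict_ltv(payload)}

# Batch inferencing: score every customer once, persist results as a table.
def run_batch(customers):
    return [
        {"customer_id": c["id"], "ltv": predict_ltv(c)}
        for c in customers
    ]

customers = [
    {"id": 1, "orders": 3, "avg_spend": 20.0},
    {"id": 2, "orders": 1, "avg_spend": 50.0},
]
ltv_table = run_batch(customers)
```

In a real system, `handle_request` would sit behind a REST or gRPC endpoint, and `run_batch` would write its rows to a warehouse table.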

What guarantees for production ML systems?

We saw at the top what guarantees are expected from a traditional software system. What similar guarantees should we expect from production-grade ML systems?

Safety

In ML, safety means having a high level of certainty that inferences produced by a model fall within the bounds of expected and acceptable values, and do not endanger users – human or machine.

For example, safety means that a self-driving car will not drive dangerously, that a facial recognition model will not exhibit bias, and that a chatbot will not generate abusive messages.

Safety in ML systems can be guaranteed in the following ways.

  • Unit testing
Each piece of code used to generate a trained model should be unit-tested. Data transformation functions, data sampling strategies, evaluation methods, etc. should be tested against both happy-path inputs and a reasonable set of corner cases.
  • Model testing, simulation
    After the model is trained and evaluated, real production data should be sent to it to establish an estimate of how the model will behave once deployed. This can be achieved by e.g. sending a small fraction of live production traffic to a candidate model and monitoring inferences, or by subjecting the model to a set of must-pass scenarios (real or synthetic) to ensure that no regressions are introduced.
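As a sketch of the unit-testing point, here is a hypothetical data transformation step with tests covering the happy path and two corner cases (the `normalize` function is illustrative, not part of any particular pipeline):

```python
import math

def normalize(values):
    """Scale a list of numbers to the [0, 1] range (a hypothetical transform step)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if math.isclose(lo, hi):
        return [0.0 for _ in values]  # constant input: no spread to normalize
    return [(v - lo) / (hi - lo) for v in values]

# Happy path
assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
# Corner cases: empty input and constant input
assert normalize([]) == []
assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]
```

The same assertions would run in CI alongside the rest of the pipeline's test suite.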

Traceability

In ML, beyond simply tracking which model version was deployed, it is crucial to enable exhaustive, so-called Lineage Tracking.

End-to-end Lineage Tracking means a complete bookkeeping of ALL the assets and artifacts involved in the production of every single inference.

This means tracking:

  • The raw dataset used as input
  • All of the intermediate data transformation steps
  • Configurations used at every step: featurization, training, evaluation, etc.
  • The actual code used at every step
  • What resources the end-to-end pipeline used (map/reduce clusters, GPU types, etc.)
  • Inferences generated by the deployed model

as well as the lineage relationships between those.
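A minimal sketch of what such bookkeeping could look like, assuming one record per pipeline step (all field names, ids, and the traversal helper are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class LineageRecord:
    """One node in a hypothetical lineage graph for a pipeline step."""
    step_name: str
    code_version: str                            # e.g. git commit SHA of the step's code
    config: dict = field(default_factory=dict)   # hyper-parameters, sampling settings, ...
    input_ids: tuple = ()                        # ids of upstream artifacts
    output_id: str = ""                          # id of the artifact this step produced
    resources: str = ""                          # e.g. "8xA100", "spark-cluster-3"

# A tiny graph: raw data -> featurization -> training
raw = LineageRecord("ingest", "abc123", output_id="dataset:v1")
feats = LineageRecord("featurize", "abc123", {"window": 30}, ("dataset:v1",), "features:v1")
model = LineageRecord("train", "abc123", {"lr": 1e-3}, ("features:v1",), "model:v1")

def upstream(record, records):
    """Walk the lineage of `record` back through all of its upstream steps."""
    by_output = {r.output_id: r for r in records}
    stack, seen = list(record.input_ids), []
    while stack:
        r = by_output.get(stack.pop())
        if r is not None:
            seen.append(r.step_name)
            stack.extend(r.input_ids)
    return seen
```

With records like these persisted for every run, an audit or post-mortem reduces to walking the graph from an inference back to the raw data that produced it.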

Lineage Tracking enables:

  • Auditability
    If a deployed model leads to undesired behaviors, an in-depth post-mortem investigation is made possible by using lineage data. This is especially important for models performing safety critical tasks (e.g. self-driving cars) where legal and compliance requirements demand auditability.
  • Debugging
    It is virtually impossible to debug a model without knowing what data, configuration, code, and resources were used to train it.
  • Reproducibility
    See below.


Reproducibility

If a particular inference cannot be reproduced from scratch (within stochastic variations) starting from raw data, the corresponding model should arguably not be used in production. This would be like deploying an application after having lost its source code with no way to retrieve it.

Without the ability to reproduce a particular trained model, the model and its inferences cannot be explained, and production issues are impossible to debug.

Additionally, reproducibility enables rigorous experimentation. The entire end-to-end pipeline can be re-run while changing a single degree of freedom at a time (e.g. input data selection, sampling strategy, hyper-parameters, hardware resources, training code, etc.).
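A sketch of this kind of controlled experimentation, assuming a hypothetical `run_pipeline` entry point (the metric formula is a toy stand-in so the example runs):

```python
def run_pipeline(config):
    """Stand-in for a full training pipeline; returns a mock evaluation metric.

    A real pipeline would ingest data, featurize, train, and evaluate;
    here the 'metric' is a toy function of the config so the sketch runs.
    """
    return round(1.0 / (1.0 + config["lr"]) - 0.01 * config["batch_size"] / 64, 4)

baseline = {"lr": 0.001, "batch_size": 64, "sampling": "uniform"}

# Vary exactly one degree of freedom per experiment.
experiments = {
    "baseline": {},
    "higher_lr": {"lr": 0.01},
    "bigger_batch": {"batch_size": 256},
}

results = {
    name: run_pipeline({**baseline, **override})
    for name, override in experiments.items()
}
```

Because every run is reproducible, any difference in `results` can be attributed to the single overridden parameter rather than to hidden drift in data, code, or environment.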

Automation

ML models are not one-and-done projects. Oftentimes, the characteristics of the training data change over time, and the model needs to be retrained recurrently to pick up on new trends.

For example, when real-world conditions change (e.g. COVID, macro-economic changes, user trends, etc.), models can lose predictive power if they are not retrained frequently.

This frequent retraining can only be achieved if an end-to-end pipeline exists. If engineers can simply run or schedule a pipeline pointing to new data, and all steps are automated (data processing, featurization, training, evaluation, testing, etc.) then refreshing a model is no harder than deploying a web app.
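To illustrate, here is a toy end-to-end pipeline in which every step is an ordinary function; pointing `pipeline` at a new data source and scheduling it is all a refresh would take (every step here is an illustrative stand-in, not real training code):

```python
def ingest(source):
    # Stand-in for pulling fresh rows from a warehouse table.
    return [{"x": i, "y": i % 2} for i in range(9)]

def featurize(rows):
    return [(r["x"] / 10, r["y"]) for r in rows]

def train(features):
    # Toy "model": predict the majority label.
    labels = [y for _, y in features]
    return max(set(labels), key=labels.count)

def evaluate(model, features):
    return sum(1 for _, y in features if y == model) / len(features)

def pipeline(source):
    """End-to-end: new data in, evaluated model out. Schedulable as-is."""
    rows = ingest(source)
    features = featurize(rows)
    model = train(features)
    return model, evaluate(model, features)
```

Once each step is automated like this, retraining is a matter of invoking `pipeline` on a schedule, rather than a hand-run sequence of notebooks and scripts.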

How Sematic provides production-grade guarantees

Sematic is an open-source framework to build and execute end-to-end ML pipelines of arbitrary complexity.

Sematic helps build production-grade pipelines by ensuring the guarantees listed above in the following ways, without requiring any additional work.

  • Safety
    In Sematic, all pipeline steps are just Python functions, so they can be unit-tested as part of a CI pipeline. Model testing/simulation is enabled by simply adding downstream steps after the model training and evaluation steps. Trained models can be subjected to real or simulated data to ensure the absence of regressions.
  • Traceability
    Sematic keeps an exhaustive lineage graph of ALL assets consumed and produced by all steps in your end-to-end pipelines: inputs and outputs of all steps, code, third-party dependencies, hardware resources, etc. All these are visualizable in the Sematic UI.
  • Reproducibility
    By enabling complete traceability of all assets, Sematic lets you re-execute any past pipeline with the same or different inputs.
  • Automation
    Sematic enables users to build true end-to-end pipelines, from raw data to deployed model. These can then be scheduled to pick up on new data automatically.

Check out Sematic, star us on GitHub, and join us on Discord to discuss production ML.
