Getting started with Sematic in 5 minutes

February 27, 2023

Shittu Olumide

Guest Author

Sematic is an open-source development platform for Machine Learning (ML) and Data Science. It enables users to quickly build end-to-end ML pipelines to execute on their local machine or in their cloud environment.

With integrations such as PyTorch, Kubernetes, Bazel, Snowflake, and more, it is designed to support arbitrarily complex pipelines of Python-defined business logic running on heterogeneous compute.

Pipeline steps can notably include:

Data processing – Apache Spark jobs, Google Dataflow jobs, or other map/reduce jobs
Model training and evaluation – PyTorch, TensorFlow, XGBoost, scikit-learn, etc.
Metrics extraction – extract aggregate metrics from model inferences or feature datasets
Hyperparameter tuning – iterate on configurations and trigger training jobs
Post to third-party APIs – post labeling requests, JIRA tickets, Slack messages, etc.
Arbitrary Python logic – really anything that can be implemented in Python.

Sematic currently supports Python 3.8, 3.9, and 3.10 on Linux and macOS. On Windows, you can run Sematic inside Windows Subsystem for Linux (WSL).
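As a quick sanity check before installing, the combinations listed above can be tested from Python itself. This is only a sketch: the helper name `is_supported` is hypothetical, and the version bounds simply mirror the sentence above, so adjust them if a newer release supports more.

```python
import platform
import sys

# Versions and platforms taken from the text above.
SUPPORTED_MINORS = {(3, 8), (3, 9), (3, 10)}
# platform.system() reports "Darwin" on macOS and "Linux" inside WSL.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def is_supported(version=None, system=None) -> bool:
    """Return True if the interpreter and OS match the combinations above."""
    version = sys.version_info if version is None else version
    system = platform.system() if system is None else system
    return tuple(version[:2]) in SUPPORTED_MINORS and system in SUPPORTED_SYSTEMS


if not is_supported():
    print("This interpreter or OS is outside the combinations listed above.")
```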

Sematic comes with these components:

A lightweight Python SDK to define dynamic pipelines of arbitrary complexity
An execution backend to orchestrate pipelines locally or in a Kubernetes cluster
A Command Line Interface to interact with Sematic
A web dashboard

It also provides advanced features such as run caching, fault tolerance, function retries, and reruns.

Installation

Sematic is most useful when deployed in your cloud infrastructure, but it can also be used entirely locally with no infrastructure required.

It can be installed using the Python package installer.

$ pip install sematic

After installation, launch the Sematic web dashboard on your local machine with:

$ sematic start

This starts the metadata server and opens the web dashboard in your browser at http://127.0.0.1:5001. To stop the server, type:

$ sematic stop
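If you want to confirm from a script that the server is running, one option is to probe the dashboard address quoted above. The sketch below uses only the standard library; the helper name `dashboard_is_up` is hypothetical, and it only checks that something answers HTTP at that address, not that it is Sematic.

```python
import urllib.error
import urllib.request

DASHBOARD_URL = "http://127.0.0.1:5001"  # address quoted in the text above


def dashboard_is_up(url: str = DASHBOARD_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar: nothing is listening.
        return False


if __name__ == "__main__":
    print("dashboard reachable:", dashboard_is_up())
```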

Sematic functions

Functions are the fundamental unit of work in your pipelines and, much like conventional Python functions, may be nested arbitrarily. They are where you implement the business logic of your pipeline steps.

The inputs and outputs of Sematic functions are serialized and tracked in the database, and their execution state is monitored. In the dashboard, Sematic functions are shown as Runs.

Consider this Sematic function:
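The sketch below is a minimal example of one, assuming the `@sematic.func` decorator from the Sematic SDK. The name `add` is illustrative, chosen to match the `(a: int, b: int)` signature discussed next. The `try`/`except` stand-in is not part of Sematic; it is only there so the snippet also runs where Sematic is not installed.

```python
try:
    import sematic
except ImportError:
    import types

    # Stand-in (an assumption for this sketch): a decorator that changes nothing.
    sematic = types.SimpleNamespace(func=lambda f: f)


@sematic.func
def add(a: int, b: int) -> int:
    """Add two integers; the type hints are what gets type-checked and tracked."""
    return a + b
```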

You will notice that this is just a regular Python function but it is decorated with a Sematic decorator. The input artifacts (a: int, b: int) and the output are type-checked, tracked and visualized in the Sematic Dashboard.

Let’s create a simple pipeline to fully understand how Sematic works.

$ sematic new tutorial

A Python package with some boilerplate code will be created with the following files present in the tutorial/ directory:

__main__.py: This is the typical entry point of any Python package.
pipeline.py: This is where your pipeline and its nested steps are defined. You can define multiple pipelines and pilot their executions from the __main__.py file.
requirements.txt: This is where you can keep the external dependencies specific to your project.

In tutorial/pipeline.py, add the following code:
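A minimal sketch of what this file could contain, assuming the `@sematic.func` decorator from the Sematic SDK. The function names `hello` and `pipeline` and the greeting text are illustrative; the `try`/`except` stand-in only lets the snippet run where Sematic is not installed.

```python
try:
    import sematic
except ImportError:
    import types

    # Stand-in (an assumption for this sketch): a decorator that changes nothing.
    sematic = types.SimpleNamespace(func=lambda f: f)


@sematic.func
def hello(name: str) -> str:
    """Leaf step: build the greeting."""
    return f"Hello, {name}!"


@sematic.func
def pipeline(name: str) -> str:
    """Root step: calling another Sematic function nests it under this run."""
    return hello(name)
```

The root function simply returns the result of the nested call, which is how nesting shows up as nested runs in the dashboard.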

In tutorial/__main__.py, add the following code:
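A minimal sketch of such an entry point, assuming a `pipeline` function is defined in tutorial/pipeline.py and that the installed Sematic release exposes `.set(...)` and `.resolve()` on the object returned by calling a Sematic function (newer releases may offer a different way to launch a run). The run name and tag are illustrative.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the --name argument passed on the command line."""
    parser = argparse.ArgumentParser(description="Hello World pipeline")
    parser.add_argument("--name", type=str, required=True, help="Who to greet")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    args = parse_args(argv)
    # In a real __main__.py this import would normally sit at the top of the file.
    from tutorial.pipeline import pipeline

    # Calling a Sematic function does not run it yet; it builds a future.
    future = pipeline(args.name).set(name="Hello World", tags=["tutorial"])
    # Assumption: .resolve() executes the pipeline and records the run.
    future.resolve()
```

In tutorial/__main__.py, finish the file with a call to `main()` so that running the package as a module executes it.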

That is all the code you need. The next step is to run the pipeline, passing the --name argument:

$ python3 -m tutorial --name "Shittu Olumide"

And with that, you just created your first pipeline :)

Head over to the web dashboard and you will find your first pipeline.

Click on the pipeline to see information such as the run ID, the latest runs, the nested runs (your Python functions), and each run's input, output, source, logs, and resources. A Note panel in the bottom-right corner lets you leave notes for your team members.

In the Execution Graph panel, your pipeline is represented as a series of nested Directed Acyclic Graphs (DAGs).

Learn more about the web dashboard here.

MNIST Example

With this understanding, let’s build a simple example pipeline for MNIST in PyTorch.

Start a new project

$ sematic new mnist_tutorial
$ sematic start

Load the dataset

You will use the baseline MNIST dataset, which is available through torchvision, PyTorch's companion library.
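A minimal sketch of a loading step, assuming torchvision is installed. The function name `load_mnist_dataset`, the default path, and the normalization constants (commonly quoted mean and standard deviation for MNIST) are choices made for this example. In a Sematic pipeline this function would additionally carry the `@sematic.func` decorator and a return type annotation.

```python
def load_mnist_dataset(train: bool, root: str = "/tmp/pytorch-mnist"):
    """Download (if needed) and return the MNIST training or test split."""
    # Imported lazily so this sketch can be loaded without torchvision installed.
    from torchvision import datasets, transforms

    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            # Commonly used mean and standard deviation of MNIST pixels.
            transforms.Normalize((0.1307,), (0.3081,)),
        ]
    )
    return datasets.MNIST(root=root, train=train, download=True, transform=transform)
```

Calling `load_mnist_dataset(train=True)` returns the training split and `load_mnist_dataset(train=False)` the test split.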

Getting a dataloader

To feed this data into the model for training and testing, create a PyTorch dataloader.
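A minimal sketch of this step using `torch.utils.data.DataLoader`. The helper names `get_dataloader` and `num_batches` are hypothetical; the second one only illustrates how many mini-batches a loader yields when the last, smaller batch is kept (the `DataLoader` default).

```python
import math


def num_batches(n_samples: int, batch_size: int) -> int:
    """Number of mini-batches when the final partial batch is kept."""
    return math.ceil(n_samples / batch_size)


def get_dataloader(dataset, batch_size: int, shuffle: bool = True):
    """Wrap a dataset in a DataLoader yielding (images, labels) mini-batches."""
    # Imported lazily so this sketch can be loaded without PyTorch installed.
    from torch.utils.data import DataLoader

    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```

For example, the 60,000 MNIST training images with a batch size of 64 give `num_batches(60000, 64) == 938` batches per epoch.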

Train the model

Now that the data is ready, train the model.
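A minimal training sketch in plain PyTorch. The small fully connected model, the `TrainConfig` fields, and the choice of the Adam optimizer are assumptions made for illustration, not the configuration of the original tutorial. In a Sematic pipeline, `train_model` would additionally be decorated with `@sematic.func`.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    epochs: int = 1
    learning_rate: float = 1e-3
    device: str = "cpu"


def make_model():
    """A small classifier for 28x28 grayscale digits that outputs 10 logits."""
    from torch import nn  # lazy import so the sketch loads without PyTorch

    return nn.Sequential(
        nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10)
    )


def train_model(model, train_loader, config: TrainConfig):
    """Run a standard supervised training loop and return the trained model."""
    import torch
    import torch.nn.functional as F

    model = model.to(config.device)
    optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)
    model.train()
    for _ in range(config.epochs):
        for images, labels in train_loader:
            images = images.to(config.device)
            labels = labels.to(config.device)
            optimizer.zero_grad()
            # cross_entropy expects raw logits and integer class labels.
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```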

Evaluate the model

After the model has been trained, you want to assess how well it performed on the test dataset.
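A minimal evaluation sketch that reports accuracy, the fraction of test images classified correctly. The function names are hypothetical, and accuracy is just one possible metric; a loader and a model producing one score per class are assumed.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of correct predictions; 0.0 for an empty test set."""
    return correct / total if total else 0.0


def evaluate_model(model, test_loader, device: str = "cpu") -> float:
    """Return the model's accuracy over every batch in `test_loader`."""
    import torch  # lazy import so the sketch loads without PyTorch

    model = model.to(device)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            # The predicted class is the index of the highest score.
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return accuracy(correct, total)
```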

The end-to-end pipeline

You can now combine everything into an end-to-end pipeline.
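The wiring can be sketched as follows, assuming the `@sematic.func` decorator and that Sematic accepts dataclasses as inputs. The step bodies here are deliberately placeholders with hypothetical names (`train`, `evaluate`, `PipelineConfig`); in the real pipeline they would load the data, train the model, and measure it on the test set. The `try`/`except` stand-in only lets the snippet run where Sematic is not installed.

```python
from dataclasses import dataclass

try:
    import sematic
except ImportError:
    import types

    # Stand-in (an assumption for this sketch): a decorator that changes nothing.
    sematic = types.SimpleNamespace(func=lambda f: f)


@dataclass
class PipelineConfig:
    batch_size: int = 64
    epochs: int = 1
    learning_rate: float = 1e-3


@sematic.func
def train(config: PipelineConfig) -> str:
    # Placeholder: a real step would return the trained model.
    return f"model(epochs={config.epochs}, lr={config.learning_rate})"


@sematic.func
def evaluate(model: str, batch_size: int) -> float:
    # Placeholder: a real step would return the measured test accuracy.
    return 0.0


@sematic.func
def pipeline(config: PipelineConfig) -> float:
    """Root step: chains training and evaluation into one graph."""
    model = train(config)
    return evaluate(model, config.batch_size)
```

Passing the output of one step as the input of the next is what defines the edges of the execution graph.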

Finally, the launch script

To be able to execute the pipeline, create a launch script in the __main__.py file.
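A minimal sketch of such a launch script. It assumes that mnist_tutorial/pipeline.py defines a `pipeline` function and a `PipelineConfig` dataclass (both hypothetical names), and that the installed Sematic release exposes `.set(...)` and `.resolve()` on the object returned by calling a Sematic function. The run name and tags are illustrative.

```python
def main() -> None:
    # In a real __main__.py this import would normally sit at the top of the file.
    from mnist_tutorial.pipeline import PipelineConfig, pipeline

    # Build the future for the whole pipeline with default hyperparameters.
    future = pipeline(PipelineConfig()).set(
        name="MNIST PyTorch example", tags=["pytorch", "mnist"]
    )
    # Assumption: .resolve() executes the pipeline and records the run.
    future.resolve()
```

As with the earlier tutorial, end mnist_tutorial/__main__.py with a call to `main()`.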

Run the pipeline and see what the execution graph and visualizations look like in the web dashboard.

$ python3 -m mnist_tutorial

Wrap up

Sematic lets you iterate on, prototype, and debug your pipelines on your local development machine before submitting them to a Kubernetes cluster in your cloud environment, where they can use resources such as GPUs and high-memory instances.

To get the most out of Sematic, explore features such as step retries, pipeline nesting, local execution, the lightweight Python SDK, artifact visualization, pipeline reruns, step caching, and many more.

Check out our documentation, join our Discord server, subscribe to our YouTube channel, and star us on GitHub.

July 18, 2023
