Sematic Features

Local Orchestration

Sematic enables iterating on pipelines locally for easier development and debugging. Pipelines can run against a local or deployed metadata server. The same code can then run at larger scale on a Kubernetes cluster.

Kubernetes Orchestration

Sematic can orchestrate large-scale pipelines on Kubernetes clusters. Users can specify required resources on a per-function basis (GPUs, CPUs, memory, etc.) and leverage multiple nodes for distributed compute.

Web Dashboard

The Sematic web dashboard lets users monitor pipelines, visualize artifacts and metrics, investigate failures, and collaborate with teammates. Pipelines can be replayed from the dashboard and results can be shared easily.

Python SDK

Sematic's lightweight Python SDK makes it easy to convert arbitrary business logic into an orchestrated pipeline. In Sematic, everything is Python-centric: business logic, DAG definition, resource requirements, visualizations, etc.
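To illustrate what a Python-centric DAG definition can look like, here is a minimal standard-library sketch. It is not Sematic's API: the `step` decorator and `Future` class are hypothetical names chosen for this example. Calling a decorated function records a graph node instead of running it, and the graph only executes when it is resolved.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Future:
    """A deferred call: one node in the pipeline graph (hypothetical class)."""
    fn: Callable[..., Any]
    args: tuple
    kwargs: dict

    def resolve(self) -> Any:
        # Resolve upstream nodes first, then call this node's function.
        args = [a.resolve() if isinstance(a, Future) else a for a in self.args]
        kwargs = {k: v.resolve() if isinstance(v, Future) else v
                  for k, v in self.kwargs.items()}
        result = self.fn(*args, **kwargs)
        # A step may itself return a nested graph; resolve it as well.
        return result.resolve() if isinstance(result, Future) else result


def step(fn: Callable[..., Any]) -> Callable[..., Future]:
    """Hypothetical decorator: calling the function builds a node, it does not run it."""
    def wrapper(*args: Any, **kwargs: Any) -> Future:
        return Future(fn, args, kwargs)
    return wrapper


@step
def load(n: int) -> list:
    return list(range(n))


@step
def total(values: list) -> int:
    return sum(values)


@step
def pipeline(n: int) -> int:
    # Plain Python composition defines the graph: total depends on load.
    return total(load(n))


result = pipeline(4).resolve()
```

The point of the design is that the business logic stays ordinary Python; only the decorator changes when and where each function runs.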

Command Line Interface

The Sematic CLI lets users run pipelines, submit them to run on their cluster, list resources and configure Sematic.

Lineage Tracking

Sematic persists and tracks all assets pertaining to pipeline executions: code, configuration, resources used, and the inputs and outputs of all functions. Sematic keeps a source of truth of all runs, enabling traceability and reproducibility.
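The idea of recording the inputs, outputs, and code of every function run can be sketched in a few lines of standard-library Python. This is an illustration only, not how Sematic stores lineage: the `tracked` decorator and the in-memory `RUN_LOG` list are hypothetical stand-ins for a real metadata store, and the code fingerprint is a crude bytecode hash.

```python
import functools
import hashlib

RUN_LOG = []  # hypothetical in-memory stand-in for a metadata store


def tracked(fn):
    """Hypothetical decorator that records one entry per function call."""
    # Crude code fingerprint: a hash of the function's compiled bytecode.
    fingerprint = hashlib.sha256(fn.__code__.co_code).hexdigest()[:12]

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        output = fn(*args, **kwargs)
        RUN_LOG.append({
            "function": fn.__name__,
            "code": fingerprint,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
        })
        return output
    return wrapper


@tracked
def normalize(values):
    peak = max(values)
    return [v / peak for v in values]


@tracked
def mean(values):
    return sum(values) / len(values)


result = mean(normalize([2, 4, 8]))
```

After this runs, `RUN_LOG` holds one entry per call in execution order, and the output recorded for `normalize` is the same value recorded as the input of `mean`, which is the link that makes a result traceable back to its origins.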

Visualizations

Sematic displays all produced plots, images, configurations, dataframes, etc. in the web dashboard. Visualizations can be customized from the pipeline's Python code and then shared with teammates from the dashboard.

Dependency Packaging

At runtime, Sematic packages pipeline code and its dependencies (user code, Python dependencies, static libraries, hardware drivers, etc.) and ships them to the Kubernetes cluster. This keeps the iteration loop short when visualizing results at scale.

Real-time Metrics

Sematic lets users log time-series metrics (e.g. loss curves, learning rates) from ongoing jobs and visualize them in real time in the dashboard. This provides greater visibility into workloads for optimization and early stopping.

Container Logs

Sematic surfaces workload container logs directly in the dashboard to accelerate debugging and increase observability.

Resource Customization

Each function in a Sematic pipeline can request custom resources (GPUs, CPUs, memory, Ray clusters, etc.). Sematic will dynamically allocate these resources at runtime and run the corresponding workloads.
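As a rough picture of per-function resource requests, the sketch below attaches a resource declaration to each function so that a scheduler could read it before launching the workload. The `Resources` class, the `requires` decorator, and the `plan` helper are hypothetical names for this example and do not reflect Sematic's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical per-function resource request."""
    cpus: int = 1
    memory_gb: int = 2
    gpus: int = 0


def requires(resources: Resources):
    """Hypothetical decorator: attach a resource request to a function."""
    def decorate(fn):
        fn.resources = resources
        return fn
    return decorate


@requires(Resources(cpus=2, memory_gb=8))
def preprocess(rows):
    return [r.strip().lower() for r in rows]


@requires(Resources(cpus=8, memory_gb=64, gpus=1))
def train(rows):
    return {"examples": len(rows)}


def plan(*fns):
    """Collect what each function asks for, as a scheduler might before allocating."""
    return {fn.__name__: fn.resources for fn in fns}


allocation = plan(preprocess, train)
```

Keeping the request next to the function means a cheap preprocessing step and a GPU-heavy training step can live in the same pipeline without sharing one oversized machine profile.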

Scalability

Sematic scales as far as the underlying Kubernetes cluster on which it is deployed, enabling access to a large variety of VM types, GPUs, and hardware profiles. Workloads can also scale horizontally thanks to Sematic's Ray integration.

Distributed Compute with Ray

Sematic integrates with Ray to let workloads spin Ray clusters up and down at runtime with only a few lines of code. This enables parallelized data processing and distributed training.

Function Caching

Sematic can cache pipeline steps whose inputs are unchanged between runs. This can greatly accelerate development workflows and debugging sessions, and reduce resource usage and costs.
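The principle behind input-based caching can be sketched as follows. This is a simplified, in-process illustration rather than Sematic's implementation: the `cached` decorator, the `_CACHE` dictionary, and the `CALLS` counter are hypothetical, and a real system would persist results and would also need to account for code changes.

```python
import functools
import hashlib
import pickle

_CACHE = {}           # hypothetical in-memory result store
CALLS = {"count": 0}  # counts real executions, for demonstration only


def cached(fn):
    """Hypothetical decorator: reuse the stored output when inputs are unchanged."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # The cache key is derived from the function name and its inputs.
        payload = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = fn(*args, **kwargs)
        return _CACHE[key]
    return wrapper


@cached
def featurize(text):
    CALLS["count"] += 1  # stands in for an expensive computation
    return len(text.split())


first = featurize("the quick brown fox")
second = featurize("the quick brown fox")  # same input: served from the cache
```

Here `featurize` executes only once for the repeated input; a call with different text produces a new key and runs the function again.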

Function Retries

Sematic enables fault-tolerant pipelines by catching transient failures and retrying workloads, which avoids spending resources and cost on full reruns. Never let a network failure crash a workload.
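A retry policy of this kind can be sketched with a small decorator. The names below (`retry`, `TransientError`, `fetch`) are hypothetical and chosen for this example; they are not taken from Sematic's API. The sketch retries only the listed exception types and re-raises once the retry budget is exhausted.

```python
import functools


class TransientError(Exception):
    """Hypothetical error type standing in for a passing fault, e.g. a network blip."""


def retry(exceptions, retries=3):
    """Hypothetical decorator: re-run fn up to `retries` extra times on listed exceptions."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    # Out of retries: surface the original failure.
                    if attempt == retries:
                        raise
        return wrapper
    return decorate


attempts = {"count": 0}


@retry(exceptions=(TransientError,), retries=3)
def fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("connection reset")
    return "payload"


result = fetch()
```

In this run `fetch` fails twice and succeeds on the third attempt. Exceptions outside the listed types propagate immediately, so genuine bugs are not masked by retries.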

Collaboration with Tags and Notes

Sematic lets team members share results and collaborate through run-specific notes and tags, which also help organize workloads.