This blog post focuses on new features and improvements. For a complete list, including bug fixes, please see the release notes.
Introducing Local Runners: Run Models on Your Own Hardware
Building AI models often starts locally. You experiment with architectures, fine-tune on small datasets, and validate ideas on your own machine. But the moment you want to test that model inside a real-world pipeline, things get complicated.
You usually have two options:

- Upload the model to a remote cloud environment, even for early-stage testing
- Build and expose your own API server, handling authentication, security, and infrastructure just to test locally
Neither path is ideal, especially if you're:

- Working on personal or resource-limited projects
- Developing models that need access to local files, OS-level tools, or restricted data
- Managing edge or on-prem environments where the cloud isn't viable
Local Runners solve this problem.

They let you develop, test, and run models on your own machine while still connecting to Clarifai's platform. You don't have to upload your model to the cloud. You simply run it where it is (your laptop, workstation, or server) and Clarifai takes care of routing, authentication, and integration.

Once registered, the Local Runner opens a secure connection to Clarifai's control plane. Any requests to your model's Clarifai API endpoint are securely routed to your local runner, processed, and returned. From a user's perspective, it works like any other model hosted on Clarifai, but behind the scenes it runs entirely on your machine.
Here's what you can do with Local Runners:

- Streamlined model development: Develop and debug models without deployment overhead. Watch real-time traffic, inspect inputs, and test outputs interactively.
- Leverage your own compute: If you have a powerful GPU or custom setup, use it to serve models. Your machine does the heavy lifting, while Clarifai handles the rest of the stack.
- Private data and system-level access: Serve models that interact with local files, private APIs, or internal databases. With support for MCP (Model Context Protocol), you can expose local capabilities securely to agents without making your infrastructure public.
Getting Started
Before starting a Local Runner, make sure you've done the following:

- Built or downloaded a model – You can use your own model or pick a compatible one from a repo like Hugging Face. If you're building your own, check out the documentation on how to structure it using the Clarifai-compatible project format.
- Installed the Clarifai CLI – run
pip install --upgrade clarifai
- Generated a Personal Access Token (PAT) – from your Clarifai account's settings page under "Security."
- Created a context – this stores your local environment variables (like user ID, app ID, model ID, etc.) so the runner knows how to connect to Clarifai.
You can set up the context simply by logging in through the CLI, which will walk you through entering all the required values:
clarifai login
Starting the Runner

Once everything is set up, you can start your Local Dev Runner from the directory containing your model (or provide a path):

clarifai model local-runner [OPTIONS] [MODEL_PATH]
- MODEL_PATH is the path to your model directory. If you leave it blank, it defaults to the current directory.
- This command launches a local server that mimics a production Clarifai deployment, letting you test and debug your model live.
If the runner doesn't find an existing context or config, it will prompt you to generate one with default values. This will create:

- A dedicated local compute cluster and nodepool.
- An app and model entry in your Clarifai account.
- A deployment and runner ID that ties your local instance to the Clarifai platform.
Once launched, it also auto-generates a client code snippet to help you test the model.
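The exact snippet depends on your model type. As a rough sketch of such a test call, assuming a text model and the Clarifai Python SDK (the IDs, PAT, and `predict_by_bytes` call below are illustrative placeholders, not the generated snippet itself):

```python
# Minimal sketch of calling a model served by a Local Runner.
# All IDs and the PAT are placeholders; the CLI prints a snippet
# with your actual values when the runner starts.

def clarifai_model_url(user_id: str, app_id: str, model_id: str) -> str:
    """Build the Clarifai model URL that the SDK accepts."""
    return f"https://clarifai.com/{user_id}/{app_id}/models/{model_id}"

# Left False so the sketch can be read and run without network access;
# flip it on once your runner is up and you have a PAT.
RUN_LIVE = False

if RUN_LIVE:
    from clarifai.client.model import Model  # pip install clarifai

    model = Model(
        url=clarifai_model_url("your-user-id", "local-runner-app", "local-runner-model"),
        pat="YOUR_PAT",  # Personal Access Token from Settings > Security
    )
    # The request hits Clarifai's API endpoint and is routed down
    # to the runner process on your own machine.
    response = model.predict_by_bytes(b"Hello, Local Runner!", input_type="text")
    print(response.outputs[0].data.text.raw)
```

Because the request goes through Clarifai's endpoint, the same snippet keeps working unchanged if you later move the model to hosted compute.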
Local Runners give you the flexibility to build and test models exactly where your data and compute live, while still integrating with Clarifai's API, workflows, and platform features. Check out the full example and setup guide in the documentation here.

You can try Local Runners for free. There's also a $1/month Developer Plan for the first year, which lets you connect up to 5 Local Runners to the cloud API with unlimited runner hours.
Compute UI

- We've launched a new Compute Overview dashboard that gives you a clear, unified view of all your compute resources. From a single screen, you can now manage Clusters, Nodepools, Deployments, and the newly added Runners.
- This update also includes two major additions: Connect a Local Runner, which lets you run models directly on your own hardware with full privacy, and Connect your own cloud, allowing you to integrate external infrastructure like AWS, GCP, or Oracle for dynamic, cost-efficient scaling. It's now easier than ever to control where and how your models run.
- We've also redesigned the cluster creation experience to make provisioning compute even more intuitive. Instead of selecting each parameter step by step, you now get a unified, filterable view of all available configurations across providers like AWS, GCP, Azure, Vultr, and Oracle. You can filter by region, instance type, and hardware specs, then select exactly what you need with full visibility into GPU, memory, CPU, and pricing. Once selected, you can spin up a cluster instantly with a single click.
Published New Models

We published the Gemma-3n-E2B and Gemma-3n-E4B models. We've added both the E2B and E4B variants, optimized for text-only generation and suited to different compute needs.

Gemma 3n is designed for real-world, low-latency use on devices like phones, tablets, and laptops. These models leverage Per-Layer Embedding (PLE) caching, the MatFormer architecture, and conditional parameter loading.

You can run them directly in the Clarifai Playground or access them via our OpenAI-compatible API.
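As a hedged sketch of the OpenAI-compatible route, using the standard `openai` Python client pointed at Clarifai (the base URL and model identifier follow Clarifai's documented pattern but should be checked against your account; treat them as placeholders):

```python
# Sketch of calling a Gemma 3n model through Clarifai's
# OpenAI-compatible API using the standard OpenAI client.

def build_messages(prompt: str) -> list[dict]:
    """Standard chat-completions message list for a single user prompt."""
    return [{"role": "user", "content": prompt}]

RUN_LIVE = False  # flip on with a valid PAT to make a real request

if RUN_LIVE:
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://api.clarifai.com/v2/ext/openai/v1",  # assumed endpoint
        api_key="YOUR_PAT",  # Clarifai Personal Access Token
    )
    completion = client.chat.completions.create(
        # Placeholder model identifier; copy the exact one from the
        # model's page on Clarifai.
        model="https://clarifai.com/gcp/generate/models/gemma-3n-E4B",
        messages=build_messages("Summarize Gemma 3n in one sentence."),
    )
    print(completion.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire format, existing tooling built on the `openai` client can switch over by changing only the base URL, key, and model name.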
Token-Based Billing

We've started rolling out token-based billing for select models on our Community platform. This change aligns with industry standards and more accurately reflects the cost of inference, especially for large language models.

Token-based pricing will apply only to models running on Clarifai's default Shared compute in the Community. Models deployed on Dedicated compute will continue to be billed based on compute time, with no change. Legacy vision models will still follow per-request billing for now.
Playground

- The Playground page is now publicly accessible, with no login required. However, certain features remain available only to logged-in users.
- Added model descriptions and predefined prompt examples to the Playground, making it easier for users to understand model capabilities and get started quickly.
- Added Pythonic support in the Playground for consuming the new model specification.
- Improved the Playground user experience with enhanced inference parameter controls, restored model version selectors, and clearer error feedback.
Additional Changes

- Python SDK: Added per-output token tracking, async endpoints, improved batch support, code validation, and build optimizations. Check out all SDK updates here.
- Platform Updates: Improved billing accuracy, added dynamic code snippets, UI tweaks to Community Home and Control Center, and better privacy defaults. Find all platform changes here.
- Clarifai Organizations: Made invitations clearer, improved token visibility, and added persistent invite prompts for better onboarding. See the full org improvements here.
Ready to start building?

With Local Runners, you can now serve models, MCP servers, or agents directly from your own hardware without uploading model weights or managing infrastructure. It's the fastest way to test, iterate, and securely run models from your laptop, workstation, or on-prem server. You can read the documentation or watch the demo video to get started.