Introduction
Running Large Language Models (LLMs) and other open-source models locally offers significant advantages for developers. This is where Ollama shines. Ollama simplifies the process of downloading, setting up, and running these powerful models on your local machine, giving you greater control, enhanced privacy, and reduced costs compared to cloud-based solutions.
While running models locally offers immense benefits, integrating them with cloud-based projects or sharing them for broader access can be a challenge. This is precisely where Clarifai Local Runners come in. Local Runners let you expose your locally running Ollama models through a public API endpoint, allowing seamless integration with any project, anywhere, and effectively bridging the gap between your local environment and the cloud.
In this post, we'll walk through how to run open-source models using Ollama and expose them through a public API using Clarifai Local Runners. This makes your local models accessible globally while still running entirely on your machine.
Local Runners Explained
Local Runners let you run models on your own machine, whether it's your laptop, workstation, or on-prem server, while exposing them through a secure, public API endpoint. You don't need to upload the model to the cloud. The model stays local but behaves as if it's hosted on Clarifai.
Once initialized, the Local Runner opens a secure tunnel to Clarifai's control plane. Any requests to your model's Clarifai API endpoint are routed to your machine, processed locally, and returned to the caller. From the outside, it works like any other hosted model. Internally, everything runs on your hardware.
Local Runners are especially useful for:
- Fast local development: Build, test, and iterate on models in your own environment without deployment delays. Inspect traffic, test outputs, and debug in real time.
- Using your own hardware: Take advantage of local GPUs or custom hardware setups. Let your machine handle inference while Clarifai manages routing and API access.
- Private and offline data: Run models that rely on local files, internal databases, or private APIs. Keep everything on-prem while still exposing a usable endpoint.
Local Runners give you the flexibility of local execution along with the reach of a managed API, all without giving up control over your data or environment.
Expose Local Ollama Models via a Public API
This section walks you through the steps to get your Ollama model running locally and accessible via a Clarifai public endpoint.
Prerequisites
Before we begin, ensure you have:
- Ollama installed and running on your machine
- Python and the pip package manager installed
- A Clarifai account with a Personal Access Token (PAT)
Step 1: Install Clarifai and Log In
First, install the Clarifai Python SDK:
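```bash
pip install --upgrade clarifai
```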
Next, log in to Clarifai to configure your context. This links your local environment to your Clarifai account, allowing you to manage and expose your models.
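A minimal sketch of the login step, assuming the CLI's `login` subcommand (run `clarifai --help` to confirm the exact command in your CLI version):

```bash
clarifai login
```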
Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the documentation here.
Step 2: Set Up Your Local Ollama Model for Clarifai
Next, you'll prepare your local Ollama model so it can be accessed through Clarifai's Local Runners. This step sets up the necessary files and configuration to expose your model through a public API endpoint using Clarifai's platform.
Use the following command to initialize the setup:
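```bash
clarifai model init --toolkit ollama
```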
This generates three key files inside your project directory:
- `model.py`
- `config.yaml`
- `requirements.txt`
These define how Clarifai will communicate with your locally running Ollama model.
You can also customize the command with the following options:
- `--model-name`: Name of the Ollama model you want to serve. This pulls from the Ollama model library (defaults to `llama3:8b`).
- `--port`: The port where your Ollama model is running (defaults to `23333`).
- `--context-length`: Sets the model's context length (defaults to `8192`).
For example, to use the `gemma:2b` model with a 16K context length on port `8008`, run:
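```bash
# 16K context length = 16384 tokens
clarifai model init --toolkit ollama \
  --model-name gemma:2b \
  --port 8008 \
  --context-length 16384
```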
After this step, your local model is ready to be exposed using Clarifai Local Runners.
Step 3: Start the Clarifai Local Runner
Once your local Ollama model is configured, the next step is to run Clarifai's Local Runner. This exposes your local model to the internet through a secure Clarifai endpoint.
Navigate into the model directory and run:
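A minimal sketch, assuming the CLI's `local-runner` subcommand and the default directory name generated in Step 2 (adjust both to match your setup):

```bash
cd ollama-model-upload
clarifai model local-runner
```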
Once the runner starts, you'll receive a public Clarifai URL. This URL is your gateway to accessing your locally running Ollama model from anywhere. Requests made to this Clarifai endpoint will be securely routed to your local machine, where your Ollama model processes them.
Running Inference on Your Exposed Model
With your Ollama model running locally and exposed through the Clarifai Local Runner, you can now send inference requests to it from anywhere using the Clarifai SDK or an OpenAI-compatible endpoint.
Inference Using the OpenAI-Compatible Method
Set your Clarifai PAT as an environment variable:
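```bash
export CLARIFAI_PAT="YOUR_PERSONAL_ACCESS_TOKEN"
```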
Then, you can use the OpenAI client to send requests:
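A minimal sketch, assuming Clarifai's OpenAI-compatible base URL and a placeholder model URL; substitute the endpoint and model URL shown in your Clarifai account:

```python
import os
from openai import OpenAI

# Point the OpenAI client at Clarifai's OpenAI-compatible endpoint,
# authenticating with your Clarifai PAT.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",  # assumed endpoint; confirm in the Clarifai docs
    api_key=os.environ["CLARIFAI_PAT"],
)

response = client.chat.completions.create(
    # Placeholder model URL; use the URL of your Local Runner model.
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {"role": "user", "content": "What is the future of local AI inference?"},
    ],
)
print(response.choices[0].message.content)
```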
For multimodal inference, you can include image data:
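For example, assuming the underlying Ollama model supports vision input, an image can be passed as a standard OpenAI `image_url` content part (the sample image URL is illustrative):

```python
# Reuses the `client` created in the previous snippet.
response = client.chat.completions.create(
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://samples.clarifai.com/metro-north.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```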
Inference with the Clarifai SDK
You can also use the Clarifai Python SDK for inference. The model URL can be obtained from your Clarifai account.
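A minimal sketch, assuming the SDK's `Model` client and a `predict` method that accepts a `prompt` argument (the exact predict signature depends on how the generated `1/model.py` defines its methods):

```python
import os
from clarifai.client import Model

# Placeholder model URL; copy the real one from your Clarifai account.
model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat=os.environ["CLARIFAI_PAT"],
)

# Assumes the generated model.py exposes a predict(prompt=...) method.
response = model.predict(prompt="What is the future of local AI inference?")
print(response)
```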
Customizing the Ollama Model Configuration
The `clarifai model init --toolkit ollama` command generates the following model file structure:
ollama-model-upload/
├── 1/
│   └── model.py
├── config.yaml
└── requirements.txt
You can customize the generated files to control how your model works:
- `1/model.py` – Customize this to tailor your model's behavior, implement custom logic, or optimize performance.
- `config.yaml` – Define settings such as compute requirements, which is especially useful when deploying to dedicated compute with Compute Orchestration.
- `requirements.txt` – List any Python packages your model requires.
This setup gives you full control over how your Ollama model is exposed and consumed via the API. Refer to the documentation here.
Conclusion
Running open-source models locally with Ollama gives you full control over privacy, latency, and customization. With Clarifai Local Runners, you can expose these models through a public API without relying on centralized infrastructure. This setup makes it easy to plug local models into larger workflows or agentic systems, while keeping compute and data fully under your control. If you want to scale beyond your machine, check out Compute Orchestration to deploy models on dedicated GPU nodes.