DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

By admin2010

June 26, 2026


DeepReinforce has released Ornith-1.0, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Each checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5.

Most coding agents pair a model with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. The DeepReinforce research team reports state-of-the-art results among open models of comparable size.

TL;DR

Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.
Three layers (fixed trust boundary, deterministic monitor, frozen LLM judge) guard against reward hacking.

What is Ornith-1.0?

Ornith-1.0 is a set of reasoning models tuned for coding agents. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving.

Every model is a reasoning model. Replies open with a reasoning block before the final answer. The serving recipes enable a reasoning parser, so that trace returns in a separate reasoning_content field. The models also emit well-formed tool calls for agent loops.

Deployment is straightforward. The 9B model is about 19GB in bf16 and serves on a single 80GB GPU. Serving recipes target vLLM, SGLang, and Transformers. Every model exposes an OpenAI-compatible endpoint. Standard agent frameworks therefore work without code changes.
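As a rough sanity check on the 19GB figure, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it assumes "9B" means a nominal 9e9 parameters, uses 2 bytes for bf16 and 1 byte for FP8, and ignores KV cache and activations.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: "9B" is a nominal 9e9 parameters; the real count may differ.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight size in decimal gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(weight_gb(9e9, "bf16"))  # 18.0
print(weight_gb(9e9, "fp8"))   # 9.0
```

A nominal 9B model comes to about 18GB in bf16, close to the reported 19GB, which leaves most of an 80GB GPU for KV cache and activations.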

Interactive Explainer

[Interactive widget: a step-by-step training demo, a vendor-reported benchmark chart selectable by model tier and benchmark, and an expandable list of the three defense layers. Scores are self-reported and not independently verified. The 35B and 9B tier scores recovered from the widget are listed below.]

Benchmark	Ornith-1.0-35B-A3B	Qwen3.5-35B-A3B	Qwen3.6-35B-A3B	Gemma4-31B	Qwen3.5-397B
Terminal-Bench 2.1	64.2	41.4	52.5	42.1	53.5
SWE-Bench Verified	75.6	70	73.4	52	76.4
SWE-Bench Pro	50.4	44.6	49.5	35.7	51.6
SWE-Bench Multilingual	69.3	60.3	67.2	51.7	69.3
NL2Repo	34.6	20.5	29.4	15.5	36.8
ClawEval Avg	69.8	65.4	68.7	48.5	70.7

Benchmark	Ornith-1.0-9B	Qwen3.5-9B	Qwen3.5-35B-A3B	Gemma4-12B	Gemma4-31B
Terminal-Bench 2.1	43.1	21.3	41.4	21	42.1
SWE-Bench Verified	69.4	53.2	70	44.2	52
SWE-Bench Pro	42.9	31.3	44.6	27.6	35.7
SWE-Bench Multilingual	52	39.7	60.3	32.5	51.7
NL2Repo	27.2	16.2	20.5	10.3	15.5
ClawEval Avg	63.1	53.2	65.4	32.5	48.5

The Self-Scaffolding Idea

Most coding agents rely on a scaffold, also called a harness. A scaffold wraps the model with memory, tools, error handling, and orchestration logic. AI teams usually hand-design one scaffold per task category.

Ornith-1.0 treats the scaffold as a learnable object instead. During reinforcement learning, the scaffold co-evolves with the model’s policy. Each RL step runs in two stages.

First, the model reads the task and its previous scaffold. It then proposes a refined scaffold. Second, it uses that scaffold and the task to generate a solution rollout. Reward from the rollout flows back to both stages.
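The two-stage step can be pictured with a toy loop. This is a minimal sketch, not DeepReinforce's training code: `propose_scaffold`, `solve`, `toy_reward`, and the tweak strings are hypothetical stand-ins for the policy, the rollout, and the verifier, and the scaffold is reduced to a plain string.

```python
import random
from dataclasses import dataclass

# Hypothetical scaffold refinements the "policy" can propose.
TWEAKS = ["retry on tool error", "run tests before submitting", "summarize long logs"]


@dataclass
class StepRecord:
    scaffold: str
    solution: str
    reward: float
    credit: dict  # reward assigned to each stage


def propose_scaffold(task: str, prev_scaffold: str, rng: random.Random) -> str:
    """Stage 1 stand-in: read the task and previous scaffold, return a refinement."""
    return f"{prev_scaffold}; {rng.choice(TWEAKS)}"


def solve(task: str, scaffold: str) -> str:
    """Stage 2 stand-in: produce a solution rollout using the scaffold."""
    return f"patch for {task} via [{scaffold}]"


def toy_reward(solution: str) -> float:
    """Toy verifier: rollouts whose scaffold runs tests score higher."""
    return 1.0 if "run tests" in solution else 0.1


def rl_step(task: str, prev_scaffold: str, rng: random.Random) -> StepRecord:
    scaffold = propose_scaffold(task, prev_scaffold, rng)
    solution = solve(task, scaffold)
    reward = toy_reward(solution)
    # The same rollout reward is credited to both stages.
    credit = {"scaffold_stage": reward, "solution_stage": reward}
    return StepRecord(scaffold, solution, reward, credit)


rng = random.Random(0)
scaffold = "read task"
for _ in range(3):
    record = rl_step("fix failing test", scaffold, rng)
    scaffold = record.scaffold  # the refined scaffold seeds the next step
    print(record.reward, "|", record.scaffold)
```

In a real system both stages would be token sequences sampled from the same model, so crediting the reward to both means the gradient update covers scaffold tokens as well as solution tokens.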

So the model is optimized to author orchestration, not just answers. Over training, higher-reward scaffolds are mutated and selected automatically. Per-task strategies emerge without hand-engineered harness design.

Training also runs asynchronously, using a pipeline-RL setup. A staleness weight downweights older, off-policy tokens and drops them past a threshold. The optimization uses a token-level GRPO objective.
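The article does not give the exact formulas, so the following is only an illustrative sketch of how a staleness weight could combine with a token-level GRPO-style objective. The exponential decay, the `max_lag` cutoff, the clip range, and the normalization by total token count are all assumptions, and the data layout (`reward`, `ratios`, `lag`) is hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward standardized within its rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def staleness_weight(lag, decay=0.5, max_lag=3):
    """Assumed form: decay per policy version of lag, zero past the threshold."""
    return decay ** lag if lag <= max_lag else 0.0


def token_level_objective(group, clip=0.2):
    """Clipped surrogate averaged over all tokens in the group.

    Each rollout is a dict with:
      reward - scalar rollout reward
      ratios - per-token probability ratios (new policy / sampling policy)
      lag    - how many policy versions old the rollout is
    """
    advantages = group_advantages([g["reward"] for g in group])
    total, n_tokens = 0.0, 0
    for g, adv in zip(group, advantages):
        w = staleness_weight(g["lag"])
        for ratio in g["ratios"]:
            clipped = max(1 - clip, min(1 + clip, ratio))
            total += w * min(ratio * adv, clipped * adv)
            n_tokens += 1
    return total / n_tokens


group = [
    {"reward": 1.0, "ratios": [1.05, 0.98, 1.30], "lag": 0},
    {"reward": 0.0, "ratios": [1.00, 0.90], "lag": 1},
    {"reward": 1.0, "ratios": [1.10], "lag": 5},  # too stale: weight 0
]
print(staleness_weight(0), staleness_weight(2), staleness_weight(5))  # 1.0 0.25 0.0
print(token_level_objective(group))
```

Averaging over tokens rather than over sequences means long rollouts contribute in proportion to their length, which is the usual sense of "token-level" in this setting.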

Guarding Against Reward Hacking

Letting a model write its own scaffold invites reward hacking. A scaffold could read visible test files and hardcode expected outputs. It could also copy an oracle solution sitting in the environment. The DeepReinforce team describes three defense layers.

The outer trust boundary is fixed and immutable. The environment, tool surface, and test isolation stay outside the model’s reach. The model evolves only its inner policy scaffold.
A deterministic monitor flags banned actions. Reading withheld paths or modifying verification scripts earns zero reward. These trajectories are excluded from the advantage computation.
A frozen LLM judge acts as a veto. It sits on top of the verifier, not as the primary reward.
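The three layers above can be sketched as a reward gate. This is a hedged illustration rather than the team's implementation: the path prefixes, the script name, and the `Trajectory` fields are hypothetical, the judge is stubbed as a boolean, and treating a veto as zero reward is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Layer 1: fixed trust boundary. These constants live outside anything the
# model can edit. Paths and file names are hypothetical examples.
WITHHELD_PREFIXES = ("/hidden_tests/", "/oracle/")
VERIFICATION_SCRIPTS = ("run_tests.sh",)


@dataclass
class Trajectory:
    files_read: List[str] = field(default_factory=list)
    files_written: List[str] = field(default_factory=list)
    verifier_passed: bool = False
    judge_vetoed: bool = False  # stub for a frozen LLM judge's decision


def monitor_flags(traj: Trajectory) -> bool:
    """Layer 2: deterministic check for banned actions."""
    read_withheld = any(p.startswith(WITHHELD_PREFIXES) for p in traj.files_read)
    edited_verifier = any(p.endswith(VERIFICATION_SCRIPTS) for p in traj.files_written)
    return read_withheld or edited_verifier


def gated_reward(traj: Trajectory) -> Tuple[float, bool]:
    """Return (reward, include_in_advantage)."""
    if monitor_flags(traj):
        return 0.0, False  # zero reward, dropped from the advantage computation
    if not traj.verifier_passed:
        return 0.0, True
    if traj.judge_vetoed:
        return 0.0, True   # Layer 3: the judge can only veto a verifier pass
    return 1.0, True


clean = Trajectory(files_read=["/repo/app.py"], verifier_passed=True)
hack = Trajectory(files_read=["/oracle/solution.py"], verifier_passed=True)
vetoed = Trajectory(files_read=["/repo/app.py"], verifier_passed=True, judge_vetoed=True)
print(gated_reward(clean), gated_reward(hack), gated_reward(vetoed))
```

The judge never raises a reward in this sketch; it can only cancel one the verifier already granted, which matches its description as a veto rather than the primary signal.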

Benchmark

DeepReinforce reports vendor numbers across several agentic coding benchmarks. At flagship scale, Ornith-1.0-397B posts 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified. On SWE-Bench Verified, that 82.4 trails only Claude Opus 4.8 (87.6) among the listed models. On Terminal-Bench 2.1, the picture is more mixed.

Ornith-1.0-397B beats Claude Opus 4.7 (70.3) on Terminal-Bench 2.1. But it trails Claude Opus 4.8 (85) and the larger GLM-5.2-744B (81.0). So the ‘state-of-the-art’ claim is scoped to open models of comparable size.

The smaller models carry the efficiency case. The 35B model scores 64.2 on Terminal-Bench 2.1, above Qwen3.5-397B’s 53.5. The 9B model reaches 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified.

Benchmark	Ornith-1.0-397B	Qwen3.5-397B	Qwen3.7-Max	GLM-5.2-744B	Minimax-M3-428B	DeepSeek-V4-Pro-1.6T	Claude Opus 4.7	Claude Opus 4.8
Terminal-Bench 2.1	77.5	53.5	73.5	81.0	64	64	70.3	85
SWE-Bench Verified	82.4	76.4	80.4	–	–	80.6	80.8	87.6
SWE-Bench Pro	62.2	51.6	60.6	62.1	59	55.4	64.3	69.2
SWE-Bench Multilingual	78.9	69.3	78.3	–	–	76.2	–	–
NL2Repo	48.2	36.8	47.2	48.9	42.1	–	–	69.7
ClawEval Avg	77.1	70.7	65.2	–	–	75.8	78.2	–

Use Cases and a Quick Start

The models target terminal-native coding agents and repository-scale work. Practical fits include multi-file refactors, bug localization, and test-driven patches. The 9B model suits edge or single-GPU setups where latency and cost matter. The 397B model targets maximum accuracy on long, multi-step tasks.

For example, a developer can run the 9B model locally to triage a failing test suite. A platform team can self-host the 397B model for an internal coding agent.

Serving is a one-liner with vLLM:

vllm serve deepreinforce-ai/Ornith-1.0-9B \
    --served-model-name Ornith-1.0-9B \
    --max-model-len 262144 \
    --enable-auto-tool-choice --tool-call-parser qwen3_xml \
    --reasoning-parser qwen3 \
    --trust-remote-code

Then call it with any OpenAI client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Ornith-1.0-9B",
    messages=[{"role": "user", "content": "Write a Python is_prime(n)."}],
    temperature=0.6, top_p=0.95,
)
msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # the reasoning trace
print(msg.content)                              # the final answer

The reasoning trace returns in reasoning_content, with the answer in content. Recommended sampling is temperature=0.6, top_p=0.95, top_k=20. The model also plugs into OpenHands, OpenClaw, and OpenCode.
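The recommended top_k=20 is not a standard field of the OpenAI chat API, so it needs a side channel. A minimal sketch, assuming the vLLM server from the command above: vLLM accepts extra sampling parameters through the OpenAI client's `extra_body` argument. The helper name `build_chat_kwargs` is hypothetical, and no request is sent here.

```python
def build_chat_kwargs(prompt: str) -> dict:
    """Build keyword arguments for an OpenAI client's chat.completions.create()."""
    return {
        "model": "Ornith-1.0-9B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        # top_k is not an OpenAI API field; with a vLLM backend it is
        # forwarded to the server through extra_body.
        "extra_body": {"top_k": 20},
    }


kwargs = build_chat_kwargs("Write a Python is_prime(n).")
# With a running server and an OpenAI client object:
#   resp = client.chat.completions.create(**kwargs)
print(kwargs["extra_body"])  # {'top_k': 20}
```

Other OpenAI-compatible servers may ignore or reject unknown fields in the request body, so this is worth checking against the serving stack in use.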
