Arcee, a San Francisco–based AI lab, has released what it positions as a new U.S.-made, frontier-scale open model: Trinity Large, a 400-billion parameter mixture-of-experts (MoE) language model, alongside a rare raw 10T-token checkpoint, Trinity-Large-TrueBase. For AI researchers, ML engineers, and technical leaders weighing open alternatives to proprietary or non-U.S. architectures, this launch combines three elements that rarely appear together: frontier-scale sparsity, a permissive Apache 2.0 license, and access to an un-instruction-tuned checkpoint at scale.
In an environment where most high-performing “open” models arrive only after extensive instruction tuning and RLHF, Trinity-Large-TrueBase offers something unusual: a look at what a very large sparse model learns from pretraining alone. At the same time, Trinity Large itself targets long-context, agentic workflows with a heavily optimized training story that Arcee describes as “engineering through constraint,” fitting a 400B MoE run into roughly $20 million over 33 days with a 30-person team.
Where Trinity Large Fits in the Current Open Model Landscape
Arcee has already drawn attention for being one of the few U.S. groups training large language models from scratch and releasing them under open or partially open licenses. Its prior Trinity-family models and AFM-4.5B aimed squarely at enterprise and independent developers who need to customize and deploy locally without proprietary lock-in.
Trinity Large extends that strategy into the frontier-scale band. It launches at a time when the open-source LLM landscape is bifurcated:
- Chinese labs including Alibaba (Qwen), z.AI (Zhipu), DeepSeek, Moonshot, and Baidu are leading in high-efficiency, high-quality open models, many of them sparse or otherwise optimized architectures.
- Meta, after Llama 4’s mixed reception and subsequent controversy around its benchmarking methodology, has largely stepped back from aggressively pushing new fully open frontier releases.
- In the U.S., only OpenAI’s gpt-oss line and Arcee are currently training and releasing new, from-scratch, open models at this scale.
This creates what Arcee’s leadership describes as a “vacuum” of American open frontier models. Trinity Large is explicitly framed as an attempt to fill that gap for organizations that, for regulatory, risk, or policy reasons, cannot or will not build on Chinese-origin architectures or tightly controlled proprietary APIs.
For technical decision-makers, the significance is less about raw benchmark scores and more about the combination of:
- U.S.-based origin and training
- Permissive licensing (Apache 2.0)
- Full model weights plus a pre-alignment checkpoint
- An MoE design tuned for long-context, agentic workloads
MoE and Extreme Sparsity: 400B Parameters, 1.56% Active
Trinity Large is built as a sparse MoE model with a strong emphasis on activation sparsity. While the total parameter count is 400 billion, only 4 of its 256 experts (about 1.56%) fire for any given token, putting the active parameter count at roughly 13 billion.
This means that, operationally, each forward pass behaves more like a ~13B model from a compute perspective, while the full model capacity remains available across the expert pool. Arcee characterizes the result as roughly 2–3x faster inference than similarly capable dense or less sparse peers on the same hardware, while still tapping into the “knowledge” of a much larger system.
For practitioners, the implications are straightforward:
- The model targets workloads where you want frontier-level breadth of knowledge and reasoning capacity but need lower per-request cost and latency than a fully dense 400B would allow.
- The extreme sparsity pushes routing and stability challenges to the foreground; getting good behavior and specialization from such a high-expert-count MoE is nontrivial.
Arcee’s decision to push sparsity this far is aligned with the broader trend toward efficient frontier models: both Trinity Large and OpenAI’s gpt-oss-120b adopt sparse architectures, but Trinity pushes parameter count and context length further, at the cost of more complex engineering.
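To make the sparsity argument concrete, here is a toy top-k MoE forward pass in NumPy, using the 4-of-256 routing configuration the article describes but with miniature hidden sizes. This is an illustrative sketch, not Arcee's code: the router scores all 256 experts (a cheap matmul), but only the 4 selected experts run their expensive transformations.

```python
# Minimal sketch of top-k MoE routing (illustrative, not Arcee's code).
# Shows why per-token compute tracks the k=4 selected experts,
# not the full pool of 256.
import numpy as np

rng = np.random.default_rng(0)

D, E, K = 8, 256, 4                       # hidden dim, expert count, top-k
W_router = rng.standard_normal((D, E))    # router projection
experts = rng.standard_normal((E, D, D))  # one tiny FFN matrix per expert

def moe_forward(x):
    logits = x @ W_router                 # score all 256 experts (cheap)
    top = np.argsort(logits)[-K:]         # select the 4 highest-scoring
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # softmax over the selected 4 only
    # Only the 4 chosen experts run a matmul; the other 252 stay idle.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top)), top

x = rng.standard_normal(D)
y, chosen = moe_forward(x)
print(f"active experts: {len(chosen)} / {E}  ({len(chosen) / E:.2%})")
# → active experts: 4 / 256  (1.56%)
```

The per-token expert compute here is 4 matmuls regardless of how many experts exist, which is the mechanism behind the "behaves like a ~13B model" framing above.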
TrueBase: What a 400B Sparse MoE Looks Like Before Alignment
The most research-relevant part of this launch is Trinity-Large-TrueBase: a raw checkpoint at roughly 10 trillion tokens, released before the typical late-stage interventions of learning rate annealing, instruction data phases, and RLHF.
Most “open” models today are only accessible after:
- Supervised fine-tuning (SFT) on instruction-following or chat-style data
- Reinforcement Learning from Human Feedback (RLHF) or similar techniques
Those steps improve conversational quality and safety, but they also:
- Distort or obscure the underlying knowledge distribution
- Impose a particular alignment regime and behavioral style
- Make it harder to audit what the model actually learned during pretraining
TrueBase deliberately precedes those steps. It is intended as an “OG base model” view of Trinity Large: what a large sparse model internalizes from web-scale data alone at the 10T-token mark, before annealing, instruction-heavy data phases, and post-training.
For researchers and heavily regulated enterprises, this has several direct uses:
- Authentic audits: You can analyze data influence, emergent capabilities, and bias characteristics without the confounding effects of instruction-tuning datasets and reward models.
- Custom alignment: Teams can apply their own SFT, RLHF, or rule-based layers on top of a high-capacity base without inheriting the “black box” choices of a general-purpose chat model.
- Methodological clarity: The separation between intrinsic reasoning ability and post-training behavior can be studied, rather than inferred backwards from a fully tuned chat model.
Arcee’s CTO, Lucas Atkins, notes that even at this stage the checkpoint is already “one of the best performing base models in the world,” underscoring that the model is not a toy pretraining artifact but a viable starting point for serious downstream systems.
Engineering Through Constraint: 33 Days, ~$20M, 30 People
Unlike many frontier model efforts backed by multibillion-dollar budgets, Trinity Large was trained under relatively tight constraints. Arcee reports:
- Approximate training cost: ~$20 million
- Training duration: ~33 days
- Team size: ~30 people
- Total capital raised: just under $50 million
Atkins characterizes the approach as “engineering through constraint.” The idea is that limited capital and headcount force careful decisions about architecture, data, and infrastructure, rather than leaning on brute-force scale.
This constraint-driven approach influenced several aspects:
- Architectural efficiency: The move to an extremely sparse 400B MoE is itself a cost-control mechanism—frontier capacity without dense-model training costs.
- Hardware selection: Early access to Nvidia’s B300 (Blackwell) GPUs provided approximately 2x the speed and more memory compared with the Hopper generation, shortening the training window from a theoretical two to three months down to about a month.
- Risk profile: With training spend approaching half the company’s total capital, the run was effectively a “back the company” bet, increasing pressure to optimize stability and time-to-usable-checkpoint.
For technical leaders, the key takeaway is that Arcee is explicitly positioning Trinity Large as proof that frontier-class, open models can be built without hyperscaler budgets—if the architecture and training regimen are tightly optimized.
Inside the Architecture: 4-of-256 Experts, SMEBU, and Long Context
Under the hood, Trinity Large combines several architectural decisions aimed at both efficiency and long-context performance.
4-of-256 MoE routing. The model uses a 4-of-256 sparse MoE scheme: for each token, only 4 of 256 experts are activated. This is an unusually high expert count with very sparse activation, which creates stability and routing challenges:
- Without careful design, some experts can become “winners,” absorbing most of the traffic and capacity.
- Other experts risk becoming effectively “dead” if they rarely receive gradient signal.
SMEBU for expert balance. To mitigate those issues, Arcee developed Soft-clamped Momentum Expert Bias Updates (SMEBU), a training mechanism designed to:
- Encourage balanced routing across experts over a general web corpus
- Promote meaningful specialization instead of a few dominant experts
- Reduce the amount of wasted capacity from underutilized experts
This is central to making a 4-of-256 configuration viable at scale; without it, the high expert count would likely not translate into practical capacity.
Synthetic data via DatologyAI. In partnership with DatologyAI, Arcee used more than 8 trillion tokens of synthetic data. The key detail is that this data is not simple imitation of larger teacher models’ outputs. Instead, the pipeline:
- Takes raw web text—blogs, Wikipedia, and similar sources
- Synthetically rewrites or compresses it
- Aims to condense the information into fewer tokens while preserving content
The goal is to push the model toward reasoning over condensed information rather than memorizing long exact token sequences. For engineers, this signals a focus on token efficiency and reasoning density rather than raw token throughput alone.
Long-context attention design. Trinity Large alternates local sliding-window attention layers with global attention layers in a 3:1 ratio. This hybrid setup is intended to:
- Provide efficient handling of long sequences by using mostly local windows
- Inject periodic global layers to propagate information across distant tokens
The model was trained for a 256k sequence length and “natively supports” 512k context, with internal evaluations suggesting usable performance up to around 1 million tokens. While external benchmarks are not detailed in the source material, the stated design clearly targets large-context applications such as complex agentic workflows, multi-document reasoning, and extended tool-augmented sessions.
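The 3:1 layout above can be sketched concretely. The snippet below shows one plausible reading of the design: every fourth layer is global, and the local layers use a causal sliding-window mask so each token attends only to a fixed recent window. The window size and layer count are made-up illustration values; Arcee has not published these specifics.

```python
# Illustrative sketch of a 3:1 local/global layer layout and a causal
# sliding-window attention mask. Window size and layer count are
# assumptions for illustration, not Trinity Large's actual values.
import numpy as np

def layer_pattern(n_layers: int) -> list:
    """Three local sliding-window layers for every global layer."""
    return ["global" if (i + 1) % 4 == 0 else "local"
            for i in range(n_layers)]

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask where token i attends to keys in [i - window + 1, i]."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (j > i - window)

print(layer_pattern(8))
# → ['local', 'local', 'local', 'global', 'local', 'local', 'local', 'global']
print(sliding_window_mask(seq_len=6, window=3).astype(int))
```

Because local layers cost O(seq_len × window) rather than O(seq_len²), this kind of mix is what makes 256k+ training sequences tractable, with the periodic global layers carrying information between distant tokens.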
How Trinity Large Compares to OpenAI’s gpt-oss-120b
Within the U.S. ecosystem, the most direct comparison is to OpenAI’s gpt-oss-120b, another sparse architecture released under a permissive license. Both are positioned as open, frontier-capable models that can be self-hosted and customized.
Based on the information Arcee has shared:
- Architecture: Both use sparse designs, but Trinity Large deploys many more total parameters (400B vs. 120B) with MoE experts, pushing harder on the capacity side.
- Reasoning and math: gpt-oss-120b currently appears to have an edge on specific reasoning and math benchmarks.
- Context and capacity: Trinity Large’s advantages lie in much larger context windows and greater raw parameter depth, which are especially relevant for multi-step, long-horizon agent workflows.
For practitioners, the choice between them is less “which is universally better?” and more “which better fits our workload pattern?”:
- If your priority is peak performance on classic reasoning/math benchmarks, gpt-oss-120b may be the more attractive baseline today.
- If you are designing complex agents that must operate over very long contexts or orchestrate many tools over extended sessions, Trinity Large’s context and MoE capacity may be more compelling.
Both models, importantly, exist within a U.S. legal and regulatory environment and come with permissive licensing, which differentiates them from many of the strongest Chinese open models.
Sovereign, Apache 2.0–Licensed Infrastructure in a Geopolitical Vacuum
Arcee frames Trinity Large not only as a technical milestone but as a sovereignty play. CEO Mark McQuade describes a shift in the open ecosystem: Western or U.S.-based players pulled back from true open-source at the frontier just as Chinese labs ramped up production of strong, open, state-of-the-art models.
That created a tension for many U.S. and Western enterprises. According to McQuade, large organizations increasingly found they could not adopt Chinese-origin architectures due to regulatory, policy, or security considerations—but also lacked domestic open alternatives at comparable quality.
Trinity Large responds to that gap with:
- U.S.-based origin and training—positioned as more acceptable to enterprises with strict data-sovereignty or supplier-risk constraints.
- Apache 2.0 licensing—providing a “gold standard” permissive framework that allows companies to fully own and redistribute derivatives, including commercial products.
- Self-hostability—critical in finance, defense, and other sensitive sectors where depending on third-party-hosted or restrictive cloud models is often untenable.
For CTOs and heads of ML platforms, the practical impact is that Trinity Large can function as a sovereign model layer: they can ingest the weights, audit them, fine-tune, wrap them in their own guardrails, and deploy on infrastructure of their choosing, without upstream dependency on Chinese labs or closed U.S. APIs.
From Intelligence to Utility: Arcee’s Alignment Priorities
Arcee is currently focused on what it calls the “current thinking model” for Trinity Large—essentially, the transition path from a strong base model into a production-ready reasoning system. Internally, the team is trying to balance “intelligence vs. usefulness.”
In concrete terms, that means:
- Avoiding overly “yappy” behavior common in heavily RLHF-tuned chat models that maximize user-pleasing verbosity at the expense of precision and latency.
- Preserving strong benchmark performance while still making the model efficient and predictable in real applications.
- Designing alignment and instruction-tuning regimes that serve agentic, tool-using workflows rather than generic chat.
Arcee’s stated motto for Trinity—“We built Trinity so you can own it”—encapsulates the intended relationship between base capability and downstream alignment. Instead of shipping only a final, fixed chat persona, Arcee is offering both the raw TrueBase checkpoint and a path to an instruct and reasoning model that users can substantially reshape.
As production AI systems trend toward complex, long-context agents—not just chat UIs—this approach positions Trinity Large less as another “chatbot” and more as a programmable, sovereign substrate on which organizations can build their own stacks of alignment, tools, and workflows.
For AI researchers, ML engineers, and technical leaders, Trinity Large and TrueBase provide a new, U.S.-origin option in a space that has been increasingly dominated by either proprietary U.S. APIs or strong but geopolitically sensitive Chinese open models. The real test will be how the ecosystem uses this combination of high sparsity, long context, and raw checkpoint access to build specialized, aligned systems at scale.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.

