Issue #3

GPT-5.6 Sol, Custom Chips, and the New AI Arms Race

Models · Industry · Tools

OpenAI went full vertical this week: a new model family that sets the frontier on fire, and a custom chip to make running it affordable. Google countered at I/O with an architecture that could make everyone's current approach look quaint. The AI arms race is now being fought simultaneously on silicon, software, and sheer audacity.

GPT-5.6 Sol: OpenAI's Model Gets a Management Layer

On June 26, OpenAI previewed GPT-5.6 Sol, the flagship of a new three-tier family (Sol, Terra, Luna) and its most capable release to date. The marquee feature is "ultra" mode, which deploys multiple subagents to decompose complex problems in parallel, then synthesizes their outputs into a coherent answer. Think of it as a model that delegates: it assigns different aspects of a problem to specialized internal reasoners, waits for them to finish, and integrates the work. It's basically a middle manager, except this one is actually useful.
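For readers who want to picture the delegate-then-integrate pattern, here is a minimal fan-out/fan-in sketch. It is not OpenAI's implementation, whose details have not been published. The names `decompose`, `run_subagent`, and `synthesize` are hypothetical, and the subagents are stubs that sleep briefly instead of calling a model.

```python
import asyncio


def decompose(problem: str) -> dict[str, str]:
    """Hypothetical planner: split a problem into role-specific subtasks."""
    return {
        "researcher": f"gather facts for '{problem}'",
        "critic": f"list risks in '{problem}'",
        "planner": f"outline steps for '{problem}'",
    }


async def run_subagent(role: str, task: str) -> str:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    await asyncio.sleep(0.01)  # simulate independent work
    return f"[{role}] notes on: {task}"


def synthesize(partials: list[str]) -> str:
    """Hypothetical synthesizer: merge partial results into one answer."""
    return "\n".join(partials)


async def solve(problem: str) -> str:
    subtasks = decompose(problem)
    # Fan out: all subagents run concurrently; gather keeps the input order.
    partials = await asyncio.gather(
        *(run_subagent(role, task) for role, task in subtasks.items())
    )
    # Fan in: integrate the finished work into a single answer.
    return synthesize(list(partials))


if __name__ == "__main__":
    print(asyncio.run(solve("migrate a billing system")))
```

The point of the sketch is the shape: total latency is set by the slowest subagent rather than the sum of all of them, at the cost of running several inference streams at once.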

On the safety front, OpenAI reports over 700,000 GPU-hours of automated red-teaming, its most aggressive safety testing ever. The preview is deliberately limited, with government coordination required for certain capability tiers. This is not a "sign up and go" launch; it's a controlled rollout reflecting both the model's power and the post-export-controls regulatory climate. OpenAI appears to have learned from watching Anthropic catch flak last week.

Pricing across the family: Sol at $5/$30, Terra at $2.50/$15, and Luna at $1/$6 per million input/output tokens. The three-tier structure gives developers clean capability-vs-cost tradeoffs, with Luna positioned to handle the boring stuff at prices that make "just throw AI at it" economically viable for almost anything.
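To make the tradeoff concrete, here is a small cost calculator using the per-million-token prices quoted above. The workload (2M input tokens, 500k output tokens) is a made-up example, and `request_cost` is an illustrative helper, not part of any OpenAI SDK.

```python
# Prices per million tokens in USD as (input, output), from the figures above.
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload on the given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
for tier in PRICES:
    print(f"{tier}: ${request_cost(tier, 2_000_000, 500_000):.2f}")
# Sol: $25.00
# Terra: $12.50
# Luna: $5.00
```

Because both the input and output prices scale by the same factor between tiers, Sol costs 5x Luna and Terra costs 2.5x Luna for any mix of input and output tokens.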

Why it matters: The subagent architecture in ultra mode is a genuine paradigm shift — not just "bigger model, better scores." Multiple inference streams orchestrated in parallel is architecturally novel, and if it works as advertised, it's the kind of thing that makes single-forward-pass reasoning look primitive in hindsight. The limited preview also confirms we've entered an era where the most capable models may ship with access restrictions baked in from day one. Welcome to tiered intelligence.

OpenAI Gets Chippy with Broadcom: Meet Jalapeño

On June 24, OpenAI and Broadcom unveiled Jalapeño — OpenAI's first custom inference chip. Concept to tape-out in nine months, which Broadcom claims is the fastest ASIC development cycle for a chip of this complexity. (Marketing superlatives aside, nine months for custom silicon is genuinely quick.) It's optimized specifically for transformer inference patterns rather than being a general-purpose GPU that happens to do matrix math, yielding substantially better performance-per-watt than current NVIDIA kit for this specific workload.

The deployment plan is suitably ambitious: gigawatt-scale inference infrastructure built with Microsoft, purpose-built around Jalapeño. This isn't a research project or a conference demo — it's a serious play to fundamentally alter the cost structure of serving frontier models. If inference costs drop by the 60-70% analysts are projecting, entire categories of currently-uneconomical applications suddenly become viable.

What's next: This puts OpenAI on a collision course with NVIDIA in inference (training remains Jensen's kingdom — for now). It also raises the stakes for everyone else. Google has TPUs, Amazon has Trainium, and now OpenAI has Jalapeño. If you're running a frontier lab without a silicon strategy, you're bringing a cloud bill to a chip fight. The full-stack AI company is becoming table stakes, not a differentiator.

Google I/O 2026: "What If Text Generation Worked Like Image Generation?"

Google's I/O keynote was its usual firehose of announcements, but three items deserve attention. First: the Interactions API — a framework for building multi-turn, tool-using agentic apps on Gemini with dramatically less boilerplate. Good developer experience work, if not revolutionary. Second: a fully agentic Gemini app that takes complex multi-step actions across Google's ecosystem on behalf of users. Impressive demo; let's see how it handles the real world.

Third, and most interesting: DiffusionGemma, a diffusion-based text generation architecture that Google says achieves 4x faster output than autoregressive decoding while maintaining quality parity. If you've been paying attention to the image generation world, diffusion is old news there. Applying it to text is not. The core insight: instead of generating tokens one by one (left to right, suffering sequentially), you generate text in parallel and iteratively refine it. Like sculpting from marble rather than laying bricks.
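The difference in sequential work can be shown with a toy comparison. This is not DiffusionGemma's algorithm. Both functions simply copy from a known target rather than running a model, and the choice of 4 refinement steps is an arbitrary assumption for illustration, not an explanation of Google's reported number.

```python
import math


def autoregressive_decode(target: list[str]) -> tuple[list[str], int]:
    """Emit tokens left to right: one sequential step per token."""
    out: list[str] = []
    steps = 0
    for tok in target:
        out.append(tok)  # a real model would predict this from the prefix
        steps += 1
    return out, steps


def parallel_refine_decode(
    target: list[str], num_steps: int
) -> tuple[list[str], int]:
    """Start fully masked and commit a batch of positions on every step.

    Assumes num_steps >= 1.
    """
    out = ["_"] * len(target)
    remaining = list(range(len(target)))
    per_step = max(1, math.ceil(len(target) / num_steps))
    steps = 0
    while remaining:
        # A real model would choose the positions it is most confident about.
        batch, remaining = remaining[:per_step], remaining[per_step:]
        for i in batch:
            out[i] = target[i]
        steps += 1
    return out, steps


tokens = "the quick brown fox jumps over the lazy dog again and again".split()
_, ar_steps = autoregressive_decode(tokens)
_, pr_steps = parallel_refine_decode(tokens, num_steps=4)
print(ar_steps, pr_steps)  # 12 4
```

Fewer sequential steps does not automatically mean proportionally faster output, since each refinement step processes the whole sequence; that gap is exactly what independent benchmarks would need to measure.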

Why it matters: DiffusionGemma is the dark horse technical story of the week. If diffusion-based text generation scales — and that's a real "if" — it could make autoregressive decoding look like a bottleneck we tolerated simply because GPT-2 did it that way and nobody questioned it. Google released research-preview weights so the community can independently validate the speed claims. Early days, but 4x is the kind of number that makes architecture choices feel urgent.

Quick Hits

  • HP and OpenAI announce "Frontier" partnership — AI baked into HP's enterprise hardware stack. Your next company laptop will ship with optimized OpenAI model access whether you asked for it or not.
  • Sol limited preview — access currently restricted to select partners. General availability timeline "TBD pending government coordination," which is a phrase that did not exist in AI product launches twelve months ago.
  • Inference economics — analysts estimate Jalapeño could slash OpenAI's per-token serving costs by 60-70% at scale. Another round of API price cuts seems inevitable. Your move, Anthropic.
  • DiffusionGemma open weights — Google released research-preview weights, inviting the community to validate their claims. Bold move. Presumably they're confident the numbers hold up.
