This Week in AI: The Agent Wars Converge, the Self-Improving Loop Goes Commercial, and Jensen Drops a Trillion-Dollar Number

This was the week convergence stopped being a trend and became a fact. Agents, self-improving models, design-to-code pipelines: every major story was a different company arriving at the same destination from a different direction.
Nvidia spent two hours on stage in San Jose and landed on one number that made everything else secondary. Google shipped two products in the same week that, taken together, change how you go from idea to deployed application. And the question of which AI model you're running quietly became less important than what you've done to it since you started.
The frontier is no longer the ceiling. This week made that impossible to ignore.
The Agent Wars Have a New Battlefield
Two weeks ago we noted that every major player had an agent product. This week they all upgraded at once — and the upgrade wasn't about capability. It was about containment, trust, and who gets to run inside your infrastructure.
NemoClaw: The Enterprise Wrapper
The most significant agent announcement of the week came from Nvidia at GTC. NemoClaw is Nvidia's enterprise wrapper around OpenClaw: single-command installation, Nemotron model integration, and, critically, a controlled runtime called OpenShell that enforces policies, privacy rules, and network limits on everything the agent does. Every action is routed through that layer. The agent doesn't step outside it.
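To make the containment idea concrete, here is a minimal sketch of the pattern: every action the agent takes passes through a runtime that checks policy before anything touches the network or the filesystem. The names and structure below are hypothetical illustrations, not NemoClaw's or OpenShell's actual API.

```python
# Illustrative only: a minimal policy-enforcing runtime in the spirit of what
# OpenShell is described as doing. All names here are hypothetical.
from dataclasses import dataclass, field
from pathlib import Path
from urllib.parse import urlparse

@dataclass
class Policy:
    allowed_hosts: set[str] = field(default_factory=set)   # network limits
    writable_roots: tuple[Path, ...] = ()                   # filesystem limits
    redact_fields: set[str] = field(default_factory=set)    # privacy rules

class PolicyViolation(Exception):
    pass

class ControlledRuntime:
    """Every agent action is routed through this layer; nothing runs outside it."""

    def __init__(self, policy: Policy):
        self.policy = policy

    def fetch(self, url: str) -> None:
        host = urlparse(url).hostname or ""
        if host not in self.policy.allowed_hosts:
            raise PolicyViolation(f"network access to {host!r} is not permitted")
        # hand off to the real HTTP client here

    def write_file(self, path: str, content: str) -> None:
        target = Path(path).resolve()
        if not any(target.is_relative_to(root) for root in self.policy.writable_roots):
            raise PolicyViolation(f"write outside permitted roots: {target}")
        target.write_text(content)

    def log_event(self, record: dict) -> dict:
        # Privacy rule: strip configured fields before anything leaves the runtime.
        return {k: v for k, v in record.items() if k not in self.policy.redact_fields}
```

The point of the pattern is that the policy lives outside the agent: the model can propose whatever it likes, but only actions that pass the checks ever execute.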
OpenClaw's biggest barrier to enterprise adoption was never capability. It was the security conversation — the one that ended with "we can't have an autonomous agent with unrestricted access to our systems." NemoClaw ends that conversation. If you've been watching OpenClaw from the sidelines because your security team wouldn't sign off, this is the version they'll look at differently.
Manus My Computer
Manus, which Meta acquired after its explosion in popularity, shipped a desktop agent this week with terminal access, local file read/edit/launch capabilities, and full control of local applications. It's OpenClaw on your machine, from a company that now has Meta's resources behind it. The capability is not in question. The questions worth asking are the same ones you'd ask about any agent with terminal access: what it touches when you're not watching, and what happens when it gets something wrong.
Claude Co-work Dispatch
Anthropic's Co-work added Dispatch this week — a persistent background conversation that runs on your computer, accepts messages from your phone, and returns finished work when you check back in. One persistent Claude. Always on. Accessible from anywhere.
It is, functionally, what OpenClaw has been for months. Anthropic is building toward something they had a direct path to own and chose not to pursue aggressively enough. The product is good. The timing is late.
MetaClaw: The Learning Agent
The quietest of the four and arguably the most interesting. MetaClaw is an open-source framework that sits on top of OpenClaw and turns every conversation into a learning signal. It intercepts your interactions, extracts skills, saves them to a persistent library, and injects relevant ones into future sessions. Enable reinforcement learning and it fine-tunes itself in the background during idle time. The agent you have next month is better than the one you have today, with no manual intervention required.
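In rough terms, the intercept, extract, store, inject cycle looks something like the sketch below. This is a generic illustration of the pattern, not MetaClaw's actual code; every interface and helper here is hypothetical.

```python
# A minimal sketch of the loop described above. Purely illustrative: these are
# not MetaClaw's real interfaces, and extract_skill() stands in for whatever
# model call distils a reusable skill from a conversation.
import json
from pathlib import Path

LIBRARY = Path("skills.json")  # hypothetical persistent skill library

def load_library() -> list[dict]:
    return json.loads(LIBRARY.read_text()) if LIBRARY.exists() else []

def save_skill(skill: dict) -> None:
    skills = load_library()
    skills.append(skill)
    LIBRARY.write_text(json.dumps(skills, indent=2))

def extract_skill(transcript: str) -> dict | None:
    """Hypothetical: ask a model to summarise one conversation into a reusable skill."""
    ...

def relevant_skills(task: str, skills: list[dict], top_k: int = 3) -> list[dict]:
    """Hypothetical: rank stored skills by relevance to the new task."""
    return skills[:top_k]

def build_prompt(task: str) -> str:
    # Inject previously learned skills into the next session's context.
    skills = relevant_skills(task, load_library())
    preamble = "\n".join(f"- {s['name']}: {s['steps']}" for s in skills)
    return f"Known skills:\n{preamble}\n\nTask: {task}"
```

The part that matters is persistence: the library outlives any single session, which is why the improvement compounds instead of resetting every time you close the terminal.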
Four companies, four versions of the same idea. The differentiator is no longer whether your agent can do the task. It's whether you trust the environment it's running in. NemoClaw answers that for the enterprise. MetaClaw answers it for the power user who wants the agent to keep getting better. The middle ground — capable but running loose — is shrinking fast.
The Model Is Just the Starting Point
Last week, Karpathy showed a model improving itself overnight in a lab experiment. This week, two production systems shipped claiming they did it for real. And a coding tool revealed that the model underneath it isn't the one it started with — and made the case that this shouldn't matter at all.
MiniMax M2.7: Self-Evolution
MiniMax describes M2.7 as "the first model deeply participating in its own evolution." During training, the model ran its own experiments, updated its own tools, iterated through hundreds of cycles, and handled between 30% and 50% of its own development workflow autonomously. They call it self-evolution. The loop Karpathy open-sourced as a weekend experiment, MiniMax is claiming as a production methodology.
The results are hard to dismiss. On independent benchmarks from Artificial Analysis, M2.7 ties GLM5 — currently the best open-source model available. It benchmarks close to Gemini 3.1 Pro and Opus 4.6 on agentic coding tasks. It costs 50 cents per million tokens. Opus costs twenty times more. If the performance holds on your workloads, the pricing conversation ends quickly.
One flag worth raising: MiniMax has historically shipped open-weight models. M2.7 is proprietary. That's a strategic shift from a Chinese lab that built its reputation on openness. Note it.
MiroThinker 1.7 and H1
MiroThinker is an open-source research agent built around a loop of planning, tool use, and verification — not just answering questions but investigating them, checking intermediate steps, and auditing conclusions against evidence before delivering them. The H1 variant adds a second verification layer on top.
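In outline, that loop looks something like the sketch below. It's a generic illustration of the plan, act, verify pattern, not MiroThinker's implementation; every helper is a hypothetical stand-in for a model or tool call.

```python
# The general shape of a plan -> act -> verify research loop, as described above.
# A sketch of the pattern only; all helpers are hypothetical placeholders.
def plan(question: str, evidence: list[str]) -> str:
    """Hypothetical: decide the next sub-question or tool call."""
    return f"search: {question}"

def run_tool(step: str) -> str:
    """Hypothetical: search, browse, calculate, etc."""
    return f"result for {step}"

def verify(step: str, result: str) -> bool:
    """Hypothetical: check an intermediate result before trusting it."""
    return bool(result)

def synthesise(question: str, evidence: list[str]) -> str:
    """Hypothetical: draft an answer from the accumulated evidence."""
    return " ".join(evidence)

def audit(draft: str, evidence: list[str]) -> str:
    """Hypothetical second verification pass, in the spirit of the H1 variant."""
    return draft

def research(question: str, max_rounds: int = 8) -> str:
    evidence: list[str] = []
    for _ in range(max_rounds):
        step = plan(question, evidence)
        result = run_tool(step)
        if verify(step, result):        # only keep results that pass the check
            evidence.append(result)
        if len(evidence) >= 3:          # placeholder stopping rule
            break
    return audit(synthesise(question, evidence), evidence)
```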
The benchmark story is interesting but the prediction story is better. On February 10th, MiroThinker was asked to predict the price of gold on February 25th. It was off by $4 — 0.08%. On January 6th, it was asked who would win the Super Bowl. It identified the Seattle Seahawks. They won on February 8th. Asked on January 8th which artist would dominate the Grammys, it named Kendrick Lamar. He won five awards.
Three predictions. Three correct answers. All made more than a month in advance.
The benchmark cherry-picking in the H1 release is real — comparisons switch between GPT 5.2 and 5.4 depending on the category, and Claude Opus disappears from certain tables entirely. Call that out, but don't let it obscure what the system is actually demonstrating. When your research agent is correctly calling major sporting and financial outcomes a month out, the conversation about which benchmark table it appeared in becomes less urgent.

Cursor Composer 2
Cursor's new native coding model is built on Kimi K2 — a Chinese open-weight model that Cursor took, fine-tuned specifically for coding tasks, and shipped as their own. Some corners of the developer community found that uncomfortable. The relevant response is: so what?
Someone took an existing model, retrained it with a clear objective, and produced a tool that benchmarks near GPT 5.4 on coding at a fraction of the cost. That's not a provenance story. That's the point. The model you start with stopped being the model you have to end with the day open weights became widely available. Cursor just proved it commercially. If you use Cursor daily, Composer 2 is your new default — switch to GPT 5.4 only when it gets stuck. The cost saving compounds quickly.
The thread across all three: MiniMax improved the model during training. MiroThinker improves its answers through iterative research loops. Cursor improved a base model for a specific purpose and priced it accordingly. The base model is the starting point. What you do with it is the product.
Jensen's Number
Nvidia GTC ran for the best part of a week. Jensen's keynote ran for two hours. The announcements covered supercomputing architecture, space-based data centres, neural rendering, autonomous vehicles, drug discovery, climate modelling, and humanoid robotics. One number cut through all of it.
Through 2027, Nvidia projects $1 trillion in GPU sales. Jensen confirmed at a private press Q&A that these are not forecasts — they are purchase orders. Companies have already committed to buying these chips when they become available. For context, Nvidia did approximately $500 billion in the previous year. They are projecting a doubling, backed by signed commitments.
The infrastructure story of the next two years just got a hard number attached to it.
Vera Rubin
The platform those chips will anchor is Vera Rubin — seven purpose-built chips working as a single unit, liquid-cooled racks treated as one compute node, terabyte-per-second internal links. Nvidia calls it extreme co-design: the entire data centre as one vertically integrated supercomputer. Training, inference, and agentic workflows on the same stack. Lowest cost per token claimed. Ships in stages through 2026 and 2027.
DLSS 5
The gamer backlash to DLSS 5 was predictable and mostly missed the point. This isn't a filter applied to existing games without consent. Game developers choose whether to implement it, set how aggressively it applies, and players can turn it off entirely. What it actually does — fusing traditional 3D rendering with generative AI to produce frames that look more detailed than the engine actually computed — is a genuinely novel approach to graphics that will matter more as it matures. The angry Reddit posts are noise. The underlying technique is not.
The Open Model Ecosystem
Beneath the hardware announcements, Nvidia quietly reinforced its open-weight model portfolio: Nemotron for language and reasoning, Cosmos for physics simulation and robot training, Isaac Groot for humanoid robot control, Alpameo for autonomous driving, BioNemo for drug discovery, Earth 2 for climate modelling. Each is open-weight and deployable. Nvidia is not just selling the GPUs to run AI — it's building the models that run on them. Vertical integration, top to bottom.
Google's Full-Stack Moment
Google shipped two products this week that the press covered separately and should have covered together.
Stitch
Stitch is Google's AI-native design canvas: voice-controlled, infinite, Figma-adjacent in feel but built from the ground up for agent interaction. The key feature isn't the interface; it's the design.md export. Describe your design system in a markdown file and every new design you create inherits those rules automatically. That markdown file is also importable by Claude Code, Cursor, and OpenClaw. Google built a design tool specifically so your agent can read it.
In testing, Stitch generated four distinct website design variations from a single voice prompt, each with different layouts and colour schemes, rendered side by side for comparison. The quality is genuine, and it's faster than anything comparable.
Google AI Studio Full-Stack Coding
The companion piece. Take your Stitch export (the code.html, design.md, and screen.png files), paste it into AI Studio, and ask it to build a functional version. In one demonstration, a complete animated website with working filters and hover effects was produced from a single prompt. Dark mode and some interactions needed further iteration. The core loop (design in Stitch, code in AI Studio, iterate between them) is real and it works today.
The combined workflow is the story. Design tool to code tool, round-trip, no manual translation layer, no handoff meeting, no separate developer required for the initial build. Google has quietly assembled a complete pipeline and shipped both ends of it in the same week. That deserves more attention than it got.
Gemini Personal Intelligence — Now Free
Brief but worth noting: the Gemini feature that connects your Gmail, Google Photos, and Calendar to your AI conversations — previously paywalled — is now rolling out to free users in the US. The personalisation gap between paid and free AI just narrowed significantly.
ID LoRA: The Capability Everyone Will Misuse First
ID LoRA is an open-source unified model that takes a reference image and a voice sample and generates a video of that person appearing to speak or sing — lip-synced, identity-consistent, production quality. Voice cloning and video generation in a single pipeline, not stitched together from separate tools.
The obvious conversation is the misuse one, and it's legitimate. A model that can clone anyone's voice and face from reference material and generate a convincing video of them saying anything is a serious capability that will be used irresponsibly before it's used responsibly.
But for legitimate use cases — artist performance visuals, branded spokesperson content, language localisation of existing video, accessibility applications — this is a step change from what was possible six months ago. The same capability that makes it dangerous for bad actors makes it transformative for legitimate ones.
It's open-source. It's available now. Understanding what it does is no longer optional for anyone working in video production, brand content, or media. The question isn't whether this technology exists — it does, and it's free. The question is whether you understand it well enough to use it intentionally, defend against it, and advise clients about it. If the answer is no, this week is a good time to change that.
Open Source: What Else Shipped
- Google Spark VSSR: The best open-source video upscaler available. It takes low-quality footage and outputs clean high-resolution video; tested against every comparable open-source alternative, it wins by a clear margin. Full weights, inference code, and training code have been released. At 42GB it needs a higher-end GPU, but if you're doing any video restoration or upscaling work, this is now the default option.
- Dreamverse / FastVideo: Near real-time video generation built on LTX3. A 5-second 1080p clip in 4.5 seconds on a single B200 GPU. Edit prompts land in under 3 seconds. The output has distortions at the edges and isn't production-ready, but the latency is genuinely new territory. Watch how this develops as the hardware requirement comes down.
- Terminator: Not a model. A single-layer probe that sits on top of existing reasoning models and stops them when the answer is ready. It cuts reasoning length by up to 55%, halves generation time, and reduces token costs accordingly. For OpenClaw and other agentic systems where models tend to keep thinking long past the point of usefulness, this is a meaningful cost lever; a sketch of the idea follows this list. Code is coming but not released yet. Watch for it.
- MidJourney V8 / Microsoft MAI Image 2: Two image models worth a brief mention. MidJourney V8 is faster and better at surreal creative work, but continues to struggle with instruction-following and text rendering where competitors have moved ahead. MAI Image 2 from Microsoft enters the arena at #3, with strong photorealism and reliable in-image text generation. Neither shifts the top of the leaderboard. Both are worth a test if you haven't already.
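The Terminator idea, sketched: a single linear layer reads the reasoning model's hidden state at each step and predicts whether the answer is already determined, so generation can stop early. The actual code hasn't been released, so the wiring, dimensions, and threshold below are assumptions, not the authors' implementation.

```python
# Hedged sketch of an early-exit "stop probe" over a reasoning model's hidden
# states. Assumes a HuggingFace-style model interface; everything else is a
# placeholder chosen for illustration.
import torch
import torch.nn as nn

class StopProbe(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)  # single-layer probe

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: last-layer activation for the current token, shape (hidden_size,)
        return torch.sigmoid(self.linear(hidden_state))

def generate_with_probe(model, probe: StopProbe, prompt_ids: torch.Tensor,
                        threshold: float = 0.9, max_new_tokens: int = 2048) -> torch.Tensor:
    """Hypothetical wiring: end the reasoning trace once the probe is confident."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        out = model(ids, output_hidden_states=True)    # HF-style forward pass
        last_hidden = out.hidden_states[-1][0, -1]     # last layer, last token
        next_id = out.logits[0, -1].argmax().reshape(1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if probe(last_hidden).item() > threshold:      # answer judged ready: stop thinking
            break
    return ids
```

The probe itself is cheap to run per token, which is why an up-to-55% cut in reasoning length translates almost directly into the time and token savings claimed.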
What Agencies Do Next
- Evaluate NemoClaw if your OpenClaw adoption has stalled on security grounds. Single command install, enforced policy layer, enterprise-grade containment. The blocker your security team cited may no longer apply. Revisit the conversation this week.
- Install MetaClaw on your existing OpenClaw setup. The agent that learns from your conversations and improves in the background costs nothing to run and compounds over time. Set it up before you need it, not after you've been running static workflows for another month.
- Switch Cursor to Composer 2 as your default model. Near GPT 5.4 coding quality at a fraction of the cost. Keep 5.4 available for when Composer 2 gets stuck. The savings compound at scale.
- Run the Google Stitch and AI Studio loop on one real project this week. Design a page or component in Stitch, export it, code it in AI Studio, iterate. Do this on something real, not a test. The workflow is production-ready enough to find out where it breaks on your specific use case.
- Understand ID LoRA before a client asks you about it. You will be asked. Run a test, understand the output quality, and know what you'd advise a brand or media client who discovers it independently. Being ahead of this conversation is considerably more valuable than catching up after the fact.
Bangkok8 AI: We'll tell you where the frontier is — so you can build past it.