oh-my-agent v2: Nine New Skills, First-Class Cursor, and an 80/100 Benchmark
oh-my-agent v2 adds nine new skills, promotes Cursor to a first-class vendor, and ships a benchmark scoring 80/100. A measured look at whether it fixes the agent failures developers actually hit.
If you have watched an AI coding agent install a package version that does not exist in your lockfile, or ship a function that fails your own lint config on the first commit, you already understand the gap oh-my-agent v2 is built to close. The framework’s second major release adds nine new skills, promotes Cursor to a first-class vendor, and ships a benchmark that scores the toolkit 80 out of 100.
Here is what v2 changes, and how to decide whether the additions target real failure modes or just expand the surface area.
What oh-my-agent does
oh-my-agent is a skill layer that sits between you and whatever AI coding agent you run. The name borrows from oh-my-zsh, and the analogy holds: instead of configuring shell behavior, you configure agent behavior with reusable, composable instruction modules the project calls skills.
The problem it targets is consistency. A raw coding agent keeps no durable memory of your project’s conventions. Ask it to add a dependency and it may guess a version that is not in your lockfile. Ask it to write a component and it may ignore the lint config sitting in your repo root. These are not edge cases — they are the default behavior of an agent that treats every request as a fresh context.
A skill in oh-my-agent is a packaged set of instructions and checks the agent loads when a task matches. One skill might force the agent to read your package.json and lockfile before proposing a version. Another might surface your linter rules before any code is written. The pitch is that you stop re-explaining the same constraints in every prompt.
The nine new skills in v2
The v2 release adds nine skills. Three are worth calling out, because they map to problems most teams hit within a week of adopting an agent.
deepsec handles security review. Instead of trusting the agent to remember secure patterns, the skill runs a structured pass over generated code, checking for the injection, secret-handling, and trust-boundary mistakes agents introduce when they optimize for making something work.
observability pushes the agent to add logging, metrics, and tracing as it writes code, rather than leaving instrumentation as a follow-up task that never happens.
docs drift detection is the one most teams underrate. When an agent changes a function signature or a config option, the matching documentation usually goes stale without anyone noticing. This skill flags the gap so docs and code stay in sync.
The remaining six skills round out areas like testing and project conventions. The pattern across all nine is the same: take a step a developer is supposed to do, and make it a non-optional part of the agent’s workflow instead of a hope.
Cursor becomes a first-class vendor
Earlier oh-my-agent releases were built around one agent and treated the rest as second-class. v2 changes the model. A vendor is the underlying agent that executes skills, and Cursor is now a first-class vendor, which means skills are tested against it and ship with Cursor-specific wiring rather than a generic fallback.
In practice, you can keep oh-my-agent’s skill definitions in one place and run them through Cursor’s agent without rewriting instructions per tool. For teams that have standardized on Cursor as their editor, that removes the main reason to maintain a separate, hand-rolled set of project rules.
Cursor
The AI-native editor that oh-my-agent v2 now supports as a first-class vendor, with tested skill wiring instead of a generic fallback.
Free tier; Pro $20/month
Affiliate link · We earn a commission at no cost to you.
First-class status is a maintenance commitment, not a one-time feature. The thing to watch over the next few releases is whether Cursor support keeps pace with the primary vendor or quietly drifts behind it — the usual failure pattern for multi-vendor tools.
What the 80/100 benchmark does and doesn’t tell you
v2 ships with a benchmark that scores the toolkit 80 out of 100. A published, repeatable number is useful on its own: it gives you a baseline to compare future releases against, and it signals the project is willing to measure itself instead of leaning on adjectives.
Treat the number as a starting point, not a verdict. A benchmark reflects the tasks its authors chose. An 80 on the project’s own suite tells you the skills behave as designed on that suite. It does not tell you how they perform on your codebase, your stack, or your conventions.
The honest read on v2: the release aims squarely at the most common, least glamorous agent failures — wrong versions, ignored configs, stale docs — rather than chasing a flashier capability. That is the right target. The open question is operational. Nine new skills is a lot of surface to keep working across two first-class vendors, and the real proof will be whether release three holds the line.
FAQ
Do I still need oh-my-agent if I already use Cursor? +
What does 'first-class vendor' actually mean? +
Will the skills help on any codebase? +
Related reading
2026-05-26
Orthrus: Parallel Token Generation That Doesn't Change Your Model's Output
Orthrus injects diffusion attention into each layer of a frozen autoregressive Transformer to generate 32 tokens in parallel — without altering the base model's output distribution.
2026-05-26
NVIDIA Warp Review: GPU-Accelerated Python for Simulation, Robotics, and Differentiable ML
NVIDIA Warp compiles Python functions to CUDA kernels for differentiable physics and robotics. We benchmarked it against JAX and Taichi to figure out when it earns a spot in your stack.
2026-05-26
OpenAI Daybreak vs Anthropic Glasswing: Convergent Bets on LLM Security Tooling
OpenAI's Daybreak (GPT-5.5 + Codex Security) and Anthropic's Glasswing shipped near-identical AppSec products the same week. What the convergence means and how to pick.
2026-05-26
Macchiato Day 2: Live Token Metrics and Parallel AI Terminals Reviewed
Macchiato's day-2 build adds a live token/cost sidebar and keyboard shortcuts for swapping between Claude Code and OpenCode in one terminal. Here's what shipped and what it means.
2026-05-26
Macchiato Day 2: Live Token Metrics and Parallel Terminals for Claude Code and OpenCode
Macchiato Day 2 adds a 2-4 pane terminal grid, live token and cost meters, and configurable spend ceilings for Claude Code and OpenCode sessions. Here is what it actually does and who should install it.
Get the best tools, weekly
One email every Friday. No spam, unsubscribe anytime.