oh-my-agent v2: Nine New Skills, First-Class Cursor, and an 80/100 Benchmark

If you have watched an AI coding agent install a package version that does not exist in your lockfile, or ship a function that fails your own lint config on the first commit, you already understand the gap oh-my-agent v2 is built to close. The framework’s second major release adds nine new skills, promotes Cursor to a first-class vendor, and ships a benchmark that scores the toolkit 80 out of 100.

Here is what v2 changes, and how to decide whether the additions target real failure modes or just expand the surface area.

What oh-my-agent does

oh-my-agent is a skill layer that sits between you and whatever AI coding agent you run. The name borrows from oh-my-zsh, and the analogy holds: instead of configuring shell behavior, you configure agent behavior with reusable, composable instruction modules the project calls skills.

The problem it targets is consistency. A raw coding agent keeps no durable memory of your project’s conventions. Ask it to add a dependency and it may guess a version that is not in your lockfile. Ask it to write a component and it may ignore the lint config sitting in your repo root. These are not edge cases. They are the default behavior of an agent that treats every request as a fresh context.

A skill in oh-my-agent is a packaged set of instructions and checks the agent loads when a task matches. One skill might force the agent to read your package.json and lockfile before proposing a version. Another might surface your linter rules before any code is written. The pitch is that you stop re-explaining the same constraints in every prompt.
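
To make the lockfile example concrete, here is a minimal sketch of the kind of check such a skill could ask the agent to run before it proposes a version. It is not oh-my-agent's implementation: the function names and the sample data are hypothetical, and it assumes an npm lockfile in the v2/v3 layout, where installed packages are listed under a "packages" key as "node_modules/<name>".

```python
import json
from typing import Optional


def locked_version(lockfile_text: str, package: str) -> Optional[str]:
    """Return the version pinned for `package` in an npm v2/v3 lockfile, if any."""
    lock = json.loads(lockfile_text)
    entry = lock.get("packages", {}).get(f"node_modules/{package}")
    return entry.get("version") if entry else None


def check_proposal(lockfile_text: str, package: str, proposed: str) -> str:
    """Compare a version the agent wants to use with what the lockfile pins."""
    pinned = locked_version(lockfile_text, package)
    if pinned is None:
        return f"{package} is not in the lockfile; confirm before adding {proposed}"
    if pinned != proposed:
        return f"{package} is pinned to {pinned}; do not propose {proposed}"
    return f"{package}@{proposed} matches the lockfile"


# Hypothetical lockfile content, inlined so the sketch runs on its own.
SAMPLE_LOCK = json.dumps({
    "lockfileVersion": 3,
    "packages": {"node_modules/left-pad": {"version": "1.3.0"}},
})

print(check_proposal(SAMPLE_LOCK, "left-pad", "2.0.0"))
```

The point of the sketch is the ordering: the lockfile is read first, and the proposed version is only accepted if it agrees with what is already pinned.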

The nine new skills in v2

The v2 release adds nine skills. Three are worth calling out, because they map to problems most teams hit within a week of adopting an agent.

deepsec handles security review. Instead of trusting the agent to remember secure patterns, the skill runs a structured pass over generated code, checking for the injection, secret-handling, and trust-boundary mistakes agents introduce when they optimize for making something work.
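
As a rough illustration of what a structured pass over generated code can look like, the sketch below scans source text line by line against a small rule table. The two rules and every name in it are assumptions made for this example; they are not deepsec's actual checks, and a real review would need far more than regular expressions.

```python
import re
from typing import List, Tuple

# Hypothetical rule set for illustration only; not deepsec's real checks.
RULES = {
    # A credential-like name assigned a string literal.
    "hardcoded-secret": re.compile(
        r"(?i)\b(api_key|secret|password|token)\s*=\s*['\"][^'\"]+['\"]"
    ),
    # An f-string passed directly to execute(), a common injection risk.
    "sql-string-format": re.compile(r"execute\(\s*f['\"]"),
}


def review(source: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Sample "generated" code: two risky lines and one parameterized query.
GENERATED = '''\
api_key = "sk-test-123"
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
'''

for lineno, rule in review(GENERATED):
    print(f"line {lineno}: {rule}")
```

The parameterized query on the third line passes, which is the behavior you want from a check like this: it flags the pattern, not the use of a database.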

observability pushes the agent to add logging, metrics, and tracing as it writes code, rather than leaving instrumentation as a follow-up task that never happens.
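
For a sense of what instrumented output might look like, here is a small sketch of a function wrapped with logging and a call counter at the moment it is written. The decorator, the in-memory counter standing in for a metrics client, and the example function are all hypothetical; the skill itself may ask for something quite different.

```python
import functools
import logging
import time

logger = logging.getLogger("orders")
CALL_COUNTS = {}  # stand-in for a real metrics client (assumption)


def instrumented(func):
    """Wrap `func` with a call counter, a duration log, and failure logging."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        CALL_COUNTS[func.__name__] = CALL_COUNTS.get(func.__name__, 0) + 1
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log the traceback, then let the caller see the original error.
            logger.exception("%s failed", func.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("%s finished in %.1f ms", func.__name__, elapsed_ms)
    return wrapper


@instrumented
def apply_discount(total: float, percent: float) -> float:
    """Example business function that ships already instrumented."""
    return round(total * (1 - percent / 100), 2)


print(apply_discount(200.0, 15))
print(CALL_COUNTS)
```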

docs drift detection is the one most teams underrate. When an agent changes a function signature or a config option, the matching documentation usually goes stale without anyone noticing. This skill flags the gap so docs and code stay in sync.
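
One simple way to picture drift detection is to compare a function's real parameters with the call signature its documentation shows. The sketch below does that with Python's inspect module. The helper names and the sample function and docs are hypothetical, and the actual skill may detect drift in a different way.

```python
import inspect
import re
from typing import List, Optional


def documented_params(doc_text: str, func_name: str) -> Optional[List[str]]:
    """Find `func_name(a, b)` in the docs and return the parameter names shown."""
    match = re.search(rf"\b{re.escape(func_name)}\(([^)]*)\)", doc_text)
    if match is None:
        return None
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def drift(func, doc_text: str) -> List[str]:
    """List mismatches between the real signature and the documented one."""
    actual = list(inspect.signature(func).parameters)
    documented = documented_params(doc_text, func.__name__)
    if documented is None:
        return [f"{func.__name__} is not documented"]
    problems = []
    for name in actual:
        if name not in documented:
            problems.append(f"undocumented parameter: {name}")
    for name in documented:
        if name not in actual:
            problems.append(f"stale parameter in docs: {name}")
    return problems


def connect(host, port, timeout):
    """Hypothetical function; `timeout` was added after the docs were written."""


DOCS = "Call connect(host, port) to open a session."

print(drift(connect, DOCS))
```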

The remaining six skills round out areas like testing and project conventions. The pattern across all nine is the same: take a step a developer is supposed to do, and make it a non-optional part of the agent’s workflow instead of a hope.

Cursor becomes a first-class vendor

Earlier oh-my-agent releases were built around one agent and treated the rest as second-class. v2 changes the model. A vendor is the underlying agent that executes skills, and Cursor is now a first-class vendor, which means skills are tested against it and ship with Cursor-specific wiring rather than a generic fallback.

In practice, you can keep oh-my-agent’s skill definitions in one place and run them through Cursor’s agent without rewriting instructions per tool. For teams that have standardized on Cursor as their editor, that removes the main reason to maintain a separate, hand-rolled set of project rules.
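
The idea of one skill definition with per-vendor wiring can be sketched as a single data structure plus one renderer per vendor. Everything below is an assumption made for illustration: the Skill fields, the renderer names, and the frontmatter-style output for Cursor are not taken from oh-my-agent's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A single vendor-neutral skill definition (hypothetical shape)."""
    name: str
    description: str
    instructions: str


def render_generic(skill: Skill) -> str:
    """Fallback wiring: a plain heading followed by the instructions."""
    return f"# {skill.name}\n\n{skill.instructions}\n"


def render_cursor(skill: Skill) -> str:
    """Vendor-specific wiring; the frontmatter layout here is an assumption."""
    header = f"---\ndescription: {skill.description}\nalwaysApply: false\n---\n"
    return f"{header}\n{skill.instructions}\n"


RENDERERS = {"generic": render_generic, "cursor": render_cursor}


def render(skill: Skill, vendor: str) -> str:
    """Use the vendor's renderer if one exists, else the generic fallback."""
    return RENDERERS.get(vendor, render_generic)(skill)


lockfile_skill = Skill(
    name="respect-lockfile",
    description="Read the lockfile before proposing a dependency version",
    instructions="Before adding a dependency, read package.json and the lockfile.",
)

print(render(lockfile_skill, "cursor"))
```

In this framing, a first-class vendor is one with its own tested renderer, while any other vendor falls through to the generic one.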

First-class status is a maintenance commitment, not a one-time feature. The thing to watch over the next few releases is whether Cursor support keeps pace with the primary vendor or quietly drifts behind it, which is the usual failure pattern for multi-vendor tools.

What the 80/100 benchmark does and doesn’t tell you

v2 ships with a benchmark that scores the toolkit 80 out of 100. A published, repeatable number is useful on its own: it gives you a baseline to compare future releases against, and it signals the project is willing to measure itself instead of leaning on adjectives.

Treat the number as a starting point, not a verdict. A benchmark reflects the tasks its authors chose. An 80 on the project’s own suite tells you the skills behave as designed on that suite. It does not tell you how they perform on your codebase, your stack, or your conventions.
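
One way to get a number that does describe your codebase is to run the same small set of tasks on a real branch with and without the skills, then compare pass rates. The sketch below shows only the arithmetic. The task names and pass/fail values are made-up placeholders, not measurements, and the scoring rule is an assumption rather than the project's benchmark method.

```python
from typing import Dict, List


def score(results: Dict[str, bool]) -> float:
    """Percentage of tasks that passed, on a 0-100 scale."""
    if not results:
        raise ValueError("no task results to score")
    return 100 * sum(results.values()) / len(results)


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Tasks that passed before but fail after; missing tasks count as failing."""
    return [task for task, ok in before.items() if ok and not after.get(task, False)]


# Placeholder outcomes for four tasks on your own repo (not real data).
baseline = {
    "pin-dependency": False,
    "pass-lint": False,
    "update-docs": False,
    "fix-bug": True,
}
with_skills = {
    "pin-dependency": True,
    "pass-lint": True,
    "update-docs": False,
    "fix-bug": True,
}

print(f"baseline: {score(baseline):.0f}/100")
print(f"with skills: {score(with_skills):.0f}/100")
print(f"regressions: {regressions(baseline, with_skills)}")
```

Tracking regressions alongside the score matters because a higher total can hide a task that used to pass and no longer does.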

The honest read on v2: the release aims squarely at the most common, least glamorous agent failures (wrong versions, ignored configs, stale docs) rather than chasing a flashier capability. That is the right target. The open question is operational. Nine new skills is a lot of surface to keep working across two first-class vendors, and the real proof will be whether the third major release still keeps all nine working on both.

FAQ

Do I still need oh-my-agent if I already use Cursor?

They solve different problems. Cursor gives you the agent that writes code; oh-my-agent is the layer that tells that agent which project-specific rules to enforce: your dependency versions, lint config, and documentation. First-class vendor support means the two work together without per-tool rewrites.

What does 'first-class vendor' actually mean?

A vendor is the underlying agent that executes oh-my-agent's skills. First-class means the project tests its skills against that agent and ships agent-specific wiring, rather than a generic fallback that may or may not behave correctly. In v2, Cursor joined that tier.

Will the skills help on any codebase?

The nine skills are general-purpose instruction modules, so they apply broadly. But the 80/100 benchmark reflects the project's own test suite, not your repo. Whether they measurably reduce failures on your stack is something to verify on a real branch before rolling them out team-wide.
