AI Tools That Actually Replace SQL Skills for PMs (And the Ones That Don't)
I tested 6 'natural language to data' tools across the kind of queries PMs actually need to run — retention, funnel, cohort analysis. Some are genuinely usable now. Some produce confidently wrong answers.
The premise
“I shouldn’t have to wait 3 days for a data team query” has been the PM productivity dream for a decade. The 2024-2025 generation of AI-on-warehouse tools claims to deliver it: type “what’s the 28-day retention curve for users who completed onboarding in March?” and get back a chart. No SQL. No queue.
I ran 12 queries through 6 tools across two months. The queries were intentionally PM-realistic — not “SELECT COUNT(*) FROM users” softballs, but real questions like “what’s the 7-day-to-28-day retention drop-off for users who came from the LinkedIn ad campaign vs. organic search?” These are the queries that justify the spend.
Here’s what actually worked.
The test queries
1. Daily/weekly/monthly active users for the last 90 days
2. Funnel from signup → trial → paid for a specific cohort
3. 28-day retention by acquisition channel
4. Average time-to-value for new users (defined as their 3rd “successful” action)
5. Churn rate over the last 6 months, segmented by plan tier
6. Top 10 most-used features by paid customers in the last 30 days
7. Revenue per cohort by signup month
8. Users who completed signup but never returned (the “ghost” cohort)
9. Difference in feature adoption between US and EU users
10. NPS score correlation with feature usage patterns
11. Predicted churn next quarter based on engagement trends
12. Cohort comparison of feature X usage before/after a pricing change
Queries 1-4 are the “easy” tier (single table, basic aggregates). 5-8 are PM-daily-grind (joins, time windows, cohort logic). 9-12 are “this is why your data team is busy” tier.
What I tested
- ThoughtSpot (Spotter natural language layer): $1,200/year minimum
- Hex with Magic AI: $35/seat/month
- Mode with AI Assistant: $30/seat/month
- Sigma with Ask Sigma: $360/seat/year
- Snowflake Cortex (direct query in their UI): pay-per-query
- A custom Claude + dbt + Snowflake pipeline my data lead set up: nominal
The custom Claude pipeline is the surprise. Giving Claude Opus 4.7 access to your dbt model definitions and a read-only Snowflake connection produces better answers than most of the commercial tools, for about $0.50/query in API costs and roughly $2k of one-time setup work from a data engineer.
Results
| Tier | ThoughtSpot | Hex | Mode | Sigma | Snowflake Cortex | Claude+dbt |
|---|---|---|---|---|---|---|
| 1-4 (easy) | ✅ ✅ ✅ ✅ | ✅ ✅ ✅ ✅ | ✅ ✅ ✅ ✅ | ✅ ✅ ✅ ✅ | ✅ ✅ ⚠️ ✅ | ✅ ✅ ✅ ✅ |
| 5-8 (grind) | ✅ ⚠️ ⚠️ ❌ | ✅ ✅ ⚠️ ⚠️ | ✅ ⚠️ ⚠️ ❌ | ✅ ✅ ⚠️ ❌ | ⚠️ ⚠️ ❌ ❌ | ✅ ✅ ✅ ✅ |
| 9-12 (hard) | ❌ ❌ ❌ ❌ | ⚠️ ⚠️ ❌ ❌ | ❌ ❌ ❌ ❌ | ⚠️ ❌ ❌ ❌ | ❌ ❌ ❌ ❌ | ✅ ⚠️ ❌ ⚠️ |
- ✅ = correct answer, verified against a hand-written SQL query I ran separately
- ⚠️ = approximately correct or partially correct (e.g., right shape, wrong numbers in one segment)
- ❌ = wrong answer or hallucinated structure (e.g., joined the wrong tables, fabricated a column name)
What this means
Easy queries are solved. All six tools handle “DAU for last 90 days” and similar single-table aggregations correctly; the only blemish in this tier was one partially correct answer from Snowflake Cortex. If that’s the bulk of your data needs, any of them works.
The PM-daily-grind tier is where the tools differentiate, and where the commercial tools fall down hardest. The failure mode is consistent: the tool joins two tables on columns that look compatible (a shared name or matching types) but aren’t actually joinable in a meaningful way, produces a result that looks plausible, and doesn’t flag any uncertainty. I caught Hex confidently reporting acquisition-channel retention that it had computed by joining users.channel_id to events.id (different IDs entirely, but the column types matched).
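To make that failure mode concrete, here is a minimal sketch using an in-memory SQLite database. Only users.channel_id and events.id come from the example above; every other table, column, and value is a toy assumption, and the `DECLARED_JOINS` allowlist is a hypothetical stand-in for whatever join documentation your team keeps. It shows why row counts alone won't save you: the wrong join and the right join return the same number of rows.

```python
import sqlite3

# Toy schema. Only users.channel_id and events.id are taken from the
# article; all other names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, channel_id INTEGER);
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users  VALUES (101, 1), (102, 2), (103, 2);
    INSERT INTO events VALUES (1, 101), (2, 101), (3, 103);
""")

# The bad join: both columns are integers, so the database accepts it
# and happily returns rows.
bad_rows = conn.execute(
    "SELECT u.id, e.id FROM users u JOIN events e ON u.channel_id = e.id"
).fetchall()

# The intended join: events belong to users through events.user_id.
good_rows = conn.execute(
    "SELECT u.id, e.id FROM users u JOIN events e ON e.user_id = u.id"
).fetchall()

# Hypothetical allowlist of documented join keys (an assumption here).
DECLARED_JOINS = {frozenset({"users.id", "events.user_id"})}


def is_declared_join(left: str, right: str) -> bool:
    """Return True only if this column pair is a documented join key."""
    return frozenset({left, right}) in DECLARED_JOINS


print(len(bad_rows), len(good_rows))
print(is_declared_join("users.channel_id", "events.id"))
print(is_declared_join("events.user_id", "users.id"))
```

The point of the allowlist is that the check has to come from declared relationships, not from the data: in this toy data every channel_id also happens to exist as an event id, so an overlap test would pass the bad join too.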
The hard tier exposes the gap. Question 11 (“predict churn next quarter”) requires a model. Question 12 (“cohort comparison around a pricing change”) requires understanding that “before pricing change” needs to be defined as “before May 15, 2026” — a fact that lives in your team’s heads, not the warehouse. None of the commercial tools handle this.
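A small sketch of why the pricing-change question depends on context the warehouse doesn't hold. The records and the `feature_x_rate` helper are hypothetical; the only thing taken from the text is the May 15, 2026 cutoff, and treating the change day itself as "after" is an assumption someone on the team would have to confirm.

```python
from datetime import date

# The cutoff is business context, not warehouse data: a human supplies it.
PRICING_CHANGE = date(2026, 5, 15)

# Hypothetical records, one per user: (user_id, activity_date, used_feature_x)
RECORDS = [
    (1, date(2026, 5, 1), True),
    (2, date(2026, 5, 10), False),
    (3, date(2026, 5, 14), True),
    (4, date(2026, 5, 15), False),
    (5, date(2026, 5, 20), True),
    (6, date(2026, 5, 25), False),
    (7, date(2026, 5, 27), False),
]


def feature_x_rate(records, cutoff, after):
    """Share of records in the before/after window where feature X was used.

    Assumption: the cutoff day itself counts as "after".
    """
    window = [used for _, day, used in records if (day >= cutoff) == after]
    return sum(window) / len(window) if window else None


before = feature_x_rate(RECORDS, PRICING_CHANGE, after=False)
after = feature_x_rate(RECORDS, PRICING_CHANGE, after=True)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Nothing in the query itself is hard; the hard part is that `PRICING_CHANGE` has to be handed to the tool explicitly, because no amount of schema introspection will recover it.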
Why Claude+dbt wins
The differentiator isn’t model quality; Hex and Mode use comparable models. It’s the grounding. When Claude has your full dbt model definitions in context, it can read the column descriptions, see the join keys you’ve declared in your .yml files, and understand which tables represent what. The commercial tools have to infer this from schema introspection, which works for simple cases and breaks as soon as the schema contains misleadingly named columns.
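As a rough illustration of what "grounding" can look like, the sketch below turns dbt-style model documentation into a plain-text block that could be placed in a model's context. This is not the pipeline my data lead built; `SCHEMA`, `render_context`, and all model and column names are hypothetical. The dict mirrors the general shape of a dbt properties file (models, columns, a relationships test), but the exact keys vary by dbt version, and in practice you would load the real .yml with a YAML parser.

```python
# Parsed contents of a hypothetical dbt properties file. A dict literal
# keeps the sketch dependency-free; all names are illustrative assumptions.
SCHEMA = {
    "models": [
        {
            "name": "events",
            "description": "One row per product event.",
            "columns": [
                {"name": "id", "description": "Event primary key."},
                {
                    "name": "user_id",
                    "description": "User who triggered the event.",
                    "tests": [
                        {"relationships": {"to": "ref('users')", "field": "id"}}
                    ],
                },
            ],
        },
    ]
}


def render_context(schema: dict) -> str:
    """Flatten model docs and declared join keys into prompt-ready text."""
    lines = []
    for model in schema.get("models", []):
        lines.append(f"TABLE {model['name']}: {model.get('description', '')}")
        for col in model.get("columns", []):
            lines.append(f"  - {col['name']}: {col.get('description', '')}")
            for test in col.get("tests", []):
                # Tests may be plain strings (e.g. "unique") or dicts;
                # only relationships tests declare a join key.
                if isinstance(test, dict) and "relationships" in test:
                    rel = test["relationships"]
                    lines.append(
                        f"    JOIN KEY: {model['name']}.{col['name']} -> "
                        f"{rel['to']}.{rel['field']}"
                    )
    return "\n".join(lines)


context = render_context(SCHEMA)
print(context)
```

The design choice worth noting is that join keys are surfaced as explicit lines rather than left for the model to infer, which is exactly the information schema introspection can't provide.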
This is a real moat for any data team that’s invested in dbt. If you have dbt models with documentation, you can deploy Claude + a read-only warehouse connection in a day and get better-than-commercial results.
If you don’t have dbt and your warehouse schema has cryptic column names, you’ll have a bad time with all six options.
The “will this replace SQL” question
For 60% of PM queries: yes, today. The 60% being everything in the easy tier plus the simple end of the grind tier.
For 30% of PM queries: not unassisted. One of these tools gets you a starting point quickly, but you still need your data team to verify the join logic. This is still a 5x speedup over the old “file a ticket, wait 3 days” workflow.
For 10% of PM queries: absolutely not. The hard tier requires data engineering work that no AI tool will do reliably in 2026.
The 60% number is high enough that “the median PM can self-serve most of their data needs” is now true. The implication for the data team is that the queries that do reach them are now harder on average — the easy stuff stopped reaching the queue. Plan capacity accordingly.
What I’d actually deploy
For a 30-person product org with an existing Snowflake + dbt setup: build the Claude+dbt pipeline. Cost: ~$2k of data-eng time + ~$200/month in Claude API costs for the whole team. Output quality matches or exceeds the commercial tools at 5-10x lower cost.
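The cost comparison can be checked with a few lines of arithmetic. The prices come from the tool list above; two things are assumptions for this sketch: that all 30 people need a seat, and that the one-time setup is amortized over 12 months. ThoughtSpot is left out because its listed price is a yearly minimum rather than a per-seat figure.

```python
SEATS = 30  # assumption: every member of the 30-person org gets a seat

# Monthly per-seat prices from the tool list (Sigma: $360/year = $30/month).
PER_SEAT_MONTHLY = {"Hex": 35.0, "Mode": 30.0, "Sigma": 360.0 / 12}

CLAUDE_API_MONTHLY = 200.0      # whole-team API estimate from the article
CLAUDE_SETUP_ONE_TIME = 2000.0  # one-time data-engineering setup
AMORTIZE_MONTHS = 12            # assumption: spread setup over one year


def monthly_cost_per_seat_tool(price: float, seats: int = SEATS) -> float:
    """Monthly bill for a per-seat tool across the whole org."""
    return price * seats


def claude_monthly(first_year: bool) -> float:
    """Monthly Claude+dbt cost, with setup amortized only in year one."""
    setup = CLAUDE_SETUP_ONE_TIME / AMORTIZE_MONTHS if first_year else 0.0
    return CLAUDE_API_MONTHLY + setup


for tool, price in PER_SEAT_MONTHLY.items():
    cost = monthly_cost_per_seat_tool(price)
    ratio = cost / claude_monthly(first_year=False)
    print(f"{tool}: ${cost:,.0f}/month, {ratio:.2f}x the steady-state Claude+dbt cost")

print(f"Claude+dbt, first year: ${claude_monthly(True):,.2f}/month")
```

Under these assumptions the per-seat tools come out at roughly 4.5x to 5x the steady-state pipeline cost, near the low end of the range quoted above; if only some of the org needs seats, the gap narrows accordingly.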
For a team without dbt: Hex with Magic AI. The Magic AI quality isn’t best-in-class, but Hex’s notebook model means PMs can iterate on a query, see the SQL, edit it, and learn over time. The other commercial tools hide the SQL, which feels nice initially and traps you in their black box later.
Avoid: ThoughtSpot for this use case specifically (the natural-language layer is fine but the per-seat pricing is high for the value), and Snowflake Cortex (currently a science project, not a product).
Verdict
The PM-SQL gap is closing fast. Today: 60% of queries are AI-self-servable, with the right tool. In 12 months: probably 75%. The data team queue will get smaller and the data team’s work will get harder per ticket.
The right move for a PM in 2026 isn’t to learn SQL from scratch — it’s to learn enough SQL to read what the AI generated and catch the wrong joins. That skill ceiling is much lower than full SQL fluency, and it’s the actual differentiator between PMs who self-serve effectively and PMs who file confidently-wrong dashboards.