Not tax advice. Computation tools only. Have a professional check your work before filing.

Two-Thirds of AI Tax Answers Are Wrong. Here's Why — and What Actually Fixes It

Michael Cutajar | 5 June 2026 | 5 min read

Two-thirds.

That's the number from a Loyola University Chicago study that tested major chatbots on a straightforward tax question. Two out of three answers were wrong. Bloomberg covered it in March 2026 under the headline "Claude and ChatGPT Tax Prep Is Here. Use Caution." Entrepreneur and MSN ran it. It landed in a lot of inboxes.

Accountants who saw it weren't surprised. They were just relieved someone finally measured it.

Why it happens — and it's not what most coverage says

The Bloomberg story, and most commentary that followed, framed this as "AI is bad at math" or "AI doesn't understand complex rules." That misses the actual problem.

AI models are not bad at reasoning. They're structurally disconnected from current tax law. These are different problems with different solutions.

Here's what's actually happening when you ask a chatbot a tax question:

The model was trained on a snapshot of the internet (articles, forum posts, official publications) up to a cutoff date. Tax law changes every single year. Rates move. Thresholds are adjusted for inflation. New reliefs are introduced; old ones expire. Unless it is told, the model doesn't know what year it is when it answers you. It generates the most statistically plausible response from whatever tax content was in its training data. That might be last year's figures. It might be from a different jurisdiction. It might be a confident blend of three different countries' rules, averaged into something that sounds right but isn't.
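The difference between recalling and reading can be sketched in a few lines of Python. This is an illustration only, not code from any real product: the `VERIFIED_RULES` table, the key format, and the function names are all assumptions made for the example. The one figure in it is the 2025/26 UK personal allowance quoted in this post. The point is the lookup behaviour: an exact match on jurisdiction and year, and an explicit "nothing on file" instead of a plausible guess.

```python
from typing import Optional

# Hypothetical rule table keyed by (jurisdiction, tax year, rule name).
# In a real system this would be loaded from a maintained, verified source.
VERIFIED_RULES = {
    ("UK", "2025/26", "personal_allowance"): 12_570,  # figure quoted in this post
}


def read_rule(jurisdiction: str, tax_year: str, name: str) -> Optional[int]:
    """Exact-match lookup: no fallback to another year or another country."""
    return VERIFIED_RULES.get((jurisdiction, tax_year, name))


def answer(jurisdiction: str, tax_year: str, name: str) -> str:
    value = read_rule(jurisdiction, tax_year, name)
    if value is None:
        # Nothing verified on file: say so instead of producing a plausible guess.
        return f"No verified rule '{name}' for {jurisdiction} {tax_year}."
    return f"{jurisdiction} {tax_year} {name}: {value}"


print(answer("UK", "2025/26", "personal_allowance"))
print(answer("US", "2025", "crypto_de_minimis_exemption"))
```

The second call is the interesting one. A rule that is not in the table produces a refusal, which is the opposite of what a model does when it fills the gap from memory.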

Three examples where this goes wrong in practice:

Income tax bands. The 2025/26 UK personal allowance is £12,570, and it has been frozen at that level since 2021/22. A model that quotes £12,570 is correct only because the figure hasn't moved; one that leans on older material might quote the pre-freeze £12,500 and be confidently wrong. Either way, the model isn't reading current legislation. It's recalling.

VAT on cross-border digital services. B2B reverse charge rules are jurisdiction-specific and change frequently. A UK-based business selling software to a German company asks whether it needs to register for VAT in Germany. A model might blend UK and EU rules, apply pre-Brexit treatment, or confuse B2B and B2C thresholds. The answer will sound authoritative. It may be entirely wrong.

Crypto gains. Someone asked an AI whether crypto gains under $3,000 need to be reported to the IRS. The AI said no. They filed without reporting. They're now dealing with the IRS. There is no $3,000 de minimis exemption for crypto gains in the United States. There never was. The model fabricated a rule that didn't exist, stated it with confidence, and moved on.

What doesn't fix it

"Use caution" is not a strategy.

Every article about AI tax errors ends with some version of "always consult a professional." That advice is correct. It is also useless if the user has already acted on the AI's answer, already filed, already made a decision based on what the model told them.

Appending a disclaimer to a wrong answer doesn't make the answer less wrong. The warning doesn't travel with the output. People copy the numbers. They file. They move on.

The caution paragraph at the bottom of a Bloomberg story doesn't help the person who's already in IRS trouble.

What actually fixes it

The problem is not that the model is bad at reasoning. The problem is that the model is being asked to recall current tax law from memory when it should be reading verified tax rules at the time of answering.

That's the structural fix: give the AI current, verified rules to read, instead of letting it improvise from whatever it half-remembers from training.

That's what OpenAccountants does. It's an MCP (Model Context Protocol) server: a knowledge layer that plugs into AI agents like Claude, ChatGPT, Cursor, and Windsurf and exposes a library of CPA/EA-verified tax skills. Each skill contains the current rates, correct thresholds, filing rules, and source citations for a specific jurisdiction and tax type. The AI reads that skill when answering, rather than guessing from its training data.
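To give a feel for what a skill might look like as data, here is a rough Python sketch. The field names, the `Skill` class, and the `to_context` helper are hypothetical and are not OpenAccountants' actual schema; they only mirror what the paragraph above says a skill contains. The reviewer name is left as a placeholder rather than invented.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical shape of one verified skill (not a real schema)."""
    jurisdiction: str
    tax_type: str
    tax_year: str
    verified_by: str
    citations: list = field(default_factory=list)
    thresholds: dict = field(default_factory=dict)


def to_context(skill: Skill) -> str:
    """Render the skill as plain text an agent could read before answering."""
    if not skill.citations:
        raise ValueError("a skill without a source citation is not usable")
    lines = [
        f"{skill.jurisdiction} {skill.tax_type}, tax year {skill.tax_year}",
        f"Verified by: {skill.verified_by}",
    ]
    lines += [f"{name}: {value}" for name, value in sorted(skill.thresholds.items())]
    lines += [f"Source: {citation}" for citation in skill.citations]
    return "\n".join(lines)


uk_income_tax = Skill(
    jurisdiction="UK",
    tax_type="income tax",
    tax_year="2025/26",
    verified_by="(named reviewer)",  # placeholder, not a real person
    citations=["HMRC published rate tables"],
    thresholds={"personal_allowance_gbp": 12570},
)
print(to_context(uk_income_tax))
```

Refusing to render a skill that has no citation is a design choice in this sketch: the citation is what lets the user check the figure, so a figure without one is treated as no better than recall.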

The difference in practice:

Without OpenAccountants: "The UK personal allowance is £12,570 for the 2024/25 tax year." Maybe right, maybe stale, no citation, no way for the user to verify.

With OpenAccountants: The agent loads the current UK income tax skill — verified by a named ACCA-qualified accountant, with a direct citation to HMRC's published rate tables. The model applies those figures. Not its memory. The verified source.
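Here is roughly what "applies those figures" could mean in code, as a hedged Python sketch. The `VerifiedFigure` class and `income_above_allowance` function are invented for illustration. The arithmetic is deliberately partial: it only subtracts the personal allowance, and it ignores bands, other reliefs, and the reduction of the allowance at high incomes, so it is not a tax computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedFigure:
    """A number plus where it came from (hypothetical structure)."""
    name: str
    value: int  # whole pounds
    tax_year: str
    source: str


def income_above_allowance(gross_income: int, allowance: VerifiedFigure) -> dict:
    """Subtract the allowance and carry the provenance along with the result."""
    taxable = max(0, gross_income - allowance.value)
    return {
        "income_above_allowance": taxable,
        "tax_year": allowance.tax_year,
        "source": allowance.source,
    }


allowance = VerifiedFigure(
    name="personal_allowance",
    value=12_570,  # 2025/26 figure quoted in this post
    tax_year="2025/26",
    source="HMRC published rate tables",
)
print(income_above_allowance(30_000, allowance))
# {'income_above_allowance': 17430, 'tax_year': '2025/26', 'source': 'HMRC published rate tables'}
```

The result carries its tax year and source with it, so a number copied out of the answer can still be traced back to what it was based on.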

The model's role shifts from recall to reasoning. That's the fix.

How to set it up

One command. Free. Works with Claude, ChatGPT, Cursor, Windsurf, or any agent that supports MCP:

npx openaccountants connect

Full instructions at openaccountants.com/connect.

Once it's running, your AI agent automatically pulls from the verified skill library before answering tax questions. You don't have to prompt it differently. It just works.

If you've already used AI for tax without this

You still have options.

If you've generated AI tax workings and want a professional eye on them before you file — or after — a licensed accountant in your jurisdiction can review the working paper directly. They'll check the figures, flag anything wrong, and give you a professional sign-off. That's the safety net for work that's already been done.

You can request a review here.

What this doesn't solve

I'll be direct about the limits.

OpenAccountants improves accuracy on clearly defined tax computations: income tax, corporate tax, VAT, payroll. Current rates, correct thresholds, right jurisdiction. That's a large share of the questions people are getting wrong.

It does not turn AI into a tax attorney. Complex situations — restructuring, contested assessments, cross-border arrangements with multiple moving parts, first-time returns with unusual income patterns — still require a qualified human who can exercise judgement, ask follow-up questions, and take professional responsibility for the advice.

The goal isn't to eliminate professional involvement. The goal is to make the starting point accurate, provide clear sources, and escalate correctly when something is out of scope. Right now, AI is failing at the starting point. That's the problem this fixes.
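"Escalate correctly" can be pictured as a small routing check that runs before any answer is attempted. The Python below is a sketch under my own assumptions, not a description of how OpenAccountants decides scope: the `route` function, the flag names, and the `has_verified_skill` input are hypothetical. The in-scope tax types and the out-of-scope situations are taken from the two paragraphs above.

```python
from enum import Enum


class Route(Enum):
    ANSWER_WITH_SKILL = "answer_with_skill"
    ESCALATE_TO_HUMAN = "escalate_to_human"


# Computation types this post lists as in scope.
IN_SCOPE_TAX_TYPES = {"income_tax", "corporate_tax", "vat", "payroll"}

# Situations this post says still need a qualified human (flag names are made up).
NEEDS_HUMAN_FLAGS = {
    "restructuring",
    "contested_assessment",
    "complex_cross_border",
    "first_return_unusual_income",
}


def route(tax_type: str, flags: set, has_verified_skill: bool) -> Route:
    """Answer only when the question is in scope, simple, and covered."""
    if tax_type not in IN_SCOPE_TAX_TYPES:
        return Route.ESCALATE_TO_HUMAN
    if flags & NEEDS_HUMAN_FLAGS:
        return Route.ESCALATE_TO_HUMAN
    if not has_verified_skill:
        return Route.ESCALATE_TO_HUMAN
    return Route.ANSWER_WITH_SKILL


print(route("vat", set(), True))
print(route("vat", {"contested_assessment"}, True))
```

Every branch except the last one escalates. In this sketch the default is the human, and the AI answers only when all three conditions hold.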

The accountant's take

I've been reviewing AI-generated tax output for real clients for over a year. The Loyola study confirmed what I see routinely: confident, well-formatted answers that are factually wrong. Wrong year. Wrong jurisdiction. Rules that don't exist anywhere.

The profession's response so far has mostly been to warn people away from AI entirely. That's not going to work. People are already using it. The question is whether the rules they're using are right.

Verified skills are how you make them right. The approach isn't perfect: verified coverage is still incomplete, and complex situations still need a human. But it's the only structural fix that addresses why the answers are wrong in the first place.

Two-thirds wrong is the baseline. It doesn't have to stay there.