who's actually using AI agents for client returns? Claude Code / Cursor / Codex

Sarah Chen, CPA · 1d ago · US

ok real talk. end-of-season stocktake: who's actually using an AI agent in their prep workflow vs still just thinking about it?

my current setup:

Claude Code for anything that needs reading statute or cross-referencing. spin up a project directory with the OpenAccountants skills installed, prompt "prepare Schedule C for client X using bookkeeping in client.csv"
Cursor for spreadsheet / working paper math. autocomplete in a gigantic workbook is basically cheating at this point
human review on EVERY output before the return goes out. non-negotiable

what i have NOT automated yet:

Form 3853 edge cases — too many judgment calls
anything with §280A (home office) — too fact-specific
first-year entity decisions

curious where everyone else draws the line. and is anyone trusting Codex or Gemini for anything tax-adjacent or still Claude-only?

4 replies

James Mifsud, CPA · 1d ago

same setup as you — Claude Code for the reading-statute parts of Maltese VAT. the skill being locally installed is the thing that makes it work. i had Claude (without the skill) hallucinate a €1k threshold that doesn't exist. with the malta-vat-return skill installed: zero invented numbers so far.

NOT using it for:

Article 10 vs 11 regime decisions (fact-heavy)
anything with the stamp duty interaction

gave Codex a try on a repetitive reconciliation task. fine for pattern-matching work, but not for anything statutory.

Dr. Anna Schmidt, StB · 23h ago

Claude Code + a local copy of the germany-vat-return skill, strictly as a "second opinion" tool. i draft the UStVA myself first, then ask the agent to review. catches about 1 thing per 20 returns — usually a reverse charge box i forgot, or a §13b classification.

i would NOT trust any agent to produce a return from scratch. the agent is checking MY work, not doing it.

Emma Thompson, CA ANZ · 20h ago

Claude + Cursor for the working papers on PSI determinations. honestly — the facts-and-circumstances bit is where it's weakest. good at mechanical tests (80% test, unrelated clients test), useless at weighing ambiguous facts.

my rule of thumb: if i can describe the test as a flowchart, the agent helps. if it needs judgment, i do it myself.
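
to make "flowchart-able" concrete, here is a minimal sketch of the 80% test as a mechanical check. assumptions, not ATO wording: the test is read as "80% or more of PSI comes from a single client", the (client, amount) invoice tuples and the function names are hypothetical, and associates of a client are not grouped together here. treat it as an illustration of the kind of test an agent can handle, not as a determination tool.

```python
from collections import defaultdict


def largest_client_share(invoices):
    """Return (client, share) for the client with the most income.

    `invoices` is an iterable of (client_name, amount) pairs.
    This input shape is a hypothetical one chosen for the sketch.
    """
    totals = defaultdict(int)
    for client, amount in invoices:
        totals[client] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no income to test")

    top_client = max(totals, key=totals.get)
    return top_client, totals[top_client] / grand_total


def hits_80_percent_threshold(invoices, threshold=0.80):
    """True if one client accounts for `threshold` or more of income.

    Assumption: "80% or more" is inclusive, hence >= rather than >.
    """
    _, share = largest_client_share(invoices)
    return share >= threshold


if __name__ == "__main__":
    sample = [("Client A", 9000), ("Client B", 1000)]
    print(largest_client_share(sample))
    print(hits_80_percent_threshold(sample))
```

every branch here is a yes/no on a number, which is why it sits on the "agent helps" side of the line. the facts-and-circumstances weighing has no equivalent function to write.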

Rachel O'Connor, CTA · 17h ago

i don't trust any agent without the skill installed. naked LLM output on IR35 is dangerous — it'll confidently give you 2019 CEST guidance when the 2021 off-payroll reform is what matters.

with the uk-vat-return skill + Claude Code it's been solid. but i read every citation before i sign anything.