who's actually using AI agents for client returns? Claude Code / Cursor / Codex
ok real talk. end-of-season stocktake: who's actually using an AI agent in their prep workflow vs still just thinking about it?
my current setup:
- Claude Code for anything that needs reading statute or cross-referencing. spin up a project directory with the OpenAccountants skills installed, prompt "prepare Schedule C for client X using bookkeeping in client.csv"
- Cursor for spreadsheet / working paper math. autocomplete in a gigantic workbook is basically cheating at this point
- human review on EVERY output before the return goes out. non-negotiable
what i have NOT automated yet:
- Form 3853 edge cases — too many judgment calls
- anything with §280A (home office) — too fact-specific
- first-year entity decisions
curious where everyone else draws the line. and is anyone trusting Codex or Gemini for anything tax-adjacent, or is it still Claude-only?
4 replies
same setup as you — Claude Code for the reading-statute parts of Maltese VAT. the skill being locally installed is the thing that makes it work. i had Claude (without the skill) hallucinate a €1k threshold that doesn't exist. with the malta-vat-return skill installed: zero invented numbers so far.
NOT using it for:
- Article 10 vs 11 regime decisions (fact-heavy)
- anything with the stamp duty interaction
gave Codex a try on a repetitive reconciliation task. fine for pattern-matching work, but not for anything statutory.
Claude Code + a local copy of the germany-vat-return skill, strictly as a "second opinion" tool. i draft the UStVA myself first, then ask the agent to review. catches about 1 thing per 20 returns — usually a reverse charge box i forgot, or a §13b classification.
i would NOT trust any agent to produce a return from scratch. the agent is checking MY work, not doing it.
Claude + Cursor for the working papers on PSI (personal services income) determinations. honestly, the facts-and-circumstances bit is where the agent is weakest. good at mechanical tests (80% test, unrelated clients test), useless at weighing ambiguous facts.
my rule of thumb: if i can describe the test as a flowchart, the agent helps. if it needs judgment, i do it myself.
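rough sketch of what i mean by "flowchart-able", in python. this is NOT the statutory test, just the arithmetic shape of an 80%-style concentration check. the function names, the (client, amount) invoice format, and the `threshold` default are all my own assumptions for illustration, so check the actual rules before relying on anything like it.

```python
from collections import defaultdict


def largest_client_share(invoices):
    """Return (client, share) for the client with the largest share of total income.

    `invoices` is an iterable of (client_name, amount) pairs (hypothetical format).
    """
    totals = defaultdict(float)
    for client, amount in invoices:
        totals[client] += amount

    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total income must be positive")

    top_client = max(totals, key=totals.get)
    return top_client, totals[top_client] / grand_total


def concentration_flag(invoices, threshold=0.80):
    """Flag whether one client accounts for at least `threshold` of income.

    Simplified illustration only: the real test has definitions and
    exceptions that this arithmetic does not capture.
    """
    client, share = largest_client_share(invoices)
    return {
        "client": client,
        "share": round(share, 4),
        "at_or_over_threshold": share >= threshold,
    }
```

the point is that every step here is a lookup or a comparison, so an agent (or a spreadsheet) can do it and i can check it in seconds. the moment a step becomes "weigh these facts", it drops out of this kind of code and back onto my desk.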
i don't trust any agent without the skill installed. naked LLM output on IR35 is dangerous — it'll confidently give you 2019 CEST guidance when the 2021 off-payroll reform is what matters.
with the uk-vat-return skill + Claude Code it's been solid. but i read every citation before i sign anything.