AI-generated working papers — what does your review process look like?
i've been letting Claude generate first-draft working papers for Einkommensteuererklärungen (income tax returns) for individual clients. the mechanical output is honestly impressive: it pulls the right §§, fills in the correct Anlage (form schedule), and computes Sonderausgaben (special expense deductions).
but here's my concern: my review process is basically "read it and check if it feels right." that's not a process, that's vibes-based auditing.
i want to build a proper QA checklist for AI-generated working papers. something like this (rough sketch of mechanizing the first item after the list):
- ☐ every number traces to a source document
- ☐ every statutory reference is verified against current law
- ☐ all thresholds checked against current-year values
- ☐ sign conventions correct (positive/negative)
- ☐ client-specific elections reflected
- ☐ prior-year comparison (does this make sense vs last year?)
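to make the first item concrete, here's a minimal sketch of what mechanizing "every number traces to a source document" could look like. everything here is hypothetical (the `Figure` shape, the tolerance, the field names), and real tracing would need to handle sums of receipts and rounding, but the point is that the check can be mechanical instead of vibes:

```python
# hypothetical sketch: flag working-paper figures with no matching source amount
from dataclasses import dataclass

@dataclass
class Figure:
    label: str     # e.g. "insurance premiums"
    amount: float  # the number as it appears on the working paper

def untraced_figures(paper: list[Figure], source_amounts: set[float],
                     tol: float = 0.01) -> list[Figure]:
    """Return every working-paper figure that matches no source-document amount."""
    return [f for f in paper
            if not any(abs(f.amount - s) <= tol for s in source_amounts)]

# usage: amounts extracted (or hand-keyed) from the client's receipts/certificates
sources = {1200.00, 350.50, 80.00}
paper = [Figure("insurance premiums", 1200.00), Figure("donations", 400.00)]
for f in untraced_figures(paper, sources):
    print(f"NO SOURCE: {f.label} = {f.amount}")  # the 400.00 donation flags here
```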
anyone already doing this systematically? or are we all still in the "read it and hope" phase?
4 replies
we built an internal template in Notion that every AI-generated working paper goes through before it reaches the partner. takes about 15 minutes per return. the team hated it at first ("we already reviewed it") but it's caught 4 material errors in 2 months that "reading it" alone missed.
the most common error type: the agent uses LAST YEAR'S thresholds. skills help with this but don't eliminate it entirely.
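one mechanical guard against that failure mode: keep a small table of current-year thresholds that a human maintains, and assert the values the agent actually used against it. a sketch, assuming you can extract the thresholds the agent applied (all numbers below are placeholders, not real law):

```python
# hypothetical sketch: validate agent-used thresholds against a maintained
# current-year table. ALL VALUES ARE PLACEHOLDERS, not real statutory figures.
CURRENT_YEAR_THRESHOLDS = {
    "basic_allowance": 11_000,   # placeholder
    "savers_allowance": 1_000,   # placeholder
}

def check_thresholds(used: dict[str, float]) -> list[str]:
    """Compare the thresholds the agent used against the current-year table."""
    errors = []
    for name, value in used.items():
        expected = CURRENT_YEAR_THRESHOLDS.get(name)
        if expected is None:
            errors.append(f"unknown threshold: {name}")
        elif value != expected:
            errors.append(f"{name}: agent used {value}, current year is {expected}")
    return errors

# if the agent silently reused last year's allowance, this surfaces it:
print(check_thresholds({"basic_allowance": 10_500}))
```

the maintained table is the important part; the code is trivial once someone owns keeping those values current.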
i have a version of this for US returns. key additions to your list:
- ☐ state-specific rules applied (not just federal)
- ☐ phase-outs computed at correct income level
- ☐ carryforward items from prior year included
- ☐ estimated payments reconciled
the prior-year comparison (#6) is the single most useful check. if the agent says the client owes $15k more than last year and nothing material changed, something is wrong.
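that one is also easy to mechanize: flag any return whose liability swings more than some tolerance year over year, and require a logged reason before it ships. a sketch with made-up tolerances:

```python
# hypothetical sketch: prior-year variance flag for reviewer attention
def needs_explanation(current: float, prior: float,
                      abs_tol: float = 2_000.0, rel_tol: float = 0.15) -> bool:
    """True if the year-over-year swing is large in both absolute dollars
    and relative to last year's liability."""
    delta = abs(current - prior)
    return delta > abs_tol and delta > rel_tol * max(abs(prior), 1.0)

# client owes $15k more than last year with nothing material changed -> flag it
print(needs_explanation(current=35_000, prior=20_000))  # True
```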
i think the honest answer is most of us are still in the "read it and hope" phase but feel guilty about admitting it. anna's checklist is a good start.
one thing i'd add: run the same question through the agent TWICE with slightly different wording. if you get materially different answers, that's a red flag that the output isn't reliable for that particular question.
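in code terms that's a self-consistency check: extract the key figures from both runs and diff them. a sketch (the extraction step is hand-waved; assume you already parse the amounts out of each answer):

```python
# hypothetical sketch: compare key figures from two differently-worded runs
def consistency_flags(run_a: dict[str, float], run_b: dict[str, float],
                      tol: float = 1.0) -> list[str]:
    """Flag figures that differ materially between runs, plus figures
    that appear in one run but not the other."""
    flags = []
    for key in run_a.keys() | run_b.keys():
        a, b = run_a.get(key), run_b.get(key)
        if a is None or b is None:
            flags.append(f"{key}: present in only one run")
        elif abs(a - b) > tol:
            flags.append(f"{key}: {a} vs {b}")
    return flags

print(consistency_flags({"tax_due": 4_200.0, "deduction": 900.0},
                        {"tax_due": 4_850.0, "deduction": 900.0}))
# -> ['tax_due: 4200.0 vs 4850.0'], which is exactly the red flag to chase
```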
for Japanese returns, i compare the agent output against the NTA's official calculation tool (確定申告書等作成コーナー, the tax return preparation corner). if numbers diverge, i investigate. simple but effective: it gives you an independent verification source.
wish every jurisdiction had an equivalent free calculation tool to benchmark against.