AI-generated working papers — what does your review process look like?
i've been letting Claude generate first-draft working papers for Einkommensteuererklärungen (income tax returns) for individual clients. the mechanical output is honestly impressive: it pulls the right §§, fills in the correct Anlage (form schedule), and computes Sonderausgaben (special expense deductions).
but here's my concern: my review process is basically "read it and check if it feels right." that's not a process, that's vibes-based auditing.
i want to build a proper QA checklist for AI-generated working papers. something like this (rough sketch of mechanizing the first item after the list):
- ☐ every number traces to a source document
- ☐ every statutory reference is verified against current law
- ☐ all thresholds checked against current-year values
- ☐ sign conventions correct (positive/negative)
- ☐ client-specific elections reflected
- ☐ prior-year comparison (does this make sense vs last year?)
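to make the first item concrete, here's a minimal sketch of what mechanizing "every number traces to a source document" could look like. everything here is hypothetical (the `Figure` shape, the tolerance, the field names), and real tracing would need to handle sums of receipts and rounding, but the point is that the check can be mechanical instead of vibes:

```python
# hypothetical sketch: flag working-paper figures with no matching source amount
from dataclasses import dataclass

@dataclass
class Figure:
    label: str     # e.g. "insurance premiums"
    amount: float  # the number as it appears on the working paper

def untraced_figures(paper: list[Figure], source_amounts: set[float],
                     tol: float = 0.01) -> list[Figure]:
    """Return every working-paper figure that matches no source-document amount."""
    return [f for f in paper
            if not any(abs(f.amount - s) <= tol for s in source_amounts)]

# usage: amounts extracted (or hand-keyed) from the client's receipts/certificates
sources = {1200.00, 350.50, 80.00}
paper = [Figure("insurance premiums", 1200.00), Figure("donations", 400.00)]
for f in untraced_figures(paper, sources):
    print(f"NO SOURCE: {f.label} = {f.amount}")  # the 400.00 donation flags here
```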
anyone already doing this systematically? or are we all still in the "read it and hope" phase?
4 replies
we built an internal template in Notion that every AI-generated working paper goes through before it reaches the partner. takes about 15 minutes per return. the team hated it at first ("we already reviewed it") but it's caught 4 material errors in 2 months that "reading it" alone missed.
the most common error type: the agent uses LAST YEAR'S thresholds. skills help with this but don't eliminate it entirely.
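one mechanical guard against that failure mode: keep a small table of current-year thresholds that a human maintains, and assert the values the agent actually used against it. a sketch, assuming you can extract the thresholds the agent applied (all numbers below are placeholders, not real law):

```python
# hypothetical sketch: validate agent-used thresholds against a maintained
# current-year table. ALL VALUES ARE PLACEHOLDERS, not real statutory figures.
CURRENT_YEAR_THRESHOLDS = {
    "basic_allowance": 11_000,   # placeholder
    "savers_allowance": 1_000,   # placeholder
}

def check_thresholds(used: dict[str, float]) -> list[str]:
    """Compare the thresholds the agent used against the current-year table."""
    errors = []
    for name, value in used.items():
        expected = CURRENT_YEAR_THRESHOLDS.get(name)
        if expected is None:
            errors.append(f"unknown threshold: {name}")
        elif value != expected:
            errors.append(f"{name}: agent used {value}, current year is {expected}")
    return errors

# if the agent silently reused last year's allowance, this surfaces it:
print(check_thresholds({"basic_allowance": 10_500}))
```

the maintained table is the important part; the code is trivial once someone owns keeping those values current.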
i have a version of this for US returns. key additions to your list:
- ☐ state-specific rules applied (not just federal)
- ☐ phase-outs computed at correct income level
- ☐ carryforward items from prior year included
- ☐ estimated payments reconciled
the prior-year comparison (#6) is the single most useful check. if the agent says the client owes $15k more than last year and nothing material changed, something is wrong.
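that one is also easy to mechanize: flag any return whose liability swings more than some tolerance year over year, and require a logged reason before it ships. a sketch with made-up tolerances:

```python
# hypothetical sketch: prior-year variance flag for reviewer attention
def needs_explanation(current: float, prior: float,
                      abs_tol: float = 2_000.0, rel_tol: float = 0.15) -> bool:
    """True if the year-over-year swing is large in both absolute dollars
    and relative to last year's liability."""
    delta = abs(current - prior)
    return delta > abs_tol and delta > rel_tol * max(abs(prior), 1.0)

# client owes $15k more than last year with nothing material changed -> flag it
print(needs_explanation(current=35_000, prior=20_000))  # True
```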
i think the honest answer is most of us are still in the "read it and hope" phase but feel guilty about admitting it. anna's checklist is a good start.
one thing i'd add: run the same question through the agent TWICE with slightly different wording. if you get materially different answers, that's a red flag that the output isn't reliable for that particular question.
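in code terms that's a self-consistency check: extract the key figures from both runs and diff them. a sketch (the extraction step is hand-waved; assume you already parse the amounts out of each answer):

```python
# hypothetical sketch: compare key figures from two differently-worded runs
def consistency_flags(run_a: dict[str, float], run_b: dict[str, float],
                      tol: float = 1.0) -> list[str]:
    """Flag figures that differ materially between runs, plus figures
    that appear in one run but not the other."""
    flags = []
    for key in run_a.keys() | run_b.keys():
        a, b = run_a.get(key), run_b.get(key)
        if a is None or b is None:
            flags.append(f"{key}: present in only one run")
        elif abs(a - b) > tol:
            flags.append(f"{key}: {a} vs {b}")
    return flags

print(consistency_flags({"tax_due": 4_200.0, "deduction": 900.0},
                        {"tax_due": 4_850.0, "deduction": 900.0}))
# -> ['tax_due: 4200.0 vs 4850.0'], which is exactly the red flag to chase
```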
for Japanese returns, i compare the agent output against the NTA's official calculation tool (確定申告書等作成コーナー, the tax return preparation corner). if numbers diverge, i investigate. simple but effective: it gives you an independent verification source.
wish every jurisdiction had an equivalent free calculation tool to benchmark against.