using AI to categorize bank transactions — my honest results after 3 months

Yuki Tanaka, 税理士 (certified tax accountant)·1d ago·Bookkeeping·From JP·Topic US

been running an experiment since January: using Claude to categorize bank transactions for 5 small business clients before they go into QBO.

the setup:

1. download bank CSV
2. feed to Claude with the client's chart of accounts
3. ask it to categorize each transaction
4. import the categorized file into QBO
5. manually review and correct
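a minimal sketch of the "feed it the CSV plus the chart of accounts" step, in case it helps anyone reproduce this. it only builds the prompt text and does not call any API. the column names (Date, Description, Amount), the sample rows, and the "6100-Rent" account are assumptions for illustration; real bank exports differ.

```python
import csv
import io


def build_prompt(csv_text, chart_of_accounts):
    """Build one prompt asking for an account per transaction.

    Assumes the bank CSV has Date, Description and Amount columns
    (hypothetical; adjust to the actual export format).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    accounts = "\n".join(f"- {name}" for name in chart_of_accounts)
    lines = "\n".join(
        f"{i}. {r['Date']} | {r['Description']} | {r['Amount']}"
        for i, r in enumerate(rows, start=1)
    )
    return (
        "Categorize each transaction using ONLY these accounts:\n"
        f"{accounts}\n\n"
        "Transactions:\n"
        f"{lines}\n\n"
        "Answer with one line per transaction: <number>, <account>."
    )


# Hypothetical sample data, not taken from any real client.
sample_csv = """Date,Description,Amount
2024-01-05,COSTCO WHSE #123,-84.10
2024-01-31,ACME PROPERTY MGMT,-2000.00
"""
prompt = build_prompt(sample_csv, ["6340-Office Supplies", "6100-Rent"])
print(prompt)
```

numbering the transactions makes it easier to line the answers back up with the CSV rows before the QBO import.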

the results after 3 months, ~4,000 transactions:

78% categorized correctly on first pass
12% in the right ballpark but wrong specific account (e.g. "Office Supplies" vs "Computer Expenses")
7% confidently wrong (e.g. a Costco purchase categorized as "Cost of Goods Sold" when the client is a consultant — it's office snacks)
3% completely bizarre (categorized a rent payment as "Owner's Draw")

my take: 78% accuracy saves time on the bulk work but you CANNOT skip the review. the remaining 22% (12% + 7% + 3%), left uncorrected, would materially misstate the financial statements.
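to put the percentages in transaction terms, here is the arithmetic as a quick sketch. the total is the rough ~4,000 figure from above, so the counts are approximate too.

```python
total = 4000  # approximate total from the post (~4,000 transactions)

# Percentage shares reported above; they should add up to 100.
shares = {
    "correct": 78,
    "right ballpark, wrong account": 12,
    "confidently wrong": 7,
    "completely bizarre": 3,
}
assert sum(shares.values()) == 100

counts = {label: total * pct // 100 for label, pct in shares.items()}
needs_fix = total - counts["correct"]

for label, n in counts.items():
    print(f"{label}: about {n}")
print(f"needing correction: about {needs_fix}")
```

so roughly 880 of the ~4,000 transactions needed a manual fix.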

the biggest issue: AI doesn't know the CLIENT. it doesn't know that when this particular client buys from Amazon, it's usually inventory, not office supplies. it needs historical context.

4 replies

Sarah Chen, CPA·23h ago

78% is actually better than i expected. here's what pushed my accuracy to ~90%:

include the LAST 3 MONTHS of categorized transactions in the prompt. not all of them — just the unique vendors and how they were categorized.

"When you see 'AMZN MKTP US*123', categorize it as 6200-Inventory Purchases. When you see 'COSTCO WHSE', categorize it as 6340-Office Supplies."

the AI learns the client's patterns from the examples. it still needs review but the error rate drops significantly.
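one way to generate those lines from history instead of typing them, as a rough sketch. it assumes the history is already a list of (vendor string, account) pairs, which is a simplification; the function name is made up for this example.

```python
def vendor_rules(history):
    """Collapse categorized history into one prompt line per unique vendor.

    `history` is a list of (vendor, account) pairs in date order
    (hypothetical shape). If a vendor appears more than once, the most
    recent categorization wins.
    """
    latest = {}
    for vendor, account in history:
        latest[vendor] = account  # later rows overwrite earlier ones
    return "\n".join(
        f"When you see '{vendor}', categorize it as {account}."
        for vendor, account in sorted(latest.items())
    )


history = [
    ("AMZN MKTP US*123", "6200-Inventory Purchases"),
    ("COSTCO WHSE", "6340-Office Supplies"),
    ("AMZN MKTP US*123", "6200-Inventory Purchases"),  # repeat vendor
]
rules_text = vendor_rules(history)
print(rules_text)
```

deduplicating by vendor keeps the prompt short even when the client has hundreds of transactions in the three-month window.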

Dr. Anna Schmidt, StB·14h ago

interesting comparison from the German side: we use DATEV Unternehmen Online, which has its own AI categorisation (Kontierungsvorschläge, i.e. suggested account assignments). their accuracy is about 85% after the learning phase, better than generic AI because it's trained specifically on German accounting data with the SKR03/SKR04 standard charts of accounts.

the lesson: domain-specific AI beats general-purpose AI for accounting tasks. which is basically the thesis of this whole community — we're building the domain-specific rules that make general AI work better for accounting.

Emma Thompson, CA ANZ·17h ago

yuki's 78% number is consistent with what i've seen. the key metric isn't "how many did it get right" — it's "how long does it take to fix the ones it got wrong vs categorizing from scratch."

for my clients, AI categorization + review takes about 40% less time than manual categorization from scratch. that's meaningful. not transformative, but meaningful.
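a back-of-envelope sketch of that metric, for anyone who wants to plug in their own timings. every per-transaction number below is a made-up assumption, not something measured; only the 22% error rate comes from yuki's post.

```python
def time_saved(n_txns, error_rate, manual_s, review_s, fix_s):
    """Fraction of time saved by AI first pass + review vs. manual work.

    manual_s: seconds to categorize one transaction from scratch
    review_s: seconds to glance over one AI-categorized transaction
    fix_s:    extra seconds to correct one wrong categorization
    """
    manual_total = n_txns * manual_s
    ai_total = n_txns * review_s + n_txns * error_rate * fix_s
    return 1 - ai_total / manual_total


# Hypothetical timings: 30s manual, 12s review, 30s per fix.
saving = time_saved(n_txns=500, error_rate=0.22,
                    manual_s=30, review_s=12, fix_s=30)
print(f"time saved: {saving:.0%}")
```

with these invented timings the saving comes out around 38%. the point is the shape of the calculation: the saving shrinks as the error rate or the cost per fix goes up.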

where it REALLY saves time is clients with 500+ transactions per month. the first pass gets the obvious ones (recurring vendors, common categories) and i just focus on the edge cases.

James Mifsud, CPA·20h ago

the "owner's draw for a rent payment" error is hilarious and also terrifying.

i suspect what's happening is the AI sees a large round-number payment and associates it with owner distributions. which would make sense for some clients — but not for rent.

this is a good example of why RULES-BASED categorization (which is basically what QBO's own bank rules do) is more reliable than AI-INFERRED categorization for recurring transactions. save the AI for the unusual ones.
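a minimal sketch of that split: deterministic rules handle the recurring vendors, and only the leftovers go to the AI. this is plain Python as a stand-in, not how QBO's bank rules are implemented; the rule list and the sample descriptions are hypothetical.

```python
# Each rule is (substring to look for, account). Hypothetical examples.
RULES = [
    ("COSTCO WHSE", "6340-Office Supplies"),
    ("AMZN MKTP", "6200-Inventory Purchases"),
]


def split_by_rules(descriptions, rules=RULES):
    """Apply the first matching rule; collect everything else for the AI."""
    matched, unmatched = {}, []
    for desc in descriptions:
        for needle, account in rules:
            if needle in desc.upper():
                matched[desc] = account
                break
        else:
            # No rule matched: leave this one for the AI pass and review.
            unmatched.append(desc)
    return matched, unmatched


matched, unmatched = split_by_rules(["COSTCO WHSE #0042", "WIRE OUT 2000.00"])
print(matched)
print(unmatched)
```

a recurring rent payment would get its own rule the first time it is corrected, so it never reaches the AI again.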