using AI to categorize bank transactions — my honest results after 3 months
been running an experiment since January: using Claude to categorize bank transactions for 5 small business clients before they go into QBO.
the setup:
- download bank CSV
- feed to Claude with the client's chart of accounts
- ask it to categorize each transaction
- import the categorized file into QBO
- manually review and correct
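for anyone who wants to try the first three steps, here is a minimal sketch of assembling the prompt. it is not the exact prompt used above. the function name `build_prompt`, the CSV column names (`Date`, `Description`, `Amount`), the sample rows and the two placeholder accounts are all assumptions for illustration, and the call to the model itself is left out.

```python
import csv
import io


def build_prompt(bank_csv_text, chart_of_accounts):
    """Build a categorization prompt from a bank CSV export and a chart of accounts."""
    rows = list(csv.DictReader(io.StringIO(bank_csv_text)))
    lines = [
        "Categorize each bank transaction using ONLY the accounts listed below.",
        "Answer with one line per transaction: <row number>,<account>.",
        "If you are unsure, answer <row number>,NEEDS REVIEW.",
        "",
        "Chart of accounts:",
    ]
    lines += [f"- {account}" for account in chart_of_accounts]
    lines += ["", "Transactions:"]
    for i, row in enumerate(rows, start=1):
        # Column names are assumptions; real bank exports differ.
        lines.append(f"{i},{row['Date']},{row['Description']},{row['Amount']}")
    return "\n".join(lines)


# Made-up sample data, not real client transactions.
sample_csv = """Date,Description,Amount
2024-01-05,COSTCO WHSE #0123,-84.17
2024-01-06,AMZN MKTP US*123,-212.40
"""
accounts = ["6200-Inventory Purchases", "6340-Office Supplies"]
prompt = build_prompt(sample_csv, accounts)
print(prompt)
```

giving the model an explicit "NEEDS REVIEW" option is a design choice in this sketch; the idea is to turn some confident guesses into flagged rows for the manual review step.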
the results after 3 months, ~4,000 transactions:
- 78% categorized correctly on first pass
- 12% in the right ballpark but wrong specific account (e.g. "Office Supplies" vs "Computer Expenses")
- 7% confidently wrong (e.g. a Costco purchase categorized as "Cost of Goods Sold" even though the client is a consultant and the purchase was office snacks)
- 3% completely bizarre (categorized a rent payment as "Owner's Draw")
my take: 78% accuracy saves time on the bulk work but you CANNOT skip the review. left uncorrected, the 22% error rate (12% + 7% + 3%) would materially misstate the financial statements.
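to put those percentages into transaction counts, here is the arithmetic as a quick sketch. it assumes exactly 4,000 transactions (the post says ~4,000), so the counts are approximate.

```python
total = 4000  # assumption: the post says ~4,000, so counts are approximate

# Percentages as reported in the results list.
breakdown = {
    "correct on first pass": 78,
    "right ballpark, wrong account": 12,
    "confidently wrong": 7,
    "completely bizarre": 3,
}

counts = {label: total * pct // 100 for label, pct in breakdown.items()}
needs_fix = total - counts["correct on first pass"]

for label, n in counts.items():
    print(f"{label}: {n}")
print(f"needing correction: {needs_fix}")
```

that works out to roughly 880 transactions needing a manual correction over the three months.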
the biggest issue: AI doesn't know the CLIENT. it doesn't know that when this particular client buys from Amazon, it's usually inventory, not office supplies. it needs historical context.
4 replies
78% is actually better than i expected. here's what pushed my accuracy to ~90%:
include the LAST 3 MONTHS of categorized transactions in the prompt. not all of them — just the unique vendors and how they were categorized.
"When you see 'AMZN MKTP US*123', categorize it as 6200-Inventory Purchases. When you see 'COSTCO WHSE', categorize it as 6340-Office Supplies."
the AI learns the client's patterns from the examples. it still needs review but the error rate drops significantly.
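a minimal sketch of building those "When you see ..." lines from already-categorized history. the function name `vendor_rules` and the sample history are assumptions for illustration. it matches on the exact (uppercased) description, which is a simplification: real bank descriptions often carry store numbers or reference IDs that would need trimming first.

```python
def vendor_rules(history):
    """Collapse categorized history into one prompt line per unique vendor.

    `history` is a list of (description, account) pairs, oldest first.
    A later entry for the same description overrides an earlier one.
    """
    latest = {}
    for description, account in history:
        latest[description.strip().upper()] = account
    return [
        f"When you see '{vendor}', categorize it as {account}."
        for vendor, account in sorted(latest.items())
    ]


# Made-up history using the two vendors quoted above.
history = [
    ("AMZN MKTP US*123", "6200-Inventory Purchases"),
    ("COSTCO WHSE", "6340-Office Supplies"),
    ("amzn mktp us*123", "6200-Inventory Purchases"),
]
rules = vendor_rules(history)
print("\n".join(rules))
```

sending one line per unique vendor instead of every transaction keeps the prompt short, which is the point of "not all of them".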
interesting comparison from the German side: we use DATEV Unternehmen Online, which has its own AI categorisation (Kontierungsvorschläge, i.e. account assignment suggestions). their accuracy is about 85% after the learning phase, better than generic AI because it's trained specifically on German accounting data with the SKR03/SKR04 standard charts of accounts.
the lesson: domain-specific AI beats general-purpose AI for accounting tasks. which is basically the thesis of this whole community — we're building the domain-specific rules that make general AI work better for accounting.
yuki's 78% number is consistent with what i've seen. the key metric isn't "how many did it get right" — it's "how long does it take to fix the ones it got wrong vs categorizing from scratch."
for my clients, AI categorization + review takes about 40% less time than manual categorization from scratch. that's meaningful. not transformative, but meaningful.
where it REALLY saves time is clients with 500+ transactions per month. the first pass gets the obvious ones (recurring vendors, common categories) and i just focus on the edge cases.
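the "time to fix vs. time from scratch" metric can be written down as a small formula. this is only a sketch: the function name `time_saved_fraction` and every per-transaction minute figure below are made-up placeholders, not measurements from this thread, so plug in your own timings.

```python
def time_saved_fraction(n, manual_min, review_min, fix_min, error_rate):
    """Fraction of time saved by AI first pass + review vs. manual from scratch.

    manual_min: minutes to categorize one transaction by hand
    review_min: minutes to glance over one AI-categorized transaction
    fix_min:    extra minutes to correct one wrong categorization
    error_rate: share of transactions the AI got wrong (0.22 for 22%)
    """
    manual_total = n * manual_min
    ai_total = n * review_min + n * error_rate * fix_min
    return 1 - ai_total / manual_total


# Hypothetical inputs for illustration only.
saved = time_saved_fraction(
    n=500, manual_min=0.5, review_min=0.2, fix_min=0.5, error_rate=0.22
)
print(f"time saved: {saved:.0%}")
```

note that `n` cancels out in this simple model, so any extra benefit at 500+ transactions per month would have to come from something the formula ignores, such as faster review of recurring vendors.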
the "owner's draw for a rent payment" error is hilarious and also terrifying.
i suspect what's happening is the AI sees a large round-number payment and associates it with owner distributions. which would make sense for some clients — but not for rent.
this is a good example of why RULES-BASED categorization (which is basically what QBO's own bank rules do) is more reliable than AI-INFERRED categorization for recurring transactions. save the AI for the unusual ones.
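a minimal sketch of that split: apply fixed rules first and hand only the unmatched transactions to the AI. this is not how QBO's bank rules are implemented; the function name `split_by_rules`, the substring matching and the sample data are assumptions for illustration.

```python
def split_by_rules(transactions, bank_rules):
    """Apply substring rules first; return (categorized, needs_ai).

    transactions: list of (description, amount) pairs
    bank_rules:   list of (pattern, account) pairs, checked in order
    """
    categorized, needs_ai = [], []
    for description, amount in transactions:
        # First rule whose pattern appears in the description wins.
        account = next(
            (acct for pattern, acct in bank_rules if pattern in description.upper()),
            None,
        )
        if account is None:
            needs_ai.append((description, amount))
        else:
            categorized.append((description, amount, account))
    return categorized, needs_ai


# Made-up rules and transactions.
bank_rules = [
    ("COSTCO WHSE", "6340-Office Supplies"),
    ("AMZN MKTP", "6200-Inventory Purchases"),
]
transactions = [
    ("COSTCO WHSE #0123", -84.17),
    ("ACH DEBIT LANDLORD LLC", -2500.00),
]
categorized, needs_ai = split_by_rules(transactions, bank_rules)
print("by rule:", categorized)
print("for the AI:", needs_ai)
```

with a rule added for the landlord, the recurring rent payment would never reach the AI at all, which avoids the "Owner's Draw" kind of mistake for that vendor.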