Evaluate financial AI against planted ground truth.
An anomaly detector is only as trustworthy as the anomalies you tested it on, and only if you knew the right answers.
The problem with testing on real data
Production books have unknown ground truth. If your model flags twelve things, you can't say how many it missed, because nobody labelled reality. You can't improve what you can't measure.
Plant the needles, then score the model
Fictix plants known issues — duplicate payments, ghost vendors, miscategorised expenses, revenue recognised in the wrong period, books that don't reconcile — and records exactly what and where. Run your model, and Fictix grades it: precision and recall against ground truth, per scenario.
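As a rough illustration of the grading step, the sketch below scores a model's flags against a set of planted anomalies using the standard definitions of precision and recall. The `Finding` type, the `score` function, the anomaly labels and the record IDs are all hypothetical names chosen for this example. They are not Fictix's API or output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One anomaly, identified by its kind and the record it sits on (hypothetical shape)."""
    kind: str       # e.g. "duplicate_payment" (illustrative label)
    record_id: str  # e.g. an invoice or journal-entry ID (illustrative)


def score(planted: set[Finding], flagged: set[Finding]) -> dict[str, float]:
    """Return precision and recall of `flagged` measured against `planted`."""
    true_positives = len(planted & flagged)
    # Assumed convention: an empty denominator scores 1.0 rather than raising.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(planted) if planted else 1.0
    return {"precision": precision, "recall": recall}


# Ground truth: what was planted, and where.
planted = {
    Finding("duplicate_payment", "INV-1001"),
    Finding("ghost_vendor", "VEN-0042"),
    Finding("wrong_period_revenue", "JE-0310"),
    Finding("miscategorised_expense", "EXP-0777"),
}

# What the model under test reported.
flagged = {
    Finding("duplicate_payment", "INV-1001"),
    Finding("ghost_vendor", "VEN-0042"),
    Finding("duplicate_payment", "INV-2000"),  # false positive: never planted
}

result = score(planted, flagged)
print(result)
```

Here two of the three flags are correct (precision 2/3) and two of the four planted issues are found (recall 0.5). Running the same function once per scenario would give a per-scenario breakdown.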
Adversarial and repeatable
Turn up the chaos and fraud intensity to stress the model. Lock a seed so a regression run uses exactly the same data as last week's, and any change in the score comes from the model, not the data.
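To show why a locked seed makes runs comparable, here is a minimal sketch of seeded anomaly planting with a tunable intensity. It uses Python's standard `random.Random` and is only an assumption about how such a generator could work. The `plant_anomalies` function, its `intensity` and `seed` parameters, and the anomaly labels are hypothetical and do not describe Fictix's actual interface.

```python
from __future__ import annotations

import random

# Illustrative labels only, not an official list.
ANOMALY_KINDS = ["duplicate_payment", "ghost_vendor", "miscategorised_expense"]


def plant_anomalies(record_ids: list[str], intensity: float, seed: int) -> dict[str, str]:
    """Choose which records get an anomaly, and of which kind.

    `intensity` is the fraction of records to corrupt, from 0.0 to 1.0.
    """
    rng = random.Random(seed)  # private generator, so no shared global state
    count = round(len(record_ids) * intensity)
    chosen = rng.sample(record_ids, count)
    return {record_id: rng.choice(ANOMALY_KINDS) for record_id in chosen}


records = [f"TXN-{i:04d}" for i in range(200)]

last_week = plant_anomalies(records, intensity=0.05, seed=42)
today = plant_anomalies(records, intensity=0.05, seed=42)
harder = plant_anomalies(records, intensity=0.20, seed=42)

print(last_week == today)          # same seed and settings give the same plants
print(len(last_week), len(harder))  # higher intensity plants more anomalies
```

Because both runs draw from a generator seeded with the same value, they plant the same anomalies on the same records, so a score difference between them can only come from the model under test.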
Questions
How is detection accuracy measured?
Fictix knows every anomaly it planted, so it computes precision and recall on your model's findings for each scenario.
What kinds of anomalies can be planted?
Duplicate bills, name/vendor mix-ups, ghost employees, wrong categories, missing fields, wrong dates, unreconciled books, and fraud-shaped patterns.
Can I make it harder?
Yes — chaos and fraud intensity are tunable, and you can compose multiple anomalies into one company.