Plant the mistake. Then prove your product caught it.
The scenario engine is the difference between 'we tested it' and 'we measured it'. You define the failure; Fictix injects it and keeps the ground truth; your product gets a score.
Why ground truth is the whole game
Run a detector on production books and you learn what it flagged — never what it missed, because nobody labelled reality. An evaluation without ground truth is a vibe. The scenario engine inverts that: you decide the truth by planting it, so every run yields real precision and recall, not anecdotes.
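To make the arithmetic concrete, here is a minimal sketch of scoring a detector against planted ground truth. It does not show Fictix's internals: the function name, the `Score` type, and the transaction IDs are all hypothetical, chosen only to illustrate how a known set of planted needles turns findings into precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    precision: float
    recall: float


def score_findings(planted: set[str], flagged: set[str]) -> Score:
    """Compare a detector's flagged IDs against the planted (ground-truth) IDs."""
    true_positives = len(planted & flagged)
    # With nothing flagged, precision is undefined; report 1.0 (no false alarms).
    precision = true_positives / len(flagged) if flagged else 1.0
    # With nothing planted, recall is undefined; report 1.0 (nothing to miss).
    recall = true_positives / len(planted) if planted else 1.0
    return Score(precision, recall)


# Hypothetical IDs: four planted needles, three findings from the detector.
planted = {"txn-101", "txn-202", "txn-303", "txn-404"}
flagged = {"txn-101", "txn-202", "txn-999"}

score = score_findings(planted, flagged)
misses = planted - flagged  # explicit, because the truth was planted
print(score, sorted(misses))
```

Two of the three findings are real (precision 2/3) and two of the four needles were caught (recall 0.5). The `misses` set is what an unlabelled production run can never give you.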
The eight needle families
A needle is a known mistake hidden in otherwise coherent books. Fictix ships eight families, each with a detectable signature and a recorded location:
| Needle | What it looks like in the data |
|---|---|
| Same bill twice | One vendor invoice paid on two different dates / two journal entries |
| Name mix-ups | Vendor or customer recorded under near-duplicate names that should consolidate |
| Ghost worker | Payroll run for an employee with no offer, no start event, no activity |
| Wrong category | Expense booked to a GL account inconsistent with its vendor and history |
| Missing info | Required fields blank on otherwise valid documents (memo, class, date) |
| Wrong date | Transaction posted to the wrong period — revenue/expense recognition trap |
| Books don't match | Sub-ledger total diverges from the GL / bank feed by a planted delta |
| Looks like fraud | Structured pattern: round-dollar runs, threshold-hugging, off-hours edits |
An edit-history layer can also plant after-the-fact changes, so audit-trail products have something real to find. Needles compose: one company can carry many, across systems, at controlled intensity.
Author scenarios in plain language
You don't hand-write JSON. Describe the failure; Fictix injects it and writes a manifest — what was planted, where, and the expected finding. From the CLI:
fictix scenario add "pay invoice INV-2287 twice, 9 days apart"
fictix scenario add "ghost employee on the March payroll run"
fictix scenario list # shows planted needles + ground-truth refs
fictix advance 30d # move time so the trap matures
fictix assert --recall 0.9 --precision 0.8 # grade your detector
The `assert` step fails CI if your product's findings miss the recall/precision thresholds, so detection becomes a build gate, not a quarterly review.
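The gating logic behind a threshold check like this can be sketched in a few lines. This is an illustration of the idea, not how `fictix assert` is implemented: the `gate` function and the example scores are assumptions, and only the 0.9 and 0.8 thresholds come from the command above.

```python
import sys


def gate(precision: float, recall: float,
         min_precision: float, min_recall: float) -> int:
    """Return a process exit code: 0 if both thresholds are met, 1 otherwise."""
    failures = []
    if precision < min_precision:
        failures.append(f"precision {precision:.2f} < {min_precision:.2f}")
    if recall < min_recall:
        failures.append(f"recall {recall:.2f} < {min_recall:.2f}")
    for failure in failures:
        print(f"FAIL: {failure}", file=sys.stderr)
    return 1 if failures else 0


# Made-up scores: precision clears 0.8, recall falls short of 0.9.
exit_code = gate(precision=0.85, recall=0.75, min_precision=0.8, min_recall=0.9)
print(exit_code)
```

A CI wrapper would pass the returned code to `sys.exit`, and the non-zero status is what fails the build.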
Test matrices and regression
Group scenarios into a test matrix that must pass before release. Because the company is deterministic, last week's run and this week's run differ only by your code, so a score delta is a real regression: trackable over time, attributable to a commit, and filable as a ticket that points at the exact transaction you planted.
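One way to picture the regression check is a per-scenario comparison of two runs. The sketch below assumes each run is reduced to a mapping from scenario name to recall. The scenario names, the numbers, and the `find_regressions` helper are hypothetical, not part of Fictix.

```python
def find_regressions(baseline: dict[str, float],
                     current: dict[str, float],
                     tolerance: float = 0.0) -> dict[str, float]:
    """Return scenarios whose score dropped by more than `tolerance`, with the delta."""
    regressions = {}
    for scenario, old_score in baseline.items():
        # A scenario missing from the new run is treated as a score of 0.
        new_score = current.get(scenario, 0.0)
        delta = new_score - old_score
        if delta < -tolerance:
            regressions[scenario] = delta
    return regressions


# Hypothetical recall per scenario for two runs of the same deterministic company.
last_week = {"duplicate-payment": 1.0, "ghost-worker": 0.75, "wrong-period": 0.5}
this_week = {"duplicate-payment": 1.0, "ghost-worker": 0.5, "wrong-period": 0.5}

regressions = find_regressions(last_week, this_week)
print(regressions)
```

Because the data is identical between runs, any entry in `regressions` points at a code change rather than at noise in the books.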
Turn up the pressure
Chaos and fraud are dials, not booleans. Raise intensity to stress a model toward its failure point; lower it to find the floor where it stops catching anything. Pair with the living simulation so needles appear mid-stream, not just at t=0.
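Finding the floor amounts to sweeping the dial and noting where detection starts. The sketch below assumes you can measure recall at a given intensity. `detection_floor` is a hypothetical helper, and `toy_recall` is a stand-in for a real scenario run with invented behaviour.

```python
from typing import Callable, Optional, Sequence


def detection_floor(levels: Sequence[float],
                    measure_recall: Callable[[float], float]) -> Optional[float]:
    """Return the lowest intensity at which the detector still catches something."""
    for level in sorted(levels):
        if measure_recall(level) > 0.0:
            return level
    return None  # nothing was caught at any tested intensity


def toy_recall(intensity: float) -> float:
    # Stand-in for a real run: a made-up detector that is blind below 0.3.
    return 0.0 if intensity < 0.3 else min(1.0, intensity)


floor = detection_floor([0.1, 0.2, 0.3, 0.5, 0.8], toy_recall)
print(floor)
```

In practice `measure_recall` would run a scenario at the requested intensity and score the findings, so each call is a full evaluation rather than a lookup.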
Questions
How does Fictix know if my product is right?
It plants every anomaly itself and records a manifest of what and where, so it computes precision and recall on your product's findings per scenario — misses are explicit, not inferred.
Do I write code or config to design a scenario?
No. You describe the anomaly in plain language via the dashboard or `fictix scenario add`; Fictix injects it into the books and tracks ground truth.
Can scenario results gate a release?
Yes. `fictix assert` enforces recall/precision thresholds and fails CI, and a test matrix can be required to pass before shipping.
Can multiple anomalies coexist in one company?
Yes — needles compose across systems and time at controlled intensity, so you can simulate realistic, messy books rather than one isolated bug.