$ cat projects/ledger-reconciler/README.md
Ledger Reconciler
What if every break on your daily recon run already knew why it didn't match?
09:15:32 INFO Pass exact matched 252 | remaining int=108 ext=122 09:15:32 INFO Pass amount_date matched 36 | remaining int=72 ext=86 09:15:32 INFO Pass ref_amount matched 29 | remaining int=43 ext=57 09:15:32 INFO Pass fuzzy_amount matched 18 | remaining int=25 ext=39 Category Count Total Amount Max Age ----------------------- ------ ------------- ------- missing_external 25 127,450.00 28d timing 18 63,200.00 3d ← still settling duplicate_external 12 44,100.00 15d amount_mismatch 7 19,875.00 8d ← check FX rate unresolved 2 5,200.00 1d
The Problem
Every bank, fintech, and payment processor runs the same daily process: compare the internal ledger to the external bank statement, find anything that doesn't match, figure out why.
In practice this means an analyst opens a spreadsheet at 8 AM and works through 60-100 breaks. Most of them are the same five patterns — a SWIFT reference that got truncated to 16 characters, a transaction that posts a day later on the bank side, a foreign exchange amount rounded to 2 decimal places differently. Patterns a machine should handle.
The real breaks — a missing external confirmation, a legitimately mismatched amount — are buried in that noise.
root cause 1
SWIFT MT103 truncates references to 16 chars. Your internal ref is "PMT-2026-001928" — the bank sees "PMT-2026-00192". VLOOKUP says no match.
root cause 2
EUR/USD settlement rounds to 2dp differently on each side. €1,234.567 becomes $1,345.93 internally and $1,345.94 externally. One cent. Real break? No.
root cause 3
Banks post on T+1 for certain transaction types. Internal ledger records Tuesday. Statement shows Wednesday. Both are correct. Neither knows about the other.
Architecture
Ingestion — Strategy pattern parsers
CSVFeedParser
Handles standard bank statement CSV exports. Validates required columns at parse time, normalises signed amounts, upcases currency codes, derives a stable raw_source_id from the row hash for idempotency.
JSONFeedParser
Open Banking API response format — nested amount.value / amount.currency block. Adding an MT940 or FIX parser is one new class; nothing else changes.
Ingest is idempotent — UNIQUE(side, account_id, raw_source_id) + ON CONFLICT DO NOTHING. Re-running the same file is safe.
Matching Engine — 4 ordered passes
Pass 1 — Exact
Amount + value_date + reference all identical. The happy path. ~70% of volume on a typical feed.
Pass 2 — Amount + Date
Amount and value_date match. Reference differs — picks the external transaction whose reference is the longest common prefix of the internal one.
Pass 3 — Reference + Amount
Reference and amount match. value_date is within a configurable N-day window (default 3). Catches T+1 bank posting lag.
Pass 4 — Fuzzy Amount
Amount within 1% tolerance + date within 2 days. Catches FX rounding differences. Records the exact delta in pips for the audit trail.
Each pass removes matched IDs from the candidate pool before the next runs. A transaction can only be matched once. The strictest method that applies wins.
Break Classifier — root cause before human review
timing
Within 3 days of run date — likely still settling
amount_mismatch
Reference found on both sides; amounts differ
missing_external
On internal ledger; absent from bank statement
missing_internal
On bank statement; absent from internal ledger
duplicate_int/ext
Same amount+date+reference appears twice on one side
unresolved
None of the above patterns matched — needs analyst
SQL — analytical patterns without the ORM
Window functions
ROW_NUMBER() OVER (PARTITION BY side, account_id, amount, value_date, reference)
Duplicate detection without a self-join
CTE pivot
WITH daily AS (
SELECT value_date,
SUM(CASE WHEN
side='internal'...)
GROUP BY value_date
)Daily internal vs external counts in one pass
Aging buckets
CASE
WHEN aging_days = 0
THEN 'same_day'
WHEN aging_days <= 3
THEN '1_to_3d'
...
ENDAging heatmap without a lookup table
Design Decisions
Why no ORM?
The analytical queries — daily pivots, window-function duplicate detection, aging buckets — are most naturally expressed in plain SQL. SQLAlchemy would make ROW_NUMBER() OVER harder, not easier. The repository pattern provides the abstraction without the ORM overhead.
Why multi-pass instead of a single JOIN?
A single SQL join matches greedily with no priority. A reference+amount match from pass 3 should not consume a transaction that exact-matched in pass 1. Removing matched IDs after each pass ensures the strictest applicable method always wins and every transaction is only consumed once.
Why Protocol interfaces for repositories?
The concrete repos are SQLite today. Swapping to PostgreSQL means writing one class that satisfies the same Protocol. The engine, orchestrator, classifier, and dashboard are unchanged. No base classes, no inheritance hierarchy, no framework coupling.
Why SQLite with WAL mode?
For dev and small-scale ops tooling, SQLite with WAL mode is zero-config and gives concurrent read access while the reconciler writes. The DB path is a parameter — pointing it at a PostgreSQL DSN and swapping the connection factory swaps the DB without touching any business logic.
Dashboard Output
Streamlit monitoring dashboard — real data from the ACC-001 reconciliation run.
⚖ Ledger Reconciler
bank recon engine v0.1
Account
Period
2026-02-07
→ 2026-03-09
Recent Runs
ACC-001 2026-02-07 → 2026-03-09
Summary
Total Transactions
720
both sides
Matched
341
across 4 passes
Open Breaks
37
need investigation
Duplicates
0
flagged
Match Rate
94.7%
$482k total vol
Trend & Distribution
Daily Match Rate
Break Categories
Match Methods & Aging
Match Method Breakdown
Break Aging by Category
Open Breaks — Drill Down
37 breaks shown · sorted by aging ↓
Tech Stack
Language
Python 3.11+
match statement, Protocol typing, tomllib
Database
SQLite / PostgreSQL
WAL mode; swappable via repo pattern
Analytics
pandas
DataFrame work in dashboard layer only
Dashboard
Streamlit
Real-time ops monitoring, no front-end dev
Testing
pytest
Unit + integration; tmp_path fixtures
Packaging
pyproject.toml
PEP 517; editable install
Architecture
Strategy + Repository
SOLID; independently testable layers
SQL patterns
CTEs + window fns
No ORM; expressive analytics