$ cat projects/ledger-reconciler/README.md

Ledger Reconciler

What if every break on your daily recon run already knew why it didn't match?

BREAK REPORT — ACC-001
09:15:32  INFO  Pass exact          matched  252  | remaining int=108  ext=122
09:15:32  INFO  Pass amount_date    matched   36  | remaining int=72   ext=86
09:15:32  INFO  Pass ref_amount     matched   29  | remaining int=43   ext=57
09:15:32  INFO  Pass fuzzy_amount   matched   18  | remaining int=25   ext=39

Category                  Count   Total Amount   Max Age
-----------------------  ------  -------------  -------
missing_external             25     127,450.00      28d
timing                       18      63,200.00       3d    ← still settling
duplicate_external           12      44,100.00      15d
amount_mismatch               7      19,875.00       8d    ← check FX rate
unresolved                    2       5,200.00       1d

The Problem

Every bank, fintech, and payment processor runs the same daily process: compare the internal ledger to the external bank statement, find anything that doesn't match, figure out why.

In practice this means an analyst opens a spreadsheet at 8 AM and works through 60-100 breaks. Most of them are the same five patterns — a SWIFT reference that got truncated to 16 characters, a transaction that posts a day later on the bank side, a foreign exchange amount rounded to 2 decimal places differently. Patterns a machine should handle.

The real breaks — a missing external confirmation, a legitimately mismatched amount — are buried in that noise.

root cause 1

SWIFT MT103 truncates references to 16 chars. Your internal ref is "PMT-2026-001928" — the bank sees "PMT-2026-00192". VLOOKUP says no match.

root cause 2

EUR/USD settlement rounds to 2dp differently on each side. €1,234.567 becomes $1,345.93 internally and $1,345.94 externally. One cent. Real break? No.

root cause 3

Banks post on T+1 for certain transaction types. Internal ledger records Tuesday. Statement shows Wednesday. Both are correct. Neither knows about the other.

Architecture

01

Ingestion — Strategy pattern parsers

CSVFeedParser

Handles standard bank statement CSV exports. Validates required columns at parse time, normalises signed amounts, upcases currency codes, derives a stable raw_source_id from the row hash for idempotency.

JSONFeedParser

Open Banking API response format — nested amount.value / amount.currency block. Adding an MT940 or FIX parser is one new class; nothing else changes.

Ingest is idempotent — UNIQUE(side, account_id, raw_source_id) + ON CONFLICT DO NOTHING. Re-running the same file is safe.

02

Matching Engine — 4 ordered passes

Pass 1 — Exact

Amount + value_date + reference all identical. The happy path. ~70% of volume on a typical feed.

Pass 2 — Amount + Date

Amount and value_date match. Reference differs — picks the external transaction whose reference is the longest common prefix of the internal one.

Pass 3 — Reference + Amount

Reference and amount match. value_date is within a configurable N-day window (default 3). Catches T+1 bank posting lag.

Pass 4 — Fuzzy Amount

Amount within 1% tolerance + date within 2 days. Catches FX rounding differences. Records the exact delta in pips for the audit trail.

Each pass removes matched IDs from the candidate pool before the next runs. A transaction can only be matched once. The strictest method that applies wins.

03

Break Classifier — root cause before human review

timing

Within 3 days of run date — likely still settling

amount_mismatch

Reference found on both sides; amounts differ

missing_external

On internal ledger; absent from bank statement

missing_internal

On bank statement; absent from internal ledger

duplicate_int/ext

Same amount+date+reference appears twice on one side

unresolved

None of the above patterns matched — needs analyst

04

SQL — analytical patterns without the ORM

Window functions

ROW_NUMBER() OVER
(PARTITION BY side,
  account_id, amount,
  value_date, reference)

Duplicate detection without a self-join

CTE pivot

WITH daily AS (
  SELECT value_date,
    SUM(CASE WHEN
      side='internal'...)
  GROUP BY value_date
)

Daily internal vs external counts in one pass

Aging buckets

CASE
  WHEN aging_days = 0
    THEN 'same_day'
  WHEN aging_days <= 3
    THEN '1_to_3d'
  ...
END

Aging heatmap without a lookup table

Design Decisions

Why no ORM?

The analytical queries — daily pivots, window-function duplicate detection, aging buckets — are most naturally expressed in plain SQL. SQLAlchemy would make ROW_NUMBER() OVER harder, not easier. The repository pattern provides the abstraction without the ORM overhead.

Why multi-pass instead of a single JOIN?

A single SQL join matches greedily with no priority. A reference+amount match from pass 3 should not consume a transaction that exact-matched in pass 1. Removing matched IDs after each pass ensures the strictest applicable method always wins and every transaction is only consumed once.

Why Protocol interfaces for repositories?

The concrete repos are SQLite today. Swapping to PostgreSQL means writing one class that satisfies the same Protocol. The engine, orchestrator, classifier, and dashboard are unchanged. No base classes, no inheritance hierarchy, no framework coupling.

Why SQLite with WAL mode?

For dev and small-scale ops tooling, SQLite with WAL mode is zero-config and gives concurrent read access while the reconciler writes. The DB path is a parameter — pointing it at a PostgreSQL DSN and swapping the connection factory swaps the DB without touching any business logic.

Dashboard Output

Streamlit monitoring dashboard — real data from the ACC-001 reconciliation run.

localhost:8503

⚖ Ledger Reconciler

bank recon engine v0.1

Account

ACC-001

Period

2026-02-07

→ 2026-03-09

Recent Runs

#1 2026-03-09
341 matched
37 breaks

ACC-001 2026-02-07 → 2026-03-09

Summary

Total Transactions

720

both sides

Matched

341

across 4 passes

Open Breaks

37

need investigation

Duplicates

0

flagged

Match Rate

94.7%

$482k total vol

Trend & Distribution

Daily Match Rate

95%Feb 7Feb 21Mar 9

Break Categories

37breaks
missing_ext13
timing9
duplicate8
amt_mismatch5
unresolved2

Match Methods & Aging

Match Method Breakdown

315exact20ref_amount6fuzzy_amt

Break Aging by Category

missing_exttimingduplicateamt_mismatchsame day1–3d4–7d>7d

Open Breaks — Drill Down

ReferenceSideDateCcyAmountCategoryAgeNotes
PMT-2026-0087INT02-10USD−14,250.00missing​_external27d
PMT-2026-0143INT02-14USD−8,900.00missing​_external23d
PMT-2026-0291INT03-07EUR−3,120.00timing2d
PMT-2026-0308EXT03-08USD+1,875.50timing1d
PMT-2026-0112INT02-18USD−22,000.00duplicate​_internal19d
PMT-2026-0204INT02-25GBP−6,340.00amount​_mismatch12d

37 breaks shown · sorted by aging ↓

Tech Stack

Language

Python 3.11+

match statement, Protocol typing, tomllib

Database

SQLite / PostgreSQL

WAL mode; swappable via repo pattern

Analytics

pandas

DataFrame work in dashboard layer only

Dashboard

Streamlit

Real-time ops monitoring, no front-end dev

Testing

pytest

Unit + integration; tmp_path fixtures

Packaging

pyproject.toml

PEP 517; editable install

Architecture

Strategy + Repository

SOLID; independently testable layers

SQL patterns

CTEs + window fns

No ORM; expressive analytics