Sankar

Software & Data Engineer

Sankar Kalyanakumar

I build systems where bad data can't hide.

6 years designing pipelines that validate before they propagate, fail loud instead of silent, and let humans approve before anything irreversible runs.

Python · SQL · AWS · Redshift · Glue · DynamoDB · Spring Boot · Terraform · Azure Synapse · Tableau

// how I think about engineering

[x]

Validate at the boundary

Bad data shouldn't travel far. Catch it at ingestion, log exactly what failed and why, and stop the pipeline before the corruption spreads downstream.
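A minimal sketch of that boundary check, assuming a hypothetical row shape with `branch_id` and `amount` fields. `BoundaryValidationError`, `RowFailure`, and `validate_rows` are illustrative names, not code from any of the projects below.

```python
from dataclasses import dataclass


class BoundaryValidationError(Exception):
    """Raised at ingestion so bad rows never reach downstream steps."""


@dataclass(frozen=True)
class RowFailure:
    row_index: int
    field: str
    reason: str


def validate_rows(rows):
    """Check every row, record exactly what failed and why, then stop."""
    failures = []
    for i, row in enumerate(rows):
        # Hypothetical rule: every row must name its branch.
        if not row.get("branch_id"):
            failures.append(RowFailure(i, "branch_id", "missing or empty"))
        # Hypothetical rule: amount must be a non-negative number.
        amount = row.get("amount")
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            failures.append(RowFailure(i, "amount", f"not a number: {amount!r}"))
        elif amount < 0:
            failures.append(RowFailure(i, "amount", f"negative: {amount}"))
    if failures:
        # Raise with the full list instead of quietly skipping bad rows.
        details = "; ".join(
            f"row {f.row_index} {f.field}: {f.reason}" for f in failures
        )
        raise BoundaryValidationError(details)
    return rows
```

The function returns the rows unchanged only when all of them pass, so a caller cannot continue with a partially valid batch by accident.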

||

Pause before irreversible

Automation is most dangerous right before it does something permanent. Build the gate first — approval, confirmation, timeout — then build the action.
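One way to sketch that gate, assuming a hypothetical `poll` callback that returns `True` (approved), `False` (rejected), or `None` (no answer yet). The names `Decision`, `wait_for_decision`, and `run_irreversible` are made up for illustration. The point of the design is that a timeout never counts as approval.

```python
import time
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    TIMED_OUT = "timed_out"


def wait_for_decision(poll, timeout_s, interval_s=0.01):
    """Poll for a human decision; fall back to TIMED_OUT, never to approval."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        answer = poll()  # hypothetical callback: True, False, or None
        if answer is True:
            return Decision.APPROVED
        if answer is False:
            return Decision.REJECTED
        time.sleep(interval_s)
    return Decision.TIMED_OUT


def run_irreversible(action, poll, timeout_s):
    """Run `action` only after an explicit approval; return the decision."""
    decision = wait_for_decision(poll, timeout_s)
    if decision is Decision.APPROVED:
        action()
    return decision
```

The gate is written before the action and wraps it, so there is no code path that reaches `action()` without passing through `wait_for_decision`.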

(!)

Fail loud, not silent

A pipeline that swallows errors and marks rows 'processed' is worse than one that crashes. If something is wrong, scream and stop. Silent failures cost weeks.

$ ls ~/projects

Open source work

github.com/Sankartk
cashcast — branch cash intelligence
01 · Cash Ops · Python + ML

CashCast

Python · FastAPI · Ridge Regression · Isolation Forest · Plotly.js · Swagger

“Every branch pads its vault order 15–20% as a buffer. CashCast forecasts that demand with ML — the buffer becomes a number, not a guess.”

  • Ridge regression per branch, 730 days — avg MAPE 9.1%
  • Isolation Forest flags demand anomalies before vault gaps occur
  • 14-day forecast with confidence bands + $1K-rounded order rec
  • AI narrative: peak day, seasonal delta, idle cash risk per branch
  • Plotly.js ops dashboard: vault status, charts, CSV export
  • Swagger at /docs — 5 tagged endpoints, fully documented
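Two of the numbers in the list above can be sketched in plain Python: the MAPE metric and the $1K-rounded order recommendation. This is an illustration under assumptions, not CashCast's code. In particular, summing the forecast over the horizon and rounding up (rather than to the nearest $1K) are assumed here; the function names are hypothetical.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def order_recommendation(daily_forecast, round_to=1_000):
    """Sum the forecast horizon and round up to the next multiple of round_to.

    Rounding up is an assumption: it keeps the order at or above the
    forecast total instead of slightly below it.
    """
    total = sum(daily_forecast)
    return math.ceil(total / round_to) * round_to
```

With a rule like this, the buffer is whatever the rounding and the confidence band add, which makes it a stated number that can be reviewed.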

Avg MAPE: 9.1%
Tests: 14 / 14
Branches: 6
Horizon: 14 days

○ CashCast:8001

Branches: 6
Avg MAPE: 9.1%
Total Rec: $867K
Horizon: 14d
Anomalies: 2
High Risk: 1

// 14-day demand forecast — BRK-01 Downtown

[chart: y-axis $50K to $200K]

// order recs

BRK-01   $148K
BRK-02   $93K
BRK-03   $112K
BRK-04   $67K
BRK-05   $204K
BRK-06   $89K
fleetpulse — fleet maintenance ops
02 · Fleet Ops · Java

FleetPulse

Java 21 · Spring Boot · PostgreSQL · Flyway

“A truck breaks down. The service was six weeks overdue. The spreadsheet was the last to know.”

  • Hourly scheduler catches overdue maintenance before anyone checks
  • Idempotent alerts — same event fires once, not on every poll
  • Live ops dashboard: resolve alerts, KPIs update every 60s
  • 25+ REST endpoints, Flyway migrations, role-based access
  • 16 integration tests — zero failures across full lifecycle
  • PostgreSQL + Spring Data JPA, containerised with Docker Compose
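The idempotent-alert idea from the list above can be sketched with a uniqueness constraint: the database refuses the second copy of the same event, so the scheduler can poll as often as it likes. FleetPulse itself is Java with PostgreSQL; this sketch swaps in Python and an in-memory SQLite table to stay self-contained. The table layout and the `(vehicle_id, alert_type, due_date)` key are assumptions.

```python
import sqlite3


def open_alert_store():
    """Create an in-memory store with one row allowed per distinct event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE alerts (
            vehicle_id TEXT NOT NULL,
            alert_type TEXT NOT NULL,
            due_date   TEXT NOT NULL,
            UNIQUE (vehicle_id, alert_type, due_date)
        )
        """
    )
    return conn


def raise_alert(conn, vehicle_id, alert_type, due_date):
    """Insert the alert once; return True only when it is newly created."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO alerts (vehicle_id, alert_type, due_date) "
        "VALUES (?, ?, ?)",
        (vehicle_id, alert_type, due_date),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1
```

A caller would send a notification only when `raise_alert` returns `True`, so repeated polls of the same overdue service stay quiet after the first alert.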

Endpoints: 25+
Tests: 16 / 16
Stack: Java 21
DB: Postgres

⬡ FleetPulse:8080

Vehicles: 8
Overdue: 1
Alerts: 3
Tests: 16/16

// unresolved alerts

Vehicle      Type         Severity
FP-TRK-003   MAINT_DUE    CRITICAL
FP-VAN-007   LIC_EXPIRY   HIGH
FP-SUV-002   FUEL_LOW     MEDIUM

// fleet status

Active: 5
In Maint: 2
Retired: 1
ops-copilot-bedrock — incident AI
03 · Incident Response · AWS

Ops Copilot

Python · AWS Bedrock · Step Functions · FAISS

“2am. Service is down. The fix is buried somewhere in a 40-page runbook.”

  • FAISS-indexed runbooks — answers cite exact file and line number
  • LLM stays grounded: only quotes what it found, never invents steps
  • Step Functions pauses at SNS gate — nothing runs until approved
  • Human-in-the-loop: approve or reject before any remediation fires
  • Swap one env var to switch between Ollama (local) and AWS Bedrock
  • Modular retriever: swap FAISS for any vector store without rewriting
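A rough sketch of how a swappable retriever and a grounded answer could fit together. Everything here is hypothetical: the `Retriever` protocol, the `Passage` type, the toy `KeywordRetriever` (a stand-in for a vector store such as FAISS, which it does not use), and the `LLM_BACKEND` variable name. The source only says that one environment variable switches backends.

```python
import os
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Passage:
    path: str
    line: int
    text: str


class Retriever(Protocol):
    """Anything with this method can replace the default retriever."""

    def search(self, query: str, k: int) -> list[Passage]: ...


class KeywordRetriever:
    """Toy stand-in for a vector store: ranks passages by shared words."""

    def __init__(self, passages):
        self._passages = list(passages)

    def search(self, query, k):
        words = set(query.lower().split())
        scored = [
            (len(words & set(p.text.lower().split())), p) for p in self._passages
        ]
        scored = [(score, p) for score, p in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [p for _, p in scored[:k]]


def answer(query, retriever, k=1):
    """Quote only retrieved text with its file and line; otherwise say so."""
    hits = retriever.search(query, k)
    if not hits:
        return "No matching runbook passage found."
    return "\n".join(f"{p.path} #L{p.line}: {p.text}" for p in hits)


def llm_backend():
    # LLM_BACKEND is an assumed name; "ollama" as the default is also assumed.
    return os.environ.get("LLM_BACKEND", "ollama")
```

Because `answer` depends only on the `search` method, replacing the retriever means passing a different object, with no change to the answering code.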

Vector DB: FAISS
LLM: Bedrock
Gate: SNS
Workflow: StepFn

⚬ Ops Copilot:8501

// incident query

prod-db disk full: what do I do? [ask]

// answer — grounded in runbooks

runbooks/db.md #L42 · similarity 0.93

Run df -h /var/lib/postgresql to confirm. If >90%, execute cleanup as per section 3.2.

🔒 remediation pending approval   [APPROVE] [REJECT]
ledger-reconciler — financial ops
04 · Financial Ops · Python

Ledger Reconciler

Python · SQLite · pandas · Streamlit

“Every break has a reason. It's just buried under 80 rows of noise before anyone can start digging.”

  • 94.7% match rate — 720 transactions over a 30-day run
  • 4 ordered passes: exact → amount+date → reference → fuzzy
  • Every break classified with root cause before a human sees it
  • Streamlit dashboard: trend chart, aging heatmap, break drill-down
  • SQLite audit log — every match decision is traceable and replayable
  • Handles timing diffs, format mismatches, and near-duplicate entries
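The four ordered passes listed above can be sketched as a loop in which each pass only sees what the earlier, stricter passes left unmatched. The record fields (`ref`, `amount`, `date`), the rule definitions, and the 0.85 fuzzy threshold are assumptions for illustration, not the reconciler's actual rules.

```python
from difflib import SequenceMatcher


def _exact(a, b):
    return (a["ref"], a["amount"], a["date"]) == (b["ref"], b["amount"], b["date"])


def _amount_date(a, b):
    return a["amount"] == b["amount"] and a["date"] == b["date"]


def _reference(a, b):
    return a["ref"] == b["ref"]


def _fuzzy(a, b, threshold=0.85):
    # Assumed rule: same amount and a nearly identical reference string.
    ratio = SequenceMatcher(None, a["ref"], b["ref"]).ratio()
    return a["amount"] == b["amount"] and ratio >= threshold


# Strictest rule first, loosest last.
PASSES = [
    ("exact", _exact),
    ("amount+date", _amount_date),
    ("reference", _reference),
    ("fuzzy", _fuzzy),
]


def reconcile(internal, external):
    """Return (matches, unmatched_internal, unmatched_external)."""
    left, right = list(internal), list(external)
    matches = []
    for name, rule in PASSES:
        still_unmatched = []
        for a in left:
            hit = next((b for b in right if rule(a, b)), None)
            if hit is None:
                still_unmatched.append(a)
            else:
                right.remove(hit)  # each external row can match only once
                matches.append((a, hit, name))  # keep the pass name for audit
        left = still_unmatched
    return matches, left, right
```

Recording the pass name with every match is what makes a decision traceable later: a fuzzy match can be reviewed differently from an exact one.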

Match rate: 94.7%
Txns: 720
Period: 30 days
Passes: 4

◆ Ledger:8501

Match Rate: 94.7%
Matched: 681
Breaks: 39
Period: 30d

// open breaks

Reference       Category       Amount
PMT-2026-0087   missing_ext    $14,250
PMT-2026-0291   timing         $3,120
PMT-2026-0204   amt_mismatch   $6,340

// by category

missing_ext · timing · duplicate · amt_mismatch · unresolved