
Software & Data Engineer
Sankar Kalyanakumar
I build systems where bad data can't hide.
6 years designing pipelines that validate before they propagate, fail loud instead of silent, and let humans approve before anything irreversible runs.
// how I think about engineering
Validate at the boundary
Bad data shouldn't travel far. Catch it at ingestion, log exactly what failed and why, and stop the pipeline before the corruption spreads downstream.
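A minimal sketch of this principle (the record shape and field names are hypothetical): reject a bad record at ingestion and report exactly which field failed and why.

```python
def validate_record(record: dict) -> dict:
    """Validate one ingested record; raise with the exact failure so
    nothing downstream ever sees bad data."""
    errors = []
    if not record.get("id"):
        errors.append("id: missing or empty")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append(f"amount: expected non-negative number, got {amount!r}")
    if errors:
        # fail loud at the boundary -- log what failed and why, then stop
        raise ValueError(f"record rejected: {'; '.join(errors)}")
    return record
```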
Pause before irreversible
Automation is most dangerous right before it does something permanent. Build the gate first — approval, confirmation, timeout — then build the action.
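The gate-first idea can be sketched like this (the `approve` callback and the delete action are illustrative stand-ins): the approval sees the full plan, and the destructive step cannot run without it.

```python
def delete_with_gate(paths, approve):
    """Irreversible delete guarded by an approval callback.
    `approve` sees the full plan before anything runs; declining aborts cleanly."""
    plan = sorted(paths)
    if not approve(plan):      # the gate is built first...
        return "aborted"
    deleted = []
    for p in plan:             # ...the action runs only after it
        deleted.append(p)      # stand-in for the real destructive call
    return deleted
```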
Fail loud, not silent
A pipeline that swallows errors and marks rows 'processed' is worse than one that crashes. If something is wrong, scream and stop. Silent failures cost weeks.
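A toy illustration, assuming a simple row-transform pipeline: the first bad row raises with its index and cause instead of being swallowed and marked processed.

```python
def process_rows(rows, transform):
    """Stop on the first bad row instead of silently skipping it."""
    out = []
    for i, row in enumerate(rows):
        try:
            out.append(transform(row))
        except Exception as e:
            # scream and stop: name the row and the cause, never swallow
            raise RuntimeError(f"row {i} failed: {e}") from e
    return out
```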
$ ls ~/projects
Open source work
CashCast
“Every branch pads its vault order 15–20% as a buffer. CashCast forecasts that demand with ML — the buffer becomes a number, not a guess.”
- →Ridge regression per branch, 730 days — avg MAPE 9.1%
- →Isolation Forest flags demand anomalies before vault gaps occur
- →14-day forecast with confidence bands + $1K-rounded order rec
- →AI narrative: peak day, seasonal delta, idle cash risk per branch
- →Plotly.js ops dashboard: vault status, charts, CSV export
- →Swagger at /docs — 5 tagged endpoints, fully documented
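As a rough illustration of the per-branch idea (not CashCast's actual model, which is Ridge regression over 730 days of features), here is a closed-form one-feature ridge fit and a 14-day extrapolation:

```python
def fit_ridge_1d(xs, ys, lam=1.0):
    """Closed-form ridge fit for one branch: y ~ w*x + b, slope shrunk by lam."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)          # regularisation shrinks the slope toward 0
    return w, ym - w * xm

def forecast(xs, ys, horizon=14, lam=1.0):
    """Per-branch demand forecast: extrapolate the fitted line past the last day."""
    w, b = fit_ridge_1d(xs, ys, lam)
    last = xs[-1]
    return [w * (last + h) + b for h in range(1, horizon + 1)]
```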
Avg MAPE: 9.1%
Tests: 14/14
Branches: 6
Horizon: 14 days
Branches: 6
Avg MAPE: 9.1%
Total Rec: $867K
Horizon: 14d
Anomalies: 2
High Risk: 1
// 14-day demand forecast — BRK-01 Downtown
// order recs
FleetPulse
“A truck breaks down. The service was six weeks overdue. The spreadsheet was the last to know.”
- →Hourly scheduler catches overdue maintenance before anyone checks
- →Idempotent alerts — same event fires once, not on every poll
- →Live ops dashboard: resolve alerts, KPIs update every 60s
- →25+ REST endpoints, Flyway migrations, role-based access
- →16 integration tests — zero failures across full lifecycle
- →PostgreSQL + Spring Data JPA, containerised with Docker Compose
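The idempotent-alert behaviour can be sketched with an in-memory dedupe set (FleetPulse itself persists this state in Postgres, and the class and method names here are illustrative): the same event fires once, and resolving it re-arms the alert.

```python
class AlertSink:
    """Idempotent alerting: one alert per (vehicle, rule) event,
    no matter how many polls see the same condition."""
    def __init__(self):
        self._seen = set()
        self.sent = []

    def fire(self, vehicle_id: str, rule: str) -> bool:
        key = (vehicle_id, rule)
        if key in self._seen:      # already alerted; the hourly poll is a no-op
            return False
        self._seen.add(key)
        self.sent.append(key)
        return True

    def resolve(self, vehicle_id: str, rule: str):
        """Resolving re-arms the alert so a future recurrence fires again."""
        self._seen.discard((vehicle_id, rule))
```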
Endpoints: 25+
Tests: 16/16
Stack: Java 21
DB: Postgres
Vehicles: 8
Overdue: 1
Alerts: 3
Tests: 16/16
// unresolved alerts
// fleet status
Ops Copilot
“2am. Service is down. The fix is buried somewhere in a 40-page runbook.”
- →FAISS-indexed runbooks — answers cite exact file and line number
- →LLM stays grounded: only quotes what it found, never invents steps
- →Step Functions pauses at SNS gate — nothing runs until approved
- →Human-in-the-loop: approve or reject before any remediation fires
- →Swap one env var to switch between Ollama (local) and AWS Bedrock
- →Modular retriever: swap FAISS for any vector store without rewriting
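A stand-in sketch of citation-carrying retrieval, using keyword overlap in place of FAISS embeddings (runbook paths and the scoring are hypothetical): each hit keeps its source file and line number, which is what lets the answer cite exactly where it came from.

```python
def retrieve(query: str, runbooks: dict, k: int = 1):
    """Score runbook lines by keyword overlap with the query.
    Each hit carries (file, line) so the LLM's answer can cite its source."""
    q = set(query.lower().split())
    hits = []
    for path, lines in runbooks.items():
        for lineno, text in enumerate(lines, start=1):
            score = len(q & set(text.lower().split()))
            if score:
                hits.append((score, path, lineno, text))
    hits.sort(key=lambda h: -h[0])
    return [{"file": p, "line": n, "text": t} for _, p, n, t in hits[:k]]
```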
Vector DB: FAISS
LLM: Bedrock
Gate: SNS
Workflow: Step Functions
// incident query
// answer — grounded in runbooks
Run df -h /var/lib/postgresql to confirm. If >90%, execute cleanup as per section 3.2.
Ledger Reconciler
“Every break has a reason. They're just buried in 80 rows of noise before anyone can dig.”
- →94.7% match rate — 720 transactions over a 30-day run
- →4 ordered passes: exact → amount+date → reference → fuzzy
- →Every break classified with root cause before a human sees it
- →Streamlit dashboard: trend chart, aging heatmap, break drill-down
- →SQLite audit log — every match decision is traceable and replayable
- →Handles timing diffs, format mismatches, and near-duplicate entries
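The ordered-pass idea can be sketched like this (a toy version with the fuzzy pass omitted and illustrative record fields): each pass only sees what the cheaper passes left unmatched, and whatever survives all passes is a break.

```python
def reconcile(ledger, bank):
    """Ordered matching passes, strictest first: exact -> amount+date -> reference.
    Returns the matches (tagged with the pass that made them) and the open breaks."""
    passes = [
        ("exact",       lambda a, b: a == b),
        ("amount_date", lambda a, b: (a["amount"], a["date"]) == (b["amount"], b["date"])),
        ("reference",   lambda a, b: a["ref"] == b["ref"]),
    ]
    matches, open_l, open_b = [], list(ledger), list(bank)
    for name, rule in passes:
        for a in open_l[:]:
            for b in open_b[:]:
                if rule(a, b):
                    matches.append((name, a["ref"], b["ref"]))
                    open_l.remove(a)
                    open_b.remove(b)
                    break
    return matches, open_l  # open_l = the breaks left for classification
```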
Match rate: 94.7%
Txns: 720
Period: 30 days
Passes: 4
Match rate: 94.7%
Matched: 681
Breaks: 39
Period: 30d
// open breaks
// by category