Kushagra
← Back

⚙ Professional

techspec-for-spensy

07 Apr 2026·9 min read

Spensy v1 — Technical Architecture

Overview

Spensy is a full-stack personal finance tracker that uses Claude AI to extract transactions from bank statements (PDFs, images). Users upload statements, Claude extracts structured transaction data, users review and confirm, and the data is stored encrypted at rest.


Tech Stack

Layer Technology
Backend framework FastAPI 0.115.12 (async)
ASGI server Uvicorn 0.34.0
Language Python 3.12
Database PostgreSQL (async via asyncpg 0.30.0)
ORM SQLAlchemy 2.0 (async)
Templating Jinja2 3.1.6 (server-side rendering)
Authentication Authlib 1.4.1 (Google OAuth 2.0)
AI integration Anthropic Python SDK 0.49.0 (Claude API)
Encryption cryptography 44.0.2 (AES-256-GCM)
PDF processing pypdf 6.9.2
Session management Starlette SessionMiddleware + itsdangerous
Deployment Heroku (Procfile + runtime.txt)

Directory Structure

spensy_v1/
├── app/
│   ├── main.py                   # FastAPI app init, middleware, base routes
│   ├── auth.py                   # Google OAuth 2.0 handlers
│   ├── config.py                 # pydantic-settings config
│   ├── crypto.py                 # AES-256-GCM encrypt/decrypt utilities
│   ├── db.py                     # Async engine + session factory
│   ├── models.py                 # SQLAlchemy ORM models
│   ├── routers/
│   │   ├── settings.py           # /settings — LLM API key management
│   │   ├── transactions.py       # /transactions — view & filter
│   │   └── upload.py             # /upload, /review, /confirm
│   ├── services/
│   │   └── extractor.py          # Claude API integration + extraction logic
│   ├── prompts/
│   │   └── extraction_system.txt # Claude system prompt (194 lines)
│   └── templates/                # Jinja2 HTML templates (6 pages)
├── schema.sql                    # Full PostgreSQL DDL
├── requirements.txt
├── runtime.txt                   # python-3.12.x
├── Procfile                      # web: uvicorn app.main:app ...
└── .env.example

Database Schema

All tables use UUID PKs generated by pgcrypto. Sensitive fields are encrypted with AES-256-GCM before storage.

users

Column Type Notes
user_id UUID PK
google_sub_hash CHAR(64) UNIQUE HMAC-SHA256 of Google sub — one-way, searchable
email BYTEA AES-256-GCM encrypted
email_verified BOOLEAN
last_login_at TIMESTAMPTZ
created_at / updated_at TIMESTAMPTZ auto-trigger on updated_at

transactions

Column Type Notes
txn_id UUID PK
user_id UUID FK
account_id UUID FK nullable
source ENUM email_scan, pdf_scan, image_scan
txn_ts TIMESTAMPTZ transaction timestamp
amount NUMERIC(18,4)
curr CHAR(3) default INR
payer_name / payee_name BYTEA AES-256-GCM encrypted
payment_code ENUM DEBIT_P2P, DEBIT_P2M, CREDIT_M2P, CREDIT_P2P, REFUND_M2P, REFUND_P2P
txn_code CHAR(8) 8-digit category code (see Category System)
base_code CHAR(2) GENERATED extracted from txn_code chars 3-4
detail_code CHAR(4) GENERATED extracted from txn_code chars 5-8
type ENUM expense, earning, investment, credit_card_bill
description / user_code BYTEA AES-256-GCM encrypted
is_loan ENUM DEBIT, CREDIT, nullable
is_recurring BOOLEAN
rrn VARCHAR(12) Retrieval Reference Number, nullable

Indexes on: (user_id, txn_ts), (user_id, type), (user_id, payment_code), (user_id, base_code).

accounts

Column Type Notes
account_id UUID PK
user_id UUID FK
account_type ENUM savings, credit_card
account_name VARCHAR(64) e.g. "HDFC", "AXIS"
label VARCHAR(128) e.g. "Salary Account"

UNIQUE on (user_id, account_type, account_name).

user_llm_keys

Column Type Notes
key_id UUID PK
user_id UUID FK
provider ENUM claude
api_key BYTEA AES-256-GCM encrypted

UNIQUE on (user_id, provider) — one key per provider per user.

txn_base_categories & txn_detail_categories

Seeded lookup tables mapping 2-digit and 4-digit codes to labels. 11 base categories (household, food, transport, health, entertainment, education, finance, travel, income, investment, other).


Authentication Flow

User → "Continue with Google"
       → GET /auth/google
       → Authlib redirects to Google (scopes: openid email profile)
       → Google → GET /auth/google/callback?code=...
       → Exchange code for token + userinfo
       → HMAC-SHA256(google_sub, SERVER_SECRET) → google_sub_hash
       → AES-256-GCM(email, ENCRYPTION_KEY) → encrypted email
       → UPSERT into users ON CONFLICT(google_sub_hash) DO UPDATE
       → request.session["user_id"] = user_id
       → redirect to /dashboard

Security notes:


API Routes

Auth (app/auth.py)

Method Path Description
GET /auth/google Initiate OAuth flow
GET /auth/google/callback OAuth callback handler
POST /auth/logout Clear session

Main (app/main.py)

Method Path Description
GET / Landing page (redirect to /dashboard if logged in)
GET /dashboard Upload hub

Upload (app/routers/upload.py)

Method Path Description
GET /upload Upload page
POST /upload Process file (image/PDF), call Claude, store in _pending
GET /upload/pdf-password PDF password form
POST /upload/pdf-password Decrypt PDF, call Claude
GET /review Review extracted transactions
POST /transactions/confirm Confirm & save to DB

Transactions (app/routers/transactions.py)

Method Path Description
GET /transactions View transactions (paginated, filterable)

Query params: account_id, type, payment_code, from_date, to_date, page (50/page).

Settings (app/routers/settings.py)

Method Path Description
GET /settings Settings page (masked API keys)
POST /settings/keys Add/update Claude API key
POST /settings/keys/{key_id}/delete Delete API key

Transaction Upload Data Flow

1. User uploads image or PDF
2. POST /upload
   ├── Validate file type (JPEG, PNG, WebP, PDF)
   ├── If PDF encrypted → redirect to /upload/pdf-password
   └── Pass to extractor.py
3. extractor.py
   ├── Encode file as base64
   ├── Load system prompt from prompts/extraction_system.txt
   ├── Call Claude claude-sonnet-4-6 (max_tokens: 8192)
   │   └── Input: base64 image or PDF document + user's saved accounts
   ├── Parse JSON response
   │   └── If parse fails → send correction turn to Claude (retry)
   └── Return structured transaction list
4. Store extraction result in in-memory _pending dict (keyed by session token)
5. GET /review → user edits extracted data
6. POST /transactions/confirm
   ├── UPSERT accounts
   ├── Encrypt sensitive fields (payer_name, payee_name, description)
   ├── INSERT transactions (skip duplicates by user_id + txn_ts + amount + payment_code)
   └── Clear _pending entry

Note: _pending and _pending_pdf are in-memory dicts — single-process only. A future multi-process deployment would need a shared store (Redis, DB table).


Encryption Strategy

All encryption uses AES-256-GCM via the cryptography library. A fresh 12-byte nonce is generated per encryption; the nonce is prepended to the ciphertext before storing to BYTEA.

stored_bytes = nonce (12 bytes) || ciphertext || tag
Data Storage Method
Email BYTEA AES-256-GCM with ENCRYPTION_KEY
Payer / payee name BYTEA AES-256-GCM with ENCRYPTION_KEY
Transaction description BYTEA AES-256-GCM with ENCRYPTION_KEY
User notes (user_code) BYTEA AES-256-GCM with ENCRYPTION_KEY
LLM API keys BYTEA AES-256-GCM with ENCRYPTION_KEY
Google OAuth sub CHAR(64) HMAC-SHA256 with SERVER_SECRET (one-way)

Category System

Transactions use an 8-digit category code:

txn_code = "00" + 2-digit base + 4-digit detail

Example: "00010001"
         ^^          → reserved prefix
           ^^        → base code "01" = household
             ^^^^    → detail code "0001" = house_rent

11 base categories: 00 (uncategorized), 01 (household), 02 (food), 03 (transport), 04 (health), 05 (entertainment), 06 (education), 07 (finance), 08 (travel), 09 (income), 10 (investment).

base_code and detail_code are PostgreSQL generated columns derived from txn_code, enabling efficient indexed filtering.


External Services

Google OAuth 2.0

Claude API (Anthropic)


Environment Variables

Variable Required Description
DATABASE_URL Yes postgresql+asyncpg://user:pass@host:5432/db
GOOGLE_CLIENT_ID Yes Google Cloud Console OAuth client ID
GOOGLE_CLIENT_SECRET Yes Google Cloud Console OAuth secret
SESSION_SECRET Yes 64-char hex — Starlette session signing
SERVER_SECRET Yes 64-char hex — HMAC-SHA256 key for google_sub
ENCRYPTION_KEY Yes 64-char hex — AES-256-GCM key
LOG_LLM No true to enable LLM debug logging (default: false)

Generate secrets:

python -c "import secrets; print(secrets.token_hex(32))"

Deployment

Local development

pip install -r requirements.txt
# Set up .env from .env.example
# Run schema.sql against a Postgres DB
uvicorn app.main:app --reload

Heroku (production)

Current deployment


Frontend

Server-side rendered via Jinja2. No frontend framework — custom CSS only.

Template Route Purpose
landing.html / Google login page
dashboard.html /dashboard File upload hub
settings.html /settings API key management
transactions.html /transactions Transaction list with filters
review.html /review Review/edit extracted transactions before confirming
pdf_password.html /upload/pdf-password Encrypted PDF password entry

Design: White cards on light gray (#f5f5f0) background, accent blue (#4f7ef7), flexbox layout, no external CSS framework.


Key Architectural Decisions

  1. User-supplied API keys: Users bring their own Claude API key. No Anthropic costs on the platform side. Keys are validated on save and stored encrypted.

  2. In-memory pending state: Extraction results are held in _pending dicts between upload and confirm steps. Simple for single-process, but won't scale to multi-worker without a shared store.

  3. No raw PII in DB: Google sub hashed one-way; emails and names encrypted. The DB can be compromised without exposing user identity directly.

  4. Deduplication on confirm: Transactions are deduplicated by (user_id, txn_ts, amount, payment_code) — no unique constraint in DB, handled in application code.

  5. Session-only display name: User's name from Google never persisted to DB. Retrieved from Google each login and stored only in the session cookie.