Spensy v1 — Technical Architecture
Overview
Spensy is a full-stack personal finance tracker that uses Claude AI to extract transactions from bank statements (PDFs, images). Users upload statements, Claude extracts structured transaction data, users review and confirm, and the data is stored encrypted at rest.
Tech Stack
| Layer | Technology |
|---|---|
| Backend framework | FastAPI 0.115.12 (async) |
| ASGI server | Uvicorn 0.34.0 |
| Language | Python 3.12 |
| Database | PostgreSQL (async via asyncpg 0.30.0) |
| ORM | SQLAlchemy 2.0 (async) |
| Templating | Jinja2 3.1.6 (server-side rendering) |
| Authentication | Authlib 1.4.1 (Google OAuth 2.0) |
| AI integration | Anthropic Python SDK 0.49.0 (Claude API) |
| Encryption | cryptography 44.0.2 (AES-256-GCM) |
| PDF processing | pypdf 6.9.2 |
| Session management | Starlette SessionMiddleware + itsdangerous |
| Deployment | Heroku (Procfile + runtime.txt) |
Directory Structure
spensy_v1/
├── app/
│ ├── main.py # FastAPI app init, middleware, base routes
│ ├── auth.py # Google OAuth 2.0 handlers
│ ├── config.py # pydantic-settings config
│ ├── crypto.py # AES-256-GCM encrypt/decrypt utilities
│ ├── db.py # Async engine + session factory
│ ├── models.py # SQLAlchemy ORM models
│ ├── routers/
│ │ ├── settings.py # /settings — LLM API key management
│ │ ├── transactions.py # /transactions — view & filter
│ │ └── upload.py # /upload, /review, /confirm
│ ├── services/
│ │ └── extractor.py # Claude API integration + extraction logic
│ ├── prompts/
│ │ └── extraction_system.txt # Claude system prompt (194 lines)
│ └── templates/ # Jinja2 HTML templates (6 pages)
├── schema.sql # Full PostgreSQL DDL
├── requirements.txt
├── runtime.txt # python-3.12.x
├── Procfile # web: uvicorn app.main:app ...
└── .env.example
Database Schema
All tables use UUID PKs generated by pgcrypto. Sensitive fields are encrypted with AES-256-GCM before storage.
users
| Column | Type | Notes |
|---|---|---|
| user_id | UUID PK | |
| google_sub_hash | CHAR(64) UNIQUE | HMAC-SHA256 of Google sub — one-way, searchable |
| BYTEA | AES-256-GCM encrypted | |
| email_verified | BOOLEAN | |
| last_login_at | TIMESTAMPTZ | |
| created_at / updated_at | TIMESTAMPTZ | auto-trigger on updated_at |
transactions
| Column | Type | Notes |
|---|---|---|
| txn_id | UUID PK | |
| user_id | UUID FK | |
| account_id | UUID FK | nullable |
| source | ENUM | email_scan, pdf_scan, image_scan |
| txn_ts | TIMESTAMPTZ | transaction timestamp |
| amount | NUMERIC(18,4) | |
| curr | CHAR(3) | default INR |
| payer_name / payee_name | BYTEA | AES-256-GCM encrypted |
| payment_code | ENUM | DEBIT_P2P, DEBIT_P2M, CREDIT_M2P, CREDIT_P2P, REFUND_M2P, REFUND_P2P |
| txn_code | CHAR(8) | 8-digit category code (see Category System) |
| base_code | CHAR(2) GENERATED | extracted from txn_code chars 3-4 |
| detail_code | CHAR(4) GENERATED | extracted from txn_code chars 5-8 |
| type | ENUM | expense, earning, investment, credit_card_bill |
| description / user_code | BYTEA | AES-256-GCM encrypted |
| is_loan | ENUM | DEBIT, CREDIT, nullable |
| is_recurring | BOOLEAN | |
| rrn | VARCHAR(12) | Retrieval Reference Number, nullable |
Indexes on: (user_id, txn_ts), (user_id, type), (user_id, payment_code), (user_id, base_code).
accounts
| Column | Type | Notes |
|---|---|---|
| account_id | UUID PK | |
| user_id | UUID FK | |
| account_type | ENUM | savings, credit_card |
| account_name | VARCHAR(64) | e.g. "HDFC", "AXIS" |
| label | VARCHAR(128) | e.g. "Salary Account" |
UNIQUE on (user_id, account_type, account_name).
user_llm_keys
| Column | Type | Notes |
|---|---|---|
| key_id | UUID PK | |
| user_id | UUID FK | |
| provider | ENUM | claude |
| api_key | BYTEA | AES-256-GCM encrypted |
UNIQUE on (user_id, provider) — one key per provider per user.
txn_base_categories & txn_detail_categories
Seeded lookup tables mapping 2-digit and 4-digit codes to labels. 11 base categories (household, food, transport, health, entertainment, education, finance, travel, income, investment, other).
Authentication Flow
User → "Continue with Google"
→ GET /auth/google
→ Authlib redirects to Google (scopes: openid email profile)
→ Google → GET /auth/google/callback?code=...
→ Exchange code for token + userinfo
→ HMAC-SHA256(google_sub, SERVER_SECRET) → google_sub_hash
→ AES-256-GCM(email, ENCRYPTION_KEY) → encrypted email
→ UPSERT into users ON CONFLICT(google_sub_hash) DO UPDATE
→ request.session["user_id"] = user_id
→ redirect to /dashboard
Security notes:
- Google
subnever stored raw — only the HMAC hash (deterministic, one-way) - Email encrypted at rest; decrypted only in application layer on demand
- Display name stored only in session, never in DB
- Profile picture discarded immediately
API Routes
Auth (app/auth.py)
| Method | Path | Description |
|---|---|---|
| GET | /auth/google | Initiate OAuth flow |
| GET | /auth/google/callback | OAuth callback handler |
| POST | /auth/logout | Clear session |
Main (app/main.py)
| Method | Path | Description |
|---|---|---|
| GET | / | Landing page (redirect to /dashboard if logged in) |
| GET | /dashboard | Upload hub |
Upload (app/routers/upload.py)
| Method | Path | Description |
|---|---|---|
| GET | /upload | Upload page |
| POST | /upload | Process file (image/PDF), call Claude, store in _pending |
| GET | /upload/pdf-password | PDF password form |
| POST | /upload/pdf-password | Decrypt PDF, call Claude |
| GET | /review | Review extracted transactions |
| POST | /transactions/confirm | Confirm & save to DB |
Transactions (app/routers/transactions.py)
| Method | Path | Description |
|---|---|---|
| GET | /transactions | View transactions (paginated, filterable) |
Query params: account_id, type, payment_code, from_date, to_date, page (50/page).
Settings (app/routers/settings.py)
| Method | Path | Description |
|---|---|---|
| GET | /settings | Settings page (masked API keys) |
| POST | /settings/keys | Add/update Claude API key |
| POST | /settings/keys/{key_id}/delete | Delete API key |
Transaction Upload Data Flow
1. User uploads image or PDF
2. POST /upload
├── Validate file type (JPEG, PNG, WebP, PDF)
├── If PDF encrypted → redirect to /upload/pdf-password
└── Pass to extractor.py
3. extractor.py
├── Encode file as base64
├── Load system prompt from prompts/extraction_system.txt
├── Call Claude claude-sonnet-4-6 (max_tokens: 8192)
│ └── Input: base64 image or PDF document + user's saved accounts
├── Parse JSON response
│ └── If parse fails → send correction turn to Claude (retry)
└── Return structured transaction list
4. Store extraction result in in-memory _pending dict (keyed by session token)
5. GET /review → user edits extracted data
6. POST /transactions/confirm
├── UPSERT accounts
├── Encrypt sensitive fields (payer_name, payee_name, description)
├── INSERT transactions (skip duplicates by user_id + txn_ts + amount + payment_code)
└── Clear _pending entry
Note: _pending and _pending_pdf are in-memory dicts — single-process only. A future multi-process deployment would need a shared store (Redis, DB table).
Encryption Strategy
All encryption uses AES-256-GCM via the cryptography library. A fresh 12-byte nonce is generated per encryption; the nonce is prepended to the ciphertext before storing to BYTEA.
stored_bytes = nonce (12 bytes) || ciphertext || tag
| Data | Storage | Method |
|---|---|---|
| BYTEA | AES-256-GCM with ENCRYPTION_KEY | |
| Payer / payee name | BYTEA | AES-256-GCM with ENCRYPTION_KEY |
| Transaction description | BYTEA | AES-256-GCM with ENCRYPTION_KEY |
| User notes (user_code) | BYTEA | AES-256-GCM with ENCRYPTION_KEY |
| LLM API keys | BYTEA | AES-256-GCM with ENCRYPTION_KEY |
| Google OAuth sub | CHAR(64) | HMAC-SHA256 with SERVER_SECRET (one-way) |
Category System
Transactions use an 8-digit category code:
txn_code = "00" + 2-digit base + 4-digit detail
Example: "00010001"
^^ → reserved prefix
^^ → base code "01" = household
^^^^ → detail code "0001" = house_rent
11 base categories: 00 (uncategorized), 01 (household), 02 (food), 03 (transport), 04 (health), 05 (entertainment), 06 (education), 07 (finance), 08 (travel), 09 (income), 10 (investment).
base_code and detail_code are PostgreSQL generated columns derived from txn_code, enabling efficient indexed filtering.
External Services
Google OAuth 2.0
- Discovery URL:
https://accounts.google.com/.well-known/openid-configuration - Scopes:
openid email profile - Library: Authlib
Claude API (Anthropic)
- Model:
claude-sonnet-4-6 - Client:
anthropic.AsyncAnthropic(async) - Purpose: Extract structured transaction data from bank statement images/PDFs
- Input types: base64 JPEG/PNG/WebP (image block) or base64 PDF (document block)
- Max tokens: 8192
- User-supplied API keys — stored encrypted, validated on save via
models.list() - Debug logging: when
LOG_LLM=true, full request/response logged tologs/llm_debug.log
Environment Variables
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | postgresql+asyncpg://user:pass@host:5432/db |
| GOOGLE_CLIENT_ID | Yes | Google Cloud Console OAuth client ID |
| GOOGLE_CLIENT_SECRET | Yes | Google Cloud Console OAuth secret |
| SESSION_SECRET | Yes | 64-char hex — Starlette session signing |
| SERVER_SECRET | Yes | 64-char hex — HMAC-SHA256 key for google_sub |
| ENCRYPTION_KEY | Yes | 64-char hex — AES-256-GCM key |
| LOG_LLM | No | true to enable LLM debug logging (default: false) |
Generate secrets:
python -c "import secrets; print(secrets.token_hex(32))"
Deployment
Local development
pip install -r requirements.txt
# Set up .env from .env.example
# Run schema.sql against a Postgres DB
uvicorn app.main:app --reload
Heroku (production)
- Procfile:
web: uvicorn app.main:app --host 0.0.0.0 --port $PORT - Runtime: Python 3.12 (
runtime.txt) - Database: Heroku Postgres add-on (DATABASE_URL auto-set)
- Config vars: Set all env vars in Heroku dashboard
- Uses
ProxyHeadersMiddlewarefor forwarded headers behind Heroku's reverse proxy
Current deployment
- Hosted on Railway (migrated from Heroku, same Procfile pattern)
- Google OAuth redirect URIs configured for Railway domain
Frontend
Server-side rendered via Jinja2. No frontend framework — custom CSS only.
| Template | Route | Purpose |
|---|---|---|
| landing.html | / | Google login page |
| dashboard.html | /dashboard | File upload hub |
| settings.html | /settings | API key management |
| transactions.html | /transactions | Transaction list with filters |
| review.html | /review | Review/edit extracted transactions before confirming |
| pdf_password.html | /upload/pdf-password | Encrypted PDF password entry |
Design: White cards on light gray (#f5f5f0) background, accent blue (#4f7ef7), flexbox layout, no external CSS framework.
Key Architectural Decisions
-
User-supplied API keys: Users bring their own Claude API key. No Anthropic costs on the platform side. Keys are validated on save and stored encrypted.
-
In-memory pending state: Extraction results are held in
_pendingdicts between upload and confirm steps. Simple for single-process, but won't scale to multi-worker without a shared store. -
No raw PII in DB: Google sub hashed one-way; emails and names encrypted. The DB can be compromised without exposing user identity directly.
-
Deduplication on confirm: Transactions are deduplicated by
(user_id, txn_ts, amount, payment_code)— no unique constraint in DB, handled in application code. -
Session-only display name: User's name from Google never persisted to DB. Retrieved from Google each login and stored only in the session cookie.