System Design
Experience Navigator Architecture
A portfolio feature designed to demonstrate AI product thinking: explainability, human-in-the-loop workflow, grounding, and structured output. Every response is grounded in the actual CV context rather than free generation, minimizing hallucination risk.
Request Pipeline
User submits query
Job description or topic text, up to 500 characters. Response format selected (Fit Summary, Recruiter Brief, Matched Bullets, Case Study).
Sanitization (Hono API)
Zod schema validation + NFKC unicode normalization. tiktoken token count gate (≤300 tokens). Special character ratio check. 12-pattern injection regex. In-process sliding window rate limit (10 req/60s per IP).
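The in-process sliding window rate limit can be sketched as a plain Map of timestamps, using the figures above (10 requests per 60 s per IP). Function and variable names here are illustrative, not taken from the actual source.

```typescript
// Sliding-window rate limiter kept in process memory.
// This only works because the server is a single always-warm instance;
// on serverless, the Map would be lost between invocations.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;

const hits = new Map<string, number[]>(); // ip -> request timestamps

export function isAllowed(ip: string, now: number = Date.now()): boolean {
  // Keep only timestamps that still fall inside the window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return false; // over the limit: reject without recording
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```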
TF-IDF Concept Extraction
natural.TfIdf runs query against 50-term domain vocabulary. Top 8 concepts scored by salience (0–100), tiered as primary/supporting/context. Explicit mentions get +2.0 score boost. No LLM used in this step.
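A simplified stand-in for this step: score each vocabulary term by its frequency in the query, apply the explicit-mention boost, and normalize to 0–100. The real pipeline uses natural's TF-IDF weighting and the full 50-term vocabulary; the tiny vocabulary and scoring below are illustrative only.

```typescript
// Simplified concept extraction: naive term frequency + mention boost,
// normalized to a 0-100 salience scale. No LLM involved, so the step is
// deterministic and auditable, as described above.
const VOCABULARY = ["typescript", "llm", "caching", "rate limiting", "zod"];

interface ConceptScore {
  concept: string;
  salience: number; // 0-100
}

export function extractConcepts(query: string): ConceptScore[] {
  const text = query.toLowerCase();
  const raw = VOCABULARY.map((term) => {
    const matches = text.split(term).length - 1; // naive term frequency
    let score = matches;
    if (matches > 0) score += 2.0; // explicit-mention boost from the text above
    return { concept: term, score };
  }).filter((c) => c.score > 0);

  const max = Math.max(...raw.map((c) => c.score), 1);
  return raw
    .map((c) => ({ concept: c.concept, salience: Math.round((c.score / max) * 100) }))
    .sort((a, b) => b.salience - a.salience);
}
```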
Human weight adjustment
User sees extracted concepts with TF-IDF salience scores. Drag to reorder, slide to adjust priority weight (0–10). Dismiss irrelevant concepts. This step makes the system auditable and demonstrates explainable AI.
Cache check
SHA-256 key on (query + responseMode). LRU cache with 5-min TTL and 10MB max size. Cache hit returns immediately with cached: true flag.
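The key derivation and TTL behavior can be sketched as follows. The Map below is a minimal stand-in for lru-cache v11, which additionally enforces the 10 MB size cap and LRU eviction.

```typescript
import { createHash } from "node:crypto";

// SHA-256 cache key over (query + responseMode), plus a minimal
// TTL cache. Entry/function names are illustrative.
const TTL_MS = 5 * 60_000; // 5-minute TTL from the text above

interface Entry { value: string; expires: number; }
const cache = new Map<string, Entry>();

export function cacheKey(query: string, responseMode: string): string {
  return createHash("sha256").update(`${query}::${responseMode}`).digest("hex");
}

export function cacheGet(key: string, now: number = Date.now()): string | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now > entry.expires) { // expired: evict and report a miss
    cache.delete(key);
    return undefined;
  }
  return entry.value;
}

export function cacheSet(key: string, value: string, now: number = Date.now()): void {
  cache.set(key, { value, expires: now + TTL_MS });
}
```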
LLM Generation (Claude 3.5 Haiku)
Full cv.md passed as single context block in <experience> tags. Concepts sorted by userWeight descending — higher weight = explicit emphasis instruction. generateObject with Zod schema. temperature: 0. 30s timeout. Proxied via Helicone for observability.
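The prompt assembly step can be sketched like this: the full CV wrapped in `<experience>` tags, with concepts sorted by userWeight descending so higher weights become earlier, stronger emphasis instructions. The exact prompt wording is not shown in this document, so the phrasing below is hypothetical.

```typescript
// Hypothetical prompt builder: full CV in <experience> tags, followed by
// emphasis instructions ordered by user-assigned weight (highest first).
interface WeightedConcept { concept: string; userWeight: number; }

export function buildPrompt(cvMarkdown: string, concepts: WeightedConcept[]): string {
  const emphasis = [...concepts]
    .sort((a, b) => b.userWeight - a.userWeight)
    .map((c, i) => `${i + 1}. ${c.concept} (weight ${c.userWeight}/10)`)
    .join("\n");
  return [
    "<experience>",
    cvMarkdown,
    "</experience>",
    "Emphasize the following concepts, highest weight first:",
    emphasis,
  ].join("\n");
}
```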
Structured response
Returns: answer (format-specific), citedRoles (which roles were cited), confidence (high/medium/low). Evidence panel shows grounding sources. Response cached for future requests.
System Topology
Browser (Vercel) Fly.io (always-warm)
───────────────── ──────────────────────────────────
Next.js 15 App Router Hono Node.js server
│ │
├── / (homepage) ├── POST /extract
│ └── AlgorithmCanvas │ sanitize() → extractConcepts()
│ └── Canvas 2D │ → ConceptScore[]
│ Dijkstra / BFS / │
│ MergeSort ├── POST /generate
│ │ cache check → sanitize()
├── /navigator │ → generateResponse()
│ └── ExperienceNavigator │ → LRU cache set
│ ├── useReducer │
│ │ (state machine) lib/
│ ├── AnimatePresence ├── cv.ts (cv.md loaded once)
│ │ (SPA transitions) ├── cache.ts (lru-cache v11)
│ ├── InputScreen ├── sanitize.ts (4-step pipeline)
│ ├── ConceptScreen ├── extraction.ts (TF-IDF)
│ │ └── ConceptPanel └── generation.ts (Helicone→Anthropic)
│ │ └── ConceptCard
│ │ (@dnd-kit)
│ ├── OutputScreen
│ └── EvidenceScreen
│
└── /system-design External Services
─────────────────
lib/api.ts Helicone (proxy)
└── extractConcepts() └── Anthropic Claude 3.5 Haiku
     generateResponse()       temperature: 0, generateObject

Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 15 (App Router) | Server components for SEO, client components for SPA navigator, Vercel Hobby for hosting |
| Animations | Motion v12 (AnimatePresence) | Directional slide transitions between SPA steps, no page refreshes |
| Drag & Drop | @dnd-kit/sortable | Accessible, pointer + keyboard drag-to-reorder for concept priority |
| API Server | Hono on Fly.io (Node.js 22) | Single always-warm instance (auto_stop=false) for persistent in-memory cache and rate limiting — no cold starts |
| LLM | Claude 3.5 Haiku via Vercel AI SDK v6 | generateObject with Zod schema, temperature 0, structured output with citedRoles and confidence |
| LLM Proxy | Helicone | Observability, cost tracking, 200 req/day hard limit, $4 budget alert |
| Concept Extraction | natural (TF-IDF) | No LLM in extraction step — deterministic, instant, explainable. 50-term domain vocabulary seeded from CV. |
| Token Counting | tiktoken (cl100k_base) | Server-side gate: reject queries >300 tokens before they reach the LLM |
| Input Validation | Zod v4 | Schema validation, NFKC normalization, injection pattern detection |
| Rate Limiting | In-process sliding window Map | 10 req/60s per IP — no Upstash needed on single persistent server process |
| Output Cache | lru-cache v11 | SHA-256 keyed on query+mode, 5-min TTL, 10MB max, saves LLM cost on repeated queries |
| Styling | Tailwind CSS v4 | Utility-first, zero runtime, dark theme with zinc palette |
| Monorepo | pnpm workspaces + Turborepo | Shared @repo/types package, build pipeline: types → api → web |
Architecture Decision Records
ADR-001: Full CV as single context window, no vector DB (Accepted)
- Context
- CV is ~2,000 tokens — well within Claude 3.5 Haiku's 200K context window. Many portfolio sites chunk the CV and use embeddings for retrieval.
- Decision
- Pass the entire cv.md as a single <experience> block in the system prompt. No vector DB, no chunking, no embeddings.
- Consequences
- Simpler architecture. Every response has full CV context — no missed references. Slight token cost increase ($0.001/request at Haiku pricing). No hallucination risk from retrieval errors.
ADR-002: TF-IDF for concept extraction, not an LLM (Accepted)
- Context
- Concept extraction could use an LLM call. This would add latency, cost, and nondeterminism to a step that benefits from speed and explainability.
- Decision
- Use natural.TfIdf with a 50-term domain vocabulary seeded from the CV. Query as document, vocabulary terms as corpus. Salience = normalized TF-IDF score.
- Consequences
- Extraction is instant (<5ms), deterministic, free, and auditable. The vocabulary is the explicit model of what the CV contains — users can see exactly why concepts were chosen.
ADR-003: Dedicated Fly.io server over Vercel serverless functions (Accepted)
- Context
- Vercel serverless functions are subject to cold starts and cannot reliably hold in-memory state (LRU cache, rate limit Map) across invocations.
- Decision
- Deploy a Hono/Node.js server to Fly.io with auto_stop_machines=false and min_machines_running=1. Single always-warm 256MB instance.
- Consequences
- No cold starts. Shared in-memory LRU cache and rate limiter work correctly. Cost: $0 (Fly.io free tier covers one always-on shared-cpu-1x 256MB machine). Trade-off: single instance means no horizontal scaling — acceptable for a portfolio demo.
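The settings named in this ADR map onto a fly.toml roughly like the following sketch (the app name is hypothetical):

```toml
# Keep a single machine always warm so in-memory cache and rate-limit
# state survive between requests (values from ADR-003).
app = "experience-navigator-api"   # hypothetical app name

[http_service]
  auto_stop_machines = false   # never scale to zero
  min_machines_running = 1     # one always-on instance

[[vm]]
  size = "shared-cpu-1x"
  memory = "256mb"
```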
ADR-004: Human-in-the-loop concept weight adjustment (Accepted)
- Context
- The system could skip concept review and go straight to generation. This is faster but opaque.
- Decision
- Add a mandatory concept review step where users see the extracted themes, their TF-IDF salience scores, and can adjust priority weights (0–10) and reorder via drag-and-drop.
- Consequences
- Demonstrates explainability and human-in-the-loop design. User weights are injected into the prompt sorted by weight descending — higher weight = explicit emphasis instruction to Claude. Adds 1 interaction step but makes the system auditable.
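The review step's state transitions can be sketched as a small reducer handling reorder, reweight (clamped to the 0–10 range above), and dismiss. This is a hypothetical reduction of the actual ExperienceNavigator useReducer state machine, which covers more screens than shown here.

```typescript
// Hypothetical reducer for the concept-review step.
interface Concept { id: string; weight: number; }

type Action =
  | { type: "reweight"; id: string; weight: number }
  | { type: "dismiss"; id: string }
  | { type: "move"; from: number; to: number };

export function conceptsReducer(state: Concept[], action: Action): Concept[] {
  switch (action.type) {
    case "reweight":
      // Clamp to the 0-10 priority range from ADR-004.
      return state.map((c) =>
        c.id === action.id ? { ...c, weight: Math.min(10, Math.max(0, action.weight)) } : c
      );
    case "dismiss":
      return state.filter((c) => c.id !== action.id);
    case "move": {
      // Drag-to-reorder: remove from old index, insert at new index.
      const next = [...state];
      const [moved] = next.splice(action.from, 1);
      next.splice(action.to, 0, moved);
      return next;
    }
  }
}
```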
ADR-005: Structured LLM output with Zod schema (Accepted)
- Context
- Free-text LLM output requires parsing and validation. Structured output with generateObject gives type-safe responses.
- Decision
- Use Vercel AI SDK generateObject with a Zod schema: { answer: string, citedRoles: string[], confidence: "high" | "medium" | "low" }. temperature: 0.
- Consequences
- Type-safe responses. citedRoles enables the evidence panel to show exactly which roles were cited. confidence enables the UI badge. temperature: 0 makes outputs deterministic (LRU cache effective).
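The response contract from this ADR can be written as a plain TypeScript type with a dependency-free runtime guard. The production code uses a Zod schema with generateObject; this hand-rolled check only illustrates the same shape.

```typescript
// The ADR-005 response shape: answer text, cited roles for the evidence
// panel, and a confidence level for the UI badge.
export interface NavigatorResponse {
  answer: string;
  citedRoles: string[];
  confidence: "high" | "medium" | "low";
}

// Runtime guard equivalent to the Zod schema's parse check.
export function isNavigatorResponse(value: unknown): value is NavigatorResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.answer === "string" &&
    Array.isArray(v.citedRoles) &&
    v.citedRoles.every((r) => typeof r === "string") &&
    (v.confidence === "high" || v.confidence === "medium" || v.confidence === "low")
  );
}
```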