System Design
Experience Navigator Architecture
A portfolio feature designed to demonstrate AI product thinking: explainability, human-in-the-loop workflow, grounding, and structured output. Every response is grounded in the actual CV context rather than free generation, minimizing hallucination risk.
Request Pipeline
User submits query
Job description or topic text, up to 500 characters. Response format selected (Fit Summary, Recruiter Brief, Matched Bullets, Case Study).
Sanitization (Hono API)
Zod schema validation + NFKC unicode normalization. tiktoken token count gate (≤300 tokens). Special character ratio check. 12-pattern injection regex. In-process sliding window rate limit (10 req/60s per IP).
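The in-process sliding window rate limit can be sketched as a plain Map of timestamps, using the figures above (10 requests per 60 s per IP). Function and variable names here are illustrative, not taken from the actual source.

```typescript
// Sliding-window rate limiter kept in process memory.
// This only works because the server is a single always-warm instance;
// on serverless, the Map would be lost between invocations.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;

const hits = new Map<string, number[]>(); // ip -> request timestamps

export function isAllowed(ip: string, now: number = Date.now()): boolean {
  // Keep only timestamps that still fall inside the window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(ip, recent);
    return false; // over the limit: reject without recording
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```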
TF-IDF Concept Extraction
natural.TfIdf runs query against 50-term domain vocabulary. Top 8 concepts scored by salience (0–100), tiered as primary/supporting/context. Explicit mentions get +2.0 score boost. No LLM used in this step.
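A simplified stand-in for this step: score each vocabulary term by its frequency in the query, apply the explicit-mention boost, and normalize to 0–100. The real pipeline uses natural's TF-IDF weighting and the full 50-term vocabulary; the tiny vocabulary and scoring below are illustrative only.

```typescript
// Simplified concept extraction: naive term frequency + mention boost,
// normalized to a 0-100 salience scale. No LLM involved, so the step is
// deterministic and auditable, as described above.
const VOCABULARY = ["typescript", "llm", "caching", "rate limiting", "zod"];

interface ConceptScore {
  concept: string;
  salience: number; // 0-100
}

export function extractConcepts(query: string): ConceptScore[] {
  const text = query.toLowerCase();
  const raw = VOCABULARY.map((term) => {
    const matches = text.split(term).length - 1; // naive term frequency
    let score = matches;
    if (matches > 0) score += 2.0; // explicit-mention boost from the text above
    return { concept: term, score };
  }).filter((c) => c.score > 0);

  const max = Math.max(...raw.map((c) => c.score), 1);
  return raw
    .map((c) => ({ concept: c.concept, salience: Math.round((c.score / max) * 100) }))
    .sort((a, b) => b.salience - a.salience);
}
```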
Human weight adjustment
User sees extracted concepts with TF-IDF salience scores. Drag to reorder, slide to adjust priority weight (0–10). Dismiss irrelevant concepts. This step makes the system auditable and demonstrates explainable AI.
Cache check
SHA-256 key on (query + responseMode). LRU cache with 5-min TTL and 10MB max size. Cache hit returns immediately with cached: true flag.
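The key derivation and TTL behavior can be sketched as follows. The Map below is a minimal stand-in for lru-cache v11, which additionally enforces the 10 MB size cap and LRU eviction.

```typescript
import { createHash } from "node:crypto";

// SHA-256 cache key over (query + responseMode), plus a minimal
// TTL cache. Entry/function names are illustrative.
const TTL_MS = 5 * 60_000; // 5-minute TTL from the text above

interface Entry { value: string; expires: number; }
const cache = new Map<string, Entry>();

export function cacheKey(query: string, responseMode: string): string {
  return createHash("sha256").update(`${query}::${responseMode}`).digest("hex");
}

export function cacheGet(key: string, now: number = Date.now()): string | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now > entry.expires) { // expired: evict and report a miss
    cache.delete(key);
    return undefined;
  }
  return entry.value;
}

export function cacheSet(key: string, value: string, now: number = Date.now()): void {
  cache.set(key, { value, expires: now + TTL_MS });
}
```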
LLM Generation (Claude 3.5 Haiku)
Full cv.md passed as single context block in <experience> tags. Concepts sorted by userWeight descending — higher weight = explicit emphasis instruction. generateObject with Zod schema. temperature: 0. 30s timeout. Proxied via Helicone for observability.
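The prompt assembly step can be sketched like this: the full CV wrapped in `<experience>` tags, with concepts sorted by userWeight descending so higher weights become earlier, stronger emphasis instructions. The exact prompt wording is not shown in this document, so the phrasing below is hypothetical.

```typescript
// Hypothetical prompt builder: full CV in <experience> tags, followed by
// emphasis instructions ordered by user-assigned weight (highest first).
interface WeightedConcept { concept: string; userWeight: number; }

export function buildPrompt(cvMarkdown: string, concepts: WeightedConcept[]): string {
  const emphasis = [...concepts]
    .sort((a, b) => b.userWeight - a.userWeight)
    .map((c, i) => `${i + 1}. ${c.concept} (weight ${c.userWeight}/10)`)
    .join("\n");
  return [
    "<experience>",
    cvMarkdown,
    "</experience>",
    "Emphasize the following concepts, highest weight first:",
    emphasis,
  ].join("\n");
}
```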
Structured response
Returns: answer (format-specific), citedRoles (which roles were cited), confidence (high/medium/low). Evidence panel shows grounding sources. Response cached for future requests.
System Topology
Browser (Vercel) Fly.io (always-warm)
───────────────── ──────────────────────────────────
Next.js 15 App Router Hono Node.js server
│ │
├── / (homepage) ├── POST /extract
│ └── AlgorithmCanvas │ sanitize() → extractConcepts()
│ └── Canvas 2D │ → ConceptScore[]
│ Dijkstra / BFS / │
│ MergeSort ├── POST /generate
│ │ cache check → sanitize()
├── /navigator │ → generateResponse()
│ └── ExperienceNavigator │ → LRU cache set
│ ├── useReducer │
│ │ (state machine) lib/
│ ├── AnimatePresence ├── cv.ts (cv.md loaded once)
│ │ (SPA transitions) ├── cache.ts (lru-cache v11)
│ ├── InputScreen ├── sanitize.ts (4-step pipeline)
│ ├── ConceptScreen ├── extraction.ts (TF-IDF)
│ │ └── ConceptPanel └── generation.ts (Helicone→Anthropic)
│ │ └── ConceptCard
│ │ (@dnd-kit)
│ ├── OutputScreen
│ └── EvidenceScreen
│
└── /system-design External Services
─────────────────
lib/api.ts Helicone (proxy)
└── extractConcepts() └── Anthropic Claude 3.5 Haiku
     generateResponse()       temperature: 0, generateObject

Technology Stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 15 (App Router) | Server components for SEO, client components for SPA navigator, Vercel Hobby for hosting |
| Animations | Motion v12 (AnimatePresence) | Directional slide transitions between SPA steps, no page refreshes |
| Drag & Drop | @dnd-kit/sortable | Accessible, pointer + keyboard drag-to-reorder for concept priority |
| API Server | Hono on Fly.io (Node.js 22) | Single always-warm instance (auto_stop=false) for persistent in-memory cache and rate limiting — no cold starts |
| LLM | Claude 3.5 Haiku via Vercel AI SDK v6 | generateObject with Zod schema, temperature 0, structured output with citedRoles and confidence |
| LLM Proxy | Helicone | Observability, cost tracking, 200 req/day hard limit, $4 budget alert |
| Concept Extraction | natural (TF-IDF) | No LLM in extraction step — deterministic, instant, explainable. 50-term domain vocabulary seeded from CV. |
| Token Counting | tiktoken (cl100k_base) | Server-side gate: reject queries >300 tokens before they reach the LLM |
| Input Validation | Zod v4 | Schema validation, NFKC normalization, injection pattern detection |
| Rate Limiting | In-process sliding window Map | 10 req/60s per IP — no Upstash needed on single persistent server process |
| Output Cache | lru-cache v11 | SHA-256 keyed on query+mode, 5-min TTL, 10MB max, saves LLM cost on repeated queries |
| Styling | Tailwind CSS v4 | Utility-first, zero runtime, dark theme with zinc palette |
| Monorepo | pnpm workspaces + Turborepo | Shared @repo/types package, build pipeline: types → api → web |
Architecture Decision Records
ADR-001: Full CV as single context window, no vector DB (Accepted)
- Context
- CV is ~2,000 tokens — well within Claude 3.5 Haiku's 200K context window. Many portfolio sites chunk the CV and use embeddings for retrieval.
- Decision
- Pass the entire cv.md as a single <experience> block in the system prompt. No vector DB, no chunking, no embeddings.
- Consequences
- Simpler architecture. Every response has full CV context — no missed references. Slight token cost increase ($0.001/request at Haiku pricing). No hallucination risk from retrieval errors.
ADR-002: TF-IDF for concept extraction, not an LLM (Accepted)
- Context
- Concept extraction could use an LLM call. This would add latency, cost, and nondeterminism to a step that benefits from speed and explainability.
- Decision
- Use natural.TfIdf with a 50-term domain vocabulary seeded from the CV. Query as document, vocabulary terms as corpus. Salience = normalized TF-IDF score.
- Consequences
- Extraction is instant (<5ms), deterministic, free, and auditable. The vocabulary is the explicit model of what the CV contains — users can see exactly why concepts were chosen.
ADR-003: Dedicated Fly.io server over Vercel serverless functions (Accepted)
- Context
- Vercel serverless functions are subject to cold starts and cannot reliably hold in-memory state (LRU cache, rate limit Map) across invocations.
- Decision
- Deploy a Hono/Node.js server to Fly.io with auto_stop_machines=false and min_machines_running=1. Single always-warm 256MB instance.
- Consequences
- No cold starts. Shared in-memory LRU cache and rate limiter work correctly. Cost: $0 (Fly.io free tier covers one always-on shared-cpu-1x 256MB machine). Trade-off: single instance means no horizontal scaling — acceptable for a portfolio demo.
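The settings named in this ADR map onto a fly.toml roughly like the following sketch (the app name is hypothetical):

```toml
# Keep a single machine always warm so in-memory cache and rate-limit
# state survive between requests (values from ADR-003).
app = "experience-navigator-api"   # hypothetical app name

[http_service]
  auto_stop_machines = false   # never scale to zero
  min_machines_running = 1     # one always-on instance

[[vm]]
  size = "shared-cpu-1x"
  memory = "256mb"
```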
ADR-004: Human-in-the-loop concept weight adjustment (Accepted)
- Context
- The system could skip concept review and go straight to generation. This is faster but opaque.
- Decision
- Add a mandatory concept review step where users see the extracted themes, their TF-IDF salience scores, and can adjust priority weights (0–10) and reorder via drag-and-drop.
- Consequences
- Demonstrates explainability and human-in-the-loop design. User weights are injected into the prompt sorted by weight descending — higher weight = explicit emphasis instruction to Claude. Adds 1 interaction step but makes the system auditable.
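The review step's state transitions can be sketched as a small reducer handling reorder, reweight (clamped to the 0–10 range above), and dismiss. This is a hypothetical reduction of the actual ExperienceNavigator useReducer state machine, which covers more screens than shown here.

```typescript
// Hypothetical reducer for the concept-review step.
interface Concept { id: string; weight: number; }

type Action =
  | { type: "reweight"; id: string; weight: number }
  | { type: "dismiss"; id: string }
  | { type: "move"; from: number; to: number };

export function conceptsReducer(state: Concept[], action: Action): Concept[] {
  switch (action.type) {
    case "reweight":
      // Clamp to the 0-10 priority range from ADR-004.
      return state.map((c) =>
        c.id === action.id ? { ...c, weight: Math.min(10, Math.max(0, action.weight)) } : c
      );
    case "dismiss":
      return state.filter((c) => c.id !== action.id);
    case "move": {
      // Drag-to-reorder: remove from old index, insert at new index.
      const next = [...state];
      const [moved] = next.splice(action.from, 1);
      next.splice(action.to, 0, moved);
      return next;
    }
  }
}
```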
ADR-005: Structured LLM output with Zod schema (Accepted)
- Context
- Free-text LLM output requires parsing and validation. Structured output with generateObject gives type-safe responses.
- Decision
- Use Vercel AI SDK generateObject with a Zod schema: { answer: string, citedRoles: string[], confidence: "high" | "medium" | "low" }. temperature: 0.
- Consequences
- Type-safe responses. citedRoles enables the evidence panel to show exactly which roles were cited. confidence enables the UI badge. temperature: 0 makes outputs deterministic (LRU cache effective).
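The response contract from this ADR can be written as a plain TypeScript type with a dependency-free runtime guard. The production code uses a Zod schema with generateObject; this hand-rolled check only illustrates the same shape.

```typescript
// The ADR-005 response shape: answer text, cited roles for the evidence
// panel, and a confidence level for the UI badge.
export interface NavigatorResponse {
  answer: string;
  citedRoles: string[];
  confidence: "high" | "medium" | "low";
}

// Runtime guard equivalent to the Zod schema's parse check.
export function isNavigatorResponse(value: unknown): value is NavigatorResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.answer === "string" &&
    Array.isArray(v.citedRoles) &&
    v.citedRoles.every((r) => typeof r === "string") &&
    (v.confidence === "high" || v.confidence === "medium" || v.confidence === "low")
  );
}
```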