
Abstraction AI

Links: Code · Live Demo
Tech: GPT · Gemini · Node.js · Docker · Git

Overview

# Abstraction AI: From Long, Messy Context to an Elegant Build

Project: https://abstractai.ai-builders.space/

When I start a new project, a workflow I often fall into is: I talk with ChatGPT for many rounds until I end up with a very long context. In the past, I would throw that entire conversation directly into a coding AI (like Augment Code or Claude Code) and ask it to implement the project "based on all the context". It *feels* like there's enough information—after so many back-and-forth rounds, the AI should understand what we want and how to do it. Over time, I realized this is not the ideal workflow.

## Two Problems With "Just Dump the Whole Context"

### 1) The context is too long, too messy

The same point may be discussed, overturned, and rebuilt multiple times. That makes the AI confused about what the final version actually is, and it's easy for implementation to drift off course.

### 2) We often don't know what we don't know

One of the hardest parts of using a coding AI agent today is that many people—especially those without a technical background—don't truly know what a "complete system" includes. Jumping straight from an idea to a "full implementation", skipping PRDs, system design, architecture, and engineering documentation, and expecting the AI to write a system that satisfies everything from a fuzzy starting point is genuinely difficult.

## What Abstraction AI Adds

Abstraction AI deliberately inserts a crucial step between "long context" and "actual development": **it turns the context into a complete set of documents and produces a clear design for the whole system.** This is a bit like manually inserting a deliberate "long thinking" step into the workflow—forcing a round of high-quality system-level thinking, structuring, and design before any code is written.

## Flexible Inputs, Practical Outputs

In practice, the tool turned out to be very flexible. It can take:

- Any length of text
- AI chat logs
- Meeting transcripts
- A long project description you wrote yourself

No matter the user's background, it generates a set of documents that you can hand to an AI engineer. With those documents, the AI can build the system more reliably, with higher success rates and more stable outcomes.

I intentionally made the documents beginner-friendly: usable for coding AIs, but also readable for people who aren't very technical. Before you pay an AI to "do the work", you can read the system description yourself, edit it according to your understanding, and then hand it off for implementation.

After the docs are generated, you can also use a set of prompts I prepared to build a complete product directly from these documents. Currently, it supports switching between GPT-5 and Gemini 2.5 Pro. The Gemini 2.5 Pro frontend visualization still has some rough edges, and I'll keep improving it.

## Cost, and an Unexpected Effect: Saving Money

The project itself was built with Augment Code, and the core prompt was written with help from GPT-5 Pro. End-to-end—building, iterating, debugging—the total cost was about $20.

Interestingly, this project was built by "having AI read long context", not by starting with structured documentation. But my next startup project was implemented *on top of the structured docs generated by this tool*. That project was much more complex, closer to a complete system, and the total cost still came out to roughly $20.

That showed me a very direct effect: **it saves money.** Because before execution, the AI already has a clear "instruction manual". It can follow it instead of repeated trial and error in ambiguous context and reworking mistakes. And my next project is, in essence, also about making coding AI agents faster, better, and cheaper.

If you also tend to talk with AI for a long time before bringing in a coding AI, you might want to try converting "long context" into a complete, elegant, executable product document set first—then handing it off to your AI engineer.

This is my first time sharing a project publicly. If it helps anyone, that would mean a lot. Please try it, share it, and give feedback—those are incredibly valuable for someone like me who's still learning how to work with users. Deploying on Builder Space was important for this project, and I'm grateful to the AI Architect course for making it so easy to share.

## Comments

**10 likes · 9 comments**

### mier20 (Full-Stack Engineer)

Thanks a lot for sharing—this is a great project. I'm curious: for users with a technical background, this can save some concrete implementation time. But for users without a technical background, how can they judge whether the AI-generated docs are correct, or whether they're truly what the user needs?

**Charlie:** That's a crucial question. What I tried to do inside the system—based on my experience prompting AIs to explain things—is to make the generated docs as friendly as possible to people of any background, so more people can actually read them. I also include a glossary to help with terminology. So I think "help the user understand" is the first direction. The second direction is "learn with AI": after downloading these docs, use a coding agent chat to ask it to explain things more clearly—ask wherever you don't understand—until you feel confident you have a solid grasp. If we're using natural language to orchestrate compute, then helping users understand more of the natural language they didn't understand before also expands the range of language they can use—so they can gradually obtain enough information to make the important judgments.

**Charlie:** Thanks for the kind words!

### Xu Jia

I feel your tool solves the core problem of maximizing the effectiveness of collaboration between a person and AI tools. The efficiency improvement you mentioned is just the result. The deeper point is: your tool draws clear boundaries for the AI tool, and the AI tool explores and optimizes within those boundaries. I'd love to discuss further and learn from each other. Thank you very much for sharing.

**Charlie:** Thanks for trying it and for the feedback! The efficiency gain was indeed something I discovered unexpectedly—my main goal was still to help AI develop better things in a better way. Your description—optimizing within boundaries—is very accurate and inspiring. For example, Claude Opus 4.5 will look back at the docs at the right time to check whether requirements are met and what tests might be missing. The development process shifts from "brute-force, messy exploration" to an optimization process with clear ground truth, a way to compute loss, and a path for backpropagation. After multiple iterations, it tends to converge to what we want. Developing software the way you'd train and optimize a model—that's how AI coding has felt to me for a while, and this project really makes that path smoother.

### Xu Jia

My experience is very similar to what you described. I've discussed a potential project for a long time with multiple LLMs, kept a large amount of text discussion records, and ended up generating multiple PRD versions—but the development process becomes more and more chaotic.

**Charlie:** Yeah—if you want to leverage different LLMs' strengths, it can definitely lead to that kind of difficulty.

### Xu Jia

Could I try it? How do I use it? Thanks.

**Charlie:** Here's the link: https://abstractai.ai-builders.space/ Thanks for your interest!

Deep dive

Deep Dive: Abstraction AI (AbstractAI) — “Context Compiler”

1. What This Is (one paragraph)

Abstraction AI is a small web app that takes long, messy conversation context (chat logs, meeting notes, or uploaded text files) and compiles it into a consistent “spec pack” of 11 structured documents (product overview, feature rules, technical architecture, tasks/acceptance, decisions, edge cases, quotes, trace map, open questions, inconsistencies) so humans and coding AIs can implement a project with less drift and fewer missed decisions.

2. Who It’s For + Use Cases

Primary users

  • Non-technical builders who start ideas via long AI chats and need a bridge to “engineer-ready” documentation.
  • Engineers/tech leads who want a fast “single source of truth” spec scaffold before implementation.
  • Users of coding agents (Cursor / Claude Code / Augment Code) who want to reduce ambiguity and rework.

Common use cases

  • Convert a long brainstorming thread into implementable requirements + architecture + acceptance criteria.
  • Produce a repeatable “spec pack” you can drop into a repo before asking a coding AI to build.
  • Extract decisions/constraints/edge cases and make them traceable to quoted source snippets.

What “good outcome” looks like (repo evidence-backed)

  • 11 documents are generated and previewable in the browser, then downloadable (single files or ZIP). (Evidence: backend/static/index.en.html:46, backend/static/app.js:352, backend/app.py:514)
  • The documents follow fixed separators and file names in either English or Chinese bundles. (Evidence: backend/prompt.py:218, backend/prompt.py:258, backend/prompt.py:493)

3. Product Surface Area (Features)

Feature: Paste context (primary input)

  • What it does: User pastes long context; UI shows character count; context is sent to the backend to generate docs. (Evidence: backend/static/app.js:246, backend/static/app.js:352)
  • Why it exists: The tool is designed around “raw context” as the single input, matching the stated problem of long AI chat histories. (Evidence: career_signaling_post.md:9, backend/static/index.en.html:48)
  • User journey (3–6 steps):
    1. Open / (English) or /zh (Chinese). (Evidence: backend/app.py:114, backend/app.py:136)
    2. Paste context into the textarea.
    3. Click Generate.
    4. Watch documents stream in.
    5. Preview and download results.
  • Constraints:
    • Empty/whitespace-only context is rejected with a 400. (Evidence: backend/app.py:156, backend/app.py:222)
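
A minimal sketch of this guard, assuming a FastAPI route and a Pydantic request model shaped like the ones described in sections 4–5 (the route path and return value here are illustrative, not copied from the repo):

```python
# Sketch: reject empty or whitespace-only context with a 400 before any LLM call.
# Route path and request fields are assumptions based on sections 4-5, not repo code.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    context: str

@app.post("/api/generate")
def generate(req: GenerateRequest):
    # Whitespace-only input carries no usable context, so fail fast with a 400.
    if not req.context.strip():
        raise HTTPException(status_code=400, detail="context must not be empty")
    return {"success": True}  # placeholder; the real handler builds the prompt and calls the LLM
```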

Feature: Upload multiple files (client-side) and merge into context

  • What it does: Allows selecting multiple files in the browser, merges their text into the textarea with per-file headers (=== filename ===). (Evidence: backend/static/app.js:271, backend/static/app.js:329)
  • Why it exists: Many “long contexts” live in files (notes, transcripts); merging keeps a single payload for generation. (Evidence: backend/static/index.en.html:48, career_signaling_post.md:23)
  • User journey:
    1. Choose files.
    2. UI lists uploads and lets you remove individual files.
    3. Combined text is inserted into the context input.
  • Constraints:
    • Uses File.text() in the browser; binary formats and very large files may fail or be slow; failures are replaced with a localized [Unable to read file contents] marker. (Evidence: backend/static/app.js:336)

Feature: Model toggle (GPT vs Gemini)

  • What it does: UI supports choosing gpt-5 or gemini-2.5-pro. (Evidence: backend/static/index.en.html:135, backend/static/app.js:197)
  • Why it exists: Lets users choose between “Better reasoning” and “Faster response” as described in the UI. (Evidence: backend/static/index.en.html:137, backend/static/index.en.html:141)
  • Constraints:
    • Backend treats any model name containing "gemini" as non-streaming and uses a heartbeat loop + full-response parse. (Evidence: backend/app.py:234, backend/app.py:251)

Feature: Language toggle (EN/ZH)

  • What it does: / serves English by default (if index.en.html exists), with /zh for Chinese. Prompt bundle switches separators, file names, and copy. (Evidence: backend/app.py:114, backend/prompt.py:482)
  • Why it exists: The underlying prompt and doc names are localized (two prompt bundles). (Evidence: backend/prompt.py:254, backend/prompt.py:258)
  • Constraints:
    • Only English/Chinese are supported; unknown values fall back to English. (Evidence: backend/prompt.py:482)

Feature: Streaming generation UX (11 docs as progress units)

  • What it does: Backend streams NDJSON events (meta, doc_started, chunk, doc_complete, done, error, and heartbeat) and frontend renders per-doc cards with status. (Evidence: backend/app.py:209, backend/static/app.js:462)
  • Why it exists: Improves perceived latency and reduces “blank screen” time for long generations. (Evidence: backend/static/index.en.html:153, backend/app.py:456)
  • Constraints:
    • Requires the model to emit correct separators; backend and frontend include best-effort tolerance and fallbacks. (Evidence: backend/app.py:78, backend/prompt.py:452)
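
To make the event contract concrete, here is a simplified sketch of the NDJSON streaming pattern (one JSON object per line over a `StreamingResponse`), using the event types listed in section 5. It is not the repo's implementation, and the document name is a placeholder:

```python
# Simplified sketch of the NDJSON streaming pattern: one JSON object per line,
# emitted through a StreamingResponse. Event types mirror those listed in
# section 5; the document name is a placeholder, and this is not the repo's code.
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def ndjson(event: dict) -> bytes:
    # One event per line, newline-terminated, so the client can parse line by line.
    return (json.dumps(event, ensure_ascii=False) + "\n").encode("utf-8")

@app.post("/api/generate-stream")
def generate_stream():
    def events():
        yield ndjson({"type": "meta", "project_name": "demo", "document_names": ["doc_01.md"]})
        yield ndjson({"type": "doc_started", "doc_index": 0})
        # In the real flow, model deltas are forwarded here as they arrive.
        yield ndjson({"type": "chunk", "doc_index": 0, "delta": "partial text"})
        yield ndjson({"type": "doc_complete", "doc_index": 0})
        yield ndjson({"type": "done"})
    return StreamingResponse(events(), media_type="application/x-ndjson")
```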

Feature: Preview, copy, and download documents (single + ZIP)

  • What it does: Users can open documents in a modal while streaming, copy to clipboard, download single docs, or download a ZIP via backend. (Evidence: backend/static/app.js:731, backend/static/app.js:828, backend/app.py:514)
  • Why it exists: The output is intended to be moved into a project repo. (Evidence: backend/static/index.en.html:209)
  • Constraints:
    • ZIP filename is hard-coded to context_compiler_output.zip in backend response headers (even though the UI uses localized names). (Evidence: backend/app.py:525, backend/static/app.js:79)
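
A rough sketch of how such a ZIP endpoint can be built with FastAPI and the standard-library `zipfile` module. The route path and request shape are assumptions; the fixed `context_compiler_output.zip` filename mirrors the constraint noted above:

```python
# Sketch of an in-memory ZIP download endpoint. Route path and request model
# are assumptions; the hard-coded download filename follows the constraint above.
import io
import zipfile
from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()

class Doc(BaseModel):
    name: str
    content: str

class ZipRequest(BaseModel):
    documents: list[Doc]

@app.post("/api/download-zip")
def download_zip(req: ZipRequest):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc in req.documents:
            zf.writestr(doc.name, doc.content)  # one archive entry per generated document
    headers = {"Content-Disposition": "attachment; filename=context_compiler_output.zip"}
    return Response(buf.getvalue(), media_type="application/zip", headers=headers)
```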

Feature: “AI coding prompt” handoff box

  • What it does: Results UI contains a “Next: have a coding AI implement it” prompt box and a copy button. (Evidence: backend/static/index.en.html:198)
  • Why it exists: The product’s intended workflow is “generate specs → hand to coding agent”. (Evidence: backend/static/index.en.html:81, career_signaling_post.md:9)

Feature: Analytics + feedback (client-side)

  • What it does: Frontend triggers GA4 events and uses Microsoft Clarity, tracks anonymous user id + stats in localStorage, and shows thumbs feedback + NPS after multiple generations. (Evidence: backend/static/index.en.html:4, backend/static/app.js:16, backend/static/app.js:1055)
  • Constraints / privacy notes:
    • Clarity “project id” is a placeholder string in HTML; actual deployment must replace it. (Evidence: backend/static/index.en.html:18)
    • No backend consent or privacy controls are present in this repo. Unknown (not found in repo) whether deployment adds them.

4. Architecture Overview

Components diagram (text)

Browser (static HTML/CSS/JS)
  ├─ GET /, /en, /zh  → FastAPI serves HTML
  ├─ GET /static/*    → FastAPI serves JS/CSS
  └─ POST /api/generate-stream (JSON) ───────────────┐
                                                     ▼
FastAPI backend (Python)
  ├─ Builds full_prompt = ULTIMATE_PROMPT + context
  ├─ Calls BuilderSpace OpenAI-compatible API (chat.completions)
  ├─ Streams NDJSON events back to browser
  └─ (Optional) Zips documents for download
                                                     ▼
LLM Provider via BuilderSpace proxy
  └─ Returns text that includes 11 doc separators and content

Responsibilities per component

  • Frontend (backend/static/*): Collects input, initiates generation, renders stream events into 11 document cards, provides download/copy utilities, tracks analytics. (Evidence: backend/static/app.js:352, backend/static/app.js:462)
  • Backend (backend/app.py): Exposes routes, builds prompts, calls LLM, parses separators into docs, handles streaming and heartbeats, serves static assets. (Evidence: backend/app.py:28, backend/app.py:207, backend/app.py:529)
  • Prompt bundle (backend/prompt.py): Defines the “11-document contract”: names, separators, and the full instruction prompt (EN/ZH). (Evidence: backend/prompt.py:218, backend/prompt.py:493)

Key runtime assumptions

  • A valid AI_BUILDER_TOKEN is configured at runtime; otherwise generation fails (observed 401 with dummy token). (Evidence: backend/app.py:31, training_runs/2026-01-17T20-53-16Z_notes.md:58)
  • LLM outputs must include expected separators; otherwise parsing degrades to a single “full output” doc. (Evidence: backend/app.py:185, backend/prompt.py:452)

5. Data Model

This project is intentionally “stateless” server-side: there is no database layer or persistent server storage implemented in this repo. (Evidence: backend/requirements.txt:1 (no DB libs), backend/app.py:3 (no ORM/DB imports).)

API request/response models (backend)

  • GenerateRequest: { context: string, project_name?: string, model?: string, lang?: string } (Evidence: backend/app.py:50)
  • DocumentResponse: { name: string, content: string } (Evidence: backend/app.py:58)
  • GenerateResponse: { success: boolean, project_name: string, documents: DocumentResponse[], generated_at: string, raw_response?: string } (Evidence: backend/app.py:64)
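
Expressed as Pydantic models, these shapes look roughly like the following (optionality and defaults are inferred from the descriptions above, not copied from the repo):

```python
# The request/response shapes above, expressed as Pydantic models.
# Field names follow section 5; None defaults are assumptions.
from typing import List, Optional
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    context: str
    project_name: Optional[str] = None
    model: Optional[str] = None
    lang: Optional[str] = None

class DocumentResponse(BaseModel):
    name: str
    content: str

class GenerateResponse(BaseModel):
    success: bool
    project_name: str
    documents: List[DocumentResponse]
    generated_at: str
    raw_response: Optional[str] = None
```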

Streaming event “schema” (backend → frontend)

NDJSON events for /api/generate-stream: (Evidence: backend/app.py:209)

  • meta: { type, project_name, document_names, generated_at }
  • doc_started: { type, doc_index }
  • chunk: { type, doc_index, delta }
  • doc_complete: { type, doc_index }
  • heartbeat: { type, elapsed_seconds, message } (Gemini and GPT timeouts)
  • done: { type }
  • error: { type, message }
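
For illustration, a small Python client could consume this stream line by line and dispatch on the `type` field. The endpoint URL and payload keys follow sections 4–5; this is a sketch, not the project's frontend logic:

```python
# Sketch of a client consuming the NDJSON stream, dispatching on event "type".
# URL and payload keys follow sections 4-5; the context string is a placeholder.
import json
import httpx

payload = {"context": "long pasted context goes here", "model": "gpt-5", "lang": "en"}

with httpx.stream("POST", "http://127.0.0.1:8000/api/generate-stream",
                  json=payload, timeout=None) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip any blank keep-alive lines
        event = json.loads(line)
        if event["type"] == "chunk":
            print(event["delta"], end="", flush=True)
        elif event["type"] == "heartbeat":
            print(f"[still generating after {event.get('elapsed_seconds')}s]")
        elif event["type"] == "error":
            raise RuntimeError(event.get("message"))
        elif event["type"] == "done":
            break
```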

Client-side storage (browser)

Stored in localStorage (not sent to backend by this repo):

  • abstraction_user_id (anonymous identifier) (Evidence: backend/static/app.js:18)
  • abstraction_stats (generation/download counts, first/last visit, NPS/feedback flags) (Evidence: backend/static/app.js:27)

6. AI System Design (if applicable)

This is a “prompt compiler” system, not a RAG system: it does not ingest into a knowledge base, compute embeddings, or run retrieval. All “knowledge” comes from the user-provided context payload.

Knowledge ingestion (sources, parsing, chunking)

  • Sources: pasted text + browser-read file contents merged into one string. (Evidence: backend/static/app.js:329)
  • Chunking strategy: Unknown (not found in repo). The backend sends the full context as a single user message; no chunking is implemented. (Evidence: backend/app.py:171)

Embeddings

  • Not used. (Evidence: no embedding code or deps; backend/requirements.txt contains no vector DB clients.)

Retrieval

  • Not used. (Evidence: no retrieval modules; single “prompt + context” call.)

Generation (models, prompts, grounding)

  • Model selection: passed through from the UI; defaults to gpt-5. (Evidence: backend/app.py:54, backend/static/index.en.html:135)
  • Prompting: full_prompt = ultimate_prompt + context, where ultimate_prompt includes strict separators, file names, and per-doc templates. (Evidence: backend/app.py:162, backend/prompt.py:274)
  • Output contract: Must emit 11 documents with exact separator lines and file names in order. (Evidence: backend/prompt.py:316, backend/prompt.py:452)
  • Parameters: max_tokens=32000, temperature=1.0. (Evidence: backend/app.py:176)
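
Putting those pieces together, the generation call can be sketched with the OpenAI Python client pointed at an OpenAI-compatible base URL. The proxy URL below is a placeholder; the prompt concatenation, `max_tokens`, and `temperature` values follow the evidence cited above:

```python
# Sketch of the generation call: one OpenAI-compatible chat.completions request
# carrying prompt + context. The base_url is a placeholder, not the real proxy;
# max_tokens and temperature follow the values cited in this section.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AI_BUILDER_TOKEN"],
    base_url="https://example-builderspace-proxy/v1",  # placeholder proxy URL
)

def generate(ultimate_prompt: str, context: str, model: str = "gpt-5"):
    full_prompt = ultimate_prompt + "\n\n" + context  # prompt + user context in one message
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": full_prompt}],
        max_tokens=32000,
        temperature=1.0,
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no text delta (e.g. role or finish markers); skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```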

Parsing & hallucination control

  • Primary control: enforce separators + file name order in the prompt. (Evidence: backend/prompt.py:316)
  • Parser behavior: backend searches for each separator and slices content; it tolerates minor spacing differences around "=====". (Evidence: backend/app.py:82)
  • Fallback: if parsing yields no documents, return a single “full output” doc. (Evidence: backend/app.py:185)
  • Remaining risk: If the model omits separators or generates malformed JSON for TRACE_MAP.json, the system will still display raw text but with reduced structure. Unknown (not found in repo) whether the deployment adds output validation/retries beyond what’s in this codebase.
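
A simplified sketch of this separator-splitting-plus-fallback idea (the exact separator string and the fallback document name are assumptions, not the repo's values):

```python
# Sketch of separator splitting with graceful fallback. The "===== NAME ====="
# separator format and the FULL_OUTPUT.md fallback name are assumptions.
import re

def parse_documents(raw: str, doc_names: list[str]) -> list[dict]:
    docs = []
    positions = []
    for name in doc_names:
        # Tolerate minor spacing differences around the separator markers.
        m = re.search(r"=====\s*" + re.escape(name) + r"\s*=====", raw)
        if m:
            positions.append((name, m.start(), m.end()))
    positions.sort(key=lambda p: p[1])
    for i, (name, _, body_start) in enumerate(positions):
        body_end = positions[i + 1][1] if i + 1 < len(positions) else len(raw)
        docs.append({"name": name, "content": raw[body_start:body_end].strip()})
    if not docs:
        # Fallback: keep the raw output usable as a single document instead of failing.
        docs = [{"name": "FULL_OUTPUT.md", "content": raw}]
    return docs
```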

Streaming design

  • GPT streaming: calls client.chat.completions.create(..., stream=True) and incrementally emits NDJSON events. (Evidence: backend/app.py:363, backend/app.py:207)
  • Gemini fallback: treats any model name containing "gemini" as non-streaming, but still keeps the HTTP connection alive with heartbeat events while waiting on a background thread. (Evidence: backend/app.py:234, backend/app.py:251)
  • Frontend rendering: throttles document card updates to reduce jitter (RENDER_THROTTLE_MS = 100). (Evidence: backend/static/app.js:200)
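
The Gemini path can be sketched as a background worker plus a heartbeat loop: the blocking model call runs in a thread while the generator keeps emitting events so the HTTP response never goes idle. This is a simplified illustration of the pattern described above, with event shapes borrowed from section 5, not the repo's code:

```python
# Sketch of the heartbeat pattern for a non-streaming model call: run the blocking
# request in a background thread and emit heartbeat events while waiting.
import json
import queue
import threading
import time

def stream_with_heartbeat(blocking_llm_call, interval: float = 10.0):
    q: queue.Queue = queue.Queue()

    def worker():
        try:
            q.put(("result", blocking_llm_call()))
        except Exception as exc:  # surface provider errors as an error event
            q.put(("error", str(exc)))

    threading.Thread(target=worker, daemon=True).start()
    started = time.monotonic()
    while True:
        try:
            kind, value = q.get(timeout=interval)
        except queue.Empty:
            elapsed = int(time.monotonic() - started)
            yield json.dumps({"type": "heartbeat", "elapsed_seconds": elapsed,
                              "message": "still generating"}) + "\n"
            continue
        if kind == "error":
            yield json.dumps({"type": "error", "message": value}) + "\n"
        else:
            # Full response arrives at once; in the real flow it would then be split into docs.
            yield json.dumps({"type": "chunk", "doc_index": 0, "delta": value}) + "\n"
            yield json.dumps({"type": "done"}) + "\n"
        return
```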

Evaluation

  • Unknown (not found in repo). No eval scripts, golden tests, or scoring harnesses are present. Suggested next step: add a small regression suite with fixed inputs and snapshot outputs (redacted) to detect separator drift and doc completeness.

7. Reliability, Security, and Privacy

Threat model (what can go wrong)

  • Cost abuse: Public /api/generate* endpoints can be spammed to run up LLM costs if deployed without auth/rate limiting. (Evidence: endpoints at backend/app.py:151, backend/app.py:207, CORS * at backend/app.py:41)
  • Prompt injection / separator breaking: User context can instruct the model to ignore separators, producing unparseable output. Parser has limited tolerance but no enforcement. (Evidence: backend/prompt.py:452, backend/app.py:78)
  • Privacy leakage: User context is transmitted to an external LLM endpoint; retention policy is unknown. (Evidence: backend/app.py:171; Unknown (not found in repo): privacy/retention policy docs.)

Authn/authz

  • Backend auth: None. No sessions, cookies, or auth middleware found. (Evidence: backend/app.py:15, backend/requirements.txt:1)
  • Frontend “identity”: anonymous localStorage ID for analytics only. (Evidence: backend/static/app.js:16)

CSRF/CORS/rate limiting

  • CORS: allow_origins=["*"] and allows credentials/methods/headers broadly. (Evidence: backend/app.py:41)
  • CSRF/rate limiting: Unknown (not found in repo). No CSRF tokens or rate limiting middleware present.

Secret handling

  • Backend reads AI_BUILDER_TOKEN from environment and calls load_dotenv() at import time. (Evidence: backend/app.py:25, backend/app.py:31)
  • .gitignore lists .env, but this repo tree contains .env and backend/.env; whether they contain real credentials is unknown (values not inspected here). (Evidence: .gitignore:1, training_runs/2026-01-17T20-53-16Z_notes.md:43)
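
A minimal sketch of this token-loading flow; the fail-fast check at the end is a suggestion rather than observed repo behavior:

```python
# Sketch: load .env at import time and read AI_BUILDER_TOKEN from the environment.
# The startup check is a suggestion, not repo code.
import os
from dotenv import load_dotenv

load_dotenv()  # pulls AI_BUILDER_TOKEN from a local .env file if present

AI_BUILDER_TOKEN = os.getenv("AI_BUILDER_TOKEN")
if not AI_BUILDER_TOKEN:
    # Without a valid token every generation request fails (observed as a 401),
    # so failing at startup gives a clearer signal than a runtime error.
    raise RuntimeError("AI_BUILDER_TOKEN is not set; generation requests will return 401")
```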

Data retention & redaction

  • Unknown (not found in repo). The system does not implement redaction before sending context to the model.

8. Performance & Cost

Latency drivers

  • Dominated by LLM response time + output size (up to max_tokens=32000). (Evidence: backend/app.py:176)
  • Long contexts increase generation time; UI explicitly warns “Usually 1–5 minutes” (Evidence: backend/static/index.en.html:153)

What is optimized (and why)

  • Streaming UX: NDJSON streaming provides progressive output to reduce perceived latency. (Evidence: backend/app.py:209, backend/static/app.js:352)
  • Timeout resilience: heartbeat events are sent when no chunks arrive to prevent connection idle timeouts. (Evidence: backend/app.py:456, backend/static/app.js:555)
  • Frontend jitter reduction: throttled re-rendering limits DOM churn under high-frequency streaming updates. (Evidence: backend/static/app.js:200)

Cost drivers

  • Each generation performs (at least) one LLM call with a large prompt and large output. (Evidence: backend/app.py:171)
  • Build-time/dev cost claim: the repo narrative claims ~$20 for development and iteration (not runtime). (Evidence: career_signaling_post.md:42)

What to measure next (explicit metrics)

  • End-to-end generation time distribution by model (p50/p95).
  • Token usage and cost per request (prompt tokens vs completion tokens).
  • Parse success rate: percentage of runs producing all 11 docs cleanly vs fallback.
  • Error rate and top error causes (401, timeouts, malformed separators).
  • Client metrics: time-to-first-doc, time-to-all-docs, abandon rate. (Unknown (not found in repo): backend metrics collection/instrumentation.)

9. Hardest Problems + Key Tradeoffs

  1. Single-call 11-doc output vs 11 separate calls

    • Chosen: single call with strict separators for simplicity and coherence. (Evidence: backend/prompt.py:316, backend/app.py:171)
    • Tradeoff: one failure or separator drift can degrade the entire output; no per-doc retries.
  2. Streaming parsing on the backend vs “wait then parse”

    • Chosen: backend streams and splits docs during generation for better UX. (Evidence: backend/app.py:207)
    • Tradeoff: complicated buffer/separator logic; harder to test; relies on exact separators.
  3. Gemini “streaming” fallback vs uniform streaming

    • Chosen: detect gemini and use heartbeat + full response parsing to preserve UI behavior. (Evidence: backend/app.py:234, backend/app.py:251)
    • Tradeoff: no token-level streaming for Gemini; doc content appears in larger chunks.
  4. Frictionless public demo vs securing the API

    • Chosen (current): no auth, permissive CORS, simple endpoints. (Evidence: backend/app.py:41)
    • Tradeoff: abuse risk and unclear privacy posture for production use.
  5. Client-side file reads vs server-side uploads

    • Chosen (current UI): read files in the browser and merge into a single payload. (Evidence: backend/static/app.js:329)
    • Tradeoff: browser memory limits; no server-side file-type validation; but simpler backend.
    • Note: Backend still includes /api/generate-from-file, suggesting an alternate design path. (Evidence: backend/app.py:490)
  6. High temperature (1.0) vs strict determinism

    • Chosen: temperature=1.0, likely to produce fuller prose and explanations. (Evidence: backend/app.py:177)
    • Tradeoff: more variance; increased risk of format drift; would benefit from stronger validation and retries.

10. Operational Guide (Repro & Deploy)

Local run steps

  1. Ensure Python is available (repo includes a venv at backend/venv/, but recreating is safer if it’s stale).
  2. Set required env var (name only): AI_BUILDER_TOKEN. (Evidence: backend/app.py:31)
  3. Run the backend from backend/:
    • uvicorn app:app --host 127.0.0.1 --port 8000
  4. Open http://127.0.0.1:8000/ (English) or http://127.0.0.1:8000/zh (Chinese). (Evidence: backend/app.py:114, backend/app.py:136)

Required env vars (names only)

  • AI_BUILDER_TOKEN (Evidence: backend/app.py:31)

Ports used

  • Default: 8000 (Docker + local examples). (Evidence: Dockerfile:18, deploy-config.json:5)
  • In container/platform: $PORT is respected. (Evidence: Dockerfile:22)

How to deploy

  • Container build/run is defined by Dockerfile. (Evidence: Dockerfile:1)
    • Build: docker build -t abstractai .
    • Run: docker run -e AI_BUILDER_TOKEN=... -p 8000:8000 abstractai
  • deploy-config.json suggests a deployment target named abstractai on branch main with port 8000; additional platform details are unknown. (Evidence: deploy-config.json:3)

How to debug common failures

  • 401 Invalid credentials: AI_BUILDER_TOKEN is missing or invalid. (Observed locally with dummy token.) (Evidence: training_runs/2026-01-17T20-53-16Z_notes.md:58)
  • Long-running requests timing out: rely on heartbeat events; if still timing out, increase server/proxy idle timeouts or move to WebSocket/SSE. (Evidence: backend/app.py:456, backend/static/app.js:555)
  • Docs not splitting into 11 cards: model likely broke separators; check raw output and tighten prompt / lower temperature / add validation + retry. (Evidence: backend/prompt.py:452, backend/app.py:185)

11. Evidence Map (repo anchors)

| Claim | Evidence (repo anchors) |
| --- | --- |
| App purpose: compile long conversations into executable product docs | backend/app.py:34, backend/static/index.en.html:36, career_signaling_post.md:21 |
| Uses FastAPI backend | backend/app.py:15, backend/app.py:34 |
| Calls an OpenAI-compatible API via BuilderSpace | backend/app.py:28 |
| Uses env var AI_BUILDER_TOKEN | backend/app.py:31, training_runs/2026-01-17T20-53-16Z_notes.md:43 |
| Serves English by default, supports /zh | backend/app.py:114, backend/app.py:136 |
| Health endpoint exists | backend/app.py:145, training_runs/2026-01-17T20-53-16Z_notes.md:53 |
| Generation endpoint (non-streaming) exists | backend/app.py:151 |
| Streaming endpoint emits NDJSON event types | backend/app.py:209, backend/static/app.js:462 |
| Model defaults to gpt-5 | backend/app.py:54, backend/static/index.en.html:135 |
| Gemini uses a non-streaming fallback + heartbeat | backend/app.py:234, backend/app.py:251, backend/static/app.js:555 |
| Prompt bundles define 11 fixed doc names and separators | backend/prompt.py:218, backend/prompt.py:258, backend/prompt.py:493 |
| Parser tolerates minor separator formatting | backend/app.py:82 |
| ZIP download endpoint exists | backend/app.py:514 |
| Frontend reads multiple files and merges into context | backend/static/app.js:271, backend/static/app.js:329 |
| Frontend calls /api/generate-stream | backend/static/app.js:400 |
| Frontend tracks analytics + NPS/feedback | backend/static/index.en.html:4, backend/static/app.js:8, backend/static/app.js:1055 |
| Docker deployment exists and respects $PORT | Dockerfile:18, Dockerfile:22 |
| Deploy config points to GitHub repo and service name | deploy-config.json:2, deploy-config.json:3 |
| Repo narrative claims ~$20 dev cost and mentions GPT-5-pro for prompt authoring | career_signaling_post.md:42 |

12. Interview Question Bank + Answer Outlines

System design

Q1: How would you design an “AI spec compiler” with good UX for long outputs?

  • Stream output to the browser as structured events (meta/progress/chunks), not just a final blob.
  • Pick a stable output contract (e.g., separators + fixed doc order) to render partial results incrementally.
  • Implement connection keep-alives (heartbeat) to survive long model latency and proxy timeouts.
  • Throttle frontend rendering to avoid DOM jitter under high-frequency updates.
  • Add fallbacks: if parsing fails, still return a usable “full output”.
  • Evidence: backend/app.py:207, backend/static/app.js:352, backend/static/app.js:200, backend/app.py:185

Q2: Why NDJSON over WebSockets or SSE?

  • NDJSON works over plain HTTP responses and is easy to parse line-by-line in the browser stream reader.
  • Server and client can evolve event types without breaking a strict SSE format.
  • A migration path to SSE/WebSockets exists if you need better proxy compatibility and bidirectional control.
  • Evidence: backend/app.py:209, backend/static/app.js:422

Q3: What would you change to make this production-safe?

  • Add authentication or at least abuse controls (rate limiting, quotas, captcha).
  • Restrict CORS origins and disable credentials unless needed.
  • Add logging + metrics around generation and parse success.
  • Add request size limits and file type controls.
  • Add privacy policy, retention behavior, and redaction options.
  • Evidence: backend/app.py:41, backend/app.py:151, backend/static/index.en.html:4 (analytics present but no consent controls)

Q4: How do you deploy and operate it?

  • Containerize backend with Docker; serve static assets from the same service.
  • Use $PORT for platform-provided ports.
  • Provide /health for readiness checks.
  • Evidence: Dockerfile:1, Dockerfile:22, backend/app.py:145

AI/RAG

Q1: Is this a RAG system? If not, what are the tradeoffs?

  • It’s prompt-based compilation: all knowledge comes from the user’s provided context.
  • Benefit: simpler architecture, no indexing pipelines, fewer moving parts.
  • Tradeoff: long contexts increase latency/cost; no retrieval means no grounding to external trusted sources.
  • Upgrade path: add optional retrieval over a user-provided knowledge base for consistent facts.
  • Evidence: backend/app.py:171, backend/prompt.py:274

Q2: How do you control hallucinations and keep output structured?

  • Encode a strict output contract: fixed separators + file names + per-doc templates.
  • Add “no hallucination” instructions and direct unknowns into Open Questions.
  • Parse output and degrade gracefully when the contract isn’t met.
  • Add validators + retry loops for malformed separators/JSON.
  • Evidence: backend/prompt.py:316, backend/prompt.py:289, backend/app.py:78

Q3: How do you handle streaming differences across model providers?

  • Detect capability gaps (Gemini streaming unsupported in this proxy) and fall back to non-streaming mode.
  • Keep the client contract stable by still emitting progress/heartbeat events.
  • Evidence: backend/app.py:234, backend/app.py:251, backend/static/app.js:555

Debugging & reliability

Q1: A user reports the UI hangs during generation. How do you debug?

  • Check if the backend is emitting heartbeats or chunks; confirm proxy idle timeouts.
  • Verify frontend stream reader loop is still receiving data and parsing JSON lines.
  • Confirm the model call is still in progress; check server logs for timeouts.
  • Add instrumentation for “time since last chunk” and request lifecycle.
  • Evidence: backend/static/app.js:426, backend/app.py:456

Q2: The 11 documents don’t split correctly—everything ends up in one doc. What happened?

  • The model likely did not output the exact separators.
  • Verify the prompt bundle and separators; check for language mismatch or formatting drift.
  • Add stricter prompting, reduce temperature, and validate separators early with retries.
  • Evidence: backend/prompt.py:316, backend/app.py:82, backend/app.py:185

Q3: How do you prevent HTTP/2 timeouts on long generations?

  • Emit heartbeat events on the same response stream when no chunks arrive.
  • Use background threads + queues so the generator can keep writing while the model call blocks.
  • Evidence: backend/app.py:251, backend/app.py:456

Product sense

Q1: What user pain does this solve, and how do you measure success?

  • Pain: long AI chat contexts cause drift; jumping from idea → code skips PRD/system design layers.
  • Product: adds a “doc compilation” stage producing a spec pack that both humans and coding AIs can use.
  • Success metrics (suggested): reduced iteration cycles, higher first-pass implementation accuracy, parse success rate, completion time, user satisfaction (NPS).
  • Evidence: career_signaling_post.md:13, career_signaling_post.md:15, backend/static/index.en.html:36

Q2: Why 11 docs—why not fewer?

  • The prompt explicitly defines 11 audience-specific artifacts (PM/QA/engineer/execution) + evidence pack (decisions/edge cases/quotes/trace map).
  • This reduces ambiguity by forcing structure and traceability.
  • Evidence: backend/prompt.py:316, backend/prompt.py:343

Q3: What’s the biggest “product risk” here?

  • Users may paste sensitive data; without redaction and clear policy, this creates trust issues.
  • Unauthenticated public endpoint can be abused, impacting cost and reliability.
  • Output may still be wrong; without evals, quality can regress silently.
  • Evidence: backend/app.py:41, backend/app.py:171; Unknown (not found in repo): privacy docs and eval harness.

Behavioral

Q1: Tell me about a time you made a system more reliable without changing the core idea.

  • Identified that long model latency + proxy idle timeouts caused failures.
  • Added heartbeat events to keep connections alive, plus a background thread so the stream stays responsive.
  • Added a Gemini fallback when streaming was unsupported.
  • Evidence: backend/app.py:251, backend/app.py:234

Q2: How did you manage ambiguity in requirements?

  • Translated ambiguity into a structured “spec pack” format with explicit rules (MUST/SHOULD/MAY) and Open Questions.
  • The product itself encodes an anti-ambiguity workflow: compile context → implement.
  • Evidence: backend/prompt.py:295, backend/static/index.en.html:81

Q3: How do you collaborate with AI tools effectively?

  • Define contracts (separators, doc templates) so the model’s output is machine-parseable and reviewer-friendly.
  • Create a handoff prompt designed for coding agents, embedded in the UI.
  • Evidence: backend/prompt.py:316, backend/static/index.en.html:198

13. Roadmap (high-leverage upgrades)

Must

  1. Add abuse protection: rate limiting + quotas + basic auth or signed tokens for generation endpoints.
  2. Restrict CORS to known origins; disable allow_credentials unless required.
  3. Add server-side observability: structured logs + request IDs + metrics (latency, tokens, error rates, parse success).
  4. Add output validators (separator presence, JSON validity for trace map) + automatic retries with lower temperature.
  5. Add request size limits and better file handling (MIME allowlist; server-side uploads if needed).
  6. Add a clear privacy notice + redaction mode (e.g., emails/phone numbers) before sending to LLM.
  7. Add regression tests for parsing and streaming event sequencing.
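
As a concrete starting point for item 7, a regression test could pin the parser's behavior with pytest; `parse_documents` is a hypothetical helper like the sketch in section 6, and the document names are placeholders:

```python
# Sketch of a parsing regression test (Must item 7). parse_documents is a
# hypothetical helper like the one sketched in section 6.
from backend.app import parse_documents  # hypothetical import path; adjust to the real module

DOC_NAMES = ["DOC_A.md", "DOC_B.md"]  # placeholders; the real bundle defines 11 names

def make_output(names):
    return "\n".join(f"===== {n} =====\ncontent for {n}" for n in names)

def test_all_docs_split_cleanly():
    docs = parse_documents(make_output(DOC_NAMES), DOC_NAMES)
    assert [d["name"] for d in docs] == DOC_NAMES

def test_missing_separators_fall_back_to_single_doc():
    docs = parse_documents("free-form text without separators", DOC_NAMES)
    assert len(docs) == 1  # graceful degradation instead of an empty result
```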

Nice-to-have

  1. Switch to SSE for more compatible streaming semantics (still line-based events).
  2. Let users choose doc count/templates (custom bundles).
  3. Add “resume generation” and partial retries per doc if a separator is missing.
  4. Add “export to repo” integration (e.g., GitHub PR creation) — with explicit user consent.
  5. Add optional RAG over user-provided attachments for consistent facts and cross-references.
  6. Add multi-tenant usage dashboard (for internal ops) with cost attribution per workspace.