Deep Dive: Abstraction AI (AbstractAI) — “Context Compiler”
1. What This Is (one paragraph)
Abstraction AI is a small web app that takes long, messy conversation context (chat logs, meeting notes, or uploaded text files) and compiles it into a consistent “spec pack” of 11 structured documents (product overview, feature rules, technical architecture, tasks, acceptance criteria, decisions, edge cases, quotes, trace map, open questions, inconsistencies) so that humans and coding AIs can implement a project with less drift and fewer missed decisions.
2. Who It’s For + Use Cases
Primary users
- Non-technical builders who start ideas via long AI chats and need a bridge to “engineer-ready” documentation.
- Engineers/tech leads who want a fast “single source of truth” spec scaffold before implementation.
- Users of coding agents (Cursor / Claude Code / Augment Code) who want to reduce ambiguity and rework.
Common use cases
- Convert a long brainstorming thread into implementable requirements + architecture + acceptance criteria.
- Produce a repeatable “spec pack” you can drop into a repo before asking a coding AI to build.
- Extract decisions/constraints/edge cases and make them traceable to quoted source snippets.
What “good outcome” looks like (repo evidence-backed)
- 11 documents are generated and previewable in the browser, then downloadable (single files or ZIP). (Evidence: `backend/static/index.en.html:46`, `backend/static/app.js:352`, `backend/app.py:514`)
- The documents follow fixed separators and file names in either English or Chinese bundles. (Evidence: `backend/prompt.py:218`, `backend/prompt.py:258`, `backend/prompt.py:493`)
3. Product Surface Area (Features)
Feature: Paste context (primary input)
- What it does: The user pastes long context; the UI shows a character count; the context is sent to the backend to generate docs. (Evidence: `backend/static/app.js:246`, `backend/static/app.js:352`)
- Why it exists: The tool is designed around “raw context” as the single input, matching the stated problem of long AI chat histories. (Evidence: `career_signaling_post.md:9`, `backend/static/index.en.html:48`)
- User journey (3–6 steps):
  - Open `/` (English) or `/zh` (Chinese). (Evidence: `backend/app.py:114`, `backend/app.py:136`)
  - Paste context into the textarea.
  - Click Generate.
  - Watch documents stream in.
  - Preview and download results.
- Constraints:
  - Empty/whitespace-only context is rejected with a 400. (Evidence: `backend/app.py:156`, `backend/app.py:222`)
Feature: Upload multiple files (client-side) and merge into context
- What it does: Lets the user select multiple files in the browser and merges their text into the textarea with per-file headers (`=== filename ===`). (Evidence: `backend/static/app.js:271`, `backend/static/app.js:329`)
- Why it exists: Many “long contexts” live in files (notes, transcripts); merging keeps a single payload for generation. (Evidence: `backend/static/index.en.html:48`, `career_signaling_post.md:23`)
- User journey:
  - Choose files.
  - UI lists uploads and lets you remove individual files.
  - Combined text is inserted into the context input.
- Constraints:
  - Uses `File.text()` in the browser; binary formats and very large files may fail or be slow; failures are replaced with a localized `[Unable to read file contents]` marker. (Evidence: `backend/static/app.js:336`)
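The merged payload format is simple enough to reproduce outside the browser. Below is a minimal Python sketch (not from the repo; it mirrors the `=== filename ===` header format and the unreadable-file marker described above):

```python
from pathlib import Path

def merge_files_into_context(paths: list[str]) -> str:
    """Concatenate text files into one payload with `=== filename ===` headers,
    mirroring the client-side merge format described above."""
    parts = []
    for p in paths:
        path = Path(p)
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # The UI substitutes a localized marker when a file cannot be read.
            text = "[Unable to read file contents]"
        parts.append(f"=== {path.name} ===\n{text}")
    return "\n\n".join(parts)

# Example: build a context payload from two local notes files.
# context = merge_files_into_context(["meeting_notes.md", "chat_log.txt"])
```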
Feature: Model toggle (GPT vs Gemini)
- What it does: UI supports choosing `gpt-5` or `gemini-2.5-pro`. (Evidence: `backend/static/index.en.html:135`, `backend/static/app.js:197`)
- Why it exists: Lets users choose between “Better reasoning” and “Faster response” as described in the UI. (Evidence: `backend/static/index.en.html:137`, `backend/static/index.en.html:141`)
- Constraints:
  - Backend treats any model name containing `"gemini"` as non-streaming and uses a heartbeat loop + full-response parse. (Evidence: `backend/app.py:234`, `backend/app.py:251`)
Feature: Language toggle (EN/ZH)
- What it does: `/` serves English by default (if `index.en.html` exists), with `/zh` for Chinese. The prompt bundle switches separators, file names, and copy. (Evidence: `backend/app.py:114`, `backend/prompt.py:482`)
- Why it exists: The underlying prompt and doc names are localized (two prompt bundles). (Evidence: `backend/prompt.py:254`, `backend/prompt.py:258`)
- Constraints:
  - Only English/Chinese are supported; unknown values fall back to English. (Evidence: `backend/prompt.py:482`)
Feature: Streaming generation UX (11 docs as progress units)
- What it does: Backend streams NDJSON events (`meta`, `doc_started`, `chunk`, `doc_complete`, `done`, `error`, and `heartbeat`) and the frontend renders per-doc cards with status. (Evidence: `backend/app.py:209`, `backend/static/app.js:462`)
- Why it exists: Improves perceived latency and reduces “blank screen” time for long generations. (Evidence: `backend/static/index.en.html:153`, `backend/app.py:456`)
- Constraints:
  - Requires the model to emit correct separators; backend and frontend include best-effort tolerance and fallbacks. (Evidence: `backend/app.py:78`, `backend/prompt.py:452`)
Feature: Preview, copy, and download documents (single + ZIP)
- What it does: Users can open documents in a modal while streaming, copy to clipboard, download single docs, or download a ZIP via the backend. (Evidence: `backend/static/app.js:731`, `backend/static/app.js:828`, `backend/app.py:514`)
- Why it exists: The output is intended to be moved into a project repo. (Evidence: `backend/static/index.en.html:209`)
- Constraints:
  - The ZIP filename is hard-coded to `context_compiler_output.zip` in backend response headers (even though the UI uses localized names). (Evidence: `backend/app.py:525`, `backend/static/app.js:79`)
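A minimal sketch of how such an in-memory ZIP endpoint can look in FastAPI (the route name and request shape here are hypothetical; the repo's actual handler lives at `backend/app.py:514`):

```python
import io
import zipfile

from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()

class Doc(BaseModel):
    name: str
    content: str

class ZipRequest(BaseModel):
    documents: list[Doc]

@app.post("/api/download-zip")  # hypothetical route name
def download_zip(req: ZipRequest) -> Response:
    # Build the archive entirely in memory; the server stays stateless.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc in req.documents:
            zf.writestr(doc.name, doc.content)
    # Hard-coded download name, matching the behavior noted above.
    headers = {"Content-Disposition": 'attachment; filename="context_compiler_output.zip"'}
    return Response(buf.getvalue(), media_type="application/zip", headers=headers)
```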
Feature: “AI coding prompt” handoff box
- What it does: The results UI contains a “Next: have a coding AI implement it” prompt box and a copy button. (Evidence: `backend/static/index.en.html:198`)
- Why it exists: The product’s intended workflow is “generate specs → hand to coding agent”. (Evidence: `backend/static/index.en.html:81`, `career_signaling_post.md:9`)
Feature: Analytics + feedback (client-side)
- What it does: Frontend triggers GA4 events and uses Microsoft Clarity, tracks an anonymous user id + stats in `localStorage`, and shows thumbs feedback + NPS after multiple generations. (Evidence: `backend/static/index.en.html:4`, `backend/static/app.js:16`, `backend/static/app.js:1055`)
- Constraints / privacy notes:
  - The Clarity “project id” is a placeholder string in the HTML; an actual deployment must replace it. (Evidence: `backend/static/index.en.html:18`)
  - No backend consent or privacy controls are present in this repo. Unknown (not found in repo) whether deployment adds them.
4. Architecture Overview
Components diagram (text)
```
Browser (static HTML/CSS/JS)
├─ GET /, /en, /zh  → FastAPI serves HTML
├─ GET /static/*    → FastAPI serves JS/CSS
└─ POST /api/generate-stream (JSON) ───────────────┐
                                                   ▼
FastAPI backend (Python)
├─ Builds full_prompt = ULTIMATE_PROMPT + context
├─ Calls BuilderSpace OpenAI-compatible API (chat.completions)
├─ Streams NDJSON events back to browser
└─ (Optional) Zips documents for download
                                                   ▼
LLM provider via BuilderSpace proxy
└─ Returns text that includes the 11 doc separators and content
```
Responsibilities per component
- Frontend (`backend/static/*`): Collects input, initiates generation, renders stream events into 11 document cards, provides download/copy utilities, tracks analytics. (Evidence: `backend/static/app.js:352`, `backend/static/app.js:462`)
- Backend (`backend/app.py`): Exposes routes, builds prompts, calls the LLM, parses separators into docs, handles streaming and heartbeats, serves static assets. (Evidence: `backend/app.py:28`, `backend/app.py:207`, `backend/app.py:529`)
- Prompt bundle (`backend/prompt.py`): Defines the “11-document contract”: names, separators, and the full instruction prompt (EN/ZH). (Evidence: `backend/prompt.py:218`, `backend/prompt.py:493`)
Key runtime assumptions
- A valid `AI_BUILDER_TOKEN` is configured at runtime; otherwise generation fails (observed 401 with a dummy token). (Evidence: `backend/app.py:31`, `training_runs/2026-01-17T20-53-16Z_notes.md:58`)
- LLM outputs must include the expected separators; otherwise parsing degrades to a single “full output” doc. (Evidence: `backend/app.py:185`, `backend/prompt.py:452`)
5. Data Model
This project is intentionally “stateless” server-side: there is no database layer or persistent server storage implemented in this repo. (Evidence: `backend/requirements.txt:1` (no DB libs), `backend/app.py:3` (no ORM/DB imports).)
API request/response models (backend)
- `GenerateRequest`: `{ context: string, project_name?: string, model?: string, lang?: string }` (Evidence: `backend/app.py:50`)
- `DocumentResponse`: `{ name: string, content: string }` (Evidence: `backend/app.py:58`)
- `GenerateResponse`: `{ success: boolean, project_name: string, documents: DocumentResponse[], generated_at: string, raw_response?: string }` (Evidence: `backend/app.py:64`)
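A minimal Pydantic sketch of these models (field types and defaults are inferred from the shapes above and the documented `gpt-5`/English defaults; the repo's definitions at `backend/app.py:50` onward may differ):

```python
from typing import Optional
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    context: str
    project_name: Optional[str] = None
    model: Optional[str] = "gpt-5"  # UI default, per the model toggle
    lang: Optional[str] = "en"      # unknown values fall back to English

class DocumentResponse(BaseModel):
    name: str
    content: str

class GenerateResponse(BaseModel):
    success: bool
    project_name: str
    documents: list[DocumentResponse]
    generated_at: str
    raw_response: Optional[str] = None
```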
Streaming event “schema” (backend → frontend)
NDJSON events for `/api/generate-stream`: (Evidence: `backend/app.py:209`)
- `meta`: `{ type, project_name, document_names, generated_at }`
- `doc_started`: `{ type, doc_index }`
- `chunk`: `{ type, doc_index, delta }`
- `doc_complete`: `{ type, doc_index }`
- `heartbeat`: `{ type, elapsed_seconds, message }` (Gemini and GPT timeouts)
- `done`: `{ type }`
- `error`: `{ type, message }`
Client-side storage (browser)
Stored in `localStorage` (not sent to the backend by this repo):
- `abstraction_user_id` (anonymous identifier) (Evidence: `backend/static/app.js:18`)
- `abstraction_stats` (generation/download counts, first/last visit, NPS/feedback flags) (Evidence: `backend/static/app.js:27`)
6. AI System Design (if applicable)
This is a “prompt compiler” system, not a RAG system: it does not ingest into a knowledge base, compute embeddings, or run retrieval. All “knowledge” comes from the user-provided context payload.
Knowledge ingestion (sources, parsing, chunking)
- Sources: pasted text + browser-read file contents merged into one string. (Evidence: `backend/static/app.js:329`)
- Chunking strategy: Unknown (not found in repo). The backend sends the full context as a single user message; no chunking is implemented. (Evidence: `backend/app.py:171`)
Embeddings
- Not used. (Evidence: no embedding code or deps; `backend/requirements.txt` contains no vector DB clients.)
Retrieval
- Not used. (Evidence: no retrieval modules; single “prompt + context” call.)
Generation (models, prompts, grounding)
- Model selection: passed through from the UI; defaults to `gpt-5`. (Evidence: `backend/app.py:54`, `backend/static/index.en.html:135`)
- Prompting: `full_prompt = ultimate_prompt + context`, where `ultimate_prompt` includes strict separators, file names, and per-doc templates (see the call sketch after this list). (Evidence: `backend/app.py:162`, `backend/prompt.py:274`)
- Output contract: must emit 11 documents with exact separator lines and file names, in order. (Evidence: `backend/prompt.py:316`, `backend/prompt.py:452`)
- Parameters: `max_tokens=32000`, `temperature=1.0`. (Evidence: `backend/app.py:176`)
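A minimal sketch of this single-call generation step, assuming an OpenAI-compatible client; the `base_url` here is a placeholder, and the repo's actual call site is `backend/app.py:171`:

```python
import os
from openai import OpenAI

# BuilderSpace exposes an OpenAI-compatible API; this base_url is hypothetical.
client = OpenAI(
    api_key=os.environ["AI_BUILDER_TOKEN"],
    base_url="https://example-builderspace-proxy/v1",
)

def generate(ultimate_prompt: str, context: str, model: str = "gpt-5") -> str:
    # The entire user context rides along in a single message; no chunking.
    full_prompt = ultimate_prompt + context
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": full_prompt}],
        max_tokens=32000,
        temperature=1.0,
    )
    return resp.choices[0].message.content
```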
Parsing & hallucination control
- Primary control: enforce separators + file-name order in the prompt. (Evidence: `backend/prompt.py:316`)
- Parser behavior: the backend searches for each separator and slices content; it tolerates minor spacing differences around `"====="` (a minimal sketch follows this list). (Evidence: `backend/app.py:82`)
- Fallback: if parsing yields no documents, return a single “full output” doc. (Evidence: `backend/app.py:185`)
- Remaining risk: if the model omits separators or generates malformed JSON for `TRACE_MAP.json`, the system will still display raw text, but with reduced structure. Unknown (not found in repo) whether the deployment adds output validation/retries beyond what’s in this codebase.
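A minimal sketch of a tolerant separator parser with the described fallback (the separator shape `===== FILENAME =====` is assumed here; the repo's actual parser at `backend/app.py:78` onward may accept other variations):

```python
import re

def split_into_docs(raw: str, doc_names: list[str]) -> list[dict]:
    """Slice model output into named documents by separator lines,
    tolerating extra or missing spaces around the '=====' runs."""
    spans = []
    for name in doc_names:
        # e.g. matches "===== PRODUCT_OVERVIEW.md =====" with flexible spacing
        pattern = re.compile(
            r"^\s*={3,}\s*" + re.escape(name) + r"\s*={3,}\s*$", re.MULTILINE
        )
        m = pattern.search(raw)
        if m:
            spans.append((m.start(), m.end(), name))
    spans.sort()

    docs = []
    for i, (_, end, name) in enumerate(spans):
        content_end = spans[i + 1][0] if i + 1 < len(spans) else len(raw)
        docs.append({"name": name, "content": raw[end:content_end].strip()})
    if not docs:
        # Fallback: no separators found; surface the raw output as one doc.
        docs.append({"name": "full_output.md", "content": raw.strip()})
    return docs
```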
Streaming design
- GPT streaming: calls `client.chat.completions.create(..., stream=True)` and incrementally emits NDJSON events. (Evidence: `backend/app.py:363`, `backend/app.py:207`)
- Gemini fallback: treats any model name containing `"gemini"` as non-streaming, but still keeps the HTTP connection alive with heartbeat events while waiting on a background thread (sketched after this list). (Evidence: `backend/app.py:234`, `backend/app.py:251`)
- Frontend rendering: throttles document-card updates to reduce jitter (`RENDER_THROTTLE_MS = 100`). (Evidence: `backend/static/app.js:200`)
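A minimal sketch of the heartbeat pattern for a non-streaming model (event shapes follow the schema in Section 5; the queue/thread structure is illustrative, not the repo's exact code):

```python
import json
import queue
import threading
import time

def ndjson_with_heartbeats(blocking_call, heartbeat_every: float = 10.0):
    """Run a blocking model call on a background thread, yielding NDJSON
    heartbeat lines until the result arrives, then the final events."""
    result_q: "queue.Queue[str]" = queue.Queue()
    threading.Thread(target=lambda: result_q.put(blocking_call()), daemon=True).start()
    started = time.monotonic()
    while True:
        try:
            full_text = result_q.get(timeout=heartbeat_every)
            break
        except queue.Empty:
            elapsed = int(time.monotonic() - started)
            yield json.dumps({"type": "heartbeat",
                              "elapsed_seconds": elapsed,
                              "message": "still generating"}) + "\n"
    # In the real endpoint the full text would be split into docs here and
    # emitted as doc_started / chunk / doc_complete events before "done".
    yield json.dumps({"type": "chunk", "doc_index": 0, "delta": full_text}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"
```

Served via FastAPI's `StreamingResponse(..., media_type="application/x-ndjson")`, this keeps bytes flowing on the same HTTP response even when the provider offers no token-level stream.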
Evaluation
- Unknown (not found in repo). No eval scripts, golden tests, or scoring harnesses are present. Suggested next step: add a small regression suite with fixed inputs and snapshot outputs (redacted) to detect separator drift and doc completeness.
7. Reliability, Security, and Privacy
Threat model (what can go wrong)
- Cost abuse: public `/api/generate*` endpoints can be spammed to run up LLM costs if deployed without auth/rate limiting. (Evidence: endpoints at `backend/app.py:151`, `backend/app.py:207`; CORS `*` at `backend/app.py:41`)
- Prompt injection / separator breaking: user context can instruct the model to ignore separators, producing unparseable output. The parser has limited tolerance but no enforcement. (Evidence: `backend/prompt.py:452`, `backend/app.py:78`)
- Privacy leakage: user context is transmitted to an external LLM endpoint; the retention policy is unknown. (Evidence: `backend/app.py:171`; Unknown (not found in repo): privacy/retention policy docs.)
Authn/authz
- Backend auth: none. No sessions, cookies, or auth middleware found. (Evidence: `backend/app.py:15`, `backend/requirements.txt:1`)
- Frontend “identity”: anonymous localStorage ID for analytics only. (Evidence: `backend/static/app.js:16`)
CSRF/CORS/rate limiting
- CORS: `allow_origins=["*"]`, and credentials/methods/headers are allowed broadly. (Evidence: `backend/app.py:41`)
- CSRF/rate limiting: Unknown (not found in repo). No CSRF tokens or rate-limiting middleware present. A hardened configuration is sketched below.
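A minimal sketch of that hardened setup, pairing restricted CORS with a naive per-IP rate limit (the allowed origin and limits are placeholders; nothing like this exists in the repo yet):

```python
import time
from collections import defaultdict

from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://abstractai.example.com"],  # placeholder origin
    allow_credentials=False,
    allow_methods=["GET", "POST"],
    allow_headers=["Content-Type"],
)

# Naive in-memory per-IP limit (single-process only; use Redis or similar in production).
_hits: dict[str, list[float]] = defaultdict(list)
MAX_PER_HOUR = 10

@app.middleware("http")
async def rate_limit(request: Request, call_next):
    if request.url.path.startswith("/api/generate"):
        ip = request.client.host if request.client else "unknown"
        now = time.time()
        _hits[ip] = [t for t in _hits[ip] if now - t < 3600]
        if len(_hits[ip]) >= MAX_PER_HOUR:
            return JSONResponse({"detail": "Rate limit exceeded"}, status_code=429)
        _hits[ip].append(now)
    return await call_next(request)
```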
Secret handling
- The backend reads `AI_BUILDER_TOKEN` from the environment and calls `load_dotenv()` at import time. (Evidence: `backend/app.py:25`, `backend/app.py:31`)
- `.gitignore` lists `.env`, but this repo tree contains `.env` and `backend/.env`; whether they contain real credentials is unknown (values not inspected here). (Evidence: `.gitignore:1`, `training_runs/2026-01-17T20-53-16Z_notes.md:43`)
Data retention & redaction
- Unknown (not found in repo). The system does not implement redaction before sending context to the model.
8. Performance & Cost
Latency drivers
- Dominated by LLM response time + output size (up to `max_tokens=32000`). (Evidence: `backend/app.py:176`)
- Long contexts increase generation time; the UI explicitly warns “Usually 1–5 minutes”. (Evidence: `backend/static/index.en.html:153`)
What is optimized (and why)
- Streaming UX: NDJSON streaming provides progressive output to reduce perceived latency. (Evidence: `backend/app.py:209`, `backend/static/app.js:352`)
- Timeout resilience: heartbeat events are sent when no chunks arrive, to prevent connection idle timeouts. (Evidence: `backend/app.py:456`, `backend/static/app.js:555`)
- Frontend jitter reduction: throttled re-rendering limits DOM churn under high-frequency streaming updates. (Evidence: `backend/static/app.js:200`)
Cost drivers
- Each generation performs (at least) one LLM call with a large prompt and large output. (Evidence: `backend/app.py:171`)
- Build-time/dev cost claim: the repo narrative claims ~$20 for development and iteration (not runtime). (Evidence: `career_signaling_post.md:42`)
What to measure next (explicit metrics)
- End-to-end generation time distribution by model (p50/p95; see the sketch after this list).
- Token usage and cost per request (prompt tokens vs. completion tokens).
- Parse success rate: percentage of runs producing all 11 docs cleanly vs. fallback.
- Error rate and top error causes (401, timeouts, malformed separators).
- Client metrics: time-to-first-doc, time-to-all-docs, abandon rate. (Unknown (not found in repo): backend metrics collection/instrumentation.)
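A minimal sketch of how the percentile and parse-success metrics could be computed from per-request logs (the record shape is hypothetical; no such instrumentation exists in the repo yet):

```python
import statistics
from collections import defaultdict

# Hypothetical per-request records: (model, duration_seconds, produced_all_docs)
records = [
    ("gpt-5", 62.0, True), ("gpt-5", 180.5, True), ("gpt-5", 301.2, False),
    ("gemini-2.5-pro", 45.3, True), ("gemini-2.5-pro", 88.0, True),
]

durations: dict[str, list[float]] = defaultdict(list)
parsed_ok: dict[str, list[bool]] = defaultdict(list)
for model, secs, ok in records:
    durations[model].append(secs)
    parsed_ok[model].append(ok)

for model, ds in durations.items():
    qs = statistics.quantiles(ds, n=20)  # 5% steps: qs[9] ~ p50, qs[18] ~ p95
    rate = sum(parsed_ok[model]) / len(parsed_ok[model])
    print(f"{model}: p50~{qs[9]:.1f}s p95~{qs[18]:.1f}s parse_success={rate:.0%}")
```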
9. Hardest Problems + Key Tradeoffs
- Single-call 11-doc output vs. 11 separate calls
  - Chosen: single call with strict separators, for simplicity and coherence. (Evidence: `backend/prompt.py:316`, `backend/app.py:171`)
  - Tradeoff: one failure or separator drift can degrade the entire output; no per-doc retries.
- Streaming parsing on the backend vs. “wait then parse”
  - Chosen: backend streams and splits docs during generation for better UX. (Evidence: `backend/app.py:207`)
  - Tradeoff: complicated buffer/separator logic; harder to test; relies on exact separators.
- Gemini “streaming” fallback vs. uniform streaming
  - Chosen: detect `gemini` and use heartbeat + full-response parsing to preserve UI behavior. (Evidence: `backend/app.py:234`, `backend/app.py:251`)
  - Tradeoff: no token-level streaming for Gemini; doc content appears in larger chunks.
- Frictionless public demo vs. securing the API
  - Chosen (current): no auth, permissive CORS, simple endpoints. (Evidence: `backend/app.py:41`)
  - Tradeoff: abuse risk and an unclear privacy posture for production use.
- Client-side file reads vs. server-side uploads
  - Chosen (current UI): read files in the browser and merge into a single payload. (Evidence: `backend/static/app.js:329`)
  - Tradeoff: browser memory limits; no server-side file-type validation; but a simpler backend.
  - Note: the backend still includes `/api/generate-from-file`, suggesting an alternate design path. (Evidence: `backend/app.py:490`)
- High temperature (1.0) vs. strict determinism
  - Chosen: `temperature=1.0`, likely to produce fuller prose and explanations. (Evidence: `backend/app.py:177`)
  - Tradeoff: more variance; increased risk of format drift; would benefit from stronger validation and retries.
10. Operational Guide (Repro & Deploy)
Local run steps
- Ensure Python is available (the repo includes a venv at `backend/venv/`, but recreating it is safer if it's stale).
- Set the required env var (name only): `AI_BUILDER_TOKEN`. (Evidence: `backend/app.py:31`)
- Run the backend from `backend/`: `uvicorn app:app --host 127.0.0.1 --port 8000`
- Open `http://127.0.0.1:8000/` (English) or `http://127.0.0.1:8000/zh` (Chinese). (Evidence: `backend/app.py:114`, `backend/app.py:136`)
Required env vars (names only)
- `AI_BUILDER_TOKEN` (Evidence: `backend/app.py:31`)
Ports used
- Default: `8000` (Docker + local examples). (Evidence: `Dockerfile:18`, `deploy-config.json:5`)
- In container/platform: `$PORT` is respected. (Evidence: `Dockerfile:22`)
How to deploy
- Container build/run is defined by the `Dockerfile`. (Evidence: `Dockerfile:1`)
  - Build: `docker build -t abstractai .`
  - Run: `docker run -e AI_BUILDER_TOKEN=... -p 8000:8000 abstractai`
- `deploy-config.json` suggests a deployment target named `abstractai` on branch `main` with port 8000; additional platform details are unknown. (Evidence: `deploy-config.json:3`)
How to debug common failures
- 401 Invalid credentials: `AI_BUILDER_TOKEN` is missing or invalid. (Observed locally with a dummy token.) (Evidence: `training_runs/2026-01-17T20-53-16Z_notes.md:58`)
- Long-running requests timing out: rely on heartbeat events; if still timing out, increase server/proxy idle timeouts or move to WebSockets/SSE. (Evidence: `backend/app.py:456`, `backend/static/app.js:555`)
- Docs not splitting into 11 cards: the model likely broke the separators; check the raw output and tighten the prompt / lower the temperature / add validation + retry. (Evidence: `backend/prompt.py:452`, `backend/app.py:185`)
11. Evidence Map (repo anchors)
| Claim | Evidence (repo anchors) |
|---|---|
| App purpose: compile long conversations into executable product docs | backend/app.py:34, backend/static/index.en.html:36, career_signaling_post.md:21 |
| Uses FastAPI backend | backend/app.py:15, backend/app.py:34 |
| Calls an OpenAI-compatible API via BuilderSpace | backend/app.py:28 |
| Uses env var AI_BUILDER_TOKEN | backend/app.py:31, training_runs/2026-01-17T20-53-16Z_notes.md:43 |
| Serves English by default, supports /zh | backend/app.py:114, backend/app.py:136 |
| Health endpoint exists | backend/app.py:145, training_runs/2026-01-17T20-53-16Z_notes.md:53 |
| Generation endpoint (non-streaming) exists | backend/app.py:151 |
| Streaming endpoint emits NDJSON event types | backend/app.py:209, backend/static/app.js:462 |
| Model defaults to gpt-5 | backend/app.py:54, backend/static/index.en.html:135 |
| Gemini uses a non-streaming fallback + heartbeat | backend/app.py:234, backend/app.py:251, backend/static/app.js:555 |
| Prompt bundles define 11 fixed doc names and separators | backend/prompt.py:218, backend/prompt.py:258, backend/prompt.py:493 |
| Parser tolerates minor separator formatting | backend/app.py:82 |
| ZIP download endpoint exists | backend/app.py:514 |
| Frontend reads multiple files and merges into context | backend/static/app.js:271, backend/static/app.js:329 |
| Frontend calls /api/generate-stream | backend/static/app.js:400 |
| Frontend tracks analytics + NPS/feedback | backend/static/index.en.html:4, backend/static/app.js:8, backend/static/app.js:1055 |
| Docker deployment exists and respects $PORT | Dockerfile:18, Dockerfile:22 |
| Deploy config points to GitHub repo and service name | deploy-config.json:2, deploy-config.json:3 |
| Repo narrative claims ~$20 dev cost and mentions GPT-5-pro for prompt authoring | career_signaling_post.md:42 |
12. Interview Question Bank + Answer Outlines
System design
Q1: How would you design an “AI spec compiler” with good UX for long outputs?
- Stream output to the browser as structured events (meta/progress/chunks), not just a final blob.
- Pick a stable output contract (e.g., separators + fixed doc order) to render partial results incrementally.
- Implement connection keep-alives (heartbeat) to survive long model latency and proxy timeouts.
- Throttle frontend rendering to avoid DOM jitter under high-frequency updates.
- Add fallbacks: if parsing fails, still return a usable “full output”.
- Evidence: `backend/app.py:207`, `backend/static/app.js:352`, `backend/static/app.js:200`, `backend/app.py:185`
Q2: Why NDJSON over WebSockets or SSE?
- NDJSON works over plain HTTP responses and is easy to parse line-by-line in the browser stream reader.
- Server and client can evolve event types without breaking a strict SSE format.
- A migration path to SSE/WebSockets exists if you need better proxy compatibility and bidirectional control.
- Evidence: `backend/app.py:209`, `backend/static/app.js:422`
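To illustrate the line-by-line parsing property, here is a minimal Python client sketch consuming such a stream (the endpoint and event shapes follow Section 5; `httpx` is used for illustration and is not necessarily a repo dependency):

```python
import json
import httpx

def consume_stream(context: str, base_url: str = "http://127.0.0.1:8000") -> None:
    """Read NDJSON events line by line: one JSON object per line."""
    payload = {"context": context, "model": "gpt-5", "lang": "en"}
    with httpx.Client(timeout=None) as client:
        with client.stream("POST", f"{base_url}/api/generate-stream", json=payload) as resp:
            for line in resp.iter_lines():
                if not line.strip():
                    continue  # tolerate blank keep-alive lines, if any
                event = json.loads(line)
                if event["type"] == "chunk":
                    print(event["delta"], end="", flush=True)
                elif event["type"] in ("done", "error"):
                    print(f"\n[{event['type']}]")
                    break
```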
Q3: What would you change to make this production-safe?
- Add authentication or at least abuse controls (rate limiting, quotas, captcha).
- Restrict CORS origins and disable credentials unless needed.
- Add logging + metrics around generation and parse success.
- Add request size limits and file type controls.
- Add privacy policy, retention behavior, and redaction options.
- Evidence: `backend/app.py:41`, `backend/app.py:151`, `backend/static/index.en.html:4` (analytics present but no consent controls)
Q4: How do you deploy and operate it?
- Containerize the backend with Docker; serve static assets from the same service.
- Use `$PORT` for platform-provided ports.
- Provide `/health` for readiness checks.
- Evidence: `Dockerfile:1`, `Dockerfile:22`, `backend/app.py:145`
AI/RAG
Q1: Is this a RAG system? If not, what are the tradeoffs?
- It’s prompt-based compilation: all knowledge comes from the user’s provided context.
- Benefit: simpler architecture, no indexing pipelines, fewer moving parts.
- Tradeoff: long contexts increase latency/cost; no retrieval means no grounding to external trusted sources.
- Upgrade path: add optional retrieval over a user-provided knowledge base for consistent facts.
- Evidence: `backend/app.py:171`, `backend/prompt.py:274`
Q2: How do you control hallucinations and keep output structured?
- Encode a strict output contract: fixed separators + file names + per-doc templates.
- Add “no hallucination” instructions and direct unknowns into Open Questions.
- Parse output and degrade gracefully when the contract isn’t met.
- Add validators + retry loops for malformed separators/JSON.
- Evidence: `backend/prompt.py:316`, `backend/prompt.py:289`, `backend/app.py:78`
Q3: How do you handle streaming differences across model providers?
- Detect capability gaps (Gemini streaming unsupported in this proxy) and fall back to non-streaming mode.
- Keep the client contract stable by still emitting progress/heartbeat events.
- Evidence: `backend/app.py:234`, `backend/app.py:251`, `backend/static/app.js:555`
Debugging & reliability
Q1: A user reports the UI hangs during generation. How do you debug?
- Check if the backend is emitting heartbeats or chunks; confirm proxy idle timeouts.
- Verify frontend stream reader loop is still receiving data and parsing JSON lines.
- Confirm the model call is still in progress; check server logs for timeouts.
- Add instrumentation for “time since last chunk” and request lifecycle.
- Evidence: `backend/static/app.js:426`, `backend/app.py:456`
Q2: The 11 documents don’t split correctly—everything ends up in one doc. What happened?
- The model likely did not output the exact separators.
- Verify the prompt bundle and separators; check for language mismatch or formatting drift.
- Add stricter prompting, reduce temperature, and validate separators early with retries.
- Evidence: `backend/prompt.py:316`, `backend/app.py:82`, `backend/app.py:185`
Q3: How do you prevent HTTP/2 timeouts on long generations?
- Emit heartbeat events on the same response stream when no chunks arrive.
- Use background threads + queues so the generator can keep writing while the model call blocks.
- Evidence: `backend/app.py:251`, `backend/app.py:456`
Product sense
Q1: What user pain does this solve, and how do you measure success?
- Pain: long AI chat contexts cause drift; jumping from idea → code skips PRD/system design layers.
- Product: adds a “doc compilation” stage producing a spec pack that both humans and coding AIs can use.
- Success metrics (suggested): reduced iteration cycles, higher first-pass implementation accuracy, parse success rate, completion time, user satisfaction (NPS).
- Evidence: `career_signaling_post.md:13`, `career_signaling_post.md:15`, `backend/static/index.en.html:36`
Q2: Why 11 docs—why not fewer?
- The prompt explicitly defines 11 audience-specific artifacts (PM/QA/engineer/execution) + evidence pack (decisions/edge cases/quotes/trace map).
- This reduces ambiguity by forcing structure and traceability.
- Evidence: `backend/prompt.py:316`, `backend/prompt.py:343`
Q3: What’s the biggest “product risk” here?
- Users may paste sensitive data; without redaction and clear policy, this creates trust issues.
- Unauthenticated public endpoint can be abused, impacting cost and reliability.
- Output may still be wrong; without evals, quality can regress silently.
- Evidence: `backend/app.py:41`, `backend/app.py:171`; Unknown (not found in repo): privacy docs and eval harness.
Behavioral
Q1: Tell me about a time you made a system more reliable without changing the core idea.
- Identified that long model latency + proxy idle timeouts caused failures.
- Added heartbeat events to keep connections alive, plus a background thread so the stream stays responsive.
- Added a Gemini fallback when streaming was unsupported.
- Evidence: `backend/app.py:251`, `backend/app.py:234`
Q2: How did you manage ambiguity in requirements?
- Translated ambiguity into a structured “spec pack” format with explicit rules (MUST/SHOULD/MAY) and Open Questions.
- The product itself encodes an anti-ambiguity workflow: compile context → implement.
- Evidence: `backend/prompt.py:295`, `backend/static/index.en.html:81`
Q3: How do you collaborate with AI tools effectively?
- Define contracts (separators, doc templates) so the model’s output is machine-parseable and reviewer-friendly.
- Create a handoff prompt designed for coding agents, embedded in the UI.
- Evidence: `backend/prompt.py:316`, `backend/static/index.en.html:198`
13. Roadmap (high-leverage upgrades)
Must
- Add abuse protection: rate limiting + quotas + basic auth or signed tokens for generation endpoints.
- Restrict CORS to known origins; disable `allow_credentials` unless required.
- Add output validators (separator presence, JSON validity for the trace map) + automatic retries with lower temperature (see the sketch after this list).
- Add request size limits and better file handling (MIME allowlist; server-side uploads if needed).
- Add a clear privacy notice + redaction mode (e.g., emails/phone numbers) before sending to LLM.
- Add regression tests for parsing and streaming event sequencing.
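A minimal sketch of the proposed validator + retry loop (helper names are hypothetical; `call_model` stands in for the LLM call, and the separator regex mirrors the parser sketch in Section 6):

```python
import re

def missing_separators(raw: str, doc_names: list[str]) -> list[str]:
    """Return the expected document names whose separator lines are absent."""
    return [
        name for name in doc_names
        if not re.search(r"={3,}\s*" + re.escape(name) + r"\s*={3,}", raw)
    ]

def generate_with_retries(call_model, doc_names: list[str], max_attempts: int = 3) -> str:
    """Retry generation at progressively lower temperature until the
    separator contract is satisfied (or attempts run out)."""
    temperature = 1.0
    raw = ""
    for _ in range(max_attempts):
        raw = call_model(temperature=temperature)
        if not missing_separators(raw, doc_names):
            return raw
        temperature = max(0.2, temperature - 0.4)  # tighten determinism on retry
    return raw  # last attempt; the caller falls back to the single-doc path
```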
Nice-to-have
- Switch to SSE for more compatible streaming semantics (still line-based events).
- Let users choose doc count/templates (custom bundles).
- Add “resume generation” and partial retries per doc if a separator is missing.
- Add “export to repo” integration (e.g., GitHub PR creation) — with explicit user consent.
- Add optional RAG over user-provided attachments for consistent facts and cross-references.
- Add multi-tenant usage dashboard (for internal ops) with cost attribution per workspace.