
Abstraction AI

Links: Code · Live Demo
Tech: GPT · Gemini · Node.js · Docker · Git

Overview

# Abstraction AI: From Long, Messy Context to an Elegant Build

Project: https://abstractai.ai-builders.space/

When I start a new project, a workflow I often fall into is: I talk with ChatGPT for many rounds until I end up with a very long context. In the past, I would throw that entire conversation directly into a coding AI (like Augment Code or Claude Code) and ask it to implement the project "based on all the context". It *feels* like there's enough information—after so many back-and-forth rounds, the AI should understand what we want and how to do it. Over time, I realized this is not the ideal workflow.

## Two Problems With "Just Dump the Whole Context"

### 1) The context is too long, too messy

The same point may be discussed, overturned, and rebuilt multiple times. That makes the AI confused about what the final version actually is, and it's easy for implementation to drift off course.

### 2) We often don't know what we don't know

One of the hardest parts of using a coding AI agent today is that many people—especially those without a technical background—don't truly know what a "complete system" includes. Jumping straight from an idea to a "full implementation", skipping PRDs, system design, architecture, and engineering documentation, and expecting the AI to write a system that satisfies everything from a fuzzy starting point is genuinely difficult.

## What Abstraction AI Adds

Abstraction AI deliberately inserts a crucial step between "long context" and "actual development": **it turns the context into a complete set of documents and produces a clear design for the whole system.** This is a bit like manually inserting a deliberate "long thinking" step into the workflow—forcing a round of high-quality system-level thinking, structuring, and design before any code is written.

## Flexible Inputs, Practical Outputs

In practice, the tool turned out to be very flexible. It can take:

- Any length of text
- AI chat logs
- Meeting transcripts
- A long project description you wrote yourself

No matter the user's background, it generates a set of documents that you can hand to an AI engineer. With those documents, the AI can build the system more reliably, with higher success rates and more stable outcomes.

I intentionally made the documents beginner-friendly: usable for coding AIs, but also readable for people who aren't very technical. Before you pay an AI to "do the work", you can read the system description yourself, edit it according to your understanding, and then hand it off for implementation.

After the docs are generated, you can also use a set of prompts I prepared to build a complete product directly from these documents. Currently, it supports switching between GPT-5 and Gemini 2.5 Pro. The Gemini 2.5 Pro frontend visualization still has some rough edges, and I'll keep improving it.

## Cost, and an Unexpected Effect: Saving Money

The project itself was built with Augment Code, and the core prompt was written with help from GPT-5 Pro. End-to-end—building, iterating, debugging—the total cost was about $20.

Interestingly, this project was built by "having AI read long context", not by starting with structured documentation. But my next startup project was implemented *on top of the structured docs generated by this tool*. That project was much more complex, closer to a complete system, and the total cost still came out to roughly $20.

That showed me a very direct effect: **it saves money.** Because before execution, the AI already has a clear "instruction manual". It can follow it instead of repeated trial and error in ambiguous context and reworking mistakes. And my next project is, in essence, also about making coding AI agents faster, better, and cheaper.

If you also tend to talk with AI for a long time before bringing in a coding AI, you might want to try converting "long context" into a complete, elegant, executable product document set first—then handing it off to your AI engineer.

This is my first time sharing a project publicly. If it helps anyone, that would mean a lot. Please try it, share it, and give feedback—those are incredibly valuable for someone like me who's still learning how to work with users. Deploying on Builder Space was important for this project, and I'm grateful to the AI Architect course for making it so easy to share.

## Comments

**10 likes · 9 comments**

### mier20 (Full-Stack Engineer)

Thanks a lot for sharing—this is a great project. I'm curious: for users with a technical background, this can save some concrete implementation time. But for users without a technical background, how can they judge whether the AI-generated docs are correct, or whether they're truly what the user needs?

**Charlie:** That's a crucial question. What I tried to do inside the system—based on my experience prompting AIs to explain things—is to make the generated docs as friendly as possible to people of any background, so more people can actually read them. I also include a glossary to help with terminology. So I think "help the user understand" is the first direction. The second direction is "learn with AI": after downloading these docs, use a coding agent chat to ask it to explain things more clearly—ask wherever you don't understand—until you feel confident you have a solid grasp. If we're using natural language to orchestrate compute, then helping users understand more of the natural language they didn't understand before also expands the range of language they can use—so they can gradually obtain enough information to make the important judgments.

**Charlie:** Thanks for the kind words!

### Xu Jia

I feel your tool solves the core problem of maximizing the effectiveness of collaboration between a person and AI tools. The efficiency improvement you mentioned is just the result. The deeper point is: your tool draws clear boundaries for the AI tool, and the AI tool explores and optimizes within those boundaries. I'd love to discuss further and learn from each other. Thank you very much for sharing.

**Charlie:** Thanks for trying it and for the feedback! The efficiency gain was indeed something I discovered unexpectedly—my main goal was still to help AI develop better things in a better way. Your description—optimizing within boundaries—is very accurate and inspiring. For example, Claude Opus 4.5 will look back at the docs at the right time to check whether requirements are met and what tests might be missing. The development process shifts from "brute-force, messy exploration" to an optimization process with clear ground truth, a way to compute loss, and a path for backpropagation. After multiple iterations, it tends to converge to what we want. Developing software the way you'd train and optimize a model—that's how AI coding has felt to me for a while, and this project really makes that path smoother.

### Xu Jia

My experience is very similar to what you described. I've discussed a potential project for a long time with multiple LLMs, kept a large amount of text discussion records, and ended up generating multiple PRD versions—but the development process becomes more and more chaotic.

**Charlie:** Yeah—if you want to leverage different LLMs' strengths, it can definitely lead to that kind of difficulty.

### Xu Jia

Could I try it? How do I use it? Thanks.

**Charlie:** Here's the link: https://abstractai.ai-builders.space/ Thanks for your interest!

Deep dive

Deep Dive: Abstraction AI (AbstractAI) — “Context Compiler”

1. What This Is (one paragraph)

Abstraction AI is a small web app that takes long, messy conversation context (chat logs, meeting notes, or uploaded text files) and compiles it into a consistent “spec pack” of 11 structured documents (product overview, feature rules, technical architecture, tasks/acceptance, decisions, edge cases, quotes, trace map, open questions, inconsistencies) so humans and coding AIs can implement a project with less drift and fewer missed decisions.

2. Who It’s For + Use Cases

Primary users

  • Non-technical builders who start ideas via long AI chats and need a bridge to “engineer-ready” documentation.
  • Engineers/tech leads who want a fast “single source of truth” spec scaffold before implementation.
  • Users of coding agents (Cursor / Claude Code / Augment Code) who want to reduce ambiguity and rework.

Common use cases

  • Convert a long brainstorming thread into implementable requirements + architecture + acceptance criteria.
  • Produce a repeatable “spec pack” you can drop into a repo before asking a coding AI to build.
  • Extract decisions/constraints/edge cases and make them traceable to quoted source snippets.

What “good outcome” looks like (repo evidence-backed)

  • 11 documents are generated and previewable in the browser, then downloadable (single files or ZIP). (Evidence: backend/static/index.en.html:46, backend/static/app.js:352, backend/app.py:514)
  • The documents follow fixed separators and file names in either English or Chinese bundles. (Evidence: backend/prompt.py:218, backend/prompt.py:258, backend/prompt.py:493)

3. Product Surface Area (Features)

Feature: Paste context (primary input)

  • What it does: User pastes long context; UI shows character count; context is sent to the backend to generate docs. (Evidence: backend/static/app.js:246, backend/static/app.js:352)
  • Why it exists: The tool is designed around “raw context” as the single input, matching the stated problem of long AI chat histories. (Evidence: career_signaling_post.md:9, backend/static/index.en.html:48)
  • User journey (3–6 steps):
    1. Open / (English) or /zh (Chinese). (Evidence: backend/app.py:114, backend/app.py:136)
    2. Paste context into the textarea.
    3. Click Generate.
    4. Watch documents stream in.
    5. Preview and download results.
  • Constraints:
    • Empty/whitespace-only context is rejected with a 400. (Evidence: backend/app.py:156, backend/app.py:222)
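
A minimal sketch of this guard, assuming a FastAPI route and a Pydantic request model shaped like the ones described in sections 4–5 (the route path and return value here are illustrative, not copied from the repo):

```python
# Sketch: reject empty or whitespace-only context with a 400 before any LLM call.
# Route path and request fields are assumptions based on sections 4-5, not repo code.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    context: str

@app.post("/api/generate")
def generate(req: GenerateRequest):
    # Whitespace-only input carries no usable context, so fail fast with a 400.
    if not req.context.strip():
        raise HTTPException(status_code=400, detail="context must not be empty")
    return {"success": True}  # placeholder; the real handler builds the prompt and calls the LLM
```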

Feature: Upload multiple files (client-side) and merge into context

  • What it does: Allows selecting multiple files in the browser, merges their text into the textarea with per-file headers (=== filename ===). (Evidence: backend/static/app.js:271, backend/static/app.js:329)
  • Why it exists: Many “long contexts” live in files (notes, transcripts); merging keeps a single payload for generation. (Evidence: backend/static/index.en.html:48, career_signaling_post.md:23)
  • User journey:
    1. Choose files.
    2. UI lists uploads and lets you remove individual files.
    3. Combined text is inserted into the context input.
  • Constraints:
    • Uses File.text() in the browser; binary formats and very large files may fail or be slow; failures are replaced with a localized [Unable to read file contents] marker. (Evidence: backend/static/app.js:336)

Feature: Model toggle (GPT vs Gemini)

  • What it does: UI supports choosing gpt-5 or gemini-2.5-pro. (Evidence: backend/static/index.en.html:135, backend/static/app.js:197)
  • Why it exists: Lets users choose between “Better reasoning” and “Faster response” as described in the UI. (Evidence: backend/static/index.en.html:137, backend/static/index.en.html:141)
  • Constraints:
    • Backend treats any model name containing "gemini" as non-streaming and uses a heartbeat loop + full-response parse. (Evidence: backend/app.py:234, backend/app.py:251)

Feature: Language toggle (EN/ZH)

  • What it does: / serves English by default (if index.en.html exists), with /zh for Chinese. Prompt bundle switches separators, file names, and copy. (Evidence: backend/app.py:114, backend/prompt.py:482)
  • Why it exists: The underlying prompt and doc names are localized (two prompt bundles). (Evidence: backend/prompt.py:254, backend/prompt.py:258)
  • Constraints:
    • Only English/Chinese are supported; unknown values fall back to English. (Evidence: backend/prompt.py:482)

Feature: Streaming generation UX (11 docs as progress units)

  • What it does: Backend streams NDJSON events (meta, doc_started, chunk, doc_complete, done, error, and heartbeat) and frontend renders per-doc cards with status. (Evidence: backend/app.py:209, backend/static/app.js:462)
  • Why it exists: Improves perceived latency and reduces “blank screen” time for long generations. (Evidence: backend/static/index.en.html:153, backend/app.py:456)
  • Constraints:
    • Requires the model to emit correct separators; backend and frontend include best-effort tolerance and fallbacks. (Evidence: backend/app.py:78, backend/prompt.py:452)
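
To make the event contract concrete, here is a simplified sketch of the NDJSON streaming pattern (one JSON object per line over a `StreamingResponse`), using the event types listed in section 5. It is not the repo's implementation, and the document name is a placeholder:

```python
# Simplified sketch of the NDJSON streaming pattern: one JSON object per line,
# emitted through a StreamingResponse. Event types mirror those listed in
# section 5; the document name is a placeholder, and this is not the repo's code.
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def ndjson(event: dict) -> bytes:
    # One event per line, newline-terminated, so the client can parse line by line.
    return (json.dumps(event, ensure_ascii=False) + "\n").encode("utf-8")

@app.post("/api/generate-stream")
def generate_stream():
    def events():
        yield ndjson({"type": "meta", "project_name": "demo", "document_names": ["doc_01.md"]})
        yield ndjson({"type": "doc_started", "doc_index": 0})
        # In the real flow, model deltas are forwarded here as they arrive.
        yield ndjson({"type": "chunk", "doc_index": 0, "delta": "partial text"})
        yield ndjson({"type": "doc_complete", "doc_index": 0})
        yield ndjson({"type": "done"})
    return StreamingResponse(events(), media_type="application/x-ndjson")
```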

Feature: Preview, copy, and download documents (single + ZIP)

  • What it does: Users can open documents in a modal while streaming, copy to clipboard, download single docs, or download a ZIP via backend. (Evidence: backend/static/app.js:731, backend/static/app.js:828, backend/app.py:514)
  • Why it exists: The output is intended to be moved into a project repo. (Evidence: backend/static/index.en.html:209)
  • Constraints:
    • ZIP filename is hard-coded to context_compiler_output.zip in backend response headers (even though the UI uses localized names). (Evidence: backend/app.py:525, backend/static/app.js:79)
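
A rough sketch of how such a ZIP endpoint can be built with FastAPI and the standard-library `zipfile` module. The route path and request shape are assumptions; the fixed `context_compiler_output.zip` filename mirrors the constraint noted above:

```python
# Sketch of an in-memory ZIP download endpoint. Route path and request model
# are assumptions; the hard-coded download filename follows the constraint above.
import io
import zipfile
from fastapi import FastAPI, Response
from pydantic import BaseModel

app = FastAPI()

class Doc(BaseModel):
    name: str
    content: str

class ZipRequest(BaseModel):
    documents: list[Doc]

@app.post("/api/download-zip")
def download_zip(req: ZipRequest):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc in req.documents:
            zf.writestr(doc.name, doc.content)  # one archive entry per generated document
    headers = {"Content-Disposition": "attachment; filename=context_compiler_output.zip"}
    return Response(buf.getvalue(), media_type="application/zip", headers=headers)
```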

Feature: “AI coding prompt” handoff box

  • What it does: Results UI contains a “Next: have a coding AI implement it” prompt box and a copy button. (Evidence: backend/static/index.en.html:198)
  • Why it exists: The product’s intended workflow is “generate specs → hand to coding agent”. (Evidence: backend/static/index.en.html:81, career_signaling_post.md:9)

Feature: Analytics + feedback (client-side)

  • What it does: Frontend triggers GA4 events and uses Microsoft Clarity, tracks anonymous user id + stats in localStorage, and shows thumbs feedback + NPS after multiple generations. (Evidence: backend/static/index.en.html:4, backend/static/app.js:16, backend/static/app.js:1055)
  • Constraints / privacy notes:
    • Clarity “project id” is a placeholder string in HTML; actual deployment must replace it. (Evidence: backend/static/index.en.html:18)
    • No backend consent or privacy controls are present in this repo. Unknown (not found in repo) whether deployment adds them.

4. Architecture Overview

Components diagram (text)

Browser (static HTML/CSS/JS)
  ├─ GET /, /en, /zh  → FastAPI serves HTML
  ├─ GET /static/*    → FastAPI serves JS/CSS
  └─ POST /api/generate-stream (JSON) ───────────────┐
                                                     ▼
FastAPI backend (Python)
  ├─ Builds full_prompt = ULTIMATE_PROMPT + context
  ├─ Calls BuilderSpace OpenAI-compatible API (chat.completions)
  ├─ Streams NDJSON events back to browser
  └─ (Optional) Zips documents for download
                                                     ▼
LLM Provider via BuilderSpace proxy
  └─ Returns text that includes 11 doc separators and content

Responsibilities per component

  • Frontend (backend/static/*): Collects input, initiates generation, renders stream events into 11 document cards, provides download/copy utilities, tracks analytics. (Evidence: backend/static/app.js:352, backend/static/app.js:462)
  • Backend (backend/app.py): Exposes routes, builds prompts, calls LLM, parses separators into docs, handles streaming and heartbeats, serves static assets. (Evidence: backend/app.py:28, backend/app.py:207, backend/app.py:529)
  • Prompt bundle (backend/prompt.py): Defines the “11-document contract”: names, separators, and the full instruction prompt (EN/ZH). (Evidence: backend/prompt.py:218, backend/prompt.py:493)

Key runtime assumptions

  • A valid AI_BUILDER_TOKEN is configured at runtime; otherwise generation fails (observed 401 with dummy token). (Evidence: backend/app.py:31, training_runs/2026-01-17T20-53-16Z_notes.md:58)
  • LLM outputs must include expected separators; otherwise parsing degrades to a single “full output” doc. (Evidence: backend/app.py:185, backend/prompt.py:452)

5. Data Model

This project is intentionally “stateless” server-side: there is no database layer or persistent server storage implemented in this repo. (Evidence: backend/requirements.txt:1 (no DB libs), backend/app.py:3 (no ORM/DB imports).)

API request/response models (backend)

  • GenerateRequest: { context: string, project_name?: string, model?: string, lang?: string } (Evidence: backend/app.py:50)
  • DocumentResponse: { name: string, content: string } (Evidence: backend/app.py:58)
  • GenerateResponse: { success: boolean, project_name: string, documents: DocumentResponse[], generated_at: string, raw_response?: string } (Evidence: backend/app.py:64)
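
Expressed as Pydantic models, these shapes look roughly like the following (optionality and defaults are inferred from the descriptions above, not copied from the repo):

```python
# The request/response shapes above, expressed as Pydantic models.
# Field names follow section 5; None defaults are assumptions.
from typing import List, Optional
from pydantic import BaseModel

class GenerateRequest(BaseModel):
    context: str
    project_name: Optional[str] = None
    model: Optional[str] = None
    lang: Optional[str] = None

class DocumentResponse(BaseModel):
    name: str
    content: str

class GenerateResponse(BaseModel):
    success: bool
    project_name: str
    documents: List[DocumentResponse]
    generated_at: str
    raw_response: Optional[str] = None
```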

Streaming event “schema” (backend → frontend)

NDJSON events for /api/generate-stream: (Evidence: backend/app.py:209)

  • meta: { type, project_name, document_names, generated_at }
  • doc_started: { type, doc_index }
  • chunk: { type, doc_index, delta }
  • doc_complete: { type, doc_index }
  • heartbeat: { type, elapsed_seconds, message } (Gemini and GPT timeouts)
  • done: { type }
  • error: { type, message }
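
For illustration, a small Python client could consume this stream line by line and dispatch on the `type` field. The endpoint URL and payload keys follow sections 4–5; this is a sketch, not the project's frontend logic:

```python
# Sketch of a client consuming the NDJSON stream, dispatching on event "type".
# URL and payload keys follow sections 4-5; the context string is a placeholder.
import json
import httpx

payload = {"context": "long pasted context goes here", "model": "gpt-5", "lang": "en"}

with httpx.stream("POST", "http://127.0.0.1:8000/api/generate-stream",
                  json=payload, timeout=None) as resp:
    for line in resp.iter_lines():
        if not line:
            continue  # skip any blank keep-alive lines
        event = json.loads(line)
        if event["type"] == "chunk":
            print(event["delta"], end="", flush=True)
        elif event["type"] == "heartbeat":
            print(f"[still generating after {event.get('elapsed_seconds')}s]")
        elif event["type"] == "error":
            raise RuntimeError(event.get("message"))
        elif event["type"] == "done":
            break
```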

Client-side storage (browser)

Stored in localStorage (not sent to backend by this repo):

  • abstraction_user_id (anonymous identifier) (Evidence: backend/static/app.js:18)
  • abstraction_stats (generation/download counts, first/last visit, NPS/feedback flags) (Evidence: backend/static/app.js:27)

6. AI System Design (if applicable)

This is a “prompt compiler” system, not a RAG system: it does not ingest into a knowledge base, compute embeddings, or run retrieval. All “knowledge” comes from the user-provided context payload.

Knowledge ingestion (sources, parsing, chunking)

  • Sources: pasted text + browser-read file contents merged into one string. (Evidence: backend/static/app.js:329)
  • Chunking strategy: Unknown (not found in repo). The backend sends the full context as a single user message; no chunking is implemented. (Evidence: backend/app.py:171)

Embeddings

  • Not used. (Evidence: no embedding code or deps; backend/requirements.txt contains no vector DB clients.)

Retrieval

  • Not used. (Evidence: no retrieval modules; single “prompt + context” call.)

Generation (models, prompts, grounding)

  • Model selection: passed through from the UI; defaults to gpt-5. (Evidence: backend/app.py:54, backend/static/index.en.html:135)
  • Prompting: full_prompt = ultimate_prompt + context, where ultimate_prompt includes strict separators, file names, and per-doc templates. (Evidence: backend/app.py:162, backend/prompt.py:274)
  • Output contract: Must emit 11 documents with exact separator lines and file names in order. (Evidence: backend/prompt.py:316, backend/prompt.py:452)
  • Parameters: max_tokens=32000, temperature=1.0. (Evidence: backend/app.py:176)
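
Putting those pieces together, the generation call can be sketched with the OpenAI Python client pointed at an OpenAI-compatible base URL. The proxy URL below is a placeholder; the prompt concatenation, `max_tokens`, and `temperature` values follow the evidence cited above:

```python
# Sketch of the generation call: one OpenAI-compatible chat.completions request
# carrying prompt + context. The base_url is a placeholder, not the real proxy;
# max_tokens and temperature follow the values cited in this section.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AI_BUILDER_TOKEN"],
    base_url="https://example-builderspace-proxy/v1",  # placeholder proxy URL
)

def generate(ultimate_prompt: str, context: str, model: str = "gpt-5"):
    full_prompt = ultimate_prompt + "\n\n" + context  # prompt + user context in one message
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": full_prompt}],
        max_tokens=32000,
        temperature=1.0,
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no text delta (e.g. role or finish markers); skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```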

Parsing & hallucination control

  • Primary control: enforce separators + file name order in the prompt. (Evidence: backend/prompt.py:316)
  • Parser behavior: backend searches for each separator and slices content; it tolerates minor spacing differences around "=====". (Evidence: backend/app.py:82)
  • Fallback: if parsing yields no documents, return a single “full output” doc. (Evidence: backend/app.py:185)
  • Remaining risk: If the model omits separators or generates malformed JSON for TRACE_MAP.json, the system will still display raw text but with reduced structure. Unknown (not found in repo) whether the deployment adds output validation/retries beyond what’s in this codebase.
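
A simplified sketch of this separator-splitting-plus-fallback idea (the exact separator string and the fallback document name are assumptions, not the repo's values):

```python
# Sketch of separator splitting with graceful fallback. The "===== NAME ====="
# separator format and the FULL_OUTPUT.md fallback name are assumptions.
import re

def parse_documents(raw: str, doc_names: list[str]) -> list[dict]:
    docs = []
    positions = []
    for name in doc_names:
        # Tolerate minor spacing differences around the separator markers.
        m = re.search(r"=====\s*" + re.escape(name) + r"\s*=====", raw)
        if m:
            positions.append((name, m.start(), m.end()))
    positions.sort(key=lambda p: p[1])
    for i, (name, _, body_start) in enumerate(positions):
        body_end = positions[i + 1][1] if i + 1 < len(positions) else len(raw)
        docs.append({"name": name, "content": raw[body_start:body_end].strip()})
    if not docs:
        # Fallback: keep the raw output usable as a single document instead of failing.
        docs = [{"name": "FULL_OUTPUT.md", "content": raw}]
    return docs
```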

Streaming design

  • GPT streaming: calls client.chat.completions.create(..., stream=True) and incrementally emits NDJSON events. (Evidence: backend/app.py:363, backend/app.py:207)
  • Gemini fallback: treats any model name containing "gemini" as non-streaming, but still keeps the HTTP connection alive with heartbeat events while waiting on a background thread. (Evidence: backend/app.py:234, backend/app.py:251)
  • Frontend rendering: throttles document card updates to reduce jitter (RENDER_THROTTLE_MS = 100). (Evidence: backend/static/app.js:200)
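
The Gemini path can be sketched as a background worker plus a heartbeat loop: the blocking model call runs in a thread while the generator keeps emitting events so the HTTP response never goes idle. This is a simplified illustration of the pattern described above, with event shapes borrowed from section 5, not the repo's code:

```python
# Sketch of the heartbeat pattern for a non-streaming model call: run the blocking
# request in a background thread and emit heartbeat events while waiting.
import json
import queue
import threading
import time

def stream_with_heartbeat(blocking_llm_call, interval: float = 10.0):
    q: queue.Queue = queue.Queue()

    def worker():
        try:
            q.put(("result", blocking_llm_call()))
        except Exception as exc:  # surface provider errors as an error event
            q.put(("error", str(exc)))

    threading.Thread(target=worker, daemon=True).start()
    started = time.monotonic()
    while True:
        try:
            kind, value = q.get(timeout=interval)
        except queue.Empty:
            elapsed = int(time.monotonic() - started)
            yield json.dumps({"type": "heartbeat", "elapsed_seconds": elapsed,
                              "message": "still generating"}) + "\n"
            continue
        if kind == "error":
            yield json.dumps({"type": "error", "message": value}) + "\n"
        else:
            # Full response arrives at once; in the real flow it would then be split into docs.
            yield json.dumps({"type": "chunk", "doc_index": 0, "delta": value}) + "\n"
            yield json.dumps({"type": "done"}) + "\n"
        return
```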

Evaluation

  • Unknown (not found in repo). No eval scripts, golden tests, or scoring harnesses are present. Suggested next step: add a small regression suite with fixed inputs and snapshot outputs (redacted) to detect separator drift and doc completeness.

7. Reliability, Security, and Privacy

Threat model (what can go wrong)

  • Cost abuse: Public /api/generate* endpoints can be spammed to run up LLM costs if deployed without auth/rate limiting. (Evidence: endpoints at backend/app.py:151, backend/app.py:207, CORS * at backend/app.py:41)
  • Prompt injection / separator breaking: User context can instruct the model to ignore separators, producing unparseable output. Parser has limited tolerance but no enforcement. (Evidence: backend/prompt.py:452, backend/app.py:78)
  • Privacy leakage: User context is transmitted to an external LLM endpoint; retention policy is unknown. (Evidence: backend/app.py:171; Unknown (not found in repo): privacy/retention policy docs.)

Authn/authz

  • Backend auth: None. No sessions, cookies, or auth middleware found. (Evidence: backend/app.py:15, backend/requirements.txt:1)
  • Frontend “identity”: anonymous localStorage ID for analytics only. (Evidence: backend/static/app.js:16)

CSRF/CORS/rate limiting

  • CORS: allow_origins=["*"] and allows credentials/methods/headers broadly. (Evidence: backend/app.py:41)
  • CSRF/rate limiting: Unknown (not found in repo). No CSRF tokens or rate limiting middleware present.

Secret handling

  • Backend reads AI_BUILDER_TOKEN from environment and calls load_dotenv() at import time. (Evidence: backend/app.py:25, backend/app.py:31)
  • .gitignore lists .env, but this repo tree contains .env and backend/.env; whether they contain real credentials is unknown (values not inspected here). (Evidence: .gitignore:1, training_runs/2026-01-17T20-53-16Z_notes.md:43)
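
A minimal sketch of this token-loading flow; the fail-fast check at the end is a suggestion rather than observed repo behavior:

```python
# Sketch: load .env at import time and read AI_BUILDER_TOKEN from the environment.
# The startup check is a suggestion, not repo code.
import os
from dotenv import load_dotenv

load_dotenv()  # pulls AI_BUILDER_TOKEN from a local .env file if present

AI_BUILDER_TOKEN = os.getenv("AI_BUILDER_TOKEN")
if not AI_BUILDER_TOKEN:
    # Without a valid token every generation request fails (observed as a 401),
    # so failing at startup gives a clearer signal than a runtime error.
    raise RuntimeError("AI_BUILDER_TOKEN is not set; generation requests will return 401")
```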

Data retention & redaction

  • Unknown (not found in repo). The system does not implement redaction before sending context to the model.

8. Performance & Cost

Latency drivers

  • Dominated by LLM response time + output size (up to max_tokens=32000). (Evidence: backend/app.py:176)
  • Long contexts increase generation time; UI explicitly warns “Usually 1–5 minutes” (Evidence: backend/static/index.en.html:153)

What is optimized (and why)

  • Streaming UX: NDJSON streaming provides progressive output to reduce perceived latency. (Evidence: backend/app.py:209, backend/static/app.js:352)
  • Timeout resilience: heartbeat events are sent when no chunks arrive to prevent connection idle timeouts. (Evidence: backend/app.py:456, backend/static/app.js:555)
  • Frontend jitter reduction: throttled re-rendering limits DOM churn under high-frequency streaming updates. (Evidence: backend/static/app.js:200)

Cost drivers

  • Each generation performs (at least) one LLM call with a large prompt and large output. (Evidence: backend/app.py:171)
  • Build-time/dev cost claim: the repo narrative claims ~$20 for development and iteration (not runtime). (Evidence: career_signaling_post.md:42)

What to measure next (explicit metrics)

  • End-to-end generation time distribution by model (p50/p95).
  • Token usage and cost per request (prompt tokens vs completion tokens).
  • Parse success rate: percentage of runs producing all 11 docs cleanly vs fallback.
  • Error rate and top error causes (401, timeouts, malformed separators).
  • Client metrics: time-to-first-doc, time-to-all-docs, abandon rate. (Unknown (not found in repo): backend metrics collection/instrumentation.)

9. Hardest Problems + Key Tradeoffs

  1. Single-call 11-doc output vs 11 separate calls

    • Chosen: single call with strict separators for simplicity and coherence. (Evidence: backend/prompt.py:316, backend/app.py:171)
    • Tradeoff: one failure or separator drift can degrade the entire output; no per-doc retries.
  2. Streaming parsing on the backend vs “wait then parse”

    • Chosen: backend streams and splits docs during generation for better UX. (Evidence: backend/app.py:207)
    • Tradeoff: complicated buffer/separator logic; harder to test; relies on exact separators.
  3. Gemini “streaming” fallback vs uniform streaming

    • Chosen: detect gemini and use heartbeat + full response parsing to preserve UI behavior. (Evidence: backend/app.py:234, backend/app.py:251)
    • Tradeoff: no token-level streaming for Gemini; doc content appears in larger chunks.
  4. Frictionless public demo vs securing the API

    • Chosen (current): no auth, permissive CORS, simple endpoints. (Evidence: backend/app.py:41)
    • Tradeoff: abuse risk and unclear privacy posture for production use.
  5. Client-side file reads vs server-side uploads

    • Chosen (current UI): read files in the browser and merge into a single payload. (Evidence: backend/static/app.js:329)
    • Tradeoff: browser memory limits; no server-side file-type validation; but simpler backend.
    • Note: Backend still includes /api/generate-from-file, suggesting an alternate design path. (Evidence: backend/app.py:490)
  6. High temperature (1.0) vs strict determinism

    • Chosen: temperature=1.0, likely to produce fuller prose and explanations. (Evidence: backend/app.py:177)
    • Tradeoff: more variance; increased risk of format drift; would benefit from stronger validation and retries.

10. Operational Guide (Repro & Deploy)

Local run steps

  1. Ensure Python is available (repo includes a venv at backend/venv/, but recreating is safer if it’s stale).
  2. Set required env var (name only): AI_BUILDER_TOKEN. (Evidence: backend/app.py:31)
  3. Run the backend from backend/:
    • uvicorn app:app --host 127.0.0.1 --port 8000
  4. Open http://127.0.0.1:8000/ (English) or http://127.0.0.1:8000/zh (Chinese). (Evidence: backend/app.py:114, backend/app.py:136)

Required env vars (names only)

  • AI_BUILDER_TOKEN (Evidence: backend/app.py:31)

Ports used

  • Default: 8000 (Docker + local examples). (Evidence: Dockerfile:18, deploy-config.json:5)
  • In container/platform: $PORT is respected. (Evidence: Dockerfile:22)

How to deploy

  • Container build/run is defined by Dockerfile. (Evidence: Dockerfile:1)
    • Build: docker build -t abstractai .
    • Run: docker run -e AI_BUILDER_TOKEN=... -p 8000:8000 abstractai
  • deploy-config.json suggests a deployment target named abstractai on branch main with port 8000; additional platform details are unknown. (Evidence: deploy-config.json:3)

How to debug common failures

  • 401 Invalid credentials: AI_BUILDER_TOKEN is missing or invalid. (Observed locally with dummy token.) (Evidence: training_runs/2026-01-17T20-53-16Z_notes.md:58)
  • Long-running requests timing out: rely on heartbeat events; if still timing out, increase server/proxy idle timeouts or move to WebSocket/SSE. (Evidence: backend/app.py:456, backend/static/app.js:555)
  • Docs not splitting into 11 cards: model likely broke separators; check raw output and tighten prompt / lower temperature / add validation + retry. (Evidence: backend/prompt.py:452, backend/app.py:185)

11. Evidence Map (repo anchors)

| Claim | Evidence (repo anchors) |
| --- | --- |
| App purpose: compile long conversations into executable product docs | backend/app.py:34, backend/static/index.en.html:36, career_signaling_post.md:21 |
| Uses FastAPI backend | backend/app.py:15, backend/app.py:34 |
| Calls an OpenAI-compatible API via BuilderSpace | backend/app.py:28 |
| Uses env var AI_BUILDER_TOKEN | backend/app.py:31, training_runs/2026-01-17T20-53-16Z_notes.md:43 |
| Serves English by default, supports /zh | backend/app.py:114, backend/app.py:136 |
| Health endpoint exists | backend/app.py:145, training_runs/2026-01-17T20-53-16Z_notes.md:53 |
| Generation endpoint (non-streaming) exists | backend/app.py:151 |
| Streaming endpoint emits NDJSON event types | backend/app.py:209, backend/static/app.js:462 |
| Model defaults to gpt-5 | backend/app.py:54, backend/static/index.en.html:135 |
| Gemini uses a non-streaming fallback + heartbeat | backend/app.py:234, backend/app.py:251, backend/static/app.js:555 |
| Prompt bundles define 11 fixed doc names and separators | backend/prompt.py:218, backend/prompt.py:258, backend/prompt.py:493 |
| Parser tolerates minor separator formatting | backend/app.py:82 |
| ZIP download endpoint exists | backend/app.py:514 |
| Frontend reads multiple files and merges into context | backend/static/app.js:271, backend/static/app.js:329 |
| Frontend calls /api/generate-stream | backend/static/app.js:400 |
| Frontend tracks analytics + NPS/feedback | backend/static/index.en.html:4, backend/static/app.js:8, backend/static/app.js:1055 |
| Docker deployment exists and respects $PORT | Dockerfile:18, Dockerfile:22 |
| Deploy config points to GitHub repo and service name | deploy-config.json:2, deploy-config.json:3 |
| Repo narrative claims ~$20 dev cost and mentions GPT-5-pro for prompt authoring | career_signaling_post.md:42 |

12. Interview Question Bank + Answer Outlines

System design

Q1: How would you design an “AI spec compiler” with good UX for long outputs?

  • Stream output to the browser as structured events (meta/progress/chunks), not just a final blob.
  • Pick a stable output contract (e.g., separators + fixed doc order) to render partial results incrementally.
  • Implement connection keep-alives (heartbeat) to survive long model latency and proxy timeouts.
  • Throttle frontend rendering to avoid DOM jitter under high-frequency updates.
  • Add fallbacks: if parsing fails, still return a usable “full output”.
  • Evidence: backend/app.py:207, backend/static/app.js:352, backend/static/app.js:200, backend/app.py:185

Q2: Why NDJSON over WebSockets or SSE?

  • NDJSON works over plain HTTP responses and is easy to parse line-by-line in the browser stream reader.
  • Server and client can evolve event types without breaking a strict SSE format.
  • A migration path to SSE/WebSockets exists if you need better proxy compatibility and bidirectional control.
  • Evidence: backend/app.py:209, backend/static/app.js:422

Q3: What would you change to make this production-safe?

  • Add authentication or at least abuse controls (rate limiting, quotas, captcha).
  • Restrict CORS origins and disable credentials unless needed.
  • Add logging + metrics around generation and parse success.
  • Add request size limits and file type controls.
  • Add privacy policy, retention behavior, and redaction options.
  • Evidence: backend/app.py:41, backend/app.py:151, backend/static/index.en.html:4 (analytics present but no consent controls)

Q4: How do you deploy and operate it?

  • Containerize backend with Docker; serve static assets from the same service.
  • Use $PORT for platform-provided ports.
  • Provide /health for readiness checks.
  • Evidence: Dockerfile:1, Dockerfile:22, backend/app.py:145

AI/RAG

Q1: Is this a RAG system? If not, what are the tradeoffs?

  • It’s prompt-based compilation: all knowledge comes from the user’s provided context.
  • Benefit: simpler architecture, no indexing pipelines, fewer moving parts.
  • Tradeoff: long contexts increase latency/cost; no retrieval means no grounding to external trusted sources.
  • Upgrade path: add optional retrieval over a user-provided knowledge base for consistent facts.
  • Evidence: backend/app.py:171, backend/prompt.py:274

Q2: How do you control hallucinations and keep output structured?

  • Encode a strict output contract: fixed separators + file names + per-doc templates.
  • Add “no hallucination” instructions and direct unknowns into Open Questions.
  • Parse output and degrade gracefully when the contract isn’t met.
  • Add validators + retry loops for malformed separators/JSON.
  • Evidence: backend/prompt.py:316, backend/prompt.py:289, backend/app.py:78

Q3: How do you handle streaming differences across model providers?

  • Detect capability gaps (Gemini streaming unsupported in this proxy) and fall back to non-streaming mode.
  • Keep the client contract stable by still emitting progress/heartbeat events.
  • Evidence: backend/app.py:234, backend/app.py:251, backend/static/app.js:555

Debugging & reliability

Q1: A user reports the UI hangs during generation. How do you debug?

  • Check if the backend is emitting heartbeats or chunks; confirm proxy idle timeouts.
  • Verify frontend stream reader loop is still receiving data and parsing JSON lines.
  • Confirm the model call is still in progress; check server logs for timeouts.
  • Add instrumentation for “time since last chunk” and request lifecycle.
  • Evidence: backend/static/app.js:426, backend/app.py:456

Q2: The 11 documents don’t split correctly—everything ends up in one doc. What happened?

  • The model likely did not output the exact separators.
  • Verify the prompt bundle and separators; check for language mismatch or formatting drift.
  • Add stricter prompting, reduce temperature, and validate separators early with retries.
  • Evidence: backend/prompt.py:316, backend/app.py:82, backend/app.py:185

Q3: How do you prevent HTTP/2 timeouts on long generations?

  • Emit heartbeat events on the same response stream when no chunks arrive.
  • Use background threads + queues so the generator can keep writing while the model call blocks.
  • Evidence: backend/app.py:251, backend/app.py:456

Product sense

Q1: What user pain does this solve, and how do you measure success?

  • Pain: long AI chat contexts cause drift; jumping from idea → code skips PRD/system design layers.
  • Product: adds a “doc compilation” stage producing a spec pack that both humans and coding AIs can use.
  • Success metrics (suggested): reduced iteration cycles, higher first-pass implementation accuracy, parse success rate, completion time, user satisfaction (NPS).
  • Evidence: career_signaling_post.md:13, career_signaling_post.md:15, backend/static/index.en.html:36

Q2: Why 11 docs—why not fewer?

  • The prompt explicitly defines 11 audience-specific artifacts (PM/QA/engineer/execution) + evidence pack (decisions/edge cases/quotes/trace map).
  • This reduces ambiguity by forcing structure and traceability.
  • Evidence: backend/prompt.py:316, backend/prompt.py:343

Q3: What’s the biggest “product risk” here?

  • Users may paste sensitive data; without redaction and clear policy, this creates trust issues.
  • Unauthenticated public endpoint can be abused, impacting cost and reliability.
  • Output may still be wrong; without evals, quality can regress silently.
  • Evidence: backend/app.py:41, backend/app.py:171; Unknown (not found in repo): privacy docs and eval harness.

Behavioral

Q1: Tell me about a time you made a system more reliable without changing the core idea.

  • Identified that long model latency + proxy idle timeouts caused failures.
  • Added heartbeat events to keep connections alive, plus a background thread so the stream stays responsive.
  • Added a Gemini fallback when streaming was unsupported.
  • Evidence: backend/app.py:251, backend/app.py:234

Q2: How did you manage ambiguity in requirements?

  • Translated ambiguity into a structured “spec pack” format with explicit rules (MUST/SHOULD/MAY) and Open Questions.
  • The product itself encodes an anti-ambiguity workflow: compile context → implement.
  • Evidence: backend/prompt.py:295, backend/static/index.en.html:81

Q3: How do you collaborate with AI tools effectively?

  • Define contracts (separators, doc templates) so the model’s output is machine-parseable and reviewer-friendly.
  • Create a handoff prompt designed for coding agents, embedded in the UI.
  • Evidence: backend/prompt.py:316, backend/static/index.en.html:198

13. Roadmap (high-leverage upgrades)

Must

  1. Add abuse protection: rate limiting + quotas + basic auth or signed tokens for generation endpoints.
  2. Restrict CORS to known origins; disable allow_credentials unless required.
  3. Add server-side observability: structured logs + request IDs + metrics (latency, tokens, error rates, parse success).
  4. Add output validators (separator presence, JSON validity for trace map) + automatic retries with lower temperature.
  5. Add request size limits and better file handling (MIME allowlist; server-side uploads if needed).
  6. Add a clear privacy notice + redaction mode (e.g., emails/phone numbers) before sending to LLM.
  7. Add regression tests for parsing and streaming event sequencing.
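
As a concrete starting point for item 7, a regression test could pin the parser's behavior with pytest; `parse_documents` is a hypothetical helper like the sketch in section 6, and the document names are placeholders:

```python
# Sketch of a parsing regression test (Must item 7). parse_documents is a
# hypothetical helper like the one sketched in section 6.
from backend.app import parse_documents  # hypothetical import path; adjust to the real module

DOC_NAMES = ["DOC_A.md", "DOC_B.md"]  # placeholders; the real bundle defines 11 names

def make_output(names):
    return "\n".join(f"===== {n} =====\ncontent for {n}" for n in names)

def test_all_docs_split_cleanly():
    docs = parse_documents(make_output(DOC_NAMES), DOC_NAMES)
    assert [d["name"] for d in docs] == DOC_NAMES

def test_missing_separators_fall_back_to_single_doc():
    docs = parse_documents("free-form text without separators", DOC_NAMES)
    assert len(docs) == 1  # graceful degradation instead of an empty result
```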

Nice-to-have

  1. Switch to SSE for more compatible streaming semantics (still line-based events).
  2. Let users choose doc count/templates (custom bundles).
  3. Add “resume generation” and partial retries per doc if a separator is missing.
  4. Add “export to repo” integration (e.g., GitHub PR creation) — with explicit user consent.
  5. Add optional RAG over user-provided attachments for consistent facts and cross-references.
  6. Add multi-tenant usage dashboard (for internal ops) with cost attribution per workspace.