January 25, 2026

Programming as Training: Turning Uncertain AI Coding into a Smooth Loss Curve

AI Coding · Vibe Coding

Thanks to the 课代表立正 community, my last project, Abstraction AI, helped bridge the gap between long, messy requirements and a complete, elegant product, and it received great feedback. Since then, I have gradually refined a method I had been thinking about for a long time: directing AI programming with the mindset of training deep learning models. Abstraction AI makes a coding agent’s objective clearer, but not by hardening it into a rigid spec: it does not force the AI to ignore the subtle details of our original intent. It helps the AI build what we actually want, while preserving both human and AI creativity.

Chapter 0: Deep Learning

Over the past year, one idea has become increasingly clear in my mind: good AI coding equals training a model.

AI cannot originate an idea. It can output all the answers, but it still cannot truly take the first step of “what do I want to build?” So the essence of AI coding is this: a creative human proposes an idea that is not fully fixed or perfectly structured, and the AI writes code step by step, gradually converging toward the “Idea” in the human’s head.

When I think about goals, iteration, and optimization, I realize it looks more and more like optimizing a deep learning model: define the ground truth, compute the loss, backpropagate to get gradients, and then take a gradient descent step.

The key question is: what is the model?

The model is the pile of code written by the AI. Humans invented deep neural networks, and they gradually became black boxes where people only care about input and output. AI-written code is also gradually becoming a black box where people only care whether the input and output are correct.

Ground truth is the target we give the AI. Whether it is a spec so detailed that the AI can execute it mechanically, or a vague and messy idea, it is still the shape we hope the final code will take.

Computing loss is measuring the gap between the code and the target. The AI measures it by running tests. Humans sense it by observing the runtime behavior and outcomes.

Backpropagation is the AI deciding, based on the loss, how each detail of the code should be modified. If humans must be involved, how should we participate in this process? I previously wrote an article, From write-then-review to review-then-write: a programming method for better human and AI collaboration. That piece is about exactly this: before the AI starts changing code, have humans review and correct the AI’s modification plan.

Gradient descent is simple: execute all modifications.
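
If I spell the analogy out as pseudocode, the loop looks roughly like the sketch below. Every name in it is a placeholder for one of the roles described above, not a real tool or API.

```python
# A minimal sketch of the analogy. Every function passed in here is a placeholder
# for one of the roles described above, not a real API.

def train_codebase(idea, codebase, compute_loss, propose_plan, review_plan,
                   apply_plan, max_epochs=10):
    """One 'epoch' = measure the gap, plan the edits, review them, apply them."""
    ground_truth = idea                        # spec, design docs, or a rough description
    for epoch in range(max_epochs):
        loss = compute_loss(codebase, ground_truth)   # run tests, observe behavior
        if loss == 0:
            break                              # the code finally matches the intent
        plan = propose_plan(codebase, loss)    # "backpropagation": decide what to change
        plan = review_plan(plan)               # optional human step: review-then-write
        codebase = apply_plan(codebase, plan)  # "gradient descent": execute the edits
    return codebase
```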

Why make this analogy? For me, it is to let AI coding, which seems like inference rather than training, still enjoy the benefits of training: driving the AI straight toward the goal, converging smoothly, quickly, and elegantly, producing a clean, steadily decreasing loss curve with minimal spikes and detours.

First Method: RLVR, Verifiable Rewards

1. Building a project from scratch

In Abstraction AI, I turn a fuzzy target into a clear design document, then let the AI implement step by step against that design. This makes the loss curve much steeper, meaning fewer iterations, which saves money, time, and energy. I do not need to fight and twist the model into shape. The core instruction is this:

Implement everything we want according to these documents. After implementation, do deep research across all documents and the code, run all necessary tests, and if anything is missing or unsatisfactory, modify the code yourself. Repeat for multiple rounds until all requirements are satisfied.

The AI is effectively hand-training a model called “Full-Stack Web App,” using the structured documents as verifiable rewards: RLVR (Reinforcement Learning with Verifiable Rewards). Iteration after iteration, epoch after epoch, I quickly get what I want.
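
Concretely, “verifiable rewards” here means that each requirement in the design documents maps to something the AI can check mechanically, typically a test, and the reward is simply how many of those checks pass. A rough sketch, where the mapping from documents to test commands is my assumption for illustration, not Abstraction AI’s actual format:

```python
# Sketch: documented requirements as verifiable rewards. Mapping design docs to
# test commands is assumed for illustration, not Abstraction AI's actual format.

import subprocess

def check_passes(test_cmd: list[str]) -> bool:
    """Run one acceptance test; the exit code is the verifiable signal."""
    return subprocess.run(test_cmd, capture_output=True).returncode == 0

def reward(acceptance_tests: list[list[str]]) -> float:
    """Reward = fraction of documented requirements the current code satisfies."""
    if not acceptance_tests:
        return 0.0
    return sum(check_passes(cmd) for cmd in acceptance_tests) / len(acceptance_tests)

# The agent keeps running another round ("epoch") until reward(...) reaches 1.0.
```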

2. Improving an existing project

After I got ChatGPT Pro, I started working with GPT-5.2-extrahigh inside Codex (not a Codex model), a partner that is free, smart enough, but not easy to tame.

While adapting to a new partner, my project also entered a deeper phase. The code was already built in the previous stage, but I kept generating new ideas and requirements. This is painful in Spec-Driven Development (SDD, writing a specification first, then building according to the spec), because it is hard to keep documentation synchronized with code. The ground truth keeps moving. This also resembles where LLM progress is today: benchmarks become less effective, and we do not clearly know what we want anymore.

Three days ago, I was pleasantly surprised to find Codex working almost 24 hours straight. Then I discovered it had partly lost its way: I had given it many requirements, and it kept repeating the same few tasks over and over. This might have been caused by one of my instructions mixing in the “iterate epoch by epoch” mindset.

But I tamed it again. The method is simple: after giving a batch of requirements, have it list these requirements in a new file. Each time it completes one, it checks it off. No repeats. Now it sometimes runs for 24 hours and delivers one piece after another that I want.
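
For reference, the requirements file is nothing fancy. It looks roughly like this, with the items invented here purely for illustration:

```
# requirements-batch-03.md  (hypothetical contents, invented for illustration)

- [x] Add pagination to the articles list API
- [x] Cache embedding lookups for article search
- [ ] Move the admin auth flow to the new session store
- [ ] Add integration tests for the chat endpoint

Rules: work through unchecked items from top to bottom, check each one off when
it is done, and never revisit an item that is already checked.
```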

In fact, this is a built-in feature of tools like Augment Code, Cursor, and Codex: make a plan, then execute it step by step. Implementing new requirements from a spec is not new either; Kiro works in a similar way. But when I am the one who knows exactly what I want and I state it explicitly, it works better than letting the model plan by itself.

Still, Codex is slow.

Second Method: RLAIF, Let the AI Carry the Pain of Uncertainty

The discussion above is more about building CRUD software. In that world, everything eventually becomes certain, so with the right method you can always get a smooth loss curve.

But what if the software itself contains high uncertainty, where parts are probabilistic and sampled? That is what I want to discuss next: how to make AI write an AI agent, or at least a system where one step involves an AI component, such as a RAG system. Let the AI write AI, and let it carry almost all the work, without humans repeatedly suffering through “this is not good enough, I want X, change it.”

If humans have to do that constantly, it feels like Captain America holding a helicopter, being torn back and forth by the gap between ideal and reality. It hurts.

How do we let the AI carry that pain?

My approach is: write down as much as possible about my taste, preferences, and expectations for the agent, and let the AI align its taste with mine. First it implements the system and understands what kind of results I want. Then it generates all kinds of inputs on its own, observes the outputs, checks the gap between outputs and my preferences, and optimizes. It repeats this for multiple rounds until it truly meets the need. Iteration, optimization, training, call it whatever you want. It is like RLAIF (Reinforcement Learning from AI Feedback), using AI feedback as the driver.
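
In loop form, it is something like the sketch below. `call_llm` stands in for whatever model API is available, and the whole thing is a shape I have in mind, not a finished framework.

```python
# Sketch of the RLAIF-style loop: the AI probes its own agent, judges the outputs
# against my written-down preferences, and collects feedback to optimize against.
# `call_llm` is a placeholder for whatever model API is available, not a real SDK.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in whatever model API you use")

def rlaif_round(agent_under_test, preferences: str, n_probes: int = 20) -> list[str]:
    """One round: generate inputs, run the agent, judge the gap, return the feedback.

    `preferences` is the taste/expectations document, written down ahead of time.
    """
    feedback = []
    for _ in range(n_probes):
        probe = call_llm("Invent a realistic, tricky input for this agent:\n" + preferences)
        output = agent_under_test(probe)
        verdict = call_llm(
            "Preferences:\n" + preferences
            + "\n\nInput:\n" + probe
            + "\n\nOutput:\n" + output
            + "\n\nDoes the output match the preferences? If not, say exactly what to change."
        )
        feedback.append(verdict)
    return feedback  # the coding agent then modifies the system based on this feedback
```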

Final Thoughts

I have not built a dedicated product for this yet. I only want to state the future I want to see:

A Lovable that can build Lovable by itself, and a Lovable that can build Claude Code by itself.

Make the loss curve smooth. Let AI do more. Let humans only do what AI genuinely cannot do. If AI can do it, let AI do it.

Thank you for reading all the way to the end.