
Behavioral Economics · April 2026

The Confirmation Bias That's Keeping Your Side Project Almost Done

Last week you had four coding sessions. The codebase is cleaner than it was. You solved two problems that had been blocking you. And you are still not deployed.

This is not a motivation problem. It is a measurement problem.

You are measuring the wrong thing — and because you are measuring it consistently, it keeps reporting that everything is fine. The project is progressing. The evidence is right there in your commit history.

What you are not measuring is the one number that matters.


The Metrics That Feel Like Progress

Developers default to activity metrics because they are visible and immediate. VS Code sessions. Commits pushed. Lines written, lines deleted, lines rewritten. The codebase is larger. The architecture is cleaner. The tests pass.

All of this generates a genuine sense of forward movement. It is not irrational. These things correlate with progress in professional software work, where shipping is structurally enforced by other people's deadlines.

Solo side projects do not have that enforcement layer. Which means the correlation breaks — and the metrics decouple from the actual outcome.

The one metric that does not decouple: Is this project closer to a real user than it was seven days ago?

Binary. No reframing available. A cleaner architecture that is still not deployed has not moved the number. Four sessions of productive-feeling work that ended with zero users have not moved the number. The metric does not care about effort.

Most developers are not measuring this. They are measuring activity — and confirmation bias ensures they keep finding what they are looking for.
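To make the decoupling concrete, here is a minimal sketch in Python that computes both numbers side by side. The repo path and live URL are placeholders, and the HTTP check is only a proxy for "reachable by a real user"; the point is the shape of the two answers, not the script itself.

```python
# Minimal sketch: activity metric vs. outcome metric for a side project.
# REPO and LIVE_URL are placeholders; swap in your own project.
import subprocess
import urllib.request

REPO = "."                         # path to your side project's git repo
LIVE_URL = "https://example.com"   # where the deployed app would live

# Activity metric: commits in the last 7 days. Always returns a number,
# and a number that tends to grow whenever you feel productive.
commits = subprocess.run(
    ["git", "-C", REPO, "rev-list", "--count", "--since=7.days", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Outcome metric: does a real URL answer? Binary, no reframing available.
try:
    deployed = urllib.request.urlopen(LIVE_URL, timeout=5).status == 200
except Exception:
    deployed = False

print(f"Commits this week: {commits}")       # feels like progress
print(f"Closer to a real user: {deployed}")  # is progress
```

The first print statement can only ever report more activity. The second is the only line that can say no.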


What Behavioral Economics Says

Dr. Anne Greul — Behavioral Economics Lecturer at UC Berkeley and founder of a Legal AI startup — studies how cognitive bias intersects with AI-assisted decision making. Her framing for this failure mode is precise:

"Wenn ich eine Annahme habe, auf der ich mein Unternehmen gründe, und dann die Metriken vor allem beobachte, die erfolgversprechend sind — Nutzerzahlen oder Retention außer Acht lasse, weil sie eine andere Geschichte zeichnen — das ist das Risiko, wo der Bias wirklich gefährlich wird."

— Dr. Anne Greul, Behavioral Economics Lecturer, UC Berkeley · founder, Legal AI startup

"If I have an assumption my company is built on, and I mainly observe the metrics that look promising — ignoring user numbers or retention because they tell a different story — that is where the bias gets genuinely dangerous."

Greul is describing founders measuring vanity metrics to protect a core assumption. But the structure is identical for a developer who has been "almost done" for three months.

The core assumption being protected is: I am the kind of person who will ship this.

Commit velocity supports that assumption. Deployed-and-in-front-of-a-real-user tests it. Confirmation bias drives you toward the metric that does not test the assumption — and away from the one that does.

Greul's observation about founders is also relevant here. She notes that the dangerous version of this bias is not a single bad decision — it is a sustained pattern of selective attention that accumulates over time without anyone noticing, including the person doing it.

For a side project with no external observers, that accumulation has no natural interruption. Six months of productive-feeling sessions with no deployment is entirely consistent with a developer who believes, genuinely, that they are making progress.


The Blackbox Parallel

Greul has a second frame that applies here, drawn from her work on Legal AI systems. She describes the failure condition as a blackbox:

"Keine Blackbox, bei der ein Ergebnis rauskommt, ohne dass der Nutzer versteht WIE es dazu kam, welche Daten verarbeitet wurden. Nur dann kann der Nutzer einwirken, erkennen dass es falsch ist, korrigieren."

— Dr. Anne Greul, Behavioral Economics Lecturer, UC Berkeley · founder, Legal AI startup

"Not a blackbox where a result comes out without the user understanding HOW it came about, which data was processed. Only then can the user intervene, recognize that it is wrong, correct it."

She was describing AI systems in legal contexts. The concern is that a user who cannot see the reasoning process cannot identify the error.

The same structure applies to a solo side project sprint.

The developer who works alone for three months accumulates their own internal model of progress. That model is not a blackbox in the technical sense — it just has no external observer who can read it and say: you have been marking this task as "in progress" for six weeks.

A milestone review does what Greul requires of a non-blackbox system: it makes the reasoning visible. Not to an AI, but to a human who can evaluate specific context — what you committed to, what you built, whether those two things match.

This is not about accountability theater. It is about creating the conditions under which a developer can recognize and correct their own errors — rather than confirming them indefinitely.


An Independent Confirmation: The 90/10 Model

Greul's framework is not an isolated observation. Security researcher Dr. Karsten Nohl arrived at structurally the same conclusion from a different domain entirely.

Nohl, known for exposing critical vulnerabilities in mobile network protocols and SIM card cryptography, described the failure mode of enterprise AI deployments:

"Ein realistisches Ziel darf nicht sein, Menschen zu 100% zu ersetzen. Sondern 90% der Arbeit die Maschine machen lassen — und an allen wichtigen Entscheidungsstellen trotzdem noch Human in the Loop."

— Dr. Karsten Nohl, security researcher

"A realistic goal must not be to replace humans 100%. Instead, let the machine do 90% of the work — and still keep a human at every important decision point."

Greul also makes the responsibility point explicit, and it reinforces Nohl's framework from a different angle:

"100% Compliance gibt es nicht. Es braucht immer noch Personen im Unternehmen, die Experten sind und diese Verantwortung übernehmen."

— Dr. Anne Greul, Behavioral Economics Lecturer, UC Berkeley · founder, Legal AI startup

"There is no such thing as 100% compliance. You always need people who are experts and take responsibility."

Four independent domains — METR's productivity research, Kahneman's planning fallacy framework, Nohl's enterprise AI security model, and Greul's behavioral economics work — converge on the same architecture: structured human checkpoints at defined intervals in an otherwise automated or self-directed process. Not as a compromise between human and machine capability. As the architecture that makes the system work.

For a solo side project, this means: daily work can be self-directed. The milestone review — whether the actual output matches what you committed to — requires an external human who can evaluate specific context, not pattern-match against generic progress criteria.
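As a rough sketch of that architecture, here is the 90/10 split reduced to two functions. The names and the Next.js example are hypothetical, an illustration of the structure rather than MVP Builder's actual implementation: automation generates the routine daily prompts, while the milestone gate refuses to advance the sprint without a human's binary verdict.

```python
# Illustrative sketch of the 90/10 split: automation handles routine work,
# a human verdict is required at fixed milestone days. Names are hypothetical.

MILESTONE_DAYS = (13, 21, 30)

def automated_daily_prompt(day: int, stack: str) -> str:
    # The 90%: generated prompts, reminders, frameworks. No judgment involved.
    return f"Day {day}: next concrete step for your {stack} project"

def milestone_gate(day: int, human_verdict_deployed: bool) -> bool:
    # The 10%: at a milestone, only a human's binary verdict advances the sprint.
    if day not in MILESTONE_DAYS:
        return True  # non-milestone days stay self-directed
    return human_verdict_deployed  # milestone days block on the human

# Usage: the sprint advances past Day 13 only if the reviewer saw a live URL.
print(automated_daily_prompt(5, "Next.js"))
print("Advance past Day 13:", milestone_gate(13, human_verdict_deployed=False))
```

The design choice that matters is the gate's signature: it takes a verdict, not a metric, so there is nothing at the decision point to reframe.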


The Metric That Cuts Through

Greul's hardest observation to sit with is the one about unlearning:

"Die größte Herausforderung ist immer wieder to unlearn — also immer wieder zu vergessen, was man eigentlich gerade dachte zu wissen, und sich neu drauf einzulassen."

— Dr. Anne Greul, Behavioral Economics Lecturer, UC Berkeley · founder, Legal AI startup

"The biggest challenge is always to unlearn — to forget what you thought you knew and approach it fresh."

Applied here: the thing that needs to be unlearned is the assumption that activity equals progress.

This is not a character failure. The METR study found that even experienced professional developers, working in structured environments with external accountability, were measurably slower when AI tools entered their workflow. Kahneman established that the planning fallacy is universal, not a symptom of poor judgment. The cognitive patterns that keep side projects unfinished are not unusual — they are the default.

The structural fix is not to work harder or care more. It is to replace the metric.

Is this project closer to a real user than it was seven days ago?

If the answer is no, the question worth asking is not "why didn't I work more" but "what specifically stands between me and a deployed URL."

That question has a concrete answer. And a concrete answer can be addressed in a structured sprint.

MVP Builder is a structured 30-day sprint that enforces the real metric. You apply with your project description, receive daily prompts tailored to your stack and phase, and complete milestone checkpoints at Day 13, 21, or 30 — reviewed by a human before you advance. The review is binary: deployed or not deployed. There is no metric to reframe.

Cohort #1 is free. 8 spots. No credit card.

Apply to Cohort #1 →

Frequently Asked Questions

What is confirmation bias in software development?

Confirmation bias in software development is the tendency to monitor metrics that confirm the project is progressing — commit count, lines of code, VS Code session time — while discounting metrics that tell a different story, such as whether any real user has touched the product. Dr. Anne Greul, Behavioral Economics Lecturer at UC Berkeley and founder of a Legal AI startup, describes the pattern: "If I have an assumption my company is built on, and I mainly observe the metrics that look promising — ignoring user numbers or retention because they tell a different story — that's where the bias gets genuinely dangerous." Applied to side projects: activity feels like progress because we are primed to notice evidence that we are working.

Why do developers not finish side projects?

The most durable explanation is structural rather than motivational. Developers with full-time jobs have genuine time constraints, but the primary failure mode is the absence of external checkpoints. Professional software work has forcing functions — deadlines that affect others, standups, colleagues who notice absence. Solo side projects have none of these. A secondary factor is that AI tools have made planning effortless, which increases the temptation to re-plan instead of committing to execution. The METR study (July 2025, arxiv.org/abs/2507.09089) found that experienced developers using AI tools were 19% slower on complex software tasks — a finding consistent with this re-planning dynamic.

Who is Dr. Anne Greul and what does she research?

Dr. Anne Greul is a Behavioral Economics Lecturer at UC Berkeley and the founder of a Legal AI startup. Her work sits at the intersection of cognitive bias, decision-making under uncertainty, and the practical deployment of AI systems in high-stakes environments. Her research addresses how human judgment errors interact with AI-generated outputs — specifically the conditions under which humans can meaningfully correct AI errors versus conditions where bias prevents them from recognizing errors at all.

What is the blackbox problem in AI-assisted work?

Dr. Anne Greul defines the blackbox problem as follows: "Not a blackbox where a result comes out without the user understanding HOW it came about. Only then can the user intervene, recognize that it's wrong, correct it." Applied to developer side projects: when an AI tool generates a sprint plan or architecture recommendation, the developer who did not participate in the reasoning process cannot reliably identify where the plan is wrong. The result is not a technical failure but a visibility failure — the errors are present but not surfaced until a milestone forces a review.

What is the 90/10 human-in-the-loop model?

The 90/10 model is a framework described by security researcher Dr. Karsten Nohl: let automated systems handle 90% of the work, but preserve human judgment at every important decision point. Nohl derived this from enterprise AI failures where fully automated pipelines produced wrong outputs that no one caught. The model maps directly to structured accountability systems for side projects: daily prompts, reminders, and frameworks can be automated. The milestone review — whether the actual output matches the committed goal — requires a human who can evaluate specific project context.

What did the METR study find about AI and developer productivity?

The METR study (July 2025) measured the productivity of experienced professional developers on real open-source GitHub issues, with and without access to AI tools including Claude, GPT-4o, and Gemini. Developers using AI tools completed tasks 19% more slowly on average. A follow-up study (February 2026) showed a smaller effect of approximately 4% with a wider confidence interval. The direction remained consistent: for complex, novel software tasks, the productivity benefit of AI tools is uncertain and may be negative. The finding does not apply to routine, well-defined tasks where AI assistance is consistently faster.

What is the one metric that actually measures side project progress?

The one metric is: "Is this project closer to a real user than it was seven days ago?" This question is binary and not reframeable. A project with 40 new commits that is still not deployed has not moved closer to a real user. A project with no new commits that now has one person using a live URL has moved. Commit frequency, VS Code session time, and lines of code are activity metrics, not progress metrics. The distinction matters because confirmation bias causes developers to optimize for the metrics they are already measuring, and most developers are measuring activity.

How does MVP Builder address confirmation bias in side projects?

MVP Builder enforces the real metric structurally. At milestone checkpoints — Day 13 for Bronze, Day 13 and 21 for Silver, Day 13, 21, and 30 for Gold — a human reviews whether the output matches what you committed to at the start of the sprint. The review is binary: deployed / not deployed, scoped deliverable met / not met. This eliminates the re-framing option that confirmation bias enables. You cannot report increasing commit velocity as evidence of progress when the checkpoint asks for a deployed URL.


Sources

  • METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (July 2025) — arxiv.org/abs/2507.09089
  • METR: Update on AI Uplift Study Design (February 2026) — metr.org/blog/2026-02-24-uplift-update/
  • Kahneman, D. & Tversky, A.: Intuitive prediction: Biases and corrective procedures (1979) — Planning Fallacy; see also Buehler, R., Griffin, D. & Ross, M.: Exploring the planning fallacy, Journal of Personality and Social Psychology 67(3), 1994 — 43 percentage point gap between predicted and actual completion rates
  • Dr. Anne Greul — Behavioral Economics Lecturer, UC Berkeley; founder, Legal AI startup. Quotes drawn from public interviews on AI deployment, behavioral bias, and organizational accountability (2025–2026); translated from the original German.
  • Dr. Karsten Nohl: KI-Agenten und menschliche Kontrollpunkte ("AI agents and human checkpoints") — public interview, YouTube (42:52–45:15); quote translated from the original German
  • Stack Overflow Developer Survey 2025: Trust in AI accuracy at 29% (down from 40% in 2024)