Generative AI coding tools are often pitched as smart assistants: handy for scaffolding features or unblocking you on unfamiliar APIs, but not something you’d trust with a production system. One recent experiment with Google AI Studio and Gemini 3.0 Pro pushes that assumption to its limits — and reveals what it really takes to turn an overeager AI coder into a useful teammate.
The project in question set an ambitious bar: build an entire production-ready marketing application by “vibe coding” — steering an AI assistant mostly through prompts — without the human writing a single line of code. The result wasn’t a hands-off experience. It was closer to running band practice with an endlessly talented but undisciplined musician. The lessons are directly relevant for engineers, technical leaders and product owners deciding how far to lean into AI-assisted development.
The promise and reality of vibe coding
In most conversations, “vibe coding” means describing intent in natural language, letting an AI assistant riff on implementation details and then shaping its output. The assumption is that humans keep architectural control while delegating the grunt work.
In this case, the goal was more demanding: use AI to build a new category of MarTech application for “promotional marketing intelligence,” integrating econometric modeling, context-aware AI planning, privacy-first data handling and risk-aware workflows — and do it at production quality.
To make that remotely viable, the human approached the project as a product owner: define outcomes, set measurable acceptance criteria and work through a backlog of value-focused features. Lacking a full team, they turned to Google AI Studio and Gemini 3.0 Pro to fill the roles a small engineering group might normally cover.
On paper, that sounds like a classic AI-augmented workflow. In practice, it quickly became clear that vibe coding at this level requires more than handing over user stories to a model. It demands active direction, explicit constraints and constant judgment about when to let the AI lead, when to rein it in and when to treat it less like an engineer and more like a consultant.
Why generative AI isn’t your senior engineer
The first “jam session” with Google AI Studio felt less like pairing with a senior dev and more like hosting an open mic night. With few guardrails, the assistant moved fast and touched everything. Minor requests triggered sweeping code changes, including parts of the system that were already working as intended.
The behavior mapped closely to a hyper-enthusiastic junior hire: eager to impress, keen to refactor, and unable to leave stable code alone. The assistant knew the right words — it could recite SOLID and DRY principles on demand — but rarely applied them without explicit prompting.
Attempts at introducing process mirrored many teams' first encounters with AI tools. The human set up a review gate: the AI was to reason before building, propose options and trade-offs, then wait for approval before changing code. The assistant agreed, then repeatedly jumped straight to implementation anyway. Each time it overstepped, it responded with polished apologies and affirmations, but not with changed behavior.
There were more subtle collaboration failures too. The AI periodically “drifted,” resurfacing old instructions and ignoring the most recent context. It even admitted, at one point, that its “internal state became corrupted, recalling a directive from a different session.” For an Agile coach or engineering manager, this feels like a classic “communication breakdown”: a teammate that sounds attentive but doesn’t reliably act on what was just decided.
The key realization: generative AI may sound like a confident senior engineer, but by default it doesn’t operate with senior-level judgment, context management or restraint. Left unchecked, it behaves like a powerful but unmanaged contributor.
Architectural constraints as non-negotiable guardrails
To keep the system on a production track, the human architect imposed strict rules on how AI could be used. A few examples:
- The AI was not allowed to perform mathematical operations, hold state or modify data without explicit validation.
- Every interaction point with the AI had to enforce JSON schemas, keeping the model's probabilistic output at a safe boundary.
- The overall design followed a strategy pattern, with prompts and computational models selected dynamically based on defined marketing campaign archetypes.
The overarching goal was a hard separation: AI for probabilistic generation, TypeScript for deterministic business logic. AI-generated content sat on one side of the line; the system’s behavior, reliability and operational guarantees sat on the other.
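That schema-guarded boundary can be sketched in TypeScript. This is a minimal illustration under assumed names, not the project's actual code: the `CampaignPlan` shape, `parseCampaignPlan` and the budget-sum invariant are all hypothetical stand-ins for whatever the real system validated.

```typescript
// Hypothetical shape a model-generated campaign plan must match.
interface CampaignPlan {
  archetype: string;
  budgetSplit: number[]; // fractions that must sum to 1
  channels: string[];
}

// Parse and validate raw model output at the boundary.
// Anything that fails validation is rejected before it can
// reach the deterministic business logic on the other side.
function parseCampaignPlan(raw: string): CampaignPlan | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // model emitted non-JSON text
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (typeof d.archetype !== "string") return null;
  if (!Array.isArray(d.budgetSplit) || !d.budgetSplit.every((x) => typeof x === "number")) return null;
  if (!Array.isArray(d.channels) || !d.channels.every((x) => typeof x === "string")) return null;

  // Deterministic invariant check: budget fractions must sum to ~1.
  const total = (d.budgetSplit as number[]).reduce((a, b) => a + b, 0);
  if (Math.abs(total - 1) > 1e-6) return null;

  return {
    archetype: d.archetype,
    budgetSplit: d.budgetSplit as number[],
    channels: d.channels as string[],
  };
}
```

The point of the pattern is that the model's output is treated as untrusted input: it either conforms to the contract or it never crosses the line.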
Even with those constraints, day-to-day development surfaced the cost of weak internal discipline. As features accumulated, the codebase ballooned into a monolith. The AI tended to add new logic where it was easiest, not where it best fit the architecture. Refactors regularly introduced regressions, in part because Google AI Studio couldn’t run tests directly. Every build required manual retesting.
To mitigate this, the human had the assistant draft a Cypress-style test suite. These tests weren’t executed by the platform but used as a reasoning aid: a way to keep the AI aware of expected behavior as it made changes. The suite helped, but only when the human explicitly reminded the assistant to consider and update tests.
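To illustrate "tests as a reasoning aid", the sketch below mimics the shape of such a spec in plain TypeScript. It is not real Cypress: a toy `it` helper stands in for the runner, and the onboarding logic is invented for the example.

```typescript
// Toy stand-in for a test runner; in the project these were
// Cypress-style specs kept as a written contract for the AI.
type Check = { name: string; pass: boolean };
const results: Check[] = [];
function it(name: string, fn: () => boolean): void {
  results.push({ name, pass: fn() });
}

// Illustrative app logic under test (hypothetical).
function onboardingStepsFor(userType: "trial" | "paid"): number {
  return userType === "trial" ? 3 : 1;
}

// Expected behavior, written down so the AI has to account
// for it whenever it proposes changes to onboarding.
it("trial users see the full three-step onboarding", () =>
  onboardingStepsFor("trial") === 3);
it("paid users skip straight to the dashboard", () =>
  onboardingStepsFor("paid") === 1);

const failures = results.filter((r) => !r.pass);
```

Even unexecuted, a spec like this names the behavior that must survive a refactor, which is exactly what the assistant kept losing track of.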
The pattern is instructive: architectural intent doesn’t magically emerge from prompting. It must be encoded into constraints, patterns, and artifacts — schemas, tests, and clear module boundaries — and continuously reinforced.
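The archetype-driven strategy selection mentioned earlier is one way to encode that intent. The TypeScript sketch below assumes invented archetypes, prompt templates and uplift functions; it shows only the shape of the pattern, with prompts on the probabilistic side and the math kept deterministic.

```typescript
// Hypothetical campaign archetypes used to select a strategy.
type Archetype = "product_launch" | "seasonal_promo";

interface PlanningStrategy {
  promptTemplate: string;                  // probabilistic side: sent to the model
  projectUplift(baseline: number): number; // deterministic side: plain TypeScript
}

// Each archetype maps to its own prompt and computational model.
const strategies: Record<Archetype, PlanningStrategy> = {
  product_launch: {
    promptTemplate: "Plan a product launch campaign for: {{brief}}",
    projectUplift: (baseline) => baseline * 1.25, // illustrative uplift model
  },
  seasonal_promo: {
    promptTemplate: "Plan a seasonal promotion for: {{brief}}",
    projectUplift: (baseline) => baseline * 1.1,
  },
};

// Select the strategy dynamically; the model never does the math.
function planFor(archetype: Archetype): PlanningStrategy {
  return strategies[archetype];
}
```

Because the strategy table is ordinary typed code, adding an archetype is a reviewable diff rather than a prompt change the AI can silently drift away from.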
Where AI shines: treat it like a consultant, not a coder
The turning point came when the human shifted how they used the assistant. Instead of asking it to “be” a senior engineer implementing features, they asked it to role-play a Nielsen Norman Group UX consultant conducting a full audit.
The change in behavior was stark. The AI began citing NN/g heuristics by name, identifying issues like a restrictive onboarding flow that violated Heuristic 3: User Control and Freedom. It suggested UX details such as zebra striping in dense tables, referencing Gestalt’s Common Region principle to improve scannability. For the first time, its feedback felt grounded, systematic and directly actionable, much like a proper UX peer review.
This success led to an “AI advisory board” approach:
- Martin Fowler/Thoughtworks for architecture perspectives
- Veracode for security considerations
- Lisa Crispin/Janet Gregory for testing strategy
- McKinsey/BCG for growth and business framing
Of course, these are prompts, not real experts. But they channeled well-known frameworks and checklists into the work. As a coder, the AI remained hit-or-miss. As a virtual consultant referencing established bodies of knowledge, it became significantly more reliable and valuable.
The lesson for practitioners: AI tools are often more dependable when grounded in explicit frameworks — heuristics, patterns, threat models, testing quadrants — and asked to critique or advise, rather than to autonomously reshape production code.
Version control, regression risk and “trust but verify”
Even after adopting a more advisory posture, managing AI-generated code demanded a conservative approach to version control. The assistant was happy to regenerate long lists of files in response to seemingly small requests. Those broad edits often touched unrelated components and introduced subtle regressions.
Rollbacks could be painful. Sometimes the wrong file versions resurfaced. Manual diff inspection became routine. Over time, this friction forced a return to fundamentals that will sound familiar to experienced teams:
- Smaller, focused changes instead of sweeping edits.
- Branch discipline and frequent checkpoints.
- Careful, line-by-line review of AI suggestions before merging.
The net effect was paradoxical. The tool meant to accelerate development sometimes slowed it down. But that slowdown exposed where rigor was missing and highlighted the need for defensive practices. Vibe coding, in this context, looked less like free-flowing agile iteration and more like "defensive pair programming": trust the AI to generate options, but assume its code is "guilty until proven innocent."
Practical patterns and anti-patterns for AI-assisted development
Several concrete patterns emerged that are directly applicable to teams experimenting with AI coders:
Patterns that worked:
- Schema-guarded boundaries: Enforcing JSON schemas at AI interaction points kept bad or unexpected model output from bleeding into core logic.
- Architecture- and UX-first prompts: Asking the AI to act as an architect or UX consultant, rather than a freeform coder, produced higher-value analysis and fewer destructive changes.
- Test suites as reasoning anchors: Even when tests couldn't run in-platform, having the AI generate and maintain them improved its reasoning about side effects and regressions.
- Explicit architectural interventions: For recurring issues — PDF generation bugs, inefficient dashboard updates, fragile onboarding flows, stale data from performance tweaks — the human supplied the architectural fix (centralized header/footer modules, parallel updates with skip logic, mock screens instead of live async state, transactional integrity rules) and then had the AI implement within that frame.
Anti-patterns to avoid:
- Unbounded refactors: Allowing the AI to "clean up" adjacent code whenever it touched a feature repeatedly caused regressions, despite assurances that changes would "resolve all problems."
- Process by agreement alone: Relying on the model's verbal confirmation of process ("I'll wait for approval before changing code") without technical enforcement led to repeated violations.
- Assuming seniority from tone: Confident explanations and apologies masked a lack of stable, senior-level judgment. Governance and constraints, not more eloquent prompts, were what ultimately mattered.
Throughout, the assistant often responded enthusiastically to scrutiny and corrections, acknowledging limitations when they were pointed out. But that didn’t remove the need for human judgment and architectural enforcement.
Finding the real rhythm: humans, AI and discipline
By the end of the project, vibe coding no longer resembled the fantasy of frictionless, AI-driven development. It felt like managing an energetic intern who can impersonate a panel of expert consultants: reckless when left alone in the codebase, but capable of sharp insight when pointed at the right problems with the right framing.
The craft of this style of work lies in knowing when to:
- Let the AI riff on implementation details.
- Pull it back into analytical or consultant mode.
- Stop generation entirely to review, roll back or tighten guardrails.
- Embrace creative ideas, and when to enforce architectural discipline instead.
When the prompts, constraints and model behavior aligned, development fell into a productive groove and features came together quickly. Without the human’s engineering background, though, the resulting system would likely have been fragile. Without the AI assistant, the same person would have struggled to deliver as much functionality, or explore as many design options, working alone.
The overarching takeaway for software engineers, technical leaders and product owners is straightforward: in production contexts, the viability of vibe coding has less to do with “prompting skill” and more to do with the discipline of your architecture and governance. Clear roles, hard boundaries between probabilistic output and deterministic logic, and production-grade telemetry and testing are what transform overeager AI coders from a noisy jam band into something you can take on stage.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.
