Case study / HeliosX / AI product design
An AI competitor-intelligence system for regulated medical-intake flows.
I designed and architected a TypeScript, Playwright, and Gemini system that walks the intake flows of regulated UK healthcare competitors as different patient personas. It hashes the forms it sees, replays cached answer maps, asks the model only when something changes, and alerts the team when a competitor's experience materially shifts.
The problem
The most useful competitor information was hidden behind medical questionnaires.
In regulated healthcare, the strategic detail is rarely on the marketing page. It lives inside the consultation flow: eligibility gates, rejection framing, clinical questions, post-consultation pricing, and checkout boundaries.
Walking those flows manually is slow and ethically fraught. The system needed to observe competitor journeys, compare them over time, and preserve evidence without submitting fake medical intake to real prescribers or crossing payment boundaries.
The AI product design challenge was to decide exactly where the model belonged. Gemini was useful for interpreting novel forms and selecting safe next actions; it was not allowed to become an unbounded browser agent that could invent patient facts, bypass clinical gates, or drift past payment and prescribing boundaries.
Impact
From Slack screenshots to a queryable intelligence system.
The output was a durable AI-assisted system that answers questions like "what changed on Voy this week?" or "how does Numan handle a low-BMI branch?" with evidence.
AI product brief
A model-in-the-loop system, not a free-roaming bot.
Many AI products are presented as a chat box wrapped around a workflow. This project needed a more specific interaction model: a crawler that could behave like a careful researcher, ask Gemini for help only when the interface was novel, and turn every observation into inspectable product evidence.
I designed the system around separation of concerns. Playwright handled deterministic navigation and state capture. Personas supplied clinical intent and safe test data. Page hashes decided whether a form was known or new. Gemini solved only the novel interaction problem and returned structured actions. Postgres preserved the evidence for dashboard review.
System shape
Persisted evidence first, dashboard second.
The architecture favoured boring, inspectable choices: native Node HTTP, server-rendered HTML, vanilla JavaScript, raw SQL, Postgres for durable state, Redis for in-flight work, and Playwright for browser automation. The dashboard is an operator console over persisted facts, not a separate source of truth.
That mattered because the design goal was trust. A product team should be able to see the source screenshot, route edge, persona, branch path, model-solved answer map, and timestamp behind any competitive claim.
System 01
Personas as portable patient profiles.
The crawler uses PASS and FAIL personas rather than generic users. PASS personas traverse approval paths. FAIL personas deliberately trigger clinical gates such as low BMI, pancreatitis history, contraindications, underage status, or nitrate therapy.
Decoupling personas from individual sites made the system scalable. The same semaglutide PASS profile can walk MedExpress, Voy, Numan, Pharmacy2U, and other competitors, making differences in question order, eligibility logic, and rejection framing visible.
This is where product design and AI operations met. The personas had to be specific enough for a model to answer forms consistently, and explicit enough that a human reviewer could understand which branch the system was trying to observe. PASS and FAIL were research instruments, not claims about real patients.
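A persona along these lines could be sketched as a small typed record. This is a minimal sketch, not the project's schema: the field names, the `expectedGate` property, the helper functions, and the example values (age, BMI) are all assumptions made up for illustration.

```typescript
// Hypothetical persona shape; every field name here is illustrative.
type PersonaKind = "PASS" | "FAIL";

interface Persona {
  id: string;
  kind: PersonaKind;
  treatment: string; // e.g. "semaglutide"
  facts: Record<string, string | number | boolean>; // synthetic test data only
  expectedGate?: string; // FAIL personas name the gate they are meant to trigger
}

// Made-up values, not clinical guidance.
const semaglutidePass: Persona = {
  id: "semaglutide-pass",
  kind: "PASS",
  treatment: "semaglutide",
  facts: { ageYears: 42, bmi: 32, pancreatitisHistory: false },
};

// A FAIL persona is the PASS profile with one fact changed to hit a gate.
const semaglutideLowBmi: Persona = {
  ...semaglutidePass,
  id: "semaglutide-fail-low-bmi",
  kind: "FAIL",
  facts: { ...semaglutidePass.facts, bmi: 21 },
  expectedGate: "low BMI",
};

// Personas are site-agnostic: pairing each one with each competitor
// yields the list of crawl targets.
function crawlTargets(personas: Persona[], competitors: string[]): string[] {
  return personas.flatMap((p) => competitors.map((c) => `${c}:${p.id}`));
}

// Reviewer-facing summary of which branch a run is meant to observe.
function describePersona(p: Persona): string {
  const goal =
    p.kind === "PASS" ? "approval path" : `gate: ${p.expectedGate ?? "unspecified"}`;
  return `${p.id} (${p.treatment}, ${goal})`;
}
```

Keeping the intended gate on the persona itself is one way to let a dashboard say what a run was trying to observe without the reviewer having to read the raw facts.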
System 02
Page hash plus answer map.
Every interactive page gets reduced to a canonical signature of labels, inputs, and buttons. The system hashes that signature and looks up an answer map for the competitor, vertical, and page hash. If the form is unchanged, it replays the cached action plan. If the form is novel, Gemini solves it once and the answer map is stored for future runs.
This made the system economically viable. Playwright navigation is cheap; model calls are not. The crawler asks the model only when the page materially changes, which is exactly when human attention is useful too.
It also made the AI behaviour explainable. Instead of asking Gemini to "crawl the site," the system asked a bounded question: given this form, this persona, and these boundary rules, which visible control should be used next? The returned answer became a reusable map, not a hidden chain of improvisation.
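The hash-and-lookup idea can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `FormControl` and `Action` shapes, the normalisation rules, the choice of SHA-256, and the in-memory `AnswerMapStore` are all hypothetical stand-ins (the real system persists to Postgres, and its canonicalisation may differ).

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for extracted controls and stored actions.
interface FormControl {
  kind: "label" | "input" | "button";
  text: string;
}
interface Action {
  target: string;
  op: "fill" | "select" | "click";
  value?: string;
}

// Reduce a page to a canonical signature: one line per control, in page
// order, with case and whitespace normalised so cosmetic noise is ignored.
function canonicalSignature(controls: FormControl[]): string {
  return controls
    .map((c) => `${c.kind}:${c.text.trim().toLowerCase().replace(/\s+/g, " ")}`)
    .join("\n");
}

// Hash the signature; SHA-256 is an assumption, any stable digest would do.
function pageHash(controls: FormControl[]): string {
  return createHash("sha256").update(canonicalSignature(controls)).digest("hex");
}

// In-memory stand-in for the answer-map table, keyed by
// competitor, vertical, and page hash.
class AnswerMapStore {
  private readonly maps = new Map<string, Action[]>();

  private key(competitor: string, vertical: string, hash: string): string {
    return `${competitor}|${vertical}|${hash}`;
  }

  lookup(competitor: string, vertical: string, hash: string): Action[] | undefined {
    return this.maps.get(this.key(competitor, vertical, hash));
  }

  save(competitor: string, vertical: string, hash: string, actions: Action[]): void {
    this.maps.set(this.key(competitor, vertical, hash), actions);
  }
}
```

Because page order is preserved in the signature, a reordered questionnaire counts as a new page in this sketch, while a change in capitalisation or spacing does not.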
Design value
Because the system stores page states, route edges, consultation paths, branch points, and screenshots, competitor findings stop evaporating into chat threads. They become searchable product evidence.
System 03
Gemini as a bounded interaction solver.
The model layer was designed as a specialist, not a supervisor. It received a cleaned representation of the current page, the active persona, treatment context, and explicit stop conditions. It returned structured actions that the crawler could validate, replay, and store.
This let the system handle messy real-world healthcare forms without hard-coding every label variant. The model could interpret that "Do you have a history of pancreatitis?" and "Have you ever had pancreas inflammation?" were equivalent for the persona, while the deterministic layer still controlled navigation, persistence, retries, and boundary enforcement.
- 01 Extract labels, inputs, buttons, and visible context from the current page.
- 02 Hash the interaction signature and check whether a validated answer map already exists.
- 03 If novel, ask Gemini for a structured action plan under persona and boundary constraints.
- 04 Replay, store, and expose the answer map so future runs are cheaper and more inspectable.
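The steps above can be sketched as a cache-first lookup with a validation gate in front of anything the model returns. This is an illustrative sketch, not the project's code: the `Control`, `Action`, and `PlanRequest` shapes are assumptions, and the model call is represented by an injected `solve` function rather than any real Gemini client API.

```typescript
// Hypothetical shapes; none of these names come from the original system.
interface Control {
  id: string;
  kind: "input" | "button";
  label: string;
}
interface Action {
  controlId: string;
  op: "fill" | "click";
  value?: string;
}
interface PlanRequest {
  controls: Control[];
  personaId: string;
  stopConditions: string[];
}
// The model call is injected so the deterministic layer stays testable.
type Solver = (req: PlanRequest) => Promise<unknown>;

// Accept only actions that target a control actually present on the page
// and use an operation that makes sense for that control.
function validatePlan(raw: unknown, controls: Control[]): Action[] {
  if (!Array.isArray(raw)) throw new Error("plan must be an array of actions");
  const byId = new Map(controls.map((c) => [c.id, c]));
  return raw.map((item, i): Action => {
    const a = (item ?? {}) as Partial<Action>;
    const target = typeof a.controlId === "string" ? byId.get(a.controlId) : undefined;
    if (!target) throw new Error(`action ${i}: unknown control`);
    if (a.op === "fill") {
      if (target.kind !== "input" || typeof a.value !== "string") {
        throw new Error(`action ${i}: fill needs an input and a string value`);
      }
      return { controlId: target.id, op: "fill", value: a.value };
    }
    if (a.op === "click") {
      if (target.kind !== "button") throw new Error(`action ${i}: click needs a button`);
      return { controlId: target.id, op: "click" };
    }
    throw new Error(`action ${i}: unsupported op`);
  });
}

// Replay a cached plan when the page is known; otherwise ask the solver
// once, validate its output, and store the plan for future runs.
async function planFor(
  pageKey: string,
  req: PlanRequest,
  cache: Map<string, Action[]>,
  solve: Solver,
): Promise<{ actions: Action[]; source: "cache" | "model" }> {
  const cached = cache.get(pageKey);
  if (cached) return { actions: cached, source: "cache" };
  const actions = validatePlan(await solve(req), req.controls);
  cache.set(pageKey, actions);
  return { actions, source: "model" };
}
```

Treating the solver's output as `unknown` until it passes validation is the design point: a plan that names a control the crawler did not extract is rejected before it can be replayed or stored.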
System 04
Boundaries built into the mechanism.
Two ethical boundaries shaped the architecture. The crawler does not submit fake medical intake to real prescribers, and it does not click payment buttons. Consultation branch replay lets the system return to stored branch points and explore alternatives without repeatedly pushing fake profiles through live prescriber workflows. Payment-boundary detection stops on pricing and billing surfaces without crossing into purchase.
I treated those boundaries as product requirements, not legal footnotes. The dashboard needed to show where the crawler stopped and why. The operator model needed cooldowns, review points, and anti-bot handling. The data model needed enough state to compare competitor behaviour without creating unsafe submissions.
- 01 Capture the visible form and available branches.
- 02 Persist the action path and branch alternatives.
- 03 Replay stored state to a known branch point.
- 04 Explore the alternative branch inside the bounded workflow.
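A rough sketch of how a payment stop and branch bookkeeping might fit together in one decision function follows. It covers only the payment boundary, not the prescriber-submission boundary. Everything in it is an assumption for illustration: the label patterns, the `BranchPoint` shape, and the `planStep` function are hypothetical, and a real pattern list would need tuning per competitor.

```typescript
// Hypothetical label patterns that mark a payment surface.
const PAYMENT_PATTERNS: RegExp[] = [
  /\bpay(ment)?\b/i,
  /place order/i,
  /complete (purchase|order)/i,
  /card number/i,
];

interface VisibleControl {
  kind: "input" | "button";
  label: string;
}

// Return the first control that looks like part of a payment surface.
function findPaymentBoundary(controls: VisibleControl[]): VisibleControl | undefined {
  return controls.find((c) => PAYMENT_PATTERNS.some((p) => p.test(c.label)));
}

// Hypothetical stored branch point: how to get back here, which options
// exist, and which have already been explored.
interface BranchPoint {
  pageHash: string;
  pathToHere: string[];
  alternatives: string[];
  explored: string[];
}

function nextBranch(point: BranchPoint): string | undefined {
  return point.alternatives.find((a) => !point.explored.includes(a));
}

type StepResult =
  | { status: "stopped"; reason: string }
  | { status: "explore"; replay: string[]; take: string }
  | { status: "done" };

// The boundary check runs first, so a payment surface always wins over
// any remaining branch work, and the stop reason is recorded for review.
function planStep(point: BranchPoint, controls: VisibleControl[]): StepResult {
  const boundary = findPaymentBoundary(controls);
  if (boundary) {
    return { status: "stopped", reason: `payment boundary: "${boundary.label}"` };
  }
  const take = nextBranch(point);
  if (!take) return { status: "done" };
  return { status: "explore", replay: point.pathToHere, take };
}
```

Returning the stop reason as data, rather than only logging it, is what would let a dashboard show where the crawler stopped and why.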
Outcome
Competitive intelligence became refreshable, inspectable, and useful to design.
The system produced a 22-brand UK GLP-1 pricing competitive analysis with evidence per data point, rendered as stakeholder-friendly HTML and refreshable by supervisor command.
The dashboard gives designers and PMs access to crawl coverage, recent changes, route graphs, consultation branches, screenshot search, and competitor readiness without touching a CLI.
The same plan pattern from the MedExpress knowledge work carried over here, allowing multiple AI coding harnesses to work safely on documented phases.
The AI layer stayed useful because it was narrow. Gemini handled novel form interpretation; deterministic code handled repeatability, state, and boundaries; the dashboard made the intelligence usable by the product team.
Reflection
What I would keep improving.
Bring multi-site expansion forward
The system became much more valuable once the 20-site expansion landed because comparison, not crawling alone, is the product value.
Score model decisions over time
I would add evals for answer-map accuracy, stop-boundary detection, and branch-selection confidence so the model layer can be monitored like product behaviour.
Add cross-competitor flow diffs
The natural next step is side-by-side comparison of equivalent gate questions, rejection screens, pricing states, and safety boundaries.