Case study / HeliosX / AI product design

An AI competitor-intelligence system for regulated medical-intake flows.

I designed and architected a TypeScript, Playwright, and Gemini system that walks regulated UK healthcare competitors as different patient personas, hashes the forms it sees, replays cached answer maps, asks the model only when something changes, and alerts the team when a competitor's experience materially shifts.

The problem

The most useful competitor information was hidden behind medical questionnaires.

In regulated healthcare, the strategic detail is rarely on the marketing page. It lives inside the consultation flow: eligibility gates, rejection framing, clinical questions, post-consultation pricing, and checkout boundaries.

Walking those flows manually is slow and ethically fraught. The system needed to observe competitor journeys, compare them over time, and preserve evidence without submitting fake medical intake to real prescribers or crossing payment boundaries.

The AI product design challenge was to decide exactly where the model belonged. Gemini was useful for interpreting novel forms and selecting safe next actions; it was not allowed to become an unbounded browser agent that could invent patient facts, bypass clinical gates, or drift past payment and prescribing boundaries.

Impact

From Slack screenshots to a queryable intelligence system.

The output was a durable AI-assisted system that answers questions like "what changed on Voy this week?" or "how does Numan handle a low-BMI branch?" with evidence.

22 UK healthcare competitors configured for crawl coverage
16 PASS and FAIL personas across verticals and clinical gates
57 test files protecting crawler, dashboard, docs, and supervisor behaviour
1 repeatable 22-brand GLP-1 pricing analysis surface

AI product brief

A model-in-the-loop system, not a free-roaming bot.

Many AI products are presented as a chat box wrapped around a workflow. This project needed a more specific interaction model: a crawler that could behave like a careful researcher, ask Gemini for help only when the interface was novel, and turn every observation into inspectable product evidence.

I designed the system around separation of concerns. Playwright handled deterministic navigation and state capture. Personas supplied clinical intent and safe test data. Page hashes decided whether a form was known or new. Gemini solved only the novel interaction problem and returned structured actions. Postgres preserved the evidence for dashboard review.

Model input: Cleaned DOM, persona intent, competitor context, boundary rules
Model output: Structured answer map, confidence, and next-action rationale
Deterministic layer: Hashing, replay, route edges, branch storage, payment stops
Human surface: Dashboard, screenshots, change events, pricing analysis
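
The model output listed above could be typed and checked roughly as follows. This is a minimal sketch, not the project's actual schema: the field names (`selector`, `kind`, `value`) and the `parseModelOutput` helper are assumptions made for illustration. The point it shows is that the deterministic layer can reject malformed or out-of-vocabulary model output before anything touches the browser.

```typescript
// Hypothetical shapes for the bounded model contract; all names are illustrative.
type ActionKind = "fill" | "select" | "click";

interface PlannedAction {
  selector: string; // which visible control to use
  kind: ActionKind;
  value?: string; // text to fill or option to choose, if any
}

interface ModelOutput {
  actions: PlannedAction[]; // the structured answer map
  confidence: number; // 0..1, as reported by the model
  rationale: string; // next-action rationale, kept for review
}

function isPlannedAction(value: unknown): value is PlannedAction {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const kindOk = v.kind === "fill" || v.kind === "select" || v.kind === "click";
  const valueOk = v.value === undefined || typeof v.value === "string";
  return typeof v.selector === "string" && kindOk && valueOk;
}

// Parse raw model text into a validated ModelOutput, or return null so the
// caller can refuse to act on malformed output.
function parseModelOutput(raw: string): ModelOutput | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const actions = d.actions;
  const confidence = d.confidence;
  const rationale = d.rationale;
  if (!Array.isArray(actions) || !actions.every(isPlannedAction)) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;
  if (typeof rationale !== "string") return null;
  return { actions, confidence, rationale };
}
```

Returning `null` instead of throwing keeps the decision about retries and review in the deterministic layer rather than in the parsing step.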

System shape

Persisted evidence first, dashboard second.

The architecture favoured boring, inspectable choices: native Node HTTP, server-rendered HTML, vanilla JavaScript, raw SQL, Postgres for durable state, Redis for in-flight work, and Playwright for browser automation. The dashboard is an operator console over persisted facts, not a separate source of truth.

That mattered because the design goal was trust. A product team should be able to see the source screenshot, route edge, persona, branch path, model-solved answer map, and timestamp behind any competitive claim.

System 01

Personas as portable patient profiles.

The crawler uses PASS and FAIL personas rather than generic users. PASS personas traverse approval paths. FAIL personas deliberately trigger clinical gates such as low BMI, pancreatitis history, contraindications, underage status, or nitrate therapy.

Decoupling personas from individual sites made the system scalable. The same semaglutide PASS profile can walk MedExpress, Voy, Numan, Pharmacy2U, and other competitors, making differences in question order, eligibility logic, and rejection framing visible.

This is where product design and AI operations met. The personas had to be specific enough for a model to answer forms consistently, and explicit enough that a human reviewer could understand which branch the system was trying to observe. PASS and FAIL were research instruments, not claims about real patients.

PASS: Healthy 42, tirzepatide candidate
FAIL: Low BMI 26, universal threshold fail
EDGE: Ethnicity-adjusted NICE threshold branch
FAIL: Contraindication or medication safety gate
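
A portable persona can be sketched as plain data that knows nothing about any one site. The shape below is an assumption for illustration: the field names, the `weight-loss` vertical label, and the `planCrawls` helper are hypothetical, and only the two persona labels come from the list above.

```typescript
// Hypothetical persona shape; field names and example values are illustrative.
type Expectation = "PASS" | "FAIL" | "EDGE";

interface Persona {
  id: string;
  label: string; // human-readable description for reviewers
  vertical: string; // assumed grouping, e.g. "weight-loss"
  expectation: Expectation; // the branch this persona is meant to observe
  gate?: string; // clinical gate a FAIL or EDGE persona targets
}

const personas: Persona[] = [
  { id: "pass-healthy-42", label: "Healthy 42, tirzepatide candidate", vertical: "weight-loss", expectation: "PASS" },
  { id: "fail-low-bmi", label: "Low BMI 26, universal threshold fail", vertical: "weight-loss", expectation: "FAIL", gate: "low-bmi" },
];

interface CrawlJob {
  competitor: string;
  personaId: string;
  expectation: Expectation;
}

// Personas are not tied to a site, so each one pairs with every competitor.
function planCrawls(competitors: string[], vertical: string, all: Persona[]): CrawlJob[] {
  const jobs: CrawlJob[] = [];
  for (const competitor of competitors) {
    for (const persona of all) {
      if (persona.vertical === vertical) {
        jobs.push({ competitor, personaId: persona.id, expectation: persona.expectation });
      }
    }
  }
  return jobs;
}
```

Calling `planCrawls(["MedExpress", "Voy"], "weight-loss", personas)` would yield four jobs, two per competitor, which is what makes differences between sites comparable.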

System 02

Page hash plus answer map.

Every interactive page gets reduced to a canonical signature of labels, inputs, and buttons. The system hashes that signature and looks up an answer map for the competitor, vertical, and page hash. If the form is unchanged, it replays the cached action plan. If the form is novel, Gemini solves it once and the answer map is stored for future runs.

This made the system economically viable. Playwright navigation is cheap; model calls are not. The crawler asks the model only when the page materially changes, which is exactly when human attention is useful too.

It also made the AI behaviour explainable. Instead of asking Gemini to "crawl the site," the system asked a bounded question: given this form, this persona, and these boundary rules, which visible control should be used next? The returned answer became a reusable map, not a hidden chain of improvisation.
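
The hashing step might look something like the sketch below. It assumes SHA-256 via Node's built-in `crypto` module and a simple whitespace and case normalisation; the source does not state which hash or canonical form the crawler used, and the `ControlSignature` type, `hashPage`, and `answerMapKey` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical reduced view of one interactive control on a page.
interface ControlSignature {
  role: "label" | "input" | "button";
  text: string; // visible label or button text
  inputType?: string; // e.g. "radio" or "text"; only meaningful for inputs
}

// Collapse whitespace and case so cosmetic changes do not alter the hash.
function normalise(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Build a canonical string in document order, then hash it with SHA-256.
function hashPage(controls: ControlSignature[]): string {
  const canonical = controls
    .map((c) => [c.role, normalise(c.text), c.inputType ?? ""].join("|"))
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Answer maps are looked up by competitor, vertical, and page hash.
function answerMapKey(competitor: string, vertical: string, pageHash: string): string {
  return [competitor, vertical, pageHash].join(":");
}
```

Under this normalisation a reworded question produces a new hash and therefore a model call, while extra whitespace or a capitalisation change does not. How aggressive the normalisation should be is a tuning decision: too loose and real changes are missed, too strict and the model is asked about cosmetic edits.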

Design value

Because the system stores page states, route edges, consultation paths, branch points, and screenshots, competitor findings stop evaporating into chat threads. They become searchable product evidence.

System 03

Gemini as a bounded interaction solver.

The model layer was designed as a specialist, not a supervisor. It received a cleaned representation of the current page, the active persona, treatment context, and explicit stop conditions. It returned structured actions that the crawler could validate, replay, and store.

This let the system handle messy real-world healthcare forms without hard-coding every label variant. The model could interpret that "Do you have a history of pancreatitis?" and "Have you ever had pancreas inflammation?" were equivalent for the persona, while the deterministic layer still controlled navigation, persistence, retries, and boundary enforcement.

  1. Extract labels, inputs, buttons, and visible context from the current page.

  2. Hash the interaction signature and check whether a validated answer map already exists.

  3. If novel, ask Gemini for a structured action plan under persona and boundary constraints.

  4. Replay, store, and expose the answer map so future runs are cheaper and more inspectable.
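
The four steps above reduce to a replay-or-solve decision. The sketch below is illustrative only: an in-memory `Map` stands in for the Postgres-backed store, and the `Solver` function type stands in for the Gemini call, so none of these names reflect the real implementation.

```typescript
// Hypothetical types; the solver is injected so the deterministic layer can be
// exercised without a model.
interface AnswerMap {
  actions: Array<{ selector: string; kind: "fill" | "select" | "click"; value?: string }>;
}

type Solver = (pageSummary: string, personaId: string) => Promise<AnswerMap>;

// In-memory stand-in for durable answer-map storage.
class AnswerMapStore {
  private readonly maps = new Map<string, AnswerMap>();

  get(key: string): AnswerMap | undefined {
    return this.maps.get(key);
  }

  set(key: string, value: AnswerMap): void {
    this.maps.set(key, value);
  }
}

// Replay a stored answer map when the page is known; otherwise ask the solver
// once and store the result for future runs.
async function resolveAnswerMap(
  store: AnswerMapStore,
  key: string, // competitor + vertical + page hash
  pageSummary: string,
  personaId: string,
  solve: Solver,
): Promise<{ map: AnswerMap; source: "cache" | "model" }> {
  const cached = store.get(key);
  if (cached !== undefined) {
    return { map: cached, source: "cache" };
  }
  const solved = await solve(pageSummary, personaId);
  store.set(key, solved);
  return { map: solved, source: "model" };
}
```

Returning the `source` alongside the map lets a dashboard distinguish replayed pages from newly solved ones, which is also a signal that a competitor's form has changed.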

System 04

Boundaries built into the mechanism.

Two ethical boundaries shaped the architecture. The crawler does not submit fake medical intake to real prescribers, and it does not click payment buttons. Consultation branch replay lets the system return to stored branch points and explore alternatives without repeatedly pushing fake profiles through live prescriber workflows. Payment-boundary detection stops on pricing and billing surfaces without crossing into purchase.

I treated those boundaries as product requirements, not legal footnotes. The dashboard needed to show where the crawler stopped and why. The operator model needed cooldowns, review points, and anti-bot handling. The data model needed enough state to compare competitor behaviour without creating unsafe submissions.
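
One way to make the payment boundary mechanical is a guard that every planned click passes through before Playwright executes it. The sketch below is an assumption about how such a guard could work, not the project's detection logic: the `Control` type, the `checkBoundary` function, and the phrase list are all hypothetical, and a real list would need tuning per competitor.

```typescript
// Hypothetical description of a control the crawler is about to click.
interface Control {
  text: string; // visible button or link text
}

// Illustrative purchase phrases only.
const PURCHASE_TEXT = /\b(pay|place order|buy now|confirm order|complete purchase)\b/i;

type Decision = { allowed: true } | { allowed: false; reason: string };

// Decide whether a click may run. A refusal carries a reason so the dashboard
// can show where the crawler stopped and why.
function checkBoundary(control: Control): Decision {
  const text = control.text.trim();
  if (PURCHASE_TEXT.test(text)) {
    return { allowed: false, reason: `payment control: "${text}"` };
  }
  return { allowed: true };
}
```

With this phrase list, "Pay now" is refused while "Continue to payment" is allowed, so the crawler can still reach a billing surface and stop there. Placing the check in deterministic code means a model suggestion to click a purchase button is refused regardless of the model's confidence.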

  1. Capture the visible form and available branches.

  2. Persist the action path and branch alternatives.

  3. Replay stored state to a known branch point.

  4. Explore the alternative branch inside the bounded workflow.
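
The planning half of those steps can be sketched as data. The source does not say how stored state is restored to a branch point, so this sketch only works out which alternatives remain unexplored and what path leads to each one; the `BranchPoint` shape and `pendingBranches` helper are hypothetical.

```typescript
// Hypothetical branch-point record; names are illustrative.
interface Step {
  pageHash: string;
  choice: string;
}

interface BranchPoint {
  path: Step[]; // steps already taken to reach this page
  pageHash: string; // the page where alternatives exist
  options: string[]; // visible choices on that page
  explored: string[]; // choices already walked
}

// For each unexplored option, return the stored path to the branch point
// followed by the alternative choice to try next.
function pendingBranches(point: BranchPoint): Step[][] {
  const seen = new Set(point.explored);
  return point.options
    .filter((option) => !seen.has(option))
    .map((option) => [...point.path, { pageHash: point.pageHash, choice: option }]);
}
```

Because the branch point and its explored choices are persisted, each alternative is walked once rather than rediscovered on every crawl.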

Outcome

Competitive intelligence became refreshable, inspectable, and useful to design.

The system produced a 22-brand UK GLP-1 pricing competitive analysis with evidence per data point, rendered as stakeholder-friendly HTML and refreshable by supervisor command.

The dashboard gives designers and PMs access to crawl coverage, recent changes, route graphs, consultation branches, screenshot search, and competitor readiness without touching a CLI.

The same plan pattern from the MedExpress knowledge work carried over here, allowing multiple AI coding harnesses to work safely on documented phases.

The AI layer stayed useful because it was narrow. Gemini handled novel form interpretation; deterministic code handled repeatability, state, and boundaries; the dashboard made the intelligence usable by the product team.

Reflection

What I would keep improving.

Bring multi-site expansion forward

The system became much more valuable once the 20-site expansion landed because comparison, not crawling alone, is the product value.

Score model decisions over time

I would add evals for answer-map accuracy, stop-boundary detection, and branch-selection confidence so the model layer can be monitored like product behaviour.

Add cross-competitor flow diffs

The natural next step is side-by-side comparison of equivalent gate questions, rejection screens, pricing states, and safety boundaries.