Case study / HeliosX / AI product design

Designing Leo and an AI-assisted acquisition workflow for MedExpress UK.

Over five intensive weeks I turned scattered MedExpress UK knowledge into three things: an agent-ready design memory, a generated Claude Project assistant, and a prototype quality gate. Together they let AI help generate, critique, and revise product work without losing brand, clinical, analytics, or compliance context.

The problem

AI could move faster than the organisation's design memory.

MedExpress UK is the largest brand inside HeliosX, a regulated online pharmacy and telehealth platform. The acquisition funnel handles high-volume GLP-1 weight-loss journeys across marketing, product pages, consultation, checkout, verification, clinical review, and retention.

When I started, the design work had no reliable memory. Research findings lived in Slack, brand rules lived across old source material, Figma and code disagreed, analytics questions had competing answers, and AI tools generated plausible screens that ignored MedExpress-specific brand and compliance constraints.

The product-design challenge was not "how do we use AI?" It was how to make AI useful inside a regulated product workflow: give it the right sources, define what it was allowed to decide, expose what it was relying on, and create review gates strong enough that speed did not become risk.

Impact

What changed.

The work produced a reusable AI operating layer for acquisition design, not just a collection of prototype screens.

212 markdown files in the MedExpress knowledge base
60 HTML/CSS rebrand components recreated from the live site
9 evidence-sourced GLP-1 personas for new and repeat customers
41 iterations of the shared MedExpress design-agent bundle shipped to the team

AI product brief

The job was to design the system around the model.

Product design portfolios usually show the final interface, then unpack the reasoning behind it: user need, constraints, role, decision trade-offs, and measured outcome. I use the same structure here, but the interface is partly an operating model. Leo was the product surface; the knowledge base, source hierarchy, answer contract, prompts, critic loop, and generated bundles were the product machinery.

I treated AI as a collaborator with a narrow job description. It could retrieve source material, propose prototype directions, draft copy variants, run heuristic reviews, and identify gaps. It could not invent clinical policy, override brand rules, treat synthetic personas as research, or ship a screen without deterministic and human-readable review.

Context: Markdown sources, analytics packs, personas, brand rules, clinical gates
Model role: Retrieve, synthesize, generate, critique, and prepare handoffs
Evaluation: Lint checks, visual review, specialist critics, terminal states
Handoff: Claude Project bundle, source chips, prompt routes, version history

Architecture

Two repositories: one for what is true, one for what is being tested.

The most important structural decision was separating stable product knowledge from experimental working state. The knowledge base holds brand, clinical, compliance, analytics, flow, component, and customer evidence. The acquisition experiments repo holds hypotheses, prototypes, reviewer outputs, quality-gate state, and plan files.

That split made the work agent-friendly in the same way modern AI design tools use project context or design-system files: stable rules are imported, fast explorations happen in a sandbox, and only reviewed learnings graduate back into the system.

System 01

Knowledge base as AI-readable design memory.

I structured the knowledge base around how a designer reaches for context: overview, design system, pages, flows, components, clinical, compliance, analytics, evaluations, and reports. Every file carries YAML frontmatter, tags, cross-links, and review dates so humans and AI agents can discover the same material.
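To make the frontmatter idea concrete, here is a minimal sketch of how a file's metadata could be read and checked for staleness. It is an illustration only: the field names (`title`, `tags`, `links`, `review_by`), the sample file, and the helper functions are hypothetical, not the actual MedExpress schema, and the parser handles only flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class KnowledgeFile:
    title: str
    tags: list[str]
    links: list[str]
    review_by: date
    body: str


def _as_list(raw: str) -> list[str]:
    # Accepts the inline form "[a, b]" only; a real YAML parser handles far more.
    return [item.strip() for item in raw.strip("[]").split(",") if item.strip()]


def parse_knowledge_file(text: str) -> KnowledgeFile:
    """Split a Markdown file into frontmatter metadata and body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file has no frontmatter block")
    # Locate the closing '---' delimiter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is not closed")
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return KnowledgeFile(
        title=meta["title"],
        tags=_as_list(meta.get("tags", "")),
        links=_as_list(meta.get("links", "")),
        review_by=date.fromisoformat(meta["review_by"]),
        body="\n".join(lines[end + 1:]).strip(),
    )


def is_stale(doc: KnowledgeFile, today: date) -> bool:
    """A file is stale once its review date has passed."""
    return today > doc.review_by


# Hypothetical sample file, not real MedExpress content.
SAMPLE = """---
title: Consultation gates
tags: [clinical, flows]
links: [flows/consultation.md, compliance/advertising.md]
review_by: 2025-01-31
---
Body text lives here.
"""

doc = parse_knowledge_file(SAMPLE)
```

The same metadata serves both audiences: a person can read it at the top of the file, and an agent or indexer can filter on `tags` or flag files where `is_stale` is true.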

The knowledge base gave MedExpress design a stable ground truth: brand rules, consultation gates, CAP/ASA advertising constraints, GPhC prescribing verification expectations, page anatomy, design principles, and customer evidence became searchable, versioned, and reusable.

The key product-design decision was to make the source layer readable before it was clever. Markdown stayed canonical because designers, PMs, writers, and agents could all inspect it. Frontmatter gave retrieval enough structure to rank and filter. Generated bundles and indexes were treated as caches, not truth.
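One way to keep a generated bundle honest as a cache is to rebuild it deterministically from the Markdown sources and compare. The sketch below assumes a bundle is a simple concatenation of source files stamped with a content hash. The header format, file paths, and function names are hypothetical and do not describe how the real bundle was built.

```python
import hashlib


def build_bundle(files: dict[str, str], version: int) -> str:
    """Concatenate Markdown sources in path order and stamp the result with a content hash."""
    parts = [f"<!-- source: {path} -->\n{files[path].strip()}" for path in sorted(files)]
    body = "\n\n".join(parts)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
    header = f"<!-- generated bundle v{version} | sha256:{digest} | do not edit by hand -->"
    return f"{header}\n\n{body}"


def is_current(bundle: str, files: dict[str, str], version: int) -> bool:
    """The bundle is current only if rebuilding from source reproduces it exactly."""
    return bundle == build_bundle(files, version)


# Hypothetical sources, for illustration only.
sources = {
    "design-system/brand-rules.md": "# Brand rules\nUse the approved palette.",
    "clinical/consultation-gates.md": "# Consultation gates\nGate before checkout.",
}
bundle = build_bundle(sources, version=41)
```

Because the source files are sorted by path before hashing, the bundle does not depend on the order in which files were read. Any edit to a source file makes `is_current` return `False`, which signals that the cache needs regenerating.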

Design principle

Markdown became the shared format because it is readable by designers, product managers, writers, and AI tools. JSON was reserved for code-to-code state, not human-authored product knowledge.

System 02

Research and analytics synthesis.

I synthesised customer segmentation, JTBD retention research, interview transcripts, weekly UXR insights, Trustpilot data, GA4, Metabase, and Amplitude into usable design artefacts. The output was not a deck; it was a connected evidence layer.

The Q1-Q9 analytics pack reframed several roadmap assumptions. Paid Social underperformance was largely an in-app webview problem. Homepage weight-loss drag was a CTA discoverability issue. GP-consent drop-off was exit-without-engaging, not opt-out. AV booking plateaued at seven days, so delayed return was not the primary explanation.

For AI product work, this mattered because prompts without evidence quickly become confident theatre. The analytics pack gave agents and humans the same briefable facts: what was measured, what it suggested, what remained uncertain, and which claim was safe to use in a prototype rationale.

Paid Social: 28% of new users, 1.3% of revenue
Homepage: 9.5% page-to-CTA click vs 48.6% category page
GP consent: ~17% drop was disengagement, not toggle refusal

System 03

Prototype workflow with real gates.

I built a prototype quality gate that combines deterministic checks with agentic judgement. A generator creates or revises the prototype, lint and visual audits run first, and then specialist critics review UI/UX, visual brand, copy, compliance, and persona fit. A coordinator moves the run through deterministic terminal states such as approved, awaiting human review, escalated, or budget exhausted.

The point was not to make AI produce pretty screens faster. It was to make AI-assisted design inspectable: every output had a brief, section inventory, state file, iteration folders, screenshots, reviewer findings, and a documented decision trail.

1. Scaffold the run with a brief, section inventory, and page assembly contract.
2. Generate against brand, analytics, compliance, and persona context.
3. Run deterministic lint and visual audit before any critic judgement.
4. Fan out to specialist critics and iterate until a terminal state is reached.
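The steps above can be sketched as a small coordinator loop. This is a simplified illustration, not the actual implementation: the four terminal states come from the description above, but the severity scale (`blocker`, `needs_human`, `minor`), the mapping from severities to terminal states, the iteration budget, and all function names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Terminal(Enum):
    APPROVED = "approved"
    AWAITING_HUMAN_REVIEW = "awaiting_human_review"
    ESCALATED = "escalated"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class Finding:
    critic: str
    severity: str  # hypothetical scale: "blocker", "needs_human", or "minor"
    note: str


Generator = Callable[[list[Finding]], str]  # previous findings -> prototype markup
Check = Callable[[str], list[str]]          # prototype -> deterministic failures
Critic = Callable[[str], list[Finding]]     # prototype -> reviewer findings


def run_gate(generate: Generator, checks: list[Check], critics: list[Critic],
             max_iterations: int = 3) -> tuple[Terminal, int]:
    """Run generate -> deterministic checks -> critics until a terminal state."""
    findings: list[Finding] = []
    for iteration in range(1, max_iterations + 1):
        prototype = generate(findings)
        # Deterministic checks run first; critics never see a failing prototype.
        failures = [msg for check in checks for msg in check(prototype)]
        if failures:
            findings = [Finding("lint", "minor", msg) for msg in failures]
            continue
        findings = [f for critic in critics for f in critic(prototype)]
        severities = {f.severity for f in findings}
        if "blocker" in severities:
            return Terminal.ESCALATED, iteration
        if "needs_human" in severities:
            return Terminal.AWAITING_HUMAN_REVIEW, iteration
        if not findings:
            return Terminal.APPROVED, iteration
        # Only minor findings remain: feed them into the next iteration.
    return Terminal.BUDGET_EXHAUSTED, max_iterations


# Stub generator, check, and critics for demonstration only.
def demo_generate(findings: list[Finding]) -> str:
    return "<main>fixed</main>" if findings else "<main>draft</main>"


def demo_lint(prototype: str) -> list[str]:
    return ["draft marker present"] if "draft" in prototype else []


def quiet_critic(prototype: str) -> list[Finding]:
    return []


def picky_critic(prototype: str) -> list[Finding]:
    return [Finding("copy", "minor", "tighten the headline")]
```

With the stubs, `run_gate(demo_generate, [demo_lint], [quiet_critic])` fails lint on the first pass and is approved on the second, while swapping in `picky_critic` runs until the budget is spent. The loop can only exit through one of the four named states, so every run ends in a state a human can inspect.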

Why this is product design

The interface was not just the prototype page. It was the workflow around the page: what the agent sees, how it reports uncertainty, where deterministic checks run, when a human is required, and how the next iteration inherits the last decision.

System 04

Leo as a team-facing AI product.

Leo was the shared MedExpress design and product assistant created for the HeliosX team. It packaged the knowledge base into a generated Claude Project bundle with identity, answer contract, source hierarchy, task routing, sample prompts, maintenance rules, and version history.

I designed Leo around the questions a designer or PM actually asks: what does the brand allow, what evidence supports this claim, how should this flow handle a clinical gate, which page anatomy should I reuse, and where is the source confidence weak? That made the assistant a product surface for design operations, not a novelty chat window.

The most important interaction pattern was source confidence. Leo had to distinguish curated synthesis from analytics-backed findings, code-backed behaviour, operational policy, and unsupported gaps. If the source was missing, the correct answer was to say so and route the user back to the missing evidence.
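A rough sketch of that pattern follows, assuming each retrieved source carries a confidence label. The five labels mirror the categories named above; the `Source` type, the file path, and the wording of the fallback message are hypothetical, not Leo's actual answer contract.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    CURATED_SYNTHESIS = "curated synthesis"
    ANALYTICS_BACKED = "analytics-backed finding"
    CODE_BACKED = "code-backed behaviour"
    OPERATIONAL_POLICY = "operational policy"
    UNSUPPORTED_GAP = "unsupported gap"


@dataclass
class Source:
    path: str
    kind: SourceKind


def compose_answer(claim: str, sources: list[Source]) -> str:
    """Attach source labels to a claim, or say plainly that evidence is missing."""
    supported = [s for s in sources if s.kind is not SourceKind.UNSUPPORTED_GAP]
    if not supported:
        # No usable evidence: report the gap instead of answering from thin air.
        return f"No source found for: {claim}. Please add or point me to the missing evidence."
    chips = "; ".join(f"{s.path} ({s.kind.value})" for s in supported)
    return f"{claim} [sources: {chips}]"


# Hypothetical path, for illustration only.
answer = compose_answer(
    "Homepage CTA is hard to find",
    [Source("analytics/homepage-cta.md", SourceKind.ANALYTICS_BACKED)],
)
gap = compose_answer("Refund policy for split orders", [])
```

Keeping the label next to each citation lets a reader weigh an analytics-backed finding differently from a curated synthesis without opening the source.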

Identity: What Leo is, who it helps, and which tasks it should route
Contract: Answer format, citations, uncertainty, and refusal behaviour
Maintenance: Freshness checks, source hierarchy, versioned bundle exports
Team value: Less Slack archaeology, faster briefs, safer AI-assisted prototyping

Outcome

The archaeology stage of design work became a reusable product capability.

New MedExpress design tasks now start from known brand, clinical, regulatory, analytics, and customer-research context rather than scattered memory.

The team-facing Claude Project bundle, internally codenamed Leo, gives designers a grounded MedExpress assistant with brand rules, page anatomy, content rules, source hierarchy, and prompts preloaded.

The rebrand recreation gives designers and AI tools a readable HTML/CSS source of truth when Figma, code, and live rollout are not perfectly aligned.

The quality gate made the AI workflow legible enough to critique: generator output, deterministic checks, specialist review, iteration state, and final decision all stay visible.

Reflection

What I would keep improving.

Move retrieval closer to the work surface

The bundle works, but a live RAG layer would let Leo retrieve narrower source context instead of relying on a broad packaged memory.

Close analytics-to-prototype handoff

The next step is a workflow-owned handoff file that turns evidence into prototype briefs without manual rewriting.

Evaluate outputs like product behaviour

I would add scored evals for source use, policy handling, brand fidelity, and persona fit so prototype quality can be measured over time.