Case study / HeliosX / AI product design
Designing Leo and an AI-assisted acquisition workflow for MedExpress UK.
Over five intensive weeks I turned scattered MedExpress UK knowledge into an agent-ready design memory, a generated Claude Project assistant, and a prototype quality gate where AI could help generate, critique, and revise product work without losing brand, clinical, analytics, or compliance context.
The problem
AI could move faster than the organisation's design memory.
MedExpress UK is the largest brand inside HeliosX, a regulated online pharmacy and telehealth platform. The acquisition funnel handles high-volume GLP-1 weight-loss journeys across marketing, product pages, consultation, checkout, verification, clinical review, and retention.
When I started, the design work had no reliable memory. Research findings lived in Slack, brand rules lived across old source material, Figma and code disagreed, analytics questions had competing answers, and AI tools generated plausible screens that ignored MedExpress-specific brand and compliance constraints.
The product-design challenge was not "how do we use AI?" It was how to make AI useful inside a regulated product workflow: give it the right sources, define what it was allowed to decide, expose what it was relying on, and create review gates strong enough that speed did not become risk.
Impact
What changed.
The work produced a reusable AI operating layer for acquisition design, not just a collection of prototype screens.
AI product brief
The job was to design the system around the model.
Product design portfolios usually show the final interface, then unpack the reasoning behind it: user need, constraints, role, trade-offs, and measured outcome. I use the same structure here, but the interface is partly an operating model. Leo was the product surface; the knowledge base, source hierarchy, answer contract, prompts, critic loop, and generated bundles were the product machinery.
I treated AI as a collaborator with a narrow job description. It could retrieve source material, propose prototype directions, draft copy variants, run heuristic reviews, and identify gaps. It could not invent clinical policy, override brand rules, treat synthetic personas as research, or ship a screen without deterministic and human-readable review.
Architecture
Two repositories: one for what is true, one for what is being tested.
The most important structural decision was separating stable product knowledge from experimental working state. The knowledge base holds brand, clinical, compliance, analytics, flow, component, and customer evidence. The acquisition experiments repo holds hypotheses, prototypes, reviewer outputs, quality-gate state, and plan files.
That split made the work agent-friendly in the same way that project context and design-system files do in modern AI design tools: stable rules are imported, fast explorations happen in a sandbox, and only reviewed learnings graduate back into the system.
[Diagram: problem space and solution space]
System 01
Knowledge base as AI-readable design memory.
I structured the knowledge base around how a designer reaches for context: overview, design system, pages, flows, components, clinical, compliance, analytics, evaluations, and reports. Every file carries YAML frontmatter, tags, cross-links, and review dates so humans and AI agents can discover the same material.
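As a rough illustration of how frontmatter makes a file discoverable, the sketch below parses a flat frontmatter block and flags files whose review date has passed. It is not the actual knowledge-base tooling: the field names (`title`, `tags`, `review_date`), the sample file, and the helper names are all assumptions made for the example, and a real setup would more likely use a YAML library than this hand-rolled parser.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class KnowledgeFile:
    title: str
    tags: list[str]
    review_date: date
    body: str


def parse_knowledge_file(text: str) -> KnowledgeFile:
    """Split a Markdown file into flat frontmatter fields and a body."""
    # Frontmatter is the block between the first two '---' markers.
    _, header, body = text.split("---", 2)
    fields = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Tags are assumed to be written inline, e.g. "[clinical, flows]".
    raw_tags = fields.get("tags", "").strip("[]")
    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    return KnowledgeFile(
        title=fields["title"],
        tags=tags,
        review_date=date.fromisoformat(fields["review_date"]),
        body=body.strip(),
    )


def is_stale(doc: KnowledgeFile, today: date) -> bool:
    """A file counts as stale once its review date has passed."""
    return doc.review_date < today


# Hypothetical file content, for illustration only.
SAMPLE = """---
title: Consultation gates
tags: [clinical, flows]
review_date: 2025-01-31
---
Body text goes here.
"""

doc = parse_knowledge_file(SAMPLE)
print(doc.tags, is_stale(doc, date(2025, 2, 1)))
```

The same parsed fields can drive filtering by tag for both a human search and an agent's retrieval step, which is the point of keeping the structure in the Markdown file itself.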
The knowledge base gave MedExpress design a stable ground truth: brand rules, consultation gates, CAP/ASA advertising constraints, GPhC prescribing verification expectations, page anatomy, design principles, and customer evidence became searchable, versioned, and reusable.
The key product-design decision was to make the source layer readable before it was clever. Markdown stayed canonical because designers, PMs, writers, and agents could all inspect it. Frontmatter gave retrieval enough structure to rank and filter. Generated bundles and indexes were treated as caches, not truth.
Design principle
Markdown became the shared format because it is readable by designers, product managers, writers, and AI tools. JSON was reserved for code-to-code state, not human-authored product knowledge.
System 02
Research and analytics synthesis.
I synthesised customer segmentation, JTBD retention research, interview transcripts, weekly UXR insights, Trustpilot data, GA4, Metabase, and Amplitude into usable design artefacts. The output was not a deck; it was a connected evidence layer.
The Q1-Q9 analytics pack reframed several roadmap assumptions. Paid Social underperformance was largely an in-app webview problem. Homepage weight-loss drag was a CTA discoverability issue. GP-consent drop-off was exit-without-engaging, not opt-out. AV booking asymptoted at seven days, so delayed return was not the primary explanation.
For AI product work, this mattered because prompts without evidence quickly become confident theatre. The analytics pack gave agents and humans the same briefable facts: what was measured, what it suggested, what remained uncertain, and which claim was safe to use in a prototype rationale.
System 03
Prototype workflow with real gates.
I built a prototype quality gate that combines deterministic checks with agentic judgement. A generator creates or revises the prototype, lint and visual audits run first, and then specialist critics review UI/UX, visual brand, copy, compliance, and persona fit. A coordinator moves the run through deterministic terminal states such as approved, awaiting human review, escalated, or budget exhausted.
The point was not to make AI produce pretty screens faster. It was to make AI-assisted design inspectable: every output had a brief, section inventory, state file, iteration folders, screenshots, reviewer findings, and a documented decision trail.
- 01: Scaffold the run with a brief, section inventory, and page assembly contract.
- 02: Generate against brand, analytics, compliance, and persona context.
- 03: Run deterministic lint and visual audit before any critic judgement.
- 04: Fan out to specialist critics and iterate until a terminal state is reached.
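The steps above can be sketched as a small coordinator loop. This is a simplified illustration, not the real gate: the four terminal states come from the description above, but the severity labels (`escalate`, `needs_human`, `revise`), the rule mapping them to states, the iteration budget, and every function name are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    APPROVED = "approved"
    AWAITING_HUMAN_REVIEW = "awaiting_human_review"
    ESCALATED = "escalated"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class Finding:
    critic: str
    severity: str  # hypothetical levels: "escalate", "needs_human", "revise"
    note: str


def run_gate(generate, checks, critics, max_iterations=3):
    """Drive one run to a terminal state; returns (state, iterations used)."""
    feedback: list[str] = []
    for iteration in range(1, max_iterations + 1):
        prototype = generate(feedback)
        # Deterministic checks run first; critics never see a failing build.
        errors = [msg for check in checks for msg in check(prototype)]
        if errors:
            feedback = errors
            continue
        findings = [f for critic in critics for f in critic(prototype)]
        severities = {f.severity for f in findings}
        if "escalate" in severities:
            return RunState.ESCALATED, iteration
        if "needs_human" in severities:
            return RunState.AWAITING_HUMAN_REVIEW, iteration
        if not findings:
            return RunState.APPROVED, iteration
        # Remaining findings are revisable: feed them to the next iteration.
        feedback = [f"{f.critic}: {f.note}" for f in findings]
    return RunState.BUDGET_EXHAUSTED, max_iterations


# Toy stand-ins for the generator, a lint check, and one critic.
def demo_generate(feedback):
    return "revised draft" if feedback else "draft"


def demo_lint(prototype):
    return []


def demo_copy_critic(prototype):
    if prototype == "draft":
        return [Finding("copy", "revise", "Headline is too long")]
    return []


state, used = run_gate(demo_generate, [demo_lint], [demo_copy_critic])
print(state.value, used)
```

Keeping the state transitions in plain code, rather than asking a model to decide when the run is finished, is what makes the terminal states deterministic even though the critics themselves are not.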
Why this is product design
The interface was not just the prototype page. It was the workflow around the page: what the agent sees, how it reports uncertainty, where deterministic checks run, when a human is required, and how the next iteration inherits the last decision.
System 04
Leo as a team-facing AI product.
Leo was the shared MedExpress design and product assistant created for the HeliosX team. It packaged the knowledge base into a generated Claude Project bundle with identity, answer contract, source hierarchy, task routing, sample prompts, maintenance rules, and version history.
I designed Leo around the questions a designer or PM actually asks: what does the brand allow, what evidence supports this claim, how should this flow handle a clinical gate, which page anatomy should I reuse, and where is the source confidence weak? That made the assistant a product surface for design operations, not a novelty chat window.
The most important interaction pattern was source confidence. Leo had to distinguish curated synthesis from analytics-backed findings, code-backed behaviour, operational policy, and unsupported gaps. If the source was missing, the correct answer was to say so and route the user back to the missing evidence.
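One way to picture the source-confidence pattern is as a small answer-framing function. The five source kinds below are taken from the description above; the `Source` type, the output wording, and the example file path are hypothetical, and Leo itself expressed this contract through its Claude Project instructions rather than through code like this.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    CURATED_SYNTHESIS = "curated synthesis"
    ANALYTICS_BACKED = "analytics-backed finding"
    CODE_BACKED = "code-backed behaviour"
    OPERATIONAL_POLICY = "operational policy"
    UNSUPPORTED_GAP = "unsupported gap"


@dataclass
class Source:
    path: str  # hypothetical knowledge-base path
    kind: SourceKind


def frame_answer(claim: str, sources: list[Source]) -> str:
    """Attach source kinds to a claim, or report the gap when none exist."""
    if not sources:
        return (
            f"Unsupported gap: no source found for '{claim}'. "
            "Locate or add the missing evidence before using this claim."
        )
    cited = "; ".join(f"{s.path} ({s.kind.value})" for s in sources)
    return f"{claim} [sources: {cited}]"


print(frame_answer(
    "Homepage weight-loss drag was a CTA discoverability issue",
    [Source("analytics/homepage-cta.md", SourceKind.ANALYTICS_BACKED)],
))
print(frame_answer("Checkout copy may mention delivery times", []))
```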
Outcome
The archaeology stage of design work became a reusable product capability.
New MedExpress design tasks now start from known brand, clinical, regulatory, analytics, and customer-research context rather than scattered memory.
The team-facing Claude Project bundle, internally codenamed Leo, gives designers a grounded MedExpress assistant with brand rules, page anatomy, content rules, source hierarchy, and prompts preloaded.
The rebrand recreation gives designers and AI tools a readable HTML/CSS source of truth when Figma, code, and live rollout are not perfectly aligned.
The quality gate made the AI workflow legible enough to critique: generator output, deterministic checks, specialist review, iteration state, and final decision all stay visible.
Reflection
What I would keep improving.
Move retrieval closer to the work surface
The bundle works, but a live RAG layer would let Leo retrieve narrower source context instead of relying on a broad packaged memory.
Close analytics-to-prototype handoff
The next step is a workflow-owned handoff file that turns evidence into prototype briefs without manual rewriting.
Evaluate outputs like product behaviour
I would add scored evals for source use, policy handling, brand fidelity, and persona fit so prototype quality can be measured over time.
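A minimal sketch of what such scoring could look like, assuming each run is scored from 0.0 to 1.0 on the four dimensions named above. The scale, the `EvalResult` shape, and the sample scores are invented for illustration; no such eval exists yet.

```python
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = ("source_use", "policy_handling", "brand_fidelity", "persona_fit")


@dataclass
class EvalResult:
    run_id: str
    scores: dict[str, float]  # assumed scale: 0.0 (fail) to 1.0 (pass)


def summarise(results: list[EvalResult]) -> dict[str, float]:
    """Average each dimension across runs so trends can be tracked over time."""
    return {d: mean(r.scores[d] for r in results) for d in DIMENSIONS}


# Made-up scores, for illustration only.
runs = [
    EvalResult("run-a", {"source_use": 1.0, "policy_handling": 1.0,
                         "brand_fidelity": 0.5, "persona_fit": 0.5}),
    EvalResult("run-b", {"source_use": 0.5, "policy_handling": 1.0,
                         "brand_fidelity": 1.0, "persona_fit": 0.5}),
]
print(summarise(runs))
```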