Case Study 01: Conversational AI

Designing the Handoff Between AI and Humans

When a chatbot can't help, what should that moment feel like? I designed the conversational AI experience for homedepot.com, defining the line between automation and human connection.

Role Lead Product Designer
Timeline 2021 - 2025
Tools Voiceflow, Figma, Sprinklr
Team 1 Designer, 2 PMs, 4 Engineers
5 min read
Feb 2026

The chatbot worked. Customers still hated it.

Home Depot's existing chat system could answer questions. It could route tickets. By every internal metric, it was performing. But customers weren't satisfied, and the support team was overwhelmed by escalations that felt abrupt and confusing.

The problem wasn't the AI's accuracy. It was the experience around it. Customers didn't know when they were talking to a bot versus a human. The handoff between them felt like falling through a trapdoor. And when the AI couldn't help, it just... stopped. No explanation. No warmth. No path forward.

I was brought in to redesign the entire conversational experience, end to end.

Understanding the constraints before designing around them.

Before sketching anything, I needed to understand what the AI could actually do and where it broke. I spent two weeks mapping the existing system: what triggered the bot, how it parsed intent, where confidence scores dropped, and what happened when they did.

I also interviewed support associates. They had a perspective no dashboard could give me. They knew which escalations felt earned (complex problems that genuinely needed a human) and which felt broken (the bot just gave up mid-conversation).

Your artifact goes here
Annotated map of the existing system: intent triggers, confidence thresholds, and the failure points you identified during research. Could be a screenshot of a Figma board, a photo of sticky notes, or a stylized diagram.

"Every time the bot transfers me, I have to repeat everything. It's like calling a doctor's office and being put on hold three times."

Paraphrased from customer feedback

When should AI handle it, and when should a human?

This was the core design decision. Not a UI question. A product philosophy question. The answer shaped everything downstream: the dialog flows, the escalation logic, the tone of every message.

I mapped every conversation type against two axes: complexity (can the AI reliably handle this?) and emotional stakes (how frustrated or anxious is the customer likely to be?). Simple and low-stakes? AI handles it. Complex or high-stakes? Route to a human. The interesting work was everything in between.

Your artifact goes here
2x2 matrix: Complexity vs. Emotional Stakes. Where AI handles it, where humans handle it, and the ambiguous middle zone where the real design work happened. This is the kind of framework a hiring manager will screenshot.
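
For readers who want the routing rule spelled out, here is a minimal sketch in Python. The names and thresholds (ConversationContext, route_conversation, the 0.4 and 0.7 cutoffs) are illustrative assumptions, not the production configuration, which lived in Voiceflow's flow logic.

from dataclasses import dataclass

@dataclass
class ConversationContext:
    # Hypothetical fields for illustration only.
    complexity: float        # 0.0 (simple) to 1.0 (complex), from intent classification
    emotional_stakes: float  # 0.0 (routine) to 1.0 (frustrated or anxious), inferred from
                             # signals like repeat contacts or order-problem intents

def route_conversation(ctx: ConversationContext) -> str:
    """Decide who should own the conversation, per the 2x2 framework."""
    if ctx.complexity < 0.4 and ctx.emotional_stakes < 0.4:
        return "ai"             # simple, low-stakes: the bot handles it end to end
    if ctx.complexity >= 0.7 or ctx.emotional_stakes >= 0.7:
        return "human"          # complex or high-stakes: warm handoff to an associate
    return "ai_with_offer"      # the ambiguous middle: AI attempts, a person is one tap away

The middle branch is where the real design work lived; the three decisions below are all about that zone.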

Three design choices that shaped the experience.

Click each to see what I considered and what I chose.

Decision 01

Should the bot identify itself as a bot?

Some companies obscure the line between AI and human. We went the opposite direction. The bot always introduces itself clearly. Transparency upfront actually increased trust and reduced frustration when escalations happened.

Considered

Blend bot and human personas into one seamless experience. Minimize awareness of who or what is responding.

Chosen

Always transparent. The bot says who it is. When a human takes over, the customer knows. Trust is built on honesty, not illusion.

Decision 02

What should the handoff moment feel like?

The old experience was a hard cut: "Transferring you to an agent." Then silence. The redesigned handoff passes context forward so the customer never repeats themselves, sets expectations, and maintains conversational continuity.

Your artifact goes here
Before/After: Old handoff (hard cut, dead air, context lost) vs. new handoff (summary passed, wait time set, bot stays present)
Considered

Instant transfer. Fast, but cold. Customer lands in a queue with no context and repeats their story.

Chosen

Warm handoff. Bot summarizes the conversation, sets a time expectation, and stays present until the associate joins. No dead air.
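
A rough sketch of what "passing context forward" means in practice, assuming a hypothetical HandoffPacket shape. The real payload was defined in the Voiceflow and Sprinklr integration, so the field names here are placeholders.

from dataclasses import dataclass, field

@dataclass
class HandoffPacket:
    # Illustrative shape only; not the shipped schema.
    customer_intent: str                 # e.g. "return an online order"
    conversation_summary: str            # bot-written recap so the customer never repeats themselves
    attempted_resolutions: list[str] = field(default_factory=list)
    estimated_wait_minutes: int = 3      # surfaced to the customer to set expectations

def announce_handoff(packet: HandoffPacket) -> str:
    """The message the bot sends while it stays present until the associate joins."""
    return (
        "I'm bringing in a teammate who can help with this. "
        "I've shared what we've covered so far, so you won't have to repeat yourself. "
        f"They should be with you in about {packet.estimated_wait_minutes} minutes."
    )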

Decision 03

How should the AI handle "I don't know"?

The hardest edge case: the model's confidence is low, but the query doesn't clearly need a human. The old system would loop: "I'm sorry, can you rephrase that?" The new system gives the AI one honest attempt, then offers a clear choice.

Your artifact goes here
Low-confidence flow: AI attempts once, then offers the user a clear fork. "Try another question" or "Talk to a person." No loops, no dead ends.
Considered

Keep trying. Ask the customer to rephrase. Attempt to resolve without escalation. Felt like a trap.

Chosen

One honest attempt, then give the customer the choice. Respect their time. "I'm not sure I can help with this. Want to try another question, or talk to a person?"
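
Sketched as logic, the low-confidence flow looks something like the following. The LOW_CONFIDENCE threshold and function names are hypothetical, not values from the shipped system.

# Hypothetical threshold, tuned against the failure points mapped during research.
LOW_CONFIDENCE = 0.5

def handle_low_confidence(query: str, confidence: float, already_attempted: bool) -> dict:
    """One honest attempt, then a clear fork. No rephrase loops, no dead ends."""
    if confidence >= LOW_CONFIDENCE:
        return {"action": "answer"}
    if not already_attempted:
        # First low-confidence turn: the bot tries once, transparently.
        return {
            "action": "attempt",
            "preface": "I think this is what you're asking, but tell me if I'm off:",
        }
    # Second low-confidence turn: offer the choice instead of looping on "please rephrase."
    return {
        "action": "offer_fork",
        "message": "I'm not sure I can help with this. Want to try another question, or talk to a person?",
        "options": ["try_another_question", "talk_to_a_person"],
    }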

This work led to Magic Apron.

The patterns I established for the chatbot (transparency about AI identity, warm handoffs, graceful failure) became the foundation for Magic Apron, Home Depot's generative AI shopping assistant. I joined the early design team and helped shape the interaction model before the product had established patterns.

The core question was the same, just bigger: when a generative AI can answer almost anything, how do you design for the moments it shouldn't?

Your artifact goes here
How the three core patterns (transparency, warm handoff, graceful failure) evolved from the chatbot into Magic Apron's interaction model. Could be a simple annotated timeline or side-by-side comparison.

What shipped and what it changed.

The redesigned conversational experience launched across homedepot.com, handling both pre- and post-transaction inquiries through a unified chatbot and live chat system built in Voiceflow.

~25% reduction in average handle time
3 core patterns adopted by Magic Apron
1 unified system replacing fragmented tools

Placeholder: Escalation rate change, e.g. "X% reduction in unnecessary escalations"

Placeholder: Customer satisfaction signal, e.g. CSAT improvement or qualitative associate feedback

Placeholder: Adoption metric, e.g. "Patterns became standard for all new conversational AI work"

Placeholder: Associate feedback, e.g. "Less context-switching during handoffs"

If I Could Tend This Garden Again

I would have pushed harder for a feedback loop from customers after each AI interaction. We measured handle time and escalation rates, but we didn't systematically capture whether the customer felt heard. Metrics told us the system was faster. They didn't tell us if it felt better. In my next conversational AI project, I'd design the measurement into the experience from day one.

Next Case Study
Making Customer Service Associates Faster with AI →
AI-Assisted Tooling