Website Chatbot: A Pare Studio Case Study

Most people who land on a website have one or two questions standing between them and an enquiry. The answers exist, but they live in someone's head, or three clicks deep on the site, or only between 9 and 5. By the time the visitor finds them, they've moved on to the next tab. The bottleneck isn't the offer. It's the gap between the question and the answer.

Case Study, not Delivered Project.

How we'd approach this kind of problem.

5 minutes: leads contacted within 5 minutes are 21x more likely to qualify than those contacted after 30 minutes (InsideSales/Harvard Business Review).

You probably have this problem if…

Analytics show steady traffic but the contact form barely fires.
The phone is busy during the day and silent after 5pm, while the site visits keep arriving.
The same handful of pre-sales questions come up in every call. Area, timeline, rough price, process.
Visitors land on a service page, scroll, and leave before they ever reach the form.
You added a FAQ page once and nobody reads it.

Any two of those is a strong signal. All of them and you're paying for traffic that turns into nothing.

How the pattern works

Imagine a receptionist who has memorised your services, your coverage area, your usual timelines, your rough price bands, and the answer to every pre-sales question you've ever heard. They sit at the desk at midnight and at midday. When a question comes up that they aren't sure about, they say so, take the visitor's details, and pass the message on. When the visitor is ready to book, they collect what you need and put it in your inbox. That's the system.

Five things have to work for it to be useful.

From a visitor's question to a lead in your inbox, with a cited answer at every step.

1. It listens to the question in plain English

A small chat widget sits in the corner of every page. The visitor types the way they'd speak: "Do you cover Warrington?" "How long does a conservatory take?" "Do I need planning permission?" The widget reads the message, keeps the conversation so far in context, and decides what to do next. It loads asynchronously, so it doesn't hold up the rest of the page. To the visitor it looks and behaves like a hosted SaaS widget. Everything underneath is yours.
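Keeping "the conversation so far in context" usually means resending a bounded slice of recent messages with every request. The sketch below is a minimal TypeScript version of that idea. It assumes a hypothetical `Turn` shape and uses a character budget as a rough stand-in for a real token count.

```typescript
// Hypothetical message shape; a real widget's format may differ.
type Role = "visitor" | "assistant";

interface Turn {
  role: Role;
  text: string;
}

// Keep the most recent turns whose combined length fits within maxChars.
// Characters are a crude proxy for tokens (assumption for illustration).
function trimHistory(history: Turn[], maxChars: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  // Walk backwards so the newest turns are kept first.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = history[i].text.length;
    if (used + cost > maxChars) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

const history: Turn[] = [
  { role: "visitor", text: "Do you cover Warrington?" },
  { role: "assistant", text: "Yes, Warrington is inside our coverage area." },
  { role: "visitor", text: "How long does a conservatory take?" },
];

console.log(trimHistory(history, 80).map((t) => t.role));
```

Walking backwards means the oldest turns fall away first. A production build would count tokens rather than characters.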

2. It looks up the answer in your own information

The system has been given a knowledge base written in your voice. For a small business that's a few pages of services, areas covered, process, pricing guidance, credentials, and the questions that come up most often. For larger catalogues that's a proper retrieval index across product pages, specifications, and policy documents. The retrieval mechanics are the same ones we use in our document Q&A pattern. Each passage is turned into an embedding, a list of numbers that acts as a fingerprint of its meaning. The visitor's question is embedded the same way, and the passages whose fingerprints sit closest to the question's come back as source material.
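The "closest passages" step can be sketched as cosine similarity over precomputed embeddings. This is a minimal in-memory illustration that assumes the vectors already exist. The three-dimensional vectors and page paths are made up; real embedding models return far more dimensions.

```typescript
// A knowledge-base passage with a precomputed embedding (made-up values).
interface Passage {
  source: string; // page the passage came from, reused later for citations
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k passages whose embeddings are most similar to the question's.
function topK(question: number[], passages: Passage[], k: number): Passage[] {
  return passages
    .map((p) => ({ p, score: cosineSimilarity(question, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.p);
}

const passages: Passage[] = [
  { source: "/areas", text: "Our coverage area includes Warrington.", embedding: [0.9, 0.1, 0.0] },
  { source: "/process", text: "A conservatory usually takes a few weeks.", embedding: [0.1, 0.9, 0.1] },
  { source: "/pricing", text: "Prices depend on the survey.", embedding: [0.0, 0.2, 0.9] },
];

// Pretend embedding of "Do you cover Warrington?"
const questionVector = [0.8, 0.2, 0.1];
console.log(topK(questionVector, passages, 1).map((p) => p.source));
```

At catalogue scale the same ranking would run inside Postgres instead. pgvector's `<=>` operator returns cosine distance, so ordering by it ascending gives the same order as sorting by similarity descending.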

The line between "small enough for a system prompt" and "needs proper retrieval" sits somewhere around fifty pages of source material. Below that, stuffing everything into the prompt is simpler, cheaper, and behaves predictably. Above it, the model starts to lose track of where the answer lives and retrieval starts to pay for itself.
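The fifty-page rule of thumb amounts to a single branch. The sketch below assumes a hypothetical `Page` type and `Retriever` callback. The keyword retriever is only a stand-in so the example runs; it is not the embedding search described above.

```typescript
interface Page {
  url: string;
  text: string;
}

// Hypothetical retriever signature; in a large build this would query the vector index.
type Retriever = (question: string, pages: Page[]) => Page[];

// Up to roughly fifty pages, send everything in the system prompt.
// Above that, send only what retrieval returns. The threshold is a judgement call.
function knowledgeForQuestion(
  question: string,
  pages: Page[],
  retrieve: Retriever,
  pageThreshold = 50,
): Page[] {
  if (pages.length <= pageThreshold) return pages;
  return retrieve(question, pages);
}

// Stand-in retriever for the example: naive keyword match, not embeddings.
const keywordRetriever: Retriever = (question, pages) => {
  const words = question
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3);
  return pages.filter((p) => words.some((w) => p.text.toLowerCase().includes(w)));
};
```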

3. It writes the answer, citing what it used

The language model receives the visitor's question, the conversation so far, and the retrieved passages. It writes a direct reply and links to the page the answer came from. The system prompt tells it not to guess. If the answer isn't in the retrieved passages, it says so plainly and offers to take a message. That refusal behaviour does more than anything else to keep a customer-facing bot trustworthy. Published work on retrieval-augmented generation generally reports fewer fabricated answers than an ungrounded chatbot produces. Grounding reduces the problem rather than removing it.
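One way to express the grounding and refusal rules is to number the retrieved passages inside the system prompt. This sketch uses hypothetical names (`buildSystemPrompt`, `RetrievedPassage`). The rule wording is illustrative rather than a tested production prompt. The conversation history would travel separately as chat messages.

```typescript
interface RetrievedPassage {
  url: string;
  text: string;
}

// Assemble the system prompt for one turn (illustrative wording).
function buildSystemPrompt(businessName: string, passages: RetrievedPassage[]): string {
  // Number each source so the model can point back to the page it used.
  const sources = passages.map((p, i) => `[${i + 1}] ${p.url}\n${p.text}`).join("\n\n");
  return [
    `You answer questions from visitors to the ${businessName} website.`,
    "Answer only from the numbered sources below.",
    "After each answer, link the URL of the source you used.",
    "If the sources do not contain the answer, say you don't know and offer to take a message.",
    "",
    "Sources:",
    sources || "(no sources retrieved)",
  ].join("\n");
}

const prompt = buildSystemPrompt("Example Builders", [
  { url: "/areas", text: "Our coverage area includes Warrington." },
]);
console.log(prompt);
```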

4. It works out whether the visitor wants to enquire

Every message is classified before the reply is sent. Browsing questions stay in answer mode. Buying signals switch to capture. Anything outside the bot's remit lands with a human.

The same model that writes the answer also classifies intent on every message. Browsing questions ("how does it work", "what areas do you cover") stay in answer mode. Buying signals ("can someone come round", "what would it cost for my property", "I want a quote") switch the conversation into capture mode. Anything sensitive, contested, or outside the configured scope triggers a human handover. The classifier is a thin layer of prompting and a small set of structured outputs. It doesn't need its own model, and trying to run it as a separate microservice is usually overkill at this stage.
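The "small set of structured outputs" could be as simple as a JSON object with an intent label, which the server validates before acting on it. The sketch assumes a hypothetical `{"intent": ..., "reason": ...}` shape. Two behaviours here are design assumptions, not part of the pattern as described: malformed output falls back to a human handover, and capture mode is "sticky".

```typescript
// The three routes described in the text.
type Intent = "browse" | "capture" | "handover";

interface IntentResult {
  intent: Intent;
  reason: string;
}

const INTENTS: readonly Intent[] = ["browse", "capture", "handover"];

// Validate the model's JSON; anything malformed goes to a human (assumption).
function parseIntent(raw: string): IntentResult {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data === "object" && data !== null) {
      const obj = data as Record<string, unknown>;
      if (typeof obj.intent === "string" && (INTENTS as readonly string[]).includes(obj.intent)) {
        return {
          intent: obj.intent as Intent,
          reason: typeof obj.reason === "string" ? obj.reason : "",
        };
      }
    }
  } catch {
    // Invalid JSON: fall through to the safe default.
  }
  return { intent: "handover", reason: "unparseable classifier output" };
}

type Mode = "answer" | "capture" | "handover";

// Assumption: once a conversation reaches capture or handover, it stays there.
function nextMode(current: Mode, intent: Intent): Mode {
  if (current === "handover" || intent === "handover") return "handover";
  if (current === "capture" || intent === "capture") return "capture";
  return "answer";
}
```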

5. It captures the lead in the format your team already uses

Once the conversation is in capture mode, the bot asks for the few details your team actually needs: name, contact, postcode, project type, rough timing. Those details land in the same lead pipeline as your phone enquiries and contact form, so nothing has to be re-keyed. If the visitor asks for something the bot isn't allowed to commit to (a firm price, a confirmed appointment, anything contractual), it offers a handover instead. That can be a callback slot, a contact form, or a "we'll be in touch first thing" message, depending on how you want it configured. Notifications go out the moment a hot lead lands, so the owner knows before the visitor closes the tab.
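Capture mode mostly comes down to tracking which of the needed details are still missing. This sketch uses hypothetical field names taken from the list of details above.

```typescript
// Details the bot collects; field names are hypothetical.
interface Lead {
  name?: string;
  contact?: string; // phone number or email address
  postcode?: string;
  projectType?: string;
  timing?: string;
}

// The order here is the order the bot asks for them.
const REQUIRED_FIELDS: (keyof Lead)[] = ["name", "contact", "postcode", "projectType", "timing"];

// Fields that are absent or blank still need asking for.
function missingFields(lead: Lead): (keyof Lead)[] {
  return REQUIRED_FIELDS.filter((field) => !lead[field]?.trim());
}

// A lead is ready for the shared pipeline once nothing is missing.
function isComplete(lead: Lead): boolean {
  return missingFields(lead).length === 0;
}
```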

The default stack

Chat widget: Custom embed script (WordPress, Squarespace, Webflow, plain HTML)
Transport: Server-sent events for streamed replies
Knowledge base: System prompt for small sites, pgvector retrieval for large catalogues
Embeddings (large knowledge bases): Voyage AI
Storage: Neon Postgres with pgvector
Answer generation & intent: Claude Sonnet (Anthropic)
Lead pipeline: Shared with phone, WhatsApp, and form enquiries
Notifications: Twilio SMS plus email to the owner

Two choices matter most.

Custom embed over a hosted SaaS widget. Most hosted widgets ship with a generic personality, push their own branding, cap the conversation logic, and meter usage per visitor. A small custom widget is a few hundred lines of code and gives you full control over the tone, the refusal behaviour, the handover rules, and where the lead lands. The cost difference becomes real well before the volume becomes interesting.

Server-sent events over WebSockets. The reply needs to stream token by token to feel responsive. SSE is the simpler protocol. It runs over ordinary HTTP and passes through most CDNs and corporate proxies with little configuration; response buffering is the usual thing to switch off. It is also enough for one-way streaming from server to client. Visitor messages go up as ordinary HTTPS requests. WebSockets only earn their complexity if you also want live human takeover inside the same widget. In that case, Twilio Conversations or a similar service is a better path than building it yourself.
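To show how little SSE involves, here is a sketch of the wire format. Each payload line goes out as a `data:` line, and a blank line ends the event.

In the browser, `EventSource` does this parsing for you, but it only issues GET requests. If the widget reads the stream with `fetch` instead (for example, so the question can go in a POST body), it needs a small parser like the second function. Function names are hypothetical. The parser assumes `\n` line endings and skips `id:` and `retry:` fields.

```typescript
// Format one server-sent event: one "data:" line per payload line, then a blank line.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

interface SseEvent {
  event: string;
  data: string;
}

// Parse the complete events in a buffered chunk of the stream.
// Any trailing partial event is returned as `rest`, to prepend to the next chunk.
function parseSseEvents(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? "";
  for (const block of blocks) {
    let event = "message"; // default event type when none is given
    const data: string[] = [];
    for (const line of block.split("\n")) {
      // Strip the field name and at most one leading space from the value.
      if (line.startsWith("event:")) event = line.slice(6).replace(/^ /, "");
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```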

When this isn't the right fit

The pattern is powerful, but it's the wrong tool for some problems.

As your only support channel. A chatbot is a front door, not a service team. If a visitor is stuck mid-purchase, frustrated with a product, or needs to discuss something sensitive, the system has to hand off cleanly. Build this pattern alongside a phone number, an email address, and a contact form. Never as a replacement for all three.

Quoting firm prices. For any business where the price genuinely depends on the survey, the property, the spec, or the supplier of the day, the bot should not commit. It can quote bands ("most jobs in this range fall between £X and £Y"), but a firm number must come from a human. The bot's job is to capture enough detail for the quote to be accurate, not to write the quote.

Same-day appointments without human confirmation. The bot can offer provisional slots, but a same-day site visit committed by an unsupervised AI is a recipe for double-bookings, no-shows, and an irritated tradesperson. Treat any visit the bot proposes as provisional until a human confirms.

Regulated advice. For legal, medical, or financial questions the disclaimers and refusal behaviour need to be designed with whoever oversees compliance, and the bot must escalate every borderline case. The pattern still works, but the build is heavier and the audit conversation comes first.
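The rule that any visit the bot proposes stays provisional until a human confirms it can be enforced in the data model rather than left to the prompt. This sketch uses a hypothetical `Booking` record.

```typescript
type BookingStatus = "provisional" | "confirmed" | "declined";

// Hypothetical booking record.
interface Booking {
  slot: string; // e.g. an ISO 8601 date-time such as "2025-06-03T10:00"
  status: BookingStatus;
  confirmedBy?: string;
}

// Anything the bot proposes starts life as provisional.
function proposeSlot(slot: string): Booking {
  return { slot, status: "provisional" };
}

// Only a named person can turn a provisional booking into a confirmed one.
function confirmBooking(booking: Booking, staffMember: string): Booking {
  if (!staffMember.trim()) throw new Error("a named person must confirm the visit");
  if (booking.status !== "provisional") throw new Error(`cannot confirm a ${booking.status} booking`);
  return { ...booking, status: "confirmed", confirmedBy: staffMember };
}
```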

What to expect

Implementation time: 3–5 weeks for a typical first build, including a knowledge-base session with the owner and a week of supervised live tuning.

Deployment options: Cloud-hosted by default behind a single embed script. The widget works on every common CMS and on plain HTML.

Infrastructure cost: Roughly £30–120 per month for a small business site, covering the language model, storage, and notifications. Cost scales with conversation volume.

Typical reply latency: First token in under a second, full streamed answer in 3–8 seconds. Fast enough to feel like a person typing back, not a form submission.

After-hours capture: For service businesses with daytime phone cover, expect 25–45% of captured leads to arrive outside working hours, when the phone would have gone unanswered. The exact split depends on your traffic patterns.

Secondary benefits: A weekly transcript review showing the exact questions visitors are asking. Useful for new website copy, new service pages, and sales script updates.
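Measuring the after-hours split for your own site needs only the timestamps on captured leads. The sketch assumes working hours of 09:00 to 17:00, Monday to Friday, and counts weekends as after hours. Both are assumptions to adjust for each business.

```typescript
// Assumed working hours: 09:00-17:00, Monday to Friday. Uses the Date's local time zone.
function isAfterHours(d: Date, openHour = 9, closeHour = 17): boolean {
  const day = d.getDay(); // 0 = Sunday, 6 = Saturday
  if (day === 0 || day === 6) return true;
  const hour = d.getHours();
  return hour < openHour || hour >= closeHour;
}

// Fraction of captured leads that arrived outside working hours.
function afterHoursShare(leadTimes: Date[]): number {
  if (leadTimes.length === 0) return 0;
  const after = leadTimes.filter((d) => isAfterHours(d)).length;
  return after / leadTimes.length;
}
```

A real report should pin the business's own time zone rather than rely on the machine running the script.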

If this pattern fits your team

A Pare Audit is the way to find out whether it does, and what a delivery would look like in your specific situation. We spend a focused few days with you, look at your real website traffic, your real enquiries, and the questions you actually get asked, and come back with a written recommendation, a scoped build, and a costed plan.


Find out for sure with a Pare Audit.