Requesty · FEB '26 · OBSERVABILITY / REQUESTY FEATURES · 4 MIN READ

Closing the loop: how to turn user feedback into a routing signal

Thibault Jaigu
CEO & Co-Founder

Most LLM teams treat user feedback like an afterthought: a thumbs-up widget in the UI that nobody reads. That's a mistake. A single bit of post-hoc quality signal per request is the cheapest eval you will ever run, and the one that's closest to what users actually want. Requesty's Request Feedback API is a metadata sidecar that attaches arbitrary signals — ratings, tags, comments, user IDs — to any completed request by its request_id, so the feedback can drive routing, prompt changes, and quality regression alerts.

This post covers the why, the wire format, and three patterns worth copying.

Why feedback beats offline evals at the 80% mark

Offline evals catch the cases you thought to write. Users hit the cases you didn't. A 2024 audit I keep citing found about 15% of production LLM failures happen on inputs that never appear in the eval set — because users are creative and your test writers aren't. That gap is where feedback wins.

Signal | Latency to detect issue | Covers unknown cases? | Cost
Offline evals | Next eval run (daily) | No — only what's tested | High (eng time)
Canary metrics | Minutes | Partial — aggregate only | Low
User feedback | Real-time | Yes | Near-zero
Post-hoc audit | Days | Yes | High (human)

Feedback plugs the gap between "we have tests" and "we know the product is working."

The wire format (10 lines of code)

Every Requesty chat completion returns a request_id. Capture it, store it against whatever user action produced it, and POST back when you learn whether the output was good.

Python
from openai import OpenAI
import httpx
 
client = OpenAI(
    base_url="https://router.requesty.ai/v1",
    api_key="your-requesty-api-key",
)
 
resp = client.chat.completions.create(
    model="policy/support-v2",
    messages=[{"role": "user", "content": user_question}],
)
answer = resp.choices[0].message.content
request_id = resp.id  # keep this around
 
# Later — when the user clicks 👍 / 👎, or your eval pipeline scores it:
httpx.post(
    f"https://api.requesty.ai/feedback/{request_id}",
    headers={"Authorization": "Bearer your-requesty-api-key"},
    json={
        "data": {
            "rating": 5,                    # or 1, whatever scale you use
            "helpful": True,
            "message": "Resolved the user's issue.",
            "user_id": "u_28419",
            "tags": ["support", "resolved", "first-reply"],
        }
    },
)

The data object is completely free-form. Rating, helpful, message, user_id, tags — or whatever schema your team uses. Multiple submissions merge: you can POST a thumbs-up from the UI immediately and a structured eval score from your nightly pipeline later, and both end up on the same request.
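Concretely, a nightly enrichment pass might build a second payload like this and POST it to the same feedback URL with the same request_id. The `eval_feedback` helper, the field names, and the request id here are illustrative — the `data` schema is whatever your team chooses:

```python
def eval_feedback(score: float, passed: bool) -> dict:
    """Payload a nightly eval job might append to an already-rated request."""
    return {
        "data": {
            "eval_score": round(score, 3),
            "eval_passed": passed,
            "tags": ["nightly-eval"],
        }
    }

payload = eval_feedback(0.9127, True)

# POST to the same endpoint, same request_id as the thumbs-up from the UI;
# the two submissions merge on the request:
# httpx.post(f"https://api.requesty.ai/feedback/{request_id}",
#            headers={"Authorization": "Bearer your-requesty-api-key"},
#            json=payload)
```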

Three patterns worth copying

1. Feedback-driven canary promotion

Run a 90/10 load-balancing policy. Your stable routing policy gets 90%, an experimental one gets 10%. Pipe feedback into your warehouse. Daily: compute avg(rating) and count(helpful=false) per policy. If the experimental one wins by a margin you set, promote its weight to 20, then 50, then 100.

This is the cleanest version of "progressive delivery" for LLMs — you don't need an eval harness or a team of labellers. Real users are the evaluator.
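The daily promotion check is a few lines once the feedback rows are in your warehouse. A minimal sketch, assuming rows exported as one dict per feedback submission (the policy names, rows, and margin are placeholders):

```python
from statistics import mean

# Hypothetical export from your warehouse: one row per feedback submission.
rows = [
    {"policy": "policy/stable",    "rating": 4, "helpful": True},
    {"policy": "policy/stable",    "rating": 3, "helpful": True},
    {"policy": "policy/candidate", "rating": 5, "helpful": True},
    {"policy": "policy/candidate", "rating": 4, "helpful": True},
]

def policy_stats(rows: list[dict], policy: str) -> tuple[float, int]:
    """Return (avg rating, count of helpful=false) for one policy."""
    sub = [r for r in rows if r["policy"] == policy]
    return mean(r["rating"] for r in sub), sum(not r["helpful"] for r in sub)

stable_avg, _ = policy_stats(rows, "policy/stable")
cand_avg, cand_bad = policy_stats(rows, "policy/candidate")

MARGIN = 0.25  # the win margin you decide on
promote = cand_bad == 0 and cand_avg >= stable_avg + MARGIN
```

If `promote` is true, bump the candidate's weight in your load-balancing policy and repeat the check the next day.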

2. Bad-feedback → auto-escalate

Tag low-rated requests in real time and route the next turn from that same user to a stronger model. Pseudocode:

Python
def last_rating_from(user_id: str):
    """Most recent rating this user submitted — read it from your own
    store (Redis, warehouse, or an in-memory cache of feedback POSTs)."""
    return feedback_cache.get(user_id)  # None if no feedback yet

rating = last_rating_from(user_id)
if rating is not None and rating <= 2:
    resp = client.chat.completions.create(
        model="policy/escalated",   # opus / gpt-5 tier
        messages=messages,
    )
else:
    resp = client.chat.completions.create(
        model="policy/default",
        messages=messages,
    )

You're using the feedback stream as a routing primitive — a "the cheap model failed this user once, give them the good one" signal. See also: Routing policies 101 for how to build policies like policy/escalated.

3. Regression alerts with tags

Tag every feedback submission with the product surface that generated it (support, onboarding, checkout, etc.) and the prompt version (prompt_v=2025-11-08). When you ship a new prompt, watch the rolling helpful=false rate per surface. If checkout drops 3 points overnight, the new prompt is the suspect. Roll back.
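The per-surface regression check falls out of the tags directly. A sketch, assuming you pull recent submissions as dicts (the surfaces, tags, and sample data are illustrative):

```python
from collections import defaultdict

# Hypothetical recent feedback submissions, each tagged with its surface
# and prompt version at submit time.
submissions = [
    {"tags": ["checkout", "prompt_v=2025-11-08"], "helpful": False},
    {"tags": ["checkout", "prompt_v=2025-11-08"], "helpful": True},
    {"tags": ["support",  "prompt_v=2025-11-08"], "helpful": True},
]

def unhelpful_rate_by_surface(submissions: list[dict], surfaces: list[str]) -> dict:
    """surface -> fraction of submissions with helpful=false."""
    counts = defaultdict(lambda: [0, 0])  # surface -> [unhelpful, total]
    for s in submissions:
        for surface in surfaces:
            if surface in s["tags"]:
                counts[surface][1] += 1
                counts[surface][0] += not s["helpful"]
    return {k: bad / total for k, (bad, total) in counts.items()}

rates = unhelpful_rate_by_surface(submissions, ["support", "checkout"])
# compare each surface's rate against yesterday's baseline; alert on a jump
```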

That's the whole workflow. It's not complicated — it's just that most teams don't have the plumbing to wire feedback back to requests in the first place, which is what the API gives you.

What to track, and what to ignore

A common mistake is to capture too much. For most teams, the high-signal fields are:

  • rating (1–5 or thumbs) — always capture
  • helpful (boolean) — yes/no is easier for users than a 5-point scale; capture both if you can
  • user_id — for cohort analysis
  • tags with the prompt version and the routing policy — so you can attribute regressions

Skip free-form text from users in the main data field. Collect it in message if they volunteer it, but don't gate feedback on writing a comment — 90% of users won't.
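Put together, a lean payload following the list above looks like this (the tag values are examples — use whatever prompt-version and policy identifiers your team already tracks):

```python
# A high-signal feedback body: rating, helpful, cohort id, and tags
# that attribute the request to a prompt version and routing policy.
feedback = {
    "data": {
        "rating": 4,
        "helpful": True,
        "user_id": "u_28419",
        "tags": ["support", "prompt_v=2025-11-08", "policy/support-v2"],
    }
}
```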

The one thing to take away

Production feedback is a continuous, free, real-user eval. It catches regressions your test suite can't, it pays for itself the first time you roll back a bad prompt because of it, and it's 15 lines of code to wire up. Routing gateways make it cheap; your product team makes it valuable.

Route → observe → feedback → route again. That's the loop.

Frequently asked questions

What is Request Feedback in Requesty?
Request Feedback is a Requesty API endpoint that lets your application attach post-hoc signals — ratings, comments, booleans, tags, user IDs — to a completed LLM request using its request_id. The feedback is stored alongside the request and surfaced in the dashboard analytics, so you can slice quality by model, prompt, user segment, or any custom tag.
How do I send feedback for an LLM response?
Capture the request_id from the chat completion response, then POST to https://api.requesty.ai/feedback/{request_id} with a JSON body containing any fields you care about — rating, helpful, message, user_id, tags. Multiple submissions merge, so you can append enrichment later (e.g. after an eval job runs).
What's the difference between feedback and evals?
Feedback is a lightweight production signal you collect continuously from real users or downstream checks. Evals are heavier, structured test suites run offline. Feedback tells you which specific production requests failed and why — a signal your evals probably don't catch because users hit edge cases you didn't write tests for.
Can I use feedback to pick routing policies automatically?
Not automatically on Requesty today — policies are configured explicitly, not learned. But you can feed the feedback stream into your own pipeline: filter requests with rating < 3 in the last 24h, check which model served them, and promote or demote candidates in your load-balancing policy. Some teams do this on a daily cadence.
Does feedback data affect billing or rate limits?
No. The feedback API is a metadata sidecar — no token cost, no rate-limit impact on your inference traffic. It writes to the request log, nothing else.