Model switching decisions

Find out which AI model can safely handle your workload.

Your app runs normally.
Squire tests other models in the background and tells you what’s safe.

Squire runs after your app returns a response.

Cut your AI costs safely using your real traffic.

Production path unchanged · Runs in the background · Metrics-only by default · Raw queue clears automatically

Start with the dashboard

Web dashboard

Create a workload and copy its key.

Sign in with GitHub, create a workload, then send one background HTTP request after your model returns.

POST https://squire.run/v1/compare

Open dashboard to create and rotate workload keys.

POST /v1/compare
→ 202 Accepted
→ Squire evaluates models in the background

Squire runs continuously as your app sends requests.

Decision

DO NOT SWITCH

Candidate models fail on ~46% of sampled requests for this workload.
Switching now would likely break behavior.

Computed from your real production inputs.

Confidence: LOW
Best candidate: qwen/qwen3.5-flash-02-23
Pass rate: 54%

Next step:
→ Do not switch yet. Collect more samples.

01

Send shadow data

Your app keeps calling OpenAI, Claude, or Gemini normally. Squire receives the request and production output afterward.

02

View the decision

Open the dashboard or call GET /v1/decision and get one answer: switch, test more, or keep current.
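Step 02 can be scripted as well as clicked. Below is a minimal sketch of polling the decision endpoint with the standard library. The `workload` query parameter and the response field names (`decision`, `confidence`, `best_candidate`) are assumptions for illustration, not documented here; check the dashboard for the real request and response shape.

```python
import json
from urllib.request import Request, urlopen


def build_decision_request(api_key, workload="my-workload"):
    # The `workload` query parameter is hypothetical; confirm the
    # real shape in the Squire dashboard.
    return Request(
        f"https://squire.run/v1/decision?workload={workload}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def summarize(payload):
    # Reduce the decision payload to the one answer step 02 promises:
    # switch, test more, or keep current.
    if payload["decision"] == "SWITCH":
        return f"switch to {payload['best_candidate']}"
    if payload["confidence"] == "LOW":
        return "test more"
    return "keep current"


# Usage (network call commented out so the sketch stays self-contained):
# payload = json.load(urlopen(build_decision_request("sq_..."), timeout=5))
sample = {"decision": "DO NOT SWITCH", "confidence": "LOW",
          "best_candidate": "qwen/qwen3.5-flash-02-23", "pass_rate": 0.54}
print(summarize(sample))  # → test more
```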

03

Keep prompts private

Raw prompts and outputs may sit briefly in the transient sampling queue, then clear after evaluation or expiry. Scorecards and run history store metrics only.

Python

Add one background request.

Put this after your existing model call. The request runs on a daemon thread with a two-second timeout, so your app still works even if Squire is slow or unreachable.

import json
import os
from threading import Thread
from urllib.request import Request, urlopen

SQUIRE_INGEST_KEY = os.environ["SQUIRE_INGEST_KEY"]

def send_to_squire(payload):
    request = Request(
        "https://squire.run/v1/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {SQUIRE_INGEST_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

    try:
        urlopen(request, timeout=2).close()
    except Exception:
        # Best-effort shadow traffic: a Squire failure must never
        # affect the production request path.
        pass

PRODUCTION_MODEL = "gpt-4.1"  # use your existing production model

production_request = {
    "model": PRODUCTION_MODEL,
    "messages": messages,
    # If your production call uses response_format, tools, temperature,
    # max_tokens, or other OpenAI-compatible settings, keep them here.
}

# `client` is your existing OpenAI SDK client; this production call is unchanged.
response = client.chat.completions.create(**production_request)

Thread(
    target=send_to_squire,
    args=({
        **production_request,
        "reference_output": response.choices[0].message.content,
    },),
    daemon=True,
).start()
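If your service handles many requests, spawning a thread per call adds overhead. One common alternative (a sketch of a general Python pattern, not an official Squire integration) is a single daemon worker draining a bounded queue. Under load, samples are dropped rather than queued unboundedly, which matches the fire-and-forget behavior above.

```python
import json
import os
import queue
import threading
from urllib.request import Request, urlopen

# Placeholder default so the sketch imports cleanly; set the real key in prod.
SQUIRE_INGEST_KEY = os.environ.get("SQUIRE_INGEST_KEY", "sq_placeholder")

shadow_queue = queue.Queue(maxsize=1000)  # bounded: drop rather than grow


def shadow_worker():
    # Single long-lived consumer: drains payloads and posts them to Squire.
    while True:
        payload = shadow_queue.get()
        try:
            request = Request(
                "https://squire.run/v1/compare",
                data=json.dumps(payload).encode("utf-8"),
                headers={"Authorization": f"Bearer {SQUIRE_INGEST_KEY}",
                         "Content-Type": "application/json"},
                method="POST",
            )
            urlopen(request, timeout=2).close()
        except Exception:
            pass  # best-effort: never block or crash on Squire
        finally:
            shadow_queue.task_done()


# Start once at app startup:
# threading.Thread(target=shadow_worker, daemon=True).start()


def enqueue_for_squire(payload):
    try:
        shadow_queue.put_nowait(payload)
    except queue.Full:
        pass  # under load, drop shadow samples instead of queueing unboundedly
```

Call `enqueue_for_squire({**production_request, "reference_output": ...})` in place of the `Thread(...).start()` shown above.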