Docs

Sign in, send, decide.

Create a workload key in the dashboard. Add one background request after production responses.

One dashboard. One request. One decision.

Quickstart

Sign in with GitHub, create a workload, and copy its ingest key.

Open the dashboard

Use that key as SQUIRE_INGEST_KEY in your app.

Send shadow data

Your app keeps calling OpenAI, Claude, or Gemini normally. After the response, send the OpenAI-compatible request shape and production output to Squire in the background. No package is required.

import json
import os
from threading import Thread
from urllib.request import Request, urlopen

SQUIRE_INGEST_KEY = os.environ["SQUIRE_INGEST_KEY"]

def send_to_squire(payload):
    request = Request(
        "https://squire.run/v1/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {SQUIRE_INGEST_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

    try:
        urlopen(request, timeout=2).close()
    except Exception:
        pass

PRODUCTION_MODEL = "gpt-4.1"  # use your existing production model

production_request = {
    "model": PRODUCTION_MODEL,
    "messages": messages,
    # If your production call uses response_format, tools, temperature,
    # max_tokens, or other OpenAI-compatible settings, keep them here.
}

response = client.chat.completions.create(**production_request)

Thread(
    target=send_to_squire,
    args=({
        **production_request,
        "reference_output": response.choices[0].message.content,
    },),
    daemon=True,
).start()
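If you send shadow data from several call sites, the payload construction can be pulled into a small helper. This is a sketch, not part of Squire's API; the function name is ours:

```python
def build_compare_payload(production_request, reference_output):
    # Squire's /v1/compare body is just the OpenAI-compatible request
    # contract plus reference_output. Copy, don't mutate, the original.
    return {**production_request, "reference_output": reference_output}

# Usage with the snippet above:
# payload = build_compare_payload(
#     production_request, response.choices[0].message.content
# )
# Thread(target=send_to_squire, args=(payload,), daemon=True).start()
```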

Give this to Codex or Claude

Paste this into Codex or Claude Code when you want it to add Squire to the right call site.

Add Squire shadow evaluation to this codebase.

Use direct HTTP only. Do not install a package.
Do not put Squire in the production response path.
Do not change the existing model call or returned response.
If the Squire request fails, swallow the error.

Squire workload: Default
Squire endpoint: https://squire.run/v1/compare
SQUIRE_INGEST_KEY=<use this workload's ingest key; regenerate it in Squire if you need the full value>

After the existing OpenAI/Claude/Gemini call returns, send a background POST to Squire:

Headers:
Authorization: Bearer $SQUIRE_INGEST_KEY
Content-Type: application/json

Use this code shape:
import json
import os
from threading import Thread
from urllib.request import Request, urlopen

SQUIRE_INGEST_KEY = os.environ["SQUIRE_INGEST_KEY"]

def send_to_squire(payload):
    request = Request(
        "https://squire.run/v1/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {SQUIRE_INGEST_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

    try:
        urlopen(request, timeout=2).close()
    except Exception:
        pass

PRODUCTION_MODEL = "gpt-4.1"  # use your existing production model

production_request = {
    "model": PRODUCTION_MODEL,
    "messages": messages,
    # If your production call uses response_format, tools, temperature,
    # max_tokens, or other OpenAI-compatible settings, keep them here.
}

response = client.chat.completions.create(**production_request)

Thread(
    target=send_to_squire,
    args=({
        **production_request,
        "reference_output": response.choices[0].message.content,
    },),
    daemon=True,
).start()

Practical body template:
{
  "model": "<production model id>",
  "messages": [
    {
      "role": "system",
      "content": "<same system content sent to the production model>"
    },
    {
      "role": "user",
      "content": "<same user content sent to the production model>"
    }
  ],
  "reference_output": "<production response text or structured JSON string>"
}

Squire accepts an OpenAI-compatible request shape.
Send the same production request contract plus reference_output.
Use messages for chat.completions calls, or input for responses.parse/responses.create calls.
Do not invent optional values.
If the production call uses structured output, tools, or generation settings, copy those existing OpenAI-compatible fields exactly.
Do not send legacy wrapper keys: reference_model, production_request, response_schema.
Use a short timeout around 2 seconds.
Run it asynchronously in the background.

Ask for a decision

Open the workload page in the dashboard, or call the decision endpoint directly.

curl -sS https://squire.run/v1/decision \
  -H "Authorization: Bearer $SQUIRE_INGEST_KEY"

Example response:
{
  "label": "Default",
  "action": "DO_NOT_SWITCH",
  "safe_to_switch": false,
  "confidence_level": "LOW",
  "confidence_score": 0.26,
  "best_candidate": "qwen/qwen3.5-flash-02-23",
  "pass_rate": 0.54,
  "sample_count": 26,
  "sample_target": 10,
  "sample_status": "10 of 10 proof samples evaluated.",
  "risk_level": "HIGH",
  "suggested_strategy": "COLLECT_MORE_SAMPLES",
  "recommended_rollout": null,
  "top_failures": [
    "Not enough reasoning",
    "Did not follow the requested format"
  ],
  "summary": "Models fail ~46% of the time for this workload."
}
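If a deploy script consumes this JSON, a minimal gate over the fields shown above could look like the sketch below. The 0.7 confidence floor is an arbitrary assumption of ours, not a Squire recommendation:

```python
def is_safe_to_switch(decision, min_confidence=0.7):
    # Gate on Squire's explicit safe_to_switch flag, plus our own
    # confidence floor on top of it.
    return (
        bool(decision.get("safe_to_switch"))
        and decision.get("confidence_score", 0.0) >= min_confidence
    )
```

With the example decision above (`safe_to_switch: false`, `confidence_score: 0.26`), this returns `False`.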

API reference

POST /v1/compare accepts shadow data immediately and samples queued requests for background evaluation.

Practical compare template, showing the OpenAI-compatible shape Squire needs:

{
  "model": "<production model id>",
  "messages": [
    {
      "role": "system",
      "content": "<same system content sent to the production model>"
    },
    {
      "role": "user",
      "content": "<same user content sent to the production model>"
    }
  ],
  "reference_output": "<production response text or structured JSON string>"
}

Send the same production request contract plus reference_output. Use messages for chat completions, or input for responses calls.
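For responses.create or responses.parse calls, the pattern is the same with input in place of messages. A sketch, assuming your production request already uses the Responses API shape (placeholders stand in for your real content):

```python
production_request = {
    "model": "gpt-4.1",  # your existing production model
    "input": [
        {"role": "system", "content": "<same system content>"},
        {"role": "user", "content": "<same user content>"},
    ],
}

# Same contract as before: the request, forwarded unchanged,
# plus reference_output.
payload = {**production_request, "reference_output": "<production response text>"}
```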

Do not invent optional values. If the production call uses structured output, tools, or generation settings, copy those existing OpenAI-compatible fields exactly.

Do not send legacy wrapper keys: reference_model, production_request, or response_schema.

GET /v1/decision returns the current machine-readable decision.

Exact schemas are available at /api-docs and /openapi.json.

Privacy

Squire scorecards and run history are metrics-only by default.

Raw prompt/output payloads may sit in the transient sampling queue for up to 60 minutes. They are cleared after evaluation or expiry.

Persistent storage excludes

  • prompts
  • outputs

Persistent storage includes

  • hashes
  • feature summaries
  • scores
  • model IDs
  • failure categories
  • latency
  • aggregates

Boundaries

Squire is a decision system. It does not sit in your production response path.

  • no routing
  • no auto-switching
  • no production interference
  • no raw data in scorecards
  • no production-path dependency