Runbook — enforce unique (org_id, external_ref) on tenants

Status: prepared, not yet applied to production. Migration 0024_tired_stick is committed but must be applied manually, after the dedup below.

Why

resolveOrCreateTenantByRef (src/lib/api/tenant-resolve.ts) documents and relies on "a unique index on (org_id, external_ref)" as its race-safety source of truth, and its insert-conflict branch matches tenants_org_external_ref. That constraint never existed — the schema only declared a non-unique index("tenants_external_ref_idx"). Consequences:

  • The race-recovery branch is dead code.
  • Concurrent first-touch requests for the same external_ref (e.g. a platform like Ezzeefy onboarding a shop via X-Sendoka-Tenant-Ref) can create duplicate tenants, and the failing insert path threw uncaught → raw 500 (the symptom fixed separately in the withApiAuth try/catch hotfix).

This change adds the missing unique("tenants_org_external_ref_unique") so the data model and the code agree.

Apply order (must be in this order)

  1. Audit. Run Step 1 of scripts/dedup-tenants-by-external-ref.sql against production (read-only).
    • Zero rows → skip to step 3. The bug may never have fired; nothing to clean.
  2. Dedup (only if step 1 found duplicates). Follow steps 2–5 of the script. Per duplicate group: pick a survivor (oldest active), repoint the 14 tenant-scoped tables, resolve the per-tenant unique collisions by hand (tenant_usage, suppressions, templates, audiences, messaging_pools), delete the loser, all in one transaction. Snapshot the DB first.
  3. Apply the migration. Once the audit returns zero rows:
    DATABASE_URL=<prod> npm run db:migrate     # applies 0024 (prod is at 0023)
    
    or apply the single statement directly:
    ALTER TABLE tenants ADD CONSTRAINT tenants_org_external_ref_unique UNIQUE (org_id, external_ref);
    -- (and DROP INDEX tenants_external_ref_idx; — folded into 0024)
    
    The ADD CONSTRAINT fails if any duplicate non-null external_ref remains — that failure is the safety net telling you dedup isn't complete.

Notes

  • external_ref is nullable; Postgres treats NULLs as distinct, so tenants created without a ref (e.g. via POST /tenants with no external_ref) are unaffected — many NULLs per org remain legal.
  • After this, resolveOrCreateTenantByRef's conflict-recovery becomes real: a lost race surfaces the unique violation and re-selects the winner instead of creating a duplicate.
  • Confirm the original production stack trace (incident ~`2026-06-25T11:15:10Z) names this insert — the withApiAuthhotfix now routes it tocaptureException`.