Open Gaps

Residual items after the ninth hardening pass.

Closed in the most recent pass (June 2026 — session 9)

Developer experience

  • Engagement webhook events: message.opened / message.clicked / message.unsubscribed / message.complained now fan out (track routes, unsubscribe paths, SES complaint). Complaint moved off message.bounced to its own event.
  • Bulk webhook replay: POST /v1/webhooks/:id/replay-failed (100/call, oldest-first, replayed_at paging marker so re-invokes don't duplicate). Docs' broken paths fixed.
  • v1 test-fire: POST /v1/webhooks/:id/test-fire (API-key parity with the dashboard button, shared 20/min budget).
  • Webhook auto-disable — 10 consecutive exhausted deliveries flip enabled=false + audit; any 2xx resets the streak; re-enable clears it. last_response_body (1 KB cap) captured per attempt.
  • delivery_id in webhook payload body + X-Sendoka-Delivery-Id header; Retry-After on all 429s.
  • Message list filters: to + created_after/before on v1 emails/sms + internal; date-range filter UI on the dashboard (the dead Filter button).
  • Suppressions dashboard: /overview/suppressions + /api/internal/suppressions CRUD.
  • Resend: POST /api/internal/messages/:id/resend clones as due-now scheduled so the hardened cron path does provider/usage/fanout; button on message detail.
  • Verify-webhooks recipe rewritten — old doc showed a t=...,v1=... signature format that never existed on the wire; now V2-first with Python + Go examples.
  • Test-mode semantics documented — already correct in code (env-split usage, no quota impact, webhooks fire); docs/api/authentication.md now says so explicitly.
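
The webhook auto-disable rule above can be read as a small state machine. The following is a minimal sketch of that reading, not the shipped code: EndpointState, applyOutcome, and reEnable are hypothetical names, and only the threshold of 10, the 2xx reset, and the re-enable reset come from the entry itself.

```typescript
// Hypothetical shape; the real endpoint row has more columns.
interface EndpointState {
  enabled: boolean;
  consecutiveFailures: number;
}

const AUTO_DISABLE_THRESHOLD = 10; // from the changelog entry

// "exhausted" means every retry for one delivery was used up without a 2xx.
type DeliveryOutcome = "success" | "exhausted";

// Pure transition: returns the next state and whether an audit entry is due.
function applyOutcome(
  state: EndpointState,
  outcome: DeliveryOutcome,
): { next: EndpointState; auditDisabled: boolean } {
  if (outcome === "success") {
    // Any 2xx resets the streak.
    return { next: { ...state, consecutiveFailures: 0 }, auditDisabled: false };
  }
  const failures = state.consecutiveFailures + 1;
  const shouldDisable = state.enabled && failures >= AUTO_DISABLE_THRESHOLD;
  return {
    next: {
      enabled: shouldDisable ? false : state.enabled,
      consecutiveFailures: failures,
    },
    auditDisabled: shouldDisable,
  };
}

// Re-enabling clears the streak so the endpoint starts from zero again.
function reEnable(): EndpointState {
  return { enabled: true, consecutiveFailures: 0 };
}
```

Keeping the transition pure makes the threshold easy to unit-test without a database; the caller would persist next and write the audit row when auditDisabled is true.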

Enterprise

  • SSO enforcement (sso_connections.enforce_sso) — SSO-only login, membership-scoped with owner break-glass and verified-domain proof for unknown emails; enabling bumps members' session_version.
  • Session policies: session_max_age_hours + max_sessions_per_user on organizations, strictest-across-memberships, enforced in the JWT callback / at login.
  • SP metadata endpoint: GET /api/auth/saml/metadata for IdP import.
  • SCIM Groups: /scim/v2/Groups CRUD + group→role mapping (owner excluded at zod + rank table + DB CHECK; directory default_role can no longer be owner, legacy directories clamped at provision time).
  • Permission pass finished — gate matrix documented in permissions.ts; sole violation fixed (internal messages DELETE was member+, now developer+); UI gates aligned.
  • Tenant key minting hardened: POST /v1/keys now validates tenant_id against the org's tenants. Platform admin token gap was already closed by /v1/keys + /v1/domains accepting tenant_id; platforms.md now documents the full no-cookie provisioning flow.
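
One way to read "strictest-across-memberships" for the session policies above is that the smallest configured limit wins across all of a user's organizations. The sketch below assumes exactly that, and assumes a null column means "no limit"; OrgSessionPolicy, strictest, and effectivePolicy are illustrative names, not the project's.

```typescript
// Hypothetical projection of the two policy columns on organizations.
interface OrgSessionPolicy {
  sessionMaxAgeHours: number | null; // null = this org sets no limit (assumption)
  maxSessionsPerUser: number | null;
}

// Smallest non-null value wins; null when no org sets a limit.
function strictest(values: Array<number | null>): number | null {
  const configured = values.filter((v): v is number => v !== null);
  return configured.length === 0 ? null : Math.min(...configured);
}

// Combine the policies of every org the user belongs to.
function effectivePolicy(memberships: OrgSessionPolicy[]): OrgSessionPolicy {
  return {
    sessionMaxAgeHours: strictest(memberships.map((m) => m.sessionMaxAgeHours)),
    maxSessionsPerUser: strictest(memberships.map((m) => m.maxSessionsPerUser)),
  };
}
```

Under this reading, a user in one org with a 12-hour cap and another with a 72-hour cap would be held to 12 hours everywhere.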

Marketing honesty

  • SOC 2 → "in progress", SLA → "uptime target", VPC → "on roadmap"; DPA footer link now resolves to /legal/dpa.

Migrations this pass: 0017 (webhook auto-disable + response body), 0018 (replayed_at), 0019 (enforce_sso + session policies), 0020 (scim groups), 0021 (mapped_role CHECK).

Closed in the prior pass (May 2026 — session 8)

Cost / abuse protection

  • Provisioning rate limit (10/hr/org): src/lib/api/provision-gate.ts wired into POST /api/v1/{brands,campaigns,phone-numbers} and the internal mirrors. Each AWS provisioning call costs real money; runaway scripts or a compromised key now stop at quota with PROVISION_LIMIT_EXCEEDED.
  • Idempotency on POST /v1/brands + /v1/campaigns + /v1/phone-numbers — Idempotency-Key header replays the cached 201 within the TTL window. Closes the dup-side risk that the reconcile cron couldn't fix.
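
To illustrate the 10/hr/org provisioning limit, here is a minimal in-memory fixed-window sketch. It is an assumption-heavy stand-in: the real gate in provision-gate.ts presumably uses a shared store and may use a different window algorithm, and ProvisionGate and allow are hypothetical names.

```typescript
const LIMIT = 10; // calls per org per window, from the changelog entry
const WINDOW_MS = 60 * 60 * 1000; // one hour

interface WindowState {
  windowStart: number;
  count: number;
}

// In-memory only; a multi-instance deployment would need a shared store.
class ProvisionGate {
  private readonly windows = new Map<string, WindowState>();

  // Returns true when the call may proceed, false when the org is over quota.
  allow(orgId: string, nowMs: number): boolean {
    const current = this.windows.get(orgId);
    if (!current || nowMs - current.windowStart >= WINDOW_MS) {
      // First call, or the previous window has expired: start a new window.
      this.windows.set(orgId, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (current.count >= LIMIT) return false;
    current.count += 1;
    return true;
  }
}
```

A route handler would call allow(orgId, Date.now()) before touching AWS and return PROVISION_LIMIT_EXCEEDED when it yields false.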

Inbound + observability

  • Inbound SMS persistence — new inbound_sms table + /api/internal/inbound-sms list + inbound.sms webhook event. Free-form replies that aren't STOP/HELP are now persisted and pushed to customer webhooks instead of being dropped.
  • Per-request structured context: src/lib/request-context.ts (AsyncLocalStorage) auto-injects request_id / org_id / tenant_id / route into every log.* and logError line within the request lifecycle. Wired into withApiAuth.
  • Sentry shim: src/lib/observability.ts opt-in via SENTRY_DSN env + npm i @sentry/nextjs. warn / error log lines + logError calls forward to Sentry when configured; no-op otherwise.
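
The per-request context entry above relies on Node's AsyncLocalStorage. The sketch below shows the general pattern under the assumption that the logger merges the active store into each line; withRequestContext and buildLogLine are hypothetical helpers, not the contents of request-context.ts.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Field names follow the changelog entry; everything but request_id is optional here.
interface RequestContext {
  request_id: string;
  org_id?: string;
  tenant_id?: string;
  route?: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Run a handler with a context bound to everything it calls or awaits.
function withRequestContext<T>(ctx: RequestContext, fn: () => T): T {
  return storage.run(ctx, fn);
}

// Build a structured log line, merging in whatever context is active.
function buildLogLine(
  level: "info" | "warn" | "error",
  msg: string,
  fields: Record<string, unknown> = {},
): string {
  // getStore() returns undefined outside a request; spreading undefined adds nothing.
  return JSON.stringify({ level, msg, ...storage.getStore(), ...fields });
}
```

Because the store follows the async call chain, code deep inside a handler can log without having request_id threaded through every function signature.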

Coverage + tests

  • sns-sms route end-to-end tests — added inbound-persistence success/miss paths + duplicate-delivery dedupe assertion. 11 tests covering: signature reject, rate limit, topic allowlist, SUCCESS → delivered + fanout, hard-block → bounced + suppress, spurious notification, STOP tenant scope, free-form inbound persistence, no-match drop, duplicate delivery.
  • OpenAPI parity for kind=alphanumeric: PhoneNumber, ProvisionPhoneNumber, and new RegisterAlphanumericSender schemas; POST /phone-numbers requestBody now uses oneOf with a kind discriminator.

Retention

  • Audit log archive-to-blob: /api/cron/retention accepts ARCHIVE_AUDIT_LOGS=true + BLOB_READ_WRITE_TOKEN: serializes rows > 365d to Vercel Blob (audit-archive/YYYY-MM-DD.json) before delete. Upload failure skips the delete so the next run retries.
  • Idempotency TTL tuning: IDEMPOTENCY_TTL_HOURS env (default 24, range 1–168) tunes the cache window per-deploy.
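
A small sketch of how the IDEMPOTENCY_TTL_HOURS setting might be resolved. The default of 24 and the 1 to 168 range come from the entry above; clamping out-of-range values (rather than rejecting them) and falling back to the default on non-numeric input are assumptions, and resolveIdempotencyTtlHours is a hypothetical name.

```typescript
const DEFAULT_TTL_HOURS = 24;
const MIN_TTL_HOURS = 1;
const MAX_TTL_HOURS = 168; // one week

// Assumption: unset or non-numeric input falls back to the default,
// and out-of-range values are clamped into [1, 168].
function resolveIdempotencyTtlHours(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_TTL_HOURS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_TTL_HOURS;
  return Math.min(MAX_TTL_HOURS, Math.max(MIN_TTL_HOURS, Math.floor(parsed)));
}
```

At startup this would be called once with the value of the IDEMPOTENCY_TTL_HOURS environment variable and the result multiplied out to whatever unit the cache expects.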

Sales surface + compliance

  • /pricing route — dedicated public page replacing the inline anchor on the home. Three tiers (Hacker / Team / Enterprise) with a full feature comparison table covering the three new enterprise features (Dedicated IPs, India DLT, SAML/SCIM) plus volume bands, compliance (SOC 2, GDPR, HIPAA), and FAQ.
  • GDPR per-user data export + erasure: POST /api/internal/user/data-export returns profile + memberships + sessions + 2FA + author-attributed audit entries (10k cap). DELETE /api/internal/user confirms via password OR email match, refuses when the user is the sole owner of an org (SOLE_OWNER 409 unless delete_owned_orgs: true), anonymizes audit entries rather than dropping them. UI panel on /overview/settings/security.

Enterprise: built out end-to-end

  • Dedicated IPs + IP pools: real AWS SESv2 calls (CreateDedicatedIpPool MANAGED scaling + CreateConfigurationSet + PutConfigurationSetDeliveryOptions). New /api/cron/ip-pool-warmup (every 15 min) polls GetDedicatedIps and advances provisioning → warming → active. Send pipeline resolves domain → pool → configurationSet and passes ConfigurationSetName into SendEmailCommand; paused pools reject sends with IP_POOL_PAUSED. /api/internal/ip-pools/[id] PATCH + DELETE + /api/internal/ip-pools/[id]/assignments POST/DELETE round out the CRUD.
  • India DLT registration: Gupshup HTTP client behind DLT_PROVIDER=gupshup env (Kaleyra / MSG91 can plug in via the same interface). Headers + templates CRUD at /api/internal/dlt/{headers,templates}. New /api/cron/dlt-status-poll (every 30 min) flips submitted → verified / approved / rejected via pollDltStatus. Send-time gate (enforceDltTemplate) on the v1 SMS route requires an approved dlt_template_id for any +91 recipient.
  • SAML / SSO + SCIM — real samlify-based AuthnResponse verification at /api/auth/saml/callback (signature check against sso_connections.certificate, JIT user provisioning, one-shot token exchange via a NextAuth Credentials samlToken arm). SCIM 2.0 /Users GET/POST + /Users/[id] GET/PATCH/PUT/DELETE with sha256-bearer-token auth against scim_directories. /api/internal/scim (enterprise-only) mints + returns plaintext bearer once.

Closed in the prior pass (May 2026 — session 7)

SMS audit + critical security

  • SNS verify cert fetch timeout + 5-min replay window: src/lib/api/sns-verify.ts. Captured signed payloads can no longer replay indefinitely.
  • SNS-SMS topic allowlist + SubscribeURL host gate + per-IP rate limit: src/app/api/webhooks/sns-sms/route.ts rejects notifications from un-allowlisted topics, validates SubscribeURL against the same sns.*.amazonaws.com regex as the cert host, caps 600/min/IP.
  • STOP suppression now tenant-scoped — was org-wide (multi-tenant leak per the schema's own warning); STOP-from-recipient lookup now returns (orgId, tenantId) and is passed into addSuppression.
  • SMS hard-block → bounced + auto-suppress: BLOCKED/EXPIRED/INVALID_NUMBER/OPTED_OUT map to message.bounced with auto-suppression (was collapsed to failed with no suppression).
  • v1 batch SMS / email gate parity: withApiAuth(..., "send:{channel}"), filterSuppressed per item, sender-registration check (SMS), per-item reject for unsupported template/scheduled_at/from_pool/media_url.
  • v1 /sms/{id} scope + env filter + audit — GET requires read:messages, DELETE requires send:sms, both filter on ctx.environment + ctx.tenantId. DELETE emits message.canceled.
  • Idempotency in-flight lock release: storeResponseIfIdempotent releases the Redis lock after the cache row lands so retries within 30s see a 200 replay instead of 409. New releaseIdempotencyLock helper for error paths.
  • Send/DB ordering hardened — failure-row insert on v1 sms + emails uses onConflictDoNothing so a partial happy-path insert can't PK-collide and mask the original error; .catch(() => {}) swapped for logError on usage + insert failures.
  • pickFromNumber tenant scoping fixed — platform-root caller no longer matches every tenant's pool (Drizzle and(...) was silently dropping the undefined predicate).
  • Provider tightening: AWS.SNS.SMS.SenderID only set for non-E.164 originators (US/CA reject the attribute); provisionPhoneNumber derives MessageType from the campaign use case (marketing → PROMOTIONAL); mapStatus expanded to recognize VERIFIED / APPROVED / REGISTERED + the failure synonyms DENIED / REQUIRES_AUTHENTICATION / REVOKED.
  • 403 wording: requireDeveloperSession sites stopped returning "Owner role required"; 20 routes now correctly say "Developer or owner role required."
  • Internal GET-by-id serialize parity — brands / campaigns / phone-numbers detail endpoints now use the same serialize() shape as the list endpoints.
  • Audit emission on v1 send routes: message.sent + message.canceled added to the AuditAction union, emitted from v1 sms POST + DELETE.
  • Atomic claim on scheduled-send cron — overlapping cron ticks no longer double-send. Drain uses UPDATE … RETURNING with FOR UPDATE SKIP LOCKED against the inner SELECT; transient sending status acts as the claim marker, stale-claim recovery (>5 min) lets a crashed runner's rows be re-picked.
  • Reconcile cron for orphan registrations — new /api/cron/reconcile-registrations runs every 15 min, retries brand/campaign create when AWS failed at POST time leaving providerBrandId/providerCampaignId NULL. Rows older than 7 days flip to failed with RECONCILE_TIMEOUT.
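
The 5-minute replay window on SNS verification can be sketched as a freshness check on the message's signed Timestamp field. This is an illustration only: isWithinReplayWindow is a hypothetical name, and treating the window as symmetric (to tolerate clock skew) is an assumption rather than something the entry states.

```typescript
const REPLAY_WINDOW_MS = 5 * 60 * 1000; // 5 minutes, from the changelog entry

// SNS messages carry an ISO-8601 Timestamp field that is part of the signed string,
// so it can be trusted once the signature has been verified.
function isWithinReplayWindow(
  timestampIso: string,
  nowMs: number,
  windowMs: number = REPLAY_WINDOW_MS,
): boolean {
  const sentMs = Date.parse(timestampIso);
  if (Number.isNaN(sentMs)) return false; // unparseable timestamps are rejected
  // Assumption: accept skew in either direction, reject anything further out.
  return Math.abs(nowMs - sentMs) <= windowMs;
}
```

The check only makes sense after signature verification succeeds; an attacker who could alter Timestamp freely would otherwise just refresh it.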

Alphanumeric sender path (UK / AU / EU)

  • Schema: phone_numbers.kind (number | alphanumeric), nullable e164, new sender_id text + (orgId, senderId) unique. Migration 0009.
  • Validation: registerAlphanumericSenderSchema enforces 1–11 chars and rejects iso_country in US/CA/IN (carrier rules forbid alphanumeric there).
  • Routes: POST /api/v1/phone-numbers and /api/internal/phone-numbers branch on kind=alphanumeric — no AWS round-trip, row inserted as verified immediately.
  • Send gate: v1 sms + batch detect non-E.164 from and look up by senderId instead of e164.
  • UI: /overview/sms/numbers gets a "Register sender ID" button + modal alongside "Provision number"; Provision modal country dropdown limited to US/CA with helper text pointing at the sender-ID path.
  • Country guard: both POST routes reject iso_country outside {US, CA} for kind=number with 422 COUNTRY_NOT_PROVISIONABLE, plus a catch-block matcher for AWS INVALID_PARAMETER/isoCountryCode as a fallback.
  • Docs: docs/api/sms.md + /docs#send-sms updated for both sender kinds; importable Postman collection at docs/api/sendoka-sms.postman_collection.json + served from public/.
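
The validation and country-guard rules above can be sketched without any schema library. The 1 to 11 length bound, the US/CA/IN exclusion, the {US, CA} allowlist, and COUNTRY_NOT_PROVISIONABLE come from the entries; the ASCII letters-and-digits character set and the other two error codes are assumptions made for illustration, as are the function names.

```typescript
type ValidationResult = { ok: true } | { ok: false; code: string };

const ALPHANUMERIC_FORBIDDEN = new Set(["US", "CA", "IN"]);
const PROVISIONABLE_NUMBER_COUNTRIES = new Set(["US", "CA"]);

// kind=alphanumeric: length 1 to 11, not allowed in US/CA/IN.
function validateAlphanumericSender(senderId: string, isoCountry: string): ValidationResult {
  // Assumption: ASCII letters and digits only; the source states only the length bound.
  if (!/^[A-Za-z0-9]{1,11}$/.test(senderId)) {
    return { ok: false, code: "INVALID_SENDER_ID" }; // hypothetical code
  }
  if (ALPHANUMERIC_FORBIDDEN.has(isoCountry.toUpperCase())) {
    return { ok: false, code: "ALPHANUMERIC_NOT_ALLOWED" }; // hypothetical code
  }
  return { ok: true };
}

// kind=number: only US and CA are provisionable.
function validateNumberCountry(isoCountry: string): ValidationResult {
  return PROVISIONABLE_NUMBER_COUNTRIES.has(isoCountry.toUpperCase())
    ? { ok: true }
    : { ok: false, code: "COUNTRY_NOT_PROVISIONABLE" };
}
```

The two guards are complementary: a UK caller is steered to the sender-ID path, while a US caller is steered to number provisioning.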

Tests

  • sns-verify.test, suppression.test, pool.test, sms-registration.test, sms.test — 262 pass / 4 skipped, no regressions.

Closed in the prior pass (April 2026 — session 6)

  • SDK packages removed: packages/sdk-node and packages/sdk-python deleted along with docs/developer-tools/sdks.md. The REST API is the surface contract; we'll revisit official SDKs once we have ≥3 customers asking. Code samples in marketing, docs, and the dashboard quickstart now use curl / fetch / requests directly.

Closed two passes back (April 2026 — session 5)

  • Platform mode (tenants) — new tenants table plus tenant_id nullable columns on api_keys, domains, webhook_endpoints, messages, suppressions, templates, audiences, contacts, inbound_messages, messaging_pools. All send + CRUD routes filter / stamp ctx.tenantId so a tenant-bound key can't cross-send, cross-read, cross-suppress, or cross-pool. /api/v1/tenants CRUD + /tenants/:id/usage; internal /overview/tenants with usage counts and paginated list. See docs/features/platforms.md.
  • Tenant-scoped audience send: /api/v1/audiences/:id/send + internal dashboard mirror filter audience, template, contacts all by tenant.
  • Dashboard audience send now posts to /api/internal/audiences/:id/send (session-auth) instead of /api/v1/... (was 401 with cookies).
  • API keys UI surfaces tenant picker + allowed-domain multiselect on create; row badges show binding.
  • Audit failures surface in logs: logAudit now internally calls logError on DB failure so silent drops become visible even when the caller uses .catch(() => {}).
  • Idempotency store failure logging: storeIdempotency callsites swap .catch(() => {}) for logError("idempotency.store_failed", ...).
  • Domain cache invalidation: invalidateDomainCache() called from add/verify/delete so tenant-binding changes don't linger behind the 5-min TTL.
  • v1 fanout moved to after() — email/sms send + their batch siblings hand response back before endpoint delivery runs.
  • Dynamic audience insert chunking — chunks sized by serialized byte count (~0.5 MB budget) instead of row count so large HTML bodies don't push the Neon HTTP statement cap.
  • Toast UX — dashboard toasts now surface request_id when present and keep error toasts up longer so users can copy.
  • Sidebar grouping — nav organized into Sends / Configure / Admin sections instead of a flat 10-item list.
  • Schema barrel documented as server-only to head off a client-bundle regression where @/lib/db/schema drags Drizzle into the browser.
  • Tenants list paginated via cursor; /api/internal/tenants?cursor=… + "Load more" button.
  • Cron send-scheduled indentation cleaned up.
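
The byte-budget chunking described in the dynamic audience insert entry above might look roughly like the sketch below. It assumes rows are measured by their JSON-serialized UTF-8 size; chunkByBytes is a hypothetical name, and the 512 KiB default simply mirrors the "~0.5 MB" figure.

```typescript
const encoder = new TextEncoder();

// Split rows into chunks whose serialized size stays within a byte budget.
function chunkByBytes<T>(rows: T[], budgetBytes: number = 512 * 1024): T[][] {
  const chunks: T[][] = [];
  let current: T[] = [];
  let currentBytes = 0;
  for (const row of rows) {
    const size = encoder.encode(JSON.stringify(row)).length;
    // Close the current chunk if adding this row would exceed the budget.
    if (current.length > 0 && currentBytes + size > budgetBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    // A single oversized row still gets its own chunk rather than being dropped.
    current.push(row);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Sizing by bytes instead of row count means a batch of short plain-text rows packs densely while a batch of large HTML bodies is split early.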

Closed in earlier passes (April 2026 — session 4)

  • IP allowlist per API key: api_keys.allowed_cidrs; request IP (X-Forwarded-For / X-Real-IP) matched against CIDRs in withApiAuth with IPv4 + IPv6 support. 403 IP_NOT_ALLOWED on miss.
  • CSV import for contacts: POST /api/internal/audiences/:id/import-csv (text/csv or JSON body). Header row selects columns, extras become contact metadata. Up to 10k rows per call, audited as audience.csv_imported.
  • A/B subject variants on audience send: subject_variants (2–5 strings) assigns each contact deterministically by hash. Variant letter lands in messages.metadata.subject_variant.
  • Domain warmup ramp: domains.warmup_started_at + warmup_days. Daily cap grows 50 × 2^day for the configured window; exceeding returns 429 WARMUP_LIMIT_EXCEEDED.
  • Per-device session tracking — new user_sessions table + sessionId JWT claim. Revoked rows reject future requests via JWT callback. /api/internal/sessions lists active devices; DELETE revokes one. /overview/settings/security Devices card surfaces the list.
  • Async org export: org_exports job table, /api/cron/run-exports every 10 min, /api/internal/org/export/async enqueues; download endpoint streams complete jobs.
  • Retention crons — nightly /api/cron/retention trims health_probes (>30d), audit_logs (>365d), daily_stats (>180d), delivered webhook_deliveries (>30d).
  • Public status page: /status renders last 30 days from health_probes; cron probes every 5 min.
  • Dashboard audience surface: /overview/audiences + /api/internal/audiences CRUD + inline CSV / send drawer.
  • Owner-gated UI audit — billing page now reads role via requireSession, cancel-send buttons on message list/detail gated on useCanWrite, templates test-send now requireWriteSession (rejects viewers).
  • Tests — vitest covers schedule.ts (timezone conversion), cidr.ts (v4 + v6 matching), csv.ts (quoted fields, CRLF, escaped quotes). 81 tests total.
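
As a worked example of the domain warmup ramp (cap = 50 × 2^day), the sketch below computes the daily cap from the two columns named in that entry. Counting days from zero and lifting the cap entirely once the window ends are assumptions; warmupDailyCap is a hypothetical name.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the daily send cap for a domain, or null when no warmup cap applies.
function warmupDailyCap(warmupStartedAt: Date, warmupDays: number, now: Date): number | null {
  // Assumption: day 0 is the first 24 hours after warmup starts.
  const day = Math.floor((now.getTime() - warmupStartedAt.getTime()) / MS_PER_DAY);
  // Assumption: outside the configured window there is no warmup cap.
  if (day < 0 || day >= warmupDays) return null;
  return 50 * 2 ** day;
}
```

With these assumptions the ramp runs 50, 100, 200, 400, and so on; a send that would push the day's count past the cap is the case that returns 429 WARMUP_LIMIT_EXCEEDED.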

Closed in earlier passes (April 2026 — session 3)

  • Non-blocking webhook fan-out — SES, SNS-SMS, and inbound-email handlers wrap fanoutWebhookEvent in after(() => ...) so the HTTP response returns before fan-out runs.
  • /api/health endpoint — Postgres + Upstash probes, JSON body with status, checks, uptime_s. Returns 503 when any probe fails.
  • Stripe in-app cancel flow — billing portal session created with flow_data.subscription_cancel and current active subscription, surfaced on /overview/settings/billing.
  • GitHub Actions CI: .github/workflows/ci.yml runs tsc --noEmit, eslint, and vitest run on push + PR.
  • Per-key rate limits: api_keys.rate_limit_per_minute column (nullable; falls back to plan default). Enforced before plan bucket, emits X-RateLimit-Scope: key|org.
  • Message search: /api/internal/messages accepts ?q= across to/from/subject (email) or to/from/body (SMS) with escaped-ilike; UI adds a debounced search box.
  • Audit log dashboard: /overview/settings/audit with filter by action, client-side text filter, cursor pagination.
  • Onboarding checklist — server-rendered on /overview based on actual state (email verified, domain added, domain verified, key created, first send).
  • Template preview + test-send: /api/internal/templates/preview renders variables, /api/internal/templates/test-send fires a real send tagged template-test. Expandable UI per template on /overview/templates.
  • Deliverability card: /overview groups current-period email sends by sender domain and surfaces bounce rate with warn/critical thresholds.
  • Hot-path logger wire-up — all cron + webhook handlers emit log.info on success and logError on failure.
  • Tests — vitest covers unsubscribe token round-trip/tampering, from-address parse/format, tracking link rewrite + pixel injection, scopes, idempotency body hash.
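
The /api/health contract above (status, checks, uptime_s, 503 on any failed probe) can be sketched as a pure aggregation step. The body field names come from the entry; the "ok" / "degraded" status values, the ProbeResult shape, and buildHealthResponse are assumptions for illustration.

```typescript
// Hypothetical per-probe result; the real probes may report more detail.
type ProbeResult = { ok: boolean; latency_ms?: number };

interface HealthResponse {
  httpStatus: 200 | 503;
  body: {
    status: "ok" | "degraded"; // assumed values
    checks: Record<string, ProbeResult>;
    uptime_s: number;
  };
}

// Aggregate probe results: any failing probe turns the whole response into a 503.
function buildHealthResponse(
  checks: Record<string, ProbeResult>,
  uptimeSeconds: number,
): HealthResponse {
  const allOk = Object.values(checks).every((c) => c.ok);
  return {
    httpStatus: allOk ? 200 : 503,
    body: {
      status: allOk ? "ok" : "degraded",
      checks,
      uptime_s: Math.floor(uptimeSeconds),
    },
  };
}
```

Returning the per-probe map alongside the overall status lets a load balancer act on the status code while an operator reads which dependency failed.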

Closed in prior passes

  • Active-org cookie honored by requireSession(); /api/internal/* observes the switcher.
  • Backup codes bcrypt-hashed; login runs bcrypt.compare across the hash array.
  • Password reset bumps users.session_version; JWT carries the version and is invalidated on mismatch.
  • CORS preflight + headers set on /api/v1/* from proxy.ts.
  • Permissions model: owner vs member. Destructive / admin internal endpoints require owner.
  • Cancel scheduled send: DELETE /api/v1/{emails,sms}/{id} flips scheduled → canceled.
  • Member removal + role update: PATCH / DELETE /api/internal/team/members.
  • Audit: domain add/remove/verify, webhook create/remove/rotate, plan upgrade/downgrade all emit entries.
  • Resend verification email: POST /api/auth/resend-verification (rate-limited 3/hour via Upstash).
  • 2FA dashboard UI — setup (QR + verify), backup codes, disable w/ password.
  • Templates dashboard UI.
  • Webhook delivery inspection on /overview/webhooks.
  • OpenAPI 3.1 spec at /api/openapi.json.
  • Structured logger at src/lib/log.ts.
  • Tracking scaffold: applyTracking rewrites links and injects pixel when track_opens or track_clicks is set on a send.

Still open

Operational follow-ups for the new enterprise surfaces

  • SAML test harness — IdP-side configuration testing (Okta + Azure AD) currently only documented; no automated end-to-end test for AuthnResponse parsing.
  • DLT provider plurality — Gupshup is the only wired provider. Kaleyra / MSG91 / direct DLT-portal need their own pollDltStatus shims when customers ask.
  • SES @authenio/samlify-node-xmllint — optional schema validator. Production deploys should install it or verifyAndExtract throws.

Data

  • Neon HTTP driver caveat — multi-statement transactions only via sql.transaction([...]). db.transaction(...) unavailable.

Verified clean

  • tsc --noEmit passes with no errors after the current pass.
  • vitest run: 262 pass / 4 skipped / 0 fail.
  • All Drizzle tables re-exported in the schema barrel.
  • Migration 0009 applied to dev DB; journal backfilled for 0000-0009.
  • CI job defined at .github/workflows/ci.yml.