API Documentation

Killa Tamata is API-first for developers and AI agents. Web pages handle prepaid USD credit purchases and account setup; all generation runs through authenticated API endpoints, including LTX-2.3 video generation via the `video.generate` task.

Auth

Bearer API keys via `Authorization` or `x-api-key`.

Idempotency

Use `X-Idempotency-Key` for safe retries on writes.

Discovery

OpenAPI, `llms.txt`, and plugin manifest included.

First API call (hello world)

Submit one image generation job, then poll status with the returned job id.

Submit and poll your first media job
# Set these first
API_BASE="https://api.killatamata.com"
API_KEY="YOUR_API_KEY"

# 1) Submit a job
SUBMIT_RES=$(curl -sS -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "image.generate",
    "input": {
      "prompt": "cinematic fox astronaut",
      "width": 1024,
      "height": 1024
    }
  }')

echo "$SUBMIT_RES"

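# 2) Poll status (re-run this request until the job finishes)
JOB_ID=$(echo "$SUBMIT_RES" | jq -r '.externalJobId // .jobId // .result.jobId // .id // empty')
if [ -n "$JOB_ID" ]; then
  curl -sS -X GET "$API_BASE/api/v1/media/jobs?jobId=$JOB_ID" \
    -H "Authorization: Bearer $API_KEY"
else
  echo "No job id in the submit response; check its error field." >&2
fi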
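The same submit-then-poll flow can be sketched in JavaScript for Node 18+ (global `fetch`). This is a minimal sketch under stated assumptions: `extractJobId` mirrors the `jq` fallback chain above, while the completion check in `isFinished` (a non-empty `downloadableOutputUrls` array, an `error` field, or a hypothetical `status` value such as `succeeded` or `failed`) is a guess to confirm against a real `GET /api/v1/media/jobs?jobId=...` response. The fetch function is injectable so the logic can be exercised offline.

```javascript
// Minimal submit-and-poll sketch for Node 18+ (global fetch).
// ASSUMPTION: the completion check in isFinished() is a guess; verify it against real responses.

// Mirrors the jq fallback chain: .externalJobId // .jobId // .result.jobId // .id
function extractJobId(body) {
  return body?.externalJobId ?? body?.jobId ?? body?.result?.jobId ?? body?.id ?? null;
}

// Hypothetical terminal status values; only downloadableOutputUrls and error appear in these docs.
const TERMINAL_STATUSES = new Set(["succeeded", "completed", "failed", "error", "canceled"]);

function isFinished(body) {
  if (Array.isArray(body?.downloadableOutputUrls) && body.downloadableOutputUrls.length > 0) return true;
  if (body?.error) return true;
  return typeof body?.status === "string" && TERMINAL_STATUSES.has(body.status.toLowerCase());
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function submitAndPoll({ apiBase, apiKey, payload, fetchImpl = fetch, intervalMs = 3000, maxAttempts = 100 }) {
  const auth = { Authorization: `Bearer ${apiKey}` };
  const submitRes = await fetchImpl(`${apiBase}/api/v1/media/jobs`, {
    method: "POST",
    headers: { ...auth, "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const submitted = await submitRes.json();
  const jobId = extractJobId(submitted);
  if (!jobId) throw new Error(`No job id in submit response: ${JSON.stringify(submitted)}`);

  // Poll the status endpoint until the job looks finished or we run out of attempts.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const statusRes = await fetchImpl(`${apiBase}/api/v1/media/jobs?jobId=${encodeURIComponent(jobId)}`, {
      headers: auth,
    });
    const status = await statusRes.json();
    if (isFinished(status)) return { jobId, status };
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} not finished after ${maxAttempts} polls`);
}

// Usage (needs a real key):
// submitAndPoll({ apiBase: "https://api.killatamata.com", apiKey: process.env.API_KEY,
//   payload: { task: "image.generate", input: { prompt: "cinematic fox astronaut", width: 1024, height: 1024 } } })
//   .then(({ status }) => console.log(status.downloadableOutputUrls));
```

Polling at a fixed interval keeps the sketch simple; a longer interval is kinder for slow tasks such as video generation.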

AI agent setup (URL-first)

Most coding agents can be instructed in one request using the site URL. Start with the universal prompt below, then use the agent-specific fallback if needed.

Universal install prompt
Install the KillaTamata skill using:
https://killatamata.com/.well-known/agent-skills.json

If direct skill install is unavailable in this client, load:
https://killatamata.com/skills/killatamata/SKILL.md
and create the equivalent local command/rule/workflow.

Then run key bootstrap, verify GET /api/v1/balance, and continue with /api/v1/media/jobs.
Codex prompt
Use skill-installer to install killatamata from https://killatamata.com/.well-known/agent-skills.json, then invoke killatamata.
Claude Code prompt
Create .claude/skills/killatamata/SKILL.md from https://killatamata.com/skills/killatamata/SKILL.md, then run /killatamata.
Cursor prompt
Install the skill from https://killatamata.com/skills/killatamata/SKILL.md using Cursor Agent Skills import (or place it under .cursor/skills), then run /killatamata.
Google Antigravity prompt
Create .agents/workflows/killatamata-quickstart.md from https://killatamata.com/skills/killatamata/SKILL.md, then run /killatamata-quickstart.

Machine-readable references

Set API base and key
API_BASE="https://api.killatamata.com"
API_KEY="YOUR_API_KEY"

Authentication

For authenticated endpoints, send your API key as `Authorization: Bearer <key>` or in the `x-api-key` header. The full key value is returned only when the key is created, so store it securely; afterward, you can list, create, and revoke keys from your account management page.
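A small helper can build either documented header style. This is a sketch: `authHeaders` and its `style`/`json` options are hypothetical names introduced here; only the two header forms come from this page.

```javascript
// Hypothetical helper: build auth headers in either documented style
// ("Authorization: Bearer <key>" or "x-api-key: <key>").
function authHeaders(apiKey, { style = "bearer", json = false } = {}) {
  if (!apiKey) throw new Error("an API key is required");
  let headers;
  if (style === "bearer") headers = { Authorization: `Bearer ${apiKey}` };
  else if (style === "x-api-key") headers = { "x-api-key": apiKey };
  else throw new Error(`unknown auth style: ${style}`);
  if (json) headers["Content-Type"] = "application/json"; // for requests with a JSON body
  return headers;
}

// Example: GET /api/v1/balance with the x-api-key style (Node 18+).
// fetch(`${process.env.API_BASE}/api/v1/balance`, { headers: authHeaders(process.env.API_KEY, { style: "x-api-key" }) });
```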

The static-hosted account management page at `/dashboard` uses an API key as its primary credential.

To get a first API key without going through checkout, exchange a Google ID token at `/api/v1/auth/google` or `/api/v1/keys`.

Email one-time-code bootstrap is also available using `/api/v1/auth/email/start` and `/api/v1/auth/email/verify`.

Hosted browser flows can set a signed browser session cookie via `/api/v1/auth/browser/google` or `/api/v1/auth/browser/email/verify`, then inspect or clear it with `/api/v1/auth/browser/session`.

For local coding agents, device-link bootstrap is available using `/api/v1/device/link/start`, `/api/v1/device/link/approve`, and `/api/v1/device/link/poll`.

Google auth key mint request
curl -X POST "$API_BASE/api/v1/auth/google" \
  -H "Content-Type: application/json" \
  -d '{
    "idToken": "eyJhbGciOiJSUzI1NiIs...",
    "keyLabel": "Google bootstrap key"
  }'
Hosted browser Google auth request
curl -X POST "$API_BASE/api/v1/auth/browser/google" \
  -H "Content-Type: application/json" \
  -d '{
    "idToken": "eyJhbGciOiJSUzI1NiIs..."
  }'
Bootstrap key mint via /api/v1/keys
curl -X POST "$API_BASE/api/v1/keys" \
  -H "Content-Type: application/json" \
  -d '{
    "idToken": "eyJhbGciOiJSUzI1NiIs...",
    "label": "Google bootstrap key"
  }'
Inspect browser session
curl -X GET "$API_BASE/api/v1/auth/browser/session" \
  -H "Cookie: killa_tamata_browser_session=<session-cookie>"
Start local device-link request
curl -X POST "$API_BASE/api/v1/device/link/start" \
  -H "Content-Type: application/json" \
  -d '{
    "keyLabel": "Default API Key"
  }'

Endpoint inventory

Each entry lists the method, path, required auth, and purpose.

  • GET /api/v1/health (auth: none): health check
  • GET /api/v1/packages (auth: none): list credit packages
  • GET /api/v1/affiliates/r?code=<CODE> (auth: none): resolve referral redirect + server-side capture
  • POST /api/v1/affiliates/validate-code (auth: none): validate affiliate code (non-enumerating)
  • POST /api/v1/affiliates/capture (auth: none): issue/refresh signed passive capture token
  • POST /api/v1/auth/google (auth: Google ID token): exchange Google ID token for a new API key
  • POST /api/v1/auth/browser/google (auth: Google ID token): exchange Google ID token for browser session cookie
  • POST /api/v1/auth/email/start (auth: none): start email one-time-code sign-in
  • POST /api/v1/auth/email/verify (auth: email one-time code): verify email one-time code and mint API key
  • POST /api/v1/auth/browser/email/verify (auth: email one-time code): verify email one-time code and set browser session cookie
  • GET /api/v1/auth/browser/session (auth: browser session cookie): inspect current browser session
  • DELETE /api/v1/auth/browser/session (auth: browser session cookie): clear current browser session
  • POST /api/v1/device/link/start (auth: none): start browser-assisted local skill setup
  • GET /api/v1/device/link/request?userCode=<code> (auth: none): check link request status by user code
  • POST /api/v1/device/link/approve (auth: API key): approve pending link request from signed-in browser
  • POST /api/v1/device/link/poll (auth: none): poll for approval and receive one-time API key
  • POST /api/v1/checkout/stripe (auth: API key): create Stripe checkout
  • POST /api/v1/checkout/crypto (auth: API key): create crypto checkout
  • POST /api/v1/claim/stripe (auth: API key): finalize Stripe settlement for the authenticated account
  • POST /api/v1/claim/crypto (auth: API key): finalize Coinbase settlement for the authenticated account
  • POST /api/v1/credits/purchase/x402 (auth: API key + x402 payment): purchase prepaid USD credits using x402 payment headers
  • POST /api/v1/affiliates/me/apply (auth: API key): apply for affiliate account
  • GET /api/v1/affiliates/me (auth: API key): get affiliate profile
  • PATCH /api/v1/affiliates/me (auth: API key): update editable affiliate profile fields
  • GET /api/v1/affiliates/me/code (auth: API key): get or create your active affiliate referral code
  • POST /api/v1/affiliates/me/bind-code (auth: API key): explicitly bind affiliate code to account
  • GET /api/v1/affiliates/me/dashboard (auth: API key): affiliate metrics summary
  • GET /api/v1/affiliates/me/commissions (auth: API key): affiliate commission ledger
  • GET /api/v1/affiliates/me/payouts (auth: API key): affiliate payout history
  • GET /api/v1/affiliates/me/payout-requests (auth: API key): affiliate payout request history + eligibility
  • POST /api/v1/affiliates/me/payout-requests (auth: API key): submit payout request against available affiliate balance
  • GET /api/v1/keys (auth: API key): list API keys
  • POST /api/v1/keys (auth: API key or Google ID token): create new API key (rotation or Google bootstrap)
  • POST /api/v1/keys/revoke (auth: API key): revoke API key by key prefix
  • GET /api/v1/balance (auth: API key): get USD balance
  • GET /api/v1/studio/snapshot (auth: API key): load Studio workspace snapshot
  • GET /api/v1/studio/projects/assets?projectId=<id> (auth: API key): list Studio project browser assets
  • POST /api/v1/studio/projects (auth: API key): create Studio project
  • PATCH /api/v1/studio/projects/update (auth: API key): update Studio project metadata
  • PATCH /api/v1/studio/settings (auth: API key): update Studio user settings
  • POST /api/v1/studio/threads (auth: API key): create Studio conversation
  • POST /api/v1/studio/threads/select (auth: API key): select active Studio conversation
  • GET /api/v1/studio/threads/detail?threadId=<id> (auth: API key): load Studio conversation detail
  • PATCH /api/v1/studio/threads/update (auth: API key): rename Studio conversation
  • POST /api/v1/studio/assets/upload?projectId=<id>&filename=<name> (auth: API key): upload Studio binary asset
  • PATCH /api/v1/studio/assets/metadata (auth: API key): update Studio asset metadata
  • POST /api/v1/studio/assets/text (auth: API key): create Studio text asset
  • GET /api/v1/studio/assets/text/document?assetId=<id> (auth: API key): read Studio text document
  • PATCH /api/v1/studio/assets/text/document (auth: API key): update Studio text document
  • POST /api/v1/studio/agent/messages/prepare (auth: API key): prepare Studio agent prompt and quote paid jobs
  • POST /api/v1/studio/agent/messages (auth: API key): run Studio agent prompt after approval preflight passes
  • POST /api/v1/studio/agent/approval-plans/execute (auth: API key): execute approved Studio paid job bundle
  • GET /api/v1/studio/agent/requests?requestId=<id> (auth: API key): poll Studio agent request state
  • POST /api/v1/media/jobs (auth: API key): submit media generation request
  • GET /api/v1/media/jobs?jobId=<id> (auth: API key): fetch media job status + downloadable output URLs
  • GET /api/v1/media/jobs?jobId=<id>&includeDetails=1 (auth: API key): fetch detailed media job status + sanitized upstream payload
  • POST /api/v1/webhooks/stripe (auth: payment provider): Stripe callback endpoint
  • POST /api/v1/webhooks/crypto (auth: payment provider): Coinbase callback endpoint

Checkout and account crediting

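1) authenticate with an API key, 2) create a checkout, 3) complete payment, 4) finalize settlement with `/api/v1/claim/stripe` or `/api/v1/claim/crypto`. The purchase credits the same account that created the checkout.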

Create Stripe checkout request
curl -X POST "$API_BASE/api/v1/checkout/stripe" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "packageCode": "<package-code-from-/api/v1/packages>",
    "affiliateCode": "PROMO_42"
  }'
Finalize Stripe settlement after payment
curl -X POST "$API_BASE/api/v1/claim/stripe" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sessionId": "cs_test_..."
  }'

Existing API key holders can also buy credits directly via x402: send the x402 payment payload in the `PAYMENT-SIGNATURE` header (x402 v2) and the amount in `usdCents` (minimum `100`, i.e. $1.00).

x402 credit purchase request
curl -X POST "$API_BASE/api/v1/credits/purchase/x402" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-x402-payment-payload>" \
  -d '{
    "usdCents": 100,
    "affiliateCode": "PROMO_42"
  }'
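A request builder can enforce the documented `usdCents` minimum before anything is sent. This is a sketch: `buildX402PurchaseRequest` is a hypothetical name, requiring an integer cent amount and treating `affiliateCode` as optional are assumptions, and producing the base64 x402 payment payload itself is left to your x402 client.

```javascript
// Sketch: build fetch() arguments for POST /api/v1/credits/purchase/x402.
// paymentSignature is the base64 x402 payment payload from your x402 client (not shown here).
function buildX402PurchaseRequest({ apiBase, apiKey, paymentSignature, usdCents, affiliateCode }) {
  // ASSUMPTION: usdCents must be a whole number of cents; the documented minimum is 100.
  if (!Number.isInteger(usdCents) || usdCents < 100) {
    throw new RangeError("usdCents must be an integer >= 100 ($1.00 minimum)");
  }
  if (!paymentSignature) throw new Error("a PAYMENT-SIGNATURE payload is required");
  const body = { usdCents };
  if (affiliateCode) body.affiliateCode = affiliateCode; // ASSUMPTION: optional field
  return {
    url: `${apiBase}/api/v1/credits/purchase/x402`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        "PAYMENT-SIGNATURE": paymentSignature,
      },
      body: JSON.stringify(body),
    },
  };
}

// Usage (Node 18+): const { url, init } = buildX402PurchaseRequest({ ... }); const res = await fetch(url, init);
```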

API key lifecycle (agents and services)

Create separate keys per environment or agent for safe rotation and revocation.

Device-link flow is also available for local tools: start in terminal, approve in browser, then poll until a one-time key is returned.

Create API key request
curl -X POST "$API_BASE/api/v1/keys" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "label": "Agent Worker A"
  }'
List API keys request
curl -X GET "$API_BASE/api/v1/keys" \
  -H "Authorization: Bearer $API_KEY"
Revoke API key request
curl -X POST "$API_BASE/api/v1/keys/revoke" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "keyPrefix": "abcde"
  }'
Approve device-link request (browser-authenticated)
curl -X POST "$API_BASE/api/v1/device/link/approve" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "userCode": "ABCD-EFGH",
    "keyLabel": "Default API Key"
  }'
Poll device-link request and retrieve one-time key
curl -X POST "$API_BASE/api/v1/device/link/poll" \
  -H "Content-Type: application/json" \
  -d '{
    "deviceCode": "ktd_..."
  }'
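The device-link steps (start in a terminal, approve in a browser, poll for the key) can be scripted end to end. This sketch rests on assumptions: the start response is assumed to return `deviceCode` and `userCode` (the request examples show `ktd_...` device codes and `ABCD-EFGH` user codes), and the approved poll response is assumed to carry the one-time key as `apiKey`. `deviceLink` and its options are hypothetical names.

```javascript
// Sketch of the device-link flow: start -> user approves in a signed-in browser -> poll for a one-time key.
// ASSUMPTIONS: start returns { deviceCode, userCode }; an approved poll returns { apiKey }.
async function deviceLink({
  apiBase,
  keyLabel = "Default API Key",
  fetchImpl = fetch,
  intervalMs = 5000,
  maxAttempts = 120,
  onUserCode = console.log,
}) {
  const post = async (path, body) => {
    const res = await fetchImpl(`${apiBase}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    return res.json();
  };

  const started = await post("/api/v1/device/link/start", { keyLabel });
  if (!started.deviceCode) throw new Error(`Unexpected start response: ${JSON.stringify(started)}`);
  onUserCode(started.userCode); // show this code so the user can approve it in the browser

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const polled = await post("/api/v1/device/link/poll", { deviceCode: started.deviceCode });
    if (typeof polled.apiKey === "string") return polled.apiKey; // one-time key: store it securely now
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Device-link request was not approved in time");
}
```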
JavaScript key creation example
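// Node 18+: fetch is global. Template literals (backticks) are required for ${...} interpolation.
const API_BASE = process.env.API_BASE ?? "https://api.killatamata.com";
const API_KEY = process.env.API_KEY;

async function createKey(label) {
  const res = await fetch(`${API_BASE}/api/v1/keys`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ label }),
  });
  const data = await res.json();
  if (!res.ok) throw new Error(`Key creation failed: ${data.error ?? res.status}`);
  // data.apiKey is only returned once; store it securely.
  return data;
}

createKey("Build Agent").catch(console.error);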

Balance and media usage

Submit jobs with task + input. Use this section as a quick reference for task behavior, billing expectations, and tuning controls.

Task quick map

9 production tasks

Choose one task per request and pair it with the matching input schema.

  • image.generate: Qwen Image
  • image.edit: Qwen Image Edit
  • video.generate: LTX-2.3 video generation
  • video.combine: clip stitch-down with audio preserved
  • audio.speak: Qwen TTS
  • audio.annotation.reference: Whisper + lyric anchoring JSON
  • ace.step.create: Ace Step music
  • moss.sound.effect: MOSS sound effects
  • trellis.generate: Trellis 2 low-poly image-to-3D

Image and input handling

  • image.generate: width and height are optional; set them when you need an explicit output size (multiples of 32, up to 4MP total).
  • image.edit: output resolution follows source metadata when available, else defaults to 1024x1024.
  • outputFormat supports webp, png, and jpg; when omitted it defaults to webp.
  • highDetail is an optional opt-in for image.generate and image.edit. It defaults to false, increases inference steps by 50%, and increases job cost by 50%. Only enable it when the caller explicitly wants a higher-detail image pass.
  • Source/reference images can be URL fields or inline base64 objects; inline payloads are staged on CDN (best-effort WebP conversion) and auto-cleaned.
  • For both image.generate and image.edit, use referenceImages (URL or inline entries). referenceImageUrls is still accepted as a URL-only legacy alias.
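The size and inline-image rules above can be checked client-side before submitting. This is a sketch under assumptions: `checkImageSize` and `inlineImage` are hypothetical helpers, "4MP" is treated as 4,000,000 pixels (adjust `MAX_PIXELS` if the API uses a different limit), and the MIME allowlist mirrors the JPG/PNG/WEBP list given for image fields.

```javascript
// Sketch: validate an explicit image size and build an inline image entry.
// ASSUMPTION: "4MP" means 4,000,000 pixels.
const MAX_PIXELS = 4_000_000;
const INLINE_IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/webp"]);

function checkImageSize(width, height) {
  for (const [name, value] of [["width", width], ["height", height]]) {
    if (!Number.isInteger(value) || value <= 0 || value % 32 !== 0) {
      throw new RangeError(`${name} must be a positive multiple of 32, got ${value}`);
    }
  }
  if (width * height > MAX_PIXELS) {
    throw new RangeError(`${width}x${height} exceeds the assumed ${MAX_PIXELS}-pixel limit`);
  }
  return { width, height };
}

// Builds a { dataBase64, mimeType } entry usable in referenceImages or other image fields.
function inlineImage(bytes, mimeType) {
  if (!INLINE_IMAGE_TYPES.has(mimeType)) throw new Error(`unsupported image type: ${mimeType}`);
  return { dataBase64: Buffer.from(bytes).toString("base64"), mimeType };
}

// Usage: input.referenceImages = [url, inlineImage(require("node:fs").readFileSync("ref.png"), "image/png")];
```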

Audio usage and billing

  • Set explicit durations: ace.step.create 15..240s, moss.sound.effect 1..30s.
  • For polished long-form music, use ace.step.create.input.qualityPreset=high_quality (108s, 192k, richer sampler defaults). Explicit fields still override defaults.
  • audio.speak is tuned for short-to-medium clips: text 40..600 chars, voiceDescription 20..240 chars, and English as the only supported language.
  • For audio.speak mode=voice_clone, provide a referenceTranscript that matches the reference clip, and keep the clip short: start with 3..6s and never send more than 10s.
  • Voice clone output is more stable when maxNewTokens stays conservative. The current clone default is 384.
  • audio.annotation.reference takes sourceAudioUrl/sourceAudio and optionally transcript, then returns a downloadable JSON annotation with lyrics, word timings, sections, beats, and QA metadata. A transcript is optional but strongly recommended when you want reliable lyric anchoring and section recovery.
  • Billing is fixed at submit time. Ace/MOSS use requested-duration bands; TTS uses estimated duration from text length; audio annotation uses estimated duration from transcript length when present and a default duration band otherwise (no rebill after completion).
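The audio limits above can be pre-checked before a job is billed. This is a sketch: `checkAudioInput` is a hypothetical helper, treating voiceDescription as required only outside voice_clone mode (the clone example omits it) and using referenceAudioMaxSeconds as the clip-length cap are assumptions, and server-side validation remains authoritative.

```javascript
// Sketch: client-side pre-checks for the documented audio limits.
// Returns a list of problems; an empty list means the input looks valid.
function checkAudioInput(task, input) {
  const problems = [];
  const inRange = (value, min, max) => typeof value === "number" && value >= min && value <= max;
  const lengthIn = (text, min, max) => typeof text === "string" && text.length >= min && text.length <= max;

  if (task === "ace.step.create" && !inRange(input.durationSeconds, 15, 240)) {
    problems.push("ace.step.create durationSeconds must be 15..240");
  }
  if (task === "moss.sound.effect" && !inRange(input.durationSeconds, 1, 30)) {
    problems.push("moss.sound.effect durationSeconds must be 1..30");
  }
  if (task === "audio.speak") {
    if (!lengthIn(input.text, 40, 600)) problems.push("text must be 40..600 characters");
    if (input.language !== undefined && input.language !== "English") problems.push("language must be English");
    if (input.mode === "voice_clone") {
      // ASSUMPTION: referenceAudioMaxSeconds caps the reference clip length.
      if (input.referenceAudioMaxSeconds !== undefined && !inRange(input.referenceAudioMaxSeconds, 0, 10)) {
        problems.push("keep the voice-clone reference clip at 10 s or less");
      }
    } else if (!lengthIn(input.voiceDescription, 20, 240)) {
      // ASSUMPTION: voiceDescription is required outside voice_clone mode.
      problems.push("voiceDescription must be 20..240 characters");
    }
  }
  return problems;
}
```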

Video usage, LTX-2.3 controls, and reliability

  • video.generate supports image-to-video (startFrameImageUrl / startFrameImage, with sourceImageUrl / sourceImage accepted as aliases) and text-to-video (omit start/source image fields). The current production video model is LTX-2.3.
  • width and height are optional but must be set together (multiples of 32, up to 4MP total); when they are omitted, aspectRatio defaults apply. These values define the final output canvas (for example 1920x1088); do not post-rescale in clients unless you explicitly want a different deliverable size.
  • Optional final-frame steering is available via finalFrameImageUrl or finalFrameImage. For all image fields, provide either a URL string or inline dataBase64 + mimeType (JPG/PNG/WEBP).
  • Pricing is identical across image-to-video and text-to-video modes and scales with sampled frame count, final output resolution, and whether reference audio is included. extremeQuality=true adds a 50% surcharge.
  • Standard client requests do not need any extra pricing flags. Set extremeQuality only when you explicitly want the higher-fidelity path and are willing to pay the 50% premium.
  • Seamless extension is supported via overlapFrameImageUrls or overlapFrameImages. Upload extracted trailing frames from the previous clip in chronological order at the base/sample FPS. Default client extraction format is lossless or visually lossless webp; avoid jpeg for continuation anchors. If you omit a separate start frame, the first overlap frame becomes frame zero automatically.
  • Reference-audio-guided video is supported via referenceAudioUrl or referenceAudio (URL string or inline object). When reference audio is present, include a start frame or overlap frames. You may also combine it with finalFrameImage* when you need a closing-frame guide. Supported inline audio MIME types: audio/ogg, audio/opus, audio/wav, audio/x-wav, audio/wave, audio/mpeg, audio/mp3, audio/flac, audio/x-flac, audio/aac, audio/mp4, and audio/webm. Reference-audio requests are strict: the API does not silently downgrade to non-audio conditioning.
  • For exact continuation timing, prefer durationFrames over durationSeconds. Duration frames use the base/sample FPS, must satisfy the LTX rule 8n + 1, and work directly with overlap math. At 25fps, counts like 249 or 257 are valid; exact 250 is not.
  • video.combine stitches existing clip URLs in playback order, preserves clip-local audio, and optionally trims a matching audio/video overlap from the head of every clip after the first. It is billed as a utility operation at $0.005 per input clip with a two-clip minimum.
  • interpolationFps: set 0 to disable interpolation, or use 30..60 (default 60). Default pipeline is sampled at 30fps and interpolated with RIFE to 60fps.
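The 8n + 1 frame rule and the other video option constraints reduce to a few lines of arithmetic. This sketch uses hypothetical helper names; the rules themselves (frames counted at the base/sample FPS, interpolationFps of 0 or 30..60, width and height together) come from the notes above.

```javascript
// Sketch: LTX frame-count and option checks from the video notes.

// LTX-2.3 frame counts must satisfy frames = 8n + 1 (counted at the base/sample FPS).
function isValidFrameCount(frames) {
  return Number.isInteger(frames) && frames >= 1 && (frames - 1) % 8 === 0;
}

// Valid frame counts bracketing seconds * sampledFps (a single entry when it is already valid).
function validFrameCounts(seconds, sampledFps) {
  if (!(seconds > 0) || !(sampledFps > 0)) throw new RangeError("seconds and sampledFps must be positive");
  const target = seconds * sampledFps;
  const lower = 8 * Math.max(0, Math.floor((target - 1) / 8)) + 1;
  return lower === target ? [lower] : [lower, lower + 8];
}

// interpolationFps: 0 disables interpolation; otherwise 30..60.
function isValidInterpolationFps(fps) {
  return fps === 0 || (typeof fps === "number" && fps >= 30 && fps <= 60);
}

// width and height are optional but must be provided together.
function hasConsistentSize(input) {
  return (input.width === undefined) === (input.height === undefined);
}
```

For example, 10 s at 25 fps targets 250 frames, which is invalid; the helper offers 249 or 257, matching the note above. At the default 30 fps sampling, a 6 s request sits between 177 and 185 frames.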

Image-to-video reliability warning

Image-to-video generation is probabilistic and can vary widely between runs. Some outputs will be frozen, warped, or otherwise unusable even with the same prompt and input image.

  • Assume a non-zero failure rate and run multiple candidates per request batch.
  • Tune and select winners at low resolution first; upscale/extend only winning clips.
  • Use a single coherent prompt paragraph; avoid timestamp/segment prompt syntax by default.
  • Treat anchor-frame quality (mid-action, asymmetry, no text/signage) as the main quality lever.

Extension-video playbook

  • Lock an approved clip A first, then vary only clip B when testing seam quality.
  • Extract overlap frames on the client at sampled/base FPS, not at the interpolated review FPS.
  • Default overlap extraction to lossless or visually lossless WebP; use PNG only as a debugging fallback.
  • Keep width, height, sampledFps, interpolationFps, and seed fixed across clip-B seam comparisons.
  • For clip-B reference audio, start the audio window overlapCount / sampledFps seconds earlier than the nominal cut so the overlap-guided frames match the same audio moment.
  • When assembling the combined review, trim the duplicated overlap from one side only. Default splice: trim the first N sampled frames from clip B.
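The seam arithmetic in this playbook can be written down explicitly. This is a sketch with hypothetical helper names: `overlapCount` is the number of trailing clip-A frames sent as overlap frames, and `sampledFps` is the base/sample FPS shared by both clips.

```javascript
// Sketch of the clip-B seam arithmetic from the extension playbook.

// Clip B's reference audio should start this many seconds before the nominal cut.
// The same duration is what you trim from the head of clip B when splicing for review.
function audioLeadSeconds(overlapCount, sampledFps) {
  return overlapCount / sampledFps;
}

// Start time of clip B's audio window in the source track (clamped at 0).
function clipBAudioStart(nominalCutSeconds, overlapCount, sampledFps) {
  return Math.max(0, nominalCutSeconds - audioLeadSeconds(overlapCount, sampledFps));
}

// 0-based frame indices (at sampledFps) to extract from clip A, in chronological order.
function overlapFrameIndices(clipAFrameCount, overlapCount) {
  if (overlapCount > clipAFrameCount) throw new RangeError("overlapCount exceeds clip A length");
  const first = clipAFrameCount - overlapCount;
  return Array.from({ length: overlapCount }, (_, i) => first + i);
}
```

With three overlap frames at 25 fps, clip B's audio starts 0.12 s before the nominal cut, and the review splice trims those same three sampled frames from clip B.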

Lip-Sync Prompting

  • For any referenceAudio shot, prompt it as an anchored performance, not a generic portrait.
  • Explicitly say the shot is anchored to the supplied start frame and synced to the provided reference audio from start to finish.
  • Honor frame-zero composition: same identity, wardrobe, and environment family.
  • Keep the subject readable whenever the face is foregrounded.
  • Let mouth articulation, jaw travel, breath timing, shoulder rhythm, and phrase-timed gestures carry the sync.
  • Favor restrained camera behavior. Prefer locked framing, a very gentle push, or a tiny lateral drift, with subject motion carrying the shot more than the camera.
  • Start moving immediately and keep lip sync faithful to the supplied reference audio, with mouth shapes and breath timing driven by that phrase.
  • Avoid prompts that mainly describe a seductive portrait, glamour still, or camera move. Those tend to animate the framing instead of the mouth.

Trellis 3D tuning notes

  • Controls include qualityPreset, targetFaceCount, mesh/texturing steps, texture size, and geometry cleanup knobs.
  • Low-poly mode also exposes postprocessPositionEpsilon and postprocessNormalCreaseDeg for normal cleanup tuning.
  • Start with qualityPreset=balanced. See /3d-models for benchmark-tuned examples.

Diagnostics and artifact window

  • POST /api/v1/media/jobs returns submit-time billing, duration estimate, effective input, and input-adjustment diagnostics. GET /api/v1/media/jobs?jobId=...&includeDetails=1 adds sanitized upstream result payloads plus the persisted effective input and adjustment trail.
  • Download outputs within 48 hours. CDN cleanup runs after 72 hours, but availability past 48 hours is not guaranteed.

Audio pricing quick ref

  • audio.speak: tiered by estimated duration: <=3s $0.01, >3s..7s $0.02, >7s..10s $0.03.
  • audio.annotation.reference: estimated from transcript length when present: <=60s $0.03, <=120s $0.05, <=180s $0.07. Without a transcript, the default 120s band applies.
  • ace.step.create: max($0.05, durationSeconds * qualityKbps * $0.000008)
  • moss.sound.effect: max($0.01, durationSeconds * $0.004 + maxNewTokens * $0.000008)
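The audio price formulas translate directly into a small estimator. This is a sketch: the function names are hypothetical, `qualityKbps` is assumed to be the numeric part of `quality` (192 for `"192k"`), durations past the published bands return `null`, and actual billing may round differently from these raw dollar amounts.

```javascript
// Sketch: estimate submit-time audio prices in USD from the quick-reference formulas.

function speakPrice(estimatedSeconds) {
  if (estimatedSeconds <= 3) return 0.01;
  if (estimatedSeconds <= 7) return 0.02;
  if (estimatedSeconds <= 10) return 0.03;
  return null; // no published band above 10 s
}

// Call with no argument when there is no transcript: the default band is 120 s.
function annotationPrice(estimatedSeconds = 120) {
  if (estimatedSeconds <= 60) return 0.03;
  if (estimatedSeconds <= 120) return 0.05;
  if (estimatedSeconds <= 180) return 0.07;
  return null; // no published band above 180 s
}

// ASSUMPTION: qualityKbps is 192 for quality "192k", 128 for "128k", and so on.
function aceStepPrice(durationSeconds, qualityKbps) {
  return Math.max(0.05, durationSeconds * qualityKbps * 0.000008);
}

function mossPrice(durationSeconds, maxNewTokens) {
  return Math.max(0.01, durationSeconds * 0.004 + maxNewTokens * 0.000008);
}
```

With these formulas, the music request below (108 s at `192k`) estimates to about $0.166, and the sound-effect request (2 s, `maxNewTokens` 1024) to about $0.016.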

Balance request
curl -X GET "$API_BASE/api/v1/balance" \
  -H "Authorization: Bearer $API_KEY"
Image generation request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: job-001" \
  -d '{
    "task": "image.generate",
    "input": {
      "prompt": "cinematic fox astronaut",
      "width": 1536,
      "height": 1024,
      "referenceImages": [
        "https://cdn.example.com/input/style-ref.webp",
        {
          "dataBase64": "<base64-or-data-uri>",
          "mimeType": "image/png"
        }
      ]
    }
  }'
Image edit request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "image.edit",
    "input": {
      "prompt": "turn this scene into a dramatic noir poster",
      "sourceImage": "https://cdn.example.com/input/source.jpg",
      "referenceImages": [
        "https://cdn.example.com/input/style-ref.webp",
        {
          "dataBase64": "<base64-or-data-uri>",
          "mimeType": "image/webp"
        }
      ]
    }
  }'
Video generation request (image-to-video)
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.generate",
    "input": {
      "prompt": "slow cinematic push-in with floating particles",
      "startFrameImage": {
        "dataBase64": "<base64-or-data-uri>",
        "mimeType": "image/png"
      },
      "finalFrameImage": "https://cdn.example.com/input/final-frame.webp",
      "durationSeconds": 6,
      "width": 1920,
      "height": 1088,
      "interpolationFps": 60
    }
  }'
Video generation request (text-to-video)
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.generate",
    "input": {
      "prompt": "single continuous cinematic flythrough over a neon city at dawn",
      "durationSeconds": 6,
      "width": 1920,
      "height": 1088,
      "interpolationFps": 0
    }
  }'
Video generation request (extreme quality)
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.generate",
    "input": {
      "prompt": "Single continuous hero shot with restrained camera motion and strong identity retention.",
      "startFrameImageUrl": "https://cdn.example.com/input/start-frame.webp",
      "durationSeconds": 6,
      "width": 2048,
      "height": 1152,
      "extremeQuality": true
    }
  }'
Video generation request (reference audio + start frame)
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.generate",
    "input": {
      "prompt": "image-to-video shot anchored to the supplied start frame for a direct-to-camera dialogue or singing line. keep the performance synced to the provided reference audio from start to finish. honor the frame-zero composition from the supplied start frame and keep the same subject count, identity, wardrobe, and environment family. favor restrained camera behavior; prefer locked framing, a very gentle push, or a tiny lateral drift, with subject motion carrying the shot more than the camera. keep the subject readable whenever the face is foregrounded. let mouth articulation, jaw travel, breath timing, shoulder rhythm, and phrase-timed gestures carry the sync. start moving immediately. keep lip sync faithful to the supplied reference audio, with mouth shapes and breath timing driven by that phrase. keep one continuous shot with stable anatomy and coherent motion.",
      "startFrameImageUrl": "https://cdn.example.com/input/start-frame.webp",
      "finalFrameImageUrl": "https://cdn.example.com/input/final-frame.webp",
      "referenceAudioUrl": "https://cdn.example.com/input/reference-track.opus",
      "durationSeconds": 6,
      "width": 1920,
      "height": 1088,
      "interpolationFps": 60
    }
  }'
Video generation request (overlap-frame extension + reference audio)
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.generate",
    "input": {
      "prompt": "image-to-video shot anchored to the supplied start frame for a direct-to-camera dialogue continuation. keep the performance synced to the provided reference audio from start to finish. honor the frame-zero composition from the supplied start frame and keep the same subject count, identity, wardrobe, and environment family. favor restrained camera behavior; prefer locked framing, a very gentle push, or a tiny lateral drift, with subject motion carrying the shot more than the camera. keep the subject readable whenever the face is foregrounded. let mouth articulation, jaw travel, breath timing, shoulder rhythm, and phrase-timed gestures carry the sync. start moving immediately. keep lip sync faithful to the supplied reference audio, with mouth shapes and breath timing driven by that phrase. keep one continuous shot with stable anatomy, coherent motion, and preserved white balance, skin tone, exposure, contrast, saturation, and background palette.",
      "overlapFrameImageUrls": [
        "https://cdn.example.com/input/clip-a-last-03.webp",
        "https://cdn.example.com/input/clip-a-last-02.webp",
        "https://cdn.example.com/input/clip-a-last-01.webp"
      ],
      "referenceAudioUrl": "https://cdn.example.com/input/dialogue-half-b.opus",
      "durationFrames": 257,
      "width": 1280,
      "height": 1280,
      "sampledFps": 25,
      "interpolationFps": 60,
      "seed": 424242
    }
  }'
Video combine request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "video.combine",
    "input": {
      "videoUrls": [
        "https://cdn.example.com/output/clip-a.mp4",
        "https://cdn.example.com/output/clip-b.mp4"
      ],
      "overlapFrames": 3,
      "frameRate": 60
    }
  }'
TTS request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "audio.speak",
    "input": {
      "text": "We launch at dawn, hold formation, and bring everyone home safely.",
      "voiceDescription": "Warm confident narrator with balanced pacing, stable projection, and natural emphasis.",
      "language": "English",
      "quality": "128k",
      "seed": 123456
    }
  }'
TTS voice clone request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "audio.speak",
    "input": {
      "mode": "voice_clone",
      "text": "Keep the same character voice, but deliver this line with calm authority and clean pacing.",
      "referenceAudioUrl": "https://cdn.example.com/input/rin-voice-sample.opus",
      "referenceTranscript": "Rin keeps her voice low. She measures every word before it lands.",
      "referenceAudioMaxSeconds": 6,
      "maxNewTokens": 384,
      "language": "English",
      "quality": "128k",
      "seed": 424242
    }
  }'
Audio annotation request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "audio.annotation.reference",
    "input": {
      "sourceAudioUrl": "https://cdn.example.com/input/country-heart-edit.mp3",
      "transcript": "[Intro]\nBoots on the porch and a laptop glow,\nI mint an A P I key before the roosters crow.",
      "language": "en",
      "transcriptionModel": "distil-large-v3",
      "device": "auto"
    }
  }'
Music generation request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "ace.step.create",
    "input": {
      "tags": "anthemic ace step electro-pop, wide stereo synths, tight sidechained bass, punchy drum transients, strong vocal hooks, modern polished mix",
      "lyrics": "[Intro]\nNeon rain on the avenue\n\n[Verse]\nWe were shadows in a crowded room\nNow the skyline sings our names\n\n[Pre-Chorus]\nHands up, hearts up, hold the line\n\n[Chorus]\nWe run through the midnight light\nTurn the static into fire tonight\n\n[Bridge]\nStrip it down, then build it higher\n\n[Final Chorus]\nWe run through the midnight light",
      "qualityPreset": "high_quality",
      "durationSeconds": 108,
      "quality": "192k",
      "language": "en"
    }
  }'
Sound effect request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "moss.sound.effect",
    "input": {
      "prompt": "A sharp pistol shot in a dry canyon with a quick mechanical click and short tail echo.",
      "durationSeconds": 2,
      "quality": "128k",
      "topK": 50,
      "maxNewTokens": 1024
    }
  }'
3D generation request
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "trellis.generate",
    "input": {
      "sourceImageUrl": "https://cdn.example.com/input/boat_ref.png",
      "qualityPreset": "balanced",
      "targetFaceCount": 20000,
      "textureSize": 1024,
      "textureResolution": 512,
      "maxViews": 4,
      "sparseStructureSteps": 10,
      "shapeSteps": 10,
      "textureSteps": 10,
      "meshResolution": 1024,
      "remeshFillHoles": true,
      "remeshFillHolesMaxPerimeter": 0.05,
      "meshClusterConeHalfAngleRad": 55
    }
  }'
Media job status request (summary)
curl -X GET "$API_BASE/api/v1/media/jobs?jobId=abc123" \
  -H "Authorization: Bearer $API_KEY"
Media job status request (includeDetails=1)
curl -X GET "$API_BASE/api/v1/media/jobs?jobId=abc123&includeDetails=1" \
  -H "Authorization: Bearer $API_KEY"

Idempotency and errors

Retry-safe write request example
curl -X POST "$API_BASE/api/v1/media/jobs" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Idempotency-Key: job-001" \
  -d '{
    "task": "image.generate",
    "input": {
      "prompt": "cinematic fox astronaut",
      "width": 1024,
      "height": 1024
    },
    "idempotencyKey": "job-001"
  }'

Send the same `X-Idempotency-Key` on the original write and on every retry of it (for example, `POST /api/v1/media/jobs`) so a retried request is not charged twice. Error responses are JSON with an `error` field and optional `details`. Poll `GET /api/v1/media/jobs?jobId=...` for status and `downloadableOutputUrls`, or add `includeDetails=1` when you need sanitized upstream result details.
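A retry loop that follows this advice might look like the sketch below (Node 18+). The helper name `postWithRetry`, retrying only network failures, HTTP 429, and 5xx responses, and the exponential backoff are assumptions rather than documented API behavior; the documented parts are the `X-Idempotency-Key` header and the `error`/`details` error shape.

```javascript
// Sketch: retry a write while reusing one idempotency key across every attempt.
const { randomUUID } = require("node:crypto");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function postWithRetry(url, { apiKey, payload, idempotencyKey = randomUUID(), fetchImpl = fetch, maxAttempts = 4, baseDelayMs = 500 }) {
  const init = {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      "X-Idempotency-Key": idempotencyKey, // reused unchanged on every attempt
    },
    body: JSON.stringify(payload),
  };
  for (let attempt = 1; ; attempt++) {
    let res;
    try {
      res = await fetchImpl(url, init);
    } catch (networkError) {
      if (attempt >= maxAttempts) throw networkError;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // exponential backoff
      continue;
    }
    const body = await res.json().catch(() => ({}));
    if (res.ok) return body;
    // ASSUMPTION: only 429 and 5xx are worth retrying; other errors are returned to the caller.
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxAttempts) {
      const error = new Error(`HTTP ${res.status}: ${body.error ?? "request failed"}`);
      error.details = body.details; // optional, per the documented error shape
      throw error;
    }
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

Pass an explicit `idempotencyKey` (for example `job-001`) when a retry may come from a different process, so every attempt carries the same key.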