@SecurityQQ
Last active March 12, 2026 01:36
varg Gateway & SDK Quick Start for AI Agents

Gateway Quick Start for AI Agents

Generate images, videos, speech, and music using a single varg_xxx API key. No per-provider keys needed.

Setup

bun install vargai @vargai/gateway ai
# .env
VARG_API_KEY=varg_xxx

That's it. The gateway pools provider keys (fal, ElevenLabs, Higgsfield, Replicate) server-side. You only need one key.


Client Setup

Two ways to use the gateway:

Low-level client (VargClient)

Direct access to gateway REST endpoints. Returns job objects you poll/wait on.

import { VargClient } from "@vargai/gateway";

const client = new VargClient({
  apiKey: process.env.VARG_API_KEY!,
});
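
waitForJob handles the polling for you. When custom behavior is needed (progress logging, a different backoff), the same loop can be written generically. This is an SDK-independent sketch, not gateway code; the names are illustrative:

```typescript
// Generic polling loop, the shape waitForJob presumably implements internally.
// Sketch only: `check` fetches the current state, `isDone` decides when to stop.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (state: T) => boolean,
  intervalMs = 2_000,
  maxAttempts = 150,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const state = await check();
    if (isDone(state)) return state;
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before next check
  }
  throw new Error("polling timed out");
}

// With the low-level client this would look like:
// pollUntil(() => client.getJob(job.job_id), (j) => j.status === "completed")
```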

AI SDK provider (createVarg)

Implements the Vercel AI SDK provider interface. Use with generateVideo, generateImage, etc.

import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

Generate an Image

With VargClient

const job = await client.createImage({
  model: "nano-banana-pro",
  prompt: "sunset over mountains, dramatic lighting",
  aspect_ratio: "16:9",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.png

With AI SDK

import { generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "sunset over mountains, dramatic lighting",
  aspectRatio: "16:9",
});

Generate a Video

With VargClient

const job = await client.createVideo({
  model: "kling-v3",
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  duration: 5,
  aspect_ratio: "16:9",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp4

With AI SDK

import { generateVideo } from "vargai/ai";

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  aspectRatio: "16:9",
  duration: 5,
});

Image-to-Video

Upload an image first, then animate it:

// Upload
const file = await Bun.file("./photo.jpg").arrayBuffer();
const uploaded = await client.uploadFile(new Blob([file]), "image/jpeg");

// Animate
const job = await client.createVideo({
  model: "kling-v3",
  prompt: "photo comes to life, character smiles and waves",
  files: [{ url: uploaded.url }],
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);

Generate Speech

const job = await client.createSpeech({
  model: "eleven_multilingual_v2",
  text: "Welcome to the future of AI video generation.",
  voice: "rachel",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp3

Voices: rachel, josh, adam, sarah, domi, elli, antoni, arnold, sam


Generate Music

const job = await client.createMusic({
  model: "music_v1",
  prompt: "upbeat electronic, energetic, modern pop feel",
  duration: 30,
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);

Video Composition (React / JSX)

Use vargai/react to compose multi-scene videos with AI-generated images, video clips, voiceover, music, and captions — all stitched into a single MP4. Pass gateway models via the defaults option so every generation routes through the gateway.

Minimal example

import { render, Render, Clip, Image, Video } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const character = Image({
  prompt: "cute orange cat, big eyes, Pixar style",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>
      <Video
        prompt={{ text: "cat waves hello, bounces up and down", images: [character] }}
        model={varg.videoModel("kling-v3")}
      />
    </Clip>
  </Render>,
  { output: "output/hello.mp4" },
);

Run: bun run video.tsx

Using defaults (recommended)

Set default models once so you don't repeat them on every element:

import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const voiceover = Speech({
  voice: "rachel",
  children: "Hey everyone! Check out this amazing sunset.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="chill lo-fi beats, relaxing" volume={0.15} />

    <Clip duration={5}>
      <Image prompt="sunset over ocean, golden hour, cinematic" zoom="in" />
    </Clip>

    <Clip duration={5} transition={{ name: "fade", duration: 0.5 }}>
      <Video prompt="waves gently rolling onto sandy beach, warm light" />
    </Clip>

    <Captions src={voiceover} style="tiktok" color="#ffffff" />
  </Render>,
  {
    output: "output/sunset.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);

When defaults are set, elements without an explicit model prop use the default. You can still override per-element:

<Image prompt="high quality portrait" model={varg.imageModel("flux-pro")} />

Multi-scene video with character consistency

import { render, Render, Clip, Image, Video, Speech, Music, Captions, Title } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// Create character once, reuse across scenes
const character = Image({
  prompt: "friendly tech influencer, casual style, ring light",
  model: varg.imageModel("soul"),
  aspectRatio: "9:16",
});

const voiceover = Speech({
  voice: "sarah",
  children: "Three things you need to know about AI video generation.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="upbeat tech podcast intro" volume={0.1} />

    <Clip duration={5}>
      <Video
        prompt={{ text: "person speaking to camera, friendly wave", images: [character] }}
      />
      <Title position="bottom" color="#ffffff">3 AI Video Tips</Title>
    </Clip>

    <Clip duration={4} transition={{ name: "fade", duration: 0.5 }}>
      <Image prompt="AI neural network visualization, glowing nodes" zoom="in" />
      <Title position="center" color="#ffffff">Tip 1: Use Image-to-Video</Title>
    </Clip>

    <Clip duration={5} transition={{ name: "crossfade", duration: 0.8 }}>
      <Video
        prompt={{ text: "person smiling, nodding confidently", images: [character] }}
      />
    </Clip>

    <Captions src={voiceover} style="tiktok" color="#ffffff" activeColor="#FFD700" />
  </Render>,
  {
    output: "output/tips.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);

All JSX components

| Component | Purpose | Key props |
| --- | --- | --- |
| <Render> | Root container | width, height, fps |
| <Clip> | Time segment / scene | duration, transition, cutFrom, cutTo |
| <Image> | AI or static image | prompt, src, model, zoom, aspectRatio |
| <Video> | AI or source video | prompt, src, model, volume, cutFrom, cutTo |
| <Speech> | Text-to-speech | voice, model, children (text) |
| <Music> | Background music | prompt, src, model, volume, loop, ducking |
| <Title> | Text overlay | position, color, start, end |
| <Subtitle> | Subtitle text | backgroundColor |
| <Captions> | Auto-generated subs | src, style, color, activeColor |
| <Overlay> | Positioned layer | left, top, width, height |
| <TalkingHead> | Animated character | character, src, voice, model |
| <Packshot> | End card / CTA | background, logo, cta, blinkCta |
| <Split> | Side-by-side layout | direction |
| <Slider> | Before/after reveal | direction |
| <Swipe> | Tinder-style cards | direction, interval |

Transitions

<Clip transition={{ name: "fade", duration: 0.5 }}>
<Clip transition={{ name: "crossfade", duration: 0.5 }}>
<Clip transition={{ name: "wipeleft", duration: 0.5 }}>
<Clip transition={{ name: "cube", duration: 0.8 }}>

Zoom effects on images

<Image prompt="landscape" zoom="in" />    // Ken Burns zoom in
<Image prompt="landscape" zoom="out" />   // Zoom out
<Image prompt="landscape" zoom="left" />  // Pan left
<Image prompt="landscape" zoom="right" /> // Pan right

Caption styles

<Captions src={voiceover} style="tiktok" />     // Word-by-word highlight
<Captions src={voiceover} style="karaoke" />    // Fill left-to-right
<Captions src={voiceover} style="bounce" />     // Words bounce in
<Captions src={voiceover} style="typewriter" /> // Typing effect

Aspect ratios

| Ratio | Resolution | Platform |
| --- | --- | --- |
| 9:16 | 1080x1920 | TikTok, Reels, Shorts |
| 16:9 | 1920x1080 | YouTube, Twitter |
| 1:1 | 1080x1080 | Instagram Feed |
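
The table maps directly to Render dimensions. A small lookup (illustrative, not part of the SDK) keeps width/height in sync with the target platform:

```typescript
// Map the aspect ratios above to Render dimensions. Illustrative helper only.
const RESOLUTIONS = {
  "9:16": { width: 1080, height: 1920 }, // TikTok, Reels, Shorts
  "16:9": { width: 1920, height: 1080 }, // YouTube, Twitter
  "1:1": { width: 1080, height: 1080 },  // Instagram Feed
} as const;

const { width, height } = RESOLUTIONS["9:16"];
// <Render width={width} height={height}> ...
```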

Render options

await render(<Render>...</Render>, {
  output: "video.mp4",           // Save to file
  cache: ".cache/ai",            // Custom cache directory
  defaults: { ... },             // Default models (see above)
  concurrency: 3,                // Max parallel AI calls (default: 3)
  mode: "preview",               // Use placeholders instead of real generation
});

Template: Cinematic tribute (8 scenes, voiceover, music, captions)

A production-level example showing reference-based character consistency across 8 scenes, clip trimming, fade transitions, music with ducking, voiceover, and captions.

import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// --- REFERENCES ---
const KURT_REF = "https://uu.varg.ai/1773109560573_gbz0buz0.jpg";

// --- STYLE ---
const STYLE = "vintage color photograph, film grain texture, cool blue tones, 1990s grunge aesthetic, moody atmospheric lighting, soft focus, no sepia, no border";
const CHARACTER = "same man from reference, long blonde hair, pale complexion, introspective gaze, worn expression";
const CHARACTER_SPEAKING = `${CHARACTER}, lips slightly parted, speaking`;

// --- SCENE IMAGES (all reference the same photo for character consistency) ---

const img1 = Image({
  prompt: { text: `Medium close-up portrait, ${CHARACTER_SPEAKING}, in a basement or garage setting, holding an acoustic guitar, dim lamp light, melancholic expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img2 = Image({
  prompt: { text: `Close-up portrait, ${CHARACTER_SPEAKING}, in a recording studio, microphone visible, focused intensity, cool studio lighting, raw emotional expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img3 = Image({
  prompt: { text: `Medium shot, ${CHARACTER}, on stage with guitar, powerful performance stance, stage lights and smoke, wild energy, hair flying, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img4 = Image({
  prompt: { text: `Portrait, ${CHARACTER}, in quiet contemplative pose, looking downward, soft diffused light, vulnerable and fragile expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img5 = Image({
  prompt: { text: `Close-up of man's anguished face, ${CHARACTER}, eyes filled with pain, brow furrowed, emotional turmoil visible, moody dark lighting, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img6 = Image({
  prompt: { text: `Close-up hands on guitar strings, ${CHARACTER} playing, fingers on fretboard, worn hands, intimate creative moment, dim candlelit atmosphere, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img7 = Image({
  prompt: { text: `Medium shot from below, ${CHARACTER} on stage with guitar raised, intense performance, crowd energy, stage lights, raw power, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img8 = Image({
  prompt: { text: `Wide shot, ${CHARACTER} alone in empty room, sitting isolated, looking haunted, pale blue light through window, solitary melancholy, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

// --- ANIMATE TO VIDEO ---

const vid1 = Video({
  prompt: { text: "man holding guitar, fingers moving gently on strings, sad contemplative gaze, quiet intensity", images: [img1] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid2 = Video({
  prompt: { text: "man in studio, raw emotional expression, voice trembling slightly, intense focus, vulnerability", images: [img2] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid3 = Video({
  prompt: { text: "man on stage with guitar, explosive energy, body swaying with music, wild passionate performance", images: [img3] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid4 = Video({
  prompt: { text: "man in quiet pose, head slightly tilted, troubled gaze, gentle defeated movement", images: [img4] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid5 = Video({
  prompt: { text: "man's jaw clenches, eyes show deep pain, brow furrows, emotional anguish expression", images: [img5] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid6 = Video({
  prompt: { text: "fingers playing guitar strings, hand movement, strings vibrating, intimate musical moment", images: [img6] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid7 = Video({
  prompt: { text: "man thrashing on stage with guitar, intense movement, powerful performance, crowd energy", images: [img7] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid8 = Video({
  prompt: { text: "man sits alone in empty room, slow movement, withdrawn isolated pose, melancholic atmosphere", images: [img8] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

// --- VOICEOVER ---

const speech = Speech({
  voice: "adam",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Kurt Cobain came from nowhere. His pain became a generation's anthem. Raw, authentic, unfiltered. Nirvana changed everything. Though he burned too bright, too fast. His voice echoes forever.",
});

// --- BUILD RENDER ---

await render(
  <Render width={1080} height={1920}>
    <Music
      prompt="dark melancholic grunge rock instrumental, distorted guitar, 1990s alternative rock, heavy sad atmosphere"
      model={varg.musicModel("music_v1")}
      volume={0.15}
      loop={true}
      ducking={true}
      duration={16}
    />

    <Clip duration={2} cutFrom={0.3} cutTo={2.3}>
      {vid1}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid2}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid5}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid6}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid3}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid7}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid4}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid8}
    </Clip>

    <Captions src={speech} style="tiktok" position="bottom" />
  </Render>,
  { output: "output/legend-kurt.mp4" },
);

Key techniques in this template:

  • Reference image consistency: All 8 scene images pass the same KURT_REF photo to nano-banana-pro/edit, keeping the character recognizable across different settings
  • cutFrom / cutTo: Trims the first ~0.3s of each generated clip (removes AI "warmup" frames)
  • ducking: Automatically lowers music volume when speech is playing
  • loop: Music loops to fill the full video duration
  • Non-sequential clip ordering: Clips are arranged for editorial pacing (vid1, vid2, vid5, vid6, vid3, vid7, vid4, vid8), not generation order

Best Practices

Recommended default models

| Type | Model | ID | Credits |
| --- | --- | --- | --- |
| Video | Kling V3 | kling-v3 | 150 |
| Video (budget) | Kling V3 Standard | kling-v3-standard | 100 |
| Image | Nano Banana Pro | nano-banana-pro | 5 |
| Image editing | Nano Banana Pro Edit | nano-banana-pro/edit | 5 |
| Image (fast) | Flux Schnell | flux-schnell | 5 |
| Speech | Turbo | eleven_turbo_v2 | 20 |
| Music | Music V1 | music_v1 | 30 |

Model duration constraints

These constraints are critical — wrong values cause 422 errors:

| Model | Duration rule |
| --- | --- |
| kling-v3, kling-v3-standard, kling-v2.6 | Any integer from 3 to 15 seconds |
| kling-v2.5 and older | ONLY 5 or 10 seconds; any other value fails |
| ltx-2-19b-distilled | Uses num_frames (not duration) and video_size (not aspect_ratio) |
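
These rules can be checked client-side before submitting a job. The following helper is illustrative (not part of the SDK); the model IDs and limits are taken from the table above:

```typescript
// Illustrative pre-flight check for the duration rules above. Not SDK code.
function isValidDuration(model: string, duration: number): boolean {
  if (!Number.isInteger(duration)) return false;
  if (model === "kling-v2.5") return duration === 5 || duration === 10; // older Kling: 5 or 10 only
  if (model.startsWith("kling-v3") || model === "kling-v2.6") {
    return duration >= 3 && duration <= 15; // any integer 3..15
  }
  if (model === "ltx-2-19b-distilled") return false; // uses num_frames, not duration
  return true; // unknown model: no client-side rule to apply
}
```

Calling a check like this before createVideo avoids a round trip that ends in a 422.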

One image per Video prompt

Pass only one image in the Video prompt images array. Multiple images cause errors.

// CORRECT
Video({ prompt: { text: "cat walks", images: [catImage] }, ... })

// WRONG — will error
Video({ prompt: { text: "cat walks", images: [catImage, bgImage] }, ... })

Clip duration must match video duration

The <Clip duration={N}> should match the duration of the Video inside it:

const vid = Video({ ..., duration: 5 });

// CORRECT
<Clip duration={5}>{vid}</Clip>

// WRONG — mismatch causes timing issues
<Clip duration={3}>{vid}</Clip>

Audio-first workflow

Always generate audio (speech/music) before video. Audio duration is unpredictable — a voiceover might be 6s or 12s depending on text length and pacing. If you generate video first, you'll either have mismatched lengths or waste credits regenerating.

For voiceover-driven videos (local render):

  1. Generate the voiceover first
  2. Check its duration
  3. Generate video clips to match
import { createVarg } from "@vargai/gateway";
import { generateVideo } from "vargai/ai";
import { render, Render, Clip, Image, Video, Speech, Captions } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Generate voiceover first
const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Welcome to our product showcase. Here are three features you'll love.",
});

// 2. Render with `shortest` — video ends when voiceover ends
await render(
  <Render width={1080} height={1920} shortest>
    <Clip duration={15}>
      <Video
        prompt={{ text: "product showcase, smooth camera orbit", images: [productImage] }}
        model={varg.videoModel("kling-v3")}
        duration={15}
      />
    </Clip>
    <Captions src={voiceover} style="tiktok" />
  </Render>,
  { output: "output/showcase.mp4" },
);
// If voiceover is 8s, output is 8s (not 15s of mostly silent video)

For music: always set duration on <Music> to match the total video length. Without it, ElevenLabs generates ~60s of audio, which extends the video far beyond the intended length. If the music still runs longer than the video, use shortest as a safety net:

// Video is 3 clips × 5s = 15s total
<Render width={1080} height={1920} shortest>
  <Music prompt="ambient electronic" volume={0.2} duration={15} />
  <Clip duration={5}>{vid1}</Clip>
  <Clip duration={5}>{vid2}</Clip>
  <Clip duration={5}>{vid3}</Clip>
</Render>

For gateway-only workflows (no local render): Generate speech first via VargClient, then use the audio duration to decide how many video clips to generate:

// 1. Generate speech first
const speechJob = await client.createSpeech({
  model: "eleven_turbo_v2",
  text: "A narration script that could be any length.",
  voice: "rachel",
});
const speech = await client.waitForJob(speechJob.job_id);
// speech.output.url → use this to determine pacing

// 2. Now generate video clips knowing the audio length
const videoJob = await client.createVideo({
  model: "kling-v3",
  prompt: "matching scene for the narration",
  duration: 10, // match to speech length
});

Prompt format per image model

| Model | Prompt format |
| --- | --- |
| nano-banana-pro | Plain string: prompt: "a sunset over the ocean" |
| nano-banana-pro/edit | Object with images: prompt: { text: "...", images: [refUrl] } |
| flux-schnell, flux-pro, flux-dev | Plain string |
| soul (Higgsfield) | Plain string |

Define media with function calls, not JSX

Media elements (Image, Video, Speech, Music) must be defined as variables using function calls. JSX assignment does not work for these:

// CORRECT — function call
const img = Image({ prompt: "a cat", model: varg.imageModel("nano-banana-pro") });
const vid = Video({ prompt: { text: "cat moves", images: [img] }, model: varg.videoModel("kling-v3") });

// WRONG — JSX assignment
const img = <Image prompt="a cat" />;

Then use the variables as children or props in JSX:

<Clip duration={5}>{vid}</Clip>

Character consistency: ref → edit → animate

When a character or product appears across multiple scenes, use this 3-step workflow:

  1. Reference image — generate (or receive) a character hero shot
  2. Scene images via /edit — use nano-banana-pro/edit to place the character into each scene, always passing the reference via images: [ref]
  3. Animate via i2v — pass each scene image to Video() for image-to-video

Never generate scene images from scratch — always edit from the reference. Without this pattern, each clip generates a different-looking character.

import { createVarg } from "@vargai/gateway";
import { render, Render, Clip, Image, Video } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Character reference
const ref = Image({
  prompt: "a man in a dark suit, dramatic side lighting, neutral background",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

// 2. Scene images — place character into different environments
const scene1 = Image({
  prompt: { text: "same man sitting at a wooden desk, harsh lamp light, dark study", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene2 = Image({
  prompt: { text: "same man standing by a tall window, cold grey daylight on face", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene3 = Image({
  prompt: { text: "same man walking alone down a narrow cobblestone alley at night", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

// 3. Animate each scene image
const vid1 = Video({ prompt: { text: "man looks up from desk, slight head turn", images: [scene1] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid2 = Video({ prompt: { text: "man turns away from window, eyes cast down", images: [scene2] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid3 = Video({ prompt: { text: "man walks slowly forward, coat swaying", images: [scene3] }, model: varg.videoModel("kling-v3"), duration: 5 });

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>{vid1}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid2}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid3}</Clip>
  </Render>,
  { output: "output/multi-scene.mp4" },
);

Trim AI warm-up frames

Generated video clips often have ~0.3s of "warm-up" at the start where the image is mostly static. Use cutFrom on <Clip> to trim it:

// Trim first 0.3s of generated video for snappier cuts
<Clip duration={2} cutFrom={0.3} cutTo={2.3}>{vid}</Clip>

The source video duration must be >= cutTo.
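
These bounds can be sanity-checked before rendering. An illustrative helper, not part of the SDK:

```typescript
// Check that a clip's trim window is consistent:
// the clip duration equals cutTo - cutFrom, and the source is long enough (source >= cutTo).
function validCut(
  clipDuration: number,
  cutFrom: number,
  cutTo: number,
  sourceDuration: number,
): boolean {
  return (
    cutFrom >= 0 &&
    cutTo > cutFrom &&
    Math.abs(cutTo - cutFrom - clipDuration) < 1e-9 && // tolerate float rounding
    sourceDuration >= cutTo
  );
}
```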

Cache-aware iteration

When modifying part of a render, keep unchanged prompts exactly the same (same text, same model, same parameters). This ensures unchanged assets hit the cache and are not re-generated — saving time and credits. Only change the prompts for the parts you're actually modifying.

Lipsync

To sync a video to speech audio, pass both video and audio in the Video prompt:

const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Hello, welcome to our product demo.",
});

const character = Video({
  prompt: { text: "person speaking, subtle movements", images: [characterImage] },
  model: varg.videoModel("kling-v3"),
  duration: 5,
});

const lipsynced = Video({
  prompt: { video: character, audio: voiceover },
  model: varg.videoModel("sync-v2-pro"),
});

Available Models

Video

| Model | ID | Best for |
| --- | --- | --- |
| Kling V3 | kling-v3 | Highest quality |
| Kling V3 Standard | kling-v3-standard | Quality/cost balance |
| Kling V2.6 | kling-v2.6 | High quality + native audio |
| Kling V2.5 | kling-v2.5 | Reliable general purpose |
| Wan 2.5 | wan-2.5 | Characters, anime |
| Minimax | minimax | Alternative |
| LTX 2 | ltx-2-19b-distilled | Fast, with audio |
| Sync V2 Pro | sync-v2-pro | Lipsync |
| Lipsync | lipsync | Lipsync (budget) |

Image

| Model | ID | Best for |
| --- | --- | --- |
| Flux Schnell | flux-schnell | Fast, cheap |
| Flux Pro | flux-pro | High quality |
| Flux Dev | flux-dev | Development |
| Nano Banana Pro | nano-banana-pro | Versatile |
| Nano Banana Edit | nano-banana-pro/edit | Image editing, character consistency |
| Recraft V3 | recraft-v3 | Design, illustration |
| Soul | soul | Character consistency (Higgsfield) |

Speech

| Model | ID |
| --- | --- |
| Multilingual V2 | eleven_multilingual_v2 |
| Turbo V2 | eleven_turbo_v2 |
| Flash V2.5 | eleven_flash_v2_5 |
| V3 | eleven_v3 |

Music

| Model | ID |
| --- | --- |
| Music V1 | music_v1 |

Key Behaviors

Caching

Same parameters = instant cached result at zero cost. Cache is keyed on model + prompt + files + options. Cache TTL is 30 days. All outputs are persisted to s3.varg.ai.
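
Conceptually, the cache key behaves like a deterministic fingerprint of those fields. The gateway's actual derivation is internal; this sketch only illustrates the behavior (identical parameters produce the same key, any changed value produces a new one):

```typescript
// Conceptual illustration of "cache is keyed on model + prompt + files + options".
// The real key derivation is internal to the gateway; this only models the behavior.
function cacheKey(req: {
  model: string;
  prompt: string;
  options?: Record<string, unknown>;
}): string {
  const opts = Object.keys(req.options ?? {})
    .sort() // sort so option ordering does not change the key
    .map((k) => `${k}=${JSON.stringify((req.options ?? {})[k])}`)
    .join("&");
  return `${req.model}|${req.prompt}|${opts}`;
}
```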

Billing

  • 1 credit = 1 cent. Signup gives 1,000 credits ($10).
  • Cache hits are free.
  • Credits are deducted after successful generation.
  • Cost examples: nano-banana-pro = 5 credits, kling-v3 = 150 credits, kling-v3-standard = 100 credits, speech = 20-25 credits, music = 30 credits.
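
With those numbers, a render's cost can be budgeted up front. Illustrative arithmetic using the per-model credits listed above (1 credit = 1 cent):

```typescript
// Illustrative cost estimate using the per-model credits listed above.
const CREDITS: Record<string, number> = {
  "nano-banana-pro": 5,
  "kling-v3": 150,
  "kling-v3-standard": 100,
  "eleven_turbo_v2": 20,
  "music_v1": 30,
};

function estimateCredits(jobs: string[]): number {
  return jobs.reduce((sum, model) => sum + (CREDITS[model] ?? 0), 0);
}

// 3 scene images + 3 clips + voiceover + music:
const total = estimateCredits([
  "nano-banana-pro", "nano-banana-pro", "nano-banana-pro",
  "kling-v3", "kling-v3", "kling-v3",
  "eleven_turbo_v2", "music_v1",
]);
// 3*5 + 3*150 + 20 + 30 = 515 credits ($5.15)
```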

Rate Limits

240 requests/minute per API key.
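
To stay under that cap when fanning out many generations, requests can be chunked client-side. A minimal sketch (not part of the SDK; the batch size and pause are illustrative knobs):

```typescript
// Minimal client-side batching sketch for staying under a requests/minute cap.
// Not SDK code; batchSize and pauseMs are illustrative knobs.
async function inBatches<T, R>(
  items: T[],
  run: (item: T) => Promise<R>,
  batchSize = 20,
  pauseMs = 5_000,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(run)))); // run one batch in parallel
    if (i + batchSize < items.length) {
      await new Promise((r) => setTimeout(r, pauseMs)); // pause between batches
    }
  }
  return results;
}
```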

Error Handling

{
  "error": { "_tag": "ValidationError", "message": "prompt is required", "statusCode": 400 }
}
| Status | Tag | Meaning |
| --- | --- | --- |
| 400 | ValidationError | Bad request |
| 401 | AuthError | Invalid API key |
| 402 | InsufficientBalanceError | Out of credits |
| 429 | RateLimitError | Too many requests |
| 502 | ProviderError | Upstream AI provider failed |

The VargClient throws VargGatewayError with statusCode, message, field, and provider properties.
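
Transient failures (429, 502) are worth retrying with backoff, while 4xx logic errors are not. This wrapper is a sketch, not SDK code; it inspects a numeric statusCode property, which matches what VargGatewayError exposes, but works with any error carrying that field:

```typescript
// Retry transient gateway failures (429 rate limit, 502 provider error).
// Sketch: assumes errors carry a numeric `statusCode`, as VargGatewayError does.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const transient = err?.statusCode === 429 || err?.statusCode === 502;
      if (!transient || i >= attempts - 1) throw err; // don't retry logic errors
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i)); // exponential backoff
    }
  }
}

// e.g. const job = await withRetry(() => client.createImage({ ... }));
```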


Common Agent Patterns

Pattern 1: Generate image, then animate it

const img = await client.createImage({
  model: "nano-banana-pro",
  prompt: "product photo of white sneakers on marble surface",
  aspect_ratio: "9:16",
});
const imgResult = await client.waitForJob(img.job_id);

const vid = await client.createVideo({
  model: "kling-v3",
  prompt: "camera slowly orbits around the sneakers, cinematic lighting",
  files: [{ url: imgResult.output!.url }],
});
const vidResult = await client.waitForJob(vid.job_id);
console.log(vidResult.output!.url);

Pattern 2: Generate video with voiceover

// Generate speech and video in parallel
const [speechJob, videoJob] = await Promise.all([
  client.createSpeech({
    model: "eleven_multilingual_v2",
    text: "Today we're looking at three key trends in AI.",
    voice: "sarah",
  }),
  client.createVideo({
    model: "kling-v3",
    prompt: "woman speaking to camera, subtle gestures, professional setting",
    duration: 5,
    aspect_ratio: "9:16",
  }),
]);

const [speech, video] = await Promise.all([
  client.waitForJob(speechJob.job_id),
  client.waitForJob(videoJob.job_id),
]);

console.log("Video:", video.output!.url);
console.log("Audio:", speech.output!.url);

Pattern 3: AI SDK — generate and save to file

import { generateVideo, generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "anime warrior girl, red hair, silver armor",
  aspectRatio: "9:16",
});

await Bun.write("output/character.png", image.uint8Array);

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "warrior draws sword dramatically",
  aspectRatio: "9:16",
});

await Bun.write("output/scene.mp4", video.uint8Array);

Pattern 4: Batch generate multiple images

const prompts = [
  "sunset over ocean, golden hour",
  "mountain peaks at dawn, misty",
  "city skyline at night, neon lights",
];

const jobs = await Promise.all(
  prompts.map((prompt) =>
    client.createImage({ model: "nano-banana-pro", prompt, aspect_ratio: "16:9" }),
  ),
);

const results = await Promise.all(
  jobs.map((job) => client.waitForJob(job.job_id)),
);

results.forEach((r, i) => console.log(`${prompts[i]}: ${r.output!.url}`));

API Reference

VargClient Methods

| Method | Parameters | Returns |
| --- | --- | --- |
| createVideo(params) | { model, prompt, duration?, aspect_ratio?, files?, provider_options? } | Promise<JobResponse> |
| createImage(params) | { model, prompt, aspect_ratio?, files?, provider_options? } | Promise<JobResponse> |
| createSpeech(params) | { model, text, voice?, provider_options? } | Promise<JobResponse> |
| createMusic(params) | { model, prompt, duration?, provider_options? } | Promise<JobResponse> |
| uploadFile(blob, mediaType) | Blob \| Buffer, string | Promise<FileUploadResponse> |
| getJob(id) | string | Promise<JobResponse> |
| waitForJob(id, opts?) | string, { pollIntervalMs?, maxAttempts? } | Promise<JobResponse> |
| cancelJob(id) | string | Promise<void> |
| listVoices() | (none) | Promise<VoiceListResponse> |

JobResponse Shape

{
  job_id: string;
  status: "queued" | "processing" | "completed" | "failed" | "cancelled";
  model: string;
  created_at: string;
  completed_at?: string;
  output?: { url: string; media_type: string };
  cache?: { hit: boolean; key: string };
  error?: string;
}
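
Because output is optional, it is safer to narrow a job before using it than to sprinkle non-null assertions (output!.url) everywhere. An illustrative helper mirroring the relevant fields of the shape above:

```typescript
// Narrow a JobResponse to a guaranteed output, or fail loudly with the job's error.
// Illustrative; mirrors the relevant fields of the JobResponse shape above.
interface JobResponse {
  job_id: string;
  status: "queued" | "processing" | "completed" | "failed" | "cancelled";
  output?: { url: string; media_type: string };
  error?: string;
}

function requireOutput(job: JobResponse): { url: string; media_type: string } {
  if (job.status !== "completed" || !job.output) {
    throw new Error(`job ${job.job_id} ${job.status}: ${job.error ?? "no output"}`);
  }
  return job.output;
}

// const { url } = requireOutput(await client.waitForJob(job.job_id));
```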

createVarg Provider Methods

| Method | Returns |
| --- | --- |
| varg.videoModel(id) | VideoModelV3 |
| varg.imageModel(id) | ImageModelV3 |
| varg.speechModel(id) | SpeechModelV3 |
| varg.musicModel(id) | MusicModelV3 |