@SecurityQQ
Last active March 12, 2026 01:36
varg Gateway & SDK Quick Start for AI Agents

Gateway Quick Start for AI Agents

Generate images, videos, speech, and music using a single varg_xxx API key. No per-provider keys needed.

Setup

bun install vargai @vargai/gateway ai
# .env
VARG_API_KEY=varg_xxx

That's it. The gateway pools provider keys (fal, ElevenLabs, Higgsfield, Replicate) server-side. You only need one key.


Client Setup

Two ways to use the gateway:

Low-level client (VargClient)

Direct access to gateway REST endpoints. Returns job objects you poll/wait on.

import { VargClient } from "@vargai/gateway";

const client = new VargClient({
  apiKey: process.env.VARG_API_KEY!,
});
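
waitForJob handles the polling for you. When custom behavior is needed (progress logging, a different backoff), the same loop can be written generically. This is an SDK-independent sketch, not gateway code; the names are illustrative:

```typescript
// Generic polling loop, the shape waitForJob presumably implements internally.
// Sketch only: `check` fetches the current state, `isDone` decides when to stop.
async function pollUntil<T>(
  check: () => Promise<T>,
  isDone: (state: T) => boolean,
  intervalMs = 2_000,
  maxAttempts = 150,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const state = await check();
    if (isDone(state)) return state;
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before next check
  }
  throw new Error("polling timed out");
}

// With the low-level client this would look like:
// pollUntil(() => client.getJob(job.job_id), (j) => j.status === "completed")
```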

AI SDK provider (createVarg)

Implements the Vercel AI SDK provider interface. Use with generateVideo, generateImage, etc.

import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

Generate an Image

With VargClient

const job = await client.createImage({
  model: "nano-banana-pro",
  prompt: "sunset over mountains, dramatic lighting",
  aspect_ratio: "16:9",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.png

With AI SDK

import { generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "sunset over mountains, dramatic lighting",
  aspectRatio: "16:9",
});

Generate a Video

With VargClient

const job = await client.createVideo({
  model: "kling-v3",
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  duration: 5,
  aspect_ratio: "16:9",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp4

With AI SDK

import { generateVideo } from "vargai/ai";

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "ocean waves crashing on rocks, cinematic slow motion",
  aspectRatio: "16:9",
  duration: 5,
});

Image-to-Video

Upload an image first, then animate it:

// Upload
const file = await Bun.file("./photo.jpg").arrayBuffer();
const uploaded = await client.uploadFile(new Blob([file]), "image/jpeg");

// Animate
const job = await client.createVideo({
  model: "kling-v3",
  prompt: "photo comes to life, character smiles and waves",
  files: [{ url: uploaded.url }],
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);

Generate Speech

const job = await client.createSpeech({
  model: "eleven_multilingual_v2",
  text: "Welcome to the future of AI video generation.",
  voice: "rachel",
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);
// https://s3.varg.ai/o/job_xxx.mp3

Voices: rachel, josh, adam, sarah, domi, elli, antoni, arnold, sam


Generate Music

const job = await client.createMusic({
  model: "music_v1",
  prompt: "upbeat electronic, energetic, modern pop feel",
  duration: 30,
});

const result = await client.waitForJob(job.job_id);
console.log(result.output!.url);

Video Composition (React / JSX)

Use vargai/react to compose multi-scene videos with AI-generated images, video clips, voiceover, music, and captions — all stitched into a single MP4. Pass gateway models via the defaults option so every generation routes through the gateway.

Minimal example

import { render, Render, Clip, Image, Video } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const character = Image({
  prompt: "cute orange cat, big eyes, Pixar style",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>
      <Video
        prompt={{ text: "cat waves hello, bounces up and down", images: [character] }}
        model={varg.videoModel("kling-v3")}
      />
    </Clip>
  </Render>,
  { output: "output/hello.mp4" },
);

Run: bun run video.tsx

Using defaults (recommended)

Set default models once so you don't repeat them on every element:

import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

const voiceover = Speech({
  voice: "rachel",
  children: "Hey everyone! Check out this amazing sunset.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="chill lo-fi beats, relaxing" volume={0.15} />

    <Clip duration={5}>
      <Image prompt="sunset over ocean, golden hour, cinematic" zoom="in" />
    </Clip>

    <Clip duration={5} transition={{ name: "fade", duration: 0.5 }}>
      <Video prompt="waves gently rolling onto sandy beach, warm light" />
    </Clip>

    <Captions src={voiceover} style="tiktok" color="#ffffff" />
  </Render>,
  {
    output: "output/sunset.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);

When defaults are set, elements without an explicit model prop use the default. You can still override per-element:

<Image prompt="high quality portrait" model={varg.imageModel("flux-pro")} />

Multi-scene video with character consistency

import { render, Render, Clip, Image, Video, Speech, Music, Captions, Title } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// Create character once, reuse across scenes
const character = Image({
  prompt: "friendly tech influencer, casual style, ring light",
  model: varg.imageModel("soul"),
  aspectRatio: "9:16",
});

const voiceover = Speech({
  voice: "sarah",
  children: "Three things you need to know about AI video generation.",
});

await render(
  <Render width={1080} height={1920}>
    <Music prompt="upbeat tech podcast intro" volume={0.1} />

    <Clip duration={5}>
      <Video
        prompt={{ text: "person speaking to camera, friendly wave", images: [character] }}
      />
      <Title position="bottom" color="#ffffff">3 AI Video Tips</Title>
    </Clip>

    <Clip duration={4} transition={{ name: "fade", duration: 0.5 }}>
      <Image prompt="AI neural network visualization, glowing nodes" zoom="in" />
      <Title position="center" color="#ffffff">Tip 1: Use Image-to-Video</Title>
    </Clip>

    <Clip duration={5} transition={{ name: "crossfade", duration: 0.8 }}>
      <Video
        prompt={{ text: "person smiling, nodding confidently", images: [character] }}
      />
    </Clip>

    <Captions src={voiceover} style="tiktok" color="#ffffff" activeColor="#FFD700" />
  </Render>,
  {
    output: "output/tips.mp4",
    defaults: {
      video: varg.videoModel("kling-v3"),
      image: varg.imageModel("nano-banana-pro"),
      speech: varg.speechModel("eleven_turbo_v2"),
      music: varg.musicModel("music_v1"),
    },
  },
);

All JSX components

| Component | Purpose | Key props |
| --- | --- | --- |
| <Render> | Root container | width, height, fps |
| <Clip> | Time segment / scene | duration, transition, cutFrom, cutTo |
| <Image> | AI or static image | prompt, src, model, zoom, aspectRatio |
| <Video> | AI or source video | prompt, src, model, volume, cutFrom, cutTo |
| <Speech> | Text-to-speech | voice, model, children (text) |
| <Music> | Background music | prompt, src, model, volume, loop, ducking |
| <Title> | Text overlay | position, color, start, end |
| <Subtitle> | Subtitle text | backgroundColor |
| <Captions> | Auto-generated subs | src, style, color, activeColor |
| <Overlay> | Positioned layer | left, top, width, height |
| <TalkingHead> | Animated character | character, src, voice, model |
| <Packshot> | End card / CTA | background, logo, cta, blinkCta |
| <Split> | Side-by-side layout | direction |
| <Slider> | Before/after reveal | direction |
| <Swipe> | Tinder-style cards | direction, interval |

Transitions

<Clip transition={{ name: "fade", duration: 0.5 }}>
<Clip transition={{ name: "crossfade", duration: 0.5 }}>
<Clip transition={{ name: "wipeleft", duration: 0.5 }}>
<Clip transition={{ name: "cube", duration: 0.8 }}>

Zoom effects on images

<Image prompt="landscape" zoom="in" />    // Ken Burns zoom in
<Image prompt="landscape" zoom="out" />   // Zoom out
<Image prompt="landscape" zoom="left" />  // Pan left
<Image prompt="landscape" zoom="right" /> // Pan right

Caption styles

<Captions src={voiceover} style="tiktok" />     // Word-by-word highlight
<Captions src={voiceover} style="karaoke" />    // Fill left-to-right
<Captions src={voiceover} style="bounce" />     // Words bounce in
<Captions src={voiceover} style="typewriter" /> // Typing effect

Aspect ratios

| Ratio | Resolution | Platform |
| --- | --- | --- |
| 9:16 | 1080x1920 | TikTok, Reels, Shorts |
| 16:9 | 1920x1080 | YouTube, Twitter |
| 1:1 | 1080x1080 | Instagram Feed |
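
The table maps directly to Render dimensions. A small lookup (illustrative, not part of the SDK) keeps width/height in sync with the target platform:

```typescript
// Map the aspect ratios above to Render dimensions. Illustrative helper only.
const RESOLUTIONS = {
  "9:16": { width: 1080, height: 1920 }, // TikTok, Reels, Shorts
  "16:9": { width: 1920, height: 1080 }, // YouTube, Twitter
  "1:1": { width: 1080, height: 1080 },  // Instagram Feed
} as const;

const { width, height } = RESOLUTIONS["9:16"];
// <Render width={width} height={height}> ...
```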

Render options

await render(<Render>...</Render>, {
  output: "video.mp4",           // Save to file
  cache: ".cache/ai",            // Custom cache directory
  defaults: { ... },             // Default models (see above)
  concurrency: 3,                // Max parallel AI calls (default: 3)
  mode: "preview",               // Use placeholders instead of real generation
});

Template: Cinematic tribute (8 scenes, voiceover, music, captions)

A production-level example showing reference-based character consistency across 8 scenes, clip trimming, fade transitions, music with ducking, voiceover, and captions.

import { render, Render, Clip, Image, Video, Speech, Music, Captions } from "vargai/react";
import { createVarg } from "@vargai/gateway";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// --- REFERENCES ---
const KURT_REF = "https://uu.varg.ai/1773109560573_gbz0buz0.jpg";

// --- STYLE ---
const STYLE = "vintage color photograph, film grain texture, cool blue tones, 1990s grunge aesthetic, moody atmospheric lighting, soft focus, no sepia, no border";
const CHARACTER = "same man from reference, long blonde hair, pale complexion, introspective gaze, worn expression";
const CHARACTER_SPEAKING = `${CHARACTER}, lips slightly parted, speaking`;

// --- SCENE IMAGES (all reference the same photo for character consistency) ---

const img1 = Image({
  prompt: { text: `Medium close-up portrait, ${CHARACTER_SPEAKING}, in a basement or garage setting, holding an acoustic guitar, dim lamp light, melancholic expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img2 = Image({
  prompt: { text: `Close-up portrait, ${CHARACTER_SPEAKING}, in a recording studio, microphone visible, focused intensity, cool studio lighting, raw emotional expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img3 = Image({
  prompt: { text: `Medium shot, ${CHARACTER}, on stage with guitar, powerful performance stance, stage lights and smoke, wild energy, hair flying, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img4 = Image({
  prompt: { text: `Portrait, ${CHARACTER}, in quiet contemplative pose, looking downward, soft diffused light, vulnerable and fragile expression, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img5 = Image({
  prompt: { text: `Close-up of man's anguished face, ${CHARACTER}, eyes filled with pain, brow furrowed, emotional turmoil visible, moody dark lighting, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img6 = Image({
  prompt: { text: `Close-up hands on guitar strings, ${CHARACTER} playing, fingers on fretboard, worn hands, intimate creative moment, dim candlelit atmosphere, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img7 = Image({
  prompt: { text: `Medium shot from below, ${CHARACTER} on stage with guitar raised, intense performance, crowd energy, stage lights, raw power, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

const img8 = Image({
  prompt: { text: `Wide shot, ${CHARACTER} alone in empty room, sitting isolated, looking haunted, pale blue light through window, solitary melancholy, ${STYLE}`, images: [KURT_REF] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

// --- ANIMATE TO VIDEO ---

const vid1 = Video({
  prompt: { text: "man holding guitar, fingers moving gently on strings, sad contemplative gaze, quiet intensity", images: [img1] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid2 = Video({
  prompt: { text: "man in studio, raw emotional expression, voice trembling slightly, intense focus, vulnerability", images: [img2] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid3 = Video({
  prompt: { text: "man on stage with guitar, explosive energy, body swaying with music, wild passionate performance", images: [img3] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid4 = Video({
  prompt: { text: "man in quiet pose, head slightly tilted, troubled gaze, gentle defeated movement", images: [img4] },
  model: varg.videoModel("kling-v2.5"),
  duration: 10,
});

const vid5 = Video({
  prompt: { text: "man's jaw clenches, eyes show deep pain, brow furrows, emotional anguish expression", images: [img5] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid6 = Video({
  prompt: { text: "fingers playing guitar strings, hand movement, strings vibrating, intimate musical moment", images: [img6] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid7 = Video({
  prompt: { text: "man thrashing on stage with guitar, intense movement, powerful performance, crowd energy", images: [img7] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

const vid8 = Video({
  prompt: { text: "man sits alone in empty room, slow movement, withdrawn isolated pose, melancholic atmosphere", images: [img8] },
  model: varg.videoModel("kling-v2.5"),
  duration: 5,
});

// --- VOICEOVER ---

const speech = Speech({
  voice: "adam",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Kurt Cobain came from nowhere. His pain became a generation's anthem. Raw, authentic, unfiltered. Nirvana changed everything. Though he burned too bright, too fast. His voice echoes forever.",
});

// --- BUILD RENDER ---

await render(
  <Render width={1080} height={1920}>
    <Music
      prompt="dark melancholic grunge rock instrumental, distorted guitar, 1990s alternative rock, heavy sad atmosphere"
      model={varg.musicModel("music_v1")}
      volume={0.15}
      loop={true}
      ducking={true}
      duration={16}
    />

    <Clip duration={2} cutFrom={0.3} cutTo={2.3}>
      {vid1}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid2}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid5}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid6}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid3}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid7}
    </Clip>

    <Clip duration={2} cutFrom={0.3} cutTo={2.3} transition={{ name: "fade", duration: 0.15 }}>
      {vid4}
    </Clip>

    <Clip duration={2} cutFrom={0.2} cutTo={2.2} transition={{ name: "fade", duration: 0.15 }}>
      {vid8}
    </Clip>

    <Captions src={speech} style="tiktok" position="bottom" />
  </Render>,
  { output: "output/legend-kurt.mp4" },
);

Key techniques in this template:

  • Reference image consistency: All 8 scene images pass the same KURT_REF photo to nano-banana-pro/edit, keeping the character recognizable across different settings
  • cutFrom / cutTo: Trims the first ~0.3s of each generated clip (removes AI "warmup" frames)
  • ducking: Automatically lowers music volume when speech is playing
  • loop: Music loops to fill the full video duration
  • Non-sequential clip ordering: Clips are arranged for editorial pacing (vid1, vid2, vid5, vid6, vid3, vid7, vid4, vid8), not generation order

Best Practices

Recommended default models

| Type | Model | ID | Credits |
| --- | --- | --- | --- |
| Video | Kling V3 | kling-v3 | 150 |
| Video (budget) | Kling V3 Standard | kling-v3-standard | 100 |
| Image | Nano Banana Pro | nano-banana-pro | 5 |
| Image editing | Nano Banana Pro Edit | nano-banana-pro/edit | 5 |
| Image (fast) | Flux Schnell | flux-schnell | 5 |
| Speech | Turbo | eleven_turbo_v2 | 20 |
| Music | Music V1 | music_v1 | 30 |

Model duration constraints

These constraints are critical — wrong values cause 422 errors:

| Model | Duration rule |
| --- | --- |
| kling-v3, kling-v3-standard, kling-v2.6 | Any integer from 3 to 15 seconds |
| kling-v2.5 and older | ONLY 5 or 10 seconds; any other value fails |
| ltx-2-19b-distilled | Uses num_frames (not duration) and video_size (not aspect_ratio) |
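
These rules can be checked client-side before submitting a job. The following helper is illustrative (not part of the SDK); the model IDs and limits are taken from the table above:

```typescript
// Illustrative pre-flight check for the duration rules above. Not SDK code.
function isValidDuration(model: string, duration: number): boolean {
  if (!Number.isInteger(duration)) return false;
  if (model === "kling-v2.5") return duration === 5 || duration === 10; // older Kling: 5 or 10 only
  if (model.startsWith("kling-v3") || model === "kling-v2.6") {
    return duration >= 3 && duration <= 15; // any integer 3..15
  }
  if (model === "ltx-2-19b-distilled") return false; // uses num_frames, not duration
  return true; // unknown model: no client-side rule to apply
}
```

Calling a check like this before createVideo avoids a round trip that ends in a 422.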

One image per Video prompt

Pass only one image in the Video prompt images array. Multiple images cause errors.

// CORRECT
Video({ prompt: { text: "cat walks", images: [catImage] }, ... })

// WRONG — will error
Video({ prompt: { text: "cat walks", images: [catImage, bgImage] }, ... })

Clip duration must match video duration

The <Clip duration={N}> should match the duration of the Video inside it:

const vid = Video({ ..., duration: 5 });

// CORRECT
<Clip duration={5}>{vid}</Clip>

// WRONG — mismatch causes timing issues
<Clip duration={3}>{vid}</Clip>

Audio-first workflow

Always generate audio (speech/music) before video. Audio duration is unpredictable — a voiceover might be 6s or 12s depending on text length and pacing. If you generate video first, you'll either have mismatched lengths or waste credits regenerating.

For voiceover-driven videos (local render):

  1. Generate the voiceover first
  2. Check its duration
  3. Generate video clips to match
import { createVarg } from "@vargai/gateway";
import { generateVideo } from "vargai/ai";
import { render, Render, Clip, Image, Video, Speech, Captions } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Generate voiceover first
const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Welcome to our product showcase. Here are three features you'll love.",
});

// 2. Render with `shortest` — video ends when voiceover ends
await render(
  <Render width={1080} height={1920} shortest>
    <Clip duration={15}>
      <Video
        prompt={{ text: "product showcase, smooth camera orbit", images: [productImage] }}
        model={varg.videoModel("kling-v3")}
        duration={15}
      />
    </Clip>
    <Captions src={voiceover} style="tiktok" />
  </Render>,
  { output: "output/showcase.mp4" },
);
// If voiceover is 8s, output is 8s (not 15s of mostly silent video)

For music: always set duration on <Music> to match the total video length. Without it, ElevenLabs generates ~60s of audio, which extends the video far beyond the intended length. If the music still runs longer than the video, use shortest as a safety net:

// Video is 3 clips × 5s = 15s total
<Render width={1080} height={1920} shortest>
  <Music prompt="ambient electronic" volume={0.2} duration={15} />
  <Clip duration={5}>{vid1}</Clip>
  <Clip duration={5}>{vid2}</Clip>
  <Clip duration={5}>{vid3}</Clip>
</Render>

For gateway-only workflows (no local render): Generate speech first via VargClient, then use the audio duration to decide how many video clips to generate:

// 1. Generate speech first
const speechJob = await client.createSpeech({
  model: "eleven_turbo_v2",
  text: "A narration script that could be any length.",
  voice: "rachel",
});
const speech = await client.waitForJob(speechJob.job_id);
// speech.output.url → use this to determine pacing

// 2. Now generate video clips knowing the audio length
const videoJob = await client.createVideo({
  model: "kling-v3",
  prompt: "matching scene for the narration",
  duration: 10, // match to speech length
});

Prompt format per image model

| Model | Prompt format |
| --- | --- |
| nano-banana-pro | Plain string: prompt: "a sunset over the ocean" |
| nano-banana-pro/edit | Object with images: prompt: { text: "...", images: [refUrl] } |
| flux-schnell, flux-pro, flux-dev | Plain string |
| soul (Higgsfield) | Plain string |

Define media with function calls, not JSX

Media elements (Image, Video, Speech, Music) must be defined as variables using function calls. JSX assignment does not work for these:

// CORRECT — function call
const img = Image({ prompt: "a cat", model: varg.imageModel("nano-banana-pro") });
const vid = Video({ prompt: { text: "cat moves", images: [img] }, model: varg.videoModel("kling-v3") });

// WRONG — JSX assignment
const img = <Image prompt="a cat" />;

Then use the variables as children or props in JSX:

<Clip duration={5}>{vid}</Clip>

Character consistency: ref → edit → animate

When a character or product appears across multiple scenes, use this 3-step workflow:

  1. Reference image — generate (or receive) a character hero shot
  2. Scene images via /edit — use nano-banana-pro/edit to place the character into each scene, always passing the reference via images: [ref]
  3. Animate via i2v — pass each scene image to Video() for image-to-video

Never generate scene images from scratch — always edit from the reference. Without this pattern, each clip generates a different-looking character.

import { createVarg } from "@vargai/gateway";
import { render, Render, Clip, Image, Video } from "vargai/react";

const varg = createVarg({ apiKey: process.env.VARG_API_KEY! });

// 1. Character reference
const ref = Image({
  prompt: "a man in a dark suit, dramatic side lighting, neutral background",
  model: varg.imageModel("nano-banana-pro"),
  aspectRatio: "9:16",
});

// 2. Scene images — place character into different environments
const scene1 = Image({
  prompt: { text: "same man sitting at a wooden desk, harsh lamp light, dark study", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene2 = Image({
  prompt: { text: "same man standing by a tall window, cold grey daylight on face", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});
const scene3 = Image({
  prompt: { text: "same man walking alone down a narrow cobblestone alley at night", images: [ref] },
  model: varg.imageModel("nano-banana-pro/edit"),
  aspectRatio: "9:16",
});

// 3. Animate each scene image
const vid1 = Video({ prompt: { text: "man looks up from desk, slight head turn", images: [scene1] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid2 = Video({ prompt: { text: "man turns away from window, eyes cast down", images: [scene2] }, model: varg.videoModel("kling-v3"), duration: 5 });
const vid3 = Video({ prompt: { text: "man walks slowly forward, coat swaying", images: [scene3] }, model: varg.videoModel("kling-v3"), duration: 5 });

await render(
  <Render width={1080} height={1920}>
    <Clip duration={5}>{vid1}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid2}</Clip>
    <Clip duration={5} transition={{ name: "fade", duration: 0.3 }}>{vid3}</Clip>
  </Render>,
  { output: "output/multi-scene.mp4" },
);

Trim AI warm-up frames

Generated video clips often have ~0.3s of "warm-up" at the start where the image is mostly static. Use cutFrom on <Clip> to trim it:

// Trim first 0.3s of generated video for snappier cuts
<Clip duration={2} cutFrom={0.3} cutTo={2.3}>{vid}</Clip>

The source video duration must be >= cutTo.
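
These bounds can be sanity-checked before rendering. An illustrative helper, not part of the SDK:

```typescript
// Check that a clip's trim window is consistent:
// the clip duration equals cutTo - cutFrom, and the source is long enough (source >= cutTo).
function validCut(
  clipDuration: number,
  cutFrom: number,
  cutTo: number,
  sourceDuration: number,
): boolean {
  return (
    cutFrom >= 0 &&
    cutTo > cutFrom &&
    Math.abs(cutTo - cutFrom - clipDuration) < 1e-9 && // tolerate float rounding
    sourceDuration >= cutTo
  );
}
```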

Cache-aware iteration

When modifying part of a render, keep unchanged prompts exactly the same (same text, same model, same parameters). This ensures unchanged assets hit the cache and are not re-generated — saving time and credits. Only change the prompts for the parts you're actually modifying.

Lipsync

To sync a video to speech audio, pass both video and audio in the Video prompt:

const voiceover = Speech({
  voice: "rachel",
  model: varg.speechModel("eleven_turbo_v2"),
  children: "Hello, welcome to our product demo.",
});

const character = Video({
  prompt: { text: "person speaking, subtle movements", images: [characterImage] },
  model: varg.videoModel("kling-v3"),
  duration: 5,
});

const lipsynced = Video({
  prompt: { video: character, audio: voiceover },
  model: varg.videoModel("sync-v2-pro"),
});

Available Models

Video

| Model | ID | Best for |
| --- | --- | --- |
| Kling V3 | kling-v3 | Highest quality |
| Kling V3 Standard | kling-v3-standard | Quality/cost balance |
| Kling V2.6 | kling-v2.6 | High quality + native audio |
| Kling V2.5 | kling-v2.5 | Reliable general purpose |
| Wan 2.5 | wan-2.5 | Characters, anime |
| Minimax | minimax | Alternative |
| LTX 2 | ltx-2-19b-distilled | Fast, with audio |
| Sync V2 Pro | sync-v2-pro | Lipsync |
| Lipsync | lipsync | Lipsync (budget) |

Image

| Model | ID | Best for |
| --- | --- | --- |
| Flux Schnell | flux-schnell | Fast, cheap |
| Flux Pro | flux-pro | High quality |
| Flux Dev | flux-dev | Development |
| Nano Banana Pro | nano-banana-pro | Versatile |
| Nano Banana Edit | nano-banana-pro/edit | Image editing, character consistency |
| Recraft V3 | recraft-v3 | Design, illustration |
| Soul | soul | Character consistency (Higgsfield) |

Speech

| Model | ID |
| --- | --- |
| Multilingual V2 | eleven_multilingual_v2 |
| Turbo V2 | eleven_turbo_v2 |
| Flash V2.5 | eleven_flash_v2_5 |
| V3 | eleven_v3 |

Music

| Model | ID |
| --- | --- |
| Music V1 | music_v1 |

Key Behaviors

Caching

Same parameters = instant cached result at zero cost. Cache is keyed on model + prompt + files + options. Cache TTL is 30 days. All outputs are persisted to s3.varg.ai.
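
Conceptually, the cache key behaves like a deterministic fingerprint of those fields. The gateway's actual derivation is internal; this sketch only illustrates the behavior (identical parameters produce the same key, any changed value produces a new one):

```typescript
// Conceptual illustration of "cache is keyed on model + prompt + files + options".
// The real key derivation is internal to the gateway; this only models the behavior.
function cacheKey(req: {
  model: string;
  prompt: string;
  options?: Record<string, unknown>;
}): string {
  const opts = Object.keys(req.options ?? {})
    .sort() // sort so option ordering does not change the key
    .map((k) => `${k}=${JSON.stringify((req.options ?? {})[k])}`)
    .join("&");
  return `${req.model}|${req.prompt}|${opts}`;
}
```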

Billing

  • 1 credit = 1 cent. Signup gives 1,000 credits ($10).
  • Cache hits are free.
  • Credits are deducted after successful generation.
  • Cost examples: nano-banana-pro = 5 credits, kling-v3 = 150 credits, kling-v3-standard = 100 credits, speech = 20-25 credits, music = 30 credits.
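
With those numbers, a render's cost can be budgeted up front. Illustrative arithmetic using the per-model credits listed above (1 credit = 1 cent):

```typescript
// Illustrative cost estimate using the per-model credits listed above.
const CREDITS: Record<string, number> = {
  "nano-banana-pro": 5,
  "kling-v3": 150,
  "kling-v3-standard": 100,
  "eleven_turbo_v2": 20,
  "music_v1": 30,
};

function estimateCredits(jobs: string[]): number {
  return jobs.reduce((sum, model) => sum + (CREDITS[model] ?? 0), 0);
}

// 3 scene images + 3 clips + voiceover + music:
const total = estimateCredits([
  "nano-banana-pro", "nano-banana-pro", "nano-banana-pro",
  "kling-v3", "kling-v3", "kling-v3",
  "eleven_turbo_v2", "music_v1",
]);
// 3*5 + 3*150 + 20 + 30 = 515 credits ($5.15)
```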

Rate Limits

240 requests/minute per API key.
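
To stay under that cap when fanning out many generations, requests can be chunked client-side. A minimal sketch (not part of the SDK; the batch size and pause are illustrative knobs):

```typescript
// Minimal client-side batching sketch for staying under a requests/minute cap.
// Not SDK code; batchSize and pauseMs are illustrative knobs.
async function inBatches<T, R>(
  items: T[],
  run: (item: T) => Promise<R>,
  batchSize = 20,
  pauseMs = 5_000,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(run)))); // run one batch in parallel
    if (i + batchSize < items.length) {
      await new Promise((r) => setTimeout(r, pauseMs)); // pause between batches
    }
  }
  return results;
}
```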

Error Handling

{
  "error": { "_tag": "ValidationError", "message": "prompt is required", "statusCode": 400 }
}
| Status | Tag | Meaning |
| --- | --- | --- |
| 400 | ValidationError | Bad request |
| 401 | AuthError | Invalid API key |
| 402 | InsufficientBalanceError | Out of credits |
| 429 | RateLimitError | Too many requests |
| 502 | ProviderError | Upstream AI provider failed |

The VargClient throws VargGatewayError with statusCode, message, field, and provider properties.
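
Transient failures (429, 502) are worth retrying with backoff, while 4xx logic errors are not. This wrapper is a sketch, not SDK code; it inspects a numeric statusCode property, which matches what VargGatewayError exposes, but works with any error carrying that field:

```typescript
// Retry transient gateway failures (429 rate limit, 502 provider error).
// Sketch: assumes errors carry a numeric `statusCode`, as VargGatewayError does.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err: any) {
      const transient = err?.statusCode === 429 || err?.statusCode === 502;
      if (!transient || i >= attempts - 1) throw err; // don't retry logic errors
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i)); // exponential backoff
    }
  }
}

// e.g. const job = await withRetry(() => client.createImage({ ... }));
```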


Common Agent Patterns

Pattern 1: Generate image, then animate it

const img = await client.createImage({
  model: "nano-banana-pro",
  prompt: "product photo of white sneakers on marble surface",
  aspect_ratio: "9:16",
});
const imgResult = await client.waitForJob(img.job_id);

const vid = await client.createVideo({
  model: "kling-v3",
  prompt: "camera slowly orbits around the sneakers, cinematic lighting",
  files: [{ url: imgResult.output!.url }],
});
const vidResult = await client.waitForJob(vid.job_id);
console.log(vidResult.output!.url);

Pattern 2: Generate video with voiceover

// Generate speech and video in parallel
const [speechJob, videoJob] = await Promise.all([
  client.createSpeech({
    model: "eleven_multilingual_v2",
    text: "Today we're looking at three key trends in AI.",
    voice: "sarah",
  }),
  client.createVideo({
    model: "kling-v3",
    prompt: "woman speaking to camera, subtle gestures, professional setting",
    duration: 5,
    aspect_ratio: "9:16",
  }),
]);

const [speech, video] = await Promise.all([
  client.waitForJob(speechJob.job_id),
  client.waitForJob(videoJob.job_id),
]);

console.log("Video:", video.output!.url);
console.log("Audio:", speech.output!.url);

Pattern 3: AI SDK — generate and save to file

import { generateVideo, generateImage } from "vargai/ai";

const { image } = await generateImage({
  model: varg.imageModel("nano-banana-pro"),
  prompt: "anime warrior girl, red hair, silver armor",
  aspectRatio: "9:16",
});

await Bun.write("output/character.png", image.uint8Array);

const { video } = await generateVideo({
  model: varg.videoModel("kling-v3"),
  prompt: "warrior draws sword dramatically",
  aspectRatio: "9:16",
});

await Bun.write("output/scene.mp4", video.uint8Array);

Pattern 4: Batch generate multiple images

const prompts = [
  "sunset over ocean, golden hour",
  "mountain peaks at dawn, misty",
  "city skyline at night, neon lights",
];

const jobs = await Promise.all(
  prompts.map((prompt) =>
    client.createImage({ model: "nano-banana-pro", prompt, aspect_ratio: "16:9" }),
  ),
);

const results = await Promise.all(
  jobs.map((job) => client.waitForJob(job.job_id)),
);

results.forEach((r, i) => console.log(`${prompts[i]}: ${r.output!.url}`));

API Reference

VargClient Methods

| Method | Parameters | Returns |
| --- | --- | --- |
| createVideo(params) | { model, prompt, duration?, aspect_ratio?, files?, provider_options? } | Promise<JobResponse> |
| createImage(params) | { model, prompt, aspect_ratio?, files?, provider_options? } | Promise<JobResponse> |
| createSpeech(params) | { model, text, voice?, provider_options? } | Promise<JobResponse> |
| createMusic(params) | { model, prompt, duration?, provider_options? } | Promise<JobResponse> |
| uploadFile(blob, mediaType) | Blob \| Buffer, string | Promise<FileUploadResponse> |
| getJob(id) | string | Promise<JobResponse> |
| waitForJob(id, opts?) | string, { pollIntervalMs?, maxAttempts? } | Promise<JobResponse> |
| cancelJob(id) | string | Promise<void> |
| listVoices() | (none) | Promise<VoiceListResponse> |

JobResponse Shape

{
  job_id: string;
  status: "queued" | "processing" | "completed" | "failed" | "cancelled";
  model: string;
  created_at: string;
  completed_at?: string;
  output?: { url: string; media_type: string };
  cache?: { hit: boolean; key: string };
  error?: string;
}
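
Because output is optional, it is safer to narrow a job before using it than to sprinkle non-null assertions (output!.url) everywhere. An illustrative helper mirroring the relevant fields of the shape above:

```typescript
// Narrow a JobResponse to a guaranteed output, or fail loudly with the job's error.
// Illustrative; mirrors the relevant fields of the JobResponse shape above.
interface JobResponse {
  job_id: string;
  status: "queued" | "processing" | "completed" | "failed" | "cancelled";
  output?: { url: string; media_type: string };
  error?: string;
}

function requireOutput(job: JobResponse): { url: string; media_type: string } {
  if (job.status !== "completed" || !job.output) {
    throw new Error(`job ${job.job_id} ${job.status}: ${job.error ?? "no output"}`);
  }
  return job.output;
}

// const { url } = requireOutput(await client.waitForJob(job.job_id));
```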

createVarg Provider Methods

| Method | Returns |
| --- | --- |
| varg.videoModel(id) | VideoModelV3 |
| varg.imageModel(id) | ImageModelV3 |
| varg.speechModel(id) | SpeechModelV3 |
| varg.musicModel(id) | MusicModelV3 |