johndpope · July 31, 2021 04:09
diff --git a/zooming-vqgan-clip-z-quantize-method-with-additions.ipynb b/zooming-vqgan-clip-z-quantize-method-with-additions.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Zooming VQGAN+CLIP (z+quantize method with additions).ipynb",
      "private_outputs": true,
      "provenance": [],
      "collapsed_sections": [],
      "machine_shape": "hm",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/chigozienri/7c802ba40842914914494dc9d763c1e8/zooming-vqgan-clip-z-quantize-method-with-additions.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CppIQlPhhwhs"
      },
      "source": [
        "# Generate images from text phrases with VQGAN and CLIP (z + quantize method), with animation and keyframes\n",
        "\n",
        "Notebook by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). The original BigGAN + CLIP method was made by https://twitter.com/advadnoun. Translated into Spanish and added explanations, and modifications by Eleiber#8347, and the friendly interface was made thanks to Abulafia#3734. Translated back into English, and zoom, pan, rotation, and keyframes features by Chigozie Nri (https://github.com/chigozienri, https://twitter.com/chigozienri)\n",
        "If you encounter problems using it, you are welcome to ask me to fix it at https://twitter.com/chigozienri\n",
        "\n",
        "For a detailed tutorial on how to use it, I recommend [visiting this article (in Spanish)](https://tuscriaturas.miraheze.org/wiki/Ayuda:Crear_imágenes_con_VQGAN+CLIP), made by Jakeukalane#2767 and Avengium (Ángel)#3715\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "poDdjU3SDtF2"
      },
      "source": [
        "# How to use this notebook\n",
        "\n",
        "This is an example of a Jupyter Notebook, running in Google Colab\n",
        "\n",
        "It runs Python code in your browser. It's not hard to use, even if you haven't run code before.\n",
        "\n",
        "First, in the menu bar, click Runtime>Change Runtime Type, and ensure that under \"Hardware Accelerator\" it says \"GPU\". If not, choose \"GPU\" from the drop-down menu, and click Save.\n",
        "\n",
        "Then, run each of the cells in the notebook, one by one. Make sure to run all of them in order! Click in the cell, and press Shift-Enter on your keyboard. This will run the code in the cell, and then move to the next cell.\n",
        "\n",
        "Follow the instructions in each cell, and you'll have an AI image in no time!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "N3vV_Mq292w-"
      },
      "source": [
        "# Load Google Drive\n",
        "\n",
        "Long-running colab notebooks might halt, and discard all progress. For this reason, it's useful (although optional) to save the images as they are produced in your personal google drive. Run the cell below to load google drive, click the link, sign in, paste the code generated into the prompt, and press enter."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wOSNC5SwHBry"
      },
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/gdrive')\n",
        "\n",
        "working_dir = '/content/gdrive/MyDrive/vqgan'"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "M-_wFuRq-zPa"
      },
      "source": [
        "If you choose not to use google drive, uncomment the cell below and run it instead."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fsubmNPc-pD0"
      },
      "source": [
        "# working_dir = '/content'"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VA1PHoJrRiK9",
        "cellView": "form"
      },
      "source": [
        "# @title Licensed under the MIT License\n",
        "\n",
        "# Copyright (c) 2021 Katherine Crowson\n",
        "\n",
        "# Permission is hereby granted, free of charge, to any person obtaining a copy\n",
        "# of this software and associated documentation files (the \"Software\"), to deal\n",
        "# in the Software without restriction, including without limitation the rights\n",
        "# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n",
        "# copies of the Software, and to permit persons to whom the Software is\n",
        "# furnished to do so, subject to the following conditions:\n",
        "\n",
        "# The above copyright notice and this permission notice shall be included in\n",
        "# all copies or substantial portions of the Software.\n",
        "\n",
        "# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n",
        "# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n",
        "# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n",
        "# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n",
        "# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n",
        "# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n",
        "# THE SOFTWARE."
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TkUfzT60ZZ9q"
      },
      "source": [
        "!nvidia-smi"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wSfISAhyPmyp",
        "cellView": "form"
      },
      "source": [
        "# @title Library installation\n",
        "# @markdown This cell will take a while because you have to download multiple libraries\n",
        "\n",
        "print(\"Downloading CLIP...\")\n",
        "!git clone https://github.com/openai/CLIP                 &> /dev/null\n",
        " \n",
        "print(\"Downloading Python AI libraries...\")\n",
        "!git clone https://github.com/CompVis/taming-transformers &> /dev/null\n",
        "!pip install ftfy regex tqdm omegaconf pytorch-lightning  &> /dev/null\n",
        "!pip install kornia                                       &> /dev/null\n",
        "!pip install einops                                       &> /dev/null\n",
        " \n",
        "print(\"Installing libraries for handling metadata...\")\n",
        "!pip install stegano                                      &> /dev/null\n",
        "!apt install exempi                                       &> /dev/null\n",
        "!pip install python-xmp-toolkit                           &> /dev/null\n",
        "!pip install imgtag                                       &> /dev/null\n",
        "!pip install pillow==7.1.2                                &> /dev/null\n",
        " \n",
        "print(\"Installing Python video creation libraries...\")\n",
        "!pip install imageio-ffmpeg &> /dev/null\n",
        "path = f'{working_dir}/steps'\n",
        "!mkdir --parents {path}\n",
        "print(\"Installation finished.\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FhhdWrSxQhwg",
        "cellView": "form"
      },
      "source": [
        "#@title Selection of models to download\n",
        "#@markdown By default, the notebook downloads Model 16384 from ImageNet. There are others such as ImageNet 1024, COCO-Stuff, WikiArt 1024, WikiArt 16384, FacesHQ or S-FLCKR, which are not downloaded by default, since it would be in vain if you are not going to use them, so if you want to use them, simply select the models to download.\n",
        "\n",
        "imagenet_1024 = False #@param {type:\"boolean\"}\n",
        "imagenet_16384 = True #@param {type:\"boolean\"}\n",
        "coco = False #@param {type:\"boolean\"}\n",
        "faceshq = False #@param {type:\"boolean\"}\n",
        "wikiart_1024 = False #@param {type:\"boolean\"}\n",
        "wikiart_16384 = False #@param {type:\"boolean\"}\n",
        "sflckr = False #@param {type:\"boolean\"}\n",
        "\n",
        "if imagenet_1024:\n",
        "  !curl -L -o vqgan_imagenet_f16_1024.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_1024.yaml' #ImageNet 1024\n",
        "  !curl -L -o vqgan_imagenet_f16_1024.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_1024.ckpt'  #ImageNet 1024\n",
        "if imagenet_16384:\n",
        "  !curl -L -o vqgan_imagenet_f16_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.yaml' #ImageNet 16384\n",
        "  !curl -L -o vqgan_imagenet_f16_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/vqgan_imagenet_f16_16384.ckpt' #ImageNet 16384\n",
        "if coco:\n",
        "  !curl -L -o coco.yaml -C - 'https://dl.nmkd.de/ai/clip/coco/coco.yaml' #COCO\n",
        "  !curl -L -o coco.ckpt -C - 'https://dl.nmkd.de/ai/clip/coco/coco.ckpt' #COCO\n",
        "if faceshq:\n",
        "  !curl -L -o faceshq.yaml -C - 'https://drive.google.com/uc?export=download&id=1fHwGx_hnBtC8nsq7hesJvs-Klv-P0gzT' #FacesHQ\n",
        "  !curl -L -o faceshq.ckpt -C - 'https://app.koofr.net/content/links/a04deec9-0c59-4673-8b37-3d696fe63a5d/files/get/last.ckpt?path=%2F2020-11-13T21-41-45_faceshq_transformer%2Fcheckpoints%2Flast.ckpt' #FacesHQ\n",
        "if wikiart_1024: \n",
        "  !curl -L -o wikiart_1024.yaml -C - 'http://mirror.io.community/blob/vqgan/wikiart.yaml' #WikiArt 1024\n",
        "  !curl -L -o wikiart_1024.ckpt -C - 'http://mirror.io.community/blob/vqgan/wikiart.ckpt' #WikiArt 1024\n",
        "if wikiart_16384: \n",
        "  !curl -L -o wikiart_16384.yaml -C - 'http://mirror.io.community/blob/vqgan/wikiart_16384.yaml' #WikiArt 16384\n",
        "  !curl -L -o wikiart_16384.ckpt -C - 'http://mirror.io.community/blob/vqgan/wikiart_16384.ckpt' #WikiArt 16384\n",
        "if sflckr:\n",
        "  !curl -L -o sflckr.yaml -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fconfigs%2F2020-11-09T13-31-51-project.yaml&dl=1' #S-FLCKR\n",
        "  !curl -L -o sflckr.ckpt -C - 'https://heibox.uni-heidelberg.de/d/73487ab6e5314cb5adba/files/?p=%2Fcheckpoints%2Flast.ckpt&dl=1' #S-FLCKR"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EXMSuW2EQWsd",
        "cellView": "form"
      },
      "source": [
        "# @title Loading of libraries and definitions\n",
        " \n",
        "import argparse\n",
        "import math\n",
        "from pathlib import Path\n",
        "import sys\n",
        "import os\n",
        "import cv2\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import subprocess\n",
        " \n",
        "sys.path.append('./taming-transformers')\n",
        "from IPython import display\n",
        "from base64 import b64encode\n",
        "from omegaconf import OmegaConf\n",
        "from PIL import Image\n",
        "from taming.models import cond_transformer, vqgan\n",
        "import torch\n",
        "from torch import nn, optim\n",
        "from torch.nn import functional as F\n",
        "from torchvision import transforms\n",
        "from torchvision.transforms import functional as TF\n",
        "from tqdm.notebook import tqdm\n",
        " \n",
        "from CLIP import clip\n",
        "import kornia.augmentation as K\n",
        "import numpy as np\n",
        "import imageio\n",
        "from PIL import ImageFile, Image\n",
        "from imgtag import ImgTag    # metadata \n",
        "from libxmp import *         # metadata\n",
        "import libxmp                # metadata\n",
        "from stegano import lsb\n",
        "import json\n",
        "ImageFile.LOAD_TRUNCATED_IMAGES = True\n",
        " \n",
        "def sinc(x):\n",
        "    return torch.where(x != 0, torch.sin(math.pi * x) / (math.pi * x), x.new_ones([]))\n",
        " \n",
        " \n",
        "def lanczos(x, a):\n",
        "    cond = torch.logical_and(-a < x, x < a)\n",
        "    out = torch.where(cond, sinc(x) * sinc(x/a), x.new_zeros([]))\n",
        "    return out / out.sum()\n",
        " \n",
        " \n",
        "def ramp(ratio, width):\n",
        "    n = math.ceil(width / ratio + 1)\n",
        "    out = torch.empty([n])\n",
        "    cur = 0\n",
        "    for i in range(out.shape[0]):\n",
        "        out[i] = cur\n",
        "        cur += ratio\n",
        "    return torch.cat([-out[1:].flip([0]), out])[1:-1]\n",
        " \n",
        " \n",
        "def resample(input, size, align_corners=True):\n",
        "    n, c, h, w = input.shape\n",
        "    dh, dw = size\n",
        " \n",
        "    input = input.view([n * c, 1, h, w])\n",
        " \n",
        "    if dh < h:\n",
        "        kernel_h = lanczos(ramp(dh / h, 2), 2).to(input.device, input.dtype)\n",
        "        pad_h = (kernel_h.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (0, 0, pad_h, pad_h), 'reflect')\n",
        "        input = F.conv2d(input, kernel_h[None, None, :, None])\n",
        " \n",
        "    if dw < w:\n",
        "        kernel_w = lanczos(ramp(dw / w, 2), 2).to(input.device, input.dtype)\n",
        "        pad_w = (kernel_w.shape[0] - 1) // 2\n",
        "        input = F.pad(input, (pad_w, pad_w, 0, 0), 'reflect')\n",
        "        input = F.conv2d(input, kernel_w[None, None, None, :])\n",
        " \n",
        "    input = input.view([n, c, h, w])\n",
        "    return F.interpolate(input, size, mode='bicubic', align_corners=align_corners)\n",
        " \n",
        " \n",
        "class ReplaceGrad(torch.autograd.Function):\n",
        "    @staticmethod\n",
        "    def forward(ctx, x_forward, x_backward):\n",
        "        ctx.shape = x_backward.shape\n",
        "        return x_forward\n",
        " \n",
        "    @staticmethod\n",
        "    def backward(ctx, grad_in):\n",
        "        return None, grad_in.sum_to_size(ctx.shape)\n",
        " \n",
        " \n",
        "replace_grad = ReplaceGrad.apply\n",
        " \n",
        " \n",
        "class ClampWithGrad(torch.autograd.Function):\n",
        "    @staticmethod\n",
        "    def forward(ctx, input, min, max):\n",
        "        ctx.min = min\n",
        "        ctx.max = max\n",
        "        ctx.save_for_backward(input)\n",
        "        return input.clamp(min, max)\n",
        " \n",
        "    @staticmethod\n",
        "    def backward(ctx, grad_in):\n",
        "        input, = ctx.saved_tensors\n",
        "        return grad_in * (grad_in * (input - input.clamp(ctx.min, ctx.max)) >= 0), None, None\n",
        " \n",
        " \n",
        "clamp_with_grad = ClampWithGrad.apply\n",
        " \n",
        " \n",
        "def vector_quantize(x, codebook):\n",
        "    d = x.pow(2).sum(dim=-1, keepdim=True) + codebook.pow(2).sum(dim=1) - 2 * x @ codebook.T\n",
        "    indices = d.argmin(-1)\n",
        "    x_q = F.one_hot(indices, codebook.shape[0]).to(d.dtype) @ codebook\n",
        "    return replace_grad(x_q, x)\n",
        " \n",
        " \n",
        "class Prompt(nn.Module):\n",
        "    def __init__(self, embed, weight=1., stop=float('-inf')):\n",
        "        super().__init__()\n",
        "        self.register_buffer('embed', embed)\n",
        "        self.register_buffer('weight', torch.as_tensor(weight))\n",
        "        self.register_buffer('stop', torch.as_tensor(stop))\n",
        " \n",
        "    def forward(self, input):\n",
        "        input_normed = F.normalize(input.unsqueeze(1), dim=2)\n",
        "        embed_normed = F.normalize(self.embed.unsqueeze(0), dim=2)\n",
        "        dists = input_normed.sub(embed_normed).norm(dim=2).div(2).arcsin().pow(2).mul(2)\n",
        "        dists = dists * self.weight.sign()\n",
        "        return self.weight.abs() * replace_grad(dists, torch.maximum(dists, self.stop)).mean()\n",
        " \n",
        " \n",
        "def parse_prompt(prompt):\n",
        "    vals = prompt.rsplit(':', 2)\n",
        "    vals = vals + ['', '1', '-inf'][len(vals):]\n",
        "    return vals[0], float(vals[1]), float(vals[2])\n",
        " \n",
        " \n",
        "class MakeCutouts(nn.Module):\n",
        "    def __init__(self, cut_size, cutn, cut_pow=1.):\n",
        "        super().__init__()\n",
        "        self.cut_size = cut_size\n",
        "        self.cutn = cutn\n",
        "        self.cut_pow = cut_pow\n",
        "        self.augs = nn.Sequential(\n",
        "            K.RandomHorizontalFlip(p=0.5),\n",
        "            # K.RandomSolarize(0.01, 0.01, p=0.7),\n",
        "            K.RandomSharpness(0.3,p=0.4),\n",
        "            K.RandomAffine(degrees=30, translate=0.1, p=0.8, padding_mode='border'),\n",
        "            K.RandomPerspective(0.2,p=0.4),\n",
        "            K.ColorJitter(hue=0.01, saturation=0.01, p=0.7))\n",
        "        self.noise_fac = 0.1\n",
        " \n",
        " \n",
        "    def forward(self, input):\n",
        "        sideY, sideX = input.shape[2:4]\n",
        "        max_size = min(sideX, sideY)\n",
        "        min_size = min(sideX, sideY, self.cut_size)\n",
        "        cutouts = []\n",
        "        for _ in range(self.cutn):\n",
        "            size = int(torch.rand([])**self.cut_pow * (max_size - min_size) + min_size)\n",
        "            offsetx = torch.randint(0, sideX - size + 1, ())\n",
        "            offsety = torch.randint(0, sideY - size + 1, ())\n",
        "            cutout = input[:, :, offsety:offsety + size, offsetx:offsetx + size]\n",
        "            cutouts.append(resample(cutout, (self.cut_size, self.cut_size)))\n",
        "        batch = self.augs(torch.cat(cutouts, dim=0))\n",
        "        if self.noise_fac:\n",
        "            facs = batch.new_empty([self.cutn, 1, 1, 1]).uniform_(0, self.noise_fac)\n",
        "            batch = batch + facs * torch.randn_like(batch)\n",
        "        return batch\n",
        " \n",
        " \n",
        "def load_vqgan_model(config_path, checkpoint_path):\n",
        "    config = OmegaConf.load(config_path)\n",
        "    if config.model.target == 'taming.models.vqgan.VQModel':\n",
        "        model = vqgan.VQModel(**config.model.params)\n",
        "        model.eval().requires_grad_(False)\n",
        "        model.init_from_ckpt(checkpoint_path)\n",
        "    elif config.model.target == 'taming.models.cond_transformer.Net2NetTransformer':\n",
        "        parent_model = cond_transformer.Net2NetTransformer(**config.model.params)\n",
        "        parent_model.eval().requires_grad_(False)\n",
        "        parent_model.init_from_ckpt(checkpoint_path)\n",
        "        model = parent_model.first_stage_model\n",
        "    else:\n",
        "        raise ValueError(f'unknown model type: {config.model.target}')\n",
        "    del model.loss\n",
        "    return model\n",
        " \n",
        " \n",
        "def resize_image(image, out_size):\n",
        "    ratio = image.size[0] / image.size[1]\n",
        "    area = min(image.size[0] * image.size[1], out_size[0] * out_size[1])\n",
        "    size = round((area * ratio)**0.5), round((area / ratio)**0.5)\n",
        "    return image.resize(size, Image.LANCZOS)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1tthw0YaispD"
      },
      "source": [
        "## Instructions for setting parameters:\n",
        "\n",
        "| Parameter  |  Usage |\n",
        "|---|---|\n",
        "| `key_frames` | Whether to use key frames to change the parameters over the course of the run |\n",
        "|  `text_prompts` |  Text prompts, separated by \"\\|\" |\n",
        "| `width` | Width of the output, in pixels |\n",
        "| `height` | Height of the output, in pixels |\n",
        "| `model` | Choice of model, must be downloaded above |\n",
        "| `interval` | How often to display the frame in the notebook (doesn't affect the actual output) |\n",
        "| `initial_image` | Image to start with (relative path to file) |\n",
        "| `target_images` | Image prompts to target, separated by \"|\" (relative path to files) |\n",
        "| `seed` | Random seed, if set to a positive integer the run will be repeatable (get the same output for the same input each time, if set to -1 a random seed will be used. |\n",
        "| `max_frames` | Number of frames for the animation |\n",
        "| `angle` | Angle in degrees to rotate clockwise between each frame |\n",
        "| `zoom` | Factor to zoom in each frame, 1 is no zoom, less than 1 is zoom out, more than 1 is zoom in (negative is uninteresting, just adds an extra 180 rotation beyond that in angle) |\n",
        "| `translation_x` | Number of pixels to shift right each frame |\n",
        "| `translation_y` | Number of pixels to shift down each frame |\n",
        "| `iterations_per_frame` | Number of times to run the VQGAN+CLIP method each frame |\n",
        "| `save_all_iterations` | Debugging, set False in normal operation |\n",
        "\n",
        "---------\n",
        "\n",
        "Transformations (zoom, rotation, and translation)\n",
        "\n",
        "On each frame, the network restarts, is fed a version of the output zoomed in by `zoom` as the initial image, rotated clockwise by `angle` degrees, translated horizontally by `translation_x` pixels, and translated vertically by `translation_y` pixels. Then it runs `iterations_per_frame` iterations of the VQGAN+CLIP method. 0 `iterations_per_frame` is supported, to help test out the transformations without changing the image.\n",
        "\n",
        "For `iterations_per_frame = 1` (recommended for more abstract effects), the resulting images will not have much to do with the prompts, but at least one prompt is still required.\n",
        "\n",
        "In normal use, only the last iteration of each frame will be saved, but for trouble-shooting you can set `save_all_iterations` to True, and every iteration of each frame will be saved.\n",
        "\n",
        "----------------\n",
        "\n",
        "Mainly what you will have to modify will be `text_prompts`: there you can place the prompt(s) you want to generate (separated with |). It is a list because you can put more than one text, and so the AI tries to 'mix' the images, giving the same priority to both texts. You can also assign weights, to bias the priority towards one prompt or another, or negative weights, to remove an element (for example, a colour).\n",
        "\n",
        "Example of weights with decimals:\n",
        "\n",
        "Text : rubber:0.5 | rainbow:0.5\n",
        "\n",
        "To use an initial image to the model, you just have to upload a file to the Colab environment (in the section on the left), and then modify `initial_image`: putting the exact name of the file. Example: sample.png\n",
        "\n",
        "You can also change the model by changing the line that says `model`. Currently 1024, 16384, WikiArt, S-FLCKR and COCO-Stuff are available. To activate them you have to have downloaded them first, and then you can simply select it.\n",
        "\n",
        "You can also use `target_images`, which is basically putting one or more images on it that the AI will take as a \"target\", fulfilling the same function as putting text on it. To put more than one you have to use | as a separator.\n",
        "\n",
        "------------\n",
        "\n",
        "Key Frames\n",
        "\n",
        "If `key_frames` is set to True, you are able to change the parameters over the course of the run.\n",
        "To do this, put the parameters in in the following format:\n",
        "10:(0.5), 20: (1.0), 35: (-1.0)\n",
        "\n",
        "This means at frame 10, the value should be 0.5, at frame 20 the value should be 1.0, and at frame 35 the value should be -1.0. The value at each other frame will be linearly interpolated (that is, before frame 10, the value will be 0.5, between frame 10 and 20 the value will increase frame-by-frame from 0.5 to 1.0, between frame 20 and 35 the value will decrease frame-by-frame from 1.0 to -1.0, and after frame 35 the value will be -1.0)\n",
        "\n",
        "This also works for text_prompts, e.g. 10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\n",
        "will start with an Apple value of 1, once it hits frame 10 it will start decreasing in in Apple and increasing in Orange until it hits frame 20. Note that Peach will have a value of 1 the whole time.\n",
        "\n",
        "If `key_frames` is set to True, all of the parameters which can be key-framed must be entered in this format."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZdlpRFL8UAlW",
        "cellView": "form"
      },
      "source": [
        "#@title Parameters\n",
        "key_frames = True #@param {type:\"boolean\"}\n",
        "text_prompts = \"10:(Apple: 1| Orange: 0), 20: (Apple: 0| Orange: 1| Peach: 1)\" #@param {type:\"string\"}\n",
        "width =  800#@param {type:\"number\"}\n",
        "height =  450#@param {type:\"number\"}\n",
        "model = \"vqgan_imagenet_f16_16384\" #@param [\"vqgan_imagenet_f16_16384\", \"vqgan_imagenet_f16_1024\", \"wikiart_1024\", \"wikiart_16384\", \"coco\", \"faceshq\", \"sflckr\"]\n",
        "interval =  10#@param {type:\"number\"}\n",
        "initial_image = \"\"#@param {type:\"string\"}\n",
        "target_images = \"\"#@param {type:\"string\"}\n",
        "seed = 1#@param {type:\"number\"}\n",
        "max_frames = 350#@param {type:\"number\"}\n",
        "angle = \"0:(0)\"#@param {type:\"string\"}\n",
        "zoom = \"0:(1.1)\"#@param {type:\"string\"}\n",
        "translation_x = \"0:(-4)\"#@param {type:\"string\"}\n",
        "translation_y = \"0:(-6)\"#@param {type:\"string\"}\n",
        "iterations_per_frame = \"0:(10)\"#@param {type:\"string\"}\n",
        "save_all_iterations = False#@param {type:\"boolean\"}\n",
        "\n",
        "model_names={\n",
        "    \"vqgan_imagenet_f16_16384\": 'ImageNet 16384',\n",
        "    \"vqgan_imagenet_f16_1024\":\"ImageNet 1024\", \n",
        "    \"wikiart_1024\":\"WikiArt 1024\",\n",
        "    \"wikiart_16384\":\"WikiArt 16384\",\n",
        "    \"coco\":\"COCO-Stuff\",\n",
        "    \"faceshq\":\"FacesHQ\",\n",
        "    \"sflckr\":\"S-FLCKR\"\n",
        "}\n",
        "model_name = model_names[model]\n",
        "\n",
        "if seed == -1:\n",
        "    seed = None\n",
        "\n",
        "def parse_key_frames(string, prompt_parser=None):\n",
        "    import re\n",
        "    pattern = r'((?P<frame>[0-9]+):[\\s]*[\\(](?P<param>[\\S\\s]*?)[\\)])'\n",
        "    frames = dict()\n",
        "    for match_object in re.finditer(pattern, string):\n",
        "        frame = int(match_object.groupdict()['frame'])\n",
        "        param = match_object.groupdict()['param']\n",
        "        if prompt_parser:\n",
        "            frames[frame] = prompt_parser(param)\n",
        "        else:\n",
        "            frames[frame] = param\n",
        "    return frames\n",
        "\n",
        "def get_inbetweens(key_frames, integer=False):\n",
        "    key_frame_series = pd.Series([np.nan for a in range(max_frames)])\n",
        "    for i, value in key_frames.items():\n",
        "        key_frame_series[i] = value\n",
        "    key_frame_series = key_frame_series.astype(float)\n",
        "    key_frame_series = key_frame_series.interpolate(limit_direction='both')\n",
        "    if integer:\n",
        "        return key_frame_series.astype(int)\n",
        "    return key_frame_series\n",
        "\n",
        "def split_key_frame_text_prompts(frames):\n",
        "    prompt_dict = dict()\n",
        "    for i, parameters in frames.items():\n",
        "        prompts = parameters.split('|')\n",
        "        for prompt in prompts:\n",
        "            string, value = prompt.split(':')\n",
        "            string = string.strip()\n",
        "            value = float(value.strip())\n",
        "            if string in prompt_dict:\n",
        "                prompt_dict[string][i] = value\n",
        "            else:\n",
        "                prompt_dict[string] = {i: value}\n",
        "    prompt_series_dict = dict()\n",
        "    for prompt, values in prompt_dict.items():\n",
        "        value_string = (\n",
        "            ', '.join([f'{value}: ({values[value]})' for value in values])\n",
        "        )\n",
        "        prompt_series = get_inbetweens(parse_key_frames(value_string))\n",
        "        prompt_series_dict[prompt] = prompt_series\n",
        "    prompt_list = []\n",
        "    for i in range(max_frames):\n",
        "        prompt_list.append(\n",
        "            ' | '.join(\n",
        "                [f'{prompt}: {prompt_series_dict[prompt][i]}'\n",
        "                 for prompt in prompt_series_dict]\n",
        "            )\n",
        "        )\n",
        "    return prompt_list\n",
        "\n",
        "if key_frames:\n",
        "    text_prompts_series = split_key_frame_text_prompts(\n",
        "        parse_key_frames(text_prompts)\n",
        "    )\n",
        "    target_images_series = split_key_frame_text_prompts(\n",
        "        parse_key_frames(target_images)\n",
        "    )\n",
        "    angle_series = get_inbetweens(parse_key_frames(angle))\n",
        "    zoom_series = get_inbetweens(parse_key_frames(zoom))\n",
        "    translation_x_series = get_inbetweens(parse_key_frames(translation_x))\n",
        "    translation_y_series = get_inbetweens(parse_key_frames(translation_y))\n",
        "    iterations_per_frame_series = get_inbetweens(\n",
        "        parse_key_frames(iterations_per_frame), integer=True\n",
        "    )\n",
        "else:\n",
        "    text_prompts = [phrase.strip() for phrase in text_prompts.split(\"|\")]\n",
        "    if text_prompts == ['']:\n",
        "        text_prompts = []\n",
        "    if target_images == \"None\" or not target_images:\n",
        "        target_images = []\n",
        "    else:\n",
        "        target_images = target_images.split(\"|\")\n",
        "        target_images = [image.strip() for image in target_images]\n",
        "    angle = float(angle)\n",
        "    zoom = float(zoom)\n",
        "    translation_x = float(translation_x)\n",
        "    translation_y = float(translation_y)\n",
        "    iterations_per_frame = int(iterations_per_frame)\n",
        "args = argparse.Namespace(\n",
        "    prompts=text_prompts,\n",
        "    image_prompts=target_images,\n",
        "    noise_prompt_seeds=[],\n",
        "    noise_prompt_weights=[],\n",
        "    size=[width, height],\n",
        "    init_weight=0.,\n",
        "    clip_model='ViT-B/32',\n",
        "    vqgan_config=f'{model}.yaml',\n",
        "    vqgan_checkpoint=f'{model}.ckpt',\n",
        "    step_size=0.1,\n",
        "    cutn=64,\n",
        "    cut_pow=1.,\n",
        "    display_freq=interval,\n",
        "    seed=seed,\n",
        ")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TV_lYFXeAulw"
      },
      "source": [
        "The following cell deletes any frames already in the steps directory. Make sure you have saved any frames you want to keep from previous runs"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ksH-eM4pZBdP"
      },
      "source": [
        "path = f'{working_dir}/steps'\n",
        "!rm -r {path}\n",
        "!mkdir --parents {path}"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-qAWbJ-_ctkl"
      },
      "source": [
        "if key_frames:\n",
        "    # key frame filename would be too long\n",
        "    filename = \"video.mp4\"\n",
        "else:\n",
        "    filename = f\"{'_'.join(text_prompts).replace(' ', '')}.mp4\"\n",
        "filepath = f'{working_dir}/{filename}'"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "g7EDme5RYCrt",
        "cellView": "form"
      },
      "source": [
        "#@title Actually do the run...\n",
        "\n",
        "# Delete memory from previous runs\n",
        "!nvidia-smi -caa\n",
        "for var in ['device', 'model', 'perceptor', 'z']:\n",
        "  try:\n",
        "      del globals()[var]\n",
        "  except:\n",
        "      pass\n",
        "\n",
        "try:\n",
        "    import gc\n",
        "    gc.collect()\n",
        "except:\n",
        "    pass\n",
        "\n",
        "try:\n",
        "    torch.cuda.empty_cache()\n",
        "except:\n",
        "    pass\n",
        "\n",
        "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n",
        "print('Using device:', device)\n",
        "if not key_frames:\n",
        "    if text_prompts:\n",
        "        print('Using text prompts:', text_prompts)\n",
        "    if target_images:\n",
        "        print('Using image prompts:', target_images)\n",
        "if args.seed is None:\n",
        "    seed = torch.seed()\n",
        "else:\n",
        "    seed = args.seed\n",
        "torch.manual_seed(seed)\n",
        "print('Using seed:', seed)\n",
        " \n",
        "model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)\n",
        "perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)\n",
        " \n",
        "cut_size = perceptor.visual.input_resolution\n",
        "e_dim = model.quantize.e_dim\n",
        "f = 2**(model.decoder.num_resolutions - 1)\n",
        "make_cutouts = MakeCutouts(cut_size, args.cutn, cut_pow=args.cut_pow)\n",
        "n_toks = model.quantize.n_e\n",
        "toksX, toksY = args.size[0] // f, args.size[1] // f\n",
        "sideX, sideY = toksX * f, toksY * f\n",
        "z_min = model.quantize.embedding.weight.min(dim=0).values[None, :, None, None]\n",
        "z_max = model.quantize.embedding.weight.max(dim=0).values[None, :, None, None]\n",
        "stop_on_next_loop = False  # Make sure GPU memory doesn't get corrupted from cancelling the run mid-way through, allow a full frame to complete\n",
        "\n",
        "for i in range(max_frames):\n",
        "    if stop_on_next_loop:\n",
        "      break\n",
        "    if key_frames:\n",
        "        text_prompts = text_prompts_series[i]\n",
        "        text_prompts = [phrase.strip() for phrase in text_prompts.split(\"|\")]\n",
        "        if text_prompts == ['']:\n",
        "            text_prompts = []\n",
        "        args.prompts = text_prompts\n",
        "\n",
        "        target_images = target_images_series[i]\n",
        "\n",
        "        if target_images == \"None\" or not target_images:\n",
        "            target_images = []\n",
        "        else:\n",
        "            target_images = target_images.split(\"|\")\n",
        "            target_images = [image.strip() for image in target_images]\n",
        "\n",
        "        angle = angle_series[i]\n",
        "        zoom = zoom_series[i]\n",
        "        translation_x = translation_x_series[i]\n",
        "        translation_y = translation_y_series[i]\n",
        "        iterations_per_frame = iterations_per_frame_series[i]\n",
        "        print(\n",
        "            f'text_prompts: {text_prompts}'\n",
        "            f'angle: {angle}',\n",
        "            f'zoom: {zoom}',\n",
        "            f'translation_x: {translation_x}',\n",
        "            f'translation_y: {translation_y}',\n",
        "            f'iterations_per_frame: {iterations_per_frame}'\n",
        "        )\n",
        "    try:\n",
        "        if i == 0 and initial_image != \"\":\n",
        "            img_0 = cv2.imread(initial_image)\n",
        "            z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)\n",
        "        elif i == 0 and not os.path.isfile(f'{working_dir}/steps/{i:04d}.png'):\n",
        "            one_hot = F.one_hot(\n",
        "                torch.randint(n_toks, [toksY * toksX], device=device), n_toks\n",
        "            ).float()\n",
        "            z = one_hot @ model.quantize.embedding.weight\n",
        "            z = z.view([-1, toksY, toksX, e_dim]).permute(0, 3, 1, 2)\n",
        "        else:\n",
        "            if save_all_iterations:\n",
        "                img_0 = cv2.imread(\n",
        "                    f'{working_dir}/steps/{i:04d}_{iterations_per_frame}.png')\n",
        "            else:\n",
        "                # Hack to prevent colour inversion on every frame\n",
        "                img_temp = cv2.imread(f'{working_dir}/steps/{i:04d}.png')\n",
        "                imageio.imwrite('inverted_temp.png', img_temp)\n",
        "                img_0 = cv2.imread('inverted_temp.png')\n",
        "            center = (1*img_0.shape[1]//2, 1*img_0.shape[0]//2)\n",
        "            trans_mat = np.float32(\n",
        "                [[1, 0, translation_x],\n",
        "                [0, 1, translation_y]]\n",
        "            )\n",
        "            rot_mat = cv2.getRotationMatrix2D( center, angle, zoom )\n",
        "\n",
        "            trans_mat = np.vstack([trans_mat, [0,0,1]])\n",
        "            rot_mat = np.vstack([rot_mat, [0,0,1]])\n",
        "            transformation_matrix = np.matmul(rot_mat, trans_mat)\n",
        "\n",
        "            img_0 = cv2.warpPerspective(\n",
        "                img_0,\n",
        "                transformation_matrix,\n",
        "                (img_0.shape[1], img_0.shape[0]),\n",
        "                borderMode=cv2.BORDER_WRAP\n",
        "            )\n",
        "            z, *_ = model.encode(TF.to_tensor(img_0).to(device).unsqueeze(0) * 2 - 1)\n",
        "        i += 1\n",
        "\n",
        "        z_orig = z.clone()\n",
        "        z.requires_grad_(True)\n",
        "        opt = optim.Adam([z], lr=args.step_size)\n",
        "\n",
        "        normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],\n",
        "                                        std=[0.26862954, 0.26130258, 0.27577711])\n",
        "\n",
        "        pMs = []\n",
        "\n",
        "        for prompt in args.prompts:\n",
        "            txt, weight, stop = parse_prompt(prompt)\n",
        "            embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()\n",
        "            pMs.append(Prompt(embed, weight, stop).to(device))\n",
        "\n",
        "        for prompt in args.image_prompts:\n",
        "            path, weight, stop = parse_prompt(prompt)\n",
        "            img = resize_image(Image.open(path).convert('RGB'), (sideX, sideY))\n",
        "            batch = make_cutouts(TF.to_tensor(img).unsqueeze(0).to(device))\n",
        "            embed = perceptor.encode_image(normalize(batch)).float()\n",
        "            pMs.append(Prompt(embed, weight, stop).to(device))\n",
        "\n",
        "        for seed, weight in zip(args.noise_prompt_seeds, args.noise_prompt_weights):\n",
        "            gen = torch.Generator().manual_seed(seed)\n",
        "            embed = torch.empty([1, perceptor.visual.output_dim]).normal_(generator=gen)\n",
        "            pMs.append(Prompt(embed, weight).to(device))\n",
        "\n",
        "        def synth(z):\n",
        "            z_q = vector_quantize(z.movedim(1, 3), model.quantize.embedding.weight).movedim(3, 1)\n",
        "            return clamp_with_grad(model.decode(z_q).add(1).div(2), 0, 1)\n",
        "\n",
        "        def add_xmp_data(filename):\n",
        "            imagen = ImgTag(filename=filename)\n",
        "            imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'creator', 'VQGAN+CLIP', {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            if args.prompts:\n",
        "                imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', \" | \".join(args.prompts), {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            else:\n",
        "                imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'title', 'None', {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'i', str(i), {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'model', model_name, {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            imagen.xmp.append_array_item(libxmp.consts.XMP_NS_DC, 'seed',str(seed) , {\"prop_array_is_ordered\":True, \"prop_value_is_array\":True})\n",
        "            imagen.close()\n",
        "\n",
        "        def add_stegano_data(filename):\n",
        "            data = {\n",
        "                \"title\": \" | \".join(args.prompts) if args.prompts else None,\n",
        "                \"notebook\": \"VQGAN+CLIP\",\n",
        "                \"i\": i,\n",
        "                \"model\": model_name,\n",
        "                \"seed\": str(seed),\n",
        "            }\n",
        "            lsb.hide(filename, json.dumps(data)).save(filename)\n",
        "\n",
        "        @torch.no_grad()\n",
        "        def checkin(i, losses):\n",
        "            losses_str = ', '.join(f'{loss.item():g}' for loss in losses)\n",
        "            tqdm.write(f'i: {i}, loss: {sum(losses).item():g}, losses: {losses_str}')\n",
        "            out = synth(z)\n",
        "            TF.to_pil_image(out[0].cpu()).save('progress.png')\n",
        "            add_stegano_data('progress.png')\n",
        "            add_xmp_data('progress.png')\n",
        "            display.display(display.Image('progress.png'))\n",
        "\n",
        "        def save_output(i, img, suffix=None):\n",
        "            filename = \\\n",
        "                f\"{working_dir}/steps/{i:04}{'_' + suffix if suffix else ''}.png\"\n",
        "            imageio.imwrite(filename, np.array(img))\n",
        "            add_stegano_data(filename)\n",
        "            add_xmp_data(filename)\n",
        "\n",
        "        def ascend_txt(i, save=True, suffix=None):\n",
        "            out = synth(z)\n",
        "            iii = perceptor.encode_image(normalize(make_cutouts(out))).float()\n",
        "\n",
        "            result = []\n",
        "\n",
        "            if args.init_weight:\n",
        "                result.append(F.mse_loss(z, z_orig) * args.init_weight / 2)\n",
        "\n",
        "            for prompt in pMs:\n",
        "                result.append(prompt(iii))\n",
        "            img = np.array(out.mul(255).clamp(0, 255)[0].cpu().detach().numpy().astype(np.uint8))[:,:,:]\n",
        "            img = np.transpose(img, (1, 2, 0))\n",
        "            if save:\n",
        "                save_output(i, img, suffix=suffix)\n",
        "            return result\n",
        "\n",
        "        def train(i, save=True, suffix=None):\n",
        "            opt.zero_grad()\n",
        "            lossAll = ascend_txt(i, save=save, suffix=suffix)\n",
        "            if i % args.display_freq == 0 and save:\n",
        "                checkin(i, lossAll)\n",
        "            loss = sum(lossAll)\n",
        "            loss.backward()\n",
        "            opt.step()\n",
        "            with torch.no_grad():\n",
        "                z.copy_(z.maximum(z_min).minimum(z_max))\n",
        "\n",
        "        with tqdm() as pbar:\n",
        "            if iterations_per_frame == 0:\n",
        "                save_output(i, img_0)\n",
        "            j = 1\n",
        "            while True:\n",
        "                suffix = (str(j) if save_all_iterations else None)\n",
        "                if j >= iterations_per_frame:\n",
        "                    train(i, save=True, suffix=suffix)\n",
        "                    break\n",
        "                if save_all_iterations:\n",
        "                    train(i, save=True, suffix=suffix)\n",
        "                else:\n",
        "                    train(i, save=False, suffix=suffix)\n",
        "                j += 1\n",
        "                pbar.update()\n",
        "    except KeyboardInterrupt:\n",
        "      stop_on_next_loop = True\n",
        "      pass"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YIsSFgtPw0Pc"
      },
      "source": [
        "# Optional: SRCNN for increasing resolution"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HSJxMzXKtkTt"
      },
      "source": [
        "!git clone https://github.com/Mirwaisse/SRCNN.git\n",
        "!curl https://raw.githubusercontent.com/chigozienri/SRCNN/master/models/model_2x.pth -o model_2x.pth"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1iwOrcDbtndh"
      },
      "source": [
        "# @title Increase Resolution\n",
        "\n",
        "# Set zoomed = True if this cell is run\n",
        "zoomed = True\n",
        "\n",
        "init_frame = 50#@param {type:\"number\"}\n",
        "last_frame = 252#@param {type:\"number\"}\n",
        "\n",
        "for i in range(init_frame, last_frame): #\n",
        "    filename = f\"{i:04}.png\"\n",
        "    cmd = [\n",
        "        'python',\n",
        "        '/content/SRCNN/run.py',\n",
        "        '--zoom_factor',\n",
        "        '2',  # Note if you increase this, you also need to change the model.\n",
        "        '--model',\n",
        "        '/content/model_2x.pth',  # 2x, 3x and 4x are available from the repo above\n",
        "        '--image',\n",
        "        filename,\n",
        "        '--cuda'\n",
        "    ]\n",
        "    print(f'Upscaling frame {i}')\n",
        "\n",
        "    process = subprocess.Popen(cmd, cwd=f'{working_dir}/steps/')\n",
        "    stdout, stderr = process.communicate()\n",
        "    if stderr:\n",
        "      raise RuntimeError(stderr)\n",
        "      break"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "02ZbcWw5YYnU"
      },
      "source": [
        "## Make a video of the results\n",
        "\n",
        "To generate a video with the frames, run the cell below. You can modify the number of FPS, the initial frame, the last frame, etc."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mFo5vz0UYBrF"
      },
      "source": [
        "# @title Create video\n",
        "\n",
        "init_frame = 10#@param {type:\"number\"} This is the frame where the video will start\n",
        "last_frame = 350#@param {type:\"number\"} You can change i to the number of the last frame you want to generate. It will raise an error if that number of frames does not exist.\n",
        "fps = 12#@param {type:\"number\"}\n",
        "\n",
        "frames = []\n",
        "# tqdm.write('Generating video...')\n",
        "try:\n",
        "    zoomed\n",
        "except NameError:\n",
        "    image_path = f'{working_dir}/steps/%04d.png'\n",
        "else:\n",
        "    image_path = f'{working_dir}/steps/zoomed_%04d.png'\n",
        "\n",
        "cmd = [\n",
        "    'ffmpeg',\n",
        "    '-y',\n",
        "    '-vcodec',\n",
        "    'png',\n",
        "    '-r',\n",
        "    fps,\n",
        "    '-start_number',\n",
        "    init_frame,\n",
        "    '-i',\n",
        "    image_path,\n",
        "    '-c:v',\n",
        "    'libx264',\n",
        "    '-vf',\n",
        "    f'fps={fps}',\n",
        "    '-pix_fmt',\n",
        "    'yuv420p',\n",
        "    '-crf',\n",
        "    '17',\n",
        "    '-preset',\n",
        "    'veryslow',\n",
        "    filepath\n",
        "]\n",
        "\n",
        "process = subprocess.Popen(cmd, cwd=f'{working_dir}/steps/')\n",
        "stdout, stderr = process.communicate()\n",
        "if stderr:\n",
        "    raise RuntimeError(stderr)\n",
        "    break\n",
        "else:\n",
        "    print(\"The video is ready\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E8lvN6b0mb-b",
        "cellView": "form"
      },
      "source": [
        "# @title See video in the browser\n",
        "# @markdown This process may take a little longer. If you don't want to wait, download it by executing the next cell instead of using this cell.\n",
        "mp4 = open(filepath,'rb').read()\n",
        "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
        "display.HTML(\"\"\"\n",
        "<video width=400 controls>\n",
        "      <source src=\"%s\" type=\"video/mp4\">\n",
        "</video>\n",
        "\"\"\" % data_url)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Y0e8pHyJmi7s",
        "cellView": "form"
      },
      "source": [
        "# @title Download video\n",
        "from google.colab import files\n",
        "files.download(filepath)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g_0Gyi3E0HLW"
      },
      "source": [
        "# Optional: Super-Slomo for smoothing movement (Currently broken out of the box, you can get it to work, but you have to fiddle a little with the code)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cRbalqeLvy3y"
      },
      "source": [
        "!git clone -q --depth 1 https://github.com/avinashpaliwal/Super-SloMo.git\n",
        "from os.path import exists\n",
        "def download_from_google_drive(file_id, file_name):\n",
        "  # download a file from the Google Drive link\n",
        "  !rm -f ./cookie\n",
        "  !curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id={file_id}\" > /dev/null\n",
        "  confirm_text = !awk '/download/ {print $NF}' ./cookie\n",
        "  confirm_text = confirm_text[0]\n",
        "  !curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}\" -o {file_name}\n",
        "  \n",
        "pretrained_model = 'SuperSloMo.ckpt'\n",
        "if not exists(pretrained_model):\n",
        "  download_from_google_drive('1IvobLDbRiBgZr3ryCRrWL8xDbMZ-KnpF', pretrained_model)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2hT5Lhgs0gwe"
      },
      "source": [
        "SLOW_MOTION_FACTOR = 3#@param {type:\"number\"}\n",
        "TARGET_FPS = 12#@param {type:\"number\"}\n",
        "\n",
        "cmd1 = [\n",
        "    'python',\n",
        "    'Super-SloMo/video_to_slomo.py',\n",
        "    '--checkpoint',\n",
        "    pretrained_model,\n",
        "    '--video',\n",
        "    filepath,\n",
        "    '--sf',\n",
        "    str(SLOW_MOTION_FACTOR),\n",
        "    '--fps',\n",
        "    str(TARGET_FPS),\n",
        "    '--output',\n",
        "    f'{filepath}-slomo.mkv',\n",
        "]\n",
        "process = subprocess.Popen(cmd1, cwd=f'/content')\n",
        "stdout, stderr = process.communicate()\n",
        "if stderr is not None:\n",
        "    raise RuntimeError(stderr)\n",
        "\n",
        "cmd2 = [\n",
        "    'ffmpeg',\n",
        "    '-i',\n",
        "    f'{filepath}-slomo.mkv',\n",
        "    f'{filepath}-slomo.mp4',\n",
        "]\n",
        "\n",
        "process = subprocess.Popen(cmd2)\n",
        "stdout, stderr = process.communicate()\n",
        "if stderr is not None:\n",
        "    raise RuntimeError(stderr)\n"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
 }