Last active November 28, 2023 16:33
autoavsr.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "view-in-github",
    "colab_type": "text"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/gist/M1ndBlast/64a46e60107d0319efd3a29d3d28da92/autoavsr.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "89tk8KwgJIEX"
   },
   "source": [
    "# Auto-AVSR Tutorial\n",
    "**Authors**: [Pingchuan Ma](https://mpc001.github.io/), [Alexandros Haliassos](https://dblp.org/pid/257/3052.html), [Adriana Fernandez-Lopez](https://scholar.google.com/citations?user=DiVeQHkAAAAJ), [Honglie Chen](https://scholar.google.com/citations?user=HPwdvwEAAAAJ), [Stavros Petridis](https://ibug.doc.ic.ac.uk/people/spetridis), [Maja Pantic](https://ibug.doc.ic.ac.uk/people/mpantic).\n",
    "\n",
    "This tutorial shows how to use the Auto-AVSR models to perform speech recognition (ASR, VSR, and AV-ASR), crop mouth regions of interest (ROIs), or extract visual speech features.\n",
    "\n",
    "**Disclaimer**: Both the VSR and AV-ASR models were trained on videos pre-processed with RetinaFace. To speed up inference, this tutorial uses MediaPipe for pre-processing instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "kbRr5DNhJed7",
    "outputId": "0a23c774-4c6a-4acc-b9a5-f6acf1e1f963"
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "/content\n",
      "Cloning into 'Visual_Speech_Recognition_for_Multiple_Languages'...\n",
      "remote: Enumerating objects: 277, done.\u001b[K\n",
      "remote: Counting objects: 100% (100/100), done.\u001b[K\n",
      "remote: Compressing objects: 100% (74/74), done.\u001b[K\n",
      "remote: Total 277 (delta 33), reused 81 (delta 22), pack-reused 177\u001b[K\n",
      "Receiving objects: 100% (277/277), 69.77 MiB | 15.68 MiB/s, done.\n",
      "Resolving deltas: 100% (58/58), done.\n",
      "/content/Visual_Speech_Recognition_for_Multiple_Languages\n"
     ]
    }
   ],
   "source": [
    "%cd \"/content/\"\n",
    "!git clone https://github.com/mpc001/Visual_Speech_Recognition_for_Multiple_Languages.git\n",
    "%cd \"Visual_Speech_Recognition_for_Multiple_Languages\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "JRR0bdqNLXTc",
    "outputId": "000aa245-c7c1-422f-e7af-4c7ca1b6a297",
    "colab": {
     "base_uri": "https://localhost:8080/"
    }
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (2.1.0+cu118)\n",
      "Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (0.16.0+cu118)\n",
      "Requirement already satisfied: torchaudio in /usr/local/lib/python3.10/dist-packages (2.1.0+cu118)\n",
      "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.13.1)\n",
      "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch) (4.5.0)\n",
      "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.12)\n",
      "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (3.2.1)\n",
      "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch) (3.1.2)\n",
      "Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch) (2023.6.0)\n",
      "Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch) (2.1.0)\n",
      "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from torchvision) (1.23.5)\n",
      "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from torchvision) (2.31.0)\n",
      "Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.10/dist-packages (from torchvision) (9.4.0)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch) (2.1.3)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->torchvision) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->torchvision) (3.4)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->torchvision) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->torchvision) (2023.7.22)\n",
      "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch) (1.3.0)\n",
      "Requirement already satisfied: opencv-python in /usr/local/lib/python3.10/dist-packages (4.8.0.76)\n",
      "Requirement already satisfied: numpy>=1.21.2 in /usr/local/lib/python3.10/dist-packages (from opencv-python) (1.23.5)\n",
      "Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (1.11.3)\n",
      "Requirement already satisfied: numpy<1.28.0,>=1.21.6 in /usr/local/lib/python3.10/dist-packages (from scipy) (1.23.5)\n",
      "Requirement already satisfied: scikit-image in /usr/local/lib/python3.10/dist-packages (0.19.3)\n",
      "Requirement already satisfied: numpy>=1.17.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (1.23.5)\n",
      "Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (1.11.3)\n",
      "Requirement already satisfied: networkx>=2.2 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (3.2.1)\n",
      "Requirement already satisfied: pillow!=7.1.0,!=7.1.1,!=8.3.0,>=6.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (9.4.0)\n",
      "Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (2.31.6)\n",
      "Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (2023.9.26)\n",
      "Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (1.4.1)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from scikit-image) (23.2)\n",
      "Collecting av\n",
      " Downloading av-11.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (32.9 MB)\n",
      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m32.9/32.9 MB\u001b[0m \u001b[31m49.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hInstalling collected packages: av\n",
      "Successfully installed av-11.0.0\n",
      "Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (1.16.0)\n",
      "Collecting mediapipe\n",
      " Downloading mediapipe-0.10.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)\n",
      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m34.5/34.5 MB\u001b[0m \u001b[31m48.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from mediapipe) (1.4.0)\n",
      "Requirement already satisfied: attrs>=19.1.0 in /usr/local/lib/python3.10/dist-packages (from mediapipe) (23.1.0)\n",
      "Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.10/dist-packages (from mediapipe) (23.5.26)\n",
      "Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from mediapipe) (3.7.1)\n",
      "Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from mediapipe) (1.23.5)\n",
      "Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.10/dist-packages (from mediapipe) (4.8.0.76)\n",
      "Requirement already satisfied: protobuf<4,>=3.11 in /usr/local/lib/python3.10/dist-packages (from mediapipe) (3.20.3)\n",
      "Collecting sounddevice>=0.4.4 (from mediapipe)\n",
      " Downloading sounddevice-0.4.6-py3-none-any.whl (31 kB)\n",
      "Requirement already satisfied: CFFI>=1.0 in /usr/local/lib/python3.10/dist-packages (from sounddevice>=0.4.4->mediapipe) (1.16.0)\n",
      "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (1.2.0)\n",
      "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (0.12.1)\n",
      "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (4.44.3)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (1.4.5)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (23.2)\n",
      "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (9.4.0)\n",
      "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (3.1.1)\n",
      "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->mediapipe) (2.8.2)\n",
      "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from CFFI>=1.0->sounddevice>=0.4.4->mediapipe) (2.21)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->mediapipe) (1.16.0)\n",
      "Installing collected packages: sounddevice, mediapipe\n",
      "Successfully installed mediapipe-0.10.8 sounddevice-0.4.6\n",
      "Collecting ffmpeg-python\n",
      " Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)\n",
      "Requirement already satisfied: future in /usr/local/lib/python3.10/dist-packages (from ffmpeg-python) (0.18.3)\n",
      "Installing collected packages: ffmpeg-python\n",
      "Successfully installed ffmpeg-python-0.2.0\n"
     ]
    }
   ],
   "source": [
    "!pip install torch torchvision torchaudio\n",
    "!pip install opencv-python\n",
    "!pip install scipy\n",
    "!pip install scikit-image\n",
    "!pip install av\n",
    "!pip install six\n",
    "\n",
    "!pip install mediapipe\n",
    "!pip install ffmpeg-python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wIoPeIizMxVi"
   },
   "source": [
    "## Video preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QBO8QaRHSCIJ"
   },
   "source": [
    "1. Download a video."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Mtwa6fV4NHX4",
    "outputId": "07d2ea68-9fe2-490f-f07c-00efcb7745fd"
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "--2023-11-28 15:55:22-- http://www.doc.ic.ac.uk/~pm4115/autoAVSR/autoavsr_demo_video.mp4\n",
      "Resolving www.doc.ic.ac.uk (www.doc.ic.ac.uk)... 146.169.13.6\n",
      "Connecting to www.doc.ic.ac.uk (www.doc.ic.ac.uk)|146.169.13.6|:80... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 3644186 (3.5M) [video/mp4]\n",
      "Saving to: ‘/content/data/clip.mp4’\n",
      "\n",
      "/content/data/clip. 100%[===================>] 3.47M 3.41MB/s in 1.0s \n",
      "\n",
      "2023-11-28 15:55:24 (3.41 MB/s) - ‘/content/data/clip.mp4’ saved [3644186/3644186]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p /content/data/\n",
    "!wget --content-disposition http://www.doc.ic.ac.uk/~pm4115/autoAVSR/autoavsr_demo_video.mp4 -O /content/data/clip.mp4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "fArWyDh2NIqI"
   },
   "outputs": [],
   "source": [
    "from IPython.display import HTML\n",
    "from base64 import b64encode\n",
    "\n",
    "## play_video function based on: https://colab.research.google.com/drive/1bNXkfpHiVHzXQH8WjGhzQ-fsDxolpUjD\n",
    "\n",
    "def play_video(video_path, width=200):\n",
    "    # Read the raw MP4 bytes; the context manager closes the file handle.\n",
    "    with open(video_path, 'rb') as f:\n",
    "        mp4 = f.read()\n",
    "    # Embed the video inline as a base64 data URL so the notebook can play it.\n",
    "    data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
    "    return HTML(f\"\"\"\n",
    "    <video width={width} controls>\n",
    "        <source src=\"{data_url}\" type=\"video/mp4\">\n",
    "    </video>\n",
    "    \"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "VAX2S30tNTzC",
    "outputId": "e6fb17ae-311f-4f14-acc5-58352bdca38e",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 555
    }
   },
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ],
      "text/html": [
       "\n",
       " <video width=300 controls>\n",