seaside2mm · November 23, 2022 01:30
diff --git a/examples.md b/examples.md
diff --git a/libtorch.md b/libtorch.md
diff --git a/onnx.md b/onnx.md
diff --git a/tensorrt.md b/tensorrt.md
diff --git a/Using_Pytorch_ONNX.ipynb b/Using_Pytorch_ONNX.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using PyTorch with TensorRT through ONNX:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TensorRT is a great way to take a trained PyTorch model and optimize it to run more efficiently during inference on an NVIDIA GPU.\n",
    "\n",
    "One approach to convert a PyTorch model to TensorRT is to export a PyTorch model to ONNX (an open format exchange for deep learning models) and then convert into a TensorRT engine. Essentially, we will follow this path to convert and deploy our model:\n",
    "\n",
    "![PyTorch+ONNX](./images/pytorch_onnx.png)\n",
    "\n",
    "Both TensorFlow and PyTorch models can be exported to ONNX, as well as many other frameworks. This allows models created using either framework to flow into common downstream pipelines.\n",
    "\n",
    "To get started, let's take a well-known computer vision model and follow five key steps to deploy it to the TensorRT Python runtime:\n",
    "\n",
    "1. __What format should I save my model in?__\n",
    "2. __What batch size(s) am I running inference at?__\n",
    "3. __What precision am I running inference at?__\n",
    "4. __What TensorRT path am I using to convert my model?__\n",
    "5. __What runtime am I targeting?__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. What format should I save my model in?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are going to use ResNet50, a widely used CNN architecture first described in <a href=https://arxiv.org/abs/1512.03385>this paper</a>.\n",
    "\n",
    "Let's start by loading dependencies and downloading the model. We will also move our Resnet model onto the GPU and set it to evaluation mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/seaside/miniconda3/envs/torch/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\n",
      "  warnings.warn(\n",
      "/home/seaside/miniconda3/envs/torch/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.\n",
      "  warnings.warn(msg)\n"
     ]
    }
   ],
   "source": [
    "import torchvision.models as models\n",
    "import torch\n",
    "import torch.onnx\n",
    "\n",
    "# load the pretrained model\n",
    "resnet50 = models.resnet50(pretrained=True, progress=False).eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When saving a model to ONNX, PyTorch requires a test batch in proper shape and format. We pick a batch size:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "BATCH_SIZE=32\n",
    "\n",
    "dummy_input=torch.randn(BATCH_SIZE, 3, 224, 224)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will export the model using the dummy input batch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "ename": "PermissionError",
     "evalue": "[Errno 13] Permission denied: 'resnet50_pytorch.onnx'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mPermissionError\u001b[0m                           Traceback (most recent call last)",
      "\u001b[1;32m/Using_Pytorch_ONNX.ipynb Cell 9\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell://695fb9c2eb1a9c8c9830d57d6dfa4857/Using_Pytorch_ONNX.ipynb#X11sZ2lzdA%3D%3D?line=0'>1</a>\u001b[0m \u001b[39m# export the model to ONNX\u001b[39;00m\n\u001b[0;32m----> <a href='vscode-notebook-cell://695fb9c2eb1a9c8c9830d57d6dfa4857/Using_Pytorch_ONNX.ipynb#X11sZ2lzdA%3D%3D?line=1'>2</a>\u001b[0m torch\u001b[39m.\u001b[39;49monnx\u001b[39m.\u001b[39;49mexport(resnet50, dummy_input, \u001b[39m\"\u001b[39;49m\u001b[39mresnet50_pytorch.onnx\u001b[39;49m\u001b[39m\"\u001b[39;49m, verbose\u001b[39m=\u001b[39;49m\u001b[39mFalse\u001b[39;49;00m)\n",
      "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/onnx/utils.py:504\u001b[0m, in \u001b[0;36mexport\u001b[0;34m(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)\u001b[0m\n\u001b[1;32m    186\u001b[0m \u001b[39m@_beartype\u001b[39m\u001b[39m.\u001b[39mbeartype\n\u001b[1;32m    187\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mexport\u001b[39m(\n\u001b[1;32m    188\u001b[0m     model: Union[torch\u001b[39m.\u001b[39mnn\u001b[39m.\u001b[39mModule, torch\u001b[39m.\u001b[39mjit\u001b[39m.\u001b[39mScriptModule, torch\u001b[39m.\u001b[39mjit\u001b[39m.\u001b[39mScriptFunction],\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    204\u001b[0m     export_modules_as_functions: Union[\u001b[39mbool\u001b[39m, Collection[Type[torch\u001b[39m.\u001b[39mnn\u001b[39m.\u001b[39mModule]]] \u001b[39m=\u001b[39m \u001b[39mFalse\u001b[39;00m,\n\u001b[1;32m    205\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m    206\u001b[0m     \u001b[39mr\u001b[39m\u001b[39m\"\"\"Exports a model into ONNX format.\u001b[39;00m\n\u001b[1;32m    207\u001b[0m \n\u001b[1;32m    208\u001b[0m \u001b[39m    If ``model`` is not a :class:`torch.jit.ScriptModule` nor a\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    501\u001b[0m \u001b[39m            All errors are subclasses of :class:`errors.OnnxExporterError`.\u001b[39;00m\n\u001b[1;32m    502\u001b[0m \u001b[39m    \"\"\"\u001b[39;00m\n\u001b[0;32m--> 504\u001b[0m     _export(\n\u001b[1;32m    505\u001b[0m         model,\n\u001b[1;32m    506\u001b[0m         args,\n\u001b[1;32m    507\u001b[0m         f,\n\u001b[1;32m    508\u001b[0m         export_params,\n\u001b[1;32m    509\u001b[0m         verbose,\n\u001b[1;32m    510\u001b[0m         training,\n\u001b[1;32m    511\u001b[0m         input_names,\n\u001b[1;32m    512\u001b[0m         output_names,\n\u001b[1;32m    513\u001b[0m         operator_export_type\u001b[39m=\u001b[39;49moperator_export_type,\n\u001b[1;32m    514\u001b[0m         opset_version\u001b[39m=\u001b[39;49mopset_version,\n\u001b[1;32m    515\u001b[0m         do_constant_folding\u001b[39m=\u001b[39;49mdo_constant_folding,\n\u001b[1;32m    516\u001b[0m         dynamic_axes\u001b[39m=\u001b[39;49mdynamic_axes,\n\u001b[1;32m    517\u001b[0m         keep_initializers_as_inputs\u001b[39m=\u001b[39;49mkeep_initializers_as_inputs,\n\u001b[1;32m    518\u001b[0m         custom_opsets\u001b[39m=\u001b[39;49mcustom_opsets,\n\u001b[1;32m    519\u001b[0m         export_modules_as_functions\u001b[39m=\u001b[39;49mexport_modules_as_functions,\n\u001b[1;32m    520\u001b[0m     )\n",
      "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/onnx/utils.py:1604\u001b[0m, in \u001b[0;36m_export\u001b[0;34m(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)\u001b[0m\n\u001b[1;32m   1602\u001b[0m \u001b[39mif\u001b[39;00m export_type \u001b[39m==\u001b[39m _exporter_states\u001b[39m.\u001b[39mExportTypes\u001b[39m.\u001b[39mPROTOBUF_FILE:\n\u001b[1;32m   1603\u001b[0m     \u001b[39massert\u001b[39;00m \u001b[39mlen\u001b[39m(export_map) \u001b[39m==\u001b[39m \u001b[39m0\u001b[39m\n\u001b[0;32m-> 1604\u001b[0m     \u001b[39mwith\u001b[39;00m torch\u001b[39m.\u001b[39;49mserialization\u001b[39m.\u001b[39;49m_open_file_like(f, \u001b[39m\"\u001b[39;49m\u001b[39mwb\u001b[39;49m\u001b[39m\"\u001b[39;49m) \u001b[39mas\u001b[39;00m opened_file:\n\u001b[1;32m   1605\u001b[0m         opened_file\u001b[39m.\u001b[39mwrite(proto)\n\u001b[1;32m   1606\u001b[0m \u001b[39melif\u001b[39;00m export_type \u001b[39min\u001b[39;00m [\n\u001b[1;32m   1607\u001b[0m     _exporter_states\u001b[39m.\u001b[39mExportTypes\u001b[39m.\u001b[39mZIP_ARCHIVE,\n\u001b[1;32m   1608\u001b[0m     _exporter_states\u001b[39m.\u001b[39mExportTypes\u001b[39m.\u001b[39mCOMPRESSED_ZIP_ARCHIVE,\n\u001b[1;32m   1609\u001b[0m ]:\n",
      "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/serialization.py:270\u001b[0m, in \u001b[0;36m_open_file_like\u001b[0;34m(name_or_buffer, mode)\u001b[0m\n\u001b[1;32m    268\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m_open_file_like\u001b[39m(name_or_buffer, mode):\n\u001b[1;32m    269\u001b[0m     \u001b[39mif\u001b[39;00m _is_path(name_or_buffer):\n\u001b[0;32m--> 270\u001b[0m         \u001b[39mreturn\u001b[39;00m _open_file(name_or_buffer, mode)\n\u001b[1;32m    271\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    272\u001b[0m         \u001b[39mif\u001b[39;00m \u001b[39m'\u001b[39m\u001b[39mw\u001b[39m\u001b[39m'\u001b[39m \u001b[39min\u001b[39;00m mode:\n",
      "File \u001b[0;32m~/miniconda3/envs/torch/lib/python3.8/site-packages/torch/serialization.py:251\u001b[0m, in \u001b[0;36m_open_file.__init__\u001b[0;34m(self, name, mode)\u001b[0m\n\u001b[1;32m    250\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__init__\u001b[39m(\u001b[39mself\u001b[39m, name, mode):\n\u001b[0;32m--> 251\u001b[0m     \u001b[39msuper\u001b[39m(_open_file, \u001b[39mself\u001b[39m)\u001b[39m.\u001b[39m\u001b[39m__init__\u001b[39m(\u001b[39mopen\u001b[39;49m(name, mode))\n",
      "\u001b[0;31mPermissionError\u001b[0m: [Errno 13] Permission denied: 'resnet50_pytorch.onnx'"
     ]
    }
   ],
   "source": [
    "# export the model to ONNX\n",
    "torch.onnx.export(resnet50, dummy_input, \"resnet50_pytorch.onnx\", verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we are picking a BATCH_SIZE of 32 in this example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now Test with a Real Image:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try a real image batch! For this example, we will simply repeat one open-source dog image from http://www.dog.ceo:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "from skimage import io\n",
    "from skimage.transform import resize\n",
    "from matplotlib import pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "url='https://images.dog.ceo/breeds/retriever-golden/n02099601_3004.jpg'\n",
    "img = resize(io.imread(url), (224, 224))\n",
    "img = np.expand_dims(np.array(img, dtype=np.float32), axis=0) # Expand image to have a batch dimension\n",
    "input_batch = np.array(np.repeat(img, BATCH_SIZE, axis=0), dtype=np.float32) # Repeat across the batch dimension\n",
    "\n",
    "input_batch.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "plt.imshow(input_batch[0].astype(np.float32))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "resnet50_gpu = models.resnet50(pretrained=True, progress=False).to(\"cuda\").eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to move our batch onto GPU and properly format it to shape [32, 3, 224, 224]. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "input_batch_chw = torch.from_numpy(input_batch).transpose(1,3).transpose(2,3)\n",
    "input_batch_gpu = input_batch_chw.to(\"cuda\")\n",
    "\n",
    "input_batch_gpu.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can run a prediction on a batch using .forward():"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "with torch.no_grad():\n",
    "    predictions = np.array(resnet50_gpu(input_batch_gpu).cpu())\n",
    "\n",
    "predictions.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Verify Baseline Model Performance/Accuracy:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a baseline, lets time our prediction in FP32:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "with torch.no_grad():\n",
    "    preds = np.array(resnet50_gpu(input_batch_gpu).cpu())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also time FP16 precision performance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "resnet50_gpu_half = resnet50_gpu.half()\n",
    "input_half = input_batch_gpu.half()\n",
    "\n",
    "with torch.no_grad():\n",
    "    preds = np.array(resnet50_gpu_half(input_half).cpu()) # Warm Up\n",
    "    \n",
    "preds.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "with torch.no_grad():\n",
    "    preds = np.array(resnet50_gpu_half(input_half).cpu())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also make sure our results are accurate. We will look at the top 5 accuracy on a single image prediction. The image we are using is of a Golden Retriever, which is class 207 in the ImageNet dataset our model was trained on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "indices = (-predictions[0]).argsort()[:5]\n",
    "print(\"Class | Likelihood\")\n",
    "list(zip(indices, predictions[0][indices]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have a model exported to ONNX and a baseline to compare against! Let's now take our ONNX model and convert it to a TensorRT inference engine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's restart our Jupyter Kernel so PyTorch doesn't collide with TensorRT: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "os._exit(0) # Shut down all kernels so TRT doesn't fight with PyTorch for GPU memory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. What batch size(s) am I running inference at?\n",
    "\n",
    "We are going to run with a fixed batch size of 32 for this example. Note that above we set BATCH_SIZE to 32 when saving our model to ONNX. We need to create another dummy batch of the same size (this time it will need to be in our target precision) to test out our engine.\n",
    "\n",
    "First, as before, we will set our BATCH_SIZE to 32. Note that our trtexec command above includes the '--explicitBatch' flag to signal to TensorRT that we will be using a fixed batch size at runtime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "BATCH_SIZE = 32"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Importantly, by default TensorRT will use the input precision you give the runtime as the default precision for the rest of the network. So before we create our new dummy batch, we also need to choose a precision as in the next section:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. What precision am I running inference at?\n",
    "\n",
    "Remember that lower precisions than FP32 tend to run faster. There are two common reduced precision modes - FP16 and INT8. Graphics cards that are designed to do inference well often have an affinity for one of these two types. This guide was developed on an NVIDIA V100, which favors FP16, so we will use that here by default. INT8 is a more complicated process that requires a calibration step.\n",
    "\n",
    "__NOTE__: Make sure you use the same precision (USE_FP16) here you saved your model in above!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "USE_FP16 = True\n",
    "target_dtype = np.float16 if USE_FP16 else np.float32"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " To create a test batch, we will once again repeat one open-source dog image from http://www.dog.ceo:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "from skimage import io\n",
    "from skimage.transform import resize\n",
    "from matplotlib import pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "url='https://images.dog.ceo/breeds/retriever-golden/n02099601_3004.jpg'\n",
    "img = resize(io.imread(url), (224, 224))\n",
    "input_batch = np.array(np.repeat(np.expand_dims(np.array(img, dtype=np.float32), axis=0), BATCH_SIZE, axis=0), dtype=np.float32)\n",
    "\n",
    "input_batch.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "plt.imshow(input_batch[0].astype(np.float32))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Preprocess Images:\n",
    "\n",
    "PyTorch has a normalization that it applies by default in all of its pretrained vision models - we can preprocess our images to match this normalization by the following, making sure our final result is in FP16 precision:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torchvision.transforms import Normalize\n",
    "\n",
    "def preprocess_image(img):\n",
    "    norm = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n",
    "    result = norm(torch.from_numpy(img).transpose(0,2).transpose(1,2))\n",
    "    return np.array(result, dtype=np.float16)\n",
    "\n",
    "preprocessed_images = np.array([preprocess_image(image) for image in input_batch])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. What TensorRT path am I using to convert my model?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use trtexec, a command line tool for working with TensorRT, in order to convert an ONNX model originally from PyTorch to an engine file.\n",
    "\n",
    "Let's make sure we have TensorRT installed (this comes with trtexec):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "import tensorrt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To convert the model we saved in the previous step, we need to point to the ONNX file, give trtexec a name to save the engine as, and last specify that we want to use a fixed batch size instead of a dynamic one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "# step out of Python for a moment to convert the ONNX model to a TRT engine using trtexec\n",
    "if USE_FP16:\n",
    "    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt  --explicitBatch --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16\n",
    "else:\n",
    "    !trtexec --onnx=resnet50_pytorch.onnx --saveEngine=resnet_engine_pytorch.trt  --explicitBatch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will save our model as 'resnet_engine.trt'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. What TensorRT runtime am I targeting?\n",
    "\n",
    "Now, we have a converted our model to a TensorRT engine. Great! That means we are ready to load it into the native Python TensorRT runtime. This runtime strikes a balance between the ease of use of the high level Python APIs used in frameworks and the fast, low level C++ runtimes available in TensorRT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "import tensorrt as trt\n",
    "import pycuda.driver as cuda\n",
    "import pycuda.autoinit\n",
    "\n",
    "f = open(\"resnet_engine_pytorch.trt\", \"rb\")\n",
    "runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING)) \n",
    "\n",
    "engine = runtime.deserialize_cuda_engine(f.read())\n",
    "context = engine.create_execution_context()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now allocate input and output memory, give TRT pointers (bindings) to it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# need to set input and output precisions to FP16 to fully enable it\n",
    "output = np.empty([BATCH_SIZE, 1000], dtype = target_dtype) \n",
    "\n",
    "# allocate device memory\n",
    "d_input = cuda.mem_alloc(1 * input_batch.nbytes)\n",
    "d_output = cuda.mem_alloc(1 * output.nbytes)\n",
    "\n",
    "bindings = [int(d_input), int(d_output)]\n",
    "\n",
    "stream = cuda.Stream()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, set up the prediction function.\n",
    "\n",
    "This involves a copy from CPU RAM to GPU VRAM, executing the model, then copying the results back from GPU VRAM to CPU RAM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "def predict(batch): # result gets copied into output\n",
    "    # transfer input data to device\n",
    "    cuda.memcpy_htod_async(d_input, batch, stream)\n",
    "    # execute model\n",
    "    context.execute_async_v2(bindings, stream.handle, None)\n",
    "    # transfer predictions back\n",
    "    cuda.memcpy_dtoh_async(output, d_output, stream)\n",
    "    # syncronize threads\n",
    "    stream.synchronize()\n",
    "    \n",
    "    return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's time the function!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "print(\"Warming up...\")\n",
    "\n",
    "pred = predict(preprocessed_images)\n",
    "\n",
    "print(\"Done warming up!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "\n",
    "pred = predict(preprocessed_images)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we should verify our TensorRT output is still accurate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "ename": "",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31mRunning cells with 'Python 3.8.10 64-bit' requires ipykernel package.\n",
      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
      "\u001b[1;31mCommand: '/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
     ]
    }
   ],
   "source": [
    "indices = (-pred[0]).argsort()[:5]\n",
    "print(\"Class | Probability (out of 1)\")\n",
    "list(zip(indices, pred[0][indices]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Look for ImageNet indices 150-275 above, where 207 is the ground truth correct class (Golden Retriever). Compare with the results of the original unoptimized model in the first section!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h4> Profiling </h4>\n",
    "\n",
    "This is a great next step for further optimizing and debugging models you are working on productionizing\n",
    "\n",
    "You can find it here: https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html\n",
    "\n",
    "<h4>  TRT Dev Docs </h4>\n",
    "\n",
    "Main documentation page for the ONNX, layer builder, C++, and legacy APIs\n",
    "\n",
    "You can find it here: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html\n",
    "\n",
    "<h4>  TRT OSS GitHub </h4>\n",
    "\n",
    "Contains OSS TRT components, sample applications, and plugin examples\n",
    "\n",
    "You can find it here: https://github.com/NVIDIA/TensorRT\n",
    "\n",
    "\n",
    "#### TRT Supported Layers:\n",
    "\n",
    "https://github.com/NVIDIA/TensorRT/tree/main/samples/opensource/samplePlugin\n",
    "\n",
    "#### TRT ONNX Plugin Example:\n",
    "\n",
    "https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#layers-precision-matrix"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('torch')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "8d97bc586bba801e3a47b2bb936e1509677b32e0fb4ca34fe6bccd64ac7d083c"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }