Created
May 18, 2023 19:52
Image Captioning - Implement an image captioning model using a CNN and a Transformer.
{ | |
"cells": [ | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "EFwSaNB8jF7s" | |
}, | |
"source": [ | |
"# Image Captioning Keras-IO" | |
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"```\n", | |
"Title: Image Captioning\n", | |
"Author: [A_K_Nain](https://twitter.com/A_K_Nain)\n", | |
"Date created: 2021/05/29\n", | |
"Last modified: 2021/10/31\n", | |
"Description: Implement an image captioning model using a CNN and a Transformer.\n", | |
"Accelerator: GPU\n", | |
"```" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "5bwwk4uxRz6A" | |
}, | |
"source": [ | |
"## Setup" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "nQ6q39Vd-y-7" | |
}, | |
"source": [ | |
"This tutorial uses lots of imports, mostly for loading the dataset(s)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"cellView": "form", | |
"id": "U8l4RJ0XRPEm" | |
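},
"outputs": [],
"source": [
"import collections\n",
"import concurrent.futures\n",
"import hashlib\n",
"import itertools\n",
"import os\n",
"import pathlib\n",
"import re\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import requests  # third-party; only used by conceptual_captions()\n",
"import tqdm  # third-party; only used by conceptual_captions()\n",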
"\n", | |
"import tensorflow as tf\n", | |
"from tensorflow import keras\n", | |
"from tensorflow.keras import layers\n", | |
"from tensorflow.keras.applications import efficientnet\n", | |
"from tensorflow.keras.layers import TextVectorization\n", | |
"\n", | |
"\n", | |
"seed = 111\n", | |
"np.random.seed(seed)\n", | |
"tf.random.set_seed(seed)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Kl9qGnjWrv80" | |
}, | |
"source": [ | |
"## [Optional] Data handling\n", | |
"\n", | |
"This section downloads a captions dataset and prepares it for training. It tokenizes the input text, and caches the results of running all the images through a pretrained feature-extractor model. It's not critical to understand everything in this section.\n", | |
"\n", | |
" <section class=\"expandable tfo-display-only-on-site\">\n", | |
" <button type=\"button\" class=\"button-red button expand-control\">Toggle section</button>\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "q5e_SigQFiWf" | |
},
"source": [
"### Choose a dataset\n",
"\n",
"This tutorial is set up to give a choice of datasets: either [Flickr8k](https://www.ijcai.org/Proceedings/15/Papers/593.pdf) or a small slice of the [Conceptual Captions](https://ai.google.com/research/ConceptualCaptions/) dataset. Both are downloaded and converted from scratch, but it wouldn't be hard to convert the tutorial to use the caption datasets available in [TensorFlow Datasets](https://www.tensorflow.org/datasets): [Coco Captions](https://www.tensorflow.org/datasets/catalog/coco_captions) and the full [Conceptual Captions](https://www.tensorflow.org/datasets/community_catalog/huggingface/conceptual_captions).\n"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "wqGXX9Dc5c0v" | |
}, | |
"source": [ | |
"#### Flickr8k" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"id": "kaNy_l7tGuAZ" | |
},
"outputs": [],
"source": [
"def flickr8k(path='flickr8k'):\n",
"  path = pathlib.Path(path)\n",
"\n",
"  if len(list(path.rglob('*'))) < 16197:\n",
"    tf.keras.utils.get_file(\n",
"        origin='https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip',\n",
"        cache_dir='.',\n",
"        cache_subdir=path,\n",
"        extract=True)\n",
"    tf.keras.utils.get_file(\n",
"        origin='https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip',\n",
"        cache_dir='.',\n",
"        cache_subdir=path,\n",
"        extract=True)\n",
"\n",
"  captions = (path/\"Flickr8k.token.txt\").read_text().splitlines()\n",
"  captions = (line.split('\\t') for line in captions)\n",
"  captions = ((fname.split('#')[0], caption) for (fname, caption) in captions)\n",
"\n",
"  cap_dict = collections.defaultdict(list)\n",
"  for fname, cap in captions:\n",
"    cap_dict[fname].append(cap)\n",
"\n",
"  train_files = (path/'Flickr_8k.trainImages.txt').read_text().splitlines()\n",
"  train_captions = [(str(path/'Flicker8k_Dataset'/fname), cap_dict[fname]) for fname in train_files]\n",
"\n",
"  test_files = (path/'Flickr_8k.testImages.txt').read_text().splitlines()\n",
"  test_captions = [(str(path/'Flicker8k_Dataset'/fname), cap_dict[fname]) for fname in test_files]\n",
"\n",
"  train_ds = tf.data.experimental.from_list(train_captions)\n",
"  test_ds = tf.data.experimental.from_list(test_captions)\n",
"\n",
"  return train_ds, test_ds"
] | |
}, | |
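The grouping step inside `flickr8k()` can be looked at in isolation. The snippet below is a minimal sketch, assuming each line of the token file has the layout `image_name#index<TAB>caption` that the loader above splits on. The helper name `group_captions` and the two sample lines are made up for illustration and are not part of the notebook.

```python
import collections


def group_captions(lines):
    """Group 'image.jpg#N<TAB>caption' lines into {image.jpg: [captions]}."""
    grouped = collections.defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        name_with_index, caption = line.split("\t", 1)
        image_name = name_with_index.split("#")[0]  # drop the '#N' suffix
        grouped[image_name].append(caption.strip())
    return dict(grouped)


# Hypothetical sample lines in the same layout as Flickr8k.token.txt.
sample_lines = [
    "dogs.jpg#0\tTwo dogs play together in the snow .",
    "dogs.jpg#1\tBlack dog chasing brown dog through snow",
]
captions_by_image = group_captions(sample_lines)
print(captions_by_image["dogs.jpg"])
```

Every caption for one image ends up in a single list, which is why the Flickr8k dataset elements carry a vector of caption strings per image path.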
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "zQICBAF4FmSL" | |
}, | |
"source": [ | |
"#### Conceptual Captions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"id": "vQwnxXZXRl12" | |
},
"outputs": [],
"source": [
"def conceptual_captions(*, data_dir=\"conceptual_captions\", num_train, num_val):\n",
"  def iter_index(index_path):\n",
"    with open(index_path) as f:\n",
"      for line in f:\n",
"        caption, url = line.strip().split('\\t')\n",
"        yield caption, url\n",
"\n",
"  def download_image_urls(data_dir, urls):\n",
"    ex = concurrent.futures.ThreadPoolExecutor(max_workers=100)\n",
"    def save_image(url):\n",
"      hash = hashlib.sha1(url.encode())\n",
"      # Name the files after the hash of the URL.\n",
"      file_path = data_dir/f'{hash.hexdigest()}.jpeg'\n",
"      if file_path.exists():\n",
"        # Only download each file once.\n",
"        return file_path\n",
"\n",
"      try:\n",
"        result = requests.get(url, timeout=5)\n",
"      except Exception:\n",
"        file_path = None\n",
"      else:\n",
"        file_path.write_bytes(result.content)\n",
"      return file_path\n",
"\n",
"    result = []\n",
"    out_paths = ex.map(save_image, urls)\n",
"    for file_path in tqdm.tqdm(out_paths, total=len(urls)):\n",
"      result.append(file_path)\n",
"\n",
"    return result\n",
"\n",
"  def ds_from_index_file(index_path, data_dir, count):\n",
"    data_dir.mkdir(exist_ok=True)\n",
"    index = list(itertools.islice(iter_index(index_path), count))\n",
"    captions = [caption for caption, url in index]\n",
"    urls = [url for caption, url in index]\n",
"\n",
"    paths = download_image_urls(data_dir, urls)\n",
"\n",
"    new_captions = []\n",
"    new_paths = []\n",
"    for cap, path in zip(captions, paths):\n",
"      if path is None:\n",
"        # Download failed, so skip this pair.\n",
"        continue\n",
"      new_captions.append(cap)\n",
"      new_paths.append(path)\n",
"\n",
"    new_paths = [str(p) for p in new_paths]\n",
"\n",
"    ds = tf.data.Dataset.from_tensor_slices((new_paths, new_captions))\n",
"    ds = ds.map(lambda path,cap: (path, cap[tf.newaxis])) # 1 caption per image\n",
"    return ds\n",
"\n",
"  data_dir = pathlib.Path(data_dir)\n",
"  train_index_path = tf.keras.utils.get_file(\n",
"    origin='https://storage.googleapis.com/gcc-data/Train/GCC-training.tsv',\n",
"    cache_subdir=data_dir,\n",
"    cache_dir='.')\n",
"\n",
"  val_index_path = tf.keras.utils.get_file(\n",
"    origin='https://storage.googleapis.com/gcc-data/Validation/GCC-1.1.0-Validation.tsv',\n",
"    cache_subdir=data_dir,\n",
"    cache_dir='.')\n",
"\n",
"  train_raw = ds_from_index_file(train_index_path, data_dir=data_dir/'train', count=num_train)\n",
"  test_raw = ds_from_index_file(val_index_path, data_dir=data_dir/'val', count=num_val)\n",
"\n",
"  return train_raw, test_raw"
] | |
}, | |
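The "download each file once" behaviour in `save_image` rests on naming each file after the SHA-1 hash of its URL, so the same URL always maps to the same path. Here is a small sketch of just that naming rule, with no network access. The helper name `cache_path_for` and the example URLs are placeholders chosen for illustration.

```python
import hashlib
import pathlib


def cache_path_for(url, data_dir):
    """Return the deterministic cache path used for a given image URL."""
    digest = hashlib.sha1(url.encode()).hexdigest()  # 40 hex characters
    return pathlib.Path(data_dir) / f"{digest}.jpeg"


# Placeholder URLs; nothing is downloaded here.
first = cache_path_for("https://example.com/cat.jpg", "conceptual_captions/train")
again = cache_path_for("https://example.com/cat.jpg", "conceptual_captions/train")
other = cache_path_for("https://example.com/dog.jpg", "conceptual_captions/train")
print(first.name)
```

Because the path depends only on the URL, a rerun can check `file_path.exists()` and skip files that are already on disk.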
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "rBAagBw5p-TM" | |
}, | |
"source": [ | |
"#### Download the dataset" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "WFtTZaobquNr" | |
},
"source": [
"Flickr8k is a good choice because it contains 5 captions per image, which means more training data for a smaller download."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"id": "EJySPbzJ4Wxw" | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Downloading data from https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip\n", | |
"1115419746/1115419746 [==============================] - 913s 1us/step\n", | |
"Downloading data from https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip\n", | |
"2340801/2340801 [==============================] - 2s 1us/step\n" | |
] | |
} | |
],
"source": [
"choose = 'flickr8k'\n",
"\n",
"if choose == 'flickr8k':\n",
"  train_raw, test_raw = flickr8k()\n",
"else:\n",
"  train_raw, test_raw = conceptual_captions(num_train=10000, num_val=5000)"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "-UAc275FHxm8" | |
}, | |
"source": [ | |
"The loaders for both datasets above return `tf.data.Dataset`s containing `(image_path, captions)` pairs. The Flickr8k dataset contains 5 captions per image, while Conceptual Captions has 1:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 369, | |
"metadata": { | |
"id": "sAQSps5F8RQI" | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(TensorSpec(shape=(), dtype=tf.string, name=None),\n", | |
" TensorSpec(shape=(5,), dtype=tf.string, name=None))" | |
] | |
}, | |
"execution_count": 369, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"train_raw.element_spec" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 370, | |
"metadata": { | |
"id": "xIa0ZaP4tBez" | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"2023-05-12 19:56:51.980092: I tensorflow/core/common_runtime/executor.cc:1210] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_6376' with dtype string\n", | |
"\t [[{{node Placeholder/_6376}}]]\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"tf.Tensor(b'flickr8k/Flicker8k_Dataset/2513260012_03d33305cf.jpg', shape=(), dtype=string)\n", | |
"tf.Tensor(\n", | |
"[b'A black dog is running after a white dog in the snow .'\n", | |
" b'Black dog chasing brown dog through snow'\n", | |
" b'Two dogs chase each other across the snowy ground .'\n", | |
" b'Two dogs play together in the snow .'\n", | |
" b'Two dogs running through a low lying body of water .'], shape=(5,), dtype=string)\n" | |
] | |
} | |
],
"source": [
"for ex_path, ex_captions in train_raw.take(1):\n",
"  print(ex_path)\n",
"  print(ex_captions)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 103, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Path to the images\n", | |
"IMAGES_PATH = \"/Users/aa849190/Downloads/github/autopilotai/autopilotai-ml/flickr8k/Flicker8k_Dataset\"\n", | |
"\n", | |
"# Desired image dimensions\n", | |
"IMAGE_SIZE = (299, 299)\n", | |
"\n", | |
"# Vocabulary size\n", | |
"VOCAB_SIZE = 10000\n", | |
"\n", | |
"# Fixed length allowed for any sequence\n", | |
"SEQ_LENGTH = 25\n", | |
"\n", | |
"# Dimension for the image embeddings and token embeddings\n", | |
"EMBED_DIM = 512\n", | |
"\n", | |
"# Per-layer units in the feed-forward network\n", | |
"FF_DIM = 512\n", | |
"\n", | |
"# Other training parameters\n", | |
"BATCH_SIZE = 64\n", | |
"EPOCHS = 1 #30\n", | |
"AUTOTUNE = tf.data.AUTOTUNE" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "uEWM9xrYcg45" | |
}, | |
"source": [ | |
"### Prepare the datasets" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 75, | |
"metadata": { | |
"id": "CZGUsuGzUfzt" | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Number of training samples: 6114\n", | |
"Number of validation samples: 1529\n" | |
] | |
} | |
],
"source": [
"def load_captions_data(filename):\n",
"    \"\"\"Loads captions (text) data and maps them to corresponding images.\n",
"\n",
"    Args:\n",
"        filename: Path to the text file containing caption data.\n",
"\n",
"    Returns:\n",
"        caption_mapping: Dictionary mapping image names to the corresponding captions\n",
"        text_data: List containing all the available captions\n",
"    \"\"\"\n",
"\n",
"    with open(filename) as caption_file:\n",
"        caption_data = caption_file.readlines()\n",
"        caption_mapping = {}\n",
"        text_data = []\n",
"        images_to_skip = set()\n",
"\n",
"        for line in caption_data:\n",
"            line = line.rstrip(\"\\n\")\n",
"            # Image name and captions are separated using a tab\n",
"            img_name, caption = line.split(\"\\t\")\n",
"\n",
"            # Each image is repeated five times for the five different captions.\n",
"            # Each image name has a suffix `#(caption_number)`\n",
"            img_name = img_name.split(\"#\")[0]\n",
"            img_name = os.path.join(IMAGES_PATH, img_name.strip())\n",
"\n",
"            # We will remove captions that are either too short or too long\n",
"            tokens = caption.strip().split()\n",
"\n",
"            if len(tokens) < 5 or len(tokens) > SEQ_LENGTH:\n",
"                images_to_skip.add(img_name)\n",
"                continue\n",
"\n",
"            if img_name.endswith(\"jpg\") and img_name not in images_to_skip:\n",
"                # We will add a start and an end token to each caption\n",
"                caption = \"<start> \" + caption.strip() + \" <end>\"\n",
"                text_data.append(caption)\n",
"\n",
"                if img_name in caption_mapping:\n",
"                    caption_mapping[img_name].append(caption)\n",
"                else:\n",
"                    caption_mapping[img_name] = [caption]\n",
"\n",
"        for img_name in images_to_skip:\n",
"            if img_name in caption_mapping:\n",
"                del caption_mapping[img_name]\n",
"\n",
"        return caption_mapping, text_data\n",
"\n",
"\n",
"def train_val_split(caption_data, train_size=0.8, shuffle=True):\n",
"    \"\"\"Split the captioning dataset into train and validation sets.\n",
"\n",
"    Args:\n",
"        caption_data (dict): Dictionary containing the mapped caption data\n",
"        train_size (float): Fraction of the full dataset to use as training data\n",
"        shuffle (bool): Whether to shuffle the dataset before splitting\n",
"\n",
"    Returns:\n",
"        Training and validation datasets as two separate dicts\n",
"    \"\"\"\n",
"\n",
"    # 1. Get the list of all image names\n",
"    all_images = list(caption_data.keys())\n",
"\n",
"    # 2. Shuffle if necessary\n",
"    if shuffle:\n",
"        np.random.shuffle(all_images)\n",
"\n",
"    # 3. Split into training and validation sets\n",
"    train_size = int(len(caption_data) * train_size)\n",
"\n",
"    training_data = {\n",
"        img_name: caption_data[img_name] for img_name in all_images[:train_size]\n",
"    }\n",
"    validation_data = {\n",
"        img_name: caption_data[img_name] for img_name in all_images[train_size:]\n",
"    }\n",
"\n",
"    # 4. Return the splits\n",
"    return training_data, validation_data\n",
"\n",
"\n",
"# Load the dataset\n",
"captions_mapping, text_data = load_captions_data(\"/Users/aa849190/Downloads/github/autopilotai/autopilotai-ml/flickr8k/Flickr8k.token.txt\")\n",
"\n",
"# Split the dataset into training and validation sets\n",
"train_data, valid_data = train_val_split(captions_mapping)\n",
"print(\"Number of training samples: \", len(train_data))\n",
"print(\"Number of validation samples: \", len(valid_data))"
] | |
}, | |
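One detail of `load_captions_data` is easy to miss: a single caption that is too short or too long removes the whole image, including captions that were already accepted. The sketch below reproduces only that rule on in-memory pairs. The function name `filter_captions`, its default limits, and the sample data are illustrative assumptions (the notebook uses 5 and `SEQ_LENGTH`).

```python
def filter_captions(pairs, min_tokens=5, max_tokens=25):
    """Keep only images whose captions all have an acceptable token count."""
    kept = {}
    skipped = set()
    for image_name, caption in pairs:
        n_tokens = len(caption.split())
        if n_tokens < min_tokens or n_tokens > max_tokens:
            skipped.add(image_name)  # one bad caption disqualifies the image
            continue
        if image_name not in skipped:
            kept.setdefault(image_name, []).append(caption)
    # Remove images that were accepted before their bad caption showed up.
    for image_name in skipped:
        kept.pop(image_name, None)
    return kept


# Hypothetical (image, caption) pairs.
pairs = [
    ("a.jpg", "A black dog runs across the snow"),
    ("a.jpg", "Dog runs"),  # too short, so a.jpg is dropped entirely
    ("b.jpg", "Two dogs play together in the snow"),
]
kept = filter_captions(pairs)
print(sorted(kept))
```

This explains why the sample counts printed by the cell above add up to fewer images than the full Flickr8k set.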
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vectorizing the text data\n",
"\n",
"We'll use the `TextVectorization` layer to vectorize the text data,\n",
"that is to say, to turn the\n",
"original strings into integer sequences where each integer represents the index of\n",
"a word in a vocabulary. We will use a custom string standardization scheme\n",
"(strip punctuation characters except `<` and `>`) and the default\n",
"splitting scheme (split on whitespace)."
] | |
}, | |
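To make the standardization and indexing concrete, here is a plain-Python sketch of the same idea without TensorFlow. It lowercases, strips punctuation except `<` and `>`, splits on whitespace, and maps tokens to integers. It reserves index 0 for padding and index 1 for unknown tokens, as `TextVectorization` does in `"int"` mode, but it assigns the remaining indices in first-seen order, which is a simplification. All function names here are illustrative.

```python
import re
import string

# Same character set as the notebook's strip_chars: all ASCII punctuation
# except '<' and '>', so '<start>' and '<end>' survive intact.
STRIP_CHARS = string.punctuation.replace("<", "").replace(">", "")
STRIP_PATTERN = re.compile("[%s]" % re.escape(STRIP_CHARS))


def standardize(text):
    """Lowercase the text and remove the stripped punctuation characters."""
    return STRIP_PATTERN.sub("", text.lower())


def build_vocab(texts):
    """Map tokens to ids; 0 is padding, 1 is the unknown-token id."""
    vocab = {"": 0, "[UNK]": 1}
    for text in texts:
        for token in standardize(text).split():
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(text, vocab, seq_length):
    """Turn text into a fixed-length list of ids, truncating or zero-padding."""
    ids = [vocab.get(token, 1) for token in standardize(text).split()]
    ids = ids[:seq_length]
    return ids + [0] * (seq_length - len(ids))


vocab = build_vocab(["<start> A dog, running! <end>"])
encoded = vectorize("<start> a cat running <end>", vocab, seq_length=8)
print(standardize("<start> A dog, running! <end>"))
print(encoded)
```

The word `cat` was never seen while building the vocabulary, so it maps to the unknown id, and the tail of the sequence is padded with zeros up to the fixed length.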
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"def custom_standardization(input_string):\n",
"    lowercase = tf.strings.lower(input_string)\n",
"    return tf.strings.regex_replace(lowercase, \"[%s]\" % re.escape(strip_chars), \"\")\n",
"\n",
"\n",
"strip_chars = \"!\\\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\"\n",
"strip_chars = strip_chars.replace(\"<\", \"\")\n",
"strip_chars = strip_chars.replace(\">\", \"\")\n",
"\n",
"vectorization = TextVectorization(\n",
"    max_tokens=VOCAB_SIZE,\n",
"    output_mode=\"int\",\n",
"    output_sequence_length=SEQ_LENGTH,\n",
"    standardize=custom_standardization,\n",
")\n",
"vectorization.adapt(text_data)\n",
"\n",
"# Data augmentation for image data\n",
"image_augmentation = keras.Sequential(\n",
"    [\n",
"        layers.RandomFlip(\"horizontal\"),\n",
"        layers.RandomRotation(0.2),\n",
"        layers.RandomContrast(0.3),\n",
"    ]\n",
")"
] | |
}, | |
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a `tf.data.Dataset` pipeline for training\n",
"\n",
"We will generate pairs of images and corresponding captions using a `tf.data.Dataset` object.\n",
"The pipeline consists of two steps:\n",
"\n",
"1. Read the image from the disk\n",
"2. Tokenize all five captions corresponding to the image"
] | |
}, | |
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"def decode_and_resize(img_path):\n",
"    img = tf.io.read_file(img_path)\n",
"    img = tf.image.decode_jpeg(img, channels=3)\n",
"    img = tf.image.resize(img, IMAGE_SIZE)\n",
"    img = tf.image.convert_image_dtype(img, tf.float32)\n",
"    return img\n",
"\n",
"\n",
"def process_input(img_path, captions):\n",
"    return decode_and_resize(img_path), vectorization(captions)\n",
"\n",
"\n",
"def make_dataset(images, captions):\n",
"    dataset = tf.data.Dataset.from_tensor_slices((images, captions))\n",
"    dataset = dataset.shuffle(BATCH_SIZE * 8)\n",
"    dataset = dataset.map(process_input, num_parallel_calls=AUTOTUNE)\n",
"    dataset = dataset.batch(BATCH_SIZE).prefetch(AUTOTUNE)\n",
"\n",
"    return dataset\n",
"\n",
"\n",
"# Pass the list of images and the list of corresponding captions\n",
"train_dataset = make_dataset(list(train_data.keys()), list(train_data.values()))\n",
"\n",
"valid_dataset = make_dataset(list(valid_data.keys()), list(valid_data.values()))"
] | |
}, | |
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the model\n",
"\n",
"Our image captioning architecture consists of three models:\n",
"\n",
"1. A CNN: used to extract the image features\n",
"2. A TransformerEncoder: the extracted image features are then passed to a Transformer-based\n",
"   encoder that generates a new representation of the inputs\n",
"3. A TransformerDecoder: this model takes the encoder output and the text data\n",
"   (sequences) as inputs and tries to learn to generate the caption."
] | |
}, | |
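To illustrate what "learning to generate the caption" means for the decoder, the following plain-Python sketch shows the shift between decoder input and target (teacher forcing) and a loss that ignores padded positions. This mirrors the shape of the computation only. The token ids, the uniform probabilities, and the helper names are assumptions made up for the example.

```python
import math


def shift_for_teacher_forcing(token_ids):
    """Decoder input is the sequence without its last token; target is shifted by one."""
    return token_ids[:-1], token_ids[1:]


def masked_mean_nll(target_ids, predicted_probs, pad_id=0):
    """Average negative log-likelihood over non-padding target positions."""
    total, count = 0.0, 0
    for target, probs in zip(target_ids, predicted_probs):
        if target == pad_id:
            continue  # padded positions do not contribute to the loss
        total += -math.log(probs[target])
        count += 1
    return total / count


# Hypothetical ids: 2 = <start>, 3 = <end>, 0 = padding, vocabulary of 8 tokens.
caption = [2, 5, 6, 3, 0, 0]
decoder_input, decoder_target = shift_for_teacher_forcing(caption)

# A decoder that predicts a uniform distribution at every position.
uniform_probs = [[1.0 / 8] * 8 for _ in decoder_target]
loss = masked_mean_nll(decoder_target, uniform_probs)
print(decoder_input, decoder_target, round(loss, 4))
```

With a uniform prediction over 8 tokens the masked loss equals `log(8)`, regardless of how much padding the sequence carries.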
{
"cell_type": "code",
"execution_count": 170,
"metadata": {},
"outputs": [],
"source": [
"def get_cnn_model():\n",
"    base_model = efficientnet.EfficientNetB0(\n",
"        input_shape=(*IMAGE_SIZE, 3),\n",
"        include_top=False,\n",
"        weights=\"imagenet\",\n",
"    )\n",
"    # We freeze our feature extractor\n",
"    base_model.trainable = False\n",
"    base_model_out = base_model.output\n",
"    base_model_out = layers.Reshape((-1, base_model_out.shape[-1]))(base_model_out)\n",
"    cnn_model = keras.models.Model(base_model.input, base_model_out)\n",
"    return cnn_model\n",
"\n",
"\n",
"class TransformerEncoderBlock(layers.Layer):\n",
"    def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.embed_dim = embed_dim\n",
"        self.dense_dim = dense_dim\n",
"        self.num_heads = num_heads\n",
"        self.attention_1 = layers.MultiHeadAttention(\n",
"            num_heads=num_heads, key_dim=embed_dim, dropout=0.0\n",
"        )\n",
"        self.layernorm_1 = layers.LayerNormalization()\n",
"        self.layernorm_2 = layers.LayerNormalization()\n",
"        self.dense_1 = layers.Dense(embed_dim, activation=\"relu\")\n",
"\n",
"    def call(self, inputs, training, mask=None):\n",
"        inputs = self.layernorm_1(inputs)\n",
"        inputs = self.dense_1(inputs)\n",
"\n",
"        attention_output_1 = self.attention_1(\n",
"            query=inputs,\n",
"            value=inputs,\n",
"            key=inputs,\n",
"            attention_mask=None,\n",
"            training=training,\n",
"        )\n",
"        out_1 = self.layernorm_2(inputs + attention_output_1)\n",
"        return out_1\n",
"\n",
"\n",
"class PositionalEmbedding(layers.Layer):\n",
"    def __init__(self, sequence_length, vocab_size, embed_dim, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.token_embeddings = layers.Embedding(\n",
"            input_dim=vocab_size, output_dim=embed_dim\n",
"        )\n",
"        self.position_embeddings = layers.Embedding(\n",
"            input_dim=sequence_length, output_dim=embed_dim\n",
"        )\n",
"        self.sequence_length = sequence_length\n",
"        self.vocab_size = vocab_size\n",
"        self.embed_dim = embed_dim\n",
"        self.embed_scale = tf.math.sqrt(tf.cast(embed_dim, tf.float32))\n",
"\n",
"    def call(self, inputs):\n",
"        length = tf.shape(inputs)[-1]\n",
"        positions = tf.range(start=0, limit=length, delta=1)\n",
"        embedded_tokens = self.token_embeddings(inputs)\n",
"        embedded_tokens = embedded_tokens * self.embed_scale\n",
"        embedded_positions = self.position_embeddings(positions)\n",
"        return embedded_tokens + embedded_positions\n",
"\n",
"    def compute_mask(self, inputs, mask=None):\n",
"        return tf.math.not_equal(inputs, 0)\n",
"\n",
"\n",
"class TransformerDecoderBlock(layers.Layer):\n",
"    def __init__(self, embed_dim, ff_dim, num_heads, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.embed_dim = embed_dim\n",
"        self.ff_dim = ff_dim\n",
"        self.num_heads = num_heads\n",
"        self.attention_1 = layers.MultiHeadAttention(\n",
"            num_heads=num_heads, key_dim=embed_dim, dropout=0.1\n",
"        )\n",
"        self.attention_2 = layers.MultiHeadAttention(\n",
"            num_heads=num_heads, key_dim=embed_dim, dropout=0.1\n",
"        )\n",
"        self.ffn_layer_1 = layers.Dense(ff_dim, activation=\"relu\")\n",
"        self.ffn_layer_2 = layers.Dense(embed_dim)\n",
"\n",
"        self.layernorm_1 = layers.LayerNormalization()\n",
"        self.layernorm_2 = layers.LayerNormalization()\n",
"        self.layernorm_3 = layers.LayerNormalization()\n",
"\n",
"        self.embedding = PositionalEmbedding(\n",
"            embed_dim=EMBED_DIM, sequence_length=SEQ_LENGTH, vocab_size=VOCAB_SIZE\n",
"        )\n",
"        self.out = layers.Dense(VOCAB_SIZE, activation=\"softmax\")\n",
"\n",
"        self.dropout_1 = layers.Dropout(0.3)\n",
"        self.dropout_2 = layers.Dropout(0.5)\n",
"        self.supports_masking = True\n",
"\n",
"    def call(self, inputs, encoder_outputs, training, mask=None):\n",
"        inputs = self.embedding(inputs)\n",
"        causal_mask = self.get_causal_attention_mask(inputs)\n",
"\n",
"        # Defaults for the case where no padding mask is supplied, so the\n",
"        # names below are always bound: causal masking only for self-attention\n",
"        # and no mask for cross-attention.\n",
"        padding_mask = None\n",
"        combined_mask = causal_mask\n",
"        if mask is not None:\n",
"            padding_mask = tf.cast(mask[:, :, tf.newaxis], dtype=tf.int32)\n",
"            combined_mask = tf.cast(mask[:, tf.newaxis, :], dtype=tf.int32)\n",
"            combined_mask = tf.minimum(combined_mask, causal_mask)\n",
"\n",
"        attention_output_1 = self.attention_1(\n",
"            query=inputs,\n",
"            value=inputs,\n",
"            key=inputs,\n",
"            attention_mask=combined_mask,\n",
"            training=training,\n",
"        )\n",
"        out_1 = self.layernorm_1(inputs + attention_output_1)\n",
"\n",
"        attention_output_2 = self.attention_2(\n",
"            query=out_1,\n",
"            value=encoder_outputs,\n",
"            key=encoder_outputs,\n",
"            attention_mask=padding_mask,\n",
"            training=training,\n",
"        )\n",
"        out_2 = self.layernorm_2(out_1 + attention_output_2)\n",
"\n",
"        ffn_out = self.ffn_layer_1(out_2)\n",
"        ffn_out = self.dropout_1(ffn_out, training=training)\n",
"        ffn_out = self.ffn_layer_2(ffn_out)\n",
"\n",
"        ffn_out = self.layernorm_3(ffn_out + out_2, training=training)\n",
"        ffn_out = self.dropout_2(ffn_out, training=training)\n",
"        preds = self.out(ffn_out)\n",
"        return preds\n",
"\n",
"    def get_causal_attention_mask(self, inputs):\n",
"        input_shape = tf.shape(inputs)\n",
"        batch_size, sequence_length = input_shape[0], input_shape[1]\n",
"        i = tf.range(sequence_length)[:, tf.newaxis]\n",
"        j = tf.range(sequence_length)\n",
"        mask = tf.cast(i >= j, dtype=\"int32\")\n",
"        mask = tf.reshape(mask, (1, input_shape[1], input_shape[1]))\n",
"        mult = tf.concat(\n",
"            [tf.expand_dims(batch_size, -1), tf.constant([1, 1], dtype=tf.int32)],\n",
"            axis=0,\n",
"        )\n",
"        return tf.tile(mask, mult)\n",
"\n",
"\n",
"class ImageCaptioningModel(keras.Model):\n",
"    def __init__(\n",
"        self,\n",
"        cnn_model,\n",
"        encoder,\n",
"        decoder,\n",
"        num_captions_per_image=5,\n",
"        image_aug=None,\n",
"    ):\n",
"        super().__init__()\n",
"        self.cnn_model = cnn_model\n",
"        self.encoder = encoder\n",
"        self.decoder = decoder\n",
"        self.loss_tracker = keras.metrics.Mean(name=\"loss\")\n",
"        self.acc_tracker = keras.metrics.Mean(name=\"accuracy\")\n",
"        self.num_captions_per_image = num_captions_per_image\n",
"        self.image_aug = image_aug\n",
"\n",
"    def calculate_loss(self, y_true, y_pred, mask):\n",
"        loss = self.loss(y_true, y_pred)\n",
"        mask = tf.cast(mask, dtype=loss.dtype)\n",
"        loss *= mask\n",
"        return tf.reduce_sum(loss) / tf.reduce_sum(mask)\n",
"\n",
"    def calculate_accuracy(self, y_true, y_pred, mask):\n",
"        accuracy = tf.equal(y_true, tf.argmax(y_pred, axis=2))\n",
"        accuracy = tf.math.logical_and(mask, accuracy)\n",
"        accuracy = tf.cast(accuracy, dtype=tf.float32)\n",
"        mask = tf.cast(mask, dtype=tf.float32)\n",
"        return tf.reduce_sum(accuracy) / tf.reduce_sum(mask)\n",
"\n",
"    def _compute_caption_loss_and_acc(self, img_embed, batch_seq, training=True):\n",
"        encoder_out = self.encoder(img_embed, training=training)\n",
"        batch_seq_inp = batch_seq[:, :-1]\n",
"        batch_seq_true = batch_seq[:, 1:]\n",
"        mask = tf.math.not_equal(batch_seq_true, 0)\n",
"        batch_seq_pred = self.decoder(\n",
"            batch_seq_inp, encoder_out, training=training, mask=mask\n",
"        )\n",
"        loss = self.calculate_loss(batch_seq_true, batch_seq_pred, mask)\n",
"        acc = self.calculate_accuracy(batch_seq_true, batch_seq_pred, mask)\n",
"        return loss, acc\n",
"\n",
"    def train_step(self, batch_data):\n",
"        batch_img, batch_seq = batch_data\n",
"        batch_loss = 0\n",
"        batch_acc = 0\n",
"\n",
"        if self.image_aug:\n",
"            batch_img = self.image_aug(batch_img)\n",
"\n",
"        # 1. Get image embeddings\n",
"        img_embed = self.cnn_model(batch_img)\n",
"\n",
"        # 2. Pass each of the five captions one by one to the decoder\n",
"        # along with the encoder outputs and compute the loss as well as accuracy\n",
"        # for each caption.\n",
"        for i in range(self.num_captions_per_image):\n",
"            with tf.GradientTape() as tape:\n",
"                loss, acc = self._compute_caption_loss_and_acc(\n",
"                    img_embed, batch_seq[:, i, :], training=True\n",
"                )\n",
"\n",
"                # 3. Update loss and accuracy\n",
"                batch_loss += loss\n",
"                batch_acc += acc\n",
"\n",
"            # 4. Get the list of all the trainable weights\n",
"            train_vars = (\n",
"                self.encoder.trainable_variables + self.decoder.trainable_variables\n",
"            )\n",
"\n",
"            # 5. Get the gradients\n",
"            grads = tape.gradient(loss, train_vars)\n",
"\n",
"            # 6. Update the trainable weights\n",
"            self.optimizer.apply_gradients(zip(grads, train_vars))\n",
"\n",
"        # 7. Update the trackers\n",
"        batch_acc /= float(self.num_captions_per_image)\n",
"        self.loss_tracker.update_state(batch_loss)\n",
"        self.acc_tracker.update_state(batch_acc)\n",
"\n",
"        # 8. Return the loss and accuracy values\n",
"        return {\"loss\": self.loss_tracker.result(), \"acc\": self.acc_tracker.result()}\n",
"\n",
"    def test_step(self, batch_data):\n",
"        batch_img, batch_seq = batch_data\n",
"        batch_loss = 0\n",
"        batch_acc = 0\n",
"\n",
"        # 1. Get image embeddings\n",
"        img_embed = self.cnn_model(batch_img)\n",
"\n",
"        # 2. Pass each of the five captions one by one to the decoder\n",
"        # along with the encoder outputs and compute the loss as well as accuracy\n",
"        # for each caption.\n",
"        for i in range(self.num_captions_per_image):\n",
"            loss, acc = self._compute_caption_loss_and_acc(\n",
"                img_embed, batch_seq[:, i, :], training=False\n",
"            )\n",
"\n",
"            # 3. Update batch loss and batch accuracy\n",
"            batch_loss += loss\n",
"            batch_acc += acc\n",
"\n",
"        batch_acc /= float(self.num_captions_per_image)\n",
"\n",
"        # 4. Update the trackers\n",
"        self.loss_tracker.update_state(batch_loss)\n",
"        self.acc_tracker.update_state(batch_acc)\n",
"\n",
"        # 5. Return the loss and accuracy values\n",
"        return {\"loss\": self.loss_tracker.result(), \"acc\": self.acc_tracker.result()}\n",
"\n",
"    @property\n",
"    def metrics(self):\n",
"        # We need to list our metrics here so the `reset_states()` can be\n",
"        # called automatically.\n",
"        return [self.loss_tracker, self.acc_tracker]\n",
"\n",
"    # @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])\n",
"    # def call(self, x):\n",
"    #     result = x + x\n",
"    #     return {\n",
"    #         \"encoded_result\": result\n",
"    #     }\n",
"    # NOTE: experimental inference path. `train_step` and `test_step` do not\n",
"    # call this method. As written it uses eager-only operations (`.numpy()`,\n",
"    # Python string handling, a path-based image loader) inside `tf.function`,\n",
"    # so it is not graph-compatible.\n",
"    @tf.function(input_signature=[tf.TensorSpec(shape=[299, 299, 3], dtype=tf.float32)])\n",
"    def call(self, image):\n",
"        vocab = vectorization.get_vocabulary()\n",
"        index_lookup = dict(zip(range(len(vocab)), vocab))\n",
"        max_decoded_sentence_length = SEQ_LENGTH - 1\n",
"\n",
"        # Read the image from the disk\n",
"        sample_img = decode_and_resize(image)\n",
"        img = sample_img.numpy().clip(0, 255).astype(np.uint8)\n",
"\n",
"        # Pass the image to the CNN\n",
"        img = tf.expand_dims(sample_img, 0)\n",
"        img = self.cnn_model(img)\n",
"\n",
"        # Pass the image features to the Transformer encoder\n",
"        encoded_img = self.encoder(img, training=False)\n",
"\n",
"        # Generate the caption using the Transformer decoder\n",
"        decoded_caption = \"<start> \"\n",
"        for i in range(max_decoded_sentence_length):\n",
"            tokenized_caption = vectorization([decoded_caption])[:, :-1]\n",
"            mask = tf.math.not_equal(tokenized_caption, tf.constant(0))\n",
"            predictions = self.decoder(\n",
"                tokenized_caption, encoded_img, training=False, mask=mask\n",
"            )\n",
"            sampled_token_index = np.argmax(predictions[0, i, :])\n",
"            sampled_token = index_lookup[sampled_token_index]\n",
"            # if sampled_token == \"<end>\":\n",
"            #     break\n",
"            # tf.cond(sampled_token == \"<end>\", lambda: \"test\", lambda: \"continue\")\n",
"            decoded_caption += \" \" + sampled_token\n",
"\n",
"        decoded_caption = decoded_caption.replace(\"<start> \", \"\")\n",
"        decoded_caption = decoded_caption.replace(\" <end>\", \"\").strip()\n",
"        return {\n",
"            \"result\": decoded_caption\n",
"        }\n",
"\n",
"cnn_model = get_cnn_model()\n",
"encoder = TransformerEncoderBlock(embed_dim=EMBED_DIM, dense_dim=FF_DIM, num_heads=1)\n",
"decoder = TransformerDecoderBlock(embed_dim=EMBED_DIM, ff_dim=FF_DIM, num_heads=2)\n",
"caption_model = ImageCaptioningModel(\n",
"    cnn_model=cnn_model,\n",
"    encoder=encoder,\n",
"    decoder=decoder,\n",
"    image_aug=image_augmentation,\n",
")"
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Model training" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 171, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"96/96 [==============================] - 357s 4s/step - loss: 24.4085 - acc: 0.1952 - val_loss: 19.6365 - val_acc: 0.3251\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"<keras.src.callbacks.History at 0x4a2a27310>" | |
] | |
}, | |
"execution_count": 171, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Define the loss function\n", | |
"cross_entropy = keras.losses.SparseCategoricalCrossentropy(\n", | |
" from_logits=False, reduction=\"none\"\n", | |
")\n", | |
"\n", | |
"# EarlyStopping criteria\n", | |
"early_stopping = keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True)\n", | |
"\n", | |
"\n", | |
"# Learning Rate Scheduler for the optimizer\n", | |
"class LRSchedule(keras.optimizers.schedules.LearningRateSchedule):\n", | |
" def __init__(self, post_warmup_learning_rate, warmup_steps):\n", | |
" super().__init__()\n", | |
" self.post_warmup_learning_rate = post_warmup_learning_rate\n", | |
" self.warmup_steps = warmup_steps\n", | |
"\n", | |
" def __call__(self, step):\n", | |
" global_step = tf.cast(step, tf.float32)\n", | |
" warmup_steps = tf.cast(self.warmup_steps, tf.float32)\n", | |
" warmup_progress = global_step / warmup_steps\n", | |
" warmup_learning_rate = self.post_warmup_learning_rate * warmup_progress\n", | |
" return tf.cond(\n", | |
" global_step < warmup_steps,\n", | |
" lambda: warmup_learning_rate,\n", | |
" lambda: self.post_warmup_learning_rate,\n", | |
" )\n", | |
"\n", | |
"\n", | |
"# Create a learning rate schedule\n", | |
"num_train_steps = len(train_dataset) * EPOCHS\n", | |
"num_warmup_steps = num_train_steps // 15\n", | |
"lr_schedule = LRSchedule(post_warmup_learning_rate=1e-4, warmup_steps=num_warmup_steps)\n", | |
"\n", | |
"# Compile the model\n", | |
"caption_model.compile(optimizer=tf.keras.optimizers.legacy.Adam(lr_schedule), loss=cross_entropy)\n", | |
"\n", | |
"# Fit the model\n", | |
"caption_model.fit(\n", | |
" train_dataset,\n", | |
" epochs=EPOCHS,\n", | |
" validation_data=valid_dataset,\n", | |
" callbacks=[early_stopping],\n", | |
")" | |
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Check sample predictions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 172, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"\"\"\"\n", | |
"## Check sample predictions\n", | |
"\"\"\"\n", | |
"\n", | |
"vocab = vectorization.get_vocabulary()\n", | |
"index_lookup = dict(zip(range(len(vocab)), vocab))\n", | |
"max_decoded_sentence_length = SEQ_LENGTH - 1\n", | |
"valid_images = list(valid_data.keys())\n", | |
"\n", | |
"def generate_caption():\n", | |
" # Select a random image from the validation dataset\n", | |
" sample_img = np.random.choice(valid_images)\n", | |
"\n", | |
" # Read the image from the disk\n", | |
" sample_img = decode_and_resize(sample_img)\n", | |
" img = sample_img.numpy().clip(0, 255).astype(np.uint8)\n", | |
" plt.imshow(img)\n", | |
" plt.show()\n", | |
"\n", | |
" # Pass the image to the CNN\n", | |
" img = tf.expand_dims(sample_img, 0)\n", | |
" img = caption_model.cnn_model(img)\n", | |
"\n", | |
" # Pass the image features to the Transformer encoder\n", | |
" encoded_img = caption_model.encoder(img, training=False)\n", | |
"\n", | |
" # Generate the caption using the Transformer decoder\n", | |
" decoded_caption = \"<start> \"\n", | |
" for i in range(max_decoded_sentence_length):\n", | |
" tokenized_caption = vectorization([decoded_caption])[:, :-1]\n", | |
" mask = tf.math.not_equal(tokenized_caption, 0)\n", | |
" predictions = caption_model.decoder(\n", | |
" tokenized_caption, encoded_img, training=False, mask=mask\n", | |
" )\n", | |
" sampled_token_index = np.argmax(predictions[0, i, :])\n", | |
" sampled_token = index_lookup[sampled_token_index]\n", | |
" if sampled_token == \"<end>\":\n", | |
" break\n", | |
" decoded_caption += \" \" + sampled_token\n", | |
"\n", | |
" decoded_caption = decoded_caption.replace(\"<start> \", \"\")\n", | |
" decoded_caption = decoded_caption.replace(\" <end>\", \"\").strip()\n", | |
" print(\"Predicted Caption: \", decoded_caption)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 173, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "", | |
"text/plain": [ | |
"<Figure size 640x480 with 1 Axes>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Predicted Caption: a man is standing on a beach\n" | |
] | |
} | |
], | |
"source": [ | |
"# Check predictions for a sample\n", | |
"generate_caption()" | |
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Saving and Converting the Model to Tensorflow Lite\n", | |
"\n", | |
"Save the Tensorflow Model from `Keras Model`.\n", | |
"\n", | |
"After the Model is saved to disk, convert the model to a more efficient mobile version of the model using Tensorflow Lite." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 174, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"WARNING:tensorflow:Skipping full serialization of Keras layer <__main__.ImageCaptioningModel object at 0x4b2b65d10>, because it is not built.\n" | |
] | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"WARNING:tensorflow:Model's `__init__()` arguments contain non-serializable objects. Please implement a `get_config()` method in the subclassed Model for proper saving and loading. Defaulting to empty config.\n" | |
] | |
}, | |
{ | |
"ename": "OperatorNotAllowedInGraphError", | |
"evalue": "Exception encountered when calling layer 'image_captioning_model_14' (type ImageCaptioningModel).\n\nin user code:\n\n File \"/var/folders/2b/p4gxgkbj4qlg9wgcqphy88sw0000gq/T/ipykernel_94456/1975170582.py\", line 276, in call *\n vocab = vectorization.get_vocabulary()\n File \"/Users/aa849190/miniconda/envs/my-proj/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py\", line 493, in get_vocabulary **\n return self._lookup_layer.get_vocabulary(include_special_tokens)\n File \"/Users/aa849190/miniconda/envs/my-proj/lib/python3.11/site-packages/keras/src/layers/preprocessing/index_lookup.py\", line 382, in get_vocabulary\n if self.lookup_table.size() == 0:\n\n OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.\n\n\nCall arguments received by layer 'image_captioning_model_14' (type ImageCaptioningModel):\n • image=tf.Tensor(shape=(299, 299, 3), dtype=float32)", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mOperatorNotAllowedInGraphError\u001b[0m Traceback (most recent call last)", | |
"Cell \u001b[0;32mIn[174], line 14\u001b[0m\n\u001b[1;32m 7\u001b[0m converter \u001b[39m=\u001b[39m tf\u001b[39m.\u001b[39mlite\u001b[39m.\u001b[39mTFLiteConverter\u001b[39m.\u001b[39mfrom_keras_model(caption_model)\n\u001b[1;32m 9\u001b[0m converter\u001b[39m.\u001b[39mtarget_spec\u001b[39m.\u001b[39msupported_ops \u001b[39m=\u001b[39m [\n\u001b[1;32m 10\u001b[0m tf\u001b[39m.\u001b[39mlite\u001b[39m.\u001b[39mOpsSet\u001b[39m.\u001b[39mTFLITE_BUILTINS, \u001b[39m# enable TensorFlow Lite ops.\u001b[39;00m\n\u001b[1;32m 11\u001b[0m tf\u001b[39m.\u001b[39mlite\u001b[39m.\u001b[39mOpsSet\u001b[39m.\u001b[39mSELECT_TF_OPS \u001b[39m# enable TensorFlow ops.\u001b[39;00m\n\u001b[1;32m 12\u001b[0m ]\n\u001b[0;32m---> 14\u001b[0m tflite_model \u001b[39m=\u001b[39m converter\u001b[39m.\u001b[39mconvert()\n\u001b[1;32m 16\u001b[0m \u001b[39m# Print the signatures from the converted model\u001b[39;00m\n\u001b[1;32m 17\u001b[0m interpreter \u001b[39m=\u001b[39m tf\u001b[39m.\u001b[39mlite\u001b[39m.\u001b[39mInterpreter(model_content\u001b[39m=\u001b[39mtflite_model)\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/lite.py:1065\u001b[0m, in \u001b[0;36m_export_metrics.<locals>.wrapper\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1062\u001b[0m \u001b[39m@functools\u001b[39m\u001b[39m.\u001b[39mwraps(convert_func)\n\u001b[1;32m 1063\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mwrapper\u001b[39m(\u001b[39mself\u001b[39m, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[1;32m 1064\u001b[0m \u001b[39m# pylint: disable=protected-access\u001b[39;00m\n\u001b[0;32m-> 1065\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_convert_and_export_metrics(convert_func, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/lite.py:1042\u001b[0m, in \u001b[0;36mTFLiteConverterBase._convert_and_export_metrics\u001b[0;34m(self, convert_func, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1040\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_save_conversion_params_metric()\n\u001b[1;32m 1041\u001b[0m start_time \u001b[39m=\u001b[39m time\u001b[39m.\u001b[39mprocess_time()\n\u001b[0;32m-> 1042\u001b[0m result \u001b[39m=\u001b[39m convert_func(\u001b[39mself\u001b[39m, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 1043\u001b[0m elapsed_time_ms \u001b[39m=\u001b[39m (time\u001b[39m.\u001b[39mprocess_time() \u001b[39m-\u001b[39m start_time) \u001b[39m*\u001b[39m \u001b[39m1000\u001b[39m\n\u001b[1;32m 1044\u001b[0m \u001b[39mif\u001b[39;00m result:\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/lite.py:1531\u001b[0m, in \u001b[0;36mTFLiteKerasModelConverterV2.convert\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1527\u001b[0m \u001b[39mif\u001b[39;00m saved_model_convert_result:\n\u001b[1;32m 1528\u001b[0m \u001b[39mreturn\u001b[39;00m saved_model_convert_result\n\u001b[1;32m 1530\u001b[0m graph_def, input_tensors, output_tensors, frozen_func \u001b[39m=\u001b[39m (\n\u001b[0;32m-> 1531\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_freeze_keras_model()\n\u001b[1;32m 1532\u001b[0m )\n\u001b[1;32m 1534\u001b[0m graph_def \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_optimize_tf_model(\n\u001b[1;32m 1535\u001b[0m graph_def, input_tensors, output_tensors, frozen_func\n\u001b[1;32m 1536\u001b[0m )\n\u001b[1;32m 1538\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39msuper\u001b[39m(TFLiteKerasModelConverterV2, \u001b[39mself\u001b[39m)\u001b[39m.\u001b[39mconvert(\n\u001b[1;32m 1539\u001b[0m graph_def, input_tensors, output_tensors\n\u001b[1;32m 1540\u001b[0m )\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/convert_phase.py:215\u001b[0m, in \u001b[0;36mconvert_phase.<locals>.actual_decorator.<locals>.wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 213\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m error:\n\u001b[1;32m 214\u001b[0m report_error_message(\u001b[39mstr\u001b[39m(error))\n\u001b[0;32m--> 215\u001b[0m \u001b[39mraise\u001b[39;00m error \u001b[39mfrom\u001b[39;00m \u001b[39mNone\u001b[39;00m\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/convert_phase.py:205\u001b[0m, in \u001b[0;36mconvert_phase.<locals>.actual_decorator.<locals>.wrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 202\u001b[0m \u001b[39m@functools\u001b[39m\u001b[39m.\u001b[39mwraps(func)\n\u001b[1;32m 203\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mwrapper\u001b[39m(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[1;32m 204\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 205\u001b[0m \u001b[39mreturn\u001b[39;00m func(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 206\u001b[0m \u001b[39mexcept\u001b[39;00m ConverterError \u001b[39mas\u001b[39;00m converter_error:\n\u001b[1;32m 207\u001b[0m \u001b[39mif\u001b[39;00m converter_error\u001b[39m.\u001b[39merrors:\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/lite.py:1478\u001b[0m, in \u001b[0;36mTFLiteKerasModelConverterV2._freeze_keras_model\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1476\u001b[0m \u001b[39m# TODO(b/169898786): Use the Keras public API when TFLite moves out of TF\u001b[39;00m\n\u001b[1;32m 1477\u001b[0m func \u001b[39m=\u001b[39m _trace_model_call(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_keras_model, input_signature)\n\u001b[0;32m-> 1478\u001b[0m concrete_func \u001b[39m=\u001b[39m func\u001b[39m.\u001b[39mget_concrete_function()\n\u001b[1;32m 1479\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_funcs \u001b[39m=\u001b[39m [concrete_func]\n\u001b[1;32m 1481\u001b[0m frozen_func, graph_def \u001b[39m=\u001b[39m (\n\u001b[1;32m 1482\u001b[0m _convert_to_constants\u001b[39m.\u001b[39mconvert_variables_to_constants_v2_as_graph(\n\u001b[1;32m 1483\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_funcs[\u001b[39m0\u001b[39m], lower_control_flow\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m\n\u001b[1;32m 1484\u001b[0m )\n\u001b[1;32m 1485\u001b[0m )\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1189\u001b[0m, in \u001b[0;36mFunction.get_concrete_function\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1187\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mget_concrete_function\u001b[39m(\u001b[39mself\u001b[39m, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs):\n\u001b[1;32m 1188\u001b[0m \u001b[39m# Implements GenericFunction.get_concrete_function.\u001b[39;00m\n\u001b[0;32m-> 1189\u001b[0m concrete \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_get_concrete_function_garbage_collected(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 1190\u001b[0m concrete\u001b[39m.\u001b[39m_garbage_collector\u001b[39m.\u001b[39mrelease() \u001b[39m# pylint: disable=protected-access\u001b[39;00m\n\u001b[1;32m 1191\u001b[0m \u001b[39mreturn\u001b[39;00m concrete\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:1169\u001b[0m, in \u001b[0;36mFunction._get_concrete_function_garbage_collected\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 1167\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_variable_creation_fn \u001b[39mis\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m 1168\u001b[0m initializers \u001b[39m=\u001b[39m []\n\u001b[0;32m-> 1169\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initialize(args, kwargs, add_initializers_to\u001b[39m=\u001b[39minitializers)\n\u001b[1;32m 1170\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initialize_uninitialized_variables(initializers)\n\u001b[1;32m 1172\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_created_variables:\n\u001b[1;32m 1173\u001b[0m \u001b[39m# In this case we have created variables on the first call, so we run the\u001b[39;00m\n\u001b[1;32m 1174\u001b[0m \u001b[39m# version which is guaranteed to never create variables.\u001b[39;00m\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:694\u001b[0m, in \u001b[0;36mFunction._initialize\u001b[0;34m(self, args, kwds, add_initializers_to)\u001b[0m\n\u001b[1;32m 691\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_variable_creation_fn\u001b[39m.\u001b[39m_name \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_name \u001b[39m# pylint: disable=protected-access\u001b[39;00m\n\u001b[1;32m 692\u001b[0m \u001b[39m# Force the definition of the function for these arguments\u001b[39;00m\n\u001b[1;32m 693\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_concrete_variable_creation_fn \u001b[39m=\u001b[39m (\n\u001b[0;32m--> 694\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_variable_creation_fn \u001b[39m# pylint: disable=protected-access\u001b[39;00m\n\u001b[1;32m 695\u001b[0m \u001b[39m.\u001b[39m_get_concrete_function_internal_garbage_collected(\n\u001b[1;32m 696\u001b[0m \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwds))\n\u001b[1;32m 698\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39minvalid_creator_scope\u001b[39m(\u001b[39m*\u001b[39munused_args, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39munused_kwds):\n\u001b[1;32m 699\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"Disables variable creation.\"\"\"\u001b[39;00m\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py:176\u001b[0m, in \u001b[0;36mTracingCompiler._get_concrete_function_internal_garbage_collected\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 174\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"Returns a concrete function which cleans up its graph function.\"\"\"\u001b[39;00m\n\u001b[1;32m 175\u001b[0m \u001b[39mwith\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_lock:\n\u001b[0;32m--> 176\u001b[0m concrete_function, _ \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maybe_define_concrete_function(args, kwargs)\n\u001b[1;32m 177\u001b[0m \u001b[39mreturn\u001b[39;00m concrete_function\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py:171\u001b[0m, in \u001b[0;36mTracingCompiler._maybe_define_concrete_function\u001b[0;34m(self, args, kwargs)\u001b[0m\n\u001b[1;32m 168\u001b[0m args \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39minput_signature\n\u001b[1;32m 169\u001b[0m kwargs \u001b[39m=\u001b[39m {}\n\u001b[0;32m--> 171\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maybe_define_function(args, kwargs)\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py:398\u001b[0m, in \u001b[0;36mTracingCompiler._maybe_define_function\u001b[0;34m(self, args, kwargs)\u001b[0m\n\u001b[1;32m 395\u001b[0m args \u001b[39m=\u001b[39m placeholder_bound_args\u001b[39m.\u001b[39margs\n\u001b[1;32m 396\u001b[0m kwargs \u001b[39m=\u001b[39m placeholder_bound_args\u001b[39m.\u001b[39mkwargs\n\u001b[0;32m--> 398\u001b[0m concrete_function \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_create_concrete_function(\n\u001b[1;32m 399\u001b[0m args, kwargs, func_graph)\n\u001b[1;32m 401\u001b[0m \u001b[39m# TODO(b/263520817): Remove access to private attribute.\u001b[39;00m\n\u001b[1;32m 402\u001b[0m graph_capture_container \u001b[39m=\u001b[39m concrete_function\u001b[39m.\u001b[39mgraph\u001b[39m.\u001b[39mfunction_captures\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py:305\u001b[0m, in \u001b[0;36mTracingCompiler._create_concrete_function\u001b[0;34m(self, args, kwargs, func_graph)\u001b[0m\n\u001b[1;32m 301\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 302\u001b[0m arg_names \u001b[39m=\u001b[39m base_arg_names\n\u001b[1;32m 304\u001b[0m concrete_function \u001b[39m=\u001b[39m monomorphic_function\u001b[39m.\u001b[39mConcreteFunction(\n\u001b[0;32m--> 305\u001b[0m func_graph_module\u001b[39m.\u001b[39mfunc_graph_from_py_func(\n\u001b[1;32m 306\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_name,\n\u001b[1;32m 307\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_python_function,\n\u001b[1;32m 308\u001b[0m args,\n\u001b[1;32m 309\u001b[0m kwargs,\n\u001b[1;32m 310\u001b[0m \u001b[39mNone\u001b[39;00m,\n\u001b[1;32m 311\u001b[0m func_graph\u001b[39m=\u001b[39mfunc_graph,\n\u001b[1;32m 312\u001b[0m arg_names\u001b[39m=\u001b[39marg_names,\n\u001b[1;32m 313\u001b[0m capture_by_value\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_capture_by_value,\n\u001b[1;32m 314\u001b[0m create_placeholders\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m),\n\u001b[1;32m 315\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_function_attributes,\n\u001b[1;32m 316\u001b[0m spec\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mfunction_spec,\n\u001b[1;32m 317\u001b[0m \u001b[39m# Tell the ConcreteFunction to clean up its graph once it goes out of\u001b[39;00m\n\u001b[1;32m 318\u001b[0m \u001b[39m# scope. 
This is not the default behavior since it gets used in some\u001b[39;00m\n\u001b[1;32m 319\u001b[0m \u001b[39m# places (like Keras) where the FuncGraph lives longer than the\u001b[39;00m\n\u001b[1;32m 320\u001b[0m \u001b[39m# ConcreteFunction.\u001b[39;00m\n\u001b[1;32m 321\u001b[0m shared_func_graph\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m)\n\u001b[1;32m 322\u001b[0m \u001b[39mreturn\u001b[39;00m concrete_function\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/framework/func_graph.py:1055\u001b[0m, in \u001b[0;36mfunc_graph_from_py_func\u001b[0;34m(name, python_func, args, kwargs, signature, func_graph, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, create_placeholders)\u001b[0m\n\u001b[1;32m 1052\u001b[0m \u001b[39mreturn\u001b[39;00m x\n\u001b[1;32m 1054\u001b[0m _, original_func \u001b[39m=\u001b[39m tf_decorator\u001b[39m.\u001b[39munwrap(python_func)\n\u001b[0;32m-> 1055\u001b[0m func_outputs \u001b[39m=\u001b[39m python_func(\u001b[39m*\u001b[39mfunc_args, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mfunc_kwargs)\n\u001b[1;32m 1057\u001b[0m \u001b[39m# invariant: `func_outputs` contains only Tensors, CompositeTensors,\u001b[39;00m\n\u001b[1;32m 1058\u001b[0m \u001b[39m# TensorArrays and `None`s.\u001b[39;00m\n\u001b[1;32m 1059\u001b[0m func_outputs \u001b[39m=\u001b[39m variable_utils\u001b[39m.\u001b[39mconvert_variables_to_tensors(func_outputs)\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:597\u001b[0m, in \u001b[0;36mFunction._compiler_with_scope.<locals>.wrapped_fn\u001b[0;34m(*args, **kwds)\u001b[0m\n\u001b[1;32m 593\u001b[0m \u001b[39mwith\u001b[39;00m default_graph\u001b[39m.\u001b[39m_variable_creator_scope(scope, priority\u001b[39m=\u001b[39m\u001b[39m50\u001b[39m): \u001b[39m# pylint: disable=protected-access\u001b[39;00m\n\u001b[1;32m 594\u001b[0m \u001b[39m# __wrapped__ allows AutoGraph to swap in a converted function. We give\u001b[39;00m\n\u001b[1;32m 595\u001b[0m \u001b[39m# the function a weak reference to itself to avoid a reference cycle.\u001b[39;00m\n\u001b[1;32m 596\u001b[0m \u001b[39mwith\u001b[39;00m OptionalXlaContext(compile_with_xla):\n\u001b[0;32m--> 597\u001b[0m out \u001b[39m=\u001b[39m weak_wrapped_fn()\u001b[39m.\u001b[39m__wrapped__(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwds)\n\u001b[1;32m 598\u001b[0m \u001b[39mreturn\u001b[39;00m out\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/lite/python/tflite_keras_util.py:190\u001b[0m, in \u001b[0;36mtrace_model_call.<locals>._wrapped_model\u001b[0;34m(*args)\u001b[0m\n\u001b[1;32m 186\u001b[0m inputs \u001b[39m=\u001b[39m args[\u001b[39m0\u001b[39m] \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(input_signature) \u001b[39m==\u001b[39m \u001b[39m1\u001b[39m \u001b[39melse\u001b[39;00m \u001b[39mlist\u001b[39m(args)\n\u001b[1;32m 188\u001b[0m \u001b[39mwith\u001b[39;00m keras_deps\u001b[39m.\u001b[39mget_call_context_function()()\u001b[39m.\u001b[39menter(\n\u001b[1;32m 189\u001b[0m model, inputs\u001b[39m=\u001b[39minputs, build_graph\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m, training\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m, saving\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m):\n\u001b[0;32m--> 190\u001b[0m outputs \u001b[39m=\u001b[39m model(inputs, training\u001b[39m=\u001b[39m\u001b[39mFalse\u001b[39;00m)\n\u001b[1;32m 192\u001b[0m \u001b[39mreturn\u001b[39;00m outputs\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:70\u001b[0m, in \u001b[0;36mfilter_traceback.<locals>.error_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 67\u001b[0m filtered_tb \u001b[39m=\u001b[39m _process_traceback_frames(e\u001b[39m.\u001b[39m__traceback__)\n\u001b[1;32m 68\u001b[0m \u001b[39m# To get the full stack trace, call:\u001b[39;00m\n\u001b[1;32m 69\u001b[0m \u001b[39m# `tf.debugging.disable_traceback_filtering()`\u001b[39;00m\n\u001b[0;32m---> 70\u001b[0m \u001b[39mraise\u001b[39;00m e\u001b[39m.\u001b[39mwith_traceback(filtered_tb) \u001b[39mfrom\u001b[39;00m \u001b[39mNone\u001b[39;00m\n\u001b[1;32m 71\u001b[0m \u001b[39mfinally\u001b[39;00m:\n\u001b[1;32m 72\u001b[0m \u001b[39mdel\u001b[39;00m filtered_tb\n", | |
"File \u001b[0;32m~/miniconda/envs/my-proj/lib/python3.11/site-packages/tensorflow/python/eager/polymorphic_function/autograph_util.py:52\u001b[0m, in \u001b[0;36mpy_func_from_autograph.<locals>.autograph_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 50\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m e: \u001b[39m# pylint:disable=broad-except\u001b[39;00m\n\u001b[1;32m 51\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mhasattr\u001b[39m(e, \u001b[39m\"\u001b[39m\u001b[39mag_error_metadata\u001b[39m\u001b[39m\"\u001b[39m):\n\u001b[0;32m---> 52\u001b[0m \u001b[39mraise\u001b[39;00m e\u001b[39m.\u001b[39mag_error_metadata\u001b[39m.\u001b[39mto_exception(e)\n\u001b[1;32m 53\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 54\u001b[0m \u001b[39mraise\u001b[39;00m\n", | |
"\u001b[0;31mOperatorNotAllowedInGraphError\u001b[0m: Exception encountered when calling layer 'image_captioning_model_14' (type ImageCaptioningModel).\n\nin user code:\n\n File \"/var/folders/2b/p4gxgkbj4qlg9wgcqphy88sw0000gq/T/ipykernel_94456/1975170582.py\", line 276, in call *\n vocab = vectorization.get_vocabulary()\n File \"/Users/aa849190/miniconda/envs/my-proj/lib/python3.11/site-packages/keras/src/layers/preprocessing/text_vectorization.py\", line 493, in get_vocabulary **\n return self._lookup_layer.get_vocabulary(include_special_tokens)\n File \"/Users/aa849190/miniconda/envs/my-proj/lib/python3.11/site-packages/keras/src/layers/preprocessing/index_lookup.py\", line 382, in get_vocabulary\n if self.lookup_table.size() == 0:\n\n OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.\n\n\nCall arguments received by layer 'image_captioning_model_14' (type ImageCaptioningModel):\n • image=tf.Tensor(shape=(299, 299, 3), dtype=float32)" | |
] | |
} | |
], | |
"source": [ | |
"# Build Keras model.\n", | |
"#input_shape=(*IMAGE_SIZE, 3)\n", | |
"#caption_model.build(input_shape)\n", | |
"\n", | |
"# Convert the keras model using TFLiteConverter.\n", | |
"# Keras model converter API uses the default signature automatically.\n", | |
"converter = tf.lite.TFLiteConverter.from_keras_model(caption_model)\n", | |
"\n", | |
"converter.target_spec.supported_ops = [\n", | |
" tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.\n", | |
" tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.\n", | |
"]\n", | |
"\n", | |
"tflite_model = converter.convert()\n", | |
"\n", | |
"# Print the signatures from the converted model\n", | |
"interpreter = tf.lite.Interpreter(model_content=tflite_model)\n", | |
"\n", | |
"signatures = interpreter.get_signature_list()\n", | |
"print('Signatures:', signatures)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 158, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Signature: {'serving_default': {'inputs': ['x'], 'outputs': ['encoded_result']}}\n", | |
"Input: [{'name': 'serving_default_x:0', 'index': 0, 'shape': array([1], dtype=int32), 'shape_signature': array([-1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]\n", | |
"Output: [{'name': 'PartitionedCall:0', 'index': 1, 'shape': array([1], dtype=int32), 'shape_signature': array([-1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]\n" | |
] | |
} | |
], | |
"source": [ | |
"# Convert the model\n", | |
"#converter = tf.lite.TFLiteConverter.from_saved_model(\"my_model2\") # path to the SavedModel directory\n", | |
"#converter = tf.lite.TFLiteConverter.from_keras_model(tflite_model)\n", | |
"#tflite_model = converter.convert()\n", | |
"\n", | |
"# Save the model.\n", | |
"with open('model2.tflite', 'wb') as f:\n", | |
" f.write(tflite_model)\n", | |
"\n", | |
"# Load the TFLite model and allocate tensors.\n", | |
"interpreter = tf.lite.Interpreter(model_path=\"model2.tflite\")\n", | |
"interpreter.allocate_tensors()\n", | |
"\n", | |
"# Get input and output tensors.\n", | |
"input_details = interpreter.get_input_details()\n", | |
"output_details = interpreter.get_output_details()\n", | |
"\n", | |
"print('Signature:', interpreter.get_signature_list())\n", | |
"print('Input:', input_details)\n", | |
"print('Output:',output_details)" | |
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Load and run the TFLite Model in Python\n", | |
"\n", | |
"The Python API for running an inference is provided in the tf.lite module. From which, you mostly need only tf.lite.Interpreter to load a model and run an inference.\n", | |
"\n", | |
"The following example shows how to use the Python interpreter to load a .tflite file and run inference with random input data:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 164, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Results: {'encoded_result': 2.0}\n" | |
] | |
} | |
], | |
"source": [ | |
"# Load the TFLite model and allocate tensors.\n", | |
"interpreter = tf.lite.Interpreter(model_path=\"model2.tflite\")\n", | |
"interpreter.allocate_tensors()\n", | |
"\n", | |
"# encode and decode are callable with input as arguments.\n", | |
"my_signature = interpreter.get_signature_runner('serving_default')\n", | |
"\n", | |
"# my_signature is callable with input as arguments.\n", | |
"input = tf.constant(1, dtype=tf.float32)\n", | |
"output = my_signature(x=input)\n", | |
"print('Results:', output)" | |
] | |
}, | |
{ | |
"attachments": {}, | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## End Notes\n", | |
"\n", | |
"```\n", | |
"We saw that the model starts to generate reasonable captions after a few epochs. To keep\n", | |
"this example easily runnable, we have trained it with a few constraints, like a minimal\n", | |
"number of attention heads. To improve the predictions, you can try changing these training\n", | |
"settings and find a good model for your use case.\n", | |
"```" | |
] | |
} | |
], | |
"metadata": { | |
"accelerator": "GPU", | |
"colab": { | |
"collapsed_sections": [], | |
"name": "image_captioning.ipynb", | |
"toc_visible": true | |
}, | |
"kernelspec": { | |
"display_name": "Python 3", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.11.3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |