Last active
September 22, 2022 01:56
Python workshop iterators
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Iterators" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Iterable** objects are Python objects that hold a collection of values and can be iterated over, i.e. you can traverse their values one by one. Many standard Python objects are iterable:\n",
"\n",
"- Sequences: lists, tuples, strings, etc.\n",
"- Dictionaries\n",
"- File objects\n",
"- Sets\n",
"- ...\n",
"\n",
"Behind every iterable there is an **iterator**.\n",
"\n",
"An **iterator** is the object that actually walks through the values of an **iterable**, one item at a time."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<list_iterator object at 0x7f371826ec50>\n" | |
] | |
} | |
], | |
"source": [ | |
"list_numbers = [1, 2, 3, 4, 5]\n", | |
"\n", | |
"print(list_numbers.__iter__())" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The iterator decouples the object that holds the data from the object that iterates over it. This separation relies on the **iteration protocol**.\n",
"\n",
"The iteration protocol consists of two methods:\n",
"\n",
"- `__iter__()` returns the iterator itself\n",
"- `__next__()` either:\n",
"    - returns the next item, or\n",
"    - raises a `StopIteration` exception if there are no further items\n",
"\n",
"<u> Remark </u>: an iterator is also an iterable object since it has an `__iter__()` method. Hence anything that consumes an iterable (a `for` loop, `list()`, `sum()`, etc.) can also take an iterator as input.\n",
"\n",
"<u> Example </u>:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"list_numbers = [1, 2, 3, 4, 5]" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To browse the list of numbers, we can create an iterator with `iter()` and ask it for items with `next()`:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"it = iter(list_numbers)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"next(it)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The iterator retains the position of the iteration!\n",
"\n",
"Once we reach the end of the object (signalled by `StopIteration`), we need to create a new iterator to browse the object a second time: each iterator is a **single-use object**.\n",
"\n",
"An iterator is also **ad hoc** and **unique**: every call to `__iter__()` on the list builds a new, independent iterator object."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"False" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"list_numbers.__iter__() == list_numbers.__iter__()" | |
] | |
}, | |
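{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of these properties, the cell below builds two iterators from the same list and checks that they advance independently, that an iterator is its own iterator, and that an exhausted iterator stays exhausted. The variable names (`first`, `second`, `remaining`, `after_end`) are only illustrative, and the second argument of `next()` is the default returned instead of raising `StopIteration`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list_numbers = [1, 2, 3, 4, 5]\n",
"\n",
"first = iter(list_numbers)\n",
"second = iter(list_numbers)\n",
"\n",
"# Advance the first iterator twice; the second one is unaffected.\n",
"next(first)\n",
"next(first)\n",
"first_value_of_second = next(second)\n",
"\n",
"# Calling iter() on an iterator returns the iterator itself.\n",
"same_object = iter(first) is first\n",
"\n",
"# Consume what is left of the first iterator, then ask for one more item.\n",
"remaining = list(first)\n",
"after_end = next(first, None)  # default returned instead of raising StopIteration\n",
"\n",
"print(first_value_of_second, same_object, remaining, after_end)"
]
},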
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naturally, you can use a **for** loop to do the same iteration on *list_numbers*.\n",
"\n",
"In practice, **for** loops are **iteration mechanisms** built on iterators: they call `__iter__()` once and then `__next__()` repeatedly until `StopIteration` is raised."
] | |
}, | |
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def for_loop(iterable, action_to_do):\n",
"    iterator = iter(iterable)\n",
"    done_looping = False\n",
"    returns = []\n",
"    while not done_looping:\n",
"        try:\n",
"            item = next(iterator)\n",
"            returns.append(action_to_do(item))\n",
"        except StopIteration:\n",
"            done_looping = True\n",
"    return returns"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[1, 4, 9, 16, 25]" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"list_numbers = [1, 2, 3, 4, 5]\n", | |
"for_loop(list_numbers, lambda x: x**2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##### <u> Question </u>: if standard iteration mechanisms (e.g. for loops) do it for you, why is it useful to know iterators?" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<u> Answer 1</u>: you may want to iterate over data that is not fully loaded in memory.\n",
"\n",
"<u> Example 1 </u>: when you **open** a file, you get an iterator that reads the file line by line."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"file = open('winequality-white.csv')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"True" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"(file == file.__iter__()) and (file.__iter__() == file.__iter__())" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **file** object is an iterator: it is its own iterator, so there is only one per open file. Reading lines on demand instead of loading the whole file into a list saves memory."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Size of file (bytes): 264426\n", | |
"Size of file iterator (bytes): 224\n" | |
] | |
} | |
], | |
"source": [ | |
"import os\n", | |
"import sys\n", | |
"\n", | |
"print('Size of file (bytes): {}'.format(os.path.getsize('winequality-white.csv')))\n", | |
"print('Size of file iterator (bytes): {}'.format(sys.getsizeof(file)))" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is called **laziness**:\n",
"\n",
"*Iterators allow us to both work with and create lazy iterables that don't do any work until we ask them for their next item.*\n",
"\n",
"Because they are lazy, iterators let us handle very long, even infinite, iterables. When the data does not fit in memory, an iterator can still hand us the next item each time we ask for it. Iterators can therefore save a lot of memory, and also CPU time when only part of the data is actually consumed."
] | |
}, | |
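{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of an infinitely long iterable concrete, here is a minimal sketch that assumes the standard-library helpers `itertools.count` and `itertools.islice`, which are used here only for illustration. The variable names are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import count, islice\n",
"\n",
"# count(0, 2) yields 0, 2, 4, ... forever; nothing is computed until we ask.\n",
"evens = count(0, 2)\n",
"\n",
"# islice takes a finite slice of the infinite stream without building it all.\n",
"first_five = list(islice(evens, 5))\n",
"\n",
"# The iterator remembers its position: the next value follows the slice.\n",
"next_even = next(evens)\n",
"\n",
"print(first_five, next_even)"
]
},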
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<u> Example 2 </u>: the **zip** function takes two or more iterable objects and returns an iterator over tuples of their items."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"list_numbers = [1, 2, 3, 4, 5, 6]\n", | |
"list_letters = ['a', 'b', 'c', 'd', 'e']\n", | |
"\n", | |
"z = zip(list_numbers, list_letters)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At each iteration *i*, a tuple (list_numbers[i], list_letters[i]) is created. The iteration stops as soon as the shortest input is exhausted, which is why the 6 in *list_numbers* never appears."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(1, 'a')\n", | |
"(2, 'b')\n", | |
"(3, 'c')\n", | |
"(4, 'd')\n", | |
"(5, 'e')\n" | |
] | |
} | |
],
"source": [
"for i in z:\n",
"    print(i)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"for i in z:\n", | |
" print(i)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"True" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"(z == z.__iter__()) and (z.__iter__() == z.__iter__())" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Size of lists (bytes): 232\n", | |
"Size of zip (bytes): 72\n" | |
] | |
} | |
], | |
"source": [ | |
"print('Size of lists (bytes): {}'.format(sys.getsizeof(list_numbers) + sys.getsizeof(list_letters)))\n", | |
"print('Size of zip (bytes): {}'.format(sys.getsizeof(z)))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Single-use:" | |
] | |
}, | |
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"for i in z:\n",
"    print(i)"
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The iterator is exhausted: it has already been consumed.\n",
"\n",
"<u> Remarks </u>\n",
"- Iterators are cheap and simple to create, in part because they are single-use.\n",
"- Many standard Python built-ins return iterators: map(), enumerate(), reversed(), zip(), etc. range() is a close cousin: it returns a lazy iterable rather than an iterator, so it can be looped over several times."
] | |
}, | |
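{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to tell an iterator from a plain iterable is to check whether `iter(obj)` returns `obj` itself, as was done for `file` and `z` above. The sketch below applies that check to a few built-ins; `is_iterator` is a hypothetical helper written for this illustration, not a standard function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def is_iterator(obj):\n",
"    # An iterator is its own iterator; a plain iterable builds a new one.\n",
"    return iter(obj) is obj\n",
"\n",
"checks = {\n",
"    'list': is_iterator([1, 2, 3]),\n",
"    'range': is_iterator(range(3)),\n",
"    'zip': is_iterator(zip([1, 2], 'ab')),\n",
"    'map': is_iterator(map(abs, [-1, 2])),\n",
"    'enumerate': is_iterator(enumerate('ab')),\n",
"    'reversed': is_iterator(reversed([1, 2])),\n",
"}\n",
"\n",
"for name, result in checks.items():\n",
"    print(name, result)"
]
},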
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<u> Answer 2</u>: you can create your own objects with their own iteration mechanism."
] | |
}, | |
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"class increment_numbers:\n",
"    def __init__(self, start_value, stop_value):\n",
"        self.current = start_value\n",
"        self.high = stop_value\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current > self.high:\n",
"            raise StopIteration\n",
"        else:\n",
"            self.current += 1\n",
"            return self.current - 1\n",
"\n",
"numbers = increment_numbers(1, 5)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class '__main__.increment_numbers'>\n" | |
] | |
} | |
], | |
"source": [ | |
"print(type(numbers))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1\n", | |
"2\n", | |
"3\n", | |
"4\n", | |
"5\n" | |
] | |
}, | |
{ | |
"ename": "StopIteration", | |
"evalue": "", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-19-28cf64d6cd9c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;32m<ipython-input-17-fc87b7a9807f>\u001b[0m in \u001b[0;36m__next__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__next__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcurrent\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhigh\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 12\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcurrent\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
"\u001b[0;31mStopIteration\u001b[0m: " | |
] | |
} | |
], | |
"source": [ | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))" | |
] | |
}, | |
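{
"cell_type": "markdown",
"metadata": {},
"source": [
"The class above is its own iterator, so an instance can only be traversed once. A common alternative, sketched below under the assumption that we want an object that can be looped over several times, is to keep the bounds in one class and build a fresh iterator in `__iter__()`. The names `NumberRange`, `NumberRangeIterator` and `reusable_numbers` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class NumberRangeIterator:\n",
"    # Single-use iterator: it only stores the current position and the bound.\n",
"    def __init__(self, start_value, stop_value):\n",
"        self.current = start_value\n",
"        self.high = stop_value\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current > self.high:\n",
"            raise StopIteration\n",
"        value = self.current\n",
"        self.current += 1\n",
"        return value\n",
"\n",
"\n",
"class NumberRange:\n",
"    # Reusable iterable: every call to __iter__ builds a new iterator.\n",
"    def __init__(self, start_value, stop_value):\n",
"        self.start_value = start_value\n",
"        self.stop_value = stop_value\n",
"\n",
"    def __iter__(self):\n",
"        return NumberRangeIterator(self.start_value, self.stop_value)\n",
"\n",
"\n",
"reusable_numbers = NumberRange(1, 5)\n",
"first_pass = list(reusable_numbers)\n",
"second_pass = list(reusable_numbers)\n",
"print(first_pass, second_pass)"
]
},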
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Iterators and generators" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far I presented the concept of **iterator**: how to use it, and how it underlies many iteration mechanisms and functions in Python.\n",
"\n",
"Here I will present another interesting application of iterators: **generator functions** and **generator expressions**."
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generator functions\n",
"\n",
"<u> Definition from Python docs</u>: a function which returns a **generator iterator**. It looks like a normal function except that it contains **yield** expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.\n",
"\n",
"#### Generator iterator\n",
"\n",
"<u> Definition from Python docs</u>: an object created by a generator function."
] | |
}, | |
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"def increment_numbers(start_value, stop_value):\n",
"    while start_value <= stop_value:\n",
"        yield start_value\n",
"        start_value += 1\n",
"    return 42\n",
"\n",
"numbers = increment_numbers(1, 5)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"<class 'generator'>\n" | |
] | |
} | |
], | |
"source": [ | |
"print(type(numbers))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1\n", | |
"2\n", | |
"3\n", | |
"4\n", | |
"5\n" | |
] | |
}, | |
{ | |
"ename": "StopIteration", | |
"evalue": "42", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mStopIteration\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-22-28cf64d6cd9c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnumbers\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;31mStopIteration\u001b[0m: 42" | |
] | |
} | |
], | |
"source": [ | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))\n", | |
"print(next(numbers))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[1, 2, 3, 4, 5]" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"numbers = increment_numbers(1, 5)\n", | |
"list(numbers)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The yield expression is what separates a generator function from a normal function. It is what lets us benefit from the iterator's laziness."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<u> From Python docs</u>:\n", | |
" \n", | |
"*Each* **yield** *temporarily suspends processing, remembering the location execution state, and returns the value. When the generator iterator resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation).*" | |
] | |
}, | |
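{
"cell_type": "markdown",
"metadata": {},
"source": [
"The suspension described in this definition can be observed directly. The sketch below uses a hypothetical generator function `countdown` that records each step in a list called `trace`. It also shows where the value of a `return` statement ends up: in the `value` attribute of the `StopIteration` exception."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"trace = []\n",
"\n",
"def countdown(n):\n",
"    trace.append('started')\n",
"    while n > 0:\n",
"        yield n\n",
"        trace.append('resumed after {}'.format(n))\n",
"        n -= 1\n",
"    return 'done'\n",
"\n",
"gen = countdown(2)\n",
"trace_at_creation = list(trace)  # calling the function runs none of its body\n",
"\n",
"first = next(gen)   # runs the body up to the first yield\n",
"second = next(gen)  # resumes right after the first yield\n",
"\n",
"try:\n",
"    next(gen)\n",
"except StopIteration as stop:\n",
"    returned = stop.value  # value of the return statement\n",
"\n",
"print(trace_at_creation, first, second, returned, trace)"
]
},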
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Generator expressions" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generator expressions are very similar to **list comprehensions**."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<u>Reminder on list comprehension</u>: [**output** for **iteration** in **iterable** if *(optional condition)*]\n", | |
"\n", | |
"<u>Example</u>:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[9, 16, 25]\n" | |
] | |
} | |
], | |
"source": [ | |
"numbers = [1, 2, 3, 4, 5]\n", | |
"squares = [number**2 for number in numbers if number > 2]\n", | |
"print(squares)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comprehensions also exist for sets and dictionaries.\n",
"\n",
"<u>Example</u>: set comprehension"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['ana', 'eve', 'ALICE', 'Anne', 'bob', 'alice', 'AlIcE', 'Alice']\n" | |
] | |
} | |
], | |
"source": [ | |
"prenoms = ['ana', 'eve', 'ALICE', 'Anne', 'bob', 'alice', 'AlIcE', 'Alice'] \n", | |
"print(prenoms)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['ana', 'alice', 'anne', 'alice', 'alice', 'alice']\n" | |
] | |
} | |
], | |
"source": [ | |
"a_prenoms = [p.lower() for p in prenoms if p.lower().startswith('a')] \n", | |
"print(a_prenoms)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'ana', 'alice', 'anne'}\n" | |
] | |
} | |
], | |
"source": [ | |
"print(set(a_prenoms))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'ana', 'alice', 'anne'}\n" | |
] | |
} | |
], | |
"source": [ | |
"a_prenoms = {p.lower() for p in prenoms if p.lower().startswith('a')} \n", | |
"print(a_prenoms)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"<u>Example</u>: dictionary comprehension" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'ana': 20, 'EVE': 30, 'bob': 40}\n" | |
] | |
} | |
], | |
"source": [ | |
"ages = [('ana', 20), ('EVE', 30), ('bob', 40)] \n", | |
"ages = dict(ages) \n", | |
"print(ages)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'ana': 20, 'eve': 30, 'bob': 40}\n" | |
] | |
} | |
], | |
"source": [ | |
"ages_fix = {p.lower():a for p, a in ages.items()} \n", | |
"print(ages_fix)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{'ana': 20, 'eve': 30}\n" | |
] | |
} | |
], | |
"source": [ | |
"ages_fix = {p.lower():a for p, a in ages.items() if a < 40} \n", | |
"print(ages_fix)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### From comprehension to generation\n",
"\n",
"A drawback of a comprehension is that it builds the whole structure in memory, even when the result is only needed once (for example to compute a sum). A generator expression produces the items one at a time instead.\n",
"\n",
"<u>Generator expression</u>: (**output** for **iteration** in **iterable** if *(optional condition)*)\n"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"scrolled": true | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 522 µs, sys: 0 ns, total: 522 µs\n", | |
"Wall time: 551 µs\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time \n", | |
"square = [x**2 for x in range(1000)]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 14 µs, sys: 4 µs, total: 18 µs\n", | |
"Wall time: 22.4 µs\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"332833500" | |
] | |
}, | |
"execution_count": 33, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"sum(square)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 14 µs, sys: 3 µs, total: 17 µs\n", | |
"Wall time: 21.9 µs\n" | |
] | |
} | |
], | |
"source": [ | |
"%%time \n", | |
"square = (x**2 for x in range(1000))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"<generator object <genexpr> at 0x7f37181bfa50>" | |
] | |
}, | |
"execution_count": 35, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"square" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CPU times: user 338 µs, sys: 0 ns, total: 338 µs\n", | |
"Wall time: 343 µs\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"332833500" | |
] | |
}, | |
"execution_count": 36, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"%%time\n", | |
"sum(square)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling sum() on the generator expression (which is an iterator) computes each square on the fly while iterating over range(1000), so no intermediate list is built.\n",
"\n",
"<u>Remark</u>: we can do the same with the Python built-in function *map()*:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 37, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"332833500" | |
] | |
}, | |
"execution_count": 37, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"square = map(lambda x: x**2, range(1000))\n", | |
"sum(square)" | |
] | |
}, | |
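{
"cell_type": "markdown",
"metadata": {},
"source": [
"The memory side of this comparison can be sketched with `sys.getsizeof`, as was done earlier for the file and zip objects. Keep in mind that `getsizeof` only measures the container itself, not the integers it refers to, and that the exact numbers depend on the Python version. The variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"squares_list = [x**2 for x in range(1000)]\n",
"squares_gen = (x**2 for x in range(1000))\n",
"\n",
"list_size = sys.getsizeof(squares_list)\n",
"gen_size = sys.getsizeof(squares_gen)\n",
"print('Size of list (bytes): {}'.format(list_size))\n",
"print('Size of generator (bytes): {}'.format(gen_size))\n",
"\n",
"# Both give the same total, but the generator can only be summed once.\n",
"total_from_list = sum(squares_list)\n",
"total_from_gen = sum(squares_gen)\n",
"total_second_pass = sum(squares_gen)  # the generator is now exhausted"
]
},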
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using iterators to avoid temporary data structures is common practice in Python, in particular when the data is large.\n",
"\n",
"Since generator expressions have the same limitations as comprehensions (only one output expression can be defined), we can:"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"- chain the generator expressions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 38, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[0,\n", | |
" 1,\n", | |
" 4,\n", | |
" 9,\n", | |
" 121,\n", | |
" 484,\n", | |
" 676,\n", | |
" 10201,\n", | |
" 12321,\n", | |
" 14641,\n", | |
" 40804,\n", | |
" 44944,\n", | |
" 69696,\n", | |
" 94249,\n", | |
" 698896]" | |
] | |
}, | |
"execution_count": 38, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"square = map(lambda x: x**2, range(1000))\n", | |
"palindrome = (x for x in square if str(x) == str(x)[::-1])\n", | |
"\n", | |
"list(palindrome)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- use generator functions"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 39, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"square = map(lambda x: x**2, range(1000))\n", | |
"condition = lambda x: isinstance(x, (str, int)) and str(x) == str(x)[::-1]" | |
] | |
}, | |
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"def palindrome(iterator, condition):\n",
"    for i in iterator:\n",
"        if condition(i):\n",
"            yield i"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"p = palindrome(square, condition)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[0,\n", | |
" 1,\n", | |
" 4,\n", | |
" 9,\n", | |
" 121,\n", | |
" 484,\n", | |
" 676,\n", | |
" 10201,\n", | |
" 12321,\n", | |
" 14641,\n", | |
" 40804,\n", | |
" 44944,\n", | |
" 69696,\n", | |
" 94249,\n", | |
" 698896]" | |
] | |
}, | |
"execution_count": 42, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"list(p)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"condition = lambda x: isinstance(x, (str, int)) and str(x) == str(x)[::-1] and x%2==0" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### What will be the output of the next cell?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"p = palindrome(square, condition)\n", | |
"list(p)" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Takeaways / Summary\n",
"\n",
"- An iterable is something you can loop over.\n",
"- Sequences are a very common type of iterable.\n",
"- Many things in Python are iterables, but not all of them are sequences.\n",
"- An iterator is an object representing a stream of data. It does the iterating over an iterable. You can use an iterator to get the next value or to loop over it. Once you have looped over an iterator, it is exhausted: no values are left in the stream.\n",
"- Iterators use the lazy evaluation approach.\n", | |
"- Many built-in classes in Python are iterators.\n", | |
"- A generator function is a function which returns an iterator.\n", | |
"- A generator expression is an expression that returns an iterator." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |