Last active
October 21, 2024 22:03
-
-
Save dyerrington/14f262fd54eaaeb2dce82c081e461c31 to your computer and use it in GitHub Desktop.
Efficient List Creation in Python: Comparing .append(), +=, List Comprehension, and Duplication Description: This notebook explores different approaches to creating and modifying lists in Python, comparing the efficiency of methods like .append(), +=, list comprehension, and list duplication ([0] * n). Includes insights on performance and trade-…
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "e63411f4-2d96-4764-b5f7-b2f434818374", | |
"metadata": {}, | |
"source": [ | |
"# Basic Python Efficiency\n", | |
"> [David Yerrington](http://www.yerrington.net)\n", | |
"\n", | |
"I wrote this notebook because I've noticed many good posts that explore surface-level concepts about different approaches and their tradeoffs. For example, [this post on Stack Exchange](https://stackoverflow.com/questions/20816600/best-and-or-fastest-way-to-create-lists-in-python) dives into how long various approaches to iteration take.\n", | |
"\n", | |
"It's a helpful way to get a sense of how different approaches to sequential data operations compare, such as:\n", | |
"\n", | |
"- Appending to a list using the built-in `.append()` method.\n", | |
"- Using the additive operator `+=` to append.\n", | |
"- List comprehension: `[0 for i in range(50)]`.\n", | |
"- Using list duplication: `[0] * 50`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "ed195e01-599e-445d-86b2-041425e5b1b0", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"The line_profiler extension is already loaded. To reload it, use:\n", | |
" %reload_ext line_profiler\n" | |
] | |
} | |
], | |
"source": [ | |
"%load_ext line_profiler\n", | |
"\n", | |
"import dis" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "aec93598-5ed9-4b5e-a4e7-451bf5117590", | |
"metadata": {}, | |
"source": [ | |
"## Test 1\n", | |
"- Iterate, then append a list" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "7d0ef580-2c9b-4e8e-a983-86d84e42e30d", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Timer unit: 1e-09 s\n", | |
"\n", | |
"Total time: 4.4e-05 s\n", | |
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/2577588337.py\n", | |
"Function: test1 at line 1\n", | |
"\n", | |
"Line # Hits Time Per Hit % Time Line Contents\n", | |
"==============================================================\n", | |
" 1 def test1():\n", | |
" 2 1 3000.0 3000.0 6.8 my_list = []\n", | |
" 3 51 19000.0 372.5 43.2 for i in range(50):\n", | |
" 4 50 22000.0 440.0 50.0 my_list.append(0)" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"def test1():\n", | |
" my_list = []\n", | |
" for i in range(50):\n", | |
" my_list.append(0)\n", | |
"\n", | |
"%lprun -f test1 test1()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1892db68-f3c5-41be-9894-ded31b9408a8", | |
"metadata": {}, | |
"source": [ | |
"### `dis` Python's Disassembler\n", | |
"\n", | |
"If we wrap any code in a function, we can use `dis` to see exactly which operations are actually being run at the lowest level within Python's byte-code level. exactly which bytecode instructions Python uses to execute that function, giving you insight into the internal operations happening under the hood.\n", | |
"\n", | |
"This is useful for a few reasons:\n", | |
"\n", | |
"- We can see how many operations each example takes\n", | |
"- It's possible to know the tradeoffs of different approaches in terms of efficiency\n", | |
"- You can tell people how Python works at the lowest level in various social settings!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "afdd033a-9a73-4686-87a2-126bfcc7db8d", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" 2 0 BUILD_LIST 0\n", | |
" 2 STORE_FAST 0 (my_list)\n", | |
"\n", | |
" 3 4 LOAD_GLOBAL 0 (range)\n", | |
" 6 LOAD_CONST 1 (50)\n", | |
" 8 CALL_FUNCTION 1\n", | |
" 10 GET_ITER\n", | |
" >> 12 FOR_ITER 14 (to 28)\n", | |
" 14 STORE_FAST 1 (i)\n", | |
"\n", | |
" 4 16 LOAD_FAST 0 (my_list)\n", | |
" 18 LOAD_METHOD 1 (append)\n", | |
" 20 LOAD_CONST 2 (0)\n", | |
" 22 CALL_METHOD 1\n", | |
" 24 POP_TOP\n", | |
" 26 JUMP_ABSOLUTE 12\n", | |
" >> 28 LOAD_CONST 0 (None)\n", | |
" 30 RETURN_VALUE\n" | |
] | |
} | |
], | |
"source": [ | |
"dis.dis(test1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5d1cf84e-3021-4371-8b50-0138dd3e496d", | |
"metadata": {}, | |
"source": [ | |
"- BUILD_LIST: Creates an empty list (my_list)\n", | |
"- STORE_FAST: Saves the list into the variable my_list\n", | |
"- LOAD_GLOBAL and CALL_FUNCTION: Loads the range function and calls it with the argument 50\n", | |
"- FOR_ITER: Starts looping through the values from range(50)\n", | |
"- LOAD_FAST and LOAD_METHOD: Loads the list (my_list) and prepares to call its append method\n", | |
"- LOAD_CONST and CALL_METHOD: Loads the value 0 and appends it to the list\n", | |
"- JUMP_ABSOLUTE: Loops back to the start of the next iteration\n", | |
"- RETURN_VALUE: After the loop ends, the function returns None (which is the default in Python when no return value is specified).\n", | |
"\n", | |
"> ** But that do those numbers mean!? **\n", | |
"> \n", | |
"> `>> 12 FOR_ITER 14 (to 28)`\n", | |
"> \n", | |
"> - **12:** This is the instruction’s address (or offset) in the bytecode. The instruction at address 12 is FOR_ITER.\n", | |
"> - **FOR_ITER:** This is the bytecode instruction that handles iterating over the values generated by range(50).\n", | |
"> - **14 (to 28)**:\n", | |
"> - The 14 is the “jump distance,” meaning if the iteration is complete (i.e., the loop has finished), Python will jump 14 instructions forward.\n", | |
"> - “to 28” indicates that after completing the loop, the program will jump to bytecode address 28, which corresponds to the end of the function (where the RETURN_VALUE instruction is located)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4816e85d-293b-4baf-9afe-723111bf443c", | |
"metadata": {}, | |
"source": [ | |
"## Test 2\n", | |
"- Iteration with append operation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "4dff3fe9-cf7e-43e3-a3e4-6642cdfeb325", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Timer unit: 1e-09 s\n", | |
"\n", | |
"Total time: 6.2e-05 s\n", | |
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/722596294.py\n", | |
"Function: test2 at line 1\n", | |
"\n", | |
"Line # Hits Time Per Hit % Time Line Contents\n", | |
"==============================================================\n", | |
" 1 def test2():\n", | |
" 2 1 3000.0 3000.0 4.8 my_list = []\n", | |
" 3 51 26000.0 509.8 41.9 for i in range(50):\n", | |
" 4 50 33000.0 660.0 53.2 my_list += [0]" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"def test2():\n", | |
" my_list = []\n", | |
" for i in range(50):\n", | |
" my_list += [0]\n", | |
"\n", | |
"%lprun -f test2 test2()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "eafa4f59-ecf2-4425-9abf-5126c76998be", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" 2 0 BUILD_LIST 0\n", | |
" 2 STORE_FAST 0 (my_list)\n", | |
"\n", | |
" 3 4 LOAD_GLOBAL 0 (range)\n", | |
" 6 LOAD_CONST 1 (50)\n", | |
" 8 CALL_FUNCTION 1\n", | |
" 10 GET_ITER\n", | |
" >> 12 FOR_ITER 14 (to 28)\n", | |
" 14 STORE_FAST 1 (i)\n", | |
"\n", | |
" 4 16 LOAD_FAST 0 (my_list)\n", | |
" 18 LOAD_CONST 2 (0)\n", | |
" 20 BUILD_LIST 1\n", | |
" 22 INPLACE_ADD\n", | |
" 24 STORE_FAST 0 (my_list)\n", | |
" 26 JUMP_ABSOLUTE 12\n", | |
" >> 28 LOAD_CONST 0 (None)\n", | |
" 30 RETURN_VALUE\n" | |
] | |
} | |
], | |
"source": [ | |
"dis.dis(test2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "24278e4e-bedb-4ccd-98be-134fb1be76c8", | |
"metadata": {}, | |
"source": [ | |
"## Test 3\n", | |
"- Comprehension" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "7b2f2826-00ba-4025-b7a6-a29681197d1e", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Timer unit: 1e-09 s\n", | |
"\n", | |
"Total time: 0.000999 s\n", | |
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\n", | |
"Function: test3 at line 1\n", | |
"\n", | |
"Line # Hits Time Per Hit % Time Line Contents\n", | |
"==============================================================\n", | |
" 1 def test3():\n", | |
" 2 1 999000.0 999000.0 100.0 my_list = [0 for i in range(50)]" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"def test3():\n", | |
" my_list = [0 for i in range(50)]\n", | |
" \n", | |
"%lprun -f test3 test3()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "f14bc707-9ce9-4b50-a3f5-a4a2f929061b", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" 2 0 LOAD_CONST 1 (<code object <listcomp> at 0x1e9ebc9d0, file \"/var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\", line 2>)\n", | |
" 2 LOAD_CONST 2 ('test3.<locals>.<listcomp>')\n", | |
" 4 MAKE_FUNCTION 0\n", | |
" 6 LOAD_GLOBAL 0 (range)\n", | |
" 8 LOAD_CONST 3 (50)\n", | |
" 10 CALL_FUNCTION 1\n", | |
" 12 GET_ITER\n", | |
" 14 CALL_FUNCTION 1\n", | |
" 16 STORE_FAST 0 (my_list)\n", | |
" 18 LOAD_CONST 0 (None)\n", | |
" 20 RETURN_VALUE\n", | |
"\n", | |
"Disassembly of <code object <listcomp> at 0x1e9ebc9d0, file \"/var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\", line 2>:\n", | |
" 2 0 BUILD_LIST 0\n", | |
" 2 LOAD_FAST 0 (.0)\n", | |
" >> 4 FOR_ITER 8 (to 14)\n", | |
" 6 STORE_FAST 1 (i)\n", | |
" 8 LOAD_CONST 0 (0)\n", | |
" 10 LIST_APPEND 2\n", | |
" 12 JUMP_ABSOLUTE 4\n", | |
" >> 14 RETURN_VALUE\n" | |
] | |
} | |
], | |
"source": [ | |
"dis.dis(test3)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "d6943ddd-8e4f-43b9-ace9-186b15884852", | |
"metadata": {}, | |
"source": [ | |
"## Test 4\n", | |
"- Duplication by scale" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "92937281-68ac-4172-bfd0-d7c218cab7e1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Timer unit: 1e-09 s\n", | |
"\n", | |
"Total time: 0.000343 s\n", | |
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/3846408923.py\n", | |
"Function: test4 at line 1\n", | |
"\n", | |
"Line # Hits Time Per Hit % Time Line Contents\n", | |
"==============================================================\n", | |
" 1 def test4():\n", | |
" 2 1 343000.0 343000.0 100.0 my_list = [0] * 50" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"def test4():\n", | |
" my_list = [0] * 50\n", | |
" \n", | |
"%lprun -f test4 test4()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "b9fea7a6-aa63-48b5-a34b-1d535396888b", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" 2 0 LOAD_CONST 1 (0)\n", | |
" 2 BUILD_LIST 1\n", | |
" 4 LOAD_CONST 2 (50)\n", | |
" 6 BINARY_MULTIPLY\n", | |
" 8 STORE_FAST 0 (my_list)\n", | |
" 10 LOAD_CONST 0 (None)\n", | |
" 12 RETURN_VALUE\n" | |
] | |
} | |
], | |
"source": [ | |
"dis.dis(test4)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "97a35a0a-3aba-41f0-9012-9330721977d0", | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "TensorFlow M1 Env", | |
"language": "python", | |
"name": "tf_m1_env" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.9.20" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment