Skip to content

Instantly share code, notes, and snippets.

@dyerrington
Last active October 21, 2024 22:03
Show Gist options
  • Save dyerrington/14f262fd54eaaeb2dce82c081e461c31 to your computer and use it in GitHub Desktop.
Save dyerrington/14f262fd54eaaeb2dce82c081e461c31 to your computer and use it in GitHub Desktop.
Efficient List Creation in Python: Comparing .append(), +=, List Comprehension, and Duplication Description: This notebook explores different approaches to creating and modifying lists in Python, comparing the efficiency of methods like .append(), +=, list comprehension, and list duplication ([0] * n). Includes insights on performance and trade-…
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "e63411f4-2d96-4764-b5f7-b2f434818374",
"metadata": {},
"source": [
"# Basic Python Efficiency\n",
"> [David Yerrington](http://www.yerrington.net)\n",
"\n",
"I wrote this notebook because I've noticed many good posts that explore surface-level concepts about different approaches and their tradeoffs. For example, [this post on Stack Exchange](https://stackoverflow.com/questions/20816600/best-and-or-fastest-way-to-create-lists-in-python) dives into how long various approaches to iteration take.\n",
"\n",
"It's a helpful way to get a sense of how different approaches to sequential data operations compare, such as:\n",
"\n",
"- Appending to a list using the built-in `.append()` method.\n",
"- Using the additive operator `+=` to append.\n",
"- List comprehension: `[0 for i in range(50)]`.\n",
"- Using list duplication: `[0] * 50`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ed195e01-599e-445d-86b2-041425e5b1b0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The line_profiler extension is already loaded. To reload it, use:\n",
" %reload_ext line_profiler\n"
]
}
],
"source": [
"%load_ext line_profiler\n",
"\n",
"import dis"
]
},
{
"cell_type": "markdown",
"id": "aec93598-5ed9-4b5e-a4e7-451bf5117590",
"metadata": {},
"source": [
"## Test 1\n",
"- Iterate, then append a list"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7d0ef580-2c9b-4e8e-a983-86d84e42e30d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timer unit: 1e-09 s\n",
"\n",
"Total time: 4.4e-05 s\n",
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/2577588337.py\n",
"Function: test1 at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def test1():\n",
" 2 1 3000.0 3000.0 6.8 my_list = []\n",
" 3 51 19000.0 372.5 43.2 for i in range(50):\n",
" 4 50 22000.0 440.0 50.0 my_list.append(0)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def test1():\n",
" my_list = []\n",
" for i in range(50):\n",
" my_list.append(0)\n",
"\n",
"%lprun -f test1 test1()"
]
},
{
"cell_type": "markdown",
"id": "1892db68-f3c5-41be-9894-ded31b9408a8",
"metadata": {},
"source": [
"### `dis` Python's Disassembler\n",
"\n",
"If we wrap any code in a function, we can use `dis` to see exactly which operations are actually being run at the lowest level within Python's byte-code level. exactly which bytecode instructions Python uses to execute that function, giving you insight into the internal operations happening under the hood.\n",
"\n",
"This is useful for a few reasons:\n",
"\n",
"- We can see how many operations each example takes\n",
"- It's possible to know the tradeoffs of different approaches in terms of efficiency\n",
"- You can tell people how Python works at the lowest level in various social settings!"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "afdd033a-9a73-4686-87a2-126bfcc7db8d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 0 BUILD_LIST 0\n",
" 2 STORE_FAST 0 (my_list)\n",
"\n",
" 3 4 LOAD_GLOBAL 0 (range)\n",
" 6 LOAD_CONST 1 (50)\n",
" 8 CALL_FUNCTION 1\n",
" 10 GET_ITER\n",
" >> 12 FOR_ITER 14 (to 28)\n",
" 14 STORE_FAST 1 (i)\n",
"\n",
" 4 16 LOAD_FAST 0 (my_list)\n",
" 18 LOAD_METHOD 1 (append)\n",
" 20 LOAD_CONST 2 (0)\n",
" 22 CALL_METHOD 1\n",
" 24 POP_TOP\n",
" 26 JUMP_ABSOLUTE 12\n",
" >> 28 LOAD_CONST 0 (None)\n",
" 30 RETURN_VALUE\n"
]
}
],
"source": [
"dis.dis(test1)"
]
},
{
"cell_type": "markdown",
"id": "5d1cf84e-3021-4371-8b50-0138dd3e496d",
"metadata": {},
"source": [
"- BUILD_LIST: Creates an empty list (my_list)\n",
"- STORE_FAST: Saves the list into the variable my_list\n",
"- LOAD_GLOBAL and CALL_FUNCTION: Loads the range function and calls it with the argument 50\n",
"- FOR_ITER: Starts looping through the values from range(50)\n",
"- LOAD_FAST and LOAD_METHOD: Loads the list (my_list) and prepares to call its append method\n",
"- LOAD_CONST and CALL_METHOD: Loads the value 0 and appends it to the list\n",
"- JUMP_ABSOLUTE: Loops back to the start of the next iteration\n",
"- RETURN_VALUE: After the loop ends, the function returns None (which is the default in Python when no return value is specified).\n",
"\n",
"> ** But that do those numbers mean!? **\n",
"> \n",
"> `>> 12 FOR_ITER 14 (to 28)`\n",
"> \n",
"> - **12:** This is the instruction’s address (or offset) in the bytecode. The instruction at address 12 is FOR_ITER.\n",
"> - **FOR_ITER:** This is the bytecode instruction that handles iterating over the values generated by range(50).\n",
"> - **14 (to 28)**:\n",
"> - The 14 is the “jump distance,” meaning if the iteration is complete (i.e., the loop has finished), Python will jump 14 instructions forward.\n",
"> - “to 28” indicates that after completing the loop, the program will jump to bytecode address 28, which corresponds to the end of the function (where the RETURN_VALUE instruction is located)."
]
},
{
"cell_type": "markdown",
"id": "4816e85d-293b-4baf-9afe-723111bf443c",
"metadata": {},
"source": [
"## Test 2\n",
"- Iteration with append operation"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4dff3fe9-cf7e-43e3-a3e4-6642cdfeb325",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timer unit: 1e-09 s\n",
"\n",
"Total time: 6.2e-05 s\n",
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/722596294.py\n",
"Function: test2 at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def test2():\n",
" 2 1 3000.0 3000.0 4.8 my_list = []\n",
" 3 51 26000.0 509.8 41.9 for i in range(50):\n",
" 4 50 33000.0 660.0 53.2 my_list += [0]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def test2():\n",
" my_list = []\n",
" for i in range(50):\n",
" my_list += [0]\n",
"\n",
"%lprun -f test2 test2()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "eafa4f59-ecf2-4425-9abf-5126c76998be",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 0 BUILD_LIST 0\n",
" 2 STORE_FAST 0 (my_list)\n",
"\n",
" 3 4 LOAD_GLOBAL 0 (range)\n",
" 6 LOAD_CONST 1 (50)\n",
" 8 CALL_FUNCTION 1\n",
" 10 GET_ITER\n",
" >> 12 FOR_ITER 14 (to 28)\n",
" 14 STORE_FAST 1 (i)\n",
"\n",
" 4 16 LOAD_FAST 0 (my_list)\n",
" 18 LOAD_CONST 2 (0)\n",
" 20 BUILD_LIST 1\n",
" 22 INPLACE_ADD\n",
" 24 STORE_FAST 0 (my_list)\n",
" 26 JUMP_ABSOLUTE 12\n",
" >> 28 LOAD_CONST 0 (None)\n",
" 30 RETURN_VALUE\n"
]
}
],
"source": [
"dis.dis(test2)"
]
},
{
"cell_type": "markdown",
"id": "24278e4e-bedb-4ccd-98be-134fb1be76c8",
"metadata": {},
"source": [
"## Test 3\n",
"- Comprehension"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7b2f2826-00ba-4025-b7a6-a29681197d1e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timer unit: 1e-09 s\n",
"\n",
"Total time: 0.000999 s\n",
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\n",
"Function: test3 at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def test3():\n",
" 2 1 999000.0 999000.0 100.0 my_list = [0 for i in range(50)]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def test3():\n",
" my_list = [0 for i in range(50)]\n",
" \n",
"%lprun -f test3 test3()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f14bc707-9ce9-4b50-a3f5-a4a2f929061b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 0 LOAD_CONST 1 (<code object <listcomp> at 0x1e9ebc9d0, file \"/var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\", line 2>)\n",
" 2 LOAD_CONST 2 ('test3.<locals>.<listcomp>')\n",
" 4 MAKE_FUNCTION 0\n",
" 6 LOAD_GLOBAL 0 (range)\n",
" 8 LOAD_CONST 3 (50)\n",
" 10 CALL_FUNCTION 1\n",
" 12 GET_ITER\n",
" 14 CALL_FUNCTION 1\n",
" 16 STORE_FAST 0 (my_list)\n",
" 18 LOAD_CONST 0 (None)\n",
" 20 RETURN_VALUE\n",
"\n",
"Disassembly of <code object <listcomp> at 0x1e9ebc9d0, file \"/var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/1561672606.py\", line 2>:\n",
" 2 0 BUILD_LIST 0\n",
" 2 LOAD_FAST 0 (.0)\n",
" >> 4 FOR_ITER 8 (to 14)\n",
" 6 STORE_FAST 1 (i)\n",
" 8 LOAD_CONST 0 (0)\n",
" 10 LIST_APPEND 2\n",
" 12 JUMP_ABSOLUTE 4\n",
" >> 14 RETURN_VALUE\n"
]
}
],
"source": [
"dis.dis(test3)"
]
},
{
"cell_type": "markdown",
"id": "d6943ddd-8e4f-43b9-ace9-186b15884852",
"metadata": {},
"source": [
"## Test 4\n",
"- Duplication by scale"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "92937281-68ac-4172-bfd0-d7c218cab7e1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timer unit: 1e-09 s\n",
"\n",
"Total time: 0.000343 s\n",
"File: /var/folders/hk/_grk7_jn6j3801ffzjt_dmm00000gn/T/ipykernel_84247/3846408923.py\n",
"Function: test4 at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def test4():\n",
" 2 1 343000.0 343000.0 100.0 my_list = [0] * 50"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def test4():\n",
" my_list = [0] * 50\n",
" \n",
"%lprun -f test4 test4()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b9fea7a6-aa63-48b5-a34b-1d535396888b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 0 LOAD_CONST 1 (0)\n",
" 2 BUILD_LIST 1\n",
" 4 LOAD_CONST 2 (50)\n",
" 6 BINARY_MULTIPLY\n",
" 8 STORE_FAST 0 (my_list)\n",
" 10 LOAD_CONST 0 (None)\n",
" 12 RETURN_VALUE\n"
]
}
],
"source": [
"dis.dis(test4)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97a35a0a-3aba-41f0-9012-9330721977d0",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "TensorFlow M1 Env",
"language": "python",
"name": "tf_m1_env"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.20"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment