@KuRRe8
Last active June 6, 2025 17:35
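Tutorials related to using Python, organized into separate files by category

Python Tutorials

Python is a beginner-friendly language, and the machine-learning community now relies heavily on Python, C++, CUDA C, R and similar languages, which keeps Python's popularity firmly in first place. This Gist provides a set of Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard-library tutorials, covering basic usage of the most common standard modules;
  3. Third-party library tutorials, mainly for common libraries such as numpy and pytorch; they cover basic usage only and do not track new features.

Nothing else will go into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the relevant ipynb file directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the relevant section, or click the hyperlinks below.

If you want to contribute, leave a message in the comments; if you have questions, post them there as well ^.^

Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial focuses mostly on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal thoughts on Python as a dynamic language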
{
"cells": [
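{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Internals and Performance Analysis Tutorial\n",
"\n",
"Welcome to the Python internals and performance analysis tutorial! It aims to help you understand how parts of Python work under the hood and to learn how to profile and optimize Python code.\n",
"\n",
"**Why learn this?**\n",
"\n",
"1. **Write more efficient code**: understanding the internals helps you make better design decisions and avoid common performance pitfalls.\n",
"2. **Debug complex problems**: knowing about memory management, the object model and so on helps you diagnose elusive bugs.\n",
"3. **Optimize effectively**: knowing how to find performance bottlenecks is a prerequisite for effective optimization. Premature optimization is the root of all evil, but targeted optimization is necessary.\n",
"4. **Master Python more deeply**: exploring how the language works inside is interesting in its own right.\n",
"\n",
"**This tutorial covers:**\n",
"\n",
"1. **The Python object model and reference counting**\n",
"2. **Garbage collection**\n",
"3. **The effect of `__slots__` on memory**\n",
"4. **Weak references (the `weakref` module)**\n",
"5. **Performance analysis tools**\n",
"    * The `timeit` module: micro-benchmarks\n",
"    * `cProfile` and `profile`: code profiling\n",
"    * Visualization tools (such as `snakeviz`)\n",
"    * `memory_profiler`: memory analysis\n",
"6. **Common Python performance bottlenecks and optimization techniques (overview)**"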
]
},
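{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. The Python Object Model and Reference Counting\n",
"\n",
"In Python, **everything is an object**. Every object has three main characteristics:\n",
"* **Identity**: a unique identifier for the object; in CPython it is typically the object's memory address. Get it with `id(obj)`.\n",
"* **Type**: the object's type, which determines the operations it supports and the attributes it has. Get it with `type(obj)`.\n",
"* **Value**: the data the object stores.\n",
"\n",
"**Variables are names (labels) that point to objects**\n",
"Variables in Python behave like name tags that point to objects in memory. Several variables can point to the same object."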
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = [1, 2, 3]\n",
"b = a  # b and a point to the same list object\n",
"c = [1, 2, 3]  # c points to a new list object, even though the value is equal\n",
"\n",
"print(f\"id(a): {id(a)}, type(a): {type(a)}, value a: {a}\")\n",
"print(f\"id(b): {id(b)}, type(b): {type(b)}, value b: {b}\")\n",
"print(f\"id(c): {id(c)}, type(c): {type(c)}, value c: {c}\")\n",
"\n",
"print(f\"\\na is b: {a is b}\")  # True, because they are the same object\n",
"print(f\"a == b: {a == b}\")  # True, because their values are equal\n",
"\n",
"print(f\"a is c: {a is c}\")  # False, because they are different objects\n",
"print(f\"a == c: {a == c}\")  # True, because their values are equal\n",
"\n",
"b.append(4)\n",
"print(f\"\\nAfter b.append(4):\")\n",
"print(f\"a: {a}\")  # a changed too, because a and b point to the same object\n",
"print(f\"b: {b}\")\n",
"print(f\"c: {c}\")  # c is unaffected"
]
},
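{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Reference Counting**\n",
"\n",
"CPython uses reference counting as its primary memory-management mechanism. Every object carries a reference counter that records how many references (variables, container elements, and so on) point to it.\n",
"* When a new reference to an object is created, its reference count goes up by 1.\n",
"* When a reference goes away (for example the variable is rebound, `del` is applied to it, or it leaves scope), the reference count goes down by 1.\n",
"* **When the reference count reaches 0, the memory the object occupies can be reclaimed** (CPython does this immediately).\n",
"\n",
"Use `sys.getrefcount(obj)` to inspect an object's reference count. **Note**: passing the object to `sys.getrefcount()` creates a temporary reference of its own, so the returned value is usually 1 higher than you might expect."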
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"x = [10, 20]\n",
"print(f\"Initial refcount of x (approx): {sys.getrefcount(x) - 1}\")  # subtract 1 to cancel out getrefcount's own temporary reference\n",
"\n",
"y = x\n",
"print(f\"Refcount after y = x (approx): {sys.getrefcount(x) - 1}\")\n",
"\n",
"z = [x, x]  # both elements of list z refer to the object x points to\n",
"print(f\"Refcount after z = [x, x] (approx): {sys.getrefcount(x) - 1}\")\n",
"\n",
"del y\n",
"print(f\"Refcount after del y (approx): {sys.getrefcount(x) - 1}\")\n",
"\n",
"z[0] = None  # the first element of z no longer refers to x\n",
"print(f\"Refcount after z[0] = None (approx): {sys.getrefcount(x) - 1}\")\n",
"\n",
"del z  # z is deleted, so the reference to x it still held disappears too\n",
"print(f\"Refcount after del z (approx): {sys.getrefcount(x) - 1}\")\n",
"\n",
"# When x leaves scope or `del x` runs, the count drops further; once it reaches 0 the object is reclaimed."
]
},
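{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Garbage Collection (GC)\n",
"\n",
"Reference counting is very efficient, but it cannot handle **circular references**. For example:\n",
"```python\n",
"a = []\n",
"b = []\n",
"a.append(b)\n",
"b.append(a)\n",
"# a and b now refer to each other, so their reference counts stay above 0 even with no outside references\n",
"del a\n",
"del b\n",
"# Both list objects are still in memory: reference counting alone can never free them,\n",
"# so they linger until the cyclic garbage collector runs\n",
"```\n",
"To solve the circular-reference problem, CPython adds a **generational garbage collector**. It is based on these observations:\n",
"* Most objects are short-lived and become garbage quickly.\n",
"* Few objects are long-lived.\n",
"\n",
"**Overview of the GC process (as in the CPython 3.10 series this notebook targets):**\n",
"1. CPython groups tracked container objects into three generations (0, 1 and 2). Newly created objects start in generation 0.\n",
"2. A generation-0 collection is triggered when the number of allocations minus deallocations since the last collection exceeds `threshold0`; roughly, an older generation is collected once the generation below it has been collected more than its threshold number of times.\n",
"3. The GC scans the objects in that generation and identifies cycles that reference counting cannot reclaim (essentially a reachability analysis).\n",
"4. Objects in unreachable cycles are marked as garbage and reclaimed.\n",
"5. Objects that survive a collection are promoted to the next generation (0 to 1, 1 to 2).\n",
"6. Older generations (such as generation 2) are collected less often than younger ones (such as generation 0), because objects there are considered more stable.\n",
"\n",
"**The `gc` module:**\n",
"Python's `gc` module lets you interact with the garbage collector:\n",
"* `gc.collect(generation=2)`: manually trigger a collection of the given generation (the default, 2, is a full collection of all generations).\n",
"* `gc.disable()` / `gc.enable()`: disable/enable automatic garbage collection.\n",
"* `gc.set_threshold(threshold0, threshold1, threshold2)`: set the collection thresholds for each generation.\n",
"* `gc.get_count()`: return the current collection counts as a tuple `(count0, count1, count2)`; these are the counters compared against the thresholds, not the number of live objects in each generation.\n",
"* `gc.set_debug(...)`: set debugging flags.\n",
"\n",
"**Note**: you normally do not need to call `gc.collect()` yourself; Python runs it automatically when needed. Manual calls are mainly for debugging or special situations."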
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gc\n",
"import sys\n",
"\n",
"print(f\"GC enabled: {gc.isenabled()}\")\n",
"print(f\"GC thresholds: {gc.get_threshold()}\")\n",
"print(f\"Initial GC counts: {gc.get_count()}\")\n",
"\n",
"# Example that creates a circular reference\n",
"class Node:\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"        self.next = None\n",
"        print(f\"Node '{self.name}' created.\")\n",
"\n",
"    def __del__(self):\n",
"        print(f\"Node '{self.name}' being deleted.\")\n",
"\n",
"def create_cycle():\n",
"    n1 = Node(\"A\")\n",
"    n2 = Node(\"B\")\n",
"    n1.next = n2\n",
"    n2.next = n1\n",
"    print(\"Cycle created between A and B.\")\n",
"    # If n1 and n2 simply went out of scope here, the nodes would still reference each other (refcount > 0)\n",
"    # yet be unreachable, so only the cyclic GC could reclaim them\n",
"    return n1, n2  # returned so the caller controls when the last outside references go away\n",
"\n",
"obj1, obj2 = create_cycle()\n",
"print(f\"Refcount of obj1 (approx): {sys.getrefcount(obj1) - 1}\")\n",
"print(f\"Refcount of obj2 (approx): {sys.getrefcount(obj2) - 1}\")\n",
"\n",
"del obj1\n",
"del obj2\n",
"print(\"obj1 and obj2 deleted from main scope.\")\n",
"\n",
"# Node A and Node B are now referenced only through each other's .next: a reference cycle\n",
"# The cyclic GC should reclaim them\n",
"print(\"Triggering manual GC collection...\")\n",
"collected_count = gc.collect()  # trigger the GC manually\n",
"print(f\"Objects collected by GC: {collected_count}\")\n",
"# You should see __del__ run for Node A and Node B (during this collect() call, or earlier if an automatic collection already ran)\n",
"\n",
"# With automatic GC disabled, a cycle is not reclaimed until gc.collect() is called explicitly (or the program ends)\n",
"# gc.disable()\n",
"# obj1_no_gc, obj2_no_gc = create_cycle()\n",
"# del obj1_no_gc\n",
"# del obj2_no_gc\n",
"# print(\"GC disabled, cycle will not be collected automatically.\")\n",
"# collected_count_no_gc = gc.collect()  # a manual collect() still works while automatic GC is disabled\n",
"# print(f\"Objects collected by manual collect() while automatic GC was disabled: {collected_count_no_gc}\")\n",
"# gc.enable()"
]
},
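{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. The Effect of `__slots__` on Memory\n",
"\n",
"By default, instances of a Python class store their attributes in a dictionary (`__dict__`). A dictionary is flexible (attributes can be added and removed dynamically) but relatively memory-hungry, especially when you have many small objects.\n",
"\n",
"By defining a `__slots__` attribute on the class, you tell Python not to give instances a `__dict__` and instead to reserve fixed storage for the named attributes, much like a C struct.\n",
"\n",
"**What `__slots__` gives you:**\n",
"* **Lower memory use**: with a large number of instances the savings can be significant.\n",
"* **Faster attribute access**: reading a fixed slot is usually somewhat faster than a dictionary lookup.\n",
"\n",
"**Limitations of `__slots__`:**\n",
"* **No dynamically added attributes**: an instance can only have the attributes declared in `__slots__`, unless `'__dict__'` is itself listed in `__slots__` or some class in the inheritance chain does not define `__slots__` (in which case instances get a `__dict__` anyway).\n",
"* **No weak references by default**: instances of a class that uses `__slots__` do not support weak references unless `'__weakref__'` is also listed in `__slots__`.\n",
"* **Inheritance caveats**: slots declared by a parent class remain available in subclasses. A subclass that does not define its own `__slots__` gets a `__dict__` (and `__weakref__`) again, which gives up the memory savings. A subclass that does define `__slots__` should list only the additional attribute names; the parent's slots still exist. If the parent class has no `__slots__` (and so has `__dict__`), subclass instances still have a `__dict__` even when the subclass defines `__slots__`."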
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"class PointNoSlots:\n",
"    def __init__(self, x, y):\n",
"        self.x = x\n",
"        self.y = y\n",
"\n",
"class PointWithSlots:\n",
"    __slots__ = ('x', 'y')  # declare the fixed set of instance attributes\n",
"    def __init__(self, x, y):\n",
"        self.x = x\n",
"        self.y = y\n",
"\n",
"p_no_slots = PointNoSlots(1, 2)\n",
"p_with_slots = PointWithSlots(1, 2)\n",
"\n",
"print(f\"p_no_slots.__dict__: {p_no_slots.__dict__}\")\n",
"try:\n",
"    print(f\"p_with_slots.__dict__: {p_with_slots.__dict__}\")\n",
"except AttributeError as e:\n",
"    print(f\"Error accessing p_with_slots.__dict__: {e}\")\n",
"\n",
"# Try adding an attribute dynamically\n",
"p_no_slots.z = 3\n",
"print(f\"p_no_slots.z: {p_no_slots.z}\")\n",
"try:\n",
"    p_with_slots.z = 3\n",
"except AttributeError as e:\n",
"    print(f\"Error adding attribute z to p_with_slots: {e}\")\n",
"\n",
"# Memory comparison (install psutil or memory_profiler for accurate measurements)\n",
"# sys.getsizeof gives a rough size of the object itself (it does not include the contents of __dict__)\n",
"print(f\"\\nSize of p_no_slots (approx, object only): {sys.getsizeof(p_no_slots)}\")\n",
"print(f\"Size of p_with_slots (approx, object only): {sys.getsizeof(p_with_slots)}\")\n",
"# __dict__ itself takes space too\n",
"if hasattr(p_no_slots, '__dict__'):\n",
"    print(f\"Size of p_no_slots.__dict__: {sys.getsizeof(p_no_slots.__dict__)}\")\n",
"\n",
"# The real savings show up with a large number of instances\n",
"# num_instances = 100_000\n",
"# list_no_slots = [PointNoSlots(i, i+1) for i in range(num_instances)]\n",
"# list_with_slots = [PointWithSlots(i, i+1) for i in range(num_instances)]\n",
"# A memory profiler shows the difference more accurately"
]
},
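{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Weak References (the `weakref` Module)\n",
"\n",
"An ordinary (strong) reference increments the object's reference count and keeps the object from being garbage collected.\n",
"A **weak reference** is a special kind of reference that points to an object without incrementing its reference count. An object that is referenced only weakly can still be garbage collected.\n",
"\n",
"**Uses:**\n",
"* **Caches**: when you want to cache objects without the cache itself keeping them alive once nothing else needs them.\n",
"* **Object registries/callbacks**: avoid reference cycles, or keep a registry from holding on to objects unnecessarily.\n",
"\n",
"**The `weakref` module:**\n",
"* `weakref.ref(object[, callback])`: create a weak reference object. When the referent is about to be finalized, the optional `callback` is called with the weak reference object itself as its argument.\n",
"* To reach the referent, call the weak reference object: `weak_ref_obj()`. It returns `None` if the object has already been reclaimed.\n",
"* `weakref.WeakKeyDictionary()`: a dictionary whose keys are weakly referenced.\n",
"* `weakref.WeakValueDictionary()`: a dictionary whose values are weakly referenced.\n",
"* `weakref.WeakSet()`: a set whose elements are weakly referenced.\n",
"\n",
"**Note**: not every object supports weak references (`list` and `dict` instances do not, for example, although instances of their subclasses can). Instances of user-defined classes support them by default, unless the class uses `__slots__` without including `'__weakref__'`."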
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gc\n",
"import sys\n",
"import weakref\n",
"\n",
"class MyObject:\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"        print(f\"MyObject '{self.name}' created.\")\n",
"\n",
"    def __del__(self):\n",
"        print(f\"MyObject '{self.name}' deleted.\")\n",
"\n",
"def weak_ref_callback(wr):\n",
"    print(f\"Weak reference callback: Object {wr} is being finalized.\")\n",
"\n",
"obj = MyObject(\"CacheableItem\")\n",
"ref_count_before_weakref = sys.getrefcount(obj)\n",
"\n",
"# Create a weak reference\n",
"weak_obj_ref = weakref.ref(obj, weak_ref_callback)\n",
"\n",
"ref_count_after_weakref = sys.getrefcount(obj)\n",
"print(f\"Refcount of obj before weakref (real): {ref_count_before_weakref - 1}\")\n",
"print(f\"Refcount of obj after weakref (real): {ref_count_after_weakref - 1}\")  # should be unchanged\n",
"\n",
"print(f\"Accessing object via weak_obj_ref(): {weak_obj_ref().name if weak_obj_ref() else 'None'}\")\n",
"\n",
"print(\"\\nDeleting strong reference 'obj'...\")\n",
"del obj\n",
"\n",
"# Trigger the GC manually as well; in CPython, reference counting has normally freed the object\n",
"# (and run the callback) at `del obj`, because it is not part of a cycle\n",
"gc.collect()\n",
"\n",
"print(f\"Accessing object via weak_obj_ref() after del and GC: {weak_obj_ref()}\")  # should be None\n",
"\n",
"print(\"\\n--- WeakValueDictionary Example ---\")\n",
"cache = weakref.WeakValueDictionary()\n",
"\n",
"obj_a = MyObject(\"A\")\n",
"obj_b = MyObject(\"B\")\n",
"\n",
"cache['key_a'] = obj_a  # the dictionary holds only a weak reference to obj_a\n",
"cache['key_b'] = obj_b\n",
"\n",
"print(f\"Cache contents: {list(cache.items())}\")\n",
"print(f\"Is 'key_a' in cache? {'key_a' in cache}\")\n",
"\n",
"print(\"Deleting strong reference to obj_a...\")\n",
"del obj_a\n",
"gc.collect()  # trigger the GC\n",
"\n",
"print(f\"Is 'key_a' in cache after del and GC? {'key_a' in cache}\")  # should be False\n",
"print(f\"Cache contents after obj_a deletion: {list(cache.items())}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Performance Analysis Tools"
]
},
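{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 The `timeit` Module: Micro-benchmarks\n",
"\n",
"`timeit` measures the execution time of small snippets of Python code. It runs the code many times to reduce timing noise and, by default, temporarily turns off garbage collection while timing.\n",
"\n",
"**Usage:**\n",
"* Command line: `python -m timeit -s \"setup_code\" \"statement_to_measure\"`\n",
"* In code: `timeit.timeit(stmt, setup, number=...)` or `timeit.repeat(stmt, setup, repeat=..., number=...)` (pass `number` and `repeat` by keyword, since the third positional parameter is `timer`)"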
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import timeit\n",
"\n",
"# Compare a list comprehension with map + filter\n",
"setup_code = \"data = list(range(1000))\"\n",
"\n",
"stmt_list_comp = \"[x*x for x in data if x % 2 == 0]\"\n",
"stmt_map_filter = \"list(map(lambda x: x*x, filter(lambda x: x % 2 == 0, data)))\"\n",
"\n",
"num_executions = 10000\n",
"\n",
"time_list_comp = timeit.timeit(stmt_list_comp, setup=setup_code, number=num_executions)\n",
"time_map_filter = timeit.timeit(stmt_map_filter, setup=setup_code, number=num_executions)\n",
"\n",
"print(f\"List comprehension took: {time_list_comp:.6f} seconds for {num_executions} executions\")\n",
"print(f\"Map+filter took: {time_map_filter:.6f} seconds for {num_executions} executions\")\n",
"\n",
"if time_list_comp < time_map_filter:\n",
"    print(\"List comprehension is faster.\")\n",
"else:\n",
"    print(\"Map+filter is faster or similar.\")\n",
"\n",
"# In Jupyter Notebook the %timeit magic command is more convenient\n",
"data_for_magic = list(range(1000))\n",
"print(\"\\nUsing %timeit magic command (output will be from Jupyter):\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%timeit [x*x for x in data_for_magic if x % 2 == 0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%timeit list(map(lambda x: x*x, filter(lambda x: x % 2 == 0, data_for_magic)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2 `cProfile` and `profile`: Code Profiling\n",
"\n",
"`profile` is a pure-Python profiler; `cProfile` is implemented as a C extension, has lower overhead, and is the recommended choice.\n",
"They collect statistics about function calls, such as:\n",
"* `ncalls`: the number of times the function was called.\n",
"* `tottime`: total time spent in the function itself (excluding calls to sub-functions).\n",
"* `percall` (tottime): `tottime / ncalls`.\n",
"* `cumtime`: cumulative time spent in the function (including all sub-function calls).\n",
"* `percall` (cumtime): `cumtime` divided by the number of primitive (non-recursive) calls.\n",
"\n",
"**Usage:**\n",
"* Command line: `python -m cProfile -o output.prof myscript.py`\n",
"* In code:\n",
"  ```python\n",
"  import cProfile\n",
"  import pstats\n",
"\n",
"  profiler = cProfile.Profile()\n",
"  profiler.enable()\n",
"  # ... your code to profile ...\n",
"  profiler.disable()\n",
"\n",
"  stats = pstats.Stats(profiler)\n",
"  stats.sort_stats('cumulative')  # or 'tottime', 'ncalls'\n",
"  stats.print_stats(10)  # print the top 10 entries\n",
"  # stats.dump_stats('output.prof')  # save to a file for later analysis\n",
"  ```\n",
"* Jupyter Notebook magic command: `%prun statement_to_run`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cProfile\n",
"import pstats\n",
"import time  # needed for time.sleep below\n",
"from io import StringIO  # used to capture the output in memory\n",
"\n",
"def slow_function():\n",
"    time.sleep(0.1)\n",
"    for _ in range(10000):\n",
"        pass\n",
"\n",
"def another_function():\n",
"    time.sleep(0.05)\n",
"    for i in range(5000):\n",
"        _ = i * i\n",
"\n",
"def main_profiling_target():\n",
"    for _ in range(3):\n",
"        slow_function()\n",
"    for _ in range(2):\n",
"        another_function()\n",
"\n",
"print(\"--- Profiling with cProfile in code ---\")\n",
"profiler = cProfile.Profile()\n",
"profiler.enable()\n",
"\n",
"main_profiling_target()  # run the code to be profiled\n",
"\n",
"profiler.disable()\n",
"\n",
"# Create a StringIO object to capture the pstats output\n",
"s = StringIO()\n",
"stats = pstats.Stats(profiler, stream=s).sort_stats('cumulative')\n",
"stats.print_stats(5)  # print the top 5 functions by cumulative time\n",
"\n",
"print(\"cProfile output:\")\n",
"print(s.getvalue())\n",
"\n",
"print(\"\\n--- Profiling with %prun magic command (output will be from Jupyter) ---\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%prun main_profiling_target()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.3 Visualization Tools (such as `snakeviz`)\n",
"\n",
"The text output of `cProfile` can be hard to read. Tools such as `snakeviz` render a `.prof` file as an interactive icicle or sunburst chart, which makes bottlenecks easier to spot.\n",
"\n",
"**Install:** `pip install snakeviz`\n",
"\n",
"**Usage:**\n",
"1. Save the profiling results to a file with `cProfile`:\n",
"   `python -m cProfile -o myprogram.prof myprogram.py`\n",
"   or, in code: `stats.dump_stats('myprogram.prof')`\n",
"2. Run `snakeviz`:\n",
"   `snakeviz myprogram.prof`\n",
"   This opens an interactive view in your browser.\n",
"\n",
"In Jupyter Notebook, `%prun -D filename.prof statement` saves the profiling results directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example: save the %prun results and view them with snakeviz from the command line\n",
"# %prun -D main_profiling_target.prof main_profiling_target()\n",
"\n",
"# Then run in a terminal: snakeviz main_profiling_target.prof\n",
"print(\"To visualize cProfile results with snakeviz:\")\n",
"print(\"1. Run '%prun -D some_file.prof your_function_call()' in a cell.\")\n",
"print(\"2. Open your terminal/command prompt.\")\n",
"print(\"3. Navigate to the directory where 'some_file.prof' was saved (usually your notebook's directory).\")\n",
"print(\"4. Run 'snakeviz some_file.prof'.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.4 `memory_profiler`: Memory Analysis\n",
"\n",
"`memory_profiler` monitors the memory usage of a Python script line by line.\n",
"\n",
"**Install:** `pip install memory_profiler psutil` (`psutil` provides the measurements and is worth installing explicitly)\n",
"\n",
"**Usage:**\n",
"* **The `@profile` decorator**: add `@profile` to the function whose memory use you want to analyze (note: this `@profile` is not a built-in; it is provided by `memory_profiler`).\n",
"* Run from the command line: `python -m memory_profiler myscript.py`\n",
"* Jupyter Notebook magic commands:\n",
"    1. Load the extension: `%load_ext memory_profiler`\n",
"    2. Use `%memit statement_to_run` (similar to `%timeit`, but measures peak memory).\n",
"    3. Use `%mprun -f function_to_profile statement_that_calls_function` (line-by-line analysis of the given function)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# First make sure memory_profiler and psutil are installed\n",
"# !pip install memory_profiler psutil\n",
"\n",
"# Load the memory_profiler extension (only needs to run once)\n",
"%load_ext memory_profiler"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using %memit\n",
"def create_large_list():\n",
"    return [i for i in range(1_000_000)]\n",
"\n",
"print(\"--- Using %memit (output will be from Jupyter) ---\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%memit create_large_list()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Using %mprun (line-by-line analysis)\n",
"# Note: memory_profiler's documentation says %mprun only works on functions defined in a physical file\n",
"# and imported from it, not on functions defined interactively in the notebook session.\n",
"# To profile the function below with %mprun, save it to a module first and import it from there.\n",
"\n",
"# For the demonstration we define the function here so that the rest of the notebook can call it\n",
"def memory_intensive_function(n):\n",
"    print(f\"Running memory_intensive_function with n={n}\")\n",
"    a = [i for i in range(n)]  # first allocation\n",
"    b = [i*i for i in range(n//2)]  # second allocation\n",
"    del b  # release b\n",
"    c = {str(i): i for i in a}  # third allocation\n",
"    return len(c)\n",
"\n",
"print(\"\\n--- Using %mprun (output will be from Jupyter) --- \")\n",
"print(\"Run the cell below to see line-by-line memory usage for memory_intensive_function:\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# %mprun -f memory_intensive_function memory_intensive_function(500_000)\n",
"# Note: the line above is commented out because %mprun output is long and, per memory_profiler's\n",
"# documentation, the function must live in an importable file rather than a notebook cell.\n",
"# To actually run it, move memory_intensive_function into a module, import it, then uncomment the line\n",
"# and run it in a cell of its own.\n",
"print(\"To run %mprun: \")\n",
"print(\"1. Ensure memory_intensive_function is defined in an importable .py file and imported.\")\n",
"print(\"2. Uncomment the line below and run this cell:\")\n",
"print(\"# %mprun -f memory_intensive_function memory_intensive_function(500_000)\")\n",
"\n",
"# So that the notebook runs end to end, we just call the function here without %mprun\n",
"result_mem_func = memory_intensive_function(10000)  # use a smaller n to avoid using too many resources\n",
"print(f\"Result of memory_intensive_function: {result_mem_func}\")"
]
},
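{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Common Python Performance Bottlenecks and Optimization Techniques (Overview)\n",
"\n",
"**Common bottlenecks:**\n",
"1. **Poor choice of algorithm or data structure**: this is the most fundamental one; an O(N^2) algorithm is usually far slower than an O(N log N) one on large inputs.\n",
"2. **Excessive function-call overhead**: especially calling small functions inside tight loops.\n",
"3. **Unnecessary object creation and destruction**: creating many temporary objects inside a loop.\n",
"4. **String operations**: strings are immutable, so repeated concatenation (such as `+` inside a loop) can create many intermediate strings. `\"\".join()` is usually more efficient.\n",
"5. **Global variable access**: accessing a global variable is slower than accessing a local one.\n",
"6. **Type conversions**: unnecessary type conversions.\n",
"7. **Blocking I/O**: synchronous disk or network operations block the calling thread and, in a single-threaded program, the whole program.\n",
"8. **CPU-bound multithreading under the GIL**: CPython's global interpreter lock lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound pure-Python code.\n",
"\n",
"**Optimization techniques (general advice; combine with profiling):**\n",
"1. **Choose the right algorithm and data structure**: this matters most.\n",
"2. **Use built-in functions and libraries**: they are usually implemented in C and very efficient (e.g., `map`, `filter`, `itertools`, `collections`, NumPy, Pandas).\n",
"3. **Use list comprehensions and generator expressions**: usually faster and more concise than an explicit `for` loop with `append`.\n",
"4. **Join strings with `join()`**: `\" \".join(list_of_strings)`.\n",
"5. **Avoid unnecessary computation or attribute lookups inside loops**: if a value does not change during the loop, compute it once outside.\n",
"    ```python\n",
"    # Slow\n",
"    # for item in my_list:\n",
"    #     x = math.sqrt(expensive_calculation())\n",
"    #     item.process(x)\n",
"    # Fast\n",
"    # x = math.sqrt(expensive_calculation())\n",
"    # for item in my_list:\n",
"    #     item.process(x)\n",
"    ```\n",
"6. **Cache results (memoization)**: for expensive pure functions called repeatedly with the same arguments, cache the result (e.g., `functools.lru_cache`).\n",
"7. **Use `__slots__`**: reduces memory use when you have many small objects.\n",
"8. **Deferred/lazy evaluation**: use generators so values are computed only when needed.\n",
"9. **Concurrency and parallelism**:\n",
"    * I/O-bound: `asyncio`, `threading`.\n",
"    * CPU-bound: `multiprocessing`.\n",
"10. **Cython or Numba**: for performance-critical numerical code, compile Python code to a C extension or use a JIT compiler.\n",
"11. **Use a faster Python implementation**: such as PyPy (which often speeds up long-running pure-Python programs considerably).\n",
"\n",
"**Optimization principles:**\n",
"* **Do not optimize prematurely.**\n",
"* **Make the code correct first, then consider optimizing.**\n",
"* **Use profiling tools to locate bottlenecks; do not guess.**\n",
"* **Optimize the most expensive parts (usually a small fraction of the code, following the 80/20 rule).**\n",
"* **Test after optimizing to confirm correctness and verify the performance gain.**\n",
"\n",
"## Summary\n",
"\n",
"Understanding some of Python's inner workings (such as the object model and memory management) and mastering profiling tools are important steps toward becoming a more effective Python developer. With targeted analysis and optimization you can significantly improve the performance and resource usage of Python applications.\n",
"\n",
"Remember that clear, maintainable code usually matters more than small performance gains, unless performance really is the critical bottleneck."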
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

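Some Reflections on Python as a Dynamic Language

As is well known, Python is a fully dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checks
  3. Object structure, not just values, can be modified dynamically
  4. Reflection
  5. Everything is an object (instance, class, method)
  6. Code can be executed dynamically (eval, exec)
  7. Duck typing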
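
Several of the traits listed above can be seen in a few lines. The following is a minimal sketch; the names `Duck`, `Robot` and `make_it_speak` are made up for illustration and do not come from any library.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Duck typing: any object with a speak() method is accepted.
    return thing.speak()


# Dynamic type binding: the same name can refer to objects of different types.
value = 42
value = "forty-two"

# Object structure can change at runtime: add an attribute to one instance.
duck = Duck()
duck.nickname = "Donald"

# Reflection: look up a method by its name, then call it.
method = getattr(duck, "speak")

# Dynamic code execution: evaluate an expression held in a string.
result = eval("1 + 2 * 3")

print(type(value).__name__)
print(make_it_speak(duck), make_it_speak(Robot()))
print(duck.nickname, method())
print(result)
```

`Duck` and `Robot` share no base class other than `object`; `make_it_speak` works with both only because each happens to define `speak`.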

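A dynamic language imposes fewer constraints and is easier for newcomers to pick up, but the price is substantial runtime overhead, and the code is so decoupled from the underlying machine-level execution that it is hard to know how it actually runs.

There are also a few points that I consider fairly serious flaws. I go through them below.

It breaks the semantics of OOP

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Likewise, Python can be purely imperative for simple tasks, or it can use full object-oriented design.

However, its dynamic features undermine the structure of OOP:

  1. Blurred types: an instance of almost any user-defined class can have attributes or methods added or removed at runtime (in a static language, by contrast, only their values can be changed at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet its built-in __class__ attribute still points to the original type, which muddies what a type means. Conforming to a class should not be merely nominal; the content should conform too.
  2. Broken inheritance, in two respects:
    1. Most code in practice does not inherit from abstract interfaces. The abc module provides ABC, a base class for abstract interfaces; the classic approach is to derive your own abstract class from ABC, derive concrete classes from that abstract class, and implement the abstract methods. PEP 544, however, introduced typing.Protocol as a structural alternative that many regard as the more Pythonic choice: a concrete class inherits from no abstract class at all and, as long as it implements the required methods, a static checker treats it as conforming to the Protocol.
    2. There is no need to inherit from a concrete parent class. As with the previous point, a class with no parent (other than object) can still define methods with the same names and thereby offer the same calling interface as the would-be parent's methods. Semantically, the class definition then shows no relationship to any other class. The code base can be a completely loose collection in which no two classes are related by inheritance.
  3. Broken polymorphism: parameters and return values are not restricted to any type by default. That makes passing a subtype where a parent type is expected seem pointless, again because any type can be modified dynamically to meet the requirement.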
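
The first two points can be made concrete with a short sketch. It assumes a hypothetical `Speaker` protocol and two illustrative classes, `Dog` and `Stone`; `typing.Protocol` and `runtime_checkable` are the standard-library tools, everything else is invented for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Speaker(Protocol):
    def speak(self) -> str: ...


class Dog:  # no base class other than object
    def speak(self) -> str:
        return "Woof"


class Stone:  # defines no speak() at all
    pass


def greet(speaker: Speaker) -> str:
    return speaker.speak()


dog = Dog()
stone = Stone()

# Structural conformance: Dog never inherits from Speaker.
dog_is_speaker = isinstance(dog, Speaker)
stone_is_speaker = isinstance(stone, Speaker)

# Blurred types: give one Stone instance a speak() method at runtime.
stone.speak = lambda: "..."
patched_stone_is_speaker = isinstance(stone, Speaker)
stone_class_name = type(stone).__name__  # still reports the original class

print(dog_is_speaker, stone_is_speaker)
print(patched_stone_is_speaker, stone_class_name, greet(stone))
```

After the patch, the `Stone` instance passes the runtime protocol check and works with `greet`, while `type(stone)` still reports `Stone`. This is the gap between nominal and structural conformance that the list above describes.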

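It breaks design patterns

Classic patterns such as Factory, Abstract Factory and Visitor rely heavily on inheritance and polymorphism. In Python's design, however, the dynamic capabilities leave those patterns with little to enforce. A widely used library that does use a design pattern is transformers: its from_pretrained family is a factory that picks a concrete constructor from a string name and returns a concrete subclass, while the declared output type of the factory is a base class shared by all models.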
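
The string-keyed factory idea can be sketched as follows. This is not how transformers implements from_pretrained; `BaseModel`, `register`, `from_name` and the two model classes are hypothetical names chosen only to show the shape of the pattern.

```python
class BaseModel:
    """Hypothetical common base class for every model."""

    def describe(self) -> str:
        return f"{type(self).__name__} model"


# Registry mapping a string name to a concrete constructor.
_REGISTRY = {}


def register(name):
    """Class decorator that records a subclass under a string name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("tiny")
class TinyModel(BaseModel):
    pass


@register("large")
class LargeModel(BaseModel):
    pass


def from_name(name: str) -> BaseModel:
    """Factory: pick the concrete class from a string, return it typed as BaseModel."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None
    return cls()


model = from_name("tiny")
print(model.describe())
print(isinstance(model, BaseModel))
```

The caller sees only the `BaseModel` return annotation; which subclass comes back depends on the string, which is why the declared type here is a shared base class.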

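Security concerns

Python code generally does not manage pointers directly, so out-of-bounds, wild and dangling pointers are mostly non-issues. The gc mechanism also handles garbage collection automatically, so you do not have to think about that class of safety problem while coding. Python has security problems of its own, though. Attacking unmanaged code has traditionally been hard: for injected code to run reliably it must avoid corrupting the existing structure and crashing the program outright (a segmentation fault). In Python, by contrast, injected code can alter the original logic directly, and because that logic is not fixed content in a code segment, the attacker needs no such extra care. The contents of globals() and locals() can be modified by hand at runtime, which also carries some risk (writes through globals() take effect immediately, while writes to the locals() dictionary inside a function are not guaranteed to change the real local variables). Another danger is code failing because of a type mismatch: types are only determined at runtime, so no guarantee can be given in advance, and a type error may be raised as an exception and crash the program.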
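
Two of the risks above, late type errors and rebinding names through globals(), can be reproduced in a small sketch. It assumes the code runs at module level (so that globals() is the namespace where `greet` is defined); all function names are illustrative.

```python
def add_one(x):
    # No type is enforced here; a mismatch only surfaces when this line runs.
    return x + 1


def checked_add_one(x):
    # An explicit runtime check turns a late failure into an early, clear one.
    if not isinstance(x, int):
        raise TypeError(f"expected int, got {type(x).__name__}")
    return x + 1


try:
    add_one("1")
    type_error_message = None
except TypeError as exc:
    type_error_message = str(exc)

try:
    checked_add_one("1")
    checked_message = None
except TypeError as exc:
    checked_message = str(exc)


def greet():
    return "hello"


# Rebinding a module-level name through globals() changes behavior for every caller.
original_greet = greet
globals()["greet"] = lambda: "hijacked"
patched_result = greet()
globals()["greet"] = original_greet  # restore the original function

print(type_error_message)
print(checked_message)
print(patched_result, greet())
```

Static type checkers and explicit `isinstance` checks narrow the first problem; the second is a reminder that any code running in the same process can rebind names it can reach.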

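Summary

My background is C++, but in recent years I have been programming in Python all the time. Python's market share has also been in first place for years, by a wide margin, which is inseparable from its flexibility. For a programming language aimed at a general audience, such traits are necessary. Despite the many loose aspects of Python listed above, programmers can still choose to write rigorous object-oriented code. So the quality of a program depends not on the language but on the programmer. Programmers have a responsibility to write maintainable, clear, well-structured code~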


KuRRe8 commented May 8, 2025

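Back to top

Insights, questions, or just idle chatter are all welcome here!

Because there are many documents, an ipynb sometimes fails to render due to browser performance; refreshing fixes it.

Or git clone the Gist and read it locally.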
