@KuRRe8
Last active June 6, 2025 17:35
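A collection of Python tutorials, split into separate files by category

Python Tutorials

Python is a beginner-friendly language, and today's machine learning community depends heavily on Python together with C++, CUDA C, R, and similar languages, which has kept Python at the top of language popularity rankings. This Gist collects Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard library tutorials, covering basic usage of the most common standard library modules;
  3. Third-party library tutorials, mainly common libraries such as numpy and pytorch, covering basic usage only and not new features.

Nothing else will be added to this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the corresponding .ipynb files directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the section you want, or click the hyperlinks below.

If you would like to contribute, leave a message in the comments. Questions are welcome there too ^.^

Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial series focuses mostly on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal views on Python as a dynamic language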
{
"cells": [
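{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Generators and the Iterator Protocol: An In-Depth Tutorial\n",
"\n",
"Welcome to this in-depth tutorial on Python generators and the iterator protocol! Iteration is one of Python's core concepts. Understanding the iterator protocol behind it, and the generator mechanism built on top of it, helps you write code that is more efficient, memory-friendly, and expressive.\n",
"\n",
"**Why study iteration and generators in depth?**\n",
"\n",
"1. **Memory efficiency**: Generators produce values on demand instead of building an entire sequence in memory at once, which is essential for large datasets and infinite sequences.\n",
"2. **Lazy evaluation**: Values are computed only when they are needed, which saves computation.\n",
"3. **Concise code**: Generators offer a compact way to create iterators.\n",
"4. **Data-processing pipelines**: Multiple generators can be chained together to form an efficient processing stream.\n",
"5. **Understanding Python's core**: The iterator protocol underlies `for` loops, list comprehensions, `map()`, `filter()`, and many other Python features.\n",
"\n",
"**This tutorial covers:**\n",
"\n",
"1. **The iterator protocol**: `__iter__` and `__next__`.\n",
"2. **Iterables vs. iterators**.\n",
"3. **Generator functions**: using the `yield` keyword.\n",
"4. **Generator expressions**.\n",
"5. **The `itertools` module**: powerful iteration tools.\n",
"6. **The `yield from` statement** (Python 3.3+).\n",
"7. **Advanced generator features**: the `send()`, `throw()`, and `close()` methods (the basis of classic coroutines).\n",
"8. **Use cases and best practices**."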
]
},
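{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. The Iterator Protocol\n",
"\n",
"Python's iterator protocol defines how an object supports iteration. It relies on two special (dunder) methods:\n",
"\n",
"* **`__iter__(self)`**:\n",
"    * Called when the object is passed to the built-in `iter()` function, or when a `for` loop starts.\n",
"    * It must return an **iterator object**.\n",
"\n",
"* **`__next__(self)`**:\n",
"    * Every iterator object must implement this method.\n",
"    * Called by the built-in `next(iterator)`, which a `for` loop invokes implicitly on every iteration.\n",
"    * It should return the next value in the sequence.\n",
"    * When no values remain, it must raise `StopIteration`. A `for` loop catches this exception automatically and ends the loop."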
]
},
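{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Iterables vs. Iterators\n",
"\n",
"* **Iterable**:\n",
"    * Any object that implements `__iter__` (returning an iterator) is iterable.\n",
"    * Alternatively, an object that implements `__getitem__` and accepts integer indexes starting from 0 (like a sequence) is also iterable: Python automatically creates an iterator that indexes it until `IndexError` is raised.\n",
"    * Examples: `list`, `tuple`, `str`, `dict`, `set`, file objects, and custom classes that implement `__iter__` or `__getitem__`.\n",
"    * For container iterables such as these, you can call `iter()` repeatedly to obtain new iterators, each of which traverses the data independently.\n",
"\n",
"* **Iterator**:\n",
"    * Any object that implements both `__iter__` and `__next__` is an iterator.\n",
"    * An iterator's `__iter__` normally just returns `self`, because an iterator is its own iterator.\n",
"    * Iterators are stateful: they remember the current position in the iteration.\n",
"    * Iterators can usually be traversed only once. Once `__next__` has raised `StopIteration`, it should keep raising it on every later call."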
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example: a custom iterable and its iterator\n",
"class MyRangeIterable:\n",
"    \"\"\"A simple iterable, similar to range().\"\"\"\n",
"    def __init__(self, start, end):\n",
"        self.start = start\n",
"        self.end = end\n",
"        print(f\"MyRangeIterable initialized ({self.start} to {self.end})\")\n",
"\n",
"    def __iter__(self):\n",
"        print(\"MyRangeIterable.__iter__ called, returning MyRangeIterator\")\n",
"        # Return a fresh iterator instance on every call\n",
"        return MyRangeIterator(self.start, self.end)\n",
"\n",
"class MyRangeIterator:\n",
"    \"\"\"The iterator used by MyRangeIterable.\"\"\"\n",
"    def __init__(self, start, end):\n",
"        self.current = start\n",
"        self.end = end\n",
"        print(f\"MyRangeIterator initialized (current={self.current}, end={self.end})\")\n",
"\n",
"    def __iter__(self):\n",
"        # An iterator's own __iter__ should return self\n",
"        print(\"MyRangeIterator.__iter__ called, returning self\")\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        print(f\"MyRangeIterator.__next__ called (current={self.current})\")\n",
"        if self.current < self.end:\n",
"            value = self.current\n",
"            self.current += 1\n",
"            return value\n",
"        else:\n",
"            print(\"MyRangeIterator: Raising StopIteration\")\n",
"            raise StopIteration\n",
"\n",
"print(\"--- Testing MyRangeIterable ---\")\n",
"my_range_obj = MyRangeIterable(1, 4)  # the iterable\n",
"\n",
"print(\"\\nFirst iteration using for loop:\")\n",
"for num in my_range_obj:  # implicitly calls iter(my_range_obj), then next() repeatedly\n",
"    print(f\" For loop got: {num}\")\n",
"\n",
"print(\"\\nSecond iteration using for loop (gets a new iterator):\")\n",
"for num in my_range_obj:\n",
"    print(f\" For loop got: {num}\")\n",
"\n",
"print(\"\\nManual iteration:\")\n",
"iterator1 = iter(my_range_obj)  # obtain an iterator\n",
"print(f\"Type of iterator1: {type(iterator1)}\")\n",
"print(f\"next(iterator1): {next(iterator1)}\")\n",
"print(f\"next(iterator1): {next(iterator1)}\")\n",
"\n",
"iterator2 = iter(my_range_obj)  # obtain a second, independent iterator\n",
"print(f\"next(iterator2): {next(iterator2)}\")  # starts from the beginning\n",
"\n",
"print(f\"Continuing iterator1: {next(iterator1)}\")\n",
"try:\n",
"    print(f\"Continuing iterator1 (expect StopIteration): {next(iterator1)}\")\n",
"except StopIteration as e:\n",
"    print(f\" Caught StopIteration as expected: {e}\")\n",
"\n",
"# Verify that an iterator is itself iterable\n",
"iter_from_iter = iter(iterator2)\n",
"print(f\"iterator2 is iter_from_iter: {iterator2 is iter_from_iter}\")  # True"
]
},
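{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Generator Functions\n",
"\n",
"A generator function is a special kind of function: instead of handing back a single value with `return`, it uses the `yield` keyword to produce a series of values.\n",
"\n",
"* Calling a generator function **does not run the function body immediately**. It returns a **generator object**.\n",
"* A generator object is a special kind of iterator: it implements `__iter__` and `__next__` automatically.\n",
"* Each call to `next()` on the generator object resumes the function where the last `yield` left off and runs until the next `yield`.\n",
"* A `yield` statement hands a value to the caller and suspends the function, preserving its execution state (including local variables).\n",
"* When the function finishes (it runs off the end or hits a `return` statement), `StopIteration` is raised automatically."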
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def simple_generator_func(n):\n",
"    print(\"Generator function: simple_generator_func called\")\n",
"    i = 0\n",
"    while i < n:\n",
"        print(f\"Generator: yielding {i}\")\n",
"        yield i  # produce a value and pause\n",
"        i += 1\n",
"        print(f\"Generator: resumed, i is now {i}\")\n",
"    print(\"Generator: finished\")\n",
"    # StopIteration is raised implicitly here\n",
"\n",
"print(\"--- Testing simple_generator_func ---\")\n",
"gen_obj = simple_generator_func(3)  # calling the generator function returns a generator object\n",
"print(f\"Type of gen_obj: {type(gen_obj)}\")  # <class 'generator'>\n",
"\n",
"print(f\"\\nFirst next(gen_obj): {next(gen_obj)}\")  # starts executing, up to the first yield\n",
"print(f\"Second next(gen_obj): {next(gen_obj)}\")\n",
"print(f\"Third next(gen_obj): {next(gen_obj)}\")\n",
"try:\n",
"    print(f\"Fourth next(gen_obj) (expect StopIteration): {next(gen_obj)}\")\n",
"except StopIteration:\n",
"    print(\" Caught StopIteration as expected.\")\n",
"\n",
"print(\"\\nIterating with a for loop (uses a new generator object):\")\n",
"for val in simple_generator_func(2):\n",
"    print(f\" For loop got: {val}\")"
]
},
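{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Advantages of generators:**\n",
"* **Concise code**: Python handles the iterator plumbing (state management, `StopIteration`) for you.\n",
"* **Memory efficient**: Values are produced on demand, which suits large datasets and infinite sequences.\n",
"\n",
"**Infinite sequence example:**"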
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fibonacci_generator():\n",
"    \"\"\"Yield the Fibonacci sequence without end.\"\"\"\n",
"    a, b = 0, 1\n",
"    while True:\n",
"        yield a\n",
"        a, b = b, a + b\n",
"\n",
"print(\"--- Fibonacci Generator ---\")\n",
"fib_gen = fibonacci_generator()\n",
"print(\"First 10 Fibonacci numbers:\")\n",
"for _ in range(10):\n",
"    print(next(fib_gen), end=\" \")\n",
"print(\"\\n\")\n",
"\n",
"# To start over from the beginning, create a new generator object\n",
"fib_gen2 = fibonacci_generator()\n",
"print(f\"Next from fib_gen2: {next(fib_gen2)}\")"
]
},
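{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Generator Expressions\n",
"\n",
"A generator expression is a more compact way to create a simple generator object. Its syntax resembles a list comprehension, but it uses parentheses `()` instead of square brackets `[]`.\n",
"\n",
"`(expression for item in iterable if condition)`\n",
"\n",
"* A generator expression also returns a generator object.\n",
"* It is also lazily evaluated and produces values on demand.\n",
"* It works well as a function argument, especially when you do not want to build the whole list up front."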
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"squares_list_comp = [x*x for x in range(5)]  # list comprehension: builds the list immediately\n",
"squares_gen_expr = (x*x for x in range(5))  # generator expression: returns a generator object\n",
"\n",
"print(f\"List comprehension: {squares_list_comp}, type: {type(squares_list_comp)}\")\n",
"print(f\"Generator expression: {squares_gen_expr}, type: {type(squares_gen_expr)}\")\n",
"\n",
"print(\"\\nIterating over generator expression:\")\n",
"for sq in squares_gen_expr:\n",
"    print(sq, end=\" \")\n",
"print(\"\\n\")\n",
"\n",
"# Iterating again shows it is exhausted (generators are single-use)\n",
"print(\"Trying to iterate again (should be empty):\")\n",
"for sq in squares_gen_expr:\n",
"    print(sq, end=\" \")  # prints nothing\n",
"print(\"\\n\")\n",
"\n",
"# As a function argument\n",
"data = [1, 2, 3, 4, 5, 6]\n",
"sum_of_even_squares = sum(x*x for x in data if x % 2 == 0)\n",
"# sum() consumes the values directly from the generator expression; no intermediate list is built\n",
"print(f\"Sum of even squares: {sum_of_even_squares}\")\n",
"\n",
"# When a generator expression is the only argument of a call, the extra parentheses can be omitted\n",
"sum_of_cubes = sum(x**3 for x in range(1, 4))\n",
"print(f\"Sum of cubes (1,2,3): {sum_of_cubes}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. The `itertools` Module\n",
"\n",
"The `itertools` module contains a set of functions for building efficient iterators. They are inspired by similar constructs in APL, Haskell, and SML.\n",
"\n",
"**Some commonly used `itertools` functions:**\n",
"\n",
"* **Infinite iterators:**\n",
"    * `count(start=0, step=1)`: an infinite sequence starting at `start` and increasing by `step`.\n",
"    * `cycle(iterable)`: repeats the elements of `iterable` forever.\n",
"    * `repeat(object[, times])`: repeats `object`, either `times` times or forever if `times` is omitted.\n",
"\n",
"* **Iterators that operate on finite sequences:**\n",
"    * `accumulate(iterable[, func, *, initial=None])`: yields running totals (or the running results of another binary function).\n",
"    * `chain(*iterables)`: joins several iterables into one sequence.\n",
"    * `compress(data, selectors)`: keeps the elements of `data` whose corresponding entry in `selectors` is truthy.\n",
"    * `dropwhile(predicate, iterable)`: skips elements while `predicate` is true, then yields every remaining element.\n",
"    * `filterfalse(predicate, iterable)`: yields the elements of `iterable` for which `predicate` is false.\n",
"    * `groupby(iterable, key=None)`: groups consecutive elements that share the same key (as computed by `key`).\n",
"    * `islice(iterable, stop)` or `islice(iterable, start, stop[, step])`: a slice of `iterable`, like list slicing but returning an iterator.\n",
"    * `starmap(function, iterable)`: like `map`, but each element of `iterable` is a tuple that is unpacked into the arguments of `function`.\n",
"    * `takewhile(predicate, iterable)`: yields elements from `iterable` as long as `predicate` is true.\n",
"    * `tee(iterable, n=2)`: returns `n` independent iterators that all draw from the same original `iterable`.\n",
"    * `zip_longest(*iterables, fillvalue=None)`: like `zip`, but pads the shorter iterables with `fillvalue` until the longest one is exhausted.\n",
"\n",
"* **Combinatoric generators:**\n",
"    * `product(*iterables, repeat=1)`: Cartesian product.\n",
"    * `permutations(iterable, r=None)`: permutations.\n",
"    * `combinations(iterable, r)`: combinations.\n",
"    * `combinations_with_replacement(iterable, r)`: combinations with repetition."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"\n",
"print(\"--- itertools.count --- \")\n",
"counter = itertools.count(10, 2)\n",
"for _ in range(5):\n",
"    print(next(counter), end=\" \")  # 10 12 14 16 18\n",
"print(\"\\n\")\n",
"\n",
"print(\"--- itertools.cycle --- \")\n",
"cycler = itertools.cycle(\"ABC\")\n",
"for _ in range(7):\n",
"    print(next(cycler), end=\" \")  # A B C A B C A\n",
"print(\"\\n\")\n",
"\n",
"print(\"--- itertools.chain --- \")\n",
"chained = itertools.chain([1, 2], \"XY\", (3, 4))\n",
"print(list(chained))  # [1, 2, 'X', 'Y', 3, 4]\n",
"\n",
"print(\"--- itertools.islice --- \")\n",
"sliced = itertools.islice(range(10), 2, 8, 2)  # from index 2 up to (not including) 8, step 2\n",
"print(list(sliced))  # [2, 4, 6]\n",
"\n",
"print(\"--- itertools.groupby --- \")\n",
"data = \"AAABBCDAA\"\n",
"for key, group in itertools.groupby(data):\n",
"    print(f\"Key: {key}, Group: {list(group)}\")\n",
"# Key: A, Group: ['A', 'A', 'A']\n",
"# Key: B, Group: ['B', 'B']\n",
"# Key: C, Group: ['C']\n",
"# Key: D, Group: ['D']\n",
"# Key: A, Group: ['A', 'A']\n",
"\n",
"print(\"--- itertools.combinations --- \")\n",
"combs = itertools.combinations(\"ABC\", 2)\n",
"print(list(combs))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]\n",
"\n",
"print(\"--- itertools.product --- \")\n",
"prod = itertools.product(\"AB\", \"12\")\n",
"print(list(prod))  # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]"
]
},
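{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. The `yield from` Statement (Python 3.3+)\n",
"\n",
"`yield from <iterable>` lets a generator delegate part of its work to another iterable (usually another generator).\n",
"\n",
"It mainly does the following:\n",
"1. Iterates over `<iterable>`.\n",
"2. Passes every value produced by `<iterable>` straight through to the caller of the current generator.\n",
"3. If `<iterable>` is itself a generator, `yield from` also forwards values and exceptions that arrive through `send()`, `throw()`, and `close()` to that subgenerator.\n",
"\n",
"**Uses:**\n",
"* **Simplifying nested generators**: avoids writing `for item in sub_generator: yield item` over and over.\n",
"* **Building coroutine pipelines** (although modern asynchronous code mostly uses `async/await`)."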
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def sub_generator(start, end):\n",
"    print(f\" sub_generator: called with {start}, {end}\")\n",
"    for i in range(start, end):\n",
"        print(f\" sub_generator: yielding {i}\")\n",
"        yield i\n",
"    print(\" sub_generator: finished\")\n",
"\n",
"def delegating_generator_manual(iterables_list):\n",
"    print(\"delegating_generator_manual: called\")\n",
"    for iterable in iterables_list:\n",
"        for item in iterable:  # iterate over the sub-iterable by hand\n",
"            yield item\n",
"    print(\"delegating_generator_manual: finished\")\n",
"\n",
"def delegating_generator_yield_from(iterables_list):\n",
"    print(\"delegating_generator_yield_from: called\")\n",
"    for iterable in iterables_list:\n",
"        # Delegate to the sub-iterable with yield from.\n",
"        # If iterable is a generator, yield from sets up a two-way channel to it.\n",
"        yield from iterable\n",
"    print(\"delegating_generator_yield_from: finished\")\n",
"\n",
"print(\"--- Testing yield from ---\")\n",
"data_sources = [\n",
"    sub_generator(1, 3),  # a generator\n",
"    \"XY\",                 # a string (iterable)\n",
"    (10, 11)              # a tuple (iterable)\n",
"]\n",
"\n",
"print(\"\\nUsing manual delegation:\")\n",
"# list() only makes a shallow copy of the list; the generator inside is still consumed here\n",
"for item in delegating_generator_manual(list(data_sources)):\n",
"    print(f\"Got item: {item}\")\n",
"\n",
"# Rebuild the data sources, because the generator above has been exhausted\n",
"data_sources_2 = [\n",
"    sub_generator(1, 3),\n",
"    \"XY\",\n",
"    (10, 11)\n",
"]\n",
"print(\"\\nUsing yield from:\")\n",
"for item in delegating_generator_yield_from(data_sources_2):\n",
"    print(f\"Got item: {item}\")"
]
},
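{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Advanced Generator Features: `send()`, `throw()`, `close()`\n",
"\n",
"Besides pulling values out of a generator with `next()`, you can send values or exceptions into it, or close it. These features let a generator act as a simple **coroutine** (the notion of coroutine that predates `async/await`).\n",
"\n",
"* **`generator.send(value)`**:\n",
"    * Sends a value into the generator; it becomes the result of the `yield` expression at which the generator is paused.\n",
"    * The generator resumes from that point and runs until the next `yield` (producing a value) or until it terminates.\n",
"    * When starting a generator for the first time (before the first `yield`), you must send `None` (or simply call `next(generator)`).\n",
"\n",
"* **`generator.throw(type[, value[, traceback]])`**:\n",
"    * Raises an exception at the point where the generator is paused (at the `yield` expression).\n",
"    * If the generator catches the exception, it can carry on and `yield` a value, exit normally (which raises `StopIteration`), or raise a different exception.\n",
"    * If the generator does not catch it, the exception propagates to the caller.\n",
"    * Since Python 3.12 this three-argument form is deprecated; pass an exception instance instead, as in `generator.throw(ValueError('msg'))`.\n",
"\n",
"* **`generator.close()`**:\n",
"    * Raises `GeneratorExit` at the point where the generator is paused.\n",
"    * A generator rarely needs to catch `GeneratorExit` explicitly; cleanup usually goes in `try/finally`. If it does catch it, it should then either re-raise `GeneratorExit` or return normally. (Since Python 3.7, raising `StopIteration` inside a generator body is converted to `RuntimeError`, per PEP 479.)\n",
"    * If the generator `yield`s a value in response to `close()`, a `RuntimeError` is raised.\n",
"    * After `close()`, further calls to `next()` or `send()` raise `StopIteration`."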
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def simple_coroutine():\n",
"    print(\"Coroutine started\")\n",
"    received_value = None\n",
"    try:\n",
"        while True:\n",
"            received_value = yield received_value  # the yield expression evaluates to the value passed to send()\n",
"            print(f\"Coroutine received: {received_value}\")\n",
"            if received_value == \"exit\":\n",
"                print(\"Coroutine exiting normally\")\n",
"                break\n",
"            received_value = f\"Processed: {received_value}\"\n",
"    except GeneratorExit:\n",
"        print(\"Coroutine: Caught GeneratorExit, cleaning up...\")\n",
"        # Perform cleanup here\n",
"        print(\"Coroutine: Cleaned up and closing.\")\n",
"        # Must not yield again; either re-raise GeneratorExit or simply return\n",
"    except ValueError as e:\n",
"        print(f\"Coroutine: Caught ValueError: {e}\")\n",
"        yield f\"Error handled: {e}\"  # optionally yield an error-handling result\n",
"    finally:\n",
"        print(\"Coroutine finally block executed\")\n",
"\n",
"print(\"--- Testing Coroutine send() ---\")\n",
"co = simple_coroutine()\n",
"next(co)  # prime the coroutine: runs to the first yield, where received_value is None\n",
"print(f\"Sent 10, got back: {co.send(10)}\")  # send 10; the next yield returns 'Processed: 10'\n",
"print(f\"Sent 'hello', got back: {co.send('hello')}\")  # send 'hello'; the next yield returns 'Processed: hello'\n",
"\n",
"print(\"\\n--- Testing Coroutine throw() ---\")\n",
"co2 = simple_coroutine()\n",
"next(co2)\n",
"try:\n",
"    # Pass an exception instance; throw(type, value) is deprecated since Python 3.12\n",
"    print(f\"Throwing ValueError, got back: {co2.throw(ValueError('Test error'))}\")\n",
"except ValueError as e:\n",
"    print(f\"Caller caught an unhandled error from coroutine: {e}\")  # only reached if the coroutine does not handle it\n",
"\n",
"# The except ValueError block sits outside the while loop, so once its yield is resumed\n",
"# the generator runs finally and finishes: this send() raises StopIteration.\n",
"try:\n",
"    co2.send('after error')\n",
"except StopIteration:\n",
"    print(\"co2 finished after handling the error (StopIteration).\")\n",
"\n",
"print(\"\\n--- Testing Coroutine close() ---\")\n",
"co3 = simple_coroutine()\n",
"next(co3)\n",
"co3.send(\"data before close\")\n",
"co3.close()  # closes the coroutine by raising GeneratorExit inside it\n",
"\n",
"try:\n",
"    next(co3)  # try to pull a value from the closed coroutine\n",
"except StopIteration:\n",
"    print(\"Caught StopIteration after close, as expected.\")\n",
"\n",
"print(\"\\n--- Testing Coroutine exit command ---\")\n",
"co4 = simple_coroutine()\n",
"next(co4)\n",
"co4.send(\"some data\")\n",
"try:\n",
"    co4.send(\"exit\")  # the coroutine handles 'exit' internally and ends normally\n",
"except StopIteration:\n",
"    print(\"Caught StopIteration after coroutine exited via 'exit' command.\")"
]
},
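{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although `async/await` is the preferred way to write asynchronous code and coroutines in modern Python, understanding these classic generator-based coroutine mechanisms helps in understanding the history of async in Python and the internals of some libraries."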
]
},
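{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Use Cases and Best Practices\n",
"\n",
"**When should you use iterators and generators?**\n",
"\n",
"* **Processing large datasets**: when the data cannot fit in memory all at once (for example, reading a large file or a database result set).\n",
"* **Infinite sequences**: counters, the Fibonacci sequence, streams of random numbers.\n",
"* **Data-processing pipelines**: chain several generators together to process data as a stream, with every stage evaluated lazily.\n",
"    ```python\n",
"    # lines = (line for line in open('large_file.txt'))\n",
"    # non_empty_lines = (line for line in lines if line.strip())\n",
"    # processed_lines = (process(line) for line in non_empty_lines)\n",
"    # for result in processed_lines:\n",
"    #     # ...\n",
"    ```\n",
"* **Classes that need custom iteration behavior**.\n",
"* **Replacing a simple list comprehension to save memory**, when the resulting list would be large and is not needed all at once.\n",
"\n",
"**Best practices:**\n",
"\n",
"1. **Prefer generator expressions**: for simple lazy sequences they are the most concise option.\n",
"2. **Use generator functions**: when the iteration logic is complex and needs several `yield`s or internal state.\n",
"3. **Reach for `itertools`**: before hand-writing complex iteration logic, check whether `itertools` already provides a solution.\n",
"4. **Remember that iterators are single-use**: if you need to iterate more than once, either recreate the iterator or generator, or store the results in a list (memory permitting).\n",
"5. **`yield from` flattens code**: use it when delegating to another iterable.\n",
"6. **Use the advanced generator methods (`send`, `throw`, `close`) sparingly**: they introduce more complex control flow and are unnecessary for most iteration tasks. Modern asynchronous code should prefer `async/await`.\n",
"\n",
"## Summary\n",
"\n",
"Iterators and generators are powerful, fundamental features of Python. They sit at the core of many built-in facilities (such as the `for` loop) and offer an elegant, efficient way to work with data streams and sequences.\n",
"\n",
"With a solid understanding of the iterator protocol, generator functions, generator expressions, and the `itertools` module, you can write code that is more Pythonic, more efficient, and more memory-friendly."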
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

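Some Reflections on Python as a Dynamic Language

Python is well known as a thoroughly dynamic language, which shows in the following:

  1. Dynamic type binding
  2. Runtime checking
  3. Object structure and contents can be modified dynamically (not just values)
  4. Reflection
  5. Everything is an object (instance, class, method)
  6. Code can be executed dynamically (eval, exec)
  7. Support for duck typing

A dynamic language imposes fewer constraints, which makes it easier to pick up. The price is high runtime overhead, and the code is so decoupled from the underlying machine-level execution that it is hard to know how it actually runs.

There are also several points that I consider fairly serious flaws. I go through them below.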

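It Undermines the Semantics of OOP

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Python can likewise be written in a purely imperative style for simple tasks, or in a full object-oriented style for complex ones.

However, its dynamic features undermine the structure of OOP:

  1. Blurred types: an instance of any type can have attributes or methods added or removed at runtime (by contrast, statically typed languages generally only let you change their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet its built-in __class__ attribute still points to the original type, which muddies what the type means. Conforming to a class should not be a matter of name only; the contents should conform as well.
  2. Broken inheritance, in two respects:
    1. Most code in practice does not inherit from abstract interfaces. The abc module provides ABC as a base class for abstract interfaces; the classic approach is to derive your own abstract class from ABC, derive concrete classes from that abstract class, and implement the abstract methods there. But PEP 544 introduced typing.Protocol, which many regard as the more Pythonic alternative to ABC: a concrete class inherits from no abstract class at all, and as long as it implements the required methods, a static checker treats it as conforming to the Protocol.
    2. There is no need to inherit from a concrete parent class. As in the previous point, a class with no parent at all (other than object) can still define methods with the same names and so expose the same call interface as the would-be parent. Semantically, the class definition then reveals no relationship to any other class. The whole codebase can be a loose collection in which no two classes are related by inheritance.
  3. Broken polymorphism: no parameter or return value restricts its type by default. Passing a subtype where a parent type is expected therefore carries little meaning, again because any type can be modified dynamically to satisfy the requirement.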
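
The points above can be sketched in a few lines. This is a minimal illustration, not a recommended pattern: the names `Quacker`, `Duck`, `Robot`, and `describe` are hypothetical and exist only for this example. It uses `typing.Protocol` with `runtime_checkable` to show that conformance is structural, and that attaching a method to a single instance changes what it conforms to while its class stays the same.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Quacker(Protocol):
    """Hypothetical structural interface: anything with quack() conforms."""

    def quack(self) -> str: ...


class Duck:
    """Conforms to Quacker without inheriting from it."""

    def quack(self) -> str:
        return "Quack"


class Robot:
    """Defines no quack() and has no relationship to Duck or Quacker."""


def describe(obj):
    """Return (class name, whether obj currently satisfies Quacker)."""
    return type(obj).__name__, isinstance(obj, Quacker)


robot = Robot()
print(describe(robot))    # the class is Robot and it does not conform

# Attach a method to this one instance at runtime.
robot.quack = lambda: "Beep"
print(describe(robot))    # the class is still Robot, but it now conforms

print(describe(Duck()))   # conforms structurally, with no inheritance
```

Note that `isinstance` against a runtime-checkable protocol only checks that the members exist; it does not verify signatures or return types, which is part of the looseness the essay describes.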

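It Undermines Design Patterns

Classic patterns such as Factory, Abstract Factory, and Visitor rely heavily on inheritance and polymorphism. In Python, however, the language's dynamic capabilities make these patterns largely redundant. A well-known library that does use a design pattern is transformers: its from_pretrained family is a factory, where a string name selects the concrete constructor and yields a concrete subclass, while the declared output type of the factory is a base class shared by all models.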
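
As a rough sketch of the string-keyed factory idea described above: the code below is not the transformers implementation, and every name in it (`BaseModel`, `register`, `create_model`, the two toy models) is made up for illustration. It only shows the shape of the pattern: a registry maps names to concrete classes, and the factory's declared return type is the common base class.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical common base class for every model the factory can build."""

    @abstractmethod
    def forward(self, text: str) -> str: ...


# Registry mapping a string name to a concrete model class.
_REGISTRY: dict[str, type[BaseModel]] = {}


def register(name: str):
    """Class decorator that records a concrete model under the given name."""
    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("upper")
class UpperModel(BaseModel):
    def forward(self, text: str) -> str:
        return text.upper()


@register("reverse")
class ReverseModel(BaseModel):
    def forward(self, text: str) -> str:
        return text[::-1]


def create_model(name: str) -> BaseModel:
    """Factory: pick the concrete class by name, return it typed as the base class."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None


model = create_model("upper")
print(type(model).__name__, model.forward("abc"))
```

Callers depend only on `BaseModel`, which is where inheritance still earns its keep in Python: the abstract base documents the contract and fails early if a subclass forgets to implement `forward`.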

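Security Concerns

Python code generally does not manage pointers directly, so out-of-bounds pointers, wild pointers, and dangling pointers are mostly non-issues. The gc mechanism also reclaims memory automatically, so this class of safety problem needs little attention while coding. In exchange, Python has security problems of its own. Attacking unmanaged code has traditionally been hard: for injected code to run reliably, it has to avoid corrupting the existing structure and crashing the program (segmentation fault). In Python, by contrast, arbitrary code can be injected to alter the original logic, and because that logic is not fixed content in a code segment, the attacker needs no such extra care. The contents of globals() and locals() can be modified by hand at runtime, which is a risk as well. Another danger is code failing because of type mismatches: types are only determined at runtime, so nothing can be guaranteed ahead of time, and a type error exception may crash the program.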
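
Two of the risks above can be sketched briefly, under the assumption of a toy program: `greet`, `call_safely`, and `Account` are hypothetical names introduced only for this example. The first part shows that a type annotation is not enforced, so a mismatch surfaces only as a runtime `TypeError`. The second shows that any code running in the same process can rebind a method and change existing behavior.

```python
def greet(name: str) -> str:
    # The annotation informs type checkers; Python itself does not enforce it.
    return "Hello, " + name


def call_safely(func, *args):
    """Illustrative helper: report a TypeError instead of letting it crash."""
    try:
        return func(*args)
    except TypeError as exc:
        return f"TypeError: {exc}"


class Account:
    def __init__(self, owner: str, admin: bool = False):
        self.owner = owner
        self.admin = admin

    def is_admin(self) -> bool:
        return self.admin


result_ok = call_safely(greet, "world")
result_bad = call_safely(greet, 42)   # wrong type, discovered only when executed
print(result_ok)
print(result_bad)

guest = Account("guest")
before = guest.is_admin()
# Rebinding the method on the class changes behavior for existing instances too.
Account.is_admin = lambda self: True
after = guest.is_admin()
print(before, after)
```

Running a static checker such as mypy or pyright over the annotated code would flag the `greet(42)` style of mistake before execution, which is one way to recover some of the guarantees the essay misses.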

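Summary

My background is in C++, but in recent years I have been programming in Python. By most popularity rankings Python has held first place for years, by a wide margin, and that is inseparable from its flexibility. For a language aimed at a broad audience, such flexibility is necessary. Despite all the ways listed above in which Python is less than rigorous, programmers can still choose to write rigorous object-oriented code. So the quality of a program depends less on the language than on the programmer. Programmers have a responsibility to write clear, well-structured, maintainable code~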

@KuRRe8

KuRRe8 commented May 8, 2025

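Whether you have an insight, a question, or just want to chat, feel free to post here!

There are quite a few documents, so sometimes an ipynb fails to render because of browser performance. Refreshing the page fixes it.

Alternatively, git clone the Gist and read it locally.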
