@KuRRe8
Last active June 6, 2025 17:35
A collection of tutorials on using Python, split into separate files by category.

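Python Tutorials

Python is a beginner-friendly language, and the machine learning community now relies heavily on Python (alongside C++, CUDA C, R, and others), which has kept Python's popularity firmly in first place. This Gist provides Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard library tutorials, covering basic usage of the most common standard library modules;
  3. Third-party library tutorials, mainly common libraries such as numpy and pytorch, covering basic usage only and ignoring new features.

Nothing else will go into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the relevant ipynb file directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the section you want, or click the hyperlinks below.

If you would like to contribute, leave a comment; if you have questions, post them in the comments as well ^.^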

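Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial focuses mainly on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal views on Python as a dynamic language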
{
"cells": [
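{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vector Database Client Library Tutorial (Focus on FAISS)\n",
"\n",
"Welcome to the vector database client library tutorial! As embeddings have proven effective at representing unstructured data such as text and images, **vector databases** (also called vector search engines) have become key infrastructure for modern AI applications. They are built to store high-dimensional vectors and to run **similarity search** efficiently, that is, to find the stored vectors most similar to a given query vector.\n",
"\n",
"**Why use a vector database or search library?**\n",
"\n",
"1. **Efficient retrieval**: On large vector datasets (millions or even billions of vectors), computing similarity with a plain linear scan is very slow. Vector databases use approximate nearest neighbor (ANN) algorithms, among others, to speed up search substantially.\n",
"2. **The core of RAG**: In retrieval-augmented generation (RAG), a vector database stores the embeddings of document chunks and uses the embedding of the user's question to find the relevant chunks quickly.\n",
"3. **Recommender systems**: Find other users or items whose embeddings are similar to those of a given user or item.\n",
"4. **Image/audio search**: Content-based image or audio retrieval.\n",
"5. **Deduplication**: Find similar texts or images.\n",
"\n",
"**This tutorial focuses on `faiss-cpu` (or `faiss-gpu`)**: \n",
"* FAISS (Facebook AI Similarity Search) is a highly efficient vector similarity search library developed by Facebook AI.\n",
"* It provides many index types, which can run in memory or on disk.\n",
"* It is a library rather than a database service; it is usually embedded in an application or called by other frameworks (such as LangChain, LlamaIndex).\n",
"\n",
"**Other popular vector databases/libraries (brief overview):**\n",
"* **ChromaDB**: Open source, local-first, easy to use, integrates well with LangChain/LlamaIndex.\n",
"* **Pinecone**: A commercial, fully managed, cloud-native vector database service.\n",
"* **Weaviate**: An open-source, cloud-native vector database with GraphQL support.\n",
"* **Milvus**: An open-source, cloud-native vector database.\n",
"\n",
"**This tutorial covers the core usage of FAISS:**\n",
"\n",
"1. Installing FAISS\n",
"2. Preparing sample vector data (text embeddings from Sentence Transformers)\n",
"3. Building a FAISS index (e.g. `IndexFlatL2`, `IndexIVFFlat`)\n",
"4. Adding vectors to the index\n",
"5. Running similarity search (`index.search()`)\n",
"6. (Overview) Saving and loading an index\n",
"7. (Overview) Integration with LangChain/LlamaIndex"
]
},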
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installing FAISS and Sentence Transformers\n",
"\n",
"```bash\n",
"# Install the CPU build of FAISS\n",
"pip install faiss-cpu\n",
"\n",
"# Or, if you have a compatible GPU and CUDA environment, install the GPU build\n",
"# pip install faiss-gpu\n",
"\n",
"# Install Sentence Transformers to generate text embeddings\n",
"pip install sentence-transformers\n",
"\n",
"# Other dependencies\n",
"pip install numpy pandas matplotlib\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import time\n",
"import os\n",
"\n",
"# Try to import faiss\n",
"try:\n",
"    import faiss\n",
"    print(f\"FAISS version: {faiss.__version__}\")\n",
"    faiss_available = True\n",
"except ImportError:\n",
"    print(\"FAISS library not found. Please install faiss-cpu or faiss-gpu.\")\n",
"    print(\"pip install faiss-cpu\")\n",
"    faiss_available = False\n",
"\n",
"# Try to import sentence-transformers\n",
"try:\n",
"    from sentence_transformers import SentenceTransformer\n",
"    print(\"SentenceTransformer imported successfully.\")\n",
"    st_available = True\n",
"    # Load a pretrained embedding model (downloaded automatically on first run)\n",
"    # 'all-MiniLM-L6-v2' is a widely used and relatively small model\n",
"    embedding_model = SentenceTransformer('all-MiniLM-L6-v2')\n",
"    embedding_dim = embedding_model.get_sentence_embedding_dimension()\n",
"    print(f\"Loaded Sentence Transformer model. Embedding dimension: {embedding_dim}\")\n",
"except ImportError:\n",
"    print(\"sentence-transformers library not found. Please install it: pip install sentence-transformers\")\n",
"    st_available = False\n",
"    embedding_model = None\n",
"    embedding_dim = None\n",
"except Exception as e:\n",
"    print(f\"Error loading Sentence Transformer model: {e}. Check internet connection or model name.\")\n",
"    st_available = False\n",
"    embedding_model = None\n",
"    embedding_dim = None\n",
"\n",
"# Helper: run func, print its wall-clock time, and return its result\n",
"def time_it(func, *args, **kwargs):\n",
"    start = time.time()\n",
"    result = func(*args, **kwargs)\n",
"    end = time.time()\n",
"    print(f\"Execution time: {end - start:.4f} seconds\")\n",
"    return result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Preparing Sample Vector Data\n",
"\n",
"We need a set of vectors to build an index from. Here we use `sentence-transformers` to turn a few sample texts into embedding vectors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Preparing Sample Data and Embeddings ---\")\n",
"\n",
"documents = [\n",
"    \"The cat sat on the mat.\",\n",
"    \"The dog chased the ball.\",\n",
"    \"Apples and oranges are fruits.\",\n",
"    \"Paris is the capital of France.\",\n",
"    \"The weather is sunny today.\",\n",
"    \"Machine learning models require data.\",\n",
"    \"A feline rested upon the rug.\", # Similar to sentence 1\n",
"    \"Information retrieval is key for RAG.\"\n",
"]\n",
"\n",
"doc_embeddings = None\n",
"if st_available and embedding_model:\n",
"    print(f\"Generating embeddings for {len(documents)} documents...\")\n",
"    # Use embedding_model.encode() to obtain the embedding vectors\n",
"    doc_embeddings = embedding_model.encode(documents)\n",
"    \n",
"    # FAISS requires NumPy arrays of dtype float32\n",
"    doc_embeddings = np.array(doc_embeddings).astype('float32')\n",
"    \n",
"    print(f\"Embeddings generated. Shape: {doc_embeddings.shape}\") # (num_documents, embedding_dim)\n",
"    # print(\"Sample embedding (first 5 dims of first doc):\")\n",
"    # print(doc_embeddings[0, :5])\n",
"else:\n",
"    print(\"Sentence Transformer model not available. Cannot generate embeddings.\")"
]
},
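{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Building a FAISS Index\n",
"\n",
"FAISS offers many index types, suited to different data sizes, memory limits, and trade-offs between search speed and accuracy.\n",
"\n",
"* **`faiss.IndexFlatL2`**: \n",
"    * The simplest index; performs exact brute-force L2 (Euclidean distance) search.\n",
"    * Requires no training.\n",
"    * Suited to small datasets; highest accuracy but slowest search.\n",
"* **`faiss.IndexFlatIP`**: Like `IndexFlatL2`, but uses the inner product as the similarity measure (equivalent to cosine similarity for normalized vectors).\n",
"* **`faiss.IndexIVFFlat`**: \n",
"    * An index based on an inverted file; a commonly used approximate nearest neighbor (ANN) method.\n",
"    * Requires a **training (train)** phase that uses part of the data (or all of it) to learn cluster centroids in the data space.\n",
"    * At search time it first finds the few centroids nearest the query vector, then searches only the lists attached to those centroids, which narrows the search.\n",
"    * Parameters: `quantizer` (usually an `IndexFlatL2`), `d` (vector dimension), `nlist` (number of cluster centroids).\n",
"    * The `nprobe` attribute controls how many cluster lists are examined per search (it trades speed against accuracy).\n",
"* **Other indexes**: `IndexPQ` (Product Quantization), `IndexHNSWFlat` (Hierarchical Navigable Small World graphs), and others, for larger scale or when higher compression or speed is needed."
]
},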
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\n--- Building FAISS Index --- \")\n",
"index_flat_l2 = None\n",
"index_ivf_flat = None\n",
"\n",
"if faiss_available and doc_embeddings is not None:\n",
"    d = embedding_dim # vector dimension\n",
"    n_docs = doc_embeddings.shape[0]\n",
"    \n",
"    # --- 1. IndexFlatL2 (exact, brute-force search) --- \n",
"    print(\"\\nBuilding IndexFlatL2...\")\n",
"    index_flat_l2 = faiss.IndexFlatL2(d)\n",
"    print(f\"IndexFlatL2 created. Is trained: {index_flat_l2.is_trained}\")\n",
"    print(f\"Initial vector count: {index_flat_l2.ntotal}\")\n",
"    \n",
"    # --- 2. IndexIVFFlat (approximate, requires training) --- \n",
"    print(\"\\nBuilding IndexIVFFlat...\")\n",
"    nlist = 4 # number of cluster centroids (one heuristic: between sqrt(N) and N/100; N is tiny here)\n",
"    quantizer = faiss.IndexFlatL2(d) # coarse quantizer: assigns vectors to centroids by L2 distance\n",
"    index_ivf_flat = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)\n",
"    # faiss.METRIC_L2: use L2 distance\n",
"    # faiss.METRIC_INNER_PRODUCT: use inner product\n",
"    \n",
"    print(f\"IndexIVFFlat created. Is trained: {index_ivf_flat.is_trained}\")\n",
"    \n",
"    # Train the index (learn the cluster centroids)\n",
"    # With this few vectors FAISS warns about too few training points per centroid; the demo still runs\n",
"    if n_docs >= nlist: # Need enough data to train\n",
"        print(f\"Training IndexIVFFlat with {n_docs} vectors...\")\n",
"        time_it(index_ivf_flat.train, doc_embeddings)\n",
"        print(f\"IndexIVFFlat trained: {index_ivf_flat.is_trained}\")\n",
"    else:\n",
"        print(f\"Skipping IndexIVFFlat training (need at least {nlist} vectors, got {n_docs}).\")\n",
"        index_ivf_flat = None # Cannot use untrained IVF index\n",
"    \n",
"else:\n",
"    print(\"FAISS or embeddings not available, cannot build index.\")"
]
},
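{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Adding Vectors to the Index\n",
"\n",
"Use the index object's `.add(vectors)` method to add vectors (which must be a float32 NumPy array) to the index.\n",
"For some indexes (such as IVF), each vector is assigned to the list of its nearest cluster centroid."
]
},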
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\n--- Adding Vectors to Index --- \")\n",
"\n",
"if index_flat_l2 and doc_embeddings is not None:\n",
"    print(\"Adding vectors to IndexFlatL2...\")\n",
"    index_flat_l2.add(doc_embeddings)\n",
"    print(f\"IndexFlatL2 vector count: {index_flat_l2.ntotal}\")\n",
"\n",
"if index_ivf_flat and index_ivf_flat.is_trained and doc_embeddings is not None:\n",
"    print(\"\\nAdding vectors to IndexIVFFlat...\")\n",
"    index_ivf_flat.add(doc_embeddings)\n",
"    print(f\"IndexIVFFlat vector count: {index_ivf_flat.ntotal}\")"
]
},
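{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Running Similarity Search (`index.search()`)\n",
"\n",
"Use the `.search(query_vectors, k)` method to find the `k` vectors most similar to each query vector.\n",
"\n",
"* `query_vectors`: a float32 NumPy array holding one or more query vectors (shape `[num_queries, dimension]`).\n",
"* `k`: the number of nearest neighbors to retrieve.\n",
"* Returns two arrays:\n",
"    * `D`: score array (shape `[num_queries, k]`) holding the score between the query vector and each nearest neighbor: the squared L2 distance for L2 indexes (smaller is closer), or the inner product for inner-product indexes (larger is more similar).\n",
"    * `I`: index array (shape `[num_queries, k]`) holding each nearest neighbor's position in the order the vectors were added; `-1` marks a slot with no result."
]
},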
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"\\n--- Performing Similarity Search --- \")\n",
"\n",
"if (index_flat_l2 or index_ivf_flat) and embedding_model:\n",
"    query_text = \" feline animal \" # Query related to 'cat sat on the mat'\n",
"    k = 3 # Find top 3 similar documents\n",
"    \n",
"    print(f\"Query text: '{query_text}'\")\n",
"    query_embedding = embedding_model.encode([query_text]).astype('float32')\n",
"    print(f\"Query embedding shape: {query_embedding.shape}\")\n",
"    \n",
"    # --- Search using IndexFlatL2 --- \n",
"    if index_flat_l2 and index_flat_l2.ntotal > 0:\n",
"        print(\"\\nSearching using IndexFlatL2...\")\n",
"        distances_flat, indices_flat = time_it(index_flat_l2.search, query_embedding, k)\n",
"        print(f\" Distances (squared L2): {distances_flat[0]}\")\n",
"        print(f\" Indices: {indices_flat[0]}\")\n",
"        print(\" Retrieved Documents (FlatL2):\")\n",
"        for i, idx in enumerate(indices_flat[0]):\n",
"            if 0 <= idx < len(documents):\n",
"                print(f\" {i+1}. Index={idx}, Dist={distances_flat[0][i]:.4f} - '{documents[idx]}'\")\n",
"            else:\n",
"                print(f\" {i+1}. Invalid index {idx} found.\")\n",
"    \n",
"    # --- Search using IndexIVFFlat --- \n",
"    if index_ivf_flat and index_ivf_flat.is_trained and index_ivf_flat.ntotal > 0:\n",
"        index_ivf_flat.nprobe = 2 # Search in the 2 nearest clusters (adjust for speed/accuracy)\n",
"        print(f\"\\nSearching using IndexIVFFlat (nprobe={index_ivf_flat.nprobe})...\")\n",
"        distances_ivf, indices_ivf = time_it(index_ivf_flat.search, query_embedding, k)\n",
"        print(f\" Distances (squared L2): {distances_ivf[0]}\")\n",
"        print(f\" Indices: {indices_ivf[0]}\")\n",
"        print(\" Retrieved Documents (IVFFlat):\")\n",
"        # Note: Indices might be -1 if fewer than k results are found in probed clusters\n",
"        for i, idx in enumerate(indices_ivf[0]):\n",
"            if idx != -1 and 0 <= idx < len(documents):\n",
"                print(f\" {i+1}. Index={idx}, Dist={distances_ivf[0][i]:.4f} - '{documents[idx]}'\")\n",
"            elif idx == -1:\n",
"                print(f\" {i+1}. No result found in probed clusters for this rank.\")\n",
"            else:\n",
"                print(f\" {i+1}. Invalid index {idx} found.\")\n",
"else:\n",
"    print(\"Index or embedding model not available, skipping search example.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. (Overview) Saving and Loading an Index\n",
"\n",
"A FAISS index can be saved to disk and reused later, which avoids rebuilding it.\n",
"\n",
"```python\n",
"# import faiss\n",
"# import os\n",
"\n",
"# index_to_save = index_flat_l2 # Or index_ivf_flat\n",
"# index_filename = \"my_faiss_index.index\"\n",
"\n",
"# # --- Save --- \n",
"# if index_to_save:\n",
"#     print(f\"Saving index to {index_filename}...\")\n",
"#     faiss.write_index(index_to_save, index_filename)\n",
"#     print(\"Index saved.\")\n",
"\n",
"# # --- Load --- \n",
"# if os.path.exists(index_filename):\n",
"#     print(f\"\\nLoading index from {index_filename}...\")\n",
"#     loaded_index = faiss.read_index(index_filename)\n",
"#     print(f\"Index loaded. Vector count: {loaded_index.ntotal}\")\n",
"#     # Ready to use loaded_index.search(...)\n",
"#     os.remove(index_filename) # Cleanup\n",
"# else:\n",
"#     print(\"Index file not found for loading.\")\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. (Overview) Integration with LangChain/LlamaIndex\n",
"\n",
"FAISS can serve as the vector store backend for both LangChain and LlamaIndex.\n",
"\n",
"**LangChain example:**\n",
"```python\n",
"# # Import paths match recent LangChain releases; older ones used langchain.vectorstores / langchain.embeddings\n",
"# from langchain_community.vectorstores import FAISS\n",
"# from langchain_openai import OpenAIEmbeddings # Or other embeddings\n",
"# from langchain_core.documents import Document\n",
"\n",
"# # Assuming 'split_docs' is a list of LangChain Document objects\n",
"# embeddings = OpenAIEmbeddings()\n",
"# vectorstore_lc = FAISS.from_documents(split_docs, embeddings)\n",
"# retriever_lc = vectorstore_lc.as_retriever()\n",
"# results = retriever_lc.invoke(\"your query\") # older releases: get_relevant_documents\n",
"# vectorstore_lc.save_local(\"faiss_langchain_index\")\n",
"# # load_local unpickles the docstore, so recent releases require an explicit opt-in; only load files you trust\n",
"# loaded_vectorstore_lc = FAISS.load_local(\"faiss_langchain_index\", embeddings, allow_dangerous_deserialization=True)\n",
"```\n",
"\n",
"**LlamaIndex example:**\n",
"```python\n",
"# # Import paths match llama-index 0.10+; older releases used llama_index.vector_stores / llama_index\n",
"# from llama_index.vector_stores.faiss import FaissVectorStore\n",
"# from llama_index.core import VectorStoreIndex, StorageContext\n",
"# import faiss # Need faiss installed\n",
"\n",
"# # Assuming 'nodes' is a list of LlamaIndex Node objects\n",
"# # Assuming Settings.embed_model is configured\n",
"\n",
"# # 1. Create FAISS index directly\n",
"# d = 384 # must equal your embedding model's output dimension (384 for all-MiniLM-L6-v2)\n",
"# faiss_index = faiss.IndexFlatL2(d)\n",
"\n",
"# # 2. Create FaissVectorStore wrapper\n",
"# vector_store_li = FaissVectorStore(faiss_index=faiss_index)\n",
"\n",
"# # 3. Create StorageContext and build index\n",
"# storage_context = StorageContext.from_defaults(vector_store=vector_store_li)\n",
"# index_li = VectorStoreIndex(nodes, storage_context=storage_context)\n",
"\n",
"# # Note: if you do not pass a vector_store explicitly, LlamaIndex uses its\n",
"# # default in-memory vector store rather than FAISS\n",
"\n",
"# # Persisting is usually done via the StorageContext\n",
"# # index_li.storage_context.persist(persist_dir=\"./faiss_llamaindex_index\")\n",
"\n",
"# # Loading (the FAISS store is restored first, then passed to the StorageContext)\n",
"# # from llama_index.core import load_index_from_storage, StorageContext\n",
"# # vector_store_load = FaissVectorStore.from_persist_dir(\"./faiss_llamaindex_index\")\n",
"# # storage_context_load = StorageContext.from_defaults(vector_store=vector_store_load, persist_dir=\"./faiss_llamaindex_index\")\n",
"# # loaded_index_li = load_index_from_storage(storage_context_load)\n",
"```\n",
"In most cases the wrappers provided by LangChain or LlamaIndex are more convenient, since they take care of the details of building the index, adding vectors, and searching."
]
},
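{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"FAISS is a powerful library for efficient vector similarity search and a key building block of modern AI applications, RAG systems in particular.\n",
"\n",
"**Key points:**\n",
"* Vector databases and index libraries store high-dimensional embedding vectors and retrieve them quickly.\n",
"* FAISS offers many index types that trade off speed, memory, and accuracy (`IndexFlatL2` is exact but slow; ANN indexes such as `IndexIVFFlat` are faster but approximate).\n",
"* The core operations are building an index (`faiss.Index...`), training it (if required), adding vectors (`.add()`), and searching (`.search()`).\n",
"* FAISS takes NumPy float32 arrays as input.\n",
"* It is usually combined with a text embedding model (such as Sentence Transformers).\n",
"* It can serve as the vector store backend for LangChain and LlamaIndex, which usually wrap the details of using it.\n",
"\n",
"Understanding the basics of vector search and the core usage of libraries such as FAISS is very helpful for building and optimizing embedding-based AI applications."
]
}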
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
}
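
The trade-off the notebook describes between exact search (`IndexFlatL2`) and inverted-file search (`IndexIVFFlat` with `nprobe`) can be sketched in plain Python without FAISS. This is only an illustration under simplifying assumptions, not how FAISS is implemented: the helper names (`flat_search`, `train_centroids`, `build_ivf`, `ivf_search`) are made up for this sketch, the clustering is a few rounds of naive k-means, and the data is random.

```python
import random

def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Exact search: score every stored vector against the query."""
    scored = sorted((sq_l2(v, query), i) for i, v in enumerate(vectors))
    return scored[:k]

def train_centroids(vectors, nlist, iters=10, seed=0):
    """Naive k-means: learn nlist centroids from the data."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda j: sq_l2(v, centroids[j]))
            buckets[nearest].append(v)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def build_ivf(vectors, centroids):
    """Inverted lists: for each centroid, the ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda j: sq_l2(v, centroids[j]))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Approximate search: scan only the nprobe lists nearest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: sq_l2(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in lists[j]]
    scored = sorted((sq_l2(vectors[i], query), i) for i in candidates)
    return scored[:k]

rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(8)]

centroids = train_centroids(data, nlist=8)
lists = build_ivf(data, centroids)

exact = flat_search(data, query, k=3)
approx = ivf_search(data, centroids, lists, query, k=3, nprobe=2)
full = ivf_search(data, centroids, lists, query, k=3, nprobe=8)
```

Probing every list (`nprobe` equal to `nlist`) scans all vectors and so reproduces the exact result, while a smaller `nprobe` scans fewer candidates and may miss a true neighbor. That is the speed/accuracy dial the notebook refers to.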

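Some reflections on Python as a dynamic language

As is well known, Python is a fully dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checking
  3. Object structure, not just values, can be modified dynamically
  4. Reflection
  5. Everything is an object (instance, class, method)
  6. Code can be executed dynamically (eval, exec)
  7. Duck typing support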
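
Several of the features listed above fit in a few lines. The sketch below is illustrative only; the class and function names (`Duck`, `Robot`, `make_it_speak`) are invented for the example.

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def make_it_speak(thing):
    # Duck typing: any object with a speak() method is accepted,
    # regardless of its class.
    return thing.speak()

x = 1       # the name x is bound to an int ...
x = "one"   # ... and later rebound to a str (dynamic type binding)

d = Duck()
d.nickname = "Donald"            # add an attribute to one instance at runtime
Duck.fly = lambda self: "Flap"   # add a method to the class at runtime

method_name = "speak"
via_reflection = getattr(d, method_name)()   # reflection: look up a method by name

computed = eval("2 ** 10")       # dynamic code execution

sounds = [make_it_speak(obj) for obj in (Duck(), Robot())]
```

Note that `d.fly()` works even though `d` was created before `fly` was attached to the class, because attribute lookup happens at call time.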

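Dynamic languages impose fewer constraints, which makes them easier for newcomers to pick up. The price is high runtime overhead and a complete decoupling from the underlying machine-level execution, so it is hard to know how the code actually runs.

There are also a few points that I consider fairly serious flaws. They are laid out below.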

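Undermining OOP semantics

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Python, likewise, can be purely imperative (Imperative Programming) for simple tasks, or use full object-oriented programming for complex ones.

However, its dynamic features undermine the structure of OOP:

  1. Blurred types: any instance of any type can have attributes or methods added or removed at runtime (by contrast, a static language only lets you change their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet the instance's built-in __class__ attribute still points to the original type, which muddles one's understanding of the type. Conforming to a class should not be merely nominal; the content should conform as well.
  2. Broken inheritance, in two respects:
    1. Most practice has no virtual-interface inheritance. The abc module provides ABC, a base class for virtual interfaces; the classic approach is to derive your abstract class from ABC, derive the concrete class from your abstract class, and then implement the abstract methods. But PEP 544 introduced typing.Protocol, and using it in place of ABC is regarded as the Pythonic approach: the concrete class inherits from no abstract class at all, and as long as it implements the required methods, a static checker treats it as conforming to the Protocol.
    2. No need to inherit from a concrete parent class. As with the previous point, even a class with no parent (other than object) can define methods with the same names, giving it the same call interface as the parent's methods. Semantically, the class definition then reveals nothing about its relationship to other classes. The result can be an entirely loose organization in which no two classes are related by inheritance.
  3. Broken polymorphism: no parameter or return value is restricted by type by default. This makes passing a subtype where a parent type is expected rather meaningless, again because any type can be modified dynamically to meet the requirement.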
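
The points above can be made concrete with a short sketch. It contrasts nominal conformance through `abc.ABC` with structural conformance through `typing.Protocol`, and shows an instance being reshaped at runtime while its `__class__` stays the same. The names (`Shape`, `Square`, `HasArea`, `Circle`, `total_area`) are hypothetical, and `@runtime_checkable` is used only so that the structural check can be shown with `isinstance`.

```python
import math
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable

class Shape(ABC):                 # nominal interface: subclasses must inherit
    @abstractmethod
    def area(self) -> float: ...

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

@runtime_checkable
class HasArea(Protocol):          # structural interface: no inheritance needed
    def area(self) -> float: ...

class Circle:                     # no parent other than object
    def __init__(self, r: float) -> None:
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2

def total_area(shapes: list[HasArea]) -> float:
    # The annotation is a hint for static checkers; nothing enforces it at runtime.
    return sum(s.area() for s in shapes)

circle = Circle(1.0)
total = total_area([Square(2.0), circle])

# Blurred types: reshape one instance at runtime.
sq = Square(2.0)
del sq.side                       # remove an attribute the class normally sets
sq.area = lambda: -1.0            # shadow the method on this instance only
still_square = isinstance(sq, Square)   # True: __class__ is unchanged
```

`Circle` satisfies `HasArea` without any inheritance relationship to `Shape`, which is the loose organization described in point 2, while `sq` is still reported as a `Square` even though it no longer has a `side` and its `area()` no longer computes an area.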

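Undermining design patterns

Classic patterns such as Factory, Abstract Factory, and Visitor depend heavily on inheritance and polymorphism. In Python's design, however, the dynamic capabilities leave these patterns largely unnecessary. One familiar library that does use a design pattern is transformers: its from_pretrained family is a factory pattern, where a string name selects the concrete constructor and yields a concrete subclass, while the factory's declared output type is a base class shared by all models.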
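
A name-based factory of the kind described above can be sketched as follows. This is a hypothetical miniature, not the transformers implementation: `BaseModel`, `from_name`, `TinyEncoder`, and `TinyDecoder` are invented names, and the registry is filled through `__init_subclass__`, which is just one of several ways to do it.

```python
class BaseModel:
    # Maps a string name to the concrete subclass registered under it.
    registry: dict[str, type] = {}

    def __init_subclass__(cls, name: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if name:
            BaseModel.registry[name] = cls

    @classmethod
    def from_name(cls, name: str) -> "BaseModel":
        """Factory: the string selects the constructor; callers see the base type."""
        try:
            return cls.registry[name]()
        except KeyError:
            raise ValueError(f"unknown model name: {name!r}") from None

    def describe(self) -> str:
        raise NotImplementedError

class TinyEncoder(BaseModel, name="tiny-encoder"):
    def describe(self) -> str:
        return "encoder"

class TinyDecoder(BaseModel, name="tiny-decoder"):
    def describe(self) -> str:
        return "decoder"

model = BaseModel.from_name("tiny-decoder")
```

The declared return type is `BaseModel`, yet the object that comes back is a `TinyDecoder`, so correct use still relies on the subclasses honoring the base interface.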

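Security issues

Python code generally does not manage pointers directly, so out-of-bounds pointers, wild pointers, dangling pointers, and similar problems generally do not arise. The gc mechanism also reclaims garbage automatically, so these safety concerns need no attention while coding. On the other hand, Python has security issues of its own. Attacking unmanaged code used to be relatively hard: for injected code to run reliably, it had to avoid corrupting the original structure and crashing the program outright (segmentation fault). Python, by contrast, lets arbitrary code be injected to alter the original logic, and because that logic is not fixed content in a code segment, the attacker needs no such extra care. The contents of globals() and locals() can be modified by hand at runtime, which also carries some risk. Another danger is code failing because of type mismatches: types are only determined at runtime, so no guarantee can be given in advance, and a type error exception may be raised and crash the program.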
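
Both risks mentioned above are easy to reproduce. The sketch below is a toy, with made-up functions (`is_admin`, `check`, `add_one`): it rebinds a module-level name through `globals()` to change what existing code does, and then triggers a type error that only surfaces at runtime. It assumes the code runs at module level, as in a script or notebook cell.

```python
def is_admin(user: str) -> bool:
    return user == "root"

def check(user: str) -> str:
    # is_admin is looked up by name in the module globals on every call.
    return "granted" if is_admin(user) else "denied"

before = check("guest")                       # "denied"

# Any code running in the same interpreter can rebind the global name.
original = globals()["is_admin"]
globals()["is_admin"] = lambda user: True
after = check("guest")                        # "granted": the logic was altered
globals()["is_admin"] = original              # restore the original function

def add_one(n: int) -> int:
    return n + 1    # the annotation is not enforced at runtime

try:
    add_one("41")   # str + int only fails when this line actually executes
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
```

Nothing in `check` changed, yet its behavior did. A static type checker would flag the `add_one("41")` call, but the interpreter itself accepts it until the addition is attempted.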

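Summary

I come from a C++ background, but in recent years I have been programming in Python. Python has also held first place in market share for years, by a wide margin, and that is inseparable from its flexibility. For a programming language aimed at the general public, such properties are necessary. Despite all the ways Python lacks rigor listed above, programmers can still choose a rigorous object-oriented style. So the quality of a program depends not on the language but on the programmer. Programmers have a responsibility to write maintainable, clear, and well-structured code~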

@KuRRe8
Author

KuRRe8 commented May 8, 2025

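Back to top

Insights, questions, or just idle chatter: all are welcome here!

Because there are many documents, the ipynb files sometimes fail to render; this is a browser performance issue, and refreshing the page fixes it.

Or git clone the Gist and read it locally.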
