@KuRRe8
Last active June 6, 2025 17:35
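Tutorials on using Python, split into separate files by category

Python Tutorials

Python is a beginner-friendly language, and the machine learning community now depends heavily on Python together with C++, CUDA C, and R, which has kept Python at the top of the popularity rankings. This Gist provides Python tutorials that can be run directly in Jupyter Notebook.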

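  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard library tutorials, covering basic usage of the most common standard library modules;
  3. Third-party library tutorials, mainly for common libraries such as numpy and pytorch, covering basic usage only and ignoring new features.

Nothing else goes into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the relevant ipynb file directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the section you want, or click the links below.

To contribute, leave a message in the comments. Questions go in the comments too ^.^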

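Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial focuses on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal views on Python as a dynamic language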
{
"cells": [
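{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LangChain: LLM Application Development Framework Tutorial\n",
"\n",
"Welcome to the LangChain tutorial! LangChain is an open-source framework designed to simplify the development of applications built on large language models (LLMs). It provides modular components, Chains, and Agents that let developers connect LLMs to external data sources, compute resources, and APIs to build more capable AI applications.\n",
"\n",
"**Why use LangChain?**\n",
"\n",
"1. **Modularity and composability**: Offers standardized interfaces and composable building blocks (Models, Prompts, Indexes, Chains, Memory, Agents) for assembling applications flexibly.\n",
"2. **Integration with LLMs and external tools**: Connects to different LLM providers (OpenAI, Hugging Face Hub, Anthropic, and others) and to tools such as search, calculators, databases, and APIs.\n",
"3. **Data awareness**: Helps an LLM connect to private or domain-specific data sources for knowledge-grounded question answering or content generation (RAG).\n",
"4. **State management**: Provides Memory components to maintain conversation history or application state.\n",
"5. **Agentic applications**: Supports building agents that plan and carry out tasks autonomously.\n",
"6. **Active community and ecosystem**: Evolves quickly, with many examples and third-party integrations.\n",
"\n",
"**Core components at a glance (this tutorial focuses on some of them):**\n",
"* **Models**: Interact with language models (LLMs, ChatModels, Text Embedding Models).\n",
"* **Prompts**: Manage and optimize model inputs (Prompt Templates, Example Selectors).\n",
"* **Indexes**: Build and query external data (Document Loaders, Text Splitters, Vector Stores, Retrievers).\n",
"* **Chains**: Combine multiple components in sequence or by logic (LCEL, the LangChain Expression Language; `SequentialChain`; and so on).\n",
"* **Memory**: Keep state (conversation history) between Chain or Agent calls.\n",
"* **Agents & Tools**: Let an LLM use tools to take actions.\n",
"\n",
"**This tutorial covers LangChain's core concepts and basic usage:**\n",
"\n",
"1. Installation and setup (API keys)\n",
"2. Models: interacting with LLMs and Chat Models\n",
"3. Prompts: using Prompt Templates\n",
"4. LCEL (LangChain Expression Language): building a simple Chain\n",
"5. Indexes: document loading, splitting, embedding, and vector storage (basics)\n",
"6. Retrievers: retrieving information from an index\n",
"7. Building a simple RAG (Retrieval-Augmented Generation) Chain\n",
"8. (Overview) Memory\n",
"9. (Overview) Agents & Tools"
]
},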
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Installation and Setup\n",
"\n",
"First, install the LangChain core library. You also need the library for whichever LLM provider you plan to use (for example `openai`) and possibly other integration libraries (such as `tiktoken` for OpenAI tokenization, and `faiss-cpu` or `chromadb` for vector storage).\n",
"\n",
"```bash\n",
"pip install langchain openai tiktoken python-dotenv\n",
"\n",
"# If you need a vector store (FAISS example)\n",
"pip install faiss-cpu # or faiss-gpu if you have a GPU and CUDA\n",
"\n",
"# If you need Hugging Face models\n",
"# pip install huggingface_hub transformers sentence-transformers\n",
"```\n",
"\n",
"**Setting API keys**: \n",
"Many LLM services (such as OpenAI) require an API key. The best practice is to **never hard-code the key in your code**. Use environment variables or a `.env` file instead.\n",
"1. Create a file named `.env` in your project root.\n",
"2. Add your key to the `.env` file: `OPENAI_API_KEY='your_api_key_here'`\n",
"3. Load it with the `python-dotenv` library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"import os\n",
"from dotenv import load_dotenv\n",
"\n",
"# Try to load environment variables from a .env file\n",
"load_success = load_dotenv()  # True if a .env file was found and set at least one variable\n",
"print(f\"LangChain version: {langchain.__version__}\")\n",
"print(f\".env file loaded: {load_success}\")\n",
"\n",
"# Check whether the OpenAI API key is set (read from the environment)\n",
"openai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
"if openai_api_key:\n",
"    print(\"OpenAI API Key found in environment variables.\")\n",
"    # For safety, do not print the key itself\n",
"    # print(f\"OpenAI API Key starts with: {openai_api_key[:5]}...\")\n",
"    openai_available = True\n",
"else:\n",
"    print(\"OpenAI API Key not found. Please set it in your environment or a .env file to run OpenAI examples.\")\n",
"    openai_available = False\n",
"\n",
"# The key can be set here, but this is not recommended for shared code\n",
"# import openai\n",
"# openai.api_key = \"sk-...\"\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
]
},
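{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Models: Interacting with LLMs and Chat Models\n",
"\n",
"LangChain provides standard interfaces for interacting with different kinds of language models.\n",
"\n",
"* **LLMs**: Text-completion models that take a string and return a string (for example OpenAI's `text-davinci-003`, a legacy model that OpenAI has since retired).\n",
"* **ChatModels**: Chat-message models that take a list of messages (System, Human, AI) and return one AI message (for example OpenAI's `gpt-3.5-turbo` and `gpt-4`).\n",
"* **Text Embedding Models**: Convert text into vector representations (embeddings)."
]
},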
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI  # Chat model interface\n",
"# from langchain.llms import OpenAI  # Legacy completion-style LLM interface\n",
"from langchain.schema import HumanMessage, SystemMessage, AIMessage\n",
"\n",
"print(\"--- Interacting with Models ---\")\n",
"\n",
"if openai_available:\n",
"    # --- Chat model (recommended) ---\n",
"    print(\"\\nUsing ChatOpenAI (gpt-3.5-turbo by default):\")\n",
"    try:\n",
"        # temperature controls randomness (0 = near-deterministic, higher = more random)\n",
"        chat = ChatOpenAI(temperature=0.7)\n",
"\n",
"        # A single human message\n",
"        human_message = HumanMessage(content=\"Explain the concept of 'gradient descent' in one sentence.\")\n",
"        ai_response = chat.invoke([human_message])\n",
"        print(f\"Response to simple query: {ai_response.content}\")\n",
"        print(f\"Response type: {type(ai_response)}\")\n",
"\n",
"        # A system message plus a multi-turn conversation\n",
"        messages = [\n",
"            SystemMessage(content=\"You are a helpful assistant that translates English to French.\"),\n",
"            HumanMessage(content=\"Translate: 'Hello, how are you?'\")\n",
"        ]\n",
"        ai_response_french = chat.invoke(messages)\n",
"        print(f\"\\nResponse to translation query: {ai_response_french.content}\")\n",
"\n",
"        # Append the AI reply to the message list to continue the conversation\n",
"        messages.append(ai_response_french)\n",
"        messages.append(HumanMessage(content=\"And 'Thank you'?\"))\n",
"        ai_response_thanks = chat.invoke(messages)\n",
"        print(f\"Response to 'Thank you': {ai_response_thanks.content}\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"Error interacting with ChatOpenAI: {e}\")\n",
"        print(\"This might be due to invalid API key, network issues, or OpenAI service status.\")\n",
"\n",
"    # --- (Optional) Legacy LLM interface ---\n",
"    # print(\"\\nUsing OpenAI LLM (text-davinci-003 - might be deprecated):\")\n",
"    # try:\n",
"    #     llm = OpenAI(temperature=0.9)\n",
"    #     text_prompt = \"What is the capital of France?\"\n",
"    #     completion = llm(text_prompt)\n",
"    #     print(f\"LLM completion for '{text_prompt}': {completion.strip()}\")\n",
"    # except Exception as e:\n",
"    #     print(f\"Error interacting with OpenAI LLM: {e}\")\n",
"\n",
"else:\n",
"    print(\"Skipping Model interaction examples because OpenAI API Key is not configured.\")"
]
},
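{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Prompts: Using Prompt Templates\n",
"\n",
"Prompt Templates help build the instructions (prompts) sent to an LLM dynamically and consistently.\n",
"\n",
"* **`PromptTemplate`**: For simple templates with a few variables.\n",
"* **`ChatPromptTemplate`**: For building a list of chat messages, which can include system message templates, human message templates, and so on."
]
},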
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate, ChatPromptTemplate\n",
"from langchain.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate\n",
"\n",
"print(\"--- Using Prompt Templates ---\")\n",
"\n",
"# --- PromptTemplate (for LLMs) ---\n",
"template_llm = \"Tell me a short joke about a {subject}.\"\n",
"prompt_llm = PromptTemplate(input_variables=[\"subject\"], template=template_llm)\n",
"formatted_prompt_llm = prompt_llm.format(subject=\"computer\")\n",
"print(f\"Formatted LLM prompt:\\n{formatted_prompt_llm}\")\n",
"\n",
"# --- ChatPromptTemplate (for Chat Models) ---\n",
"system_template = \"You are a helpful assistant who writes concise summaries in {language}.\"\n",
"human_template = \"Summarize the following text: {text}\"\n",
"\n",
"system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)\n",
"human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)\n",
"\n",
"chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])\n",
"\n",
"# Format the chat prompt into a list of messages\n",
"formatted_chat_messages = chat_prompt.format_messages(\n",
"    language=\"Spanish\",\n",
"    text=\"LangChain provides modules for models, prompts, indexes, memory, chains, and agents. It aims to simplify building complex applications powered by large language models.\"\n",
")\n",
"\n",
"print(\"\\nFormatted Chat Messages:\")\n",
"for msg in formatted_chat_messages:\n",
"    # Show at most the first 50 characters of each message\n",
"    preview = msg.content if len(msg.content) <= 50 else f\"{msg.content[:50]}...\"\n",
"    print(f\"  Type: {type(msg).__name__}, Content: '{preview}'\")\n",
"\n",
"# When talking to a Chat Model you can also use format_prompt().to_messages() directly\n",
"# formatted_prompt_value = chat_prompt.format_prompt(language=\"Spanish\", text=\"...\")\n",
"# messages_for_llm = formatted_prompt_value.to_messages()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. LCEL (LangChain Expression Language): Building a Simple Chain\n",
"\n",
"LCEL is at the core of LangChain. It offers a declarative way to compose LangChain components (such as a Prompt, a Model, and an Output Parser) with the `|` operator, much like a shell pipe.\n",
"\n",
"**Benefits**: \n",
"* **Composability**: Components connect easily.\n",
"* **Streaming**: Streamed output is supported.\n",
"* **Batching**: Multiple inputs are processed efficiently.\n",
"* **Async support**.\n",
"* **Observability**: Integrates well with tools such as LangSmith."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.output_parser import StrOutputParser\n",
"\n",
"print(\"--- Using LCEL to build a simple chain ---\")\n",
"\n",
"# Define a simple chain: Prompt -> Chat Model -> String Output Parser\n",
"# 1. Prompt Template (the chat_prompt defined earlier)\n",
"# 2. Chat Model (the chat instance created earlier)\n",
"# 3. Output Parser (converts the AI message to a string)\n",
"output_parser = StrOutputParser()\n",
"\n",
"if openai_available:\n",
"    # Define the chain; the | operator connects the components\n",
"    summarization_chain = chat_prompt | chat | output_parser\n",
"    print(f\"Chain created: {summarization_chain}\")\n",
"\n",
"    # Call the chain with invoke\n",
"    input_dict = {\n",
"        \"language\": \"English\",\n",
"        \"text\": \"The weather today is sunny with a high of 25 degrees Celsius. There is a slight breeze from the west.\"\n",
"    }\n",
"    try:\n",
"        summary_result = summarization_chain.invoke(input_dict)\n",
"        print(\"\\nInvoking the summarization chain:\")\n",
"        print(f\"Input: {input_dict['text']}\")\n",
"        print(f\"Output Summary (English): {summary_result}\")\n",
"\n",
"        # Try a different language\n",
"        input_dict_fr = {**input_dict, \"language\": \"French\"}\n",
"        summary_result_fr = summarization_chain.invoke(input_dict_fr)\n",
"        print(f\"\\nOutput Summary (French): {summary_result_fr}\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"\\nError invoking chain: {e}\")\n",
"\n",
"else:\n",
"    print(\"Skipping LCEL chain example (OpenAI Key not available).\")"
]
},
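{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Indexes: Document Loading, Splitting, Embedding, and Vector Storage (Basics)\n",
"\n",
"For an LLM to make use of external or private data (that is, to implement RAG), the data has to be processed first:\n",
"\n",
"1. **Load**: Use a `DocumentLoader` to load data from various sources (files, web pages, databases, and so on) into `Document` objects.\n",
"2. **Split**: Use a `TextSplitter` to split long documents into smaller, semantically related chunks.\n",
"3. **Store**: \n",
"    * Use a **Text Embedding Model** to convert each chunk into a numeric vector (embedding).\n",
"    * Save the text chunks and their embeddings in a **Vector Store** for efficient retrieval."
]
},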
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.embeddings import OpenAIEmbeddings  # Requires an OpenAI key\n",
"# from langchain.embeddings import HuggingFaceEmbeddings  # Hugging Face models (local or Hub)\n",
"from langchain.vectorstores import FAISS  # FAISS as the vector store\n",
"# from langchain.vectorstores import Chroma  # Another popular local vector store\n",
"\n",
"print(\"--- Indexing Example (Load, Split, Embed, Store) ---\")\n",
"\n",
"# --- 0. Prepare a sample text file ---\n",
"sample_text_content = \"\"\"\n",
"LangChain is a framework for developing applications powered by language models.\n",
"It enables applications that are: Data-aware and Agentic.\n",
"Data-aware: connect a language model to other sources of data.\n",
"Agentic: allow a language model to interact with its environment.\n",
"The main value props of LangChain are: Components and Use-Cases.\n",
"Components: modular abstractions for working with language models.\n",
"Use-Cases: end-to-end chains for common applications like RAG and Agents.\n",
"LCEL, or LangChain Expression Language, is a declarative way to compose chains.\n",
"\"\"\"\n",
"sample_text_path = \"langchain_sample.txt\"\n",
"with open(sample_text_path, \"w\") as f:\n",
"    f.write(sample_text_content)\n",
"print(f\"Sample text file '{sample_text_path}' created.\")\n",
"\n",
"# --- 1. Load the document ---\n",
"loader = TextLoader(sample_text_path)\n",
"documents = loader.load()\n",
"print(f\"\\nLoaded {len(documents)} document(s).\")\n",
"print(f\"First document content (preview):\\n{documents[0].page_content[:100]}...\")\n",
"\n",
"# --- 2. Split the document ---\n",
"# RecursiveCharacterTextSplitter tries paragraph breaks first, then line breaks, then spaces,\n",
"# so that related text tends to stay in the same chunk\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
"    chunk_size=150,   # Maximum number of characters per chunk\n",
"    chunk_overlap=20  # Number of characters shared between adjacent chunks\n",
")\n",
"split_docs = text_splitter.split_documents(documents)\n",
"print(f\"\\nDocument split into {len(split_docs)} chunks.\")\n",
"print(\"First chunk example:\")\n",
"print(split_docs[0].page_content)\n",
"\n",
"# --- 3. Create the embedding model ---\n",
"embeddings = None\n",
"if openai_available:\n",
"    try:\n",
"        embeddings = OpenAIEmbeddings()\n",
"        print(\"\\nOpenAIEmbeddings model created.\")\n",
"        # Example: get the embedding vector for the first chunk\n",
"        # vector = embeddings.embed_query(split_docs[0].page_content)\n",
"        # print(f\"Embedding vector dimension: {len(vector)}\")\n",
"    except Exception as e:\n",
"        print(f\"Error creating OpenAI embeddings: {e}\")\n",
"else:\n",
"    print(\"\\nSkipping embedding creation (OpenAI key unavailable).\")\n",
"    # # Alternatively, use a Hugging Face model (requires sentence-transformers)\n",
"    # try:\n",
"    #     from langchain.embeddings import HuggingFaceEmbeddings\n",
"    #     model_name = \"sentence-transformers/all-MiniLM-L6-v2\"\n",
"    #     embeddings = HuggingFaceEmbeddings(model_name=model_name)\n",
"    #     print(f\"HuggingFaceEmbeddings model '{model_name}' created.\")\n",
"    # except ImportError:\n",
"    #     print(\"HuggingFaceEmbeddings requires 'sentence-transformers'. Install it.\")\n",
"    # except Exception as e:\n",
"    #     print(f\"Error creating HuggingFace embeddings: {e}\")\n",
"\n",
"# --- 4. Create and populate the vector store ---\n",
"vector_store = None\n",
"if embeddings and split_docs:\n",
"    try:\n",
"        # FAISS keeps the index in memory\n",
"        print(\"Creating FAISS vector store from documents...\")\n",
"        vector_store = FAISS.from_documents(split_docs, embeddings)\n",
"        print(f\"FAISS vector store created. Index contains {vector_store.index.ntotal} vectors.\")\n",
"\n",
"        # The FAISS index can be saved and loaded\n",
"        # (newer releases also require allow_dangerous_deserialization=True in load_local)\n",
"        # vector_store.save_local(\"faiss_index_langchain\")\n",
"        # new_vector_store = FAISS.load_local(\"faiss_index_langchain\", embeddings)\n",
"\n",
"    except Exception as e:\n",
"        print(f\"Error creating or using FAISS vector store: {e}\")\n",
"else:\n",
"    print(\"\\nSkipping vector store creation (embeddings or documents unavailable).\")\n",
"\n",
"# Clean up the text file\n",
"if os.path.exists(sample_text_path):\n",
"    os.remove(sample_text_path)"
]
},
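{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Retrievers: Retrieving Information from an Index\n",
"\n",
"Vector stores usually provide an `.as_retriever()` method that returns a Retriever object. Given a user query, the Retriever finds the most relevant document chunks in the vector store."
]
},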
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Using Retriever --- \")\n",
"\n",
"if vector_store:\n",
"    retriever = vector_store.as_retriever(search_kwargs={\"k\": 2})  # Fetch the 2 most relevant chunks\n",
"    print(\"Retriever created from FAISS vector store.\")\n",
"\n",
"    query = \"What is LCEL?\"\n",
"    try:\n",
"        relevant_docs = retriever.invoke(query)\n",
"        print(f\"\\nQuery: '{query}'\")\n",
"        print(f\"Retrieved {len(relevant_docs)} relevant documents:\")\n",
"        for i, doc in enumerate(relevant_docs):\n",
"            # Flatten newlines outside the f-string: a backslash inside an\n",
"            # f-string expression is a SyntaxError before Python 3.12\n",
"            flat_content = doc.page_content.replace('\\n', ' ')\n",
"            print(f\"  Doc {i+1}: {flat_content}\")\n",
"            # print(f\"  Metadata: {doc.metadata}\")  # Usually holds source information\n",
"    except Exception as e:\n",
"        print(f\"Error during retrieval: {e}\")\n",
"else:\n",
"    print(\"Vector store not available, skipping retriever example.\")\n",
"    retriever = None  # Define retriever as None if not created"
]
},
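{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Building a Simple RAG (Retrieval-Augmented Generation) Chain\n",
"\n",
"RAG combines information retrieval with language model generation. The basic flow:\n",
"1. Receive the user's question.\n",
"2. Use a Retriever to find relevant document chunks.\n",
"3. Feed the question and the retrieved context into a Prompt Template.\n",
"4. Send the formatted prompt to the LLM.\n",
"5. The LLM generates an answer grounded in the context.\n",
"\n",
"LCEL makes this flow straightforward to build."
]
},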
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"print(\"--- Building a Simple RAG Chain --- \")\n",
"\n",
"\n",
"def format_docs(docs):\n",
"    \"\"\"Join the retrieved chunks into one plain-text context string.\"\"\"\n",
"    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
"\n",
"\n",
"if openai_available and retriever:\n",
"    # Define the RAG prompt template\n",
"    rag_template = \"\"\"Answer the question based only on the following context:\n",
"    {context}\n",
"\n",
"    Question: {question}\n",
"\n",
"    Answer:\"\"\"\n",
"    rag_prompt = ChatPromptTemplate.from_template(rag_template)\n",
"\n",
"    # Define the RAG chain\n",
"    # 1. Input: the question as a plain string\n",
"    # 2. RunnablePassthrough forwards that string unchanged as \"question\"\n",
"    # 3. In parallel, the retriever receives the same string and returns the\n",
"    #    relevant chunks, which format_docs joins into \"context\"\n",
"    # 4. The resulting dict fills in the prompt template\n",
"    # 5. The prompt goes to the chat model\n",
"    # 6. The output parser turns the AI message into a string\n",
"    rag_chain = (\n",
"        {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
"        | rag_prompt\n",
"        | chat  # Use the ChatOpenAI instance from earlier\n",
"        | StrOutputParser()\n",
"    )\n",
"\n",
"    print(\"RAG chain created.\")\n",
"\n",
"    # Call the RAG chain\n",
"    user_question = \"What is LangChain Expression Language?\"\n",
"    print(f\"\\nInvoking RAG chain with question: '{user_question}'\")\n",
"    try:\n",
"        rag_answer = rag_chain.invoke(user_question)\n",
"        print(f\"\\nRAG Answer:\\n{rag_answer}\")\n",
"\n",
"        user_question_2 = \"What are the main value props of LangChain?\"\n",
"        print(f\"\\nInvoking RAG chain with question: '{user_question_2}'\")\n",
"        rag_answer_2 = rag_chain.invoke(user_question_2)\n",
"        print(f\"\\nRAG Answer:\\n{rag_answer_2}\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"Error invoking RAG chain: {e}\")\n",
"else:\n",
"    print(\"Skipping RAG chain example (OpenAI key or Retriever unavailable).\")"
]
},
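{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. (Overview) Memory\n",
"\n",
"Memory components store and retrieve information across multiple calls to a Chain or Agent. The most common use is keeping conversation history.\n",
"\n",
"* **`ConversationBufferMemory`**: Stores the full conversation history.\n",
"* **`ConversationBufferWindowMemory`**: Stores only the most recent K turns.\n",
"* **`ConversationSummaryMemory`**: Uses an LLM to summarize the conversation history as it grows.\n",
"\n",
"A Memory object is usually attached to a Chain (such as `ConversationChain`) or an Agent."
]
},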
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chains import ConversationChain\n",
"\n",
"print(\"\\n--- Memory Introduction ---\")\n",
"\n",
"if openai_available:\n",
"    # Create a conversation chain with memory\n",
"    memory = ConversationBufferMemory()\n",
"    conversation = ConversationChain(\n",
"        llm=chat,  # Use the ChatOpenAI instance from earlier\n",
"        memory=memory,\n",
"        verbose=False  # Set to True to see the full prompt with history\n",
"    )\n",
"    print(\"ConversationChain with memory created.\")\n",
"\n",
"    # Hold a short conversation\n",
"    print(\"\\nSimulating conversation with memory:\")\n",
"    try:\n",
"        response1 = conversation.predict(input=\"Hi there! My name is Bob.\")\n",
"        print(f\"AI Response 1: {response1}\")\n",
"\n",
"        response2 = conversation.predict(input=\"What is my name?\")\n",
"        print(f\"AI Response 2: {response2}\")  # The AI should remember the name\n",
"\n",
"        # Inspect what the memory holds\n",
"        print(\"\\nCurrent Memory Buffer:\")\n",
"        print(memory.buffer)\n",
"\n",
"    except Exception as e:\n",
"        print(f\"Error during conversation chain: {e}\")\n",
"else:\n",
"    print(\"Skipping Memory example (OpenAI key not available).\")"
]
},
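{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. (Overview) Agents & Tools\n",
"\n",
"An Agent uses an LLM to **decide** which actions to take and in what order. The actions are provided by **Tools**, for example:\n",
"* Google search\n",
"* Python REPL\n",
"* Calculator\n",
"* Database queries\n",
"* Calls to other Chains or APIs\n",
"\n",
"**Basic flow**: \n",
"1. Define the Tools the Agent may use.\n",
"2. Initialize an Agent Executor with the LLM, the Tools, and a Prompt (usually an agent-specific prompt).\n",
"3. Run the Agent Executor with the user's input.\n",
"4. The Agent reasons, picks a tool, runs it, observes the result, and repeats until the task is done or a limit is reached.\n",
"\n",
"Building an Agent involves further details, such as prompt engineering and choosing a suitable Agent type (for example ReAct or the OpenAI Functions Agent)."
]
},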
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Agent examples usually need extra libraries (such as google-search-results) or API keys,\n",
"# so this cell only illustrates the concept\n",
"print(\"\\n--- Agents & Tools Introduction ---\")\n",
"print(\"Agents use an LLM to decide which 'Tools' (like search, calculator, other chains) to use.\")\n",
"print(\"Example (conceptual):\")\n",
"print(\"1. Define tools (e.g., a search tool, a calculator tool).\")\n",
"print(\"2. Initialize an Agent Executor with the LLM and tools.\")\n",
"print(\"3. Run the executor with a complex query (e.g., 'What was the weather in London yesterday and what is that temperature in Fahrenheit?').\")\n",
"print(\"4. The agent would decide to use the search tool first, then the calculator tool.\")\n",
"\n",
"# from langchain.agents import load_tools, initialize_agent, AgentType\n",
"# if openai_available:\n",
"#     try:\n",
"#         # Requires 'pip install google-search-results'\n",
"#         # Requires SERPAPI_API_KEY environment variable\n",
"#         tools = load_tools([\"serpapi\", \"llm-math\"], llm=chat)  # Example tools\n",
"#         agent = initialize_agent(tools, chat, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)\n",
"#         # agent.run(\"Who is the current CEO of OpenAI? What is their age raised to the power of 0.5?\")\n",
"#         print(\"\\n(Agent execution code commented out due to external dependencies/keys)\")\n",
"#     except Exception as e:\n",
"#         print(f\"\\nCould not initialize or run agent (missing dependencies/keys?): {e}\")"
]
},
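{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"LangChain is a feature-rich framework that greatly simplifies building complex LLM-based applications.\n",
"\n",
"**Key takeaways:**\n",
"* At its core are modular components (Models, Prompts, Indexes, Memory, Chains, Agents, Tools).\n",
"* **LCEL** is the modern, declarative way to compose those components.\n",
"* **RAG** (implemented with Indexes and Retrievers) is the key pattern for letting an LLM draw on external knowledge.\n",
"* **Memory** is used to build stateful conversational applications.\n",
"* **Agents** give an LLM the ability to take actions and interact with its environment.\n",
"\n",
"The LangChain ecosystem is still evolving quickly, and a solid grasp of its core components and LCEL is the foundation for building capable LLM applications. The official documentation ([https://python.langchain.com/](https://python.langchain.com/)) offers many in-depth guides and examples."
]
},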
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Final cleanup of the sample text file if it still exists\n",
"if os.path.exists(sample_text_path):\n",
"    os.remove(sample_text_path)\n",
"    print(f\"Cleaned up {sample_text_path}\")\n",
"# Cleanup potential FAISS index files (if saving was uncommented)\n",
"# import shutil\n",
"# if os.path.exists(\"faiss_index_langchain\"):\n",
"#     shutil.rmtree(\"faiss_index_langchain\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
}

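Some Reflections on Python as a Dynamic Language

As is well known, Python is a thoroughly dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checking
  3. Object structure, not just values, can be modified dynamically
  4. Reflection
  5. Everything is an object (instances, classes, methods)
  6. Dynamic code execution (eval, exec)
  7. Duck typing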
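
The features listed above can be seen in a few lines. The following is only a minimal sketch; the class and function names (`Duck`, `Robot`, `make_it_speak`) are made up for illustration and are not from the original essay.

```python
# Illustrative sketch of the dynamic features; all names are hypothetical.

class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Duck typing: no declared parameter type; anything with .speak() works.
    return thing.speak()


# Dynamic type binding: the same name can refer to objects of different types.
value = 42
value = "forty-two"

# Object structure can change at runtime, not just attribute values.
duck = Duck()
duck.nickname = "Donald"   # add an attribute the class never declared
del duck.nickname          # and remove it again

# Reflection: look up a method by its string name.
method = getattr(duck, "speak")

# Everything is an object: classes and functions have types of their own.
kinds = [type(Duck).__name__, type(make_it_speak).__name__]

# Dynamic code execution (never do this with untrusted input).
computed = eval("1 + 2")

results = {
    "value_type": type(value).__name__,
    "has_nickname": hasattr(duck, "nickname"),
    "reflected_call": method(),
    "kinds": kinds,
    "computed": computed,
    "duck_typing": [make_it_speak(Duck()), make_it_speak(Robot())],
}
print(results)
```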

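Dynamic languages impose fewer constraints and are easier for users to pick up, but the price is substantial runtime overhead, and execution is so decoupled from the underlying machine instructions that it is hard to know how the code actually runs.

There are also a few points that I consider fairly serious flaws. I go through them below.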

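Breaking the Semantics of OOP

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Likewise, Python can be purely imperative (Imperative Programming) for simple tasks, or use full object-oriented design for complex ones.

However, its dynamic features undermine the structure of OOP:

  1. Type blurring: an instance of almost any user-defined class can have attributes or methods added or removed at runtime (by contrast, a static language only lets you change their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet the instance's built-in __class__ attribute still points to the original type, which muddles one's understanding of the type. Conforming to a class should not be in name only; the content should conform as well.
  2. Breaking inheritance, in two ways:
    1. Most code in practice has no abstract-interface inheritance. The abc module provides ABC, the base class for abstract interfaces. The classic approach is to derive your own abstract class from ABC, then derive concrete classes from that abstract class and implement its abstract methods. PEP 544, however, introduced typing.Protocol as a structural alternative that many regard as the more Pythonic choice: a concrete class inherits from no abstract class at all, and as long as it implements the required methods, a static checker treats it as conforming to the Protocol.
    2. There is no need to inherit from a concrete parent. As with the previous point, even a class with no parent (other than object) can define methods with the same names and so offer the same call interface as a would-be parent's methods. Semantically, the class definition then reveals nothing about its relationship to other classes. The result can be a completely loose organization in which no two classes are related by inheritance.
  3. Breaking polymorphism: no parameter or return value is type-restricted by nature. This makes passing a subtype where a parent type is expected rather meaningless, again because any type can be modified dynamically to satisfy the requirement.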
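
The first two points can be sketched as follows. This is a minimal illustration, not taken from the essay: `Dog`, `Speaker`, `Parrot`, and `announce` are hypothetical names, and `runtime_checkable` is used only so that the structural match can be observed with `isinstance` at runtime (it checks that the method exists, not its signature).

```python
from typing import Protocol, runtime_checkable


class Dog:
    def bark(self) -> str:
        return "Woof"


# Type blurring: reshape the object and its class at runtime.
rex = Dog()
rex.meow = lambda: "Meow"   # add a callable attribute to this one instance
del Dog.bark                # remove a method from the class itself

still_a_dog = isinstance(rex, Dog)   # the nominal type is unchanged
can_bark = hasattr(rex, "bark")      # but the original behavior is gone
can_meow = rex.meow()


# Structural typing: conformance without any inheritance relationship.
@runtime_checkable
class Speaker(Protocol):
    def speak(self) -> str: ...


class Parrot:  # no base class other than object
    def speak(self) -> str:
        return "Hello"


def announce(speaker: Speaker) -> str:
    # A static checker accepts Parrot here because it has a matching speak().
    return speaker.speak()


print(still_a_dog, can_bark, can_meow)
print(Parrot.__bases__, isinstance(Parrot(), Speaker), announce(Parrot()))
```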

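Breaking Design Patterns

Classic patterns such as Factory, Abstract Factory, and Visitor depend heavily on inheritance and polymorphism. In Python's design, however, the dynamic capabilities leave these patterns close to nominal. One widely used library that does apply design patterns is transformers: its from_pretrained family of methods is a factory pattern, in which a string name selects the concrete constructor that yields a concrete subclass, while the factory's output type is a base class shared by all models.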
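
The string-keyed factory idea described above can be sketched in plain Python. This is an illustrative registry, not how transformers implements from_pretrained; `BaseModel`, `register`, `create_model`, and the model names are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical common base class that the factory returns."""

    @abstractmethod
    def describe(self) -> str: ...


# Maps a string name to the concrete class registered under it.
_REGISTRY: dict[str, type[BaseModel]] = {}


def register(name: str):
    """Class decorator that records a concrete class under a string key."""
    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("tiny")
class TinyModel(BaseModel):
    def describe(self) -> str:
        return "tiny model"


@register("large")
class LargeModel(BaseModel):
    def describe(self) -> str:
        return "large model"


def create_model(name: str) -> BaseModel:
    """Factory: turn a string name into an instance of a concrete subclass."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None
    return cls()


model = create_model("tiny")
print(type(model).__name__, "->", model.describe())
```

The caller only ever sees the `BaseModel` return type, which is where the pattern leans on inheritance and polymorphism.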

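Security Issues

Python code generally does not manage pointers directly, so out-of-bounds pointers, wild pointers, and dangling pointers are mostly non-issues. Garbage collection also reclaims memory automatically, so these safety concerns need little attention while coding. Python has security problems of its own, though. Attacking unmanaged code has traditionally been hard: for injected code to run reliably, it must avoid corrupting the existing structure and crashing the program outright (a segmentation fault). In Python, by contrast, injected code can directly alter the original logic, and because that logic is not fixed content in a code segment, the attacker needs no such extra care. The contents of globals() and locals() can also be modified by hand at runtime, which carries some risk (although changes to locals() inside a function generally do not affect the real local variables). Another danger is execution problems caused by type mismatches: because types are only determined at runtime, no guarantee can be made in advance, and a type-error exception may be raised and crash the program.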
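
Both risks can be reproduced in a few lines. The sketch below is illustrative only and assumes it runs at module level; `is_authorized`, `open_vault`, and `total_length` are hypothetical names.

```python
def is_authorized(user: str) -> bool:
    """Hypothetical access check."""
    return user == "admin"


def open_vault(user: str) -> str:
    # The name is_authorized is looked up in module globals on every call.
    return "opened" if is_authorized(user) else "denied"


before = open_vault("guest")

# Any code running in the same interpreter can rebind the global name,
# silently replacing the original logic.
globals()["is_authorized"] = lambda user: True
after = open_vault("guest")


def total_length(items) -> int:
    # Nothing constrains the element types; a mismatch only surfaces
    # when this line actually executes.
    return sum(len(item) for item in items)


try:
    total_length(["ab", 3])
    type_error = None
except TypeError as exc:
    type_error = str(exc)

print(before, after)
print(type_error)
```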

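Conclusion

My background is in C++, but in recent years I have been programming mostly in Python. Python has also ranked first in popularity for years, by a wide margin, and its flexibility has a lot to do with that. For a programming language aimed at the general public, such features are necessary. For all the looseness described above, programmers can still choose a rigorous object-oriented style. So the quality of a program depends not on the language but on the programmer, who is responsible for writing maintainable, clear, and well-structured code~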
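
As one possible reading of that "rigorous" style, the sketch below combines an abstract base class, type hints, and `__slots__`, so that instances reject undeclared attributes. The names `Shape`, `Rectangle`, and `total_area` are hypothetical, and this is only one of several ways to impose such discipline.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    __slots__ = ()  # no per-instance __dict__ is introduced at this level

    @abstractmethod
    def area(self) -> float: ...


class Rectangle(Shape):
    __slots__ = ("width", "height")  # the only attributes an instance may hold

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes: list[Shape]) -> float:
    # The annotation documents the expected base type for static checkers.
    return sum(shape.area() for shape in shapes)


rect = Rectangle(2.0, 3.0)

try:
    rect.color = "red"  # not declared in __slots__
    slot_error = None
except AttributeError as exc:
    slot_error = type(exc).__name__

try:
    Shape()  # abstract classes cannot be instantiated
    abstract_error = None
except TypeError as exc:
    abstract_error = type(exc).__name__

print(total_area([rect, Rectangle(1.0, 1.0)]), slot_error, abstract_error)
```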

KuRRe8 commented May 8, 2025

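Back to top

Insights, questions, or just idle chatter to bump the thread are all welcome here!

Because there are many documents, an ipynb sometimes fails to render; that is a browser performance issue, and a refresh fixes it.

Alternatively, git clone the Gist and read it locally.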
