@KuRRe8
Last active June 6, 2025 17:35
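Tutorials on using Python, split into separate files by category

Python Tutorials

Python is a beginner-friendly language, and the machine learning community now depends heavily on Python alongside C++, CUDA C, and R, which has kept Python firmly at the top of the popularity rankings. This Gist provides a set of Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard library tutorials, covering basic usage of the most common standard library modules;
  3. Third-party library tutorials, mainly for common libraries such as numpy and pytorch, covering only basic usage and not new features.

Nothing else will go into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the relevant ipynb file directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the relevant section, or click the hyperlinks below.

If you want to contribute, leave a comment; questions also go in the comments ^.^

Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial focuses mainly on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal thoughts on Python as a dynamic language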
{
"cells": [
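{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hyperparameter Optimization Tutorial (Optuna & Ray Tune)\n",
"\n",
"Welcome to this tutorial on Hyperparameter Optimization (HPO)! A machine learning model's performance is often very sensitive to its hyperparameters (learning rate, number of layers, and so on). Manual tuning is slow and inefficient; HPO tools automate the search for the best hyperparameter combination.\n",
"\n",
"This tutorial introduces two popular Python HPO frameworks in turn:\n",
"\n",
"1. **Optuna**: a modern, easy-to-use HPO framework with a Pythonic API and support for several efficient sampling and pruning algorithms.\n",
"2. **Ray Tune**: part of the Ray ecosystem, focused on scalable, flexible HPO with support for distributed execution and advanced scheduling strategies.\n",
"\n",
"We show how to use each tool by optimizing the **learning rate** and **hidden layer size** of a simple neural network classifier (the examples also search over the optimizer type).\n",
"\n",
"**Tutorial structure:**\n",
"1. Setup (installing libraries, preparing shared data).\n",
"2. Hyperparameter optimization with Optuna.\n",
"3. Hyperparameter optimization with Ray Tune.\n",
"4. Brief comparison and summary."
]
},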
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"\n",
"Install the required libraries and prepare the dataset and base model structure used for optimization."
]
},
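{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Installing the libraries\n",
"\n",
"```bash\n",
"pip install optuna \"ray[tune]\" scikit-learn torch torchvision numpy pandas matplotlib seaborn\n",
"# For Optuna visualizations\n",
"pip install plotly\n",
"# Using Optuna as the search algorithm inside Ray Tune (optional) only needs the\n",
"# optuna package itself, which the first command already installs.\n",
"```\n",
"**Note**: Ray Tune runs on top of Ray Core, which may need to be configured through `ray.init()` (for example to limit the CPU count)."
]
},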
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- Common imports --- (the tools are introduced separately, but importing everything first makes the environment easy to check)\n",
"import optuna\n",
"import ray\n",
"from ray import tune\n",
"from ray.tune.search.optuna import OptunaSearch\n",
"from ray.tune.schedulers import ASHAScheduler\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import accuracy_score\n",
"from sklearn.datasets import make_classification\n",
"import numpy as np\n",
"import time\n",
"import os\n",
"\n",
"print(f\"Optuna version: {optuna.__version__}\")\n",
"print(f\"Ray version: {ray.__version__}\")\n",
"\n",
"# --- Ray initialization (run once) ---\n",
"if not ray.is_initialized():\n",
"    try:\n",
"        # Limit resources for notebook environment if needed.\n",
"        # os.cpu_count() may return None, hence the fallback to 1.\n",
"        ray.init(num_cpus=min(4, os.cpu_count() or 1), ignore_reinit_error=True, log_to_driver=False)\n",
"        print(\"Ray initialized.\")\n",
"    except Exception as e:\n",
"        print(f\"Could not initialize Ray: {e}\")\n",
"else:\n",
"    print(\"Ray already initialized.\")\n",
"\n",
"# --- Shared data preparation (run once) ---\n",
"print(\"\\nPreparing synthetic dataset...\")\n",
"X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)\n",
"X = X.astype(np.float32)\n",
"y = y.astype(np.int64)\n",
"X_train_hpo, X_val_hpo, y_train_hpo, y_val_hpo = train_test_split(X, y, test_size=0.25, random_state=42)\n",
"\n",
"X_train_tensor_hpo = torch.from_numpy(X_train_hpo)\n",
"y_train_tensor_hpo = torch.from_numpy(y_train_hpo)\n",
"X_val_tensor_hpo = torch.from_numpy(X_val_hpo)\n",
"y_val_tensor_hpo = torch.from_numpy(y_val_hpo)\n",
"\n",
"train_dataset_hpo = TensorDataset(X_train_tensor_hpo, y_train_tensor_hpo)\n",
"val_dataset_hpo = TensorDataset(X_val_tensor_hpo, y_val_tensor_hpo)\n",
"\n",
"print(f\"Dataset prepared: X_train shape={X_train_tensor_hpo.shape}, X_val shape={X_val_tensor_hpo.shape}\")\n",
"input_size_hpo = X_train_hpo.shape[1]\n",
"\n",
"# --- Shared model definition (defined once) ---\n",
"class SimpleNN_HPO(nn.Module):\n",
"    def __init__(self, input_size, hidden_size):\n",
"        super().__init__()\n",
"        self.layer1 = nn.Linear(input_size, hidden_size)\n",
"        self.relu = nn.ReLU()\n",
"        self.layer2 = nn.Linear(hidden_size, 2)  # 2 classes\n",
"\n",
"    def forward(self, x):\n",
"        return self.layer2(self.relu(self.layer1(x)))\n",
"\n",
"print(\"SimpleNN_HPO model defined.\")"
]
},
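{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Hyperparameter optimization with Optuna\n",
"\n",
"Optuna works through an `objective` function that you define. The function receives a `trial` object, uses it to suggest hyperparameters, trains and evaluates the model, and finally returns the value of the metric to optimize."
]
},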
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- Optuna: tool-specific imports (if needed; already imported at the top) ---\n",
"# import optuna\n",
"# import torch\n",
"# ... (other dependencies)\n",
"\n",
"print(\"\\n--- Optuna Example --- \")\n",
"device_optuna = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"# --- Optuna: define the objective function ---\n",
"def objective_optuna(trial):\n",
"    # Suggest hyperparameters\n",
"    lr = trial.suggest_float(\"lr\", 1e-4, 1e-1, log=True)\n",
"    hidden_size = trial.suggest_int(\"hidden_size\", 32, 128, step=32)\n",
"    optimizer_name = trial.suggest_categorical(\"optimizer\", [\"Adam\", \"SGD\"])\n",
"    batch_size = 64  # fixed batch size for this example\n",
"\n",
"    # Create the model and optimizer\n",
"    model = SimpleNN_HPO(input_size_hpo, hidden_size).to(device_optuna)\n",
"    if optimizer_name == \"Adam\":\n",
"        optimizer = optim.Adam(model.parameters(), lr=lr)\n",
"    else:\n",
"        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n",
"\n",
"    criterion = nn.CrossEntropyLoss()\n",
"    # Shuffle the training data each epoch; validation order does not matter.\n",
"    train_loader_optuna = DataLoader(train_dataset_hpo, batch_size=batch_size, shuffle=True)\n",
"    val_loader_optuna = DataLoader(val_dataset_hpo, batch_size=batch_size)\n",
"\n",
"    # Train the model (shortened schedule)\n",
"    n_epochs_optuna = 4\n",
"    for epoch in range(n_epochs_optuna):\n",
"        model.train()\n",
"        for batch_X, batch_y in train_loader_optuna:\n",
"            batch_X, batch_y = batch_X.to(device_optuna), batch_y.to(device_optuna)\n",
"            optimizer.zero_grad()\n",
"            outputs = model(batch_X)\n",
"            loss = criterion(outputs, batch_y)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"        # Evaluate and report the intermediate result (used for pruning)\n",
"        model.eval()\n",
"        correct, total = 0, 0\n",
"        with torch.no_grad():\n",
"            for batch_X_val, batch_y_val in val_loader_optuna:\n",
"                batch_X_val, batch_y_val = batch_X_val.to(device_optuna), batch_y_val.to(device_optuna)\n",
"                outputs_val = model(batch_X_val)\n",
"                _, predicted = torch.max(outputs_val.data, 1)\n",
"                total += batch_y_val.size(0)\n",
"                correct += (predicted == batch_y_val).sum().item()\n",
"        accuracy = correct / total\n",
"        trial.report(accuracy, epoch)\n",
"\n",
"        if trial.should_prune():\n",
"            # print(f\"  Optuna Trial {trial.number} pruned at epoch {epoch+1}\")\n",
"            raise optuna.TrialPruned()\n",
"\n",
"    # Return the final metric\n",
"    # print(f\"Optuna Trial {trial.number} finished. Accuracy: {accuracy:.4f}, Params: {trial.params}\")\n",
"    return accuracy\n",
"\n",
"# --- Optuna: create the study and run the optimization ---\n",
"study_optuna = optuna.create_study(direction=\"maximize\", pruner=optuna.pruners.MedianPruner(n_startup_trials=2, n_warmup_steps=1))\n",
"n_trials_optuna = 15  # number of trials to run\n",
"print(f\"Starting Optuna optimization for {n_trials_optuna} trials...\")\n",
"study_optuna.optimize(objective_optuna, n_trials=n_trials_optuna, timeout=120, show_progress_bar=True)\n",
"print(\"Optuna optimization finished.\")\n",
"\n",
"# --- Optuna: inspect the results ---\n",
"print(\"\\n--- Optuna Results ---\")\n",
"print(f\"Number of finished trials: {len(study_optuna.trials)}\")\n",
"try:\n",
"    best_trial_optuna = study_optuna.best_trial\n",
"    print(f\"Best trial number: {best_trial_optuna.number}\")\n",
"    print(f\"  Value (Best Accuracy): {best_trial_optuna.value:.4f}\")\n",
"    print(\"  Best Parameters:\")\n",
"    for key, value in best_trial_optuna.params.items():\n",
"        print(f\"    {key}: {value}\")\n",
"except ValueError:\n",
"    print(\"No completed trials found for Optuna.\")\n",
"\n",
"# --- Optuna: visualization ---\n",
"print(\"\\nAttempting Optuna visualizations...\")\n",
"try:\n",
"    import plotly\n",
"    if study_optuna.trials:  # Check if there are trials before plotting\n",
"        fig1 = optuna.visualization.plot_optimization_history(study_optuna)\n",
"        fig2 = optuna.visualization.plot_param_importances(study_optuna)\n",
"        fig1.show()\n",
"        fig2.show()\n",
"    else:\n",
"        print(\"No trials to visualize for Optuna.\")\n",
"except ImportError:\n",
"    print(\"Plotly not installed. Skipping Optuna visualizations.\")\n",
"except Exception as e:\n",
"    print(f\"Error during Optuna visualization: {e}\")"
]
},
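{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Hyperparameter optimization with Ray Tune\n",
"\n",
"Ray Tune manages the optimization process with a trainable function or class (`Trainable`), a search space, a search algorithm, and a scheduler. It is especially well suited to scenarios that need parallel or distributed execution."
]
},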
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- Ray Tune: tool-specific imports (if needed) ---\n",
"# import ray\n",
"# from ray import tune\n",
"# from ray.tune.search.optuna import OptunaSearch\n",
"# from ray.tune.schedulers import ASHAScheduler\n",
"# import torch\n",
"# ... (other dependencies)\n",
"\n",
"print(\"\\n--- Ray Tune Example --- \")\n",
"device_ray = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"# --- Ray Tune: define the trainable function ---\n",
"# Note: a trainable function should normally load its data internally so that it can run distributed.\n",
"# For simplicity it still references the external data here; real distributed runs need to adjust this.\n",
"def trainable_ray(config):\n",
"    # Imported inside the function so the name is available in the worker process.\n",
"    from ray import train\n",
"\n",
"    lr = config[\"lr\"]\n",
"    hidden_size = int(config[\"hidden_size\"])\n",
"    optimizer_name = config[\"optimizer\"]\n",
"    batch_size = config.get(\"batch_size\", 64)  # Allow default if not in config\n",
"\n",
"    # DataLoaders within the function for potential distribution\n",
"    train_loader_tune = DataLoader(train_dataset_hpo, batch_size=batch_size, shuffle=True)\n",
"    val_loader_tune = DataLoader(val_dataset_hpo, batch_size=batch_size)\n",
"\n",
"    model = SimpleNN_HPO(input_size_hpo, hidden_size).to(device_ray)\n",
"    if optimizer_name == \"Adam\":\n",
"        optimizer = optim.Adam(model.parameters(), lr=lr)\n",
"    else:\n",
"        optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n",
"    criterion = nn.CrossEntropyLoss()\n",
"\n",
"    n_epochs_ray = 10  # More epochs for scheduler\n",
"    for epoch in range(n_epochs_ray):\n",
"        model.train()\n",
"        for batch_X, batch_y in train_loader_tune:\n",
"            batch_X, batch_y = batch_X.to(device_ray), batch_y.to(device_ray)\n",
"            optimizer.zero_grad()\n",
"            outputs = model(batch_X)\n",
"            loss = criterion(outputs, batch_y)\n",
"            loss.backward()\n",
"            optimizer.step()\n",
"\n",
"        # Evaluate and report to Ray Tune\n",
"        model.eval()\n",
"        correct, total = 0, 0\n",
"        with torch.no_grad():\n",
"            for batch_X_val, batch_y_val in val_loader_tune:\n",
"                batch_X_val, batch_y_val = batch_X_val.to(device_ray), batch_y_val.to(device_ray)\n",
"                outputs_val = model(batch_X_val)\n",
"                _, predicted = torch.max(outputs_val.data, 1)\n",
"                total += batch_y_val.size(0)\n",
"                correct += (predicted == batch_y_val).sum().item()\n",
"        accuracy = correct / total\n",
"\n",
"        # Report metrics as a dict. The keyword form tune.report(mean_accuracy=...)\n",
"        # is not accepted by recent Ray releases; train.report matches the\n",
"        # ray.train.RunConfig used below (depending on the Ray version,\n",
"        # tune.report({...}) may be the equivalent call).\n",
"        train.report({\"mean_accuracy\": accuracy, \"epoch\": epoch})\n",
"\n",
"# --- Ray Tune: define the search space ---\n",
"search_space_ray = {\n",
"    \"lr\": tune.loguniform(1e-4, 1e-1),\n",
"    \"hidden_size\": tune.qrandint(32, 128, q=32),\n",
"    \"optimizer\": tune.choice([\"Adam\", \"SGD\"]),\n",
"    # \"batch_size\": tune.choice([64, 128])  # Can also tune batch size\n",
"}\n",
"\n",
"# --- Ray Tune: configure the search algorithm and scheduler ---\n",
"# metric and mode are set once, in TuneConfig below. Setting them on the\n",
"# scheduler as well can make Tune raise a ValueError about duplicate settings.\n",
"# Use the Optuna search algorithm (example)\n",
"optuna_search = OptunaSearch()\n",
"\n",
"# Use the ASHA scheduler\n",
"asha_scheduler = ASHAScheduler(max_t=10, grace_period=1, reduction_factor=2)\n",
"\n",
"# --- Ray Tune: run the Tuner ---\n",
"num_samples_ray = 10  # Number of trials\n",
"print(f\"\\nStarting Ray Tune optimization for {num_samples_ray} samples...\")\n",
"if ray.is_initialized():\n",
"    try:\n",
"        tuner = tune.Tuner(\n",
"            trainable_ray,\n",
"            param_space=search_space_ray,\n",
"            tune_config=tune.TuneConfig(\n",
"                search_alg=optuna_search,\n",
"                scheduler=asha_scheduler,\n",
"                num_samples=num_samples_ray,\n",
"                metric=\"mean_accuracy\",\n",
"                mode=\"max\"\n",
"            ),\n",
"            run_config=ray.train.RunConfig(\n",
"                name=\"hpo_ray_tune_demo\",\n",
"                verbose=1,\n",
"                # Stop criteria can be added here, e.g.:\n",
"                # stop={\"training_iteration\": n_epochs_ray}\n",
"            )\n",
"        )\n",
"        results_ray = tuner.fit()\n",
"        print(\"Ray Tune optimization finished.\")\n",
"\n",
"        # --- Ray Tune: analyze the results ---\n",
"        print(\"\\n--- Ray Tune Results ---\")\n",
"        best_result_ray = results_ray.get_best_result(metric=\"mean_accuracy\", mode=\"max\")\n",
"        if best_result_ray:\n",
"            print(\"Best trial config:\")\n",
"            print(best_result_ray.config)\n",
"            print(f\"Best trial final mean_accuracy: {best_result_ray.metrics['mean_accuracy']:.4f}\")\n",
"        else:\n",
"            print(\"No best result found for Ray Tune.\")\n",
"\n",
"    except Exception as e:\n",
"        print(f\"An error occurred during Ray Tune execution: {e}\")\n",
"else:\n",
"    print(\"Ray is not initialized. Skipping Ray Tune execution.\")"
]
},
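{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Comparison and choosing\n",
"\n",
"| Feature | Optuna | Ray Tune |\n",
"|------------------|----------------------------------|----------------------------------|\n",
"| **Ease of use** | Very high, Pythonic API | Relatively complex, more configuration options |\n",
"| **Core concepts** | Study, Trial, Objective | Trainable, Search Space, Scheduler, Search Alg |\n",
"| **Search algorithms** | Several built in (TPE, CMA-ES, Random) | Several pluggable (HyperOpt, Optuna, BayesOpt, etc.) |\n",
"| **Pruning** | Several built-in pruners, well integrated with the framework | Implemented through schedulers (ASHA, PBT, etc.) |\n",
"| **Parallel/distributed** | Limited (manual setup or an RDB backend) | Core strength, built on Ray |\n",
"| **Scalability** | Good | Very high |\n",
"| **Dependencies** | Lightweight | Requires Ray Core |\n",
"| **Visualization** | Built-in Plotly visualizations | Relies on TensorBoard or other tools |\n",
"| **Main strengths** | Ease of use, quick start, strong sampling/pruning | Scalability, distributed execution, advanced scheduling |\n",
"\n",
"**Recommendations**: \n",
"* For **single-machine experiments, rapid prototyping, and ease of use first**, **Optuna** is usually an excellent choice.\n",
"* For **large-scale parallelism, distributed training, complex scheduling strategies, or integration with other Ray components**, **Ray Tune** is the more powerful choice.\n",
"* The two can be combined: use Optuna as the search algorithm inside Ray Tune."
]
},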
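{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Hyperparameter optimization is a key step in improving the performance of machine learning models. Optuna and Ray Tune are two powerful and popular Python HPO frameworks.\n",
"\n",
"* **Optuna** is known for its ease of use and efficient sampling/pruning algorithms, and is well suited to getting started quickly and to single-machine experiments.\n",
"* **Ray Tune** excels at scalability, distributed execution, and advanced scheduling, and suits large-scale or complex optimization tasks.\n",
"\n",
"Mastering HPO tools frees you from tedious manual tuning and lets you optimize models more systematically."
]
}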
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
}

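Some Reflections on Python as a Dynamic Language

As is well known, Python is a thoroughly dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checking
  3. Object structure, not just values, can be modified dynamically
  4. Reflection
  5. Everything is an object (instance, class, method)
  6. Code can be executed dynamically (eval, exec)
  7. Duck typing support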
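
A minimal sketch of several items in the list above. The `Point` class and the `norm1` helper are hypothetical names chosen for illustration; the sketch attaches attributes and a method to a live instance, inspects it by reflection, and evaluates code held in a string.

```python
import types


class Point:
    """Hypothetical example class; the name is only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Item 3: change the object's structure, not just its values.
p.z = 3          # add an attribute the class never declared
del p.y          # remove one that __init__ created


def norm1(self):
    """Sum of the absolute values of the remaining coordinates."""
    return abs(self.x) + abs(self.z)


# Bind a new method to this one instance only.
p.norm1 = types.MethodType(norm1, p)

# Item 4: reflection, looking attributes up by name at runtime.
attribute_names = sorted(vars(p))
has_y = hasattr(p, "y")
x_value = getattr(p, "x")

# Item 5: classes are objects too.
class_is_object = isinstance(Point, object)

# Item 6: run code that only exists as a string.
evaluated = eval("p.x + p.z", {"p": p})

# The instance still reports its original class despite the changes.
same_class = p.__class__ is Point

print(attribute_names, has_y, x_value, p.norm1(), evaluated, same_class)
```

Note that not every object allows this: instances of built-in types such as `int`, or of classes that define `__slots__`, reject new attributes.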

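A dynamic language imposes fewer constraints and is easier for users to pick up, but the price is high runtime overhead, and the code is so decoupled from the underlying machine-level execution that it is hard to know how it actually runs.

There are also several points I consider fairly serious flaws. I go through them below.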

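It breaks the semantics of OOP

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Python is no exception: simple tasks can be written in a purely imperative style (Imperative Programming), and more complex ones in an object-oriented style.

However, its dynamic nature breaks the structure of OOP:

  1. Blurred types: attributes or methods can be added to or removed from an instance of almost any user-defined type at runtime (by contrast, a static language only lets you modify their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet the instance's built-in __class__ attribute still points to the original type, which muddies the notion of what the type is. Conforming to a class should mean conforming in content, not just in name.
  2. Broken inheritance, in two respects:
    1. Most code in practice does not inherit from abstract interfaces. The abc module provides ABC, a base class for abstract interfaces; the classic approach is to derive your own abstract class from ABC, derive concrete classes from that abstract class, and implement the abstract methods. PEP 544, however, introduced typing.Protocol as a structural alternative to ABC that is often regarded as the more Pythonic choice: the concrete class inherits from no abstract class at all, and as long as it implements the required methods, a static checker treats it as conforming to the Protocol.
    2. No need to inherit from a concrete parent class. As in the previous point, a class with no parent at all (other than object) can still define methods with the same names, offering the same call interface as the would-be parent's methods. Semantically, the class definition then shows no relationship to any other class. A codebase can be an entirely loose structure in which no two classes are related by inheritance.
  3. Broken polymorphism: no parameter or return value is type-restricted by default. This makes passing a subtype where a parent type is expected rather meaningless, again because any type can be modified dynamically to satisfy the requirement.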
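
The contrast between nominal and structural conformance described in the list above can be sketched as follows. All class and function names here (`Shape`, `Square`, `HasArea`, `Circle`, `Blob`, `total_area`) are hypothetical, and `runtime_checkable` is used only so that `isinstance` can show the structural check at runtime; a static checker would perform it without that decorator.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    """Nominal interface: subclasses must inherit and implement area()."""

    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


@runtime_checkable
class HasArea(Protocol):
    """Structural interface: anything with an area() method conforms."""

    def area(self) -> float: ...


class Circle:
    """Inherits from nothing but object, yet offers the same call interface."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return 3.14159 * self.radius ** 2


def total_area(shapes: list[HasArea]) -> float:
    # No inheritance check happens here; only the method name matters.
    return sum(s.area() for s in shapes)


class Blob:
    """Starts out with no area() method at all."""


blob = Blob()
blob.area = lambda: 1.0   # patched onto the instance at runtime

circle_is_shape = isinstance(Circle(1.0), Shape)     # nominal check
circle_has_area = isinstance(Circle(1.0), HasArea)   # structural check
blob_has_area = isinstance(blob, HasArea)            # holds only after patching
combined = total_area([Square(2.0), Circle(1.0), blob])

print(circle_is_shape, circle_has_area, blob_has_area, combined)
```

`Circle` fails the nominal check but passes the structural one, and `Blob` passes it only because of a runtime patch, which is exactly the looseness the list complains about.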

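It breaks design patterns

Classic patterns such as Factory, Abstract Factory, and Visitor rely heavily on inheritance and polymorphism. In Python, however, the language's dynamic capabilities leave these patterns existing largely in name only. A familiar library that does use a design pattern is transformers, whose from_pretrained family is a factory: a string name determines the concrete constructor, which yields a concrete subclass, while the factory's declared output type is a base class common to all models.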
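
A minimal sketch of a string-keyed factory of the kind described above. The names `BaseModel`, `register`, `from_name`, and the two model classes are hypothetical and do not reproduce the transformers implementation; the sketch only shows a plain dict of classes doing the dispatch.

```python
from typing import Callable


class BaseModel:
    """Hypothetical common base class that the factory is declared to return."""

    def __init__(self, hidden_size: int = 16) -> None:
        self.hidden_size = hidden_size

    def describe(self) -> str:
        return f"{type(self).__name__}(hidden_size={self.hidden_size})"


# Maps a string name to the concrete class that should be constructed.
_REGISTRY: dict[str, type[BaseModel]] = {}


def register(name: str) -> Callable[[type[BaseModel]], type[BaseModel]]:
    """Class decorator that records a concrete class under a string name."""

    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("tiny-encoder")
class TinyEncoder(BaseModel):
    pass


@register("tiny-decoder")
class TinyDecoder(BaseModel):
    pass


def from_name(name: str, **kwargs) -> BaseModel:
    """Factory: pick the constructor by name, return it typed as the base class."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None
    return cls(**kwargs)


model = from_name("tiny-decoder", hidden_size=32)
print(model.describe())
```

Because classes are ordinary objects, the registry can hold them directly, so no parallel hierarchy of factory classes is needed.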

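Security issues

Python code generally does not manage pointers directly, so out-of-bounds pointers, wild pointers, and dangling pointers are mostly non-issues. The gc mechanism likewise handles garbage collection automatically, so these safety problems need no attention while coding. On the other hand, Python has security problems of its own. Attacking unmanaged code has traditionally been hard: for injected code to run reliably, it must avoid corrupting the original structure and crashing the program outright (segmentation fault). In Python, arbitrary code can be injected to alter the original logic, and because that logic does not live as fixed content in a code segment, an attacker needs no such extra care. The contents of globals() and locals() can be modified by hand at runtime, which also carries some risk. Another danger is code failing on type mismatches: since types are determined only at runtime, no guarantee can be given in advance, and a type-error exception may be raised and crash the program.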
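
A small sketch of two of the risks mentioned above, using hypothetical functions (`is_admin`, `can_delete`, `add_one`): rebinding a name through `globals()` changes the behavior of code that was already defined, and a type mismatch surfaces only when the offending line actually runs.

```python
def is_admin(user: str) -> bool:
    """Original logic: only 'root' is an administrator."""
    return user == "root"


def can_delete(user: str) -> bool:
    # is_admin is looked up in the module globals at call time,
    # not fixed when can_delete is defined.
    return is_admin(user)


before = can_delete("guest")

# Any code running in the same interpreter can rebind the name.
globals()["is_admin"] = lambda user: True
after = can_delete("guest")


def add_one(value):
    # Nothing checks the argument type before this line executes.
    return value + 1


try:
    add_one("41")
    type_error = None
except TypeError as exc:
    type_error = type(exc).__name__

print(before, after, type_error)
```

Type hints plus a static checker can catch the second problem before runtime, but they are optional and not enforced by the interpreter.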

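Summary

I come from a C++ background, but in recent years I have been programming in Python. Python's market share has been number one for years, by a wide margin, and its flexibility has a lot to do with that. For a programming language aimed at a broad audience, such traits are necessary. For all the ways Python is less than rigorous, as described above, programmers can still choose to write rigorous object-oriented code. So the quality of a program depends not on the language but on the programmer. Programmers have a responsibility to write code that is maintainable, clear, and well-structured~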


KuRRe8 commented May 8, 2025


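Insights, questions, or just want to chat and pad the thread? Post them all here!

There are a lot of documents, so when an ipynb fails to render it is sometimes a browser performance problem; refreshing the page fixes it.

Alternatively, git clone the Gist locally to read it.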
