@KuRRe8
Last active June 6, 2025 17:35
Tutorials on using Python, split into separate files by category

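Python Tutorials

Python is a beginner-friendly language, and the machine learning community now relies heavily on Python together with C++, CUDA C, and R, which has kept Python firmly at the top of the popularity rankings. This Gist collects Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, which generally skip beginner topics;
  2. Standard library tutorials, covering basic usage of the most common standard modules;
  3. Third-party library tutorials, mainly common libraries such as numpy and pytorch; only basic usage is covered, not new features.

Nothing else will go into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the relevant ipynb file directly in Google Colab or Kaggle.

When browsing on the web there is no file list, so press Ctrl + F to search for the section you want, or click the links below.

To contribute, leave a comment; questions are welcome in the comments too ^.^

Contents: Language

Contents: Libraries

Contents: Domain-specific libraries (this tutorial focuses mainly on machine learning and deep learning)

Contents: Appendix

  • sigh.md: personal thoughts on Python as a dynamic language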
{
"cells": [
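{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Seaborn - A Tutorial on Statistical Data Visualization in Python\n",
"\n",
"Welcome to the Seaborn tutorial! Seaborn is a Python data visualization library built on Matplotlib. It provides a higher-level interface for drawing attractive and informative statistical graphics.\n",
"\n",
"**Advantages of Seaborn:**\n",
"\n",
"1. **Focused on statistical visualization**: it ships many chart types designed to show distributions, relationships, and comparisons.\n",
"2. **Attractive defaults**: its default styles and color palettes usually look better than Matplotlib's defaults.\n",
"3. **Tight integration with Pandas DataFrames**: you can plot directly from a DataFrame and refer to variables by column name.\n",
"4. **High-level interface**: complex statistical figures such as facet grids take far less code.\n",
"5. **A complement to Matplotlib**: because Seaborn is built on Matplotlib, you can combine the two for deep customization.\n",
"\n",
"**This tutorial covers Seaborn's core plotting features:**\n",
"\n",
"1. Setup (importing libraries, setting a style, loading data)\n",
"2. Relational plots: `scatterplot`, `lineplot`\n",
"3. Distribution plots: `histplot`, `kdeplot`, `ecdfplot`\n",
"4. Categorical plots: scatter, distribution, and estimate plots\n",
"5. Regression plots: `regplot`, `lmplot`\n",
"6. Matrix plots: `heatmap`, `clustermap`\n",
"7. Styles and color themes\n",
"8. Multi-plot grids (figure-level interfaces and faceting)"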
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setup\n",
"\n",
"Import the required libraries, set Seaborn's default style, and load a few example datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# Set Seaborn's default style (optional but recommended)\n",
"sns.set_theme(style=\"whitegrid\") # e.g. \"whitegrid\", \"darkgrid\", \"ticks\", \"white\"\n",
"\n",
"# Load Seaborn's example datasets (fetched online on first use, then cached)\n",
"tips = sns.load_dataset(\"tips\") # restaurant tipping data\n",
"iris = sns.load_dataset(\"iris\") # iris flower measurements\n",
"titanic = sns.load_dataset(\"titanic\") # Titanic passenger data\n",
"flights = sns.load_dataset(\"flights\") # monthly airline passenger counts\n",
"\n",
"print(\"Seaborn, Matplotlib, Pandas, NumPy imported.\")\n",
"print(\"\\nTips dataset head:\")\n",
"print(tips.head())"
]
},
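{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Relational Plots\n",
"\n",
"These visualize the relationship between two numeric variables.\n",
"* `scatterplot()`: scatter plot.\n",
"* `lineplot()`: line plot, commonly used for trends, especially time series.\n",
"* `relplot()`: figure-level interface that makes faceted grids easy to create."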
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Scatter Plot (total_bill vs tip) ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.scatterplot(data=tips, x=\"total_bill\", y=\"tip\")\n",
"plt.title(\"Total Bill vs Tip\")\n",
"plt.show()\n",
"\n",
"# Add more dimensions with hue, size, and style\n",
"print(\"\\n--- Scatter Plot with Hue, Size, Style ---\")\n",
"plt.figure(figsize=(8, 5))\n",
"sns.scatterplot(data=tips, x=\"total_bill\", y=\"tip\", \n",
"                hue=\"time\", # color by 'time' (lunch/dinner)\n",
"                size=\"size\", # marker size by party size ('size')\n",
"                style=\"smoker\" # marker style by smoking status ('smoker')\n",
"                )\n",
"plt.title(\"Bill vs Tip (Colored by Time, Sized by Party Size, Styled by Smoker)\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Line Plot (Example with dummy time series) ---\")\n",
"# Build some example time-series data\n",
"time_points = np.arange(1, 31)\n",
"values = np.random.randn(30).cumsum() + 10 # a cumulative sum simulates a trend\n",
"df_time = pd.DataFrame({'day': time_points, 'value': values})\n",
"\n",
"plt.figure(figsize=(8, 4))\n",
"sns.lineplot(data=df_time, x=\"day\", y=\"value\", marker='o')\n",
"plt.title(\"Example Line Plot\")\n",
"plt.show()\n",
"\n",
"# Create a faceted grid with relplot\n",
"print(\"\\n--- Relplot (Figure-level interface for scatter/line) ---\")\n",
"sns.relplot(data=tips, x=\"total_bill\", y=\"tip\", \n",
"            hue=\"smoker\", \n",
"            col=\"time\", # one subplot column per 'time' value\n",
"            kind=\"scatter\") # draw scatter plots ('line' draws line plots)\n",
"plt.suptitle(\"Bill vs Tip by Time and Smoker (using relplot)\", y=1.02) # nudge the figure title upward\n",
"plt.show()"
]
},
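{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Distribution Plots\n",
"\n",
"These visualize the distribution of a single variable or the joint distribution of two variables.\n",
"* `histplot()`: histogram.\n",
"* `kdeplot()`: kernel density estimate (a smoothed distribution curve).\n",
"* `ecdfplot()`: empirical cumulative distribution function.\n",
"* `rugplot()`: draws a small tick on the axis at the position of each data point.\n",
"* `displot()`: figure-level interface that can combine several distribution plots and create faceted grids."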
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Histogram (histplot) of Total Bill ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.histplot(data=tips, x=\"total_bill\", bins=20, kde=True) # kde=True also draws a kernel density curve\n",
"plt.title(\"Distribution of Total Bill\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Kernel Density Estimate (kdeplot) of Tip Amount ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.kdeplot(data=tips, x=\"tip\", fill=True, color='green') # fill=True shades the area under the curve\n",
"plt.title(\"Density Distribution of Tip Amount\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Empirical Cumulative Distribution (ecdfplot) of Tip Amount ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.ecdfplot(data=tips, x=\"tip\")\n",
"plt.title(\"ECDF of Tip Amount\")\n",
"plt.ylabel(\"Cumulative Probability\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Joint Distribution (histplot 2D) ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.histplot(data=tips, x=\"total_bill\", y=\"tip\", cbar=True) # 2D histogram; color intensity shows how many points fall in each bin\n",
"plt.title(\"Joint Distribution of Total Bill and Tip\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Displot (Figure-level interface) ---\")\n",
"# kind can be 'hist', 'kde', or 'ecdf'\n",
"sns.displot(data=tips, x=\"total_bill\", col=\"time\", kind=\"kde\", fill=True)\n",
"plt.suptitle(\"Distribution of Total Bill by Time (using displot)\", y=1.02)\n",
"plt.show()"
]
},
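{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Categorical Plots\n",
"\n",
"These visualize the relationship between a numeric variable and one or more categorical variables.\n",
"\n",
"* **Categorical scatter plots:**\n",
"    * `stripplot()`: simple scatter plot; points may overlap.\n",
"    * `swarmplot()`: scatter plot with points adjusted so they do not overlap (best for smaller datasets).\n",
"* **Categorical distribution plots:**\n",
"    * `boxplot()`: box plot showing the median, quartiles, and outliers.\n",
"    * `violinplot()`: violin plot, combining a box plot with a kernel density estimate.\n",
"    * `boxenplot()`: enhanced box plot that conveys more about the shape of the distribution.\n",
"* **Categorical estimate plots:**\n",
"    * `pointplot()`: point positions show an estimate of central tendency; error bars show a confidence interval.\n",
"    * `barplot()`: bar heights show an estimate of central tendency (the mean by default).\n",
"    * `countplot()`: shows the number of observations in each category (like a bar plot, but the y-axis is a count).\n",
"* `catplot()`: figure-level interface for drawing any of the categorical plots on a faceted grid."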
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Box Plot (day vs total_bill) ---\")\n",
"plt.figure(figsize=(8, 5))\n",
"# Note: seaborn >= 0.13 warns when palette is passed without hue; there, add hue=\"day\", legend=False\n",
"sns.boxplot(data=tips, x=\"day\", y=\"total_bill\", palette=\"pastel\")\n",
"plt.title(\"Total Bill Distribution by Day\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Violin Plot (day vs tip, split by sex) ---\")\n",
"plt.figure(figsize=(8, 5))\n",
"sns.violinplot(data=tips, x=\"day\", y=\"tip\", hue=\"sex\", split=True)\n",
"# split=True draws the two hue levels as the two halves of each violin\n",
"plt.title(\"Tip Distribution by Day and Sex\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Bar Plot (day vs total_bill - default estimator: mean) ---\")\n",
"plt.figure(figsize=(8, 5))\n",
"sns.barplot(data=tips, x=\"day\", y=\"total_bill\", hue=\"smoker\", errorbar=('ci', 95)) # errorbar draws a 95% confidence interval\n",
"plt.title(\"Average Total Bill by Day and Smoker\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Count Plot (counts of passengers by class) ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"# Note: seaborn >= 0.13 warns when palette is passed without hue; there, add hue=\"class\", legend=False\n",
"sns.countplot(data=titanic, x=\"class\", palette=\"viridis\")\n",
"plt.title(\"Passenger Count by Class on Titanic\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Catplot (Figure-level interface) ---\")\n",
"# kind can be 'strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar', or 'count'\n",
"sns.catplot(data=titanic, x=\"class\", y=\"age\", hue=\"sex\", \n",
" kind=\"violin\", split=True, col=\"survived\")\n",
"plt.suptitle(\"Age Distribution by Class, Sex, and Survival Status (using catplot)\", y=1.02)\n",
"plt.show()"
]
},
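{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Regression Plots\n",
"\n",
"These visualize the linear relationship between two variables and fit a regression model.\n",
"* `regplot()`: axes-level interface that draws a scatter plot with a fitted linear regression line.\n",
"* `lmplot()`: figure-level interface that is more capable, making it easy to facet and to add categorical variables."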
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Regression Plot (regplot: total_bill vs tip) ---\")\n",
"plt.figure(figsize=(6, 4))\n",
"sns.regplot(data=tips, x=\"total_bill\", y=\"tip\", \n",
"            scatter_kws={'alpha':0.5}, # passed to the underlying scatter call\n",
"            line_kws={'color':'red'}) # passed to the underlying plot call\n",
"plt.title(\"Regression Fit for Total Bill vs Tip\")\n",
"plt.show()\n",
"\n",
"print(\"\\n--- LM Plot (lmplot: Faceted regression) ---\")\n",
"# lmplot is built on FacetGrid, so it supports faceting\n",
"sns.lmplot(data=tips, x=\"total_bill\", y=\"tip\", \n",
"           hue=\"smoker\", \n",
"           col=\"time\", \n",
"           markers=[\"o\", \"x\"],\n",
"           height=5) # height sets the height of each subplot, in inches\n",
"plt.suptitle(\"Regression Fits by Smoker and Time (using lmplot)\", y=1.02)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Matrix Plots\n",
"\n",
"These visualize matrix-shaped data.\n",
"* `heatmap()`: draws a matrix as a color-coded grid.\n",
"* `clustermap()`: draws a heatmap and reorders the rows and/or columns using hierarchical clustering."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compute a correlation matrix (example)\n",
"iris_numeric = iris.drop('species', axis=1) # keep only the numeric columns\n",
"corr_matrix = iris_numeric.corr()\n",
"print(f\"Correlation Matrix:\\n{corr_matrix}\")\n",
"\n",
"print(\"\\n--- Heatmap of Correlation Matrix ---\")\n",
"plt.figure(figsize=(6, 5))\n",
"sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=\".2f\")\n",
"# annot=True: write the value inside each cell\n",
"# cmap: colormap\n",
"# fmt: number format for the annotations\n",
"plt.title(\"Correlation Matrix Heatmap (Iris Dataset)\")\n",
"plt.show()\n",
"\n",
"# --- Clustermap --- \n",
"print(\"\\n--- Clustermap (Flights dataset example) ---\")\n",
"# The flights dataset must first be pivoted into a matrix\n",
"flights_pivot = flights.pivot_table(index='month', columns='year', values='passengers')\n",
"print(f\"Flights Pivot Table (head):\\n{flights_pivot.head()}\")\n",
"\n",
"sns.clustermap(flights_pivot, cmap=\"viridis\", standard_scale=1)\n",
"# standard_scale=1 rescales each column to [0, 1] (subtract the minimum, divide by the maximum); use z_score=1 for z-scores\n",
"plt.suptitle(\"Clustermap of Monthly Flight Passengers Over Years\", y=1.02)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Styles and Color Themes\n",
"\n",
"Seaborn offers convenient ways to control the look of a figure.\n",
"* **Themes**: `sns.set_theme()` sets the style, palette, and more in one call. Common styles are `darkgrid`, `whitegrid`, `dark`, `white`, and `ticks`.\n",
"* **Context**: `sns.set_context()` scales plot elements for different settings (such as `paper`, `notebook`, `talk`, `poster`).\n",
"* **Palettes**: Seaborn has many built-in palettes, selected through the `palette` parameter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Different Styles and Contexts ---\")\n",
"\n",
"sns.set_theme(style=\"ticks\", palette=\"pastel\") # apply a new theme\n",
"plt.figure(figsize=(6,4))\n",
"sns.boxplot(data=tips, x=\"day\", y=\"total_bill\")\n",
"plt.title(\"Boxplot with 'ticks' style and 'pastel' palette\")\n",
"plt.show()\n",
"\n",
"sns.set_theme(style=\"darkgrid\") # switch to the dark grid style\n",
"sns.set_context(\"talk\") # context suited to presentations (larger elements)\n",
"plt.figure(figsize=(6,4))\n",
"sns.histplot(data=tips, x=\"total_bill\", kde=True)\n",
"plt.title(\"Histogram with 'darkgrid' style and 'talk' context\")\n",
"plt.show()\n",
"\n",
"# Restore the settings used at the start of this notebook (optional)\n",
"sns.set_theme(style=\"whitegrid\") \n",
"sns.set_context(\"notebook\")"
]
},
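{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Multi-Plot Grids (Figure-Level Interfaces and Faceting)\n",
"\n",
"The `relplot`, `displot`, `catplot`, and `lmplot` functions shown earlier are all **figure-level** interfaces. They are built on Seaborn's `FacetGrid`, which creates a figure containing multiple subplots (facets); the related `PairGrid` and `JointGrid` objects play the same role for `pairplot` and `jointplot`.\n",
"\n",
"Through parameters such as `row`, `col`, and `hue`, these functions draw several related plots from subsets of the data.\n",
"\n",
"* **`FacetGrid`**: the most general; it builds a grid of facets with any number of rows and columns, and you then `map` a plotting function onto each subplot.\n",
"* **`PairGrid`**: builds a matrix of subplots showing the pairwise relationships among several variables and the distribution of each variable.\n",
"    * `pairplot()` is the convenience interface to `PairGrid`.\n",
"* **`JointGrid`**: builds a figure with a joint distribution plot and two marginal distribution plots.\n",
"    * `jointplot()` is the convenience interface to `JointGrid`."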
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"--- Pair Plot (pairplot for iris dataset) ---\")\n",
"# Show the pairwise relationships among all numeric variables and the distribution of each\n",
"sns.pairplot(iris, hue=\"species\") # color by species\n",
"plt.suptitle(\"Pair Plot of Iris Dataset (colored by species)\", y=1.02)\n",
"plt.show()\n",
"\n",
"print(\"\\n--- Joint Plot (jointplot for total_bill vs tip) ---\")\n",
"# kind can be, for example, 'scatter', 'kde', 'hist', or 'reg'\n",
"sns.jointplot(data=tips, x=\"total_bill\", y=\"tip\", kind=\"reg\") # center: scatter plus regression fit; margins: histograms\n",
"plt.suptitle(\"Joint Distribution and Regression (Total Bill vs Tip)\", y=1.02)\n",
"plt.show()\n",
"\n",
"print(\"\\n--- FacetGrid Example (Manual Mapping) ---\")\n",
"# Use FacetGrid by hand (more flexible, but more involved)\n",
"g = sns.FacetGrid(tips, col=\"time\", row=\"smoker\", margin_titles=True)\n",
"g.map(sns.histplot, \"total_bill\", bins=10, color='skyblue') # apply histplot to every subplot\n",
"g.figure.suptitle(\"Total Bill Distribution by Time and Smoker (FacetGrid)\", y=1.03) # .figure is preferred over the older .fig\n",
"g.set_axis_labels(\"Total Bill\", \"Count\")\n",
"plt.show()"
]
},
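{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Seaborn is a powerful statistical data visualization library. Built on Matplotlib, it offers a higher-level interface and more attractive default styles.\n",
"\n",
"**Key points:**\n",
"* Focused on statistical graphics such as distributions, relationships, and categorical comparisons.\n",
"* Tightly integrated with Pandas DataFrames.\n",
"* Provides axes-level functions (such as `scatterplot`, `histplot`, `boxplot`) and figure-level functions (such as `relplot`, `displot`, `catplot`, `lmplot`).\n",
"* Figure-level functions make faceted grids easy to create.\n",
"* Visual themes and contexts are easy to switch.\n",
"* Matplotlib remains available for deep customization.\n",
"\n",
"For exploratory data analysis and for producing statistical charts for reports or presentations, Seaborn is a very valuable tool. Try different plotting functions and parameter combinations, and consult the official documentation and example gallery for more ideas."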
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
}

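Some Reflections on Python as a Dynamic Language

As is well known, Python is a thoroughly dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checking
  3. Object structure, not just values, can be modified at runtime
  4. Reflection
  5. Everything is an object (instance, class, method)
  6. Code can be executed dynamically (eval, exec)
  7. Duck typing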
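
To make the list above concrete, here is a minimal sketch that touches most of its items in a few lines. The Point and Bag classes and every variable name are illustrative assumptions, not part of the original notes.

```python
class Point:
    """A plain class whose attributes are set in __init__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# Dynamic type binding: one name can refer to values of different types.
value = 42
value = "forty-two"

# Object structure can change at runtime: add and delete attributes.
p.z = 3      # an attribute Point.__init__ never declared
del p.x      # remove an attribute the class did declare

# Reflection: test for and fetch members by name.
has_z = hasattr(p, "z")
y_via_name = getattr(p, "y")

# Everything is an object: a function can be attached to a class as a method.
Point.norm1 = lambda self: abs(self.y) + abs(self.z)
norm = p.norm1()

# Dynamic code execution (never pass untrusted input to eval or exec).
computed = eval("y * 10", {"y": y_via_name})


# Duck typing: len() accepts anything that implements __len__.
class Bag:
    def __len__(self):
        return 7


bag_size = len(Bag())
```

None of these steps is rejected ahead of time; each is resolved only when the line runs, which is the flexibility and the cost discussed in the rest of this note.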

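A dynamic language imposes fewer constraints and is easier for newcomers to pick up, but the price is substantial runtime overhead, and the code is so far removed from the underlying machine instructions that it is hard to know how it actually executes.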
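
One way to see how far Python source sits from machine instructions is to look at the bytecode that the CPython interpreter runs. The sketch below uses the standard dis module on a hypothetical add function. Opcode names differ between CPython versions, so it only looks for names that start with BINARY rather than assuming a specific one.

```python
import dis


def add(a, b):
    return a + b


# Collect the names of the bytecode instructions compiled for add().
opnames = [ins.opname for ins in dis.get_instructions(add)]
binary_ops = [name for name in opnames if name.startswith("BINARY")]

# The same bytecode serves every operand type; which "+" is meant
# is decided at run time by looking at the operands.
results = (add(1, 2), add("a", "b"), add([1], [2]))
```

A statically typed compiler would emit different machine code for integer addition and string concatenation; here a single generic instruction covers both, and the dispatch is part of the runtime overhead mentioned above.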

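There are also a few points that I consider fairly serious flaws. I go through them below.

It undermines the semantics of OOP

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Python likewise can be purely imperative for simple tasks, or use full object-oriented design for complex ones.

However, its dynamic features undermine the structure of OOP:

  1. Blurred types: attributes and methods can be added to or removed from any instance at runtime (a statically typed language only lets you change their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs visibly from it. Yet its built-in __class__ attribute still points to the original type, which muddies what belonging to a type means. Conforming to a class should be a matter of content, not just of name.
  2. Broken inheritance, in two respects:
    1. Most code in practice does not inherit from abstract interfaces. The abc module provides ABC, the base class for abstract interfaces; the classic approach is to derive your own abstract class from ABC, then derive concrete classes from it and implement the abstract methods. But typing.Protocol, introduced by PEP 544, is often regarded as the more Pythonic route: the concrete class inherits from no abstract class at all, and as long as it implements the required methods, a static checker accepts it as conforming to the Protocol.
    2. There is no need to inherit from a concrete parent class. As in the previous point, a class with no parent other than object can still define methods with the same names and so expose the same call interface as the would-be parent. Semantically, the class definition then shows no relationship to any other class. The whole codebase can be a loose collection in which no two classes are related by inheritance.
  3. Broken polymorphism: no parameter or return value is restricted by type. Passing a subtype where a parent type is expected therefore loses its point, again because any type can be modified dynamically to meet the requirement.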
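
The contrast between nominal and structural conformance can be sketched as follows. Shape, Square, HasArea, Circle, and total_area are hypothetical names chosen for illustration; the abc and typing calls are standard library features. The runtime_checkable decorator is used here only so that isinstance can demonstrate the structural check, which tests for the presence of the method and not its signature.

```python
import math
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class Shape(ABC):
    """Nominal interface: implementers must inherit from Shape."""

    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


@runtime_checkable
class HasArea(Protocol):
    """Structural interface: any object with an area() method conforms."""

    def area(self) -> float: ...


class Circle:  # no base class other than object
    def __init__(self, r: float) -> None:
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2


def total_area(shapes: list[HasArea]) -> float:
    return sum(s.area() for s in shapes)


total = total_area([Square(2.0), Circle(1.0)])

circle_is_shape = isinstance(Circle(1.0), Shape)    # nominal check
circle_has_area = isinstance(Circle(1.0), HasArea)  # structural check

# An instance stripped of its data still reports the original class.
sq = Square(2.0)
del sq.side
still_square = type(sq) is Square
```

Circle passes the structural check without inheriting from anything, and sq keeps its class identity even though calling sq.area() would now fail, which is the "conforming in name only" situation described in point 1.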

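It undermines design patterns

Classic patterns such as Factory, Abstract Factory, and Visitor rely heavily on inheritance and polymorphism. In Python, however, the language's dynamic capabilities leave design patterns largely hollow. One widely used library that does apply a design pattern is transformers: its from_pretrained family of methods is a factory, which uses a string name to select a concrete constructor and obtain a concrete subclass, while the declared output type of the factory is a base class shared by all models.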
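
The string-keyed factory idea can be sketched with a small registry. This is not how transformers implements from_pretrained; BaseModel, TinyBert, TinyGpt, register, and from_name are all hypothetical names used only to show the shape of the pattern.

```python
from typing import Callable


class BaseModel:
    """Hypothetical common base class that the factory promises to return."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"{type(self).__name__}({self.name})"


# Registry mapping a string key to a concrete constructor.
_REGISTRY: dict[str, Callable[[str], BaseModel]] = {}


def register(key: str):
    """Class decorator that records a subclass under `key`."""
    def wrap(cls):
        _REGISTRY[key] = cls
        return cls
    return wrap


@register("bert")
class TinyBert(BaseModel):
    pass


@register("gpt")
class TinyGpt(BaseModel):
    pass


def from_name(name: str) -> BaseModel:
    """Choose the constructor from the prefix of `name`, e.g. 'bert-base'."""
    key = name.split("-", 1)[0]
    if key not in _REGISTRY:
        raise ValueError(f"unknown model family: {key!r}")
    return _REGISTRY[key](name)


model = from_name("bert-base")
```

The caller only sees the BaseModel return annotation, while the concrete class is picked from a string at run time; nothing in the language forces the registered classes to actually derive from BaseModel.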

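Security issues

Python code generally does not manage pointers directly, so out-of-bounds pointers, wild pointers, and dangling pointers are mostly non-issues. The gc mechanism also reclaims memory automatically, so these safety concerns need no attention while coding. Python has security problems of its own, though. Attacking unmanaged code used to be comparatively hard: injected code could only run reliably if it avoided corrupting the existing structure and crashing the program (a segmentation fault). In Python, by contrast, arbitrary code can be injected to alter the original logic, and because that logic does not live as fixed content in a code segment, the attacker needs no such extra care. The contents of globals() and locals() can be modified by hand at runtime, which also carries some risk. Another danger is code failing on a type mismatch: types are only determined at runtime, so nothing can be guaranteed in advance, and a type error exception may be raised and crash the program.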
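
Both risks can be reproduced in a few lines. The sketch below assumes it runs at module level, so that globals() is the namespace the functions look names up in; is_admin, delete_everything, and add_tax are hypothetical functions invented for the demonstration.

```python
def is_admin(user: str) -> bool:
    """The intended access check."""
    return user == "root"


def delete_everything(user: str) -> str:
    # The name is_admin is looked up in the module globals on every call.
    return "deleted" if is_admin(user) else "denied"


before = delete_everything("guest")

# Any code running in the same process can rebind the global name.
globals()["is_admin"] = lambda user: True
after = delete_everything("guest")


def add_tax(price: float) -> float:
    return price * 1.2


# The annotation is not enforced; the mismatch surfaces only when the line runs.
try:
    add_tax("100")
    error = None
except TypeError as exc:
    error = type(exc).__name__
```

No memory corruption is involved in either case: the access check is bypassed by an ordinary assignment, and the type error appears only on the code path that happens to receive the wrong value.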

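Summary

My background is in C++, but in recent years I have been programming in Python all the time. Python's market share has been number one for years, by a wide margin, and its flexibility has a lot to do with that. For a programming language aimed at a general audience, such features are necessary. Despite all the looseness in Python described above, programmers can still choose to write rigorous object-oriented code. So the quality of a program depends not on the language but on the programmer, who is responsible for writing maintainable, clear, and well-structured code~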

KuRRe8 commented May 8, 2025

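Insights, questions, or just idle chatter are all welcome here!

Because there are many documents, an ipynb sometimes fails to render; that is a browser performance issue, and refreshing the page fixes it.

Alternatively, git clone the Gist and read it locally.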
