Common Python Patterns in Rust
Rust 中对应的常见 Python 模式
What you’ll learn: How to translate
dictintostruct,classintostruct + impl, list comprehensions into iterator chains, decorators into traits or wrappers, and context managers intoDrop/ RAII. Also included: essential crates and an incremental migration strategy.
本章将学习: 如何把dict迁移成struct,把class迁移成struct + impl,把列表推导式迁移成迭代器链,把装饰器迁移成 trait 或包装函数,以及把上下文管理器迁移成Drop/ RAII。同时还会补充常用 crate 和渐进式迁移策略。Difficulty: 🟡 Intermediate
难度: 🟡 进阶
Dictionary → Struct
字典 → 结构体
# Python — dict as data container (very common)
user = {
"name": "Alice",
"age": 30,
"email": "alice@example.com",
"active": True,
}
print(user["name"])
#![allow(unused)]
fn main() {
// Rust — struct with named fields
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
struct User {
name: String,
age: i32,
email: String,
active: bool,
}
let user = User {
name: "Alice".into(),
age: 30,
email: "alice@example.com".into(),
active: true,
};
println!("{}", user.name);
}
When Python code uses dictionaries as “free-form data bags”, Rust usually wants a named struct instead. That looks stricter, but it pays off in discoverability, tooling, and compile-time validation.
当 Python 把字典当成“随手塞字段的数据袋”时,Rust 一般更倾向于用具名结构体来承载。这样确实更严格,但换来的是更强的可读性、工具支持和编译期校验。
Context Manager → RAII (Drop)
上下文管理器 → RAII(Drop)
# Python — context manager for resource cleanup
class FileManager:
def __init__(self, path):
self.file = open(path, 'w')
def __enter__(self):
return self.file
def __exit__(self, *args):
self.file.close()
with FileManager("output.txt") as f:
f.write("hello")
#![allow(unused)]
fn main() {
// Rust — RAII: Drop trait runs when value goes out of scope
use std::fs::File;
use std::io::Write;
fn write_file() -> std::io::Result<()> {
let mut file = File::create("output.txt")?;
file.write_all(b"hello")?;
Ok(())
// File closes automatically when `file` leaves scope
}
}
Rust relies on scope-based cleanup rather than explicit with syntax. Once ownership leaves scope, the resource is cleaned up deterministically.
Rust 靠的是基于作用域的资源回收,而不是单独的 with 语法。所有者一旦离开作用域,资源就会按确定顺序被清理。
Decorator → Higher-Order Function or Macro
装饰器 → 高阶函数或宏
# Python — decorator for timing
import functools, time
def timed(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter()
result = func(*args, **kwargs)
elapsed = time.perf_counter() - start
print(f"{func.__name__} took {elapsed:.4f}s")
return result
return wrapper
@timed
def slow_function():
time.sleep(1)
#![allow(unused)]
fn main() {
// Rust — no decorators, use wrapper functions or macros
use std::time::Instant;
fn timed<F, R>(name: &str, f: F) -> R
where
F: FnOnce() -> R,
{
let start = Instant::now();
let result = f();
println!("{} took {:.4?}", name, start.elapsed());
result
}
let result = timed("slow_function", || {
std::thread::sleep(std::time::Duration::from_secs(1));
42
});
}
Rust has no direct decorator syntax, but wrapper functions, closures, traits, and macros usually cover the same design space in a more explicit way.
Rust 没有和装饰器完全一模一样的语法,但包装函数、闭包、trait、宏这些组合起来,通常能覆盖同样的设计需求,只是写法更显式。
Iterator Pipeline (Data Processing)
迭代器流水线(数据处理)
# Python — chain of transformations
import csv
from collections import Counter
def analyze_sales(filename):
with open(filename) as f:
reader = csv.DictReader(f)
sales = [
row for row in reader
if float(row["amount"]) > 100
]
by_region = Counter(sale["region"] for sale in sales)
top_regions = by_region.most_common(5)
return top_regions
#![allow(unused)]
fn main() {
// Rust — iterator chains with strong types
use std::collections::HashMap;
#[derive(Debug, serde::Deserialize)]
struct Sale {
region: String,
amount: f64,
}
fn analyze_sales(filename: &str) -> Vec<(String, usize)> {
let data = std::fs::read_to_string(filename).unwrap();
let mut reader = csv::Reader::from_reader(data.as_bytes());
let mut by_region: HashMap<String, usize> = HashMap::new();
for sale in reader.deserialize::<Sale>().flatten() {
if sale.amount > 100.0 {
*by_region.entry(sale.region).or_insert(0) += 1;
}
}
let mut top: Vec<_> = by_region.into_iter().collect();
top.sort_by(|a, b| b.1.cmp(&a.1));
top.truncate(5);
top
}
}
Global Config / Singleton
全局配置 / 单例
# Python — module-level singleton
import json
class Config:
_instance = None
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
with open("config.json") as f:
cls._instance.data = json.load(f)
return cls._instance
config = Config()
#![allow(unused)]
fn main() {
// Rust — OnceLock for lazy static initialization
use std::sync::OnceLock;
use serde_json::Value;
static CONFIG: OnceLock<Value> = OnceLock::new();
fn get_config() -> &'static Value {
CONFIG.get_or_init(|| {
let data = std::fs::read_to_string("config.json")
.expect("Failed to read config");
serde_json::from_str(&data)
.expect("Failed to parse config")
})
}
let db_host = get_config()["database"]["host"].as_str().unwrap();
}
Essential Crates for Python Developers
适合 Python 开发者优先认识的 crate
Data Processing & Serialization
数据处理与序列化
| Task 任务 | Python | Rust Crate | Notes 说明 |
|---|---|---|---|
| JSON | json | serde_json | Type-safe serialization 类型安全的序列化 |
| CSV | csv, pandas | csv | Streaming, low memory 支持流式处理,内存占用低 |
| YAML | pyyaml | serde_yaml | Config files |
| TOML | tomllib | toml | Config files |
| Data validation | pydantic | serde + custom | Compile-time + explicit validation 编译期约束配合显式校验 |
| Date/time | datetime | chrono | Full timezone support |
| Regex | re | regex | Very fast |
| UUID | uuid | uuid | Same concept |
Web & Network
Web 与网络
| Task 任务 | Python | Rust Crate | Notes 说明 |
|---|---|---|---|
| HTTP client | requests | reqwest | Async-first 异步优先 |
| Web framework | FastAPI / Flask | axum / actix-web | Very fast |
| WebSocket | websockets | tokio-tungstenite | Async |
| gRPC | grpcio | tonic | Full support |
| Database (SQL) | sqlalchemy | sqlx / diesel | Compile-time checked SQL SQL 约束更强 |
| Redis | redis-py | redis | Async support |
CLI & System
命令行与系统工具
| Task 任务 | Python | Rust Crate | Notes 说明 |
|---|---|---|---|
| CLI args | argparse / click | clap | Derive macros |
| Colored output | colorama | colored | Terminal colors |
| Progress bar | tqdm | indicatif | Similar UX |
| File watching | watchdog | notify | Cross-platform |
| Logging | logging | tracing | Structured and async-friendly |
| Env vars | os.environ | std::env + dotenvy | .env support |
| Subprocess | subprocess | std::process::Command | Built-in |
| Temp files | tempfile | tempfile | Same name |
Testing
测试
| Task 任务 | Python | Rust Crate | Notes 说明 |
|---|---|---|---|
| Test framework | pytest | Built-in + rstest | cargo test |
| Mocking | unittest.mock | mockall | Trait-based |
| Property testing | hypothesis | proptest | Similar idea |
| Snapshot testing | syrupy | insta | Snapshot approval |
| Benchmarking | pytest-benchmark | criterion | Statistical approach |
| Code coverage | coverage.py | cargo-tarpaulin | LLVM-based |
Incremental Adoption Strategy
渐进式引入策略
flowchart LR
A["1️⃣ Profile Python<br/>先找热点"] --> B["2️⃣ Write Rust Extension<br/>PyO3 + maturin"]
B --> C["3️⃣ Replace Python Call<br/>替换调用点"]
C --> D["4️⃣ Expand Gradually<br/>逐步扩大范围"]
D --> E{"Full rewrite worth it?<br/>值得全量重写吗?"}
E -->|Yes| F["Pure Rust 🦀"]
E -->|No| G["Hybrid 🐍 + 🦀"]
style A fill:#ffeeba
style B fill:#fff3cd
style C fill:#d4edda
style D fill:#d4edda
style F fill:#c3e6cb
style G fill:#c3e6cb
📌 See also: Ch. 14 — Unsafe Rust and FFI for the lower-level details behind PyO3 bindings.
📌 延伸阅读: 第 14 章——Unsafe Rust 与 FFI 会进一步讲 PyO3 绑定背后的底层细节。
Step 1: Identify Hotspots
第一步:先找热点
import cProfile
cProfile.run('main()')
# Or use py-spy:
# py-spy top --pid <python-pid>
# py-spy record -o profile.svg -- python main.py
Step 2: Write Rust Extension for the Hotspot
第二步:给热点写 Rust 扩展
cd my_python_project
maturin init --bindings pyo3
maturin develop --release
Step 3: Replace the Python Call
第三步:替换 Python 调用点
# Before:
result = python_hot_function(data)
# After:
import my_rust_extension
result = my_rust_extension.hot_function(data)
Step 4: Expand Gradually
第四步:逐步扩大范围
Week 1-2: Replace one CPU-bound function with Rust
Week 3-4: Replace data parsing/validation layer
Month 2: Replace core data pipeline
Month 3+: Consider full Rust rewrite if benefits justify it
Key principle: keep Python for orchestration, use Rust for computation.
第 1 到 2 周:先替换一个 CPU 密集函数第 3 到 4 周:再替换数据解析或校验层
第 2 个月:开始替换核心数据流水线
第 3 个月以后:如果收益足够大,再考虑完整重写
核心原则:Python 继续负责编排与胶水逻辑,Rust 专注高价值计算热点。
💼 Case Study: Accelerating a Data Pipeline with PyO3
案例:用 PyO3 加速数据流水线
A fintech startup processes 2GB of transaction CSV data every day. The slowest part is validation plus transformation.
一个金融科技团队每天要处理 2GB 交易 CSV。最慢的部分是校验和转换逻辑。
# Python — the slow part (~12 minutes for 2GB)
import csv
from decimal import Decimal
from datetime import datetime
def validate_and_transform(filepath: str) -> list[dict]:
results = []
with open(filepath) as f:
reader = csv.DictReader(f)
for row in reader:
amount = Decimal(row["amount"])
if amount < 0:
raise ValueError(f"Negative amount: {amount}")
date = datetime.strptime(row["date"], "%Y-%m-%d")
category = categorize(row["merchant"])
results.append({
"amount_cents": int(amount * 100),
"date": date.isoformat(),
"category": category,
"merchant": row["merchant"].strip().lower(),
})
return results
Step 1: Profile first and confirm that CSV parsing, decimal conversion, and string matching dominate runtime.
第一步:先 profile,确认耗时主要集中在 CSV 解析、金额转换和字符串匹配上。
Step 2: Move the hotspot into a Rust extension.
第二步:把热点逻辑搬进 Rust 扩展。
#![allow(unused)]
fn main() {
// src/lib.rs — PyO3 extension
use pyo3::prelude::*;
use pyo3::types::PyList;
use std::fs::File;
use std::io::BufReader;
#[derive(Debug)]
struct Transaction {
amount_cents: i64,
date: String,
category: String,
merchant: String,
}
fn categorize(merchant: &str) -> &'static str {
if merchant.contains("amazon") { "shopping" }
else if merchant.contains("uber") || merchant.contains("lyft") { "transport" }
else if merchant.contains("starbucks") { "food" }
else { "other" }
}
#[pyfunction]
fn process_transactions(path: &str) -> PyResult<Vec<(i64, String, String, String)>> {
let file = File::open(path).map_err(|e| pyo3::exceptions::PyIOError::new_err(e.to_string()))?;
let mut reader = csv::Reader::from_reader(BufReader::new(file));
let mut results = Vec::with_capacity(15_000_000);
for record in reader.records() {
let record = record.map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string()))?;
let amount_str = &record[0];
let amount_cents = parse_amount_cents(amount_str)?;
let date = &record[1];
let merchant = record[2].trim().to_lowercase();
let category = categorize(&merchant).to_string();
results.push((amount_cents, date.to_string(), category, merchant));
}
Ok(results)
}
#[pymodule]
fn fast_pipeline(_py: Python, m: &PyModule) -> PyResult<()> {
m.add_function(wrap_pyfunction!(process_transactions, m)?)?;
Ok(())
}
}
Step 3: Replace one call in Python and keep everything else the same.
第三步:只换掉 Python 里一个调用点,其余逻辑保持原样。
# Before:
results = validate_and_transform("transactions.csv")
# After:
import fast_pipeline
results = fast_pipeline.process_transactions("transactions.csv")
Results:
结果:
| Metric 指标 | Python | Rust |
|---|---|---|
| Time (2GB / 15M rows) 耗时 | 12 minutes | 45 seconds |
| Peak memory 峰值内存 | 6GB / 2GB | 200MB |
| Lines changed in Python Python 改动行数 | — | 1 |
| Rust code written 新增 Rust 代码 | — | ~60 lines |
| Tests passing 测试通过 | 47/47 | 47/47 |
Key lesson: Most teams do not need a full rewrite. Replacing the small fraction of code that consumes most of the runtime often captures most of the practical benefit.
关键教训:大多数团队根本不需要全量重写。只替换那一小部分最耗时的代码,往往就已经吃到了绝大多数收益。
Exercises
练习
🏋️ Exercise: Migration Decision Matrix
🏋️ 练习:迁移决策矩阵
Challenge: For each component of a Python web application below, decide whether it should stay in Python, move to Rust, or be bridged through PyO3. Give a short reason.
挑战:下面列出的是一个 Python Web 应用的不同组件。分别判断它们应该保留在 Python、重写成 Rust,还是通过 PyO3 桥接,并简要说明理由。
- Flask route handlers
1. Flask 路由处理器 - Image thumbnail generation
2. 图片缩略图生成 - SQLAlchemy ORM queries
3. SQLAlchemy ORM 查询 - Nightly CSV parsing for 2GB financial files
4. 每晚解析 2GB 金融 CSV - Admin dashboard templates
5. 管理后台模板页
🔑 Solution
🔑 参考答案
| Component 组件 | Decision 决策 | Rationale 理由 |
|---|---|---|
| Flask route handlers | 🐍 Keep Python | I/O-bound, framework-heavy, low performance return 偏 I/O、强框架绑定,迁移收益通常不高 |
| Image thumbnail generation | 🦀 PyO3 bridge | CPU-bound hotspot with clear boundary CPU 热点明确,边界清晰,很适合桥接 |
| Database ORM queries | 🐍 Keep Python | ORM 生态成熟,而且主要是 I/O 等待 成熟 ORM 生态优势明显,且核心瓶颈通常不是 CPU |
| CSV parser (2GB) | 🦀 PyO3 bridge or full Rust | CPU + memory sensitive, Rust parsing shines 既吃 CPU 又吃内存,Rust 在这里非常强 |
| Admin dashboard | 🐍 Keep Python | Mostly UI and templates, little execution pressure 主要是界面和模板逻辑,执行性能不是重点 |
Key takeaway: The best migration targets are usually CPU-heavy, performance-sensitive components with clean interfaces. Glue code and framework-heavy request handlers often stay more economical in Python.
核心收获:最值得迁移的部分,通常是 CPU 密集、性能敏感、边界清晰的组件。至于胶水代码和强框架绑定的请求处理逻辑,继续留在 Python 往往更划算。