Concurrency 并发 - Rust for Python Programmers

No GIL: True Parallelism
没有 GIL：真正的并行

What you’ll learn: Why the GIL limits Python concurrency, Rust’s Send/Sync traits for compile-time thread safety, Arc<Mutex<T>> vs Python threading.Lock, channels vs queue.Queue, and async/await differences.
本章将学习： GIL 为什么会限制 Python 并发，Rust 如何用 Send/Sync 在编译期保证线程安全，Arc<Mutex<T>> 与 Python threading.Lock 的对应关系，通道与 queue.Queue 的区别，以及两门语言中 async/await 的差异。

Difficulty: 🔴 Advanced
难度： 🔴 高级

The GIL is Python’s biggest limitation for CPU-bound work. Rust has no GIL, so threads can run truly in parallel, and the type system prevents data races at compile time.
对于 CPU 密集任务来说，GIL 基本就是 Python 最大的天花板。Rust 没有这一层限制，线程可以真并行，而数据竞争则由类型系统在编译期拦下来。

gantt
    title CPU-bound Work: Python GIL vs Rust Threads
    dateFormat X
    axisFormat %s
    section Python (GIL)
        Thread 1 :a1, 0, 4
        Thread 2 :a2, 4, 8
        Thread 3 :a3, 8, 12
        Thread 4 :a4, 12, 16
    section Rust (no GIL)
        Thread 1 :b1, 0, 4
        Thread 2 :b2, 0, 4
        Thread 3 :b3, 0, 4
        Thread 4 :b4, 0, 4

Key insight: Python threads run sequentially for CPU work because the GIL serializes them. Rust threads run truly in parallel, so four threads can approach a four-times speedup on the right workload.
关键理解： Python 线程在 CPU 任务上往往还是串着跑，因为 GIL 会把执行权串行化；Rust 线程则能真正并行，在合适负载下四个线程就有机会逼近四倍吞吐。

📌 Prerequisite: Be comfortable with Ch. 7 (Ownership and Borrowing) before this chapter. Arc, Mutex, and move closures all rest on the ownership model.
📌 前置建议： 在读这一章前，最好已经吃透第 7 章——所有权与借用。Arc、Mutex、move 闭包这些概念，底层都建立在所有权模型之上。

Python’s GIL Problem
Python 的 GIL 问题

# Python — threads don't help for CPU-bound work
import threading
import time

counter = 0

def increment(n):
    global counter
    for _ in range(n):
        counter += 1  # NOT thread-safe: += is a read-modify-write, and the GIL does not make it atomic

threads = [threading.Thread(target=increment, args=(1_000_000,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"Counter: {counter}")    # Might not be 4,000,000!
print(f"Time: {elapsed:.2f}s")  # About the SAME as single-threaded (GIL)

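# For true parallelism, Python requires multiprocessing.
# (cpu_work and data are placeholders for your own function and inputs.)
from multiprocessing import Pool
if __name__ == "__main__":  # required on platforms that spawn worker processes
    with Pool(4) as pool:
        results = pool.map(cpu_work, data)  # Separate processes, pickle overhead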

Rust — True Parallelism, Compile-Time Safety
Rust：真正的并行与编译期安全

use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicI64::new(0));

    let handles: Vec<_> = (0..4).map(|_| {
        let counter = Arc::clone(&counter);
        thread::spawn(move || {
            for _ in 0..1_000_000 {
                counter.fetch_add(1, Ordering::Relaxed);
            }
        })
    }).collect();

    for h in handles {
        h.join().unwrap();
    }

    println!("Counter: {}", counter.load(Ordering::Relaxed)); // Always 4,000,000
    // Runs on ALL cores — true parallelism, no GIL
}

Thread Safety: Type System Guarantees
线程安全：类型系统给出的保证

Python — Runtime Errors
Python：很多问题要到运行时才暴露

# Python — data races caught at runtime (or not at all)
import threading

shared_list = []

def append_items(items):
    for item in items:
        shared_list.append(item)  # "Thread-safe" due to GIL for append
        # But complex operations are NOT safe:
        # if item not in shared_list:
        #     shared_list.append(item)  # RACE CONDITION!

# Using Lock for safety:
lock = threading.Lock()
def safe_append(items):
    for item in items:
        with lock:
            if item not in shared_list:
                shared_list.append(item)
# Forgetting the lock? No compiler warning. Bug discovered in production.

Rust — Compile-Time Errors
Rust：很多错误编译不过去

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
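    // Trying to share a Vec across threads without protection:
    // let mut shared = vec![];
    // thread::spawn(|| shared.push(1));
    // thread::spawn(|| shared.push(2));
    // ❌ Compile error: each closure may outlive `shared`, and two mutable
    //    borrows cannot coexist. (Vec<i32> itself is Send + Sync; it is the
    //    borrow rules that reject unsynchronized shared mutation.)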

    // With Mutex (Rust's equivalent of threading.Lock):
    let shared = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..4).map(|i| {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut data = shared.lock().unwrap(); // Lock is REQUIRED to access
            data.push(i);
            // Lock is automatically released when `data` goes out of scope
            // No "forgetting to unlock" — RAII guarantees it
        })
    }).collect();

    for h in handles {
        h.join().unwrap();
    }

    println!("{:?}", shared.lock().unwrap()); // [0, 1, 2, 3] (order may vary)
}
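The Python `safe_append` above guards a check-then-append sequence with a lock. A minimal Rust sketch of the same idea follows; the function name `dedup_append` and its input shape are assumptions made for illustration. Because the `Vec` lives inside the `Mutex`, the `contains` check and the `push` can only happen while the guard is held.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: each inner Vec is handled by its own thread, and an
// item is appended to the shared list only if it is not already present.
fn dedup_append(inputs: Vec<Vec<u32>>) -> Vec<u32> {
    let shared = Arc::new(Mutex::new(Vec::<u32>::new()));

    let handles: Vec<_> = inputs
        .into_iter()
        .map(|items| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for item in items {
                    // Check and push happen under one lock acquisition,
                    // so no other thread can slip in between them.
                    let mut data = shared.lock().unwrap();
                    if !data.contains(&item) {
                        data.push(item);
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Insertion order depends on scheduling, so sort for a stable result.
    let mut result = shared.lock().unwrap().clone();
    result.sort();
    result
}

fn main() {
    let merged = dedup_append(vec![vec![1, 2, 3], vec![2, 3, 4]]);
    println!("{merged:?}");
}
```

The guard is re-acquired on every iteration, which keeps the critical section short at the cost of more lock traffic; locking once per thread would be the opposite trade-off.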

Send and Sync Traits
`Send` 与 `Sync` trait

#![allow(unused)]
fn main() {
// Rust uses two marker traits to enforce thread safety:

// Send — "this type can be transferred to another thread"
// Most types are Send. Rc<T> is NOT (use Arc<T> for threads).

// Sync — "this type can be referenced from multiple threads"
// Most types are Sync. Cell<T>/RefCell<T> are NOT (use Mutex<T>).

// The compiler checks these automatically:
// thread::spawn(move || { ... })
//   ↑ The closure and everything it captures must be Send (and 'static)
//   ↑ A captured &T is Send only if T is Sync
//   ↑ If they're not → compile error

// Python has no equivalent. Thread safety bugs are discovered at runtime.
// Rust catches them at compile time. This is "fearless concurrency."
}

Send and Sync sound abstract at first, but they simply answer two questions: can this value move across threads, and can references to it be shared across threads?
Send 和 Sync 这俩名字第一次看挺抽象，实际上说的就是“能不能在线程之间移动”和“能不能在线程之间共享引用”。理解成这两句人话，基本就顺了。
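One way to see these two questions answered by the compiler is to write tiny generic functions that only accept `Send` or `Sync` types. The sketch below is illustrative: `assert_send`, `assert_sync`, and `sum_in_thread` are hypothetical helper names, not part of the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helpers: a call compiles only if T satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Moving a Vec<u64> into another thread is allowed because Vec<u64> is Send.
fn sum_in_thread(values: Vec<u64>) -> u64 {
    thread::spawn(move || values.iter().sum::<u64>())
        .join()
        .unwrap()
}

fn main() {
    assert_send::<Vec<u64>>();
    assert_sync::<Vec<u64>>();
    assert_send::<Arc<Mutex<Vec<u64>>>>();
    assert_sync::<Arc<Mutex<Vec<u64>>>>();

    // Uncommenting either line below produces a compile error:
    // assert_send::<std::rc::Rc<u64>>();        // Rc<T> is not Send
    // assert_sync::<std::cell::RefCell<u64>>(); // RefCell<T> is not Sync

    println!("sum = {}", sum_in_thread(vec![1, 2, 3]));
}
```

Nothing here runs a check at runtime; the bounds are verified entirely during compilation.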

Concurrency Primitives Comparison
并发原语对照

Python	Rust	Purpose 用途
`threading.Lock()`	`Mutex<T>`	Mutual exclusion 互斥访问
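`threading.RLock()`	No std equivalent (`parking_lot::ReentrantMutex` in the ecosystem)	Reentrant lock in Python; std `Mutex<T>` is not reentrant, so Rust code is usually restructured to lock once Python 是可重入锁；标准库的 `Mutex<T>` 不可重入，Rust 通常换一种设计来避免这种需求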
`threading.RWLock` (N/A)	`RwLock<T>`	Multiple readers or one writer 多读单写
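`threading.Condition()` / `threading.Event()`	`Condvar` (paired with a `Mutex<T>`)	Condition variable; an `Event` maps to a `Mutex<bool>` flag plus a `Condvar` 条件变量；`Event` 相当于 `Mutex<bool>` 标志位加 `Condvar`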
`queue.Queue()`	`mpsc::channel()`	Thread-safe message channel 线程安全消息通道
`multiprocessing.Pool`	`rayon::ThreadPool`	Worker pool: processes in Python, threads in Rust 工作池：Python 用进程，Rust 用线程
`concurrent.futures`	`rayon` / `tokio::spawn`	Task-based parallelism 基于任务的并行
`threading.local()`	`thread_local!`	Thread-local storage 线程局部存储
N/A	`Atomic*` types	Lock-free counters and flags 无锁原子计数器与标志位
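The `queue.Queue()` row can be made concrete with a short sketch. It assumes a hypothetical `collect_squares` function in which several worker threads send results to one receiver through `std::sync::mpsc::channel`. Unlike `queue.Queue`, an `mpsc` channel has many producers but a single consumer, and the receiver's iterator ends once every sender has been dropped.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical example: each worker sends the square of its id.
fn collect_squares(workers: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();

    for id in 0..workers {
        let tx = tx.clone(); // one sender handle per producer thread
        thread::spawn(move || {
            tx.send(id * id).unwrap();
        });
    }

    // Drop the original sender; otherwise `rx.iter()` would never finish.
    drop(tx);

    // Blocks until all senders are gone, then yields every message.
    let mut results: Vec<u64> = rx.iter().collect();
    results.sort(); // arrival order depends on scheduling
    results
}

fn main() {
    println!("{:?}", collect_squares(4));
}
```

Ownership does the bookkeeping that `Queue.join()` or sentinel values often do in Python: the channel closes when the last sender goes out of scope.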

Mutex Poisoning
`Mutex` 中毒

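If a thread panics while holding a Mutex, the lock becomes poisoned. Python has no direct equivalent: a lock taken with `with` is released when the exception propagates, so other threads may see half-updated data with no warning, while a lock acquired manually and never released leaves them blocked forever.
如果某个线程在持有 Mutex 时 panic，这把锁就会进入 poisoned 状态。Python 没有直接对应的机制：用 with 获取的锁会在异常传播时自动释放，其他线程可能在毫无提示的情况下看到只改了一半的数据；手动 acquire 却一直没有 release 的锁，则会让其他线程永远卡在那里。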

#![allow(unused)]
fn main() {
use std::sync::{Arc, Mutex};
use std::thread;

let data = Arc::new(Mutex::new(vec![1, 2, 3]));
let data2 = Arc::clone(&data);

let _ = thread::spawn(move || {
    let mut guard = data2.lock().unwrap();
    guard.push(4);
    panic!("oops!");  // Lock is now poisoned
}).join();

// Subsequent lock attempts return Err(PoisonError)
match data.lock() {
    Ok(guard) => println!("Data: {guard:?}"),
    Err(poisoned) => {
        println!("Lock was poisoned! Recovering...");
        let guard = poisoned.into_inner();
        println!("Recovered: {guard:?}");  // [1, 2, 3, 4]
    }
}
}
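When the protected data is known to stay valid even after a panic, a common shorthand is to recover the guard in one expression instead of a full `match`. The sketch below assumes a hypothetical `push_after_panic` function; whether recovery is appropriate depends on the invariants of your own data.

```rust
use std::sync::{Arc, Mutex, PoisonError};
use std::thread;

// Hypothetical example: a worker panics while holding the lock, then the
// caller recovers the guard and keeps using the data.
fn push_after_panic() -> Vec<i32> {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let worker = Arc::clone(&data);

    // join() returns Err because the thread panicked; we ignore it here.
    let _ = thread::spawn(move || {
        let _guard = worker.lock().unwrap();
        panic!("worker failed while holding the lock");
    })
    .join();

    // PoisonError::into_inner hands back the guard despite the poison flag.
    let mut guard = data.lock().unwrap_or_else(PoisonError::into_inner);
    guard.push(4);
    (*guard).clone()
}

fn main() {
    println!("{:?}", push_after_panic());
}
```

The worker's panic message still appears on stderr; only the lock acquisition is made tolerant of it.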

Atomic Ordering
原子操作中的内存序

The Ordering parameter on atomic operations controls memory visibility guarantees.
原子操作里的 Ordering 参数，控制的是内存可见性和执行顺序保证。

Ordering	When to use 什么时候用
`Relaxed`	Simple counters where ordering does not matter 只关心计数值、不关心先后关系时
`Acquire`/`Release`	Producer-consumer handoff 生产者消费者交接数据时
`SeqCst`	When in doubt; strongest and easiest to reason about 拿不准就先用它，语义最强也最好理解

Python largely hides these memory-ordering details behind the GIL. Rust exposes the choice, which is powerful but also demands care, so SeqCst is a good default when correctness matters more than micro-optimization.
Python 的 threading 基本把这些细节都藏在 GIL 后面了。Rust 让人自己选，权力大，责任也大，所以拿不准时先用 SeqCst 往往比较稳。
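The `Acquire`/`Release` row in the table is easier to follow with a concrete handoff. In this sketch (the function name `handoff` is an assumption), the producer writes a value and then sets a flag with `Release`; the consumer waits for the flag with `Acquire`, which guarantees that the earlier write is visible once the flag reads `true`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical example: pass `value` from a producer thread to the caller.
fn handoff(value: u64) -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            data.store(value, Ordering::Relaxed);
            // Release: everything written before this store is published.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: once we observe `true`, the producer's earlier write is visible.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}

fn main() {
    println!("received {}", handoff(42));
}
```

In real code a channel or a `Mutex` plus `Condvar` is usually a better fit than spinning on a flag; the loop here exists only to make the ordering visible.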

async/await Comparison
async/await 对照

Python and Rust both use async/await syntax, but the runtimes and performance model underneath are very different.
Python 和 Rust 都有 async/await 语法，但底下的运行时模型和性能边界差别很大。

Python async/await
Python 的 async/await

# Python — asyncio for concurrent I/O
import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def main():
    urls = ["https://example.com", "https://httpbin.org/get"]

    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

    for url, result in zip(urls, results):
        print(f"{url}: {len(result)} bytes")

asyncio.run(main())

# Python async is single-threaded (still GIL)!
# It only helps with I/O-bound work (waiting for network/disk).
# CPU-bound work in async still blocks the event loop.

Rust async/await
Rust 的 async/await

// Rust — tokio for concurrent I/O (and CPU parallelism!)
// Requires the reqwest, tokio (features "macros" and "rt-multi-thread"),
// and futures crates in Cargo.toml.

async fn fetch_url(url: &str) -> Result<String, reqwest::Error> {
    reqwest::get(url).await?.text().await
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let urls = vec!["https://example.com", "https://httpbin.org/get"];

    let tasks: Vec<_> = urls.iter()
        .map(|url| tokio::spawn(fetch_url(url)))  // No GIL limitation
        .collect();                                 // Can use all CPU cores

    let results = futures::future::join_all(tasks).await;

    for (url, result) in urls.iter().zip(results) {
        match result {
            Ok(Ok(body)) => println!("{url}: {} bytes", body.len()),
            Ok(Err(e)) => println!("{url}: error {e}"),
            Err(e) => println!("{url}: task failed {e}"),
        }
    }

    Ok(())
}

Key Differences
关键差异

Aspect	Python asyncio	Rust tokio
GIL	Still applies 仍然存在	No GIL 没有 GIL
CPU parallelism	❌ Single-threaded 通常单线程	✅ Multi-threaded 可多线程并行
Runtime	Built-in 内建	External crate 外部 crate
Ecosystem	`aiohttp`, `asyncpg`	`reqwest`, `sqlx`
Performance	Good for I/O 适合 I/O	Excellent for I/O; CPU-heavy work is usually handed off to `spawn_blocking` or Rayon I/O 很强，也更容易和并行任务结合
Error handling	Exceptions 异常	`Result<T, E>`
Cancellation	`task.cancel()`	Drop the future 丢弃 future
Function coloring (sync vs async split)	Exists 存在	Also exists 同样存在
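The "Cancellation" row follows from how Rust futures work: a future does nothing until it is polled, so dropping it simply stops it. The sketch below uses only the standard library. `NoopWaker`, `block_on`, and `drop_then_run` are hypothetical names, and the busy-polling `block_on` is a toy written for this illustration, not how tokio schedules tasks.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical no-op waker: enough for futures that never actually wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor (an assumption for this sketch): poll until ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns the counter after dropping an unpolled future, and again after
// driving a second future to completion.
fn drop_then_run() -> (u32, u32) {
    let ran = AtomicU32::new(0);

    let never_polled = async {
        ran.fetch_add(1, Ordering::Relaxed);
    };
    drop(never_polled); // cancelled before it started: the body never runs
    let after_drop = ran.load(Ordering::Relaxed);

    block_on(async {
        ran.fetch_add(1, Ordering::Relaxed);
    });
    (after_drop, ran.load(Ordering::Relaxed))
}

fn main() {
    let (after_drop, after_run) = drop_then_run();
    println!("after drop: {after_drop}, after run: {after_run}");
}
```

A future dropped partway through stops at its last `.await` point, and any values it owns are dropped with it, which is why Rust needs no separate `task.cancel()` call for plain futures.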

Simple Parallelism with Rayon
用 Rayon 做简单并行

# Python — multiprocessing for CPU parallelism
from multiprocessing import Pool

def process_item(item):
    return heavy_computation(item)

with Pool(8) as pool:
    results = pool.map(process_item, items)

#![allow(unused)]
fn main() {
// Rust — rayon for effortless CPU parallelism (one line change!)
use rayon::prelude::*;

// Sequential:
let results: Vec<_> = items.iter().map(|item| heavy_computation(item)).collect();

// Parallel (change .iter() to .par_iter() — that's it!):
let results: Vec<_> = items.par_iter().map(|item| heavy_computation(item)).collect();

// No pickle, no process overhead, no serialization.
// Rayon automatically distributes work across cores.
}
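To see what `.par_iter()` saves you from writing, here is a rough standard-library sketch of a parallel map built on `std::thread::scope`. It is an illustration under stated assumptions: `parallel_map` is a hypothetical helper, `heavy_computation` is a stand-in that just squares its input, and the fixed chunking is far simpler than Rayon's work-stealing scheduler.

```rust
use std::thread;

// Hypothetical stand-in for the chapter's `heavy_computation`.
fn heavy_computation(item: &u64) -> u64 {
    item * item
}

// Split `items` into roughly equal chunks and map each chunk on its own thread.
fn parallel_map(items: &[u64], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` panics.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `items` directly: no Arc, no cloning.
    thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(heavy_computation).collect::<Vec<u64>>())
            })
            .collect();

        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}

fn main() {
    println!("{:?}", parallel_map(&[1, 2, 3, 4, 5], 2));
}
```

Fixed chunks work well when every item costs about the same; when costs vary, idle threads cannot pick up leftover work, which is the gap Rayon's scheduler is designed to close.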

Case Study: Parallel Image Processing Pipeline
案例：并行图像处理流水线

A data science team processes 50,000 satellite images nightly. Their Python pipeline uses multiprocessing.Pool.
某数据团队每晚要处理 5 万张卫星图像，原来的 Python 流水线靠 multiprocessing.Pool 顶着跑。

# Python — multiprocessing for CPU-bound image work
import multiprocessing
from PIL import Image
import numpy as np

def process_image(path: str) -> dict:
    img = np.array(Image.open(path))
    # CPU-intensive: histogram equalization, edge detection, classification
    histogram = np.histogram(img, bins=256)[0]
    edges = detect_edges(img)       # ~200ms per image
    label = classify(edges)          # ~100ms per image
    return {"path": path, "label": label, "edge_count": len(edges)}

# Problem: each subprocess copies the full Python interpreter
# Memory: 50MB per worker × 16 workers = 800MB overhead
# Startup: 2-3 seconds to fork and pickle arguments
with multiprocessing.Pool(16) as pool:
    results = pool.map(process_image, image_paths)  # ~4.5 hours for 50k images

Pain points: 800MB memory overhead from forking, pickle serialization of arguments and results, the GIL forcing process-based workarounds, and ugly debugging when workers fail.
主要痛点： 进程分叉带来 800MB 额外内存，参数和结果都要 pickle，GIL 逼着人走多进程，worker 出错时还特别难排查。

use rayon::prelude::*;
use image::GenericImageView;

struct ImageResult {
    path: String,
    label: String,
    edge_count: usize,
}

fn process_image(path: &str) -> Result<ImageResult, image::ImageError> {
    let img = image::open(path)?;
    let histogram = compute_histogram(&img);       // ~50ms (no numpy overhead)
    let edges = detect_edges(&img);                // ~40ms (SIMD-optimized)
    let label = classify(&edges);                  // ~20ms
    Ok(ImageResult {
        path: path.to_string(),
        label,
        edge_count: edges.len(),
    })
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let paths: Vec<String> = load_image_paths()?;

    // Rayon automatically uses all CPU cores — no forking, no pickle, no GIL
    let results: Vec<ImageResult> = paths
        .par_iter()                                // Parallel iterator
        .filter_map(|p| process_image(p).ok())     // Skip errors gracefully
        .collect();                                // Collect in parallel

    println!("Processed {} images", results.len());
    Ok(())
}
// 50k images in ~35 minutes (vs 4.5 hours in Python)
// Memory: ~50MB total (shared threads, no forking)

Results:
结果：

Metric	Python	Rust
Time for 50k images 5 万张耗时	~4.5 hours	~35 minutes
Memory overhead 额外内存	800MB	~50MB
Error handling 错误处理	Opaque worker exceptions 异常难查	`Result<T, E>` everywhere 每一步都有明确结果类型
Startup cost 启动成本	2–3s	Near zero 几乎没有

Key lesson: For CPU-heavy parallel workloads, Rust threads plus Rayon avoid Python’s serialization overhead while keeping memory shared and correctness checked at compile time.
核心结论： 在 CPU 密集并行场景里，Rust 线程配合 Rayon 能把 Python 多进程那套序列化与内存开销省掉，同时还保留编译期正确性检查。

Exercises
练习

🏋️ Exercise: Thread-Safe Counter
🏋️ 练习：线程安全计数器

Challenge: In Python, a shared counter would often use threading.Lock. Translate that idea into Rust: spawn 10 threads, have each one increment the same counter 1000 times, and print the final value. Use Arc<Mutex<u64>>.
挑战：在 Python 里，这类共享计数器通常会配 threading.Lock。把它翻译成 Rust：启动 10 个线程，每个线程把同一个计数器加 1000 次，最后打印结果。要求使用 Arc<Mutex<u64>>。

🔑 Solution
🔑 参考答案

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                let mut num = counter.lock().unwrap();
                *num += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final count: {}", *counter.lock().unwrap());
}

Key takeaway: Arc<Mutex<T>> is Rust’s equivalent of a shared value plus a lock, but the compiler forces the synchronization pattern into the type itself. That means fewer “forgot the lock” bugs escaping into production.
核心收获： Arc<Mutex<T>> 基本就是 Rust 里“共享数据 + 锁”的组合写法，但 Rust 会把同步要求直接编码进类型里，所以那种“锁忘了加，结果线上炸锅”的低级事故会少很多。
