why-async-rust - JonahGao's Notes

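#rust

- Original post: [https://without.boats/blog/why-async-rust/](https://without.boats/blog/why-async-rust/)

# 1. Some background on terminology

- Costs of operating system threads:
    - Context switches between kernel and user space cost CPU cycles.
    - The pre-allocated stack is large, which adds per-thread memory overhead.
- First design choice: **cooperative** vs. **preemptive** scheduling
    - cooperative: a task must voluntarily yield control to the scheduling subsystem.
    - preemptive: a task can be stopped at some point while it is running, and the task does not need to be aware of it.
- Goroutines:
    - are preemptively scheduled tasks
    - strictly speaking, they should be called virtual threads or green threads, not coroutines
- Second design choice: **stackful** vs. **stackless** coroutines
    - stackful: the coroutine has its own program stack. When it yields, the state of the stack is saved so that it can later resume from the same position.
    - stackless: the state needed to resume is stored in a different way, such as a continuation or a state machine. After the coroutine yields, the stack it was using is used by whatever operation takes over. When the coroutine is resumed, it takes back control of the stack and uses the continuation or state machine to pick up where it left off.
- function coloring problem: to get the result of an async function, you have to use a different operation (await) instead of calling it directly.
- Rust's async/await syntax is a **stackless coroutine** mechanism.
    - An async function is compiled into a function that returns a `Future`.
    - When the coroutine yields control, the future stores its state.

---------

# 2. The development of async Rust

## 2.1 Green threads

- Rust originally had a stackful coroutine mechanism (green threads), which was removed in late 2014.
- One big problem for a green thread system is how to handle the program stacks of these threads.
    - One advantage of user-space threads is that they avoid the memory overhead of the large, pre-allocated stacks of OS threads.
    - So a green thread's stack should be smaller and grow on demand.

### segmented stacks

- One way to solve the stack problem
- The stack is a linked list of small stack segments.
    - To grow, a new segment is appended to the list.
    - To shrink, the segment is removed from the list.
- Problems:
    - The cost of pushing a stack frame varies widely (high when a new segment must be allocated, low otherwise).
    - Worst case: a function call inside a loop triggers allocation of a new segment, so every iteration has to allocate, which hurts performance badly.
    - And this behavior is invisible to the user.
- Go and Rust both started with this approach and later abandoned it.

### stack copying

- Another way to solve the stack problem
- The stack is more like a Vec than a linked list: when it runs out of space, it is reallocated as a larger one.
- Problem: reallocation requires copying, so the stack's memory address changes, **every pointer into the stack becomes invalid**, and some other mechanism is needed to update them.
- Go uses **stack copying**.
    - Go's advantage: a pointer and the memory it points to are always in the same stack, so updating pointers only requires scanning that stack.
- Rust: a pointer on one stack can point into another stack (for example, the stack of another thread). Tracking pointers then becomes the same problem as garbage collection, and Rust has no GC, so it cannot use this approach.
- Another problem: stack resizing makes integration with other languages difficult.
    - Switching from running code on a green thread to running it on the OS thread stack can be too expensive for FFI.
    - Go accepted this FFI cost. C# [abandoned](https://github.com/dotnet/runtimelab/issues/2398) green threads for this reason.
- Rust needs to run on embedded systems, which cannot carry a virtual threading runtime.

> [@rogeralsing](https://github.com/rogeralsing) If I understand correctly, the main cost of increased foreign function call overhead comes from stack switching. Green thread is initialized with a small stack and grow by-demand, to reduce memory overhead of having many green threads. The called native code does not have stack growing functionality, so not switching stack could cause stack overflow.
>
> Golang does stack switching when calling FFI which is also slow.

- Green threads were removed from Rust by [RFC 230](https://github.com/rust-lang/rfcs/pull/230).

## 2.2 Iterators

- Rust's decision to move to [external iterators](https://web.archive.org/web/20140716172928/https://mail.mozilla.org/pipermail/rust-dev/2013-June/004599.html), and how efficient they turned out to be when combined with Rust's ownership and borrowing model, eventually led Rust to async/await.
- external iterators: the end user drives the iterator (pull-based)
- **Aliasing XOR Mutability**
    - You can have aliasing or mutability, but never both at once.
    - aliasing: two or more references to the same memory location
    - mutability: permission to modify the value at a memory location
    - In other words, there can be either exactly one mutable reference (a variable of type `&mut`) or any number of shared references (variables of type `&`).

```rust
fn main() {
    let a = String::from("a");
    let b = &a;
    let c = &a;
    // So far we have two additional references to `a`.
    // This is aliasing, which is fine, as long as
    // those references don't have mutability.
    println!("a is {}", a);
    println!("b is {}", b);
    println!("c is {}", c);
}
```

- Early on, the mutable XOR aliased mechanism had no lifetime analysis, and references were just argument modifiers.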
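- In 2012 Rust implemented the first version of lifetime analysis, which promoted references to real types that could be embedded in structs.
- Before external iterators, Rust defined iterators with a callback mechanism:

```rust
enum ControlFlow {
    Break,
    Continue,
}

trait Iterator {
    type Item;

    fn iterate(self, f: impl FnMut(Self::Item) -> ControlFlow) -> ControlFlow;
}
```

- External iterators fit perfectly with Rust's ownership and borrowing system.
    - An iterator is essentially compiled into a struct that holds the state of the iteration and can contain references to other data structures (the ones being iterated over).
- Problem: iterators are hard to write by hand, because you have to define the state machine yourself.
- Improvement: in the future Rust may support generators written with `yield`, similar to C#.
    - [Implement `gen` blocks in the 2024 edition #116447](https://github.com/rust-lang/rust/pull/116447)

## 2.3 Futures

### continuation passing style

- The futures/promises APIs in most other languages follow what is called "continuation passing style."
    - callback-based
    - That is, you pass a continuation argument to the Future object, and the future calls it as its last operation when it completes.

```rust
trait Future {
    type Output;

    fn schedule(self, continuation: impl FnOnce(Self::Output));
}
```

- Problems that continuation passing style ran into in Rust:
    - `join` has to take two futures and run them concurrently.
    - So the continuation of `join` must be owned by both child futures (either one may finish first).
    - In the end the continuation has to be allocated with reference counting, which was unacceptable for Rust.

### readiness-based

- Drawing on how asynchronous programming is done in C, they wanted a definition of Future that could be compiled into a state machine.

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}

trait Future {
    type Output;

    fn poll(&mut self) -> Poll<Self::Output>;
}
```

- A future is polled by an external executor.
- When a future returns pending, it stores a way to wake the executor once it becomes ready.
- This shift is very similar to the one iterators went through.
    - The state machine can borrow state from outside.
    - The end result is a single-object state machine.
- In the early days users had to implement every future themselves, and one thing made that hard:
    - Once a future is spawned, it has to escape its surrounding context.
    - So it cannot borrow state from that context. The task must own all of its state.

## 2.4 Async/await

- Futures vs. green threads in terms of stack usage:
    - green thread: grows on demand
    - future: a perfectly sized stack, allocated with exactly as much space as it needs
- A future's stack is implemented as a struct, and structs in Rust can be moved.
    - Moving causes the same problem green threads had: pointers into the stack become invalid.
    - So futures need to be constrained to be **immovable**.
- Three key requirements for async/await:
    - The language needs async/await syntax, so that users can build complex futures out of coroutine-like functions.
    - The async/await syntax needs to support compiling those functions into **self-referential** structs, so that users can use references inside coroutines.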
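    - The feature needed to ship as soon as possible.
- The `Pin` type is what enforces that a future cannot be moved.

------------

# 3. Organizational considerations

- Rust's lack of a runtime made green threads infeasible.
    - Rust needs to support embedding (being embedded in other applications, or running on embedded systems).
    - Rust cannot perform the memory management that green threads require.
- Rust is naturally able to compile coroutines into highly optimizable state machines while still preserving **memory safety**.
    - We take advantage of this not only in Future but also in iterators.
- When implementing high-performance network services in languages without user-space concurrency, such as C, people have to write state machines by hand.
    - `Future` avoids hand-written state machines by letting the compiler generate the state transitions.
    - And compared with C/C++, you also gain memory safety.
- Supporting async/await was also good for Rust's growth from a commercial point of view, since a lot of industrial software needs high-performance network services.
- A user complaint: the async Rust ecosystem on crates.io is concentrated on async/await.

-------

# To be continued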
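
To make the readiness-based design from section 2.3 more concrete, here is a minimal sketch of a hand-written state machine being driven by polling. It is not the code the compiler generates. It reuses the simplified `Poll` and `Future` definitions from these notes (the real `std::future::Future::poll` also takes `Pin<&mut Self>` and a `Context`), and `Countdown`, `AddTwo`, and `block_on` are hypothetical names made up for illustration. `AddTwo` plays the role of roughly `first.await + second.await`, and `block_on` is a toy executor that busy-polls instead of waiting to be woken.

```rust
// Simplified definitions, as in section 2.3 (not the real std types).
enum Poll<T> {
    Ready(T),
    Pending,
}

trait Future {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
}

// Hypothetical leaf future: returns `Pending` `remaining` times, then `value`.
struct Countdown {
    remaining: u32,
    value: u32,
}

impl Future for Countdown {
    type Output = u32;

    fn poll(&mut self) -> Poll<u32> {
        if self.remaining == 0 {
            Poll::Ready(self.value)
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

// Hand-written state machine: each variant holds exactly the state that
// is still needed at that suspension point.
enum AddTwo {
    First { first: Countdown, second: Countdown },
    Second { a: u32, second: Countdown },
    Done,
}

impl Future for AddTwo {
    type Output = u32;

    fn poll(&mut self) -> Poll<u32> {
        loop {
            // Take the current state by value, leaving `Done` in its place.
            match std::mem::replace(self, AddTwo::Done) {
                AddTwo::First { mut first, second } => match first.poll() {
                    // Advance to the next state and keep going in this poll.
                    Poll::Ready(a) => *self = AddTwo::Second { a, second },
                    Poll::Pending => {
                        *self = AddTwo::First { first, second };
                        return Poll::Pending;
                    }
                },
                AddTwo::Second { a, mut second } => match second.poll() {
                    Poll::Ready(b) => return Poll::Ready(a + b),
                    Poll::Pending => {
                        *self = AddTwo::Second { a, second };
                        return Poll::Pending;
                    }
                },
                AddTwo::Done => panic!("polled after completion"),
            }
        }
    }
}

// Toy executor: polls in a loop and counts the polls. A real executor
// would sleep until the future's stored wake-up mechanism fires.
fn block_on<F: Future>(mut fut: F) -> (F::Output, u32) {
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.poll() {
            return (out, polls);
        }
    }
}

fn main() {
    let fut = AddTwo::First {
        first: Countdown { remaining: 2, value: 1 },
        second: Countdown { remaining: 1, value: 41 },
    };
    let (sum, polls) = block_on(fut);
    println!("sum = {sum}, polls = {polls}"); // prints "sum = 42, polls = 4"
}
```

Note that `AddTwo` is a single object that owns all of its state, which is why it can be handed to an executor as a whole. It holds no references into itself, so moving it is harmless. The self-referential case is the one that `Pin` exists for.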