Rust - An Example of Safe Concurrency

Published: March 31, 2022

Before looking at Rust itself, let us go back to 1992. Guido van Rossum, trying to deal with race conditions in the CPython interpreter, added a lock known as the Global Interpreter Lock, or GIL for short. Two and a half decades later, it has become one of the main shortcomings of the Python interpreter we all love and use.

So what is the Global Interpreter Lock?

In Python, everything you create is allocated in memory as a Python object, and a reference to it is returned. Consider this line of code:

a = []

Behind the curtains, the Python interpreter allocates space for [] and returns a reference to it, which is bound to a. What happens if we assign a to another variable b?

a = []
b = a

If we peek behind the curtains, no second list is created: a and b now refer to the same object.

How is the memory freed when all the references are dropped?

Here is where Python's simplicity comes into play. Every Python object carries an extra value called its reference count: a number that tells how many references to the object currently exist. When a new reference is made, the count is incremented; when a reference is dropped, it is decremented. In the example above, the list has a reference count of 1 after a = [] and 2 after b = a.

When the reference count drops to zero, the memory allocated for the object is freed, and that is how the CPython interpreter manages memory. Because objects are reclaimed this way rather than by a tracing garbage collector (CPython only adds a supplementary collector for reference cycles), integrating the C API with Python is a breeze.
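CPython's counters live inside the interpreter, but Rust's standard library type std::rc::Rc follows the same counting scheme, so the lifecycle described above can be sketched in Rust. This is an analogy only, not CPython's implementation, and the function name observe_counts is made up for the illustration.

```rust
use std::rc::Rc;

/// Returns the reference counts seen after each step of the `a` / `b` example.
fn observe_counts() -> (usize, usize, usize) {
    let a = Rc::new(Vec::<i32>::new()); // like `a = []`
    let after_a = Rc::strong_count(&a);

    let b = Rc::clone(&a); // like `b = a`: same allocation, one more reference
    let after_b = Rc::strong_count(&a);

    drop(b); // dropping a reference decrements the count
    let after_drop = Rc::strong_count(&a);

    (after_a, after_b, after_drop)
    // `a` goes out of scope here: the count reaches zero and the Vec is freed.
}

fn main() {
    let (after_a, after_b, after_drop) = observe_counts();
    println!("after a: {after_a}, after b: {after_b}, after drop: {after_drop}");
}
```

Rc::clone copies only the pointer and bumps the counter; the list itself is never duplicated.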

Note: For more information, refer to What is the Python Global Interpreter Lock (GIL)

With this comes a big limitation: what if two threads want to create or drop a reference to the same object?

Take the above example with variables a and b. If a and b live on separate threads and both drop their reference at exactly the same time, the result is a race condition. Decrementing the count is not a single step: at the machine level the value is read, decremented and then stored. If both threads read at the same moment, both see the value 2, decrement it to 1 and write 1 back. Both references are gone, but the object's reference count is stuck at 1, so the object can never be freed: a memory leak.

The other scenario is even more horrifying: what if two threads each add a new reference, but the count only goes up by 1? The count is now one lower than the number of live references, so at some point it drops to zero and the memory is freed while a reference still exists. Using that dangling reference can crash the interpreter or read garbage from memory.
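The lost update described above can be replayed by hand. The sketch below is a single-threaded simulation, not a real race: it only reproduces the order of reads and writes that two unlucky threads would perform. The type RefCount and both function names are hypothetical.

```rust
/// A counter whose decrement is split into separate read and write steps,
/// mimicking the load / subtract / store sequence at the machine level.
struct RefCount {
    value: usize,
}

impl RefCount {
    fn read(&self) -> usize {
        self.value
    }
    fn write(&mut self, v: usize) {
        self.value = v;
    }
}

/// Both "threads" read before either one writes: one decrement is lost.
fn interleaved_drops() -> usize {
    let mut count = RefCount { value: 2 };
    let seen_by_t1 = count.read(); // thread 1 reads 2
    let seen_by_t2 = count.read(); // thread 2 also reads 2
    count.write(seen_by_t1 - 1); // thread 1 stores 1
    count.write(seen_by_t2 - 1); // thread 2 stores 1 again
    count.read()
}

/// Each "thread" finishes its read and write before the other starts.
fn sequential_drops() -> usize {
    let mut count = RefCount { value: 2 };
    let seen = count.read();
    count.write(seen - 1);
    let seen = count.read();
    count.write(seen - 1);
    count.read()
}

fn main() {
    println!("interleaved: {}", interleaved_drops()); // count stuck above zero
    println!("sequential:  {}", sequential_drops()); // count reaches zero
}
```

With the interleaved order the count ends at 1 even though both references were dropped, which is exactly the leak from the text.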

The GIL averts this with a single global lock: at any point in time, only the thread that holds the GIL may execute Python bytecode, touch object memory and do the other low-level work of the interpreter. This means that although there may be 16 threads, only the one holding the GIL is doing work while the others wait to acquire it. In effect, Python runs single-threaded because only one thread executes at a time.
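As a rough model of that behaviour (an assumption-laden sketch, not how CPython is implemented), the code below makes every worker thread take one global Mutex before doing a unit of work and records how many workers were ever inside the locked region at once. The name run_with_global_lock and its parameters are invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns (largest number of workers ever working at once, total steps done).
fn run_with_global_lock(workers: usize, steps: usize) -> (usize, usize) {
    let gil = Arc::new(Mutex::new(0usize)); // the lock guards the work counter
    let running = Arc::new(AtomicUsize::new(0));
    let max_running = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let gil = Arc::clone(&gil);
            let running = Arc::clone(&running);
            let max_running = Arc::clone(&max_running);
            thread::spawn(move || {
                for _ in 0..steps {
                    let mut total = gil.lock().unwrap(); // acquire the global lock
                    let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                    max_running.fetch_max(now, Ordering::SeqCst);
                    *total += 1; // the actual "work"
                    running.fetch_sub(1, Ordering::SeqCst);
                    // the lock is released when `total` goes out of scope
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let total = *gil.lock().unwrap();
    (max_running.load(Ordering::SeqCst), total)
}

fn main() {
    let (max_running, total) = run_with_global_lock(16, 100);
    println!("max workers at once: {max_running}, total steps: {total}");
}
```

However many threads are started, the first number stays at 1: all the work is done, but never by more than one thread at a time.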

What adds to the problem is that there is no easy way to remove the GIL while preserving the speed of single-threaded workloads. An attempt to remove it by making every reference count increment and decrement atomic slowed the interpreter down by about 30%, which for an implementation as widely used as CPython is a big no-no.
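To show what "atomic" means here, the following sketch lets several threads decrement one shared counter with a single indivisible read-modify-write operation. It illustrates the idea only and does not measure the slowdown mentioned above; atomic_drops is a hypothetical name.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Starts with `references` live references and drops each one on its own thread.
/// Returns the counter value once every thread has finished.
fn atomic_drops(references: usize) -> usize {
    let count = Arc::new(AtomicUsize::new(references));

    let handles: Vec<_> = (0..references)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                // Read, subtract and store happen as one step, so no update is lost.
                count.fetch_sub(1, Ordering::AcqRel);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    count.load(Ordering::Acquire)
}

fn main() {
    println!("count after all drops: {}", atomic_drops(8));
}
```

The counter always reaches zero, but every decrement now pays for hardware synchronization, which is the kind of cost the text refers to.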

Good story, but what has all this got to do with Rust?

Rust is a language originally developed at Mozilla Research with safe concurrency as a core goal. Safe Rust code that contains a data race will not compile: the compiler rejects any code it cannot prove to be memory-safe and thread-safe. It checks whether two threads could access the same data unsynchronized while at least one of them writes, and it fails to compile if such a scenario exists.
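A minimal sketch of that check in practice: the commented-out lines show a shape of code the compiler rejects, and the code that follows is one accepted way to share a counter between threads, using Arc for shared ownership and Mutex for exclusive access. The function name parallel_sum is made up for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Each worker adds 1 to a shared total `per_worker` times.
fn parallel_sum(workers: u64, per_worker: u64) -> u64 {
    // Rejected at compile time (kept as a comment so the block still builds):
    //
    //     let mut total = 0u64;
    //     let h = thread::spawn(|| total += 1);
    //     // error: closure may outlive the current function, but it borrows `total`
    //
    // Sharing has to be spelled out explicitly instead.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1; // lock, update, unlock
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    println!("total: {}", parallel_sum(4, 1000));
}
```

The result is the same on every run because no increment can be lost; removing the Mutex would not produce a flaky program but a compile error.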

Wonderful, then why can’t this be added to other compilers to avoid these scenarios altogether?

It's complicated. Rust doesn't follow the traditional programming model. Instead, it is built around ownership and borrowing. Every value has a single owner, and at any point in time the compiler enforces that there is either exactly one mutable reference to it or any number of read-only references, never both. To write to a value, you must own it or hold the one exclusive mutable borrow.
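The borrowing rules can be seen in a few lines. This is a small sketch with a made-up function name, borrow_rules_demo; the commented-out line marks a use the compiler would refuse.

```rust
/// Returns what the readers observed and the vector after the write.
fn borrow_rules_demo() -> (usize, Vec<i32>) {
    let mut data = vec![1, 2, 3];

    // Any number of shared (read-only) borrows may coexist.
    let first = &data;
    let second = &data;
    let readers_saw = first.len() + second.len();

    // A mutable borrow is exclusive. It is accepted here only because
    // `first` and `second` are never used again after this point.
    let writer = &mut data;
    writer.push(4);
    // Uncommenting the next line makes the program fail to compile (E0502),
    // because a shared borrow would then overlap the mutable one:
    //     println!("{}", first.len());

    // Ownership can also move: after this line `data` can no longer be used.
    let moved = data;
    (readers_saw, moved)
}

fn main() {
    let (readers_saw, moved) = borrow_rules_demo();
    println!("readers saw {readers_saw}, final vector {moved:?}");
}
```

Because these rules are checked at compile time, they cost nothing at run time, and they are the same rules that rule out data races between threads.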

Rust's model can't be efficiently retrofitted onto other compilers, because the way Rust code is written is fundamentally different from the way one might write C or C++ code. Where Rust truly shines is in bringing safety and performance together in a single codebase. This is why Microsoft is betting big on Rust, using it to develop open-source libraries and projects to tackle the memory-safety issues affecting some of its core products.

If you are a web developer, Rust is a great language for writing WebAssembly. WebAssembly (WASM) is a low-level compilation target that runs in the browser, and Rust is one of the languages that compile to it. Rust is efficient enough that npm now uses it in parts of its own stack.

Rust is here to stay, and it is moving concurrent programming away from a reliance on garbage collection. Its growing community is clear proof of its strength, and its adoption by large tech companies is a clear sign that it is a language worth a look.
