The time I had mixed feelings about Rust …
Due to Rust’s recent rise in popularity among software engineers around the world, I got interested in learning about Rust and how it would help me and my team write better production grade codes.
Note: I come from a background with extensive experience in Python and some bit of C/C++ and Java.
After reading some materials on Rust and its internal workings, here are few points that clearly stood out as to how and why Rust is different.
- The most important aspect of Rust is how it manages memory. Instead of deploying a garbage collector at runtime it checks for single ownership and borrowing at compile time.
- Unlike Python, Rust is a compiled programming language similar to C/C++.
- Its not the responsibility of the OS or the Rust runtime to clean up unused heap memory or optimize memory usage but the onus is on the developer to take all the pain of writing production grade codes.
Let me elaborate a bit on point number 1.
There are 2 ways by which an OS allocates functions and variables in the computer’s memory — Stack and Heap.
CS 225 | Stack and Heap Memory (illinois.edu)
Functions (arguments etc.) and variables with primitive data types are stored on a stack data structure in the memory. The sizes of these objects are predefined prior to program run.
For e.g. in C++, “int” takes up 4 bytes, “long” 8 bytes and so on. The objects in the stack are deallocated using a LIFO strategy.
Hence stack deallocation do not require explicit garbage collection or memory management as when the top object in the stack goes out of scope, it is deallocated and the memory is freed.
Objects having dynamic sizes i.e. the size can increase or decrease during program run such as a vector in C++, list in Python etc. are allocated in the Heap instead of Stack
(Reference to the actual object stored in Heap might be stored in the Stack).
Objects in the Heap are not automatically deallocated and require either explicit way such as using “delete” keyword in C/C++ or “del” in Python, or the garbage collector will clean them up after program completion.
Microsoft Word — CSEP521 Project — Garbage Collection Algorithms.doc (washington.edu)
Java garbage collection: What is it and how does it work? | New Relic
Garbage collection algorithms — Holly Cummins
C/C++ do not use garbage collection to clean up the Heap memory. Python and JAVA on the other hand uses garbage collection algorithms to clean up Heap memory.
Thus C/C++ can have memory leakage issues if an object that was allocated in the Heap is not explicitly deallocated.
Let’s think about this program in CPython:
import ctypes
# The list is stored in heap space
# The references x and y will be stored in stack
x = [1,2,3,4]
y = x
# x and y points to the same memory location in the heap
x_id = id(x)
y_id = id(y)
# They will print same id
print(f"x_id={x_id}, y_id={y_id}")
# changing either x or y will be reflected in both
y[2] = 5
x[0] = 7
print(f"x={x}, y={y}")
# get the refernce count of the address in memory holding the list
# It will be 2 (because x and y are both pointing to it)
print(ctypes.c_long.from_address(x_id).value)
# remove reference y
del y
# reference count will now be 1
# (only x is pointing to the same memory location in heap)
print(ctypes.c_long.from_address(x_id).value)
After the above program ends, the cpython garbage collector will clean up the remaining reference i.e. reference x to the heap memory location and also free the heap memory.
What if we remove both the references x and y in the code:
# remove last reference x
del x
# check the contents of the memory address x_id
print(ctypes.cast(x_id, ctypes.py_object).value)
If the reference count of an object in heap space reaches 0, the object is deallocated and memory is freed.
But there could be certain scenarios where there are cyclic references and in that case reference count will never reach 0. Only option to the free the memory is by using a Garbage Collector.
An example of cyclic reference count in CPython.
class Node:
def __init__(self):
self.next = None
x = Node()
y = Node()
x.next = y
y.next = x
x_id = id(x)
y_id = id(y)
print(f"x_id={x_id}, y_id={y_id}")
# reference counts will be 2 for both x and y
print(ctypes.c_long.from_address(x_id).value)
print(ctypes.c_long.from_address(y_id).value)
Some disadvantages with this kind of memory manangement are as follows:
- What if someone intended to do a cloning of the list instead of referencing ?
Any changes made to a one reference also updates all the other references. - If there are many objects with cyclic references as seen above, the heap space are not freed at runtime even if some of these objects are not used anymore.
- Due to multiple references to the same object in heap space, the objects are not freed until number of references reduces to 0. Let’s take a simple example as below.
def sum_of_max_per_list(input_list:list[list]):
input_list_id = id(input_list)
n = len(input_list)
outputs = [0]*n
for i in range(n):
outputs[i] = max(input_list[i])
# reference count is 3
# one for initization outside this method, one for function argument
# and one for function stack
print(ctypes.c_long.from_address(input_list_id).value)
# some memory intensive workloads
s = sum(outputs)
return []
input_list = [[10,1,12,4],[5,87,13,65],[10,11,12,13,14],[46,1,32,90,87]]
s = max_per_list(input_list)
Note that after we find the maximum element per list, there is no further requirement of “input_list” variable. If afterwards we are doing some memory intensive task with the output, it could lead to OOM errors in Python because the memory pointed to by input_list is not freed.
A better approach would be:
def max_per_list(input_list:list[list]):
n = len(input_list)
outputs = [0]*n
for i in range(n):
outputs[i] = max(input_list[i])
return outputs
input_list = [[10,1,12,4],[5,87,13,65],[10,11,12,13,14],[46,1,32,90,87]]
input_list_id = id(input_list)
outputs = max_per_list(input_list)
# memory corresponding to input_list is freed
del input_list
print(ctypes.c_long.from_address(input_list_id).value)
print(ctypes.cast(input_list_id, ctypes.py_object).value)
s = sum(outputs)
Something similar happens with C++, but unlike Python, C++ does not have garbage collection. Thus heap space occupied by unwanted objects must be freed manually by the user.
#include <iostream>
int main() {
// pointer x is stored in stack but the actual array is stored in heap
int* x = new int[10];
// do some initialization
for (int i = 0; i < 10; i++) x[i] = 2*i;
// pointer y is stored in stack
// points to same memory location as x
int* y = x;
y[2] = 5;
x[0] = 7;
for (int i = 0; i < 10; i++) std::cout << x[i] << " ";
std::cout << std::endl;
for (int i = 0; i < 10; i++) std::cout << y[i] << " ";
std::cout << std::endl;
// deleting y will also delete the pointer to x and the heap space will
// be freed.
delete [] y;
return 0;
}
In the above C++ code, x and y are pointers in stack pointing to same object in the heap memory. When the pointer y is deleted, the memory is freed. This is unlike Python, where the reference counter is decremented by 1.
In C/C++ for every malloc we need to call free() and for every new we need to call delete else we would be having memory leaks, since there is no garbage collector to take care of unwanted objects lying around in heap.
In Rust, we would write the sum of max per list program as follows:
use std::isize::MIN;
fn sum_of_max_per_list(input_list:Vec<Vec<isize>>)->isize {
let n:usize = input_list.len();
let mut outputs:Vec<isize> = vec![MIN;n];
for (i, _) in input_list.iter().enumerate() {
for (_j, y) in input_list[i].iter().enumerate() {
if *y >= outputs[i] {
outputs[i] = *y;
}
}
}
let mut s:isize = 0;
for x in outputs.iter() {
s += *x;
}
return s;
}
fn main() {
let input_list:Vec<Vec<isize>> = vec![
vec![10, 1, 12, 4],
vec![5, 87, 13, 65],
vec![10, 11, 12, 13, 14],
vec![46, 1, 32, 90, 87],
];
let s:isize = sum_of_max_per_list(input_list);
println!("{:?}", s);
}
There are few things to notice in the above code when we compare the same code in Python.
When we are passing the input_list as function argument, the ownership of the input_list changes and it is moved into the function. The implication of this is that if we wished to use input_list variable after calling the “sum_of_max_per_list” method, we would get a compilation error.
fn main() {
let input_list:Vec<Vec<isize>> = vec![
vec![10, 1, 12, 4],
vec![5, 87, 13, 65],
vec![10, 11, 12, 13, 14],
vec![46, 1, 32, 90, 87],
];
let s:isize = sum_of_max_per_list(input_list);
println!("{:?}", s);
println!("{:?}", input_list);
}
Compilation error:
error[E0382]: borrow of moved value: `input_list`
--> src/main.rs:33:22
|
23 | let input_list:Vec<Vec<isize>> = vec![
| ---------- move occurs because `input_list` has type `Vec<Vec<isize>>`, which does not implement the `Copy` trait
...
30 | let s:isize = sum_of_max_per_list(input_list);
| ---------- value moved here
...
33 | println!("{:?}", input_list);
| ^^^^^^^^^^ value borrowed here after move
|
note: consider changing this parameter type in function `sum_of_max_per_list` to borrow instead if owning the value isn't necessary
--> src/main.rs:3:35
|
3 | fn sum_of_max_per_list(input_list:Vec<Vec<isize>>)->isize {
| ------------------- ^^^^^^^^^^^^^^^ this parameter takes ownership of the value
| |
| in this function
= note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider cloning the value if the performance cost is acceptable
|
30 | let s:isize = sum_of_max_per_list(input_list.clone());
| ++++++++
Thus, when we return from the sum_of_max_per_list method, the variable input_list goes out of scope and the heap memory is freed.
In Rust, there can be at-most one owner of any object in heap memory. If the owner goes out of scope, the object is deleted and memory is freed. Thus we do not need additional reference counting or garbage collection.
What if we needed to use the input_list after function call ?
In that case, the function can only borrow the input_list object but the owner is still the input_list variable.
The updated Rust code with borrowing:
fn sum_of_max_per_list(input_list:&Vec<Vec<isize>>)->isize {
let n:usize = input_list.len();
let mut outputs:Vec<isize> = vec![MIN;n];
for (i, _) in input_list.iter().enumerate() {
for (_j, y) in input_list[i].iter().enumerate() {
if *y >= outputs[i] {
outputs[i] = *y;
}
}
}
let mut s:isize = 0;
for x in outputs.iter() {
s += *x;
}
return s;
}
fn main() {
let input_list:Vec<Vec<isize>> = vec![
vec![10, 1, 12, 4],
vec![5, 87, 13, 65],
vec![10, 11, 12, 13, 14],
vec![46, 1, 32, 90, 87],
];
let s:isize = sum_of_max_per_list(&input_list);
println!("{:?}", s);
println!("{:?}", input_list);
}
When the function is borrowing the object, by default it is an immutable borrow i.e. the input_list cannot be changed inside the function. If we needed to update the input_list, we would have to pass a mutable reference
use std::isize::MIN;
fn sum_of_max_per_list(input_list:&mut Vec<Vec<isize>>)->isize {
let n:usize = input_list.len();
let mut outputs:Vec<isize> = vec![MIN;n];
for (i, x) in input_list.iter_mut().enumerate() {
for (_j, y) in x.iter_mut().enumerate() {
if *y >= outputs[i] {
outputs[i] = *y;
}
*y = -1;
}
}
let mut s:isize = 0;
for x in outputs.iter() {
s += *x;
}
return s;
}
fn main() {
let mut input_list:Vec<Vec<isize>> = vec![
vec![10, 1, 12, 4],
vec![5, 87, 13, 65],
vec![10, 11, 12, 13, 14],
vec![46, 1, 32, 90, 87],
];
let s:isize = sum_of_max_per_list(&mut input_list);
println!("{:?}", s);
println!("{:?}", input_list);
}
But even if we are passing an object as a mutable reference to a function, we cannot move out i.e. change ownership of the object within the function scope. The below will throw compilation error.
fn sum_of_max_per_list(input_list:&mut Vec<Vec<isize>>)->isize {
let n:usize = input_list.len();
let mut outputs:Vec<isize> = vec![MIN;n];
// will throw compilation error here
let obj = *input_list;
for (i, x) in input_list.iter_mut().enumerate() {
for (_j, y) in x.iter_mut().enumerate() {
if *y >= outputs[i] {
outputs[i] = *y;
}
*y = -1;
}
}
let mut s:isize = 0;
for x in outputs.iter() {
s += *x;
}
return s;
}
Compilation error:
error[E0507]: cannot move out of `*input_list` which is behind a mutable reference
--> src/main.rs:7:15
|
7 | let obj = *input_list;
| ^^^^^^^^^^^ move occurs because `*input_list` has type `Vec<Vec<isize>>`, which does not implement the `Copy` trait
|
help: consider removing the dereference here
|
7 - let obj = *input_list;
7 + let obj = input_list;
|
If we have to assign the value of the object to another variable in the above code, we need to clone or copy the original object (duplicating in the heap memory).
Here is how to do it:
let obj = input_list.clone();
Also be careful of the fact that for same object we cannot mix and match concurrent mutable and immutable references.
fn sum_of_max_per_list(input_list:&mut Vec<Vec<isize>>)->isize {
let n:usize = input_list.len();
let mut outputs:Vec<isize> = vec![MIN;n];
// borrow immutably
let first_element:&isize = &input_list[0][0];
// borrow mutably
for (i, x) in input_list.iter_mut().enumerate() {
for (_j, y) in x.iter_mut().enumerate() {
if *y >= outputs[i] {
outputs[i] = *y;
}
*y = -1;
}
}
let mut s:isize = 0;
for x in outputs.iter() {
s += *x;
}
// immutable borrow used here
println!("{:?}", *first_element);
return s;
}
So what we are doing above is, first we borrow input_list as immutable i.e. this value should not be changed to get the 1st element at index (0,0).
Then we borrow mutably, implying that the 1st element might be changed during this process but later on we are using the immutable borrow (to print the 1st element) hence compiler will throw an error.
error[E0502]: cannot borrow `*input_list` as mutable because it is also borrowed as immutable
--> src/main.rs:9:19
|
7 | let first_element:&isize = &input_list[0][0];
| ---------- immutable borrow occurs here
8 |
9 | for (i, x) in input_list.iter_mut().enumerate() {
| ^^^^^^^^^^ mutable borrow occurs here
...
23 | println!("{:?}", *first_element);
| -------------- immutable borrow later used here
If we did not use the immutable borrow later on (line no. 23) we will not get the error.
These are some of the common things that a Rust developer must be aware of:
- Understanding when an object is moved. If an object is moved i.e. ownership changes, then any statement using the object later on (by non-owner) will cause compilation error.
- In case an object is borrowed by a function, one cannot move out or change ownership within the function scope. Although can clone or copy the object.
- In case an object is borrowed immutably by a function, one cannot change the value of the object within the function scope. Need to pass the reference as mutable.
- If an object is borrowed mutably then we cannot simulataneously borrow the object as immutable or another mutably. This follows from the fact that if it is intended to update an object then until and unless the updates are completed, one cannot make another update simultaneously. Also if it is intended to read the object, then one cannot simulataneously update the object till the read is completed.
If you think carefully all of the above makes sense if we are not using reference counting or garbage collection for heap memory cleanup.
Now let’s look at a complete Rust example to implement a simple Binary Search Tree.
Here is the BST setup:
use std::fmt::Debug;
pub trait MyTrait: PartialOrd + Debug + Copy {}
struct BST<T> where T: MyTrait {
root_node: Option<Box<Node<T>>>,
}
impl<T: MyTrait> BST<T> {
fn new() -> Self {
Self {
root_node: None,
}
}
}
#[derive(Clone)]
struct Node<T> where T: MyTrait {
val: T,
lt_node: Option<Box<Node<T>>>,
rt_node: Option<Box<Node<T>>>,
}
impl<T: MyTrait> Node<T> {
fn new(&val:&T) -> Self {
Self {
val,
lt_node: None,
rt_node: None,
}
}
}
Since we are going to implement a BST for a generic type i.e. the values stored in BST nodes is “generic” i.e. could be integer, float, string, other complex types etc. But we will limit the types to integer, float and string.
To define a generic type and its properties, we create a new trait MyTrait.
The properties of MyTrait are PartialOrd i.e. we can compare two elements of type T using less than or greater than operators. This is required since during BST insertion, we will compare an incoming value with the value in current node in BST using < or > inequalities and depending on that we will go to either left or right subtree.
Note: In order to use ≤ and ≥ instead of < or >, we have to use the property Ord instead of PartialOrd.
Debug property allows us to print the BST node values and Copy property allows us to copy or clone the node values. For e.g. when we delete a node, we copy the value of its successor node into itself.
“struct” in Rust is similar to C/C++ struct. The method “new” is like the class constructor.
One interesting thing to note is that when we define the Node struct, in C/C++, we define the left and the right nodes as pointers to the same Node struct and similarly in Python:
struct Node {
int val;
Node* lt_node;
Node* rt_node;
};
But in Rust, we have something like:
struct Node {
val: i32,
lt_node: Option<Box<Node>>,
rt_node: Option<Box<Node>>,
}
When Rust creates the struct in the heap memory, it has to know how much memory it has to allocate in the heap. If we just we use Option<Node>, then it becomes recursive type and it becomes impossible to infer the size in heap memory to allocate for Node object.
If we use something similar to C/C++ i.e.
struct Node {
val: i32,
lt_node: Option<Node>,
rt_node: Option<Node>,
}
Then we get compilation error:
error[E0072]: recursive type `Node` has infinite size
--> src/lib.rs:1:1
|
1 | struct Node {
| ^^^^^^^^^^^
2 | val: i32,
3 | lt_node: Option<Node>,
| ---- recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
|
3 | lt_node: Option<Box<Node>>,
| ++++ +
Box is a smart pointer in Rust. It can be used to break the infinite cycle.
The size of a Node object is the sum of the i32 value, and the sizes of the two Box pointers corresponding to lt_node and rt_node. I will not go into details of smart pointers in this post.
Next we define the insert method for BST struct as follows:
impl<T: MyTrait> BST<T> {
fn insert(&mut self, root_node:&mut Option<Box<Node<T>>>, &val:&T) {
let new_node = Node::new(&val);
match root_node {
Some(node) => {
// root_node is not None
if val <= node.val {
if node.lt_node.is_none() {
node.lt_node = Some(Box::new(new_node));
}
else {
self.insert(&mut node.lt_node, &val);
}
}
else {
if node.rt_node.is_none() {
node.rt_node = Some(Box::new(new_node));
}
else {
self.insert(&mut node.rt_node, &val);
}
}
}
None => {
// root node is None
*root_node = Some(Box::new(new_node));
}
}
}
}
Check if the root_node is not None, if root_node is None, create a new root node and assign to root node. If root_node is not None, we add the new node by calling the insert method recursively.
Note that we are passing the root_node as a mutable reference.
It needs to be mutable because the tree is updated by adding a new node at the leaf. The mutability is propagated upwards, towards the root node.
Now let’s look at the delete method for BST:
impl<T: MyTrait> BST<T> {
// delete a node with value val
fn delete(&mut self, root_node:&mut Option<Box<Node<T>>>, &val:&T) {
match root_node {
Some(node) => {
// if root_node is not None
if val == node.val {
// found the node to delete
if (node.lt_node.is_none()) && (node.rt_node.is_none()) {
*root_node = None;
}
else if node.lt_node.is_none() {
match &node.rt_node {
Some(x) => {
node.val = x.val;
node.rt_node = None;
}
None => {}
}
}
else if node.rt_node.is_none() {
match &node.lt_node {
Some(x) => {
node.val = x.val;
node.lt_node = None;
}
None => {}
}
}
else {
let mut new_node = &node.rt_node;
let mut has_successor:bool = false;
// find successor node
while !new_node.is_none() {
match new_node {
Some(x) => {
has_successor = true;
node.val = x.val;
new_node = &x.lt_node;
}
None => {}
}
}
if has_successor {
self.delete(&mut node.rt_node, &node.val);
}
}
}
else if val < node.val {
// go left to delete
self.delete(&mut node.lt_node, &val);
}
else {
// go right to delete
self.delete(&mut node.rt_node, &val);
}
}
None => {}
}
}
}
The deletion algorithm follows standard deletion in BST. If the node to be deleted is a leaf node, we simply delete it. Else if the node to be deleted has either a left or right child, we replace the node with its left or right child.
If node to be deleted has both children, then we find the successor node of this node. Copy the node value of successor to itself and delete the successor node. Note that the successor node is either a leaf node or has only right child.
The final piece of the puzzle is defining the main function and how we create the BST and call the insert and delete methods on it.
impl<T: MyTrait> BST<T> {
// print tree as shown below
// 10
// -- 5
// ---- 2
// ---- 9
// -- 20
// ---- 15
// ---- 30
fn print_tree(&self, root_node:&Option<Box<Node<T>>>, level:usize) {
match root_node {
Some(node) => {
println!("{} {:?}", "-".repeat(2*level), node.val);
self.print_tree(&node.lt_node, level+1);
self.print_tree(&node.rt_node, level+1);
}
None => {}
}
}
}
fn main() {
// BST with values as unsigned integers
let mut bst:BST<usize> = BST::new();
let mut root:Option<Box<Node<usize>>> = None;
bst.insert(&mut root, &10);
bst.insert(&mut root, &20);
bst.insert(&mut root, &5);
bst.insert(&mut root, &9);
bst.insert(&mut root, &2);
bst.insert(&mut root, &15);
bst.insert(&mut root, &30);
bst.print_tree(&root, 0);
println!();
bst.delete(&mut root, &10);
bst.print_tree(&root, 0);
}
Although it looks like a clean implementation and it is infact true that if there are no compilation errors in Rust, the code is correct, but it takes a significant amount of practice to write compilation error free codes.
Most of the compiler errors stems from the earlier discussion points.
After implementing few algorithms and data structures in Rust, I believe that Rust codes are truly production grade and as a tech lead, I would want myself and my team members to use Rust extensively in all data intensive production workloads.
References
- The Rust Programming Language — The Rust Programming Language (rust-lang.org)
- Introduction — Rust By Example (rust-lang.org)
- Title Page — The Rust Performance Book (nnethercote.github.io)
- Understanding Memory Management in Rust | by Bijesh O S | Geek Culture | Medium
- ✨🥞 Rust Visualized: The Stack, the Heap, and Pointers — DEV Community
- Heap Allocation | Writing an OS in Rust (phil-opp.com)
- Python Garbage Collection: What It Is and How It Works (stackify.com)
- Stack vs Heap: What’s the difference? (educative.io)
- Dismissing Python Garbage Collection at Instagram | by Instagram Engineering | Instagram Engineering (instagram-engineering.com)
- Memory Management in Python — Real Python