

10.1 Concurrency Introduction

Building High-performance Parallel Applications


Rust’s approach to concurrency is built on a foundation of compile-time safety, leveraging its ownership and borrowing system to prevent common pitfalls like data races. By understanding how to create and manage threads, communicate via channels, and safely share state using mutexes and read-write locks, you gain powerful tools for building high-performance, parallel applications. The Send and Sync traits act as the compiler’s guardians, ensuring that your concurrent code conforms to strict safety rules so that you can focus on program logic rather than on chasing hard-to-reproduce runtime bugs. This explicit and robust concurrency model makes Rust a strong choice for systems programming where reliability and performance are critical.

These materials were used in the generation of this chapter and provide strong additional support for the concepts described:

This chapter describes Rust’s approach to concurrency, a critical aspect of modern systems programming. You will learn how Rust enables you to write efficient, parallel code with strong compile-time guarantees against common concurrency bugs like data races. The focus is on practical examples that illustrate how Rust’s ownership system extends to multi-threaded environments, ensuring safety without sacrificing performance. You should review this chapter at the same time as the materials in the associated Rust Tutorial on Concurrency. Please try to complete both in parallel (!).

The first step is exploring the fundamentals of creating threads using Rust’s standard library. You will see how to spawn new threads to execute tasks concurrently, and how to manage their life cycle using JoinHandles. A key aspect here is understanding move closures (C++ STL anguish trigger!), which are essential for safely transferring ownership of data to new threads. Without move, a spawned thread could hold references to local data that is dropped while the thread is still running, and the compiler rejects such code.
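A minimal sketch of spawning and joining, using only the standard library: spawn_workers (a hypothetical helper, as are the sensor ids and readings) starts one thread per id. Each thread takes ownership of its own Vec through a move closure and returns a sum through its JoinHandle.

```rust
use std::thread;

/// Spawns one worker per (hypothetical) sensor id and returns each worker's sum.
fn spawn_workers(ids: &[u32]) -> Vec<u32> {
    let mut handles = Vec::new();
    for &id in ids {
        // Owned by this loop iteration; it would be dropped at the end of it.
        let readings = vec![id * 10, id * 10 + 1];
        // `move` transfers ownership of `readings` into the closure, so the
        // new thread owns the data for as long as it needs it.
        let handle = thread::spawn(move || readings.iter().sum::<u32>());
        handles.push(handle);
    }
    // join() blocks until each thread terminates and hands back its return value.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let sums = spawn_workers(&[1, 2, 3]);
    println!("{:?}", sums); // [21, 41, 61]
}
```

If you delete the move keyword, rustc should reject the closure. Without move, the closure would only borrow readings, a local that is dropped at the end of the loop iteration while the thread may still be running.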

Next, the chapter examines how to pass data between threads using message passing. Rust’s std::sync::mpsc (multiple producer, single consumer) channels, which appear in the Assignment 2 template code, provide a safe and idiomatic way for threads to communicate by sending and receiving messages. This chapter walks through setting up channels, sending data from one or more producer threads, and receiving it in a consumer thread. It emphasises how this mechanism prevents data races by design: sending a value moves ownership of it into the channel, so the producer can no longer touch it.
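The sketch below shows the multiple-producer, single-consumer pattern in isolation. It is not the Assignment 2 template. Three hypothetical producer threads each send (producer_id, reading_index) pairs through cloned Senders, and the calling thread acts as the single consumer.

```rust
use std::sync::mpsc;
use std::thread;

/// Each producer thread sends `count` (producer_id, reading_index) pairs.
fn collect_readings(producers: u32, count: u32) -> Vec<(u32, u32)> {
    let (tx, rx) = mpsc::channel();
    for id in 0..producers {
        let tx = tx.clone(); // every producer needs its own Sender
        thread::spawn(move || {
            for n in 0..count {
                // send() moves the tuple into the channel.
                tx.send((id, n)).unwrap();
            }
        }); // this thread's Sender is dropped when its closure finishes
    }
    // Drop the original Sender so the channel closes once all producers are done.
    drop(tx);
    // Iterating the Receiver blocks for each message and ends when the channel closes.
    let mut received: Vec<(u32, u32)> = rx.iter().collect();
    received.sort(); // arrival order depends on the scheduler, so sort for a stable view
    received
}

fn main() {
    let readings = collect_readings(3, 2);
    // 6 messages: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
    println!("{} messages: {:?}", readings.len(), readings);
}
```

Dropping the original tx matters. The receiver's iterator only ends once every Sender has been dropped, so if you forget it the consumer waits forever.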

We then explore shared-state concurrency, where multiple threads need to access and potentially modify the same data. While this is a common source of subtle bugs in other languages, Rust provides primitives such as std::sync::Mutex and std::sync::RwLock. You will see how mutexes ensure exclusive access to data, preventing simultaneous writes. You will also see how read-write locks allow either multiple concurrent readers or a single writer, which suits read-heavy access patterns. Detailed examples demonstrate how Rust’s type system and ownership rules integrate with these locks. The protected data can only be reached through the guard returned when the lock is acquired, and the lock is released automatically when that guard goes out of scope, so an unlock cannot simply be forgotten.
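The sketch below shows both primitives under simple assumptions: a hypothetical event counter protected by Arc<Mutex<usize>>, and a hypothetical alert threshold protected by Arc<RwLock<u32>>. Arc is needed so that every thread can own a handle to the same lock.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Several threads increment one shared counter behind a Mutex.
fn count_events(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let counter = Arc::clone(&counter); // one owning handle per thread
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // The usize is only reachable through the MutexGuard.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            } // each guard is dropped at the end of its iteration, unlocking the mutex
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

/// One writer updates a shared threshold, then several readers read it.
fn read_threshold(readers: usize) -> Vec<u32> {
    let threshold = Arc::new(RwLock::new(25u32));
    {
        // write() gives exclusive access while the guard lives.
        let mut w = threshold.write().unwrap();
        *w = 30;
    } // write guard dropped here
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let threshold = Arc::clone(&threshold);
            thread::spawn(move || {
                // Any number of threads may hold read guards simultaneously.
                let value = *threshold.read().unwrap();
                value
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("events: {}", count_events(4, 1000)); // events: 4000
    println!("thresholds: {:?}", read_threshold(3)); // thresholds: [30, 30, 30]
}
```

The final values are bound with let before being returned. The reason is that returning *counter.lock().unwrap() directly as a tail expression can be rejected by the borrow checker in the 2021 edition: the temporary guard would outlive counter.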

Finally, the chapter covers the important Send and Sync traits. These are fundamental to Rust’s compile-time thread safety guarantees. You will see what it means for a type to be Send (ownership of a value can safely be transferred to another thread) and Sync (a shared reference &T can safely be used from several threads at once; T is Sync exactly when &T is Send). Through examples, we investigate how these marker traits are automatically implemented for most types. We also show how they enable the compiler to catch thread safety issues before your code ever runs, leading to more robust and reliable concurrent applications.
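A common way to see these marker traits at work is a pair of empty generic functions whose only job is to carry a trait bound. In the sketch below, assert_send and assert_sync are hypothetical helper names, not standard-library items. The program compiles only if every listed type satisfies the bound.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

// These generic functions compile only for types that satisfy the bound;
// they do nothing at run time.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // Auto-implemented: Vec<u8> is Send and Sync because u8 is.
    assert_send::<Vec<u8>>();
    assert_sync::<Vec<u8>>();
    // Arc<Mutex<T>> is the usual way to share mutable state between threads.
    assert_send::<Arc<Mutex<u32>>>();
    assert_sync::<Arc<Mutex<u32>>>();
    // RefCell<u32> may be moved to another thread, but not shared by reference.
    assert_send::<RefCell<u32>>();
    // Uncommenting either line below should produce a compile error:
    // assert_sync::<RefCell<u32>>();     // RefCell<u32> is not Sync
    // assert_send::<std::rc::Rc<u32>>(); // Rc<u32> is neither Send nor Sync
    println!("all Send/Sync checks compiled");
}
```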

Edge devices — microcontrollers, single-board computers, and IoT gateways — are rarely doing one thing at a time. A smart sensor node might simultaneously read temperature data, handle an incoming Wi-Fi packet, write a log entry to flash storage, and blink a heartbeat LED. A robotics controller must process sensor inputs while computing motor commands and responding to network requests, all within strict real-time latency constraints. Without concurrency, each of these tasks would have to wait for the previous one to complete, producing a program that is unresponsive and wasteful of the hardware’s capability.

Concurrency is the ability of a program to make progress on multiple tasks within overlapping time periods. It does not require multiple physical processor cores. Even on a single-core device, the operating system creates the illusion of simultaneity by rapidly switching the CPU between threads — a process called context switching. On multi-core hardware such as the dual-core ESP32-S3 or the Raspberry Pi, true simultaneous execution is possible.

These two terms are related but distinct, and the distinction matters on edge hardware where you may be targeting a single-core microcontroller.

Parallelism means physically executing multiple computations at the exact same instant on separate hardware units, typically two CPU cores genuinely running different code at the same clock cycle.

Concurrency is about “structure”: how a program is organised so that multiple tasks can make progress by sharing available resources. On a single-core device, only one thread executes at any instant, but the scheduler gives each thread a turn often enough that the system appears responsive and the tasks “appear” to run in parallel.

| | Concurrency | Parallelism |
|---|---|---|
| Definition | Multiple tasks make progress in overlapping time | Multiple tasks execute simultaneously |
| Requires | A scheduler (OS or async runtime) | Multiple CPU cores or hardware units |
| Edge example | OS time-slices between sensor reading and Wi-Fi handling | Both cores on an ESP32-S3 running code at once |
| Possible on a single-core MCU? | Yes, via scheduling | No |

In practice, the two overlap: a concurrent program can exploit parallelism when more than one core is available, but it is still useful when only one is.
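This sketch assumes a target with the Rust standard library, for example a Raspberry Pi running Linux rather than a bare-metal microcontroller. On such a target, std::thread::available_parallelism reports how many threads the hardware can run at once. The result is a hint rather than a guarantee, and the helper name worker_count is illustrative.

```rust
use std::thread;

/// Number of threads the hardware can run at the same instant (a hint).
fn worker_count() -> usize {
    thread::available_parallelism()
        .map(|n| n.get()) // NonZeroUsize -> usize
        .unwrap_or(1) // the query can fail on some platforms; assume one core
}

fn main() {
    let cores = worker_count();
    if cores == 1 {
        println!("1 core: threads will be time-sliced (concurrency only)");
    } else {
        println!("{cores} cores: threads can run in parallel");
    }
}
```

The same threaded program runs correctly either way. Only the amount of true parallelism changes.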

The operating system provides two fundamental units of concurrent execution.

A process is a self-contained program instance with its own private memory space, file descriptors, and resources. Processes are isolated from one another; communication between them requires explicit mechanisms such as pipes, shared memory, or sockets. Starting a process is relatively expensive: the OS must allocate a new address space and initialise kernel data structures.

A thread is a lighter-weight unit of execution that lives inside a process. All threads within a process share the same memory space and file descriptors. Creating a thread is far cheaper than creating a process. Threads communicate by reading and writing shared memory directly, which is both their greatest advantage (low overhead, high speed) and their greatest danger (data races and corruption if not synchronised).

For edge programming, threads are the natural primitive:

  • Spawning a thread costs far less memory and time than a new process
  • Direct memory sharing avoids serialisation overhead when passing sensor data between tasks
  • A single process containing multiple threads is simpler to deploy and monitor on a resource-constrained device
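The sketch below illustrates the second point. It splits a buffer of hypothetical sensor samples between two threads that read the same memory directly, with no copying or serialisation. It uses std::thread::scope (stable since Rust 1.63), which lets threads borrow local data because it joins them all before returning.

```rust
use std::thread;

/// Sums a buffer of samples using two threads that borrow it directly.
fn parallel_sum(samples: &[i32]) -> i32 {
    let (left, right) = samples.split_at(samples.len() / 2);
    // thread::scope joins every thread spawned inside it before returning,
    // so the threads may borrow `left` and `right` without copying the data.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let buffer: Vec<i32> = (1..=100).collect(); // stand-in for sensor samples
    println!("sum = {}", parallel_sum(&buffer)); // sum = 5050
}
```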

Thread lifecycle
A thread moves through a series of states during its lifetime:

| State | Description |
|---|---|
| Created | OS has allocated a stack and thread control block; not yet running |
| Runnable | Ready to execute, waiting for the scheduler to assign CPU time |
| Running | Actively executing instructions on a CPU core |
| Blocked | Waiting for an event (I/O, a lock, a timer, a channel message); not consuming CPU |
| Terminated | Returned from its entry function; resources freed once joined |

The OS scheduler decides when to move threads between runnable and running states. Each context switch saves the current thread’s register state and restores another’s — this has a cost of roughly 1–10 microseconds on modern hardware. On hard real-time edge systems this overhead must be accounted for.
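Rust does not expose these states directly, but the sketch below makes most of them observable. A hypothetical worker is spawned (Created, then Runnable) and parks in recv() on a channel (Blocked). It only returns (Terminated) after main sends a signal. JoinHandle::is_finished (stable since Rust 1.61) reports whether the thread has terminated yet. The state names in the comments are an approximation of what the OS scheduler does.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn lifecycle_demo() -> (bool, u32) {
    let (tx, rx) = mpsc::channel::<()>();
    // spawn(): the OS creates the thread and hands it to the scheduler
    // (roughly Created -> Runnable).
    let handle = thread::spawn(move || {
        // recv() parks the thread (Blocked, no CPU used) until main signals.
        rx.recv().unwrap();
        // Running again: a little work, then return (Terminated).
        (1..=3).sum::<u32>()
    });
    // Give the worker time to reach recv(); correctness does not depend on it.
    thread::sleep(Duration::from_millis(20));
    // The worker cannot have finished yet: it is still waiting for the signal.
    let finished_early = handle.is_finished();
    tx.send(()).unwrap();
    // join() blocks main until the worker terminates, then frees its resources.
    let result = handle.join().unwrap();
    (finished_early, result)
}

fn main() {
    let (early, result) = lifecycle_demo();
    println!("finished before signal: {early}, result: {result}"); // false, 6
}
```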


Edge AI Scenario

An edge device spawns a sensor thread that reads a sensor, runs inference on the sample, then sleeps for 50 ms before repeating. Meanwhile, the main thread blocks on handle.join(). Both threads share a single CPU core, so the OS scheduler decides which one runs when.

The program


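use std::thread;
use std::time::Duration;

// Hypothetical stand-ins for the device's sensor driver and model.
fn read_sensor() -> f32 { 21.5 }
fn run_inference(sample: f32) -> bool { sample > 25.0 }

fn main() {
    let handle = thread::spawn(|| {
        for _ in 0..2 {
            let sample = read_sensor(); // I/O
            let _label = run_inference(sample); // compute
            thread::sleep(Duration::from_millis(50)); // Blocked for 50 ms
        }
    });
    handle.join().unwrap(); // main is Blocked until the sensor thread terminates
}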


Real-time takeaway

Every context switch costs roughly 1–10 µs on modern hardware as registers are saved and restored. On hard real-time edge systems with tight deadlines, this overhead must be budgeted alongside the work the threads do. Excessive blocking and waking (chatty I/O, fine-grained locks, short sleeps) multiplies the switch count quickly.

Before examining Rust’s solutions it is worth understanding the problems concurrent programs face in general, independent of the language used.

Data race: two or more threads access the same memory location concurrently, at least one access is a write, and there is no synchronisation between them. The result is undefined behaviour: wrong answers, crashes, or silent data corruption. Data races are notoriously difficult to reproduce because they depend on precise thread scheduling.

Race condition: a broader term for any bug where program correctness depends on the relative timing of events. A data race is one kind of race condition, but race conditions can also occur at a higher level — for example, a check-then-act sequence where another thread modifies shared state between the check and the act.
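Rust's type system rules out data races, but not higher-level race conditions. The sketch below uses a hypothetical shared balance to show a check-then-act sequence. The commented-out version compiles yet is racy, because the lock is released between the check and the act. try_withdraw, by contrast, holds a single guard across both steps.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Racy version (compiles, but has a check-then-act race condition):
//     let enough = *balance.lock().unwrap() >= amount; // guard dropped here
//     if enough { *balance.lock().unwrap() -= amount; } // balance may have changed
// Another thread can withdraw between the two lock() calls.

/// Safe version: check and act while holding ONE guard.
fn try_withdraw(balance: &Mutex<u32>, amount: u32) -> bool {
    let mut b = balance.lock().unwrap();
    if *b >= amount {
        *b -= amount;
        true
    } else {
        false
    }
}

/// Starts `threads` withdrawals of `amount` from `start`; returns (successes, remaining).
fn run_withdrawals(threads: usize, start: u32, amount: u32) -> (usize, u32) {
    let balance = Arc::new(Mutex::new(start));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let balance = Arc::clone(&balance);
            thread::spawn(move || try_withdraw(&balance, amount))
        })
        .collect();
    let successes = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count();
    let remaining = *balance.lock().unwrap();
    (successes, remaining)
}

fn main() {
    // 10 threads each try to take 10 from 50: exactly 5 can succeed.
    println!("{:?}", run_withdrawals(10, 50, 10)); // (5, 0)
}
```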

Deadlock: two or more threads each wait for a resource held by the other, so neither can proceed. Thread A holds lock 1 and waits for lock 2; thread B holds lock 2 and waits for lock 1. The program hangs indefinitely with no error message.
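One standard way to prevent the circular wait described above is a global lock order. Every thread acquires lock 1 before lock 2, so no thread can hold lock 2 while waiting for lock 1. The two locks below (sensor_log and net_buf) are hypothetical names chosen for illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Two threads repeatedly update two shared resources, always in the same lock order.
fn run_without_deadlock(iterations: u32) -> (u32, u32) {
    let sensor_log = Arc::new(Mutex::new(0u32)); // "lock 1"
    let net_buf = Arc::new(Mutex::new(0u32)); // "lock 2"
    let mut handles = Vec::new();
    for _ in 0..2 {
        let (log, buf) = (Arc::clone(&sensor_log), Arc::clone(&net_buf));
        handles.push(thread::spawn(move || {
            for _ in 0..iterations {
                // Every thread takes lock 1 first, then lock 2, so the
                // hold-one-wait-for-the-other cycle can never form.
                let mut log_guard = log.lock().unwrap();
                let mut buf_guard = buf.lock().unwrap();
                *log_guard += 1;
                *buf_guard += 1;
            } // both guards dropped here, releasing the locks
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let result = (*sensor_log.lock().unwrap(), *net_buf.lock().unwrap());
    result
}

fn main() {
    println!("{:?}", run_without_deadlock(100)); // (200, 200)
}
```

If one of the threads instead locked net_buf first, the program could hang exactly as described, with no error message.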

Starvation: a thread is perpetually denied access to a resource because other threads are always given priority. This is particularly relevant to real-time edge applications, where sensor-reading threads must not be starved by compute-heavy processing threads.

Livelock: threads actively respond to each other’s state changes but make no overall progress. This is analogous to two people meeting in a corridor who keep stepping aside in the same direction, so that neither ever gets past.

C++ provides threads using std::thread (C++11 onwards) and synchronisation via std::mutex, std::lock_guard, std::condition_variable, and atomics. The APIs are structurally similar to Rust’s. The critical difference is enforcement: in C++, nothing prevents you from accessing shared data without holding the appropriate lock. The compiler will not object, and the bug will only appear at runtime, often intermittently and in production.

#include <iostream>
#include <thread>
#include <chrono>

void thread_task(int id) {
    for (int i = 1; i <= 3; ++i) {
        std::cout << "Thread " << id << ": tick " << i << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main() {
    std::thread t1(thread_task, 1);
    std::thread t2(thread_task, 2);
    t1.join();
    t2.join();
    std::cout << "Both threads finished.\n";
    return 0;
}

Build and execute as follows:

molloyd@dcu:~/tmp$ g++ thread.cpp -o thread
molloyd@dcu:~/tmp$ ./thread
Thread 1: tick 1
Thread 2: tick 1
Thread 1: tick 2
Thread 2: tick 2
Thread 1: tick 3
Thread 2: tick 3
Both threads finished.

The structure in this code is recognisable: std::thread takes a callable and its arguments, and join() blocks until the thread completes. This maps directly onto Rust’s thread::spawn and JoinHandle::join. The interleaving of the two threads’ output is decided by the scheduler and can differ from run to run. The real difference appears the moment shared mutable data enters the picture: C++ will compile code that accesses shared state without a lock, leaving a data race, whereas Rust will reject that code outright.
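For comparison, here is a minimal Rust version of the same program. It is a direct translation rather than code from the Rust Tutorial. The only addition is that thread_task returns its tick count, because JoinHandle::join, unlike std::thread::join, hands back the thread's return value.

```rust
use std::thread;
use std::time::Duration;

/// Mirrors the C++ thread_task: print three ticks, sleeping 1 ms between them.
fn thread_task(id: u32) -> u32 {
    let mut ticks = 0;
    for i in 1..=3 {
        println!("Thread {id}: tick {i}");
        ticks += 1;
        thread::sleep(Duration::from_millis(1));
    }
    ticks
}

fn main() {
    // thread::spawn takes a closure; here it simply calls thread_task with an id.
    let t1 = thread::spawn(|| thread_task(1));
    let t2 = thread::spawn(|| thread_task(2));
    // join() waits for each thread and returns its result.
    let total = t1.join().unwrap() + t2.join().unwrap();
    println!("Both threads finished ({total} ticks).");
}
```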

Rust’s type system makes it a compile-time error to access data across threads without proper synchronisation. This is what “fearless concurrency” means in practice: you are free to write complex concurrent code without the constant concern that you have introduced a subtle data race.
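The sketch below illustrates that difference with a hypothetical shared event counter. The commented-out version increments a plain integer from several threads, which is the Rust equivalent of the unsynchronised C++ access, and rustc rejects it with a borrow error. The version that compiles uses std::sync::atomic::AtomicUsize, whose fetch_add is an indivisible read-modify-write. Ordering::Relaxed is enough here because thread::scope joins every thread before the final load.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn count_with_atomic(threads: usize, per_thread: usize) -> usize {
    // Rejected version (does not compile):
    //     let mut count = 0;
    //     thread::scope(|s| {
    //         for _ in 0..threads {
    //             s.spawn(|| {
    //                 for _ in 0..per_thread { count += 1; }
    //             });
    //         }
    //     });
    // rustc refuses because every closure would need `&mut count` at the same time.
    let count = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            // Each thread borrows `count` immutably; fetch_add is atomic.
            s.spawn(|| {
                for _ in 0..per_thread {
                    count.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // the scope joins all threads here
    count.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count_with_atomic(4, 1000)); // 4000
}
```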

Concept Match

Match the Fundamentals

Match each of the following concepts to its definition in the Definition Pool: Concurrency, Parallelism, Process, Thread, Blocked.

Definition Pool

  • A lightweight unit of execution sharing memory space; low overhead but requires careful synchronisation.
  • An isolated execution environment with private memory; high overhead to create and communicate between.
  • A thread state where execution is paused waiting for an event (like I/O or a lock), consuming no CPU.
  • A structural property where multiple tasks make progress in overlapping time; does not require multiple cores.
  • Physical simultaneity where multiple computations execute at the exact same instant on separate hardware units.
Quiz

Why are threads generally preferred over processes for concurrent tasks on edge devices like the Raspberry Pi or ESP32?

Quiz

Rust's compiler is famous for preventing 'Data Races'. Which scenario describes exactly what a data race is?