10.1 Concurrency Introduction

Building High-performance Parallel Applications

Introduction

Rust’s approach to concurrency is built on a foundation of compile-time safety, leveraging its ownership and borrowing system to prevent common pitfalls like data races. By understanding how to create and manage threads, communicate via channels, and safely share state using mutexes and read-write locks, you gain powerful tools for building high-performance, parallel applications. For example, the Send and Sync traits act as the compiler’s guardians, ensuring that your concurrent code conforms to strict safety rules, allowing you to focus on logic rather than difficult runtime bugs. This explicit and robust concurrency model makes Rust a good choice for systems programming where reliability and performance are critical.

Reference Materials

These materials were used in the generation of this chapter and provide strong additional support for the concepts described:

The Rust Programming Language: Fearless Concurrency. Covers threads, Mutex, channels, and Rust’s ownership model for safe concurrency. https://doc.rust-lang.org/book/ch16-00-concurrency.html
The Rust Standard Library: std::sync & std::thread APIs. API-level documentation, useful for looking up details on channels, Arc, and Mutex in particular. See https://doc.rust-lang.org/std/sync/index.html and, for thread spawning, https://doc.rust-lang.org/std/thread/
Tokio Tutorial: Asynchronous Concurrency. Shows how to build concurrent TCP servers, timers, channels, tasks, and more. https://tokio.rs/tokio/tutorial
The Rust Async Book. Good explanations of executors, tasks, and concurrency patterns in async Rust. https://rust-lang.github.io/async-book/

Concurrency Introduction

This chapter describes Rust’s approach to concurrency, a critical aspect of modern systems programming. You learn how Rust enables you to write efficient, parallel code with strong compile-time guarantees against common concurrency bugs like data races. The focus is on practical examples, illustrating how Rust’s unique ownership system extends to multi-threaded environments, ensuring safety without sacrificing performance. You should review this chapter at the same time as the materials in the associated Rust Tutorial on Concurrency. Please try to complete both in parallel (!).

The first step is exploring the fundamentals of creating threads using Rust’s standard library. You see how to spawn new threads to execute tasks concurrently, and how to manage their life cycle using JoinHandle values. A key aspect here is understanding move closures (C++ STL anguish trigger!), which are essential for safely transferring ownership of data to new threads, preventing the situation where a thread outlives data that it has only borrowed.
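As a minimal sketch of spawning and joining, the example below moves a vector of readings into a new thread and collects the result through the JoinHandle. The helper name average_in_thread and the sample values are illustrative assumptions, not part of the tutorial code.

```rust
use std::thread;

// Hypothetical helper: computes the mean of `readings` on a new thread.
fn average_in_thread(readings: Vec<f64>) -> f64 {
    // `move` transfers ownership of `readings` into the closure, so the
    // new thread can never be left holding a reference to freed data.
    let handle = thread::spawn(move || {
        let sum: f64 = readings.iter().sum();
        sum / readings.len() as f64
    });
    // join() blocks until the thread finishes and yields its return value.
    handle.join().expect("worker thread panicked")
}

fn main() {
    let avg = average_in_thread(vec![20.0, 22.0, 24.0]);
    println!("average = {avg}");
}
```

Removing the move keyword makes the closure borrow readings instead, and the compiler rejects the program because the spawned thread could outlive the function that owns the vector.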

Next, the chapter examines how to pass data between threads using message passing. Rust’s std::sync::mpsc (multiple producer, single consumer) channels, which appear in the Assignment 2 template code example, provide a safe and idiomatic way for threads to communicate by sending and receiving messages. This chapter walks through setting up channels, sending data from one or more producer threads, and receiving it in a consumer thread, emphasising how this mechanism inherently prevents data races because ownership of each message moves from the sender to the receiver.
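The following sketch shows the multiple-producer, single-consumer pattern in its simplest form. The helper name collect_messages and the (id, seq) message layout are assumptions made for illustration; they are not taken from the Assignment 2 template.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: `producers` threads each send `per_producer`
// messages; the calling thread receives all of them.
fn collect_messages(producers: u32, per_producer: u32) -> Vec<(u32, u32)> {
    let (tx, rx) = mpsc::channel();
    for id in 0..producers {
        let tx = tx.clone(); // each producer gets its own Sender
        thread::spawn(move || {
            for seq in 0..per_producer {
                tx.send((id, seq)).expect("receiver dropped");
            }
        });
    }
    // Drop the original Sender so the channel closes once every
    // producer thread has finished and dropped its clone.
    drop(tx);
    // Iterating the Receiver blocks until a message arrives and ends
    // when all Senders are gone.
    rx.iter().collect()
}

fn main() {
    let received = collect_messages(3, 2);
    println!("received {} messages", received.len());
}
```

The order in which messages from different producers arrive is not fixed, so the sketch only relies on the total count.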

We then explore shared state concurrency, where multiple threads need to access and potentially modify the same data. While this can be a source of complex bugs in other languages, Rust provides primitives like std::sync::Mutex and std::sync::RwLock. You will see how mutexes ensure exclusive access to data, preventing simultaneous writes, and how read-write locks allow multiple readers or a single writer, optimising for common access patterns. Detailed examples demonstrate how Rust’s type system and ownership rules integrate with these locks, ensuring that locking and unlocking are handled correctly and safely.
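A small sketch of both primitives working together is given below. It assumes a hypothetical helper, shared_count, in which several threads read a step size through an RwLock and update a counter through a Mutex; the names and values are illustrative only.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

// Hypothetical helper: `threads` workers each add the shared step to a
// shared counter `increments` times.
fn shared_count(threads: usize, increments: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let step = Arc::new(RwLock::new(1u64)); // read-mostly configuration

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let step = Arc::clone(&step);
            thread::spawn(move || {
                for _ in 0..increments {
                    let by = *step.read().unwrap(); // many readers allowed
                    let mut guard = counter.lock().unwrap(); // exclusive access
                    *guard += by;
                    // `guard` is dropped here, which unlocks the mutex.
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", shared_count(4, 1000));
}
```

There is no explicit unlock call: the data is only reachable through the guard returned by lock() or read(), and the lock is released when that guard goes out of scope.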

Finally, the chapter covers the important Send and Sync traits. These are fundamental to Rust’s compile-time thread safety guarantees. You see what it means for a type to be Send (safe to transfer ownership across threads) and Sync (safe to share references across threads). Through examples, we investigate how these marker traits are automatically implemented for most types, and how they enable the compiler to catch thread safety issues before your code ever runs, leading to more robust and reliable concurrent applications.
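One way to see the marker traits at work is to ask the compiler directly. In the sketch below, assert_send, assert_sync and add_one_on_thread are hypothetical helpers written for this illustration; the commented-out line shows a check that would fail to compile.

```rust
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

// These functions only compile for types that implement the named
// marker trait. They do nothing at runtime.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Arc<i32> is Send, so it may be moved into another thread.
fn add_one_on_thread(shared: Arc<i32>) -> i32 {
    thread::spawn(move || *shared + 1).join().unwrap()
}

fn main() {
    assert_send::<Vec<u8>>();
    assert_sync::<Arc<Mutex<i32>>>();
    // assert_send::<Rc<i32>>(); // does not compile: Rc<i32> is not Send

    let local = Rc::new(5); // fine while it stays on one thread
    println!("local = {local}, threaded = {}", add_one_on_thread(Arc::new(5)));
}
```

Rc uses a non-atomic reference count, which is why it is neither Send nor Sync, while Arc pays for atomic counting and can cross thread boundaries.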

Concurrency at the Edge

Why Concurrency Matters for Edge Devices

Edge devices — microcontrollers, single-board computers, and IoT gateways — are rarely doing one thing at a time. A smart sensor node might simultaneously read temperature data, handle an incoming Wi-Fi packet, write a log entry to flash storage, and blink a heartbeat LED. A robotics controller must process sensor inputs while computing motor commands and responding to network requests, all within strict real-time latency constraints. Without concurrency, each of these tasks would have to wait for the previous one to complete, producing a program that is unresponsive and wasteful of the hardware’s capability.

Concurrency is the ability of a program to make progress on multiple tasks within overlapping time periods. It does not require multiple physical processor cores. Even on a single-core device, the operating system creates the illusion of simultaneity by rapidly switching the CPU between threads — a process called context switching. On multi-core hardware such as the dual-core ESP32-S3 or the Raspberry Pi, true simultaneous execution is possible.

Concurrency vs. Parallelism

These two terms are related but distinct, and the distinction matters on edge hardware where you may be targeting a single-core microcontroller.

Parallelism means physically executing multiple computations at the exact same instant on separate hardware units, typically two CPU cores genuinely running different code at the same clock cycle.

Concurrency is about “structure”: how a program is organised so that multiple tasks can make progress by sharing available resources. On a single-core device, only one thread executes at any instant, but the scheduler gives each thread a turn often enough that the system appears responsive and the tasks “appear” to run in parallel.

	Concurrency	Parallelism
Definition	Multiple tasks make progress in overlapping time	Multiple tasks execute simultaneously
Requires	A scheduler (OS or async runtime)	Multiple CPU cores or hardware units
Edge example	OS time-slices between sensor reading and WiFi handling	Both cores on an ESP32-S3 running code at once
Single-core MCU	Yes, via scheduling	No

In practice, the two overlap: a concurrent program can exploit parallelism when more than one core is available, but it remains useful even when only one is.
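The distinction can be observed from a program. The sketch below, using the hypothetical helpers parallel_units and run_tasks, asks the standard library how many threads the hardware can run in parallel and then runs a fixed number of concurrent tasks regardless of the answer.

```rust
use std::thread;

// Number of threads the OS reports can run in parallel, falling back
// to 1 when the query is not supported on the platform.
fn parallel_units() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

// Spawns `tasks` threads regardless of the core count and sums the
// squares they compute.
fn run_tasks(tasks: u64) -> u64 {
    let handles: Vec<_> = (1..=tasks)
        .map(|i| thread::spawn(move || i * i))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("hardware can run {} thread(s) in parallel", parallel_units());
    println!("sum of squares from 8 concurrent tasks: {}", run_tasks(8));
}
```

The result of run_tasks is the same on a single-core board and on a multi-core one; only the degree of true parallelism differs.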

Processes and Threads

The operating system provides two fundamental units of concurrent execution.

A process is a self-contained program instance with its own private memory space, file descriptors, and resources. Processes are isolated from one another; communication between them requires explicit mechanisms such as pipes, shared memory, or sockets. Starting a process is relatively expensive: the OS must allocate a new address space and initialise kernel data structures.

A thread is a lighter-weight unit of execution that lives inside a process. All threads within a process share the same memory space and file descriptors. Creating a thread is far cheaper than creating a process. Threads communicate by reading and writing shared memory directly, which is both their greatest advantage (low overhead, high speed) and their greatest danger (data races and corruption if not synchronised).

For edge programming, threads are the natural primitive:

Spawning a thread costs far less memory and time than a new process
Direct memory sharing avoids serialisation overhead when passing sensor data between tasks
A single process containing multiple threads is simpler to deploy and monitor on a resource-constrained device
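On a memory-constrained target it can help to control how much stack each thread reserves. The sketch below uses std::thread::Builder for this; the helper name run_small_thread, the thread name and the 64 KiB figure are assumptions chosen for illustration, and the platform may round the requested size up to its own minimum.

```rust
use std::thread;

// Hypothetical helper: runs a closure on a named thread with a
// requested stack size and returns the name the thread saw for itself.
fn run_small_thread(name: &str, stack_bytes: usize) -> Option<String> {
    let handle = thread::Builder::new()
        .name(name.to_string())
        .stack_size(stack_bytes)
        .spawn(|| thread::current().name().map(str::to_owned))
        .expect("OS refused to create the thread");
    handle.join().expect("thread panicked")
}

fn main() {
    let seen = run_small_thread("sensor", 64 * 1024);
    println!("thread reported name: {seen:?}");
}
```

Unlike thread::spawn, Builder::spawn returns a Result, so a failure to create the thread can be handled instead of causing a panic.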

Thread lifecycle
A thread moves through a series of states during its lifetime:

State	Description
Created	OS has allocated a stack and thread control block; not yet running
Runnable	Ready to execute, waiting for the scheduler to assign CPU time
Running	Actively executing instructions on a CPU core
Blocked	Waiting for an event (I/O, a lock, a timer, a channel message); not consuming CPU
Terminated	Returned from its entry function; resources freed once joined

The OS scheduler decides when to move threads between runnable and running states. Each context switch saves the current thread’s register state and restores another’s — this has a cost of roughly 1–10 microseconds on modern hardware. On hard real-time edge systems this overhead must be accounted for.

🎬Code Demo: The Thread Lifecycle

derekmolloy.ie

Edge AI Scenario

An edge device spawns a sensor thread that reads a sensor, runs inference on the sample, then sleeps for 50 ms before repeating. Meanwhile main waits on handle.join(). Both threads share a single CPU core — the OS scheduler decides who runs when.

The program

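use std::thread;
use std::time::Duration;

// Hypothetical stand-ins for the sensor read and the inference step.
fn read_sensor() -> f32 { 21.5 }
fn run_inference(sample: f32) -> bool { sample > 30.0 }

fn main() {
    // Spawn the sensor thread; main keeps running until it calls join().
    let handle = thread::spawn(|| {
        for _ in 0..2 {
            let sample = read_sensor();          // I/O
            let _label = run_inference(sample);  // compute
            thread::sleep(Duration::from_millis(50)); // Blocked while asleep
        }
    });
    // main is Blocked here until the sensor thread terminates.
    handle.join().unwrap();
}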

[Interactive demo: a lifecycle timeline for the main thread and the sensor thread sharing CPU Core 0 over 320 ms of wall-clock time (5 ms per cell), with a running count of context switches and their approximate cost.]

Real-time takeaway

Every context switch costs roughly 1–10 µs on modern hardware as registers are saved and restored. On hard real-time edge systems with tight deadlines, this overhead must be budgeted alongside the work the threads do. Excessive blocking and waking — chatty I/O, fine-grained locks, short sleeps — multiplies the switch count quickly.

Classic Concurrency Hazards

Before examining Rust’s solutions it is worth understanding the problems concurrent programs face in general, independent of the language used.

Data race: two or more threads access the same memory location concurrently, at least one access is a write, and there is no synchronisation between them. The result is undefined behaviour: wrong answers, crashes, or silent data corruption. Data races are notoriously difficult to reproduce because they depend on precise thread scheduling.

Race condition: a broader term for any bug where program correctness depends on the relative timing of events. A data race is one kind of race condition, but race conditions can also occur at a higher level — for example, a check-then-act sequence where another thread modifies shared state between the check and the act.
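A check-then-act race can still be written in Rust if the check and the act happen under separate lock acquisitions. The sketch below shows the safe form, where both steps share one guard. The helper name reserve_slots and the buffer-slot scenario are illustrative assumptions.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical example: threads compete for a limited number of buffer
// slots. Returns how many reservations succeeded.
fn reserve_slots(slots: u32, threads: u32) -> u32 {
    let free = Arc::new(Mutex::new(slots));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let free = Arc::clone(&free);
            thread::spawn(move || {
                // Check and act under ONE lock acquisition. Locking once
                // to check and again to decrement would let another
                // thread take the last slot in between.
                let mut guard = free.lock().unwrap();
                if *guard > 0 {
                    *guard -= 1;
                    true
                } else {
                    false
                }
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count() as u32
}

fn main() {
    println!("{} of 8 threads got a slot", reserve_slots(3, 8));
}
```

Which threads win a slot varies from run to run, but the number of successful reservations never exceeds the number of slots.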

Deadlock: two or more threads each wait for a resource held by the other, so neither can proceed. Thread A holds lock 1 and waits for lock 2; thread B holds lock 2 and waits for lock 1. The program hangs indefinitely with no error message.
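A common way to avoid this pattern is to make every thread acquire locks in the same order. The sketch below is one illustration of that idea, with a hypothetical helper transfer_both_ways and made-up starting values of 100.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical example: two threads move amounts between two shared
// values. Both take lock `a` before lock `b`, so neither can hold one
// lock while waiting for the other in the opposite order.
fn transfer_both_ways(amount: i64) -> (i64, i64) {
    let a = Arc::new(Mutex::new(100i64));
    let b = Arc::new(Mutex::new(100i64));

    let (a1, b1) = (Arc::clone(&a), Arc::clone(&b));
    let t1 = thread::spawn(move || {
        let mut ga = a1.lock().unwrap(); // always lock a first
        let mut gb = b1.lock().unwrap(); // then b
        *ga -= amount;
        *gb += amount;
    });

    let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));
    let t2 = thread::spawn(move || {
        let mut ga = a2.lock().unwrap(); // same order: a, then b
        let mut gb = b2.lock().unwrap();
        *gb -= 2 * amount;
        *ga += 2 * amount;
    });

    t1.join().unwrap();
    t2.join().unwrap();
    let result = (*a.lock().unwrap(), *b.lock().unwrap());
    result
}

fn main() {
    println!("{:?}", transfer_both_ways(10));
}
```

If the second thread locked b before a, the program would still compile, and it could hang exactly as described above; the compiler does not check lock ordering.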

Starvation: a thread is perpetually denied access to a resource because other threads are always given priority. This is particularly relevant for real-time edge applications, where sensor-reading threads must not be starved by compute-heavy processing threads.

Livelock: threads actively respond to each other’s state changes but make no overall progress. The usual analogy is two people in a corridor who keep stepping aside for each other in the same direction and so never manage to pass.

Priority inversion is a subtle scheduling hazard that occurs when a high-priority thread is indirectly blocked by a low-priority thread through a shared lock. Imagine three threads: a high-priority control loop (H), a medium-priority computation task (M), and a low-priority logging thread (L). L acquires a mutex to write a log entry. H then tries to acquire the same mutex and blocks. M, which needs no lock, is scheduled ahead of L because it has higher priority, so L cannot run to release the mutex for as long as M keeps the CPU, and H can be delayed for an unbounded time. The high-priority thread ends up effectively waiting on a medium-priority one, inverting the intended priority order.

Priority inversion caused the famous 1997 Mars Pathfinder reset bug: a low-priority meteorological data-gathering task held a mutex needed by the high-priority information bus management task, while a medium-priority communications task kept pre-empting the low-priority one. The spacecraft’s watchdog timer concluded the system had hung and triggered a reset. Real-time operating systems address this with priority inheritance (temporarily boosting L’s priority to H’s while it holds the lock) or priority ceiling protocols. On Linux-based edge devices, pthread_mutexattr_setprotocol with PTHREAD_PRIO_INHERIT enables this; on RTOS platforms such as FreeRTOS, mutexes (unlike binary semaphores) include a priority inheritance mechanism.

Concurrency in C++ vs. Rust

C++ provides threads using std::thread (C++11 onwards) and synchronisation via std::mutex, std::lock_guard, std::condition_variable, and atomics. The APIs are structurally similar to Rust’s. The critical difference is enforcement: in C++, nothing prevents you from accessing shared data without holding the appropriate lock. The compiler will not object, and the bug will only appear at runtime, often intermittently and in production.

#include <iostream>
#include <thread>
#include <chrono>

void thread_task(int id) {
    for (int i = 1; i <= 3; ++i) {
        std::cout << "Thread " << id << ": tick " << i << "\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

int main() {
    std::thread t1(thread_task, 1);
    std::thread t2(thread_task, 2);

    t1.join();
    t2.join();

    std::cout << "Both threads finished.\n";
    return 0;
}

Build and execute as follows (the order in which the two threads’ lines interleave can differ between runs):

molloyd@dcu:~/tmp$ g++ thread.cpp -o thread
molloyd@dcu:~/tmp$ ./thread
Thread 1: tick 1
Thread 2: tick 1
Thread 1: tick 2
Thread 2: tick 2
Thread 1: tick 3
Thread 2: tick 3
Both threads finished.

The structure in this code is recognisable: std::thread takes a callable and its arguments, and join() blocks until the thread completes. This is directly analogous to Rust’s thread::spawn and JoinHandle::join. The difference is what happens the moment shared mutable data enters the picture: C++ will compile code that accesses shared state without a lock, leaving a data race. Rust will reject that code outright.

In safe Rust, it is a compile-time error to share mutable data across threads without proper synchronisation. This is what “fearless concurrency” means in practice: you are free to write complex concurrent code without the constant concern that you have introduced a subtle data race. Deadlocks and higher-level race conditions are not ruled out by the compiler and still need careful design.
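For comparison with the C++ program, here is a rough Rust counterpart in which the two threads also share a log. The helper name run_ticks and the idea of collecting lines in a shared Vec are assumptions added for illustration; the point is that the Vec has to be wrapped in Arc<Mutex<...>> before the compiler accepts it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Each thread records its ticks in a shared log. Handing both threads a
// plain mutable reference to the Vec would be rejected at compile time.
fn run_ticks(threads: u32, ticks: u32) -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (1..=threads)
        .map(|id| {
            let log = Arc::clone(&log);
            thread::spawn(move || {
                for i in 1..=ticks {
                    log.lock().unwrap().push(format!("Thread {id}: tick {i}"));
                    thread::sleep(Duration::from_millis(1));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let lines = log.lock().unwrap().clone();
    lines
}

fn main() {
    for line in run_ticks(2, 3) {
        println!("{line}");
    }
    println!("Both threads finished.");
}
```

As with the C++ version, the interleaving of the two threads is decided by the scheduler, so only the number of lines and their content per thread are predictable.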

🧩Knowledge Check

Match the Fundamentals

Match each concept below to its definition in the Definition Pool.

Concurrency
Parallelism
Process
Thread
Blocked

Definition Pool

Physical simultaneity where multiple computations execute at the exact same instant on separate hardware units.

A lightweight unit of execution sharing memory space; low overhead but requires careful synchronisation.

A structural property where multiple tasks make progress in overlapping time; does not require multiple cores.

An isolated execution environment with private memory; high overhead to create and communicate between.

A thread state where execution is paused waiting for an event (like I/O or a lock), consuming no CPU.

Quiz

Why are threads generally preferred over processes for concurrent tasks on edge devices like the Raspberry Pi or ESP32?

Rust's compiler is famous for preventing 'Data Races'. Which scenario describes exactly what a data race is?