ThreadScan Tutorial

Monday, March 23, 2015

5:35 PM



This is a quick tutorial on using the ThreadScan library to perform memory reclamation on a concurrent data structure written in an unmanaged language such as C/C++.

1. Prerequisites:


We will assume that the reader possesses the following:

·        One concurrent data structure written in C/C++, with correct but not thread-safe Malloc/Free calls.
In particular, each memory node should be unreachable from the data structure at the time when free is called on it.
This does not preclude concurrent threads from still having references to such nodes on their stacks. This is the problem that ThreadScan is designed to solve (see below).

·        One copy of the ThreadScan code, available from:


2. What ThreadScan Does


In a nutshell, ThreadScan implements a method to check that nodes that are candidates for reclamation are not still pointed to by other threads. The details of the technique are fully described in the paper, but the basic idea is that we check the address of the node against the stack and register contents of all other participating threads.


For efficiency, ThreadScan buffers deletes. More precisely, it maintains a delete ("purgatory") buffer at each thread. Once a thread reaches a set threshold, it will attempt to purge all the contents of the delete buffers. Delete candidates which are still pointed to by other threads remain in the delete buffer until the next reclamation phase.


3. Building ThreadScan

First, we build ThreadScan to obtain


Machine generated alternative text:
—o queue.o —Nail —fPIC —c —İdi queue.c 
—o env.o —Rall —fPIC —c —İdi env.c 
—o wrappers.o —Rall —fPIC —c —İdi wrappers.c 
-o alloc.o -Rall -fPIC -c -idi alloc.c 
-o util.o -Wall -fPIC -c -idi util.c 
—o thread.o —Rall —fPIC —c —İdi thread.c 
—o proc.o —Nail —fPIC —c —İdi proc.c 
—o threadscan.o —Rall —fPIC —c —İdi threadscan.c 
—shared —AI, —soname, —o queue.o env.o wrappers.o alloc.o util.c 
q CC 
g CC 
q CC 
t h rea•-i.c_, proc.c_, threa•dscan.o -ptlırea•-i


Optional:  'make install' to install the library file in /usr/local/lib/ and the header file in /usr/local/include/



4. An Example


Next, we show how to apply ThreadScan to an existing data structure.


We are going to apply ThreadScan to the lock-free linked list data structure from the synchrobench set of benchmarks, available at
(This is an implementation of Tim Harris's classic lock-free list.)


In particular, once you have cloned the repository, the list code should be under



Quick note: The current synchrobench implementation of the linked-list optimizes the standard linked-list algorithm to take advantage of the fact that, in the specific benchmark, it can leak memory.
We include a correct reclaimable implementation here.


Step 1: Code modifications


There are just two things to add to the code.
First, include the ThreadScan function definition, either by defining


Machine generated alternative text:
(void * ) ; 
extern void 
threadscan collect


or by including <threadscan.h>, in case you decided to make install.



Second, there is the matter of actually calling ThreadScan in the right place.
This is the point in the code where the  memory node has been unlinked from the data structure, and where you would usually call free. In the case of the lock-free list delete method, this is right after the node has been rendered unreachable.


Machine generated alternative text:
t *set, val_t Val) 
window_t window »head, 
node_t *pred window. pred, 
node t *curr window. curr; 
if (curr-)val 
return e; 
Val) ; 
node t *succ 
— »next) ; 
node t *marked succ 
— (long) succ); 
if ( ! ATOMIC succ, marked_succ)) 
CAS MB(8pred- curr, succ); 
// Call ThreadScan to reclaim curr 
return 1;


The threadscan_collect call will automatically use the free call that your code originally employed. In particular, it is compatible with high-performance concurrent allocators such as TC-Malloc.


Step 2: Linker


To link against ThreadScan, add the usual parameters to the build line. In my case, this was:


Machine generated alternative text:
. / ThreadScan 
-1 threadscan


Step 3: Profit!


We can now build the example and run a non-leaky version of the list:



Note: You can skip the preload if you have already installed ThreadScan via make install.




5.      FAQ


Q: What are ThreadScan's limitations?

A: First, ThreadScan is not a garbage collector. It still relies on the programmer to correctly unlink the delete candidates from the data structure. Second, ThreadScan checks the contents of stacks and registers against nodes in the delete buffer. Hence, it assumes that you are not using pointer obfuscation techniques in your data structure. Third, reclaiming threads may take a latency hit while reclaiming. That's about it.


Q: What are ThreadScan's guarantees?

A: ThreadScan should not affect the progress guarantees of methods which do not reclaim memory. Moreover, ThreadScan guarantees that reclamation completes as long as threads are scheduled fairly by the OS, which is the case in most modern OSs. 


Q: Is it efficient?

A: Yes. Below are some throughput vs #threads results for the lock-free linked list, see the paper for more benchmarks. 


Machine generated alternative text:
1.2 · 
2 07 
6 07 
8e 07 
Lock-free Linked List 
丶 ,

Machine generated alternative text:
Epoch —.*— 
Slow Epoch 
Hazard Pointers —B—




6.      Links:


·        Code:

·        Paper: here.

·        More features: