ThreadScan Tutorial

Monday, March 23, 2015

5:35 PM

 

 

This is a quick tutorial on using the ThreadScan library to perform memory reclamation on a concurrent data structure written in an unmanaged language such as C/C++.
 

1. Prerequisites:

 

We will assume that the reader possesses the following:

·        One concurrent data structure written in C/C++, with correct but not thread-safe Malloc/Free calls.
In particular, each memory node should be unreachable from the data structure at the time when free is called on it.
This does not preclude concurrent threads from still having references to such nodes on their stacks. This is the problem that ThreadScan is designed to solve (see below).

·        One copy of the ThreadScan code, available from:
https://github.com/Willtor/ThreadScan

 

2. What ThreadScan Does

 

In a nutshell, ThreadScan implements a method to check that nodes that are candidates for reclamation are not still pointed to by other threads. The details of the technique are fully described in the paper, but the basic idea is that we check the address of the node against the stack and register contents of all other participating threads.

 

For efficiency, ThreadScan buffers deletes. More precisely, it maintains a delete ("purgatory") buffer at each thread. Once a thread reaches a set threshold, it will attempt to purge all the contents of the delete buffers. Delete candidates which are still pointed to by other threads remain in the delete buffer until the next reclamation phase.

 

3. Building ThreadScan

First, we build ThreadScan to obtain libthreadscan.so.

 

Machine generated alternative text:
-q 
-q 
-g 
-g 
-q 
-q 
-q 
-q 
-q 
make 
—o queue.o —Nail —fPIC —c —İdi queue.c 
—o env.o —Rall —fPIC —c —İdi env.c 
—o wrappers.o —Rall —fPIC —c —İdi wrappers.c 
-o alloc.o -Rall -fPIC -c -idi alloc.c 
-o util.o -Wall -fPIC -c -idi util.c 
—o thread.o —Rall —fPIC —c —İdi thread.c 
—o proc.o —Nail —fPIC —c —İdi proc.c 
—o threadscan.o —Rall —fPIC —c —İdi threadscan.c 
—shared —AI, —soname, libthreadscan.so —o libthreadscan.so queue.o env.o wrappers.o alloc.o util.c 
OCC 
qcc 
q CC 
g CC 
q CC 
-02 
-02 
-02 
-02 
-02 
-02 
-02 
-02 
-02 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
-DNDEBUG 
t h rea•-i.c_, proc.c_, threa•dscan.o -ptlırea•-i

 

Optional:  'make install' to install the library file in /usr/local/lib/ and the header file in /usr/local/include/

 

 

4. An Example

 

Next, we show how to apply ThreadScan to an existing data structure.

 

We are going to apply ThreadScan to the lock-free linked list data structure from the synchrobench set of benchmarks, available at
https://github.com/gramoli/synchrobench
(This is an implementation of Tim Harris's classic lock-free list.)

 

In particular, once you have cloned the repository, the list code should be under

 

 

Quick note: The current synchrobench implementation of the linked-list optimizes the standard linked-list algorithm to take advantage of the fact that, in the specific benchmark, it can leak memory.
We include a correct reclaimable implementation here.

 

Step 1: Code modifications

 

There are just two things to add to the code.
First, include the ThreadScan function definition, either by defining

 

Machine generated alternative text:
(void * ) ; 
extern void 
threadscan collect

 

or by including <threadscan.h>, in case you decided to make install.

 

 

Second, there is the matter of actually calling ThreadScan in the right place.
This is the point in the code where the  memory node has been unlinked from the data structure, and where you would usually call free. In the case of the lock-free list delete method, this is right after the node has been rendered unreachable.

 

Machine generated alternative text:
t *set, val_t Val) 
int 
while 
window_t window »head, 
node_t *pred window. pred, 
node t *curr window. curr; 
if (curr-)val 
Val) 
return e; 
Val) ; 
node t *succ 
— »next) ; 
node t *marked succ 
— (long) succ); 
if ( ! ATOMIC succ, marked_succ)) 
continue; 
CAS MB(8pred- curr, succ); 
// Call ThreadScan to reclaim curr 
threadscan_collect(curr); 
return 1;

 

The threadscan_collect call will automatically use the free call that your code originally employed. In particular, it is compatible with high-performance concurrent allocators such as TC-Malloc.

 

Step 2: Linker

 

To link against ThreadScan, add the usual parameters to the build line. In my case, this was:

 

Machine generated alternative text:
. / ThreadScan 
-1 threadscan

 

Step 3: Profit!

 

We can now build the example and run a non-leaky version of the list:

 

 

Note: You can skip the preload if you have already installed ThreadScan via make install.

 

 

 

5.      FAQ

 

Q: What are ThreadScan's limitations?

A: First, ThreadScan is not a garbage collector. It still relies on the programmer to correctly unlink the delete candidates from the data structure. Second, ThreadScan checks the contents of stacks and registers against nodes in the delete buffer. Hence, it assumes that you are not using pointer obfuscation techniques in your data structure. Third, reclaiming threads may take a latency hit while reclaiming. That's about it.

 

Q: What are ThreadScan's guarantees?

A: ThreadScan should not affect the progress guarantees of methods which do not reclaim memory. Moreover, ThreadScan guarantees that reclamation completes as long as threads are scheduled fairly by the OS, which is the case in most modern OSs. 

 

Q: Is it efficient?

A: Yes. Below are some throughput vs #threads results for the lock-free linked list, see the paper for more benchmarks. 

 

Machine generated alternative text:
1.2 · 
1. 
1. 
2 07 
07 
6 07 
8e 07 
08 
0 
Lock-free Linked List 
丶 ,

Machine generated alternative text:
ThreadScan 
Epoch —.*— 
Slow Epoch 
Hazard Pointers —B—

 

 

 

6.      Links:

 

·        Code: https://github.com/Willtor/ThreadScan

·        Paper: here.

·        More features: https://github.com/Willtor/ThreadScan/blob/master/README.md