ThreadScan Tutorial
Monday, March 23, 2015
5:35 PM
This is a quick tutorial on using
the ThreadScan library
to perform memory reclamation on a concurrent data structure written in an
unmanaged language such as C/C++.
1. Prerequisites:
We will assume that the reader possesses the following:
·
One concurrent data structure written in C/C++, with correct but
not thread-safe Malloc/Free calls.
In particular, each memory node should be unreachable from the data structure
at the time when free is called on it.
This does not preclude concurrent threads from still having references to such
nodes on their stacks. This is the problem that ThreadScan is designed to solve
(see below).
·
One copy of the ThreadScan code, available from:
https://github.com/Willtor/ThreadScan
2. What ThreadScan Does
In a nutshell, ThreadScan implements a method to check that nodes that are candidates for reclamation are not still pointed to by other threads. The details of the technique are fully described in the paper, but the basic idea is that we check the address of the node against the stack and register contents of all other participating threads.
For efficiency, ThreadScan buffers deletes. More precisely, it maintains a delete ("purgatory") buffer at each thread. Once a thread reaches a set threshold, it will attempt to purge all the contents of the delete buffers. Delete candidates which are still pointed to by other threads remain in the delete buffer until the next reclamation phase.
3. Building ThreadScan
First, we build ThreadScan to obtain libthreadscan.so.
Optional: 'make install' to install the library file in /usr/local/lib/ and the header file in /usr/local/include/
4. An Example
Next, we show how to apply ThreadScan to an existing data structure.
We are going to apply
ThreadScan to the lock-free linked list data structure from the synchrobench
set of benchmarks, available at
https://github.com/gramoli/synchrobench
(This is an implementation of Tim Harris's classic lock-free list.)
In particular, once you have cloned the repository, the list code should be under
Quick note:
The current synchrobench implementation of the linked-list optimizes the
standard linked-list algorithm to take advantage of the fact that, in the
specific benchmark, it can leak memory.
We include a correct reclaimable implementation here.
Step 1: Code modifications
There are just two things
to add to the code.
First, include the ThreadScan function definition, either by defining
or by including <threadscan.h>, in case you decided to make install.
Second, there is the
matter of actually calling ThreadScan in the right place.
This is the point in the code where the memory node has been unlinked from the
data structure, and where you would usually call free. In the case of the
lock-free list delete method, this is right after the node has been rendered
unreachable.
The threadscan_collect call will automatically use the free call that your code originally employed. In particular, it is compatible with high-performance concurrent allocators such as TC-Malloc.
Step 2: Linker
To link against ThreadScan, add the usual parameters to the build line. In my case, this was:
Step 3: Profit!
We can now build the example and run a non-leaky version of the list:
Note: You can skip the preload if you have already installed ThreadScan via make install.
5. FAQ
Q: What are ThreadScan's limitations?
A: First, ThreadScan is not a garbage collector. It still relies on the programmer to correctly unlink the delete candidates from the data structure. Second, ThreadScan checks the contents of stacks and registers against nodes in the delete buffer. Hence, it assumes that you are not using pointer obfuscation techniques in your data structure. Third, reclaiming threads may take a latency hit while reclaiming. That's about it.
Q: What are ThreadScan's guarantees?
A: ThreadScan should not affect the progress guarantees of methods which do not reclaim memory. Moreover, ThreadScan guarantees that reclamation completes as long as threads are scheduled fairly by the OS, which is the case in most modern OSs.
Q: Is it efficient?
A: Yes. Below are some throughput vs #threads results for the lock-free linked list, see the paper for more benchmarks.
|
6. Links:
· Code: https://github.com/Willtor/ThreadScan
· Paper: here.
· More features: https://github.com/Willtor/ThreadScan/blob/master/README.md