FSAM is a new sparse Flow-Sensitive pointer Analysis for Multithreaded C programs (with Pthreads) conducted through a series of thread interference analysis phases.
Sparse Flow-Sensitive Pointer Analysis for Multithreaded Programs 2016 International Symposium on Code Generation and Optimization (CGO' 16)
Virtual Image of FSAM Framework: FSAM.ova (4.5GB)
FSAM is built on top of SVF, a static value-flow analysis framework operating on LLVM-IR. FSAM will be integrated into SVF based on the latest release of LLVM (version 3.7.0). The core source code files reside under "include/MTA" and "lib/MTA" and can be downloaded here . Its executable file "mta" can be found at "Release+Asserts/bin/mta" after a successful build.
To help users reproduce the data in the paper, we have also provided a virtual machine image, which contains the source code of FSAM, together with scripts and benchmarks used in our experimental evaluation, as described in the paper.
The OS in the virtual machine image is Ubuntu 12.04. A user account has been created with both its username and password as "pta". To run FSAM, you are advised to allocate at least 16GB memory to the virtual machine. The traditional non-sparse analysis, denoted NonSparse in the paper, is slow and requires large memory, as a lower memory budget may force OS to kill the running process when it is used to analyse some large programs, e.g., bodytrack, x264. Finally, a VirtualBox with version 4.1.12 or newer is required to run the image.
GPLv3
The FSAM project is located in the "pta" directory under the HOME directory of the user named "pta". To run the tests as we did in our evaluation, open a terminal and issue the following commands:
cd /home/pta/pta/ | # Go to the FSAM project directory, denoted as $FSAMHOME |
. ./setup.sh | # Set up environment variables (note that there is a white space between the two dots) |
cd cgo-exp | # Go to the test directory |
To reproduce our results for the 10 benchmarks evaluated, please do the following:
Collect the results of only FSAM in Figure 12:
./table2.sh | # Run FSAM analysis only |
./collect_table2.sh mta_res | # Collect and print out the results to terminal |
Collect the results of both FSAM and Non-Sparse analyses in Figure 12:
./table2.sh both | # Run FSAM and Non-sparse analyses |
./collect_table2.sh mta_res | # Collect and print out the results to terminal |
Collect the results in Table 2:
./figure12.sh | # Run FSAM with each of its phases disabled |
./collect_figure12.sh mta_res | # Collect and print out the results to terminal |
Please note that all the results related to the analysis times and memory usage reported in our paper were obtained on a 2.70GHz Intel Xeon 4-core CPU running Ubuntu Linux with 64GB memory, as described in Section 4.1. On this platform, obtaining the results for FSAM may take about 20 mins and obtaining the results for NonSparse may take more than five hours.
Initially, the users are advised to analyse a few benchmarks using the default configuration file 'list.stat'. To analyse all the benchmarks, please use another configuration file containing all the benchmarks:
cp list.stat.full list.stat |
For comparison purposes, the experimental data used in the paper are provided under "pta/cgo-exp/mta_res_cgo/".
To reproduce the results shown in Table 2 and Figure 12 with the provided data, issue the following commands:
./collect_table2.sh mta_res_cgo |
./collect_figure12.sh mta_res_cgo |
You can also analyse a single program using run_single.sh under "pta/cgo-exp" folder, say, "word_count.orig", as follows:
./run_single.sh word_count.orig |
If you would like to know more about how to use our pointer analysis framework or would like to add your own extensions, please refer to this wiki and this doxygen manual.
The benchmarks used in our paper are stored under "cgo-exp" directory. There are 10 programs selected from Phoenix-3.0, PARSEC-3.0 and open-source applications:
word_count, kmeans, radiosity, ferret, bodytrack, raytrace, x264, automount, mt_daapd and httpd-server |
The source code of each program is compiled into bit code files using clang and then merged together using LLVM Gold Plugin at link time stage (LTO) to produce a whole-program bc file.
We have also provided a micro-benchmark suite PTABen to validate the correctness of our pointer analysis algorithms for both single- and multi-threaded C programs. Please refer this link to see the source code hosted on GitHub. The layout of the test suite is as follows:
fi_tests : flow-insensitive tests |
fs_tests : flow-sensitive tests |
cs_tests : context-sensitive tests |
path_tests : path-sensitive tests |
complex_tests : complex test cases simplified from real programs |
mta : pthread tests |
scripts : scripts to run the tests |
Some micro-benchmarks can be found under "/home/pta/pta/tests/". You can compile them into LLVM bitcode files by running:
clang -c -emit-llvm -g example.c -o example.bc | # add "-g" option for debugging purpose |
opt -mem2reg example.bc -o example.opt | # optional |
For each bitcode file (.opt file) generated, you can try to analyse it by using the script "run_single.sh" as before. For large programs with multiple bitcode files, please refer to
Install and Use LLVM Gold Plugin in SVF |
for details on how to link several bitcode files into one.
For the five toy programs given in Figure 1 of our paper, we describe below how to inspect the analysis results produced by the flow-insensitive Andersen's analysis (used as the pre-analysis in FSAM) and FSAM's flow-sensitive pointer analysis.
cd /home/pta/pta/ | |
. ./setup | |
unzip toyprograms.zip | |
cd toyprograms | |
./run_all.sh | # analyze the five programs one by one and dump the results to the terminal |
./run_single.sh ex1.c.opt | # analyze a single file |
After a program has been analyzed, the points-to information, together with all the statistics, will be printed on the terminal. Note that when the "-print-pts" option is turned on, FSAM will first print the results from Andersen's analysis during its pre-analysis phase and then the results from its own flow-sensitive pointer analysis phase.
The following table shows the results for analyzing the five programs given in Figure 1 of our paper:
C file | LLVM bit code file | Andersen's points-to | FSAM's flow-sensitive points-to | PAG (Program Assignment Graph) |
ex1.c | ex1.c.opt.ll | ex1.ander.pts | ex1.fs.pts | ex1.dot     ex1.pdf |
ex2.c | ex2.c.opt.ll | ex2.ander.pts | ex2.fs.pts | ex2.dot     ex2.pdf |
ex3.c | ex3.c.opt.ll | ex3.ander.pts | ex3.fs.pts | ex3.dot     ex3.pdf |
ex4.c | ex4.c.opt.ll | ex4.ander.pts | ex4.fs.pts | ex4.dot     ex4.pdf |
ex5.c | ex5.c.opt.ll | ex5.ander.pts | ex5.fs.pts | ex5.dot     ex5.pdf |
Note that the symbol names in a PAG are extracted from its LLVM IR rather than its corresponding C program. For example, "y" in ex1.pdf represents the register pointer pointing to its global object (the address-taken variable with NodeID 10). As is standard, only the points-to results for the top-level pointers are printed, because the points-to results for the address-taken variables are not comparable between flow-sensitive and flow-insensitive analyses. (In a flow-sensitive analysis, separate solutions for different program points must be kept. In a flow-insensitive analysis, the single solution for all program points are maintained.)
For ex1.c and ex2.c (given Figure 1(a) and (b), respectively), both FSAM and Andersen's analysis have the same points-to results.
For ex3.c (Figure 1(c)), FSAM is more precise than Andersen's analysis when dereferencing the pointer "*p" at line 18 in ex3.c. The resulting pointer (with NodeID 50 representing the register pointer value "%4 = load i32** %3" in ex3.c.opt.ll) has different points-to sets in ex3.ander.pts and ex3.fs.pts:
NodeID 50 | PointsTo: { 10 14 } | # the points-to result in ex3.ander.pts |
NodeID 50 | PointsTo: { 10 } | # the points-to result in ex3.fs.pts |
For ex4.c (Figure 1(d)), both FSAM and Andersen's have the same points-to results.
For ex5.c (Figure 1(e)), FSAM is more precise than Andersen's analysis when dereferencing the pointer "*p" at line 21 in ex5.c. The resulting pointer (with NodeID 59 representing the register pointer value "%1 = load i32** %0" in "main" function of ex5.c.opt.ll) has different points-to sets in ex5.ander.pts and ex5.fs.pts:
NodeID 59 | PointsTo: { 10 14 } | # the points-to result in ex5.ander.pts |
NodeID 59 | PointsTo: { 10 } | # the points-to result in ex5.fs.pts |
This web site is hosted by the Programming Languages and Compiler Group at the University of New South Wales.