We highly recommend evaluating the artifact on our machine, since the reported performance is based on policies learned on that dedicated hardware. Running on different hardware may yield different performance.
SSH to our machine:
ssh ae@202.120.40.82 -p 2021
password: osdi21ae
After logging in to our machine, you can jump directly to Part 2, since all the binaries are already prepared there.
Alternatively, you can try the following command to run a simple workload (TPC-C, 1 warehouse, 1 thread):
ae@r743[~]$ cd polyjuice-ae
ae@r743[./polyjuice-ae]$ cd ae-tpcc-polyjuice
ae@r743[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt
After about 30 seconds, you should be able to see the following output:
RESULT throughput(77511.3),agg_abort_rate(0)
Warning: As noted above, training and the learned policies are tied to specific hardware. If you evaluate on your own machine, you may not see the same performance as ours.
The source code is hosted on GitLab. You can download it with the following command:
user@host[~]$ git clone https://oauth2:_VyRDDsg1oPy1dTgLgDQ@ipads.se.sjtu.edu.cn:1312/wangjc/polyjuice-ae.git
For each concurrency control (CC) scheme and benchmark, we provide a separate folder so you can easily evaluate different settings. Part 2.1 gives a detailed description of each folder.
On our machine, we use Python 3.6.9 and GCC 7.5.0/8.3.0. We recommend using the same Python and GCC versions as ours; other versions may fail to compile the code.
Our project depends on the libraries listed below; we give the install commands for Ubuntu 18.04 with apt-get.
Library | Install Command |
---|---|
libnuma | apt-get install libnuma-dev |
libdb | apt-get install libdb-dev libdb++-dev |
libaio | apt-get install libaio-dev |
libz | apt-get install libz-dev |
libssl | apt-get install libssl-dev |
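If you prefer a single command, the packages above can be installed together. This combined invocation is just a convenience sketch of the table, assuming Ubuntu 18.04 with apt-get:
user@host[~]$ sudo apt-get install libnuma-dev libdb-dev libdb++-dev libaio-dev libz-dev libssl-dev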
The project also requires jemalloc 5.0.1. In the following we provide step-by-step instructions to help you install it.
First, download the source code (e.g. jemalloc-5.0.1.tar.gz) from GitHub (https://github.com/jemalloc/jemalloc/releases/tag/5.0.1).
user@host[~]$ tar zxvf jemalloc-5.0.1.tar.gz
user@host[~]$ cd jemalloc-5.0.1
Then, try the following commands to install jemalloc:
user@host[~/jemalloc-5.0.1]$ ./autogen.sh
user@host[~/jemalloc-5.0.1]$ make
user@host[~/jemalloc-5.0.1]$ sudo make install
If you see the error:
install: cannot stat ‘doc/jemalloc.html’: No such file or directory
you can try the following command (https://github.com/jemalloc/jemalloc/issues/231):
user@host[~/jemalloc-5.0.1]$ sudo make install_bin install_include install_lib
After successfully installing jemalloc, you should be able to see the related files under the directories /usr/local/lib, /usr/local/include, and /usr/local/bin.
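To double-check the installation (our suggestion, not part of the original steps), you can list the installed files; the exact file set may vary slightly by platform:
user@host[~]$ ls /usr/local/lib/libjemalloc* /usr/local/include/jemalloc /usr/local/bin/
# you should see libjemalloc.so.2 among the libraries, the jemalloc/ header directory, and tools such as jemalloc-config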
The training code is meant to run in a Python 3 environment. Below we describe the setup using a Python virtual environment, which can be created with either the virtualenv package or the Python 3 venv module.
First set up a virtual environment.
user@host[~]$ python3 -m venv sim_env
Next activate the environment and enter the repository directory.
user@host[~]$ source sim_env/bin/activate
(sim_env)user@host[~]$ cd polyjuice-ae
Install needed libraries.
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade pip
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade numpy
(sim_env)user@host[~/polyjuice-ae]$ pip install tensorflow==1.14.0 # we use this version specifically
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade tensorboard_logger
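As a quick sanity check (our suggestion, not part of the original setup), you can verify inside the virtual environment that the expected TensorFlow version is importable:
(sim_env)user@host[~/polyjuice-ae]$ python -c "import numpy, tensorflow as tf; print(tf.__version__)"
# should print 1.14.0 (TensorFlow 1.14 may also print deprecation warnings, which are harmless here)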
You can try the following commands to test whether all the dependencies are installed and the code runs successfully. Our evaluation requires building the binaries in each folder; however, we also provide a script that automatically builds the binaries in all folders.
user@host[~]$ cd polyjuice-ae
user@host[~/polyjuice-ae]$ cd ae-tpcc-polyjuice
user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ make dbtest -j
If the build succeeds, you should be able to see output like the following:
g++ -o out-perf.masstree/benchmarks/dbtest out-perf.masstree/benchmarks/dbtest.o out-perf.masstree/allocator.o out-perf.masstree/btree.o out-perf.masstree/core.o out-perf.masstree/counter.o out-perf.masstree/memory.o out-perf.masstree/rcu.o out-perf.masstree/stats_server.o out-perf.masstree/thread.o out-perf.masstree/ticker.o out-perf.masstree/tuple.o out-perf.masstree/txn_btree.o out-perf.masstree/txn.o out-perf.masstree/txn_ic3_impl.o out-perf.masstree/varint.o out-perf.masstree/txn_entry_impl.o out-perf.masstree/policy.o out-perf.masstree/compiler.o out-perf.masstree/str.o out-perf.masstree/string.o out-perf.masstree/straccum.o out-perf.masstree/json.o out-perf.masstree/benchmarks/ldb_wrapper.o out-perf.masstree/benchmarks/bdb_wrapper.o out-perf.masstree/benchmarks/bench.o out-perf.masstree/benchmarks/encstress.o out-perf.masstree/benchmarks/masstree/kvrandom.o out-perf.masstree/benchmarks/queue.o out-perf.masstree/benchmarks/tpcc.o out-perf.masstree/benchmarks/tpce.o out-perf.masstree/benchmarks/micro_badcount.o out-perf.masstree/benchmarks/micro_lock_perf.o out-perf.masstree/benchmarks/micro_ic3_perf.o out-perf.masstree/benchmarks/micro_range.o out-perf.masstree/benchmarks/micro_delete.o out-perf.masstree/benchmarks/micro_insert.o out-perf.masstree/benchmarks/micro_transitive.o out-perf.masstree/benchmarks/micro_transitive2.o out-perf.masstree/benchmarks/micro_mem.o out-perf.masstree/benchmarks/micro_bench.o out-perf.masstree/benchmarks/micro_lock.o out-perf.masstree/benchmarks/ycsb.o out-perf.masstree/benchmarks/smallbank.o third-party/lz4/liblz4.so egen/egenlib/egenlib.a -lpthread -lnuma -lrt -ljemalloc -ldb_cxx -Lthird-party/lz4 -llz4 -Wl,-rpath,/home/jiachen/polyjuice-ae/ae-tpcc-polyjuice/third-party/lz4
Then, try the following command to run Polyjuice on the TPC-C benchmark with 1 warehouse and 1 thread.
user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt
After about 30 seconds, you should be able to see the following output:
RESULT throughput(77511.3),agg_abort_rate(0)
Note: if you see the following error:
error while loading shared libraries: libjemalloc.so.2: cannot open shared object file: No such file or directory
you can:
1. Check the directory /usr/local/lib/; you should see libjemalloc.so.2 under that directory. Otherwise, try to re-install jemalloc (Part 1.2.1).
2. Add /usr/local/lib/ to the loader configuration and refresh the cache:
user@host[~]$ cd /etc/ld.so.conf.d
# if there isn't an other.conf, then create one
user@host[/etc/ld.so.conf.d]$ sudo vim other.conf
# and then append '/usr/local/lib/' to the end of the file
# apply the change
user@host[/etc/ld.so.conf.d]$ sudo /sbin/ldconfig
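After running ldconfig, you can check (our suggestion) whether the loader cache now contains jemalloc:
user@host[/etc/ld.so.conf.d]$ ldconfig -p | grep jemalloc
# should list libjemalloc.so.2 under /usr/local/lib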
If you can successfully run the commands above, then you can run the following command to build all the binaries:
user@host[~/polyjuice-ae]$ ./ae_make_all.sh
We mainly provide the artifact to confirm our performance results, namely Figures 4-5, Figures 7-10, and Table 2 of the evaluation section.
The following gives a detailed description of each folder in ./polyjuice-ae. However, you don't have to remember what each folder is used for; our scripts in Part 2.3 will automatically cd into the corresponding folders before running the experiments.
TPC-C Benchmark
Figure 4: TPC-C Performance and Scalability & Figure 10: Throughput under different workloads, 48 threads
folder | description |
---|---|
ae-tpcc-polyjuice | Polyjuice performance on TPC-C benchmark |
ae-tpcc-ic3 | IC3 performance on TPC-C benchmark |
ae-tpcc-2pl | 2PL performance on TPC-C benchmark |
ae-tpcc-tebaldi | Tebaldi performance on TPC-C benchmark |
ae-tpcc-silo | Silo performance on TPC-C benchmark |
Figure 5: Factor Analysis On TPC-C Benchmark
folder | description |
---|---|
ae-tpcc-polyjuice | Factor analysis, bar 1 & bar 5 |
ae-tpcc-factor-no-dirty-read-public-write | Factor analysis, bar 2 |
ae-tpcc-factor-no-coarse-grained-waiting | Factor analysis, bar 3 |
ae-tpcc-factor-no-fine-grained-waiting | Factor analysis, bar 4 |
Table 2: Latency for each transaction type in TPC-C with 1 warehouse and 48 threads
folder | description |
---|---|
ae-tpcc-polyjuice-latency | Polyjuice latency on TPC-C benchmark |
ae-tpcc-ic3-latency | IC3 latency on TPC-C benchmark |
ae-tpcc-2pl-latency | 2PL latency on TPC-C benchmark |
ae-tpcc-tebaldi-latency | Tebaldi latency on TPC-C benchmark |
ae-tpcc-silo-latency | Silo latency on TPC-C benchmark |
TPC-E Benchmark
Figure 7: TPC-E Performance and Scalability
folder | description |
---|---|
ae-tpce-polyjuice | Polyjuice performance on TPC-E benchmark |
ae-tpce-ic3 | IC3 performance on TPC-E benchmark |
ae-tpce-2pl | 2PL performance on TPC-E benchmark |
ae-tpce-silo | Silo performance on TPC-E benchmark |
Micro Benchmark
Figure 9: Micro-benchmark (10 tx types)
folder | description |
---|---|
ae-micro-polyjuice | Polyjuice performance on micro-benchmark |
ae-micro-ic3 | IC3 performance on micro-benchmark |
ae-micro-2pl | 2PL performance on micro-benchmark |
ae-micro-silo | Silo performance on micro-benchmark |
Training
Figure 8: Training Efficiency
folder | description |
---|---|
ae-tpcc-polyjuice | Polyjuice training on TPC-C benchmark, warmstart-stage/warmstart-no-stage |
ae-tpcc-polyjuice-randomstart | Polyjuice training on TPC-C benchmark, randomstart-stage |
ae-tpcc-polyjuice-rl | RL training on TPC-C benchmark |
All the policies used in the evaluation are saved in the directory ./polyjuice-ae/ae-policy/.
Note that all the scripts in the following part are saved in the directory ./polyjuice-ae, and none of them require any parameters. For example, to get the scalability of TPC-C (Figure 4.(c)), you can simply run:
user@host[~/polyjuice-ae]$ ./ae_tpcc_scalability.sh
Script | Experiment | Corresponding Figure |
---|---|---|
ae_tpcc_performance.sh | TPC-C performance | Figure 4.(a) 4.(b) |
ae_tpcc_scalability.sh | TPC-C scalability | Figure 4.(c) |
ae_tpcc_factor-analysis.sh | Polyjuice factor-analysis performance | Figure 5 |
ae_tpcc_latency.sh | TPC-C latency | Table 2 |
ae_tpce_performance.sh | TPC-E performance | Figure 7.(a) |
ae_tpce_scalability.sh | TPC-E scalability | Figure 7.(b) |
ae_micro_performance.sh | Micro performance | Figure 9 |
ae_tpcc_different_workloads.sh | TPC-C different workloads performance | Figure 10 |
Each experiment above may take more than an hour to run.
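If you want to run all of the performance experiments unattended, a simple wrapper loop like the following works; the loop and the log-file names are our own convention, and only the script names come from the table above:
user@host[~/polyjuice-ae]$ for s in ae_tpcc_performance.sh ae_tpcc_scalability.sh ae_tpcc_factor-analysis.sh ae_tpcc_latency.sh ae_tpce_performance.sh ae_tpce_scalability.sh ae_micro_performance.sh ae_tpcc_different_workloads.sh; do ./$s 2>&1 | tee "${s%.sh}.log"; done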
For Figure 8, we provide two scripts for training. Since the training result is not always stable, the scripts train each setting 5 times, and each training run takes about 6 hours. To save time, we also provide the training logs from our evaluation; you can quickly check these logs with the following scripts.
Note that the following commands need the TensorFlow library, so please ensure that you have activated the virtual environment (see Part 1.2.2) before executing them. If you are on our machine, you can use the following command to switch to the virtual environment:
ae@r743[~]$ . sim_env/bin/activate
Check the training logs:
Script | Experiment | Corresponding Figure |
---|---|---|
ae_tensorboard_staged_training.sh | Staged training efficiency | Figure 8.(a) |
ae_tensorboard_ea_rl.sh | EA v.s. RL efficiency | Figure 8.(b) |
For example, you could try:
(sim_env)user@host[~/polyjuice-ae]$ ./ae_tensorboard_staged_training.sh
to check the training logs in Figure 8.(a). You should be able to see the following output:
TensorBoard 1.14.0 at http://your_host:6006/ (Press CTRL+C to quit)
Then you can access http://your_host:6006/ to see the training logs.
If you are on our machine, you may not be able to access that address. You can clone/copy the log files (~/polyjuice-ae/ae-ea-rl, ~/polyjuice-ae/ae-staged-training) and the scripts to your local machine and try again; this requires installing TensorBoard on your local machine.
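Alternatively (our suggestion, not part of the original instructions), you can keep the logs on our machine and forward TensorBoard's port over SSH, assuming port 6006 is free on your local machine:
user@local[~]$ ssh -L 6006:localhost:6006 -p 2021 ae@202.120.40.82
# then run the ae_tensorboard_*.sh script on our machine and open http://localhost:6006/ in your local browser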
If you want to train the settings again on your own machine, you can run the following commands:
Script | Experiment | Corresponding Figure |
---|---|---|
ae_staged_training.sh | Staged training efficiency | Figure 8.(a) |
ae_ea_rl.sh | EA v.s. RL efficiency | Figure 8.(b) |
It may take 3*5*6 = 90 hours to run the staged training script and 2*5*6 = 60 hours for the other one.
The scripts automatically run all the settings in the corresponding figure, with each setting run 5 times; each run lasts 30 seconds in order to get a stable result.
TPC-C performance
After executing ./ae_tpcc_performance.sh, you should be able to see:
------ Figure 4(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(a,b) Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304165),agg_abort_rate(0.0291986)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(302750),agg_abort_rate(0.0285134)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(291854),agg_abort_rate(0.0292322)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(303681),agg_abort_rate(0.0294943)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(294665),agg_abort_rate(0.0287176)
...
TPC-C scalability
After executing ./ae_tpcc_scalability.sh, you should be able to see:
------ Figure 4(c) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(c) Polyjuice
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77448.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77818.1),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78000.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78087.8),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77787.8),agg_abort_rate(0)
...
Polyjuice factor-analysis performance
After executing ./ae_tpcc_factor-analysis.sh, you should be able to see:
------ Figure 5(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure5(a) high contention [occ policy]
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(69264.8),agg_abort_rate(0.227792)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68717.2),agg_abort_rate(0.233193)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68065.9),agg_abort_rate(0.24509)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68285),agg_abort_rate(0.23205)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(66486.2),agg_abort_rate(0.225003)
...
TPC-C latency
After executing ./ae_tpcc_latency.sh, you should be able to see:
------ Table 2 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c Table 2 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
Latency - microseconds(µs)
new_order_p50 latency - 155
new_order_p90 latency - 187
new_order_p99 latency - 270
payment_p50 latency - 156
payment_p90 latency - 189
payment_p99 latency - 280
delivery_p50 latency - 139
delivery_p90 latency - 409
delivery_p99 latency - 775
...
TPC-E performance
After executing ./ae_tpce_performance.sh, you should be able to see:
------ Figure 7(a) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(a) Polyjuice
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02724e+06),agg_abort_rate(0.061197)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00206e+06),agg_abort_rate(0.0609897)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02596e+06),agg_abort_rate(0.0618323)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02768e+06),agg_abort_rate(0.061071)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00391e+06),agg_abort_rate(0.0615438)
...
TPC-E scalability
After executing ./ae_tpce_scalability.sh, you should be able to see:
------ Figure 7(b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(b) Polyjuice
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33598.3),agg_abort_rate(0.0380638)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33660.7),agg_abort_rate(0.0380675)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(31646),agg_abort_rate(0.0380343)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33687.7),agg_abort_rate(0.0380684)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(30935.8),agg_abort_rate(0.0380866)
...
Micro performance
After executing ./ae_micro_performance.sh, you should be able to see:
------ Figure 9 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
micro-benchmark(10 tx type) figure9 Polyjuice
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(857784),agg_abort_rate(0.0178433)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(828046),agg_abort_rate(0.0178269)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(830430),agg_abort_rate(0.0178426)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(811564),agg_abort_rate(0.0178326)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(831768),agg_abort_rate(0.0179003)
...
TPC-C different workloads performance
After executing ./ae_tpcc_different_workloads.sh, you should be able to see:
------ Figure 10 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure10 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304057),agg_abort_rate(0.0286652)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(289308),agg_abort_rate(0.0280998)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300991),agg_abort_rate(0.0294829)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(284700),agg_abort_rate(0.0282824)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300755),agg_abort_rate(0.0306853)
...
There are two ways to see the training results:
tensorboard --logdir="./"
Experiments related to training:
Experiment | Script | Corresponding Folder | Figure |
---|---|---|---|
polyjuice warmstart-stage | ae_staged_training.sh | ae-tpcc-polyjuice | Figure 8.(a) |
polyjuice warmstart-no-stage | ae_staged_training.sh | ae-tpcc-polyjuice | Figure 8.(a) |
polyjuice randomstart-stage | ae_staged_training.sh | ae-tpcc-polyjuice-randomstart | Figure 8.(a) |
polyjuice EA | ae_ea_rl.sh | ae-tpcc-polyjuice | Figure 8.(b) |
polyjuice RL | ae_ea_rl.sh | ae-tpcc-polyjuice-rl | Figure 8.(b) |
For example, if you run
./ae_staged_training.sh
it will create new folders for saving the training logs under the following two directories:
ae-tpcc-polyjuice/training/saved_model/
ae-tpcc-polyjuice-randomstart/training/saved_model/
After the training, you can cd into the corresponding folders and get the training logs.
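For example (a sketch; the exact sub-folder names created under saved_model/ depend on the training run), you can point TensorBoard at the newly created log directory:
(sim_env)user@host[~/polyjuice-ae]$ cd ae-tpcc-polyjuice/training/saved_model/
(sim_env)user@host[~/polyjuice-ae/ae-tpcc-polyjuice/training/saved_model]$ tensorboard --logdir="./"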
TPC-C/TPC-E/Micro performance
The output should show that Polyjuice outperforms the other baselines under high/moderate contention. Polyjuice's performance is slightly lower than Silo's under low-contention workloads (e.g., TPC-C with 48 threads and 48 warehouses, TPC-E with zipf 1.0, micro-benchmark with zipf 0.4).
TPC-C/TPC-E scalability
The output should show that Polyjuice has at least the same scalability as IC3 and Tebaldi.
TPC-C factor analysis
The output should show that performance gradually improves as more actions are added to the action space.
Training
The output should show that the policy trained with EA performs better than the one trained with RL. As for the comparison between warmstart-stage and warmstart-no-stage, since the training result is not always stable, warmstart-no-stage sometimes learns a policy as good as warmstart-stage. However, warmstart-stage is more stable than warmstart-no-stage across the 5 training runs.