Polyjuice Artifact Evaluation

Part 1. Getting Started

1.1 Access to our machine

We highly recommend evaluating the artifact on our machine, since the reported performance is based on policies learned on dedicated machines. Running on different hardware may yield different performance.

SSH to our machine:

ssh ae@202.120.40.82 -p 2021

password: osdi21ae

After logging in to our machine, you can jump directly to Part 2, since all the binaries are already built there.

Alternatively, you can try the following command to run a simple workload (TPC-C, 1 warehouse, 1 thread):

ae@r743[~]$ cd polyjuice-ae
ae@r743[./polyjuice-ae]$ cd ae-tpcc-polyjuice
ae@r743[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt

After about 30 seconds, you should be able to see the following output:

RESULT throughput(77511.3),agg_abort_rate(0)

1.2 If you want to evaluate on your own machine

Warning: As noted above, training and the learned policies are tied to specific hardware. If you evaluate on your own machine, you may not see the same performance as ours.

1.2.1 Download

The source code is at GitLab. You can download the source code with the following command:

user@host[~]$ git clone https://oauth2:_VyRDDsg1oPy1dTgLgDQ@ipads.se.sjtu.edu.cn:1312/wangjc/polyjuice-ae.git

For each CC and benchmark, we provide a separate folder, so you can easily evaluate different settings. Part 2.1 gives a detailed description of each folder.

1.2.2 Build

On our machine, we use Python 3.6.9 and GCC 7.5.0/8.3.0. We recommend using the same Python and GCC versions as ours; other versions may fail to compile the code.

1.2.2.1 Library Dependency

Our project depends on the libraries listed below; we give the install commands for Ubuntu 18.04 with apt-get.

Library Install Command
libnuma apt-get install libnuma-dev
libdb apt-get install libdb-dev libdb++-dev
libaio apt-get install libaio-dev
libz apt-get install libz-dev
libssl apt-get install libssl-dev

You also need to install jemalloc 5.0.1. In the following we provide step-by-step instructions.
First, download the source code (e.g. jemalloc-5.0.1.tar.gz) from GitHub (https://github.com/jemalloc/jemalloc/releases/tag/5.0.1).

user@host[~]$ tar zxvf jemalloc-5.0.1.tar.gz
user@host[~]$ cd jemalloc-5.0.1

Then, try the following commands to install jemalloc.

user@host[~/jemalloc-5.0.1]$ ./autogen.sh
user@host[~/jemalloc-5.0.1]$ make
user@host[~/jemalloc-5.0.1]$ sudo make install

If you see the error:

install: cannot stat ‘doc/jemalloc.html’: No such file or directory

you can try the following command (https://github.com/jemalloc/jemalloc/issues/231):

user@host[~/jemalloc-5.0.1]$ sudo make install_bin install_include install_lib

After successfully installing jemalloc, you should see the related files under the directories /usr/local/lib, /usr/local/include, and /usr/local/bin.

1.2.2.2 Training Support

The training code runs in a Python 3 environment. Below we describe the setup using a virtual environment, which can be created with either the virtualenv package or the built-in Python 3 venv module.

First set up a virtual environment.

user@host[~]$ python3 -m venv sim_env

Next activate the environment and enter the repository directory.

user@host[~]$ source sim_env/bin/activate
(sim_env)user@host[~]$ cd polyjuice-ae

Install needed libraries.

(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade pip
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade numpy
(sim_env)user@host[~/polyjuice-ae]$ pip install tensorflow==1.14.0     # we use this version specifically
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade tensorboard_logger

1.2.2.3 Build binaries

You can try the following commands to test whether all the dependencies are installed and the code builds successfully. Our evaluation needs the binaries in each folder; we also provide a script (ae_make_all.sh, shown at the end of this part) that automatically builds the binaries in all folders.

user@host[~]$ cd polyjuice-ae
user@host[~/polyjuice-ae]$ cd ae-tpcc-polyjuice
user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ make dbtest -j

If the build succeeds, you should see output like the following:

g++ -o out-perf.masstree/benchmarks/dbtest out-perf.masstree/benchmarks/dbtest.o out-perf.masstree/allocator.o out-perf.masstree/btree.o out-perf.masstree/core.o out-perf.masstree/counter.o out-perf.masstree/memory.o out-perf.masstree/rcu.o out-perf.masstree/stats_server.o out-perf.masstree/thread.o out-perf.masstree/ticker.o out-perf.masstree/tuple.o out-perf.masstree/txn_btree.o out-perf.masstree/txn.o out-perf.masstree/txn_ic3_impl.o out-perf.masstree/varint.o out-perf.masstree/txn_entry_impl.o out-perf.masstree/policy.o out-perf.masstree/compiler.o out-perf.masstree/str.o out-perf.masstree/string.o out-perf.masstree/straccum.o out-perf.masstree/json.o out-perf.masstree/benchmarks/ldb_wrapper.o out-perf.masstree/benchmarks/bdb_wrapper.o out-perf.masstree/benchmarks/bench.o out-perf.masstree/benchmarks/encstress.o out-perf.masstree/benchmarks/masstree/kvrandom.o out-perf.masstree/benchmarks/queue.o out-perf.masstree/benchmarks/tpcc.o out-perf.masstree/benchmarks/tpce.o out-perf.masstree/benchmarks/micro_badcount.o out-perf.masstree/benchmarks/micro_lock_perf.o out-perf.masstree/benchmarks/micro_ic3_perf.o out-perf.masstree/benchmarks/micro_range.o out-perf.masstree/benchmarks/micro_delete.o out-perf.masstree/benchmarks/micro_insert.o out-perf.masstree/benchmarks/micro_transitive.o out-perf.masstree/benchmarks/micro_transitive2.o out-perf.masstree/benchmarks/micro_mem.o out-perf.masstree/benchmarks/micro_bench.o out-perf.masstree/benchmarks/micro_lock.o out-perf.masstree/benchmarks/ycsb.o out-perf.masstree/benchmarks/smallbank.o third-party/lz4/liblz4.so egen/egenlib/egenlib.a -lpthread -lnuma -lrt -ljemalloc -ldb_cxx  -Lthird-party/lz4 -llz4 -Wl,-rpath,/home/jiachen/polyjuice-ae/ae-tpcc-polyjuice/third-party/lz4

Then, try the following command to run Polyjuice on the TPC-C benchmark with 1 warehouse and 1 thread.

user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt

After about 30 seconds, you should be able to see the following output:

RESULT throughput(77511.3),agg_abort_rate(0)

Note: if you see the error:

error while loading shared libraries: libjemalloc.so.2: cannot open shared object file: No such file or directory

you can:

  1. Check the directory /usr/local/lib/; you should see libjemalloc.so.2 there. Otherwise, try re-installing jemalloc (Part 1.2.2.1).
  2. If the library is present, the following commands may help:
user@host[~]$ cd /etc/ld.so.conf.d

# if there isn't other.conf, then create one
user@host[/etc/ld.so.conf.d]$ sudo vim other.conf
# and then append '/usr/local/lib/' to the end of the file

# apply the change
user@host[/etc/ld.so.conf.d]$ sudo /sbin/ldconfig
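Alternatively, a session-local workaround (our own suggestion, not part of the artifact's instructions) is to point the dynamic linker at /usr/local/lib via LD_LIBRARY_PATH; this only affects the current shell and needs no root access.

```shell
# Session-local alternative to editing /etc/ld.so.conf.d: prepend the
# jemalloc install directory to the dynamic linker's search path.
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

Note that this must be re-run in every new shell (or added to your shell profile), whereas the ldconfig route is system-wide and permanent.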

If the commands above run successfully, you can build all the binaries with:

user@host[~/polyjuice-ae]$ ./ae_make_all.sh

Part 2. Step-by-Step instructions

We mainly provide the artifact to confirm our performance results, namely Figures 4-5, Figures 7-10, and Table 2 of the evaluation section.

2.1 Important notes

  1. Most of our experiments will use 48 physical cores. Using fewer cores or hyper-threads might produce different performance results.
  2. To quickly reproduce the results in the paper, we provide the policies learned in our evaluation for Figures 4, 5, 7, 9, 10 and Table 2. For Figure 8, we provide the training logs from our evaluation; we also provide two scripts for retraining Figure 8, but retraining may take more than 5 days.

2.2 Directory structure

The following shows a detailed description of each folder in ./polyjuice-ae. You don't have to remember what each folder is used for: the scripts in Part 2.3 automatically cd into the corresponding folders before running the experiments.

TPC-C Benchmark

Figure 4: TPC-C Performance and Scalability &

Figure 10: Throughput under different workloads, 48 threads

folder description
ae-tpcc-polyjuice Polyjuice performance on TPC-C benchmark
ae-tpcc-ic3 IC3 performance on TPC-C benchmark
ae-tpcc-2pl 2PL performance on TPC-C benchmark
ae-tpcc-tebaldi Tebaldi performance on TPC-C benchmark
ae-tpcc-silo Silo performance on TPC-C benchmark

Figure 5: Factor Analysis On TPC-C Benchmark

folder description
ae-tpcc-polyjuice Factor analysis, bar 1 & bar 5
ae-tpcc-factor-no-dirty-read-public-write Factor analysis, bar 2
ae-tpcc-factor-no-coarse-grained-waiting Factor analysis, bar 3
ae-tpcc-factor-no-fine-grained-waiting Factor analysis, bar 4

Table 2: Latency for each transaction type in TPC-C with 1 warehouse and 48 threads

folder description
ae-tpcc-polyjuice-latency Polyjuice latency on TPC-C benchmark
ae-tpcc-ic3-latency IC3 latency on TPC-C benchmark
ae-tpcc-2pl-latency 2PL latency on TPC-C benchmark
ae-tpcc-tebaldi-latency Tebaldi latency on TPC-C benchmark
ae-tpcc-silo-latency Silo latency on TPC-C benchmark


TPC-E Benchmark

Figure 7: TPC-E Performance and Scalability

folder description
ae-tpce-polyjuice Polyjuice performance on TPC-E benchmark
ae-tpce-ic3 IC3 performance on TPC-E benchmark
ae-tpce-2pl 2PL performance on TPC-E benchmark
ae-tpce-silo Silo performance on TPC-E benchmark


Micro Benchmark

Figure 9: Micro-benchmark (10 tx types)

folder description
ae-micro-polyjuice Polyjuice performance on the micro-benchmark
ae-micro-ic3 IC3 performance on the micro-benchmark
ae-micro-2pl 2PL performance on the micro-benchmark
ae-micro-silo Silo performance on the micro-benchmark


Training

Figure 8: Training Efficiency

folder description
ae-tpcc-polyjuice Polyjuice training on TPC-C benchmark, warmstart-stage/warmstart-no-stage
ae-tpcc-polyjuice-randomstart Polyjuice training on TPC-C benchmark, randomstart-stage
ae-tpcc-polyjuice-rl RL training on TPC-C benchmark

2.3 Run Experiments

All the policies used in the evaluation are saved in the directory ./polyjuice-ae/ae-policy/.

Note that all the scripts in the following part are saved in the directory ./polyjuice-ae, and none of them require any parameters. For example, to get the scalability results for TPC-C (Figure 4.(c)), simply run:

user@host[~/polyjuice-ae]$ ./ae_tpcc_scalability.sh

script Experiment Corresponding Figure
ae_tpcc_performance.sh TPC-C performance Figure 4.(a) 4.(b)
ae_tpcc_scalability.sh TPC-C scalability Figure 4.(c)
ae_tpcc_factor-analysis.sh Polyjuice factor-analysis performance Figure 5
ae_tpcc_latency.sh TPC-C latency Table 2
ae_tpce_performance.sh TPC-E performance Figure 7.(a)
ae_tpce_scalability.sh TPC-E scalability Figure 7.(b)
ae_micro_performance.sh Micro performance Figure 9
ae_tpcc_different_workloads.sh TPC-C different workloads performance Figure 10

Each experiment above may take an hour or more.

For Figure 8, we provide two scripts for training. Since training results are not stable, each setting is trained 5 times, and each training run takes about 6 hours. To save time, we also provide the training logs from our evaluation; you can quickly inspect them with the following scripts.

Note that the following commands require the tensorflow library; please ensure that you have activated the virtual environment (see Part 1.2.2.2) before executing them. If you are on our machine, you can switch to the virtual environment with:

ae@r743[~]$ . sim_env/bin/activate

Check the training logs:

script Experiment Corresponding Figure
ae_tensorboard_staged_training.sh Staged training efficiency Figure 8.(a)
ae_tensorboard_ea_rl.sh EA vs. RL efficiency Figure 8.(b)

For example, you could try:

(sim_env)user@host[~/polyjuice-ae]$ ./ae_tensorboard_staged_training.sh

to check the training logs in Figure 8.(a). You should be able to see the following output:

TensorBoard 1.14.0 at http://your_host:6006/ (Press CTRL+C to quit)

Then you can access http://your_host:6006/ to see the training logs.

If you are on our machine, you may not be able to access that address remotely. In that case, copy the log files (~/polyjuice-ae/ae-ea-rl, ~/polyjuice-ae/ae-staged-training) and the scripts to your local machine and try again; this requires tensorboard installed on your local machine.

If you want to train the settings again on your own machine, you can run the following commands:

script Experiment Corresponding Figure
ae_staged_training.sh Staged training efficiency Figure 8.(a)
ae_ea_rl.sh EA vs. RL efficiency Figure 8.(b)

The staged training script may take 3 settings * 5 runs * 6 hours = 90 hours, and the EA-vs-RL script 2 * 5 * 6 = 60 hours.
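Given these running times, it may be safer to launch a retraining script in the background so it survives an SSH disconnect. This is our own suggestion, not part of the artifact's instructions; the log-file name is a hypothetical choice.

```shell
# Launch the multi-day training script detached from the terminal;
# staged_training.log is an arbitrary name, pick any path you like.
nohup ./ae_staged_training.sh > staged_training.log 2>&1 &
echo "started in background, logging to staged_training.log"
```

You can then watch progress with `tail -f staged_training.log` and safely close the SSH session.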

2.4 Output format

2.4.1 Performance output

The scripts automatically run all the settings in the corresponding figure, each setting 5 times. To get stable results, each run lasts 30 seconds.
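Since each setting prints five raw RESULT lines, a small post-processing sketch (our own, not part of the artifact) can average the throughput numbers. The here-doc below uses the TPC-C sample numbers from this section; in practice you would pipe the script output instead.

```shell
# Split each RESULT line on parentheses so field 2 is the throughput,
# then average over all matching lines.
avg=$(awk -F'[()]' '/^RESULT/ { sum += $2; n++ } END { printf "%.0f", sum / n }' <<'EOF'
RESULT throughput(304165),agg_abort_rate(0.0291986)
RESULT throughput(302750),agg_abort_rate(0.0285134)
RESULT throughput(291854),agg_abort_rate(0.0292322)
RESULT throughput(303681),agg_abort_rate(0.0294943)
RESULT throughput(294665),agg_abort_rate(0.0287176)
EOF
)
echo "average throughput: $avg"   # prints: average throughput: 299423
```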

TPC-C performance
After executing ./ae_tpcc_performance.sh, you should be able to see:

------ Figure 4(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(a,b) Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304165),agg_abort_rate(0.0291986)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(302750),agg_abort_rate(0.0285134)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(291854),agg_abort_rate(0.0292322)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(303681),agg_abort_rate(0.0294943)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(294665),agg_abort_rate(0.0287176)
...

TPC-C scalability
After executing ./ae_tpcc_scalability.sh, you should be able to see:

------ Figure 4(c) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(c) Polyjuice
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77448.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77818.1),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78000.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78087.8),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77787.8),agg_abort_rate(0)
...

Polyjuice factor-analysis performance
After executing ./ae_tpcc_factor-analysis.sh, you should be able to see:

------ Figure 5(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure5(a) high contention [occ policy]
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(69264.8),agg_abort_rate(0.227792)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68717.2),agg_abort_rate(0.233193)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68065.9),agg_abort_rate(0.24509)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68285),agg_abort_rate(0.23205)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(66486.2),agg_abort_rate(0.225003)
...

TPC-C latency
After executing ./ae_tpcc_latency.sh, you should be able to see:

------ Table 2 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c Table 2 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
Latency - microseconds(µs)
new_order_p50 latency - 155
new_order_p90 latency - 187
new_order_p99 latency - 270
payment_p50 latency - 156
payment_p90 latency - 189
payment_p99 latency - 280
delivery_p50 latency - 139
delivery_p90 latency - 409
delivery_p99 latency - 775
...

TPC-E performance
After executing ./ae_tpce_performance.sh, you should be able to see:

------ Figure 7(a) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(a) Polyjuice
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02724e+06),agg_abort_rate(0.061197)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00206e+06),agg_abort_rate(0.0609897)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02596e+06),agg_abort_rate(0.0618323)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02768e+06),agg_abort_rate(0.061071)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00391e+06),agg_abort_rate(0.0615438)
...

TPC-E scalability
After executing ./ae_tpce_scalability.sh, you should be able to see:

------ Figure 7(b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(b) Polyjuice
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33598.3),agg_abort_rate(0.0380638)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33660.7),agg_abort_rate(0.0380675)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(31646),agg_abort_rate(0.0380343)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33687.7),agg_abort_rate(0.0380684)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(30935.8),agg_abort_rate(0.0380866)
...

Micro performance
After executing ./ae_micro_performance.sh, you should be able to see:

------ Figure 9 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
micro-benchmark(10 tx type) figure9 Polyjuice
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(857784),agg_abort_rate(0.0178433)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(828046),agg_abort_rate(0.0178269)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(830430),agg_abort_rate(0.0178426)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(811564),agg_abort_rate(0.0178326)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(831768),agg_abort_rate(0.0179003)
...

TPC-C different workloads performance
After executing ./ae_tpcc_different_workloads.sh, you should be able to see:

------ Figure 10 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure10 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304057),agg_abort_rate(0.0286652)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(289308),agg_abort_rate(0.0280998)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300991),agg_abort_rate(0.0294829)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(284700),agg_abort_rate(0.0282824)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300755),agg_abort_rate(0.0306853)
...

2.4.2 Training Output

There are two ways to see the training results:

  1. The training information is printed directly to the terminal, including each policy's performance in every training iteration.
  2. Each training process creates a folder to store results under corresponding_folder/training/saved_model, named by the experiment name and a timestamp.
    You can collect the result folders together and use tensorboard to view the training curves visually, as in the paper.
tensorboard --logdir="./"
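Collecting the result folders can be sketched as below (our own convenience snippet, not part of the artifact; the all_logs directory name is a hypothetical choice):

```shell
# Gather each run's saved_model subfolder into one directory so a single
# `tensorboard --logdir` call shows every training curve together.
mkdir -p all_logs
for run in ae-tpcc-polyjuice/training/saved_model/*; do
  # skip silently when the glob matches nothing (e.g. before any training run)
  if [ -d "$run" ]; then cp -r "$run" all_logs/; fi
done
# then: tensorboard --logdir=all_logs
```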

Experiments related to training:

Experiment Script Corresponding Folder Figure
polyjuice warmstart-stage ae_staged_training.sh ae-tpcc-polyjuice Figure 8.(a)
polyjuice warmstart-no-stage ae_staged_training.sh ae-tpcc-polyjuice Figure 8.(a)
polyjuice randomstart-stage ae_staged_training.sh ae-tpcc-polyjuice-randomstart Figure 8.(a)
polyjuice EA ae_ea_rl.sh ae-tpcc-polyjuice Figure 8.(b)
polyjuice RL ae_ea_rl.sh ae-tpcc-polyjuice-rl Figure 8.(b)

For example, if you run

./ae_staged_training.sh

it will create new folders for saving the training logs under the following two directories:

ae-tpcc-polyjuice/training/saved_model/
ae-tpcc-polyjuice-randomstart/training/saved_model/

After training, you can cd into the corresponding folders to retrieve the training logs.

2.5 Verify the performance

TPC-C/TPC-E/Micro performance
The output should show that Polyjuice outperforms the other baselines under high/moderate contention. Polyjuice's performance is slightly lower than Silo's under low-contention workloads (e.g., TPC-C with 48 threads and 48 warehouses, TPC-E with zipf 1.0, micro-benchmark with zipf 0.4).

TPC-C/TPC-E scalability
The output should show that Polyjuice has at least the same scalability as IC3 and Tebaldi.

TPC-C factor analysis
The output should show that performance gradually improves as more actions are added to the action space.

Training
The output should show that EA training achieves better performance than RL. As for the comparison between warmstart-stage and warmstart-no-stage: since training results are not always stable, warmstart-no-stage sometimes learns a policy as good as warmstart-stage's. However, across 5 training runs, warmstart-stage is more stable than warmstart-no-stage.