We highly recommend evaluating the artifact on our machine, since the reported performance is based on policies learned on that dedicated hardware. Running on different hardware may yield different performance.
SSH to our machine:
ssh ae@202.120.40.82 -p 2021
password: osdi21ae
After logging in to our machine, you can jump directly to Part 2, since all the binaries are already prepared there.
Alternatively, you can try the following command to run a simple workload (TPC-C, 1 warehouse, 1 thread):
ae@r743[~]$ cd polyjuice-ae
ae@r743[./polyjuice-ae]$ cd ae-tpcc-polyjuice
ae@r743[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt
After about 30 seconds, you should be able to see the following output:
RESULT throughput(77511.3),agg_abort_rate(0)
Warning: As noted above, training and the learned policies are tied to specific hardware. If you evaluate on your own machine, you may not see the same performance as ours.
The source code is hosted on GitLab. You can download it with the following command:
user@host[~]$ git clone https://oauth2:_VyRDDsg1oPy1dTgLgDQ@ipads.se.sjtu.edu.cn:1312/wangjc/polyjuice-ae.git
For each concurrency control (CC) scheme and benchmark, we provide a separate folder so you can easily evaluate different settings. Part 2.1 gives a detailed description of each folder.
On our machine, we use Python 3.6.9 and GCC 7.5.0/8.3.0. We recommend using the same Python and GCC versions as ours; other versions may fail to compile the code.
Our project depends on the libraries listed below; we give the install commands for Ubuntu 18.04 with apt-get.
Library | Install Command |
---|---|
libnuma | apt-get install libnuma-dev |
libdb | apt-get install libdb-dev libdb++-dev |
libaio | apt-get install libaio-dev |
libz | apt-get install libz-dev |
libssl | apt-get install libssl-dev |
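If you prefer a single command, the packages above can be installed together. This combined invocation is just a convenience sketch of the table, assuming Ubuntu 18.04 with apt-get:
user@host[~]$ sudo apt-get install libnuma-dev libdb-dev libdb++-dev libaio-dev libz-dev libssl-dev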
The project also requires jemalloc 5.0.1. In the following we provide step-by-step instructions to help you install it.
First, download the source code (e.g. jemalloc-5.0.1.tar.gz) from GitHub (https://github.com/jemalloc/jemalloc/releases/tag/5.0.1).
user@host[~]$ tar zxvf jemalloc-5.0.1.tar.gz
user@host[~]$ cd jemalloc-5.0.1
Then, try the following commands to install jemalloc:
user@host[~/jemalloc-5.0.1]$ ./autogen.sh
user@host[~/jemalloc-5.0.1]$ make
user@host[~/jemalloc-5.0.1]$ sudo make install
If you see the error:
install: cannot stat ‘doc/jemalloc.html’: No such file or directory
you can try the following command (https://github.com/jemalloc/jemalloc/issues/231):
user@host[~/jemalloc-5.0.1]$ sudo make install_bin install_include install_lib
After successfully installing jemalloc, you should be able to see the related files under the directories /usr/local/lib, /usr/local/include, and /usr/local/bin.
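To double-check the installation (our suggestion, not part of the original steps), you can list the installed files; the exact file set may vary slightly by platform:
user@host[~]$ ls /usr/local/lib/libjemalloc* /usr/local/include/jemalloc /usr/local/bin/
# you should see libjemalloc.so.2 among the libraries, the jemalloc/ header directory, and tools such as jemalloc-config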
The training code is meant to run in a Python 3 environment. Below we describe the setup using a Python virtual environment, which can be created with either the virtualenv package or the Python 3 venv module.
First set up a virtual environment.
user@host[~]$ python3 -m venv sim_env
Next activate the environment and enter the repository directory.
user@host[~]$ source sim_env/bin/activate
(sim_env)user@host[~]$ cd polyjuice-ae
Install needed libraries.
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade pip
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade numpy
(sim_env)user@host[~/polyjuice-ae]$ pip install tensorflow==1.14.0 # we use this version specifically
(sim_env)user@host[~/polyjuice-ae]$ pip install --upgrade tensorboard_logger
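As a quick sanity check (our suggestion, not part of the original setup), you can verify inside the virtual environment that the expected TensorFlow version is importable:
(sim_env)user@host[~/polyjuice-ae]$ python -c "import numpy, tensorflow as tf; print(tf.__version__)"
# should print 1.14.0 (TensorFlow 1.14 may also print deprecation warnings, which are harmless here)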
You can try the following commands to test whether all the dependencies are installed and the code runs successfully. Our evaluation requires building the binaries in each folder; however, we also provide a script that automatically builds the binaries in all folders.
user@host[~]$ cd polyjuice-ae
user@host[~/polyjuice-ae]$ cd ae-tpcc-polyjuice
user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ make dbtest -j
If the build succeeds, you should be able to see output like the following:
g++ -o out-perf.masstree/benchmarks/dbtest out-perf.masstree/benchmarks/dbtest.o out-perf.masstree/allocator.o out-perf.masstree/btree.o out-perf.masstree/core.o out-perf.masstree/counter.o out-perf.masstree/memory.o out-perf.masstree/rcu.o out-perf.masstree/stats_server.o out-perf.masstree/thread.o out-perf.masstree/ticker.o out-perf.masstree/tuple.o out-perf.masstree/txn_btree.o out-perf.masstree/txn.o out-perf.masstree/txn_ic3_impl.o out-perf.masstree/varint.o out-perf.masstree/txn_entry_impl.o out-perf.masstree/policy.o out-perf.masstree/compiler.o out-perf.masstree/str.o out-perf.masstree/string.o out-perf.masstree/straccum.o out-perf.masstree/json.o out-perf.masstree/benchmarks/ldb_wrapper.o out-perf.masstree/benchmarks/bdb_wrapper.o out-perf.masstree/benchmarks/bench.o out-perf.masstree/benchmarks/encstress.o out-perf.masstree/benchmarks/masstree/kvrandom.o out-perf.masstree/benchmarks/queue.o out-perf.masstree/benchmarks/tpcc.o out-perf.masstree/benchmarks/tpce.o out-perf.masstree/benchmarks/micro_badcount.o out-perf.masstree/benchmarks/micro_lock_perf.o out-perf.masstree/benchmarks/micro_ic3_perf.o out-perf.masstree/benchmarks/micro_range.o out-perf.masstree/benchmarks/micro_delete.o out-perf.masstree/benchmarks/micro_insert.o out-perf.masstree/benchmarks/micro_transitive.o out-perf.masstree/benchmarks/micro_transitive2.o out-perf.masstree/benchmarks/micro_mem.o out-perf.masstree/benchmarks/micro_bench.o out-perf.masstree/benchmarks/micro_lock.o out-perf.masstree/benchmarks/ycsb.o out-perf.masstree/benchmarks/smallbank.o third-party/lz4/liblz4.so egen/egenlib/egenlib.a -lpthread -lnuma -lrt -ljemalloc -ldb_cxx -Lthird-party/lz4 -llz4 -Wl,-rpath,/home/jiachen/polyjuice-ae/ae-tpcc-polyjuice/third-party/lz4
Then, try the following command to run Polyjuice on the TPC-C benchmark with 1 warehouse and 1 thread.
user@host[~/polyjuice-ae/ae-tpcc-polyjuice]$ ./out-perf.masstree/benchmarks/dbtest --bench tpcc --parallel-loading --retry-aborted-transactions --bench-opt "--workload-mix 45,43,4,4,4" --db-type ndb-ic3 --backoff-aborted-transactions --runtime 30 --num-threads 1 --scale-factor 1 --policy training/input-RL-occ-tpcc.txt
After about 30 seconds, you should be able to see the following output:
RESULT throughput(77511.3),agg_abort_rate(0)
Note: if you see the following error:
error while loading shared libraries: libjemalloc.so.2: cannot open shared object file: No such file or directory
you can:
1. Check the directory /usr/local/lib/; you should see libjemalloc.so.2 under that directory. Otherwise, try to re-install jemalloc (Part 1.2.1).
2. Add /usr/local/lib/ to the loader configuration and refresh the cache:
user@host[~]$ cd /etc/ld.so.conf.d
# if there isn't an other.conf, then create one
user@host[/etc/ld.so.conf.d]$ sudo vim other.conf
# and then append '/usr/local/lib/' to the end of the file
# apply the change
user@host[/etc/ld.so.conf.d]$ sudo /sbin/ldconfig
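After running ldconfig, you can check (our suggestion) whether the loader cache now contains jemalloc:
user@host[/etc/ld.so.conf.d]$ ldconfig -p | grep jemalloc
# should list libjemalloc.so.2 under /usr/local/lib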
If you can successfully run the commands above, then you can run the following command to build all the binaries:
user@host[~/polyjuice-ae]$ ./ae_make_all.sh
We mainly provide the artifact to confirm our performance results, namely Figures 4-5, Figures 7-10, and Table 2 of the evaluation section.
The following gives a detailed description of each folder in ./polyjuice-ae. However, you don't have to remember what each folder is used for; our scripts in Part 2.3 will automatically cd into the corresponding folders before running the experiments.
TPC-C Benchmark
Figure 4: TPC-C Performance and Scalability & Figure 10: Throughput under different workloads, 48 threads
folder | description |
---|---|
ae-tpcc-polyjuice | Polyjuice performance on TPC-C benchmark |
ae-tpcc-ic3 | IC3 performance on TPC-C benchmark |
ae-tpcc-2pl | 2PL performance on TPC-C benchmark |
ae-tpcc-tebaldi | Tebaldi performance on TPC-C benchmark |
ae-tpcc-silo | Silo performance on TPC-C benchmark |
Figure 5: Factor Analysis On TPC-C Benchmark
folder | description |
---|---|
ae-tpcc-polyjuice | Factor analysis, bar 1 & bar 5 |
ae-tpcc-factor-no-dirty-read-public-write | Factor analysis, bar 2 |
ae-tpcc-factor-no-coarse-grained-waiting | Factor analysis, bar 3 |
ae-tpcc-factor-no-fine-grained-waiting | Factor analysis, bar 4 |
Table 2: Latency for each transaction type in TPC-C with 1 warehouse and 48 threads
folder | description |
---|---|
ae-tpcc-polyjuice-latency | Polyjuice latency on TPC-C benchmark |
ae-tpcc-ic3-latency | IC3 latency on TPC-C benchmark |
ae-tpcc-2pl-latency | 2PL latency on TPC-C benchmark |
ae-tpcc-tebaldi-latency | Tebaldi latency on TPC-C benchmark |
ae-tpcc-silo-latency | Silo latency on TPC-C benchmark |
TPC-E Benchmark
Figure 7: TPC-E Performance and Scalability
folder | description |
---|---|
ae-tpce-polyjuice | Polyjuice performance on TPC-E benchmark |
ae-tpce-ic3 | IC3 performance on TPC-E benchmark |
ae-tpce-2pl | 2PL performance on TPC-E benchmark |
ae-tpce-silo | Silo performance on TPC-E benchmark |
Micro Benchmark
Figure 9: Micro-benchmark (10 tx types)
folder | description |
---|---|
ae-micro-polyjuice | Polyjuice performance on micro-benchmark |
ae-micro-ic3 | IC3 performance on micro-benchmark |
ae-micro-2pl | 2PL performance on micro-benchmark |
ae-micro-silo | Silo performance on micro-benchmark |
Training
Figure 8: Training Efficiency
folder | description |
---|---|
ae-tpcc-polyjuice | Polyjuice training on TPC-C benchmark, warmstart-stage/warmstart-no-stage |
ae-tpcc-polyjuice-randomstart | Polyjuice training on TPC-C benchmark, randomstart-stage |
ae-tpcc-polyjuice-rl | RL training on TPC-C benchmark |
All the policies used in the evaluation are saved in the directory ./polyjuice-ae/ae-policy/.
Note that all the scripts in the following part are saved in the directory ./polyjuice-ae, and none of them require any parameters. For example, to get the scalability of TPC-C (Figure 4.(c)), you can simply run:
user@host[~/polyjuice-ae]$ ./ae_tpcc_scalability.sh
Script | Experiment | Corresponding Figure |
---|---|---|
ae_tpcc_performance.sh | TPC-C performance | Figure 4.(a) 4.(b) |
ae_tpcc_scalability.sh | TPC-C scalability | Figure 4.(c) |
ae_tpcc_factor-analysis.sh | Polyjuice factor-analysis performance | Figure 5 |
ae_tpcc_latency.sh | TPC-C latency | Table 2 |
ae_tpce_performance.sh | TPC-E performance | Figure 7.(a) |
ae_tpce_scalability.sh | TPC-E scalability | Figure 7.(b) |
ae_micro_performance.sh | Micro performance | Figure 9 |
ae_tpcc_different_workloads.sh | TPC-C different workloads performance | Figure 10 |
Each experiment above may take more than an hour to run.
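If you want to run all of the performance experiments unattended, a simple wrapper loop like the following works; the loop and the log-file names are our own convention, and only the script names come from the table above:
user@host[~/polyjuice-ae]$ for s in ae_tpcc_performance.sh ae_tpcc_scalability.sh ae_tpcc_factor-analysis.sh ae_tpcc_latency.sh ae_tpce_performance.sh ae_tpce_scalability.sh ae_micro_performance.sh ae_tpcc_different_workloads.sh; do ./$s 2>&1 | tee "${s%.sh}.log"; done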
For Figure 8, we provide two scripts for training. Since the training result is not always stable, the scripts train each setting 5 times, and each training run takes about 6 hours. To save time, we also provide the training logs from our evaluation; you can quickly check these logs with the following scripts.
Note that the following commands need the TensorFlow library, so please ensure that you have activated the virtual environment (see Part 1.2.2) before executing them. If you are on our machine, you can use the following command to switch to the virtual environment:
ae@r743[~]$ . sim_env/bin/activate
Check the training logs:
Script | Experiment | Corresponding Figure |
---|---|---|
ae_tensorboard_staged_training.sh | Staged training efficiency | Figure 8.(a) |
ae_tensorboard_ea_rl.sh | EA v.s. RL efficiency | Figure 8.(b) |
For example, you could try:
(sim_env)user@host[~/polyjuice-ae]$ ./ae_tensorboard_staged_training.sh
to check the training logs in Figure 8.(a). You should be able to see the following output:
TensorBoard 1.14.0 at http://your_host:6006/ (Press CTRL+C to quit)
Then you can access http://your_host:6006/ to see the training logs.
If you are on our machine, you may not be able to access that address. You can clone/copy the log files (~/polyjuice-ae/ae-ea-rl, ~/polyjuice-ae/ae-staged-training) and the scripts to your local machine and try again; this requires installing TensorBoard on your local machine.
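Alternatively (our suggestion, not part of the original instructions), you can keep the logs on our machine and forward TensorBoard's port over SSH, assuming port 6006 is free on your local machine:
user@local[~]$ ssh -L 6006:localhost:6006 -p 2021 ae@202.120.40.82
# then run the ae_tensorboard_*.sh script on our machine and open http://localhost:6006/ in your local browser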
If you want to train the settings again on your own machine, you can run the following commands:
Script | Experiment | Corresponding Figure |
---|---|---|
ae_staged_training.sh | Staged training efficiency | Figure 8.(a) |
ae_ea_rl.sh | EA v.s. RL efficiency | Figure 8.(b) |
It may take 3*5*6 = 90 hours to run the staged training script and 2*5*6 = 60 hours for the other one.
The scripts automatically run all the settings in the corresponding figure, with each setting run 5 times; each run lasts 30 seconds in order to get a stable result.
TPC-C performance
After executing ./ae_tpcc_performance.sh, you should be able to see:
------ Figure 4(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(a,b) Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304165),agg_abort_rate(0.0291986)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(302750),agg_abort_rate(0.0285134)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(291854),agg_abort_rate(0.0292322)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(303681),agg_abort_rate(0.0294943)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(294665),agg_abort_rate(0.0287176)
...
TPC-C scalability
After executing ./ae_tpcc_scalability.sh, you should be able to see:
------ Figure 4(c) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure4(c) Polyjuice
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77448.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77818.1),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78000.4),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(78087.8),agg_abort_rate(0)
tpc-c. cc=Polyjuice. num-threads=1, warehouse-num=1
RESULT throughput(77787.8),agg_abort_rate(0)
...
Polyjuice factor-analysis performance
After executing ./ae_tpcc_factor-analysis.sh, you should be able to see:
------ Figure 5(a,b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure5(a) high contention [occ policy]
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(69264.8),agg_abort_rate(0.227792)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68717.2),agg_abort_rate(0.233193)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68065.9),agg_abort_rate(0.24509)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(68285),agg_abort_rate(0.23205)
tpc-c. setting=[occ policy]. num-threads=48, warehouse-num=1
RESULT throughput(66486.2),agg_abort_rate(0.225003)
...
TPC-C latency
After executing ./ae_tpcc_latency.sh, you should be able to see:
------ Table 2 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c Table 2 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
Latency - microseconds(µs)
new_order_p50 latency - 155
new_order_p90 latency - 187
new_order_p99 latency - 270
payment_p50 latency - 156
payment_p90 latency - 189
payment_p99 latency - 280
delivery_p50 latency - 139
delivery_p90 latency - 409
delivery_p99 latency - 775
...
TPC-E performance
After executing ./ae_tpce_performance.sh, you should be able to see:
------ Figure 7(a) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(a) Polyjuice
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02724e+06),agg_abort_rate(0.061197)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00206e+06),agg_abort_rate(0.0609897)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02596e+06),agg_abort_rate(0.0618323)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.02768e+06),agg_abort_rate(0.061071)
tpc-e. cc=Polyjuice. num-threads=48, zipf_theta=1.0
RESULT throughput(1.00391e+06),agg_abort_rate(0.0615438)
...
TPC-E scalability
After executing ./ae_tpce_scalability.sh, you should be able to see:
------ Figure 7(b) Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-e figure7(b) Polyjuice
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33598.3),agg_abort_rate(0.0380638)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33660.7),agg_abort_rate(0.0380675)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(31646),agg_abort_rate(0.0380343)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(33687.7),agg_abort_rate(0.0380684)
tpc-e. cc=Polyjuice. num-threads=1, zipf_theta=3.0
RESULT throughput(30935.8),agg_abort_rate(0.0380866)
...
Micro performance
After executing ./ae_micro_performance.sh, you should be able to see:
------ Figure 9 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
micro-benchmark(10 tx type) figure9 Polyjuice
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(857784),agg_abort_rate(0.0178433)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(828046),agg_abort_rate(0.0178269)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(830430),agg_abort_rate(0.0178426)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(811564),agg_abort_rate(0.0178326)
micro-benchmark(10 tx type). cc=Polyjuice. num-threads=48, zipf_theta=0.4
RESULT throughput(831768),agg_abort_rate(0.0179003)
...
TPC-C different workloads performance
After executing ./ae_tpcc_different_workloads.sh, you should be able to see:
------ Figure 10 Evaluation Start ------
make: Nothing to be done for 'dbtest'.
------ Make Done ------
tpc-c figure10 Polyjuice
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(304057),agg_abort_rate(0.0286652)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(289308),agg_abort_rate(0.0280998)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300991),agg_abort_rate(0.0294829)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(284700),agg_abort_rate(0.0282824)
tpc-c. cc=Polyjuice. num-threads=48, warehouse-num=1
RESULT throughput(300755),agg_abort_rate(0.0306853)
...
There are two ways to see the training results:
tensorboard --logdir="./"
Experiments related to training:
Experiment | Script | Corresponding Folder | Figure |
---|---|---|---|
polyjuice warmstart-stage | ae_staged_training.sh | ae-tpcc-polyjuice | Figure 8.(a) |
polyjuice warmstart-no-stage | ae_staged_training.sh | ae-tpcc-polyjuice | Figure 8.(a) |
polyjuice randomstart-stage | ae_staged_training.sh | ae-tpcc-polyjuice-randomstart | Figure 8.(a) |
polyjuice EA | ae_ea_rl.sh | ae-tpcc-polyjuice | Figure 8.(b) |
polyjuice RL | ae_ea_rl.sh | ae-tpcc-polyjuice-rl | Figure 8.(b) |
For example, if you run
./ae_staged_training.sh
it will create new folders for saving the training logs under the following two directories:
ae-tpcc-polyjuice/training/saved_model/
ae-tpcc-polyjuice-randomstart/training/saved_model/
After the training, you can cd into the corresponding folders and get the training logs.
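For example (a sketch; the exact sub-folder names created under saved_model/ depend on the training run), you can point TensorBoard at the newly created log directory:
(sim_env)user@host[~/polyjuice-ae]$ cd ae-tpcc-polyjuice/training/saved_model/
(sim_env)user@host[~/polyjuice-ae/ae-tpcc-polyjuice/training/saved_model]$ tensorboard --logdir="./"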
TPC-C/TPC-E/Micro performance
The output should show that Polyjuice outperforms the other baselines under high/moderate contention. Polyjuice's performance is slightly lower than Silo's under low-contention workloads (e.g., TPC-C with 48 threads and 48 warehouses, TPC-E with zipf 1.0, micro-benchmark with zipf 0.4).
TPC-C/TPC-E scalability
The output should show that Polyjuice has at least the same scalability as IC3 and Tebaldi.
TPC-C factor analysis
The output should show that performance gradually improves as more actions are added to the action space.
Training
The output should show that the policy trained with EA performs better than the one trained with RL. As for the comparison between warmstart-stage and warmstart-no-stage, since the training result is not always stable, warmstart-no-stage sometimes learns a policy as good as warmstart-stage. However, warmstart-stage is more stable than warmstart-no-stage across the 5 training runs.