DUPChecker analyzes data syntax defined using standard serialization libraries and detects incompatibilities across versions, which can lead to upgrade failures. It focuses on two widely adopted serialization libraries, Protocol Buffers and Apache Thrift.
Protocols evolve over time. Developers can update any protocol to meet the program's needs. However, certain rules have to be followed to avoid data-syntax incompatibility across versions. In particular, the manuals of Protocol Buffers and Apache Thrift both state the following rules:
(1). A required field has been added or deleted.
(2). The tag number of a field has been changed.
(3). A required field has been changed to non-required.
(4). An enum with no 0 value has been added or deleted.
Violating either of the first two rules will definitely lead to upgrade failures caused by syntax incompatibility; DUPChecker reports such violations as ERROR. Violating the third rule may lead to failures if the new version generates data that does not contain the no-longer-required data member; DUPChecker reports these as WARNING. For other types of changes, such as changing a field's type, DUPChecker outputs INFO-level information.
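As an illustration of rules (2) and (4), here is a hypothetical pair of schema versions (the file names and message definitions below are made up for this sketch; they are not taken from DUPChecker or the checked applications):

```shell
# Write two hypothetical versions of a protobuf schema whose differences
# violate the rules above.
cat > user_v1.proto <<'EOF'
syntax = "proto2";
message User {
  required string name = 1;
  optional int32 age = 2;
}
EOF

cat > user_v2.proto <<'EOF'
syntax = "proto2";
message User {
  required string name = 3;  // tag changed from 1 to 3: rule (2), ERROR
  optional int32 age = 2;
}
enum Status {                // added enum with no 0 value: rule (4)
  ACTIVE = 1;
  DELETED = 2;
}
EOF
```

Data serialized by v1 stores the name under tag 1; a v2 reader looks for it under tag 3, so the field is silently lost or misread.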
Prerequisite: In Python 3, install javalang, numpy, and pyparsing with:
$pip3 install javalang numpy pyparsing
Check out DUPChecker to your local machine.
$git clone https://github.com/jwjwyoung/DUPChecker.git
Prepare the application whose consistency you would like to check on the same machine; suppose its path is path_app.
Run the script:
python3 checker.py --app path_app --filetype --v1 old_version_tag --v2 new_version_tag
where --filetype is either --proto or --thrift.
e.g., to check proto files:
python3 checker.py --app hbase --proto --v1 rel/2.2.6 --v2 rel/2.3.3
e.g., to check thrift files:
python3 checker.py --app hbase --thrift --v1 rel/2.2.6 --v2 rel/2.3.3
$java -jar EnumChecker.jar > output.log
$grep "============start enum================" -A 5 output.log
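The old_version_tag/new_version_tag values passed to --v1/--v2 are git tags in the checked application's repository (e.g. rel/2.2.6 for hbase). A minimal, self-contained sketch of how such tags can be listed; the throwaway repo below is purely illustrative:

```shell
# Create a throwaway repo, tag a commit, and list tags matching a pattern.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial"
git tag rel/2.2.6
git tag -l 'rel/*'   # prints: rel/2.2.6
```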
Check out the required applications in the DUPChecker/ directory.
(1). hbase
git clone https://github.com/apache/hbase.git
(2). hdfs, yarn
git clone https://github.com/apache/hadoop.git
(3). mesos
git clone https://github.com/apache/mesos.git
(4). hive
git clone https://github.com/apache/hive.git
(5). impala
git clone https://github.com/apache/impala.git
(6). accumulo
git clone https://github.com/apache/accumulo.git
Create a log folder:
mkdir log
Run the script:
python3 run_experiment.py
The results will be output to files under the log folder, with the application's name as a prefix.
Generate Table 6 in the paper:
python3 export.py
Server config: VirtualBox Ubuntu 18, 4 GB RAM, 20 GB disk.
Time distribution:
1) set up a new Ubuntu VM: 15~30 min
2) install dependencies and download required git repos: 15 min
3) run experiments: ~60 min
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.