A Reproduction of Demystifying Privacy Policy of Third-Party Libraries in Android Apps
\textbf{Paper title}
We are submitting this artifact in reference to the technical paper ``Demystifying Privacy Policy of Third-Party Libraries in Mobile Apps''. The authors are: (1) Kaifa Zhao (kaifa.zhao@connect.polyu.hk) (2) Xian Zhan (chichoxian@gmail.com) (3) Le Yu (cslyu@comp.polyu.edu.hk) (4) Shiyao Zhou (shiyao.zhou@connect.polyu.hk) (5) Hao Zhou (cshaoz@comp.polyu.edu.hk) (6) Xiapu Luo (csxluo@comp.polyu.edu.hk) (7) Haoyu Wang (haoyuwang@hust.edu.cn) (8) Yepang Liu (liuyp1@sustech.edu.cn)
\textbf{Purpose of the Research artifact}
This artifact contains: (1) the source code and scripts for ATPChecker, the Android third-party library (TPL) privacy policy compliance analysis tool we implemented based on the theoretical framework in the technical paper, and (2) the dataset used in our experiments, which includes a TPL list, TPL binary files, TPL privacy policy documents, host app binary files and host apps’ privacy policy documents.
\textbf{Badges claimed} We are claiming three badges (Functional, Reusable, and Available) under the \textit{Artifacts Evaluated} and \textit{Artifacts Available} sections.
\textbf{\textit{Functional.}} The artifact is \textbf{documented} because it contains an inventory of the source code, the dataset of privacy policies and binary files, and the results together with the scripts that generate the tables and figures in the paper. The documentation describes the components of the dataset, explains how to use the tool and its source code, and gives instructions for generating the results in the technical paper. The artifact is \textbf{consistent} because it contains ATPChecker and the inputs used to produce the results in the technical paper, along with the result files and the scripts that generate the paper's figures and tables. The artifact is \textbf{complete} because it includes all the components relevant to the technical paper: the source of the dataset, how to collect it, how to preprocess it, how to use the tool, and how to generate the results. The artifact is \textbf{exercisable} because we provide a tutorial for using the tool, including how to change parameters to run ATPChecker on a new dataset and how to generate the results and tables in the technical paper.
\textit{\textbf{Reusable.}} The artifact provides a tutorial for using the tool, together with its scripts and source code. We believe that the artifact is of a quality conducive to reusability and extensibility. The main entry points into ATPChecker have command-line interfaces that print help and usage information. We also provide a tutorial for applying the tool to a new dataset.
ATPChecker includes four modules: 1) the TPL privacy policy analysis module, 2) the host app privacy policy analysis module, 3) the TPL file analysis module, and 4) the host app Android package (APK) analysis module. Modules 1) and 2) are implemented in Python and mainly use natural language processing techniques to analyze privacy policy documents. Modules 3) and 4) are implemented in Java and perform static analysis to investigate data usage in the bytecode of TPLs and host apps. The source code is clean, commented, and consistent with this architecture, so the community can improve or extend each component separately and incrementally. For example, to support privacy policies written in another language, we can (1) translate and expand the dictionary content in the target language, (2) replace the pre-trained natural language processing model in \textit{hanlp} with a model for the target language, and (3) update the rule base to accommodate the conventions of the new language.
Furthermore, since the bytecode analysis modules in ATPChecker are built on Soot and FlowDroid, we can update the related modules to take advantage of new features in Soot and FlowDroid and thereby strengthen the static analysis.
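The three steps for supporting a new policy language, described above, can be pictured as swapping one per-language profile. The following is a minimal sketch under stated assumptions, not ATPChecker's actual code: the names \texttt{LanguageProfile}, \texttt{PROFILES}, \texttt{register\_language}, and \texttt{mentioned\_data\_types}, the example dictionary entries, and the model placeholder strings are all hypothetical, and the sketch deliberately does not call \textit{hanlp}.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LanguageProfile:
    """Hypothetical bundle of the three language-specific pieces."""
    # (1) dictionary: canonical data type -> surface terms in this language
    data_type_terms: Dict[str, List[str]]
    # (2) identifier of the pre-trained NLP model (placeholder string only)
    nlp_model: str
    # (3) rule base, reduced here to a list of data-handling verbs
    sharing_verbs: List[str]


# Illustrative English profile; the entries are assumptions, not the real dictionary.
PROFILES: Dict[str, LanguageProfile] = {
    "en": LanguageProfile(
        data_type_terms={
            "location": ["location", "gps"],
            "device_id": ["device id", "imei"],
        },
        nlp_model="<english-model-placeholder>",
        sharing_verbs=["collect", "share"],
    ),
}


def register_language(code: str, profile: LanguageProfile) -> None:
    """Add or replace the profile for one language code."""
    PROFILES[code] = profile


def mentioned_data_types(sentence: str, lang: str) -> List[str]:
    """Return canonical data types whose terms occur in the sentence."""
    profile = PROFILES[lang]
    text = sentence.lower()
    return sorted(
        data_type
        for data_type, terms in profile.data_type_terms.items()
        if any(term in text for term in terms)
    )


# Supporting German would then mean registering one more profile.
register_language(
    "de",
    LanguageProfile(
        data_type_terms={"location": ["standort"], "device_id": ["geräte-id"]},
        nlp_model="<german-model-placeholder>",
        sharing_verbs=["erheben", "weitergeben"],
    ),
)
```

Keeping the dictionary, the model identifier, and the rules in one record means that the matching code does not change when a language is added; only the registered profile does.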
\textit{\textbf{Available.}} The artifact is in a public GitHub repository so that the community can reuse, improve, and extend it. In addition, because the dataset is large, we have also uploaded our source code and dataset to OneDrive (\url{https://connectpolyu-my.sharepoint.com/:f:/g/personal/19044075r_connect_polyu_hk/EkwKbtnqIOBOjkjqHrOWVJYB6Dbx5HjKUFthI6Wpn-5Z0g?e=cgeek8}) and built a website (\url{https://atpchecker.github.io}) where users can apply for access to the dataset and source code. We will release the artifact on GitHub Releases and OneDrive, and we will provide the appropriate DOI and links in our artifact submission.
\textbf{Technical skills assumed of the reviewer} We assume that the reviewer knows basic Linux shell commands (e.g., \textit{cd}), shell redirection, and common command-line tools, including git, pip3, python3, unzip, and vim.
\textbf{Software requirements} ATPChecker was developed on macOS 13.0.1 on an Intel Core machine and tested on Ubuntu 20.04 LTS, which is the system provided in the virtual machine. We require: (1) Ubuntu 20.04 LTS, (2) Python 3.8, (3) the latest version of pip, (4) the Python 3 packages listed in requirements.txt, and (5) at least 70GB of storage.
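Before starting, a reviewer may want to confirm that a machine meets the requirements listed above. The following is a minimal sketch using only the Python standard library; it is not part of the artifact. The function names \texttt{find\_problems} and \texttt{check\_environment} are hypothetical, and treating Python 3.8 as a minimum rather than an exact version is an assumption.

```python
import shutil
import sys
from pathlib import Path
from typing import List, Tuple

MIN_PYTHON = (3, 8)   # assumption: 3.8 is treated as a minimum version
MIN_FREE_GB = 70.0    # storage requirement stated above


def find_problems(python_version: Tuple[int, int],
                  free_gb: float,
                  has_requirements: bool) -> List[str]:
    """Compare measured values with the requirements; empty list means OK."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append("Python %d.%d found, 3.8 required" % python_version)
    if free_gb < MIN_FREE_GB:
        problems.append("only %.1f GB free, 70 GB required" % free_gb)
    if not has_requirements:
        problems.append("requirements.txt not found")
    return problems


def check_environment(workdir: str = ".") -> List[str]:
    """Measure the current machine and report unmet requirements."""
    free_gb = shutil.disk_usage(workdir).free / 1024 ** 3
    has_requirements = (Path(workdir) / "requirements.txt").is_file()
    return find_problems(sys.version_info[:2], free_gb, has_requirements)


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run from the directory that contains requirements.txt, the script prints one warning per unmet requirement and prints nothing when all three checks pass. It does not check the operating system or the pip version.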