Detecting Memory Errors in Python Native Code by Tracking Object Lifecycle with Reference Count
Third-party Python modules are usually implemented as binary extensions by using native code (C/C++) to provide additional features and runtime acceleration. In native code, the heap-allocated PyObjects are managed by the reference counting mechanism provided in Python/C APIs for automatic reclaiming. Hence, improper refcount manipulations can lead to memory leaks and use-after-free problems, and cannot be detected by simply pairing the occurrence of source and sink points. To detect such problems, state-of-the-art approaches have made groundbreaking contributions to identifying inappropriate final refcount values before returning from native code to Python. However, not all problems can be exposed at the end of a path. To detect those hidden in the middle of a path in native code, it is also crucial to track the lifecycle state of PyObjects through the refcount and lifecycle operations in API calls.
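As a small, hedged illustration of the refcount mechanism described above, the following sketch calls `Py_IncRef` and `Py_DecRef` (the function forms of the `Py_INCREF`/`Py_DECREF` macros) through `ctypes.pythonapi` and observes the effect with `sys.getrefcount`. It assumes a standard CPython interpreter; the helper name `refcount_trace` is hypothetical and not taken from the paper.

```python
import ctypes
import sys

# Bind the C-level refcount functions exported by the running CPython interpreter.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None

_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def refcount_trace():
    """Return (baseline, after_incref, after_decref) for a fresh object.

    Hypothetical helper for illustration only.
    """
    obj = object()
    baseline = sys.getrefcount(obj)

    # An extra strong reference with no owner: if native code returned now,
    # the object could never be reclaimed (a memory leak).
    _incref(obj)
    after_incref = sys.getrefcount(obj)

    # Balance the count again. One more decrement than increments could free
    # the object while it is still referenced (a use-after-free).
    _decref(obj)
    after_decref = sys.getrefcount(obj)

    return baseline, after_incref, after_decref
```

Only the differences between the three values are meaningful here, since the absolute number reported by `sys.getrefcount` varies between interpreter versions.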
To achieve this goal, we propose the PyObject State Transition Model (PSTM) recording the lifecycle states and refcount values of PyObjects to describe the effects of Python/C API calls and pointer operations. We track state transitions of PyObjects with symbolic execution based on the model, and report problems when a statement triggers a transition to buggy states. The program state is also expanded to handle pointer nullity checks and smart pointers of PyObjects. We conduct experiments on 12 open-source projects and detect 259 real problems out of 280 reports, which is twice as many bugs as state-of-the-art approaches. We submit 168 real bugs to those active projects, and 106 issues are either confirmed or resolved.
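The idea of tracking lifecycle state alongside the refcount can be sketched with a toy tracker. This is not the paper's PSTM or its symbolic execution engine: the two states, the class names `State`, `TrackedObject`, and `Tracker`, and the rule that every object still alive at return is a leak (assuming nothing is returned or escapes) are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ALIVE = "alive"
    FREED = "freed"


@dataclass
class TrackedObject:
    name: str
    refcount: int = 1            # a new reference, as returned by an allocating API
    state: State = State.ALIVE


@dataclass
class Tracker:
    """Toy lifecycle tracker (hypothetical; far simpler than the paper's model)."""
    objects: dict = field(default_factory=dict)
    reports: list = field(default_factory=list)

    def _check_alive(self, obj, op):
        # Any operation on a freed object is reported where it happens.
        if obj.state is State.FREED:
            self.reports.append(f"use-after-free: {op} on {obj.name}")
            return False
        return True

    def new(self, name):
        self.objects[name] = TrackedObject(name)

    def incref(self, name):
        obj = self.objects[name]
        if self._check_alive(obj, "incref"):
            obj.refcount += 1

    def decref(self, name):
        obj = self.objects[name]
        if self._check_alive(obj, "decref"):
            obj.refcount -= 1
            if obj.refcount == 0:
                obj.state = State.FREED

    def use(self, name):
        self._check_alive(self.objects[name], "use")

    def ret(self):
        # Assumption: the function returns no object and nothing escapes,
        # so every object still alive at this point is leaked.
        for obj in self.objects.values():
            if obj.state is State.ALIVE:
                self.reports.append(f"leak: {obj.name} (refcount {obj.refcount})")


def demo():
    t = Tracker()
    t.new("item")
    t.decref("item")   # refcount reaches 0, so the object is freed
    t.use("item")      # bug in the middle of the path
    t.new("tmp")
    t.ret()            # "tmp" is never released
    return t.reports
```

In `demo()`, the object `item` ends the path with a refcount of 0, so a check that inspects only final refcount values would have nothing to flag for it; the use-after-free surfaces only because the state transition is checked at the statement that triggers it.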
Thu 14 Sep | Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
13:30 - 15:00 | Debugging | Research Papers / Industry Showcase (Papers) at Room E | Chair(s): Carol Hanna (University College London)
13:30 12m Talk | Coding and Debugging by Separating Secret Code toward Secure Remote Development | Industry Showcase (Papers) | Shinobu Saito (NTT)
13:42 12m Talk | Detecting Memory Errors in Python Native Code by Tracking Object Lifecycle with Reference Count | Research Papers | Xutong Ma (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China), Jiwei Yan (Institute of Software at Chinese Academy of Sciences, China), Hao Zhang (Institute of Software, Chinese Academy of Sciences), Jun Yan (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jian Zhang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
13:54 12m Research paper | PERFCE: Performance Debugging on Databases with Chaos Engineering-Enhanced Causality Analysis | Research Papers | Zhenlan Ji (The Hong Kong University of Science and Technology), Pingchuan Ma (HKUST), Shuai Wang (Hong Kong University of Science and Technology)
14:06 12m Talk | The MAP metric in Information Retrieval Fault Localization | Research Papers
14:18 12m Talk | Eiffel: Inferring Input Ranges of Significant Floating-point Errors via Polynomial Extrapolation | Recorded talk | Research Papers | Zuoyan Zhang (Information Engineering University), Bei Zhou (Information Engineering University), Jiangwei Hao (Information Engineering University), Hongru Yang (Information Engineering University), Mengqi Cui (Information Engineering University), Yuchang Zhou (Information Engineering University), Guanghui Song (Information Engineering University), Fei Li (Information Engineering University), Jinchen Xu (Information Engineering University), Jie Zhao (State Key Laboratory of Mathematical Engineering and Advanced Computing)
14:30 12m Talk | Information Retrieval-based Fault Localization for Concurrent Programs | Recorded talk | Research Papers