Towards Reliable Memory Management for Python Native Extensions
Many programming languages, Python among them, provide a C interface as a foreign function interface (FFI) through which C developers can access the language. Over the years, the Python C API has grown into a challenge for the evolution of the Python ecosystem. In this paper, we implement a new Python FFI, which we call CyStck, that combines a stack with light-weight handles to support efficient garbage collection (GC) in Python native extensions. We port five large, real-world Python extensions to CyStck and profile them thoroughly with the Scalene profiler, comparing CyStck to the current CPython C API and to HPy, another Python C API implementation. For some benchmarks, CyStck provides speedups of 12% in native time and 13% in Python time. CyStck also introduces acceptable overhead in system time, as low as 0.2x in some benchmarks, while for all benchmarks it copies the fewest bytes across the C/Python boundary (1% to 40%) compared to the CPython API and HPy, respectively. We also implement a tool that automates the migration of extensions from the CPython C API to CyStck using pattern matching and static analysis, with a success rate of up to 90%.
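To make the idea of combining a stack with light-weight handles more concrete, here is a minimal Python model written under stated assumptions. It is not CyStck's actual API: the class `HandleStack` and its methods `enter_frame`, `push`, `get`, `leave_frame`, and `roots` are hypothetical. The assumption is that native code holds small integer handles that index into a stack of object references, so a collector can find every object in use by native code by scanning the live part of the stack, and all handles created during a native call are released together when that call's frame ends.

```python
class HandleStack:
    """Hypothetical model of a handle stack whose slots act as GC roots.

    Native code is assumed to hold small integer handles (indices into this
    stack) instead of direct object references, so a collector only has to
    scan the live slots to find every object that native code is using.
    """

    def __init__(self):
        self._slots = []   # object references visible to the collector
        self._frames = []  # stack height recorded at each enter_frame()

    def enter_frame(self):
        # Record the current height; leave_frame() truncates back to it.
        self._frames.append(len(self._slots))

    def leave_frame(self):
        # Release every handle created since the matching enter_frame().
        height = self._frames.pop()
        del self._slots[height:]

    def push(self, obj):
        # The returned index plays the role of a light-weight handle.
        self._slots.append(obj)
        return len(self._slots) - 1

    def get(self, handle):
        return self._slots[handle]

    def roots(self):
        # Snapshot of what a collector would treat as live native references.
        return list(self._slots)


stack = HandleStack()
stack.enter_frame()                # outer native call
h_list = stack.push([1, 2, 3])
stack.enter_frame()                # nested helper call
stack.push("temporary")
stack.leave_frame()                # "temporary" is no longer a root
print(stack.get(h_list), stack.roots())  # [1, 2, 3] [[1, 2, 3]]
stack.leave_frame()
print(stack.roots())                     # []
```

Releasing handles by truncating the stack frees all of a call's handles in one step, much like popping a call frame. A real implementation would live in C and would also have to guard against a stale handle being used after its frame has ended.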
Our experience highlights two key lessons for any effort to redesign the Python C API. First, garbage collection can be fully automated so that it requires no manual intervention; however, if the API still uses the PyObject union type behind an indirection through handles, however light-weight, it will incur costs for converting data and for processing types when the extension module is created. Second, migrating code bases can be automated for most syntactic and semantic features, even with simple text-to-text transformations, and appears less complex than the Python 2 to 3 transition. As the Python core team considers a new direction for evolving the Python C API, we believe our experience of redesigning it and of porting large extensions both manually and automatically will benefit the Python community by showing how design choices affect both performance and backward compatibility.
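As an illustration of how simple a text-to-text migration rule can be, the following Python sketch removes statements that consist solely of a CPython reference-count call (`Py_INCREF`, `Py_DECREF`, `Py_XINCREF`, `Py_XDECREF`). It assumes, hypothetically, that under a fully automatic GC such calls are unnecessary in the target API. It is not the authors' migration tool, and the function name `strip_refcount_calls` is made up for this example.

```python
import re

# Hypothetical rule: a line holding only Py_INCREF/Py_DECREF/Py_XINCREF/Py_XDECREF
# is assumed to be unnecessary once GC is fully automatic, so it is deleted.
_REFCOUNT_STMT = re.compile(
    r"^[ \t]*Py_X?(?:INCREF|DECREF)\([^;]*\);[ \t]*\n?",
    re.MULTILINE,
)


def strip_refcount_calls(source: str) -> str:
    """Return `source` with stand-alone reference-count statements removed."""
    return _REFCOUNT_STMT.sub("", source)


before = """static PyObject *make_one(void) {
    PyObject *r = PyLong_FromLong(1);
    Py_INCREF(r);
    Py_DECREF(r);
    return r;
}
"""
print(strip_refcount_calls(before))
```

The rule is deliberately conservative: a call that shares a line with other code, such as `if (x) Py_DECREF(x);`, is left untouched, which is where plain pattern matching has to give way to static analysis. Applying it to code that still targets the CPython C API would leak references.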
Talk: Mon 17 Jul, 10:30 to 11:00 (30 min), in the 10:30 to 12:00 session, ICOOOLPS. Times are in Pacific Time (US & Canada).
Authors: Joannah Nanjekye (University of New Brunswick), David Bremner (University of New Brunswick), Aleksandar Micic (IBM, Canada). Pre-print.