PyXray: Practical Cross-Language Call Graph Construction through Object Layout Analysis (ICSE 2026 - Research Track)

Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Who

Georgios Alexopoulos, Thodoris Sotiropoulos, Georgios Gousios, Zhendong Su, Dimitris Mitropoulos

Track

ICSE 2026 Research Track

Abstract

Many software packages combine code in high-level languages, such as Python, with binary extensions compiled from low-level languages such as C, C++, or Rust, either to boost efficiency or to enable specific functionality. In this context, high-level function calls can trigger native (binary) code execution. This setup introduces challenges for call graph generation. Accurate call graphs are essential for various applications, including vulnerability management and software maintenance, as they help track execution paths, assess security risks, and identify unused or redundant code.

This work tackles the problem of cross-language call graph construction in Python. Instead of relying on static analysis, which struggles with identifying Python-native interactions, we propose a dynamic analysis technique which does not require inputs to execute code. Our approach is based on two key insights: (1) when a binary extension is imported from Python code, all its objects (e.g., functions) are loaded into memory, and (2) the layout of callable Python objects contains pointers to the native functions they invoke. By analyzing these memory layouts for every loaded object, we identify corresponding graph edges, which link Python functions to the native functions they eventually invoke. This is an essential element for constructing call graphs across language boundaries.
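To make the second insight concrete, the following is a minimal sketch, not PyXray's implementation, of reading a native function pointer out of a callable's memory layout. It assumes CPython on a standard build, where a built-in function object stores a pointer to its `PyMethodDef` immediately after the object header and `id()` returns the object's address. The helper names `native_entry_point` and `module_edges` are hypothetical, and `math` merely stands in for an arbitrary binary extension.

```python
import ctypes
import math
import types


class PyMethodDef(ctypes.Structure):
    """Mirror of CPython's PyMethodDef struct (see methodobject.h)."""
    _fields_ = [
        ("ml_name", ctypes.c_char_p),   # name exposed to Python
        ("ml_meth", ctypes.c_void_p),   # address of the native C function
        ("ml_flags", ctypes.c_int),     # calling-convention flags
        ("ml_doc", ctypes.c_char_p),    # docstring (may be NULL)
    ]


def native_entry_point(func):
    """Return (name, address) of the native function behind a built-in callable.

    Assumption: CPython, where PyCFunctionObject is laid out as
    [object header][PyMethodDef *m_ml]... and id(func) is its address.
    """
    if not isinstance(func, types.BuiltinFunctionType):
        raise TypeError("expected a built-in (C-implemented) function")
    header_size = object.__basicsize__  # sizeof(PyObject) on this build
    m_ml = ctypes.c_void_p.from_address(id(func) + header_size).value
    method_def = PyMethodDef.from_address(m_ml)
    return method_def.ml_name.decode(), method_def.ml_meth


def module_edges(module):
    """Map each built-in function in an imported module to its native address."""
    edges = {}
    for attr, value in vars(module).items():
        if isinstance(value, types.BuiltinFunctionType):
            _, address = native_entry_point(value)
            edges[attr] = address
    return edges


if __name__ == "__main__":
    name, address = native_entry_point(math.sqrt)
    print(f"math.{name} -> native code at {address:#x}")
    print(f"{len(module_edges(math))} Python-to-native edges found in math")
```

No function in `math` is ever called here: importing the module and inspecting object layouts is enough to recover the addresses, which mirrors the abstract's point that no inputs are needed. Turning an address into a symbol name (for example with a platform facility such as `dladdr`) is a separate step that this sketch omits.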

We implement this approach in PyXray, a tool that efficiently analyzes massive Python packages such as NumPy and PyTorch in minutes, while significantly outperforming existing static analysis methods in terms of precision and recall. PyXray enables two key applications: (1) cross-language vulnerability management, by identifying whether a Python package potentially calls a vulnerable native function, and (2) cross-language bloat analysis, by quantifying unnecessary code across Python and native components.

Link to Preprint

https://grgalex.gr/assets/pdf/pyxray_icse26.pdf

Georgios Alexopoulos

University of Athens