KrakQL: LLM-Guided Blind Introspection of GraphQL Schemas
\emph{GraphQL APIs} provide a unified endpoint for accessing data in a distributed web application. Their distinguishing feature is the query structure, which lets a client retrieve exactly the data it needs in a single request by selecting types and fields defined in a structured \emph{schema}.
The GraphQL schema is therefore essential to client applications, which need to know how data is structured before accessing it. \emph{Introspection queries} provide a programmatic way to recover the schema. However, because the schema exposes valuable information, such as available fields and relations between types, it can give attackers a significant advantage. For this reason, production environments commonly block direct access to the schema by disabling introspection.
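As a rough illustration of what an introspection query looks like, the sketch below builds a minimal request body around the `__schema` meta-field from the GraphQL specification. The helper name `build_request_body` is hypothetical, and the JSON envelope with a `query` key is the common GraphQL-over-HTTP convention rather than something prescribed by this paper.

```python
import json

# Minimal introspection query: list every type and the names of its fields.
# __schema, types, name and fields are defined by the GraphQL specification.
INTROSPECTION_QUERY = """
query {
  __schema {
    types {
      name
      fields { name }
    }
  }
}
"""


def build_request_body(query=INTROSPECTION_QUERY):
    """Serialise a query into the JSON body usually POSTed to a GraphQL endpoint.

    Hypothetical helper for illustration only; it sends nothing over the network.
    """
    return json.dumps({"query": query})
```

A server with introspection disabled would be expected to reject this query with an error instead of returning the schema, which is the situation blind introspection targets.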
Tools such as \emph{Clairvoyance} retrieve the schema by combining brute forcing with GraphQL's field suggestion feature, which proposes valid field names similar to a misspelt one; this technique is called \emph{blind introspection}. However, the quality of the retrieval, measured by the completeness of the recovered schema, depends on the quality of the wordlist used. We show that \emph{Clairvoyance} covers roughly 35\% of the schemas on average on our proposed benchmark of 7 open-source projects.
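The suggestion-based leak described above can be sketched as follows. This is not the implementation of \emph{Clairvoyance} or \emph{KrakQL}; it only shows how field names might be harvested from error messages. The message shape is an assumption modelled on the wording used by graphql-js, and `parse_suggestions` and `harvest` are hypothetical helper names.

```python
import re

# Assumed error shape, modelled on graphql-js (other servers may differ):
#   Cannot query field "nam" on type "User". Did you mean "name" or "names"?
_FIELD_ERROR = re.compile(
    r'Cannot query field "(?P<probe>[^"]+)" on type "(?P<type>[^"]+)"\.'
)
_QUOTED = re.compile(r'"([^"]+)"')


def parse_suggestions(message):
    """Return (type_name, suggested_fields), or None if this is not a field error."""
    match = _FIELD_ERROR.search(message)
    if match is None:
        return None
    tail = message[match.end():]
    if "Did you mean" not in tail:
        # The probe was invalid and the server offered no similar field.
        return match.group("type"), []
    # Every quoted token after "Did you mean" is a suggested field name.
    return match.group("type"), _QUOTED.findall(tail)


def harvest(messages):
    """Merge suggestions from many error messages into {type_name: set(fields)}."""
    schema = {}
    for message in messages:
        parsed = parse_suggestions(message)
        if parsed is None:
            continue
        type_name, fields = parsed
        schema.setdefault(type_name, set()).update(fields)
    return schema
```

Under these assumptions, each guessed word that is close to a real field yields one or more valid names, which is why the coverage of a wordlist-driven approach is bounded by how well the wordlist matches the target's vocabulary.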
In this paper, we propose \emph{KrakQL}: an LLM-guided, novelty-search-based schema retriever that leverages the partially discovered schema to propose context-aware candidate fields and arguments. We demonstrate that \emph{KrakQL} outperforms \emph{Clairvoyance} with 1.4$\times$ higher average coverage, 148$\times$ fewer HTTP requests, and a 105$\times$ higher success rate, all at a negligible token cost. Finally, we release \emph{KrakQL} to support reproducibility and enable further research.