One common way to speed up text search over a set of files is a trigram index. This structure is simply a map from each trigram (a sequence of three consecutive characters) to the set of files that contain it. When searching for a pattern, candidate files are identified by intersecting the file sets of all trigrams in the pattern, and the search then scans only these candidates.
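As a rough illustration of the lookup described above, the Python sketch below builds a trigram-to-files map and answers a substring query by intersecting the file sets and then verifying each candidate. The function names (`trigrams`, `build_index`, `search`) are hypothetical, files are modeled as an in-memory `dict` from path to content, and falling back to a full scan for patterns shorter than three characters is an assumption, not something the abstract specifies.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every sequence of three consecutive characters in text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each trigram to the set of file paths whose content contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, content in files.items():
        for tri in trigrams(content):
            index[tri].add(path)
    return index


def search(index: dict[str, set[str]], files: dict[str, str], pattern: str) -> list[str]:
    """Narrow candidates via trigram intersection, then verify by scanning."""
    pattern_trigrams = trigrams(pattern)
    if not pattern_trigrams:
        # Assumption: patterns shorter than three characters give no filter,
        # so every file is a candidate.
        candidates = set(files)
    else:
        # .get() avoids inserting empty sets into the defaultdict.
        candidates = set.intersection(*(index.get(t, set()) for t in pattern_trigrams))
    # The index only narrows the search; confirm the actual substring match.
    return sorted(path for path in candidates if pattern in files[path])
```

The final check matters: a file can contain every trigram of a pattern without containing the pattern itself (for example, `"abc bcd"` contains both trigrams of `"abcd"`), so the index only narrows the candidate set.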
However, in a code repository the file contents, and therefore the trigram index, change from one version to the next. When a new version is checked out, the index is typically rebuilt from scratch, which is time-consuming, whereas we want the index to be available almost instantly after startup.
We therefore explore a persistent trigram index for full-text and keyword search. Rather than rebuilding, our approach takes the index of the current version and applies only the changes between versions during checkout, which significantly improves performance. We also extend the data structure to support CamelHump search for class and function names.
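The checkout idea can be sketched as an incremental update: given the files added, modified, and deleted between two versions, only the trigrams that actually appear in or disappear from each changed file are touched. This is a hypothetical Python sketch, not the authors' implementation: the class name, the `apply_checkout(changed, deleted)` diff format, and the in-place mutation are all assumptions. In particular, it updates a single index in place, whereas a persistent data structure in the strict sense would also keep earlier versions accessible; that aspect, and CamelHump search, are not modeled here.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All sequences of three consecutive characters in text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class IncrementalTrigramIndex:
    """Hypothetical sketch: a trigram index updated from a checkout diff
    instead of being rebuilt from scratch (not a true persistent structure)."""

    def __init__(self, files: dict[str, str]) -> None:
        self.files: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)
        # Initial build = applying a diff against an empty repository.
        self.apply_checkout(changed=files, deleted=set())

    def _link(self, path: str, tris: set[str]) -> None:
        for tri in tris:
            self.index[tri].add(path)

    def _unlink(self, path: str, tris: set[str]) -> None:
        for tri in tris:
            postings = self.index.get(tri)
            if postings is None:
                continue
            postings.discard(path)
            if not postings:
                del self.index[tri]  # drop empty file sets entirely

    def apply_checkout(self, changed: dict[str, str], deleted: set[str]) -> None:
        """Apply the difference between the current and the new version.

        changed: path -> new content for added or modified files.
        deleted: paths that no longer exist in the new version.
        """
        for path in deleted:
            old = self.files.pop(path, None)
            if old is not None:
                self._unlink(path, trigrams(old))
        for path, new_content in changed.items():
            old_tris = trigrams(self.files.get(path, ""))
            new_tris = trigrams(new_content)
            # Touch only trigrams that disappeared from or appeared in this file.
            self._unlink(path, old_tris - new_tris)
            self._link(path, new_tris - old_tris)
            self.files[path] = new_content

    def search(self, pattern: str) -> list[str]:
        """Intersect file sets of the pattern's trigrams, then verify."""
        tris = trigrams(pattern)
        if tris:
            candidates = set.intersection(*(self.index.get(t, set()) for t in tris))
        else:
            candidates = set(self.files)
        return sorted(p for p in candidates if pattern in self.files[p])
```

Because empty file sets are dropped, an index updated this way ends up identical to one rebuilt from scratch for the same files, while the update cost depends on the changed files rather than on the whole repository.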
Attila Szatmári Szegedi Tudományegyetem, Qusay Idrees Sarhan Department of Software Engineering, University of Szeged, Péter Attila Soha Department of Software Engineering, University of Szeged, Gergő Balogh Department of Software Engineering, University of Szeged, Árpád Beszédes Department of Software Engineering, University of Szeged
Niklas Krieger Institute of Software Engineering, University of Stuttgart, Sandro Speth Institute of Software Engineering, University of Stuttgart, Steffen Becker University of Stuttgart
Tim Kräuter Western Norway University of Applied Sciences, Patrick Stünkel Western Norway University of Applied Sciences, Adrian Rutle Western Norway University of Applied Sciences, Yngve Lamo Western Norway University of Applied Sciences