WebEvo: Taming Web Application Evolution via Semantic Structure Change Detection
To prevent information retrieval (IR) and robotic process automation (RPA) tools from malfunctioning when websites evolve, it is important to develop web monitoring tools that detect changes in a website and report them to developers and testers. Existing monitoring tools commonly rely on DOM-tree based similarity and visual analysis between different versions of web pages. However, DOM-tree based similarity is prone to false positives, since it cannot identify content-based changes (i.e., contents refreshed every time a web page is retrieved) or GUI widget evolution (e.g., moving a button). Such imprecision adversely affects IR tools and test scripts. To address this problem, we propose an approach, WebEvo, that first performs DOM-based change detection and then leverages historic pages to identify the regions that represent content-based changes, which can be safely ignored. Further, to identify refactoring changes that preserve the semantics and appearance of GUI widgets, WebEvo adapts computer vision (CV) techniques to map the GUI widgets of the old web page to those of the new web page on an element-by-element basis. We evaluated WebEvo on 10 real-world websites from 10 popular categories to demonstrate its superiority over existing work that relies on DOM-tree based detection or whole-page visual comparison.
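As a rough illustration of the first two stages described above (DOM-based change detection followed by filtering of content-based changes using historic pages), the following minimal sketch may help. It is not WebEvo's implementation: the snapshot representation (a dictionary from an element path to its text), the function names `volatile_paths` and `detect_changes`, and the rule that a region counts as content-based when its text differs across historic snapshots are all assumptions made for illustration. The CV-based widget mapping stage is not covered.

```python
from typing import Dict, List, Set

# Hypothetical representation: element path (e.g. an XPath) -> text content.
Snapshot = Dict[str, str]


def volatile_paths(history: List[Snapshot]) -> Set[str]:
    """Return paths whose content differs across historic snapshots.

    Assumption: a region that changed between earlier retrievals of the
    same page is treated as a content-based change and can be ignored.
    A path that is missing from some snapshots also counts as volatile.
    """
    all_paths: Set[str] = set().union(*history)
    volatile: Set[str] = set()
    for path in all_paths:
        values = {snap.get(path) for snap in history}
        if len(values) > 1:
            volatile.add(path)
    return volatile


def detect_changes(old: Snapshot, new: Snapshot,
                   history: List[Snapshot]) -> Dict[str, str]:
    """Diff two snapshots by path, skipping volatile regions."""
    ignore = volatile_paths(history)
    changes: Dict[str, str] = {}
    for path in sorted(set(old) | set(new)):
        if path in ignore:
            continue  # content-based change, safe to ignore
        before, after = old.get(path), new.get(path)
        if before == after:
            continue
        if before is None:
            changes[path] = "added"
        elif after is None:
            changes[path] = "removed"
        else:
            changes[path] = "modified"
    return changes


# Toy data: the headline refreshes on every retrieval, the button does not.
history = [
    {"/html/body/div[1]": "Headline A", "/html/body/button": "Buy"},
    {"/html/body/div[1]": "Headline B", "/html/body/button": "Buy"},
]
old = {"/html/body/div[1]": "Headline B", "/html/body/button": "Buy"}
new = {
    "/html/body/div[1]": "Headline C",
    "/html/body/button": "Purchase",
    "/html/body/a": "Help",
}
result = detect_changes(old, new, history)
```

In this toy example the headline region is skipped because it already varied across the historic snapshots, while the relabeled button and the newly added link are reported. A path-keyed text diff is a deliberately simplified stand-in for DOM-tree comparison, chosen only to keep the sketch short.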