Web Application Testing: Using Tree Kernels to Detect Near-duplicate States in Automated Model Inference (ESEM 2021 - Emerging Results and Vision papers)

Who

Anna Corazza, Sergio Di Martino, Adriano Peron, Luigi Libero Lucio Starace

Track

ESEM 2021 Emerging Results and Vision papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 12 Oct 2021 16:25 - 16:35 at ESEM ROOM - Testing & Security 2 Chair(s): Davide Fucci

Abstract

Background: In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the site under test. These models, in which states represent features of the web application and transitions represent reachability relationships, can be used for several model-based testing tasks, such as test case generation. However, current exploration techniques often lead to models containing many near-duplicate states, i.e., states representing slightly different pages that are in fact instances of the same feature. This has a negative impact on the subsequent model-based testing tasks, adversely affecting, for example, size, running time, and achieved coverage of generated test suites.

Aims: As a web page can be naturally represented by its tree-structured DOM representation, we propose a novel near-duplicate detection technique to improve the model inference of web applications, based on Tree Kernel (TK) functions. TKs are a class of functions that compute similarity between tree-structured objects, largely investigated and successfully applied in the Natural Language Processing domain.
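To make the idea of a kernel over DOM trees concrete, here is a minimal sketch of one simple member of the TK family, a subtree kernel, which counts pairs of identical complete subtrees shared by two trees and normalizes the result to [0, 1]. It is an illustration only, not the authors' implementation: the abstract does not say which kernel functions or parameters the paper uses, and the helper names (`DomBuilder`, `parse_dom`, `subtree_kernel`, `similarity`) and the choice to compare tag names while ignoring text and attributes are assumptions made for this example.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt


class DomBuilder(HTMLParser):
    """Builds a nested [tag, children] tree; text and attributes are ignored (assumption)."""

    # Elements that never have a closing tag, so they must not stay on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.root = ["#root", []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, if any (never the synthetic root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def parse_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def subtree_signatures(node, counts):
    """Records a canonical string for every complete subtree rooted in `node`."""
    tag, children = node
    sig = "(" + tag + "".join(subtree_signatures(c, counts) for c in children) + ")"
    counts[sig] += 1
    return sig


def subtree_kernel(a, b):
    """Number of pairs of identical complete subtrees, one taken from each tree."""
    ca, cb = Counter(), Counter()
    subtree_signatures(a, ca)
    subtree_signatures(b, cb)
    return sum(n * cb[sig] for sig, n in ca.items())


def similarity(html_a, html_b):
    """Normalized kernel: 1.0 for structurally identical pages, 0.0 for no shared subtree."""
    a, b = parse_dom(html_a), parse_dom(html_b)
    return subtree_kernel(a, b) / sqrt(subtree_kernel(a, a) * subtree_kernel(b, b))


# Toy pages (hypothetical): two list pages differing in item count, and a form page.
page_a = "<html><body><h1>A</h1><ul><li>x</li><li>y</li></ul></body></html>"
page_b = "<html><body><h1>B</h1><ul><li>x</li><li>y</li><li>z</li></ul></body></html>"
page_c = "<html><body><form><input><input></form></body></html>"

print(similarity(page_a, page_b) > similarity(page_a, page_c))
```

In this toy setting the two list pages share their heading and list-item subtrees and so score higher than the list page against the form page, which is the behavior a near-duplicate detector needs. Richer kernels that also match partial subtrees would give credit for the shared `ul`/`body` skeleton as well; this sketch deliberately omits that.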

Method: To evaluate the capability of the proposed approach in detecting near-duplicate web pages, we conducted preliminary classification experiments on a freely-available massive dataset of about 100k manually annotated web page pairs. We compared the classification performance of the proposed approach with other state-of-the-art near-duplicate detection techniques.
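As a rough sketch of what a pairwise classification experiment of this kind can look like, the snippet below labels each page pair as near-duplicate when its similarity score reaches a threshold, then scores the predictions against manual annotations. The abstract does not state which classifier, threshold, or metrics the authors used, so the thresholding rule, the precision/recall/F1 metrics, the function names, and all numbers here are assumptions chosen purely for illustration, not data from the paper.

```python
def classify(scores, threshold):
    """Predicts near-duplicate (True) when the similarity score reaches the threshold."""
    return [score >= threshold for score in scores]


def precision_recall_f1(predicted, actual):
    """Standard binary-classification metrics over parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical similarity scores for five page pairs and their manual labels.
scores = [0.95, 0.80, 0.62, 0.30, 0.10]
labels = [True, True, False, True, False]

predicted = classify(scores, threshold=0.6)
precision, recall, f1 = precision_recall_f1(predicted, labels)
print(precision, recall, f1)
```

With a fixed evaluation like this, different similarity functions can be compared on the same annotated pairs by swapping the source of `scores` and keeping everything else unchanged.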

Results: Preliminary results show that our approach performs better than state-of-the-art techniques in the near-duplicate detection classification task.

Conclusions: These promising results show that TKs can be applied to near-duplicate detection in the context of web application model inference, and motivate further research in this direction to assess the impact of the technique on the quality of the inferred models and on the subsequent application of model-based testing techniques.

Link to Preprint

https://arxiv.org/abs/2108.13322

Anna Corazza

Università degli Studi di Napoli Federico II

Italy

Sergio Di Martino

Università degli Studi di Napoli Federico II

Italy

Adriano Peron

Università degli Studi di Napoli Federico II

Italy

Luigi Libero Lucio Starace

Università degli Studi di Napoli Federico II

Italy

Session Program

Tue 12 Oct
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:30 - 16:35	Testing & Security 2 (Technical Papers / Emerging Results and Vision papers) at ESEM ROOM
Chair(s): Davide Fucci (Blekinge Institute of Technology)

15:30 15m Talk	Barriers to Shift-Left Security: The Unique Pain Points of Writing Automated Tests Involving Security Controls
	Track: Technical Papers
	Authors: Danielle Gonzalez (Rochester Institute of Technology and Microsoft); Paola Peralta Perez (Rochester Institute of Technology); Mehdi Mirakhorli (Rochester Institute of Technology)

15:45 15m Talk	Security Smells Pervade Mobile App Servers
	Track: Technical Papers
	Authors: Pascal Gadient (University of Bern); Marc-Andrea Tarnutzer (University of Bern); Oscar Nierstrasz (University of Bern, Switzerland); Mohammad Ghafari (University of Auckland)

16:00 15m Talk	Who are Vulnerability Reporters? A Large-scale Empirical Study on FLOSS
	Track: Technical Papers
	Authors: Nikolaos Alexopoulos (Technical University of Darmstadt); Andy Meneely (Rochester Institute of Technology); Dorian Arnouts (Technical University of Darmstadt); Max Mühlhäuser (Technical University of Darmstadt)

16:15 10m Talk	Python Crypto Misuses in the Wild
	Track: Emerging Results and Vision papers
	Authors: Anna-Katharina Wickert (TU Darmstadt, Germany); Lars Baumgärtner (TU Darmstadt); Florian Breitfelder (TU Darmstadt); Mira Mezini (TU Darmstadt, Germany)

16:25 10m Talk	Web Application Testing: Using Tree Kernels to Detect Near-duplicate States in Automated Model Inference
	Track: Emerging Results and Vision papers
	Authors: Anna Corazza (Università degli Studi di Napoli Federico II); Sergio Di Martino (Università degli Studi di Napoli Federico II); Adriano Peron (Università degli Studi di Napoli Federico II); Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II)

Information for Participants

Tue 12 Oct 2021 15:30 - 16:35 at ESEM ROOM - Testing & Security 2 Chair(s): Davide Fucci

Info for room ESEM ROOM:

https://www.youtube.com/c/ESEM_Conference