When analyzing any corpus of programs, care must be taken to ensure that the corpus is truly representative of the entire ecosystem; otherwise, the observed features may be far from reality. A naive approach is to increase the size of the dataset, thus diminishing the chance that an interesting feature will be left out. However, such an approach may easily lead to overemphasis on features that are widely present in the code but rarely executed.
To tackle this issue, the code duplication patterns in the corpus and in the ecosystem must be understood and correlated with the actual frequency of the code in the wild.
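As a rough illustration of the duplication half of this task, the Python sketch below measures file-level duplication in a small corpus by hashing normalized file contents. It then shows how a naive per-file count of a feature (here the substring `eval(`) overstates that feature when the same file is copied many times. The normalization rule, the function names `content_hash`, `duplication_report`, and `feature_prevalence`, and the toy corpus are all hypothetical. Exact hashing only catches verbatim clones (modulo whitespace), and the sketch does not model how often code is actually executed.

```python
import hashlib
from collections import Counter


def content_hash(source: str) -> str:
    # Hypothetical normalization: strip each line and drop blank lines,
    # so copies differing only in indentation or blank lines hash equally.
    normalized = "\n".join(line.strip() for line in source.splitlines() if line.strip())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplication_report(corpus: dict[str, str]) -> dict:
    # Count how many files share each normalized content hash.
    counts = Counter(content_hash(src) for src in corpus.values())
    total = len(corpus)
    unique = len(counts)
    ratio = 1 - unique / total if total else 0.0
    return {"files": total, "unique": unique, "duplication_ratio": ratio}


def feature_prevalence(corpus: dict[str, str], feature: str) -> tuple[int, int]:
    # Naive count: every file containing the feature, copies included.
    naive = sum(feature in src for src in corpus.values())
    # Deduplicated count: keep one representative per distinct content hash.
    representatives = {content_hash(src): src for src in corpus.values()}
    dedup = sum(feature in src for src in representatives.values())
    return naive, dedup


# Toy corpus (hypothetical): one library file vendored three times, one app file.
corpus = {
    "a/lib.js": "function f() { return eval(x); }",
    "b/lib.js": "function f() { return eval(x); }\n",
    "c/lib.js": "  function f() { return eval(x); }",
    "d/app.js": "console.log('hi');",
}

if __name__ == "__main__":
    print(duplication_report(corpus))
    print(feature_prevalence(corpus, "eval("))
```

Deduplicating before counting answers "how many distinct files use this feature", whereas the naive count answers "how many copies exist". Neither says how often the feature runs, which would require execution data.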
Wed 18 Jul
14:00 - 14:30
Saam Barati, Apple
14:50 - 15:10
Petr Maj, Czech Technical University; François Gauthier, Oracle Labs; Celeste Hollenbeck, Northeastern University, USA; Jan Vitek, Northeastern University; Cristina Cifuentes, Oracle Labs