‹Programming› 2022
Mon 11 - Thu 14 April 2022
Wed 23 Mar 2022 09:00 - 10:00 at Auditorium Nobre - Crista Lopes; Chair(s): Theo D'Hondt

Previous studies have shown that there is a non-trivial amount of duplication in source code. We analyzed a corpus of 2.6 million non-fork projects hosted on GitHub, comprising over 258 million files written in Java, C++, Python and JavaScript, and found far more duplication than we had anticipated. This finding has made us much more careful about using open source repositories to draw statistical conclusions, especially now, in the age of machine learning. In this talk, I will present our GitHub study and briefly cover some of our most recent work on extending duplicate detection to the machine learning models themselves.


Cristina (Crista) Lopes is a Professor in the School of Information and Computer Sciences at the University of California, Irvine, with research interests in Programming Languages, Software Engineering, and Distributed Virtual Environments. She is an IEEE Fellow, an ACM Distinguished Scientist, a twice-elected member of the SIGPLAN Executive Committee, and Editor-in-Chief of The Art, Science, and Engineering of Programming. She received the 2016 Pizzigati Prize for Software in the Public Interest for her work on the OpenSimulator virtual world platform. She is also a co-founder of Midspace, a virtual conference platform.

Wed 23 Mar

Displayed time zone: Lisbon

09:00 - 10:00
Crista Lopes (Keynotes) at Auditorium Nobre
Chair(s): Theo D'Hondt, Vrije Universiteit Brussel
09:00 (60m) Keynote
The Curious Case of Code Duplication in GitHub (‹Programming› Keynote)
Keynote speaker: Crista Lopes, University of California, Irvine