- Title:
- TF-IDF inspired detection for cross-language source code plagiarism and collusion
- Authors:
- Karnalim, Oscar
- Related links:
- https://bibliotekanauki.pl/articles/305519.pdf
- Publication date:
- 2020
- Publisher:
- Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
- Subjects:
-
source code plagiarism and collusion
cross-language detection
TF-IDF
computing education
information retrieval
- Description:
- Several computing courses allow students to choose the programming language in which they complete a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that can accurately compare code files written in various programming languages while requiring limited effort to accommodate such languages at the development stage. The only language-dependent feature used in the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms common techniques in academia for handling language conversion disguises. Furthermore, it is comparable to those techniques when dealing with conventional disguises.
- Source:
-
Computer Science; 2020, 21 (1); 113-134
1508-2806
2300-7036
- Appears in:
- Computer Science
- Content provider:
- Biblioteka Nauki
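
The abstract above says that rare matches are prioritised through a TF-IDF inspired weighting applied to tokenised code. The record does not give the paper's actual formula, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that each submission has already been tokenised by a language-specific tokeniser and mapped to a shared set of generic token names. The function names (`ngrams`, `idf_weights`, `weighted_similarity`), the use of token bigrams, the `log(N / df) + 1` weight, and the weighted-overlap score are all illustrative assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of contiguous token n-grams (as tuples)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def idf_weights(docs, n=2):
    """Weight each n-gram by how rare it is across all submissions.

    docs maps a (hypothetical) submission name to its generic token list.
    The weight log(N / df) + 1 is an assumed, common IDF variant.
    """
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(ngrams(tokens, n)))  # count each n-gram once per file
    total = len(docs)
    return {g: math.log(total / df) + 1.0 for g, df in doc_freq.items()}


def weighted_similarity(a, b, idf, n=2):
    """IDF-weighted overlap of two token lists, in the range [0, 1].

    Shared rare n-grams raise the score more than shared common ones.
    N-grams missing from idf fall back to a weight of 1.0 (an assumption).
    """
    ca, cb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    shared = sum(min(ca[g], cb[g]) * idf.get(g, 1.0) for g in ca.keys() & cb.keys())
    total = sum(max(ca[g], cb[g]) * idf.get(g, 1.0) for g in ca.keys() | cb.keys())
    return shared / total if total else 0.0


# Hypothetical generic token streams for three submissions.
docs = {
    "alice.java": ["func", "ident", "loop", "ident", "assign", "ident",
                   "op", "number", "return", "ident"],
    "bob.py": ["func", "ident", "loop", "ident", "assign", "ident",
               "op", "number", "print", "ident"],
    "carol.py": ["func", "ident", "if", "ident", "op", "number",
                 "return", "ident", "print", "string"],
}
idf = idf_weights(docs)
for x, y in [("alice.java", "bob.py"), ("alice.java", "carol.py")]:
    print(x, y, round(weighted_similarity(docs[x], docs[y], idf), 3))
```

In this sketch, n-grams that occur in every submission (for example, a function header pattern) receive the lowest weight, so coincidental matches on boilerplate contribute little, while matches on n-grams shared by only a few files dominate the score.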