- Title:
- Detection of source code in internet texts using automatically generated machine learning models
- Authors:
- Badurowicz, Marcin
- Links:
- https://bibliotekanauki.pl/articles/2097432.pdf
- Publication date:
- 2022
- Publisher:
- Polskie Towarzystwo Promocji Wiedzy
- Subjects:
- source code, binary classification, text classification, AutoML
- Description:
- The paper presents the outcome of web-scraping software that automatically classifies source code. The system was built for a discussion forum for software developers, to find fragments of source code that were published without being marked as code snippets. The analyzer uses a machine learning binary classification model to distinguish programming-language source code from highly technical text about software. The model was prepared by an AutoML subsystem without human intervention or fine-tuning, and its accuracy on the described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed, and after the first year of continuous operation its False Positive Rate is below 3%. A similar process could be introduced into document management in the software development process, where automatic tagging of, and search for, code or pseudo-code may be useful for archiving purposes.
- Source:
- Applied Computer Science; 2022, 18, 1; 89-98
- 1895-3735
- Appears in:
- Applied Computer Science
- Content provider:
- Biblioteka Nauki
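
The description above reports two metrics for a binary code-versus-text classifier: accuracy and False Positive Rate. The following is a minimal sketch of how those two numbers are computed from labelled predictions. Everything in it is an assumption made for illustration: `looks_like_code` is a hypothetical punctuation-density heuristic standing in for the classifier, not the AutoML-generated model from the paper, and the sample sentences and the 0.05 threshold are invented.

```python
import re

# Hypothetical stand-in classifier: counts code-like punctuation.
# This is NOT the AutoML-generated model described in the record.
CODE_TOKENS = re.compile(r"[{}();=<>\[\]]")


def looks_like_code(text: str, threshold: float = 0.05) -> bool:
    """Return True when code-like punctuation is at least `threshold` of all characters."""
    if not text:
        return False
    return len(CODE_TOKENS.findall(text)) / len(text) >= threshold


def accuracy_and_fpr(labels, predictions):
    """Return (accuracy, false positive rate) for boolean labels and predictions.

    True means "source code"; False means "technical text".
    False Positive Rate = FP / (FP + TN).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Invented samples: (text, is_code)
samples = [
    ("for (int i = 0; i < n; i++) { sum += a[i]; }", True),
    ("The garbage collector pauses all threads during a full collection.", False),
    ("x = foo(bar[0]); print(x)", True),
    ("Use dependency injection to decouple the service from its repository.", False),
]

labels = [is_code for _, is_code in samples]
predictions = [looks_like_code(text) for text, _ in samples]
accuracy, fpr = accuracy_and_fpr(labels, predictions)
print(f"accuracy={accuracy:.2f}, false positive rate={fpr:.2f}")
```

A low False Positive Rate matters in the forum setting the record describes, because a false positive means ordinary technical prose gets flagged as an unmarked code snippet.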