- Title:
- Improving the credibility of the extracted position from a vast collection of job offers with machine learning ensemble methods
- Authors:
-
Drozda, Paweł
Ropiak, Krzysztof
Nowak, Bartosz A.
Talun, Arkadiusz
Osowski, Maciej
- Links:
- https://bibliotekanauki.pl/articles/22615539.pdf
- Publication date:
- 2023
- Publisher:
- Uniwersytet Warmińsko-Mazurski w Olsztynie
- Subjects:
-
machine learning
web scraping
granularity method
classification
- Description:
- The main aim of this paper is to evaluate crawlers that collect job offers from websites. In particular, the research checks how effectively ensemble machine learning methods can assess the validity of the position extracted from a job ad. Moreover, to reduce the training time of the algorithms (Random Forests and XGBoost), granularity methods were also tested as a way to significantly shrink the input training dataset. Both methods achieved satisfactory results, with accuracy and F1 measures exceeding 96%. In addition, granulation reduced the input dataset by more than 99%, while the results obtained were only slightly worse (accuracy lower by 1% to 5%, F1 lower by 3% to 8%). Thus, it can be concluded that the considered methods can be used in the evaluation of job web crawlers.
- Source:
-
Technical Sciences / University of Warmia and Mazury in Olsztyn; 2023, 26(1); 125-140
1505-4675
2083-4527
- Appears in:
- Technical Sciences / University of Warmia and Mazury in Olsztyn
- Content provider:
- Biblioteka Nauki