- Title:
- Automatic wrapper generation and generalization for social media websites
- Authors:
-
Baziński, B.
Brzezicki, M.
- Links:
- https://bibliotekanauki.pl/articles/206411.pdf
- Publication date:
- 2012
- Publisher:
- Polska Akademia Nauk. Instytut Badań Systemowych PAN
- Subjects:
-
automatic wrapper generation
information extraction
- Description:
- The data contained within user-generated content websites prove to be valuable in many applications, for example in social media monitoring or in the acquisition of training sets for machine learning algorithms. Mining such data is especially difficult in the case of web forums, because hundreds of different forum engines are in use. We propose an algorithm capable of unsupervised extraction of posts from social websites, without the need to analyse more than one page in advance. Our method localizes potential data regions by repetition analysis within the document structure and by filtering the potential results. Subsequently, the fields of data records are found using key characteristics and series-wide dependencies. We achieved 85% precision of extraction and 79% recall in experiments on single pages taken from 258 websites. Our solution is characterized by high computing efficiency, thus enabling wide applications.
- Source:
-
Control and Cybernetics; 2012, 41, 4; 817-834
0324-8569
- Appears in:
- Control and Cybernetics
- Content provider:
- Biblioteka Nauki
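
The description in this record mentions locating data regions by repetition analysis within the document structure. The sketch below illustrates that general idea only and is not the authors' algorithm: it assumes that a data region is the element whose direct children most often share the same (tag, class) signature. The names `find_data_region`, `TreeBuilder`, `Node`, and the `min_repeats` threshold are hypothetical and chosen for this example; the filtering and field-detection steps from the description are not covered.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal DOM node: tag name, class attribute, parent and children."""

    def __init__(self, tag, cls, parent=None):
        self.tag = tag
        self.cls = cls
        self.parent = parent
        self.children = []

    @property
    def signature(self):
        return (self.tag, self.cls)


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree from HTML (text nodes are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", "")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs).get("class") or "", self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def find_data_region(html, min_repeats=3):
    """Return (parent signature, repeated child signature, count) or None.

    Assumption for this sketch: the best candidate is the element with the
    highest number of direct children sharing one (tag, class) signature.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    best = None
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(child.signature for child in node.children)
        if not counts:
            continue
        signature, count = counts.most_common(1)[0]
        if count >= min_repeats and (best is None or count > best[2]):
            best = (node.signature, signature, count)
    return best
```

On a page where a container holds several sibling elements with the same tag and class (for example forum posts), the function returns that container's signature together with the repeated child signature. A real extractor would need additional filtering, since navigation menus and link lists repeat in the same way.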