Subject: web data - OPAC collections catalogue


Title:: Current challenges and possible big data solutions for the use of web data as a source for official statistics
Współczesne wyzwania i możliwości w zakresie stosowania narzędzi big data do uzyskania danych webowych jako źródła dla statystyki publicznej
Authors:: Daas, Piet
Maślankowski, Jacek
Links:: https://bibliotekanauki.pl/articles/31232088.pdf
Publication date:: 2023-12-29
Publisher:: Główny Urząd Statystyczny
Subjects:: big data
web data
websites
web scraping
dane webowe
strony internetowe
Description:: Web scraping has become popular in scientific research, especially in statistics. Preparing an appropriate IT environment for web scraping is currently not difficult and can be done relatively quickly. Extracting data in this way requires only basic IT skills. This has resulted in the increased use of this type of data, widely referred to as big data, in official statistics. Over the past decade, much work has been done in this area both at the national level, within the national statistical institutes, and at the international level, by Eurostat. The aim of this paper is to present and discuss current problems related to accessing, extracting, and using information from websites, along with suggested potential solutions. To support the analysis, the paper presents a case study of large-scale web scraping performed in 2022 by means of big data tools. The results of the case study, conducted on a total population of approximately 503,700 websites, demonstrate that it is not possible to provide reliable data on the basis of such a large sample, as typically up to 20% of the websites might not be accessible at the time of the survey. What is more, it is not possible to know the exact number of active websites in particular countries, due to the dynamic nature of the Internet, which causes websites to continuously change.
Source:: Wiadomości Statystyczne. The Polish Statistician; 2023, 68, 12; 49-64
0043-518X
Appears in:: Wiadomości Statystyczne. The Polish Statistician
Content provider:: Biblioteka Nauki

Article


Title:: Porównanie cen i wskaźników cen konsumpcyjnych: tradycyjna metoda uzyskiwania danych a źródła alternatywne
Comparison of prices and consumer price indices: traditional data collection and alternative data sources
Authors:: Białek, Jacek
Dominiczak-Astin, Alina
Turek, Dorota
Links:: https://bibliotekanauki.pl/articles/1813758.pdf
Publication date:: 2021-09-30
Publisher:: Główny Urząd Statystyczny
Subjects:: wskaźniki cen
dane skanowane
dane skrapowane
inflacja
price indices
scanner data
web scraped data
inflation
Description:: One of the major challenges official statistics faces in the 21st century is the use of alternative sources of price data in order to modernise consumer price statistics and, as a result, to improve the accuracy and reliability of inflation data. Data collection based on the traditional method encountered numerous difficulties caused by COVID-19 (distance-keeping restrictions limiting price collectors' fieldwork, closures of points of sale). As a consequence, the work on alternative data sources intensified. The article presents the results of an experimental study involving the use of prices collected by means of the traditional method (by price collectors), and scanner and web scraped data from one of the retail chains operating in Poland. The aim of the study was to establish whether prices and price indices of selected food products differ when calculated using the traditional method and using alternative data sources, i.e. scanner and web scraped data, and to estimate the size of these differences. An additional goal was to identify source-based reasons for these differences. The empirical study covered February and March 2021. The results based on data from different sources were compared using both graphical methods (histograms, box plots) and the calculation of elementary price indices according to the Dutot, Carli and Jevons formulas. The findings revealed discrepancies, sometimes serious, in the distributions of prices obtained from various data sources, which suggests that the application of scanner and web scraped data may lead to the over- or understating of price indices obtained via the traditional method. The article also discusses the main methodological aspects of obtaining and applying data from alternative sources, and indicates the probable causes of the differences observed both in distributions of product prices and in monthly price indices calculated using data from various sources.
Source:: Wiadomości Statystyczne. The Polish Statistician; 2021, 66, 9; 32-69
0043-518X
Appears in:: Wiadomości Statystyczne. The Polish Statistician
Content provider:: Biblioteka Nauki

Article
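
The Dutot, Carli and Jevons formulas named in the description above are standard elementary price index formulas: the ratio of arithmetic mean prices, the arithmetic mean of price relatives, and the geometric mean of price relatives, respectively. The following is a minimal sketch only; the prices are made up for illustration and are not taken from the article, and the function names are hypothetical.

```python
from math import prod


def dutot(p0, p1):
    """Dutot index: ratio of the arithmetic mean prices of the two periods."""
    return (sum(p1) / len(p1)) / (sum(p0) / len(p0))


def carli(p0, p1):
    """Carli index: arithmetic mean of the price relatives p1/p0."""
    return sum(b / a for a, b in zip(p0, p1)) / len(p0)


def jevons(p0, p1):
    """Jevons index: geometric mean of the price relatives p1/p0."""
    return prod(b / a for a, b in zip(p0, p1)) ** (1 / len(p0))


# Hypothetical prices of three matched products in a base and a current month.
base_prices = [2.0, 4.0, 8.0]
current_prices = [4.0, 4.0, 4.0]

print("Dutot: ", round(dutot(base_prices, current_prices), 4))   # 0.8571
print("Carli: ", round(carli(base_prices, current_prices), 4))   # 1.1667
print("Jevons:", round(jevons(base_prices, current_prices), 4))  # 1.0
```

On the same matched prices the three formulas can disagree even about the direction of the price change, which is one reason results are usually reported for more than one formula.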


Title:: Analysis of the cadastral data published in the Polish Spatial Data Infrastructure
Authors:: Izdebski, W.
Links:: https://bibliotekanauki.pl/articles/145438.pdf
Publication date:: 2017
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: dane katastralne
usługi sieciowe
mapa internetowa
web services
Web Map Service
spatial data infrastructure
cadastral data
Description:: Cadastral data, including land parcels, are the basic reference data for presenting various objects collected in spatial databases. Easy access to up-to-date records is very important for the individuals and institutions using spatial data infrastructure. The primary objective of the study was to check the current accessibility of cadastral data and to verify how current and complete they are. The author started researching this topic in 2007, i.e. from the moment the Team for National Spatial Data Infrastructure developed documentation concerning the standard for publishing cadastral data with the use of the WMS. For ten years, the author has monitored the status of cadastral data publishing in various districts and has participated in data publishing in many of them. In 2017, with only half of the districts publishing WMS services from cadastral data, two questions arise: why is this so, and how can this unfavourable situation be changed? The tests performed showed that the status of cadastral data publishing is still far from perfect. The quality of the offered web services varies and, unfortunately, many services offer poor performance; moreover, there are plenty of services that do not operate at all.
Source:: Geodesy and Cartography; 2017, 66, 2; 227-240
2080-6736
2300-2581
Appears in:: Geodesy and Cartography
Content provider:: Biblioteka Nauki

Article


Title:: Mailing Lists Archives Analyzer
Authors:: Rzecki, K.
Riegel, M.
Links:: https://bibliotekanauki.pl/articles/93058.pdf
Publication date:: 2006
Publisher:: Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Subjects:: e-mail header
data analyzing
web mining
Description:: The article describes the possibility of exploring data hidden in the headers of e-mails taken from mailing list archives. The scientific part of the article presents a way of transforming information enclosed in Internet resources, explains the idea of a mailing list archive, and points out what knowledge can be extracted from it. The technical part presents an implemented, working system that analyzes the headers of e-mail messages stored in mailing list archives. Some example results of this experiment are also given.
Source:: Studia Informatica : systems and information technology; 2006, 1(7); 117-125
1731-2264
Appears in:: Studia Informatica : systems and information technology
Content provider:: Biblioteka Nauki

Article


Title:: Integracja usług sieciowych z uwzględnieniem poziomu wiarygodności ich dostawców
WEB services integration considering the level of vendors believability
Authors:: Kaczmarek, A.
Links:: https://bibliotekanauki.pl/articles/266675.pdf
Publication date:: 2010
Publisher:: Politechnika Gdańska. Wydział Elektrotechniki i Automatyki
Subjects:: wiarygodność danych
usługi sieciowe
data believability
web services
Description:: This paper is concerned with the believability of data acquired from web services. It presents a method for estimating data believability based on four metrics: information commonality, source independence, prestige of the source, and experience with the source. The method supports the integration of web services provided by various vendors and makes it possible to automatically determine the level of data believability on the basis of data provenance.
Source:: Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej; 2010, 28; 69-72
1425-5766
2353-1290
Appears in:: Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej
Content provider:: Biblioteka Nauki

Article


Title:: Distributed web service repository
Authors:: Nawrocki, P.
Mamla, A.
Links:: https://bibliotekanauki.pl/articles/305281.pdf
Publication date:: 2015
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: web service
repository
heterogeneity
data replication
node balancing
Description:: The increasing availability and popularity of computer systems has resulted in a demand for new language- and platform-independent ways of data exchange. This demand has, in turn, led to significant growth in the importance of systems based on Web services. Alongside the growing number of systems accessible via Web services came the need for specialized data repositories that could offer effective means of searching the available services. The development of mobile systems and wireless data transmission technologies has allowed us to use distributed devices and computer systems on a greater scale. The accelerating growth of distributed systems might be a good reason to consider the development of distributed Web service repositories with built-in mechanisms for data migration and synchronization.
Source:: Computer Science; 2015, 16 (1); 55-73
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article


Title:: Data presentation on the map in Google Charts and jQuery JavaScript technologies
Prezentacja danych na mapie w technologiach Google Charts oraz jQuery JavaScript
Authors:: Król, K.
Links:: https://bibliotekanauki.pl/articles/100414.pdf
Publication date:: 2016
Publisher:: Uniwersytet Rolniczy im. Hugona Kołłątaja w Krakowie
Subjects:: data visualisation
mashup
web cartography
wizualizacja danych
kartografia internetowa
Description:: The article presents selected software development tools and technologies that enable the presentation of statistical data on digital maps in the browser. The aim of the study was to describe them and to conduct their comparative evaluation. The study used ad hoc tests of usability and functionality, carried out using the technique of self-evaluation. Based on the criteria of global popularity and availability, the following were subjected to ad hoc tests: Google Visualization (Geomap and Geo Chart) and selected solutions developed on the basis of jQuery JavaScript. In conclusion, it has been demonstrated that the tested design and development technologies are complementary, while the selection of tools for implementing the adopted design assumptions remains at the discretion of the user.
Source:: Geomatics, Landmanagement and Landscape; 2016, 2; 91-106
2300-1496
Appears in:: Geomatics, Landmanagement and Landscape
Content provider:: Biblioteka Nauki

Article


Title:: Data mining
Authors:: Morzy, Tadeusz
Links:: https://bibliotekanauki.pl/articles/703139.pdf
Publication date:: 2007
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: data mining
data analysis
evolution of information technology
association analysis
classification
clustering
Web mining
Description:: Recent advances in data capture, data transmission and data storage technologies have resulted in a growing gap between more powerful database systems and users' ability to understand and effectively analyze the information collected. Many companies and organizations gather gigabytes or terabytes of business transactions, scientific data, web logs, satellite pictures and text reports, which are simply too large and too complex to support a decision-making process. Traditional database and data warehouse querying models are not sufficient to extract trends, similarities and correlations hidden in very large databases. The value of the existing databases and data warehouses can be significantly enhanced with the help of data mining. Data mining is a new research area which aims at nontrivial extraction of implicit, previously unknown and potentially useful information from large databases and data warehouses. Data mining, also referred to as database mining or knowledge discovery in databases, can help answer business questions that were too time-consuming to resolve with traditional data processing techniques. The process of mining the data can be perceived as a new way of querying, with questions such as "which clients are likely to respond to our next promotional mailing, and why?". The aim of this paper is to present an overall picture of the data mining field and to briefly present a few data mining methods. Finally, we summarize the concepts presented in the paper and discuss some problems related to data mining technology.
Source:: Nauka; 2007, 3
1231-8515
Appears in:: Nauka
Content provider:: Biblioteka Nauki

Article


Title:: Web-based software system for processing bilingual digital resources
Authors:: Dutsova, Ralitsa
Links:: https://bibliotekanauki.pl/articles/677188.pdf
Publication date:: 2014
Publisher:: Polska Akademia Nauk. Instytut Slawistyki PAN
Subjects:: aligned corpus
concordance
data mining
dictionary entry
digital dictionary
search tool
web-interface
web-application
Description:: The article describes a software management system developed at the Institute of Mathematics and Informatics, BAS, for the creation, storing and processing of digital language resources in Bulgarian. Independent components of the system are intended for the creation and management of bilingual dictionaries, for information retrieval and data mining from a bilingual dictionary, and for the presentation of aligned corpora. A module which connects these components is also being developed. The system, implemented as a web application, contains tools for compilation, editing and search within all components.
Source:: Cognitive Studies; 2014, 14
2392-2397
Appears in:: Cognitive Studies
Content provider:: Biblioteka Nauki

Article


Title:: Propozycja metodyki oceny poziomu automatyzacji powiatowego zasobu geodezyjnego i kartograficznego
Proposal for the methodology of automation assessment of geodetic and cartographic resource
Authors:: Izdebski, W.
Links:: https://bibliotekanauki.pl/articles/371889.pdf
Publication date:: 2017
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: dane przestrzenne
infrastruktura danych przestrzennych
usługi sieciowe
spatial data
spatial data infrastructure (SDI)
web services
Description:: Under the applicable laws, the data contained in the county geodetic and cartographic resource are a key element of the national spatial data infrastructure. The methods and means used to maintain the resource have always been appropriate to the technical means available. The current state of technology brings many new opportunities for improving the functioning of the resource, above all the possibility of automating it. To compare the levels of automation in individual counties, a methodology for assessing them is needed, and this article presents a proposal for such a methodology developed by the author.
Source:: Zeszyty Naukowe. Inżynieria Środowiska / Uniwersytet Zielonogórski; 2017, 165 (45); 27-35
1895-7323
Appears in:: Zeszyty Naukowe. Inżynieria Środowiska / Uniwersytet Zielonogórski
Content provider:: Biblioteka Nauki

Article


Title:: The Integration, Analysis and Visualization of Sensor Data from Dispersed Wireless Sensor Network Systems Using the SWE Framework
Authors:: Lee, Y. J.
Trevathan, J.
Atkinson, I.
Read, W.
Links:: https://bibliotekanauki.pl/articles/308227.pdf
Publication date:: 2015
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: environment data
environmental monitoring
sensor technologies
standardization
web-based visualization
Description:: Wireless Sensor Networks (WSNs) have been used in numerous applications to remotely gather real-time data on important environmental parameters. There are several projects where WSNs are deployed in different locations and operate independently. Each deployment has its own models, encodings, and services for sensor data, and is integrated with different types of visualization/analysis tools based on individual project requirements. This makes it difficult to reuse these services for other WSN applications. A user/system is impeded by having to learn the models, encodings, and services of each system, and must also integrate/interoperate data from different data sources. Sensor Web Enablement (SWE) provides a set of standards (web service interfaces and data encoding/model specifications) to make sensor data publicly available on the web. This paper describes how the SWE framework can be extended to integrate disparate WSN systems and to support standardized access to sensor data. The proposed system also introduces a web-based data visualization and statistical analysis service for data stored in the Sensor Observation Service (SOS) by integrating open source technologies. A performance analysis is presented to show that the additional features have minimal impact on the system. Some lessons learned through implementing SWE are also discussed.
Source:: Journal of Telecommunications and Information Technology; 2015, 4; 86-97
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article


Title:: Interactive cloud data farming environment for military mission planning support
Authors:: Kryza, B.
Król, D.
Wrzeszcz, M.
Dutka, L.
Kitowski, J.
Links:: https://bibliotekanauki.pl/articles/305448.pdf
Publication date:: 2012
Publisher:: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:: data farming
cloud
virtualisation
Web 2.0
mission planning support
Description:: In a modern globalised world, military and peacekeeping forces often face situations which require very subtle and well-planned operations that take into account the cultural and social aspects of a given region and its population, as well as dynamic psychological awareness related to recent events which can have an impact on the attitude of civilians. The goal of the EUSAS project is to develop a prototype of a system enabling mission planning support and training capabilities for soldiers and police forces dealing with asymmetric threat situations, such as crowd control in urban territory. In this paper, we discuss the data-farming infrastructure developed for this project, which allows large amounts of data to be generated from agent-based simulations for further analysis, supporting soldier training and the evaluation of possible outcomes of different rules of engagement.
Source:: Computer Science; 2012, 13 (3); 89-100
1508-2806
2300-7036
Appears in:: Computer Science
Content provider:: Biblioteka Nauki

Article


Title:: Pozyskiwanie i analiza danych na temat ofert pracy z wykorzystaniem big data
The collection and analysis of the data on job advertisements with the use of big data
Authors:: Maślankowski, Jacek
Links:: https://bibliotekanauki.pl/articles/962829.pdf
Publication date:: 2019
Publisher:: Główny Urząd Statystyczny
Subjects:: big data
text mining
web scraping
rynek pracy
labour market
Description:: The goal of this paper is to present, on the one hand, the benefits for official statistics (labour market) resulting from the use of web scraping methods to gather data on job advertisements from websites classified as big data sources, and on the other, the challenges connected with this process. The paper presents the results of experimental research in which web scraping and text mining methods were used. The analysis was based on data from 2017–2018 obtained from the most popular job-search websites, which were then compared with Statistics Poland's data obtained from Z-05 forms. The analysis demonstrated that web scraping methods can be adopted in official statistics to obtain statistical data from alternative sources complementing the existing statistical databases, provided that consistency with the existing surveys is maintained.
Source:: Wiadomości Statystyczne. The Polish Statistician; 2019, 64, 9; 60-74
0043-518X
Appears in:: Wiadomości Statystyczne. The Polish Statistician
Content provider:: Biblioteka Nauki

Article


Title:: The use of web-scraped data to analyze the dynamics of footwear prices
Authors:: Juszczak, Adam
Links:: https://bibliotekanauki.pl/articles/2027264.pdf
Publication date:: 2021
Publisher:: Uniwersytet Ekonomiczny w Katowicach
Subjects:: Big data
Consumer Price Index
Inflation
Online shopping
Web-scraping
Description:: Aim/purpose – Web-scraping is a technique used to automatically extract data from websites. With the rise of online shopping, it allows the acquisition of information about prices of goods sold by retailers such as supermarkets or internet shops. This study examines the possibility of using web-scraped data from one clothing store. It aims at comparing known price index formulas applied to the web-scraping case and verifying their sensitivity to the choice of data filter type. Design/methodology/approach – The author uses price data scraped from one of the biggest online shops in Poland. The data were obtained as part of the eCPI (electronic Consumer Price Index) project conducted by the National Bank of Poland. The author selected three types of products for this analysis (female ballerinas, male shoes, and male oxfords) to compare their prices over a period of more than one year. Six price indexes were used for calculation: the Jevons and Dutot indexes with their chain and GEKS (an acronym from the names of its creators, Gini–Éltető–Köves–Szulc) versions. Apart from the analysis conducted on the full data set, the author introduced filters to remove outliers. Findings – Clothing and footwear are considered one of the most difficult groups of goods for measuring price change indexes due to high product churn, which undermines the possibility of using the traditional Jevons and Dutot indexes. However, it is possible to use chained indexes and GEKS indexes instead. Still, these indexes are fairly sensitive to large price changes. As observed in the case of both product groups, the results provided by the GEKS and chained versions of the indexes were different, which could lead to the conclusion that even though they yield promising results, they could be better suited for other COICOP (Classification of Individual Consumption by Purpose) groups.
Research implications/limitations – The findings of the paper showed that the use of filters did not significantly reduce the difference between price indexes based on the GEKS and chain formulas. Originality/value/contribution – The use of web-scraped data is a fairly new topic in the literature. Research on the possibility of using different price indexes provides useful insights for the future use of these data by statistical offices.
Source:: Journal of Economics and Management; 2021, 43; 251-269
1732-1948
Appears in:: Journal of Economics and Management
Content provider:: Biblioteka Nauki

Article
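
The description above points to high product churn as the reason chained indexes are used instead of a direct comparison with a fixed base period. As a rough sketch of the general idea only, and not of the author's implementation, a chained Jevons index can be built by multiplying period-on-period Jevons links, each computed over the products priced in both adjacent periods. The product identifiers and prices below are hypothetical.

```python
from math import prod


def jevons_matched(prev, curr):
    """Jevons link over products priced in both periods (dict: product id -> price)."""
    matched = prev.keys() & curr.keys()
    if not matched:
        raise ValueError("no matched products between the two periods")
    return prod(curr[k] / prev[k] for k in matched) ** (1 / len(matched))


def chained_jevons(periods):
    """Cumulative index series: running product of period-on-period Jevons links."""
    index = 1.0
    series = [index]
    for prev, curr in zip(periods, periods[1:]):
        index *= jevons_matched(prev, curr)
        series.append(index)
    return series


# Hypothetical scraped prices for three consecutive periods with product churn.
periods = [
    {"A": 100.0, "B": 200.0},
    {"A": 110.0, "B": 200.0, "C": 50.0},  # product C enters the assortment
    {"B": 220.0, "C": 50.0},              # product A leaves the assortment
]

print([round(x, 4) for x in chained_jevons(periods)])  # approximately [1.0, 1.0488, 1.1]
```

Each link uses every product available in its two adjacent periods, whereas a direct comparison of the first and last period in this example could rely on product B alone.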


Title:: A conceptual model for data quality management in a data system on the World Wide Web
Konceptualny model zarządzania jakością informacji w systemie informacyjnym w sieci WWW
Authors:: Czerwiński, Adam
Links:: https://bibliotekanauki.pl/articles/435370.pdf
Publication date:: 2014-12
Publisher:: Uniwersytet Opolski
Subjects:: data quality
management model
data system
Web service
jakość informacji
model zarządzania
system informacyjny
serwis WWW
Description:: The article presents a conceptual model for data quality management, where data quality is treated as the usability of the data or the compliance of the data product with its specification. The proposed model refers to Wang's well-known TDQM model, which is based on Deming's quality improvement cycle. However, the TDQM model does not take into account the impact of the Internet environment on the quality of the data provided by systems on the Web. The author's model presented in this article takes into account the impact of the Internet on all aspects resulting from the functions of data in society and organizations. Therefore, it takes into consideration the aspect of promoting data quality management processes, the communication aspect, and the aspect of enriching individual and collective knowledge. The model also takes into account the fact that the impact of the known properties of the Internet (defined, for example, with the acronym MEDIA) refers primarily to the contextual quality characteristics of data on the Web and concerns only to a small degree the internal quality of information pieces described by such features as accuracy, consistency, complexity and precision.
Source:: Economic and Environmental Studies; 2014, 14, 4(32); 361-373
1642-2597
2081-8319
Appears in:: Economic and Environmental Studies
Content provider:: Biblioteka Nauki

Article
