You are searching for the phrase "web data" by criterion: Topic


Title:
Current challenges and possible big data solutions for the use of web data as a source for official statistics
Authors:
Daas, Piet
Maślankowski, Jacek
Links:
https://bibliotekanauki.pl/articles/31232088.pdf
Publication date:
2023-12-29
Publisher:
Główny Urząd Statystyczny
Topics:
big data
web data
websites
web scraping
Description:
Web scraping has become popular in scientific research, especially in statistics. Preparing an appropriate IT environment for web scraping is currently not difficult and can be done relatively quickly, and extracting data in this way requires only basic IT skills. This has resulted in the increased use of this type of data, widely referred to as big data, in official statistics. Over the past decade, much work has been done in this area, both at the national level within the national statistical institutes and at the international level by Eurostat. The aim of this paper is to present and discuss current problems related to accessing, extracting, and using information from websites, along with suggested potential solutions. For the sake of the analysis, a case study featuring large-scale web scraping performed in 2022 by means of big data tools is presented in the paper. The results of the case study, conducted on a total population of approximately 503,700 websites, demonstrate that it is not possible to provide reliable data on the basis of such a large sample, as typically up to 20% of the websites might not be accessible at the time of the survey. Moreover, the exact number of active websites in particular countries cannot be known, due to the dynamic nature of the Internet, which causes websites to change continuously.
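The accessibility problem described above is easy to reproduce on a small scale. Below is a minimal sketch of such a probe, assuming a hypothetical list of site URLs; the library and endpoints are illustrative, not those used in the study.

```python
import requests

# Hypothetical sample of site URLs to probe; the study's frame
# covered approximately 503,700 sites.
urls = ["https://example.com", "https://example.org"]

unreachable = 0
for url in urls:
    try:
        # HEAD keeps the probe cheap; some servers reject it, in which
        # case a GET fallback would be needed.
        r = requests.head(url, timeout=10, allow_redirects=True)
        if r.status_code >= 400:
            unreachable += 1
    except requests.RequestException:
        unreachable += 1

print(f"unreachable: {unreachable}/{len(urls)} "
      f"({100 * unreachable / len(urls):.1f}%)")
```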
Source:
Wiadomości Statystyczne. The Polish Statistician; 2023, 68, 12; 49-64
0043-518X
Appears in:
Wiadomości Statystyczne. The Polish Statistician
Content provider:
Biblioteka Nauki
Article
Title:
Analysis of the cadastral data published in the Polish Spatial Data Infrastructure
Authors:
Izdebski, W.
Links:
https://bibliotekanauki.pl/articles/145438.pdf
Publication date:
2017
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
web map
web services
Web Map Service
spatial data infrastructure
cadastral data
Description:
The cadastral data, including land parcels, are the basic reference data for presenting various objects collected in spatial databases. Easy access to up-to-date records is a very important matter for the individuals and institutions using the spatial data infrastructure. The primary objective of the study was to check the current accessibility of cadastral data, as well as to verify how current and complete they are. The author started researching this topic in 2007, i.e. from the moment the Team for National Spatial Data Infrastructure developed documentation concerning the standard of publishing cadastral data with the use of WMS. For ten years, the author monitored the status of cadastral data publishing in various districts and participated in data publishing in many of them. In 2017, when only half of the districts published WMS services based on cadastral data, two questions arose: why is this so, and how can this unfavourable situation be changed? As a result of the tests performed, it was found that the status of publishing cadastral data is still far from perfect. The quality of the offered web services varies and, unfortunately, many services offer poor performance; moreover, plenty of services do not operate at all.
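Checking whether a district's WMS endpoint responds at all is a one-request job. A minimal sketch, assuming a hypothetical endpoint URL; the parameters follow the standard OGC WMS GetCapabilities key-value binding.

```python
import time
import requests

# Hypothetical district WMS endpoint; real addresses vary by district.
WMS_URL = "https://example-district.pl/wms"

params = {"SERVICE": "WMS", "REQUEST": "GetCapabilities", "VERSION": "1.3.0"}

start = time.monotonic()
try:
    r = requests.get(WMS_URL, params=params, timeout=15)
    elapsed = time.monotonic() - start
    # A working WMS 1.3.0 service returns an XML capabilities document.
    ok = r.status_code == 200 and "WMS_Capabilities" in r.text
    print(f"status={r.status_code} capabilities={ok} time={elapsed:.2f}s")
except requests.RequestException as exc:
    print(f"service unavailable: {exc}")
```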
Source:
Geodesy and Cartography; 2017, 66, 2; 227-240
2080-6736
2300-2581
Appears in:
Geodesy and Cartography
Content provider:
Biblioteka Nauki
Article
Title:
Mailing Lists Archives Analyzer
Authors:
Rzecki, K.
Riegel, M.
Links:
https://bibliotekanauki.pl/articles/93058.pdf
Publication date:
2006
Publisher:
Uniwersytet Przyrodniczo-Humanistyczny w Siedlcach
Topics:
e-mail header
data analysis
web mining
Description:
The article describes the opportunity to explore data hidden in the headers of e-mails taken from mailing list archives. The scientific part of the article presents a way of transforming information enclosed in Internet resources, explains the idea of a mailing list archive, and points out the knowledge that can be extracted from it. The technical part presents an implemented, working system that analyzes the headers of e-mail messages stored in mailing list archives. Some example results of this experiment are also given.
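Python's standard library is enough to reproduce the core of such an analysis. A small sketch, assuming a hypothetical mbox-format archive file; headers beyond From (such as X-Mailer) are only sometimes present.

```python
import mailbox
from collections import Counter

# Hypothetical path to a mailing-list archive in mbox format.
archive = mailbox.mbox("list-archive.mbox")

senders = Counter()
clients = Counter()
for msg in archive:
    senders[msg.get("From", "unknown")] += 1
    # X-Mailer, when present, hints at the mail client used.
    clients[msg.get("X-Mailer", "unknown")] += 1

print(senders.most_common(5))
print(clients.most_common(5))
```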
Source:
Studia Informatica : systems and information technology; 2006, 1(7); 117-125
1731-2264
Appears in:
Studia Informatica : systems and information technology
Content provider:
Biblioteka Nauki
Article
Title:
Distributed web service repository
Authors:
Nawrocki, P.
Mamla, A.
Links:
https://bibliotekanauki.pl/articles/305281.pdf
Publication date:
2015
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
web service
repository
heterogeneity
data replication
node balancing
Description:
The increasing availability and popularity of computer systems have resulted in a demand for new language- and platform-independent ways of data exchange. This demand has, in turn, led to significant growth in the importance of systems based on Web services. Alongside the growing number of systems accessible via Web services came the need for specialized data repositories that could offer effective means of searching the available services. The development of mobile systems and wireless data transmission technologies has allowed us to use distributed devices and computer systems on a greater scale. The accelerating growth of distributed systems might be a good reason to consider the development of distributed Web service repositories with built-in mechanisms for data migration and synchronization.
Source:
Computer Science; 2015, 16 (1); 55-73
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Data presentation on the map in Google Charts and jQuery JavaScript technologies
Authors:
Król, K.
Links:
https://bibliotekanauki.pl/articles/100414.pdf
Publication date:
2016
Publisher:
Uniwersytet Rolniczy im. Hugona Kołłątaja w Krakowie
Topics:
data visualisation
mashup
web cartography
Description:
The article presents selected software development tools and technologies that enable the presentation of statistical data on digital maps in the browser. The aim of the study was to describe them and to conduct their comparative evaluation. The study used ad hoc tests of usability and functionality, carried out using the technique of self-evaluation. Based on the criteria of global popularity and availability, the following were subjected to ad hoc tests: Google Visualization – Geomap and Geo Chart, as well as selected solutions developed on the basis of the jQuery JavaScript library. In conclusion, it has been demonstrated that the tested design and development technologies are complementary, while the selection of tools to implement the assumed design principles remains at the discretion of the user.
Source:
Geomatics, Landmanagement and Landscape; 2016, 2; 91-106
2300-1496
Appears in:
Geomatics, Landmanagement and Landscape
Content provider:
Biblioteka Nauki
Article
Title:
Data mining
Authors:
Morzy, Tadeusz
Links:
https://bibliotekanauki.pl/articles/703139.pdf
Publication date:
2007
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
data mining
data analysis
evolution of information technology
association analysis
classification
clustering
Web mining
Description:
Recent advances in data capture, data transmission and data storage technologies have resulted in a growing gap between more powerful database systems and users' ability to understand and effectively analyze the information collected. Many companies and organizations gather gigabytes or terabytes of business transactions, scientific data, web logs, satellite pictures, and text reports, which are simply too large and too complex to support a decision-making process. Traditional database and data warehouse querying models are not sufficient to extract trends, similarities and correlations hidden in very large databases. The value of the existing databases and data warehouses can be significantly enhanced with the help of data mining. Data mining is a new research area which aims at the nontrivial extraction of implicit, previously unknown and potentially useful information from large databases and data warehouses. Data mining, also referred to as database mining or knowledge discovery in databases, can help answer business questions that were too time-consuming to resolve with traditional data processing techniques. The process of mining the data can be perceived as a new way of querying – with questions such as "which clients are likely to respond to our next promotional mailing, and why?". The aim of this paper is to present an overall picture of the data mining field as well as to briefly present a few data mining methods. Finally, we summarize the concepts presented in the paper and discuss some problems related to data mining technology.
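A toy version of the promotional-mailing question above, using a decision tree (one of the classification methods such surveys cover). The records and feature names are made up for illustration; the point is that the fitted tree also yields the "why" as readable rules.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic client records: [age, past_purchases, months_since_last_order].
X = [[25, 1, 2], [40, 8, 1], [33, 4, 6], [58, 2, 12], [45, 9, 1], [29, 0, 9]]
y = [0, 1, 1, 0, 1, 0]  # 1 = responded to a past promotional mailing

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# "Which clients are likely to respond?"
print(clf.predict([[35, 5, 2]]))
# "...and why?" - the learned decision rules, printed as text.
print(export_text(clf, feature_names=[
    "age", "past_purchases", "months_since_last_order"]))
```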
Source:
Nauka; 2007, 3
1231-8515
Appears in:
Nauka
Content provider:
Biblioteka Nauki
Article
Title:
Web-based software system for processing bilingual digital resources
Authors:
Dutsova, Ralitsa
Links:
https://bibliotekanauki.pl/articles/677188.pdf
Publication date:
2014
Publisher:
Polska Akademia Nauk. Instytut Slawistyki PAN
Topics:
aligned corpus
concordance
data mining
dictionary entry
digital dictionary
search tool
web interface
web application
Description:
The article describes a software management system developed at the Institute of Mathematics and Informatics, BAS, for the creation, storing and processing of digital language resources in Bulgarian. Independent components of the system are intended for the creation and management of bilingual dictionaries, for information retrieval and data mining from a bilingual dictionary, and for the presentation of aligned corpora. A module which connects these components is also being developed. The system, implemented as a web application, contains tools for compilation, editing and search within all components.
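A minimal sketch of the kind of dictionary-entry structure and lookup such a search component might expose; the schema and the sample entries are invented for illustration, not taken from the system described.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    headword: str                                   # source-language headword
    translations: list[str] = field(default_factory=list)

# Invented Bulgarian-English fragment, for illustration only.
entries = [
    Entry("дума", ["word"]),
    Entry("речник", ["dictionary", "vocabulary"]),
]

def search(term: str) -> list[Entry]:
    """Match the term against headwords and translations."""
    return [e for e in entries
            if term in e.headword
            or any(term in t for t in e.translations)]

print(search("речник"))   # lookup by headword
print(search("word"))     # reverse lookup by translation
```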
Source:
Cognitive Studies; 2014, 14
2392-2397
Appears in:
Cognitive Studies
Content provider:
Biblioteka Nauki
Article
Title:
The Integration, Analysis and Visualization of Sensor Data from Dispersed Wireless Sensor Network Systems Using the SWE Framework
Authors:
Lee, Y. J.
Trevathan, J.
Atkinson, I.
Read, W.
Links:
https://bibliotekanauki.pl/articles/308227.pdf
Publication date:
2015
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Topics:
environment data
environmental monitoring
sensor technologies
standardization
web-based visualization
Description:
Wireless Sensor Networks (WSNs) have been used in numerous applications to remotely gather real-time data on important environmental parameters. There are several projects where WSNs are deployed in different locations and operate independently. Each deployment has its own models, encodings, and services for sensor data, and is integrated with different types of visualization/analysis tools based on individual project requirements. This makes it difficult to reuse these services for other WSN applications. A user/system is impeded by having to learn the models, encodings, and services of each system, and must also integrate and interoperate data from different data sources. Sensor Web Enablement (SWE) provides a set of standards (web service interfaces and data encoding/model specifications) to make sensor data publicly available on the web. This paper describes how the SWE framework can be extended to integrate disparate WSN systems and to support standardized access to sensor data. The proposed system also introduces a web-based data visualization and statistical analysis service for data stored in the Sensor Observation Service (SOS) by integrating open source technologies. A performance analysis is presented to show that the additional features have minimal impact on the system. Also, some lessons learned through implementing SWE are discussed.
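Standardized access via SOS means any client can pull observations with the same request shape. A minimal sketch, assuming a hypothetical endpoint; the parameter names follow the OGC SOS 2.0 KVP binding, but the offering and observed property are deployment-specific placeholders.

```python
import requests

# Hypothetical SOS endpoint; offerings and observed properties are
# specific to each deployment.
SOS_URL = "https://example.org/sos"

params = {
    "service": "SOS",
    "version": "2.0.0",
    "request": "GetObservation",
    "offering": "weather_station_1",
    "observedProperty": "air_temperature",
}

r = requests.get(SOS_URL, params=params, timeout=30)
print(r.status_code)
print(r.text[:500])  # XML observation document (O&M encoding)
```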
Source:
Journal of Telecommunications and Information Technology; 2015, 4; 86-97
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
Title:
The use of web-scraped data to analyze the dynamics of footwear prices
Authors:
Juszczak, Adam
Links:
https://bibliotekanauki.pl/articles/2027264.pdf
Publication date:
2021
Publisher:
Uniwersytet Ekonomiczny w Katowicach
Topics:
big data
Consumer Price Index
inflation
online shopping
web scraping
Description:
Aim/purpose – Web scraping is a technique used to automatically extract data from websites. Following the rise of online shopping, it allows the acquisition of information about the prices of goods sold by retailers such as supermarkets or internet shops. This study examines the possibility of using web-scraped data from one clothing store. It aims at comparing known price index formulas applied to the web-scraping case and verifying their sensitivity to the choice of data filter type.

Design/methodology/approach – The author uses price data scraped from one of the biggest online shops in Poland. The data were obtained as part of the eCPI (electronic Consumer Price Index) project conducted by the National Bank of Poland. The author selected three types of products for this analysis – female ballerinas, male shoes, and male oxfords – to compare their prices over a one-year period. Six price indexes were used for the calculation – the Jevons and Dutot indexes with their chain and GEKS (an acronym from the names of the creators – Gini–Éltető–Köves–Szulc) versions. Apart from the analysis conducted on the full data set, the author introduced filters to remove outliers.

Findings – Clothing and footwear are considered one of the most difficult groups of goods for measuring price change due to high product churn, which undermines the possibility of using the traditional Jevons and Dutot indexes. However, it is possible to use chained and GEKS indexes instead. Still, these indexes are fairly sensitive to large price changes. As observed for both product groups, the results provided by the GEKS and chained versions of the indexes differed, which could lead to the conclusion that even though they yield promising results, they could be better suited to other COICOP (Classification of Individual Consumption by Purpose) groups.

Research implications/limitations – The findings of the paper showed that the usage of filters did not significantly reduce the difference between price indexes based on the GEKS and chain formulas.

Originality/value/contribution – The usage of web-scraped data is a fairly new topic in the literature. Research on the possibility of using different price indexes provides useful insights for the future usage of these data by statistics offices.
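For reference, the textbook formulations of the indexes named above (standard definitions, not transcribed from the paper): with p_i^t the price of matched product i in period t and n matched products, the Jevons and Dutot bilateral indexes, and the chained and GEKS extensions built from any bilateral index P^{a,b} over a window of periods 0, ..., T:

```latex
% Bilateral indexes over n matched products
P_J^{0,t} = \prod_{i=1}^{n} \left( \frac{p_i^t}{p_i^0} \right)^{1/n}
\qquad
P_D^{0,t} = \frac{\frac{1}{n} \sum_{i=1}^{n} p_i^t}{\frac{1}{n} \sum_{i=1}^{n} p_i^0}

% Chained and GEKS extensions of any bilateral index P^{a,b}
P_{\mathrm{chain}}^{0,t} = \prod_{z=1}^{t} P^{z-1,z}
\qquad
P_{\mathrm{GEKS}}^{0,t} = \prod_{z=0}^{T} \left( P^{0,z} \, P^{z,t} \right)^{\frac{1}{T+1}}
```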
Source:
Journal of Economics and Management; 2021, 43; 251-269
1732-1948
Appears in:
Journal of Economics and Management
Content provider:
Biblioteka Nauki
Article
Title:
A conceptual model for data quality management in a data system on the World Wide Web
Authors:
Czerwiński, Adam
Links:
https://bibliotekanauki.pl/articles/435370.pdf
Publication date:
2014-12
Publisher:
Uniwersytet Opolski
Topics:
data quality
management model
data system
Web service
Description:
The article presents a conceptual model for data quality management, where quality is treated as usability or as the compliance of the data product with its specification. The proposed model refers to the well-known TDQM model of Wang, based on Deming's quality improvement cycle. However, the TDQM model does not take into account the impact of the Internet environment on the quality of the data provided by systems on the Web. The author's model presented in this article takes into account the impact of the Internet on all aspects resulting from the functions of data in society and organizations. It therefore considers the aspect of supporting data quality management processes, the communication aspect, and the aspect of enriching individual and collective knowledge. The model also takes into account the fact that the impact of the known properties of the Internet (defined, for example, with the acronym MEDIA) refers primarily to the contextual quality characteristics of data on the Web and only to a small degree concerns the internal quality of information pieces described by such features as accuracy, consistency, complexity and precision.
Source:
Economic and Environmental Studies; 2014, 14, 4(32); 361-373
1642-2597
2081-8319
Appears in:
Economic and Environmental Studies
Content provider:
Biblioteka Nauki
Article
Title:
The application of web analytics by owners of rural tourism facilities in Poland – diagnosis and an attempt at a measurement
Authors:
Król, Karol
Links:
https://bibliotekanauki.pl/articles/1911954.pdf
Publication date:
2019-12-28
Publisher:
Uniwersytet Przyrodniczy w Poznaniu. Wydawnictwo Uczelniane
Topics:
data analytics
web analytics
rural tourism
optimisation
Google Analytics
digital marketing
Description:
Data analytics changes the way enterprises operate, allowing them to discover new business opportunities and to offer innovative products and services. Data analytics can also be used to boost the performance of rural businesses. Therefore, the purpose of this study is to identify the web analytics tools used by owners of rural tourism establishments in Poland. The study covered 965 websites hosted on selected paid domains. SEOptimer, an online application, and Google Tag Assistant were used to identify the analytical tools. Findings from the study were considered in the context of the search engine optimization level and the scale of business operations. In the sample covered by this study, a form of web analytics was found on 449 websites (46.5%), although a data collection engine was probably implemented on only 425 websites (44%). Google Analytics was the most frequent analytical tool. It was demonstrated that rural tourism establishments used analytical tools to a lesser extent than other businesses which rely on the Internet to promote and sell their products. Moreover, owners of higher-standard rural tourism establishments are more willing to use web analytics.
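Detecting an analytics tool usually comes down to finding its loader snippet in the page source. A rough sketch of the idea, assuming a hypothetical site URL; the study itself used SEOptimer and Google Tag Assistant, which handle far more tools and edge cases.

```python
import re
import requests

# Common, publicly documented Google Analytics markers.
GA_MARKERS = [
    r"google-analytics\.com/analytics\.js",
    r"googletagmanager\.com/gtag/js",
    r"\bga\(\s*['\"]create",
]

def uses_google_analytics(url: str) -> bool:
    """Fetch the page and look for known GA loader patterns."""
    html = requests.get(url, timeout=10).text
    return any(re.search(p, html) for p in GA_MARKERS)

# Hypothetical rural-tourism site URL.
print(uses_google_analytics("https://example-agroturystyka.pl"))
```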
Source:
Journal of Agribusiness and Rural Development; 2019, 54, 4; 319-326
1899-5241
Appears in:
Journal of Agribusiness and Rural Development
Content provider:
Biblioteka Nauki
Article
Title:
Scientometric and Bibliometric Analysis in Analytical Marketing Research
Authors:
Więcek-Janka, Ewa
Szewczuk, Sandra
Links:
https://bibliotekanauki.pl/articles/2168352.pdf
Publication date:
2022
Publisher:
Uniwersytet Marii Curie-Skłodowskiej. Wydawnictwo Uniwersytetu Marii Curie-Skłodowskiej
Topics:
analytical marketing
financial marketing
data-driven marketing
Web of Science
VOSviewer
Description:
Theoretical background: Analytical marketing is at the heart of scientific research because it plays an important role in building the competitiveness of enterprises and is an opportunity for them to grow.
Purpose of the article: The aim of the article is to present the results of a bibliometric analysis of the developing area of analytical marketing.
Research methods: For this purpose, specialist journals published between 1900 and 2021 were searched in the Web of Science database. The scientometric analyses carried out on their basis concern the number of publications, authorship and co-authorship, the number of citations, journals, thematic categories, institutions, countries and keywords. Over 200 publications cited 2,563 times were analyzed.
Main findings: The concept of analytical marketing was taken into account by over 400 authors, with Maria Petrescu authoring the highest number of publications and Michel Wedel being the most significant author in terms of the number of citations. An important role, due to the number of publications in this area, is played by institutions based in the USA (over 50%), including the University of Nevada, Las Vegas (UNLV) and the Nevada System of Higher Education (NSHE). What is more, the conducted research emphasizes the importance of marketing analytics and presents the benefits that stem from using it.
Source:
Annales Universitatis Mariae Curie-Skłodowska, sectio H – Oeconomia; 2022, 56, 1; 143-167
0459-9586
2449-8513
Appears in:
Annales Universitatis Mariae Curie-Skłodowska, sectio H – Oeconomia
Content provider:
Biblioteka Nauki
Article
Title:
A Python library for the Jupyteo IDE Earth observation processing tool enabling interoperability with the QGIS System for use in data science
Authors:
Bednarczyk, Michał
Links:
https://bibliotekanauki.pl/articles/2055774.pdf
Publication date:
2022
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Topics:
Earth observation data processing
IDE
IPython
Jupyter notebook
web processing service
GIS
data science
machine learning
API
Description:
This paper describes JupyQgis – a new Python library for the Jupyteo IDE enabling interoperability with the QGIS system. Jupyteo is an online integrated development environment for Earth observation data processing, available on a cloud platform. It is targeted at remote sensing experts, scientists and users who can develop Jupyter notebooks by reusing embedded open-source tools, WPS interfaces and existing notebooks. In recent years, data science methods have become increasingly popular and the focus of many organizations. Many scientific disciplines are facing a significant transformation due to data-driven solutions. This is especially true of geodesy, environmental sciences, and Earth sciences, where large data sets, such as Earth observation satellite data (EO data) and GIS data, are used. Previous experience in using Jupyteo, both among the users of this platform and its creators, indicates the need to supplement its functionality with GIS analytical tools. This study analyzed the most efficient way to combine the functionality of the QGIS system with the functionality of the Jupyteo platform in one tool. It was found that the most suitable solution is to create a custom library providing an API for collaboration between both environments. The resulting library makes the work much easier and simplifies the source code of the created Python scripts. The functionality of the developed solution was illustrated with a test use case.
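The JupyQgis API itself is not reproduced here. As a hint of what such a bridge has to wrap, below is a minimal headless PyQGIS session of the kind a notebook-side library manages, assuming a hypothetical data path and a local QGIS installation with its Python bindings.

```python
# Not the JupyQgis API; a standard headless PyQGIS session for illustration.
from qgis.core import QgsApplication, QgsVectorLayer

qgs = QgsApplication([], False)   # False = no GUI
# Depending on the installation, QgsApplication.setPrefixPath(...) may be
# needed before initQgis().
qgs.initQgis()

# Hypothetical vector data set.
layer = QgsVectorLayer("parcels.shp", "parcels", "ogr")
if layer.isValid():
    print(layer.featureCount(), "features")

qgs.exitQgis()
```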
Source:
Geomatics and Environmental Engineering; 2022, 16, 1; 117-144
1898-1135
Appears in:
Geomatics and Environmental Engineering
Content provider:
Biblioteka Nauki
Article
Title:
Domain WEB Monitoring
Authors:
Kluska-Nawarecka, S.
Opaliński, A.
Wilk-Kołodziejczyk, D.
Links:
https://bibliotekanauki.pl/articles/381651.pdf
Publication date:
2015
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
web monitoring
foundry industry
data integration
Description:
The last few years have seen a very dynamic development of the Internet worldwide, related to the rapid growth of the amount of information stored in its resources. This vast amount of data, impossible for a human to analyze, is the reason why finding and selecting valuable information from the large number of results returned by search engines has become a very difficult task. Another problem is the low quality of the data contained in a large part of the results returned by search engines. This situation poses serious problems if one searches for detailed information related to a specific area of industry or science. In addition, the lack of effective solutions allowing for continuous monitoring of the Web in search of emerging information, while maintaining the high quality of the returned results, only aggravates this situation. Given this state of affairs, a highly welcome solution would be a system allowing for continuous monitoring of the Web and searching for valuable information in selected Internet resources. This paper describes the concept of such a system, along with its initial implementation and application to searching for information in the foundry industry. The results of a prototype implementation of this system are presented, and plans for its further development and adaptation to other sectors of industry are outlined.
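The core loop of such a monitor is small. A minimal sketch, assuming a hypothetical watch list and domain keyword set; the real system adds source selection, crawling and quality scoring on top of this.

```python
import time
import requests

# Hypothetical watch list and domain keywords (foundry industry).
SOURCES = ["https://example.com/foundry-news"]
KEYWORDS = ["odlew", "casting", "foundry"]

def scan_once():
    """Fetch each source and report which domain keywords it mentions."""
    for url in SOURCES:
        try:
            text = requests.get(url, timeout=10).text.lower()
        except requests.RequestException:
            continue  # unreachable source; retry on the next pass
        hits = [k for k in KEYWORDS if k in text]
        if hits:
            print(url, "->", hits)

while True:           # continuous monitoring, as the concept requires
    scan_once()
    time.sleep(3600)  # re-scan hourly
```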
Source:
Archives of Foundry Engineering; 2015, 15, 2 spec.; 43-46
1897-3310
2299-2944
Appears in:
Archives of Foundry Engineering
Content provider:
Biblioteka Nauki
Article
Title:
A k-Nearest Neighbors Method for Classifying User Sessions in E-Commerce Scenario
Authors:
Suchacka, G.
Skolimowska-Kulig, M.
Potempa, A.
Links:
https://bibliotekanauki.pl/articles/308645.pdf
Publication date:
2015
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Topics:
data mining
e-commerce
k-Nearest Neighbors
k-NN
log file analysis
online store
R-project
supervised classification
web mining
Web store
Web traffic
Web usage mining
Description:
This paper addresses the problem of classifying user sessions in an online store into two classes: buying sessions (during which a purchase confirmation occurs) and browsing sessions. As interactions connected with a purchase confirmation are typically completed at the end of user sessions, some information describing active sessions may be observed and used to assess the probability of making a purchase. The authors formulate the problem of predicting buying sessions in a Web store as a supervised classification problem with two target classes, connected with whether a purchase transaction is finalized in the session or not, and a feature vector containing variables describing user sessions. The presented approach uses k-Nearest Neighbors (k-NN) classification. Based on historical data obtained from online bookstore log files, a k-NN classifier was built and its efficiency was verified for different neighborhood sizes. An 11-NN classifier was the most effective both in terms of buying session predictions and overall predictions, achieving a sensitivity of 87.5% and an accuracy of 99.85%.
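A toy version of the setup, with made-up session features (the paper derives its own feature vector from bookstore log files); it fits an 11-NN classifier and reports sensitivity, i.e. the recall on the buying class quoted above.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score

# Invented session features: [pages viewed, session time (s), cart adds].
X_train = [[3, 60, 0], [25, 900, 2], [5, 120, 0], [18, 700, 1],
           [2, 30, 0], [30, 1200, 3], [7, 200, 0], [22, 850, 2],
           [4, 90, 0], [16, 600, 1], [6, 150, 0], [28, 1000, 2]]
y_train = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 1 = buying session

clf = KNeighborsClassifier(n_neighbors=11).fit(X_train, y_train)

X_test, y_test = [[20, 800, 1], [3, 45, 0]], [1, 0]
y_pred = clf.predict(X_test)

# Sensitivity = recall on the buying (positive) class.
print("sensitivity:", recall_score(y_test, y_pred, pos_label=1))
```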
Source:
Journal of Telecommunications and Information Technology; 2015, 3; 64-69
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
