Subject: Apache Spark - OPAC collections catalog

Title: A distributed algorithm for protein identification from tandem mass spectrometry data
Authors: Orzechowska, Katarzyna; Rubel, Tymon; Kurjata, Robert; Zaremba, Krzysztof
Links: https://bibliotekanauki.pl/articles/2097435.pdf
Publication date: 2022
Publisher: Polskie Towarzystwo Promocji Wiedzy
Subjects: proteomics; mass spectrometry; distributed computing; Apache Spark
Description: Tandem mass spectrometry is an analytical technique widely used in proteomics for the high-throughput characterization of proteins in biological samples. Modern in-depth proteomic studies require the collection of up to millions of mass spectra representing short protein fragments (peptides). In order to identify the peptides, the measured spectra are most often scored against a database of amino acid sequences of known proteins. Due to the volume of input data and the sizes of proteomic databases, this is a resource-intensive task, which requires an efficient and scalable computational strategy. Here, we present SparkMS, an algorithm for peptide and protein identification from mass spectrometry data explicitly designed to work in a distributed computational environment. To achieve the required performance and scalability, we use Apache Spark, a modern framework that is becoming increasingly popular not only in the field of “big data” analysis but also in bioinformatics. This paper describes the algorithm in detail and demonstrates its performance on a large proteomic dataset. Experimental results indicate that SparkMS scales with the number of worker nodes and the increasing complexity of the search task. Furthermore, it exhibits a protein identification efficiency comparable to X!Tandem, a widely used proteomic search engine.
Source: Applied Computer Science; 2022, 18, 2; 16-27
ISSN: 1895-3735
Appears in: Applied Computer Science
Content provider: Biblioteka Nauki

Article

Title: Detection of DDoS Attacks in OpenStack-based Private Cloud Using Apache Spark
Authors: Gumaste, Shweta; G., Narayan D.; Shinde, Sumedha; K., Amit
Links: https://bibliotekanauki.pl/articles/1839316.pdf
Publication date: 2020
Publisher: Instytut Łączności - Państwowy Instytut Badawczy
Subjects: cloud; DDoS; distributed processing; OpenStack; Apache Spark; random forest
Description: Security is a critical concern for cloud service providers. Distributed denial of service (DDoS) attacks are the most frequent of all cloud security threats, and the consequences of damage caused by DDoS are very serious. Thus, the design of an efficient DDoS detection system plays an important role in monitoring suspicious activity in the cloud. Real-time detection mechanisms operating in cloud environments and relying on machine learning algorithms and distributed processing are an important research issue. In this work, we propose real-time detection of DDoS attacks using machine learning classifiers on a distributed processing platform. We evaluate the DDoS detection mechanism in an OpenStack-based cloud testbed using the Apache Spark framework. We compare the classification performance using benchmark and real-time cloud datasets. Results of the experiments reveal that the random forest method offers better classifier accuracy. Furthermore, we demonstrate the effectiveness of the proposed distributed approach in terms of training and detection time.
Source: Journal of Telecommunications and Information Technology; 2020, 4; 62-71
ISSN: 1509-4553; 1899-8852
Appears in: Journal of Telecommunications and Information Technology
Content provider: Biblioteka Nauki

Article

Title: Scaling evolutionary programming with the use of Apache Spark
Authors: Funika, W.; Koperek, P.
Links: https://bibliotekanauki.pl/articles/952932.pdf
Publication date: 2016
Publisher: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects: distributed systems; evolutionary programming; symbolic regression; scaling; Apache Spark
Description: Organizations across the globe gather more and more data, encouraged by easy-to-use and cheap cloud storage services. Large datasets require new approaches to analysis and processing, which include methods based on machine learning. In particular, symbolic regression can provide many useful insights. Unfortunately, due to high resource requirements, use of this method for large-scale dataset analysis might be infeasible. In this paper, we analyze a bottleneck in the open-source implementation of this method we call hubert. We identify that the evaluation of individuals is the most costly operation. As a solution to this problem, we propose a new evaluation service based on the Apache Spark framework, which attempts to speed up computations by executing them in a distributed manner on a cluster of machines. We analyze the performance of the service by comparing the evaluation execution time of a number of samples with the use of both implementations. Finally, we draw conclusions and outline plans for further research.
Source: Computer Science; 2016, 17 (1); 69-82
ISSN: 1508-2806; 2300-7036
Appears in: Computer Science
Content provider: Biblioteka Nauki

Article

Title: Influence of YARN schedulers on power consumption and processing time for various big data benchmarks
Authors: Drypczewski, Krzysztof; Proficz, Jerzy; Stepnowski, Andrzej
Links: https://bibliotekanauki.pl/articles/1955269.pdf
Publication date: 2018
Publisher: Politechnika Gdańska
Subjects: Apache Spark; YARN; big data; green computing; Sentinel; Tera Sort; word count; benchmarks; scheduler
Description: Climate change caused by human activities can influence the lives of everybody on the planet. The environmental concerns must be taken into consideration by all fields of study, including ICT. Green Computing aims to reduce negative effects of IT on the environment while, at the same time, maintaining all of the possible benefits it provides. Several Big Data platforms like Apache Spark or YARN have become widely used in analytics and High-Performance Computing systems due to the reliability and usability of Map Reduce implementations. The authors research the power consumption and energy efficiency of Hadoop YARN schedulers using Apache Spark under three different workloads. The test cases include: sorting large binary files, counting unique words in large text files and processing satellite imagery from the Sentinel-2 mission. The presented results show small (2%–11%) but distinct differences in the power consumption of FIFO and FAIR schedulers.
Source: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2018, 22, 4; 303-312
ISSN: 1428-6394
Appears in: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider: Biblioteka Nauki

Article

Title: Cloud-based sentiment analysis for measuring customer satisfaction in the Moroccan banking sector using Naïve Bayes and Stanford NLP
Authors: Riadsolh, Anouar; Lasri, Imane; ElBelkacemi, Mourad
Links: https://bibliotekanauki.pl/articles/2141901.pdf
Publication date: 2020
Publisher: Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects: Big Data processing; Apache Spark; Apache Kafka; real-time text processing; sentiment analysis; Stanford core NLP; Naïve Bayes classifier
Description: In a world where every day we produce 2.5 quintillion bytes of data, sentiment analysis has been a key for making sense of that data. However, processing huge volumes of text data in real time requires building a data processing pipeline in order to minimize the latency of processing data streams. In this paper, we explain and evaluate our proposed real-time customer sentiment analysis pipeline for the Moroccan banking sector, using data from the web and social networks and open-source big data tools: Apache Kafka for data ingestion, Apache Spark for in-memory data processing, Apache HBase for storing tweets and the satisfaction indicator, ElasticSearch and Kibana for visualization, and NodeJS for building a web application. The performance evaluation of the Naïve Bayes model shows that for French tweets the accuracy reached 76.19%, while for English tweets the result was unsatisfactory, with an accuracy of 56%. To remedy this problem, we used the Stanford core NLP which, for English tweets, reaches a precision of 80.7%.
Source: Journal of Automation Mobile Robotics and Intelligent Systems; 2020, 14, 4; 64-71
ISSN: 1897-8649; 2080-2145
Appears in: Journal of Automation Mobile Robotics and Intelligent Systems
Content provider: Biblioteka Nauki

Article

Title: Processing of satellite data in the cloud
Authors: Proficz, J.; Drypczewski, K.
Links: https://bibliotekanauki.pl/articles/1940555.pdf
Publication date: 2017
Publisher: Politechnika Gdańska
Subjects: Apache Spark; satellite data; Sentinel-2; ESA; big data; cloud; OpenStack
Description: The dynamic development of digital technologies, especially those dedicated to devices generating large data streams, such as all kinds of measurement equipment (temperature and humidity sensors, cameras, radio-telescopes and satellites – Internet of Things) enables more in-depth analysis of the surrounding reality, including better understanding of various natural phenomena, starting from atomic level reactions, through macroscopic processes (e.g. meteorology) to observation of the Earth and outer space. On the other hand, such a large quantitative improvement requires a great number of processing and storage resources, resulting in the recent rapid development of Big Data technologies. Since 2015, the European Space Agency (ESA) has been providing a great amount of data gathered by exploratory equipment: a collection of Sentinel satellites – which perform Earth observation using various measurement techniques. For example, Sentinel-2 provides a stream of digital photos, including images of the Baltic Sea and the whole territory of Poland. This data is used in an experimental installation of a Big Data processing system based on open source software at the Academic Computer Center in Gdansk. The center has one of the most powerful supercomputers in Poland – the Tryton computing cluster, consisting of 1600 nodes interconnected by a fast InfiniBand network (56 Gbps) and over 6 PB of storage. Some of these nodes are used as a computational cloud supervised by an OpenStack platform, where the Sentinel-2 data is processed. A subsystem for automatic, continuous data download to object storage (based on Swift) is deployed, the required software libraries for image processing are configured and the Apache Spark cluster has been set up. The above system enables gathering and analysis of the recorded satellite images and the associated metadata, benefiting from parallel computation mechanisms. This paper describes the above solution including its technical aspects.
Source: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk; 2017, 21, 4; 365-377
ISSN: 1428-6394
Appears in: TASK Quarterly. Scientific Bulletin of Academic Computer Centre in Gdansk
Content provider: Biblioteka Nauki

Article

Information

Search phrase: "Apache Spark", by criterion: Subject
