Differences

This shows you the differences between two versions of the page.

--- bigdata [2020/05/04 14:33] – skipidar
+++ bigdata [2023/01/14 15:36] (current) – skipidar
@@ Line 1: / Line 1: @@
 ==== BigData ====
+{{https://s3.eu-central-1.amazonaws.com/alf-digital-wiki-pics/sharex/BgjRo5Jfcq.png}}
@@ Line 61: / Line 63: @@
 Presto can query data where it is stored, without needing to move data into a separate analytics system. Query execution runs in parallel over a pure memory-based architecture, with most results returning in seconds. You’ll find it used by many well-known companies like Facebook, Airbnb, Netflix, Atlassian, and Nasdaq.
+=== Hadoop vs. Spark? What are the differences? ===
+Spark can run on top of a Hadoop cluster.
+Spark can replace MapReduce as the processing engine.
+Hadoop and Apache Spark are both big-data frameworks, but they don't really serve the same purposes.
+Hadoop is essentially a DISTRIBUTED DATA infrastructure: It distributes massive data collections across multiple nodes within a cluster.
+Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it doesn't do distributed storage.
+Spark only competes with the MapReduce part of Hadoop.
+Spark is generally a lot faster than MapReduce, because it keeps intermediate results in memory instead of writing them to disk between steps.
+https://www.infoworld.com/article/3014440/big-data/five-things-you-need-to-know-about-hadoop-v-apache-spark.html
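To make "the MapReduce part" concrete, here is a minimal sketch of the map / shuffle / reduce model in plain Python. It uses neither the Hadoop nor the Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases; illustrative only, single process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, not taken from the page above.
counts = word_count(["spark runs on hadoop", "spark replaces mapreduce"])
print(counts)
```

In a real cluster each phase runs in parallel on many nodes over data held in distributed storage; this processing layer is the one where Spark and MapReduce compete.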
+=== What is Apache Storm? ===
+Storm is a competitor of Spark.
+Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
+Apache Storm is NOT a database.
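The idea of processing an unbounded stream can be sketched in plain Python as follows. This is not the Storm API: `word_stream` is a hypothetical endless source and `running_counts` a hypothetical processing step, chosen only to show that each item is handled as it arrives and a result is emitted immediately, instead of waiting for a finished batch.

```python
from collections import Counter
from itertools import cycle, islice


def word_stream():
    """Hypothetical unbounded source: repeats three words forever."""
    return cycle(["storm", "spark", "storm"])


def running_counts(stream):
    """Handle one item at a time and emit the updated count right away."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# The stream never ends, so only the first five results are taken here.
first_five = list(islice(running_counts(word_stream()), 5))
print(first_five)
```

A batch job would need the whole input before producing any count; here every incoming word yields an updated result at once, which is the realtime behaviour described above.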
+=== Storm vs Spark? ===
+They do practically the same thing: processing data.
+multi-language support - Storm is better (e.g. R)
+data sources - Spark is better (e.g. S3)