bigdata [2020/05/04 14:33] skipidar
bigdata [2023/01/14 15:36] (current) skipidar
==== BigData ====

{{https://s3.eu-central-1.amazonaws.com/alf-digital-wiki-pics/sharex/BgjRo5Jfcq.png}}
  
  
  
Presto can query data where it is stored, without needing to move data into a separate analytics system. Query execution runs in parallel over a pure memory-based architecture, with most results returning in seconds. You’ll find it used by many well-known companies like Facebook, Airbnb, Netflix, Atlassian, and Nasdaq.
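
The idea of parallel, in-memory query execution can be illustrated with a toy partial aggregation: each partition of a table is aggregated on its own, then the partial results are merged in a final step. This is a plain-Python sketch of the general pattern, not Presto's API or its actual execution engine; the partition contents are made up, and a thread pool stands in for Presto's worker nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table, already split into partitions of (key, value) rows.
# In a real deployment these would be files left in place (e.g. on HDFS or S3).
PARTITIONS = [
    [("web", 3), ("mobile", 5)],
    [("web", 2), ("tv", 1)],
    [("mobile", 4), ("web", 1)],
]


def partial_sum(rows):
    """Aggregate one partition in memory: SUM(value) GROUP BY key."""
    acc = Counter()
    for key, value in rows:
        acc[key] += value
    return acc


def run_query(partitions):
    """Aggregate all partitions concurrently, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


print(run_query(PARTITIONS))  # → {'web': 6, 'mobile': 9, 'tv': 1}
```

In a real cluster the partial step would run on many machines close to the data, and only the small partial results would need to travel over the network.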

=== Hadoop vs. Spark? What are the differences? ===

Spark can run on top of a Hadoop cluster (for example on YARN, reading its data from HDFS).
Spark can serve as a replacement for MapReduce.
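
For readers new to MapReduce, the programming model that Spark can replace boils down to three phases: map, shuffle and reduce. Below is a minimal single-process word-count sketch of those phases in plain Python; it is not Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["Spark runs on Hadoop", "Hadoop runs MapReduce"]))
# → {'hadoop': 2, 'mapreduce': 1, 'on': 1, 'runs': 2, 'spark': 1}
```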

Hadoop and Apache Spark are both big-data frameworks, but they don't really serve the same purposes.
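
Hadoop is essentially a DISTRIBUTED DATA infrastructure: it distributes massive data collections across multiple nodes within a cluster, using the Hadoop Distributed File System (HDFS). Hadoop also ships its own processing component, MapReduce.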
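
A rough sketch of what "distributing data collections across nodes" means in HDFS terms: a file is split into fixed-size blocks, and each block is replicated on several nodes. The node names, the tiny block size and the round-robin placement are simplifications for illustration; they are not how HDFS actually chooses nodes.

```python
BLOCK_SIZE = 4   # bytes per block; toy value (HDFS defaults to 128 MB)
REPLICATION = 3  # copies of each block (also the HDFS default)
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block index to `replication` distinct nodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(b"hello big data!")
print(blocks)                    # → [b'hell', b'o bi', b'g da', b'ta!']
print(place_blocks(len(blocks)))
```

Losing any single node in this toy layout still leaves at least two copies of every block, which is the point of replication.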
 +
 +Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it doesn't do distributed storage.
 +Spark only competes with the MapReduce part of Hadoop.
 +Spark is speedier. Spark is generally a lot faster than MapReduce
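
The speed argument can be sketched as follows: the same three-stage pipeline runs once MapReduce-style, persisting every intermediate result to disk and reading it back for the next job, and once Spark-style, keeping intermediate results in memory. This toy only shows where intermediate data lives; it does not model Spark's lazy evaluation, partitioning, spilling to disk, or the real cost of writing to HDFS, and the stage functions are hypothetical.

```python
import json
import os
import tempfile


# Three hypothetical pipeline stages.
def tokenize(lines):
    return [word.lower() for line in lines for word in line.split()]


def drop_short_words(words):
    return [word for word in words if len(word) > 3]


def count_words(words):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


STAGES = [tokenize, drop_short_words, count_words]


def run_mapreduce_style(data, workdir):
    """One job per stage: each job writes its output to disk, the next reads it back."""
    for n, stage in enumerate(STAGES):
        result = stage(data)
        path = os.path.join(workdir, f"stage{n}.json")
        with open(path, "w") as f:
            json.dump(result, f)  # materialize the intermediate result on disk
        with open(path) as f:
            data = json.load(f)   # the next job starts by reading from disk
    return data


def run_spark_style(data):
    """Chained stages: intermediate results stay in memory."""
    for stage in STAGES:
        data = stage(data)
    return data


lines = ["Spark keeps data in memory", "MapReduce writes data to disk"]
with tempfile.TemporaryDirectory() as tmp:
    assert run_mapreduce_style(lines, tmp) == run_spark_style(lines)
print(run_spark_style(lines))
```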

https://www.infoworld.com/article/3014440/big-data/five-things-you-need-to-know-about-hadoop-v-apache-spark.html

=== What is Apache Storm? ===

Storm is a competitor of Spark, specifically of its stream-processing part.

Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
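
A rough sketch of what "reliably processing an unbounded stream" means, using Storm's vocabulary: a spout emits tuples from a never-ending source, a bolt processes them one at a time, and tuples that fail are replayed until they are acknowledged (at-least-once processing). The code is plain Python written for illustration; it is not Storm's API, and the stream contents, class names and simulated failures are all hypothetical.

```python
import itertools
from collections import Counter, deque


def event_spout():
    """Hypothetical unbounded source: yields (msg_id, event) pairs forever."""
    events = itertools.cycle(["click", "view", "click", "buy"])
    for msg_id in itertools.count():
        yield msg_id, next(events)


class CountBolt:
    """Handles one tuple at a time and keeps a running count per event type."""

    def __init__(self, fail_once=()):
        self.counts = Counter()
        self._fail_once = set(fail_once)  # ids that fail on their first attempt

    def process(self, msg_id, event):
        if msg_id in self._fail_once:
            self._fail_once.discard(msg_id)
            return False                  # simulated failure: not acknowledged
        self.counts[event] += 1
        return True                       # acknowledged


def run(spout, bolt, max_tuples):
    """Feed tuples to the bolt; failed tuples are replayed until acknowledged.

    The stream never ends, so we stop after max_tuples acknowledged tuples.
    """
    source = iter(spout)
    pending = deque()
    acked = 0
    while acked < max_tuples:
        msg_id, event = pending.popleft() if pending else next(source)
        if bolt.process(msg_id, event):
            acked += 1
        else:
            pending.append((msg_id, event))  # replay later: at-least-once
    return bolt.counts


print(run(event_spout(), CountBolt(fail_once={1, 2}), max_tuples=8))
```

Here tuples 1 and 2 each reach the bolt twice but are counted once, because the simulated failure happens before the count is updated. If a failure could happen after a side effect, a replay would apply that effect twice, which is why at-least-once processing usually calls for idempotent processing steps.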
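
Apache Storm is NOT a database: it processes data as it flows through but does not store it, so results have to be written to an external store.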

=== Storm vs Spark? ===

Both do practically the same thing: processing data.

Multi-language support: Storm is better (e.g. R).

Data sources: Spark is better (e.g. S3).

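Beyond the points above, one commonly cited difference in how the two process data is granularity: Storm handles each tuple as it arrives, while classic Spark Streaming groups incoming records into small batches (micro-batches) and processes each batch as a unit. The sketch below contrasts the two in plain Python; it batches by record count for simplicity, whereas Spark Streaming batches by time interval, and the function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def tuple_at_a_time(stream: Iterable[int],
                    handle: Callable[[int], int]) -> Iterator[int]:
    """Storm-style: each record is handled as soon as it arrives."""
    for record in stream:
        yield handle(record)


def micro_batches(stream: Iterable[int], batch_size: int,
                  handle_batch: Callable[[List[int]], List[int]]) -> Iterator[List[int]]:
    """Micro-batch style: records are buffered, then each batch is handled as a unit."""
    batch: List[int] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the final, partial batch (only reached if the stream ends)
        yield handle_batch(batch)


print(list(tuple_at_a_time(range(5), lambda x: x * 10)))
# → [0, 10, 20, 30, 40]
print(list(micro_batches(range(5), 2, lambda b: [x * 10 for x in b])))
# → [[0, 10], [20, 30], [40]]
```

The trade-off this hints at: per-record handling tends to give lower latency per event, while batching amortizes per-call overhead and tends to favour throughput.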
 +
bigdata.1588602825.txt.gz · Last modified: (external edit)