Big Data

Analytics

Video about Big Data

A good overview of the user interfaces of the available big data tools:

https://www.youtube.com/watch?v=HeJ15SpR66w

Topics listed on the video's title picture:

  • Query S3 data with Athena
  • Glue Job to transform data
  • Query S3 data with Redshift Spectrum
  • Query Postgres data with Redshift Federated Query
  • Query data in S3, RDS (Postgres) and Redshift together
  • Create Materialized View with Federated Query in Redshift
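
As a minimal sketch of the first item above (querying S3 data with Athena), the snippet below builds the arguments for boto3's `start_query_execution` call. The table name `sales`, the database `demo_db`, and the bucket `example-bucket` are hypothetical placeholders, not names from the video. The query is only sent when `run_athena_query` is called, which assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_request(sql, database, output_s3_uri):
    """Build keyword arguments for the Athena start_query_execution call."""
    # Athena writes query results to S3, so the output location must be an S3 URI.
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_athena_query(request):
    """Submit the query and return its execution id (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request(
    sql="SELECT * FROM sales LIMIT 10",
    database="demo_db",
    output_s3_uri="s3://example-bucket/athena-results/",
)
```

Calling `run_athena_query(request)` would start the query asynchronously. The results land under the given S3 prefix once the query finishes.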

Structure of a data pipeline

An even better overview of the tools used in data pipelines:

https://www.youtube.com/watch?v=tykcCf-Zz1M

  • Data Source >
  • Data Ingestion >
  • Raw Storage >
  • Business rules transformation, consolidation (Glue, EMR) >
  • Processed Zone
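
The stages above can be sketched in plain Python, with in-memory lists standing in for the storage zones. This is only an illustration of the flow: the record fields (`customer`, `amount`) and the business rule (drop records without an amount, then sum per customer) are assumptions, and in a real pipeline the transformation step would run in Glue or EMR.

```python
import json


def ingest(source_lines):
    """Data Ingestion: copy records from the source into raw storage unchanged."""
    return list(source_lines)


def transform(raw_records):
    """Business rules transformation and consolidation (assumed rules)."""
    totals = {}
    for line in raw_records:
        record = json.loads(line)
        # Assumed business rule: skip records that carry no amount.
        if record.get("amount") is None:
            continue
        # Consolidation: sum the amounts per customer.
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0) + record["amount"]
    return totals


# Data Source: hypothetical JSON lines.
source = [
    '{"customer": "a", "amount": 10}',
    '{"customer": "b", "amount": 5}',
    '{"customer": "a", "amount": 2}',
    '{"customer": "c"}',
]

raw_zone = ingest(source)               # Raw Storage keeps every record as received
processed_zone = transform(raw_zone)    # Processed Zone holds the consolidated view
```

Keeping the raw zone untouched means the transformation can be rerun later with different rules.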

Comparison of services for moving big data

As generated by ChatGPT.

| Parameter | AWS Kinesis Firehose | AWS Glue Service | AWS EMR | AWS Athena | Apache Flink |
|---|---|---|---|---|---|
| Purpose | Real-time data ingestion and transformation for data streams. | ETL and data preparation for analytics and warehousing. | Managed big data processing with Hadoop and Spark. | Serverless SQL query service for data in Amazon S3. | Stream processing for real-time data applications. |
| Pricing Model | Pay-as-you-go | DPU-based | Instance-based | Per query and data | Infrastructure costs |
| Data Processing and Integration | Real-time data streaming and transformation | ETL, data preparation | Big data processing | SQL query service | Stream processing |
| Data Sources | AWS services, cloud apps | Databases, data lakes, APIs | Various sources | Amazon S3 | Multiple sources |
| Integration and Output | AWS services, S3, Redshift, Elasticsearch, etc. | AWS services, data warehouses | Various AWS services | Amazon S3, export | Multiple data sinks |
| Data Catalog and Metadata Management | None | AWS Glue Data Catalog | Integration with AWS Glue | AWS Glue Data Catalog | External tools may be required |
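
To make the "Per query and data" pricing entry for Athena in the table above more concrete, here is a rough sketch that estimates the cost of one query from the amount of data it scans. The price of 5 USD per TB is an assumption (check the current price for your region), as is treating 1 TB as 1024^4 bytes. Any minimum charge per query is ignored.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: 1 TB counted as 1024^4 bytes


def estimate_athena_cost(bytes_scanned, price_per_tb_usd=5.0):
    """Estimate the cost of one query from the data it scans.

    price_per_tb_usd is an assumed default, not an authoritative price.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb_usd


# A query that scans half a terabyte under the assumed price.
half_tb_cost = estimate_athena_cost(BYTES_PER_TB // 2)
```

Because cost follows the data scanned, columnar formats and partitioning that reduce the scanned bytes also reduce the bill.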
cloud/aws/big_data.1697634727.txt.gz · Last modified: by skipidar