Big Data
Analytics
Video about Big Data
A nice overview of the console UI of the available big data tools:
https://www.youtube.com/watch?v=HeJ15SpR66w
- Query S3 data with Athena
- Glue Job to transform data
- Query S3 data with Redshift Spectrum
- Query Postgres data with Redshift Federated Query
- Query data in S3, RDS (Postgres) and Redshift together
- Create Materialized View with Federated Query in Redshift
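The first topic in the list, querying S3 data with Athena, can also be done from code. Below is a minimal sketch using boto3, not the method shown in the video. The database name, SQL text and S3 result location are hypothetical placeholders, and the helper names `build_athena_request` and `run_athena_query` are invented for illustration. It assumes boto3 is installed and AWS credentials are configured.

```python
import time


def build_athena_request(sql, database, output_location):
    """Build keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_athena_query(sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return the raw result rows."""
    import boto3  # imported lazily so the request builder works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is returned here.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

A call could look like `run_athena_query("SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-bucket/athena-results/")`, where all three arguments are made-up values to replace with your own.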
Structure of a data pipeline
An even better overview of the tools used in data pipelines:
https://www.youtube.com/watch?v=tykcCf-Zz1M
- Data Source >
- Data Ingestion >
- Raw Storage >
- Business rules transformation, consolidation (Glue, EMR) >
- Processed Zone
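The stages above can be sketched in a few lines of plain Python. This is only an illustration of the flow, with in-memory dictionaries standing in for the raw and processed storage zones. The record fields, the storage keys and the "drop non-positive amounts" rule are hypothetical. In a real pipeline the transformation step would run in Glue or EMR.

```python
import json


def ingest(source_records):
    """Data Ingestion: serialize each source record as a JSON line."""
    return [json.dumps(record) for record in source_records]


def store_raw(raw_zone, key, lines):
    """Raw Storage: keep the ingested lines unchanged under a key."""
    raw_zone[key] = list(lines)


def transform(lines):
    """Business rules and consolidation: drop invalid rows, sum per customer."""
    totals = {}
    for line in lines:
        record = json.loads(line)
        if record["amount"] <= 0:  # hypothetical business rule
            continue
        customer = record["customer"]
        totals[customer] = totals.get(customer, 0) + record["amount"]
    return totals


def run_pipeline(source_records):
    """Run source -> ingestion -> raw storage -> transformation -> processed zone."""
    raw_zone, processed_zone = {}, {}
    store_raw(raw_zone, "orders/batch-1", ingest(source_records))
    processed_zone["orders_by_customer"] = transform(raw_zone["orders/batch-1"])
    return raw_zone, processed_zone


source = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": -5},
    {"customer": "a", "amount": 7},
]
raw, processed = run_pipeline(source)
print(processed["orders_by_customer"])  # {'a': 17}
```

Keeping the raw zone untouched means the transformation can be rerun with changed business rules without ingesting the data again.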
Comparison of services for moving and processing big data
Parameter | Amazon Kinesis Data Firehose | AWS Glue | Amazon EMR | Amazon Athena | Apache Flink |
---|---|---|---|---|---|
Purpose | Real-time data ingestion and transformation for data streams. | ETL and data preparation for analytics and warehousing. | Managed big data processing with Hadoop and Spark. | Serverless SQL query service for data in Amazon S3. | Stream processing for real-time data applications. |
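Pricing Model | Pay-as-you-go, by volume of data ingested | DPU-based (per DPU-hour) | Instance-based (instances and runtime) | Per query, by amount of data scanned | Open source; you pay for the infrastructure it runs on |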
Data Processing and Integration | Real-time data streaming and transformation | ETL, data preparation | Big data processing | SQL query service | Stream processing |
Data Sources | AWS services, cloud apps | Databases, data lakes, APIs | Various sources | Amazon S3 | Multiple sources |
Integration and Output | AWS services, S3, Redshift, Elasticsearch, etc. | AWS services, data warehouses | Various AWS services | Amazon S3, export | Multiple data sinks |
Data Catalog and Metadata Management | None | AWS Glue Data Catalog | Integration with AWS Glue | AWS Glue Data Catalog | External tools may be required |
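
The "per query and data" pricing in the Athena column is simple arithmetic: the charge grows with the amount of data a query scans. Here is a rough sketch. The function name is invented and the price is a required parameter rather than a real quote. It also assumes 1 TB means 10**12 bytes and ignores any rounding or minimum charge, so check the current pricing page before relying on it.

```python
def estimate_scan_cost(bytes_scanned, price_per_tb):
    """Estimate a per-query charge from the amount of data scanned.

    Assumption: 1 TB is treated as 10**12 bytes; check the unit your
    pricing page uses before relying on the number.
    """
    if bytes_scanned < 0 or price_per_tb < 0:
        raise ValueError("bytes_scanned and price_per_tb must be non-negative")
    return bytes_scanned / 10**12 * price_per_tb


# Hypothetical price of 4.0 currency units per TB, 500 GB scanned.
print(estimate_scan_cost(500 * 10**9, 4.0))  # 2.0
```

Since the cost depends on bytes scanned, partitioning the data in S3 or storing it in a columnar format reduces what each query reads and therefore what it costs.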
cloud/aws/big_data.1697634656.txt.gz · Last modified: by skipidar