AWS Pricing Overview

Configure an expenses (billing) report!

Pricing per resource

Compute – billed per compute hour
Storage – billed per GB stored
Data transfer – billed for data transferred out
Data transfer between Regions – charged as internet data transfer (EC2, S3, RDS, DynamoDB, SQS, SNS, VPC)
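To see how these per-resource charges add up, here is a rough sketch that multiplies usage by a unit rate for each billing dimension and sums the results. The rates are hypothetical placeholders, not actual AWS prices, which vary by service, Region, and usage tier. The `Rates`, `Usage`, and `estimate_monthly_cost` names are illustrative only.

```python
from dataclasses import dataclass


# Hypothetical unit rates in USD; real AWS prices differ by service, Region and tier.
@dataclass(frozen=True)
class Rates:
    compute_per_hour: float
    storage_per_gb_month: float
    transfer_out_per_gb: float


# Usage for one billing period (for example, one month).
@dataclass(frozen=True)
class Usage:
    compute_hours: float
    storage_gb_months: float
    transfer_out_gb: float


def estimate_monthly_cost(usage: Usage, rates: Rates) -> float:
    """Sum the three billing dimensions: compute hours, GB stored, GB transferred out."""
    compute = usage.compute_hours * rates.compute_per_hour
    storage = usage.storage_gb_months * rates.storage_per_gb_month
    transfer = usage.transfer_out_gb * rates.transfer_out_per_gb
    return round(compute + storage + transfer, 2)


if __name__ == "__main__":
    rates = Rates(compute_per_hour=0.10, storage_per_gb_month=0.02, transfer_out_per_gb=0.09)
    usage = Usage(compute_hours=720, storage_gb_months=500, transfer_out_gb=100)
    print(estimate_monthly_cost(usage, rates))  # 72.0 + 10.0 + 9.0 = 91.0
```

Inter-Region transfer could be added as a fourth term in the same way.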

Hadoop Overview

Hadoop is a scalable framework for distributed data storage and batch processing.

Hadoop Components

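Hadoop Common – shared libraries and utilities used by the other modules
Hadoop Distributed File System (HDFS) – distributed, fault-tolerant storage
Hadoop YARN – resource management and scheduling for big data jobs
Hadoop MapReduce – programming model for parallel batch processing
Hive – data warehousing infrastructure; provides HiveQL (Hive Query Language), an SQL-like language useful for data summarization and querying.
Pig – queries and data manipulation in Pig Latin, combining SQL-like operations with MapReduce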
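The MapReduce model can be illustrated with the classic word-count example. This sketch runs the map, shuffle, and reduce phases in a single Python process. It only mimics the data flow: it does not use Hadoop's actual Java API, and it does not distribute work across nodes. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Sort keys, mirroring the sorted input that Hadoop reducers receive.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in batches"]
    print(word_count(docs))
    # {'batches': 1, 'data': 2, 'hadoop': 2, 'in': 1, 'processes': 1, 'stores': 1}
```

In real Hadoop, many mappers and reducers run in parallel on different nodes. The framework performs the shuffle over the network.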

AWS Big Data Solution Overview

A big data solution is not only about the size of the data; it is more about how to extract, search, transfer, update, share, and visualize that data.

Big data is commonly described by three features (the "three Vs"):

Volume – size of the data
Velocity – speed at which data is captured and shared
Variety – different data types: images, text, video, etc.

Amazon Web Services Big Data Ecosystem: Steps and Components

Collect Data

Amazon Kinesis Firehose (Real-time)
AWS Snowball (Data Import)

Store Data

Amazon S3
Amazon Kinesis Streams (Real-time)
Amazon RDS for RDBMS
Amazon DynamoDB for NoSQL

Process and Analyze

Amazon EMR (Hadoop Ecosystem)
AWS Lambda (Real-time)
Amazon Kinesis Analytics (Real-time)
Amazon Redshift (Data Warehousing)
Amazon ML (Machine Learning)

Visualize

Amazon QuickSight (Business Intelligence, Data Visualization)
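The four steps form a pipeline in which each stage feeds the next. This sketch models that flow with in-memory stand-ins:

- a list for collected events
- a dict keyed by object path for storage
- an aggregation for processing
- a text bar chart for visualization

The function names, record format, and key layout are hypothetical, and no AWS service is called. The comments only indicate which AWS component each stage loosely corresponds to.

```python
from collections import Counter
from typing import Dict, List


# Collect: hypothetical click events; in AWS these might arrive through Kinesis Firehose.
def collect() -> List[dict]:
    return [
        {"page": "/home", "user": "a"},
        {"page": "/home", "user": "b"},
        {"page": "/pricing", "user": "a"},
    ]


# Store: stand-in for an object store such as S3; keys are object paths, values are batches.
def store(events: List[dict], bucket: Dict[str, List[dict]]) -> str:
    key = f"raw/batch-{len(bucket):04d}.json"
    bucket[key] = list(events)
    return key


# Process and analyze: stand-in for a batch job (e.g. on EMR); count page views.
def process(bucket: Dict[str, List[dict]]) -> Counter:
    views: Counter = Counter()
    for records in bucket.values():
        views.update(r["page"] for r in records)
    return views


# Visualize: stand-in for a BI dashboard such as QuickSight; a plain-text bar chart.
def visualize(views: Counter) -> str:
    return "\n".join(f"{page:<10}{'#' * n} {n}" for page, n in views.most_common())


if __name__ == "__main__":
    bucket: Dict[str, List[dict]] = {}
    store(collect(), bucket)
    print(visualize(process(bucket)))
```

Keeping storage separate from processing mirrors the AWS layout. Raw data lands in one place, and different processing services can read it later.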

Big Data Tools

Collect Data

Fluentd
Flume
Log4j
Sqoop

Store

HDFS
Cassandra
Kafka
HBase

Process and Analyze

Hadoop
Spark
Hive
Oozie
Mahout
Presto
Pig
Impala

Visualize

SAS
Qlik
Looker
Flot
Tableau
Tibco Spotfire
Tibco Jaspersoft