Hadoop Overview

Apache Hadoop is an open-source framework for scalable, distributed data storage and batch processing on clusters of commodity hardware.

Hadoop Components

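Hadoop Common – shared libraries and utilities used by the other Hadoop modules
Hadoop Distributed File System (HDFS) – distributed file system that splits files into blocks and replicates them across cluster nodes
Hadoop YARN (Yet Another Resource Negotiator) – cluster resource management and job scheduling for big data jobs
Hadoop MapReduce – YARN-based programming model for parallel batch processing of large data sets
Hive – data warehousing infrastructure that provides HiveQL (Hive Query Language), a SQL-like language useful for data summarization and querying
Pig – high-level platform for queries and data manipulation; its language, Pig Latin, combines SQL-like operations with MapReduce-style processing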
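As a rough illustration of how HDFS stores a file, here is a minimal Python sketch that counts the blocks a file occupies and the raw storage its replicas consume. It assumes Hadoop's documented defaults for dfs.blocksize (128 MB) and dfs.replication (3); real clusters may override both. The function name hdfs_block_layout is hypothetical, and the model ignores NameNode metadata and rack-aware replica placement.

```python
# Hadoop's documented defaults for dfs.blocksize (128 MB) and dfs.replication (3).
# Assumption: a real cluster may override both in its HDFS configuration.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def hdfs_block_layout(file_size, block_size=DEFAULT_BLOCK_SIZE,
                      replication=DEFAULT_REPLICATION):
    """Hypothetical helper: return (block_count, block_replicas, raw_bytes).

    Simplified model: the file is cut into fixed-size blocks, the last block
    holds only the remaining bytes (it is not padded to a full block), and
    every block is stored `replication` times on different DataNodes.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Integer ceiling division avoids float rounding on very large files.
    block_count = -(-file_size // block_size)
    return block_count, block_count * replication, file_size * replication


if __name__ == "__main__":
    # A 1 GiB file: 8 blocks, 24 stored block replicas, 3 GiB of raw storage.
    print(hdfs_block_layout(1024 ** 3))
```

Because the last block is not padded, a 300 MiB file still needs 3 blocks (128 + 128 + 44 MiB) but consumes only 900 MiB of raw storage at replication factor 3.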
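The MapReduce model listed above is often explained with the classic word-count example. The following is a minimal in-memory Python sketch of the map, shuffle, and reduce phases; it is not Hadoop's own Java API, and the function names map_phase, shuffle_phase, reduce_phase, and word_count are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a Hadoop cluster, mappers and reducers run in parallel on many nodes;
    # in this sketch every phase runs sequentially in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog"]))
```

Keeping the three phases as separate functions mirrors how a MapReduce job divides the work: the mapper and reducer are the parts a developer writes, while the shuffle between them is handled by the framework.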