Module 1

Introduction to Big Data

Rise of Big Data

Compare Hadoop vs traditional systems

Hadoop Master-Slave Architecture

Understanding HDFS Architecture

NameNode, DataNode, Secondary NameNode

Learn about JobTracker, TaskTracker

Module 2

HDFS and MapReduce Architecture

Core components of Hadoop

Understanding Hadoop Master-Slave Architecture

Learn about NameNode, DataNode, Secondary NameNode

Understanding HDFS Architecture

Anatomy of Read and Write data on HDFS

MapReduce Architecture Flow

JobTracker and TaskTracker

Module 3

Hadoop Configuration

Hadoop Modes

Hadoop Terminal Commands

Cluster Configuration

Web Ports

Hadoop Configuration Files

Reporting, Recovery



Module 4

Understanding Hadoop MapReduce Framework

Overview of the MapReduce Framework

Use cases of MapReduce

MapReduce Architecture

Anatomy of MapReduce Program

Mapper/Reducer Class, Driver code

Module 5

Advanced MapReduce Part-1

Write your own Partitioner

Writing Map and Reduce in Python

Map side/Reduce side Join

Distributed Join

Distributed Cache


Joining Multiple datasets in MapReduce

Module 6

Advanced MapReduce Part-2

MapReduce internals

Understanding Input Format

Custom Input Format

Using Writable and Comparable

Understanding Output Format

Sequence Files

JUnit and MRUnit Testing Frameworks

Module 7

Apache Pig

Pig vs MapReduce

Pig Architecture & Data types

Pig Latin Relational Operators

Pig Latin Join and CoGroup

Pig Latin Group and Union

Describe, Explain, Illustrate

Pig Latin: File Loaders & UDF

Module 8

Apache Hive and HiveQL

What is Hive

Hive DDL – Create/Show Database

Hive DDL – Create/Show/Drop Tables

Hive DML – Load Files & Insert Data

Hive SQL – Select, Filter, Join, Group By

Hive Architecture & Components

Difference between Hive and RDBMS

Module 9

Apache HiveQL

Multi-Table Inserts


Grouping Sets, Cubes, Rollups

Custom Map and Reduce scripts

Hive SerDe

Hive UDF


Module 10

Apache Flume, Sqoop, Oozie

Sqoop – How Sqoop works

Sqoop Architecture

Flume – How it works

Flume Complex Flow – Multiplexing

Oozie – Simple/Complex Flow

Oozie Service/Scheduler

Use Cases – Time and Data triggers

Module 11

NoSQL Databases

CAP theorem


Key Value stores: Memcached, Riak

Key Value stores: Redis, DynamoDB

Column Family: Cassandra, HBase

Graph Store: Neo4j

Document Store: MongoDB, CouchDB

Module 12

Apache HBase

When/Why to use HBase

HBase Architecture/Storage

HBase Data Model

HBase Families/Column Families

HBase Master

HBase vs RDBMS

Access HBase Data

Module 13

Apache ZooKeeper

What is ZooKeeper

ZooKeeper Data Model

ZNode Types

Sequential ZNodes

Installing and Configuring

Running ZooKeeper

ZooKeeper use cases

Module 14

Hadoop 2.0, YARN, MRv2

Hadoop 1.0 Limitations

MapReduce Limitations

HDFS 2: Architecture

HDFS 2: High availability

HDFS 2: Federation

YARN Architecture

Classic vs YARN

YARN multitenancy

YARN Capacity Scheduler

Module 15


Demo of 3 sample projects