The Hadoop Developer course blends cluster administration with hands-on coding, using Hadoop ecosystem components to demonstrate how to work with big data. Topics covered include MapReduce, Hive, Pig, ZooKeeper, Sqoop, and multi-node setup of a Hadoop cluster on Amazon EC2. The course trains participants to set up Hadoop infrastructure, write MapReduce programs and Hive and Pig scripts, and work with HDFS, ZooKeeper, and Sqoop.

Total Duration

30-35 hrs

Who should attend?

Java developers and architects, data warehouse developers and architects, big data professionals

Prerequisites for attending Hadoop Developer Training

Basic knowledge of Unix, Java, and SQL scripting

Objectives of the Training:

1. Understanding distributed, parallel, and cloud computing, and NoSQL concepts
2. Setting up Hadoop infrastructure with single-node and multi-node clusters on Amazon EC2
3. Understanding the concepts of Map and Reduce and functional programming
4. Writing Map and Reduce programs and working with HDFS
5. Writing Hive and Pig scripts and working with ZooKeeper and Sqoop
6. Ability to design and develop applications involving large data using the Hadoop ecosystem

Course Outline

Introduction to Hadoop

  • Distributed computing
  • Parallel computing
  • Concurrency
  • Cloud Computing
  • Data Past, Present and Future
  • Computing Past, Present and Future
  • Hadoop
  • NoSQL

Hadoop Stack

  • MapReduce
  • NoSQL
  • CAP Theorem
  • Databases: Key Value, Document, Graph
  • Hive and Pig
  • HDFS

Lab 1: Hadoop Hands-on

  • Installing Hadoop Single Node cluster
  • Understanding Hadoop configuration files

MapReduce Introduction

  • Functional – Concept of Map
  • Functional – Concept of Reduce
  • Functional – Ordering, Concurrency, No Lock
  • Functional – Shuffling
  • Functional – Reducing, Key, Concurrency
  • MapReduce Execution framework
  • MapReduce Partitioners and Combiners
  • MapReduce and role of distributed filesystem
  • Role of Key-Value Pairs
  • Hadoop Data Types
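
The flow named in the topics above (map, combine, partition, shuffle, reduce) can be sketched in a few lines of plain Python. This is an in-memory illustration only, not Hadoop code: the function names, the two-reducer setup, and the CRC32-based partitioner are assumptions made for the example and are not taken from the course material.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts on the mapper side to shrink shuffle traffic."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer from a stable hash of the key (illustrative choice)."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Shuffle: group values by key inside each reducer's partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            partitions[partition(key, num_reducers)][key].append(value)
    return partitions


def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    # Each input line stands in for one mapper's input split.
    mapper_outputs = [combine(map_phase(line)) for line in lines]
    result = {}
    for part in shuffle(mapper_outputs, num_reducers):
        for key in sorted(part):  # each reducer sees its keys in sorted order
            word, total = reduce_phase(key, part[key])
            result[word] = total
    return result


counts = word_count(["big data big cluster", "data node data"])
print(counts)
```

Because every key is routed to exactly one partition, each reducer can work without locks or coordination with the others, which is the point behind the "No Lock" item in the list.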

Lab 2: MapReduce Exercises

  • Understanding Sample MapReduce code
  • Executing MapReduce code

HDFS Introduction

  • Architecture
  • File System
  • Data replication
  • Name Node
  • Data Node
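
As a rough illustration of the data replication topic, the sketch below splits a file into fixed-size blocks and records which data nodes hold each replica, much as a name node tracks block locations. It is a toy model: the round-robin placement, the node names, and the helper functions are assumptions for the example, and real HDFS uses a rack-aware placement policy. The 128 MB block size and replication factor of 3 mirror commonly used Hadoop defaults.

```python
import math


def split_into_blocks(file_size, block_size):
    """Return the byte length of each block for a file of file_size bytes."""
    num_blocks = math.ceil(file_size / block_size)
    return [min(block_size, file_size - i * block_size) for i in range(num_blocks)]


def place_replicas(num_blocks, data_nodes, replication):
    """Assign each block to `replication` distinct data nodes, round-robin.

    Hypothetical placement for illustration; not the HDFS placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    block_map = {}
    for block_id in range(num_blocks):
        block_map[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)] for r in range(replication)
        ]
    return block_map


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB, 128 * MB)  # last block is smaller than the rest
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)

for block_id, nodes in block_map.items():
    print(block_id, blocks[block_id] // MB, "MB ->", nodes)
```

The dictionary returned by `place_replicas` plays the role of the name node's metadata, while the listed nodes stand in for the data nodes that store the actual bytes.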


Hive Introduction

  • Architecture
  • Data Model
  • Physical Layout
  • DDL DML SQL Operations

Lab 3: Hive Hands-on

  • Installation
  • Setup
  • Exercises


Pig Introduction

  • Rationale
  • Pig Latin
  • Input, Output and Relational Operators
  • User Defined Functions
  • Analyzing and designing using Pig Latin

Lab 4: Pig Hands-on

  • Installation
  • Setup
  • Executing Pig Latin scripts on File system
  • Executing Pig Latin scripts on HDFS
  • Writing custom User Defined Functions

Introduction to ZooKeeper

Introduction to Sqoop

Hadoop Multi-node Cluster Setup

  • Installation and Configuration
  • Running MapReduce Jobs on Multi-node cluster

Working with Large data sets

  • Steps involved in analyzing large data
  • Lab walkthrough

Contact Us

E-mail -[email protected]
