Best Big Data Hadoop Industrial Internship Training at Goeduhub Technologies, Jaipur

Big Data Hadoop Course Content (In addition to the content below, Linux and Python for Hadoop MapReduce are included free with the Hadoop course during summer training.)

Module 1: Understanding Big Data and Hadoop (Week 1)

Learning Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, Hadoop architecture, HDFS, the anatomy of file write and read, and how the MapReduce framework works.

Topics

Big Data
Limitations and Solutions of existing Data Analytics Architecture
Hadoop
Hadoop Features
Hadoop Ecosystem
Hadoop 2.x core components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Different Hadoop Distributions

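HDFS stores each file as a sequence of fixed-size blocks, and each block is replicated across DataNodes. As a rough sketch of that arithmetic, assuming the Hadoop 2.x default block size of 128 MB (`dfs.blocksize`) and the default replication factor of 3 (`dfs.replication`), the Python below computes how a file would be divided. The helpers `split_into_blocks` and `raw_storage` are hypothetical and do not call any Hadoop API.

```python
from typing import List

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed: Hadoop 2.x default dfs.blocksize
REPLICATION = 3        # assumed: default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the byte size of each HDFS block for a file (hypothetical helper)."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, rest = divmod(file_size, block_size)
    blocks = [block_size] * full
    if rest:
        # The last block only holds the remaining bytes; it is not padded.
        blocks.append(rest)
    return blocks


def raw_storage(file_size: int, replication: int = REPLICATION) -> int:
    """Total bytes stored across the cluster once every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * MB
    blocks = split_into_blocks(size)
    print(len(blocks), [b // MB for b in blocks])  # 3 [128, 128, 44]
    print(raw_storage(size) // MB)                 # 900
```

Under these assumptions, a 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and with three replicas it consumes about 900 MB of raw cluster storage.
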
Module 2: Hadoop Architecture and HDFS (Week 1)

Learning Objectives - In this module, you will learn the Hadoop cluster architecture, important configuration files in a Hadoop cluster, data loading techniques, and how to set up single-node and multi-node Hadoop clusters.

Topics

Hadoop 2.x Cluster Architecture - Federation and High Availability
A Typical Production Hadoop Cluster
Hadoop Cluster Modes
Common Hadoop Shell Commands
Hadoop 2.x Configuration Files
Single-Node Cluster Setup
Multi-Node Cluster Setup
Hadoop Administration

Module 3: Hadoop MapReduce Framework (Week 2)

Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will learn concepts such as input splits, combiners, and partitioners, and see demos of MapReduce on different data sets.

Topics

MapReduce Use Cases
Traditional Way vs. MapReduce Way
Why MapReduce
Hadoop 2.x MapReduce Architecture
Hadoop 2.x MapReduce Components
YARN MapReduce Application Execution Flow
YARN Workflow
Anatomy of a MapReduce Program
Demo on MapReduce
Input Splits
Relation between Input Splits and HDFS Blocks
MapReduce: Combiner & Partitioner
Demo on De-identifying a Healthcare Data Set
Demo on Weather Data

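To make the map, combine, partition, shuffle, and reduce phases concrete, here is a minimal in-memory simulation of a word count job in Python. It does not use Hadoop, and all function names are hypothetical. The partitioner follows the same idea as Hadoop's default HashPartitioner (key hash modulo the number of reduce tasks) but uses a stable CRC32 hash, so actual key-to-reducer assignments will differ from Hadoop's.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def mapper(line: str) -> Iterator[Pair]:
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def combiner(pairs: Iterable[Pair]) -> List[Pair]:
    # Combine: pre-aggregate one map task's output locally to shrink shuffle traffic.
    local: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key: str, num_reducers: int) -> int:
    # Partition: a stable hash of the key modulo the reducer count picks the reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key: str, values: Iterable[int]) -> Pair:
    # Reduce: sum all partial counts for one key.
    return key, sum(values)


def run_job(splits: List[List[str]], num_reducers: int = 2) -> Dict[str, int]:
    # Shuffle: each reducer gets its own key -> list-of-values mapping.
    shuffled: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # one map task per input split
        mapped = (pair for line in split for pair in mapper(line))
        for word, count in combiner(mapped):
            shuffled[partition(word, num_reducers)][word].append(count)
    result: Dict[str, int] = {}
    for bucket in shuffled:
        for word in sorted(bucket):  # each reducer sees its keys in sorted order
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result


if __name__ == "__main__":
    splits = [["big data hadoop", "hadoop mapreduce"], ["big data spark"]]
    print(run_job(splits))
```

In real Hadoop the framework may run a combiner zero, one, or several times, so a combiner must be safe to apply repeatedly; summing counts satisfies that, which is why word count can reuse its reduce logic as the combiner.
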
Module 4: Advanced MapReduce (Week 3)

Learning Objectives - In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence File Input Format, and XML parsing.

Topics

Counters
Distributed Cache
MRUnit
Reduce Join
Custom Input Format
Sequence File Input Format
XML File Parsing Using MapReduce

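A reduce-side join tags each record with its source during the map phase, groups records by join key during the shuffle, and pairs them up in the reducer. The following is a minimal in-memory Python sketch of that pattern, not Hadoop code; the sample datasets, the source tags `"C"` and `"O"`, and every function name are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Tagged = Tuple[str, Tuple[str, str]]  # (join_key, (source_tag, payload))


def map_customers(rows: Iterable[Tuple[str, str]]) -> List[Tagged]:
    # Map side, customers file: rows are (cust_id, name); tag each value with "C".
    return [(cust_id, ("C", name)) for cust_id, name in rows]


def map_orders(rows: Iterable[Tuple[str, str]]) -> List[Tagged]:
    # Map side, orders file: rows are (order_id, cust_id); the join key is cust_id.
    return [(cust_id, ("O", order_id)) for order_id, cust_id in rows]


def shuffle(pairs: Iterable[Tagged]) -> Dict[str, List[Tuple[str, str]]]:
    # Shuffle: group every tagged value under its join key.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join_reducer(key: str, values: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    # Reduce: split values by source tag, then emit every pairing (an inner join).
    names = [v for tag, v in values if tag == "C"]
    orders = [v for tag, v in values if tag == "O"]
    return [(key, name, order) for name in names for order in orders]


def reduce_side_join(customers: List[Tuple[str, str]],
                     orders: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    groups = shuffle(map_customers(customers) + map_orders(orders))
    result: List[Tuple[str, str, str]] = []
    for key in sorted(groups):  # reducers receive keys in sorted order
        result.extend(join_reducer(key, groups[key]))
    return result


if __name__ == "__main__":
    customers = [("c1", "Alice"), ("c2", "Bob")]
    orders = [("o1", "c1"), ("o2", "c1"), ("o3", "c3")]
    for row in reduce_side_join(customers, orders):
        print(row)  # ('c1', 'Alice', 'o1') then ('c1', 'Alice', 'o2')
```

Customer `c2` has no orders and order `o3` has no matching customer, so both are dropped, as in an inner join. Because every record crosses the network during the shuffle, reduce-side joins work for any input sizes but are costlier than map-side joins that use the Distributed Cache for a small table.
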
Module 5: Pig (Week 4)

Learning Objectives - In this module, you will learn Pig, the types of use cases Pig is suited for, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming, and testing Pig scripts, with a demo on a healthcare dataset.

Topics

About Pig
MapReduce vs. Pig
Pig Use Cases
Programming Structure in Pig
Pig Running Modes
Pig Components
Pig Execution
Pig Latin Program
Data Models in Pig
Pig Data Types
Shell and Utility Commands
Pig Latin: Relational Operators, File Loaders, GROUP Operator, COGROUP Operator, Joins and COGROUP, UNION, Diagnostic Operators
Specialized Joins in Pig
Built-in Functions (Eval Functions, Load and Store Functions, Math Functions, String Functions, Date Functions)
Pig UDFs, Piggybank
Parameter Substitution (Pig Macros and Pig Parameter Substitution)
Pig Streaming
Testing Pig Scripts with PigUnit
Aviation Use Case in Pig
Pig Demo on Healthcare Data

Module 6: Hive (Week 5)

Learning Objectives - This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.

Topics

Hive Background
Hive Use Case
About Hive
Hive vs. Pig
Hive Architecture and Components
Metastore in Hive
Limitations of Hive
Comparison with Traditional Databases
Hive Data Types and Data Models
Partitions and Buckets
Hive Tables (Managed Tables and External Tables)
Importing Data
Querying Data
Managing Outputs
Hive Script
Hive UDF
Retail Use Case in Hive
Hive Demo on Healthcare Data

Module 7: Advanced Hive and HBase (Week 6)

Learning Objectives - In this module, you will understand advanced Hive concepts such as UDFs, dynamic partitioning, Hive indexes and views, and optimizations in Hive. You will also gain in-depth knowledge of HBase, its architecture, running modes, and components.

Topics

HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive Indexes and Views
Hive Query Optimizers
Hive: Thrift Server, User-Defined Functions
HBase: Introduction to NoSQL Databases and HBase, HBase vs. RDBMS, HBase Components, HBase Architecture, HBase Cluster

Module 8: Advanced HBase (Week 6)

Learning Objectives - This module covers advanced HBase concepts, with demos on bulk loading and filters. You will also learn what ZooKeeper is, how it helps in monitoring a cluster, and why HBase uses ZooKeeper.

Topics

HBase Data Model
HBase Shell
HBase Client API
Data Loading Techniques
ZooKeeper Data Model
ZooKeeper Service
ZooKeeper
Demos on Bulk Loading
Getting and Inserting Data
Filters in HBase

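HBase's data model can be pictured as a sorted map: row key, then column family and qualifier, then a set of timestamped versions of each cell value. The toy Python class below is a hypothetical, in-memory sketch of that model only; it is not the HBase client API, and the `max_versions` default of 3 is an illustrative assumption (in HBase the number of retained versions is configured per column family).

```python
import time
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# row key -> "family:qualifier" -> {timestamp: value}
Table = Dict[str, Dict[str, Dict[int, bytes]]]


class ToyHBaseTable:
    """In-memory toy of the HBase data model (hypothetical, not the HBase API)."""

    def __init__(self, families: Iterable[str], max_versions: int = 3) -> None:
        self.families = set(families)  # column families are fixed when the table is created
        self.max_versions = max(1, max_versions)
        self.rows: Table = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, family: str, qualifier: str, value: bytes,
            ts: Optional[int] = None) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = self.rows[row][f"{family}:{qualifier}"]
        # A put with the same timestamp overwrites that version.
        cell[ts if ts is not None else time.time_ns()] = value
        # Keep only the newest max_versions versions of this cell.
        for old in sorted(cell)[:-self.max_versions]:
            del cell[old]

    def get(self, row: str, family: str, qualifier: str) -> Optional[bytes]:
        cell = self.rows.get(row, {}).get(f"{family}:{qualifier}")
        if not cell:
            return None
        return cell[max(cell)]  # the latest version is returned by default

    def scan(self, start_row: str = "",
             stop_row: Optional[str] = None) -> Iterator[Tuple[str, Dict[str, bytes]]]:
        # Rows come back in lexicographic row-key order; stop_row is exclusive.
        for row in sorted(self.rows):
            if row >= start_row and (stop_row is None or row < stop_row):
                yield row, {col: cell[max(cell)] for col, cell in self.rows[row].items()}


if __name__ == "__main__":
    t = ToyHBaseTable(["info"])
    t.put("user#2", "info", "name", b"Bob")
    t.put("user#1", "info", "name", b"Alice")
    for row, cols in t.scan():
        print(row, cols)
```

Sorting by row key is why HBase schema design concentrates on choosing row keys: range scans and locality both follow that order.
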
Module 9: Processing Distributed Data with Apache Spark (Week 7)

Learning Objectives - In this module, you will learn the Spark ecosystem and its components, how Scala is used in Spark, and SparkContext. You will learn how to work with RDDs in Spark. Demos cover running an application on a Spark cluster and comparing the performance of MapReduce and Spark.

Topics

What is Apache Spark
Spark Ecosystem
Spark Components
History of Spark and Spark Versions/Releases
Spark as a Polyglot
What is Scala?
Why Scala?
SparkContext
RDD

Module 10: Oozie and Hadoop Project (Weeks 8-9)

Learning Objectives - In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. We will discuss multiple data sets and the project specifications. This module also covers Flume and Sqoop data loading techniques, the Apache Oozie workflow scheduler for Hadoop jobs, and Hadoop integration with Talend.

Topics

Flume and Sqoop Demo
Oozie
Oozie Components
Oozie Workflow
Scheduling with Oozie
Demo on Oozie Workflow
Oozie Coordinator
Oozie Commands
Oozie Web Console
Oozie for MapReduce, Pig, Hive, and Sqoop
Combining MapReduce, Pig, and Hive Flows in Oozie
Hadoop Project Work
Hadoop Integration with Talend

Project Work

Toward the end of the course, you will work on a live project that uses Pig, Hive, HBase, and MapReduce to perform Big Data analytics.

The course also covers working concepts of DevOps tools such as Git, Chef, and Docker.

Sample projects include:

YouTube Data Analysis
Semi-Structured Data Analysis
Healthcare Data Analysis
Movie Recommendation
Real-Time Log Data Analysis
