Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate analyses.
Big data defined
As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now-mainstream definition of big data as the three Vs: volume, velocity and variety.
Volume: Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
Variety: Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; and unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.
- Why is Big Data important? What is Big Data? Characteristics of Big Data. Why should you care about Big Data? What are the possible options for analyzing big data?
- Traditional Distributed Systems
- Problems with traditional distributed systems
- What is Hadoop? History of Hadoop. How does Hadoop solve the Big Data problem? Components of Hadoop.
- What is HDFS? How does HDFS work? Understand the basic architecture.
- What is MapReduce? How does MapReduce work?
- How does Hadoop work as a system?
- What is Pig? How does it work? Analyze data using Pig.
- What is Hive? How does it work? Analyze data using Hive.
- What is MapReduce? How does it work? An example.
- What is Flume? How does it work? An example.
- What is Sqoop? How does it work? An example.
- What is Oozie? How does it work? An example.
- Setting up Virtual Machine
- Installing the Hadoop ecosystem on a single node.
- Understanding the configuration for single node and multi-node installation.
- Hands-on exercise.
- Running your first MapReduce program
- Hands-on using Pig, Hive, MapReduce and Sqoop.
- Understand how partitioners and combiners function in MapReduce.
- Planning your Hadoop cluster. Hardware and software considerations.
- Scheduling in Hadoop
- Monitoring your Hadoop cluster
- Monitoring tools available
- Monitoring best practices
- Administration best practices
- Hadoop administration best practices
- Tools of the trade
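
As a taste of the MapReduce topics listed in the outline (mappers, combiners, partitioners and reducers), the following is a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `word_count`) and the toy partitioning rule are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper (hypothetical name): emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts locally, before anything is sent to a reducer."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(word, num_reducers):
    """Partitioner: choose a reducer for a key (deterministic toy rule, an assumption)."""
    return sum(word.encode("utf-8")) % num_reducers


def reduce_phase(word, counts):
    """Reducer: sum all partial counts received for one word."""
    return word, sum(counts)


def word_count(lines, num_reducers=2):
    # One bucket per simulated reducer: word -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line stands in for one mapper's input split
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:  # each bucket is processed by one simulated reducer
        for word, counts in bucket.items():
            key, total = reduce_phase(word, counts)
            result[key] = total
    return result


if __name__ == "__main__":
    print(word_count(["big data big", "data is big"]))
```

The combiner reduces how many pairs cross from the map side to the reduce side, and the partitioner guarantees that every partial count for a given word reaches the same reducer, which is why the final sums are correct.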