Understanding Apache Hive in Data Science 📊

Learn what Apache Hive is and its role in data science. Download the 90-day roadmap to become a Data Scientist now!

BigDataElearning
129.8K views • Dec 27, 2017

About this video

ATTENTION DATA SCIENCE ASPIRANTS:
Click Below Link to Download Proven 90-Day Roadmap to become a Data Scientist in 90 Days
https://www.bigdataelearning.com/the-data-science-aspirants-90-day-roadmap
Apache Hive Beginner's Guide : https://www.bigdataelearning.com/blog/apache-hive-beginners-guide
Apache Hive Courses : https://bigdataelearning.com/courses

In this video, you will get a quick overview of Apache Hive, one of the most popular data warehouse components in the big data landscape. It is mainly used to complement the Hadoop file system with a SQL-like query interface.
Hive was originally developed by Facebook and is now maintained as Apache Hive by the Apache Software Foundation. It is also used and developed by large companies such as Netflix and Amazon.

Why was Hive developed?
=====================
The Hadoop ecosystem is not just scalable but also cost-effective when it comes to processing large volumes of data. It is also a fairly new and powerful framework. However, organizations with traditional data warehouses are built around SQL, and their users and developers rely on SQL queries to extract data.

This makes getting used to the Hadoop ecosystem an uphill task, and that is exactly why Hive was developed.

Hive provides a SQL-like interface, so that users can write SQL-like queries, called HQL or Hive Query Language, to extract data from Hadoop. The Hive component converts these SQL-like queries into MapReduce jobs, and that is how it talks to the Hadoop ecosystem and the HDFS file system.
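
To make the conversion into map and reduce steps more concrete, here is a minimal Python sketch of how a simple grouped count could be split into map, shuffle, and reduce phases. The query shown in the comment, the sample `EMPLOYEES` rows, and all function names are hypothetical illustrations. This is not Hive's actual query planner, only the general MapReduce pattern that such a query follows.

```python
from collections import defaultdict

# Hypothetical query being illustrated:
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;

# Made-up sample rows standing in for a table stored on HDFS.
EMPLOYEES = [
    {"name": "asha", "dept": "sales"},
    {"name": "ben", "dept": "ops"},
    {"name": "chen", "dept": "sales"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases in the order a MapReduce job would run them."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(EMPLOYEES))
```

On a real cluster the map and reduce phases run in parallel across many machines, which is why this approach scales to large datasets but also adds start-up latency to every query.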

How and when can Hive be used?
===========================
- Hive can be used for OLAP (online analytical processing)
- It is scalable and flexible, and fast for batch-style analysis of large datasets
- It is a great platform for SQL users to write SQL-like queries against the large datasets that reside on the HDFS file system

Here is what Hive cannot be used for:
==============================
- It is not a relational database
- It cannot be used for OLTP (online transaction processing)
- It cannot be used for real-time updates or queries
- It cannot be used where low-latency data retrieval is expected, because Hive adds latency when it converts Hive queries into MapReduce jobs

Some of the finest features of Hive
============================
- It supports different file formats, such as sequence file, text file, Avro, ORC, and RC file
- Metadata is stored in an RDBMS such as the Derby database
- Hive supports several compression codecs, such as Snappy and gzip, and can run queries on the compressed data
- Users can write SQL-like queries that Hive converts into MapReduce, Tez, or Spark jobs to query Hadoop datasets
- Users can plug MapReduce scripts into Hive queries using user-defined functions (UDFs)
- Specialized joins are available that help to improve query performance
If you don't understand any of the above terms, that is fine. We will look into the above features in detail in our upcoming videos.
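
As a small preview of the point about plugging scripts into Hive queries, here is a hedged sketch of a streaming script written in Python. It assumes Hive's `TRANSFORM ... USING` clause, which streams rows to an external script as tab-separated lines on standard input and reads result rows back from standard output. That is one route for custom logic and is distinct from a compiled Java UDF. The script name `normalize_names.py`, the `users` table, and its columns are hypothetical.

```python
import sys

# Hypothetical usage from Hive (table and column names are made up):
#   ADD FILE normalize_names.py;
#   SELECT TRANSFORM (id, name) USING 'python normalize_names.py'
#     AS (id, name) FROM users;


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None  # skip malformed rows instead of failing the whole job
    user_id, name = fields
    # Trim surrounding whitespace and title-case the name column.
    return f"{user_id}\t{name.strip().title()}"


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read rows from stream_in and write transformed rows to stream_out."""
    for line in stream_in:
        result = transform_line(line)
        if result is not None:
            stream_out.write(result + "\n")


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in `transform_line` makes it easy to test the script locally, for example by piping a small tab-separated file into it, before attaching it to a Hive query.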

Video Information

Views: 129.8K
Likes: 1.5K
Duration: 5:24
Published: Dec 27, 2017
