Alibaba's Deployment of Apache Hadoop YARN 3.x for Large-Scale Data Infrastructure

Since 2013, Alibaba has utilized Apache Hadoop YARN 3.x to construct its data infrastructure, now managing over 10,000 nodes. Hadoop YARN supports various data processing workloads within Alibaba's extensive ecosystem.

Alibaba's Deployment of Apache Hadoop YARN 3.x for Large-Scale Data Infrastructure
DataWorks Summit
426 views • Jul 23, 2018
Alibaba's Deployment of Apache Hadoop YARN 3.x for Large-Scale Data Infrastructure

About this video

Alibaba builds the data infrastructure with Apache Hadoop YARN since 2013, and till now it manages more than 10k nodes. In Alibaba, Hadoop YARN serves various systems such as search, advertising, and recommendation etc. It runs not just batch jobs, also streaming, machine learning, OLAP, and even online services that directly impact Alibaba’s user experience. To extend YARN’s ability to support such complex scenarios, we have done and leveraged a lot of YARN 3.x improvements. In this talk, you will find what are these improvements and how they helped to solve difficult problems in large production clusters.

This includes:
1. Highly improved performance with Capacity Scheduler’s async scheduling framework
2. Better placement decisions with node attributes, placement constraints
3. Better resource utilization with opportunistic containers
4. Introduce a load balancer to balance resource utilization
5. Generic resource types scheduling/isolation to manage new resources such as GPU and FPGA

In the presentation, we will further introduce how we build the entire ecosystem on top of YARN and how we keep evolving YARN’s ability to tackle the challenges brought by continuously increasing data and business in Alibaba.

Tags and Topics

Browse our collection to discover more content in these categories.

Video Information

Views

426

Likes

5

Duration

38:27

Published

Jul 23, 2018

Related Trending Topics

LIVE TRENDS

Related trending topics. Click any trend to explore more videos.