
LinkedIn: Gradle for the Hadoop Ecosystem

The Hadoop Plugin is a Gradle Plugin that enables developers at LinkedIn to more effectively build, test, and deploy Hadoop applications. The Plugin features the Hadoop DSL, a language for expressing workflows for Hadoop schedulers like Azkaban and Apache Oozie. In this talk, I will explain our motivations for writing the Plugin, discuss its features for working with parts of the Hadoop ecosystem (such as Apache Pig), and take a deep look at the design and implementation of the Hadoop DSL.

At LinkedIn, we have adopted Gradle as the company’s primary tool for building, testing, and organizing our projects. Originally, Hadoop applications fell outside the company’s move to Gradle, as Hadoop developers relied on a series of custom Ant, Maven, Python, and Ruby build tools to make building and testing Hadoop applications easier. To help these teams complete the move to Gradle, we built the Hadoop Plugin, a Gradle Plugin with helpful features for working with the Hadoop ecosystem. With the Plugin, Hadoop developers at the company were able to migrate away from their old custom tools and adopt Gradle for building, testing, and deploying Hadoop applications.

The Plugin features the Hadoop DSL, an embedded Groovy / Gradle language for expressing Hadoop workflows. At build time, the DSL compiles into configuration files for open source Hadoop workflow schedulers like Azkaban and Apache Oozie. In this talk, I will take a deep dive into the motivation and design behind the DSL, look at the alternatives we weighed in its implementation, and discuss how it is being used at LinkedIn.
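To give a flavor of what the DSL looks like, here is a minimal sketch of a single-job workflow declared in a project's build.gradle. It is illustrative only: the job names, script paths, and HDFS locations are made up for this example, and the exact configuration options may differ across Plugin versions.

```groovy
// Illustrative sketch of the Hadoop DSL in a project's build.gradle.
// Assumes the Hadoop Plugin has already been applied to the project.
hadoop {
  // Directory where the compiled Azkaban job files will be written at build time
  buildPath "azkaban"

  // A workflow groups one or more jobs into a unit the scheduler can run
  workflow('countFlow') {
    // A Pig job that runs the given script; names and paths are hypothetical
    pigJob('countByCountry') {
      uses 'src/main/pig/count_by_country.pig'
      reads files: ['input_path': '/data/example/members']
      writes files: ['output_path': '/user/example/count_by_country']
    }
    // The workflow's final target; upstream jobs are derived from dependencies
    targets 'countByCountry'
  }
}
```

Running the build then generates the scheduler configuration (for example, Azkaban .job files) from this declaration, so the workflow definition lives alongside the rest of the project's Gradle build instead of in hand-maintained configuration files.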