Spring for Apache Hadoop requires JDK 6.0 or later (the same as Hadoop), Spring Framework 3.0 or later (3.2 recommended), and Apache Hadoop 0.20.2 or later (1.0.4 recommended). SHDP supports and is tested daily against various Hadoop distributions, such as Cloudera CDH3 (CDH3u5), CDH4 (CDH4.1u3 MRv1), and Greenplum HD (1.2). Any distribution compatible with Apache Hadoop 1.0.x should be supported.
Spring Data Hadoop is provided out of the box and is certified to work on the Greenplum/Pivotal HD distribution.
|Note that Hadoop YARN, NextGen or 2.x (currently in alpha stage), is NOT supported yet. Some Hadoop distros offer both the stable Hadoop (also known as 1.x or MRv1) and the YARN variant (also known as 2.x or MRv2) bundled together. Since SHDP supports only Hadoop 1.x, make sure to use only the 1.x/MRv1 services and libraries both on the clients and server, as 2.x/MRv2, or a mixture of the two, will cause errors.|
Regarding Hadoop-related projects, SHDP supports Cascading 2.1, HBase 0.90.x, Hive 0.8.x, and Pig 0.9.x and above. As a rule of thumb, when using Hadoop-related projects such as Hive or Pig, use the required Hadoop version as a basis for discovering the supported versions.
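The minimum versions listed in this section can be checked before deploying. The following is a minimal sketch, not part of SHDP: the `VersionCheck` class and its methods are hypothetical names introduced here for illustration. It assumes that version strings are dotted numbers with an optional vendor suffix, that JDK 6 reports `1.6` in the standard `java.specification.version` system property, and that the Hadoop version is passed in by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an SHDP class): compares dotted version strings numerically.
final class VersionCheck {

    private VersionCheck() {
    }

    // Extracts the leading numeric components, e.g. "1.0.4" -> [1, 0, 4].
    // Parsing stops at the first non-numeric part, so a vendor suffix
    // such as "-cdh3u5" is ignored.
    static List<Integer> parse(String version) {
        List<Integer> parts = new ArrayList<Integer>();
        for (String token : version.split("\\.")) {
            int end = 0;
            while (end < token.length() && Character.isDigit(token.charAt(end))) {
                end++;
            }
            if (end == 0) {
                break; // component does not start with a digit
            }
            parts.add(Integer.parseInt(token.substring(0, end)));
            if (end < token.length()) {
                break; // a suffix follows the digits; stop here
            }
        }
        return parts;
    }

    // True when 'actual' is greater than or equal to 'minimum';
    // missing components count as zero, so "1.0" equals "1.0.0".
    static boolean isAtLeast(String actual, String minimum) {
        List<Integer> a = parse(actual);
        List<Integer> m = parse(minimum);
        int length = Math.max(a.size(), m.size());
        for (int i = 0; i < length; i++) {
            int left = i < a.size() ? a.get(i) : 0;
            int right = i < m.size() ? m.get(i) : 0;
            if (left != right) {
                return left > right;
            }
        }
        return true;
    }

    // Assumption: "supported" means at least 0.20.2 and below 2.x,
    // following the minimum version and the note about Hadoop 2.x in this section.
    static boolean isSupportedHadoop(String version) {
        return isAtLeast(version, "0.20.2") && !isAtLeast(version, "2");
    }

    public static void main(String[] args) {
        String jdk = System.getProperty("java.specification.version");
        System.out.println("JDK " + jdk + " meets the 1.6 minimum: " + isAtLeast(jdk, "1.6"));
        if (args.length > 0) {
            // args[0] is a Hadoop version string supplied by the caller
            System.out.println("Hadoop " + args[0] + " in supported range: "
                    + isSupportedHadoop(args[0]));
        }
    }
}
```

Running `java VersionCheck 1.0.4` would report on both the local JDK and the given Hadoop version. The check looks only at version strings, so it cannot tell whether a distribution that bundles both MRv1 and MRv2 is actually configured to use the 1.x services.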
Spring for Apache Hadoop also requires a running Hadoop installation. If you don't already have a Hadoop cluster in your environment, a good first step is to create a single-node cluster. To install Hadoop 0.20.x+, the Getting Started page from the official Apache documentation is a good general guide. If you are running on Ubuntu, the tutorial from Michael G. Noll, "Running Hadoop On Ubuntu Linux (Single-Node Cluster)", provides more details. It is also convenient to download a virtual machine where Hadoop is set up and ready to go. Cloudera provides virtual machines in various formats. You can also download the EMC Greenplum HD distribution or get a tech preview of the Hortonworks distribution. Additionally, the appendix provides information on how to use Spring for Apache Hadoop and set up Hadoop with cloud providers, such as Amazon Web Services.