What is New in Hadoop?

Hadoop has had a significant impact on big data processing since it first appeared as an open-source project in 2006, with version 1.0 following in 2011. Successive major versions have made it more robust and more efficient, and an active community keeps the platform current with regular releases. The latest major line is Apache Hadoop 3.x, which reached general availability in 2017. It brings bug fixes, new features, and enhancements over Hadoop 2.x. Here are some of the new features in version 3.x that you need to know.

  • JDK 8 is the minimum runtime version for Hadoop 3.x

With Oracle ending public updates for JDK 7 in 2015, the Hadoop 3.0 JAR files are compiled to target Java 8. This lets Hadoop 3.x upgrade its dependencies to modern versions, since many libraries now support only Java 8 and above. Users still running an older Java version need to upgrade to JDK 8 before they can run these files.

  • Erasure coding support in HDFS

With data volumes and data center hardware growing rapidly, erasure coding support in Hadoop 3.0 is a critical feature. Erasure coding splits data into blocks and computes additional parity blocks, so that a lost block can be rebuilt from the surviving ones. It resembles parity-based RAID, which reconstructs data automatically when a hard disk fails. HDFS in Hadoop 2.x follows the Google File System (GFS) in using 3-way replication: every block is stored three times for reliability, which is a 200% storage overhead. With a typical erasure coding layout of six data blocks and three parity blocks, the overhead falls to 50%, which halves raw disk usage, while the number of simultaneous failures tolerated rises from two to three, a 50% increase. This can save customers a lot of money on hardware infrastructure.
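
The storage arithmetic behind replication versus erasure coding can be sketched in a few lines of Python. This is only an illustration: it assumes a Reed-Solomon layout with 6 data blocks and 3 parity blocks (a commonly used HDFS erasure coding layout), and the function names are made up for this example rather than taken from any Hadoop API.

```python
from fractions import Fraction


def replication_cost(copies):
    """Return (raw bytes stored per logical byte, copies that may be lost)."""
    return Fraction(copies), copies - 1


def erasure_coding_cost(data_blocks, parity_blocks):
    """Return the same two numbers for a Reed-Solomon (data, parity) layout."""
    raw = Fraction(data_blocks + parity_blocks, data_blocks)
    # Any `parity_blocks` blocks of a group can be lost and still be rebuilt.
    return raw, parity_blocks


rep_raw, rep_tolerated = replication_cost(3)
ec_raw, ec_tolerated = erasure_coding_cost(6, 3)  # assumed RS(6,3) layout

print(f"3x replication: {float(rep_raw)}x raw storage, survives {rep_tolerated} losses")
print(f"RS(6,3):        {float(ec_raw)}x raw storage, survives {ec_tolerated} losses")
print(f"Raw storage saved: {float(1 - ec_raw / rep_raw):.0%}")
```

Under these assumptions, raw storage drops from 3 bytes to 1.5 bytes per logical byte, and one more simultaneous failure is tolerated.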

  • Shell script rewrite

The shell scripts in previous versions of Hadoop had many bugs and compatibility issues. In Hadoop 3.x the scripts have been rewritten to resolve these bugs, compatibility issues, and installation problems. Among the most visible changes, all shell script subsystems now execute hadoop-env.sh, so every environment variable can be set in a single place. Daemonization has moved from the *-daemon.sh scripts into the bin commands through the --daemon option. The updated scripts also run more checks and report clearer error messages. These are just a few of the updates.
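
To make the daemonization change concrete, here is a small Python sketch that rewrites an old wrapper-script invocation into the `--daemon` form used by the Hadoop 3.x bin commands. The helper and its mapping table are hypothetical and written for this article; they cover only the two wrapper scripts listed and only the simple `script action daemon` form, with no extra options.

```python
# Hypothetical helper, not part of Hadoop: maps a legacy wrapper script to the
# Hadoop 3.x bin command that now handles daemonization via --daemon.
LEGACY_WRAPPERS = {
    "hadoop-daemon.sh": "hdfs",
    "yarn-daemon.sh": "yarn",
}


def to_hadoop3_command(legacy):
    """Turn e.g. 'sbin/hadoop-daemon.sh start namenode' into the --daemon form."""
    script, action, daemon = legacy.split()
    script_name = script.rsplit("/", 1)[-1]  # drop any leading directory
    command = LEGACY_WRAPPERS[script_name]   # KeyError for unknown scripts
    return f"{command} --daemon {action} {daemon}"


print(to_hadoop3_command("sbin/hadoop-daemon.sh start namenode"))
print(to_hadoop3_command("yarn-daemon.sh stop nodemanager"))
```

The two calls print `hdfs --daemon start namenode` and `yarn --daemon stop nodemanager`.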

  • Support for opportunistic containers

Hadoop 3.x introduces the notion of an execution type, which lets applications request containers of an opportunistic type. Such a container can be dispatched to a NodeManager even if no resources are available there at the moment of scheduling. In that case it is queued at the NodeManager and waits until resources free up before it starts. Opportunistic containers have lower priority than the default guaranteed containers and are preempted when a guaranteed container needs the room. As a result, cluster utilization improves significantly.
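
The queueing and preemption behaviour can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not the YARN API: the class `ToyNodeManager`, its methods, and the idea of counting resources in whole "slots" are all assumptions made for the example.

```python
from collections import deque


class ToyNodeManager:
    """Toy model of a single node; one slot holds one running container."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []     # (name, execution_type) pairs
        self.queue = deque()  # opportunistic containers waiting for a slot

    def launch(self, name, execution_type):
        if len(self.running) < self.slots:
            self.running.append((name, execution_type))
        elif execution_type == "OPPORTUNISTIC":
            self.queue.append(name)  # no free slot: wait at the node
        else:
            self._preempt_one_opportunistic()
            self.running.append((name, execution_type))

    def _preempt_one_opportunistic(self):
        # The preempted container is simply dropped in this toy model.
        for entry in self.running:
            if entry[1] == "OPPORTUNISTIC":
                self.running.remove(entry)
                return
        raise RuntimeError("node is full of guaranteed containers")

    def finish(self, name):
        self.running = [c for c in self.running if c[0] != name]
        # Freed slots go to queued opportunistic containers, oldest first.
        while self.queue and len(self.running) < self.slots:
            self.running.append((self.queue.popleft(), "OPPORTUNISTIC"))


nm = ToyNodeManager(slots=2)
nm.launch("g1", "GUARANTEED")
nm.launch("o1", "OPPORTUNISTIC")
nm.launch("o2", "OPPORTUNISTIC")  # node is full, so o2 is queued
nm.launch("g2", "GUARANTEED")     # preempts o1 to make room
nm.finish("g1")                   # the freed slot goes to o2
print(nm.running)
```

In this run the node ends up executing `g2` and `o2`: the opportunistic work fills capacity that would otherwise sit idle, but always yields to guaranteed containers.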

  • Port changes for various services and the addition of new default ports

The default ports of several services have changed: those of the NameNode, Secondary NameNode, DataNode, and KMS have been moved out of the Linux ephemeral port range (32768-61000) to avoid bind errors on startup caused by conflicts with other applications. This makes rolling restarts on large Hadoop clusters more reliable.
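
A quick way to see which default ports fall inside the ephemeral range is to check them in Python. The port numbers below are recalled from the Hadoop 3.0 release notes and cover only a few HDFS services, so treat the table as an assumption to verify against your own cluster configuration; the helper name is made up for this example.

```python
# Linux ephemeral port range cited in the article, inclusive on both ends.
EPHEMERAL_RANGE = range(32768, 61001)

# (service, Hadoop 2.x default, Hadoop 3.0 default); assumed values, verify locally.
PORT_CHANGES = [
    ("NameNode HTTP UI", 50070, 9870),
    ("Secondary NameNode HTTP UI", 50090, 9868),
    ("DataNode data transfer", 50010, 9866),
    ("DataNode HTTP UI", 50075, 9864),
]


def in_ephemeral_range(port):
    """True if the OS may hand this port to any outgoing connection."""
    return port in EPHEMERAL_RANGE


for service, old, new in PORT_CHANGES:
    print(
        f"{service}: {old} (ephemeral: {in_ephemeral_range(old)}) -> "
        f"{new} (ephemeral: {in_ephemeral_range(new)})"
    )
```

A port inside the range can already be taken by an unrelated client connection when the daemon starts, which is the bind conflict the new defaults are meant to avoid.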

Although these are just a few of the improvements in Hadoop 3.0, they represent a major advancement in the big data space. With these features, others not covered here, and more that are likely to be announced, Hadoop should remain a competitive platform for the foreseeable future.

Scott Koegler

Scott Koegler is Executive Editor for Big Data & Analytics Tech Brief

scottkoegler.me/
