Title: Mastering Hadoop 3
Authors: Chanchal Singh, Manish Kumar
Last updated: 2025-04-04 14:54:50

Join pattern

Joins are common wherever reports are built: two datasets are combined so that decision makers can draw meaningful analysis from the result. A join query is simple to write in SQL, but achieving the same thing in MapReduce is more involved, because both mappers and reducers operate on a single key at a time. Joining two datasets of equal size requires roughly twice the network bandwidth of processing one of them, since all data from both datasets has to be sent to the reducers for joining.
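To make the data movement concrete, here is a minimal sketch of a reduce-side inner join. It is a plain-Java simulation that uses in-memory collections, not the Hadoop API. The class name `ReduceSideJoinSketch`, the tags `"L"` and `"R"`, and the sample records are all assumptions made for illustration. The map step tags every record with its source dataset, the `TreeMap` stands in for the shuffle that groups records by join key, and the reduce step pairs left and right records that share a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Map phase (simulated): emit every record under its join key,
    // tagged with the dataset it came from. rec[0] is the key, rec[1] the payload.
    static void map(String tag, List<String[]> records, Map<String, List<String[]>> shuffle) {
        for (String[] rec : records) {
            shuffle.computeIfAbsent(rec[0], k -> new ArrayList<>())
                   .add(new String[] {tag, rec[1]});
        }
    }

    // Reduce phase (simulated): sees all tagged values for ONE key,
    // separates them by source, and emits the cross product (inner join).
    static List<String> reduce(String key, List<String[]> taggedValues) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String[] tv : taggedValues) {
            if (tv[0].equals("L")) {
                left.add(tv[1]);
            } else {
                right.add(tv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(key + "," + l + "," + r);
            }
        }
        return out;
    }

    // Runs both phases. Every record of both inputs passes through the
    // shuffle map, which is why the full volume of both datasets is moved.
    static List<String> join(List<String[]> leftData, List<String[]> rightData) {
        Map<String, List<String[]>> shuffle = new TreeMap<>();
        map("L", leftData, shuffle);
        map("R", rightData, shuffle);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : shuffle.entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: (userId, name) and (userId, item).
        List<String[]> users = List.of(
                new String[] {"1", "alice"},
                new String[] {"2", "bob"});
        List<String[]> orders = List.of(
                new String[] {"1", "book"},
                new String[] {"1", "pen"},
                new String[] {"3", "lamp"});
        join(users, orders).forEach(System.out::println);
    }
}
```

With the sample data, only key `1` appears in both inputs, so the join yields two rows, `1,alice,book` and `1,alice,pen`. The records for keys `2` and `3` still travel through the shuffle before being discarded, which illustrates why this style of join is bandwidth-hungry.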

The join operation is very costly in Hadoop because it moves data from one machine to another over the network, so it is worth the effort to conserve network bandwidth. Let's look at a few join patterns.
