Join pattern 

Joins are common wherever reports are created: two datasets are combined to produce analysis that helps decision makers. A join query is simple to write in SQL, but achieving the same result in MapReduce is more complex, because both mappers and reducers operate on a single key at a time. Joining two datasets of equal size requires roughly twice the network bandwidth of shuffling one of them alone, as all data from both datasets has to be sent to the reducers for joining.
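To make the data flow concrete, here is a minimal in-memory sketch of a join in which every record from both datasets is sent to the reducers. It is plain Python that only imitates the map, shuffle, and reduce phases; it does not use the Hadoop API. The datasets (customers and orders), the tags "C" and "O", and all function names are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical record layouts, assumed for this sketch:
#   customer record: (customer_id, name)
#   order record:    (order_id, customer_id, amount)

def map_customers(record):
    """Emit the join key with a tag marking the source dataset."""
    customer_id, name = record
    yield customer_id, ("C", name)

def map_orders(record):
    """Emit the same join key so matching records meet in one reducer call."""
    order_id, customer_id, amount = record
    yield customer_id, ("O", (order_id, amount))

def shuffle(pairs):
    """Group values by key, imitating what the framework does over the network."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_join(key, values):
    """Handle one key at a time: pair each customer with each of its orders."""
    names = [v for tag, v in values if tag == "C"]
    orders = [v for tag, v in values if tag == "O"]
    for name in names:
        for order_id, amount in orders:
            yield key, name, order_id, amount

def run_join(customers, orders):
    """Run the simulated job; also report how many records were shuffled."""
    pairs = []
    for record in customers:
        pairs.extend(map_customers(record))
    for record in orders:
        pairs.extend(map_orders(record))
    shuffled_records = len(pairs)  # every record from both inputs is moved
    groups = shuffle(pairs)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined, shuffled_records
```

Calling run_join with three customers and three orders shuffles six records, even when some customers have no orders and therefore contribute nothing to the output. This is why the amount of data moved grows with the combined size of both inputs rather than with the size of the result.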

The join operation is very costly in Hadoop because it moves data from one machine to another over the network, so it is important to save network bandwidth wherever possible. Let's look into a few join patterns.