Semi Join vs. Bloom Join

Difference Between Semi Join and Bloom Join

Semi join and Bloom join are join methods used in query processing for distributed databases. In a distributed database the data is usually stored at different sites and has to be transferred between them to process a query. To reduce the cost of the operation, queries are optimized so that as little data as possible is transferred. This is where these two methods come into the picture.

Let us understand this with an example of an employee database. Suppose some of the information about an employee is at site 1, the rest is at site 2, and you want to access the complete information from site 3. You need to execute a query that joins the two parts. It is not necessary to transfer the entire database; the attributes required for the join are enough to get started. In a semi join, one site sends only its join column to the other site, which uses it to pick out the matching rows, so only rows that actually take part in the join are transferred afterwards. This reduces the amount of data transferred between the sites.
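The semi join steps can be sketched in a few lines of Python. This is only an illustration under assumed data: the two lists stand in for the tables at site 1 and site 2, the employee id is assumed to be the join column, and the names `site1_employees`, `site2_salaries` and `semi_join_reduce` are made up for this example.

```python
# Hypothetical tables: site 1 holds (emp_id, name), site 2 holds (emp_id, salary).
site1_employees = [(1, "Asha"), (2, "Ben"), (3, "Chen")]
site2_salaries = [(2, 52000), (3, 61000), (4, 48000), (5, 75000)]


def semi_join_reduce(rows, join_keys):
    """Keep only the rows whose key (first field) appears in join_keys."""
    return [row for row in rows if row[0] in join_keys]


# Step 1: site 1 projects its join column; only these keys are shipped to site 2.
keys_from_site1 = {emp_id for emp_id, _ in site1_employees}

# Step 2: site 2 discards rows that cannot match and ships the remainder.
reduced_salaries = semi_join_reduce(site2_salaries, keys_from_site1)

# Step 3: the final join is assembled from the reduced rows.
salary_by_id = dict(reduced_salaries)
joined = [
    (emp_id, name, salary_by_id[emp_id])
    for emp_id, name in site1_employees
    if emp_id in salary_by_id
]

print(reduced_salaries)
print(joined)
```

In this sketch only two of the four salary rows leave site 2, because the other two have no partner at site 1.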

In a Bloom join, a compact representation of the join column is transferred between the remote sites instead of the join column itself, as is done in a semi join. This representation is a Bloom filter: a bit vector, filled in by hashing the join values, that can answer membership queries.
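A minimal sketch of the idea in Python follows. The `BloomFilter` class, its sizes (64 bits, 3 hash functions), the use of SHA-256 to derive bit positions, and the sample tables are all assumptions chosen for illustration, not a description of any particular database system.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit vector."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # True for every added key; may also be True for others (false positive).
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


# Hypothetical tables: site 1 holds (emp_id, name), site 2 holds (emp_id, salary).
site1_employees = [(1, "Asha"), (2, "Ben"), (3, "Chen")]
site2_salaries = [(2, 52000), (3, 61000), (4, 48000), (5, 75000)]

# Step 1: site 1 hashes its join column into the filter and ships only the bits.
bloom = BloomFilter()
for emp_id, _ in site1_employees:
    bloom.add(emp_id)

# Step 2: site 2 ships the rows whose key might be present at site 1.
candidates = [row for row in site2_salaries if bloom.might_contain(row[0])]

# Step 3: the final join drops any false positives that slipped through.
salary_by_id = dict(candidates)
joined = [
    (emp_id, name, salary_by_id[emp_id])
    for emp_id, name in site1_employees
    if emp_id in salary_by_id
]
print(joined)
```

What travels from site 1 here is a fixed-size bit vector rather than the list of keys. The candidate rows are guaranteed to include every true match, but they may also include a few rows that only appear to match.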

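Bloom join is usually more efficient than semi join because the bit vector is much smaller than the join column it represents, so far less data is transferred. The trade-off is that a Bloom filter can report false positives, so a few non-matching rows may be transferred and then discarded in the final join.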

 
