How to optimize SQL commands on the JOIN clause

Use UNION in your SQL code.

Written by Raghavan Vaidhyaraman

Last published at: July 18th, 2025

Introduction

You want to optimize the following SQL command’s performance. 

SELECT * FROM <table-name-1>
INNER JOIN <table-name-2> ON <table-name-1>.<column-1> = <table-name-2>.<column-1> OR <table-name-1>.<column-2> = <table-name-2>.<column-2>

 

Additional context

JOIN ON uses the PhotonBroadcastNestedLoopJoin. The nested loop approach in PhotonBroadcastNestedLoopJoin can be slow for large datasets because it involves a Cartesian product-like operation (pairing every row from one dataset with every row from the other, creating all possible row-to-row combinations). 

 

 

Instructions

Change your SQL code to use UNION instead.

SELECT * FROM <table-name-1>
INNER JOIN <table-name-2> ON <table-name-1>.<column-1> = <table-name-2>.<column-1>
UNION
SELECT * FROM <table-name-1>
INNER JOIN <table-name-2> ON <table-name-1>.<column-2> = <table-name-2>.<column-2>

 

UNION uses the PhotonBroadcastHashJoin to perform parallel joins, which are more efficient for joining large datasets. PhotonBroadcastHashJoin uses a hash join algorithm where one side of the join is broadcast to all nodes, and a hash table is built on the broadcast side. The other side of the join then probes this hash table to find matching rows.