Problem
When you perform a JOIN operation on a masked column, you notice the system uses the masked values for the JOIN instead of the real (unmasked) values. You receive different results than what you expect.
Cause
Row filters and column masks are applied immediately after the table rows are scanned in the query plan, before any further operations such as joins or aggregations. Consequently, joins on masked columns use the masked values.
This is expected behavior, designed to ensure sensitive data from masked columns cannot be inadvertently exposed during query operations.
For more information, review the Column mask clause (AWS | Azure | GCP) documentation.
Solution
If you need to join on sensitive columns without exposing the real values, create a materialized view followed by dynamic view with column-level security or hash sensitive columns.
Create a materialized view
First, create a materialized view (AWS | Azure | GCP) that joins tables using unmasked sensitive columns. This allows the join operation to occur without masking interfering with the join logic.
The following example code creates a materialized view that joins <your-table-a>
and <your-table-b>
on the unmasked sensitive_id
column.
%sql
CREATE OR REPLACE MATERIALIZED VIEW <your-joined-mv>
AS SELECT
a.unmasked_id,
a.name,
b.address
FROM <your-table-a> a
INNER JOIN <your-table-b> b
ON a.unmasked_id = b.unmasked_id
Then create a dynamic view (AWS | Azure | GCP) on top of the materialized view to enforce column-level security. This dynamic view should implement masking or redaction policies to ensure sensitive data is only visible to authorized users.
The following example code creates a dynamic view on top of the previously-created materialized view to enforce column-level security using Unity Catalog functions.
%sql
CREATE OR REPLACE VIEW <your-secure-joined-mv> AS
SELECT
CASE
WHEN is_account_group_member('admin') THEN unmasked_id
ELSE '****'
END AS unmasked_id,
name,
CASE
WHEN is_account_group_member('admin') THEN address
ELSE CONCAT('**** ', SUBSTRING(address, -6))
END AS address
FROM <your-joined-mv>;
Hash sensitive columns
Hash sensitive columns and perform joins using these hashed representations.
First, hash sensitive columns during table creation. The following example code creates tables <your-table-a-hashed>
and <your-table-b-hashed>
with hashed versions of the sensitive columns to protect the original data. For more information, review the hash
function (AWS | Azure | GCP) documentation.
%sql
-- Table A with hashed IDs
CREATE OR REPLACE TABLE <your-table-a-hashed> AS
SELECT
hash(unmasked_id) AS hashed_id,
name
FROM <your-table-a>;
-- Table B with hashed IDs and addresses
CREATE OR REPLACE TABLE <your-table-b-hashed> AS
SELECT
hash(unmasked_id) AS hashed_id,
xxhash64(address) AS hashed_address
FROM <your-table-b>;
Then perform a JOIN
on the hashed columns from the previously-created tables, ensuring sensitive data remains protected throughout the process. The following code provides an example.
SELECT
a.hashed_id,
a.name,
b.hashed_address
FROM <your-table-a-hashed> a
INNER JOIN <your-table-b-hashed> b
ON a.hashed_id = b.hashed_id;