How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table

Moderator: NorbertKrupa

Post Reply
naranibhanu
Newbie
Newbie
Posts: 2
Joined: Fri Mar 06, 2015 6:51 am

How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table

Post by naranibhanu » Fri Mar 06, 2015 7:07 am

Hi All,

I have 2 diemsion tables and 1 fact table as below

D_DATE_DIMENSION

DATE_KEY
DATE_VAL

Data for above table is unsegmented(replicated) across all 3 nodes and projection is order by DATE_KEY column


D_SUBSCRIBER_DIMENSION

SUBSCRIBER_KEY
MSISDN

Data for above table is unsegmented(replicated) across all 3 nodes and projection is order by SUBSCRIBER_KEY column

F_USAGE_FACT

DATE_KEY
SUBSCRIBER_KEY
USAGE
CHARGE


Data for above table is segmented. Projection is order by DATE_KEY,SUBSCRIBER_KEY and segmented by DATE_KEY..


To use group by pipeline and merge join vertica suggests Data in the facts and dimension table should be sorted by the join key columns and i creates projections for above 2 dimension tables & fact table accordingly.

Now my query is as follows

select sum(USAGE) from F_USAGE_FACT a,D_DATE_DIMENSION b where a.DATE_KEY =b.DATE_KEY

For above query merge join used when icheck in explain plan..

Now i tried to join all 3 tables by using below query

select sum(USAGE) from F_USAGE_FACT a,D_DATE_DIMENSION b , D_SUBSCRIBER_DIMENSION c where a.DATE_KEY =b.DATE_KEY and a.SUBSCRIBER_KEY=c.SUBSCRIBER_KEY


Now when i check explain plan for above query it is doing Merge join only for D_DATE_DIMENSION and hash join for D_SUBSCRIBER_DIMENSION table even though fact table projection is ordered by DATE_KEY,SUBSCRIBER_KEY


How to use Merge join when i join fact table with more than one dimension tables..

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table

Post by JimKnicely » Fri Mar 06, 2015 2:21 pm

Hi,

For a MERGE JOIN to occur, the input from both sides must be sorted on the join key. You can re-order the results of the fact table join to the date dimension on the subscriber key, and then join to the subscriber dimension.

Code: Select all

 ------------------------------
 QUERY PLAN DESCRIPTION:
 ------------------------------

 explain select * from (select * from F_USAGE_FACT a join D_DATE_DIMENSION b ON a.DATE_KEY = b.DATE_KEY ORDER BY a.subscriber_key) foo join D_SUBSCRIBER_DIMENSION c on c.subscriber_key = foo.subscriber_key;

 Access Path:
 +-JOIN MERGEJOIN(inputs presorted) [Cost: 82, Rows: 1] (PATH ID: 1)
 |  Join Cond: (c.subscriber_key = foo.subscriber_key)
 |  Execute on: All Nodes
 | +-- Outer -> SELECT [Cost: 47, Rows: 1] (PATH ID: 2)
 | |      Execute on: All Nodes
 | | +---> SORT [Cost: 47, Rows: 1] (PATH ID: 3)
 | | |      Order: a.subscriber_key ASC
 | | |      Execute on: All Nodes
 | | |      Runtime Filter: (SIP2(MergeJoin): foo.subscriber_key)
 | | | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 46, Rows: 1] (PATH ID: 4)
 | | | |      Join Cond: (a.date_key = b.date_key)
 | | | |      Materialize at Output: b.date_val
 | | | |      Execute on: All Nodes
 | | | | +-- Outer -> STORAGE ACCESS for b [Cost: 17, Rows: 1] (PATH ID: 5)
 | | | | |      Projection: public.d_date_dimension_node0001
 | | | | |      Materialize: b.date_key
 | | | | |      Execute on: All Nodes
 | | | | |      Runtime Filter: (SIP1(MergeJoin): b.date_key)
 | | | | +-- Inner -> STORAGE ACCESS for a [Cost: 23, Rows: 1] (PATH ID: 6)
 | | | | |      Projection: public.f_usage_fact_b0
 | | | | |      Materialize: a.date_key, a.subscriber_key, a.usage, a.charge
 | | | | |      Execute on: All Nodes
 | +-- Inner -> STORAGE ACCESS for c [Cost: 34, Rows: 1] (PATH ID: 7)
 | |      Projection: public.d_subscriber_dimension_node0001
 | |      Materialize: c.subscriber_key, c.msisdn
 | |      Execute on: All Nodes
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table

Post by scutter » Fri Mar 06, 2015 6:39 pm

Yes, forcing a sort to get a merge join for the 2nd dimension does achieve a merge join, but I’d performance test that before doing it. Sort is expensive, and you’d still need to iterate over all of the fact records in order to join the dimension values.

The original example EXPLAIN output isn’t for real data - it’s just a small amount of data. But as long as the 2nd dimension table is small, the hash join will only be marginally slower than the merge join - just by the amount of time it takes to build the in-memory hash table for the dimension data.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table

Post by JimKnicely » Fri Mar 06, 2015 6:48 pm

I never said it'd perform well :)

I wonder if it'd be a good idea for the original poster to simply create a pre-join projection on the three tables?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “Vertica SQL”