Hi All,
I have 2 diemsion tables and 1 fact table as below
D_DATE_DIMENSION
DATE_KEY
DATE_VAL
Data for above table is unsegmented(replicated) across all 3 nodes and projection is order by DATE_KEY column
D_SUBSCRIBER_DIMENSION
SUBSCRIBER_KEY
MSISDN
Data for above table is unsegmented(replicated) across all 3 nodes and projection is order by SUBSCRIBER_KEY column
F_USAGE_FACT
DATE_KEY
SUBSCRIBER_KEY
USAGE
CHARGE
Data for above table is segmented. Projection is order by DATE_KEY,SUBSCRIBER_KEY and segmented by DATE_KEY..
To use group by pipeline and merge join vertica suggests Data in the facts and dimension table should be sorted by the join key columns and i creates projections for above 2 dimension tables & fact table accordingly.
Now my query is as follows
select sum(USAGE) from F_USAGE_FACT a,D_DATE_DIMENSION b where a.DATE_KEY =b.DATE_KEY
For above query merge join used when icheck in explain plan..
Now i tried to join all 3 tables by using below query
select sum(USAGE) from F_USAGE_FACT a,D_DATE_DIMENSION b , D_SUBSCRIBER_DIMENSION c where a.DATE_KEY =b.DATE_KEY and a.SUBSCRIBER_KEY=c.SUBSCRIBER_KEY
Now when i check explain plan for above query it is doing Merge join only for D_DATE_DIMENSION and hash join for D_SUBSCRIBER_DIMENSION table even though fact table projection is ordered by DATE_KEY,SUBSCRIBER_KEY
How to use Merge join when i join fact table with more than one dimension tables..
How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table
Moderator: NorbertKrupa
-
- Newbie
- Posts: 2
- Joined: Fri Mar 06, 2015 6:51 am
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table
Hi,
For a MERGE JOIN to occur, the input from both sides must be sorted on the join key. You can re-order the results of the fact table join to the date dimension on the subscriber key, and then join to the subscriber dimension.
For a MERGE JOIN to occur, the input from both sides must be sorted on the join key. You can re-order the results of the fact table join to the date dimension on the subscriber key, and then join to the subscriber dimension.
Code: Select all
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain select * from (select * from F_USAGE_FACT a join D_DATE_DIMENSION b ON a.DATE_KEY = b.DATE_KEY ORDER BY a.subscriber_key) foo join D_SUBSCRIBER_DIMENSION c on c.subscriber_key = foo.subscriber_key;
Access Path:
+-JOIN MERGEJOIN(inputs presorted) [Cost: 82, Rows: 1] (PATH ID: 1)
| Join Cond: (c.subscriber_key = foo.subscriber_key)
| Execute on: All Nodes
| +-- Outer -> SELECT [Cost: 47, Rows: 1] (PATH ID: 2)
| | Execute on: All Nodes
| | +---> SORT [Cost: 47, Rows: 1] (PATH ID: 3)
| | | Order: a.subscriber_key ASC
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP2(MergeJoin): foo.subscriber_key)
| | | +---> JOIN MERGEJOIN(inputs presorted) [Cost: 46, Rows: 1] (PATH ID: 4)
| | | | Join Cond: (a.date_key = b.date_key)
| | | | Materialize at Output: b.date_val
| | | | Execute on: All Nodes
| | | | +-- Outer -> STORAGE ACCESS for b [Cost: 17, Rows: 1] (PATH ID: 5)
| | | | | Projection: public.d_date_dimension_node0001
| | | | | Materialize: b.date_key
| | | | | Execute on: All Nodes
| | | | | Runtime Filter: (SIP1(MergeJoin): b.date_key)
| | | | +-- Inner -> STORAGE ACCESS for a [Cost: 23, Rows: 1] (PATH ID: 6)
| | | | | Projection: public.f_usage_fact_b0
| | | | | Materialize: a.date_key, a.subscriber_key, a.usage, a.charge
| | | | | Execute on: All Nodes
| +-- Inner -> STORAGE ACCESS for c [Cost: 34, Rows: 1] (PATH ID: 7)
| | Projection: public.d_subscriber_dimension_node0001
| | Materialize: c.subscriber_key, c.msisdn
| | Execute on: All Nodes
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table
Yes, forcing a sort to get a merge join for the 2nd dimension does achieve a merge join, but I’d performance test that before doing it. Sort is expensive, and you’d still need to iterate over all of the fact records in order to join the dimension values.
The original example EXPLAIN output isn’t for real data - it’s just a small amount of data. But as long as the 2nd dimension table is small, the hash join will only be marginally slower than the merge join - just by the amount of time it takes to build the in-memory hash table for the dimension data.
—Sharon
The original example EXPLAIN output isn’t for real data - it’s just a small amount of data. But as long as the 2nd dimension table is small, the hash join will only be marginally slower than the merge join - just by the amount of time it takes to build the in-memory hash table for the dimension data.
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: How to make query to use merge join when 1 fact table is joined with more than 1 dimensional table
I never said it'd perform well
I wonder if it'd be a good idea for the original poster to simply create a pre-join projection on the three tables?
I wonder if it'd be a good idea for the original poster to simply create a pre-join projection on the three tables?
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.