To load very huge historical data in Vertica

Moderator: NorbertKrupa

Post Reply
IMK
Newbie
Newbie
Posts: 1
Joined: Thu Nov 28, 2013 3:48 am

To load very huge historical data in Vertica

Post by IMK » Sat Nov 30, 2013 2:44 pm

Subject - To load very huge historical data in Vertica

Hi,

I've to join two tables as follows - Table1 LEFT OUTER JOIN Table2 using columns accouting_year_month and customer_key.
Total number of rows on Table1 will be around 13 billion(historic data).
Table2 would be having around 3 billion records.
So accouting_year_month column would be having last last 5 years values(total around 60 unique values) and customer_key(high cardinality) values would be spread across accouting_yr_month(low cardinality).

This will only one time SQL that I've to run to pull historic data.

I have defined my source tables/projections as follows -
-------------------
create TABLE table1(
.
.
.

)
HASH segmented by (accouting_year_month,customer_key) ON ALL nodes ;

/** super prjection **/
Create Projection table1
select
.
.
.
from table1
Order by accouting_year_month,customer_key
HASH segmented by (accouting_year_month,customer_key) ON ALL nodes ;


------------------------------------------
create TABLE table2(
.
.
.

)
HASH segmented by (accouting_year_month,customer_key) ON ALL nodes ;
/** super projection **/
Create Projection table2
select
.
.
.
from table2
Order by accouting_year_month,customer_key
HASH segmented by (accouting_year_month,customer_key) ON ALL nodes ;

Now my join would be

select
.
.
.
.
.
.

from table1 A
left outer join table2 B
on
A.accouting_year_month =B.accouting_year_month
AND A.customer_key = B.customer_key
;

My question, is above approach correct.

There is also alternative approach below I am thinking to use -

Should I be changing the ordering and segmentation of my projections on the basis of to (customer_key,accouting_year_month) instead of (accouting_year_month,customer_key)
In that case I'll need to modify my column order in join condition also. So my column order in join condition would be following.

select
.
.
.
.
.
.

from table1 A
left outer join table2 B
on

AND A.customer_key = B.customer_key
A.accouting_year_month =B.accouting_year_month
;

Please let me which approach is better theoretically.

Also, please let me know if there is any other alternative approach that can be used.

Thanks for your help in advance.

Post Reply

Return to “General”