Subject - How to load very large historical data in Vertica
Hi,
I have to join two tables as follows: Table1 LEFT OUTER JOIN Table2 using the columns accouting_year_month and customer_key.
Table1 will have around 13 billion rows (historic data).
Table2 will have around 3 billion rows.
The accouting_year_month column holds the last 5 years of values (around 60 unique values, so low cardinality), and the high-cardinality customer_key values are spread across those accouting_year_month values.
This is a one-time SQL query that I have to run to pull the historic data.
I have defined my source tables/projections as follows -
-------------------
CREATE TABLE table1 (
.
.
.
)
SEGMENTED BY HASH(accouting_year_month, customer_key) ALL NODES;
/* super projection */
CREATE PROJECTION table1_super AS
SELECT
.
.
.
from table1
ORDER BY accouting_year_month, customer_key
SEGMENTED BY HASH(accouting_year_month, customer_key) ALL NODES;
------------------------------------------
CREATE TABLE table2 (
.
.
.
)
SEGMENTED BY HASH(accouting_year_month, customer_key) ALL NODES;
/* super projection */
CREATE PROJECTION table2_super AS
SELECT
.
.
.
from table2
ORDER BY accouting_year_month, customer_key
SEGMENTED BY HASH(accouting_year_month, customer_key) ALL NODES;
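As a sanity check (assuming the DDL above was accepted as written), I plan to confirm that both projections really are segmented identically on the join keys by querying Vertica's projection catalog view:

```sql
-- Check segmentation of the projections anchored on both tables
SELECT projection_name, is_segmented, segment_expression
FROM v_catalog.projections
WHERE anchor_table_name IN ('table1', 'table2');
```

If the segment_expression values match for both tables, the join keys of matching rows should already be co-located on the same nodes.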
Now my join would be:
SELECT
.
.
.
.
.
.
FROM table1 A
LEFT OUTER JOIN table2 B
  ON A.accouting_year_month = B.accouting_year_month
 AND A.customer_key = B.customer_key;
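To verify that this layout actually gives a local merge join, I intend to check the plan with EXPLAIN first (the select list below is just a placeholder for the real column list):

```sql
-- Placeholder select list; the real query selects the actual columns
EXPLAIN
SELECT A.customer_key, A.accouting_year_month
FROM table1 A
LEFT OUTER JOIN table2 B
  ON A.accouting_year_month = B.accouting_year_month
 AND A.customer_key = B.customer_key;
```

As I understand it, if both super projections are sorted and segmented identically on the join keys, the plan should show a MERGEJOIN with no RESEGMENT or BROADCAST step; a hash join or a resegment would mean data gets redistributed across nodes at query time.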
My question is: is the above approach correct?
There is also an alternative approach I am thinking of using:
Should I change the ordering and segmentation of my projections to (customer_key, accouting_year_month) instead of (accouting_year_month, customer_key)?
In that case I'll also need to modify the column order in my join condition, so it would become the following.
SELECT
.
.
.
.
.
.
FROM table1 A
LEFT OUTER JOIN table2 B
  ON A.customer_key = B.customer_key
 AND A.accouting_year_month = B.accouting_year_month;
Please let me know which approach is better theoretically.
Also, please let me know if there is any other alternative approach that can be used.
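For context, one other alternative I was considering myself is to break the one-time pull into roughly 60 batches, one per accouting_year_month value, so each run only touches a slice of the 13 billion rows (result_table and the select list here are hypothetical placeholders):

```sql
-- Hypothetical target table; run once per accouting_year_month value (~60 runs)
INSERT /*+ DIRECT */ INTO result_table
SELECT A.customer_key, A.accouting_year_month  -- placeholder column list
FROM table1 A
LEFT OUTER JOIN table2 B
  ON A.accouting_year_month = B.accouting_year_month
 AND A.customer_key = B.customer_key
WHERE A.accouting_year_month = 201901;  -- example month key; loop over all values
```

My understanding is that the DIRECT hint writes straight to ROS storage, which is usually preferable for bulk inserts of this size, but please correct me if that is wrong.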
Thanks for your help in advance.