While joining two tables in vertica, in terms of performance wise, the query will perform good while joining on two integer fields or joining on VARCHAR fields.
As the data are stored in Vertica in encrypted format, so while performing join how the data is being read from Vertica. If the tables are joined on integer fileds then how the data is being read from Vertica or IF the join is on varchar fields, then how the data is being read from Vertica.
Which one will improve performance ?
Vertica joins on Integer or VARCHAR fields , which one is good for performance ?
Moderator: NorbertKrupa
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?
At the machine level, there are instructions which can perform integer comparisons in one cycle and are usually stored in smaller registers.
With strings (or VARCHAR) which can be varying lengths, and characters, you have to compare each character of the string as well as NULLs. Since these characters are translated into integers at the machine level, the machine is now doing a loop over the entire string.
Some databases do different optimizations to improve performance on string comparisons. As a rule of thumb, a JOIN on an integer is usually better.
I always say, test it yourself.
TLDR: An integer comparison at a machine level will always be faster than a string comparison
With strings (or VARCHAR) which can be varying lengths, and characters, you have to compare each character of the string as well as NULLs. Since these characters are translated into integers at the machine level, the machine is now doing a loop over the entire string.
Some databases do different optimizations to improve performance on string comparisons. As a rule of thumb, a JOIN on an integer is usually better.
I always say, test it yourself.
TLDR: An integer comparison at a machine level will always be faster than a string comparison
Checkout vertica.tips for more Vertica resources.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?
Hi,
I agree with Norbert in that typically joining on columns having the INT datatype is better than joining on columns that have a VARCHAR datatype. Although when tested at client sites, I typically only see a performance degradation with VARCHARs when they are significantly larger than 8 bytes in size. I'm guessing because integers in Vertica use 8 bytes
I agree with Norbert in that typically joining on columns having the INT datatype is better than joining on columns that have a VARCHAR datatype. Although when tested at client sites, I typically only see a performance degradation with VARCHARs when they are significantly larger than 8 bytes in size. I'm guessing because integers in Vertica use 8 bytes
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?
As the vertica uses Hash or Mege join while joing two tables; so even if we join two tables in vertica with Joining fields as varchar field, It will perform fast to retrieve the records ?
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Vertica joins on Integer or VARCHAR fields , which one is good for performance ?
Please read my post on the difference between merge and hash joins.
Checkout vertica.tips for more Vertica resources.