How does Vertica evaluate columns for encoding type?

Moderator: NorbertKrupa

Post Reply
NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

How does Vertica evaluate columns for encoding type?

Post by NorbertKrupa » Mon Apr 14, 2014 3:54 pm

I pulled the encoding types for the projection columns in my environment and was curious how Vertica evaluates columns/data in choosing an encoding type.

Code: Select all

   data_type    |  encoding_type   | count
----------------+------------------+-------
 float          | AUTO             |  2782
 float          | RLE              |  1214
 float          | BLOCKDICT_COMP   |  1100
 float          | DELTARANGE_COMP  |  1026
 int            | DELTARANGE_COMP  |   804
 int            | RLE              |   696
 int            | AUTO             |   454
 date           | RLE              |   360
 int            | BLOCKDICT_COMP   |   328
 int            | COMMONDELTA_COMP |   292
 float          | COMMONDELTA_COMP |   286
 date           | BLOCKDICT_COMP   |   158
 timestamp      | GCDDELTA         |   148
 timestamp      | DELTARANGE_COMP  |   140
 timestamp      | AUTO             |   130
 timestamp      | RLE              |   114
 float          | BLOCK_DICT       |    90
 date           | COMMONDELTA_COMP |    80
 timestamp      | COMMONDELTA_COMP |    56
 date           | AUTO             |    36
 timestamp      | BLOCKDICT_COMP   |    34
 boolean        | RLE              |    28
 int            | DELTAVAL         |    22
 boolean        | AUTO             |    16
 date           | DELTAVAL         |    14
 int            | BLOCK_DICT       |    10
 int            | GCDDELTA         |     8
 numeric(8,3)   | DELTARANGE_COMP  |     8
 numeric(9,3)   | COMMONDELTA_COMP |     8
 numeric(10,3)  | BLOCKDICT_COMP   |     8
 date           | DELTARANGE_COMP  |     6
 numeric(19,5)  | BLOCKDICT_COMP   |     4
 numeric(37,15) | AUTO             |     4
 date           | BLOCK_DICT       |     2
Since the columns with an AUTO encoding go to the default encoding outlined in the documentation, why aren't some of these columns following the default encoding? For example, the documentation shows that columns with a float type would use either COMMONDELTA_COMP or DELTARANGE_COMP. How did Vertica evaluate to use RLE or BLOCKDICT_COMP for float?

I would like to improve the AUTO encoding to use a more appropriate encoding. Would the query-specific DBD run be the best way to do so?

Cross posted on Vertica Community.
Checkout vertica.tips for more Vertica resources.

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: How does Vertica evaluate columns for encoding type?

Post by id10t » Tue Apr 15, 2014 9:38 am

Hi!

[DELETED]
Last edited by id10t on Wed May 06, 2015 5:35 pm, edited 1 time in total.

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: How does Vertica evaluate columns for encoding type?

Post by NorbertKrupa » Tue Apr 15, 2014 12:46 pm

What is involved in the process of choosing the best column?
Checkout vertica.tips for more Vertica resources.

cbennett
Newbie
Newbie
Posts: 11
Joined: Thu Sep 05, 2013 9:57 pm
Contact:

Re: How does Vertica evaluate columns for encoding type?

Post by cbennett » Sun Apr 20, 2014 5:52 am

I think the answer is: it depends on many things.

If DBD is ran under the "storage footprint" option, DBD will compress and encode every column in every possible way in order to achieve maximum storage. Maximum storage is often at odds with performance, however. So, it will probably tend to favor things like RLE and some other things if you opt for performance - and it won't try to encode in every combination.

The order of the columns also matters, as do their cardinality. In version 7, you could also turn on analyze_correlations and manually create a DBD design that makes heavy use of the way columns are correlated to one another. This would also significantly change the encoding recommendations versus a traditional DBD run.

Post Reply

Return to “Vertica Database Administration”