How to segment a projection

Moderator: NorbertKrupa

Post Reply
sarah
Intermediate
Intermediate
Posts: 77
Joined: Mon Aug 27, 2012 1:34 pm

How to segment a projection

Post by sarah » Fri Apr 11, 2014 7:55 am

Hi all,

I've found a lot of info in the admin guide that explains how to set up the ordering of columns in a projection, but not too much info about what columns to segment on. It it best just to segment by the primary key or the column with the highest cardinality? I think that is correct.... But if do segment by PK I get a lot of rows in the query_events table that have a description of "Many rows were resegmented during plan execution." I assume hat's bad...

:?
Have a GREAT day!

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: How to segment a projection

Post by NorbertKrupa » Fri Apr 11, 2014 12:54 pm

Technically, the primary key should have the highest cardinality because it has the most unique values. Vertica suggests to use the PK in the documentation. Otherwise, you find the highest cardinality column(s) and use those.
Checkout vertica.tips for more Vertica resources.

sarah
Intermediate
Intermediate
Posts: 77
Joined: Mon Aug 27, 2012 1:34 pm

Re: How to segment a projection

Post by sarah » Fri Apr 11, 2014 1:43 pm

#norbertk,

Thanks for your response! That's what I figured :)
Have a GREAT day!

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: How to segment a projection

Post by scutter » Fri Apr 11, 2014 2:02 pm

Sarah,

It’s a really good thing to understand how projections are optimized - what’s a good sort order, segmentation, etc. But in the end, using Database Designer to design projections is preferred at least as a first step. Have you tried using Designer?

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

cbennett
Newbie
Newbie
Posts: 11
Joined: Thu Sep 05, 2013 9:57 pm
Contact:

Re: How to segment a projection

Post by cbennett » Sat Apr 12, 2014 6:42 pm

Not a super-easy thing to answer, since there's a lot that goes into this, but in general, the above is all accurate. But even segmenting on a very cardinal PK might not be optimum, depending on what you're trying to accomplish. For example, even having perfect distribution of data through a PK segmentation, you could still get a lot of resegmentation if your query is trying to GROUP BY some logical value, like country_code. If GROUP BY performance is what you're after, you could try creating a projection that's segmented on the logical column instead. That creates local group bys, but beware - it could also create a lot of data skew in your cluster. Tread carefully.

Post Reply

Return to “New to Vertica Database Administration”