count distinct and segmentation

Moderator: NorbertKrupa

Post Reply
sramdram
Newbie
Newbie
Posts: 23
Joined: Fri Jul 12, 2013 6:46 pm

count distinct and segmentation

Post by sramdram » Sat Aug 03, 2013 6:10 am

If a query has a count (distinct col), does the initiator node have to pull in all rows for the column group for which we are doing a count distinct into a single node and then do the count ? (I am assuming this is how it would do it)..

If so...is creating a partition with a segmentation based on the column grouping the only way to effectively distribute the query processing ?

scutter
Master
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: count distinct and segmentation

Post by scutter » Sat Aug 03, 2013 12:40 pm

The initiator will get the distinct values from each nodes and do the final step of the count(distinct). Depending on the projection's segmentation, distinct values may be determined locally on each node or the database may choose to resegment the data so that each node has distinct values on it. If the computation is done with distinct values on each node, then the initiator would only be quickly adding up the counts as the final step, not doing a final step of finding distinct values. An EXPLAIN should reveal which is the case for your query.

--Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

Post Reply

Return to “Vertica Performance Tuning”