If a query has a count (distinct col), does the initiator node have to pull in all rows for the column group for which we are doing a count distinct into a single node and then do the count ? (I am assuming this is how it would do it)..
If so...is creating a partition with a segmentation based on the column grouping the only way to effectively distribute the query processing ?
count distinct and segmentation
Moderator: NorbertKrupa
Re: count distinct and segmentation
The initiator will get the distinct values from each nodes and do the final step of the count(distinct). Depending on the projection's segmentation, distinct values may be determined locally on each node or the database may choose to resegment the data so that each node has distinct values on it. If the computation is done with distinct values on each node, then the initiator would only be quickly adding up the counts as the final step, not doing a final step of finding distinct values. An EXPLAIN should reveal which is the case for your query.
--Sharon
--Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC