Performance tuning a COUNT DISTINCT

Moderator: NorbertKrupa

Post Reply
User avatar
becky
Intermediate
Intermediate
Posts: 118
Joined: Sat Apr 28, 2012 11:37 am

Performance tuning a COUNT DISTINCT

Post by becky » Wed Sep 17, 2014 1:28 am

Hi all,

Does anyone have any tips or best ways to handle a COUNT DISTINCT on a very large data set (> 1 million rows in a fact table). I'm seeing my queries take > 30 seconds with the COUNT DISTINCT and about 7 seconds with a simple COUNT.

Thanks!
THANKS - BECKSTER

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Performance tuning a COUNT DISTINCT

Post by NorbertKrupa » Wed Sep 17, 2014 1:20 pm

If you're on 7.0+, you may want to look at Optimizing COUNT (DISTINCT) by Calculating Approximate Counts.
Checkout vertica.tips for more Vertica resources.

User avatar
BoMBaY
Beginner
Beginner
Posts: 26
Joined: Tue Jul 16, 2013 5:45 am

Re: Performance tuning a COUNT DISTINCT

Post by BoMBaY » Wed Sep 17, 2014 1:42 pm

There is the good article about count distinct on the link below.

Avoiding the OLAP Cliff for Count Distinct Queries in Vertica
http://www.vertica.com/2014/03/05/avoid ... n-vertica/
Itipong Chewinpipat (Bay)
DBA Specialist (Vertica/Oracle)

Image
ImageImage

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Performance tuning a COUNT DISTINCT

Post by NorbertKrupa » Wed Sep 17, 2014 1:53 pm

That article eludes to using an approximate count distinct.
Checkout vertica.tips for more Vertica resources.

User avatar
becky
Intermediate
Intermediate
Posts: 118
Joined: Sat Apr 28, 2012 11:37 am

Re: Performance tuning a COUNT DISTINCT

Post by becky » Wed Sep 17, 2014 2:13 pm

Thanks, guys.

I tried the approximate count distinct function and the query never returns! Well, I let it run for about two minutes and just canceled it. The COUNT DISTINCT is faster in my database. Is that weird?

:o :shock: :o
THANKS - BECKSTER

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Performance tuning a COUNT DISTINCT

Post by NorbertKrupa » Wed Sep 17, 2014 2:47 pm

What does your query look like?
Checkout vertica.tips for more Vertica resources.

User avatar
becky
Intermediate
Intermediate
Posts: 118
Joined: Sat Apr 28, 2012 11:37 am

Re: Performance tuning a COUNT DISTINCT

Post by becky » Tue Sep 30, 2014 3:47 pm

Anyone know what these parameters do?

Code: Select all

dbadmin=> select parameter_name, description from configuration_parameters where parameter_name ilike '%distinct%';
         parameter_name         |                                                       description
--------------------------------+-------------------------------------------------------------------------------------------------------------------------
 DBDCountDistinctSampleRowCount | Number of rows to be sampled for count distinct analysis during DBD design. Default: 0 Use DBDCountDistinctSampleRowPct
 DBDCountDistinctSampleRowPct   | Percentage of rows to be sampled for count distinct analysis during DBD design. Default: 100
(2 rows)
THANKS - BECKSTER

Post Reply

Return to “Vertica SQL”