Concurrency in Vertica

Moderator: NorbertKrupa

prabhjot.matta
Newbie
Posts: 4
Joined: Wed Jan 14, 2015 8:58 pm

Post by prabhjot.matta » Wed Jan 14, 2015 9:42 pm

Hi,

I am new to Vertica. Our team is running a PoC, and we need to report on how concurrency affects Vertica's performance.
We expect more than 1500 users to run dashboard reports concurrently. What should we do to improve Vertica's performance in that scenario?

I have read the comment on this by Shilpa Lawande (VP of Engineering at Vertica), but she does not explain how customers run thousands of concurrent sub-second queries:
Scalability has three aspects – data volume, hardware size, and concurrency. Vertica’s performance scales linearly (and often super linearly due to compression and other factors) when you double the data volume or run the same data volume on twice the number of nodes. We have customers who have grown their databases from scratch to over a petabyte, with clusters from tens to hundreds of nodes. As far as concurrency goes, running queries 50-200x faster ensures that we can get a lot more queries done in a unit of time. To efficiently handle a highly concurrent mix of short and long queries, we have built-in workload management that controls how resources are allocated to different classes of queries. Some of our customers run with thousands of concurrent users running sub-second queries.
Regards,
Prabhjot

doug_harmon
Beginner
Posts: 36
Joined: Fri Feb 17, 2012 6:09 pm
Re: Concurrency in Vertica

Post by doug_harmon » Sat Jan 17, 2015 12:49 am

Talk to the people setting up your cluster. Vertica has a cookbook that describes how to configure a DL-380 box for a Vertica cluster.
http://www.vertica.com/resources-for-pl ... -hardware/
Share it with them and check after they're done to ensure that they followed the cookbook.

Next run some tests to measure network and disk performance. Ask your network engineer what else can be done to improve network speeds across the nodes in the cluster.
Run the validation scripts that come with Vertica to ensure performance is up to spec.
http://my.vertica.com/docs/7.1.x/HTML/i ... cripts.htm

After installing Vertica make sure that connection load balancing is set up. http://my.vertica.com/docs/7.1.x/HTML/i ... ancing.htm
Validate connection load balancing by looking at the query_requests system table to ensure that user queries are spread evenly across all of the nodes in the cluster.

SELECT user_name, node_name, count(*) AS NumRecs
  FROM query_requests
GROUP BY 1,2
ORDER BY 1,2;
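
To judge whether requests are spread evenly, the rows returned by the query above can be summarized per node. The following is only a rough sketch in Python: the function name `node_imbalance` and the sample rows are hypothetical, and it assumes the rows have already been fetched as (user_name, node_name, NumRecs) tuples.

```python
from collections import defaultdict


def node_imbalance(rows):
    """Summarize (user_name, node_name, num_recs) rows per node.

    Returns (totals, ratio), where ratio is the busiest node's request
    count divided by the mean across the nodes that appear in the rows.
    A ratio close to 1.0 suggests an even spread.
    """
    per_node = defaultdict(int)
    for _user, node, num_recs in rows:
        per_node[node] += num_recs
    if not per_node:
        raise ValueError("no rows to summarize")
    mean = sum(per_node.values()) / len(per_node)
    return dict(per_node), max(per_node.values()) / mean


# Hypothetical sample rows for illustration, not real measurements.
sample = [
    ("alice", "node01", 40),
    ("alice", "node02", 35),
    ("bob", "node01", 20),
    ("bob", "node03", 55),
]
totals, ratio = node_imbalance(sample)
print(totals, round(ratio, 2))
```

A node that received no requests at all does not show up in the query output, so compare the number of nodes in the result with the number of nodes in the cluster as well.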

Set up resource pools. http://www.vertica.com/2014/12/16/ad-ho ... he-rescue/

Once the environment is set up the next thing to focus on is ensuring that queries finish quickly. Here's why: http://www.vertica.com/2014/07/17/workl ... -triangle/
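
One way to see why query speed matters so much for concurrency is a back-of-the-envelope estimate based on Little's law. This is a sketch, not taken from the linked article: the function names and all of the numbers (24 execution slots, 30 seconds of user think time) are assumptions chosen only to illustrate the arithmetic.

```python
def max_throughput(concurrent_slots, avg_query_seconds):
    """Queries per second when every execution slot is kept busy."""
    return concurrent_slots / avg_query_seconds


def users_supported(concurrent_slots, avg_query_seconds, think_seconds):
    """Approximate number of interactive users before queries start to queue.

    Uses the interactive form of Little's law, N = X * (R + Z), where X is
    throughput, R is query time, and Z is the pause between a user's queries.
    """
    throughput = max_throughput(concurrent_slots, avg_query_seconds)
    return throughput * (avg_query_seconds + think_seconds)


# Assumed values for illustration only.
slots = 24          # queries the cluster can execute at the same time
think_seconds = 30  # time a dashboard user waits between requests

for query_seconds in (0.5, 5.0):
    users = users_supported(slots, query_seconds, think_seconds)
    print(f"{query_seconds}s queries -> about {users:.0f} users")
```

Under these assumed numbers, cutting the average query time by a factor of ten raises the number of users the same hardware can serve by almost as much, which is why tuning queries comes before adding nodes.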

I'm assuming your BI strategy is to have the BI tool send SQL queries to the database instead of caching the data on the BI Server. If possible, choose a BI tool that was designed to write optimized SQL for very large databases (VLDBs). For example, MicroStrategy has a VLDB connector for Vertica. Other BI tools I've looked at have Vertica connectors, but they don't write optimized SQL and require a BI and SQL Tuning expert to get them to perform well if they are deployed on top of a complex data model.

Take some time to read through the Administrator's Guide, especially the sections on projections, partitions, statistics, and analyzing workloads. Fact tables should be segmented, and you may need more than one projection per table.
http://my.vertica.com/docs/7.1.x/HTML/i ... sGuide.htm
Sections to pay attention to include:
* Scalability & Concurrency.
http://my.vertica.com/docs/7.1.x/HTML/i ... Tuning.htm
* Sessions and the MaxClientSessions parameter.
http://my.vertica.com/docs/7.1.x/HTML/i ... ssions.htm
* How to Optimize Query Performance:
http://my.vertica.com/docs/7.1.x/HTML/i ... rmance.htm

Last of all, create and deploy a physical data model that is optimized for your workload. Hopefully your data model has only one table. ;) If that's not the case, it may take a couple of iterations to optimize the physical data model. The best data model for BI tool performance is one where all of the data has been denormalized into one table. If you need even better performance build live aggregate projections on top of that table. BI tools that can't write optimized SQL perform well against a one table data model.

If you have additional Linux boxes, you can add them to the cluster and then rebalance the cluster to see how performance scales with the number of nodes. Or take away a node and see what happens.

Talk to your Vertica Sales Rep. They can put you in touch with experts who can help with performance tuning.

Regards,

Doug

Post by prabhjot.matta » Mon Jan 19, 2015 8:27 pm

Hi Doug,

Thanks for the detailed response.
For the PoC we already have a three-node cluster. Currently each node is handling only a small number of requests (I checked with the query you provided).
I was planning to write a script (Unix shell, Perl, or Python) to do the concurrency testing.
Can you please help me with that?
I have vsql on my local machine and connect to the Vertica host using:
/Users/opt/vertica/bin/vsql -U <user_name> -w <password> -h <hostname>

I downloaded unixODBC and the Vertica ODBC driver separately and am running the Perl code given in the HP Vertica Programmer's Guide, but it gives me the following error:

"dyld: lazy symbol binding failed: Symbol not found: _Init_iODBC
Referenced from: /usr/lib/libiodbc.2.dylib
Expected in: flat namespace

dyld: Symbol not found: _Init_iODBC
Referenced from: /usr/lib/libiodbc.2.dylib
Expected in: flat namespace

Trace/BPT trap: 5"

Could this be caused by the drivers (iODBC, unixODBC, Vertica ODBC) being installed in different locations on my local machine? Do I need to create a file to link these drivers?

Regards,
Prabhjot

Post by doug_harmon » Tue Jan 20, 2015 4:56 am

Use Apache JMeter with the Vertica JDBC driver for performance testing.
http://jmeter.apache.org/usermanual/bui ... -plan.html
The example given is for MySQL, but it's relatively easy to convert it to Vertica.

Post by prabhjot.matta » Wed Jan 21, 2015 9:45 pm

Thanks Doug!!

Regards,
Prabhjot

Post by doug_harmon » Thu Jan 22, 2015 5:38 am

You're welcome. I'm curious, what was the result of your Vertica PoC?

Post by prabhjot.matta » Thu Jan 22, 2015 7:52 pm

Hi Doug,

I am working on a client machine (OS X) where I am not allowed to download external software such as JMeter, which is why I was trying to write a script to do the concurrency testing. I can write it as a shell script or in Perl or Python. I tried Perl but got the errors mentioned in my previous post, so instead of ODBC I am now planning to try JDBC.

Let me know if you have done something similar, such as connecting to Vertica using JDBC.

Regards,
Prabhjot
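
For readers in the same position (no JMeter, no working ODBC setup), a concurrency test can be sketched with only the Python standard library and the vsql client already mentioned in this thread. This is a minimal sketch, not a tested benchmark harness: the helper names (`run_vsql`, `load_test`) are hypothetical, the vsql path is the one quoted in the earlier post, and it assumes vsql's `-c` option for running a single command. Each request starts a new vsql process, so the timings include connection setup.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Path quoted earlier in the thread; adjust for your machine.
VSQL = "/Users/opt/vertica/bin/vsql"


def run_vsql(query, user, password, host):
    """Run one query through vsql; return True if vsql exited with status 0.

    Note: passing the password with -w exposes it in the process list.
    """
    cmd = [VSQL, "-U", user, "-w", password, "-h", host, "-c", query]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def timed(task):
    """Call task() and return (success flag, elapsed seconds)."""
    start = time.perf_counter()
    ok = task()
    return ok, time.perf_counter() - start


def load_test(task, concurrency, total_requests):
    """Run task() total_requests times using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed(task), range(total_requests)))
    latencies = sorted(elapsed for _ok, elapsed in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _elapsed in results if not ok),
        "median_s": latencies[len(latencies) // 2],  # upper median
        "max_s": latencies[-1],
    }


# Example call (placeholders, not real credentials):
# task = lambda: run_vsql("SELECT 1;", "<user_name>", "<password>", "<hostname>")
# print(load_test(task, concurrency=50, total_requests=500))
```

Because `load_test` accepts any callable, the vsql call can later be swapped for a JDBC- or ODBC-based function without changing the timing logic. Use a query that resembles the real dashboard workload rather than `SELECT 1`.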
