Hello,
What is the fastest way to export a query result from Vertica to some external format? Greenplum has different parallel data unloading options.
Running vsql, I measured a maximum of 50,000 rows per second per CPU, which is quite slow for our purposes. The (uncompressed) row size is about 100 bytes.
Interestingly, PostgreSQL also exports data at about the same speed, so it might be a problem with our setup or data.
Would using the Hadoop connector allow us to export query results faster?
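For scale, the vsql figures above work out to roughly 5 MB/s per CPU. A minimal sketch of the arithmetic in Python follows; the one-billion-row table size and the assumption of perfectly linear scaling across workers are purely illustrative, not measurements.

```python
# Back-of-the-envelope throughput from the figures quoted in this post.
ROWS_PER_SECOND_PER_CPU = 50_000  # measured with vsql
BYTES_PER_ROW = 100               # approximate uncompressed row size


def export_throughput_mb_per_s(rows_per_second: float, bytes_per_row: float) -> float:
    """Return export throughput in megabytes (10**6 bytes) per second."""
    return rows_per_second * bytes_per_row / 1_000_000


def export_seconds(total_rows: float, rows_per_second: float, workers: int = 1) -> float:
    """Estimate wall-clock seconds, ASSUMING perfectly linear scaling across workers."""
    return total_rows / (rows_per_second * workers)


if __name__ == "__main__":
    mb_s = export_throughput_mb_per_s(ROWS_PER_SECOND_PER_CPU, BYTES_PER_ROW)
    print(f"{mb_s:.1f} MB/s per CPU")

    # Hypothetical table size, chosen only for illustration.
    hypothetical_rows = 1_000_000_000
    hours = export_seconds(hypothetical_rows, ROWS_PER_SECOND_PER_CPU) / 3600
    print(f"{hours:.1f} hours for {hypothetical_rows:,} rows on one CPU")
```

At that rate a single export stream is far below what a disk or network link can usually sustain, which is why a parallel unload path matters here.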
http://www.vertica.com/2012/07/05/teach ... ew-tricks/
There is a web page from 2010
http://www.dbms2.com/2010/10/12/vertica ... tegration/
claiming that:
In addition, inspired by a large banking customer, Vertica is announcing some cool Hadoop integration futures:
* Vertica-formatted data will be stored on HDFS (Hadoop Distributed File System).
* It will get there via parallel backup — i.e., you will be able to back up Vertica to HDFS.
* Libraries will be exposed to let HDFS read and write the Vertica-formatted data, for purposes like ETL, long-running analytics, etc.
What is the status of this development?
Thank you!
Fast query result export
Moderator: NorbertKrupa
Re: Fast query result export
Where do you want to write it to? Thinking outside the box, you could use the Hadoop connector or write a custom UDX.