Column Sort Order

Moderator: NorbertKrupa

Post Reply
sramdram
Newbie
Newbie
Posts: 23
Joined: Fri Jul 12, 2013 6:46 pm

Column Sort Order

Post by sramdram » Sat Aug 10, 2013 6:43 pm

If i create a projection with a sort order like

order by col1,col2,col3

what does that translate to from a storage perspective...

Does it mean that col1,col2 and col3 are independently stored sorted ? Is there any collocation of rows based on this column ordering

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Column Sort Order

Post by JimKnicely » Tue Aug 13, 2013 2:33 pm

Hi,

Here is some info. from the Concept's Guide about sort order for projections:
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.

You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, Vertica uses the order in which columns are specified in the column definition as the projection's sort order.
The key phrase "...localizes logically grouped values so that a disk read can pick up many results at once" seems to indicate that Vertica will store the data together physically on disk.

Anyone agree or disagree with this?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: Column Sort Order

Post by nnani » Tue Aug 13, 2013 2:47 pm

Agreed,

So the bottom line can be
col1, col2, col3 are logically grouped in a sort order relating one to another
nnani........
Long way to go

You can check out my blogs at vertica-howto

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Column Sort Order

Post by id10t » Wed Aug 14, 2013 9:22 am

HI!

>> seems to indicate that Vertica will store the data together physically on disk. Anyone agree or disagree with this?
In general: disagree (https://my.vertica.com/docs/6.1.x/HTML/ ... #13460.htm)
and agreed with @nnani: columns "are logically grouped in a sort order relating one to another" (and not physically)
Syntax

column-name-list

... [ ENCODING encoding-type ]

... [ ACCESSRANK integer ] [ , ... ]

... [ GROUPED ( projection-column-reference [,...] ) ]

Parameters
...
Groups two or more columns into a single disk file. This minimizes file I/O for work loads that:

- Read a large percentage of the columns in a table.
- Perform single row look-ups.
- Query against many small columns.
- Frequently update data in these columns.

If you have data that is always accessed together and it is not used in predicates, you can increase query performance by grouping these columns. Once grouped, queries can no longer independently retrieve from disk all records for an individual column independent of the other columns within the group.
...
Note: HP Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, HP Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Column Sort Order

Post by JimKnicely » Wed Aug 14, 2013 12:34 pm

Hi guys,

Now that I've read this statement "The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once." over and over again a few times, I do agree with #sKwa and #nnani.

So the columns defined in the ORDER BY clause of a projection are "logically grouped" in the DB meta-data (some sort of pointers) in such a way that when they are referenced in a SQL statement's WHERE or GROUP BY clause, read performance is increased dramatically.

Does that make sense?
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

User avatar
JimKnicely
Site Admin
Site Admin
Posts: 1825
Joined: Sat Jan 21, 2012 4:58 am
Contact:

Re: Column Sort Order

Post by JimKnicely » Wed Aug 14, 2013 7:43 pm

Here is link to a great article titled "Life Beyond Indices: The Query Benefits of Storing Sorted Data" that explains what we've been talking about in this thread :)

http://www.vertica.com/2010/11/23/life- ... rted-data/
Jim Knicely

Image

Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.

Post Reply

Return to “New to Vertica Database Administration”