Page 1 of 1

Column Sort Order

Posted: Sat Aug 10, 2013 6:43 pm
by sramdram
If i create a projection with a sort order like

order by col1,col2,col3

what does that translate to from a storage perspective...

Does it mean that col1,col2 and col3 are independently stored sorted ? Is there any collocation of rows based on this column ordering

Re: Column Sort Order

Posted: Tue Aug 13, 2013 2:33 pm
by JimKnicely
Hi,

Here is some info. from the Concept's Guide about sort order for projections:
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.

You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, Vertica uses the order in which columns are specified in the column definition as the projection's sort order.
The key phrase "...localizes logically grouped values so that a disk read can pick up many results at once" seems to indicate that Vertica will store the data together physically on disk.

Anyone agree or disagree with this?

Re: Column Sort Order

Posted: Tue Aug 13, 2013 2:47 pm
by nnani
Agreed,

So the bottom line can be
col1, col2, col3 are logically grouped in a sort order relating one to another

Re: Column Sort Order

Posted: Wed Aug 14, 2013 9:22 am
by id10t
HI!

>> seems to indicate that Vertica will store the data together physically on disk. Anyone agree or disagree with this?
In general: disagree (https://my.vertica.com/docs/6.1.x/HTML/ ... #13460.htm)
and agreed with @nnani: columns "are logically grouped in a sort order relating one to another" (and not physically)
Syntax

column-name-list

... [ ENCODING encoding-type ]

... [ ACCESSRANK integer ] [ , ... ]

... [ GROUPED ( projection-column-reference [,...] ) ]

Parameters
...
Groups two or more columns into a single disk file. This minimizes file I/O for work loads that:

- Read a large percentage of the columns in a table.
- Perform single row look-ups.
- Query against many small columns.
- Frequently update data in these columns.

If you have data that is always accessed together and it is not used in predicates, you can increase query performance by grouping these columns. Once grouped, queries can no longer independently retrieve from disk all records for an individual column independent of the other columns within the group.
...
Note: HP Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, HP Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.

Re: Column Sort Order

Posted: Wed Aug 14, 2013 12:34 pm
by JimKnicely
Hi guys,

Now that I've read this statement "The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once." over and over again a few times, I do agree with #sKwa and #nnani.

So the columns defined in the ORDER BY clause of a projection are "logically grouped" in the DB meta-data (some sort of pointers) in such a way that when they are referenced in a SQL statement's WHERE or GROUP BY clause, read performance is increased dramatically.

Does that make sense?

Re: Column Sort Order

Posted: Wed Aug 14, 2013 7:43 pm
by JimKnicely
Here is link to a great article titled "Life Beyond Indices: The Query Benefits of Storing Sorted Data" that explains what we've been talking about in this thread :)

http://www.vertica.com/2010/11/23/life- ... rted-data/