If i create a projection with a sort order like
order by col1,col2,col3
what does that translate to from a storage perspective...
Does it mean that col1,col2 and col3 are independently stored sorted ? Is there any collocation of rows based on this column ordering
Column Sort Order
Moderator: NorbertKrupa
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Column Sort Order
Hi,
Here is some info. from the Concept's Guide about sort order for projections:
Anyone agree or disagree with this?
Here is some info. from the Concept's Guide about sort order for projections:
The key phrase "...localizes logically grouped values so that a disk read can pick up many results at once" seems to indicate that Vertica will store the data together physically on disk.The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, Vertica uses the order in which columns are specified in the column definition as the projection's sort order.
Anyone agree or disagree with this?
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: Column Sort Order
Agreed,
So the bottom line can be
col1, col2, col3 are logically grouped in a sort order relating one to another
So the bottom line can be
col1, col2, col3 are logically grouped in a sort order relating one to another
Re: Column Sort Order
HI!
>> seems to indicate that Vertica will store the data together physically on disk. Anyone agree or disagree with this?
In general: disagree (https://my.vertica.com/docs/6.1.x/HTML/ ... #13460.htm)
and agreed with @nnani: columns "are logically grouped in a sort order relating one to another" (and not physically)
>> seems to indicate that Vertica will store the data together physically on disk. Anyone agree or disagree with this?
In general: disagree (https://my.vertica.com/docs/6.1.x/HTML/ ... #13460.htm)
and agreed with @nnani: columns "are logically grouped in a sort order relating one to another" (and not physically)
Syntax
column-name-list
... [ ENCODING encoding-type ]
... [ ACCESSRANK integer ] [ , ... ]
... [ GROUPED ( projection-column-reference [,...] ) ]
Parameters
...
Groups two or more columns into a single disk file. This minimizes file I/O for work loads that:
- Read a large percentage of the columns in a table.
- Perform single row look-ups.
- Query against many small columns.
- Frequently update data in these columns.
If you have data that is always accessed together and it is not used in predicates, you can increase query performance by grouping these columns. Once grouped, queries can no longer independently retrieve from disk all records for an individual column independent of the other columns within the group.
...
Note: HP Vertica performs dynamic column grouping. For example, to provide better read and write efficiency for small loads, HP Vertica ignores any projection-defined column grouping (or lack thereof) and groups all columns together by default.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Column Sort Order
Hi guys,
Now that I've read this statement "The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once." over and over again a few times, I do agree with #sKwa and #nnani.
So the columns defined in the ORDER BY clause of a projection are "logically grouped" in the DB meta-data (some sort of pointers) in such a way that when they are referenced in a SQL statement's WHERE or GROUP BY clause, read performance is increased dramatically.
Does that make sense?
Now that I've read this statement "The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once." over and over again a few times, I do agree with #sKwa and #nnani.
So the columns defined in the ORDER BY clause of a projection are "logically grouped" in the DB meta-data (some sort of pointers) in such a way that when they are referenced in a SQL statement's WHERE or GROUP BY clause, read performance is increased dramatically.
Does that make sense?
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Column Sort Order
Here is link to a great article titled "Life Beyond Indices: The Query Benefits of Storing Sorted Data" that explains what we've been talking about in this thread
http://www.vertica.com/2010/11/23/life- ... rted-data/
http://www.vertica.com/2010/11/23/life- ... rted-data/
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.