Does Vertica run analyze_statistics at the Materialize step in EXPLAIN?
Thanks for your help!
analyze_statistics on "Materialize" Step
Moderator: NorbertKrupa
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: analyze_statistics on "Materialize" Step
It updates statistics on the data. If there is a materialize step in the plan, it's because the data needed to be materialized. If you don't think the columns should be materialized, do you have the proper encoding on them?
Checkout vertica.tips for more Vertica resources.
Re: analyze_statistics on "Materialize" Step
Thanks Norbert.
Perhaps I should get some clarification on a few things before proceeding:
1) What is materialization? Is it fair to call it an intermediate result set, between the original projection and the final result set?
2) What else prompts the materialization step, besides a need to encode or re-encode the columns (as implied by your reply)?
Basic questions, I'm sure, but haven't found any explanation in the docs or anywhere else online, so it would be nice to check my own assumptions here.
Perhaps I should get some clarification on a few things before proceeding:
1) What is materialization? Is it fair to call it an intermediate result set, between the original projection and the final result set?
2) What else prompts the materialization step, besides a need to encode or re-encode the columns (as implied by your reply)?
Basic questions, I'm sure, but haven't found any explanation in the docs or anywhere else online, so it would be nice to check my own assumptions here.
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: analyze_statistics on "Materialize" Step
Materialization means that compressed or encoded data must be decompressed or decoded. An encoding such as RLE may not need materialization.
Here's some reading:
Materialization Strategies in a Column-Oriented DBMS
Here's some reading:
Materialization Strategies in a Column-Oriented DBMS
Checkout vertica.tips for more Vertica resources.
Re: analyze_statistics on "Materialize" Step
Materialization is really just the reading of data from the projection. Query plans may choose to do “early materialization”, which means reading data from the projection during the initial projection scan, or “late materialization” which means deferring the read until a later step of the execution plan. Consider for example a join where many rows are filtered out by the join. It would be wasteful to read all of the columns referenced in the query before the join is evaluated because many of those rows won’t be needed since they are filtered out. So only the join key is read, the join is executed, and after the join, the remaining columns are read from disk for the rows that are emitted from the join.
Vertica doesn’t “update statistics” during query execution. It’s up to you to update statistics after data loads, or using an automated job.
—Sharon
Vertica doesn’t “update statistics” during query execution. It’s up to you to update statistics after data loads, or using an automated job.
—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC
Vertica Consultant, Zazz Technologies LLC
Re: analyze_statistics on "Materialize" Step
Thanks Norbert - excellent write-up.
Thanks for the nice summary explanation, Sharon.
Thanks for the nice summary explanation, Sharon.