We have a three node cluster, get_compliance_status() says it's 38% full. It looks like the 1TB community license limit is for total replicated space, but the real amount of data that can be stored is somewhat less then 1/4 of that. It's counting the data twice for each node:
At 38% of 1TB capacity:
dbadmin=> select storage_path, storage_usage, disk_space_used_mb, disk_space_used_mb / 1024 "disk_space_used_mb (GB)" from disk_storage;
storage_path | storage_usage | disk_space_used_mb | disk_space_used_mb (GB)
------------------------------------------------+---------------+--------------------+-------------------------
/home/dbadmin/dw/v_dw_node0001_catalog/Catalog | CATALOG | 95450 | 93.212890625000000000
/home/dbadmin/dw/v_dw_node0001_data | DATA,TEMP | 95450 | 93.212890625000000000
/home/dbadmin/dw/v_dw_node0002_catalog/Catalog | CATALOG | 95387 | 93.151367187500000000
/home/dbadmin/dw/v_dw_node0002_data | DATA,TEMP | 95387 | 93.151367187500000000
/home/dbadmin/dw/v_dw_node0003_catalog/Catalog | CATALOG | 95380 | 93.144531250000000000
/home/dbadmin/dw/v_dw_node0003_data | DATA,TEMP | 95380 | 93.144531250000000000
The Catalog directory is just a few MB. The actual amount of disk space used on each node is only 46GB, I am assuming it's compressed:
# du -sh /home/dbadmin/dw/v_dw_node0001_data
46G /home/dbadmin/dw/v_dw_node0001_data
# du -sh /home/dbadmin/dw/v_dw_node0001_catalog/Catalog
48M /home/dbadmin/dw/v_dw_node0001_catalog/Catalog
Is this the expected behavior? I expected to get 300GB data on each node before the 1TB community compliance limit was reached.
Data vs compliance capacity
Moderator: NorbertKrupa
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Data vs compliance capacity
Hi,
The license of 1 TB is for the raw (uncompressed) database size. The data on disk is very highly compressed. Vertica does not count data elements twice against the license, only once.
Check out Norbert Krupa's tip which includes a query you can run to get the estimated raw size and compressed size of your database:
http://vertica.tips/2014/01/24/license- ... -raw-size/
The license of 1 TB is for the raw (uncompressed) database size. The data on disk is very highly compressed. Vertica does not count data elements twice against the license, only once.
Check out Norbert Krupa's tip which includes a query you can run to get the estimated raw size and compressed size of your database:
http://vertica.tips/2014/01/24/license- ... -raw-size/
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: Data vs compliance capacity
Looks like the info in "disk_storage" is basically useless. A better query:
select node_name,sum(used_bytes) / 1024 / 1024 / 1024 as GB from v_monitor.storage_containers group by node_name;
returns the total size of compressed data on disk, which corresponds roughly to the raw_estimate_gb returned by the query in the tip.
The compression ratio for our data appears to be about 2.9 to 1.
And replication also seems to count against the license limit, so we should hit the 1TB limit while using only about 120GB disk space on each node.
select node_name,sum(used_bytes) / 1024 / 1024 / 1024 as GB from v_monitor.storage_containers group by node_name;
returns the total size of compressed data on disk, which corresponds roughly to the raw_estimate_gb returned by the query in the tip.
The compression ratio for our data appears to be about 2.9 to 1.
And replication also seems to count against the license limit, so we should hit the 1TB limit while using only about 120GB disk space on each node.
Last edited by wsanders on Mon Mar 09, 2015 10:30 pm, edited 1 time in total.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: Data vs compliance capacity
You may want to monitor DISK_STORAGE so that you don't run out of physical disk space
A particular table can take up a lot more disk space than what is calculated as the tables "raw" size depending on the number of projection it has.
I don't think that replicated data is counted more than once... You can use the AUDIT() function to estimate the raw data size of a database, a schema, a projection, or a table as it is counted in an audit of the database size.
Replicated Table:
Segmented Table:
A particular table can take up a lot more disk space than what is calculated as the tables "raw" size depending on the number of projection it has.
I don't think that replicated data is counted more than once... You can use the AUDIT() function to estimate the raw data size of a database, a schema, a projection, or a table as it is counted in an audit of the database size.
Replicated Table:
Code: Select all
dbadmin=> create table test as select * from columns unsegmented all nodes;
CREATE TABLE
dbadmin=> select count(*) from projections where anchor_table_name = 'test';
count
-------
3
(1 row)
dbadmin=> select sum(used_bytes) from projection_storage where anchor_table_name = 'test';
sum
--------
147456
(1 row)
dbadmin=> select audit('test');
audit
-------
24149
(1 row)
Code: Select all
dbadmin=> drop table test;
DROP TABLE
dbadmin=> select count(*) from projections where anchor_table_name = 'test';
count
-------
2
(1 row)
dbadmin=> create table test as select * from columns segmented by hash(table_name) all nodes;
CREATE TABLE
dbadmin=> select sum(used_bytes) from projection_storage where anchor_table_name = 'test';
sum
--------
163840
(1 row)
dbadmin=> select audit('test');
audit
-------
24149
(1 row)
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.