How is table data stored on disk?
Moderator: NorbertKrupa
How is table data stored on disk?
Hello, Does anyone know how Vertica stores the data in tables on disk? The directories in the file system are just a bunch of confusing numbers. Thanks.
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: How is table data stored on disk?
Hi jsbaskin,
Projection data in Verica is stored in ROS (Read Optimized Storage) containers on disk. The root location on disk where your data is located can be found by querying the DISK_STORAGE system table:
In the storage path directory on disk you'll see a bunch of sub-directories having numbered names ranging from 001 up to 999. Each directory is a ROS container. In each of those directories are additional sub-directories named by using a unique identifier called the projection storage uid. In these sub-directories is where you'll find your data files.
For example, I have a table named holiday_dim. Say I want to find out where on disk its data files are located.
First I would need to find out which projections have been created for the table. To do that I would query the PROJECTION_STORAGE system table:
Now that I know the projection name (holiday_dim_node0001) on node 1, I can find out the ROS container's unique id by querying the STORAGE_CONTAINERS system table:
The storage_oid is the unique numeric ID assigned by the Vertica catalog that identifies the storage location. It's basically the directory name on disk.
Since I know that my data is stored in the location /data1/data/intersect/v_intersect_node0001_data on disk, I just need to find the ROS container directory under there... For that I can use the Linux find command:
There are always two files located in the directory for a projection. One file has a .fdb extension and the other has a .pidx extension. The .fdb file is the data itself (encoded and compressed) while the .pidx file contains index/statistics info from the data file.
Notice that the last three digits of the projection directory represents the ROS container's directory name:
Projection data in Verica is stored in ROS (Read Optimized Storage) containers on disk. The root location on disk where your data is located can be found by querying the DISK_STORAGE system table:
Code: Select all
dbadmin=> select storage_path from v_monitor.disk_storage where storage_usage = 'DATA';
storage_path
-------------------------------------------------
/data1/data/intersect/v_intersect_node0001_data
/data2/data/intersect/v_intersect_node0001_data
(2 rows)
For example, I have a table named holiday_dim. Say I want to find out where on disk its data files are located.
First I would need to find out which projections have been created for the table. To do that I would query the PROJECTION_STORAGE system table:
Code: Select all
dbadmin=> select projection_name, row_count, ros_count from v_monitor.projection_storage where anchor_table_name = 'holiday_dim';
projection_name | row_count | ros_count
----------------------+-----------+-----------
holiday_dim_node0001 | 60 | 1
holiday_dim_node0002 | 60 | 1
holiday_dim_node0003 | 60 | 1
(3 rows)
Code: Select all
dbadmin=> select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'holiday_dim_node0001';
storage_type | storage_oid
--------------+-------------------
ROS | 45035996274585219
(1 row)
Since I know that my data is stored in the location /data1/data/intersect/v_intersect_node0001_data on disk, I just need to find the ROS container directory under there... For that I can use the Linux find command:
Code: Select all
bash-3.2$ pwd
/data1/data/intersect/v_intersect_node0001_data
bash-3.2$ find ./ -name 45035996274585219
./219/45035996274585219
bash-3.2$ cd 219/45035996274585219
bash-3.2$ ls -lrt
total 8
-rw------- 1 dbadmin dbadmin 48 Mar 21 22:31 45035996274585219_0.pidx
-rw------- 1 dbadmin dbadmin 134 Mar 21 22:31 45035996274585219_0.fdb
Notice that the last three digits of the projection directory represents the ROS container's directory name:
- /data1/data/intersect/v_intersect_node0001_data/219/45035996274585219
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: How is table data stored on disk?
Hi!
I have an another question...how works the projections data files creation? There is one data file only that stores data? Or is it possible to store data of projection in different files?
And...
I read something about automatic managemet of storage location...but obviusly if there is one file per projection's data it is always stored on one storage location...is it correct?
Tnx!
I have an another question...how works the projections data files creation? There is one data file only that stores data? Or is it possible to store data of projection in different files?
And...
I read something about automatic managemet of storage location...but obviusly if there is one file per projection's data it is always stored on one storage location...is it correct?
Tnx!
- JimKnicely
- Site Admin
- Posts: 1825
- Joined: Sat Jan 21, 2012 4:58 am
- Contact:
Re: How is table data stored on disk?
Hi man,
Great question. I believe that projection data is stored in one file per node unless the projection's anchor table was created using the PARTITION clause... in that case, there will be one file for each partition. When you create a table with the PARTITION clause then all projections based on that table are also partitioned.
To demonstrate, let's create a partitioned table:
We can check the v_monitor.projection_storage system table to find the table's projections:
Then we can find the storage containers used by the "test1_node0001" projection via the v_monitor.storage_containers system table:
We see that there are three ROS containers for the single projection, one ROS for each partition.
If we go out the OS we'll also see that there are three separate .fdb data files, one for each ROS and therefore one for each partition:
I hope this helps!
Great question. I believe that projection data is stored in one file per node unless the projection's anchor table was created using the PARTITION clause... in that case, there will be one file for each partition. When you create a table with the PARTITION clause then all projections based on that table are also partitioned.
To demonstrate, let's create a partitioned table:
Code: Select all
dbadmin=> create table test1 (col1 int, col2 varchar2(100), primary key (col1)) partition by col1;
CREATE TABLE
dbadmin=> insert into test1 values (1, 'Pittsburgh');
OUTPUT
--------
1
(1 row)
dbadmin=> insert into test1 values (2, 'Toronto');
OUTPUT
--------
1
(1 row)
dbadmin=> insert into test1 values (3, 'Dallas');
OUTPUT
--------
1
(1 row)
Code: Select all
dbadmin=> select projection_name, row_count, ros_count from v_monitor.projection_storage where anchor_table_name = 'test1';
projection_name | row_count | ros_count
-----------------+-----------+-----------
test1_node0001 | 3 | 3
test1_node0002 | 3 | 3
test1_node0003 | 3 | 3
(3 rows)
Code: Select all
dbadmin=> select storage_type, storage_oid from v_monitor.storage_containers where projection_name = 'test1_node0001';
storage_type | storage_oid
--------------+-------------------
ROS | 45035996276572379
ROS | 45035996276572389
ROS | 45035996276572399
(3 rows)
If we go out the OS we'll also see that there are three separate .fdb data files, one for each ROS and therefore one for each partition:
Code: Select all
verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./379/45035996276572379
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572379_0.pidx
-rw------- 1 dbadmin dbadmin 3 Apr 13 07:17 45035996276572379_0.fdb
verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./389/45035996276572389
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572389_0.pidx
-rw------- 1 dbadmin dbadmin 3 Apr 13 07:17 45035996276572389_0.fdb
verticatst01:/usr/local/data/intersect/v_intersect_node0001_data $ ls -lrt ./399/45035996276572399
total 8
-rw------- 1 dbadmin dbadmin 48 Apr 13 07:17 45035996276572399_0.pidx
-rw------- 1 dbadmin dbadmin 3 Apr 13 07:17 45035996276572399_0.fdb
Jim Knicely
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Note: I work for Vertica. My views, opinions, and thoughts expressed here do not represent those of my employer.
Re: How is table data stored on disk?
When query is running, What happens if node fails? I mean regarding performance
Re: How is table data stored on disk?
Hi guys,
I upgraded to 7.2. I not
I thought that the only files asscoiated with data are the .fb and .pidx files. But I now see .gt files:
Anyone know what these .gt files are?
I upgraded to 7.2. I not
I thought that the only files asscoiated with data are the .fb and .pidx files. But I now see .gt files:
Code: Select all
[dbadmin@BK 277]$ ls -lrt
total 4
-rw------- 1 dbadmin verticadba 438 Dec 16 13:50 0267ea83948962386adc79978bf46adc00a0000000024745_0.gt
THANKS - BECKSTER
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: How is table data stored on disk?
Coming in 7.3 there will be integration with Grand Theft Auto.
Checkout vertica.tips for more Vertica resources.