Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”
Thank you for your help
Large deletes from two tables
Moderator: NorbertKrupa
Re: Large deletes from two tables
If it's one bulk then you can do that in a delete and then purge deleted vector files (verify that your projection are suited for deletes with evakuate delete performance ).golgiapparatus wrote:Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”
Thank you for your help
If it's taking too much time then use create table as select to create a "good" table then rename the old one to a temp name and the new one to the original name
for example :
create table x_new as select * from schema.tableA where id =3 ( you can create the structure first the same as the original and use insert instead of ctas)
alter table schema.tableA rename to x_old
alter table x_new rename to schema.tableA;
Once you sure the process finished successfully drop the x_old table .
Aother approache is to use partitions and then your delete will not be as bad as scan the whole table.
-
- Newbie
- Posts: 5
- Joined: Mon Apr 22, 2013 5:49 pm
Re: Large deletes from two tables
Would partitioning be the best method, do you know of a good tutorial on partitioning?zvika wrote:If it's one bulk then you can do that in a delete and then purge deleted vector files (verify that your projection are suited for deletes with evakuate delete performance ).golgiapparatus wrote:Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”
Thank you for your help
If it's taking too much time then use create table as select to create a "good" table then rename the old one to a temp name and the new one to the original name
for example :
create table x_new as select * from schema.tableA where id =3 ( you can create the structure first the same as the original and use insert instead of ctas)
alter table schema.tableA rename to x_old
alter table x_new rename to schema.tableA;
Once you sure the process finished successfully drop the x_old table .
Aother approache is to use partitions and then your delete will not be as bad as scan the whole table.
Again,
Thanks for the help. I am working on somthing that I might not be the best person for at the time. I learn through walking through the steps.
Re: Large deletes from two tables
If you can implement partitioning than I think it's the one of the best way as every solution depend on the scenario and there is no rule for all scenarios.
Just read the documentation about partitioning it will give you the knowlege .
You will be the best person ... don't worry
Just read the documentation about partitioning it will give you the knowlege .
You will be the best person ... don't worry
-
- Beginner
- Posts: 36
- Joined: Fri Feb 17, 2012 6:09 pm
- Contact:
Re: Large deletes from two tables
Here's an example of how to do partitioning.
http://www.vertica-forums.com/viewtopic.php?f=63&t=1113
http://www.vertica-forums.com/viewtopic.php?f=63&t=1113
-
- Newbie
- Posts: 5
- Joined: Mon Apr 22, 2013 5:49 pm
Re: Large deletes from two tables
Hello all, I am sort of confused, I have a few questions to ask.
I have a table: schema.MyTblName
I do a query
This returns 900,000,000 rows.
Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.
Then create a projection table that has the same schema as my schema.MyTblName table?
I then populate the projection table with the items that I want to delete?
then run the command
Is this correct?
How can I then purge the data from storage so I can become compliant with my licenses?
I have a table: schema.MyTblName
I do a query
Code: Select all
SELECT COUNT(*) FROM schema.MyTblName WHERE ID = 0001
Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.
Then create a projection table that has the same schema as my schema.MyTblName table?
I then populate the projection table with the items that I want to delete?
then run the command
Code: Select all
DELETE FROM schema.MyTblName WHERE ID = 0001
How can I then purge the data from storage so I can become compliant with my licenses?
Re: Large deletes from two tables
Hi!
1. Better if you will use in PARTITIONING, deletes are painful for Vertica.
https://my.vertica.com/docs/6.1.x/HTML/ ... #19058.htm
https://my.vertica.com/docs/6.1.x/HTML/ ... 9074_1.htm
2. To create "clone" of table use in "CREATE TABLE LIKE" statement
https://my.vertica.com/docs/6.1.x/HTML/ ... #18554.htm
>> Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.
And what is capacity of table? 900M rows for deletion? If you have ~1 billion of rows, so may be will be faster to move all data that should not be deleted (and after it to truncate a table with deleted data)?
1. Better if you will use in PARTITIONING, deletes are painful for Vertica.
https://my.vertica.com/docs/6.1.x/HTML/ ... #19058.htm
https://my.vertica.com/docs/6.1.x/HTML/ ... 9074_1.htm
2. To create "clone" of table use in "CREATE TABLE LIKE" statement
https://my.vertica.com/docs/6.1.x/HTML/ ... #18554.htm
>> Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.
And what is capacity of table? 900M rows for deletion? If you have ~1 billion of rows, so may be will be faster to move all data that should not be deleted (and after it to truncate a table with deleted data)?