Large deletes from two tables

Moderator: NorbertKrupa

golgiapparatus
Newbie
Newbie
Posts: 5
Joined: Mon Apr 22, 2013 5:49 pm

Large deletes from two tables

Post by golgiapparatus » Sat Aug 31, 2013 9:45 pm

Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”

Thank you for your help

zvika
Beginner
Beginner
Posts: 25
Joined: Thu Apr 19, 2012 7:55 am

Re: Large deletes from two tables

Post by zvika » Mon Sep 02, 2013 12:46 pm

golgiapparatus wrote:Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”

Thank you for your help
If it's one bulk then you can do that in a delete and then purge deleted vector files (verify that your projection are suited for deletes with evakuate delete performance ).
If it's taking too much time then use create table as select to create a "good" table then rename the old one to a temp name and the new one to the original name
for example :

create table x_new as select * from schema.tableA where id =3 ( you can create the structure first the same as the original and use insert instead of ctas)
alter table schema.tableA rename to x_old
alter table x_new rename to schema.tableA;

Once you sure the process finished successfully drop the x_old table .

Aother approache is to use partitions and then your delete will not be as bad as scan the whole table.

golgiapparatus
Newbie
Newbie
Posts: 5
Joined: Mon Apr 22, 2013 5:49 pm

Re: Large deletes from two tables

Post by golgiapparatus » Mon Sep 02, 2013 8:16 pm

zvika wrote:
golgiapparatus wrote:Hello all,
I am in need of some help. I am working with Vertica (the default guy) and have a 9 raw tb licenses. Right now we have 7 TB raw used and I want to get rid of about 1TB of data.
This data is located in two tables, let’s call them tableA and tableB.
Each table has a common key that I want to remove. In sql I would do something like
“Delete from schema.tableA where id =3” and “Delete from schema.tableB where id =3”
Is that the correct way to remove lots of data? If not, what is the best way for removing large amounts of data?
Another question, does an audit(‘’) force the purge of data that has been flagged to be removed?
What is the best way to purge data so I can see the reflection of the change when I run “select get_compliance_status();”

Thank you for your help
If it's one bulk then you can do that in a delete and then purge deleted vector files (verify that your projection are suited for deletes with evakuate delete performance ).
If it's taking too much time then use create table as select to create a "good" table then rename the old one to a temp name and the new one to the original name
for example :

create table x_new as select * from schema.tableA where id =3 ( you can create the structure first the same as the original and use insert instead of ctas)
alter table schema.tableA rename to x_old
alter table x_new rename to schema.tableA;

Once you sure the process finished successfully drop the x_old table .

Aother approache is to use partitions and then your delete will not be as bad as scan the whole table.
Would partitioning be the best method, do you know of a good tutorial on partitioning?

Again,
Thanks for the help. I am working on somthing that I might not be the best person for at the time. I learn through walking through the steps.

zvika
Beginner
Beginner
Posts: 25
Joined: Thu Apr 19, 2012 7:55 am

Re: Large deletes from two tables

Post by zvika » Tue Sep 03, 2013 1:25 pm

If you can implement partitioning than I think it's the one of the best way as every solution depend on the scenario and there is no rule for all scenarios.
Just read the documentation about partitioning it will give you the knowlege .

You will be the best person ... don't worry :-)

doug_harmon
Beginner
Beginner
Posts: 36
Joined: Fri Feb 17, 2012 6:09 pm
Contact:

Re: Large deletes from two tables

Post by doug_harmon » Tue Sep 03, 2013 3:17 pm

Here's an example of how to do partitioning.

http://www.vertica-forums.com/viewtopic.php?f=63&t=1113

golgiapparatus
Newbie
Newbie
Posts: 5
Joined: Mon Apr 22, 2013 5:49 pm

Re: Large deletes from two tables

Post by golgiapparatus » Mon Sep 23, 2013 6:39 pm

Hello all, I am sort of confused, I have a few questions to ask.

I have a table: schema.MyTblName
I do a query

Code: Select all

SELECT COUNT(*) FROM schema.MyTblName WHERE ID = 0001
This returns 900,000,000 rows.

Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.

Then create a projection table that has the same schema as my schema.MyTblName table?

I then populate the projection table with the items that I want to delete?

then run the command

Code: Select all

 DELETE FROM schema.MyTblName WHERE ID = 0001 
Is this correct?

How can I then purge the data from storage so I can become compliant with my licenses?

id10t
GURU
GURU
Posts: 732
Joined: Mon Apr 16, 2012 2:44 pm

Re: Large deletes from two tables

Post by id10t » Mon Sep 23, 2013 7:32 pm

Hi!

1. Better if you will use in PARTITIONING, deletes are painful for Vertica.
https://my.vertica.com/docs/6.1.x/HTML/ ... #19058.htm
https://my.vertica.com/docs/6.1.x/HTML/ ... 9074_1.htm

2. To create "clone" of table use in "CREATE TABLE LIKE" statement
https://my.vertica.com/docs/6.1.x/HTML/ ... #18554.htm

>> Lets say that I want to delete all the rows from schema.MyTblName where ID = 0001.
And what is capacity of table? 900M rows for deletion? If you have ~1 billion of rows, so may be will be faster to move all data that should not be deleted (and after it to truncate a table with deleted data)?

Post Reply

Return to “Vertica Database Administration”