Pros & cons of running purge frequently

Moderator: NorbertKrupa

Post Reply
NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Pros & cons of running purge frequently

Post by NorbertKrupa » Thu Apr 24, 2014 7:44 pm

I'm curious what are the pro's & con's of running purge frequently. I'm aware that a purge should take place when deleted data exceeds 10% of the database storage size. I currently use this to determine that percent:

Code: Select all

SELECT (SELECT SUM(used_bytes) 
        FROM   v_monitor.delete_vectors) / (SELECT SUM(ros_used_bytes) 
                                            FROM   v_monitor.projection_storage) * 100 AS percent;
However, I saw that purge is sometimes run more frequently, such as after each ETL.

I know that there is high IO cost for running a purge, but what if it's run every hour for example? Since the delete vectors are still considered when the query is run, it makes sense that running purge frequently could be a potential benefit of speeding up queries.
Checkout vertica.tips for more Vertica resources.

User avatar
nnani
Master
Master
Posts: 302
Joined: Fri Apr 13, 2012 6:28 am
Contact:

Re: Pros & cons of running purge frequently

Post by nnani » Fri Apr 25, 2014 2:59 pm

Hi norbertk,

Purging in vertica is a heavy operation, which can affect the read operations on database

Pro's
1. Release database space being held by Delete vectors in database
2. Saves the queries from reading delete vectors while scanning data since there are no delete vectors existing after purge

Cons
1. time consuming resource consuming operation which may slow down other queries due to more resource aquisitions

It really depends on the ETL cycle of projection whether to run Purge frequently or not
If you have a loading window in your project and all the loading is done in this window then yu can use the 10% rule and only purge the data when the delete vector percentage is 10% or more
If you have trickle loading or random ETL in a day then I suggest doing purge after every ETL process since it makes sure the alst delete operation ETL process did not leave any delete vectors.

Some best practices while using purge
1. While purging avoid purging the whole database and try purging the tables which have delete vectors
2. Try purge_partitions for large deletes

Hope this helps
nnani........
Long way to go

You can check out my blogs at vertica-howto

NorbertKrupa
GURU
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL
Contact:

Re: Pros & cons of running purge frequently

Post by NorbertKrupa » Fri Apr 25, 2014 3:06 pm

Thanks nnani! I definitely see a noticeable improvement after a purge is performed. Since we have a nightly period of inactivity, I think I'll try out doing a purge every day.
Checkout vertica.tips for more Vertica resources.

Post Reply

Return to “Vertica Database Administration”