Vertica node stuck in recovering state

Moderator: NorbertKrupa

dennisobrien_ig
Newbie
Posts: 6
Joined: Tue Apr 08, 2014 3:42 am

Vertica node stuck in recovering state

Post by dennisobrien_ig » Tue Apr 08, 2014 8:44 pm

We have a 4 node Vertica cluster running Vertica 5.1 on Red Hat Enterprise Linux Server 6.2.

Last week one of the nodes crashed and has been stuck in recovering state since. I have little experience as a Vertica database administrator, so I'm not sure what to try. Here's what I've tried so far:

- Force the database to restart via admintools. Three of the four nodes come up successfully, but the fourth remains stuck in recovering state.

- vertica.log - The log file is mostly filled with messages like this that repeat continuously:
2014-04-08 04:56:06.000 Timer Service:0x743d550 [Txn] <INFO> Begin Txn: c000000044f3c9 'ProjUtil::getLocalNodeLGE'
2014-04-08 04:56:06.005 Timer Service:0x743d550 [Txn] <INFO> Rollback Txn: c000000044f3c9 'ProjUtil::getLocalNodeLGE'
2014-04-08 04:56:06.005 Timer Service:0x743d550 [Recover] <INFO> My local node LGE = 0x1a4ccc and current epoch = 0x1a616c
2014-04-08 04:56:06.180 DistCall Dispatch:0x7f8e3cb19bc0 [Txn] <INFO> Rollback Txn: a00000009ec804 'sendGetClusterLGE'
2014-04-08 04:56:06.331 DistCall Dispatch:0x7f8e3cb19bc0 [Txn] <INFO> Rollback Txn: a00000009ec805 'sendCheckMissingLibraries'

- Reduce the load on Vertica. I've increased the interval between ETL runs from 15 minutes to about 6 hours so that recovery can proceed without competing with new table writes. A post I ran across suggested this, but so far it has had no effect.
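
For anyone reading the log excerpt above, the `[Recover]` line reports the node's LGE and the current epoch as hexadecimal values. The following is a minimal sketch, not part of the original post, that converts them to decimal and computes how many epochs the node is behind. The function name `epoch_gap` and the regular expression are illustrative assumptions based only on the line format shown in the excerpt.

```python
import re

# Pattern for the "[Recover]" line format shown in the log excerpt above.
# Assumption: other Vertica versions may word this line differently.
LGE_PATTERN = re.compile(
    r"My local node LGE = (0x[0-9a-fA-F]+) and current epoch = (0x[0-9a-fA-F]+)"
)


def epoch_gap(log_line):
    """Return (lge, current_epoch, gap) as integers, or None if the line does not match."""
    match = LGE_PATTERN.search(log_line)
    if match is None:
        return None
    lge = int(match.group(1), 16)      # hex string -> int
    current = int(match.group(2), 16)
    return lge, current, current - lge


line = (
    "2014-04-08 04:56:06.005 Timer Service:0x743d550 [Recover] <INFO> "
    "My local node LGE = 0x1a4ccc and current epoch = 0x1a616c"
)
print(epoch_gap(line))  # (1723596, 1728876, 5280)
```

Running this over successive `[Recover]` lines would show whether the gap is shrinking over time, which is one rough way to tell slow recovery from stalled recovery.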

I would greatly appreciate any suggestions for how to troubleshoot and steps to try.

scutter
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Vertica node stuck in recovering state

Post by scutter » Tue Apr 08, 2014 9:52 pm

Hi Dennis,

Does the node show as RECOVERING in the ‘nodes’ table?

Assuming it does, check the vs_sessions table: is there a long-running recover operation? If yes, is it on a table that gets a lot of deletes or updates? This sounds like it could be a replay-delete issue. The solution would be to kill the Vertica process that is recovering, execute "select make_ahm_now('true')" to force the AHM to advance, and then restart recovery. With the AHM advanced, it won't need to replay deletes. And if that is the issue, the projections also need some work to avoid similar problems in the future.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

dennisobrien_ig
Newbie
Posts: 6
Joined: Tue Apr 08, 2014 3:42 am

Re: Vertica node stuck in recovering state

Post by dennisobrien_ig » Wed Apr 09, 2014 1:08 am

Hi Sharon

Thanks for your response and questions.

Yes, one of the four nodes reports as "RECOVERING" while the other three are "UP".

Code:

select node_id, node_state from v_catalog.nodes;
node_id node_state
----------------- ----------
45035996273704972 UP
45035996273719654 UP
45035996273719658 RECOVERING
45035996273719662 UP
Looking at vs_sessions, I don't see any transactions that have taken a very long time, based on last_statement_duration_us. I do notice something suspicious with the timestamps however. Most of the transaction_start values are about 7 hours ahead of the current time UTC. In fact, if I just run the sql 'select now()' I get a value 7 hours ahead of UTC. The function getdate() returns the correct datetime in UTC. I have no idea if this is related or a red herring.

I will try the steps you suggested with make_ahm_now after I do enough documentation reading to know what I'm doing. I'll report back with the outcome.

Thanks again.
Dennis

scutter
Master
Posts: 302
Joined: Tue Aug 07, 2012 2:15 am

Re: Vertica node stuck in recovering state

Post by scutter » Wed Apr 09, 2014 3:38 am

If you don’t see any signs of a long-running projection recovery, then I’d stop and continue to assess the situation before doing the make_ahm_now().

What does “select * from vs_recovery_status where is_running” show?

Do you have a support contract with Vertica? That may be your easiest path to resolution.

—Sharon
Sharon Cutter
Vertica Consultant, Zazz Technologies LLC

dennisobrien_ig
Newbie
Posts: 6
Joined: Tue Apr 08, 2014 3:42 am

Re: Vertica node stuck in recovering state

Post by dennisobrien_ig » Wed Apr 09, 2014 5:44 am

Hi Sharon,

I may have misunderstood how to determine whether a statement is long-running. The last_statement_duration_us column does not flag any Internal or Recovery tasks. But if I order by transaction_start, I see some Internal/Recovery tasks that have been running for about 8 hours. (Again, this is clouded by the mystery of now() reporting 7 hours in the future.)

The query you mentioned:

Code:

select * from vs_recovery_status where is_running
node_name                 recover_epoch recovery_phase    progress                             is_running
------------------------- ------------- ----------------- ------------------------------------ ----------
v_kontagent_logs_node0011 1729514       historical pass 1 0/0 split 6/8 historical 0/0 current true

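
The progress column in the output above packs several counters into one string. As a rough sketch, assuming each counter follows a "done/total label" pattern (an interpretation of the output, not something confirmed in this thread), the string can be split into named pairs so it is easier to track between checks. The helper name `parse_progress` is hypothetical.

```python
import re


def parse_progress(progress):
    """Split a progress string into {label: (done, total)}.

    Assumption: each counter has the form "<done>/<total> <label>",
    as in the vs_recovery_status output quoted above.
    """
    pairs = re.findall(r"(\d+)/(\d+)\s+(\w+)", progress)
    return {label: (int(done), int(total)) for done, total, label in pairs}


progress = "0/0 split 6/8 historical 0/0 current"
print(parse_progress(progress))
# {'split': (0, 0), 'historical': (6, 8), 'current': (0, 0)}
```

Under that reading, the historical counter stands at 6 of 8. Comparing the parsed values across repeated runs of the query would show whether any counter is still moving.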
We no longer have Vertica support, but I will look into our options. I'm not a DBA and I'm moving very carefully here so that I do not cause more problems than I fix.

Thanks.
Dennis

NorbertKrupa
GURU
Posts: 527
Joined: Tue Oct 22, 2013 9:36 pm
Location: Chicago, IL

Re: Vertica node stuck in recovering state

Post by NorbertKrupa » Wed Apr 09, 2014 12:27 pm

Out of curiosity, how does the result of SELECT SYSDATE() compare to NOW()?
Check out vertica.tips for more Vertica resources.

zentavr
Newbie
Posts: 1
Joined: Wed Apr 09, 2014 5:07 pm

Re: Vertica node stuck in recovering state

Post by zentavr » Wed Apr 09, 2014 5:15 pm

norbertk wrote: Out of curiosity, how does the result of SELECT SYSDATE() compare to NOW()?

I work in the same department as Dennis, so here is the answer to your question:

Code:

=> SELECT NOW();
2014-04-09 13:13:01

Code:

=> SELECT SYSDATE();
2014-04-09 16:13:02

In fact, the OS date(1) command showed Wed Apr 9 16:13:02 UTC 2014.
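
To put a number on the difference between the two readings above, here is a small sketch that subtracts the SYSDATE() value from the NOW() value and rounds to whole hours. The helper name `offset_hours` and the timestamp format string are assumptions chosen to match the output as pasted. The rounding absorbs the one-second gap between running the two statements.

```python
from datetime import datetime

# Format of the timestamps as they appear in the pasted output (assumed).
FMT = "%Y-%m-%d %H:%M:%S"


def offset_hours(now_value, sysdate_value):
    """Return NOW() minus SYSDATE(), rounded to the nearest whole hour."""
    delta = datetime.strptime(now_value, FMT) - datetime.strptime(sysdate_value, FMT)
    return round(delta.total_seconds() / 3600)


print(offset_hours("2014-04-09 13:13:01", "2014-04-09 16:13:02"))  # -3
```

With these values, NOW() is about three hours behind SYSDATE() and the OS clock, which is worth comparing against the seven-hour difference reported earlier in the thread.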
