Spread Monitoring

Posted: Wed Jan 28, 2015 3:34 pm
by Timbo
Hi,
Does anyone have the definitions of the columns in the "dc_spread_monitor" table, or some SQL that would make sense of the data?

There is no information on this table in either the V6 or V7 online documentation.

I see in the V7 MC that there is an alert for "Spread Retransmit Rate Over Threshold 10%", but I need to look for something similar in a V6 cluster, and I am assuming the data to analyse is in the dc_spread_monitor table.

Regards
Tim

Re: Spread Monitoring

Posted: Wed Jan 28, 2015 4:30 pm
by NorbertKrupa
Timbo wrote: I see in the V7 MC that there is an alert for "Spread Retransmit Rate Over Threshold 10%", but I need to look for something similar in a V6 cluster, and I am assuming the data to analyse is in the dc_spread_monitor table.
If you're just looking for a query, this might help:

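-- Retransmit rate (%) per node per minute, from the change in each counter within the minute.
-- NULLIF avoids a division-by-zero error for minutes in which no messages were delivered.
SELECT DATE_TRUNC('minute', time) AS time,
       node_name,
       ROUND(( MAX(retrans) - MIN(retrans) ) / NULLIF(MAX(message_delivered) - MIN(message_delivered), 0) * 100, 2) AS retransmit_rate
FROM   dc_spread_monitor
GROUP  BY 1,
          2
HAVING ROUND(( MAX(retrans) - MIN(retrans) ) / NULLIF(MAX(message_delivered) - MIN(message_delivered), 0) * 100, 2) > 10
ORDER  BY 1 DESC,
          2;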
The original intent of this alert in 7.x was to warn of the possibility of a node dropping from the cluster. However, a high retransmit rate does not always indicate an unhealthy cluster. If the cluster has little or no activity, the retransmit rate will appear high because there are fewer transmissions to measure it against.
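
One way to cut down on those false positives is to ignore minutes with too little traffic. The following is only a sketch, not how the MC alert is implemented: it assumes, as the query above does, that retrans and message_delivered are cumulative counters, and the minimum of 1000 delivered messages per minute is an arbitrary placeholder to tune for your cluster.

```sql
-- Sketch: same per-minute rate, but only for minutes with enough traffic to be meaningful.
-- The 1000-message floor is an assumed placeholder, not a documented threshold.
SELECT DATE_TRUNC('minute', time) AS time,
       node_name,
       MAX(message_delivered) - MIN(message_delivered) AS delivered,
       MAX(retrans) - MIN(retrans) AS retransmitted,
       ROUND(( MAX(retrans) - MIN(retrans) ) / NULLIF(MAX(message_delivered) - MIN(message_delivered), 0) * 100, 2) AS retransmit_rate
FROM   dc_spread_monitor
GROUP  BY 1,
          2
HAVING MAX(message_delivered) - MIN(message_delivered) >= 1000
       AND ( MAX(retrans) - MIN(retrans) ) / NULLIF(MAX(message_delivered) - MIN(message_delivered), 0) * 100 > 10
ORDER  BY 1 DESC,
          2;
```

Showing the raw delivered and retransmitted counts next to the percentage makes it easier to judge whether a flagged minute reflects a real problem or just a quiet period.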