I am looking to get some advice on what type(s) of schema designs have been effective for storing time series telemetry/sensor data in Vertica. I am working on a project where there is a need to store multiple metric values for a given entity at a single point in time, then, continually add new "polled" data on a fixed interval. I see at least a couple of different options for schema design:
option 1:
entity | time | metric name | metric value
A now cpu util 70
A now memory util 90
...
A now+5min cpu util 75
A now+5min memory util 85
option 2:
entity | time | cpu util | memory util | ...
A now 70 90
A now+5min 75 85
I would like to go with more of a key-value design (option 1) but am concerned about significant record growth and combining values from different metrics that might not compress well. Anyway, I'd be interested in hearing about other options beyond what I have above and schema designs that have been particularly effective for storing time series data in Vertica.
Thanks in advance,
Dave
Opinions re: Schema Design for storing time series data
Moderator: NorbertKrupa
-
- GURU
- Posts: 527
- Joined: Tue Oct 22, 2013 9:36 pm
- Location: Chicago, IL
- Contact:
Re: Opinions re: Schema Design for storing time series data
Your first option should be fine with proper partitioning and a proper timestamp. Instead of having a metric name, you could also use a metric lookup table.
The second option is not really normalized.
The second option is not really normalized.
Checkout vertica.tips for more Vertica resources.
Re: Opinions re: Schema Design for storing time series data
Thanks a lot for the feedback.