"At RockMelt we collect data from various sources: server logs, website logs, browser metrics, etc. Data from these sources gets processed via Hadoop, Splunk or Hive and permanently stored in HDFS or as compressed files in Amazon EBS storage. As it turns out, almost all of our performance, product and business metrics are time-based, and different metrics have different data types and structures. One use case comes up again and again: we need to store and retrieve time series data on any schema. We use this to drive various dashboards displaying the latest metrics, data trends and other interesting numbers.
For example, our crash dashboard displays the number of crashes per hour per browser version for the past 60 days. We use this to track the stability of new releases and to help drive down crashes over time.
Why not use RRD?
RRD is a commonly used tool for storing time series data. One major limitation of RRD is that it deals only with numerical values. As you can see in the above example we would like the flexibility to store and retrieve time series data on any schema, be it an array or a complex JSON object..."
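To make the "any schema" requirement concrete, here is a minimal in-memory sketch in Python. It is not RockMelt's implementation, which the excerpt does not describe. The class name TimeSeriesStore, its put and get_range methods, the metric name and the browser version numbers are all hypothetical, chosen only for illustration. The idea is that each data point is a timestamp paired with an arbitrary JSON-serializable value, so a per-version crash count can be stored as easily as a single number.

```python
import bisect
import json


class TimeSeriesStore:
    """Hypothetical in-memory store: one sorted list of points per metric."""

    def __init__(self):
        # metric name -> sorted list of (timestamp, serialized JSON value)
        self._series = {}

    def put(self, metric, timestamp, value):
        # Serializing to JSON lets every schema (number, array, object)
        # be stored the same way.
        payload = json.dumps(value, sort_keys=True)
        points = self._series.setdefault(metric, [])
        bisect.insort(points, (timestamp, payload))

    def get_range(self, metric, start, end):
        """Return [(timestamp, value)] with start <= timestamp < end."""
        points = self._series.get(metric, [])
        # A 1-tuple (t,) sorts before any (t, payload), so bisect_left
        # finds the first point whose timestamp is >= t.
        lo = bisect.bisect_left(points, (start,))
        hi = bisect.bisect_left(points, (end,))
        return [(ts, json.loads(payload)) for ts, payload in points[lo:hi]]


HOUR = 3600  # seconds

store = TimeSeriesStore()
# Made-up hourly crash counts keyed by (made-up) browser version.
store.put("crashes_per_hour", 0 * HOUR, {"1.0": 12, "1.1": 3})
store.put("crashes_per_hour", 1 * HOUR, {"1.0": 9, "1.1": 5})
store.put("crashes_per_hour", 2 * HOUR, {"1.0": 7, "1.1": 4})

# Fetch the two most recent hourly points.
recent = store.get_range("crashes_per_hour", 1 * HOUR, 3 * HOUR)
```

A dashboard like the crash one described above would issue the same kind of range query over a 60-day window. A numeric-only store such as RRD would instead need a separate series for each browser version.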