At the edge, the amount of data generated by a sensor device seems small (like individual raindrops): tens of bytes per reading. Reported once per day, this data still seems small (like a trickle): hundreds of bytes a month, thousands of bytes per year, and possibly tens of thousands of bytes over a decade or more. With one device and one check-in per day, the data collected and stored stays in the realm of kilobytes (KB). But things change drastically if sensors check in every minute (1,440 minutes in a day) or every second (86,400 seconds in a day), and the effect is compounded if there are tens of thousands of devices. Pretty quickly, data collection, storage, analytics, and archival can grow into gigabytes, terabytes, and even petabytes. A colleague used to counsel customers about server efficiency, telling them to be on the lookout for the one additional server that would force them to build the next new $50M data center. He referred to it as the “$50M server”. Careful and deliberate choices about the minimum set of data records to collect, analyze, and store are important when data comes from devices deployed throughout an enterprise.
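To make the scaling concrete, here is a rough back-of-the-envelope sketch. The 20-byte payload and the 10,000-device fleet are illustrative assumptions, not measured figures, and the function names are hypothetical. It counts raw payload bytes only and ignores protocol, index, and database overhead.

```python
# Rough estimate of raw sensor data volume per year.
# Assumptions (illustrative only): 20-byte payload, 365-day year.
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440

def yearly_bytes(payload_bytes, reports_per_day, devices=1, days=365):
    """Raw payload bytes produced in one year, before any overhead."""
    return payload_bytes * reports_per_day * devices * days

def human(n):
    """Format a byte count using decimal (powers of 1000) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} EB"

PAYLOAD = 20  # bytes per reading (assumed)

print(human(yearly_bytes(PAYLOAD, 1)))                        # 7.3 KB
print(human(yearly_bytes(PAYLOAD, MINUTES_PER_DAY, 10_000)))  # 105.1 GB
print(human(yearly_bytes(PAYLOAD, SECONDS_PER_DAY, 10_000)))  # 6.3 TB
```

Under these assumptions, one device reporting daily stays in kilobytes, while a 10,000-device fleet reporting every second reaches terabytes per year before any copies, indexes, or derived values are stored.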
Device Design and Trade-offs Affecting Monitoring and Analytics
If the objective is to minimize data, then only the data that is needed should be collected, and it should be collected only as often as monitoring and analytics require. For monitoring alone, the amount of data retained can be kept quite small. For charting trends, calculating rates, or performing analytics, the amount of data will be larger, and how much data is kept, and for how long, largely determines the storage and computing resources required.
As the data is collected on the server (in the cloud, on-site, or in the corporate network), the parsing, calculation, and calibration of the raw data consume computation resources. The selected data is usually stored with information about time, location, identification of the source, and any other relevant elements. This data is immediately useful for display, monitoring, and reporting purposes. Graphing, animated gauges, and basic calculations are additional useful “on-the-fly” operations that can be performed on the data.
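As a minimal sketch of what such a stored record might look like, the example below parses a raw payload, applies a calibration, and attaches time, location, and source identification. Everything here is hypothetical: the `Reading` fields, the 2-byte big-endian payload layout, and the linear gain/offset calibration are assumptions for illustration, not the format of any particular device or back end.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Reading:
    """One stored record (hypothetical layout)."""
    device_id: str       # identification of the source
    timestamp: datetime  # when the reading was received
    location: str        # where the sensor is installed
    raw_value: int       # value exactly as sent by the device
    value: float         # calibrated value in engineering units

def calibrate(raw, gain=0.1, offset=-40.0):
    """Assumed linear calibration: engineering value = raw * gain + offset."""
    return raw * gain + offset

def parse_reading(device_id, location, payload, received_at):
    """Assumes the first two payload bytes are an unsigned big-endian integer."""
    raw = int.from_bytes(payload[:2], "big")
    return Reading(device_id, received_at, location, raw, calibrate(raw))

reading = parse_reading(
    "sensor-001", "warehouse-A", b"\x02\x8a",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(reading)
```

Note that this record keeps both the raw and the calibrated value, which is convenient for auditing but roughly doubles the stored payload per reading.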
Total data storage is driven by the number of data elements collected and retained, plus any intermediate values that are computed and retained as well. If the original raw data, the calibrated and computed data, and other intermediate values are all kept, the same reading can quickly end up stored in three or more forms. This matters because storage requirements are multiplied by the number of devices and by the rate at which each device sends data. Ultimately, if no raw data is parsed, filtered, or discarded, the amount of data stored can grow at an alarming rate. Although it would be interesting to keep all the data collected from all the sensors for all time, this can come at a hefty cost. A more efficient and cost-effective approach is to determine which data, at what frequency, and over what historical period is relevant for analytical and trending purposes, and to discard the records and elements that do not meet the established criteria.
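One way such criteria could be expressed is as a tiered retention policy. The sketch below is an assumption for illustration, not a prescribed method: it keeps every reading from the last 7 days, keeps only the first reading of each hour for older data, and discards anything older than a year. The function name and thresholds are hypothetical.

```python
from datetime import datetime, timedelta

def apply_retention(records, now, raw_days=7, max_days=365):
    """Thin a list of (timestamp, value) records.

    - newer than raw_days: keep every record
    - between raw_days and max_days: keep the first record of each hour
    - older than max_days: discard
    """
    kept = []
    seen_hours = set()
    for ts, value in sorted(records, key=lambda r: r[0]):
        age = now - ts
        if age > timedelta(days=max_days):
            continue  # beyond the historical period of interest
        if age <= timedelta(days=raw_days):
            kept.append((ts, value))  # recent data stays at full resolution
            continue
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in seen_hours:  # older data: one sample per hour
            seen_hours.add(hour)
            kept.append((ts, value))
    return kept

now = datetime(2024, 1, 1, 12, 0)
records = [
    (now - timedelta(days=400), 1.0),             # too old: dropped
    (now - timedelta(days=30), 2.0),              # kept (first in its hour)
    (now - timedelta(days=30, minutes=-1), 2.1),  # same hour: dropped
    (now - timedelta(days=1), 3.0),               # recent: kept
    (now - timedelta(days=1, minutes=-1), 3.1),   # recent: kept
]
print(len(apply_retention(records, now)))  # 3
```

A single device reporting every minute produces 1,440 records per day; under this policy, days older than a week shrink to at most 24 records each.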
Bonus: Minimizing Data Collected, Transmitted and Stored
Although there are trade-offs in the amount of data collected and transmitted, minimizing data brings benefits in several areas, including:
- battery life
- transmission distances
- data bandwidth
- data transmission charges/fees
- data computation and storage fees
- server power consumption
The Cellio Wireless Network has been designed to be quick, affordable, and high quality. Cellio Wireless Transceivers allow the Cellio system and customers to quickly and easily expand existing systems with as many sensors and controllers as they like (with minimal incremental expense). The data collected in the back-end data system can be easily and rapidly mapped and made available for viewing on PCs, tablets, smartphones, and other devices, both in browser views and in automatically generated native mobile app views. Sharing and modifying the dashboard views is quick and easy. All of this is available and easy to put in place today.
There is a saying that you should “Inspect what you expect”. The selection and placement of sensors, the reporting frequency of the data, and the storage and retention of the data and resulting analyses will ultimately drive the collection and storage requirements for organizations and applications. With thoughtful planning, you won’t have to make the budget request for the server to retain the “$50M data record”.