Drowning in Data and Starving for Insights - Episode #1: Swimming in Sensors and Drowning in Data
Modern sensor and communication networks provide large sets of operational data, including information per system, subsystem, feature, port, or even per packet. A single network device, such as a router or a switch, can expose millions of multivariate time series that portray the state of the device. But the fact that data is available does not mean that it is easily consumable.
Indeed, the reality is that network operations produce a lot of data but are at the same time starving for insights. Telemetry data in the form of multivariate time series is produced at an unprecedented rate, but typical telemetry streams are high dimensional and noisy, often with missing values, and thus offer only incomplete information. The generation of continuous telemetry data at high frequency and volume also poses serious bandwidth and storage problems for data centers.
This is challenging for network administrators, who need to store, interpret, and reason about the data in a holistic and comprehensive way. One usual practice is to hand-pick a small subset of the available time-series data based on experience. Another is to apply down-sampling strategies that aggregate common features, de facto limiting the prediction accuracy of telemetry analytics. Either way, network administrators are confronted with two key challenges:
Visibility: within a device, alarms/notifications from different components are reported independently. Across devices, there is little systematic telemetry exchange, even though a network event often gives rise to alarms/notifications on multiple devices in the network. This device-centric view prevents network administrators from having complete control of the data center infrastructure and from producing a correct network diagnosis.
Filtering and aggregation: the deluge of data generated by multiple sensors affects all industries, due to the abundance of complex systems and the proliferation of sensors. A single event is often present in a multitude of data sources in heterogeneous formats, such as syslog, Model Driven Telemetry (MDT), SNMP, etc. None of these data sources are correlated, nor is there any identifier that ties the data to an application or service. When the large majority of events is collected and processed online, the amount of data created often exceeds the storage capacity and processing power of backend systems and controllers.
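To make concrete why the down-sampling strategies mentioned above limit analytics, here is a minimal sketch, assuming an invented 1 Hz CPU-utilization counter: aggregating it to one-minute averages with pandas almost erases a short transient spike that an operator would want to see. All names and values are hypothetical.

```python
import pandas as pd

# Invented telemetry: 10 minutes of a CPU counter sampled at 1 Hz
idx = pd.date_range("2021-01-01", periods=600, freq="s")
cpu = pd.Series(20.0, index=idx)
cpu.iloc[300:305] = 95.0  # a 5-second spike in the middle of the window

# A common down-sampling strategy: aggregate to 1-minute averages
downsampled = cpu.resample("1min").mean()

peak_raw = cpu.max()          # 95.0: the spike is obvious at full rate
peak_agg = downsampled.max()  # 26.25: averaging nearly erases it
```

At full rate the anomaly stands out; after aggregation the busiest minute averages to 26.25%, barely above the 20% baseline, which is exactly the kind of lost insight the text describes.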
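The absence of a shared identifier across heterogeneous sources can be sketched as follows. This is a toy illustration, not an actual correlation engine: every record, field name, and message below is invented, and grouping by device and coarse time bucket is only a crude stand-in for real correlation logic.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Event:
    source: str   # "syslog", "snmp", "mdt" (normalized from heterogeneous formats)
    device: str
    ts: float     # epoch seconds
    message: str

def correlate(events, window=5.0):
    """Group events by (device, time bucket) since no common identifier exists."""
    buckets = defaultdict(list)
    for ev in events:
        buckets[(ev.device, int(ev.ts // window))].append(ev)
    return buckets

# Invented events describing the same incident on one device
events = [
    Event("syslog", "edge-rtr-1", 100.2, "Interface Gi0/0 changed state to down"),
    Event("snmp",   "edge-rtr-1", 101.7, "linkDown trap ifIndex=3"),
    Event("mdt",    "edge-rtr-1", 160.0, "interfaces/interface/state/oper-status=DOWN"),
]
groups = correlate(events)
# The syslog and SNMP events share a bucket; the later MDT event does not,
# showing how naive time-window correlation both joins and misses related data.
```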
The traditional approaches to solving these challenges are to:
Create highly scalable centralized controllers with a network-wide view for data mining. This approach is limited by CAPEX investments in hardware (e.g., backend systems, HPC facilities, storage systems) and software (e.g., licenses, development of new algorithms).
Limit the scope of the data collection to a subset of counters and devices selected with the assistance of subject-matter experts (SMEs) or rule-based systems. This approach is limited by the background knowledge of the domain expert or by the expert system's static knowledge base, i.e., you only see what you were looking for. Due to CPU and memory limitations on routers, on-box expert systems are typically based on manually crafted and maintained rules (rather than learning-based approaches), which lack flexibility and require frequent updates to remain current. Although expert systems perform well in domains where the rules are followed, they tend to perform poorly for anything outside the pre-specified rules. Quite commonly, the thresholds of these rule engines need to be adjusted on a per-deployment basis, causing significant deployment and maintenance costs.
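The per-deployment threshold problem above can be made concrete with a toy static rule. The threshold value, function name, and sample data are purely illustrative, not taken from any real rule engine.

```python
# A hand-maintained, deployment-specific constant: invented for illustration
CPU_ALARM_THRESHOLD = 80.0

def check_cpu(samples):
    """Return indices of samples that breach the static rule."""
    return [i for i, v in enumerate(samples) if v > CPU_ALARM_THRESHOLD]

# A busy-but-healthy device alarms on every sample...
noisy = check_cpu([85, 88, 90])
# ...while a quietly misbehaving one never trips the rule, even if 30%
# is far above its own normal baseline. The fix is re-tuning the constant
# per deployment, which is exactly the maintenance cost described above.
silent = check_cpu([30, 31, 29])
```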
So how can we take advantage of this flood of data and turn it into actionable information? Traditional techniques for data mining and knowledge discovery are unsuitable for uncovering the global data structure from local observations. When there is too much data, it is either aggregated globally (e.g., into static semantic ontologies) or aggregated locally (e.g., by deep learning architectures) to reduce data storage and transport requirements. Either way, with simple aggregation methods we may lose the very insights we are ultimately looking for. To avoid this, our team has been exploring methodologies that enable data mining services in complex environments and allow us to extract useful insights directly from the data. We’re putting ourselves into the shoes of a consumer of telemetry data, treating the delivered data as a product: What are the business insights and associated values that the telemetry data offers? Which of the “dark data” offered by a router or switch but typically left unexplored provides interesting insights to the operator? We exploit topological methods to generate rich signatures of the input data, reducing large datasets to a compressed representation in lower dimensions and finding unexpected relationships and rich structures.
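As a generic stand-in for the idea of compressing a high-dimensional telemetry matrix into a low-dimensional signature (it is not the topological method itself, which this series has yet to introduce), the sketch below reduces 50 synthetic metrics driven by two hidden behaviors to a 2-dimensional representation with plain PCA via SVD. All data here is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)

# 50 noisy metrics that are secretly driven by two latent behaviors
latent = np.stack([np.sin(t), np.sign(np.sin(3 * t))])       # (2, 500)
mixing = rng.normal(size=(50, 2))
X = (mixing @ latent).T + 0.05 * rng.normal(size=(500, 50))  # 500 samples x 50 metrics

# PCA via SVD: project onto the top-2 principal directions
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
signature = Xc @ Vt[:2].T                    # (500, 2) compressed signature

explained = (S[:2] ** 2).sum() / (S ** 2).sum()
# Two dimensions retain the vast majority of the variance of all 50 metrics
```

The point of the sketch is the shape of the problem: a compact signature can preserve the structure of the full dataset where naive aggregation would discard it.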
In the next episode, we will provide some background on topology and introduce the concepts behind Topological Data Analysis.