Mining Patterns for Maximal Coverage in Time Series

Abstract

Time series are a fundamental building block of modern data analysis because they are cost-effective to collect and versatile enough to capture a wide range of dynamic phenomena. In many applications, the measured variable is a proxy for the activity or state of an underlying system of interest. However, high-level information about the system is not directly accessible from raw time series data, which consist of sequentially recorded values rather than explicit system states. Analyzing time series data therefore often requires identifying the core patterns associated with the different possible states. This identification step can enable forecasting, relationship mining, policy verification, or integrity assessment of the system of interest. When a system is neither observable (i.e., its states or activity cannot be accessed at any time) nor controllable (i.e., its states or activity cannot be scheduled), mining key patterns must rely on unsupervised methods. Moreover, under the assumption that the system is always in one of its states, the solution must maximize coverage of the input time series rather than simply return the best matches. In this paper, we propose a novel approach for mining recurrent patterns from time series, with a focus on maximizing time series coverage. The method first generates a list of candidate patterns and their occurrences using a well-established matrix profile algorithm. Then, a translation layer produces a set of constraints that are compiled into a model describing a solution while preventing overlapping occurrences. Finally, a constraint solver generates a solution in the form of a set of core patterns, selected subject to predefined constraints on the Number of Patterns (NoP) and on coverage. We evaluate this approach on a dataset of power consumption time series representing the activity of a computer, as well as on synthetic pattern-based time series.
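To make the selection step concrete, the following is a minimal sketch of the underlying optimization problem, not the method proposed in this paper. It assumes that candidate patterns and their occurrences are already available (the matrix profile step is not shown), and it replaces the constraint solver with an exhaustive search over pattern subsets combined with weighted interval scheduling. The function names `best_cover` and `select_patterns`, the pattern labels, and the occurrence intervals are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import combinations


def best_cover(intervals):
    """Pick non-overlapping half-open (start, end) intervals that maximize
    the total covered length (weighted interval scheduling).
    Returns (coverage, chosen_intervals)."""
    ivs = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ivs]
    # best[i] holds the optimum using only the first i intervals.
    best = [(0, [])]
    for i, (start, end) in enumerate(ivs):
        # j = number of earlier intervals that end at or before `start`.
        j = bisect_right(ends, start, 0, i)
        take = best[j][0] + (end - start)
        if take > best[i][0]:
            best.append((take, best[j][1] + [(start, end)]))
        else:
            best.append(best[i])
    return best[-1]


def select_patterns(candidates, max_patterns):
    """Choose at most `max_patterns` patterns (the NoP bound) whose
    non-overlapping occurrences cover the most samples.
    `candidates` maps a pattern label to its list of occurrences.
    Returns (coverage, selected_labels, chosen_occurrences)."""
    best = (0, (), [])
    labels = sorted(candidates)
    for k in range(1, max_patterns + 1):
        for subset in combinations(labels, k):
            pool = [iv for label in subset for iv in candidates[label]]
            coverage, chosen = best_cover(pool)
            if coverage > best[0]:
                best = (coverage, subset, chosen)
    return best


# Hypothetical candidates for a series of 20 samples (illustrative only).
candidates = {
    "A": [(0, 4), (8, 12), (16, 20)],
    "B": [(3, 9), (11, 17)],
    "C": [(4, 8), (12, 16)],
}
coverage, selected, occurrences = select_patterns(candidates, max_patterns=2)
print(coverage, selected)  # 20 ('A', 'C')
```

In this toy instance no single pattern covers more than 12 of the 20 samples, whereas the pair A and C tiles the whole series without overlap. The exhaustive search grows combinatorially with the number of candidates, which is one reason to hand the same constraints (a bound on the number of patterns, no overlapping occurrences, maximal coverage) to a dedicated solver.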

Article