A measure of distance between time series: Dynamic Time Warping

Jessica Leung, Discipline of Business Analytics, The University of Sydney

Robert James, Discipline of Finance, The University of Sydney
Many important phenomena are recorded as time series data: for instance, audio signals, electrocardiogram (ECG) signals from heartbeat monitors, the returns of a financial asset, and even the intensity of light emitted from a distant star. In many applications it is of interest to measure the similarity (distance) between different time series samples. For example, voice recognition systems compare the audio signal of a spoken word against a database of template signals. Another example is heartbeat monitoring systems that compare real-time patterns against a database of ECG patterns of known heart conditions.
When measuring the similarity between time series sequences, it is important to consider the alignment path, that is, the mapping between observations in the two time series. The Euclidean distance is a widely used measure of similarity between two vectors which, by definition, enforces a lock-step one-to-one mapping between corresponding observations in any two time series samples. Yet the similarity between two time series measured by the Euclidean distance can be severely distorted when minor misalignments along the time axis, for example due to instrument measurement error, are present between the two time series. As a consequence, time series data mining tasks such as classification and clustering will break down if misalignments are not first corrected. Figure 1 demonstrates this idea for two starlight curves that belong to the same class of star but are misaligned in time^{1}. The second panel of Figure 1 illustrates the lock-step Euclidean alignment path between the two time series. Since we know that the two time series belong to the same class of star, it is clear that the Euclidean distance does not provide an intuitive measure of similarity for these misaligned time series.
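To make the distortion concrete, consider a minimal sketch (our illustration, not taken from the starlight data): a Gaussian pulse and a copy of it delayed by five samples. The lock-step Euclidean distance between them is large, yet undoing the delay shows the two series are essentially identical.

```python
import numpy as np

# A pulse and the same pulse delayed by five samples (illustrative data).
t = np.arange(100)
pulse = np.exp(-0.5 * ((t - 50) / 5.0) ** 2)
delayed = np.exp(-0.5 * ((t - 55) / 5.0) ** 2)

# Lock-step mapping: observation i is always compared with observation i,
# so the small time shift inflates the distance.
euclid = np.linalg.norm(pulse - delayed)

# Undoing the five-sample shift reveals that the shapes match almost exactly.
realigned = np.linalg.norm(pulse - np.roll(delayed, -5))
```

Here `euclid` is large even though the two pulses have identical shapes, while `realigned` is essentially zero. DTW finds such realignments automatically, without the shift having to be known in advance.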
Figure 1: Comparison of Euclidean and DTW alignment between two starlight curves for an identical star. 
To accurately measure the similarity in these cases, we must account for the potential misalignment between the two given time series samples. Dynamic Time Warping (DTW) offers one possible solution to this misalignment problem. DTW shrinks or stretches regions of one time series so as to best fit the other. In other words, DTW allows a non-linear alignment between observations and is therefore robust to misalignments along the time axis. The third panel of Figure 1 plots the alignment path constructed by DTW. Using this alignment path, the final panel of Figure 1 plots the synchronized time series. It is now clear that the starlight curves belong to the same class of star.
To construct the DTW alignment path, we first build a cost matrix containing all pairwise distances between observations in the two time series. Contiguous alignment paths can be traced through this cost matrix, and the distance between the two time series under any given alignment path is the sum of the distances along that path. The goal of DTW is to find the alignment path with the least cost. This path is found using a recursive equation that defines the cumulative cost at cell (i, j) as the local cost of matching observation i with observation j plus the minimum of the cumulative costs in the three adjacent cells (i-1, j), (i, j-1) and (i-1, j-1). Typically we compute the optimal alignment path subject to a constraint, known as the warping window width constraint, that restricts the warping path to a region around the main diagonal of the cost matrix. Intuitively, this constraint encourages realistic alignments by limiting cases where one observation is matched with many observations in the other time series, a circumstance known as pathological warping.
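The recursion can be implemented directly. The sketch below is our illustration, assuming univariate series and squared local costs, with the warping window width given as a fraction of the series length; it fills the cumulative cost matrix and returns the cost of the optimal path.

```python
import numpy as np

def dtw_distance(x, y, window=None):
    """DTW distance between 1-D arrays x and y.

    window: warping window width as a fraction of the series length,
    or None for unconstrained warping.
    """
    n, m = len(x), len(y)
    w = max(n, m) if window is None else max(int(window * max(n, m)), abs(n - m))
    # D[i, j] = cumulative cost of the best alignment of x[:i] with y[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2       # local squared distance
            D[i, j] = cost + min(D[i - 1, j],       # stretch y
                                 D[i, j - 1],       # stretch x
                                 D[i - 1, j - 1])   # one-to-one match
    return np.sqrt(D[n, m])
```

Because the lock-step diagonal is always an admissible path, the DTW distance between two equal-length series never exceeds their Euclidean distance; for misaligned but similar series it is typically much smaller.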
Figure 2 plots an example of the cost matrix together with the DTW alignment path (solid white line) for the two starlight curves plotted previously in Figure 1. The dashed line along the diagonal of the cost matrix corresponds to the Euclidean alignment path; the Euclidean alignment path is therefore a special case of a DTW alignment path. When the DTW alignment path is above (below) the diagonal we stretch (shrink) time series x to match time series y. For this reason, DTW is often referred to as an elastic distance measure. The cost matrix demonstrates that the DTW alignment path is a path of least resistance. The dot-dashed lines around the diagonal of the cost matrix represent a warping window width constraint set to 20% of the length of the time series. In practice, the optimal value of this constraint can be selected using cross-validation [1].
Figure 2: Cost Matrix and DTW alignment path for two Starlight Curves 
One of the most successful applications of DTW is as the distance measure in a nearest neighbour algorithm. For a given query time series, a K-nearest neighbour DTW algorithm is used to find the best match from a database of reference time series. However, the computational cost of a nearest neighbour search that uses the DTW distance is considerable. In practice, a K-nearest neighbour DTW algorithm should always use lower bounding and early abandoning techniques to reduce the number of DTW computations required. Lower bounds are functions that are much faster to compute than the DTW distance itself but never exceed it, so they can be used to efficiently prune nearest neighbour candidates; they are typically applied in cascading order of computational complexity. The three most common lower bounds developed in the literature are those of [7,3] and [4]. It is also possible to specify early abandoning conditions whereby a DTW calculation in progress can be stopped as soon as the cumulative distance along the optimal alignment path implies that the reference sequence cannot be a nearest neighbour [7]. These techniques allow DTW to be applied to truly massive datasets. In addition to nearest neighbour applications, time series clustering can also be performed using DTW [6]. For example, Strava has demonstrated how DTW averaging techniques from the time series clustering literature can fix sequences of location data that have been misaligned by GPS sensor noise. It is also possible to generalize DTW for use with multivariate time series sequences [8].
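As an illustrative sketch (not the authors' code, and simplified to equal-length univariate series), the pruning idea can be seen in a 1-nearest-neighbour search that computes a Keogh-style envelope lower bound first and runs the full windowed DTW only when the bound cannot rule the candidate out. Here the window width w is in samples.

```python
import numpy as np

def dtw(x, y, w):
    """Windowed DTW distance (squared local costs, square root at the end)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def lb_keogh(q, c, w):
    """Envelope lower bound on dtw(q, c): build upper/lower envelopes around
    the query q and sum the candidate's excursions outside the envelope."""
    n = len(q)
    total = 0.0
    for i in range(len(c)):
        seg = q[max(0, i - w):min(n, i + w + 1)]
        lo, hi = seg.min(), seg.max()
        if c[i] > hi:
            total += (c[i] - hi) ** 2
        elif c[i] < lo:
            total += (c[i] - lo) ** 2
    return np.sqrt(total)

def nn_dtw(query, database, w):
    """1-NN search that prunes with the cheap O(n) bound before the full DTW."""
    best_idx, best_dist = -1, np.inf
    for idx, cand in enumerate(database):
        if lb_keogh(query, cand, w) < best_dist:   # cheap check first
            d = dtw(query, cand, w)                # expensive check only if needed
            if d < best_dist:
                best_idx, best_dist = idx, d
    return best_idx, best_dist
```

Since the bound never exceeds the true DTW distance, pruning on it can never discard the true nearest neighbour: any candidate whose bound already exceeds the best distance found so far cannot beat it.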
Dynamic Time Warping has been shown to perform exceptionally well across a large number of application domains, particularly when considering its relative simplicity [1]. As such, DTW has emerged as one of the most important time series data mining techniques of the last two decades. More recent trends in time series classification have moved towards ensemble models that use K-nearest neighbour DTW as a key component [see e.g. 5]. Moreover, [2] has recently formulated DTW as a classical optimal control problem, opening avenues for future research on new methods for analysing signals by extending this optimization formulation.
In conclusion, DTW is a versatile tool that has been used with great success across many scientific fields. It is likely that many more exciting time series data mining problems can be solved by applying DTW in imaginative new ways.
^{1}The starlight curve data is available at https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/
References:
[1] Dau, H. A., Silva, D. F., Petitjean, F., Forestier, G., Bagnall, A., Mueen, A., and Keogh, E. (2018). Optimizing dynamic time warping's window width for time series data mining applications. Data Mining and Knowledge Discovery, 32(4):1074–1120.
[2] Deriso, D. and Boyd, S. (2019). A general optimization framework for dynamic time warping. arXiv preprint arXiv:1905.12893.
[3] Keogh, E. and Ratanamahatana, C. A. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386.
[4] Lemire, D. (2009). Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recognition, 42(9):2169–2180.
[5] Lines, J., Taylor, S., and Bagnall, A. (2018). Time series classification with HIVE-COTE: The hierarchical vote collective of transformation-based ensembles. ACM Transactions on Knowledge Discovery from Data, 12(5).
[6] Petitjean, F., Ketterlin, A., and Gançarski, P. (2011). A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44(3):678–693.
[7] Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., and Keogh, E. (2013). Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data, 7(3):1–31.
[8] Shokoohi-Yekta, M., Hu, B., Jin, H., Wang, J., and Keogh, E. (2017). Generalizing DTW to the multi-dimensional case requires an adaptive approach. Data Mining and Knowledge Discovery, 31(1):1–31.