Timesmash: Process-Aware Fast Time Series Clustering and Classification
EasyChair Preprint 7029
6 pages • Date: November 10, 2021

Abstract
We introduce Timesmash: a comprehensive suite of clustering and classification algorithms for stochastic time series analysis, implemented as an eponymous Python package. We leverage a subclass of hidden Markov models (HMMs) called Probabilistic Finite-State Automata (PFSA). These are first used to model, in an unsupervised setting, the generative processes underlying the observed data streams; the inferred models then enable automatic physics- or process-aware featurization for subsequent clustering and classification. The suite consists of the following tools: a) LikelihoodDistance, which estimates, in an unsupervised setting, the divergence between ergodic, stationary, finite-valued stochastic processes from observations of finite sample paths of possibly unequal length; b) the featurization algorithms SymbolicDerivative, InferredHMMLikelihood, and ClusteredHMMClassifier, which aim to recover the hidden generators underlying the presented sample paths; the recovered generators may then be used to automatically distill effective features for classification. Our core algorithms require the data streams to take values in a finite alphabet. To extend applicability to continuous-valued time series, our implementation includes Quantizer, a data-driven quantization tool that discretizes continuous sequences without assuming domain knowledge. We evaluate the Timesmash algorithms on problems from the UCR Time Series Classification Archive and show that they perform at par with or better than the state-of-the-art Dynamic Time Warping (DTW) algorithm. In addition, we include brief examples where our unsupervised physical modeling yields insights not easily obtainable with the current state of the art.
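To make the pipeline described above more concrete (quantize continuous data into a finite alphabet, model each symbol stream's generator, then compare streams by likelihood), here is a minimal sketch under several simplifying assumptions. It is not the timesmash package API, and all function names are hypothetical. Quantile binning on pooled data stands in for Quantizer, a fixed-order Markov chain with add-one smoothing stands in for PFSA inference, and a symmetrized cross-likelihood rate is used as one plausible divergence estimate; the paper's LikelihoodDistance may be defined differently. Per-symbol log-likelihood rates are used so that sample paths of unequal length can be compared.

```python
import numpy as np
from collections import defaultdict

def fit_quantile_cuts(series_list, n_symbols=2):
    """Hypothetical stand-in for a data-driven quantizer: cut points at the empirical
    quantiles of the pooled data, so each symbol is roughly equally frequent."""
    pooled = np.concatenate([np.asarray(s, dtype=float) for s in series_list])
    return np.quantile(pooled, np.linspace(0.0, 1.0, n_symbols + 1)[1:-1])

def quantize(x, cuts):
    """Map each real value to a symbol in 0..len(cuts) by locating it among the cut points."""
    return np.searchsorted(cuts, np.asarray(x, dtype=float), side="right")

def fit_markov(symbols, n_symbols, order=1, alpha=1.0):
    """Fit an order-k Markov chain (a simplified stand-in for a PFSA) with add-alpha
    smoothing. Returns {context tuple: next-symbol probability vector}."""
    counts = defaultdict(lambda: np.full(n_symbols, alpha))
    for t in range(order, len(symbols)):
        counts[tuple(symbols[t - order:t])][symbols[t]] += 1
    return {ctx: c / c.sum() for ctx, c in counts.items()}

def log_likelihood_rate(model, symbols, n_symbols, order=1):
    """Average per-symbol log-likelihood of a sequence; unseen contexts fall back to uniform."""
    uniform = np.full(n_symbols, 1.0 / n_symbols)
    logs = [np.log(model.get(tuple(symbols[t - order:t]), uniform)[symbols[t]])
            for t in range(order, len(symbols))]
    return float(np.mean(logs))

def likelihood_distance(a, b, n_symbols, order=1):
    """Symmetrized cross-likelihood rate: how much worse each stream is explained by
    the other stream's model than by its own (hypothetical, not the library's definition)."""
    ma = fit_markov(a, n_symbols, order)
    mb = fit_markov(b, n_symbols, order)
    gap_a = log_likelihood_rate(ma, a, n_symbols, order) - log_likelihood_rate(mb, a, n_symbols, order)
    gap_b = log_likelihood_rate(mb, b, n_symbols, order) - log_likelihood_rate(ma, b, n_symbols, order)
    return gap_a + gap_b

def ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) process x[t] = phi * x[t-1] + Gaussian noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
# Two persistent processes and one oscillating process, with unequal lengths.
xa, xb, xc = ar1(0.9, 2000, rng), ar1(0.9, 1500, rng), ar1(-0.9, 1800, rng)
cuts = fit_quantile_cuts([xa, xb, xc], n_symbols=2)
a, b, c = (quantize(x, cuts) for x in (xa, xb, xc))
d_ab = likelihood_distance(a, b, n_symbols=2)
d_ac = likelihood_distance(a, c, n_symbols=2)
print(f"same process: {d_ab:.3f}, different process: {d_ac:.3f}")
```

In this toy setting, the two persistent AR(1) streams should come out much closer to each other than to the oscillating one, even though all three have different lengths. A real PFSA-based approach would infer the model structure from data rather than fixing the context length in advance.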
Keyphrases: Physics-aware learning, Probabilistic Finite State Automata, stochastic processes, time series classification, time series clustering