PhD thesis Title:
Advances in Streaming Novelty Detection
Machine learning applications have grown massively, driven by outstanding results in areas ranging from medicine, biology, and economics to engineering and physics. As a consequence, the terms rare event detection, anomaly detection, novelty detection, and outlier detection have been used interchangeably to refer to different problems. As a first contribution of this PhD dissertation, a taxonomy of terms and learning scenarios is described that takes a small step toward standardizing the field. To this end, several key papers from the literature that address the same problem have been analyzed. To further support the proposed assignment of terms to problems, experiments retrieving papers from Google Scholar, IEEE Xplore, and ACM Digital Library have been performed; they support both the existence of the mix-up between terms and problems and the proposed taxonomy.
As a second contribution, the Streaming Novelty Detection (SND) problem that gives this dissertation its name is addressed. SND consists of learning a model that classifies instances among a given set of classes. At prediction time, unlabeled instances arrive as a stream and the model must classify them, taking into account that the underlying distribution of the data might have changed (the so-called concept drift). Moreover, once in a while, some of the incoming instances do not belong to the previously learned set of classes; the model must recognize them and, when a sufficient number of such instances is available, discover the new emerging classes and update itself accordingly. To tackle this problem, a self-evolving algorithm based on a mixture of Gaussian distributions is proposed.
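The SND loop described above can be illustrated with a minimal sketch. It is not the algorithm proposed in the dissertation: it assumes one diagonal Gaussian per class, a fixed log-density threshold, and a single buffer of rejected instances that becomes one new class once it is large enough, and it does not handle concept drift. The names `DiagonalGaussian`, `StreamingNoveltySketch`, `log_threshold`, and `min_buffer` are hypothetical.

```python
import math
from collections import defaultdict


class DiagonalGaussian:
    """Axis-aligned Gaussian fitted to a list of equal-length vectors."""

    def __init__(self, points, min_var=1e-3):
        n, dim = len(points), len(points[0])
        self.mean = [sum(p[d] for p in points) / n for d in range(dim)]
        # Floor the variance so a tight cluster does not give a degenerate density.
        self.var = [
            max(sum((p[d] - self.mean[d]) ** 2 for p in points) / n, min_var)
            for d in range(dim)
        ]

    def log_pdf(self, x):
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


class StreamingNoveltySketch:
    """Assumed simplification: one Gaussian per class, fixed rejection threshold."""

    def __init__(self, labelled, log_threshold=-10.0, min_buffer=5):
        by_class = defaultdict(list)
        for x, y in labelled:
            by_class[y].append(x)
        self.components = {y: DiagonalGaussian(pts) for y, pts in by_class.items()}
        self.log_threshold = log_threshold
        self.min_buffer = min_buffer
        self.buffer = []   # instances rejected by every known class
        self.n_novel = 0

    def predict(self, x):
        label, score = max(
            ((y, g.log_pdf(x)) for y, g in self.components.items()),
            key=lambda pair: pair[1],
        )
        if score >= self.log_threshold:
            return label
        # No known class explains x well: keep it for later class discovery.
        self.buffer.append(x)
        if len(self.buffer) >= self.min_buffer:
            self.n_novel += 1
            self.components[f"novel_{self.n_novel}"] = DiagonalGaussian(self.buffer)
            self.buffer = []
        return "unknown"
```

In this sketch, an instance far from every known class is reported as `"unknown"` until `min_buffer` such instances have accumulated; later instances from the same region are then assigned to the newly created class. A realistic method would also have to cluster the buffer, since rejected instances need not all belong to one class.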
The last contribution of this dissertation also deals with the SND problem but, in this case, the instances are time series. To tackle this problem, deep autoencoders compress the instances into a deep feature space (embedding), and Support Vector Data Description networks then enclose the embedded instances in minimum-volume hyperspheres. This work also provides a solution that allows an expert to evaluate the stream in hindsight.
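The hypersphere test applied in the embedding space can be sketched as follows. This is a simplified stand-in, not the Support Vector Data Description training used in the dissertation: it assumes the embeddings have already been produced by some encoder, fixes the center at their mean, and sets the radius to a quantile of the training distances instead of learning a minimum-volume sphere. The function names and the `quantile` parameter are hypothetical.

```python
import math


def fit_hypersphere(embeddings, quantile=0.95):
    """Return (center, radius) of a sphere covering a quantile of the embeddings."""
    n, dim = len(embeddings), len(embeddings[0])
    center = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    dists = sorted(math.dist(e, center) for e in embeddings)
    # Index of the smallest distance that covers at least `quantile` of the points.
    idx = min(n - 1, max(0, math.ceil(quantile * n) - 1))
    return center, dists[idx]


def is_novel(embedding, center, radius):
    """An embedded instance outside the sphere is flagged as a novelty candidate."""
    return math.dist(embedding, center) > radius
```

With one such sphere per known class, an embedded time series that falls outside all spheres would be treated as a candidate for a new class.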