Volume 16, No 1, 2019

A Semantic Approach for Outlier Detection in Big Data Streams


Hussien Ahmad and Salah Dowaji

Abstract

In recent years, the world has witnessed a revolution in data generation and collection technologies. The volume, velocity and veracity of data have changed drastically, leading to new types of challenges in data analysis, modeling and prediction. One of the key challenges is the semantic analysis of textual data, especially in big data stream settings. Existing solutions focus on either topic analysis or sentiment analysis. Moreover, semantic outlier detection over data streams, one of the key problems in the data mining and data analysis fields, has received less attention. In this paper, we introduce a new concept of semantic outlier in which the topic of the textual data is considered the primary content of the data stream, while the sentiment is considered the context in which the data has been generated and affected. We also propose a framework for semantic outlier detection in big data streams that incorporates contextual detection concepts. The advantage of the proposed concept is that it incorporates both topic and sentiment analysis into a single process, while the framework enables the implementation of different algorithms and approaches for semantic analysis.
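As an illustration only, the idea of treating topic as content and sentiment as context can be sketched as follows. This is not the framework proposed in the paper: the class name, the `min_share` and `warmup` parameters, and the frequency-based rule are all assumptions made for this example, and the topic and sentiment labels are assumed to come from upstream analyzers that are not shown.

```python
from collections import Counter, defaultdict


class ContextualTopicOutlierDetector:
    """Hypothetical sketch: flag an item whose topic is rare within its sentiment context."""

    def __init__(self, min_share=0.1, warmup=5):
        self.min_share = min_share  # assumed threshold on a topic's share within a context
        self.warmup = warmup        # items a context must have seen before anything is flagged
        self.counts = defaultdict(Counter)  # sentiment (context) -> topic (content) counts

    def observe(self, topic, sentiment):
        """Process one stream item; return True if it is flagged as a semantic outlier."""
        context = self.counts[sentiment]
        seen = sum(context.values())
        # Judge the item against what this context has seen so far, then update the counts.
        is_outlier = seen >= self.warmup and context[topic] / seen < self.min_share
        context[topic] += 1
        return is_outlier


# Toy stream of (topic, sentiment) pairs with made-up labels.
stream = [("sports", "positive")] * 6 + [("finance", "positive"), ("finance", "negative")]
detector = ContextualTopicOutlierDetector()
flags = [detector.observe(topic, sentiment) for topic, sentiment in stream]
print(flags)
```

In this toy stream only the "finance" item arriving in the "positive" context is flagged, because that context has so far contained only "sports" items. The same topic in the unseen "negative" context is not flagged, since that context is still in its warm-up phase.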


Pages: 184-195

DOI: 10.14704/WEB/V16I1/a186

Keywords: Outlier detection; Big data; Big data stream; Distributed data streams; Graph data streams; Content-based outlier; Context-aware outlier
