Real-time High Performance Anomaly Detection over Data Streams
Real-time analytics over data streams is crucial for a wide range of use cases in industry and research. Today’s sensor systems can produce high-throughput data streams that have to be analyzed in real time. One important analytic task is the detection of anomalies, or outliers, in streaming data. In many industrial applications, sensing devices produce a data stream that can be monitored to verify that equipment is operating correctly and, by triggering reactions in real time, to avoid damage. While anomaly detection is a well-studied topic in data mining, real-time, high-performance anomaly detection over big data streams requires dedicated study and well-optimized implementations. This paper presents our implementation of a real-time anomaly detection system over data streams. We outline our two separate implementations, written in Java and C++, and provide technical details about their data processing pipelines. We report experimental results and describe performance tuning strategies.