Stream Data Mining

- Higher-speed Stream Data Processing Enabling Software-Defined Network (SDN)

Among the three V’s of big data, velocity is perhaps the most salient property of telecommunication big data. In any telecommunication network, not only is the amount of data extremely large, but the data also flows at extremely high speed. For instance, in a datacenter of a telecommunication company, the traffic going through the network may reach several terabytes per second.

To optimize communication within the network, the datacenter adopts a software-defined network (SDN) developed by Huawei, which is powered by the advanced stream data mining technologies from Noah’s Ark Lab. Currently, the stream data mining system of the SDN can process 1 million events per second on a single machine. With the information provided by this analysis, the SDN controller can manage the network more flexibly and make communication in the network more efficient.

The stream data mining platform developed by the lab is called StreamSmart. It can process data streams twice as fast as Storm and ten times faster than Spark Streaming, both of which are stream data processing systems widely used in industry. StreamSmart owes its higher speed to four techniques: automatic load balancing, distributed automatic recovery, a shared ring buffer, and lightweight distributed computation.
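To illustrate the ring buffer idea, here is a minimal Python sketch of a fixed-capacity buffer that passes events from a producer to a consumer without allocating memory per event. It is not StreamSmart's implementation, which is not described here; the class name RingBuffer and its push and pop methods are hypothetical, and the sketch is single-threaded, so it leaves out the synchronization that a buffer shared between threads or processes would need.

```python
from typing import Any, List, Optional


class RingBuffer:
    """Fixed-capacity FIFO queue backed by a preallocated list (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Optional[Any]] = [None] * capacity
        self._capacity = capacity
        self._head = 0  # total number of events consumed so far
        self._tail = 0  # total number of events produced so far

    def __len__(self) -> int:
        return self._tail - self._head

    def push(self, event: Any) -> bool:
        """Store an event; return False if the buffer is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = event
        self._tail += 1
        return True

    def pop(self) -> Optional[Any]:
        """Remove and return the oldest event, or None if the buffer is empty."""
        if self._head == self._tail:
            return None
        index = self._head % self._capacity
        event = self._slots[index]
        self._slots[index] = None  # drop the reference so the slot can be reused
        self._head += 1
        return event


if __name__ == "__main__":
    buf = RingBuffer(4)
    accepted = [buf.push(i) for i in range(5)]
    print(accepted)   # the fifth push is rejected because the buffer is full
    print(buf.pop())  # the oldest event comes out first
```

Because the slots are allocated once and reused, the cost per event stays constant, and a full buffer tells the producer to slow down instead of growing without bound.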

The lab is also building a set of stream data mining tools, referred to as StreamDM, that run on single machines, on StreamSmart, and on Spark Streaming. The single-machine (C++) version and the Spark Streaming version have been released as open source on GitHub.
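As a rough illustration of what a stream data mining tool does, the following Python sketch trains a classifier one event at a time and scores each prediction before the true label is used for learning (often called test-then-train evaluation). This is not the StreamDM API; OnlinePerceptron and prequential_accuracy are hypothetical names chosen for this example, and the perceptron merely stands in for whichever incremental learner a real tool would provide.

```python
from typing import Iterable, List, Sequence, Tuple


class OnlinePerceptron:
    """Binary classifier updated one example at a time (illustrative only)."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        score = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn(self, x: Sequence[float], label: int) -> None:
        # error is -1, 0, or +1; weights change only on a wrong prediction
        error = label - self.predict(x)
        if error != 0:
            for i, v in enumerate(x):
                self.weights[i] += self.learning_rate * error * v
            self.bias += self.learning_rate * error


def prequential_accuracy(
    model: OnlinePerceptron,
    stream: Iterable[Tuple[Sequence[float], int]],
) -> float:
    """Test on each event first, then train on it; the stream is read once."""
    correct = 0
    total = 0
    for x, label in stream:
        if model.predict(x) == label:
            correct += 1
        model.learn(x, label)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stream: label is 1 when the first feature is larger than the second.
    events = [((1, 0), 1), ((0, 1), 0)] * 50
    model = OnlinePerceptron(n_features=2)
    print(prequential_accuracy(model, events))
```

Each event is seen once and then discarded, so memory use does not grow with the length of the stream, which is the property that makes this style of learning suitable for high-velocity data.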