Mining from Big Graph Data

- Vertex-centric Streamlined Processing for Reducing Computation Time

Data that records interactions between objects often takes the form of a graph. Graph data is accumulating in many domains, including telecommunications, the Internet, e-commerce, social networks, and the Internet of Things. These graphs are usually very large and keep growing rapidly. Companies can benefit greatly from mining large-scale graph data in their business, but the scale also poses a great challenge to the related research.

Noah’s Ark Lab aims to develop advanced technologies that make effective use of massive graph data in different applications. Specifically, we want to build novel platforms, tools, models, and theories for processing, analyzing, and utilizing large-scale graph data. The VENUS system, which we are developing, is a graph computation platform capable of handling large-scale graphs on multiple machines, or even on a single machine. With its technique of vertex-centric streamlined processing, it can drastically reduce computation time and significantly outperform existing platforms. For example, to run the PageRank algorithm on a Twitter graph of 42M nodes and 1.4 billion edges, Spark needs 8 minutes with 50 machines (100 CPUs) and GraphChi needs 13 minutes on one machine equipped with a high-speed SSD, while VENUS completes the task in only 8 minutes on one machine with an ordinary hard disk.
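
To make the benchmark task concrete, here is a minimal in-memory sketch of PageRank written in a vertex-centric style, where each vertex recomputes its own value from the values of its in-neighbors. This is not the VENUS implementation and does not model its disk-based streamlined processing; the function name `pagerank_vertex_centric`, the edge-list input format, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank_vertex_centric(edges, num_vertices, damping=0.85, iterations=20):
    """Illustrative PageRank over (src, dst) edge pairs; vertices are 0..num_vertices-1."""
    out_degree = [0] * num_vertices
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    # Start from a uniform distribution over all vertices.
    rank = [1.0 / num_vertices] * num_vertices

    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly,
        # so the total rank stays equal to 1.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        new_rank = [0.0] * num_vertices
        for v in range(num_vertices):
            # Vertex-centric update: v gathers contributions from its in-neighbors.
            incoming = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
            new_rank[v] = (1.0 - damping) / num_vertices + damping * (
                incoming + dangling / num_vertices
            )
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: vertices 1 and 2 both link to vertex 0.
    toy_edges = [(1, 0), (2, 0)]
    print(pagerank_vertex_centric(toy_edges, num_vertices=3))
```

In this sketch the whole graph lives in memory, which is exactly what becomes infeasible at the scale of the Twitter graph above; a platform for big graphs has to decide how vertex values and edges are laid out on disk and streamed through such an update loop.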