Yet Another Analytics & Intelligence Communication Series: 2016

2016-01-02

Extracting ego networks with Apache Spark GraphX

Suppose you want to do graph computations on a social network with tens of millions of vertices, but all your data are stored in a secure Hortonworks Hadoop cluster and are supposed to stay there: what are your options? Installing Neo4j on the cluster is a no-go: the devops team has its hands full getting the cluster and its associated services into production, and distractions cannot be allowed. The first thing that comes to mind is Apache Spark GraphX, which ships with the Hortonworks Data Platform. It offers scalable graph computing through an API modeled on Google's Pregel system. Is Spark GraphX a suitable tool for devops teams to develop and maintain graph-based information services? Let's see.