-->

2018-08-12

Connecting to TinkerPop's gremlin server from R Studio with a java wrapper

Apache TinkerPop supports a number of gremlin language variants that provide native language support for connecting to gremlin server. The variants include java, python, .NET and javascript, but not R. So, when you make graph data available in a TinkerPop-compatible graph database, you are likely to receive user requests to access these data from R Studio. Yet, a Google search on "gremlin R" only turns up a suggestion to use rJava to connect to gremlin server and a cumbersome way to read some graph data into a TinkerGraph and then query it. Therefore, this post documents how to connect to gremlin server from R Studio using rJava.

Building the java wrapper

After experimenting for a few hours with the minimally documented rJava package in R Studio, I decided this was not the way I wanted to duct-tape R Studio to gremlin server. Instead, I wrote a simple java class that does all the interfacing with gremlin server and is easy to access from rJava. The source code is available from https://gist.github.com/vtslab/676419e6f205672aa935bb3dbfe2d1d8.

Assuming you have Apache Maven installed on your system, you can build the jar by placing the RGremlin.java file in the src/main/java directory relative to the rgremlin_pom.xml and issuing:

$ mvn install -f rgremlin_pom.xml

This gives you a target directory with the rgremlin-0.0.1.jar and a target/lib folder with dependencies of the RGremlin java class.

In this particular example, the JanusGraphIoRegistry is added to the connection. Of course, you can easily replace it with an IoRegistry for other TinkerPop implementations.

Running a gremlin query

With the java wrapper, connecting to gremlin server and submitting a gremlin query string is now as simple as:

# install.packages("rJava")
# install.packages("rjson")
library(rJava)
library(rjson)

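# Start the JVM with the gremlin.log4j.level system property set to INFO
params <- c("-Dgremlin.log4j.level=INFO")
.jinit(parameters=params)
# Add every file in the JanusGraph lib directory to the rJava classpath
.jaddClassPath(
     dir("/some/path/lib/janusgraph-0.2.1-hadoop2/lib",  full.names=TRUE))
.jclassPath()

# Create the wrapper object and submit a gremlin query string
client <- .jnew("rgremlin.RGremlin", "localhost")
jsonStr <- .jcall(
     client, "Ljava/lang/String;",
     "submit", "g.V().id()")

# Parse the json string returned by the wrapper into R objects
result <- fromJSON(jsonStr)
print(result)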

Note that query results are serialized as json between the java wrapper and R. Since rjson only recognizes standard types and collections, the java wrapper is not suitable for queries that return TinkerPop objects such as vertices, edges and TinkerGraphs. The example above therefore returns only vertex ids; to retrieve vertex properties, return them as maps using the gremlin valueMap() step.

In the example above, the rgremlin-0.0.1.jar was simply copied to the JanusGraph lib directory, which in turn was added to the rJava classpath. As an alternative, you can add the target and target/lib directories from the previous section to the rJava classpath.
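
To make the restriction to standard types a bit more concrete, here is a minimal sketch of how a wrapper could turn query results into a json string using only the java standard library. It is not taken from the RGremlin gist, which may well use a json library instead. The class name JsonSketch and its methods are hypothetical, and the sample data only mimics the map-of-lists shape that valueMap() returns.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the RGremlin gist.
final class JsonSketch {

    // Serializes null, strings, numbers, booleans, maps and collections.
    // Any other object (think of a vertex or an edge) is rejected, because
    // a plain json reader on the R side would not know what to do with it.
    // Non-finite doubles (NaN, Infinity) are not handled in this sketch.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return quote((String) value);
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                // json object keys must be strings
                sb.append(sep).append(quote(String.valueOf(e.getKey())))
                  .append(":").append(toJson(e.getValue()));
                sep = ",";
            }
            return sb.append("}").toString();
        }
        if (value instanceof Collection) {
            StringBuilder sb = new StringBuilder("[");
            String sep = "";
            for (Object item : (Collection<?>) value) {
                sb.append(sep).append(toJson(item));
                sep = ",";
            }
            return sb.append("]").toString();
        }
        throw new IllegalArgumentException(
            "Not a standard type: " + value.getClass().getName());
    }

    // Wraps a string in double quotes and escapes characters json forbids.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append("\"").toString();
    }

    public static void main(String[] args) {
        // Made-up data shaped like one entry of a valueMap() result
        Map<String, Object> vertex = new LinkedHashMap<>();
        vertex.put("name", Arrays.asList("marko"));
        vertex.put("age", Arrays.asList(29));
        System.out.println(toJson(Arrays.asList(vertex)));
        // prints [{"name":["marko"],"age":[29]}]
    }
}
```

A string like the one printed here is what rjson's fromJSON() can turn into nested R lists, whereas passing a vertex object to such a serializer would end in the exception.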

Closing remark


The solution presented is only a quick-and-dirty way to get your users going and to gather additional user requirements. Ideally, R would become available as a gremlin language variant, giving complete TinkerPop support and the possibility to write gremlin queries natively in R instead of submitting query strings.
