Digging into the JanusGraph-HBase stack
The figure shows the steps JanusGraph-HBase takes to retrieve a vertex. JanusGraph instances maintain an in-memory cache and first try to serve vertex requests from it. To hide uncommitted changes from other transactions, mutated vertices and edges are cached per transaction, while unchanged vertices are stored in a so-called database cache. Retrieving a vertex with its properties from the JanusGraph cache with a gremlin query typically takes 0.5 ms.
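For reference, the caches mentioned above are controlled from the JanusGraph properties file; the settings below are only illustrative values, not tuned recommendations:

# enable the database cache shared by all transactions of this JanusGraph instance
cache.db-cache = true
# expiration time (in ms) for entries in the database cache
cache.db-cache-time = 180000
# fraction of the heap reserved for the database cache
cache.db-cache-size = 0.25
# maximum number of recently used vertices held in the transaction cache
cache.tx-cache-size = 20000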
When a vertex request does not hit the JanusGraph cache, a connection is set up to the HBase regionserver that serves the vertex id, or rather its row key. For a large graph, separate connections are established to all regionservers involved. Connections time out after a while and have to be re-established. Connection setup typically takes 15 ms on a kerberized cluster.
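For completeness, the HBase connection itself is configured in the same properties file; the hostnames and table name below are just placeholders:

storage.backend = hbase
# ZooKeeper quorum through which the HBase regionservers are located
storage.hostname = zk1.example.com,zk2.example.com,zk3.example.com
# HBase table that backs the graph
storage.hbase.table = janusgraph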
HBase regionservers maintain a so-called blockcache, consisting of small blocks of data, typically 64 kB each. This way, JanusGraph can keep a very large graph in the distributed memory of an HBase cluster. However, the improved scalability comes at a price: response times for single-vertex gremlin queries now amount to about 7 ms.
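The 64 kB mentioned is HBase's default block size; as a hedged example, it can be inspected and changed per column family in the hbase shell (table and column family names are assumptions about your own installation):

> describe 'janusgraph'
> alter 'janusgraph', {NAME => 'e', BLOCKSIZE => '65536'}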
The HBase blockcache will not be able to serve all vertex requests (unless you configure the JanusGraph table to be memory-resident): the cache needs to be warmed, and blocks can be evicted because other applications use the blockcache too. In that case, the HBase regionserver needs to retrieve the row from its storefile. Typically, regionservers have their storefiles on Hadoop HDFS with a local replica. However, after a while local replicas of HDFS blocks may have been lost and the major compactions that would restore them may not have been run, so that retrieving a row requires a remote HDFS block access. Response times for single-vertex gremlin queries served by HBase from local and remote storage typically amount to 30 ms and 100 ms, respectively.
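One way to approach the memory-resident option mentioned above is to set the IN_MEMORY attribute on the column families of the JanusGraph table in the hbase shell; again, table and column family names are placeholders:

> alter 'janusgraph', {NAME => 'e', IN_MEMORY => 'true'}
> alter 'janusgraph', {NAME => 'g', IN_MEMORY => 'true'}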
Measuring response times
When measuring response times of single-vertex gremlin queries on JanusGraph-HBase, it is useful to set the logging level of the JanusGraph client to DEBUG (a minimal logging sketch follows this list), because the debug messages show:
when a JanusGraph transaction cache is instantiated
when connections with HBase are established
when rows are retrieved from HBase.
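With a log4j-based client such as the Gremlin console shipped with JanusGraph, raising the relevant loggers to DEBUG can be done in conf/log4j-console.properties; the selection of loggers below is just a starting point:

log4j.logger.org.janusgraph.graphdb.transaction=DEBUG
log4j.logger.org.janusgraph.diskstorage.hbase=DEBUG
log4j.logger.org.apache.hadoop.hbase.client=DEBUG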
In addition, it is useful to enable tracing of class loading with export JAVA_OPTIONS='-XX:+TraceClassLoading', because the first query of a repeated set of queries can suffer a delay from class loading.
Controlling the way in which vertex retrievals are served goes as follows (a Gremlin console sketch that puts these steps together follows the figures below):
- From the JanusGraph transaction cache
Set cache.db-cache=false in the JanusGraph properties file and simply repeat a single-vertex gremlin query, like: g.V(12345L).valueMap().toList()
- From the JanusGraph database cache
Set cache.db-cache=true in the JanusGraph properties file and issue a g.tx().close() before repeating the query (setting storage.transactions=false or cache.tx-cache-size = 0 and cache.tx-dirty-size = 0 does not work)
- Set up HBase regionserver connection
Select the relevant debug log messages during the first round of queries.
- From the HBase blockcache
Set cache.db-cache=false in the JanusGraph properties file and issue a g.tx().close() before repeating the query. See results in the figure below.
Figure: Response times of queries served by the HBase blockcache
- From the HBase storefile
Be sure to do these measurements from a cleared blockcache. Ultimately, this requires restarting the entire HBase cluster for HBase versions below 1.4/2.0, but this may not be practical so check the workaround in the next section. See results in the figure below.
Figure: Response times of queries for a vertex not in the HBase blockcache
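Put together, a measurement session in the Gremlin console shipped with JanusGraph looks roughly like the sketch below; the properties file and vertex id are just examples and the timing is a simple wall-clock difference:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase.properties')
gremlin> g = graph.traversal()
gremlin> t0 = System.currentTimeMillis(); g.V(12345L).valueMap().toList(); System.currentTimeMillis() - t0
gremlin> g.tx().close()    // drop the transaction cache before the next measurement
gremlin> t0 = System.currentTimeMillis(); g.V(12345L).valueMap().toList(); System.currentTimeMillis() - t0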
Tweaking HBase major compactions
A warning first: I am not an HBase admin, nor have I studied any "definitive guide" on HBase from cover to cover. But I am not afraid to try things and I have a problem with the impossible. After some haphazard reading I figured I could get a JanusGraph-HBase table with an empty blockcache by taking a snapshot of my table and creating a clone from the snapshot. However, this clone is initially still based on the storefiles of the original table, and the blockcache will serve requests for both the original and the cloned table from the same blockcache items.
The next trial was to run a major compaction on the cloned table. This should rewrite the storefiles if they can be compacted, so that the blockcache no longer recognizes rows from the original and cloned tables as identical. However, this move did not do anything to my response time measurements. Apparently, the major compaction process decided that my storefiles were fine already and challenged me to take more drastic measures. This brought me to lower the MAX_FILESIZE attribute of the cloned table. The complete procedure in the hbase shell looks as follows:
> snapshot 'some_table', 'some_table_shot'
> clone_snapshot 'some_table_shot', 'cloned_table'
> alter 'cloned_table', MAX_FILESIZE => '536870912'
> major_compact 'cloned_table'
Now, I finally got the response time measurements presented earlier. While writing this, I realize that these figures still look as if the storefile measurements are somewhat contaminated by blockcache hits, although they do not conflict with figures mentioned elsewhere.
Unexpectedly, the major compaction with the new, smaller MAX_FILESIZE gave me a bonus. Apparently, this forced rewrite also restored data locality at the regionservers, as it should have done in the first place. Measurements before the compaction looked as in the figure below.
Figure: Response times for rows not in the HBase blockcache, before major compaction
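A rough way to check where the storefile blocks of a table actually live is hdfs fsck; the path below assumes the default hbase.rootdir layout and the default HBase namespace:

$ hdfs fsck /hbase/data/default/cloned_table -files -blocks -locations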
Conclusions
Understanding response times for single-vertex queries in JanusGraph-HBase requires careful scrutiny. Major compactions on HBase tables seem insufficiently documented and can be tweaked with some perseverance. Of course, lowering MAX_FILESIZE on a regular basis is not a winning procedure for production tables (although you could precede it with a number of forced region merges), but it is practical for the baseline measurements presented in this blog.