Graph Database Java OGM Comparison

I have been using graph databases for a while (mostly Neo4j), and I thought it might be a good idea to write down some of the things I noticed while using Neo4j in combination with Spring Data Neo4j (SDN).

I also compiled a comparison of various graph databases a while ago. My list only includes Java-based graph databases that allow embedding. I won’t go into detail about the feature sets and how they differ between the databases. Instead, I just point out the most important aspects I noticed while using the OGMs and graph databases.

Graph Databases

Neo4j

Neo4j was the first graph database I ever used. It is open source and very fast. It also ships with a neat little admin UI which can be used to visualize your graph data. The database is very easy to embed and comes with a powerful query language (Cypher). I don’t know whether there are any other dedicated OGM/ORM layers for Neo4j besides SDN and Blueprints-based OGMs.

The licensing, on the other hand, becomes a problem once you decide to embed Neo4j in your application. Neo4j Community Edition is licensed under the GPL, just like MySQL. As with MySQL, this means that as long as you do not embed the database and only talk to it through the provided Neo4j REST API, you do not need to license your application under the GPL. Once you embed the database in your application, you must license your application under the GPL. This gets even worse when you decide to use the clustering features: in that case you would need to license your application under the AGPL, even if you only use Neo4j through the REST API.

The High Availability mode (Master/Slave Replication) can also be used when embedding the database. I wrote a dummy project a while ago that contains a working example.

OrientDB

OrientDB is also open source. It does not support Cypher, but you can use Orient SQL instead. Embedding is also very easy, and the Apache 2 license is very developer friendly. Tinkerpop support is very good.

The Blueprints API is the native API of OrientDB. This means no additional Blueprints implementation is needed when using a Blueprints-based OGM.

Sparsity Sparksee

I have never used "Sparsity Sparksee":http://www.sparsity-technologies.com/, but feature-wise it is comparable to the other big graph databases.

Titan DB

Titan DB is an interesting database. The storage layer for this graph database is interchangeable. You can use Berkeley DB, which is quite fast, but it basically limits the size of nodes you can store and you can’t use clustering. Alternatively, you can use Cassandra, which is slower than Berkeley DB but supports replication.

Hypergraphdb

I have never used this database and can’t say much about it but my impression is that it is very small and the feature set is limited.

Performance comparison

The performance comparison is very superficial, and you should keep in mind that the use case for the database should always dictate the choice. I only compared low-level read and write speed because that is what I was interested in. The benchmark does not cover any kind of graph traversal. I was merely interested in the time it takes each database to return a single node.

All tests were executed within the same JVM that also ran the graph database. I created 10k nodes and then read those 10k nodes sequentially and in random order. No warmup phase was added. I chose Neo4j as the baseline because it had the best overall performance (a lower percentage is better).

DB                     write 10k   read 10k seq   read 10k random
Neo4j                  100%        100%           100%
OrientDB               171%        101%           104%
Titan DB (Cassandra)   314%        502%           510%
Titan DB (BerkeleyDB)  -           200%           205%
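
The measurement procedure described above can be sketched roughly as follows. This is a minimal, hypothetical harness and not the code used for the numbers in the table: a plain HashMap stands in for the embedded graph database, and the names NodeBenchmarkSketch, timeNanos, relativePercent and randomOrder are made up for illustration. It only shows how write, sequential read and random read timings could be taken without a warmup phase and then normalized against a baseline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical micro-benchmark harness; a HashMap stands in for the embedded graph database. */
public class NodeBenchmarkSketch {

    static final int NODE_COUNT = 10_000;

    /** Runs the task once (no warmup) and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Expresses a measurement relative to the baseline; 100 means "same time as the baseline". */
    static double relativePercent(long measuredNanos, long baselineNanos) {
        return 100.0 * measuredNanos / baselineNanos;
    }

    /** Returns the ids 0..count-1 in shuffled order; the fixed seed keeps runs reproducible. */
    static List<Integer> randomOrder(int count, long seed) {
        List<Integer> ids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, new Random(seed));
        return ids;
    }

    public static void main(String[] args) {
        // Assumption: this map replaces the real database calls (create node / get node by id).
        Map<Integer, String> store = new HashMap<>();

        long write = timeNanos(() -> {
            for (int i = 0; i < NODE_COUNT; i++) {
                store.put(i, "node-" + i);
            }
        });
        long seqRead = timeNanos(() -> {
            for (int i = 0; i < NODE_COUNT; i++) {
                store.get(i);
            }
        });
        List<Integer> order = randomOrder(NODE_COUNT, 42L);
        long randomRead = timeNanos(() -> {
            for (int id : order) {
                store.get(id);
            }
        });

        System.out.printf("write=%dns seqRead=%dns randomRead=%dns%n", write, seqRead, randomRead);
    }
}
```

To compare real databases, one would replace the HashMap calls with the node create and lookup calls of each database and pass each timing, together with the Neo4j timing as baseline, to relativePercent.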

OGM - Object Graph Mapping

I created a GitHub project that contains examples for all mentioned OGMs and graph databases.

Spring Data Neo4j

I have used SDN a lot and I’m quite impressed by it. Getting started is quite easy and there are a lot of examples out there.

SDN uses annotations to map the entities and relationships. Inheritance of objects is directly mapped to the labels of a node. It is possible to create Spring Data Repositories that retrieve objects by property values or by custom Cypher statements.

What I like is the paging support for Cypher queries. What I do not like is the number of classes and interfaces you need to create to work with your objects, but I guess this is always application specific.

When mentioning SDN it is important to note the differences between the versions.

3.3.x

Example use case:

  • User.java - Defines the entity

  • UserRepository.java - Defines the SDN user repository

  • UserRepositoryImpl.java - Defines an SDN repository implementation that may contain custom repository method implementations

  • UserActions.java - Interface that contains the methods (is extended by UserRepository and implemented by UserRepositoryImpl)

  • UserService - Defines methods that the implementation may use to manipulate user objects.

  • UserServiceImpl - Implements the defined methods.

Another point that caused a lot of trouble for me was the @Fetch annotation. Adding @Fetch to the getGroups() method would load the full entities (groups). This could lead to infinite recursion or huge loading times. In the end I removed nearly all @Fetch annotations from my projects.

Instead I used the Neo4jTemplate class to populate the returned entity. When no @Fetch is specified, only a shallow object with no properties is returned. This shallow object can then be loaded using neo4jTemplate.fetch().
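
The idea of a shallow object that is only populated on an explicit fetch can be illustrated with a small plain-Java model. This is not SDN code: Group, GroupStore, shallow() and fetch() are hypothetical stand-ins that only mimic the behavior described above, and a HashMap plays the role of the database.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of "shallow object + explicit fetch"; all names are hypothetical. */
public class ShallowFetchSketch {

    /** A group that starts out shallow: only the id is known until it is fetched. */
    static class Group {
        final long id;
        String name;    // stays null while the object is shallow
        boolean loaded;

        Group(long id) {
            this.id = id;
        }
    }

    /** Stand-in for the database together with the explicit fetch step. */
    static class GroupStore {
        private final Map<Long, String> namesById = new HashMap<>();
        int fetchCount; // counts how often properties were actually loaded

        void save(long id, String name) {
            namesById.put(id, name);
        }

        /** Returns an object that carries the id but none of the properties. */
        Group shallow(long id) {
            return new Group(id);
        }

        /** Populates the properties on demand; an already loaded object is left untouched. */
        Group fetch(Group group) {
            if (!group.loaded) {
                group.name = namesById.get(group.id);
                group.loaded = true;
                fetchCount++;
            }
            return group;
        }
    }

    public static void main(String[] args) {
        GroupStore store = new GroupStore();
        store.save(1L, "admins");

        Group group = store.shallow(1L);
        System.out.println("before fetch: " + group.name);

        store.fetch(group);
        System.out.println("after fetch: " + group.name);
    }
}
```

Because nothing is loaded until fetch() is called, the caller decides how deep to go, which avoids the unbounded eager loading that @Fetch could trigger.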

Additionally SDN 3.x was/is slow as hell when using a remote Neo4j instead of the embedded one. This is another reason why SDN 4.x was developed.

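import java.util.HashSet;
import java.util.Set;

import org.neo4j.graphdb.Direction;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;
import org.springframework.data.neo4j.annotation.RelatedTo;

// SDN 3.x entity; the AbstractPersistable base class is not shown here.
@NodeEntity
public class User extends AbstractPersistable {

	private String lastname;

	private String firstname;

	// Usernames must be unique.
	@Indexed(unique = true)
	private String username;

	// Outgoing MEMBER_OF relationships to the groups of this user.
	@RelatedTo(type = "MEMBER_OF", direction = Direction.OUTGOING, elementClass = Group.class)
	private Set<Group> groups = new HashSet<>();
}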

4.x

SDN 4.x is still in development and currently does not use the Neo4j Core API directly. Instead it relies on the Neo4j REST API. The overall performance with remote Neo4j servers is better than that of SDN 3.3 in remote mode. I can only guess why Neo4j/Pivotal Software chose this approach, but my guess is that they started a rewrite of SDN in preparation for the binary protocol support in Neo4j and to speed up SDN when using a remote Neo4j.

Tinkerpop

Tinkerpop is a collection of APIs that allow transparent and easy interfacing with graph databases. The Blueprints API is the lowest-level API; it wraps the native API of each graph database. By doing so it provides a standardized API that the other Tinkerpop APIs can use to talk to a graph database. This API layer is very thin. There are Blueprints implementations for many graph databases. I have used the Blueprints Neo4j implementation.

Tinkerpop Blueprints is generally a good choice when you want to start developing your application but are not yet sure which graph database you will end up using.
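
The benefit of coding against a thin, vendor-neutral layer can be shown with a small plain-Java sketch. It deliberately does not use the real Blueprints classes: GraphStore, MapBackedStore and ListBackedStore are hypothetical names, and the two in-memory implementations merely stand in for two different graph databases wrapped behind the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a thin abstraction layer over interchangeable backends; all names are hypothetical. */
public class GraphAdapterSketch {

    /** Vendor-neutral API, playing the role that Blueprints plays for real graph databases. */
    interface GraphStore {
        long addVertex(String name);

        String getVertexName(long id);
    }

    /** Stand-in for one database whose native API is wrapped behind the common interface. */
    static class MapBackedStore implements GraphStore {
        private final Map<Long, String> vertices = new HashMap<>();
        private long nextId = 0;

        @Override
        public long addVertex(String name) {
            long id = nextId++;
            vertices.put(id, name);
            return id;
        }

        @Override
        public String getVertexName(long id) {
            return vertices.get(id);
        }
    }

    /** Stand-in for a second database with a different internal representation. */
    static class ListBackedStore implements GraphStore {
        private final List<String> vertices = new ArrayList<>();

        @Override
        public long addVertex(String name) {
            vertices.add(name);
            return vertices.size() - 1;
        }

        @Override
        public String getVertexName(long id) {
            return vertices.get((int) id);
        }
    }

    /** Application code depends only on the interface, so the backend can be swapped later. */
    static String createAndReadUser(GraphStore store, String username) {
        long id = store.addVertex(username);
        return store.getVertexName(id);
    }

    public static void main(String[] args) {
        System.out.println(createAndReadUser(new MapBackedStore(), "alice"));
        System.out.println(createAndReadUser(new ListBackedStore(), "alice"));
    }
}
```

Only the line that constructs the store changes when the backend is replaced; createAndReadUser stays untouched.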

I have looked at three OGMs that are based on the Blueprints API.

The Ferma Benchmark contains measurements for Frames, Totorom and Ferma.

Tinkerpop 2 - Frames

The Frames API uses annotations similar to SDN, so switching from SDN to Frames is not that hard. Indices have to be created separately. Tinkerpop does not support Cypher; you would need to write your statements in Gremlin instead. The project does not seem very active, and Frames will not be part of Tinkerpop 3.

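import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.frames.Adjacency;
import com.tinkerpop.frames.Property;

public interface User extends AbstractPersistable {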

        @Property("firstname")
        public String getFirstname();

        @Property("firstname")
        public void setFirstname(String name);

        @Property("lastname")
        public String getLastname();

        @Property("lastname")
        public void setLastname(String name);

        @Property("username")
        public String getUsername();

        @Property("username")
        public void setUsername(String name);

        @Adjacency(label = "HAS_USER", direction = Direction.OUT)
        public Iterable<Group> getGroups();
}

Tinkerpop - Totorom

I guess Totorom could be seen as a successor to Frames. It is faster than Frames and it natively interfaces with the Tinkerpop Gremlin query API. The whole OGM is also very small. Many (all?) annotations are gone. Instead of interfaces you write classes, which makes things a lot easier compared to Frames. In Frames, custom method handlers need a special annotation (@JavaHandler) and a dedicated handler implementation for the method. With Totorom you just add your custom method. Unfortunately the project itself is not very active (as of 06/2015).

import java.util.List;

public class User extends AbstractPersistable {

        public static final String FIRSTNAME_KEY = "firstname";

        public static final String LASTNAME_KEY = "lastname";

        public static final String USERNAME_KEY = "username";

        public String getFirstname() {
                return getProperty(FIRSTNAME_KEY);
        }

        public void setFirstname(String name) {
                setProperty(FIRSTNAME_KEY, name);
        }

        public String getLastname() {
                return getProperty(LASTNAME_KEY);
        }

        public void setLastname(String name) {
                setProperty(LASTNAME_KEY, name);
        }

        public String getUsername() {
                return getProperty(USERNAME_KEY);
        }

        public void setUsername(String name) {
                setProperty(USERNAME_KEY, name);
        }

        public List<Group> getGroups() {
                return out("HAS_USER").toList(Group.class);
        }

}

Ferma

"Example projects":https://github.com/Jotschi/graph-ogm-examples/tree/master/ferma

The API of "Ferma":https://github.com/Syncleus/Ferma is very similar to Totorom. Ferma has various operation modes. It also supports the Frames annotations.

What I found useful:

  • Ferma exposes the raw graph API.

  • The project is active as of 06/2015

  • The documentation is quite good.

  • The API contains useful methods that can improve performance or prevent boilerplate code (e.g. toListExplicit, nextExplicit, nextOrDefault)

import java.util.List;

public class User extends AbstractPersistable {

        public static final String FIRSTNAME_KEY = "firstname";

        public static final String LASTNAME_KEY = "lastname";

        public static final String USERNAME_KEY = "username";

        public String getFirstname() {
                return getProperty(FIRSTNAME_KEY);
        }

        public void setFirstname(String name) {
                setProperty(FIRSTNAME_KEY, name);
        }

        public String getLastname() {
                return getProperty(LASTNAME_KEY);
        }

        public void setLastname(String name) {
                setProperty(LASTNAME_KEY, name);
        }

        public String getUsername() {
                return getProperty(USERNAME_KEY);
        }

        public void setUsername(String name) {
                setProperty(USERNAME_KEY, name);
        }

        public List<Group> getGroups() {
                return out("HAS_USER").toList(Group.class);
        }

}