What are Graph Databases, and What are the Different Types of Graph Databases?
Apr 25, 2019
A graph database is a database designed to treat the relationships between data as being as important as the data itself. It uses graph structures (nodes and edges) to represent and store data: a node represents a record, object, or entity, and an edge represents a relationship between two nodes.
Querying relationships in a graph database is fast because the relationships are persisted in the database itself rather than computed at query time, and in many cases they can be retrieved with a single operation.
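As a rough illustration of storing relationships next to the data, here is a minimal Python sketch in which each node object keeps direct references to its outgoing edges, so finding a node's neighbours is a lookup on the node itself rather than a search over all edges. The `Node`, `Edge`, `connect`, and `neighbours` names are hypothetical and are not taken from any particular database.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A record/entity in the graph (hypothetical minimal model)."""
    name: str
    # The node holds direct references to its outgoing edges, so its
    # neighbours can be found without scanning every edge in the database.
    out_edges: list["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    """A directed relationship, stored alongside the nodes it connects."""
    start: Node
    end: Node
    kind: str


def connect(start: Node, end: Node, kind: str) -> Edge:
    """Create an edge and register it on its start node."""
    edge = Edge(start, end, kind)
    start.out_edges.append(edge)
    return edge


def neighbours(node: Node, kind: str) -> list[str]:
    """Follow the edges stored on the node itself; no global lookup needed."""
    return [edge.end.name for edge in node.out_edges if edge.kind == kind]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, bob, "KNOWS")
connect(alice, carol, "KNOWS")
print(neighbours(alice, "KNOWS"))  # prints ['Bob', 'Carol']
```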
Graph databases can be classified both by how they store data and by the data model they expose.
Different types of graph databases based on storage -
- Native graph storage: storage designed specifically to store and manage graphs on disk, with vertices and edges as the basic storage units. It is well suited to deep-link (multi-hop) graph analytics. e.g. TigerGraph, Neo4j.
- Relational storage: storage that uses the relational model, keeping vertices in a vertex table and edges in an edge table, which are combined with relational JOINs at query time. e.g. GraphX.
- Key-value store: storage that layers the graph on top of a NoSQL store such as Cassandra or HBase. e.g. JanusGraph.
Note: graph databases built on relational or NoSQL storage are often slower, especially for multi-hop queries, because of the mismatch between the graph data model and the underlying storage model.
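The contrast between relational and native storage can be sketched in Python. The relational side uses the standard-library `sqlite3` module with a vertex table and an edge table and answers a two-hop query with JOINs; the native side keeps a neighbour list per vertex and simply follows it. This is a toy illustration with assumed sample data (the table, column, and function names are hypothetical), not how GraphX, Neo4j, or TigerGraph are actually implemented, and it does not measure performance.

```python
import sqlite3

# Hypothetical sample data: each vertex has one property (a city).
VERTICES = {"Alice": "Oslo", "Bob": "Lima", "Carol": "Pune", "Dave": "Kyiv", "Erin": "Oslo"}
EDGES = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Bob", "Erin")]

# Relational storage: a vertex table and an edge table, combined at query time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vertex (name TEXT PRIMARY KEY, city TEXT)")
db.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
db.executemany("INSERT INTO vertex VALUES (?, ?)", VERTICES.items())
db.executemany("INSERT INTO edge VALUES (?, ?)", EDGES)


def two_hops_relational(start: str) -> list[tuple[str, str]]:
    """Vertices reachable from `start` by a path of exactly two edges, via JOINs."""
    # One self-join on `edge` per hop, plus a join to `vertex` for properties.
    return db.execute(
        """
        SELECT DISTINCT v.name, v.city
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        JOIN vertex AS v ON v.name = e2.dst
        WHERE e1.src = ?
        ORDER BY v.name
        """,
        (start,),
    ).fetchall()


# Native-style storage: each vertex keeps its own list of neighbours.
adjacency: dict[str, list[str]] = {}
for src, dst in EDGES:
    adjacency.setdefault(src, []).append(dst)


def two_hops_native(start: str) -> list[tuple[str, str]]:
    """Same query, answered by following the stored neighbour lists."""
    reached = {hop2 for hop1 in adjacency.get(start, []) for hop2 in adjacency.get(hop1, [])}
    return [(name, VERTICES[name]) for name in sorted(reached)]


print(two_hops_relational("Alice"))  # prints [('Carol', 'Pune'), ('Erin', 'Oslo')]
print(two_hops_native("Alice"))      # prints [('Carol', 'Pune'), ('Erin', 'Oslo')]
```

Each extra hop adds one more self-join on `edge` in the relational version, while the native version adds one more pass over neighbour lists that are already stored with each vertex.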
Different types of graph databases based on the data model -
- Property Graph (e.g. Neo4j, AWS Neptune): in a property graph, data is organized as nodes, relationships, and properties (data stored on nodes or relationships).
Nodes are the entities in the graph. They can hold any number of attributes (key-value pairs), called properties. Nodes can be tagged with labels that represent their different roles in your domain. Node labels can also be used to attach metadata (such as index or constraint information) to certain nodes.
Relationships provide directed, named, semantically relevant connections between two node entities. A relationship always has a direction, a type, a start node, and an end node.
Like nodes, relationships can have properties. Most often these are quantitative, such as weights, costs, distances, ratings, time intervals, or strengths. Because of the way relationships are stored, two nodes can share any number or type of relationships without sacrificing performance, and although each relationship is stored with a specific direction, it can be traversed efficiently in either direction.
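The property graph model described above can be sketched with Python dataclasses: nodes carry labels and properties, relationships are directed and typed and may carry their own properties, and each relationship is indexed from both its start node and its end node so it can be followed in either direction. The class and method names are hypothetical; this is not Neo4j's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)  # identity semantics, so nodes can be used as dict keys
class Node:
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    rel_type: str   # every relationship has a type,
    start: Node     # a start node,
    end: Node       # and an end node, which together give it a direction
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, not a real database API."""

    def __init__(self) -> None:
        self.outgoing: dict[Node, list[Relationship]] = {}
        self.incoming: dict[Node, list[Relationship]] = {}

    def add_node(self, *labels: str, **properties: Any) -> Node:
        node = Node(set(labels), properties)
        self.outgoing[node] = []
        self.incoming[node] = []
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(rel_type, start, end, properties)
        # Stored once with a direction, but indexed from both ends so it
        # can be traversed starting from either node.
        self.outgoing[start].append(rel)
        self.incoming[end].append(rel)
        return rel


g = PropertyGraph()
alice = g.add_node("Person", name="Alice")
matrix = g.add_node("Movie", title="The Matrix")
g.relate(alice, "RATED", matrix, stars=5)

# Follow the same relationship forwards (from Alice) and backwards (from the movie).
rated = [r.end.properties["title"] for r in g.outgoing[alice] if r.rel_type == "RATED"]
raters = [r.start.properties["name"] for r in g.incoming[matrix] if r.rel_type == "RATED"]
print(rated, raters)  # prints ['The Matrix'] ['Alice']
```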
- Hypergraph (e.g. HyperGraphDB): a hypergraph is a graph data model in which a relationship (called a hyperedge) can connect any number of nodes, so either end of a relationship can hold multiple nodes. This is useful when your data includes a large number of many-to-many relationships.
For example, suppose Alice and Bob jointly own three vehicles. A hypergraph can express this ownership with a single hyperedge, whereas a property graph needs six relationships, one from each owner to each vehicle.
Because a hyperedge can connect any number of nodes, the hypergraph model is more general than the property graph. Even so, the two are isomorphic: any hypergraph can be represented as a property graph, albeit with more nodes and relationships.
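A minimal Python sketch of the Alice-and-Bob example, assuming a hyperedge with a set of owner nodes and a set of vehicle nodes (the vehicle names and the `HyperEdge`, `expand_pairwise`, and `reify` names are hypothetical). Expanding the hyperedge into binary relationships yields the six relationships from the example. The alternative in `reify`, replacing the hyperedge with one intermediate node, is a common encoding included here as an assumption, not something the text prescribes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HyperEdge:
    """One relationship with any number of nodes at each end (hypothetical)."""
    rel_type: str
    sources: frozenset[str]
    targets: frozenset[str]


def expand_pairwise(edge: HyperEdge) -> list[tuple[str, str, str]]:
    """Property-graph encoding 1: one binary relationship per (source, target) pair."""
    return sorted((s, edge.rel_type, t) for s, t in product(edge.sources, edge.targets))


def reify(edge: HyperEdge, hub: str) -> list[tuple[str, str, str]]:
    """Property-graph encoding 2: an extra `hub` node stands in for the hyperedge."""
    rels = [(s, "MEMBER_OF", hub) for s in sorted(edge.sources)]
    rels += [(hub, edge.rel_type, t) for t in sorted(edge.targets)]
    return rels


owns = HyperEdge("OWNS", frozenset({"Alice", "Bob"}), frozenset({"Car", "Van", "Bike"}))
print(len(expand_pairwise(owns)))       # prints 6 (2 owners x 3 vehicles)
print(len(reify(owns, "Ownership#1")))  # prints 5 (plus one extra node)
```

The pairwise expansion grows with the product of the two end sizes, while the intermediate-node encoding grows with their sum, at the cost of one extra node per hyperedge.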
- Triple Store (e.g. AWS Neptune, AllegroGraph): a triple store, or RDF (Resource Description Framework) store, stores data as triples, each with a subject-predicate-object structure, e.g. “Bob is 35” or “Bob knows Fred”.
Each additional piece of information is represented as a separate node. Specifically, an RDF graph is composed of nodes and arcs: each triple is drawn as a node for the subject, a node for the object, and an arc for the predicate.
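Triples can be sketched in Python as plain `(subject, predicate, object)` tuples, with a pattern-matching helper in which `None` acts as a wildcard, loosely similar to a variable in a SPARQL triple pattern. Real RDF uses IRIs and typed literals rather than bare strings; the data and the `match` helper here are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical data; the literal 35 is kept as a string for simplicity.
TRIPLES: set[Triple] = {
    ("Bob", "age", "35"),
    ("Bob", "knows", "Fred"),
    ("Fred", "knows", "Alice"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(s="Bob"))               # prints [('Bob', 'age', '35'), ('Bob', 'knows', 'Fred')]
print(match(p="knows", o="Alice"))  # prints [('Fred', 'knows', 'Alice')]
```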
Data processed by triple stores tends to be logically linked, so triple stores are included in the category of graph databases. However, they are not native graph databases, because they don’t support index-free adjacency and their storage engines are not optimized for storing property graphs.
Triple stores store triples as independent elements, which allows them to scale horizontally but prevents them from traversing relationships rapidly. To perform graph queries, a triple store must reconstruct connections from individual, independent facts, which adds latency to every query.
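To see why traversals over independent triples add latency, here is a Python sketch of a two-hop query ("who do Bob's acquaintances know?") evaluated by scanning the triples for each pattern and joining the results on the shared variable with a hash join. The data and function names are hypothetical, and production triple stores use indexes and query optimizers rather than full scans, but each hop still requires a join.

```python
from collections import defaultdict

# Hypothetical data: every fact is an independent (subject, predicate, object) triple.
TRIPLES = [
    ("Bob", "knows", "Fred"),
    ("Fred", "knows", "Alice"),
    ("Fred", "knows", "Dan"),
    ("Bob", "age", "35"),
]


def scan(predicate: str) -> list[tuple[str, str]]:
    """Evaluate the pattern (?s, predicate, ?o) by scanning all triples."""
    return [(s, o) for s, p, o in TRIPLES if p == predicate]


def two_hop(start: str, predicate: str) -> list[str]:
    """Answer (start, predicate, ?x) . (?x, predicate, ?y) with a hash join on ?x."""
    # Pattern 1: bind ?x.
    xs = [o for s, o in scan(predicate) if s == start]
    # Pattern 2: build a hash table keyed by subject, then join it on ?x.
    by_subject: dict[str, list[str]] = defaultdict(list)
    for s, o in scan(predicate):
        by_subject[s].append(o)
    return sorted({y for x in xs for y in by_subject[x]})


print(two_hop("Bob", "knows"))  # prints ['Alice', 'Dan']
```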
Because of these trade-offs between scale and latency, triple stores are most commonly used for offline analytics rather than online transactions.