
Introduction to Knowledge Graphs and JanusGraph: Architecture, Schema Design, and Traversal

This article explains the rapid development of knowledge graphs, why graph databases like JanusGraph are preferred over relational databases for large‑scale semantic networks, and provides a step‑by‑step guide covering JanusGraph architecture, schema creation, Gremlin traversal language, server deployment, data import, and query examples.

58 Tech

Aug 16, 2019

In recent years, the rise of big data has accelerated the development of knowledge graphs, which are large semantic networks composed of entities, attributes, and relationships. Storing and querying such topological data efficiently is crucial, making graph databases a natural fit.

Traditional relational databases handle shallow relationship queries (one or two joins) well but struggle with deep, multi-hop queries, because every extra hop adds another join. This limitation led to the emergence of graph databases. Popular graph databases and graph-processing systems include Neo4j, JanusGraph, Giraph, Dgraph, and TigerGraph. JanusGraph, in particular, offers strong extensibility: it supports storage back-ends such as Cassandra, HBase, and Google Cloud Bigtable, and index back-ends such as Elasticsearch, Solr, and Lucene.

JanusGraph Overview

JanusGraph natively supports the Apache TinkerPop graph computing framework, providing both OLTP (graph database) and OLAP (graph analytics) capabilities. It integrates pluggable index back‑ends for geo, numeric, and full‑text search, and can store data in-memory, on distributed stores, or external databases.

Data Model and Schema

JanusGraph data consists of vertices, edges, and properties. A schema defines vertex labels, edge labels, and property keys. Each edge label has a multiplicity (MULTI, SIMPLE, MANY2ONE, ONE2MANY, or ONE2ONE), and each property key has a data type (for example String, Boolean, Integer, or Double) and a cardinality (SINGLE, LIST, or SET). For example, a real‑estate graph may have the vertex labels "Metro Station" (地铁站), "Business District" (商圈), and "Community" (小区); the edge labels "Nearby" (邻近) and "Similar" (相似); and property keys such as "Latitude/Longitude" (经纬度), "Line" (线路), "City" (城市), "Year" (年代), and "Alias" (别名, with SET cardinality). The Chinese names in parentheses are the identifiers used in the code later in this article.
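The cardinality and multiplicity rules can be illustrated with a small plain-Java model. This is only a sketch of the semantics, not JanusGraph code: the class `SchemaSemantics` and its methods `write` and `allowed` are hypothetical names, and the model treats a vertex pair as ordered (out, in) for SIMPLE, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaSemantics {
    enum Cardinality { SINGLE, LIST, SET }
    enum Multiplicity { MULTI, SIMPLE, MANY2ONE, ONE2MANY, ONE2ONE }

    /** Applies one property write and returns the values stored afterwards. */
    static List<String> write(List<String> current, String value, Cardinality c) {
        List<String> next = new ArrayList<>(current);
        switch (c) {
            case SINGLE:                       // a new value replaces the old one
                next.clear();
                next.add(value);
                break;
            case LIST:                         // every value is kept, duplicates included
                next.add(value);
                break;
            case SET:                          // duplicates are dropped
                if (!next.contains(value)) next.add(value);
                break;
        }
        return next;
    }

    /**
     * Returns true if an edge out -> in may be added without breaking the
     * multiplicity m. Existing edges of the same label are {out, in} pairs.
     */
    static boolean allowed(List<String[]> edges, String out, String in, Multiplicity m) {
        boolean samePair = false, outUsed = false, inUsed = false;
        for (String[] e : edges) {
            if (e[0].equals(out) && e[1].equals(in)) samePair = true;
            if (e[0].equals(out)) outUsed = true;   // out already has an outgoing edge
            if (e[1].equals(in)) inUsed = true;     // in already has an incoming edge
        }
        switch (m) {
            case MULTI:    return true;             // no restriction
            case SIMPLE:   return !samePair;        // at most one edge per pair
            case MANY2ONE: return !outUsed;         // at most one outgoing edge per vertex
            case ONE2MANY: return !inUsed;          // at most one incoming edge per vertex
            default:       return !outUsed && !inUsed; // ONE2ONE
        }
    }
}
```

With SET cardinality, writing the alias "Fangyuanli" twice leaves a single stored value, which is why "Alias" suits SET while "Year" suits SINGLE.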

Gremlin Traversal Language

Gremlin, the traversal language of TinkerPop, allows users to express complex graph queries in a functional, data‑flow style. It provides steps such as map, flatMap, filter, sideEffect, and branch, enabling powerful graph analytics.

Using JanusGraph

Applications can interact with JanusGraph either by embedding it directly in the JVM (embedded mode) or by connecting to a JanusGraph Server via Gremlin queries.

Practical Example: Real‑Estate Knowledge Graph

The following demonstrates how to set up a JanusGraph instance, design a schema, import data, and perform traversals.

1. JanusGraph Server Deployment

Download the JanusGraph release and extract it on a Linux machine. For a quick demo, use the Cassandra and Elasticsearch instances bundled with the release. Start the services by running bin/janusgraph.sh start from the extracted directory; the script launches Cassandra, Elasticsearch, and the Gremlin server in turn.

2. Creating the Schema

Schema creation can be done via Gremlin console or Java code. Below are Java snippets that define property keys, vertex labels, edge labels, and their connections.

import org.janusgraph.core.Cardinality;
import org.janusgraph.core.EdgeLabel;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.VertexLabel;
import org.janusgraph.core.schema.JanusGraphManagement;

// Assumes `graph` is an already opened JanusGraph instance.
JanusGraphManagement management = graph.openManagement();

// Property keys: name, 经纬度 (latitude/longitude), 线路 (line), 城市 (city), 年代 (year), 别名 (alias).
PropertyKey pk1 = management.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make();
PropertyKey pk2 = management.makePropertyKey("经纬度").dataType(String.class).cardinality(Cardinality.SINGLE).make();
PropertyKey pk3 = management.makePropertyKey("线路").dataType(String.class).cardinality(Cardinality.SINGLE).make();
PropertyKey pk4 = management.makePropertyKey("城市").dataType(String.class).cardinality(Cardinality.SINGLE).make();
PropertyKey pk5 = management.makePropertyKey("年代").dataType(Integer.class).cardinality(Cardinality.SINGLE).make();
// An entity may have several aliases, so this key uses SET cardinality.
PropertyKey pk6 = management.makePropertyKey("别名").dataType(String.class).cardinality(Cardinality.SET).make();

// Vertex labels: 地铁站 (metro station), 商圈 (business district), 小区 (community).
VertexLabel vl1 = management.makeVertexLabel("地铁站").make();
VertexLabel vl2 = management.makeVertexLabel("商圈").make();
VertexLabel vl3 = management.makeVertexLabel("小区").make();

// Edge labels: 邻近 (nearby), 相似 (similar).
EdgeLabel el1 = management.makeEdgeLabel("邻近").make();
EdgeLabel el2 = management.makeEdgeLabel("相似").make();

// Declare which property keys each vertex label may carry.
management.addProperties(vl1, pk1, pk2, pk3);
management.addProperties(vl2, pk1, pk2, pk4);
management.addProperties(vl3, pk1, pk2, pk5, pk6);

// Declare the allowed connections as (edge label, outgoing vertex label, incoming vertex label).
management.addConnection(el1, vl2, vl1);
management.addConnection(el1, vl2, vl3);
management.addConnection(el2, vl3, vl3);

// Persist the schema changes.
management.commit();
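The three addConnection calls above amount to a whitelist of (edge label, outgoing vertex label, incoming vertex label) triples. A minimal plain-Java sketch of such a check follows. The class `ConnectionRules` and its methods are hypothetical and do not belong to the JanusGraph API, and the labels are written in English to keep the source ASCII-only. Whether JanusGraph itself rejects a non-conforming edge depends on its schema-constraint configuration, so that behavior should be verified against the documentation for the version in use.

```java
import java.util.HashSet;
import java.util.Set;

public class ConnectionRules {
    // Each rule is stored as "edgeLabel|outVertexLabel|inVertexLabel".
    private final Set<String> rules = new HashSet<>();

    /** Registers one allowed connection, mirroring management.addConnection. */
    void addConnection(String edgeLabel, String outLabel, String inLabel) {
        rules.add(edgeLabel + "|" + outLabel + "|" + inLabel);
    }

    /** Returns true if an edge with this label may run from outLabel to inLabel. */
    boolean allows(String edgeLabel, String outLabel, String inLabel) {
        return rules.contains(edgeLabel + "|" + outLabel + "|" + inLabel);
    }

    /** Builds the rule set of the real-estate schema shown above. */
    static ConnectionRules realEstate() {
        ConnectionRules r = new ConnectionRules();
        r.addConnection("Nearby", "Business District", "Metro Station");
        r.addConnection("Nearby", "Business District", "Community");
        r.addConnection("Similar", "Community", "Community");
        return r;
    }
}
```

Connections are directional: in this model a "Nearby" edge from a business district to a metro station is allowed, while the reverse direction is not, because only the first triple was registered.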

3. Data Import

Data can be bulk‑loaded using OneTimeBulkLoader or IncrementalBulkLoader, or inserted programmatically via Gremlin traversals. Example Java statements using the Gremlin traversal API:

// Assumes `graph` is the open JanusGraph instance used for the schema above.
GraphTraversalSource g = graph.traversal();

// Vertices: one metro station, one business district, two communities.
Vertex v1 = g.addV("地铁站").property("name","望京南地铁站").property("线路","13号线").property("经纬度","116.488413,39.990489").next();
Vertex v2 = g.addV("商圈").property("name","大山子").property("城市","北京").property("经纬度","116.495599,39.994275").next();
Vertex v3 = g.addV("小区").property("name","大山子北里").property("年代",1980).property("别名","大山子社区").property("经纬度","116.493875,39.990102").next();
Vertex v4 = g.addV("小区").property("name","芳园里北区").property("年代",1970).property("别名","芳园里社区").property("别名","芳园里").property("别名","芳园里小区").property("经纬度","116.493875,39.990102").next();

// Each edge runs from the vertex tagged with the step label "in" to the vertex selected last.
Edge e1 = g.V(v2).as("in").V(v1).addE("邻近").from("in").next();
Edge e2 = g.V(v2).as("in").V(v3).addE("邻近").from("in").next();
Edge e3 = g.V(v2).as("in").V(v4).addE("邻近").from("in").next();
Edge e4 = g.V(v3).as("in").V(v4).addE("相似").from("in").next();
Edge e5 = g.V(v4).as("in").V(v3).addE("相似").from("in").next();

// In embedded mode the changes are not persisted until the transaction is committed.
g.tx().commit();

4. Traversal Queries

Typical traversals start from a vertex or edge and apply a sequence of steps. Common step types include:

Step: Description

map(Traversal<S,E>): Maps the traverser to an object of type E for the next step.

flatMap(Traversal<S,E>): Maps the traverser to an iterator of E objects streamed to the next step.

filter(Traversal<?,?>): Evaluates a predicate; false results are filtered out.

sideEffect(Traversal<S,S>): Performs an operation on the traverser without altering it.

branch(Traversal<S,M>): Splits the traverser into multiple traversals indexed by token M.
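These step types have close counterparts in java.util.stream, which makes a runnable analogy possible without a graph database: flatMap resembles Stream.flatMap, filter resembles Stream.filter, sideEffect resembles Stream.peek, and map resembles Stream.map (branch has no direct stream counterpart and is left out). The sketch below is an analogy only. The class `StepAnalogy`, its nested class `V`, and the method names are hypothetical, and the place names are romanized to keep the source ASCII-only. It answers "which communities near the Dashanzi business district were built before a given year", a question that in Gremlin might be written roughly as g.V().has("name","大山子").out("邻近").hasLabel("小区").has("年代", P.lt(1975)).values("name").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StepAnalogy {
    /** A tiny in-memory vertex; year 0 means the property is not set. */
    static final class V {
        final String label;
        final String name;
        final int year;
        final List<V> nearby = new ArrayList<>();   // outgoing "Nearby" edges

        V(String label, String name, int year) {
            this.label = label;
            this.name = name;
            this.year = year;
        }
    }

    /** Rebuilds the business district and its three "Nearby" neighbors. */
    static V sampleBusinessDistrict() {
        V station = new V("Metro Station", "Wangjing South", 0);
        V district = new V("Business District", "Dashanzi", 0);
        V c1 = new V("Community", "Dashanzi Beili", 1980);
        V c2 = new V("Community", "Fangyuanli North", 1970);
        district.nearby.addAll(Arrays.asList(station, c1, c2));
        return district;
    }

    /** Names of communities near `start` built before `beforeYear`. */
    static List<String> oldCommunitiesNear(V start, int beforeYear, List<String> visited) {
        return Stream.of(start)
                .flatMap(v -> v.nearby.stream())           // flatMap: one vertex to many neighbors
                .peek(v -> visited.add(v.name))            // sideEffect: record, do not alter
                .filter(v -> v.label.equals("Community"))  // filter: keep communities only
                .filter(v -> v.year < beforeYear)          // filter: property predicate
                .map(v -> v.name)                          // map: vertex to its name
                .collect(Collectors.toList());
    }
}
```

Calling oldCommunitiesNear(sampleBusinessDistrict(), 1975, visited) returns a list containing only "Fangyuanli North", while visited ends up holding all three neighbor names, since the side effect runs before the filters.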

Terminal steps such as hasNext(), next(), toList(), and toSet() execute the traversal and return its results. Step modulators such as as() and by(), and logical filter steps such as and(), refine how the other steps behave.
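A Gremlin traversal behaves like a lazy java.util.Iterator: nothing is computed until a terminal step pulls results from it. The plain-Java sketch below imitates that behavior with an ordinary iterator. The class `TerminalSteps`, the nested `CountingIterator`, and the helper methods are hypothetical stand-ins for illustration, not TinkerPop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TerminalSteps {
    /** Wraps a result iterator and counts how many results were pulled. */
    static final class CountingIterator implements Iterator<String> {
        private final Iterator<String> source;
        int pulled = 0;

        CountingIterator(Iterator<String> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();       // checks for a result without consuming it
        }

        @Override
        public String next() {
            String value = source.next();  // throws NoSuchElementException when exhausted
            pulled++;
            return value;
        }
    }

    /** Drains all remaining results into a list, keeping duplicates. */
    static List<String> toList(Iterator<String> results) {
        List<String> out = new ArrayList<>();
        while (results.hasNext()) out.add(results.next());
        return out;
    }

    /** Drains all remaining results into a set, dropping duplicates. */
    static Set<String> toSet(Iterator<String> results) {
        Set<String> out = new LinkedHashSet<>();
        while (results.hasNext()) out.add(results.next());
        return out;
    }
}
```

In this model next() pulls exactly one result and leaves the rest unread, so guarding it with hasNext() avoids an exception when the result set is empty; toList() and toSet() consume whatever remains.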

Conclusion

This article introduced the relationship between knowledge graphs and graph databases, presented JanusGraph’s features, and demonstrated how to build a simple real‑estate knowledge graph using JanusGraph, Gremlin, and appropriate schema design. Future work will explore deeper topics such as indexing strategies, storage back‑ends, and advanced Gremlin queries.
