Managing Logistical Relationships in a Graph Database

April 4, 2017 Angshuman Talukdar

In today’s “Big Data” world, it’s not only about managing the large volumes of data but also about generating useful insights from them. To generate insights, we also need to concentrate on the relationships between data. What better way to leverage the relationships between data than using a database where relationships take first priority. We have such a database in a graph database that can represent our data as nodes and the relationships between them as edges.

Among many scenarios that can be naturally represented as graphs, maps are the most obvious. But when we think of maps, we can also think of logistics and geospatial applications, which are actually the original graph use cases. Geospatial applications of graph databases range from calculating routes between locations in networks such as a road or rail, airspace, or logistical network to spatial operations such as finding all points of interest in a bounded area, finding the center of a region, and calculating the intersection between two or more regions.

One concrete example of graph databases being used in logistics is eBay, who (owing to the acquisition of Shutl) provides a service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, scaling their business to include the supply chain. Incidentally, eBay observed that before turning to graphs the latency of their longest query was higher than their shortest physical delivery, both around 15 minutes — something that can’t now be replicated when an average query is powered by a graph database and takes 1/50th of a second!

The eBay example is not isolated. Organizations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens more areas.

This article gives an example of how a use case for delivery can be modeled as a graph and solved through simple queries in the graph database.

 

Typical nodes in a logistical distribution network
Figure 1: Nodes in a typical logistical distribution network
 

The Use Case for Geospatial Data and Logistics

Let me present a highly simplified scenario. A parcel needs to be delivered and the delivery man needs to deliver it in the earliest time possible. So he has to choose the fastest route to the customer’s doorstep from the delivery center.

Graph Data Model:

To solve his problem, we need to represent the streets between the delivery center and the customer as a graph and hence the intersections/road junctions are considered as nodes of the graph and the edges of the graph as the lanes connecting the junctions. In graph database terms, edges (lanes) are the relationship between the nodes (intersections) and we name the relationship as Road which has the following attributes:

  • Distance: The actual distance between the intersections.
  • Coverability: Depends on a lot of factors like traffic, road condition, terrain etc.

The delivery center and the customer address are also represented as nodes in the graph. So essentially we have three types of nodes:

  • sNode: The Start node i.e. Delivery center.
  • eNode: The Finish node i.e. Customer’s address.
  • Intersection: Road junctions.

So now finding the fastest route boils down to a weighted shortest path calculation. We calculate it keeping in mind the two attributes of the relationship: distance (shorter is better) and coverability (higher is better).

Here we use the world’s leading graph database- Neo4j which is a highly scalable and ACID compliant database and its declarative query language Cypher.

A sample dataset is created in Neo4j using the CREATE clause in Cypher as given in Query 1. (Create clause in Cypher). This loads the data into Neo4j and generates the graph database as shown in Figure 2.

Neo4j has a lot of graph algorithms shipped with it as a package and those are accessible only from the JAVA API. Implementing some of these algorithms in Cypher is quite complex and time consuming. From Neo4j 3.x, the concept of user defined procedures had been introduced called APOC (Awesome Procedures On Cypher). Those are custom implementations of certain functionality, that can’t be (easily) expressed in Cypher itself. The APOC library consists of many (about 300) procedures to help with many different tasks in areas like data integration, graph algorithms or data conversion.

create (n0:sNode {name:'Start'}),
(n1:Intersection {name:'1'}),
(n2:Intersection {name:'2'}),
(n3:eNode {name:'Finish'}),
(n4:Intersection {name:'4'}),
(n5:Intersection {name:'5'}),
(n6:Intersection {name:'6'}),
(n7:Intersection {name:'7'}),
(n8:Intersection {name:'8'}),
(n9:Intersection {name:'9'}),
(n10:Intersection {name:'10'}),
(n11:Intersection {name:'11'}),
(n12:Intersection {name:'12'}),
(n13:Intersection {name:'13'}),
(n14:Intersection {name:'14'}),
(n15:Intersection {name:'15'}),
(n16:Intersection {name:'16'}),
(n17:Intersection {name:'17'}),
(n18:Intersection {name:'18'}),
(n19:Intersection {name:'19'}),

(n0)-[:Road {distance:5,coverability:1}]->(n4),
(n4)-[:Road {distance:15,coverability:1}]->(n5),
(n5)-[:Road {distance:5,coverability:1}]->(n6),
(n6)-[:Road {distance:100,coverability:1}]->(n7),
(n7)-[:Road {distance:80,coverability:0.9}]->(n8),
(n8)-[:Road {distance:5,coverability:0.8}]->(n1),
(n0)-[:Road {distance:100,coverability:0.9}]->(n9),
(n9)-[:Road {distance:10,coverability:0.8}]->(n10),
(n10)-[:Road {distance:10,coverability:0.8}]->(n1),
(n1)-[:Road {distance:5,coverability:0.8}]->(n11),
(n11)-[:Road {distance:10,coverability:0.8}]->(n12),
(n12)-[:Road {distance:30,coverability:0.8}]->(n13),
(n13)-[:Road {distance:60,coverability:0.8}]->(n2),
(n1)-[:Road {distance:5,coverability:0.8}]->(n8),
(n8)-[:Road {distance:105,coverability:0.9}]->(n14),
(n14)-[:Road {distance:5,coverability:0.8}]->(n2),
(n2)-[:Road {distance:15,coverability:0.8}]->(n15),
(n15)-[:Road {distance:75,coverability:0.9}]->(n3),
(n2)-[:Road {distance:20,coverability:0.7}]->(n16),
(n16)-[:Road {distance:15,coverability:0.9}]->(n17),
(n17)-[:Road {distance:75,coverability:1}]->(n3),
(n2)-[:Road {distance:5,coverability:0.9}]->(n14),
(n14)-[:Road {distance:40,coverability:0.9}]->(n18),
(n18)-[:Road {distance:40,coverability:0.9}]->(n19),
(n19)-[:Road {distance:70,coverability:1}]->(n3);
 

Query 1: Query to add data into the database

sample graph databaseFigure 2: The sample graph database

We need to download the latest APOC library, drop the .jar file into the. /plugins directory of our Neo4j installation, and then just run Query 2. We use the Dijkstra's algorithm for this task.

MATCH (startNode:sNode {name:"Start"}),(endNode:eNode {name:"Finish"})

call apoc.algo.dijkstra(startNode, endNode, 'Road', 'distance') YIELD path, weight

return path;

Query 2: Query to apply Dijkstra’s algorithm using only distance as the edge weights

shortest path graph databaseFigure 3: Shortest path from Start to End using distance as edge weights
 

In query 2 we just consider the distance property. To also consider the coverability property, we need to explicitly set the attribute, factor (= distance/coverability) into the relationships and then use Dijkstra’s algorithm. Run query 3 to achieve the desired results.

//set distance/coverability to each relationship

MATCH (a)-[r:Road]->(b)

SET r.factor=r.distance/r.coverability;

//apply Dijkstra’s algorithm with factor as the weight

MATCH (startNode:sNode {name:"Start"}),

(endNode:eNode {name:"Finish"})

call apoc.algo.dijkstra(startNode, endNode, 'Road', 'factor') YIELD path, weight

return path;

Query 3: Adding attribute distance/coverability and applying Dijkstra’s algorithm

shortest path graph database with edge weightsFigure 4: Shortest path from start to end with distance/coverability as edge weight

So figure 4 gives the route that the delivery man needs to follow to deliver the parcel on time.

Architecting Data Lakes eBook banner

Conclusion

In logistics, planning, impact analysis and many other applications, weighted shortest path algorithms have a great potential. The graph data model and graph databases can be used to generate competitive insights and significant business value in various use cases like: Social and Professional Networking (e.g.: LinkedIn, Facebook), Real-Time Recommendations, Fraud Detection, Master Data Management, Authorization and Access Control etc.

Dr. Roy Marsten in a WIRED article says, “The data that we have today, and in often the ways we look at data, are already steeped in the theory of graphs. Creating and analyzing graphs will bring us to answers automatically. When we let data connect itself, meaning will begin to emerge automatically.” So, with all the chatter around Graph Databases, it won’t be completely unwise to assume that the future is graph shaped.

Further reads:

References:

About the Author

Software Engineer - Big Data Services

More Content by Angshuman Talukdar
Previous Article
Team Zaloni - Spotlight on Chad Olds, VP of Worldwide Sales
Team Zaloni - Spotlight on Chad Olds, VP of Worldwide Sales

Zaloni welcomes Chad Olds to Team Zaloni as the Vice President of Worldwide Sales.

Next Article
Automate Metadata Capture for Enterprise Visibility in the Data Lake
Automate Metadata Capture for Enterprise Visibility in the Data Lake

To be successful with a modern data architecture, you must be able to automate metadata capture from dispar...

×

Get the latest tips and how-to's delivered straight to your inbox!

First Name
Last Name
Zaloni Blog Email Subscription
Thank you!
Error - something went wrong!