@aanastasiou
Created July 28, 2013 18:32
Generate a Cypher query to store a Python Networkx directed graph

Exporting a Networkx graph as a Cypher query

This little project defines a function that constructs a Cypher query which, when executed against a Neo4j database server, stores the graph on that server.

Background

  • A Graph is an abstract mathematical model composed of Nodes connected through Edges that can be used to describe complex systems composed of a set of parts (corresponding to nodes) and their connections (corresponding to edges).
  • Examples of graphs include road networks (junctions connected by roads) and electronic circuits (components and their connections).
  • Networkx is an excellent Python module for manipulating such Graph objects of any kind.
  • Neo4j is a graph database. It uses the Graph as its data model, so such objects can be persisted directly as nodes and relationships.
  • Cypher is Neo4j's query language. It is a domain specific language that can be used to manipulate graph objects.

Objectives

Given a graph (G), write a function that creates the query to store the graph with all of its nodes, edges and attributes.

How is it done?

  • By traversing all nodes and edges and creating the corresponding parts of the Cypher query.
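
The traversal can be sketched on a tiny hand-written graph. This is a minimal illustration rather than the gist's own code: the two lists stand in for what `G.nodes(data=True)` and `G.edges(data=True)` yield, the attribute values are made up, and the names `props`, `build_query` and the `n0`, `n1` identifiers are assumptions chosen for the example.

```python
# Stand-ins for what G.nodes(data=True) and G.edges(data=True) yield
# on a small directed graph; the values are made up for illustration.
nodes = [(0, {"label": "ROOT"}), (1, {"label": "LEAF", "cost": 3})]
edges = [(0, 1, {"diameter": 5})]


def props(items):
    """Render (key, value) pairs as a Cypher property map, quoting strings."""
    parts = []
    for key, value in items:
        if isinstance(value, str):
            parts.append("%s:'%s'" % (key, value))
        else:
            parts.append("%s:%s" % (key, value))
    return "{%s}" % ",".join(parts)


def build_query(nodes, edges):
    """Walk the nodes first, then the edges, and join the pieces into one CREATE."""
    names = {}        # Networkx node-id -> Cypher identifier
    node_parts = []
    for index, (node_id, attrs) in enumerate(nodes):
        names[node_id] = "n%d" % index
        # Keep the original node-id as an ID attribute
        items = [("ID", str(node_id))] + list(attrs.items())
        node_parts.append("(%s %s)" % (names[node_id], props(items)))
    edge_parts = []
    for source, target, attrs in edges:
        edge_props = props(attrs.items()) if attrs else ""
        # Edges refer to the Cypher identifiers, not the Networkx node-ids
        edge_parts.append("(%s)-[:LINKED_TO %s]->(%s)"
                          % (names[source], edge_props, names[target]))
    return "create %s;\n" % ",".join(node_parts + edge_parts)


query = build_query(nodes, edges)
print(query)
```

The node pass has to finish before the edge pass starts, because every edge is written in terms of the identifiers assigned to its two end nodes.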

Assumptions, Requirements, Caveats

  • graph2Cypher requires the random and networkx modules.
  • graph2Cypher_demo.py requires networkx and matplotlib.
  • The graph2Cypher function assumes that its (only) parameter IS A DIRECTED GRAPH.
  • Simply going through all nodes and edges and dumping their attributes is not practical for all graphs because the node-id used by Networkx might not be usable by Neo4j directly. The typical example is a graph whose Networkx node-ids are integers.
  • For this reason and just for the needs of constructing the Cypher query, the graph's nodes get relabeled on the fly.
  • Furthermore, certain assumptions are made about attribute names. Each node's original Networkx id is stored in the ID node attribute, and every edge gets the relationship type ":LINKED_TO" by default.
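
The relabeling and attribute assumptions above could be made a little more defensive. The following is a sketch under stated assumptions, not part of the gist: `cypher_identifier`, `cypher_key` and `cypher_literal` are hypothetical helper names. It swaps the random tags for a counter (a random four-character tag can, in principle, repeat for two nodes), backtick-quotes property keys that are not plain names, and escapes quotes inside string values.

```python
import re

# A "plain" name: letters, digits and underscores, not starting with a digit
_PLAIN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def cypher_identifier(index):
    """Counter-based identifier: unique by construction, unlike a random tag."""
    return "n%d" % index


def cypher_key(name):
    """Return a property key, wrapped in backticks if it is not a plain name."""
    name = str(name)
    if _PLAIN_NAME.match(name):
        return name
    if "`" in name:
        # Assumption: keys containing a backtick are simply rejected here
        raise ValueError("unsupported property key: %r" % name)
    return "`%s`" % name


def cypher_literal(value):
    """Render a Python value as a Cypher literal (bool, number or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Everything else is written as a single-quoted string,
    # with backslashes and single quotes escaped
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'%s'" % text
```

For example, `cypher_key("pipe diameter")` gives a backtick-quoted key, and `cypher_literal("it's")` gives a string whose embedded quote no longer terminates the literal early.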

Use

  • The function can be used on its own to create the query, which can then be sent to the Neo4j database through something like the Python REST interface or the Neo4j-shell.

  • In the case of the Neo4j-shell, assuming that it is on your system path, you can simply do the following:

    python graph2Cypher_demo.py > aGraph.cypher   # This creates the text file with the Cypher query
    neo4j-shell -file aGraph.cypher               # This will execute the query within aGraph.cypher and store the graph to the database.
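
The same two steps could also be driven from Python. This is only a sketch: `save_query` and `run_with_neo4j_shell` are hypothetical helper names, and the only external command used is the `neo4j-shell -file` invocation shown above, which is skipped when the tool is not found on the system path.

```python
import shutil
import subprocess


def save_query(query, path):
    """Write the Cypher query to a text file, as the shell redirect does."""
    with open(path, "w") as handle:
        handle.write(query)
    return path


def run_with_neo4j_shell(path):
    """Run `neo4j-shell -file <path>` if the tool is on the system path.

    Returns the exit code, or None when neo4j-shell cannot be found.
    """
    if shutil.which("neo4j-shell") is None:
        return None
    return subprocess.call(["neo4j-shell", "-file", path])
```

A typical use would be `run_with_neo4j_shell(save_query(graph2Cypher.graph2Cypher(G), "aGraph.cypher"))`, with `G` being a directed graph as in the demo script.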

"""Defines a function that parses a Networkx graph and produces a Cypher query to store the graph in a Neo4j graph database
Athanasios Anastasiou 28/07/2013
"""
import random
import networkx

#Simple character lists
letDCT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numDCT = "0123456789"

def getRndTag(someLen, dct=letDCT):
    """Returns some random string of length someLen composed of the characters in the dct string"""
    return "".join([dct[random.randint(0,len(dct)-1)] for i in range(0,someLen)])

def graph2Cypher(aGraph):
    """Generates a Cypher query from a Networkx Graph"""
    nodeStatements = {}
    edgeStatements = []
    #Partially generate the node representations
    for aNode in aGraph.nodes(data = True):
        #Generate a node identifier for Cypher
        varName = getRndTag(2)+getRndTag(2,dct=numDCT)
        #Append the node's ID attribute so that the node-ID information used by Networkx is preserved.
        nodeItems = [("ID","%s" % aNode[0])]
        nodeItems.extend(aNode[1].items())
        #Create the key-value representation of the node's attributes taking care to add quotes when the value is of type string
        nodeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,nodeItems))
        #Store it to a dictionary indexed by the node-id.
        nodeStatements[aNode[0]] = [varName, "(%s %s)" % (varName, nodeAttributes)]
    #Generate the relationship representations
    for anEdge in aGraph.edges(data = True):
        edgeItems = anEdge[2].items()
        edgeAttributes = ""
        if len(edgeItems)>0:
            edgeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,edgeItems))
        #NOTE: Declare the links by their Cypher node-identifier rather than their Networkx node identifier
        edgeStatements.append("(%s)-[:LINKED_TO %s]->(%s)" % (nodeStatements[anEdge[0]][0], edgeAttributes, nodeStatements[anEdge[1]][0]))
    #Put both definitions together and return the create statement.
    #(Joining a single list avoids a dangling comma when the graph has no edges.)
    return "create %s;\n" % ",".join([x[1] for x in nodeStatements.values()] + edgeStatements)

"""Creates a dummy tree graph example and from that, its Cypher representation"""
import networkx
import sys
import random
import graph2Cypher

#Create a DIRECTED network (In this case a simple binary tree (branching factor=2) having 17 nodes)
G = networkx.generators.full_rary_tree(2,17, create_using=networkx.DiGraph())
#Add some attributes to the nodes
#(Note: G.nodes[...] is the networkx >= 2.0 spelling; older releases used G.node[...])
for aNode in G.nodes():
    G.nodes[aNode]['label'] = graph2Cypher.getRndTag(5)
    G.nodes[aNode]['cost'] = random.randint(0,9)
#Add some attributes to the edges
#(Note: Here, 'diameter' could refer to pipe diameter, it's just a dummy name.)
for anEdge in G.edges():
    G[anEdge[0]][anEdge[1]].update({'diameter':random.randint(0,9)})
#Write the output to the standard output (this way the query could be piped if required)
sys.stdout.write(graph2Cypher.graph2Cypher(G))

@ducky427

I've written something similar to this here.

@Gijs-Koot

Hey, you've mixed up aGraph and G as function arguments, right?
