There are almost as many graph file formats as there are graph processing platforms. Here, I attempt to provide a comprehensive listing. Some of these are nearly trivial, yet the standards are just not established. For example, whether the vertices are 0-indexed or 1-indexed is usually left unspecified, but this does matter for some systems such as GraphMat. All the examples show the same directed graph (with weights added for the weighted formats), with the exception of .metis, which can only represent undirected graphs.
.el
This is the "edge list" format: one edge per line, of the form <source> <destination>. For example,
1 2
2 1
1 3
This is arguably the simplest format and is widely used. I sometimes use .1el to denote a file with one-indexed vertex numbering; otherwise, vertices are assumed to be zero-indexed. There may be any number of spaces or tabs between the two vertex IDs.
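Here is a minimal Python sketch of reading a .el file. It assumes integer vertex IDs separated by whitespace. The read_el name and its one_indexed flag are my own and are not part of any library. Passing one_indexed=True is how I would treat a .1el file.

```python
from typing import List, Tuple

def read_el(text: str, one_indexed: bool = False) -> List[Tuple[int, int]]:
    """Parse an edge list: one "<source> <destination>" pair per line.

    Any run of spaces/tabs may separate the two IDs. If one_indexed is True
    (e.g. a .1el file), IDs are shifted down by one so the result is 0-indexed.
    """
    offset = 1 if one_indexed else 0
    edges = []
    for line in text.splitlines():
        fields = line.split()  # split() with no argument splits on any whitespace
        if not fields:  # skip blank lines
            continue
        src, dst = int(fields[0]), int(fields[1])
        edges.append((src - offset, dst - offset))
    return edges

example = "1 2\n2\t1\n1   3\n"
print(read_el(example))                    # [(1, 2), (2, 1), (1, 3)]
print(read_el(example, one_indexed=True))  # [(0, 1), (1, 0), (0, 2)]
```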
.eg
(Edge Graph) This is identical to .el, except that the first line is the string EdgeArray. This is used for the Problem Based Benchmark Suite (PBBS). Note that the file is called an Edge Graph but the first line is EdgeArray. Their decision, not mine.
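Reading an .eg file differs from reading .el only in the header check. Below is a minimal Python sketch. It assumes the rest of the file can be treated as a whitespace-separated stream of vertex IDs. The read_eg name is hypothetical.

```python
from typing import List, Tuple

def read_eg(text: str) -> List[Tuple[int, int]]:
    """Read an .eg file: the token "EdgeArray", then source/destination pairs."""
    tokens = text.split()  # newlines and spaces both separate tokens
    if not tokens or tokens[0] != "EdgeArray":
        raise ValueError("expected the first line to be 'EdgeArray'")
    ids = [int(t) for t in tokens[1:]]
    if len(ids) % 2 != 0:
        raise ValueError("odd number of vertex IDs; the last edge is incomplete")
    # Pair up consecutive IDs: (ids[0], ids[1]), (ids[2], ids[3]), ...
    return list(zip(ids[0::2], ids[1::2]))

print(read_eg("EdgeArray\n0 1\n1 0\n0 2\n"))  # [(0, 1), (1, 0), (0, 2)]
```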
.wel
This is the "weighted edge list" format: one edge per line, of the form <source> <destination> <weight>. For example,
1 2 5.0e-02
2 1 0.1
1 3 1
The weights may be integers, floating-point numbers, or numbers in exponential notation. The same indexing and spacing conventions as .el apply.
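All three weight notations can be handled by Python's float(). The sketch below is minimal and assumes exactly three whitespace-separated fields per non-blank line. The read_wel name is hypothetical.

```python
from typing import List, Tuple

def read_wel(text: str, one_indexed: bool = False) -> List[Tuple[int, int, float]]:
    """Parse "<source> <destination> <weight>" lines into (src, dst, weight) tuples."""
    offset = 1 if one_indexed else 0
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        # float() accepts integral ("1"), decimal ("0.1") and exponential ("5.0e-02") forms.
        edges.append((int(fields[0]) - offset, int(fields[1]) - offset, float(fields[2])))
    return edges

print(read_wel("1 2 5.0e-02\n2 1 0.1\n1 3 1\n"))
# [(1, 2, 0.05), (2, 1, 0.1), (1, 3, 1.0)]
```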
.weg
(Weighted Edge Graph) This is identical to .wel, except that the first line is the string WeightedEdgeArray. As with .eg, this is described in the PBBS. Note that the file is called a Weighted Edge Graph but the first line is WeightedEdgeArray.
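To produce a .weg file from weighted edges, write the header line followed by .wel-style lines. Here is a minimal Python sketch. The write_weg name is hypothetical. It uses Python's default float formatting, which round-trips exactly.

```python
from typing import Iterable, Tuple

def write_weg(edges: Iterable[Tuple[int, int, float]]) -> str:
    """Serialize (src, dst, weight) triples as .weg text: a WeightedEdgeArray header, then .wel lines."""
    lines = ["WeightedEdgeArray"]
    for src, dst, weight in edges:
        lines.append(f"{src} {dst} {weight}")
    return "\n".join(lines) + "\n"

print(write_weg([(0, 1, 0.05), (1, 0, 0.1), (0, 2, 1.0)]), end="")
# WeightedEdgeArray
# 0 1 0.05
# 1 0 0.1
# 0 2 1.0
```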
.csv
The ever-popular CSV file format can also be used to represent edges. Loosely speaking, .el and .wel are delimiter-separated files too, but I reserve the term CSV for files whose delimiter is NOT whitespace. Note that some systems (notably GraphBIG) require the first line of the CSV to be a header such as SRC,DEST.
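Python's csv module can read such a file, including one with a GraphBIG-style header. The sketch below is minimal. It assumes the first two columns are the source and destination, and it does not interpret the header names. The read_edge_csv name and its parameters are hypothetical.

```python
import csv
import io
from typing import List, Tuple

def read_edge_csv(text: str, delimiter: str = ",", has_header: bool = True) -> List[Tuple[int, int]]:
    """Read source/destination pairs from a delimiter-separated file.

    has_header=True skips a first line such as "SRC,DEST"; the column
    names themselves are not checked.
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(rows, None)  # discard the header row if present
    return [(int(row[0]), int(row[1])) for row in rows if row]  # skip blank rows

print(read_edge_csv("SRC,DEST\n1,2\n2,1\n1,3\n"))  # [(1, 2), (2, 1), (1, 3)]
```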
Another simple text format is used to represent a list of vertices. It consists of a single integer (a vertex ID) per line.
.metis
This is a more complex format, as specified here. The linked website uses the .graph file extension, but I find that ambiguous. Lines that start with % are comments. The first non-comment line is <nvertices> <nedges>, and it is followed by <nvertices> lines, where line i lists the neighbors of vertex i (vertices are numbered from 1). Each undirected edge is counted once in <nedges> but appears in the neighbor lists of both of its endpoints. This format has several extensions for different types of weighted graphs but can only represent undirected graphs. For example,
% Here we have 3 vertices and 2 edges
3 2
2 3
1
1
Note that some implementations (such as the Parallel Boost Graph Library's metis_reader) expect there to be no duplicate edges.
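Converting a directed, 0-indexed edge list to METIS means symmetrizing it, removing duplicates, and shifting IDs to start at 1. The minimal Python sketch below does that for the unweighted variant only. It also drops self-loops, which is my own conservative choice and not something taken from the METIS documentation. The to_metis name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

def to_metis(num_vertices: int, edges: Iterable[Tuple[int, int]]) -> str:
    """Write a 0-indexed directed edge list as an unweighted METIS graph.

    Each edge is symmetrized and stored in sets, so duplicates disappear;
    self-loops are skipped (a choice made in this sketch). Output IDs are 1-indexed.
    """
    neighbors: List[Set[int]] = [set() for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    # Each undirected edge appears in two neighbor lists but is counted once.
    num_edges = sum(len(n) for n in neighbors) // 2
    lines = [f"{num_vertices} {num_edges}"]
    for n in neighbors:
        lines.append(" ".join(str(w + 1) for w in sorted(n)))
    return "\n".join(lines) + "\n"

print(to_metis(3, [(0, 1), (1, 0), (0, 2)]), end="")
# 3 2
# 2 3
# 1
# 1
```

Fed the running example's three directed edges (0-indexed), it reproduces the METIS example above.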
.graphml
GraphML is an XML-based format and is described here. Our running example graph is a bit more verbose in this format:
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
      http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="d1" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <edge source="n1" target="n2">
      <data key="d1">0.05</data>
    </edge>
    <edge source="n2" target="n1">
      <data key="d1">0.1</data>
    </edge>
    <edge source="n1" target="n3">
      <data key="d1">1.0</data>
    </edge>
  </graph>
</graphml>
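GraphML can be read with the standard library's xml.etree.ElementTree, as long as the GraphML namespace is handled. The minimal sketch below looks up the <key> declared for edges with attr.name="weight". It then reads each edge's matching <data> value and returns None where the value is absent. It ignores keys declared with for="all", and read_graphml_edges is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def read_graphml_edges(xml_text: str, weight_name: str = "weight") -> List[Tuple[str, str, Optional[float]]]:
    """Return (source, target, weight) for each <edge> under the top-level <graph>."""
    root = ET.fromstring(xml_text)
    # Find the key id (e.g. "d1") declared for the edge weight attribute.
    weight_key = None
    for key in root.findall("g:key", NS):
        if key.get("for") == "edge" and key.get("attr.name") == weight_name:
            weight_key = key.get("id")
    edges = []
    for edge in root.findall("g:graph/g:edge", NS):
        weight = None
        for data in edge.findall("g:data", NS):
            if weight_key is not None and data.get("key") == weight_key:
                weight = float(data.text)
        edges.append((edge.get("source"), edge.get("target"), weight))
    return edges

demo = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d1" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n1"/><node id="n2"/><node id="n3"/>
    <edge source="n1" target="n2"><data key="d1">0.05</data></edge>
    <edge source="n1" target="n3"/>
  </graph>
</graphml>"""
print(read_graphml_edges(demo))  # [('n1', 'n2', 0.05), ('n1', 'n3', None)]
```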
The Pajek format is the file format used with the Pajek project.
The Graph500 also outputs a binary file used to store the root vertices of its searches. It is simply an array of int64_t values.
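A file that is "simply an array of int64_t" can be decoded with the struct module. The sketch below is minimal. It assumes there is no header and that the file was written little-endian, as on x86 machines. Pass ">" if the file came from a big-endian machine. The read_roots name is hypothetical.

```python
import struct
from typing import List

def read_roots(data: bytes, byteorder: str = "<") -> List[int]:
    """Decode raw bytes as a flat array of int64_t values ("<" = little-endian)."""
    if len(data) % 8 != 0:
        raise ValueError("size is not a multiple of 8 bytes")
    count = len(data) // 8
    return list(struct.unpack(f"{byteorder}{count}q", data))

# Round trip with three made-up root vertices.
sample = struct.pack("<3q", 5, 0, 42)
print(read_roots(sample))  # [5, 0, 42]
```

For a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes in.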
The packed_edge file format is generated by the Graph500. Even this is ambiguous: it can either be an array of edges with 96 bits per edge (three int32_t values) or 128 bits per edge (two int64_t values), though by default it is the latter. This is described here.
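The default 128-bit layout (two int64_t per edge) can be decoded the same way as the roots file, pairing up consecutive values. The minimal sketch below assumes little-endian data with no header. It deliberately does NOT handle the 96-bit variant. The read_packed_edges_128 name is hypothetical.

```python
import struct
from typing import List, Tuple

def read_packed_edges_128(data: bytes, byteorder: str = "<") -> List[Tuple[int, int]]:
    """Decode edges stored as two int64_t (v0, v1) per 16-byte record."""
    if len(data) % 16 != 0:
        raise ValueError("size is not a multiple of 16 bytes")
    flat = struct.unpack(f"{byteorder}{len(data) // 8}q", data)
    # Consecutive int64 values form one edge: (flat[0], flat[1]), (flat[2], flat[3]), ...
    return list(zip(flat[0::2], flat[1::2]))

sample = struct.pack("<6q", 0, 1, 1, 0, 0, 2)
print(read_packed_edges_128(sample))  # [(0, 1), (1, 0), (0, 2)]
```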
.graphmat
This is the binary format used by GraphMat. I don't know how it is laid out internally, but I do know you can convert a 1-indexed (the first vertex is 1, not 0) .el or .wel file into .graphmat using the graph_converter binary built in GraphMat/bin:
graph_converter --selfloops 1 --duplicatededges 1 --inputformat 1 --outputformat 0 --inputheader 0 --outputheader 1 myfile.el myfile.graphmat
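The converter wants 1-indexed input, so a 0-indexed .el or .wel file needs its vertex IDs shifted up by one first. Here is a minimal Python sketch. It assumes whitespace-separated fields, shifts only the first two, and leaves any weight field untouched. The el_zero_to_one_indexed name is hypothetical.

```python
def el_zero_to_one_indexed(text: str) -> str:
    """Add 1 to the two vertex IDs on each line, keeping any trailing fields (e.g. a weight)."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        fields[0] = str(int(fields[0]) + 1)
        fields[1] = str(int(fields[1]) + 1)
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"

print(el_zero_to_one_indexed("0 1\n1 0 0.1\n"), end="")
# 1 2
# 2 1 0.1
```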
.gr
This is the binary format used by Galois. Their distribution includes a relatively complete way to convert to and from .gr.
.vgr
Binary Void gr format. This is used by Galois when the edges are unweighted.
The following formats are not covered in depth here but may be described later. They may be binary or text.
.mtx
Matrix Market Format