
@sampollard
Last active April 14, 2021 05:08
Comprehensive filename extension for graph file formats

There are almost as many graph file formats as there are graph processing platforms. Here, I attempt to provide a comprehensive listing. Some of these formats are nearly trivial, yet their conventions are not standardized. For example, whether vertices are 0-indexed or 1-indexed is usually left undefined, and this matters for some systems such as GraphMat. The examples all show the same small directed graph. The weighted formats (.wel and .graphml) attach weights to its edges. The exception is .metis, which can only represent undirected graphs and so shows an undirected version.

ASCII Text Formats

.el

This is the "edge list" format. The format is one edge per line of the form <source> <destination>. For example,

1 2
2 1
1 3

This is arguably the simplest format and is widely used. I sometimes use .1el to denote a file with one-indexed vertex numbering. Otherwise, the vertices are assumed to be zero-indexed. There may be any number of spaces and tabs between the vertices on a line.
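Here is a minimal Python sketch of an .el reader. It assumes whitespace-separated integer pairs and that blank lines may be skipped. The format does not define comments or blank lines, so skipping blank lines is an assumption. The one_indexed flag is a hypothetical convention for .1el files. It shifts IDs so that they always come back zero-indexed.

```python
import io
from typing import Iterable, List, Tuple


def read_el(lines: Iterable[str], one_indexed: bool = False) -> List[Tuple[int, int]]:
    """Parse "<source> <destination>" lines into zero-indexed edge tuples."""
    offset = 1 if one_indexed else 0
    edges = []
    for line in lines:
        fields = line.split()  # any run of spaces/tabs separates the two IDs
        if not fields:
            continue  # assumption: tolerate blank lines
        src, dst = map(int, fields)  # raises ValueError if a line has != 2 fields
        edges.append((src - offset, dst - offset))
    return edges


# The running example, read as a one-indexed (.1el) file:
print(read_el(io.StringIO("1 2\n2 1\n1 3\n"), one_indexed=True))
# [(0, 1), (1, 0), (0, 2)]
```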

Variants of .el

  1. .eg (Edge Graph) - This is identical to .el except that the first line is the string EdgeArray. This is used by the Problem Based Benchmark Suite (PBBS). Note that the format is called an Edge Graph but the first line is EdgeArray. Their decision, not mine.
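Because .eg differs from .el only in its first line, converting between them is just a matter of adding or stripping the header. The sketch below assumes a single space between vertex IDs. The PBBS documentation is the authority on the exact layout. write_eg is a hypothetical helper name.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_eg(edges: Iterable[Tuple[int, int]], out: TextIO) -> None:
    """Write an .el body preceded by the EdgeArray header line."""
    out.write("EdgeArray\n")
    for src, dst in edges:
        out.write(f"{src} {dst}\n")


buf = io.StringIO()
write_eg([(1, 2), (2, 1), (1, 3)], buf)
print(buf.getvalue(), end="")
# EdgeArray
# 1 2
# 2 1
# 1 3
```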

.wel

This is the "weighted edge list" format. The format is one edge per line of the form <source> <destination> <weight>. For example,

1 2 5.0e-02
2 1 0.1
1 3 1

The weights may be integers, floating-point numbers, or numbers in exponential notation. The same indexing and spacing conventions as .el apply.
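Python's float() accepts all three weight spellings shown above ("1", "0.1", and "5.0e-02"). A .wel reader therefore only needs to parse a third column. This is a sketch under the same assumptions as the .el reader: whitespace separation and skipped blank lines. The name read_wel is hypothetical.

```python
import io
from typing import Iterable, List, Tuple


def read_wel(lines: Iterable[str], one_indexed: bool = False) -> List[Tuple[int, int, float]]:
    """Parse "<source> <destination> <weight>" lines; IDs come back zero-indexed."""
    offset = 1 if one_indexed else 0
    edges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: tolerate blank lines
        src, dst, weight = fields
        # float() handles integral, decimal, and exponential notation alike
        edges.append((int(src) - offset, int(dst) - offset, float(weight)))
    return edges


print(read_wel(io.StringIO("1 2 5.0e-02\n2 1 0.1\n1 3 1\n")))
# [(1, 2, 0.05), (2, 1, 0.1), (1, 3, 1.0)]
```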

Variants of .wel

  1. .weg (Weighted Edge Graph) - This is identical to .wel except that the first line is the string WeightedEdgeArray. As with .eg, this is described in the PBBS. Note that the format is called a Weighted Edge Graph but the first line is WeightedEdgeArray.
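A reader for .weg can verify the header before parsing the .wel body. This catches the easy mistake of feeding it an .eg or plain .wel file. A minimal sketch follows. The helper name read_weg and the choice to raise ValueError on a bad header are assumptions.

```python
import io
from typing import List, TextIO, Tuple


def read_weg(f: TextIO) -> List[Tuple[int, int, float]]:
    """Check the WeightedEdgeArray header, then parse the weighted edges."""
    header = f.readline().strip()
    if header != "WeightedEdgeArray":
        raise ValueError(f"expected WeightedEdgeArray header, got {header!r}")
    edges = []
    for line in f:
        fields = line.split()
        if fields:  # skip blank lines
            src, dst, weight = fields
            edges.append((int(src), int(dst), float(weight)))
    return edges


print(read_weg(io.StringIO("WeightedEdgeArray\n1 2 5.0e-02\n2 1 0.1\n1 3 1\n")))
# [(1, 2, 0.05), (2, 1, 0.1), (1, 3, 1.0)]
```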

.csv

The ever-popular CSV file format can also be used to represent edges. Strictly speaking, .el and .wel files are CSVs with a whitespace delimiter. Here, I use .csv only for files whose delimiter is NOT whitespace (typically a comma). Note that some systems (namely GraphBIG) require the first line of the CSV to be a header such as SRC,DEST.
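Python's standard csv module handles the delimiter and the optional header line. The sketch below assumes the header is exactly SRC,DEST. The text above only says "something like" that, so check what your target system expects. The function names are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Optional, Sequence, TextIO, Tuple


def write_csv_edges(edges: Iterable[Tuple[int, int]], out: TextIO,
                    header: Optional[Sequence[str]] = ("SRC", "DEST"),
                    delimiter: str = ",") -> None:
    """Write one edge per row, optionally preceded by a header row."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(header)
    writer.writerows(edges)


def read_csv_edges(f: TextIO, delimiter: str = ",", has_header: bool = True) -> List[Tuple[int, int]]:
    """Read edges back, skipping the header row if present."""
    reader = csv.reader(f, delimiter=delimiter)
    if has_header:
        next(reader, None)  # e.g. SRC,DEST
    return [(int(row[0]), int(row[1])) for row in reader if row]


buf = io.StringIO()
write_csv_edges([(1, 2), (2, 1), (1, 3)], buf)
print(buf.getvalue(), end="")
# SRC,DEST
# 1,2
# 2,1
# 1,3
```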

.v

This format is used to represent vertices. It consists of a single integer per line.

.metis

This is a more complex format, as specified here. The linked website uses the .graph file extension, but I find that ambiguous. Lines that start with % are comments. The first non-comment line is <nvertices> <nedges>, and it is followed by <nvertices> lines, where line i lists the neighbors of vertex i. Vertices are 1-indexed, and each undirected edge is counted once in <nedges>. This format has several extensions for different types of weighted graphs but can only represent undirected graphs. For example,

% Here we have 3 vertices and 2 edges
3 2
2 3
1
1

Note that some implementations (such as the Parallel Boost Graph Library's metis_reader) expect there to be no duplicate edges.
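Here is a sketch of converting a 1-indexed edge list into unweighted METIS text. It treats edges as undirected and merges duplicates, since some readers reject them. Running it on the directed example graph reproduces the METIS example above. Two choices here are assumptions: dropping self-loops and sorting each neighbor list. to_metis is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple


def to_metis(edges: Iterable[Tuple[int, int]], nvertices: int) -> str:
    """Build unweighted METIS text from 1-indexed edges, treated as undirected."""
    neighbors: List[Set[int]] = [set() for _ in range(nvertices + 1)]  # index 0 unused
    for u, v in edges:
        if u == v:
            continue  # assumption: drop self-loops
        neighbors[u].add(v)  # sets merge duplicates and both directions
        neighbors[v].add(u)
    nedges = sum(len(n) for n in neighbors) // 2  # each edge was stored twice
    lines = [f"{nvertices} {nedges}"]
    for i in range(1, nvertices + 1):
        # an isolated vertex yields an empty line
        lines.append(" ".join(str(j) for j in sorted(neighbors[i])))
    return "\n".join(lines) + "\n"


print(to_metis([(1, 2), (2, 1), (1, 3)], 3), end="")
# 3 2
# 2 3
# 1
# 1
```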

.graphml

This file format is XML-based and is described here. Our running example graph is a bit more verbose in this format. It is shown below with the same edge weights as the .wel example.

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns 
        http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="d1" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <edge source="n1" target="n2">
      <data key="d1">0.05</data>
    </edge>
    <edge source="n2" target="n1">
      <data key="d1">0.1</data>
    </edge>
    <edge source="n1" target="n3">
      <data key="d1">1.0</data>
    </edge>
  </graph>
</graphml>
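Python's xml.etree.ElementTree can read a file like the example above. GraphML elements live in the http://graphml.graphdrawing.org/xmlns namespace, so lookups need a namespace map. The data key ids (such as d1) also need to be resolved to attribute names (such as weight). The sketch below assumes a single graph and reads only edge-level data. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d1" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
    <edge source="n1" target="n2"><data key="d1">0.05</data></edge>
    <edge source="n2" target="n1"><data key="d1">0.1</data></edge>
    <edge source="n1" target="n3"><data key="d1">1.0</data></edge>
  </graph>
</graphml>
"""


def read_graphml_edges(data: bytes) -> List[Tuple[str, str, Optional[float]]]:
    """Return (source, target, weight) triples; weight is None if absent."""
    root = ET.fromstring(data)
    # Map edge key ids (e.g. "d1") to attribute names (e.g. "weight").
    keys = {k.get("id"): k.get("attr.name")
            for k in root.findall("g:key", NS) if k.get("for") == "edge"}
    edges = []
    for edge in root.iterfind("g:graph/g:edge", NS):
        attrs = {keys.get(d.get("key"), d.get("key")): d.text
                 for d in edge.findall("g:data", NS)}
        weight = float(attrs["weight"]) if "weight" in attrs else None
        edges.append((edge.get("source"), edge.get("target"), weight))
    return edges


print(read_graphml_edges(GRAPHML.encode("utf-8")))
# [('n1', 'n2', 0.05), ('n2', 'n1', 0.1), ('n1', 'n3', 1.0)]
```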

Pajek

This is the file format used with the Pajek project.

Binary Formats

.roots

This is a binary format output by the Graph500 to store the root vertices used for its BFS searches. It is simply an array of int64_t.
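Because the file is a bare int64_t array, it can be decoded with the struct module. The sketch below assumes little-endian byte order (as on x86) and no header. read_roots is a hypothetical name. To use it, pass the raw bytes of a file opened in binary mode.

```python
import struct
from typing import List


def read_roots(data: bytes) -> List[int]:
    """Decode a flat array of little-endian int64_t values."""
    if len(data) % 8:
        raise ValueError("size is not a multiple of 8 bytes; not an int64_t array")
    count = len(data) // 8
    return list(struct.unpack(f"<{count}q", data))


print(read_roots(struct.pack("<3q", 5, 0, 42)))
# [5, 0, 42]
```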

.graph500

This is the packed_edge file format generated by the Graph500. Even this is ambiguous: it can be either an array of edges with 96 bits per edge (three 32-bit words) or 128 bits per edge (two int64_ts), though by default it is the latter. This is described here.
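The file carries no marker of which layout it uses, and a file size divisible by 48 fits both. The reader therefore has to be told the layout. The 96-bit decoder is based on my reading of packed_edge in the Graph500 reference generator (graph_generator.h). That layout stores v0_low and v1_low as uint32, then a third word whose low and high 16-bit halves hold the sign-extended upper bits of v0 and v1. Verify this against the source you are using. Little-endian byte order is also assumed.

```python
import struct
from typing import List, Tuple


def decode_edges_128(data: bytes) -> List[Tuple[int, int]]:
    """Default layout: each edge is two int64_t values (v0, v1)."""
    if len(data) % 16:
        raise ValueError("size is not a multiple of 16 bytes")
    return [struct.unpack_from("<qq", data, off) for off in range(0, len(data), 16)]


def _sign_extend16(x: int) -> int:
    """Interpret a 16-bit value as signed, like a cast to int16_t."""
    return x - 0x10000 if x & 0x8000 else x


def decode_edges_96(data: bytes) -> List[Tuple[int, int]]:
    """Packed layout (assumed): uint32 v0_low, uint32 v1_low, uint32 high."""
    if len(data) % 12:
        raise ValueError("size is not a multiple of 12 bytes")
    edges = []
    for off in range(0, len(data), 12):
        v0_low, v1_low, high = struct.unpack_from("<III", data, off)
        v0 = v0_low | (_sign_extend16(high & 0xFFFF) << 32)  # low half of high -> v0
        v1 = v1_low | (_sign_extend16(high >> 16) << 32)     # high half of high -> v1
        edges.append((v0, v1))
    return edges


print(decode_edges_128(struct.pack("<6q", 1, 2, 2, 1, 1, 3)))
# [(1, 2), (2, 1), (1, 3)]
```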

.graphmat

This is the binary format used by GraphMat. I don't know its internal layout, but I do know you can convert a 1-indexed (the first vertex is 1, not 0) .el or .wel file into .graphmat using the graph_converter binary built in GraphMat/bin:

graph_converter --selfloops 1 --duplicatededges 1 --inputformat 1 --outputformat 0 --inputheader 0 --outputheader 1  myfile.el myfile.graphmat
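Because the converter expects 1-indexed input, a zero-indexed .el or .wel file has to be shifted first. The sketch below adds 1 to both vertex IDs and copies any remaining columns, such as a weight, unchanged. The helper name is hypothetical, and skipping blank lines is an assumption.

```python
import io
from typing import TextIO


def shift_el_to_one_indexed(src: TextIO, dst: TextIO) -> None:
    """Rewrite a 0-indexed .el/.wel stream as 1-indexed."""
    for line in src:
        fields = line.split()
        if not fields:
            continue  # assumption: drop blank lines
        u, v, *rest = fields  # rest holds the weight for .wel, empty for .el
        dst.write(" ".join([str(int(u) + 1), str(int(v) + 1), *rest]) + "\n")


out = io.StringIO()
shift_el_to_one_indexed(io.StringIO("0 1\n1 0\n0 2 0.5\n"), out)
print(out.getvalue(), end="")
# 1 2
# 2 1
# 1 3 0.5
```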

.gr

This is the format used by Galois. Their distribution includes a relatively complete set of tools for converting to and from .gr.

Variants of .gr

  1. .vgr (binary "void" .gr) - This is used by Galois when the edges are unweighted.

Matrix Formats

These are not covered in depth here, but may be described later. They may be binary or text.

  1. .mtx Matrix Market Format