Load network
Edit network or load file
Run Infomap
Toggle parameters or add arguments
Explore map!
Save result or open in Network Navigator
Load network by dragging & dropping.
Supported formats.
Infomap output will be printed here

Infomap

Infomap is a network clustering algorithm based on the Map equation. For more info, see features.

Looking for Infomap 0.x?
Please read the old documentation.

Infomap Online

Infomap Online is a client-side web application that makes it possible for users to run Infomap without any installation. Infomap runs locally on your computer and uploads no data to any server. We support this solution by compiling Infomap from C++ to JavaScript with Emscripten, which gives a performance penalty compared to the stand-alone version of Infomap.

The code for running Infomap as a web worker in the browser is available as a package on NPM.

Install

Infomap can be installed on your computer in several ways.

We recommend using pip. If you want to compile from source yourself, read the section Compiling from source.

Installing Infomap requires a working gcc or clang compiler. More information can be found under Prerequisites.

Using pip

The easiest way to download Infomap is from the Python Package Index, PyPi.

To install, run

pip install infomap

To upgrade, run

pip install --upgrade infomap

When Infomap is installed, an executable called infomap is available on the command line from any directory.

To use the Infomap Python API, read the the API documentation.

Python 2 support

Since Python 2 has reached End of Life, Infomap now only supports Python 3.

Python 3 can be installed with Miniconda.

Compiling from source

To compile Infomap from source, first download the source code or clone the git repository.

After downloading the source and running make, an executable called Infomap will be available in the source directory.

Download source code

Download Infomap 1.0.11 or check the releases page for all releases.

Unzip the file and compile Infomap by running

unzip infomap-1.0.11.zip
cd infomap-1.0.11
make

Clone the git repository

To download the development version from Github, clone the repository and compile Infomap by running

git clone git@github.com:mapequation/infomap.git
cd infomap
make
Migrating from 0.x to 1.0

We have moved the old master branch to v0.x.

If you have cloned Infomap before v1.0, get the master branch up-to-date by running

git checkout v0.x
git branch -D master
git checkout master

Prerequisites

Infomap requires a working gcc or clang compiler.

Linux

In Ubuntu, for example, the metapackage build-essential installs the compiler as well as related packages. Install it from the terminal with

sudo apt-get install build-essential

macOS

Since Mac OS X 10.9, the standard compiler tools are based on clang, which can be installed with

xcode-select --install

However, the current version lacks OpenMP support for parallelization. While the Makefile automatically skips the -fopenmap compiler flag if the standard compiler is clang, to get support for OpenMP you can manually install a gcc-based compiler. A simple way is to install Homebrew and type, for example, brew install gcc in the terminal.

Windows

We recommend running Infomap in bash on ubuntu on Windows 10. Follow this installation guide or this installation guide to enable the Linux Bash Shell in Windows 10. For example, install git with

sudo apt-get install git

Then install the metapackage build-essential for the compiler and related packages with

sudo apt-get install build-essential

Without Windows 10, you can install MinGW/MSYS for a minimalist development environment. Follow the instructions on MinGW - Getting Started or download the complete MinGW-MSYS Bundle.

Running

If you installed Infomap using pip, usage is

infomap [parameters] network_data out_directory

If you compiled Infomap from source by running make, usage is

./Infomap [parameters] network_data out_directory

To make the compiled from source version of Infomap available in any directory, either add the source directory to your $PATH, or add an alias to your .profile by running

echo "alias Infomap=$PWD/Infomap" >> ~/.profile

in the source directory and restarting your shell.

The optional parameters can be put anywhere. The network_data should point to a valid network file and out_directory to a directory where Infomap should write output files.

If no parameters are provided, Infomap will assume an undirected network and try to partition it hierarchically.

Input formats

Infomap understands different file formats for the input network data, for example a minimal link list format or the standard Pajek format.

Infomap will try to recognize the file format, but the format can be explicitly specified with the flags -i or --input-format followed by a format specifier.

-i link-list

This is a minimal format to describe a network by only specifying a set of links:

Run example#source target [weight]
1 2 1
1 3 1
2 3 2
3 5 0.5

Each line corresponds to the triad source target weight which describes a weighted link between the nodes with specified numbers. The weight can be any non-negative value. If omitted, the link weight will default to 1.

Pajek

-i pajek

The Pajek format specifies both the nodes and the links in two different sections of the file:

Run example# A network in Pajek format
*Vertices 4
1 "1"
2 "2"
3 "3"
4 "4"
*Edges 4
#source target [weight]
1 2 1
1 3 1
1 4 1
2 3 1

The node section must start with *Vertices N and the link section with *Edges N or*Arcs N (case insensitive), where N is the number of nodes or links. The characters within each quotation mark defines the corresponding node name. Weights can be given to nodes by adding a third column with positive numbers, and the list of links are treated the same as for a link list.

Bipartite

-i bipartite

The bipartite format uses the heading *Bipartite N where N is the first node id of the second node type. The bipartite network can be provided both with node names:

Run example# A bipartite network with node names
*Vertices 5
1 "Node 1"
2 "Node 2"
3 "Node 3"
4 "Feature 1"
5 "Feature 2"
# set bipartite start id to 4
*Bipartite 4
1 4 1
1 5 1
2 4 1
2 5 1
3 5 1

and in the link list format:

Run example# A bipartite network in link list format
# set bipartite start id to 4
*Bipartite 4
1 4 1
1 5 1
2 4 1
2 5 1
3 5 1

Multilayer

-i multilayer

In a multilayer network, each physical node can exist in a number of layers, with different link structure for each layer. A general multilayer format follows the Pajek layout, but with the links defined between nodes for each layer:

Run example# A network in a general multilayer format
*Vertices 4
1 "Node 1"
2 "Node 2"
3 "Node 3"
4 "Node 4"
*Multilayer
#layer node layer node [weight]
1 1 1 2 2
1 1 2 2 1
1 2 1 1 1
1 3 2 2 1
2 2 1 3 1
2 3 2 2 1
2 4 2 1 2
2 4 1 2 1
The *Multilayer heading no longer optional.

The weight column is optional. Links without the last column will get weight 1.0 by default.

The multilayer format above gives full control of the dynamics, and no other movements are encoded. However, it is often useful to consider a dynamics in which a random walker moves within a layer and with a given relax rate jumps to another layer without recording this movement, such that the constraints from moving in different layers can be gradually relaxed. The format below explicitely divides the links into two groups, links within layers (intra-layer links) and links between layers (inter-layer links).

For the first group, the second layer column can be omitted. For the second group, the second node can be omitted, as a shorthand for an unrecorded jump between layers. That is, each inter-layer link layer1 node1 layer2 is expanded to weighted multilayer links layer1 node1 layer2 node2, one for each node2 that node1 is connected to in layer2 with weight proportional to the weight of the link between node1 and node2 inlayer2.

In this way, the random walker seamlessly switches to a different layer at a rate proportional to the inter-layer link weight to that layer, and the encoded dynamics correspond to relaxed layer constraints (see the interactive storyboard for illustration).

To define links like this, use the *Intra and *Inter headings:

Run example# A network in multilayer format
*Intra
#layer node node weight
1 1 2 1
1 2 1 1
1 2 3 1
1 3 2 1
1 3 1 1
1 1 3 1
1 2 4 1
1 4 2 1
2 4 5 1
2 5 4 1
2 5 6 1
2 6 5 1
2 6 4 1
2 4 6 1
2 3 6 1
2 6 3 1
*Inter
#layer node layer weight
1 3 2
2 3 1
1 4 2
2 4 1

If no inter links are provided, the inter links will be generated from the intra link structure by relaxing the layer constraints on those links.

States

-i states

The state format describes the exact network used internally by Infomap. It can model both ordinary networks and memory networks of variable order.

Run example# A network in state format
*Vertices 4
1 "PRE"
2 "SCIENCE"
3 "PRL"
4 "BIO"
*States
#state_id physical_id [name]
1 2 "1 2"
2 3 "2 3"
3 2 "4 2"
4 4 "2 4"
*Links
#source_state_id target_state_id
1 2
3 4

The *Vertices section is optional and follows the Pajek format. It is used to name the physical nodes. The *States section describes all internal nodes with the format state_id physical_id [name], where name is optional. The state id is referenced in the *Links section, and the physical id of the node is optionally referenced in the *Vertices section. The *Links section describes all links as source target [weight], where source andtarget references the state id in the *States section.

Output formats

By default, Infomap outputs a .tree-file if no other output formats is specified. To output several outputs in a single run, use -o tree,ftree,clu (see Output parameters).

All output formats start with a standardized heading which includes the Infomap version, the command line arguments, when it was started and completed, and resulting codelength:

# v1.0.11
# ./Infomap network.net . --ftree --clu
# started at 2020-03-04 15:38:35
# completed in 0.114 s
# codelength 2.32073 bits
# relative codelength savings 9.22792%

The relative codelength savings SL is calculated as

SL = 1 − L/L1

where L is the codelength and L1 is the one-level codelength.

The Map format used by the old flash Network Navigator is deprecated.

Physical and state-level output

The map equation for first-order network flows measures the description length of a random walker stepping between physical nodes within and between modules. This principle remains the same also for higher-order network flows, although higher-order models guide the random walker between physical nodes with the help of state nodes. Therefore, extending the map equation to higher-order network flows, including those described by memory, multilayer, and sparse memory networks, is straightforward. The only difference is at the finest level (Figure 1b). State nodes of the same physical node assigned to the same module should share code word, or they would not represent the same object.

For higher-order networks, Infomap writes two files for each of the clu,tree and ftree output option. The extra file has _statesappended to the name before the extension and describes the internal state-level network. In the ordinary output, state nodes within each module are merged if they belong to the same physical node, corresponding to how the Map Equation for higher-order networks only encode observable flow. If state nodes of the same physical node exist in different modules, the node id in the physical output will occur more than once, corresponding to overlapping modules. In the state-level output however, each node only exist in one module.

Physical and state nodes in output
Figure 1. Network flows at different modular levels. Large circles represent physical nodes, small circles represent state nodes, and dashed areas represent modules. (a) Finest modular level with physical nodes for first-order network flows; (b) Finest modular level with physical nodes and state nodes for higher-order network flows; (c) Intermediate level; (d) Coarsest modular level.

Tree

--tree, -o tree

The default output format if no other output is specified. The resulting hierarchy will be written to a file with the extension .tree and corresponds to the best hierarchical partition (shortest description length) of the attempts. The output format has the pattern:

# path flow name node_id
1:1 0.214286 "1" 1
1:2 0.142857 "2" 2
1:3 0.142857 "3" 3
2:1 0.214286 "4" 4
2:2 0.142857 "5" 5
2:3 0.142857 "6" 6

Each row begins with the multilevel module assignments of a node. The module assignments are colon separated from coarse to fine level, and all modules within each level are sorted by the total flow (PageRank) of the nodes they contain.

Further, the integer after the last colon is the rank within the finest-level module, the decimal number is the amount of flow in that node, i.e. the steady state population of random walkers, the part within quotation marks is the node name, and finally, the last integer is the id of the node in the original network file.

FTree

--ftree, -o ftree

The Tree format with an appended section of links with the flow between nodes within the same parent. The file is saved with the extension .ftree, and begins with the node hierarchy formatted as the tree format, followed by the links formatted as:

*Links undirected
#*Links path enterFlow exitFlow numEdges numChildren
*Links root 0 0 2 2
1 2 0.0714286
2 1 0.0714286
*Links 1 0.0714286 0.0714286 6 3
1 2 0.0714286
1 3 0.0714286
2 1 0.0714286
2 3 0.0714286
3 1 0.0714286
3 2 0.0714286
*Links 2 0.0714286 0.0714286 6 3
1 2 0.0714286
1 3 0.0714286
2 1 0.0714286
2 3 0.0714286
3 1 0.0714286
3 2 0.0714286

First is a line stating undirected or directed links. Then each module is identified by the path field, followed by all links between child nodes of that module.

The path is a colon separated path in the tree from the root to finest-level modules. The links are sorted by flow within each module. Links entering or exiting the module are not included, but the flow on those links is aggregated to enterFlow and exitFlow. The number of edges and child nodes within each module is also included in the module header line, as defined in the commented line above.

Clu

--clu, -o clu

A .clu file describes the best partition (shortest description length) of the attempts. By default, it will output the top level module assignments. To specify the module level, use --clu-level <i> where <i> is an integer.

The format has the pattern:

# module level: 1
# id module flow
1 1 0.214286
2 1 0.142857
3 1 0.142857
4 2 0.214286
5 2 0.142857
6 2 0.142857

If the .clu file is used as an input clustering to Infomap, the flow column will not be used and may be omitted.

Parameters

Input

Provide an initial two-level (clu format) or multi-layer (tree format) solution.
Don't run the optimizer. Useful to calculate codelength of provided cluster data or to print non-modular statistics.

Output

Write a tree file with the modular hierarchy. Automatically enabled if no other output is specified.
Write a ftree file with the modular hierarchy including aggregated links between (nested) modules. (Used by Network Navigator)
Write a clu file with the top cluster ids for each node.
Verbose output on the console. Add additional 'v' flags to increase verbosity up to -vvv.
No output on the console.

Algorithm

Optimize a two-level partition of the network.
Specify flow model. Options: undirected, directed, undirdir, outdirdir, rawdir.
Assume directed links. Shorthand for '--flow-model directed'.

Accuracy

A seed (integer) to the random number generator for reproducible results.
Number of outer-most loops to run before picking the best solution.

About

Prints this help message. Use -hh to show advanced options.
Display program version information.

Changelog

1.0.11(2020-03-19)

Fixes

Handle zero weights in intra-layer links

1.0.10(2020-03-17)

Fixes

Handle unassigned nodes after input tree (#119)

1.0.9(2020-03-14)

Fixes

windows compilation error on std::min

1.0.8(2020-03-12)

Fixes

Fix reconstruct physically merged state nodes (#118)
Remove debug output (#117)

1.0.7(2020-03-09)

Fixes

Use *Edges/*Arcs instead of *Links in pajek output
Add header in network and states output
Use node_id instead of id in clu output
js Fix support for all file io in Infomap.js

1.0.6(2020-03-03)

Fixes

python Enable numpy.int64 in link weights (#107)
python Print tree by default with python cli (#106)

1.0.5(2020-03-02)

Fixes

consistently use 1-based indexing for paths and 0 for indexes (#103)
Common parameters should not be advanced (#101)
Fix ftree links since remapping path from 1 (#102)

1.0.4(2020-02-28)

Fixes

js Revert attempted worker blob optimization
Show more

Features

Infomap optimizes The Map equation, which exploits the information-theoretic duality between the problem of compressing data, and the problem of detecting and extracting significant patterns or structures within those data.

Specifically, the map equation is a flow-based method and operates on dynamics on the network.

(Un)weighted (un)directed links

Infomap handles both unweighted and weighted, undirected and directed links.

Two-level and multi-level solutions

Infomap clusters tightly interconnected nodes into modules (two-level clustering) or the optimal number of nested modules (multi-level clustering).

First- and second-order dynamics

Infomap captures flow patterns modeled with both first-order dynamics (as on a conventional network: where flow moves to on the network only depends on where it currently is) and second-order dynamics (where flow moves to on the network both depends on where it currently is and where it just came from). Infomap captures second-order dynamics by performing first-order dynamics on memory nodes, see Memory in network flows and its effects on spreading dynamics and community detection and this interactive storyboard.

Single- and multi-layer networks

Infomap can identify (overlapping) modules in multilayer (multiplex) networks that may not be identified in a single aggregated network or by analyzing the layers separately. See Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems and this interactive storyboard.

Algorithm

The hierarchical map equation measures the per-step average code length necessary to describe a random walker's movements on a network, given a hierarchical network partition, but the challenge is to find the partition that minimizes the description length. Into how many hierarchical levels should a given network be partitioned? How many modules should each level have? And which nodes should be members of which modules?

Below we describe the Infomap algorithm and start with a description of the original two-level clustering of nodes into modules

Two-level Algorithm

The core of the algorithm follows closely the Louvain method: neighboring nodes are joined into modules, which subsequently are joined into supermodules and so on. First, each node is assigned to its own module. Then, in random sequential order, each node is moved to the neighboring module that results in the largest decrease of the map equation. If no move results in a decrease of the map equation, the node stays in its original module. This procedure is repeated, each time in a new random sequential order, until no move generates a decrease of the map equation. Now the network is rebuilt, with the modules of the last level forming the nodes at this level, and, exactly as at the previous level, the nodes are joined into modules. This hierarchical rebuilding of the network is repeated until the map equation cannot be reduced further.

With this algorithm, a fairly good clustering of the network can be found in a very short time. Let us call this the core algorithm and see how it can be improved. The nodes assigned to the same module are forced to move jointly when the network is rebuilt. As a result, what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Because two or more modules that merge together and form one single module when the network is rebuilt can never be separated again in this algorithm, the accuracy can be improved by breaking the modules of the final state of the core algorithm in either of the two following ways:

Submodule movements: First, each cluster is treated as a network on its own and the main algorithm is applied to this network. This procedure generates one or more submodules for each module. Then all submodules are moved back to their respective modules of the previous step. At this stage, with the same partition as in the previous step but with each submodule being freely movable between the modules, the main algorithm is re-applied on the submodules.

Single-node movements: First, each node is re-assigned to be the sole member of its own module, in order to allow for single-node movements. Then all nodes are moved back to their respective modules of the previous step. At this stage, with the same partition as in the previous step but with each single node being freely movable between the modules, the main algorithm is re-applied on the single nodes.

In practice, we repeat the two extensions to the core algorithm in sequence and as long as the clustering is improved. Moreover, we apply the submodule movements recursively. That is, to find the submodules to be moved, the algorithm first splits the submodules into subsubmodules, subsubsubmodules, and so on until no further splits are possible. Finally, because the algorithm is stochastic and fast, we can restart the algorithm from scratch every time the clustering cannot be improved further and the algorithm stops. The implementation is straightforward and, by repeating the search more than once, 100 times or more if possible, the final partition is less likely to correspond to a local minimum. For each iteration, we record the clustering if the description length is shorter than the previously shortest description length.

Multi-level Algorithm

We have generalized our search algorithm for the two-level map equation to recursively search for multilevel solutions. The recursive search operates on a module at any level; this can be all the nodes in the entire network, or a few nodes at the finest level. For a given module, the algorithm first generates submodules if this gives a shorter description length. If not, the recursive search does not go further down this branch. But if adding submodules gives a shorter description length, the algorithm tests if movements within the module can be further compressed by additional index codebooks. Further compression can be achieved both by adding one or more coarser codebooks to compress movements between submodules or by adding one or more finer index codebooks to compress movements within submodules. To test for all combinations, the algorithm calls itself recursively, both operating on the network formed by the submodules and on the networks formed by the nodes within every submodule. In this way, the algorithm successively increases and decreases the depth of different branches of the multilevel code structure in its search for the optimal hierarchical partitioning. For every split of a module into submodules, we use the two-level search algorithm described above.

Feedback

If you have any questions, suggestions or issues regarding the software, please add them to GitHub issues.

References

If you are using the software at mapequation.org in one of your research articles or otherwise want to refer to it, please cite relevant publication or use the following format:

D. Edler, A. Eriksson and M. Rosvall, The MapEquation software package, available online at mapequation.org.