MapEquation

Infomap Online

Network community detection using the Map Equation framework

Infomap
Loading...

Infomap

Infomap is a network clustering algorithm based on the Map Equation :

Infomap Online

Infomap Online is a client-side web application that enables users to run Infomap in the web browser. Your data never leaves your computer; we don't store any data on our servers.

We achieve this by compiling Infomap from C++ to JavaScript , which gives a performance penalty compared to the stand-alone version of Infomap.

If you want to integrate Infomap in your own web application, you can use the Infomap NPM package .

Install

We recommend installing Infomap from the Python Package Index. Upgrades are easy and you get access to the Python API .

Currently, we provide pre-compiled packages for Windows and macOS. If no package is available for your platform and Python version, the code compiles from source.

To install, run

pip install infomap

To upgrade, run

pip install --upgrade infomap

Infomap only supports Python 3

We currently build packages for Python 3.6 to 3.10.

Download binary

If you don't want to install Python, we provide pre-compiled binaries for Windows, Ubuntu and macOS. You can download the binaries from the releases page or use the direct links below. The OpenMP versions require libomp-dev on Ubuntu and libomp on macOS.

OpenMPWithout OpenMP
Windowsinfomap-win.zipinfomap-win-noomp.zip
Ubuntu 18.04infomap-ubuntu.zipinfomap-ubuntu-noomp.zip
macOS 10.15infomap-mac.zipinfomap-mac-noomp.zip

Trusting binaries on macOS

Run spctl --add Infomap and enter your password to add the Infomap binary to GateKeeper's trusted binaries.

Compiling from source

Building Infomap from source requires a working GCC or Clang compiler with support for C++14 and optionally OpenMP.

On Ubuntu and Windows with WSL, install the build-essential and libomp-dev packages.

On macOS, you can install Apple's development tools with xcode-select --install and the Homebrew version of OpenMP with brew install libomp.

We don't currently support building Infomap from source on Windows without WSL. If you don't have WSL, you should use the binary releases or the Python package.

Download source code

Download Infomap source code or check the releases page for all releases.

Unzip the file and compile Infomap by running

unzip infomap.zip && cd infomap
make -j

Clone the git repository

To download the development version from Github , clone the repository and compile Infomap by running

git clone git@github.com:mapequation/infomap.git
cd infomap
make -j

Running

If you installed Infomap using pip, usage is

infomap [params] network outdir

If you downloaded the binary or compiled Infomap from source, usage is

./Infomap [params] network outdir
on Linux and macOS and
.\Infomap [params] network outdir

in the Windows command line. You can also use the --help option to get a list of all parameters.

The optional parameters can be put anywhere. The network should point to a valid network file and outdir to a directory where Infomap should write output files.

If no parameters are provided, Infomap will assume an undirected network and try to partition it hierarchically.

The Map Equation

Infomap optimizes The Map equation , which exploits the information-theoretic duality between finding community structure in networks and minimizing the description length of a random walker's movements on a network. For a given network partition , the map equation specifies the theoretical limit of how concisely we can describe the trajectory of this random walk.

The underlying code structure of the map equation is designed such that the description can be compressed if the network has regions in which the random walker tends to stay for a long time. Therefore, with a random walk as a proxy for real flow, minimizing the map equation over all possible network partitions reveals important aspects of network structure with respect to the dynamics on the network.

To take advantage of the regional structure of the network, one index codebook and module codebooks, one for each module in the network, are used to describe the random walker's movements. The module codebooks have codewords for nodes within each module (and exit codes to leave the module), which are derived from the node visit/exit frequencies of the random walker. The index codebook has codewords for the modules, which are derived from the module switch rates of the random walker. Therefore, the average length of the code describing a step of the random walker is the average length of codewords from the index codebook and the module codebooks weighted by their rates of use:

The two-level average description length for a step of the random walker on a network with nodes partitioned into map with modules. The first term is the average description length of the index codebook and the second term is the average description length of the module codebooks.

The rate at which the index codebook is used. The per-step use rate of the index codebook is given by the total probability that the random walker enters any of the modules.

The frequency-weighted average length of codewords in the index codebook. The entropy of the relative rates to use the module codebooks measures the smallest average codeword length that is theoretically possible.

The rate at which the module codebooks are used. The per-step use rate of the module codebooks is given by the total use rate of the module codebooks. For module , this is given by the fraction of time the random walker spends in module , which is given by the total probability that any node in the module is visited, plus the probability that it exits the module and the exit message is used.

The frequency-weighted average length of codewords in module codebook . The entropy of the relative rates at which the random walker exits module and visits each node in module measures the smallest average codeword length that is theoretically possible.

Features

Infomap is a flow-based method that captures community structures based on the dynamics on the network.

(Un)weighted (un)directed links

Infomap handles both unweighted and weighted, undirected and directed links.

Two-level and multi-level solutions

Infomap clusters tightly interconnected nodes into modules (two-level clustering ) or the optimal number of nested modules (multi-level clustering ).

First- and second-order dynamics

Infomap captures flow patterns modeled with both first-order dynamics (as on a conventional network: where flow moves to on the network only depends on where it currently is) and second-order dynamics (where flow moves to on the network both depends on where it currently is and where it just came from). Infomap captures second-order dynamics by performing first-order dynamics on memory nodes, see Memory in network flows and its effects on spreading dynamics and community detection and this interactive storyboard .

Single- and multi-layer networks

Infomap can identify (overlapping) modules in multilayer (multiplex) networks that may not be identified in a single aggregated network or by analyzing the layers separately. See Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems and this interactive storyboard .

Algorithm

The hierarchical map equation measures the per-step average code length necessary to describe a random walker's movements on a network, given a hierarchical network partition, but the challenge is to find the partition that minimizes the description length. Into how many hierarchical levels should a given network be partitioned? How many modules should each level have? And which nodes should be members of which modules?

Below we describe the Infomap algorithm and start with a description of the original two-level clustering of nodes into modules

Two-level Algorithm

The core of the algorithm follows closely the Louvain method : neighboring nodes are joined into modules, which subsequently are joined into supermodules and so on. First, each node is assigned to its own module. Then, in random sequential order, each node is moved to the neighboring module that results in the largest decrease of the map equation. If no move results in a decrease of the map equation, the node stays in its original module. This procedure is repeated, each time in a new random sequential order, until no move generates a decrease of the map equation. Now the network is rebuilt, with the modules of the last level forming the nodes at this level, and, exactly as at the previous level, the nodes are joined into modules. This hierarchical rebuilding of the network is repeated until the map equation cannot be reduced further.

With this algorithm, a fairly good clustering of the network can be found in a very short time. Let us call this the core algorithm and see how it can be improved. The nodes assigned to the same module are forced to move jointly when the network is rebuilt. As a result, what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Because two or more modules that merge together and form one single module when the network is rebuilt can never be separated again in this algorithm, the accuracy can be improved by breaking the modules of the final state of the core algorithm in either of the two following ways:

Submodule movements:First, each cluster is treated as a network on its own and the main algorithm is applied to this network. This procedure generates one or more submodules for each module. Then all submodules are moved back to their respective modules of the previous step. At this stage, with the same partition as in the previous step but with each submodule being freely movable between the modules, the main algorithm is re-applied on the submodules.

Single-node movements:First, each node is re-assigned to be the sole member of its own module, in order to allow for single-node movements. Then all nodes are moved back to their respective modules of the previous step. At this stage, with the same partition as in the previous step but with each single node being freely movable between the modules, the main algorithm is re-applied on the single nodes.

In practice, we repeat the two extensions to the core algorithm in sequence and as long as the clustering is improved. Moreover, we apply the submodule movements recursively. That is, to find the submodules to be moved, the algorithm first splits the submodules into subsubmodules, subsubsubmodules, and so on until no further splits are possible. Finally, because the algorithm is stochastic and fast, we can restart the algorithm from scratch every time the clustering cannot be improved further and the algorithm stops. The implementation is straightforward and, by repeating the search more than once, 100 times or more if possible, the final partition is less likely to correspond to a local minimum. For each iteration, we record the clustering if the description length is shorter than the previously shortest description length.

Multi-level Algorithm

We have generalized our search algorithm for the two-level map equation to recursively search for multilevel solutions. The recursive search operates on a module at any level; this can be all the nodes in the entire network, or a few nodes at the finest level. For a given module, the algorithm first generates submodules if this gives a shorter description length. If not, the recursive search does not go further down this branch. But if adding submodules gives a shorter description length, the algorithm tests if movements within the module can be further compressed by additional index codebooks. Further compression can be achieved both by adding one or more coarser codebooks to compress movements between submodules or by adding one or more finer index codebooks to compress movements within submodules. To test for all combinations, the algorithm calls itself recursively, both operating on the network formed by the submodules and on the networks formed by the nodes within every submodule. In this way, the algorithm successively increases and decreases the depth of different branches of the multilevel code structure in its search for the optimal hierarchical partitioning. For every split of a module into submodules, we use the two-level search algorithm described above.

Feedback

If you have any questions or suggestions regarding the software, please add them to GitHub Discussions .

Bugs and installation problems can be reported in GitHub Issues .

References

If you are using the software at mapequation.org in one of your research articles or otherwise want to refer to it, please cite relevant publication or use the following format:

D. Edler, A. Eriksson and M. Rosvall, The MapEquation software package, available online at mapequation.org.

© 2022 mapequation.org – Terms