It is kept for reference and is no longer updated.
For the newest version, check Infomap Online or Github.
Here we provide the source code to the Infomap algorithm for detecting communities in large networks with the map equation framework. See also GossipMap, a distributed version for billion-edge directed graphs implemented with GraphLab PowerGraph by Seung-Hee Bae and Bill Howe at University of Washington. We also provide significance clustering code.
            Infomap is a command line software
            written in C++ that runs in Linux, Mac OS X, and Windows with
            gcc installed.
          
            Infomap requires a working
            gcc or clang compiler.
          
            In Ubuntu, for example, the metapackage
            build-essential installs the compiler as well as
            related packages. Install it from the terminal with
          
sudo apt-get install build-essential
          Since Mac OS X 10.9, the standard compiler tools are based on clang, which can be installed with
xcode-select --install
          
            However, the current version lacks OpenMP support for
            parallelization. While the Makefile automatically skips the
            -fopenmap compiler flag if the standard compiler is
            clang, to get support for OpenMP you can manually install a
            gcc-based compiler. A simple way is to install
            Homebrew and type, for
            example, brew install gcc43 in the terminal.
          
            We recommend running Infomap in
            bash on ubuntu on Windows 10. Follow
            this installation guide
            or
            this installation guide
            to enable the Linux Bash Shell in Windows 10. For example, install
            git with sudo apt-get install git. Then
            install the metapackage build-essential for the
            compiler and related packages with
          
sudo apt-get install build-essential
          Without Windows 10, you can install MinGW/MSYS for a minimalist development environment. Follow the instructions on MinGW - Getting Started or download the complete MinGW-MSYS Bundle.
Infomap v0.x code can be downloaded here infomap-0.x.zip .
To extract the zipped archive and compile the source code in a Unix-like environment, open a terminal and type
cd [path/to/Infomap]
unzip infomap-0.x.zip
cd infomap-0.x
make
          
            Substitute [path/to/Infomap] to the folder where
            Infomap was downloaded, such as ~/Downloads. With
            git installed, instead clone the source code with
            git clone -b v0.x https://github.com/mapequation/infomap.git.
            If you installed a custom compiler with, for example,
            brew install gcc43, replace the last command with
            CXX=g++-4.3 make.
          
            Compilation error due to missing parallelization library.
            If there is a compilation error that references
            -lomp or -lpthread, the compiler probably
            can't find the OpenMP library which is used for
            parallelization. To compile without the OpenMP library, type
            make noomp. You can also try to compile with another
            compiler by calling for example CXX=g++-4.3 make.
            Please give us feedback if you have any troubles!
          
Usage: ./Infomap [options] network_data dest
            The optional arguments can be put anywhere, see the
            Options section for the available
            options. The network_data should point to
            a valid network file (see Input section) and
            dest to a directory where the Infomap should write its
            output files.
          
option is provided,
            Infomap will assume an undirected
            network and try to partition it hierarchically.
          
            Run Infomap with the options -h or
            --help to list the options below:
          
-d2N3 is equivalent to
            -d -2 -N 3.
          
            Run Infomap with the options -hh to list the options
            below:
          
Argument types are defined as:
[+] means that the option can be repeated
Makefile and
            src directory.
          mkdir output
./Infomap ninetriangles.net output/ -N 10 --tree --bftree
          
            Here, we first create a new directory where we want the put the
            results, and feed that to
            Infomap together with an input network
            and an option. Now Infomap will try to
            parse the file ninetriangles.net as an
            undirected network, partition it
            hierarchically and write the best result out of
            10 attempts to the output directory, both
            in the human readable .tree format and
            the binary .bftree format, which can be
            loaded into the
            Network Navigator.
          
./Infomap ninetriangles.net output/ -N 10 --directed --two-level --clu --map
          Here, Infomap will treat the network as directed and try to find the optimal two-level partition with respect to the flow. This will also create an output .clu file with the cluster indices, and a .map file with the modular network that can be visualized using the Map Generator.
Multiplex networks describes a set of networks with corresponding nodes but different link structures, and possibly also an inter-network link structure (see multiplex format). If no inter-link data is provided, or if the relax rate is explicitly set, the inter-link structure will be generated by relaxing the layer constraints of the intra-network links.
There are two ways to provide a multiplex network to Infomap:
./Infomap multiplex-network.net output/ -i multiplex
          or
./Infomap network1.net network2.net [... networkN.net] output/
          
            where network1.net to networkN.net are
            ordinary networks in the Pajek or
            link list format.
          
            To see a minimal example on how Infomap can be run as a library,
            check the files example.cpp and
            Makefile in the
            examples/cpp/minimal directory, and run by typing:
          
cd examples/cpp/minimal
make
./example
          
            The included directory
            example/cpp/Infomap-igraph-interface-library/ contains
            an example on how to write a small interface library to simplify
            integration with other network libraries, here
            igraph. Example code to use that is
            included in the examples/cpp/igraph directory.
          
brew install swig.
          To run Infomap from the included python example code, type:
cd examples/python
make
python example-networkx.py
          
            The make command first calls
            make python in the Infomap base directory, which uses
            swig and
            setuptools
            to compile Infomap to a python module. Then that module is copied to
            the current example directory to be easily imported in the included
            python code.
          
            The example-networkx.py code depends on the
            NetworkX module, which can
            be installed by typing pip install networkx using the
            pip
            tool.
          
            To run Infomap from the included Jupyter notebook, first
            compile Infomap to a python module as described above, then install Jupyter by typing
            pip install jupyter using the
            pip
            tool, and finally start the notebook server and open the
            example notebook
            in examples/python by typing:
          
jupyter notebook infomap-examples.ipynb
          Assuming R is installed, you can run Infomap from from the included R example code, type:
cd examples/R
make
R
source("example-minimal.R")
          
            The make command first calls make R in the
            Infomap base directory, which uses
            swig to generate wrapper code for R and
            R CMD SHLIB to compile Infomap as a shared library. The
            library can then be loaded in an R environment by
            calling source("load-infomap.R"), which is
            called by the example files.
          
            The example-igraph.R code gives a more complete
            example, depending on the
            igraph R package to generate a
            network and to plot the communities found by Infomap.
          
Infomap decomposes a network into modules by optimally compressing a description of information flows on the network. To model this flow of information, we release a random walker on the network and encodes its steps and module exits. To tune the modular solution, you can therefore tune the dynamics of the random walker, or the encoding scheme.
            Some relevant options here are --directed,
            --undirdir, --rawdir,
            --teleportation-probability,
            --to-nodes and --include-self-links.
          
            Relevant options are --recorded-teleportation and
            --markov-time. With
            --recorded-teleportation, the solution captures the
            blurring effect of the random teleportation, typically needed on
            directed networks to reach a stationary solution. To minimize this
            artificial effect, we therefore no longer use the option by default.
          
            The --markov-time defines how many steps the random
            walker takes before its position is encoded (default 1). Shorter
            Markov times than 1 mean that the average transition rate of a
            random walker is lower than the encoding rate of its position, such
            that the same node will be encoded multiple times in a row. As a
            result, the map equation will favor more and smaller modules.
            Oppositely, longer Markov times mean that the average transition
            rate is higher than the encoding rate, such that not every node on
            the trajectory will be encoded, and the map equation will favor
            fewer and larger modules. Read more in
            Efficient community detection of network flows for varying Markov
              times and bipartite networks.
          
            Currently, Infomap understands three
            different file formats for the input network data, a minimal
            link list format, the standard
            Pajek format, and a 3-gram format for
            second-order dynamics. Specify the input format to
            Infomap with the -i or
            --input-format flag together with pajek or
            link-list as its argument. If no input format is
            specified, Infomap will try to parse
            the file by this assumption:
          
.net - Pajek format
            .txt -
              Link list format
            0, you
            have to add the flag -z or --zero-based to
            Infomap, otherwise it will assume that
            the node numbering starts from 1.
          A link list is a minimal format to describe a network by only specifying a set of links:
# A network in link list format
1 2 1
1 3 1
2 3 2
3 5 0.5
...
          
            Each line corresponds to the triad
            source target weight which describes a weighted link
            between the nodes with specified numbers. The
            weight can be any non-negative value. If omitted, the
            link weight will default to 1. The first node are
            assumed to start from 1* and the total
            number of nodes will be determined by the maximum node number.
          
The Pajek format specifies both the nodes and the links in two different sections of the file:
# A network in Pajek format
*Vertices 27
1 "1"
2 "2"
3 "3"
4 "4"
...
*Edges 33
1 2 1
1 3 1
1 4 1
2 3 1
...
          
            The node section must start with *Vertices N and the
            link section with *Edges N or
            *Arcs N (case insensitive), where N is the
            number of nodes or links. The characters within each quotation mark
            defines the corresponding node name. Weights can be given to nodes
            by adding a third column with positive numbers, and the list of
            links are treated the same as for a
            link list.
          
            The 3-gram format contains information about second-order dynamics
            and is similar to the pajek format but
            the links are defined by at least three columns,
            from through to [weight]:
          
# A network in 3-gram format
*Vertices 27
1 "1"
2 "2"
3 "3"
4 "4"
...
*3grams 33
1 1 2 1
2 1 2 1
3 1 2 1
1 2 1 1
...
          
            The node section format follows exactly the Pajek format, but the
            link section must begin with *3grams N or
            *Arcs N (case insensitive), where N is the
            number of links. Because pathways are inherently directed, the links
            are assumed to be directed by default.
          
            You can mix the trigrams with bigrams, i.e. ordinary links, by
            setting the first column to -1. Infomap will then try
            to match those links with the existing trigrams to spread out the
            weight proportinally to the weigh of the matched links.
          
            There are three ways to match an incomplete link
            -1 2 3, in priority order:
          
x 2 3 -> x 2 3x 2 y -> x 2 3x y 2 -> y 2 3
            If no matches are found, a self-link 2 2 3 is created
            if self-links are allowed.
          
In a multiplex network, each physical node can exist in a number of layers, with different link structure for each layer. A general multiplex format follows the Pajek layout, but with the links defined between nodes for each layer:
# A network in a general multiplex format
*Vertices 4
1 "Node 1"
2 "Node 2"
3 "Node 3"
4 "Node 4"
*Multiplex
# layer node layer node [weight]
1 1 1 2 2
1 1 2 2 1
1 2 1 1 1
1 3 2 2 1
2 2 1 3 1
2 3 2 2 1
2 4 2 1 2
2 4 1 2 1
          
            The weight column is optional. Links without the last
            column will get weight 1.0 by default.
          
            If no heading (beginning with *) is given, the
            multiplex format assumes *Multiplex link format:
          
# A network in a general multiplex format
# layer node layer node [weight]
1 1 1 2 2
1 1 2 2 1
1 2 1 1 1
1 3 2 2 1
2 2 1 3 1
2 3 2 2 1
2 4 2 1 2
2 4 1 2 1
          
            The multiplex format above gives full control of the dynamics, and
            no other movements are encoded. However, it is often useful to
            consider a dynamics in which a random walker moves within a layer
            and with a given relax rate jumps to another layer without recording
            this movement, such that the constraints from moving in different
            layers can be gradually relaxed. The format below explicitely
            divides the links into two groups, links within layers
            (intra-layer links) and links between layers (inter-layer
            links). For the first group, the second layer column can be omitted.
            For the second group, the second node can be omitted, as a shorthand
            for an unrecorded jump between layers. That is, each
            inter-layer link layer1 node1 layer2 is expanded to
            weighted multiplex links layer1 node1 layer2 node2, one
            for each node2 that node1 is connected to
            in layer2 with weight proportional to the weight of the
            link between node1 and node2 in
            layer2. In this way, the random walker seamlessly
            switches to a different layer at a rate proportional to the
            inter-layer link weight to that layer, and the encoded dynamics
            correspond to relaxed layer constraints (see the
            interactive storyboard
            for illustration). To define links like this, use the
            *Intra and *Inter headings:
          
# A network in multiplex format
*Intra
# layer node node weight
1 1 2 1
1 2 1 1
1 2 3 1
1 3 2 1
1 3 1 1
1 1 3 1
1 2 4 1
1 4 2 1
2 4 5 1
2 5 4 1
2 5 6 1
2 6 5 1
2 6 4 1
2 4 6 1
2 3 6 1
2 6 3 1
*Inter
# layer node layer weight
1 3 2
2 3 1
1 4 2
2 4 1
          If no inter links are provided, the inter links will be generated from the intra link structure by relaxing the layer constraints on those links.
            The bipartite network format uses prefixes f and
            n for features and nodes, respectively. The bipartite
            network can be provided both with node names:
          
# A bipartite network with node names
*Vertices 5
1 "Node 1"
2 "Node 2"
3 "Node 3"
4 "Node 4"
5 "Node 5"
*Edges 6
# feature node [weight]
f1 n1 2
f1 n2 2
f1 n3 1
f2 n3 1
f2 n4 2
f2 n5 2
          and in the link list format:
# A bipartite network in link list format
# feature node [weight]
f1 n1 2
f1 n2 2
f1 n3 1
f2 n3 1
f2 n4 2
f2 n5 2
          
            Node names can be provided under f and
            n for features and nodes, respectively. In the link
            list format, for example, it takes the form:
          
The state format describes the exact network used internally by Infomap. It can model both ordinary networks and memory networks of variable order as illustrated in this notebook.
# A network in state format
*Vertices 4
1 "PRE"
2 "SCIENCE"
3 "PRL"
4 "BIO"
*States
1 2 "1 2"
2 3 "2 3"
3 2 "4 2"
4 4 "2 4"
*Links
1 2
3 4
          
            The *Vertices section is optional and follows the
            Pajek format. It is used to name the
            physical nodes. The *States section describes all
            internal nodes with the format
            uniqueId physicalId [name], where name is
            optional. The unique index is referenced in the
            *Links section, and the physical index of the node is
            optionally referenced in the *Vertices section. The
            *Links section describes all links as
            from to [weight], where from and
            to references the first index in the
            *States section.
          
The example state network above is equivalent to the 3-gram network below:
# A 3-gram network equivalent to the state network above
*Vertices 4
1 "PRE"
2 "SCIENCE"
3 "PRL"
4 "BIO"
*3grams
1 2 3
4 2 4
          
            The resulting hierarchy will be written to a file with the extension
            .tree (plain text file) and corresponds to the best
            hierarchical partition (shortest description length) of the
            attempts. The output format has the pattern:
          
# Codelength = 3.48419 bits.
1:1:1 0.0384615 "7" 6
1:1:2 0.0384615 "8" 7
1:1:3 0.0384615 "9" 8
1:2:1 0.0384615 "4" 3
1:2:2 0.0384615 "5" 4
...
          Each row (except the first one, which summarizes the result) begins with the multilevel module assignments of a node. The module assignments are colon separated from coarse to fine level, and all modules within each level are sorted by the total flow (PageRank) of the nodes they contain. Further, the integer after the last colon is the rank within the finest-level module, the decimal number is the amount of flow in that node, i.e. the steady state population of random walkers, the content within quotation marks is the node name, and finally, the last integer is the index of the node in the original network file.
            The Tree format with an appended section
            of links with the flow between nodes within the same parent. The
            file is saved with the extension .ftree (plain text
            file), and begins with the node hierarchy formatted as the
            Tree format, followed by the links
            formatted as:
          
*Links undirected
#*Links path exitFlow numEdges numChildren
*Links root 0.0 6 5
1 2 0.0128205
1 3 0.0128205
2 4 0.0128205
3 4 0.0128205
3 5 0.0128205
4 5 0.0128205
*Links 1 0.025641 3 3
1 2 0.0128205
1 3 0.0128205
2 3 0.0128205
*Links 1:1 0.0384615 3 3
1 2 0.0128205
1 3 0.0128205
2 3 0.0128205
*Links 1:2 0.0384615 3 3
1 2 0.0128205
1 3 0.0128205
2 3 0.0128205
...
          
            First is a line stating undirected or
            directed links. Then each module is identified by the
            path field, followed by all links between child nodes
            of that module. The path is a colon separated path in
            the tree from the root to finest-level modules. The
            links are sorted by flow within each module. Links exiting the
            module are not included, but the flow on those links is aggregated
            to exitFlow. The number of edges and child nodes within
            each module is also included in the module header line, as defined
            in the commented line above. (Lines starting with # are
            treated as comments and are ignored when parsed.)
          
            A .map file (plain text file) describes the best
            two-level partition (shortest description length) of the
            attempts. The format has the pattern:
          
# modules: 4
# modulelinks: 4
# nodes: 16
# links: 20
# codelength: 3.32773
*Directed
*Modules 4
1 "Node 13,..." 0.25 0.0395432
2 "Node 5,..." 0.25 0.0395432
3 "Node 9,..." 0.25 0.0395432
4 "Node 1,..." 0.25 0.0395432
*Nodes 16
1:1 "Node 13" 0.0820133
1:2 "Node 14" 0.0790863
1:3 "Node 16" 0.0459137
1:4 "Node 15" 0.0429867
2:1 "Node 5" 0.0820133
2:2 "Node 6" 0.0790863
2:3 "Node 8" 0.0459137
2:4 "Node 7" 0.0429867
3:1 "Node 9" 0.0820133
3:2 "Node 10" 0.0790863
3:3 "Node 12" 0.0459137
3:4 "Node 11" 0.0429867
4:1 "Node 1" 0.0820133
4:2 "Node 2" 0.0790863
4:3 "Node 4" 0.0459137
4:4 "Node 3" 0.0429867
*Links 4
1 4 0.0395432
2 3 0.0395432
3 1 0.0395432
4 2 0.0395432
...
          
            This file contains the necessary information to generate a visual
            map in the Map Generator. The
            names under *Modules are derived from the node with the
            highest flow volume within the module and
            0.25 0.0395432 represent, respectively, the aggregated
            flow volume of all nodes within the module and the per step exit
            flow from the module. The nodes are listed with their module
            assignments together with their flow volumes. Finally, all links
            between the modules are listed in order from high flow to low flow.
          
            A .clu file (plain text file) describes the best
            two-level partition (shortest description length) of the
            attempts. The format has the pattern:
          
# node cluster flow:
4 1 0.4
3 1 0.1
1 2 0.2
2 2 0.1
5 2 0.05
8 2 0.05
7 3 0.05
6 3 0.05
          
            If the .clu file is used as an input clustering to
            Infomap or the apps, the
            flow column will not be used and may be omitted. Even
            the node column may be omitted, in which case it
            assumes an implicit column of naturally ordered integers starting
            from one.
          
            The resulting hierarchy can be written to a compact binary format
            using the --btree or --bftree flag. The
            former includes only the tree, while the latter also includes the
            horizontal flow links between nodes on each level of the
            tree. The data is written in a breath-first pattern to be
            streamable, which the
            Network Navigator utilizes
            to be able to show the top modular structure of the network before
            the deeper parts of the tree are loaded.
          
Don't hesitate to contact us if you have questions or comments about the code.
Infomap optimizes the map equation, which exploits the information-theoretic duality between the problem of compressing data, and the problem of detecting and extracting significant patterns or structures within those data.
Specifically, the map equation is a flow-based method and operates on dynamics on the network.
Infomap handles both unweighted and weighted, undirected and directed links.
Infomap clusters tightly interconnected nodes into modules (two-level clustering) or the optimal number of nested modules (multi-level clustering).
Infomap captures flow patterns modeled with both first-order dynamics (as on a conventional network: where flow moves to on the network only depends on where it currently is) and second-order dynamics (where flow moves to on the network both depends on where it currently is and where it just came from). Infomap captures second-order dynamics by performing first-order dynamics on memory nodes, see Memory in network flows and its effects on spreading dynamics and community detection and this interactive storyboard.
Infomap can identify overlapping modules by first representing the dynamics with memory nodes and then clustering the memory nodes in hard partitions.).
Infomap can identify (overlapping) modules in multilayer (multiplex) networks that may not be identified in a single aggregated network or by analyzing the layers separately. See Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems and this interactive storyboard.
The speed and accuracy of Infomap is visualized below, compared to the Louvain method, on generated benchmark networks described by Andrea Lancichinetti et al.
The hierarchical map equation measures the per-step average code length necessary to describe a random walker's movements on a network, given a hierarchical network partition, but the challenge is to find the partition that minimizes the description length. Into how many hierarchical levels should a given network be partitioned? How many modules should each level have? And which nodes should be members of which modules?
Below we describe the Infomap algorithm and start with a description of the original two-level clustering of nodes into modules
The core of the algorithm follows closely the Louvain method: neighboring nodes are joined into modules, which subsequently are joined into supermodules and so on. First, each node is assigned to its own module. Then, in random sequential order, each node is moved to the neighboring module that results in the largest decrease of the map equation. If no move results in a decrease of the map equation, the node stays in its original module. This procedure is repeated, each time in a new random sequential order, until no move generates a decrease of the map equation. Now the network is rebuilt, with the modules of the last level forming the nodes at this level, and, exactly as at the previous level, the nodes are joined into modules. This hierarchical rebuilding of the network is repeated until the map equation cannot be reduced further.
With this algorithm, a fairly good clustering of the network can be found in a very short time. Let us call this the core algorithm and see how it can be improved. The nodes assigned to the same module are forced to move jointly when the network is rebuilt. As a result, what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Because two or more modules that merge together and form one single module when the network is rebuilt can never be separated again in this algorithm, the accuracy can be improved by breaking the modules of the final state of the core algorithm in either of the two following ways:
In practice, we repeat the two extensions to the core algorithm in sequence and as long as the clustering is improved. Moreover, we apply the submodule movements recursively. That is, to find the submodules to be moved, the algorithm first splits the submodules into subsubmodules, subsubsubmodules, and so on until no further splits are possible. Finally, because the algorithm is stochastic and fast, we can restart the algorithm from scratch every time the clustering cannot be improved further and the algorithm stops. The implementation is straightforward and, by repeating the search more than once, 100 times or more if possible, the final partition is less likely to correspond to a local minimum. For each iteration, we record the clustering if the description length is shorter than the previously shortest description length.
We have generalized our search algorithm for the two-level map equation to recursively search for multilevel solutions. The recursive search operates on a module at any level; this can be all the nodes in the entire network, or a few nodes at the finest level. For a given module, the algorithm first generates submodules if this gives a shorter description length. If not, the recursive search does not go further down this branch. But if adding submodules gives a shorter description length, the algorithm tests if movements within the module can be further compressed by additional index codebooks. Further compression can be achieved both by adding one or more coarser codebooks to compress movements between submodules or by adding one or more finer index codebooks to compress movements within submodules. To test for all combinations, the algorithm calls itself recursively, both operating on the network formed by the submodules and on the networks formed by the nodes within every submodule. In this way, the algorithm successively increases and decreases the depth of different branches of the multilevel code structure in its search for the optimal hierarchical partitioning. For every split of a module into submodules, we use the two-level search algorithm described above.