Research framework

Infomap maps networks by treating flow as a lens

A network representation defines what can move, persist, or be described. Infomap maps where that flow is retained, where it crosses boundaries, and which modular description captures the organization implied by the chosen lens. The resulting communities are meaningful relative to that lens, not as a universal property of the network.

Flow as a lens

From research question to map

The scientific logic is a modeling chain: the research question motivates the network representation, the representation guides the flow model, and Infomap maps where that flow is retained. Each step can be adapted to the problem at hand, and the map should be interpreted through that chain.

Modeling and mapping flow with the map equation framework
The map equation framework in three steps. A network representation (left) is chosen for the type of interaction — pairwise, multi-mode, multi-step, or multi-body. A random-walk model (middle) approximates the flow on that representation. Minimizing the map equation reveals flow modules (right) where a random walker tends to stay before moving on.

Working definition

On this page, a community is a region where flow is retained under the chosen network model: a region the random walker tends to stay inside before moving on.

01

Research question

Start with what the network should explain: movement, attention, similarity, interaction, influence, or another process on the system.

02

Network representation

Choose the nodes, links, weights, directions, layers, state nodes, or bipartite structure that preserve the signal you want to map.

03

Flow model

The representation induces the flow lens: how a random walker or higher-order process can move through the network.

04

Retained-flow communities

Infomap looks for regions where that flow tends to remain before moving elsewhere; those regions are the modules of the map.

05

Interpretation

Interpret the map in terms of the research question and the modeling choices that produced it.
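
As a minimal sketch of steps 02 to 04, assume a small undirected example network: two triangles joined by a single link, invented here for illustration. The code builds random-walk transition probabilities from the links, estimates the stationary flow by power iteration, and measures how much of a candidate module's flow is still inside it one step later. The function names are hypothetical, and this is not Infomap's implementation.

```python
from collections import defaultdict

# Hypothetical example network: two triangles joined by the link 2-3.
LINKS = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]

def transition_probs(links):
    """Undirected walk: leave a node along a link with probability proportional to its weight."""
    weight = defaultdict(lambda: defaultdict(float))
    for u, v, w in links:
        weight[u][v] += w
        weight[v][u] += w
    return {u: {v: w / sum(nbrs.values()) for v, w in nbrs.items()}
            for u, nbrs in weight.items()}

def stationary_flow(P, steps=200):
    """Power iteration: push the visit probabilities along the links until they settle."""
    nodes = sorted(P)
    flow = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(steps):
        nxt = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            for v, t in P[u].items():
                nxt[v] += flow[u] * t
        flow = nxt
    return flow

def retained_fraction(P, flow, module):
    """Share of the module's flow that is still inside the module after one step."""
    inside = sum(flow[u] for u in module)
    stay = sum(flow[u] * t for u in module for v, t in P[u].items() if v in module)
    return stay / inside

P = transition_probs(LINKS)
flow = stationary_flow(P)
print(round(retained_fraction(P, flow, {0, 1, 2}), 3))  # 0.857, i.e. 6/7
```

In this toy case a walker inside one triangle stays there on six of every seven steps, which is what makes the two triangles candidate modules under this particular lens.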

Modeling choice

Choosing the flow lens

Flow is a modeling choice. It can be measured directly in some systems and induced by the network representation in others. The four typical setups below differ in where flow comes from and how strongly the data constrains the lens.

Observed flow

Traffic, mobility, transactions, web navigation, messages, or other measured transitions through a system.

Topology-induced flow

Citation networks, dependencies, biological interactions, hyperlink structures, and other networks where topology induces a walk.

Weighted similarity or correlation flow

Proximity, affinity, co-expression, molecular similarity, and other weighted networks where flow probes the weighted topology. These call for extra care around noise and regularization.

Specified model flow

Use synthetic, generative, or benchmark networks when the research question is how modular organization appears under a known or controlled network model.

Models

Network models refine what the lens can see

The previous section asked what kind of process drives flow. The network model decides how that process is encoded: direction, weights, node types, layers, memory, scale, and regularization each change what the random walker can do and which structure becomes visible.

Directed and weighted links

Use direction and weight to define what can move, which paths are available, and how strongly links guide the flow lens.

Link-list formats

Hierarchical modules

Use hierarchical output and Markov time to explore coarser or finer flow maps when the research question concerns structure across scales.

Tree outputs

Memory and state networks

Use state nodes when the flow lens should depend on previous steps, context, layer, or another hidden state.

State input

Multilayer networks

Use multilayer input when the lens should preserve time, mode, layer, or context while connecting shared physical nodes.

Multilayer input

Bipartite networks

Use bipartite input when flow should alternate between two node types and a one-mode projection would distort the modeled process.

Bipartite input

Regularized map equation

Use regularization to add a Bayesian prior over missing transition rates and reduce overfitting when the network is sparse, noisy, or incomplete.

Workbench parameters
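
For the directed and weighted case, the following is a minimal sketch of how direction, weight, and teleportation together shape the stationary flow. It assumes a PageRank-style walker that teleports to a uniformly chosen node with probability 0.15 and also jumps uniformly from dead ends. The example links, the function name, and these choices are illustrative assumptions; Infomap's own teleportation options may differ.

```python
# Hypothetical directed, weighted links (source, target, weight).
# Node 3 has no outgoing links, so it is a dead end.
LINKS = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 1.0), (2, 3, 1.0)]

def teleporting_flow(links, teleport=0.15, steps=200):
    """Stationary flow of a walker that follows weighted out-links and sometimes teleports."""
    nodes = sorted({n for u, v, _ in links for n in (u, v)})
    out_weight = dict.fromkeys(nodes, 0.0)
    for u, _, w in links:
        out_weight[u] += w
    flow = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(steps):
        # Flow sitting on dead ends is redistributed uniformly (an assumption of this sketch).
        dangling = sum(flow[u] for u in nodes if out_weight[u] == 0.0)
        jump = (teleport + (1.0 - teleport) * dangling) / len(nodes)
        nxt = dict.fromkeys(nodes, jump)
        for u, v, w in links:
            nxt[v] += (1.0 - teleport) * flow[u] * w / out_weight[u]
        flow = nxt
    return flow

flow = teleporting_flow(LINKS)
for node, p in flow.items():
    print(node, round(p, 3))
```

Reversing a link or changing a weight changes the transition probabilities and therefore the flow that any later partition is scored against.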

Communities

Communities are regions of retained flow

Communities under this definition are not the same as dense node sets. A module is useful when flow circulates inside it long enough to make the boundary informative, regardless of how many edges connect the nodes.

For similarity, correlation, or affinity networks, flow probes the weighted topology — it does not imply that expression, similarity, or influence physically moves between nodes. The walker is a measurement device, not a claim about the system.

01

Not just density

A dense subgraph is not automatically a module. What matters is whether flow circulates inside the region long enough to make a boundary useful.

02

Boundaries matter

A module boundary is useful when flow can spend time inside a region before crossing to another region.

03

Lingering implies compressibility

When flow lingers inside a region, that region can be described more efficiently with a local codebook — the bridge to the map equation that follows.

Compression

Good maps compress what matters

If flow lingers in a region, naming it locally is cheaper than naming it globally. Retained flow and short codelength are the same property seen from two angles.

The map equation makes this concrete by scoring how efficiently a partition describes movement. When flow stays within modules, local codebooks reuse short names inside each module and only switch context when flow crosses a boundary. Short codelength means the map captures regularities in the modeled flow.

01

Global names are expensive

Naming every node from one global codebook is simple, but it misses repeated local structure when flow is regional.

02

Local names reuse context

Modules get local codebooks. Node names can be reused inside modules, and exit codes mark when flow changes context.

03

A good map is conditional

Codelengths are comparable only between maps of the same flow model. Among those maps, a shorter codelength rewards partitions that capture retained flow.

The same random walker on two views of the network. Without modules (left), every node needs its own global codeword, so the average codelength L₁ stays high. With modules (right), the walker reuses short codewords inside each module and only spends extra bits on enter and exit codes when it crosses a boundary; when flow is retained inside modules, L drops. The walker also teleports to a random node with a small probability so it can escape dead ends and explore disconnected parts of the network — the same teleport that defines the stationary flow used by the map equation.
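
The comparison in the figure can be reproduced numerically on a toy case. The sketch below assumes an undirected network, where the stationary flow is proportional to node strength and no teleportation is needed, and evaluates the two-level codelength in its expanded form. The example network (two triangles joined by one link) and the function names are hypothetical, not part of Infomap.

```python
from math import log2

# Hypothetical example network: two triangles joined by the link 2-3.
LINKS = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]

def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0.0 else 0.0

def codelength(links, module_of):
    """Two-level codelength in bits per step for an undirected weighted network."""
    total = 2.0 * sum(w for _, _, w in links)
    node_flow, module_flow, exit_flow = {}, {}, {}
    for u, v, w in links:
        for a, b in ((u, v), (v, u)):
            # Each link carries flow w / total in each direction.
            node_flow[a] = node_flow.get(a, 0.0) + w / total
            if module_of[a] != module_of[b]:
                m = module_of[a]
                exit_flow[m] = exit_flow.get(m, 0.0) + w / total
    for node, p in node_flow.items():
        m = module_of[node]
        module_flow[m] = module_flow.get(m, 0.0) + p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2.0 * sum(plogp(x) for x in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(m, 0.0) + f) for m, f in module_flow.items()))

one_module = {n: 0 for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(codelength(LINKS, one_module), 3))   # 2.557
print(round(codelength(LINKS, two_modules), 3))  # 2.321
```

With a single module the codelength reduces to the entropy of the node visit rates. Splitting the network into the two triangles pays a small price for enter and exit codes but shortens the names used inside each module, so the total drops.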

Formula

The map equation scores a partition

The map equation is an information-theoretic objective: the expected codelength of describing a random walk with modular codebooks. It connects retained-flow communities to how efficiently the chosen flow model can be described.

Description length versus model complexity for partitions of a network
Solution landscape across partitions of varying model complexity. The number of modules grows from left to right; colors mark module assignments and the numbers approximate the description length in bits per step. The shortest codelength balances model complexity against the regularities the partition captures — here the four-module partition at 3.1 bits.

The entropy terms come from source coding: common events can have shorter codewords than rare events. A good partition makes codebook use predictable by matching module boundaries to retained flow.

The expected per-step description length for a random walk under partition M.

The cost of using an index codebook when flow moves between modules.

The cost of using module i's local codebook to describe node visits and exits.
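
In the standard two-level notation, the three descriptions above correspond to the following terms. The symbol names follow the usual convention and are assumed here: m is the number of modules in partition M, q_{i↷} is the rate at which the walker exits module i, and p_α is the stationary visit rate of node α.

```latex
\begin{align*}
L(\mathsf{M}) &= q_{\curvearrowright} H(\mathcal{Q})
  + \sum_{i=1}^{m} p_{\circlearrowright}^{i} H(\mathcal{P}^{i}) \\[4pt]
q_{\curvearrowright} H(\mathcal{Q}) &=
  -\sum_{i=1}^{m} q_{i\curvearrowright}
  \log_2 \frac{q_{i\curvearrowright}}{q_{\curvearrowright}},
  \qquad q_{\curvearrowright} = \sum_{i=1}^{m} q_{i\curvearrowright} \\[4pt]
p_{\circlearrowright}^{i} H(\mathcal{P}^{i}) &=
  -q_{i\curvearrowright} \log_2 \frac{q_{i\curvearrowright}}{p_{\circlearrowright}^{i}}
  -\sum_{\alpha \in i} p_{\alpha} \log_2 \frac{p_{\alpha}}{p_{\circlearrowright}^{i}},
  \qquad p_{\circlearrowright}^{i} = q_{i\curvearrowright} + \sum_{\alpha \in i} p_{\alpha}
\end{align*}
```

The first line is the per-step description length, the second is the index codebook term weighted by how often flow moves between modules, and the third is the term for module i's local codebook, weighted by how often that codebook is used.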

Infomap searches for the map whose modular code gives the shortest useful description for the chosen flow model.

Optimization

How Infomap searches after the model is defined

Once the flow model and objective are defined, Infomap searches for a map with a short description under that model. The algorithmic question is practical: given this lens, which partition and hierarchy best compress the flow?

STEP 1

Start from singletons

The two-level phase begins with each node in its own module, a high-codelength partition where almost every step crosses modules.

STEP 2

Move nodes greedily

In random order, Infomap tests moves to neighboring modules or a new singleton module and accepts moves that reduce codelength.

STEP 3

Move modules as units

Found modules are aggregated and moved together, repeating the same codelength-reducing logic at coarser scales.

STEP 4

Fine tune nodes

Fine-tuning revisits individual nodes inside the current partition and keeps only moves that further reduce codelength.

STEP 5

Coarse tune groups

Coarse-tuning splits modules into submodules and tests moving those groups to avoid local minima.

STEP 6

Add levels when useful

The multilevel phase adds extra index codebooks only when a deeper hierarchy further compresses inter-module flow.

STEP 7

Repeat trials

Because the solution landscape is non-convex, multiple trials with different random seeds help find shorter codelengths.

The best result across trials and refinements is judged by its codelength and then interpreted relative to the representation, parameters, and research question.
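
Steps 1, 2, and 7 can be sketched in a few lines. The code below is a simplified illustration, not Infomap's algorithm: it recomputes the full codelength for every candidate move, only tries moves into neighboring modules, and omits module aggregation, fine and coarse tuning, and the multilevel phase. It assumes an undirected network, and the example links and all function names are hypothetical.

```python
import random
from math import log2

# Hypothetical example network: two triangles joined by the link 2-3.
LINKS = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]

def plogp(x):
    return x * log2(x) if x > 0.0 else 0.0

def codelength(links, module_of):
    """Two-level codelength in bits per step for an undirected weighted network."""
    total = 2.0 * sum(w for _, _, w in links)
    node_flow, module_flow, exit_flow = {}, {}, {}
    for u, v, w in links:
        for a, b in ((u, v), (v, u)):
            node_flow[a] = node_flow.get(a, 0.0) + w / total
            if module_of[a] != module_of[b]:
                m = module_of[a]
                exit_flow[m] = exit_flow.get(m, 0.0) + w / total
    for node, p in node_flow.items():
        m = module_of[node]
        module_flow[m] = module_flow.get(m, 0.0) + p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2.0 * sum(plogp(x) for x in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(m, 0.0) + f) for m, f in module_flow.items()))

def greedy_trial(links, seed):
    """One trial: start from singletons, then accept only codelength-reducing node moves."""
    rng = random.Random(seed)
    neighbors = {}
    for u, v, _ in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    module_of = {n: n for n in nodes}            # Step 1: every node is its own module
    best = codelength(links, module_of)
    improved = True
    while improved:
        improved = False
        order = nodes[:]
        rng.shuffle(order)                       # Step 2: visit nodes in random order
        for n in order:
            home = module_of[n]
            choice, choice_len = home, best
            for target in sorted({module_of[m] for m in neighbors[n]} - {home}):
                module_of[n] = target
                length = codelength(links, module_of)
                if length < choice_len - 1e-12:  # keep the best strictly improving move
                    choice, choice_len = target, length
            module_of[n] = choice
            if choice != home:
                best, improved = choice_len, True
    return best, module_of

def best_of_trials(links, num_trials=10):
    """Step 7: repeat with different seeds and keep the shortest codelength."""
    return min((greedy_trial(links, seed) for seed in range(num_trials)),
               key=lambda result: result[0])

length, modules = best_of_trials(LINKS)
print(round(length, 3), modules)
```

Because every accepted move strictly shortens the codelength, each trial terminates, but it can stop in a local minimum. That is the reason for the additional tuning steps and repeated trials described above.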
