Research framework
Infomap maps networks by treating flow as a lens
A network representation defines what can move, persist, or be described. Infomap maps where that flow is retained, where it crosses boundaries, and which modular description captures the organization implied by the chosen lens. The resulting communities are meaningful relative to that lens, not as a universal property of the network.
Flow as a lens
From research question to map
The scientific logic is a modeling chain: the research question motivates the network representation, the representation guides the flow model, and Infomap maps where that flow is retained. Each step can be adapted to the problem at hand, so the map should be interpreted through the whole chain.
Working definition
On this page, a community is a region where flow is retained under the chosen network model: a region the random walker tends to stay inside before moving on.
01
Research question
Start with what the network should explain: movement, attention, similarity, interaction, influence, or another process on the system.
02
Network representation
Choose the nodes, links, weights, directions, layers, state nodes, or bipartite structure that preserve the signal you want to map.
03
Flow model
The representation induces the flow lens: how a random walker or higher-order process can move through the network.
04
Retained-flow communities
Infomap looks for regions where that flow tends to remain before moving elsewhere; those regions are the modules of the map.
05
Interpretation
Interpret the map in terms of the research question and the modeling choices that produced it.
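To make step 03 concrete, here is one common flow lens for a directed weighted network: a random walk with uniform teleportation, whose node visit rates are found by power iteration. This is a minimal PageRank-style sketch, not Infomap's own implementation. The teleportation probability `alpha = 0.15` and the function name `stationary_flow` are illustrative assumptions.

```python
from collections import defaultdict


def stationary_flow(links, alpha=0.15, tol=1e-12, max_iter=10_000):
    """Visit rates of a random walk with uniform teleportation (PageRank-style sketch).

    links: (source, target, weight) triples of a directed weighted network.
    alpha: teleportation probability (illustrative assumption).
    """
    links = list(links)
    out_weight = defaultdict(float)
    nodes = set()
    for s, t, w in links:
        out_weight[s] += w
        nodes.update((s, t))
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Flow on dangling nodes (no out-links) always teleports.
        dangling = sum(p[v] for v in nodes if out_weight[v] == 0)
        # Teleported flow is spread uniformly over all nodes.
        base = (alpha * (1 - dangling) + dangling) / n
        new = {v: base for v in nodes}
        # The remaining flow follows out-links in proportion to their weight.
        for s, t, w in links:
            new[t] += (1 - alpha) * p[s] * w / out_weight[s]
        if sum(abs(new[v] - p[v]) for v in nodes) < tol:
            return new
        p = new
    return p


rates = stationary_flow([(1, 2, 3.0), (1, 3, 1.0), (2, 1, 1.0), (3, 1, 1.0)])
print(rates)
```

Changing the representation, for example by reversing links, reweighting them, or changing `alpha`, changes these visit rates and therefore the flow that any map can compress.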
Modeling choice
Choosing the flow lens
Flow is a modeling choice. It can be measured directly in some systems and induced by the network representation in others. The four typical setups below differ in where flow comes from and how strongly the data constrains the lens.
Observed flow
Traffic, mobility, transactions, web navigation, messages, or other measured transitions through a system.
Topology-induced flow
Citation networks, dependencies, biological interactions, hyperlink structures, and other networks where topology induces a walk.
Weighted similarity or correlation flow
Proximity, affinity, co-expression, molecular similarity, and other weighted networks where flow probes the weighted topology, with care needed around noise and regularization.
Specified model flow
Use synthetic, generative, or benchmark networks when the research question is how modular organization appears under a known or controlled network model.
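With observed flow, measured transitions can define the flow directly instead of being induced by a random walk on the topology. A minimal sketch, assuming each record is one observed step `(source, target)`. The function name `observed_flow` is hypothetical and not part of Infomap's API.

```python
from collections import Counter, defaultdict


def observed_flow(transitions):
    """Normalize observed (source, target) steps into link flow and node flow."""
    counts = Counter(transitions)
    total = sum(counts.values())
    # Each link's flow is its share of all observed steps.
    link_flow = {link: c / total for link, c in counts.items()}
    # A node's visit rate is the flow arriving at it.
    node_flow = defaultdict(float)
    for (_, target), f in link_flow.items():
        node_flow[target] += f
    return link_flow, dict(node_flow)


links, nodes = observed_flow([("A", "B"), ("B", "A"), ("A", "B"), ("B", "C")])
print(links[("A", "B")], nodes["B"])  # 0.5 0.5
```

Here the data constrain the lens strongly. In a topology-induced setup, the same links would only say which steps are possible, and the walk model would supply the rates.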
Models
Network models refine what the lens can see
The previous section asked what kind of process drives flow. The network model decides how that process is encoded: direction, weights, node types, layers, memory, scale, and regularization each change what the random walker can do and which structure becomes visible.
Directed and weighted links
Use direction and weight to define what can move, which paths are available, and how strongly links guide the flow lens.
Hierarchical modules
Use hierarchical output and Markov time to explore coarser or finer flow maps when the research question concerns structure across scales.
Memory and state networks
Use state nodes when the flow lens should depend on previous steps, context, layer, or another hidden state.
Multilayer networks
Use multilayer input when the lens should preserve time, mode, layer, or context while connecting shared physical nodes.
Bipartite networks
Use bipartite input when flow should alternate between two node types and a one-mode projection would distort the modeled process.
Regularized map equation
Use regularization to add a Bayesian prior over missing transition rates and reduce overfitting when the network is sparse, noisy, or incomplete.
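For memory and state networks, one common construction is a second-order model. Each state node is a `(previous, current)` pair of physical nodes, and links count observed three-node path segments, so where the walker goes next depends on where it came from. A minimal sketch, assuming paths are given as lists of node names. Both function names are hypothetical.

```python
from collections import Counter


def second_order_state_links(paths):
    """Count transitions between state nodes (previous, current) -> (current, next)."""
    links = Counter()
    for path in paths:
        for a, b, c in zip(path, path[1:], path[2:]):
            links[((a, b), (b, c))] += 1
    return links


def physical_node(state):
    """A state node (previous, current) represents the physical node `current`."""
    return state[1]


links = second_order_state_links([["x", "hub", "y"], ["z", "hub", "w"]])
for (source, target), count in links.items():
    print(source, "->", target, count)
```

In a first-order network built from the same paths, a walker at `hub` could continue from `x` to `w`. The state network only allows the observed continuations, which is how memory changes the flow lens.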
Communities
Communities are regions of retained flow
Communities under this definition are not the same as dense node sets. A module is useful when flow circulates inside it long enough to make the boundary informative, regardless of how many edges connect the nodes.
For similarity, correlation, or affinity networks, flow probes the weighted topology — it does not imply that expression, similarity, or influence physically moves between nodes. The walker is a measurement device, not a claim about the system.
01
Not just density
A dense subgraph is not automatically a module. What matters is whether flow circulates inside the region long enough to make a boundary useful.
02
Boundaries matter
A module boundary is useful when flow can spend time inside a region before crossing to another region.
03
Lingering implies compressibility
When flow lingers inside a region, that region can be described more efficiently with a local codebook — the bridge to the map equation that follows.
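Retained flow can be measured directly. In an undirected weighted network, a random walker's stationary visit rate at a node is proportional to its link strength. That means each module's flow, its exit flow, and the probability that the next step stays inside can all be read off the edge list. A minimal sketch under that undirected assumption. `module_retention` is a hypothetical helper.

```python
from collections import defaultdict


def module_retention(edges, modules):
    """Return {module: (flow, exit_flow, retention)} for an undirected weighted network.

    edges: list of (u, v, weight); modules: dict mapping node -> module label.
    retention = probability that a step starting inside the module stays inside.
    """
    total = 2 * sum(w for _, _, w in edges)  # each edge carries flow in both directions
    flow = defaultdict(float)
    exit_flow = defaultdict(float)
    for u, v, w in edges:
        flow[modules[u]] += w / total
        flow[modules[v]] += w / total
        if modules[u] != modules[v]:
            exit_flow[modules[u]] += w / total  # u -> v leaves u's module
            exit_flow[modules[v]] += w / total  # v -> u leaves v's module
    return {m: (flow[m], exit_flow.get(m, 0.0), 1 - exit_flow.get(m, 0.0) / flow[m])
            for m in flow}


# Two triangles joined by one bridge edge (3, 4).
edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (4, 5, 1), (5, 6, 1), (4, 6, 1), (3, 4, 1)]
print(module_retention(edges, {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}))
```

Each triangle holds half the flow and loses 1/14 per step across the bridge, so 6/7 of steps that start in a triangle stay inside it. This retention, not edge density, is what makes the boundary informative.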
Compression
Good maps compress what matters
If flow lingers in a region, naming it locally is cheaper than naming it globally. Retained flow and short codelength are the same property seen from two angles.
The map equation makes this concrete by scoring how efficiently a partition describes movement. When flow stays within modules, local codebooks reuse short names inside each module and only switch context when flow crosses a boundary. Short codelength means the map captures regularities in the modeled flow.
01
Global names are expensive
Naming every node from one global codebook is simple, but it misses repeated local structure when flow is regional.
02
Local names reuse context
Modules get local codebooks. Node names can be reused inside modules, and exit codes mark when flow changes context.
03
A good map is conditional
Codelength is comparable only across maps of the same flow model. Within that model, it rewards partitions that capture retained flow.
Formula
The map equation scores a partition
The map equation is an information-theoretic objective: the expected codelength of describing a random walk with modular codebooks. It connects retained-flow communities to how efficiently the chosen flow model can be described.
The entropy terms come from source coding: common events can have shorter codewords than rare events. A good partition makes codebook use predictable by matching module boundaries to retained flow.
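$$L(M) = q_{\curvearrowright} H(\mathcal{Q}) + \sum_{i=1}^{m} p_{\circlearrowright}^{i} H(\mathcal{P}^{i})$$
$L(M)$: the expected per-step description length for a random walk under partition M.
$q_{\curvearrowright} H(\mathcal{Q})$: the cost of using the index codebook when flow moves between modules.
$p_{\circlearrowright}^{i} H(\mathcal{P}^{i})$: the cost of using module i's local codebook to describe node visits and exits.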
Infomap searches for the map whose modular code gives the shortest useful description for the chosen flow model.
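The two-level map equation can be evaluated directly for a small undirected network. It has two parts. The index term is the total module exit rate times the entropy of the module exit rates. The module terms add, for each module, the rate at which its codebook is used times the entropy of its exit rate and node visit rates. A minimal sketch for undirected weighted networks only; the function names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(weights):
    """Shannon entropy in bits of the distribution proportional to `weights`."""
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)


def map_equation(edges, modules):
    """Two-level map equation for an undirected weighted network.

    edges: list of (u, v, weight); modules: dict mapping node -> module label.
    """
    total = 2 * sum(w for _, _, w in edges)
    node_flow = defaultdict(float)  # stationary visit rate of each node
    exit_flow = defaultdict(float)  # rate at which flow leaves each module
    for u, v, w in edges:
        node_flow[u] += w / total
        node_flow[v] += w / total
        if modules[u] != modules[v]:
            exit_flow[modules[u]] += w / total
            exit_flow[modules[v]] += w / total
    members = defaultdict(list)
    for node, p in node_flow.items():
        members[modules[node]].append(p)

    # Index codebook: used at total exit rate q, coding switches between modules.
    q = sum(exit_flow.values())
    index_term = q * entropy(list(exit_flow.values())) if q > 0 else 0.0

    # Module codebooks: each codes its node visits plus one exit codeword.
    module_terms = 0.0
    for m, flows in members.items():
        codebook = [exit_flow.get(m, 0.0)] + flows
        module_terms += sum(codebook) * entropy(codebook)
    return index_term + module_terms


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (4, 5, 1), (5, 6, 1), (4, 6, 1), (3, 4, 1)]
one_module = {n: 0 for n in range(1, 7)}
two_modules = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(round(map_equation(edges, one_module), 2),
      round(map_equation(edges, two_modules), 2))  # 2.56 2.32
```

With one module, there are no exits and the codelength reduces to the entropy of the node visit rates, which is the single global codebook. Splitting at the bridge pays a small index cost but shortens the local codebooks, so the two-triangle map describes the same flow in fewer bits per step.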
Optimization
How Infomap searches after the model is defined
Once the flow model and objective are defined, Infomap searches for a map with a short description under that model. The algorithmic question is practical: given this lens, which partition and hierarchy best compress the flow?
STEP 1
Start from singletons
The two-level phase begins with each node in its own module, a high-codelength partition where almost every step crosses modules.
STEP 2
Move nodes greedily
In random order, Infomap tests moves to neighboring modules or a new singleton module and accepts moves that reduce codelength.
STEP 3
Move modules as units
Found modules are aggregated and moved together, repeating the same codelength-reducing logic at coarser scales.
STEP 4
Fine tune nodes
Fine-tuning revisits individual nodes inside the current partition and keeps only moves that further reduce codelength.
STEP 5
Coarse tune groups
Coarse-tuning splits modules into submodules and tests moving those groups to avoid local minima.
STEP 6
Add levels when useful
The multilevel phase adds extra index codebooks only when a deeper hierarchy further compresses inter-module flow.
STEP 7
Repeat trials
Because the solution landscape is non-convex, multiple trials with different random seeds help find shorter codelengths.
Multiple trials and refinements help navigate a non-convex search space. The result is evaluated by codelength and then interpreted relative to the representation, parameters, and research question.
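Steps 1, 2, and 7 can be sketched in a few dozen lines. The sketch starts from singletons, moves single nodes in random order to whichever neighboring module most reduces codelength, repeats sweeps until no move helps, and keeps the best of several seeded trials. It is a simplified illustration for undirected weighted networks. It omits module aggregation, fine-tuning, coarse-tuning, and the multilevel phase. It also recomputes the full codelength, in an algebraically equivalent expanded form of the two-level map equation, for every candidate move, where a practical implementation would use incremental updates. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def plogp(x):
    return x * math.log2(x) if x > 0 else 0.0


def codelength(edges, modules):
    """Two-level map equation (expanded form) for an undirected weighted edge list."""
    total = 2 * sum(w for _, _, w in edges)
    node_flow, exit_flow, module_flow = defaultdict(float), defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        node_flow[u] += w / total
        node_flow[v] += w / total
        if modules[u] != modules[v]:
            exit_flow[modules[u]] += w / total
            exit_flow[modules[v]] += w / total
    for node, p in node_flow.items():
        module_flow[modules[node]] += p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(x) for x in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(m, 0.0) + module_flow[m]) for m in module_flow))


def greedy_trial(edges, seed):
    """One trial: start from singletons, apply best single-node moves until none helps."""
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    modules = {node: node for node in nodes}  # Step 1: every node in its own module
    best = codelength(edges, modules)
    improved = True
    while improved:
        improved = False
        order = nodes[:]
        rng.shuffle(order)  # Step 2: visit nodes in random order
        for node in order:
            current = modules[node]
            target_best, length_best = current, best
            for target in sorted({modules[nb] for nb in neighbors[node]} - {current}):
                modules[node] = target
                trial = codelength(edges, modules)
                if trial < length_best - 1e-12:
                    target_best, length_best = target, trial
            modules[node] = target_best  # keep the best move, or stay put
            if target_best != current:
                best, improved = length_best, True
    return best, modules


def search(edges, trials=5):
    """Step 7: repeat trials with different seeds and keep the shortest codelength."""
    return min((greedy_trial(edges, seed) for seed in range(trials)), key=lambda r: r[0])


edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (4, 5, 1), (5, 6, 1), (4, 6, 1), (3, 4, 1)]
best, modules = search(edges, trials=3)
print(round(best, 3), modules)
```

Because each accepted move strictly lowers the codelength, a trial always terminates, but it can stop in a local minimum. The omitted aggregation and tuning steps, together with repeated trials, are what help the full search escape such minima.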