- 2017 Maps of sparse memory networks reveal overlapping communities in network flows
- 2014 Community detection and visualization of networks with the map equation framework
- 2017 Mapping higher-order network flows in memory and multilayer networks with Infomap
- 2008 Maps of information flow reveal community structure in complex networks
- 2009 The map equation
- 2010 Mapping change in large networks
- 2011 Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems
- 2011 Compression of flow can reveal overlapping modular organization in networks
- 2012 Ranking and clustering of nodes in networks with smart teleportation
- 2014 Memory in network flows and its effects on spreading dynamics and community detection
- 2015 Estimating the resolution limit of the map equation in community detection
- 2015 Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems
- 2016 Efficient community detection of network flows for varying Markov times and bipartite networks
- 2016 Maps of sparse Markov chains efficiently reveal community structure in network flows with memory
- 2017 Infomap Bioregions: Interactive mapping of biogeographical regions from species distributions
- 2018 Constrained information flows in temporal networks reveal intermittent communities
- 2019 Exploring the solution landscape enables more reliable network community detection
- 2019 Map equation with metadata: Varying the role of attributes in community detection
- 2020 Mapping flows on sparse networks with missing links

To comprehend the multipartite organization of large-scale
biological and social systems, we introduce a new
information-theoretic approach to reveal community structure in
weighted and directed networks. The method decomposes a network
into modules by optimally compressing a description of information
flows on the network. The result is a map that both simplifies and
highlights the regularities in the structure and their
relationships to each other. We illustrate the method by making a
map of scientific communication as captured in the citation
patterns of more than 6000 journals. We discover a multicentric
organization with fields that vary dramatically in size and degree
of integration into the network of science. Along the backbone of
the network — which includes physics, chemistry, molecular
biology, and medicine — information flows bidirectionally, but the
map reveals a directional pattern of citation from the applied
fields to the basic sciences.

Many real-world networks are so large that we must simplify their
structure before we can extract useful information about the
systems they represent. As the tools for doing these
simplifications proliferate within the network literature,
researchers would benefit from some guidelines about which of the
so-called community detection algorithms are most appropriate for
the structures they are studying and the questions they are
asking. Here we show that different methods highlight different
aspects of a network's structure and that the the sort of
information that we seek to extract about the system must guide us
in our decision. For example, many community detection algorithms,
including the popular modularity maximization approach, infer
module assignments from an underlying model of the network
formation process. However, we are not always as interested in how
a system's network structure was formed, as we are in how a
network's extant structure influences the system's behavior. To
see how structure influences current behavior, we will recognize
that links in a network induce movement across the network and
result in system-wide interdependence. In doing so, we explicitly
acknowledge that most networks carry flow. To highlight and
simplify the network structure with respect to this flow, we use
the map equation. We present an intuitive derivation of this
flow-based and information-theoretic method and provide an
interactive on-line application that anyone can use to explore the
mechanics of the map equation. The differences between the map
equation and the modularity maximization approach are not merely
conceptual. Because the map equation attends to patterns of flow
on the network and the modularity maximization approach does not,
the two methods can yield dramatically different results for some
network structures. To illustrate this and build our understanding
of each method, we partition several sample networks. We also
describe an algorithm and provide source code to efficiently
decompose large weighted and directed networks based on the map
equation.

Change is the very nature of interaction patterns in biology,
technology, economics, and science itself: The interactions within
and between organisms change; the air, ground, and sea traffic
change; the global financial flow changes; the scientific research
front changes. With increasingly available data, networks and
clustering tools have become important methods used to comprehend
instances of these large-scale structures. But blind to the
difference between noise and trends in the data, these tools alone
must fail when used to study change. Only if we can assign
significance to the partition of single networks can we
distinguish structural changes from fluctuations and assess how
much confidence we should have in the changes. Here we show that
bootstrap resampling accompanied by significance clustering
provides a solution to this problem. We use the significance
clustering to realize de Solla Price's vision of mapping the
change in science.

To comprehend the hierarchical organization of large integrated
systems, we introduce the hierarchical map equation that reveals
multilevel structures in networks. In this information-theoretic
approach, we exploit the duality between compression and pattern
detection; by compressing a description of a random walker as a
proxy for real flow on a network, we find regularities in the
network that induce this system-wide flow. Finding the shortest
multilevel description of the random walker therefore gives us the
best hierarchical clustering of the network — the optimal
number of levels and modular partition at each level — with
respect to the dynamics on the network. With a novel search
algorithm, we extract and illustrate the rich multilevel
organization of several large social and biological networks. For
example, from the global air traffic network we uncover countries
and continents, and from the pattern of scientific communication
we reveal more than 100 scientific fields organized in four major
disciplines: life sciences, physical sciences, ecology and earth
sciences, and social sciences. In general, we find shallow
hierarchical structures in globally interconnected systems, such
as neural networks, and rich multilevel organizations in systems
with highly separated regions, such as road networks.

To better understand the organization of overlapping modules in
large networks with respect to flow, we introduce the map equation
for overlapping modules. In this information-theoretic framework,
we use the correspondence between compression and regularity
detection. The generalized map equation measures how well we can
compress a description of flow in the network when we partition it
into modules with possible overlaps. When we minimize the
generalized map equation over overlapping network partitions, we
detect modules that capture flow and determine which nodes at the
boundaries between modules should be classified in multiple
modules and to what degree. With a novel greedy-search algorithm,
we find that some networks, for example, the neural network of the
nematode Caenorhabditis elegans, are best described by modules
dominated by hard boundaries, but that others, for example, the
sparse European-roads network, have an organization of highly
overlapping modules.

Random teleportation is a necessary evil for ranking and
clustering directed networks based on random walks. Teleportation
enables ergodic solutions, but the solutions must necessarily
depend on the exact implementation and parametrization of the
teleportation. For example, in the commonly used PageRank
algorithm, the teleportation rate must trade off a heavily biased
solution with a uniform solution. Here we show that teleportation
to links rather than nodes enables a much smoother trade-off and
effectively more robust results. We also show that, by not
recording the teleportation steps of the random walker, we can
further reduce the effect of teleportation with dramatic effects
on clustering.

Random walks on networks is the standard tool for modelling
spreading processes in social and biological systems. This
first-order Markov approach is used in conventional community
detection, ranking and spreading analysis, although it ignores a
potentially important feature of the dynamics: where flow moves to
may depend on where it comes from. Here we analyse pathways from
different systems, and although we only observe marginal
consequences for disease spreading, we show that ignoring the
effects of second-order Markov dynamics has important consequences
for community detection, ranking and information spreading. For
example, capturing dynamics with a second-order Markov model
allows us to reveal actual travel patterns in air traffic and to
uncover multidisciplinary journals in scientific communication.
These findings were achieved only by using more available data and
making no additional assumptions, and therefore suggest that
accounting for higher-order memory in network flows can help us
better understand how real systems are organized and function.

A community detection algorithm is considered to have a resolution
limit if the scale of the smallest modules that can be resolved
depends on the size of the analyzed subnetwork. The resolution
limit is known to prevent some community detection algorithms from
accurately identifying the modular structure of a network. In
fact, any global objective function for measuring the quality of a
two-level assignment of nodes into modules must have some sort of
resolution limit or an external resolution parameter. However, it
is yet unknown how the resolution limit affects the so-called map
equation, which is known to be an efficient objective function for
community detection. We derive an analytical estimate and conclude
that the resolution limit of the map equation is set by the total
number of links between modules instead of the total number of
links in the full network as for modularity. This mechanism makes
the resolution limit much less restrictive for the map equation
than for modularity; in practice, it is orders of magnitudes
smaller. Furthermore, we argue that the effect of the resolution
limit often results from shoehorning multilevel modular structures
into two-level descriptions. As we show, the hierarchical map
equation effectively eliminates the resolution limit for networks
with nested multilevel modular structures.

To comprehend interconnected systems across the social and natural
sciences, researchers have developed many powerful methods to
identify functional modules. For example, with interaction data
aggregated into a single network layer, flow-based methods have
proven useful for identifying modular dynamics in weighted and
directed networks that capture constraints on flow processes.
However, many interconnected systems consist of agents or
components that exhibit multiple layers of interactions, possibly
from several different processes. Inevitably, representing this
intricate network of networks as a single aggregated network leads
to information loss and may obscure the actual organization. Here,
we propose a method based on a compression of network flows that
can identify modular flows both within and across layers in
nonaggregated multilayer networks. Our numerical experiments on
synthetic multilayer networks, with some layers originating from
the same interaction process, show that the analysis fails in
aggregated networks or when treating the layers separately,
whereas the multilayer method can accurately identify modules
across layers that originate from the same interaction process. We
capitalize on our findings and reveal the community structure of
two multilayer collaboration networks with topics as layers:
scientists affiliated with the Pierre Auger Observatory and
scientists publishing works on networks on the arXiv. Compared to
conventional aggregated methods, the multilayer method uncovers
connected topics and reveals smaller modules with more overlap
that better capture the actual organization.

Community detection of network flows conventionally assumes
one-step dynamics on the links. For sparse networks and interest
in large-scale structures, longer timescales may be more
appropriate. Oppositely, for large networks and interest in
small-scale structures, shorter timescales may be better. However,
current methods for analyzing networks at different timescales
require expensive and often infeasible network reconstructions. To
overcome this problem, we introduce a method that takes advantage
of the inner-workings of the map equation and evades the
reconstruction step. This makes it possible to efficiently analyze
large networks at different Markov times with no extra overhead
cost. The method also evades the costly unipartite projection for
identifying flow modules in bipartite networks.

Biogeographical regions (bioregions) reveal how different sets of
species are spatially grouped and therefore are important units
for conservation, historical biogeography, ecol- ogy and
evolution. Several methods have been developed to identify
bioregions based on species distribution data rather than expert
opinion. One approach successfully applies network theory to
simplify and highlight the underlying structure in species
distributions. However, this method lacks tools for simple and
efficient analysis. Here we present In- fomap Bioregions, an
interactive web application that inputs species distribution data
and generates bioregion maps. Species distributions may be
provided as georeferenced point occurrences or range maps, and can
be of local, regional or global scale. The application uses a
novel adaptive resolution method to make best use of often
incomplete species dis- tribution data. The results can be
downloaded as vector graphics, shapefiles or in table format. We
validate the tool by processing large datasets of publicly
available species distribution data of the world’s amphibians
using species ranges, and mammals using point occurrences. As
examples of applications, researchers can reconstruct ancestral
ranges in historical biogeography or identify indicator species
for targeted conservation.

To better understand the flows of ideas or information through
social and biological systems, researchers develop maps that
reveal important patterns in network flows. In practice, network
flow models have implied memoryless first-order Markov chains, but
recently researchers have introduced higher-order Markov chain
models with memory to capture patterns in multi-step pathways.
Higher-order models are particularly important for effectively
revealing actual, overlapping community structure, but
higher-order Markov chain models suffer from the curse of
dimensionality: their vast parameter spaces require exponentially
increasing data to avoid overfitting and therefore make mapping
inefficient already for moderate-sized systems. To overcome this
problem, we introduce an efficient cross-validated mapping
approach based on network flows modeled by sparse Markov chains.
To illustrate our approach, we present a map of citation flows in
science with research fields that overlap in multidisciplinary
journals. Compared with currently used categories in science of
science studies, the research fields form better units of analysis
because the map more effectively captures how ideas flow through
science.

Comprehending complex systems by simplifying and highlighting
important dynamical patterns often require modeling and mapping of
higher-order network flows. However, complex systems come in many
forms and demand a range of representations, including memory and
multilayer networks, which in turn call for versatile
community-detection algorithms to reveal important modular
regularities in the flows. We show that various higher-order
network flow representations can be represented with sparse memory
networks, in which the multilevel map equation can identify
overlapping and nested modules that capture network flows. We
derive the map equation applied to sparse memory networks and
describe its search algorithm Infomap, which can exploit the
flexibility of sparse memory networks. Together they provide a
general solution to reveal overlapping modular patterns in
higher-order flows through complex systems.

Many real-world networks are representations of dynamic systems
with interactions that change over time, often in uncoordinated
ways and at irregular intervals. For example, university students
connect in intermittent groups that repeatedly form and dissolve
based on multiple factors, including their lectures, interests,
and friends. Such dynamic systems can be represented as multilayer
networks where each layer represents a snapshot of the temporal
network. In this representation, it is crucial that the links
between layers accurately capture real dependencies between those
layers. Often, however, these dependencies are unknown. Therefore,
current methods connect layers based on simplistic assumptions
that cannot capture node-level layer dependencies. For example,
connecting every node to itself in other layers with the same
weight can wipe out essential dependencies between intermittent
groups, making it difficult or even impossible to identify them.
In this paper, we present a principled approach to estimating
node-level layer dependencies based on the network structure
within each layer. We implement our node-level coupling method in
the community detection framework Infomap and demonstrate its
performance compared to current methods on synthetic and real
temporal networks. We show that our approach more effectively
constrains information inside multilayer communities so that
Infomap can better recover planted groups in multilayer benchmark
networks that represent multiple modes with different groups and
better identify intermittent communities in real temporal contact
networks. These results suggest that node-level layer coupling can
improve the modeling of information spreading in temporal networks
and better capture their dynamic community structure.

To understand how a complex system is organized and functions,
researchers often identify communities in the system's network of
interactions. Because it is practically impossible to explore all
solutions to guarantee the best one, many community-detection
algorithms rely on multiple stochastic searches. But for a given
combination of network and stochastic algorithm, how many searches
are sufficient to find a solution that is good enough? The
standard approach is to pick a reasonably large number of searches
and select the network partition with the highest quality or
derive a consensus solution based on all network partitions.
However, if different partitions have similar qualities such that
the solution landscape is degenerate, the single best partition
may miss relevant information, and a consensus solution may blur
complementary communities. Here we address this degeneracy problem
with coarse-grained descriptions of the solution landscape. We
cluster network partitions based on their similarity and suggest
an approach to determine the minimum number of searches required
to describe the solution landscape adequately. To make good use of
all partitions, we also propose different ways to explore the
solution landscape, including a significance clustering procedure.
We test these approaches on synthetic and real-world networks, and
find that different networks and algorithms require a different
number of searches and that exploring the coarse-grained solution
landscape can reveal noteworthy complementary solutions and enable
more reliable community detection.

`--meta-data <p> --meta-data-rate <f>`

»
Much of the community detection literature studies structural
communities, communities defined solely by the connectivity
patterns of the network. Often networks contain additional
metadata which can inform community detection such as the grade
and gender of students in a high school social network. In this
work, we introduce a tuning parameter to the content map equation
that allows users of the Infomap community detection algorithm to
control the metadata’s relative importance for identifying network
structure. On synthetic networks, we show that our algorithm can
overcome the structural detectability limit when the metadata are
well aligned with community structure. On real-world networks, we
show how our algorithm can achieve greater mutual information with
the metadata at a cost in the traditional map equation. Our tuning
parameter, like the focusing knob of a microscope, allows users to
“zoom in” and “zoom out” on communities with varying levels of
focus on the metadata.

`--bayes`

»
Unreliable network data can cause community-detection methods to overfit and highlight spurious structures with misleading information about the organization and function of complex systems. Here we show how to detect significant flow-based communities in sparse networks with missing links using the map equation. Since the map equation builds on Shannon entropy estimation, it assumes complete data such that analyzing undersampled networks can lead to overfitting. To overcome this problem, we incorporate a Bayesian approach with assumptions about network uncertainties into the map equation framework. Results in both synthetic and real-world networks show that the Bayesian estimate of the map equation provides a principled approach to revealing significant structures in undersampled networks.

© 2020 mapequation.org - Terms