Applied Soft Computing 11 (2011) 5745–5754
A hybrid ensemble approach for the Steiner tree problem in large graphs:
A geographical application
Abdelhamid Bouchachia a, Markus Prossegger b
a University of Klagenfurt, Department of Informatics, Group of Software Engineering and Soft Computing, Klagenfurt 9020, Austria
b Carinthia University of Applied Sciences, School of Network Engineering and Communication, Klagenfurt 9020, Austria
Article history:
Received 16 June 2010
Received in revised form 26 December 2010
Accepted 12 March 2011
Available online 27 March 2011
Keywords:
Parallel ant colony optimization
Spectral clustering
Steiner tree problem
Ensemble clustering
Divide-and-conquer
Abstract
Hybrid approaches are often recommended for dealing in an efficient manner with complex problems
that require considerable computational time. In this study, we follow a similar approach consisting of
combining spectral clustering and ant colony optimization in a two-stage algorithm for the purpose of
efficiently solving the Steiner tree problem in large graphs. The idea of the two-stage approach, called
ESC–IAC, is to apply a divide-and-conquer strategy which consists of breaking down the problem into
sub-problems to find local solutions before combining them. In the first stage, graph segments (clusters)
are generated using an ensemble spectral clustering method to enhance the clustering quality, whereas in the second stage, parallel independent ant colonies are applied to find the local and the global minimal Steiner trees. To illustrate the efficiency and accuracy, ESC–IAC is applied in the context of a geographical
application relying on real-world as well as artificial benchmarks.
1. Introduction
Combining various computational models for building systems aims at compensating for the insufficiencies and/or shortcomings of each of the models involved in order to achieve highly efficient systems. Hybridization assumes that the models are complementary. Viewing this from the perspective of optimization in complex systems, the goal is to tackle the multifaceted complexity via a divide-and-conquer-like strategy which consists of decomposing large problems into smaller tractable sub-problems. Often a bottom-up approach is adopted in order to find the final solution.
In this paper, we investigate a hybrid two-phase approach
relying on a divide-and-conquer strategy to deal with large
geographical data sets involving a combinatorial optimization
problem. Applications like wiring and pipelining in urban areas
are typically complex problems. They require searching for the
minimal Steiner tree in the huge graphs that model the real-world
topology of the urban areas. Because optimization for hard problems is often accomplished by means of search heuristics, the
optimal solution may only be approximated.
The present paper suggests the application of an instance of
swarm intelligence algorithms, that is ant colony optimization
(ACO). This metaheuristic relies on a natural metaphor inspired by the behavior of real ant colonies. We are interested in investigating the application of divide-and-conquer intertwined with ant
colony systems to the Steiner tree problem (STP) [13,15]. In general
terms, we investigate in the present research a multi-colony strategy stemming from the divide-and-conquer concept to solve STP
modeling the problem of wiring and pipelining in an urban area
(e.g., city). Given that the geographical data representing a map
looks like a huge graph whose vertices are the topological elements
of the area, the application of ACO is well justified by the fact that
STP in most of its versions is NP-complete [16]. However, ACO alone may not be able to cope with such complexity, hence the application of clustering. In fact, the graph is segmented using spectral clustering, producing a set of subgraphs (regions or clusters). With the aim of enhancing the quality of the resulting clusters, we apply an ensemble method for clustering: three spectral clustering algorithms are used, and their combination generates the final segmentation of the data, which is used as input to the next stage. During that stage, the ACO algorithm attempts to find local minimal Steiner trees on the subgraphs and to compute a global minimal Steiner
tree on the hypergraph resulting from combining the clusters. We
apply a parallel version of ACO, that is, parallel independent ant
colonies (IAC), to efficiently handle the optimization problem. In all,
the approach combines ensemble spectral clustering (ESC) and IAC
to cope with the optimization complexity, hence the name ESC–IAC.
Before delving into the details of ESC–IAC, the structure of the paper is presented as follows. Section 2 introduces some preliminaries and the context of the present research. Section 3 provides the description of ESC–IAC, highlighting the spectral clustering algorithms in Section 3.1 and the independent ant colonies strategy in Section 3.3. In Section 4, a set of experiments is discussed to show the effectiveness of the proposed approach. Section 5 highlights the contributions and future work.

Fig. 1. Topology of an urban area.
2. Preliminaries
The problem is formulated on an undirected graph G = (V, E, d),
consisting of n = | V | vertices and m = | E | edges. The distance of an
edge e = (i, j) ∈ E between two vertices i, j ∈ V is given by a cost function d : E → R, written d_ij. The Steiner tree problem consists of calculating a minimum-cost tree in G connecting a given set of terminals T ⊂ V. Any non-terminals in V \ T spanned by the Steiner tree are called
Steiner Points. This is known to be a challenging and hard problem
[16]. The Steiner tree problem is encountered in various applications like electrical power supply networks, telecommunication networks, routing, VLSI design, etc. In our case, for instance, the problem
is to minimize the cable routing (connection) costs such that all
buildings (i.e., terminals T of graph G) get connected (i.e. included
in the spanning tree). Clearly, our approach is intended to deal with
complex graphs resulting from geographical data. The initial step
of our investigation is to construct the graph from the geographical
data (map). The vertices of such a graph represent the topological
elements of the map, whereas the edges represent the connections
between these topological elements. The vertices to be connected
by the Steiner tree are marked as terminals. The resulting graph, as
a modeling instrument, allows us to simulate and optimize routes in the presence of the real-world topology.
A sample of such a graph is illustrated in Fig. 1, showing the order
of complexity we are facing in our investigations. Therefore, our
approach aims at handling graphs characterized by n > 9000 and
m > 10,000. The number of terminals |T| is at least 500. It is worth
mentioning that due to the fact that finding the minimal Steiner
tree is an offline task, we have implemented the approach to handle
huge data structures instead of focusing just on time constraints.
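As a small illustration of this formulation (not of the ESC–IAC method itself), the sketch below builds a toy instance and computes an approximate Steiner tree with NetworkX's generic 2-approximation. The graph, edge costs, and terminal set are invented for the example.

```python
# Illustrative only: a tiny Steiner tree instance in the notation of Section 2,
# solved with NetworkX's generic 2-approximation (not the ESC-IAC approach).
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()                                   # undirected graph G = (V, E, d)
G.add_weighted_edges_from([                      # d_ij given as edge costs
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 2.5),
    ("b", "s", 0.5), ("s", "d", 0.5), ("c", "d", 2.0),
])
T = {"a", "c", "d"}                              # terminals T ⊂ V to be connected

tree = steiner_tree(G, T, weight="weight")       # tree spanning all terminals
steiner_points = set(tree.nodes) - T             # non-terminals used by the tree
print(sorted(tree.edges(data="weight")), steiner_points)
```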
3. ESC–IAC approach
The investigated approach ESC–IAC aims at dealing with minimal Steiner trees in complex graphs. Relying on the concept of
divide and conquer, this approach is hybrid in the sense of involving
two different mechanisms: spectral clustering and ant colony optimization. The first mechanism allows segmenting large graphs into
subgraphs. Using ACO, we intend to obtain from the subgraphs local
minimal Steiner trees. These trees are then combined and refined
to obtain a final solution to the original graph. Because subgraphs are independent, parallelism can be used to compute a minimal Steiner tree in each of them. This is achieved by parallel independent ant colonies. In a nutshell, the required steps of ESC–IAC are highlighted in Algorithm 1.

Algorithm 1. Algorithmic steps
1: The graph is clustered using each of the algorithms described in Section 3.1 to obtain k clusters. Here there are three alternatives: (1) use the individual results of the algorithms, (2) use the best result, or (3) use the combination of the individual clustering results.
2: Once the clusters are obtained, the proposed independent ant colony system is used to calculate the local Steiner trees. These local solutions are then compressed in the form of a hypergraph to allow the calculation of the global Steiner tree.
3: Once the global solution is computed, an expansion (reconstruction) is applied to obtain the minimal Steiner tree in the original graph.

The stages of the approach are described below in Sections 3.1–3.3.
3.1. Spectral clustering
Clustering aims at partitioning data into compact clusters. In
general this is achieved by minimizing the intra-cluster distances
and maximizing the between-clusters distances. Data points lying
inside the same clusters are closer to each other than to those lying
in other clusters. This criterion applies also to graph clustering. Partitions of a graph correspond to disconnected subgraphs (clusters),
such that each cluster is strongly connected internally and weakly connected to the outside. There exist two popular implementations of this criterion, relying on the notions of min-cut and max-flow, and many
variants of these [9]. However, the problem of obtaining the optimal
cut is in general NP-hard. To overcome this difficulty, often spectral relaxation techniques relying on the matrix representation of
graphs are applied [8,26]. These techniques relate graph partitions to the eigenvectors of the graph's weight matrix S or of its Laplacian (D − S).
It is well established in the literature (see for instance [4,25,27])
that spectral clustering is the most efficient graph partitioning
technique compared to other techniques such as (i) recursive bisection, (ii) geometry-based partitioning (coordinate bisection, inertial bisection, geometric partitioning), and (iii) greedy algorithms. It is also important to note that there exist many variants, like multilevel partitioning, which, in principle, can rely on any partitioning algorithm at each level; but for efficiency reasons, authors often prefer to apply spectral clustering [12,1]. On the
other hand, classical clustering techniques start from the assumption that objects are described by feature vectors, and the dissimilarity between objects is computed by means of a distance measure. In contrast, graphs are objects that cannot easily be described using a feature-vector representation [31]: they involve notions like connectivity, reachability, degree, etc. Moreover, in general, edge weights do not correspond to a distance. It becomes clear that classical clustering algorithms like FCM do not fit graph partitioning very well; the reason is simply that such algorithms do not use the characteristics of graphs, and if applied they will generate very poor partitions which are mostly useless. There are some attempts to apply fuzzy clustering, but in conjunction with spectral
clustering [14].
In the present paper, we use three spectral graph partitioning
algorithms. These are updated versions of [24], [21], and [19], respectively.
A brief description of each of these algorithms will follow.
The algorithm proposed by Ng et al. [24] relies on the computation of eigenvectors of the normalized affinity matrix. The idea
of the algorithm is to infer the partitions of the original data from
clustering the eigenvectors of the largest eigenvalues of the affinity matrix. While in the original algorithm, k-means is used, in this
study we rely on the kernelized fuzzy c-means clustering algo-
rithm proposed by Bouchachia and Pedrycz in [3]. The steps of the
algorithm are portrayed in Algorithm 2.
Algorithm 2. First spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered: X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the weight (or affinity) matrix S ∈ R^{n×n} using a similarity measure (for instance S_{ij} = exp(−||x_i − x_j||^2 / (2σ^2)) if i ≠ j, and S_{ii} = 0).
3: Define D to be the diagonal matrix whose (i, i)-element is the sum of S's i-th row, and compute the matrix L = D^{−1/2} S D^{−1/2}.
4: Compute the first k largest eigenvalues (e_1, ..., e_k) of L.
5: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
6: Form the matrix Y ∈ R^{n×k} from V by normalizing each of V's rows to have norm 1: y_{ij} = v_{ij} / (Σ_j v_{ij}^2)^{1/2}.
7: Apply the kernelized fuzzy c-means to cluster the rows of Y.
8: Assign the vertex x_i to cluster j if and only if row y_i of Y was assigned to cluster j.
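A minimal numerical sketch of Algorithm 2 is given below, assuming a precomputed affinity matrix S; plain k-means is substituted for the kernelized fuzzy c-means of [3], so this is an approximation of the procedure rather than the authors' implementation.

```python
# Illustrative sketch of Algorithm 2 (Ng et al.-style spectral embedding).
# Assumption: k-means stands in for the kernelized fuzzy c-means used in the paper.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S: np.ndarray, k: int) -> np.ndarray:
    """Cluster n vertices given an n x n symmetric affinity matrix S (S_ii = 0)."""
    d = S.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = d_inv_sqrt @ S @ d_inv_sqrt                   # normalized matrix D^{-1/2} S D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    V = eigvecs[:, -k:]                               # eigenvectors of the k largest eigenvalues
    Y = V / np.linalg.norm(V, axis=1, keepdims=True)  # row-normalize to unit norm
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)
```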
Fig. 2. Segmentation of an urban area using clustering with k = 6.
The second algorithm applied in this study is proposed by Meila
[21]. It is similar to the previous algorithm up to some details. In
the previous algorithm, the Laplacian matrix (D−1/2 SD−1/2 ) is used
as input to the algorithm and a normalization of the rows of the
selected k eigenvectors is performed. The main steps of the second
algorithm are given in Algorithm 3.

Algorithm 3. Second spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered: X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the edge weight matrix S ∈ R^{n×n}.
3: Compute the first k largest eigenvalues (e_1, ..., e_k) of S.
4: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
5: Apply k-means to cluster the rows of V.
6: Assign the vertex x_i to cluster j if and only if the i-th row of V was assigned to cluster j.
Fig. 3. Ensemble clustering.
The third algorithm is proposed by Lim et al. [19] and follows the same idea as the previous algorithms. However, it differs in the sense that it requires the matrix S to be doubly stochastic (see Algorithm 4) and does not need the computation of the Laplacian matrix or a normalization of the rows of the selected k eigenvectors.
Algorithm 4. Third spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered: X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the edge weight matrix S ∈ R^{n×n}.
3: Make the matrix S double stochastic (that is, all its eigenvalues are real and smaller than or equal to one, with one of them exactly equal to one) by normalizing the costs of the edges per node (Σ_{x′∈X} cost(x, x′) = 1 for all x ∈ X).
4: Compute the first k largest eigenvalues (e_1, ..., e_k) of S.
5: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
6: Apply k-means to cluster the rows of V.
7: Assign the vertex x_i to cluster j if and only if the i-th row of V was assigned to cluster j.
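The double-stochastic requirement of Algorithm 4 can be approximated, for illustration, with a Sinkhorn-style alternating normalization; the paper describes a per-node normalization of edge costs, so treating it as the iterative scheme below is an assumption, not the authors' exact procedure.

```python
# Assumed illustration: alternately normalize rows and columns so that a nonnegative
# weight matrix S becomes (approximately) doubly stochastic (Sinkhorn-Knopp iteration).
import numpy as np

def sinkhorn_normalize(S: np.ndarray, n_iter: int = 100, eps: float = 1e-12) -> np.ndarray:
    S = S.astype(float).copy()
    for _ in range(n_iter):
        S /= S.sum(axis=1, keepdims=True) + eps   # rows sum to 1
        S /= S.sum(axis=0, keepdims=True) + eps   # columns sum to 1
    return S
```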
Because we are targeting quite complex problems represented
as huge graphs, one could apply an approach similar to multilevel
clustering [2] where the original graph G(V, E) is approximated by
another less complex but coarse graph Gc (Vc , Ec ). The latter is then
partitioned before mapping back (expanding) the obtained clusters
to the original graph. If Gc is also large, then an approximation of Gc
is computed and clustered. This procedure is recursively executed
as long as the approximation is still large.
We proceed the other way round by clustering the graph,
resulting in a set of subgraphs. Once processed, these subgraphs
are compressed and connected to produce a hypergraph which is
then processed. However, the similarity between our approach and
multilevel clustering is worth mentioning, since it will be the focus
of our future investigations.
To illustrate the clustering procedure, let us consider an urban
area. Using the spectral clustering (Algorithm 2) and setting k to 6,
we obtain the result shown in Fig. 2.
3.2. Ensemble spectral clustering

Ensemble clustering methods have been the subject of intensive research along with ensemble classification methods [10,11]. The idea consists of generating partitions of the data by changing:
• the input of the algorithm (various instances or various subsets of features),
• the clustering algorithm, or
• the parameter setting of the algorithm.
The motivation of ensemble clustering methods is to take advantage of the diversity of clusterings in order to enhance the quality of
the clustering results. As shown in Fig. 3, ensemble clustering consists of two stages: (i) clusterings generation by varying different
aspects, and (ii) combination of clusterings relying on a consensus
function that finds commonalities of the base clusterings.
Among ensemble methods, there are three major classes: graph-based methods, greedy optimization-based methods, and matrix-based methods [30]. In the first class of methods,
the clusters represent the hyperedges of a hypergraph where the
nodes are the data points. Three methods were developed by
Strehl and Ghosh [28]: the cluster-based similarity partitioning algorithm (CSPA),
hyper-graph partitioning algorithm (HGPA), and meta-clustering
algorithm (MCLA). Other methods based on bipartite graphs are
proposed in [7,6].
In the second class of methods, the goal is to find the optimal consensus between clusterings. As proposed in [28], the best
combination of clusterings is first found based on the average normalized mutual information (see Eq. (4)); then an attempt is made to find a better labeling by randomly moving data points to another cluster.
In the third class of methods [18,23], the idea is to combine the
clustering results (clustering matrices of the base clusterings) into
other matrices such as co-association, consensus, and nonnegative
matrix to determine the final labeling.
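As an illustration of the matrix-based family (and not of the graph-based consensus of [28] actually used in this paper), a simple co-association consensus can be sketched as follows; the helper name and the use of SciPy's hierarchical clustering are illustrative choices.

```python
# Assumed illustration of a matrix-based consensus: build a co-association matrix from
# the base clusterings and cluster it. This is NOT the consensus function used in the paper.
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import squareform

def coassociation_consensus(base_labels, k):
    base = [np.asarray(l) for l in base_labels]
    n = len(base[0])
    # co[i, j] = fraction of base clusterings that put points i and j in the same cluster
    co = sum((l[:, None] == l[None, :]).astype(float) for l in base) / len(base)
    dist = 1.0 - co
    np.fill_diagonal(dist, 0.0)
    Z = average(squareform(dist, checks=False))    # average-linkage hierarchical clustering
    return fcluster(Z, t=k, criterion="maxclust")  # cut the dendrogram into k clusters
```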
Often, to compare two clusterings C = {c_1, c_2, ...} and K = {k_1, k_2, ...} of n data points, the normalized mutual information (NMI) [28] is used; it will also be used in this study. NMI measures the overlap between clusterings and is based on two measures, the mutual information (I) and the entropy (H), which are given as follows:

I(C, K) = \sum_{c \in C} \sum_{k \in K} \frac{|c \cap k|}{n} \log_2 \frac{n\,|c \cap k|}{|c|\,|k|}    (1)

and

H(C) = -\sum_{c \in C} \frac{|c|}{n} \log_2 \frac{|c|}{n}    (2)

NMI is then expressed as:

NMI(C, K) = \frac{I(C, K)}{\sqrt{H(C)\,H(K)}}    (3)

If NMI = 1, then both clusterings are the same. Moreover, if we want to check the similarity of the base clusterings to the ensemble's result, we may use NMI (Eq. (3)). Another way to do this is to rely on the averaged NMI measure, which is given as:

ANMI(\mathbb{E}, C) = \frac{1}{|\mathbb{E}|} \sum_{E \in \mathbb{E}} NMI(E, C)    (4)

where \mathbb{E} is the set of individual clusterings.

In this paper we rely on the graph-based methods described in [28], which refer to Eq. (4). The base clusterings are generated using the three algorithms presented in the previous section (Algorithms 2–4).
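For concreteness, the following minimal Python sketch computes NMI (Eq. (3)) and ANMI (Eq. (4)) for clusterings given as label lists; the function names and the label-list representation are illustrative assumptions, not part of the original implementation.

```python
# Minimal sketch of Eqs. (1)-(4): NMI between two label vectors and the ANMI of an
# ensemble result against a set of base clusterings. Names are illustrative only.
from collections import Counter
from math import log2, sqrt

def nmi(c_labels, k_labels):
    n = len(c_labels)
    pc, pk = Counter(c_labels), Counter(k_labels)
    joint = Counter(zip(c_labels, k_labels))
    i_ck = sum(v / n * log2(n * v / (pc[c] * pk[k])) for (c, k), v in joint.items())
    h = lambda p: -sum(v / n * log2(v / n) for v in p.values())   # entropy H (Eq. (2))
    return i_ck / sqrt(h(pc) * h(pk))                             # Eq. (3)

def anmi(ensemble_labels, base_clusterings):
    """ANMI of the ensemble result against the set of base clusterings (Eq. (4))."""
    return sum(nmi(e, ensemble_labels) for e in base_clusterings) / len(base_clusterings)
```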
3.3. Parallel ant colony optimization
Step 2 of Algorithm 1 is realized using independent ant colonies
(IAC). In this scheme of parallel ant colony optimization, colonies
correspond to clusters produced by the ensemble spectral clustering (Section 3.2) and resulting from Step 1 of Algorithm 1.
During the simulation, each of the ant colonies is assigned to one
processing unit. Each colony computes a local minimal Steiner
tree on its subgraph (i.e., a cluster of the urban area). Interestingly enough, the application of IAC is well motivated by the nature of the problem but also by its performance (less communication
overhead).
To enable a time-efficient optimization, the graph G has to be clustered into k subgraphs G_c = (V_c, E_c), where c = 1, ..., k and G_c ⊂ G.
To find a minimal Steiner tree in each subgraph, a multiple ant
colonies approach is applied. The ants of the colony associated
with a given subgraph are split into sub-colonies in order to tackle
the complexity of the graph and to enhance the search efficiency.
Hence, the ant colony optimization consists of two levels: (i) each
subgraph is assigned a colony acting independently, but (ii) each
colony is divided into sub-colonies that communicate at the end of
an execution cycle.
In precise terms, each sub-colony picks a random vertex i of the
subgraph as its own nest (i.e., starting point) before the conventional ACO is applied to find the minimal Steiner tree connecting
all terminals Tc ⊂ T of the respective cluster (all terminals in the subgraph have to be included in the minimal tree). This methodology
is different from the known parallel ACO schemes discussed by several authors [5,20,22,29]. Independent sub-colonies run in parallel
(on different processors) but at the end of each execution cycle, if
the partial solutions obtained by sub-colonies share a vertex, then
these colonies are merged together. However, because this merge
operation might yield a subgraph that contains a cycle, it is important to re-initialize a single sub-colony, again solving the STP on
the merged subgraph.
Once local minimal Steiner trees are found, the clusters are compressed, yielding a hypergraph (where nodes represent clusters and edges represent cluster neighborhood). A (hyper) ant colony is then applied on the derived hypergraph G_h = (V_h, E_h) to compute a (hyper) minimal Steiner tree (designated as the global minimal Steiner tree).

As indicated by Step 3 of Algorithm 1, an expansion (reconstruction) is applied to obtain a minimal Steiner tree of the original graph. This expansion is simply the aggregation of the local optimal Steiner trees into the global tree.

A colony c is characterized by a number of parameters that correlate with the actual graph size and in particular with the number of terminals. Before describing the algorithmic steps of IAC, we define the symbols to be used and their default values for all benchmarks in our experimental evaluation:

• n_ants: number of ants within one colony (500)
• min_ants: minimum number of ants per nest (50)
• p_ants: percentage of ants on the best path required to enlarge the colony space (2/3)
• n_cycles: number of cycles until an ant's move is forced (30)
• α: order of the trace (pheromone) effect (2)
• β: order of the effect of the ant's sight (visibility) (0.1)
• e: evaporation coefficient (0.2)

In a colony c, an ant a moves from a vertex i to a vertex j with a probability expressed by the following transition rule:

P_{ij}^{a} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{k \notin Tour_a} \tau_{ik}^{\alpha}\,\eta_{ik}^{\beta}}    (5)

where \tau_{ij} represents the intensity of the pheromone between vertex i and vertex j. The parameters \alpha and \beta indicate the influence of \tau_{ij} and \eta_{ij}, respectively. The parameter \eta_{ij} = 1/d_{ij} is the visibility of vertex j from vertex i, which is inversely proportional to the cost d_{ij} between i and j. Tour_a denotes the tour made by ant a.

Once a cycle is completed, the pheromone matrix is updated according to Eq. (6):

\tau_{ij}(t+1) = (1 - e)\,\tau_{ij}(t) + \Delta\tau_{ij} + b    (6)

such that

\Delta\tau_{ij} = \sum_{a=1}^{n_{ants}} \Delta\tau_{ij}^{a}    (7)

where \Delta\tau_{ij}^{a} indicates the change of the pheromone intensity on edge (i, j) produced by the a-th ant. If the current tour is the best one, an additional bonus b = 0.2 is added (otherwise b = 0). This change is quantified as follows:

\Delta\tau_{ij}^{a} = \begin{cases} 1/L_a & \text{if ant } a \text{ traverses the edge } (i, j) \\ 0 & \text{otherwise} \end{cases}    (8)

where L_a is the length (i.e., the cumulated edge costs) of the tour found by the a-th ant.
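To make Eqs. (5)–(8) concrete, the sketch below implements the transition rule and the pheromone update with the default parameters listed above. The dictionary-based containers (tau, eta, costs, tours) and the per-edge interpretation of the bonus b are assumptions for illustration only.

```python
# Hedged sketch of Eq. (5) (transition rule) and Eqs. (6)-(8) (pheromone update).
# tau, eta: dict-of-dicts keyed by vertices; costs: dict keyed by (i, j) edge tuples;
# tours: dict mapping each ant to the list of edges it traversed; best: edges of the best tour.
import random

ALPHA, BETA, EVAP, BONUS = 2, 0.1, 0.2, 0.2   # α, β, e, b from the list above

def choose_next(i, tour, tau, eta):
    """Pick the next vertex j ∉ tour with probability proportional to τ_ij^α · η_ij^β (Eq. (5))."""
    candidates = [j for j in eta[i] if j not in tour]
    weights = [(tau[i][j] ** ALPHA) * (eta[i][j] ** BETA) for j in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromones(tau, tours, costs, best):
    """Evaporate and deposit pheromone after a cycle (Eqs. (6)-(8))."""
    delta = {e: 0.0 for i in tau for e in ((i, j) for j in tau[i])}
    for edges in tours.values():
        length = sum(costs[e] for e in edges)            # L_a, the tour length of ant a
        for e in edges:
            delta[e] += 1.0 / length                     # Δτ_ij^a = 1/L_a (Eq. (8))
    for i in tau:
        for j in tau[i]:
            bonus = BONUS if (i, j) in best else 0.0     # assumed: bonus on edges of the best tour
            tau[i][j] = (1 - EVAP) * tau[i][j] + delta[(i, j)] + bonus   # Eq. (6)
```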
Relying on the given expressions, the local Steiner tree is computed for each of the graph clusters G_c using the algorithm IAC-COPY (shown in Algorithm 5). Each ant a of a colony c starts its tour from a random terminal t ∈ T, which stands for a nest (Tour_a = {random(T)}). Each colony c is enlarged by adding the vertices of the tour Tour_best^c as nests V_c. If two colonies c_1 and c_2 share a vertex v_1 ∈ V_{c_1} and v_1 ∈ V_{c_2}, they are merged and a single sub-colony solves the STP on the merged subgraph to get rid of potential cycles. This procedure is repeated until only one colony remains in the cluster.
Then, once these local optimization problems are solved, the global minimal Steiner tree is calculated using Algorithm 5 again. Note that IAC-COPY relies on the functions INITIALIZE() and CYCLE(): the former initializes the different parameters, especially the pheromone matrix, while the latter implements the conventional steps of ACO.
Algorithm 5. IAC-COPY()
1: /*initialization*/
2: for each colony c ∈ C do
3:   Choose a random terminal as nest V_c
4:   INITIALIZE(V_c, n_ants, min_ants)
5: end for
6: repeat
7:   for each colony c ∈ C do
8:     for m = 1 to n_cycles do
9:       Build the tour Tour_m^c using CYCLE(A_c)
10:      Find the colony's best tour Tour_best^c with min(cost(Tour_m^c))
11:      if the percentage of ants running on the best tour ≥ p_ants then
12:        Break the loop (i.e., that tour is supposed to be the optimal solution)
13:      end if
14:    end for
15:    Add all vertices of Tour_best^c to the colony c (V_c ← V_c ∪ Tour_best^c)
16:  end for
17:  if two sub-colonies c_i and c_j share some vertices, i.e., V_{c_i} ∩ V_{c_j} ≠ ∅, then
18:    Merge c_i and c_j to obtain a new sub-colony c_k (V_{c_k} = V_{c_i} ∪ V_{c_j}), then remove c_i and c_j
19:  end if
20:  /*Initialize next step*/
21:  for each colony c ∈ C do
22:    INITIALIZE(V_c, n_ants, min_ants)
23:  end for
24: until one colony c owns all terminals (T ⊆ V_c)
Algorithm 6. INITIALIZE(V_c, n_Ants, min_Ants)
Require: The nests V_c of the colony c (i.e., the starting nodes of the sub-colonies)
Require: Number of ants in the colony n_Ants > 0
Require: Minimum number of ants per vertex min_Ants > 0
1: Initialize the local pheromone matrix with 10^{-4}
2: Initialize the local sight matrix with the inverse edge costs
3: for each vertex v ∈ V_c do
4:   Place max(n_Ants / |V_c|, min_Ants) ants on v
5:   Initialize the tours Tour_a = {v} of the placed ants
6: end for
Algorithm 7. CYCLE(A_c)
Require: The ants A_c of colony c
1: for each ant a ∈ A_c do
2:   repeat
3:     Apply the transition rule (Eq. (5)) to choose a vertex v such that v ∉ V_c and v is not a member of the tour of ant a (v ∉ Tour_a)
4:     Add vertex v to the ant's tour Tour_a
5:   until v ∈ T   // T is the set of terminals
6: end for
7: Update the pheromone matrix (Eq. (6)).
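The merge step of Algorithm 5 (lines 17–19) can be illustrated by the following hedged sketch; the Colony container and the helper name are hypothetical, and in ESC–IAC the merged colony subsequently re-solves the STP on its subgraph to remove potential cycles.

```python
# Hypothetical sketch of the sub-colony merge step of Algorithm 5: colonies whose
# vertex sets intersect are fused (V_ck = V_ci ∪ V_cj). Repeated across cycles, this
# continues until a single colony owns all terminals, as in Algorithm 5.
from dataclasses import dataclass, field

@dataclass
class Colony:
    vertices: set = field(default_factory=set)    # nests V_c plus vertices of best tours

def merge_overlapping(colonies: list[Colony]) -> list[Colony]:
    merged: list[Colony] = []
    for col in colonies:
        target = next((m for m in merged if m.vertices & col.vertices), None)
        if target is None:
            merged.append(Colony(set(col.vertices)))
        else:
            target.vertices |= col.vertices        # union of the shared colonies' vertices
    return merged
```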
4. Empirical evaluation

To evaluate the proposed ESC–IAC, we use a number of graph instances available from the SteinLib Testdata Library [17]. For the sake of illustration, we use the benchmarks es500fst01 and es500fst02 obtained from the SteinLib library, and one real-world benchmark, urban500, representing an area in an Austrian city. The details of these three benchmarking data sets are shown in Tables 1 and 2, respectively. While for es500fst01 and es500fst02 the cost of the minimal Steiner tree is known, that of urban500 is not. One can also notice that the number of terminals in these benchmarks is very high, meeting the purpose of our ESC–IAC approach.

Table 1
Graphs from the SteinLib library.
Name         |V|     |E|     |T|   Optimum
es500fst01   1.250   1.763   500   162.978.810
es500fst02   1.408   2.056   500   160.756.854

Table 2
Real-world graph.
Name       |V|     |E|      |T|   Optimum
urban500   9.128   12.409   569   –
Because of the huge size of the real-world graph urban500, computing the optimum Steiner tree would require a very high computational time, so we decided to use the minimum spanning tree heuristic (MST) to get comparable results. This heuristic finds the shortest paths between the terminals in order to build up a minimum spanning tree MST(T).
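One common form of this heuristic builds the metric closure over the terminals using shortest-path costs and takes its minimum spanning tree; the sketch below shows that variant, which is an assumption about the exact procedure used here.

```python
# Assumed illustration of an MST-based Steiner heuristic: connect every pair of terminals
# by its shortest-path cost in G and take the MST of that complete "distance" graph.
import itertools
import networkx as nx

def mst_heuristic_cost(G: nx.Graph, terminals: set) -> float:
    closure = nx.Graph()
    for u, v in itertools.combinations(terminals, 2):
        closure.add_edge(u, v, weight=nx.shortest_path_length(G, u, v, weight="weight"))
    mst = nx.minimum_spanning_tree(closure, weight="weight")
    return mst.size(weight="weight")   # total cost of the heuristic tree
```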
To explore the performance of ESC–IAC, three investigations
are conducted. The first aims at exploring the effect of changing
the number of clusters on the overall execution time of ESC–IAC.
The second set of experiments deals with the quality of the results obtained by the algorithm on each of the benchmarks described earlier, whereas the last set aims at comparing ESC–IAC with
the conventional ACO.
4.1. Effect of clustering
Given that the size of benchmark graphs is big, it is desirable to
check the effect of graph segmentation via clustering. Recall that
the number of subgraphs corresponds to the number of clusters
and each cluster contains a number of colonies.
In this experiment, the effect of clustering is observed from the
computational time of the whole algorithm ESC–IAC, that is the
ensemble spectral clustering algorithm followed by the IAC optimization, executed on the graphs. But first, the ensemble clustering results are presented.
Figs. 4(a)–(d), 5(a)–(d), and 6(a)–(h) show the results of the
ensemble clustering on two data sets, es500fst02 and urban500, by
setting the number of clusters to 6 and 12.
Fig. 4. Ensemble clustering of es500fst02 data set (6 clusters).
Fig. 5. Ensemble clustering of es500fst02 data set (12 clusters).
Table 3
Similarity of the individual clusterings to the ensemble clustering.
Instance     # Clusters   Algorithm 2   Algorithm 3   Algorithm 4
es500fst01   06           0.89406       1             0.85873
es500fst01   12           0.95618       0.88000       0.85595
es500fst02   06           1.0000        1.0000        0.77981
es500fst02   12           0.9728        0.9728        0.87096
Urban500     06           0.80597       0.87244       0.63059
Urban500     12           0.61835       0.66017       0.60881

Table 4
Effect of the cluster number on the execution time.
Instance     # Clusters   Time [s]
urban500     1            10,000
urban500     6            1500
urban500     12           700
es500fst01   1            930
es500fst01   6            120
es500fst01   12           90
es500fst02   1            1100
es500fst02   6            150
es500fst02   12           80
Moreover, the similarity of the individual clusterings to the final one obtained by consensus is displayed in Table 3. The columns labeled Algorithm 2, Algorithm 3, and Algorithm 4 give the value of NMI between the ensemble clustering and the respective individual clustering. One can notice that Lim et al.'s algorithm (Algorithm 4) offers in most cases the closest results to the consensus results. Based on this, one could use either the best single algorithm or the ensemble.
Coming to the computational time, Table 4 shows the execution
time of the algorithm when the number of clusters is set to 6 and 12
for each of the graphs. The first number of clusters corresponds to
the number of clusters used by the communication network, while
the second corresponds to the number of electoral districts and is used for comparison purposes.
The results show clearly that a significant improvement of the
ESC–IAC’s efficiency is achieved by the spectral clustering ensemble.
The time required to compute the minimal Steiner tree decreases markedly as the number of clusters increases. For instance,
in the case of es500fst01, the ESC–IAC algorithm saves 90.3% of
the time (compare the results with 1 cluster against those with
12 clusters), whereas the optimization result is less than 3.1%
worse as will be discussed in the next section. In the case of
es500fst02 and urban500, the time gain is 92.7% and more than 93%
respectively.
Fig. 6. Ensemble clustering of Urban500 data set (6 and 12 clusters).
4.2. Performance of ESC–IAC

The optimization results for the instances es500fst01 and es500fst02 are displayed in Tables 5 and 6, respectively, whereas those related to the urban graph stemming from the real-world geoinformation data are shown in Table 7.

Table 5
Results related to es500fst01 [optimum = 162.978.810].
Approach   # Clusters   Average result   Obtained − Optimum [%]
MST        1            171.542.977      +5.25 ± 0
ESC–IAC    1            167.602.151      +2.84 ± 0.17
ESC–IAC    6            171.039.501      +4.95 ± 0.27
ESC–IAC    12           172.793.480      +6.02 ± 0.31

Table 6
Results related to es500fst02 [optimum = 160.756.854].
Approach   # Clusters   Average result   Obtained − Optimum [%]
MST        1            170.945.745      +6.34 ± 0
ESC–IAC    1            167.196.505      +4.01 ± 0.27
ESC–IAC    6            174.114.834      +8.31 ± 0.41
ESC–IAC    12           175.622.291      +7.76 ± 0.38

Table 7
Results related to urban500 with unknown optimum.
Approach   # Clusters   Average result   Obtained − MST value [%]
MST        1            19.608           0
ESC–IAC    1            19.350           −1.32 ± 0.28
ESC–IAC    6            19.724           +0.59 ± 0.33
ESC–IAC    12           19.886           +1.42 ± 0.27

These results illustrate the numerical optimization results (the cost of the solution), the total time needed for clustering and sequential optimization, and the difference to the known cost of the optimum Steiner tree. The first outcome of this set of experiments is that the more clusters are used, the less time is needed. The most important outcome is the one pertaining to the quality of the optimization result. Considering es500fst01 and es500fst02, ESC–IAC produces results close to the known optimum, but as the number of clusters increases, the performance decreases. However, when the number of clusters is set to 1 (under the same conditions), ESC–IAC performs better than the minimum spanning tree (MST) heuristic, which is the standard in this context. ESC–IAC can also outperform MST when the number of clusters is small compared to the size of the graph (e.g., with es500fst01, ESC–IAC is better even when the number of clusters is set to 6). This performance, coupled with the execution time, allows us to state that ESC–IAC performs reasonably well. One might wonder why in some cases (i.e., when the number of clusters exceeds a certain limit) MST outperforms ESC–IAC. The reason is that MST has access to the whole graph and, therefore, it could reach the optimum, while ESC–IAC does not use the whole graph, and the local optima may not lead to the global optimum as the number of colonies (i.e., the number of clusters) increases.

In the case of the real-world urban500 data, the optimum is not known; therefore we compare the other results against the standard MST. Again, the execution time decreases as the number of clusters increases, and the performance of ESC–IAC is less than 2% worse in the case of 12 clusters.

4.3. Comparison against sequential ACO

The proposed algorithm behaves as a sequential ant colony optimization if the number of clusters is set to 1 and the number of colonies is limited to |C| = 1. We ran all benchmarks using sequential ACO instead of ESC–IAC. The average cost of the minimal Steiner tree using the sequential approach is about 0.3% lower, but the average time needed for the optimization is about 550% higher.
5. Conclusions
The present paper introduces a new approach to deal with
minimal Steiner trees. Methodologically, the novelty concerns (1)
parallelism in ant colony optimization which is enhanced by the
ensemble spectral clustering and (2) handling large and complex
problems by ant colony systems. The ESC–IAC approach consists
of three main steps: spectral clustering to segment large graphs,
application of multiple colonies on each graph segment to find
local solutions, and then application of an ant colony to the hypergraph
obtained by compressing the graph segments. The empirical studies show that ESC–IAC can be successfully applied on real-world
complex problems and compares very well to standard algorithms.
As future work, it would be interesting to extend the ESC–IAC algorithm to handle real-world constraints, especially in the context of the spectral clustering algorithms. The current version is general for all applications modeled as graphs; hence it might be seen as
“naive”. However, in geographical applications various constraints
are encountered. Another interesting aspect is multilevel clustering
which is worth applying in the context of such applications.
References
[1] S. Barnard, H. Simon, Fast multilevel implementation of recursive spectral
bisection for partitioning unstructured problems, Concurrency: Pract. Exp. 6
(2) (1994) 101–117.
[2] S. Barnard, H. Simon, A parallel implementation of multilevel recursive spectral
bisection for application to adaptive unstructured meshes, in: Proceedings of
the Seventh SIAM Conference on Parallel Processing for Scientific Computing,
1995, pp. 627–632.
[3] A. Bouchachia, W. Pedrycz, Enhancement of fuzzy clustering by mechanisms of
partial supervision, Fuzzy Sets Syst. 735 (13) (2006) 776–786.
[4] B. Chamberlain, Graph partitioning algorithms for distributing workloads of
parallel computations, Tech. Rep. TR-98-10-03, Univ. of Washington, Dept. of
Computer Science & Engineering, 1998.
[5] S. Chu, J. Roddick, J. Pan, C. Su, Parallelization strategies for ant colony optimization, in: Proceedings of the 5th International Symposium on Methodologies for
Intelligent Systems, Springer, 2003, pp. 279–284.
[6] C. Domeniconi, M. Al-Razgan, Weighted cluster ensembles: methods and analysis, ACM Trans. Knowl. Discov. Data 2 (4) (2009) 1–40.
[7] X. Fern, C. Brodley, Solving cluster ensemble problems by bipartite graph
partitioning, in: Proceedings of the Twenty-first International Conference on
Machine Learning, 2004, p. 36.
[8] M. Fiedler, A property of eigenvectors of non-negative symmetric matrices and
its application to graph theory, Czech. Math. J. 25 (1975) 619–633.
[9] G. Flake, R. Tarjan, K. Tsioutsiouliklis, Graph clustering and minimum cut trees,
Internet Math. 1 (3) (2004) 355–378.
[10] D. Greene, A. Tsymbal, N. Bolshakova, P. Cunningham, Ensemble clustering in
medical diagnostics, in: 17th IEEE Symposium on Computer-based Medical
Systems, 2004, pp. 576–581.
[11] S. Hadjitodorov, L. Kuncheva, L. Todorova, Moderate diversity for better cluster
ensembles, Inf. Fusion 7 (3) (2006) 264–275.
[12] B. Hendrickson, R. Leland, A multilevel algorithm for partitioning graphs, in:
Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CDROM),
Supercomputing’95, ACM, New York, NY, USA, 1995.
[13] F. Hwang, D. Richards, P. Winter, The Steiner Tree Problem, North-Holland,
1992.
[14] K. Inoue, K. Urahama, Sequential fuzzy cluster extraction by a graph spectral
method, Pattern Recognit. Lett. 20 (7) (1999) 699–705.
[15] A. Ivanov, A. Tuzhilin, Minimal Networks: The Steiner Problem and its Generalizations, CRC Press, 1994.
[16] R. Karp, Complexity of Computer Computations, chap. Reducibility among
Combinatorial Problems, Plenum Press, New York, 1972, pp. 85–103.
[17] T. Koch, A. Martin, S. Voß, SteinLib: an updated library on Steiner tree problems in graphs, Tech. Rep. ZIB-Report 00-37, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustr. 7, Berlin, 2000. http://elib.zib.de/steinlib.
[18] T. Li, C. Ding, M. Jordan, Solving consensus and semi-supervised clustering
problems using nonnegative matrix factorization, in: Proceedings of the 2007
Seventh IEEE International Conference on Data Mining, 2007, pp. 577–582.
[19] C. Lim, S. Bohacek, J. Hespanha, K. Obraczka, Hierarchical max-flow routing, in: IEEE Conference on Global Telecommunications Conference, 2005, pp.
550–556.
[20] M. Manfrin, M. Birattari, T. Stützle, M. Dorigo, Parallel ant colony optimization
for the traveling salesman problem, in: ANTS Workshop, 2006, pp. 224–234.
[21] M. Meila, J. Shi, Learning segmentation by random walks, in: Neural Information
Processing Systems, NIPS, 2001, pp. 873–879.
[22] M. Middendorf, F. Reischle, H. Schmeck, Multi colony ant algorithms, J. Heuristics 8 (2002) 305–320.
[23] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: a resamplingbased method for class discovery and visualization of gene expression
microarray data, Mach. Learn. 52 (1–2) (2003) 91–118.
[24] A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in:
Proceedings of Advances in Neural Information Processing Systems (14), MIT
Press, 2001, pp. 849–856.
[25] A. Pothen, Parallel Numerical Algorithms, chap. Graph Partitioning with Application to Scientific Computing, Kluwer Academic Press, 1995.
[26] A. Pothen, D. Simon, K. Liou, Partitioning sparse matrices with eigenvectors of
graphs, SIAM J. Matrix Anal. Appl. 11 (3) (1990) 430–452.
[27] S. Schaeffer, Graph clustering, Comput. Sci. Rev. 1 (1) (2007) 27–64.
[28] A. Strehl, J. Ghosh, C. Cardie, Cluster ensembles—a knowledge reuse framework
for combining multiple partitions, J. Mach. Learn. Res. 3 (2002) 583–617.
[29] T. Stuetzle, Parallelization strategies for ant colony optimization, in: Proceedings of the 5th International Conference on Parallel Problem Solving from Nature,
Springer, 1998, pp. 722–731.
[30] H. Wang, H. Shan, A. Banerjee, Bayesian cluster ensembles, Statistical Analysis and Data Mining 4 (2011) 57–70.
[31] Y. Zhou, H. Cheng, J. Yu, Graph clustering based on structural/attribute similarities, Proceedings of VLDB’09 2 (2009) 718–729.