Zoltan is added as thirdParty package

Hamidreza
2025-05-15 21:58:43 +03:30
parent 83a6e4baa1
commit d7479cf1bd
3392 changed files with 318142 additions and 1 deletions

@ -0,0 +1,515 @@
<!-------- @HEADER
!
! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Zoltan Toolkit for Load-balancing, Partitioning, Ordering and Coloring
! Copyright 2012 Sandia Corporation
!
! Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
! the U.S. Government retains certain rights in this software.
!
! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions are
! met:
!
! 1. Redistributions of source code must retain the above copyright
! notice, this list of conditions and the following disclaimer.
!
! 2. Redistributions in binary form must reproduce the above copyright
! notice, this list of conditions and the following disclaimer in the
! documentation and/or other materials provided with the distribution.
!
! 3. Neither the name of the Corporation nor the names of the
! contributors may be used to endorse or promote products derived from
! this software without specific prior written permission.
!
! THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
! EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
! PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
! CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
! EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
! PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
! PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
! LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
! NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
! SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
!
! Questions? Contact Karen Devine kddevin@sandia.gov
! Erik Boman egboman@sandia.gov
!
! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! @HEADER
------->
<!DOCTYPE html PUBLIC "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR"
content="Mozilla/4.76 [en] (X11; U; Linux 2.4.2-2smp i686) [Netscape]">
<meta name="sandia.approved" content="SAND99-1377">
<meta name="author" content="nick aase, neaase@sandia.gov">
<title>Zoltan Developer's Guide: Hybrid Partitioning</title>
</head>
<body bgcolor="#ffffff">
<div align="right"><b><i><a href="dev.html">Zoltan Developer's Guide</a>&nbsp;
|&nbsp; <a href="dev_reftree.html">Next(NEANEA CHANGE ME)</a>&nbsp; |&nbsp; <a
href="dev_parmetis.html">Previous(NEANEA CHANGE ME)</a></i></b></div>
<h2>
<a name="Hybrid Partitioning"></a>Appendix: Hybrid Partitioning</h2>
Hybrid partitioning is an amalgam of Zoltan's native parallel hypergraph
partitioner (<a href="dev_phg.html">PHG</a>) and its Recursive Coordinate
Bisection algorithm (<a href="dev_rcb.html">RCB</a>). Hybrid partitioning can
be useful when a user is looking to strike a happy medium of both efficiency
and fidelity in their work. Traditional Zoltan-PHG is well suited to minimize
the number of cut hyperedges in the system, but it is comparatively slow due
to the multiple layers of coarsening it goes through and the standard matching
methods used to calculate new vertices for the coarser hypergraph.
<p>
Hypergraph partitioning is a useful partitioning and
load balancing method when connectivity data is available. It can be
viewed as a more sophisticated alternative to
the traditional graph partitioning.
<p>A hypergraph consists of vertices and hyperedges. A hyperedge
connects
one or more vertices. A graph is a special case of a hypergraph where
each edge has size two (two vertices). The hypergraph model is well
suited to parallel computing, where vertices correspond to data objects
and hyperedges represent the communication requirements. The basic
partitioning problem is to partition the vertices into <i>k</i>
approximately equal sets such that the number of cut hyperedges is
minimized. Most partitioners (including Zoltan-PHG) allow a more
general
model where both vertices and hyperedges can be assigned weights.
It has been
shown that the hypergraph model gives a more accurate representation
of communication cost (volume) than the graph model. In particular,
for sparse matrix-vector multiplication, the hypergraph model
<strong>exactly</strong> represents communication volume. Sparse
matrices can be partitioned either along rows or columns;
in the row-net model the columns are vertices and each row corresponds
to a hyperedge, while in the column-net model the roles of vertices
and hyperedges are reversed. </p>
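<p>As a minimal sketch of the objective described above (not Zoltan code), the
function below counts cut hyperedges for a given assignment of vertices to
parts. The function name <i>count_cut_hyperedges</i> and the <i>part</i> array
are hypothetical; the hyperedge-to-vertex arrays are assumed to be in
compressed row form, with the pins of hyperedge <i>e</i> stored in
<i>hvertex[hindex[e]]</i> through <i>hvertex[hindex[e+1]-1]</i>.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Count the hyperedges that are "cut", i.e. whose pins (vertices) lie in
 * more than one part.  Illustrative only; assumes compressed row storage:
 * hyperedge e owns pins hvertex[hindex[e]] .. hvertex[hindex[e+1]-1],
 * and part[v] is the part assigned to vertex v. */
int count_cut_hyperedges(int n_edges, const int *hindex, const int *hvertex,
                         const int *part)
{
    int cuts = 0;

    assert(n_edges >= 0 && hindex != NULL);

    for (int e = 0; e < n_edges; e++) {
        int first = hindex[e];
        int last = hindex[e + 1];

        /* Compare every later pin with the first pin of the hyperedge;
         * one mismatch is enough to mark the hyperedge as cut. */
        for (int j = first + 1; j < last; j++) {
            if (part[hvertex[j]] != part[hvertex[first]]) {
                cuts++;
                break;
            }
        }
    }
    return cuts;
}
```

<p>A hyperedge with a single pin can never be cut, so the inner loop simply
does not execute for it.</p>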
<p>Zoltan contains a native parallel hypergraph partitioner, called PHG
(Parallel HyperGraph partitioner). In addition, Zoltan provides
access to <a href="http://bmi.osu.edu/%7Eumit/software.htm">PaToH</a>,
a serial hypergraph partitioner.
Note that PaToH is not part of Zoltan and should be obtained
separately from the <a href="http://bmi.osu.edu/%7Eumit/software.htm">
PaToH web site</a>.
Zoltan-PHG is a fully parallel multilevel hypergraph partitioner. For
further technical description, see <a
href="ug_refs.html#hypergraph-ipdps06">[Devine et al, 2006]</a>.<br>
</p>
<h4>Algorithm:</h4>
The algorithm used is multilevel hypergraph partitioning. For
coarsening, several versions of inner product (heavy connectivity)
matching are available.
The refinement is based on Fiduccia-Mattheyses (FM), but in parallel it
is only an approximation.
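<p>The idea behind inner product matching can be sketched as follows. This is
an illustration under stated assumptions, not the PHG implementation: the
function names, the <i>match</i> array (-1 meaning "unmatched") and the
optional <i>ewgt</i> hyperedge weights are hypothetical, the vertex-to-hyperedge
arrays are assumed to be in compressed form, and each hyperedge is assumed to
appear at most once per vertex.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Inner product of the hyperedge-incidence vectors of vertices u and v:
 * the sum of the weights of the hyperedges that contain both vertices.
 * The hyperedges of vertex v are vedge[vindex[v]] .. vedge[vindex[v+1]-1].
 * Passing ewgt == NULL means unit hyperedge weights. */
double inner_product(const int *vindex, const int *vedge, const double *ewgt,
                     int u, int v)
{
    double sum = 0.0;

    for (int i = vindex[u]; i < vindex[u + 1]; i++)
        for (int j = vindex[v]; j < vindex[v + 1]; j++)
            if (vedge[i] == vedge[j])
                sum += (ewgt != NULL) ? ewgt[vedge[i]] : 1.0;
    return sum;
}

/* Return the unmatched vertex with the largest positive inner product
 * with u (ties go to the lowest vertex number), or -1 if none exists. */
int best_partner(int n_vtx, const int *vindex, const int *vedge,
                 const double *ewgt, const int *match, int u)
{
    int best = -1;
    double best_ip = 0.0;

    assert(u >= 0 && u < n_vtx);

    for (int v = 0; v < n_vtx; v++) {
        if (v == u || match[v] != -1)
            continue;
        double ip = inner_product(vindex, vedge, ewgt, u, v);
        if (ip > best_ip) {
            best_ip = ip;
            best = v;
        }
    }
    return best;
}
```

<p>This brute-force version compares <i>u</i> against every vertex, which is
only meant to make the similarity measure concrete; a practical matcher would
visit only the vertices that share a hyperedge with <i>u</i>.</p>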
<h4>Parallel implementation:</h4>
A novel feature of our parallel implementation is that we use a 2D
distribution of the hypergraph. That is, each processor owns partial
data about some vertices and some hyperedges. The processors are
logically organized in a 2D grid as well. Most communication is limited
to either a processor row or column. This design should allow for
good scalability on large numbers of processors.<br>
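<p>A logical 2D processor grid of this kind could be set up as sketched
below. Everything here is an assumption made for illustration: the type and
function names are hypothetical, the grid shape is chosen as the most nearly
square factorization of the process count, and ranks are laid out with the x
coordinate varying fastest. The layout Zoltan actually uses should be checked
in the source code.</p>

```c
#include <assert.h>

/* Hypothetical grid coordinate of one process. */
typedef struct {
    int x;
    int y;
} GridCoord;

/* Pick grid dimensions nproc_x * nproc_y == nproc, with nproc_x the
 * largest divisor of nproc that does not exceed sqrt(nproc). */
void choose_grid(int nproc, int *nproc_x, int *nproc_y)
{
    assert(nproc > 0);

    *nproc_x = 1;
    for (int d = 1; d * d <= nproc; d++)
        if (nproc % d == 0)
            *nproc_x = d;
    *nproc_y = nproc / *nproc_x;
}

/* Map a linear rank to grid coordinates (x varies fastest). */
GridCoord rank_to_grid(int rank, int nproc_x, int nproc_y)
{
    GridCoord c;

    assert(nproc_x > 0 && nproc_y > 0);
    assert(rank >= 0 && rank < nproc_x * nproc_y);

    c.x = rank % nproc_x;
    c.y = rank / nproc_x;
    return c;
}

/* Two ranks are in the same grid row if they share the y coordinate. */
int same_grid_row(int rank_a, int rank_b, int nproc_x)
{
    return rank_a / nproc_x == rank_b / nproc_x;
}

/* Two ranks are in the same grid column if they share the x coordinate. */
int same_grid_column(int rank_a, int rank_b, int nproc_x)
{
    return rank_a % nproc_x == rank_b % nproc_x;
}
```

<p>With such a layout, restricting a collective operation to one row or one
column involves only <i>nproc_x</i> or <i>nproc_y</i> processes rather than
all of them.</p>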
<h4>Data structures:</h4>
The hypergraph is the most important data structure. This is stored as
a compressed sparse matrix. Note that in parallel, each processor owns
a local part of the global hypergraph
(a submatrix of the whole matrix).
The hypergraph data type is <i>struct HGraph</i>, and contains
information like number of vertices, hyperedges, pins, compressed
storage of all pins, optional vertex and edge weights, pointers
to relevant communicators, and more. One cryptic notation needs an
explanation: The arrays <i>hindex, hvertex</i> are used to
look up vertex info given a hyperedge, and <i>vindex, vedge</i> are
used to look up hyperedge info given a vertex. Essentially,
we store the hypergraph as a sparse matrix in both CSR and CSC formats.
This doubles the memory cost but gives better performance.
The data on each processor is stored using local indexing, starting at zero.
In order to get the global vertex or edge number, use the macros
<i>VTX_LNO_TO_GNO</i> and <i>EDGE_LNO_TO_GNO</i>. These macros will
look up the correct offsets (using the dist_x and dist_y arrays).
Note that <i>phg->nVtx</i> is always the local number of vertices,
which may be zero on some processors.
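<p>The relationship between the two index pairs can be illustrated with a
standalone sketch that derives <i>vindex, vedge</i> from <i>hindex,
hvertex</i>, i.e. a sparse matrix transpose. The function name
<i>build_vertex_view</i> is hypothetical and the code does not use
<i>struct HGraph</i>; it assumes local, zero-based indices and caller-allocated
output arrays of length <i>n_vtx+1</i> and <i>hindex[n_edges]</i>.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Build the vertex-to-hyperedge arrays (vindex, vedge) from the
 * hyperedge-to-vertex arrays (hindex, hvertex).
 * Returns 0 on success, -1 if temporary memory cannot be allocated. */
int build_vertex_view(int n_vtx, int n_edges,
                      const int *hindex, const int *hvertex,
                      int *vindex, int *vedge)
{
    int n_pins;
    int *next;

    assert(n_vtx >= 0 && n_edges >= 0);
    n_pins = hindex[n_edges];

    /* Pass 1: count the pins of each vertex, shifted by one slot. */
    for (int v = 0; v <= n_vtx; v++)
        vindex[v] = 0;
    for (int j = 0; j < n_pins; j++) {
        assert(hvertex[j] >= 0 && hvertex[j] < n_vtx);
        vindex[hvertex[j] + 1]++;
    }

    /* Prefix sum turns the counts into start offsets. */
    for (int v = 0; v < n_vtx; v++)
        vindex[v + 1] += vindex[v];

    /* Pass 2: scatter each hyperedge number into its vertices' lists. */
    next = malloc((size_t)(n_vtx + 1) * sizeof *next);
    if (next == NULL)
        return -1;
    for (int v = 0; v < n_vtx; v++)
        next[v] = vindex[v];
    for (int e = 0; e < n_edges; e++)
        for (int j = hindex[e]; j < hindex[e + 1]; j++)
            vedge[next[hvertex[j]]++] = e;

    free(next);
    return 0;
}
```

<p>Because hyperedges are visited in increasing order, each vertex's hyperedge
list comes out sorted. Keeping both views costs twice the pin storage, which is
the memory trade-off mentioned above.</p>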
<h4>Parameters:</h4>
In the User's Guide, only the most essential parameters have been
documented. There are several other parameters, intended for developers
and perhaps expert "power" users. We give a more complete list of all
parameters below. Note that these parameters <span
style="font-style: italic;">may change in future versions!<br>
</span>
For a precise list of parameters in a particular version of Zoltan, look at the source code (phg.c).
<table nosave="" width="100%">
<tbody>
<tr>
<td valign="top"><b>Method String:</b></td>
<td><b>HYPERGRAPH</b></td>
</tr>
<tr>
<td><b>Parameters:</b></td>
<td><br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">HYPERGRAPH_PACKAGE</span><br>
</td>
<td style="vertical-align: top;">PHG (parallel) or PaToH (serial)<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp; <span
style="font-style: italic;">CHECK_HYPERGRAPH</span><br>
</td>
<td style="vertical-align: top;">Check if input data is valid.
(Slows performance; intended for debugging.)<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;"><span style="font-style: italic;">&nbsp;&nbsp;&nbsp;
PHG_OUTPUT_LEVEL</span><br>
</td>
<td style="vertical-align: top;">Level of verbosity; 0 is silent.<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_FINAL_OUTPUT</span><br>
</td>
<td style="vertical-align: top;">Print stats about final
partition? (0/1)<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_NPROC_VERTEX</span><br>
</td>
<td style="vertical-align: top;">Desired number of processes in
the vertex direction (for 2D internal layout) </td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_NPROC_HEDGE</span><br>
</td>
<td style="vertical-align: top;">Desired number of processes in
the hyperedge direction (for 2D internal layout) </td>
</tr>
<tr>
<td valign="top"><i>&nbsp;&nbsp;&nbsp; PHG_COARSENING_METHOD</i></td>
<td>The method to use in matching/coarsening. The following methods are
currently available:&nbsp; <br>
<span style="font-style: italic;">agg</span> - agglomerative inner product
matching (a.k.a. heavy connectivity matching) <br>
<span style="font-style: italic;">ipm</span> - inner product
matching (a.k.a. heavy connectivity matching) <br>
<span style="font-style: italic;">c-ipm</span> -&nbsp; column
ipm;&nbsp; faster method based on ipm within processor columns <br>
<span style="font-style: italic;">a-ipm </span>- alternate
between fast method (l-ipm) and ipm <br>
<span style="font-style: italic;">l-ipm </span>-&nbsp; local ipm
on each processor. Fastest option&nbsp; but often gives poor quality. <br>
<i>h-ipm - </i>hybrid ipm that&nbsp; uses partial c-ipm followed
by ipm on each level <br>
<i><br>
</i></td>
</tr>
<tr>
<td>&nbsp;&nbsp;&nbsp; <span style="font-style: italic;">PHG_COARSENING_LIMIT</span><br>
</td>
<td>Number of vertices at which to stop coarsening.<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_VERTEX_VISIT_ORDER</span><br>
</td>
<td style="vertical-align: top;">Ordering of vertices in greedy
matching scheme:<br>
0 - random<br>
1 - natural order (as given by the query functions)<br>
2 - increasing vertex weights<br>
3 - increasing vertex degree<br>
4 - increasing vertex degree, weighted by pins<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_EDGE_SCALING</span><br>
</td>
<td style="vertical-align: top;">Scale edge weights by some
function of size of the hyperedges:<br>
0 - no scaling<br>
1 - scale by 1/(size-1)&nbsp;&nbsp;&nbsp;&nbsp; [absorption scaling]<br>
2 - scale by 2/((size*size-1)) [clique scaling]<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_VERTEX_SCALING</span><br>
</td>
<td style="vertical-align: top;">Variations in "inner product"
similarity metric (for matching):<br>
0 - Euclidean inner product: &lt;x,y&gt;<br>
1 - cosine similarity: &lt;x,y&gt;/(|x|*|y|)<br>
2 - &lt;x,y&gt;/(|x|^2 * |y|^2)<br>
3 - scale by sqrt of vertex weights<br>
4 - scale by vertex weights<br>
</td>
</tr>
<tr>
<td valign="top">&nbsp;&nbsp;&nbsp; <i>PHG_COARSEPARTITION_METHOD</i></td>
<td>Method to partition the coarsest (smallest) hypergraph;
typically done in serial:<br>
<span style="font-style: italic;">random</span> - random<br>
<span style="font-style: italic;">linear</span> - linear
(natural) order<br>
<span style="font-style: italic;">greedy </span>- greedy method
based on minimizing cuts<br>
<span style="font-style: italic;">auto </span>- automatically
select from the above methods (in parallel, the processes will use
different methods)<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp;&nbsp; <span
style="font-style: italic;">PHG_REFINEMENT_METHOD</span><br>
</td>
<td style="vertical-align: top;">Refinement algorithm:<br>
&nbsp;<span style="font-style: italic;">fm </span>- two-way
approximate&nbsp; FM<br>
<span style="font-style: italic;">none</span> - no refinement<br>
</td>
</tr>
<tr>
<td>&nbsp;&nbsp;&nbsp; <i>PHG_REFINEMENT_LOOP_LIMIT</i></td>
<td>Loop limit in FM refinement. Higher number means more
refinement. <br>
</td>
</tr>
<tr nosave="" valign="top">
<td>&nbsp;&nbsp;&nbsp; <span style="font-style: italic;">PHG_REFINEMENT_MAX_NEG_MOVE</span><br>
</td>
<td nosave="">Maximum number of negative moves allowed in FM.<br>
</td>
</tr>
<tr nosave="" valign="top">
<td>&nbsp;&nbsp; <span style="font-style: italic;">PHG_BAL_TOL_ADJUSTMENT</span><br>
</td>
<td nosave="">Controls how the balance tolerance is adjusted at
each level of bisection.<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp; <span
style="font-style: italic;">PHG_RANDOMIZE_INPUT</span><br>
</td>
<td style="vertical-align: top;">Randomize layout of vertices and
hyperedges in internal parallel 2D layout? (0/1)<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp; <a
name="PHG_EDGE_WEIGHT_OPERATION"></a><span style="font-style: italic;">PHG_EDGE_WEIGHT_OPERATION</span>
</td>
<td style="vertical-align: top;">Operation to be applied to edge
weights supplied by different processes for the same hyperedge:<br>
<i>add</i> - the hyperedge weight will be the sum of the supplied
weights<br>
<i>max</i> - the hyperedge weight will be the maximum of the
supplied weights<br>
<i>error</i> - if the hyperedge weights are not equal, Zoltan
will flag an error, otherwise the hyperedge weight will be the value
returned by the processes<br>
</td>
</tr>
<tr nosave="" valign="top">
<td>&nbsp;&nbsp; <span style="font-style: italic;">EDGE_SIZE_THRESHOLD</span><br>
</td>
<td nosave="">Ignore hyperedges whose size exceeds this fraction of the
number of vertices.<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp; <span
style="font-style: italic;">PATOH_ALLOC_POOL0</span><br>
</td>
<td style="vertical-align: top;">Memory allocation for PaToH; see
the PaToH manual for details.<br>
</td>
</tr>
<tr>
<td style="vertical-align: top;">&nbsp;&nbsp; <span
style="font-style: italic;">PATOH_ALLOC_POOL1</span><br>
</td>
<td style="vertical-align: top;">Memory allocation for PaToH; see
the PaToH manual for details.</td>
</tr>
<tr>
<td valign="top"><b>Default values:</b></td>
<td><br>
</td>
</tr>
<tr>
<td><br>
</td>
<td><i>HYPERGRAPH_PACKAGE = PHG<br>
</i></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">CHECK_HYPERGRAPH</span>
= 0<br>
</td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_OUTPUT_LEVEL=0</span></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_FINAL_OUTPUT=0</span></td>
</tr>
<tr>
<td><br>
</td>
<td><i>PHG_COARSENING_METHOD=ipm</i></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_COARSENING_LIMIT=100</span></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_VERTEX_VISIT_ORDER=0</span></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_EDGE_SCALING=0</span></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_VERTEX_SCALING=0</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><i>PHG_COARSEPARTITION_METHOD=greedy</i></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PHG_REFINEMENT_METHOD=fm</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><i>PHG_REFINEMENT_LOOP_LIMIT=10</i></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PHG_REFINEMENT_MAX_NEG_MOVE=100</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PHG_BAL_TOL_ADJUSTMENT=0.7</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PHG_RANDOMIZE_INPUT=0</span></td>
</tr>
<tr>
<td><br>
</td>
<td><span style="font-style: italic;">PHG_EDGE_WEIGHT_OPERATION=max</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">EDGE_SIZE_THRESHOLD=0.25</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PATOH_ALLOC_POOL0=0</span></td>
</tr>
<tr>
<td style="vertical-align: top;"><br>
</td>
<td style="vertical-align: top;"><span style="font-style: italic;">PATOH_ALLOC_POOL1=0</span></td>
</tr>
<tr>
<td valign="top"><b>Required Query Functions:</b></td>
<td><br>
</td>
</tr>
<tr>
<td><br>
</td>
<td><b><a href="../ug_html/ug_query_lb.html#ZOLTAN_NUM_OBJ_FN">ZOLTAN_NUM_OBJ_FN</a></b></td>
</tr>
<tr>
<td><br>
</td>
<td><b><a href="../ug_html/ug_query_lb.html#ZOLTAN_OBJ_LIST_FN">ZOLTAN_OBJ_LIST_FN</a></b>
or <b><a href="../ug_html/ug_query_lb.html#ZOLTAN_FIRST_OBJ_FN">ZOLTAN_FIRST_OBJ_FN</a></b>/<b><a
href="../ug_html/ug_query_lb.html#ZOLTAN_NEXT_OBJ_FN">ZOLTAN_NEXT_OBJ_FN</a></b>
pair</td>
</tr>
<tr nosave="" valign="top">
<td><br>
</td>
<td nosave=""> <b><a href="../ug_html/ug_query_lb.html#ZOLTAN_HG_SIZE_CS_FN">ZOLTAN_HG_SIZE_CS_FN</a></b>
<br>
<b><a href="../ug_html/ug_query_lb.html#ZOLTAN_HG_CS_FN">ZOLTAN_HG_CS_FN</a></b>
</td>
</tr>
<tr>
<td valign="top"><b>Optional Query Functions:</b></td>
<td><br>
</td>
</tr>
<tr>
<td><br>
</td>
<td><b><a href="../ug_html/ug_query_lb.html#ZOLTAN_HG_SIZE_EDGE_WTS_FN">ZOLTAN_HG_SIZE_EDGE_WTS_FN</a></b></td>
</tr>
<tr>
<td><br>
</td>
<td><b><a href="../ug_html/ug_query_lb.html#ZOLTAN_HG_EDGE_WTS_FN">ZOLTAN_HG_EDGE_WTS_FN</a></b></td>
</tr>
</tbody>
</table>
<p>
It is possible to provide the graph query functions instead of the
hypergraph queries, though this is not recommended. If only graph query
functions are registered, Zoltan will automatically create a hypergraph
from the graph, but some information (specifically, edge weights) will
be lost. </p>
<hr width="100%">[<a href="ug.html">Table of Contents</a>&nbsp; | <a
href="dev_reftree.html">Next:&nbsp;
Refinement Tree Partitioning(NEANEA CHANGE ME)</a>&nbsp; |&nbsp; <a
href="dev_parmetis.html">Previous:&nbsp;
ParMetis(NEANEA CHANGE ME)</a>&nbsp; |&nbsp; <a href="http://www.sandia.gov/general/privacy-security/index.html">Privacy and Security</a>]
</body>
</html>

Binary file not shown.

@ -0,0 +1,38 @@
\relax
\ifx\hyper@anchor\@undefined
\global \let \oldcontentsline\contentsline
\gdef \contentsline#1#2#3#4{\oldcontentsline{#1}{#2}{#3}}
\global \let \oldnewlabel\newlabel
\gdef \newlabel#1#2{\newlabelxx{#1}#2}
\gdef \newlabelxx#1#2#3#4#5#6{\oldnewlabel{#1}{{#2}{#3}}}
\AtEndDocument{\let \contentsline\oldcontentsline
\let \newlabel\oldnewlabel}
\else
\global \let \hyper@last\relax
\fi
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}{section.1}}
\@writefile{toc}{\contentsline {section}{\numberline {2}Parallel hypergraphs and geometric input}{1}{section.2}}
\@writefile{toc}{\contentsline {section}{\numberline {3}PHG, MPI and 2-dimensional representation}{2}{section.3}}
\@writefile{lot}{\contentsline {table}{\numberline {1}{\ignorespaces Before communication}}{2}{table.1}}
\newlabel{tab:0/tc}{{1}{2}{\label {tab:0/tc} Before communication\relax }{table.1}{}}
\@writefile{lot}{\contentsline {table}{\numberline {2}{\ignorespaces After communication}}{2}{table.2}}
\newlabel{tab:1/tc}{{2}{2}{\label {tab:1/tc} After communication\relax }{table.2}{}}
\@writefile{toc}{\contentsline {section}{\numberline {4}Matching}{3}{section.4}}
\@writefile{toc}{\contentsline {section}{\numberline {5}Reduction factor}{3}{section.5}}
\citation{Catalyurek}
\@writefile{toc}{\contentsline {section}{\numberline {6}Results}{4}{section.6}}
\@writefile{lof}{\contentsline {figure}{\numberline {1}{\ignorespaces Runtimes on 128 processors}}{4}{figure.1}}
\newlabel{fig:Times_np_128}{{1}{4}{Runtimes on 128 processors\relax }{figure.1}{}}
\bibcite{Catalyurek}{1}
\@writefile{toc}{\contentsline {section}{\numberline {7}Conclusion and discussion}{5}{section.7}}
\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Cuts on 128 processors}}{6}{figure.2}}
\newlabel{fig:Cuts_np_128}{{2}{6}{Cuts on 128 processors\relax }{figure.2}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces Timing by percentage on 128 processors (UL, Shockstem 3D; UR, Shockstem 3D -- 108; LL, RPI; LR, Slac1.5}}{6}{figure.3}}
\newlabel{fig:Percent_np_128}{{3}{6}{Timing by percentage on 128 processors (UL, Shockstem 3D; UR, Shockstem 3D -- 108; LL, RPI; LR, Slac1.5\relax }{figure.3}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {4}{\ignorespaces Runtimes in serial on 2 processors}}{7}{figure.4}}
\newlabel{fig:Times_np_2}{{4}{7}{Runtimes in serial on 2 processors\relax }{figure.4}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {5}{\ignorespaces Cuts in serial on 2 processors}}{7}{figure.5}}
\newlabel{fig:Cuts_np_2}{{5}{7}{Cuts in serial on 2 processors\relax }{figure.5}{}}
\@writefile{lof}{\contentsline {figure}{\numberline {6}{\ignorespaces Timing by percentage on 2 processors (UL, Shockstem 3D; UR, Shockstem 3D -- 108; LL, RPI; LR, Slac1.5}}{9}{figure.6}}
\newlabel{fig:Percent_np_2}{{6}{9}{Timing by percentage on 2 processors (UL, Shockstem 3D; UR, Shockstem 3D -- 108; LL, RPI; LR, Slac1.5\relax }{figure.6}{}}

@ -0,0 +1,336 @@
This is pdfTeXk, Version 3.141592-1.40.3 (Web2C 7.5.6) (format=pdflatex 2011.6.3) 18 AUG 2011 13:37
entering extended mode
%&-line parsing enabled.
**hybrid_current.tex
(./hybrid_current.tex
LaTeX2e <2005/12/01>
Babel <v3.8h> and hyphenation patterns for english, usenglishmax, dumylang, noh
yphenation, arabic, basque, bulgarian, coptic, welsh, czech, slovak, german, ng
erman, danish, esperanto, spanish, catalan, galician, estonian, farsi, finnish,
french, greek, monogreek, ancientgreek, croatian, hungarian, interlingua, ibyc
us, indonesian, icelandic, italian, latin, mongolian, dutch, norsk, polish, por
tuguese, pinyin, romanian, russian, slovenian, uppersorbian, serbian, swedish,
turkish, ukenglish, ukrainian, loaded.
(/usr/share/texmf/tex/latex/base/article.cls
Document Class: article 2005/09/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size12.clo
File: size12.clo 2005/09/16 v1.4f Standard LaTeX file (size option)
)
\c@part=\count79
\c@section=\count80
\c@subsection=\count81
\c@subsubsection=\count82
\c@paragraph=\count83
\c@subparagraph=\count84
\c@figure=\count85
\c@table=\count86
\abovecaptionskip=\skip41
\belowcaptionskip=\skip42
\bibindent=\dimen102
)
(/usr/share/texmf/tex/latex/amsmath/amsmath.sty
Package: amsmath 2000/07/18 v2.13 AMS math features
\@mathmargin=\skip43
For additional information on amsmath, use the `?' option.
(/usr/share/texmf/tex/latex/amsmath/amstext.sty
Package: amstext 2000/06/29 v2.01
(/usr/share/texmf/tex/latex/amsmath/amsgen.sty
File: amsgen.sty 1999/11/30 v2.0
\@emptytoks=\toks14
\ex@=\dimen103
))
(/usr/share/texmf/tex/latex/amsmath/amsbsy.sty
Package: amsbsy 1999/11/29 v1.2d
\pmbraise@=\dimen104
)
(/usr/share/texmf/tex/latex/amsmath/amsopn.sty
Package: amsopn 1999/12/14 v2.01 operator names
)
\inf@bad=\count87
LaTeX Info: Redefining \frac on input line 211.
\uproot@=\count88
\leftroot@=\count89
LaTeX Info: Redefining \overline on input line 307.
\classnum@=\count90
\DOTSCASE@=\count91
LaTeX Info: Redefining \ldots on input line 379.
LaTeX Info: Redefining \dots on input line 382.
LaTeX Info: Redefining \cdots on input line 467.
\Mathstrutbox@=\box26
\strutbox@=\box27
\big@size=\dimen105
LaTeX Font Info: Redeclaring font encoding OML on input line 567.
LaTeX Font Info: Redeclaring font encoding OMS on input line 568.
\macc@depth=\count92
\c@MaxMatrixCols=\count93
\dotsspace@=\muskip10
\c@parentequation=\count94
\dspbrk@lvl=\count95
\tag@help=\toks15
\row@=\count96
\column@=\count97
\maxfields@=\count98
\andhelp@=\toks16
\eqnshift@=\dimen106
\alignsep@=\dimen107
\tagshift@=\dimen108
\tagwidth@=\dimen109
\totwidth@=\dimen110
\lineht@=\dimen111
\@envbody=\toks17
\multlinegap=\skip44
\multlinetaggap=\skip45
\mathdisplay@stack=\toks18
LaTeX Info: Redefining \[ on input line 2666.
LaTeX Info: Redefining \] on input line 2667.
)
(/usr/share/texmf/tex/latex/graphics/graphicx.sty
Package: graphicx 1999/02/16 v1.0f Enhanced LaTeX Graphics (DPC,SPQR)
(/usr/share/texmf/tex/latex/graphics/keyval.sty
Package: keyval 1999/03/16 v1.13 key=value parser (DPC)
\KV@toks@=\toks19
)
(/usr/share/texmf/tex/latex/graphics/graphics.sty
Package: graphics 2006/02/20 v1.0o Standard LaTeX Graphics (DPC,SPQR)
(/usr/share/texmf/tex/latex/graphics/trig.sty
Package: trig 1999/03/16 v1.09 sin cos tan (DPC)
)
(/usr/share/texmf/tex/latex/config/graphics.cfg
File: graphics.cfg 2007/01/18 v1.5 graphics configuration of teTeX/TeXLive
)
Package graphics Info: Driver file: pdftex.def on input line 90.
(/usr/share/texmf/tex/latex/pdftex-def/pdftex.def
File: pdftex.def 2007/01/08 v0.04d Graphics/color for pdfTeX
\Gread@gobject=\count99
))
\Gin@req@height=\dimen112
\Gin@req@width=\dimen113
)
(/usr/share/texmf/tex/latex/tools/verbatim.sty
Package: verbatim 2003/08/22 v1.5q LaTeX2e package for verbatim enhancements
\every@verbatim=\toks20
\verbatim@line=\toks21
\verbatim@in@stream=\read1
)
(/usr/share/texmf/tex/latex/graphics/color.sty
Package: color 2005/11/14 v1.0j Standard LaTeX Color (DPC)
(/usr/share/texmf/tex/latex/config/color.cfg
File: color.cfg 2007/01/18 v1.5 color configuration of teTeX/TeXLive
)
Package color Info: Driver file: pdftex.def on input line 130.
)
(/usr/share/texmf/tex/latex/subfigure/subfigure.sty
Package: subfigure 2002/03/15 v2.1.5 subfigure package
\subfigtopskip=\skip46
\subfigcapskip=\skip47
\subfigcaptopadj=\dimen114
\subfigbottomskip=\skip48
\subfigcapmargin=\dimen115
\subfiglabelskip=\skip49
\c@subfigure=\count100
\c@lofdepth=\count101
\c@subtable=\count102
\c@lotdepth=\count103
****************************************
* Local config file subfigure.cfg used *
****************************************
(/usr/share/texmf/tex/latex/subfigure/subfigure.cfg)
\subfig@top=\skip50
\subfig@bottom=\skip51
)
(/usr/share/texmf/tex/latex/hyperref/hyperref.sty
Package: hyperref 2007/02/07 v6.75r Hypertext links for LaTeX
\@linkdim=\dimen116
\Hy@linkcounter=\count104
\Hy@pagecounter=\count105
(/usr/share/texmf/tex/latex/hyperref/pd1enc.def
File: pd1enc.def 2007/02/07 v6.75r Hyperref: PDFDocEncoding definition (HO)
)
(/usr/share/texmf/tex/latex/config/hyperref.cfg
File: hyperref.cfg 2002/06/06 v1.2 hyperref configuration of TeXLive
)
(/usr/share/texmf/tex/latex/oberdiek/kvoptions.sty
Package: kvoptions 2006/08/22 v2.4 Connects package keyval with LaTeX options (
HO)
)
Package hyperref Info: Hyper figures OFF on input line 2288.
Package hyperref Info: Link nesting OFF on input line 2293.
Package hyperref Info: Hyper index ON on input line 2296.
Package hyperref Info: Plain pages OFF on input line 2303.
Package hyperref Info: Backreferencing OFF on input line 2308.
Implicit mode ON; LaTeX internals redefined
Package hyperref Info: Bookmarks ON on input line 2444.
(/usr/share/texmf/tex/latex/ltxmisc/url.sty
\Urlmuskip=\muskip11
Package: url 2005/06/27 ver 3.2 Verb mode for urls, etc.
)
LaTeX Info: Redefining \url on input line 2599.
\Fld@menulength=\count106
\Field@Width=\dimen117
\Fld@charsize=\dimen118
\Choice@toks=\toks22
\Field@toks=\toks23
Package hyperref Info: Hyper figures OFF on input line 3102.
Package hyperref Info: Link nesting OFF on input line 3107.
Package hyperref Info: Hyper index ON on input line 3110.
Package hyperref Info: backreferencing OFF on input line 3117.
Package hyperref Info: Link coloring OFF on input line 3122.
\Hy@abspage=\count107
\c@Item=\count108
\c@Hfootnote=\count109
)
*hyperref using default driver hpdftex*
(/usr/share/texmf/tex/latex/hyperref/hpdftex.def
File: hpdftex.def 2007/02/07 v6.75r Hyperref driver for pdfTeX
\Fld@listcount=\count110
) (./hybrid_current.aux)
\openout1 = `hybrid_current.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 26.
LaTeX Font Info: ... okay on input line 26.
Package hyperref Info: Link coloring OFF on input line 26.
(/usr/share/texmf/tex/latex/hyperref/nameref.sty
Package: nameref 2006/12/27 v2.28 Cross-referencing by name of section
(/usr/share/texmf/tex/latex/oberdiek/refcount.sty
Package: refcount 2006/02/20 v3.0 Data extraction from references (HO)
)
\c@section@level=\count111
)
LaTeX Info: Redefining \ref on input line 26.
LaTeX Info: Redefining \pageref on input line 26.
(./hybrid_current.out)
(./hybrid_current.out)
\@outlinefile=\write3
\openout3 = `hybrid_current.out'.
! Missing $ inserted.
<inserted text>
$
l.73 that is, \forall
\, $v_x$\in\, $H:$\, \exists\, $C_x = \{c_0, c_1, ...,...
?
! Missing $ inserted.
<inserted text>
$
l.73 that is, \forall\, $v_
x$\in\, $H:$\, \exists\, $C_x = \{c_0, c_1, ...,...
?
! Missing $ inserted.
<inserted text>
$
l.73 that is, \forall\, $v_x$\in
\, $H:$\, \exists\, $C_x = \{c_0, c_1, ...,...
?
! Missing $ inserted.
<inserted text>
$
l.73 ... \forall\, $v_x$\in\, $H:$\, \exists\, $C_
x = \{c_0, c_1, ..., c_{n...
?
[1
{/usr/share/texmf/fonts/map/pdftex/updmap/pdftex.map}]
! Missing $ inserted.
<inserted text>
$
l.132 ...}^{numProc-1} ($number of local vertices_
i$)$.
?
! Missing $ inserted.
<inserted text>
$
l.133
?
[2] [3] <128_time.pdf, id=61, 794.97pt x 614.295pt>
File: 128_time.pdf Graphic file (type pdf)
<use 128_time.pdf>
<128_cutl.pdf, id=62, 794.97pt x 614.295pt>
File: 128_cutl.pdf Graphic file (type pdf)
<use 128_cutl.pdf> [4 <./128_time.pdf
pdfTeX warning: pdflatex (file ./128_time.pdf): PDF inclusion: Page Group detec
ted which pdfTeX can't handle. Ignoring it.
>] <128_breakdown_percent.pdf, id=76, 794.97pt x 614.295pt>
File: 128_breakdown_percent.pdf Graphic file (type pdf)
<use 128_breakdown_percent.pdf> <2_time.pdf, id=77, 794.97pt x 614.295pt>
File: 2_time.pdf Graphic file (type pdf)
<use 2_time.pdf> <2_cutl.pdf, id=78, 794.97pt x 614.295pt>
File: 2_cutl.pdf Graphic file (type pdf)
<use 2_cutl.pdf>
<2_breakdown_percent.pdf, id=79, 794.97pt x 614.295pt>
File: 2_breakdown_percent.pdf Graphic file (type pdf)
<use 2_breakdown_percent.pdf> [5] [6 <./128_cutl.pdf
pdfTeX warning: pdflatex (file ./128_cutl.pdf): PDF inclusion: Page Group detec
ted which pdfTeX can't handle. Ignoring it.
> <./128_breakdown_percent.pdf
pdfTeX warning: pdflatex (file ./128_breakdown_percent.pdf): PDF inclusion: Pag
e Group detected which pdfTeX can't handle. Ignoring it.
>] [7 <./2_time.pdf
pdfTeX warning: pdflatex (file ./2_time.pdf): PDF inclusion: Page Group detecte
d which pdfTeX can't handle. Ignoring it.
> <./2_cutl.pdf
pdfTeX warning: pdflatex (file ./2_cutl.pdf): PDF inclusion: Page Group detecte
d which pdfTeX can't handle. Ignoring it.
>] [8] [9 <./2_breakdown_percent.pdf
pdfTeX warning: pdflatex (file ./2_breakdown_percent.pdf): PDF inclusion: Page
Group detected which pdfTeX can't handle. Ignoring it.
>] (./hybrid_current.aux) )
Here is how much of TeX's memory you used:
3336 strings out of 256216
44724 string characters out of 1917073
104735 words of memory out of 1500000
6577 multiletter control sequences out of 10000+200000
8770 words of font info for 32 fonts, out of 1200000 for 2000
645 hyphenation exceptions out of 8191
27i,9n,36p,252b,420s stack positions out of 5000i,500n,6000p,200000b,15000s
</usr/share/texmf/fonts/type1/bluesky/cm/cmbx12.pfb>
</usr/share/texmf/fonts/type1/bluesky/cm/cmex10.pfb></usr/share/texmf/fonts/typ
e1/bluesky/cm/cmmi12.pfb></usr/share/texmf/fonts/type1/bluesky/cm/cmmi8.pfb></u
sr/share/texmf/fonts/type1/bluesky/cm/cmr10.pfb></usr/share/texmf/fonts/type1/b
luesky/cm/cmr12.pfb></usr/share/texmf/fonts/type1/bluesky/cm/cmr8.pfb></usr/sha
re/texmf/fonts/type1/bluesky/cm/cmsy10.pfb></usr/share/texmf/fonts/type1/bluesk
y/cm/cmsy8.pfb></usr/share/texmf/fonts/type1/bluesky/cm/cmti12.pfb></usr/share/
texmf/fonts/type1/bluesky/cm/cmtt12.pfb>
Output written on hybrid_current.pdf (9 pages, 186635 bytes).
PDF statistics:
186 PDF objects out of 1000 (max. 8388607)
27 named destinations out of 1000 (max. 131072)
103 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file not shown.

View File

@ -0,0 +1,296 @@
\documentclass[12pt]{article}
\usepackage{amsmath} % need for subequations
\usepackage{graphicx} % need for figures
\usepackage{verbatim} % useful for program listings
\usepackage{color} % use if color is used in text
\usepackage{subfigure} % use for side-by-side figures
\usepackage{hyperref} % use for hypertext links, including those to external documents and URLs
\setlength{\baselineskip}{16.0pt} % 16 pt usual spacing between lines
\setlength{\parskip}{3pt plus 2pt}
\setlength{\parindent}{20pt}
\setlength{\oddsidemargin}{0.5cm}
\setlength{\evensidemargin}{0.5cm}
\setlength{\marginparsep}{0.75cm}
\setlength{\marginparwidth}{2.5cm}
\setlength{\marginparpush}{1.0cm}
\setlength{\textwidth}{150mm}
\begin{comment}
\pagestyle{empty}
\end{comment}
\begin{document}
\begin{center}
{\large Hybrid Partitioning in Zoltan} \\
Nick Aase, Karen Devine \\
Summer, 2011
\end{center}
\section{Introduction}
When used for partitioning, Zoltan has a wide range of algorithms
available to it. Traditionally they have fallen into two categories:
geometric-based partitioning, and topology-based partitioning. Each
method has its own strengths and weaknesses which ultimately come down
to the tradeoff between speed and quality, and the onus is placed
upon the user to determine which is more desirable for the project
at hand.
In our project we strove to develop a hybrid partitioning algorithm:
one that attempts to take advantage of the efficiency of geometric
methods, as well as the precision of topological ones. The reasoning
behind this concept is that problem sets with large amounts of data may
be more easily digestible by topological methods if they are first
reduced into manageable pieces based on their geometry.
The two subjects chosen for this project were the Recursive
Coordinate Bisection (RCB) algorithm and Parallel Hypergraph
partitioning (PHG). RCB is an extremely fast method of partitioning,
but it can be clumsy at times when it ``cuts'' across a coordinate plane.
On the other hand, PHG has a good understanding of the relationships
between data, making its partitioning quite accurate, but it suffers
from having to spend a great deal of time finding those relationships.
For further information on implementing hybrid partitioning, please see
the developer's guide at
http://www.cs.sandia.gov/Zoltan/dev\_html/dev\_hybrid.html
\section{Parallel hypergraphs and geometric input}
In order for Zoltan to support hybrid partitioning, it is necessary
to properly and frequently obtain, preserve, and communicate coordinate
data. The first step that needed to be taken was to modify PHG to
support coordinate information. Hypergraph objects carry a substantial
amount of data already, but we had to add an array of floating point
values to store the coordinates. Currently, when a hypergraph is built and
geometric information is available from the input, each vertex will have
a corresponding subset within the array defining its coordinates;
that is, $\forall\, v_x \in H:\ \exists\, C_x = \{c_0, c_1, \ldots, c_{n-1}\}$,
where $v_x$ is an arbitrary vertex in the hypergraph $H$, $C_x$ is its
corresponding coordinate subset, and $n$ is the number of dimensions in
the system. In this way, Zoltan can treat each coordinate subset as an
element of that vertex.
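As a concrete illustration, suppose the coordinate subsets are stored back to
back in a single array $A$, in vertex order. This layout is an assumption made
only for the sake of the example; the description above does not fix it. The
subset belonging to vertex $v_x$ would then be
\[
C_x = \{A[nx],\; A[nx+1],\; \ldots,\; A[nx+n-1]\},
\]
so that in a three-dimensional system ($n = 3$) vertex $v_2$ would own
$A[6]$, $A[7]$, and $A[8]$.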
\section{PHG, MPI and 2-dimensional representation}
PHG is interesting in that multiple processors can share partial data
that describes the properties of hyperedges and vertices. This sort of
system can be represented in a 2-dimensional distribution similar to
Table 1. A populated field represents that a processor on the y-axis has
data related to the vertex on the x-axis. In this example, you can see
that processors $P_0$ and $P_2$ share data describing vertices $v_0$ and
$v_2$.
\begin{table}[h]
\begin{center}
\begin{tabular}{|r|l|l|l|}
\hline
Processor & $v_0$ & $v_1$ & $v_2$ \\
\hline
$P_0$ & x & & x \\
\hline
$P_1$ & & x & \\
\hline
$P_2$ & x & & x \\
\hline
\end{tabular}
\caption{\label{tab:0/tc} Before communication}
\end{center}
\end{table}
Using Message Passing Interface (MPI) communicators, it is possible to
communicate with processors by column. We use an \texttt{MPI\_Allreduce}
call to collect data from each processor, which groups them into a usable
form. Consider Table 2.
\begin{table}[h]
\begin{center}
\begin{tabular}{|r|l|l|l|}
\hline
Processor & $v_0$ & $v_1$ & $v_2$ \\
\hline
$P_0$ & x & & \\
\hline
$P_1$ & & x & \\
\hline
$P_2$ & & & x \\
\hline
\end{tabular}
\caption{\label{tab:1/tc} After communication}
\end{center}
\end{table}
This same sort of operation is performed with weight data, so implementing
it on coordinate data was simply another step in setting up PHG to support
coordinate information from the input. Afterwards the entirety of a vertex's
data will be unique to a single processor, with the number of global
vertices equal to $\sum_{i=0}^{numProc-1} (\text{number of local vertices})_i$.
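For the small example in Table 2, each of the three processors owns exactly
one vertex after communication, so the count works out as
\[
\sum_{i=0}^{2} (\text{number of local vertices})_i = 1 + 1 + 1 = 3,
\]
which matches the three global vertices $v_0$, $v_1$, and $v_2$.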
\section{Matching}
There are several matching methods already native to Zoltan and specific to
PHG, but we needed to create a new method in order to use RCB on the
hypergraph data. Before the actual matching occurs, several specialized
callbacks and parameters are registered. Doing this is crucial if RCB and PHG
are to interface properly with each other.
The next task is to physically call RCB. It was easy enough to send PHG
data to RCB as we simply used the \texttt{Zoltan\_LB\_Partition} wrapper,
not unlike other standard load balancing partitioners. However, getting
matchings \emph{back} from RCB to PHG was another matter entirely. Thanks to
Dr. Devine's work, we were able to effectively commandeer one of RCB's unused
return values: since all matching algorithms conform syntactically to the
aforementioned load-balancing wrapper, there are some arguments and/or
values that are never used, depending on what data that partitioner needs. In
the case of RCB, the return value \texttt{*export\_global\_ids}, which is
defined in its prototype, was never actually computed. Dr. Devine was able
to rewire RCB so that, when using hybrid partitioning, it would return the
IDs of the matchings we need for each hypergraph (which are referred to in
the matching procedure as \emph{candidates}).
This new matching procedure is similar to PHG's agglomerative matching,
whereby candidate vertices are selected to represent groups of similar
vertices. These candidates then make up the standard vertices in the
resultant coarse hypergraph. The major difference is that standard
agglomerative matching determines its candidates by the connectivity of
vertices to one another; the more heavily connected a subset of vertices
is, the more likely they will share the same candidate. Using RCB means
making the assumption that related vertices will be geometrically similar:
recursive geometric cuts will be more likely to naturally bisect less
connected parts of the hypergraph, and the vertices that are members of
the resulting subdomains will share the same candidates. Given RCB's
track record, this method should be significantly faster than the
agglomerative matching.
\section{Reduction factor}
When using hybrid partitioning, the user passes a parameter in the input
file called \texttt{HYBRID\_REDUCTION\_FACTOR}, which is a number $> 0$
and $\leq 1$ that gets passed into RCB. This parameter defines the
aggressiveness of the overall procedure: it determines the fraction of
the original vertex count that remains after coarsening (e.g., for the
original, fine hypergraph, $H_f$, where the number of vertices
$|V_f| = 1000$, and a reduction factor of $f = 0.1$, the coarse hypergraph,
$H_c$, will have $|V_c| = 100$ vertices).
This gives the user more control over the balance between quality
and efficiency.
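As a rough worked illustration of this parameter, assume the coarse vertex
count is simply the product of the reduction factor and the fine vertex count
(how a non-integer product would be rounded is not specified here):
\[
|V_c| = f \cdot |V_f| .
\]
Under that assumption, a fine hypergraph with $|V_f| = 1000$ would coarsen to
\begin{align*}
f = 0.01 &: \quad |V_c| = 10, \\
f = 0.05 &: \quad |V_c| = 50, \\
f = 0.1 &: \quad |V_c| = 100,
\end{align*}
so a smaller factor hands PHG a smaller coarse problem at the cost of a more
aggressive geometric reduction.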
\section{Results}
We ran experiments primarily with 2 and 128 processors on the Odin cluster
at Sandia National Labs, though there were brief, undocumented forays with
16 and 32 processors as well. Odin has two AMD Opteron 2.2GHz processors
and 4GB of RAM on each node, which are connected with a Myrinet network
\cite{Catalyurek}. The partitioning methods used were RCB, PHG, and hybrid
partitioning with reduction factors of 0.01, 0.05, and 0.1. Each run went
through 10 iterations of the scenario. The runs with 128 processors were
given 5 different meshes to run on, whereas the 2-processor runs only ran
on the 4 smaller meshes, as the cluster was undergoing diagnostics at the
time of the experiments.
%NEED TIMES @ 128 PROCS
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=80mm]{128_time.pdf}
\caption{Runtimes on 128 processors}\label{fig:Times_np_128}
\end{figure}
%NEED cutl @ 128 PROCS
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=70mm]{128_cutl.pdf}
\caption{Cuts on 128 processors}\label{fig:Cuts_np_128}
\end{figure}
You can see from Figures 1 and 2 that at 128 processors the hybrid methods
are mainly slower than PHG and less accurate than RCB: both results are
the inverse of what we had hoped. There was better news, though, when we
looked at where the processes were spending their time:
%timer breakdowns for 128
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=70mm]{128_breakdown_percent.pdf}
\caption{Timing by percentage on 128 processors (UL, Shockstem 3D; UR,
Shockstem 3D -- 108; LL, RPI; LR, Slac1.5)}\label{fig:Percent_np_128}
\end{figure}
The dramatic decrease in the matching time meant that RCB was, indeed,
helping on that front.
When we ran our simulations in serial, however, we saw some very different
results:
%times, cutl
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=80mm]{2_time.pdf}
\caption{Runtimes in serial on 2 processors}\label{fig:Times_np_2}
\end{figure}
%cutl @ 2 PROCS
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=70mm]{2_cutl.pdf}
\caption{Cuts in serial on 2 processors}\label{fig:Cuts_np_2}
\end{figure}
In general the hybrid times beat the PHG times, and the hybrid cuts beat
the RCB cuts.
%time breakdowns for 2
\begin{figure}[hp]
\centering
\includegraphics[width=\textwidth, height=70mm]{2_breakdown_percent.pdf}
\caption{Timing by percentage on 2 processors (UL, Shockstem 3D; UR,
Shockstem 3D -- 108; LL, RPI; LR, Slac1.5)}\label{fig:Percent_np_2}
\end{figure}
Looking at individual timers in this serial run, we can see that RCB has
still drastically reduced the matching time. In addition, the slowdown in
the coarse partitioning has been greatly reduced.
\section{Conclusion and discussion}
The parallel implementation of hybrid partitioning is obviously not
functioning as desired, but we believe that there is ultimately a great
deal of promise in this method. Seeing the results from our serial runs
is encouraging, and it would be worth the effort to continue forward.
Perhaps it would be helpful to check for any communication issues arising
between processors. The whole system could potentially drag if a
single processor were waiting for a message. Additionally, Dr. Catalyurek
suggested using RCB-based coarsening only on the largest, most complex
hypergraphs, and then reverting to standard agglomerative matching for
coarser iterations.
At this moment, there could be four different ways to use Dr. Catalyurek's
method: the first, and perhaps simplest of the four, would be to hardwire
in the number of coarsening levels to give to RCB. A second way would be
to define a new parameter to allow the user to select the number of
RCB-based coarsenings. A third would be to write a short algorithm to
determine and use the optimal number of layers based on the input.
Finally, the user could optionally supply the number, with the default
falling back to either the hardwired value or the computed one.
\begin{thebibliography}{5}
\bibitem{Catalyurek}U.V. Catalyurek, E.G. Boman, K.D. Devine, D. Bozdag,
R.T. Heaphy, and L.A. Riesen. \emph{A Repartitioning Hypergraph Model
for Dynamic Load Balancing.} Sandia National Labs, 2009.
\end{thebibliography}
{\small \noindent August 2011.}
\end{document}

Binary file not shown.

Binary file not shown.

Binary file not shown.

BIN
thirdParty/Zoltan/doc/Zoltan_pdf/ug.pdf vendored Normal file

Binary file not shown.