mirror of
https://github.com/PhasicFlow/phasicFlow.git
synced 2025-06-12 16:26:23 +00:00
Zoltan is added as thirdParty package
thirdParty/Zoltan/Known_Problems (vendored, 251 lines)
Problems existing in Zoltan.
This file was last updated on $Date$

-------------------------------------------------------------------------------
ERROR CONDITIONS IN ZOLTAN
When a processor returns from Zoltan to the application due to an error
condition, other processors do not necessarily return the same condition.
In fact, other processors may not know that the processor has quit Zoltan,
and may hang in a communication (waiting for a message that is not sent
due to the error condition). The parallel error-handling capabilities of
Zoltan will be improved in future releases.
-------------------------------------------------------------------------------
RCB/RIB ON ASCI RED
On ASCI Red, the number of context IDs (e.g., MPI Communicators) is limited
to 8192. The environment variable MPI_REUSE_CONTEXT_IDS must be set to
reuse the IDs; setting this variable, however, slows performance.
An alternative is to set the Zoltan parameter TFLOPS_SPECIAL to "1". With
TFLOPS_SPECIAL set, communicators in RCB/RIB are not split and, thus, the
application is less likely to run out of context IDs. However, ASCI Red
also has a bug that is exposed by TFLOPS_SPECIAL; when messages that use
MPI_Send/MPI_Recv within RCB/RIB exceed MPI_SHORT_MSG_SIZE, MPI_Recv
hangs. We do not expect these conditions to exist on future platforms and,
indeed, plan to make TFLOPS_SPECIAL obsolete in future versions of Zoltan
rather than re-work it with MPI_Irecv. -- KDD 10/5/2004
-------------------------------------------------------------------------------
ERROR CONDITIONS IN OCTREE, PARMETIS AND JOSTLE
On failure, the OCTREE, ParMETIS and Jostle methods abort rather than return
error codes.
-------------------------------------------------------------------------------
ZOLTAN_INITIALIZE BUT NO ZOLTAN_FINALIZE
If Zoltan_Initialize calls MPI_Init, then MPI_Finalize
will never be called, because there is no Zoltan_Finalize routine.
If the application itself uses MPI and calls MPI_Init and MPI_Finalize,
then there is no problem.
-------------------------------------------------------------------------------
HETEROGENEOUS ENVIRONMENTS
Some parts of Zoltan currently assume that basic data types such as
integers and real numbers (floats) have identical representations
on all processors. This may not be true in a heterogeneous
environment. Specifically, the unstructured (irregular) communication
library is unsafe in a heterogeneous environment. This problem
will be corrected in a future release of Zoltan for heterogeneous
systems.
-------------------------------------------------------------------------------
F90 ISSUES
Pacific Sierra Research (PSR) Vastf90 is not currently supported due to bugs
in the compiler with no known workarounds. It is not known when or if this
compiler will be supported.

N.A. Software FortranPlus is not currently supported due to problems with the
query functions. We anticipate that this problem can be overcome, and support
will be added soon.
-------------------------------------------------------------------------------
PROBLEMS EXISTING IN PARMETIS
(Reported to the ParMETIS development team at the University of Minnesota,
metis@cs.umn.edu)

Name: Free-memory write in PartGeomKway
Version: ParMETIS 3.1.1
Symptom: Free-memory write reported by Purify and Valgrind for graphs with
         no edges.
Description:
  For input graphs with no (or, perhaps, few) edges, Purify and Valgrind
  report writes to already-freed memory, as shown below.
    FMW: Free memory write:
    * This is occurring while in thread 22199:
        SetUp(void) [setup.c:80]
        PartitionSmallGraph(void) [weird.c:39]
        ParMETIS_V3_PartGeomKway [gkmetis.c:214]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan [dr_loadbal.c:581]
        main [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start [crt1.o]
    * Writing 4 bytes to 0xfcd298 in the heap.
    * Address 0xfcd298 is at the beginning of a freed block of 4 bytes.
    * This block was allocated from thread -1781075296:
        malloc [rtlib.o]
        GKmalloc(void) [util.c:151]
        idxmalloc(void) [util.c:100]
        AllocateWSpace [memory.c:28]
        ParMETIS_V3_PartGeomKway [gkmetis.c:123]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan [dr_loadbal.c:581]
        main [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start [crt1.o]
    * There have been 10 frees since this block was freed from thread 22199:
        GKfree(void) [util.c:168]
        Mc_MoveGraph(void) [move.c:92]
        ParMETIS_V3_PartGeomKway [gkmetis.c:149]
        Zoltan_ParMetis [parmetis_interface.c:280]
        Zoltan_LB [lb_balance.c:384]
        Zoltan_LB_Partition [lb_balance.c:91]
        run_zoltan [dr_loadbal.c:581]
        main [dr_main.c:386]
        __libc_start_main [libc.so.6]
        _start [crt1.o]
Reported: 8/31/09, http://glaros.dtc.umn.edu/flyspray/task/50
Status: Reported 8/31/09

Name: PartGeom limitation
Version: ParMETIS 3.0, 3.1
Symptom: inaccurate number of partitions when # partitions != # processors
Description:
  The ParMETIS method PartGeom produces decompositions with #-processor
  partitions only. The Zoltan parameters NUM_GLOBAL_PARTITIONS and
  NUM_LOCAL_PARTITIONS are ignored.
Reported: Not yet reported.
Status: Not yet reported.

Name: vsize array freed in ParMetis
Version: ParMETIS 3.0 and 3.1
Symptom: seg. fault, core dump at runtime
Description:
  When calling ParMETIS_V3_AdaptiveRepart with the vsize parameter,
  ParMetis will try to free the vsize array even if it was
  allocated in Zoltan. Zoltan will then try to free vsize again
  later, resulting in a fatal error. As a temporary fix,
  Zoltan will never call ParMetis with the vsize parameter.
Reported: 11/25/2003.
Status: Acknowledged by George Karypis.

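The failure above is a classic double-free: two owners each call free() on the same buffer. Nothing on the caller's side can undo a free performed inside the library, but the caller's own cleanup paths can be made idempotent so a pointer is never freed twice by the caller itself. A minimal sketch in plain C; the helper name `safe_free` is ours, not part of the Zoltan or ParMETIS API:

```c
#include <stdlib.h>

/* Free p if non-NULL and return NULL, so the caller can write
 *     vsize = safe_free(vsize);
 * after every point where ownership may have ended. Any later
 * cleanup pass then sees a NULL pointer and does nothing. */
static void *safe_free(void *p)
{
    if (p != NULL)
        free(p);
    return NULL;
}
```

With this convention, every free site reassigns the pointer, so a stray second cleanup is a harmless no-op rather than a fatal error.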
Name: ParMETIS_V3_AdaptiveRepart and ParMETIS_V3_PartKWay crash
      for zero-sized partitions.
Version: ParMETIS 3.1
Symptom: run-time error "killed by signal 8" on DEC. FPE, divide-by-zero.
Description:
  Metis divides by partition size; thus, zero-sized partitions
  cause a floating-point exception.
Reported: 9/9/2003.
Status: ?

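Since the crash is a plain divide-by-zero on a partition's target size, one caller-side workaround is to clamp every target partition weight away from zero before handing the array to the partitioner. A hedged sketch; the helper name, the floor value, and the renormalization are illustrative, not ParMETIS code:

```c
#include <stddef.h>

/* Replace zero (or tiny) target partition weights with a small
 * positive floor, then renormalize so the weights again sum to 1.0.
 * This prevents a divide-by-zero inside a partitioner that divides
 * by per-partition target size. */
static void clamp_part_weights(float *tpwgts, size_t nparts, float floor_w)
{
    float sum = 0.0f;
    size_t i;
    for (i = 0; i < nparts; i++) {
        if (tpwgts[i] < floor_w)
            tpwgts[i] = floor_w;   /* no partition may have zero target size */
        sum += tpwgts[i];
    }
    for (i = 0; i < nparts; i++)
        tpwgts[i] /= sum;          /* restore sum-to-one invariant */
}
```

The trade-off is that a "zero-sized" partition now receives a small positive share of the work instead of none.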
Name: ParMETIS_V3_AdaptiveRepart dies for zero-sized partitions.
Version: ParMETIS 3.0
Symptom: run-time error "killed by signal 8" on DEC. FPE, divide-by-zero.
Description:
  ParMETIS_V3_AdaptiveRepart divides by partition size; thus, zero-sized
  partitions cause a floating-point exception. This problem is exhibited in
  the adaptive-partlocal3 tests. The tests actually run on Sun and Linux
  machines (which don't seem to care about the divide-by-zero), but cause
  an FPE signal on DEC (Compaq) machines.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1, but a new problem appeared (see above).

Name: ParMETIS_V3_AdaptiveRepart crashes when no edges.
Version: ParMETIS 3.0
Symptom: Floating point exception, divide-by-zero.
Description:
  Divide-by-zero in ParMETISLib/adrivers.c, function Adaptive_Partition,
  line 40.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Uninitialized memory read in akwayfm.c.
Version: ParMETIS 3.0
Symptom: UMR warning.
Description:
  UMR in ParMETISLib/akwayfm.c, function Moc_KWayAdaptiveRefine, near line 520.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: Memory leak in wave.c
Version: ParMETIS 3.0
Symptom: Some memory not freed.
Description:
  Memory leak in ParMETISLib/wave.c, function WavefrontDiffusion;
  memory for the following variables is not always freed:
  solution, perm, workspace, cand.
  We believe the early return near line 111 causes the problem.
Reported: 1/23/2003.
Status: Fixed in ParMetis 3.1.

Name: tpwgts ignored for small graphs.
Version: ParMETIS 3.0
Symptom: incorrect output (partitioning)
Description:
  When using ParMETIS_V3_PartKway to partition into partitions
  of unequal sizes, the input array tpwgts is ignored and
  uniform-sized partitions are computed. This bug shows up when
  (a) the number of vertices is < 10000 and (b) only one weight
  per vertex is given (ncon=1).
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/10/30.
Status: Fixed in ParMetis 3.1.


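For context, the tpwgts array involved here is (with ncon=1) simply nparts floating-point target fractions summing to 1.0, one per partition. A small helper that builds such an array from integer ratios, purely illustrative and not part of either library, makes it easy to construct the unequal-size inputs that trigger the bug:

```c
#include <stdlib.h>

/* Build a tpwgts-style array of target partition fractions from
 * integer ratios, e.g. {1, 1, 2} -> {0.25, 0.25, 0.5}.
 * The caller frees the returned array. */
static float *make_tpwgts(const int *ratios, int nparts)
{
    float *tpwgts = malloc((size_t)nparts * sizeof *tpwgts);
    int i, total = 0;
    if (tpwgts == NULL)
        return NULL;
    for (i = 0; i < nparts; i++)
        total += ratios[i];
    for (i = 0; i < nparts; i++)
        tpwgts[i] = (float)ratios[i] / (float)total;  /* fractions sum to 1 */
    return tpwgts;
}
```

Comparing the achieved partition sizes against these targets is a quick way to check whether a given ParMetis build honors tpwgts for a small graph.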
Name: AdaptiveRepart crashes on partless test.
Version: ParMETIS 3.0
Symptom: run-time segmentation violation.
Description:
  ParMETIS_V3_AdaptiveRepart crashes with a SIGSEGV if
  the input array _part_ contains any value greater than
  the desired number of partitions, nparts. This shows up
  in Zoltan's "partless" test cases.
Reported: Reported to George Karypis and metis@cs.umn.edu on 2002/12/02.
Status: Fixed in ParMetis 3.1.


Name: load imbalance tolerance
Version: ParMETIS 2.0
Symptom: missing feature
Description:
  The load imbalance parameter UNBALANCE_FRACTION can
  only be set at compile time. With Zoltan it is
  necessary to be able to set this parameter at run time.
Reported: Reported to metis@cs.umn.edu on 19 Aug 1999.
Status: Fixed in version 3.0.


Name: no edges
Version: ParMETIS 2.0
Symptom: segmentation fault at run time
Description:
  ParMETIS crashes if the input graph has no edges and
  ParMETIS_PartKway is called. We suspect all the graph-based
  methods crash. From the documentation it is unclear whether
  a NULL pointer is a valid input for the adjncy array.
  Apparently, the bug occurs both with NULL as input and with
  a valid pointer to an array.
Reported: Reported to metis@cs.umn.edu on 5 Oct 1999.
Status: Fixed in version 3.0.


Name: no vertices
Version: ParMETIS 2.0, 3.0, 3.1
Symptom: segmentation fault at run time
Description:
  ParMETIS may crash if a processor owns no vertices.
  The extent of this bug is not known (which methods are affected).
  Again, it is unclear whether NULL pointers are valid input.
Reported: Reported to metis@cs.umn.edu on 6 Oct 1999.
Status: Fixed in 3.0 and 3.1 for the graph methods, but not the geometric
        methods. New bug report sent on 2003/08/20.


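Because it is unclear whether NULL is accepted for empty arrays, a defensive caller can always pass a valid allocation, even when the local count is zero. A minimal sketch of that convention; the helper name is ours, not part of either library:

```c
#include <stdlib.h>

/* Allocate max(n, 1) zero-initialized elements so that a processor
 * owning no vertices still passes a valid, non-NULL array to the
 * library instead of a NULL pointer of length zero. */
static void *alloc_nonempty(size_t n, size_t elem_size)
{
    size_t count = (n == 0) ? 1 : n;   /* never request a zero-length block */
    return calloc(count, elem_size);
}
```

The one-element padding is never read when the advertised count is zero; it only guarantees the pointer itself is valid.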
Name: partgeom bug
Version: ParMETIS 2.0
Symptom: floating point exception
Description:
  For domains where the global delta_x, delta_y, or delta_z (in 3D)
  is zero (e.g., all nodes lie along the y-axis), a floating point
  exception can occur when the partgeom algorithm is used.
Reported: kirk@cs.umn.edu in Jan 2001.
Status: Fixed in version 3.0.

-------------------------------------------------------------------------------