[llvm-commits] CVS: llvm-test/MultiSource/Benchmarks/ASCI_Purple/SMG2000/docs/smg2000.readme
Duraid Madina
duraid at octopus.com.au
Sun Apr 10 22:22:19 PDT 2005
Changes in directory llvm-test/MultiSource/Benchmarks/ASCI_Purple/SMG2000/docs:
smg2000.readme added (r1.1)
---
Log message:
* add the SMG2000 benchmark. This one is brutal on memory; anything
you can do in terms of prefetching/unrolling/SWP will probably help.
---
Diffs of the changes: (+389 -0)
smg2000.readme | 389 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 389 insertions(+)
Index: llvm-test/MultiSource/Benchmarks/ASCI_Purple/SMG2000/docs/smg2000.readme
diff -c /dev/null llvm-test/MultiSource/Benchmarks/ASCI_Purple/SMG2000/docs/smg2000.readme:1.1
*** /dev/null Mon Apr 11 00:22:17 2005
--- llvm-test/MultiSource/Benchmarks/ASCI_Purple/SMG2000/docs/smg2000.readme Mon Apr 11 00:22:07 2005
***************
*** 0 ****
--- 1,389 ----
+ %==========================================================================
+ %==========================================================================
+
+ Code Description
+
+ A. General description:
+
+ SMG2000 is a parallel semicoarsening multigrid solver for the linear
+ systems arising from finite difference, finite volume, or finite
+ element discretizations of the diffusion equation,
+
+ \grad \cdot ( D \grad u ) + \sigma u = f
+
+ on logically rectangular grids. The code solves both 2D and 3D
+ problems with discretization stencils of up to 9-point in 2D and up to
+ 27-point in 3D. See the following paper for details on the algorithm
+ and its parallel implementation/performance:
+
+ P. N. Brown, R. D. Falgout, and J. E. Jones,
+ "Semicoarsening multigrid on distributed memory machines",
+ SIAM Journal on Scientific Computing, 21 (2000), pp. 1823-1834.
+ Also available as LLNL technical report UCRL-JC-130720.
+
+ The driver provided with SMG2000 builds linear systems for the special
+ case of the above equation,
+
+ - cx u_xx - cy u_yy - cz u_zz = (1/h)^2 , (in 3D)
+ - cx u_xx - cy u_yy = (1/h)^2 , (in 2D)
+
+ with Dirichlet boundary conditions of u = 0, where h is the mesh
+ spacing in each direction. Standard finite differences are used to
+ discretize the equations, yielding 5-pt. and 7-pt. stencils in 2D and
+ 3D, respectively.
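+ 
+ As a worked illustration of that discretization (standard central
+ differences; the scaling shown is illustrative and not necessarily the
+ exact scaling used by the driver), approximating
+ 
+    u_xx(x) ~ ( u(x-h) - 2 u(x) + u(x+h) ) / h^2
+ 
+ in each direction turns the 3D equation above into a 7-pt. stencil
+ with center coefficient 2 (cx + cy + cz) / h^2 and off-center
+ coefficients -cx/h^2, -cy/h^2, and -cz/h^2 in the x, y, and z
+ directions, respectively.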
+
+ To determine when the solver has converged, the driver currently uses
+ the relative-residual stopping criterion,
+
+ ||r_k||_2 / ||b||_2 < tol
+
+ with tol = 10^-6.
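+ 
+ In C, this test amounts to the following check (a minimal sketch for
+ illustration only, not code taken from SMG2000; the function and
+ variable names are hypothetical):
+ 
+    #include <math.h>
+ 
+    /* r_dot_r and b_dot_b are the inner products <r_k, r_k> and
+       <b, b>; stop when ||r_k||_2 / ||b||_2 < tol (tol = 1.0e-6). */
+    static int has_converged(double r_dot_r, double b_dot_b, double tol)
+    {
+       return (sqrt(r_dot_r) < tol * sqrt(b_dot_b));
+    }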
+
+ This solver can serve as a key component for achieving scalability in
+ radiation diffusion simulations.
+
+ B. Coding:
+
+ SMG2000 is written in ISO-C. It is an SPMD code which uses MPI.
+ Parallelism is achieved by data decomposition. The driver provided
+ with SMG2000 achieves this decomposition by simply subdividing the
+ grid into logical P x Q x R (in 3D) chunks of equal size.
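+ 
+ As a sketch of that decomposition (an illustration, not SMG2000
+ source; the function and variable names are hypothetical, and the
+ global extents are assumed to divide evenly across the topology):
+ 
+    /* Map an MPI rank to logical coordinates (p, q, r) in a P x Q x R
+       topology and compute the index extents of that rank's chunk of
+       a global nx x ny x nz grid. */
+    static void chunk_extents(int rank, int P, int Q, int R,
+                              int nx, int ny, int nz,
+                              int ilower[3], int iupper[3])
+    {
+       int p = rank % P;
+       int q = (rank / P) % Q;
+       int r = rank / (P * Q);
+ 
+       ilower[0] = p * (nx / P);  iupper[0] = ilower[0] + nx / P - 1;
+       ilower[1] = q * (ny / Q);  iupper[1] = ilower[1] + ny / Q - 1;
+       ilower[2] = r * (nz / R);  iupper[2] = ilower[2] + nz / R - 1;
+    }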
+
+ C. Parallelism:
+
+ SMG2000 is a highly synchronous code. The communication and
+ computation patterns exhibit the surface-to-volume relationship
+ common to many parallel scientific codes. Hence, parallel efficiency
+ is largely determined by the size of the data "chunks" mentioned
+ above, and the speed of communications and computations on the
+ machine. SMG2000 is also memory-access bound, doing only about 1-2
+ computations per memory access, so memory-access speeds will also have
+ a large impact on performance.
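+ 
+ As a rough illustration of the surface-to-volume relationship: for an
+ n x n x n chunk, the computation per iteration grows like n^3 (the
+ volume of the chunk) while the data exchanged with neighboring chunks
+ grows like n^2 (its surface), so the communication-to-computation
+ ratio falls off roughly like 1/n as the chunks get larger.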
+
+ %==========================================================================
+ %==========================================================================
+
+ Files in this Distribution
+
+ NOTE: The SMG2000 code is derived directly from the hypre library, a large
+ linear solver library that is being developed in the Center for Applied
+ Scientific Computing (CASC) at LLNL.
+
+ In the smg2000 directory the following files are included:
+
+ COPYRIGHT_and_DISCLAIMER
+ HYPRE_config.h
+ Makefile
+ Makefile.include
+
+ The following subdirectories are also included:
+
+ docs
+ krylov
+ struct_ls
+ struct_mv
+ test
+ utilities
+
+ In the 'docs' directory the following files are included:
+
+ smg2000.readme
+
+ In the 'krylov' directory the following files are included:
+
+ HYPRE_pcg.c
+ Makefile
+ krylov.h
+ pcg.c
+
+ In the 'struct_ls' directory the following files are included:
+
+ HYPRE_struct_ls.h
+ HYPRE_struct_pcg.c
+ HYPRE_struct_smg.c
+ Makefile
+ coarsen.c
+ cyclic_reduction.c
+ general.c
+ headers.h
+ pcg_struct.c
+ point_relax.c
+ semi_interp.c
+ semi_restrict.c
+ smg.c
+ smg.h
+ smg2_setup_rap.c
+ smg3_setup_rap.c
+ smg_axpy.c
+ smg_relax.c
+ smg_residual.c
+ smg_setup.c
+ smg_setup_interp.c
+ smg_setup_rap.c
+ smg_setup_restrict.c
+ smg_solve.c
+ struct_ls.h
+
+ In the 'struct_mv' directory the following files are included:
+
+ HYPRE_struct_grid.c
+ HYPRE_struct_matrix.c
+ HYPRE_struct_mv.h
+ HYPRE_struct_stencil.c
+ HYPRE_struct_vector.c
+ Makefile
+ box.c
+ box_algebra.c
+ box_alloc.c
+ box_neighbors.c
+ communication.c
+ communication_info.c
+ computation.c
+ grow.c
+ headers.h
+ hypre_box_smp_forloop.h
+ project.c
+ struct_axpy.c
+ struct_copy.c
+ struct_grid.c
+ struct_innerprod.c
+ struct_io.c
+ struct_matrix.c
+ struct_matrix_mask.c
+ struct_matvec.c
+ struct_mv.h
+ struct_scale.c
+ struct_stencil.c
+ struct_vector.c
+
+ In the 'test' directory the following files are included:
+
+ Makefile
+ smg2000.c
+
+ In the 'utilities' directory the following files are included:
+
+ HYPRE_utilities.h
+ Makefile
+ general.h
+ hypre_smp_forloop.h
+ memory.c
+ memory.h
+ mpistubs.c
+ mpistubs.h
+ random.c
+ threading.c
+ threading.h
+ timer.c
+ timing.c
+ timing.h
+ utilities.h
+ version
+
+ %==========================================================================
+ %==========================================================================
+
+ Building the Code
+
+ SMG2000 uses a simple Makefile system for building the code. All
+ compiler and link options are set by modifying the file
+ 'smg2000/Makefile.include' appropriately. This file is then included
+ in each of the following makefiles:
+
+ krylov/Makefile
+ struct_ls/Makefile
+ struct_mv/Makefile
+ test/Makefile
+ utilities/Makefile
+
+ To build the code, first modify the 'Makefile.include' file
+ appropriately, then type (in the smg2000 directory)
+
+ make
+
+ Other available targets are
+
+ make clean (deletes .o files)
+ make veryclean (deletes .o files, libraries, and executables)
+
+ To configure the code to run with:
+    1 - OpenMP only, add '-DHYPRE_USING_OPENMP -DHYPRE_SEQUENTIAL' to
+        the 'INCLUDE_CFLAGS' line in the 'Makefile.include' file and
+        use a valid OpenMP compiler.
+    2 - OpenMP with MPI, add '-DHYPRE_USING_OPENMP -DTIMER_USE_MPI'
+        to the 'INCLUDE_CFLAGS' line in the 'Makefile.include' file
+        and use a valid OpenMP compiler and MPI library.
+    3 - MPI only, add '-DTIMER_USE_MPI' to the 'INCLUDE_CFLAGS' line
+        in the 'Makefile.include' file and use a valid MPI library,
+        as in the example fragment below.
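+ 
+ For example, an MPI-only configuration (option 3 above) might look
+ roughly like the following fragment. Only the 'INCLUDE_CFLAGS' name
+ and the -D flag come from this document; the 'CC' variable name,
+ compiler choice, and optimization level are illustrative assumptions:
+ 
+    CC             = mpicc
+    INCLUDE_CFLAGS = -O2 -DTIMER_USE_MPI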
+
+ %==========================================================================
+ %==========================================================================
+
+ Optimization and Improvement Challenges
+
+ This code is memory-access bound. We believe it would be very
+ difficult to obtain "good" cache reuse with an optimized version of
+ the code.
+
+ %==========================================================================
+ %==========================================================================
+
+ Parallelism and Scalability Expectations
+
+ SMG2000 has been run on the following platforms:
+
+ Blue-Pacific - up to 1000 procs
+ Red - up to 3150 procs
+ Compaq cluster - up to 64 procs
+ Sun Sparc Ultra 10's - up to 4 machines
+
+ Consider increasing both problem size and number of processors in tandem.
+ On scalable architectures, time-to-solution for SMG2000 will initially
+ increase, then level off at a modest number of processors,
+ remaining roughly constant for larger numbers of processors. Iteration
+ counts will also increase slightly for small to modest-sized problems,
+ then level off at a roughly constant number for larger problem sizes.
+
+ For example, we get the following results for a 3D problem with
+ cx = 0.1, cy = 1.0, and cz = 10.0, for a problem distributed on
+ a logical P x Q x R processor topology, with fixed local problem
+ size per processor given as 35x35x35:
+
+ "P x Q x R" P "iters" "setup time" "solve time"
+ 1x1x1 1 6 1.681680 23.255241
+ 2x2x2 8 6 3.738600 32.262907
+ 3x3x3 27 6 6.601194 41.341892
+ 6x6x6 216 7 12.310776 46.672215
+ 8x8x8 512 7 18.968893 50.051737
+ 10x10x10 1000 7 18.890876 54.094806
+ 14x15x15 3150 8 30.635085 62.725305
+
+ These results were obtained on ASCI Red.
+
+ %==========================================================================
+ %==========================================================================
+
+ Running the Code
+
+ The driver for SMG2000 is called `smg2000', and is located in the
+ smg2000/test subdirectory. Type
+
+ mpirun -np 1 smg2000 -help
+
+ to get usage information. This prints out the following:
+
+ Usage: .../smg2000/test/smg2000 [<options>]
+
+ -n <nx> <ny> <nz> : problem size per block
+ -P <Px> <Py> <Pz> : processor topology
+ -b <bx> <by> <bz> : blocking per processor
+ -c <cx> <cy> <cz> : diffusion coefficients
+ -v <n_pre> <n_post> : number of pre and post relaxations
+ -d <dim> : problem dimension (2 or 3)
+ -solver <ID> : solver ID (default = 0)
+ 0 - SMG
+ 1 - CG with SMG precond
+ 2 - CG with diagonal scaling
+ 3 - CG
+
+ All of the arguments are optional. The most important options for the
+ SMG2000 compact application are the `-n' and `-P' options. The `-n'
+ option allows one to specify the local problem size per MPI process,
+ and the `-P' option specifies the process topology on which to run.
+ The global problem size will be <Px>*<nx> by <Py>*<ny> by <Pz>*<nz>.
+
+ When running with OpenMP, the number of threads used per MPI process
+ is controlled via the OMP_NUM_THREADS environment variable.
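+ 
+ For example, the following hypothetical invocation (not a run recorded
+ in this document) uses a 2 x 2 x 2 process topology with a 40x40x40
+ local problem per process, giving a global problem of 80x80x80 grid
+ points:
+ 
+    mpirun -np 8 smg2000 -P 2 2 2 -n 40 40 40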
+
+ %==========================================================================
+ %==========================================================================
+
+ Timing Issues
+
+ If using MPI, the whole code is timed using the MPI timers. If not using
+ MPI, standard system timers are used. Timing results are printed to
+ standard out, and are divided into "Setup Phase" times and "Solve Phase"
+ times. Timings for a few individual routines are also printed out.
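+ 
+ A minimal sketch of those two timer paths (an illustration, not
+ SMG2000 source; only the TIMER_USE_MPI flag comes from this document):
+ 
+    #include <time.h>
+    #ifdef TIMER_USE_MPI
+    #include <mpi.h>
+    #endif
+ 
+    /* Wall-clock seconds: MPI timer when built with -DTIMER_USE_MPI,
+       otherwise a standard system timer. */
+    static double wall_seconds(void)
+    {
+    #ifdef TIMER_USE_MPI
+       return MPI_Wtime();
+    #else
+       return (double) time(NULL);
+    #endif
+    }
+ 
+    /* CPU seconds from the standard clock() timer. */
+    static double cpu_seconds(void)
+    {
+       return ((double) clock()) / CLOCKS_PER_SEC;
+    }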
+
+ %==========================================================================
+ %==========================================================================
+
+ Memory Needed
+
+ SMG2000 is a memory intensive code, and its memory needs are somewhat
+ complicated to describe. For the 3D problems discussed in this
+ document, memory requirements are roughly 54 times the local problem
+ size times the size of a double, plus some overhead for storing ghost
+ points, etc., in the code. The overhead required by this version of
+ the SMG code grows essentially like the logarithm of the problem size.
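+ 
+ As a rough worked example: for the 35x35x35 local problem size used in
+ the scalability table above, this estimate gives about
+ 
+    54 * 35^3 * 8 bytes = 54 * 42875 * 8 bytes ~ 18.5 MB
+ 
+ per process, before the logarithmically growing overhead.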
+
+ %==========================================================================
+ %==========================================================================
+
+ About the Data
+
+ SMG2000 does not read in any data. All control is on the execute line.
+
+ %==========================================================================
+ %==========================================================================
+
+ Expected Results
+
+ Consider the following run:
+
+ mpirun -np 1 smg2000 -n 12 12 12 -c 2.0 3.0 40
+
+ This is what SMG2000 prints out:
+
+ Running with these driver parameters:
+ (nx, ny, nz) = (12, 12, 12)
+ (Px, Py, Pz) = (1, 1, 1)
+ (bx, by, bz) = (1, 1, 1)
+ (cx, cy, cz) = (2.000000, 3.000000, 40.000000)
+ (n_pre, n_post) = (1, 1)
+ dim = 3
+ solver ID = 0
+ =============================================
+ Struct Interface:
+ =============================================
+ Struct Interface:
+ wall clock time = 0.005627 seconds
+ cpu clock time = 0.010000 seconds
+
+ =============================================
+ Setup phase times:
+ =============================================
+ SMG Setup:
+ wall clock time = 0.330096 seconds
+ cpu clock time = 0.330000 seconds
+
+ =============================================
+ Solve phase times:
+ =============================================
+ SMG Solve:
+ wall clock time = 0.686244 seconds
+ cpu clock time = 0.480000 seconds
+
+
+ Iterations = 4
+ Final Relative Residual Norm = 8.972097e-07
+
+ The relative residual norm may differ slightly from machine to machine
+ or compiler to compiler, but should only differ very slightly (say, in
+ the 6th or 7th decimal place). Also, the code should generate nearly
+ identical results for a given problem, independent of the data
+ distribution. The only part of the code that does not guarantee
+ bitwise identical results is the inner product used to compute norms.
+ In practice, the above residual norm has remained the same.
+
+ %==========================================================================
+ %==========================================================================
+
+ Release and Modification Record
+
+ LLNL code release number: UCRL-CODE-2000-022
+
+ (c) 2000 The Regents of the University of California
+
+ See the file COPYRIGHT_and_DISCLAIMER for a complete copyright notice,
+ contact person, and disclaimer.