SuperLU Distributed 8.2.1
Distributed memory sparse direct solver
|
Gets matrix permutation. More...
#include <limits.h>
#include <math.h>
#include "superlu_dist_config.h"
#include "parmetis.h"
#include "superlu_ddefs.h"
Functions | |
static float | a_plus_at_CompRow_loc (int, int_t *, int, int_t *, int_t, int_t *, int_t *, int, int_t *, int_t *, int_t **, int_t **, gridinfo_t *) |
float | get_perm_c_parmetis (SuperMatrix *A, int_t *perm_r, int_t *perm_c, int nprocs_i, int noDomains, int_t **sizes, int_t **fstVtxSep, gridinfo_t *grid, MPI_Comm *metis_comm) |
Gets matrix permutation.
Copyright (c) 2003, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from U.S. Dept. of Energy)
All rights reserved.
The source code is distributed under BSD license, see the file License.txt at the top-level directory.
-- Distributed symbolic factorization auxialiary routine (version 2.1) -- Lawrence Berkeley National Lab, Univ. of California Berkeley - July 2003 INRIA France - January 2004 Laura Grigori November 1, 2007
|
static |
Purpose ======= Form the structure of Pr*A +A'Pr'. A is an n-by-n matrix in NRformat_loc format, represented by (rowptr, colind). The output B=Pr*A +A'Pr' is in NRformat_loc format (symmetrically, also row oriented), represented by (b_rowptr, b_colind). The input matrix A is distributed in block row format on nprocs_i processors. The output matrix B is distributed in block row format on nprocs_o processors, where nprocs_o <= nprocs_i. On output, the matrix B has its rows permuted according to perm_r. Sketch of the algorithm ======================= Let iam by my process number. Let fst_row, lst_row = m_loc + fst_row be the first/last row stored on iam. Compute Pr' - the inverse row permutation, stored in iperm_r. Compute the transpose of the block row of Pr*A that iam owns: T[:,Pr(fst_row:lst_row)] = Pr' * A[:,fst_row:lst_row] * Pr' All to all communication such that every processor iam receives all the blocks of the transpose matrix that it needs, that is T[fst_row:lst_row, :] Compute B = A[fst_row:lst_row, :] + T[fst_row:lst_row, :] If Pr != I or nprocs_i != nprocs_o then permute the rows of B (that is compute Pr*B) and redistribute from nprocs_i to nprocs_o according to the block row distribution in vtxdist_i, vtxdist_o.
float get_perm_c_parmetis | ( | SuperMatrix * | A, |
int_t * | perm_r, | ||
int_t * | perm_c, | ||
int | nprocs_i, | ||
int | noDomains, | ||
int_t ** | sizes, | ||
int_t ** | fstVtxSep, | ||
gridinfo_t * | grid, | ||
MPI_Comm * | metis_comm | ||
) |
Purpose ======= GET_PERM_C_PARMETIS obtains a permutation matrix Pc, by applying a graph partitioning algorithm to the symmetrized graph A+A'. The multilevel graph partitioning algorithm used is the ParMETIS_V3_NodeND routine available in the parallel graph partitioning package parMETIS. The number of independent sub-domains noDomains computed by this algorithm has to be a power of 2. Hence noDomains is the larger number power of 2 that is smaller than nprocs_i, where nprocs_i = nprow * npcol is the number of processors used in SuperLU_DIST. Arguments ========= A (input) SuperMatrix* Matrix A in A*X=B, of dimension (A->nrow, A->ncol). The number of the linear equations is A->nrow. Matrix A is distributed in NRformat_loc format. perm_r (input) int_t* Row permutation vector of size A->nrow, which defines the permutation matrix Pr; perm_r[i] = j means row i of A is in position j in Pr*A. perm_c (output) int_t* Column permutation vector of size A->ncol, which defines the permutation matrix Pc; perm_c[i] = j means column i of A is in position j in A*Pc. nprocs_i (input) int* Number of processors the input matrix is distributed on in a block row format. It corresponds to number of processors used in SuperLU_DIST. noDomains (input) int*, must be power of 2 Number of independent domains to be computed by the graph partitioning algorithm. ( noDomains <= nprocs_i ) sizes (output) int_t**, of size 2 * noDomains Returns pointer to an array containing the number of nodes for each sub-domain and each separator. Separators are stored from left to right. Memory for the array is allocated in this routine. fstVtxSep (output) int_t**, of size 2 * noDomains Returns pointer to an array containing first node for each sub-domain and each separator. Memory for the array is allocated in this routine. Return value ============ < 0, number of bytes allocated on return from the symbolic factorization. > 0, number of bytes allocated when out of memory.