Scotch Brand 5.1.10 User Manual

Scotch and libScotch 5.1 User’s Guide

(version 5.1.10)

Fran¸cois Pellegrini

Bacchus team, INRIA Bordeaux Sud-Ouest

ENSEIRB & LaBRI, UMR CNRS 5800

Universit´e Bordeaux I

351 cours de la Lib´eration, 33405 TALENCE, FRANCE

[email protected]

August 29, 2010

Abstract

This document describes the capabilities and operations of Scotch and

libScotch, a software package and a software library devoted to static

mapping, partitioning, and sparse matrix block ordering of graphs and

meshes/hypergraphs. It gives brief descriptions of the algorithms, details

the input/output formats, instructions for use, installation procedures, and

provides a number of examples.

Scotch is distributed as free/libre software, and has been designed such

that new partitioning or ordering methods can be added in a straightforward

manner. It can therefore be used as a testbed for the easy and quick coding

and testing of such new methods, and may also be redistributed, as a library,

along with third-party software that makes use of it, either in its original or

in updated forms.

6.3.5 gcv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

6.3.6 gmap / gpart . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

6.3.7 gmk * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6.3.8 gmk msh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6.3.9 gmtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.3.10 gord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.3.11 gotst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6.3.12 gout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.3.13 gtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

6.3.14 mcv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

6.3.15 mmk * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.3.16 mord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.3.17 mtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Library

7.1 Calling the routines of libScotch . . . . . . . . . . . . . . . . . . . 47

7.1.1 Calling from C . . . . . . . . . . . . . . . . . . . . . . . . . . 47

7.1.2 Calling from Fortran . . . . . . . . . . . . . . . . . . . . . . . 47

7.1.3 Compiling and linking . . . . . . . . . . . . . . . . . . . . . . 48

7.1.4 Machine word size issues . . . . . . . . . . . . . . . . . . . . . 49

7.2 Data formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7.2.1 Architecture format . . . . . . . . . . . . . . . . . . . . . . . 50

7.2.2 Graph format . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

7.2.3 Mesh format . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

7.2.4 Geometry format . . . . . . . . . . . . . . . . . . . . . . . . . 54

7.2.5 Block ordering format . . . . . . . . . . . . . . . . . . . . . . 56

7.3 Strategy strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

7.3.1 Using default strategy strings . . . . . . . . . . . . . . . . . . 57

7.3.2 Mapping strategy strings . . . . . . . . . . . . . . . . . . . . 58

7.3.3 Graph bipartitioning strategy strings . . . . . . . . . . . . . . 59

7.3.4 Ordering strategy strings . . . . . . . . . . . . . . . . . . . . 63

7.3.5 Node separation strategy strings . . . . . . . . . . . . . . . . 66

7.4 Target architecture handling routines . . . . . . . . . . . . . . . . . . 70

7.4.1 SCOTCH archInit . . . . . . . . . . . . . . . . . . . . . . . . . 70

7.4.2 SCOTCH archExit . . . . . . . . . . . . . . . . . . . . . . . . . 70

7.4.3 SCOTCH archLoad . . . . . . . . . . . . . . . . . . . . . . . . . 70

7.4.4 SCOTCH archSave . . . . . . . . . . . . . . . . . . . . . . . . . 71

7.4.5 SCOTCH archName . . . . . . . . . . . . . . . . . . . . . . . . . 71

7.4.6 SCOTCH archSize . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.4.7 SCOTCH archBuild . . . . . . . . . . . . . . . . . . . . . . . . 72

7.4.8 SCOTCH archCmplt . . . . . . . . . . . . . . . . . . . . . . . . 73

7.4.9 SCOTCH archCmpltw . . . . . . . . . . . . . . . . . . . . . . . 73

7.4.10 SCOTCH archHcub . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.4.11 SCOTCH archMesh2D . . . . . . . . . . . . . . . . . . . . . . . 74

7.4.12 SCOTCH archMesh3D . . . . . . . . . . . . . . . . . . . . . . . 75

7.4.13 SCOTCH archTleaf . . . . . . . . . . . . . . . . . . . . . . . . 75

7.4.14 SCOTCH archTorus2D . . . . . . . . . . . . . . . . . . . . . . . 76

7.4.15 SCOTCH archTorus3D . . . . . . . . . . . . . . . . . . . . . . . 76

7.5 Graph handling routines . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.5.1 SCOTCH graphInit . . . . . . . . . . . . . . . . . . . . . . . . 77

7.5.2 SCOTCH graphExit . . . . . . . . . . . . . . . . . . . . . . . . 77

7.5.3 SCOTCH graphFree . . . . . . . . . . . . . . . . . . . . . . . . 77

7.5.4 SCOTCH graphLoad . . . . . . . . . . . . . . . . . . . . . . . . 78

7.5.5 SCOTCH graphSave . . . . . . . . . . . . . . . . . . . . . . . . 79

7.5.6 SCOTCH graphBuild . . . . . . . . . . . . . . . . . . . . . . . 79

7.5.7 SCOTCH graphBase . . . . . . . . . . . . . . . . . . . . . . . . 80

7.5.8 SCOTCH graphCheck . . . . . . . . . . . . . . . . . . . . . . . 81

7.5.9 SCOTCH graphSize . . . . . . . . . . . . . . . . . . . . . . . . 81

7.5.10 SCOTCH graphData . . . . . . . . . . . . . . . . . . . . . . . . 82

7.5.11 SCOTCH graphStat . . . . . . . . . . . . . . . . . . . . . . . . 83

7.6 Graph mapping and partitioning routines . . . . . . . . . . . . . . . 84

7.6.1 SCOTCH graphPart . . . . . . . . . . . . . . . . . . . . . . . . 84

7.6.2 SCOTCH graphMap . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.6.3 SCOTCH graphMapInit . . . . . . . . . . . . . . . . . . . . . . 86

7.6.4 SCOTCH graphMapExit . . . . . . . . . . . . . . . . . . . . . . 86

7.6.5 SCOTCH graphMapLoad . . . . . . . . . . . . . . . . . . . . . . 87

7.6.6 SCOTCH graphMapSave . . . . . . . . . . . . . . . . . . . . . . 87

7.6.7 SCOTCH graphMapCompute . . . . . . . . . . . . . . . . . . . . 88

7.6.8 SCOTCH graphMapView . . . . . . . . . . . . . . . . . . . . . . 88

7.7 Graph ordering routines . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.7.1 SCOTCH graphOrder . . . . . . . . . . . . . . . . . . . . . . . 89

7.7.2 SCOTCH graphOrderInit . . . . . . . . . . . . . . . . . . . . . 90

7.7.3 SCOTCH graphOrderExit . . . . . . . . . . . . . . . . . . . . . 91

7.7.4 SCOTCH graphOrderLoad . . . . . . . . . . . . . . . . . . . . . 91

7.7.5 SCOTCH graphOrderSave . . . . . . . . . . . . . . . . . . . . . 92

7.7.6 SCOTCH graphOrderSaveMap . . . . . . . . . . . . . . . . . . 92

7.7.7 SCOTCH graphOrderSaveTree . . . . . . . . . . . . . . . . . . 93

7.7.8 SCOTCH graphOrderCheck . . . . . . . . . . . . . . . . . . . . 93

7.7.9 SCOTCH graphOrderCompute . . . . . . . . . . . . . . . . . . 94

7.7.10 SCOTCH graphOrderComputeList . . . . . . . . . . . . . . . . 94

7.8 Mesh handling routines . . . . . . . . . . . . . . . . . . . . . . . . . 95

7.8.1 SCOTCH meshInit . . . . . . . . . . . . . . . . . . . . . . . . . 95

7.8.2 SCOTCH meshExit . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.8.3 SCOTCH meshLoad . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.8.4 SCOTCH meshSave . . . . . . . . . . . . . . . . . . . . . . . . . 97

7.8.5 SCOTCH meshBuild . . . . . . . . . . . . . . . . . . . . . . . . 97

7.8.6 SCOTCH meshCheck . . . . . . . . . . . . . . . . . . . . . . . . 99

7.8.7 SCOTCH meshSize . . . . . . . . . . . . . . . . . . . . . . . . . 99

7.8.8 SCOTCH meshData . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.8.9 SCOTCH meshStat . . . . . . . . . . . . . . . . . . . . . . . . . 101

7.8.10 SCOTCH meshGraph . . . . . . . . . . . . . . . . . . . . . . . . 102

7.9 Mesh ordering routines . . . . . . . . . . . . . . . . . . . . . . . . . . 103

7.9.1 SCOTCH meshOrder . . . . . . . . . . . . . . . . . . . . . . . . 103

7.9.2 SCOTCH meshOrderInit . . . . . . . . . . . . . . . . . . . . . 104

7.9.3 SCOTCH meshOrderExit . . . . . . . . . . . . . . . . . . . . . 105

7.9.4 SCOTCH meshOrderSave . . . . . . . . . . . . . . . . . . . . . 105

7.9.5 SCOTCH meshOrderSaveMap . . . . . . . . . . . . . . . . . . . 106

7.9.6 SCOTCH meshOrderSaveTree . . . . . . . . . . . . . . . . . . 106

7.9.7 SCOTCH meshOrderCheck . . . . . . . . . . . . . . . . . . . . . 107

7.9.8 SCOTCH meshOrderCompute . . . . . . . . . . . . . . . . . . . 107

7.10 Strategy handling routines . . . . . . . . . . . . . . . . . . . . . . . . 108

7.10.1 SCOTCH stratInit . . . . . . . . . . . . . . . . . . . . . . . . 108

7.10.2 SCOTCH stratExit . . . . . . . . . . . . . . . . . . . . . . . . 108

7.10.3 SCOTCH stratSave . . . . . . . . . . . . . . . . . . . . . . . . 109

7.10.4 SCOTCH stratGraphBipart . . . . . . . . . . . . . . . . . . . 109

7.10.5 SCOTCH stratGraphMap . . . . . . . . . . . . . . . . . . . . . 110

7.10.6 SCOTCH stratGraphMapBuild . . . . . . . . . . . . . . . . . . 110

7.10.7 SCOTCH stratGraphOrder . . . . . . . . . . . . . . . . . . . . 111

7.10.8 SCOTCH stratGraphOrderBuild . . . . . . . . . . . . . . . . 111

7.10.9 SCOTCH stratMeshOrder . . . . . . . . . . . . . . . . . . . . . 112

7.10.10SCOTCH stratMeshOrderBuild . . . . . . . . . . . . . . . . . 112

7.11 Geometry handling routines . . . . . . . . . . . . . . . . . . . . . . . 113

7.11.1 SCOTCH geomInit . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.11.2 SCOTCH geomExit . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.11.3 SCOTCH geomData . . . . . . . . . . . . . . . . . . . . . . . . . 114

7.11.4 SCOTCH graphGeomLoadChac . . . . . . . . . . . . . . . . . . 115

7.11.5 SCOTCH graphGeomSaveChac . . . . . . . . . . . . . . . . . . 115

7.11.6 SCOTCH graphGeomLoadHabo . . . . . . . . . . . . . . . . . . 116

7.11.7 SCOTCH graphGeomLoadScot . . . . . . . . . . . . . . . . . . 116

7.11.8 SCOTCH graphGeomSaveScot . . . . . . . . . . . . . . . . . . 117

7.11.9 SCOTCH meshGeomLoadHabo . . . . . . . . . . . . . . . . . . . 118

7.11.10SCOTCH meshGeomLoadScot . . . . . . . . . . . . . . . . . . . 118

7.11.11SCOTCH meshGeomSaveScot . . . . . . . . . . . . . . . . . . . 119

7.12 Error handling routines . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.12.1 SCOTCH errorPrint . . . . . . . . . . . . . . . . . . . . . . . 120

7.12.2 SCOTCH errorPrintW . . . . . . . . . . . . . . . . . . . . . . . 120

7.12.3 SCOTCH errorProg . . . . . . . . . . . . . . . . . . . . . . . . 120

7.13 Miscellaneous routines . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.13.1 SCOTCH randomReset . . . . . . . . . . . . . . . . . . . . . . . 121

7.14 MeTiS compatibility library . . . . . . . . . . . . . . . . . . . . . . . 121

7.14.1 METIS EdgeND . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

7.14.2 METIS NodeND . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

7.14.3 METIS NodeWND . . . . . . . . . . . . . . . . . . . . . . . . . . 123

7.14.4 METIS PartGraphKway . . . . . . . . . . . . . . . . . . . . . . 123

7.14.5 METIS PartGraphRecursive . . . . . . . . . . . . . . . . . . 124

7.14.6 METIS PartGraphVKway . . . . . . . . . . . . . . . . . . . . . 125

Installation

126

8.1 Thread issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.2 File compression issues . . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.3 Machine word size issues . . . . . . . . . . . . . . . . . . . . . . . . . 127

Examples

127

10 Adding new features to Scotch

129

10.1 Graphs and meshes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

10.2 Methods and partition data . . . . . . . . . . . . . . . . . . . . . . . 130

10.3 Adding a new method to Scotch . . . . . . . . . . . . . . . . . . . 130

10.4 Licensing of new methods and of derived works . . . . . . . . . . . . 132

1 Introduction

1.1 Static mapping

The eﬃcient execution of a parallel program on a parallel machine requires that

the communicating processes of the program be assigned to the processors of the

machine so as to minimize its overall running time. When processes have a lim-

ited duration and their logical dependencies are accounted for, this optimization

problem is referred to as scheduling. When processes are assumed to coexist simul-

taneously for the entire duration of the program, it is referred to as mapping. It

amounts to balancing the computational weight of the processes among the proces-

sors of the machine, while reducing the cost of communication by keeping intensively

inter-communicating processes on nearby processors. In most cases, the underlying

computational structure of the parallel programs to map can be conveniently mod-

eled as a graph in which vertices correspond to processes that handle distributed

pieces of data, and edges reﬂect data dependencies. The mapping problem can then

be addressed by assigning processor labels to the vertices of the graph, so that all

processes assigned to some processor are loaded and run on it. In a SPMD con-

text, this is equivalent to the distribution across processors of the data structures

of parallel programs; in this case, all pieces of data assigned to some processor are

handled by a single process located on this processor.

A mapping is called static if it is computed prior to the execution of the program.

Static mapping is NP-complete in the general case [13]. Therefore, many studies

have been carried out in order to ﬁnd sub-optimal solutions in reasonable time,

including the development of speciﬁc algorithms for common topologies such as the

hypercube [11, 21]. When the target machine is assumed to have a communication

network in the shape of a complete graph, the static mapping problem turns into the

partitioning problem, which has also been intensely studied [4, 22, 31, 33, 49]. How-

ever, when mapping onto parallel machines the communication network of which is

not a bus, not accounting for the topology of the target machine usually leads to

worse running times, because simple cut minimization can induce more expensive

long-distance communication [21, 56].

1.2 Sparse matrix ordering

Many scientiﬁc and engineering problems can be modeled by sparse linear systems,

which are solved either by iterative or direct methods. To achieve eﬃciency with

direct methods, one must minimize the ﬁll-in induced by factorization. This ﬁll-in

is a direct consequence of the order in which the unknowns of the linear system are

numbered, and its eﬀects are critical both in terms of memory and computation

costs.

An eﬃcient way to compute ﬁll reducing orderings of symmetric sparse matrices

is to use recursive nested dissection [17]. It amounts to computing a vertex set S

that separates the graph into two parts A and B, ordering S with the highest indices

that are still available, and proceeding recursively on parts A and B until their sizes

become smaller than some threshold value. This ordering guarantees that, at each

step, no non-zero term can appear in the factorization process between unknowns

of A and unknowns of B.

The main issue of the nested dissection ordering algorithm is thus to ﬁnd small

vertex separators that balance the remaining subgraphs as evenly as possible, in

order to minimize ﬁll-in and to increase concurrency in the factorization process.

1.3 Contents of this document

This document describes the capabilities and operations of Scotch, a software

package devoted to static mapping, graph and mesh partitioning, and sparse matrix

block ordering. Scotch allows the user to map eﬃciently any kind of weighted

process graph onto any kind of weighted architecture graph, and provides high-

quality block orderings of sparse matrices. The rest of this manual is organized

as follows. Section 2 presents the goals of the Scotch project, and section 3

outlines the most important aspects of the mapping and ordering algorithms that it

implements. Section 4 summarizes the most important changes between version 5.0

and previous versions. Section 5 deﬁnes the formats of the ﬁles used in Scotch,

section 6 describes the programs of the Scotch distribution, and section 7 deﬁnes

the interface and operations of the libScotch library. Section 8 explains how

to obtain and install the Scotch distribution. Finally, some practical examples

are given in section 9, and instructions on how to implement new methods in the

libScotch library are provided in section 10.

2 The Scotch project

2.1 Description

Scotch is a project carried out at the Laboratoire Bordelais de Recherche en In-

formatique (LaBRI) of the Universit´e Bordeaux I, and now within the ScAlApplix

project of INRIA Bordeaux Sud-Ouest. Its goal is to study the applications of graph

theory to scientiﬁc computing, using a “divide and conquer” approach.

It focused ﬁrst on static mapping, and has resulted in the development of the

Dual Recursive Bipartitioning (or DRB) mapping algorithm and in the study of

several graph bipartitioning heuristics [41], all of which have been implemented in

the Scotch software package [45]. Then, it focused on the computation of high-

quality vertex separators for the ordering of sparse matrices by nested dissection,

by extending the work that has been done on graph partitioning in the context

of static mapping [46, 47]. More recently, the ordering capabilities of Scotch

have been extended to native mesh structures, thanks to hypergraph partitioning

algorithms. New graph partitioning methods have also been recently added [8, 42].

Version 5.0 of Scotch is the ﬁrst one to comprise parallel graph ordering rou-

tines. The parallel features of Scotch are referred to as PT-Scotch (“Parallel

Threaded Scotch”). While both packages share a signiﬁcant amount of code, bea-

cuse PT-Scotch transfers control to the sequential routines of the libScotch

library when the subgraphs on which it operates are located on a single processor,

the two sets of routines have a distinct user’s manual. Readers interested in the

parallel features of Scotch should refer to the PT-Scotch 5.1 User’s Guide [43].

2.2 Availability

Starting from version 4.0, which has been developed at INRIA within the ScAlAp-

plix project, Scotch is available under a dual licensing basis. On the one hand, it

is downloadable from the Scotch web page as free/libre software, to all interested

parties willing to use it as a library or to contribute to it as a testbed for new

partitioning and ordering methods. On the other hand, it can also be distributed,

under other types of licenses and conditions, to parties willing to embed it tightly

into closed, proprietary software.

The free/libre software license under which Scotch 5.1 is distributed is

the CeCILL-C license [6], which has basically the same features as the GNU

LGPL (“Lesser General Public License”): ability to link the code as a library

to any free/libre or even proprietary software, ability to modify the code and to

redistribute these modiﬁcations. Version 4.0 of Scotch was distributed under the

LGPL itself.

Please refer to section 8 to see how to obtain the free/libre distribution of

Scotch.

3 Algorithms

3.1 Static mapping by Dual Recursive Bipartitioning

For a detailed description of the mapping algorithm and an extensive analysis of its

performance, please refer to [41, 44]. In the next sections, we will only outline the

most important aspects of the algorithm.

3.1.1 Static mapping

The parallel program to be mapped onto the target architecture is modeled by a val-

uated unoriented graph S called source graph or process graph, the vertices of which

represent the processes of the parallel program, and the edges of which the commu-

nication channels between communicating processes. Vertex- and edge- valuations

associate with every vertex v_Sand every edge e_Sof S integer numbers w_S(v_S) and

w_S(e_S) which estimate the computation weight of the corresponding process and

the amount of communication to be transmitted on the channel, respectively.

The target machine onto which is mapped the parallel program is also modeled

by a valuated unoriented graph T called target graph or architecture graph. Vertices

v_Tand edges e_Tof T are assigned integer weights w_T(v_T) and w_T(e_T), which

estimate the computational power of the corresponding processor and the cost of

traversal of the inter-processor link, respectively.

A mapping from S to T consists of two applications τ_S,T: V (S) −→ V (T ) and

ρ_S,T: E(S) −→ P(E(T )), where P(E(T )) denotes the set of all simple loopless

paths which can be built from E(T ). τ_S,T(v_S) = v_Tif process v_Sof S is mapped

onto processor v_Tof T , and ρ_S,T(e_S) = {e¹_T, e_T², . . . , e_Tⁿ} if communication channel

e_Sof S is routed through communication links e¹_T, e²_T, . . . , e_Tⁿof T . |ρ_S,T(e_S)|

denotes the dilation of edge e_S, that is, the number of edges of E(T ) used to route

e_S.

3.1.2 Cost function and performance criteria

The computation of eﬃcient static mappings requires an a priori knowledge of the

dynamic behavior of the target machine with respect to the programs which are

run on it. This knowledge is synthesized in a cost function, the nature of which

determines the characteristics of the desired optimal mappings. The goal of our

mapping algorithm is to minimize some communication cost function, while keeping

the load balance within a speciﬁed tolerance. The communication cost function f_C

that we have chosen is the sum, for all edges, of their dilation multiplied by their

weight:

ꢀ

f_C(τ_S,T, ρ_S,T) ^d=^ef

w_S(e_S) |ρ_S,T(e_S)| .

e_S∈E(S)

This function, which has already been considered by several authors for hyper-

cube target topologies [11, 21, 25], has several interesting properties: it is easy

to compute, allows incremental updates performed by iterative algorithms, and its

minimization favors the mapping of intensively intercommunicating processes onto

nearby processors; regardless of the type of routage implemented on the target

machine (store-and-forward or cut-through), it models the traﬃc on the intercon-

nection network and thus the risk of congestion.

The strong positive correlation between values of this function and eﬀective

execution times has been experimentally veriﬁed by Hammond [21] on the CM-2,

and by Hendrickson and Leland [26] on the nCUBE 2.

The quality of mappings is evaluated with respect to the criteria for quality that

we have chosen: the balance of the computation load across processors, and the

minimization of the interprocessor communication cost modeled by function f_C.

These criteria lead to the deﬁnition of several parameters, which are described

below.

For load balance, one can deﬁne µ_map, the average load per computational

power unit (which does not depend on the mapping), and δ_map, the load imbalance

ratio, as

ꢁ

v_S∈V (S)

ꢁ

v_T∈V (T )

w_S(v_S)

def

µ_map

and

w_T(v_T)

ꢂ







ꢁ

v_S∈ V (S)

_S,T(v_S) = v_T

ꢂ

_T(v_T

w (v ) − µ_map

ꢂ



_S

)

_{∈V (T )}ꢂ

v_T

ꢂ

def

ꢁ

v_S∈V (S)

δ_map

w_S(v_S)

However, since the maximum load imbalance ratio is provided by the user in input

of the mapping, the information given by these parameters is of little interest, since

what matters is the minimization of the communication cost function under this

load balance constraint.

For communication, the straightforward parameter to consider is f_C. It can be

normalized as µ_exp, the average edge expansion, which can be compared to µ_dil

the average edge dilation; these are deﬁned as

|ρ_S,T(e_S)|

ꢁ

e_S∈E(S)

f_C

def

ꢁ

e_S∈E(S)

µ_exp

and

µ_dil

w_S(e_S)

|E(S)|

µ_exp

is smaller than 1 when the mapper succeeds in putting heavily inter-

µ_dil

def

δ_exp

communicating processes closer to each other than it does for lightly communicating

processes; they are equal if all edges have same weight.

3.1.3 The Dual Recursive Bipartitioning algorithm

Our mapping algorithm uses a divide and conquer approach to recursively allocate

subsets of processes to subsets of processors [41]. It starts by considering a set of

processors, also called domain, containing all the processors of the target machine,

and with which is associated the set of all the processes to map. At each step, the

algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and

calls a graph bipartitioning algorithm to split the subset of processes associated with

the domain across the two subdomains, as sketched in the following.

mapping (D, P)

Set_Of_Processors D;

Set_Of_Processes

{

Set_Of_Processors D0, D1;

Set_Of_Processes P0, P1;

if (|P| == 0) return; /* If nothing to do.

if (|D| == 1) {

result (D, P);

return;

/* If one processor in D */

/* P is mapped onto it. */

}

(D0, D1) = processor_bipartition (D);

(P0, P1) = process_bipartition (P, D0, D1);

/* Perform recursion. */

mapping (D0, P0);

mapping (D1, P1);

}

The association of a subdomain with every process deﬁnes a partial mapping of the

process graph. As bipartitionings are performed, the subdomain sizes decrease, up

to give a complete mapping when all subdomains are of size one.

The above algorithm lies on the ability to deﬁne ﬁve main objects:

• a domain structure, which represents a set of processors in the target archi-

tecture;

• a domain bipartitioning function, which, given a domain, bipartitions it in two

disjoint subdomains;

• a domain distance function, which gives, in the target graph, a measure of the

distance between two disjoint domains. Since domains may not be convex nor

connected, this distance may be estimated. However, it must respect certain

homogeneity properties, such as giving more accurate results as domain sizes

decrease. The domain distance function is used by the graph bipartitioning

algorithms to compute the communication function to minimize, since it allows

the mapper to estimate the dilation of the edges that link vertices which belong

to diﬀerent domains. Using such a distance function amounts to considering

that all routings will use shortest paths on the target architecture, which

is how most parallel machines actually do. We have thus chosen that our

program would not provide routings for the communication channels, leaving

their handling to the communication system of the target machine;

• a process subgraph structure, which represents the subgraph induced by a

subset of the vertex set of the original source graph;

• a process subgraph bipartitioning function, which bipartitions subgraphs in

two disjoint pieces to be mapped onto the two subdomains computed by the

domain bipartitioning function.

All these routines are seen as black boxes by the mapping program, which can thus

accept any kind of target architecture and process bipartitioning functions.

3.1.4 Partial cost function

The production of eﬃcient complete mappings requires that all graph bipartition-

ings favor the criteria that we have chosen. Therefore, the bipartitioning of a

subgraph S^′of S should maintain load balance within the user-speciﬁed tolerance,

and minimize the partial communication cost function f_C^′, deﬁned as

ꢀ

f^′(τ_S,T, ρ_S,T) ^d=^ef

w_S({v, v^′}) |ρ_S,T({v, v^′})| ,

′

v ∈ V (S )

′

{v, v } ∈ E(S)

which accounts for the dilation of edges internal to subgraph S^′as well as for the

one of edges which belong to the cocycle of S^′, as shown in Figure 1. Taking into

account the partial mapping results issued by previous bipartitionings makes it pos-

sible to avoid local choices that might prove globally bad, as explained below. This

amounts to incorporating additional constraints to the standard graph bipartition-

ing problem, turning it into a more general optimization problem termed skewed

graph partitioning by some authors [27].

a. Initial position.

b. After one vertex is moved.

Figure 1: Edges accounted for in the partial communication cost function when

bipartitioning the subgraph associated with domain D between the two subdomains

D₀and D₁of D. Dotted edges are of dilation zero, their two ends being mapped

onto the same subdomain. Thin edges are cocycle edges.

3.1.5 Execution scheme

From an algorithmic point of view, our mapper behaves as a greedy algorithm, since

the mapping of a process to a subdomain is never reconsidered, and at each step

of which iterative algorithms can be applied. The double recursive call performed

at each step induces a recursion scheme in the shape of a binary tree, each vertex

of which corresponds to a bipartitioning job, that is, the bipartitioning of both a

domain and its associated subgraph.

In the case of depth-ﬁrst sequencing, as written in the above sketch, biparti-

tioning jobs run in the left branches of the tree have no information on the dis-

tance between the vertices they handle and neighbor vertices to be processed in

the right branches. On the contrary, sequencing the jobs according to a by-level

(breadth-ﬁrst) travel of the tree allows any bipartitioning job of a given level to

have information on the subdomains to which all the processes have been assigned

at the previous level. Thus, when deciding in which subdomain to put a given pro-

cess, a bipartitioning job can account for the communication costs induced by its

neighbor processes, whether they are handled by the job itself or not, since it can

estimate in f_C^′the dilation of the corresponding edges. This results in an interesting

feedback eﬀect: once an edge has been kept in a cut between two subdomains, the

distance between its end vertices will be accounted for in the partial communication

cost function to be minimized, and following jobs will be more likely to keep these

vertices close to each other, as illustrated in Figure 2. The relative eﬃciency of

CL0

CL2

CL1

CL2

a. Depth-ﬁrst sequencing.

b. Breadth-ﬁrst sequencing.

Figure 2: Inﬂuence of depth-ﬁrst and breadth-ﬁrst sequencings on the bipartitioning

of a domain D belonging to the leftmost branch of the bipartitioning tree. With

breadth-ﬁrst sequencing, the partial mapping data regarding vertices belonging to

the right branches of the bipartitioning tree are more accurate (C.L. stands for “Cut

Level”).

depth-ﬁrst and breadth-ﬁrst sequencing schemes with respect to the structure of

the source and target graphs is discussed in [44].

3.1.6 Graph bipartitioning methods

The core of our recursive mapping algorithm uses process graph bipartitioning meth-

ods as black boxes. It allows the mapper to run any type of graph bipartitioning

method compatible with our criteria for quality. Bipartitioning jobs maintain an in-

ternal image of the current bipartition, indicating for every vertex of the job whether

it is currently assigned to the ﬁrst or to the second subdomain. It is therefore possi-

ble to apply several diﬀerent methods in sequence, each one starting from the result

of the previous one, and to select the methods with respect to the job character-

istics, thus enabling us to deﬁne mapping strategies. The currently implemented

graph bipartitioning methods are listed below.

Band

Like the multi-level method which will be described below, the band method

is a meta-algorithm, in the sense that it does not itself compute partitions, but

rather helps other partitioning algorithms perform better. It is a reﬁnement

algorithm which, from a given initial partition, extracts a band graph of given

width (which only contains graph vertices that are at most at this distance

from the separator), calls a partitioning strategy on this band graph, and

projects back the reﬁned partition on the original graph. This method was

designed to be able to use expensive partitioning heuristics, such as genetic

algorithms, on large graphs, as it dramatically reduces the problem space by

several orders of magnitude. However, it was found that, in a multi-level

context, it also improves partition quality, by coercing partitions in a problem

space that derives from the one which was globally deﬁned at the coarsest

level, thus preventing local optimization reﬁnement algorithms to be trapped

in local optima of the ﬁner graphs [8].

Diﬀusion

This global optimization method, presented in [42], ﬂows two kinds of antag-

onistic liquids, scotch and anti-scotch, from two source vertices, and sets the

new frontier as the limit between vertices which contain scotch and the ones

which contain anti-scotch. In order to add load-balancing constraints to the

algorithm, a constant amount of liquid disappears from every vertex per unit

of time, so that no domain can spread across more than half of the vertices.

Because selecting the source vertices is essential to the obtainment of use-

ful results, this method has been hard-coded so that the two source vertices

are the two vertices of highest indices, since in the band method these are

the anchor vertices which represent all of the removed vertices of each part.

Therefore, this method must be used on band graphs only, or on speciﬁcally

crafted graphs.

Exactiﬁer

This greedy algorithm reﬁnes the current partition so as to reduce load imbal-

ance as much as possible, while keeping the value of the communication cost

function as small as possible. The vertex set is scanned in order of decreasing

vertex weights, and vertices are moved from one subdomain to the other if

doing so reduces load imbalance. When several vertices have same weight,

the vertex whose swap decreases most the communication cost function is se-

lected ﬁrst. This method is used in post-processing of other methods when

load balance is mandatory. For weighted graphs, the strict enforcement of

load balance may cause the swapping of isolated vertices of small weight, thus

greatly increasing the cut. Therefore, great care should be taken when using

this method if connectivity or cut minimization are mandatory.

Fiduccia-Mattheyses

The Fiduccia-Mattheyses heuristics [12] is an almost-linear improvement of

the famous Kernighan-Lin algorithm [35]. It tries to improve the bipartition

that is input to it by incrementally moving vertices between the subsets of

the partition, as long as it can ﬁnd sequences of moves that lower its commu-

nication cost. By considering sequences of moves instead of single swaps, the

algorithm allows hill-climbing from local minima of the cost function. As an

extension to the original Fiduccia-Mattheyses algorithm, we have developed

new data structures, based on logarithmic indexings of arrays, that allow us

to handle weighted graphs while preserving the almost-linearity in time of the

algorithm [44].

As several authors quoted before [24, 32], the Fiduccia-Mattheyses algorithm

gives better results when trying to optimize a good starting partition. There-

fore, it should not be used on its own, but rather after greedy starting methods

such as the Gibbs-Poole-Stockmeyer or the greedy graph growing methods.

Gibbs-Poole-Stockmeyer

This greedy bipartitioning method derives from an algorithm proposed by

Gibbs, Poole, and Stockmeyer to minimize the dilation of graph orderings,

that is, the maximum absolute value of the diﬀerence between the numbers of

neighbor vertices [18]. The graph is sliced by using a breadth-ﬁrst spanning

tree rooted at a randomly chosen vertex, and this process is iterated by se-

lecting a new root vertex within the last layer as long as the number of layers

increases. Then, starting from the current root vertex, vertices are assigned

layer after layer to the ﬁrst subdomain, until half of the total weight has been

processed. Remaining vertices are then allocated to the second subdomain.

As for the original Gibbs, Poole, and Stockmeyer algorithm, it is assumed that

the maximization of the number of layers results in the minimization of the

sizes –and therefore of the cocycles– of the layers. This property has already

been used by George and Liu to reorder sparse linear systems using the nested

dissection method [17], and by Simon in [54].

Greedy graph growing

This greedy algorithm, which has been proposed by Karypis and Kumar [31],

belongs to the GRASP (“Greedy Randomized Adaptive Search Procedure”)

class [36]. It consists in selecting an initial vertex at random, and repeatedly

adding vertices to this growing subset, such that each added vertex results

in the smallest increase in the communication cost function. This process,

which stops when load balance is achieved, is repeated several times in order

to explore (mostly in a gradient-like fashion) diﬀerent areas of the solution

space, and the best partition found is kept.

Multi-level

This algorithm, which has been studied by several authors [4, 23, 31] and

should be considered as a strategy rather than as a method since it uses other

methods as parameters, repeatedly reduces the size of the graph to bipartition

by ﬁnding matchings that collapse vertices and edges, computes a partition

for the coarsest graph obtained, and projects the result back to the original

graph, as shown in Figure 3. The multi-level method, when used in conjunc-

Refined partition

Projected partition

Coarsening

phase

Uncoarsening

phase

Initial partitioning

Figure 3: The multi-level partitioning process. In the uncoarsening phase, the light

and bold lines represent for each level the projected partition obtained from the

coarser graph, and the partition obtained after reﬁnement, respectively.

tion with the Fiduccia-Mattheyses method to compute the initial partitions

and reﬁne the projected partitions at every level, usually leads to a signiﬁ-

cant improvement in quality with respect to the plain Fiduccia-Mattheyses

method. By coarsening the graph used by the Fiduccia-Mattheyses method

to compute and project back the initial partition, the multi-level algorithm

broadens the scope of the Fiduccia-Mattheyses algorithm, and makes possible

for it to account for topological structures of the graph that would else be of

a too high level for it to encompass in its local optimization process.

3.1.7 Mapping onto variable-sized architectures

Several constrained graph partitioning problems can be modeled as mapping the

problem graph onto a target architecture, the number of vertices and topology of

which depend dynamically on the structure of the subgraphs to bipartition at each

step.

Variable-sized architectures are supported by the DRB algorithm in the follow-

ing way: at the end of each bipartitioning step, if any of the variable subdomains

is empty (that is, all vertices of the subgraph are mapped only to one of the sub-

domains), then the DRB process stops for both subdomains, and all of the vertices

are assigned to their parent subdomain; else, if a variable subdomain has only one

vertex mapped onto it, the DRB process stops for this subdomain, and the vertex

is assigned to it.

The moment when to stop the DRB process for a speciﬁc subgraph can be con-

trolled by deﬁning a bipartitioning strategy that tests for the validity of a criterion

at each bipartitioning step, and maps all of the subgraph vertices to one of the

subdomains when it becomes false.

3.2 Sparse matrix ordering by hybrid incomplete nested dis-

section

When solving large sparse linear systems of the form Ax = b, it is common to

precede the numerical factorization by a symmetric reordering. This reordering is

chosen in such a way that pivoting down the diagonal in order on the resulting

permuted matrix PAP^Tproduces much less ﬁll-in and work than computing the

factors of A by pivoting down the diagonal in the original order (the ﬁll-in is the

set of zero entries in A that become non-zero in the factored matrix).

3.2.1 Minimum Degree

The minimum degree algorithm [55] is a local heuristic that performs its pivot

selection by iteratively selecting from the graph a node of minimum degree.

The minimum degree algorithm is known to be a very fast and general purpose

algorithm, and has received much attention over the last three decades (see for

example [1, 16, 39]). However, the algorithm is intrinsically sequential, and very

little can be theoretically proved about its eﬃciency.

3.2.2 Nested dissection

The nested dissection algorithm [17] is a global, heuristic, recursive algorithm which

computes a vertex set S that separates the graph into two parts A and B, ordering

S with the highest remaining indices. It then proceeds recursively on parts A and B

until their sizes become smaller than some threshold value. This ordering guarantees

that, at each step, no non zero term can appear in the factorization process between

unknowns of A and unknowns of B.

Many theoretical results have been carried out on nested dissection order-

ing [7, 38], and its divide and conquer nature makes it easily parallelizable. The

main issue of the nested dissection ordering algorithm is thus to ﬁnd small vertex

separators that balance the remaining subgraphs as evenly as possible. Most often,

vertex separators are computed by using direct heuristics [28, 37], or from edge

separators [48, and included references] by minimum cover techniques [9, 30], but

other techniques such as spectral vertex partitioning have also been used [49].

Provided that good vertex separators are found, the nested dissection algorithm

produces orderings which, both in terms of ﬁll-in and operation count, compare

favorably [19, 31, 46] to the ones obtained with the minimum degree algorithm [39].

Moreover, the elimination trees induced by nested dissection are broader, shorter,

and better balanced, and therefore exhibit much more concurrency in the context

of parallel Cholesky factorization [3, 14, 15, 19, 46, 53, and included references].

3.2.3 Hybridization

Due to their complementary nature, several schemes have been proposed to

hybridize the two methods [28, 34, 46]. However, to our knowledge, only loose

couplings have been achieved: incomplete nested dissection is performed on the

graph to order, and the resulting subgraphs are passed to some minimum degree

algorithm. This results in the fact that the minimum degree algorithm does not

have exact degree values for all of the boundary vertices of the subgraphs, leading

to a misbehavior of the vertex selection process.

Our ordering program implements a tight coupling of the nested dissection and

minimum degree algorithms, that allows each of them to take advantage of the infor-

mation computed by the other. First, the nested dissection algorithm provides exact

degree values for the boundary vertices of the subgraphs passed to the minimum

degree algorithm (called halo minimum degree since it has a partial visibility of the

neighborhood of the subgraph). Second, the minimum degree algorithm returns the

assembly tree that it computes for each subgraph, thus allowing for supervariable

amalgamation, in order to obtain column-blocks of a size suitable for BLAS3 block

computations.

As for our mapping program, it is possible to combine ordering methods into

ordering strategies, which allow the user to select the proper methods with respect

to the characteristics of the subgraphs.

The ordering program is completely parametrized by its ordering strategy. The

nested dissection method allows the user to take advantage of all of the graph

partitioning routines that have been developed in the earlier stages of the Scotch

project. Internal ordering strategies for the separators are relevant in the case of

sequential or parallel [20, 50, 51, 52] block solving, to select ordering algorithms

that minimize the number of extra-diagonal blocks [7], thus allowing for eﬃcient

use of BLAS3 primitives, and to reduce inter-processor communication.

3.2.4 Performance criteria

The quality of orderings is evaluated with respect to several criteria. The ﬁrst

one, NNZ, is the number of non-zero terms in the factored reordered matrix. The

second one, OPC, is the operation count, that is the number of arithmetic operations

required to factor the matrix. The operation count that we have considered takes

into consideration all operations (additions, subtractions, multiplications, divisions)

ꢁ

required by Cholesky factorization, except square roots; it is equal to _cn²_c, where

n_cis the number of non-zeros of column c of the factored matrix, diagonal included.

A third criterion for quality is the shape of the elimination tree; concurrency in

parallel solving is all the higher as the elimination tree is broad and short. To

measure its quality, several parameters can be deﬁned: h_min, h_max, and h_avgdenote

the minimum, maximum, and average heights of the tree¹, respectively, and h_dlt

is the variance, expressed as a percentage of h_avg. Since small separators result in

small chains in the elimination tree, h_avgshould also indirectly reﬂect the quality

of separators.

3.2.5 Ordering methods

The core of our ordering algorithm uses graph ordering methods as black boxes,

which allows the orderer to run any type of ordering method. In addition to yielding

orderings of the subgraphs that are passed to them, these methods may compute

column block partitions of the subgraphs, that are recorded in a separate tree

structure. The currently implemented graph ordering methods are listed below.

Halo approximate minimum degree

The halo approximate minimum degree method [47] is an improvement of

the approximate minimum degree [1] algorithm, suited for use on subgraphs

produced by nested dissection methods. Its interest compared to classical min-

imum degree algorithms is that boundary vertices are processed using their

real degree in the global graph rather than their (much smaller) degree in the

subgraph, resulting in smaller ﬁll-in and operation count. This method also

implements amalgamation techniques that result in eﬃcient block computa-

tions in the factoring and the solving processes.

Halo approximate minimum ﬁll

The halo approximate minimum ﬁll method is a variant of the halo approxi-

mate minimum degree algorithm, where the criterion to select the next vertex

to permute is not based on its current estimated degree but on the minimiza-

tion of the induced ﬁll.

Graph compression

The graph compression method [2] merges cliques of vertices into single nodes,

so as to speed-up the ordering of the compressed graph. It also results in some

improvement of the quality of separators, especially for stiﬀness matrices.

Gibbs-Poole-Stockmeyer

This method is mainly used on separators to reduce the number and extent

of extra-diagonal blocks.

Simple method

Vertices are ordered consecutively, in the same order as they are stored in the

graph. This is the fastest method to use on separators when the shape of

extra-diagonal structures is not a concern.

Nested dissection

Incomplete nested dissection method. Separators are computed recursively on

subgraphs, and speciﬁc ordering methods are applied to the separators and

to the resulting subgraphs (see sections 3.2.2 and 3.2.3).

¹We do not consider as leaves the disconnected vertices that are present in some meshes, since

they do not participate in the solving process.

3.2.6 Graph separation methods

The core of our incomplete nested dissection algorithm uses graph separation

methods as black boxes. It allows the orderer to run any type of graph separation

method compatible with our criteria for quality, that is, reducing the size of the

vertex separator while maintaining the loads of the separated parts within some

user-speciﬁed tolerance. Separation jobs maintain an internal image of the current

vertex separator, indicating for every vertex of the job whether it is currently

assigned to one of the two parts, or to the separator. It is therefore possible to

apply several diﬀerent methods in sequence, each one starting from the result of

the previous one, and to select the methods with respect to the job characteristics,

thus enabling the deﬁnition of separation strategies.

The currently implemented graph separation methods are listed below.

Fiduccia-Mattheyses

This is a vertex-oriented version of the original, edge-oriented, Fiduccia-

Mattheyses heuristics described in page 13.

Greedy graph growing

This is a vertex-oriented version of the edge-oriented greedy graph growing

algorithm described in page 14.

Multi-level

This is a vertex-oriented version of the edge-oriented multi-level algorithm

described in page 14.

Thinner

This greedy algorithm reﬁnes the current separator by removing all of the

exceeding vertices, that is, vertices that do not have neighbors in both parts.

It is provided as a simple gradient reﬁnement algorithm for the multi-level

method, and is clearly outperformed by the Fiduccia-Mattheyses algorithm.

Vertex cover

This algorithm computes a vertex separator by ﬁrst computing an edge sepa-

rator, that is, a bipartition of the graph, and then turning it into a vertex sep-

arator by using the method proposed by Pothen and Fang [48]. This method

requires the computation of maximal matchings in the bipartite graphs as-

sociated with the edge cuts, which are built using Duﬀ’s variant [9] of the

Hopcroft and Karp algorithm [30]. Edge separators are computed by using a

bipartitioning strategy, which can use any of the graph bipartitioning methods

described in section 3.1.6, page 12.

4 Updates

4.1 Changes from version 4.0

Scotch has gone parallel with the release of PT-Scotch, the Parallel Threaded

Scotch. People interested in these parallel routines should refer to the PT-Scotch

and libScotch 5.1 User’s Guide [43], which extends this manual.

A compatibility library has been developed to allow users to try and use Scotch

in programs that were designed to use MeTiS. Please refer to Section 7.14 for more

information.

Scotch can now handle compressed streams on the ﬂy, in several widely used

formats such as gzip, bzip2 or lzma. Please refer to Section 6.2 for more informa-

tion.

4.2 Changes from version 5.0

A new integer index type has been created in the Fortran interface, to address array

indices larger than the maximum value which can be stored in a regular integer.

Please refer to Section 8.3 for more information.

5 Files and data structures

For the sake of portability, readability, and reduction of storage space, all the data

ﬁles shared by the diﬀerent programs of the Scotch project are coded in plain

ASCII text exclusively. Although we may speak of “lines” when describing ﬁle for-

mats, text-formatting characters such as newlines or tabulations are not mandatory,

and are not taken into account when ﬁles are read. They are only used to provide

better readability and understanding. Whenever numbers are used to label objects,

and unless explicitely stated, numberings always start from zero, not one.

5.1 Graph ﬁles

Graph ﬁles, which usually end in “.grf” or “.src”, describe valuated graphs, which

can be valuated process graphs to be mapped onto target architectures, or graphs

representing the adjacency structures of matrices to order.

Graphs are represented by means of adjacency lists: the deﬁnition of each

vertex is accompanied by the list of all of its neighbors, i.e. all of its adjacent arcs.

Therefore, the overall number of edge data is twice the number of edges.

Since version 3.3 has been introduced a new ﬁle format, referred to as the “new-

style” ﬁle format, which replaces the previous, “old-style”, ﬁle format. The two

advantages of the new-style format over its predecessor are its greater compacity,

which results in shorter I/O times, and its ability to handle easily graphs output

by C or by Fortran programs.

Starting from version 4.0, only the new format is supported. To convert

remaining old-style graph ﬁles into new-style graph ﬁles, one should get version 3.4

of the Scotch distribution, which comprises the scv ﬁle converter, and use it to

produce new-style Scotch graph ﬁles from the old-style Scotch graph ﬁles which

it is able to read. See section 6.3.5 for a description of gcv, formerly called scv.

The ﬁrst line of a graph ﬁle holds the graph ﬁle version number, which is cur-

rently 0. The second line holds the number of vertices of the graph (referred to as

vertnbr in libScotch; see for instance Figure 16, page 51, for a detailed example),

followed by its number of arcs (unappropriately called edgenbr, as it is in fact equal

to twice the actual number of edges). The third line holds two ﬁgures: the graph

base index value (baseval), and a numeric ﬂag.

The graph base index value records the value of the starting index used to

describe the graph; it is usually 0 when the graph has been output by C programs,

and 1 for Fortran programs. Its purpose is to ease the manipulation of graphs within

each of these two environments, while providing compatibility between them.

The numeric ﬂag, similar to the one used by the Chaco graph format [24], is

made of three decimal digits. A non-zero value in the units indicates that vertex

weights are provided. A non-zero value in the tenths indicates that edge weights

are provided. A non-zero value in the hundredths indicates that vertex labels are

provided; if it is the case, vertices can be stored in any order in the ﬁle; else, natural

order is assumed, starting from the graph base index.

This header data is then followed by as many lines as there are vertices in the

graph, that is, vertnbr lines. Each of these lines begins with the vertex label,

if necessary, the vertex load, if necessary, and the vertex degree, followed by the

description of the arcs. An arc is deﬁned by the load of the edge, if necessary, and

by the label of its other end vertex. The arcs of a given vertex can be provided

in any order in its neighbor list. If vertex labels are provided, vertices can also be

stored in any order in the ﬁle.

Figure 4 shows the contents of a graph ﬁle modeling a cube with unity vertex

and edge weights and base 0.

000

Figure 4: Graph ﬁle representing a cube.

5.2 Mesh ﬁles

Mesh ﬁles, which usually end in “.msh”, describe valuated meshes, made of elements

and nodes, the elements of which can be mapped onto target architectures, and the

nodes of which can be reordered.

Meshes are bipartite graphs, in the sense that every element is connected to the

nodes that it comprises, and every node is connected to the elements to which it

belongs. No edge connects any two element vertices, nor any two node vertices.

One can also think of meshes as hypergraphs, such that nodes are the vertices

of the hypergraph and elements are hyper-edges which connect multiple nodes, or

reciprocally such that elements are the vertices of the hypergraph and nodes are

hyper-edges which connect multiple elements.

Since meshes are graphs, the structure of mesh ﬁles resembles very much the

one of graph ﬁles described above in section 5.1, and diﬀers only by its header,

which indicates which of the vertices are node vertices and element vertices.

The ﬁrst line of a mesh ﬁle holds the mesh ﬁle version number, which is currently

1. Graph and mesh version numbers will always diﬀer, which enables application

programs to accept both ﬁle formats and adapt their behavior according to the

type of input data. The second line holds the number of elements of the mesh

(velmnbr), followed by its number of nodes (vnodnbr), and by its overall number of

arcs (edgenbr, that is, twice the number of edges which connect elements to nodes

and vice-versa).

The third line holds three ﬁgures: the base index of the ﬁrst element vertex in

memory (velmbas), the base index of the ﬁrst node vertex in memory (vnodbas),

and a numeric ﬂag.

The Scotch mesh ﬁle format requires that all nodes and all elements be assigned

to contiguous ranges of indices. Therefore, either all element vertices are deﬁned

before all node vertices, or all node vertices are deﬁned before all element vertices.

The node and element base indices indicate at the same time whether elements or

nodes are put in the ﬁrst place, as well as the value of the starting index used to

describe the graph. Indeed, if velmbas < vnodbas, then elements have the smallest

indices, velmbas is the base value of the underlying graph (that is, baseval =

velmbas), and velmbas + velmnbr = vnodbas holds. Conversely, if velmbas >

vnodbas, then nodes have the smallest indices, vnodbas is the base value of the

underlying graph, (that is, baseval = vnodbas), and vnodbas+vnodnbr = velmbas

holds.

The numeric ﬂag, similar to the one used by the Chaco graph format [24], is

made of three decimal digits. A non-zero value in the units indicates that vertex

weights are provided. A non-zero value in the tenths indicates that edge weights

are provided. A non-zero value in the hundredths indicates that vertex labels are

provided; if it is the case, and if velmbas < vnodbas (resp. velmbas > vnodbas),

the velmnbr (resp. vnodnbr) ﬁrst vertex lines are assumed to be element (resp.

node) vertices, irrespective of their vertex labels, and the vnodnbr (resp. velmnbr)

remaining vertex lines are assumed to be node (resp. element) vertices; else, natural

order is assumed, starting at the underlying graph base index (baseval).

This header data is then followed by as many lines as there are node and element

vertices in the graph. These lines are similar to the ones of the graph format, except

that, in order to save disk space, the numberings of nodes and elements all start

from the same base value, that is, min(velmbas, vnodbas) (also called baseval, like

for regular graphs).

For example, Figure 5 shows the contents of the mesh ﬁle modeling three square

elements, with unity vertex and edge weights, elements deﬁned before nodes, and

numbering of the underlying graph starting from 1. In memory, the three elements

are labeled from 1 to 3, and the eight nodes are labeled from 4 to 11. In the ﬁle,

the three elements are still labeled from 1 to 3, while the eight nodes are labeled

from 1 to 8.

When labels are used, elements and nodes may have similar labels, but not two

elements, nor two nodes, should have the same labels.

5.3 Geometry ﬁles

Geometry ﬁles, which usually end in “.xyz”, hold the coordinates of the vertices

of their associated graph or mesh. These ﬁles are not used in the mapping process

itself, since only topological properties are taken into account then (mappings are

computed regardless of graph geometry). They are used by visualization programs

to compute graphical representations of mapping results.

The ﬁrst string to appear in a geometry ﬁle codes for its type, or dimensional-

ity. It is “1” if the ﬁle contains unidimensional coordinates, “2” for bidimensional

coordinates, and “3” for tridimensional coordinates. It is followed by the number of

coordinate data stored in the ﬁle, which should be at least equal to the number of

vertices of the associated graph or mesh, and by that many coordinate lines. Each

coordinate line holds the label of the vertex, plus one, two or three real numbers

which are the (X), (X,Y), or (X,Y,Z), coordinates of the graph vertices, according

000

(= 5)

(= 10)

(= 8)

(= 11)

(= 7)

(= 11)

(= 6)

(= 4)

(= 7)

(= 5)

(= 9)

Figure 5: Mesh ﬁle representing three square elements, with unity vertex and edge

weights. Elements are deﬁned before nodes, and numbering of the underlying graph

starts from 1. The left part of the ﬁgure shows the mesh representation in memory,

with consecutive element and node indices. The right part of the ﬁgure shows

the contents of the ﬁle, with both element and node numberings starting from 1,

the minimum of the element and node base values. Corresponding node indices in

memory are shown in parentheses for the sake of comprehension.

to the graph dimensionality.

Vertices can be stored in any order in the ﬁle. Moreover, a geometry ﬁle can have

more coordinate data than there are vertices in the associated graph or mesh ﬁle;

only coordinates the labels of which match labels of graph or mesh vertices will be

taken into account. This feature allows all subgraphs of a given graph or mesh to

share the same geometry ﬁle, provided that graph vertex labels remain unchanged.

For example, Figure 6 shows the contents of the 3D geometry ﬁle associated with

the graph of Figure 4.

0.0

1.0

0.0

1.0

0.0

1.0

0.0

1.0

0.0

1.0

0.0

1.0

0.0

1.0

Figure 6: Geometry ﬁle associated with the graph ﬁle of Figure 4.

5.4 Target ﬁles

Target ﬁles describe the architectures onto which source graphs are mapped. Instead

of containing the structure of the target graph itself, as source graph ﬁles do, target

ﬁles deﬁne how target graphs are bipartitioned and give the distances between all

pairs of vertices (that is, processors). Keeping the bipartitioning information within

target ﬁles avoids recomputing it every time a target architecture is used. We are

allowed to do so because, in our approach, the recursive bipartitioning of the target

graph is fully independent with respect to that of the source graph (however, the

opposite is false).

For space and time saving issues, some classical homogeneous architectures (2D

and 3D meshes and tori, hypercubes, complete graphs, etc.) have been algorithmi-

cally coded within the mapper itself by the means of built-in functions. Instead of

containing the whole graph decomposition data, their target ﬁles hold only a few

values, used as parameters by the built-in functions.

5.4.1 Decomposition-deﬁned architecture ﬁles

Decomposition-deﬁned architecture ﬁles are the standard way to describe weighted

and/or irregular target architectures. Several ﬁle formats exist, but we only present

here the most humanly readable one, which begins in “deco 0” (“deco” stands for

“decomposition-deﬁned” architecture, and “0” is the format type).

The “deco 0” header is followed by two integer numbers, which are the number

of processors and the largest terminal number used in the decomposition, respec-

tively. Two arrays follow. The ﬁrst array has as many lines as there are processors.

Each of these lines holds three numbers: the processor label, the processor weight

(that is an estimation of its computational power), and its terminal number. The

terminal number associated with every processor is obtained by giving the initial

domain holding all the processors number 1, and by numbering the two subdomains

of a given domain of number i with numbers 2i and 2i + 1. The second array is

a lower triangular diagonal-less matrix that gives the distance between all pairs of

processors. This distance matrix, combined with the decomposition tree coded by

terminal numbers, allows the evaluation by averaging of the distance between all

pairs of domains. In order for the mapper to behave properly, distances between

processors must be strictly positive numbers. Therefore, null distances are not ac-

cepted. For instance, Figure 7 shows the contents of the architecture decomposition

ﬁle for UB(2, 3), the binary de Bruijn graph of dimension 3, as computed by the

amk grf program.

deco 0

8 15

0 1 15

1 1 14

2 1 13

3 1 11

4 1 12

5 1 9

15 14

12 13

9 11

8 10

6 1 8

7 1 10

2 1

2 1 2

1 1 1 2

3 2 1 1 2

2 2 2 1 1 1

3 2 3 1 2 2 1

Figure 7: Target decomposition ﬁle for UB(2, 3). The terminal numbers associated

with every processor deﬁne a unique recursive bipartitioning of the target graph.

5.4.2 Algorithmically-coded architecture ﬁles

Almost all algorithmically-coded architectures are deﬁned with unity edge and ver-

tex weights. They start with an abbreviation name of the architecture, followed by

parameters speciﬁc to the architecture. The available built-in architecture deﬁni-

tions are listed below.

cmplt size

Deﬁnes a complete graph with size vertices. The vertex labels are numbers

between 0 and size − 1.

cmpltw size load₀load₁. . . load

size−1

Deﬁnes a weighted complete graph with size vertices. The vertex labels are

numbers between 0 and size − 1, and vertices are assigned integer weights in

the order in which these are provided.

hcub dim

Deﬁnes a binary hypercube of dimension dim. Graph vertices are numbered

according to the value of the binary representation of their coordinates in the

hypercube.

tleaf levlnbr sizeval₀linkval₀. . . sizeval

levlnbr−1

Deﬁnes a hierarchical, tree-shaped, architecture with levlnbr levels and

linkval

levlnbr−1

ꢁ

^levlnbr−1sizeval_ileaf vertices. This topology is used to model multi-stage,

i=0

NUMA ou NUIOA machines. The mapping is only computed with respect

to the leaf vertices, which represent processing elements, while the upper lev-

els of the tree model interconnection networks (intra-chip buses, inter-chip

interconnection networks, network routers, etc.), as shown in Figure 8. The

communication cost between two nodes is the cost of the highest common

ancestor level.

Figure 8: A “tree-leaf” graph of three levels. Processors are drawn in black and

routers in grey The description of this architecture is “tleaf 3 3 20 2 7 2 2”,

since it has 3 levels, the ﬁrst level has 3 sons and a traversal cost of 20, the second

level has 2 sons and a traversal cost of 7, and the third level has also 2 sons and a

traversal cost of 2.

The two additional parameters cluster and weight serve to model hetero-

geneous architectures for which multiprocessor nodes having several highly

interconnected processors (typically by means of shared memory) are linked

by means of networks of lower bandwidth. cluster represents the number

of levels to traverse, starting from the root of the leaf, before reaching the

height−cluster

multiprocessors, each multiprocessor having 2

nodes. weight is

the relative cost of extra-cluster links, that is, links in the upper levels of the

tree-leaf graph. Links within clusters are assumed to have weight 1.

When there are no clusters at all, that is, in the case of purely homogeneous

architectures, set cluster to be equal to height, and weight to 1.

mesh2D dimX dimY

Deﬁnes a bidimensional array of dimX columns by dimY rows. The vertex

with coordinates (posX, posY) has label posY × dimX + posX.

mesh3D dimX dimY dimZ

Deﬁnes a tridimensional array of dimX columns by dimY rows by dimZ lev-

els. The vertex with coordinates (posX,posY,posZ) has label (posZ × dimY +

posY) × dimX + posX.

torus2D dimX dimY

Deﬁnes a bidimensional array of dimX columns by dimY rows, with

wraparound edges. The vertex with coordinates (posX, posY) has label

posY × dimX + posX.

torus3D dimX dimY dimZ

Deﬁnes a tridimensional array of dimX columns by dimY rows by dimZ levels,

with wraparound edges. The vertex with coordinates (posX,posY,posZ ) has

label (posZ × dimY + posY) × dimX + posX.

5.4.3 Variable-sized architecture ﬁles

Variable-sized architectures are a class of algorithmically-coded architectures the

size of which is not deﬁned a priori. As for ﬁxed-size algorithmically-coded ar-

chitectures, they start with an abbreviation name of the architecture, followed by

parameters speciﬁc to the architecture. The available built-in variable-sized archi-

tecture deﬁnitions are listed below.

varcmplt

Deﬁnes a variable-sized complete graph. Domains are labeled such that the

ﬁrst domain is labeled 1, and the two subdomains of any domain i are labeled

2i and 2i + 1. The distance between any two subdomains i and j is 0 if i = j

and 1 else.

varhcub

Deﬁnes a variable-sized hypercube. Domains are labeled such that the ﬁrst

domain is labeled 1, and the two subdomains of any domain i are labeled 2i

and 2i + 1. The distance between any two domains is the Hamming distance

between the common bits of the two domains, plus half of the absolute dif-

ference between the levels of the two domains, this latter term modeling the

average distance on unknown bits. For instance, the distance between subdo-

main 9 = 1001_B, of level 3 (since its leftmost 1 has been shifted left thrice),

and subdomain 53 = 110101_B, of level 5 (since its leftmost 1 has been shifted

left ﬁve times), is 2: it is 1, which is the number of bits which diﬀer between

1101_B(that is, 53 = 110101_Bshifted rightwards twice) and 1001_B, plus 1,

which is half of the absolute diﬀerence between 5 and 3.

5.5 Mapping ﬁles

Mapping ﬁles, which usually end in “.map”, contain the result of the mapping of

source graphs onto target architectures. They associate a vertex of the target graph

with every vertex of the source graph.

Mapping ﬁles begin with the number of mapping lines which they contain, fol-

lowed by that many mapping lines. Each mapping line holds a mapping pair, made

of two integer numbers which are the label of a source graph vertex and the label

of the target graph vertex onto which it is mapped. Mapping pairs can be stored

in any order in the ﬁle; however, labels of source graph vertices must be all dif-

ferent. For example, Figure 9 shows the result obtained when mapping the source

graph of Figure 4 onto the target architecture of Figure 7. This one-to-one embed-

ding of H(3) into UB(2, 3) has dilation 1, except for one hypercube edge which has

dilation 3.

Figure 9: Mapping ﬁle obtained when mapping the hypercube source graph of

Figure 4 onto the binary de Bruijn architecture of Figure 7.

Mapping ﬁles are also used on output of the block orderer to represent the

allocation of the vertices of the original graph to the column blocks associated with

the ordering. In this case, column blocks are labeled in ascending order, such that

the number of a block is always greater than the ones of its predecessors in the

elimination process, that is, its leaves in the elimination tree.

5.6 Ordering ﬁles

Ordering ﬁles, which usually end in “.ord”, contain the result of the ordering of

source graphs or meshes that represent sparse matrices. They associate a number

with every vertex of the source graph or mesh.

The structure of ordering ﬁles is analogous to the one of mapping ﬁles; they

diﬀer only by the meaning of their data.

Ordering ﬁles begin with the number of ordering lines which they contain, that

is the number of vertices in the source graph or the number of nodes in the source

mesh, followed by that many ordering lines. Each ordering line holds an ordering

pair, made of two integer numbers which are the label of a source graph or mesh

vertex and its rank in the ordering. Ranks range from the base value of the graph or

mesh (baseval) to the base value plus the number of vertices (resp. nodes), minus

one (baseval + vertnbr − 1 for graphs, and baseval + vnodnbr − 1 for meshes).

Ordering pairs can be stored in any order in the ﬁle; however, indices of source

vertices must be all diﬀerent.

For example, Figure 10 shows the result obtained when reordering the source

graph of Figure 4.

The advantage of having both graph and mesh orderings start from baseval

(and not vnodbas in the case of meshes) is that an ordering computed on the nodal

graph of some mesh has the same structure as an ordering computed from the native

mesh structure, allowing for greater modularity. However, in memory, permutation

indices for meshes are numbered from vnodbas to vnodbas + vnodnbr − 1.

5.7 Vertex list ﬁles

Vertex lists are used by programs that select vertices from graphs.

Figure 10: Ordering ﬁle obtained when reordering the hypercube graph of Figure 4.

Vertex lists are coded as lists of integer numbers. The ﬁrst integer is the number

of vertices in the list and the other integers are the labels of the selected vertices,

given in any order. For example, Figure 11 shows the list made from three vertices

of labels 2, 45, and 7.

Figure 11: Example of vertex list with three vertices of labels 2, 45, and 7.

6 Programs

The programs of the Scotch project belong to ﬁve distinct classes.

• Graph handling programs, the names of which begin in “g”, that serve to

build and test source graphs.

• Mesh handling programs, the names of which begin in “m”, that serve to build

and test source meshes.

• Target architecture handling programs, the names of which begin in “a”,

that allow the user to build and test decomposition-deﬁned target ﬁles, and

especially to turn a source graph ﬁle into a target ﬁle.

• The mapping and ordering programs themselves.

• Output handling programs, which are the mapping performance analyzer, the

graph factorization program, and the graph, matrix, and mapping visualiza-

tion program.

The general architecture of the Scotch project is displayed in Figure 12.

6.1 Invocation

The programs comprising the Scotch project have been designed to run in

command-line mode without any interactive prompting, so that they can be called

easily from other programs by means of “system ()” or “popen ()” system calls,

or be piped together on a single shell command line. In order to facilitate this,

whenever a stream name is asked for (either on input or output), the user may

put a single “-” to indicate standard input or output. Moreover, programs read

their input in the same order as stream names are given in the command line. It

allows them to read all their data from a single stream (usually the standard input),

provided that these data are ordered properly.

External

mesh file

External

graph file

mcv

gcv

mmk_*

gmk_*

Geometry

file

amk_*

.xyz

Source

mesh file

Source

graph file

Target file

gmk_msh

amk_grf

acpl

.msh

.grf

.tgt

mtst

mord

gord

gtst

gmap

atst

Ordering

Mapping

file

.ord

.map

gotst

gout

gmtst

File

Graphics

file

Program

Data flow

Figure 12: General architecture of the Scotch project. All of the features oﬀered

by the stand-alone programs are also available in the libScotch library.

A brief on-line help is provided with all the programs. To get this help, use the

“-h” option after the program name. The case of option letters is not signiﬁcant,

except when both the lower and upper cases of a letter have diﬀerent meanings.

When passing parameters to the programs, only the order of ﬁle names is signiﬁcant;

options can be put anywhere in the command line, in any order. Examples of use

of the diﬀerent programs of the Scotch project are provided in section 9.

Error messages are standardized, but may not be fully explanatory. However,

most of the errors you may run into should be related to ﬁle formats, and located

in “...Load” routines. In this case, compare your data formats with the deﬁnitions

given in section 5, and use the gtst and mtst programs to check the consistency of

source graphs and meshes.

6.2 Using compressed ﬁles

Starting from version 5.0.6, Scotch allows users to provide and retrieve data in

compressed form. Since this feature requires that the compression and decompres-

sion tasks run in the same time as data is read or written, it can only be done

on systems which support multi-threading (Posix threads) or multi-processing (by

means of fork system calls).

To determine if a stream has to be handled in compressed form, Scotch checks

its extension. If it is “.gz” (gzip format), “.bz2” (bzip2 format) or “.lzma” (lzma

format), the stream is assumed to be compressed according to the corresponding

format. A ﬁlter task will then be used to process it accordingly if the format is

implemented in Scotch and enabled on your system.

To date, data can be read and written in bzip2 and gzip formats, and can

also be read in the lzma format. Since the compression ratio of lzma on Scotch

graphs is 30% better than the one of gzip and bzip2 (which are almost equivalent

in this case), the lzma format is a very good choice for handling very large graphs.

To see how to enable compressed data handling in Scotch, please refer to Section 8.

When the compressed format allows it, several ﬁles can be provided on

the same stream, and be uncompressed on the ﬂy.

For instance, the

command “cat brol.grf.gz brol.xyz.gz | gout -.gz -.gz -Mn - brol.iv”

concatenates the topology and geometry data of some graph brol and feed them

as a single compressed stream to the standard input of program gout, hence the

”-.gz” to indicate a compressed standard stream.

6.3 Description

6.3.1 acpl

Synopsis

acpl [input target ﬁle [output target ﬁle]] options

Description

The program acpl is the decomposition-deﬁned architecture ﬁle compiler. It

processes architecture ﬁles of type “deco 0” built by hand or by the amk *

programs, to create a “deco 1” compiled architecture ﬁle of about four times

the size of the original one; see section 5.4.1, page 23, for a detailed description

of decomposition-deﬁned target architecture ﬁle formats.

The mapper can read both original and compiled architecture ﬁle formats.

However, compiled architecture ﬁles are read much more eﬃciently, as they are

directly loaded into memory without further processing. Since the compilation

time of a target architecture graph evolves as the square of its number of

vertices, precompiling with acpl can save some time when many mappings

are to be performed onto the same large target architecture.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.2 amk *

Synopsis

amk ccc dim [output target ﬁle] options

amk fft2 dim [output target ﬁle] options

amk hy dim [output target ﬁle] options

amk m2 dimX [dimY [output target ﬁle]] options

amk p2 weight0 [weight1 [output target ﬁle]] options

Description

The amk * programs make target graphs. Each of them is devoted to a

speciﬁc topology, for which it builds target graphs of any dimension.

These programs are an alternate way between algorithmically-coded built-in

target architectures and decompositions computed by mapping with amk grf.

Like built-in target architectures, their decompositions are algorithmically

computed, and like amk grf, their output is a decomposition-deﬁned target

architecture ﬁle. These programs allow the deﬁnition and testing of new

algorithmically-coded target architectures without coding them in the core of

the mapper.

Program amk ccc outputs the target architecture ﬁle of a Cube-Connected-

Cycles graph of dimension dim.

Vertex (l, m) of CCC(dim), with

0 ≤ l < dim and 0 ≤ m < 2^dim, is linked to vertices ((l − 1) mod dim, m),

((l + 1) mod dim, m), and (l, m ⊕ 2^l), and is labeled l × 2^dim+ m. ⊕ denotes

the bitwise exclusive-or binary operator, and a mod b the integer remainder

of the euclidian division of a by b.

Program amk fft2 outputs the target architecture ﬁle of a binary Fast-

Fourier-Transform graph of dimension dim. Vertex (l, m) of FFT(dim),

with 0 ≤ l ≤ dim and 0 ≤ m < 2^dim, is linked to vertices (l − 1, m),

(l − 1, m mod 2^l−1), (l + 1, m), and (l + 1, m ⊕ 2^l), if they exist, and is labeled

l × 2^dim+ m.

Program amk hy outputs the target architecture ﬁle of a hypercube graph

of dimension dim. Vertices are labeled according to the decimal value of

their binary representation. The decomposition-deﬁned target architectures

computed by amk hy do not exactly give the same results as the built-in

hypercube targets because distances are not computed in the same manner,

although the two recursive bipartitionings are identical. To achieve best

performance and save space, use the built-in architecture.

Program amk p2 outputs the target architecture ﬁle of a weighted path graph

with two vertices, the weights of which are given as parameters.

This simple target topology is used to bipartition a source graph into two

weighted parts with as few cut edges as possible. In particular, it is used

to compute independent partitions of the processors of a multi-user parallel

machine. As a matter of fact, if the yet unallocated part of the machine is

represented by a source graph with n vertices, and n^′processors are requested

by a user in order to run a job (with n^′≤ n), mapping the source graph onto

the weighted path graph with two vertices of weights n^′and n − n^′leads to

a partition of the machine in which the allocated n^′processors should be as

densely connected as possible (see Figure 13).

a. Construction of a partition with 13 b. Construction of a partition with

vertices (in black) on a 8 × 8 bidimen- 17 vertices (in black) on the remaining

sional mesh architecture.

architecture.

Figure 13: Construction of partitions on a bidimensional 8 × 8 mesh architecture

by weighted bipartitioning.

Options

-h Display the program synopsis.

-mmethod

Select the bipartitioning method (for amk m2 only).

Nested dissection.

Dimension-per-dimension one-way dissection. This is less eﬃcient

than nested dissection, and this feature exists only for benchmarking

purposes.

-V Print the program version and copyright.

6.3.3 amk grf

Synopsis

amk grf [input graph ﬁle [output target ﬁle]] options

Description

The program amk grf turns a source graph ﬁle into a decomposition-deﬁned

target ﬁle. It computes a recursive bipartitioning of the source graph, as well

as the array of distances between all pairs of its vertices, both of which are

combined to give a decomposition-deﬁned target architecture of same topology

as the input source graph.

The -l option restricts the target architecture to the vertices indicated in the

given vertex list ﬁle. It is therefore possible to build a target architecture

made of several disconnected parts of a bigger architecture. Note that this is

not equivalent to turning a disconnected source graph into a target architec-

ture, since doing so would lead to an architecture made of several independent

pieces at inﬁnite distance one from another. Considering the selected vertices

within their original architecture makes it possible to compute the distance

between vertices belonging to distinct connected components, and therefore

to evaluate the cost of the mapping of two neighbor processes onto disjoint

areas of the architecture.

The restriction feature is very useful in the context of multi-user parallel ma-

chines. On these machines, when users request processors in order to run their

jobs, the partitions allocated by the operating system may not be regular nor

connected, because of existing partitions already attributed to other people.

By feeding amk grf with the source graph representing the whole parallel ma-

chine, and the vertex list containing the labels of the processors allocated by

the operating system, it is possible to build a target architecture correspond-

ing to this partition, and therefore to map processes on it, automatically,

regardless of the partition shape.

The -b option selects the recursive bipartitioning strategy used to build the

decomposition of the source graph. For regular, unweighted, topologies, the

’-b(g|h)fx’ recursive bipartitioning strategy should work best. For irregular

or weighted graphs, use the default strategy, which is more ﬂexible. See

also the manual page of function SCOTCH archBuild, page 72, for further

information.

Options

-bstrategy

Use recursive bipartitioning strategy strategy to build the decomposi-

tion of the architecture graph. The format of bipartitioning strategies is

deﬁned within section 7.3.2, at page 59.

-h Display the program synopsis.

-linput vertex ﬁle

Load vertex list from input vertex ﬁle. As for all other ﬁle names, “-”

may be used to indicate standard input.

-V Print the program version and copyright.

6.3.4 atst

Synopsis

atst [input target ﬁle [output data ﬁle]] options

Description

The program atst is the architecture tester. It gives some statistics on

decomposition-deﬁned target architectures, and in particular the minimum,

maximum, and average communication costs (that is, weighted distance) be-

tween all pairs of processors.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.5 gcv

Synopsis

gcv [input graph ﬁle [output graph ﬁle [output geometry ﬁle]]] options

Description

The program gcv is the source graph converter. It takes on input a graph

ﬁle of the format speciﬁed with the -i option, and outputs its equivalent

in the format speciﬁed with the -o option, along with its associated geom-

etry ﬁle whenever geometry data is available. At the time being, it accepts

four input formats: the Matrix Market format [5], the Harwell-Boeing col-

lection format [10], the Chaco/MeTiS graph format [24], and the Scotch

format. Three output format are available: the Matrix Market format, the

Chaco/MeTiS graph format and the Scotch source graph and geometry

data format.

Options

-h Display the program synopsis.

-iformat

Specify the type of input graph. The available input formats are listed

below.

b[number]

Harwell-Boeing graph collection format. Only symmetric assembled

matrices are currently supported. Since ﬁles in this format can con-

tain several graphs one after another, the optional integer number,

starting from 0, indicates which graph of the ﬁle is considered for

conversion.

Chaco v1.0/MeTiS format.

The Matrix Market format.

Scotch source graph format.

-oformat

Specify the output graph format. The available output formats are listed

below.

Chaco v1.0/MeTiS format.

The Matrix Market format.

Scotch source graph format.

-V Print the program version and copyright.

Default option set is “-Ib0 -Os”.

6.3.6 gmap / gpart

Synopsis

gmap [input graph ﬁle [input target ﬁle [output mapping ﬁle [output log ﬁle]]]]

options

gpart number of parts [input graph ﬁle [output mapping ﬁle [output log ﬁle]]]

options

Description

The program gmap is the graph mapper. It uses a partitioning strategy to

map a source graph onto a target graph, so that the weight of source graph

vertices allocated to target vertices is balanced, and the communication cost

function f_Cis minimized.

The implemented mapping methods mainly derive from graph theory.

In particular, graph geometry is never used, even if it is available; only

topological properties are taken into account. Mapping methods are used to

deﬁne mapping strategies by means of selection, combination, grouping, and

condition operators.

The only mapping method implemented in version 5.1 is the Dual Recursive

Bipartitioning algorithm, which uses graph bipartitioning methods. Avail-

able bipartitioning methods include a multi-level algorithm that uses other

bipartitioning methods to compute the initial and reﬁned bipartitions, an

improved implementation of the Fiduccia–Mattheyses heuristic designed to

handle weighted graphs, a greedy method derived from the Gibbs, Poole, and

Stockmeyer algorithm, the greedy graph growing heuristic, and a greedy “ex-

actifying” reﬁnement algorithm designed to balance vertex loads as much as

possible; random and backtracking methods are also provided.

gpart is a simpliﬁed interface to gmap, which performs graph partitioning

instead of static mapping. Consequently, the desired number of parts has to

be provided, in lieu of the target architecture.

The -b and -c options allow the user to set preferences on the behavior of the

mapping strategy which is used by default. The -m option allows the user to

deﬁne a custom mapping strategy.

If mapping statistics are wanted rather than the mapping output itself, map-

ping output can be set to /dev/null, with option -vmt to get mapping statis-

tics and timings.

Options

Since the program is devoted to experimental studies, it has many optional

parameters, used to test various execution modes. Values set by default will

give best results in most cases.

-brat

Set the maximum load imbalance ratio to rat, which should be a value

comprised between 0 and 1. This option can be used in conjunction with

option -c, but is incompatible with option -m.

-cﬂags

Tune the default mapping strategy according to the given preference

ﬂags. Some of these ﬂags are antagonistic, while others can be combined.

See Section 7.3.1 for more information. The currently available ﬂags are

the following.

Enforce load balance as much as possible.

Privilege quality over speed. This is the default behavior.

Privilege speed over quality.

Use only safe methods in the strategy.

This option can be used in conjunction with option -b, but is incompat-

ible with option -m. The resulting strategy string can be displayed by

means of the -vs option.

-h Display the program synopsis.

-mstrat

Apply mapping strategy strat. The format of mapping strategies is de-

ﬁned in section 7.3.2. This option is incompatible with options -b and

-c.

-sobj

Mask source edge and vertex weights. This option allows the user to “un-

weight” weighted source graphs by removing weights from edges and ver-

tices at loading time. obj may contain several of the following switches.

Remove edge weights, if any.

Remove vertex weights, if any.

-V Print the program version and copyright.

-vverb

Set verbose mode to verb, which may contain several of the following

switches. For a detailed description of the data displayed, please refer to

the manual page of gmtst below.

Mapping information.

Strategy information. This parameter displays the mapping strategy

which will be used by gmap.

Timing information.

-V Print the program version and copyright.

6.3.7 gmk *

Synopsis

gmk hy dim [output graph ﬁle] options

gmk m2 dimX [dimY [output graph ﬁle]] options

gmk m3 dimX [dimY [dimZ [output graph ﬁle]]] options

gmk ub2 dim [output graph ﬁle] options

Description

The gmk * programs make source graphs. Each of them is devoted to a

speciﬁc topology, for which it builds target graphs of any dimension.

The gmk * programs are mainly used in conjunction with amk grf. Most

gmk * programs build source graphs describing parallel machines, which

are used by amk grf to generate corresponding target sub-architectures,

by means of its -l option. Such a procedure is shown in section 9, which

builds a target architecture from ﬁve vertices of a binary de Bruijn graph of

dimension 3.

Program gmk hy outputs the source ﬁle of a hypercube graph of dimension

dim. Vertices are labeled according to the decimal value of their binary

representation.

Program gmk m2 outputs the source ﬁle of a bidimensional mesh with dimX

columns and dimY rows. If the -t option is set, tori are built instead of

meshes. The vertex of coordinates (posX, posY ) is labeled posY×dimX+posX.

Program gmk m3 outputs the source ﬁle of a tridimensional mesh with dimZ

layers of dimY rows by dimX columns. If the -t option is set, tori are

built instead of meshes. The vertex of coordinates (posX, posY ) is labeled

(posZ × dimY + posY) × dimX + posX.

Program gmk ub2 outputs the source ﬁle of a binary unoriented de Bruijn

graph of dimension dim. Vertices are labeled according to the decimal value

of their binary representation.

Options

-goutput geometry ﬁle

Output graph geometry to ﬁle output geometry ﬁle (for gmk m2 only). As

for all other ﬁle names, “-” may be used to indicate standard output.

-h Display the program synopsis.

-t Build a torus rather than a mesh (for gmk m2 only).

-V Print the program version and copyright.

6.3.8 gmk msh

Synopsis

gmk msh [input mesh ﬁle [output graph ﬁle]] options

Description

The gmk msh program builds a graph ﬁle from a mesh ﬁle. All of the nodes

of the mesh are turned into graph vertices, and edges are created between

all pairs of vertices that share an element (that is, elements are turned into

cliques).

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.9 gmtst

Synopsis

gmtst [input graph ﬁle [input target ﬁle [input mapping ﬁle [output data

ﬁle]]]] options

Description

The program gmtst is the graph mapping tester. It outputs some statistics

on the given mapping, regarding load balance and inter-processor communi-

cation.

The two ﬁrst statistics lines deal with process mapping statistics, while the

following ones deal with communication statistics. The ﬁrst mapping line

gives the number of processors used by the mapping, followed by the number

of processors available in the architecture, and the ratio of these two numbers,

written between parentheses. The second mapping line gives the minimum,

maximum, and average loads of the processors, followed by the variance of the

load distribution, and an imbalance ratio equal to the maximum load over the

average load. The ﬁrst communication line gives the minimum and maximum

number of neighbors over all blocks of the mapping, followed by the sum of the

number of neighbors over all blocks of the mapping, that is the total number

of messages that have to be sent to exchange data between all neighboring

blocks. The second communication line gives the average dilation of the edges,

followed by the sum of all edge dilations. The third communication line gives

the average expansion of the edges, followed by the value of function f_C. The

fourth communication line gives the average cut of the edges, followed by the

number of cut edges. The ﬁfth communication line shows the ratio of the aver-

age expansion over the average dilation; it is smaller than 1 when the mapper

succeeds in putting heavily intercommunicating processes closer to each other

than it does for lightly communicating processes; it is equal to 1 if all edges

have the same weight. The remaining lines form a distance histogram, which

shows the amount of communication load that involves processors located at

increasing distances.

gmtst allows the testing of cross-architecture mappings. By inputing it a

target architecture diﬀerent from the one that has been used to compute the

mapping, but with compatible vertex labels, one can see what the mapping

would yield on this new target architecture.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.10 gord

Synopsis

gord [input graph ﬁle [output ordering ﬁle [output log ﬁle]]] options

Description

The gord program is the block sparse matrix graph orderer. It uses an or-

dering strategy to compute block orderings of sparse matrices represented as

source graphs, whose vertex weights indicate the number of DOFs per node (if

this number is non homogeneous) and whose edges are unweighted, in order

to minimize ﬁll-in and operation count.

Since its main purpose is to provide orderings that exhibit high concurrency

for parallel block factorization, it comprises a nested dissection method [17],

but classical [39] and state-of-the-art [1, 47] minimum degree algorithms are

implemented as well. Ordering methods are used to deﬁne ordering strategies

by means of selection, grouping, and condition operators.

For the nested dissection method, vertex separation methods comprise algo-

rithms that directly compute vertex separators, as well as methods that build

vertex separators from edge separators, i.e. graph bipartitions (all of the graph

bipartitioning methods available in the static mapper gmap can be used in this

latter case).

The -o option allows the user to deﬁne the ordering strategy. The -c option

allows the user to set preferences on the behavior of the ordering strategy

which is used by default.

When the graphs to order are very large, the same results can be obtained by

using the dgord parallel program of the PT-Scotch distribution, which can

read centralized graph ﬁles too.

Options

Since the program is devoted to experimental studies, it has many optional

parameters, used to test various execution modes. Values set by default will

give best results in most cases.

-cﬂags

Tune the default ordering strategy according to the given preference ﬂags.

Some of these ﬂags are antagonistic, while others can be combined. See

Section 7.3.1 for more information. The resulting strategy string can be

displayed by means of the -vs option.

Enforce load balance as much as possible.

Privilege quality over speed. This is the default behavior.

Privilege speed over quality.

Use only safe methods in the strategy.

-h Display the program synopsis.

-moutput mapping ﬁle

Write to output mapping ﬁle the mapping of graph vertices to column

blocks. All of the separators and leaves produced by the nested dissection

method are considered as distinct column blocks, which may be in turn

split by the ordering methods that are applied to them. Distinct integer

numbers are associated with each of the column blocks, such that the

number of a block is always greater than the ones of its predecessors in

the elimination process, that is, its descendants in the elimination tree.

The structure of mapping ﬁles is given in section 5.5.

When the geometry of the graph is available, this mapping ﬁle may be

processed by program gout to display the vertex separators and super-

variable amalgamations that have been computed.

-ostrat

Apply ordering strategy strat. The format of ordering strategies is deﬁned

in section 7.3.4.

-toutput tree ﬁle

Write to output tree ﬁle the structure of the separator tree. The data

that is written resembles much the one of a mapping ﬁle: after a ﬁrst

line that contains the number of lines to follow, there are that many lines

of mapping pairs, which associate an integer number with every graph

vertex index. This integer number is the number of the column block

which is the parent of the column block to which the vertex belongs,

or −1 if the column block to which the vertex belongs is a root of the

separator tree (there can be several roots, if the graph is disconnected).

Combined to the column block mapping data produced by option -m, the

tree structure allows one to rebuild the separator tree.

-V Print the program version and copyright.

-vverb

Set verbose mode to verb, which may contain several of the following

switches.

Strategy information. This parameter displays the ordering strategy

which will be used by gord.

Timing information.

6.3.11 gotst

Synopsis

gotst [input graph ﬁle [input ordering ﬁle [output data ﬁle]]] options

Description

The program gotst is the ordering tester. It gives some statistics on orderings,

including the number of non-zeros and the operation count of the factored

matrix, as well as statistics regarding the elimination tree. Since it performs

the factorization of the reordered matrix, it can take a very long time and

consume a large amount of memory when applied to large graphs.

The ﬁrst two statistics lines deal with the elimination tree. The ﬁrst one

displays the number of leaves, while the second shows the minimum height of

the tree (that is, the length of the shortest path from any leaf to the –or a–

root node), its maximum height, its average height, and the variance of the

heights with respect to the average. The third line displays the number of non-

zero terms in the factored matrix, the amount of index data that is necessary

to maintain the block structure of the factored matrix, and the number of

operations required to factor the matrix by means of Cholesky factorization.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.12 gout

Synopsis

gout [input graph ﬁle [input geometry ﬁle [input mapping ﬁle [output

visualization ﬁle]]]] options

Description

The gout program is the graph, matrix, and mapping viewer program. It takes

on input a source graph, its geometry ﬁle, and optionally a mapping result ﬁle,

and produces a ﬁle suitable for display. At the time being, gout can gener-

ate plain and encapsulated PostScript ﬁles for the display of adjacency matrix

patterns and the display of planar graphs (although tridimensional objects can

be displayed by means of isometric projection, the display of tridimensional

mappings is not eﬃcient), and Open Inventor ﬁles [40] for the interactive

visualization of tridimensional graphs.

In the case of mapping display, the number of mapping pairs contained in the

input mapping ﬁle may diﬀer from the number of vertices of the input source

graph; only mapping pairs the source labels of which match labels of source

graph vertices will be taken into account for display. This feature allows the

user to show the result of the mapping of a subgraph drawn on the whole

graph, or else to outline the most important aspects of a mapping by restrict-

ing the display to a limited portion of the graph. For example, Figure 14.b

shows how the result of the mapping of a subgraph of the bidimensional mesh

M₂(4, 4) onto the complete graph K(2) can be displayed on the whole M₂(4, 4)

graph, and Figure 14.c shows how the display of the same mapping can be

restricted to a subgraph of the original graph.

Options

-gparameters

Geometry parameters.

Do not read geometry data. This option can be used in conjunction

with option -om to avoid reading the geometry ﬁle when displaying

the pattern of the adjacency matrix associated with the source graph,

since geometry data are not needed in this case. If this option is set,

the geometry ﬁle is not read. However, if an output visualization ﬁle

name is given in the command line, dummy input geometry ﬁle and

input mapping ﬁle names must be speciﬁed so that the ﬁle argument

count is correct. In this case, use the “-” parameter to take standard

input as a dummy geometry input stream. In practice, the -om and

-gn options always imply the -mn option.

For bidimensional geometry only, rotate geometry data by 90 de-

grees, counter-clockwise.

-h Display the program synopsis.

-mn Do not read mapping data, and display the graph without any mapping

information. If this option is set, the mapping ﬁle is not read. However, if

an output visualization ﬁle name is given in the command line, a dummy

input mapping ﬁle name must be speciﬁed so that the ﬁle argument count

is correct. In this case, use the “-” parameter to take standard input as

a dummy mapping input stream.

a. A subgraph of M₂(4, 4) to

be mapped onto K(2).

b. Mapping result displayed

on the full M₂(4, 4) graph.

c. Mapping result dis-

played on another subgraph

of M₂(4, 4).

Figure 14: PostScript diplay of a single mapping ﬁle with diﬀerent subgraphs of the

same source graph. Vertices covered with disks of the same color are mapped onto

the same processor.

-oformat[{parameters}]

Specify the type of output, with optional parameters within curly braces

and separated by commas. The output formats are listed below.

Output the graph in SGI’s Open Inventor format, in ASCII mode,

suitable for display by the ivview program [40]. The optional pa-

rameters are given below.

Color output, using 16 diﬀerent colors. Opposite of g.

Grey-level output, using 8 diﬀerent levels. Opposite of c.

Remove cut edges. Edges the ends of which are mapped onto

diﬀerent processors are not displayed. Opposite of v.

View cut edges. All graph edges are displayed. Opposite of r.

Output the pattern of the adjacency matrix associated with the

source graph, in Adobe’s PostScript format. The optional parame-

ters are given below.

Encapsulated PostScript output, suitable for LT X use with

epsf. Opposite of f.

Full-page PostScript output, suitable for direct printing. Oppo-

site of e.

Output the graph in Adobe’s PostScript format. The optional pa-

rameters are given below.

Figure 15: Snapshot of an Open Inventor display of a sphere partitioned into 7

almost equal pieces by mapping onto the complete graph with 7 vertices. Vertices

of same color are mapped onto the same processor.

Avoid displaying the mapping disks. Opposite of d.

Color PostScript output, using 16 diﬀerent colors. Opposite of

Display the mapping disks. Opposite of a.

Encapsulated PostScript output, suitable for LT X use with

epsf. Opposite of f.

Full-page PostScript output, suitable for direct printing. Oppo-

site of e.

Grey-level PostScript output. Opposite of c.

Large clipping. Mapping disks are included in the clipping area

computation. Opposite of s.

Remove cut edges. Edges the ends of which are mapped onto

diﬀerent processors are not displayed. Opposite of v.

Small clipping. Mapping disks are excluded from the clipping

area computation. Opposite of l.

View cut edges. All graph edges are displayed. Opposite of r.

x=val

Minimum X relative clipping position (in [0.0;1.0]).

X=val

Maximum X relative clipping position (in [0.0;1.0]).

y=val

Minimum Y relative clipping position (in [0.0;1.0]).

Y=val

Maximum Y relative clipping position (in [0.0;1.0]).

-V Print the program version and copyright.

Default option set is “-Oi{v}”.

6.3.13 gtst

Synopsis

gtst [input graph ﬁle [output data ﬁle]] options

Description

The program gtst is the source graph tester. It checks the consistency of

the input source graph structure (matching of arcs, number of vertices and

edges, etc.), and gives some statistics regarding edge weights, vertex weights,

and vertex degrees.

When the graphs to test are very large, the same results can be obtained by

using the dgtst parallel program of the PT-Scotch distribution, which can

read centralized graph ﬁles too.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.14 mcv

Synopsis

mcv [input mesh ﬁle [output mesh ﬁle [output geometry ﬁle]]] options

Description

The program mcv is the source mesh converter. It takes on input a mesh ﬁle

of the format speciﬁed with the -i option, and outputs its equivalent in the

format speciﬁed with the -o option, along with its associated geometry ﬁle

whenever geometrical data is available. At the time being, it only accepts one

external input format: the Harwell-Boeing format [10], for square elemental

matrices only. The only output format to date is the Scotch source mesh

and geometry data format.

Options

-h Display the program synopsis.

-iformat

Specify the type of input mesh. The available input formats are listed

below.

b[number]

Harwell-Boeing mesh collection format. Only symmetric elemental

matrices are currently supported. Since ﬁles in this format can con-

tain several meshes one after another, the optional integer number,

starting from 0, indicates which mesh of the ﬁle is considered for

conversion.

Scotch source mesh format.

-oformat

Specify the output graph format. The available output formats are listed

below.

Scotch source graph format.

-V Print the program version and copyright.

Default option set is “-Ib0 -Os”.

6.3.15 mmk *

Synopsis

mmk m2 dimX [dimY [output mesh ﬁle]] options

mmk m3 dimX [dimY [dimZ [output mesh ﬁle]]] options

Description

The mmk * programs make source meshes.

Program mmk m2 outputs the source ﬁle of a bidimensional mesh with

dimX × dimY elements and (dimX + 1) × (dimY + 1) nodes. The element

of coordinates (posX, posY ) is labeled posY × dimX + posX.

Program mmk m3 outputs the source ﬁle of a tridimensional mesh with

dimX × dimY × dimZ elements and (dimX + 1) × (dimY + 1) × (dimZ + 1)

nodes.

Options

-goutput geometry ﬁle

Output mesh geometry to ﬁle output geometry ﬁle (for mmk m2 only). As

for all other ﬁle names, “-” may be used to indicate standard output.

-h Display the program synopsis.

-V Print the program version and copyright.

6.3.16 mord

Synopsis

mord [input mesh ﬁle [output ordering ﬁle [output log ﬁle]]] options

Description

The mord program is the block sparse matrix mesh orderer. It uses an ordering

strategy to compute block orderings of sparse matrices represented as source

meshes, whose node vertex weights indicate the number of DOFs per node (if

this number is non homogeneous), in order to minimize ﬁll-in and operation

count.

Since its main purpose is to provide orderings that exhibit high concurrency

for parallel block factorization, it comprises a nested dissection method [17],

but classical [39] and state-of-the-art [1, 47] minimum degree algorithms are

implemented as well. Ordering methods are used to deﬁne ordering strategies

by means of selection, grouping, and condition operators.

The -o option allows the user to deﬁne the ordering strategy. The -c option

allows the user to set preferences on the behavior of the ordering strategy

which is used by default.

Options

Since the program is devoted to experimental studies, it has many optional

parameters, used to test various execution modes. Values set by default will

give best results in most cases.

-cﬂags

Tune the default ordering strategy according to the given preference ﬂags.

Some of these ﬂags are antagonistic, while others can be combined. See

Section 7.3.1 for more information. The resulting strategy string can be

displayed by means of the -vs option.

Enforce load balance as much as possible.

Privilege quality over speed. This is the default behavior.

Privilege speed over quality.

Use only safe methods in the strategy.

-h Display the program synopsis.

-moutput mapping ﬁle

Write to output mapping ﬁle the mapping of mesh node vertices to col-

umn blocks. All of the separators and leaves produced by the nested

dissection method are considered as distinct column blocks, which may

be in turn split by the ordering methods that are applied to them. Dis-

tinct integer numbers are associated with each of the column blocks, such

that the number of a block is always greater than the ones of its prede-

cessors in the elimination process, that is, its leaves in the elimination

tree. The structure of mapping ﬁles is given in section 5.5.

When the coordinates of the node vertices are available, the mapping

ﬁle may be processed by program gout, along with the graph structure

that can be created from the source mesh ﬁle by means of the gmk

msh program, to display the node vertex separators and supervariable

amalgamations that have been computed.

-ostrat

Apply ordering strategy strat. The format of ordering strategies is deﬁned

in section 7.3.4.

-toutput tree ﬁle

Write to output tree ﬁle the structure of the separator tree. The data

that is written resembles much the one of a mapping ﬁle: after a ﬁrst

line that contains the number of lines to follow, there are that many

lines of mapping pairs, which associate an integer number with every

node vertex index. This integer number is the number of the column

block which is the parent of the column block to which the node vertex

belongs, or −1 if the column block to which the node vertex belongs is

a root of the separator tree (there can be several roots, if the mesh is

disconnected).

Combined to the column block mapping data produced by option -m, the

tree structure allows one to rebuild the separator tree.

-V Print the program version and copyright.

-vverb

Set verbose mode to verb, which may contain several of the following

switches.

Strategy information. This parameter displays the default ordering

strategy used by mord.

Timing information.

6.3.17 mtst

Synopsis

mtst [input mesh ﬁle [output data ﬁle]] options

Description

The program mtst is the source mesh tester. It checks the consistency of the

input source mesh structure (matching of arcs that link elements to nodes

and nodes to elements, number of elements, nodes, and edges, etc.), and gives

some statistics regarding element and node weights, edge weights, and element

and node degrees.

Options

-h Display the program synopsis.

-V Print the program version and copyright.

7 Library

All of the features provided by the programs of the Scotch distribution may be

directly accessed by calling the appropriate functions of the libScotch library,

archived in ﬁles libscotch.a and libscotcherr.a. These routines belong to six

distinct classes:

• source graph and source mesh handling routines, which serve to declare, build,

load, save, and check the consistency of source graphs and meshes, along with

their geometry data;

• target architecture handling routines, which allow the user to declare, build,

load, and save target architectures;

• strategy handling routines, which allow the user to declare and build mapping

and ordering strategies;

• mapping routines, which serve to declare, compute, and save mappings of

source graphs to target architectures by means of mapping strategies;

• ordering routines, which allow the user to declare, compute, and save orderings

of source graphs and meshes;

• error handling routines, which allow the user either to provide his own error

servicing routines, or to use the default routines provided in the libScotch

distribution.

A MeTiS compatibility library, called libscotchmetis.a, is also available. It

allows users who were previously using MeTiS in their software to take advantage of

the eﬃcieny of Scotch without having to modify their code. The services provided

by this library are described in Section 7.14.

7.1 Calling the routines of libScotch

7.1.1 Calling from C

All of the C routines of the libScotch library are preﬁxed with “SCOTCH ”. The

remainder of the function names is made of the name of the type of object to which

the functions apply (e.g. “graph”, “mesh”, “arch”, “map”, etc.), followed by the

type of action performed on this object: “Init” for the initialization of the object,

“Exit” for the freeing of its internal structures, “Load” for loading the object from

a stream, and so on.

Typically, functions that return an error code return zero if the function suc-

ceeds, and a non-zero value in case of error.

For instance, the SCOTCH graphInit and SCOTCH graphLoad routines, described

in sections 7.5.1 and 7.5.4, respectively, can be called from C by using the following

code.

#include <stdio.h>

#include "scotch.h"

...

SCOTCH_Graph

FILE *

grafdat;

fileptr;

if (SCOTCH_graphInit (&grafdat) != 0) {

... /* Error handling */

}

if ((fileptr = fopen ("brol.grf", "r")) == NULL) {

... /* Error handling */

}

if (SCOTCH_graphLoad (&grafdat, fileptr, -1, 0) != 0) {

... /* Error handling */

}

...

Since “scotch.h” uses several system objects which are declared in “stdio.h”,

this latter ﬁle must be included beforehand in your application code.

Although the “scotch.h” and “ptscotch.h” ﬁles may look very similar on your

system, never mistake them, and always use the “scotch.h” ﬁle as the include ﬁle

for compiling a program which uses only the sequential routines of the libScotch

library.

7.1.2 Calling from Fortran

The routines of the libScotch library can also be called from Fortran. For any C

function named SCOTCH typeAction() which is documented in this manual, there

exists a SCOTCHFTYPEACTION () Fortran counterpart, in which the separating

underscore character is replaced by an “F”. In most cases, the Fortran routines

have exactly the same parameters as the C functions, save for an added trailing

INTEGER argument to store the return value yielded by the function when the

return type of the C function is not void.

Since all the data structures used in libScotch are opaque, equivalent dec-

larations for these structures must be provided in Fortran. These structures must

therefore be deﬁned as arrays of DOUBLEPRECISIONs, of sizes given in ﬁle scotchf.h,

which must be included whenever necessary.

For routines which read or write data using a FILE * stream in C, the Fortran

counterpart uses an INTEGER parameter which is the numer of the Unix ﬁle descrip-

tor corresponding to the logical unit from which to read or write. In most Unix

implementations of Fortran, standard descriptors 0 for standard input (logical unit

5), 1 for standard output (logical unit 6) and 2 for standard error are opened by

default. However, for ﬁles which are opened using OPEN statements, an additional

function must be used to obtain the number of the Unix ﬁle descriptor from the

number of the logical unit. This function is called PXFFILENO in the normalized

POSIX Fortran API, and ﬁles which use it should include the USE IFPOSIX direc-

tive whenever necessary. An alternate, non normalized, function also exists in most

Unix implementations of Fortran, and is called FNUM.

For instance, the SCOTCH graphInit and SCOTCH graphLoad routines, described

in sections 7.5.1 and 7.5.4, respectively, can be called from Fortran by using the

following code.

INCLUDE "scotchf.h"

DOUBLEPRECISION GRAFDAT(SCOTCH_GRAPHDIM)

INTEGER RETVAL

...

CALL SCOTCHFGRAPHINIT (GRAFDAT (1), RETVAL)

IF (RETVAL .NE. 0) THEN

...

OPEN (10, FILE=’brol.grf’)

CALL SCOTCHFGRAPHLOAD (GRAFDAT (1), FNUM (10), 1, 0, RETVAL)

CLOSE (10)

IF (RETVAL .NE. 0) THEN

...

Although the “scotchf.h” and “ptscotchf.h” ﬁles may look very similar on

your system, never mistake them, and always use the “scotchf.h” ﬁle as the in-

clude ﬁle for compiling a program which uses only the sequential routines of the

libScotch library.

7.1.3 Compiling and linking

The compilation of C or Fortran routines which use routines of the libScotch

library requires that either scotch.h or scotchf.h be included, respectively.

The routines of the libScotch library are grouped in a library ﬁle called

libscotch.a. Default error routines that print an error message and exit are pro-

vided in library ﬁle libscotcherr.a.

Therefore, the linking of applications that make use of the libScotch li-

brary with standard error handling is carried out by using the following options:

“-lscotch -lscotcherr -lm”. If you want to handle errors by yourself, you

should not link with library ﬁle libscotcherr.a, but rather provide a SCOTCH

errorPrint() routine. Please refer to section 7.12 for more information.

7.1.4 Machine word size issues

Graph indices are represented in Scotch as integer values of type SCOTCH Num. By

default, this type equates to the int C type, that is, an integer type of size equal

to the one of the machine word. However, it can represent any other integer type.

Indeed, the size of the SCOTCH Num integer type can be coerced to 32 or 64 bits

by using the “-DINTSIZE32” or “-DINTSIZE64” compilation ﬂags, respectively, or

else by using the “-DINT=” deﬁnition (see Section 8.3 for more information on the

setting of these compilation ﬂags).

Consequently, the C interface of Scotch uses two types of integers. Graph-

related quantities are passed as SCOTCH Nums, while system-related values such as

ﬁle handles, as well as return values of libScotch routines, are always passed as

ints.

Because of the variability of library integer type sizes, one must be careful

when using the Fortran interface of Scotch, as it does not provide any proto-

typing information, and consequently cannot produce any warning at link time.

In the manual pages of the libScotch routines, Fortran prototypes are written

using three types of INTEGERs. As for the C interface, the regular INTEGER type

is used for system-based values, such as ﬁle handles and MPI communicators, as

well as for return values of the libScotch routines, while the INTEGER*num type

should be used for all graph-related values, in accordance to the size of the SCOTCH

Num type, as set by the “-DINTSIZEx” compilation ﬂags. Also, the INTEGER*idx

type represents an integer type of a size equivalent to the one of a SCOTCH Idx, as

set by the “-DIDXSIZEx” compilation ﬂags. Values of this type are used in the For-

tran interface to represent arbitrary array indices which can span across the whole

address space, and consequently deserve special treatment.

In practice, when Scotch is compiled on a 32-bit architecture so as

to use 64-bit SCOTCH Nums, graph indices should be declared as INTEGER*8,

while error return values should still be declared as plain INTEGER (that is,

INTEGER*4) values. On a 32 64-bit architecture, irrespective of whether SCOTCH

Nums are deﬁned as INTEGER*4 or INTEGER*8 quantities, the SCOTCH Idx type

should always be deﬁned as a 64-bit quantity, that is, an INTEGER*8, because

it stores diﬀerences between memory addresses, which are represented by 64-bit

values. The above is no longer a problem if Scotch is compiled such that ints

equate 64-bit integers. In this case, there is no need to use any type coercing

deﬁnition.

Also, the MeTiS compatibility library provided by Scotch will not work when

SCOTCH Nums are not ints, since the interface of MeTiS uses regular ints to represent

graph indices. In addition to compile-time warnings, an error message will be issued

when one of these routines is called.

7.2 Data formats

All of the data used in the libScotch interface are of integer type SCOTCH Num.

To hide the internals of Scotch to callers, all of the data structures are opaque,

that is, declared within scotch.h as dummy arrays of double precision values, for

the sake of data alignment. Accessor routines, the names of which end in “Size”

and “Data”, allow callers to retrieve information from opaque structures.

In all of the following, whenever arrays are deﬁned, passed, and accessed, it

is assumed that the ﬁrst element of these arrays is always labeled as baseval,

whether baseval is set to 0 (for C-style arrays) or 1 (for Fortran-style arrays).

Scotch internally manages with base values and array pointers so as to process

these arrays accordingly.

7.2.1 Architecture format

Target architecture structures are completely opaque. The only way to describe an

architecture is by means of a graph passed to the SCOTCH archBuild routine.

7.2.2 Graph format

Source graphs are described by means of adjacency lists. The description of a graph

requires several SCOTCH Num scalars and arrays, as shown in Figures 16 and 17. They

have the following meaning:

baseval

Base value for all array indexings.

vertnbr

Number of vertices in graph.

edgenbr

Number of arcs in graph. Since edges are represented by both of their ends,

the number of edge data in the graph is twice the number of graph edges.

verttab

Array of start indices in edgetab of vertex adjacency sub-arrays.

vendtab

Array of after-last indices in edgetab of vertex adjacency sub-arrays. For any

vertex i, with baseval ≤ i < (baseval+vertnbr), vendtab[i]−verttab[i]

is the degree of vertex i, and the indices of the neighbors of i are stored in

edgetab from edgetab[verttab[i]] to edgetab[vendtab[i]−1], inclusive.

When all vertex adjacency lists are stored in order in edgetab, it is possible to

save memory by not allocating the physical memory for vendtab. In this case,

illustrated in Figure 16, verttab is of size vertnbr+1 and vendtab points to

verttab + 1. This case is referred to as the “compact edge array” case, such

that verttab is sorted in ascending order, verttab[baseval] = baseval and

verttab[baseval + vertnbr] = (baseval + edgenbr).

velotab

Optional array, of size vertnbr, holding the integer load associated with every

vertex.

edgetab

Array, of a size equal at least to (max_i(vendtab[i]) − baseval), holding the

adjacency array of every vertex.

edlotab

Optional array, of a size equal at least to (max_i(vendtab[i]) − baseval),

holding the integer load associated with every arc. Matching arcs should

always have identical loads.

baseval

vertnbr

edgenbr

vlbltab

velotab

vendtab

verttab

4 10 13 16 19 22 25

edgetab

edlotab

Figure 16: Sample graph and its description by libScotch arrays using a compact

edge array. Numbers within vertices are vertex indices, bold numbers close to

vertices are vertex loads, and numbers close to edges are edge loads. Since the edge

array is compact, verttab is of size vertnbr+1 and vendtab points to verttab+1.

verttab

edgetab

17 2 13 10 20 27 23

vendtab

edlotab

20 8 16 13 23 30 26

Figure 17: Adjacency structure of the sample graph of Figure 16 with disjoint edge

and edge load arrays. Both verttab and vendtab are of size vertnbr. This allows

for the handling of dynamic graphs, the structure of which can evolve with time.

Dynamic graphs can be handled elegantly by using the vendtab array. In order

to dynamically manage graphs, one just has to allocate verttab, vendtab and

edgetab arrays that are large enough to contain all of the expected new vertex and

edge data. Original vertices are labeled starting from baseval, leaving free space at

the end of the arrays. To remove some vertex i, one just has to replace verttab[i]

and vendtab[i] with the values of verttab[vertnbr-1] and vendtab[vertnbr-1],

respectively, and browse the adjacencies of all neighbors of former vertex vertnbr-1

such that all (vertnbr-1) indices are turned into is. Then, vertnbr must be

decremented, and SCOTCH graphBuild() must be called to account for the change

of topology. If a graph building routine such as SCOTCH graphLoad() or SCOTCH

graphBuild() had already been called on the SCOTCH Graph structure, SCOTCH

graphFree() has to be called ﬁrst in order to free the internal structures associated

with the older version of the graph, else these data would be lost, which would

result in memory leakage.

To add a new vertex, one has to ﬁll verttab[vertnbr-1] and vendtab[vertnbr

-1] with the starting and end indices of the adjacency sub-array of the new vertex.

Then, the adjacencies of its neighbor vertices must also be updated to account for it.

If free space had been reserved at the end of each of the neighbors, one just has to

increment the vendtab[i] values of every neighbor i, and add the index of the new

vertex at the end of the adjacency sub-array. If the sub-array cannot be extended,

then it has to be copied elsewhere in the edge array, and both verttab[i] and

vendtab[i] must be updated accordingly. With simple housekeeping of free areas

of the edge array, dynamic arrays can be updated with as little data movement as

possible.

7.2.3 Mesh format

Since meshes are basically bipartite graphs, source meshes are also described by

means of adjacency lists. The description of a mesh requires several SCOTCH Num

scalars and arrays, as shown in Figure 18. They have the following meaning:

velmbas

Base value for element indexings.

vnodbas

Base value for node indexings. The base value of the underlying graph,

baseval, is set as min(velmbas, vnodbas).

velmnbr

Number of element vertices in mesh.

vnodnbr

Number of node vertices in mesh. The overall number of vertices in the

underlying graph, vertnbr, is set as velmnbr + vnodnbr.

edgenbr

Number of arcs in mesh. Since edges are represented by both of their ends,

the number of edge data in the mesh is twice the number of edges.

verttab

Array of start indices in edgetab of vertex (that is, both elements and nodes)

adjacency sub-arrays.

velmbas

vnodbas

velmnbr

vnodnbr

edgenbr

vlbltab

velotab

vendtab

verttab

9 13 14 16 18 20 21 22 23 25

edgetab

5 11 7 6 10 5 11 4

Figure 18: Sample mesh and its description by libScotch arrays using a compact

edge array. Numbers within vertices are vertex indices. Since the edge array is

compact, verttab is of size vertnbr + 1 and vendtab points to verttab + 1.

vendtab

Array of after-last indices in edgetab of vertex adjacency sub-arrays. For

any element or node vertex i, with baseval ≤ i < (baseval + vertnbr),

vendtab[i] − verttab[i] is the degree of vertex i, and the indices of the

neighbors of i are stored in edgetab from edgetab[verttab[i]] to edgetab

[vendtab[i]−1], inclusive.

When all vertex adjacency lists are stored in order in edgetab, it is possible to

save memory by not allocating the physical memory for vendtab. In this case,

illustrated in Figure 18, verttab is of size vertnbr+1 and vendtab points to

verttab + 1. This case is referred to as the “compact edge array” case, such

that verttab is sorted in ascending order, verttab[baseval] = baseval and

verttab[baseval + vertnbr] = (baseval + edgenbr).

velotab

Array, of size vertnbr, holding the integer load associated with each vertex.

As for graphs, it is possible to handle elegantly dynamic meshes by means of the

verttab and vendtab arrays. There is, however, an additional constraint, which is

that mesh nodes and elements must be ordered consecutively. The solution to fulﬁll

this constraint in the context of mesh ordering is to keep a set of empty elements

(that is, elements which have no node adjacency attached to them) between the

element and node arrays. For instance, Figure 19 represents a 4-element mesh

with 6 nodes, and such that 4 element vertex slots have been reserved for new

elements and nodes. These slots are empty elements for which verttab[i] equals

vendtab[i], irrespective of these values, since they will not lead to any memory

access in edgetab.

Using this layout of vertices, new nodes and elements can be created by growing

the element and node sub-arrays into the empty element sub-array, by both of

its sides, without having to re-write the whole mesh structure, as illustrated in

Figure 20. Empty elements are transparent to the mesh ordering routines, which

base their work on node vertices only. Users who want to update the arrays of

velmbas

vnodbas

velmnbr

vnodnbr

edgenbr

vlbltab

velotab

verttab

9 12 0

0 13 16 19 22

edgetab

vendtab

11 11 12 13 11 12 14 13 13 14 12 14 1

9 12 13 0

0 16 19 22 25

Figure 19: Sample mesh and its description by libScotch arrays, with nodes

numbered ﬁrst and elements numbered last. In order to allow for dynamic re-

meshing, empty elements (in grey) have been inserted between existing node and

element vertices.

a mesh that has already been declared using the SCOTCH meshBuild routine must

call SCOTCH meshExit prior to updating the mesh arrays, and then call SCOTCH

meshBuild again after the arrays have been updated, so that the SCOTCH Mesh

structure remains consistent with the new mesh data.

7.2.4 Geometry format

Geometry data is always associated with a graph or a mesh. It is simply made

of a single array of double-precision values which represent the coordinates of the

vertices of a graph, or of the node vertices of a mesh, in vertex order. The ﬁelds of

a geometry structure are the following:

dimnnbr

Number of dimensions of the graph or of the mesh, which can be 1, 2, or 3.

geomtab

Array of coordinates. This is an array of double precision values organized

as an array of (x), or (x,y), or (x,y,z) tuples, according to dimnnbr. Co-

ordinates that are not used (e.g. the “z” coordinates for a 2-dimentional

object) are not allocated. Therefore, the “x” coordinate of some graph

vertex i is located at geomtab[(i - baseval) * dimnnbr + baseval], its

“y” coordinate is located at geomtab[(i - baseval) * dimnnbr + baseval

+ 1] if dimnnbr ≤ 2, and its “z” coordinate is located at geomtab[(i -

baseval) * dimnnbr + baseval + 2] if dimnnbr = 3. Whenever the ge-

ometry is associated with a mesh, only node vertices are considered, so

the “x” coordinate of some mesh node vertex i, with vnodbas ≤ i, is lo-

cated at geomtab[(i - vnodbas) * dimnnbr + baseval], its “y” coordi-

nate is located at geomtab[(i - vnodbas) * dimnnbr + baseval + 1] if

velmbas

vnodbas

velmnbr

vnodnbr

edgenbr

vlbltab

velotab

verttab

25 2

8 27 12 31 0 9 35 13 16 19 22

edgetab

vendtab

11 12 13 9 10 14 13 1

3 14 1

5 11 9 13 14 12 10 11 9 12 10 7

27 5

9 31 13 35 0 12 38 16 19 22 25

Figure 20: Re-meshing of the mesh of Figure 19. New node vertices have been added

at the end of the vertex sub-array, new elements have been added at the beginning

of the element sub-array, and vertex base values have been updated accordingly.

Node adjacency lists that could not ﬁt in place have been added at the end of the

edge array, and some of the freed space has been re-used for new adjacency lists.

Element adjacency lists do not require moving in this case, as all of the elements

have the name number of nodes.

dimnnbr ≤ 2, and its “z” coordinate is located at geomtab[(i - vnodbas) *

dimnnbr + baseval + 2] if dimnnbr = 3.

7.2.5 Block ordering format

Block orderings associated with graphs and meshes are described by means of block

and permutation arrays, made of SCOTCH Nums, as shown in Figure 21. In order for

all orderings to have the same structure, irrespective of whether they are created

from graphs or meshes, all ordering data indices start from baseval, even when they

refer to a mesh the node vertices of which are labeled from a vnodbas index such

that vnodbas > baseval. Consequently, row indices are related to vertex indices

in memory in the following way: row i is associated with vertex i of the SCOTCH

Graph structure if the ordering was computed from a graph, and with node vertex

i+(vnodbas−baseval) of the SCOTCH Mesh structure if the ordering was computed

from a mesh. Block orderings are made of the following data:

permtab

Array holding the permutation of the reordered matrix. Thus, if k =

permtab[i], then row i of the original matrix is now row k of the reordered

matrix, that is, row i is the k pivot.

peritab

Inverse permutation of the reordered matrix. Thus, if i = peritab[k], then

row k of the reordered matrix was row i of the original matrix.

cblknbr

Number of column blocks (that is, supervariables) in the block ordering.

rangtab

Array of ranges for the column blocks. Column block c, with baseval ≤ c <

(cblknbr+baseval), contains columns with indices ranging from rangtab[i]

to rangtab[i + 1], exclusive, in the reordered matrix. Indices in rangtab

are based. Therefore, rangtab[baseval] is always equal to baseval, and

rangtab[cblknbr + baseval] is always equal to vertnbr + baseval for

graphs and to vnodnbr + baseval for meshes. In order to avoid memory

errors when column blocks are all single columns, the size of rangtab must

always be one more than the number of columns, that is, vertnbr + 1 for

graphs and vnodnbr + 1 for meshes.

treetab

Array of ascendants of permuted column blocks in the separators tree.

treetab[i] is the index of the father of column block i in the separators

tree, or −1 if column block i is the root of the separators tree. Whenever sep-

arators or leaves of the separators tree are split into subblocks, as the block

splitting, minimum ﬁll or minimum degree methods do, all subblocks of the

same level are linked to the column block of higher index belonging to the

closest separator ancestor. Indices in treetab are based, in the same way as

for the other blocking structures. See Figure 21 for a complete example.

7.3 Strategy strings

The behavior of the mapping and block ordering routines of the libScotch library

is parametrized by means of strategy strings, which describe how and when given

permtab

peritab

cblknbr

rangtab

treetab

3 10 6 4 11 8

1 12 5

5 11 4

7 12 3 6 10

8 10 13

7 −1

Figure 21: Arrays resulting from the ordering by complete nested dissection of a 4

by 3 grid based from 1. Leftmost grid is the original grid, and righmost grid is the

reordered grid, with separators shown and column block indices written in bold.

partitioning or ordering methods should be applied to graphs and subgraphs, or to

meshes and submeshes.

7.3.1 Using default strategy strings

While strategy strings can be built by hand, according to the syntax given in the

next sections, users who do not have speciﬁc needs can take advantage of default

strategies already implemented in the libScotch, which will yield very good results

in most cases. By doing so, they will spare themselves the hassle of updating their

strategies to comply to subsequent syntactic changes, and they will beneﬁt from

the availability of new partitioning or ordering methods as soon as they are made

available.

The simplest way to use default strategy strings is to avoid specifying any. By

initializing a strategy object, by means of the SCOTCH stratInit routine, and by

using the initialized strategy object as is, without further parametrization, this

object will be ﬁlled with a default strategy when passing it as a parameter to the

next partitioning or ordering routine to be called. On return, the strategy object

will contain a fully speciﬁed strategy, tailored for the type of operation which has

been requested. Consequently, a fresh strategy object that was used to partition a

graph cannot be used afterward as a default strategy for calling an ordering routine,

for instance, as partitioning and ordering strategies are incompatible.

The libScotch also provides helper routines which allow users to express their

preferences on the kind of strategy that they need. These helper routines, which

are of the form SCOTCH strat*Build, tune default strategy strings according to

parameters provided by the user, such as the requested number of parts (used

as a hint to select the most eﬃcient partitioning routines), the desired maximum

load imbalance ratio, and a set of preference ﬂags. While some of these ﬂags are

antagonistic, most of them can be combined, by means of addition or “binary or”

operators. These ﬂags are the following.

SCOTCH STRATQUALITY

Privilege quality over speed. This is the default behavior of default strategy

strings when they are used just after being initialized.

SCOTCH STRATSPEED

Privilege speed over quality.

SCOTCH STRATBALANCE

Enforce load balance as much as possible.

SCOTCH STRATSAFETY

Do not use methods that can lead to the occurrence of problematic events,

such as ﬂoating point exceptions, which could not be properly handled by the

calling software.

7.3.2 Mapping strategy strings

At the time being, mapping methods only apply to graphs, as there is not yet a mesh

mapping tool in the Scotch package. Mapping strategies are made of methods,

with optional parameters enclosed between curly braces, and separated by commas,

in the form of method[{parameters}] . The currently available mapping methods

are the following.

Multi-level method. The parameters of the multi-level method are listed be-

low.

asc=strat

Set the strategy that is used to reﬁne the mappings obtained at ascending

levels of the uncoarsening phase by projection of the mappings computed

for coarser graphs. This strategy is not applied to the coarsest graph,

for which only the low strategy is used.

low=strat

Set the strategy that is used to compute the mapping of the coarsest

graph, at the lowest level of the coarsening process.

rat=rat

Set the threshold maximum coarsening ratio over which graphs are no

longer coarsened. The ratio of any given coarsening cannot be less that

0.5 (case of a perfect matching), and cannot be greater than 1.0. Coars-

ening stops when either the coarsening ratio is above the maximum coars-

ening ratio, or the graph has fewer vertices than the minimum number

of vertices allowed.

type=type

Set the type of matching that is used to coarsen the graphs. type is h for

heavy-edge matching, or s for scan (ﬁrst-ﬁt) matching.

vert=nbr

Set the threshold under which graphs are no longer coarsened. Coarsen-

ing stops when either the coarsening ratio is above the maximum coars-

ening ratio, or the graph would have fewer vertices than the minimum

number of vertices allowed. When the target architecture is a variable-

sized architecture, coarsening stops when the coarsened graph would have

less than nbr vertices. When the target architecture is a regular, ﬁxed-

size, architecture, coarsening stops when each subdomain would have less

than nbr vertices, that is, when the size of the coarsened graph would

have less than nbr × domnnbr vertices, where domnnbr is the number of

vertices in the target architecture.

Dual Recursive Bipartitioning mapping algorithm, as deﬁned in section 3.1.3.

The parameters of the DRB mapping method are listed below.

job=tie

The tie ﬂag deﬁnes how new jobs are stored in job pools.

Tie job pools together. Subjobs are stored in same pool as their par-

ent job. This is the default behavior, as it proves the most eﬃcient

in practice.

Untie job pools. Subjobs are stored in the next job pool to be pro-

cessed.

map=tie

The tie ﬂag deﬁnes how results of bipartitioning jobs are propagated to

jobs still in pools.

Tie both mapping tables together. Results are immediately available

to jobs in the same job pool. This is the default behavior.

Untie mapping tables. Results are only available to jobs of next pool

to be processed.

poli=policy

Select jobs according to policy policy. Job selection policies deﬁne how

bipartitioning jobs are ordered within the currently active job pool. Valid

policy ﬂags are

Most neighbors of higher level.

Highest level.

Random.

Most neighbors of smaller size. This is the default behavior.

Biggest size.

sep=strat

Apply bipartitioning strategy strat to each bipartitioning job. A biparti-

tioning strategy is made of one or several bipartitioning methods, which

can be combined by means of strategy operators. Graph bipartitioning

strategies are described below.

7.3.3 Graph bipartitioning strategy strings

A graph bipartitioning strategy is made of one or several graph bipartitioning meth-

ods, which can be combined by means of strategy operators. Strategy operators are

listed below, by increasing precedence.

strat1|strat2

Selection operator. The result of the selection is the best bipartition of the

two that are obtained by the separate application of strat1 and strat2 to the

current bipartition.

strat1 strat2

Combination operator. Strategy strat2 is applied to the bipartition resulting

from the application of strategy strat1 to the current bipartition. Typically,

the ﬁrst method used should compute an initial bipartition from scratch, and

every following method should use the result of the previous one at its starting

point.

(strat)

Grouping operator. The strategy enclosed within the parentheses is treated

as a single bipartitioning method.

/cond?strat1[:strat2];

Condition operator. According to the result of the evaluation of condition

cond, either strat1 or strat2 (if it is present) is applied. The condition applies

to the characteristics of the current active graph, and can be built from logical

and relational operators. Conditional operators are listed below, by increasing

precedence.

cond1|cond2

Logical or operator. The result of the condition is true if cond1 or cond2

are true, or both.

cond1&cond2

Logical and operator. The result of the condition is true only if both

cond1 and cond2 are true.

!cond

Logical not operator. The result of the condition is true only if cond is

false.

var relop val

Relational operator, where var is a graph variable, val is either a graph

variable or a constant of the type of variable var, and relop is one of ’<’,

’=’, and ’>’. The graph variables are listed below, along with their types.

deg

The average degree of the current graph. Float.

edge

The number of arcs (which is twice the number of edges) of the

current graph. Integer.

load

The overall vertex load (weight) of the current graph. Integer.

load0

The vertex load of the ﬁrst subset of the current bipartition of the

current graph. Integer.

vert

The number of vertices of the current graph. Integer.

method[{parameters}]

Bipartitioning method. For bipartitioning methods that can be parametrized,

parameter settings may be provided after the method name. Parameters must

be separated by commas, and the whole list be enclosed between curly braces.

The currently available graph bipartitioning methods are the following.

Band method. This method builds a band graph of given width around the

current frontier of the graph to which it is applied, and calls a graph biparti-

tioning strategy to reﬁne the equivalent bipartition of the band graph. Then,

the reﬁned frontier of the band graph is projected back to the current graph.

This method, presented in [8], was created to reduce the cost of vertex sepa-

rator reﬁnement algorithms in a multi-level context, but it improves partition

quality too. The same behavior is observed for graph bipartitioning. The

parameters of the band bipartitioning method are listed below.

bnd=strat

Set the graph bipartitioning strategy to be used on the band graph.

org=strat

Set the fallback graph bipartitioning strategy to be used on the original

graph if the band graph strategy could not be used. The three cases

which require the use of this fallback strategy are the following. First, if

the separator of the original graph is empty, which makes it impossible

to compute a band graph. Second, if any part of the band graph to be

built is of the same size as the one of the original graph. Third, if the

application of the bnd bipartitioning method to the band graph leads to

a situation where both anchor vertices are placed in the same part.

width=val

Set the width of the band graph. All graph vertices that are at a distance

less than or equal to val from any frontier vertex are kept in the band

graph.

Diﬀusion method. This method, presented in [42], ﬂows two kinds of antag-

onistic liquids, scotch and anti-scotch, from two source vertices, and sets the

new frontier as the limit between vertices which contain scotch and the ones

which contain anti-scotch. Because selecting the source vertices is essential

to the obtainment of useful results, this method has been hard-coded so that

the two source vertices are the two vertices of highest indices, since in the

band method these are the anchor vertices which represent all of the removed

vertices of each part. Therefore, this method must be used on band graphs

only, or on speciﬁcally crafted graphs. Applying it to any other graphs is

very likely to lead to extremely poor results. The parameters of the diﬀusion

bipartitioning method are listed below.

dif=rat

Fraction of liquid which is diﬀused to neighbor vertices at each pass. To

achieve convergence, the sum of the dif and rem parameters must be

equal to 1, but in order to speed-up the diﬀusion process, other combi-

nations of higher sum can be tried. In this case, the number of passes

must be kept low, to avoid numerical overﬂows which would make the

results useless.

pass=nbr

Set the number of diﬀusion sweeps performed by the algorithm. This

number depends on the width of the band graph to which the diﬀusion

method is applied. Useful values range from 30 to 500 according to

chosen dif and rem coeﬃcients.

rem=rat

Fraction of liquid which remains on vertices at each pass. See above.

Fiduccia-Mattheyses method. The parameters of the Fiduccia-Mattheyses

method are listed below.

bal=rat

Set the maximum weight imbalance ratio to the given fraction of the

subgraph vertex weight. Common values are around 0.01, that is, one

percent.

move=nbr

Maximum number of hill-climbing moves that can be performed before a

pass ends. During each of its passes, the Fiduccia-Mattheyses algorithm

repeatedly swaps vertices between the two parts so as to minimize the

cost function. A pass completes either when all of the vertices have been

moved once, or if too many swaps that do not decrease the value of the

cost function have been performed. Setting this value to zero turns the

Fiduccia-Mattheyses algorithm into a gradient-like method, which may

be used to quickly reﬁne partitions during the uncoarsening phase of the

multi-level method.

pass=nbr

Set the maximum number of optimization passes performed by the algo-

rithm. The Fiduccia-Mattheyses algorithm stops as soon as a pass has

not yielded any improvement of the cost function, or when the maximum

number of passes has been reached. Value −1 stands for an inﬁnite num-

ber of passes, that is, as many as needed by the algorithm to converge.

Gibbs-Poole-Stockmeyer method. This method has only one parameter.

pass=nbr

Set the number of sweeps performed by the algorithm.

Greedy-graph-growing method. This method has only one parameter.

pass=nbr

Set the number of runs performed by the algorithm.

Multi-level method. The parameters of the multi-level method are listed be-

low.

asc=strat

Set the strategy that is used to reﬁne the partitions obtained at ascend-

ing levels of the uncoarsening phase by projection of the bipartitions

computed for coarser graphs. This strategy is not applied to the coarsest

graph, for which only the low strategy is used.

low=strat

Set the strategy that is used to compute the partition of the coarsest

graph, at the lowest level of the coarsening process.

rat=rat

Set the threshold maximum coarsening ratio over which graphs are no

longer coarsened. The ratio of any given coarsening cannot be less that

0.5 (case of a perfect matching), and cannot be greater than 1.0. Coars-

ening stops when either the coarsening ratio is above the maximum coars-

ening ratio, or the graph has fewer vertices than the minimum number

of vertices allowed.

type=type

Set the type of matching that is used to coarsen the graphs. type is h for

heavy-edge matching, or s for scan (ﬁrst-ﬁt) matching.

vert=nbr

Set the threshold minimum graph size under which graphs are no longer

coarsened. Coarsening stops when either the coarsening ratio is above

the maximum coarsening ratio, or the coarsened graph would have fewer

vertices than the minimum number of vertices allowed.

Exactifying method.

Zero method. This method moves all of the vertices to the ﬁrst part. Its

main use is to stop the bipartitioning process, if some condition is true, when

mapping onto variable-sized architectures (see section 3.1.7).

7.3.4 Ordering strategy strings

Ordering strategies are available both for graphs and for meshes. An ordering

strategy is made of one or several ordering methods, which can be combined by

means of strategy operators. The strategy operators that can be used in ordering

strategies are listed below, by increasing precedence.

(strat)

Grouping operator. The strategy enclosed within the parentheses is treated

as a single ordering method.

/cond?strat1[:strat2];

Condition operator. According to the result of the evaluation of condition

cond, either strat1 or strat2 (if it is present) is applied. The condition applies

to the characteristics of the current node of the separators tree, and can be

built from logical and relational operators. Conditional operators are listed

below, by increasing precedence.

cond1|cond2

Logical or operator. The result of the condition is true if cond1 or cond2

are true, or both.

cond1&cond2

Logical and operator. The result of the condition is true only if both

cond1 and cond2 are true.

!cond

Logical not operator. The result of the condition is true only if cond is

false.

var relop val

Relational operator, where var is a node variable, val is either a node

variable or a constant of the type of variable var, and relop is one of ’<’,

’=’, and ’>’. The node variables are listed below, along with their types.

edge

The number of vertices of the current subgraph. Integer.

levl

The level of the subgraph in the separators tree, starting from zero

for the initial graph at the root of the tree. Integer.

load

The overall vertex load (weight) of the current subgraph. Integer.

mdeg

The maximum degree of the current subgraph. Integer.

vert

The number of vertices of the current subgraph. Integer.

method[{parameters}]

Graph or mesh ordering method. Available ordering methods are listed below.

The currently available ordering methods are the following.

Blocking method. This method does not perform ordering by itself, but is used

as post-processing to cut into blocks of smaller sizes the separators or large

blocks produced by other ordering methods. This is not useful in the context of

direct solving methods, because the oﬀ-diagonal blocks created by the splitting

of large diagonal blocks are likely to be ﬁlled at factoring time. However, in

the context of incomplete solving methods such as ILU(k) [29], it can lead

to a signiﬁcant reduction of the required memory space and time, because it

helps carving large triangular blocks. The parameters of the blocking method

are described below.

cmin=size

Set the minimum size of the resulting subblocks, in number of columns.

Blocks larger than twice this minimum size are cut into sub-blocks of

equal sizes (within one), having a number of columns comprised between

size and 2size.

The deﬁnition of size depends on the size of the graph to order. Large

graphs cannot aﬀord very small values, because the number of blocks

becomes much too large and limits the acceleration of BLAS 3 routines,

while large values do not help reducing enough the complexity of ILU(k)

solving.

strat=strat

Ordering strategy to be performed. After the ordering strategy is applied,

the resulting separators tree is traversed and all of the column blocks

that are larger than 2size are split into smaller column blocks, without

changing the ordering that has been computed.

Compression method [2]. The parameters of the compression method are

listed below.

rat=rat

Set the compression ratio over which graphs and meshes will not be

compressed. Useful values range between 0.7 and 0.8.

cpr=strat

Ordering strategy to use on the compressed graph or mesh if its size is

below the compression ratio times the size of the original graph or mesh.

unc=strat

Ordering strategy to use on the original graph or mesh if the size of the

compressed graph or mesh were above the compression ratio times the

size of the original graph or mesh.

Block Halo Approximate Minimum Degree method [47]. The parameters of

the Halo Approximate Minimum Degree method are listed below. The Block

Halo Approximate Minimum Fill method, described below, is more eﬃcient

and should be preferred.

cmin=size

Minimum number of columns per column block. All column blocks of

width smaller than size are amalgamated to their parent column block in

the elimination tree, provided that it does not violate the cmax constraint.

cmax=size

Maximum number of column blocks over which some column block will

not amalgamate one of its descendents in the elimination tree. This

parameter is mainly designed to provide an upper bound for block size

in the context of BLAS3 computations ; else, a huge value should be

provided.

frat=rat

Fill-in ratio over which some column block will not amalgamate one of

its descendents in the elimination tree. Typical values range from 0.05

to 0.10.

Block Halo Approximate Minimum Fill method. The parameters of the Halo

Approximate Minimum Fill method are listed below.

cmin=size

Minimum number of columns per column block. All column blocks of

width smaller than size are amalgamated to their parent column block in

the elimination tree, provided that it does not violate the cmax constraint.

cmax=size

Maximum number of column blocks over which some column block will

not amalgamate one of its descendents in the elimination tree. This

parameter is mainly designed to provide an upper bound for block size

in the context of BLAS3 computations ; else, a huge value should be

provided.

frat=rat

Fill-in ratio over which some column block will not amalgamate one of

its descendents in the elimination tree. Typical values range from 0.05

to 0.10.

Gibbs-Poole-Stockmeyer method. This method is used on separators to reduce

the number and extent of extra-diagonal blocks. If the number of extra-

diagonal blocks is not relevant, the s method should be preferred. This method

has only one parameter.

pass=nbr

Set the number of sweeps performed by the algorithm.

Nested dissection method. The parameters of the nested dissection method

are given below.

ole=strat

Set the ordering strategy that is used on every leaf of the separators tree

if the node separation strategy sep has failed to separate it further.

ose=strat

Set the ordering strategy that is used on every separator of the separators

tree.

sep=strat

Set the node separation strategy that is used on every leaf of the sep-

arators tree to make it grow. Node separation strategies are described

below, in section 7.3.5.

Simple method. Vertices are ordered in their natural order. This method is

fast, and should be used to order separators if the number of extra-diagonal

blocks is not relevant ; else, the g method should be preferred.

Mesh-to-graph method. Available only for mesh ordering strategies. From the

mesh to which this method applies is derived a graph, such that a graph vertex

is associated with every node of the mesh, and a clique is created between all

vertices which represent nodes that belong to the same element. A graph

ordering strategy is then applied to the derived graph, and this ordering is

projected back to the nodes of the mesh. This method is here for evaluation

purposes only, as mesh ordering methods are generally more eﬃcient than

their graph ordering counterpart.

strat=strat

Graph ordering strategy to apply to the associated graph.

7.3.5 Node separation strategy strings

A node separation strategy is made of one or several node separation methods,

which can be combined by means of strategy operators. Strategy operators are

listed below, by increasing precedence.

strat1|strat2

Selection operator. The result of the selection is the best vertex separator of

the two that are obtained by the distinct application of strat1 and strat2 to

the current separator.

strat1 strat2

Combination operator. Strategy strat2 is applied to the vertex separator

resulting from the application of strategy strat1 to the current separator.

Typically, the ﬁrst method used should compute an initial separation from

scratch, and every following method should use the result of the previous one

as a starting point.

(strat)

Grouping operator. The strategy enclosed within the parentheses is treated

as a single separation method.

/cond?strat1[:strat2];

Condition operator. According to the result of the evaluation of condition

cond, either strat1 or strat2 (if it is present) is applied. The condition applies

to the characteristics of the current subgraph, and can be built from logical

and relational operators. Conditional operators are listed below, by increasing

precedence.

cond1|cond2

Logical or operator. The result of the condition is true if cond1 or cond2

are true, or both.

cond1&cond2

Logical and operator. The result of the condition is true only if both

cond1 and cond2 are true.

!cond

Logical not operator. The result of the condition is true only if cond is

false.

var relop val

Relational operator, where var is a graph or node variable, val is either

a graph or node variable or a constant of the type of variable var, and

relop is one of ’<’, ’=’, and ’>’. The graph and node variables are listed

below, along with their types.

levl

The level of the subgraph in the separators tree, starting from zero

at the root of the tree. Integer.

proc

The number of processors on which the current subgraph is dis-

tributed at this level of the separators tree. This variable is available

only when calling from routines of the PT-Scotch parallel library.

Integer.

rank

The rank of the current processor among the group of processors

on which the current subgraph is distributed at this level of the

separators tree. This variable is available only when calling from

routines of the PT-Scotch parallel library, for instance to decide

which node separation strategy should be used on which processor

in a multi-sequential approach. Integer.

vert

The number of vertices of the current subgraph. Integer.

The currently available vertex separation methods are the following.

Band method. Available only for graph separation strategies. This method

builds a band graph of given width around the current separator of the graph

to which it is applied, and calls a graph separation strategy to reﬁne the

equivalent separator of the band graph. Then, the reﬁned separator of the

band graph is projected back to the current graph. This method, presented

in [8], was created to reduce the cost of separator reﬁnement algorithms in a

multi-level context, but it improves partition quality too. The parameters of

the band separation method are listed below.

bnd=strat

Set the vertex separation strategy to be used on the band graph.

org=strat

Set the fallback vertex separation strategy to be used on the original

graph if the band graph strategy could not be used. The three cases

which require the use of this fallback strategy are the following. First, if

the separator of the original graph is empty, which makes it impossible

to compute a band graph. Second, if any part of the band graph to be

built is of the same size as the one of the original graph. Third, if the

application of the bnd vertex separation method to the band graph leads

to a situation where both anchor vertices are placed in the same part.

width=val

Set the width of the band graph. All graph vertices that are at a distance

less than or equal to val from any separator vertex are kept in the band

graph.

Edge-separation method. Available only for graph separation strategies. This

method builds vertex separators from edge separators, by the method pro-

posed by Pothen and Fang [48], which uses a variant of the Hopcroft and

Karp algorithm due to Duﬀ [9]. This method is expensive and most often

yields poorer results than direct vertex-oriented methods such as the vertex

vertex Greedy-graph-growing and the vertex Fiduccia-Mattheyses algorithms.

The parameters of the edge-separation method are listed below.

bal=val

Set the load imbalance tolerance to val, which is a ﬂoating-point ratio

expressed with respect to the ideal load of the partitions.

strat=strat

Set the graph bipartitioning strategy that is used to compute the edge bi-

partition. The syntax of bipartitioning strategy strings is deﬁned within

section 7.3.3, at page 59.

width=type

Select the width of the vertex separators built from edge separators.

When type is set to f, fat vertex separators are built, that hold all of

the ends of the edges of the edge cut. When it is set to t, a thin vertex

separator is built by removing as many vertices as possible from the fat

separator.

Vertex Fiduccia-Mattheyses method. The parameters of the vertex Fiduccia-

Mattheyses method are listed below.

bal=rat

Set the maximum weight imbalance ratio to the given fraction of the

weight of all node vertices. Common values are around 0.01, that is, one

percent.

move=nbr

Maximum number of hill-climbing moves that can be performed before

a pass ends. During each of its passes, the vertex Fiduccia-Mattheyses

algorithm repeatedly moves vertices from the separator to any of the two

parts, so as to minimize the size of the separator. A pass completes either

when all of the vertices have been moved once, or if too many swaps that

do not decrease the size of the separator have been performed.

pass=nbr

Set the maximum number of optimization passes performed by the al-

gorithm. The vertex Fiduccia-Mattheyses algorithm stops as soon as a

pass has not yielded any reduction of the size of the separator, or when

the maximum number of passes has been reached. Value -1 stands for an

inﬁnite number of passes, that is, as many as needed by the algorithm

to converge.

Gibbs-Poole-Stockmeyer method. Available only for graph separation strate-

gies. This method has only one parameter.

pass=nbr

Set the number of sweeps performed by the algorithm.

Vertex greedy-graph-growing method. This method has only one parameter.

pass=nbr

Set the number of runs performed by the algorithm.

Vertex multi-level method. The parameters of the vertex multi-level method

are listed below.

asc=strat

Set the strategy that is used to reﬁne the vertex separators obtained at

ascending levels of the uncoarsening phase by projection of the separators

computed for coarser graphs or meshes. This strategy is not applied to

the coarsest graph or mesh, for which only the low strategy is used.

low=strat

Set the strategy that is used to compute the vertex separator of the

coarsest graph or mesh, at the lowest level of the coarsening process.

rat=rat

Set the threshold maximum coarsening ratio over which graphs or meshes

are no longer coarsened. The ratio of any given coarsening cannot be less

that 0.5 (case of a perfect matching), and cannot be greater than 1.0.

Coarsening stops when either the coarsening ratio is above the maximum

coarsening ratio, or the graph or mesh has fewer node vertices than the

minimum number of vertices allowed.

vert=nbr

Set the threshold minimum size under which graphs or meshes are no

longer coarsened. Coarsening stops when either the coarsening ratio is

above the maximum coarsening ratio, or the graph or mesh has fewer

node vertices than the minimum number of vertices allowed.

Thinner method. Available only for graph separation strategies. This method

quickly eliminates all useless vertices of the current separator. It searches the

separator for vertices that have no neighbors in one of the two parts, and

moves these vertices to the part they are connected to. This method may

be used to reﬁne separators during the uncoarsening phase of the multi-level

method, and is faster than a vertex Fiduccia-Mattheyses algorithm with {move

=0}.

Mesh-to-graph method. Available only for mesh separation strategies. From

the mesh to which this method applies is derived a graph, such that a graph

vertex is associated with every node of the mesh, and a clique is created

between all vertices which represent nodes that belong to the same element.

A graph separation strategy is then applied to the derived graph, and the

separator is projected back to the nodes of the mesh. This method is here

for evaluation purposes only, as mesh separation methods are generally more

eﬃcient than their graph separation counterpart.

strat=strat

Graph separation strategy to apply to the associated graph.

Graph separator viewer. Available only for graph separation strategies. Ev-

ery call to this method results in the creation, in the current subdirectory,

of partial mapping ﬁles called “vgraphseparatevw output nnnnnnnn.map”,

where “nnnnnnnn” are increasing decimal numbers, which contain the cur-

rent state of the two parts and the separator. These mapping ﬁles can be

used as input by the gout program to produce displays of the evolving shape

of the current separator and parts. This is mostly a debugging feature, but

it can also have an illustrative interest. While it is only available for graph

separation strategies, mesh separation strategies can indirectly use it through

the mesh-to-graph separation method.

Zero method. This method moves all of the node vertices to the ﬁrst part,

resulting in an empty separator. Its main use is to stop the separation process

whenever some condition is true.

7.4 Target architecture handling routines

7.4.1 SCOTCH archInit

Synopsis

int SCOTCH archInit (SCOTCH Arch * archptr)

scotchfarchinit (doubleprecision (*) archdat,

integer

ierr)

Description

The SCOTCH archInit function initializes a SCOTCH Arch structure so as to

make it suitable for future operations. It should be the ﬁrst function to be

called upon a SCOTCH Arch structure. When the target architecture data is

no longer of use, call function SCOTCH archExit to free its internal structures.

Return values

SCOTCH archInit returns 0 if the graph structure has been successfully ini-

tialized, and 1 else.

7.4.2 SCOTCH archExit

Synopsis

void SCOTCH archExit (SCOTCH Arch * archptr)

scotchfarchexit (doubleprecision (*) archdat)

Description

The SCOTCH archExit function frees the contents of a SCOTCH Arch structure

previously initialized by SCOTCH archInit. All subsequent calls to SCOTCH

arch routines other than SCOTCH archInit, using this structure as parameter,

may yield unpredictable results.

7.4.3 SCOTCH archLoad

Synopsis

int SCOTCH archLoad (SCOTCH Arch * archptr,

FILE *

stream)

scotchfarchload (doubleprecision (*) archdat,

integer

fildes,

ierr)

Description

The SCOTCH archLoad routine ﬁlls the SCOTCH Arch structure pointed to by

archptr with the source graph description available from stream stream in

the Scotch target architecture format (see Section 5.4).

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

architecture ﬁle.

Return values

SCOTCH archLoad returns 0 if the target architecture structure has been suc-

cessfully allocated and ﬁlled with the data read, and 1 else.

7.4.4 SCOTCH archSave

Synopsis

int SCOTCH archSave (const SCOTCH Arch * archptr,

FILE *

stream)

scotchfarchsave (doubleprecision (*) archdat,

integer

fildes,

ierr)

Description

The SCOTCH archSave routine saves the contents of the SCOTCH Arch structure

pointed to by archptr to stream stream, in the Scotch target architecture

format (see section 5.4).

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

architecture ﬁle.

Return values

SCOTCH archSave returns 0 if the graph structure has been successfully writ-

ten to stream, and 1 else.

7.4.5 SCOTCH archName

Synopsis

const char * SCOTCH archName (const SCOTCH Arch * archptr)

scotchfarchname (doubleprecision (*) archdat,

character (*)

integer

chartab,

charnbr)

Description

The SCOTCH archName function returns a string containing the name of the

architecture pointed to by archptr. Since Fortran routines cannot return

string pointers, the scotchfarchname routine takes as second and third pa-

rameters a character() array to be ﬁlled with the name of the architecture,

and the integer size of the array, respectively. If the array is of suﬃcient

size, a trailing nul character is appended to the string to materialize the end

of the string (this is the C style of handling character strings).

Return values

SCOTCH archName returns a non-null character pointer that points to a null-

terminated string describing the type of the architecture.

7.4.6 SCOTCH archSize

Synopsis

SCOTCH Num SCOTCH archSize (const SCOTCH Arch * archptr)

scotchfarchsize (doubleprecision (*) archdat,

integer*num

archnbr)

Description

The SCOTCH archSize function returns the number of nodes of the given tar-

get architecture. The Fortran routine has a second parameter, of integer type,

which is set on return with the number of nodes of the target architecture.

Return values

SCOTCH archSize returns the number of nodes of the target architecture.

7.4.7 SCOTCH archBuild

Synopsis

int SCOTCH archBuild (SCOTCH Arch *

archptr,

const SCOTCH Graph * grafptr,

const SCOTCH Num

listnbr,

listtab,

const SCOTCH Num *

const SCOTCH Strat * straptr)

scotchfarchbuild (doubleprecision (*) archdat,

doubleprecision (*) grafdat,

integer*num

listnbr,

listtab,

integer*num (*)

doubleprecision (*) stradat,

integer ierr)

Description

The SCOTCH archBuild routine ﬁlls the architecture structure pointed to by

archptr with the decomposition-deﬁned target architecture computed by ap-

plying the graph bipartitioning strategy pointed to by straptr to the archi-

tecture graph pointed to by grafptr.

When listptr is not NULL and listnbr is greater than zero, the

decomposition-deﬁned architecture is restricted to the listnbr vertices whose

indices are given in the array pointed to by listtab, from listtab[0] to

listtab[listnbr - 1]. These indices should have the same base value as

the one of the graph pointed to by grafptr, that is, be in the range from 0 to

vertnbr − 1 if the graph base is 0, and from 1 to vertnbr if the graph base

is 1.

Graph bipartitioning strategies are declared by means of the SCOTCH strat

GraphBipart function, described in page 109. The syntax of bipartitioning

strategy strings is deﬁned in section 7.3.2, page 59. Additional information

may be obtained from the manual page of amk grf, the stand-alone executable

that uses function SCOTCH archBuild to build decomposition-deﬁned target

architecture from source graphs, available at page 32.

Return values

SCOTCH archBuild returns 0 if the decomposition-deﬁned architecture has

been successfully computed, and 1 else.

7.4.8 SCOTCH archCmplt

Synopsis

int SCOTCH archCmplt (SCOTCH Arch *

archptr,

const SCOTCH Num vertnbr)

scotchfarchcmplt (doubleprecision (*) archdat,

integer*num

vertnbr,

ierr)

integer

Description

The SCOTCH archCmplt routine ﬁlls the SCOTCH Arch structure pointed to by

archptr with the description of a complete graph architecture with vertnbr

processors, which can be used as input to SCOTCH graphMap to perform graph

partitioning. A shortcut to this is to use the SCOTCH graphPart routine.

Return values

SCOTCH archCmplt returns 0 if the complete graph target architecture has

been successfully built, and 1 else.

7.4.9 SCOTCH archCmpltw

Synopsis

int SCOTCH archCmpltw (SCOTCH Arch *

const SCOTCH Num

archptr,

vertnbr,

const SCOTCH Num * const velotab)

scotchfarchcmplt (doubleprecision (*) archdat,

integer*num

integer*num (*)

integer

vertnbr,

velotab,

ierr)

Description

The SCOTCH archCmpltw routine ﬁlls the SCOTCH Arch structure pointed to

by archptr with the description of a weighted complete graph architecture

with vertnbr processors. The relative weights of the processors are given in

the velotab array. Once the target architecture has been created, it can be

used as input to SCOTCH graphMap to perform weighted graph partitioning.

Return values

SCOTCH archCmpltw returns 0 if the weighted complete graph target architec-

ture has been successfully built, and 1 else.

7.4.10 SCOTCH archHcub

Synopsis

int SCOTCH archHcub (SCOTCH Arch *

archptr,

const SCOTCH Num hdimval)

scotchfarchhcub (doubleprecision (*) archdat,

integer*num

hdimval,

ierr)

integer

Description

The SCOTCH archHcub routine ﬁlls the SCOTCH Arch structure pointed to by

archptr with the description of a hypercube graph of dimension hdimval.

Return values

SCOTCH archHcub returns 0 if the hypercube target architecture has been suc-

cessfully built, and 1 else.

7.4.11 SCOTCH archMesh2D

Synopsis

int SCOTCH archMesh2D (SCOTCH Arch *

archptr,

const SCOTCH Num xdimval,

const SCOTCH Num ydimval)

scotchfarchmesh2d (doubleprecision (*) archdat,

integer*num

integer

xdimval,

ydimval,

ierr)

Description

The SCOTCH archMesh2D routine ﬁlls the SCOTCH Arch structure pointed to

by archptr with the description of a 2D mesh architecture with xdimval ×

ydimval processors.

Return values

SCOTCH archMesh2D returns 0 if the 2D mesh target architecture has been

successfully built, and 1 else.

7.4.12 SCOTCH archMesh3D

Synopsis

int SCOTCH archMesh3D (SCOTCH Arch *

archptr,

const SCOTCH Num xdimval,

const SCOTCH Num ydimval,

const SCOTCH Num zdimval)

scotchfarchmesh3d (doubleprecision (*) archdat,

integer*num

integer

xdimval,

ydimval,

zdimval,

ierr)

Description

The SCOTCH archMesh3D routine ﬁlls the SCOTCH Arch structure pointed to

by archptr with the description of a 3D mesh architecture with xdimval ×

ydimval × zdimval processors.

Return values

SCOTCH archMesh3D returns 0 if the 3D mesh target architecture has been

successfully built, and 1 else.

7.4.13 SCOTCH archTleaf

Synopsis

int SCOTCH archTleaf (SCOTCH Arch *

const SCOTCH Num

archptr,

levlnbr,

const SCOTCH Num * sizetab,

const SCOTCH Num * linktab)

scotchfarchtleaf (doubleprecision (*) archdat,

integer*num

integer*num (*)

integer

levlnbr,

sizetab,

linktab,

ierr)

Description

The SCOTCH archTleaf routine ﬁlls the SCOTCH Arch structure pointed to by

archptr with the description of a tree-shaped, hierarchical graph architecture

ꢁ

with ^levlnbr−1sizetab[i] processors. Level 0 is the root of the tree. For

each level i, with 0 ≤ i < levlnbr, sizetab[i] is the number of childs at level

i=0

(i + 1) of each node at level i, and linktab[i] is the cost of communication

between processors the ﬁrst common ancestor of which belongs to this level.

See Section 5.4.2, page 24, for an example of such an architecture.

Return values

SCOTCH archTleaf returns 0 if the tree-leaf target architecture has been suc-

cessfully built, and 1 else.

7.4.14 SCOTCH archTorus2D

Synopsis

int SCOTCH archTorus2D (SCOTCH Arch *

archptr,

const SCOTCH Num xdimval,

const SCOTCH Num ydimval)

scotchfarchtorus2d (doubleprecision (*) archdat,

integer*num

integer

xdimval,

ydimval,

ierr)

Description

The SCOTCH archTorus2D routine ﬁlls the SCOTCH Arch structure pointed to

by archptr with the description of a 2D torus architecture with xdimval ×

ydimval processors.

Return values

SCOTCH archTorus2D returns 0 if the 2D torus target architecture has been

successfully built, and 1 else.

7.4.15 SCOTCH archTorus3D

Synopsis

int SCOTCH archTorus3D (SCOTCH Arch *

archptr,

const SCOTCH Num xdimval,

const SCOTCH Num ydimval,

const SCOTCH Num zdimval)

scotchfarchtorus3d (doubleprecision (*) archdat,

integer*num

integer

xdimval,

ydimval,

zdimval,

ierr)

Description

The SCOTCH archTorus3D routine ﬁlls the SCOTCH Arch structure pointed to

by archptr with the description of a 3D torus architecture with xdimval ×

ydimval × zdimval processors.

Return values

SCOTCH archTorus3D returns 0 if the 3D torus target architecture has been

successfully built, and 1 else.

7.5 Graph handling routines

7.5.1 SCOTCH graphInit

Synopsis

int SCOTCH graphInit (SCOTCH Graph * grafptr)

scotchfgraphinit (doubleprecision (*) grafdat,

integer

ierr)

Description

The SCOTCH graphInit function initializes a SCOTCH Graph structure so as to

make it suitable for future operations. It should be the ﬁrst function to be

called upon a SCOTCH Graph structure. When the graph data is no longer of

use, call function SCOTCH graphExit to free its internal structures.

Return values

SCOTCH graphInit returns 0 if the graph structure has been successfully ini-

tialized, and 1 else.

7.5.2 SCOTCH graphExit

Synopsis

void SCOTCH graphExit (SCOTCH Graph * grafptr)

scotchfgraphexit (doubleprecision (*) grafdat)

Description

The SCOTCH graphExit function frees the contents of a SCOTCH Graph struc-

ture previously initialized by SCOTCH graphInit. All subsequent calls to

SCOTCH graph routines other than SCOTCH graphInit, using this structure

as parameter, may yield unpredictable results.

7.5.3 SCOTCH graphFree

Synopsis

void SCOTCH graphFree (SCOTCH Graph * grafptr)

scotchfgraphfree (doubleprecision (*) grafdat)

Description

The SCOTCH graphFree function frees the graph data of a SCOTCH Graph struc-

ture previously initialized by SCOTCH graphInit, but preserves its internal

data structures. This call is equivalent to a call to SCOTCH graphExit im-

mediately followed by a call to SCOTCH graphInit. Consequently, the given

SCOTCH Graph structure remains ready for subsequent calls to any routine of

the libScotch library.

7.5.4 SCOTCH graphLoad

Synopsis

int SCOTCH graphLoad (SCOTCH Graph * grafptr,

FILE *

stream,

baseval,

flagval)

SCOTCH Num

scotchfgraphload (doubleprecision (*) grafdat,

integer

fildes,

baseval,

flagval,

ierr)

integer*num

integer

Description

The SCOTCH graphLoad routine ﬁlls the SCOTCH Graph structure pointed to

by grafptr with the source graph description available from stream stream

in the Scotch graph format (see section 5.1).

To ease the handling of source graph ﬁles by programs written in C as well as

in Fortran, the base value of the graph to read can be set to 0 or 1, by setting

the baseval parameter to the proper value. A value of -1 indicates that the

graph base should be the same as the one provided in the graph description

that is read from stream.

The flagval value is a combination of the following integer values, that may

be added or bitwise-ored:

Keep vertex and edge weights if they are present in the stream data.

Remove vertex weights. The graph read will have all of its vertex weights

set to one, regardless of what is speciﬁed in the stream data.

Remove edge weights. The graph read will have all of its edge weights

set to one, regardless of what is speciﬁed in the stream data.

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the graph

ﬁle.

Return values

SCOTCH graphLoad returns 0 if the graph structure has been successfully al-

located and ﬁlled with the data read, and 1 else.

7.5.5 SCOTCH graphSave

Synopsis

int SCOTCH graphSave (const SCOTCH Graph * grafptr,

FILE * stream)

scotchfgraphsave (doubleprecision (*) grafdat,

integer

fildes,

ierr)

Description

The SCOTCH graphSave routine saves the contents of the SCOTCH Graph struc-

ture pointed to by grafptr to stream stream, in the Scotch graph format

(see section 5.1).

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the graph

ﬁle.

Return values

SCOTCH graphSave returns 0 if the graph structure has been successfully writ-

ten to stream, and 1 else.

7.5.6 SCOTCH graphBuild

Synopsis

int SCOTCH graphBuild (SCOTCH Graph *

const SCOTCH Num

grafptr,

baseval,

vertnbr,

const SCOTCH Num

const SCOTCH Num * verttab,

const SCOTCH Num * vendtab,

const SCOTCH Num * velotab,

const SCOTCH Num * vlbltab,

const SCOTCH Num

edgenbr,

const SCOTCH Num * edgetab,

const SCOTCH Num * edlotab)

scotchfgraphbuild (doubleprecision (*) grafdat,

integer*num

baseval,

vertnbr,

verttab,

vendtab,

velotab,

vlbltab,

edgenbr,

edgetab,

edlotab,

ierr)

integer*num

integer*num (*)

integer*num

integer*num (*)

integer

Description

The SCOTCH graphBuild routine ﬁlls the source graph structure pointed to

by grafptr with all of the data that are passed to it.

baseval is the graph base value for index arrays (typically 0 for structures

built from C and 1 for structures built from Fortran). vertnbr is the number

of vertices. verttab is the adjacency index array, of size (vertnbr + 1) if

the edge array is compact (that is, if vendtab equals verttab + 1 or NULL),

or of size vertnbr else. vendtab is the adjacency end index array, of size

vertnbr if it is disjoint from verttab. velotab is the vertex load array, of

size vertnbr if it exists. vlbltab is the vertex label array, of size vertnbr if

it exists. edgenbr is the number of arcs (that is, twice the number of edges).

edgetab is the adjacency array, of size at least edgenbr (it can be more if the

edge array is not compact). edlotab is the arc load array, of size edgenbr if

it exists.

The vendtab, velotab, vlbltab and edlotab arrays are optional, and a NULL

pointer can be passed as argument whenever they are not deﬁned. Since, in

Fortran, there is no null reference, passing the scotchfgraphbuild routine a

reference equal to verttab in the velotab or vlbltab ﬁelds makes them be

considered as missing arrays. The same holds for edlotab when it is passed a

reference equal to edgetab. Setting vendtab to refer to one cell after verttab

yields the same result, as it is the exact semantics of a compact vertex array.

To limit memory consumption, SCOTCH graphBuild does not copy array data,

but instead references them in the SCOTCH Graph structure. Therefore, great

care should be taken not to modify the contents of the arrays passed to

SCOTCH graphBuild as long as the graph structure is in use. Every update

of the arrays should be preceded by a call to SCOTCH graphFree, to free in-

ternal graph structures, and eventually followed by a new call to SCOTCH

graphBuild to re-build these internal structures so as to be able to use the

new graph.

To ensure that inconsistencies in user data do not result in an erroneous behav-

ior of the libScotch routines, it is recommended, at least in the development

stage, to call the SCOTCH graphCheck routine on the newly created SCOTCH

Graph structure before calling any other libScotch routine.

Return values

SCOTCH graphBuild returns 0 if the graph structure has been successfully set

with all of the input data, and 1 else.

7.5.7 SCOTCH graphBase

Synopsis

int SCOTCH graphBase (SCOTCH Graph * grafptr,

SCOTCH Num

baseval)

scotchfgraphbase (doubleprecision (*) grafdat,

integer*num

baseval,

integer*num

oldbaseval)

Description

The SCOTCH graphBase routine sets the base of all graph indices according to

the given base value, and returns the old base value. This routine is a helper

for applications that do not handle base values properly.

In Fortan, the old base value is returned in the third parameter of the function

call.

Return values

SCOTCH graphBase returns the old base value.

7.5.8 SCOTCH graphCheck

Synopsis

int SCOTCH graphCheck (const SCOTCH Graph * grafptr)

scotchfgraphcheck (doubleprecision (*) grafdat,

integer

ierr)

Description

The SCOTCH graphCheck routine checks the consistency of the given SCOTCH

Graph structure. It can be used in client applications to determine if a graph

that has been created from used-generated data by means of the SCOTCH

graphBuild routine is consistent, prior to calling any other routines of the

libScotch library.

Return values

SCOTCH graphCheck returns 0 if graph data are consistent, and 1 else.

7.5.9 SCOTCH graphSize

Synopsis

void SCOTCH graphSize (const SCOTCH Graph * grafptr,

SCOTCH Num *

vertptr,

edgeptr)

scotchfgraphsize (doubleprecision (*) grafdat,

integer*num

vertnbr,

edgenbr)

integer*num

Description

The SCOTCH graphSize routine ﬁlls the two areas of type SCOTCH Num pointed

to by vertptr and edgeptr with the number of vertices and arcs (that is, twice

the number of edges) of the given graph pointed to by grafptr, respectively.

Any of these pointers can be set to NULL on input if the corresponding infor-

mation is not needed. Else, the reference to a dummy area can be provided,

where all unwanted data will be written.

This routine is useful to get the size of a graph read by means of the SCOTCH

graphLoad routine, in order to allocate auxiliary arrays of proper sizes. If the

whole structure of the graph is wanted, function SCOTCH graphData should

be preferred.

7.5.10 SCOTCH graphData

Synopsis

void SCOTCH graphData (const SCOTCH Graph * grafptr,

SCOTCH Num *

SCOTCH Num **

SCOTCH Num *

SCOTCH Num **

baseptr,

vertptr,

verttab,

vendtab,

velotab,

vlbltab,

edgeptr,

edgetab,

edlotab)

scotchfgraphdata (doubleprecision (*) grafdat,

integer*num (*)

integer*num

integer*idx

integer*num

integer*idx

integer*num

indxtab,

baseval,

vertnbr,

vertidx,

vendidx,

veloidx,

vlblidx,

edgenbr,

edgeidx,

edloidx)

Description

The SCOTCH graphData routine is the dual of the SCOTCH graphBuild routine.

It is a multiple accessor that returns scalar values and array references.

baseptr is the pointer to a location that will hold the graph base value for

index arrays (typically 0 for structures built from C and 1 for structures built

from Fortran). vertptr is the pointer to a location that will hold the number

of vertices. verttab is the pointer to a location that will hold the reference

to the adjacency index array, of size *vertptr + 1 if the adjacency array

is compact, or of size *vertptr else. vendtab is the pointer to a location

that will hold the reference to the adjacency end index array, and is equal to

verttab + 1 if the adjacency array is compact. velotab is the pointer to a

location that will hold the reference to the vertex load array, of size *vertptr.

vlbltab is the pointer to a location that will hold the reference to the vertex

label array, of size vertnbr. edgeptr is the pointer to a location that will

hold the number of arcs (that is, twice the number of edges). edgetab is the

pointer to a location that will hold the reference to the adjacency array, of

size at least *edgeptr. edlotab is the pointer to a location that will hold the

reference to the arc load array, of size *edgeptr.

Any of these pointers can be set to NULL on input if the corresponding infor-

mation is not needed. Else, the reference to a dummy area can be provided,

where all unwanted data will be written.

Since there are no pointers in Fortran, a speciﬁc mechanism is used to allow

users to access graph arrays. The scotchfgraphdata routine is passed an

integer array, the ﬁrst element of which is used as a base address from which all

other array indices are computed. Therefore, instead of returning references,

the routine returns integers, which represent the starting index of each of the

relevant arrays with respect to the base input array, or vertidx, the index

of verttab, if they do not exist. For instance, if some base array myarray

(1) is passed as parameter indxtab, then the ﬁrst cell of array verttab

will be accessible as myarray(vertidx). In order for this feature to behave

properly, the indxtab array must be word-aligned with the graph arrays.

This is automatically enforced on most systems, but some care should be

taken on systems that allow one to access data that is not word-aligned. On

such systems, declaring the array after a dummy doubleprecision array

can coerce the compiler into enforcing the proper alignment. Also, on 32 64

architectures, such indices can be larger than the size of a regular INTEGER.

This is why the indices to be returned are deﬁned by means of a speciﬁc

integer type. See Section 7.1.4 for more information on this issue.

7.5.11 SCOTCH graphStat

Synopsis

void SCOTCH graphStat (const SCOTCH Graph * grafptr,

SCOTCH Num *

double *

velominptr,

velomaxptr,

velosumptr,

veloavgptr,

velodltptr,

degrminptr,

degrmaxptr,

degravgptr,

degrdltptr,

edlominptr,

edlomaxptr,

edlosumptr,

edloavgptr,

edlodltptr)

double *

SCOTCH Num *

double *

SCOTCH Num *

double *

scotchfgraphstat (doubleprecision (*) grafdat,

integer*num

velomin,

velomax,

velosum,

veloavg,

velodlt,

degrmin,

degrmax,

degravg,

degrdlt,

edlomin,

edlomax,

edlosum,

edloavg,

edlodlt)

integer*num

doubleprecision

integer*num

doubleprecision

integer*num

doubleprecision

Description

The SCOTCH graphStat routine produces some statistics regarding the graph

structure pointed to by grafptr. velomin, velomax, velosum, veloavg and

velodlt are the minimum vertex load, the maximum vertex load, the sum of

all vertex loads, the average vertex load, and the variance of the vertex loads,

respectively. degrmin, degrmax, degravg and degrdlt are the minimum ver-

tex degree, the maximum vertex degree, the average vertex degree, and the

variance of the vertex degrees, respectively. edlomin, edlomax, edlosum,

edloavg and edlodlt are the minimum edge load, the maximum edge load,

the sum of all edge loads, the average edge load, and the variance of the edge

loads, respectively.

7.6 Graph mapping and partitioning routines

The ﬁrst two routines provide high-level functionalities and free the user from the

burden of calling in sequence several of the low-level routines described afterward.

7.6.1 SCOTCH graphPart

Synopsis

int SCOTCH graphPart (const SCOTCH Graph * grafptr,

const SCOTCH Num

const SCOTCH Strat * straptr,

SCOTCH Num * parttab)

partnbr,

scotchfgraphpart (doubleprecision (*) grafdat,

integer*num partnbr,

doubleprecision (*) stradat,

integer*num (*)

parttab,

ierr)

integer

Description

The SCOTCH graphPart routine computes a partition into partnbr parts of the

source graph structure pointed to by grafptr, using the graph partitioning

strategy pointed to by stratptr, and returns the partition data in the array

pointed to by parttab.

The parttab array should have been previously allocated, of a size suﬃcient

to hold as many SCOTCH Num integers as there are vertices in the source graph.

On return, every array cell holds the number of the part to which the corre-

sponding vertex is mapped. Parts are numbered from 0 to partnbr − 1.

Return values

SCOTCH graphPart returns 0 if the partition of the graph has been successfully

computed, and 1 else. In this latter case, the parttab array may however have

been partially or completely ﬁlled, but its content is not signiﬁcant.

7.6.2 SCOTCH graphMap

Synopsis

int SCOTCH graphMap (const SCOTCH Graph * grafptr,

const SCOTCH Arch *

const SCOTCH Strat * straptr,

SCOTCH Num * parttab)

archptr,

scotchfgraphmap (doubleprecision (*) grafdat,

doubleprecision (*) archdat,

doubleprecision (*) stradat,

integer*num (*)

parttab,

ierr)

integer

Description

The SCOTCH graphMap routine computes a mapping of the source graph

structure pointed to by grafptr onto the target architecture pointed to by

archptr, using the mapping strategy pointed to by straptr, and returns the

mapping data in the array pointed to by parttab.

The parttab array should have been previously allocated, of a size suﬃcient

to hold as many SCOTCH Num integers as there are vertices in the source graph.

On return, every cell of the mapping array holds the number of the target

vertex to which the corresponding source vertex is mapped. The numbering

of target values is not based: target vertices are numbered from 0 to the

number of target vertices minus 1.

Return values

SCOTCH graphMap returns 0 if the partition of the graph has been successfully

computed, and 1 else. In this last case, the parttab array may however have

been partially or completely ﬁlled, but its content is not signiﬁcant.

7.6.3 SCOTCH graphMapInit

Synopsis

int SCOTCH graphMapInit (const SCOTCH Graph * grafptr,

SCOTCH Mapping *

const SCOTCH Arch *

SCOTCH Num *

mappptr,

archptr,

parttab)

scotchfgraphmapinit (doubleprecision (*) grafdat,

doubleprecision (*) mappdat,

doubleprecision (*) archdat,

integer*num (*)

parttab,

ierr)

integer

Description

The SCOTCH graphMapInit routine ﬁlls the mapping structure pointed to by

mappptr with all of the data that is passed to it. Thus, all subsequent calls

to ordering routines such as SCOTCH graphMapCompute, using this mapping

structure as parameter, will place mapping results in ﬁeld parttab.

parttab is the pointer to an array of as many SCOTCH Nums as there are vertices

in the graph pointed to by grafptr, and which will receive the indices of the

vertices of the target architecture pointed to by archptr.

It should be the ﬁrst function to be called upon a SCOTCH Mapping structure.

When the mapping structure is no longer of use, call function SCOTCH graph

MapExit to free its internal structures.

Return values

SCOTCH graphMapInit returns 0 if the mapping structure has been success-

fully initialized, and 1 else.

7.6.4 SCOTCH graphMapExit

Synopsis

void SCOTCH graphMapExit (const SCOTCH Graph * grafptr,

SCOTCH Mapping *

mappptr)

scotchfgraphmapexit (doubleprecision (*) grafdat,

doubleprecision (*) mappdat)

Description

The SCOTCH graphMapExit function frees the contents of a SCOTCH Mapping

structure previously initialized by SCOTCH graphMapInit. All subsequent calls

to SCOTCH graphMap* routines other than SCOTCH graphMapInit, using this

structure as parameter, may yield unpredictable results.

7.6.5 SCOTCH graphMapLoad

Synopsis

int SCOTCH graphMapLoad (const SCOTCH Graph * grafptr,

SCOTCH Mapping *

FILE *

mappptr,

stream)

scotchfgraphmapload (doubleprecision (*) grafdat,

doubleprecision (*) mappdat,

integer

fildes,

ierr)

Description

The SCOTCH graphMapLoad routine ﬁlls the SCOTCH Mapping structure pointed

to by mappptr with the mapping data available in the Scotch mapping for-

mat (see section 5.5) from stream stream.

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

mapping ﬁle.

Return values

SCOTCH graphMapLoad returns 0 if the mapping structure has been success-

fully loaded from stream, and 1 else.

7.6.6 SCOTCH graphMapSave

Synopsis

int SCOTCH graphMapSave (const SCOTCH Graph *

const SCOTCH Mapping * mappptr,

FILE * stream)

grafptr,

scotchfgraphmapsave (doubleprecision (*) grafdat,

doubleprecision (*) mappdat,

integer

fildes,

ierr)

Description

The SCOTCH graphMapSave routine saves the contents of the SCOTCH Mapping

structure pointed to by mappptr to stream stream, in the Scotch mapping

format (see section 5.5).

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

mapping ﬁle.

Return values

SCOTCH graphMapSave returns 0 if the mapping structure has been success-

fully written to stream, and 1 else.

7.6.7 SCOTCH graphMapCompute

Synopsis

int SCOTCH graphMapCompute (const SCOTCH Graph * grafptr,

SCOTCH Mapping *

mappptr,

const SCOTCH Strat * straptr)

scotchfgraphmapcompute (doubleprecision (*) grafdat,

doubleprecision (*) mappdat,

doubleprecision (*) stradat,

integer

ierr)

Description

The SCOTCH graphMapCompute routine computes a mapping on the given

SCOTCH Mapping structure pointed to by mappptr using the mapping strategy

pointed to by stratptr.

On return, every cell of the mapping array (see section 7.6.3) holds the number

of the target vertex to which the corresponding source vertex is mapped. The

numbering of target values is not based: target vertices are numbered from 0

to the number of target vertices, minus 1.

Return values

SCOTCH graphMapCompute returns 0 if the mapping has been successfully com-

puted, and 1 else. In this latter case, the mapping array may however have

been partially or completely ﬁlled, but its content is not signiﬁcant.

7.6.8 SCOTCH graphMapView

Synopsis

int SCOTCH graphMapView (const SCOTCH Graph *

const SCOTCH Mapping * mappptr,

FILE * stream)

grafptr,

scotchfgraphmapview (doubleprecision (*) grafdat,

doubleprecision (*) mappdat,

integer

fildes,

ierr)

Description

The SCOTCH mapView routine summarizes statistical information on the map-

ping pointed to by mappptr (load of target processors, number of neighboring

domains, average dilation and expansion, edge cut size, distribution of edge

dilations), and prints these results to stream stream.

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the output

data ﬁle.

Return values

SCOTCH mapView returns 0 if the data has been successfully written to stream,

and 1 else.

7.7 Graph ordering routines

The ﬁrst routine provides high-level functionality and frees the user from the burden

of calling in sequence several of the low-level routines described afterward.

7.7.1 SCOTCH graphOrder

Synopsis

int SCOTCH graphOrder (const SCOTCH Graph * grafptr,

const SCOTCH Strat * straptr,

SCOTCH Num *

permtab,

peritab,

cblkptr,

rangtab,

treetab)

scotchfgraphorder (doubleprecision (*) grafdat,

doubleprecision (*) stradat,

integer*num (*)

integer*num

permtab,

peritab,

cblknbr,

rangtab,

treetab,

ierr)

integer*num (*)

integer

Description

The SCOTCH graphOrder routine computes a block ordering of the unknowns

of the symmetric sparse matrix the adjacency structure of which is represented

by the source graph structure pointed to by grafptr, using the ordering

strategy pointed to by stratptr, and returns ordering data in the scalar

pointed to by cblkptr and the four arrays permtab, peritab, rangtab and

treetab.

The permtab, peritab, rangtab and treetab arrays should have been pre-

viously allocated, of a size suﬃcient to hold as many SCOTCH Num integers as

there are vertices in the source graph, plus one in the case of rangtab. Any

of the ﬁve output ﬁelds can be set to NULL if the corresponding information is

not needed. Since, in Fortran, there is no null reference, passing a reference

to grafptr in these ﬁelds will have the same eﬀect.

On return, permtab holds the direct permutation of the unknowns, that is,

vertex i of the original graph has index permtab[i] in the reordered graph,

while peritab holds the inverse permutation, that is, vertex i in the reordered

graph had index peritab[i] in the original graph. All of these indices are

numbered according to the base value of the source graph: permutation indices

are numbered from baseval to vertnbr + baseval − 1, that is, from 0 to

vertnbr − 1 if the graph base is 0, and from 1 to vertnbr if the graph base

is 1.

The three other result ﬁelds, *cblkptr, rangtab and treetab, contain data

related to the block structure. *cblkptr holds the number of column blocks

of the produced ordering, and rangtab holds the starting indices of each of the

permuted column blocks, in increasing order, so that column block i starts at

index rangtab[i] and ends at index (rangtab[i+1]−1), inclusive, in the new

ordering. treetab holds the separators tree structure, that is, treetab[i] is

the index of the father of column block i in the separators tree, or −1 if column

block i is the root of the separators tree. Please refer to Section 7.2.5 for more

information.

Return values

SCOTCH graphOrder returns 0 if the ordering of the graph has been successfully

computed, and 1 else. In this last case, the rangtab, permtab, and peritab

arrays may however have been partially or completely ﬁlled, but their contents

are not signiﬁcant.

7.7.2 SCOTCH graphOrderInit

Synopsis

int SCOTCH graphOrderInit (const SCOTCH Graph * grafptr,

SCOTCH Ordering *

SCOTCH Num *

ordeptr,

permtab,

peritab,

cblkptr,

rangtab,

treetab)

scotchfgraphorderinit (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer*num (*)

integer*num

permtab,

peritab,

cblknbr,

rangtab,

treetab,

ierr)

integer*num (*)

integer

Description

The SCOTCH graphOrderInit routine ﬁlls the ordering structure pointed to by

ordeptr with all of the data that are passed to it. Thus, all subsequent calls

to ordering routines such as SCOTCH graphOrderCompute, using this ordering

structure as parameter, will place ordering results in ﬁelds permtab, peritab,

*cblkptr, rangtab or treetab, if they are not set to NULL.

permtab is the ordering permutation array, of size vertnbr, peritab is the

inverse ordering permutation array, of size vertnbr, cblkptr is the pointer

to a SCOTCH Num that will receive the number of produced column blocks,

rangtab is the array that holds the column block span information, of size

vertnbr+ 1, and treetab is the array holding the structure of the separators

tree, of size vertnbr. See the above manual page of SCOTCH graphOrder, as

well as section 7.2.5, for an explanation of the semantics of all of these ﬁelds.

The SCOTCH graphOrderInit routine should be the ﬁrst function to be called

upon a SCOTCH Ordering structure for ordering graphs. When the ordering

structure is no longer of use, the SCOTCH graphOrderExit function must be

called, in order to to free its internal structures.

Return values

SCOTCH graphOrderInit returns 0 if the ordering structure has been success-

fully initialized, and 1 else.

7.7.3 SCOTCH graphOrderExit

Synopsis

void SCOTCH graphOrderExit (const SCOTCH Graph * grafptr,

SCOTCH Ordering *

ordeptr)

scotchfgraphorderexit (doubleprecision (*) grafdat,

doubleprecision (*) ordedat)

Description

The SCOTCH graphOrderExit function frees the contents of a SCOTCH

Ordering structure previously initialized by SCOTCH graphOrderInit. All

subsequent calls to SCOTCH graphOrder* routines other than SCOTCH graph

OrderInit, using this structure as parameter, may yield unpredictable results.

7.7.4 SCOTCH graphOrderLoad

Synopsis

int SCOTCH graphOrderLoad (const SCOTCH Graph * grafptr,

SCOTCH Ordering *

FILE *

ordeptr,

stream)

scotchfgraphorderload (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer

fildes,

ierr)

Description

The SCOTCH graphOrderLoad routine ﬁlls the SCOTCH Ordering structure

pointed to by ordeptr with the ordering data available in the Scotch order-

ing format (see section 5.6) from stream stream.

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

ordering ﬁle.

Return values

SCOTCH graphOrderLoad returns 0 if the ordering structure has been success-

fully loaded from stream, and 1 else.

7.7.5 SCOTCH graphOrderSave

Synopsis

int SCOTCH graphOrderSave (const SCOTCH Graph *

grafptr,

const SCOTCH Ordering * ordeptr,

FILE * stream)

scotchfgraphordersave (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer

fildes,

ierr)

Description

The SCOTCH graphOrderSave routine saves the contents of the SCOTCH

Ordering structure pointed to by ordeptr to stream stream, in the Scotch

ordering format (see section 5.6).

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

ordering ﬁle.

Return values

SCOTCH graphOrderSave returns 0 if the ordering structure has been success-

fully written to stream, and 1 else.

7.7.6 SCOTCH graphOrderSaveMap

Synopsis

int SCOTCH graphOrderSaveMap (const SCOTCH Graph *

const SCOTCH Ordering * ordeptr,

FILE * stream)

grafptr,

scotchfgraphordersavemap (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer

fildes,

ierr)

Description

The SCOTCH graphOrderSaveMap routine saves the block partitioning data as-

sociated with the SCOTCH Ordering structure pointed to by ordeptr to stream

stream, in the Scotch mapping format (see section 5.5). A target domain

number is associated with every block, such that all node vertices belonging

to the same block are shown as belonging to the same target vertex. The

resulting mapping ﬁle can be used by the gout program (see Section 6.3.12)

to produce pictures showing the diﬀerent separators and blocks.

Fortran users must use the PXFFILENO or FNUM functions to obtain the num-

ber of the Unix ﬁle descriptor fildes associated with the logical unit of the

mapping ﬁle.

Return values

SCOTCH graphOrderSaveMap returns 0 if the ordering structure has been suc-

cessfully written to stream, and 1 else.

7.7.7 SCOTCH graphOrderSaveTree

Synopsis

int SCOTCH graphOrderSaveTree (const SCOTCH Graph *

const SCOTCH Ordering * ordeptr,

FILE * stream)

grafptr,

scotchfgraphordersavetree (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer

fildes,

ierr)

Description

The SCOTCH graphOrderSaveTree routine saves the tree hierarchy informa-

tion associated with the SCOTCH Ordering structure pointed to by ordeptr

to stream stream.

The format of the tree output ﬁle resembles the one of a mapping or ordering

ﬁle: it is made up of as many lines as there are vertices in the ordering. Each

of these lines holds two integer numbers. The ﬁrst one is the index or the

label of the vertex, and the second one is the index of its parent node in the

separators tree, or −1 if the vertex belongs to a root node.

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the tree

mapping ﬁle.

Return values

SCOTCH graphOrderSaveTree returns 0 if the separators tree structure has

been successfully written to stream, and 1 else.

7.7.8 SCOTCH graphOrderCheck

Synopsis

int SCOTCH graphOrderCheck (const SCOTCH Graph *

grafptr,

const SCOTCH Ordering * ordeptr)

scotchfgraphordercheck (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer

ierr)

Description

The SCOTCH graphOrderCheck routine checks the consistency of the given

SCOTCH Ordering structure pointed to by ordeptr.

Return values

SCOTCH graphOrderCheck returns 0 if ordering data are consistent, and 1 else.

7.7.9 SCOTCH graphOrderCompute

Synopsis

int SCOTCH graphOrderCompute (const SCOTCH Graph * grafptr,

SCOTCH Ordering *

ordeptr,

const SCOTCH Strat * straptr)

scotchfgraphordercompute (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

doubleprecision (*) stradat,

integer

ierr)

Description

The SCOTCH graphOrderCompute routine computes a block ordering of the

graph structure pointed to by grafptr, using the ordering strategy pointed

to by stratptr, and stores its result in the ordering structure pointed to by

ordeptr.

On return, the ordering structure holds a block ordering of the given graph

(see section 7.7.2 for a description of the ordering ﬁelds).

Return values

SCOTCH graphOrderCompute returns 0 if the ordering has been successfully

computed, and 1 else. In this latter case, the ordering arrays may however

have been partially or completely ﬁlled, but their contents are not signiﬁcant.

7.7.10 SCOTCH graphOrderComputeList

Synopsis

int SCOTCH graphOrderComputeList (const SCOTCH Graph * grafptr,

SCOTCH Ordering *

SCOTCH Num

ordeptr,

listnbr,

listtab,

SCOTCH Num *

const SCOTCH Strat * straptr)

scotchfgraphordercompute (doubleprecision (*) grafdat,

doubleprecision (*) ordedat,

integer*num

listnbr,

listtab,

integer*num (*)

doubleprecision (*) stradat,

integer

ierr)

Description

The SCOTCH graphOrderComputeList routine computes a block ordering of a

subgraph of the graph structure pointed to by grafptr, using the ordering

strategy pointed to by stratptr, and stores its result in the ordering structure

pointed to by ordeptr. The induced subgraph is described by means of a

vertex list: listnbr holds the number of vertices to keep in the induced

subgraph, the indices of which are given, in any order, in the listtab array.

On return, the ordering structure holds a block ordering of the induced sub-

graph (see section 7.2.5 for a description of the ordering ﬁelds). To com-

pute this ordering, graph ordering methods such as the minimum degree and

minimum ﬁll methods will base on the original degree of the induced graph

vertices, their non-induced neighbors being considered as halo vertices (see

Section 3.2.3 for more information on halo vertices).

Because an ordering always refers to the full graph, the ordering com-

puted by SCOTCH graphOrderComputeList is divided into two distinct parts:

the induced graph vertices are ordered by applying to the induced graph

the strategy provided by the stratptr parameter, while non-induced ver-

tex are ordered consecutively with the highest available indices. Conse-

quently, the permuted indices of induced vertices range from baseval to

(listnbr + baseval − 1), while the permuted indices of the remaining ver-

tices range from (listnbr + baseval) to (vertnbr + baseval − 1), inclusive.

The separation tree yielded by SCOTCH graphOrderComputeList reﬂects this

property: it is made of two branches, the ﬁrst one corresponding to the in-

duced subgraph, and the second one to the remaining vertices. Since these

two subgraphs are not considered to be connected, both will have their own

root, represented by a −1 value in the treetab array of the ordering.

Return values

SCOTCH graphOrderComputeList returns 0 if the ordering has been success-

fully computed, and 1 else. In this latter case, the ordering arrays may however

have been partially or completely ﬁlled, but their contents are not signiﬁcant.

7.8 Mesh handling routines

7.8.1 SCOTCH meshInit

Synopsis

int SCOTCH meshInit (SCOTCH Mesh * meshptr)

scotchfmeshinit (doubleprecision (*) meshdat,

integer

ierr)

Description

The SCOTCH meshInit function initializes a SCOTCH Mesh structure so as to

make it suitable for future operations. It should be the ﬁrst function to be

called upon a SCOTCH Mesh structure. When the mesh data is no longer of

use, call function SCOTCH meshExit to free its internal structures.

Return values

SCOTCH meshInit returns 0 if the mesh structure has been successfully ini-

tialized, and 1 else.

7.8.2 SCOTCH meshExit

Synopsis

void SCOTCH meshExit (SCOTCH Mesh * meshptr)

scotchfmeshexit (doubleprecision (*) meshdat)

Description

The SCOTCH meshExit function frees the contents of a SCOTCH Mesh structure

previously initialized by SCOTCH meshInit. All subsequent calls to SCOTCH

mesh* routines other than SCOTCH meshInit, using this structure as parame-

ter, may yield unpredictable results.

7.8.3 SCOTCH meshLoad

Synopsis

int SCOTCH meshLoad (SCOTCH Mesh * meshptr,

FILE *

stream,

SCOTCH Num

baseval)

scotchfmeshload (doubleprecision (*) meshdat,

integer

fildes,

baseval,

ierr)

integer*num

integer

Description

The SCOTCH meshLoad routine ﬁlls the SCOTCH Mesh structure pointed to by

meshptr with the source mesh description available from stream stream in

the Scotch mesh format (see section 5.2).

To ease the handling of source mesh ﬁles by programs written in C as well as

in Fortran, The base value of the mesh to read can be set to 0 or 1, by setting

the baseval parameter to the proper value. A value of -1 indicates that the

mesh base should be the same as the one provided in the mesh description

that is read from stream.

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the mesh

ﬁle.

Return values

SCOTCH meshLoad returns 0 if the mesh structure has been successfully allo-

cated and ﬁlled with the data read, and 1 else.

7.8.4 SCOTCH meshSave

Synopsis

int SCOTCH meshSave (const SCOTCH Mesh * meshptr,

FILE * stream)

scotchfmeshsave (doubleprecision (*) meshdat,

integer

fildes,

ierr)

Description

The SCOTCH meshSave routine saves the contents of the SCOTCH Mesh structure

pointed to by meshptr to stream stream, in the Scotch mesh format (see

section 5.2).

Fortran users must use the PXFFILENO or FNUM functions to obtain the number

of the Unix ﬁle descriptor fildes associated with the logical unit of the mesh

ﬁle.

Return values

SCOTCH meshSave returns 0 if the mesh structure has been successfully written

to stream, and 1 else.

7.8.5 SCOTCH meshBuild

Synopsis

int SCOTCH meshBuild (SCOTCH Mesh *

const SCOTCH Num

meshptr,

velmbas,

vnodbas,

velmnbr,

vnodnbr,

const SCOTCH Num

const SCOTCH Num * verttab,

const SCOTCH Num * vendtab,

const SCOTCH Num * velotab,

const SCOTCH Num * vnlotab,

const SCOTCH Num * vlbltab,

const SCOTCH Num

edgenbr,

const SCOTCH Num * edgetab)

scotchfmeshbuild (doubleprecision (*) meshdat,

integer*num

velmbas,

vnodbas,

velmnbr,

vnodnbr,

verttab,

vendtab,

velotab,

vnlotab,

vlbltab,

edgenbr,

edgetab,

ierr)

integer*num

integer*num (*)

integer*num

integer*num (*)

integer*num

Description

The SCOTCH meshBuild routine ﬁlls the source mesh structure pointed to by

meshptr with all of the data that is passed to it.

velmbas and vnodbas are the base values for the element and node ver-

tices, respectively. velmnbr and vnodnbr are the number of element and

node vertices, respectively, such that either velmbas+ velmnbr = vnodnbr or

vnodbas+vnodnbr = velmnbr holds, and typically min(velmbas, vnodbas) is

0 for structures built from C and 1 for structures built from Fortran. verttab

is the adjacency index array, of size (velmnbr + vnodnbr + 1) if the edge ar-

ray is compact (that is, if vendtab equals vendtab + 1 or NULL), or of size

(velmnbr + vnodnbr1) else. vendtab is the adjacency end index array, of size

(velmnbr + vnodnbr) if it is disjoint from verttab. velotab is the element

vertex load array, of size velmnbr if it exists. vnlotab is the node vertex load

array, of size vnodnbr if it exists. vlbltab is the vertex label array, of size

(velmnbr+vnodnbr) if it exists. edgenbr is the number of arcs (that is, twice

the number of edges). edgetab is the adjacency array, of size at least edgenbr

(it can be more if the edge array is not compact).

The vendtab, velotab, vnlotab and vlbltab arrays are optional, and a NULL

pointer can be passed as argument whenever they are not deﬁned. Since, in

Fortran, there is no null reference, passing the scotchfmeshbuild routine a

reference equal to verttab in the velotab, vnlotab or vlbltab ﬁelds makes

them be considered as missing arrays. Setting vendtab to refer to one cell

after verttab yields the same result, as it is the exact semantics of a compact

vertex array.

To limit memory consumption, SCOTCH meshBuild does not copy array data,

but instead references them in the SCOTCH Mesh structure. Therefore, great

care should be taken not to modify the contents of the arrays passed to

SCOTCH meshBuild as long as the mesh structure is in use. Every update

of the arrays should be preceded by a call to SCOTCH meshExit, to free in-

ternal mesh structures, and eventually followed by a new call to SCOTCH

meshBuild to re-build these internal structures so as to be able to use the

new mesh.

To ensure that inconsistencies in user data do not result in an erroneous behav-

ior of the libScotch routines, it is recommended, at least in the development

stage, to call the SCOTCH meshCheck routine on the newly created SCOTCH

Mesh structure, prior to any other calls to libScotch routines.

Return values

SCOTCH meshBuild returns 0 if the mesh structure has been successfully set

with all of the input data, and 1 else.

7.8.6 SCOTCH meshCheck

Synopsis

int SCOTCH meshCheck (const SCOTCH Mesh * meshptr)

scotchfmeshcheck (doubleprecision (*) meshdat,

integer

ierr)

Description

The SCOTCH meshCheck routine checks the consistency of the given SCOTCH

Mesh structure. It can be used in client applications to determine if a mesh

that has been created from used-generated data by means of the SCOTCH

meshBuild routine is consistent, prior to calling any other routines of the

libScotch library.

Return values

SCOTCH meshCheck returns 0 if mesh data are consistent, and 1 else.

7.8.7 SCOTCH meshSize

Synopsis

void SCOTCH meshSize (const SCOTCH Mesh * meshptr,

SCOTCH Num *

velmptr,

vnodptr,

edgeptr)

scotchfmeshsize (doubleprecision (*) meshdat,

integer*num

velmnbr,

vnodnbr,

edgenbr)

Description

The SCOTCH meshSize routine ﬁlls the three areas of type SCOTCH Num pointed

to by velmptr, vnodptr and edgeptr with the number of element vertices,

node vertices and arcs (that is, twice the number of edges) of the given mesh

pointed to by meshptr, respectively.

Any of these pointers can be set to NULL on input if the corresponding infor-

mation is not needed. Else, the reference to a dummy area can be provided,

where all unwanted data will be written.

This routine is useful to get the size of a mesh read by means of the SCOTCH

meshLoad routine, in order to allocate auxiliary arrays of proper sizes. If the

whole structure of the mesh is wanted, function SCOTCH meshData should be

preferred.

7.8.8 SCOTCH meshData

Synopsis

void SCOTCH meshData (const SCOTCH Mesh * meshptr,

SCOTCH Num *

SCOTCH Num **

SCOTCH Num *

SCOTCH Num **

SCOTCH Num *

vebaptr,

vnbaptr,

velmptr,

vnodptr,

verttab,

vendtab,

velotab,

vnlotab,

vlbltab,

edgeptr,

edgetab,

degrptr)

scotchfmeshdata (doubleprecision (*) meshdat,

integer*num (*)

integer*num

integer*idx

integer*num

integer*idx

integer*num

indxtab,

velobas,

vnlobas,

velmnbr,

vnodnbr,

vertidx,

vendidx,

veloidx,

vnloidx,

vlblidx,

edgenbr,

edgeidx,

degrmax)

Description

The SCOTCH meshData routine is the dual of the SCOTCH meshBuild routine.

It is a multiple accessor that returns scalar values and array references.

vebaptr and vnbaptr are pointers to locations that will hold the mesh base

value for elements and nodes, respectively (the minimum of these two val-

ues is typically 0 for structures built from C and 1 for structures built from

Fortran). velmptr and vnodptr are pointers to locations that will hold the

number of element and node vertices, respectively. verttab is the pointer

to a location that will hold the reference to the adjacency index array, of

size (*velmptr + *vnodptr + 1) if the adjacency array is compact, or of size

(*velmptr+*vnodptr) else. vendtab is the pointer to a location that will hold

100

the reference to the adjacency end index array, and is equal to verttab + 1

if the adjacency array is compact. velotab and vnlotab are pointers to

locations that will hold the reference to the element and node vertex load

arrays, of sizes *velmptr and *vnodptr, respectively. vlbltab is the pointer

to a location that will hold the reference to the vertex label array, of size

(*velmptr + *vnodptr). edgeptr is the pointer to a location that will hold

the number of arcs (that is, twice the number of edges). edgetab is the pointer

to a location that will hold the reference to the adjacency array, of size at least

edgenbr. degrptr is the pointer to a location that will hold the maximum

vertex degree computed across all element and node vertices.

Any of these pointers can be set to NULL on input if the corresponding infor-

mation is not needed. Else, the reference to a dummy area can be provided,

where all unwanted data will be written.

Since there are no pointers in Fortran, a speciﬁc mechanism is used to allow

users to access mesh arrays. The scotchfmeshdata routine is passed an inte-

ger array, the ﬁrst element of which is used as a base address from which all

other array indices are computed. Therefore, instead of returning references,

the routine returns integers, which represent the starting index of each of the

relevant arrays with respect to the base input array, or vertidx, the index

of verttab, if they do not exist. For instance, if some base array myarray

(1) is passed as parameter indxtab, then the ﬁrst cell of array verttab will

be accessible as myarray(vertidx). In order for this feature to behave prop-

erly, the indxtab array must be word-aligned with the mesh arrays. This is

automatically enforced on most systems, but some care should be taken on

systems that allow one to access data that is not word-aligned. On such sys-

tems, declaring the array after a dummy doubleprecision array can coerce

the compiler into enforcing the proper alignment. Also, on 32 64 architec-

tures, such indices can be larger than the size of a regular INTEGER. This is

why the indices to be returned are deﬁned by means of a speciﬁc integer type.

See Section 7.1.4 for more information on this issue.

7.8.9 SCOTCH meshStat

Synopsis

void SCOTCH meshStat (const SCOTCH Mesh * meshptr,

SCOTCH Num *

double *

vnlominptr,

vnlomaxptr,

vnlosumptr,

vnloavgptr,

vnlodltptr,

edegminptr,

edegmaxptr,

edegavgptr,

edegdltptr,

ndegminptr,

ndegmaxptr,

ndegavgptr,

ndegdltptr)

double *

SCOTCH Num *

double *

SCOTCH Num *

double *

101

scotchfmeshstat (doubleprecision (*) meshdat,

integer*num

vnlomin,

vnlomax,

vnlosum,

vnloavg,

vnlodlt,

edegmin,

edegmax,

edegavg,

edegdlt,

ndegmin,

ndegmax,

ndegavg,

ndegdlt)

integer*num

doubleprecision

integer*num

doubleprecision

integer*num

doubleprecision

Description

The SCOTCH meshStat routine produces some statistics regarding the mesh

structure pointed to by meshptr. vnlomin, vnlomax, vnlosum, vnloavg and

vnlodlt are the minimum node vertex load, the maximum node vertex load,

the sum of all node vertex loads, the average node vertex load, and the vari-

ance of the node vertex loads, respectively. edegmin, edegmax, edegavg and

edegdlt are the minimum element vertex degree, the maximum element ver-

tex degree, the average element vertex degree, and the variance of the element

vertex degrees, respectively. ndegmin, ndegmax, ndegavg and ndegdlt are the

minimum element vertex degree, the maximum element vertex degree, the av-

erage element vertex degree, and the variance of the element vertex degrees,

respectively.

7.8.10 SCOTCH meshGraph

Synopsis

int SCOTCH meshGraph (const SCOTCH Mesh * meshptr,

SCOTCH Graph *

grafptr)

scotchfmeshgraph (doubleprecision (*) meshdat,

doubleprecision (*) grafdat,

integer

ierr)

Description

The SCOTCH meshGraph routine builds a graph from a mesh. It creates in the

SCOTCH Graph structure pointed to by grafptr a graph having as many ver-

tices as there are nodes in the SCOTCH Mesh structure pointed to by meshptr,

and where there is an edge between any two graph vertices if and only if

there exists in the mesh an element containing both of the associated nodes.

Consequently, all of the elements of the mesh are turned into cliques in the

resulting graph.

102

In order to save memory space as well as computation time, in the current

implementation of SCOTCH meshGraph, some mesh arrays are shared with the

graph structure. Therefore, one should make sure that the graph must no

longer be used after the mesh structure is freed. The graph structure can be

freed before or after the mesh structure, but must not be used after the mesh

structure is freed.

Return values

SCOTCH meshGraph returns 0 if the graph structure has been successfully al-

located and ﬁlled, and 1 else.

7.9 Mesh ordering routines

The ﬁrst routine provides high-level functionality and frees the user from the burden

of calling in sequence several of the low-level routines described afterward.

7.9.1 SCOTCH meshOrder

Synopsis

int SCOTCH meshOrder (const SCOTCH Mesh *

meshptr,

const SCOTCH Strat * straptr,

SCOTCH Num *

permtab,

peritab,

cblkptr,

rangtab,

treetab)

scotchfmeshorder (doubleprecision (*) meshdat,

doubleprecision (*) stradat,

integer*num (*)

integer*num

permtab,

peritab,

cblknbr,

rangtab,

treetab,

ierr)

integer*num (*)

integer

Description

The SCOTCH meshOrder routine computes a block ordering of the unknowns of

the symmetric sparse matrix the adjacency structure of which is represented

by the elements that connect the nodes of the source mesh structure pointed to

by meshptr, using the ordering strategy pointed to by stratptr, and returns

ordering data in the scalar pointed to by cblkptr and the four arrays permtab,

peritab, rangtab and treetab.

The permtab, peritab, rangtab and treetab arrays should have been pre-

viously allocated, of a size suﬃcient to hold as many SCOTCH Num integers as

there are node vertices in the source mesh, plus one in the case of rangtab.

Any of the ﬁve output ﬁelds can be set to NULL if the corresponding infor-

mation is not needed. Since, in Fortran, there is no null reference, passing a

reference to meshptr in these ﬁelds will have the same eﬀect.

103

On return, permtab holds the direct permutation of the unknowns, that is,

node vertex i of the original mesh has index permtab[i] in the reordered

mesh, while peritab holds the inverse permutation, that is, node vertex i

in the reordered mesh had index peritab[i] in the original mesh. All of

these indices are numbered according to the base value of the source mesh:

permutation indices are numbered from min(velmbas, vnodbas) to vnodnbr+

min(velmbas, vnodbas) − 1, that is, from 0 to vnodnbr − 1 if the mesh base

is 0, and from 1 to vnodnbr if the mesh base is 1. The base value for mesh

orderings is taken as min(velmbas, vnodbas), and not just as vnodbas, such

that orderings that are computed on some mesh have exactly the same index

range as orderings that would be computed on the graph obtained from the

original mesh by means of the SCOTCH meshGraph routine.