ReLAPACK Configuration
======================

ReLAPACK has two configuration files: `make.inc`, which is included by the
Makefile, and `config.h`, which is included in the source files.

Build and Testing Environment
-----------------------------

The build environment (compiler and flags) and the test configuration (linker
flags for BLAS and LAPACK) are specified in `make.inc`. The test matrix size
and error bounds are defined in `test/config.h`.

The library `librelapack.a` is compiled by invoking `make`. The tests are run
with either `make test` or by calling `make` in the `test` folder.

BLAS/LAPACK complex function interfaces
---------------------------------------

For BLAS and LAPACK functions that return a complex number, there exist two
conflicting (FORTRAN compiler dependent) calling conventions: either the result
is returned as a `struct` of two floating point numbers, or an additional first
argument with a pointer to such a `struct` is used. By default, ReLAPACK uses
the former (which is what gfortran uses), but it can switch to the latter by
setting `COMPLEX_FUNCTIONS_AS_ROUTINES` (or explicitly the BLAS and LAPACK
specific counterparts) to `1` in `config.h`.

**For MKL, `COMPLEX_FUNCTIONS_AS_ROUTINES` must be set to `1`.**

(Using the wrong convention will break `ctrsyl` and `ztrsyl`, and the test cases
will segfault or return errors on the order of 1 or larger.)

BLAS extension `xgemmt`
-----------------------

The LDL decompositions require a general matrix-matrix product, called
`xgemmt`, that updates only a triangular matrix. If the BLAS implementation
linked against provides such a routine, set the flag `HAVE_XGEMMT` to `1` in
`config.h`; otherwise, ReLAPACK uses its own recursive implementation of these
kernels.

`xgemmt` is provided by MKL.

Routine Selection
-----------------

ReLAPACK's routines are named `RELAPACK_X` (e.g., `RELAPACK_dgetrf`).
If the corresponding `INCLUDE_X` flag in `config.h` (e.g., `INCLUDE_DGETRF`) is
set to `1`, ReLAPACK additionally provides a wrapper under the LAPACK name
(e.g., `dgetrf_`). By default, wrappers for all routines are enabled.

Crossover Size
--------------

The crossover size determines the matrix size below which ReLAPACK's recursive
algorithms switch to LAPACK's unblocked routines, to avoid BLAS Level 3 calls
on tiny matrices. The crossover size is set in `config.h` and can be chosen
either globally for the entire library, by operation, or individually by
routine.

Allowing Temporary Buffers
--------------------------

Two of ReLAPACK's routines make use of temporary buffers, which are allocated
and freed within ReLAPACK. Setting `ALLOW_MALLOC` (or one of the routine
specific counterparts) to `0` in `config.h` will disable these buffers. The
affected routines are:

* `xsytrf`: The LDL decomposition requires a buffer of size n^2 / 2. As in
  LAPACK, this size can be queried by setting `lWork = -1`, and the passed
  buffer will be used if it is large enough; a local buffer is allocated only
  if it is not. The advantage of this mechanism is that ReLAPACK works
  seamlessly even with codes that statically provide too little memory,
  instead of breaking them.

* `xsygst`: The reduction of a real symmetric-definite generalized eigenproblem
  to standard form can use an auxiliary buffer of size n^2 / 2 to avoid
  redundant computations. It thereby performs about 30% fewer FLOPs than
  LAPACK.

FORTRAN symbol names
--------------------

ReLAPACK is commonly linked to BLAS and LAPACK with standard FORTRAN
interfaces. Since these libraries usually append an underscore to their symbol
names, ReLAPACK has configuration switches in `config.h` to adjust the
corresponding routine names.
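
As an illustration of how such a symbol-name switch can work in C, the
following is a minimal sketch and not ReLAPACK's actual code. The names
`USE_UNDERSCORE`, `FORTRAN_NAME`, and the stand-in routine `demo` are
hypothetical and chosen only for this example; consult `config.h` for the real
option names.

```c
/* Hypothetical switch (not ReLAPACK's actual option name):
 * 1 = the linked library exports names with a trailing underscore (demo_),
 * 0 = it exports the bare name (demo). */
#define USE_UNDERSCORE 1

/* Two-level macros so that arguments are fully expanded before pasting. */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)

#if USE_UNDERSCORE
#define FORTRAN_NAME(name) PASTE(name, _)
#else
#define FORTRAN_NAME(name) name
#endif

/* Turn the expanded symbol into a string literal for inspection. */
#define STR_(x) #x
#define STR(x) STR(x_placeholder_unused)
#undef STR
#define STR(x) STR_(x)

/* Stand-in for an external FORTRAN routine. In a real build this symbol
 * would come from the BLAS or LAPACK library, not from C code. */
int FORTRAN_NAME(demo)(void) { return 42; }

/* C code calls the routine through the macro, so the switch above is the
 * only place that has to change when the naming convention differs. */
int call_demo(void) { return FORTRAN_NAME(demo)(); }

/* Returns the symbol name the macro resolves to, as a string. */
const char *demo_symbol(void) { return STR(FORTRAN_NAME(demo)); }
```

With `USE_UNDERSCORE` set to `1`, `FORTRAN_NAME(demo)` expands to `demo_`;
with `0`, it expands to `demo`. Routing every external call through one macro
keeps the naming convention in a single configuration line.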