3 * =========== DOCUMENTATION ===========
5 * Online html documentation available at
6 * http://www.netlib.org/lapack/explore-html/
9 *> Download ZGEJSV + dependencies
10 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/zgejsv.f">
12 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/zgejsv.f">
14 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/zgejsv.f">
21 * SUBROUTINE ZGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
22 * M, N, A, LDA, SVA, U, LDU, V, LDV,
23 * CWORK, LWORK, RWORK, LRWORK, IWORK, INFO )
25 * .. Scalar Arguments ..
27 * INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
29 * .. Array Arguments ..
30 * COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ), CWORK( LWORK )
31 * DOUBLE PRECISION SVA( N ), RWORK( LRWORK )
33 * CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
42 *> ZGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
43 *> matrix [A], where M >= N. The SVD of [A] is written as
45 *> [A] = [U] * [SIGMA] * [V]^*,
47 *> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
48 *> diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
49 *> [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
50 *> the singular values of [A]. The columns of [U] and [V] are the left and
51 *> the right singular vectors of [A], respectively. The matrices [U] and [V]
52 *> are computed and stored in the arrays U and V, respectively. The diagonal
53 *> of [SIGMA] is computed and stored in the array SVA.
61 *> JOBA is CHARACTER*1
62 *> Specifies the level of accuracy:
63 *> = 'C': This option works well (high relative accuracy) if A = B * D,
64 *> with well-conditioned B and arbitrary diagonal matrix D.
65 *> The accuracy cannot be spoiled by COLUMN scaling. The
66 *> accuracy of the computed output depends on the condition of
67 *> B, and the procedure aims at the best theoretical accuracy.
68 *> The relative error max_{i=1:N}|d sigma_i| / sigma_i is
69 *> bounded by f(M,N)*epsilon* cond(B), independent of D.
70 *> The input matrix is preprocessed with the QRF with column
71 *> pivoting. This initial preprocessing and preconditioning by
72 *> a rank revealing QR factorization is common for all values of
73 *> JOBA. Additional actions are specified as follows:
74 *> = 'E': Computation as with 'C' with an additional estimate of the
75 *> condition number of B. It provides a realistic error bound.
76 *> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
77 *> D1, D2, and well-conditioned matrix C, this option gives
78 *> higher accuracy than the 'C' option. If the structure of the
79 *> input matrix is not known, and relative accuracy is
80 *> desirable, then this option is advisable. The input matrix A
81 *> is preprocessed with QR factorization with FULL (row and
83 *> = 'G' Computation as with 'F' with an additional estimate of the
84 *> condition number of B, where A=D*B. If A has heavily weighted
85 *> rows, then using this condition number gives too pessimistic
87 *> = 'A': Small singular values are the noise and the matrix is treated
88 *> as numerically rank defficient. The error in the computed
89 *> singular values is bounded by f(m,n)*epsilon*||A||.
90 *> The computed SVD A = U * S * V^* restores A up to
91 *> f(m,n)*epsilon*||A||.
92 *> This gives the procedure the licence to discard (set to zero)
93 *> all singular values below N*epsilon*||A||.
94 *> = 'R': Similar as in 'A'. Rank revealing property of the initial
95 *> QR factorization is used do reveal (using triangular factor)
96 *> a gap sigma_{r+1} < epsilon * sigma_r in which case the
97 *> numerical RANK is declared to be r. The SVD is computed with
98 *> absolute error bounds, but more accurately than with 'A'.
103 *> JOBU is CHARACTER*1
104 *> Specifies whether to compute the columns of U:
105 *> = 'U': N columns of U are returned in the array U.
106 *> = 'F': full set of M left sing. vectors is returned in the array U.
107 *> = 'W': U may be used as workspace of length M*N. See the description
109 *> = 'N': U is not computed.
114 *> JOBV is CHARACTER*1
115 *> Specifies whether to compute the matrix V:
116 *> = 'V': N columns of V are returned in the array V; Jacobi rotations
117 *> are not explicitly accumulated.
118 *> = 'J': N columns of V are returned in the array V, but they are
119 *> computed as the product of Jacobi rotations. This option is
120 *> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
121 *> = 'W': V may be used as workspace of length N*N. See the description
123 *> = 'N': V is not computed.
128 *> JOBR is CHARACTER*1
129 *> Specifies the RANGE for the singular values. Issues the licence to
130 *> set to zero small positive singular values if they are outside
131 *> specified range. If A .NE. 0 is scaled so that the largest singular
132 *> value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues
133 *> the licence to kill columns of A whose norm in c*A is less than
134 *> SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
135 *> where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E').
136 *> = 'N': Do not kill small columns of c*A. This option assumes that
137 *> BLAS and QR factorizations and triangular solvers are
138 *> implemented to work in that range. If the condition of A
139 *> is greater than BIG, use ZGESVJ.
140 *> = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
141 *> (roughly, as described above). This option is recommended.
142 *> ===========================
143 *> For computing the singular values in the FULL range [SFMIN,BIG]
149 *> JOBT is CHARACTER*1
150 *> If the matrix is square then the procedure may determine to use
151 *> transposed A if A^* seems to be better with respect to convergence.
152 *> If the matrix is not square, JOBT is ignored. This is subject to
153 *> changes in the future.
154 *> The decision is based on two values of entropy over the adjoint
155 *> orbit of A^* * A. See the descriptions of WORK(6) and WORK(7).
156 *> = 'T': transpose if entropy test indicates possibly faster
157 *> convergence of Jacobi process if A^* is taken as input. If A is
158 *> replaced with A^*, then the row pivoting is included automatically.
159 *> = 'N': do not speculate.
160 *> This option can be used to compute only the singular values, or the
161 *> full SVD (U, SIGMA and V). For only one set of singular vectors
162 *> (U or V), the caller should provide both U and V, as one of the
163 *> matrices is used as workspace if the matrix A is transposed.
164 *> The implementer can easily remove this constraint and make the
165 *> code more complicated. See the descriptions of U and V.
170 *> JOBP is CHARACTER*1
171 *> Issues the licence to introduce structured perturbations to drown
172 *> denormalized numbers. This licence should be active if the
173 *> denormals are poorly implemented, causing slow computation,
174 *> especially in cases of fast convergence (!). For details see [1,2].
175 *> For the sake of simplicity, this perturbations are included only
176 *> when the full SVD or only the singular values are requested. The
177 *> implementer/user can easily add the perturbation for the cases of
178 *> computing one set of singular vectors.
179 *> = 'P': introduce perturbation
180 *> = 'N': do not perturb
186 *> The number of rows of the input matrix A. M >= 0.
192 *> The number of columns of the input matrix A. M >= N >= 0.
197 *> A is COMPLEX*16 array, dimension (LDA,N)
198 *> On entry, the M-by-N matrix A.
204 *> The leading dimension of the array A. LDA >= max(1,M).
209 *> SVA is DOUBLE PRECISION array, dimension (N)
211 *> - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
212 *> computation SVA contains Euclidean column norms of the
213 *> iterated matrices in the array A.
214 *> - For WORK(1) .NE. WORK(2): The singular values of A are
215 *> (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
216 *> sigma_max(A) overflows or if small singular values have been
217 *> saved from underflow by scaling the input matrix A.
218 *> - If JOBR='R' then some of the singular values may be returned
219 *> as exact zeros obtained by "set to zero" because they are
220 *> below the numerical rank threshold or are denormalized numbers.
225 *> U is COMPLEX*16 array, dimension ( LDU, N )
226 *> If JOBU = 'U', then U contains on exit the M-by-N matrix of
227 *> the left singular vectors.
228 *> If JOBU = 'F', then U contains on exit the M-by-M matrix of
229 *> the left singular vectors, including an ONB
230 *> of the orthogonal complement of the Range(A).
231 *> If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
232 *> then U is used as workspace if the procedure
233 *> replaces A with A^*. In that case, [V] is computed
234 *> in U as left singular vectors of A^* and then
235 *> copied back to the V array. This 'W' option is just
236 *> a reminder to the caller that in this case U is
237 *> reserved as workspace of length N*N.
238 *> If JOBU = 'N' U is not referenced, unless JOBT='T'.
244 *> The leading dimension of the array U, LDU >= 1.
245 *> IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
250 *> V is COMPLEX*16 array, dimension ( LDV, N )
251 *> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
252 *> the right singular vectors;
253 *> If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
254 *> then V is used as workspace if the pprocedure
255 *> replaces A with A^*. In that case, [U] is computed
256 *> in V as right singular vectors of A^* and then
257 *> copied back to the U array. This 'W' option is just
258 *> a reminder to the caller that in this case V is
259 *> reserved as workspace of length N*N.
260 *> If JOBV = 'N' V is not referenced, unless JOBT='T'.
266 *> The leading dimension of the array V, LDV >= 1.
267 *> If JOBV = 'V' or 'J' or 'W', then LDV >= N.
272 *> CWORK is COMPLEX*16 array, dimension at least LWORK.
278 *> Length of CWORK to confirm proper allocation of workspace.
279 *> LWORK depends on the job:
281 *> 1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
282 *> 1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
283 *> LWORK >= 2*N+1. This is the minimal requirement.
284 *> ->> For optimal performance (blocked code) the optimal value
285 *> is LWORK >= N + (N+1)*NB. Here NB is the optimal
286 *> block size for ZGEQP3 and ZGEQRF.
287 *> In general, optimal LWORK is computed as
288 *> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF)).
289 *> 1.2. .. an estimate of the scaled condition number of A is
290 *> required (JOBA='E', or 'G'). In this case, LWORK the minimal
291 *> requirement is LWORK >= N*N + 3*N.
292 *> ->> For optimal performance (blocked code) the optimal value
293 *> is LWORK >= max(N+(N+1)*NB, N*N+3*N).
294 *> In general, the optimal length LWORK is computed as
295 *> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF),
296 *> N+N*N+LWORK(ZPOCON)).
298 *> 2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
300 *> -> the minimal requirement is LWORK >= 3*N.
301 *> -> For optimal performance, LWORK >= max(N+(N+1)*NB, 3*N,2*N+N*NB),
302 *> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQF,
303 *> ZUNMLQ. In general, the optimal length LWORK is computed as
304 *> LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZPOCON), N+LWORK(ZGESVJ),
305 *> N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
307 *> 3. If SIGMA and the left singular vectors are needed
308 *> -> the minimal requirement is LWORK >= 3*N.
309 *> -> For optimal performance:
310 *> if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB),
311 *> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
312 *> In general, the optimal length LWORK is computed as
313 *> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON),
314 *> 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)).
316 *> 4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
317 *> 4.1. if JOBV.EQ.'V'
318 *> the minimal requirement is LWORK >= 5*N+2*N*N.
319 *> 4.2. if JOBV.EQ.'J' the minimal requirement is
321 *> In both cases, the allocated CWORK can accommodate blocked runs
322 *> of ZGEQP3, ZGEQRF, ZGELQF, ZUNMQR, ZUNMLQ.
327 *> RWORK is DOUBLE PRECISION array, dimension at least LRWORK.
329 *> RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
330 *> such that SCALE*SVA(1:N) are the computed singular values
331 *> of A. (See the description of SVA().)
332 *> RWORK(2) = See the description of RWORK(1).
333 *> RWORK(3) = SCONDA is an estimate for the condition number of
334 *> column equilibrated A. (If JOBA .EQ. 'E' or 'G')
335 *> SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
336 *> It is computed using SPOCON. It holds
337 *> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
338 *> where R is the triangular factor from the QRF of A.
339 *> However, if R is truncated and the numerical rank is
340 *> determined to be strictly smaller than N, SCONDA is
341 *> returned as -1, thus indicating that the smallest
342 *> singular values might be lost.
344 *> If full SVD is needed, the following two condition numbers are
345 *> useful for the analysis of the algorithm. They are provied for
346 *> a developer/implementer who is familiar with the details of
349 *> RWORK(4) = an estimate of the scaled condition number of the
350 *> triangular factor in the first QR factorization.
351 *> RWORK(5) = an estimate of the scaled condition number of the
352 *> triangular factor in the second QR factorization.
353 *> The following two parameters are computed if JOBT .EQ. 'T'.
354 *> They are provided for a developer/implementer who is familiar
355 *> with the details of the method.
356 *> RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
357 *> of diag(A^* * A) / Trace(A^* * A) taken as point in the
358 *> probability simplex.
359 *> RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
365 *> Length of RWORK to confirm proper allocation of workspace.
366 *> LRWORK depends on the job:
368 *> 1. If only singular values are requested i.e. if
369 *> LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
371 *> 1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
372 *> then LRWORK = max( 7, N + 2 * M ).
373 *> 1.2. Otherwise, LRWORK = max( 7, 2 * N ).
374 *> 2. If singular values with the right singular vectors are requested
376 *> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
377 *> .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
379 *> 2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
380 *> then LRWORK = max( 7, N + 2 * M ).
381 *> 2.2. Otherwise, LRWORK = max( 7, 2 * N ).
382 *> 3. If singular values with the left singular vectors are requested, i.e. if
383 *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
384 *> .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
386 *> 3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
387 *> then LRWORK = max( 7, N + 2 * M ).
388 *> 3.2. Otherwise, LRWORK = max( 7, 2 * N ).
389 *> 4. If singular values with both the left and the right singular vectors
390 *> are requested, i.e. if
391 *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
392 *> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
394 *> 4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
395 *> then LRWORK = max( 7, N + 2 * M ).
396 *> 4.2. Otherwise, LRWORK = max( 7, 2 * N ).
401 *> IWORK is INTEGER array, of dimension:
402 *> If LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then
403 *> the dimension of IWORK is max( 3, 2 * N + M ).
404 *> Otherwise, the dimension of IWORK is
405 *> -> max( 3, 2*N ) for full SVD
406 *> -> max( 3, N ) for singular values only or singular
407 *> values with one set of singular vectors (left or right)
409 *> IWORK(1) = the numerical rank determined after the initial
410 *> QR factorization with pivoting. See the descriptions
412 *> IWORK(2) = the number of the computed nonzero singular values
413 *> IWORK(3) = if nonzero, a warning message:
414 *> If IWORK(3).EQ.1 then some of the column norms of A
415 *> were denormalized floats. The requested high accuracy
416 *> is not warranted by the data.
422 *> < 0 : if INFO = -i, then the i-th argument had an illegal value.
423 *> = 0 : successful exit;
424 *> > 0 : ZGEJSV did not converge in the maximal allowed number
425 *> of sweeps. The computed values may be inaccurate.
431 *> \author Univ. of Tennessee
432 *> \author Univ. of California Berkeley
433 *> \author Univ. of Colorado Denver
438 *> \ingroup complex16GEsing
440 *> \par Further Details:
441 * =====================
445 *> ZGEJSV implements a preconditioned Jacobi SVD algorithm. It uses ZGEQP3,
446 *> ZGEQRF, and ZGELQF as preprocessors and preconditioners. Optionally, an
447 *> additional row pivoting can be used as a preprocessor, which in some
448 *> cases results in much higher accuracy. An example is matrix A with the
449 *> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
450 *> diagonal matrices and C is well-conditioned matrix. In that case, complete
451 *> pivoting in the first QR factorizations provides accuracy dependent on the
452 *> condition number of C, and independent of D1, D2. Such higher accuracy is
453 *> not completely understood theoretically, but it works well in practice.
454 *> Further, if A can be written as A = B*D, with well-conditioned B and some
455 *> diagonal D, then the high accuracy is guaranteed, both theoretically and
456 *> in software, independent of D. For more details see [1], [2].
457 *> The computational range for the singular values can be the full range
458 *> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
459 *> & LAPACK routines called by ZGEJSV are implemented to work in that range.
460 *> If that is not the case, then the restriction for safe computation with
461 *> the singular values in the range of normalized IEEE numbers is that the
462 *> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
463 *> overflow. This code (ZGEJSV) is best used in this restricted range,
464 *> meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
465 *> returned as zeros. See JOBR for details on this.
466 *> Further, this implementation is somewhat slower than the one described
467 *> in [1,2] due to replacement of some non-LAPACK components, and because
468 *> the choice of some tuning parameters in the iterative part (ZGESVJ) is
469 *> left to the implementer on a particular machine.
470 *> The rank revealing QR factorization (in this code: ZGEQP3) should be
471 *> implemented as in [3]. We have a new version of ZGEQP3 under development
472 *> that is more robust than the current one in LAPACK, with a cleaner cut in
473 *> rank defficient cases. It will be available in the SIGMA library [4].
474 *> If M is much larger than N, it is obvious that the initial QRF with
475 *> column pivoting can be preprocessed by the QRF without pivoting. That
476 *> well known trick is not used in ZGEJSV because in some cases heavy row
477 *> weighting can be treated with complete pivoting. The overhead in cases
478 *> M much larger than N is then only due to pivoting, but the benefits in
479 *> terms of accuracy have prevailed. The implementer/user can incorporate
480 *> this extra QRF step easily. The implementer can also improve data movement
481 *> (matrix transpose, matrix copy, matrix transposed copy) - this
482 *> implementation of ZGEJSV uses only the simplest, naive data movement.
485 *> \par Contributors:
488 *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
495 *> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
496 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
497 *> LAPACK Working note 169.
498 *> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
499 *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
500 *> LAPACK Working note 170.
501 *> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
502 *> factorization software - a case study.
503 *> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
504 *> LAPACK Working note 176.
505 *> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
506 *> QSVD, (H,K)-SVD computations.
507 *> Department of Mathematics, University of Zagreb, 2008.
510 *> \par Bugs, examples and comments:
511 * =================================
513 *> Please report all bugs and send interesting examples and/or comments to
514 *> drmac@math.hr. Thank you.
516 * =====================================================================
517 SUBROUTINE ZGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
518 $ M, N, A, LDA, SVA, U, LDU, V, LDV,
519 $ CWORK, LWORK, RWORK, LRWORK, IWORK, INFO )
521 * -- LAPACK computational routine (version 3.6.1) --
522 * -- LAPACK is a software package provided by Univ. of Tennessee, --
523 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
526 * .. Scalar Arguments ..
528 INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N
530 * .. Array Arguments ..
531 COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ),
533 DOUBLE PRECISION SVA( N ), RWORK( * )
535 CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
538 * ===========================================================================
540 * .. Local Parameters ..
541 DOUBLE PRECISION ZERO, ONE
542 PARAMETER ( ZERO = 0.0D0, ONE = 1.0D0 )
543 COMPLEX*16 CZERO, CONE
544 PARAMETER ( CZERO = ( 0.0D0, 0.0D0 ), CONE = ( 1.0D0, 0.0D0 ) )
546 * .. Local Scalars ..
548 DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1,
549 $ COND_OK, CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN,
550 $ MAXPRJ, SCALEM, SCONDA, SFMIN, SMALL, TEMP1,
551 $ USCAL1, USCAL2, XSC
552 INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
553 LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
554 $ L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN,
555 $ NOSCAL, ROWPIV, RSVEC, TRANSP
557 * .. Intrinsic Functions ..
558 INTRINSIC ABS, DCMPLX, DCONJG, DLOG, DMAX1, DMIN1, DBLE,
559 $ MAX0, MIN0, NINT, DSQRT
561 * .. External Functions ..
562 DOUBLE PRECISION DLAMCH, DZNRM2
563 INTEGER IDAMAX, IZAMAX
565 EXTERNAL IDAMAX, IZAMAX, LSAME, DLAMCH, DZNRM2
567 * .. External Subroutines ..
568 EXTERNAL DLASSQ, ZCOPY, ZGELQF, ZGEQP3, ZGEQRF, ZLACPY, ZLASCL,
569 $ DLASCL, ZLASET, ZLASSQ, ZLASWP, ZUNGQR, ZUNMLQ,
570 $ ZUNMQR, ZPOCON, DSCAL, ZDSCAL, ZSWAP, ZTRSM, XERBLA
575 * Test the input arguments
578 LSVEC = LSAME( JOBU, 'U' ) .OR. LSAME( JOBU, 'F' )
579 JRACC = LSAME( JOBV, 'J' )
580 RSVEC = LSAME( JOBV, 'V' ) .OR. JRACC
581 ROWPIV = LSAME( JOBA, 'F' ) .OR. LSAME( JOBA, 'G' )
582 L2RANK = LSAME( JOBA, 'R' )
583 L2ABER = LSAME( JOBA, 'A' )
584 ERREST = LSAME( JOBA, 'E' ) .OR. LSAME( JOBA, 'G' )
585 L2TRAN = LSAME( JOBT, 'T' )
586 L2KILL = LSAME( JOBR, 'R' )
587 DEFR = LSAME( JOBR, 'N' )
588 L2PERT = LSAME( JOBP, 'P' )
590 IF ( .NOT.(ROWPIV .OR. L2RANK .OR. L2ABER .OR.
591 $ ERREST .OR. LSAME( JOBA, 'C' ) )) THEN
593 ELSE IF ( .NOT.( LSVEC .OR. LSAME( JOBU, 'N' ) .OR.
594 $ LSAME( JOBU, 'W' )) ) THEN
596 ELSE IF ( .NOT.( RSVEC .OR. LSAME( JOBV, 'N' ) .OR.
597 $ LSAME( JOBV, 'W' )) .OR. ( JRACC .AND. (.NOT.LSVEC) ) ) THEN
599 ELSE IF ( .NOT. ( L2KILL .OR. DEFR ) ) THEN
601 ELSE IF ( .NOT. ( L2TRAN .OR. LSAME( JOBT, 'N' ) ) ) THEN
603 ELSE IF ( .NOT. ( L2PERT .OR. LSAME( JOBP, 'N' ) ) ) THEN
605 ELSE IF ( M .LT. 0 ) THEN
607 ELSE IF ( ( N .LT. 0 ) .OR. ( N .GT. M ) ) THEN
609 ELSE IF ( LDA .LT. M ) THEN
611 ELSE IF ( LSVEC .AND. ( LDU .LT. M ) ) THEN
613 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
615 ELSE IF ( (.NOT.(LSVEC .OR. RSVEC .OR. ERREST).AND.
616 $ (LWORK .LT. 2*N+1)) .OR.
617 $ (.NOT.(LSVEC .OR. RSVEC) .AND. ERREST .AND.
618 $ (LWORK .LT. N*N+3*N)) .OR.
619 $ (LSVEC .AND. (.NOT.RSVEC) .AND. (LWORK .LT. 3*N))
621 $ (RSVEC .AND. (.NOT.LSVEC) .AND. (LWORK .LT. 3*N))
623 $ (LSVEC .AND. RSVEC .AND. (.NOT.JRACC) .AND.
624 $ (LWORK.LT.5*N+2*N*N))
625 $ .OR. (LSVEC .AND. RSVEC .AND. JRACC .AND.
629 ELSE IF ( LRWORK.LT. MAX0(N+2*M,7)) THEN
636 IF ( INFO .NE. 0 ) THEN
638 CALL XERBLA( 'ZGEJSV', - INFO )
642 * Quick return for void matrix (Y3K safe)
644 IF ( ( M .EQ. 0 ) .OR. ( N .EQ. 0 ) ) THEN
650 * Determine whether the matrix U should be M x N or M x M
654 IF ( LSAME( JOBU, 'F' ) ) N1 = M
657 * Set numerical parameters
659 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
661 EPSLN = DLAMCH('Epsilon')
662 SFMIN = DLAMCH('SafeMinimum')
663 SMALL = SFMIN / EPSLN
667 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
669 *(!) If necessary, scale SVA() to protect the largest norm from
670 * overflow. It is possible that this scaling pushes the smallest
671 * column norm left from the underflow threshold (extreme case).
673 SCALEM = ONE / DSQRT(DBLE(M)*DBLE(N))
679 CALL ZLASSQ( M, A(1,p), 1, AAPP, AAQQ )
680 IF ( AAPP .GT. BIG ) THEN
682 CALL XERBLA( 'ZGEJSV', -INFO )
686 IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCAL ) THEN
690 SVA(p) = AAPP * ( AAQQ * SCALEM )
693 CALL DSCAL( p-1, SCALEM, SVA, 1 )
698 IF ( NOSCAL ) SCALEM = ONE
703 AAPP = DMAX1( AAPP, SVA(p) )
704 IF ( SVA(p) .NE. ZERO ) AAQQ = DMIN1( AAQQ, SVA(p) )
707 * Quick return for zero M x N matrix
709 IF ( AAPP .EQ. ZERO ) THEN
710 IF ( LSVEC ) CALL ZLASET( 'G', M, N1, CZERO, CONE, U, LDU )
711 IF ( RSVEC ) CALL ZLASET( 'G', N, N, CZERO, CONE, V, LDV )
714 IF ( ERREST ) RWORK(3) = ONE
715 IF ( LSVEC .AND. RSVEC ) THEN
729 * Issue warning if denormalized column norms detected. Override the
730 * high relative accuracy request. Issue licence to kill columns
731 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
734 IF ( AAQQ .LE. SFMIN ) THEN
740 * Quick return for one-column matrix
745 CALL ZLASCL( 'G',0,0,SVA(1),SCALEM, M,1,A(1,1),LDA,IERR )
746 CALL ZLACPY( 'A', M, 1, A, LDA, U, LDU )
747 * computing all M left singular vectors of the M x 1 matrix
748 IF ( N1 .NE. N ) THEN
749 CALL ZGEQRF( M, N, U,LDU, CWORK, CWORK(N+1),LWORK-N,IERR )
750 CALL ZUNGQR( M,N1,1, U,LDU,CWORK,CWORK(N+1),LWORK-N,IERR )
751 CALL ZCOPY( M, A(1,1), 1, U(1,1), 1 )
757 IF ( SVA(1) .LT. (BIG*SCALEM) ) THEN
758 SVA(1) = SVA(1) / SCALEM
761 RWORK(1) = ONE / SCALEM
763 IF ( SVA(1) .NE. ZERO ) THEN
765 IF ( ( SVA(1) / SCALEM) .GE. SFMIN ) THEN
775 IF ( ERREST ) RWORK(3) = ONE
776 IF ( LSVEC .AND. RSVEC ) THEN
789 L2TRAN = L2TRAN .AND. ( M .EQ. N )
793 IF ( ROWPIV .OR. L2TRAN ) THEN
795 * Compute the row norms, needed to determine row pivoting sequence
796 * (in the case of heavily row weighted A, row pivoting is strongly
797 * advised) and to collect information needed to compare the
798 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
804 CALL ZLASSQ( N, A(p,1), LDA, XSC, TEMP1 )
805 * ZLASSQ gets both the ell_2 and the ell_infinity norm
806 * in one pass through the vector
807 RWORK(M+N+p) = XSC * SCALEM
808 RWORK(N+p) = XSC * (SCALEM*DSQRT(TEMP1))
809 AATMAX = DMAX1( AATMAX, RWORK(N+p) )
810 IF (RWORK(N+p) .NE. ZERO)
811 $ AATMIN = DMIN1(AATMIN,RWORK(N+p))
815 RWORK(M+N+p) = SCALEM*ABS( A(p,IZAMAX(N,A(p,1),LDA)) )
816 AATMAX = DMAX1( AATMAX, RWORK(M+N+p) )
817 AATMIN = DMIN1( AATMIN, RWORK(M+N+p) )
823 * For square matrix A try to determine whether A^* would be better
824 * input for the preconditioned Jacobi SVD, with faster convergence.
825 * The decision is based on an O(N) function of the vector of column
826 * and row norms of A, based on the Shannon entropy. This should give
827 * the right choice in most cases when the difference actually matters.
828 * It may fail and pick the slower converging side.
836 CALL DLASSQ( N, SVA, 1, XSC, TEMP1 )
841 BIG1 = ( ( SVA(p) / XSC )**2 ) * TEMP1
842 IF ( BIG1 .NE. ZERO ) ENTRA = ENTRA + BIG1 * DLOG(BIG1)
844 ENTRA = - ENTRA / DLOG(DBLE(N))
846 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
847 * It is derived from the diagonal of A^* * A. Do the same with the
848 * diagonal of A * A^*, compute the entropy of the corresponding
849 * probability distribution. Note that A * A^* and A^* * A have the
854 BIG1 = ( ( RWORK(p) / XSC )**2 ) * TEMP1
855 IF ( BIG1 .NE. ZERO ) ENTRAT = ENTRAT + BIG1 * DLOG(BIG1)
857 ENTRAT = - ENTRAT / DLOG(DBLE(M))
859 * Analyze the entropies and decide A or A^*. Smaller entropy
860 * usually means better input for the algorithm.
862 TRANSP = ( ENTRAT .LT. ENTRA )
865 * If A^* is better than A, take the adjoint of A.
868 * In an optimal implementation, this trivial transpose
869 * should be replaced with faster transpose.
871 A(p,p) = DCONJG(A(p,p))
873 CTEMP = DCONJG(A(q,p))
874 A(q,p) = DCONJG(A(p,q))
878 A(N,N) = DCONJG(A(N,N))
880 RWORK(M+N+p) = SVA(p)
882 * previously computed row 2-norms are now column 2-norms
883 * of the transposed matrix
902 * Scale the matrix so that its maximal singular value remains less
903 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
904 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
905 * SQRT(BIG) instead of BIG is the fact that ZGEJSV uses LAPACK and
906 * BLAS routines that, in some implementations, are not capable of
907 * working in the full interval [SFMIN,BIG] and that they may provoke
908 * overflows in the intermediate results. If the singular values spread
909 * from SFMIN to BIG, then ZGESVJ will compute them. So, in that case,
910 * one should use ZGESVJ instead of ZGEJSV.
913 TEMP1 = DSQRT( BIG / DBLE(N) )
915 CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, N, 1, SVA, N, IERR )
916 IF ( AAQQ .GT. (AAPP * SFMIN) ) THEN
917 AAQQ = ( AAQQ / AAPP ) * TEMP1
919 AAQQ = ( AAQQ * TEMP1 ) / AAPP
921 TEMP1 = TEMP1 * SCALEM
922 CALL ZLASCL( 'G', 0, 0, AAPP, TEMP1, M, N, A, LDA, IERR )
924 * To undo scaling at the end of this procedure, multiply the
925 * computed singular values with USCAL2 / USCAL1.
931 * L2KILL enforces computation of nonzero singular values in
932 * the restricted range of condition number of the initial A,
933 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
938 * Now, if the condition number of A is too big,
939 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
940 * as a precaution measure, the full SVD is computed using ZGESVJ
941 * with accumulated Jacobi rotations. This provides numerically
942 * more robust computation, at the cost of slightly increased run
943 * time. Depending on the concrete implementation of BLAS and LAPACK
944 * (i.e. how they behave in presence of extreme ill-conditioning) the
945 * implementor may decide to remove this switch.
946 IF ( ( AAQQ.LT.DSQRT(SFMIN) ) .AND. LSVEC .AND. RSVEC ) THEN
951 IF ( AAQQ .LT. XSC ) THEN
953 IF ( SVA(p) .LT. XSC ) THEN
954 CALL ZLASET( 'A', M, 1, CZERO, CZERO, A(1,p), LDA )
960 * Preconditioning using QR factorization with pivoting
963 * Optional row permutation (Bjoerck row pivoting):
964 * A result by Cox and Higham shows that the Bjoerck's
965 * row pivoting combined with standard column pivoting
966 * has similar effect as Powell-Reid complete pivoting.
967 * The ell-infinity norms of A are made nonincreasing.
969 q = IDAMAX( M-p+1, RWORK(M+N+p), 1 ) + p - 1
973 RWORK(M+N+p) = RWORK(M+N+q)
977 CALL ZLASWP( N, A, LDA, 1, M-1, IWORK(2*N+1), 1 )
981 * End of the preparation phase (scaling, optional sorting and
982 * transposing, optional flushing of small columns).
986 * If the full SVD is needed, the right singular vectors are computed
987 * from a matrix equation, and for that we need theoretical analysis
988 * of the Businger-Golub pivoting. So we use ZGEQP3 as the first RR QRF.
989 * In all other cases the first RR QRF can be chosen by other criteria
990 * (eg speed by replacing global with restricted window pivoting, such
991 * as in xGEQPX from TOMS # 782). Good results will be obtained using
992 * xGEQPX with properly (!) chosen numerical parameters.
993 * Any improvement of ZGEQP3 improves overal performance of ZGEJSV.
995 * A * P1 = Q1 * [ R1^* 0]^*:
997 * .. all columns are free columns
1000 CALL ZGEQP3( M, N, A, LDA, IWORK, CWORK, CWORK(N+1), LWORK-N,
1003 * The upper triangular matrix R1 from the first QRF is inspected for
1004 * rank deficiency and possibilities for deflation, or possible
1005 * ill-conditioning. Depending on the user specified flag L2RANK,
1006 * the procedure explores possibilities to reduce the numerical
1007 * rank by inspecting the computed upper triangular factor. If
1008 * L2RANK or L2ABER are up, then ZGEJSV will compute the SVD of
1009 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1013 * Standard absolute error bound suffices. All sigma_i with
1014 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1015 * agressive enforcement of lower numerical rank by introducing a
1016 * backward error of the order of N*EPSLN*||A||.
1017 TEMP1 = DSQRT(DBLE(N))*EPSLN
1019 IF ( ABS(A(p,p)) .GE. (TEMP1*ABS(A(1,1))) ) THEN
1026 ELSE IF ( L2RANK ) THEN
1027 * .. similarly as above, only slightly more gentle (less agressive).
1028 * Sudden drop on the diagonal of R1 is used as the criterion for
1029 * close-to-rank-defficient.
1030 TEMP1 = DSQRT(SFMIN)
1032 IF ( ( ABS(A(p,p)) .LT. (EPSLN*ABS(A(p-1,p-1))) ) .OR.
1033 $ ( ABS(A(p,p)) .LT. SMALL ) .OR.
1034 $ ( L2KILL .AND. (ABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3402
1040 * The goal is high relative accuracy. However, if the matrix
1041 * has high scaled condition number the relative accuracy is in
1042 * general not feasible. Later on, a condition number estimator
1043 * will be deployed to estimate the scaled condition number.
1044 * Here we just remove the underflowed part of the triangular
1045 * factor. This prevents the situation in which the code is
1046 * working hard to get the accuracy not warranted by the data.
1047 TEMP1 = DSQRT(SFMIN)
1049 IF ( ( ABS(A(p,p)) .LT. SMALL ) .OR.
1050 $ ( L2KILL .AND. (ABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3302
1058 IF ( NR .EQ. N ) THEN
1061 TEMP1 = ABS(A(p,p)) / SVA(IWORK(p))
1062 MAXPRJ = DMIN1( MAXPRJ, TEMP1 )
1064 IF ( MAXPRJ**2 .GE. ONE - DBLE(N)*EPSLN ) ALMORT = .TRUE.
1073 IF ( N .EQ. NR ) THEN
1075 * .. V is available as workspace
1076 CALL ZLACPY( 'U', N, N, A, LDA, V, LDV )
1078 TEMP1 = SVA(IWORK(p))
1079 CALL ZDSCAL( p, ONE/TEMP1, V(1,p), 1 )
1081 CALL ZPOCON( 'U', N, V, LDV, ONE, TEMP1,
1082 $ CWORK(N+1), RWORK, IERR )
1084 ELSE IF ( LSVEC ) THEN
1085 * .. U is available as workspace
1086 CALL ZLACPY( 'U', N, N, A, LDA, U, LDU )
1088 TEMP1 = SVA(IWORK(p))
1089 CALL ZDSCAL( p, ONE/TEMP1, U(1,p), 1 )
1091 CALL ZPOCON( 'U', N, U, LDU, ONE, TEMP1,
1092 $ CWORK(N+1), RWORK, IERR )
1094 CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1096 TEMP1 = SVA(IWORK(p))
1097 CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1099 * .. the columns of R are scaled to have unit Euclidean lengths.
1100 CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1101 $ CWORK(N+N*N+1), RWORK, IERR )
1104 SCONDA = ONE / DSQRT(TEMP1)
1105 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1106 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1112 L2PERT = L2PERT .AND. ( ABS( A(1,1)/A(NR,NR) ) .GT. DSQRT(BIG1) )
1113 * If there is no violent scaling, artificial perturbation is not needed.
1117 IF ( .NOT. ( RSVEC .OR. LSVEC ) ) THEN
1119 * Singular Values only
1121 * .. transpose A(1:NR,1:N)
1122 DO 1946 p = 1, MIN0( N-1, NR )
1123 CALL ZCOPY( N-p, A(p,p+1), LDA, A(p+1,p), 1 )
1124 CALL ZLACGV( N-p+1, A(p,p), 1 )
1126 IF ( NR .EQ. N ) A(N,N) = DCONJG(A(N,N))
1128 * The following two DO-loops introduce small relative perturbation
1129 * into the strict upper triangle of the lower triangular matrix.
1130 * Small entries below the main diagonal are also changed.
1131 * This modification is useful if the computing environment does not
1132 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1133 * annoying denormalized numbers in case of strongly scaled matrices.
1134 * The perturbation is structured so that it does not introduce any
1135 * new perturbation of the singular values, and it does not destroy
1136 * the job done by the preconditioner.
1137 * The licence for this perturbation is in the variable L2PERT, which
1138 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1140 IF ( .NOT. ALMORT ) THEN
1144 XSC = EPSLN / DBLE(N)
1146 CTEMP = DCMPLX(XSC*ABS(A(q,q)),ZERO)
1148 IF ( ( (p.GT.q) .AND. (ABS(A(p,q)).LE.TEMP1) )
1149 $ .OR. ( p .LT. q ) )
1150 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1155 CALL ZLASET( 'U', NR-1,NR-1, CZERO,CZERO, A(1,2),LDA )
1158 * .. second preconditioning using the QR factorization
1160 CALL ZGEQRF( N,NR, A,LDA, CWORK, CWORK(N+1),LWORK-N, IERR )
1162 * .. and transpose upper to lower triangular
1163 DO 1948 p = 1, NR - 1
1164 CALL ZCOPY( NR-p, A(p,p+1), LDA, A(p+1,p), 1 )
1165 CALL ZLACGV( NR-p+1, A(p,p), 1 )
1170 * Row-cyclic Jacobi SVD algorithm with column pivoting
1172 * .. again some perturbation (a "background noise") is added
1173 * to drown denormals
1176 XSC = EPSLN / DBLE(N)
1178 CTEMP = DCMPLX(XSC*ABS(A(q,q)),ZERO)
1180 IF ( ( (p.GT.q) .AND. (ABS(A(p,q)).LE.TEMP1) )
1181 $ .OR. ( p .LT. q ) )
1182 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1187 CALL ZLASET( 'U', NR-1, NR-1, CZERO, CZERO, A(1,2), LDA )
1190 * .. and one-sided Jacobi rotations are started on a lower
1191 * triangular matrix (plus perturbation which is ignored in
1192 * the part which destroys triangular form (confusing?!))
1194 CALL ZGESVJ( 'L', 'NoU', 'NoV', NR, NR, A, LDA, SVA,
1195 $ N, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO )
1198 NUMRANK = NINT(RWORK(2))
1201 ELSE IF ( RSVEC .AND. ( .NOT. LSVEC ) ) THEN
1203 * -> Singular Values and Right Singular Vectors <-
1207 * .. in this case NR equals N
1209 CALL ZCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1210 CALL ZLACGV( N-p+1, V(p,p), 1 )
1212 CALL ZLASET( 'Upper', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV )
1214 CALL ZGESVJ( 'L','U','N', N, NR, V,LDV, SVA, NR, A,LDA,
1215 $ CWORK, LWORK, RWORK, LRWORK, INFO )
1217 NUMRANK = NINT(RWORK(2))
1221 * .. two more QR factorizations ( one QRF is not enough, two require
1222 * accumulated product of Jacobi rotations, three are perfect )
1224 CALL ZLASET( 'Lower', NR-1,NR-1, CZERO, CZERO, A(2,1), LDA )
1225 CALL ZGELQF( NR,N, A, LDA, CWORK, CWORK(N+1), LWORK-N, IERR)
1226 CALL ZLACPY( 'Lower', NR, NR, A, LDA, V, LDV )
1227 CALL ZLASET( 'Upper', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV )
1228 CALL ZGEQRF( NR, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1231 CALL ZCOPY( NR-p+1, V(p,p), LDV, V(p,p), 1 )
1232 CALL ZLACGV( NR-p+1, V(p,p), 1 )
1234 CALL ZLASET('Upper', NR-1, NR-1, CZERO, CZERO, V(1,2), LDV)
1236 CALL ZGESVJ( 'Lower', 'U','N', NR, NR, V,LDV, SVA, NR, U,
1237 $ LDU, CWORK(N+1), LWORK-N, RWORK, LRWORK, INFO )
1239 NUMRANK = NINT(RWORK(2))
1240 IF ( NR .LT. N ) THEN
1241 CALL ZLASET( 'A',N-NR, NR, CZERO,CZERO, V(NR+1,1), LDV )
1242 CALL ZLASET( 'A',NR, N-NR, CZERO,CZERO, V(1,NR+1), LDV )
1243 CALL ZLASET( 'A',N-NR,N-NR,CZERO,CONE, V(NR+1,NR+1),LDV )
1246 CALL ZUNMLQ( 'Left', 'C', N, N, NR, A, LDA, CWORK,
1247 $ V, LDV, CWORK(N+1), LWORK-N, IERR )
1252 CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1254 CALL ZLACPY( 'All', N, N, A, LDA, V, LDV )
1257 CALL ZLACPY( 'All', N, N, V, LDV, U, LDU )
1260 ELSE IF ( LSVEC .AND. ( .NOT. RSVEC ) ) THEN
1262 * .. Singular Values and Left Singular Vectors ..
1264 * .. second preconditioning step to avoid need to accumulate
1265 * Jacobi rotations in the Jacobi iterations.
1267 CALL ZCOPY( N-p+1, A(p,p), LDA, U(p,p), 1 )
1268 CALL ZLACGV( N-p+1, U(p,p), 1 )
1270 CALL ZLASET( 'Upper', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU )
1272 CALL ZGEQRF( N, NR, U, LDU, CWORK(N+1), CWORK(2*N+1),
1275 DO 1967 p = 1, NR - 1
1276 CALL ZCOPY( NR-p, U(p,p+1), LDU, U(p+1,p), 1 )
1277 CALL ZLACGV( N-p+1, U(p,p), 1 )
1279 CALL ZLASET( 'Upper', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU )
1281 CALL ZGESVJ( 'Lower', 'U', 'N', NR,NR, U, LDU, SVA, NR, A,
1282 $ LDA, CWORK(N+1), LWORK-N, RWORK, LRWORK, INFO )
1284 NUMRANK = NINT(RWORK(2))
1286 IF ( NR .LT. M ) THEN
1287 CALL ZLASET( 'A', M-NR, NR,CZERO, CZERO, U(NR+1,1), LDU )
1288 IF ( NR .LT. N1 ) THEN
1289 CALL ZLASET( 'A',NR, N1-NR, CZERO, CZERO, U(1,NR+1),LDU )
1290 CALL ZLASET( 'A',M-NR,N1-NR,CZERO,CONE,U(NR+1,NR+1),LDU )
1294 CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U,
1295 $ LDU, CWORK(N+1), LWORK-N, IERR )
1298 $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1301 XSC = ONE / DZNRM2( M, U(1,p), 1 )
1302 CALL ZDSCAL( M, XSC, U(1,p), 1 )
1306 CALL ZLACPY( 'All', N, N, U, LDU, V, LDV )
1313 IF ( .NOT. JRACC ) THEN
1315 IF ( .NOT. ALMORT ) THEN
1317 * Second Preconditioning Step (QRF [with pivoting])
1318 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1319 * equivalent to an LQF CALL. Since in many libraries the QRF
1320 * seems to be better optimized than the LQF, we do explicit
1321 * transpose and use the QRF. This is subject to changes in an
1322 * optimized implementation of ZGEJSV.
1325 CALL ZCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1326 CALL ZLACGV( N-p+1, V(p,p), 1 )
1329 * .. the following two loops perturb small entries to avoid
1330 * denormals in the second QR factorization, where they are
1331 * as good as zeros. This is done to avoid painfully slow
1332 * computation with denormals. The relative size of the perturbation
1333 * is a parameter that can be changed by the implementer.
1334 * This perturbation device will be obsolete on machines with
1335 * properly implemented arithmetic.
1336 * To switch it off, set L2PERT=.FALSE. To remove it from the
1337 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1338 * The following two loops should be blocked and fused with the
1339 * transposed copy above.
1344 CTEMP = DCMPLX(XSC*ABS( V(q,q) ),ZERO)
1346 IF ( ( p .GT. q ) .AND. ( ABS(V(p,q)) .LE. TEMP1 )
1347 $ .OR. ( p .LT. q ) )
1348 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1350 IF ( p .LT. q ) V(p,q) = - V(p,q)
1354 CALL ZLASET( 'U', NR-1, NR-1, CZERO, CZERO, V(1,2), LDV )
1357 * Estimate the row scaled condition number of R1
1358 * (If R1 is rectangular, N > NR, then the condition number
1359 * of the leading NR x NR submatrix is estimated.)
1361 CALL ZLACPY( 'L', NR, NR, V, LDV, CWORK(2*N+1), NR )
1363 TEMP1 = DZNRM2(NR-p+1,CWORK(2*N+(p-1)*NR+p),1)
1364 CALL ZDSCAL(NR-p+1,ONE/TEMP1,CWORK(2*N+(p-1)*NR+p),1)
1366 CALL ZPOCON('Lower',NR,CWORK(2*N+1),NR,ONE,TEMP1,
1367 $ CWORK(2*N+NR*NR+1),RWORK,IERR)
1368 CONDR1 = ONE / DSQRT(TEMP1)
1369 * .. here need a second oppinion on the condition number
1370 * .. then assume worst case scenario
1371 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1372 * more conservative <=> CONDR1 .LT. SQRT(DBLE(N))
1374 COND_OK = DSQRT(DSQRT(DBLE(NR)))
1375 *[TP] COND_OK is a tuning parameter.
1377 IF ( CONDR1 .LT. COND_OK ) THEN
1378 * .. the second QRF without pivoting. Note: in an optimized
1379 * implementation, this QRF should be implemented as the QRF
1380 * of a lower triangular matrix.
1382 CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1386 XSC = DSQRT(SMALL)/EPSLN
1388 DO 3958 q = 1, p - 1
1389 CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))),
1391 IF ( ABS(V(q,p)) .LE. TEMP1 )
1392 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1399 $ CALL ZLACPY( 'A', N, NR, V, LDV, CWORK(2*N+1), N )
1402 * .. this transposed copy should be better than naive
1403 DO 1969 p = 1, NR - 1
1404 CALL ZCOPY( NR-p, V(p,p+1), LDV, V(p+1,p), 1 )
1405 CALL ZLACGV(NR-p+1, V(p,p), 1 )
1407 V(NR,NR)=DCONJG(V(NR,NR))
1413 * .. ill-conditioned case: second QRF with pivoting
1414 * Note that windowed pivoting would be equaly good
1415 * numerically, and more run-time efficient. So, in
1416 * an optimal implementation, the next call to ZGEQP3
1417 * should be replaced with eg. CALL ZGEQPX (ACM TOMS #782)
1418 * with properly (carefully) chosen parameters.
1420 * R1^* * P2 = Q2 * R2
1424 CALL ZGEQP3( N, NR, V, LDV, IWORK(N+1), CWORK(N+1),
1425 $ CWORK(2*N+1), LWORK-2*N, RWORK, IERR )
1426 ** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1427 ** $ LWORK-2*N, IERR )
1431 DO 3968 q = 1, p - 1
1432 CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))),
1434 IF ( ABS(V(q,p)) .LE. TEMP1 )
1435 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1441 CALL ZLACPY( 'A', N, NR, V, LDV, CWORK(2*N+1), N )
1446 DO 8971 q = 1, p - 1
1447 CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))),
1449 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1454 CALL ZLASET( 'L',NR-1,NR-1,CZERO,CZERO,V(2,1),LDV )
1456 * Now, compute R2 = L3 * Q3, the LQ factorization.
1457 CALL ZGELQF( NR, NR, V, LDV, CWORK(2*N+N*NR+1),
1458 $ CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, IERR )
1459 * .. and estimate the condition number
1460 CALL ZLACPY( 'L',NR,NR,V,LDV,CWORK(2*N+N*NR+NR+1),NR )
1462 TEMP1 = DZNRM2( p, CWORK(2*N+N*NR+NR+p), NR )
1463 CALL ZDSCAL( p, ONE/TEMP1, CWORK(2*N+N*NR+NR+p), NR )
1465 CALL ZPOCON( 'L',NR,CWORK(2*N+N*NR+NR+1),NR,ONE,TEMP1,
1466 $ CWORK(2*N+N*NR+NR+NR*NR+1),RWORK,IERR )
1467 CONDR2 = ONE / DSQRT(TEMP1)
1470 IF ( CONDR2 .GE. COND_OK ) THEN
1471 * .. save the Householder vectors used for Q3
1472 * (this overwrittes the copy of R2, as it will not be
1473 * needed in this branch, but it does not overwritte the
1474 * Huseholder vectors of Q2.).
1475 CALL ZLACPY( 'U', NR, NR, V, LDV, CWORK(2*N+1), N )
1476 * .. and the rest of the information on Q3 is in
1477 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1485 CTEMP = XSC * V(q,q)
1486 DO 4969 p = 1, q - 1
1487 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1492 CALL ZLASET( 'U', NR-1,NR-1, CZERO,CZERO, V(1,2), LDV )
1495 * Second preconditioning finished; continue with Jacobi SVD
1496 * The input matrix is lower trinagular.
1498 * Recover the right singular vectors as solution of a well
1499 * conditioned triangular matrix equation.
1501 IF ( CONDR1 .LT. COND_OK ) THEN
1503 CALL ZGESVJ( 'L','U','N',NR,NR,V,LDV,SVA,NR,U, LDU,
1504 $ CWORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,RWORK,
1507 NUMRANK = NINT(RWORK(2))
1509 CALL ZCOPY( NR, V(1,p), 1, U(1,p), 1 )
1510 CALL ZDSCAL( NR, SVA(p), V(1,p), 1 )
1513 * .. pick the right matrix equation and solve it
1515 IF ( NR .EQ. N ) THEN
1516 * :)) .. best case, R1 is inverted. The solution of this matrix
1517 * equation is Q2*V2 = the product of the Jacobi rotations
1518 * used in ZGESVJ, premultiplied with the orthogonal matrix
1519 * from the second QR factorization.
1520 CALL ZTRSM('L','U','N','N', NR,NR,CONE, A,LDA, V,LDV)
1522 * .. R1 is well conditioned, but non-square. Adjoint of R2
1523 * is inverted to get the product of the Jacobi rotations
1524 * used in ZGESVJ. The Q-factor from the second QR
1525 * factorization is then built in explicitly.
1526 CALL ZTRSM('L','U','C','N',NR,NR,CONE,CWORK(2*N+1),
1528 IF ( NR .LT. N ) THEN
1529 CALL ZLASET('A',N-NR,NR,CZERO,CZERO,V(NR+1,1),LDV)
1530 CALL ZLASET('A',NR,N-NR,CZERO,CZERO,V(1,NR+1),LDV)
1531 CALL ZLASET('A',N-NR,N-NR,CZERO,CONE,V(NR+1,NR+1),LDV)
1533 CALL ZUNMQR('L','N',N,N,NR,CWORK(2*N+1),N,CWORK(N+1),
1534 $ V,LDV,CWORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR)
1537 ELSE IF ( CONDR2 .LT. COND_OK ) THEN
1539 * The matrix R2 is inverted. The solution of the matrix equation
1540 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1541 * the lower triangular L3 from the LQ factorization of
1542 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1543 CALL ZGESVJ( 'L', 'U', 'N', NR, NR, V, LDV, SVA, NR, U,
1544 $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR,
1545 $ RWORK, LRWORK, INFO )
1547 NUMRANK = NINT(RWORK(2))
1549 CALL ZCOPY( NR, V(1,p), 1, U(1,p), 1 )
1550 CALL ZDSCAL( NR, SVA(p), U(1,p), 1 )
1552 CALL ZTRSM('L','U','N','N',NR,NR,CONE,CWORK(2*N+1),N,
1554 * .. apply the permutation from the second QR factorization
1557 CWORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1560 U(p,q) = CWORK(2*N+N*NR+NR+p)
1563 IF ( NR .LT. N ) THEN
1564 CALL ZLASET( 'A',N-NR,NR,CZERO,CZERO,V(NR+1,1),LDV )
1565 CALL ZLASET( 'A',NR,N-NR,CZERO,CZERO,V(1,NR+1),LDV )
1566 CALL ZLASET('A',N-NR,N-NR,CZERO,CONE,V(NR+1,NR+1),LDV)
1568 CALL ZUNMQR( 'L','N',N,N,NR,CWORK(2*N+1),N,CWORK(N+1),
1569 $ V,LDV,CWORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1571 * Last line of defense.
1572 * #:( This is a rather pathological case: no scaled condition
1573 * improvement after two pivoted QR factorizations. Other
1574 * possibility is that the rank revealing QR factorization
1575 * or the condition estimator has failed, or the COND_OK
1576 * is set very close to ONE (which is unnecessary). Normally,
1577 * this branch should never be executed, but in rare cases of
1578 * failure of the RRQR or condition estimator, the last line of
1579 * defense ensures that ZGEJSV completes the task.
1580 * Compute the full SVD of L3 using ZGESVJ with explicit
1581 * accumulation of Jacobi rotations.
1582 CALL ZGESVJ( 'L', 'U', 'V', NR, NR, V, LDV, SVA, NR, U,
1583 $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR,
1584 $ RWORK, LRWORK, INFO )
1586 NUMRANK = NINT(RWORK(2))
1587 IF ( NR .LT. N ) THEN
1588 CALL ZLASET( 'A',N-NR,NR,CZERO,CZERO,V(NR+1,1),LDV )
1589 CALL ZLASET( 'A',NR,N-NR,CZERO,CZERO,V(1,NR+1),LDV )
1590 CALL ZLASET('A',N-NR,N-NR,CZERO,CONE,V(NR+1,NR+1),LDV)
1592 CALL ZUNMQR( 'L','N',N,N,NR,CWORK(2*N+1),N,CWORK(N+1),
1593 $ V,LDV,CWORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1595 CALL ZUNMLQ( 'L', 'C', NR, NR, NR, CWORK(2*N+1), N,
1596 $ CWORK(2*N+N*NR+1), U, LDU, CWORK(2*N+N*NR+NR+1),
1597 $ LWORK-2*N-N*NR-NR, IERR )
1600 CWORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1603 U(p,q) = CWORK(2*N+N*NR+NR+p)
1609 * Permute the rows of V using the (column) permutation from the
1610 * first QRF. Also, scale the columns to make them unit in
1611 * Euclidean norm. This applies to all cases.
1613 TEMP1 = DSQRT(DBLE(N)) * EPSLN
1616 CWORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1619 V(p,q) = CWORK(2*N+N*NR+NR+p)
1621 XSC = ONE / DZNRM2( N, V(1,q), 1 )
1622 IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1623 $ CALL ZDSCAL( N, XSC, V(1,q), 1 )
1625 * At this moment, V contains the right singular vectors of A.
1626 * Next, assemble the left singular vector matrix U (M x N).
1627 IF ( NR .LT. M ) THEN
1628 CALL ZLASET('A', M-NR, NR, CZERO, CZERO, U(NR+1,1), LDU)
1629 IF ( NR .LT. N1 ) THEN
1630 CALL ZLASET('A',NR,N1-NR,CZERO,CZERO,U(1,NR+1),LDU)
1631 CALL ZLASET('A',M-NR,N1-NR,CZERO,CONE,
1636 * The Q matrix from the first QRF is built into the left singular
1637 * matrix U. This applies to all cases.
1639 CALL ZUNMQR( 'Left', 'No_Tr', M, N1, N, A, LDA, CWORK, U,
1640 $ LDU, CWORK(N+1), LWORK-N, IERR )
1642 * The columns of U are normalized. The cost is O(M*N) flops.
1643 TEMP1 = DSQRT(DBLE(M)) * EPSLN
1645 XSC = ONE / DZNRM2( M, U(1,p), 1 )
1646 IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1647 $ CALL ZDSCAL( M, XSC, U(1,p), 1 )
1650 * If the initial QRF is computed with row pivoting, the left
1651 * singular vectors must be adjusted.
1654 $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1658 * .. the initial matrix A has almost orthogonal columns and
1659 * the second QRF is not needed
1661 CALL ZLACPY( 'Upper', N, N, A, LDA, CWORK(N+1), N )
1665 CTEMP = XSC * CWORK( N + (p-1)*N + p )
1666 DO 5971 q = 1, p - 1
1667 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
1668 * $ ABS(CWORK(N+(p-1)*N+q)) )
1669 CWORK(N+(q-1)*N+p)=-CTEMP
1673 CALL ZLASET( 'Lower',N-1,N-1,CZERO,CZERO,CWORK(N+2),N )
1676 CALL ZGESVJ( 'Upper', 'U', 'N', N, N, CWORK(N+1), N, SVA,
1677 $ N, U, LDU, CWORK(N+N*N+1), LWORK-N-N*N, RWORK, LRWORK,
1681 NUMRANK = NINT(RWORK(2))
1683 CALL ZCOPY( N, CWORK(N+(p-1)*N+1), 1, U(1,p), 1 )
1684 CALL ZDSCAL( N, SVA(p), CWORK(N+(p-1)*N+1), 1 )
1687 CALL ZTRSM( 'Left', 'Upper', 'NoTrans', 'No UD', N, N,
1688 $ CONE, A, LDA, CWORK(N+1), N )
1690 CALL ZCOPY( N, CWORK(N+p), N, V(IWORK(p),1), LDV )
1692 TEMP1 = DSQRT(DBLE(N))*EPSLN
1694 XSC = ONE / DZNRM2( N, V(1,p), 1 )
1695 IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1696 $ CALL ZDSCAL( N, XSC, V(1,p), 1 )
1699 * Assemble the left singular vector matrix U (M x N).
1701 IF ( N .LT. M ) THEN
1702 CALL ZLASET( 'A', M-N, N, CZERO, CZERO, U(N+1,1), LDU )
1703 IF ( N .LT. N1 ) THEN
1704 CALL ZLASET('A',N, N1-N, CZERO, CZERO, U(1,N+1),LDU)
1705 CALL ZLASET( 'A',M-N,N1-N, CZERO, CONE,U(N+1,N+1),LDU)
1708 CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U,
1709 $ LDU, CWORK(N+1), LWORK-N, IERR )
1710 TEMP1 = DSQRT(DBLE(M))*EPSLN
1712 XSC = ONE / DZNRM2( M, U(1,p), 1 )
1713 IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1714 $ CALL ZDSCAL( M, XSC, U(1,p), 1 )
1718 $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1722 * end of the >> almost orthogonal case << in the full SVD
1726 * This branch deploys a preconditioned Jacobi SVD with explicitly
1727 * accumulated rotations. It is included as optional, mainly for
1728 * experimental purposes. It does perfom well, and can also be used.
1729 * In this implementation, this branch will be automatically activated
1730 * if the condition number sigma_max(A) / sigma_min(A) is predicted
1731 * to be greater than the overflow threshold. This is because the
1732 * a posteriori computation of the singular vectors assumes robust
1733 * implementation of BLAS and some LAPACK procedures, capable of working
1734 * in presence of extreme values. Since that is not always the case, ...
1737 CALL ZCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1738 CALL ZLACGV( N-p+1, V(p,p), 1 )
1742 XSC = DSQRT(SMALL/EPSLN)
1744 CTEMP = DCMPLX(XSC*ABS( V(q,q) ),ZERO)
1746 IF ( ( p .GT. q ) .AND. ( ABS(V(p,q)) .LE. TEMP1 )
1747 $ .OR. ( p .LT. q ) )
1748 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1750 IF ( p .LT. q ) V(p,q) = - V(p,q)
1754 CALL ZLASET( 'U', NR-1, NR-1, CZERO, CZERO, V(1,2), LDV )
1757 CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1759 CALL ZLACPY( 'L', N, NR, V, LDV, CWORK(2*N+1), N )
1762 CALL ZCOPY( NR-p+1, V(p,p), LDV, U(p,p), 1 )
1763 CALL ZLACGV( NR-p+1, U(p,p), 1 )
1767 XSC = DSQRT(SMALL/EPSLN)
1769 DO 9971 p = 1, q - 1
1770 CTEMP = DCMPLX(XSC * DMIN1(ABS(U(p,p)),ABS(U(q,q))),
1772 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
1777 CALL ZLASET('U', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU )
1780 CALL ZGESVJ( 'L', 'U', 'V', NR, NR, U, LDU, SVA,
1781 $ N, V, LDV, CWORK(2*N+N*NR+1), LWORK-2*N-N*NR,
1782 $ RWORK, LRWORK, INFO )
1784 NUMRANK = NINT(RWORK(2))
1786 IF ( NR .LT. N ) THEN
1787 CALL ZLASET( 'A',N-NR,NR,CZERO,CZERO,V(NR+1,1),LDV )
1788 CALL ZLASET( 'A',NR,N-NR,CZERO,CZERO,V(1,NR+1),LDV )
1789 CALL ZLASET( 'A',N-NR,N-NR,CZERO,CONE,V(NR+1,NR+1),LDV )
1792 CALL ZUNMQR( 'L','N',N,N,NR,CWORK(2*N+1),N,CWORK(N+1),
1793 $ V,LDV,CWORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1795 * Permute the rows of V using the (column) permutation from the
1796 * first QRF. Also, scale the columns to make them unit in
1797 * Euclidean norm. This applies to all cases.
1799 TEMP1 = DSQRT(DBLE(N)) * EPSLN
1802 CWORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1805 V(p,q) = CWORK(2*N+N*NR+NR+p)
1807 XSC = ONE / DZNRM2( N, V(1,q), 1 )
1808 IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1809 $ CALL ZDSCAL( N, XSC, V(1,q), 1 )
1812 * At this moment, V contains the right singular vectors of A.
1813 * Next, assemble the left singular vector matrix U (M x N).
1815 IF ( NR .LT. M ) THEN
1816 CALL ZLASET( 'A', M-NR, NR, CZERO, CZERO, U(NR+1,1), LDU )
1817 IF ( NR .LT. N1 ) THEN
1818 CALL ZLASET('A',NR, N1-NR, CZERO, CZERO, U(1,NR+1),LDU)
1819 CALL ZLASET('A',M-NR,N1-NR, CZERO, CONE,U(NR+1,NR+1),LDU)
1823 CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U,
1824 $ LDU, CWORK(N+1), LWORK-N, IERR )
1827 $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1832 * .. swap U and V because the procedure worked on A^*
1834 CALL ZSWAP( N, U(1,p), 1, V(1,p), 1 )
1839 * end of the full SVD
1841 * Undo scaling, if necessary (and possible)
1843 IF ( USCAL2 .LE. (BIG/SVA(1))*USCAL1 ) THEN
1844 CALL DLASCL( 'G', 0, 0, USCAL1, USCAL2, NR, 1, SVA, N, IERR )
1849 IF ( NR .LT. N ) THEN
1855 RWORK(1) = USCAL2 * SCALEM
1857 IF ( ERREST ) RWORK(3) = SCONDA
1858 IF ( LSVEC .AND. RSVEC ) THEN