User talk:Pzgesvd
Revision as of 22:30, 9 September 2008
bidiagonal reduction: code function matching (kernel --- step)
DGEQRT --- LQR1
DTSQRT --- LQR2
DLARTB --- LUP1
DSSRFT --- LUP2
DGEQRT --- RQR1
DTSQRT --- RQR2
DLARTB --- RUP1
DSSRFT --- RUP2
delete lines from files using sed
sed -ie '1,11d' dgetrf.c
the original files are backed up as .ce files (sed reads -ie as -i with backup suffix e, so dgetrf.c is saved as dgetrf.ce)
refer to [1]
low-cost stack for function calls?
can the stack operations in the DAG chasing be reduced to minimize the overhead?
CLAPACK 3.1.1 BLAS testing: hand-tuning
In the testing routines, eps is wrongly computed as about 1e-19; it should be about 1e-7. A new piece of code that computes eps is inserted into each file (dblat3.c, for example).
For double precision (complex16), a problem remains: some element results are still inaccurate.
a recap of all the work that I've done
(1) Complex version of SVD for ScaLAPACK
(2) RFP (Rectangular Full Packed) format for LAPACK
(3) Performance comparison between PLAPACK & ScaLAPACK
(4) Rewrite of PDGETRF: optimization with look-ahead and other threading methods for better performance on multi-core clusters
(5) CELL learning
(6) Variations of LU, Cholesky and QR for LAPACK
(7) CLAPACK conversion
(8) ScaLAPACK auto-tuning
(9) Automatic DAG generation for LAPACK
(10) Using random butterfly transformations to remove pivoting
(11) ILP64 support for LAPACK, ScaLAPACK
(12) Tiled LU without pivoting