7 Commits

Author SHA1 Message Date
thorpej 792a41ba35 Handle the CPU-specific optimization flags in a better way, as
suggested by Simon Burge.
2001-09-10 02:56:57 +00:00
thorpej 6fcde7aad3 Grumble. If you make external references, the code MUST be PIC
for shared libraries.  This code is not PIC, so DO NOT BUILD IT.
2001-09-09 19:55:24 +00:00
tls 43e3cefe90 Add assembly versions of the DES transforms for x86.  This gives a
performance improvement of about 3.5x over the old C versions on my 1333MHz
Athlon (about 37MB/sec!).

We could boost the speed of the C versions on most other architectures with
des.inc files that set the compile-time flags (DES_PTR, DES_RISC1, DES_RISC2)
correctly; at the moment they aren't set at all.
2001-09-09 10:44:24 +00:00
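The DES commit message gives only a ratio ("about 3.5X") and the new rate ("about 37MB/sec"), not the old C figure. Since new = speedup × old, a rough estimate of the old rate follows by division. This is a minimal sketch; both inputs are approximations, so the result is only a ballpark number, and the commit does not say whether "MB" means 10^6 or 2^20 bytes.

```python
# Approximate figures quoted in the DES commit message (not exact measurements).
speedup = 3.5            # "about 3.5X" over the old C versions
asm_mb_per_s = 37.0      # "about 37MB/sec" with the new assembly versions

# If new_rate = speedup * old_rate, then old_rate = new_rate / speedup.
c_mb_per_s = asm_mb_per_s / speedup
print(f"implied C-version throughput: about {c_mb_per_s:.1f} MB/s")
```

Under those assumptions the old C versions would have run at roughly 10.6 MB/s on the same machine.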
thorpej cf6fc32958 Add support for building the assembly versions of some BIGNUM
routines from OpenSSL.  Speeds up DSA significantly: roughly 4x at 512 bits
and 5x at 1024 bits in the figures below.  A similar gain should also be seen
for RSA.

Before:
Doing 512 bit sign dsa's for 10s: 965 512 bit DSA signs in 9.97s
Doing 512 bit verify dsa's for 10s: 766 512 bit DSA verify in 9.93s
Doing 1024 bit sign dsa's for 10s: 276 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 217 1024 bit DSA verify in 9.93s
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0103s   0.0130s     96.8     77.1
dsa 1024 bits   0.0362s   0.0458s     27.6     21.9

After:
Doing 512 bit sign dsa's for 10s: 3742 512 bit DSA signs in 9.88s
Doing 512 bit verify dsa's for 10s: 3065 512 bit DSA verify in 9.92s
Doing 1024 bit sign dsa's for 10s: 1357 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 1094 1024 bit DSA verify in 9.83s
                  sign    verify    sign/s verify/s
dsa  512 bits   0.0026s   0.0032s    378.7    309.0
dsa 1024 bits   0.0074s   0.0090s    135.8    111.3
2000-07-31 19:57:30 +00:00
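In the DSA benchmark output above, the sign/s and verify/s columns are the operation count divided by the elapsed seconds, and the sign and verify columns are the reciprocal (seconds per operation). A minimal sketch that recomputes them from the "Doing ..." lines and derives the per-operation speedups; the dictionary keys and helper names are illustrative, not part of OpenSSL.

```python
# (operation count, elapsed seconds) taken from the "Doing ..." lines.
before = {
    "512 sign": (965, 9.97),
    "512 verify": (766, 9.93),
    "1024 sign": (276, 9.99),
    "1024 verify": (217, 9.93),
}
after = {
    "512 sign": (3742, 9.88),
    "512 verify": (3065, 9.92),
    "1024 sign": (1357, 9.99),
    "1024 verify": (1094, 9.83),
}

def rate(count, seconds):
    """Operations per second, as in the sign/s and verify/s columns."""
    return count / seconds

def per_op(count, seconds):
    """Seconds per operation, as in the sign and verify columns."""
    return seconds / count

for op in before:
    old = rate(*before[op])
    new = rate(*after[op])
    print(f"{op:12s} {old:6.1f}/s -> {new:6.1f}/s  ({new / old:.2f}x)")
```

Computed from the raw counts rather than the rounded table, the gain is about 3.9x to 4.0x at 512 bits and about 4.9x to 5.1x at 1024 bits.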
thorpej 557e12076d Add support for building the assembly version of RMD160 from OpenSSL.
Before:
Doing rmd160 for 3s on 8 size blocks: 778828 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 430214 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 182108 rmd160's in 3.00s
Doing rmd160 for 3s on 1024 size blocks: 55050 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 7339 rmd160's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rmd160            2076.87k     9177.90k    15539.88k    18790.40k    20040.36k

After:
Doing rmd160 for 3s on 8 size blocks: 1084941 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 617966 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 267381 rmd160's in 2.99s
Doing rmd160 for 3s on 1024 size blocks: 82001 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 10974 rmd160's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rmd160            2893.18k    13183.27k    22892.82k    27989.67k    29966.34k
2000-07-31 19:22:04 +00:00
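The "k" figures in `openssl speed` tables are thousands of bytes (1000, not 1024) processed per second, so each table entry is count × block size ÷ elapsed seconds ÷ 1000. A minimal sketch reproducing both RMD160 rows from the "Doing ..." lines; the helper names are hypothetical.

```python
BLOCK_SIZES = (8, 64, 256, 1024, 8192)

# (count, elapsed seconds) from the RMD160 "Before" lines.
before = [(778828, 3.00), (430214, 3.00), (182108, 3.00), (55050, 3.00), (7339, 3.00)]
# Same for the "After" lines; note the 2.99s run at 256 bytes.
after = [(1084941, 3.00), (617966, 3.00), (267381, 2.99), (82001, 3.00), (10974, 3.00)]

def kbytes_per_sec(count, seconds, block_size):
    """Throughput in units of 1000 bytes per second, matching the 'k' columns."""
    return count * block_size / seconds / 1000.0

def row(results):
    """One table row, rounded to two decimals like the openssl output."""
    return [round(kbytes_per_sec(c, s, b), 2) for (c, s), b in zip(results, BLOCK_SIZES)]

print("before:", row(before))
print("after: ", row(after))
```

Using the actual elapsed time (2.99s) instead of the nominal 3s matters: it is what makes the 256-byte "After" entry come out as 22892.82k.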
thorpej cb83ceb68d Add support for building the assembly version of MD5 from OpenSSL.
Before:
Doing md5 for 3s on 8 size blocks: 1490796 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 895849 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 410807 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 129416 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 17527 md5's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               3975.46k    19111.45k    35055.53k    44173.99k    47860.39k

After:
Doing md5 for 3s on 8 size blocks: 2041410 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 1345402 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 669827 md5's in 3.01s
Doing md5 for 3s on 1024 size blocks: 221744 md5's in 2.96s
Doing md5 for 3s on 8192 size blocks: 30685 md5's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               5443.76k    28701.91k    56968.68k    76711.44k    83790.51k
2000-07-31 19:08:02 +00:00
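Dividing the "After" MD5 row by the "Before" row gives the speedup at each block size. A minimal sketch using the table values as printed; the variable names are illustrative.

```python
BLOCK_SIZES = (8, 64, 256, 1024, 8192)

# Throughput rows ("k" = 1000 bytes/s) copied from the MD5 tables.
md5_before = (3975.46, 19111.45, 35055.53, 44173.99, 47860.39)
md5_after = (5443.76, 28701.91, 56968.68, 76711.44, 83790.51)

# Speedup is the ratio of new to old throughput at each block size.
speedups = [new / old for old, new in zip(md5_before, md5_after)]

for size, s in zip(BLOCK_SIZES, speedups):
    print(f"{size:5d} bytes: {s:.2f}x")
```

The gain rises from about 1.37x at 8-byte blocks to about 1.75x at 8192 bytes. One plausible reading (an interpretation, not something the commit states) is that fixed per-call overhead, which the assembly block function does not remove, dominates for small inputs.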
thorpej dacf9960bf Add support for building the assembly versions of Blowfish encrypt
and decrypt from OpenSSL.  Right now we only build the 586 version,
but eventually we will be able to build the 686 version based on a
CPP flag defined as a result of using `cc -mcpu=pentiumpro'.

We don't build the assembly version of BF_cbc_encrypt(), as it would
have to be rewritten to be PIC.

The performance difference is substantial: throughput improves by about 1.56x
at 8-byte blocks and about 1.75x at 64 bytes and above.

Before:
Doing blowfish cbc for 3s on 8 size blocks: 2891026 blowfish cbc's in 2.97s
Doing blowfish cbc for 3s on 64 size blocks: 411766 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 256 size blocks: 104721 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 26291 blowfish cbc's in 2.98s
Doing blowfish cbc for 3s on 8192 size blocks: 3290 blowfish cbc's in 3.01s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc      7787.28k     8755.16k     8936.19k     9034.22k     8954.05k

After:
Doing blowfish cbc for 3s on 8 size blocks: 4573792 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 64 size blocks: 713440 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 256 size blocks: 183125 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 46221 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 5787 blowfish cbc's in 3.00s
type              8 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     12156.26k    15270.96k    15626.67k    15776.77k    15802.37k
2000-07-31 18:39:04 +00:00
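Because each table entry is derived from a count and an elapsed time, the time can be recovered from the other two numbers to cross-check a "Doing ..." line: seconds = count × block size ÷ (k × 1000). A minimal sketch; `implied_seconds` is a hypothetical helper, and since both the table and the printed times are rounded to two decimals, agreement is only expected to about 0.01s.

```python
def implied_seconds(count, block_size, kbytes_per_sec):
    """Elapsed time implied by a run count and its table entry.

    The 'k' columns are thousands of bytes per second, so
    seconds = count * block_size / (k * 1000).
    """
    return count * block_size / (kbytes_per_sec * 1000.0)

# 64-byte "Before" run: 411766 iterations, table entry 8755.16k.
t = implied_seconds(411766, 64, 8755.16)
print(f"implied elapsed time: {t:.2f}s")
```

The 8192-byte "After" run (5787 iterations, 15802.37k) recovers 3.00s in the same way, matching its "Doing ..." line.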