NetBSD/sys/lib/libkern
itohy f89823c1f8 Save 1-4 instructions on all cases except for the ret=0 case.
This is probably the last version from me. :)
You are welcome to speed it up, of course. :)

Here's a benchmark on SH-4 200MHz.
About 9.2% faster than the previous version if all the return values occur equally often.

return value	C version	previous version	this version	speed ratio
of ffs()	(ns/call) *1	(ns/call)	(ns/call) *2	(*1 / *2)
------------	------------	-------------	------------	-----------
 0		  86		 81		 81		1.06
 1		 110		106		 91		1.21
 2		 132		106		 92		1.43
 3		 165		117		 96		1.72
 4		 201		116		 95		2.12
 5		 237		107		 99		2.39
 6		 271		106		101		2.68
 7		 307		116		107		2.87
 8		 342		116		105		3.26
 9		 376		126		111		3.39
10		 410		127		110		3.73
11		 446		136		115		3.88
12		 483		134		116		4.16
13		 518		125		119		4.35
14		 551		126		120		4.59
15		 587		135		127		4.62
16		 624		136		126		4.95
17		 658		139		126		5.22
18		 694		140		126		5.51
19		 727		148		131		5.55
20		 764		150		131		5.83
21		 799		141		135		5.92
22		 834		142		135		6.18
23		 868		152		140		6.20
24		 903		153		142		6.36
25		 939		140		127		7.39
26		 974		141		126		7.73
27		1009		152		131		7.70
28		1044		148		130		8.03
29		1080		141		136		7.94
30		1115		141		136		8.20
31		1151		151		141		8.16
32		1185		151		140		8.46
2002-09-01 13:14:53 +00:00
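
For readers unfamiliar with ffs(): it returns the 1-based index of the least significant set bit of its argument, or 0 if no bit is set. The sketch below is not the libkern source. It is a hypothetical shift-and-test loop (the name ffs_sketch is made up for illustration), shown only because a loop of this shape would be consistent with the "C version" column above growing by roughly 35 ns per bit position, while the assembly versions stay comparatively flat.

```c
/*
 * Hypothetical portable ffs(): returns the 1-based index of the least
 * significant set bit in mask, or 0 if mask is 0.
 * This is an illustrative sketch, not the code in ffs.c.
 */
int
ffs_sketch(int mask)
{
	unsigned int bits = (unsigned int)mask;
	int bit;

	if (bits == 0)
		return 0;

	/* One iteration per trailing zero bit, so cost grows with the result. */
	for (bit = 1; (bits & 1) == 0; bit++)
		bits >>= 1;

	return bit;
}
```

With this sketch, ffs_sketch(0) gives 0, ffs_sketch(1) gives 1, and ffs_sketch(8) gives 4, matching the "return value of ffs()" column used to index the benchmark rows.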
arch Save 1-4 instructions on all cases except for the ret=0 case. 2002-09-01 13:14:53 +00:00
__assert.c
__cmsg_alignbytes.c don't need <sys/types.h> when including <sys/param.h> 2001-11-15 09:47:59 +00:00
__main.c
_que.c Move _insque()/_remque() to libkern. Once remaining uses would 2001-08-12 08:35:31 +00:00
adddi3.c
anddi3.c
arc4random.c discard 256 bytes of output every time we stir (not just when initializing) 2002-06-14 03:05:46 +00:00
ashldi3.c
ashrdi3.c
bcmp.c
bcopy.c
bswap16.c
bswap32.c
bswap64.c
bzero.c
cmpdi2.c
divdi3.c
ffs.c
htonl.c sync argument/return type of [hn]to[nh][ls] to XNET 5.2 (uint{16,32}_t). 2001-08-22 07:42:07 +00:00
htons.c sync argument/return type of [hn]to[nh][ls] to XNET 5.2 (uint{16,32}_t). 2001-08-22 07:42:07 +00:00
imax.c
imin.c
index.c
inet_addr.c ctype-like functions are now in libkern. 2001-04-18 15:40:58 +00:00
intoa.c
iordi3.c
libkern.h Tweak the previous change so that a prototype is always provided. 2002-08-25 21:09:45 +00:00
lmax.c
lmin.c
lshldi3.c
lshrdi3.c
Makefile Updated version of cscope/mkid support. Check libkern and compat lib 2002-06-18 23:46:52 +00:00
Makefile.inc Don't make clean and cleandir depend on the lib subdir. Just check for it's 2001-11-21 22:10:54 +00:00
max.c
mcount.c
md4c.c don't need <sys/types.h> when including <sys/param.h> 2001-11-15 09:47:59 +00:00
md5c.c don't need <sys/types.h> when including <sys/param.h> 2001-11-15 09:47:59 +00:00
memchr.c
memcmp.c
memcpy.c
memmove.c
memset.c
milieu.h o IEEE 754 floating-point completion code. 2001-04-26 03:10:44 +00:00
min.c
moddi3.c
muldi3.c
negdi2.c
notdi2.c
ntohl.c sync argument/return type of [hn]to[nh][ls] to XNET 5.2 (uint{16,32}_t). 2001-08-22 07:42:07 +00:00
ntohs.c sync argument/return type of [hn]to[nh][ls] to XNET 5.2 (uint{16,32}_t). 2001-08-22 07:42:07 +00:00
pmatch.c don't need <sys/types.h> when including <sys/param.h> 2001-11-15 09:47:59 +00:00
qdivrem.c
quad.h
random.c
rb.c Add "Red Black +" balanced binary tree routines to libkern. These provide 2001-10-24 22:40:56 +00:00
rb.h Add "Red Black +" balanced binary tree routines to libkern. These provide 2001-10-24 22:40:56 +00:00
rindex.c
scanc.c convert to ansi knf, and fix a bug where the last arg was incorrectly 2001-08-09 08:03:34 +00:00
sha1.c
skpc.c
softfloat-macros.h o IEEE 754 floating-point completion code. 2001-04-26 03:10:44 +00:00
softfloat-specialize.h o IEEE 754 floating-point completion code. 2001-04-26 03:10:44 +00:00
softfloat.c o IEEE 754 floating-point completion code. 2001-04-26 03:10:44 +00:00
softfloat.h o IEEE 754 floating-point completion code. 2001-04-26 03:10:44 +00:00
strcasecmp.c
strcat.c
strchr.c
strcmp.c
strcpy.c
strlen.c
strncasecmp.c
strncmp.c
strncpy.c
strrchr.c
strtoul.c
subdi3.c
ucmpdi2.c
udivdi3.c
ulmax.c
ulmin.c
umoddi3.c
xordi3.c