genann

Commit Graph

Author	SHA1	Message	Date
Jose Fernando Lopez Fernandez	5c93809d3b	Rebased to upstream master. Removed inline function specifiers.	2019-06-07 12:13:27 -04:00
Lewis Van Winkle	94e22d4e06	tweaks to help it compile in VS	2018-07-07 19:05:53 -05:00
Lewis Van Winkle	eb56a6d9f2	changed macro naming	2018-07-07 18:51:16 -05:00
Lewis Van Winkle	314ef383dd	update copyright date	2018-07-06 10:50:44 -05:00
Lewis Van Winkle	d2e716d9d1	removed line of dead code	2018-07-06 10:29:23 -05:00
Andrew Jeffery	d21d0f301b	genann: Remove branching from back-propagation inner-loop This saves approximately 80 million instructions and 44 million branches in the trace of example4, shaving off around 8ms: Before: ``` Performance counter stats for './example4': 92.629610 task-clock (msec) # 0.997 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 78 page-faults # 0.842 K/sec 291,863,801 cycles # 3.151 GHz 1,000,931,204 instructions # 3.43 insn per cycle 202,465,800 branches # 2185.757 M/sec 50,949 branch-misses # 0.03% of all branches 0.092889789 seconds time elapsed ``` After: ``` Performance counter stats for './example4': 84.473035 task-clock (msec) # 0.997 CPUs utilized 3 context-switches # 0.036 K/sec 0 cpu-migrations # 0.000 K/sec 81 page-faults # 0.959 K/sec 265,472,170 cycles # 3.143 GHz 919,372,488 instructions # 3.46 insn per cycle 158,754,885 branches # 1879.356 M/sec 65,337 branch-misses # 0.04% of all branches 0.084755458 seconds time elapsed ``` Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-12-18 22:37:39 +10:30
Andrew Jeffery	db51375bb7	genann: Optionally resolve activation functions at link time Shave around 94 million instructions and 10 million branches off of execution trace of example4 if the sigmoid activation function is resolved at link-time. Before (`make`): ``` Performance counter stats for './example4': 98.988806 task-clock (msec) # 0.998 CPUs utilized 1 context-switches # 0.010 K/sec 0 cpu-migrations # 0.000 K/sec 79 page-faults # 0.798 K/sec 312,298,260 cycles # 3.155 GHz 1,094,183,752 instructions # 3.50 insn per cycle 212,007,732 branches # 2141.734 M/sec 62,774 branch-misses # 0.03% of all branches 0.099228100 seconds time elapsed ``` After: `make`: ``` Performance counter stats for './example4': 97.335180 task-clock (msec) # 0.998 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 82 page-faults # 0.842 K/sec 306,722,357 cycles # 3.151 GHz 1,065,669,644 instructions # 3.47 insn per cycle 214,256,601 branches # 2201.225 M/sec 60,154 branch-misses # 0.03% of all branches 0.097577079 seconds time elapsed ``` `make sigmoid`: ``` Performance counter stats for './example4': 92.629610 task-clock (msec) # 0.997 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 78 page-faults # 0.842 K/sec 291,863,801 cycles # 3.151 GHz 1,000,931,204 instructions # 3.43 insn per cycle 202,465,800 branches # 2185.757 M/sec 50,949 branch-misses # 0.03% of all branches 0.092889789 seconds time elapsed ``` Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-12-18 22:36:44 +10:30
Andrew Jeffery	b1f72be243	genann: Unroll loops via hoisting inner-loop conditions in genann_run() This gives a reduction of roughly 27 million instructions and 11 million branches in the execution trace of example4. On a Lenovo X1 Carbon Gen 3 machine (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz) running Ubuntu 17.10 with GCC 7.2.0-8ubuntu3, using CFLAGS="-g -O3 -march=native -DNDEBUG" I see the following change in `perf stat`: Before: ``` Performance counter stats for './example4': 101.369081 task-clock (msec) # 0.998 CPUs utilized 1 context-switches # 0.010 K/sec 0 cpu-migrations # 0.000 K/sec 79 page-faults # 0.779 K/sec 320,197,883 cycles # 3.159 GHz 1,121,174,423 instructions # 3.50 insn per cycle 223,257,752 branches # 2202.425 M/sec 62,680 branch-misses # 0.03% of all branches 0.101595114 seconds time elapsed ``` After: ``` Performance counter stats for './example4': 98.988806 task-clock (msec) # 0.998 CPUs utilized 1 context-switches # 0.010 K/sec 0 cpu-migrations # 0.000 K/sec 79 page-faults # 0.798 K/sec 312,298,260 cycles # 3.155 GHz 1,094,183,752 instructions # 3.50 insn per cycle 212,007,732 branches # 2141.734 M/sec 62,774 branch-misses # 0.03% of all branches 0.099228100 seconds time elapsed ``` Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-12-18 16:59:40 +10:30
Andrew Jeffery	6574bddf6b	genann: Use reciprocal interval value to strength reduce divide to multiply This gives a reduction of roughly 2.5 million instructions in the execution trace of example4. genann_act_sigmoid_cached() previously divided by interval to calculate the lookup index. Divide is an expensive operation, so instead use the reciprocal of the existing interval calculation to reduce the divide to a multiply. Building with the following configuration: ``` $ head /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 61 model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz stepping : 4 microcode : 0x25 cpu MHz : 2593.871 cache size : 4096 KB physical id : 0 $ cat /etc/os-release NAME="Ubuntu" VERSION="17.10 (Artful Aardvark)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 17.10" VERSION_ID="17.10" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=artful UBUNTU_CODENAME=artful $ cc --version gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
``` on my Lenovo X1 Carbon Gen 3 machine sees the following: ``` $ make CFLAGS="-g -O3 -march=native -DNDEBUG" cc -g -O3 -march=native -DNDEBUG -c -o test.o test.c cc -g -O3 -march=native -DNDEBUG -c -o genann.o genann.c cc -g -O3 -march=native -DNDEBUG -c -o example1.o example1.c cc -g -O3 -march=native -DNDEBUG -c -o example2.o example2.c cc -g -O3 -march=native -DNDEBUG -c -o example3.o example3.c cc -g -O3 -march=native -DNDEBUG -c -o example4.o example4.c cc -g -O3 -march=native -DNDEBUG -c -o strings.o strings.c cc test.o genann.o -lm -o test cc example1.o genann.o -lm -o example1 cc example4.o genann.o -lm -o example4 cc example3.o genann.o -lm -o example3 cc example2.o genann.o -lm -o example2 cc strings.o genann.o -lm -o strings $ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4 GENANN example 4. Train an ANN on the IRIS dataset using backpropagation. Loading 150 data points from example/iris.data Training for 5000 loops over data. 147/150 correct (98.0%). 
Performance counter stats for './example4': 101.369081 task-clock (msec) # 0.998 CPUs utilized 1 context-switches # 0.010 K/sec 0 cpu-migrations # 0.000 K/sec 79 page-faults # 0.779 K/sec 320,197,883 cycles # 3.159 GHz 1,121,174,423 instructions # 3.50 insn per cycle 223,257,752 branches # 2202.425 M/sec 62,680 branch-misses # 0.03% of all branches 0.101595114 seconds time elapsed ``` Prior to the change, we see something like: ``` $ make CFLAGS="-g -O3 -march=native" cc -g -O3 -march=native -c -o test.o test.c cc -g -O3 -march=native -c -o genann.o genann.c cc -g -O3 -march=native -c -o example1.o example1.c cc -g -O3 -march=native -c -o example2.o example2.c cc -g -O3 -march=native -c -o example3.o example3.c cc -g -O3 -march=native -c -o example4.o example4.c cc -g -O3 -march=native -c -o strings.o strings.c cc test.o genann.o -lm -o test cc example1.o genann.o -lm -o example1 cc example3.o genann.o -lm -o example3 cc example4.o genann.o -lm -o example4 cc strings.o genann.o -lm -o strings cc example2.o genann.o -lm -o example2 $ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4 GENANN example 4. Train an ANN on the IRIS dataset using backpropagation. Loading 150 data points from example/iris.data Training for 5000 loops over data. 147/150 correct (98.0%). Performance counter stats for './example4': 104.644198 task-clock (msec) # 0.998 CPUs utilized 0 context-switches # 0.000 K/sec 0 cpu-migrations # 0.000 K/sec 79 page-faults # 0.755 K/sec 330,340,554 cycles # 3.157 GHz 1,123,669,767 instructions # 3.40 insn per cycle 215,441,809 branches # 2058.803 M/sec 62,406 branch-misses # 0.03% of all branches 0.104891323 seconds time elapsed ``` Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-12-18 16:59:40 +10:30
Andrew Jeffery	8b35090f06	genann: Sort headers Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-10-22 08:24:44 +10:30
Andrew Jeffery	9e86fc903e	genann: Fix unused-result warnings for fscanf() Signed-off-by: Andrew Jeffery <andrew@aj.id.au>	2017-10-22 08:23:58 +10:30
Dickby	e12f3a1820	simplify most inner loops of genann_run	2017-08-20 21:27:54 +02:00
Lewis Van Winkle	8853c56155	Added linear act output delta calculation	2016-05-20 17:54:46 -05:00
Lewis Van Winkle	29264145be	Added linear activation function.	2016-05-19 16:55:44 -05:00
Lewis Van Winkle	99e4d6a0e1	Changed name case, code style.	2016-02-11 14:38:42 -06:00
Lewis Van Winkle	850f080045	Initial commit	2016-02-09 17:53:54 -06:00

16 Commits