Commit Graph

34 Commits

Author SHA1 Message Date
Andrew Jeffery d21d0f301b genann: Remove branching from back-propagation inner-loop
This saves approximately 80 million instructions and 44 million branches in the
trace of example4, shaving off around 8ms:

Before:

```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

After:
```
 Performance counter stats for './example4':

         84.473035      task-clock (msec)         #    0.997 CPUs utilized
                 3      context-switches          #    0.036 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                81      page-faults               #    0.959 K/sec
       265,472,170      cycles                    #    3.143 GHz
       919,372,488      instructions              #    3.46  insn per cycle
       158,754,885      branches                  # 1879.356 M/sec
            65,337      branch-misses             #    0.04% of all branches

       0.084755458 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:37:39 +10:30
Andrew Jeffery db51375bb7 genann: Optionally resolve activation functions at link time
Shave around 94 million instructions and 10 million branches off of execution
trace of example4 if the sigmoid activation function is resolved at link-time.

Before (`make`):
```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

After:

`make`:
```
 Performance counter stats for './example4':

         97.335180      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                82      page-faults               #    0.842 K/sec
       306,722,357      cycles                    #    3.151 GHz
     1,065,669,644      instructions              #    3.47  insn per cycle
       214,256,601      branches                  # 2201.225 M/sec
            60,154      branch-misses             #    0.03% of all branches

       0.097577079 seconds time elapsed
```

`make sigmoid`:
```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:36:44 +10:30
Andrew Jeffery b1f72be243 genann: Unroll loops via hoisting inner-loop conditions in genann_run()
This gives a reduction of rougly 27 million instructions and 11 million
branches in the execution trace of example4.

On a Lenovo X1 Carbon Gen 3 machine (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz)
running Ubuntu 17.10 with GCC 7.2.0-8ubuntu3, using
CFLAGS="-g -O3 -march=native -DNDEBUG" I see the following change in
`perf stat`:

Before:

```
Performance counter stats for './example4':

       101.369081      task-clock (msec)         #    0.998 CPUs utilized
                1      context-switches          #    0.010 K/sec
                0      cpu-migrations            #    0.000 K/sec
               79      page-faults               #    0.779 K/sec
      320,197,883      cycles                    #    3.159 GHz
    1,121,174,423      instructions              #    3.50  insn per cycle
      223,257,752      branches                  # 2202.425 M/sec
           62,680      branch-misses             #    0.03% of all branches

      0.101595114 seconds time elapsed
```

After:

```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery 6574bddf6b genann: Use reciprocal interval value to strength reduce divide to multiply
This gives a reduction of roughly 2.5 million instructions in the execution
trace of example4.

genann_act_sigmoid_cached() previously divided by interval to calculate the
lookup index. Divide is a expensive operation, so instead use the reciprocal of
the existing interval calculation to reduce the divide to a multiply.

Building with the following configuration:

```
$ head /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
stepping        : 4
microcode       : 0x25
cpu MHz         : 2593.871
cache size      : 4096 KB
physical id     : 0
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful
$ cc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

on my Lenovo X1 Carbon Gen 3 machine sees the following:

```
$ make CFLAGS="-g -O3 -march=native -DNDEBUG"
cc -g -O3 -march=native -DNDEBUG   -c -o test.o test.c
cc -g -O3 -march=native -DNDEBUG   -c -o genann.o genann.c
cc -g -O3 -march=native -DNDEBUG   -c -o example1.o example1.c
cc -g -O3 -march=native -DNDEBUG   -c -o example2.o example2.c
cc -g -O3 -march=native -DNDEBUG   -c -o example3.o example3.c
cc -g -O3 -march=native -DNDEBUG   -c -o example4.o example4.c
cc -g -O3 -march=native -DNDEBUG   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example4.o genann.o  -lm -o example4
cc   example3.o genann.o  -lm -o example3
cc   example2.o genann.o  -lm -o example2
cc   strings.o genann.o  -lm -o strings
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        101.369081      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.779 K/sec
       320,197,883      cycles                    #    3.159 GHz
     1,121,174,423      instructions              #    3.50  insn per cycle
       223,257,752      branches                  # 2202.425 M/sec
            62,680      branch-misses             #    0.03% of all branches

       0.101595114 seconds time elapsed
```

Prior to the change, we see something like:

```
$ make CFLAGS="-g -O3 -march=native"
cc -g -O3 -march=native   -c -o test.o test.c
cc -g -O3 -march=native   -c -o genann.o genann.c
cc -g -O3 -march=native   -c -o example1.o example1.c
cc -g -O3 -march=native   -c -o example2.o example2.c
cc -g -O3 -march=native   -c -o example3.o example3.c
cc -g -O3 -march=native   -c -o example4.o example4.c
cc -g -O3 -march=native   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example3.o genann.o  -lm -o example3
cc   example4.o genann.o  -lm -o example4
cc   strings.o genann.o  -lm -o strings
cc   example2.o genann.o  -lm -o example2
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        104.644198      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.755 K/sec
       330,340,554      cycles                    #    3.157 GHz
     1,123,669,767      instructions              #    3.40  insn per cycle
       215,441,809      branches                  # 2058.803 M/sec
            62,406      branch-misses             #    0.03% of all branches

       0.104891323 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery b79a5ce751 Makefile: Increase optimisation
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery ad8bbaa979 Makefile: Add test and example programs to clean target
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery 01511920e2 Makefile: Fix CFLAGS variable name
CCFLAGS is non-standard, and thus ignored now that standard make rules are
used.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Lewis Van Winkle 4dd67e42bc
Merge pull request #7 from amboar/misc-cleanups
Miscellaneous cleanups
2017-11-28 11:43:09 -06:00
Andrew Jeffery e4e40304e0 Makefile: Use standard make variables and recipes
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 09:54:56 +10:30
Andrew Jeffery 8b35090f06 genann: Sort headers
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:24:44 +10:30
Andrew Jeffery 9e86fc903e genann: Fix unused-result warnings for fscanf()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:58 +10:30
Andrew Jeffery 4ef0a3f874 example4: Fix unused-result warning for fgets()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:21 +10:30
Andrew Jeffery afa5df1ffc Makefile: Use $(RM), silencing errors on missing files
$(RM) includes the -f flag, so the clean target now succeeds when files
to remove don't exist. The post-condition of clean is that compilation
artifacts are not present; this is trivially satisfied if they never
existed.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:05:06 +10:30
Lewis Van Winkle 88c9d3513b Merge pull request #5 from Dickby/master
simplify most inner loops of genann_run
2017-08-20 16:57:21 -05:00
Dickby e12f3a1820 simplify most inner loops of genann_run 2017-08-20 21:27:54 +02:00
Lewis Van Winkle de04314b10 fixed headings for github 2017-04-03 13:12:42 -05:00
Lewis Van Winkle a55180c2e4 typo 2017-01-30 20:06:38 -06:00
Lewis Van Winkle 4d2eec816e added more training for xor 2017-01-15 12:48:00 -06:00
Lewis Van Winkle b596fe2fd0 moved license 2016-12-21 12:22:49 -06:00
Lewis Van Winkle 9ffc3715f5 Added building section to readme 2016-12-07 15:44:18 -06:00
Lewis Van Winkle 58f8e88730 added logo to readme 2016-12-07 12:20:39 -06:00
Lewis Van Winkle 8853c56155 Added linear act output delta calculation 2016-05-20 17:54:46 -05:00
Lewis Van Winkle 29264145be Added linear activation function. 2016-05-19 16:55:44 -05:00
Lewis Van Winkle 6ece821187 Typo 2016-03-14 16:44:38 -05:00
Lewis Van Winkle 28e3d30ef1 Rewording. 2016-03-14 16:43:36 -05:00
Lewis Van Winkle 0bb6591322 Linked to examples. 2016-03-14 12:25:08 -05:00
Lewis Van Winkle d9da5edab4 Added to documentation. 2016-03-14 12:23:10 -05:00
Lewis Van Winkle 064703e332 Changed build to work with both gcc and clang. 2016-03-08 12:58:02 -06:00
Lewis Van Winkle b1fcdf7f69 Added travis build status. 2016-03-08 12:51:05 -06:00
Lewis Van Winkle 99e4d6a0e1 Changed name case, code style. 2016-02-11 14:38:42 -06:00
Lewis Van Winkle 2bbc1b146c Readme work. 2016-02-09 22:09:21 -06:00
Lewis Van Winkle f44be25c8b Tweaked readme. 2016-02-09 19:16:18 -06:00
Lewis Van Winkle b6da170fa7 Added C++ guards. 2016-02-09 19:13:37 -06:00
Lewis Van Winkle 850f080045 Initial commit 2016-02-09 17:53:54 -06:00