Commit Graph

16 Commits

Author SHA1 Message Date
Jose Fernando Lopez Fernandez
5c93809d3b
Rebased to upstream master. Removed inline function specifiers. 2019-06-07 12:13:27 -04:00
Lewis Van Winkle
94e22d4e06 tweaks to help it compile in VS 2018-07-07 19:05:53 -05:00
Lewis Van Winkle
eb56a6d9f2 changed macro naming 2018-07-07 18:51:16 -05:00
Lewis Van Winkle
314ef383dd update copyright date 2018-07-06 10:50:44 -05:00
Lewis Van Winkle
d2e716d9d1 removed line of dead code 2018-07-06 10:29:23 -05:00
Andrew Jeffery
d21d0f301b genann: Remove branching from back-propagation inner-loop
This saves approximately 80 million instructions and 44 million branches in the
trace of example4, shaving off around 8ms:

Before:

```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

After:
```
 Performance counter stats for './example4':

         84.473035      task-clock (msec)         #    0.997 CPUs utilized
                 3      context-switches          #    0.036 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                81      page-faults               #    0.959 K/sec
       265,472,170      cycles                    #    3.143 GHz
       919,372,488      instructions              #    3.46  insn per cycle
       158,754,885      branches                  # 1879.356 M/sec
            65,337      branch-misses             #    0.04% of all branches

       0.084755458 seconds time elapsed
```
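
As a rough illustration of this kind of change (a minimal sketch with hypothetical function names, not the actual genann code), a weight-update loop that tests for the bias weight on every iteration can have that first iteration peeled off, so the remaining loop body has no branch. The sketch assumes the bias input is fixed at -1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: every iteration asks whether this is the bias weight.
 * w holds n + 1 weights (bias first); in holds n inputs. */
static void update_weights_branchy(double *w, const double *in, size_t n,
                                   double delta, double rate) {
    assert(w != NULL && in != NULL);
    for (size_t k = 0; k < n + 1; ++k) {
        if (k == 0)
            w[k] += delta * rate * -1.0;      /* bias input assumed to be -1.0 */
        else
            w[k] += delta * rate * in[k - 1];
    }
}

/* Peeled form: the bias weight is updated once before the loop,
 * leaving a loop body with no conditional. */
static void update_weights_peeled(double *w, const double *in, size_t n,
                                  double delta, double rate) {
    assert(w != NULL && in != NULL);
    w[0] += delta * rate * -1.0;
    for (size_t k = 1; k < n + 1; ++k)
        w[k] += delta * rate * in[k - 1];
}
```

Both functions apply the same updates in the same order; only the per-iteration comparison disappears.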

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:37:39 +10:30
Andrew Jeffery
db51375bb7 genann: Optionally resolve activation functions at link time
Shave around 94 million instructions and 10 million branches off the execution
trace of example4 if the sigmoid activation function is resolved at link time.
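
One way such a build-time choice could be wired up is sketched below. The names `ACT_FIXED`, `ACT`, `act_clamp`, `struct net` and `neuron_output` are all hypothetical and are not taken from genann. When the build defines `ACT_FIXED`, each call site names the function directly; otherwise the call goes through a function pointer chosen at run time.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*act_fn)(double);

/* Hypothetical activation: a cheap clamp standing in for a sigmoid. */
static double act_clamp(double x) {
    return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);
}

struct net {
    act_fn activation; /* selected at run time */
};

/* With e.g. -DACT_FIXED=act_clamp the call is a direct call that the
 * toolchain can resolve (and possibly inline) at build time.
 * Without it, the call is an indirect call through the pointer. */
#ifdef ACT_FIXED
#define ACT(n, x) ((void)(n), ACT_FIXED(x))
#else
#define ACT(n, x) ((n)->activation(x))
#endif

static double neuron_output(const struct net *n, double weighted_sum) {
    assert(n != NULL && n->activation != NULL);
    return ACT(n, weighted_sum);
}
```

The default build keeps the run-time flexibility of the pointer, and the fixed build trades that flexibility for a direct call.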

Before (`make`):
```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

After:

`make`:
```
 Performance counter stats for './example4':

         97.335180      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                82      page-faults               #    0.842 K/sec
       306,722,357      cycles                    #    3.151 GHz
     1,065,669,644      instructions              #    3.47  insn per cycle
       214,256,601      branches                  # 2201.225 M/sec
            60,154      branch-misses             #    0.03% of all branches

       0.097577079 seconds time elapsed
```

`make sigmoid`:
```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:36:44 +10:30
Andrew Jeffery
b1f72be243 genann: Unroll loops via hoisting inner-loop conditions in genann_run()
This gives a reduction of roughly 27 million instructions and 11 million
branches in the execution trace of example4.
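
The general idea of hoisting a condition out of an inner loop can be sketched as follows. This is a generic illustration with hypothetical names (`dot_branchy`, `dot_hoisted`, `negate`), not the code in genann_run(): a flag that does not change during the loop is tested once outside it, and each outcome gets its own branch-free loop.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the loop-invariant flag is tested on every iteration. */
static double dot_branchy(const double *w, const double *in, size_t n,
                          int negate) {
    assert(w != NULL && in != NULL);
    double sum = 0.0;
    for (size_t k = 0; k < n; ++k) {
        if (negate)
            sum -= w[k] * in[k];
        else
            sum += w[k] * in[k];
    }
    return sum;
}

/* After: the flag is tested once, then one of two tight loops runs. */
static double dot_hoisted(const double *w, const double *in, size_t n,
                          int negate) {
    assert(w != NULL && in != NULL);
    double sum = 0.0;
    if (negate) {
        for (size_t k = 0; k < n; ++k)
            sum -= w[k] * in[k];
    } else {
        for (size_t k = 0; k < n; ++k)
            sum += w[k] * in[k];
    }
    return sum;
}
```

The cost is duplicated loop code; the benefit is fewer instructions and branches per iteration.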

On a Lenovo X1 Carbon Gen 3 machine (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz)
running Ubuntu 17.10 with GCC 7.2.0-8ubuntu3, using
CFLAGS="-g -O3 -march=native -DNDEBUG" I see the following change in
`perf stat`:

Before:

```
Performance counter stats for './example4':

       101.369081      task-clock (msec)         #    0.998 CPUs utilized
                1      context-switches          #    0.010 K/sec
                0      cpu-migrations            #    0.000 K/sec
               79      page-faults               #    0.779 K/sec
      320,197,883      cycles                    #    3.159 GHz
    1,121,174,423      instructions              #    3.50  insn per cycle
      223,257,752      branches                  # 2202.425 M/sec
           62,680      branch-misses             #    0.03% of all branches

      0.101595114 seconds time elapsed
```

After:

```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
6574bddf6b genann: Use reciprocal interval value to strength reduce divide to multiply
This gives a reduction of roughly 2.5 million instructions in the execution
trace of example4.

genann_act_sigmoid_cached() previously divided by interval to calculate the
lookup index. Divide is an expensive operation, so instead use the reciprocal of
the existing interval calculation to reduce the divide to a multiply.
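
A minimal sketch of the index calculation, assuming a hypothetical table geometry (`LOOKUP_SIZE`, `DOMAIN_MIN` and `DOMAIN_MAX` are illustrative values, not genann's constants, and the function names are made up for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table geometry; the interval works out to exactly 1/128. */
#define LOOKUP_SIZE 4096
#define DOMAIN_MIN (-16.0)
#define DOMAIN_MAX (16.0)

/* Baseline: one floating-point divide per lookup. */
static size_t index_by_divide(double x) {
    const double interval = (DOMAIN_MAX - DOMAIN_MIN) / LOOKUP_SIZE;
    size_t i = (size_t)((x - DOMAIN_MIN) / interval + 0.5);
    return i < LOOKUP_SIZE ? i : LOOKUP_SIZE - 1;
}

/* Strength-reduced: the reciprocal is computed once up front,
 * so each lookup multiplies instead of dividing. */
static double inv_interval;

static void lookup_init(void) {
    inv_interval = LOOKUP_SIZE / (DOMAIN_MAX - DOMAIN_MIN);
}

/* x is assumed to be >= DOMAIN_MIN; the caller handles smaller inputs. */
static size_t index_by_multiply(double x) {
    assert(inv_interval > 0.0); /* lookup_init() must have run first */
    size_t i = (size_t)((x - DOMAIN_MIN) * inv_interval + 0.5);
    return i < LOOKUP_SIZE ? i : LOOKUP_SIZE - 1;
}
```

With a power-of-two interval like the one assumed here, both forms give identical indices. For other intervals the multiply can differ from the divide in the last bit, which may shift an index by one at a rounding boundary.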

Building with the following configuration:

```
$ head /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
stepping        : 4
microcode       : 0x25
cpu MHz         : 2593.871
cache size      : 4096 KB
physical id     : 0
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful
$ cc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

On my Lenovo X1 Carbon Gen 3 machine, this build gives the following:

```
$ make CFLAGS="-g -O3 -march=native -DNDEBUG"
cc -g -O3 -march=native -DNDEBUG   -c -o test.o test.c
cc -g -O3 -march=native -DNDEBUG   -c -o genann.o genann.c
cc -g -O3 -march=native -DNDEBUG   -c -o example1.o example1.c
cc -g -O3 -march=native -DNDEBUG   -c -o example2.o example2.c
cc -g -O3 -march=native -DNDEBUG   -c -o example3.o example3.c
cc -g -O3 -march=native -DNDEBUG   -c -o example4.o example4.c
cc -g -O3 -march=native -DNDEBUG   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example4.o genann.o  -lm -o example4
cc   example3.o genann.o  -lm -o example3
cc   example2.o genann.o  -lm -o example2
cc   strings.o genann.o  -lm -o strings
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        101.369081      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.779 K/sec
       320,197,883      cycles                    #    3.159 GHz
     1,121,174,423      instructions              #    3.50  insn per cycle
       223,257,752      branches                  # 2202.425 M/sec
            62,680      branch-misses             #    0.03% of all branches

       0.101595114 seconds time elapsed
```

Prior to the change, we see something like:

```
$ make CFLAGS="-g -O3 -march=native"
cc -g -O3 -march=native   -c -o test.o test.c
cc -g -O3 -march=native   -c -o genann.o genann.c
cc -g -O3 -march=native   -c -o example1.o example1.c
cc -g -O3 -march=native   -c -o example2.o example2.c
cc -g -O3 -march=native   -c -o example3.o example3.c
cc -g -O3 -march=native   -c -o example4.o example4.c
cc -g -O3 -march=native   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example3.o genann.o  -lm -o example3
cc   example4.o genann.o  -lm -o example4
cc   strings.o genann.o  -lm -o strings
cc   example2.o genann.o  -lm -o example2
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        104.644198      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.755 K/sec
       330,340,554      cycles                    #    3.157 GHz
     1,123,669,767      instructions              #    3.40  insn per cycle
       215,441,809      branches                  # 2058.803 M/sec
            62,406      branch-misses             #    0.03% of all branches

       0.104891323 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
8b35090f06 genann: Sort headers
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:24:44 +10:30
Andrew Jeffery
9e86fc903e genann: Fix unused-result warnings for fscanf()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:58 +10:30
Dickby
e12f3a1820 simplify most inner loops of genann_run 2017-08-20 21:27:54 +02:00
Lewis Van Winkle
8853c56155 Added linear act output delta calculation 2016-05-20 17:54:46 -05:00
Lewis Van Winkle
29264145be Added linear activation function. 2016-05-19 16:55:44 -05:00
Lewis Van Winkle
99e4d6a0e1 Changed name case, code style. 2016-02-11 14:38:42 -06:00
Lewis Van Winkle
850f080045 Initial commit 2016-02-09 17:53:54 -06:00