32 Commits

Author SHA1 Message Date
Andrew Jeffery
b1f72be243 genann: Unroll loops via hoisting inner-loop conditions in genann_run()
This gives a reduction of roughly 27 million instructions and 11 million
branches in the execution trace of example4.
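
As a rough illustration of the idea (not the actual patch), the sketch below
moves a condition that was tested on every iteration out of the loop by
handling the bias term before the loop starts. The function names
`weighted_sum_branchy` and `weighted_sum_hoisted` and the weight layout (bias
weight first, bias input fixed at -1.0) are assumptions made for this example
and are not taken from genann.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" shape: the k == 0 test runs on every iteration,
 * although it is only true once. w holds n + 1 weights, bias weight first. */
double weighted_sum_branchy(const double *w, const double *in, size_t n) {
    assert(w != NULL);
    double sum = 0.0;
    for (size_t k = 0; k < n + 1; ++k) {
        if (k == 0)
            sum += w[k] * -1.0;      /* bias term, assumed input of -1.0 */
        else
            sum += w[k] * in[k - 1]; /* regular inputs */
    }
    return sum;
}

/* Hypothetical "after" shape: the bias term is computed before the loop,
 * so the loop body is a plain multiply-accumulate with no branch. */
double weighted_sum_hoisted(const double *w, const double *in, size_t n) {
    assert(w != NULL);
    double sum = w[0] * -1.0;
    for (size_t k = 0; k < n; ++k)
        sum += w[k + 1] * in[k];
    return sum;
}
```

Both functions add the same terms in the same order, so they should return the
same value; the second simply gives the compiler a branch-free loop body to
work with.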

On a Lenovo X1 Carbon Gen 3 machine (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz)
running Ubuntu 17.10 with GCC 7.2.0-8ubuntu3, using
CFLAGS="-g -O3 -march=native -DNDEBUG" I see the following change in
`perf stat`:

Before:

```
Performance counter stats for './example4':

       101.369081      task-clock (msec)         #    0.998 CPUs utilized
                1      context-switches          #    0.010 K/sec
                0      cpu-migrations            #    0.000 K/sec
               79      page-faults               #    0.779 K/sec
      320,197,883      cycles                    #    3.159 GHz
    1,121,174,423      instructions              #    3.50  insn per cycle
      223,257,752      branches                  # 2202.425 M/sec
           62,680      branch-misses             #    0.03% of all branches

      0.101595114 seconds time elapsed
```

After:

```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
6574bddf6b genann: Use reciprocal interval value to strength reduce divide to multiply
This gives a reduction of roughly 2.5 million instructions in the execution
trace of example4.

genann_act_sigmoid_cached() previously divided by interval to calculate the
lookup index. Divide is an expensive operation, so instead use the reciprocal of
the existing interval calculation to reduce the divide to a multiply.

Building with the following configuration:

```
$ head /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
stepping        : 4
microcode       : 0x25
cpu MHz         : 2593.871
cache size      : 4096 KB
physical id     : 0
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful
$ cc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

and running on my Lenovo X1 Carbon Gen 3 machine, I see the following:

```
$ make CFLAGS="-g -O3 -march=native -DNDEBUG"
cc -g -O3 -march=native -DNDEBUG   -c -o test.o test.c
cc -g -O3 -march=native -DNDEBUG   -c -o genann.o genann.c
cc -g -O3 -march=native -DNDEBUG   -c -o example1.o example1.c
cc -g -O3 -march=native -DNDEBUG   -c -o example2.o example2.c
cc -g -O3 -march=native -DNDEBUG   -c -o example3.o example3.c
cc -g -O3 -march=native -DNDEBUG   -c -o example4.o example4.c
cc -g -O3 -march=native -DNDEBUG   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example4.o genann.o  -lm -o example4
cc   example3.o genann.o  -lm -o example3
cc   example2.o genann.o  -lm -o example2
cc   strings.o genann.o  -lm -o strings
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        101.369081      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.779 K/sec
       320,197,883      cycles                    #    3.159 GHz
     1,121,174,423      instructions              #    3.50  insn per cycle
       223,257,752      branches                  # 2202.425 M/sec
            62,680      branch-misses             #    0.03% of all branches

       0.101595114 seconds time elapsed
```

Prior to the change, we see something like:

```
$ make CFLAGS="-g -O3 -march=native"
cc -g -O3 -march=native   -c -o test.o test.c
cc -g -O3 -march=native   -c -o genann.o genann.c
cc -g -O3 -march=native   -c -o example1.o example1.c
cc -g -O3 -march=native   -c -o example2.o example2.c
cc -g -O3 -march=native   -c -o example3.o example3.c
cc -g -O3 -march=native   -c -o example4.o example4.c
cc -g -O3 -march=native   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example3.o genann.o  -lm -o example3
cc   example4.o genann.o  -lm -o example4
cc   strings.o genann.o  -lm -o strings
cc   example2.o genann.o  -lm -o example2
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        104.644198      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.755 K/sec
       330,340,554      cycles                    #    3.157 GHz
     1,123,669,767      instructions              #    3.40  insn per cycle
       215,441,809      branches                  # 2058.803 M/sec
            62,406      branch-misses             #    0.03% of all branches

       0.104891323 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
b79a5ce751 Makefile: Increase optimisation
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
ad8bbaa979 Makefile: Add test and example programs to clean target
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery
01511920e2 Makefile: Fix CFLAGS variable name
CCFLAGS is non-standard, and thus ignored now that standard make rules are
used.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Lewis Van Winkle
4dd67e42bc Merge pull request #7 from amboar/misc-cleanups
Miscellaneous cleanups
2017-11-28 11:43:09 -06:00
Andrew Jeffery
e4e40304e0 Makefile: Use standard make variables and recipes
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 09:54:56 +10:30
Andrew Jeffery
8b35090f06 genann: Sort headers
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:24:44 +10:30
Andrew Jeffery
9e86fc903e genann: Fix unused-result warnings for fscanf()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:58 +10:30
Andrew Jeffery
4ef0a3f874 example4: Fix unused-result warning for fgets()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:21 +10:30
Andrew Jeffery
afa5df1ffc Makefile: Use $(RM), silencing errors on missing files
$(RM) includes the -f flag, so the clean target now succeeds when files
to remove don't exist. The post-condition of clean is that compilation
artifacts are not present; this is trivially satisfied if they never
existed.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:05:06 +10:30
Lewis Van Winkle
88c9d3513b Merge pull request #5 from Dickby/master
simplify most inner loops of genann_run
2017-08-20 16:57:21 -05:00
Dickby
e12f3a1820 simplify most inner loops of genann_run 2017-08-20 21:27:54 +02:00
Lewis Van Winkle
de04314b10 fixed headings for github 2017-04-03 13:12:42 -05:00
Lewis Van Winkle
a55180c2e4 typo 2017-01-30 20:06:38 -06:00
Lewis Van Winkle
4d2eec816e added more training for xor 2017-01-15 12:48:00 -06:00
Lewis Van Winkle
b596fe2fd0 moved license 2016-12-21 12:22:49 -06:00
Lewis Van Winkle
9ffc3715f5 Added building section to readme 2016-12-07 15:44:18 -06:00
Lewis Van Winkle
58f8e88730 added logo to readme 2016-12-07 12:20:39 -06:00
Lewis Van Winkle
8853c56155 Added linear act output delta calculation 2016-05-20 17:54:46 -05:00
Lewis Van Winkle
29264145be Added linear activation function. 2016-05-19 16:55:44 -05:00
Lewis Van Winkle
6ece821187 Typo 2016-03-14 16:44:38 -05:00
Lewis Van Winkle
28e3d30ef1 Rewording. 2016-03-14 16:43:36 -05:00
Lewis Van Winkle
0bb6591322 Linked to examples. 2016-03-14 12:25:08 -05:00
Lewis Van Winkle
d9da5edab4 Added to documentation. 2016-03-14 12:23:10 -05:00
Lewis Van Winkle
064703e332 Changed build to work with both gcc and clang. 2016-03-08 12:58:02 -06:00
Lewis Van Winkle
b1fcdf7f69 Added travis build status. 2016-03-08 12:51:05 -06:00
Lewis Van Winkle
99e4d6a0e1 Changed name case, code style. 2016-02-11 14:38:42 -06:00
Lewis Van Winkle
2bbc1b146c Readme work. 2016-02-09 22:09:21 -06:00
Lewis Van Winkle
f44be25c8b Tweaked readme. 2016-02-09 19:16:18 -06:00
Lewis Van Winkle
b6da170fa7 Added C++ guards. 2016-02-09 19:13:37 -06:00
Lewis Van Winkle
850f080045 Initial commit 2016-02-09 17:53:54 -06:00