55 Commits

Author SHA1 Message Date
Lewis Van Winkle 07b29f936c
Merge pull request #44 from timgates42/bugfix_typo_optimization
docs: fix simple typo, optimizion -> optimization
2020-10-10 10:09:22 -05:00
Tim Gates 8ac633de97
docs: fix simple typo, optimizion -> optimization
There is a small typo in README.md.

Should read `optimization` rather than `optimizion`.
2020-10-10 17:24:47 +11:00
Lewis Van Winkle 5e147c7e3f
Merge pull request #43 from codeplea/doc_c99
update doc to specify C99
2020-10-09 09:53:38 -05:00
Lewis Van Winkle f6c22401d2 update doc to specify C99 2020-10-09 09:52:57 -05:00
Lewis Van Winkle 122243f944 Removed inline function specifiers. 2019-06-07 11:55:37 -05:00
Jose Fernando Lopez Fernandez 5c93809d3b Rebased to upstream master. Removed inline function specifiers. 2019-06-07 12:13:27 -04:00
Lewis Van Winkle 23f2a94216 added srand to examples 2018-09-05 08:06:25 -05:00
Lewis Van Winkle 30da4ebf5a changed header guard 2018-07-08 11:59:37 -05:00
Lewis Van Winkle 7cb7557668 make runs check by default 2018-07-07 19:07:45 -05:00
Lewis Van Winkle 94e22d4e06 tweaks to help it compile in VS 2018-07-07 19:05:53 -05:00
Lewis Van Winkle 703b51ad43 Merge branch 'master' of https://github.com/codeplea/genann 2018-07-07 19:05:24 -05:00
Lewis Van Winkle eb56a6d9f2 changed macro naming 2018-07-07 18:51:16 -05:00
Lewis Van Winkle 1afe9688d1
Merge pull request #22 from crclark96/crclark96-memleak-patch
free malloc'd memory at end of file
2018-07-07 18:41:33 -05:00
Collin Clark 35d115964b free malloc'd memory at end of file #21 2018-07-07 19:16:59 -04:00
Lewis Van Winkle fb6df6d00a update copyright date 2018-07-06 10:52:06 -05:00
Lewis Van Winkle 775b270851 update copyright date 2018-07-06 10:51:30 -05:00
Lewis Van Winkle 314ef383dd update copyright date 2018-07-06 10:50:44 -05:00
Lewis Van Winkle d2e716d9d1 removed line of dead code 2018-07-06 10:29:23 -05:00
Lewis Van Winkle b802e1e0b5 fixed fclose typo/bug 2018-07-06 10:28:11 -05:00
Lewis Van Winkle 033618b1f5
Merge pull request #8 from amboar/speed
RFC: Increase genann performance by roughly 30%
2018-04-11 08:38:44 -05:00
Lewis Van Winkle e8680aed7c added link to tinn 2018-04-11 08:35:53 -05:00
Andrew Jeffery d21d0f301b genann: Remove branching from back-propagation inner-loop
This saves approximately 80 million instructions and 44 million branches in the
trace of example4, shaving off around 8ms:

Before:

```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

After:
```
 Performance counter stats for './example4':

         84.473035      task-clock (msec)         #    0.997 CPUs utilized
                 3      context-switches          #    0.036 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                81      page-faults               #    0.959 K/sec
       265,472,170      cycles                    #    3.143 GHz
       919,372,488      instructions              #    3.46  insn per cycle
       158,754,885      branches                  # 1879.356 M/sec
            65,337      branch-misses             #    0.04% of all branches

       0.084755458 seconds time elapsed
```
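
The diff for this commit is not shown here, so the following is only an illustrative sketch of one way a per-iteration branch can be taken out of a weight-update loop. It assumes a layout in which `w[0]` is a bias weight fed by a constant input of -1 and `w[1..n]` are the input weights. The function names `update_branchy` and `update_peeled` are hypothetical and do not come from genann.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: w[0] is the bias weight (bias input is -1.0),
 * w[1..n] are the weights for in[0..n-1]. */

/* Tests k == 0 on every pass through the loop. */
void update_branchy(double *w, const double *in, size_t n,
                    double delta, double rate) {
    for (size_t k = 0; k < n + 1; ++k) {
        if (k == 0) {
            w[k] += delta * rate * -1.0;
        } else {
            w[k] += delta * rate * in[k - 1];
        }
    }
}

/* Handles the bias weight once, before the loop, so the loop body
 * contains no conditional. */
void update_peeled(double *w, const double *in, size_t n,
                   double delta, double rate) {
    assert(w != NULL);
    w[0] += delta * rate * -1.0;
    for (size_t k = 1; k < n + 1; ++k) {
        w[k] += delta * rate * in[k - 1];
    }
}
```

Both functions perform the same arithmetic in the same order, so they should produce identical weights; only the number of branches executed differs.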

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:37:39 +10:30
Andrew Jeffery db51375bb7 genann: Optionally resolve activation functions at link time
Shave around 94 million instructions and 10 million branches off the execution
trace of example4 if the sigmoid activation function is resolved at link-time.
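
As a rough illustration of the idea, and not the code from this commit, the sketch below chooses between an indirect call through a function pointer and a direct call fixed at build time. The macro name `ACT`, the `net` struct and the `act_*` functions are all hypothetical. Compiling with something like `-DACT=act_threshold` would let the compiler and linker resolve the call directly instead of loading a pointer on each use.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*act_fn)(double);

/* Two simple activation functions; the names are hypothetical. */
double act_linear(double x) { return x; }
double act_threshold(double x) { return x > 0.0 ? 1.0 : 0.0; }

typedef struct {
    act_fn activation; /* chosen at run time */
} net;

/* If ACT is defined at build time, call that function directly.
 * Otherwise fall back to the pointer stored in the network. */
#ifdef ACT
#define APPLY_ACT(n, x) ((void)(n), ACT(x))
#else
#define APPLY_ACT(n, x) ((n)->activation(x))
#endif

/* Applies the activation to every element of v in place. */
void activate(const net *n, double *v, size_t len) {
    assert(n != NULL && n->activation != NULL);
    for (size_t i = 0; i < len; ++i) {
        v[i] = APPLY_ACT(n, v[i]);
    }
}
```

The default build keeps the flexibility of the pointer, while a build with the macro defined trades that flexibility for a call the optimiser can see through.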

Before (`make`):
```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

After:

`make`:
```
 Performance counter stats for './example4':

         97.335180      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                82      page-faults               #    0.842 K/sec
       306,722,357      cycles                    #    3.151 GHz
     1,065,669,644      instructions              #    3.47  insn per cycle
       214,256,601      branches                  # 2201.225 M/sec
            60,154      branch-misses             #    0.03% of all branches

       0.097577079 seconds time elapsed
```

`make sigmoid`:
```
 Performance counter stats for './example4':

         92.629610      task-clock (msec)         #    0.997 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                78      page-faults               #    0.842 K/sec
       291,863,801      cycles                    #    3.151 GHz
     1,000,931,204      instructions              #    3.43  insn per cycle
       202,465,800      branches                  # 2185.757 M/sec
            50,949      branch-misses             #    0.03% of all branches

       0.092889789 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 22:36:44 +10:30
Andrew Jeffery b1f72be243 genann: Unroll loops via hoisting inner-loop conditions in genann_run()
This gives a reduction of roughly 27 million instructions and 11 million
branches in the execution trace of example4.
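
The following is a minimal sketch of what hoisting an inner-loop condition can look like in a feed-forward pass; it is not the code from this commit. It assumes a network whose first hidden layer reads `inputs` values and whose later hidden layers read `hidden` values, with each neuron storing a bias weight followed by its input weights. The names `run_branchy`, `run_hoisted` and `dense` are hypothetical, and activations are left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Re-evaluates "is this the first layer?" inside the innermost loop.
 * buf must hold layers * hidden doubles. */
void run_branchy(const double *w, const double *in, size_t inputs,
                 size_t hidden, size_t layers, double *buf) {
    for (size_t h = 0; h < layers; ++h) {
        for (size_t j = 0; j < hidden; ++j) {
            double sum = *w++ * -1.0; /* bias weight, bias input -1 */
            for (size_t k = 0; k < (h == 0 ? inputs : hidden); ++k) {
                sum += *w++ * (h == 0 ? in[k] : buf[(h - 1) * hidden + k]);
            }
            buf[h * hidden + j] = sum;
        }
    }
}

/* One layer with a fixed input width; returns the next unread weight. */
static const double *dense(const double *w, const double *in, size_t n_in,
                           double *out, size_t n_out) {
    for (size_t j = 0; j < n_out; ++j) {
        double sum = *w++ * -1.0;
        for (size_t k = 0; k < n_in; ++k) {
            sum += *w++ * in[k];
        }
        out[j] = sum;
    }
    return w;
}

/* Peels the first layer off so each inner loop has a fixed bound and
 * no per-iteration condition. */
void run_hoisted(const double *w, const double *in, size_t inputs,
                 size_t hidden, size_t layers, double *buf) {
    assert(layers > 0);
    w = dense(w, in, inputs, buf, hidden);
    for (size_t h = 1; h < layers; ++h) {
        w = dense(w, buf + (h - 1) * hidden, hidden, buf + h * hidden, hidden);
    }
}
```

The hoisted version duplicates a little structure, but the inner loop bound and input pointer are fixed per call, which gives the compiler a simpler loop to optimise.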

On a Lenovo X1 Carbon Gen 3 machine (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz)
running Ubuntu 17.10 with GCC 7.2.0-8ubuntu3, using
CFLAGS="-g -O3 -march=native -DNDEBUG" I see the following change in
`perf stat`:

Before:

```
Performance counter stats for './example4':

       101.369081      task-clock (msec)         #    0.998 CPUs utilized
                1      context-switches          #    0.010 K/sec
                0      cpu-migrations            #    0.000 K/sec
               79      page-faults               #    0.779 K/sec
      320,197,883      cycles                    #    3.159 GHz
    1,121,174,423      instructions              #    3.50  insn per cycle
      223,257,752      branches                  # 2202.425 M/sec
           62,680      branch-misses             #    0.03% of all branches

      0.101595114 seconds time elapsed
```

After:

```
 Performance counter stats for './example4':

         98.988806      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.798 K/sec
       312,298,260      cycles                    #    3.155 GHz
     1,094,183,752      instructions              #    3.50  insn per cycle
       212,007,732      branches                  # 2141.734 M/sec
            62,774      branch-misses             #    0.03% of all branches

       0.099228100 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery 6574bddf6b genann: Use reciprocal interval value to strength reduce divide to multiply
This gives a reduction of roughly 2.5 million instructions in the execution
trace of example4.

genann_act_sigmoid_cached() previously divided by interval to calculate the
lookup index. Divide is an expensive operation, so instead use the reciprocal of
the existing interval calculation to reduce the divide to a multiply.
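
A minimal sketch of the index calculation before and after this kind of strength reduction is shown below. The table size, range and function names are assumptions made for illustration, not values taken from genann. Because the reciprocal is rounded, the two forms can differ by one table slot for inputs that sit almost exactly halfway between two entries, which is normally acceptable for an approximate lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table parameters, chosen for illustration only. */
#define LOOKUP_SIZE 4096
#define RANGE_MIN (-15.0)
#define RANGE_MAX 15.0

/* Rounds to the nearest slot and keeps the result inside the table. */
static size_t clamp_index(double scaled) {
    size_t i = (size_t)(scaled + 0.5);
    return i < LOOKUP_SIZE ? i : LOOKUP_SIZE - 1;
}

/* Original form: one floating-point divide per lookup. */
size_t index_by_divide(double a) {
    const double interval = (RANGE_MAX - RANGE_MIN) / LOOKUP_SIZE;
    assert(a >= RANGE_MIN && a < RANGE_MAX);
    return clamp_index((a - RANGE_MIN) / interval);
}

/* Strength-reduced form: the reciprocal is computed once, so each
 * lookup needs only a multiply. */
size_t index_by_multiply(double a) {
    static const double inv_interval = LOOKUP_SIZE / (RANGE_MAX - RANGE_MIN);
    assert(a >= RANGE_MIN && a < RANGE_MAX);
    return clamp_index((a - RANGE_MIN) * inv_interval);
}
```

A compiler will not usually make this substitution on its own without relaxed floating-point flags, since multiplying by a rounded reciprocal is not bit-for-bit equivalent to dividing.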

Building with the following configuration:

```
$ head /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 61
model name      : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
stepping        : 4
microcode       : 0x25
cpu MHz         : 2593.871
cache size      : 4096 KB
physical id     : 0
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="17.10 (Artful Aardvark)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 17.10"
VERSION_ID="17.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=artful
UBUNTU_CODENAME=artful
$ cc --version
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

on my Lenovo X1 Carbon Gen 3 machine, I see the following:

```
$ make CFLAGS="-g -O3 -march=native -DNDEBUG"
cc -g -O3 -march=native -DNDEBUG   -c -o test.o test.c
cc -g -O3 -march=native -DNDEBUG   -c -o genann.o genann.c
cc -g -O3 -march=native -DNDEBUG   -c -o example1.o example1.c
cc -g -O3 -march=native -DNDEBUG   -c -o example2.o example2.c
cc -g -O3 -march=native -DNDEBUG   -c -o example3.o example3.c
cc -g -O3 -march=native -DNDEBUG   -c -o example4.o example4.c
cc -g -O3 -march=native -DNDEBUG   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example4.o genann.o  -lm -o example4
cc   example3.o genann.o  -lm -o example3
cc   example2.o genann.o  -lm -o example2
cc   strings.o genann.o  -lm -o strings
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        101.369081      task-clock (msec)         #    0.998 CPUs utilized
                 1      context-switches          #    0.010 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.779 K/sec
       320,197,883      cycles                    #    3.159 GHz
     1,121,174,423      instructions              #    3.50  insn per cycle
       223,257,752      branches                  # 2202.425 M/sec
            62,680      branch-misses             #    0.03% of all branches

       0.101595114 seconds time elapsed
```

Prior to the change, we see something like:

```
$ make CFLAGS="-g -O3 -march=native"
cc -g -O3 -march=native   -c -o test.o test.c
cc -g -O3 -march=native   -c -o genann.o genann.c
cc -g -O3 -march=native   -c -o example1.o example1.c
cc -g -O3 -march=native   -c -o example2.o example2.c
cc -g -O3 -march=native   -c -o example3.o example3.c
cc -g -O3 -march=native   -c -o example4.o example4.c
cc -g -O3 -march=native   -c -o strings.o strings.c
cc   test.o genann.o  -lm -o test
cc   example1.o genann.o  -lm -o example1
cc   example3.o genann.o  -lm -o example3
cc   example4.o genann.o  -lm -o example4
cc   strings.o genann.o  -lm -o strings
cc   example2.o genann.o  -lm -o example2
$ for i in `seq 0 10`; do ./example4 > /dev/null; done; sudo perf stat record ./example4
GENANN example 4.
Train an ANN on the IRIS dataset using backpropagation.
Loading 150 data points from example/iris.data
Training for 5000 loops over data.
147/150 correct (98.0%).

 Performance counter stats for './example4':

        104.644198      task-clock (msec)         #    0.998 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                79      page-faults               #    0.755 K/sec
       330,340,554      cycles                    #    3.157 GHz
     1,123,669,767      instructions              #    3.40  insn per cycle
       215,441,809      branches                  # 2058.803 M/sec
            62,406      branch-misses             #    0.03% of all branches

       0.104891323 seconds time elapsed
```

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery b79a5ce751 Makefile: Increase optimisation
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery ad8bbaa979 Makefile: Add test and example programs to clean target
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Andrew Jeffery 01511920e2 Makefile: Fix CFLAGS variable name
CCFLAGS is non-standard, and thus ignored now that standard make rules are
used.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-12-18 16:59:40 +10:30
Lewis Van Winkle 4dd67e42bc
Merge pull request #7 from amboar/misc-cleanups
Miscellaneous cleanups
2017-11-28 11:43:09 -06:00
Andrew Jeffery e4e40304e0 Makefile: Use standard make variables and recipes
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 09:54:56 +10:30
Andrew Jeffery 8b35090f06 genann: Sort headers
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:24:44 +10:30
Andrew Jeffery 9e86fc903e genann: Fix unused-result warnings for fscanf()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:58 +10:30
Andrew Jeffery 4ef0a3f874 example4: Fix unused-result warning for fgets()
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:23:21 +10:30
Andrew Jeffery afa5df1ffc Makefile: Use $(RM), silencing errors on missing files
$(RM) includes the -f flag, so the clean target now succeeds when files
to remove don't exist. The post-condition of clean is that compilation
artifacts are not present; this is trivially satisfied if they never
existed.

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
2017-10-22 08:05:06 +10:30
Lewis Van Winkle 88c9d3513b Merge pull request #5 from Dickby/master
simplify most inner loops of genann_run
2017-08-20 16:57:21 -05:00
Dickby e12f3a1820 simplify most inner loops of genann_run 2017-08-20 21:27:54 +02:00
Lewis Van Winkle de04314b10 fixed headings for github 2017-04-03 13:12:42 -05:00
Lewis Van Winkle a55180c2e4 typo 2017-01-30 20:06:38 -06:00
Lewis Van Winkle 4d2eec816e added more training for xor 2017-01-15 12:48:00 -06:00
Lewis Van Winkle b596fe2fd0 moved license 2016-12-21 12:22:49 -06:00
Lewis Van Winkle 9ffc3715f5 Added building section to readme 2016-12-07 15:44:18 -06:00
Lewis Van Winkle 58f8e88730 added logo to readme 2016-12-07 12:20:39 -06:00
Lewis Van Winkle 8853c56155 Added linear act output delta calculation 2016-05-20 17:54:46 -05:00
Lewis Van Winkle 29264145be Added linear activation function. 2016-05-19 16:55:44 -05:00
Lewis Van Winkle 6ece821187 Typo 2016-03-14 16:44:38 -05:00
Lewis Van Winkle 28e3d30ef1 Rewording. 2016-03-14 16:43:36 -05:00
Lewis Van Winkle 0bb6591322 Linked to examples. 2016-03-14 12:25:08 -05:00
Lewis Van Winkle d9da5edab4 Added to documentation. 2016-03-14 12:23:10 -05:00
Lewis Van Winkle 064703e332 Changed build to work with both gcc and clang. 2016-03-08 12:58:02 -06:00
Lewis Van Winkle b1fcdf7f69 Added travis build status. 2016-03-08 12:51:05 -06:00