Algorithms_in_C
1.0.0
Set of algorithms implemented in C.
Kohonen self organizing map (topological map)
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define _USE_MATH_DEFINES
  required for MS Visual C
#define max(a, b) (((a) > (b)) ? (a) : (b))
  shorthand for maximum value
#define min(a, b) (((a) < (b)) ? (a) : (b))
  shorthand for minimum value
double * data_3d (const struct array_3d *arr, int x, int y, int z)
  Function that returns the pointer to the (x, y, z)^th location in the linear 3D array.
double _random (double a, double b)
  Helper function to generate a random number in a given interval.
int save_2d_data (const char *fname, double **X, int num_points, int num_features)
  Save a given n-dimensional data matrix to file.
int save_u_matrix (const char *fname, struct array_3d *W)
  Create the distance matrix or U-matrix from the trained weights and save to disk.
void get_min_2d (double **X, int N, double *val, int *x_idx, int *y_idx)
  Get minimum value and index of the value in a matrix.
double update_weights (const double *X, struct array_3d *W, double **D, int num_out, int num_features, double alpha, int R)
  Update weights of the SOM using Kohonen algorithm.
void kohonen_som (double **X, struct array_3d *W, int num_samples, int num_features, int num_out, double alpha_min)
  Apply incremental algorithm with updating neighborhood and learning rates on all samples in the given dataset.
void test_2d_classes (double *const *data, int N)
  Creates a random set of points distributed in four clusters in 2D space.
void test1 ()
  Test that creates a random set of points distributed in four clusters in 2D space and trains an SOM that finds the topological pattern.
void test_3d_classes1 (double *const *data, int N)
  Creates a random set of points distributed in four clusters in 3D space.
void test2 ()
  Test that creates a random set of points distributed in 4 clusters in 3D space and trains an SOM that finds the topological pattern.
void test_3d_classes2 (double *const *data, int N)
  Creates a random set of points distributed in eight clusters in 3D space.
void test3 ()
  Test that creates a random set of points distributed in eight clusters in 3D space and trains an SOM that finds the topological pattern.
double get_clock_diff (clock_t start_t, clock_t end_t)
  Convert clock cycle difference to time in seconds.
int main (int argc, char **argv)
  Main function.
Kohonen self organizing map (topological map)
- Author
- Krishna Vedala
This example implements a powerful unsupervised learning algorithm called a self-organizing map (SOM). The algorithm creates a connected network of weights that closely follows the given data points. It thus creates a topological map of the given data, i.e., it maintains the relationship between various data points in a much higher-dimensional space by creating an equivalent in a 2-dimensional space.
- Warning
- The MSVC 2019 compiler generates code that does not execute as expected. However, MinGW, Clang for GCC and Clang for MSVC compilers on Windows perform as expected. Any insights and suggestions should be directed to the author.
- See also
- kohonen_som_trace.c
◆ _random()
double _random(double a, double b)
Helper function to generate a random number in a given interval.
Steps:
- r1 = rand() % 100 gets a random number between 0 and 99
- r2 = r1 / 100 converts the random number to be between 0 and 0.99
- scale and offset the random number to the given range of \([a,b)\)
\[ y = (b - a) \times \frac{\text{(random number between 0 and RAND\_MAX)} \; \text{mod}\; 100}{100} + a \]
- Parameters
-
[in] | a | lower limit |
[in] | b | upper limit |
- Returns
- random number in the range \([a,b)\)
return ((b - a) * (rand() % 100) / 100.f) + a;
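The steps above can be sketched as a small standalone function. The name `random_in_range` is hypothetical and the `assert` is an addition; the arithmetic follows the return expression in the listing. Because of the `% 100`, only 100 distinct values can ever be produced for a given interval.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical standalone version of the helper described above. */
double random_in_range(double a, double b)
{
    assert(a <= b);          /* added check: interval must not be reversed */
    int r1 = rand() % 100;   /* integer between 0 and 99 */
    double r2 = r1 / 100.0;  /* fraction between 0.00 and 0.99 */
    return (b - a) * r2 + a; /* scale and offset into [a, b) */
}
```

Seeding with `srand()` before the first call makes a run reproducible.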
◆ data_3d()
double* data_3d(const struct array_3d *arr, int x, int y, int z)
Function that returns the pointer to (x, y, z) ^th location in the linear 3D array given by:
\[ X_{i,j,k} = i\times M\times N + j\times N + k \]
where \(L\), \(M\) and \(N\) are the 3D matrix dimensions.
- Parameters
-
[in] | arr | pointer to array_3d structure |
[in] | x | first index |
[in] | y | second index |
[in] | z | third index |
- Returns
- pointer to (x,y,z)^th location of data
int offset = (x * arr->dim2 * arr->dim3) + (y * arr->dim3) + z;
return arr->data + offset;
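A minimal sketch of the indexing scheme, assuming a structure with the `dim1`, `dim2`, `dim3` and `data` members documented for `struct array_3d`. The function name `element_at` and the bounds checks are additions made for illustration.

```c
#include <assert.h>

/* Assumed layout: three dimension lengths and one flat buffer. */
struct array_3d
{
    int dim1, dim2, dim3; /* lengths of the three dimensions */
    double *data;         /* flat buffer of dim1 * dim2 * dim3 doubles */
};

/* Hypothetical helper: row-major address of element (x, y, z). */
double *element_at(const struct array_3d *arr, int x, int y, int z)
{
    assert(x >= 0 && x < arr->dim1); /* added bounds checks */
    assert(y >= 0 && y < arr->dim2);
    assert(z >= 0 && z < arr->dim3);
    int offset = (x * arr->dim2 * arr->dim3) + (y * arr->dim3) + z;
    return arr->data + offset;
}
```

With dimensions 2 x 3 x 4, element (1, 2, 3) sits at offset 1*3*4 + 2*4 + 3 = 23, the last slot of the buffer.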
◆ get_clock_diff()
double get_clock_diff(clock_t start_t, clock_t end_t)
Convert clock cycle difference to time in seconds.
- Parameters
-
[in] | start_t | start clock |
[in] | end_t | end clock |
- Returns
- time difference in seconds
return (double)(end_t - start_t) / (double)CLOCKS_PER_SEC;
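As a usage sketch, the conversion can wrap any piece of work timed with `clock()`. The name `elapsed_seconds` is hypothetical and the `assert` is an addition.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: converts a clock() tick difference to seconds. */
double elapsed_seconds(clock_t start_t, clock_t end_t)
{
    assert(end_t >= start_t); /* added check: end must not precede start */
    return (double)(end_t - start_t) / (double)CLOCKS_PER_SEC;
}
```

A caller would record `clock_t t0 = clock();` before the work and pass `clock()` as the second argument afterwards. Note that `clock()` measures processor time used by the program, not wall-clock time.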
◆ get_min_2d()
void get_min_2d(double **X, int N, double *val, int *x_idx, int *y_idx)
Get minimum value and index of the value in a matrix.
- Parameters
-
[in] | X | matrix to search |
[in] | N | number of rows and columns in the (square) matrix |
[out] | val | minimum value found |
[out] | x_idx | x-index where minimum value was found |
[out] | y_idx | y-index where minimum value was found |
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
if (X[i][j] < val[0])
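The excerpt shows only the loop headers and the comparison. A complete search along the same lines might look like the sketch below. The name `min_of_matrix`, the `DBL_MAX` starting value and the assignments inside the `if` are assumptions, since those lines are not part of the excerpt.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: smallest entry of an N x N matrix and its indices. */
void min_of_matrix(double **X, int N, double *val, int *x_idx, int *y_idx)
{
    assert(N > 0);
    val[0] = DBL_MAX; /* assumed starting value, larger than any entry */
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            if (X[i][j] < val[0])
            {
                x_idx[0] = i; /* row of the current minimum */
                y_idx[0] = j; /* column of the current minimum */
                val[0] = X[i][j];
            }
        }
    }
}
```

Because the comparison is strict, ties are resolved in favour of the first entry met in row-major order.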
◆ kohonen_som()
void kohonen_som(double **X, struct array_3d *W, int num_samples, int num_features, int num_out, double alpha_min)
Apply incremental algorithm with updating neighborhood and learning rates on all samples in the given dataset.
- Parameters
-
[in] | X | data set |
[in,out] | W | weights matrix |
[in] | num_samples | number of samples in the data set |
[in] | num_features | number of features per input sample |
[in] | num_out | number of output points |
[in] | alpha_min | terminal value of alpha |
int R = num_out >> 2, iter = 0;
double **D = (double **)malloc(num_out * sizeof(double *));
for (int i = 0; i < num_out; i++)
D[i] = (double *)malloc(num_out * sizeof(double));
for (double alpha = 1.f; alpha > alpha_min && dmin > 1e-3; alpha -= 0.001, iter++)
for (int sample = 0; sample < num_samples; sample++)
if (iter % 100 == 0 && R > 1)
printf("iter: %5d\t alpha: %.4g\t R: %d\td_min: %.4g\r", iter, alpha, R, dmin);
for (int i = 0; i < num_out; i++) free(D[i]);
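To see how the learning rate and neighbourhood evolve, the loop header can be replayed on its own. The sketch below is not the training routine: `count_training_passes` is a hypothetical helper, the `dmin > 1e-3` early exit is left out, and the `R--` inside the `iter % 100` branch is an assumption about the lines missing from the excerpt.

```c
#include <assert.h>

/*
 * Hypothetical helper: replays only the alpha / R schedule and returns
 * the number of passes over the data set. No weights are touched.
 */
int count_training_passes(int num_out, double alpha_min, int *final_R)
{
    assert(num_out > 0 && alpha_min >= 0.0);
    int R = num_out >> 2; /* initial range: a quarter of the map side */
    int iter = 0;
    for (double alpha = 1.0; alpha > alpha_min; alpha -= 0.001, iter++)
    {
        if (iter % 100 == 0 && R > 1)
            R--; /* assumed shrink step, one unit every 100 passes */
    }
    *final_R = R;
    return iter;
}
```

With `num_out = 30` and `alpha_min = 0.01`, this runs for roughly 990 passes, and under the assumed shrink step R falls from 7 to 1 within the first 500 of them.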
◆ save_2d_data()
int save_2d_data(const char *fname, double **X, int num_points, int num_features)
Save a given n-dimensional data matrix to file.
- Parameters
-
[in] | fname | filename to save in (gets overwritten without confirmation) |
[in] | X | matrix to save |
[in] | num_points | rows in the matrix = number of points |
[in] | num_features | columns in the matrix = dimensions of points |
- Returns
- 0 if all ok
-
-1 if file creation failed
FILE *fp = fopen(fname, "wt");
sprintf(msg, "File error (%s): ", fname);
for (int i = 0; i < num_points; i++)
for (int j = 0; j < num_features; j++)
fprintf(fp, "%.4g", X[i][j]);
if (j < num_features - 1)
if (i < num_points - 1)
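The excerpt omits what is written between values and between rows. A minimal sketch of such a writer follows, assuming a comma between columns and a newline between rows (the test descriptions on this page refer to CSV files). The name `write_matrix_csv` is hypothetical, and plain `"w"` mode is used here instead of `"wt"`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a num_points x num_features matrix as CSV. */
int write_matrix_csv(const char *fname, double **X, int num_points,
                     int num_features)
{
    assert(fname != NULL);
    FILE *fp = fopen(fname, "w"); /* existing file is overwritten */
    if (!fp)
    {
        perror(fname); /* report why the file could not be created */
        return -1;
    }
    for (int i = 0; i < num_points; i++)
    {
        for (int j = 0; j < num_features; j++)
        {
            fprintf(fp, "%.4g", X[i][j]);
            if (j < num_features - 1)
                fputc(',', fp); /* assumed column separator */
        }
        if (i < num_points - 1)
            fputc('\n', fp); /* assumed row separator, none after last row */
    }
    fclose(fp);
    return 0;
}
```

The `%.4g` format keeps four significant digits, so the saved file is a rounded view of the data rather than an exact copy.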
◆ save_u_matrix()
int save_u_matrix(const char *fname, struct array_3d *W)
Create the distance matrix or U-matrix from the trained weights and save to disk.
- Parameters
-
[in] | fname | filename to save in (gets overwritten without confirmation) |
[in] | W | model matrix to save |
- Returns
- 0 if all ok
-
-1 if file creation failed
FILE *fp = fopen(fname, "wt");
sprintf(msg, "File error (%s): ", fname);
for (int i = 0; i < W->dim1; i++)
for (int j = 0; j < W->dim2; j++)
double distance = 0.f;
int from_x = max(0, i - R);
int to_x = min(W->dim1, i + R + 1);
int from_y = max(0, j - R);
int to_y = min(W->dim2, j + R + 1);
#pragma omp parallel for reduction(+ : distance)
for (l = from_x; l < to_x; l++)
for (int m = from_y; m < to_y; m++)
for (k = 0; k < W->dim3; k++)
double *w1 = data_3d(W, i, j, k);
double *w2 = data_3d(W, l, m, k);
d += (w1[0] - w2[0]) * (w1[0] - w2[0]);
fprintf(fp, "%.4g", distance);
◆ test1()
Test that creates a random set of points distributed in four clusters in 2D space and trains an SOM that finds the topological pattern.
The following CSV files are created to validate the execution:
- test1.csv: random test sample points in four clusters
- w11.csv: initial random U-matrix
- w12.csv: trained SOM U-matrix
double **X = (double **)malloc(N * sizeof(double *));
W.data = (double *)malloc(num_out * num_out * features * sizeof(double));
for (int i = 0; i < max(num_out, N); i++)
X[i] = (double *)malloc(features * sizeof(double));
for (int k = 0; k < num_out; k++)
for (j = 0; j < features; j++)
double *w = data_3d(&W, i, k, j);
for (int i = 0; i < N; i++) free(X[i]);
◆ test2()
Test that creates a random set of points distributed in 4 clusters in 3D space and trains an SOM that finds the topological pattern.
The following CSV files are created to validate the execution:
- test2.csv: random test sample points
- w21.csv: initial random U-matrix
- w22.csv: trained SOM U-matrix
double **X = (double **)malloc(N * sizeof(double *));
W.data = (double *)malloc(num_out * num_out * features * sizeof(double));
for (int i = 0; i < max(num_out, N); i++)
X[i] = (double *)malloc(features * sizeof(double));
for (int k = 0; k < num_out; k++)
for (j = 0; j < features; j++)
double *w = data_3d(&W, i, k, j);
for (int i = 0; i < N; i++) free(X[i]);
◆ test3()
Test that creates a random set of points distributed in eight clusters in 3D space and trains an SOM that finds the topological pattern.
The following CSV files are created to validate the execution:
- test3.csv: random test sample points
- w31.csv: initial random U-matrix
- w32.csv: trained SOM U-matrix
double **X = (double **)malloc(N * sizeof(double *));
W.data = (double *)malloc(num_out * num_out * features * sizeof(double));
for (int i = 0; i < max(num_out, N); i++)
X[i] = (double *)malloc(features * sizeof(double));
for (int k = 0; k < num_out; k++)
for (j = 0; j < features; j++)
double *w = data_3d(&W, i, k, j);
for (int i = 0; i < N; i++) free(X[i]);
◆ test_2d_classes()
void test_2d_classes(double *const *data, int N)
Creates a random set of points distributed in four clusters in 2D space with centroids at the points:
- \((0.5, 0.5)\)
- \((0.5, -0.5)\)
- \((-0.5, 0.5)\)
- \((-0.5, -0.5)\)
- Parameters
-
[out] | data | matrix to store data in |
[in] | N | number of points required |
const double R = 0.3;
const int num_classes = 4;
const double centres[][2] = {
for (i = 0; i < N; i++)
int class = rand() % num_classes;
data[i][0] = _random(centres[class][0] - R, centres[class][0] + R);
data[i][1] = _random(centres[class][1] - R, centres[class][1] + R);
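The sampling scheme in the excerpt can be sketched as a self-contained function. `fill_clusters_2d` is a hypothetical name, and the centres, class count and half-width are passed in as parameters rather than hard-coded, because the centre values themselves are not part of the excerpt. The random draw is inlined using the same `rand() % 100` scaling that `_random()` documents.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: fills data[i][0..1] with points around centres. */
void fill_clusters_2d(double *const *data, int N, const double centres[][2],
                      int num_classes, double R)
{
    assert(num_classes > 0 && R >= 0.0);
    for (int i = 0; i < N; i++)
    {
        int cls = rand() % num_classes; /* pick a cluster at random */
        for (int d = 0; d < 2; d++)
        {
            double lo = centres[cls][d] - R; /* lower edge of the box */
            double hi = centres[cls][d] + R; /* upper edge of the box */
            /* uniform draw in [lo, hi) with 100 possible values */
            data[i][d] = (hi - lo) * (rand() % 100) / 100.0 + lo;
        }
    }
}
```

Each coordinate is drawn independently, so every cluster fills an axis-aligned square of side 2R around its centre rather than a disc.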
◆ test_3d_classes1()
void test_3d_classes1(double *const *data, int N)
Creates a random set of points distributed in four clusters in 3D space with centroids at the points.
- \((0.5, 0.5, 0.5)\)
- \((0.5, -0.5, -0.5)\)
- \((-0.5, 0.5, 0.5)\)
- \((-0.5, -0.5, -0.5)\)
- Parameters
-
[out] | data | matrix to store data in |
[in] | N | number of points required |
const double R = 0.2;
const int num_classes = 4;
const double centres[][3] = {
for (i = 0; i < N; i++)
int class = rand() % num_classes;
data[i][0] = _random(centres[class][0] - R, centres[class][0] + R);
data[i][1] = _random(centres[class][1] - R, centres[class][1] + R);
data[i][2] = _random(centres[class][2] - R, centres[class][2] + R);
◆ test_3d_classes2()
void test_3d_classes2(double *const *data, int N)
Creates a random set of points distributed in eight clusters in 3D space, with centroids that include the points:
- \((0.5, 0.5, 0.5)\)
- \((0.5, -0.5, -0.5)\)
- \((-0.5, 0.5, 0.5)\)
- \((-0.5, -0.5, -0.5)\)
- Parameters
-
[out] | data | matrix to store data in |
[in] | N | number of points required |
const double R = 0.2;
const int num_classes = 8;
const double centres[][3] = {
for (i = 0; i < N; i++)
int class = rand() % num_classes;
data[i][0] = _random(centres[class][0] - R, centres[class][0] + R);
data[i][1] = _random(centres[class][1] - R, centres[class][1] + R);
data[i][2] = _random(centres[class][2] - R, centres[class][2] + R);
◆ update_weights()
double update_weights(const double *X, struct array_3d *W, double **D, int num_out, int num_features, double alpha, int R)
Update weights of the SOM using Kohonen algorithm.
- Parameters
-
[in] | X | data point |
[in,out] | W | weights matrix |
[in,out] | D | temporary matrix to store distances |
[in] | num_out | number of output points |
[in] | num_features | number of features per input sample |
[in] | alpha | learning rate \(0<\alpha\le1\) |
[in] | R | neighborhood range |
- Returns
- minimum distance of sample and trained weights
for (x = 0; x < num_out; x++)
for (y = 0; y < num_out; y++)
for (k = 0; k < num_features; k++)
double *w = data_3d(W, x, y, k);
D[x][y] += (w[0] - X[k]) * (w[0] - X[k]);
D[x][y] = sqrt(D[x][y]);
int d_min_x, d_min_y;
get_min_2d(D, num_out, &d_min, &d_min_x, &d_min_y);
int from_x = max(0, d_min_x - R);
int to_x = min(num_out, d_min_x + R + 1);
int from_y = max(0, d_min_y - R);
int to_y = min(num_out, d_min_y + R + 1);
for (x = from_x; x < to_x; x++)
for (y = from_y; y < to_y; y++)
double d2 = (d_min_x - x) * (d_min_x - x) + (d_min_y - y) * (d_min_y - y);
double scale_factor = exp(-d2 / (2.f * alpha * alpha));
for (k = 0; k < num_features; k++)
double *w = data_3d(W, x, y, k);
w[0] += alpha * scale_factor * (X[k] - w[0]);
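Read as equations, the excerpt appears to perform three steps. The notation \(x^*, y^*\) for the best-matching node is introduced here for illustration (the listing calls these `d_min_x` and `d_min_y`); everything else uses the symbols of the parameter list. First, the distance from the sample to every node:
\[ D_{x,y} = \sqrt{\sum_{k=0}^{\text{num\_features}-1} \left(W_{x,y,k} - X_k\right)^2} \]
Second, the node with the smallest distance:
\[ (x^*, y^*) = \arg\min_{x,y} D_{x,y} \]
Third, an update of every node in the window \(\max(0, x^*-R) \le x < \min(\text{num\_out}, x^*+R+1)\), and likewise for \(y\):
\[ W_{x,y,k} \leftarrow W_{x,y,k} + \alpha \, e^{-d^2 / (2\alpha^2)} \left(X_k - W_{x,y,k}\right), \qquad d^2 = (x^* - x)^2 + (y^* - y)^2 \]
Note that in this listing the width of the Gaussian is set by the learning rate \(\alpha\), while \(R\) only limits which nodes are visited.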
struct array_3d
to store info regarding 3D arrays
Definition: kohonen_som_topology.c:41
int dim1
length of first dimension
Definition: kohonen_som_topology.c:42
int dim2
length of second dimension
Definition: kohonen_som_topology.c:43
int dim3
length of third dimension
Definition: kohonen_som_topology.c:44
double * data
pointer to data
Definition: kohonen_som_topology.c:45