FreeRDP/libfreerdp/primitives
Norbert Federa e8c4910e2e fix segfaults casused by size_t format specifier
win32/msvc cc does not recognize the %z format specifier which caused
invalid references and segfaults on win32.
Until FreeRDP gets format specifier macros we'll cast size_t to
unsigned long and use the %lu specifier.

Also simplified winpr_backtrace_symbols() a little bit and fixed it
to allocate the correct amount of bytes for the return buffer.
2016-05-27 15:55:28 +02:00
test fix segfaults casused by size_t format specifier 2016-05-27 15:55:28 +02:00
prim_16to32bpp_opt.c add YCoCg->RGB and 16-to-32bit SSE 2014-07-02 14:30:04 -06:00
prim_16to32bpp.c add YCoCg->RGB and 16-to-32bit SSE 2014-07-02 14:30:04 -06:00
prim_16to32bpp.h add YCoCg->RGB and 16-to-32bit SSE 2014-07-02 14:30:04 -06:00
prim_add_opt.c primitives: use alias define for SSE2 2013-03-01 09:02:15 +01:00
prim_add.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_add.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_alphaComp_opt.c primitives: use alias define for SSE2 2013-03-01 09:02:15 +01:00
prim_alphaComp.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_alphaComp.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_andor_opt.c primitives: use alias define for SSE2 2013-03-01 09:02:15 +01:00
prim_andor.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_andor.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_colors_opt.c Fixed constant definition. 2016-02-03 11:51:31 +01:00
prim_colors.c libfreerdp-codec: add BGR support to egfx 2014-09-16 16:55:47 -04:00
prim_colors.h libfreerdp-codec: improve YCbCr to RGB color conversion 2014-09-04 13:09:46 -04:00
prim_copy.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_internal.h libfreerdp-primitives: add YUV420 to RGB conversion 2014-09-06 17:10:27 -04:00
prim_set_opt.c primitives: use alias define for SSE2 2013-03-01 09:02:15 +01:00
prim_set.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_set.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_shift_opt.c primitives: use alias define for SSE2 2013-03-01 09:02:15 +01:00
prim_shift.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_shift.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_sign_opt.c primitives: fixed flag detection for sign functions 2013-03-01 09:02:15 +01:00
prim_sign.c primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_sign.h primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00
prim_templates.h primitives: separating optimized functions into their own .c files. 2013-02-21 02:45:10 -08:00
prim_YCoCg_opt.c libfreerdp-primitives: cleanup YCoCg 2014-09-06 21:13:37 -04:00
prim_YCoCg.c libfreerdp-primitives: cleanup YCoCg 2014-09-06 21:13:37 -04:00
prim_YCoCg.h libfreerdp-primitives: cleanup YCoCg 2014-09-06 21:13:37 -04:00
prim_YUV_opt.c Implemented YUV444 related primitives. 2016-03-16 13:43:17 +01:00
prim_YUV.c Fixed comments. 2016-03-16 13:43:18 +01:00
prim_YUV.h YUV data conversion of H.264 implementation (egfx): 2014-09-09 00:13:18 +02:00
primitives.c libfreerdp-primitives: add YUV420 to RGB conversion 2014-09-06 17:10:27 -04:00
README.txt primitives: make use of winprs processor feature detection 2013-03-01 09:02:14 +01:00

The Primitives Library

Introduction
------------
The purpose of the primitives library is to give the FreeRDP code easy
access to *run-time* optimization via SIMD operations.  When the library
is initialized, processor features (such as SSE3 or NEON support) are
checked dynamically, and function pointers are set to the fastest
available implementation of each operation.  All routines offer generic
C alternatives as fallbacks.

Run-time optimization has the advantage of allowing a single executable
to run fast on multiple platforms with different SIMD capabilities.
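
As a rough illustration of the dispatch idea described above, the sketch
below fills a function pointer once, based on a feature check, and keeps
the generic C routine as the fallback.  All names in it (demo_prims_t,
demo_cpu_has_simd, add_16s_generic, add_16s_unrolled) are hypothetical and
are not the library's API, and the "optimized" routine is only an unrolled
loop standing in for real SIMD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of entrypoints (the real structure is primitives_t). */
typedef struct
{
    int (*add_16s)(const int16_t* a, const int16_t* b, int16_t* dst, int len);
} demo_prims_t;

/* Generic C fallback, always available. */
static int add_16s_generic(const int16_t* a, const int16_t* b, int16_t* dst, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
    return 0;
}

/* Stand-in for a SIMD version: handles four elements per iteration. */
static int add_16s_unrolled(const int16_t* a, const int16_t* b, int16_t* dst, int len)
{
    int i = 0;
    for (; i + 4 <= len; i += 4)
    {
        dst[i] = (int16_t)(a[i] + b[i]);
        dst[i + 1] = (int16_t)(a[i + 1] + b[i + 1]);
        dst[i + 2] = (int16_t)(a[i + 2] + b[i + 2]);
        dst[i + 3] = (int16_t)(a[i + 3] + b[i + 3]);
    }
    for (; i < len; i++) /* leftover elements */
        dst[i] = (int16_t)(a[i] + b[i]);
    return 0;
}

/* Placeholder feature check (assumption): a real one would query the CPU. */
static int demo_cpu_has_simd(void)
{
    return 0;
}

/* Fill the table once: start with the fallback, upgrade if supported. */
static void demo_prims_init(demo_prims_t* prims)
{
    assert(prims != NULL);
    prims->add_16s = add_16s_generic;
    if (demo_cpu_has_simd())
        prims->add_16s = add_16s_unrolled;
}
```

Callers only ever see the pointer in the table, so the same binary can
pick a different routine on each machine it runs on.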


Use In Code
-----------
primitives_get() returns a pointer to a singleton structure containing
the function pointers.  The functions can then be called through that
structure, e.g.

    /* Shift 256 16-bit values in place (source and destination are the same). */
    primitives_t *prims = primitives_get();
    prims->shiftC_16s(buffer, shifts, buffer, 256);

Of course, there is some overhead in calling through the function pointer
and setting up the SIMD operations, so it would be counterproductive to
call the primitives library for very small operations, e.g. initializing
an array of eight values to a constant.  The primitives library is intended
for larger-scale operations, e.g. arrays of size 64 and larger.
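
One way a caller might act on this advice is to guard the indirect call
with a size check.  The sketch below is only an assumption about how that
could look: the 64-element threshold comes from the text above, while
DEMO_MIN_LEN, demo_set_fn, demo_set_16s and demo_fill_16s are hypothetical
names, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold taken from the guidance above; tune by measurement. */
#define DEMO_MIN_LEN 64

/* Hypothetical entrypoint type: fill dst with val. */
typedef int (*demo_set_fn)(int16_t val, int16_t* dst, int len);

/* Counts indirect calls so the effect of the guard can be observed. */
static int demo_calls = 0;

/* Stand-in for an optimized routine reached through a function pointer. */
static int demo_set_16s(int16_t val, int16_t* dst, int len)
{
    demo_calls++;
    for (int i = 0; i < len; i++)
        dst[i] = val;
    return 0;
}

/* Small arrays use a plain loop; larger ones go through the pointer. */
static void demo_fill_16s(demo_set_fn set, int16_t val, int16_t* dst, int len)
{
    assert(set != NULL);
    if (len < DEMO_MIN_LEN)
    {
        for (int i = 0; i < len; i++)
            dst[i] = val;
        return;
    }
    set(val, dst, len);
}
```

The right threshold depends on the operation and the hardware, so it is
worth measuring rather than treating 64 as a fixed rule.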


Initialization and Cleanup
--------------------------
Library initialization is done the first time primitives_init() is called
or the first time primitives_get() is used.  Cleanup (if any) is done by
primitives_deinit().
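
The "initialize on first use" behavior can be pictured with the following
sketch.  It is not the library's implementation: lazy_prims_t,
lazy_prims_init, lazy_prims_get and lazy_prims_deinit are hypothetical
names, and the plain flag used here is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table with a single entrypoint. */
typedef struct
{
    int (*sign_of)(int value);
} lazy_prims_t;

static lazy_prims_t lazy_prims;
static int lazy_initialized = 0;
static int lazy_init_runs = 0; /* how often real initialization happened */

static int sign_generic(int value)
{
    return (value > 0) - (value < 0);
}

/* Safe to call repeatedly: only the first call does any work. */
void lazy_prims_init(void)
{
    if (lazy_initialized)
        return;
    lazy_prims.sign_of = sign_generic;
    lazy_initialized = 1;
    lazy_init_runs++;
}

/* Accessor that initializes on first use. */
lazy_prims_t* lazy_prims_get(void)
{
    if (!lazy_initialized)
        lazy_prims_init();
    assert(lazy_initialized);
    return &lazy_prims;
}

/* Cleanup: the next lazy_prims_get() will initialize again. */
void lazy_prims_deinit(void)
{
    lazy_prims.sign_of = NULL;
    lazy_initialized = 0;
}
```

Code that may be entered from several threads at once would need a
once-only mechanism instead of the bare flag.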


Intel Integrated Performance Primitives (IPP)
---------------------------------------------
If FreeRDP is compiled with IPP support (-DWITH_IPP=ON), the IPP functions
are used (where available) to fill the function pointers.
Where possible, function names and parameter lists match IPP format so
that the IPP functions can be plugged into the function pointers without
a wrapper layer.  Use of IPP is completely optional, and in many cases
the SSE operations in the primitives library itself are faster or similar
in performance.
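
To show why matching parameter lists matter, the sketch below contrasts a
routine that can be stored in a function pointer directly with one that
needs a wrapper.  Nothing here is real IPP or FreeRDP code: vendor_andC_32u
and other_and are stand-ins for external library routines, and all other
names are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entrypoint type: dst[i] = src[i] & val. */
typedef int (*demo_andC_fn)(const uint32_t* src, uint32_t val, uint32_t* dst, int len);

/* Generic C fallback. */
static int generic_andC_32u(const uint32_t* src, uint32_t val, uint32_t* dst, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = src[i] & val;
    return 0;
}

/* Stand-in for an external routine whose parameter list already matches. */
static int vendor_andC_32u(const uint32_t* src, uint32_t val, uint32_t* dst, int len)
{
    return generic_andC_32u(src, val, dst, len);
}

/* Stand-in for an external routine with a different argument order. */
static int other_and(uint32_t* dst, const uint32_t* src, int len, uint32_t val)
{
    return generic_andC_32u(src, val, dst, len);
}

/* The mismatch forces an extra layer that only reorders arguments. */
static int other_and_wrapper(const uint32_t* src, uint32_t val, uint32_t* dst, int len)
{
    return other_and(dst, src, len, val);
}

/* With matching signatures the pointer is assigned directly. */
static demo_andC_fn demo_pick_andC(int use_vendor)
{
    return use_vendor ? vendor_andC_32u : generic_andC_32u;
}
```

The wrapper costs one more call per operation, which is the overhead that
matching names and parameter lists avoids.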


Coverage
--------
The primitives library is not meant to be comprehensive; it does not offer
entrypoints for every operation and operand type.  Instead, the coverage
is focused on operations known to be performance bottlenecks in the code.
For instance, 16-bit signed operations are used widely in the RemoteFX
software, so you'll find 16s versions of several operations, but there
is no attempt to provide (unused) copies of the same code for 8u, 16u,
32s, etc.


New Optimizations
-----------------
As the need arises, new optimizations can be added to the library,
including NEON, AVX, and perhaps OpenCL or other SIMD implementations.
The CPU feature detection is done in winpr/sysinfo.


Adding Entrypoints
------------------
As the need for new operations or operands arises, new entrypoints can
be added:
  1) Function prototypes and pointers are added to
     include/freerdp/primitives.h
  2) New module initialization and cleanup function prototypes are added
     to prim_internal.h and called in primitives.c (primitives_init()
     and primitives_deinit()).
  3) Operation names and parameter lists should be compatible with the IPP.
     IPP manuals are available online at software.intel.com.
  4) A generic C entrypoint must be available as a fallback.
  5) prim_templates.h contains macro-based templates for simple operations,
     such as applying a single SSE operation to arrays of data.
     The template functions can frequently be used to extend the
     operations without writing a lot of new code.
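
The macro-template idea from item 5 can be sketched as follows.  This is
not the content of prim_templates.h: DEMO_BINARY_16S and the generated
function names are hypothetical, and the body is plain C rather than SSE
intrinsics, purely to show how one macro can stamp out several similar
entrypoints.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical template: generates a function that applies the binary
 * operator OP element-wise to two arrays of 16-bit signed values. */
#define DEMO_BINARY_16S(NAME, OP)                                         \
    static int NAME(const int16_t* a, const int16_t* b, int16_t* dst,     \
                    int len)                                              \
    {                                                                     \
        assert(len >= 0);                                                 \
        for (int i = 0; i < len; i++)                                     \
            dst[i] = (int16_t)(a[i] OP b[i]);                             \
        return 0;                                                         \
    }

/* Each line below expands to a complete function definition. */
DEMO_BINARY_16S(demo_and_16s, &)
DEMO_BINARY_16S(demo_or_16s, |)
DEMO_BINARY_16S(demo_sub_16s, -)
```

Adding another operation of the same shape is then a one-line change,
which keeps the per-operation code small.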

Cache Management
----------------
I haven't found a lot of speed improvement by attempting prefetch, and
in fact it seems to have a negative impact in some cases.  Perhaps the
routines could be accelerated further by careful use of prefetch,
fences, etc.


Testing
-------
The test subdirectory contains an executable (prim_test) that tests both
the functionality and the speed of primitives library operations.  Any new
modules should be added to that test, following the conventions already
established in that directory.  The program can be executed on various
target hardware to compare generic C, optimized, and IPP performance
with various array sizes.
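
The kind of check such a test performs can be sketched as below: run the
generic and the optimized routine on the same input, compare the outputs,
and time each one.  This is not taken from prim_test; every name is
hypothetical, and the "optimized" routine is an unrolled loop standing in
for SIMD code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define DEMO_MAX_LEN 1024

typedef int (*demo_op_fn)(const int16_t* src, int16_t* dst, int len);

/* Reference implementation in generic C. */
static int negate_generic(const int16_t* src, int16_t* dst, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (int16_t)(-src[i]);
    return 0;
}

/* Stand-in for an optimized version: four elements per iteration. */
static int negate_unrolled(const int16_t* src, int16_t* dst, int len)
{
    int i = 0;
    for (; i + 4 <= len; i += 4)
    {
        dst[i] = (int16_t)(-src[i]);
        dst[i + 1] = (int16_t)(-src[i + 1]);
        dst[i + 2] = (int16_t)(-src[i + 2]);
        dst[i + 3] = (int16_t)(-src[i + 3]);
    }
    for (; i < len; i++) /* leftover elements */
        dst[i] = (int16_t)(-src[i]);
    return 0;
}

/* Deterministic input in the range -100..99. */
static void demo_fill_input(int16_t* src, int len)
{
    for (int i = 0; i < len; i++)
        src[i] = (int16_t)((i % 200) - 100);
}

/* Functionality check: returns 1 if both routines agree for this length. */
static int demo_same_output(demo_op_fn ref, demo_op_fn opt, int len)
{
    int16_t src[DEMO_MAX_LEN];
    int16_t out_ref[DEMO_MAX_LEN];
    int16_t out_opt[DEMO_MAX_LEN];

    assert(len >= 0 && len <= DEMO_MAX_LEN);
    demo_fill_input(src, len);
    ref(src, out_ref, len);
    opt(src, out_opt, len);
    return memcmp(out_ref, out_opt, (size_t)len * sizeof(int16_t)) == 0;
}

/* Speed check: CPU seconds spent on the given number of calls. */
static double demo_time_seconds(demo_op_fn fn, int len, int iterations)
{
    int16_t src[DEMO_MAX_LEN];
    int16_t dst[DEMO_MAX_LEN];
    clock_t start, end;

    assert(len >= 0 && len <= DEMO_MAX_LEN);
    demo_fill_input(src, len);
    start = clock();
    for (int i = 0; i < iterations; i++)
        fn(src, dst, len);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Lengths that are not a multiple of the unroll or vector width (1, 63, ...)
are worth including, since they exercise the leftover-element path.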