NEON-optimized code might be used in multiarch/universal builds.
So guard it not only with WITH_NEON but also with the architecture
defines from winpr/platform.h.
SSE-optimized code might be used in multiarch/universal builds.
So guard it not only with WITH_SSE2 but also with the architecture
defines from winpr/platform.h.
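The double guard described above can be sketched as follows. This is a
minimal illustration, not the actual FreeRDP code: the SKETCH_* names
are hypothetical, and the compiler's predefined architecture macros
stand in for the defines from winpr/platform.h so that the file builds
on its own.

```c
/* Hypothetical stand-ins for the architecture defines that FreeRDP takes
 * from winpr/platform.h; the compiler's predefined macros are used here. */
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define SKETCH_ARCH_X86 1
#endif
#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || defined(_M_ARM64)
#define SKETCH_ARCH_ARM 1
#endif

/* Enable a SIMD path only if the feature was requested AND the slice
 * currently being compiled targets a matching architecture. In a universal
 * build WITH_SSE2 and WITH_NEON may both be set for every slice. */
#if defined(WITH_SSE2) && defined(SKETCH_ARCH_X86)
#define SKETCH_USE_SSE2 1
#endif
#if defined(WITH_NEON) && defined(SKETCH_ARCH_ARM)
#define SKETCH_USE_NEON 1
#endif

typedef enum
{
	SKETCH_BACKEND_GENERIC = 0,
	SKETCH_BACKEND_SSE2 = 1,
	SKETCH_BACKEND_NEON = 2
} sketch_backend_t;

/* Report which code path was compiled into this translation unit. */
sketch_backend_t sketch_backend(void)
{
#if defined(SKETCH_USE_SSE2)
	return SKETCH_BACKEND_SSE2;
#elif defined(SKETCH_USE_NEON)
	return SKETCH_BACKEND_NEON;
#else
	return SKETCH_BACKEND_GENERIC;
#endif
}
```

Built without -DWITH_SSE2 or -DWITH_NEON the function reports the
generic path. With both defined, each architecture slice still selects
only the path that matches its own target.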
This patch adds the basic infrastructure for OpenCL acceleration.
For now only YUV2RGB is implemented, but other operations could be
added later.
The primitives have been massively reworked so that we have an autodetect
mode that will pick the best implementation automatically by performing a
benchmark.
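A benchmark-based selection of this kind could look roughly like the
sketch below. It assumes every candidate shares one function signature
and uses clock() for timing. All names (sketch_add_fn,
sketch_pick_fastest, ...) are hypothetical and not taken from the
primitives code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical primitive: element-wise addition of two byte buffers. */
typedef void (*sketch_add_fn)(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t len);

static void sketch_add_bytewise(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Same result, four elements per loop iteration plus a scalar tail. */
static void sketch_add_unrolled(const uint8_t* a, const uint8_t* b, uint8_t* dst, size_t len)
{
	size_t i = 0;
	for (; i + 4 <= len; i += 4)
	{
		dst[i + 0] = (uint8_t)(a[i + 0] + b[i + 0]);
		dst[i + 1] = (uint8_t)(a[i + 1] + b[i + 1]);
		dst[i + 2] = (uint8_t)(a[i + 2] + b[i + 2]);
		dst[i + 3] = (uint8_t)(a[i + 3] + b[i + 3]);
	}
	for (; i < len; i++)
		dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Time `rounds` calls of one candidate on a fixed 4 KiB workload. */
static double sketch_time(sketch_add_fn fn, int rounds)
{
	static uint8_t a[4096], b[4096], dst[4096];
	const clock_t start = clock();
	for (int r = 0; r < rounds; r++)
		fn(a, b, dst, sizeof(dst));
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the index of the candidate with the lowest measured time. */
size_t sketch_pick_fastest(const sketch_add_fn* candidates, size_t count, int rounds)
{
	assert(candidates && count > 0);
	size_t best = 0;
	double best_time = sketch_time(candidates[0], rounds);
	for (size_t i = 1; i < count; i++)
	{
		const double t = sketch_time(candidates[i], rounds);
		if (t < best_time)
		{
			best_time = t;
			best = i;
		}
	}
	return best;
}
```

The caller would run sketch_pick_fastest() once at start-up and store
the winning function pointer. Which candidate wins depends on the
machine, so the only guarantee is that the returned index is valid.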
Sponsored-by: Rangee Gmbh(http://www.rangee.com)
Currently supported source pixel formats are:
- PIXEL_FORMAT_BGRA32
- PIXEL_FORMAT_BGRX32
Support for PIXEL_FORMAT_RGB[XA]32 can simply be added if
required (see the comment in prim_YUV_opt.c).
On my old 3.1 GHz Core i5-2400 the new SSSE3 function can convert
over 900 1080p BGRX frames per second.
The current non-optimized C version (which supports all pixel formats)
can't do more than 40 YUV conversions per second on this CPU.
---------------------------+---------+-------------+-----------+-------
 RGB TO YUV420 1080p 32bit |   COUNT |       TOTAL |       AVG |   FPS
---------------------------+---------+-------------+-----------+-------
 general_RGBToYUV420       |     500 |    13.1776s | 0.026355s |    38
 ssse3_RGBToYUV420         |     500 |     0.5320s | 0.001064s |   940
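The AVG and FPS columns follow directly from COUNT and TOTAL. The
small sketch below reproduces them and the resulting speedup of about
24.8x. The struct and function names are hypothetical helpers, not
part of the benchmark code.

```c
#include <assert.h>

/* One benchmark row: number of conversions and total wall time. */
typedef struct
{
	unsigned count;
	double total_seconds;
} sketch_run;

/* AVG column: seconds per conversion. */
double sketch_avg_seconds(sketch_run r)
{
	assert(r.count > 0);
	return r.total_seconds / r.count;
}

/* FPS column: conversions per second, rounded to the nearest integer. */
long sketch_fps(sketch_run r)
{
	assert(r.total_seconds > 0.0);
	return (long)((double)r.count / r.total_seconds + 0.5);
}

/* How many times faster `fast` is than `slow` for the same frame count. */
double sketch_speedup(sketch_run slow, sketch_run fast)
{
	assert(slow.count == fast.count && fast.total_seconds > 0.0);
	return slow.total_seconds / fast.total_seconds;
}
```

For the rows above, 500 / 13.1776 s rounds to 38 FPS and
500 / 0.5320 s rounds to 940 FPS.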
Also fixed an error in TestPrimitivesYUV that generated resolutions
with height or width set to zero.
CMake 2.8 does not support default visibility on Windows.
To allow building tests, add the FREERDP_LOCAL define to each
function that is internal to FreeRDP.
When built with testing enabled, these functions are exported and
available for use by tests.
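One way such a define can work is sketched below. This is an
assumption-laden illustration and not FreeRDP's actual definition:
SKETCH_LOCAL and SKETCH_BUILD_TESTING are hypothetical names, and the
attributes shown are the usual MSVC and GCC/Clang spellings.

```c
#include <assert.h>

/* Hypothetical macro names; the real definitions live in FreeRDP's headers. */
#if defined(SKETCH_BUILD_TESTING)
/* Test builds: export internal functions so test binaries can link them. */
#if defined(_WIN32)
#define SKETCH_LOCAL __declspec(dllexport)
#elif defined(__GNUC__)
#define SKETCH_LOCAL __attribute__((visibility("default")))
#else
#define SKETCH_LOCAL
#endif
#else
/* Regular builds: keep internal functions out of the exported symbols. */
#if !defined(_WIN32) && defined(__GNUC__)
#define SKETCH_LOCAL __attribute__((visibility("hidden")))
#else
#define SKETCH_LOCAL
#endif
#endif

/* An internal helper annotated with the macro. */
SKETCH_LOCAL int sketch_internal_clamp(int value, int lo, int hi)
{
	assert(lo <= hi);
	if (value < lo)
		return lo;
	if (value > hi)
		return hi;
	return value;
}
```

The function body is the same in both configurations. Only its symbol
visibility changes, so tests exercise exactly the code that ships.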
This way we use certain compiler flags (like -msse3) only on files
containing optimized code. This avoids problems that occurred when
generic code was compiled with these flags and then run on platforms
that don't support these optimizations (e.g. NEON optimization on
ARM platforms).