Replaced the non-accelerated rgb to ycbcr encoder (rfx_encode.c) to use 32-bit
integer multiplication with shifted factors: 2 times faster
The accelerated SSE2 rgb to ycbcr encoder (rfx_sse2.c) was completely changed
and simplified in order to make use of the SSE2 signed 16-bit integer
multiplication: 2 times faster
Also modified the non-accelerated ycbcr to rgb decoder (rfx_encode.c) to use
32-bit integer multiplications with shifted factors instead of floating point
multiplications: 3 times faster