The previous method did not mix the samples as signed 16-bit values; it
just added the individual bytes. This works out fine for a single buffer
but produces audible artifacts when multiple buffers are mixed and a sum
carries from one byte of a sample into the next.
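
To illustrate the difference, here is a minimal sketch in C. It is not the
code from this change: the function names (mix_bytes, mix_s16le, load_s16le,
store_s16le) are hypothetical, the samples are assumed to be stored
little-endian, and clamping the sum to the 16-bit range is an assumption
about how overflow should be handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise mixing, roughly what the old behaviour amounts to: every byte
 * wraps on its own, so a carry out of the low byte never reaches the high
 * byte of the same sample. */
static void mix_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Read one signed 16-bit sample (little-endian assumed) as a wider int. */
static int32_t load_s16le(const uint8_t *p)
{
    int32_t v = (int32_t)p[0] | ((int32_t)p[1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Write one signed 16-bit sample back as two little-endian bytes. */
static void store_s16le(uint8_t *p, int32_t v)
{
    uint16_t u = (uint16_t)v; /* modular conversion, well-defined in C */
    p[0] = (uint8_t)(u & 0xFF);
    p[1] = (uint8_t)(u >> 8);
}

/* Sample-wise mixing: add whole 16-bit samples in a wider type, then
 * clamp (an assumption; wrapping is the other option). A trailing odd
 * byte, if any, is left untouched. */
static void mix_s16le(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        int32_t sum = load_s16le(dst + i) + load_s16le(src + i);
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        store_s16le(dst + i, sum);
    }
}
```

For example, mixing the samples 200 (bytes C8 00) and 100 (bytes 64 00)
byte by byte gives 2C 00, which is 44, because the carry out of the low byte
is dropped. Mixing them as 16-bit values gives 2C 01, which is 300.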