1. With nonzero shift_x / shift_y, stbtt_GetGlyphBitmapBoxSubpixel
returned a nonzero-sized bounding box for empty glyphs (e.g. spaces).
Since such glyphs have no outlines, the rasterizer drew nothing,
leaving a 1x1-pixel image of uninitialized memory.
2. stbtt_GetGlyphBitmapBoxSubpixel added shift_y then flipped the y
axis, whereas the rasterizer flipped the y axis then added shift_y.
Consistently flip-then-add in both places. This also simplifies the
pattern of floors/ceils in stbtt_GetGlyphBitmapBoxSubpixel.
3. The rasterizer added shift_y after multiplying by the vertical
oversampling factor, instead of before. Add it before multiplying, so
the shift is scaled along with the coordinates.
Vertical shifts now work much better, in my tests anyway.