The kernel heap only uses object caches for objects up to size 8192.
Larger allocations have to go through the raw allocator. That can
get pretty expensive.
Adding instrumentation around the malloc/free calls in this function
showed that on my machine, some 596ms during boot were spent on
*malloc/free alone*, all else aside. After this change, we are at
around 110ms, or a >5x improvement. Running an fgrep -R on /system/
after boot increased the cumulative time in memory functions to over
5 seconds, while after this change it is "only" 1170ms.
Honestly, it seems like the object depots should be able to be faster
than that, even if this function is called thousands of times. But that
is a problem for a different investigation.
It would be even faster for every consumer of this data in
packagefs just allocated one set of buffers up front, or at least
for a single "read session", but plumbing that all the way
through the myriad abstractions of the Package Kit will
not be easy, and is left for another time, as well.