update memset16/32 inlining heuristics
I spent some time looking at perf.skia.org and it looks like we can do better.
It is weird, weird, weird that on x86, we see three completely different behaviors:
- x86 Android: inlining better for small N, custom better for large N;
- Windows: inlining better for large N, custom better for small N;
- other x86: inlining generally better
BUG=skia:4316,chromium:516426
(Temporary, plan to revert.)
TBR=reed@google.com
Review URL: https://codereview.chromium.org/
1357193002