decompress_generic: Add a loop fastpath
Copy the main loop, and change checks such that op is always less
than oend-SAFE_DISTANCE. Currently these are added for the literal
copy length check, and for the match copy length check.
Otherwise the first loop is exactly the same as the second. Follow on
diffs will optimize the first copy loop based on this new requirement.
I also tried instead making a separate inlineable function for the copy
loop (similar to existing partialDecode flags, etc), but I think the
changes might be significant enough to warrent doubling the code, instead
pulling out common functionality to separate functions.
This is the basic transformation that will allow several following optimisations.