lib: zstd: Improve decode performance
To speed up decoding, optimizations from the upstream zstd GitHub
repository are ported and adapted to kernel coding style.
Since low compression levels are preferred in Linux for their
compression/decompression speed (the default is level 3),
optimizations targeting the low levels are chosen, as follows:
[1] lib: zstd: Speed up single segment zstd_fast by 5%
(https://github.com/facebook/zstd/pull/1562/commits/
95624b77e477752b3c380c22be7bcf67f06c9934)
[2] perf improvements for zstd decode
(https://github.com/facebook/zstd/pull/1668/commits/
29d1e81bbdfc21085529623e7bc5abcb3e1627ae)
[3] updated double_fast complementary insertion
(https://github.com/facebook/zstd/pull/1681/commits/
d1327738c277643f09c972a407083ad73c8ecf7b)
[4] Improvements in zstd decode performance
(https://github.com/facebook/zstd/pull/1756/commits/
b83059958246dfcb5b91af9c187fad8c706869a0)
[5] Optimize decompression and fix wildcopy overread
(https://github.com/facebook/zstd/pull/1804/commits/
efd37a64eaff5a0a26ae2566fdb45dc4a0c91673)
[6] Improve ZSTD_highbit32's codegen
(https://github.com/facebook/zstd/commit/
a07da7b0db682c170a330a8c21585be3d68275fa)
[7] Optimize decompression speed for gcc and clang (#1892)
(https://github.com/facebook/zstd/commit/
718f00ff6fe42db7e6ba09a7f7992b3e85283f77)
[8] Fix performance regression on aarch64 with clang
(https://github.com/facebook/zstd/pull/1973/commits/
cb2abc3dbe010113d9e00ca3b612bf61983145a2)
Change-Id: Ia2cf120879a415988dbbc2fce59a994915c8c77c
Signed-off-by: Dongwoo Lee <dwoo08.lee@samsung.com>