erofs-utils: lib: reset HC to avoid 32-bit overflow of kite-deflate
author Gao Xiang <hsiangkao@linux.alibaba.com>
Wed, 24 Jan 2024 09:16:21 +0000 (17:16 +0800)
committer Gao Xiang <hsiangkao@linux.alibaba.com>
Wed, 24 Jan 2024 10:40:13 +0000 (18:40 +0800)
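Yifan reported a "segmentation fault (core dumped)" error a few days ago
when compressing a large dataset (enwik9 x 5, about 5 x 10^9 bytes).
That exceeds 2^32 bytes, so the 32-bit position base of the kite-deflate
match finder overflows.  Reset the hash chain (HC) before it can wrap.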

Reported-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn>
Fixes: 861037f4fc15 ("erofs-utils: add a built-in DEFLATE compressor")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Tested-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn>
Link: https://lore.kernel.org/r/20240124091621.2413606-1-hsiangkao@linux.alibaba.com
include/erofs/defs.h
lib/kite_deflate.c

index e7384a143f9b27e046dec2ec9830bf26b9c8d930..4ea9a55ec0e961726e74274c0af3174d177981d7 100644
@@ -343,6 +343,9 @@ unsigned long __roundup_pow_of_two(unsigned long n)
 #define ST_MTIM_NSEC(stbuf) 0
 #endif
 
+#define likely(x)      __builtin_expect(!!(x), 1)
+#define unlikely(x)    __builtin_expect(!!(x), 0)
+
 #ifdef __cplusplus
 }
 #endif
index 8667954162fe2943d32860242f7cf4e0974dd4fd..570bc5a6d6b625b730199c01ebe5e6bfe4dbadf7 100644
@@ -859,6 +859,17 @@ static void kite_mf_reset(struct kite_matchfinder *mf,
         */
        mf->base += mf->offset + kHistorySize32 + 1;
 
+       /*
+        * Unlike other LZ encoders such as liblzma [1], we simply reset the hash
+        * chain instead of normalizing it.  This avoids extra complexity, as we
+        * don't expect extremely large input buffers in one go.
+        *
+        * [1] https://github.com/tukaani-project/xz/blob/v5.4.0/src/liblzma/lz/lz_encoder_mf.c#L94
+        */
+       if (unlikely(mf->base > ((typeof(mf->base))-1) >> 1)) {
+               mf->base = kHistorySize32 + 1;
+               memset(mf->hash, 0, 0x10000 * sizeof(mf->hash[0]));
+       }
        mf->offset = 0;
        mf->cyclic_pos = 0;
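The reset strategy in the kite_deflate.c hunk can be illustrated with a small standalone model. This is a sketch under stated assumptions, not kite-deflate's code: `struct toy_mf`, `toy_mf_init()`, `toy_mf_lookup()`, `toy_mf_reset()`, `HISTORY_SIZE`, `HASH_ENTRIES` and `MAX_BUFFER_BYTES` are hypothetical. The table keeps a single 32-bit absolute position per hash slot rather than a full hash chain, and `__builtin_expect` (behind the `likely()`/`unlikely()` macros the patch adds) requires GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same branch-hint macros the patch adds to include/erofs/defs.h (GCC/Clang only). */
#define likely(x)      __builtin_expect(!!(x), 1)
#define unlikely(x)    __builtin_expect(!!(x), 0)

/* Hypothetical sizes for this sketch, not kite-deflate's actual constants. */
#define HISTORY_SIZE      32768u       /* match window, like DEFLATE's 32 KiB */
#define HASH_ENTRIES      0x10000u
#define MAX_BUFFER_BYTES  (1u << 30)   /* assumed cap on one input buffer */

/* Hypothetical toy match finder: a position's absolute value is base + offset. */
struct toy_mf {
	uint32_t base;               /* absolute position of the current buffer start */
	uint32_t offset;             /* bytes consumed from the current buffer */
	uint32_t hash[HASH_ENTRIES]; /* last absolute position per hash; 0 = empty */
	unsigned int resets;         /* number of times the table was cleared */
};

static void toy_mf_init(struct toy_mf *mf)
{
	memset(mf, 0, sizeof(*mf));
	/* Start beyond the window so an empty (0) slot is always out of range. */
	mf->base = HISTORY_SIZE + 1;
}

/* Record the current position under hash h; return the match distance or 0. */
static uint32_t toy_mf_lookup(struct toy_mf *mf, uint16_t h)
{
	uint32_t cur = mf->base + mf->offset;
	uint32_t prev = mf->hash[h];

	mf->hash[h] = cur;
	mf->offset++;
	/* Candidates farther back than the window (incl. empty slots) are rejected. */
	return cur - prev <= HISTORY_SIZE ? cur - prev : 0;
}

/* Called when switching buffers, mirroring the check the patch adds. */
static void toy_mf_reset(struct toy_mf *mf)
{
	/* Bounded buffers keep the sum below 2^31 + 2^30 + window: no 32-bit wrap. */
	assert(mf->offset <= MAX_BUFFER_BYTES);
	/* Skip a full window so positions from the old buffer never match. */
	mf->base += mf->offset + HISTORY_SIZE + 1;
	/* Past half of the 32-bit range: restart numbering and drop all history. */
	if (unlikely(mf->base > (UINT32_MAX >> 1))) {
		mf->base = HISTORY_SIZE + 1;
		memset(mf->hash, 0, HASH_ENTRIES * sizeof(mf->hash[0]));
		mf->resets++;
	}
	mf->offset = 0;
}
```

Feeding five 10^9-byte buffers (roughly enwik9 x 5) through `toy_mf_reset()` clears the table once, on the third buffer, and `base` never wraps. Because this sketch already skips a full window at every buffer boundary, clearing the table discards nothing that could still match. liblzma's normalization instead subtracts a constant from every stored position so live history survives, which is the extra complexity the patch's comment chooses to avoid.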