erofs-utils: lib: improve freeing hashmap in erofs_blob_exit()
authorSandeep Dhavale <dhavale@google.com>
Thu, 23 May 2024 21:01:31 +0000 (14:01 -0700)
committerGao Xiang <hsiangkao@linux.alibaba.com>
Tue, 4 Jun 2024 09:00:22 +0000 (17:00 +0800)
Depending on size of the filesystem being built there can be huge number
of elements in the hashmap. Currently we call hashmap_iter_first() in
while loop to iterate and free the elements. However technically
correct, this is inefficient in 2 aspects.

- As we are iterating the elements for removal, we do not need overhead of
rehashing.
- Second part which contributes hugely to the performance is using
hashmap_iter_first() as it starts scanning from index 0 throwing away
the previous successful scan. For sparsely populated hashmap this becomes
O(n^2) in worst case.

Lets fix this by disabling hashmap shrink which avoids rehashing
and use hashmap_iter_next() which is now guaranteed to iterate over
all the elements while removing while avoiding the performance pitfalls
of using hashmap_iter_first().

Test with random data shows performance improvement as:

fs_size  Before   After
1G   23s    7s
2G   81s      15s
4G  272s     31s
8G   1252s   61s

Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240523210131.3126753-3-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
lib/blobchunk.c

index 645bcc11be50010573dc781e992419a85b397463..8082aa4e961857991195444f23d201fd667cbf52 100644 (file)
@@ -548,11 +548,17 @@ void erofs_blob_exit(void)
        if (blobfile)
                fclose(blobfile);
 
-       while ((e = hashmap_iter_first(&blob_hashmap, &iter))) {
+       /* Disable hashmap shrink, effectively disabling rehash.
+        * This way we can iterate over entire hashmap efficiently
+        * and safely by using hashmap_iter_next() */
+       hashmap_disable_shrink(&blob_hashmap);
+       e = hashmap_iter_first(&blob_hashmap, &iter);
+       while (e) {
                bc = container_of((struct hashmap_entry *)e,
                                  struct erofs_blobchunk, ent);
                DBG_BUGON(hashmap_remove(&blob_hashmap, e) != e);
                free(bc);
+               e = hashmap_iter_next(&iter);
        }
        DBG_BUGON(hashmap_free(&blob_hashmap));