Depending on the size of the filesystem being built, there can be a
huge number of elements in the hashmap. Currently we call
hashmap_iter_first() in a while loop to iterate over and free the
elements. While technically correct, this is inefficient in two
respects:
- As we are iterating over the elements only to remove them, we do not
need the overhead of rehashing the table as it shrinks.
- The second aspect, which contributes most to the cost, is the use of
hashmap_iter_first(): it always starts scanning from index 0, throwing
away the progress of the previous successful scan. For a sparsely
populated hashmap this becomes O(n^2) in the worst case (see the
sketch below).
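To illustrate the second point, here is a minimal, self-contained C
sketch. It is not the erofs-utils hashmap; it models the table as a
flat array with hypothetical helpers (buckets[], next_occupied()) and
simply counts bucket probes for the two iteration strategies:

#include <stdio.h>

#define NBUCKETS 1024

static int buckets[NBUCKETS];	/* 1 = occupied, 0 = empty */

static void populate(void)
{
	/* Sparse population: one entry in every 8th bucket. */
	for (int i = 0; i < NBUCKETS; i += 8)
		buckets[i] = 1;
}

/* Scan forward from 'start' for the next occupied bucket. */
static int next_occupied(int start, long *probes)
{
	for (int i = start; i < NBUCKETS; i++) {
		++*probes;
		if (buckets[i])
			return i;
	}
	return -1;
}

int main(void)
{
	long probes;
	int i, pos;

	/*
	 * Strategy 1: like calling hashmap_iter_first() once per
	 * removal - every lookup restarts the scan from bucket 0 and
	 * re-probes all the buckets that were already emptied.
	 */
	populate();
	probes = 0;
	while ((i = next_occupied(0, &probes)) >= 0)
		buckets[i] = 0;	/* "remove" the entry */
	printf("restart from 0: %ld probes\n", probes);

	/*
	 * Strategy 2: like hashmap_iter_next() - resume from the
	 * saved iterator position, so each bucket is probed once.
	 */
	populate();
	probes = 0;
	pos = 0;
	while ((i = next_occupied(pos, &probes)) >= 0) {
		buckets[i] = 0;
		pos = i + 1;
	}
	printf("resume scan:    %ld probes\n", probes);
	return 0;
}

Compiled and run, the restart-from-0 strategy performs tens of
thousands of probes for 128 entries in 1024 buckets, while the
resuming strategy probes each bucket exactly once, mirroring the
quadratic vs. linear behavior described above.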
Let's fix this by disabling hashmap shrinking, which avoids rehashing,
and by using hashmap_iter_next(), which is then guaranteed to visit
all the elements during removal while avoiding the performance pitfall
of hashmap_iter_first().
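For context, a hedged sketch of what disabling shrink can look like,
assuming a git-style hashmap whose hashmap_remove() shrinks and
rehashes the table once the entry count drops below a shrink_at
threshold (the field name is an assumption; the actual erofs-utils
helper may differ):

#include "hashmap.h"	/* git-style hashmap as adopted by erofs-utils */

static inline void hashmap_disable_shrink(struct hashmap *map)
{
	/* With the threshold cleared, the (unsigned) entry count can
	 * never drop below it, so hashmap_remove() never triggers a
	 * shrink-and-rehash while entries are being freed. */
	map->shrink_at = 0;
}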
A test with random data shows the following performance improvement:
fs_size  Before  After
1G       23s     7s
2G       81s     15s
4G       272s    31s
8G       1252s   61s
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240523210131.3126753-3-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
if (blobfile)
fclose(blobfile);
- while ((e = hashmap_iter_first(&blob_hashmap, &iter))) {
+ /* Disable hashmap shrink, effectively disabling rehash.
+ * This way we can iterate over the entire hashmap efficiently
+ * and safely by using hashmap_iter_next() */
+ hashmap_disable_shrink(&blob_hashmap);
+ e = hashmap_iter_first(&blob_hashmap, &iter);
+ while (e) {
bc = container_of((struct hashmap_entry *)e,
struct erofs_blobchunk, ent);
DBG_BUGON(hashmap_remove(&blob_hashmap, e) != e);
free(bc);
+ e = hashmap_iter_next(&iter);
}
DBG_BUGON(hashmap_free(&blob_hashmap));