net/mlx5: Bridge, fix peer entry ageing in LAG mode
authorVlad Buslov <vladbu@nvidia.com>
Wed, 9 Aug 2023 09:10:57 +0000 (11:10 +0200)
committerSaeed Mahameed <saeedm@nvidia.com>
Thu, 12 Oct 2023 18:10:33 +0000 (11:10 -0700)
commit7a3ce8074878a68a75ceacec93d9ae05906eec86
tree6f00f5b56fe3d56e630cfbab5dc6dfce98f2c9b5
parent7624e58a8b3a251e3e5108b32f2183b34453db32
net/mlx5: Bridge, fix peer entry ageing in LAG mode

With current implementation in single FDB LAG mode all packets are
processed by eswitch 0 rules. As such, 'peer' FDB entries receive the
packets for rules of other eswitches and are responsible for updating the
main entry by sending SWITCHDEV_FDB_ADD_TO_BRIDGE notification from their
background update wq task. However, this introduces a race condition when
non-zero eswitch instance decides to delete a FDB entry, sends
SWITCHDEV_FDB_DEL_TO_BRIDGE notification, but another eswitch's update task
refreshes the same entry concurrently while its async delete work is still
pending on the workque. In such case another SWITCHDEV_FDB_ADD_TO_BRIDGE
event may be generated and entry will remain stuck in FDB marked as
'offloaded' since no more SWITCHDEV_FDB_DEL_TO_BRIDGE notifications are
sent for deleting the peer entries.

Fix the issue by synchronously marking deleted entries with
MLX5_ESW_BRIDGE_FLAG_DELETED flag and skipping them in background update
job.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
drivers/net/ethernet/mellanox/mlx5/core/en/rep/bridge.c
drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.c
drivers/net/ethernet/mellanox/mlx5/core/esw/bridge.h
drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_priv.h