Previously, under high stress, a node could lose track of a resource's (action or source) metrics and be unable to recover them until the node was rebooted. This is now mitigated by attempting to recreate the missing metrics whenever the resource is restarted or its metrics are reset.
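The recovery idea can be illustrated with a minimal sketch. It is not the actual `emqx_metrics_worker` code: the `MetricsStore` class, its method names, and the counter names other than `matched` are hypothetical and chosen only for illustration. Bumping a counter that has been lost fails, and both restarting the resource and resetting its metrics recreate the missing counters.

```python
class MetricsStore:
    """Toy per-resource counter store (hypothetical, not EMQX's implementation)."""

    # Assumed counter names; only "matched" appears in the changelog entry.
    DEFAULT_COUNTERS = ("matched", "success", "failed")

    def __init__(self):
        self._counters = {}  # resource_id -> {counter name -> value}

    def ensure(self, resource_id):
        """Create any missing counters; existing values are kept."""
        counters = self._counters.setdefault(resource_id, {})
        for name in self.DEFAULT_COUNTERS:
            counters.setdefault(name, 0)

    def reset(self, resource_id):
        """Zero every counter, recreating the entry if it was lost."""
        self._counters[resource_id] = {name: 0 for name in self.DEFAULT_COUNTERS}

    def bump(self, resource_id, name, n=1):
        """Increment a counter; raises KeyError if the metrics were lost."""
        self._counters[resource_id][name] += n

    def lose(self, resource_id):
        """Simulate the node losing track of a resource's metrics."""
        self._counters.pop(resource_id, None)

    def get(self, resource_id, name):
        return self._counters[resource_id][name]


def restart_resource(store, resource_id):
    """Restarting a resource also tries to recreate its metrics."""
    store.ensure(resource_id)
```

In this sketch, a `bump` after `lose` raises `KeyError` (loosely analogous to the `badkey` error), and it succeeds again after either `restart_resource` or `reset`.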
In addition, warning logs about failures to bump these metrics used to flood the logs for "hot-path" metrics such as matched. Such logs are now throttled to avoid bloating log files.
An example of such throttled logs:
2024-11-14T13:56:44.134289+00:00 [warning] tag: RESOURCE, clientid: clientid, msg: handle_resource_metrics_failed, peername: 172.100.239.1:33896, reason: {badkey,matched}, stacktrace: [{erlang,map_get,[matched,#{}],[{error_info,#{module => erl_erts_errors}}]},{emqx_metrics_worker,idx_metric,4,[{file,"src/emqx_metrics_worker.erl"},{line,560}]},...
2024-11-14T13:57:12.490503+00:00 [warning] msg: log_events_throttled_during_last_period, period: 1 minutes, 0 seconds, dropped: #{handle_resource_metrics_failed => 2294}
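The throttling behaviour visible in the two log lines above (one event logged, then a periodic summary of how many were dropped) can be sketched roughly as follows. This is only an illustration of the general technique, not EMQX's implementation: the `LogThrottler` class, its methods, and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from collections import Counter


class LogThrottler:
    """Let the first event of each kind through per period; count the rest."""

    def __init__(self, period_s=60.0, clock=time.monotonic):
        self._period_s = period_s
        self._clock = clock          # injectable so the sketch is testable
        self._window_start = clock()
        self._seen = set()           # message kinds already logged this window
        self._dropped = Counter()    # message kind -> number of dropped events

    def allow(self, msg):
        """Return True if the event should be logged, else count it as dropped."""
        if msg in self._seen:
            self._dropped[msg] += 1
            return False
        self._seen.add(msg)
        return True

    def flush_if_due(self):
        """Once the period has elapsed, return the dropped counts and start a
        new window; return None while the current period is still running."""
        now = self._clock()
        if now - self._window_start < self._period_s:
            return None
        dropped = dict(self._dropped)
        self._window_start = now
        self._seen.clear()
        self._dropped.clear()
        return dropped
```

A caller would log an event only when `allow` returns `True`, and periodically call `flush_if_due` to emit a summary comparable to the `dropped: #{...}` map in the second log line.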