
fix(emqx_cm): fix channel data registration race-condition

When clustered, there is a chance that an MQTT client process
gets killed (e.g. for holding the channel registration lock for too long).
If the channel data inserts happen before the message requesting
channel process monitoring is cast out, stale channel data may be
left in the ETS tables indefinitely.

This commit changes the order of these non-atomic operations:
it casts out the monitor request message before inserting
channel data.
Zaiming (Stone) Shi, 2 years ago
Commit c75e9bbe0d
2 changed files with 7 additions and 1 deletion
  1. apps/emqx/src/emqx_cm.erl (+3 -1)
  2. changes/ce/fix-10923.en.md (+4 -0)

apps/emqx/src/emqx_cm.erl (+3 -1)

@@ -176,11 +176,13 @@ insert_channel_info(ClientId, Info, Stats) ->
 %% Note that: It should be called on a lock transaction
 register_channel(ClientId, ChanPid, #{conn_mod := ConnMod}) when is_pid(ChanPid) ->
     Chan = {ClientId, ChanPid},
+    %% cast (for process monitor) before inserting ets tables
+    cast({registered, Chan}),
     true = ets:insert(?CHAN_TAB, Chan),
     true = ets:insert(?CHAN_CONN_TAB, {Chan, ConnMod}),
     ok = emqx_cm_registry:register_channel(Chan),
     mark_channel_connected(ChanPid),
-    cast({registered, Chan}).
+    ok.
 
 %% @doc Unregister a channel.
 -spec unregister_channel(emqx_types:clientid()) -> ok.

changes/ce/fix-10923.en.md (+4 -0)

@@ -0,0 +1,4 @@
+Fix a race condition in channel info registration.
+
+Prior to this fix, when the system was under heavy load, a client could be disconnected (or have its session expired) but still appear on the Clients page of the Dashboard.
+One possible reason is the race condition fixed in this PR: the connection is killed in the middle of channel data registration.
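The ordering argument in the commit message can be illustrated with a minimal, self-contained Erlang sketch. It does not use EMQX: the module `cm_race_demo`, the plain `!` send standing in for `cast/1`, the single ETS table and the `{cleaned, Pid}` notification are all hypothetical. It assumes, as in `register_channel/3`, that the channel process performs the ETS inserts itself, and it relies on the documented behaviour that `erlang:monitor/2` on an already-dead process still delivers a `'DOWN'` message (reason `noproc`).

```erlang
%% Hypothetical demo module (save as cm_race_demo.erl); not part of EMQX.
-module(cm_race_demo).
-export([run/2]).

%% run(Order, KillAfter) -> entries left in the table after the channel dies.
%% Order: insert_first (old register_channel order) or cast_first (fixed order).
%% KillAfter: how many of the two steps the channel completes before being killed.
run(Order, KillAfter) ->
    Tab = ets:new(chan_tab, [public]),
    Test = self(),
    Cm = spawn(fun() -> cm_loop(Tab, Test) end),
    Steps = lists:sublist(steps(Order), KillAfter),
    Chan = spawn(fun() -> channel(Tab, Cm, Test, Steps) end),
    receive {steps_done, Chan} -> ok end,
    %% Kill the channel between (or after) the non-atomic steps.
    exit(Chan, kill),
    %% Wait for the cm process to report cleanup; give up after 500 ms.
    receive {cleaned, Chan} -> ok after 500 -> ok end,
    Left = ets:tab2list(Tab),
    exit(Cm, kill),
    ets:delete(Tab),
    Left.

steps(insert_first) -> [insert, cast];
steps(cast_first) -> [cast, insert].

%% The channel runs its registration steps, then hangs until killed
%% (standing in for a channel stuck while holding the registration lock).
channel(Tab, Cm, Test, Steps) ->
    Entry = {<<"client1">>, self()},
    lists:foreach(fun(insert) -> true = ets:insert(Tab, Entry);
                     (cast) -> Cm ! {registered, Entry}
                  end, Steps),
    Test ! {steps_done, self()},
    receive after infinity -> ok end.

%% The cm process monitors registered channels and removes their
%% entries when they go down.
cm_loop(Tab, Test) ->
    receive
        {registered, {_ClientId, Pid}} ->
            %% Monitoring a dead pid still yields 'DOWN' with reason noproc.
            _ = erlang:monitor(process, Pid),
            cm_loop(Tab, Test);
        {'DOWN', _Ref, process, Pid, _Reason} ->
            %% Delete every entry owned by the dead channel.
            true = ets:match_delete(Tab, {'_', Pid}),
            Test ! {cleaned, Pid},
            cm_loop(Tab, Test)
    end.
```

With the old order, `run(insert_first, 1)` (killed after the insert but before the cast) returns the stale `{<<"client1">>, Pid}` entry, because the cm process never learns about the channel. With the new order, `run(cast_first, 1)` and `run(cast_first, 2)` both return `[]`: whenever the inserts can have happened, the monitor request has already been sent, and the `'DOWN'` message can only arrive after the channel, and therefore its inserts, are finished.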