
chore(router): outline helper responsibilities and operational model

Andrew Mayorov 1 year ago
Parent
Commit
3388101652
1 changed file with 27 additions and 0 deletions

+ 27 - 0
apps/emqx/src/emqx_router_helper.erl

@@ -14,6 +14,33 @@
 %% limitations under the License.
 %%--------------------------------------------------------------------
 
+%% Router helper process.
+%%
+%% Responsibility is twofold:
+%% 1. Cleaning its own portion of the global routing table when restarted.
+%%    The assumption is that the node has crashed (worst-case), so the
+%%    previous incarnation's routes are still present upon restart.
+%% 2. Managing portions of the global routing table belonging to dead /
+%%    "left" cluster members, i.e. members that are not supposed to come
+%%    back online again.
+%%
+%% Only core nodes are responsible for the latter task. Moreover, the
+%% helper adopts the following operational model:
+%% 1. Core nodes are supposed to be explicitly evicted (or "left") from
+%%    the cluster. Even if a core node is marked down for several hours,
+%%    the helper won't attempt to purge its portion of the global routing
+%%    table.
+%% 2. Replicant nodes are considered dead (or "left") once they are down
+%%    for a specific timespan, currently hardcoded as `?PURGE_DEAD_TIMEOUT`.
+%%    Ideally it should reflect the worst-case time it takes for a
+%%    connectivity failure between cores and replicants to heal.
+%%
+%% TODO
+%% While cores purge unreachable replicants' routes after a timeout,
+%% replicants _do nothing_ on connectivity loss, regardless of how long
+%% it lasts. Coupled with the fact that replicants are not affected by the
+%% "autoheal" mechanism, this may still lead to routing inconsistencies.
+
 -module(emqx_router_helper).
 
 -behaviour(gen_server).
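
The operational model described in the new header comment can be read as a small decision function. The sketch below is illustrative only and is not taken from `emqx_router_helper.erl`: the module name `purge_policy_sketch`, the function `should_purge/3`, the status terms, and the 15-minute value are all hypothetical (the real `?PURGE_DEAD_TIMEOUT` value is not shown in this diff). It also assumes the caller is a core node, since only cores handle other members' routes, and that routes of explicitly "left" members are purged right away.

```erlang
-module(purge_policy_sketch).
-export([should_purge/3]).

%% Hypothetical placeholder; the real ?PURGE_DEAD_TIMEOUT lives in
%% emqx_router_helper.erl and its value is not shown in this diff.
-define(PURGE_DEAD_TIMEOUT_MS, (15 * 60 * 1000)).

-type role() :: core | replicant.
-type status() :: up | left | {down, integer()}.

%% Decide whether a core node may purge the routes owned by a peer,
%% given the peer's role, its status, and the current time in ms.
%% In {down, SinceMs}, SinceMs is when the peer was first seen down.
-spec should_purge(role(), status(), integer()) -> boolean().
should_purge(_Role, left, _NowMs) ->
    %% Assumption: explicitly evicted members are not coming back.
    true;
should_purge(core, {down, _SinceMs}, _NowMs) ->
    %% Down cores are never purged on a timer, however long they are down.
    false;
should_purge(replicant, {down, SinceMs}, NowMs) ->
    %% Down replicants are considered dead once the timeout has elapsed.
    NowMs - SinceMs >= ?PURGE_DEAD_TIMEOUT_MS;
should_purge(_Role, up, _NowMs) ->
    false.
```

Under these assumptions, `should_purge(core, {down, 0}, NowMs)` is `false` for any `NowMs`, which mirrors the "even if a core node is marked down for several hours" clause, while a replicant becomes purgeable only after the timeout.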