
Merge remote-tracking branch 'upstream/release-56' into 0319-sync-release56

Ivan Dyachkov, 2 years ago
Parent
Current commit
f2dc940436
100 changed files with 3412 additions and 736 deletions
  1. 20 7
      .github/workflows/build_and_push_docker_images.yaml
  2. 2 0
      .gitignore
  3. 3 3
      Makefile
  4. 2 2
      apps/emqx/include/emqx_release.hrl
  5. 71 1
      apps/emqx/integration_test/emqx_persistent_session_ds_SUITE.erl
  6. 1 0
      apps/emqx/priv/bpapi.versions
  7. 1 1
      apps/emqx/rebar.config
  8. 4 0
      apps/emqx/src/emqx_channel.erl
  9. 23 3
      apps/emqx/src/emqx_cm_registry_keeper.erl
  10. 1 0
      apps/emqx/src/emqx_config.erl
  11. 92 0
      apps/emqx/src/emqx_cpu_sup_worker.erl
  12. 46 1
      apps/emqx/src/emqx_inflight.erl
  13. 51 1
      apps/emqx/src/emqx_mqueue.erl
  14. 4 3
      apps/emqx/src/emqx_os_mon.erl
  15. 18 0
      apps/emqx/src/emqx_persistent_message.erl
  16. 88 37
      apps/emqx/src/emqx_persistent_session_ds.erl
  17. 1 0
      apps/emqx/src/emqx_rpc.erl
  18. 1 1
      apps/emqx/src/emqx_session.erl
  19. 5 0
      apps/emqx/src/emqx_session_mem.erl
  20. 1 1
      apps/emqx/src/emqx_shared_sub.erl
  21. 2 2
      apps/emqx/src/emqx_sys_mon.erl
  22. 1 1
      apps/emqx/src/emqx_sys_sup.erl
  23. 22 19
      apps/emqx/src/emqx_vm.erl
  24. 11 0
      apps/emqx/test/emqx_common_test_helpers.erl
  25. 80 2
      apps/emqx/test/emqx_inflight_SUITE.erl
  26. 68 0
      apps/emqx/test/emqx_mqueue_SUITE.erl
  27. 2 1
      apps/emqx/test/emqx_os_mon_SUITE.erl
  28. 3 2
      apps/emqx/test/emqx_vm_SUITE.erl
  29. 1 1
      apps/emqx_bridge/src/emqx_bridge_api.erl
  30. 6 6
      apps/emqx_bridge/src/emqx_bridge_v2_api.erl
  31. 18 0
      apps/emqx_bridge/test/emqx_bridge_v2_testlib.erl
  32. 25 5
      apps/emqx_bridge_dynamo/src/emqx_bridge_dynamo_connector.erl
  33. 9 12
      apps/emqx_bridge_dynamo/src/emqx_bridge_dynamo_connector_client.erl
  34. 49 2
      apps/emqx_bridge_dynamo/test/emqx_bridge_dynamo_SUITE.erl
  35. 3 3
      apps/emqx_bridge_gcp_pubsub/src/emqx_bridge_gcp_pubsub_client.erl
  36. 30 13
      apps/emqx_bridge_gcp_pubsub/src/emqx_bridge_gcp_pubsub_impl_producer.erl
  37. 100 55
      apps/emqx_bridge_gcp_pubsub/test/emqx_bridge_gcp_pubsub_producer_SUITE.erl
  38. 215 0
      apps/emqx_bridge_gcp_pubsub/test/emqx_bridge_v2_gcp_pubsub_producer_SUITE.erl
  39. 6 4
      apps/emqx_bridge_hstreamdb/src/emqx_bridge_hstreamdb.erl
  40. 4 0
      apps/emqx_bridge_iotdb/src/emqx_bridge_iotdb_connector.erl
  41. 3 0
      apps/emqx_bridge_kafka/src/emqx_bridge_kafka.erl
  42. 7 1
      apps/emqx_bridge_kafka/src/emqx_bridge_kafka_consumer_schema.erl
  43. 44 4
      apps/emqx_bridge_kafka/src/emqx_bridge_kafka_impl_consumer.erl
  44. 34 2
      apps/emqx_bridge_kafka/test/emqx_bridge_kafka_impl_consumer_SUITE.erl
  45. 12 0
      apps/emqx_bridge_kafka/test/emqx_bridge_v2_kafka_consumer_SUITE.erl
  46. 16 1
      apps/emqx_bridge_kinesis/src/emqx_bridge_kinesis.erl
  47. 79 78
      apps/emqx_bridge_opents/src/emqx_bridge_opents_connector.erl
  48. 18 12
      apps/emqx_bridge_pulsar/test/emqx_bridge_pulsar_v2_SUITE.erl
  49. 55 0
      apps/emqx_bridge_rabbitmq/test/emqx_bridge_rabbitmq_v2_SUITE.erl
  50. 1 1
      apps/emqx_bridge_rocketmq/src/emqx_bridge_rocketmq_connector.erl
  51. 49 11
      apps/emqx_bridge_tdengine/src/emqx_bridge_tdengine_connector.erl
  52. 33 25
      apps/emqx_conf/src/emqx_conf_cli.erl
  53. 2 0
      apps/emqx_connector/src/emqx_connector.erl
  54. 1 1
      apps/emqx_connector/src/emqx_connector_api.erl
  55. 14 0
      apps/emqx_dashboard/include/emqx_dashboard.hrl
  56. 8 3
      apps/emqx_dashboard/src/emqx_dashboard_monitor.erl
  57. 28 21
      apps/emqx_dashboard/src/emqx_dashboard_monitor_api.erl
  58. 89 2
      apps/emqx_dashboard/src/emqx_dashboard_swagger.erl
  59. 94 6
      apps/emqx_dashboard/test/emqx_dashboard_monitor_SUITE.erl
  60. 7 3
      apps/emqx_durable_storage/src/emqx_ds.erl
  61. 35 25
      apps/emqx_durable_storage/src/emqx_ds_replication_layer.erl
  62. 11 12
      apps/emqx_durable_storage/src/emqx_ds_storage_bitfield_lts.erl
  63. 5 7
      apps/emqx_durable_storage/src/emqx_ds_storage_layer.erl
  64. 3 5
      apps/emqx_durable_storage/src/proto/emqx_ds_proto_v4.erl
  65. 111 11
      apps/emqx_durable_storage/test/emqx_ds_SUITE.erl
  66. 58 0
      apps/emqx_durable_storage/test/emqx_ds_test_helpers.erl
  67. 1 1
      apps/emqx_ldap/src/emqx_ldap.app.src
  68. 1 1
      apps/emqx_ldap/src/emqx_ldap.erl
  69. 3 0
      apps/emqx_management/include/emqx_mgmt.hrl
  70. 58 31
      apps/emqx_management/src/emqx_mgmt.erl
  71. 62 5
      apps/emqx_management/src/emqx_mgmt_api.erl
  72. 2 1
      apps/emqx_management/src/emqx_mgmt_api_banned.erl
  73. 361 58
      apps/emqx_management/src/emqx_mgmt_api_clients.erl
  74. 1 1
      apps/emqx_management/src/emqx_mgmt_api_configs.erl
  75. 1 1
      apps/emqx_management/src/emqx_mgmt_api_listeners.erl
  76. 86 0
      apps/emqx_management/src/proto/emqx_management_proto_v5.erl
  77. 115 5
      apps/emqx_management/test/emqx_mgmt_SUITE.erl
  78. 522 3
      apps/emqx_management/test/emqx_mgmt_api_clients_SUITE.erl
  79. 2 2
      apps/emqx_management/test/emqx_mgmt_api_configs_SUITE.erl
  80. 1 1
      apps/emqx_opentelemetry/src/emqx_opentelemetry.app.src
  81. 13 1
      apps/emqx_opentelemetry/src/emqx_otel_metrics.erl
  82. 20 11
      apps/emqx_prometheus/src/emqx_prometheus.erl
  83. 0 5
      apps/emqx_prometheus/src/emqx_prometheus_cluster.erl
  84. 18 6
      apps/emqx_prometheus/test/emqx_prometheus_data_SUITE.erl
  85. 1 1
      apps/emqx_rule_engine/src/emqx_rule_engine_api.erl
  86. 1 1
      apps/emqx_rule_engine/src/emqx_rule_funcs.erl
  87. 105 13
      apps/emqx_rule_engine/test/emqx_rule_funcs_SUITE.erl
  88. 50 75
      apps/emqx_utils/src/emqx_utils_calendar.erl
  89. 19 8
      bin/emqx
  90. 2 70
      bin/nodetool
  91. 12 14
      build
  92. 1 1
      changes/ce/feat-12326.en.md
  93. 21 0
      changes/ce/feat-12561.en.md
  94. 1 0
      changes/ce/feat-12670.en.md
  95. 1 0
      changes/ce/feat-12679.en.md
  96. 9 0
      changes/ce/feat-12700.en.md
  97. 12 0
      changes/ce/feat-12719.en.md
  98. 1 0
      changes/ce/fix-12663.en.md
  99. 2 0
      changes/ce/fix-12668.en.md
  100. 0 0
      changes/ce/fix-12672.en.md

+ 20 - 7
.github/workflows/build_and_push_docker_images.yaml

@@ -69,7 +69,7 @@ permissions:
 jobs:
   build:
     runs-on: ${{ github.repository_owner == 'emqx' && fromJSON(format('["self-hosted","ephemeral","linux","{0}"]', matrix.arch)) || 'ubuntu-22.04' }}
-    container: "ghcr.io/emqx/emqx-builder/${{ inputs.builder_vsn }}:${{ inputs.elixir_vsn }}-${{ inputs.otp_vsn }}-debian11"
+    container: "ghcr.io/emqx/emqx-builder/${{ inputs.builder_vsn }}:${{ inputs.elixir_vsn }}-${{ inputs.otp_vsn }}-debian12"
     outputs:
       PKG_VSN: ${{ steps.build.outputs.PKG_VSN }}
 
@@ -166,7 +166,7 @@ jobs:
           DOCKER_BUILD_NOCACHE: true
           DOCKER_PLATFORMS: linux/amd64,linux/arm64
           DOCKER_LOAD: true
-          EMQX_RUNNER: 'public.ecr.aws/debian/debian:11-slim@sha256:22cfb3c06a7dd5e18d86123a73405664475b9d9fa209cbedcf4c50a25649cc74'
+          EMQX_RUNNER: 'public.ecr.aws/debian/debian:12-slim'
           EMQX_DOCKERFILE: 'deploy/docker/Dockerfile'
           PKG_VSN: ${{ needs.build.outputs.PKG_VSN }}
           EMQX_BUILDER_VERSION: ${{ inputs.builder_vsn }}
@@ -203,10 +203,23 @@ jobs:
           docker exec -t -u root -w /root $CID bash -c 'apt-get -y update && apt-get -y install net-tools'
           docker exec -t -u root $CID node_dump
           docker rm -f $CID
-      - name: push images
+      - name: Push docker image
         if: inputs.publish || github.repository_owner != 'emqx'
+        env:
+          PROFILE: ${{ matrix.profile[0] }}
+          DOCKER_REGISTRY: ${{ matrix.profile[1] }}
+          DOCKER_ORG: ${{ github.repository_owner }}
+          DOCKER_LATEST: ${{ inputs.latest }}
+          DOCKER_PUSH: true
+          DOCKER_BUILD_NOCACHE: false
+          DOCKER_PLATFORMS: linux/amd64,linux/arm64
+          DOCKER_LOAD: false
+          EMQX_RUNNER: 'public.ecr.aws/debian/debian:12-slim'
+          EMQX_DOCKERFILE: 'deploy/docker/Dockerfile'
+          PKG_VSN: ${{ needs.build.outputs.PKG_VSN }}
+          EMQX_BUILDER_VERSION: ${{ inputs.builder_vsn }}
+          EMQX_BUILDER_OTP: ${{ inputs.otp_vsn }}
+          EMQX_BUILDER_ELIXIR: ${{ inputs.elixir_vsn }}
+          EMQX_SOURCE_TYPE: tgz
         run: |
-          for tag in $(cat .emqx_docker_image_tags); do
-            echo "Pushing tag $tag"
-            docker push $tag
-          done
+          ./build ${PROFILE} docker

+ 2 - 0
.gitignore

@@ -72,5 +72,7 @@ ct_run*/
 apps/emqx_conf/etc/emqx.conf.all.rendered*
 rebar-git-cache.tar
 # build docker image locally
+.dockerignore
 .docker_image_tag
+.emqx_docker_image_tags
 .git/

+ 3 - 3
Makefile

@@ -7,8 +7,8 @@ REBAR = $(CURDIR)/rebar3
 BUILD = $(CURDIR)/build
 SCRIPTS = $(CURDIR)/scripts
 export EMQX_RELUP ?= true
-export EMQX_DEFAULT_BUILDER = ghcr.io/emqx/emqx-builder/5.3-2:1.15.7-26.2.1-2-debian11
-export EMQX_DEFAULT_RUNNER = public.ecr.aws/debian/debian:11-slim
+export EMQX_DEFAULT_BUILDER = ghcr.io/emqx/emqx-builder/5.3-2:1.15.7-26.2.1-2-debian12
+export EMQX_DEFAULT_RUNNER = public.ecr.aws/debian/debian:12-slim
 export EMQX_REL_FORM ?= tgz
 export QUICER_DOWNLOAD_FROM_RELEASE = 1
 ifeq ($(OS),Windows_NT)
@@ -21,7 +21,7 @@ endif
 # Dashboard version
 # from https://github.com/emqx/emqx-dashboard5
 export EMQX_DASHBOARD_VERSION ?= v1.7.0
-export EMQX_EE_DASHBOARD_VERSION ?= e1.6.0-beta.2
+export EMQX_EE_DASHBOARD_VERSION ?= e1.6.0-beta.5
 
 PROFILE ?= emqx
 REL_PROFILES := emqx emqx-enterprise

+ 2 - 2
apps/emqx/include/emqx_release.hrl

@@ -32,7 +32,7 @@
 %% `apps/emqx/src/bpapi/README.md'
 
 %% Opensource edition
--define(EMQX_RELEASE_CE, "5.6.0-alpha.2").
+-define(EMQX_RELEASE_CE, "5.6.0-rc.1").
 
 %% Enterprise edition
--define(EMQX_RELEASE_EE, "5.6.0-alpha.2").
+-define(EMQX_RELEASE_EE, "5.6.0-rc.1").

+ 71 - 1
apps/emqx/integration_test/emqx_persistent_session_ds_SUITE.erl

@@ -118,7 +118,6 @@ app_specs() ->
 app_specs(Opts) ->
     ExtraEMQXConf = maps:get(extra_emqx_conf, Opts, ""),
     [
-        emqx_durable_storage,
         {emqx, "session_persistence = {enable = true}" ++ ExtraEMQXConf}
     ].
 
@@ -154,6 +153,14 @@ start_client(Opts0 = #{}) ->
     on_exit(fun() -> catch emqtt:stop(Client) end),
     Client.
 
+start_connect_client(Opts = #{}) ->
+    Client = start_client(Opts),
+    ?assertMatch({ok, _}, emqtt:connect(Client)),
+    Client.
+
+mk_clientid(Prefix, ID) ->
+    iolist_to_binary(io_lib:format("~p/~p", [Prefix, ID])).
+
 restart_node(Node, NodeSpec) ->
     ?tp(will_restart_node, #{}),
     emqx_cth_cluster:restart(Node, NodeSpec),
@@ -601,3 +608,66 @@ t_session_gc(Config) ->
         []
     ),
     ok.
+
+t_session_replay_retry(_Config) ->
+    %% Verify that the session recovers smoothly from transient errors during
+    %% replay.
+
+    ok = emqx_ds_test_helpers:mock_rpc(),
+
+    NClients = 10,
+    ClientSubOpts = #{
+        clientid => mk_clientid(?FUNCTION_NAME, sub),
+        auto_ack => never
+    },
+    ClientSub = start_connect_client(ClientSubOpts),
+    ?assertMatch(
+        {ok, _, [?RC_GRANTED_QOS_1]},
+        emqtt:subscribe(ClientSub, <<"t/#">>, ?QOS_1)
+    ),
+
+    ClientsPub = [
+        start_connect_client(#{
+            clientid => mk_clientid(?FUNCTION_NAME, I),
+            properties => #{'Session-Expiry-Interval' => 0}
+        })
+     || I <- lists:seq(1, NClients)
+    ],
+    lists:foreach(
+        fun(Client) ->
+            Index = integer_to_binary(rand:uniform(NClients)),
+            Topic = <<"t/", Index/binary>>,
+            ?assertMatch({ok, #{}}, emqtt:publish(Client, Topic, Index, 1))
+        end,
+        ClientsPub
+    ),
+
+    Pubs0 = emqx_common_test_helpers:wait_publishes(NClients, 5_000),
+    NPubs = length(Pubs0),
+    ?assertEqual(NClients, NPubs, ?drainMailbox()),
+
+    ok = emqtt:stop(ClientSub),
+
+    %% Make `emqx_ds` believe that roughly half of the shards are unavailable.
+    ok = emqx_ds_test_helpers:mock_rpc_result(
+        fun(_Node, emqx_ds_replication_layer, _Function, [_DB, Shard | _]) ->
+            case erlang:phash2(Shard) rem 2 of
+                0 -> unavailable;
+                1 -> passthrough
+            end
+        end
+    ),
+
+    _ClientSub = start_connect_client(ClientSubOpts#{clean_start => false}),
+
+    Pubs1 = emqx_common_test_helpers:wait_publishes(NPubs, 5_000),
+    ?assert(length(Pubs1) < length(Pubs0), Pubs1),
+
+    %% "Recover" the shards.
+    emqx_ds_test_helpers:unmock_rpc(),
+
+    Pubs2 = emqx_common_test_helpers:wait_publishes(NPubs - length(Pubs1), 5_000),
+    ?assertEqual(
+        [maps:with([topic, payload, qos], P) || P <- Pubs0],
+        [maps:with([topic, payload, qos], P) || P <- Pubs1 ++ Pubs2]
+    ).

+ 1 - 0
apps/emqx/priv/bpapi.versions

@@ -39,6 +39,7 @@
 {emqx_management,2}.
 {emqx_management,3}.
 {emqx_management,4}.
+{emqx_management,5}.
 {emqx_metrics,1}.
 {emqx_mgmt_api_plugins,1}.
 {emqx_mgmt_api_plugins,2}.

+ 1 - 1
apps/emqx/rebar.config

@@ -30,7 +30,7 @@
     {esockd, {git, "https://github.com/emqx/esockd", {tag, "5.11.1"}}},
     {ekka, {git, "https://github.com/emqx/ekka", {tag, "0.19.0"}}},
     {gen_rpc, {git, "https://github.com/emqx/gen_rpc", {tag, "3.3.1"}}},
-    {hocon, {git, "https://github.com/emqx/hocon.git", {tag, "0.42.0"}}},
+    {hocon, {git, "https://github.com/emqx/hocon.git", {tag, "0.42.1"}}},
     {emqx_http_lib, {git, "https://github.com/emqx/emqx_http_lib.git", {tag, "0.5.3"}}},
     {pbkdf2, {git, "https://github.com/emqx/erlang-pbkdf2.git", {tag, "2.0.4"}}},
     {recon, {git, "https://github.com/ferd/recon", {tag, "2.5.1"}}},

+ 4 - 0
apps/emqx/src/emqx_channel.erl

@@ -1210,6 +1210,10 @@ handle_call(
     ChanInfo1 = info(NChannel),
     emqx_cm:set_chan_info(ClientId, ChanInfo1#{sockinfo => SockInfo}),
     reply(ok, reset_timer(keepalive, NChannel));
+handle_call({Type, _Meta} = MsgsReq, Channel = #channel{session = Session}) when
+    Type =:= mqueue_msgs; Type =:= inflight_msgs
+->
+    {reply, emqx_session:info(MsgsReq, Session), Channel};
 handle_call(Req, Channel) ->
     ?SLOG(error, #{msg => "unexpected_call", call => Req}),
     reply(ignored, Channel).

+ 23 - 3
apps/emqx/src/emqx_cm_registry_keeper.erl

@@ -20,7 +20,8 @@
 
 -export([
     start_link/0,
-    count/1
+    count/1,
+    purge/0
 ]).
 
 %% gen_server callbacks
@@ -48,7 +49,10 @@ start_link() ->
 init(_) ->
     case mria_config:whoami() =:= replicant of
         true ->
-            ignore;
+            %% Do not run delete loops on replicant nodes
+            %% because the core nodes will do it anyway
+            %% The process is started to serve the 'count' calls
+            {ok, #{no_deletes => true}};
         false ->
             ok = send_delay_start(),
             {ok, #{next_clientid => undefined}}
@@ -71,6 +75,19 @@ count(Since) ->
             gen_server:call(?MODULE, {count, Since}, infinity)
     end.
 
+%% @doc Delete all retained history. Only for tests.
+-spec purge() -> ok.
+purge() ->
+    purge_loop(undefined).
+
+purge_loop(StartId) ->
+    case cleanup_one_chunk(StartId, _IsPurge = true) of
+        '$end_of_table' ->
+            ok;
+        NextId ->
+            purge_loop(NextId)
+    end.
+
 handle_call({count, Since}, _From, State) ->
     {LastCountTime, LastCount} =
         case State of
@@ -128,10 +145,13 @@ code_change(_OldVsn, State, _Extra) ->
     {ok, State}.
 
 cleanup_one_chunk(NextClientId) ->
+    cleanup_one_chunk(NextClientId, false).
+
+cleanup_one_chunk(NextClientId, IsPurge) ->
     Retain = retain_duration(),
     Now = now_ts(),
     IsExpired = fun(#channel{pid = Ts}) ->
-        is_integer(Ts) andalso (Ts < Now - Retain)
+        IsPurge orelse (is_integer(Ts) andalso (Ts < Now - Retain))
     end,
     cleanup_loop(NextClientId, ?CLEANUP_CHUNK_SIZE, IsExpired).
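`purge/0` reuses the chunked cleanup loop and passes `IsPurge = true`, so every record counts as expired, and it keeps looping until `'$end_of_table'`. A minimal sketch of the same chunk-and-continue pattern, assuming a plain ETS `set` of `{Key, Value}` tuples rather than the real channel registry table; `chunked_cleanup_sketch`, `cleanup/3`, and `purge/2` are hypothetical names:

```erlang
-module(chunked_cleanup_sketch).
-export([cleanup/3, purge/2]).

%% Hypothetical chunked cleanup over an ETS table of {Key, Value} objects.
%% IsExpired decides per value whether to delete; purge/2 forces it to true,
%% similar to the IsPurge flag in cleanup_one_chunk/2.
%% Returns the number of deleted objects.
cleanup(Tab, ChunkSize, IsExpired) when ChunkSize > 0 ->
    MatchSpec = [{{'_', '_'}, [], ['$_']}],
    %% Fixing the table keeps the select/1 traversal safe while we delete.
    true = ets:safe_fixtable(Tab, true),
    try
        loop(ets:select(Tab, MatchSpec, ChunkSize), Tab, IsExpired, 0)
    after
        ets:safe_fixtable(Tab, false)
    end.

purge(Tab, ChunkSize) ->
    cleanup(Tab, ChunkSize, fun(_) -> true end).

loop('$end_of_table', _Tab, _IsExpired, Deleted) ->
    Deleted;
loop({Objects, Cont}, Tab, IsExpired, Deleted0) ->
    Deleted = lists:foldl(
        fun({Key, Value}, Acc) ->
            case IsExpired(Value) of
                true ->
                    true = ets:delete(Tab, Key),
                    Acc + 1;
                false ->
                    Acc
            end
        end,
        Deleted0,
        Objects
    ),
    %% Fetch the next chunk from where the previous select stopped.
    loop(ets:select(Cont), Tab, IsExpired, Deleted).
```

Processing a bounded chunk per step keeps each pass short, which is why the keeper can interleave cleanup with other work instead of scanning the whole table at once.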
 

+ 1 - 0
apps/emqx/src/emqx_config.erl

@@ -715,6 +715,7 @@ add_handlers() ->
     ok = emqx_config_logger:add_handler(),
     ok = emqx_config_zones:add_handler(),
     emqx_sys_mon:add_handler(),
+    emqx_persistent_message:add_handler(),
     ok.
 
 remove_handlers() ->

+ 92 - 0
apps/emqx/src/emqx_cpu_sup_worker.erl

@@ -0,0 +1,92 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2024 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+
+-module(emqx_cpu_sup_worker).
+
+-behaviour(gen_server).
+
+-include("logger.hrl").
+
+%% gen_server APIs
+-export([start_link/0]).
+
+-export([
+    cpu_util/0,
+    cpu_util/1
+]).
+
+%% gen_server callbacks
+-export([
+    init/1,
+    handle_continue/2,
+    handle_call/3,
+    handle_cast/2,
+    terminate/2,
+    code_change/3
+]).
+
+-define(CPU_USAGE_WORKER, ?MODULE).
+
+%%--------------------------------------------------------------------
+%% API
+%%--------------------------------------------------------------------
+
+cpu_util() ->
+    gen_server:call(?CPU_USAGE_WORKER, ?FUNCTION_NAME, infinity).
+
+cpu_util(Args) ->
+    gen_server:call(?CPU_USAGE_WORKER, {?FUNCTION_NAME, Args}, infinity).
+
+%%--------------------------------------------------------------------
+%% gen_server callbacks
+%% simply handle cpu_sup:util/0,1 called in one process
+%%--------------------------------------------------------------------
+
+start_link() ->
+    gen_server:start_link({local, ?CPU_USAGE_WORKER}, ?MODULE, [], []).
+
+init([]) ->
+    {ok, undefined, {continue, setup}}.
+
+handle_continue(setup, undefined) ->
+    %% start os_mon temporarily
+    {ok, _} = application:ensure_all_started(os_mon),
+    %% The returned value of the first call to cpu_sup:util/0 or cpu_sup:util/1 by a
+    %% process will on most systems be the CPU utilization since system boot,
+    %% but this is not guaranteed and the value should therefore be regarded as garbage.
+    %% This also applies to the first call after a restart of cpu_sup.
+    _Val = cpu_sup:util(),
+    {noreply, #{}}.
+
+handle_call(cpu_util, _From, State) ->
+    Val = cpu_sup:util(),
+    {reply, Val, State};
+handle_call({cpu_util, Args}, _From, State) ->
+    Val = erlang:apply(cpu_sup, util, Args),
+    {reply, Val, State};
+handle_call(Req, _From, State) ->
+    ?SLOG(error, #{msg => "unexpected_call", call => Req}),
+    {reply, ignored, State}.
+
+handle_cast(Msg, State) ->
+    ?SLOG(error, #{msg => "unexpected_cast", cast => Msg}),
+    {noreply, State}.
+
+terminate(_Reason, _State) ->
+    ok.
+
+code_change(_OldVsn, State, _Extra) ->
+    {ok, State}.

+ 46 - 1
apps/emqx/src/emqx_inflight.erl

@@ -36,7 +36,8 @@
     max_size/1,
     is_full/1,
     is_empty/1,
-    window/1
+    window/1,
+    query/2
 ]).
 
 -export_type([inflight/0]).
@@ -138,3 +139,47 @@ size(?INFLIGHT(Tree)) ->
 -spec max_size(inflight()) -> non_neg_integer().
 max_size(?INFLIGHT(MaxSize, _Tree)) ->
     MaxSize.
+
+-spec query(inflight(), #{continuation => Cont, limit := L}) ->
+    {[{key(), term()}], #{continuation := Cont, count := C}}
+when
+    Cont :: none | end_of_data | key(),
+    L :: non_neg_integer(),
+    C :: non_neg_integer().
+query(?INFLIGHT(Tree), #{limit := Limit} = Pager) ->
+    Count = gb_trees:size(Tree),
+    ContKey = maps:get(continuation, Pager, none),
+    {List, NextCont} = sublist(iterator_from(ContKey, Tree), Limit),
+    {List, #{continuation => NextCont, count => Count}}.
+
+iterator_from(none, Tree) ->
+    gb_trees:iterator(Tree);
+iterator_from(ContKey, Tree) ->
+    It = gb_trees:iterator_from(ContKey, Tree),
+    case gb_trees:next(It) of
+        {ContKey, _Val, ItNext} -> ItNext;
+        _ -> It
+    end.
+
+sublist(_It, 0) ->
+    {[], none};
+sublist(It, Len) ->
+    {ListAcc, HasNext} = sublist(It, Len, []),
+    {lists:reverse(ListAcc), next_cont(ListAcc, HasNext)}.
+
+sublist(It, 0, Acc) ->
+    {Acc, gb_trees:next(It) =/= none};
+sublist(It, Len, Acc) ->
+    case gb_trees:next(It) of
+        none ->
+            {Acc, false};
+        {Key, Val, ItNext} ->
+            sublist(ItNext, Len - 1, [{Key, Val} | Acc])
+    end.
+
+next_cont(_Acc, false) ->
+    end_of_data;
+next_cont([{LastKey, _LastVal} | _Acc], _HasNext) ->
+    LastKey;
+next_cont([], _HasNext) ->
+    end_of_data.
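The new `emqx_inflight:query/2` pages through the inflight tree with a key-based cursor: `none` starts at the first key, a key resumes strictly after that key, and `end_of_data` marks the last page. A minimal sketch of how a caller might drive that cursor to exhaustion, assuming a bare `gb_trees` tree in place of the `inflight()` record; the module `inflight_pager_sketch` and its functions are hypothetical, and unlike the diff it requires `limit > 0`:

```erlang
-module(inflight_pager_sketch).
-export([page/2, all_pages/2]).

%% Hypothetical stand-in for emqx_inflight:query/2 on a bare gb_trees:tree().
%% Continuation: none = from the first key, Key = resume after Key,
%% end_of_data = no more pages.
page(Tree, #{limit := Limit} = Pager) when Limit > 0 ->
    It = iterator_from(maps:get(continuation, Pager, none), Tree),
    {Items, HasNext} = take(It, Limit, []),
    Cont =
        case HasNext of
            %% HasNext is only true after Limit items were taken,
            %% so Items is non-empty here.
            true -> element(1, lists:last(Items));
            false -> end_of_data
        end,
    {Items, #{continuation => Cont, count => gb_trees:size(Tree)}}.

%% Feed each returned continuation back in until end_of_data.
all_pages(Tree, Limit) ->
    all_pages(Tree, Limit, none, []).

all_pages(Tree, Limit, Cont, Acc) ->
    {Items, #{continuation := Next}} =
        page(Tree, #{limit => Limit, continuation => Cont}),
    case Next of
        end_of_data -> lists:reverse([Items | Acc]);
        _ -> all_pages(Tree, Limit, Next, [Items | Acc])
    end.

iterator_from(none, Tree) ->
    gb_trees:iterator(Tree);
iterator_from(Key, Tree) ->
    %% iterator_from/2 starts at the first key >= Key; skip Key itself
    %% because it was already returned on the previous page.
    It = gb_trees:iterator_from(Key, Tree),
    case gb_trees:next(It) of
        {Key, _Val, ItNext} -> ItNext;
        _ -> It
    end.

take(It, 0, Acc) ->
    {lists:reverse(Acc), gb_trees:next(It) =/= none};
take(It, N, Acc) ->
    case gb_trees:next(It) of
        none -> {lists:reverse(Acc), false};
        {Key, Val, ItNext} -> take(ItNext, N - 1, [{Key, Val} | Acc])
    end.
```

Because the cursor is the last key returned rather than a position, removing entries between calls does not shift later pages: if the cursor key itself is gone, `gb_trees:iterator_from/2` simply starts at the next larger key.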

+ 51 - 1
apps/emqx/src/emqx_mqueue.erl

@@ -68,7 +68,8 @@
     stats/1,
     dropped/1,
     to_list/1,
-    filter/2
+    filter/2,
+    query/2
 ]).
 
 -define(NO_PRIORITY_TABLE, disabled).
@@ -171,6 +172,55 @@ filter(Pred, #mqueue{q = Q, len = Len, dropped = Droppend} = MQ) ->
             MQ#mqueue{q = Q2, len = Len2, dropped = Droppend + Diff}
     end.
 
+-spec query(mqueue(), #{continuation => ContMsgId, limit := L}) ->
+    {[message()], #{continuation := ContMsgId, count := C}}
+when
+    ContMsgId :: none | end_of_data | binary(),
+    C :: non_neg_integer(),
+    L :: non_neg_integer().
+query(MQ, #{limit := Limit} = Pager) ->
+    ContMsgId = maps:get(continuation, Pager, none),
+    {List, NextCont} = sublist(skip_until(MQ, ContMsgId), Limit),
+    {List, #{continuation => NextCont, count => len(MQ)}}.
+
+skip_until(MQ, none = _MsgId) ->
+    MQ;
+skip_until(MQ, MsgId) ->
+    do_skip_until(MQ, MsgId).
+
+do_skip_until(MQ, MsgId) ->
+    case out(MQ) of
+        {empty, MQ} ->
+            MQ;
+        {{value, #message{id = MsgId}}, Q1} ->
+            Q1;
+        {{value, _Msg}, Q1} ->
+            do_skip_until(Q1, MsgId)
+    end.
+
+sublist(_MQ, 0) ->
+    {[], none};
+sublist(MQ, Len) ->
+    {ListAcc, HasNext} = sublist(MQ, Len, []),
+    {lists:reverse(ListAcc), next_cont(ListAcc, HasNext)}.
+
+sublist(MQ, 0, Acc) ->
+    {Acc, element(1, out(MQ)) =/= empty};
+sublist(MQ, Len, Acc) ->
+    case out(MQ) of
+        {empty, _MQ} ->
+            {Acc, false};
+        {{value, Msg}, Q1} ->
+            sublist(Q1, Len - 1, [Msg | Acc])
+    end.
+
+next_cont(_Acc, false) ->
+    end_of_data;
+next_cont([#message{id = Id} | _Acc], _HasNext) ->
+    Id;
+next_cont([], _HasNext) ->
+    end_of_data.
+
 to_list(MQ, Acc) ->
     case out(MQ) of
         {empty, _MQ} ->

+ 4 - 3
apps/emqx/src/emqx_os_mon.erl

@@ -18,6 +18,7 @@
 
 -behaviour(gen_server).
 
+-include("emqx.hrl").
 -include("logger.hrl").
 
 -export([start_link/0]).
@@ -47,8 +48,6 @@
 ]).
 -export([is_os_check_supported/0]).
 
--include("emqx.hrl").
-
 -define(OS_MON, ?MODULE).
 
 start_link() ->
@@ -92,6 +91,8 @@ handle_continue(setup, undefined) ->
     SysHW = init_os_monitor(),
     MemRef = start_mem_check_timer(),
     CpuRef = start_cpu_check_timer(),
+    %% the value of the first call should be regarded as garbage.
+    _Val = cpu_sup:util(),
     {noreply, #{sysmem_high_watermark => SysHW, mem_time_ref => MemRef, cpu_time_ref => CpuRef}}.
 
 init_os_monitor() ->
@@ -131,7 +132,7 @@ handle_info({timeout, _Timer, mem_check}, #{sysmem_high_watermark := HWM} = Stat
 handle_info({timeout, _Timer, cpu_check}, State) ->
     CPUHighWatermark = emqx:get_config([sysmon, os, cpu_high_watermark]) * 100,
     CPULowWatermark = emqx:get_config([sysmon, os, cpu_low_watermark]) * 100,
-    CPUVal = emqx_vm:cpu_util(),
+    CPUVal = cpu_sup:util(),
     case CPUVal of
         %% 0 or 0.0
         Busy when Busy == 0 ->

+ 18 - 0
apps/emqx/src/emqx_persistent_message.erl

@@ -16,11 +16,16 @@
 
 -module(emqx_persistent_message).
 
+-behaviour(emqx_config_handler).
+
 -include("emqx.hrl").
 
 -export([init/0]).
 -export([is_persistence_enabled/0, force_ds/0]).
 
+%% Config handler
+-export([add_handler/0, pre_config_update/3]).
+
 %% Message persistence
 -export([
     persist/1
@@ -66,6 +71,19 @@ storage_backend(Path) ->
 
 %%--------------------------------------------------------------------
 
+-spec add_handler() -> ok.
+add_handler() ->
+    emqx_config_handler:add_handler([session_persistence], ?MODULE).
+
+pre_config_update([session_persistence], #{<<"enable">> := New}, #{<<"enable">> := Old}) when
+    New =/= Old
+->
+    {error, "Hot update of session_persistence.enable parameter is currently not supported"};
+pre_config_update(_Root, _NewConf, _OldConf) ->
+    ok.
+
+%%--------------------------------------------------------------------
+
 -spec persist(emqx_types:message()) ->
     ok | {skipped, _Reason} | {error, _TODO}.
 persist(Msg) ->

+ 88 - 37
apps/emqx/src/emqx_persistent_session_ds.erl

@@ -123,7 +123,12 @@
 -define(TIMER_PULL, timer_pull).
 -define(TIMER_GET_STREAMS, timer_get_streams).
 -define(TIMER_BUMP_LAST_ALIVE_AT, timer_bump_last_alive_at).
--type timer() :: ?TIMER_PULL | ?TIMER_GET_STREAMS | ?TIMER_BUMP_LAST_ALIVE_AT.
+-define(TIMER_RETRY_REPLAY, timer_retry_replay).
+
+-type timer() :: ?TIMER_PULL | ?TIMER_GET_STREAMS | ?TIMER_BUMP_LAST_ALIVE_AT | ?TIMER_RETRY_REPLAY.
+
+%% TODO: Needs configuration?
+-define(TIMEOUT_RETRY_REPLAY, 1000).
 
 -type session() :: #{
     %% Client ID
@@ -134,10 +139,15 @@
     s := emqx_persistent_session_ds_state:t(),
     %% Buffer:
     inflight := emqx_persistent_session_ds_inflight:t(),
+    %% In-progress replay:
+    %% List of stream replay states to be added to the inflight buffer.
+    replay => [{_StreamKey, stream_state()}, ...],
     %% Timers:
     timer() => reference()
 }.
 
+-define(IS_REPLAY_ONGOING(SESS), is_map_key(replay, SESS)).
+
 -record(req_sync, {
     from :: pid(),
     ref :: reference()
@@ -450,12 +460,14 @@ deliver(ClientInfo, Delivers, Session0) ->
 
 -spec handle_timeout(clientinfo(), _Timeout, session()) ->
     {ok, replies(), session()} | {ok, replies(), timeout(), session()}.
-handle_timeout(
-    ClientInfo,
-    ?TIMER_PULL,
-    Session0
-) ->
-    {Publishes, Session1} = drain_buffer(fetch_new_messages(Session0, ClientInfo)),
+handle_timeout(ClientInfo, ?TIMER_PULL, Session0) ->
+    {Publishes, Session1} =
+        case ?IS_REPLAY_ONGOING(Session0) of
+            false ->
+                drain_buffer(fetch_new_messages(Session0, ClientInfo));
+            true ->
+                {[], Session0}
+        end,
     Timeout =
         case Publishes of
             [] ->
@@ -465,6 +477,9 @@ handle_timeout(
         end,
     Session = emqx_session:ensure_timer(?TIMER_PULL, Timeout, Session1),
     {ok, Publishes, Session};
+handle_timeout(ClientInfo, ?TIMER_RETRY_REPLAY, Session0) ->
+    Session = replay_streams(Session0, ClientInfo),
+    {ok, [], Session};
 handle_timeout(_ClientInfo, ?TIMER_GET_STREAMS, Session0 = #{s := S0}) ->
     S1 = emqx_persistent_session_ds_subs:gc(S0),
     S = emqx_persistent_session_ds_stream_scheduler:renew_streams(S1),
@@ -503,30 +518,47 @@ bump_last_alive(S0) ->
     {ok, replies(), session()}.
 replay(ClientInfo, [], Session0 = #{s := S0}) ->
     Streams = emqx_persistent_session_ds_stream_scheduler:find_replay_streams(S0),
-    Session = lists:foldl(
-        fun({_StreamKey, Stream}, SessionAcc) ->
-            replay_batch(Stream, SessionAcc, ClientInfo)
-        end,
-        Session0,
-        Streams
-    ),
+    Session = replay_streams(Session0#{replay => Streams}, ClientInfo),
+    {ok, [], Session}.
+
+replay_streams(Session0 = #{replay := [{_StreamKey, Srs0} | Rest]}, ClientInfo) ->
+    case replay_batch(Srs0, Session0, ClientInfo) of
+        Session = #{} ->
+            replay_streams(Session#{replay := Rest}, ClientInfo);
+        {error, recoverable, Reason} ->
+            RetryTimeout = ?TIMEOUT_RETRY_REPLAY,
+            ?SLOG(warning, #{
+                msg => "failed_to_fetch_replay_batch",
+                stream => Srs0,
+                reason => Reason,
+                class => recoverable,
+                retry_in_ms => RetryTimeout
+            }),
+            emqx_session:ensure_timer(?TIMER_RETRY_REPLAY, RetryTimeout, Session0)
+        %% TODO: Handle unrecoverable errors.
+    end;
+replay_streams(Session0 = #{replay := []}, _ClientInfo) ->
+    Session = maps:remove(replay, Session0),
     %% Note: we filled the buffer with the historical messages, and
     %% from now on we'll rely on the normal inflight/flow control
     %% mechanisms to replay them:
-    {ok, [], pull_now(Session)}.
+    pull_now(Session).
 
--spec replay_batch(stream_state(), session(), clientinfo()) -> session().
-replay_batch(Srs0, Session, ClientInfo) ->
+-spec replay_batch(stream_state(), session(), clientinfo()) -> session() | emqx_ds:error(_).
+replay_batch(Srs0, Session0, ClientInfo) ->
     #srs{batch_size = BatchSize} = Srs0,
-    %% TODO: retry on errors:
-    {Srs, Inflight} = enqueue_batch(true, BatchSize, Srs0, Session, ClientInfo),
-    %% Assert:
-    Srs =:= Srs0 orelse
-        ?tp(warning, emqx_persistent_session_ds_replay_inconsistency, #{
-            expected => Srs0,
-            got => Srs
-        }),
-    Session#{inflight => Inflight}.
+    case enqueue_batch(true, BatchSize, Srs0, Session0, ClientInfo) of
+        {ok, Srs, Session} ->
+            %% Assert:
+            Srs =:= Srs0 orelse
+                ?tp(warning, emqx_persistent_session_ds_replay_inconsistency, #{
+                    expected => Srs0,
+                    got => Srs
+                }),
+            Session;
+        {error, _, _} = Error ->
+            Error
+    end.
 
 %%--------------------------------------------------------------------
 
@@ -746,7 +778,7 @@ fetch_new_messages([I | Streams], Session0 = #{inflight := Inflight}, ClientInfo
             fetch_new_messages(Streams, Session, ClientInfo)
     end.
 
-new_batch({StreamKey, Srs0}, BatchSize, Session = #{s := S0}, ClientInfo) ->
+new_batch({StreamKey, Srs0}, BatchSize, Session0 = #{s := S0}, ClientInfo) ->
     SN1 = emqx_persistent_session_ds_state:get_seqno(?next(?QOS_1), S0),
     SN2 = emqx_persistent_session_ds_state:get_seqno(?next(?QOS_2), S0),
     Srs1 = Srs0#srs{
@@ -756,11 +788,30 @@ new_batch({StreamKey, Srs0}, BatchSize, Session = #{s := S0}, ClientInfo) ->
         last_seqno_qos1 = SN1,
         last_seqno_qos2 = SN2
     },
-    {Srs, Inflight} = enqueue_batch(false, BatchSize, Srs1, Session, ClientInfo),
-    S1 = emqx_persistent_session_ds_state:put_seqno(?next(?QOS_1), Srs#srs.last_seqno_qos1, S0),
-    S2 = emqx_persistent_session_ds_state:put_seqno(?next(?QOS_2), Srs#srs.last_seqno_qos2, S1),
-    S = emqx_persistent_session_ds_state:put_stream(StreamKey, Srs, S2),
-    Session#{s => S, inflight => Inflight}.
+    case enqueue_batch(false, BatchSize, Srs1, Session0, ClientInfo) of
+        {ok, Srs, Session} ->
+            S1 = emqx_persistent_session_ds_state:put_seqno(
+                ?next(?QOS_1),
+                Srs#srs.last_seqno_qos1,
+                S0
+            ),
+            S2 = emqx_persistent_session_ds_state:put_seqno(
+                ?next(?QOS_2),
+                Srs#srs.last_seqno_qos2,
+                S1
+            ),
+            S = emqx_persistent_session_ds_state:put_stream(StreamKey, Srs, S2),
+            Session#{s => S};
+        {error, Class, Reason} ->
+            %% TODO: Handle unrecoverable error.
+            ?SLOG(info, #{
+                msg => "failed_to_fetch_batch",
+                stream => Srs1,
+                reason => Reason,
+                class => Class
+            }),
+            Session0
+    end.
 
 enqueue_batch(IsReplay, BatchSize, Srs0, Session = #{inflight := Inflight0}, ClientInfo) ->
     #srs{
@@ -789,13 +840,13 @@ enqueue_batch(IsReplay, BatchSize, Srs0, Session = #{inflight := Inflight0}, Cli
                 last_seqno_qos1 = LastSeqnoQos1,
                 last_seqno_qos2 = LastSeqnoQos2
             },
-            {Srs, Inflight};
+            {ok, Srs, Session#{inflight := Inflight}};
         {ok, end_of_stream} ->
             %% No new messages; just update the end iterator:
-            {Srs0#srs{it_begin = ItBegin, it_end = end_of_stream, batch_size = 0}, Inflight0};
-        {error, _} when not IsReplay ->
-            ?SLOG(info, #{msg => "failed_to_fetch_batch", iterator => ItBegin}),
-            {Srs0, Inflight0}
+            Srs = Srs0#srs{it_begin = ItBegin, it_end = end_of_stream, batch_size = 0},
+            {ok, Srs, Session#{inflight := Inflight0}};
+        {error, _, _} = Error ->
+            Error
     end.
 
 %% key_of_iter(#{3 := #{3 := #{5 := K}}}) ->

+ 1 - 0
apps/emqx/src/emqx_rpc.erl

@@ -35,6 +35,7 @@
 
 -export_type([
     badrpc/0,
+    call_result/1,
     call_result/0,
     cast_result/0,
     multicall_result/1,

+ 1 - 1
apps/emqx/src/emqx_session.erl

@@ -527,7 +527,7 @@ info(Session) ->
 
 -spec info
     ([atom()], t()) -> [{atom(), _Value}];
-    (atom(), t()) -> _Value.
+    (atom() | {atom(), _Meta}, t()) -> _Value.
 info(Keys, Session) when is_list(Keys) ->
     [{Key, info(Key, Session)} || Key <- Keys];
 info(impl, Session) ->

+ 5 - 0
apps/emqx/src/emqx_session_mem.erl

@@ -268,6 +268,9 @@ info(inflight_cnt, #session{inflight = Inflight}) ->
     emqx_inflight:size(Inflight);
 info(inflight_max, #session{inflight = Inflight}) ->
     emqx_inflight:max_size(Inflight);
+info({inflight_msgs, PagerParams}, #session{inflight = Inflight}) ->
+    {InflightList, Meta} = emqx_inflight:query(Inflight, PagerParams),
+    {[I#inflight_data.message || {_, I} <- InflightList], Meta};
 info(retry_interval, #session{retry_interval = Interval}) ->
     Interval;
 info(mqueue, #session{mqueue = MQueue}) ->
@@ -278,6 +281,8 @@ info(mqueue_max, #session{mqueue = MQueue}) ->
     emqx_mqueue:max_len(MQueue);
 info(mqueue_dropped, #session{mqueue = MQueue}) ->
     emqx_mqueue:dropped(MQueue);
+info({mqueue_msgs, PagerParams}, #session{mqueue = MQueue}) ->
+    emqx_mqueue:query(MQueue, PagerParams);
 info(next_pkt_id, #session{next_pkt_id = PacketId}) ->
     PacketId;
 info(awaiting_rel, #session{awaiting_rel = AwaitingRel}) ->

+ 1 - 1
apps/emqx/src/emqx_shared_sub.erl

@@ -435,7 +435,7 @@ handle_call({unsubscribe, Group, Topic, SubPid}, _From, State) ->
     true = ets:delete_object(?SHARED_SUBS, {{Group, Topic}, SubPid}),
     delete_route_if_needed({Group, Topic}),
     maybe_delete_round_robin_count({Group, Topic}),
-    {reply, ok, State};
+    {reply, ok, update_stats(State)};
 handle_call(Req, _From, State) ->
     ?SLOG(error, #{msg => "unexpected_call", req => Req}),
     {reply, ignored, State}.

+ 2 - 2
apps/emqx/src/emqx_sys_mon.erl

@@ -58,8 +58,8 @@ remove_handler() ->
 post_config_update(_, _Req, NewConf, OldConf, _AppEnvs) ->
     #{os := OS1, vm := VM1} = OldConf,
     #{os := OS2, vm := VM2} = NewConf,
-    VM1 =/= VM2 andalso ?MODULE:update(VM2),
-    OS1 =/= OS2 andalso emqx_os_mon:update(OS2),
+    (VM1 =/= VM2) andalso ?MODULE:update(VM2),
+    (OS1 =/= OS2) andalso emqx_os_mon:update(OS2),
     ok.
 
 update(VM) ->

+ 1 - 1
apps/emqx/src/emqx_sys_sup.erl

@@ -28,7 +28,7 @@ start_link() ->
 init([]) ->
     OsMon =
         case emqx_os_mon:is_os_check_supported() of
-            true -> [child_spec(emqx_os_mon)];
+            true -> [child_spec(emqx_os_mon), child_spec(emqx_cpu_sup_worker)];
             false -> []
         end,
     Children =

+ 22 - 19
apps/emqx/src/emqx_vm.erl

@@ -16,6 +16,8 @@
 
 -module(emqx_vm).
 
+-include("logger.hrl").
+
 -export([
     schedulers/0,
     scheduler_usage/1,
@@ -376,28 +378,29 @@ avg15() ->
     compat_windows(fun cpu_sup:avg15/0).
 
 cpu_util() ->
-    compat_windows(fun cpu_sup:util/0).
+    compat_windows(fun() -> emqx_cpu_sup_worker:cpu_util() end).
 
 cpu_util(Args) ->
-    compat_windows(fun cpu_sup:util/1, Args).
-
+    compat_windows(fun() -> emqx_cpu_sup_worker:cpu_util(Args) end).
+
+-spec compat_windows(function()) -> any().
+compat_windows(Fun) when is_function(Fun, 0) ->
+    case emqx_os_mon:is_os_check_supported() of
+        true ->
+            try Fun() of
+                Val when is_float(Val) -> floor(Val * 100) / 100;
+                Val when is_number(Val) -> Val;
+                Val when is_tuple(Val) -> Val;
+                _ -> 0.0
+            catch
+                _:_ -> 0.0
+            end;
+        false ->
+            0.0
+    end;
 compat_windows(Fun) ->
-    case compat_windows(Fun, []) of
-        Val when is_float(Val) -> floor(Val * 100) / 100;
-        Val when is_number(Val) -> Val;
-        _ -> 0.0
-    end.
-
-compat_windows(Fun, Args) ->
-    try
-        case emqx_os_mon:is_os_check_supported() of
-            false -> 0.0;
-            true when Args =:= [] -> Fun();
-            true -> Fun(Args)
-        end
-    catch
-        _:_ -> 0.0
-    end.
+    ?SLOG(warning, "Invalid function: ~p", [Fun]),
+    error({badarg, Fun}).
 
 load(Avg) ->
     floor((Avg / 256) * 100) / 100.

+ 11 - 0
apps/emqx/test/emqx_common_test_helpers.erl

@@ -61,6 +61,7 @@
     read_schema_configs/2,
     render_config_file/2,
     wait_for/4,
+    wait_publishes/2,
     wait_mqtt_payload/1,
     select_free_port/1
 ]).
@@ -426,6 +427,16 @@ wait_for(Fn, Ln, F, Timeout) ->
     {Pid, Mref} = erlang:spawn_monitor(fun() -> wait_loop(F, catch_call(F)) end),
     wait_for_down(Fn, Ln, Timeout, Pid, Mref, false).
 
+wait_publishes(0, _Timeout) ->
+    [];
+wait_publishes(Count, Timeout) ->
+    receive
+        {publish, Msg} ->
+            [Msg | wait_publishes(Count - 1, Timeout)]
+    after Timeout ->
+        []
+    end.
+
 flush() ->
     flush([]).
 

+ 80 - 2
apps/emqx/test/emqx_inflight_SUITE.erl

@@ -116,5 +116,83 @@ t_window(_) ->
     ),
     ?assertEqual([a, b], emqx_inflight:window(Inflight)).
 
-% t_to_list(_) ->
-%     error('TODO').
+t_to_list(_) ->
+    Inflight = lists:foldl(
+        fun(Seq, InflightAcc) ->
+            emqx_inflight:insert(Seq, integer_to_binary(Seq), InflightAcc)
+        end,
+        emqx_inflight:new(100),
+        [1, 6, 2, 3, 10, 7, 9, 8, 4, 5]
+    ),
+    ExpList = [{Seq, integer_to_binary(Seq)} || Seq <- lists:seq(1, 10)],
+    ?assertEqual(ExpList, emqx_inflight:to_list(Inflight)).
+
+t_query(_) ->
+    EmptyInflight = emqx_inflight:new(500),
+    ?assertMatch(
+        {[], #{continuation := end_of_data}}, emqx_inflight:query(EmptyInflight, #{limit => 50})
+    ),
+    ?assertMatch(
+        {[], #{continuation := end_of_data}},
+        emqx_inflight:query(EmptyInflight, #{continuation => <<"empty">>, limit => 50})
+    ),
+    ?assertMatch(
+        {[], #{continuation := end_of_data}},
+        emqx_inflight:query(EmptyInflight, #{continuation => none, limit => 50})
+    ),
+
+    Inflight = lists:foldl(
+        fun(Seq, QAcc) ->
+            emqx_inflight:insert(Seq, integer_to_binary(Seq), QAcc)
+        end,
+        EmptyInflight,
+        lists:reverse(lists:seq(1, 114))
+    ),
+
+    LastCont = lists:foldl(
+        fun(PageSeq, Cont) ->
+            Limit = 10,
+            PagerParams = #{continuation => Cont, limit => Limit},
+            {Page, #{continuation := NextCont} = Meta} = emqx_inflight:query(Inflight, PagerParams),
+            ?assertEqual(10, length(Page)),
+            ExpFirst = PageSeq * Limit - Limit + 1,
+            ExpLast = PageSeq * Limit,
+            ?assertEqual({ExpFirst, integer_to_binary(ExpFirst)}, lists:nth(1, Page)),
+            ?assertEqual({ExpLast, integer_to_binary(ExpLast)}, lists:nth(10, Page)),
+            ?assertMatch(
+                #{count := 114, continuation := IntCont} when is_integer(IntCont),
+                Meta
+            ),
+            NextCont
+        end,
+        none,
+        lists:seq(1, 11)
+    ),
+    {LastPartialPage, LastMeta} = emqx_inflight:query(Inflight, #{
+        continuation => LastCont, limit => 10
+    }),
+    ?assertEqual(4, length(LastPartialPage)),
+    ?assertEqual({111, <<"111">>}, lists:nth(1, LastPartialPage)),
+    ?assertEqual({114, <<"114">>}, lists:nth(4, LastPartialPage)),
+    ?assertMatch(#{continuation := end_of_data, count := 114}, LastMeta),
+
+    ?assertMatch(
+        {[], #{continuation := end_of_data}},
+        emqx_inflight:query(Inflight, #{continuation => <<"not-existing-cont-id">>, limit => 10})
+    ),
+
+    {LargePage, LargeMeta} = emqx_inflight:query(Inflight, #{limit => 1000}),
+    ?assertEqual(114, length(LargePage)),
+    ?assertEqual({1, <<"1">>}, hd(LargePage)),
+    ?assertEqual({114, <<"114">>}, lists:last(LargePage)),
+    ?assertMatch(#{continuation := end_of_data}, LargeMeta),
+
+    {FullPage, FullMeta} = emqx_inflight:query(Inflight, #{limit => 114}),
+    ?assertEqual(114, length(FullPage)),
+    ?assertEqual({1, <<"1">>}, hd(FullPage)),
+    ?assertEqual({114, <<"114">>}, lists:last(FullPage)),
+    ?assertMatch(#{continuation := end_of_data}, FullMeta),
+
+    {EmptyPage, EmptyMeta} = emqx_inflight:query(Inflight, #{limit => 0}),
+    ?assertEqual([], EmptyPage),
+    ?assertMatch(#{continuation := none, count := 114}, EmptyMeta).

+ 68 - 0
apps/emqx/test/emqx_mqueue_SUITE.erl

@@ -282,6 +282,74 @@ t_dropped(_) ->
     {Msg, Q2} = ?Q:in(Msg, Q1),
     ?assertEqual(1, ?Q:dropped(Q2)).
 
+t_query(_) ->
+    EmptyQ = ?Q:init(#{max_len => 500, store_qos0 => true}),
+    ?assertMatch({[], #{continuation := end_of_data}}, ?Q:query(EmptyQ, #{limit => 50})),
+    ?assertMatch(
+        {[], #{continuation := end_of_data}},
+        ?Q:query(EmptyQ, #{continuation => <<"empty">>, limit => 50})
+    ),
+    ?assertMatch(
+        {[], #{continuation := end_of_data}}, ?Q:query(EmptyQ, #{continuation => none, limit => 50})
+    ),
+
+    Q = lists:foldl(
+        fun(Seq, QAcc) ->
+            Msg = emqx_message:make(<<"t">>, integer_to_binary(Seq)),
+            {_, QAcc1} = ?Q:in(Msg, QAcc),
+            QAcc1
+        end,
+        EmptyQ,
+        lists:seq(1, 114)
+    ),
+
+    LastCont = lists:foldl(
+        fun(PageSeq, Cont) ->
+            Limit = 10,
+            PagerParams = #{continuation => Cont, limit => Limit},
+            {Page, #{continuation := NextCont} = Meta} = ?Q:query(Q, PagerParams),
+            ?assertEqual(10, length(Page)),
+            ExpFirstPayload = integer_to_binary(PageSeq * Limit - Limit + 1),
+            ExpLastPayload = integer_to_binary(PageSeq * Limit),
+            ?assertEqual(
+                ExpFirstPayload,
+                emqx_message:payload(lists:nth(1, Page)),
+                #{page_seq => PageSeq, page => Page, meta => Meta}
+            ),
+            ?assertEqual(ExpLastPayload, emqx_message:payload(lists:nth(10, Page))),
+            ?assertMatch(#{count := 114, continuation := <<_/binary>>}, Meta),
+            NextCont
+        end,
+        none,
+        lists:seq(1, 11)
+    ),
+    {LastPartialPage, LastMeta} = ?Q:query(Q, #{continuation => LastCont, limit => 10}),
+    ?assertEqual(4, length(LastPartialPage)),
+    ?assertEqual(<<"111">>, emqx_message:payload(lists:nth(1, LastPartialPage))),
+    ?assertEqual(<<"114">>, emqx_message:payload(lists:nth(4, LastPartialPage))),
+    ?assertMatch(#{continuation := end_of_data, count := 114}, LastMeta),
+
+    ?assertMatch(
+        {[], #{continuation := end_of_data}},
+        ?Q:query(Q, #{continuation => <<"not-existing-cont-id">>, limit => 10})
+    ),
+
+    {LargePage, LargeMeta} = ?Q:query(Q, #{limit => 1000}),
+    ?assertEqual(114, length(LargePage)),
+    ?assertEqual(<<"1">>, emqx_message:payload(hd(LargePage))),
+    ?assertEqual(<<"114">>, emqx_message:payload(lists:last(LargePage))),
+    ?assertMatch(#{continuation := end_of_data}, LargeMeta),
+
+    {FullPage, FullMeta} = ?Q:query(Q, #{limit => 114}),
+    ?assertEqual(114, length(FullPage)),
+    ?assertEqual(<<"1">>, emqx_message:payload(hd(FullPage))),
+    ?assertEqual(<<"114">>, emqx_message:payload(lists:last(FullPage))),
+    ?assertMatch(#{continuation := end_of_data}, FullMeta),
+
+    {EmptyPage, EmptyMeta} = ?Q:query(Q, #{limit => 0}),
+    ?assertEqual([], EmptyPage),
+    ?assertMatch(#{continuation := none, count := 114}, EmptyMeta).
+
 conservation_prop() ->
     ?FORALL(
         {Priorities, Messages},

+ 2 - 1
apps/emqx/test/emqx_os_mon_SUITE.erl

@@ -132,7 +132,8 @@ do_sys_mem_check_alarm(_Config) ->
         get_memory_usage,
         fun() -> Mem end,
         fun() ->
-            timer:sleep(500),
+            %% wait for `os_mon` to start
+            timer:sleep(10_000),
             Alarms = emqx_alarm:get_alarms(activated),
             ?assert(
                 emqx_vm_mon_SUITE:is_existing(

+ 3 - 2
apps/emqx/test/emqx_vm_SUITE.erl

@@ -21,7 +21,8 @@
 
 -include_lib("eunit/include/eunit.hrl").
 
-all() -> emqx_common_test_helpers:all(?MODULE).
+all() ->
+    emqx_common_test_helpers:all(?MODULE).
 
 t_load(_Config) ->
     lists:foreach(
@@ -97,7 +98,7 @@ t_get_process_limit(_Config) ->
     emqx_vm:get_process_limit().
 
 t_cpu_util(_Config) ->
-    _Cpu = emqx_vm:cpu_util().
+    ?assertMatch(Val when is_number(Val), emqx_vm:cpu_util()).
 
 easy_server() ->
     {ok, LSock} = gen_tcp:listen(5678, [binary, {packet, 0}, {active, false}]),

+ 1 - 1
apps/emqx_bridge/src/emqx_bridge_api.erl

@@ -764,7 +764,7 @@ is_bridge_enabled_v1(BridgeType, BridgeName) ->
     %% we read from the translated config because the defaults are populated here.
     try emqx:get_config([bridges, BridgeType, binary_to_existing_atom(BridgeName)]) of
         ConfMap ->
-            maps:get(enable, ConfMap, false)
+            maps:get(enable, ConfMap, true)
     catch
         error:{config_not_found, _} ->
             throw(not_found);

+ 6 - 6
apps/emqx_bridge/src/emqx_bridge_v2_api.erl

@@ -126,8 +126,8 @@ paths() ->
         %% %% try to match the latter first, trying to interpret `metrics' as an operation...
         "/sources/:id/metrics",
         "/sources/:id/metrics/reset",
-        "/sources_probe"
-        %% "/source_types"
+        "/sources_probe",
+        "/source_types"
     ].
 
 error_schema(Code, Message) ->
@@ -639,16 +639,16 @@ schema("/source_types") ->
         'operationId' => '/source_types',
         get => #{
             tags => [<<"sources">>],
-            desc => ?DESC("desc_api10"),
+            desc => ?DESC("desc_api11"),
             summary => <<"List available source types">>,
             responses => #{
                 200 => emqx_dashboard_swagger:schema_with_examples(
-                    array(emqx_bridge_v2_schema:action_types_sc()),
+                    array(emqx_bridge_v2_schema:source_types_sc()),
                     #{
                         <<"types">> =>
                             #{
                                 summary => <<"Source types">>,
-                                value => emqx_bridge_v2_schema:action_types()
+                                value => emqx_bridge_v2_schema:source_types()
                             }
                     }
                 )
@@ -990,7 +990,7 @@ call_operation_if_enabled(NodeOrAll, OperFunc, [Nodes, ConfRootKey, BridgeType,
 is_enabled_bridge(ConfRootKey, BridgeType, BridgeName) ->
     try emqx_bridge_v2:lookup(ConfRootKey, BridgeType, binary_to_existing_atom(BridgeName)) of
         {ok, #{raw_config := ConfMap}} ->
-            maps:get(<<"enable">>, ConfMap, false);
+            maps:get(<<"enable">>, ConfMap, true);
         {error, not_found} ->
             throw(not_found)
     catch

+ 18 - 0
apps/emqx_bridge/test/emqx_bridge_v2_testlib.erl

@@ -458,6 +458,24 @@ probe_bridge_api(Kind, BridgeType, BridgeName, BridgeConfig) ->
     ct:pal("bridge probe (~s, http) result:\n  ~p", [Kind, Res]),
     Res.
 
+probe_connector_api(Config) ->
+    probe_connector_api(Config, _Overrides = #{}).
+
+probe_connector_api(Config, Overrides) ->
+    #{
+        connector_type := Type,
+        connector_name := Name
+    } = get_common_values(Config),
+    ConnectorConfig0 = get_value(connector_config, Config),
+    ConnectorConfig1 = emqx_utils_maps:deep_merge(ConnectorConfig0, Overrides),
+    Params = ConnectorConfig1#{<<"type">> => Type, <<"name">> => Name},
+    Path = emqx_mgmt_api_test_util:api_path(["connectors_probe"]),
+    ct:pal("probing connector (~s, http):\n  ~p", [Type, Params]),
+    Method = post,
+    Res = request(Method, Path, Params),
+    ct:pal("probing connector (~s, http) result:\n  ~p", [Type, Res]),
+    Res.
+
 list_bridges_http_api_v1() ->
     Path = emqx_mgmt_api_test_util:api_path(["bridges"]),
     ct:pal("list bridges (http v1)"),

+ 25 - 5
apps/emqx_bridge_dynamo/src/emqx_bridge_dynamo_connector.erl

@@ -178,14 +178,34 @@ on_batch_query(InstanceId, [{_ChannelId, _} | _] = Query, State) ->
 on_batch_query(_InstanceId, Query, _State) ->
     {error, {unrecoverable_error, {invalid_request, Query}}}.
 
-on_get_status(_InstanceId, #{pool_name := Pool}) ->
+health_check_timeout() ->
+    2500.
+
+on_get_status(_InstanceId, #{pool_name := Pool} = State) ->
     Health = emqx_resource_pool:health_check_workers(
-        Pool, {emqx_bridge_dynamo_connector_client, is_connected, []}
+        Pool,
+        {emqx_bridge_dynamo_connector_client, is_connected, [
+            health_check_timeout()
+        ]},
+        health_check_timeout(),
+        #{return_values => true}
     ),
-    status_result(Health).
+    case Health of
+        {error, timeout} ->
+            {?status_connecting, State, <<"timeout_while_checking_connection">>};
+        {ok, Results} ->
+            status_result(Results, State)
+    end.
 
-status_result(_Status = true) -> ?status_connected;
-status_result(_Status = false) -> ?status_connecting.
+status_result(Results, State) ->
+    case lists:filter(fun(Res) -> Res =/= true end, Results) of
+        [] when Results =:= [] ->
+            ?status_connecting;
+        [] ->
+            ?status_connected;
+        [{false, Error} | _] ->
+            {?status_connecting, State, Error}
+    end.
 
 %%========================================================================================
 %% Helper fns

+ 9 - 12
apps/emqx_bridge_dynamo/src/emqx_bridge_dynamo_connector_client.erl

@@ -9,7 +9,7 @@
 %% API
 -export([
     start_link/1,
-    is_connected/1,
+    is_connected/2,
     query/4
 ]).
 
@@ -27,20 +27,17 @@
 -export([execute/2]).
 -endif.
 
-%% The default timeout for DynamoDB REST API calls is 10 seconds,
-%% but this value for `gen_server:call` is 5s,
-%% so we should pass the timeout to `gen_server:call`
--define(HEALTH_CHECK_TIMEOUT, 10000).
-
 %%%===================================================================
 %%% API
 %%%===================================================================
-is_connected(Pid) ->
+is_connected(Pid, Timeout) ->
     try
-        gen_server:call(Pid, is_connected, ?HEALTH_CHECK_TIMEOUT)
+        gen_server:call(Pid, is_connected, Timeout)
     catch
-        _:_ ->
-            false
+        _:{timeout, _} ->
+            {false, <<"timeout_while_checking_connection_dynamo_client">>};
+        _:Error ->
+            {false, Error}
     end.
 
 query(Pid, Table, Query, Templates) ->
@@ -76,8 +73,8 @@ handle_call(is_connected, _From, State) ->
         case erlcloud_ddb2:list_tables([{limit, 1}]) of
             {ok, _} ->
                 true;
-            _ ->
-                false
+            Error ->
+                {false, Error}
         end,
     {reply, IsConnected, State};
 handle_call({query, Table, Query, Templates}, _From, State) ->

+ 49 - 2
apps/emqx_bridge_dynamo/test/emqx_bridge_dynamo_SUITE.erl

@@ -88,7 +88,9 @@ init_per_suite(Config) ->
 
 end_per_suite(_Config) ->
     emqx_mgmt_api_test_util:end_suite(),
-    ok = emqx_common_test_helpers:stop_apps([emqx_bridge, emqx_resource, emqx_conf, erlcloud]),
+    ok = emqx_common_test_helpers:stop_apps([
+        emqx_rule_engine, emqx_bridge, emqx_resource, emqx_conf, erlcloud
+    ]),
     ok.
 
 init_per_testcase(TestCase, Config) ->
@@ -134,7 +136,7 @@ common_init(ConfigT) ->
             emqx_common_test_helpers:reset_proxy(ProxyHost, ProxyPort),
             % Ensure enterprise bridge module is loaded
             ok = emqx_common_test_helpers:start_apps([
-                emqx_conf, emqx_resource, emqx_bridge
+                emqx_conf, emqx_resource, emqx_bridge, emqx_rule_engine
             ]),
             _ = application:ensure_all_started(erlcloud),
             _ = emqx_bridge_enterprise:module_info(),
@@ -273,6 +275,24 @@ create_bridge_http(Params) ->
         Error -> Error
     end.
 
+update_bridge_http(#{<<"type">> := Type, <<"name">> := Name} = Config) ->
+    BridgeID = emqx_bridge_resource:bridge_id(Type, Name),
+    Path = emqx_mgmt_api_test_util:api_path(["bridges", BridgeID]),
+    AuthHeader = emqx_mgmt_api_test_util:auth_header_(),
+    case emqx_mgmt_api_test_util:request_api(put, Path, "", AuthHeader, Config) of
+        {ok, Res} -> {ok, emqx_utils_json:decode(Res, [return_maps])};
+        Error -> Error
+    end.
+
+get_bridge_http(#{<<"type">> := Type, <<"name">> := Name}) ->
+    BridgeID = emqx_bridge_resource:bridge_id(Type, Name),
+    Path = emqx_mgmt_api_test_util:api_path(["bridges", BridgeID]),
+    AuthHeader = emqx_mgmt_api_test_util:auth_header_(),
+    case emqx_mgmt_api_test_util:request_api(get, Path, "", AuthHeader) of
+        {ok, Res} -> {ok, emqx_utils_json:decode(Res, [return_maps])};
+        Error -> Error
+    end.
+
 send_message(Config, Payload) ->
     Name = ?config(dynamo_name, Config),
     BridgeType = ?config(dynamo_bridge_type, Config),
@@ -359,6 +379,33 @@ t_setup_via_config_and_publish(Config) ->
     ),
     ok.
 
+%% https://emqx.atlassian.net/browse/EMQX-11984
+t_setup_via_http_api_and_update_wrong_config(Config) ->
+    BridgeType = ?config(dynamo_bridge_type, Config),
+    Name = ?config(dynamo_name, Config),
+    PgsqlConfig0 = ?config(dynamo_config, Config),
+    PgsqlConfig = PgsqlConfig0#{
+        <<"name">> => Name,
+        <<"type">> => BridgeType,
+        %% NOTE: using literal secret with HTTP API requests.
+        <<"aws_secret_access_key">> => <<?SECRET_ACCESS_KEY>>
+    },
+    BrokenConfig = PgsqlConfig#{<<"url">> => <<"http://non_existing_host:9999">>},
+    ?assertMatch(
+        {ok, _},
+        create_bridge_http(BrokenConfig)
+    ),
+    WrongURL2 = <<"http://non_existing_host:9998">>,
+    BrokenConfig2 = PgsqlConfig#{<<"url">> => WrongURL2},
+    ?assertMatch(
+        {ok, _},
+        update_bridge_http(BrokenConfig2)
+    ),
+    %% Check that the update worked
+    {ok, Result} = get_bridge_http(PgsqlConfig),
+    ?assertMatch(#{<<"url">> := WrongURL2}, Result),
+    emqx_bridge:remove(BridgeType, Name).
+
 t_setup_via_http_api_and_publish(Config) ->
     BridgeType = ?config(dynamo_bridge_type, Config),
     Name = ?config(dynamo_name, Config),

+ 3 - 3
apps/emqx_bridge_gcp_pubsub/src/emqx_bridge_gcp_pubsub_client.erl

@@ -198,13 +198,13 @@ get_status(#{connect_timeout := Timeout, pool_name := PoolName} = State) ->
 %%-------------------------------------------------------------------------------------------------
 
 -spec get_topic(topic(), state(), request_opts()) -> {ok, map()} | {error, term()}.
-get_topic(Topic, ConnectorState, ReqOpts) ->
-    #{project_id := ProjectId} = ConnectorState,
+get_topic(Topic, ClientState, ReqOpts) ->
+    #{project_id := ProjectId} = ClientState,
     Method = get,
     Path = <<"/v1/projects/", ProjectId/binary, "/topics/", Topic/binary>>,
     Body = <<>>,
     PreparedRequest = {prepared_request, {Method, Path, Body}, ReqOpts},
-    ?MODULE:query_sync(PreparedRequest, ConnectorState).
+    ?MODULE:query_sync(PreparedRequest, ClientState).
 
 %%-------------------------------------------------------------------------------------------------
 %% Helper fns

+ 30 - 13
apps/emqx_bridge_gcp_pubsub/src/emqx_bridge_gcp_pubsub_impl_producer.erl

@@ -186,10 +186,14 @@ on_batch_query_async(ResourceId, Requests, ReplyFunAndArgs, ConnectorState) ->
     {ok, connector_state()}.
 on_add_channel(_ConnectorResId, ConnectorState0, ActionId, ActionConfig) ->
     #{installed_actions := InstalledActions0} = ConnectorState0,
-    ChannelState = install_channel(ActionConfig),
-    InstalledActions = InstalledActions0#{ActionId => ChannelState},
-    ConnectorState = ConnectorState0#{installed_actions := InstalledActions},
-    {ok, ConnectorState}.
+    case install_channel(ActionConfig, ConnectorState0) of
+        {ok, ChannelState} ->
+            InstalledActions = InstalledActions0#{ActionId => ChannelState},
+            ConnectorState = ConnectorState0#{installed_actions := InstalledActions},
+            {ok, ConnectorState};
+        Error = {error, _} ->
+            Error
+    end.
 
 -spec on_remove_channel(
     connector_resource_id(),
@@ -218,8 +222,7 @@ on_get_channel_status(_ConnectorResId, _ChannelId, _ConnectorState) ->
 %% Helper fns
 %%-------------------------------------------------------------------------------------------------
 
-%% TODO: check if topic exists ("unhealthy target")
-install_channel(ActionConfig) ->
+install_channel(ActionConfig, ConnectorState) ->
     #{
         parameters := #{
             attributes_template := AttributesTemplate,
@@ -231,13 +234,27 @@ install_channel(ActionConfig) ->
             request_ttl := RequestTTL
         }
     } = ActionConfig,
-    #{
-        attributes_template => preproc_attributes(AttributesTemplate),
-        ordering_key_template => emqx_placeholder:preproc_tmpl(OrderingKeyTemplate),
-        payload_template => emqx_placeholder:preproc_tmpl(PayloadTemplate),
-        pubsub_topic => PubSubTopic,
-        request_ttl => RequestTTL
-    }.
+    #{client := Client} = ConnectorState,
+    case
+        emqx_bridge_gcp_pubsub_client:get_topic(PubSubTopic, Client, #{request_ttl => RequestTTL})
+    of
+        {error, #{status_code := 404}} ->
+            {error, {unhealthy_target, <<"Topic does not exist">>}};
+        {error, #{status_code := 403}} ->
+            {error, {unhealthy_target, <<"Permission denied for topic">>}};
+        {error, #{status_code := 401}} ->
+            {error, {unhealthy_target, <<"Bad credentials">>}};
+        {error, Reason} ->
+            {error, Reason};
+        {ok, _} ->
+            {ok, #{
+                attributes_template => preproc_attributes(AttributesTemplate),
+                ordering_key_template => emqx_placeholder:preproc_tmpl(OrderingKeyTemplate),
+                payload_template => emqx_placeholder:preproc_tmpl(PayloadTemplate),
+                pubsub_topic => PubSubTopic,
+                request_ttl => RequestTTL
+            }}
+    end.
 
 -spec do_send_requests_sync(
     connector_state(),

+ 100 - 55
apps/emqx_bridge_gcp_pubsub/test/emqx_bridge_gcp_pubsub_producer_SUITE.erl

@@ -76,6 +76,7 @@ only_sync_tests() ->
     [t_query_sync].
 
 init_per_suite(Config) ->
+    emqx_common_test_helpers:clear_screen(),
     Apps = emqx_cth_suite:start(
         [
             emqx,
@@ -257,20 +258,31 @@ create_rule_and_action_http(Config) ->
 success_http_handler() ->
     TestPid = self(),
     fun(Req0, State) ->
-        {ok, Body, Req} = cowboy_req:read_body(Req0),
-        TestPid ! {http, cowboy_req:headers(Req), Body},
-        Rep = cowboy_req:reply(
-            200,
-            #{<<"content-type">> => <<"application/json">>},
-            emqx_utils_json:encode(#{messageIds => [<<"6058891368195201">>]}),
-            Req
-        ),
-        {ok, Rep, State}
+        case {cowboy_req:method(Req0), cowboy_req:path(Req0)} of
+            {<<"GET">>, <<"/v1/projects/myproject/topics/", _/binary>>} ->
+                Rep = cowboy_req:reply(
+                    200,
+                    #{<<"content-type">> => <<"application/json">>},
+                    <<"{}">>,
+                    Req0
+                ),
+                {ok, Rep, State};
+            _ ->
+                {ok, Body, Req} = cowboy_req:read_body(Req0),
+                TestPid ! {http, cowboy_req:headers(Req), Body},
+                Rep = cowboy_req:reply(
+                    200,
+                    #{<<"content-type">> => <<"application/json">>},
+                    emqx_utils_json:encode(#{messageIds => [<<"6058891368195201">>]}),
+                    Req
+                ),
+                {ok, Rep, State}
+        end
     end.
 
 start_echo_http_server() ->
     HTTPHost = "localhost",
-    HTTPPath = <<"/v1/projects/myproject/topics/mytopic:publish">>,
+    HTTPPath = '_',
     ServerSSLOpts =
         [
             {verify, verify_none},
@@ -656,6 +668,20 @@ wait_n_events(TelemetryTable, ResourceId, NEvents, Timeout, EventName) ->
         error({timeout_waiting_for_telemetry, EventName})
     end.
 
+kill_gun_process(EhttpcPid) ->
+    State = ehttpc:get_state(EhttpcPid, minimal),
+    GunPid = maps:get(client, State),
+    true = is_pid(GunPid),
+    _ = exit(GunPid, kill),
+    ok.
+
+kill_gun_processes(ConnectorResourceId) ->
+    Pool = ehttpc:workers(ConnectorResourceId),
+    Workers = lists:map(fun({_, Pid}) -> Pid end, Pool),
+    %% assert there is at least one pool member
+    ?assertMatch([_ | _], Workers),
+    lists:foreach(fun(Pid) -> kill_gun_process(Pid) end, Workers).
+
 %%------------------------------------------------------------------------------
 %% Testcases
 %%------------------------------------------------------------------------------
@@ -1343,15 +1369,26 @@ t_failure_with_body(Config) ->
     TestPid = self(),
     FailureWithBodyHandler =
         fun(Req0, State) ->
-            {ok, Body, Req} = cowboy_req:read_body(Req0),
-            TestPid ! {http, cowboy_req:headers(Req), Body},
-            Rep = cowboy_req:reply(
-                400,
-                #{<<"content-type">> => <<"application/json">>},
-                emqx_utils_json:encode(#{}),
-                Req
-            ),
-            {ok, Rep, State}
+            case {cowboy_req:method(Req0), cowboy_req:path(Req0)} of
+                {<<"GET">>, <<"/v1/projects/myproject/topics/", _/binary>>} ->
+                    Rep = cowboy_req:reply(
+                        200,
+                        #{<<"content-type">> => <<"application/json">>},
+                        <<"{}">>,
+                        Req0
+                    ),
+                    {ok, Rep, State};
+                _ ->
+                    {ok, Body, Req} = cowboy_req:read_body(Req0),
+                    TestPid ! {http, cowboy_req:headers(Req), Body},
+                    Rep = cowboy_req:reply(
+                        400,
+                        #{<<"content-type">> => <<"application/json">>},
+                        emqx_utils_json:encode(#{}),
+                        Req
+                    ),
+                    {ok, Rep, State}
+            end
         end,
     ok = emqx_bridge_http_connector_test_server:set_handler(FailureWithBodyHandler),
     Topic = <<"t/topic">>,
@@ -1381,15 +1418,26 @@ t_failure_no_body(Config) ->
     TestPid = self(),
     FailureNoBodyHandler =
         fun(Req0, State) ->
-            {ok, Body, Req} = cowboy_req:read_body(Req0),
-            TestPid ! {http, cowboy_req:headers(Req), Body},
-            Rep = cowboy_req:reply(
-                400,
-                #{<<"content-type">> => <<"application/json">>},
-                <<>>,
-                Req
-            ),
-            {ok, Rep, State}
+            case {cowboy_req:method(Req0), cowboy_req:path(Req0)} of
+                {<<"GET">>, <<"/v1/projects/myproject/topics/", _/binary>>} ->
+                    Rep = cowboy_req:reply(
+                        200,
+                        #{<<"content-type">> => <<"application/json">>},
+                        <<"{}">>,
+                        Req0
+                    ),
+                    {ok, Rep, State};
+                _ ->
+                    {ok, Body, Req} = cowboy_req:read_body(Req0),
+                    TestPid ! {http, cowboy_req:headers(Req), Body},
+                    Rep = cowboy_req:reply(
+                        400,
+                        #{<<"content-type">> => <<"application/json">>},
+                        <<>>,
+                        Req
+                    ),
+                    {ok, Rep, State}
+            end
         end,
     ok = emqx_bridge_http_connector_test_server:set_handler(FailureNoBodyHandler),
     Topic = <<"t/topic">>,
@@ -1415,20 +1463,6 @@ t_failure_no_body(Config) ->
     ),
     ok.
 
-kill_gun_process(EhttpcPid) ->
-    State = ehttpc:get_state(EhttpcPid, minimal),
-    GunPid = maps:get(client, State),
-    true = is_pid(GunPid),
-    _ = exit(GunPid, kill),
-    ok.
-
-kill_gun_processes(ConnectorResourceId) ->
-    Pool = ehttpc:workers(ConnectorResourceId),
-    Workers = lists:map(fun({_, Pid}) -> Pid end, Pool),
-    %% assert there is at least one pool member
-    ?assertMatch([_ | _], Workers),
-    lists:foreach(fun(Pid) -> kill_gun_process(Pid) end, Workers).
-
 t_unrecoverable_error(Config) ->
     ActionResourceId = ?config(action_resource_id, Config),
     ConnectorResourceId = ?config(connector_resource_id, Config),
@@ -1436,19 +1470,30 @@ t_unrecoverable_error(Config) ->
     TestPid = self(),
     FailureNoBodyHandler =
         fun(Req0, State) ->
-            {ok, Body, Req} = cowboy_req:read_body(Req0),
-            TestPid ! {http, cowboy_req:headers(Req), Body},
-            %% kill the gun process while it's waiting for the
-            %% response so we provoke an `{error, _}' response from
-            %% ehttpc.
-            ok = kill_gun_processes(ConnectorResourceId),
-            Rep = cowboy_req:reply(
-                200,
-                #{<<"content-type">> => <<"application/json">>},
-                <<>>,
-                Req
-            ),
-            {ok, Rep, State}
+            case {cowboy_req:method(Req0), cowboy_req:path(Req0)} of
+                {<<"GET">>, <<"/v1/projects/myproject/topics/", _/binary>>} ->
+                    Rep = cowboy_req:reply(
+                        200,
+                        #{<<"content-type">> => <<"application/json">>},
+                        <<"{}">>,
+                        Req0
+                    ),
+                    {ok, Rep, State};
+                _ ->
+                    {ok, Body, Req} = cowboy_req:read_body(Req0),
+                    TestPid ! {http, cowboy_req:headers(Req), Body},
+                    %% kill the gun process while it's waiting for the
+                    %% response so we provoke an `{error, _}' response from
+                    %% ehttpc.
+                    ok = kill_gun_processes(ConnectorResourceId),
+                    Rep = cowboy_req:reply(
+                        200,
+                        #{<<"content-type">> => <<"application/json">>},
+                        <<>>,
+                        Req
+                    ),
+                    {ok, Rep, State}
+            end
         end,
     ok = emqx_bridge_http_connector_test_server:set_handler(FailureNoBodyHandler),
     Topic = <<"t/topic">>,
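All three handlers above now branch on `{cowboy_req:method(Req0), cowboy_req:path(Req0)}`. A `GET` under `/v1/projects/myproject/topics/` gets `200` with `{}`. Every other request gets the behavior that the specific test case exercises. Presumably the producer now checks that the topic exists before publishing; compare `t_bad_topic` in the new producer suite. The routing decision can be sketched as a pure function. The module name `pubsub_mock_route_sketch` and its return values are hypothetical and are not part of the test server.

```erlang
-module(pubsub_mock_route_sketch).
-export([route/2]).

%% Hypothetical helper: decide how the mock server should answer, given the
%% method and path binaries returned by cowboy_req:method/1 and cowboy_req:path/1.
route(<<"GET">>, <<"/v1/projects/myproject/topics/", _Topic/binary>>) ->
    %% Topic existence check: report that the topic exists.
    {reply, 200, <<"{}">>};
route(_Method, _Path) ->
    %% Anything else (e.g. a publish request) is left to the test case,
    %% which replies with 400 or kills the gun process.
    test_case_specific.
```

With the decision pulled out like this, the three handlers could share it. The diff instead inlines the same `case` in each handler.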

+ 215 - 0
apps/emqx_bridge_gcp_pubsub/test/emqx_bridge_v2_gcp_pubsub_producer_SUITE.erl

@@ -0,0 +1,215 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2024 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%--------------------------------------------------------------------
+
+-module(emqx_bridge_v2_gcp_pubsub_producer_SUITE).
+
+-compile(nowarn_export_all).
+-compile(export_all).
+
+-include_lib("eunit/include/eunit.hrl").
+-include_lib("common_test/include/ct.hrl").
+-include_lib("snabbkaffe/include/snabbkaffe.hrl").
+
+-define(CONNECTOR_TYPE_BIN, <<"gcp_pubsub_producer">>).
+-define(ACTION_TYPE_BIN, <<"gcp_pubsub_producer">>).
+
+%%------------------------------------------------------------------------------
+%% CT boilerplate
+%%------------------------------------------------------------------------------
+
+all() ->
+    emqx_common_test_helpers:all(?MODULE).
+
+init_per_suite(Config) ->
+    emqx_common_test_helpers:clear_screen(),
+    emqx_bridge_gcp_pubsub_consumer_SUITE:init_per_suite(Config).
+
+end_per_suite(Config) ->
+    emqx_bridge_gcp_pubsub_consumer_SUITE:end_per_suite(Config).
+
+init_per_testcase(TestCase, Config) ->
+    common_init_per_testcase(TestCase, Config).
+
+common_init_per_testcase(TestCase, Config0) ->
+    ct:timetrap(timer:seconds(60)),
+    ServiceAccountJSON =
+        #{<<"project_id">> := ProjectId} =
+        emqx_bridge_gcp_pubsub_utils:generate_service_account_json(),
+    UniqueNum = integer_to_binary(erlang:unique_integer()),
+    Name = <<(atom_to_binary(TestCase))/binary, UniqueNum/binary>>,
+    ConnectorConfig = connector_config(Name, ServiceAccountJSON),
+    PubsubTopic = Name,
+    ActionConfig = action_config(#{
+        connector => Name,
+        parameters => #{pubsub_topic => PubsubTopic}
+    }),
+    Config = [
+        {bridge_kind, action},
+        {action_type, ?ACTION_TYPE_BIN},
+        {action_name, Name},
+        {action_config, ActionConfig},
+        {connector_name, Name},
+        {connector_type, ?CONNECTOR_TYPE_BIN},
+        {connector_config, ConnectorConfig},
+        {service_account_json, ServiceAccountJSON},
+        {project_id, ProjectId},
+        {pubsub_topic, PubsubTopic}
+        | Config0
+    ],
+    ok = emqx_bridge_gcp_pubsub_consumer_SUITE:ensure_topic(Config, PubsubTopic),
+    Config.
+
+end_per_testcase(_Testcase, Config) ->
+    ProxyHost = ?config(proxy_host, Config),
+    ProxyPort = ?config(proxy_port, Config),
+    emqx_common_test_helpers:reset_proxy(ProxyHost, ProxyPort),
+    emqx_bridge_v2_testlib:delete_all_bridges_and_connectors(),
+    emqx_common_test_helpers:call_janitor(60_000),
+    ok = snabbkaffe:stop(),
+    ok.
+
+%%------------------------------------------------------------------------------
+%% Helper fns
+%%------------------------------------------------------------------------------
+
+connector_config(Name, ServiceAccountJSON) ->
+    InnerConfigMap0 =
+        #{
+            <<"enable">> => true,
+            <<"tags">> => [<<"bridge">>],
+            <<"description">> => <<"my cool bridge">>,
+            <<"connect_timeout">> => <<"5s">>,
+            <<"pool_size">> => 8,
+            <<"pipelining">> => <<"100">>,
+            <<"max_retries">> => <<"2">>,
+            <<"service_account_json">> => ServiceAccountJSON,
+            <<"resource_opts">> =>
+                #{
+                    <<"health_check_interval">> => <<"1s">>,
+                    <<"start_after_created">> => true,
+                    <<"start_timeout">> => <<"5s">>
+                }
+        },
+    emqx_bridge_v2_testlib:parse_and_check_connector(?ACTION_TYPE_BIN, Name, InnerConfigMap0).
+
+action_config(Overrides0) ->
+    Overrides = emqx_utils_maps:binary_key_map(Overrides0),
+    CommonConfig =
+        #{
+            <<"enable">> => true,
+            <<"connector">> => <<"please override">>,
+            <<"parameters">> =>
+                #{
+                    <<"pubsub_topic">> => <<"please override">>
+                },
+            <<"resource_opts">> => #{
+                <<"batch_size">> => 1,
+                <<"batch_time">> => <<"0ms">>,
+                <<"buffer_mode">> => <<"memory_only">>,
+                <<"buffer_seg_bytes">> => <<"10MB">>,
+                <<"health_check_interval">> => <<"15s">>,
+                <<"inflight_window">> => 100,
+                <<"max_buffer_bytes">> => <<"256MB">>,
+                <<"metrics_flush_interval">> => <<"1s">>,
+                <<"query_mode">> => <<"sync">>,
+                <<"request_ttl">> => <<"45s">>,
+                <<"resume_interval">> => <<"15s">>,
+                <<"worker_pool_size">> => <<"1">>
+            }
+        },
+    maps:merge(CommonConfig, Overrides).
+
+assert_persisted_service_account_json_is_binary(ConnectorName) ->
+    %% ensure cluster.hocon has a binary encoded json string as the value
+    {ok, Hocon} = hocon:files([application:get_env(emqx, cluster_hocon_file, undefined)]),
+    ?assertMatch(
+        Bin when is_binary(Bin),
+        emqx_utils_maps:deep_get(
+            [
+                <<"connectors">>,
+                <<"gcp_pubsub_producer">>,
+                ConnectorName,
+                <<"service_account_json">>
+            ],
+            Hocon
+        )
+    ),
+    ok.
+
+%%------------------------------------------------------------------------------
+%% Testcases
+%%------------------------------------------------------------------------------
+
+t_start_stop(Config) ->
+    ok = emqx_bridge_v2_testlib:t_start_stop(Config, gcp_pubsub_stop),
+    ok.
+
+t_create_via_http(Config) ->
+    ok = emqx_bridge_v2_testlib:t_create_via_http(Config),
+    ok.
+
+t_create_via_http_json_object_service_account(Config0) ->
+    %% After the config goes through the roundtrip with `hocon_tconf:check_plain', service
+    %% account json comes back as a binary even if the input is a json object.
+    ConnectorName = ?config(connector_name, Config0),
+    ConnConfig0 = ?config(connector_config, Config0),
+    Config1 = proplists:delete(connector_config, Config0),
+    ConnConfig1 = maps:update_with(
+        <<"service_account_json">>,
+        fun(X) ->
+            ?assert(is_binary(X), #{json => X}),
+            JSON = emqx_utils_json:decode(X, [return_maps]),
+            ?assert(is_map(JSON)),
+            JSON
+        end,
+        ConnConfig0
+    ),
+    Config = [{connector_config, ConnConfig1} | Config1],
+    ok = emqx_bridge_v2_testlib:t_create_via_http(Config),
+    assert_persisted_service_account_json_is_binary(ConnectorName),
+    ok.
+
+%% Check that an action (V2) targeting a non-existent topic is reported as an error:
+%% creation currently succeeds with 201, but probing it returns 400.
+t_bad_topic(Config) ->
+    ?check_trace(
+        begin
+            %% Should it really be 201 here?
+            ?assertMatch(
+                {ok, {{_, 201, _}, _, #{}}},
+                emqx_bridge_v2_testlib:create_bridge_api(
+                    Config,
+                    #{<<"parameters">> => #{<<"pubsub_topic">> => <<"i-dont-exist">>}}
+                )
+            ),
+            #{
+                kind := Kind,
+                type := Type,
+                name := Name
+            } = emqx_bridge_v2_testlib:get_common_values(Config),
+            ActionConfig0 = emqx_bridge_v2_testlib:get_value(action_config, Config),
+            ProbeRes = emqx_bridge_v2_testlib:probe_bridge_api(
+                Kind,
+                Type,
+                Name,
+                emqx_utils_maps:deep_merge(
+                    ActionConfig0,
+                    #{<<"parameters">> => #{<<"pubsub_topic">> => <<"i-dont-exist">>}}
+                )
+            ),
+            ?assertMatch(
+                {error, {{_, 400, _}, _, _}},
+                ProbeRes
+            ),
+            {error, {{_, 400, _}, _, #{<<"message">> := Msg}}} = ProbeRes,
+            ?assertMatch(match, re:run(Msg, <<"unhealthy_target">>, [{capture, none}]), #{
+                msg => Msg
+            }),
+            ?assertMatch(match, re:run(Msg, <<"Topic does not exist">>, [{capture, none}]), #{
+                msg => Msg
+            }),
+            ok
+        end,
+        []
+    ),
+    ok.

+ 6 - 4
apps/emqx_bridge_hstreamdb/src/emqx_bridge_hstreamdb.erl

@@ -173,14 +173,16 @@ fields(action_parameters) ->
         {record_template,
             mk(binary(), #{default => <<"${payload}">>, desc => ?DESC("record_template")})},
         {aggregation_pool_size,
-            mk(integer(), #{
+            mk(pos_integer(), #{
                 default => ?DEFAULT_AGG_POOL_SIZE, desc => ?DESC("aggregation_pool_size")
             })},
         {max_batches,
-            mk(integer(), #{default => ?DEFAULT_MAX_BATCHES, desc => ?DESC("max_batches")})},
+            mk(pos_integer(), #{default => ?DEFAULT_MAX_BATCHES, desc => ?DESC("max_batches")})},
         {writer_pool_size,
-            mk(integer(), #{default => ?DEFAULT_WRITER_POOL_SIZE, desc => ?DESC("writer_pool_size")})},
-        {batch_size, mk(integer(), #{default => 100, desc => ?DESC("batch_size")})},
+            mk(pos_integer(), #{
+                default => ?DEFAULT_WRITER_POOL_SIZE, desc => ?DESC("writer_pool_size")
+            })},
+        {batch_size, mk(pos_integer(), #{default => 100, desc => ?DESC("batch_size")})},
         {batch_interval,
             mk(emqx_schema:timeout_duration_ms(), #{
                 default => ?DEFAULT_BATCH_INTERVAL_RAW, desc => ?DESC("batch_interval")

+ 4 - 0
apps/emqx_bridge_iotdb/src/emqx_bridge_iotdb_connector.erl

@@ -544,6 +544,8 @@ convert_int(Str) when is_binary(Str) ->
         _:_ ->
             convert_int(binary_to_float(Str))
     end;
+convert_int(null) ->
+    null;
 convert_int(undefined) ->
     null.
 
@@ -556,6 +558,8 @@ convert_float(Str) when is_binary(Str) ->
         _:_ ->
             convert_float(binary_to_integer(Str))
     end;
+convert_float(null) ->
+    null;
 convert_float(undefined) ->
     null.
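`convert_int/1` and `convert_float/1` now map an input that is already `null` to `null`, the same way they handle `undefined`. Previously, a `null` input presumably matched no clause. Below is a simplified, standalone sketch of the integer case. The module name is hypothetical. The integer and float clauses are assumptions added to keep it runnable, because the diff only partly shows the body of the real binary clause.

```erlang
-module(iotdb_convert_sketch).
-export([convert_int/1]).

%% Hypothetical, simplified stand-in for convert_int/1 in
%% emqx_bridge_iotdb_connector; the integer and float clauses are assumptions.
convert_int(I) when is_integer(I) ->
    I;
convert_int(F) when is_float(F) ->
    trunc(F);
convert_int(Str) when is_binary(Str) ->
    try
        binary_to_integer(Str)
    catch
        %% e.g. <<"3.7">>: parse as a float, then convert that
        _:_ ->
            convert_int(binary_to_float(Str))
    end;
%% The clause added by this change: pass an explicit null through.
convert_int(null) ->
    null;
convert_int(undefined) ->
    null.
```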
 

+ 3 - 0
apps/emqx_bridge_kafka/src/emqx_bridge_kafka.erl

@@ -32,6 +32,9 @@
     producer_opts/1
 ]).
 
+%% Internal export to be used in v2 schema
+-export([consumer_topic_mapping_validator/1]).
+
 -export([
     kafka_connector_config_fields/0,
     kafka_producer_converter/2,

+ 7 - 1
apps/emqx_bridge_kafka/src/emqx_bridge_kafka_consumer_schema.erl

@@ -65,7 +65,7 @@ fields(source_parameters) ->
                     type => hocon_schema:field_schema(Sc, type),
                     required => false,
                     default => [],
-                    validator => fun(_) -> ok end,
+                    validator => fun legacy_consumer_topic_mapping_validator/1,
                     importance => ?IMPORTANCE_HIDDEN
                 },
                 {Name, hocon_schema:override(Sc, Override)};
@@ -231,3 +231,9 @@ connector_example(put) ->
                 start_timeout => <<"5s">>
             }
     }.
+
+legacy_consumer_topic_mapping_validator(_TopicMapping = []) ->
+    %% Can be (and should be, unless it has migrated from v1) empty in v2.
+    ok;
+legacy_consumer_topic_mapping_validator(TopicMapping = [_ | _]) ->
+    emqx_bridge_kafka:consumer_topic_mapping_validator(TopicMapping).
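In the v2 source schema, an empty `topic_mapping` is accepted as is. A non-empty one, which would come from a migrated v1 bridge, is delegated to `emqx_bridge_kafka:consumer_topic_mapping_validator/1`, which is now exported for that purpose. `t_duplicated_kafka_topics` expects a mapping that repeats a `kafka_topic` to be rejected. This diff does not show the body of that validator, so the duplicate check below is only a hypothetical sketch of such a rule. The module name and the error shape are assumptions.

```erlang
-module(topic_mapping_sketch).
-export([validate/1]).

%% Hypothetical validator, not the body of
%% emqx_bridge_kafka:consumer_topic_mapping_validator/1.
validate([]) ->
    %% Empty mapping: nothing to check (the normal v2 case).
    ok;
validate(TopicMapping = [_ | _]) ->
    Topics = [maps:get(<<"kafka_topic">>, M) || M <- TopicMapping],
    %% Removing one copy of each distinct topic leaves only the repeats.
    Dups = lists:usort(Topics -- lists:usort(Topics)),
    case Dups of
        [] -> ok;
        _ -> {error, #{reason => duplicated_kafka_topics, topics => Dups}}
    end.
```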

+ 44 - 4
apps/emqx_bridge_kafka/src/emqx_bridge_kafka_impl_consumer.erl

@@ -220,10 +220,17 @@ on_stop(ConnectorResId, State) ->
 
 -spec on_get_status(connector_resource_id(), connector_state()) ->
     ?status_connected | ?status_disconnected.
-on_get_status(_ConnectorResId, _State = #{kafka_client_id := ClientID}) ->
-    case brod_sup:find_client(ClientID) of
-        [_Pid] -> ?status_connected;
-        _ -> ?status_disconnected
+on_get_status(_ConnectorResId, State = #{kafka_client_id := ClientID}) ->
+    case whereis(ClientID) of
+        Pid when is_pid(Pid) ->
+            case check_client_connectivity(Pid) of
+                {Status, Reason} ->
+                    {Status, State, Reason};
+                Status ->
+                    Status
+            end;
+        _ ->
+            ?status_disconnected
     end;
 on_get_status(_ConnectorResId, _State) ->
     ?status_disconnected.
@@ -631,6 +638,39 @@ is_dry_run(ConnectorResId) ->
             string:equal(TestIdStart, ConnectorResId)
     end.
 
+-spec check_client_connectivity(pid()) ->
+    ?status_connected
+    | ?status_disconnected
+    | {?status_disconnected, term()}.
+check_client_connectivity(ClientPid) ->
+    %% We use a fake group id just to probe the connection, as `get_group_coordinator'
+    %% will ensure a connection to the broker.
+    FakeGroupId = <<"____emqx_consumer_probe">>,
+    case brod_client:get_group_coordinator(ClientPid, FakeGroupId) of
+        {error, client_down} ->
+            ?status_disconnected;
+        {error, {client_down, Reason}} ->
+            %% `brod' should have already logged the client being down.
+            {?status_disconnected, maybe_clean_error(Reason)};
+        {error, Reason} ->
+            %% `brod' should have already logged the client being down.
+            {?status_disconnected, maybe_clean_error(Reason)};
+        {ok, _Metadata} ->
+            ?status_connected
+    end.
+
+%% Attempt to make the returned error a bit more friendly.
+maybe_clean_error(Reason) ->
+    case Reason of
+        [{{Host, Port}, {nxdomain, _Stacktrace}} | _] when is_integer(Port) ->
+            HostPort = iolist_to_binary([Host, ":", integer_to_binary(Port)]),
+            {HostPort, nxdomain};
+        [{error_code, Code}, {error_msg, Msg} | _] ->
+            {Code, Msg};
+        _ ->
+            Reason
+    end.
+
 -spec make_client_id(connector_resource_id(), binary(), atom() | binary()) -> atom().
 make_client_id(ConnectorResId, BridgeType, BridgeName) ->
     case is_dry_run(ConnectorResId) of
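`on_get_status/2` now probes the client through `check_client_connectivity/1`, which calls `brod_client:get_group_coordinator/2`, and passes any failure reason through `maybe_clean_error/1`. That helper is a pure function, so it can be exercised on its own. The sketch copies it verbatim into a hypothetical module. The sample reasons in the test are modeled on the patterns the helper matches, not on captured `brod` output. The `bad_host:9999` address is the one used by `t_bad_bootstrap_host`.

```erlang
-module(kafka_reason_sketch).
-export([maybe_clean_error/1]).

%% Hypothetical module; the function body is copied from the diff above.
%% It flattens two known error shapes and returns anything else unchanged.
maybe_clean_error(Reason) ->
    case Reason of
        [{{Host, Port}, {nxdomain, _Stacktrace}} | _] when is_integer(Port) ->
            HostPort = iolist_to_binary([Host, ":", integer_to_binary(Port)]),
            {HostPort, nxdomain};
        [{error_code, Code}, {error_msg, Msg} | _] ->
            {Code, Msg};
        _ ->
            Reason
    end.
```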

+ 34 - 2
apps/emqx_bridge_kafka/test/emqx_bridge_kafka_impl_consumer_SUITE.erl

@@ -74,6 +74,7 @@ testcases(once) ->
         t_node_joins_existing_cluster,
         t_cluster_node_down,
         t_multiple_topic_mappings,
+        t_duplicated_kafka_topics,
         t_dynamic_mqtt_topic,
         t_resource_manager_crash_after_subscriber_started,
         t_resource_manager_crash_before_subscriber_started
@@ -292,7 +293,10 @@ end_per_group(_Group, _Config) ->
 init_per_testcase(t_cluster_group = TestCase, Config0) ->
     Config = emqx_utils:merge_opts(Config0, [{num_partitions, 6}]),
     common_init_per_testcase(TestCase, Config);
-init_per_testcase(t_multiple_topic_mappings = TestCase, Config0) ->
+init_per_testcase(TestCase, Config0) when
+    TestCase =:= t_multiple_topic_mappings;
+    TestCase =:= t_duplicated_kafka_topics
+->
     KafkaTopicBase =
         <<
             (atom_to_binary(TestCase))/binary,
@@ -671,7 +675,12 @@ authentication(_) ->
 parse_and_check(ConfigString, Name) ->
     {ok, RawConf} = hocon:binary(ConfigString, #{format => map}),
     TypeBin = ?BRIDGE_TYPE_BIN,
-    hocon_tconf:check_plain(emqx_bridge_schema, RawConf, #{required => false, atom_key => false}),
+    #{<<"bridges">> := #{TypeBin := #{Name := _}}} =
+        hocon_tconf:check_plain(
+            emqx_bridge_schema,
+            RawConf,
+            #{required => false, atom_key => false}
+        ),
     #{<<"bridges">> := #{TypeBin := #{Name := Config}}} = RawConf,
     Config.
 
@@ -1359,6 +1368,28 @@ t_multiple_topic_mappings(Config) ->
     ),
     ok.
 
+%% Although we have a test for the v1 schema, the v1 compatibility layer does some
+%% shenanigans that do not go through V1 schema validations...
+t_duplicated_kafka_topics(Config) ->
+    #{<<"topic_mapping">> := [#{<<"kafka_topic">> := KT} | _] = TM0} =
+        ?config(kafka_config, Config),
+    TM = [M#{<<"kafka_topic">> := KT} || M <- TM0],
+    ?check_trace(
+        begin
+            ?assertMatch(
+                {error, {{_, 400, _}, _, _}},
+                create_bridge_api(
+                    Config,
+                    #{<<"topic_mapping">> => TM}
+                )
+            ),
+
+            ok
+        end,
+        []
+    ),
+    ok.
+
 t_on_get_status(Config) ->
     ProxyPort = ?config(proxy_port, Config),
     ProxyHost = ?config(proxy_host, Config),
@@ -2071,6 +2102,7 @@ t_begin_offset_earliest(Config) ->
             {ok, _} = create_bridge(Config, #{
                 <<"kafka">> => #{<<"offset_reset_policy">> => <<"earliest">>}
             }),
+            ?retry(500, 20, ?assertEqual({ok, connected}, health_check(Config))),
 
             #{num_published => NumMessages}
         end,

+ 12 - 0
apps/emqx_bridge_kafka/test/emqx_bridge_v2_kafka_consumer_SUITE.erl

@@ -339,3 +339,15 @@ t_update_topic(Config) ->
         emqx_bridge_v2_testlib:get_source_api(?SOURCE_TYPE_BIN, Name)
     ),
     ok.
+
+t_bad_bootstrap_host(Config) ->
+    ?assertMatch(
+        {error, {{_, 400, _}, _, _}},
+        emqx_bridge_v2_testlib:probe_connector_api(
+            Config,
+            #{
+                <<"bootstrap_hosts">> => <<"bad_host:9999">>
+            }
+        )
+    ),
+    ok.

+ 16 - 1
apps/emqx_bridge_kinesis/src/emqx_bridge_kinesis.erl

@@ -62,7 +62,19 @@ fields(kinesis_action) ->
                 required => true,
                 desc => ?DESC("action_parameters")
             }
-        )
+        ),
+        #{
+            resource_opts_ref => hoconsc:ref(?MODULE, action_resource_opts)
+        }
+    );
+fields(action_resource_opts) ->
+    emqx_bridge_v2_schema:action_resource_opts_fields(
+        _Overrides = [
+            {batch_size, #{
+                type => range(1, 500),
+                validator => emqx_resource_validator:max(int, 500)
+            }}
+        ]
     );
 fields("config_producer") ->
     emqx_bridge_schema:common_bridge_fields() ++
@@ -84,6 +96,7 @@ fields("resource_opts") ->
 fields("creation_opts") ->
     emqx_resource_schema:create_opts([
         {batch_size, #{
+            type => range(1, 500),
             validator => emqx_resource_validator:max(int, 500)
         }}
     ]);
@@ -199,6 +212,8 @@ desc(action_parameters) ->
     ?DESC("action_parameters");
 desc(connector_resource_opts) ->
     ?DESC(emqx_resource_schema, "resource_opts");
+desc(action_resource_opts) ->
+    ?DESC(emqx_resource_schema, "resource_opts");
 desc(_) ->
     undefined.
 

+ 79 - 78
apps/emqx_bridge_opents/src/emqx_bridge_opents_connector.erl

@@ -292,50 +292,20 @@ try_render_messages([{ChannelId, _} | _] = BatchReq, Channels) ->
 render_channel_message(Msg, #{data := DataList}, Acc) ->
     RawOpts = #{return => rawlist, var_trans => fun(X) -> X end},
     lists:foldl(
-        fun(#{metric := MetricTk, tags := TagsTk, value := ValueTk} = Data, InAcc) ->
+        fun(
+            #{
+                metric := MetricTk,
+                tags := TagsProcer,
+                value := ValueProcer,
+                timestamp := TimeProcer
+            },
+            InAcc
+        ) ->
             MetricVal = emqx_placeholder:proc_tmpl(MetricTk, Msg),
-
-            TagsVal =
-                case TagsTk of
-                    [tags | TagTkList] ->
-                        maps:from_list([
-                            {
-                                emqx_placeholder:proc_tmpl(TagName, Msg),
-                                emqx_placeholder:proc_tmpl(TagValue, Msg)
-                            }
-                         || {TagName, TagValue} <- TagTkList
-                        ]);
-                    TagsTks ->
-                        case emqx_placeholder:proc_tmpl(TagsTks, Msg, RawOpts) of
-                            [undefined] ->
-                                #{};
-                            [Any] ->
-                                Any
-                        end
-                end,
-
-            ValueVal =
-                case ValueTk of
-                    [_] ->
-                        %% just one element, maybe is a variable or a plain text
-                        %% we should keep it as it is
-                        erlang:hd(emqx_placeholder:proc_tmpl(ValueTk, Msg, RawOpts));
-                    Tks when is_list(Tks) ->
-                        emqx_placeholder:proc_tmpl(Tks, Msg);
-                    Raw ->
-                        %% not a token list, just a raw value
-                        Raw
-                end,
-            Base = #{metric => MetricVal, tags => TagsVal, value => ValueVal},
-            [
-                case maps:get(timestamp, Data, undefined) of
-                    undefined ->
-                        Base;
-                    TimestampTk ->
-                        Base#{timestamp => emqx_placeholder:proc_tmpl(TimestampTk, Msg)}
-                end
-                | InAcc
-            ]
+            TagsVal = TagsProcer(Msg, RawOpts),
+            ValueVal = ValueProcer(Msg, RawOpts),
+            Result = TimeProcer(Msg, #{metric => MetricVal, tags => TagsVal, value => ValueVal}),
+            [Result | InAcc]
         end,
         Acc,
         DataList
@@ -345,41 +315,72 @@ preproc_data_template([]) ->
     preproc_data_template(emqx_bridge_opents:default_data_template());
 preproc_data_template(DataList) ->
     lists:map(
-        fun(#{tags := Tags, value := Value} = Data) ->
-            Data2 = maps:without([tags, value], Data),
-            Template = maps:map(
-                fun(_Key, Val) ->
-                    emqx_placeholder:preproc_tmpl(Val)
-                end,
-                Data2
-            ),
-
-            TagsTk =
-                case Tags of
-                    Tmpl when is_binary(Tmpl) ->
-                        emqx_placeholder:preproc_tmpl(Tmpl);
-                    Map when is_map(Map) ->
-                        [
-                            tags
-                            | [
-                                {
-                                    emqx_placeholder:preproc_tmpl(emqx_utils_conv:bin(TagName)),
-                                    emqx_placeholder:preproc_tmpl(TagValue)
-                                }
-                             || {TagName, TagValue} <- maps:to_list(Map)
-                            ]
-                        ]
-                end,
-
-            ValueTk =
-                case Value of
-                    Text when is_binary(Text) ->
-                        emqx_placeholder:preproc_tmpl(Text);
-                    Raw ->
-                        Raw
-                end,
-
-            Template#{tags => TagsTk, value => ValueTk}
+        fun(#{metric := Metric, tags := Tags, value := Value} = Data) ->
+            TagsProcer = mk_tags_procer(Tags),
+            ValueProcer = mk_value_procer(Value),
+            #{
+                metric => emqx_placeholder:preproc_tmpl(Metric),
+                tags => TagsProcer,
+                value => ValueProcer,
+                timestamp => mk_timestamp_procer(Data)
+            }
         end,
         DataList
     ).
+
+mk_tags_procer(Tmpl) when is_binary(Tmpl) ->
+    TagsTks = emqx_placeholder:preproc_tmpl(Tmpl),
+    fun(Msg, RawOpts) ->
+        case emqx_placeholder:proc_tmpl(TagsTks, Msg, RawOpts) of
+            [undefined] ->
+                #{};
+            [Any] ->
+                Any
+        end
+    end;
+mk_tags_procer(Map) when is_map(Map) ->
+    TagTkList = [
+        {
+            emqx_placeholder:preproc_tmpl(emqx_utils_conv:bin(TagName)),
+            emqx_placeholder:preproc_tmpl(TagValue)
+        }
+     || {TagName, TagValue} <- maps:to_list(Map)
+    ],
+    fun(Msg, _RawOpts) ->
+        maps:from_list([
+            {
+                emqx_placeholder:proc_tmpl(TagName, Msg),
+                emqx_placeholder:proc_tmpl(TagValue, Msg)
+            }
+         || {TagName, TagValue} <- TagTkList
+        ])
+    end.
+
+mk_value_procer(Text) when is_binary(Text) ->
+    ValueTk = emqx_placeholder:preproc_tmpl(Text),
+    case ValueTk of
+        [_] ->
+            %% Just one element: it may be a variable or plain text,
+            %% so keep the rendered value as it is.
+            fun(Msg, RawOpts) ->
+                erlang:hd(emqx_placeholder:proc_tmpl(ValueTk, Msg, RawOpts))
+            end;
+        Tks when is_list(Tks) ->
+            fun(Msg, _RawOpts) ->
+                emqx_placeholder:proc_tmpl(Tks, Msg)
+            end
+    end;
+mk_value_procer(Raw) ->
+    fun(_, _) ->
+        Raw
+    end.
+
+mk_timestamp_procer(#{timestamp := Timestamp}) ->
+    TimestampTk = emqx_placeholder:preproc_tmpl(Timestamp),
+    fun(Msg, Base) ->
+        Base#{timestamp => emqx_placeholder:proc_tmpl(TimestampTk, Msg)}
+    end;
+mk_timestamp_procer(_) ->
+    fun(_Msg, Base) ->
+        Base
+    end.
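The OpenTSDB refactor moves template inspection out of the per-message path. `preproc_data_template/1` now builds one closure ("procer") per field, and `render_channel_message/3` only calls those closures. Below is a minimal sketch of the pattern for the optional timestamp. The module name and the `ts` key are hypothetical, and plain `maps:get/2` lookups stand in for `emqx_placeholder` templates.

```erlang
-module(procer_sketch).
-export([mk_timestamp_procer/1]).

%% Hypothetical illustration of the procer pattern: decide once, when the
%% template is prepared, whether a timestamp is configured. The returned
%% closure is what runs for every message. maps:get/2 stands in for
%% emqx_placeholder rendering.
mk_timestamp_procer(#{timestamp := Key}) ->
    fun(Msg, Base) -> Base#{timestamp => maps:get(Key, Msg)} end;
mk_timestamp_procer(_) ->
    %% No timestamp in the template: leave the data point untouched.
    fun(_Msg, Base) -> Base end.
```

The removed code checked the shape of `TagsTk` and `ValueTk` for every message. In the new code, that check runs once per template, and the per-message work is a closure call.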

+ 18 - 12
apps/emqx_bridge_pulsar/test/emqx_bridge_pulsar_v2_SUITE.erl

@@ -212,20 +212,25 @@ t_action(Config) ->
     ?assertEqual(ReqPayload, emqx_utils_json:decode(RespPayload)),
     ok = emqtt:disconnect(C1),
     InstanceId = instance_id(actions, Name),
-    #{counters := Counters} = emqx_resource:get_metrics(InstanceId),
+    ?retry(
+        100,
+        20,
+        ?assertMatch(
+            #{
+                counters := #{
+                    dropped := 0,
+                    success := 1,
+                    matched := 1,
+                    failed := 0,
+                    received := 0
+                }
+            },
+            emqx_resource:get_metrics(InstanceId)
+        )
+    ),
     ok = delete_action(Name),
     ActionsAfterDelete = emqx_bridge_v2:list(actions),
     ?assertNot(lists:any(Any, ActionsAfterDelete), ActionsAfterDelete),
-    ?assertMatch(
-        #{
-            dropped := 0,
-            success := 1,
-            matched := 1,
-            failed := 0,
-            received := 0
-        },
-        Counters
-    ),
     ok.
 
 %%------------------------------------------------------------------------------
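`t_action` now wraps the metrics assertion in `?retry(100, 20, ...)`, and the action config further down sets `metrics_flush_interval` to `300ms`. Together, these changes suggest that the counters are flushed asynchronously and can briefly lag behind the publish. Below is a hypothetical helper with the same retry-then-fail idea. It is not the definition of the `?retry` macro used in the suite.

```erlang
-module(retry_sketch).
-export([retry/3]).

%% Hypothetical helper: call Fun up to N times, sleeping Interval ms between
%% attempts. The last attempt runs outside try, so its failure propagates.
retry(_Interval, N, Fun) when N =< 1 ->
    Fun();
retry(Interval, N, Fun) ->
    try
        Fun()
    catch
        _:_ ->
            timer:sleep(Interval),
            retry(Interval, N - 1, Fun)
    end.
```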
@@ -292,7 +297,8 @@ pulsar_action(Config) ->
                         <<"pulsar_topic">> => ?config(pulsar_topic, Config)
                     },
                     <<"resource_opts">> => #{
-                        <<"health_check_interval">> => <<"1s">>
+                        <<"health_check_interval">> => <<"1s">>,
+                        <<"metrics_flush_interval">> => <<"300ms">>
                     }
                 }
             }

+ 55 - 0
apps/emqx_bridge_rabbitmq/test/emqx_bridge_rabbitmq_v2_SUITE.erl

@@ -12,6 +12,7 @@
 -include_lib("common_test/include/ct.hrl").
 -include_lib("stdlib/include/assert.hrl").
 -include_lib("amqp_client/include/amqp_client.hrl").
+-import(emqx_config_SUITE, [prepare_conf_file/3]).
 
 -import(emqx_bridge_rabbitmq_test_utils, [
     rabbit_mq_exchange/0,
@@ -317,6 +318,60 @@ t_action_not_exist_exchange(_Config) ->
     ?assertNot(lists:any(Any, ActionsAfterDelete), ActionsAfterDelete),
     ok.
 
+t_replace_action_source(Config) ->
+    Action = #{<<"rabbitmq">> => #{<<"my_action">> => rabbitmq_action()}},
+    Source = #{<<"rabbitmq">> => #{<<"my_source">> => rabbitmq_source()}},
+    ConnectorName = atom_to_binary(?MODULE),
+    Connector = #{<<"rabbitmq">> => #{ConnectorName => rabbitmq_connector(get_rabbitmq(Config))}},
+    Rabbitmq = #{
+        <<"actions">> => Action,
+        <<"sources">> => Source,
+        <<"connectors">> => Connector
+    },
+    ConfBin0 = hocon_pp:do(Rabbitmq, #{}),
+    ConfFile0 = prepare_conf_file(?FUNCTION_NAME, ConfBin0, Config),
+    ?assertMatch(ok, emqx_conf_cli:conf(["load", "--replace", ConfFile0])),
+    ?assertMatch(
+        #{<<"rabbitmq">> := #{<<"my_action">> := _}},
+        emqx_config:get_raw([<<"actions">>]),
+        Action
+    ),
+    ?assertMatch(
+        #{<<"rabbitmq">> := #{<<"my_source">> := _}},
+        emqx_config:get_raw([<<"sources">>]),
+        Source
+    ),
+    ?assertMatch(
+        #{<<"rabbitmq">> := #{ConnectorName := _}},
+        emqx_config:get_raw([<<"connectors">>]),
+        Connector
+    ),
+
+    Empty = #{
+        <<"actions">> => #{},
+        <<"sources">> => #{},
+        <<"connectors">> => #{}
+    },
+    ConfBin1 = hocon_pp:do(Empty, #{}),
+    ConfFile1 = prepare_conf_file(?FUNCTION_NAME, ConfBin1, Config),
+    ?assertMatch(ok, emqx_conf_cli:conf(["load", "--replace", ConfFile1])),
+
+    ?assertEqual(#{}, emqx_config:get_raw([<<"actions">>])),
+    ?assertEqual(#{}, emqx_config:get_raw([<<"sources">>])),
+    ?assertMatch(#{}, emqx_config:get_raw([<<"connectors">>])),
+
+    %% restore connectors
+    Rabbitmq2 = #{<<"connectors">> => Connector},
+    ConfBin2 = hocon_pp:do(Rabbitmq2, #{}),
+    ConfFile2 = prepare_conf_file(?FUNCTION_NAME, ConfBin2, Config),
+    ?assertMatch(ok, emqx_conf_cli:conf(["load", "--replace", ConfFile2])),
+    ?assertMatch(
+        #{<<"rabbitmq">> := #{ConnectorName := _}},
+        emqx_config:get_raw([<<"connectors">>]),
+        Connector
+    ),
+    ok.
+
 waiting_for_disconnected_alarms(InstanceId) ->
     waiting_for_disconnected_alarms(InstanceId, 0).
 

+ 1 - 1
apps/emqx_bridge_rocketmq/src/emqx_bridge_rocketmq_connector.erl

@@ -244,7 +244,7 @@ do_query(
     ?TRACE(
         "QUERY",
         "rocketmq_connector_received",
-        #{connector => InstanceId, query => Query, state => State}
+        #{connector => InstanceId, query => Query, state => redact(State)}
     ),
     ChannelId = get_channel_id(Query),
     #{

+ 49 - 11
apps/emqx_bridge_tdengine/src/emqx_bridge_tdengine_connector.erl

@@ -6,10 +6,11 @@
 
 -behaviour(emqx_resource).
 
--include_lib("typerefl/include/types.hrl").
+-include_lib("hocon/include/hoconsc.hrl").
 -include_lib("emqx/include/logger.hrl").
+-include_lib("typerefl/include/types.hrl").
 -include_lib("snabbkaffe/include/snabbkaffe.hrl").
--include_lib("hocon/include/hoconsc.hrl").
+-include_lib("emqx_resource/include/emqx_resource.hrl").
 
 -export([namespace/0, roots/0, fields/1, desc/1]).
 
@@ -209,18 +210,50 @@ on_batch_query(InstanceId, BatchReq, State) ->
     ?SLOG(error, LogMeta#{msg => "invalid_request"}),
     {error, {unrecoverable_error, invalid_request}}.
 
-on_get_status(_InstanceId, #{pool_name := PoolName}) ->
-    Health = emqx_resource_pool:health_check_workers(PoolName, fun ?MODULE:do_get_status/1),
-    status_result(Health).
+on_get_status(_InstanceId, #{pool_name := PoolName} = State) ->
+    case
+        emqx_resource_pool:health_check_workers(
+            PoolName,
+            fun ?MODULE:do_get_status/1,
+            emqx_resource_pool:health_check_timeout(),
+            #{return_values => true}
+        )
+    of
+        {ok, []} ->
+            {?status_connecting, State, undefined};
+        {ok, Values} ->
+            case lists:keyfind(error, 1, Values) of
+                false ->
+                    ?status_connected;
+                {error, Reason} ->
+                    {?status_connecting, State, enhance_reason(Reason)}
+            end;
+        {error, Reason} ->
+            {?status_connecting, State, enhance_reason(Reason)}
+    end.
 
 do_get_status(Conn) ->
-    case tdengine:insert(Conn, "select server_version()", []) of
-        {ok, _} -> true;
-        _ -> false
+    try
+        tdengine:insert(
+            Conn,
+            "select server_version()",
+            [],
+            emqx_resource_pool:health_check_timeout()
+        )
+    of
+        {ok, _} ->
+            true;
+        {error, _} = Error ->
+            Error
+    catch
+        _Type:Reason ->
+            {error, Reason}
     end.
 
-status_result(_Status = true) -> connected;
-status_result(_Status = false) -> connecting.
+enhance_reason(timeout) ->
+    connection_timeout;
+enhance_reason(Reason) ->
+    Reason.
 
 on_add_channel(
     _InstanceId,
@@ -253,7 +286,12 @@ on_get_channels(InstanceId) ->
 on_get_channel_status(InstanceId, ChannelId, #{channels := Channels} = State) ->
     case maps:is_key(ChannelId, Channels) of
         true ->
-            on_get_status(InstanceId, State);
+            case on_get_status(InstanceId, State) of
+                {Status, _State, Reason} ->
+                    {Status, Reason};
+                Status ->
+                    Status
+            end;
         _ ->
             {error, not_exists}
     end.
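The new `on_get_status/2` is essentially a pure mapping from the pool health-check result to a resource status. The sketch below pulls that mapping out so it can be exercised without a TDengine pool. It makes several assumptions:

- The module name `tdengine_status_sketch` is hypothetical.
- The atoms `connected` and `connecting` stand in for whatever `?status_connected` and `?status_connecting` expand to.
- `State` is left out of the returned tuples for brevity.

```erlang
-module(tdengine_status_sketch).
-export([classify/1]).

%% Hypothetical, self-contained version of the status mapping in on_get_status/2.
%% Input: the result of a pool-wide health check run with return_values => true,
%% i.e. {ok, [true | {error, Reason}]} or {error, Reason}.
%% Assumption: connected/connecting stand in for ?status_connected/?status_connecting.
classify({ok, []}) ->
    %% No worker reported back: still connecting, no specific reason.
    {connecting, undefined};
classify({ok, Values}) ->
    %% Any worker error keeps the resource connecting; report the first reason,
    %% like lists:keyfind(error, 1, Values) does in the connector.
    case [R || {error, R} <- Values] of
        [] -> connected;
        [Reason | _] -> {connecting, enhance_reason(Reason)}
    end;
classify({error, Reason}) ->
    {connecting, enhance_reason(Reason)}.

%% Same translation as enhance_reason/1 in the connector.
enhance_reason(timeout) -> connection_timeout;
enhance_reason(Reason) -> Reason.
```

`do_get_status/1` now returns `{error, Reason}` instead of `false`, so the reason survives into the resource status. `on_get_channel_status/3` then reduces that status to `{Status, Reason}`.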

+ 33 - 25
apps/emqx_conf/src/emqx_conf_cli.erl

@@ -245,10 +245,11 @@ load_config_from_raw(RawConf0, Opts) ->
     case check_config(RawConf1) of
         {ok, RawConf} ->
             %% It has been ensured that the connector is always the first configuration to be updated.
-            %% However, when deleting the connector, we need to clean up the dependent actions first;
+            %% However, when deleting the connector, we need to clean up the dependent actions/sources first;
             %% otherwise, the deletion will fail.
-            %% notice: we can't create a action before connector.
-            uninstall_actions(RawConf, Opts),
+            %% Note: an action/source cannot be created before its connector.
+            uninstall(<<"actions">>, RawConf, Opts),
+            uninstall(<<"sources">>, RawConf, Opts),
             Error =
                 lists:filtermap(
                     fun({K, V}) ->
@@ -288,27 +289,33 @@ load_config_from_raw(RawConf0, Opts) ->
             {error, Errors}
     end.
 
-uninstall_actions(#{<<"actions">> := New}, #{mode := replace}) ->
-    Old = emqx_conf:get_raw([<<"actions">>], #{}),
-    #{removed := Removed} = emqx_bridge_v2:diff_confs(New, Old),
-    maps:foreach(
-        fun({Type, Name}, _) ->
-            case emqx_bridge_v2:remove(Type, Name) of
-                ok ->
-                    ok;
-                {error, Reason} ->
-                    ?SLOG(error, #{
-                        msg => "failed_to_remove_action",
-                        type => Type,
-                        name => Name,
-                        error => Reason
-                    })
-            end
-        end,
-        Removed
-    );
-%% we don't delete things when in merge mode or without actions key.
-uninstall_actions(_RawConf, _) ->
+uninstall(ActionOrSource, Conf, #{mode := replace}) ->
+    case maps:find(ActionOrSource, Conf) of
+        {ok, New} ->
+            Old = emqx_conf:get_raw([ActionOrSource], #{}),
+            ActionOrSourceAtom = binary_to_existing_atom(ActionOrSource),
+            #{removed := Removed} = emqx_bridge_v2:diff_confs(New, Old),
+            maps:foreach(
+                fun({Type, Name}, _) ->
+                    case emqx_bridge_v2:remove(ActionOrSourceAtom, Type, Name) of
+                        ok ->
+                            ok;
+                        {error, Reason} ->
+                            ?SLOG(error, #{
+                                msg => "failed_to_remove",
+                                type => Type,
+                                name => Name,
+                                error => Reason
+                            })
+                    end
+                end,
+                Removed
+            );
+        error ->
+            ok
+    end;
+%% Nothing is deleted unless the load runs in replace mode (e.g. not in merge mode).
+uninstall(_, _RawConf, _) ->
     ok.
 
 update_config_cluster(
@@ -481,7 +488,8 @@ filter_readonly_config(Raw) ->
     end.
 
 reload_config(AllConf, Opts) ->
-    uninstall_actions(AllConf, Opts),
+    uninstall(<<"actions">>, AllConf, Opts),
+    uninstall(<<"sources">>, AllConf, Opts),
     Fold = fun({Key, Conf}, Acc) ->
         case update_config_local(Key, Conf, Opts) of
             ok ->
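In replace mode, `uninstall/3` first computes which actions or sources exist in the running config but not in the loaded file. It removes them before any connector is updated. The set difference it relies on can be sketched as follows.

`conf_removed_sketch:removed/2` is a hypothetical stand-in for the internals of `emqx_bridge_v2:diff_confs/2`. Its return shape, a map keyed by `{Type, Name}`, is only inferred from how `Removed` is consumed with `maps:foreach/2`.

```erlang
-module(conf_removed_sketch).
-export([removed/2]).

%% Hypothetical stand-in for the `removed` part of emqx_bridge_v2:diff_confs/2.
%% Both arguments are raw configs shaped as #{Type => #{Name => Conf}}.
%% Returns #{{Type, Name} => OldConf} for every entry present in Old but not in New.
removed(New, Old) ->
    maps:from_list(
        [
            {{Type, Name}, Conf}
         || {Type, Confs} <- maps:to_list(Old),
            {Name, Conf} <- maps:to_list(Confs),
            %% A missing Type in New means all of its names were removed.
            not is_map_key(Name, maps:get(Type, New, #{}))
        ]
    ).
```

The RabbitMQ test loads `actions = {}` with `--replace`. Under this model, every existing action counts as removed and is deleted before the connectors it depends on.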

+ 2 - 0
apps/emqx_connector/src/emqx_connector.erl

@@ -473,6 +473,8 @@ ensure_no_channels(Configs) ->
             fun({Type, ConnectorName}) ->
                 fun(_) ->
                     case emqx_connector_resource:get_channels(Type, ConnectorName) of
+                        {error, not_found} ->
+                            ok;
                         {ok, []} ->
                             ok;
                         {ok, Channels} ->

+ 1 - 1
apps/emqx_connector/src/emqx_connector_api.erl

@@ -532,7 +532,7 @@ call_operation_if_enabled(NodeOrAll, OperFunc, [Nodes, BridgeType, BridgeName])
 is_enabled_connector(ConnectorType, ConnectorName) ->
     try emqx:get_config([connectors, ConnectorType, binary_to_existing_atom(ConnectorName)]) of
         ConfMap ->
-            maps:get(enable, ConfMap, false)
+            maps:get(enable, ConfMap, true)
     catch
         error:{config_not_found, _} ->
             throw(not_found);

+ 14 - 0
apps/emqx_dashboard/include/emqx_dashboard.hrl

@@ -85,3 +85,17 @@
     sent => sent_msg_rate,
     dropped => dropped_msg_rate
 }).
+
+-define(CURRENT_SAMPLE_NON_RATE,
+    [
+        node_uptime,
+        retained_msg_count,
+        shared_subscriptions
+    ] ++ ?LICENSE_QUOTA
+).
+
+-if(?EMQX_RELEASE_EDITION == ee).
+-define(LICENSE_QUOTA, [license_quota]).
+-else.
+-define(LICENSE_QUOTA, []).
+-endif.

+ 8 - 3
apps/emqx_dashboard/src/emqx_dashboard_monitor.erl

@@ -264,6 +264,8 @@ merge_cluster_rate(Node, Cluster) ->
                 NCluster#{topics => V};
             (retained_msg_count, V, NCluster) ->
                 NCluster#{retained_msg_count => V};
+            (shared_subscriptions, V, NCluster) ->
+                NCluster#{shared_subscriptions => V};
             (license_quota, V, NCluster) ->
                 NCluster#{license_quota => V};
             %% for cluster sample, ignore node_uptime
@@ -357,8 +359,8 @@ next_interval() ->
 
 sample(Time) ->
     Fun =
-        fun(Key, Res) ->
-            maps:put(Key, getstats(Key), Res)
+        fun(Key, Acc) ->
+            Acc#{Key => getstats(Key)}
         end,
     Data = lists:foldl(Fun, #{}, ?SAMPLER_LIST),
     #emqx_monit{time = Time, data = Data}.
@@ -416,6 +418,8 @@ stats(live_connections) -> emqx_stats:getstat('live_connections.count');
 stats(cluster_sessions) -> emqx_stats:getstat('cluster_sessions.count');
 stats(topics) -> emqx_stats:getstat('topics.count');
 stats(subscriptions) -> emqx_stats:getstat('subscriptions.count');
+stats(shared_subscriptions) -> emqx_stats:getstat('subscriptions.shared.count');
+stats(retained_msg_count) -> emqx_stats:getstat('retained.count');
 stats(received) -> emqx_metrics:val('messages.received');
 stats(received_bytes) -> emqx_metrics:val('bytes.received');
 stats(sent) -> emqx_metrics:val('messages.sent');
@@ -428,7 +432,8 @@ stats(dropped) -> emqx_metrics:val('messages.dropped').
 %% the non rate values should be same on all nodes
 non_rate_value() ->
     (license_quota())#{
-        retained_msg_count => emqx_retainer:retained_count(),
+        retained_msg_count => stats(retained_msg_count),
+        shared_subscriptions => stats(shared_subscriptions),
         node_uptime => emqx_sys:uptime()
     }.
 

+ 28 - 21
apps/emqx_dashboard/src/emqx_dashboard_monitor_api.erl

@@ -94,7 +94,7 @@ schema("/monitor_current/nodes/:node") ->
             description => ?DESC(current_stats_node),
             parameters => [parameter_node()],
             responses => #{
-                200 => hoconsc:mk(hoconsc:ref(sampler_current), #{}),
+                200 => hoconsc:mk(hoconsc:ref(sampler_current_node), #{}),
                 404 => emqx_dashboard_swagger:error_codes(['NOT_FOUND'], <<"Node not found">>)
             }
         }
@@ -125,8 +125,17 @@ fields(sampler) ->
          || SamplerName <- ?SAMPLER_LIST
         ],
     [{time_stamp, hoconsc:mk(non_neg_integer(), #{desc => <<"Timestamp">>})} | Samplers];
+fields(sampler_current_node) ->
+    fields_current(sample_names(sampler_current_node));
 fields(sampler_current) ->
-    Names = maps:values(?DELTA_SAMPLER_RATE_MAP) ++ ?GAUGE_SAMPLER_LIST,
+    fields_current(sample_names(sampler_current)).
+
+sample_names(sampler_current_node) ->
+    maps:values(?DELTA_SAMPLER_RATE_MAP) ++ ?GAUGE_SAMPLER_LIST ++ ?CURRENT_SAMPLE_NON_RATE;
+sample_names(sampler_current) ->
+    sample_names(sampler_current_node) -- [node_uptime].
+
+fields_current(Names) ->
     [
         {SamplerName, hoconsc:mk(integer(), #{desc => swagger_desc(SamplerName)})}
      || SamplerName <- Names
@@ -167,6 +176,8 @@ current_rate(Node) ->
 %% -------------------------------------------------------------------------------------------------
 %% Internal
 
+-define(APPROXIMATE_DESC, " Can only represent an approximate state.").
+
 swagger_desc(received) ->
     swagger_desc_format("Received messages ");
 swagger_desc(received_bytes) ->
@@ -178,30 +189,18 @@ swagger_desc(sent_bytes) ->
 swagger_desc(dropped) ->
     swagger_desc_format("Dropped messages ");
 swagger_desc(subscriptions) ->
-    <<
-        "Subscriptions at the time of sampling."
-        " Can only represent the approximate state"
-    >>;
+    <<"Subscriptions at the time of sampling.", ?APPROXIMATE_DESC>>;
 swagger_desc(topics) ->
-    <<
-        "Count topics at the time of sampling."
-        " Can only represent the approximate state"
-    >>;
+    <<"Count topics at the time of sampling.", ?APPROXIMATE_DESC>>;
 swagger_desc(connections) ->
-    <<
-        "Sessions at the time of sampling."
-        " Can only represent the approximate state"
-    >>;
+    <<"Sessions at the time of sampling.", ?APPROXIMATE_DESC>>;
 swagger_desc(live_connections) ->
-    <<
-        "Connections at the time of sampling."
-        " Can only represent the approximate state"
-    >>;
+    <<"Connections at the time of sampling.", ?APPROXIMATE_DESC>>;
 swagger_desc(cluster_sessions) ->
     <<
         "Total number of sessions in the cluster at the time of sampling. "
-        "It includes expired sessions when `broker.session_history_retain` is set to a duration greater than `0s`. "
-        "Can only represent the approximate state"
+        "It includes expired sessions when `broker.session_history_retain` is set to a duration greater than `0s`."
+        ?APPROXIMATE_DESC
     >>;
 swagger_desc(received_msg_rate) ->
     swagger_desc_format("Dropped messages ", per);
@@ -210,7 +209,15 @@ swagger_desc(sent_msg_rate) ->
     swagger_desc_format("Sent messages ", per);
 %swagger_desc(sent_bytes_rate)     -> swagger_desc_format("Sent bytes ", per);
 swagger_desc(dropped_msg_rate) ->
-    swagger_desc_format("Dropped messages ", per).
+    swagger_desc_format("Dropped messages ", per);
+swagger_desc(retained_msg_count) ->
+    <<"Number of retained messages at the time of sampling.", ?APPROXIMATE_DESC>>;
+swagger_desc(shared_subscriptions) ->
+    <<"Number of shared subscriptions at the time of sampling.", ?APPROXIMATE_DESC>>;
+swagger_desc(node_uptime) ->
+    <<"Node uptime in seconds. Only present in the `/monitor_current/nodes/:node` response.">>;
+swagger_desc(license_quota) ->
+    <<"License quota, i.e. the maximum number of connections allowed in the cluster.">>.
 
 swagger_desc_format(Format) ->
     swagger_desc_format(Format, last).

+ 89 - 2
apps/emqx_dashboard/src/emqx_dashboard_swagger.erl

@@ -178,8 +178,36 @@ fields(hasnext) ->
     >>,
     Meta = #{desc => Desc, required => true},
     [{hasnext, hoconsc:mk(boolean(), Meta)}];
+fields('after') ->
+    Desc = <<
+        "The value of the \"last\" field returned in the previous response. It can be used"
+        " in subsequent requests to get the next chunk of results.<br/>"
+        "It is used instead of the \"page\" parameter to traverse volatile data.<br/>"
+        "Can be omitted or set to \"none\" to get the first chunk of data.<br/>"
+        "\"last\" = end_of_data is returned if there is no more data.<br/>"
+        "Sending \"after=end_of_data\" back to the server will result in a \"400 Bad Request\""
+        " error response."
+    >>,
+    Meta = #{
+        in => query, desc => Desc, required => false, example => <<"AAYS53qRa0n07AAABFIACg">>
+    },
+    [{'after', hoconsc:mk(hoconsc:union([none, end_of_data, binary()]), Meta)}];
+fields(last) ->
+    Desc = <<
+        "An opaque token that can be used in subsequent requests to get"
+        " the next chunk of results: \"?after={last}\".<br/>"
+        "If there is no more data, \"last\" = end_of_data is returned.<br/>"
+        "Sending \"after=end_of_data\" back to the server will result in a \"400 Bad Request\""
+        " error response."
+    >>,
+    Meta = #{
+        desc => Desc, required => true, example => <<"AAYS53qRa0n07AAABFIACg">>
+    },
+    [{last, hoconsc:mk(hoconsc:union([none, end_of_data, binary()]), Meta)}];
 fields(meta) ->
-    fields(page) ++ fields(limit) ++ fields(count) ++ fields(hasnext).
+    fields(page) ++ fields(limit) ++ fields(count) ++ fields(hasnext);
+fields(continuation_meta) ->
+    fields(last) ++ fields(count).
 
 -spec schema_with_example(hocon_schema:type(), term()) -> hocon_schema:field_schema().
 schema_with_example(Type, Example) ->
@@ -416,20 +444,79 @@ check_parameter(
 check_parameter([], _Bindings, _QueryStr, _Module, NewBindings, NewQueryStr) ->
     {NewBindings, NewQueryStr};
 check_parameter([{Name, Type} | Spec], Bindings, QueryStr, Module, BindingsAcc, QueryStrAcc) ->
-    Schema = ?INIT_SCHEMA#{roots => [{Name, Type}]},
     case hocon_schema:field_schema(Type, in) of
         path ->
+            Schema = ?INIT_SCHEMA#{roots => [{Name, Type}]},
             Option = #{atom_key => true},
             NewBindings = hocon_tconf:check_plain(Schema, Bindings, Option),
             NewBindingsAcc = maps:merge(BindingsAcc, NewBindings),
             check_parameter(Spec, Bindings, QueryStr, Module, NewBindingsAcc, QueryStrAcc);
         query ->
+            Type1 = maybe_wrap_array_qs_param(Type),
+            Schema = ?INIT_SCHEMA#{roots => [{Name, Type1}]},
             Option = #{},
             NewQueryStr = hocon_tconf:check_plain(Schema, QueryStr, Option),
             NewQueryStrAcc = maps:merge(QueryStrAcc, NewQueryStr),
             check_parameter(Spec, Bindings, QueryStr, Module, BindingsAcc, NewQueryStrAcc)
     end.
 
+%% Compatibility layer for minirest 1.4.0, which parses repeated query-string params into lists.
+%% Earlier minirest releases dropped all but the last occurrence of a repeated param.
+
+maybe_wrap_array_qs_param(FieldSchema) ->
+    Conv = hocon_schema:field_schema(FieldSchema, converter),
+    Type = hocon_schema:field_schema(FieldSchema, type),
+    case array_or_single_qs_param(Type, Conv) of
+        any ->
+            FieldSchema;
+        array ->
+            override_conv(FieldSchema, fun wrap_array_conv/2, Conv);
+        single ->
+            override_conv(FieldSchema, fun unwrap_array_conv/2, Conv)
+    end.
+
+array_or_single_qs_param(?ARRAY(_Type), undefined) ->
+    array;
+%% Qs field schema is an array and defines a converter:
+%% don't change (wrap/unwrap) the original value, and let the converter handle it.
+%% For example, it can be a CSV list.
+array_or_single_qs_param(?ARRAY(_Type), _Conv) ->
+    any;
+array_or_single_qs_param(?UNION(Types), _Conv) ->
+    HasArray = lists:any(
+        fun
+            (?ARRAY(_)) -> true;
+            (_) -> false
+        end,
+        Types
+    ),
+    case HasArray of
+        true -> any;
+        false -> single
+    end;
+array_or_single_qs_param(_, _Conv) ->
+    single.
+
+override_conv(FieldSchema, NewConv, OldConv) ->
+    Conv = compose_converters(NewConv, OldConv),
+    hocon_schema:override(FieldSchema, FieldSchema#{converter => Conv}).
+
+compose_converters(NewFun, undefined = _OldFun) ->
+    NewFun;
+compose_converters(NewFun, OldFun) ->
+    case erlang:fun_info(OldFun, arity) of
+        {_, 2} ->
+            fun(V, Opts) -> OldFun(NewFun(V, Opts), Opts) end;
+        {_, 1} ->
+            fun(V, Opts) -> OldFun(NewFun(V, Opts)) end
+    end.
+
+wrap_array_conv(Val, _Opts) when is_list(Val); Val =:= undefined -> Val;
+wrap_array_conv(SingleVal, _Opts) -> [SingleVal].
+
+unwrap_array_conv([HVal | _], _Opts) -> HVal;
+unwrap_array_conv(SingleVal, _Opts) -> SingleVal.
+
 check_request_body(#{body := Body}, Schema, Module, CheckFun, true) ->
     Type0 = hocon_schema:field_schema(Schema, type),
     Type =
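`maybe_wrap_array_qs_param/1` decides, per field, how to treat a query-string value:

- It is forced into a list when the field is an array without its own converter.
- It is reduced to a single value when the field is not an array.
- It is left alone otherwise, i.e. an array with a converter, or a union that contains an array.

The converter part can be sketched independently of hocon. `qs_conv_sketch` and its functions are hypothetical names. Query-string values are assumed to arrive as binaries, so `is_list/1` only matches repeats that minirest already collected.

```erlang
-module(qs_conv_sketch).
-export([wrap/1, unwrap/1, compose/2]).

%% Array-typed param: a single occurrence arrives as a scalar; normalise to a list.
%% Assumes QS values are binaries, so a list means minirest collected repeats.
wrap(Val) when is_list(Val); Val =:= undefined -> Val;
wrap(Single) -> [Single].

%% Scalar-typed param: if a repeated key produced a list, keep its first element,
%% as unwrap_array_conv/2 does in the diff.
unwrap([H | _]) -> H;
unwrap(Single) -> Single.

%% Run NewFun first, then the schema's own converter (arity 1 or 2).
compose(NewFun, undefined) ->
    NewFun;
compose(NewFun, OldFun) when is_function(OldFun, 2) ->
    fun(V, Opts) -> OldFun(NewFun(V, Opts), Opts) end;
compose(NewFun, OldFun) when is_function(OldFun, 1) ->
    fun(V, Opts) -> OldFun(NewFun(V, Opts)) end.
```

Composing rather than replacing the converter means a field's existing converter still sees the value, just in the shape (list or scalar) that its type expects.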

+ 94 - 6
apps/emqx_dashboard/test/emqx_dashboard_monitor_SUITE.erl

@@ -21,24 +21,53 @@
 
 -import(emqx_dashboard_SUITE, [auth_header_/0]).
 
--include_lib("eunit/include/eunit.hrl").
 -include("emqx_dashboard.hrl").
+-include_lib("eunit/include/eunit.hrl").
 
 -define(SERVER, "http://127.0.0.1:18083").
 -define(BASE_PATH, "/api/v5").
 
+-define(BASE_RETAINER_CONF, <<
+    "retainer {\n"
+    "    enable = true\n"
+    "    msg_clear_interval = 0s\n"
+    "    msg_expiry_interval = 0s\n"
+    "    max_payload_size = 1MB\n"
+    "    flow_control {\n"
+    "        batch_read_number = 0\n"
+    "        batch_deliver_number = 0\n"
+    "    }\n"
+    "    backend {\n"
+    "        type = built_in_database\n"
+    "        storage_type = ram\n"
+    "        max_retained_messages = 0\n"
+    "    }\n"
+    "}"
+>>).
+
+%%--------------------------------------------------------------------
+%% CT boilerplate
+%%--------------------------------------------------------------------
+
 all() ->
     emqx_common_test_helpers:all(?MODULE).
 
 init_per_suite(Config) ->
-    meck:new(emqx_retainer, [non_strict, passthrough, no_history, no_link]),
-    meck:expect(emqx_retainer, retained_count, fun() -> 0 end),
-    emqx_mgmt_api_test_util:init_suite([]),
+    ok = emqx_mgmt_api_test_util:init_suite([emqx, emqx_conf, emqx_retainer]),
     Config.
 
 end_per_suite(_Config) ->
-    meck:unload([emqx_retainer]),
-    emqx_mgmt_api_test_util:end_suite([]).
+    emqx_mgmt_api_test_util:end_suite([emqx_retainer]).
+
+set_special_configs(emqx_retainer) ->
+    emqx_retainer:update_config(?BASE_RETAINER_CONF),
+    ok;
+set_special_configs(_App) ->
+    ok.
+
+%%--------------------------------------------------------------------
+%% Test Cases
+%%--------------------------------------------------------------------
 
 t_monitor_samplers_all(_Config) ->
     timer:sleep(?DEFAULT_SAMPLE_INTERVAL * 2 * 1000 + 20),
@@ -112,6 +141,65 @@ t_monitor_current_api_live_connections(_) ->
     {ok, _} = emqtt:connect(C2),
     ok = emqtt:disconnect(C2).
 
+t_monitor_current_retained_count(_) ->
+    process_flag(trap_exit, true),
+    ClientId = <<"live_conn_tests">>,
+    {ok, C} = emqtt:start_link([{clean_start, false}, {clientid, ClientId}]),
+    {ok, _} = emqtt:connect(C),
+    _ = emqtt:publish(C, <<"t1">>, <<"qos1-retain">>, [{qos, 1}, {retain, true}]),
+
+    ok = waiting_emqx_stats_and_monitor_update('retained.count'),
+    {ok, Res} = request(["monitor_current"]),
+    {ok, ResNode} = request(["monitor_current", "nodes", node()]),
+
+    ?assertEqual(1, maps:get(<<"retained_msg_count">>, Res)),
+    ?assertEqual(1, maps:get(<<"retained_msg_count">>, ResNode)),
+    ok = emqtt:disconnect(C),
+    ok.
+
+t_monitor_current_shared_subscription(_) ->
+    process_flag(trap_exit, true),
+    ShareT = <<"$share/group1/t/1">>,
+    AssertFun = fun(Num) ->
+        {ok, Res} = request(["monitor_current"]),
+        {ok, ResNode} = request(["monitor_current", "nodes", node()]),
+        ?assertEqual(Num, maps:get(<<"shared_subscriptions">>, Res)),
+        ?assertEqual(Num, maps:get(<<"shared_subscriptions">>, ResNode)),
+        ok
+    end,
+
+    ok = AssertFun(0),
+
+    ClientId1 = <<"live_conn_tests1">>,
+    ClientId2 = <<"live_conn_tests2">>,
+    {ok, C1} = emqtt:start_link([{clean_start, false}, {clientid, ClientId1}]),
+    {ok, _} = emqtt:connect(C1),
+    _ = emqtt:subscribe(C1, {ShareT, 1}),
+
+    ok = AssertFun(1),
+
+    {ok, C2} = emqtt:start_link([{clean_start, true}, {clientid, ClientId2}]),
+    {ok, _} = emqtt:connect(C2),
+    _ = emqtt:subscribe(C2, {ShareT, 1}),
+    ok = AssertFun(2),
+
+    _ = emqtt:unsubscribe(C2, ShareT),
+    ok = AssertFun(1),
+    _ = emqtt:subscribe(C2, {ShareT, 1}),
+    ok = AssertFun(2),
+
+    ok = emqtt:disconnect(C1),
+    %% C1 connected with clean_start = false (MQTT 3.1.1), so after it disconnects
+    %% its session process, which still holds the shared subscription, stays alive.
+    ok = AssertFun(2),
+
+    _ = emqx_cm:kick_session(ClientId1),
+    ok = AssertFun(1),
+
+    ok = emqtt:disconnect(C2),
+    ok = AssertFun(0),
+    ok.
+
 t_monitor_reset(_) ->
     restart_monitor(),
     {ok, Rate} = request(["monitor_current"]),

+ 7 - 3
apps/emqx_durable_storage/src/emqx_ds.erl

@@ -68,6 +68,8 @@
     make_iterator_result/1, make_iterator_result/0,
     make_delete_iterator_result/1, make_delete_iterator_result/0,
 
+    error/1,
+
     ds_specific_stream/0,
     ds_specific_iterator/0,
     ds_specific_generation_rank/0,
@@ -118,14 +120,14 @@
 
 -type message_key() :: binary().
 
--type store_batch_result() :: ok | {error, _}.
+-type store_batch_result() :: ok | error(_).
 
--type make_iterator_result(Iterator) :: {ok, Iterator} | {error, _}.
+-type make_iterator_result(Iterator) :: {ok, Iterator} | error(_).
 
 -type make_iterator_result() :: make_iterator_result(iterator()).
 
 -type next_result(Iterator) ::
-    {ok, Iterator, [{message_key(), emqx_types:message()}]} | {ok, end_of_stream} | {error, _}.
+    {ok, Iterator, [{message_key(), emqx_types:message()}]} | {ok, end_of_stream} | error(_).
 
 -type next_result() :: next_result(iterator()).
 
@@ -142,6 +144,8 @@
 
 -type delete_next_result() :: delete_next_result(delete_iterator()).
 
+-type error(Reason) :: {error, recoverable | unrecoverable, Reason}.
+
 %% Timestamp
 %% Earliest possible timestamp is 0.
 %% TODO granularity?  Currently, we should always use milliseconds, as that's the unit we

+ 35 - 25
apps/emqx_durable_storage/src/emqx_ds_replication_layer.erl

@@ -195,7 +195,12 @@ drop_db(DB) ->
 -spec store_batch(emqx_ds:db(), [emqx_types:message(), ...], emqx_ds:message_store_opts()) ->
     emqx_ds:store_batch_result().
 store_batch(DB, Messages, Opts) ->
-    emqx_ds_replication_layer_egress:store_batch(DB, Messages, Opts).
+    try
+        emqx_ds_replication_layer_egress:store_batch(DB, Messages, Opts)
+    catch
+        error:{Reason, _Call} when Reason == timeout; Reason == noproc ->
+            {error, recoverable, Reason}
+    end.
 
 -spec get_streams(emqx_ds:db(), emqx_ds:topic_filter(), emqx_ds:time()) ->
     [{emqx_ds:stream_rank(), stream()}].
@@ -204,7 +209,14 @@ get_streams(DB, TopicFilter, StartTime) ->
     lists:flatmap(
         fun(Shard) ->
             Node = node_of_shard(DB, Shard),
-            Streams = emqx_ds_proto_v4:get_streams(Node, DB, Shard, TopicFilter, StartTime),
+            Streams =
+                try
+                    emqx_ds_proto_v4:get_streams(Node, DB, Shard, TopicFilter, StartTime)
+                catch
+                    error:{erpc, _} ->
+                        %% TODO: log?
+                        []
+                end,
             lists:map(
                 fun({RankY, StorageLayerStream}) ->
                     RankX = Shard,
@@ -240,11 +252,14 @@ get_delete_streams(DB, TopicFilter, StartTime) ->
 make_iterator(DB, Stream, TopicFilter, StartTime) ->
     ?stream_v2(Shard, StorageStream) = Stream,
     Node = node_of_shard(DB, Shard),
-    case emqx_ds_proto_v4:make_iterator(Node, DB, Shard, StorageStream, TopicFilter, StartTime) of
+    try emqx_ds_proto_v4:make_iterator(Node, DB, Shard, StorageStream, TopicFilter, StartTime) of
         {ok, Iter} ->
             {ok, #{?tag => ?IT, ?shard => Shard, ?enc => Iter}};
-        Err = {error, _} ->
-            Err
+        Error = {error, _, _} ->
+            Error
+    catch
+        error:RPCError = {erpc, _} ->
+            {error, recoverable, RPCError}
     end.
 
 -spec make_delete_iterator(emqx_ds:db(), delete_stream(), emqx_ds:topic_filter(), emqx_ds:time()) ->
@@ -263,28 +278,19 @@ make_delete_iterator(DB, Stream, TopicFilter, StartTime) ->
             Err
     end.
 
--spec update_iterator(
-    emqx_ds:db(),
-    iterator(),
-    emqx_ds:message_key()
-) ->
+-spec update_iterator(emqx_ds:db(), iterator(), emqx_ds:message_key()) ->
     emqx_ds:make_iterator_result(iterator()).
 update_iterator(DB, OldIter, DSKey) ->
     #{?tag := ?IT, ?shard := Shard, ?enc := StorageIter} = OldIter,
     Node = node_of_shard(DB, Shard),
-    case
-        emqx_ds_proto_v4:update_iterator(
-            Node,
-            DB,
-            Shard,
-            StorageIter,
-            DSKey
-        )
-    of
+    try emqx_ds_proto_v4:update_iterator(Node, DB, Shard, StorageIter, DSKey) of
         {ok, Iter} ->
             {ok, #{?tag => ?IT, ?shard => Shard, ?enc => Iter}};
-        Err = {error, _} ->
-            Err
+        Error = {error, _, _} ->
+            Error
+    catch
+        error:RPCError = {erpc, _} ->
+            {error, recoverable, RPCError}
     end.
 
 -spec next(emqx_ds:db(), iterator(), pos_integer()) -> emqx_ds:next_result(iterator()).
@@ -303,8 +309,12 @@ next(DB, Iter0, BatchSize) ->
         {ok, StorageIter, Batch} ->
             Iter = Iter0#{?enc := StorageIter},
             {ok, Iter, Batch};
-        Other ->
-            Other
+        Ok = {ok, _} ->
+            Ok;
+        Error = {error, _, _} ->
+            Error;
+        RPCError = {badrpc, _} ->
+            {error, recoverable, RPCError}
     end.
 
 -spec delete_next(emqx_ds:db(), delete_iterator(), emqx_ds:delete_selector(), pos_integer()) ->
@@ -408,7 +418,7 @@ do_get_streams_v2(DB, Shard, TopicFilter, StartTime) ->
     emqx_ds:topic_filter(),
     emqx_ds:time()
 ) ->
-    {ok, emqx_ds_storage_layer:iterator()} | {error, _}.
+    emqx_ds:make_iterator_result(emqx_ds_storage_layer:iterator()).
 do_make_iterator_v1(_DB, _Shard, _Stream, _TopicFilter, _StartTime) ->
     error(obsolete_api).
 
@@ -419,7 +429,7 @@ do_make_iterator_v1(_DB, _Shard, _Stream, _TopicFilter, _StartTime) ->
     emqx_ds:topic_filter(),
     emqx_ds:time()
 ) ->
-    {ok, emqx_ds_storage_layer:iterator()} | {error, _}.
+    emqx_ds:make_iterator_result(emqx_ds_storage_layer:iterator()).
 do_make_iterator_v2(DB, Shard, Stream, TopicFilter, StartTime) ->
     emqx_ds_storage_layer:make_iterator({DB, Shard}, Stream, TopicFilter, StartTime).
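The new `emqx_ds:error/1` type splits failures into two kinds:

- `recoverable`: RPC failures, i.e. `erpc` exceptions, `{badrpc, _}`, and egress `timeout`/`noproc`.
- `unrecoverable`: storage-level problems such as `generation_not_found`.

A caller could use this distinction to drive a bounded retry. `ds_retry_sketch:with_retry/2` below is a hypothetical helper, not part of `emqx_ds`, and it omits backoff between attempts for brevity.

```erlang
-module(ds_retry_sketch).
-export([with_retry/2]).

%% Hypothetical helper: calls Fun() up to Attempts times, retrying only while it
%% returns {error, recoverable, _} (the shape of emqx_ds:error/1).
%% Any other result, including {error, unrecoverable, _}, is returned as is.
%% No delay between attempts: a real caller would likely add backoff.
with_retry(Fun, Attempts) when is_integer(Attempts), Attempts >= 1 ->
    case Fun() of
        {error, recoverable, _} = Error when Attempts =:= 1 ->
            Error;
        {error, recoverable, _} ->
            with_retry(Fun, Attempts - 1);
        Other ->
            Other
    end.
```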
 

+ 11 - 12
apps/emqx_durable_storage/src/emqx_ds_storage_bitfield_lts.erl

@@ -245,7 +245,7 @@ drop(_Shard, DBHandle, GenId, CFRefs, #s{}) ->
     emqx_ds_storage_layer:shard_id(), s(), [emqx_types:message()], emqx_ds:message_store_opts()
 ) ->
     emqx_ds:store_batch_result().
-store_batch(_ShardId, S = #s{db = DB, data = Data}, Messages, _Options = #{atomic := true}) ->
+store_batch(_ShardId, S = #s{db = DB, data = Data}, Messages, _Options) ->
     {ok, Batch} = rocksdb:batch(),
     lists:foreach(
         fun(Msg) ->
@@ -255,18 +255,17 @@ store_batch(_ShardId, S = #s{db = DB, data = Data}, Messages, _Options = #{atomi
         end,
         Messages
     ),
-    Res = rocksdb:write_batch(DB, Batch, _WriteOptions = []),
+    Result = rocksdb:write_batch(DB, Batch, []),
     rocksdb:release_batch(Batch),
-    Res;
-store_batch(_ShardId, S = #s{db = DB, data = Data}, Messages, _Options) ->
-    lists:foreach(
-        fun(Msg) ->
-            {Key, _} = make_key(S, Msg),
-            Val = serialize(Msg),
-            rocksdb:put(DB, Data, Key, Val, [])
-        end,
-        Messages
-    ).
+    %% NOTE
+    %% Strictly speaking, `{error, incomplete}` is a valid result, but it should be impossible
+    %% to observe unless `{no_slowdown, true}` is set in the write options.
+    case Result of
+        ok ->
+            ok;
+        {error, {error, Reason}} ->
+            {error, unrecoverable, {rocksdb, Reason}}
+    end.
 
 -spec get_streams(
     emqx_ds_storage_layer:shard_id(),

+ 5 - 7
apps/emqx_durable_storage/src/emqx_ds_storage_layer.erl

@@ -302,7 +302,7 @@ make_iterator(
                     Err
             end;
         {error, not_found} ->
-            {error, end_of_stream}
+            {error, unrecoverable, generation_not_found}
     end.
 
 -spec make_delete_iterator(shard_id(), delete_stream(), emqx_ds:topic_filter(), emqx_ds:time()) ->
@@ -326,9 +326,7 @@ make_delete_iterator(
             {error, end_of_stream}
     end.
 
--spec update_iterator(
-    shard_id(), iterator(), emqx_ds:message_key()
-) ->
+-spec update_iterator(shard_id(), iterator(), emqx_ds:message_key()) ->
     emqx_ds:make_iterator_result(iterator()).
 update_iterator(
     Shard,
@@ -348,7 +346,7 @@ update_iterator(
                     Err
             end;
         {error, not_found} ->
-            {error, end_of_stream}
+            {error, unrecoverable, generation_not_found}
     end.
 
 -spec next(shard_id(), iterator(), pos_integer()) ->
@@ -365,12 +363,12 @@ next(Shard, Iter = #{?tag := ?IT, ?generation := GenId, ?enc := GenIter0}, Batch
                     {ok, end_of_stream};
                 {ok, GenIter, Batch} ->
                     {ok, Iter#{?enc := GenIter}, Batch};
-                Error = {error, _} ->
+                Error = {error, _, _} ->
                     Error
             end;
         {error, not_found} ->
             %% generation was possibly dropped by GC
-            {ok, end_of_stream}
+            {error, unrecoverable, generation_not_found}
     end.
 
 -spec delete_next(shard_id(), delete_iterator(), emqx_ds:delete_selector(), pos_integer()) ->

+ 3 - 5
apps/emqx_durable_storage/src/proto/emqx_ds_proto_v4.erl

@@ -67,7 +67,7 @@ get_streams(Node, DB, Shard, TopicFilter, Time) ->
     emqx_ds:topic_filter(),
     emqx_ds:time()
 ) ->
-    {ok, emqx_ds_storage_layer:iterator()} | {error, _}.
+    emqx_ds:make_iterator_result().
 make_iterator(Node, DB, Shard, Stream, TopicFilter, StartTime) ->
     erpc:call(Node, emqx_ds_replication_layer, do_make_iterator_v2, [
         DB, Shard, Stream, TopicFilter, StartTime
@@ -80,9 +80,7 @@ make_iterator(Node, DB, Shard, Stream, TopicFilter, StartTime) ->
     emqx_ds_storage_layer:iterator(),
     pos_integer()
 ) ->
-    {ok, emqx_ds_storage_layer:iterator(), [{emqx_ds:message_key(), [emqx_types:message()]}]}
-    | {ok, end_of_stream}
-    | {error, _}.
+    emqx_rpc:call_result(emqx_ds:next_result()).
 next(Node, DB, Shard, Iter, BatchSize) ->
     emqx_rpc:call(Shard, Node, emqx_ds_replication_layer, do_next_v1, [DB, Shard, Iter, BatchSize]).
 
@@ -106,7 +104,7 @@ store_batch(Node, DB, Shard, Batch, Options) ->
     emqx_ds_storage_layer:iterator(),
     emqx_ds:message_key()
 ) ->
-    {ok, emqx_ds_storage_layer:iterator()} | {error, _}.
+    emqx_ds:make_iterator_result().
 update_iterator(Node, DB, Shard, OldIter, DSKey) ->
     erpc:call(Node, emqx_ds_replication_layer, do_update_iterator_v2, [
         DB, Shard, OldIter, DSKey

+ 111 - 11
apps/emqx_durable_storage/test/emqx_ds_SUITE.erl

@@ -21,6 +21,7 @@
 -include_lib("emqx/include/emqx.hrl").
 -include_lib("common_test/include/ct.hrl").
 -include_lib("stdlib/include/assert.hrl").
+-include_lib("emqx/include/asserts.hrl").
 -include_lib("snabbkaffe/include/snabbkaffe.hrl").
 
 -define(N_SHARDS, 1).
@@ -446,7 +447,10 @@ t_drop_generation_with_never_used_iterator(_Config) ->
     ],
     ?assertMatch(ok, emqx_ds:store_batch(DB, Msgs1)),
 
-    ?assertMatch({ok, end_of_stream, []}, iterate(DB, Iter0, 1)),
+    ?assertMatch(
+        {error, unrecoverable, generation_not_found, []},
+        iterate(DB, Iter0, 1)
+    ),
 
     %% New iterator for the new stream will only see the later messages.
     [{_, Stream1}] = emqx_ds:get_streams(DB, TopicFilter, StartTime),
@@ -495,9 +499,10 @@ t_drop_generation_with_used_once_iterator(_Config) ->
     ],
     ?assertMatch(ok, emqx_ds:store_batch(DB, Msgs1)),
 
-    ?assertMatch({ok, end_of_stream, []}, iterate(DB, Iter1, 1)),
-
-    ok.
+    ?assertMatch(
+        {error, unrecoverable, generation_not_found, []},
+        iterate(DB, Iter1, 1)
+    ).
 
 t_drop_generation_update_iterator(_Config) ->
     %% This checks the behavior of `emqx_ds:update_iterator' after the generation
@@ -523,9 +528,10 @@ t_drop_generation_update_iterator(_Config) ->
     ok = emqx_ds:add_generation(DB),
     ok = emqx_ds:drop_generation(DB, GenId0),
 
-    ?assertEqual({error, end_of_stream}, emqx_ds:update_iterator(DB, Iter1, Key2)),
-
-    ok.
+    ?assertEqual(
+        {error, unrecoverable, generation_not_found},
+        emqx_ds:update_iterator(DB, Iter1, Key2)
+    ).
 
 t_make_iterator_stale_stream(_Config) ->
     %% This checks the behavior of `emqx_ds:make_iterator' after the generation underlying
@@ -549,7 +555,7 @@ t_make_iterator_stale_stream(_Config) ->
     ok = emqx_ds:drop_generation(DB, GenId0),
 
     ?assertEqual(
-        {error, end_of_stream},
+        {error, unrecoverable, generation_not_found},
         emqx_ds:make_iterator(DB, Stream0, TopicFilter, StartTime)
     ),
 
@@ -590,9 +596,99 @@ t_get_streams_concurrently_with_drop_generation(_Config) ->
             ok
         end,
         []
+    ).
+
+t_error_mapping_replication_layer(_Config) ->
+    %% This checks that the replication layer maps recoverable errors correctly.
+
+    ok = emqx_ds_test_helpers:mock_rpc(),
+    ok = snabbkaffe:start_trace(),
+
+    DB = ?FUNCTION_NAME,
+    ?assertMatch(ok, emqx_ds:open_db(DB, (opts())#{n_shards => 2})),
+    [Shard1, Shard2] = emqx_ds_replication_layer_meta:shards(DB),
+
+    TopicFilter = emqx_topic:words(<<"foo/#">>),
+    Msgs = [
+        message(<<"C1">>, <<"foo/bar">>, <<"1">>, 0),
+        message(<<"C1">>, <<"foo/baz">>, <<"2">>, 1),
+        message(<<"C2">>, <<"foo/foo">>, <<"3">>, 2),
+        message(<<"C3">>, <<"foo/xyz">>, <<"4">>, 3),
+        message(<<"C4">>, <<"foo/bar">>, <<"5">>, 4),
+        message(<<"C5">>, <<"foo/oof">>, <<"6">>, 5)
+    ],
+
+    ?assertMatch(ok, emqx_ds:store_batch(DB, Msgs)),
+
+    ?block_until(#{?snk_kind := emqx_ds_replication_layer_egress_flush, shard := Shard1}),
+    ?block_until(#{?snk_kind := emqx_ds_replication_layer_egress_flush, shard := Shard2}),
+
+    Streams0 = emqx_ds:get_streams(DB, TopicFilter, 0),
+    Iterators0 = lists:map(
+        fun({_Rank, S}) ->
+            {ok, Iter} = emqx_ds:make_iterator(DB, S, TopicFilter, 0),
+            Iter
+        end,
+        Streams0
     ),
 
-    ok.
+    %% Disrupt the link to the second shard.
+    ok = emqx_ds_test_helpers:mock_rpc_result(
+        fun(_Node, emqx_ds_replication_layer, _Function, Args) ->
+            case Args of
+                [DB, Shard1 | _] -> passthrough;
+                [DB, Shard2 | _] -> unavailable
+            end
+        end
+    ),
+
+    %% The result of `emqx_ds:get_streams/3` will contain only partial results, not an error.
+    Streams1 = emqx_ds:get_streams(DB, TopicFilter, 0),
+    ?assert(
+        length(Streams1) > 0 andalso length(Streams1) =< length(Streams0),
+        Streams1
+    ),
+
+    %% At least one of the `emqx_ds:make_iterator/4` calls will end in an error.
+    Results1 = lists:map(
+        fun({_Rank, S}) ->
+            case emqx_ds:make_iterator(DB, S, TopicFilter, 0) of
+                Ok = {ok, _Iter} ->
+                    Ok;
+                Error = {error, recoverable, {erpc, _}} ->
+                    Error;
+                Other ->
+                    ct:fail({unexpected_result, Other})
+            end
+        end,
+        Streams0
+    ),
+    ?assert(
+        length([error || {error, _, _} <- Results1]) > 0,
+        Results1
+    ),
+
+    %% At least one `emqx_ds:next/3` call over the initial set of iterators will end in an error.
+    Results2 = lists:map(
+        fun(Iter) ->
+            case emqx_ds:next(DB, Iter, _BatchSize = 42) of
+                Ok = {ok, _Iter, [_ | _]} ->
+                    Ok;
+                Error = {error, recoverable, {badrpc, _}} ->
+                    Error;
+                Other ->
+                    ct:fail({unexpected_result, Other})
+            end
+        end,
+        Iterators0
+    ),
+    ?assert(
+        length([error || {error, _, _} <- Results2]) > 0,
+        Results2
+    ),
+
+    snabbkaffe:stop(),
+    meck:unload().
 
 update_data_set() ->
     [
@@ -628,6 +724,10 @@ fetch_all(DB, TopicFilter, StartTime) ->
         Streams
     ).
 
+message(ClientId, Topic, Payload, PublishedAt) ->
+    Msg = message(Topic, Payload, PublishedAt),
+    Msg#message{from = ClientId}.
+
 message(Topic, Payload, PublishedAt) ->
     #message{
         topic = Topic,
@@ -647,8 +747,8 @@ iterate(DB, It0, BatchSize, Acc) ->
             iterate(DB, It, BatchSize, Acc ++ Msgs);
         {ok, end_of_stream} ->
             {ok, end_of_stream, Acc};
-        Ret ->
-            Ret
+        {error, Class, Reason} ->
+            {error, Class, Reason, Acc}
     end.
 
 delete(DB, It, Selector, BatchSize) ->

+ 58 - 0
apps/emqx_durable_storage/test/emqx_ds_test_helpers.erl

@@ -0,0 +1,58 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2024 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+-module(emqx_ds_test_helpers).
+
+-compile(export_all).
+-compile(nowarn_export_all).
+
+%% RPC mocking
+
+mock_rpc() ->
+    ok = meck:new(erpc, [passthrough, no_history, unstick]),
+    ok = meck:new(gen_rpc, [passthrough, no_history]).
+
+unmock_rpc() ->
+    catch meck:unload(erpc),
+    catch meck:unload(gen_rpc).
+
+mock_rpc_result(ExpectFun) ->
+    mock_rpc_result(erpc, ExpectFun),
+    mock_rpc_result(gen_rpc, ExpectFun).
+
+mock_rpc_result(erpc, ExpectFun) ->
+    ok = meck:expect(erpc, call, fun(Node, Mod, Function, Args) ->
+        case ExpectFun(Node, Mod, Function, Args) of
+            passthrough ->
+                meck:passthrough([Node, Mod, Function, Args]);
+            unavailable ->
+                meck:exception(error, {erpc, noconnection});
+            {timeout, Timeout} ->
+                ok = timer:sleep(Timeout),
+                meck:exception(error, {erpc, timeout})
+        end
+    end);
+mock_rpc_result(gen_rpc, ExpectFun) ->
+    ok = meck:expect(gen_rpc, call, fun(Dest = {Node, _}, Mod, Function, Args) ->
+        case ExpectFun(Node, Mod, Function, Args) of
+            passthrough ->
+                meck:passthrough([Dest, Mod, Function, Args]);
+            unavailable ->
+                {badtcp, econnrefused};
+            {timeout, Timeout} ->
+                ok = timer:sleep(Timeout),
+                {badrpc, timeout}
+        end
+    end).

+ 1 - 1
apps/emqx_ldap/src/emqx_ldap.app.src

@@ -1,6 +1,6 @@
 {application, emqx_ldap, [
     {description, "EMQX LDAP Connector"},
-    {vsn, "0.1.7"},
+    {vsn, "0.1.8"},
     {registered, []},
     {applications, [
         kernel,

+ 1 - 1
apps/emqx_ldap/src/emqx_ldap.erl

@@ -327,7 +327,7 @@ do_ldap_query(
 mk_log_func(LogTag) ->
     fun(_Level, Format, Args) ->
         ?SLOG(
-            info,
+            debug,
             #{
                 msg => LogTag,
                 log => io_lib:format(Format, [redact_ldap_log(Arg) || Arg <- Args])

+ 3 - 0
apps/emqx_management/include/emqx_mgmt.hrl

@@ -15,3 +15,6 @@
 %%--------------------------------------------------------------------
 
 -define(DEFAULT_ROW_LIMIT, 100).
+
+-define(URL_PARAM_INTEGER, url_param_integer).
+-define(URL_PARAM_BINARY, url_param_binary).

+ 58 - 31
apps/emqx_management/src/emqx_mgmt.erl

@@ -18,6 +18,7 @@
 
 -include("emqx_mgmt.hrl").
 -include_lib("emqx/include/emqx_cm.hrl").
+-include_lib("emqx/include/logger.hrl").
 
 -elvis([{elvis_style, invalid_dynamic_call, disable}]).
 -elvis([{elvis_style, god_modules, disable}]).
@@ -52,6 +53,7 @@
     kickout_clients/1,
     list_authz_cache/1,
     list_client_subscriptions/1,
+    list_client_msgs/3,
     client_subscriptions/2,
     clean_authz_cache/1,
     clean_authz_cache/2,
@@ -116,6 +118,13 @@
 
 -elvis([{elvis_style, god_modules, disable}]).
 
+-define(maybe_log_node_errors(LogData, Errors),
+    case Errors of
+        [] -> ok;
+        _ -> ?SLOG(error, (LogData)#{node_errors => Errors})
+    end
+).
+
 %%--------------------------------------------------------------------
 %% Node Info
 %%--------------------------------------------------------------------
@@ -184,7 +193,7 @@ get_sys_memory() ->
     end.
 
 node_info(Nodes) ->
-    emqx_rpc:unwrap_erpc(emqx_management_proto_v4:node_info(Nodes)).
+    emqx_rpc:unwrap_erpc(emqx_management_proto_v5:node_info(Nodes)).
 
 stopped_node_info(Node) ->
     {Node, #{node => Node, node_status => 'stopped', role => core}}.
@@ -204,23 +213,17 @@ cpu_stats() ->
         false ->
             [];
         true ->
-            Idle = vm_stats('cpu.idle'),
-            [
-                {cpu_idle, Idle},
-                {cpu_use, 100 - Idle}
-            ]
+            vm_stats('cpu')
     end.
 
-vm_stats('cpu.idle') ->
-    case emqx_vm:cpu_util([detailed]) of
-        {_Num, _Use, List, _} when is_list(List) -> proplists:get_value(idle, List, 0);
-        %% return {all, 0, 0, []} when cpu_sup is not started
-        _ -> 0
-    end;
-vm_stats('cpu.use') ->
-    case vm_stats('cpu.idle') of
-        0 -> 0;
-        Idle -> 100 - Idle
+vm_stats('cpu') ->
+    CpuUtilArg = [],
+    case emqx_vm:cpu_util([CpuUtilArg]) of
+        %% return 0.0 when `emqx_cpu_sup_worker` is not started
+        {all, Use, Idle, _} ->
+            [{cpu_use, Use}, {cpu_idle, Idle}];
+        _ ->
+            [{cpu_use, 0}, {cpu_idle, 0}]
     end;
 vm_stats('total.memory') ->
     {_, MemTotal} = get_sys_memory(),
@@ -253,7 +256,7 @@ convert_broker_info({K, V}, M) ->
     M#{K => iolist_to_binary(V)}.
 
 broker_info(Nodes) ->
-    emqx_rpc:unwrap_erpc(emqx_management_proto_v4:broker_info(Nodes)).
+    emqx_rpc:unwrap_erpc(emqx_management_proto_v5:broker_info(Nodes)).
 
 %%--------------------------------------------------------------------
 %% Metrics and Stats
@@ -366,7 +369,7 @@ kickout_client(Node, ClientId) ->
 
 kickout_clients(ClientIds) when is_list(ClientIds) ->
     F = fun(Node) ->
-        emqx_management_proto_v4:kickout_clients(Node, ClientIds)
+        emqx_management_proto_v5:kickout_clients(Node, ClientIds)
     end,
     Results = lists:map(F, emqx:running_nodes()),
     case lists:filter(fun(Res) -> Res =/= ok end, Results) of
@@ -417,6 +420,12 @@ list_client_subscriptions_mem(ClientId) ->
             end
     end.
 
+list_client_msgs(MsgsType, ClientId, PagerParams) when
+    MsgsType =:= inflight_msgs;
+    MsgsType =:= mqueue_msgs
+->
+    call_client(ClientId, {MsgsType, PagerParams}).
+
 client_subscriptions(Node, ClientId) ->
     {Node, unwrap_rpc(emqx_broker_proto_v1:list_client_subscriptions(Node, ClientId))}.
 
@@ -460,17 +469,34 @@ set_keepalive(_ClientId, _Interval) ->
 
 %% @private
 call_client(ClientId, Req) ->
-    Results = [call_client(Node, ClientId, Req) || Node <- emqx:running_nodes()],
-    Expected = lists:filter(
+    case emqx_cm_registry:is_enabled() of
+        true ->
+            do_call_client(ClientId, Req);
+        false ->
+            call_client_on_all_nodes(ClientId, Req)
+    end.
+
+call_client_on_all_nodes(ClientId, Req) ->
+    Nodes = emqx:running_nodes(),
+    Results = call_client(Nodes, ClientId, Req),
+    {Expected, Errs} = lists:foldr(
         fun
-            ({error, _}) -> false;
-            (_) -> true
+            ({_N, {error, not_found}}, Acc) -> Acc;
+            ({_N, {error, _}} = Err, {OkAcc, ErrAcc}) -> {OkAcc, [Err | ErrAcc]};
+            ({_N, OkRes}, {OkAcc, ErrAcc}) -> {[OkRes | OkAcc], ErrAcc}
         end,
-        Results
+        {[], []},
+        lists:zip(Nodes, Results)
     ),
+    ?maybe_log_node_errors(#{msg => "call_client_failed", request => Req}, Errs),
     case Expected of
-        [] -> {error, not_found};
-        [Result | _] -> Result
+        [] ->
+            case Errs of
+                [] -> {error, not_found};
+                [{_Node, FirstErr} | _] -> FirstErr
+            end;
+        [Result | _] ->
+            Result
     end.
 
 %% @private
@@ -490,8 +516,8 @@ do_call_client(ClientId, Req) ->
     end.
 
 %% @private
-call_client(Node, ClientId, Req) ->
-    unwrap_rpc(emqx_management_proto_v4:call_client(Node, ClientId, Req)).
+call_client(Nodes, ClientId, Req) ->
+    emqx_rpc:unwrap_erpc(emqx_management_proto_v5:call_client(Nodes, ClientId, Req)).
 
 %%--------------------------------------------------------------------
 %% Subscriptions
@@ -504,7 +530,7 @@ do_list_subscriptions() ->
     throw(not_implemented).
 
 list_subscriptions(Node) ->
-    unwrap_rpc(emqx_management_proto_v4:list_subscriptions(Node)).
+    unwrap_rpc(emqx_management_proto_v5:list_subscriptions(Node)).
 
 list_subscriptions_via_topic(Topic, FormatFun) ->
     lists:append([
@@ -526,7 +552,7 @@ subscribe(ClientId, TopicTables) ->
     subscribe(emqx:running_nodes(), ClientId, TopicTables).
 
 subscribe([Node | Nodes], ClientId, TopicTables) ->
-    case unwrap_rpc(emqx_management_proto_v4:subscribe(Node, ClientId, TopicTables)) of
+    case unwrap_rpc(emqx_management_proto_v5:subscribe(Node, ClientId, TopicTables)) of
         {error, _} -> subscribe(Nodes, ClientId, TopicTables);
         {subscribe, Res} -> {subscribe, Res, Node}
     end;
@@ -553,7 +579,7 @@ unsubscribe(ClientId, Topic) ->
 -spec unsubscribe([node()], emqx_types:clientid(), emqx_types:topic()) ->
     {unsubscribe, _} | {error, channel_not_found}.
 unsubscribe([Node | Nodes], ClientId, Topic) ->
-    case unwrap_rpc(emqx_management_proto_v4:unsubscribe(Node, ClientId, Topic)) of
+    case unwrap_rpc(emqx_management_proto_v5:unsubscribe(Node, ClientId, Topic)) of
         {error, _} -> unsubscribe(Nodes, ClientId, Topic);
         Re -> Re
     end;
@@ -576,7 +602,7 @@ unsubscribe_batch(ClientId, Topics) ->
 -spec unsubscribe_batch([node()], emqx_types:clientid(), [emqx_types:topic()]) ->
     {unsubscribe_batch, _} | {error, channel_not_found}.
 unsubscribe_batch([Node | Nodes], ClientId, Topics) ->
-    case unwrap_rpc(emqx_management_proto_v4:unsubscribe_batch(Node, ClientId, Topics)) of
+    case unwrap_rpc(emqx_management_proto_v5:unsubscribe_batch(Node, ClientId, Topics)) of
         {error, _} -> unsubscribe_batch(Nodes, ClientId, Topics);
         Re -> Re
     end;
@@ -655,6 +681,7 @@ lookup_running_client(ClientId, FormatFun) ->
 %%--------------------------------------------------------------------
 %% Internal Functions.
 %%--------------------------------------------------------------------
+
 unwrap_rpc({badrpc, Reason}) ->
     {error, Reason};
 unwrap_rpc(Res) ->

+ 62 - 5
apps/emqx_management/src/emqx_mgmt_api.erl

@@ -17,6 +17,7 @@
 -module(emqx_mgmt_api).
 
 -include_lib("stdlib/include/qlc.hrl").
+-include("emqx_mgmt.hrl").
 
 -elvis([{elvis_style, dont_repeat_yourself, #{min_complexity => 100}}]).
 
@@ -37,6 +38,8 @@
 
 -export([
     parse_pager_params/1,
+    parse_cont_pager_params/2,
+    encode_cont_pager_params/2,
     parse_qstring/2,
     init_query_result/0,
     init_query_state/5,
@@ -45,6 +48,7 @@
     finalize_query/2,
     mark_complete/2,
     format_query_result/3,
+    format_query_result/4,
     maybe_collect_total_from_tail_nodes/2
 ]).
 
@@ -134,6 +138,33 @@ page(Params) ->
 limit(Params) when is_map(Params) ->
     maps:get(<<"limit">>, Params, emqx_mgmt:default_row_limit()).
 
+continuation(Params, Encoding) ->
+    try
+        decode_continuation(maps:get(<<"after">>, Params, none), Encoding)
+    catch
+        _:_ ->
+            error
+    end.
+
+decode_continuation(none, _Encoding) ->
+    none;
+decode_continuation(end_of_data, _Encoding) ->
+    %% Clients should not send "after=end_of_data" back to the server
+    error;
+decode_continuation(Cont, ?URL_PARAM_INTEGER) ->
+    binary_to_integer(Cont);
+decode_continuation(Cont, ?URL_PARAM_BINARY) ->
+    emqx_utils:hexstr_to_bin(Cont).
+
+encode_continuation(none, _Encoding) ->
+    none;
+encode_continuation(end_of_data, _Encoding) ->
+    end_of_data;
+encode_continuation(Cont, ?URL_PARAM_INTEGER) ->
+    integer_to_binary(Cont);
+encode_continuation(Cont, ?URL_PARAM_BINARY) ->
+    emqx_utils:bin_to_hexstr(Cont, lower).
+
 %%--------------------------------------------------------------------
 %% Node Query
 %%--------------------------------------------------------------------
@@ -589,10 +620,13 @@ is_fuzzy_key(<<"match_", _/binary>>) ->
 is_fuzzy_key(_) ->
     false.
 
-format_query_result(_FmtFun, _MetaIn, Error = {error, _Node, _Reason}) ->
+format_query_result(FmtFun, MetaIn, ResultAcc) ->
+    format_query_result(FmtFun, MetaIn, ResultAcc, #{}).
+
+format_query_result(_FmtFun, _MetaIn, Error = {error, _Node, _Reason}, _Opts) ->
     Error;
 format_query_result(
-    FmtFun, MetaIn, ResultAcc = #{hasnext := HasNext, rows := RowsAcc}
+    FmtFun, MetaIn, ResultAcc = #{hasnext := HasNext, rows := RowsAcc}, Opts
 ) ->
     Meta =
         case ResultAcc of
@@ -608,7 +642,10 @@ format_query_result(
         data => lists:flatten(
             lists:foldl(
                 fun({Node, Rows}, Acc) ->
-                    [lists:map(fun(Row) -> exec_format_fun(FmtFun, Node, Row) end, Rows) | Acc]
+                    [
+                        lists:map(fun(Row) -> exec_format_fun(FmtFun, Node, Row, Opts) end, Rows)
+                        | Acc
+                    ]
                 end,
                 [],
                 RowsAcc
@@ -616,10 +653,11 @@ format_query_result(
         )
     }.
 
-exec_format_fun(FmtFun, Node, Row) ->
+exec_format_fun(FmtFun, Node, Row, Opts) ->
     case erlang:fun_info(FmtFun, arity) of
         {arity, 1} -> FmtFun(Row);
-        {arity, 2} -> FmtFun(Node, Row)
+        {arity, 2} -> FmtFun(Node, Row);
+        {arity, 3} -> FmtFun(Node, Row, Opts)
     end.
 
 parse_pager_params(Params) ->
@@ -632,6 +670,25 @@ parse_pager_params(Params) ->
             false
     end.
 
+-spec parse_cont_pager_params(map(), ?URL_PARAM_INTEGER | ?URL_PARAM_BINARY) ->
+    #{limit := pos_integer(), continuation := none | integer() | binary()} | false.
+parse_cont_pager_params(Params, Encoding) ->
+    Cont = continuation(Params, Encoding),
+    Limit = b2i(limit(Params)),
+    case Limit > 0 andalso Cont =/= error of
+        true ->
+            #{continuation => Cont, limit => Limit};
+        false ->
+            false
+    end.
+
+-spec encode_cont_pager_params(map(), ?URL_PARAM_INTEGER | ?URL_PARAM_BINARY) -> map().
+encode_cont_pager_params(#{continuation := Cont} = Meta, ContEncoding) ->
+    Meta1 = maps:remove(continuation, Meta),
+    Meta1#{last => encode_continuation(Cont, ContEncoding)};
+encode_cont_pager_params(Meta, _ContEncoding) ->
+    Meta.
+
 %%--------------------------------------------------------------------
 %% Types
 %%--------------------------------------------------------------------

+ 2 - 1
apps/emqx_management/src/emqx_mgmt_api_banned.erl

@@ -169,7 +169,8 @@ banned(get, #{query_string := Params}) ->
 banned(post, #{body := Body}) ->
     case emqx_banned:parse(Body) of
         {error, Reason} ->
-            {400, 'BAD_REQUEST', list_to_binary(Reason)};
+            ErrorReason = io_lib:format("~p", [Reason]),
+            {400, 'BAD_REQUEST', list_to_binary(ErrorReason)};
         Ban ->
             case emqx_banned:create(Ban) of
                 {ok, Banned} ->

+ 361 - 58
apps/emqx_management/src/emqx_mgmt_api_clients.erl

@@ -22,8 +22,8 @@
 -include_lib("emqx/include/emqx.hrl").
 -include_lib("emqx/include/emqx_cm.hrl").
 -include_lib("hocon/include/hoconsc.hrl").
-
 -include_lib("emqx/include/logger.hrl").
+-include_lib("emqx_utils/include/emqx_utils_api.hrl").
 
 -include("emqx_mgmt.hrl").
 
@@ -47,14 +47,17 @@
     unsubscribe/2,
     unsubscribe_batch/2,
     set_keepalive/2,
-    sessions_count/2
+    sessions_count/2,
+    inflight_msgs/2,
+    mqueue_msgs/2
 ]).
 
 -export([
     qs2ms/2,
     run_fuzzy_filter/2,
     format_channel_info/1,
-    format_channel_info/2
+    format_channel_info/2,
+    format_channel_info/3
 ]).
 
 %% for batch operation
@@ -64,7 +67,10 @@
 
 -define(CLIENT_QSCHEMA, [
     {<<"node">>, atom},
+    %% list
     {<<"username">>, binary},
+    %% list
+    {<<"clientid">>, binary},
     {<<"ip_address">>, ip},
     {<<"conn_state">>, atom},
     {<<"clean_start">>, atom},
@@ -101,6 +107,8 @@ paths() ->
         "/clients/:clientid/unsubscribe",
         "/clients/:clientid/unsubscribe/bulk",
         "/clients/:clientid/keepalive",
+        "/clients/:clientid/mqueue_messages",
+        "/clients/:clientid/inflight_messages",
         "/sessions_count"
     ].
 
@@ -121,10 +129,13 @@ schema("/clients") ->
                         example => <<"emqx@127.0.0.1">>
                     })},
                 {username,
-                    hoconsc:mk(binary(), #{
+                    hoconsc:mk(hoconsc:array(binary()), #{
                         in => query,
                         required => false,
-                        desc => <<"User name">>
+                        desc => <<
+                            "User name, multiple values can be specified by"
+                            " repeating the parameter: username=u1&username=u2"
+                        >>
                     })},
                 {ip_address,
                     hoconsc:mk(binary(), #{
@@ -198,7 +209,17 @@ schema("/clients") ->
                             "Search client connection creation time by less"
                             " than or equal method, rfc3339 or timestamp(millisecond)"
                         >>
-                    })}
+                    })},
+                {clientid,
+                    hoconsc:mk(hoconsc:array(binary()), #{
+                        in => query,
+                        required => false,
+                        desc => <<
+                            "Client ID, multiple values can be specified by"
+                            " repeating the parameter: clientid=c1&clientid=c2"
+                        >>
+                    })},
+                ?R_REF(requested_client_fields)
             ],
             responses => #{
                 200 =>
@@ -391,6 +412,14 @@ schema("/clients/:clientid/keepalive") ->
             }
         }
     };
+schema("/clients/:clientid/mqueue_messages") ->
+    ContExample = <<"AAYS53qRa0n07AAABFIACg">>,
+    RespSchema = ?R_REF(mqueue_messages),
+    client_msgs_schema(mqueue_msgs, ?DESC(get_client_mqueue_msgs), ContExample, RespSchema);
+schema("/clients/:clientid/inflight_messages") ->
+    ContExample = <<"10">>,
+    RespSchema = ?R_REF(inflight_messages),
+    client_msgs_schema(inflight_msgs, ?DESC(get_client_inflight_msgs), ContExample, RespSchema);
 schema("/sessions_count") ->
     #{
         'operationId' => sessions_count,
@@ -411,7 +440,10 @@ schema("/sessions_count") ->
             responses => #{
                 200 => hoconsc:mk(binary(), #{
                     desc => <<"Number of sessions">>
-                })
+                }),
+                400 => emqx_dashboard_swagger:error_codes(
+                    ['BAD_REQUEST'], <<"Node {name} cannot handle this request.">>
+                )
             }
         }
     }.
@@ -621,6 +653,50 @@ fields(subscribe) ->
 fields(unsubscribe) ->
     [
         {topic, hoconsc:mk(binary(), #{desc => <<"Topic">>, example => <<"testtopic/#">>})}
+    ];
+fields(mqueue_messages) ->
+    [
+        {data, hoconsc:mk(hoconsc:array(?REF(message)), #{desc => ?DESC(mqueue_msgs_list)})},
+        {meta, hoconsc:mk(hoconsc:ref(emqx_dashboard_swagger, continuation_meta), #{})}
+    ];
+fields(inflight_messages) ->
+    [
+        {data, hoconsc:mk(hoconsc:array(?REF(message)), #{desc => ?DESC(inflight_msgs_list)})},
+        {meta, hoconsc:mk(hoconsc:ref(emqx_dashboard_swagger, continuation_meta), #{})}
+    ];
+fields(message) ->
+    [
+        {msgid, hoconsc:mk(binary(), #{desc => ?DESC(msg_id)})},
+        {topic, hoconsc:mk(binary(), #{desc => ?DESC(msg_topic)})},
+        {qos, hoconsc:mk(emqx_schema:qos(), #{desc => ?DESC(msg_qos)})},
+        {publish_at, hoconsc:mk(integer(), #{desc => ?DESC(msg_publish_at)})},
+        {from_clientid, hoconsc:mk(binary(), #{desc => ?DESC(msg_from_clientid)})},
+        {from_username, hoconsc:mk(binary(), #{desc => ?DESC(msg_from_username)})},
+        {payload, hoconsc:mk(binary(), #{desc => ?DESC(msg_payload)})}
+    ];
+fields(requested_client_fields) ->
+    %% NOTE: some client fields that are actually returned in the response are missing
+    %% from the schema: enable_authn, is_persistent, listener, peerport
+    ClientFields = [element(1, F) || F <- fields(client)],
+    [
+        {fields,
+            hoconsc:mk(
+                hoconsc:union([all, hoconsc:array(hoconsc:enum(ClientFields))]),
+                #{
+                    in => query,
+                    required => false,
+                    default => all,
+                    desc => <<"Comma-separated list of client fields to return in the response">>,
+                    converter => fun
+                        (all, _Opts) ->
+                            all;
+                        (<<"all">>, _Opts) ->
+                            all;
+                        (CsvFields, _Opts) when is_binary(CsvFields) ->
+                            binary:split(CsvFields, <<",">>, [global, trim_all])
+                    end
+                }
+            )}
     ].
 
 %%%==============================================================================================
@@ -693,6 +769,15 @@ set_keepalive(put, #{bindings := #{clientid := ClientID}, body := Body}) ->
             end
     end.
 
+mqueue_msgs(get, #{bindings := #{clientid := ClientID}, query_string := QString}) ->
+    list_client_msgs(mqueue_msgs, ClientID, QString).
+
+inflight_msgs(get, #{
+    bindings := #{clientid := ClientID},
+    query_string := QString
+}) ->
+    list_client_msgs(inflight_msgs, ClientID, QString).
+
 %%%==============================================================================================
 %% api apply
 
@@ -825,6 +910,63 @@ unsubscribe_batch(#{clientid := ClientID, topics := Topics}) ->
 %%--------------------------------------------------------------------
 %% internal function
 
+client_msgs_schema(OpId, Desc, ContExample, RespSchema) ->
+    #{
+        'operationId' => OpId,
+        get => #{
+            description => Desc,
+            tags => ?TAGS,
+            parameters => client_msgs_params(),
+            responses => #{
+                200 =>
+                    emqx_dashboard_swagger:schema_with_example(RespSchema, #{
+                        <<"data">> => [message_example()],
+                        <<"meta">> => #{
+                            <<"count">> => 100,
+                            <<"last">> => ContExample
+                        }
+                    }),
+                400 =>
+                    emqx_dashboard_swagger:error_codes(
+                        ['INVALID_PARAMETER'], <<"Invalid parameters">>
+                    ),
+                404 => emqx_dashboard_swagger:error_codes(
+                    ['CLIENTID_NOT_FOUND'], <<"Client ID not found">>
+                )
+            }
+        }
+    }.
+
+client_msgs_params() ->
+    [
+        {clientid, hoconsc:mk(binary(), #{in => path})},
+        {payload,
+            hoconsc:mk(hoconsc:enum([none, base64, plain]), #{
+                in => query,
+                default => base64,
+                desc => <<
+                    "Payload encoding for the client's inflight/mqueue messages."
+                    " If set to `none`, no payload is returned in the response."
+                >>
+            })},
+        {max_payload_bytes,
+            hoconsc:mk(emqx_schema:bytesize(), #{
+                in => query,
+                default => <<"1MB">>,
+                desc => <<
+                    "Payload size limit for the client's inflight/mqueue messages."
+                    " The total payload size of all messages in the response will not exceed this value."
+                    " Messages beyond the limit are silently omitted from the response."
+                    " The only exception to this rule is when the payload of the first message"
+                    " alone exceeds the limit."
+                    " In this case, the first message is still returned in the response."
+                >>,
+                validator => fun max_bytes_validator/1
+            })},
+        hoconsc:ref(emqx_dashboard_swagger, 'after'),
+        hoconsc:ref(emqx_dashboard_swagger, limit)
+    ].
+
 do_subscribe(ClientID, Topic0, Options) ->
     try emqx_topic:parse(Topic0, Options) of
         {Topic, Opts} ->
@@ -870,7 +1012,10 @@ list_clients_cluster_query(QString, Options) ->
                     ?CHAN_INFO_TAB, NQString, fun ?MODULE:qs2ms/2, Meta, Options
                 ),
                 Res = do_list_clients_cluster_query(Nodes, QueryState, ResultAcc),
-                emqx_mgmt_api:format_query_result(fun ?MODULE:format_channel_info/2, Meta, Res)
+                Opts = #{fields => maps:get(<<"fields">>, QString, all)},
+                emqx_mgmt_api:format_query_result(
+                    fun ?MODULE:format_channel_info/3, Meta, Res, Opts
+                )
             catch
                 throw:{bad_value_type, {Key, ExpectedType, AcutalValue}} ->
                     {error, invalid_query_string_param, {Key, ExpectedType, AcutalValue}}
@@ -922,7 +1067,8 @@ list_clients_node_query(Node, QString, Options) ->
                 ?CHAN_INFO_TAB, NQString, fun ?MODULE:qs2ms/2, Meta, Options
             ),
             Res = do_list_clients_node_query(Node, QueryState, ResultAcc),
-            emqx_mgmt_api:format_query_result(fun ?MODULE:format_channel_info/2, Meta, Res)
+            Opts = #{fields => maps:get(<<"fields">>, QString, all)},
+            emqx_mgmt_api:format_query_result(fun ?MODULE:format_channel_info/3, Meta, Res, Opts)
     end.
 
 add_persistent_session_count(QueryState0 = #{total := Totals0}) ->
@@ -993,10 +1139,18 @@ do_persistent_session_count(Cursor, N) ->
     case emqx_persistent_session_ds_state:session_iterator_next(Cursor, 1) of
         {[], _} ->
             N;
-        {_, NextCursor} ->
-            do_persistent_session_count(NextCursor, N + 1)
+        {[{_Id, Meta}], NextCursor} ->
+            case is_expired(Meta) of
+                true ->
+                    do_persistent_session_count(NextCursor, N);
+                false ->
+                    do_persistent_session_count(NextCursor, N + 1)
+            end
     end.
 
+is_expired(#{last_alive_at := LastAliveAt, expiry_interval := ExpiryInterval}) ->
+    LastAliveAt + ExpiryInterval < erlang:system_time(millisecond).
+
 do_persistent_session_query(ResultAcc, QueryState) ->
     case emqx_persistent_message:is_persistence_enabled() of
         true ->
@@ -1014,7 +1168,7 @@ do_persistent_session_query1(ResultAcc, QueryState, Iter0) ->
     %% through all the nodes.
     #{limit := Limit} = QueryState,
     {Rows0, Iter} = emqx_persistent_session_ds_state:session_iterator_next(Iter0, Limit),
-    Rows = remove_live_sessions(Rows0),
+    Rows = drop_live_and_expired(Rows0),
     case emqx_mgmt_api:accumulate_query_rows(undefined, Rows, QueryState, ResultAcc) of
         {enough, NResultAcc} ->
             emqx_mgmt_api:finalize_query(NResultAcc, emqx_mgmt_api:mark_complete(QueryState, true));
@@ -1024,19 +1178,50 @@ do_persistent_session_query1(ResultAcc, QueryState, Iter0) ->
             do_persistent_session_query1(NResultAcc, QueryState, Iter)
     end.
 
-remove_live_sessions(Rows) ->
+drop_live_and_expired(Rows) ->
     lists:filtermap(
-        fun({ClientId, _Session}) ->
-            case emqx_mgmt:lookup_running_client(ClientId, _FormatFn = undefined) of
-                [] ->
-                    {true, {ClientId, emqx_persistent_session_ds_state:print_session(ClientId)}};
-                [_ | _] ->
-                    false
+        fun({ClientId, Session}) ->
+            case is_expired(Session) orelse is_live_session(ClientId) of
+                true ->
+                    false;
+                false ->
+                    {true, {ClientId, emqx_persistent_session_ds_state:print_session(ClientId)}}
             end
         end,
         Rows
     ).
 
+%% Return 'true' if a live channel is found in the global channel registry.
+%% NOTE: We cannot afford to query all running nodes to find out whether a session is live,
+%% so this function assumes the global session registry is always enabled.
+%% Otherwise, it may return `false` for a live session, causing the session to appear
+%% twice in the query result.
+is_live_session(ClientId) ->
+    [] =/= emqx_cm_registry:lookup_channels(ClientId).
+
+list_client_msgs(MsgType, ClientID, QString) ->
+    case emqx_mgmt_api:parse_cont_pager_params(QString, cont_encoding(MsgType)) of
+        false ->
+            {400, #{code => <<"INVALID_PARAMETER">>, message => <<"after_limit_invalid">>}};
+        PagerParams = #{} ->
+            case emqx_mgmt:list_client_msgs(MsgType, ClientID, PagerParams) of
+                {error, not_found} ->
+                    {404, ?CLIENTID_NOT_FOUND};
+                {Msgs, Meta = #{}} when is_list(Msgs) ->
+                    format_msgs_resp(MsgType, Msgs, Meta, QString)
+            end
+    end.
+
+%% integer packet id
+cont_encoding(inflight_msgs) -> ?URL_PARAM_INTEGER;
+%% binary message id
+cont_encoding(mqueue_msgs) -> ?URL_PARAM_BINARY.
+
+max_bytes_validator(MaxBytes) when is_integer(MaxBytes), MaxBytes > 0 ->
+    ok;
+max_bytes_validator(_MaxBytes) ->
+    {error, "must be higher than 0"}.
+
 %%--------------------------------------------------------------------
 %% QueryString to Match Spec
 
@@ -1050,19 +1235,36 @@ qs2ms(_Tab, {QString, FuzzyQString}) ->
 -spec qs2ms(list()) -> ets:match_spec().
 qs2ms(Qs) ->
     {MtchHead, Conds} = qs2ms(Qs, 2, {#{}, []}),
-    [{{'$1', MtchHead, '_'}, Conds, ['$_']}].
+    [{{{'$1', '_'}, MtchHead, '_'}, Conds, ['$_']}].
 
 qs2ms([], _, {MtchHead, Conds}) ->
     {MtchHead, lists:reverse(Conds)};
+qs2ms([{Key, '=:=', Value} | Rest], N, {MtchHead, Conds}) when is_list(Value) ->
+    {Holder, NxtN} = holder_and_nxt(Key, N),
+    NMtchHead = emqx_mgmt_util:merge_maps(MtchHead, ms(Key, Holder)),
+    qs2ms(Rest, NxtN, {NMtchHead, [orelse_cond(Holder, Value) | Conds]});
 qs2ms([{Key, '=:=', Value} | Rest], N, {MtchHead, Conds}) ->
     NMtchHead = emqx_mgmt_util:merge_maps(MtchHead, ms(Key, Value)),
     qs2ms(Rest, N, {NMtchHead, Conds});
 qs2ms([Qs | Rest], N, {MtchHead, Conds}) ->
-    Holder = binary_to_atom(iolist_to_binary(["$", integer_to_list(N)]), utf8),
+    Holder = holder(N),
     NMtchHead = emqx_mgmt_util:merge_maps(MtchHead, ms(element(1, Qs), Holder)),
     NConds = put_conds(Qs, Holder, Conds),
     qs2ms(Rest, N + 1, {NMtchHead, NConds}).
 
+%% This is a special case: clientid is part of the key {ClientId, Pid}. As the table is an
+%% ordered_set, using a partially bound key optimizes traversal.
+holder_and_nxt(clientid, N) ->
+    {'$1', N};
+holder_and_nxt(_, N) ->
+    {holder(N), N + 1}.
+
+holder(N) -> list_to_atom([$$ | integer_to_list(N)]).
+
+orelse_cond(Holder, ValuesList) ->
+    Conds = [{'=:=', Holder, V} || V <- ValuesList],
+    erlang:list_to_tuple(['orelse' | Conds]).
+
 put_conds({_, Op, V}, Holder, Conds) ->
     [{Op, Holder, V} | Conds];
 put_conds({_, Op1, V1, Op2, V2}, Holder, Conds) ->
@@ -1072,8 +1274,8 @@ put_conds({_, Op1, V1, Op2, V2}, Holder, Conds) ->
         | Conds
     ].
 
-ms(clientid, X) ->
-    #{clientinfo => #{clientid => X}};
+ms(clientid, _X) ->
+    #{};
 ms(username, X) ->
     #{clientinfo => #{username => X}};
 ms(conn_state, X) ->
@@ -1117,7 +1319,11 @@ format_channel_info({ClientId, PSInfo}) ->
     %% offline persistent session
     format_persistent_session_info(ClientId, PSInfo).
 
-format_channel_info(WhichNode, {_, ClientInfo0, ClientStats}) ->
+format_channel_info(WhichNode, ChanInfo) ->
+    DefaultOpts = #{fields => all},
+    format_channel_info(WhichNode, ChanInfo, DefaultOpts).
+
+format_channel_info(WhichNode, {_, ClientInfo0, ClientStats}, Opts) ->
     Node = maps:get(node, ClientInfo0, WhichNode),
     ClientInfo1 = emqx_utils_maps:deep_remove([conninfo, clientid], ClientInfo0),
     ClientInfo2 = emqx_utils_maps:deep_remove([conninfo, username], ClientInfo1),
@@ -1136,45 +1342,17 @@ format_channel_info(WhichNode, {_, ClientInfo0, ClientStats}) ->
     ClientInfoMap5 = convert_expiry_interval_unit(ClientInfoMap4),
     ClientInfoMap = maps:put(connected, Connected, ClientInfoMap5),
 
-    RemoveList =
-        [
-            auth_result,
-            peername,
-            sockname,
-            peerhost,
-            conn_state,
-            send_pend,
-            conn_props,
-            peercert,
-            sockstate,
-            subscriptions,
-            receive_maximum,
-            protocol,
-            is_superuser,
-            sockport,
-            anonymous,
-            socktype,
-            active_n,
-            await_rel_timeout,
-            conn_mod,
-            sockname,
-            retry_interval,
-            upgrade_qos,
-            zone,
-            %% session_id, defined in emqx_session.erl
-            id,
-            acl
-        ],
+    #{fields := RequestedFields} = Opts,
     TimesKeys = [created_at, connected_at, disconnected_at],
     %% format timestamp to rfc3339
     result_format_undefined_to_null(
         lists:foldl(
             fun result_format_time_fun/2,
-            maps:without(RemoveList, ClientInfoMap),
+            with_client_info_fields(ClientInfoMap, RequestedFields),
             TimesKeys
         )
     );
-format_channel_info(undefined, {ClientId, PSInfo0 = #{}}) ->
+format_channel_info(undefined, {ClientId, PSInfo0 = #{}}, _Opts) ->
     format_persistent_session_info(ClientId, PSInfo0).
 
 format_persistent_session_info(ClientId, PSInfo0) ->
@@ -1204,6 +1382,114 @@ format_persistent_session_info(ClientId, PSInfo0) ->
     ),
     result_format_undefined_to_null(PSInfo).
 
+with_client_info_fields(ClientInfoMap, all) ->
+    RemoveList =
+        [
+            auth_result,
+            peername,
+            sockname,
+            peerhost,
+            peerport,
+            conn_state,
+            send_pend,
+            conn_props,
+            peercert,
+            sockstate,
+            subscriptions,
+            receive_maximum,
+            protocol,
+            is_superuser,
+            sockport,
+            anonymous,
+            socktype,
+            active_n,
+            await_rel_timeout,
+            conn_mod,
+            sockname,
+            retry_interval,
+            upgrade_qos,
+            zone,
+            %% session_id, defined in emqx_session.erl
+            id,
+            acl
+        ],
+    maps:without(RemoveList, ClientInfoMap);
+with_client_info_fields(ClientInfoMap, RequestedFields) when is_list(RequestedFields) ->
+    maps:with(RequestedFields, ClientInfoMap).
+
+format_msgs_resp(MsgType, Msgs, Meta, QString) ->
+    #{
+        <<"payload">> := PayloadFmt,
+        <<"max_payload_bytes">> := MaxBytes
+    } = QString,
+    Meta1 = emqx_mgmt_api:encode_cont_pager_params(Meta, cont_encoding(MsgType)),
+    Resp = #{meta => Meta1, data => format_msgs(Msgs, PayloadFmt, MaxBytes)},
+    %% Make sure minirest won't set another content-type for self-encoded JSON response body
+    Headers = #{<<"content-type">> => <<"application/json">>},
+    case emqx_utils_json:safe_encode(Resp) of
+        {ok, RespBin} ->
+            {200, Headers, RespBin};
+        _Error when PayloadFmt =:= plain ->
+            ?BAD_REQUEST(
+                <<"INVALID_PARAMETER">>,
+                <<"Some message payloads are not JSON serializable">>
+            );
+        %% Unexpected internal error
+        Error ->
+            ?INTERNAL_ERROR(Error)
+    end.
+
+format_msgs([FirstMsg | Msgs], PayloadFmt, MaxBytes) ->
+    %% Always include at least one message payload, even if it exceeds the limit
+    {FirstMsg1, PayloadSize0} = format_msg(FirstMsg, PayloadFmt),
+    {Msgs1, _} =
+        catch lists:foldl(
+            fun(Msg, {MsgsAcc, SizeAcc} = Acc) ->
+                {Msg1, PayloadSize} = format_msg(Msg, PayloadFmt),
+                case SizeAcc + PayloadSize of
+                    SizeAcc1 when SizeAcc1 =< MaxBytes ->
+                        {[Msg1 | MsgsAcc], SizeAcc1};
+                    _ ->
+                        throw(Acc)
+                end
+            end,
+            {[FirstMsg1], PayloadSize0},
+            Msgs
+        ),
+    lists:reverse(Msgs1);
+format_msgs([], _PayloadFmt, _MaxBytes) ->
+    [].
+
+format_msg(
+    #message{
+        id = ID,
+        qos = Qos,
+        topic = Topic,
+        from = From,
+        timestamp = Timestamp,
+        headers = Headers,
+        payload = Payload
+    },
+    PayloadFmt
+) ->
+    Msg = #{
+        msgid => emqx_guid:to_hexstr(ID),
+        qos => Qos,
+        topic => Topic,
+        publish_at => Timestamp,
+        from_clientid => emqx_utils_conv:bin(From),
+        from_username => maps:get(username, Headers, <<>>)
+    },
+    format_payload(PayloadFmt, Msg, Payload).
+
+format_payload(none, Msg, _Payload) ->
+    {Msg, 0};
+format_payload(base64, Msg, Payload) ->
+    Payload1 = base64:encode(Payload),
+    {Msg#{payload => Payload1}, erlang:byte_size(Payload1)};
+format_payload(plain, Msg, Payload) ->
+    {Msg#{payload => Payload}, erlang:iolist_size(Payload)}.
+
 %% format func helpers
 take_maps_from_inner(_Key, Value, Current) when is_map(Value) ->
     maps:merge(Current, Value);
@@ -1305,7 +1591,24 @@ client_example() ->
         <<"recv_msg.qos0">> => 0
     }.
 
+message_example() ->
+    #{
+        <<"msgid">> => <<"000611F460D57FA9F44500000D360002">>,
+        <<"topic">> => <<"t/test">>,
+        <<"qos">> => 0,
+        <<"publish_at">> => 1709055346487,
+        <<"from_clientid">> => <<"mqttx_59ac0a87">>,
+        <<"from_username">> => <<"test-user">>,
+        <<"payload">> => <<"eyJmb28iOiAiYmFyIn0=">>
+    }.
+
 sessions_count(get, #{query_string := QString}) ->
-    Since = maps:get(<<"since">>, QString, 0),
-    Count = emqx_cm_registry_keeper:count(Since),
-    {200, integer_to_binary(Count)}.
+    try
+        Since = maps:get(<<"since">>, QString, 0),
+        Count = emqx_cm_registry_keeper:count(Since),
+        {200, integer_to_binary(Count)}
+    catch
+        exit:{noproc, _} ->
+            Msg = io_lib:format("Node (~s) cannot handle this request.", [node()]),
+            {400, 'BAD_REQUEST', iolist_to_binary(Msg)}
+    end.
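The multi-value `clientid` filter in `qs2ms/3` binds `'$1'` to the `ClientId` half of the `{ClientId, Pid}` key and ORs one `'=:='` comparison per requested value, as built by `orelse_cond/2`. The standalone module below is a hypothetical sketch (not part of EMQX) that reproduces that match-spec shape against an assumed `{{ClientId, Pid}, Info}` table layout, so the resulting guard can be tried directly with `ets:select/2`.

```erlang
-module(clientid_ms_sketch).
-export([select_clientids/2]).

%% Mirrors orelse_cond/2 from the diff: one '=:=' comparison per requested value,
%% combined into {'orelse', {'=:=', Holder, V1}, {'=:=', Holder, V2}, ...}.
orelse_cond(Holder, Values) ->
    list_to_tuple(['orelse' | [{'=:=', Holder, V} || V <- Values]]).

%% Hypothetical table layout: an ordered_set of {{ClientId, Pid}, Info} objects.
%% Returns the client IDs whose key matches any of ClientIds, in key order.
-spec select_clientids(ets:table(), [binary()]) -> [binary()].
select_clientids(_Tab, []) ->
    %% With no requested values nothing can match, so skip building the guard.
    [];
select_clientids(Tab, ClientIds) ->
    %% '$1' is bound to the ClientId half of the key; the Pid half stays a wildcard.
    MatchSpec = [{{{'$1', '_'}, '_'}, [orelse_cond('$1', ClientIds)], ['$1']}],
    ets:select(Tab, MatchSpec).
```

Duplicate values in `ClientIds` only add redundant comparisons to the guard, which is why repeated `clientid=` parameters in the query string cause no issues.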

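The `max_payload_bytes` rule applied by `format_msgs/3` keeps the first message unconditionally, then stops at the first message that would push the running payload total over the limit. Later, smaller messages are not considered once that happens. The module below is a hypothetical standalone sketch of that rule: it works on `{Item, PayloadSize}` pairs instead of `#message{}` records and uses plain recursion in place of the `catch`/`throw` short-circuit.

```erlang
-module(payload_limit_sketch).
-export([take_within_limit/2]).

%% Hypothetical helper mirroring the truncation rule of format_msgs/3.
%% Items is a list of {Item, PayloadSize}; MaxBytes is a positive integer.
-spec take_within_limit([{term(), non_neg_integer()}], pos_integer()) -> [term()].
take_within_limit([], _MaxBytes) ->
    [];
take_within_limit([{First, FirstSize} | Rest], MaxBytes) ->
    %% The first item is always kept, even if it alone exceeds MaxBytes.
    take(Rest, FirstSize, MaxBytes, [First]).

take([{Item, Size} | Rest], SizeAcc, MaxBytes, Acc) when SizeAcc + Size =< MaxBytes ->
    take(Rest, SizeAcc + Size, MaxBytes, [Item | Acc]);
take(_Remaining, _SizeAcc, _MaxBytes, Acc) ->
    %% Either the list is exhausted or the next item would exceed the limit:
    %% stop here, as throw(Acc) does in format_msgs/3.
    lists:reverse(Acc).
```

For example, `take_within_limit([{a, 4}, {b, 5}, {c, 1}], 8)` keeps only `a`: adding `b` would reach 9 bytes, so iteration stops even though `c` alone would still fit.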
+ 1 - 1
apps/emqx_management/src/emqx_mgmt_api_configs.erl

@@ -407,7 +407,7 @@ get_configs_v1(QueryStr) ->
     Node = maps:get(<<"node">>, QueryStr, node()),
     case
         lists:member(Node, emqx:running_nodes()) andalso
-            emqx_management_proto_v4:get_full_config(Node)
+            emqx_management_proto_v5:get_full_config(Node)
     of
         false ->
             Message = list_to_binary(io_lib:format("Bad node ~p, reason not found", [Node])),

+ 1 - 1
apps/emqx_management/src/emqx_mgmt_api_listeners.erl

@@ -516,7 +516,7 @@ list_listeners() ->
     lists:map(fun list_listeners/1, [Self | lists:delete(Self, emqx:running_nodes())]).
 
 list_listeners(Node) ->
-    wrap_rpc(emqx_management_proto_v4:list_listeners(Node)).
+    wrap_rpc(emqx_management_proto_v5:list_listeners(Node)).
 
 listener_status_by_id(NodeL) ->
     Listeners = maps:to_list(listener_status_by_id(NodeL, #{})),

+ 86 - 0
apps/emqx_management/src/proto/emqx_management_proto_v5.erl

@@ -0,0 +1,86 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2024 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+
+-module(emqx_management_proto_v5).
+
+-behaviour(emqx_bpapi).
+
+-export([
+    introduced_in/0,
+
+    node_info/1,
+    broker_info/1,
+    list_subscriptions/1,
+
+    list_listeners/1,
+    subscribe/3,
+    unsubscribe/3,
+    unsubscribe_batch/3,
+
+    call_client/3,
+
+    get_full_config/1,
+
+    kickout_clients/2
+]).
+
+-include_lib("emqx/include/bpapi.hrl").
+
+introduced_in() ->
+    "5.6.0".
+
+-spec unsubscribe_batch(node(), emqx_types:clientid(), [emqx_types:topic()]) ->
+    {unsubscribe, _} | {error, _} | {badrpc, _}.
+unsubscribe_batch(Node, ClientId, Topics) ->
+    rpc:call(Node, emqx_mgmt, do_unsubscribe_batch, [ClientId, Topics]).
+
+-spec node_info([node()]) -> emqx_rpc:erpc_multicall(map()).
+node_info(Nodes) ->
+    erpc:multicall(Nodes, emqx_mgmt, node_info, [], 30000).
+
+-spec broker_info([node()]) -> emqx_rpc:erpc_multicall(map()).
+broker_info(Nodes) ->
+    erpc:multicall(Nodes, emqx_mgmt, broker_info, [], 30000).
+
+-spec list_subscriptions(node()) -> [map()] | {badrpc, _}.
+list_subscriptions(Node) ->
+    rpc:call(Node, emqx_mgmt, do_list_subscriptions, []).
+
+-spec list_listeners(node()) -> map() | {badrpc, _}.
+list_listeners(Node) ->
+    rpc:call(Node, emqx_mgmt_api_listeners, do_list_listeners, []).
+
+-spec subscribe(node(), emqx_types:clientid(), emqx_types:topic_filters()) ->
+    {subscribe, _} | {error, atom()} | {badrpc, _}.
+subscribe(Node, ClientId, TopicTables) ->
+    rpc:call(Node, emqx_mgmt, do_subscribe, [ClientId, TopicTables]).
+
+-spec unsubscribe(node(), emqx_types:clientid(), emqx_types:topic()) ->
+    {unsubscribe, _} | {error, _} | {badrpc, _}.
+unsubscribe(Node, ClientId, Topic) ->
+    rpc:call(Node, emqx_mgmt, do_unsubscribe, [ClientId, Topic]).
+
+-spec call_client([node()], emqx_types:clientid(), term()) -> emqx_rpc:erpc_multicall(term()).
+call_client(Nodes, ClientId, Req) ->
+    erpc:multicall(Nodes, emqx_mgmt, do_call_client, [ClientId, Req], 30000).
+
+-spec get_full_config(node()) -> map() | list() | {badrpc, _}.
+get_full_config(Node) ->
+    rpc:call(Node, emqx_mgmt_api_configs, get_full_config, []).
+
+-spec kickout_clients(node(), [emqx_types:clientid()]) -> ok | {badrpc, _}.
+kickout_clients(Node, ClientIds) ->
+    rpc:call(Node, emqx_mgmt, do_kickout_clients, [ClientIds]).

+ 115 - 5
apps/emqx_management/test/emqx_mgmt_SUITE.erl

@@ -28,14 +28,19 @@
 all() ->
     [
         {group, persistence_disabled},
-        {group, persistence_enabled}
+        {group, persistence_enabled},
+        {group, cm_registry_enabled},
+        {group, cm_registry_disabled}
     ].
 
 groups() ->
-    TCs = emqx_common_test_helpers:all(?MODULE),
+    CMRegistryTCs = [t_call_client_cluster],
+    TCs = emqx_common_test_helpers:all(?MODULE) -- CMRegistryTCs,
     [
         {persistence_disabled, [], TCs},
-        {persistence_enabled, [], [t_persist_list_subs]}
+        {persistence_enabled, [], [t_persist_list_subs]},
+        {cm_registry_enabled, CMRegistryTCs},
+        {cm_registry_disabled, CMRegistryTCs}
     ].
 
 init_per_group(persistence_disabled, Config) ->
@@ -66,10 +71,17 @@ init_per_group(persistence_enabled, Config) ->
     [
         {apps, Apps}
         | Config
-    ].
+    ];
+init_per_group(cm_registry_enabled, Config) ->
+    [{emqx_config, "broker.enable_session_registry = true"} | Config];
+init_per_group(cm_registry_disabled, Config) ->
+    [{emqx_config, "broker.enable_session_registry = false"} | Config].
 
 end_per_group(_Grp, Config) ->
-    emqx_cth_suite:stop(?config(apps, Config)).
+    case ?config(apps, Config) of
+        undefined -> ok;
+        Apps -> emqx_cth_suite:stop(Apps)
+    end.
 
 init_per_suite(Config) ->
     Config.
@@ -447,6 +459,83 @@ t_persist_list_subs(_) ->
     %% clients:
     VerifySubs().
 
+t_call_client_cluster(Config) ->
+    [Node1, Node2] = ?config(cluster, Config),
+    [Node1ClientId, Node2ClientId] = ?config(client_ids, Config),
+    ?assertMatch(
+        {[], #{}}, rpc:call(Node1, emqx_mgmt, list_client_msgs, client_msgs_args(Node1ClientId))
+    ),
+    ?assertMatch(
+        {[], #{}}, rpc:call(Node2, emqx_mgmt, list_client_msgs, client_msgs_args(Node2ClientId))
+    ),
+    ?assertMatch(
+        {[], #{}}, rpc:call(Node1, emqx_mgmt, list_client_msgs, client_msgs_args(Node2ClientId))
+    ),
+    ?assertMatch(
+        {[], #{}}, rpc:call(Node2, emqx_mgmt, list_client_msgs, client_msgs_args(Node1ClientId))
+    ),
+
+    case proplists:get_value(name, ?config(tc_group_properties, Config)) of
+        cm_registry_disabled ->
+            %% Simulating crashes that must be handled by erpc multicall
+            ?assertMatch(
+                {error, _},
+                rpc:call(Node1, emqx_mgmt, list_client_msgs, client_msgs_bad_args(Node2ClientId))
+            ),
+            ?assertMatch(
+                {error, _},
+                rpc:call(Node2, emqx_mgmt, list_client_msgs, client_msgs_bad_args(Node1ClientId))
+            );
+        cm_registry_enabled ->
+            %% Direct call to remote pid is expected to crash
+            ?assertMatch(
+                {badrpc, {'EXIT', _}},
+                rpc:call(Node1, emqx_mgmt, list_client_msgs, client_msgs_bad_args(Node1ClientId))
+            ),
+            ?assertMatch(
+                {badrpc, {'EXIT', _}},
+                rpc:call(Node2, emqx_mgmt, list_client_msgs, client_msgs_bad_args(Node2ClientId))
+            );
+        _ ->
+            ok
+    end,
+
+    NotFoundClientId = <<"no_such_client_id">>,
+    ?assertEqual(
+        {error, not_found},
+        rpc:call(Node2, emqx_mgmt, list_client_msgs, client_msgs_args(NotFoundClientId))
+    ),
+    ?assertEqual(
+        {error, not_found},
+        rpc:call(Node1, emqx_mgmt, list_client_msgs, client_msgs_args(NotFoundClientId))
+    ).
+
+t_call_client_cluster(init, Config) ->
+    Apps = [{emqx, ?config(emqx_config, Config)}, emqx_management],
+    [Node1, Node2] =
+        Cluster = emqx_cth_cluster:start(
+            [
+                {list_to_atom(atom_to_list(?MODULE) ++ "1"), #{role => core, apps => Apps}},
+                {list_to_atom(atom_to_list(?MODULE) ++ "2"), #{role => core, apps => Apps}}
+            ],
+            #{work_dir => emqx_cth_suite:work_dir(?FUNCTION_NAME, Config)}
+        ),
+    {ok, Node1Client, Node1ClientId} = connect_client(Node1),
+    {ok, Node2Client, Node2ClientId} = connect_client(Node2),
+    %% They may exit during the test due to simulated crashes
+    unlink(Node1Client),
+    unlink(Node2Client),
+    [
+        {cluster, Cluster},
+        {client_ids, [Node1ClientId, Node2ClientId]},
+        {client_pids, [Node1Client, Node2Client]}
+        | Config
+    ];
+t_call_client_cluster('end', Config) ->
+    emqx_cth_cluster:stop(?config(cluster, Config)),
+    [exit(ClientPid, kill) || ClientPid <- ?config(client_pids, Config)],
+    ok.
+
 %%% helpers
 ident(Arg) ->
     Arg.
@@ -462,3 +551,24 @@ setup_clients(Config) ->
 disconnect_clients(Config) ->
     Clients = ?config(clients, Config),
     lists:foreach(fun emqtt:disconnect/1, Clients).
+
+get_mqtt_port(Node) ->
+    {_IP, Port} = erpc:call(Node, emqx_config, get, [[listeners, tcp, default, bind]]),
+    Port.
+
+connect_client(Node) ->
+    Port = get_mqtt_port(Node),
+    ClientId = <<(atom_to_binary(Node))/binary, "_client">>,
+    {ok, Client} = emqtt:start_link([
+        {port, Port},
+        {proto_ver, v5},
+        {clientid, ClientId}
+    ]),
+    {ok, _} = emqtt:connect(Client),
+    {ok, Client, ClientId}.
+
+client_msgs_args(ClientId) ->
+    [mqueue_msgs, ClientId, #{limit => 10, continuation => none}].
+
+client_msgs_bad_args(ClientId) ->
+    [mqueue_msgs, ClientId, "bad_page_params"].

+ 522 - 3
apps/emqx_management/test/emqx_mgmt_api_clients_SUITE.erl

@@ -23,16 +23,23 @@
 -include_lib("common_test/include/ct.hrl").
 -include_lib("snabbkaffe/include/snabbkaffe.hrl").
 -include_lib("emqx/include/asserts.hrl").
+-include_lib("emqx/include/emqx_mqtt.hrl").
 
 all() ->
     AllTCs = emqx_common_test_helpers:all(?MODULE),
     [
-        {group, persistent_sessions}
-        | AllTCs -- persistent_session_testcases()
+        {group, persistent_sessions},
+        {group, msgs_base64_encoding},
+        {group, msgs_plain_encoding}
+        | AllTCs -- (persistent_session_testcases() ++ client_msgs_testcases())
     ].
 
 groups() ->
-    [{persistent_sessions, persistent_session_testcases()}].
+    [
+        {persistent_sessions, persistent_session_testcases()},
+        {msgs_base64_encoding, client_msgs_testcases()},
+        {msgs_plain_encoding, client_msgs_testcases()}
+    ].
 
 persistent_session_testcases() ->
     [
@@ -42,12 +49,19 @@ persistent_session_testcases() ->
         t_persistent_sessions4,
         t_persistent_sessions5
     ].
+client_msgs_testcases() ->
+    [
+        t_inflight_messages,
+        t_mqueue_messages
+    ].
 
 init_per_suite(Config) ->
+    ok = snabbkaffe:start_trace(),
     emqx_mgmt_api_test_util:init_suite(),
     Config.
 
 end_per_suite(_) ->
+    ok = snabbkaffe:stop(),
     emqx_mgmt_api_test_util:end_suite().
 
 init_per_group(persistent_sessions, Config) ->
@@ -67,6 +81,10 @@ init_per_group(persistent_sessions, Config) ->
         #{work_dir => emqx_cth_suite:work_dir(Config)}
     ),
     [{nodes, Nodes} | Config];
+init_per_group(msgs_base64_encoding, Config) ->
+    [{payload_encoding, base64} | Config];
+init_per_group(msgs_plain_encoding, Config) ->
+    [{payload_encoding, plain} | Config];
 init_per_group(_Group, Config) ->
     Config.
 
@@ -77,6 +95,21 @@ end_per_group(persistent_sessions, Config) ->
 end_per_group(_Group, _Config) ->
     ok.
 
+end_per_testcase(TC, _Config) when
+    TC =:= t_inflight_messages;
+    TC =:= t_mqueue_messages
+->
+    ClientId = atom_to_binary(TC),
+    lists:foreach(fun(P) -> exit(P, kill) end, emqx_cm:lookup_channels(local, ClientId)),
+    ok = emqx_common_test_helpers:wait_for(
+        ?FUNCTION_NAME,
+        ?LINE,
+        fun() -> [] =:= emqx_cm:lookup_channels(local, ClientId) end,
+        5000
+    );
+end_per_testcase(_TC, _Config) ->
+    ok.
+
 t_clients(_) ->
     process_flag(trap_exit, true),
 
@@ -682,6 +715,238 @@ t_query_clients_with_time(_) ->
     {ok, _} = emqx_mgmt_api_test_util:request_api(delete, Client1Path),
     {ok, _} = emqx_mgmt_api_test_util:request_api(delete, Client2Path).
 
+t_query_multiple_clients(_) ->
+    process_flag(trap_exit, true),
+    ClientIdsUsers = [
+        {<<"multi_client1">>, <<"multi_user1">>},
+        {<<"multi_client1-1">>, <<"multi_user1">>},
+        {<<"multi_client2">>, <<"multi_user2">>},
+        {<<"multi_client2-1">>, <<"multi_user2">>},
+        {<<"multi_client3">>, <<"multi_user3">>},
+        {<<"multi_client3-1">>, <<"multi_user3">>},
+        {<<"multi_client4">>, <<"multi_user4">>},
+        {<<"multi_client4-1">>, <<"multi_user4">>}
+    ],
+    _Clients = lists:map(
+        fun({ClientId, Username}) ->
+            {ok, C} = emqtt:start_link(#{clientid => ClientId, username => Username}),
+            {ok, _} = emqtt:connect(C),
+            C
+        end,
+        ClientIdsUsers
+    ),
+    timer:sleep(100),
+
+    Auth = emqx_mgmt_api_test_util:auth_header_(),
+
+    %% Not found clients/users
+    ?assertEqual([], get_clients(Auth, "clientid=no_such_client")),
+    ?assertEqual([], get_clients(Auth, "clientid=no_such_client&clientid=no_such_client1")),
+    %% Duplicates must cause no issues
+    ?assertEqual([], get_clients(Auth, "clientid=no_such_client&clientid=no_such_client")),
+    ?assertEqual([], get_clients(Auth, "username=no_such_user&clientid=no_such_user1")),
+    ?assertEqual([], get_clients(Auth, "username=no_such_user&clientid=no_such_user")),
+    ?assertEqual(
+        [],
+        get_clients(
+            Auth,
+            "clientid=no_such_client&clientid=no_such_client"
+            "&username=no_such_user&clientid=no_such_user1"
+        )
+    ),
+
+    %% Requested ClientId / username values relate to different clients
+    ?assertEqual([], get_clients(Auth, "clientid=multi_client1&username=multi_user2")),
+    ?assertEqual(
+        [],
+        get_clients(
+            Auth,
+            "clientid=multi_client1&clientid=multi_client1-1"
+            "&username=multi_user2&username=multi_user3"
+        )
+    ),
+    ?assertEqual([<<"multi_client1">>], get_clients(Auth, "clientid=multi_client1")),
+    %% Duplicates must cause no issues
+    ?assertEqual(
+        [<<"multi_client1">>], get_clients(Auth, "clientid=multi_client1&clientid=multi_client1")
+    ),
+    ?assertEqual(
+        [<<"multi_client1">>], get_clients(Auth, "clientid=multi_client1&username=multi_user1")
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(get_clients(Auth, "username=multi_user1"))
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(get_clients(Auth, "clientid=multi_client1&clientid=multi_client1-1"))
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(
+            get_clients(
+                Auth,
+                "clientid=multi_client1&clientid=multi_client1-1"
+                "&username=multi_user1"
+            )
+        )
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(
+            get_clients(
+                Auth,
+                "clientid=no-such-client&clientid=multi_client1&clientid=multi_client1-1"
+                "&username=multi_user1"
+            )
+        )
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(
+            get_clients(
+                Auth,
+                "clientid=no-such-client&clientid=multi_client1&clientid=multi_client1-1"
+                "&username=multi_user1&username=no-such-user"
+            )
+        )
+    ),
+
+    AllQsFun = fun(QsKey, Pos) ->
+        QsParts = [
+            QsKey ++ "=" ++ binary_to_list(element(Pos, ClientUser))
+         || ClientUser <- ClientIdsUsers
+        ],
+        lists:flatten(lists:join("&", QsParts))
+    end,
+    AllClientsQs = AllQsFun("clientid", 1),
+    AllUsersQs = AllQsFun("username", 2),
+    AllClientIds = lists:sort([C || {C, _U} <- ClientIdsUsers]),
+
+    ?assertEqual(AllClientIds, lists:sort(get_clients(Auth, AllClientsQs))),
+    ?assertEqual(AllClientIds, lists:sort(get_clients(Auth, AllUsersQs))),
+    ?assertEqual(AllClientIds, lists:sort(get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs))),
+
+    %% Test with other filter params
+    NodeQs = "&node=" ++ atom_to_list(node()),
+    NoNodeQs = "&node=nonode@nohost",
+    ?assertEqual(
+        AllClientIds, lists:sort(get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ NodeQs))
+    ),
+    ?assertMatch(
+        {error, _}, get_clients_expect_error(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ NoNodeQs)
+    ),
+
+    %% Fuzzy search (like_{key}) must be ignored if an exact filter ({key}) is present
+    ?assertEqual(
+        AllClientIds,
+        lists:sort(get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ "&like_clientid=multi"))
+    ),
+    ?assertEqual(
+        AllClientIds,
+        lists:sort(get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ "&like_username=multi"))
+    ),
+    ?assertEqual(
+        AllClientIds,
+        lists:sort(
+            get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ "&like_clientid=does-not-matter")
+        )
+    ),
+    ?assertEqual(
+        AllClientIds,
+        lists:sort(
+            get_clients(Auth, AllClientsQs ++ "&" ++ AllUsersQs ++ "&like_username=does-not-matter")
+        )
+    ),
+
+    %% Combining multiple clientids with like_username and vice versa must narrow down search results
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(get_clients(Auth, AllClientsQs ++ "&like_username=user1"))
+    ),
+    ?assertEqual(
+        lists:sort([<<"multi_client1">>, <<"multi_client1-1">>]),
+        lists:sort(get_clients(Auth, AllUsersQs ++ "&like_clientid=client1"))
+    ),
+    ?assertEqual([], get_clients(Auth, AllClientsQs ++ "&like_username=nouser")),
+    ?assertEqual([], get_clients(Auth, AllUsersQs ++ "&like_clientid=nouser")).
+
+t_query_multiple_clients_urlencode(_) ->
+    process_flag(trap_exit, true),
+    ClientIdsUsers = [
+        {<<"multi_client=a?">>, <<"multi_user=a?">>},
>
+        {<<"multi_client=b?">>, <<"multi_user=b?">>}
+    ],
+    _Clients = lists:map(
+        fun({ClientId, Username}) ->
+            {ok, C} = emqtt:start_link(#{clientid => ClientId, username => Username}),
+            {ok, _} = emqtt:connect(C),
+            C
+        end,
+        ClientIdsUsers
+    ),
+    timer:sleep(100),
+
+    Auth = emqx_mgmt_api_test_util:auth_header_(),
+    ClientsQs = uri_string:compose_query([{<<"clientid">>, C} || {C, _} <- ClientIdsUsers]),
+    UsersQs = uri_string:compose_query([{<<"username">>, U} || {_, U} <- ClientIdsUsers]),
+    ExpectedClients = lists:sort([C || {C, _} <- ClientIdsUsers]),
+    ?assertEqual(ExpectedClients, lists:sort(get_clients(Auth, ClientsQs))),
+    ?assertEqual(ExpectedClients, lists:sort(get_clients(Auth, UsersQs))).
+
+t_query_clients_with_fields(_) ->
+    process_flag(trap_exit, true),
+    TCBin = atom_to_binary(?FUNCTION_NAME),
+    ClientId = <<TCBin/binary, "_client">>,
+    Username = <<TCBin/binary, "_user">>,
+    {ok, C} = emqtt:start_link(#{clientid => ClientId, username => Username}),
+    {ok, _} = emqtt:connect(C),
+    timer:sleep(100),
+
+    Auth = emqx_mgmt_api_test_util:auth_header_(),
+    ?assertEqual([#{<<"clientid">> => ClientId}], get_clients_all_fields(Auth, "fields=clientid")),
+    ?assertEqual(
+        [#{<<"clientid">> => ClientId, <<"username">> => Username}],
+        get_clients_all_fields(Auth, "fields=clientid,username")
+    ),
+
+    AllFields = get_clients_all_fields(Auth, "fields=all"),
+    DefaultFields = get_clients_all_fields(Auth, ""),
+
+    ?assertEqual(AllFields, DefaultFields),
+    ?assertMatch(
+        [#{<<"clientid">> := ClientId, <<"username">> := Username}],
+        AllFields
+    ),
+    ?assert(map_size(hd(AllFields)) > 2),
+    ?assertMatch({error, _}, get_clients_expect_error(Auth, "fields=bad_field_name")),
+    ?assertMatch({error, _}, get_clients_expect_error(Auth, "fields=all,bad_field_name")),
+    ?assertMatch({error, _}, get_clients_expect_error(Auth, "fields=all,username,clientid")).
+
+get_clients_all_fields(Auth, Qs) ->
+    get_clients(Auth, Qs, false, false).
+
+get_clients_expect_error(Auth, Qs) ->
+    get_clients(Auth, Qs, true, true).
+
+get_clients(Auth, Qs) ->
+    get_clients(Auth, Qs, false, true).
+
+get_clients(Auth, Qs, ExpectError, ClientIdOnly) ->
+    ClientsPath = emqx_mgmt_api_test_util:api_path(["clients"]),
+    Resp = emqx_mgmt_api_test_util:request_api(get, ClientsPath, Qs, Auth),
+    case ExpectError of
+        false ->
+            {ok, Body} = Resp,
+            #{<<"data">> := Clients} = emqx_utils_json:decode(Body),
+            case ClientIdOnly of
+                true -> [ClientId || #{<<"clientid">> := ClientId} <- Clients];
+                false -> Clients
+            end;
+        true ->
+            Resp
+    end.
+
 t_keepalive(_Config) ->
     Username = "user_keepalive",
     ClientId = "client_keepalive",
@@ -759,8 +1024,262 @@ t_client_id_not_found(_Config) ->
     ?assertMatch({error, {Http, _, Body}}, PostFun(post, PathFun(["unsubscribe"]), UnsubBody)),
     ?assertMatch(
         {error, {Http, _, Body}}, PostFun(post, PathFun(["unsubscribe", "bulk"]), [UnsubBody])
+    ),
+    %% Mqueue messages
+    ?assertMatch({error, {Http, _, Body}}, ReqFun(get, PathFun(["mqueue_messages"]))),
+    %% Inflight messages
+    ?assertMatch({error, {Http, _, Body}}, ReqFun(get, PathFun(["inflight_messages"]))).
+
+t_sessions_count(_Config) ->
+    ClientId = atom_to_binary(?FUNCTION_NAME),
+    Topic = <<"t/test_sessions_count">>,
+    Conf0 = emqx_config:get([broker]),
+    Conf1 = hocon_maps:deep_merge(Conf0, #{session_history_retain => 5}),
+    %% from 1 second ago, which is certainly less than the history retain duration,
+    %% hence forcing a call to the gen_server emqx_cm_registry_keeper
+    Since = erlang:system_time(seconds) - 1,
+    ok = emqx_config:put(#{broker => Conf1}),
+    {ok, Client} = emqtt:start_link([
+        {proto_ver, v5},
+        {clientid, ClientId},
+        {clean_start, true}
+    ]),
+    {ok, _} = emqtt:connect(Client),
+    {ok, _, _} = emqtt:subscribe(Client, Topic, 1),
+    Path = emqx_mgmt_api_test_util:api_path(["sessions_count"]),
+    AuthHeader = emqx_mgmt_api_test_util:auth_header_(),
+    ?assertMatch(
+        {ok, "1"},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "since=" ++ integer_to_list(Since), AuthHeader
+        )
+    ),
+    ok = emqtt:disconnect(Client),
+    %% simulate the situation in which the process is not running
+    ok = supervisor:terminate_child(emqx_cm_sup, emqx_cm_registry_keeper),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "since=" ++ integer_to_list(Since), AuthHeader
+        )
+    ),
+    %% restore default value
+    ok = emqx_config:put(#{broker => Conf0}),
+    ok = emqx_cm_registry_keeper:purge(),
+    ok.
+
+t_mqueue_messages(Config) ->
+    ClientId = atom_to_binary(?FUNCTION_NAME),
+    Topic = <<"t/test_mqueue_msgs">>,
+    Count = emqx_mgmt:default_row_limit(),
+    {ok, _Client} = client_with_mqueue(ClientId, Topic, Count),
+    Path = emqx_mgmt_api_test_util:api_path(["clients", ClientId, "mqueue_messages"]),
+    ?assert(Count =< emqx:get_config([mqtt, max_mqueue_len])),
+    AuthHeader = emqx_mgmt_api_test_util:auth_header_(),
+    test_messages(Path, Topic, Count, AuthHeader, ?config(payload_encoding, Config)),
+
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "limit=10&after=not-base64%23%21", AuthHeader
+        )
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "limit=-5&after=not-base64%23%21", AuthHeader
+        )
+    ).
+
+t_inflight_messages(Config) ->
+    ClientId = atom_to_binary(?FUNCTION_NAME),
+    Topic = <<"t/test_inflight_msgs">>,
+    PubCount = emqx_mgmt:default_row_limit(),
+    {ok, Client} = client_with_inflight(ClientId, Topic, PubCount),
+    Path = emqx_mgmt_api_test_util:api_path(["clients", ClientId, "inflight_messages"]),
+    InflightLimit = emqx:get_config([mqtt, max_inflight]),
+    AuthHeader = emqx_mgmt_api_test_util:auth_header_(),
+    test_messages(Path, Topic, InflightLimit, AuthHeader, ?config(payload_encoding, Config)),
+
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "limit=10&after=not-int", AuthHeader
+        )
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(
+            get, Path, "limit=-5&after=invalid-int", AuthHeader
+        )
+    ),
+    emqtt:stop(Client).
+
+client_with_mqueue(ClientId, Topic, Count) ->
+    {ok, Client} = emqtt:start_link([
+        {proto_ver, v5},
+        {clientid, ClientId},
+        {clean_start, false},
+        {properties, #{'Session-Expiry-Interval' => 120}}
+    ]),
+    {ok, _} = emqtt:connect(Client),
+    {ok, _, _} = emqtt:subscribe(Client, Topic, 1),
+    ok = emqtt:disconnect(Client),
+    publish_msgs(Topic, Count),
+    {ok, Client}.
+
+client_with_inflight(ClientId, Topic, Count) ->
+    {ok, Client} = emqtt:start_link([
+        {proto_ver, v5},
+        {clientid, ClientId},
+        {clean_start, true},
+        {auto_ack, never}
+    ]),
+    {ok, _} = emqtt:connect(Client),
+    {ok, _, _} = emqtt:subscribe(Client, Topic, 1),
+    publish_msgs(Topic, Count),
+    {ok, Client}.
+
+publish_msgs(Topic, Count) ->
+    lists:foreach(
+        fun(Seq) ->
+            emqx_broker:publish(emqx_message:make(undefined, ?QOS_1, Topic, integer_to_binary(Seq)))
+        end,
+        lists:seq(1, Count)
+    ).
+
+test_messages(Path, Topic, Count, AuthHeader, PayloadEncoding) ->
+    Qs0 = io_lib:format("payload=~s", [PayloadEncoding]),
+    {ok, MsgsResp} = emqx_mgmt_api_test_util:request_api(get, Path, Qs0, AuthHeader),
+    #{<<"meta">> := Meta, <<"data">> := Msgs} = emqx_utils_json:decode(MsgsResp),
+
+    ?assertMatch(
+        #{
+            <<"last">> := <<"end_of_data">>,
+            <<"count">> := Count
+        },
+        Meta
+    ),
+    ?assertEqual(Count, length(Msgs)),
+    lists:foreach(
+        fun({Seq, #{<<"payload">> := P} = M}) ->
+            ?assertEqual(Seq, binary_to_integer(decode_payload(P, PayloadEncoding))),
+            ?assertMatch(
+                #{
+                    <<"msgid">> := _,
+                    <<"topic">> := Topic,
+                    <<"qos">> := _,
+                    <<"publish_at">> := _,
+                    <<"from_clientid">> := _,
+                    <<"from_username">> := _
+                },
+                M
+            )
+        end,
+        lists:zip(lists:seq(1, Count), Msgs)
+    ),
+
+    %% The first message payload is <<"1">>,
+    %% and when it is urlsafe base64 encoded (with no padding), it's <<"MQ">>,
+    %% so we cover both cases:
+    %% - when total payload size exceeds the limit,
+    %% - when the first message payload already exceeds the limit but is still returned in the response.
+    QsPayloadLimit = io_lib:format("payload=~s&max_payload_bytes=1", [PayloadEncoding]),
+    {ok, LimitedMsgsResp} = emqx_mgmt_api_test_util:request_api(
+        get, Path, QsPayloadLimit, AuthHeader
+    ),
+    #{<<"meta">> := _, <<"data">> := FirstMsgOnly} = emqx_utils_json:decode(LimitedMsgsResp),
+    ct:pal("~p", [FirstMsgOnly]),
+    ?assertEqual(1, length(FirstMsgOnly)),
+    ?assertEqual(
+        <<"1">>, decode_payload(maps:get(<<"payload">>, hd(FirstMsgOnly)), PayloadEncoding)
+    ),
+
+    Limit = 19,
+    LastCont = lists:foldl(
+        fun(PageSeq, Cont) ->
+            Qs = io_lib:format("payload=~s&after=~s&limit=~p", [PayloadEncoding, Cont, Limit]),
+            {ok, MsgsRespP} = emqx_mgmt_api_test_util:request_api(get, Path, Qs, AuthHeader),
+            #{
+                <<"meta">> := #{<<"last">> := NextCont} = MetaP,
+                <<"data">> := MsgsP
+            } = emqx_utils_json:decode(MsgsRespP),
+            ?assertMatch(#{<<"count">> := Count}, MetaP),
+            ?assertNotEqual(<<"end_of_data">>, NextCont),
+            ?assertEqual(Limit, length(MsgsP)),
+            ExpFirstPayload = integer_to_binary(PageSeq * Limit - Limit + 1),
+            ExpLastPayload = integer_to_binary(PageSeq * Limit),
+            ?assertEqual(
+                ExpFirstPayload, decode_payload(maps:get(<<"payload">>, hd(MsgsP)), PayloadEncoding)
+            ),
+            ?assertEqual(
+                ExpLastPayload,
+                decode_payload(maps:get(<<"payload">>, lists:last(MsgsP)), PayloadEncoding)
+            ),
+            NextCont
+        end,
+        none,
+        lists:seq(1, Count div Limit)
+    ),
+    LastPartialPage = Count div Limit + 1,
+    LastQs = io_lib:format("payload=~s&after=~s&limit=~p", [PayloadEncoding, LastCont, Limit]),
+    {ok, MsgsRespLastP} = emqx_mgmt_api_test_util:request_api(get, Path, LastQs, AuthHeader),
+    #{
+        <<"meta">> := #{<<"last">> := EmptyCont} = MetaLastP,
+        <<"data">> := MsgsLastP
+    } = emqx_utils_json:decode(MsgsRespLastP),
+    ?assertEqual(<<"end_of_data">>, EmptyCont),
+    ?assertMatch(#{<<"count">> := Count}, MetaLastP),
+
+    ?assertEqual(
+        integer_to_binary(LastPartialPage * Limit - Limit + 1),
+        decode_payload(maps:get(<<"payload">>, hd(MsgsLastP)), PayloadEncoding)
+    ),
+    ?assertEqual(
+        integer_to_binary(Count),
+        decode_payload(maps:get(<<"payload">>, lists:last(MsgsLastP)), PayloadEncoding)
+    ),
+
+    ExceedQs = io_lib:format("payload=~s&after=~s&limit=~p", [
+        PayloadEncoding, EmptyCont, Limit
+    ]),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, ExceedQs, AuthHeader)
+    ),
+
+    %% Invalid common page params
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "limit=0", AuthHeader)
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "limit=limit", AuthHeader)
+    ),
+
+    %% Invalid max_payload_bytes param
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "max_payload_bytes=0", AuthHeader)
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "max_payload_bytes=-1", AuthHeader)
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "max_payload_bytes=-1MB", AuthHeader)
+    ),
+    ?assertMatch(
+        {error, {_, 400, _}},
+        emqx_mgmt_api_test_util:request_api(get, Path, "max_payload_bytes=0MB", AuthHeader)
     ).
 
+decode_payload(Payload, base64) ->
+    base64:decode(Payload);
+decode_payload(Payload, _) ->
+    Payload.
+
 t_subscribe_shared_topic(_Config) ->
     ClientId = <<"client_subscribe_shared">>,
 

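`test_messages/5` walks the messages in pages of `Limit = 19`. Page `P` is expected to hold payloads `P * Limit - Limit + 1` through `P * Limit`, and the final request returns the remaining partial page, which ends at `Count`. The sketch below computes those page boundaries. It uses a hypothetical `page_bounds_sketch` module that is not part of EMQX. The test's layout (`Count div Limit` full pages followed by one partial page) assumes that `Count` is not a multiple of `Limit`.

```erlang
-module(page_bounds_sketch).
-export([pages/2]).

%% Hypothetical helper (not part of EMQX): returns {FirstSeq, LastSeq} for each
%% page when messages with sequence numbers 1..Count are read Limit at a time.
pages(Count, Limit) when is_integer(Count), Count >= 0, is_integer(Limit), Limit > 0 ->
    %% Ceiling division: the last page may be partial.
    NumPages = (Count + Limit - 1) div Limit,
    [{(P - 1) * Limit + 1, min(P * Limit, Count)} || P <- lists:seq(1, NumPages)].
```

For example, with 100 messages and a limit of 19 this yields five full pages and a last page of `{96, 100}`.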
+ 2 - 2
apps/emqx_management/test/emqx_mgmt_api_configs_SUITE.erl

@@ -287,12 +287,12 @@ t_configs_node({'init', Config}) ->
         (other_node, _) -> <<"log=2">>;
         (bad_node, _) -> {badrpc, bad}
     end,
-    meck:expect(emqx_management_proto_v4, get_full_config, F),
+    meck:expect(emqx_management_proto_v5, get_full_config, F),
     meck:expect(emqx_conf_proto_v3, get_hocon_config, F2),
     meck:expect(hocon_pp, do, fun(Conf, _) -> Conf end),
     Config;
 t_configs_node({'end', _}) ->
-    meck:unload([emqx, emqx_management_proto_v4, emqx_conf_proto_v3, hocon_pp]);
+    meck:unload([emqx, emqx_management_proto_v5, emqx_conf_proto_v3, hocon_pp]);
 t_configs_node(_) ->
     Node = atom_to_list(node()),
 

+ 1 - 1
apps/emqx_opentelemetry/src/emqx_opentelemetry.app.src

@@ -1,6 +1,6 @@
 {application, emqx_opentelemetry, [
     {description, "OpenTelemetry for EMQX Broker"},
-    {vsn, "0.2.3"},
+    {vsn, "0.2.4"},
     {registered, []},
     {mod, {emqx_otel_app, []}},
     {applications, [

+ 13 - 1
apps/emqx_opentelemetry/src/emqx_otel_metrics.erl

@@ -104,7 +104,7 @@ safe_stop_default_metrics() ->
         _ = opentelemetry_experimental:stop_default_metrics(),
         ok
     catch
-        %% noramal scenario, metrics supervisor is not started
+        %% normal scenario, metrics supervisor is not started
         exit:{noproc, _} -> ok
     end.
 
@@ -254,6 +254,18 @@ create_counter(Meter, Counters, CallBack) ->
         Counters
     ).
 
+%% Note: list_to_existing_atom("cpu.use") crashes with badarg unless the atom
+%% already exists, so the dotted names are spelled out literally here
+normalize_name(cpu_use) ->
+    'cpu.use';
+normalize_name(cpu_idle) ->
+    'cpu.idle';
+normalize_name(run_queue) ->
+    'run.queue';
+normalize_name(total_memory) ->
+    'total.memory';
+normalize_name(used_memory) ->
+    'used.memory';
 normalize_name(Name) ->
     list_to_existing_atom(lists:flatten(string:replace(atom_to_list(Name), "_", ".", all))).
 

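`list_to_existing_atom/1` only succeeds for atoms that are already in the atom table. That is why the new `normalize_name/1` clauses spell out atoms such as `'cpu.use'` literally: the atoms exist as soon as the module is loaded. The sketch below shows the generic fallback path and returns the `badarg` as an error tuple instead of crashing. The `normalize_sketch` module and its `dotted/1` function are hypothetical and not part of EMQX.

```erlang
-module(normalize_sketch).
-export([dotted/1]).

%% Hypothetical helper (not part of EMQX): turns cpu_use into 'cpu.use', but only
%% if that atom already exists; list_to_existing_atom/1 raises badarg otherwise.
dotted(Name) when is_atom(Name) ->
    Dotted = lists:flatten(string:replace(atom_to_list(Name), "_", ".", all)),
    try
        {ok, list_to_existing_atom(Dotted)}
    catch
        error:badarg -> {error, {not_an_existing_atom, Dotted}}
    end.
```

Calling `dotted(cpu_use)` succeeds only because `'cpu.use'` has already been created elsewhere, for example by a module that mentions it literally.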
+ 20 - 11
apps/emqx_prometheus/src/emqx_prometheus.erl

@@ -195,7 +195,7 @@ collect_mf(?PROMETHEUS_DEFAULT_REGISTRY, Callback) ->
     ok = add_collect_family(Callback, stats_metric_meta(), ?MG(stats_data, RawData)),
     ok = add_collect_family(
         Callback,
-        stats_metric_cluster_consistened_meta(),
+        stats_metric_cluster_consistented_meta(),
         ?MG(stats_data_cluster_consistented, RawData)
     ),
     ok = add_collect_family(Callback, vm_metric_meta(), ?MG(vm_data, RawData)),
@@ -502,8 +502,6 @@ stats_metric_meta() ->
         {emqx_sessions_max, gauge, 'sessions.max'},
         {emqx_channels_count, gauge, 'channels.count'},
         {emqx_channels_max, gauge, 'channels.max'},
-        {emqx_cluster_sessions_count, gauge, 'cluster_sessions.count'},
-        {emqx_cluster_sessions_max, gauge, 'cluster_sessions.max'},
         %% pub/sub stats
         {emqx_suboptions_count, gauge, 'suboptions.count'},
         {emqx_suboptions_max, gauge, 'suboptions.max'},
@@ -511,21 +509,25 @@ stats_metric_meta() ->
         {emqx_subscribers_max, gauge, 'subscribers.max'},
         {emqx_subscriptions_count, gauge, 'subscriptions.count'},
         {emqx_subscriptions_max, gauge, 'subscriptions.max'},
-        {emqx_subscriptions_shared_count, gauge, 'subscriptions.shared.count'},
-        {emqx_subscriptions_shared_max, gauge, 'subscriptions.shared.max'},
         %% delayed
         {emqx_delayed_count, gauge, 'delayed.count'},
         {emqx_delayed_max, gauge, 'delayed.max'}
     ].
 
-stats_metric_cluster_consistened_meta() ->
+stats_metric_cluster_consistented_meta() ->
     [
+        %% sessions
+        {emqx_cluster_sessions_count, gauge, 'cluster_sessions.count'},
+        {emqx_cluster_sessions_max, gauge, 'cluster_sessions.max'},
         %% topics
         {emqx_topics_max, gauge, 'topics.max'},
         {emqx_topics_count, gauge, 'topics.count'},
         %% retained
         {emqx_retained_count, gauge, 'retained.count'},
-        {emqx_retained_max, gauge, 'retained.max'}
+        {emqx_retained_max, gauge, 'retained.max'},
+        %% shared subscriptions
+        {emqx_subscriptions_shared_count, gauge, 'subscriptions.shared.count'},
+        {emqx_subscriptions_shared_max, gauge, 'subscriptions.shared.max'}
     ].
 
 stats_data(Mode) ->
@@ -545,7 +547,7 @@ stats_data_cluster_consistented() ->
             AccIn#{Name => [{[], ?C(MetricKAtom, Stats)}]}
         end,
         #{},
-        stats_metric_cluster_consistened_meta()
+        stats_metric_cluster_consistented_meta()
     ).
 
 %%========================================
@@ -589,12 +591,19 @@ cluster_metric_meta() ->
         {emqx_cluster_nodes_stopped, gauge, undefined}
     ].
 
-cluster_data(Mode) ->
+cluster_data(node) ->
+    Labels = [],
+    do_cluster_data(Labels);
+cluster_data(_) ->
+    Labels = [{node, node(self())}],
+    do_cluster_data(Labels).
+
+do_cluster_data(Labels) ->
     Running = emqx:cluster_nodes(running),
     Stopped = emqx:cluster_nodes(stopped),
     #{
-        emqx_cluster_nodes_running => [{with_node_label(Mode, []), length(Running)}],
-        emqx_cluster_nodes_stopped => [{with_node_label(Mode, []), length(Stopped)}]
+        emqx_cluster_nodes_running => [{Labels, length(Running)}],
+        emqx_cluster_nodes_stopped => [{Labels, length(Stopped)}]
     }.
 
 %%========================================

+ 0 - 5
apps/emqx_prometheus/src/emqx_prometheus_cluster.erl

@@ -23,8 +23,6 @@
 
     collect_json_data/2,
 
-    aggre_cluster/3,
-
     point_to_map_fun/1,
 
     boolean_to_number/1,
@@ -83,9 +81,6 @@ aggre_cluster(Module, Mode) ->
         Module:aggre_or_zip_init_acc()
     ).
 
-aggre_cluster(LogicSumKs, ResL, Init) ->
-    do_aggre_cluster(LogicSumKs, ResL, Init).
-
 do_aggre_cluster(_LogicSumKs, [], AccIn) ->
     AccIn;
 do_aggre_cluster(LogicSumKs, [{ok, {_NodeName, NodeMetric}} | Rest], AccIn) ->

+ 18 - 6
apps/emqx_prometheus/test/emqx_prometheus_data_SUITE.erl

@@ -287,12 +287,18 @@ assert_stats_metric_labels([MetricName | R] = _Metric, Mode) ->
         undefined ->
             ok;
         N when is_integer(N) ->
-            %% ct:print(
-            %%     "====================~n"
-            %%     "%% Metric: ~p~n"
-            %%     "%% Expect labels count: ~p in Mode: ~p~n",
-            %%     [_Metric, N, Mode]
-            %% ),
+            case N =:= length(lists:droplast(R)) of
+                true ->
+                    ok;
+                false ->
+                    ct:print(
+                        "====================~n"
+                        "%% Metric: ~p~n"
+                        "%% Expect labels count: ~p in Mode: ~p~n"
+                        "%% But got labels: ~p~n",
+                        [_Metric, N, Mode, length(lists:droplast(R))]
+                    )
+            end,
             ?assertEqual(N, length(lists:droplast(R)))
     end.
 
@@ -304,10 +310,14 @@ assert_stats_metric_labels([MetricName | R] = _Metric, Mode) ->
 
 %% `/prometheus/stats`
 %% BEGIN always no label
+metric_meta(<<"emqx_cluster_sessions_count">>) -> ?meta(0, 0, 0);
+metric_meta(<<"emqx_cluster_sessions_max">>) -> ?meta(0, 0, 0);
 metric_meta(<<"emqx_topics_max">>) -> ?meta(0, 0, 0);
 metric_meta(<<"emqx_topics_count">>) -> ?meta(0, 0, 0);
 metric_meta(<<"emqx_retained_count">>) -> ?meta(0, 0, 0);
 metric_meta(<<"emqx_retained_max">>) -> ?meta(0, 0, 0);
+metric_meta(<<"emqx_subscriptions_shared_count">>) -> ?meta(0, 0, 0);
+metric_meta(<<"emqx_subscriptions_shared_max">>) -> ?meta(0, 0, 0);
 %% END
 %% BEGIN no label in mode `node`
 metric_meta(<<"emqx_vm_cpu_use">>) -> ?meta(0, 1, 1);
@@ -316,6 +326,8 @@ metric_meta(<<"emqx_vm_run_queue">>) -> ?meta(0, 1, 1);
 metric_meta(<<"emqx_vm_process_messages_in_queues">>) -> ?meta(0, 1, 1);
 metric_meta(<<"emqx_vm_total_memory">>) -> ?meta(0, 1, 1);
 metric_meta(<<"emqx_vm_used_memory">>) -> ?meta(0, 1, 1);
+metric_meta(<<"emqx_cluster_nodes_running">>) -> ?meta(0, 1, 1);
+metric_meta(<<"emqx_cluster_nodes_stopped">>) -> ?meta(0, 1, 1);
 %% END
 metric_meta(<<"emqx_cert_expiry_at">>) -> ?meta(2, 2, 2);
 metric_meta(<<"emqx_license_expiry_at">>) -> ?meta(0, 0, 0);

+ 1 - 1
apps/emqx_rule_engine/src/emqx_rule_engine_api.erl

@@ -705,7 +705,7 @@ generate_match_spec(Qs) ->
 generate_match_spec([], _, {MtchHead, Conds}) ->
     {MtchHead, lists:reverse(Conds)};
 generate_match_spec([Qs | Rest], N, {MtchHead, Conds}) ->
-    Holder = binary_to_atom(iolist_to_binary(["$", integer_to_list(N)]), utf8),
+    Holder = list_to_atom([$$ | integer_to_list(N)]),
     NMtchHead = emqx_mgmt_util:merge_maps(MtchHead, ms(element(1, Qs), Holder)),
     NConds = put_conds(Qs, Holder, Conds),
     generate_match_spec(Rest, N + 1, {NMtchHead, NConds}).

+ 1 - 1
apps/emqx_rule_engine/src/emqx_rule_funcs.erl

@@ -1327,7 +1327,7 @@ format_date(TimeUnit, Offset, FormatString, TimeEpoch) ->
 
 date_to_unix_ts(TimeUnit, FormatString, InputString) ->
     Unit = time_unit(TimeUnit),
-    emqx_utils_calendar:parse(InputString, Unit, FormatString).
+    emqx_utils_calendar:formatted_datetime_to_system_time(InputString, Unit, FormatString).
 
 date_to_unix_ts(TimeUnit, Offset, FormatString, InputString) ->
     Unit = time_unit(TimeUnit),

+ 105 - 13
apps/emqx_rule_engine/test/emqx_rule_funcs_SUITE.erl

@@ -1215,6 +1215,50 @@ timezone_to_offset_seconds_helper(FunctionName) ->
     apply_func(FunctionName, [local]),
     ok.
 
+t_date_to_unix_ts(_) ->
+    TestTab = [
+        {{"2024-03-01T10:30:38+08:00", second}, [
+            <<"second">>, <<"+08:00">>, <<"%Y-%m-%d %H-%M-%S">>, <<"2024-03-01 10:30:38">>
+        ]},
+        {{"2024-03-01T10:30:38.333+08:00", second}, [
+            <<"second">>, <<"+08:00">>, <<"%Y-%m-%d %H-%M-%S.%3N">>, <<"2024-03-01 10:30:38.333">>
+        ]},
+        {{"2024-03-01T10:30:38.333+08:00", millisecond}, [
+            <<"millisecond">>,
+            <<"+08:00">>,
+            <<"%Y-%m-%d %H-%M-%S.%3N">>,
+            <<"2024-03-01 10:30:38.333">>
+        ]},
+        {{"2024-03-01T10:30:38.333+08:00", microsecond}, [
+            <<"microsecond">>,
+            <<"+08:00">>,
+            <<"%Y-%m-%d %H-%M-%S.%3N">>,
+            <<"2024-03-01 10:30:38.333">>
+        ]},
+        {{"2024-03-01T10:30:38.333+08:00", nanosecond}, [
+            <<"nanosecond">>,
+            <<"+08:00">>,
+            <<"%Y-%m-%d %H-%M-%S.%3N">>,
+            <<"2024-03-01 10:30:38.333">>
+        ]},
+        {{"2024-03-01T10:30:38.333444+08:00", microsecond}, [
+            <<"microsecond">>,
+            <<"+08:00">>,
+            <<"%Y-%m-%d %H-%M-%S.%6N">>,
+            <<"2024-03-01 10:30:38.333444">>
+        ]}
+    ],
+    lists:foreach(
+        fun({{DateTime3339, Unit}, DateToTsArgs}) ->
+            ?assertEqual(
+                calendar:rfc3339_to_system_time(DateTime3339, [{unit, Unit}]),
+                apply_func(date_to_unix_ts, DateToTsArgs),
+                "Failed on test: " ++ DateTime3339 ++ "/" ++ atom_to_list(Unit)
+            )
+        end,
+        TestTab
+    ).
+
 t_parse_date_errors(_) ->
     ?assertError(
         bad_formatter_or_date,
@@ -1226,6 +1270,37 @@ t_parse_date_errors(_) ->
         bad_formatter_or_date,
         emqx_rule_funcs:date_to_unix_ts(second, <<"%y-%m-%d %H:%M:%S">>, <<"2022-05-26 10:40:12">>)
     ),
+    %% invalid formats
+    ?assertThrow(
+        {missing_date_part, month},
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"%Y-%d %H:%M:%S">>, <<"2022-32 10:40:12">>
+        )
+    ),
+    ?assertThrow(
+        {missing_date_part, year},
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"%H:%M:%S">>, <<"10:40:12">>
+        )
+    ),
+    ?assertError(
+        _,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"%Y-%m-%d %H:%M:%S">>, <<"2022-05-32 10:40:12">>
+        )
+    ),
+    ?assertError(
+        _,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"%Y-%m-%d %H:%M:%S">>, <<"2023-02-29 10:40:12">>
+        )
+    ),
+    ?assertError(
+        _,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-02-30 10:40:12">>
+        )
+    ),
 
     %% Compatibility test
     %% UTC+0
@@ -1245,25 +1320,42 @@ t_parse_date_errors(_) ->
         emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2022-05-26 10-40-12">>)
     ),
 
-    %% UTC+0
-    UnixTsLeap0 = 1582986700,
+    %% leap year checks
     ?assertEqual(
-        UnixTsLeap0,
-        emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2020-02-29 14:31:40">>)
+        %% UTC+0
+        1709217100,
+        emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-02-29 14:31:40">>)
     ),
-
-    %% UTC+0
-    UnixTsLeap1 = 1709297071,
     ?assertEqual(
-        UnixTsLeap1,
+        %% UTC+0
+        1709297071,
         emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-03-01 12:44:31">>)
     ),
-
-    %% UTC+0
-    UnixTsLeap2 = 1709535387,
     ?assertEqual(
-        UnixTsLeap2,
-        emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-03-04 06:56:27">>)
+        %% UTC+0
+        4107588271,
+        emqx_rule_funcs:date_to_unix_ts(second, <<"%Y-%m-%d %H:%M:%S">>, <<"2100-03-01 12:44:31">>)
+    ),
+    ?assertEqual(
+        %% UTC+8
+        1709188300,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"+08:00">>, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-02-29 14:31:40">>
+        )
+    ),
+    ?assertEqual(
+        %% UTC+8
+        1709268271,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"+08:00">>, <<"%Y-%m-%d %H:%M:%S">>, <<"2024-03-01 12:44:31">>
+        )
+    ),
+    ?assertEqual(
+        %% UTC+8
+        4107559471,
+        emqx_rule_funcs:date_to_unix_ts(
+            second, <<"+08:00">>, <<"%Y-%m-%d %H:%M:%S">>, <<"2100-03-01 12:44:31">>
+        )
     ),
 
     %% None zero zone shift with millisecond level precision

+ 50 - 75
apps/emqx_utils/src/emqx_utils_calendar.erl

@@ -22,7 +22,7 @@
     formatter/1,
     format/3,
     format/4,
-    parse/3,
+    formatted_datetime_to_system_time/3,
     offset_second/1
 ]).
 
@@ -48,8 +48,9 @@
 -define(DAYS_PER_YEAR, 365).
 -define(DAYS_PER_LEAP_YEAR, 366).
 -define(DAYS_FROM_0_TO_1970, 719528).
--define(SECONDS_FROM_0_TO_1970, (?DAYS_FROM_0_TO_1970 * ?SECONDS_PER_DAY)).
-
+-define(DAYS_FROM_0_TO_10000, 2932897).
+-define(SECONDS_FROM_0_TO_1970, (?DAYS_FROM_0_TO_1970 * ?SECONDS_PER_DAY)).
+-define(SECONDS_FROM_0_TO_10000, (?DAYS_FROM_0_TO_10000 * ?SECONDS_PER_DAY)).
 %% the maximum value is the SECONDS_FROM_0_TO_10000 in the calendar.erl,
 %% here minus SECONDS_PER_DAY to tolerate timezone time offset,
 %% so the maximum date can reach 9999-12-31 which is ample.
@@ -171,10 +172,10 @@ format(Time, Unit, Offset, FormatterBin) when is_binary(FormatterBin) ->
 format(Time, Unit, Offset, Formatter) ->
     do_format(Time, time_unit(Unit), offset_second(Offset), Formatter).
 
-parse(DateStr, Unit, FormatterBin) when is_binary(FormatterBin) ->
-    parse(DateStr, Unit, formatter(FormatterBin));
-parse(DateStr, Unit, Formatter) ->
-    do_parse(DateStr, Unit, Formatter).
+formatted_datetime_to_system_time(DateStr, Unit, FormatterBin) when is_binary(FormatterBin) ->
+    formatted_datetime_to_system_time(DateStr, Unit, formatter(FormatterBin));
+formatted_datetime_to_system_time(DateStr, Unit, Formatter) ->
+    do_formatted_datetime_to_system_time(DateStr, Unit, Formatter).
 
 %%--------------------------------------------------------------------
 %% Time unit
@@ -467,56 +468,51 @@ padding(Data, _Len) ->
     Data.
 
 %%--------------------------------------------------------------------
-%% internal: parse part
+%% internal: formatted_datetime_to_system_time part
 %%--------------------------------------------------------------------
 
-do_parse(DateStr, Unit, Formatter) ->
+do_formatted_datetime_to_system_time(DateStr, Unit, Formatter) ->
     DateInfo = do_parse_date_str(DateStr, Formatter, #{}),
-    {Precise, PrecisionUnit} = precision(DateInfo),
-    Counter =
-        fun
-            (year, V, Res) ->
-                Res + dy(V) * ?SECONDS_PER_DAY * Precise - (?SECONDS_FROM_0_TO_1970 * Precise);
-            (month, V, Res) ->
-                Dm = dym(maps:get(year, DateInfo, 0), V),
-                Res + Dm * ?SECONDS_PER_DAY * Precise;
-            (day, V, Res) ->
-                Res + (V * ?SECONDS_PER_DAY * Precise);
-            (hour, V, Res) ->
-                Res + (V * ?SECONDS_PER_HOUR * Precise);
-            (minute, V, Res) ->
-                Res + (V * ?SECONDS_PER_MINUTE * Precise);
-            (second, V, Res) ->
-                Res + V * Precise;
-            (millisecond, V, Res) ->
-                case PrecisionUnit of
-                    millisecond ->
-                        Res + V;
-                    microsecond ->
-                        Res + (V * 1000);
-                    nanosecond ->
-                        Res + (V * 1000000)
-                end;
-            (microsecond, V, Res) ->
-                case PrecisionUnit of
-                    microsecond ->
-                        Res + V;
-                    nanosecond ->
-                        Res + (V * 1000)
-                end;
-            (nanosecond, V, Res) ->
-                Res + V;
-            (parsed_offset, V, Res) ->
-                Res - V * Precise
-        end,
-    Count = maps:fold(Counter, 0, DateInfo) - (?SECONDS_PER_DAY * Precise),
-    erlang:convert_time_unit(Count, PrecisionUnit, Unit).
-
-precision(#{nanosecond := _}) -> {1000_000_000, nanosecond};
-precision(#{microsecond := _}) -> {1000_000, microsecond};
-precision(#{millisecond := _}) -> {1000, millisecond};
-precision(#{second := _}) -> {1, second};
-precision(_) -> {1, second}.
+    PrecisionUnit = precision(DateInfo),
+    ToPrecisionUnit = fun(Time, FromUnit) ->
+        erlang:convert_time_unit(Time, FromUnit, PrecisionUnit)
+    end,
+    GetRequiredPart = fun(Key) ->
+        case maps:get(Key, DateInfo, undefined) of
+            undefined -> throw({missing_date_part, Key});
+            Value -> Value
+        end
+    end,
+    GetOptionalPart = fun(Key) -> maps:get(Key, DateInfo, 0) end,
+    Year = GetRequiredPart(year),
+    Month = GetRequiredPart(month),
+    Day = GetRequiredPart(day),
+    Hour = GetRequiredPart(hour),
+    Min = GetRequiredPart(minute),
+    Sec = GetRequiredPart(second),
+    DateTime = {{Year, Month, Day}, {Hour, Min, Sec}},
+    TotalSecs = datetime_to_system_time(DateTime) - GetOptionalPart(parsed_offset),
+    check(TotalSecs, DateStr, Unit),
+    TotalTime =
+        ToPrecisionUnit(TotalSecs, second) +
+            ToPrecisionUnit(GetOptionalPart(millisecond), millisecond) +
+            ToPrecisionUnit(GetOptionalPart(microsecond), microsecond) +
+            ToPrecisionUnit(GetOptionalPart(nanosecond), nanosecond),
+    erlang:convert_time_unit(TotalTime, PrecisionUnit, Unit).
+
+check(Secs, _, _) when Secs >= -?SECONDS_FROM_0_TO_1970, Secs < ?SECONDS_FROM_0_TO_10000 ->
+    ok;
+check(_Secs, DateStr, Unit) ->
+    throw({bad_format, #{date_string => DateStr, to_unit => Unit}}).
+
+datetime_to_system_time(DateTime) ->
+    calendar:datetime_to_gregorian_seconds(DateTime) - ?SECONDS_FROM_0_TO_1970.
+
+precision(#{nanosecond := _}) -> nanosecond;
+precision(#{microsecond := _}) -> microsecond;
+precision(#{millisecond := _}) -> millisecond;
+precision(#{second := _}) -> second;
+precision(_) -> second.
 
 do_parse_date_str(<<>>, _, Result) ->
     Result;
@@ -564,27 +560,6 @@ date_size(timezone) -> 5;
 date_size(timezone1) -> 6;
 date_size(timezone2) -> 9.
 
-dym(Y, M) ->
-    case is_leap_year(Y) of
-        true when M > 2 ->
-            dm(M) + 1;
-        _ ->
-            dm(M)
-    end.
-
-dm(1) -> 0;
-dm(2) -> 31;
-dm(3) -> 59;
-dm(4) -> 90;
-dm(5) -> 120;
-dm(6) -> 151;
-dm(7) -> 181;
-dm(8) -> 212;
-dm(9) -> 243;
-dm(10) -> 273;
-dm(11) -> 304;
-dm(12) -> 334.
-
 str_to_int_or_error(Str, Error) ->
     case string:to_integer(Str) of
         {Int, []} ->

+ 19 - 8
bin/emqx

@@ -529,7 +529,6 @@ else
         tmp_proto_dist=$(echo -e "$PS_LINE" | $GREP -oE '\s-ekka_proto_dist.*' | awk '{print $2}' || echo 'inet_tcp')
         SSL_DIST_OPTFILE="$(echo -e "$PS_LINE" | $GREP -oE '\-ssl_dist_optfile\s.+\s' | awk '{print $2}' || true)"
         tmp_ticktime="$(echo -e "$PS_LINE" | $GREP -oE '\s-kernel\snet_ticktime\s.+\s' | awk '{print $3}' || true)"
-        # data_dir is actually not needed, but kept anyway
         tmp_datadir="$(echo -e "$PS_LINE" | $GREP -oE "\-emqx_data_dir.*" | sed -E 's#.+emqx_data_dir[[:blank:]]##g' | sed -E 's#[[:blank:]]--$##g' || true)"
         ## Make the format like what call_hocon multi_get prints out, but only need 4 args
         EMQX_BOOT_CONFIGS="node.name=${tmp_nodename}\nnode.cookie=${tmp_cookie}\ncluster.proto_dist=${tmp_proto_dist}\nnode.dist_net_ticktime=$tmp_ticktime\nnode.data_dir=${tmp_datadir}"
@@ -747,7 +746,11 @@ relx_start_command() {
 # Function to check configs without generating them
 check_config() {
     ## this command checks the configs without generating any files
-    call_hocon -v -s "$SCHEMA_MOD" -c "$EMQX_ETC_DIR"/emqx.conf check_schema
+    call_hocon -v \
+        -s "$SCHEMA_MOD" \
+        -c "$DATA_DIR"/configs/cluster.hocon \
+        -c "$EMQX_ETC_DIR"/emqx.conf \
+        check_schema
 }
 
 # Function to generate app.config and vm.args
@@ -763,11 +766,19 @@ generate_config() {
     local NOW_TIME
     NOW_TIME="$(date +'%Y.%m.%d.%H.%M.%S')"
 
-    ## this command populates two files: app.<time>.config and vm.<time>.args
-    ## NOTE: the generate command merges environment variables to the base config (emqx.conf),
-    ## but does not include the cluster-override.conf and local-override.conf
-    ## meaning, certain overrides will not be mapped to app.<time>.config file
-    call_hocon -v -t "$NOW_TIME" -s "$SCHEMA_MOD" -c "$EMQX_ETC_DIR"/emqx.conf -d "$DATA_DIR"/configs generate
+    ## This command populates two files: app.<time>.config and vm.<time>.args
+    ## It takes input sources and overlays values in the following order (later sources win):
+    ##   - $DATA_DIR/configs/cluster.hocon (if it exists)
+    ##   - etc/emqx.conf
+    ##   - environment variables starting with EMQX_, e.g. EMQX_NODE__ROLE
+    ##
+    ## NOTE: it is a known issue that cluster.hocon may change right after the node boots up,
+    ##       because the node has to sync cluster.hocon from other nodes.
+    call_hocon -v -t "$NOW_TIME" \
+        -s "$SCHEMA_MOD" \
+        -c "$DATA_DIR"/configs/cluster.hocon \
+        -c "$EMQX_ETC_DIR"/emqx.conf \
+        -d "$DATA_DIR"/configs generate
 
     ## filenames are per-hocon convention
     CONF_FILE="$CONFIGS_DIR/app.$NOW_TIME.config"
@@ -986,7 +997,7 @@ if [[ "$IS_BOOT_COMMAND" == 'yes' && "$(get_boot_config 'node.db_backend')" == "
     if ! (echo -e "$COMPATIBILITY_INFO" | $GREP -q 'MNESIA_OK'); then
       logwarn "DB Backend is RLOG, but an incompatible OTP version has been detected. Falling back to using Mnesia DB backend."
       export EMQX_NODE__DB_BACKEND=mnesia
-      export EMQX_NODE__DB_ROLE=core
+      export EMQX_NODE__ROLE=core
     fi
 fi
 

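The overlay order described in the `generate_config` comment can be sketched in Erlang. This is a hypothetical illustration, not the hocon implementation. The module `env_overlay_sketch`, its functions, and the plain string-keyed maps standing in for parsed configs are all assumptions. The sketch does two things. First, it maps an `EMQX_`-prefixed variable name to a config path, assuming `__` separates path levels (so `EMQX_NODE__ROLE` becomes `node.role`). Second, it deep-merges sources so that later ones override earlier ones.

```erlang
-module(env_overlay_sketch).
-export([env_to_path/1, overlay/1]).

%% Hypothetical: map an EMQX_* variable name to a config path, assuming
%% "__" separates levels, e.g. "EMQX_NODE__ROLE" -> ["node", "role"].
env_to_path("EMQX_" ++ Rest) ->
    {ok, [string:lowercase(Part) || Part <- string:split(Rest, "__", all)]};
env_to_path(_Other) ->
    ignore.

%% Overlay sources in order, e.g. [ClusterHocon, EmqxConf, EnvVars];
%% values from later sources override values from earlier ones.
overlay(Sources) ->
    lists:foldl(fun deep_merge/2, #{}, Sources).

%% Merge Over on top of Base, recursing into nested maps.
deep_merge(Over, Base) ->
    maps:fold(
        fun(Key, Value, Acc) ->
            case {Value, maps:find(Key, Acc)} of
                {#{}, {ok, #{} = Old}} -> Acc#{Key => deep_merge(Value, Old)};
                _ -> Acc#{Key => Value}
            end
        end,
        Base,
        Over
    ).
```

Take `node.role` as an example. If it is set in cluster.hocon, in emqx.conf, and via `EMQX_NODE__ROLE`, the environment variable's value is the one that ends up in the result.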
+ 2 - 70
bin/nodetool

@@ -44,8 +44,6 @@ cleanup_key(Str0) ->
 
 do(Args) ->
     ok = do_with_halt(Args, "mnesia_dir", fun create_mnesia_dir/2),
-    ok = do_with_halt(Args, "chkconfig", fun("-config", X) -> chkconfig(X) end),
-    ok = do_with_halt(Args, "chkconfig", fun chkconfig/1),
     Args1 = do_with_ret(
         Args,
         "-name",
@@ -185,7 +183,7 @@ do(Args) ->
         Other ->
             io:format("Other: ~p~n", [Other]),
             io:format(
-                "Usage: nodetool chkconfig|getpid|ping|stop|rpc|rpc_infinity|rpcterms|eval|cold_eval [Terms] [RPC]\n"
+                "Usage: nodetool getpid|ping|stop|rpc|rpc_infinity|rpcterms|eval|cold_eval [Terms] [RPC]\n"
             )
     end,
     net_kernel:stop().
@@ -205,11 +203,7 @@ shutdown_status_loop() ->
 parse_eval_args(Args) ->
     % shells may process args into more than one, and end up stripping
     % spaces, so this converts all of that to a single string to parse
-    String = binary_to_list(
-        list_to_binary(
-            join(Args, " ")
-        )
-    ),
+    String = lists:flatten(lists:join(" ", Args)),
 
     % then just as a convenience to users, if they forgot a trailing
     % '.' add it for them.
@@ -309,36 +303,6 @@ create_mnesia_dir(DataDir, NodeName) ->
     io:format("~s", [MnesiaDir]),
     halt(0).
 
-chkconfig(File) ->
-    case file:consult(File) of
-        {ok, Terms} ->
-            case validate(Terms) of
-                ok ->
-                    halt(0);
-                {error, Problems} ->
-                    lists:foreach(fun print_issue/1, Problems),
-                    %% halt(1) if any problems were errors
-                    halt(
-                        case [x || {error, _} <- Problems] of
-                            [] -> 0;
-                            _ -> 1
-                        end
-                    )
-            end;
-        {error, {Line, Mod, Term}} ->
-            io:format(
-                standard_error, ["Error on line ", file:format_error({Line, Mod, Term}), "\n"], []
-            ),
-            halt(1);
-        {error, Error} ->
-            io:format(
-                standard_error,
-                ["Error reading config file: ", File, " ", file:format_error(Error), "\n"],
-                []
-            ),
-            halt(1)
-    end.
-
 check_license(Config) ->
     ok = ensure_application_load(emqx_license),
     %% This checks formal license validity to ensure
@@ -379,38 +343,6 @@ consult(Cont, Str, Acc) ->
             consult(Cont1, eof, Acc)
     end.
 
-%%
-%% Validation functions for checking the app.config
-%%
-validate([Terms]) ->
-    Results = [ValidateFun(Terms) || ValidateFun <- get_validation_funs()],
-    Failures = [Res || Res <- Results, Res /= true],
-    case Failures of
-        [] ->
-            ok;
-        _ ->
-            {error, Failures}
-    end.
-
-%% Some initial and basic checks for the app.config file
-get_validation_funs() ->
-    [].
-
-print_issue({warning, Warning}) ->
-    io:format(standard_error, "Warning in app.config: ~s~n", [Warning]);
-print_issue({error, Error}) ->
-    io:format(standard_error, "Error in app.config: ~s~n", [Error]).
-
-%% string:join/2 copy; string:join/2 is getting obsoleted
-%% and replaced by lists:join/2, but lists:join/2 is too new
-%% for version support (only appeared in 19.0) so it cannot be
-%% used. Instead we just adopt join/2 locally and hope it works
-%% for most unicode use cases anyway.
-join([], Sep) when is_list(Sep) ->
-    [];
-join([H | T], Sep) ->
-    H ++ lists:append([Sep ++ X || X <- T]).
-
 add_libs_dir() ->
     [_ | _] = RootDir = os:getenv("RUNNER_ROOT_DIR"),
     CurrentVsn = os:getenv("REL_VSN"),

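The joining step in `parse_eval_args/1` can be sketched as a small standalone evaluator. The sketch also covers the "add a trailing '.' if the user forgot it" convenience mentioned in that function's comments. The module name `eval_args_sketch` is hypothetical. Using `erl_scan`/`erl_parse`/`erl_eval` to evaluate the result is also an assumption, because the diff does not show how nodetool consumes the parsed string.

```erlang
-module(eval_args_sketch).
-export([eval/1]).

%% Hypothetical: join shell-split arguments back into one string, append a
%% terminating '.' when missing, then scan, parse and evaluate it.
eval(Args) ->
    String0 = lists:flatten(lists:join(" ", Args)),
    String =
        case lists:reverse(string:trim(String0)) of
            [$. | _] -> String0;
            _ -> String0 ++ "."
        end,
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    {value, Value, _Bindings} = erl_eval:exprs(Exprs, erl_eval:new_bindings()),
    Value.
```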
+ 12 - 14
build

@@ -183,10 +183,10 @@ just_compile() {
 just_compile_elixir() {
     ./scripts/pre-compile.sh "$PROFILE"
     rm -f rebar.lock
-    # shellcheck disable=SC1010
-    env MIX_ENV="$PROFILE" mix do local.hex --if-missing --force, \
-        local.rebar rebar3 "${PWD}/rebar3" --if-missing --force, \
-        deps.get
+    env MIX_ENV="$PROFILE" mix local.rebar --if-missing --force
+    env MIX_ENV="$PROFILE" mix local.rebar rebar3 "${PWD}/rebar3" --if-missing --force
+    env MIX_ENV="$PROFILE" mix local.hex --if-missing --force
+    env MIX_ENV="$PROFILE" mix deps.get
     env MIX_ENV="$PROFILE" mix compile
 }
 
@@ -201,13 +201,11 @@ make_rel() {
 make_elixir_rel() {
     ./scripts/pre-compile.sh "$PROFILE"
     export_elixir_release_vars "$PROFILE"
-    # for some reason, this has to be run outside "do"...
-    mix local.rebar --if-missing --force
-    # shellcheck disable=SC1010
-    mix do local.hex --if-missing --force, \
-        local.rebar rebar3 "${PWD}/rebar3" --if-missing --force, \
-        deps.get
-    mix release --overwrite
+    env MIX_ENV="$PROFILE" mix local.rebar --if-missing --force
+    env MIX_ENV="$PROFILE" mix local.rebar rebar3 "${PWD}/rebar3" --if-missing --force
+    env MIX_ENV="$PROFILE" mix local.hex --if-missing --force
+    env MIX_ENV="$PROFILE" mix deps.get
+    env MIX_ENV="$PROFILE" mix release --overwrite
     assert_no_excluded_deps emqx-enterprise emqx_telemetry
 }
 
@@ -395,10 +393,10 @@ function is_ecr_and_enterprise() {
   fi
 }
 
-## Build the default docker image based on debian 11.
+## Build the default docker image based on debian 12.
 make_docker() {
     local EMQX_BUILDER_VERSION="${EMQX_BUILDER_VERSION:-5.3-2}"
-    local EMQX_BUILDER_PLATFORM="${EMQX_BUILDER_PLATFORM:-debian11}"
+    local EMQX_BUILDER_PLATFORM="${EMQX_BUILDER_PLATFORM:-debian12}"
     local EMQX_BUILDER_OTP="${EMQX_BUILDER_OTP:-25.3.2-2}"
     local EMQX_BUILDER_ELIXIR="${EMQX_BUILDER_ELIXIR:-1.15.7}"
     local EMQX_BUILDER=${EMQX_BUILDER:-ghcr.io/emqx/emqx-builder/${EMQX_BUILDER_VERSION}:${EMQX_BUILDER_ELIXIR}-${EMQX_BUILDER_OTP}-${EMQX_BUILDER_PLATFORM}}
@@ -431,7 +429,7 @@ make_docker() {
     local PRODUCT_URL='https://www.emqx.io'
     local PRODUCT_DESCRIPTION='Official docker image for EMQX, the most scalable open-source MQTT broker for IoT, IIoT, and connected vehicles.'
     local DOCUMENTATION_URL='https://www.emqx.io/docs/en/latest/'
-    ## extra_deps is a comma separated list of debian 11 package names
+    ## extra_deps is a comma separated list of debian 12 package names
     local EXTRA_DEPS=''
     if [[ "$PROFILE" = *enterprise* ]]; then
         EXTRA_DEPS='libsasl2-2,libsasl2-modules-gssapi-mit'

+ 1 - 1
changes/ce/feat-12326.en.md

@@ -11,4 +11,4 @@ A new gauge `cluster_sessions` is added to the metrics collection. Exposed to pr
 emqx_cluster_sessions_count 1234
 ```
 
-The counter can only be used for an approximate estimation as the collection and calculations are async.
+NOTE: The counter can only be used for an approximate estimation as the collection and calculations are async.

+ 21 - 0
changes/ce/feat-12561.en.md

@@ -0,0 +1,21 @@
+Implement HTTP APIs to get the list of a client's inflight and mqueue messages.
+
+To get the first chunk of data:
+ - GET /clients/{clientid}/mqueue_messages?limit=100
+ - GET /clients/{clientid}/inflight_messages?limit=100
+
+Alternatively:
+ - GET /clients/{clientid}/mqueue_messages?limit=100&after=none
+ - GET /clients/{clientid}/inflight_messages?limit=100&after=none
+
+To get the next chunk of data:
+ - GET /clients/{clientid}/mqueue_messages?limit=100&after={last}
+ - GET /clients/{clientid}/inflight_messages?limit=100&after={last}
+
+Where {last} is the value (an opaque string token) of the "meta.last" field from the previous response.
+
+If there is no more data, "last" = "end_of_data" is returned.
+If a subsequent request is made with "after=end_of_data", a "400 Bad Request" error response is returned.
+
+Mqueue messages are ordered according to the queue (FIFO) order.
+Inflight messages are ordered by MQTT Packet Id, which may not reflect the chronological order of the messages.

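The cursor semantics described in this changelog entry can be sketched as a pure function over an already-ordered list. This is a hypothetical illustration (`paging_sketch`). It uses increasing integer positions in place of the opaque `meta.last` token. It also assumes `end_of_data` is returned as soon as a chunk reaches the end of the list. The real handler's token encoding and ordering (for example, packet ids for inflight messages) may differ.

```erlang
-module(paging_sketch).
-export([page/3]).

%% Items: [{Position, Msg}] in response order, with increasing integer
%% positions standing in for the opaque cursor.
%% After: none | Position | end_of_data.
%% Returns {ok, Msgs, Last} or {error, bad_request}.
page(_Items, _Limit, end_of_data) ->
    {error, bad_request};
page(Items, Limit, After) when is_integer(Limit), Limit > 0 ->
    Rest =
        case After of
            none -> Items;
            Pos -> lists:dropwhile(fun({P, _}) -> P =< Pos end, Items)
        end,
    {Chunk, Remaining} = lists:split(min(Limit, length(Rest)), Rest),
    Last =
        case Remaining of
            [] -> end_of_data;
            _ -> element(1, lists:last(Chunk))
        end,
    {ok, [Msg || {_, Msg} <- Chunk], Last}.
```

A client would loop, passing each response's `Last` as the next `After`, until it sees `end_of_data`.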
+ 1 - 0
changes/ce/feat-12670.en.md

@@ -0,0 +1 @@
+Add the field `shared_subscriptions` to the endpoints `/monitor_current` and `/monitor_current/nodes/:node`.

+ 1 - 0
changes/ce/feat-12679.en.md

@@ -0,0 +1 @@
+Upgrade the Docker image base from Debian 11 to Debian 12.

+ 9 - 0
changes/ce/feat-12700.en.md

@@ -0,0 +1,9 @@
+Support "b" and "B" units in bytesize hocon fields.
+
+For example, all three fields below will have the value of 1024 bytes:
+
+```
+bytesize_field = "1024b"
+bytesize_field2 = "1024B"
+bytesize_field3 = 1024
+```

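A converter that accepts the three forms in the example can be sketched as follows. The module `bytesize_sketch` is hypothetical, not the hocon implementation. The `kb`/`mb`/`gb` multipliers (powers of 1024, case-insensitive) are also an assumption, included only to show where the new `b`/`B` suffix fits.

```erlang
-module(bytesize_sketch).
-export([to_bytes/1]).

%% Unquoted integers are already a byte count.
to_bytes(N) when is_integer(N), N >= 0 ->
    N;
to_bytes(Bin) when is_binary(Bin) ->
    to_bytes(binary_to_list(Bin));
to_bytes(Str) when is_list(Str) ->
    case string:to_integer(Str) of
        {N, Suffix} when is_integer(N), N >= 0 ->
            %% "1024b" and "1024B" both end up in multiplier("b").
            N * multiplier(string:lowercase(Suffix));
        _ ->
            erlang:error({bad_bytesize, Str})
    end.

%% Hypothetical unit table; only "b"/"B" comes from the changelog.
multiplier("") -> 1;
multiplier("b") -> 1;
multiplier("kb") -> 1024;
multiplier("mb") -> 1024 * 1024;
multiplier("gb") -> 1024 * 1024 * 1024;
multiplier(Other) -> erlang:error({bad_unit, Other}).
```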
+ 12 - 0
changes/ce/feat-12719.en.md

@@ -0,0 +1,12 @@
+## Support multiple clientid and username query string parameters in the "/clients" API
+
+Examples of multi-clientid/username queries:
+ - "/clients?clientid=client1&clientid=client2"
+ - "/clients?username=user1&username=user2"
+ - "/clients?clientid=client1&clientid=client2&username=user1&username=user2"
+
+## Add an option to specify which client info fields must be included in the response
+
+Examples of selecting response fields:
+ - "/clients?fields=all" (omitting the "fields" query string parameter defaults to returning all fields)
+ - "/clients?fields=clientid,username"

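The parameter handling in this changelog entry can be sketched with OTP's `uri_string:dissect_query/1`. Repeated keys are collected into lists. A comma-separated `fields` value selects map keys, and omitting it (or passing `all`) keeps every field. The module `clients_query_sketch` and the string-keyed client maps are assumptions for illustration. So is the matching rule: values of the same key are ORed, while different keys are ANDed.

```erlang
-module(clients_query_sketch).
-export([parse/1, filter/2]).

%% Collect repeated clientid/username keys; fields defaults to all.
parse(QueryString) ->
    Pairs = uri_string:dissect_query(QueryString),
    Fields =
        case [V || {"fields", V} <- Pairs] of
            [] -> all;
            ["all" | _] -> all;
            [F | _] -> string:split(F, ",", all)
        end,
    #{
        clientid => [V || {"clientid", V} <- Pairs],
        username => [V || {"username", V} <- Pairs],
        fields => Fields
    }.

%% Keep clients matching any listed clientid AND any listed username
%% (an empty list means "no filter"), then project the requested fields.
filter(Clients, #{clientid := Ids, username := Users, fields := Fields}) ->
    [
        project(C, Fields)
     || C <- Clients,
        matches(maps:get("clientid", C), Ids),
        matches(maps:get("username", C), Users)
    ].

matches(_Value, []) -> true;
matches(Value, Allowed) -> lists:member(Value, Allowed).

project(Client, all) -> Client;
project(Client, Fields) -> maps:with(Fields, Client).
```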
+ 1 - 0
changes/ce/fix-12663.en.md

@@ -0,0 +1 @@
+Fixed an issue where the `emqx_vm_cpu_use` and `emqx_vm_cpu_idle` metrics in the Prometheus endpoint `/prometheus/stats` always reported the average usage since operating system boot.

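The difference between the old and the fixed behavior can be illustrated with cumulative CPU tick counters, such as those Linux exposes in `/proc/stat`. Dividing cumulative busy ticks by cumulative total ticks gives the average since boot. Current usage instead comes from the difference between two samples. The module `cpu_usage_sketch` and the `{BusyTicks, TotalTicks}` sample format are assumptions; the actual fix is not shown in this diff.

```erlang
-module(cpu_usage_sketch).
-export([since_boot/1, interval_use/2, interval_idle/2]).

%% Samples are {BusyTicks, TotalTicks}, both cumulative since OS boot.

%% Average usage (%) since boot: all a single cumulative sample can give.
since_boot({Busy, Total}) when Total > 0 ->
    100 * Busy / Total.

%% Usage (%) between two samples: only ticks elapsed in the interval count.
interval_use({Busy0, Total0}, {Busy1, Total1}) when Total1 > Total0 ->
    100 * (Busy1 - Busy0) / (Total1 - Total0).

interval_idle(Prev, Curr) ->
    100 - interval_use(Prev, Curr).
```

With samples `{100, 1000}` and `{250, 1250}`, the since-boot figure is 20% even though the CPU was 60% busy during the last interval.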
+ 2 - 0
changes/ce/fix-12668.en.md

@@ -0,0 +1,2 @@
+Refactor the SQL function `date_to_unix_ts()` to use `calendar:datetime_to_gregorian_seconds/1`.
+This change also adds validation for the input date format.

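The arithmetic behind the refactored `date_to_unix_ts()` can be sketched in four steps:

1. Convert the calendar date-time to Gregorian seconds.
2. Shift the result to the Unix epoch.
3. Subtract the parsed UTC offset.
4. Convert to the requested unit.

This is a simplified, hypothetical module (`date_ts_sketch`). Unlike the code in the diff, it only handles millisecond sub-second precision and omits the range check.

```erlang
-module(date_ts_sketch).
-export([to_unix/2]).

%% Gregorian seconds at 1970-01-01T00:00:00Z, i.e.
%% calendar:datetime_to_gregorian_seconds({{1970,1,1},{0,0,0}}).
-define(SECONDS_FROM_0_TO_1970, 62167219200).

%% DateInfo: map of parsed date parts; parsed_offset (UTC offset in seconds)
%% and millisecond are optional. Unit: an erlang:time_unit(), e.g. second.
to_unix(DateInfo, Unit) ->
    Required = fun(Key) ->
        case maps:find(Key, DateInfo) of
            {ok, Value} -> Value;
            error -> throw({missing_date_part, Key})
        end
    end,
    Year = Required(year),
    Month = Required(month),
    Day = Required(day),
    Hour = Required(hour),
    Min = Required(minute),
    Sec = Required(second),
    DateTime = {{Year, Month, Day}, {Hour, Min, Sec}},
    Secs =
        calendar:datetime_to_gregorian_seconds(DateTime) -
            ?SECONDS_FROM_0_TO_1970 -
            maps:get(parsed_offset, DateInfo, 0),
    Millis = Secs * 1000 + maps:get(millisecond, DateInfo, 0),
    erlang:convert_time_unit(Millis, millisecond, Unit).
```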
+ 0 - 0
changes/ce/fix-12672.en.md


Some files were not shown because too many files changed in this diff