
Merge remote-tracking branch 'origin/master' into release-50

Zaiming (Stone) Shi 3 years ago
Parent
Commit
b3f3bdeafe
100 changed files with 5165 additions and 515 deletions
  1. .ci/docker-compose-file/docker-compose-kafka.yaml (+8 -5)
  2. .github/PULL_REQUEST_TEMPLATE/v5.md (+1 -1)
  3. .github/workflows/run_test_cases.yaml (+1 -1)
  4. apps/emqx/etc/emqx.conf (+7 -3)
  5. apps/emqx/i18n/emqx_schema_i18n.conf (+356 -4)
  6. apps/emqx/include/emqx_quic.hrl (+25 -0)
  7. apps/emqx/rebar.config (+2 -2)
  8. apps/emqx/rebar.config.script (+15 -2)
  9. apps/emqx/src/emqx_connection.erl (+60 -23)
  10. apps/emqx/src/emqx_limiter/src/emqx_limiter_schema.erl (+10 -10)
  11. apps/emqx/src/emqx_listeners.erl (+92 -17)
  12. apps/emqx/src/emqx_misc.erl (+1 -1)
  13. apps/emqx/src/emqx_quic_connection.erl (+267 -28)
  14. apps/emqx/src/emqx_quic_data_stream.erl (+469 -0)
  15. apps/emqx/src/emqx_quic_stream.erl (+158 -16)
  16. apps/emqx/src/emqx_schema.erl (+277 -47)
  17. apps/emqx/test/emqx_common_test_helpers.erl (+26 -4)
  18. apps/emqx/test/emqx_mqtt_protocol_v5_SUITE.erl (+1 -1)
  19. apps/emqx/test/emqx_quic_multistreams_SUITE.erl (+2041 -0)
  20. apps/emqx_authn/src/emqx_authn.app.src (+1 -1)
  21. apps/emqx_authn/src/simple_authn/emqx_authn_mysql.erl (+1 -1)
  22. apps/emqx_authz/src/emqx_authz.app.src (+1 -1)
  23. apps/emqx_authz/src/emqx_authz_api_schema.erl (+1 -1)
  24. apps/emqx_authz/src/emqx_authz_schema.erl (+1 -1)
  25. apps/emqx_conf/src/emqx_conf_schema.erl (+36 -36)
  26. apps/emqx_connector/i18n/emqx_connector_mqtt_schema.conf (+6 -2)
  27. apps/emqx_connector/src/emqx_connector_http.erl (+1 -1)
  28. apps/emqx_connector/src/mqtt/emqx_connector_mqtt_schema.erl (+2 -2)
  29. apps/emqx_ctl/src/emqx_ctl.erl (+1 -1)
  30. apps/emqx_dashboard/src/emqx_dashboard_monitor_api.erl (+19 -26)
  31. apps/emqx_dashboard/src/emqx_dashboard_schema.erl (+5 -5)
  32. apps/emqx_dashboard/src/emqx_dashboard_swagger.erl (+5 -1)
  33. apps/emqx_dashboard/test/emqx_dashboard_monitor_SUITE.erl (+2 -4)
  34. apps/emqx_dashboard/test/emqx_swagger_remote_schema.erl (+3 -3)
  35. apps/emqx_dashboard/test/emqx_swagger_requestBody_SUITE.erl (+32 -2)
  36. apps/emqx_dashboard/test/emqx_swagger_response_SUITE.erl (+1 -1)
  37. apps/emqx_exhook/src/emqx_exhook.app.src (+1 -1)
  38. apps/emqx_exhook/src/emqx_exhook_api.erl (+2 -2)
  39. apps/emqx_exhook/src/emqx_exhook_schema.erl (+2 -2)
  40. apps/emqx_gateway/src/emqx_gateway_api_clients.erl (+6 -2)
  41. apps/emqx_gateway/src/emqx_gateway_schema.erl (+5 -5)
  42. apps/emqx_machine/src/emqx_cover.erl (+214 -0)
  43. apps/emqx_machine/src/emqx_machine.app.src (+1 -1)
  44. apps/emqx_management/include/emqx_mgmt.hrl (+1 -1)
  45. apps/emqx_management/src/emqx_mgmt.erl (+41 -57)
  46. apps/emqx_management/src/emqx_mgmt_api.erl (+3 -3)
  47. apps/emqx_management/src/emqx_mgmt_api_clients.erl (+6 -5)
  48. apps/emqx_management/src/emqx_mgmt_api_trace.erl (+33 -11)
  49. apps/emqx_management/src/emqx_mgmt_util.erl (+1 -1)
  50. apps/emqx_management/test/emqx_mgmt_SUITE.erl (+387 -0)
  51. apps/emqx_management/test/emqx_mgmt_api_alarms_SUITE.erl (+1 -1)
  52. apps/emqx_management/test/emqx_mgmt_api_clients_SUITE.erl (+12 -5)
  53. apps/emqx_management/test/emqx_mgmt_api_subscription_SUITE.erl (+3 -3)
  54. apps/emqx_management/test/emqx_mgmt_api_topics_SUITE.erl (+1 -1)
  55. apps/emqx_management/test/emqx_mgmt_api_trace_SUITE.erl (+15 -8)
  56. apps/emqx_modules/test/emqx_telemetry_SUITE.erl (+2 -1)
  57. apps/emqx_plugins/src/emqx_plugins.app.src (+1 -1)
  58. apps/emqx_plugins/src/emqx_plugins_schema.erl (+2 -2)
  59. apps/emqx_prometheus/src/emqx_prometheus_schema.erl (+2 -2)
  60. apps/emqx_resource/src/emqx_resource_buffer_worker.erl (+221 -90)
  61. apps/emqx_resource/test/emqx_connector_demo.erl (+49 -10)
  62. apps/emqx_resource/test/emqx_resource_SUITE.erl (+110 -18)
  63. apps/emqx_retainer/src/emqx_retainer_api.erl (+1 -1)
  64. apps/emqx_retainer/src/emqx_retainer_schema.erl (+3 -3)
  65. apps/emqx_rule_engine/src/emqx_rule_engine_schema.erl (+1 -1)
  66. apps/emqx_slow_subs/src/emqx_slow_subs.app.src (+1 -1)
  67. apps/emqx_slow_subs/src/emqx_slow_subs_schema.erl (+2 -2)
  68. apps/emqx_statsd/src/emqx_statsd_api.erl (+3 -3)
  69. apps/emqx_statsd/src/emqx_statsd_schema.erl (+2 -2)
  70. bin/emqx (+9 -3)
  71. build (+10 -0)
  72. changes/ce/feat-10019.en.md (+1 -0)
  73. changes/ce/feat-10019.zh.md (+1 -0)
  74. changes/ce/feat-9213.en.md (+1 -0)
  75. changes/ce/feat-9213.zh.md (+1 -0)
  76. changes/ce/feat-9949.en.md (+2 -0)
  77. changes/ce/feat-9949.zh.md (+1 -0)
  78. changes/ce/fix-10009.en.md (+1 -0)
  79. changes/ce/fix-10009.zh.md (+1 -0)
  80. changes/ce/fix-10015.en.md (+7 -0)
  81. changes/ce/fix-10015.zh.md (+4 -0)
  82. changes/ce/fix-10020.en.md (+1 -0)
  83. changes/ce/fix-10020.zh.md (+1 -0)
  84. changes/ce/fix-10021.en.md (+1 -0)
  85. changes/ce/fix-10021.zh.md (+1 -0)
  86. changes/ce/fix-9939.en.md (+3 -0)
  87. changes/ce/fix-9939.zh.md (+2 -0)
  88. changes/ce/fix-9997.en.md (+1 -0)
  89. changes/ce/fix-9997.zh.md (+1 -0)
  90. changes/ee/feat-10011.en.md (+1 -0)
  91. changes/ee/feat-10011.zh.md (+1 -0)
  92. changes/ee/feat-9932.en.md (+0 -0)
  93. changes/ee/feat-9932.zh.md (+0 -0)
  94. changes/ee/fix-10007.en.md (+5 -0)
  95. changes/ee/fix-10007.zh.md (+3 -0)
  96. changes/v5.0.18/fix-9966.en.md (+2 -0)
  97. changes/v5.0.18/fix-9966.zh.md (+2 -0)
  98. deploy/charts/README.md (+3 -0)
  99. deploy/charts/emqx-enterprise/README.md (+28 -12)
  100. deploy/charts/emqx-enterprise/templates/ingress.yaml (+0 -0)

+ 8 - 5
.ci/docker-compose-file/docker-compose-kafka.yaml

@@ -2,7 +2,7 @@ version: '3.9'
 
 services:
   zookeeper:
-    image: wurstmeister/zookeeper
+    image: docker.io/library/zookeeper:3.6
     ports:
       - "2181:2181"
     container_name: zookeeper
@@ -39,9 +39,12 @@ services:
     container_name: kafka-1.emqx.net
     hostname: kafka-1.emqx.net
     depends_on:
-      - "kdc"
-      - "zookeeper"
-      - "ssl_cert_gen"
+      kdc:
+        condition: service_started
+      zookeeper:
+        condition: service_started
+      ssl_cert_gen:
+        condition: service_completed_successfully
     environment:
       KAFKA_BROKER_ID: 1
       KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
@@ -52,7 +55,7 @@ services:
       KAFKA_SASL_ENABLED_MECHANISMS: PLAIN,SCRAM-SHA-256,SCRAM-SHA-512,GSSAPI
       KAFKA_SASL_KERBEROS_SERVICE_NAME: kafka
       KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL: PLAIN
-      KAFKA_JMX_OPTS: "-Djava.security.auth.login.config=/etc/kafka/jaas.conf"
+      KAFKA_OPTS: "-Djava.security.auth.login.config=/etc/kafka/jaas.conf"
       KAFKA_ALLOW_EVERYONE_IF_NO_ACL_FOUND: "true"
       KAFKA_CREATE_TOPICS_NG: test-topic-one-partition:1:1,test-topic-two-partitions:2:1,test-topic-three-partitions:3:1,
       KAFKA_AUTHORIZER_CLASS_NAME: kafka.security.auth.SimpleAclAuthorizer

+ 1 - 1
.github/PULL_REQUEST_TEMPLATE/v5.md

@@ -5,7 +5,7 @@ Please convert it to a draft if any of the following conditions are not met. Rev
 
 - [ ] Added tests for the changes
 - [ ] Changed lines covered in coverage report
-- [ ] Change log has been added to `changes/<version>/(feat|fix)-<PR-id>.en.md` and `.zh.md` files
+- [ ] Change log has been added to `changes/{ce,ee}/(feat|perf|fix)-<PR-id>.en.md` and `.zh.md` files
 - [ ] For internal contributor: there is a jira ticket to track this change
 - [ ] If there should be document changes, a PR to emqx-docs.git is sent, or a jira ticket is created to follow up
 - [ ] Schema changes are backward compatible

+ 1 - 1
.github/workflows/run_test_cases.yaml

@@ -56,7 +56,7 @@ jobs:
               echo "runs-on=${RUNS_ON}" | tee -a $GITHUB_OUTPUT
 
     prepare:
-        runs-on: aws-amd64
+        runs-on: ${{ needs.build-matrix.outputs.runs-on }}
         needs: [build-matrix]
         strategy:
           fail-fast: false

+ 7 - 3
apps/emqx/etc/emqx.conf

@@ -34,6 +34,10 @@ listeners.wss.default {
 #  enabled = true
 #  bind = "0.0.0.0:14567"
 #  max_connections = 1024000
-#  keyfile = "{{ platform_etc_dir }}/certs/key.pem"
-#  certfile = "{{ platform_etc_dir }}/certs/cert.pem"
-#}
+#  ssl_options {
+#   verify = verify_none
+#   keyfile = "{{ platform_etc_dir }}/certs/key.pem"
+#   certfile = "{{ platform_etc_dir }}/certs/cert.pem"
+#   cacertfile = "{{ platform_etc_dir }}/certs/cacert.pem"
+#  }
+# }

+ 356 - 4
apps/emqx/i18n/emqx_schema_i18n.conf

@@ -1815,8 +1815,8 @@ fields_listener_enabled {
 
 fields_mqtt_quic_listener_certfile {
     desc {
-        en: """Path to the certificate file."""
-        zh: """证书文件。"""
+        en: """Path to the certificate file. Will be deprecated in 5.1, use .ssl_options.certfile instead."""
+        zh: """证书文件。在 5.1 中会被废弃,使用 .ssl_options.certfile 代替。"""
     }
     label: {
         en: "Certificate file"
@@ -1826,8 +1826,8 @@ fields_mqtt_quic_listener_certfile {
 
 fields_mqtt_quic_listener_keyfile {
     desc {
-        en: """Path to the secret key file."""
-        zh: """私钥文件。"""
+        en: """Path to the secret key file. Will be deprecated in 5.1, use .ssl_options.keyfile instead."""
+        zh: """私钥文件。在 5.1 中会被废弃,使用 .ssl_options.keyfile 代替。"""
     }
     label: {
         en: "Key file"
@@ -1868,6 +1868,17 @@ fields_mqtt_quic_listener_keep_alive_interval {
     }
 }
 
+fields_mqtt_quic_listener_ssl_options {
+    desc {
+        en: """TLS options for QUIC transport"""
+        zh: """QUIC 传输层的 TLS 选项"""
+    }
+    label: {
+        en: "TLS Options"
+        zh: "TLS 选项"
+    }
+}
+
 base_listener_bind {
     desc {
         en: """IP address and port for the listening socket."""
@@ -1890,6 +1901,347 @@ base_listener_acceptors {
     }
 }
 
+fields_mqtt_quic_listener_max_bytes_per_key {
+    desc {
+        en: "Maximum number of bytes to encrypt with a single 1-RTT encryption key before initiating key update. Default: 274877906944"
+        zh: "在启动密钥更新之前,用单个 1-RTT 加密密钥加密的最大字节数。默认值:274877906944"
+    }
+    label {
+        en: "Max bytes per key"
+        zh: "每个密钥的最大字节数"
+    }
+}
+
+fields_mqtt_quic_listener_handshake_idle_timeout_ms {
+    desc {
+        en: "How long a handshake can idle before it is discarded. Default: 10 000"
+        zh: "一个握手在被丢弃之前可以空闲多长时间。 默认值:10 000"
+    }
+    label {
+        en: "Handshake idle timeout ms"
+        zh: "握手空闲超时毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_tls_server_max_send_buffer {
+    desc {
+        en: "How much Server TLS data to buffer. Default: 8192"
+        zh: "缓冲多少TLS数据。 默认值:8192"
+    }
+    label {
+        en: "TLS server max send buffer"
+        zh: "TLS 服务器最大发送缓冲区"
+    }
+}
+
+fields_mqtt_quic_listener_stream_recv_window_default {
+    desc {
+        en: "Initial stream receive window size. Default: 32678"
+        zh: "初始流接收窗口大小。 默认值:32678"
+    }
+    label {
+        en: "Stream recv window default"
+        zh: "流接收窗口默认"
+    }
+}
+
+fields_mqtt_quic_listener_stream_recv_buffer_default {
+    desc {
+        en: "Stream initial buffer size. Default: 4096"
+        zh: "流的初始缓冲区大小。默认:4096"
+    }
+    label {
+        en: "Stream recv buffer default"
+        zh: "流媒体接收缓冲区默认值"
+    }
+}
+
+fields_mqtt_quic_listener_conn_flow_control_window {
+    desc {
+        en: "Connection-wide flow control window. Default: 16777216"
+        zh: "连接的流控窗口。默认:16777216"
+    }
+    label {
+        en: "Conn flow control window"
+        zh: "流控窗口"
+    }
+}
+
+fields_mqtt_quic_listener_max_stateless_operations {
+    desc {
+        en: "The maximum number of stateless operations that may be queued on a worker at any one time. Default: 16"
+        zh: "无状态操作的最大数量,在任何时候都可以在一个工作者上排队。默认值:16"
+    }
+    label {
+        en: "Max stateless operations"
+        zh: "最大无状态操作数"
+    }
+}
+
+fields_mqtt_quic_listener_initial_window_packets {
+    desc {
+        en: "The size (in packets) of the initial congestion window for a connection. Default: 10"
+        zh: "一个连接的初始拥堵窗口的大小(以包为单位)。默认值:10"
+    }
+    label {
+        en: "Initial window packets"
+        zh: "初始窗口数据包"
+    }
+}
+
+fields_mqtt_quic_listener_send_idle_timeout_ms {
+    desc {
+        en: "Reset congestion control after being idle for amount of time. Default: 1000"
+        zh: "在闲置一定时间后重置拥堵控制。默认值:1000"
+    }
+    label {
+        en: "Send idle timeout ms"
+        zh: "发送空闲超时毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_initial_rtt_ms {
+    desc {
+        en: "Initial RTT estimate."
+        zh: "初始RTT估计"
+    }
+    label {
+        en: "Initial RTT ms"
+        zh: "Initial RTT 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_max_ack_delay_ms {
+    desc {
+        en: "How long to wait after receiving data before sending an ACK. Default: 25"
+        zh: "在收到数据后要等待多长时间才能发送一个ACK。默认值:25"
+    }
+    label {
+        en: "Max ack delay ms"
+        zh: "最大应答延迟 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_disconnect_timeout_ms {
+    desc {
+        en: "How long to wait for an ACK before declaring a path dead and disconnecting. Default: 16000"
+        zh: "在判定路径无效和断开连接之前,要等待多长时间的ACK。默认:16000"
+    }
+    label {
+        en: "Disconnect timeout ms"
+        zh: "断开连接超时 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_idle_timeout_ms {
+    desc {
+        en: "How long a connection can go idle before it is gracefully shut down. 0 to disable timeout"
+        zh: "一个连接在被优雅地关闭之前可以空闲多长时间。0 表示禁用超时"
+    }
+    label {
+        en: "Idle timeout ms"
+        zh: "空闲超时 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_handshake_idle_timeout_ms {
+    desc {
+        en: "How long a handshake can idle before it is discarded"
+        zh: "一个握手在被丢弃之前可以空闲多长时间"
+    }
+    label {
+        en: "Handshake idle timeout ms"
+        zh: "握手空闲超时 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_keep_alive_interval_ms {
+    desc {
+        en: "How often to send PING frames to keep a connection alive."
+        zh: "多长时间发送一次PING帧以保活连接。"
+    }
+    label {
+        en: "Keep alive interval ms"
+        zh: "保持活着的时间间隔 毫秒"
+    }
+}
+
+fields_mqtt_quic_listener_peer_bidi_stream_count {
+    desc {
+        en: "Number of bidirectional streams to allow the peer to open."
+        zh: "允许对端打开的双向流的数量"
+    }
+    label {
+        en: "Peer bidi stream count"
+        zh: "对端双向流的数量"
+    }
+}
+
+fields_mqtt_quic_listener_peer_unidi_stream_count {
+    desc {
+        en: "Number of unidirectional streams to allow the peer to open."
+        zh: "允许对端打开的单向流的数量"
+    }
+    label {
+        en: "Peer unidi stream count"
+        zh: "对端单向流的数量"
+    }
+}
+
+fields_mqtt_quic_listener_retry_memory_limit {
+    desc {
+        en: "The percentage of available memory usable for handshake connections before stateless retry is used. Calculated as `N/65535`. Default: 65"
+        zh: "在使用无状态重试之前,可用于握手连接的可用内存的百分比。计算为`N/65535`。默认值:65"
+    }
+    label {
+        en: "Retry memory limit"
+        zh: "重试内存限制"
+    }
+}
+
+fields_mqtt_quic_listener_load_balancing_mode {
+    desc {
+        en: "0: Disabled, 1: SERVER_ID_IP, 2: SERVER_ID_FIXED. default: 0"
+        zh: "0: 禁用, 1: SERVER_ID_IP, 2: SERVER_ID_FIXED. 默认: 0"
+    }
+    label {
+        en: "Load balancing mode"
+        zh: "负载平衡模式"
+    }
+}
+
+fields_mqtt_quic_listener_max_operations_per_drain {
+    desc {
+        en: "The maximum number of operations to drain per connection quantum. Default: 16"
+        zh: "每个连接操作的最大耗费操作数。默认:16"
+    }
+    label {
+        en: "Max operations per drain"
+        zh: "每次操作最大操作数"
+    }
+}
+
+fields_mqtt_quic_listener_send_buffering_enabled {
+    desc {
+        en: "Buffer send data instead of holding application buffers until sent data is acknowledged. Default: 1 (Enabled)"
+        zh: "缓冲发送数据,而不是保留应用缓冲区,直到发送数据被确认。默认值:1(启用)"
+    }
+    label {
+        en: "Send buffering enabled"
+        zh: "启用发送缓冲功能"
+    }
+}
+
+fields_mqtt_quic_listener_pacing_enabled {
+    desc {
+        en: "Pace sending to avoid overfilling buffers on the path. Default: 1 (Enabled)"
+        zh: "有节奏的发送,以避免路径上的缓冲区过度填充。默认值:1(已启用)"
+    }
+    label {
+        en: "Pacing enabled"
+        zh: "启用节奏发送"
+    }
+}
+
+fields_mqtt_quic_listener_migration_enabled {
+    desc {
+        en: "Enable clients to migrate IP addresses and tuples. Requires a cooperative load-balancer, or no load-balancer. Default: 1 (Enabled)"
+        zh: "开启客户端地址迁移功能。需要一个支持的负载平衡器,或者没有负载平衡器。默认值:1(已启用)"
+    }
+    label {
+        en: "Migration enabled"
+        zh: "启用地址迁移"
+    }
+}
+
+fields_mqtt_quic_listener_datagram_receive_enabled {
+    desc {
+        en: "Advertise support for QUIC datagram extension. Reserve for the future. Default 0 (FALSE)"
+        zh: "宣传对QUIC Datagram 扩展的支持。为将来保留。默认为0(FALSE)"
+    }
+    label {
+        en: "Datagram receive enabled"
+        zh: "启用 Datagram 接收"
+    }
+}
+
+fields_mqtt_quic_listener_server_resumption_level {
+    desc {
+        en: "Controls resumption tickets and/or 0-RTT server support. Default: 0 (No resumption)"
+        zh: "连接恢复 和/或 0-RTT 服务器支持。默认值:0(无恢复功能)"
+    }
+    label {
+        en: "Server resumption level"
+        zh: "服务端连接恢复支持"
+    }
+}
+
+fields_mqtt_quic_listener_minimum_mtu {
+    desc {
+        en: "The minimum MTU supported by a connection. This will be used as the starting MTU. Default: 1248"
+        zh: "一个连接所支持的最小MTU。这将被作为起始MTU使用。默认值:1248"
+    }
+    label {
+        en: "Minimum MTU"
+        zh: "最小 MTU"
+    }
+}
+
+fields_mqtt_quic_listener_maximum_mtu {
+    desc {
+        en: "The maximum MTU supported by a connection. This will be the maximum probed value. Default: 1500"
+        zh: "一个连接所支持的最大MTU。这将是最大的探测值。默认值:1500"
+    }
+    label {
+        en: "Maximum MTU"
+        zh: "最大 MTU"
+    }
+}
+
+fields_mqtt_quic_listener_mtu_discovery_search_complete_timeout_us {
+    desc {
+        en: "The time in microseconds to wait before reattempting MTU probing if max was not reached. Default: 600000000"
+        zh: "如果没有达到 max ,在重新尝试 MTU 探测之前要等待的时间,单位是微秒。默认值:600000000"
+    }
+    label {
+        en: "MTU discovery search complete timeout us"
+        zh: ""
+    }
+}
+
+fields_mqtt_quic_listener_mtu_discovery_missing_probe_count {
+    desc {
+        en: "The maximum number of stateless operations that may be queued on a binding at any one time. Default: 3"
+        zh: "在任何时候都可以在一个绑定上排队的无状态操作的最大数量。默认值:3"
+    }
+    label {
+        en: "MTU discovery missing probe count"
+        zh: "MTU发现丢失的探针数量"
+    }
+}
+
+fields_mqtt_quic_listener_max_binding_stateless_operations {
+    desc {
+        en: "The maximum number of stateless operations that may be queued on a binding at any one time. Default: 100"
+        zh: "在任何时候可以在一个绑定上排队的无状态操作的最大数量。默认值:100"
+    }
+    label {
+        en: "Max binding stateless operations"
+        zh: "最大绑定无状态操作"
+    }
+}
+
+fields_mqtt_quic_listener_stateless_operation_expiration_ms {
+    desc {
+        en: "The time limit between operations for the same endpoint, in milliseconds. Default: 100"
+        zh: "同一个对端的操作之间的时间限制,单位是毫秒。 默认:100"
+    }
+    label {
+        en: "Stateless operation expiration ms"
+        zh: "无状态操作过期 毫秒"
+    }
+}
+
 base_listener_max_connections {
     desc {
         en: """The maximum number of concurrent connections allowed by the listener."""
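
Reviewer note on `fields_mqtt_quic_listener_retry_memory_limit` in this file: the description says the value is "calculated as `N/65535`" with a default of 65. Taken literally, that makes the default roughly 0.1% of available memory rather than 65%. The sketch below only works through that arithmetic; the module and function names are hypothetical and are not part of EMQX or quicer.

```erlang
-module(retry_limit_demo).
-export([fraction/1, percent/1]).

%% Hypothetical helper: interpret the configured integer N as the
%% ratio N/65535, as the field description states.
fraction(N) when is_integer(N), N >= 0, N =< 65535 ->
    N / 65535.

%% The same ratio expressed as a percentage.
percent(N) ->
    fraction(N) * 100.
```

Under this reading, `retry_limit_demo:percent(65)` is a little under 0.1, and only `N = 65535` corresponds to 100%.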

+ 25 - 0
apps/emqx/include/emqx_quic.hrl

@@ -0,0 +1,25 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2023 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+
+-ifndef(EMQX_QUIC_HRL).
+-define(EMQX_QUIC_HRL, true).
+
+%% MQTT Over QUIC Shutdown Error code.
+-define(MQTT_QUIC_CONN_NOERROR, 0).
+-define(MQTT_QUIC_CONN_ERROR_CTRL_STREAM_DOWN, 1).
+-define(MQTT_QUIC_CONN_ERROR_OVERLOADED, 2).
+
+-endif.

+ 2 - 2
apps/emqx/rebar.config

@@ -27,7 +27,7 @@
     {jiffy, {git, "https://github.com/emqx/jiffy", {tag, "1.0.5"}}},
     {cowboy, {git, "https://github.com/emqx/cowboy", {tag, "2.9.0"}}},
     {esockd, {git, "https://github.com/emqx/esockd", {tag, "5.9.4"}}},
-    {ekka, {git, "https://github.com/emqx/ekka", {tag, "0.14.0"}}},
+    {ekka, {git, "https://github.com/emqx/ekka", {tag, "0.14.1"}}},
     {gen_rpc, {git, "https://github.com/emqx/gen_rpc", {tag, "2.8.1"}}},
     {hocon, {git, "https://github.com/emqx/hocon.git", {tag, "0.35.3"}}},
     {pbkdf2, {git, "https://github.com/emqx/erlang-pbkdf2.git", {tag, "2.0.4"}}},
@@ -43,7 +43,7 @@
             {meck, "0.9.2"},
             {proper, "1.4.0"},
             {bbmustache, "1.10.0"},
-            {emqtt, {git, "https://github.com/emqx/emqtt", {tag, "1.7.0"}}}
+            {emqtt, {git, "https://github.com/emqx/emqtt", {tag, "1.8.2"}}}
         ]},
         {extra_src_dirs, [{"test", [recursive]}]}
     ]}

+ 15 - 2
apps/emqx/rebar.config.script

@@ -24,7 +24,20 @@ IsQuicSupp = fun() ->
 end,
 
 Bcrypt = {bcrypt, {git, "https://github.com/emqx/erlang-bcrypt.git", {tag, "0.6.0"}}},
-Quicer = {quicer, {git, "https://github.com/emqx/quic.git", {tag, "0.0.16"}}}.
+Quicer = {quicer, {git, "https://github.com/emqx/quic.git", {tag, "0.0.111"}}}.
+
+Dialyzer = fun(Config) ->
+                   {dialyzer, OldDialyzerConfig} = lists:keyfind(dialyzer, 1, Config),
+                   {plt_extra_apps, OldExtra} = lists:keyfind(plt_extra_apps, 1, OldDialyzerConfig),
+                   Extra = OldExtra ++ [quicer || IsQuicSupp()],
+                   NewDialyzerConfig = [{plt_extra_apps, Extra} | OldDialyzerConfig],
+                   lists:keystore(
+                     dialyzer,
+                     1,
+                     Config,
+                     {dialyzer, NewDialyzerConfig}
+                    )
+           end.
 
 ExtraDeps = fun(C) ->
     {deps, Deps0} = lists:keyfind(deps, 1, C),
@@ -43,4 +56,4 @@ ExtraDeps = fun(C) ->
     )
 end,
 
-ExtraDeps(CONFIG).
+Dialyzer(ExtraDeps(CONFIG)).
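
Reviewer note on the `Dialyzer` fun above: `[quicer || IsQuicSupp()]` is a list comprehension with a filter and no generator, so it evaluates to `[quicer]` when the filter is true and to `[]` otherwise. The sketch below isolates that idiom in a standalone module. The module name, function name, and sample config are hypothetical. Unlike the hunk, which prepends the new `plt_extra_apps` tuple to the old dialyzer config, this sketch replaces the tuple in place with `lists:keystore/4`.

```erlang
-module(plt_extra_demo).
-export([add_extra_apps/2]).

%% Append quicer to plt_extra_apps only when QuicSupported is true.
%% Config is a rebar-style proplist that must already contain
%% {dialyzer, [..., {plt_extra_apps, Apps}, ...]}.
add_extra_apps(Config, QuicSupported) when is_boolean(QuicSupported) ->
    {dialyzer, Old} = lists:keyfind(dialyzer, 1, Config),
    {plt_extra_apps, OldExtra} = lists:keyfind(plt_extra_apps, 1, Old),
    %% Filter-only comprehension: [quicer] or [].
    Extra = OldExtra ++ [quicer || QuicSupported],
    %% keystore/4 replaces the first tuple whose key matches.
    New = lists:keystore(plt_extra_apps, 1, Old, {plt_extra_apps, Extra}),
    lists:keystore(dialyzer, 1, Config, {dialyzer, New}).
```

Calling `add_extra_apps(Config, false)` returns the config unchanged, which mirrors how the script behaves on platforms where QUIC is not built.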

+ 60 - 23
apps/emqx/src/emqx_connection.erl

@@ -14,7 +14,13 @@
 %% limitations under the License.
 %%--------------------------------------------------------------------
 
-%% MQTT/TCP|TLS Connection
+%% This module interacts with the transport layer of MQTT
+%% Transport:
+%%   - TCP connection
+%%   - TCP/TLS connection
+%%   - QUIC Stream
+%%
+%% for WebSocket @see emqx_ws_connection.erl
 -module(emqx_connection).
 
 -include("emqx.hrl").
@@ -111,7 +117,10 @@
     limiter_buffer :: queue:queue(pending_req()),
 
     %% limiter timers
-    limiter_timer :: undefined | reference()
+    limiter_timer :: undefined | reference(),
+
+    %% QUIC conn owner pid if in use.
+    quic_conn_pid :: maybe(pid())
 }).
 
 -record(retry, {
@@ -189,12 +198,16 @@
     ]}
 ).
 
--spec start_link(
-    esockd:transport(),
-    esockd:socket() | {pid(), quicer:connection_handler()},
-    emqx_channel:opts()
-) ->
-    {ok, pid()}.
+-spec start_link
+    (esockd:transport(), esockd:socket(), emqx_channel:opts()) ->
+        {ok, pid()};
+    (
+        emqx_quic_stream,
+        {ConnOwner :: pid(), quicer:connection_handle(), quicer:new_conn_props()},
+        emqx_quic_connection:cb_state()
+    ) ->
+        {ok, pid()}.
+
 start_link(Transport, Socket, Options) ->
     Args = [self(), Transport, Socket, Options],
     CPid = proc_lib:spawn_link(?MODULE, init, Args),
@@ -329,6 +342,7 @@ init_state(
     },
     ParseState = emqx_frame:initial_parse_state(FrameOpts),
     Serialize = emqx_frame:serialize_opts(),
+    %% Init Channel
     Channel = emqx_channel:init(ConnInfo, Opts),
     GcState =
         case emqx_config:get_zone_conf(Zone, [force_gc]) of
@@ -359,7 +373,9 @@ init_state(
         zone = Zone,
         listener = Listener,
         limiter_buffer = queue:new(),
-        limiter_timer = undefined
+        limiter_timer = undefined,
+        %% for quic streams to inherit
+        quic_conn_pid = maps:get(conn_pid, Opts, undefined)
     }.
 
 run_loop(
@@ -476,7 +492,9 @@ process_msg([Msg | More], State) ->
             {ok, Msgs, NState} ->
                 process_msg(append_msg(More, Msgs), NState);
             {stop, Reason, NState} ->
-                {stop, Reason, NState}
+                {stop, Reason, NState};
+            {stop, Reason} ->
+                {stop, Reason, State}
         end
     catch
         exit:normal ->
@@ -507,7 +525,6 @@ append_msg(Q, Msg) ->
 
 %%--------------------------------------------------------------------
 %% Handle a Msg
-
 handle_msg({'$gen_call', From, Req}, State) ->
     case handle_call(From, Req, State) of
         {reply, Reply, NState} ->
@@ -525,11 +542,10 @@ handle_msg({Inet, _Sock, Data}, State) when Inet == tcp; Inet == ssl ->
     inc_counter(incoming_bytes, Oct),
     ok = emqx_metrics:inc('bytes.received', Oct),
     when_bytes_in(Oct, Data, State);
-handle_msg({quic, Data, _Sock, _, _, _}, State) ->
-    Oct = iolist_size(Data),
-    inc_counter(incoming_bytes, Oct),
-    ok = emqx_metrics:inc('bytes.received', Oct),
-    when_bytes_in(Oct, Data, State);
+handle_msg({quic, Data, _Stream, #{len := Len}}, State) when is_binary(Data) ->
+    inc_counter(incoming_bytes, Len),
+    ok = emqx_metrics:inc('bytes.received', Len),
+    when_bytes_in(Len, Data, State);
 handle_msg(check_cache, #state{limiter_buffer = Cache} = State) ->
     case queue:peek(Cache) of
         empty ->
@@ -595,9 +611,20 @@ handle_msg({inet_reply, _Sock, {error, Reason}}, State) ->
 handle_msg({connack, ConnAck}, State) ->
     handle_outgoing(ConnAck, State);
 handle_msg({close, Reason}, State) ->
+    %% @FIXME here it could be close due to appl error.
     ?TRACE("SOCKET", "socket_force_closed", #{reason => Reason}),
     handle_info({sock_closed, Reason}, close_socket(State));
-handle_msg({event, connected}, State = #state{channel = Channel}) ->
+handle_msg(
+    {event, connected},
+    State = #state{
+        channel = Channel,
+        serialize = Serialize,
+        parse_state = PS,
+        quic_conn_pid = QuicConnPid
+    }
+) ->
+    QuicConnPid =/= undefined andalso
+        emqx_quic_connection:activate_data_streams(QuicConnPid, {PS, Serialize, Channel}),
     ClientId = emqx_channel:info(clientid, Channel),
     emqx_cm:insert_channel_info(ClientId, info(State), stats(State));
 handle_msg({event, disconnected}, State = #state{channel = Channel}) ->
@@ -654,6 +681,12 @@ maybe_raise_exception(#{
     stacktrace := Stacktrace
 }) ->
     erlang:raise(Exception, Context, Stacktrace);
+maybe_raise_exception({shutdown, normal}) ->
+    ok;
+maybe_raise_exception(normal) ->
+    ok;
+maybe_raise_exception(shutdown) ->
+    ok;
 maybe_raise_exception(Reason) ->
     exit(Reason).
 
@@ -748,6 +781,7 @@ when_bytes_in(Oct, Data, State) ->
         NState
     ).
 
+%% @doc: return a reversed Msg list
 -compile({inline, [next_incoming_msgs/3]}).
 next_incoming_msgs([Packet], Msgs, State) ->
     {ok, [{incoming, Packet} | Msgs], State};
@@ -870,6 +904,7 @@ send(IoData, #state{transport = Transport, socket = Socket, channel = Channel})
             ok;
         Error = {error, _Reason} ->
             %% Send an inet_reply to postpone handling the error
+            %% @FIXME: why not just return error?
             self() ! {inet_reply, Socket, Error},
             ok
     end.
@@ -893,12 +928,14 @@ handle_info({sock_error, Reason}, State) ->
         false -> ok
     end,
     handle_info({sock_closed, Reason}, close_socket(State));
-handle_info({quic, peer_send_shutdown, _Stream}, State) ->
-    handle_info({sock_closed, force}, close_socket(State));
-handle_info({quic, closed, _Channel, ReasonFlag}, State) ->
-    handle_info({sock_closed, ReasonFlag}, State);
-handle_info({quic, closed, _Stream}, State) ->
-    handle_info({sock_closed, force}, State);
+%% handle QUIC control stream events
+handle_info({quic, Event, Handle, Prop}, State) when is_atom(Event) ->
+    case emqx_quic_stream:Event(Handle, Prop, State) of
+        {{continue, Msgs}, NewState} ->
+            {ok, Msgs, NewState};
+        Other ->
+            Other
+    end;
 handle_info(Info, State) ->
     with_channel(handle_info, [Info], State).
 

+ 10 - 10
apps/emqx/src/emqx_limiter/src/emqx_limiter_schema.erl

@@ -110,11 +110,11 @@ fields(limiter) ->
         ];
 fields(node_opts) ->
     [
-        {rate, ?HOCON(rate(), #{desc => ?DESC(rate), default => "infinity"})},
+        {rate, ?HOCON(rate(), #{desc => ?DESC(rate), default => <<"infinity">>})},
         {burst,
             ?HOCON(burst_rate(), #{
                 desc => ?DESC(burst),
-                default => 0
+                default => <<"0">>
             })}
     ];
 fields(client_fields) ->
@@ -128,14 +128,14 @@ fields(client_fields) ->
     ];
 fields(bucket_opts) ->
     [
-        {rate, ?HOCON(rate(), #{desc => ?DESC(rate), default => "infinity"})},
-        {capacity, ?HOCON(capacity(), #{desc => ?DESC(capacity), default => "infinity"})},
-        {initial, ?HOCON(initial(), #{default => "0", desc => ?DESC(initial)})}
+        {rate, ?HOCON(rate(), #{desc => ?DESC(rate), default => <<"infinity">>})},
+        {capacity, ?HOCON(capacity(), #{desc => ?DESC(capacity), default => <<"infinity">>})},
+        {initial, ?HOCON(initial(), #{default => <<"0">>, desc => ?DESC(initial)})}
     ];
 fields(client_opts) ->
     [
-        {rate, ?HOCON(rate(), #{default => "infinity", desc => ?DESC(rate)})},
-        {initial, ?HOCON(initial(), #{default => "0", desc => ?DESC(initial)})},
+        {rate, ?HOCON(rate(), #{default => <<"infinity">>, desc => ?DESC(rate)})},
+        {initial, ?HOCON(initial(), #{default => <<"0">>, desc => ?DESC(initial)})},
         %% low_watermark add for emqx_channel and emqx_session
         %% both modules consume first and then check
         %% so we need to use this value to prevent excessive consumption
@@ -145,13 +145,13 @@ fields(client_opts) ->
                 initial(),
                 #{
                     desc => ?DESC(low_watermark),
-                    default => "0"
+                    default => <<"0">>
                 }
             )},
         {capacity,
             ?HOCON(capacity(), #{
                 desc => ?DESC(client_bucket_capacity),
-                default => "infinity"
+                default => <<"infinity">>
             })},
         {divisible,
             ?HOCON(
@@ -166,7 +166,7 @@ fields(client_opts) ->
                 emqx_schema:duration(),
                 #{
                     desc => ?DESC(max_retry_time),
-                    default => "10s"
+                    default => <<"10s">>
                 }
             )},
         {failure_strategy,

+ 92 - 17
apps/emqx/src/emqx_listeners.erl

@@ -72,9 +72,7 @@ id_example() -> 'tcp:default'.
 list_raw() ->
     [
         {listener_id(Type, LName), Type, LConf}
-     || %% FIXME: quic is not supported update vi dashboard yet
-        {Type, LName, LConf} <- do_list_raw(),
-        Type =/= <<"quic">>
+     || {Type, LName, LConf} <- do_list_raw()
     ].
 
 list() ->
@@ -170,6 +168,11 @@ current_conns(Type, Name, ListenOn) when Type == tcp; Type == ssl ->
     esockd:get_current_connections({listener_id(Type, Name), ListenOn});
 current_conns(Type, Name, _ListenOn) when Type =:= ws; Type =:= wss ->
     proplists:get_value(all_connections, ranch:info(listener_id(Type, Name)));
+current_conns(quic, _Name, _ListenOn) ->
+    case quicer:perf_counters() of
+        {ok, PerfCnts} -> proplists:get_value(conn_active, PerfCnts);
+        _ -> 0
+    end;
 current_conns(_, _, _) ->
     {error, not_support}.
 
@@ -367,31 +370,45 @@ do_start_listener(quic, ListenerName, #{bind := Bind} = Opts) ->
     case [A || {quicer, _, _} = A <- application:which_applications()] of
         [_] ->
             DefAcceptors = erlang:system_info(schedulers_online) * 8,
-            ListenOpts = [
-                {cert, maps:get(certfile, Opts)},
-                {key, maps:get(keyfile, Opts)},
-                {alpn, ["mqtt"]},
-                {conn_acceptors, lists:max([DefAcceptors, maps:get(acceptors, Opts, 0)])},
-                {keep_alive_interval_ms, maps:get(keep_alive_interval, Opts, 0)},
-                {idle_timeout_ms, maps:get(idle_timeout, Opts, 0)},
-                {handshake_idle_timeout_ms, maps:get(handshake_idle_timeout, Opts, 10000)},
-                {server_resumption_level, 2}
-            ],
+            SSLOpts = maps:merge(
+                maps:with([certfile, keyfile], Opts),
+                maps:get(ssl_options, Opts, #{})
+            ),
+            ListenOpts =
+                [
+                    {certfile, str(maps:get(certfile, SSLOpts))},
+                    {keyfile, str(maps:get(keyfile, SSLOpts))},
+                    {alpn, ["mqtt"]},
+                    {conn_acceptors, lists:max([DefAcceptors, maps:get(acceptors, Opts, 0)])},
+                    {keep_alive_interval_ms, maps:get(keep_alive_interval, Opts, 0)},
+                    {idle_timeout_ms, maps:get(idle_timeout, Opts, 0)},
+                    {handshake_idle_timeout_ms, maps:get(handshake_idle_timeout, Opts, 10000)},
+                    {server_resumption_level, maps:get(server_resumption_level, Opts, 2)},
+                    {verify, maps:get(verify, SSLOpts, verify_none)}
+                ] ++
+                    case maps:get(cacertfile, SSLOpts, undefined) of
+                        undefined -> [];
+                        CaCertFile -> [{cacertfile, binary_to_list(CaCertFile)}]
+                    end ++
+                    optional_quic_listener_opts(Opts),
             ConnectionOpts = #{
                 conn_callback => emqx_quic_connection,
-                peer_unidi_stream_count => 1,
-                peer_bidi_stream_count => 10,
+                peer_unidi_stream_count => maps:get(peer_unidi_stream_count, Opts, 1),
+                peer_bidi_stream_count => maps:get(peer_bidi_stream_count, Opts, 10),
                 zone => zone(Opts),
                 listener => {quic, ListenerName},
                 limiter => limiter(Opts)
             },
-            StreamOpts = [{stream_callback, emqx_quic_stream}],
+            StreamOpts = #{
+                stream_callback => emqx_quic_stream,
+                active => 1
+            },
             Id = listener_id(quic, ListenerName),
             add_limiter_bucket(Id, Opts),
             quicer:start_listener(
                 Id,
                 ListenOn,
-                {ListenOpts, ConnectionOpts, StreamOpts}
+                {maps:from_list(ListenOpts), ConnectionOpts, StreamOpts}
             );
         [] ->
             {ok, {skipped, quic_app_missing}}
@@ -710,3 +727,61 @@ get_ssl_options(Conf) ->
         error ->
             maps:get(<<"ssl_options">>, Conf, undefined)
     end.
+
+%% @doc Get optional QUIC settings for low-level tuning.
+%% @see quicer:quic_settings()
+-spec optional_quic_listener_opts(map()) -> proplists:proplist().
+optional_quic_listener_opts(Conf) when is_map(Conf) ->
+    maps:to_list(
+        maps:filter(
+            fun(Name, _V) ->
+                lists:member(
+                    Name,
+                    quic_listener_optional_settings()
+                )
+            end,
+            Conf
+        )
+    ).
+
+-spec quic_listener_optional_settings() -> [atom()].
+quic_listener_optional_settings() ->
+    [
+        max_bytes_per_key,
+        %% In conf schema we use handshake_idle_timeout
+        handshake_idle_timeout_ms,
+        %% In conf schema we use idle_timeout
+        idle_timeout_ms,
+        %% not used since we are the server
+        %% tls_client_max_send_buffer,
+        tls_server_max_send_buffer,
+        stream_recv_window_default,
+        stream_recv_buffer_default,
+        conn_flow_control_window,
+        max_stateless_operations,
+        initial_window_packets,
+        send_idle_timeout_ms,
+        initial_rtt_ms,
+        max_ack_delay_ms,
+        disconnect_timeout_ms,
+        %% In conf schema, we use keep_alive_interval
+        keep_alive_interval_ms,
+        %% overwritten by conn opts
+        peer_bidi_stream_count,
+        %% overwritten by conn opts
+        peer_unidi_stream_count,
+        retry_memory_limit,
+        load_balancing_mode,
+        max_operations_per_drain,
+        send_buffering_enabled,
+        pacing_enabled,
+        migration_enabled,
+        datagram_receive_enabled,
+        server_resumption_level,
+        minimum_mtu,
+        maximum_mtu,
+        mtu_discovery_search_complete_timeout_us,
+        mtu_discovery_missing_probe_count,
+        max_binding_stateless_operations,
+        stateless_operation_expiration_ms
+    ].
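
The whitelist filtering done by `optional_quic_listener_opts/1` can also be expressed with `maps:with/2`. The following is a minimal sketch, not the code from this commit; the module name `quic_opts_sketch` and the function `optional_opts/2` are hypothetical and only illustrate the technique.

```erlang
-module(quic_opts_sketch).
-export([optional_opts/2]).

%% Hypothetical helper (not part of EMQX): keep only the whitelisted keys
%% of a listener config map and return them as a proplist.
-spec optional_opts([atom()], map()) -> [{atom(), term()}].
optional_opts(Allowed, Conf) when is_list(Allowed), is_map(Conf) ->
    %% maps:with/2 drops every key that is not listed in Allowed,
    %% and silently ignores whitelisted keys that are absent from Conf.
    maps:to_list(maps:with(Allowed, Conf)).
```

Calling `quic_opts_sketch:optional_opts([idle_timeout_ms, minimum_mtu], #{idle_timeout_ms => 5000, bind => 14567})` keeps only the `idle_timeout_ms` entry, since `bind` is not whitelisted and `minimum_mtu` is not set.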

+ 1 - 1
apps/emqx/src/emqx_misc.erl

@@ -720,4 +720,4 @@ pub_props_to_packet(Properties) ->
 safe_filename(Filename) when is_binary(Filename) ->
     binary:replace(Filename, <<":">>, <<"-">>, [global]);
 safe_filename(Filename) when is_list(Filename) ->
-    string:replace(Filename, ":", "-", all).
+    lists:flatten(string:replace(Filename, ":", "-", all)).
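
The reason for wrapping the call in `lists:flatten/1` is that `string:replace/4` returns chardata, which may be a nested list rather than a flat string. A small sketch of the difference, assuming a hypothetical module name `safe_filename_sketch`:

```erlang
-module(safe_filename_sketch).
-export([demo/0]).

%% Returns {IsFlatStringBefore, FlattenedResult}.
demo() ->
    %% string:replace/4 returns unicode:chardata(); for list input this is
    %% typically a list of pieces, not a flat list of characters.
    Nested = string:replace("tcp:default", ":", "-", all),
    %% lists:flatten/1 turns the pieces into an ordinary string that
    %% filename-handling code can use directly.
    Flat = lists:flatten(Nested),
    {io_lib:printable_list(Nested), Flat}.
```

The second element of the returned tuple is the flat string `"tcp-default"`.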

+ 267 - 28
apps/emqx/src/emqx_quic_connection.erl

@@ -14,60 +14,282 @@
 %% limitations under the License.
 %%--------------------------------------------------------------------
 
+%% @doc impl. the quic connection owner process.
 -module(emqx_quic_connection).
 
 -ifndef(BUILD_WITHOUT_QUIC).
+
+-include("logger.hrl").
 -include_lib("quicer/include/quicer.hrl").
--else.
--define(QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, 0).
--endif.
+-include_lib("emqx/include/emqx_quic.hrl").
+
+-behaviour(quicer_connection).
 
-%% Callbacks
 -export([
     init/1,
-    new_conn/2,
-    connected/2,
-    shutdown/2
+    new_conn/3,
+    connected/3,
+    transport_shutdown/3,
+    shutdown/3,
+    closed/3,
+    local_address_changed/3,
+    peer_address_changed/3,
+    streams_available/3,
+    peer_needs_streams/3,
+    resumed/3,
+    new_stream/3
+]).
+
+-export([activate_data_streams/2]).
+
+-export([
+    handle_call/3,
+    handle_info/2
 ]).
 
--type cb_state() :: map() | proplists:proplist().
+-type cb_state() :: #{
+    %% connection owner pid
+    conn_pid := pid(),
+    %% Pid of ctrl stream
+    ctrl_pid := undefined | pid(),
+    %% quic connection handle
+    conn := undefined | quicer:connection_handle(),
+    %% Data streams that are handed off from this process;
+    %% these streams could die/close without affecting the connection/session.
+    %@TODO type?
+    streams := [{pid(), quicer:stream_handle()}],
+    %% New stream opts
+    stream_opts := map(),
+    %% If connection is resumed from session ticket
+    is_resumed => boolean(),
+    %% mqtt message serializer config
+    serialize => undefined,
+    _ => _
+}.
+-type cb_ret() :: quicer_lib:cb_ret().
 
--spec init(cb_state()) -> cb_state().
-init(ConnOpts) when is_list(ConnOpts) ->
-    init(maps:from_list(ConnOpts));
+%% @doc  Data stream initialization starts in parallel with the control stream. Data streams stay
+%%       blocked until the control stream activates them, once the connection is accepted as legitimate.
+%%       For security, the initial number of data streams allowed from the client should be limited by
+%%       `peer_bidi_stream_count' and `peer_unidi_stream_count'.
+-spec activate_data_streams(pid(), {
+    emqx_frame:parse_state(), emqx_frame:serialize_opts(), emqx_channel:channel()
+}) -> ok.
+activate_data_streams(ConnOwner, {PS, Serialize, Channel}) ->
+    gen_server:call(ConnOwner, {activate_data_streams, {PS, Serialize, Channel}}, infinity).
+
+%% @doc connection owner init callback
+-spec init(map()) -> {ok, cb_state()}.
+init(#{stream_opts := SOpts} = S) when is_list(SOpts) ->
+    init(S#{stream_opts := maps:from_list(SOpts)});
 init(ConnOpts) when is_map(ConnOpts) ->
-    ConnOpts.
+    {ok, init_cb_state(ConnOpts)}.
+
+-spec closed(quicer:connection_handle(), quicer:conn_closed_props(), cb_state()) ->
+    {stop, normal, cb_state()}.
+closed(_Conn, #{is_peer_acked := _} = Prop, S) ->
+    ?SLOG(debug, Prop),
+    {stop, normal, S}.
 
--spec new_conn(quicer:connection_handler(), cb_state()) -> {ok, cb_state()} | {error, any()}.
-new_conn(Conn, #{zone := Zone} = S) ->
+%% @doc handle the new incoming connection as the connection acceptor.
+-spec new_conn(quicer:connection_handle(), quicer:new_conn_props(), cb_state()) ->
+    {ok, cb_state()} | {stop, any(), cb_state()}.
+new_conn(
+    Conn,
+    #{version := _Vsn} = ConnInfo,
+    #{zone := Zone, conn := undefined, ctrl_pid := undefined} = S
+) ->
     process_flag(trap_exit, true),
+    ?SLOG(debug, ConnInfo),
     case emqx_olp:is_overloaded() andalso is_zone_olp_enabled(Zone) of
         false ->
-            {ok, Pid} = emqx_connection:start_link(emqx_quic_stream, {self(), Conn}, S),
+            %% Start control stream process
+            StartOption = S,
+            {ok, CtrlPid} = emqx_connection:start_link(
+                emqx_quic_stream,
+                {self(), Conn, maps:without([crypto_buffer], ConnInfo)},
+                StartOption
+            ),
             receive
-                {Pid, stream_acceptor_ready} ->
+                {CtrlPid, stream_acceptor_ready} ->
                     ok = quicer:async_handshake(Conn),
-                    {ok, S};
-                {'EXIT', Pid, _Reason} ->
-                    {error, stream_accept_error}
+                    {ok, S#{conn := Conn, ctrl_pid := CtrlPid}};
+                {'EXIT', _Pid, _Reason} ->
+                    {stop, stream_accept_error, S}
             end;
         true ->
             emqx_metrics:inc('olp.new_conn'),
-            {error, overloaded}
+            _ = quicer:async_shutdown_connection(
+                Conn,
+                ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE,
+                ?MQTT_QUIC_CONN_ERROR_OVERLOADED
+            ),
+            {stop, normal, S}
     end.
 
--spec connected(quicer:connection_handler(), cb_state()) -> {ok, cb_state()} | {error, any()}.
-connected(Conn, #{slow_start := false} = S) ->
-    {ok, _Pid} = emqx_connection:start_link(emqx_quic_stream, Conn, S),
-    {ok, S};
-connected(_Conn, S) ->
+%% @doc callback when connection is connected.
+-spec connected(quicer:connection_handle(), quicer:connected_props(), cb_state()) ->
+    {ok, cb_state()} | {error, any(), cb_state()}.
+connected(_Conn, Props, S) ->
+    ?SLOG(debug, Props),
+    {ok, S}.
+
+%% @doc callback when connection is resumed from 0-RTT
+-spec resumed(quicer:connection_handle(), SessionData :: binary() | false, cb_state()) -> cb_ret().
+%% reserve resume conn with callback.
+%% resumed(Conn, Data, #{resumed_callback := ResumeFun} = S) when
+%%     is_function(ResumeFun)
+%% ->
+%%     ResumeFun(Conn, Data, S);
+resumed(_Conn, _Data, S) ->
+    {ok, S#{is_resumed := true}}.
+
+%% @doc callback for handling orphan data streams
+%%      depends on the connection state and control stream state.
+-spec new_stream(quicer:stream_handle(), quicer:new_stream_props(), cb_state()) -> cb_ret().
+new_stream(
+    Stream,
+    #{is_orphan := true, flags := _Flags} = Props,
+    #{
+        conn := Conn,
+        streams := Streams,
+        stream_opts := SOpts,
+        zone := Zone,
+        limiter := Limiter,
+        parse_state := PS,
+        channel := Channel,
+        serialize := Serialize
+    } = S
+) ->
+    %% Cherry pick options for data streams
+    SOpts1 = SOpts#{
+        is_local => false,
+        zone => Zone,
+        % unused
+        limiter => Limiter,
+        parse_state => PS,
+        channel => Channel,
+        serialize => Serialize,
+        quic_event_mask => ?QUICER_STREAM_EVENT_MASK_START_COMPLETE
+    },
+    {ok, NewStreamOwner} = quicer_stream:start_link(
+        emqx_quic_data_stream,
+        Stream,
+        Conn,
+        SOpts1,
+        Props
+    ),
+    case quicer:handoff_stream(Stream, NewStreamOwner, {PS, Serialize, Channel}) of
+        ok ->
+            ok;
+        E ->
+            %% Only log, keep connection alive.
+            ?SLOG(error, #{message => "new stream handoff failed", stream => Stream, error => E})
+    end,
+    %% @TODO maybe keep them in `inactive_streams'
+    {ok, S#{streams := [{NewStreamOwner, Stream} | Streams]}}.
+
+%% @doc callback for handling remote connection shutdown.
+-spec shutdown(quicer:connection_handle(), quicer:error_code(), cb_state()) -> cb_ret().
+shutdown(Conn, ErrorCode, S) ->
+    ErrorCode =/= 0 andalso ?SLOG(debug, #{error_code => ErrorCode, state => S}),
+    _ = quicer:async_shutdown_connection(Conn, ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, 0),
+    {ok, S}.
+
+%% @doc callback for handling transport error, such as idle timeout
+-spec transport_shutdown(quicer:connection_handle(), quicer:transport_shutdown_props(), cb_state()) ->
+    cb_ret().
+transport_shutdown(_C, DownInfo, S) when is_map(DownInfo) ->
+    ?SLOG(debug, DownInfo),
+    {ok, S}.
+
+%% @doc callback for handling peer address changes.
+-spec peer_address_changed(quicer:connection_handle(), quicer:quicer_addr(), cb_state()) ->
+    cb_ret().
+peer_address_changed(_C, _NewAddr, S) ->
+    %% @TODO update conn info in emqx_quic_stream
+    {ok, S}.
+
+%% @doc callback for handling local addr change, currently unused
+-spec local_address_changed(quicer:connection_handle(), quicer:quicer_addr(), cb_state()) ->
+    cb_ret().
+local_address_changed(_C, _NewAddr, S) ->
     {ok, S}.
 
--spec shutdown(quicer:connection_handler(), cb_state()) -> {ok, cb_state()} | {error, any()}.
-shutdown(Conn, S) ->
-    quicer:async_shutdown_connection(Conn, ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, 0),
+%% @doc callback for handling remote stream limit updates
+-spec streams_available(
+    quicer:connection_handle(),
+    {BidirStreams :: non_neg_integer(), UnidirStreams :: non_neg_integer()},
+    cb_state()
+) -> cb_ret().
+streams_available(_C, {BidirCnt, UnidirCnt}, S) ->
+    {ok, S#{
+        peer_bidi_stream_count => BidirCnt,
+        peer_unidi_stream_count => UnidirCnt
+    }}.
+
+%% @doc callback for handling a request from the remote peer for more streams;
+%%      should cope with rate limiting
+%% @TODO this is not going to get triggered in current version
+%% ref: https://github.com/microsoft/msquic/issues/3120
+-spec peer_needs_streams(quicer:connection_handle(), undefined, cb_state()) -> cb_ret().
+peer_needs_streams(_C, undefined, S) ->
+    ?SLOG(info, #{
+        msg => "ignore: peer needs more streams", info => maps:with([conn_pid, ctrl_pid], S)
+    }),
     {ok, S}.
 
+%% @doc handle API calls
+-spec handle_call(Req :: term(), gen_server:from(), cb_state()) -> cb_ret().
+handle_call(
+    {activate_data_streams, {PS, Serialize, Channel} = ActivateData},
+    _From,
+    #{streams := Streams} = S
+) ->
+    _ = [
+        %% Try to activate streams individually; if activation fails, the stream shuts down on its own.
+        %% We don't care about the return value here.
+        %% Note: this is only used after the control stream passes validation. The data streams
+        %%       called here are guaranteed to be inactive (data processing hasn't started).
+        catch emqx_quic_data_stream:activate_data(OwnerPid, ActivateData)
+     || {OwnerPid, _Stream} <- Streams
+    ],
+    {reply, ok, S#{
+        channel := Channel,
+        serialize := Serialize,
+        parse_state := PS
+    }};
+handle_call(_Req, _From, S) ->
+    {reply, {error, unimpl}, S}.
+
+%% @doc handle 'EXIT' messages from linked stream processes.
+handle_info({'EXIT', Pid, Reason}, #{ctrl_pid := Pid, conn := Conn} = S) ->
+    Code =
+        case Reason of
+            normal ->
+                ?MQTT_QUIC_CONN_NOERROR;
+            _ ->
+                ?MQTT_QUIC_CONN_ERROR_CTRL_STREAM_DOWN
+        end,
+    _ = quicer:async_shutdown_connection(Conn, ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, Code),
+    {ok, S};
+handle_info({'EXIT', Pid, Reason}, #{streams := Streams} = S) ->
+    case proplists:is_defined(Pid, Streams) of
+        true when
+            Reason =:= normal orelse
+                Reason =:= {shutdown, protocol_error} orelse
+                Reason =:= killed
+        ->
+            {ok, S};
+        true ->
+            ?SLOG(info, #{message => "Data stream unexpected exit", reason => Reason}),
+            {ok, S};
+        false ->
+            {stop, unknown_pid_down, S}
+    end.
+
+%%%
+%%%  Internals
+%%%
 -spec is_zone_olp_enabled(emqx_types:zone()) -> boolean().
 is_zone_olp_enabled(Zone) ->
     case emqx_config:get_zone_conf(Zone, [overload_protection]) of
@@ -76,3 +298,20 @@ is_zone_olp_enabled(Zone) ->
         _ ->
             false
     end.
+
+-spec init_cb_state(map()) -> cb_state().
+init_cb_state(#{zone := _Zone} = Map) ->
+    Map#{
+        conn_pid => self(),
+        ctrl_pid => undefined,
+        conn => undefined,
+        streams => [],
+        parse_state => undefined,
+        channel => undefined,
+        serialize => undefined,
+        is_resumed => false
+    }.
+
+%% BUILD_WITHOUT_QUIC
+-else.
+-endif.

+ 469 - 0
apps/emqx/src/emqx_quic_data_stream.erl

@@ -0,0 +1,469 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2023 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+
+%%
+%% @doc QUIC data stream
+%% Following the behaviour of emqx_connection:
+%%  The MQTT packets and their side effects are handled *atomically*.
+%%
+
+-module(emqx_quic_data_stream).
+
+-ifndef(BUILD_WITHOUT_QUIC).
+-behaviour(quicer_remote_stream).
+
+-include_lib("snabbkaffe/include/snabbkaffe.hrl").
+-include_lib("quicer/include/quicer.hrl").
+-include("emqx_mqtt.hrl").
+-include("logger.hrl").
+
+%% Stream Callbacks
+-export([
+    init_handoff/4,
+    post_handoff/3,
+    send_complete/3,
+    peer_send_shutdown/3,
+    peer_send_aborted/3,
+    peer_receive_aborted/3,
+    send_shutdown_complete/3,
+    stream_closed/3,
+    passive/3
+]).
+
+-export([handle_stream_data/4]).
+
+%% gen_server API
+-export([activate_data/2]).
+
+-export([
+    handle_call/3,
+    handle_info/2,
+    handle_continue/2
+]).
+
+-type cb_ret() :: quicer_stream:cb_ret().
+-type cb_state() :: quicer_stream:cb_state().
+-type error_code() :: quicer:error_code().
+-type connection_handle() :: quicer:connection_handle().
+-type stream_handle() :: quicer:stream_handle().
+-type handoff_data() :: {
+    emqx_frame:parse_state() | undefined,
+    emqx_frame:serialize_opts() | undefined,
+    emqx_channel:channel() | undefined
+}.
+%%
+%% @doc Activate the data handling.
+%%      Note: data handling stays disabled until validation over the control stream has finished.
+-spec activate_data(pid(), {
+    emqx_frame:parse_state(), emqx_frame:serialize_opts(), emqx_channel:channel()
+}) -> ok.
+activate_data(StreamPid, {PS, Serialize, Channel}) ->
+    gen_server:call(StreamPid, {activate, {PS, Serialize, Channel}}, infinity).
+
+%%
+%% @doc Handoff from the previous owner, i.e. the connection owner.
+%%      Note: unlike the control stream, there is no acceptor for data streams.
+%%            The connection owner gets the new stream, spawns a new process and hands the stream over to it.
+%%
+-spec init_handoff(stream_handle(), map(), connection_handle(), quicer:new_stream_props()) ->
+    {ok, cb_state()}.
+init_handoff(
+    Stream,
+    _StreamOpts,
+    Connection,
+    #{is_orphan := true, flags := Flags}
+) ->
+    {ok, init_state(Stream, Connection, Flags)}.
+
+%%
+%% @doc Post handoff data stream
+%%
+-spec post_handoff(stream_handle(), handoff_data(), cb_state()) -> cb_ret().
+post_handoff(_Stream, {undefined = _PS, undefined = _Serialize, undefined = _Channel}, S) ->
+    %% When the channel isn't ready yet.
+    %% Data stream should wait for activate call with ?MODULE:activate_data/2
+    {ok, S};
+post_handoff(Stream, {PS, Serialize, Channel}, S) ->
+    ?tp(debug, ?FUNCTION_NAME, #{channel => Channel, serialize => Serialize}),
+    _ = quicer:setopt(Stream, active, 10),
+    {ok, S#{channel := Channel, serialize := Serialize, parse_state := PS}}.
+
+-spec peer_receive_aborted(stream_handle(), error_code(), cb_state()) -> cb_ret().
+peer_receive_aborted(Stream, ErrorCode, #{is_unidir := _} = S) ->
+    %% we abort send with same reason
+    _ = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_ABORT, ErrorCode),
+    {ok, S}.
+
+-spec peer_send_aborted(stream_handle(), error_code(), cb_state()) -> cb_ret().
+peer_send_aborted(Stream, ErrorCode, #{is_unidir := _} = S) ->
+    %% we abort receive with same reason
+    _ = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_ABORT_RECEIVE, ErrorCode),
+    {ok, S}.
+
+-spec peer_send_shutdown(stream_handle(), undefined, cb_state()) -> cb_ret().
+peer_send_shutdown(Stream, undefined, S) ->
+    ok = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_GRACEFUL, 0),
+    {ok, S}.
+
+-spec send_complete(stream_handle(), IsCanceled :: boolean(), cb_state()) -> cb_ret().
+send_complete(_Stream, false, S) ->
+    {ok, S};
+send_complete(_Stream, true = _IsCanceled, S) ->
+    {ok, S}.
+
+-spec send_shutdown_complete(stream_handle(), error_code(), cb_state()) -> cb_ret().
+send_shutdown_complete(_Stream, _Flags, S) ->
+    {ok, S}.
+
+-spec handle_stream_data(stream_handle(), binary(), quicer:recv_data_props(), cb_state()) ->
+    cb_ret().
+handle_stream_data(
+    _Stream,
+    Bin,
+    _Flags,
+    #{
+        is_unidir := false,
+        channel := Channel,
+        parse_state := PS,
+        data_queue := QueuedData,
+        task_queue := TQ
+    } = State
+) when
+    %% assert that stream data arrives only after the channel is created
+    Channel =/= undefined
+->
+    {MQTTPackets, NewPS} = parse_incoming(list_to_binary(lists:reverse([Bin | QueuedData])), PS),
+    NewTQ = lists:foldl(
+        fun(Item, Acc) ->
+            queue:in(Item, Acc)
+        end,
+        TQ,
+        [{incoming, P} || P <- lists:reverse(MQTTPackets)]
+    ),
+    {{continue, handle_appl_msg}, State#{parse_state := NewPS, task_queue := NewTQ}}.
+
+-spec passive(stream_handle(), undefined, cb_state()) -> cb_ret().
+passive(Stream, undefined, S) ->
+    _ = quicer:setopt(Stream, active, 10),
+    {ok, S}.
+
+-spec stream_closed(stream_handle(), quicer:stream_closed_props(), cb_state()) -> cb_ret().
+stream_closed(
+    _Stream,
+    #{
+        is_conn_shutdown := IsConnShutdown,
+        is_app_closing := IsAppClosing,
+        is_shutdown_by_app := IsAppShutdown,
+        is_closed_remotely := IsRemote,
+        status := Status,
+        error := Code
+    },
+    S
+) when
+    is_boolean(IsConnShutdown) andalso
+        is_boolean(IsAppClosing) andalso
+        is_boolean(IsAppShutdown) andalso
+        is_boolean(IsRemote) andalso
+        is_atom(Status) andalso
+        is_integer(Code)
+->
+    {stop, normal, S}.
+
+-spec handle_call(Request :: term(), From :: {pid(), term()}, cb_state()) -> cb_ret().
+handle_call(Call, _From, S) ->
+    do_handle_call(Call, S).
+
+-spec handle_continue(Continue :: term(), cb_state()) -> cb_ret().
+handle_continue(handle_appl_msg, #{task_queue := Q} = S) ->
+    case queue:out(Q) of
+        {{value, Item}, Q2} ->
+            do_handle_appl_msg(Item, S#{task_queue := Q2});
+        {empty, _Q} ->
+            {ok, S}
+    end.
+
+%%% Internals
+do_handle_appl_msg(
+    {outgoing, Packets},
+    #{
+        channel := Channel,
+        stream := _Stream,
+        serialize := _Serialize
+    } = S
+) when
+    Channel =/= undefined
+->
+    case handle_outgoing(Packets, S) of
+        {ok, Size} ->
+            ok = emqx_metrics:inc('bytes.sent', Size),
+            {{continue, handle_appl_msg}, S};
+        {error, E1, E2} ->
+            {stop, {E1, E2}, S};
+        {error, E} ->
+            {stop, E, S}
+    end;
+do_handle_appl_msg({incoming, #mqtt_packet{} = Packet}, #{channel := Channel} = S) when
+    Channel =/= undefined
+->
+    ok = inc_incoming_stats(Packet),
+    with_channel(handle_in, [Packet], S);
+do_handle_appl_msg({incoming, {frame_error, _} = FE}, #{channel := Channel} = S) when
+    Channel =/= undefined
+->
+    with_channel(handle_in, [FE], S);
+do_handle_appl_msg({close, Reason}, S) ->
+    %% @TODO shall we abort shutdown or graceful shutdown here?
+    with_channel(handle_info, [{sock_closed, Reason}], S);
+do_handle_appl_msg({event, updated}, S) ->
+    %% Data streams don't care about connection state changes.
+    {{continue, handle_appl_msg}, S}.
+
+handle_info(Deliver = {deliver, _, _}, S) ->
+    Delivers = [Deliver],
+    with_channel(handle_deliver, [Delivers], S);
+handle_info({timeout, Ref, Msg}, S) ->
+    with_channel(handle_timeout, [Ref, Msg], S);
+handle_info(Info, State) ->
+    with_channel(handle_info, [Info], State).
+
+with_channel(Fun, Args, #{channel := Channel, task_queue := Q} = S) when
+    Channel =/= undefined
+->
+    case apply(emqx_channel, Fun, Args ++ [Channel]) of
+        ok ->
+            {{continue, handle_appl_msg}, S};
+        {ok, Msgs, NewChannel} when is_list(Msgs) ->
+            {{continue, handle_appl_msg}, S#{
+                task_queue := queue:join(Q, queue:from_list(Msgs)),
+                channel := NewChannel
+            }};
+        {ok, Msg, NewChannel} when is_record(Msg, mqtt_packet) ->
+            {{continue, handle_appl_msg}, S#{
+                task_queue := queue:in({outgoing, Msg}, Q), channel := NewChannel
+            }};
+        %% @FIXME WTH?
+        {ok, {outgoing, _} = Msg, NewChannel} ->
+            {{continue, handle_appl_msg}, S#{task_queue := queue:in(Msg, Q), channel := NewChannel}};
+        {ok, NewChannel} ->
+            {{continue, handle_appl_msg}, S#{channel := NewChannel}};
+        %% @TODO optimisation for shutdown wrap
+        {shutdown, Reason, NewChannel} ->
+            {stop, {shutdown, Reason}, S#{channel := NewChannel}};
+        {shutdown, Reason, Msgs, NewChannel} when is_list(Msgs) ->
+            %% @TODO handle outgoing?
+            {stop, {shutdown, Reason}, S#{
+                channel := NewChannel,
+                task_queue := queue:join(Q, queue:from_list(Msgs))
+            }};
+        {shutdown, Reason, Msg, NewChannel} ->
+            {stop, {shutdown, Reason}, S#{
+                channel := NewChannel,
+                task_queue := queue:in(Msg, Q)
+            }}
+    end.
+
+handle_outgoing(#mqtt_packet{} = P, S) ->
+    handle_outgoing([P], S);
+handle_outgoing(Packets, #{serialize := Serialize, stream := Stream, is_unidir := false}) when
+    is_list(Packets)
+->
+    OutBin = [serialize_packet(P, Serialize) || P <- filter_disallowed_out(Packets)],
+    %% Send data async but still want send feedback via {quic, send_complete, ...}
+    Res = quicer:async_send(Stream, OutBin, ?QUICER_SEND_FLAG_SYNC),
+    ?TRACE("MQTT", "mqtt_packet_sent", #{packets => Packets}),
+    [ok = inc_outgoing_stats(P) || P <- Packets],
+    Res.
+
+serialize_packet(Packet, Serialize) ->
+    try emqx_frame:serialize_pkt(Packet, Serialize) of
+        <<>> ->
+            ?SLOG(warning, #{
+                msg => "packet_is_discarded",
+                reason => "frame_is_too_large",
+                packet => emqx_packet:format(Packet, hidden)
+            }),
+            ok = emqx_metrics:inc('delivery.dropped.too_large'),
+            ok = emqx_metrics:inc('delivery.dropped'),
+            ok = inc_outgoing_stats({error, message_too_large}),
+            <<>>;
+        Data ->
+            Data
+    catch
+        %% Should never happen.
+        throw:{?FRAME_SERIALIZE_ERROR, Reason} ->
+            ?SLOG(info, #{
+                reason => Reason,
+                input_packet => Packet
+            }),
+            erlang:error({?FRAME_SERIALIZE_ERROR, Reason});
+        error:Reason:Stacktrace ->
+            ?SLOG(error, #{
+                input_packet => Packet,
+                exception => Reason,
+                stacktrace => Stacktrace
+            }),
+            erlang:error(?FRAME_SERIALIZE_ERROR)
+    end.
+
+-spec init_state(
+    quicer:stream_handle(),
+    quicer:connection_handle(),
+    quicer:new_stream_props()
+) ->
+    % @TODO
+    map().
+init_state(Stream, Connection, OpenFlags) ->
+    init_state(Stream, Connection, OpenFlags, undefined).
+
+init_state(Stream, Connection, OpenFlags, PS) ->
+    %% quic stream handle
+    #{
+        stream => Stream,
+        %% quic connection handle
+        conn => Connection,
+        %% if it is QUIC unidi stream
+        is_unidir => quicer:is_unidirectional(OpenFlags),
+        %% Frame Parse State
+        parse_state => PS,
+        %% Peer Stream handle in a pair for type unidir only
+        peer_stream => undefined,
+        %% if the stream is locally initiated.
+        is_local => false,
+        %% queue binary data while NOT connected, in reverse order.
+        data_queue => [],
+        %% Channel from connection
+        %% `undefined' means the connection is not connected.
+        channel => undefined,
+        %% serialize opts for connection
+        serialize => undefined,
+        %% Current working queue
+        task_queue => queue:new()
+    }.
+
+-spec do_handle_call(term(), cb_state()) -> cb_ret().
+do_handle_call(
+    {activate, {PS, Serialize, Channel}},
+    #{
+        channel := undefined,
+        stream := Stream,
+        serialize := undefined
+    } = S
+) ->
+    NewS = S#{channel := Channel, serialize := Serialize, parse_state := PS},
+    %% We rely on the QUIC protocol for flow control; a setopt failure stops the stream.
+    case quicer:setopt(Stream, active, true) of
+        ok ->
+            {reply, ok, NewS};
+        {error, E} ->
+            ?SLOG(error, #{msg => "set stream active failed", error => E}),
+            {stop, E, NewS}
+    end;
+do_handle_call(_Call, S) ->
+    {reply, {error, unimpl}, S}.
+
+%% @doc return Packets in reversed order
+parse_incoming(Data, PS) ->
+    try
+        do_parse_incoming(Data, [], PS)
+    catch
+        throw:{?FRAME_PARSE_ERROR, Reason} ->
+            ?SLOG(info, #{
+                reason => Reason,
+                input_bytes => Data
+            }),
+            {[{frame_error, Reason}], PS};
+        error:Reason:Stacktrace ->
+            ?SLOG(error, #{
+                input_bytes => Data,
+                reason => Reason,
+                stacktrace => Stacktrace
+            }),
+            {[{frame_error, Reason}], PS}
+    end.
+
+do_parse_incoming(<<>>, Packets, ParseState) ->
+    {Packets, ParseState};
+do_parse_incoming(Data, Packets, ParseState) ->
+    case emqx_frame:parse(Data, ParseState) of
+        {more, NParseState} ->
+            {Packets, NParseState};
+        {ok, Packet, Rest, NParseState} ->
+            do_parse_incoming(Rest, [Packet | Packets], NParseState)
+    end.
+
+%% The following are copied from emqx_connection
+-compile({inline, [inc_incoming_stats/1]}).
+inc_incoming_stats(Packet = ?PACKET(Type)) ->
+    inc_counter(recv_pkt, 1),
+    case Type =:= ?PUBLISH of
+        true ->
+            inc_counter(recv_msg, 1),
+            inc_qos_stats(recv_msg, Packet),
+            inc_counter(incoming_pubs, 1);
+        false ->
+            ok
+    end,
+    emqx_metrics:inc_recv(Packet).
+
+-compile({inline, [inc_outgoing_stats/1]}).
+inc_outgoing_stats({error, message_too_large}) ->
+    inc_counter('send_msg.dropped', 1),
+    inc_counter('send_msg.dropped.too_large', 1);
+inc_outgoing_stats(Packet = ?PACKET(Type)) ->
+    inc_counter(send_pkt, 1),
+    case Type of
+        ?PUBLISH ->
+            inc_counter(send_msg, 1),
+            inc_counter(outgoing_pubs, 1),
+            inc_qos_stats(send_msg, Packet);
+        _ ->
+            ok
+    end,
+    emqx_metrics:inc_sent(Packet).
+
+inc_counter(Key, Inc) ->
+    _ = emqx_pd:inc_counter(Key, Inc),
+    ok.
+
+inc_qos_stats(Type, Packet) ->
+    case inc_qos_stats_key(Type, emqx_packet:qos(Packet)) of
+        undefined ->
+            ignore;
+        Key ->
+            inc_counter(Key, 1)
+    end.
+
+inc_qos_stats_key(send_msg, ?QOS_0) -> 'send_msg.qos0';
+inc_qos_stats_key(send_msg, ?QOS_1) -> 'send_msg.qos1';
+inc_qos_stats_key(send_msg, ?QOS_2) -> 'send_msg.qos2';
+inc_qos_stats_key(recv_msg, ?QOS_0) -> 'recv_msg.qos0';
+inc_qos_stats_key(recv_msg, ?QOS_1) -> 'recv_msg.qos1';
+inc_qos_stats_key(recv_msg, ?QOS_2) -> 'recv_msg.qos2';
+%% for bad qos
+inc_qos_stats_key(_, _) -> undefined.
+
+filter_disallowed_out(Packets) ->
+    lists:filter(fun is_datastream_out_pkt/1, Packets).
+
+is_datastream_out_pkt(#mqtt_packet{header = #mqtt_packet_header{type = Type}}) when
+    Type > 2 andalso Type < 12
+->
+    true;
+is_datastream_out_pkt(_) ->
+    false.
+%% BUILD_WITHOUT_QUIC
+-else.
+-endif.
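
`do_parse_incoming/3` above prepends each parsed packet to its accumulator, so the caller receives the packets newest first. The following is a minimal, self-contained sketch of that accumulation pattern. The module name `parse_order_demo` and the one-byte length-prefixed framing are assumptions made for illustration; they stand in for `emqx_frame:parse/2` and are not part of this change.

```erlang
-module(parse_order_demo).
-export([parse_all/1]).

%% Toy framing (an assumption for this sketch, not MQTT):
%% one length byte followed by that many payload bytes.
parse_all(Data) ->
    parse_all(Data, []).

parse_all(<<Len:8, Payload:Len/binary, Rest/binary>>, Acc) ->
    %% Prepending is O(1), which is why the result ends up newest first.
    parse_all(Rest, [Payload | Acc]);
parse_all(Incomplete, Acc) ->
    %% Not enough bytes for another full frame: return the frames parsed
    %% so far together with the unconsumed remainder.
    {Acc, Incomplete}.
```

A caller that needs arrival order can apply `lists:reverse/1` to the first element of the returned tuple.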

+ 158 - 16
apps/emqx/src/emqx_quic_stream.erl

@@ -14,9 +14,18 @@
 %% limitations under the License.
 %%--------------------------------------------------------------------
 
-%% MQTT/QUIC Stream
+%% MQTT over QUIC
+%% Multi-stream mode: this is the control stream.
+%% Single-stream mode: this is the only (main) stream.
+%%   Callbacks are invoked from the emqx_connection process rather than quicer_stream.
 -module(emqx_quic_stream).
 
+-ifndef(BUILD_WITHOUT_QUIC).
+
+-behaviour(quicer_remote_stream).
+
+-include("logger.hrl").
+
 %% emqx transport Callbacks
 -export([
     type/1,
@@ -31,44 +40,84 @@
     sockname/1,
     peercert/1
 ]).
+-include_lib("quicer/include/quicer.hrl").
+-include_lib("emqx/include/emqx_quic.hrl").
 
-wait({ConnOwner, Conn}) ->
+-type cb_ret() :: quicer_stream:cb_ret().
+-type cb_data() :: quicer_stream:cb_state().
+-type connection_handle() :: quicer:connection_handle().
+-type stream_handle() :: quicer:stream_handle().
+
+-export([
+    send_complete/3,
+    peer_send_shutdown/3,
+    peer_send_aborted/3,
+    peer_receive_aborted/3,
+    send_shutdown_complete/3,
+    stream_closed/3,
+    passive/3
+]).
+
+-export_type([socket/0]).
+
+-opaque socket() :: {quic, connection_handle(), stream_handle(), socket_info()}.
+
+-type socket_info() :: #{
+    is_orphan => boolean(),
+    ctrl_stream_start_flags => quicer:stream_open_flags(),
+    %% and quicer:new_conn_props()
+    _ => _
+}.
+
+%%% For Accepting New Remote Stream
+-spec wait({pid(), connection_handle(), socket_info()}) ->
+    {ok, socket()} | {error, enotconn}.
+wait({ConnOwner, Conn, ConnInfo}) ->
     {ok, Conn} = quicer:async_accept_stream(Conn, []),
     ConnOwner ! {self(), stream_acceptor_ready},
     receive
-        %% from msquic
-        {quic, new_stream, Stream} ->
-            {ok, {quic, Conn, Stream}};
+        %% New incoming stream, this is a *control* stream
+        {quic, new_stream, Stream, #{is_orphan := IsOrphan, flags := StartFlags}} ->
+            SocketInfo = ConnInfo#{
+                is_orphan => IsOrphan,
+                ctrl_stream_start_flags => StartFlags
+            },
+            {ok, socket(Conn, Stream, SocketInfo)};
+        %% connection closed event for stream acceptor
+        {quic, closed, undefined, undefined} ->
+            {error, enotconn};
+        %% Connection owner process down
         {'EXIT', ConnOwner, _Reason} ->
             {error, enotconn}
     end.
 
+-spec type(_) -> quic.
 type(_) ->
     quic.
 
-peername({quic, Conn, _Stream}) ->
+peername({quic, Conn, _Stream, _Info}) ->
     quicer:peername(Conn).
 
-sockname({quic, Conn, _Stream}) ->
+sockname({quic, Conn, _Stream, _Info}) ->
     quicer:sockname(Conn).
 
 peercert(_S) ->
     %% @todo but unsupported by msquic
     nossl.
 
-getstat({quic, Conn, _Stream}, Stats) ->
+getstat({quic, Conn, _Stream, _Info}, Stats) ->
     case quicer:getstat(Conn, Stats) of
         {error, _} -> {error, closed};
         Res -> Res
     end.
 
-setopts(Socket, Opts) ->
+setopts({quic, _Conn, Stream, _Info}, Opts) ->
     lists:foreach(
         fun
             ({Opt, V}) when is_atom(Opt) ->
-                quicer:setopt(Socket, Opt, V);
+                quicer:setopt(Stream, Opt, V);
             (Opt) when is_atom(Opt) ->
-                quicer:setopt(Socket, Opt, true)
+                quicer:setopt(Stream, Opt, true)
         end,
         Opts
     ),
@@ -84,9 +133,18 @@ getopts(_Socket, _Opts) ->
         {buffer, 80000}
     ]}.
 
-fast_close({quic, _Conn, Stream}) ->
-    %% Flush send buffer, gracefully shutdown
-    quicer:async_shutdown_stream(Stream),
+%% @TODO let the caller supply an application error code
+fast_close({ConnOwner, Conn, _ConnInfo}) when is_pid(ConnOwner) ->
+    %% handshake aborted.
+    _ = quicer:async_shutdown_connection(Conn, ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, 0),
+    ok;
+fast_close({quic, _Conn, Stream, _Info}) ->
+    %% Force flush
+    _ = quicer:async_shutdown_stream(Stream),
+    %% @FIXME Since we shut down the control stream, we should shut down the connection as well,
+    %% *BUT* MsQuic does not flush the send buffer if the connection is shut down after
+    %% a graceful shutdown of the stream.
+    % quicer:async_shutdown_connection(Conn, ?QUIC_CONNECTION_SHUTDOWN_FLAG_NONE, 0),
     ok.
 
 -spec ensure_ok_or_exit(atom(), list(term())) -> term().
@@ -102,8 +160,92 @@ ensure_ok_or_exit(Fun, Args = [Sock | _]) when is_atom(Fun), is_list(Args) ->
             Result
     end.
 
-async_send({quic, _Conn, Stream}, Data, _Options) ->
-    case quicer:send(Stream, Data) of
+async_send({quic, _Conn, Stream, _Info}, Data, _Options) ->
+    case quicer:async_send(Stream, Data, ?QUICER_SEND_FLAG_SYNC) of
         {ok, _Len} -> ok;
+        {error, X, Y} -> {error, {X, Y}};
         Other -> Other
     end.
+
+%%%
+%%% quicer stream callbacks
+%%%
+
+-spec peer_receive_aborted(stream_handle(), non_neg_integer(), cb_data()) -> cb_ret().
+peer_receive_aborted(Stream, ErrorCode, S) ->
+    _ = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_ABORT, ErrorCode),
+    {ok, S}.
+
+-spec peer_send_aborted(stream_handle(), non_neg_integer(), cb_data()) -> cb_ret().
+peer_send_aborted(Stream, ErrorCode, S) ->
+    %% we abort receive with same reason
+    _ = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_ABORT, ErrorCode),
+    {ok, S}.
+
+-spec peer_send_shutdown(stream_handle(), undefined, cb_data()) -> cb_ret().
+peer_send_shutdown(Stream, undefined, S) ->
+    ok = quicer:async_shutdown_stream(Stream, ?QUIC_STREAM_SHUTDOWN_FLAG_GRACEFUL, 0),
+    {ok, S}.
+
+-spec send_complete(stream_handle(), boolean(), cb_data()) -> cb_ret().
+send_complete(_Stream, false, S) ->
+    {ok, S};
+send_complete(_Stream, true = _IsCancelled, S) ->
+    ?SLOG(error, #{message => "send cancelled"}),
+    {ok, S}.
+
+-spec send_shutdown_complete(stream_handle(), boolean(), cb_data()) -> cb_ret().
+send_shutdown_complete(_Stream, _IsGraceful, S) ->
+    {ok, S}.
+
+-spec passive(stream_handle(), undefined, cb_data()) -> cb_ret().
+passive(Stream, undefined, S) ->
+    case quicer:setopt(Stream, active, 10) of
+        ok -> ok;
+        Error -> ?SLOG(error, #{message => "set active error", error => Error})
+    end,
+    {ok, S}.
+
+-spec stream_closed(stream_handle(), quicer:stream_closed_props(), cb_data()) ->
+    {{continue, term()}, cb_data()}.
+stream_closed(
+    _Stream,
+    #{
+        is_conn_shutdown := IsConnShutdown,
+        is_app_closing := IsAppClosing,
+        is_shutdown_by_app := IsAppShutdown,
+        is_closed_remotely := IsRemote,
+        status := Status,
+        error := Code
+    },
+    S
+) when
+    is_boolean(IsConnShutdown) andalso
+        is_boolean(IsAppClosing) andalso
+        is_boolean(IsAppShutdown) andalso
+        is_boolean(IsRemote) andalso
+        is_atom(Status) andalso
+        is_integer(Code)
+->
+    %% For now we fake a sock_closed for
+    %% emqx_connection:process_msg to append
+    %% a msg to be processed
+    Reason =
+        case Code of
+            ?MQTT_QUIC_CONN_NOERROR ->
+                normal;
+            _ ->
+                Status
+        end,
+    {{continue, {sock_closed, Reason}}, S}.
+
+%%%
+%%%  Internals
+%%%
+-spec socket(connection_handle(), stream_handle(), socket_info()) -> socket().
+socket(Conn, CtrlStream, Info) when is_map(Info) ->
+    {quic, Conn, CtrlStream, Info}.
+
+%% BUILD_WITHOUT_QUIC
+-else.
+-endif.

+ 277 - 47
apps/emqx/src/emqx_schema.erl

@@ -120,6 +120,9 @@
 
 -elvis([{elvis_style, god_modules, disable}]).
 
+-define(BIT(Bits), (1 bsl (Bits))).
+-define(MAX_UINT(Bits), (?BIT(Bits) - 1)).
+
 namespace() -> broker.
 
 tags() ->
@@ -268,7 +271,7 @@ fields("persistent_session_store") ->
             sc(
                 duration(),
                 #{
-                    default => "1h",
+                    default => <<"1h">>,
                     desc => ?DESC(persistent_session_store_max_retain_undelivered)
                 }
             )},
@@ -276,7 +279,7 @@ fields("persistent_session_store") ->
             sc(
                 duration(),
                 #{
-                    default => "1h",
+                    default => <<"1h">>,
                     desc => ?DESC(persistent_session_store_message_gc_interval)
                 }
             )},
@@ -284,7 +287,7 @@ fields("persistent_session_store") ->
             sc(
                 duration(),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     desc => ?DESC(persistent_session_store_session_message_gc_interval)
                 }
             )}
@@ -352,7 +355,7 @@ fields("authz_cache") ->
             sc(
                 duration(),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     desc => ?DESC(fields_cache_ttl)
                 }
             )}
@@ -363,7 +366,7 @@ fields("mqtt") ->
             sc(
                 hoconsc:union([infinity, duration()]),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC(mqtt_idle_timeout)
                 }
             )},
@@ -371,7 +374,7 @@ fields("mqtt") ->
             sc(
                 bytesize(),
                 #{
-                    default => "1MB",
+                    default => <<"1MB">>,
                     desc => ?DESC(mqtt_max_packet_size)
                 }
             )},
@@ -507,7 +510,7 @@ fields("mqtt") ->
             sc(
                 duration(),
                 #{
-                    default => "30s",
+                    default => <<"30s">>,
                     desc => ?DESC(mqtt_retry_interval)
                 }
             )},
@@ -523,7 +526,7 @@ fields("mqtt") ->
             sc(
                 duration(),
                 #{
-                    default => "300s",
+                    default => <<"300s">>,
                     desc => ?DESC(mqtt_await_rel_timeout)
                 }
             )},
@@ -531,7 +534,7 @@ fields("mqtt") ->
             sc(
                 duration(),
                 #{
-                    default => "2h",
+                    default => <<"2h">>,
                     desc => ?DESC(mqtt_session_expiry_interval)
                 }
             )},
@@ -617,7 +620,7 @@ fields("flapping_detect") ->
             sc(
                 duration(),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     desc => ?DESC(flapping_detect_window_time)
                 }
             )},
@@ -625,7 +628,7 @@ fields("flapping_detect") ->
             sc(
                 duration(),
                 #{
-                    default => "5m",
+                    default => <<"5m">>,
                     desc => ?DESC(flapping_detect_ban_time)
                 }
             )}
@@ -652,7 +655,7 @@ fields("force_shutdown") ->
             sc(
                 wordsize(),
                 #{
-                    default => "32MB",
+                    default => <<"32MB">>,
                     desc => ?DESC(force_shutdown_max_heap_size),
                     validator => fun ?MODULE:validate_heap_size/1
                 }
@@ -715,7 +718,7 @@ fields("conn_congestion") ->
             sc(
                 duration(),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     desc => ?DESC(conn_congestion_min_alarm_sustain_duration)
                 }
             )}
@@ -739,7 +742,7 @@ fields("force_gc") ->
             sc(
                 bytesize(),
                 #{
-                    default => "16MB",
+                    default => <<"16MB">>,
                     desc => ?DESC(force_gc_bytes)
                 }
             )}
@@ -845,18 +848,96 @@ fields("mqtt_wss_listener") ->
         ];
 fields("mqtt_quic_listener") ->
     [
-        %% TODO: ensure cacertfile is configurable
         {"certfile",
             sc(
                 string(),
-                #{desc => ?DESC(fields_mqtt_quic_listener_certfile)}
+                #{
+                    %% TODO: deprecated => {since, "5.1.0"}
+                    desc => ?DESC(fields_mqtt_quic_listener_certfile)
+                }
             )},
         {"keyfile",
             sc(
                 string(),
-                #{desc => ?DESC(fields_mqtt_quic_listener_keyfile)}
+                %% TODO: deprecated => {since, "5.1.0"}
+                #{
+                    desc => ?DESC(fields_mqtt_quic_listener_keyfile)
+                }
             )},
         {"ciphers", ciphers_schema(quic)},
+
+        {"max_bytes_per_key",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(64),
+                ?DESC(fields_mqtt_quic_listener_max_bytes_per_key)
+            )},
+        {"handshake_idle_timeout_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(64),
+                ?DESC(fields_mqtt_quic_listener_handshake_idle_timeout)
+            )},
+        {"tls_server_max_send_buffer",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_tls_server_max_send_buffer)
+            )},
+        {"stream_recv_window_default",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_stream_recv_window_default)
+            )},
+        {"stream_recv_buffer_default",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_stream_recv_buffer_default)
+            )},
+        {"conn_flow_control_window",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_conn_flow_control_window)
+            )},
+        {"max_stateless_operations",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_max_stateless_operations)
+            )},
+        {"initial_window_packets",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_initial_window_packets)
+            )},
+        {"send_idle_timeout_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_send_idle_timeout_ms)
+            )},
+        {"initial_rtt_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_initial_rtt_ms)
+            )},
+        {"max_ack_delay_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_max_ack_delay_ms)
+            )},
+        {"disconnect_timeout_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_disconnect_timeout_ms)
+            )},
         {"idle_timeout",
             sc(
                 duration_ms(),
@@ -865,14 +946,26 @@ fields("mqtt_quic_listener") ->
                     desc => ?DESC(fields_mqtt_quic_listener_idle_timeout)
                 }
             )},
+        {"idle_timeout_ms",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(64),
+                ?DESC(fields_mqtt_quic_listener_idle_timeout_ms)
+            )},
         {"handshake_idle_timeout",
             sc(
                 duration_ms(),
                 #{
-                    default => "10s",
+                    default => <<"10s">>,
                     desc => ?DESC(fields_mqtt_quic_listener_handshake_idle_timeout)
                 }
             )},
+        {"handshake_idle_timeout_ms",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(64),
+                ?DESC(fields_mqtt_quic_listener_handshake_idle_timeout_ms)
+            )},
         {"keep_alive_interval",
             sc(
                 duration_ms(),
@@ -880,6 +973,108 @@ fields("mqtt_quic_listener") ->
                     default => 0,
                     desc => ?DESC(fields_mqtt_quic_listener_keep_alive_interval)
                 }
+            )},
+        {"keep_alive_interval_ms",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(32),
+                ?DESC(fields_mqtt_quic_listener_keep_alive_interval_ms)
+            )},
+        {"peer_bidi_stream_count",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_peer_bidi_stream_count)
+            )},
+        {"peer_unidi_stream_count",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_peer_unidi_stream_count)
+            )},
+        {"retry_memory_limit",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_retry_memory_limit)
+            )},
+        {"load_balancing_mode",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_load_balancing_mode)
+            )},
+        {"max_operations_per_drain",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(8),
+                ?DESC(fields_mqtt_quic_listener_max_operations_per_drain)
+            )},
+        {"send_buffering_enabled",
+            quic_feature_toggle(
+                ?DESC(fields_mqtt_quic_listener_send_buffering_enabled)
+            )},
+        {"pacing_enabled",
+            quic_feature_toggle(
+                ?DESC(fields_mqtt_quic_listener_pacing_enabled)
+            )},
+        {"migration_enabled",
+            quic_feature_toggle(
+                ?DESC(fields_mqtt_quic_listener_migration_enabled)
+            )},
+        {"datagram_receive_enabled",
+            quic_feature_toggle(
+                ?DESC(fields_mqtt_quic_listener_datagram_receive_enabled)
+            )},
+        {"server_resumption_level",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(8),
+                ?DESC(fields_mqtt_quic_listener_server_resumption_level)
+            )},
+        {"minimum_mtu",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_minimum_mtu)
+            )},
+        {"maximum_mtu",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_maximum_mtu)
+            )},
+        {"mtu_discovery_search_complete_timeout_us",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(64),
+                ?DESC(fields_mqtt_quic_listener_mtu_discovery_search_complete_timeout_us)
+            )},
+        {"mtu_discovery_missing_probe_count",
+            quic_lowlevel_settings_uint(
+                1,
+                ?MAX_UINT(8),
+                ?DESC(fields_mqtt_quic_listener_mtu_discovery_missing_probe_count)
+            )},
+        {"max_binding_stateless_operations",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_max_binding_stateless_operations)
+            )},
+        {"stateless_operation_expiration_ms",
+            quic_lowlevel_settings_uint(
+                0,
+                ?MAX_UINT(16),
+                ?DESC(fields_mqtt_quic_listener_stateless_operation_expiration_ms)
+            )},
+        {"ssl_options",
+            sc(
+                ref("listener_quic_ssl_opts"),
+                #{
+                    required => false,
+                    desc => ?DESC(fields_mqtt_quic_listener_ssl_options)
+                }
             )}
     ] ++ base_listener(14567);
 fields("ws_opts") ->
@@ -888,7 +1083,7 @@ fields("ws_opts") ->
             sc(
                 string(),
                 #{
-                    default => "/mqtt",
+                    default => <<"/mqtt">>,
                     desc => ?DESC(fields_ws_opts_mqtt_path)
                 }
             )},
@@ -912,7 +1107,7 @@ fields("ws_opts") ->
             sc(
                 duration(),
                 #{
-                    default => "7200s",
+                    default => <<"7200s">>,
                     desc => ?DESC(fields_ws_opts_idle_timeout)
                 }
             )},
@@ -936,7 +1131,7 @@ fields("ws_opts") ->
             sc(
                 comma_separated_list(),
                 #{
-                    default => "mqtt, mqtt-v3, mqtt-v3.1.1, mqtt-v5",
+                    default => <<"mqtt, mqtt-v3, mqtt-v3.1.1, mqtt-v5">>,
                     desc => ?DESC(fields_ws_opts_supported_subprotocols)
                 }
             )},
@@ -968,7 +1163,7 @@ fields("ws_opts") ->
             sc(
                 string(),
                 #{
-                    default => "x-forwarded-for",
+                    default => <<"x-forwarded-for">>,
                     desc => ?DESC(fields_ws_opts_proxy_address_header)
                 }
             )},
@@ -976,7 +1171,7 @@ fields("ws_opts") ->
             sc(
                 string(),
                 #{
-                    default => "x-forwarded-port",
+                    default => <<"x-forwarded-port">>,
                     desc => ?DESC(fields_ws_opts_proxy_port_header)
                 }
             )},
@@ -1008,7 +1203,7 @@ fields("tcp_opts") ->
             sc(
                 duration(),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC(fields_tcp_opts_send_timeout)
                 }
             )},
@@ -1049,7 +1244,7 @@ fields("tcp_opts") ->
             sc(
                 bytesize(),
                 #{
-                    default => "1MB",
+                    default => <<"1MB">>,
                     desc => ?DESC(fields_tcp_opts_high_watermark)
                 }
             )},
@@ -1090,6 +1285,8 @@ fields("listener_wss_opts") ->
         },
         true
     );
+fields("listener_quic_ssl_opts") ->
+    server_ssl_opts_schema(#{}, false);
 fields("ssl_client_opts") ->
     client_ssl_opts_schema(#{});
 fields("deflate_opts") ->
@@ -1260,7 +1457,7 @@ fields("sys_topics") ->
             sc(
                 hoconsc:union([disabled, duration()]),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     desc => ?DESC(sys_msg_interval)
                 }
             )},
@@ -1268,7 +1465,7 @@ fields("sys_topics") ->
             sc(
                 hoconsc:union([disabled, duration()]),
                 #{
-                    default => "30s",
+                    default => <<"30s">>,
                     desc => ?DESC(sys_heartbeat_interval)
                 }
             )},
@@ -1337,7 +1534,7 @@ fields("sysmon_vm") ->
             sc(
                 duration(),
                 #{
-                    default => "30s",
+                    default => <<"30s">>,
                     desc => ?DESC(sysmon_vm_process_check_interval)
                 }
             )},
@@ -1345,7 +1542,7 @@ fields("sysmon_vm") ->
             sc(
                 percent(),
                 #{
-                    default => "80%",
+                    default => <<"80%">>,
                     desc => ?DESC(sysmon_vm_process_high_watermark)
                 }
             )},
@@ -1353,7 +1550,7 @@ fields("sysmon_vm") ->
             sc(
                 percent(),
                 #{
-                    default => "60%",
+                    default => <<"60%">>,
                     desc => ?DESC(sysmon_vm_process_low_watermark)
                 }
             )},
@@ -1369,7 +1566,7 @@ fields("sysmon_vm") ->
             sc(
                 hoconsc:union([disabled, duration()]),
                 #{
-                    default => "240ms",
+                    default => <<"240ms">>,
                     desc => ?DESC(sysmon_vm_long_schedule)
                 }
             )},
@@ -1377,7 +1574,7 @@ fields("sysmon_vm") ->
             sc(
                 hoconsc:union([disabled, bytesize()]),
                 #{
-                    default => "32MB",
+                    default => <<"32MB">>,
                     desc => ?DESC(sysmon_vm_large_heap)
                 }
             )},
@@ -1404,7 +1601,7 @@ fields("sysmon_os") ->
             sc(
                 duration(),
                 #{
-                    default => "60s",
+                    default => <<"60s">>,
                     desc => ?DESC(sysmon_os_cpu_check_interval)
                 }
             )},
@@ -1412,7 +1609,7 @@ fields("sysmon_os") ->
             sc(
                 percent(),
                 #{
-                    default => "80%",
+                    default => <<"80%">>,
                     desc => ?DESC(sysmon_os_cpu_high_watermark)
                 }
             )},
@@ -1420,7 +1617,7 @@ fields("sysmon_os") ->
             sc(
                 percent(),
                 #{
-                    default => "60%",
+                    default => <<"60%">>,
                     desc => ?DESC(sysmon_os_cpu_low_watermark)
                 }
             )},
@@ -1428,7 +1625,7 @@ fields("sysmon_os") ->
             sc(
                 hoconsc:union([disabled, duration()]),
                 #{
-                    default => "60s",
+                    default => <<"60s">>,
                     desc => ?DESC(sysmon_os_mem_check_interval)
                 }
             )},
@@ -1436,7 +1633,7 @@ fields("sysmon_os") ->
             sc(
                 percent(),
                 #{
-                    default => "70%",
+                    default => <<"70%">>,
                     desc => ?DESC(sysmon_os_sysmem_high_watermark)
                 }
             )},
@@ -1444,7 +1641,7 @@ fields("sysmon_os") ->
             sc(
                 percent(),
                 #{
-                    default => "5%",
+                    default => <<"5%">>,
                     desc => ?DESC(sysmon_os_procmem_high_watermark)
                 }
             )}
@@ -1465,7 +1662,7 @@ fields("sysmon_top") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "system_monitor.top_sample_interval",
-                    default => "2s",
+                    default => <<"2s">>,
                     desc => ?DESC(sysmon_top_sample_interval)
                 }
             )},
@@ -1484,7 +1681,7 @@ fields("sysmon_top") ->
                 #{
                     mapping => "system_monitor.db_hostname",
                     desc => ?DESC(sysmon_top_db_hostname),
-                    default => ""
+                    default => <<>>
                 }
             )},
         {"db_port",
@@ -1501,7 +1698,7 @@ fields("sysmon_top") ->
                 string(),
                 #{
                     mapping => "system_monitor.db_username",
-                    default => "system_monitor",
+                    default => <<"system_monitor">>,
                     desc => ?DESC(sysmon_top_db_username)
                 }
             )},
@@ -1510,7 +1707,7 @@ fields("sysmon_top") ->
                 binary(),
                 #{
                     mapping => "system_monitor.db_password",
-                    default => "system_monitor_password",
+                    default => <<"system_monitor_password">>,
                     desc => ?DESC(sysmon_top_db_password),
                     converter => fun password_converter/2,
                     sensitive => true
@@ -1521,7 +1718,7 @@ fields("sysmon_top") ->
                 string(),
                 #{
                     mapping => "system_monitor.db_name",
-                    default => "postgres",
+                    default => <<"postgres">>,
                     desc => ?DESC(sysmon_top_db_name)
                 }
             )}
@@ -1551,7 +1748,7 @@ fields("alarm") ->
             sc(
                 duration(),
                 #{
-                    default => "24h",
+                    default => <<"24h">>,
                     example => "24h",
                     desc => ?DESC(alarm_validity_period)
                 }
@@ -1590,7 +1787,7 @@ mqtt_listener(Bind) ->
                     duration(),
                     #{
                         desc => ?DESC(mqtt_listener_proxy_protocol_timeout),
-                        default => "3s"
+                        default => <<"3s">>
                     }
                 )},
             {?EMQX_AUTHENTICATION_CONFIG_ROOT_NAME, authentication(listener)}
@@ -1769,6 +1966,12 @@ desc("listener_ssl_opts") ->
     "Socket options for SSL connections.";
 desc("listener_wss_opts") ->
     "Socket options for WebSocket/SSL connections.";
+desc("fields_mqtt_quic_listener_certfile") ->
+    "Path to the certificate file. Will be deprecated in 5.1, use '.ssl_options.certfile' instead.";
+desc("fields_mqtt_quic_listener_keyfile") ->
+    "Path to the secret key file. Will be deprecated in 5.1, use '.ssl_options.keyfile' instead.";
+desc("listener_quic_ssl_opts") ->
+    "TLS options for QUIC transport.";
 desc("ssl_client_opts") ->
     "Socket options for SSL clients.";
 desc("deflate_opts") ->
@@ -1935,7 +2138,7 @@ common_ssl_opts_schema(Defaults) ->
             sc(
                 duration(),
                 #{
-                    default => Df("hibernate_after", "5s"),
+                    default => Df("hibernate_after", <<"5s">>),
                     desc => ?DESC(common_ssl_opts_schema_hibernate_after)
                 }
             )}
@@ -1985,7 +2188,7 @@ server_ssl_opts_schema(Defaults, IsRanchListener) ->
                 sc(
                     duration(),
                     #{
-                        default => Df("handshake_timeout", "15s"),
+                        default => Df("handshake_timeout", <<"15s">>),
                         desc => ?DESC(server_ssl_opts_schema_handshake_timeout)
                     }
                 )}
@@ -2617,3 +2820,30 @@ parse_port(Port) ->
         _:_ ->
             throw("bad_port_number")
     end.
+
+quic_feature_toggle(Desc) ->
+    sc(
+        %% true, false are for user facing
+        %% 0, 1 are for internal representation
+        typerefl:alias("boolean", typerefl:union([true, false, 0, 1])),
+        #{
+            desc => Desc,
+            hidden => true,
+            required => false,
+            converter => fun
+                (true) -> 1;
+                (false) -> 0;
+                (Other) -> Other
+            end
+        }
+    ).
+
+quic_lowlevel_settings_uint(Low, High, Desc) ->
+    sc(
+        range(Low, High),
+        #{
+            required => false,
+            hidden => true,
+            desc => Desc
+        }
+    ).
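
The converter attached to `quic_feature_toggle/1` can be exercised on its own. The following is a minimal sketch, assuming a hypothetical module name `quic_toggle_demo`; it copies only the converter clauses from the hunk above and does not involve `sc/2` or `typerefl`.

```erlang
-module(quic_toggle_demo).
-export([convert/1, demo/0]).

%% Same clauses as the converter fun in quic_feature_toggle/1:
%% user-facing booleans become the internal 0/1 representation,
%% any other value (for example an already-converted 0 or 1) passes through.
convert(true) -> 1;
convert(false) -> 0;
convert(Other) -> Other.

%% Pattern matches act as assertions; demo/0 crashes on a mismatch.
demo() ->
    1 = convert(true),
    0 = convert(false),
    1 = convert(1),
    0 = convert(0),
    ok.
```

Because the pass-through clause leaves 0 and 1 untouched, applying the converter twice gives the same result as applying it once.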

+ 26 - 4
apps/emqx/test/emqx_common_test_helpers.erl

@@ -22,6 +22,8 @@
 
 -export([
     all/1,
+    init_per_testcase/3,
+    end_per_testcase/3,
     boot_modules/1,
     start_apps/1,
     start_apps/2,
@@ -42,6 +44,7 @@
     client_ssl_twoway/1,
     ensure_mnesia_stopped/0,
     ensure_quic_listener/2,
+    ensure_quic_listener/3,
     is_all_tcp_servers_available/1,
     is_tcp_server_available/2,
     is_tcp_server_available/3,
@@ -150,6 +153,19 @@ all(Suite) ->
         string:substr(atom_to_list(F), 1, 2) == "t_"
     ]).
 
+init_per_testcase(Module, TestCase, Config) ->
+    case erlang:function_exported(Module, TestCase, 2) of
+        true -> Module:TestCase(init, Config);
+        false -> Config
+    end.
+
+end_per_testcase(Module, TestCase, Config) ->
+    case erlang:function_exported(Module, TestCase, 2) of
+        true -> Module:TestCase('end', Config);
+        false -> ok
+    end,
+    Config.
+
 %% set emqx app boot modules
 -spec boot_modules(all | list(atom())) -> ok.
 boot_modules(Mods) ->
@@ -496,11 +512,14 @@ ensure_dashboard_listeners_started(_App) ->
 
 -spec ensure_quic_listener(Name :: atom(), UdpPort :: inet:port_number()) -> ok.
 ensure_quic_listener(Name, UdpPort) ->
+    ensure_quic_listener(Name, UdpPort, #{}).
+-spec ensure_quic_listener(Name :: atom(), UdpPort :: inet:port_number(), map()) -> ok.
+ensure_quic_listener(Name, UdpPort, ExtraSettings) ->
     application:ensure_all_started(quicer),
     Conf = #{
         acceptors => 16,
-        bind => {{0, 0, 0, 0}, UdpPort},
-        certfile => filename:join(code:lib_dir(emqx), "etc/certs/cert.pem"),
+        bind => UdpPort,
+
         ciphers =>
             [
                 "TLS_AES_256_GCM_SHA384",
@@ -509,13 +528,16 @@ ensure_quic_listener(Name, UdpPort) ->
             ],
         enabled => true,
         idle_timeout => 15000,
-        keyfile => filename:join(code:lib_dir(emqx), "etc/certs/key.pem"),
+        ssl_options => #{
+            certfile => filename:join(code:lib_dir(emqx), "etc/certs/cert.pem"),
+            keyfile => filename:join(code:lib_dir(emqx), "etc/certs/key.pem")
+        },
         limiter => #{},
         max_connections => 1024000,
         mountpoint => <<>>,
         zone => default
     },
-    emqx_config:put([listeners, quic, Name], Conf),
+    emqx_config:put([listeners, quic, Name], maps:merge(Conf, ExtraSettings)),
     case emqx_listeners:start_listener(quic, Name, Conf) of
         ok -> ok;
         {error, {already_started, _Pid}} -> ok
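
The `init_per_testcase/3` and `end_per_testcase/3` helpers added in this file dispatch to an arity-2 clause of the test case itself when one is exported. The sketch below shows that convention in a self-contained form. The module name `dispatch_demo` and the test cases `t_example` and `t_plain` are hypothetical, and the two helpers are copied from the hunk rather than called through `emqx_common_test_helpers`.

```erlang
-module(dispatch_demo).
-export([demo/0, t_example/1, t_example/2, t_plain/1]).

%% Copied from the diff: call Module:TestCase(init, Config) if it is exported.
init_per_testcase(Module, TestCase, Config) ->
    case erlang:function_exported(Module, TestCase, 2) of
        true -> Module:TestCase(init, Config);
        false -> Config
    end.

%% Copied from the diff: call Module:TestCase('end', Config) if it is exported.
end_per_testcase(Module, TestCase, Config) ->
    case erlang:function_exported(Module, TestCase, 2) of
        true -> Module:TestCase('end', Config);
        false -> ok
    end,
    Config.

%% Arity-2 clauses: per-case setup and teardown.
t_example(init, Config) -> [{prepared, true} | Config];
t_example('end', _Config) -> ok.

%% Arity-1 function: the test body itself.
t_example(Config) ->
    true = proplists:get_value(prepared, Config),
    ok.

%% A test case without an arity-2 variant gets Config unchanged.
t_plain(Config) ->
    undefined = proplists:get_value(prepared, Config),
    ok.

demo() ->
    C1 = init_per_testcase(?MODULE, t_example, []),
    ok = t_example(C1),
    C1 = end_per_testcase(?MODULE, t_example, C1),
    [] = init_per_testcase(?MODULE, t_plain, []),
    ok = t_plain([]),
    [] = end_per_testcase(?MODULE, t_plain, []),
    ok.
```

In a real suite, one would presumably delegate from the suite's own `init_per_testcase/2` and `end_per_testcase/2` callbacks, passing `?MODULE` as the first argument.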

+ 1 - 1
apps/emqx/test/emqx_mqtt_protocol_v5_SUITE.erl

@@ -905,7 +905,7 @@ t_shared_subscriptions_client_terminates_when_qos_eq_2(Config) ->
         emqtt,
         connected,
         fun
-            (cast, ?PUBLISH_PACKET(?QOS_2, _PacketId), _State) ->
+            (cast, {?PUBLISH_PACKET(?QOS_2, _PacketId), _Via}, _State) ->
                 ok = counters:add(CRef, 1, 1),
                 {stop, {shutdown, for_testing}};
             (Arg1, ARg2, Arg3) ->

File diff suppressed because it is too large
+ 2041 - 0
apps/emqx/test/emqx_quic_multistreams_SUITE.erl


+ 1 - 1
apps/emqx_authn/src/emqx_authn.app.src

@@ -1,7 +1,7 @@
 %% -*- mode: erlang -*-
 {application, emqx_authn, [
     {description, "EMQX Authentication"},
-    {vsn, "0.1.13"},
+    {vsn, "0.1.14"},
     {modules, []},
     {registered, [emqx_authn_sup, emqx_authn_registry]},
     {applications, [kernel, stdlib, emqx_resource, emqx_connector, ehttpc, epgsql, mysql, jose]},

+ 1 - 1
apps/emqx_authn/src/simple_authn/emqx_authn_mysql.erl

@@ -74,7 +74,7 @@ query(_) -> undefined.
 
 query_timeout(type) -> emqx_schema:duration_ms();
 query_timeout(desc) -> ?DESC(?FUNCTION_NAME);
-query_timeout(default) -> "5s";
+query_timeout(default) -> <<"5s">>;
 query_timeout(_) -> undefined.
 
 %%------------------------------------------------------------------------------

+ 1 - 1
apps/emqx_authz/src/emqx_authz.app.src

@@ -1,7 +1,7 @@
 %% -*- mode: erlang -*-
 {application, emqx_authz, [
     {description, "An OTP application"},
-    {vsn, "0.1.13"},
+    {vsn, "0.1.14"},
     {registered, []},
     {mod, {emqx_authz_app, []}},
     {applications, [

+ 1 - 1
apps/emqx_authz/src/emqx_authz_api_schema.erl

@@ -108,7 +108,7 @@ authz_http_common_fields() ->
                 })},
             {request_timeout,
                 mk_duration("Request timeout", #{
-                    required => false, default => "30s", desc => ?DESC(request_timeout)
+                    required => false, default => <<"30s">>, desc => ?DESC(request_timeout)
                 })}
         ] ++
         maps:to_list(

+ 1 - 1
apps/emqx_authz/src/emqx_authz_schema.erl

@@ -223,7 +223,7 @@ http_common_fields() ->
         {url, fun url/1},
         {request_timeout,
             mk_duration("Request timeout", #{
-                required => false, default => "30s", desc => ?DESC(request_timeout)
+                required => false, default => <<"30s">>, desc => ?DESC(request_timeout)
             })},
         {body, ?HOCON(map(), #{required => false, desc => ?DESC(body)})}
     ] ++

+ 36 - 36
apps/emqx_conf/src/emqx_conf_schema.erl

@@ -145,7 +145,7 @@ fields("cluster") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "ekka.cluster_autoclean",
-                    default => "5m",
+                    default => <<"5m">>,
                     desc => ?DESC(cluster_autoclean),
                     'readOnly' => true
                 }
@@ -214,7 +214,7 @@ fields(cluster_mcast) ->
             sc(
                 string(),
                 #{
-                    default => "239.192.0.1",
+                    default => <<"239.192.0.1">>,
                     desc => ?DESC(cluster_mcast_addr),
                     'readOnly' => true
                 }
@@ -232,7 +232,7 @@ fields(cluster_mcast) ->
             sc(
                 string(),
                 #{
-                    default => "0.0.0.0",
+                    default => <<"0.0.0.0">>,
                     desc => ?DESC(cluster_mcast_iface),
                     'readOnly' => true
                 }
@@ -259,7 +259,7 @@ fields(cluster_mcast) ->
             sc(
                 emqx_schema:bytesize(),
                 #{
-                    default => "16KB",
+                    default => <<"16KB">>,
                     desc => ?DESC(cluster_mcast_sndbuf),
                     'readOnly' => true
                 }
@@ -268,7 +268,7 @@ fields(cluster_mcast) ->
             sc(
                 emqx_schema:bytesize(),
                 #{
-                    default => "16KB",
+                    default => <<"16KB">>,
                     desc => ?DESC(cluster_mcast_recbuf),
                     'readOnly' => true
                 }
@@ -277,7 +277,7 @@ fields(cluster_mcast) ->
             sc(
                 emqx_schema:bytesize(),
                 #{
-                    default => "32KB",
+                    default => <<"32KB">>,
                     desc => ?DESC(cluster_mcast_buffer),
                     'readOnly' => true
                 }
@@ -289,7 +289,7 @@ fields(cluster_dns) ->
             sc(
                 string(),
                 #{
-                    default => "localhost",
+                    default => <<"localhost">>,
                     desc => ?DESC(cluster_dns_name),
                     'readOnly' => true
                 }
@@ -318,7 +318,7 @@ fields(cluster_etcd) ->
             sc(
                 string(),
                 #{
-                    default => "emqxcl",
+                    default => <<"emqxcl">>,
                     desc => ?DESC(cluster_etcd_prefix),
                     'readOnly' => true
                 }
@@ -327,7 +327,7 @@ fields(cluster_etcd) ->
             sc(
                 emqx_schema:duration(),
                 #{
-                    default => "1m",
+                    default => <<"1m">>,
                     'readOnly' => true,
                     desc => ?DESC(cluster_etcd_node_ttl)
                 }
@@ -347,7 +347,7 @@ fields(cluster_k8s) ->
             sc(
                 string(),
                 #{
-                    default => "http://10.110.111.204:8080",
+                    default => <<"http://10.110.111.204:8080">>,
                     desc => ?DESC(cluster_k8s_apiserver),
                     'readOnly' => true
                 }
@@ -356,7 +356,7 @@ fields(cluster_k8s) ->
             sc(
                 string(),
                 #{
-                    default => "emqx",
+                    default => <<"emqx">>,
                     desc => ?DESC(cluster_k8s_service_name),
                     'readOnly' => true
                 }
@@ -374,7 +374,7 @@ fields(cluster_k8s) ->
             sc(
                 string(),
                 #{
-                    default => "default",
+                    default => <<"default">>,
                     desc => ?DESC(cluster_k8s_namespace),
                     'readOnly' => true
                 }
@@ -383,7 +383,7 @@ fields(cluster_k8s) ->
             sc(
                 string(),
                 #{
-                    default => "pod.local",
+                    default => <<"pod.local">>,
                     'readOnly' => true,
                     desc => ?DESC(cluster_k8s_suffix)
                 }
@@ -395,7 +395,7 @@ fields("node") ->
             sc(
                 string(),
                 #{
-                    default => "emqx@127.0.0.1",
+                    default => <<"emqx@127.0.0.1">>,
                     'readOnly' => true,
                     desc => ?DESC(node_name)
                 }
@@ -477,7 +477,7 @@ fields("node") ->
                 hoconsc:union([disabled, emqx_schema:duration()]),
                 #{
                     mapping => "emqx_machine.global_gc_interval",
-                    default => "15m",
+                    default => <<"15m">>,
                     desc => ?DESC(node_global_gc_interval),
                     'readOnly' => true
                 }
@@ -497,7 +497,7 @@ fields("node") ->
                 emqx_schema:duration_s(),
                 #{
                     mapping => "vm_args.-env ERL_CRASH_DUMP_SECONDS",
-                    default => "30s",
+                    default => <<"30s">>,
                     desc => ?DESC(node_crash_dump_seconds),
                     'readOnly' => true
                 }
@@ -507,7 +507,7 @@ fields("node") ->
                 emqx_schema:bytesize(),
                 #{
                     mapping => "vm_args.-env ERL_CRASH_DUMP_BYTES",
-                    default => "100MB",
+                    default => <<"100MB">>,
                     desc => ?DESC(node_crash_dump_bytes),
                     'readOnly' => true
                 }
@@ -517,7 +517,7 @@ fields("node") ->
                 emqx_schema:duration_s(),
                 #{
                     mapping => "vm_args.-kernel net_ticktime",
-                    default => "2m",
+                    default => <<"2m">>,
                     'readOnly' => true,
                     desc => ?DESC(node_dist_net_ticktime)
                 }
@@ -624,7 +624,7 @@ fields("cluster_call") ->
                 emqx_schema:duration(),
                 #{
                     desc => ?DESC(cluster_call_retry_interval),
-                    default => "1m"
+                    default => <<"1m">>
                 }
             )},
         {"max_history",
@@ -640,7 +640,7 @@ fields("cluster_call") ->
                 emqx_schema:duration(),
                 #{
                     desc => ?DESC(cluster_call_cleanup_interval),
-                    default => "5m"
+                    default => <<"5m">>
                 }
             )}
     ];
@@ -712,7 +712,7 @@ fields("rpc") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "gen_rpc.connect_timeout",
-                    default => "5s",
+                    default => <<"5s">>,
                     desc => ?DESC(rpc_connect_timeout)
                 }
             )},
@@ -745,7 +745,7 @@ fields("rpc") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "gen_rpc.send_timeout",
-                    default => "5s",
+                    default => <<"5s">>,
                     desc => ?DESC(rpc_send_timeout)
                 }
             )},
@@ -754,7 +754,7 @@ fields("rpc") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "gen_rpc.authentication_timeout",
-                    default => "5s",
+                    default => <<"5s">>,
                     desc => ?DESC(rpc_authentication_timeout)
                 }
             )},
@@ -763,7 +763,7 @@ fields("rpc") ->
                 emqx_schema:duration(),
                 #{
                     mapping => "gen_rpc.call_receive_timeout",
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC(rpc_call_receive_timeout)
                 }
             )},
@@ -772,7 +772,7 @@ fields("rpc") ->
                 emqx_schema:duration_s(),
                 #{
                     mapping => "gen_rpc.socket_keepalive_idle",
-                    default => "15m",
+                    default => <<"15m">>,
                     desc => ?DESC(rpc_socket_keepalive_idle)
                 }
             )},
@@ -781,7 +781,7 @@ fields("rpc") ->
                 emqx_schema:duration_s(),
                 #{
                     mapping => "gen_rpc.socket_keepalive_interval",
-                    default => "75s",
+                    default => <<"75s">>,
                     desc => ?DESC(rpc_socket_keepalive_interval)
                 }
             )},
@@ -799,7 +799,7 @@ fields("rpc") ->
                 emqx_schema:bytesize(),
                 #{
                     mapping => "gen_rpc.socket_sndbuf",
-                    default => "1MB",
+                    default => <<"1MB">>,
                     desc => ?DESC(rpc_socket_sndbuf)
                 }
             )},
@@ -808,7 +808,7 @@ fields("rpc") ->
                 emqx_schema:bytesize(),
                 #{
                     mapping => "gen_rpc.socket_recbuf",
-                    default => "1MB",
+                    default => <<"1MB">>,
                     desc => ?DESC(rpc_socket_recbuf)
                 }
             )},
@@ -817,7 +817,7 @@ fields("rpc") ->
                 emqx_schema:bytesize(),
                 #{
                     mapping => "gen_rpc.socket_buffer",
-                    default => "1MB",
+                    default => <<"1MB">>,
                     desc => ?DESC(rpc_socket_buffer)
                 }
             )},
@@ -861,7 +861,7 @@ fields("log_file_handler") ->
             sc(
                 hoconsc:union([infinity, emqx_schema:bytesize()]),
                 #{
-                    default => "50MB",
+                    default => <<"50MB">>,
                     desc => ?DESC("log_file_handler_max_size")
                 }
             )}
@@ -899,7 +899,7 @@ fields("log_overload_kill") ->
             sc(
                 emqx_schema:bytesize(),
                 #{
-                    default => "30MB",
+                    default => <<"30MB">>,
                     desc => ?DESC("log_overload_kill_mem_size")
                 }
             )},
@@ -915,7 +915,7 @@ fields("log_overload_kill") ->
             sc(
                 hoconsc:union([emqx_schema:duration_ms(), infinity]),
                 #{
-                    default => "5s",
+                    default => <<"5s">>,
                     desc => ?DESC("log_overload_kill_restart_after")
                 }
             )}
@@ -942,7 +942,7 @@ fields("log_burst_limit") ->
             sc(
                 emqx_schema:duration(),
                 #{
-                    default => "1s",
+                    default => <<"1s">>,
                     desc => ?DESC("log_burst_limit_window_time")
                 }
             )}
@@ -1092,7 +1092,7 @@ log_handler_common_confs(Enable) ->
             sc(
                 string(),
                 #{
-                    default => "system",
+                    default => <<"system">>,
                     desc => ?DESC("common_handler_time_offset"),
                     validator => fun validate_time_offset/1
                 }
@@ -1169,9 +1169,9 @@ crash_dump_file_default() ->
     case os:getenv("RUNNER_LOG_DIR") of
         false ->
             %% testing, or running emqx app as deps
-            "log/erl_crash.dump";
+            <<"log/erl_crash.dump">>;
         Dir ->
-            [filename:join([Dir, "erl_crash.dump"])]
+            unicode:characters_to_binary(filename:join([Dir, "erl_crash.dump"]), utf8)
     end.
 
 %% utils
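
Most changes in this file replace string defaults with binary defaults. As a minimal sketch of the difference (the module name `default_binary_demo` is hypothetical), a double-quoted literal is a list of code points, `<<"...">>` is a binary, and `filename:join/1` on string components returns a string, which is presumably why `crash_dump_file_default/0` now converts explicitly.

```erlang
-module(default_binary_demo).
-export([demo/0]).

demo() ->
    %% A double-quoted literal is a list of integers (code points).
    true = is_list("5s"),
    [$5, $s] = "5s",
    %% The <<"...">> form is a binary.
    true = is_binary(<<"5s">>),
    %% filename:join/1 with string components returns a string ...
    Path = filename:join(["log", "erl_crash.dump"]),
    true = is_list(Path),
    %% ... so it must be converted to obtain a binary default.
    <<"log/erl_crash.dump">> = unicode:characters_to_binary(Path, utf8),
    ok.
```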

+ 6 - 2
apps/emqx_connector/i18n/emqx_connector_mqtt_schema.conf

@@ -114,9 +114,13 @@ topic filters for <code>remote.topic</code> of ingress connections."""
         desc {
             en: """If enable bridge mode.
 NOTE: This setting is only for MQTT protocol version older than 5.0, and the remote MQTT
-broker MUST support this feature."""
+broker MUST support this feature.
+If bridge_mode is set to true, the bridge will indicate to the remote broker that it is a bridge not an ordinary client.
+This means that loop detection will be more effective and that retained messages will be propagated correctly."""
             zh: """是否启用 Bridge Mode。
-注意:此设置只针对 MQTT 协议版本 < 5.0 有效,并且需要远程 MQTT Broker 支持 Bridge Mode。"""
+注意:此设置只针对 MQTT 协议版本 < 5.0 有效,并且需要远程 MQTT Broker 支持 Bridge Mode。
+如果设置为 true ,桥接会告诉远端服务器当前连接是一个桥接而不是一个普通的客户端。
+这意味着消息回环检测会更加高效,并且远端服务器收到的保留消息的标志位会透传给本地。"""
         }
         label {
             en: "Bridge Mode"

+ 1 - 1
apps/emqx_connector/src/emqx_connector_http.erl

@@ -87,7 +87,7 @@ fields(config) ->
             sc(
                 emqx_schema:duration_ms(),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC("connect_timeout")
                 }
             )},

+ 2 - 2
apps/emqx_connector/src/mqtt/emqx_connector_mqtt_schema.erl

@@ -115,12 +115,12 @@ fields("server_configs") ->
                     desc => ?DESC("clean_start")
                 }
             )},
-        {keepalive, mk_duration("MQTT Keepalive.", #{default => "300s"})},
+        {keepalive, mk_duration("MQTT Keepalive.", #{default => <<"300s">>})},
         {retry_interval,
             mk_duration(
                 "Message retry interval. Delay for the MQTT bridge to retry sending the QoS1/QoS2 "
                 "messages in case of ACK not received.",
-                #{default => "15s"}
+                #{default => <<"15s">>}
             )},
         {max_inflight,
             mk(

+ 1 - 1
apps/emqx_ctl/src/emqx_ctl.erl

@@ -149,7 +149,7 @@ help() ->
         [] ->
             print("No commands available.~n");
         Cmds ->
-            print("Usage: ~ts~n", [?MODULE]),
+            print("Usage: ~ts~n", ["emqx ctl"]),
             lists:foreach(
                 fun({_, {Mod, Cmd}, _}) ->
                     print("~110..-s~n", [""]),

+ 19 - 26
apps/emqx_dashboard/src/emqx_dashboard_monitor_api.erl

@@ -55,7 +55,7 @@ schema("/monitor/nodes/:node") ->
             parameters => [parameter_node(), parameter_latest()],
             responses => #{
                 200 => hoconsc:mk(hoconsc:array(hoconsc:ref(sampler)), #{}),
-                400 => emqx_dashboard_swagger:error_codes(['BAD_RPC'], <<"Bad RPC">>)
+                404 => emqx_dashboard_swagger:error_codes(['NOT_FOUND'], <<"Node not found">>)
             }
         }
     };
@@ -79,7 +79,7 @@ schema("/monitor_current/nodes/:node") ->
             parameters => [parameter_node()],
             responses => #{
                 200 => hoconsc:mk(hoconsc:ref(sampler_current), #{}),
-                400 => emqx_dashboard_swagger:error_codes(['BAD_RPC'], <<"Bad RPC">>)
+                404 => emqx_dashboard_swagger:error_codes(['NOT_FOUND'], <<"Node not found">>)
             }
         }
     }.
@@ -122,38 +122,31 @@ fields(sampler_current) ->
 monitor(get, #{query_string := QS, bindings := Bindings}) ->
     Latest = maps:get(<<"latest">>, QS, infinity),
     RawNode = maps:get(node, Bindings, all),
-    case emqx_misc:safe_to_existing_atom(RawNode, utf8) of
-        {ok, Node} ->
-            case emqx_dashboard_monitor:samplers(Node, Latest) of
-                {badrpc, {Node, Reason}} ->
-                    Message = list_to_binary(
-                        io_lib:format("Bad node ~p, rpc failed ~p", [Node, Reason])
-                    ),
-                    {400, 'BAD_RPC', Message};
-                Samplers ->
-                    {200, Samplers}
-            end;
-        _ ->
-            Message = list_to_binary(io_lib:format("Bad node ~p", [RawNode])),
-            {400, 'BAD_RPC', Message}
+    with_node(RawNode, dashboard_samplers_fun(Latest)).
+
+dashboard_samplers_fun(Latest) ->
+    fun(NodeOrCluster) ->
+        case emqx_dashboard_monitor:samplers(NodeOrCluster, Latest) of
+            {badrpc, _} = Error -> Error;
+            Samplers -> {ok, Samplers}
+        end
     end.
 
 monitor_current(get, #{bindings := Bindings}) ->
     RawNode = maps:get(node, Bindings, all),
+    with_node(RawNode, fun emqx_dashboard_monitor:current_rate/1).
+
+with_node(RawNode, Fun) ->
     case emqx_misc:safe_to_existing_atom(RawNode, utf8) of
         {ok, NodeOrCluster} ->
-            case emqx_dashboard_monitor:current_rate(NodeOrCluster) of
-                {ok, CurrentRate} ->
-                    {200, CurrentRate};
+            case Fun(NodeOrCluster) of
                 {badrpc, {Node, Reason}} ->
-                    Message = list_to_binary(
-                        io_lib:format("Bad node ~p, rpc failed ~p", [Node, Reason])
-                    ),
-                    {400, 'BAD_RPC', Message}
+                    {404, 'NOT_FOUND', io_lib:format("Node not found: ~p (~p)", [Node, Reason])};
+                {ok, Result} ->
+                    {200, Result}
             end;
-        {error, _} ->
-            Message = list_to_binary(io_lib:format("Bad node ~p", [RawNode])),
-            {400, 'BAD_RPC', Message}
+        _Error ->
+            {404, 'NOT_FOUND', io_lib:format("Node not found: ~p", [RawNode])}
     end.
 
 %% -------------------------------------------------------------------------------------------------
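
The `with_node/2` refactoring above funnels both handlers through one function that maps an unknown node or a failed RPC to a 404 response. The following is a minimal sketch of that control flow, not the module's actual code. The module name `with_node_demo` is hypothetical, `safe_to_existing_atom/1` is a stand-in written for this example (assumed to behave roughly like `emqx_misc:safe_to_existing_atom/2`), and the message is converted to a binary here, which the diff itself does not do.

```erlang
-module(with_node_demo).
-export([demo/0]).

%% Stand-in for emqx_misc:safe_to_existing_atom/2 (an assumption).
safe_to_existing_atom(Bin) when is_binary(Bin) ->
    try
        {ok, binary_to_existing_atom(Bin, utf8)}
    catch
        error:badarg -> {error, non_existing_atom}
    end;
safe_to_existing_atom(Atom) when is_atom(Atom) ->
    {ok, Atom}.

%% Same shape as with_node/2 in the diff.
with_node(RawNode, Fun) ->
    case safe_to_existing_atom(RawNode) of
        {ok, NodeOrCluster} ->
            case Fun(NodeOrCluster) of
                {badrpc, {Node, Reason}} ->
                    {404, 'NOT_FOUND', msg("Node not found: ~p (~p)", [Node, Reason])};
                {ok, Result} ->
                    {200, Result}
            end;
        _Error ->
            {404, 'NOT_FOUND', msg("Node not found: ~p", [RawNode])}
    end.

%% Illustration-only helper: flatten the formatted iolist into a binary.
msg(Fmt, Args) -> iolist_to_binary(io_lib:format(Fmt, Args)).

demo() ->
    %% Known atom and successful callback: 200 with the callback's result.
    {200, [sample]} = with_node(all, fun(all) -> {ok, [sample]} end),
    %% Binary that names no existing atom: 404 without calling the callback.
    {404, 'NOT_FOUND', M1} =
        with_node(<<"no_such_atom_xyz_12345">>, fun(_) -> {ok, []} end),
    true = is_binary(M1),
    %% Callback reports a failed RPC: also 404.
    {404, 'NOT_FOUND', M2} =
        with_node('n@h', fun(N) -> {badrpc, {N, nodedown}} end),
    true = is_binary(M2),
    ok.
```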

+ 5 - 5
apps/emqx_dashboard/src/emqx_dashboard_schema.erl

@@ -40,7 +40,7 @@ fields("dashboard") ->
             ?HOCON(
                 emqx_schema:duration_s(),
                 #{
-                    default => "10s",
+                    default => <<"10s">>,
                     desc => ?DESC(sample_interval),
                     validator => fun validate_sample_interval/1
                 }
@@ -49,7 +49,7 @@ fields("dashboard") ->
             ?HOCON(
                 emqx_schema:duration(),
                 #{
-                    default => "60m",
+                    default => <<"60m">>,
                     desc => ?DESC(token_expired_time)
                 }
             )},
@@ -141,7 +141,7 @@ common_listener_fields() ->
             ?HOCON(
                 emqx_schema:duration(),
                 #{
-                    default => "10s",
+                    default => <<"10s">>,
                     desc => ?DESC(send_timeout)
                 }
             )},
@@ -206,14 +206,14 @@ desc(_) ->
     undefined.
 
 default_username(type) -> binary();
-default_username(default) -> "admin";
+default_username(default) -> <<"admin">>;
 default_username(required) -> true;
 default_username(desc) -> ?DESC(default_username);
 default_username('readOnly') -> true;
 default_username(_) -> undefined.
 
 default_password(type) -> binary();
-default_password(default) -> "public";
+default_password(default) -> <<"public">>;
 default_password(required) -> true;
 default_password('readOnly') -> true;
 default_password(sensitive) -> true;

+ 5 - 1
apps/emqx_dashboard/src/emqx_dashboard_swagger.erl

@@ -417,13 +417,17 @@ init_prop(Keys, Init, Type) ->
         fun(Key, Acc) ->
             case hocon_schema:field_schema(Type, Key) of
                 undefined -> Acc;
-                Schema -> Acc#{Key => to_bin(Schema)}
+                Schema -> Acc#{Key => format_prop(Key, Schema)}
             end
         end,
         Init,
         Keys
     ).
 
+format_prop(deprecated, Value) when is_boolean(Value) -> Value;
+format_prop(deprecated, _) -> true;
+format_prop(_, Schema) -> to_bin(Schema).
+
 trans_required(Spec, true, _) -> Spec#{required => true};
 trans_required(Spec, _, path) -> Spec#{required => true};
 trans_required(Spec, _, _) -> Spec.
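
The new `format_prop/2` clauses normalize the `deprecated` field to a boolean. A minimal sketch of just that rule, assuming a hypothetical module `deprecated_prop_demo` and leaving out the `to_bin/1` fallback used for all other keys:

```erlang
-module(deprecated_prop_demo).
-export([demo/0]).

%% Mirrors the two `deprecated` clauses of format_prop/2:
%% booleans are kept, any other value (such as {since, Vsn}) becomes true.
normalize_deprecated(Value) when is_boolean(Value) -> Value;
normalize_deprecated(_) -> true.

demo() ->
    true = normalize_deprecated({since, "4.3.0"}),
    true = normalize_deprecated(true),
    false = normalize_deprecated(false),
    ok.
```

These three inputs correspond to the `tag1`, `tag2` and `tag3` fields checked by `t_deprecated/1` in `emqx_swagger_requestBody_SUITE`.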

+ 2 - 4
apps/emqx_dashboard/test/emqx_dashboard_monitor_SUITE.erl

@@ -22,8 +22,6 @@
 -import(emqx_dashboard_SUITE, [auth_header_/0]).
 
 -include_lib("eunit/include/eunit.hrl").
--include_lib("common_test/include/ct.hrl").
--include_lib("emqx/include/emqx.hrl").
 -include("emqx_dashboard.hrl").
 
 -define(SERVER, "http://127.0.0.1:18083").
@@ -114,9 +112,9 @@ t_monitor_reset(_) ->
     ok.
 
 t_monitor_api_error(_) ->
-    {error, {400, #{<<"code">> := <<"BAD_RPC">>}}} =
+    {error, {404, #{<<"code">> := <<"NOT_FOUND">>}}} =
         request(["monitor", "nodes", 'emqx@127.0.0.2']),
-    {error, {400, #{<<"code">> := <<"BAD_RPC">>}}} =
+    {error, {404, #{<<"code">> := <<"NOT_FOUND">>}}} =
         request(["monitor_current", "nodes", 'emqx@127.0.0.2']),
     {error, {400, #{<<"code">> := <<"BAD_REQUEST">>}}} =
         request(["monitor"], "latest=0"),

+ 3 - 3
apps/emqx_dashboard/test/emqx_swagger_remote_schema.erl

@@ -32,8 +32,8 @@ fields("root") ->
             )},
         {default_username, fun default_username/1},
         {default_password, fun default_password/1},
-        {sample_interval, mk(emqx_schema:duration_s(), #{default => "10s"})},
-        {token_expired_time, mk(emqx_schema:duration(), #{default => "30m"})}
+        {sample_interval, mk(emqx_schema:duration_s(), #{default => <<"10s">>})},
+        {token_expired_time, mk(emqx_schema:duration(), #{default => <<"30m">>})}
     ];
 fields("ref1") ->
     [
@@ -52,7 +52,7 @@ fields("ref3") ->
     ].
 
 default_username(type) -> string();
-default_username(default) -> "admin";
+default_username(default) -> <<"admin">>;
 default_username(required) -> true;
 default_username(_) -> undefined.
 

+ 32 - 2
apps/emqx_dashboard/test/emqx_swagger_requestBody_SUITE.erl

@@ -94,6 +94,30 @@ t_object(_Config) ->
     validate("/object", Spec, Refs),
     ok.
 
+t_deprecated(_Config) ->
+    ?assertMatch(
+        [
+            #{
+                <<"emqx_swagger_requestBody_SUITE.deprecated_ref">> :=
+                    #{
+                        <<"properties">> :=
+                            [
+                                {<<"tag1">>, #{
+                                    deprecated := true
+                                }},
+                                {<<"tag2">>, #{
+                                    deprecated := true
+                                }},
+                                {<<"tag3">>, #{
+                                    deprecated := false
+                                }}
+                            ]
+                    }
+            }
+        ],
+        emqx_dashboard_swagger:components([{?MODULE, deprecated_ref}], #{})
+    ).
+
 t_nest_object(_Config) ->
     GoodRef = <<"#/components/schemas/emqx_swagger_requestBody_SUITE.good_ref">>,
     Spec = #{
@@ -790,7 +814,7 @@ to_schema(Body) ->
 
 fields(good_ref) ->
     [
-        {'webhook-host', mk(emqx_schema:ip_port(), #{default => "127.0.0.1:80"})},
+        {'webhook-host', mk(emqx_schema:ip_port(), #{default => <<"127.0.0.1:80">>})},
         {log_dir, mk(emqx_schema:file(), #{example => "var/log/emqx"})},
         {tag, mk(binary(), #{desc => <<"tag">>})}
     ];
@@ -812,7 +836,13 @@ fields(sub_fields) ->
             {init_file, fun init_file/1}
         ],
         desc => <<"test sub fields">>
-    }.
+    };
+fields(deprecated_ref) ->
+    [
+        {tag1, mk(binary(), #{desc => <<"tag1">>, deprecated => {since, "4.3.0"}})},
+        {tag2, mk(binary(), #{desc => <<"tag2">>, deprecated => true})},
+        {tag3, mk(binary(), #{desc => <<"tag3">>, deprecated => false})}
+    ].
 
 enable(type) -> boolean();
 enable(desc) -> <<"Whether to enable tls psk support">>;

+ 1 - 1
apps/emqx_dashboard/test/emqx_swagger_response_SUITE.erl

@@ -689,7 +689,7 @@ to_schema(Object) ->
 
 fields(good_ref) ->
     [
-        {'webhook-host', mk(emqx_schema:ip_port(), #{default => "127.0.0.1:80"})},
+        {'webhook-host', mk(emqx_schema:ip_port(), #{default => <<"127.0.0.1:80">>})},
         {log_dir, mk(emqx_schema:file(), #{example => "var/log/emqx"})},
         {tag, mk(binary(), #{desc => <<"tag">>})}
     ];

+ 1 - 1
apps/emqx_exhook/src/emqx_exhook.app.src

@@ -1,7 +1,7 @@
 %% -*- mode: erlang -*-
 {application, emqx_exhook, [
     {description, "EMQX Extension for Hook"},
-    {vsn, "5.0.9"},
+    {vsn, "5.0.10"},
     {modules, []},
     {registered, []},
     {mod, {emqx_exhook_app, []}},

+ 2 - 2
apps/emqx_exhook/src/emqx_exhook_api.erl

@@ -229,9 +229,9 @@ server_conf_schema() ->
             name => "default",
             enable => true,
             url => <<"http://127.0.0.1:8081">>,
-            request_timeout => "5s",
+            request_timeout => <<"5s">>,
             failed_action => deny,
-            auto_reconnect => "60s",
+            auto_reconnect => <<"60s">>,
             pool_size => 8,
             ssl => SSL
         }

+ 2 - 2
apps/emqx_exhook/src/emqx_exhook_schema.erl

@@ -63,7 +63,7 @@ fields(server) ->
             })},
         {request_timeout,
             ?HOCON(emqx_schema:duration(), #{
-                default => "5s",
+                default => <<"5s">>,
                 desc => ?DESC(request_timeout)
             })},
         {failed_action, failed_action()},
@@ -74,7 +74,7 @@ fields(server) ->
             })},
         {auto_reconnect,
             ?HOCON(hoconsc:union([false, emqx_schema:duration()]), #{
-                default => "60s",
+                default => <<"60s">>,
                 desc => ?DESC(auto_reconnect)
             })},
         {pool_size,

+ 6 - 2
apps/emqx_gateway/src/emqx_gateway_api_clients.erl

@@ -19,7 +19,6 @@
 -include("emqx_gateway_http.hrl").
 -include_lib("typerefl/include/types.hrl").
 -include_lib("hocon/include/hoconsc.hrl").
--include_lib("emqx/include/emqx_placeholder.hrl").
 -include_lib("emqx/include/logger.hrl").
 
 -behaviour(minirest_api).
@@ -464,7 +463,12 @@ schema("/gateways/:name/clients") ->
                 summary => <<"List Gateway's Clients">>,
                 parameters => params_client_query(),
                 responses =>
-                    ?STANDARD_RESP(#{200 => schema_client_list()})
+                    ?STANDARD_RESP(#{
+                        200 => [
+                            {data, schema_client_list()},
+                            {meta, mk(hoconsc:ref(emqx_dashboard_swagger, meta), #{})}
+                        ]
+                    })
             }
     };
 schema("/gateways/:name/clients/:clientid") ->

+ 5 - 5
apps/emqx_gateway/src/emqx_gateway_schema.erl

@@ -267,7 +267,7 @@ fields(lwm2m) ->
             sc(
                 duration(),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC(lwm2m_lifetime_min)
                 }
             )},
@@ -275,7 +275,7 @@ fields(lwm2m) ->
             sc(
                 duration(),
                 #{
-                    default => "86400s",
+                    default => <<"86400s">>,
                     desc => ?DESC(lwm2m_lifetime_max)
                 }
             )},
@@ -283,7 +283,7 @@ fields(lwm2m) ->
             sc(
                 duration_s(),
                 #{
-                    default => "22s",
+                    default => <<"22s">>,
                     desc => ?DESC(lwm2m_qmode_time_window)
                 }
             )},
@@ -624,7 +624,7 @@ mountpoint(Default) ->
     sc(
         binary(),
         #{
-            default => Default,
+            default => iolist_to_binary(Default),
             desc => ?DESC(gateway_common_mountpoint)
         }
     ).
@@ -707,7 +707,7 @@ proxy_protocol_opts() ->
             sc(
                 duration(),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     desc => ?DESC(tcp_listener_proxy_protocol_timeout)
                 }
             )}

+ 214 - 0
apps/emqx_machine/src/emqx_cover.erl

@@ -0,0 +1,214 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2023 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+
+%% @doc This module is NOT used in production.
+%% It is used to collect coverage data when running blackbox tests.
+-module(emqx_cover).
+
+-include_lib("covertool/include/covertool.hrl").
+
+-ifdef(EMQX_ENTERPRISE).
+-define(OUTPUT_APPNAME, 'EMQX Enterprise').
+-else.
+-define(OUTPUT_APPNAME, 'EMQX').
+-endif.
+
+-export([
+    start/0,
+    start/1,
+    abort/0,
+    export_and_stop/1,
+    lookup_source/1
+]).
+
+%% This is an ETS table that keeps a mapping from module name (atom) to
+%% .erl file path (relative to the project root).
+%% We need this ETS table because the source file information
+%% is missing from the .beam metadata, since we are using the 'deterministic'
+%% compile flag.
+-define(SRC, emqx_cover_module_src).
+
+%% @doc Start cover.
+%% All emqx_ modules will be cover-compiled, which may cause
+%% excessive RAM consumption and result in warning logs.
+start() ->
+    start(#{}).
+
+%% @doc Start cover.
+%% All emqx_ modules will be cover-compiled, which may cause
+%% excessive RAM consumption and result in warning logs.
+%% Supported options:
+%% - project_root: the directory to search for .erl source code
+%% - debug_secret_file: only applicable to EMQX Enterprise
+start(Opts) ->
+    ok = abort(),
+    DefaultDir = os_env("EMQX_PROJECT_ROOT"),
+    ProjRoot = maps:get(project_root, Opts, DefaultDir),
+    case ProjRoot =:= "" of
+        true ->
+            io:format("Project source code root dir is not provided.~n"),
+            io:format(
+                "You may either start EMQX node with environment variable EMQX_PROJECT_ROOT set~n"
+            ),
+            io:format("Or provide #{project_root => \"/path/to/emqx/\"} as emqx_cover:start arg~n"),
+            exit(project_root_is_not_set);
+        false ->
+            ok
+    end,
+    %% spawn an ETS table owner
+    %% this implementation is kept dead-simple
+    %% because there is no concurrency requirement
+    Parent = self(),
+    {Pid, Ref} =
+        erlang:spawn_monitor(
+            fun() ->
+                true = register(?SRC, self()),
+                _ = ets:new(?SRC, [named_table, public]),
+                _ = Parent ! {started, self()},
+                receive
+                    stop ->
+                        ok
+                end
+            end
+        ),
+    receive
+        {started, Pid} ->
+            ok;
+        {'DOWN', Ref, process, Pid, Reason} ->
+            throw({failed_to_start, Reason})
+    after 1000 ->
+        throw({failed_to_start, timeout})
+    end,
+    Modules = modules(Opts),
+    case cover:start() of
+        {ok, _Pid} ->
+            ok;
+        {error, {already_started, _Pid}} ->
+            ok;
+        Other ->
+            throw(Other)
+    end,
+    ok = cover_compile(Modules),
+    io:format("cover-compiled ~p modules~n", [length(Modules)]),
+    ok = put_project_root(ProjRoot),
+    ok = do_build_source_mapping(ProjRoot, Modules),
+    CachedModulesCount = ets:info(?SRC, size),
+    io:format("source-cached ~p modules~n", [CachedModulesCount]),
+    ok.
+
+%% @doc Abort cover data collection without exporting.
+abort() ->
+    _ = cover:stop(),
+    case whereis(?SRC) of
+        undefined ->
+            ok;
+        Pid ->
+            Ref = monitor(process, Pid),
+            exit(Pid, kill),
+            receive
+                {'DOWN', Ref, process, Pid, _} ->
+                    ok
+            end
+    end,
+    ok.
+
+%% @doc Export the coverage report in XML format.
+%% e.g. `emqx_cover:export_and_stop("/tmp/cover.xml").'
+export_and_stop(Path) when is_list(Path) ->
+    ProjectRoot = get_project_root(),
+    Config = #config{
+        appname = ?OUTPUT_APPNAME,
+        sources = [ProjectRoot],
+        output = Path,
+        lookup_source = fun ?MODULE:lookup_source/1
+    },
+    covertool:generate_report(Config, cover:modules()).
+
+get_project_root() ->
+    [{_, Dir}] = ets:lookup(?SRC, {root, ?OUTPUT_APPNAME}),
+    Dir.
+
+put_project_root(Dir) ->
+    _ = ets:insert(?SRC, {{root, ?OUTPUT_APPNAME}, Dir}),
+    ok.
+
+do_build_source_mapping(Dir, Modules0) ->
+    Modules = sets:from_list(Modules0, [{version, 2}]),
+    All = filelib:wildcard("**/*.erl", Dir),
+    lists:foreach(
+        fun(Path) ->
+            ModuleNameStr = filename:basename(Path, ".erl"),
+            Module = list_to_atom(ModuleNameStr),
+            case sets:is_element(Module, Modules) of
+                true ->
+                    ets:insert(?SRC, {Module, Path});
+                false ->
+                    ok
+            end
+        end,
+        All
+    ),
+    ok.
+
+lookup_source(Module) ->
+    case ets:lookup(?SRC, Module) of
+        [{_, Path}] ->
+            Path;
+        [] ->
+            false
+    end.
+
+modules(_Opts) ->
+    %% TODO better filter based on Opts,
+    %% e.g. we may want to see coverage info for ehttpc
+    Filter = fun is_emqx_module/1,
+    find_modules(Filter).
+
+cover_compile(Modules) ->
+    Results = cover:compile_beam(Modules),
+    Errors = lists:filter(
+        fun
+            ({ok, _}) -> false;
+            (_) -> true
+        end,
+        Results
+    ),
+    case Errors of
+        [] ->
+            ok;
+        _ ->
+            io:format("failed_to_cover_compile:~n~p~n", [Errors]),
+            throw(failed_to_cover_compile)
+    end.
+
+find_modules(Filter) ->
+    All = code:all_loaded(),
+    F = fun({M, _BeamPath}) -> Filter(M) andalso {true, M} end,
+    lists:filtermap(F, All).
+
+is_emqx_module(?MODULE) ->
+    %% do not cover-compile self
+    false;
+is_emqx_module(Module) ->
+    case erlang:atom_to_binary(Module, utf8) of
+        <<"emqx", _/binary>> ->
+            true;
+        _ ->
+            false
+    end.
+
+os_env(Name) ->
+    os:getenv(Name, "").

+ 1 - 1
apps/emqx_machine/src/emqx_machine.app.src

@@ -3,7 +3,7 @@
     {id, "emqx_machine"},
     {description, "The EMQX Machine"},
     % strict semver, bump manually!
-    {vsn, "0.1.4"},
+    {vsn, "0.2.0"},
     {modules, []},
     {registered, []},
     {applications, [kernel, stdlib, emqx_ctl]},

+ 1 - 1
apps/emqx_management/include/emqx_mgmt.hrl

@@ -16,4 +16,4 @@
 
 -define(MANAGEMENT_SHARD, emqx_management_shard).
 
--define(MAX_ROW_LIMIT, 100).
+-define(DEFAULT_ROW_LIMIT, 100).

+ 41 - 57
apps/emqx_management/src/emqx_mgmt.erl

@@ -21,8 +21,6 @@
 -elvis([{elvis_style, god_modules, disable}]).
 
 -include_lib("stdlib/include/qlc.hrl").
--include_lib("emqx/include/emqx.hrl").
--include_lib("emqx/include/emqx_mqtt.hrl").
 
 %% Nodes and Brokers API
 -export([
@@ -71,8 +69,6 @@
     list_subscriptions/1,
     list_subscriptions_via_topic/2,
     list_subscriptions_via_topic/3,
-    lookup_subscriptions/1,
-    lookup_subscriptions/2,
 
     do_list_subscriptions/0
 ]).
@@ -105,12 +101,10 @@
 
 %% Common Table API
 -export([
-    max_row_limit/0,
+    default_row_limit/0,
     vm_stats/0
 ]).
 
--define(APP, emqx_management).
-
 -elvis([{elvis_style, god_modules, disable}]).
 
 %%--------------------------------------------------------------------
@@ -162,7 +156,7 @@ node_info(Nodes) ->
     emqx_rpc:unwrap_erpc(emqx_management_proto_v3:node_info(Nodes)).
 
 stopped_node_info(Node) ->
-    #{name => Node, node_status => 'stopped'}.
+    {Node, #{node => Node, node_status => 'stopped'}}.
 
 vm_stats() ->
     Idle =
@@ -194,8 +188,13 @@ lookup_broker(Node) ->
     Broker.
 
 broker_info() ->
-    Info = maps:from_list([{K, iolist_to_binary(V)} || {K, V} <- emqx_sys:info()]),
-    Info#{node => node(), otp_release => otp_rel(), node_status => 'Running'}.
+    Info = lists:foldl(fun convert_broker_info/2, #{}, emqx_sys:info()),
+    Info#{node => node(), otp_release => otp_rel(), node_status => 'running'}.
+
+convert_broker_info({uptime, Uptime}, M) ->
+    M#{uptime => emqx_datetime:human_readable_duration_string(Uptime)};
+convert_broker_info({K, V}, M) ->
+    M#{K => iolist_to_binary(V)}.
 
 broker_info(Nodes) ->
     emqx_rpc:unwrap_erpc(emqx_management_proto_v3:broker_info(Nodes)).
@@ -265,7 +264,7 @@ lookup_client({username, Username}, FormatFun) ->
      || Node <- mria_mnesia:running_nodes()
     ]).
 
-lookup_client(Node, Key, {M, F}) ->
+lookup_client(Node, Key, FormatFun) ->
     case unwrap_rpc(emqx_cm_proto_v1:lookup_client(Node, Key)) of
         {error, Err} ->
             {error, Err};
@@ -273,18 +272,23 @@ lookup_client(Node, Key, {M, F}) ->
             lists:map(
                 fun({Chan, Info0, Stats}) ->
                     Info = Info0#{node => Node},
-                    M:F({Chan, Info, Stats})
+                    maybe_format(FormatFun, {Chan, Info, Stats})
                 end,
                 L
             )
     end.
 
-kickout_client({ClientID, FormatFun}) ->
-    case lookup_client({clientid, ClientID}, FormatFun) of
+maybe_format(undefined, A) ->
+    A;
+maybe_format({M, F}, A) ->
+    M:F(A).
+
+kickout_client(ClientId) ->
+    case lookup_client({clientid, ClientId}, undefined) of
         [] ->
             {error, not_found};
         _ ->
-            Results = [kickout_client(Node, ClientID) || Node <- mria_mnesia:running_nodes()],
+            Results = [kickout_client(Node, ClientId) || Node <- mria_mnesia:running_nodes()],
             check_results(Results)
     end.
 
@@ -295,17 +299,22 @@ list_authz_cache(ClientId) ->
     call_client(ClientId, list_authz_cache).
 
 list_client_subscriptions(ClientId) ->
-    Results = [client_subscriptions(Node, ClientId) || Node <- mria_mnesia:running_nodes()],
-    Filter =
-        fun
-            ({error, _}) ->
-                false;
-            ({_Node, List}) ->
-                erlang:is_list(List) andalso 0 < erlang:length(List)
-        end,
-    case lists:filter(Filter, Results) of
-        [] -> [];
-        [Result | _] -> Result
+    case lookup_client({clientid, ClientId}, undefined) of
+        [] ->
+            {error, not_found};
+        _ ->
+            Results = [client_subscriptions(Node, ClientId) || Node <- mria_mnesia:running_nodes()],
+            Filter =
+                fun
+                    ({error, _}) ->
+                        false;
+                    ({_Node, List}) ->
+                        erlang:is_list(List) andalso 0 < erlang:length(List)
+                end,
+            case lists:filter(Filter, Results) of
+                [] -> [];
+                [Result | _] -> Result
+            end
     end.
 
 client_subscriptions(Node, ClientId) ->
@@ -388,17 +397,11 @@ call_client(Node, ClientId, Req) ->
 %% Subscriptions
 %%--------------------------------------------------------------------
 
--spec do_list_subscriptions() -> [map()].
+-spec do_list_subscriptions() -> no_return().
 do_list_subscriptions() ->
-    case check_row_limit([mqtt_subproperty]) of
-        false ->
-            throw(max_row_limit);
-        ok ->
-            [
-                #{topic => Topic, clientid => ClientId, options => Options}
-             || {{Topic, ClientId}, Options} <- ets:tab2list(mqtt_subproperty)
-            ]
-    end.
+    %% [FIXME] Add function to `emqx_broker` that returns list of subscriptions
+    %% and either redirect from here or bpapi directly (EMQX-8993).
+    throw(not_implemented).
 
 list_subscriptions(Node) ->
     unwrap_rpc(emqx_management_proto_v3:list_subscriptions(Node)).
@@ -415,12 +418,6 @@ list_subscriptions_via_topic(Node, Topic, _FormatFun = {M, F}) ->
         Result -> M:F(Result)
     end.
 
-lookup_subscriptions(ClientId) ->
-    lists:append([lookup_subscriptions(Node, ClientId) || Node <- mria_mnesia:running_nodes()]).
-
-lookup_subscriptions(Node, ClientId) ->
-    unwrap_rpc(emqx_broker_proto_v1:list_client_subscriptions(Node, ClientId)).
-
 %%--------------------------------------------------------------------
 %% PubSub
 %%--------------------------------------------------------------------
@@ -556,24 +553,11 @@ unwrap_rpc(Res) ->
 otp_rel() ->
     iolist_to_binary([emqx_vm:get_otp_version(), "/", erlang:system_info(version)]).
 
-check_row_limit(Tables) ->
-    check_row_limit(Tables, max_row_limit()).
-
-check_row_limit([], _Limit) ->
-    ok;
-check_row_limit([Tab | Tables], Limit) ->
-    case table_size(Tab) > Limit of
-        true -> false;
-        false -> check_row_limit(Tables, Limit)
-    end.
-
 check_results(Results) ->
     case lists:any(fun(Item) -> Item =:= ok end, Results) of
         true -> ok;
         false -> unwrap_rpc(lists:last(Results))
     end.
 
-max_row_limit() ->
-    ?MAX_ROW_LIMIT.
-
-table_size(Tab) -> ets:info(Tab, size).
+default_row_limit() ->
+    ?DEFAULT_ROW_LIMIT.

+ 3 - 3
apps/emqx_management/src/emqx_mgmt_api.erl

@@ -98,8 +98,8 @@ count(Table) ->
 page(Params) ->
     maps:get(<<"page">>, Params, 1).
 
-limit(Params) ->
-    maps:get(<<"limit">>, Params, emqx_mgmt:max_row_limit()).
+limit(Params) when is_map(Params) ->
+    maps:get(<<"limit">>, Params, emqx_mgmt:default_row_limit()).
 
 %%--------------------------------------------------------------------
 %% Node Query
@@ -683,7 +683,7 @@ paginate_test_() ->
     Size = 1000,
     MyLimit = 10,
     ets:insert(?MODULE, [{I, foo} || I <- lists:seq(1, Size)]),
-    DefaultLimit = emqx_mgmt:max_row_limit(),
+    DefaultLimit = emqx_mgmt:default_row_limit(),
     NoParamsResult = paginate(?MODULE, #{}, {?MODULE, paginate_test_format}),
     PaginateResults = [
         paginate(

+ 6 - 5
apps/emqx_management/src/emqx_mgmt_api_clients.erl

@@ -274,11 +274,10 @@ schema("/clients/:clientid/subscriptions") ->
             responses => #{
                 200 => hoconsc:mk(
                     hoconsc:array(hoconsc:ref(emqx_mgmt_api_subscriptions, subscription)), #{}
+                ),
+                404 => emqx_dashboard_swagger:error_codes(
+                    ['CLIENTID_NOT_FOUND'], <<"Client ID not found">>
                 )
-                %% returns [] if client not existed in cluster
-                %404 => emqx_dashboard_swagger:error_codes(
-                %    ['CLIENTID_NOT_FOUND'], <<"Client ID not found">>
-                %)
             }
         }
     };
@@ -599,6 +598,8 @@ unsubscribe_batch(post, #{bindings := #{clientid := ClientID}, body := TopicInfo
 
 subscriptions(get, #{bindings := #{clientid := ClientID}}) ->
     case emqx_mgmt:list_client_subscriptions(ClientID) of
+        {error, not_found} ->
+            {404, ?CLIENTID_NOT_FOUND};
         [] ->
             {200, []};
         {Node, Subs} ->
@@ -677,7 +678,7 @@ lookup(#{clientid := ClientID}) ->
     end.
 
 kickout(#{clientid := ClientID}) ->
-    case emqx_mgmt:kickout_client({ClientID, ?FORMAT_FUN}) of
+    case emqx_mgmt:kickout_client(ClientID) of
         {error, not_found} ->
             {404, ?CLIENTID_NOT_FOUND};
         _ ->

+ 33 - 11
apps/emqx_management/src/emqx_mgmt_api_trace.erl

@@ -47,9 +47,11 @@
     get_trace_size/0
 ]).
 
+-define(MAX_SINT32, 2147483647).
+
 -define(TO_BIN(_B_), iolist_to_binary(_B_)).
 -define(NOT_FOUND(N), {404, #{code => 'NOT_FOUND', message => ?TO_BIN([N, " NOT FOUND"])}}).
--define(BAD_REQUEST(C, M), {400, #{code => C, message => ?TO_BIN(M)}}).
+-define(SERVICE_UNAVAILABLE(C, M), {503, #{code => C, message => ?TO_BIN(M)}}).
 -define(TAGS, [<<"Trace">>]).
 
 namespace() -> "trace".
@@ -148,8 +150,9 @@ schema("/trace/:name/download") ->
                                 #{schema => #{type => "string", format => "binary"}}
                         }
                     },
-                400 => emqx_dashboard_swagger:error_codes(['NODE_ERROR'], <<"Node Not Found">>),
-                404 => emqx_dashboard_swagger:error_codes(['NOT_FOUND'], <<"Trace Name Not Found">>)
+                404 => emqx_dashboard_swagger:error_codes(
+                    ['NOT_FOUND', 'NODE_ERROR'], <<"Trace Name or Node Not Found">>
+                )
             }
         }
     };
@@ -184,8 +187,15 @@ schema("/trace/:name/log") ->
                         {items, hoconsc:mk(binary(), #{example => "TEXT-LOG-ITEMS"})},
                         {meta, fields(bytes) ++ fields(position)}
                     ],
-                400 => emqx_dashboard_swagger:error_codes(['NODE_ERROR'], <<"Trace Log Failed">>),
-                404 => emqx_dashboard_swagger:error_codes(['NOT_FOUND'], <<"Trace Name Not Found">>)
+                400 => emqx_dashboard_swagger:error_codes(
+                    ['BAD_REQUEST'], <<"Bad input parameter">>
+                ),
+                404 => emqx_dashboard_swagger:error_codes(
+                    ['NOT_FOUND', 'NODE_ERROR'], <<"Trace Name or Node Not Found">>
+                ),
+                503 => emqx_dashboard_swagger:error_codes(
+                    ['SERVICE_UNAVAILABLE'], <<"Requested chunk size too big">>
+                )
             }
         }
     }.
@@ -313,12 +323,16 @@ fields(bytes) ->
     [
         {bytes,
             hoconsc:mk(
-                integer(),
+                %% This seems to be the minimum max value we may encounter
+                %% across different OS
+                range(0, ?MAX_SINT32),
                 #{
-                    desc => "Maximum number of bytes to store in request",
+                    desc => "Maximum number of bytes to send in response",
                     in => query,
                     required => false,
-                    default => 1000
+                    default => 1000,
+                    minimum => 0,
+                    maximum => ?MAX_SINT32
                 }
             )}
     ];
@@ -495,7 +509,7 @@ download_trace_log(get, #{bindings := #{name := Name}, query_string := Query}) -
                     },
                     {200, Headers, {file_binary, ZipName, Binary}};
                 {error, not_found} ->
-                    ?BAD_REQUEST('NODE_ERROR', <<"Node not found">>)
+                    ?NOT_FOUND(<<"Node">>)
             end;
         {error, not_found} ->
             ?NOT_FOUND(Name)
@@ -579,11 +593,19 @@ stream_log_file(get, #{bindings := #{name := Name}, query_string := Query}) ->
                     {200, #{meta => Meta, items => <<"">>}};
                 {error, not_found} ->
                     ?NOT_FOUND(Name);
+                {error, enomem} ->
+                    ?SLOG(warning, #{
+                        code => not_enough_mem,
+                        msg => "Requested chunk size too big",
+                        bytes => Bytes,
+                        name => Name
+                    }),
+                    ?SERVICE_UNAVAILABLE('SERVICE_UNAVAILABLE', <<"Requested chunk size too big">>);
                 {badrpc, nodedown} ->
-                    ?BAD_REQUEST('NODE_ERROR', <<"Node not found">>)
+                    ?NOT_FOUND(<<"Node">>)
             end;
         {error, not_found} ->
-            ?BAD_REQUEST('NODE_ERROR', <<"Node not found">>)
+            ?NOT_FOUND(<<"Node">>)
     end.
 
 -spec get_trace_size() -> #{{node(), file:name_all()} => non_neg_integer()}.

+ 1 - 1
apps/emqx_management/src/emqx_mgmt_util.erl

@@ -302,7 +302,7 @@ page_params() ->
             name => limit,
             in => query,
             description => <<"Page size">>,
-            schema => #{type => integer, default => emqx_mgmt:max_row_limit()}
+            schema => #{type => integer, default => emqx_mgmt:default_row_limit()}
         }
     ].
 

+ 387 - 0
apps/emqx_management/test/emqx_mgmt_SUITE.erl

@@ -0,0 +1,387 @@
+%%--------------------------------------------------------------------
+%% Copyright (c) 2022-2023 EMQ Technologies Co., Ltd. All Rights Reserved.
+%%
+%% Licensed under the Apache License, Version 2.0 (the "License");
+%% you may not use this file except in compliance with the License.
+%% You may obtain a copy of the License at
+%%
+%%     http://www.apache.org/licenses/LICENSE-2.0
+%%
+%% Unless required by applicable law or agreed to in writing, software
+%% distributed under the License is distributed on an "AS IS" BASIS,
+%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%% See the License for the specific language governing permissions and
+%% limitations under the License.
+%%--------------------------------------------------------------------
+-module(emqx_mgmt_SUITE).
+
+-compile(export_all).
+-compile(nowarn_export_all).
+
+-include_lib("eunit/include/eunit.hrl").
+-include_lib("common_test/include/ct.hrl").
+
+-export([ident/1]).
+
+-define(FORMATFUN, {?MODULE, ident}).
+
+all() ->
+    emqx_common_test_helpers:all(?MODULE).
+
+init_per_suite(Config) ->
+    emqx_mgmt_api_test_util:init_suite([emqx_conf, emqx_management]),
+    Config.
+
+end_per_suite(_) ->
+    emqx_mgmt_api_test_util:end_suite([emqx_management, emqx_conf]).
+
+init_per_testcase(TestCase, Config) ->
+    meck:expect(mria_mnesia, running_nodes, 0, [node()]),
+    emqx_common_test_helpers:init_per_testcase(?MODULE, TestCase, Config).
+
+end_per_testcase(TestCase, Config) ->
+    meck:unload(mria_mnesia),
+    emqx_common_test_helpers:end_per_testcase(?MODULE, TestCase, Config).
+
+t_list_nodes(init, Config) ->
+    meck:expect(
+        mria_mnesia,
+        cluster_nodes,
+        fun
+            (running) -> [node()];
+            (stopped) -> ['stopped@node']
+        end
+    ),
+    Config;
+t_list_nodes('end', _Config) ->
+    ok.
+
+t_list_nodes(_) ->
+    NodeInfos = emqx_mgmt:list_nodes(),
+    Node = node(),
+    ?assertMatch(
+        [
+            {Node, #{node := Node, node_status := 'running'}},
+            {'stopped@node', #{node := 'stopped@node', node_status := 'stopped'}}
+        ],
+        NodeInfos
+    ).
+
+t_lookup_node(init, Config) ->
+    meck:new(os, [passthrough, unstick, no_link]),
+    OsType = os:type(),
+    meck:expect(os, type, 0, {win32, winME}),
+    [{os_type, OsType} | Config];
+t_lookup_node('end', Config) ->
+    %% We need to restore the original behavior so that rebar3 doesn't crash. If
+    %% we called `meck:unload(os)` or did not set `no_link`, `ct` would crash calling
+    %% `os` with "The code server called the unloaded module `os'".
+    OsType = ?config(os_type, Config),
+    meck:expect(os, type, 0, OsType),
+    ok.
+
+t_lookup_node(_) ->
+    Node = node(),
+    ?assertMatch(
+        #{node := Node, node_status := 'running', memory_total := 0},
+        emqx_mgmt:lookup_node(node())
+    ),
+    ?assertMatch(
+        {error, _},
+        emqx_mgmt:lookup_node('fake@nohost')
+    ),
+    ok.
+
+t_list_brokers(_) ->
+    Node = node(),
+    ?assertMatch(
+        [{Node, #{node := Node, node_status := running, uptime := _}}],
+        emqx_mgmt:list_brokers()
+    ).
+
+t_lookup_broker(_) ->
+    Node = node(),
+    ?assertMatch(
+        #{node := Node, node_status := running, uptime := _},
+        emqx_mgmt:lookup_broker(Node)
+    ).
+
+t_get_metrics(_) ->
+    Metrics = emqx_mgmt:get_metrics(),
+    ?assert(maps:size(Metrics) > 0),
+    ?assertMatch(
+        Metrics, maps:from_list(emqx_mgmt:get_metrics(node()))
+    ).
+
+t_lookup_client(init, Config) ->
+    setup_clients(Config);
+t_lookup_client('end', Config) ->
+    disconnect_clients(Config).
+
+t_lookup_client(_Config) ->
+    [{Chan, Info, Stats}] = emqx_mgmt:lookup_client({clientid, <<"client1">>}, ?FORMATFUN),
+    ?assertEqual(
+        [{Chan, Info, Stats}],
+        emqx_mgmt:lookup_client({username, <<"user1">>}, ?FORMATFUN)
+    ),
+    ?assertEqual([], emqx_mgmt:lookup_client({clientid, <<"notfound">>}, ?FORMATFUN)),
+    meck:expect(mria_mnesia, running_nodes, 0, [node(), 'fake@nonode']),
+    ?assertMatch(
+        [_ | {error, nodedown}], emqx_mgmt:lookup_client({clientid, <<"client1">>}, ?FORMATFUN)
+    ).
+
+t_kickout_client(init, Config) ->
+    process_flag(trap_exit, true),
+    setup_clients(Config);
+t_kickout_client('end', _Config) ->
+    ok.
+
+t_kickout_client(Config) ->
+    [C | _] = ?config(clients, Config),
+    ok = emqx_mgmt:kickout_client(<<"client1">>),
+    receive
+        {'EXIT', C, Reason} ->
+            ?assertEqual({shutdown, tcp_closed}, Reason);
+        Foo ->
+            error({unexpected, Foo})
+    after 1000 ->
+        error(timeout)
+    end,
+    ?assertEqual({error, not_found}, emqx_mgmt:kickout_client(<<"notfound">>)).
+
+t_list_authz_cache(init, Config) ->
+    setup_clients(Config);
+t_list_authz_cache('end', Config) ->
+    disconnect_clients(Config).
+
+t_list_authz_cache(_) ->
+    ?assertNotMatch({error, _}, emqx_mgmt:list_authz_cache(<<"client1">>)),
+    ?assertMatch({error, not_found}, emqx_mgmt:list_authz_cache(<<"notfound">>)).
+
+t_list_client_subscriptions(init, Config) ->
+    setup_clients(Config);
+t_list_client_subscriptions('end', Config) ->
+    disconnect_clients(Config).
+
+t_list_client_subscriptions(Config) ->
+    [Client | _] = ?config(clients, Config),
+    ?assertEqual([], emqx_mgmt:list_client_subscriptions(<<"client1">>)),
+    emqtt:subscribe(Client, <<"t/#">>),
+    ?assertMatch({_, [{<<"t/#">>, _Opts}]}, emqx_mgmt:list_client_subscriptions(<<"client1">>)),
+    ?assertEqual({error, not_found}, emqx_mgmt:list_client_subscriptions(<<"notfound">>)).
+
+t_clean_cache(init, Config) ->
+    setup_clients(Config);
+t_clean_cache('end', Config) ->
+    disconnect_clients(Config).
+
+t_clean_cache(_Config) ->
+    ?assertNotMatch(
+        {error, _},
+        emqx_mgmt:clean_authz_cache(<<"client1">>)
+    ),
+    ?assertNotMatch(
+        {error, _},
+        emqx_mgmt:clean_authz_cache_all()
+    ),
+    ?assertNotMatch(
+        {error, _},
+        emqx_mgmt:clean_pem_cache_all()
+    ),
+    meck:expect(mria_mnesia, running_nodes, 0, [node(), 'fake@nonode']),
+    ?assertMatch(
+        {error, [{'fake@nonode', {error, _}}]},
+        emqx_mgmt:clean_authz_cache_all()
+    ),
+    ?assertMatch(
+        {error, [{'fake@nonode', {error, _}}]},
+        emqx_mgmt:clean_pem_cache_all()
+    ).
+
+t_set_client_props(init, Config) ->
+    setup_clients(Config);
+t_set_client_props('end', Config) ->
+    disconnect_clients(Config).
+
+t_set_client_props(_Config) ->
+    ?assertEqual(
+        % [FIXME] not implemented at this point?
+        ignored,
+        emqx_mgmt:set_ratelimit_policy(<<"client1">>, foo)
+    ),
+    ?assertEqual(
+        {error, not_found},
+        emqx_mgmt:set_ratelimit_policy(<<"notfound">>, foo)
+    ),
+    ?assertEqual(
+        % [FIXME] not implemented at this point?
+        ignored,
+        emqx_mgmt:set_quota_policy(<<"client1">>, foo)
+    ),
+    ?assertEqual(
+        {error, not_found},
+        emqx_mgmt:set_quota_policy(<<"notfound">>, foo)
+    ),
+    ?assertEqual(
+        ok,
+        emqx_mgmt:set_keepalive(<<"client1">>, 3600)
+    ),
+    ?assertMatch(
+        {error, _},
+        emqx_mgmt:set_keepalive(<<"client1">>, true)
+    ),
+    ?assertEqual(
+        {error, not_found},
+        emqx_mgmt:set_keepalive(<<"notfound">>, 3600)
+    ),
+    ok.
+
+t_list_subscriptions_via_topic(init, Config) ->
+    setup_clients(Config);
+t_list_subscriptions_via_topic('end', Config) ->
+    disconnect_clients(Config).
+
+t_list_subscriptions_via_topic(Config) ->
+    [Client | _] = ?config(clients, Config),
+    ?assertEqual([], emqx_mgmt:list_subscriptions_via_topic(<<"t/#">>, ?FORMATFUN)),
+    emqtt:subscribe(Client, <<"t/#">>),
+    ?assertMatch(
+        [{{<<"t/#">>, _SubPid}, _Opts}],
+        emqx_mgmt:list_subscriptions_via_topic(<<"t/#">>, ?FORMATFUN)
+    ).
+
+t_pubsub_api(init, Config) ->
+    setup_clients(Config);
+t_pubsub_api('end', Config) ->
+    disconnect_clients(Config).
+
+-define(TT(Topic), {Topic, #{qos => 0}}).
+
+t_pubsub_api(Config) ->
+    [Client | _] = ?config(clients, Config),
+    ?assertEqual([], emqx_mgmt:list_subscriptions_via_topic(<<"t/#">>, ?FORMATFUN)),
+    ?assertMatch(
+        {subscribe, _, _},
+        emqx_mgmt:subscribe(<<"client1">>, [?TT(<<"t/#">>), ?TT(<<"t1/#">>), ?TT(<<"t2/#">>)])
+    ),
+    timer:sleep(100),
+    ?assertMatch(
+        [{{<<"t/#">>, _SubPid}, _Opts}],
+        emqx_mgmt:list_subscriptions_via_topic(<<"t/#">>, ?FORMATFUN)
+    ),
+    Message = emqx_message:make(?MODULE, 0, <<"t/foo">>, <<"helloworld">>, #{}, #{}),
+    emqx_mgmt:publish(Message),
+    Recv =
+        receive
+            {publish, #{client_pid := Client, payload := <<"helloworld">>}} ->
+                ok
+        after 100 ->
+            timeout
+        end,
+    ?assertEqual(ok, Recv),
+    ?assertEqual({error, channel_not_found}, emqx_mgmt:subscribe(<<"notfound">>, [?TT(<<"t/#">>)])),
+    ?assertNotMatch({error, _}, emqx_mgmt:unsubscribe(<<"client1">>, <<"t/#">>)),
+    ?assertEqual({error, channel_not_found}, emqx_mgmt:unsubscribe(<<"notfound">>, <<"t/#">>)),
+    Node = node(),
+    ?assertMatch(
+        {Node, [{<<"t1/#">>, _}, {<<"t2/#">>, _}]},
+        emqx_mgmt:list_client_subscriptions(<<"client1">>)
+    ),
+    ?assertMatch(
+        {unsubscribe, [{<<"t1/#">>, _}, {<<"t2/#">>, _}]},
+        emqx_mgmt:unsubscribe_batch(<<"client1">>, [<<"t1/#">>, <<"t2/#">>])
+    ),
+    timer:sleep(100),
+    ?assertMatch([], emqx_mgmt:list_client_subscriptions(<<"client1">>)),
+    ?assertEqual(
+        {error, channel_not_found},
+        emqx_mgmt:unsubscribe_batch(<<"notfound">>, [<<"t1/#">>, <<"t2/#">>])
+    ).
+
+t_alarms(init, Config) ->
+    [
+        emqx_mgmt:deactivate(Node, Name)
+     || {Node, ActiveAlarms} <- emqx_mgmt:get_alarms(activated), #{name := Name} <- ActiveAlarms
+    ],
+    emqx_mgmt:delete_all_deactivated_alarms(),
+    Config;
+t_alarms('end', Config) ->
+    Config.
+
+t_alarms(_) ->
+    Node = node(),
+    ?assertEqual(
+        [{node(), []}],
+        emqx_mgmt:get_alarms(all)
+    ),
+    emqx_alarm:activate(foo),
+    ?assertMatch(
+        [{Node, [#{name := foo, activated := true, duration := _}]}],
+        emqx_mgmt:get_alarms(all)
+    ),
+    emqx_alarm:activate(bar),
+    ?assertMatch(
+        [{Node, [#{name := foo, activated := true}, #{name := bar, activated := true}]}],
+        sort_alarms(emqx_mgmt:get_alarms(all))
+    ),
+    ?assertEqual(
+        ok,
+        emqx_mgmt:deactivate(node(), bar)
+    ),
+    ?assertMatch(
+        [{Node, [#{name := foo, activated := true}, #{name := bar, activated := false}]}],
+        sort_alarms(emqx_mgmt:get_alarms(all))
+    ),
+    ?assertMatch(
+        [{Node, [#{name := foo, activated := true}]}],
+        emqx_mgmt:get_alarms(activated)
+    ),
+    ?assertMatch(
+        [{Node, [#{name := bar, activated := false}]}],
+        emqx_mgmt:get_alarms(deactivated)
+    ),
+    ?assertEqual(
+        [ok],
+        emqx_mgmt:delete_all_deactivated_alarms()
+    ),
+    ?assertMatch(
+        [{Node, [#{name := foo, activated := true}]}],
+        emqx_mgmt:get_alarms(all)
+    ),
+    ?assertEqual(
+        {error, not_found},
+        emqx_mgmt:deactivate(node(), bar)
+    ).
+
+t_banned(_) ->
+    Banned = #{
+        who => {clientid, <<"TestClient">>},
+        by => <<"banned suite">>,
+        reason => <<"test">>,
+        at => erlang:system_time(second),
+        until => erlang:system_time(second) + 1
+    },
+    ?assertMatch(
+        {ok, _},
+        emqx_mgmt:create_banned(Banned)
+    ),
+    ?assertEqual(
+        ok,
+        emqx_mgmt:delete_banned({clientid, <<"TestClient">>})
+    ).
+
+%%% helpers
+ident(Arg) ->
+    Arg.
+
+sort_alarms([{Node, Alarms}]) ->
+    [{Node, lists:sort(fun(#{activate_at := A}, #{activate_at := B}) -> A < B end, Alarms)}].
+
+setup_clients(Config) ->
+    {ok, C} = emqtt:start_link([{clientid, <<"client1">>}, {username, <<"user1">>}]),
+    {ok, _} = emqtt:connect(C),
+    [{clients, [C]} | Config].
+
+disconnect_clients(Config) ->
+    Clients = ?config(clients, Config),
+    lists:foreach(fun emqtt:disconnect/1, Clients).

+ 1 - 1
apps/emqx_management/test/emqx_mgmt_api_alarms_SUITE.erl

@@ -62,5 +62,5 @@ get_alarms(AssertCount, Activated) ->
     Limit = maps:get(<<"limit">>, Meta),
     Count = maps:get(<<"count">>, Meta),
     ?assertEqual(Page, 1),
-    ?assertEqual(Limit, emqx_mgmt:max_row_limit()),
+    ?assertEqual(Limit, emqx_mgmt:default_row_limit()),
     ?assert(Count >= AssertCount).

+ 12 - 5
apps/emqx_management/test/emqx_mgmt_api_clients_SUITE.erl

@@ -64,7 +64,7 @@ t_clients(_) ->
     ClientsLimit = maps:get(<<"limit">>, ClientsMeta),
     ClientsCount = maps:get(<<"count">>, ClientsMeta),
     ?assertEqual(ClientsPage, 1),
-    ?assertEqual(ClientsLimit, emqx_mgmt:max_row_limit()),
+    ?assertEqual(ClientsLimit, emqx_mgmt:default_row_limit()),
     ?assertEqual(ClientsCount, 2),
 
     %% get /clients/:clientid
@@ -78,7 +78,14 @@ t_clients(_) ->
     %% delete /clients/:clientid kickout
     Client2Path = emqx_mgmt_api_test_util:api_path(["clients", binary_to_list(ClientId2)]),
     {ok, _} = emqx_mgmt_api_test_util:request_api(delete, Client2Path),
-    timer:sleep(300),
+    Kick =
+        receive
+            {'EXIT', C2, _} ->
+                ok
+        after 300 ->
+            timeout
+        end,
+    ?assertEqual(ok, Kick),
     AfterKickoutResponse2 = emqx_mgmt_api_test_util:request_api(get, Client2Path),
     ?assertEqual({error, {"HTTP/1.1", 404, "Not Found"}}, AfterKickoutResponse2),
 
@@ -107,7 +114,7 @@ t_clients(_) ->
         SubscribeBody
     ),
     timer:sleep(100),
-    [{AfterSubTopic, #{qos := AfterSubQos}}] = emqx_mgmt:lookup_subscriptions(ClientId1),
+    {_, [{AfterSubTopic, #{qos := AfterSubQos}}]} = emqx_mgmt:list_client_subscriptions(ClientId1),
     ?assertEqual(AfterSubTopic, Topic),
     ?assertEqual(AfterSubQos, Qos),
 
@@ -152,7 +159,7 @@ t_clients(_) ->
         UnSubscribeBody
     ),
     timer:sleep(100),
-    ?assertEqual([], emqx_mgmt:lookup_subscriptions(Client1)),
+    ?assertEqual([], emqx_mgmt:list_client_subscriptions(ClientId1)),
 
     %% testcase cleanup, kickout client1
     {ok, _} = emqx_mgmt_api_test_util:request_api(delete, Client1Path),
@@ -272,7 +279,7 @@ t_client_id_not_found(_Config) ->
     %% Client kickout
     ?assertMatch({error, {Http, _, Body}}, ReqFun(delete, PathFun([]))),
     %% Client Subscription list
-    ?assertMatch({ok, {{"HTTP/1.1", 200, "OK"}, _, "[]"}}, ReqFun(get, PathFun(["subscriptions"]))),
+    ?assertMatch({error, {Http, _, Body}}, ReqFun(get, PathFun(["subscriptions"]))),
     %% AuthZ Cache lookup
     ?assertMatch({error, {Http, _, Body}}, ReqFun(get, PathFun(["authorization", "cache"]))),
     %% AuthZ Cache clean

+ 3 - 3
apps/emqx_management/test/emqx_mgmt_api_subscription_SUITE.erl

@@ -57,7 +57,7 @@ t_subscription_api(Config) ->
     Data = emqx_json:decode(Response, [return_maps]),
     Meta = maps:get(<<"meta">>, Data),
     ?assertEqual(1, maps:get(<<"page">>, Meta)),
-    ?assertEqual(emqx_mgmt:max_row_limit(), maps:get(<<"limit">>, Meta)),
+    ?assertEqual(emqx_mgmt:default_row_limit(), maps:get(<<"limit">>, Meta)),
     ?assertEqual(2, maps:get(<<"count">>, Meta)),
     Subscriptions = maps:get(<<"data">>, Data),
     ?assertEqual(length(Subscriptions), 2),
@@ -95,7 +95,7 @@ t_subscription_api(Config) ->
 
     DataTopic2 = #{<<"meta">> := Meta2} = request_json(get, QS, Headers),
     ?assertEqual(1, maps:get(<<"page">>, Meta2)),
-    ?assertEqual(emqx_mgmt:max_row_limit(), maps:get(<<"limit">>, Meta2)),
+    ?assertEqual(emqx_mgmt:default_row_limit(), maps:get(<<"limit">>, Meta2)),
     ?assertEqual(1, maps:get(<<"count">>, Meta2)),
     SubscriptionsList2 = maps:get(<<"data">>, DataTopic2),
     ?assertEqual(length(SubscriptionsList2), 1).
@@ -120,7 +120,7 @@ t_subscription_fuzzy_search(Config) ->
 
     MatchData1 = #{<<"meta">> := MatchMeta1} = request_json(get, MatchQs, Headers),
     ?assertEqual(1, maps:get(<<"page">>, MatchMeta1)),
-    ?assertEqual(emqx_mgmt:max_row_limit(), maps:get(<<"limit">>, MatchMeta1)),
+    ?assertEqual(emqx_mgmt:default_row_limit(), maps:get(<<"limit">>, MatchMeta1)),
     %% count is undefined in fuzzy searching
     ?assertNot(maps:is_key(<<"count">>, MatchMeta1)),
     ?assertMatch(3, length(maps:get(<<"data">>, MatchData1))),

+ 1 - 1
apps/emqx_management/test/emqx_mgmt_api_topics_SUITE.erl

@@ -52,7 +52,7 @@ t_nodes_api(Config) ->
     RoutesData = emqx_json:decode(Response, [return_maps]),
     Meta = maps:get(<<"meta">>, RoutesData),
     ?assertEqual(1, maps:get(<<"page">>, Meta)),
-    ?assertEqual(emqx_mgmt:max_row_limit(), maps:get(<<"limit">>, Meta)),
+    ?assertEqual(emqx_mgmt:default_row_limit(), maps:get(<<"limit">>, Meta)),
     ?assertEqual(1, maps:get(<<"count">>, Meta)),
     Data = maps:get(<<"data">>, RoutesData),
     Route = erlang:hd(Data),

+ 15 - 8
apps/emqx_management/test/emqx_mgmt_api_trace_SUITE.erl

@@ -19,9 +19,7 @@
 -compile(export_all).
 -compile(nowarn_export_all).
 
--include_lib("common_test/include/ct.hrl").
 -include_lib("eunit/include/eunit.hrl").
--include_lib("emqx/include/emqx.hrl").
 -include_lib("kernel/include/file.hrl").
 -include_lib("stdlib/include/zip.hrl").
 -include_lib("snabbkaffe/include/snabbkaffe.hrl").
@@ -225,12 +223,12 @@ t_log_file(_Config) ->
         ]},
         zip:table(Binary2)
     ),
-    {error, {_, 400, _}} =
+    {error, {_, 404, _}} =
         request_api(
             get,
-            api_path("trace/test_client_id/download?node=unknonwn_node")
+            api_path("trace/test_client_id/download?node=unknown_node")
         ),
-    {error, {_, 400, _}} =
+    {error, {_, 404, _}} =
         request_api(
             get,
             % known atom but unknown node
@@ -296,12 +294,21 @@ t_stream_log(_Config) ->
     #{<<"meta">> := Meta1, <<"items">> := Bin1} = json(Binary1),
     ?assertEqual(#{<<"position">> => 30, <<"bytes">> => 10}, Meta1),
     ?assertEqual(10, byte_size(Bin1)),
-    {error, {_, 400, _}} =
+    ct:pal("~p vs ~p", [Bin, Bin1]),
+    %% in theory they could be the same but we know they shouldn't
+    ?assertNotEqual(Bin, Bin1),
+    BadReqPath = api_path("trace/test_stream_log/log?&bytes=1000000000000"),
+    {error, {_, 400, _}} = request_api(get, BadReqPath),
+    meck:new(file, [passthrough, unstick]),
+    meck:expect(file, read, 2, {error, enomem}),
+    {error, {_, 503, _}} = request_api(get, Path),
+    meck:unload(file),
+    {error, {_, 404, _}} =
         request_api(
             get,
-            api_path("trace/test_stream_log/log?node=unknonwn_node")
+            api_path("trace/test_stream_log/log?node=unknown_node")
         ),
-    {error, {_, 400, _}} =
+    {error, {_, 404, _}} =
         request_api(
             get,
             % known atom but not a node

+ 2 - 1
apps/emqx_modules/test/emqx_telemetry_SUITE.erl

@@ -858,7 +858,8 @@ stop_slave(Node) ->
     ok = slave:stop(Node),
     ?assertEqual([node()], mria_mnesia:running_nodes()),
     ?assertEqual([], nodes()),
-    ok.
+    _ = application:stop(mria),
+    ok = application:start(mria).
 
 leave_cluster() ->
     try mnesia_hook:module_info() of

+ 1 - 1
apps/emqx_plugins/src/emqx_plugins.app.src

@@ -1,7 +1,7 @@
 %% -*- mode: erlang -*-
 {application, emqx_plugins, [
     {description, "EMQX Plugin Management"},
-    {vsn, "0.1.1"},
+    {vsn, "0.1.2"},
     {modules, []},
     {mod, {emqx_plugins_app, []}},
     {applications, [kernel, stdlib, emqx]},

+ 2 - 2
apps/emqx_plugins/src/emqx_plugins_schema.erl

@@ -78,11 +78,11 @@ states(_) -> undefined.
 install_dir(type) -> string();
 install_dir(required) -> false;
 %% runner's root dir
-install_dir(default) -> "plugins";
+install_dir(default) -> <<"plugins">>;
 install_dir(T) when T =/= desc -> undefined;
 install_dir(desc) -> ?DESC(install_dir).
 
 check_interval(type) -> emqx_schema:duration();
-check_interval(default) -> "5s";
+check_interval(default) -> <<"5s">>;
 check_interval(T) when T =/= desc -> undefined;
 check_interval(desc) -> ?DESC(check_interval).

+ 2 - 2
apps/emqx_prometheus/src/emqx_prometheus_schema.erl

@@ -40,7 +40,7 @@ fields("prometheus") ->
             ?HOCON(
                 string(),
                 #{
-                    default => "http://127.0.0.1:9091",
+                    default => <<"http://127.0.0.1:9091">>,
                     required => true,
                     validator => fun ?MODULE:validate_push_gateway_server/1,
                     desc => ?DESC(push_gateway_server)
@@ -50,7 +50,7 @@ fields("prometheus") ->
             ?HOCON(
                 emqx_schema:duration_ms(),
                 #{
-                    default => "15s",
+                    default => <<"15s">>,
                     required => true,
                     desc => ?DESC(interval)
                 }

+ 221 - 90
apps/emqx_resource/src/emqx_resource_buffer_worker.erl

@@ -70,6 +70,18 @@
 -define(RETRY_IDX, 3).
 -define(WORKER_MREF_IDX, 4).
 
+-define(ENSURE_ASYNC_FLUSH(InflightTID, EXPR),
+    (fun() ->
+        IsFullBefore = is_inflight_full(InflightTID),
+        case (EXPR) of
+            blocked ->
+                ok;
+            ok ->
+                ok = maybe_flush_after_async_reply(IsFullBefore)
+        end
+    end)()
+).
+
 -type id() :: binary().
 -type index() :: pos_integer().
 -type expire_at() :: infinity | integer().
@@ -97,6 +109,7 @@ start_link(Id, Index, Opts) ->
 
 -spec sync_query(id(), request(), query_opts()) -> Result :: term().
 sync_query(Id, Request, Opts0) ->
+    ?tp(sync_query, #{id => Id, request => Request, query_opts => Opts0}),
     Opts1 = ensure_timeout_query_opts(Opts0, sync),
     Opts = ensure_expire_at(Opts1),
     PickKey = maps:get(pick_key, Opts, self()),
@@ -106,6 +119,7 @@ sync_query(Id, Request, Opts0) ->
 
 -spec async_query(id(), request(), query_opts()) -> Result :: term().
 async_query(Id, Request, Opts0) ->
+    ?tp(async_query, #{id => Id, request => Request, query_opts => Opts0}),
     Opts1 = ensure_timeout_query_opts(Opts0, async),
     Opts = ensure_expire_at(Opts1),
     PickKey = maps:get(pick_key, Opts, self()),
@@ -121,6 +135,7 @@ simple_sync_query(Id, Request) ->
     %% call ends up calling buffering functions, that's a bug and
     %% would mess up the metrics anyway.  `undefined' is ignored by
     %% `emqx_resource_metrics:*_shift/3'.
+    ?tp(simple_sync_query, #{id => Id, request => Request}),
     Index = undefined,
     QueryOpts = simple_query_opts(),
     emqx_resource_metrics:matched_inc(Id),
@@ -132,6 +147,7 @@ simple_sync_query(Id, Request) ->
 %% simple async-query the resource without batching and queuing.
 -spec simple_async_query(id(), request(), query_opts()) -> term().
 simple_async_query(Id, Request, QueryOpts0) ->
+    ?tp(simple_async_query, #{id => Id, request => Request, query_opts => QueryOpts0}),
     Index = undefined,
     QueryOpts = maps:merge(simple_query_opts(), QueryOpts0),
     emqx_resource_metrics:matched_inc(Id),
@@ -194,8 +210,8 @@ init({Id, Index, Opts}) ->
     ?tp(buffer_worker_init, #{id => Id, index => Index}),
     {ok, running, Data}.
 
-running(enter, _, Data) ->
-    ?tp(buffer_worker_enter_running, #{id => maps:get(id, Data)}),
+running(enter, _, #{tref := _Tref} = Data) ->
+    ?tp(buffer_worker_enter_running, #{id => maps:get(id, Data), tref => _Tref}),
     %% According to `gen_statem' laws, we mustn't call `maybe_flush'
     %% directly because it may decide to return `{next_state, blocked, _}',
     %% and that's an invalid response for a state enter call.
@@ -212,9 +228,8 @@ running(info, ?SEND_REQ(_ReplyTo, _Req) = Request0, Data) ->
     handle_query_requests(Request0, Data);
 running(info, {flush, Ref}, St = #{tref := {_TRef, Ref}}) ->
     flush(St#{tref := undefined});
-running(internal, flush, St) ->
-    flush(St);
 running(info, {flush, _Ref}, _St) ->
+    ?tp(discarded_stale_flush, #{}),
     keep_state_and_data;
 running(info, {'DOWN', _MRef, process, Pid, Reason}, Data0 = #{async_workers := AsyncWorkers0}) when
     is_map_key(Pid, AsyncWorkers0)
@@ -225,21 +240,24 @@ running(info, Info, _St) ->
     ?SLOG(error, #{msg => unexpected_msg, state => running, info => Info}),
     keep_state_and_data.
 
-blocked(enter, _, #{resume_interval := ResumeT} = _St) ->
+blocked(enter, _, #{resume_interval := ResumeT} = St0) ->
     ?tp(buffer_worker_enter_blocked, #{}),
-    {keep_state_and_data, {state_timeout, ResumeT, unblock}};
+    %% discard the old timer, new timer will be started when entering running state again
+    St = cancel_flush_timer(St0),
+    {keep_state, St, {state_timeout, ResumeT, unblock}};
 blocked(cast, block, _St) ->
     keep_state_and_data;
 blocked(cast, resume, St) ->
     resume_from_blocked(St);
-blocked(cast, flush, Data) ->
-    resume_from_blocked(Data);
+blocked(cast, flush, St) ->
+    resume_from_blocked(St);
 blocked(state_timeout, unblock, St) ->
     resume_from_blocked(St);
 blocked(info, ?SEND_REQ(_ReplyTo, _Req) = Request0, Data0) ->
     Data = collect_and_enqueue_query_requests(Request0, Data0),
     {keep_state, Data};
 blocked(info, {flush, _Ref}, _Data) ->
+    %% ignore stale timer
     keep_state_and_data;
 blocked(info, {'DOWN', _MRef, process, Pid, Reason}, Data0 = #{async_workers := AsyncWorkers0}) when
     is_map_key(Pid, AsyncWorkers0)
@@ -335,11 +353,13 @@ resume_from_blocked(Data) ->
             %% We retry msgs in inflight window sync, as if we send them
             %% async, they will be appended to the end of inflight window again.
             retry_inflight_sync(Ref, Query, Data);
+        {batch, Ref, NotExpired, []} ->
+            retry_inflight_sync(Ref, NotExpired, Data);
         {batch, Ref, NotExpired, Expired} ->
-            update_inflight_item(InflightTID, Ref, NotExpired),
             NumExpired = length(Expired),
+            ok = update_inflight_item(InflightTID, Ref, NotExpired, NumExpired),
             emqx_resource_metrics:dropped_expired_inc(Id, NumExpired),
-            NumExpired > 0 andalso ?tp(buffer_worker_retry_expired, #{expired => Expired}),
+            ?tp(buffer_worker_retry_expired, #{expired => Expired}),
             %% We retry msgs in inflight window sync, as if we send them
             %% async, they will be appended to the end of inflight window again.
             retry_inflight_sync(Ref, NotExpired, Data)
@@ -470,9 +490,14 @@ flush(Data0) ->
     Data1 = cancel_flush_timer(Data0),
     CurrentCount = queue_count(Q0),
     IsFull = is_inflight_full(InflightTID),
-    ?tp(buffer_worker_flush, #{queue_count => CurrentCount, is_full => IsFull}),
+    ?tp(buffer_worker_flush, #{
+        queued => CurrentCount,
+        is_inflight_full => IsFull,
+        inflight => inflight_count(InflightTID)
+    }),
     case {CurrentCount, IsFull} of
         {0, _} ->
+            ?tp(buffer_worker_queue_drained, #{inflight => inflight_count(InflightTID)}),
             {keep_state, Data1};
         {_, true} ->
             ?tp(buffer_worker_flush_but_inflight_full, #{}),
@@ -487,7 +512,7 @@ flush(Data0) ->
             %% if the request has expired, the caller is no longer
             %% waiting for a response.
             case sieve_expired_requests(Batch, Now) of
-                all_expired ->
+                {[], _AllExpired} ->
                     ok = replayq:ack(Q1, QAckRef),
                     emqx_resource_metrics:dropped_expired_inc(Id, length(Batch)),
                     emqx_resource_metrics:queuing_set(Id, Index, queue_count(Q1)),
@@ -496,7 +521,7 @@ flush(Data0) ->
                 {NotExpired, Expired} ->
                     NumExpired = length(Expired),
                     emqx_resource_metrics:dropped_expired_inc(Id, NumExpired),
-                    IsBatch = BatchSize =/= 1,
+                    IsBatch = (BatchSize > 1),
                     %% We *must* use the new queue, because we currently can't
                     %% `nack' a `pop'.
                     %% Maybe we could re-open the queue?
@@ -506,7 +531,6 @@ flush(Data0) ->
                     ),
                     Ref = make_request_ref(),
                     do_flush(Data2, #{
-                        new_queue => Q1,
                         is_batch => IsBatch,
                         batch => NotExpired,
                         ref => Ref,
@@ -519,18 +543,16 @@ flush(Data0) ->
     is_batch := boolean(),
     batch := [queue_query()],
     ack_ref := replayq:ack_ref(),
-    ref := inflight_key(),
-    new_queue := replayq:q()
+    ref := inflight_key()
 }) ->
     gen_statem:event_handler_result(state(), data()).
 do_flush(
-    Data0,
+    #{queue := Q1} = Data0,
     #{
         is_batch := false,
         batch := Batch,
         ref := Ref,
-        ack_ref := QAckRef,
-        new_queue := Q1
+        ack_ref := QAckRef
     }
 ) ->
     #{
@@ -606,16 +628,18 @@ do_flush(
                     }),
                     flush_worker(self());
                 false ->
+                    ?tp(buffer_worker_queue_drained, #{
+                        inflight => inflight_count(InflightTID)
+                    }),
                     ok
             end,
             {keep_state, Data1}
     end;
-do_flush(Data0, #{
+do_flush(#{queue := Q1} = Data0, #{
     is_batch := true,
     batch := Batch,
     ref := Ref,
-    ack_ref := QAckRef,
-    new_queue := Q1
+    ack_ref := QAckRef
 }) ->
     #{
         id := Id,
@@ -685,6 +709,9 @@ do_flush(Data0, #{
             Data2 =
                 case {CurrentCount > 0, CurrentCount >= BatchSize} of
                     {false, _} ->
+                        ?tp(buffer_worker_queue_drained, #{
+                            inflight => inflight_count(InflightTID)
+                        }),
                         Data1;
                     {true, true} ->
                         ?tp(buffer_worker_flush_ack_reflush, #{
@@ -718,13 +745,14 @@ batch_reply_caller_defer_metrics(Id, BatchResult, Batch, QueryOpts) ->
     {ShouldAck, PostFns} =
         lists:foldl(
             fun(Reply, {_ShouldAck, PostFns}) ->
+                %% _ShouldAck should be the same as ShouldAck starting from the second reply
                 {ShouldAck, PostFn} = reply_caller_defer_metrics(Id, Reply, QueryOpts),
                 {ShouldAck, [PostFn | PostFns]}
             end,
             {ack, []},
             Replies
         ),
-    PostFn = fun() -> lists:foreach(fun(F) -> F() end, PostFns) end,
+    PostFn = fun() -> lists:foreach(fun(F) -> F() end, lists:reverse(PostFns)) end,
     {ShouldAck, PostFn}.
 
 reply_caller(Id, Reply, QueryOpts) ->
@@ -853,7 +881,7 @@ handle_async_worker_down(Data0, Pid) ->
     {keep_state, Data}.
 
 call_query(QM0, Id, Index, Ref, Query, QueryOpts) ->
-    ?tp(call_query_enter, #{id => Id, query => Query}),
+    ?tp(call_query_enter, #{id => Id, query => Query, query_mode => QM0}),
     case emqx_resource_manager:ets_lookup(Id) of
         {ok, _Group, #{status := stopped}} ->
             ?RESOURCE_ERROR(stopped, "resource stopped or disabled");
@@ -919,7 +947,7 @@ apply_query_fun(async, Mod, Id, Index, Ref, ?QUERY(_, Request, _, _) = Query, Re
                 inflight_tid => InflightTID,
                 request_ref => Ref,
                 query_opts => QueryOpts,
-                query => minimize(Query)
+                min_query => minimize(Query)
             },
             IsRetriable = false,
             WorkerMRef = undefined,
@@ -952,7 +980,7 @@ apply_query_fun(async, Mod, Id, Index, Ref, [?QUERY(_, _, _, _) | _] = Batch, Re
                 inflight_tid => InflightTID,
                 request_ref => Ref,
                 query_opts => QueryOpts,
-                batch => minimize(Batch)
+                min_batch => minimize(Batch)
             },
             Requests = lists:map(
                 fun(?QUERY(_ReplyTo, Request, _, _ExpireAt)) -> Request end, Batch
@@ -968,27 +996,39 @@ apply_query_fun(async, Mod, Id, Index, Ref, [?QUERY(_, _, _, _) | _] = Batch, Re
     ).
 
 handle_async_reply(
+    #{
+        request_ref := Ref,
+        inflight_tid := InflightTID,
+        query_opts := Opts
+    } = ReplyContext,
+    Result
+) ->
+    case maybe_handle_unknown_async_reply(InflightTID, Ref, Opts) of
+        discard ->
+            ok;
+        continue ->
+            ?ENSURE_ASYNC_FLUSH(InflightTID, handle_async_reply1(ReplyContext, Result))
+    end.
+
+handle_async_reply1(
     #{
         request_ref := Ref,
         inflight_tid := InflightTID,
         resource_id := Id,
         worker_index := Index,
-        buffer_worker := Pid,
-        query := ?QUERY(_, _, _, ExpireAt) = _Query
+        min_query := ?QUERY(_, _, _, ExpireAt) = _Query
     } = ReplyContext,
     Result
 ) ->
     ?tp(
         handle_async_reply_enter,
-        #{batch_or_query => [_Query], ref => Ref}
+        #{batch_or_query => [_Query], ref => Ref, result => Result}
     ),
     Now = now_(),
     case is_expired(ExpireAt, Now) of
         true ->
-            IsFullBefore = is_inflight_full(InflightTID),
             IsAcked = ack_inflight(InflightTID, Ref, Id, Index),
             IsAcked andalso emqx_resource_metrics:late_reply_inc(Id),
-            IsFullBefore andalso ?MODULE:flush_worker(Pid),
             ?tp(handle_async_reply_expired, #{expired => [_Query]}),
             ok;
         false ->
@@ -1003,7 +1043,7 @@ do_handle_async_reply(
         worker_index := Index,
         buffer_worker := Pid,
         inflight_tid := InflightTID,
-        query := ?QUERY(ReplyTo, _, Sent, _ExpireAt) = _Query
+        min_query := ?QUERY(ReplyTo, _, Sent, _ExpireAt) = _Query
     },
     Result
 ) ->
@@ -1020,46 +1060,95 @@ do_handle_async_reply(
         ref => Ref,
         result => Result
     }),
-
     case Action of
         nack ->
             %% Keep retrying.
-            mark_inflight_as_retriable(InflightTID, Ref),
-            ?MODULE:block(Pid);
+            ok = mark_inflight_as_retriable(InflightTID, Ref),
+            ok = ?MODULE:block(Pid),
+            blocked;
         ack ->
-            do_ack(InflightTID, Ref, Id, Index, PostFn, Pid, QueryOpts)
+            ok = do_async_ack(InflightTID, Ref, Id, Index, PostFn, QueryOpts)
     end.
 
 handle_async_batch_reply(
     #{
-        buffer_worker := Pid,
-        resource_id := Id,
-        worker_index := Index,
         inflight_tid := InflightTID,
         request_ref := Ref,
-        batch := Batch
+        query_opts := Opts
+    } = ReplyContext,
+    Result
+) ->
+    case maybe_handle_unknown_async_reply(InflightTID, Ref, Opts) of
+        discard ->
+            ok;
+        continue ->
+            ?ENSURE_ASYNC_FLUSH(InflightTID, handle_async_batch_reply1(ReplyContext, Result))
+    end.
+
+handle_async_batch_reply1(
+    #{
+        inflight_tid := InflightTID,
+        request_ref := Ref,
+        min_batch := Batch
     } = ReplyContext,
     Result
 ) ->
     ?tp(
         handle_async_reply_enter,
-        #{batch_or_query => Batch, ref => Ref}
+        #{batch_or_query => Batch, ref => Ref, result => Result}
     ),
     Now = now_(),
     case sieve_expired_requests(Batch, Now) of
-        all_expired ->
-            IsFullBefore = is_inflight_full(InflightTID),
-            IsAcked = ack_inflight(InflightTID, Ref, Id, Index),
-            IsAcked andalso emqx_resource_metrics:late_reply_inc(Id),
-            IsFullBefore andalso ?MODULE:flush_worker(Pid),
-            ?tp(handle_async_reply_expired, #{expired => Batch}),
+        {_NotExpired, []} ->
+            %% this is the critical code path,
+            %% we try not to do ets:lookup in this case
+            %% because the batch can be quite big
+            do_handle_async_batch_reply(ReplyContext, Result);
+        {_NotExpired, _Expired} ->
+            %% at least one is expired
+            %% the batch from reply context is minimized, so it cannot be used
+            %% to update the inflight items, hence discard Batch and lookup the RealBatch
+            ?tp(handle_async_reply_expired, #{expired => _Expired}),
+            handle_async_batch_reply2(ets:lookup(InflightTID, Ref), ReplyContext, Result, Now)
+    end.
+
+handle_async_batch_reply2([], _, _, _) ->
+    %% this usually should never happen unless the async callback is being evaluated concurrently
+    ok;
+handle_async_batch_reply2([Inflight], ReplyContext, Result, Now) ->
+    ?INFLIGHT_ITEM(_, RealBatch, _IsRetriable, _WorkerMRef) = Inflight,
+    #{
+        resource_id := Id,
+        worker_index := Index,
+        inflight_tid := InflightTID,
+        request_ref := Ref,
+        min_batch := Batch
+    } = ReplyContext,
+    %% All batch items share the same HasBeenSent flag
+    %% So we just take the original flag from the ReplyContext batch
+    %% and put it back to the batch found in inflight table
+    %% which must have already been set to `false`
+    [?QUERY(_ReplyTo, _, HasBeenSent, _ExpireAt) | _] = Batch,
+    {RealNotExpired0, RealExpired} = sieve_expired_requests(RealBatch, Now),
+    RealNotExpired =
+        lists:map(
+            fun(?QUERY(ReplyTo, CoreReq, _HasBeenSent, ExpireAt)) ->
+                ?QUERY(ReplyTo, CoreReq, HasBeenSent, ExpireAt)
+            end,
+            RealNotExpired0
+        ),
+    NumExpired = length(RealExpired),
+    emqx_resource_metrics:late_reply_inc(Id, NumExpired),
+    case RealNotExpired of
+        [] ->
+            %% all expired, no need to update back the inflight batch
+            _ = ack_inflight(InflightTID, Ref, Id, Index),
             ok;
-        {NotExpired, Expired} ->
-            NumExpired = length(Expired),
-            emqx_resource_metrics:late_reply_inc(Id, NumExpired),
-            NumExpired > 0 andalso
-                ?tp(handle_async_reply_expired, #{expired => Expired}),
-            do_handle_async_batch_reply(ReplyContext#{batch := NotExpired}, Result)
+        _ ->
+            %% some queries are not expired, put them back to the inflight batch
+            %% so it can be either acked now or retried later
+            ok = update_inflight_item(InflightTID, Ref, RealNotExpired, NumExpired),
+            do_handle_async_batch_reply(ReplyContext#{min_batch := RealNotExpired}, Result)
     end.
 
 do_handle_async_batch_reply(
@@ -1069,7 +1158,7 @@ do_handle_async_batch_reply(
         worker_index := Index,
         inflight_tid := InflightTID,
         request_ref := Ref,
-        batch := Batch,
+        min_batch := Batch,
         query_opts := QueryOpts
     },
     Result
@@ -1084,14 +1173,14 @@ do_handle_async_batch_reply(
     case Action of
         nack ->
             %% Keep retrying.
-            mark_inflight_as_retriable(InflightTID, Ref),
-            ?MODULE:block(Pid);
+            ok = mark_inflight_as_retriable(InflightTID, Ref),
+            ok = ?MODULE:block(Pid),
+            blocked;
         ack ->
-            do_ack(InflightTID, Ref, Id, Index, PostFn, Pid, QueryOpts)
+            ok = do_async_ack(InflightTID, Ref, Id, Index, PostFn, QueryOpts)
     end.
 
-do_ack(InflightTID, Ref, Id, Index, PostFn, WorkerPid, QueryOpts) ->
-    IsFullBefore = is_inflight_full(InflightTID),
+do_async_ack(InflightTID, Ref, Id, Index, PostFn, QueryOpts) ->
     IsKnownRef = ack_inflight(InflightTID, Ref, Id, Index),
     case maps:get(simple_query, QueryOpts, false) of
         true ->
@@ -1101,9 +1190,47 @@ do_ack(InflightTID, Ref, Id, Index, PostFn, WorkerPid, QueryOpts) ->
         false ->
             ok
     end,
-    IsFullBefore andalso ?MODULE:flush_worker(WorkerPid),
     ok.
 
+maybe_flush_after_async_reply(_WasFullBeforeReplyHandled = false) ->
+    %% inflight was not full before async reply is handled,
+    %% after it is handled, the inflight table must be even smaller
+    %% hence we can rely on the buffer worker's flush timer to trigger
+    %% the next flush
+    ?tp(skip_flushing_worker, #{}),
+    ok;
+maybe_flush_after_async_reply(_WasFullBeforeReplyHandled = true) ->
+    %% the inflight table was full before handling the async reply
+    ?tp(do_flushing_worker, #{}),
+    ok = ?MODULE:flush_worker(self()).
+
+%% check if the async reply is valid.
+%% e.g. if a connector evaluates the callback more than once:
+%% 1. If the request was previously deleted from the inflight table because
+%%    it either succeeded or expired, this function logs a
+%%    warning message and returns the 'discard' instruction.
+%% 2. If the request previously failed and is now pending a retry,
+%%    then this function returns 'continue', as there is no way to
+%%    tell whether this reply is stale or not.
+maybe_handle_unknown_async_reply(undefined, _Ref, #{simple_query := true}) ->
+    continue;
+maybe_handle_unknown_async_reply(InflightTID, Ref, #{}) ->
+    try ets:member(InflightTID, Ref) of
+        true ->
+            continue;
+        false ->
+            ?tp(
+                warning,
+                unknown_async_reply_discarded,
+                #{inflight_key => Ref}
+            ),
+            discard
+    catch
+        error:badarg ->
+            %% shutdown ?
+            discard
+    end.
+
 %%==============================================================================
 %% operations for queue
 queue_item_marshaller(Bin) when is_binary(Bin) ->
@@ -1202,10 +1329,8 @@ inflight_get_first_retriable(InflightTID, Now) ->
                     {single, Ref, Query}
             end;
         {[{Ref, Batch = [_ | _]}], _Continuation} ->
-            %% batch is non-empty because we check that in
-            %% `sieve_expired_requests'.
             case sieve_expired_requests(Batch, Now) of
-                all_expired ->
+                {[], _AllExpired} ->
                     {expired, Ref, Batch};
                 {NotExpired, Expired} ->
                     {batch, Ref, NotExpired, Expired}
@@ -1218,10 +1343,10 @@ is_inflight_full(InflightTID) ->
     [{_, MaxSize}] = ets:lookup(InflightTID, ?MAX_SIZE_REF),
     %% we consider number of batches rather than number of messages
     %% because one batch request may hold several messages.
-    Size = inflight_num_batches(InflightTID),
+    Size = inflight_count(InflightTID),
     Size >= MaxSize.
 
-inflight_num_batches(InflightTID) ->
+inflight_count(InflightTID) ->
     case ets:info(InflightTID, size) of
         undefined -> 0;
         Size -> max(0, Size - ?INFLIGHT_META_ROWS)
@@ -1243,7 +1368,7 @@ inflight_append(
     InflightItem = ?INFLIGHT_ITEM(Ref, Batch, IsRetriable, WorkerMRef),
     IsNew = ets:insert_new(InflightTID, InflightItem),
     BatchSize = length(Batch),
-    IsNew andalso ets:update_counter(InflightTID, ?SIZE_REF, {2, BatchSize}),
+    IsNew andalso inc_inflight(InflightTID, BatchSize),
     emqx_resource_metrics:inflight_set(Id, Index, inflight_num_msgs(InflightTID)),
     ?tp(buffer_worker_appended_to_inflight, #{item => InflightItem, is_new => IsNew}),
     ok;
@@ -1258,7 +1383,7 @@ inflight_append(
     Query = mark_as_sent(Query0),
     InflightItem = ?INFLIGHT_ITEM(Ref, Query, IsRetriable, WorkerMRef),
     IsNew = ets:insert_new(InflightTID, InflightItem),
-    IsNew andalso ets:update_counter(InflightTID, ?SIZE_REF, {2, 1}),
+    IsNew andalso inc_inflight(InflightTID, 1),
     emqx_resource_metrics:inflight_set(Id, Index, inflight_num_msgs(InflightTID)),
     ?tp(buffer_worker_appended_to_inflight, #{item => InflightItem, is_new => IsNew}),
     ok;
@@ -1274,6 +1399,8 @@ mark_inflight_as_retriable(undefined, _Ref) ->
     ok;
 mark_inflight_as_retriable(InflightTID, Ref) ->
     _ = ets:update_element(InflightTID, Ref, {?RETRY_IDX, true}),
+    %% the old worker's DOWN should not affect this inflight any more
+    _ = ets:update_element(InflightTID, Ref, {?WORKER_MREF_IDX, erased}),
     ok.
 
 %% Track each worker pid only once.
@@ -1317,13 +1444,18 @@ ack_inflight(InflightTID, Ref, Id, Index) ->
                 1;
             [?INFLIGHT_ITEM(Ref, [?QUERY(_, _, _, _) | _] = Batch, _IsRetriable, _WorkerMRef)] ->
                 length(Batch);
-            _ ->
+            [] ->
                 0
         end,
-    IsAcked = Count > 0,
-    IsAcked andalso ets:update_counter(InflightTID, ?SIZE_REF, {2, -Count, 0, 0}),
-    emqx_resource_metrics:inflight_set(Id, Index, inflight_num_msgs(InflightTID)),
-    IsAcked.
+    ok = dec_inflight(InflightTID, Count),
+    IsKnownRef = (Count > 0),
+    case IsKnownRef of
+        true ->
+            emqx_resource_metrics:inflight_set(Id, Index, inflight_num_msgs(InflightTID));
+        false ->
+            ok
+    end,
+    IsKnownRef.
 
 mark_inflight_items_as_retriable(Data, WorkerMRef) ->
     #{inflight_tid := InflightTID} = Data,
@@ -1341,9 +1473,18 @@ mark_inflight_items_as_retriable(Data, WorkerMRef) ->
     ok.
 
 %% used to update a batch after dropping expired individual queries.
-update_inflight_item(InflightTID, Ref, NewBatch) ->
+update_inflight_item(InflightTID, Ref, NewBatch, NumExpired) ->
     _ = ets:update_element(InflightTID, Ref, {?ITEM_IDX, NewBatch}),
-    ?tp(buffer_worker_worker_update_inflight_item, #{ref => Ref}),
+    ok = dec_inflight(InflightTID, NumExpired).
+
+inc_inflight(InflightTID, Count) ->
+    _ = ets:update_counter(InflightTID, ?SIZE_REF, {2, Count}),
+    ok.
+
+dec_inflight(_InflightTID, 0) ->
+    ok;
+dec_inflight(InflightTID, Count) when Count > 0 ->
+    _ = ets:update_counter(InflightTID, ?SIZE_REF, {2, -Count, 0, 0}),
     ok.
 
 %%==============================================================================
@@ -1453,22 +1594,12 @@ is_async_return(_) ->
     false.
 
 sieve_expired_requests(Batch, Now) ->
-    {Expired, NotExpired} =
-        lists:partition(
-            fun(?QUERY(_ReplyTo, _CoreReq, _HasBeenSent, ExpireAt)) ->
-                is_expired(ExpireAt, Now)
-            end,
-            Batch
-        ),
-    case {NotExpired, Expired} of
-        {[], []} ->
-            %% Should be impossible for batch_size >= 1.
-            all_expired;
-        {[], [_ | _]} ->
-            all_expired;
-        {[_ | _], _} ->
-            {NotExpired, Expired}
-    end.
+    lists:partition(
+        fun(?QUERY(_ReplyTo, _CoreReq, _HasBeenSent, ExpireAt)) ->
+            not is_expired(ExpireAt, Now)
+        end,
+        Batch
+    ).
 
 -spec is_expired(infinity | integer(), integer()) -> boolean().
 is_expired(infinity = _ExpireAt, _Now) ->

+ 49 - 10
apps/emqx_resource/test/emqx_connector_demo.erl

@@ -135,11 +135,11 @@ on_query(_InstId, get_counter, #{pid := Pid}) ->
     after 1000 ->
         {error, timeout}
     end;
-on_query(_InstId, {sleep, For}, #{pid := Pid}) ->
+on_query(_InstId, {sleep_before_reply, For}, #{pid := Pid}) ->
     ?tp(connector_demo_sleep, #{mode => sync, for => For}),
     ReqRef = make_ref(),
     From = {self(), ReqRef},
-    Pid ! {From, {sleep, For}},
+    Pid ! {From, {sleep_before_reply, For}},
     receive
         {ReqRef, Result} ->
             Result
@@ -159,9 +159,9 @@ on_query_async(_InstId, block_now, ReplyFun, #{pid := Pid}) ->
 on_query_async(_InstId, {big_payload, Payload}, ReplyFun, #{pid := Pid}) ->
     Pid ! {big_payload, Payload, ReplyFun},
     {ok, Pid};
-on_query_async(_InstId, {sleep, For}, ReplyFun, #{pid := Pid}) ->
+on_query_async(_InstId, {sleep_before_reply, For}, ReplyFun, #{pid := Pid}) ->
     ?tp(connector_demo_sleep, #{mode => async, for => For}),
-    Pid ! {{sleep, For}, ReplyFun},
+    Pid ! {{sleep_before_reply, For}, ReplyFun},
     {ok, Pid}.
 
 on_batch_query(InstId, BatchReq, State) ->
@@ -173,10 +173,13 @@ on_batch_query(InstId, BatchReq, State) ->
         get_counter ->
             batch_get_counter(sync, InstId, State);
         {big_payload, _Payload} ->
-            batch_big_payload(sync, InstId, BatchReq, State)
+            batch_big_payload(sync, InstId, BatchReq, State);
+        {random_reply, Num} ->
+            %% async batch retried
+            make_random_reply(Num)
     end.
 
-on_batch_query_async(InstId, BatchReq, ReplyFunAndArgs, State) ->
+on_batch_query_async(InstId, BatchReq, ReplyFunAndArgs, #{pid := Pid} = State) ->
     %% Requests can be of multiple types, but cannot be mixed.
     case hd(BatchReq) of
         {inc_counter, _} ->
@@ -186,7 +189,11 @@ on_batch_query_async(InstId, BatchReq, ReplyFunAndArgs, State) ->
         block_now ->
             on_query_async(InstId, block_now, ReplyFunAndArgs, State);
         {big_payload, _Payload} ->
-            batch_big_payload({async, ReplyFunAndArgs}, InstId, BatchReq, State)
+            batch_big_payload({async, ReplyFunAndArgs}, InstId, BatchReq, State);
+        {random_reply, Num} ->
+            %% taking only the first Num in the batch should be random enough
+            Pid ! {{random_reply, Num}, ReplyFunAndArgs},
+            {ok, Pid}
     end.
 
 batch_inc_counter(CallMode, InstId, BatchReq, State) ->
@@ -299,16 +306,33 @@ counter_loop(
             {{FromPid, ReqRef}, get} ->
                 FromPid ! {ReqRef, Num},
                 State;
-            {{sleep, _} = SleepQ, ReplyFun} ->
+            {{random_reply, RandNum}, ReplyFun} ->
+                %% usually a well-behaved connector should reply once and only once for
+                %% each (batch) request,
+                %% but here we reply with random results a random number of times;
+                %% with 'ok' among the results, the buffer worker should eventually
+                %% drain the buffer (and the inflight table)
+                ReplyCount = 1 + (RandNum rem 3),
+                Results = make_random_replies(ReplyCount),
+                %% add a delay to trigger inflight full
+                lists:foreach(
+                    fun(Result) ->
+                        timer:sleep(rand:uniform(5)),
+                        apply_reply(ReplyFun, Result)
+                    end,
+                    Results
+                ),
+                State;
+            {{sleep_before_reply, _} = SleepQ, ReplyFun} ->
                 apply_reply(ReplyFun, handle_query(async, SleepQ, Status)),
                 State;
-            {{FromPid, ReqRef}, {sleep, _} = SleepQ} ->
+            {{FromPid, ReqRef}, {sleep_before_reply, _} = SleepQ} ->
                 FromPid ! {ReqRef, handle_query(sync, SleepQ, Status)},
                 State
         end,
     counter_loop(NewState).
 
-handle_query(Mode, {sleep, For} = Query, Status) ->
+handle_query(Mode, {sleep_before_reply, For} = Query, Status) ->
     ok = timer:sleep(For),
     Result =
         case Status of
@@ -329,3 +353,18 @@ maybe_register(_Name, _Pid, false) ->
 
 apply_reply({ReplyFun, Args}, Result) when is_function(ReplyFun) ->
     apply(ReplyFun, Args ++ [Result]).
+
+make_random_replies(0) ->
+    [];
+make_random_replies(N) ->
+    [make_random_reply(N) | make_random_replies(N - 1)].
+
+make_random_reply(N) ->
+    case rand:uniform(3) of
+        1 ->
+            {ok, N};
+        2 ->
+            {error, {recoverable_error, N}};
+        3 ->
+            {error, {unrecoverable_error, N}}
+    end.

+ 110 - 18
apps/emqx_resource/test/emqx_resource_SUITE.erl

@@ -1482,7 +1482,7 @@ t_retry_async_inflight_full(_Config) ->
                         AsyncInflightWindow * 2,
                         fun() ->
                             For = (ResumeInterval div 4) + rand:uniform(ResumeInterval div 4),
-                            {sleep, For}
+                            {sleep_before_reply, For}
                         end,
                         #{async_reply_fun => {fun(Res) -> ct:pal("Res = ~p", [Res]) end, []}}
                     ),
@@ -1507,6 +1507,59 @@ t_retry_async_inflight_full(_Config) ->
     ?assertEqual(0, emqx_resource_metrics:inflight_get(?ID)),
     ok.
 
+%% this test case ensures that the buffer worker keeps working correctly even
+%% if the underlying connector misbehaves by evaluating async callbacks multiple times
+t_async_reply_multi_eval(_Config) ->
+    ResumeInterval = 5,
+    TotalTime = 5_000,
+    AsyncInflightWindow = 3,
+    TotalQueries = AsyncInflightWindow * 5,
+    emqx_connector_demo:set_callback_mode(async_if_possible),
+    {ok, _} = emqx_resource:create(
+        ?ID,
+        ?DEFAULT_RESOURCE_GROUP,
+        ?TEST_RESOURCE,
+        #{name => ?FUNCTION_NAME},
+        #{
+            query_mode => async,
+            async_inflight_window => AsyncInflightWindow,
+            batch_size => 3,
+            batch_time => 10,
+            worker_pool_size => 1,
+            resume_interval => ResumeInterval
+        }
+    ),
+    %% block
+    ok = emqx_resource:simple_sync_query(?ID, block),
+    inc_counter_in_parallel(
+        TotalQueries,
+        fun() ->
+            Rand = rand:uniform(1000),
+            {random_reply, Rand}
+        end,
+        #{}
+    ),
+    ?retry(
+        ResumeInterval,
+        TotalTime div ResumeInterval,
+        begin
+            Metrics = tap_metrics(?LINE),
+            #{
+                counters := Counters,
+                gauges := #{queuing := 0, inflight := 0}
+            } = Metrics,
+            #{
+                matched := Matched,
+                success := Success,
+                dropped := Dropped,
+                late_reply := LateReply,
+                failed := Failed
+            } = Counters,
+            ?assertEqual(TotalQueries, Matched - 1),
+            ?assertEqual(Matched, Success + Dropped + LateReply + Failed)
+        end
+    ).
+
 t_retry_async_inflight_batch(_Config) ->
     ResumeInterval = 1_000,
     emqx_connector_demo:set_callback_mode(async_if_possible),
@@ -1944,7 +1997,7 @@ t_expiration_async_batch_after_reply(_Config) ->
         #{name => test_resource},
         #{
             query_mode => async,
-            batch_size => 2,
+            batch_size => 3,
             batch_time => 100,
             worker_pool_size => 1,
             resume_interval => 2_000
@@ -1959,7 +2012,7 @@ do_t_expiration_async_after_reply(IsBatch) ->
             NAcks =
                 case IsBatch of
                     batch -> 1;
-                    single -> 2
+                    single -> 3
                 end,
             ?force_ordering(
                 #{?snk_kind := buffer_worker_flush_ack},
@@ -1980,6 +2033,10 @@ do_t_expiration_async_after_reply(IsBatch) ->
                 ok,
                 emqx_resource:query(?ID, {inc_counter, 199}, #{timeout => TimeoutMS})
             ),
+            ?assertEqual(
+                ok,
+                emqx_resource:query(?ID, {inc_counter, 299}, #{timeout => TimeoutMS})
+            ),
             ?assertEqual(
                 ok, emqx_resource:query(?ID, {inc_counter, 99}, #{timeout => infinity})
             ),
@@ -1997,30 +2054,44 @@ do_t_expiration_async_after_reply(IsBatch) ->
             {ok, _} = ?block_until(
                 #{?snk_kind := handle_async_reply_expired}, 10 * TimeoutMS
             ),
+            wait_telemetry_event(success, #{n_events => 1, timeout => 4_000}),
 
             unlink(Pid0),
             exit(Pid0, kill),
             ok
         end,
         fun(Trace) ->
-            ?assertMatch(
-                [
-                    #{
-                        expired := [{query, _, {inc_counter, 199}, _, _}]
-                    }
-                ],
-                ?of_kind(handle_async_reply_expired, Trace)
-            ),
-            wait_telemetry_event(success, #{n_events => 1, timeout => 4_000}),
+            case IsBatch of
+                batch ->
+                    ?assertMatch(
+                        [
+                            #{
+                                expired := [
+                                    {query, _, {inc_counter, 199}, _, _},
+                                    {query, _, {inc_counter, 299}, _, _}
+                                ]
+                            }
+                        ],
+                        ?of_kind(handle_async_reply_expired, Trace)
+                    );
+                single ->
+                    ?assertMatch(
+                        [
+                            #{expired := [{query, _, {inc_counter, 199}, _, _}]},
+                            #{expired := [{query, _, {inc_counter, 299}, _, _}]}
+                        ],
+                        ?of_kind(handle_async_reply_expired, Trace)
+                    )
+            end,
             Metrics = tap_metrics(?LINE),
             ?assertMatch(
                 #{
                     counters := #{
-                        matched := 2,
+                        matched := 3,
                         %% the request with infinity timeout.
                         success := 1,
                         dropped := 0,
-                        late_reply := 1,
+                        late_reply := 2,
                         retried := 0,
                         failed := 0
                     }
@@ -2042,7 +2113,7 @@ t_expiration_batch_all_expired_after_reply(_Config) ->
         #{name => test_resource},
         #{
             query_mode => async,
-            batch_size => 2,
+            batch_size => 3,
             batch_time => 100,
             worker_pool_size => 1,
             resume_interval => ResumeInterval
@@ -2067,6 +2138,10 @@ t_expiration_batch_all_expired_after_reply(_Config) ->
                 ok,
                 emqx_resource:query(?ID, {inc_counter, 199}, #{timeout => TimeoutMS})
             ),
+            ?assertEqual(
+                ok,
+                emqx_resource:query(?ID, {inc_counter, 299}, #{timeout => TimeoutMS})
+            ),
             Pid0 =
                 spawn_link(fun() ->
                     ?tp(delay_enter, #{}),
@@ -2087,7 +2162,10 @@ t_expiration_batch_all_expired_after_reply(_Config) ->
             ?assertMatch(
                 [
                     #{
-                        expired := [{query, _, {inc_counter, 199}, _, _}]
+                        expired := [
+                            {query, _, {inc_counter, 199}, _, _},
+                            {query, _, {inc_counter, 299}, _, _}
+                        ]
                     }
                 ],
                 ?of_kind(handle_async_reply_expired, Trace)
@@ -2096,12 +2174,16 @@ t_expiration_batch_all_expired_after_reply(_Config) ->
             ?assertMatch(
                 #{
                     counters := #{
-                        matched := 1,
+                        matched := 2,
                         success := 0,
                         dropped := 0,
-                        late_reply := 1,
+                        late_reply := 2,
                         retried := 0,
                         failed := 0
+                    },
+                    gauges := #{
+                        inflight := 0,
+                        queuing := 0
                     }
                 },
                 Metrics
@@ -2217,6 +2299,16 @@ do_t_expiration_retry(IsBatch) ->
                 [#{expired := [{query, _, {inc_counter, 1}, _, _}]}],
                 ?of_kind(buffer_worker_retry_expired, Trace)
             ),
+            Metrics = tap_metrics(?LINE),
+            ?assertMatch(
+                #{
+                    gauges := #{
+                        inflight := 0,
+                        queuing := 0
+                    }
+                },
+                Metrics
+            ),
             ok
         end
     ),

+ 1 - 1
apps/emqx_retainer/src/emqx_retainer_api.erl

@@ -166,7 +166,7 @@ config(put, #{body := Body}) ->
 %%------------------------------------------------------------------------------
 lookup_retained(get, #{query_string := Qs}) ->
     Page = maps:get(<<"page">>, Qs, 1),
-    Limit = maps:get(<<"limit">>, Qs, emqx_mgmt:max_row_limit()),
+    Limit = maps:get(<<"limit">>, Qs, emqx_mgmt:default_row_limit()),
     {ok, Msgs} = emqx_retainer_mnesia:page_read(undefined, undefined, Page, Limit),
     {200, #{
         data => [format_message(Msg) || Msg <- Msgs],

+ 3 - 3
apps/emqx_retainer/src/emqx_retainer_schema.erl

@@ -41,13 +41,13 @@ fields("retainer") ->
             sc(
                 emqx_schema:duration_ms(),
                 msg_expiry_interval,
-                "0s"
+                <<"0s">>
             )},
         {msg_clear_interval,
             sc(
                 emqx_schema:duration_ms(),
                 msg_clear_interval,
-                "0s"
+                <<"0s">>
             )},
         {flow_control,
             sc(
@@ -59,7 +59,7 @@ fields("retainer") ->
             sc(
                 emqx_schema:bytesize(),
                 max_payload_size,
-                "1MB"
+                <<"1MB">>
             )},
         {stop_publish_clear_msg,
             sc(

+ 1 - 1
apps/emqx_rule_engine/src/emqx_rule_engine_schema.erl

@@ -51,7 +51,7 @@ fields("rule_engine") ->
             ?HOCON(
                 emqx_schema:duration_ms(),
                 #{
-                    default => "10s",
+                    default => <<"10s">>,
                     desc => ?DESC("rule_engine_jq_function_default_timeout")
                 }
             )},

+ 1 - 1
apps/emqx_slow_subs/src/emqx_slow_subs.app.src

@@ -1,7 +1,7 @@
 {application, emqx_slow_subs, [
     {description, "EMQX Slow Subscribers Statistics"},
     % strict semver, bump manually!
-    {vsn, "1.0.2"},
+    {vsn, "1.0.3"},
     {modules, []},
     {registered, [emqx_slow_subs_sup]},
     {applications, [kernel, stdlib, emqx]},

+ 2 - 2
apps/emqx_slow_subs/src/emqx_slow_subs_schema.erl

@@ -30,13 +30,13 @@ fields("slow_subs") ->
         {threshold,
             sc(
                 emqx_schema:duration_ms(),
-                "500ms",
+                <<"500ms">>,
                 threshold
             )},
         {expire_interval,
             sc(
                 emqx_schema:duration_ms(),
-                "300s",
+                <<"300s">>,
                 expire_interval
             )},
         {top_k_num,

+ 3 - 3
apps/emqx_statsd/src/emqx_statsd_api.erl

@@ -77,9 +77,9 @@ statsd_config_schema() ->
 statsd_example() ->
     #{
         enable => true,
-        flush_time_interval => "30s",
-        sample_time_interval => "30s",
-        server => "127.0.0.1:8125",
+        flush_time_interval => <<"30s">>,
+        sample_time_interval => <<"30s">>,
+        server => <<"127.0.0.1:8125">>,
         tags => #{}
     }.
 

+ 2 - 2
apps/emqx_statsd/src/emqx_statsd_schema.erl

@@ -61,12 +61,12 @@ server() ->
     emqx_schema:servers_sc(Meta, ?SERVER_PARSE_OPTS).
 
 sample_interval(type) -> emqx_schema:duration_ms();
-sample_interval(default) -> "30s";
+sample_interval(default) -> <<"30s">>;
 sample_interval(desc) -> ?DESC(?FUNCTION_NAME);
 sample_interval(_) -> undefined.
 
 flush_interval(type) -> emqx_schema:duration_ms();
-flush_interval(default) -> "30s";
+flush_interval(default) -> <<"30s">>;
 flush_interval(desc) -> ?DESC(?FUNCTION_NAME);
 flush_interval(_) -> undefined.
 

+ 9 - 3
bin/emqx

@@ -545,8 +545,12 @@ else
                 logerr "Make sure environment variable EMQX_NODE__NAME is set to indicate for which node this command is intended."
                 exit 1
             fi
+        else
+            if [ -n "${EMQX_NODE__NAME:-}" ]; then
+                die "Node $EMQX_NODE__NAME is not running?"
+            fi
         fi
-        ## We have no choiece but to read the bootstrap config (with environment overrides available in the current shell)
+        ## We have no choice but to read the bootstrap config (with environment overrides available in the current shell)
         [ -f "$EMQX_ETC_DIR"/emqx.conf ] || die "emqx.conf is not found in $EMQX_ETC_DIR" 1
         maybe_use_portable_dynlibs
         EMQX_BOOT_CONFIGS="$(call_hocon -s "$SCHEMA_MOD" -c "$EMQX_ETC_DIR"/emqx.conf multi_get "${CONF_KEYS[@]}")"
@@ -940,9 +944,11 @@ if [ -n "${EMQX_NODE_COOKIE:-}" ]; then
     unset EMQX_NODE_COOKIE
 fi
 COOKIE="${EMQX_NODE__COOKIE:-}"
-if [ -z "$COOKIE" ]; then
-    COOKIE="$(get_boot_config 'node.cookie')"
+COOKIE_IN_USE="$(get_boot_config 'node.cookie')"
+if [ -n "$COOKIE_IN_USE" ] && [ -n "$COOKIE" ] && [ "$COOKIE" != "$COOKIE_IN_USE" ]; then
+    die "EMQX_NODE__COOKIE is different from the cookie used by $NAME"
 fi
+[ -z "$COOKIE" ] && COOKIE="$COOKIE_IN_USE"
 [ -z "$COOKIE" ] && COOKIE="$EMQX_DEFAULT_ERLANG_COOKIE"
 
 maybe_warn_default_cookie() {

+ 10 - 0
build

@@ -233,6 +233,9 @@ make_tgz() {
         macos*)
             target_name="${PROFILE}-${full_vsn}.zip"
             ;;
+        windows*)
+            target_name="${PROFILE}-${full_vsn}.zip"
+            ;;
         *)
             target_name="${PROFILE}-${full_vsn}.tar.gz"
             ;;
@@ -298,6 +301,13 @@ make_tgz() {
             # sha256sum may not be available on macos
             openssl dgst -sha256 "${target}" | cut -d ' ' -f 2  > "${target}.sha256"
             ;;
+        windows*)
+            pushd "${tard}" >/dev/null
+            7z a "${target_name}" ./emqx/* >/dev/null
+            popd >/dev/null
+            mv "${tard}/${target_name}" "${target}"
+            sha256sum "${target}" | head -c 64 > "${target}.sha256"
+            ;;
         *)
             ## create tar after change dir
             ## to avoid creating an extra level of 'emqx' dir in the .tar.gz file
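
The `windows*` branch above writes only the hex digest to the `.sha256` file. A minimal sketch of that step in isolation, assuming GNU `sha256sum` is available; the helper name `write_sha256` is hypothetical and is not part of the `build` script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the build script): write the bare SHA-256
# hex digest of a file to "<file>.sha256".
write_sha256() {
    local target="$1"
    # sha256sum prints "<64 hex chars>  <filename>";
    # head -c 64 keeps only the digest and drops the file name.
    sha256sum "$target" | head -c 64 > "${target}.sha256"
}
```

Because `head -c 64` cuts the output right after the digest, the resulting file has no trailing newline and no file name, so it can be compared directly against a digest string.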

+ 1 - 0
changes/ce/feat-10019.en.md

@@ -0,0 +1 @@
+Add low level tuning settings for QUIC listeners.

+ 1 - 0
changes/ce/feat-10019.zh.md

@@ -0,0 +1 @@
+为 QUIC 侦听器添加更多底层调优选项。

+ 1 - 0
changes/ce/feat-9213.en.md

@@ -0,0 +1 @@
+Add pod disruption budget to helm chart

+ 1 - 0
changes/ce/feat-9213.zh.md

@@ -0,0 +1 @@
+在舵手图中添加吊舱干扰预算。

+ 2 - 0
changes/ce/feat-9949.en.md

@@ -0,0 +1,2 @@
+QUIC transport Multistreams support and QUIC TLS cacert support.
+

+ 1 - 0
changes/ce/feat-9949.zh.md

@@ -0,0 +1 @@
+QUIC 传输多流支持和 QUIC TLS cacert 支持。

+ 1 - 0
changes/ce/fix-10009.en.md

@@ -0,0 +1 @@
+Validate that the `bytes` parameter of `GET /trace/:name/log` does not exceed the maximum signed 32-bit integer.

+ 1 - 0
changes/ce/fix-10009.zh.md

@@ -0,0 +1 @@
+验证 `GET /trace/:name/log` 的 `bytes` 参数,使其不超过有符号的32位整数。

+ 7 - 0
changes/ce/fix-10015.en.md

@@ -0,0 +1,7 @@
+To prevent errors caused by an incorrect EMQX node cookie provided from an environment variable,
+we have implemented a fail-fast mechanism.
+Previously, when an incorrect cookie was provided, the command would still attempt to ping the node,
+leading to the error message 'Node xxx not responding to pings'.
+With the new implementation, if a mismatched cookie is detected,
+a message will be logged to indicate that the cookie is incorrect,
+and the command will terminate with an error code of 1 without trying to ping the node.
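
The fail-fast behaviour described in this changelog entry can be sketched as a small shell function. This is an illustration under stated assumptions, not the actual `bin/emqx` code: the function name `resolve_cookie` and its three positional arguments are hypothetical, and the real script calls `die` instead of returning a status.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the cookie fail-fast check.
#   $1: cookie taken from the environment (may be empty)
#   $2: cookie the node is configured with (may be empty)
#   $3: fallback default cookie
# Prints the cookie to use, or returns 1 when the two cookies disagree.
resolve_cookie() {
    local env_cookie="$1" in_use="$2" default_cookie="$3"
    # Fail fast only when both cookies are known and they differ.
    if [ -n "$in_use" ] && [ -n "$env_cookie" ] && [ "$env_cookie" != "$in_use" ]; then
        echo "cookie from the environment differs from the cookie in use" >&2
        return 1
    fi
    # Otherwise prefer the environment, then the configured cookie, then the default.
    local cookie="$env_cookie"
    [ -z "$cookie" ] && cookie="$in_use"
    [ -z "$cookie" ] && cookie="$default_cookie"
    echo "$cookie"
}
```

The mismatch is reported before any attempt to contact the node, which is what replaces the earlier 'not responding to pings' message with a direct explanation.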

+ 4 - 0
changes/ce/fix-10015.zh.md

@@ -0,0 +1,4 @@
+在 cookie 给错时,快速失败。
+在此修复前,即使 cookie 配置错误,emqx 命令仍然会尝试去 ping EMQX 节点,
+并得到一个 "Node xxx not responding to pings" 的错误。
+修复后,如果发现 cookie 不一致,立即打印不一致的错误信息并退出。

+ 1 - 0
changes/ce/fix-10020.en.md

@@ -0,0 +1 @@
+Fix bridge metrics when running in async mode with batching enabled (`batch_size` > 1).

+ 1 - 0
changes/ce/fix-10020.zh.md

@@ -0,0 +1 @@
+修复使用异步和批量配置的桥接计数不准确的问题。

+ 1 - 0
changes/ce/fix-10021.en.md

@@ -0,0 +1 @@
+Fix error message when the target node of `emqx_ctl cluster join` command is not running.

+ 1 - 0
changes/ce/fix-10021.zh.md

@@ -0,0 +1 @@
+修正当`emqx_ctl cluster join`命令的目标节点未运行时的错误信息。

+ 3 - 0
changes/ce/fix-9939.en.md

@@ -0,0 +1,3 @@
+Allow the 'emqx ctl cluster' command to be issued before Mnesia starts.
+Prior to this change, EMQX `replicant` nodes could not use the `manual` discovery strategy.
+Now it is possible to join a cluster using the `manual` strategy.

+ 2 - 0
changes/ce/fix-9939.zh.md

@@ -0,0 +1,2 @@
+允许 'emqx ctl cluster join' 命令在 Mnesia 启动前就可以调用。
+在此修复前, EMQX 的 `replicant` 类型节点无法使用 `manual` 集群发现策略。

+ 1 - 0
changes/ce/fix-9997.en.md

@@ -0,0 +1 @@
+Fix Swagger API schema generation. The `deprecated` metadata field is now always a boolean, as the [Swagger specification](https://swagger.io/specification/) suggests.

+ 1 - 0
changes/ce/fix-9997.zh.md

@@ -0,0 +1 @@
+修复 Swagger API 生成时,`deprecated` 元数据字段未按照[标准](https://swagger.io/specification/)建议的那样始终为布尔值的问题。

+ 1 - 0
changes/ee/feat-10011.en.md

@@ -0,0 +1 @@
+Add pod disruption budget to helm chart

+ 1 - 0
changes/ee/feat-10011.zh.md

@@ -0,0 +1 @@
+在舵手图中添加吊舱干扰预算。

changes/ee/feat-9932-en.md → changes/ee/feat-9932.en.md


changes/ee/feat-9932-zh.md → changes/ee/feat-9932.zh.md


+ 5 - 0
changes/ee/fix-10007.en.md

@@ -0,0 +1,5 @@
+Change Kafka bridge's config `memory_overload_protection` default value from `true` to `false`.
+EMQX logs cases when messages get dropped due to overload protection, and this is also reflected in counters.
+However, since there is by default no alerting based on the logs and counters,
+setting it to `true` may cause messages to be dropped unnoticed.
+For the time being, the better option is to let sysadmins set it explicitly so they are fully aware of the benefits and risks.

+ 3 - 0
changes/ee/fix-10007.zh.md

@@ -0,0 +1,3 @@
+Kafka 桥接的配置参数 `memory_overload_protection` 默认值从 `true` 改成了 `false`。
+尽管内存过载后消息被丢弃会产生日志和计数,如果没有基于这些日志或计数的告警,系统管理员可能无法及时发现消息被丢弃。
+当前更好的选择是:让管理员显式的配置该项,迫使他们理解这个配置的好处以及风险。

+ 2 - 0
changes/v5.0.18/fix-9966.en.md

@@ -0,0 +1,2 @@
+Add two new Erlang apps, 'tools' and 'covertool', to the release,
+so that we can run profiling and test coverage analysis on release packages.

+ 2 - 0
changes/v5.0.18/fix-9966.zh.md

@@ -0,0 +1,2 @@
+在发布包中增加了2个新的 Erlang app,分别是 ‘tools’ 和 ‘covertool’。
+这两个 app 可以用于性能和测试覆盖率的分析。

+ 3 - 0
deploy/charts/README.md

@@ -0,0 +1,3 @@
+# Sync changes to emqx-enterprise
+
+When making changes in charts, please update `emqx` charts and run `./sync-enterprise.sh`.

+ 28 - 12
deploy/charts/emqx-enterprise/README.md

@@ -40,7 +40,7 @@ The following table lists the configurable parameters of the emqx chart and thei
 | Parameter | Description | Default Value |
 |--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
 | `replicaCount` | It is recommended to have odd number of nodes in a cluster, otherwise the emqx cluster cannot be automatically healed in case of net-split. | 3 |
-| `image.repository` | EMQX Image name | `emqx/emqx-enterprise` |
+| `image.repository` | EMQX Image name | emqx/emqx-enterprise |
 | `image.pullPolicy` | The image pull policy | IfNotPresent |
 | `image.pullSecrets ` | The image pull secrets | `[]` (does not add image pull secrets to deployed pods) |
 | `serviceAccount.create` | If `true`, create a new service account | `true` |
@@ -68,28 +68,30 @@ The following table lists the configurable parameters of the emqx chart and thei
 | `service.dashboard` | Port for dashboard and API. | 18083 |
 | `service.nodePorts.mqtt` | Kubernetes node port for MQTT. | nil |
 | `service.nodePorts.mqttssl` | Kubernetes node port for MQTT(SSL). | nil |
-| `service.nodePorts.mgmt` | Kubernetes node port for mgmt API. | nil |
 | `service.nodePorts.ws` | Kubernetes node port for WebSocket/HTTP. | nil |
 | `service.nodePorts.wss` | Kubernetes node port for WSS/HTTPS. | nil |
 | `service.nodePorts.dashboard` | Kubernetes node port for dashboard. | nil |
 | `service.loadBalancerIP` | loadBalancerIP for Service | nil |
 | `service.loadBalancerSourceRanges` | Address(es) that are allowed when service is LoadBalancer | [] |
 | `service.externalIPs` | ExternalIPs for the service | [] |
-`service.externalTrafficPolicy` |	External Traffic Policy for the service |	`Cluster`
+| `service.externalTrafficPolicy` | External Traffic Policy for the service | `Cluster` |
 | `service.annotations` | Service annotations | {}(evaluated as a template) |
 | `ingress.dashboard.enabled` | Enable ingress for EMQX Dashboard | false |
 | `ingress.dashboard.ingressClassName` | Set the ingress class for EMQX Dashboard | |
 | `ingress.dashboard.path` | Ingress path for EMQX Dashboard | / |
 | `ingress.dashboard.pathType` | Ingress pathType for EMQX Dashboard | `ImplementationSpecific` |
-| `ingress.dashboard.hosts` | Ingress hosts for EMQX Mgmt API | dashboard.emqx.local |
-| `ingress.dashboard.tls` | Ingress tls for EMQX Mgmt API | [] |
-| `ingress.dashboard.annotations` | Ingress annotations for EMQX Mgmt API | {} |
-| `ingress.mgmt.enabled` | Enable ingress for EMQX Mgmt API | false |
-| `ingress.dashboard.ingressClassName` | Set the ingress class for EMQX Mgmt API | |
-| `ingress.mgmt.path` | Ingress path for EMQX Mgmt API | / |
-| `ingress.mgmt.hosts` | Ingress hosts for EMQX Mgmt API | api.emqx.local |
-| `ingress.mgmt.tls` | Ingress tls for EMQX Mgmt API | [] |
-| `ingress.mgmt.annotations` | Ingress annotations for EMQX Mgmt API | {} |
+| `ingress.dashboard.hosts` | Ingress hosts for EMQX Dashboard | dashboard.emqx.local |
+| `ingress.dashboard.tls` | Ingress tls for EMQX Dashboard | [] |
+| `ingress.dashboard.annotations` | Ingress annotations for EMQX Dashboard | {} |
+| `ingress.dashboard.ingressClassName` | Set the ingress class for EMQX Dashboard | |
+| `ingress.mqtt.enabled` | Enable ingress for MQTT | false |
+| `ingress.mqtt.ingressClassName` | Set the ingress class for MQTT | |
+| `ingress.mqtt.path` | Ingress path for MQTT | / |
+| `ingress.mqtt.pathType` | Ingress pathType for MQTT | `ImplementationSpecific` |
+| `ingress.mqtt.hosts` | Ingress hosts for MQTT | mqtt.emqx.local |
+| `ingress.mqtt.tls` | Ingress tls for MQTT | [] |
+| `ingress.mqtt.annotations` | Ingress annotations for MQTT | {} |
+| `ingress.mqtt.ingressClassName` | Set the ingress class for MQTT | |
 | `metrics.enable` | If set to true, [prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) needs to be installed, and emqx_prometheus needs to enable | false |
 | `metrics.type` | Now we only supported "prometheus" | "prometheus" |
 | `ssl.enabled` | Enable SSL support | false |
@@ -121,3 +123,17 @@ which needs to explicitly configured by either changing the emqx config file or
 
 If you chose to use an existing certificate, make sure, you update the filenames accordingly.
 
+## Tips
+Enable the Proxy Protocol V1/2 if the EMQX cluster is deployed behind HAProxy or Nginx.
+In order to preserve the original client's IP address, you could change the emqx config by passing the following environment variable:
+
+```
+EMQX_LISTENERS__TCP__DEFAULT__PROXY_PROTOCOL: "true"
+```
+
+With HAProxy you also need the following ingress annotation:
+
+```
+haproxy-ingress.github.io/proxy-protocol: "v2"
+```
+

+ 0 - 0
deploy/charts/emqx-enterprise/templates/ingress.yaml


Some files are not shown because too many files were changed.