Fix slow ASR recording startup by adding a persistent mic buffer

hwt 1 week ago
parent
commit
a5694aaa72

+ 133 - 0
brain/PlannerNode2/config.md

@@ -0,0 +1,133 @@
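+# 1. Basic metadata
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| version | Config version | `1.0.36` | string | Current config version, used to detect whether the config has been updated | Read-only display |
+| timestamp | Config generation time | `2026-05-13T11:18:28Z` | string/datetime | Time at which the config node generated this config | Read-only display |
+| source | Config source | `config_node (mock)` | string | Node or module that generated the config | Read-only display |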
+
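+# 2. ASR speech recognition configuration
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| asr.VAD_MODE | VAD sensitivity | `2` | int | Voice activity detection aggressiveness; with WebRTC VAD the range is 0-3 and higher values filter out non-speech more aggressively | Number input / dropdown |
+| asr.sample_rate | ASR sample rate | `16000` | int | Sample rate of the recorded audio in Hz; 16000 is typical | Number input |
+| asr.frame_duration_ms | VAD frame size | `30` | int | Length of each VAD audio frame in milliseconds (WebRTC VAD accepts 10, 20, or 30) | Number input |
+| asr.use_oline_asr | Use online ASR | `false` | boolean | Whether to call cloud ASR for recognition; false means local ASR is used | Toggle |
+| asr.mic_serial_port | Microphone serial port | `/dev/ttyUSB0` | string | Device path or alias of the microphone serial port | Text input |
+| asr.mic_index | Microphone index | `-2` | int | Microphone device index in PyAudio / the system | Number input |
+| asr.language | ASR language | `zh` | string | System speech language; zh is Chinese, en is English | Dropdown: zh/en |
+| asr.regional_setting | Region edition | `China` | string | China selects the domestic edition, international selects the international edition | Dropdown |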
+
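+# 3. Action service configuration (action_service)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| action_service.Speed_topic | Velocity control topic | `/cmd_vel` | string | ROS2 robot velocity control topic, typically used to publish Twist velocity commands | Text input |
+| action_service.text_chat_mode | Text chat mode | `false` | boolean | Whether text chat mode is enabled; true means text interaction, false means mainly voice/action interaction | Toggle |
+| action_service.image_topic | Camera image topic | `/camera/color/image_raw` | string | ROS2 image topic read by the vision model | Text input |
+| action_service.useolinetts | Use online TTS | `true` | boolean | Whether to use cloud speech synthesis; false means local TTS is used | Toggle |
+| action_service.language | Local TTS language | `zh` | string | Local speech synthesis language; zh is Chinese, en is English | Dropdown: zh/en |
+| action_service.regional_setting | Region edition | `China` | string | China selects the domestic edition, international selects the international edition | Dropdown |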
+
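+# 4. Model service configuration (model_service)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| model_service.language | LLM interface language | `zh` | string | Language used to interact with the LLM; zh is Chinese, en is English | Dropdown: zh/en |
+| model_service.regional_setting | Region edition | `China` | string | China selects the domestic edition, international selects the international edition | Dropdown |
+| model_service.text_chat_mode | Text chat mode | `true` | boolean | Whether the model service runs as a text conversation, typically for debugging or voice-free scenarios | Toggle |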
+
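+# 5. LLM and cloud speech service configuration (large_model)
+
+## 5.1 Alibaba Cloud Bailian / Tongyi Qianwen configuration
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| large_model.tongyi_api_key | Tongyi API key | `sk-****` | string/secret | API key for the Alibaba Cloud Bailian / Tongyi Qianwen platform | Password field; store encrypted, display masked |
+| large_model.tongyi_app_id | Tongyi App ID | `6ed9f00173214e7883af7310731a5d7b` | string/secret | Alibaba Cloud Bailian application ID | Password field or text box; masking recommended |
+| large_model.multimodel | Multimodal model name | `qwen-vl-max-2025-04-08` | string | Execution-layer vision LLM, used for image understanding, security recognition, visual Q&A, etc. | Dropdown / text input |
+| large_model.tts_supplier | TTS provider | `aliyun` | string | Online speech synthesis provider; aliyun and baidu are currently supported | Dropdown: aliyun/baidu |
+| large_model.tts_language | TTS language | `zh` | string | Speech synthesis language; zh is Chinese, en is English | Dropdown: zh/en |
+| large_model.oline_tts_model | Online TTS model | `cosyvoice-v2` | string | Name of the Alibaba Cloud online speech synthesis model | Text input / dropdown |
+| large_model.voice_tone | Online TTS voice | `longwan_v2` | string | Alibaba Cloud CosyVoice voice name | Text input / dropdown |
+| large_model.oline_asr_sample_rate | Online ASR sample rate | `16000` | int | Audio sample rate for online ASR, in Hz | Number input |
+| large_model.oline_asr_model | Online ASR model | `paraformer-realtime-v2` | string | Name of the online speech recognition model | Dropdown |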
+
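+## 5.2 Baidu speech configuration
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| large_model.baidu_API_KEY | Baidu API key | `Pppr****` | string/secret | Baidu AI Cloud speech synthesis API key | Password field; store encrypted, display masked |
+| large_model.baidu_SECRET_KEY | Baidu secret key | `1jGl****` | string/secret | Baidu AI Cloud speech synthesis secret key | Password field; store encrypted, display masked |
+| large_model.CUID | Baidu device identifier | `nLSB0tSszSlc2vxM9gQ96FksFuSrQ2cp` | string | Unique device identifier for the Baidu speech API | Text input |
+| large_model.PER | Baidu speaker | `103` | int | Baidu TTS speaker number, e.g. 103 is Du Miduo (度米朵) | Number input / dropdown |
+| large_model.SPD | Baidu speech rate | `5` | int | Baidu TTS speech rate; typical range 0-15, default 5 | Slider / number input |
+| large_model.PIT | Baidu pitch | `5` | int | Baidu TTS pitch; typical range 0-15, default 5 | Slider / number input |
+| large_model.VOL | Baidu volume | `5` | int | Baidu TTS volume; typical range 0-9, default 5 | Slider / number input |
+| large_model.network_adapter | Network adapter | `wlP1p1s0` | string | Name of the network interface used for network status checks or networking features | Text input / dropdown |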
+
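+# 6. International edition Dify configuration (international)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| international.decision_AI_api_key | Decision LLM API key | `app-****` | string/secret | API key of the Dify decision LLM app, which handles complex task planning, task decomposition, and intent detection | Password field; store encrypted, display masked |
+| international.execution_AI_api_key | Execution LLM API key | `app-****` | string/secret | API key of the Dify execution LLM app, which handles action generation, execution-layer replies, etc. | Password field; store encrypted, display masked |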
+
+# 7. Local model paths (model_paths)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| model_paths.zh_tts_model | Chinese TTS model path | `/home/sunrise/opt/app/yahboom_ws/src/largemodel/MODELS/tts/zh/zh_CN-huayan-medium.onnx` | string/path | Path to the local Chinese speech synthesis ONNX model | Text input |
+| model_paths.zh_tts_json | Chinese TTS config path | `/home/sunrise/opt/app/yahboom_ws/src/largemodel/MODELS/tts/zh/zh_CN-huayan-medium.onnx.json` | string/path | JSON config file for the Chinese TTS model | Text input |
+| model_paths.en_tts_model | English TTS model path | `/home/sunrise/opt/app/yahboom_ws/src/largemodel/MODELS/tts/en/en_US-libritts-high.onnx` | string/path | Path to the local English speech synthesis ONNX model | Text input |
+| model_paths.en_tts_json | English TTS config path | `/home/sunrise/opt/app/yahboom_ws/src/largemodel/MODELS/tts/en/en_US-libritts-high.onnx.json` | string/path | JSON config file for the English TTS model | Text input |
+| model_paths.local_asr_model | Local ASR model path | `/home/sunrise/opt/app/yahboom_ws/src/largemodel/MODELS/asr/SenseVoiceSmall` | string/path | Path to the local speech recognition model, e.g. SenseVoiceSmall | Text input |
+
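+# 8. System configuration (system)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| system.tongyi_base_url | Tongyi API base URL | `https://dashscope.aliyuncs.com/compatible-mode/v1` | string/url | Alibaba Cloud Bailian OpenAI-compatible endpoint; normally does not need to be changed | Text input, under advanced settings by default |
+| system.local_tts_enabled | Local TTS enabled | `true` | boolean | Whether local speech synthesis may be used | Toggle |
+| system.local_asr_enabled | Local ASR enabled | `true` | boolean | Whether local speech recognition may be used | Toggle |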
+
+# 9. ROS2 topic configuration (topics)
+
+## 9.1 action_service topics
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| topics.action_service.Speed_topic | Velocity control topic | `/cmd_vel` | string | Topic on which the action service publishes velocity commands | Text input |
+| topics.action_service.image_topic | Image input topic | `/camera/color/image_raw` | string | Camera image topic subscribed to by the action service or vision module | Text input |
+| topics.action_service.tts_topic | TTS playback topic | `tts_topic` | string | Topic for publishing or subscribing to TTS text / playback commands | Text input |
+| topics.action_service.reset_flag | Reset flag topic | `reset_flag` | string | Topic used to reset the action service state | Text input |
+| topics.action_service.interrupt_flag | Interrupt flag topic | `interrupt_flag` | string | Topic used to interrupt the current speech, action, or task | Text input |
+| topics.action_service.arm_done_topic | Arm done topic | `/largemodel_arm_done` | string | Feedback topic signalling that an arm action has finished | Text input |
+| topics.action_service.wakeup_topic | Wake-up topic | `wakeup` | string | Topic on which the microphone or wake-up module publishes wake-up events | Text input |
+| topics.action_service.record_status_topic | Recording status topic | `record_status` | string | Topic for recording start, end, and status changes | Text input |
+
+## 9.2 model_service topics
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| topics.model_service.actionstatus_topic | Action status topic | `actionstatus` | string | Topic on which action execution status feedback is received | Text input |
+| topics.model_service.asr_topic | ASR result topic | `asr` | string | Topic on which recognized speech text is received | Text input |
+| topics.model_service.seewhat_topic | Image handling topic | `seewhat_handle` | string | Topic for visual Q&A, image recognition, and image processing | Text input |
+| topics.model_service.text_response_topic | Text response topic | `text_response` | string | Topic on which the model service publishes text replies | Text input |
+
+## 9.3 environment_node topics
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| topics.environment_node.environment_topic | Environment info topic | `/ai/env` | string | Topic on which the environment node publishes environment state | Text input |
+
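+# 10. Environment node configuration (environment)
+
+| Config path | Name | Current value | Type | Description | UI suggestion |
+|---|---|---:|---|---|---|
+| environment.publish_topic | Environment publish topic | `/ai/env` | string | Topic on which the environment node publishes the robot's environment state | Text input |
+| environment.intervals.battery_seconds | Battery publish interval | `1` | int | Battery status sampling/publish interval, in seconds | Number input |
+| environment.intervals.temperature_seconds | Temperature publish interval | `1` | int | Temperature status sampling/publish interval, in seconds | Number input |
+| environment.intervals.weather_seconds | Weather publish interval | `1` | int | Weather info sampling/publish interval, in seconds | Number input |
+| environment.intervals.map_seconds | Map publish interval | `1` | int | Map status or localization info publish interval, in seconds | Number input |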

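The config path column in config.md uses dotted paths such as `asr.sample_rate`. As a rough illustration of how a consumer might resolve such a path, here is a minimal sketch. It assumes the settings sit under a top-level `config` key, which is how `config_callback` in action_service.py reads the `/ai/config` message; the helper `get_by_path` and the sample payload are hypothetical and not part of this commit.

```python
import json
from typing import Any


def get_by_path(root: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as 'asr.sample_rate' in a nested dict."""
    node: Any = root
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical payload shaped like the /ai/config message body.
payload = json.dumps({
    "config": {
        "asr": {"sample_rate": 16000, "frame_duration_ms": 30},
        "environment": {"intervals": {"battery_seconds": 1}},
    }
})

config_root = json.loads(payload).get("config", {})
sample_rate = get_by_path(config_root, "asr.sample_rate")
frame_ms = get_by_path(config_root, "asr.frame_duration_ms")

# Samples per VAD frame: 16000 Hz * 30 ms / 1000 = 480 samples.
samples_per_frame = sample_rate * frame_ms // 1000
print(samples_per_frame)  # 480
```

Returning a default for a missing key keeps a partially filled config from raising, at the cost of hiding typos in the path.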
+ 2 - 2
brain/PlannerNode2/config_node/config_node/config_node.py

@@ -136,9 +136,9 @@ class ConfigNode(Node):
                 # ========== Action Service node configuration ==========
                 "action_service": {
                     "Speed_topic": "/cmd_vel",
-                    "text_chat_mode": True,
+                    "text_chat_mode": False,
                     "image_topic": "/camera/color/image_raw",
-                    "useolinetts": False,
+                    "useolinetts": True,
                     "language": "zh",
                     "regional_setting": "China"
                 },

+ 80 - 26
brain/PlannerNode2/largemodel/largemodel/action_service.py

@@ -44,11 +44,9 @@ class CustomActionServer(Node):
         self.load_target_points()
         # Initialize arm grasping function
         # self.arm_grasp_init()
-        # Initialize text-to-speech synthesis function
-        self.system_sound_init()
-        # Initialize language settings
-        self.init_language()
-        self.get_logger().info("action service started...")
+        # Config flag: wait for /ai/config to arrive before initializing sound and language
+        self.config_ready = False
+        self.get_logger().info("action service started, waiting for /ai/config...")
 
     def init_param_config(self):
         """
@@ -180,33 +178,88 @@ class CustomActionServer(Node):
     def config_callback(self, msg):
         """
         Config subscription callback
-        Parses environment_topic from /ai/config
+        Parses all config items from /ai/config
         """
         try:
             config = json.loads(msg.data)
-            topics = config.get('config', {}).get('topics', {})
+            config_root = config.get('config', {})
+
+            # --- action_service config handling ---
+            action_service_cfg = config_root.get('action_service', {})
+            if action_service_cfg:
+                # Speed_topic
+                new_speed_topic = action_service_cfg.get('Speed_topic', self.Speed_topic)
+                if new_speed_topic != self.Speed_topic:
+                    self.Speed_topic = new_speed_topic
+                    self.get_logger().debug(f'[配置] Speed_topic 已更新: {self.Speed_topic}')
+
+                # text_chat_mode
+                self.text_chat_mode = action_service_cfg.get('text_chat_mode', self.text_chat_mode)
+                self.get_logger().debug(f'[配置] text_chat_mode: {self.text_chat_mode}')
+
+                # image_topic (rebuild the subscription)
+                new_image_topic = action_service_cfg.get('image_topic', self.image_topic)
+                if new_image_topic != self.image_topic:
+                    self.image_topic = new_image_topic
+                    if hasattr(self, 'subscription') and self.subscription:
+                        self.destroy_subscription(self.subscription)
+                    self.subscription = self.create_subscription(
+                        Image, self.image_topic, self.image_callback, 2
+                    )
+                    self.get_logger().debug(f'[配置] image_topic 已更新并重建订阅: {self.image_topic}')
+
+                # useolinetts (requires re-initializing TTS)
+                new_useolinetts = action_service_cfg.get('useolinetts', self.useolinetts)
+                if new_useolinetts != self.useolinetts:
+                    self.useolinetts = new_useolinetts
+                    self.get_logger().debug(f'[配置] useolinetts 已更新: {self.useolinetts},重新初始化 TTS')
+                    # Re-initialize the TTS model
+                    self._init_tts_model()
+
+                # language (may require re-initialization)
+                new_language = action_service_cfg.get('language', self.language)
+                if new_language != self.language:
+                    self.language = new_language
+                    self.get_logger().debug(f'[配置] language 已更新: {self.language}')
+
+                # regional_setting (may require re-initialization)
+                new_regional_setting = action_service_cfg.get('regional_setting', self.regional_setting)
+                if new_regional_setting != self.regional_setting:
+                    self.regional_setting = new_regional_setting
+                    self.get_logger().info(f'[配置] regional_setting 已更新: {self.regional_setting}')
+
+            # --- First config arrival: finish the deferred module initialization ---
+            if not self.config_ready:
+                self.system_sound_init()
+                self.init_language()
+                self.config_ready = True
+                self.get_logger().debug('[配置] 首次配置已到达,声音和语言模块初始化完成')
+
+            # --- topics config handling ---
+            topics = config_root.get('topics', {})
             environment_node_config = topics.get('environment_node', {})
 
             if environment_node_config:
                 new_topic = environment_node_config.get('environment_topic', '/ai/env')
                 if new_topic != self.environment_topic:
                     self.environment_topic = new_topic
-                    # Rebuild the subscription
                     if self.environment_sub:
                         self.destroy_subscription(self.environment_sub)
                     self.environment_sub = self.create_subscription(
                         String, self.environment_topic, self.environment_callback, 10
                     )
-                    self.get_logger().info(f'[配置] 环境数据订阅 Topic 已更新: {self.environment_topic}')
+                    self.get_logger().debug(f'[配置] 环境数据订阅 Topic 已更新: {self.environment_topic}')
         except Exception as e:
             self.get_logger().warn(f'解析配置数据失败: {e}')
 
-    def system_sound_init(
-        self,
-    ):  # Initialize system sound-related functions
+    def _init_tts_model(self):
+        """
+        Re-initialize the TTS model from the current useolinetts / regional_setting.
+        Called when the first config arrives and on dynamic config updates.
+        """
         pkg_path = get_package_share_directory("largemodel")
 
-        if self.regional_setting == "China":  # if it is in China
+        if self.regional_setting == "China":
             if self.useolinetts:
                 model_type = "oline"
                 self.tts_out_path = os.path.join(
@@ -217,25 +270,26 @@ class CustomActionServer(Node):
                 self.tts_out_path = os.path.join(
                     pkg_path, "resources_file", "tts_output.wav"
                 )
-
-        elif (
-            self.regional_setting == "international"
-        ):  # if it is international
-
+        elif self.regional_setting == "international":
             model_type = "XUNFEI_FOR_INTERNATIONAL"
             self.tts_out_path = os.path.join(
                 pkg_path, "resources_file", "XUNFEI_TTS.mp3"
             )
         else:
-            while True:
-                self.get_logger().info()(
-                    'Please check the regional_setting parameter in yahboom.yaml file, it should be either "China" or "international".'
-                )
-                time.sleep(1)
+            self.get_logger().warn(
+                'Please check the regional_setting parameter, it should be either "China" or "international".'
+            )
+            return
 
-        self.model_client.tts_model_init(
-            model_type, self.language
-        )  # Initialize TTS model
+        self.model_client.tts_model_init(model_type, self.language)
+        self.get_logger().info(
+            f'[TTS] 模型初始化完成: model_type={model_type}, tts_out_path={self.tts_out_path}'
+        )
+
+    def system_sound_init(
+        self,
+    ):  # Initialize system sound-related functions
+        self._init_tts_model()
         # Initialize audio player
         pygame.mixer.init()
         self.stop_event = threading.Event()
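
The action_service.py change follows an update-on-change pattern with initialization deferred until the first `/ai/config` message. The sketch below shows that pattern without ROS. The class `ConfigConsumer`, its `_update` helper, and the `events` list are hypothetical names used only for illustration. Unlike the commit, the sketch suppresses the change hook until the first config has been applied, so the first message triggers a single TTS initialization, and it re-initializes on a language change; both are design choices of the sketch rather than behavior taken from the diff.

```python
from typing import Callable, Optional


class ConfigConsumer:
    """Hypothetical stand-in for a node that reacts to config updates."""

    def __init__(self) -> None:
        self.useolinetts = False
        self.language = "zh"
        self.config_ready = False
        self.events = []  # records each (re)initialization for inspection

    def _update(self, cfg: dict, key: str,
                on_change: Optional[Callable[[], None]] = None) -> bool:
        """Copy cfg[key] onto self if it differs; run on_change when it does."""
        new_value = cfg.get(key, getattr(self, key))
        if new_value == getattr(self, key):
            return False
        setattr(self, key, new_value)
        # Before the first config is applied, the deferred init below covers it.
        if on_change is not None and self.config_ready:
            on_change()
        return True

    def _init_tts(self) -> None:
        self.events.append(
            f"tts_init(online={self.useolinetts}, lang={self.language})"
        )

    def on_config(self, config_root: dict) -> None:
        cfg = config_root.get("action_service", {})
        self._update(cfg, "useolinetts", self._init_tts)
        self._update(cfg, "language", self._init_tts)
        if not self.config_ready:
            # First config: run the deferred initialization exactly once.
            self._init_tts()
            self.config_ready = True


node = ConfigConsumer()
node.on_config({"action_service": {"useolinetts": True}})
node.on_config({"action_service": {"useolinetts": True}})  # unchanged: no re-init
node.on_config({"action_service": {"language": "en"}})
print(node.events)
```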

+ 90 - 55
brain/PlannerNode2/largemodel/largemodel/asr.py

@@ -130,24 +130,30 @@ class ASRNode(Node):
                 self.get_logger().info("I'm here")
                 playsound(
                     self.audio_dict[self.first_response]
-                )  # Respond to the user
+                )  # Respond to the user (blocking; waits for the prompt sound to finish)
 
                 if (
                     self.current_thread and self.current_thread.is_alive()
                 ):  # Interrupt the previous wake-up handling thread
-                    self.stop_event.set()
-                    self.current_thread.join()  # Wait for the current thread to finish
-                    self.stop_event.clear()  # Clear the event
+                    self.stop_event.set()  # Signal the old thread to stop (without waiting)
+
+                self.stop_event = threading.Event()  # Create a new stop_event
                 self.current_thread = threading.Thread(target=self.kws_handler)
                 self.current_thread.daemon = True
                 self.current_thread.start()
             rclpy.spin_once(self, timeout_sec=0.1)
-            time.sleep(0.1)
 
     def kws_handler(self) -> None:
         if self.stop_event.is_set():
             return
 
+        # Drop stale frames already in the buffer so recording starts from "now"
+        while not self.audio_buffer.empty():
+            try:
+                self.audio_buffer.get_nowait()
+            except queue.Empty:
+                break
+
         if self.listen_for_speech(self.mic_index):
             asr_text = self.ASR_conversion(
                 self.user_speechdir
@@ -244,6 +250,47 @@ class ASRNode(Node):
         receive_thread.daemon = True
         receive_thread.start()
 
+        # Initialize persistent audio capture thread
+        self.audio_buffer = queue.Queue(
+            maxsize=100
+        )  # Ring buffer holding about 100 frames (~3 s)
+        self.audio_capture_running = True
+        self.audio_capture_thread = threading.Thread(target=self._audio_capture_loop)
+        self.audio_capture_thread.daemon = True
+        self.audio_capture_thread.start()
+        self.get_logger().info("Audio capture thread started (persistent mode)")
+
+    def _audio_capture_loop(self):
+        """Persistent audio capture thread: continuously reads mic frames into the buffer"""
+        p = pyaudio.PyAudio()
+        stream_kwargs = {
+            "format": pyaudio.paInt16,
+            "channels": 1,
+            "rate": self.sample_rate,
+            "input": True,
+            "frames_per_buffer": self.frame_bytes,
+        }
+        if self.mic_index != 0:
+            stream_kwargs["input_device_index"] = self.mic_index
+
+        stream = p.open(**stream_kwargs)
+        self.get_logger().info("Audio stream opened (persistent mode)")
+
+        try:
+            while self.audio_capture_running:
+                frame = stream.read(self.frame_bytes, exception_on_overflow=False)
+                if self.audio_buffer.full():
+                    try:
+                        self.audio_buffer.get_nowait()  # Drop the oldest frame
+                    except queue.Empty:
+                        pass
+                self.audio_buffer.put(frame)
+        finally:
+            stream.stop_stream()
+            stream.close()
+            p.terminate()
+            self.get_logger().info("Audio capture thread stopped")
+
     def asr_pub_result(self, asr_result: str) -> None:
         msg = String(data=asr_result)
         self.asr_pub.publish(msg)
@@ -274,70 +321,53 @@ class ASRNode(Node):
 
     def listen_for_speech(self, mic_index=0):
         self.record_status_pub.publish(Bool(data=True))
-        p = pyaudio.PyAudio()
         audio_buffer = []
         silence_counter = 0
-        MAX_SILENCE_FRAMES = 90  # Stop after 900ms of silence (30 frames * 30ms)
+        MAX_SILENCE_FRAMES = 30  # Stop after 900ms of silence (30 frames * 30ms)
         speaking = False  # Flag indicating speech activity
         frame_counter = 0  # Frame counter
-        stream_kwargs = {
-            "format": pyaudio.paInt16,
-            "channels": 1,
-            "rate": self.sample_rate,
-            "input": True,
-            "frames_per_buffer": self.frame_bytes,
-        }
-        if mic_index != 0:
-            stream_kwargs["input_device_index"] = mic_index
+        empty_count = 0  # Consecutive empty reads (each one is a 0.5 s get() timeout)
+        MAX_EMPTY_FRAMES = 200  # Exit after 200 consecutive 0.5 s timeouts (~100 s without audio)
 
         # Prompt the user to speak via the buzzer
         self.pub_beep.publish(UInt16(data=1))
-        time.sleep(0.5)
         self.pub_beep.publish(UInt16(data=0))
 
-        try:
-            # Open audio stream
-            stream = p.open(**stream_kwargs)
-            while True:
-                if self.stop_event.is_set():
+        while True:
+            if self.stop_event.is_set():
+                self.record_status_pub.publish(Bool(data=False))
+                return False
+
+            try:
+                frame = self.audio_buffer.get(timeout=0.5)
+                empty_count = 0
+            except queue.Empty:
+                empty_count += 1
+                if empty_count >= MAX_EMPTY_FRAMES:
+                    self.get_logger().warn("No audio input, exiting recording")
+                    self.record_status_pub.publish(Bool(data=False))
                     return False
+                continue
 
-                frame = stream.read(
-                    self.frame_bytes, exception_on_overflow=False
-                )  # Read audio data
-                is_speech = self.vad.is_speech(
-                    frame, self.sample_rate
-                )  # VAD detection
+            is_speech = self.vad.is_speech(frame, self.sample_rate)
 
-                if is_speech:
-                    # Detected speech activity
-                    speaking = True
+            if is_speech:
+                speaking = True
+                audio_buffer.append(frame)
+                silence_counter = 0
+            else:
+                if speaking:
+                    silence_counter += 1
                     audio_buffer.append(frame)
-                    silence_counter = 0
-                else:
-                    if speaking:
-                        # Detect silence after speech activity
-                        silence_counter += 1
-                        audio_buffer.append(
-                            frame
-                        )  # Continue recording buffer
-
-                        # End recording when silence duration meets the threshold
-                        if silence_counter >= MAX_SILENCE_FRAMES:
-                            break
-                frame_counter += 1
-                if frame_counter % 2 == 0:
-                    self.get_logger().info("1" if is_speech else "-")
-                    
-        finally:
-            stream.stop_stream()
-            stream.close()
-            p.terminate()
-            self.record_status_pub.publish(Bool(data=False))
+                    if silence_counter >= MAX_SILENCE_FRAMES:
+                        break
+            frame_counter += 1
+            if frame_counter % 2 == 0:
+                self.get_logger().info("1" if is_speech else "-")
+
+        self.record_status_pub.publish(Bool(data=False))
 
-        # Save valid recording (remove trailing silence)
         if speaking and len(audio_buffer) > 0:
-            # Trim the last silent part
             clean_buffer = (
                 audio_buffer[:-MAX_SILENCE_FRAMES]
                 if len(audio_buffer) > MAX_SILENCE_FRAMES
@@ -346,10 +376,11 @@ class ASRNode(Node):
 
             with wave.open(self.user_speechdir, "wb") as wf:
                 wf.setnchannels(1)
-                wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
+                wf.setsampwidth(2)  # 16-bit = 2 bytes
                 wf.setframerate(self.sample_rate)
                 wf.writeframes(b"".join(clean_buffer))
                 return True
+        return False
             
         
 def main(args=None):
@@ -360,6 +391,10 @@ def main(args=None):
     except KeyboardInterrupt:
         pass
     finally:
+        # Stop the persistent audio capture thread
+        sense_voice_node.audio_capture_running = False
+        if sense_voice_node.audio_capture_thread.is_alive():
+            sense_voice_node.audio_capture_thread.join(timeout=2)
         sense_voice_node.destroy_node()
         rclpy.shutdown()
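
The capture loop in asr.py keeps only the newest frames by dropping the oldest entry whenever the bounded queue is full, and `kws_handler` drains the queue before recording. A minimal sketch of both operations follows, assuming a single producer thread as in the commit. The helper names `put_drop_oldest` and `drain` are hypothetical. The sketch uses `put_nowait` with a retry instead of the commit's `full()` check followed by a blocking `put`, so the producer cannot block if the queue state changes between the check and the insert.

```python
import queue


def put_drop_oldest(buf: queue.Queue, frame: bytes) -> None:
    """Insert frame; if the queue is full, discard the oldest entry first."""
    while True:
        try:
            buf.put_nowait(frame)
            return
        except queue.Full:
            try:
                buf.get_nowait()  # drop the oldest frame to make room
            except queue.Empty:
                pass  # a consumer emptied the queue in the meantime; retry


def drain(buf: queue.Queue) -> int:
    """Discard everything currently buffered; return how many items were dropped."""
    dropped = 0
    while True:
        try:
            buf.get_nowait()
            dropped += 1
        except queue.Empty:
            return dropped


buf = queue.Queue(maxsize=3)
for i in range(5):
    put_drop_oldest(buf, bytes([i]))

kept = [buf.get_nowait() for _ in range(buf.qsize())]
print(kept)  # [b'\x02', b'\x03', b'\x04']
```

With `maxsize=100` and 30 ms frames, a buffer like this holds roughly the last 3 seconds of audio, which matches the comment in the commit.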
 
 

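The wake-up path in asr.py now sets the old `stop_event` and immediately replaces it with a fresh one instead of joining the previous thread. A worker that looks up `self.stop_event` each time it checks may then observe the new, unset event and keep running. One possible way around this, sketched here under the assumption that the handler can receive its event as an argument, is to hand each thread the event it was started with. The `Worker` class and its method names are hypothetical and not part of the commit.

```python
import threading


class Worker:
    def __init__(self):
        self.stop_event = threading.Event()
        self.current_thread = None
        self.finished = []

    def _handler(self, stop_event, name):
        # Wait on the event captured at start, not on self.stop_event,
        # so a later rebind of self.stop_event cannot hide the stop signal.
        while not stop_event.wait(timeout=0.01):
            pass
        self.finished.append(name)

    def restart(self, name):
        if self.current_thread and self.current_thread.is_alive():
            self.stop_event.set()  # signal the old thread without joining
        self.stop_event = threading.Event()
        self.current_thread = threading.Thread(
            target=self._handler, args=(self.stop_event, name), daemon=True
        )
        self.current_thread.start()
        return self.current_thread


w = Worker()
first = w.restart("first")
second = w.restart("second")
first.join(timeout=1.0)
print(first.is_alive(), second.is_alive())  # False True
```

The old thread exits on its own shortly after being signalled, so the wake-up loop keeps the non-blocking behavior the commit is after.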
+ 368 - 0
brain/PlannerNode2/largemodel/largemodel/asr.py.back

@@ -0,0 +1,368 @@
+import rclpy
+import os
+import time
+from rclpy.node import Node
+import pyaudio
+from playsound import playsound
+import wave
+import threading
+import webrtcvad
+import queue
+from std_msgs.msg import String, UInt16, Bool
+from utils.mic_serial import kws_mic
+from utils import large_model_interface
+from utils.large_model_interface import rec_wav_music_en
+from ament_index_python.packages import get_package_share_directory
+import functools
+def measure_execution_time(func):
+    """
+    Decorator: measures a function's execution time and prints the result via the ROS logger
+    """
+    @functools.wraps(func)
+    def wrapper(self, *args, **kwargs):
+        start_time = time.time()
+        result = func(self, *args, **kwargs)
+        end_time = time.time()
+        execution_time = end_time - start_time
+        
+        # Record the execution time through the ROS logging system
+        if hasattr(self, 'get_logger'):
+            self.get_logger().info(f"[性能统计] {func.__name__} 函数执行时间: {execution_time:.4f} 秒")
+        else:
+            print(f"[性能统计] {func.__name__} 函数执行时间: {execution_time:.4f} 秒")
+        return result
+    return wrapper
+class ASRNode(Node):
+    def __init__(self):
+        super().__init__("asr_node")
+        # Initialize parameters and variables
+        self.init_param_config()
+        # Initialize keyword spotting (KWS)
+        self.kws_init()
+        # Initialize ASR model
+        self.asr_mdoel_init()
+        # Initialize language settings
+        self.language_init()
+        # Initialize system sound functionality
+        self.system_sound_init()
+        # Initialize ROS communication
+        self.init_ros_comunication()
+        # Log initialization completion
+        self.get_logger().info("asr_node Initialization completed")
+
+    def init_ros_comunication(self):
+        # Create a publisher for the buzzer
+        self.pub_beep = self.create_publisher(UInt16, "beep", 10)
+        # Create an ASR publisher to publish conversion results
+        self.asr_pub = self.create_publisher(String, "asr", 5)
+        # Create a publisher for wake-up signals
+        self.wakeup_pub = self.create_publisher(Bool, "wakeup", 5)
+        # Create a publisher for recording status
+        self.record_status_pub=self.create_publisher(Bool, "record_status", 5)
+
+    def init_param_config(self):
+        self.user_speechdir = os.path.join(
+            get_package_share_directory("largemodel"),
+            "resources_file",
+            "user_speech.wav",
+        )
+        # Declare parameters
+        self.declare_parameter("VAD_MODE", 1)
+        self.declare_parameter("sample_rate", 16000)
+        self.declare_parameter("frame_duration_ms", 30)
+        self.declare_parameter("language", "en")
+        self.declare_parameter("use_oline_asr", False)
+        self.declare_parameter("mic_serial_port", "/dev/mic")
+        self.declare_parameter("mic_index", 0)
+        self.declare_parameter("regional_setting", "China")
+
+
+        # Get server parameters
+        self.VAD_MODE = (
+            self.get_parameter("VAD_MODE").get_parameter_value().integer_value
+        )
+        self.sample_rate = (
+            self.get_parameter("sample_rate").get_parameter_value().integer_value
+        )
+        self.frame_duration_ms = (
+            self.get_parameter("frame_duration_ms").get_parameter_value().integer_value
+        )
+        self.language = (
+            self.get_parameter("language").get_parameter_value().string_value
+        )
+        self.use_oline_asr = (
+            self.get_parameter("use_oline_asr").get_parameter_value().bool_value
+        )
+        self.mic_serial_port = (
+            self.get_parameter("mic_serial_port").get_parameter_value().string_value
+        )
+        self.mic_index = (
+            self.get_parameter("mic_index").get_parameter_value().integer_value
+        )
+        self.regional_setting = (
+            self.get_parameter("regional_setting").get_parameter_value().string_value
+        )
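+        # 每帧采样点数(并非字节数;16 位单声道时每帧字节数为其 2 倍)
+        # / Samples per frame (not bytes; with 16-bit mono audio each frame holds twice this many bytes)
+        self.frame_bytes = int(
+            self.sample_rate * self.frame_duration_ms / 1000
+        )
+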
+        # 大模型接口实例端 / Instance of the large model interface
+        # 传入 logger 用于调试日志 / Pass in the logger for debug logging
+        self.modelinterface = large_model_interface.model_interface(logger=self.get_logger())
+        # 初始化 WebRTC VAD / Initialize WebRTC VAD
+        self.vad = webrtcvad.Vad()
+        self.vad.set_mode(self.VAD_MODE)
+        self.current_thread = None  # 唤醒处理线程 / Thread for handling wake-up events
+        self.stop_event = threading.Event()
+
+    def main_loop(self):
+        while rclpy.ok():
+            while (
+                self.audio_request_queue.qsize() > 1
+            ):  # 只处理最近的一次唤醒请求,防止重复唤醒 / Process only the most recent wake-up request to prevent duplicates
+                self.audio_request_queue.get()
+
+            if not self.audio_request_queue.empty():
+                self.audio_request_queue.get()
+                self.wakeup_pub.publish(
+                    Bool(data=True)
+                )  # 发布唤醒信号 / Publish wake-up signal
+                self.get_logger().info("I'm here")
+                playsound(
+                    self.audio_dict[self.first_response]
+                )  # 应答用户 / Respond to the user
+
+                if (
+                    self.current_thread and self.current_thread.is_alive()
+                ):  # 打断上次的唤醒处理线程 / Interrupt the previous wake-up handling thread
+                    self.stop_event.set()
+                    self.current_thread.join()  # 等待当前线程结束 / Wait for the current thread to finish
+                    self.stop_event.clear()  # 清除事件 / Clear the event
+                self.current_thread = threading.Thread(target=self.kws_handler)
+                self.current_thread.daemon = True
+                self.current_thread.start()
+            rclpy.spin_once(self, timeout_sec=0.1)
+            time.sleep(0.1)
+
+    def kws_handler(self) -> None:
+        if self.stop_event.is_set():
+            return
+
+        if self.listen_for_speech(self.mic_index):
+            asr_text = self.ASR_conversion(
+                self.user_speechdir
+            )  # 进行 ASR 转换 / Perform ASR conversion
+            if (
+                asr_text == "error"
+            ):  # ASR_conversion 在识别失败或结果过短时返回 "error" / ASR_conversion returns "error" when recognition fails or the result is too short
+                self.get_logger().warn(
+                    "I still don't understand what you mean. Please try again"
+                )
+                playsound(
+                    self.audio_dict[self.error_response]
+                )  # 错误响应 / Error response
+            else:
+                self.get_logger().info(asr_text)
+                self.get_logger().info("😀okay, let me think for a moment...")
+                self.asr_pub_result(asr_text)  # 发布 ASR结果 / Publish ASR result
+        else:
+            return
+
+    def system_sound_init(
+        self,
+    ):  # 初始化系统声音相关的功能 / Initialize system sound functionality
+        pkg_path = get_package_share_directory("largemodel")
+        self.audio_dict = {}  # 系统声音字典 / Dictionary of system sounds
+        self.audio_dict["longwan-women-1"] = os.path.join(
+            pkg_path, "resources_file", "longwan-women-1.mp3"
+        )
+        self.audio_dict["longwan-women-2"] = os.path.join(
+            pkg_path, "resources_file", "longwan-women-2.mp3"
+        )
+        self.audio_dict["longxiaochun-women-1"] = os.path.join(
+            pkg_path, "resources_file", "longxiaochun-women-1.mp3"
+        )
+        self.audio_dict["longxiaochun-women-2"] = os.path.join(
+            pkg_path, "resources_file", "longxiaochun-women-2.mp3"
+        )
+
+    def asr_mdoel_init(self):  # 初始化asr模型 / Initialize ASR model
+        if self.regional_setting == "international":
+            self.get_logger().info(
+                "The online asr model :XUN-FEI ASR is loaded"
+            )
+        elif self.regional_setting == "China":
+            if self.use_oline_asr:
+                
+                self.get_logger().info(
+                    f"The online asr model :{self.modelinterface.init_oline_asr(self.language)} is loaded"
+                )
+            else:
+                # -------- SenseVoiceSmall 语音识别  --模型加载----- / Load the local SenseVoiceSmall ASR model
+                self.modelinterface.init_local_asr_model()
+                self.get_logger().info("The asr model :SenseVoiceSmall is loaded")
+
+        else:
+            while True:
+                # 配置错误,按 error 级别记录 / Configuration error, so log at error level
+                self.get_logger().error(
+                    'Please check the regional_setting parameter in yahboom.yaml file, it should be either "China" or "international".'
+                )
+                time.sleep(1)
+
+    def language_init(self):
+        if self.language == "zh":
+            self.first_response = "longwan-women-1"
+            self.error_response = "longwan-women-2"
+        elif self.language == "en":
+            self.first_response = "longxiaochun-women-1"
+            self.error_response = "longxiaochun-women-2"
+        else:
+            while True:
+                self.get_logger().error(
+                    "Language setting error, please check your language setting"
+                )  # 语言设置错误,请检查语言设置 / Language setting error, please check your language setting
+                time.sleep(3)
+
+    def kws_init(
+        self,
+    ):  # 初始化关键词唤醒相关的内容 / Initialize keyword spotting (KWS) related content
+        self.port_name = self.mic_serial_port
+        self.audio_request_queue = (
+            queue.Queue()
+        )  # 用于传递音频请求 / Queue for passing audio requests
+        self.serial_port = kws_mic(
+            port=self.port_name, kwsquence=self.audio_request_queue, baudrate=115200
+        )
+        self.serial_port.open()
+        if not self.serial_port.ser or not self.serial_port.ser.is_open:
+            while True:
+                time.sleep(1)
+                self.get_logger().error(
+                    "Failed to open KWS serial port. Please check the hardware wiring and the voice module."
+                )  # 未能打开kws串口 / Failed to open KWS serial port
+        receive_thread = threading.Thread(target=self.serial_port.receive_data)
+        receive_thread.daemon = True
+        receive_thread.start()
+
+    def asr_pub_result(self, asr_result: str) -> None:
+        msg = String(data=asr_result)
+        self.asr_pub.publish(msg)
+
+    # @measure_execution_time
+    def ASR_conversion(self, input_file: str) -> str:
+        if self.regional_setting == "international":
+            res = rec_wav_music_en()
+            return res if res is not None else "error"
+
+        # 国内版:在线与本地 ASR 的结果按同一方式校验 / China edition: online and local ASR results are validated the same way
+        if self.use_oline_asr:
+            result = self.modelinterface.oline_asr(input_file)
+        else:
+            result = self.modelinterface.SenseVoiceSmall_ASR(input_file)
+        if result[0] == "ok" and len(result[1]) > 4:
+            return result[1]
+        self.get_logger().error(f"ASR Error:{result[1]}")  # ASR错误 / ASR error
+        return "error"
+
+    def listen_for_speech(self, mic_index=0):
+        self.record_status_pub.publish(Bool(data=True))
+        p = pyaudio.PyAudio()
+        audio_buffer = []
+        silence_counter = 0
+        MAX_SILENCE_FRAMES = 30  # 30帧*30ms=900ms静音后停止 / Stop after 900ms of silence (30 frames * 30ms)
+        speaking = False  # 语音活动标志 / Flag indicating speech activity
+        frame_counter = 0  # 计数器 / Frame counter
+        stream_kwargs = {
+            "format": pyaudio.paInt16,
+            "channels": 1,
+            "rate": self.sample_rate,
+            "input": True,
+            "frames_per_buffer": self.frame_bytes,
+        }
+        if mic_index != 0:
+            stream_kwargs["input_device_index"] = mic_index
+
+        # 通过蜂鸣器提示用户讲话 / Prompt the user to speak via the buzzer
+        self.pub_beep.publish(UInt16(data=1))
+        time.sleep(0.5)
+        self.pub_beep.publish(UInt16(data=0))
+
+        stream = None  # 确保 finally 中可以安全引用 / Ensure it can be referenced safely in finally
+        try:
+            # 打开音频流 / Open audio stream
+            stream = p.open(**stream_kwargs)
+            while True:
+                if self.stop_event.is_set():
+                    return False
+
+                frame = stream.read(
+                    self.frame_bytes, exception_on_overflow=False
+                )  # 读取音频数据 / Read audio data
+                is_speech = self.vad.is_speech(
+                    frame, self.sample_rate
+                )  # VAD检测 / VAD detection
+
+                if is_speech:
+                    # 检测到语音活动 / Detected speech activity
+                    speaking = True
+                    audio_buffer.append(frame)
+                    silence_counter = 0
+                else:
+                    if speaking:
+                        # 在语音活动后检测静音 / Detect silence after speech activity
+                        silence_counter += 1
+                        audio_buffer.append(
+                            frame
+                        )  # 持续记录缓冲 / Continue recording buffer
+
+                        # 静音持续时间达标时结束录音 / End recording when silence duration meets the threshold
+                        if silence_counter >= MAX_SILENCE_FRAMES:
+                            break
+                frame_counter += 1
+                if frame_counter % 2 == 0:
+                    self.get_logger().info("1" if is_speech else "-")
+
+        finally:
+            # p.open() 失败时 stream 仍为 None / stream is still None if p.open() failed
+            if stream is not None:
+                stream.stop_stream()
+                stream.close()
+            p.terminate()
+            self.record_status_pub.publish(Bool(data=False))
+
+        # 保存有效录音(去除尾部静音) / Save valid recording (remove trailing silence)
+        if speaking and len(audio_buffer) > 0:
+            # 裁剪最后静音部分 / Trim the last silent part
+            clean_buffer = (
+                audio_buffer[:-MAX_SILENCE_FRAMES]
+                if len(audio_buffer) > MAX_SILENCE_FRAMES
+                else audio_buffer
+            )
+
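+            with wave.open(self.user_speechdir, "wb") as wf:
+                wf.setnchannels(1)
+                # p 已被 terminate,改用模块级函数 / p is already terminated, so use the module-level function
+                wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
+                wf.setframerate(self.sample_rate)
+                wf.writeframes(b"".join(clean_buffer))
+            return True
+        # 未录到有效语音 / No valid speech was recorded
+        return False
+
+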
+def main(args=None):
+    rclpy.init(args=args)
+    sense_voice_node = ASRNode()
+    try:
+        sense_voice_node.main_loop()
+    except KeyboardInterrupt:
+        pass
+    finally:
+        sense_voice_node.destroy_node()
+        rclpy.shutdown()
+
+
+if __name__ == "__main__":
+    main()
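
Review note: the silence-endpointing logic in `listen_for_speech` is easier to check when pulled out of the PyAudio loop. The following is a minimal sketch, not part of this commit. The names `samples_per_frame` and `collect_utterance` are hypothetical, and the `is_speech` callable stands in for `self.vad.is_speech`. Unlike the committed code, which always trims `MAX_SILENCE_FRAMES` frames, this sketch trims only the trailing silence it actually counted.

```python
from typing import Callable, Iterable, List


def samples_per_frame(sample_rate: int, frame_duration_ms: int) -> int:
    """Number of samples in one VAD frame (what the node stores in frame_bytes)."""
    return sample_rate * frame_duration_ms // 1000


def collect_utterance(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    max_silence_frames: int = 30,
) -> List[bytes]:
    """Collect frames from the first speech frame until enough silence follows.

    Leading silence is skipped, recording stops after max_silence_frames
    consecutive non-speech frames, and the trailing silence is dropped.
    Returns an empty list when no speech was detected.
    """
    buffer: List[bytes] = []
    silence = 0
    speaking = False
    for frame in frames:
        if is_speech(frame):
            speaking = True
            silence = 0
            buffer.append(frame)
        elif speaking:
            silence += 1
            buffer.append(frame)
            if silence >= max_silence_frames:
                break
    if not speaking:
        return []
    # Drop only the silence frames counted after the last speech frame.
    return buffer[:-silence] if silence else buffer


if __name__ == "__main__":
    n = samples_per_frame(16000, 30)
    print(n, "samples per frame,", n * 2, "bytes as 16-bit mono")
    voiced, quiet = b"v", b"s"
    stream = [quiet] * 2 + [voiced] * 3 + [quiet] * 5 + [voiced]
    print(collect_utterance(stream, lambda f: f == voiced, max_silence_frames=3))
```

With the defaults declared in this node (16000 Hz, 30 ms), one frame is 480 samples, i.e. 960 bytes of 16-bit mono audio, which is why `frame_bytes` can be passed to `stream.read` as a frame count rather than a byte count.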