Watchdog kills OMG on ESP32-C3 within seconds

The upgrade to espressif32@5.3.0 was an improvement for me. So far I have not been able to reproduce the assert rwble.c 261 error. OMG runs and keeps on running. It does reboot itself every now and then, but it no longer dies and waits for a physical reset. One step forward. :slight_smile:

I think the problem of running very slowly without a serial listener is still present in 5.3.0.

I will observe it a little while longer and hopefully I will catch one of those seemingly random reboots. To be continued.

@argafal: Error confirmed! Here is my report on the long-term run of the Seeed Studio XIAO ESP32C3 (seeedy C3) with serial connected. The C3 was up for 6h 9min before it broke. The last line in the MQTT log:

2023-02-28T22:29:39+0100 home/OMG_xiao_c3.1/LWT offline

and the serial log

[22:29:15.416092 1.182972] DIAG0 1000e1a0
[22:29:15.416620 0.000533] DIAG1 120b0025
[22:29:15.417051 0.000431] BB DIAG0: 000a4004
[22:29:15.417583 0.000531]                             BB DIAG1: d6002e02
[22:29:15.418893 0.001310]                             BB DIAG2: 00000000
[22:29:15.421424 0.002531]                             BB DIAG3: 8e89bed6
[22:29:15.425586 0.004162]                             BB DIAG4: 00000000
[22:29:15.429755 0.004169]                             BB DIAG5: 00000000
[22:29:15.433664 0.003909] assert rwble.c 261, param 00020000 00000000

So the error can be reproduced across different ESP32C3 boards. By the way, the final offline message was the only one in the MQTT log. There is something odd about the serial connection. I will also test the espressif32@5.3.0 platform and report back.

  • I have now recorded a night of OMG running with espressif32@5.3.0.
  • I still have to keep the serial listener connected; this problem persists.
  • OMG reboots every now and then. Otherwise it runs fine. It never stops entirely, so the assert rwble.c 261 error seems to be resolved.
  • I cannot identify a trigger or pattern for the reboots.
  • Uptimes range from 1000 seconds to some 15000 seconds, before a reboot occurs.
  • There is no error message in the log. It looks like this:
N: Device detected: XX:YY:ZZ:XX:ZZ:YY
N: Send on /BTtoMQTT/XXYYZZXXZZYY msg {"id":"XX:YY:ZZ:XX:ZZ:YY","name":"ATC_XXXXXX","rssi":-64,"brand":"Xiaomi","model":"LYWSD03MMC","model_id":"LYWSD03MMC_PVVX","tempc":9.89,"tempf":49.802,"hum":80.6,"batt":40,"volt":2.566}
ESP-ROM:esp32c3-api1-20210207
Build:Feb  7 2021
rst:0x3 (RTC_SW_SYS_RST),boot:0xd (SPI_FAST_FLASH_BOOT)
Saved PC:0x40381a7a
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fcd5810,len:0x438
load:0x403cc710,len:0x918
load:0x403ce710,len:0x24e4
entry 0x403cc710
E (251) esp_core_dump_flash: Core dump data check failed:
Calculated checksum='11e11b27'
Image checksum='5df3c73e'
N:
************* WELCOME TO OpenMQTTGateway **************
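
Since the log carries no error message, the only hints are the reset reason and the saved PC in the ROM boot banner. Here is a minimal sketch for pulling those two values out of a captured serial log, so that reboots can be compared with each other. The function name `find_reboots` and the sample list are made up for illustration; the patterns only match the banner format shown in the log above.

```python
import re

# Patterns for two lines of the ESP32-C3 ROM boot banner as it appears in the log.
RST_RE = re.compile(r"rst:(0x[0-9a-fA-F]+) \((\w+)\),boot:(0x[0-9a-fA-F]+) \((\w+)\)")
PC_RE = re.compile(r"Saved PC:(0x[0-9a-fA-F]+)")


def find_reboots(lines):
    """Return one dict per boot banner found in the given serial log lines."""
    reboots = []
    for raw in lines:
        # Drop a literal "^M" (carriage return as shown by some terminals) and whitespace.
        line = raw.replace("^M", "").strip()
        m = RST_RE.search(line)
        if m:
            reboots.append({
                "rst_code": m.group(1),
                "rst_name": m.group(2),
                "boot_mode": m.group(4),
                "saved_pc": None,
            })
            continue
        m = PC_RE.search(line)
        # Attach the saved PC to the most recent banner that does not have one yet.
        if m and reboots and reboots[-1]["saved_pc"] is None:
            reboots[-1]["saved_pc"] = m.group(1)
    return reboots


# Hypothetical excerpt, copied from the log in this post.
sample_log = [
    "N: Device detected: XX:YY:ZZ:XX:ZZ:YY",
    "ESP-ROM:esp32c3-api1-20210207^M",
    "Build:Feb  7 2021^M",
    "rst:0x3 (RTC_SW_SYS_RST),boot:0xd (SPI_FAST_FLASH_BOOT)^M",
    "Saved PC:0x40381a7a^M",
]

for reboot in find_reboots(sample_log):
    print(reboot)
```

Collecting the reset name and saved PC for every reboot of a night would at least show whether all reboots look the same or not.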

I have tested our serial problem with a simple program, with no Wi-Fi and no BLE. It rotates the on state through my three LEDs. After each change the currently active LED is printed to Serial: green, red, yellow, green, … The pushbutton increases the speed; the maximum speed is limited by the debounce time. At high speeds everything was fine with the serial terminal connected. When I shut down the terminal, it took 1 to 2 seconds for the rotation to slow down markedly. When I restarted the terminal, the original high speed was restored. I fear that the serial output buffer of the framework is not really circular: when it is full, all the previous bytes may have to be shifted when new ones arrive.
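
A toy model may help to picture this. It does not implement the byte-shifting idea above; it models the other explanation raised later in this thread, namely that a write to a full output buffer blocks until a timeout. Everything here is an assumption for illustration (the function name `simulate`, the buffer size, the timeout, the message length); it is not how the Arduino core is actually written. It only shows that a short delay before the slowdown, followed by a massive slowdown, is what one would expect from a bounded buffer that nobody reads.

```python
def simulate(iterations, bytes_per_iter, buf_size, drain_per_iter, work_ms, timeout_ms):
    """Toy model: a loop does `work_ms` of work, then writes to a bounded TX buffer.

    If the message does not fit, the write is assumed to wait `timeout_ms`
    and the bytes that did not fit are dropped.
    Returns the total simulated time in milliseconds.
    """
    fill = 0       # bytes currently waiting in the buffer
    elapsed = 0    # simulated time in ms
    for _ in range(iterations):
        elapsed += work_ms
        # A connected listener removes bytes; with no listener drain_per_iter is 0.
        fill = max(0, fill - drain_per_iter)
        free = buf_size - fill
        if bytes_per_iter <= free:
            fill += bytes_per_iter
        else:
            fill = buf_size
            elapsed += timeout_ms   # blocked until the assumed timeout expires
    return elapsed


# All numbers below are invented for the illustration.
with_listener = simulate(1000, 8, 256, 64, 1, 50)
without_listener = simulate(1000, 8, 256, 0, 1, 50)
print("listener connected:", with_listener, "ms")
print("no listener:       ", without_listener, "ms")
```

In this model the first 32 writes still fit into the empty buffer, which would correspond to the short delay before the rotation slows down, and every later write pays the full timeout.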
So I went ahead and modified my prod_env.ini to accommodate the updated espressif32@5.3.0 platform and to allow suppressing serial output with compiler directives, as suggested earlier in this thread. There is still some output left:

{
  "bleconnect": true,
  "interval": 55555,
...
  "btqsnd": 0,
  "btqavg": 0
}

and

[  5218][E][esp32-hal-misc.c:128] disableCore0WDT(): Failed to remove Core 0 IDLE task from WDT

That is relatively short and apparently not a hindrance at the moment. So I am now, hopefully, logging a long-term run with the new platform and serial output minimized.
See you tomorrow.

I regard this as a success. The Seeed Studio XIAO ESP32C3 has now been up for more than 22h and is still sending MQTT. It is just hooked up to a power supply, using the espressif32@5.3.0 platform with serial logging suppressed. After the “LWT online”, the log shows no more LWT tokens. Thank you @argafal and @1technophile. My current _env.ini is:

; Custom configuration file for the Seeed XIAO ESP32C3 dev
; Wed, 01 Mar 2023 20:10:45 +0100
;
; Currently there are two problems
; 1.  In long-term runs the ESP32C3 terminates with an assert failure.
;     Therefore the standard platform is overridden with a currently newer one.
;     One may have to repeat Wifi and MQTT configuration after changing platform.
; 2.  Serial output slows down everything else if no terminal is connected.
;     For this situation choose the xxx_no_serial variant for default_envs.
; -------------------------------------------------------------------------

[platformio]
default_envs = seeed_xiao_esp32c3_no_serial

[env:seeed_xiao_esp32c3]
;platform = ${com.esp32_c3_s3_platform} ; standard, problems
platform = espressif32@5.3.0  ; override
board = seeed_xiao_esp32c3
board_build.partitions = min_spiffs.csv
monitor_speed = 115200
lib_deps =
  ${com-esp.lib_deps}
  ${libraries.wifimanager32}
  ${libraries.ble}
  ${libraries.decoder}
build_flags =
  ${com-esp.build_flags}
  '-DZgatewayBT="BT"'
  '-DLED_SEND_RECEIVE=10' ; GPIO: 2 - 10, 20, 21
  '-DLED_INFO=8'
  '-DLED_ERROR=9'
  '-DTRIGGER_GPIO=7'
  '-DNO_INT_TEMP_READING=true' ; No internal temperature on ESP32 C3 or S3
  '-DGateway_Name="OMG_xiao_c3.1"'
custom_description = BLE gateway on Xiao C3
custom_hardware = Seeed XIAO ESP32C3 dev

[env:seeed_xiao_esp32c3_no_serial]
extends = env:seeed_xiao_esp32c3
build_flags =
  ${env:seeed_xiao_esp32c3.build_flags}
  '-DLOG_LEVEL=LOG_LEVEL_SILENT'  ; shut up
  '-DWM_DEBUG_LEVEL=0'

@argafal: I have never seen such nicely documented core dumps from microcontrollers. But I do not have the SERIAL_JTAG directives in my config. Perhaps they add an extra level of information to the serial logs. At the very least I should have seen the WELCOME TO OpenMQTTGateway banner, but I have not.
@1technophile: Another difference between argafal and me might be that I cloned OMG from theengs/OpenMQTTGateway on GitHub, as of Wed Feb 22 14:14:45 2023 +0100. I did that because I am mainly interested in BLE, and there are other theengs tools that look very promising for getting it working.
Nevertheless, tonight I will test the Seeed C3 with serial capture enabled.
See you tomorrow.

@mrickma I much appreciate your diligent testing and reporting. :slight_smile: Nice to see confirmation that espressif32@5.3.0 also resolved the rwble error for you! Good that you don’t have random reboots either; it means there might be hope for me :wink:

For the record, I am on git 03c83fb69c2bd9a36d043b323f1008d6802c961b, Sat Feb 4 16:19:57 2023 -0600.

In the meantime I have also tested espressif32@6.0.1. To me the behaviour looks identical to what I documented in my report on espressif32@5.3.0. In brief, with 6.0.1 I also have:

  • random reboots without an error message
  • OMG continues to work after the reboot
  • Serial listener needs to be present or it slows down massively
  • No assert rwble 261 error, no total death

I am currently out of ideas what to test next. In the meantime I actually much enjoy the constant stream of MQTT data without having to walk up to the ESP32C3 to press the reset button every few hours. Sweet. See you tomorrow.

Seeed Studio XIAO ESP32C3, this time running for more than 13h 20min with serial output enabled and connected: I have not detected any flaws. In the MQTT log the SYStoMQTT tokens continually recorded uptime up to 48007. The serial log also showed no sign of a dump or reboot. So I am pretty satisfied with the _env configuration I posted above. Thank you @argafal and @1technophile.
I have read quite a bit about our problem of serial output in a non-connected state. There are workarounds like skipping, etc. The best approach, it seemed to me, is to avoid serial output when it is not needed. So I can live with two configurations, one for testing and one for production. While reading, I was especially at a loss about documentation on normal serial output and debugging over the same USB port of the ESP32C3. @argafal: Can you try omitting the SERIAL_JTAG directives from your configuration, to see whether that hides or even avoids the reboots?
What am I doing next? I could try to reproduce our problems and successes on an ESP32-C3-WROOM-02 board from Espressif Systems, which I have, to help OpenMQTTGateway upgrade to espressif32@5.3.0. However, my best wish is to ask Alexa what she thinks about our efforts.
Regards

@mrickma Thanks for the concise updates once again. It’s very enjoyable to tackle this alongside you.

Over the last few days I have observed OMG without further changes to my last documented setup. However, I still cannot find a pattern or reason for the random reboots. Ultimately, your post pointed out a small difference between our tests: you are a few commits ahead of me. I have now changed to git 1b5215de as of Fri Mar 3 21:53:07 2023 +0100. I will report back once I’ve seen how it behaves.

We are starting to get environment definitions now that seem to work well with the C3. In the long run, what would be needed to allow flashing through the webflasher on (Option 1) Upload from the web | OpenMQTTGateway v1.4.0?

See you tomorrow :slight_smile:

Once we agree on the environment, it can be added to the repository with a pull request, so that it will be available for web upload and OTA.

Same random reboots. Uptime ranges are around a thousand to a few thousand seconds, i.e. tens of minutes to two hours or so. There is no error message and it’s not always the same location in the log when the reboot occurs. Is there anything I can do to help understand this better?

@argafal: Do you still have the SERIAL_JTAG directives in your configuration? I could imagine that they add some extra serial output. That should not matter as long as it is read. But when the serial connection is disconnected, it would fill up the serial output buffer even when logging is suppressed. In principle that buffer is blocking. That situation slowed down everything else for us, and for me it also increased the frequency of reboots. This is only a wild guess, but the best I have.
I have switched my efforts to an ESP32‐C3‐DevKitC‐02 board from Espressif. It would be nice to have a reliable basic configuration of OMG for the ESP32C3 in general. Since this forum is board oriented, I will open a new thread for my findings. See you.

  • I still have the SERIAL_JTAG directives but I am also reading the serial output. So this should not be the reason for the random reboots.
  • What is the right way of cleaning the build? I removed the SERIAL_JTAG directives, ran pio clean, and then rebuilt/uploaded. However, the serial output still works fine, as if the directives were still there. Serial output had not worked until I added these directives. Is there another (better?) way to clean the build?
  • Is there a way to enable more debug so I can provide more useful feedback on the random reboots?

I have now changed to @mrickma’s environment definition. This time, I uploaded the _no_serial version to the Wemos Lolin C3 mini v2.3.0. Serial output is gone. MQTT shows a successful start-up. The second line is home/OpenMQTTGateway/version version_tag. This seems odd; I guess there should be a version number here.

I will now observe how long it lives with @mrickma’s environment definition and serial output disabled.

The other way would be to decode the exception with an exception decoder, but I think we have enough data here already.

If you do a build yourself with PIO or the Arduino IDE, the version is not added automatically. You can add it with the flag -DOMG_VERSION="test123456"

Thanks for your answer! I’m afraid there is no exception to decode. It just restarts without an error message or exception, without a pattern or trigger that I could identify.

Understood about the version.

I now use @mrickma’s environment definition from this post: Watchdog kills OMG on ESP32-C3 within seconds - #25 by mrickma

My board is a Wemos Lolin C3 mini v2.3.0. Flashing seeed_xiao_esp32c3_no_serial worked without any issues. git hash 1b5215de. OMG runs fine, but still randomly reboots as documented earlier in this thread. Below is the output from an MQTT listener. I did not trigger those reboots myself, and I do not know what causes them:

2023-03-07T16:59:32+0100 home/OpenMQTTGateway/LWT online
2023-03-07T19:15:08+0100 home/OpenMQTTGateway/LWT offline
2023-03-07T19:15:08+0100 home/OpenMQTTGateway/LWT online
2023-03-08T08:09:43+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T08:09:43+0100 home/OpenMQTTGateway/LWT online
2023-03-08T09:42:43+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T09:42:43+0100 home/OpenMQTTGateway/LWT online
2023-03-08T15:56:19+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T15:56:19+0100 home/OpenMQTTGateway/LWT online
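
To put numbers on the uptimes, the LWT lines above can be turned into intervals. This is a minimal sketch, assuming the listener writes one line per message as "timestamp topic payload"; the function name `uptimes` is made up for illustration. The timestamps come from the MQTT listener, so the results are only approximate uptimes.

```python
from datetime import datetime

# LWT lines copied from the MQTT listener output above.
LOG = """\
2023-03-07T16:59:32+0100 home/OpenMQTTGateway/LWT online
2023-03-07T19:15:08+0100 home/OpenMQTTGateway/LWT offline
2023-03-07T19:15:08+0100 home/OpenMQTTGateway/LWT online
2023-03-08T08:09:43+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T08:09:43+0100 home/OpenMQTTGateway/LWT online
2023-03-08T09:42:43+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T09:42:43+0100 home/OpenMQTTGateway/LWT online
2023-03-08T15:56:19+0100 home/OpenMQTTGateway/LWT offline
2023-03-08T15:56:19+0100 home/OpenMQTTGateway/LWT online
"""


def uptimes(log_text):
    """Return the seconds between each 'online' and the following 'offline'."""
    result = []
    online_at = None
    for line in log_text.splitlines():
        parts = line.split()
        # Only consider well-formed LWT lines: timestamp, topic, payload.
        if len(parts) != 3 or not parts[1].endswith("/LWT"):
            continue
        ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S%z")
        if parts[2] == "online":
            online_at = ts
        elif parts[2] == "offline" and online_at is not None:
            result.append(int((ts - online_at).total_seconds()))
            online_at = None
    return result


for seconds in uptimes(LOG):
    print(seconds, "s =", round(seconds / 3600, 1), "h")
```

For this excerpt the four completed runs come out at 8136, 46475, 5580 and 22416 seconds, i.e. roughly 1.5 to 13 hours between reboots.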

I suspect those are the same reboots that I experienced when I had serial output enabled and a serial listener attached with my previous environment definition. For the record: these reboots showed neither an exception nor an error message. I copied and pasted the serial output here: Watchdog kills OMG on ESP32-C3 within seconds - #23 by argafal

Ignoring the reboots, this is nicely usable. I would be happy if an environment definition similar to @mrickma’s above made it into OMG.

Feel free, either you or @mrickma, to propose a PR with the environment. You can take this example:

@argafal: What do you think about

[env:esp32c3_seeed_lolin]
platform = espressif32@5.3.0
board = seeed_xiao_esp32c3
board_build.partitions = min_spiffs.csv
monitor_speed = 115200
lib_deps =
  ${com-esp.lib_deps}
  ${libraries.wifimanager32}
  ${libraries.ble}
  ${libraries.decoder}
build_flags =
  ${com-esp.build_flags}
  '-DZgatewayBT="BT"'
  '-DLED_SEND_RECEIVE=10' ; valid GPIOs
  '-DLED_INFO=8'
  '-DLED_ERROR=9'
  '-DTRIGGER_GPIO=7'
  '-DNO_INT_TEMP_READING=true' ; No internal temperature on ESP32 C3 or S3
  '-DGateway_Name="OpenMQTTGateway_S&L_ESP32C3"'
custom_description = BLE gateway on Seeed or Lolin C3
custom_hardware = Seeed XIAO ESP32C3 dev or Lolin C3 mini v2.1.0

Can you access ownCloud Enterprise Edition? Sorry about the translation of the link. There should be OMGseedlolin.diff.zip containing the complete diff. I think we first have to agree on naming and content, and then make sure that it can be built after uncommenting the respective entry in platformio.ini. See you.

@mrickma I made a PR earlier: Add environment for board Wemos Lolin C3 mini by Argafal · Pull Request #1512 · 1technophile/OpenMQTTGateway · GitHub

Just like you, I was also wondering how to handle the two different boards here. Both your Seeed XIAO ESP32C3 and my Lolin C3 mini need the same environment, where only the line “board = …” would be different. Does it matter? Can we make one environment definition for both boards? Or should each have its own? If the latter, the two environment definitions will be almost identical.

@1technophile Do you have an opinion on the above? I don’t mind either way; I’m happy to update the PR once we have decided.

It seems that there is a difference on the LED side, isn’t there, so the two boards would need different definitions for this?

I think it is only important that the pins are valid. For the Seeed board GPIOs 2-10, 20, 21 are valid. That seems to be the same for the Lolin. However, there is a contradiction in the PR: “; Available GPIOs are 0 - 8, 10, 20, 21.” vs. “‘-DTRIGGER_GPIO=9’”. Otherwise I really like the result of our efforts. See you.