ESP32 BLE gateway dying every X days

I changed your user level, let’s see if it helps.

I have added more logging. If you look at the end of the log, after it sends the core dump it starts to spam disconnection_handling, failed xxxx times. I let it run for 1300 attempts, and it never reconnected.

At the 1300th disconnection_handling I pressed the reset button on the board and it reconnected.

https://pastebin.com/RLveFLbX

Could you tell me your router model, please?

I will also check the algorithm to see if there is a bad loop.

UniFi USG router with AP-AC-Pro access points. The network is optimized for 2.4 GHz: no fast roaming, best channels, SSID not shared with 5 GHz, and all the other general UniFi best practices, so it is business-class equipment. I have a lot of clients on 2.4 GHz, and only this one does this, even with full signal strength.

Thanks for pointing this out; I have found 2 issues:

  • the origin of the endless loop: the reset function that runs when the gateway fails to connect the first time was removed yesterday
  • a misplaced ; which caused a successful connection message to be displayed even when the connection had failed:
    N: Connected with saved credentials
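
To illustrate the first issue, here is a minimal sketch of a bounded reconnect loop that falls back to a restart instead of retrying forever. It is not the gateway's actual code: `connectOrRestart`, `tryConnect`, `restart` and `maxAttempts` are hypothetical names chosen for this example.

```cpp
#include <functional>

// Hypothetical helper, not OpenMQTTGateway code: try to connect up to
// maxAttempts times and call restart() if every attempt fails.
// Returns the number of connection attempts that were made.
int connectOrRestart(const std::function<bool()>& tryConnect,
                     const std::function<void()>& restart,
                     int maxAttempts) {
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    if (tryConnect()) {
      return attempt;  // connected, no restart needed
    }
  }
  // Every attempt failed: restart rather than loop endlessly.
  // On an ESP32 Arduino build this callback could wrap ESP.restart().
  restart();
  return maxAttempts;
}
```

With a callback that never succeeds and `maxAttempts` set to 4, the helper makes exactly four attempts and then invokes `restart` once, which is the behaviour that was lost when the reset call was removed.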

They are corrected here:

Can finally confirm my ESP32 dying randomly was fixed by replacing the board with a different model. Exact same flash… just died randomly on one and is fine on a different one.


Was running great and… now it has died twice today already. A power cycle fixes it.
I’ve got the LWT offline message 23 seconds after the last real message, if that gives any indication of what could be happening. Thought it might be some memory leak, as it had run fine for quite some time, but then one of the runs today lasted barely 4 hours…
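
One possible reading of the 23 seconds, as a small worked example: MQTT 3.1.1 has the broker close the connection (and publish the LWT) when it hears nothing from a client for 1.5 times the keep-alive interval. Assuming the build uses a 15 second keep-alive (the PubSubClient library default; whether this firmware uses that value is an assumption), the timeout lands very close to the observed delay, which would suggest the board stopped sending anything right after its last message.

```cpp
// MQTT 3.1.1 keep-alive rule: the broker drops a client that has been
// silent for one and a half times the negotiated keep-alive interval.
constexpr double lwtDelaySeconds(unsigned keepAliveSeconds) {
  return 1.5 * keepAliveSeconds;
}

// Assumed keep-alive of 15 s (not confirmed for this firmware build).
constexpr double kAssumedLwtDelay = lwtDelaySeconds(15);  // 22.5 s
```

This only narrows down when the device went silent; it says nothing about why it froze.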

If you are using v0.9.5 it is not likely to be a memory leak; I had one running for more than one month without free memory degradation:

Are you on v0.9.5?

Yes, 0.9.5 with Bluetooth and RF. It’s happened again today. It had been working fine for weeks, and now I can’t get it to run more than a few hours without it freezing and needing a reset. No changes at all, other than that I added a wifi extender (same wifi name, different channel) around the time it started doing this. The OMG device is literally 10 cm from the main router, and maybe 30 meters and 1 floor away from the new AP though. Hard to imagine, but maybe it randomly picks up that signal for some reason and can’t handle it?

And if you turn off the wifi extender, does the issue go away?

That’d be difficult to test at the moment, especially as it’s actually been running for 5 days now with no further issues, so not a quick test and we rely on signal coverage from that extender. Will negotiate that option if this starts happening again.

Just to update, went ahead with switching the extender to a separate wifi name, and all has been good now for a few weeks. Unsure if this was the cause, but seems like it. Note it was same wifi name on a different channel.


I have an ESP32 board with capabilities [“RF”,“BT”,“HCSR501”,“DHT”], and it dies every few days or hours. To make it restart I connect 5 V to the “EN” pin.

Is this normal?

Here are the history snapshots of uptime and free memory in HA:


Thanks for the chart, it is interesting.
I also have ESP32 gateways that sometimes restart automatically, but I’m not seeing these memory jumps. Are you using the default parameters for BLE scanning? Are there any particular configuration changes that you could share?

Here are the changes I made to the BT configuration (the reason is that I want the BT gateway to read the status change of my door sensor more frequently, so it doesn’t miss any door open event):

#  define ScanBeforeConnect 80
#  define TimeBtwRead 7000

You can see the whole diff here: esp32 1st config based on 0.9.8 · lkisme/OpenMQTTGateway@dbd57dc · GitHub

It’s based on the “0.9.8” tag.
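
As a rough worked example of what the two values quoted above imply, assuming TimeBtwRead is the time between two BLE scans in milliseconds and ScanBeforeConnect is the number of scans performed before the gateway attempts a BLE connection (both meanings are inferred from the macro names and should be checked against the config header of the tag in use):

```cpp
#include <cstdint>

// Values taken from the post; their meaning is an assumption (see above).
constexpr std::uint32_t kScanBeforeConnect = 80;  // scans per connection attempt
constexpr std::uint32_t kTimeBtwReadMs = 7000;    // milliseconds between scans

// Approximate time between two connection attempts, in milliseconds.
constexpr std::uint32_t connectIntervalMs(std::uint32_t scans,
                                          std::uint32_t timeBtwReadMs) {
  return scans * timeBtwReadMs;
}

// 80 scans x 7000 ms = 560000 ms, i.e. about 9 minutes 20 seconds.
constexpr std::uint32_t kConnectIntervalMs =
    connectIntervalMs(kScanBeforeConnect, kTimeBtwReadMs);
```

Under these assumptions the gateway scans roughly every 7 seconds and would try a connection about once every 9 minutes.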

FYI, there are only 2 BLE devices I need to track, so I send the white list command to the gateway once per day.

You could publish the white list with the retain flag. The broker will then store it, and you will not have to send it every day.
https://docs.openmqttgateway.com/use/ble.html#setting-a-white-or-black-list
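
To show why the retain flag removes the need for a daily resend, here is a toy in-memory model of the broker side. It is not a real MQTT broker or client; `ToyBroker` and its methods are hypothetical, and the topic and payload used with it are placeholders (the linked documentation has the exact format).

```cpp
#include <map>
#include <string>

// Toy model of MQTT retain semantics, for illustration only.
class ToyBroker {
 public:
  // A non-retained message is only forwarded to clients subscribed at
  // that moment, so nothing is stored. A retained message is kept per
  // topic; an empty retained payload clears the stored message.
  void publish(const std::string& topic, const std::string& payload,
               bool retain) {
    if (!retain) return;
    if (payload.empty()) {
      retained_.erase(topic);
    } else {
      retained_[topic] = payload;
    }
  }

  // What a client gets immediately when it (re)subscribes, for example
  // after the gateway reboots. Returns false if nothing is retained.
  bool onSubscribe(const std::string& topic, std::string& out) const {
    auto it = retained_.find(topic);
    if (it == retained_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> retained_;
};
```

In this model, a white list published once with `retain` set to true is handed back on every later `onSubscribe` call, while one published without it is gone by the time a rebooted gateway subscribes again.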

Thanks for the tip; for some reason, I hadn’t used the retain flag.

Do you have any idea why the “free memory” jumps?

BTW, my other NodeMCU board dies and sometimes won’t restart. In this case, if I push the RST button on the board, the serial output prints “{null} {null}”, and only after several pushes does the board restart successfully. Is there some way to make sure it restarts (like I did with the ESP32 by connecting EN to VCC)?

I don’t see these issues with my devices. I will try to reproduce your exact configuration to verify whether I get the same behaviour.

Could you give the exact model?

Hi, after some work, I found 2 kinds of scenarios in which OMG would crash; both of them are related to the MQTT connection.

This topic is so long that I created a new one: Runtime exceptions(would cause restart) with OMG
