ESP32 BLE gateway dying every X days

Let's see in the long run, but there are still a lot of WiFi issues. The one with the official v4 dev board has a lot of reconnects, and when the software does a CPU reset it doesn't seem to reconnect, but with the button reset I have a bit more luck.

I see `abort() was called at PC 0x401b31cb on core 1

Backtrace: 0x40092408:0x3ffd0560 0x40092639:0x3ffd0580 0x401b31cb:0x3ffd05a0 0x401b3212:0x3ffd05c0 0x401b32bf:0x3ffd05e0 0x401b3342:0x3ffd0600 0x401b3359:0x3ffd0620 0x401cd7c3:0x3ffd0640 0x400dd798:0x3ffd0680 0x400d5b5e:0x3ffd06a0 0x400df5c1:0x3ffd06c0 0x4008eb51:0x3ffd06e0`

after it searches for BT devices, though, and it seems to make OTA impossible, as it crashes before the OTA reaches 50%.

Here is a snippet of the serial log https://pastebin.com/VSqfKAhW

Checked the terminal while trying an OTA. It fails with

N: Scan end, deinit controller

[E][ArduinoOTA.cpp:285] _runUpdate(): Receive Failed
Error[3]: E: Receive Failed

That’s very strange, could you try to erase your flash and set the log level in user_config.h to LOG_LEVEL_TRACE?

Also do you have another board to test with?

It works ok on the clone, still testing.

The other one is an ESP32-WROVER-B; on that one the OTA problem and the dump are happening.

Great!

Is there something particular about this board I should be aware of?

Actually, I couldn't get OTA working on my ESP32 either, but thought it was a problem with my IDE (it kept asking for a password). I've been uploading over serial.

I'm not allowed to write a new post for the next two hours; I reached the limit with this one.

I don't think they are very different, but I will do some tracing. First, here is the upload log. I'm tracing now and will let it run until it fails.

This is the flashing log of the esp32-wrover-b

EDIT:

Some more logging; at the bottom there are some errors that stand out: "T: getDeviceByMac 1950E807A107T: Device already discovered or model not detect - Pastebin.com"


TO GGGGH

In the [env] section of platformio.ini:

upload_port = 192.168.1.xxxx (ip to esp)
upload_protocol = espota
upload_flags =
    --port=8266
    --auth=OTAPASSWORDHERE
    --timeout=10

*** EDIT 3 ***

The user level change didn't work. Maybe it's not the best idea to limit posts for new users when things can go back and forth a bit :)

I updated the log a couple of times above.

What I notice is that the ESP works OK for the first 2-3 minutes (good signal). It seems like BT runs on core 0, then a failure happens on core 1, and it starts to run a bit crazy.

It ran 5-10 minutes going on and off the wifi, just being online long enough to send the version info, and no BT activity before it rebooted itself.

I just noticed I can run `pio device monitor > logFile.txt`, which makes it easier to create a full log.

I’m using Arduino IDE, and it shows a popup asking for the password. I’ve had the issue before and I think it was a bug in the IDE, which just asks for the password again if any OTA issue is encountered. It could be a problem with my ESP32; I have a test pending on a new one anyway, so I will also try OTA on it.

I changed your user level, let’s see if it helps.

I did more logging now. If you look at the end, when it sends the core dump it starts to spam disconnection_handling, failed xxxx times. I let it run 1300 times, and it never reconnected.

At the 1300th disconnection_handling I pressed the reset button on the board and it reconnected.

https://pastebin.com/RLveFLbX

Could you indicate your router model please?

I will check the algorithm also to see if there is a bad loop

UniFi USG router with AP-AC-Pro access points. It’s optimized for 2.4 GHz: no fast roaming, best channels, SSID not shared with 5 GHz, and all the other general UniFi best practices, so it’s business-class equipment. I have a lot of clients on 2.4 GHz; only this one does this, and it has full signal strength.

Thanks for pointing this out; I have found 2 issues:

  • the origin of the endless loop: yesterday's removal of the reset function that runs when the gateway fails to connect the first time
  • a bad `;`, which displayed a successful connection message even when that was not the case:
    N: Connected with saved credentials
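
The endless loop described in the first point can be avoided with a bounded retry counter. The following is only a minimal sketch of that idea, not the gateway's actual fix: `ReconnectPolicy`, `WifiAction` and `onCheck` are hypothetical names, and the caller is assumed to supply the real link status (for example `WiFi.status() == WL_CONNECTED` on Arduino).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not OpenMQTTGateway's code.
enum class WifiAction { Connected, Retry, Restart };

struct ReconnectPolicy {
  uint32_t maxFailures;   // restart after this many consecutive failed checks
  uint32_t failures = 0;  // consecutive failed checks so far

  explicit ReconnectPolicy(uint32_t maxFailures_) : maxFailures(maxFailures_) {
    assert(maxFailures_ > 0);  // a limit of 0 would make no sense here
  }

  // Call once per connection check with the real link status.
  WifiAction onCheck(bool connected) {
    if (connected) {
      failures = 0;  // a successful check clears the counter
      return WifiAction::Connected;
    }
    ++failures;
    // Report Restart once the limit is reached instead of retrying forever.
    return failures >= maxFailures ? WifiAction::Restart : WifiAction::Retry;
  }
};
```

With `ReconnectPolicy policy(10);`, the main loop would call `policy.onCheck(...)` on every check and map `Restart` to something like `ESP.restart()`. The limit of 10 is arbitrary.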

They are corrected here:

Can finally confirm my ESP32 dying randomly was fixed by replacing the board with a different model. Exact same flash… just died randomly on one and is fine on a different one.

Was running great and…now it’s died twice today already. Power cycle fixes it.
I’ve got the LWT offline message 23 seconds after the last real message, if that gives any indication of what could be happening. Thought it might be some memory leak, as it had run fine for quite some time, but then one of the runs today lasted barely 4 hours…
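
For what it is worth, a 23-second gap would be consistent with an MQTT keep-alive timeout. MQTT 3.1.1 has the broker drop a client, and then publish its LWT, once one and a half times the keep-alive interval passes without any packet from it. Assuming a 15-second keep-alive (PubSubClient's default `MQTT_KEEPALIVE`; whether this build uses that value is an assumption), that works out to 22.5 seconds. A minimal sketch of the arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how long a broker waits, in milliseconds, before it
// treats a silent client as gone (1.5 x keep-alive per MQTT 3.1.1).
uint32_t brokerTimeoutMs(uint16_t keepAliveSeconds) {
  assert(keepAliveSeconds > 0);    // 0 means keep-alive is disabled
  return keepAliveSeconds * 1500u; // 1.5 s of grace per keep-alive second
}
```

`brokerTimeoutMs(15)` gives 22500 ms, which would suggest the device stopped sending anything right after its last real message rather than degrading slowly.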

If you are using v0.9.5 it is not likely to be a memory leak, I had one running for more than one month without free memory degradation:

Are you on v0.9.5?

Yes, 0.9.5 with bluetooth and RF. It’s happened again today. Had been working fine for weeks, and now can’t get it to run more than a few hours without it freezing and needing a reset. No changes at all other than I’ve added a wifi extender (same wifi name, different channel) around the time it’s started doing this. The OMG device is literally 10 cm from the main router, and maybe 30 meters and 1 floor away from the new AP though. Hard to imagine, but maybe picks up that signal for some reason randomly and can’t handle it?

And if you stop the wifi extender, do you get rid of the issue?

That’d be difficult to test at the moment, especially as it’s actually been running for 5 days now with no further issues, so not a quick test and we rely on signal coverage from that extender. Will negotiate that option if this starts happening again.

Just to update, went ahead with switching the extender to a separate wifi name, and all has been good now for a few weeks. Unsure if this was the cause, but seems like it. Note it was same wifi name on a different channel.
