ESP32 BLE gateway dying every X days

Off topic, but what do you use the esp32-cam for? Regular network cameras, or something interesting? I have a few spares looking for a project.
I was looking to get into TensorFlow for detecting unoccupied parking spots in my building’s parking lot, but I am dragging my feet on actually starting.

Yes, I have configured them as regular network cameras (for indoor use, but I guess they could work outdoors as well, given a good housing). They are so small that they are easy to mount in well-hidden spots :wink:

My system for home security video monitoring is structured like this:

Outdoor & indoor camera streams → Motion sw (when motion is detected, those frames are recorded and forwarded) → Nvidia Jetson Nano (where the AI object detection takes place) → if “people” are detected, some compressed pictures are sent to my iPhone for each event.

The video monitoring is only activated when our home alarm system is “armed” or “armed home”

On the Jetson Nano, I use TensorRT with detectNet. It works well and is fast. I have also tried YOLO, which is very accurate too. Both models, with their trained networks, support detecting cars, but in my use case I filter on “people” only.


PS: I have also seen the ESP32 BLE gateway stop working after a while. Since this morning I have been running a version with:

#define TimeBtw_Read 5000 //define default time between 2 scans
#define Scan_duration 10 //define the time for a scan

So far so good: I receive scans of my iBeacon approximately every 15 seconds. We will see. If it stops, I will try to start it again via the serial monitor (PlatformIO) and check whether it then runs longer. I have no idea yet whether that will make any difference, but I noticed yesterday that when it stops, it can be started again via the serial monitor.
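
For anyone wondering where the 15 seconds comes from, here is a minimal sketch of the arithmetic. It assumes TimeBtw_Read is in milliseconds and Scan_duration in seconds, and that the pause and the scan simply run back to back. The function name cyclePeriodMs is mine, not something from the firmware.

```cpp
#include <cstdint>

// Values copied from the two #define lines above.
constexpr std::uint32_t kTimeBtwReadMs = 5000;  // pause between two scans (assumed milliseconds)
constexpr std::uint32_t kScanDurationS = 10;    // length of one scan (assumed seconds)

// Approximate time from the start of one scan to the start of the next,
// assuming the pause and the scan do not overlap.
constexpr std::uint32_t cyclePeriodMs(std::uint32_t pauseMs, std::uint32_t scanS) {
    return pauseMs + scanS * 1000u;
}

// 5000 ms + 10 s = 15000 ms, which matches the observed reporting interval.
static_assert(cyclePeriodMs(kTimeBtwReadMs, kScanDurationS) == 15000,
              "one cycle should take 15 s with these settings");
```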


When you say “Motion sw” do you mean on the esp32, or something built into a dedicated security camera?
You’re forwarding just several frames, not the entire rtsp stream?

I think the borderline of stability for the esp32 is around 10 seconds between scans, though I haven’t tried to see whether doing the scan on a separate core makes any difference. Btw, I also wanted to use it for presence detection (to turn lights off, never on), but I get a lot of cases where the beacon has a better rssi to the esp32 in the next room than to the one in the room I’m in. Humans block Bluetooth better than walls do, so the rssi varies as you orient yourself differently relative to the beacon.

Yes, Motion is running on an RPi3. The esp32-cam is not doing any image processing, it is just providing the stream to Motion over the wifi network. I doubt it is rtsp; you typically get the stream via http, like http://192.168.0.35:81/stream. I have not measured the frame rate, but it is ok for my use case.
This is Motion: https://motion-project.github.io/index.html

I use presence detection for when we arrive home. Close to our house, the presence of my iPhone and my wife’s is detected by a sw named Monitor running on an RPi (it is a bash script, so I don’t think it can run on an esp32). On the phones, only bluetooth has to be turned on; no additional apps like Owntracks are needed. When a phone is detected, the video detection is paused for 5 minutes, allowing us to enter and disarm the alarm system. We do not want pictures of ourselves sent to our phones for legitimate entries.

And this is Monitor: https://github.com/andrewjfreyer/monitor

My test with the ESP32 BLE gateway is still running fine.

ESP32 BLE gw still running fine, 69 hours now since start



Same, still holding strong


I think I may have found something…

So the story is as follows: I have been running the ESP32 for testing via the serial monitor in PlatformIO. In our home network I have a scheduler that restarts my wireless AP once a week (power off, wait, power on). This seems to be good for the AP’s behavior.

Anyway, this happens at 4:30 am on Mondays (today). When it happened, I could see that the ESP32 BLE gateway stopped working and never recovered. From the log it looks like it is looping, retrying to reconnect to the wifi but never succeeding. Closing the serial monitor and restarting helped; everything then restarted from scratch ok.

See the following from the serial monitor log:

client not connected can't pub
client not connected can't pub
Creating BLE buffer
device detected
75B2426C1801
BLErssi
-95
txPower
-59
BLE DISTANCE :
35.51
client not connected can't pub
client not connected can't pub
MQTT connection...
[E][WiFiClient.cpp:232] connect(): connect on fd 58, errno: 118, "Host is unreachable"
failure_number
1151
failed, rc=
-2
Scan end, deinit controller
BT Task running on core 0
MQTT connection...
[E][WiFiClient.cpp:232] connect(): connect on fd 59, errno: 118, "Host is unreachable"
failure_number
1152
failed, rc=
-2
Scan begin
MQTT connection...
[E][WiFiClient.cpp:232] connect(): connect on fd 60, errno: 118, "Host is unreachable"
failure_number
1153
failed, rc=
-2
Creating BLE buffer
device detected
47DEF4224401
BLErssi
-87
txPower
-59
BLE DISTANCE :
18.08
client not connected can't pub
client not connected can't pub
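
As an aside, the BLE DISTANCE values in this log look consistent with the curve-fit formula commonly used by beacon libraries. I inferred that from the numbers, not from the firmware source, so treat it as an assumption. With this formula, rssi -95 and txPower -59 give roughly 35.51, and rssi -87 gives roughly 18.08, matching the log. The function name estimateDistance is mine.

```cpp
#include <cmath>

// Curve-fit RSSI-to-distance estimate in metres.
// Assumption: this is the formula behind "BLE DISTANCE" in the log above;
// it reproduces the logged values but is not taken from the firmware source.
double estimateDistance(int rssi, int txPower) {
    if (rssi == 0 || txPower == 0) {
        return -1.0;  // no usable measurement
    }
    const double ratio = static_cast<double>(rssi) / txPower;
    if (ratio < 1.0) {
        // Signal stronger than the calibrated 1 m reference: very close.
        return std::pow(ratio, 10);
    }
    // Empirical fit for signals weaker than the 1 m reference.
    return 0.89976 * std::pow(ratio, 7.7095) + 0.111;
}
```

Since the estimate grows with roughly the 7.7th power of the ratio, a few dB of rssi jitter moves the result by several metres, so it is better read as a rough ranking than as a measurement.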

Hello,

Thanks for the extract. This is with v0.9.3beta, isn’t it?

If yes, are you using wifimanager for entering your wifi credentials?

Yes, 0.9.3beta and yes, always using wifimanager for the first setup after a new flashing/upload

OK, I will try to reproduce it, but maybe we should open another topic, as it is a different issue.

If you prefer, ok, but the symptoms are the same: it just stops working.

Anyway, I tried to reproduce it in the following way:

  1. Start PlatformIO and run from the serial monitor
  2. Everything is working fine; reporting to mqtt works, as seen in the terminal log
  3. Cut the power to the AP for a while (a minute or two)
  4. In the terminal log you see that the ESP has lost its wifi connection; that it cannot find the AP is continuously reported:

client not connected can't pub
client not connected can't pub
*WM: [2] [EVENT] WIFI_REASON: 201
*WM: [2] [EVENT] WIFI_REASON: NO_AP_FOUND
*WM: [2] [EVENT] WIFI_REASON: 201
*WM: [2] [EVENT] WIFI_REASON: NO_AP_FOUND
MQTT connection...
[E][WiFiClient.cpp:232] connect(): connect on fd 57, errno: 118, "Host is unreachable"

  5. Turn on the power to the AP again. Messages like the ones above are no longer shown, so it seems the AP is found again, since only this message is repeatedly shown in the log:

[E][WiFiClient.cpp:232] connect(): connect on fd 57, errno: 118, "Host is unreachable"

  6. No reconnection to the AP happens; you can wait forever
  7. To recover, restart the serial monitor task; everything is initialized from scratch and it starts working again
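
The steps above suggest the failure counter keeps climbing without anything escalating. Here is a minimal sketch of one possible escalation policy: after every few failed MQTT attempts, force a wifi reconnect, and after many, restart. This is a hypothetical policy of mine, not OpenMQTTGateway's actual code, and the names ReconnectPolicy, Action and onMqttResult are made up for illustration.

```cpp
#include <cstdint>

// What the main loop should do after one MQTT connection attempt.
enum class Action { None, ReconnectWifi, Restart };

// Hypothetical escalation policy (not OpenMQTTGateway's actual code).
class ReconnectPolicy {
public:
    ReconnectPolicy(std::uint32_t wifiRetryEvery, std::uint32_t restartAfter)
        : wifiRetryEvery_(wifiRetryEvery), restartAfter_(restartAfter) {}

    // Call once per MQTT connection attempt with its outcome.
    Action onMqttResult(bool connected) {
        if (connected) {
            failures_ = 0;  // healthy again, start counting from scratch
            return Action::None;
        }
        ++failures_;
        if (failures_ >= restartAfter_) {
            return Action::Restart;  // last resort: reboot the device
        }
        if (wifiRetryEvery_ != 0 && failures_ % wifiRetryEvery_ == 0) {
            return Action::ReconnectWifi;  // periodically kick the wifi layer
        }
        return Action::None;
    }

    std::uint32_t failures() const { return failures_; }

private:
    std::uint32_t wifiRetryEvery_;
    std::uint32_t restartAfter_;
    std::uint32_t failures_ = 0;
};
```

On the ESP32 Arduino core, ReconnectWifi would presumably map to WiFi.reconnect() and Restart to ESP.restart(). Thresholds such as 5 and 20 are arbitrary and would need tuning against how long the AP restart takes.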

Hope this helps,
Kind regards, Walter

Hello,

Thanks for this detailed report.
I suspect that the BLE scan is monopolizing the antenna.
Here is a modification to test, to see whether it corrects your issue:
https://github.com/1technophile/OpenMQTTGateway/tree/wifi-reconnect-when-scan

This modification prevents the BLE scan from starting while MQTT is disconnected.

Hello,
It did not help. I can see that “MQTT client disconnected no BLE scan” is written to the log, but the wifi client does not try to look for the AP and reconnect, even when the AP is back.

I’m also having the problem that it freezes after a couple of hours or days, depending on the Scan_duration used.
I’m using an M5Stack.

I have now tried v0.9.3beta with the default 10 s Scan_duration. It worked fine for 5 days and then froze again; after manually re-powering the hardware it worked for 5 hours and froze again, so I see no logical pattern.

When I do:
mosquitto_sub -t 'home/#' -v
I get:
home/OpenMQTTGateway/LWT offline
home/OpenMQTTGateway/version 0.9.3beta

I tried:
mosquitto_pub -t home/OpenMQTTGateway/commands/MQTTtoBT/set -m '{"interval":0}'
and then:
mosquitto_pub -t "home/OpenMQTTGateway/commands/MQTTtoSYS/set" -m '{"cmd":"restart"}'

but nothing happened.
What else can/should I check?
Leaving it connected to a computer over serial for a few days, hoping it will log something useful for diagnosis, is a bit complicated.

I’m using the mijia temp sensors to control the heating in the house, so when this freezes in the middle of the night it means we literally start freezing!!! :slight_smile:

There must be something I can change to make it work, otherwise my family will get quite angry with me :slight_smile:

I will test a different ESP32 board (devkit v1) to check if this is m5-stack related or not…

In the meantime I’m launching the stability tests of the current dev branch. I will keep you updated with the results.

May I suggest that you plan a daily restart for such a “freezing” use case :wink:
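
A minimal sketch of what such a daily restart check could look like on the device side, based on the uptime in seconds that the gateway already publishes. The name restartDue and the 24 hour default are my assumptions; the thread does not say how the restart should be scheduled.

```cpp
#include <cstdint>

constexpr std::uint32_t kSecondsPerDay = 24u * 60u * 60u;

// True once the device has been up for at least intervalS seconds.
// Hypothetical helper: when it returns true, the main loop would trigger
// a restart (for example ESP.restart() on the ESP32 Arduino core).
constexpr bool restartDue(std::uint32_t uptimeS,
                          std::uint32_t intervalS = kSecondsPerDay) {
    return uptimeS >= intervalS;
}
```

The same effect can be had from outside the device by publishing the restart command quoted earlier in the thread to home/OpenMQTTGateway/commands/MQTTtoSYS/set on a schedule, but that only works while the gateway is still connected to the broker.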

I’m thinking that maybe my freezes have something to do with the fact that I have around 6 eq3 eqiva bluetooth heating valves, and OMG seems to be discovering them and somehow getting some data from them. Even if I filter my mijia temp sensors out of the mqtt messages, I still get a bunch of stuff from the eq3:

home/home_presence/OpenMQTTGateway/id cc:b1:1a:1a:41:59
home/home_presence/OpenMQTTGateway/manufacturerdata u
home/home_presence/OpenMQTTGateway/rssi -89
home/home_presence/OpenMQTTGateway/distance 21.5
home/OpenMQTTGateway/BTtoMQTT/CCB11A1A4159/id cc:b1:1a:1a:41:59
home/OpenMQTTGateway/BTtoMQTT/CCB11A1A4159/manufacturerdata u
home/OpenMQTTGateway/BTtoMQTT/CCB11A1A4159/rssi -89
home/OpenMQTTGateway/BTtoMQTT/CCB11A1A4159/distance 21.5
home/home_presence/OpenMQTTGateway/id 4c:65:a8:d9:c6:3d
home/home_presence/OpenMQTTGateway/rssi -81
home/home_presence/OpenMQTTGateway/distance 10.5
home/home_presence/OpenMQTTGateway/id 00:1a:22:0c:74:f1
home/home_presence/OpenMQTTGateway/rssi -85
home/home_presence/OpenMQTTGateway/distance 15.1
home/OpenMQTTGateway/BTtoMQTT/001A220C74F1/id 00:1a:22:0c:74:f1
home/OpenMQTTGateway/BTtoMQTT/001A220C74F1/rssi -85
home/OpenMQTTGateway/BTtoMQTT/001A220C74F1/distance 15.1
home/home_presence/OpenMQTTGateway/id 00:1a:22:0e:0d:d9
home/home_presence/OpenMQTTGateway/rssi -92
home/home_presence/OpenMQTTGateway/distance 27.8
home/OpenMQTTGateway/BTtoMQTT/001A220E0DD9/id 00:1a:22:0e:0d:d9
home/OpenMQTTGateway/BTtoMQTT/001A220E0DD9/rssi -92
home/OpenMQTTGateway/BTtoMQTT/001A220E0DD9/distance 27.8
home/home_presence/OpenMQTTGateway/id 74:63:0b:c5:b9:df
home/home_presence/OpenMQTTGateway/manufacturerdata L
home/home_presence/OpenMQTTGateway/rssi -92
home/home_presence/OpenMQTTGateway/distance 27.8
home/OpenMQTTGateway/BTtoMQTT/74630BC5B9DF/id 74:63:0b:c5:b9:df
home/OpenMQTTGateway/BTtoMQTT/74630BC5B9DF/manufacturerdata L
home/OpenMQTTGateway/BTtoMQTT/74630BC5B9DF/rssi -92
home/OpenMQTTGateway/BTtoMQTT/74630BC5B9DF/distance 27.8
home/home_presence/OpenMQTTGateway/id 42:ee:70:fc:86:0f
home/home_presence/OpenMQTTGateway/manufacturerdata L
home/home_presence/OpenMQTTGateway/rssi -74
home/home_presence/OpenMQTTGateway/txpower 12
home/home_presence/OpenMQTTGateway/distance 5.3
home/OpenMQTTGateway/BTtoMQTT/42EE70FC860F/id 42:ee:70:fc:86:0f
home/OpenMQTTGateway/BTtoMQTT/42EE70FC860F/manufacturerdata L
home/OpenMQTTGateway/BTtoMQTT/42EE70FC860F/rssi -74
home/OpenMQTTGateway/BTtoMQTT/42EE70FC860F/txpower 12
home/OpenMQTTGateway/BTtoMQTT/42EE70FC860F/distance 5.3
home/home_presence/OpenMQTTGateway/id 00:1a:22:0c:76:41
home/home_presence/OpenMQTTGateway/rssi -82

Do you think this might contribute to the dying every X days?
If I add the eq3 addresses to a blacklist (is there something like that?), or just add the mijia sensors to the whitelist, maybe it will be better?

One more thing: before, I was using pBLEScan->setActiveScan(false) in order to save power, but I changed it back to rule out that this could be causing the freezes. I see that it still freezes even with ActiveScan true.
I would really prefer to save power so that I don’t have to change batteries so often (I have 6 sensors + 6 valves around the house).

I just added my mijia sensors to the whitelist to filter out everything else, let’s see if this helps with the freezes.
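
For illustration, here is a rough sketch of what a whitelist check has to do. The topic dump above shows the same address in two spellings (cc:b1:1a:1a:41:59 and CCB11A1A4159), so addresses need normalizing before comparison. The helper names are mine, this is not how OpenMQTTGateway implements its whitelist, and which address belongs to which kind of device is my guess.

```cpp
#include <cctype>
#include <set>
#include <string>

// Strip separators and upper-case the hex digits,
// so "cc:b1:1a:1a:41:59" becomes "CCB11A1A4159".
std::string normalizeMac(const std::string& mac) {
    std::string out;
    for (unsigned char c : mac) {
        if (std::isxdigit(c)) {
            out.push_back(static_cast<char>(std::toupper(c)));
        }
    }
    return out;
}

// True if the address, in any spelling, is in the whitelist.
// The whitelist is expected to hold normalized addresses.
bool isWhitelisted(const std::set<std::string>& whitelist, const std::string& mac) {
    return whitelist.count(normalizeMac(mac)) > 0;
}
```

With a whitelist containing only normalizeMac("4c:65:a8:d9:c6:3d"), the address 00:1a:22:0c:74:f1 from the dump would be dropped before publishing.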

(after few hours)

Ok, now it just logs this, without any Bluetooth sensor stuff:

home/OpenMQTTGateway/LWT online
home/OpenMQTTGateway/version 0.9.3beta

home/OpenMQTTGateway/SYStoMQTT/uptime 16800
home/OpenMQTTGateway/SYStoMQTT/freeMem 39140
home/OpenMQTTGateway/SYStoMQTT/rssi -68
home/OpenMQTTGateway/SYStoMQTT/SSID fmar
home/OpenMQTTGateway/SYStoMQTT/ip 192.168.1.18
home/OpenMQTTGateway/SYStoMQTT/mac 80:7D:3A:C8:28:4C
home/OpenMQTTGateway/SYStoMQTT/modules BT
home/OpenMQTTGateway/SYStoMQTT/uptime 16920
home/OpenMQTTGateway/SYStoMQTT/freeMem 39140
home/OpenMQTTGateway/SYStoMQTT/rssi -68
home/OpenMQTTGateway/SYStoMQTT/SSID fmar
home/OpenMQTTGateway/SYStoMQTT/ip 192.168.1.18
home/OpenMQTTGateway/SYStoMQTT/mac 80:7D:3A:C8:28:4C
home/OpenMQTTGateway/SYStoMQTT/modules BT

I noticed that the freeMem is much lower than when it started (a few hours before):

home/OpenMQTTGateway/SYStoMQTT/uptime 12240
home/OpenMQTTGateway/SYStoMQTT/freeMem 59680
home/OpenMQTTGateway/SYStoMQTT/rssi -66
home/OpenMQTTGateway/SYStoMQTT/SSID fmar
home/OpenMQTTGateway/SYStoMQTT/ip 192.168.1.18
home/OpenMQTTGateway/SYStoMQTT/mac 80:7D:3A:C8:28:4C
home/OpenMQTTGateway/SYStoMQTT/modules BT
home/OpenMQTTGateway/SYStoMQTT/uptime 12360
home/OpenMQTTGateway/SYStoMQTT/freeMem 58840
home/OpenMQTTGateway/SYStoMQTT/rssi -66
home/OpenMQTTGateway/SYStoMQTT/SSID fmar
home/OpenMQTTGateway/SYStoMQTT/ip 192.168.1.18
home/OpenMQTTGateway/SYStoMQTT/mac 80:7D:3A:C8:28:4C
home/OpenMQTTGateway/SYStoMQTT/modules BT
home/OpenMQTTGateway/SYStoMQTT/uptime 12480
home/OpenMQTTGateway/SYStoMQTT/freeMem 58588
home/OpenMQTTGateway/SYStoMQTT/rssi -66
home/OpenMQTTGateway/SYStoMQTT/SSID fmar
home/OpenMQTTGateway/SYStoMQTT/ip 192.168.1.18
home/OpenMQTTGateway/SYStoMQTT/mac 80:7D:3A:C8:28:4C
home/OpenMQTTGateway/SYStoMQTT/modules BT

when I tried:

home/OpenMQTTGateway/commands/MQTTtoBT/set {"interval":0}

it just went:

home/OpenMQTTGateway/LWT offline

and then nothing

What the hell could be wrong with it that it stopped logging BLE data?
At least it has not frozen like it did before I added the sensor addresses to the whitelist…

Any ideas what to try next? I’m running out of ideas, and of hope that this can be a reliable option for extending the range of the bluetooth mijia sensors… :frowning:

That’s an interesting lead, thanks for pointing it out.

Instead of that, could you try a restart, to see whether you recover scan availability and whether the memory returns to the same level as at start?

On my side I’m monitoring 2 ESP32s to see whether I can reproduce the same behaviour.