But I didn’t see the config show up in an MQTT message when using MQTT Explorer.
Both devices (the Theengs Bridge and the RPi with the Docker container and USB Bluetooth dongle) are at the same location. The gateway in the Docker container is stable and has been sending data for weeks without any disruption. The Theengs Bridge suffers a disruption almost every hour, lasting from 20 seconds to a couple of minutes, during which no data is sent to the broker. Meanwhile the Theengs Gateway keeps sending data, and my Tile devices are advertising as usual.
What’s the difference between the Theengs Gateway and the Theengs Bridge causing this disruption? Am I using incorrect parameters?
And after this interruption, does it resume retrieving the data by itself?
Mostly, data starts coming in by itself quite fast, but not always. And since 13:30 today it has stopped, see this log on my MQTT client:
2024-08-12 21:14:09 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 21:17:49 [INFO] data received from client 2, resuming normal operation
2024-08-12 21:55:49 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 22:01:09 [INFO] data received from client 2, resuming normal operation
2024-08-12 23:00:49 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 23:01:09 [INFO] data received from client 2, resuming normal operation
2024-08-12 23:08:29 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 23:12:09 [INFO] data received from client 2, resuming normal operation
2024-08-13 00:33:49 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 00:34:09 [INFO] data received from client 2, resuming normal operation
2024-08-13 03:33:29 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 03:33:49 [INFO] data received from client 2, resuming normal operation
2024-08-13 03:35:49 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 03:39:09 [INFO] data received from client 2, resuming normal operation
2024-08-13 05:24:09 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 05:24:29 [INFO] data received from client 2, resuming normal operation
2024-08-13 08:48:50 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 08:52:30 [INFO] data received from client 2, resuming normal operation
2024-08-13 10:08:50 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 10:09:10 [INFO] data received from client 2, resuming normal operation
2024-08-13 11:05:10 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-13 11:05:30 [INFO] data received from client 2, resuming normal operation
2024-08-13 13:30:30 [ERROR] no data received from client 2 for at least 20 seconds
...
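To put numbers on these outages, a log like the one above can be summarised with a short script. This is only a sketch: it assumes the line format shown in the log (timestamp, then `[ERROR]` or `[INFO]`), and the function name `outages` is made up for illustration. Note that the error is only logged after 20 seconds of silence, so each figure is the time from the alert to the recovery, not the full silence.

```python
from datetime import datetime

# A few lines copied from the log above; the last outage has no recovery yet.
LOG = """\
2024-08-12 21:14:09 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 21:17:49 [INFO] data received from client 2, resuming normal operation
2024-08-12 23:00:49 [ERROR] no data received from client 2 for at least 20 seconds
2024-08-12 23:01:09 [INFO] data received from client 2, resuming normal operation
2024-08-13 13:30:30 [ERROR] no data received from client 2 for at least 20 seconds
"""


def outages(log_text):
    """Pair each [ERROR] line with the next [INFO] line.

    Returns a list of (start, seconds) tuples. seconds is None when the
    log ends before a matching recovery line.
    """
    result = []
    start = None
    for line in log_text.splitlines():
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp, such as "..."
        if "[ERROR]" in line and start is None:
            start = stamp
        elif "[INFO]" in line and start is not None:
            result.append((start, (stamp - start).total_seconds()))
            start = None
    if start is not None:
        result.append((start, None))
    return result


for began, seconds in outages(LOG):
    status = "still down" if seconds is None else f"{seconds:.0f} s until recovery"
    print(began, status)
```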
I would suggest not going below 100 ms, to leave time to send the data.
I also tried it with 1000ms, same behaviour.
You could also set "onlysensors":true to reduce the traffic to the broker.
For my use case I would like to scan for any device. And I believe the Docker container is also not scanning for just sensors.
Questions:
Do you understand this different behaviour between the Theengs Bridge and the Docker Container?
Can I check/enable logs on the Theengs Bridge to understand why it pauses for a while?
I planned to replace my Docker containers by using the Theengs Bridges, but first I need it to be stable.
The common ground is the decoder they use; beyond that, they do not share the same libraries or hardware.
They are enabled by default. Feel free to connect it to a computer with a USB cable and log the trace; it will be interesting to see what’s going on when it disconnects. You can use Arduino IDE to monitor the logs.
Also, could you try Wi-Fi if you are using Ethernet, or Ethernet if you are using Wi-Fi, to check whether that makes a difference?
Thank you for explaining the common ground. I wasn’t aware these were such different implementations. I bought the Theengs Bridges because the Docker container worked fine. My plan was to simplify my software stack, and become location independent, by using standalone devices for BLE discovery.
The current state of the Theengs Bridge is that it regularly disconnects and reconnects (every 1-2 hours), but after a while (a day or two?) it disconnects and never comes back online. This is not stable enough for my use case. So I will try my best to gather information about the cause, and I hope you can resolve it with a software update once it’s clear why this happens.
I was using Ethernet only; I have now reset the device and switched to Wi-Fi only to see if this makes a difference.
I installed Arduino IDE on my PC to check the logs. But can you give me a couple of hints or a short description of how to check the logs? It looks like I have to write a sketch and upload it to the device? I have no Arduino experience so far; I mainly work with Python and Raspberry Pis.
For my information, are the Theengs Bridges in general use by yourself and others, and are you seeing a stable situation in which they keep sending messages for a week without disruptions?
This should not occur, of course. Usually, my Theengs Bridges, connected by Ethernet, are stable and last weeks without a restart. I have several options to try depending on the results of the different tests.
Thanks, let’s see how it goes.
If you are unfamiliar with Arduino IDE, I may have a more straightforward solution for you. However, it was still worth installing it to get the USB drivers.
Thank you for providing the steps to check the logs.
The Bridge is now using Wi-Fi only, since I performed a reset and unplugged the Ethernet cable. So far it has been running without issues for 21 hours. First, I will wait a couple of days to see if anything happens before I connect it to a PC and use it while connected. I will keep you updated.
Once you are done, if you are still interested, I suggest going to the development version instead of trying to get logs with the serial monitor. It improved the stability for another user.
When asked if you want to erase the flash, you can leave the checkbox unchecked.
It ran ~27 hours without issues using only Wi-Fi, then it lost connection. I found it this morning with just a yellow light blinking every ~7 seconds. So far both Wi-Fi and Ethernet give similar results. However, I’m not sure if I saw the same pattern of lights when it lost connection on Ethernet.
Then I plugged it into my PC and it started working again, probably due to the power cycle. I installed the development version without checking the erase checkbox. After updating it started working again and resumed its connection to the broker. It’s still using the default parameters since the reset.
I can see the log messages on the screen now. Do I understand correctly that I have to keep my PC on until the crash to see the logs? Or is there an option to run the bridge independently and plug it in later to download the log of the past days? I noticed the WebUI now works; does the Console provide exactly the same information?
Another question: can I enable timestamps at the start of every log line, to make it easier to see when an error happened?
When starting up I notice a couple of errors, is this expected?
************* WELCOME TO OpenMQTTGateway **************
[ 174][E][Preferences.cpp:50] begin(): nvs_open failed: NOT_FOUND
N: SYS config not found
[ 175][E][esp32-hal-gpio.c:102] __pinMode(): Invalid pin selected
E (134) gpio: gpio_set_level(226): GPIO output gpio_num error
A little further I notice some Wi-Fi errors, is this expected?
N: Attempting Wifi connection with saved AP: 0
N: Attempting Wifi connection with saved AP: 1
E (17279) wifi:sta is connecting, return error
[ 8442][E][WiFiSTA.cpp:317] begin(): connect failed! 0x3007
N: Attempting Wifi connection with saved AP: 2
E (18287) wifi:sta is connecting, return error
[ 9450][E][WiFiSTA.cpp:317] begin(): connect failed! 0x3007
N: Attempting Wifi connection with saved AP: 3
E (19295) wifi:sta is connecting, return error
[ 10458][E][WiFiSTA.cpp:317] begin(): connect failed! 0x3007
N: Attempting Wifi connection with saved AP: 4
E (20303) wifi:sta is connecting, return error
[ 11466][E][WiFiSTA.cpp:317] begin(): connect failed! 0x3007
Not necessarily; as you now have the development version, you could leave it running on its own, connected to the power supply.
If something goes wrong with this version, we will have to maintain a serial connection to have access to the logs.
Yes, we could imagine going into the WebUI if something goes wrong (if the issue is not a network connection).
With Arduino IDE yes, but not with the WebUI.
It did not find a system configuration; if you have not changed the system parameters, this is expected.
After running ~17 hours, I received a warning from my broker that no messages were received for at least 20 seconds. But shortly after, messages were received again. So there was a short hiccup.
Since it’s running standalone now, using Ethernet, I went into the WebUI but I didn’t see any issue. If I understand correctly, the WebUI console does not show any historic information from before you opened it. I will have to keep the console window open on my PC to catch the error.
Would it be possible to add a download button to the WebUI console to download the logs of the full day?
I need to think about it. There is a Download button when the board is connected to Serial with the web upload, but it does not export a full day.
The WebUI would make it possible to catch a recurring error without having to keep the console open, but for transient errors the solution is to have the Arduino IDE connected over a Serial cable.
We may not need to go there, let’s see how it goes.
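If it does come to capturing serial output over several days, one option is to prefix each line with the host PC's clock as it arrives, since the device output shown earlier carries only uptime counters. The sketch below is a hypothetical helper, not part of the Theengs or OpenMQTTGateway tooling; `stamp_lines` is an invented name, and it works on any iterable of lines so that it can be tried without hardware.

```python
from datetime import datetime


def stamp_lines(lines, now=datetime.now):
    """Yield each non-empty line prefixed with the time it was read.

    `lines` may yield str or bytes (serial ports usually deliver bytes).
    `now` is injectable so the function can be tested with a fixed clock.
    """
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        text = raw.rstrip("\r\n")
        if text:
            yield f"{now().isoformat(timespec='seconds')} {text}"


# Demo with two lines taken from the boot log earlier in this thread.
sample = [
    b"N: Attempting Wifi connection with saved AP: 0\r\n",
    b"\r\n",
    "E (17279) wifi:sta is connecting, return error\n",
]
for line in stamp_lines(sample):
    print(line)
```

To use this with the real device, the iterable could be a serial port object from the third-party pyserial package, with the output redirected to a file. The port name and baud rate depend on your setup and are not given in this thread.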
The development version has been running for a week now and the good news is that it has not stopped sending messages completely after a couple of days, like the original version did.
However, my broker notifies me 2 to 3 times per day that it has not received any messages for at least 20 seconds. So far, this issue has resolved itself every time a couple of seconds later. The Theengs Gateway running from a Docker container on an RPi has never shown such behavior, even after running for weeks.
When looking at MQTT Explorer, it seems all the messages are received in a batch, and then it goes silent for ~10 seconds before the next batch arrives. Is it possible to shorten this timeframe? I expected the scanduration to affect this, but that’s set to 1000, which should be 1 second, right?
I’ve now changed the default parameters so that the interval is at 1 instead of 100 and the minrssi is at -200 instead of -100.
Would this be helping in any way? Or would it be better to connect the bridge to my PC for a couple of days to see if anything shows up in the logs during the short timeout occasions?
Sounds great that you are working on a new improved version, looking forward!
I notice that you seem to be able to quote my replies, but when I click the reply button, nothing from your text is quoted. Am I missing a button on this forum?
Today I looked more closely at the Theengs Bridge compared to my RPi 3b with internal Bluetooth that is running the Docker container with the Theengs Gateway. I discovered that the device discovery results are very different.
My bike has a Tile device connected to the frame so that I know if the bike is home or away. Both the RPi and the Theengs Bridge report to the same broker. My home automation system checks the broker to see if a device was reported more than 5 minutes ago, and then sets it to away. Both the RPi and the Theengs Bridge are located at the same spot in my house, about 5 metres away from the bike with one brick wall in between.
Since I haven’t used the bike for a month now, the RPi with the Theengs Gateway has reported the bike as home for 30 days already. It has never failed to find the Tile for more than 5 minutes.
Looking at the Theengs Bridge, with its external antenna, I expected similar or even better results. But I noticed that the bike shows the away status ~40 times a day, every day.
As far as I know, the Tile device advertises every few seconds, so this should not be happening. I noticed this behaviour for multiple devices.
This behaviour would make it impossible to use the Theengs Bridge for discovering devices related to a home automation system. Do you understand why this is happening and can we get it improved?
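For comparing the two gateways, the away rule described above (a device not reported for more than 5 minutes is set to away) can be replayed over recorded sighting times. This is a minimal sketch of that rule only, not the home automation system's actual code; `count_away_events` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

AWAY_AFTER = timedelta(minutes=5)  # the 5-minute rule described above


def count_away_events(sightings, away_after=AWAY_AFTER):
    """Count gaps between consecutive sightings that exceed the away timer."""
    ordered = sorted(sightings)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > away_after)


# Invented sample: offsets in seconds from an arbitrary start time.
t0 = datetime(2024, 8, 13, 12, 0, 0)
seen = [t0 + timedelta(seconds=s) for s in (0, 10, 20, 400, 410, 1000)]
print(count_away_events(seen))  # gaps of 380 s and 590 s exceed 300 s -> 2
```

Feeding it the timestamps reported for the same Tile by each gateway would show whether the Bridge really misses advertisements for more than 5 minutes at a time, or whether the messages arrive late.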
I would suggest the following parameters:
"interval": 500 → leave time to process the data
"scanduration": 30000 → improve the capacity to catch advertisements
"presenceawaytimer": 300000 → increase the time before declaring the device offline
I have now adjusted the parameters as you advised and will check in 1-2 days to see the effect.
But by increasing the scanduration, will it take longer to find a new device? When I come home with my bike, the garage door opens when it is found. It would be a shame if I had to wait ~30 seconds because of the scanduration.
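The three suggested values can be combined into a single JSON payload, as sketched below. The parameter names and values are the ones suggested above; the command topic is an assumption based on the usual OpenMQTTGateway layout, and `<gateway_name>` is a placeholder, so check both against your own broker before publishing anything.

```python
import json

# Assumed topic layout; <gateway_name> is a placeholder, not a real name.
CONFIG_TOPIC = "home/<gateway_name>/commands/MQTTtoBT/config"

# Values as suggested above (they appear to be in milliseconds).
settings = {
    "interval": 500,              # leave time to process the data
    "scanduration": 30000,        # longer window to catch advertisements
    "presenceawaytimer": 300000,  # wait longer before declaring a device away
}

payload = json.dumps(settings)
print(CONFIG_TOPIC)
print(payload)
```

The printed payload can be pasted into the publish field of MQTT Explorer together with the topic.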