Performance Issues: #234

Open
jack3308 opened this issue Jul 10, 2024 · 13 comments
@jack3308

jack3308 commented Jul 10, 2024

Version

v0.0.0
(installing 0.6.7 to test)

Bug

Opening an issue separate from the wedging issue documented elsewhere.

Running HASS on an RPi 4B (4 GB RAM), booting from SSD. 4 addons (really basic stuff); the processor usually sits at about 5% and RAM at around 25%.

Turning on Bermuda leads to an immediate slowdown of the whole system. The Bluetooth proxies are running just fine, are all added to areas, and seem to have very minimal impact on the system when Bermuda isn't running.

Logs have been sent via email.

Originally posted by @agittins in #199 (comment)

@agittins agittins self-assigned this Jul 10, 2024
@agittins
Owner

I just checked my mail again, and I don't seem to have it. I also checked the logs on my mailserver, still no luck. Could you try re-sending it, or perhaps better still, you can upload it to my nextcloud server (this is an upload-only service, so only I can access the files uploaded):

https://cloud.ajg.net.au/index.php/s/JpeXDnZQGeXqqHB

@jack3308
Author

I'll just leave the link here - logs

I've resent the email and made sure I had the address right so hopefully you get it there too, but it should be fine I think.

@agittins
Owner

Great, got it, thanks!

OK, so for some reason Bermuda is seeing/creating tens of thousands of bluetooth devices, and I think that's causing it to bog down.

The following is partly me just working through the problem and taking notes as I go. If it's all a bit much to take in, just skip down to the "Next Steps" heading :-)

Of the bluetooth scanners that show up in the logs, this is how many updates we get from each one:

      2 0C:8B:95:A8:2D:B4
     12 DC:A6:32:A4:81:91
     30 4C:75:25:C4:51:C0
      5 4C:75:25:C6:80:9C
      5 4C:75:25:C6:80:BC
   6601 8e53deb259378073498fd5d4439ff7c4
  22710 21821ec872bdd98bf04545f692c4d0b8
  35416 158392aa5806da6fef042630ed5e3a0d

The ones that are MAC addresses look normal, and show a fairly "typical" amount of traffic. Those last three are... something else! They are sending thousands upon thousands of BluetoothChange.ADVERT events. This is almost certainly what's causing the bog down, and causing the log to be so big.
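To make the split in that listing concrete, here is a minimal Python sketch that separates MAC-style scanner addresses from the 32-character hex IDs and totals the updates for each kind. The counts are copied from the listing above; the function names `classify` and `totals_by_kind` are hypothetical helpers for illustration and are not part of Bermuda.

```python
import re
from collections import Counter

# Six colon-separated hex pairs, e.g. 0C:8B:95:A8:2D:B4
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")
# 32 lowercase hex characters, like the three unusual scanner IDs
HEX_ID_RE = re.compile(r"^[0-9a-f]{32}$")

# Update counts per scanner, copied from the listing in this comment.
UPDATES = {
    "0C:8B:95:A8:2D:B4": 2,
    "DC:A6:32:A4:81:91": 12,
    "4C:75:25:C4:51:C0": 30,
    "4C:75:25:C6:80:9C": 5,
    "4C:75:25:C6:80:BC": 5,
    "8e53deb259378073498fd5d4439ff7c4": 6601,
    "21821ec872bdd98bf04545f692c4d0b8": 22710,
    "158392aa5806da6fef042630ed5e3a0d": 35416,
}


def classify(scanner_id: str) -> str:
    """Label a scanner identifier as 'mac', 'hex_id' or 'other'."""
    if MAC_RE.match(scanner_id):
        return "mac"
    if HEX_ID_RE.match(scanner_id):
        return "hex_id"
    return "other"


def totals_by_kind(updates: dict[str, int]) -> Counter:
    """Sum the update counts for each kind of scanner identifier."""
    totals = Counter()
    for scanner_id, count in updates.items():
        totals[classify(scanner_id)] += count
    return totals


if __name__ == "__main__":
    for kind, total in totals_by_kind(UPDATES).items():
        print(f"{kind}: {total}")
```

With these numbers the five MAC-addressed scanners account for 54 updates in total, while the three hex-ID scanners account for 64727.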

On top of the log volumes, it also means that there are a lot (like, a LOT a lot!) of devices stored in the backend's bluetooth advert logs. This will be taking up a bit of memory, but it seems it's perhaps not causing any issues except when Bermuda is trying to process the data.

These log entries:

2024-07-01 18:58:56.712 INFO (MainThread) [custom_components.bermuda] Having to prune 63830 extra devices to make quota.
2024-07-01 19:04:07.028 INFO (MainThread) [custom_components.bermuda] Having to prune 63837 extra devices to make quota.
2024-07-01 19:09:19.985 INFO (MainThread) [custom_components.bermuda] Having to prune 63834 extra devices to make quota.
2024-07-02 00:31:37.332 INFO (MainThread) [custom_components.bermuda] Having to prune 63858 extra devices to make quota.
2024-07-02 00:45:07.392 INFO (MainThread) [custom_components.bermuda] Having to prune 63890 extra devices to make quota.
2024-07-02 01:11:53.932 INFO (MainThread) [custom_components.bermuda] Having to prune 63864 extra devices to make quota.
2024-07-02 01:32:13.454 INFO (MainThread) [custom_components.bermuda] Having to prune 63897 extra devices to make quota.
2024-07-02 03:01:29.908 INFO (MainThread) [custom_components.bermuda] Having to prune 63888 extra devices to make quota.

indicate that Bermuda is trimming the number of adverts: even after discarding the old ones (those with old timestamps on them) it still has another 63 thousand to purge to stay within the limit (we keep a thousand).
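As an illustration of what "pruning to make quota" means here, the following Python sketch drops the stalest entries until at most a thousand remain. It is only a sketch of the general idea, not Bermuda's actual code: the function name `prune_to_quota`, the dict-of-timestamps layout and the "oldest first" ordering are all assumptions; only the quota of 1000 comes from the text above.

```python
def prune_to_quota(last_seen: dict[str, float], quota: int = 1000) -> list[str]:
    """Remove the stalest addresses until at most `quota` remain.

    `last_seen` maps a device address to the timestamp of its latest advert
    (hypothetical layout). The dict is modified in place and the removed
    addresses are returned.
    """
    excess = len(last_seen) - quota
    if excess <= 0:
        return []
    # Sort addresses by timestamp, oldest first, and drop only the excess.
    stalest_first = sorted(last_seen, key=last_seen.get)
    removed = stalest_first[:excess]
    for address in removed:
        del last_seen[address]
    return removed


if __name__ == "__main__":
    devices = {f"device-{i}": float(i) for i in range(1005)}
    removed = prune_to_quota(devices)
    print(f"Having to prune {len(removed)} extra devices to make quota.")
```

If something keeps feeding in tens of thousands of fresh entries, a pass like this has to sort and discard roughly the same amount every time it runs, which matches the repeated log lines.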

The error happens repeatedly because those integrations keep reporting new devices (or at least, new timestamps on them).
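To check how steady those prune counts are over time, the log lines can be parsed with a short script. This is a minimal sketch that assumes the line format shown in the entries above; the function name `parse_prune_lines` is hypothetical.

```python
import re
from datetime import datetime

# Matches the timestamp and the pruned-device count in lines like:
# 2024-07-01 18:58:56.712 INFO (MainThread) [custom_components.bermuda] Having to prune 63830 extra devices to make quota.
PRUNE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Having to prune (\d+) extra devices"
)


def parse_prune_lines(lines):
    """Return a list of (timestamp, pruned_count) for matching log lines."""
    results = []
    for line in lines:
        match = PRUNE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            results.append((stamp, int(match.group(2))))
    return results


SAMPLE = [
    "2024-07-01 18:58:56.712 INFO (MainThread) [custom_components.bermuda] Having to prune 63830 extra devices to make quota.",
    "2024-07-01 19:04:07.028 INFO (MainThread) [custom_components.bermuda] Having to prune 63837 extra devices to make quota.",
]

if __name__ == "__main__":
    for stamp, count in parse_prune_lines(SAMPLE):
        print(stamp.isoformat(), count)
```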

Looking at all the IRK devices it "finds" it seems that there are 11 thousand unique addresses there. I think something might be reporting its historical cache of addresses as new adverts, perhaps. Or something.

At any rate, what Bermuda is receiving is way outside of what it expected, which is why your system is bogging down.

Next Steps

There are two things that would be helpful...

I need to work out which integration(s) are providing those bluetooth scanner services, which HA is identifying as
8e53deb259378073498fd5d4439ff7c4, 21821ec872bdd98bf04545f692c4d0b8, and 158392aa5806da6fef042630ed5e3a0d.

In HA, if you can go to "Developer Tools", "Template" and paste this in:

8e53: {{ config_entry_attr('8e53deb259378073498fd5d4439ff7c4', 'domain') }}: {{ config_entry_attr('8e53deb259378073498fd5d4439ff7c4', 'title') }}

2182: {{ config_entry_attr('21821ec872bdd98bf04545f692c4d0b8', 'domain') }}: {{ config_entry_attr('21821ec872bdd98bf04545f692c4d0b8', 'title') }}

1583: {{ config_entry_attr('158392aa5806da6fef042630ed5e3a0d', 'domain') }}: {{ config_entry_attr('158392aa5806da6fef042630ed5e3a0d', 'title') }}

It should give you a listing on the right with the integration name and config title of each of them. Copy/paste that to here and I can work from there.
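If more IDs turn up later, the template text can be generated rather than typed by hand. The Python sketch below only builds the same template lines as above for a list of IDs; `build_template` is a hypothetical helper, and the output still has to be pasted into "Developer Tools", "Template" in HA to be evaluated.

```python
SCANNER_IDS = [
    "8e53deb259378073498fd5d4439ff7c4",
    "21821ec872bdd98bf04545f692c4d0b8",
    "158392aa5806da6fef042630ed5e3a0d",
]


def build_template(entry_ids):
    """Build one template line per ID, labelled by its first four characters."""
    lines = []
    for entry_id in entry_ids:
        # Doubled braces in an f-string produce literal {{ and }}.
        lines.append(
            f"{entry_id[:4]}: "
            f"{{{{ config_entry_attr('{entry_id}', 'domain') }}}}: "
            f"{{{{ config_entry_attr('{entry_id}', 'title') }}}}"
        )
    return "\n\n".join(lines)


if __name__ == "__main__":
    print(build_template(SCANNER_IDS))
```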

That's the most important thing I need right now, as it will tell me which integration is freaking Bermuda out, and I can hopefully try and replicate the issue here.

As a secondary objective 😄, if you have the Bluetooth integration visible, can you download diagnostics from that for me? In "Settings", "Devices and Services", "Bluetooth", click the meatball next to CONFIGURE and choose Download Diagnostics. It will look a bit like:
image
(note this is in the Bluetooth integration, not Bermuda).

I received your last email, so you can again upload it and send me the link via email, if that works for you.

Thanks for your time with this!

@formatBCE

Hi @agittins ! It's grumpy guy from #199

I decided to give another shot to Bermuda.
So here's info you're asking for:
Screenshot 2024-08-02 at 2 13 38 PM
config_entry-bermuda-01J4A5YEKP8F7F5JHBS9G5FBXM.json

Right now installed to 2 ESP devices and also HA has its own bluetooth adapter.
6 beacons added.
I haven't noticed a performance drop so far, except for a strange 20-second hiccup while the diagnostics were being gathered. I thought it was freezing again, but it came back.

@formatBCE

I'll keep it on 2 scanner nodes for a couple of days and see how it works.

@jack3308
Author

jack3308 commented Aug 20, 2024

Sorry it's been such a long time since I've responded, @agittins... Initially I went to try and find the source of those BLE readings in my setup, but I was just getting nulls when I tried the template you provided and when I tried searching through other means. Just by process of elimination, I think I narrowed it down to either the Passive BLE HACS component or the BTHome integration, as I was using some of the Xiaomi BLE temp/humidity sensors. I scrapped those for now, swapped back to some Sonoff Zigbee ones, and removed both of those integrations (plus the Tractive BLE pet tracker integration, but I'd be surprised if that was part of the problem), and I'm not getting the ridiculous logs that I was seeing before.

I'm seeing something much more reasonable now, though I'm still having issues with a performance drop, most notably when it comes to firing events or calling services from the front end. The best example of this is a Roku Remote card in Lovelace: it's pretty instantaneous when Bermuda isn't enabled, very much akin to just using a regular remote or the Roku app. Turning Bermuda on immediately adds a 1-3 second delay to all remote actions I call.

Not sure if any of that is helpful, and happy to provide more info for troubleshooting if you like, but just thought I'd come back with what I found finally.

Regardless, it works really really well to accomplish its goal. Easiest room tracker I've used to date by far! With the added bonus of BLE proxies everywhere!

@agittins
Owner

agittins commented Aug 20, 2024

Cheers Jack, all good. Life gets in the way :-)

That's interesting that the template didn't work, maybe the id isn't a valid config entry, or maybe I got something wrong in the template.

Glad you have your system up and running, but sorry that you're still having some performance/lag issues. I'd like to get to the bottom of it, as it's probably something that others will continue to run into, and it might point to something silly I'm doing that can be fixed :-) . I'm happy to work on this with you as long as you are, and no worries about delays etc.

I think to get a fresh start on looking into this, what would really help is:

  • a download diagnostics from your current system. Note that this can take a bit of time to generate, as it scrubs the results to anonymise the mac addresses etc. The longer your system has been "up" (eg, several days/weeks) the longer this will probably take, as it will have a lot more MAC addresses cached - so doing it within a few hours after a fresh boot might work quicker.
  • 30 seconds of debug logs. This is mainly so I can see how long each processing loop takes for Bermuda to complete, and if it's maybe doing something silly like reloading the device registry or something too often.

Feel free to email the log if you'd rather not post publicly (oh, and feel free to remove your old log from proton). The diagnostic download should be fairly safe to share, depending on your personal privacy boundaries.

Some things I am thinking that could be going on are:

  • lots of devices enabled, or a short update_interval, causing lots of state machine changes (directly impacts the frontend) and/or database writes
  • an unexpectedly large number of source devices being reported, slowing things down
  • Bermuda doing something unexpected.
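For the first bullet, a rough back-of-the-envelope estimate can help decide whether the number of enabled sensors is worth looking at. The Python sketch below assumes, purely for illustration, that every enabled sensor changes state at most once per `update_interval`; that assumption and the function name `max_state_writes_per_second` are mine and not a description of how Bermuda actually schedules updates.

```python
def max_state_writes_per_second(
    tracked_devices: int,
    enabled_sensors_per_device: int,
    update_interval_s: float,
) -> float:
    """Upper bound on state changes per second.

    Assumes (hypothetically) that each enabled sensor on each tracked device
    changes state once per update interval.
    """
    if update_interval_s <= 0:
        raise ValueError("update_interval_s must be positive")
    return tracked_devices * enabled_sensors_per_device / update_interval_s


if __name__ == "__main__":
    # Example values only: 6 devices, 4 enabled sensors each, 1 second interval.
    print(max_state_writes_per_second(6, 4, 1.0))
```

Under that assumption, lengthening the interval or disabling diagnostic sensors reduces the bound proportionally.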

Anyway if you are able to shoot me a diags and a debug log I can take a look from there, and the above might give you some ideas if you decide to do some more digging on top of that.

Regardless, it works really really well to accomplish its goal. Easiest room tracker I've used to date by far! With the added bonus of BLE proxies everywhere!

Awesome, that's great to hear! 😀

@myroslav

myroslav commented Oct 5, 2024

@agittins I have Performance issue w/ Bermuda as well, not sure if I narrowed down the cause, but I tried to track iBeacon provided by HA Companion and added hass_Bluetooth_Proxy w/ its Companion on a stationary Android Tablet. Will try to dig deeper and find out what exactly bogs my system down in the coming days.

Should I post separate issue, or hijacking this one would be ok?

@agittins
Owner

@myroslav sorry for the delay in my reply, I've been away and without solid internet for a while.

Sticking with this issue might be good for now, since I suspect there might be a common thread. The other integration you mention is very interesting; it could certainly be having an effect if it creates bluetooth entries in the HA backend that might be in a format Bermuda is not expecting.

If you're able to submit the output from a "download diagnostics" that might be really helpful, and I'll have a look at the hass_bluetooth_proxy integration when I can, too.

@jack3308 how have things been going with Bermuda for you? If you're still seeing slow-downs then a diagnostics might be helpful. Another thing is to check how many sensors you have enabled - Bermuda creates a lot of diagnostics sensors in a "disabled" state (like "distance to [proxy]" and "unfiltered distance to [proxy]"). If you have many of those enabled that will create a lot of memory use for the browser, making the front-end a bit heavy and slow, and will also result in a lot of database writes, making the backend slow.

@jack3308
Author

jack3308 commented Oct 15, 2024

Yea, I'd enabled the distance and one other sensor (can't quite remember which). But compared to the other things I'd had enabled it should have been negligible - even with the frequency of updates.

For what it's worth, I did attempt the debugging you asked for, including the templates, but I crashed my browser or the Pi each time I attempted it, so something was definitely off. THAT BEING SAID, I think my instance shouldn't be something to focus on because it's been running since I started my HA journey and I've installed, uninstalled, and reinstalled nearly everything I could get my hands on at some point along the way, so my instance is murky as hell anyway. I'm in the process of migrating to a fresh install.

I've swapped my production instance to docker on my server, re-imaged the Pi's SSD, and have had Bermuda running constantly without issue for the past 2 weeks now that it's cleaned up. I haven't moved everything over to it yet, so I'll keep you updated if I experience the same sort of issue as I migrate back to the Pi (once the semester finishes up this week).

@jack3308
Author

jack3308 commented Oct 15, 2024

@agittins I have Performance issue w/ Bermuda as well, not sure if I narrowed down the cause, but I tried to track iBeacon provided by HA Companion and added hass_Bluetooth_Proxy w/ its Companion on a stationary Android Tablet. Will try to dig deeper and find out what exactly bogs my system down in the coming days.

I'd set this up as well, but had uninstalled it and still had the issues. Maybe it leaves some artifacts that interact with Bermuda nastily? Either way, Bermuda does away with the need for the BT_Proxy_Companion for me, so it won't be getting installed again now that I can use my m5 atoms for both tracking and proxy.

@agittins
Owner

Ahh, now that's very interesting! Yes, I wonder if that integration is leaving a lot of records in the bluetooth backend that are confusing/bogging down Bermuda. I might try installing it myself for a bit and see if I can see what it's doing - if nothing else I can probably alter Bermuda to ignore the extra records.

@Anto79-ops

Hello all, I posted a separate issue here without knowing about this one. It includes some callgrinds that @bdraco created while looking for performance bottlenecks on my HAOS system running bare-metal on a NUC12.

thanks


5 participants