Cloudflare block when fetching for stream url with correct user agent #1041

Open
yongfg opened this issue Feb 12, 2025 · 12 comments

@yongfg

yongfg commented Feb 12, 2025

Background

A user agent issue was tracked in another thread, but none of the existing user agents would return an RTSP stream URL, so I reverse engineered the app traffic and grabbed the user agent that works on my phone.
The user agent looks like this: (iPhone15,2 18_1_1) iOS Arlo 5.4.3
I verified that this user agent works and returns the RTSP stream URL I want, so I started using it to fetch stream URLs whenever a motion event is triggered, which is a pretty normal thing to do.

However

With the same user agent, after successfully fetching the stream URL a couple of times, I start to get "403 Unknown error occurred". I verified that the credentials still work. When I restart the Pyaarlo object (reloading the session file and grabbing a new scraper), it usually works again for a couple of tries and then runs into the same problem.

I strongly suspect Cloudflare. I tried refreshing the scraper on failure, which improves the situation, but it doesn't work for every account: accounts with more devices seem more likely to fail.

@twrecked, any ideas, suggestions, or experience with bypassing the Cloudflare issue? I'm running out of options for now. Really appreciated!
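
A minimal sketch of the refresh-on-failure idea described above, written so it can run without pyaarlo. The `Forbidden` exception and the `fetch` / `refresh` callables are hypothetical stand-ins: `fetch` would wrap whatever call returns the stream URL, and `refresh` would rebuild the Pyaarlo object or scraper. It only illustrates the retry shape and does not by itself get past the Cloudflare block.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Forbidden(Exception):
    """Hypothetical error the caller's fetch function raises on HTTP 403."""


def fetch_with_refresh(
    fetch: Callable[[], T],
    refresh: Callable[[], None],
    max_attempts: int = 3,
) -> T:
    """Call fetch(); on a 403, rebuild the session via refresh() and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Forbidden:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the 403
            refresh()  # e.g. recreate the scraper before the next try
    raise AssertionError("unreachable")
```

Keeping the refresh step as a callable means the same loop can be reused whether the refresh is a new scraper, a new session file, or a different proxy.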

Please let me know if I should provide more information.

@twrecked
Owner

Thank you for looking into this. I've been trying to get the rtsp stream back after the old user agent I was using was deprecated.

I'll keep playing around with this and report back.

The way I was looking at getting this to work was by adding an egressToken header into the Stream component but that was looking quite complicated to achieve.

@yongfg
Author

yongfg commented Feb 12, 2025

I figured out the new agent by inspecting the traffic from my app and got the working user agent.

I'm interested in your idea. If you can share more information or the obstacles you hit, I can help try things out.

Also, I tried to recreate the scraper with the cookies and user agent like this (your code):

    _cookies, self._user_agent = cloudscraper.get_tokens(ORIGIN_HOST)

But it doesn't seem to help.

Furthermore, I find that using proxies is necessary for me, so every RTSP stream call I'm making goes through randomly rotating proxies.

@twrecked
Owner

Using the mpeg-dash stream is easy enough; here is some code I was using to test pyaarlo.

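    import os
    from urllib.parse import parse_qs, urlparse

    # `camera` is a pyaarlo camera object obtained elsewhere.
    stream_url = camera.start_stream("mac")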
    print("stream-url={}".format(stream_url))
    url = urlparse(stream_url)
    egress_token = parse_qs(url.query)["egressToken"][0]

    print('starting ffmpeg')
    os.system(f"ffmpeg -v debug "
              f"-headers 'Egress-Token: {egress_token}\r\n"
              "Origin: https://my.arlo.com\r\n"
              "Referer: https://my.arlo.com/\r\n"
              "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15\r\n' "
              f"-i '{stream_url}' "
              "-c copy out.mp4")

I just need a mechanism to pass those extra headers into the stream component of Home Assistant to make it work. There is a mechanism to pass in options; I think I just need to expand that.

I'm going to push a change so other people can test the user agent you found. We might be able to find a pattern of when it stops working.
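
The same headers can also be handed to ffmpeg as an argument list rather than through a shell string, which avoids quoting problems with the `\r\n` separators. This is only a sketch: `build_ffmpeg_args` is a hypothetical helper, and the header names and `my.arlo.com` values are copied from the test code above.

```python
from urllib.parse import parse_qs, urlparse


def build_ffmpeg_args(stream_url: str, user_agent: str, out_path: str = "out.mp4") -> list[str]:
    """Build an ffmpeg argv list that sends the egress token as an HTTP header."""
    tokens = parse_qs(urlparse(stream_url).query).get("egressToken")
    if not tokens:
        raise ValueError("stream URL has no egressToken query parameter")
    # Each header line must end with CRLF for ffmpeg's -headers option.
    headers = (
        f"Egress-Token: {tokens[0]}\r\n"
        "Origin: https://my.arlo.com\r\n"
        "Referer: https://my.arlo.com/\r\n"
        f"User-Agent: {user_agent}\r\n"
    )
    # -headers is an input option, so it goes before -i.
    return ["ffmpeg", "-headers", headers, "-i", stream_url, "-c", "copy", out_path]
```

The result can be run with `subprocess.run(args, check=True)`; no shell is involved, so the URL and headers need no extra escaping.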

@yongfg
Author

yongfg commented Feb 12, 2025

Talking about the mpeg-dash stream, I'm always getting:

https://arlostreaming21093-z2-prod.wowza.arlo.com:80/stream/AF524577E0D0C_1739335424492.mpd?egressToken=c50899ae_4e7c_453b_afa6_766a567d52eb&userAgent=web&cameraId=AF524577E0D0C_1739335424492&txnId=FE!eb34d40e-42ce-4d49-9b99-dff1daa6edb7&watchalong=true: Invalid data found when processing input

That's why I have to fall back to RTSP. Any quick thoughts? I'd really appreciate it if we could get this https stream to work, because then we'd have two options.

@twrecked
Owner

This is what I showed you. We need to pass the headers I showed you in the previous message to the stream component. I'm trying to work out how to do it.

ffmpeg needs to pass an egressToken as part of the headers when it opens the stream.

@twrecked twrecked self-assigned this Feb 12, 2025
@twrecked twrecked added the enhancement New feature or request label Feb 12, 2025
@twrecked
Owner

twrecked commented Feb 12, 2025

These diffs allow me to get mpeg-dash streaming working.

This diff applies to core Home Assistant.

diff --git a/homeassistant/components/stream/__init__.py b/homeassistant/components/stream/__init__.py
index 8fa4c69ac5a..51758f0ede8 100644
--- a/homeassistant/components/stream/__init__.py
+++ b/homeassistant/components/stream/__init__.py
@@ -44,6 +44,7 @@ from .const import (
     ATTR_SETTINGS,
     ATTR_STREAMS,
     CONF_EXTRA_PART_WAIT_TIME,
+    CONF_HTTP_HEADERS,
     CONF_LL_HLS,
     CONF_PART_DURATION,
     CONF_RTSP_TRANSPORT,
@@ -166,6 +167,8 @@ def _convert_stream_options(
         pyav_options["rtsp_transport"] = rtsp_transport
     if stream_options.get(CONF_USE_WALLCLOCK_AS_TIMESTAMPS):
         pyav_options["use_wallclock_as_timestamps"] = "1"
+    if headers := stream_options.get(CONF_HTTP_HEADERS):
+        pyav_options[CONF_HTTP_HEADERS] = headers
 
     # For RTSP streams, prefer TCP
     if isinstance(stream_source, str) and stream_source[:7] == "rtsp://":
@@ -624,5 +627,6 @@ STREAM_OPTIONS_SCHEMA: Final = vol.Schema(
         vol.Optional(CONF_RTSP_TRANSPORT): vol.In(RTSP_TRANSPORTS),
         vol.Optional(CONF_USE_WALLCLOCK_AS_TIMESTAMPS): bool,
         vol.Optional(CONF_EXTRA_PART_WAIT_TIME): cv.positive_float,
+        vol.Optional(CONF_HTTP_HEADERS): cv.string,
     }
 )
diff --git a/homeassistant/components/stream/const.py b/homeassistant/components/stream/const.py
index c81d2f6cb18..d6b96deef5c 100644
--- a/homeassistant/components/stream/const.py
+++ b/homeassistant/components/stream/const.py
@@ -60,6 +60,7 @@ RTSP_TRANSPORTS = {
 }
 CONF_USE_WALLCLOCK_AS_TIMESTAMPS = "use_wallclock_as_timestamps"
 CONF_EXTRA_PART_WAIT_TIME = "extra_part_wait_time"
+CONF_HTTP_HEADERS = "headers"
 
 
 class StreamClientError(IntEnum):

This is for the aarlo piece:

diff --git a/custom_components/aarlo/camera.py b/custom_components/aarlo/camera.py
index 9f4aa9e..0ef985d 100644
--- a/custom_components/aarlo/camera.py
+++ b/custom_components/aarlo/camera.py
@@ -13,6 +13,7 @@ import logging
 import voluptuous as vol
 from collections.abc import Callable
 from haffmpeg.camera import CameraMjpeg
+from urllib.parse import urlparse, parse_qs
 
 import homeassistant.helpers.config_validation as cv
 from homeassistant.components import websocket_api
@@ -517,6 +518,23 @@ class ArloCam(Camera):
 
         return attrs
 
+    def _stream_source(self, user_agent):
+        """Return the source of the stream.
+
+        This set stream_options if the stream is https so we can pass egress
+        token on.
+        """
+        self.stream_options = {}
+        stream_url = self._camera.get_stream(user_agent)
+        if stream_url is not None:
+            if stream_url.startswith("https"):
+                url = urlparse(stream_url)
+                egress_token = parse_qs(url.query)["egressToken"][0]
+                self.stream_options = {
+                    "headers": f"Egress-Token: {egress_token}\r\n"
+                }
+        return stream_url
+
     async def stream_source(self):
         """Return the source of the stream.
 
@@ -524,11 +542,11 @@ class ArloCam(Camera):
         to the original Arlo one. This means we get a `rtsps` stream back which the stream
         component can handle.
         """
-        return await self.hass.async_add_executor_job(self._camera.get_stream, "arlo")
+        return await self.hass.async_add_executor_job(self._stream_source, "linux")
 
     async def async_stream_source(self, user_agent=None):
         return await self.hass.async_add_executor_job(
-            self._camera.get_stream, user_agent
+            self._stream_source, user_agent
         )
 
     def camera_image(

edit: removed the manifest changes
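
For anyone who wants to check the header logic outside Home Assistant, here is a standalone sketch of what `_stream_source` in the aarlo diff does with the URL. `stream_options_for` is a hypothetical name, and unlike the diff it returns empty options instead of raising when the `egressToken` parameter is missing, which is an assumption about the desired behaviour.

```python
from urllib.parse import parse_qs, urlparse


def stream_options_for(stream_url):
    """Return stream options for a URL; empty unless it is https with a token."""
    if not stream_url or not stream_url.startswith("https"):
        return {}  # rtsp/rtsps streams need no extra headers
    tokens = parse_qs(urlparse(stream_url).query).get("egressToken")
    if not tokens:
        return {}  # assumption: fall back to no headers rather than fail
    return {"headers": f"Egress-Token: {tokens[0]}\r\n"}
```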

@yongfg
Author

yongfg commented Feb 12, 2025

Great, the mpeg-dash streaming works. Thank you for the help.

I also want to share what I found while investigating the Cloudflare issue. It doesn't seem to be related to the user agent: with linux, mac or arlo, after requesting the stream URL 6-9 times for the same device, I start to get 403s. I tried making the request pattern more random (random wait times, random retries, etc.), but that doesn't help without refreshing the cloudscraper. After all, Cloudflare is protecting the endpoint, so the block happens before we even get the stream URL.

@justinmaiuto

Is this still being looked at? I believe I have the same problem: a Cloudflare 403 after a little while. Once this happens I need to delete the integration and configure it from scratch again. Then at some point during the day the streams stop working again.

@twrecked
Owner

@justinmaiuto Which bit? There are two pieces: the broken agent and mpeg-dash support.

I can't tell what is happening from your comment. Can you provide:

  • the home assistant version
  • the aarlo version
  • and attach your aarlo.yaml config

And capture some debug logs around the time the issue happens.

@justinmaiuto

justinmaiuto commented Feb 28, 2025

Sorry, I rushed that one a bit:

HA 2025.2.5
Aarlo 0.8.1.18
Config:

version: 1
aarlo:
  refresh_devices_every: 3
  stream_timeout: 180
  request_timeout: 120
  reconnect_every: 240
  user_agent: linux
  backend: sse
  stream_snapshot: true
  stream_snapshot_stop: 5
  snapshot_checks:
  - 5
  snapshot_timeout: 20
  mode_api: v2

The cameras don't start a stream; the error is "Failed to start WebRTC stream: Camera has no stream source".

The Aarlo integration does not report any errors while the streams are not working, but after restarting HA or reloading the integration, the following error is given: "Error: login failed: 403 - possible cloudflare issue
If error persists you might need to change config and restart."

Nothing obvious from the logs:

unable to connect to Arlo: attempt=1,sleep=15,error=login failed: 403 - possible cloudflare issue
unable to connect to Arlo: attempt=2,sleep=30,error=login failed: 403 - possible cloudflare issue
unable to connect to Arlo: attempt=3,sleep=60,error=login failed: 403 - possible cloudflare issue

login failed: 403 - possible cloudflare issue

failed to read modes (v2)
login failed: 403 - possible cloudflare issue
error loading the image library
No devices returned from /hmsweb/v2/users/devices?t=1740665383763
No devices returned from /hmsweb/v2/users/devices?t=1740676243767
No devices returned from /hmsweb/v2/users/devices?t=1740687043782
No devices returned from /hmsweb/v2/users/devices?t=1740697903771
Error requesting stream: camera.aarlo_nursery_camera does not support play stream service
Error requesting stream: camera.aarlo_hallway_camera does not support play stream service
Error requesting stream: camera.aarlo_back_yard_camera does not support play stream service
Error requesting stream: camera.aarlo_garage_camera does not support play stream service
Error requesting stream: camera.aarlo_valentina_camera does not support play stream service

I can get Aarlo working again by either deleting the cookie and pickle files, or deleting the integration altogether and starting again. The streams work for a while, but then the issue above repeats.
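
The retry delays in the log above (15, 30, 60 seconds) look like a doubling backoff. A tiny sketch of that schedule follows; the function name and defaults are assumptions read off the log lines, not taken from the pyaarlo source.

```python
def backoff_delays(attempts: int, base: float = 15.0, factor: float = 2.0) -> list[float]:
    """Return the sleep time before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor**i for i in range(attempts)]


print(backoff_delays(3))  # [15.0, 30.0, 60.0]
```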

@justinmaiuto

Update: now I can't reconnect at all. I tried deleting the cookie and pickle files, reinstalling through HACS, and changing the Gmail application password. Every attempt ends in the 403 Cloudflare issue.

@twrecked
Owner

I can't connect either. Something has changed and even the website is blocked for me.

I'm not sure what it is right now.
