Adding support for header mappings in form submit. #327
Thank you for this extensive contribution! I really like your solution and it seems to solve many open issues! Thanks again and let me know if you have questions!
Thank you for making the integration in the first place! You're welcome, I'm glad to contribute. No worries, I can wait. The scraping works for me; I just have to refresh it from time to time.
@danieldotnl, what is the rough estimate? Days, weeks, or months?
Weeks!
Finally! I merged the test-service branch!
Thanks!
Any progress? I don't want your tokens to expire ;-)
Sorry, I totally forgot about this and only remembered it two days ago. I'll do it now/soon.
I messed up and did not create a dev branch on my fork (it's my first PR on GitHub, and I kind of assumed my fork would count as a dev branch). There might have been a way around this, but it is what it is. 😄 I'll try to reactivate this PR, and if that's not possible, I'll create a new one.
@danieldotnl, merge complete. Please review.
Thanks a lot! That wasn't the easiest merge 😊 I'll look into it soon.
Yep, it wasn't an easy merge. On the other hand, it wasn't that complex, only tedious (making sure nothing gets lost in the transition). Regarding testing, you should be able to use it with any website that authenticates with only a username and password (no two-factor authentication or more complicated encryption):
```yaml
multiscrape:
  - name: AirVisual
    resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices'
    scan_interval: 10
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
      input:
        email: '<username>'
        password: '<password>'
      header_mappings:
        - name: X-Login-Token
          value_template: '{{ (value | from_json).loginToken }}'
    sensor:
      - name: test_header_mapping
        value_template: '{{ 10 }}'
```
Get user_id with the instructions from the PR description. If you remove header_mappings from the config (or enter a wrong password), you should get a 401 error, because the headers are not populated in the main scrape. To test the feature more thoroughly, maybe we can ask the people who filed the issues linked in this PR (which this feature should resolve) to test it?
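To make the mechanism concrete, here is a minimal Python sketch of what a header mapping does conceptually: take the form-submit response, evaluate each mapping against it, and merge the results into the headers of the main scrape. It is not the integration's code: `HeaderMapping` and `apply_header_mappings` are hypothetical names, and a plain Python function stands in for the Jinja `value_template`.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class HeaderMapping:
    """Hypothetical stand-in for one entry under form_submit.header_mappings."""

    name: str
    # Plays the role of value_template: receives the raw form-submit response body.
    extract: Callable[[str], str]


def apply_header_mappings(
    form_response: str,
    mappings: List[HeaderMapping],
    base_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the headers for the main scrape request.

    Starts from any statically configured headers (copied, not mutated) and
    adds one header per mapping, extracted from the form-submit response.
    """
    headers = dict(base_headers or {})
    for mapping in mappings:
        headers[mapping.name] = mapping.extract(form_response)
    return headers


# Python equivalent of value_template: '{{ (value | from_json).loginToken }}'
login_token = HeaderMapping(
    name="X-Login-Token",
    extract=lambda body: json.loads(body)["loginToken"],
)

signin_body = '{"loginToken": "abc123"}'  # made-up sign-in response body
print(apply_header_mappings(signin_body, [login_token]))  # {'X-Login-Token': 'abc123'}
```

If the mapping list is empty, the main request goes out with only the static headers, which matches the 401 described above when header_mappings is removed.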
Nice job!
But most sites with a username/password don't require this, right? I can check whether it forwards headers, but I'd still like to verify that the login works with this change and didn't work without it.
I made a throwaway account, but I don't see how I can get the user_id since I don't have a device to add... Ideas?
Yes, I'll create a pre-release and we'll ask the people who created token-related issues to try it out.
I thought that most of them actually do give you a token that you use to access the website. Otherwise, how does the website know that a specific request is properly authenticated? Without a token, spoofing becomes much easier. On the other hand, you do not want to transmit the username and password with every request, as that exposes you to many other vulnerabilities. I'm more familiar with bearer tokens (sent in the Authorization header), but X-Login-Token seems to serve the same purpose. And here is Copilot's summary:
Go to https://dashboard.iqair.com/personal/devices and in the
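A toy in-memory simulation of the token flow described above, in Python: the password is checked once at sign-in, and afterwards only the token travels with each request. `FakeTokenServer` and its methods are made up for illustration and do not model any real site's API.

```python
import secrets


class FakeTokenServer:
    """Toy in-memory site that uses token-based auth (illustration only)."""

    def __init__(self, email: str, password: str) -> None:
        self._credentials = (email, password)
        self._issued_tokens = set()  # tokens handed out by sign_in

    def sign_in(self, email: str, password: str) -> str:
        """Check the credentials once and return a fresh random token."""
        if (email, password) != self._credentials:
            raise PermissionError("401 Unauthorized")
        token = secrets.token_hex(16)
        self._issued_tokens.add(token)
        return token

    def get_devices(self, headers: dict) -> int:
        """Protected endpoint; returns an HTTP-style status code."""
        if headers.get("X-Login-Token") in self._issued_tokens:
            return 200
        return 401


server = FakeTokenServer("me@example.com", "secret")
token = server.sign_in("me@example.com", "secret")  # the password is sent only here
print(server.get_devices({"X-Login-Token": token}))  # 200
print(server.get_devices({}))  # 401, like a main scrape without the forwarded header
```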
@danieldotnl, any progress on the review? Can I help you somehow?
Thanks @jeremicmilan, I know it's taking a long time! I had it working with AirVisual a week ago, so I now understand how it works. Now I need to find time for the review.
I have been thinking about this and wanted to share my thoughts with you. I think this feature should be about two-step requests, where we need to pass something back to the server that we received in the response to the first request. This could be:
I was wondering if we could solve all three of them, because I believe the other two cases are actually more common than yours. Maybe your example would then become something like this:
```yaml
multiscrape:
  - name: AirVisual
    resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices/<device_id>?units.system=metric&AQI=US&language=en'
    headers:
      X-Login-Token: '{{ token }}'
    scan_interval: 10
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
      input:
        email: '<email>'
        password: '<password>'
      variables:
        - name: token
          value_template: '{{ (value | from_json).loginToken }}'
    sensor:
```
Later on, the other cases could be implemented. Let me know what you think!
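A rough Python sketch of how the proposed `variables` could work, assuming each variable is evaluated once against the form-submit response and then substituted into `{{ name }}` placeholders in the main request's headers. The function names are hypothetical, and the regex substitution is a deliberately simplified stand-in for real Jinja rendering (no filters or expressions).

```python
import json
import re
from typing import Callable, Dict

# Matches "{{ name }}" placeholders; a simplified stand-in for Jinja rendering.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_variables(
    form_response: str, extractors: Dict[str, Callable[[str], str]]
) -> Dict[str, str]:
    """Evaluate each proposed `variables` entry against the form-submit response."""
    return {name: extract(form_response) for name, extract in extractors.items()}


def render_headers(
    header_templates: Dict[str, str], variables: Dict[str, str]
) -> Dict[str, str]:
    """Substitute variables into the main request's header templates.

    An unknown placeholder raises KeyError instead of silently rendering empty.
    """

    def substitute(template: str) -> str:
        return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)

    return {name: substitute(value) for name, value in header_templates.items()}


variables = extract_variables(
    '{"loginToken": "abc123"}',  # made-up sign-in response body
    {"token": lambda body: json.loads(body)["loginToken"]},
)
print(render_headers({"X-Login-Token": "{{ token }}"}, variables))  # {'X-Login-Token': 'abc123'}
```

Failing loudly on an unknown name, rather than sending an empty header, makes a typo in a variable name easy to spot.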
I like the idea, and it should not be too hard to implement. Maybe we should be more specific when using variables, to avoid naming conflicts? Something like
I only wish you had replied a couple of days ago, as I had some extra time during the holiday season. Now it's going to take a while before I find a slot to work on this, but I'll get it done. :)
Hello.
In the log, the header looks like this:
Is there any way to do this?
@BentoAlves, once this PR is completed, you'll be able to do something like this. Per the current feature design, you will fetch the bearer token in the form_submit part of the configuration and then use it in the scrape itself. @danieldotnl and I are currently discussing how exactly this functionality should be exposed. @danieldotnl, this is another interesting way to solve the problem. However, I don't like that the bearer token leaks into the rest of the system (where nothing else needs it), so I would still do it by forwarding variables from the form submit part. That said, there are probably legitimate use cases for using global Home Assistant state while scraping something? Could you do that today, and @BentoAlves is just not doing it correctly at the moment?
@jeremicmilan That's very good. @danieldotnl Being able to get a token from a state or an input helper is also very useful. It makes it easier to change a token, or even the username and password of an integration, without having to edit a file and reload the integration. @jeremicmilan I'm definitely not doing it. This was just a way of expressing my need.
@jeremicmilan I think our proposal with the variables will solve @BentoAlves's case. I don't see a convincing use case for "scraping" headers into sensors. Headers are not meant to carry state-like data.
It's not a problem to get a token from a sensor or input helper. Pulling it into the sensor is the part that isn't supported.
@jeremicmilan I already implemented cookie handling because I needed it. I don't think cookies need to be part of the variables, as I don't see a reason why they shouldn't be included in every request, just like browsers do.
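For comparison, browser-like cookie handling boils down to remembering every `Set-Cookie` a response sends and replaying the cookies on later requests. The Python sketch below does that with the standard library's `http.cookies.SimpleCookie`; `SimpleCookieStore` is a hypothetical name, and it ignores domain, path, and expiry rules, which `http.cookiejar` implements in full.

```python
from http.cookies import SimpleCookie
from typing import Dict, List


class SimpleCookieStore:
    """Minimal browser-like cookie store (ignores domain, path and expiry)."""

    def __init__(self) -> None:
        self._cookies: Dict[str, str] = {}

    def update_from_response(self, set_cookie_headers: List[str]) -> None:
        """Remember every cookie set by a response, e.g. the form-submit response."""
        for header in set_cookie_headers:
            parsed = SimpleCookie()
            parsed.load(header)
            for name, morsel in parsed.items():
                # A newer value for the same cookie name replaces the old one.
                self._cookies[name] = morsel.value

    def cookie_header(self) -> str:
        """Build the Cookie header to send with every following request."""
        return "; ".join(f"{name}={value}" for name, value in self._cookies.items())


store = SimpleCookieStore()
store.update_from_response(["session=xyz; Path=/; HttpOnly"])  # e.g. from the login response
print(store.cookie_header())  # session=xyz
```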
@danieldotnl, I implemented the feedback in this second PR: #374. Let's continue the discussion there. I have to abandon this PR because of my earlier mistake of using my master branch for it. I could have gotten away with that mistake, but it seems you tightened the rules for committing to master. :)
Header mappings let you configure which headers are forwarded from the form-submit request to the scrape of the main page that provides the sensor data. A common use case is populating the `X-Login-Token` header with the token returned by the login.

Example:

Log into https://dashboard.iqair.com/personal/devices and select the device to get the `<device_id>` from the URL. Then analyze the network traffic and find the request whose name starts with `<device_id>`. It contains the entire path used in the example, including `<user_id>` (there's probably an easier way to get `<user_id>`, but this works),
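If you copy the full URL of that request from the browser's network tab, the two IDs can also be pulled out programmatically. A small Python sketch, assuming the path has the shape `/v1/users/<user_id>/devices/<device_id>` used in the examples in this thread; `ids_from_device_url` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import urlsplit


def ids_from_device_url(url: str) -> Tuple[str, str]:
    """Extract (user_id, device_id) from a captured request URL.

    Assumes a path like /v1/users/<user_id>/devices/<device_id>;
    the query string is ignored.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    try:
        user_id = segments[segments.index("users") + 1]
        device_id = segments[segments.index("devices") + 1]
    except (ValueError, IndexError):
        # "users"/"devices" missing, or nothing follows them.
        raise ValueError(f"unexpected URL shape: {url}") from None
    return user_id, device_id


# Made-up IDs for illustration.
url = "https://website-api.airvisual.com/v1/users/u123/devices/d456?units.system=metric&AQI=US&language=en"
print(ids_from_device_url(url))  # ('u123', 'd456')
```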