Adding support for header mappings in form submit. #327

Closed
wants to merge 6 commits into from

jeremicmilan
Contributor

@jeremicmilan jeremicmilan commented Feb 3, 2024

Header mappings are a feature to configure the headers you want to be forwarded from scraping the form-submit page to scraping the main page for sensor data. A common use case is to populate the X-Login-Token header which is the result of the login.

Example:

multiscrape:
  - name: AirVisual
    resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices/<device_id>?units.system=metric&AQI=US&language=en'
    scan_interval: 10
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
      input:
        email: '<email>'
        password: '<password>'
      header_mappings:
        - name: X-Login-Token
          value_template: '{{ (value | from_json).loginToken }}'
    sensor:
      - name: AirVisual Outdoor AQI
        value_template: '{{ (value | from_json).current.aqi.value }}'
        unit_of_measurement: 'AQI US'
      - name: AirVisual Outdoor PM1 AQI
        value_template: '{{ (value | from_json).current.pm1.aqi }}'
        unit_of_measurement: 'AQI US'
      - name: AirVisual Outdoor PM2.5 AQI
        value_template: '{{ (value | from_json).current.pm25.aqi }}'
        unit_of_measurement: 'AQI US'
      - name: AirVisual Outdoor PM10 AQI
        value_template: '{{ (value | from_json).current.pm10.aqi }}'
        unit_of_measurement: 'AQI US'

      - name: AirVisual Outdoor PM1
        value_template: '{{ (value | from_json).current.pm1.value }}'
        unit_of_measurement: 'µg/m³'
      - name: AirVisual Outdoor PM2.5
        value_template: '{{ (value | from_json).current.pm25.value }}'
        unit_of_measurement: 'µg/m³'
      - name: AirVisual Outdoor PM10
        value_template: '{{ (value | from_json).current.pm10.value }}'
        unit_of_measurement: 'µg/m³'
      - name: AirVisual Outdoor ParticleCount
        value_template: '{{ (value | from_json).current.pc.value }}'
        unit_of_measurement: 'pc/L'

      - name: AirVisual Outdoor Pressure
        value_template: '{{ (value | from_json).current.pressure.value }}'
        unit_of_measurement: 'mbar'
      - name: AirVisual Outdoor Humidity
        value_template: '{{ (value | from_json).current.humidity.value }}'
        unit_of_measurement: '%'
      - name: AirVisual Outdoor Temperature
        value_template: '{{ (value | from_json).current.temperature.value }}'
        unit_of_measurement: '°C'
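
Conceptually, `header_mappings` takes the response of the form-submit request, evaluates each `value_template` against it, and adds the result as a header on the main request. The following is a minimal Python sketch of that idea, not the integration's actual code: the function name `apply_header_mappings`, the sample response body, and the use of plain functions in place of Jinja templates are all assumptions made for illustration.

```python
import json


def apply_header_mappings(login_response_body, mappings):
    """Build extra request headers from the form-submit response.

    `mappings` maps a header name to a function that extracts its value
    from the parsed JSON body (a stand-in for `value_template`).
    """
    parsed = json.loads(login_response_body)
    return {name: extract(parsed) for name, extract in mappings.items()}


# Hypothetical sign-in response; only the `loginToken` field name
# comes from the example configuration.
login_body = '{"loginToken": "abc123", "id": "user-1"}'

extra_headers = apply_header_mappings(
    login_body,
    {"X-Login-Token": lambda data: data["loginToken"]},
)

# Headers that would accompany the request to the main `resource`.
main_request_headers = {"Accept": "application/json", **extra_headers}
print(main_request_headers)
```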

Log into https://dashboard.iqair.com/personal/devices and select the device; its <device_id> appears in the URL. Then inspect the network traffic and find the request whose name starts with <device_id>. That request contains the entire path used in the example, including <user_id> (there's probably an easier way to get <user_id>, but this works).

@danieldotnl
Owner

Thank you for this extensive contribution! I really like your solution and it seems to solve many open issues!
I'm currently working on another custom component of mine (https://github.com/danieldotnl/ha-measureit). I hope that will be finished soon and then I'll give multiscrape some more love and attention again.
I have a very large change prepared in the test-service branch which I hope to merge soon.
It introduces two services for scraping and retrieving page content, which should make it a lot easier for people to figure out their configuration.
However, merging is not trivial and I would like to ask you if you can rebase your PR on that branch?

Thanks again and let me know if you have questions!

@jeremicmilan
Contributor Author

Thank you for making the integration in the first place! You're welcome, I'm glad to contribute.

No worries, I can wait. The scraping works for me, it's just that I have to refresh it from time to time.

@jeremicmilan
Contributor Author

@danieldotnl, what is the rough estimate? Is it days/weeks/months?
If it's months, I'll probably go ahead and install the integration from my branch to avoid expiring tokens.

@danieldotnl
Owner

Weeks!

@danieldotnl
Owner

Finally! I merged the test-service branch!
Could you please resolve the conflicts in your PR? 😅

@jeremicmilan
Contributor Author

Thanks!
Little bit short on time at the moment. I'll try it this weekend or the next one.

@danieldotnl
Owner

Any progress? I don't want your tokens to expire ;-)

@jeremicmilan
Contributor Author

Sorry, totally forgot about this and actually remembered it two days ago. I'll do it now/soon.

@jeremicmilan
Contributor Author

I messed up and did not create a dev branch on my fork (first PR on GitHub and kind of thought that my fork is considered as a dev branch). There might have been a way to do this, but it is what it is. 😄 I'll try to reactivate this PR, and if not possible, I'll create a new one.

@jeremicmilan jeremicmilan reopened this Apr 6, 2024
@jeremicmilan
Contributor Author

@danieldotnl , merge complete. Please review.

@danieldotnl
Owner

Thanks a lot! That wasn't the easiest merge 😊 I'll look into it soon.
One question already: how can I test it, as I don't have an AirVisual account?

@jeremicmilan
Contributor Author

jeremicmilan commented Apr 6, 2024

Yep, it wasn't an easy merge. On the other hand, it was not that complex, only tedious (making sure something is not lost in the transition).

Regarding testing, you should be able to use it with any website that operates with a username and password only (no two-factor authentication or more complicated encryption).
On the other hand, you should be able to create an AirVisual account. You just will not be able to scrape data from a device, but you can change that to scrape a random thing from a slightly different URL. Here is an example:

multiscrape:
  - name: AirVisual
    resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices'
    scan_interval: 10
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
      input:
        email: '<username>'
        password: '<password>'
      header_mappings:
        - name: X-Login-Token
          value_template: '{{ (value | from_json).loginToken }}'
    sensor:
      - name: test_header_mapping
        value_template: '{{ 10 }}'

Get user_id with the instructions from the PR description. If you remove header_mappings from the config (or enter a wrong password), you should get a 401 error (as the headers were not populated in the main scrape).
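
The expected outcome of this test can be sketched in Python with a stand-in for the protected endpoint. Everything here is hypothetical (the function names, the token value, and the assumption that the server answers 401 whenever `X-Login-Token` is missing or wrong); it only illustrates the three cases described above.

```python
def fake_devices_endpoint(headers, valid_token="abc123"):
    """Stand-in for the protected endpoint: returns an HTTP status code."""
    if headers.get("X-Login-Token") == valid_token:
        return 200
    return 401


def main_scrape_headers(login_token, use_header_mappings):
    """Headers for the main scrape; the token is forwarded only via the mapping."""
    headers = {"Accept": "application/json"}
    if use_header_mappings and login_token is not None:
        headers["X-Login-Token"] = login_token
    return headers


# Successful login with header_mappings configured.
with_mapping = fake_devices_endpoint(main_scrape_headers("abc123", True))
# Successful login, but header_mappings removed from the config.
without_mapping = fake_devices_endpoint(main_scrape_headers("abc123", False))
# Wrong password: the login response carries no token to forward.
wrong_password = fake_devices_endpoint(main_scrape_headers(None, True))

print(with_mapping, without_mapping, wrong_password)
```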

To have better testing of the feature, maybe we can ask folks that filed issues linked in this PR (that this feature should resolve) to test it?

@danieldotnl
Owner

> Yep, it wasn't an easy merge. On the other hand, it was not that complex, only tedious (making sure something is not lost in the transition).

Nice job!

> Regarding testing, you should be able to use it with any website that operates with a username and password only (no two-factor authentication or more complicated encryption).

But most sites with a username/pass don't require this, right? I can check whether it forwards headers, but I'd still like to verify that the login works with this change while it didn't work without it.

> On the other hand, you should be able to create an AirVisual account. You just will not be able to scrape data from a device, but you can change that to scrape a random thing from a slightly different URL. Here is an example:
>
> multiscrape:
>   - name: AirVisual
>     resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices'
>     scan_interval: 10
>     log_response: true
>     form_submit:
>       submit_once: True
>       resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
>       input:
>         email: '<username>'
>         password: '<password>'
>       header_mappings:
>         - name: X-Login-Token
>           value_template: '{{ (value | from_json).loginToken }}'
>     sensor:
>       - name: test_header_mapping
>         value_template: '{{ 10 }}'
>
> Get user_id with the instructions from the PR description. If you remove header_mappings from the config (or enter a wrong password), you should get a 401 error (as the headers were not populated in the main scrape).

I made a throwaway account, but I don't see how I can get the userid since I don't have a device to add... Ideas?

> To have better testing of the feature, maybe we can ask folks that filed issues linked in this PR (that this feature should resolve) to test it?

Yes, I'll create a pre-release and we'll ask the people who created token related issues to try it out.

@jeremicmilan
Contributor Author

> But most sites with a username/pass don't require this, right? I can check whether it forwards headers, but I'd still like to verify that the login works with this change while it didn't work without it.

I thought that most of them do actually give you a token you use for accessing the website. Otherwise, how does the website know that a specific request is properly authenticated? Spoofing becomes much easier otherwise without the token. On the other hand, you do not want to transmit username/password with every request, as it opens you up to many other vulnerabilities. I'm more familiar with bearer tokens (with the Authorization header), but X-Login-Token seems to serve the same purpose. And here is Copilot's summary:
[image: Copilot's summary]

> I made a throwaway account, but I don't see how I can get the userid since I don't have a device to add... Ideas?

Go to https://dashboard.iqair.com/personal/devices and in the Network tab of the Developer Tools, search for account?units.system=metric&AQI=US&language=en. The first field id in the response is the userid.

@jeremicmilan
Contributor Author

@danieldotnl, any progress on the review? Can I help you somehow?

@danieldotnl
Owner

Thanks @jeremicmilan, I know it takes a long time! I had it working a week ago with AirVisual, so I now understand how it works. Now I need to find time for the review.
Going on a short vacation now, and really hope to be able to finish soon after.

@danieldotnl
Owner

I have been thinking about this and wanted to share my thoughts with you.

I think this feature should be about two-step requests where we need to pass something back to the server that we received as response to the first request. This could be:

  • a token in a header
  • a token on a response page (your case)
  • a cookie

I was wondering if we could solve all three of them, because I believe the other two cases are actually more common than yours.
What about introducing a concept of variables which you can use in templates (e.g. in headers/resource url) later on?

Maybe your example would then become something like this:

multiscrape:
  - name: AirVisual
    resource: 'https://website-api.airvisual.com/v1/users/<user_id>/devices/<device_id>?units.system=metric&AQI=US&language=en'
    headers:
      X-Login-Token: '{{ token }}'
    scan_interval: 10
    log_response: true
    form_submit:
      submit_once: True
      resource: 'https://website-api.airvisual.com/v1/auth/signin/by/email'
      input:
        email: '<email>'
        password: '<password>'
      variables:
        - name: token
          value_template: '{{ (value | from_json).loginToken }}'
    sensor:

Later on the other cases could then be implemented.

Let me know what you think!
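
The variables idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `render` helper is a tiny regex-based stand-in for real Jinja2 rendering, the response body is invented, and the `example.invalid` URL is a placeholder showing that the same variable could also be used in a resource URL.

```python
import json
import re

# Matches simple placeholders such as `{{ token }}`.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace each `{{ name }}` with variables[name] (stand-in for Jinja2)."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


# Step 1: the form-submit response populates the variables.
login_body = '{"loginToken": "abc123"}'  # hypothetical response body
variables = {"token": json.loads(login_body)["loginToken"]}

# Step 2: the variables are available wherever templates are allowed.
header_templates = {"X-Login-Token": "{{ token }}"}
headers = {name: render(tpl, variables) for name, tpl in header_templates.items()}
resource = render("https://example.invalid/v1/data?key={{ token }}", variables)

print(headers)
print(resource)
```

In this sketch an undefined variable raises `KeyError`; how a real implementation should treat that case is left open.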

@jeremicmilan
Contributor Author

I like the idea. Also, it should not be too hard to implement. Maybe we should be more specific when using variables to avoid naming conflicts? Something like X-Login-Token: {{ form_submit_variables.token }}?

I only wish you had replied a couple of days ago, as I had some extra time during the holiday season. Now it's going to take a while before I find a slot to work on this, but I'll get it done. :)
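
The naming-conflict concern can be illustrated with a short Python sketch. The second variable source (`other_variables`) and the `lookup` helper are hypothetical; only the `form_submit_variables.token` spelling comes from the suggestion above.

```python
form_submit_variables = {"token": "login-token-abc"}
other_variables = {"token": "unrelated-value"}  # hypothetical second source

# Flat merge: the later definition silently replaces the earlier one.
flat = {**form_submit_variables, **other_variables}

# Namespaced: each source keeps its own `token`.
namespaced = {
    "form_submit_variables": form_submit_variables,
    "other": other_variables,
}


def lookup(context, dotted_name):
    """Resolve a dotted name such as `form_submit_variables.token`."""
    value = context
    for part in dotted_name.split("."):
        value = value[part]
    return value


print(flat["token"])
print(lookup(namespaced, "form_submit_variables.token"))
```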

@BentoAlves

Hello.
I have the same problem: I need to do a GET on a page to get the bearer token.
However, I cannot use the token in a header as a template.

- name: Preco do gas
  resource: https://www.precodogas.com.br/fazer-pedido-ads/3/-23.536139/-46.6777853/Rua%20Apiac%C3%A1s/Pompeia/S%C3%A3o%20Paulo/SP/467
  scan_interval: 600
  sensor:
    - unique_id: preco_do_gas_token
      name: Preco do gas Token
      select: "#token"
      attribute: value

- name: Preco do gas API
  resource: https://api-lb.precodogas.com.br/api/lista-revendas/-23.536139/-46.6777853/
  scan_interval: 600
  headers:
    APP-KEY: 14c2529eb4498c5d1ffd6915d05bf58a91bdda796af59f41d480d11c099d0479
    Authorization: bearer {{ states("sensor.preco_do_gas_token") }}
    Content-type: "application/json"
    Accept: "text/plain"
    Content-Encoding: "utf-8"
  method: POST
  payload: '{"tipo_produto": "0","ordem": "1","rua": "Rua Apiacás","bairro": "Pompeia","cidade": "São Paulo","estado": "SP","numero": "467","origem": "2","cep": "","idrevenda": "0"}'
  log_response: true

In the log the header looks like this:

{'APP-KEY': '14c2529eb4498c5d1ffd6915d05bf58a91bdda796af59f41d480d11c099d0479', 'Authorization': 'bearer unavailable', 'Content-type': 'application/json', 'Accept': 'text/plain', 'Content-Encoding': 'utf-8'}

Is there any way to do this?

@jeremicmilan
Contributor Author

jeremicmilan commented May 8, 2024

@BentoAlves, once this PR is completed, you'll be able to do something like this. Per the current feature design, you will fetch the Bearer token as part of the form submit part of the configuration and then use it in the parsing itself. @danieldotnl and I are currently discussing how exactly this functionality should be exposed.

@danieldotnl, this is another interesting way to solve the problem. However, I don't like that the bearer token leaks into the rest of the system (where nothing else needs it), so I would still do it by forwarding variables from the form submit part. That said, there are probably legit use cases for using global Home Assistant state while scraping something. Is that already possible today, and is @BentoAlves just not doing it correctly at the moment?

@BentoAlves

@jeremicmilan That's very good.

@danieldotnl Being able to get a Token from a state or an input helper is also very useful. This makes it easier to change a token or even the username and password of an integration, without the need to change a file and reload the integration.

@jeremicmilan I'm definitely not doing it. This was a way of expressing my need.

@danieldotnl
Owner

@jeremicmilan I think our proposal with the variables will solve @BentoAlves's case. I don't see a convincing use case to "scrape" headers into sensors. Headers are not meant to carry state-like data.

> @danieldotnl Being able to get a Token from a state or an input helper is also very useful. This makes it easier to change a token or even the username and password of an integration, without the need to change a file and reload the integration.

It's not a problem to get a token from a sensor or input helper. Pulling it into the sensor is the part that isn't supported.

@danieldotnl
Owner

@jeremicmilan I already implemented handling cookies as I needed it. I don't think they need to be part of the variables as I don't see a reason why they shouldn't be included in requests all the time, just like browsers do.
If you are interested, you can take a look and/or review: #368
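
The "just like browsers do" behaviour for cookies can be sketched with Python's standard library. This is not how #368 implements it; the helper name and the sample `Set-Cookie` values are assumptions made purely for illustration.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_values):
    """Turn `Set-Cookie` values from a first response into one `Cookie` header."""
    jar = SimpleCookie()
    for raw in set_cookie_values:
        jar.load(raw)
    # Only name=value pairs are sent back; attributes such as Path are dropped.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical `Set-Cookie` headers returned by the form-submit response.
set_cookie_values = [
    "sessionid=xyz789; Path=/; HttpOnly",
    "lang=en; Path=/",
]

cookie_header = cookie_header_from_set_cookie(set_cookie_values)
print(cookie_header)
```

Unlike a token in a variable, nothing has to be configured here: every cookie received is simply sent along with the following requests.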

@jeremicmilan
Contributor Author

jeremicmilan commented May 27, 2024

@danieldotnl, implemented the feedback as part of this second PR: #374. Let's continue the discussion there.

I have to abandon this PR due to my previous mistake, where I used my master branch for this PR. I could have gotten away with that mistake, but it seems you tightened commit to master rules. :)
