Forward cookies from login page form to resource page #438

Open
mallorca2288 opened this issue Oct 11, 2024 · 2 comments

Comments

@mallorca2288

mallorca2288 commented Oct 11, 2024

Version of the custom_component

8.0.2

Configuration

  - resource: "https://plataforma.habidat.es/src/php/vecino/selectViviendas.php?list=acumulado_fecha_desde&vivienda=XXXXX"
    name: Habidat
    log_response: True
    scan_interval: 0
    form_submit:
        submit_once: True
        resource: 'https://plataforma.habidat.es'
        select: "form.login-form"
        input:
            u12: email@gmail.com
            c12: 'password'
        headers:
            referer: "https://plataforma.habidat.es/index.php"
            X-Requested-With: XMLHttpRequest
    headers:
        referer: "https://plataforma.habidat.es"
    sensor:
      - select: 'body'
        name: habidat
    button:
        unique_id: refrescar_habidat
        name: Refrescar habidat

Describe the bug

First of all I want to thank the developer for this amazing custom component.

Cookies set when loading the form page (form_page_response_cookies) and also set when submitting the form (form_submit_request_cookies) are not sent when requesting the resource page (page_request_cookies).

The issue I'm having is that the server sets a PHPSESSID cookie when the form page is loaded, but this cookie is not sent when the resource page is requested, so the server returns the login page again.

Everything works if I manually set the cookie via the headers option, but only for a few minutes, until the session expires.

headers:
  Cookie: PHPSESSID=e3ffadacc4d2c58edf71c5add71a96##

Is there any way to retrieve and store the cookies from the form page and send them to the resource page? I've read all the issues related to cookies (#319, #407, #327), but I couldn't figure out what I'm doing wrong.
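To illustrate the behaviour being asked for, here is a minimal Python sketch. It is not multiscrape's actual code: it only shows the idea of collecting cookies from the Set-Cookie headers of the form page and form submit responses and building the Cookie header that the resource request would need. The function name and the sample cookie value are hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header_from_responses(set_cookie_values):
    """Build a Cookie request header from a list of Set-Cookie header values.

    Hypothetical helper for illustration only; later values override
    earlier ones with the same cookie name.
    """
    jar = {}
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Assumed example: the form page sets the session cookie,
# the form submit response sets nothing (as in the log below).
form_page_set_cookie = ["PHPSESSID=abc123; path=/"]
form_submit_set_cookie = []

header = cookie_header_from_responses(form_page_set_cookie + form_submit_set_cookie)
print(header)
```

In the log below, the equivalent of this header is present for the form submit request but missing from the page request.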

Debug log


2024-10-11 02:53:24.549 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Starting with form-submit
2024-10-11 02:53:24.550 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Requesting page with form from: https://plataforma.habidat.es
2024-10-11 02:53:24.550 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing form_page-request with a GET to url: https://plataforma.habidat.es with headers: {'referer': 'https://plataforma.habidat.es/index.php', 'X-Requested-With': 'XMLHttpRequest'} and cookies: None.
2024-10-11 02:53:24.607 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: form_page_request_headers.txt
2024-10-11 02:53:24.861 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:24.965 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: form_page_response_headers.txt
2024-10-11 02:53:24.965 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: form_page_response_cookies.txt
2024-10-11 02:53:24.979 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: form_page_response_body.txt
2024-10-11 02:53:24.980 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Parse page with form with BeautifulSoup parser lxml
2024-10-11 02:53:25.716 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # The page with the form parsed by BeautifulSoup has been written to file: form_page_soup.txt
2024-10-11 02:53:25.716 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Try to find form with selector form.login-form
2024-10-11 02:53:25.718 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Form looks like this: 
<form action="src/php/processLogin.php" class="login-form" method="post" name="login">
<h1 style="color:#333;">Inicio de sesión</h1>
<p>Servicio de control de consumos.</p>
<div class="row">
<div class="col-xs-12 col-md-6">
<input autocomplete="off" class="form-control form-control-solid placeholder-no-fix form-group" id="u12" name="u12" onkeydown="if(event.keyCode==13){event.preventDefault();cargarDesafioLogin();}" placeholder="Email" type="text"/>
</div>
<div class="col-xs-12 col-md-6">
<input autocomplete="off" class="form-control form-control-solid placeholder-no-fix form-group" id="c12" name="c12" onkeydown="if(event.keyCode==13){event.preventDefault();cargarDesafioLogin();}" placeholder="Contraseña" type="password"/>
</div>
</div>
<div class="row">
<div class="col-sm-12 text-right">
<input id="res40" name="res40" type="hidden"/>
<div class="forgot-password">
<a class="forget-password" id="forget-password">Solicitar contraseña</a>
</div>
<input class="btn blue" id="botonlogin" onclick="javascript:cargarDesafioLogin();" type="button" value="Entrar"/>
</div>
</div>
<input id="smt" name="smt" type="hidden" value="Habidat"/>
<input id="via" name="via" type="hidden" value="plataforma"/>
</form>
2024-10-11 02:53:25.734 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Finding all input fields in form
2024-10-11 02:53:25.734 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Found the following input fields: {'u12': None, 'c12': None, 'res40': None, None: 'Entrar', 'smt': 'Habidat', 'via': 'plataforma'}
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Found form action src/php/processLogin.php and method post
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Merged input fields with input data in config. Result: {'u12': 'email@gmail.com', 'c12': 'password', 'res40': None, None: 'Entrar', 'smt': 'Habidat', 'via': 'plataforma'}
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Determined the url to submit the form to: https://plataforma.habidat.es/src/php/processLogin.php
2024-10-11 02:53:25.735 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Submitting the form
2024-10-11 02:53:25.736 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing form_submit-request with a post to url: https://plataforma.habidat.es/src/php/processLogin.php with headers: {'referer': 'https://plataforma.habidat.es/index.php', 'X-Requested-With': 'XMLHttpRequest'} and cookies: <Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>.
2024-10-11 02:53:25.893 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_body written to file: form_submit_request_body.txt
2024-10-11 02:53:25.944 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: form_submit_request_headers.txt
2024-10-11 02:53:25.975 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_cookies written to file: form_submit_request_cookies.txt
2024-10-11 02:53:26.435 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:26.524 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: form_submit_response_headers.txt
2024-10-11 02:53:26.535 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: form_submit_response_body.txt
2024-10-11 02:53:26.535 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: form_submit_response_cookies.txt
2024-10-11 02:53:26.577 DEBUG (MainThread) [custom_components.multiscrape.form] Habidat # Form seems to be submitted successfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
2024-10-11 02:53:26.578 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Executing page-request with a GET to url: https://plataforma.habidat.es/src/php/vecino/selectViviendas.php?list=acumulado_fecha_desde&vivienda=XXXXX with headers: {'referer': 'https://plataforma.habidat.es'} and cookies: <Cookies[]>.
2024-10-11 02:53:26.652 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_headers written to file: page_request_headers.txt
2024-10-11 02:53:26.673 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # request_cookies written to file: page_request_cookies.txt
2024-10-11 02:53:26.878 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # Response status code received: 200
2024-10-11 02:53:26.955 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_headers written to file: page_response_headers.txt
2024-10-11 02:53:26.955 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_body written to file: page_response_body.txt
2024-10-11 02:53:27.002 DEBUG (MainThread) [custom_components.multiscrape.http] Habidat # response_cookies written to file: page_response_cookies.txt
2024-10-11 02:53:27.005 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # Loading the content in BeautifulSoup.
2024-10-11 02:53:27.965 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # page_soup written to file: page_soup.txt
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Habidat # Data successfully refreshed. Sensors will now start scraping to update.
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 3.569 seconds (success: True)
2024-10-11 02:53:27.966 DEBUG (MainThread) [custom_components.multiscrape.sensor] Habidat # habidat # Start scraping to update sensor
2024-10-11 02:53:27.969 DEBUG (MainThread) [custom_components.multiscrape.scraper] Habidat # habidat # Tag selected: [REMOVED LOGIN PAGE HTML CONTENT]

form_page_response_cookies.txt
<Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>

form_submit_request_cookies.txt
<Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>

form_submit_response_cookies.txt
<Cookies[]>

page_request_cookies.txt
<Cookies[]>

page_response_cookies.txt
<Cookies[]>

@danieldotnl
Owner

Thanks for the detailed description! Could you check whether it works with submit_once: False?

@mallorca2288
Author

Unfortunately I see the same behaviour when changing submit_once to false.

However, I've found an alternative solution that saves the value of the cookies in a sensor. I'll explain how, in case it helps someone else.

In configuration.yaml I have added the following lines:

homeassistant:
  allowlist_external_dirs:
    - "/config/multiscrape/"

I created a sensor with the Home Assistant File integration that reads the content of the form_page_response_cookies.txt file.
In my example:

File path: /config/multiscrape/habidat/form_page_response_cookies.txt
Template:

{% if value is string and (value|length>5)%}
  {% set ret = (value|regex_findall(find='PHPSESSID=(.*?) for ', ignorecase=False))[0] %}
  {% if ret is string and (ret|length>5)%}
  {{ ret }}
  {% else %}
  unknown
  {% endif %}
{% else %}
unknown
{% endif %}
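For readers less familiar with Jinja, the template's logic can be approximated in plain Python as below. This is only a sketch: the function name is made up, and unlike the template it also returns unknown when the pattern does not match at all. The sample string is the content of form_page_response_cookies.txt shown earlier in this issue.

```python
import re


def extract_phpsessid(value):
    """Return the PHPSESSID value from a logged cookie dump, or 'unknown'."""
    # Mirrors the outer template check: value must be a string longer than 5 chars.
    if not isinstance(value, str) or len(value) <= 5:
        return "unknown"
    # Same pattern as the template: capture everything up to the first " for ".
    matches = re.findall(r"PHPSESSID=(.*?) for ", value)
    if not matches:
        return "unknown"
    ret = matches[0]
    # Mirrors the inner template check on the extracted value.
    return ret if len(ret) > 5 else "unknown"


sample = "<Cookies[<Cookie PHPSESSID=e3ffadacc4d2c58edf71c5add71a96## for plataforma.habidat.es />]>"
print(extract_phpsessid(sample))
print(extract_phpsessid("<Cookies[]>"))
```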

This way, the cookie value is saved in a sensor that I can use in another multiscrape call via the headers option:

headers:
   Cookie: PHPSESSID={{ states('sensor.habidat_cookie') }};

To store cookies longer than 255 characters (for example, Authorization cookies), I created two file sensors per cookie: one stores the first 240 characters (template {{ ret[:240] }}) and the other stores the rest (template {{ ret[240:] }}). I then merged the content of both sensors with a template sensor.
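The split-and-merge idea can be sketched in Python as follows. The function names and the 300-character sample are hypothetical; the slicing matches the two templates above. Note that with only two parts, the second part would itself exceed 255 characters for values longer than 495 characters, so a longer cookie would need more sensors.

```python
CHUNK = 240  # split point used in the two file sensor templates


def split_for_sensors(value, chunk=CHUNK):
    """Split a long cookie value into the parts held by the two file sensors."""
    return value[:chunk], value[chunk:]


def merge_from_sensors(first, second):
    """Rebuild the full cookie value, as the template sensor does."""
    return first + second


long_cookie = "x" * 300  # stand-in for a long cookie value
part_one, part_two = split_for_sensors(long_cookie)
restored = merge_from_sensors(part_one, part_two)
print(len(part_one), len(part_two), restored == long_cookie)
```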

@danieldotnl, if I may make a feature request: being able to save the value of the cookies directly in a sensor with multiscrape would be wonderful and much simpler.

Thank you!
