Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

time zone issue #12

Closed
ice1e0 opened this issue Apr 14, 2020 · 14 comments · Fixed by mara/mara-pipelines#34
Closed

time zone issue #12

ice1e0 opened this issue Apr 14, 2020 · 14 comments · Fixed by mara/mara-pipelines#34

Comments

@ice1e0
Copy link
Member

ice1e0 commented Apr 14, 2020

had to change the time zone at my postgresql server; seems like that the logging date/time is saved in local time zone and not in UTC. Now my pipeline numbers look totally off:

image

Suggest to change it that the date time is always saved in UTC in the database to support multiple time zones in the future (e.g. eastern and pacific time).
As a quick-fix the time zone could be configured in the mara app config and always when displaying time the webside the app should calculate the time based on the utc time adjusted with a time bias from the time zone.

@ghost
Copy link

ghost commented Apr 23, 2020

looks like that negative costs can appear not only when you switch the time zone of the server, but as well, when an execution fails
image

@martin-loetzsch
Copy link
Member

This is weird. Can you show the table definition for data_integration_node_run in the mara metadata db? In psql you get this with \d data_integration_node_run

@ghost
Copy link

ghost commented Apr 24, 2020

mara_stage=# \c mara_prod
You are now connected to database "mara_prod" as user "postgres".
mara_prod=# \d data_integration_node_run
mara_prod=# \d data_integration_node_run
                                            Table "public.data_integration_node_run"
   Column    |           Type           | Collation | Nullable |                            Default
-------------+--------------------------+-----------+----------+----------------------------------------------------------------
 node_run_id | integer                  |           | not null | nextval('data_integration_node_run_node_run_id_seq'::regclass)
 run_id      | integer                  |           |          |
 node_path   | text[]                   |           |          |
 start_time  | timestamp with time zone |           | not null |
 end_time    | timestamp with time zone |           |          |
 succeeded   | boolean                  |           |          |
 is_pipeline | boolean                  |           |          |
Indexes:
    "data_integration_node_run_pkey" PRIMARY KEY, btree (node_run_id)
    "ix_data_integration_node_run_node_path" btree (node_path)
    "ix_data_integration_node_run_run_id" btree (run_id)
Foreign-key constraints:
    "data_integration_node_run_run_id_fkey" FOREIGN KEY (run_id) REFERENCES data_integration_run(run_id)
Referenced by:
    TABLE "data_integration_node_output" CONSTRAINT "data_integration_node_output_node_run_id_fkey" FOREIGN KEY (node_run_id) REFERENCES data_integration_node_run(node_run_id)

mara_prod=#

timezone configured in postgresql.conf: "Europe/Berlin"

@jankatins
Copy link
Member

So this is stored as UTC:

 start_time  | timestamp with time zone |           | not null |
 end_time    | timestamp with time zone |           |          |

Just to confirm the hypothesis that this is because we have saved a greater end time and start time:

Select
extract(EPOCH FROM end_time - start_time)
, *
from data_integration_node_run
where extract(EPOCH FROM end_time - start_time) < 0

(Derived from https://github.com/mara/data-integration/blob/master/data_integration/logging/node_cost.py#L31)

My guess is that we actually send in non-TZ aware timestamp which postgresql assumes to be local timzone on the DB. E.g. https://github.com/mara/data-integration/blob/master/data_integration/execution.py#L210 we use datetime.now() which is not timezone aware. (which up to now was new to me... WTF?)

@ghost
Copy link

ghost commented Apr 24, 2020

mara_prod=# Select
extract(EPOCH FROM end_time - start_time)
, *
from data_integration_node_run
where extract(EPOCH FROM end_time - start_time) < 0
;
   date_part   | node_run_id | run_id |                    node_path                     |          start_time           |           end_time            | succeeded | is_pipeline
---------------+-------------+--------+--------------------------------------------------+-------------------------------+-------------------------------+-----------+-------------
 -21598.144066 |        2622 |     36 | {dwh,schema_product}                             | 2020-04-14 18:30:38.152571-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21598.108317 |        2620 |     36 | {dwh,schema_hr}                                  | 2020-04-14 18:30:38.116822-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21599.734062 |        2670 |     36 | {dwh,schema_finance,ledger_account_view}         | 2020-04-14 18:30:39.742567-04 | 2020-04-14 12:30:40.008505-04 | f         | f
 -21599.709135 |        2669 |     36 | {dwh,schema_finance,dimension_project}           | 2020-04-14 18:30:39.71764-04  | 2020-04-14 12:30:40.008505-04 | f         | f
 -21599.655276 |        2668 |     36 | {dwh,schema_finance,dimension_department_view}   | 2020-04-14 18:30:39.663781-04 | 2020-04-14 12:30:40.008505-04 | f         | f
 -21599.651385 |        2667 |     36 | {dwh,schema_finance,currency_exchange_rate_view} | 2020-04-14 18:30:39.65989-04  | 2020-04-14 12:30:40.008505-04 | f         | f
 -21598.088832 |        2618 |     36 | {dwh,schema_bank}                                | 2020-04-14 18:30:38.097337-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21598.075789 |        2617 |     36 | {dwh,schema_global}                              | 2020-04-14 18:30:38.084294-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21598.045521 |        2615 |     36 | {dwh,schema_inventory}                           | 2020-04-14 18:30:38.054026-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21598.032459 |        2614 |     36 | {dwh,schema_finance}                             | 2020-04-14 18:30:38.040964-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21598.018383 |        2613 |     36 | {dwh}                                            | 2020-04-14 18:30:38.026888-04 | 2020-04-14 12:30:40.008505-04 | f         | t
 -21553.874597 |        9795 |     80 | {refresh_dwh,sales_invoice_line}                 | 2020-04-23 19:34:37.610677-04 | 2020-04-23 13:35:23.73608-04  | f         | f
 -21556.862308 |        9796 |     81 | {refresh_dwh,sales_invoice}                      | 2020-04-23 19:34:43.495279-04 | 2020-04-23 13:35:26.632971-04 | f         | f
 -21577.922513 |        9797 |     82 | {refresh_dwh,sales_order}                        | 2020-04-23 19:35:06.266946-04 | 2020-04-23 13:35:28.344433-04 | f         | f
 -21596.962854 |       10676 |     99 | {msolapdwh}                                      | 2020-04-24 06:23:48.891532-04 | 2020-04-24 00:23:51.928678-04 | f         | t
 -21599.541479 |       10730 |     99 | {msolapdwh,dim_gift_card_operation}              | 2020-04-24 06:23:51.470157-04 | 2020-04-24 00:23:51.928678-04 | f         | f
 -21599.517992 |       10729 |     99 | {msolapdwh,fact_ledger_trans_de01}               | 2020-04-24 06:23:51.44667-04  | 2020-04-24 00:23:51.928678-04 | f         | f
 -21599.483688 |       10728 |     99 | {msolapdwh,fact_inventory_trans_no01}            | 2020-04-24 06:23:51.412366-04 | 2020-04-24 00:23:51.928678-04 | f         | f
 -21599.430411 |       10727 |     99 | {msolapdwh,dim_subsidiary_company}               | 2020-04-24 06:23:51.359089-04 | 2020-04-24 00:23:51.928678-04 | f         | f
(19 rows)

mara_prod=#

@jankatins here u go

@jankatins
Copy link
Member

jankatins commented Apr 24, 2020

Ok, my assumption is that your DB runs in another timezone than your frontend and these pipelines were killed and then closed by https://github.com/mara/data-integration/blob/master/data_integration/logging/run_log.py#L69 which uses the timestamp from the postgresql server and not the frontend. @hz-lschick Does that sound right?

If so, then adding the local frontend TZ to all datetime.datetime.now() calls will make that problem go away, too.

@martin-loetzsch
Copy link
Member

The node runs above indeed failed. So that's a valid hypothesis

@ghost
Copy link

ghost commented Apr 24, 2020

well, not at least that I know; all my executions run on Europe/Berlin:

I execute it on the server:
image

... or via Chrome browser and my OS is running on Europe/Berlin as well.

Is there a way to validate/verbose the execution timezone somehow?

@jankatins
Copy link
Member

@hz-lschick Can you try show timezone; in postgresql?

@jankatins
Copy link
Member

Please have a look at mara/mara-pipelines#34

@ghost
Copy link

ghost commented Apr 24, 2020

you where right, the timezone of postgres db was off, somehow

@martin-loetzsch
Copy link
Member

So does mara/mara-pipelines#34 fix your problem?

@ghost
Copy link

ghost commented Apr 27, 2020

@martin-loetzsch seems to work now with the fix
i'd set my timezone back and made a execution which failed, resulting in a correct time now:
image

jankatins added a commit to mara/mara-pipelines that referenced this issue Apr 27, 2020
)

The problem was that a) the frontend generated timestamps without timezones which postgresql then interpreted as the UTC but which were actually in the timezone which you ran the python code and b) the closing of open runs after an exception which kills the ETL runner itself (not individual tasks) after an error happened to insert a timestamp with timezone of the time on the postgresql server.

-> the error only happens when an exception happens which triggers this code path, i.e. because something shuts down the the runner (e.g. docker instance is shut down or ctrl+c on the command line).

Should fix mara/mara-example-project-2#12
@jankatins
Copy link
Member

Release data-integration 2.8.1 with the fix for this

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants