Airflow backfilling can't be disabled #16107

Closed
hafid-d opened this issue May 27, 2021 · 13 comments
Labels
affected_version:2.0, area:core, kind:bug, pending-response, stale

Comments

@hafid-d

hafid-d commented May 27, 2021

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version):

  • Cloud provider or hardware configuration:
  • OS : Ubuntu 18.04.3
  • Install tools: celery = 4.4.7, redis = 3.5.3

What happened:
I noticed weird behavior with Airflow backfilling: my previous DAG runs are still queuing and running even after doing the following:

  • Setting catchup_by_default=False in airflow.cfg
  • Setting catchup=False in the DAG definition
  • Using LatestOnlyOperator

What you expected to happen:
I expect the old dags not to be running again.
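
For readers following along, the expectation above can be sketched with a simplified model of run selection. This is not Airflow's scheduler code: `runs_to_create` is a hypothetical helper, and the daily interval and dates are made up for illustration. It only shows the intent of `catchup=False`, which is to skip all but the most recent completed interval.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, now, interval, catchup):
    """Return the interval start times a scheduler would create runs for.

    Hypothetical, simplified model; not Airflow's actual implementation.
    """
    # Collect every interval that has fully elapsed between start_date and now.
    completed = []
    current = start_date
    while current + interval <= now:
        completed.append(current)
        current += interval
    if catchup:
        return completed
    # With catchup disabled, keep only the most recent completed interval.
    return completed[-1:]


start = datetime(2021, 5, 1)
now = datetime(2021, 5, 27, 12)
daily = timedelta(days=1)

print(len(runs_to_create(start, now, daily, catchup=True)))   # 26
print(runs_to_create(start, now, daily, catchup=False))       # only 2021-05-26
```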

@hafid-d hafid-d added the kind:bug label May 27, 2021
@hafid-d hafid-d changed the title from "Airflow backfilling can't be disable" to "Airflow backfilling can't be disabled" May 27, 2021
@motherhubbard

I also had this issue in 2.0.2. I've just bumped to 2.1.0 and it seems to be fixed there.

@hafid-d
Author

hafid-d commented Jun 1, 2021

@motherhubbard I tried 2.1.0 but still have the issue :-/ Did you update anything else?

@eladkal
Contributor

eladkal commented Jun 7, 2021

Can you please add a reproducible example?

@GergelyKalmar

I had a similar issue – in my case it was a mistyped cron that seemingly caused backfills. The trigger seemed to work fine but the intervals were wrong (the cron expressions were like 5/10 * * * * *, notice it has one too many stars in it).

@uranusjr
Member

> the cron expressions were like 5/10 * * * * *, notice it has one too many stars in it

Could you open an issue for that? This is an invalid expression (I think), and IMO we should prevent it from even being accepted. (I think this is a bug in croniter, but we should be able to perform additional validation in Airflow if they don't want to fix it.)
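
As a rough sketch of the additional validation suggested here, one option is to reject any expression that does not have exactly five whitespace-separated fields. `validate_five_field_cron` is a hypothetical helper, not an existing Airflow or croniter function, and it assumes only classic five-field expressions should be accepted (presets such as `@daily` would need separate handling).

```python
def validate_five_field_cron(expression: str) -> str:
    """Return the expression unchanged if it has exactly five fields.

    Hypothetical helper for illustration; not part of Airflow or croniter.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(
            f"expected 5 cron fields, got {len(fields)}: {expression!r}"
        )
    return expression


for candidate in ("5/10 * * * *", "5/10 * * * * *"):
    try:
        validate_five_field_cron(candidate)
        print(f"accepted: {candidate}")
    except ValueError as err:
        print(f"rejected: {err}")
```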

@eladkal
Contributor

eladkal commented Jun 30, 2021

> the cron expressions were like 5/10 * * * * *, notice it has one too many stars in it

This is a valid cron expression. The 6th field means year.
However, there seems to be a bug in croniter: it treats the 6th field as seconds. See taichino/croniter#76

>>> from croniter import croniter
>>> croniter('5/10 * * * * *')
<croniter.croniter.croniter object at 0x10b07c190>
>>> croniter('5/10 * * * * 1')
<croniter.croniter.croniter object at 0x10af92750>
>>> croniter('5/10 * * * * 59')
<croniter.croniter.croniter object at 0x10b07c190>
>>> croniter('5/10 * * * * 60')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 154, in __init__
    self.expanded, self.nth_weekday_of_month = self.expand(expr_format, hash_id=hash_id)
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 759, in expand
    return cls._expand(expr_format, hash_id=hash_id)
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 725, in _expand
    expr_format))
croniter.croniter.CroniterBadCronError: [5/10 * * * * 60] is not acceptable, out of range
>>> croniter('5/10 * * * * 2000')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 154, in __init__
    self.expanded, self.nth_weekday_of_month = self.expand(expr_format, hash_id=hash_id)
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 759, in expand
    return cls._expand(expr_format, hash_id=hash_id)
  File "/usr/local/lib/python3.7/site-packages/croniter/croniter.py", line 725, in _expand
    expr_format))
croniter.croniter.CroniterBadCronError: [5/10 * * * * 2000] is not acceptable, out of range
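
The session above is consistent with the sixth field being range-checked as seconds. A minimal sketch of that reading, assuming a valid range of 0 to 59 (`fits_seconds_field` is a hypothetical name, not a croniter function):

```python
# Assumption: the sixth field is validated against the seconds range 0-59.
SECONDS_RANGE = range(0, 60)


def fits_seconds_field(value: int) -> bool:
    """Return True if the value would be in range for a seconds field."""
    return value in SECONDS_RANGE


for sixth in (1, 59, 60, 2000):
    print(sixth, fits_seconds_field(sixth))
```

Under this assumption 1 and 59 pass while 60 and 2000 fail, matching the expressions that were accepted and rejected in the session.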

@GergelyKalmar

That is in line with what I observed! What is weird though is that Airflow would schedule a separate job for every second even if backfilling is disabled.

@eladkal
Contributor

eladkal commented Jun 30, 2021

> That is in line with what I observed! What is weird though is that Airflow would schedule a separate job for every second even if backfilling is disabled.

So yeah, it's a bug in croniter, but your cron expression is a valid one; it just doesn't mean what it should :)
I suggest opening an issue at https://github.com/taichino/croniter/issues/

@eladkal
Contributor

eladkal commented Jun 30, 2021

@GergelyKalmar can you be more specific about what the bug is?

from datetime import datetime

from airflow.models import DAG
from airflow.operators.bash import BashOperator

# Minimal reproduction: six-field cron expression with catchup disabled.
with DAG(
    dag_id="16107",
    schedule_interval="5/10 * * * * *",
    start_date=datetime(2018, 1, 1),
    catchup=False,
) as dag:
    # Single task that prints the templated date of the run.
    BashOperator(task_id="try", bash_command="echo {{ ds }}")

It behaves as expected:
[Screenshot: Screen Shot 2021-06-30 at 14 38 07]

@GergelyKalmar

GergelyKalmar commented Jun 30, 2021

It works as expected when enabling the DAG, but when the next scheduled run comes you should see a lot of instances being created (I think one for every second in the given minute). I know it is not exactly backfilling, but it kind of looked like it at first sight (just because of the many instances).

I've checked your job and I could reproduce the weird behavior with https://github.com/aws/aws-mwaa-local-runner.
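
To make the "one for every second" observation concrete, here is a small stdlib-only sketch that counts fire times within one hour. It is not croniter's parser: `expand_start_step` and `fire_times_in_hour` are hypothetical helpers that only understand `*` and `start/step`, ignore the hour, day, and month fields, and assume a sixth field is read as seconds.

```python
from datetime import datetime, timedelta


def expand_start_step(field, upper):
    """Expand '*' or 'start/step' into the matching values below `upper`."""
    if field == "*":
        return list(range(upper))
    start, step = (int(part) for part in field.split("/"))
    return list(range(start, upper, step))


def fire_times_in_hour(expression, hour_start):
    """List the fire times within one hour for a simplified cron expression."""
    fields = expression.split()
    minutes = expand_start_step(fields[0], 60)
    # Assumption: a sixth field, when present, is interpreted as seconds.
    seconds = expand_start_step(fields[5], 60) if len(fields) == 6 else [0]
    return [
        hour_start + timedelta(minutes=m, seconds=s)
        for m in minutes
        for s in seconds
    ]


hour = datetime(2021, 6, 30, 14)
print(len(fire_times_in_hour("5/10 * * * *", hour)))    # 6
print(len(fire_times_in_hour("5/10 * * * * *", hour)))  # 360
```

Under these assumptions the five-field form fires six times per hour, while the six-field form fires 60 times in each matching minute, which would look like a burst of runs even with catchup disabled.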

@github-actions

This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.

@github-actions github-actions bot added the stale label Jul 31, 2021
@github-actions

github-actions bot commented Aug 8, 2021

This issue has been closed because it has not received response from the issue author.

@github-actions github-actions bot closed this as completed Aug 8, 2021
@hafid-d
Author

hafid-d commented Aug 10, 2021

Upgraded to 2.1.2 and still facing the issue even with catchup_by_default = False
