
Add a property to all for Batch JobParameters to enable configuration via environment variables #23411

Closed
mminella opened this issue Sep 18, 2020 · 28 comments
Labels
status: declined A suggestion or change that we don't feel we should currently apply type: enhancement A general enhancement

Comments

@mminella
Member

Currently the only way for a user to provide Spring Batch job parameters is via command-line args. This enhancement request is to create a Boot property that accepts JSON to be deserialized into JobParameters. This is needed for the Kubernetes functionality when you attempt to restart a job using the shell entry point on your image. In this use case, there is no way to pass command-line arguments to the application, and some elements of normal parameters are not friendly to being set as environment variable names (run.id(long)=5, for example). I'd assume that the order of precedence for resolving duplicate JobParameters would follow the normal Boot conventions.
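As context for the constraint described above, a minimal shell sketch of the problem and the requested workaround (the SPRING_BATCH_JOBPARAMETERS variable name and JSON shape are assumptions taken from this request, not an agreed design):

```shell
# '(' and ')' are illegal in a POSIX environment variable name, so the
# run.id(long)=5 job parameter cannot survive an args-to-env-vars mapping:
sh -c "export 'RUN_ID(LONG)=5'" 2>/dev/null || echo "rejected: not a valid identifier"

# The request: carry all job parameters in a single JSON blob instead,
# mirroring SPRING_APPLICATION_JSON (variable name and shape hypothetical):
export SPRING_BATCH_JOBPARAMETERS='{"run.id(long)": 5, "input.file": "data.csv"}'
```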

@philwebb
Member

philwebb commented Sep 19, 2020

If I understand the request correctly, we can probably add a field to BatchProperties then update JobLauncherApplicationRunner.launchJobFromProperties to consider both command-line args and any properties.

The only thing that feels a bit odd to me is using serialized JSON for the properties. We currently delegate to JobParametersConverter which takes a Properties input. Does this mean we're going JSON -> Properties -> JobParameters?

I wonder if we can get away with a simpler format? Perhaps a list of ':'-delimited items? e.g.:

SPRING_BATCH_JOBPARAMETERS="run.id(long):5,foo.bar:test"

@philwebb philwebb added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 19, 2020
@philwebb philwebb added this to the General Backlog milestone Sep 19, 2020
@mminella
Member Author

mminella commented Sep 21, 2020

@philwebb

If I understand the request correctly, we can probably add a field to BatchProperties then update JobLauncherApplicationRunner.launchJobFromProperties to consider both command-line args and any properties.

You understand the request correctly.

As for the use of JSON for the properties, that really was just following the SPRING_APPLICATION_JSON model, where we pass in all the vars in a single block. I'd prefer consistency over introducing a new format that users would need to understand and Boot would need to be able to parse, but I'm open to ideas here.

One other note. We'd like to get this into Boot 2.4 since it is actually impacting an SCDF use case currently. We can do the PR if that helps.

@philwebb
Member

philwebb commented Sep 21, 2020

One other note. We'd like to get this into Boot 2.4 since it is actually impacting an SCDF use case currently. We can do the PR if that helps.

I've reserved this one for a conference event that's happening early Oct. Hopefully someone will take this on there; if not, I'm sure we can work together to get something in for 2.4.

@philwebb philwebb modified the milestones: General Backlog, 2.4.x Sep 21, 2020
@snicoll snicoll added status: on-hold We can't start working on this issue yet and removed GraceHopperOSD labels Oct 5, 2020
@snicoll
Member

snicoll commented Oct 5, 2020

If I understand the request correctly, we can probably add a field to BatchProperties then update JobLauncherApplicationRunner.launchJobFromProperties to consider both command-line args and any properties.

I am a bit confused. @mminella approved this but, on the other hand, talks about environment variables. I don't think we should update BatchProperties: that would expose a property in the metadata for this, which would let the user paste raw JSON in application.properties.

Wouldn't we detect that SPRING_BATCH_JOBPARAMETERS is set and then decode the JSON if necessary? Being able to specify this via environment variables or system properties, but not any other property sources, feels a bit inconsistent though.

@snicoll snicoll added the for: team-attention An issue we'd like other members of the team to review label Oct 5, 2020
@philwebb philwebb added status: pending-design-work Needs design work before any code can be developed and removed for: team-attention An issue we'd like other members of the team to review labels Oct 5, 2020
snicoll added a commit to snicoll/spring-boot that referenced this issue Oct 6, 2020
This commit adds a "spring.batch.job.parameters" property that can be
used to specify the job parameters that should be used to run jobs on
startup.

This property can be specified as a comma-separated list of "key=value"
pairs, or as an indexed list in case one of the values contains a comma.

If the property is set, it takes precedence over application arguments
and is parsed using the same converter.

Closes spring-projectsgh-23411
snicoll added a commit to snicoll/spring-boot that referenced this issue Oct 6, 2020
This commit adds a "spring.batch.job.parameters" property that can be
used to specify job parameters that should be used to run jobs on
startup.

This property can be specified as a comma-separated list of "key=value"
pairs, or as an indexed list in case one of the values contains a comma.
Each parameter specified this way is parsed using the same converter
used to parse application arguments.

If a job parameter is specified both using the property and an
application argument, the latter takes precedence.

Closes spring-projectsgh-23411
@snicoll
Member

snicoll commented Oct 6, 2020

OK, so I've tested something that exposes a spring.batch.job.parameters property as a list of parameters with the usual "key=value" form. The property can be specified as a comma-separated value or in indexed fashion, as usual (the latter being mandatory if one of the parameters contains a , somewhere).

Something I didn't catch initially on this issue, but that a discussion with @mminella revealed: it is important that application arguments still override whatever has been specified using this property. I've pushed a proposal in snicoll@cc9f02b and welcome any feedback. Thanks!

@snicoll snicoll removed status: on-hold We can't start working on this issue yet status: pending-design-work Needs design work before any code can be developed labels Oct 6, 2020
@cppwfs

cppwfs commented Oct 6, 2020

@snicoll I took the JSON approach to this issue: https://github.com/cppwfs/spring-boot/tree/BOOT-23411.

@snicoll
Member

snicoll commented Oct 6, 2020

Yes, I've read the diff in the PR you've raised, @cppwfs. As I've mentioned there, we're not keen to use JSON for this. Please review the commit I've referenced above and let me know if there is any problem. Thank you.

@cppwfs

cppwfs commented Oct 7, 2020

@snicoll Thank you for creating this branch. I had a chance to run some of our tests from data flow. The only scenario that I can see that this branch does not cover is when a user has embedded commas in the job param. i.e. spring.batch.job.parameters=foo=bar,baz=boo,goo=ball,yikes. This fails parsing.

@snicoll
Member

snicoll commented Oct 8, 2020

Thanks for reviewing.

The only scenario that I can see that this branch does not cover is when a user has embedded commas in the job param

There is an explicit mention of that use case in the commit that I've shared. It is mapped to a list, so we need a separator to split the various elements. If you have a comma, you can specify the parameters with an index (one property per value) rather than a single value with everything. We could also consider changing the separator if that helps.
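To make the two forms concrete, here is how the embedded-comma case from the previous comment would look under the proposed property (values hypothetical):

```properties
# Comma-separated form: only works while no value contains a comma
spring.batch.job.parameters=foo=bar,baz=boo

# Indexed form: required once a value itself contains a comma,
# e.g. the failing goo=ball,yikes case
spring.batch.job.parameters[0]=foo=bar
spring.batch.job.parameters[1]=baz=boo
spring.batch.job.parameters[2]=goo=ball,yikes
```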

@snicoll snicoll added the status: blocked An issue that's blocked on an external project change label Oct 8, 2020
@snicoll snicoll added the for: team-attention An issue we'd like other members of the team to review label Feb 11, 2021
@cppwfs

cppwfs commented Feb 11, 2021

Hello @snicoll,
I agree with Michael that your recommendation is actually two issues. The priority is the ability to pass job parameters via properties.
It is a blocker in that a user can't pass job parameters if they use a shell entry point and corporate policies prevent them from using anything but a shell entry point.

@wilkinsona
Member

If we implement 1 (the ability to pass parameters via properties) while ignoring 2 (the need to specify which Job gets each job parameter), we're likely to need to make breaking changes to 1 when we subsequently tackle 2. We already know that passing parameters via the command line is broken in this regard. It would seem foolish to make the same mistake again when adding support for providing job parameters via the environment.

It seems to me that we have two ways forward here:

  1. Simplify things by only running a single job
  2. Continue to support running multiple jobs and allow parameters to be specified on a per-job basis

Option 1 would break things for the users that are combining multiple jobs (which they do for many reasons) but would allow property-based parameter support to be implemented relatively easily. Option 2 would seem to require us to take a step back and agree on how to provide parameters on a per-job basis, and then add the environment as an input to that as a second step.

@mminella
Member Author

While not our recommendation, 1 would be a jarring change for many users. I know it is a common question how to package multiple jobs into a single artifact and run only the one that matters at that time (users wish to avoid having an artifact per job).

I would vote for option two if we are going to address both issues at once.

@wilkinsona
Member

1 would be a jarring change for many users. I know it is a common question how to package multiple jobs into a single artifact and run only the one that matters at that time (users wish to avoid having an artifact per job).

There's a difference between packaging multiple jobs into a single artifact and running multiple jobs at the same time. If the goal is to avoid having one artifact per job, we could support that while also requiring that we only execute one job. If the context only contains a single job, we execute it. If the context contains multiple jobs, we require that a property be used to identify the job to execute. This is what @snicoll described above. We'd rename spring.batch.job.names to spring.batch.job.name and it would identify the one and only job to execute.

Before we can make progress on this, I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time. That's the default behaviour at the moment and we also have a property that can be used to specify the jobs that are executed if you don't want to execute all of them. Users that package multiple jobs in the same artifact but only execute one of them would be unaffected by this, other than a possible change to the property name used to identify the job to execute. Users that package multiple jobs in the same artifact and execute all of them would be adversely affected as they'd either have to start the same artifact n times or split it into n different artifacts and start each of them.
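Under the rename discussed above, selecting the single job to execute might look like this (spring.batch.job.name was only a proposal at this point in the discussion; the job name is hypothetical):

```properties
# One artifact packaging several Job beans; exactly one is selected per launch
spring.batch.job.name=importUsersJob
```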

@wilkinsona
Member

Before we can make progress on this, I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time. That's the default behaviour at the moment and we also have a property that can be used to specify the jobs that are executed if you don't want to execute all of them. Users that package multiple jobs in the same artifact but only execute one of them would be unaffected by this, other than a possible change to the property name used to identify the job to execute. Users that package multiple jobs in the same artifact and execute all of them would be adversely affected as they'd either have to start the same artifact n times or split it into n different artifacts and start each of them.

@mminella @benas what's your opinion on this please?

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Feb 16, 2021
@fmbenhassine
Contributor

@wilkinsona Looking into this now. There are a lot of details discussed here, so I will check and get back to you asap.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Feb 16, 2021
@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Feb 16, 2021
@fmbenhassine
Contributor

I think the first thing we need to decide is whether or not we want to continue to support executing multiple jobs at the same time

I will let the Boot team decide on that, but I would not continue support for running multiple jobs at the same time, because it is problematic in many ways: the same arguments are passed to all jobs (as mentioned by Stephane), mixed logs, ambiguous job execution ordering, confusing exit code, etc. I can elaborate on each of those, but I believe there are obvious reasons why the current behaviour is confusing. So, following the Unix philosophy of making a program do one thing and do it well, I would change the current behaviour to run a single job at a time.

Now regarding the initial feature request of adding support to pass job parameters through environment variables, I am not sure I fully understand the problem that led to this request. Here is what I understand from the original issue (spring-cloud/spring-cloud-dataflow#4119): SCDF tries to restart a failed job execution with a command similar to java -jar myjob.jar run.id(long)=5 where the entry point style of the task is of type shell. In this configuration, SCDF passes all application properties and command line arguments as environment variables (and transforms . to _). This means the run.id(long)=5 argument is transformed to an environment variable RUN_ID(LONG)=5 and this is when the failure happens because ( is an illegal character in an environment variable name. Now here is where I'm not fully clear with this issue:

  • Even if we add support to pass job parameters through environment variables, how would this fix the initial issue? It is not even possible to set an environment variable like RUN_ID(LONG)=5 for the same reason. I really need to see an example to understand the expected behaviour from boot. Moreover, if we decide to add support for that in boot, we need to add it in batch as well for feature parity with non-boot users.
  • If this Spring Batch specific notation of parameterName(ParameterType)=parameterValue is not friendly to being set as environment variable, then this should be fixed in batch. Note that this notation comes from the DefaultJobParametersConverter which could be customized/overridden if needed to support other characters. So unless I'm missing something, the initial issue could have been fixed with a custom JobParametersConverter.
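To sketch that workaround concretely: the key-translation half of such a custom converter could be a plain-Java helper like the one below, which rewrites an assumed env-var-friendly NAME__TYPE convention (the double-underscore separator is purely an assumption for illustration) back into Batch's default name(type) notation; its output would then be fed to a DefaultJobParametersConverter.

```java
import java.util.Properties;

// Hypothetical helper: translate env-var-friendly keys such as
// RUN_ID__LONG=5 back into Batch's default run.id(long)=5 notation
// before delegating to a JobParametersConverter.
public class EnvVarParameterKeys {

    public static Properties toBatchNotation(Properties env) {
        Properties translated = new Properties();
        for (String name : env.stringPropertyNames()) {
            String value = env.getProperty(name);
            int idx = name.lastIndexOf("__");
            if (idx > 0) {
                // "RUN_ID" -> "run.id", "LONG" -> "long"
                String key = name.substring(0, idx).toLowerCase().replace('_', '.');
                String type = name.substring(idx + 2).toLowerCase();
                translated.setProperty(key + "(" + type + ")", value);
            } else {
                // Keys without the marker pass through unchanged
                translated.setProperty(name, value);
            }
        }
        return translated;
    }
}
```

A real implementation would subclass DefaultJobParametersConverter and apply this translation inside getJobParameters(Properties).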

That's why I would like to fully understand the problem before thinking of or proposing a solution. FTR, we have a meeting planned this week with @cppwfs to discuss the matter, and I would like someone from the boot team to join.

As a side note, this shell entry point form should not be recommended in SCDF (or even outside SCDF) because, in this form, the process executed in the container runs as a subcommand of /bin/sh -c and hence will not receive Unix signals. This is problematic if we want to ensure a graceful shutdown of Spring Boot apps, and more specifically Spring Batch apps, where a graceful shutdown is crucial to being able to restart jobs automatically (and to avoid having to update the job repository manually).
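The signal-handling point can be illustrated with the two Dockerfile ENTRYPOINT forms (image layout and jar name hypothetical; only one ENTRYPOINT would appear in a real Dockerfile):

```dockerfile
# Shell form: PID 1 is /bin/sh, so SIGTERM from "docker stop" or Kubernetes
# never reaches the JVM and Spring Boot's graceful shutdown does not run.
ENTRYPOINT java -jar /app/myjob.jar

# Exec form: the JVM is PID 1 and receives signals directly, letting
# Batch record a clean stop so the job can be restarted.
ENTRYPOINT ["java", "-jar", "/app/myjob.jar"]
```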

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Feb 17, 2021
@cppwfs

cppwfs commented Feb 18, 2021

Hello All,

  • I agree with @benas that it is a better standard that we only run one job per application launch. A user may have more than one job per app; however, they have to select the job name they want to run, something like spring.batch.job.name=foo. If they choose to run an app that has multiple jobs and don't select a specific job, JobLauncherApplicationRunner will throw an exception.
  • To the original issue we discussed using a property that would promote certain environment variables to be job params. Something like: spring.batch.job.params=foo,bar,baz

Thoughts?

@wilkinsona
Member

To the original issue we discussed using a property that would promote certain environment variables to be job params. Something like: spring.batch.job.params=foo,bar,baz

Based on what @benas has said above, I don't understand how this will help as RUN_ID(LONG) isn't a legal name for an environment variable.

That's why I would like to fully understand the problem before thinking of or proposing a solution. FTR, we have a meeting planned this week with @cppwfs to discuss the matter, and I would like someone from the boot team to join.

When is this meeting scheduled?

@cppwfs

cppwfs commented Feb 18, 2021

Hello @wilkinsona ,
A user who wants to launch boot/batch jobs on Kubernetes is currently using containers that contain a shell application that launches the boot/batch application. This shell app does not pass command-line args to the application; it expects all properties to be set via environment variables. Via SCDF you can specify an entry_point of shell that will convert all command-line args to environment variables so that this information is not lost (this is meant to be used for docker containers with a shell entry_point). However, in Kubernetes you can't use ( or ) in a key for environment variables.
But this initial issue raised a bigger point: how does a user set job parameters for a boot/batch job if they can't pass in command-line args, for those cases where a docker container is using the shell entry point? Or, in this user's case, where they are using a shell app to launch their boot/batch app and those args are not passed to the boot/batch application.

  • We can schedule another meeting with you on this topic.

@wilkinsona
Member

As just discussed with @cppwfs and @benas, we don't think that Boot is the right place to solve this as it's a more general problem.

SCDF is responsible for mapping command line arguments to environment variables in situations where the entry point cannot consume command line arguments. When the command line argument's name contains characters that are illegal in an environment variable's name, it performs a mapping to something that is legal. It then needs to perform the reverse of this mapping so that the transformed environment variables are turned back into their original command line arguments and made available to the application.

We also discussed the possibility of Batch being able to consume job parameters from sources other than the command line. This would be a new Batch feature that @benas is happy to consider should there be sufficient demand. My feeling was that if sources other than the command line are going to be considered, using Spring Framework's Environment could be a good option providing there was a convention for identifying entries in the Environment that are to be treated as job parameters.

@wilkinsona wilkinsona added status: declined A suggestion or change that we don't feel we should currently apply and removed for: team-attention An issue we'd like other members of the team to review status: blocked An issue that's blocked on an external project change status: feedback-provided Feedback has been provided labels Feb 19, 2021
@wilkinsona wilkinsona removed this from the 2.x milestone Feb 19, 2021
@wilkinsona
Member

I've opened #25373 so that we can deprecate support for running multiple jobs.
