Task status is Unknown when using exec as entry point style #4291

Closed
umutcann opened this issue Dec 16, 2020 · 22 comments
Labels
status/need-feedback Calling participant to provide feedback status/need-investigation Oh need to look under a hood


@umutcann

Hi,

When using shell as the entry point style, if the batch Kubernetes pod finishes with an error, the task status is error, as shown below.

image

But we want to launch the task using exec because of a restart problem in that case.

When we launch the task using exec, the task status remains unknown even though the pod has an error and the batch app finishes with an error.

image

image

image

Do you have any idea about this issue?

Thanks.

@cppwfs
Contributor

cppwfs commented Dec 16, 2020

It looks like the task failed to start. Check your task's log to see if it threw an exception. If there is no pod for the task, check SCDF's log to see if there is an exception when attempting to launch the task. If this doesn't give you the information you need to resolve the problem, you can share the logs with us. Also, what version of SCDF are you using?

@umutcann
Author

umutcann commented Dec 21, 2020

Hi @cppwfs

We are using version 2.7.0-RC1. There are pods on Kubernetes for both shell and exec, and both of them fail with the same error.

Let me clarify something. I am trying to test the task restart capability of SCDF, so I am generating an error for the task intentionally.

When I use shell as the entry point style, I am able to see the failed step and the exit message, as shown below.

image

image

But when I use exec, the status is still unknown although the pod failed with the same error.

I hope I have described the issue clearly.

Please let me know if you have any other questions.

Thanks in advance.

@sabbyanandan sabbyanandan added the status/need-investigation Oh need to look under a hood label Dec 21, 2020
@cppwfs
Contributor

cppwfs commented Dec 21, 2020

Hello @umutcann, can you provide a sample app that displays this failure so I can retry it on my side?
Also, how are you building your container for the application (Spring Boot, Jib, Dockerfile, other)?

The steps to reproduce the problem:

  1. Launch Task/Batch boot application on SCDF. (It is expected to fail)
  2. Restart Failed Task/Batch boot application. (it is expected to fail)
  3. Check the status of application and verify that 2 failed Task Executions are present.

@umutcann
Author

Hi @cppwfs,

I am not able to share our sample app due to corporate data security restrictions. I'm sorry about that.

We build our Spring Batch apps using the S2I build strategy of Red Hat OpenShift. We build the source code with Maven and then create the image using a build config.

I couldn't understand the steps you mentioned above, because I am not restarting the failed task: its status is unknown when using exec.

Please let me know if I have misunderstood.

Kind regards.

@umutcann
Author

umutcann commented Jan 4, 2021

Hi @cppwfs ,

Is there any update regarding this issue?

Do you need anything else to investigate it?

@cppwfs
Contributor

cppwfs commented Jan 6, 2021

Currently we support containers created by Spring Boot, Jib, and Dockerfile.
Can you provide a simple application that replicates the problem, along with the associated scripts in the --scripts-url directory that you are using for the S2I container build?

@cppwfs cppwfs added the status/need-feedback Calling participant to provide feedback label Jan 14, 2021
@umutcann
Author

umutcann commented Jan 18, 2021

Hi @cppwfs

I was working on the problem.

I think I found the root cause of the problem.

SCDF creates a task and a new pod in OpenShift to execute it, but in this pod, instead of updating the details of the existing task, the app creates a new one:

a

In the logs of the corresponding pod, you can see the details of the task and the job id:

b (2)

c

We can see task execution 28026 in the SCDF console, including references to the corresponding job, although the details of the task are missing (the new record is not properly populated).

Execution 28026 cannot be restarted because the task it refers to does not exist.

This works properly when we use entryPointStyle=shell.

Do you still need a sample application?

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Jan 18, 2021
@cppwfs
Contributor

cppwfs commented Jan 19, 2021

Spring Cloud Data Flow creates a task execution row in the task_execution table when it launches a task. It then passes the spring.cloud.task.executionid property to the task containing the task execution id for the task to use. The task then uses this id to update its task execution information. The task is behaving like the spring.cloud.task.executionid is not being passed to the task. But the image above does not show the properties, so I can't be sure. You need to make sure that your container is passing all the properties from dataflow to the task application.
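
For illustration only, here is a minimal sketch of a check for whether the execution id actually reaches the application's `main` method. It assumes the id arrives in Spring Boot's usual `--name=value` command-line form; the class name `ExecutionIdArgs` and its methods are hypothetical and are not part of SCDF or Spring Cloud Task.

```java
import java.util.Optional;

// Hypothetical diagnostic helper, not part of SCDF or Spring Cloud Task.
class ExecutionIdArgs {

    // Assumption: the property is passed as a Spring Boot style command-line argument.
    static final String PREFIX = "--spring.cloud.task.executionid=";

    // Returns the execution id if one of the arguments carries it, otherwise empty.
    static Optional<Long> findExecutionId(String[] args) {
        for (String arg : args) {
            if (arg.startsWith(PREFIX)) {
                return Optional.of(Long.parseLong(arg.substring(PREFIX.length())));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Long> id = findExecutionId(args);
        System.out.println(id.isPresent()
                ? "received execution id " + id.get()
                : "no execution id among the arguments");
    }
}
```

Running this class in place of the real application inside the same container image would show whether the container's entry point forwards its arguments to the JVM.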

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Jan 19, 2021
@umutcann
Author

umutcann commented Jan 20, 2021

Hi @cppwfs

The image right below demonstrates that spring.cloud.task.executionid is being passed to the Kubernetes pod as an argument.

1

But SCDF does not show spring.cloud.task.executionid on the task execution page, as you can see below.

2

Although spring.cloud.task.executionid is passed to the pod as an argument, the batch app generates a new task execution, as you can see below.

3

If you need any other thing, please let me know.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Jan 20, 2021
@cppwfs
Contributor

cppwfs commented Jan 20, 2021

Yeah, that is my bad. The task-execution-id is not stored in the task manifest, which is used to populate the "Applications Properties". This is because each launch always gets a new task execution id. You can see the task-execution-id passed in via the environment by executing kubectl describe pod <your task pod>, e.g.:

    Environment:
...
      SPRING_CLOUD_TASK_EXECUTIONID:                         515
...
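
As a rough sketch of why `SPRING_CLOUD_TASK_EXECUTIONID` corresponds to `spring.cloud.task.executionid`: Spring Boot documents that an environment variable name is derived from a property name by replacing dots with underscores, removing dashes, and converting to upper case. The helper below (the hypothetical class `EnvNames`) applies that rule in plain Java so the expected variable name can be checked inside the container.

```java
import java.util.Locale;

// Hypothetical helper that applies Spring Boot's documented naming rule
// for binding environment variables to properties.
class EnvNames {

    // Dots become underscores, dashes are removed, and the result is upper-cased.
    // Locale.ROOT avoids locale-specific case mappings.
    static String toEnvName(String property) {
        return property.replace('.', '_').replace("-", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String name = toEnvName("spring.cloud.task.executionid");
        // Prints the variable name and its value in the current environment (null if unset).
        System.out.println(name + "=" + System.getenv(name));
    }
}
```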

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Jan 20, 2021
@umutcann
Author

umutcann commented Jan 21, 2021

Hi @cppwfs

I was able to see SPRING_CLOUD_TASK_EXECUTIONID as an environment variable only when using shell.

I was also able to see spring.cloud.task.executionid as an argument when using exec, in the output of kubectl describe pod. I shared a screenshot in my previous comment.

As you know, exec passes all application properties and command line arguments in the deployment request as container arguments, so I expect spring.cloud.task.executionid to be passed as an argument.

When we use exec, you can see that spring.cloud.task.executionid is passed to the pod as an argument, not as an environment variable (last line).

1

I think there is a bug. The TaskBatchExecutionListener class generates a new task execution id even though spring.cloud.task.executionid is being passed to the app.

How can we solve this issue?

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Jan 21, 2021
@cppwfs
Contributor

cppwfs commented Jan 21, 2021

Let's go back to my comment here: #4291 (comment)
In this comment I mentioned the container creation methods that we support. But I also mentioned that I'd take a look at your scripts in the --scripts-url of your S2I to see if I can help there.

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Jan 21, 2021
@umutcann
Author

I have attached the S2I scripts below.

s2i.zip

I hope they help to figure out the issue.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Jan 28, 2021
@cppwfs
Contributor

cppwfs commented Feb 1, 2021

Is the run-java in the run file the one provided by fabric8 or is this a run-java.sh that you are using?

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Feb 1, 2021
@umutcann
Author

umutcann commented Feb 2, 2021

We are using the redhat-openjdk18-openshift:1.8 base image. These scripts are provided by Red Hat, and we are using them.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Feb 2, 2021
@cppwfs
Contributor

cppwfs commented Feb 2, 2021

I just reread the comments. So you were able to get it to work when you used shell entrypoint style. If so, I think we can close this issue.

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Feb 2, 2021
@umutcann
Author

umutcann commented Feb 3, 2021

Yes @cppwfs, we are able to deploy apps using shell. But we cannot restart a task using shell due to the bug in Spring Boot that you mentioned in this issue (#4199 (comment)).

Because of that, we are trying to launch the task using exec.

Now we are stuck. I hope you understand our situation. Please don't close the ticket.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Feb 3, 2021
@umutcann
Author

umutcann commented Feb 8, 2021

Hi, @cppwfs

How can we proceed? Do you have any guidance regarding my previous comment?

Thanks in advance.

@cppwfs
Contributor

cppwfs commented Feb 11, 2021

Hello @umutcann ,
There have been updates on spring-projects/spring-boot#23411.

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Feb 11, 2021
@umutcann
Author

umutcann commented Feb 12, 2021

Hello @cppwfs ,

As I understand it, you are making a design decision about the shell bug.

Can you estimate which release will include the fix? And do you have any idea what our problem is when using the exec type?

As I said in my previous comment, we are now stuck.

We are trying to run all our batches on SCDF. Because of these problems, we cannot offer restart capability, which causes dissatisfaction with SCDF.

I hope you understand our situation.

@github-actions github-actions bot added for/team-attention For team attention and removed status/need-feedback Calling participant to provide feedback labels Feb 12, 2021
@cppwfs
Contributor

cppwfs commented Feb 14, 2021

Hello @umutcann,
I'm sorry you are having difficulties with this issue. However, in this case, your diagnosis that it is SCDF's issue is incorrect, based on the information you have provided in our conversation above. The solution that you require is a new feature in the JobLauncherApplicationRunner of the Spring Boot project (Issue: spring-projects/spring-boot#23411) to support restarting batch jobs using properties to pass in job parameters. This is because your container uses a shell app to launch the batch-boot application, and that shell app does not pass the job parameters via command line arguments. Thus you would experience this issue even if you started or restarted a batch job directly via Kubernetes.
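
To make the argument-forwarding point concrete, here is a small illustrative sketch. It does not reproduce the actual S2I run script; the class `WrapperForwarding`, the `app.jar` name, and the `forwardArgs` switch are all hypothetical. It only models how a launcher that builds the `java` command itself decides whether the container arguments reach the application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a wrapper that launches the Boot application.
class WrapperForwarding {

    // Builds the java command the wrapper would run. Container arguments
    // reach the application only if the wrapper appends them.
    static List<String> buildCommand(String jar, List<String> containerArgs, boolean forwardArgs) {
        List<String> command = new ArrayList<>(Arrays.asList("java", "-jar", jar));
        if (forwardArgs) {
            command.addAll(containerArgs);
        }
        return command;
    }

    public static void main(String[] args) {
        List<String> containerArgs = Arrays.asList("--spring.cloud.task.executionid=515");
        // Without forwarding, the property never reaches the application.
        System.out.println(buildCommand("app.jar", containerArgs, false));
        // With forwarding, the application receives it as a normal argument.
        System.out.println(buildCommand("app.jar", containerArgs, true));
    }
}
```

In a shell wrapper, the equivalent of `forwardArgs = true` is appending `"$@"` to the `java` command line.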

@github-actions github-actions bot added status/need-feedback Calling participant to provide feedback and removed for/team-attention For team attention labels Feb 14, 2021
@cppwfs
Contributor

cppwfs commented Feb 16, 2021

To prevent confusion, we will close this issue and put all new notes on spring-projects/spring-boot#23411.
