Performance issues with partition and split [BATCH-2523] #1079

Closed
@spring-projects-issues

Description

Damien DALY opened BATCH-2523 and commented

Hi,

I am trying to run Spring Batch with split and partitioned steps to increase batch throughput.

This is a one-shot database migration (from Firebird to PostgreSQL). I don't need to store job/state data, so I use a MapJobRepository and a SimpleJobLauncher. Configuration is done by annotations on static class members. There is only one application and no remote step execution; all code runs locally.

I also created a ThreadPoolTaskExecutor.

I have a main flow that starts 3 other flows sequentially: -->[flow1]-->[flow2]-->[flow3]--.

Flow1 is a split flow, containing single "classic" steps (reader, processor, writer, chunked) and some "partitioned" steps. Each split takes a new SimpleAsyncTaskExecutor instance.

Each partitioner creates lists of entity IDs (Integer[]) to process.
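To make the partitioning concrete, here is a minimal plain-Java sketch of slicing an ID list into at most gridSize chunks. It is an illustration only, not the actual project code: the class name `IdPartitionSketch`, the method `partition`, and the `partitionN` keys are hypothetical. In Spring Batch the same logic would sit inside `Partitioner.partition(int gridSize)`, with each slice stored in an `ExecutionContext`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the code used in the migration.
final class IdPartitionSketch {

    /** Splits ids into at most gridSize contiguous slices of near-equal size. */
    static Map<String, Integer[]> partition(List<Integer> ids, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        Map<String, Integer[]> slices = new LinkedHashMap<>();
        int total = ids.size();
        int base = total / gridSize;   // minimum number of ids per slice
        int extra = total % gridSize;  // the first 'extra' slices get one more id
        int from = 0;
        for (int i = 0; i < gridSize && from < total; i++) {
            int size = base + (i < extra ? 1 : 0);
            Integer[] slice = ids.subList(from, from + size).toArray(new Integer[0]);
            slices.put("partition" + i, slice);
            from += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= 10; id++) {
            ids.add(id);
        }
        partition(ids, 3).forEach(
                (name, slice) -> System.out.println(name + " -> " + slice.length + " ids"));
    }
}
```

Empty slices are never produced: when there are fewer IDs than gridSize, fewer partitions are returned.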

The TaskExecutor is a singleton ThreadPoolTaskExecutor.
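As a rough picture of what the shared pool does, the sketch below submits every ID slice to one fixed-size pool and then waits for all of them, which is the point where the master step collects the child results. It uses plain `java.util.concurrent` (which ThreadPoolTaskExecutor wraps) rather than the Spring classes; the class `SharedPoolSketch`, the method `runAll`, and the stand-in "work" of counting IDs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of one pool shared by all partitions.
final class SharedPoolSketch {

    /** Runs one task per slice on a single fixed-size pool and returns the number of ids handled. */
    static int runAll(List<Integer[]> slices, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Integer[] slice : slices) {
                // Stand-in for the real read/process/write of one partition.
                results.add(pool.submit(() -> slice.length));
            }
            int processed = 0;
            for (Future<Integer> result : results) {
                processed += result.get(); // the "master" blocks here until each child finishes
            }
            return processed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer[]> slices = new ArrayList<>();
        slices.add(new Integer[] {1, 2, 3, 4});
        slices.add(new Integer[] {5, 6, 7});
        slices.add(new Integer[] {8, 9, 10});
        System.out.println("processed ids: " + runAll(slices, 2));
    }
}
```

With more slices than pool threads, the extra slices simply queue until a thread is free.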

The performance issue seems to be that a long time is spent while the master steps finalize the child step executions. If I am right, it looks like a serialization/deserialization process takes place to retrieve the child steps' status/context.

How can I either change the serialization/deserialization process or bypass serialization entirely?
What would be "good" values for gridSize, thread pool size, etc.?

Thanks.


No further details from BATCH-2523

Metadata

    Labels

    related-to: performance
    status: declined (Features that we don't intend to implement or Bug reports that are invalid or missing enough details)
