Overview of the Spring Integration Batch Module
Many use cases in Spring Batch look like they could be implemented efficiently and concisely in Spring Integration. They are listed below: features that either extend Spring Batch or use Spring Batch features in the context of Spring Integration. This is work in progress, pending community feedback. Prototyping these use cases has already led to many issues with transactionality and synchronous execution being raised and fixed in Spring Integration.
||Message triggers job||Complete. There are also many opportunities for monitoring progress.
||Chunking and multi-VM job execution||Failures might need some analysis. Use of a stateful StepExecutionListener requires use of step scope.
||Stateful and non-linear jobs -> job = flow||Simple use cases work well with Spring Batch 2.0 and no Integration features.
||Flexible item processing model (as message flow) -> step = flow||Complete (very simple using MessagingGateway). Unit tests only.
||Automatic repeat / retry||retry (unit test)||Unit tests only, since it just uses existing features.
||Restartable file processing||Seems to hang together. Not tested thoroughly, but apparently someone is using it.
||Asynchronous item processing||A general purpose ItemProcessor that returns a Future.
Numbers 2, 4 and 5 have also been identified as high-level Spring Batch 2.0 features or themes. If we implement number 1, then no further scheduling or triggering support is needed in Spring Batch itself.
Number 6 from the list (repeat/retry) is more of a Spring Integration pattern than a Spring Batch one. We implemented it in Spring Batch first, intending to move it into Spring Integration later (probably splitting repeat/retry out of Batch at that time).
Message triggers job
- User sends message to channel (maybe through a scheduler)
- System interprets message payload as parameters for JobLauncher
- System launches job execution
- If message had a replyTo, System acknowledges with JobExecution
- User accepts response and uses it to monitor progress
- System waits for job to finish and replies when it is over
- User polls for replies and gets notification about end of execution
- User wants to block on send and only receive response when job is done
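The steps above can be sketched in plain Java. This is only an illustration under stated assumptions: `JobTriggerSketch`, `Execution` and the `Queue` standing in for the reply channel are hypothetical names, not Spring Batch or Spring Integration types, and the launch itself is simulated rather than delegated to a real JobLauncher.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

final class JobTriggerSketch {

    /** Hypothetical stand-in for a JobExecution: an id, the parameters, a status. */
    static final class Execution {
        final long id;
        final Map<String, String> parameters;
        volatile String status = "STARTED";

        Execution(long id, Map<String, String> parameters) {
            this.id = id;
            this.parameters = parameters;
        }
    }

    private final AtomicLong ids = new AtomicLong();

    /**
     * Treats the message payload as job parameters, "launches" the job and,
     * if the message carried a reply channel, acknowledges with the execution.
     */
    Execution handle(Map<String, String> payload, Queue<Execution> replyTo) {
        // Assumption: a real handler would delegate to a JobLauncher here.
        Execution execution = new Execution(ids.incrementAndGet(), payload);
        if (replyTo != null) {
            replyTo.offer(execution); // acknowledgement the user can poll for
        }
        return execution;
    }
}
```

A caller that passes a reply queue can poll it and use the returned `Execution` to monitor progress; a caller that passes `null` gets no acknowledgement beyond the return value.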
Chunking and multi-VM job execution
- Step flushes chunk as message to outgoing channel (repeat up to throttle limit)
- Worker thread picks up chunk and processes it
- Worker thread replies to response channel
- Step picks up reply and aggregates the counts
- Step blocks until all the requests are satisfied
TODO: failure modes
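As a rough single-VM illustration of the chunking steps, the sketch below uses a thread pool in place of the outgoing channel and `Future`s in place of the response channel. `ChunkDispatchSketch` and its parameters are hypothetical, and failure handling is deliberately naive (any worker failure fails the whole step), since the failure modes are still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

final class ChunkDispatchSketch {

    /** Sends each chunk to a worker and returns the total item count the workers report. */
    static int process(List<List<String>> chunks, int throttleLimit) {
        ExecutorService workers = Executors.newFixedThreadPool(throttleLimit);
        Semaphore inFlight = new Semaphore(throttleLimit);
        List<Future<Integer>> replies = new ArrayList<>();
        try {
            for (List<String> chunk : chunks) {
                inFlight.acquireUninterruptibly(); // at most throttleLimit chunks outstanding
                replies.add(workers.submit(() -> {
                    try {
                        return chunk.size(); // worker "processes" the chunk and replies with a count
                    } finally {
                        inFlight.release();
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> reply : replies) {
                total += reply.get(); // step blocks until every request is satisfied
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }
}
```

In a multi-VM setup the pool would be replaced by real channels and remote workers; the aggregation and blocking logic in the step would stay the same in spirit.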
The job is executed over a long period, and many jobs can be executing concurrently.
- Input stage for each job: System reads all items and marks with the job instance id in a durable repository (staging table)
- System sends each item (or chunks of items that can be processed together as appropriate) to a channel
- Items flow through message pipeline, occasionally pausing until certain conditions are met, possibly for days at a time
- Aggregator sits and waits for all items in a job to be finished and then wraps up
Stateful and non-linear jobs
Dependencies between steps and conditional flow between steps. Each handler node in a message flow is a step execution, with all the robustness guarantees from the Spring Batch meta data.
- User launches job
- System sends message to channel containing job execution
- Handler accepts message and executes a step
- Handler translates result of step execution into the same form that it accepted the original request
- System routes message to next handler, possibly dynamically based on data in the message
- Next handler does the same... until one of the routing decisions leads to a reply channel
- System receives reply and transfers information to job execution (e.g. status) as necessary
Variation: failure in one of the handlers
Variation: restart after failure
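The main flow (launch, handle, route, reply) can be sketched as a small routing loop. Everything here is an assumption made for illustration: `JobFlowSketch`, `JobMessage` and the string statuses are hypothetical, each handler is reduced to a function returning an exit status, and a missing route plays the role of the reply channel. The failure and restart variations are not covered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class JobFlowSketch {

    /** Hypothetical message: carries the job status and the steps visited so far. */
    static final class JobMessage {
        String status = "STARTED";
        final List<String> visited = new ArrayList<>();
    }

    private final Map<String, Function<JobMessage, String>> steps = new HashMap<>();
    private final Map<String, String> routes = new HashMap<>();

    /** Registers a handler; the function executes a step and returns its exit status. */
    JobFlowSketch step(String name, Function<JobMessage, String> handler) {
        steps.put(name, handler);
        return this;
    }

    /** Routes a (step, status) pair to the next step; an unmapped pair ends the flow. */
    JobFlowSketch route(String from, String status, String to) {
        routes.put(from + ":" + status, to);
        return this;
    }

    /** Sends the message from handler to handler until a routing decision leads to the reply. */
    JobMessage run(String firstStep) {
        JobMessage message = new JobMessage();
        String current = firstStep;
        while (current != null) {
            Function<JobMessage, String> handler = steps.get(current);
            if (handler == null) {
                throw new IllegalStateException("unknown step: " + current);
            }
            message.visited.add(current);
            message.status = handler.apply(message); // same message form in and out
            current = routes.get(current + ":" + message.status); // dynamic routing on the result
        }
        return message; // the "reply": status is transferred back to the job execution
    }
}
```

The design choice worth noting is that each handler returns the same message form it accepted, which is what lets the router chain handlers without knowing what they do.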
Flexible item processing model
- Step hands item to ItemWriter
- Item is converted to message and sent to synchronous flow
- Handler accepts message and does something with item
- System routes result to next handler, possibly dynamically
- Handler throws exception
- System propagates exception up to ItemWriter (forces rollback under normal circumstances, hence the synchronous flow)
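The reason the flow has to be synchronous can be shown with a small sketch: because the handlers run on the writer's thread, an exception thrown by any handler reaches the step, which can then roll back the chunk. The class and method names are hypothetical, the handlers are plain functions rather than message handlers, and the "transaction" is just a list that is discarded on failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class SynchronousFlowWriterSketch {

    private final List<UnaryOperator<String>> handlers;

    /** Output that survived a commit; stands in for the transactional resource. */
    final List<String> committed = new ArrayList<>();

    SynchronousFlowWriterSketch(List<UnaryOperator<String>> handlers) {
        this.handlers = handlers;
    }

    /** ItemWriter role: sends each item through the flow on the calling thread. */
    void write(List<String> chunk, List<String> transaction) {
        for (String item : chunk) {
            String payload = item;
            for (UnaryOperator<String> handler : handlers) {
                payload = handler.apply(payload); // an exception here propagates to the caller
            }
            transaction.add(payload);
        }
    }

    /** Step role: commits the chunk, or rolls it back if the flow threw. */
    boolean executeChunk(List<String> chunk) {
        List<String> transaction = new ArrayList<>();
        try {
            write(chunk, transaction);
        } catch (RuntimeException e) {
            return false; // rollback: the partial transaction is discarded
        }
        committed.addAll(transaction);
        return true;
    }
}
```

With an asynchronous hand-off the exception would surface on another thread, and the step would have no way to tie it back to the transaction that produced the item.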
Automatic repeat / retry
- User sends message to channel
- System starts a transaction and receives message, then processes it
- User sends another message
- System receives and processes it in the same transaction
- ... repeat ...
- System determines that batch is complete and commits transaction
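A minimal sketch of the repeat part of this use case follows. It assumes a completion policy that the text does not specify: the batch is complete after a fixed number of messages or when the channel is empty. `BatchedReceiveSketch` is a hypothetical name, a `Queue` stands in for the channel, and a list per batch stands in for the transaction. Retry is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class BatchedReceiveSketch {

    /** One entry per committed transaction, holding the messages processed in it. */
    final List<List<String>> commits = new ArrayList<>();

    /** Drains the channel in transactions of at most batchSize messages; returns the commit count. */
    int drain(Queue<String> channel, int batchSize) {
        int transactions = 0;
        while (!channel.isEmpty()) {
            List<String> transaction = new ArrayList<>(); // start a transaction
            while (transaction.size() < batchSize) {
                String message = channel.poll(); // receive
                if (message == null) {
                    break; // channel is empty, so the batch is complete
                }
                transaction.add(message); // process inside the same transaction
            }
            commits.add(transaction); // batch complete: commit
            transactions++;
        }
        return transactions;
    }
}
```

For example, five queued messages with a batch size of two yield three commits containing two, two and one message respectively.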
Restartable file processing
Large files need to be processed, so a message payload containing the whole file contents is not practical. Instead, send one line or XML event per message, with failover and restartability provided by Spring Batch.
- User triggers file processing (sends message, copies file to directory, etc.)
- System starts new job
- System processes file line by line (or even by event), wrapping each one as a message and sending it to a synchronous flow
- System commits periodically (as determined by Spring Batch step configuration)
Variation: failure and restart
- Item processing fails
- System aborts job and sends message to failure channel (or failure message to normal reply channel)
- Operator fixes problem and triggers restart (another message channel?)
- System restarts job for same file at point where it left off
- System completes processing
- System sends success message to reply channel
Variation: send to asynchronous flow. Same as the main use case, but the item message is sent to an asynchronous flow. This is less robust, because messages will be lost if the lights go out, but at least a large file can be split into smaller chunks.
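The main use case and the failure-and-restart variation can be sketched together. To stay self-contained the sketch takes the lines as a `List` instead of reading a file, and keeps the restart position in a field; in a real job that position would live in the Spring Batch meta data. `RestartableLineJobSketch` and its members are hypothetical names.

```java
import java.util.List;
import java.util.function.Consumer;

final class RestartableLineJobSketch {

    /** Restart marker: number of lines covered by committed chunks. */
    private int committedLines = 0;

    int committedLines() {
        return committedLines;
    }

    /**
     * Processes lines from the last committed position, committing every
     * commitInterval lines. Returns true when the whole input is processed,
     * false if an item failed and the job was aborted.
     */
    boolean run(List<String> lines, int commitInterval, Consumer<String> flow) {
        int position = committedLines; // restart where the last commit left off
        while (position < lines.size()) {
            int end = Math.min(position + commitInterval, lines.size());
            try {
                for (int i = position; i < end; i++) {
                    flow.accept(lines.get(i)); // one line per message, synchronous flow
                }
            } catch (RuntimeException e) {
                return false; // chunk rolled back, marker unchanged, job aborted
            }
            committedLines = end; // periodic commit
            position = end;
        }
        return true;
    }
}
```

Calling `run` again on the same instance after a failure resumes at the start of the chunk that failed, so lines in that chunk that were handled before the failure are handed to the flow a second time.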
Asynchronous item processing
This is actually a variation on the flexible item processing model.
- ItemProcessor executes in background (non-transactionally)
- ItemWriter collects outputs from futures before physically writing data
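The two steps above map directly onto `java.util.concurrent`. The sketch below is an assumption-laden illustration, not the module's implementation: `AsyncItemProcessingSketch` and its methods are hypothetical, the processor is a plain `Function`, and "writing" is reduced to returning the collected results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class AsyncItemProcessingSketch {

    /** Processor role: runs the delegate in the background and returns one Future per item. */
    static <I, O> List<Future<O>> process(List<I> items, Function<I, O> delegate, ExecutorService pool) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : items) {
            Callable<O> task = () -> delegate.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    /** Writer role: collects the outputs from the futures before writing them. */
    static <O> List<O> write(List<Future<O>> futures) {
        List<O> written = new ArrayList<>();
        for (Future<O> future : futures) {
            try {
                written.add(future.get()); // blocks until the background work is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("item processing failed", e.getCause());
            }
        }
        return written;
    }

    /** Convenience wrapper: process in the background, then write in item order. */
    static <I, O> List<O> run(List<I> items, Function<I, O> delegate, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            return write(process(items, delegate, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the writer unwraps the futures in the order the items were submitted, the output order matches the input order even though processing runs concurrently.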