We are observing faulty behavior in our application that leads to the display of zombie documents.
According to our analysis, the following could be the cause of the problem:
We are using the data-service component in conjunction with workflows. Due to a double-click in the UI, two COMPLETE_TASK calls are triggered in quick succession.
Thread 1 and Thread 2 begin working on the same process instance and attempt to complete the same task. At the beginning of their transactions (almost simultaneously), both save a snapshot of the current Solr index state.
Camunda uses optimistic locking, so when two threads try to complete the same task concurrently, only one of them can commit successfully.
Thread 1 completes successfully, commits to the database, and updates the Solr index.
Thread 2 (Transaction 2) fails due to optimistic locking (standard Camunda behavior), after which the RollbackPostProcessor is executed.
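The database changes committed by Thread 1 remain persistent, but the search index does not stay consistent with them. The RollbackPostProcessor resets the index to the older state saved at the start of Transaction 2, that is, the state prior to both transactions. This discards the index update made by Thread 1, so the index no longer matches the database and is effectively corrupted. The application can only be recovered by performing a reindex.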
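To make the suspected sequence easier to follow, here is a minimal, deterministic sketch of the interleaving described above. It does not use Camunda, Solr, or the data-service component. `RollbackRaceSketch`, `Tx`, `completeTask`, and the in-memory `index` map are hypothetical stand-ins, and the version check only imitates optimistic locking in a simplified way. How the real RollbackPostProcessor stores and restores index state is an assumption based on our analysis.

```java
import java.util.HashMap;
import java.util.Map;

public class RollbackRaceSketch {

    // Stand-in for the task row in the database, guarded by a version counter.
    static String dbState;
    static int dbVersion;
    // Stand-in for the Solr index: task id -> indexed state.
    static final Map<String, String> index = new HashMap<>();

    /** Hypothetical transaction that records the DB version and an index snapshot at start. */
    static final class Tx {
        final int readVersion = dbVersion;
        final Map<String, String> indexSnapshot = new HashMap<>(index);

        boolean completeTask(String taskId) {
            if (readVersion != dbVersion) {
                // Simulated optimistic locking failure: the rollback handler
                // restores the index snapshot taken at transaction start.
                index.clear();
                index.putAll(indexSnapshot);
                return false;
            }
            // Successful path: commit to the "database", then update the "index".
            dbState = "COMPLETED";
            dbVersion++;
            index.put(taskId, "COMPLETED");
            return true;
        }
    }

    static String simulate() {
        dbState = "OPEN";
        dbVersion = 1;
        index.clear();
        index.put("task-1", "OPEN");

        // Double-click: both transactions start before either one commits.
        Tx tx1 = new Tx();
        Tx tx2 = new Tx();

        boolean first = tx1.completeTask("task-1");   // commits and updates the index
        boolean second = tx2.completeTask("task-1");  // fails and restores the stale snapshot

        return "tx1=" + first + " tx2=" + second
                + " db=" + dbState + " index=" + index.get("task-1");
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch the database ends up with the task completed while the index still reports it as open, which matches the kind of stale ("zombie") entry we see in the application.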