How to not block resources while waiting for solr commits?

In my application I have an rpc request that adds about 30 documents and 30 relations between those.

I have noticed that on my local setup the request takes about 6s to complete and more than 70% of the time is spent waiting on solr commits.

While waiting for the solr commits, the http request acceptor thread is blocked and the jdbc connection from the pool is still held, so neither resource is available for processing further requests.

I know that I can set mgmtp.a12.dataservices.search.solr.commitPerUpdate.enabled to false to disable these commits and hence the wait times. However, this means that once the request is answered the client may not be able to query the added documents, because they might not have been committed yet and would therefore not be visible to the client.

So how can I avoid blocking resources during solr commits while still being able to query documents after an rpc request was answered?

Hi @andreas-fresh-mesa,

Unfortunately, there are no other options from the DS side. We either wait for the commits or we don’t.

  • If we wait, the Solr response will be slow because each commit takes time. In return, the data is immediately available because each commit opens a new searcher on the Solr side.
  • If we do not commit with each update, Solr decides when the data becomes available because the Solr autocommit feature is used. Solr writes all changes to the transaction log and decides when to flush it. It is possible to configure Solr to autocommit earlier, but there will still be a delay. For more info, please see: Commits and Transaction Logs :: Apache Solr Reference Guide

This information is relevant for DS version 37.2.0 and below

A solution sketch to not unnecessarily block resources would look like this:

@RequestMapping("#{@dataServicesCoreProperties.server.contextPath}/v2/rpc2")
@SecuredController
@RestController
public class LessResourceBlockingJsonRpcController {

	@Autowired
	private JsonRpcControllerImpl jsonRpcControllerImpl;
	
	@Autowired
	private AsyncTaskExecutor taskExecutor;
	
	@Autowired
	private SolrClient solrClient;
	@Autowired
	private DataServicesCoreProperties dataServicesCoreProperties;

	@PostMapping(consumes = { MediaType.APPLICATION_JSON_VALUE }, produces = { MediaType.APPLICATION_JSON_VALUE })
	public DeferredResult<ResponseEntity<ByteArrayResource>> jsonRpc(InputStream request,
			@RequestHeader(value = REQUEST_ID_HEADER, required = false) String requestId) throws IOException {
		// Let the original controller compute the response so the rpc logic stays unchanged.
		ResponseEntity<ByteArrayResource> result = jsonRpcControllerImpl.jsonRpc(request, requestId);

		// Issue a single solr commit on the task executor instead of blocking the request thread.
		CompletableFuture<UpdateResponse> solrCommit = taskExecutor.submitCompletable(() -> {
			return solrClient
					.commit(dataServicesCoreProperties.getSearch().getSolr().getCollection().getName());
		});

		// Complete the servlet response only after the commit has finished (or failed).
		DeferredResult<ResponseEntity<ByteArrayResource>> deferredResult = new DeferredResult<>();
		solrCommit.whenComplete((r, e) -> {
			if (e != null) {
				deferredResult.setErrorResult(e);
			} else {
				deferredResult.setResult(result);
			}
		});
		
		return deferredResult;
	}
}

together with the configuration options

mgmtp.a12.dataservices.search.solr.commitPerUpdate.enabled=false
mgmtp.a12.dataservices.search.solr.initialization.commitBypass.enabled=true

I’m not sure why I had to enable the commit bypass, but it didn’t work without it.

The above controller uses the original rpc controller to calculate the response so that the logic stays the same. However, it issues an async solr commit and then uses the servlet async facilities to send the servlet response only once the solr commit is done.

That way the client receives the response at a time when the data has already been committed to solr.

This way dataservices also doesn’t issue lots of solr commits during an rpc request but only one at the end (which might not be what one wants for an application). Because the controller does not block on the solr commit, the http acceptor thread is given back to the servlet dispatcher much earlier. The jdbc connection is also returned to the connection pool much earlier: dataservices uses spring.jpa.open-in-view, so the connection is only returned once the controller method has returned.
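To make the idea independent of Spring and Solr, here is a minimal sketch using only the JDK. The class DeferredResponseSketch and the method respondAfterCommit are hypothetical names, and a plain Runnable stands in for the solr commit; this is not the dataservices API, only an illustration of completing the response after an async commit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the controller above: no Spring, no Solr.
class DeferredResponseSketch {

	/** Completes with the already computed response once the commit task has finished. */
	static <T> CompletableFuture<T> respondAfterCommit(T response, Runnable commit, ExecutorService executor) {
		// The commit runs on the executor; the calling thread returns immediately.
		// If the commit throws, the returned future completes exceptionally instead.
		return CompletableFuture.runAsync(commit, executor).thenApply(ignored -> response);
	}

	public static void main(String[] args) {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		AtomicBoolean committed = new AtomicBoolean(false);
		try {
			CompletableFuture<String> deferred =
					respondAfterCommit("rpc-response", () -> committed.set(true), executor);
			// join() is used here only so the demo can observe the result.
			System.out.println(deferred.join() + " delivered, committed=" + committed.get());
		} finally {
			executor.shutdown();
		}
	}
}
```

In the controller above, DeferredResult plays the role of the returned future: the servlet container writes the response when it is completed.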

@andreas-fresh-mesa,

Very nice! We had not been thinking in this direction when it comes to synchronization with Solr. This code could make all our processing simpler, because we could not only send the commits this way but also inform Solr about our changes only after each JSON-RPC operation has been processed, and we could also batch those requests. To be honest, we have spent the last 2 years thinking about how to replace Solr because it is causing us more trouble than benefit, but there are also ways to work with Solr more efficiently.

We actually have implemented this with quite some success. The final implementation approach we use is the following:

  • dataservices solr commits were disabled by configuration
  • a spring after commit listener starts an async solr commit
  • a servlet filter waits for the async solr commit to finish

This brought request processing times down from about 8s to 3.0 - 3.3s because

  • there is now only one solr commit for all of the documents modified during the request.
  • a soft solr commit is used instead of a hard commit

This is not as efficient as it could be because the servlet thread is still being blocked. But by blocking it in a servlet filter that sits very early in the filter chain, no other resources are blocked while waiting for the solr commit. In particular, no jdbc connection is held at that point anymore.
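The ordering argument can be illustrated with a small JDK-only model. Everything in it is an assumption made for illustration: a Semaphore stands in for the jdbc connection pool, a CompletableFuture for the pending solr commit, and handleRequest for the combination of inner request handling and the outer filter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Hypothetical model: a Semaphore stands in for the jdbc pool,
// a CompletableFuture for the pending solr commit.
class EarlyFilterWaitSketch {

	/**
	 * Runs the handler while holding a "connection", then waits for the commit.
	 * Returns the number of connections that were free while waiting.
	 */
	static int handleRequest(Semaphore pool, Runnable handler, CompletableFuture<Void> pendingCommit) {
		pool.acquireUninterruptibly(); // inner layers: connection held while the handler runs
		try {
			handler.run();
		} finally {
			pool.release(); // connection is returned before the outer filter waits
		}
		int freeWhileWaiting = pool.availablePermits();
		pendingCommit.join(); // outer filter: only the request thread is blocked here
		return freeWhileWaiting;
	}

	public static void main(String[] args) {
		Semaphore pool = new Semaphore(1);
		CompletableFuture<Void> commit = CompletableFuture.runAsync(() -> { });
		System.out.println("free connections while waiting: " + handleRequest(pool, () -> { }, commit));
	}
}
```

With a pool of one connection, the connection is already free again at the point where the thread starts waiting, so a concurrent request could still obtain it.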

Making use of the servlet async facilities is not possible because rpc operations do not support reactive return types or java.util.concurrent.CompletionStage.

To add a bit more information, this is the code we used to implement all of the above:

The following class deals with the actual solr commit and decides when it is necessary to issue one.

@RequiredArgsConstructor
@Slf4j
public class SolrCommitCoordinator {

	/*
	 * This only supports one transaction per request. This might in some cases not
	 * be enough. However there is currently no other possibility known to determine
	 * whether additional solr commits are needed.
	 */
	private final NamedThreadLocal<CompletableFuture<Void>> commitCoordination = new NamedThreadLocal<>(
			"solr-commit-coordination");

	private final SolrClient solrClient;
	private final DataServicesCoreProperties dataServicesCoreProperties;
	private final AsyncTaskExecutor taskExecutor;

	public void triggerSolrCommit() {
		Object existingCommitCoordination = commitCoordination.get();
		if (existingCommitCoordination == null) {
			log.debug("Triggering solr commit.");
			CompletableFuture<Void> solrCommit = taskExecutor.submitCompletable(() -> {
				return executeSolrCommit();
			});
			commitCoordination.set(solrCommit);
		}
	}

	private Void executeSolrCommit() throws SolrServerException, IOException {
		StopWatch commitDuration = null;
		if (log.isDebugEnabled()) {
			commitDuration = new StopWatch();
			commitDuration.start();
		}

		try {
			solrClient.commit(dataServicesCoreProperties.getSearch().getSolr().getCollection().getName(), false, true,
					true);
		} finally {
			if (commitDuration != null) {
				commitDuration.stop();
				log.debug("Solr commit took {}ms.", commitDuration.getTotalTime(TimeUnit.MILLISECONDS));
			}
		}
		return null;
	}

	public void blockUntilSolrCommitFinished() {
		CompletableFuture<Void> existingCommitCoordination = commitCoordination.get();
		if (existingCommitCoordination != null) {
			commitCoordination.remove();

			StopWatch blockedDuration = null;
			if (log.isDebugEnabled()) {
				blockedDuration = new StopWatch();
				blockedDuration.start();
			}

			try {
				existingCommitCoordination.join();
			} catch (CancellationException | CompletionException e) {
				throw new IllegalStateException("An error occurred while waiting for the solr commit to finish.", e);
			} finally {
				if (blockedDuration != null) {
					blockedDuration.stop();
					log.debug("Waited {}ms for solr commit to finish.",
							blockedDuration.getTotalTime(TimeUnit.MILLISECONDS));
				}
			}
		}
	}
	
	public void cleanup() {
		commitCoordination.remove();
	}
}

The following class is responsible for actually triggering a commit after a transaction completes. When a transaction completes, the triggering may happen many times, once for each document event handled by the listener. That’s why SolrCommitCoordinator takes care not to issue any further commits if one has already been issued.

@RequiredArgsConstructor
public class SolrCommitListener {
	
	private final SolrCommitCoordinator solrCommitCoordinator;

	@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
	public void afterDocumentCreate(DocumentAfterCreateEvent event) {
		solrCommitCoordinator.triggerSolrCommit();
	}

	@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
	public void afterDocumentUpdate(DocumentAfterUpdateEvent event) {
		solrCommitCoordinator.triggerSolrCommit();
	}

	@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
	public void afterDocumentDelete(DocumentAfterDeleteEvent event) {
		solrCommitCoordinator.triggerSolrCommit();
	}
}
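The deduplication that SolrCommitCoordinator performs when the listener fires repeatedly can be reduced to a few lines of plain JDK code. The sketch below is a simplified assumption, not the class above: CommitOncePerRequestSketch, trigger and awaitAndReset are hypothetical names, and a Runnable replaces the solr commit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, Spring-free reduction of the coordinator: one pending commit per thread.
class CommitOncePerRequestSketch {

	private final ThreadLocal<CompletableFuture<Void>> pending = new ThreadLocal<>();
	private final ExecutorService executor;
	private final Runnable commit;

	CommitOncePerRequestSketch(ExecutorService executor, Runnable commit) {
		this.executor = executor;
		this.commit = commit;
	}

	/** Starts an async commit unless the current thread already has one pending. */
	void trigger() {
		if (pending.get() == null) {
			pending.set(CompletableFuture.runAsync(commit, executor));
		}
	}

	/** Waits for the pending commit, if any, and clears the per-thread state. */
	void awaitAndReset() {
		CompletableFuture<Void> future = pending.get();
		pending.remove();
		if (future != null) {
			future.join();
		}
	}

	public static void main(String[] args) {
		ExecutorService executor = Executors.newSingleThreadExecutor();
		AtomicInteger commits = new AtomicInteger();
		CommitOncePerRequestSketch coordinator =
				new CommitOncePerRequestSketch(executor, commits::incrementAndGet);
		try {
			for (int i = 0; i < 30; i++) {
				coordinator.trigger(); // one call per document event
			}
			coordinator.awaitAndReset();
			System.out.println("commits issued: " + commits.get());
		} finally {
			executor.shutdown();
		}
	}
}
```

As in the original class, this only covers one transaction per request: a trigger that arrives after the commit has already started does not cause a second commit until the state has been reset.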

A servlet filter is then used to preserve the behaviour that clients can see data in solr that was changed by their previous request.

/**
 * Waits for pending solr commits to happen so that the client receives the
 * response after the solr commit and is able to find the data in solr.
 */
@RequiredArgsConstructor
public class WaitForSolrCommitFilter extends OncePerRequestFilter {

	private final SolrCommitCoordinator solrCommitCoordinator;
	
	@Override
	protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
			throws ServletException, IOException {
		try {
			filterChain.doFilter(request, response);
			solrCommitCoordinator.blockUntilSolrCommitFinished();
		} finally {
			solrCommitCoordinator.cleanup();
		}
	}

}

The actual spring config used for this is then:

@Configuration
@ConditionalOnProperty(prefix = "mgmtp.a12.dataservices.search.solr.commitPerUpdate", 
	value = "enabled", 
	havingValue = "false", 
	matchIfMissing = false)
@RequiredArgsConstructor
public class SolrCommitAfterJdbcCommitConfiguration {
	
	private final SolrClient solrClient;
	private final DataServicesCoreProperties dataServicesCoreProperties;
	private final AsyncTaskExecutor taskExecutor;

	@Bean
	public SolrCommitListener solrCommitListener() {
		return new SolrCommitListener(solrCommitCoordinator());
	}

	@Bean
	public SolrCommitCoordinator solrCommitCoordinator() {
		return new SolrCommitCoordinator(solrClient, dataServicesCoreProperties, taskExecutor);
	}
	
	@Bean
	public GenericFilterBean waitForSolrCommitFilter() {
		return new WaitForSolrCommitFilter(solrCommitCoordinator());
	}
}

Note that the following configuration properties must be set for this to be effective:

mgmtp.a12.dataservices.search.solr.initialization.commitBypass.enabled=true
mgmtp.a12.dataservices.search.solr.commitPerUpdate.enabled=false