Reinforced Queries using Pre-trained Language Models in Sparse Retrieval

We address the persistent problem of query-document term mismatch in sparse retrieval by expanding queries with a pre-trained language model trained via reinforcement learning.

Abstract

Despite the success of dense retrieval, sparse retrieval methods retain advantages in interpretability and generalizability. However, query-document term mismatch persists in sparse retrieval, limiting its effectiveness in many practical applications. To remedy this, we introduce a novel query expansion approach, denoted QSparse. QSparse generates expansion terms with a pre-trained language model trained via reinforcement learning, then uses a sparse retrieval method to retrieve documents. A thorough experimental evaluation on three datasets from disparate domains (SCIFACT, Natural Questions (NQ), and MS-MARCO passage) shows that QSparse enriches the original query and significantly improves sparse retrieval. Furthermore, when combined with dense retrieval, QSparse achieves an 8% improvement in NDCG@10 on SCIFACT and a 2% increase in recall on NQ over dense retrieval alone. These results indicate that QSparse combines the strengths of sparse and dense retrieval to address mismatch issues.

The preprint and code for this paper are available.
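To make the expand-then-retrieve pipeline concrete, here is a minimal sketch. It substitutes an off-the-shelf seq2seq language model from Hugging Face transformers for the paper's RL-trained expander and uses BM25 via the rank_bm25 package as the sparse retriever; the model choice, the function name expand_query, and the toy corpus are illustrative assumptions, not part of QSparse.

```python
# Minimal sketch of the expand-then-retrieve pipeline described in the
# abstract. Assumptions (not from the paper): the expander is a generic
# seq2seq LM and sparse retrieval is BM25 via rank_bm25. The reinforcement
# learning step that QSparse uses to train the expander is omitted here.

from rank_bm25 import BM25Okapi
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "t5-small"  # placeholder; the paper's RL-trained expander differs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
expander = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def expand_query(query: str, max_new_tokens: int = 32) -> str:
    """Append LM-generated expansion terms to the original query."""
    inputs = tokenizer(query, return_tensors="pt")
    outputs = expander.generate(**inputs, max_new_tokens=max_new_tokens)
    expansion = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return f"{query} {expansion}"

# Toy corpus; in practice this would be SCIFACT, NQ, or MS-MARCO passages.
corpus = [
    "aspirin reduces the risk of cardiovascular events",
    "vitamin c supplementation and the common cold",
    "bm25 is a classic sparse retrieval scoring function",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

expanded = expand_query("does aspirin prevent heart attacks")
scores = bm25.get_scores(expanded.split())
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Because the expanded query is plain text, it can also be fed to a dense retriever, which is the combined setting in which the abstract reports the NDCG@10 and recall gains.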