
Section 1: Publication
Publication Type
Journal Article
Authorship
Lin Jimmy, Ma Xueguang, Lin Sheng-Chieh, Yang Jheng-Hong, Pradeep Ronak, and Nogueira Rodrigo
Title
Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations
Year
2021
Publication Outlet
Proceedings of the 44th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), pages 2356-2362, July
DOI
ISBN
ISSN
Citation
Lin Jimmy, Ma Xueguang, Lin Sheng-Chieh, Yang Jheng-Hong, Pradeep Ronak, and Nogueira Rodrigo. Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations. Proceedings of the 44th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), pages 2356-2362, July 2021.
Abstract
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. Around this toolkit, our group has built a culture of reproducibility through shared norms and tools that enable rigorous automated testing.
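The three retrieval modes named in the abstract can be illustrated with a minimal, self-contained sketch. It does not use Pyserini's API: the toy corpus, the hand-written two-dimensional vectors (standing in for transformer encodings), the function names, the parameter values `k1`, `b`, and `alpha`, and the linear-interpolation fusion are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

# Toy corpus and vectors; illustrative assumptions, not data from the paper.
DOCS = {
    "d1": "sparse retrieval with bm25 scoring",
    "d2": "dense retrieval with transformer encoders",
    "d3": "cooking pasta at home",
}
DOC_VECS = {"d1": [0.9, 0.1], "d2": [0.2, 0.9], "d3": [0.0, 0.1]}


def tokenize(text):
    """Lowercase whitespace tokenization (a bag-of-words simplification)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Sparse retrieval: score every document with a common BM25 variant."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def dense_scores(query_vec, doc_vecs):
    """Dense retrieval: brute-force nearest-neighbor search by inner product."""
    return {
        doc_id: sum(q * d for q, d in zip(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    }


def hybrid_scores(sparse, dense, alpha=0.5):
    """Hybrid retrieval: linear interpolation of sparse and dense scores."""
    doc_ids = set(sparse) | set(dense)
    return {
        doc_id: alpha * sparse.get(doc_id, 0.0) + (1 - alpha) * dense.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def rank(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank(hybrid_scores(bm25_scores("dense retrieval", DOCS), dense_scores([0.1, 1.0], DOC_VECS)))` orders the toy documents by the fused score. A real system would replace the brute-force inner product with an approximate nearest-neighbor index and would typically normalize the two score ranges before fusing them.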
Plain Language Summary
Section 2: Additional Information
Program Affiliations
Project Affiliations
Submitters
Publication Stage
Published
Theme
Presentation Format
Additional Information
Computer Science Core Team, Refereed Publications