Releases: terrier-org/pyterrier
0.11.0
What's Changed
Significant update that has refactored much of the PyTerrier source code and renamed many classes as we progress towards a PyTerrier 1.0 release.
The most significant changes are:
- pt.init() is no longer required 😃. If necessary, pt.java methods can be used to change Java initialisation.
- pt.BatchRetrieve is now pt.terrier.Retriever, and similar changes have been made for other Terrier indexers and retrievers.
- pt.AnseriniBatchRetrieve is now in its own separate project, PyTerrier-Anserini, with various improvements.
All changes are backwards compatible in this release; deprecation warnings will guide you on how to update your code.
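A minimal sketch of how a renamed class can stay backwards compatible while emitting a deprecation warning, in the spirit of the shims described here. Retriever and BatchRetrieve below are local stand-ins for illustration, not PyTerrier's actual shim code.

```python
import warnings


class Retriever:
    """Stand-in for a renamed class (hypothetical; not PyTerrier's pt.terrier.Retriever)."""

    def __init__(self, index, wmodel="BM25"):
        self.index = index
        self.wmodel = wmodel


class BatchRetrieve(Retriever):
    """Deprecated alias: behaves exactly like Retriever but warns when constructed."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "BatchRetrieve is deprecated; use Retriever instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Old-style code keeps working; the recorded warning tells the user how to update it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = BatchRetrieve("my_index", wmodel="TF_IDF")

for w in caught:
    print(f"{w.category.__name__}: {w.message}")
```

Subclassing keeps isinstance checks against the new name working for objects created through the old name.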
More details below:
Improvements
- Move all Java/JNIUS code into pt.java, move all Terrier code into pt.terrier; remove pt.init() by @seanmacavaney in #447
- dynamic module loading by @seanmacavaney in #461
- Incorporate Retrieval Scores into RM3 by @mam10eks in #453
- pt.apply for making an indexer by @cmacdonald in #467
- query_toks support for terrier.Retriever by @cmacdonald in #466
- add save_mode='warn' and save_mode='error' to pt.Experiment (warn as default) by @cmacdonald in #408
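In RM3, each feedback document's term distribution can be weighted by that document's retrieval score before being interpolated with the original query. A minimal sketch of that general technique follows, assuming positive retrieval scores normalised to sum to 1 over the feedback set; rm3_expand, fb_terms and orig_weight are illustrative names, not Terrier's or PyTerrier's RM3 API.

```python
from collections import Counter


def rm3_expand(query_terms, feedback_docs, fb_terms=10, orig_weight=0.5):
    """Sketch of RM3 query expansion using retrieval scores as document weights.

    query_terms   -- list of original query terms
    feedback_docs -- list of (tokens, retrieval_score) pairs for the top-ranked docs
    fb_terms      -- number of expansion terms to keep
    orig_weight   -- interpolation weight (lambda) given to the original query
    """
    # Normalise retrieval scores so that the document weights sum to 1.
    total_score = sum(score for _, score in feedback_docs)
    relevance_model = Counter()
    for tokens, score in feedback_docs:
        doc_weight = score / total_score
        doc_len = len(tokens)
        for term, tf in Counter(tokens).items():
            # P(w|R) accumulates P(w|d) weighted by the document's normalised score.
            relevance_model[term] += (tf / doc_len) * doc_weight

    # Keep only the top fb_terms expansion terms and renormalise them.
    top = relevance_model.most_common(fb_terms)
    top_mass = sum(weight for _, weight in top)
    expansion = {term: weight / top_mass for term, weight in top}

    # Maximum-likelihood model of the original query.
    original = {t: c / len(query_terms) for t, c in Counter(query_terms).items()}

    # RM3: interpolate the original query model with the relevance model.
    return {
        term: orig_weight * original.get(term, 0.0)
        + (1 - orig_weight) * expansion.get(term, 0.0)
        for term in set(original) | set(expansion)
    }


docs = [
    ("cheap flights to paris".split(), 2.0),
    ("paris hotels and flights".split(), 1.0),
]
weights = rm3_expand(["paris", "flights"], docs, fb_terms=3)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{w:.4f}")
```

Because both the query model and the truncated relevance model sum to 1, the interpolated weights also sum to 1.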
Refactoring
- Deprecate DFIndexer by @cmacdonald in #457
- pt.terrier.rewrite revisions - remove Axiomatic, remove terrier-prf by @seanmacavaney in #472
- shims for deprecated modules by @seanmacavaney in #476
- text_loader abstraction for pt.text.get_text by @seanmacavaney in #469
- move Anserini to a separate project by @seanmacavaney in #473
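A text loader abstraction separates where document text comes from (an index, a dataset, a store) from the step that attaches it to results. A minimal sketch of that idea over lists of dicts; TextLoader, dict_text_loader and this get_text are hypothetical illustrations, not the actual signature of pt.text.get_text.

```python
from typing import Dict, Iterable, List, Protocol


class TextLoader(Protocol):
    """Hypothetical interface: anything that can fetch text for a list of docnos."""

    def __call__(self, docnos: List[str]) -> List[str]: ...


def dict_text_loader(store: Dict[str, str]) -> TextLoader:
    """Builds a loader backed by an in-memory dict (stand-in for an index or dataset)."""

    def load(docnos: List[str]) -> List[str]:
        return [store[d] for d in docnos]

    return load


def get_text(results: Iterable[dict], loader: TextLoader, field: str = "text") -> List[dict]:
    """Returns copies of the result rows with a text field filled in by the loader."""
    rows = list(results)
    texts = loader([row["docno"] for row in rows])
    return [{**row, field: text} for row, text in zip(rows, texts)]


loader = dict_text_loader({"d1": "first document", "d2": "second document"})
results = [
    {"qid": "q1", "docno": "d2", "score": 1.5},
    {"qid": "q1", "docno": "d1", "score": 0.7},
]
with_text = get_text(results, loader)
print(with_text)
```

Swapping the backend only requires another callable with the same shape; get_text itself does not change.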
Documentation
- Add RankVicuna and RankZephyr Plugins by @kaustubhdhole in #441
- Update tuning.rst by @albertoueda in #446
- Add PyTerrier_ChatNoir to the plugin section by @mam10eks in #452
- Remove nptyping dependency to assure numpy 2 compatibility by @cmacdonald in #445
Minor
- change all tests to use new terrier retriever names, but check old names too by @cmacdonald in #458
- Parallel fixes by @seanmacavaney in #462
- fix logger error by @seanmacavaney in #464
- Add comments to requirements.txt by @cmacdonald in #465
- failing anserini tests due to version 0.36.0, disabling for now by @seanmacavaney in #468
- remove the writing of a default terrier.properties file by @cmacdonald in #470
- fix test_maven by @seanmacavaney in #471
- Python 3.12 in GHA by @cmacdonald in #459
- Bump most JDK version tested in GHA to 21 by @cmacdonald in #475
- Update pt.terrier.Retriever str and repr in #474
New Contributors
- @kaustubhdhole made their first contribution in #441
- @mam10eks made their first contribution in #452
Full Changelog: 0.10.1...0.11.0
0.10.1
Minor release with minor improvements and bug fixes.
What's Changed
- Bugfix: Delete baseline pvalue from correction method input by @JorgeGabin in #440
- Fix: fix msmarco location by @cmacdonald in #435
- Feature: added corpus_iter for Terrier index by @cmacdonald in #426
- remove sklearn as required dependency by @cmacdonald in #410
- Add troubleshoot for installation and certification error by @Krissy510 in #411
- fix parsing of trecxml topics by @lukaszett in #414
- paired t-tost by @seanmacavaney in #420
- read_results optimization by @seanmacavaney in #421
- pickling of QE pipelines to allow parallelised QE grid search by @cmacdonald in #430
- Require Python 3.8 minimum by @cmacdonald in #431
- Bump logback from 1.2.0 to 1.2.13 in /terrier-python-helper by @dependabot
- improved error message pt.apply.query - from #433 by @cmacdonald in #434
- Improved testing of FeaturesBatchRetrieve by @cmacdonald in #437
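Assuming "tost" here refers to the standard two one-sided tests (TOST) equivalence procedure, a paired version checks whether the mean per-query difference between two systems lies within an equivalence margin. The sketch below takes the one-sided critical t value as an argument (about 2.132 for 4 degrees of freedom at alpha=0.05) to stay dependency-free; paired_tost is a hypothetical helper, not PyTerrier's implementation.

```python
import math
import statistics


def paired_tost(a, b, margin, t_crit):
    """Paired two one-sided tests (TOST) for equivalence of two systems.

    a, b    -- per-query scores of the two systems (same query order)
    margin  -- equivalence margin: mean differences within (-margin, +margin) count as equivalent
    t_crit  -- one-sided critical t value for n-1 degrees of freedom at the chosen alpha
    Returns (t_lower, t_upper, equivalent).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    # H0a: mean_diff <= -margin, rejected when t_lower is large and positive.
    t_lower = (mean_diff + margin) / se
    # H0b: mean_diff >= +margin, rejected when t_upper is large and negative.
    t_upper = (mean_diff - margin) / se
    # Equivalence is concluded only if both one-sided nulls are rejected.
    equivalent = t_lower > t_crit and t_upper < -t_crit
    return t_lower, t_upper, equivalent


sys_a = [0.50, 0.40, 0.62, 0.31, 0.45]
sys_b = [0.49, 0.42, 0.60, 0.32, 0.44]
print(paired_tost(sys_a, sys_b, margin=0.05, t_crit=2.132))
```

Unlike a standard paired t-test, a non-significant result here does not imply equivalence; TOST requires both one-sided rejections.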
New Contributors
- @Krissy510 made their first contribution in #411
- @JorgeGabin made their first contribution in #440
Full Changelog: 0.10.0...0.10.1
0.10.0
What's Changed
New Features
- Transformer.__call__ now supports both dataframes and iterdicts by @cmacdonald in #381
- Terrier: Custom stopwords by @cmacdonald in #372
- Terrier: Access the stemmer of Terrier from PyTerrier by @cmacdonald in #382
- Terrier: Improved API for loading Terrier indices into memory by @cmacdonald in #386
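A transformer whose __call__ accepts both dataframes and iterdicts can implement its logic once over dicts and convert at the boundary. A minimal sketch of that dispatch follows; DualInputTransformer, transform_iter and LowercaseQuery are hypothetical names for illustration, and pandas is only imported when a DataFrame is actually passed.

```python
class DualInputTransformer:
    """Sketch: subclasses implement transform_iter() over dicts; __call__ handles both input kinds."""

    def transform_iter(self, rows):
        raise NotImplementedError

    def __call__(self, inp):
        if hasattr(inp, "columns") and hasattr(inp, "to_dict"):
            # Looks like a pandas DataFrame: convert to dicts, transform, convert back.
            import pandas as pd  # only needed for DataFrame input

            out_rows = self.transform_iter(inp.to_dict(orient="records"))
            return pd.DataFrame(out_rows)
        # Otherwise treat the input as an iterable of dicts and return a list of dicts.
        return list(self.transform_iter(inp))


class LowercaseQuery(DualInputTransformer):
    """Hypothetical example transformer that lowercases the query field."""

    def transform_iter(self, rows):
        return [{**row, "query": row["query"].lower()} for row in rows]


print(LowercaseQuery()([{"qid": "1", "query": "Hello World"}]))
```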
Improvements
- added tokenizer as arg for pt.text.sliding by @mihirs16 in #387
- addresses #367 - include qid in pt.apply Exception by @cmacdonald in #370
- addresses #377: pt.apply.query() raises exception if the query column does not exist by @cmacdonald in #380
- let pt.tqdm exist without pt.init() by @cmacdonald in #399
- deprecate pt.Utils by @cmacdonald in #384
- removes two warnings by @cmacdonald in #385
- work on test failure by @cmacdonald in #401
- Test pyterrier with newer Python versions by @cmacdonald in #400
- bump supported Anserini version by @cmacdonald in #406, addresses #404
- Terrier: allow a term and its LexiconEntry to be put into a tuple by @cmacdonald in #369
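Sliding-window passaging splits a document into overlapping passages of a fixed number of tokens, advancing by a stride; making the tokenizer an argument lets callers control what counts as a token. A minimal sketch under stated assumptions: sliding_passages is hypothetical (not pt.text.sliding), the default whitespace tokenizer is a choice of this sketch, and the "<docno>%p<i>" passage id scheme is assumed.

```python
from typing import Callable, List


def sliding_passages(docno: str, text: str, length: int, stride: int,
                     tokenizer: Callable[[str], List[str]] = str.split) -> List[dict]:
    """Split a document into passages of `length` tokens, starting every `stride` tokens."""
    if stride <= 0 or length <= 0:
        raise ValueError("length and stride must be positive")
    tokens = tokenizer(text)
    passages = []
    start = 0
    while True:
        window = tokens[start:start + length]
        passages.append({"docno": f"{docno}%p{len(passages)}", "text": " ".join(window)})
        # Stop once this window reaches the end of the document.
        if start + length >= len(tokens):
            break
        start += stride
    return passages


for p in sliding_passages("d1", "a b c d e f g h i j", length=4, stride=2):
    print(p)
```

Passing, for example, `tokenizer=lambda s: s.split(",")` changes token boundaries without touching the windowing logic.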
Bugs
- stringify properties and controls, addresses #357 by @cmacdonald in #358
- fix bug in metadata size warning by @seanmacavaney in #362
Documentation
- Update pipeline_examples.md by @gurcankavakci in #359
- Fixed typo by @hermlon in #364
- Update ltr.rst by @Hermi-Mire in #371
- Update transformer.rst by @albertoueda in #383
- clarify docstring for indexing with regards to metadata by @lukaszett in #394
- Query Rewriting & Expansion by @cakiki in #402, #403
New Contributors
- @gurcankavakci made their first contribution in #359
- @hermlon made their first contribution in #364
- @Hermi-Mire made their first contribution in #371
- @lukaszett made their first contribution in #394
- @cakiki made their first contribution in #402
- @mihirs16 made their first contribution in #387
Full Changelog: 0.9.2...0.10.0
0.9.2
Minor release with minor improvements and bug fixes.
What's Changed
- add sbert example notebook by @cmacdonald in #344
- Update the requirement to scikit-learn, replacing the deprecated sklearn package, which was sometimes causing build errors.
- adding batching operations to apply.generic() and apply.by_query() by @cmacdonald in #351 -- thanks to Xun Zhou, University of Michigan via #350
- improve error messages for invalid indexing configurations by @cmacdonald in #349 -- thanks to @maxhenze in #348
- Various empty dataframe fixes by @cmacdonald in #353 -- thanks to report by Prithvijit Dasgupta, University of Michigan in #352
- improved error message for add_ranks by @cmacdonald in #354
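Batching lets a user function be called on bounded chunks of rows rather than on the whole input at once, which keeps memory use predictable for expensive functions. A minimal dependency-free sketch of the idea; batched_apply is a hypothetical helper operating on lists of dicts, not the signature of pt.apply.generic() or pt.apply.by_query().

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched_apply(rows: Iterable[dict], fn: Callable[[List[dict]], List[dict]],
                  batch_size: int) -> Iterator[dict]:
    """Apply `fn` to successive batches of at most `batch_size` rows, yielding its outputs."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield from fn(batch)


sizes = []


def double_scores(batch):
    # Record the batch size so the batching behaviour is visible.
    sizes.append(len(batch))
    return [{**r, "score": r["score"] * 2} for r in batch]


out = list(batched_apply([{"qid": "q1", "score": i} for i in range(5)], double_scores, 2))
print(sizes, [r["score"] for r in out])
```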
Full Changelog: 0.9.1...0.9.2
0.9.1
Bugfix release addressing a problem with pretokenised indices on Windows.
What's Changed
- Nofifo pretok indexing fixes by @cmacdonald in #343
Full Changelog: 0.9.0...0.9.1
0.9.0
Significant update: refactoring of the public API (e.g. pt.transformer.TransformerBase -> pt.Transformer) and support in the Terrier backend for making indices from pre-tokenised documents. Python 3.10 is now supported.
What's Changed
- fix error in IRDSDataset when a query field is named "query" by @seanmacavaney in #303
- Fix type annotation by @heinrichreimer in #313
- addresses #315 IRDS corpus_iter are not subscriptable by @cmacdonald in #316
- Missing comma in bm25_qe example by @JohnGiorgi in #319
- Argument meta should be supplied as dictionary by @JohnGiorgi in #320
- use Jnius 1.4 by @cmacdonald in #249
- Python 3.10 support by @cmacdonald in #322
- Lz4 support for pt.io.autoopen() by @cmacdonald in #323
- addresses #326 faster version of add_ranks for single queries by @cmacdonald in #327
- addresses #321 pt.apply.doc_score batching by @cmacdonald in #325
- IterDictIndexer can index pre-tokenised documents by @cmacdonald in #328
- Bump logback-core from 1.2.0 to 1.2.9 in /terrier-python-helper by @dependabot in #336
- documenting BM25F controls and tuning by @cmacdonald in #296, addresses #294
- 0.9refactor by @cmacdonald in #314, #339, addresses #271
- pt.Experiment() alters the input measures list to drop "mrt" #301
- Expose Termpipelines in Terrier index backend by @cmacdonald in #338
- pt.rewrite.tokenise() impl by @cmacdonald in #340 addresses #252 #253
- upgraded GitHub actions by @cmacdonald in #341, #342
- fix LTR groupby for xgboost & lightgbm by @cmacdonald in #284
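Adding ranks means assigning each result its position within its own query by descending score. The sketch below does this over a list of dicts and includes a shortcut for the single-query case, in the spirit of the faster add_ranks mentioned above; the function is hypothetical rather than PyTerrier's implementation, and 0-based ranks are a choice of this sketch.

```python
from itertools import groupby


def add_ranks(rows):
    """Return copies of rows with a per-query 'rank' field (0 = highest score)."""
    qids = {r["qid"] for r in rows}
    if len(qids) == 1:
        # Single query: no grouping needed, just sort by descending score.
        ordered = sorted(rows, key=lambda r: -r["score"])
        return [{**row, "rank": i} for i, row in enumerate(ordered)]
    # Multiple queries: sort by qid, then descending score, so each query is contiguous.
    ordered = sorted(rows, key=lambda r: (r["qid"], -r["score"]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r["qid"]):
        for rank, row in enumerate(group):
            out.append({**row, "rank": rank})
    return out


results = [
    {"qid": "q1", "docno": "a", "score": 1.0},
    {"qid": "q1", "docno": "b", "score": 3.0},
    {"qid": "q2", "docno": "c", "score": 2.0},
]
print(add_ranks(results))
```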
New Contributors
- @heinrichreimer made their first contribution in #313
- @JohnGiorgi made their first contribution in #319
Full Changelog: 0.8.1...0.9.0
0.8.1
Minor release with minor improvements and bug fixes.
What's Changed
- fixed bug with is_transformer by @seanmacavaney in #274
- addresses #275 issue k in kmaxavg, improved testing by @cmacdonald in #276
- defer loading ir_datasets by @seanmacavaney in #280
- Set meta and meta_lengths in constructor by @MWschutte in #282
- Anserini fixes by @cmacdonald in #279, reported by @Azouu
- prevent use of nptyping v2 by @cmacdonald in #291, reported by @tabonnet
- SourceTransformer pass through extra columns, addresses #287 by @cmacdonald in #288, reported by @Xiao0728
- more transformers with repr by @cmacdonald in #289
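As the name suggests, kmaxavg aggregates passage scores into a document score by averaging each document's k highest-scoring passages. A minimal sketch under stated assumptions: passage docnos follow a "<docno>%p<i>" scheme, rows are plain dicts, and documents with fewer than k passages are averaged over the passages they have; kmaxavg here is a hypothetical illustration, not PyTerrier's code.

```python
from collections import defaultdict


def kmaxavg(passage_rows, k):
    """Aggregate passage scores into document scores: mean of each document's top-k passages."""
    scores = defaultdict(list)
    for row in passage_rows:
        # Strip the assumed "%p<i>" passage suffix to recover the document id.
        docno = row["docno"].rsplit("%p", 1)[0]
        scores[(row["qid"], docno)].append(row["score"])
    out = []
    for (qid, docno), vals in scores.items():
        top = sorted(vals, reverse=True)[:k]
        # Dividing by len(top) handles documents with fewer than k passages.
        out.append({"qid": qid, "docno": docno, "score": sum(top) / len(top)})
    return out


passages = [
    {"qid": "q1", "docno": "d1%p0", "score": 4.0},
    {"qid": "q1", "docno": "d1%p1", "score": 2.0},
    {"qid": "q1", "docno": "d1%p2", "score": 1.0},
    {"qid": "q1", "docno": "d2%p0", "score": 3.0},
]
print(kmaxavg(passages, k=2))
```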
New Contributors
- @MWschutte made their first contribution in #282
Full Changelog: 0.8.0...0.8.1
0.8.0
PyTerrier 0.8.0 Release Notes
Released on 18/01/2022
What's Changed - Major
- Require Python 3.7 by @cmacdonald in #255
- Deprecate automatic coercion of transformers by @cmacdonald in #258
- introduce pt.Transformer as public API; pt.transformer.TransformerBase will be deprecated in 0.9, by @cmacdonald in #258
- introduce query biased summarisation, addresses #205, by @cmacdonald in #223, suggested by @adambaker
- provide re-ranking runs from datasets by @seanmacavaney in #262
What's Changed - Minor
- faster testing in GitHub Actions: focus on requested jnius, rather than changing jnius version 3 times by @cmacdonald in #256
- Faster tests by @cmacdonald in #257
- Use Flake to identify bugs, reduce imports etc by @cmacdonald in #259
- pyterrier loaded message to stderr by @seanmacavaney in #260
- Fix code block in ltr.rst in section Working with Features by @bart-kosmala in #261
- get_dataset() for non-existent irds dataset by @seanmacavaney in #263
- Filter out non-indexed/metadata fields when indexing by @seanmacavaney in #267, reported by @bjoernengelmann in #266
- mirroring of Vaswani dataset files by @seanmacavaney in #268
- pt.io.read_results() can merge topics by @seanmacavaney in #265
- addresses #264, text.scorer() will default to takes='docs' by @cmacdonald in #269
- change paths and exercise names by @cmacdonald in #270
New Contributors
- @bart-kosmala made their first contribution in #261
Full Changelog: 0.7.2...0.8.0
0.7.2
Minor release addressing some useful bug fixes and small features. This is the last release that will support Python 3.6.
What's Changed
- using chunked instead of ichunked - this fixes indexing speed/memory-consumption/crashes with indexing pipelines, by @seanmacavaney in #238
- remove deprecated code by @cmacdonald in #239
- combsum dropping documents not appearing on both sides of + by @cmacdonald in #240
- addresses #203, verbose in pt.Experiment by @cmacdonald in #245
- Py37 minimum warning, addresses #241 by @cmacdonald in #246
- use caching in GitHub Actions by @cmacdonald in #248
- save run files automatically in pt.Experiment #163 by @cmacdonald in #247
- Set dtype for qrels columns at read time in io method by @jjdelvalle in #254
- support meta config in IterDictIndexer constructor, addresses #250, by @cmacdonald in #251
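The combsum fix above concerns documents that appear in only one of the two fused runs. A minimal sketch of CombSUM with outer-join semantics follows, where a document missing from one run contributes 0 from that run instead of being dropped; treating a missing score as 0 is a common convention and an assumption here, and combsum is a hypothetical helper rather than PyTerrier's operator.

```python
def combsum(run_a, run_b):
    """CombSUM fusion of two runs, each a dict mapping (qid, docno) -> score.

    Documents retrieved by only one run keep that run's score (a missing score
    counts as 0), instead of being dropped as an inner join would do.
    """
    return {
        key: run_a.get(key, 0.0) + run_b.get(key, 0.0)
        for key in run_a.keys() | run_b.keys()
    }


run_a = {("q1", "d1"): 1.0, ("q1", "d2"): 0.5}
run_b = {("q1", "d2"): 2.0, ("q1", "d3"): 1.0}
print(sorted(combsum(run_a, run_b).items()))
```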
New Contributors
- @jjdelvalle made their first contribution in #254
Full Changelog: 0.7.1...0.7.2
0.7.1
PyTerrier 0.7.1
Minor update to support activities for the CIKM 2021 tutorial. In particular:
- pt.debug.print_num_rows() added
- Terrier Data Repository support for TREC Covid test collection.