Lemur 3.1.2 release notes
(March 9, 2005)
- We have tested using gcc 3.2.2, 3.2.3, 3.4 and VC++ .NET.
- This version includes minor enhancements and bug fixes to the major release, version 3.1.
Please see the release notes for 3.1 for an explanation of new features and deprecations since version 2.2.x.
- Enhancements:
- New support for indexing arbitrary fields in IndriIndex
- Changes to support gcc 3.4
- Windows Installer
- #band operator added to indri query language. This is a proximity operator suitable for use as the filter node for #filrej and #filreq.
- Also in the indri query language, '{' term1 .. termN '}' added as alternative syntax for the synonym operator. The use of '<' term1 .. termN '>' has been deprecated because queries with that operator can cause a parsing failure when used in a parameter file.
- Modify the convenience static method RetMethodManager::runQuery so that it tries to get stemming and stopping information from Inv(FP)Index and KeyfileIncIndex to process queries if no options are passed in.
- Added Index::collectionProps method to API for getting back values that were passed in during indexing time. BasicCollectionProps class created to support this activity. All current indexing TextHandlers and Indexes have been updated to enable its use.
- Lemur CGI updated to make use of Index's CollectionProps to automatically stop or stem queries to match Index's term dictionary.
- Addition of the TopdocsIndex for query optimization in Indri. Provides score-safe pruning of the inverted lists.
- Add the PonteExpander class as an example of an alternative query expander (to be integrated into IndriRunQuery.)
- Have NetworkListener throw an exception if the port is in use, enabling IndriDaemon to exit cleanly.
- Update numeric fields in an Indri repository so that they may now contain either positive or negative values, instead of positive only.
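As an illustration of the synonym syntax change noted above, the following is a minimal sketch of converting a query from the deprecated '<' term1 .. termN '>' form to the '{' term1 .. termN '}' form. The function name rewriteSynonymSyntax is hypothetical and is not part of the Lemur or Indri API. The sketch assumes that angle brackets occur in the query only as synonym delimiters; a query that uses them in any other way would need a real parser.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Lemur/Indri: replace the deprecated
// '<' ... '>' synonym delimiters with the '{' ... '}' alternative.
// Assumption: angle brackets appear in the query only as synonym delimiters.
std::string rewriteSynonymSyntax(const std::string& query) {
    std::string out = query;
    for (std::string::size_type i = 0; i < out.size(); ++i) {
        if (out[i] == '<') {
            out[i] = '{';
        } else if (out[i] == '>') {
            out[i] = '}';
        }
    }
    return out;
}
```

For example, under that assumption a query written as "#combine( <car automobile> repair )" would be rewritten as "#combine( {car automobile} repair )", which avoids angle brackets when the query is placed in a parameter file.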
- Bugs Fixed:
- Problem: KeyfileIncIndex does not correctly store GB encoded terms.
Solution: Changed compress_int and int_lc_if_compressed to accept unsigned long instead of int value
- Problem: DirectoryIterator can seg fault on ++
Solution: Fix FileTreeIterator to explicitly test for directory instead of adding all that isn't a File
- Problem: QueryEnvironment::stemFieldCount and QueryEnvironment::termFieldCount are returning 0, even when they should return something larger than 0
Solution: Change LocalQueryServer::termFieldCount to represent the term and fieldid parameters in the correct order
- Problem: KrovetzStemmerTransformation loads kstem tables more than once, which produces numerous duplicate entry warnings
Solution: Add a static member to prevent multiple loads of the table
- Problem: Using RetEval, TF.IDF retrieval using log of the weights gives incorrect scores for versions after 2.0.2
Solution: Fixed RetParamManager to maintain compatibility when parameter specification was updated to use strings (i.e., logf) instead of numerals (1)
- Problem: keyfilecode::init_file_name produces a spurious bad_name_err when there is a "." in path
Solution: scan for last "."
- Problem: KeyfileDocMgr and KeyfileIncIndex corrupt data files when accessed by multiple cgi processes
Solution: Add (default) read-only flags
- Problem: BuildDocMgr does not work on Windows with given project files.
Solution: Fix project settings to enable run-time type checking
- Problem: Incremental build of Indri Repository produces corrupted CompressedCollection
Solution: Add an exclusive access flag to WriteBuffer to force it to delegate tellp to its File object. This makes the file offsets correct when incrementally building a repository.
- Problem: Indri's KrovetzStemmerTransformation does not initialize the kstem data tables leading to poor stemming performance
Solution: Add appropriate call to indri_kstem_load_table
- Problem: Inv(FP)Index fails to load indexes with names longer than 128 chars.
Solution: Change load to use variable length string object instead of char buffer
- Problem: Indri's NetworkServerProxy::documentCount(term) ignores its argument.
Solution: Pass the parameter into the XMLNode, enabling correct behavior.
- Problem: QueryEnvironment::termCount(stopword) and QueryEnvironment::documentCount(stopword) pass an empty string to Keyfile, causing an exception
Solution: Place guards against the empty string in all uses of processTerm by LocalQueryServer.
- Problem: Attempting to index a non-pdf file as pdf causes a segfault, crashing IndriBuildIndex and leaving a corrupt repository behind.
Solution: Check for doc->isOK in PDFDocumentExtractor
- Problem: IndriIndex reports incorrect term counts for fields.
Solution: Fix off by one error in _removeclosedTags
- Problem: Adding a 0 length document can corrupt a CompressedCollection.
Solution: Change zlib_deflate to accept a 0 length document and return without throwing an exception.
- Problem: IndriTextHandler drops '\0' termination of docnos (document ids).
Solution: Add 1 to the docid.valueLength to retain the trailing '\0'.
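To illustrate the keyfilecode::init_file_name fix listed above (scan for last "."), here is a minimal sketch of how scanning for the first "." differs from scanning for the last "." when a directory name contains a dot. The functions stemAtFirstDot and stemAtLastDot are hypothetical names used only for this illustration; they are not the keyfile code, and the example path is invented.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration, not the keyfile implementation.
// Cut the path at the FIRST '.', which misfires when a directory
// component such as "v3.1" contains a dot.
std::string stemAtFirstDot(const std::string& path) {
    std::string::size_type pos = path.find('.');
    return pos == std::string::npos ? path : path.substr(0, pos);
}

// Cut the path at the LAST '.', so dots earlier in the path are kept
// and only the trailing extension is removed.
std::string stemAtLastDot(const std::string& path) {
    std::string::size_type pos = path.rfind('.');
    return pos == std::string::npos ? path : path.substr(0, pos);
}
```

With the invented path "/data/v3.1/index.key", the first-dot scan cuts the name at "/data/v3", while the last-dot scan yields "/data/v3.1/index". A fuller version would likely also confirm that the last "." comes after the final path separator; this sketch leaves that check out.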