
Recoll
Recoll is based on the powerful Xapian backend.
It provides an easy-to-use, feature-rich interface with a Qt GUI.
Most common document types are supported, along with their compressed versions (text, HTML, PDF, DVI, PostScript, OpenOffice, LyX, Scribus, Word/Excel/PPT, AbiWord, KWord, WordPerfect, RTF, DjVu, Gaim logs, maildir and mailbox mail folders including attachments, miscellaneous media files).
Powerful query facilities are provided, from simple keyword entry to assisted boolean query building with proximity clauses and filtering on file type or location. A Xesam-compatible query language also supports field searches and date filtering.
Multiple character sets are supported. Internal processing and storage use Unicode UTF-8.
Recoll has few dependencies. No database daemon, Web server, or exotic language/framework is necessary. In the default setup, it only runs on your system when you need it. Indexing can be performed in batch mode or in real time.
Thanks to Xapian, indexing does not tax system resources excessively and searching is very fast.
The latest 1.19 release is 1.19.13: this hopefully fixes the last remaining bug in the multithreading code, which was causing rare but annoying crashes. You definitely want to upgrade to this version if you are running Recoll 1.19.
Release 1.19 brings faster indexing on multiprocessors, new results management features (multiple attachment saves, duplicates listing), advanced search history storage, and other performance and usability enhancements. Also: a nice new PPT filter, Python 3 compatibility and, for Ubuntu users, a Scope for the Dash on Saucy and Trusty.
Release 1.18.1 brings optional case- and diacritics-sensitive searches, complex search history, and direct access to hit pages for PDF documents.
Release 1.17.3 brings a number of usability improvements: management of indexing operations from the GUI, filtering on file size, extended directory filtering, an Ubuntu Unity Lens, thumbnails in result lists, Okular notes and Gnumeric filters, etc.
Release 1.16.2 brings a long list of small improvements and bug fixes. Image previews, negative directory filtering, anchored searches, more popup menu entries, etc. Please check the release notes for details (http://www.recoll.org/release-1.16.html).
Release 1.15 (.9): Enhanced native Qt 4 user interface (no more Qt 3 compatibility). Switchable table-like display for the results. Direct access to sort functions. Negative directory filtering. Web archive formats.
Release 1.14 (.3): Modification date searches and filtering, arbitrary email header indexing, a new GNU info filter, a new audio tag extractor based on the Mutagen Python library, improved Thunderbird mail indexing, and miscellaneous other improvements and small bug fixes.
Release 1.13 (.04): New class of persistent filters and indexed file types: zip, chm, ics. Improved handling of big text files, Firefox visited pages indexing. Quite a few other performance and usability improvements.
Release 1.12: new KDE KIO slave module, collapsing of identical results, context-sensitive F1 help, saving email attachments and other embedded documents to files, and other small improvements and bug fixes.
Release 1.11: easy filtering of results by document type, nicer previews which use HTML when possible, a Python programming interface for indexing and searching, better support for the Xesam user query language, a new filter framework, and better support for arbitrary field indexing and searching.
Release 1.10:
- Created a mailing list to improve support. Check the home page.
- Fixed openSUSE 11 compile issues.
- Fixed a bug in interpreting the email MIME structure, which resulted in base-64 decoding errors.
- Fixed the "Prev" button in the preview window, which would actually go forward when walking the search terms.
- Allow setting the highlight color for search terms in the result list and preview.
- Added an SVG filter.
- Ensure that if the data of a file can't be indexed because of some error, at least the file name is indexed.
- Improve the query language to support OR queries of terms with field specifications (e.g. title:someterm OR author:someauthor).
- Fix file name search to split patterns on white space, so that a "*.jpg *.jpeg" search does what's expected. This means you now need to use double quotes if there is actual embedded white space.
- Jump directly to the external editor choice dialog instead of opening the preferences when an external viewer is not found.
- Allow stopping indexing through a menu action (only works with Qt 4 for now).
- Create an "indexedmimetypes" configuration variable to allow explicitly restricting the file types which get indexed.
- Added support for CJK text, and a GUI configuration tool for the main configuration file.
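The two query-related changes above can be illustrated with a short Python sketch. `field_or_query` builds the kind of field-qualified OR query the language now accepts, and `matches` mimics the new file name search behavior (split on white space, double quotes keep embedded spaces), using the standard `shlex` and `fnmatch` modules as stand-ins. This is not Recoll's own parser, and the helper names are hypothetical.

```python
import fnmatch
import shlex
from typing import List, Mapping


def field_or_query(terms: Mapping[str, str]) -> str:
    """Join field:term clauses with OR (assumes single-word terms)."""
    return " OR ".join(f"{field}:{term}" for field, term in terms.items())


def split_patterns(spec: str) -> List[str]:
    """Split a file name search on white space; double quotes keep spaces."""
    return shlex.split(spec)


def matches(filename: str, spec: str) -> bool:
    """True if filename matches any of the wildcard patterns in spec."""
    return any(fnmatch.fnmatch(filename, p) for p in split_patterns(spec))
```

With this behavior, `"*.jpg *.jpeg"` is two patterns, while a name containing a space must be quoted, as in `'"my file*"'`; unquoted, `my file*` becomes the two patterns `my` and `file*`.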
Release 1.9: This release brings a number of small practical improvements: new filters (WordPerfect, AbiWord, KWord, JPEG, FLAC, Ogg); better control of disk and memory usage during indexing; improved abstract generation; arbitrary field support; improved Qt 4 support; and miscellaneous user interface improvements and bug fixes, described in more detail in the Changes file.
Ratings & Comments
There are clear UI improvements that can be made here. The application doesn't work well with a dark theme.
Also have a look at the improved Icon Theme and Layout for Recoll: https://www.linux-apps.com/p/1162008/
I prefer Recoll over using KDE's integrated search.
Here are RPMs for Mageia 1 64-bit: http://mageia-gr.org/rpm/1/x86_64/recoll-1.17.1-1mgr1.x86_64.rpm http://mageia-gr.org/rpm/1/x86_64/kio-recoll-1.17.1-1mgr1.x86_64.rpm
If the icons are too big or you don't like them, here is an alternative, more beautiful icon theme: http://kde-look.org/content/show.php/Alternative+Icon+Theme+Recoll?content=145669
Running rpmbuild --rebuild on recoll-1.15.2-0.src.rpm results in an error: "/usr/lib64/gcc/x86_64-suse-linux/4.5/../../../../x86_64-suse-linux/bin/ld: cannot find -luuid collect2: ld returned 1 exit status", but I do have the libuuid-devel, libuuid1 and uuid RPMs installed. P.S. The link on your download page for the 1.15.2 src.rpm actually points to 1.15.0 for openSUSE; no biggie, since it will soon be in the repo. Thanks,
Of course, it compiled fine from source. Thanks,
Thanks, I'll take a closer look at this when I'm back in 10 days. jf
Just compiled, and: 1) the file name column in the table view is empty; 2) adding a size column might be useful; 3) in the regular view, the reason for the indentation isn't obvious (to me): it seems based on year quarters, but indenting each document in the group one level more than the previous one is excessive. Thanks,
Hi, about the file name column: if a previous version of Recoll was installed, you need a full reindex (recollindex -z). Otherwise, this is a bug; please get in touch with me (jfd@recoll.org). Size column: right-click on the table header, and you should be able to customize the columns to your content (otherwise, see the email address above...). Indentation: this is not intentional. I've seen it happen, and it seems to depend on the Qt version. Try resetting the result list paragraph format (in the query preferences, just set it to empty to restore the default). Maybe you can try to update Qt too. If nothing works, please get in touch. jf
Thanks. 1) The file name is still not showing in the table view after running 'recoll -z'; I sent an email. 2) I did not notice columns could be added. 3) Clearing 'paragraph format' resolved the indenting. Thanks,
Sorry, but for clarification: title = filename, so the filename column is not necessary, correct? Or is filename supposed to equal URL? Either way, I'm not sure what the purpose of the filename column is now. Thanks,
filename is the short name for the file (without the path). For people who give meaningful names to files it's sometimes actually more interesting than the document title. This depends on local taste and type of document, it was added as a separate field following popular request :) By the way the command to reset the index would be recollindex -z, not recoll -z, but maybe that's what you did.
Oh yes, and when a document does not have an internal title (e.g. text/plain), Recoll uses the file name as a stand-in, so that in this case filename == title.
So then you are defaulting the title to the filename, and not displaying the filename in cases where the document does not have an internal title? And since I don't add titles to docs I create, I should just delete the filename column. P.S. I did run recollindex -z, as suggested (I just posted recoll -z). P.P.S. Would you format the size column (add commas)? Thanks,
(It seems I can't reply to the last comment, so I'm replying here.) Normally ALL documents have file names stored as a field in the index. And they also all have titles, because if no internal title (e.g. HTML <title> or email Subject:) is found, the file name is copied in there. So I don't understand why you don't see the file names. Maybe try to empty ~/.recoll/fields in case there is something weird in there, then retry recollindex -z (sorry about the repetition).
I have no ~/.recoll/fields. I deleted everything in that folder except recoll.conf and reindexed (recollindex -z), and I get the same result: nothing in the filename column. I am running 1.5.1, see http://simplest-image-hosting.net/jpeg-0-recoll0. As always, thanks
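The field rule described in this reply can be restated as a tiny Python sketch. `stored_fields` is a hypothetical helper, not Recoll's indexer code; it only shows why filename == title for a document without an internal title, such as a plain text file.

```python
import posixpath
from typing import Dict, Optional


def stored_fields(path: str, internal_title: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: derive the filename and title fields.

    Every document gets a filename (the short name, without the directory);
    the title falls back to that filename when no internal title was found.
    """
    filename = posixpath.basename(path)
    title = (internal_title or "").strip() or filename
    return {"filename": filename, "title": title}
```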
This is becoming seriously mysterious! We need to check what happens during indexing.
- Set loglevel to 6 in the config (either from the indexing preferences or by editing recoll.conf).
- Create a small text file inside the indexed area, e.g.:
  cd
  echo atextfile > bogus.txt
- Try to index it: recollindex -i bogus.txt
- You should see in the log the data record created for the file:
  :5:../rcldb/rcldb.cpp:1128:Rcl::Db::add: new doc record: url=file:///home/.../2010telephs.txt ... filename=2010telephs.txt
If the filename field is not there, this is an indexing issue; otherwise it's a query issue, and we'll concentrate on the appropriate area at the next step. If you need to repeat the test, first run "recollindex -e bogus.txt" to erase the index data for the file (otherwise no reindexing will be performed). I'm leaving for ten days this afternoon; I'll get back to this then, if you're patient enough to still be around :)
Just so this thread doesn't stay open: the filename field finally started working, for not entirely clear reasons, but anyway, all's well that ends well :)
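The check in the last step (is `filename=` in the stored record or not?) can be automated with a small Python sketch. It assumes the record line contains the `new doc record:` marker followed by space-separated `key=value` pairs, as in the excerpt quoted in the comment; the log format beyond that excerpt, and the helper names, are assumptions.

```python
import re
from typing import Iterable, Optional

# Assumption: the record line contains this marker followed by
# space-separated key=value pairs, as in the log excerpt quoted above.
RECORD_MARKER = "new doc record:"


def find_doc_record(log_lines: Iterable[str], name: str) -> Optional[str]:
    """Return the first 'new doc record' line whose url ends with name."""
    url_re = re.compile(r"url=\S*" + re.escape(name) + r"(\s|$)")
    for line in log_lines:
        if RECORD_MARKER in line and url_re.search(line):
            return line
    return None


def diagnose(log_lines: Iterable[str], name: str = "bogus.txt") -> str:
    """Classify the problem following the reasoning in the comment."""
    record = find_doc_record(log_lines, name)
    if record is None:
        return "no record: file was not indexed"
    if re.search(r"(^|\s)filename=\S+", record):
        return "query issue"     # the field was stored: look at the query side
    return "indexing issue"      # the field is missing from the stored record
```

Since `diagnose` accepts any iterable of lines, an open log file object can be passed to it directly.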
I like the customization options. I added an extra <br> to space out the query results. cool!
...it would be nice if the result rows could be rendered with alternating background colors.
In the search results, it's generally best to use the pdf filename rather than the embedded title info which is rarely accurate (and things will likely never change in this respect). Yes, the filename is listed underneath, but it's not easy to discern.
Hi, there is now (1.14) a "filename" field which you can use in a custom result paragraph format, e.g. <b>%(filename)</b>, to display the file name prominently.
Recoll is the only one which does not slow down my old PIII clunker. I do not dare try to run Nepomuk and Strigi, the KDE4 resource hogs. Those KDE4 daemons are crippling. So... I am sure I am not the only one who would like to use Recoll INSTEAD of the above. I will likely roll my own runner using the CLI and parsing its output when I have time to devote to it. Some .so-based backends would be nice, or even a library to directly access Recoll's/Xapian's data.
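To show how a format like `<b>%(filename)</b>` turns into a result entry, here is a toy Python imitation of the `%(field)` substitution. It is not Recoll's code: the escaping of values and the empty string for missing fields are assumptions of this sketch, and Recoll's real format supports other escapes not covered here.

```python
import html
import re
from typing import Mapping

# Matches %(name) placeholders such as %(filename) or %(title).
FIELD_RE = re.compile(r"%\((\w+)\)")


def render_paragraph(fmt: str, doc: Mapping[str, str]) -> str:
    """Substitute %(field) placeholders with HTML-escaped document values.

    Assumption: fields missing from the document render as an empty string.
    """
    return FIELD_RE.sub(lambda m: html.escape(doc.get(m.group(1), "")), fmt)
```

For example, `render_paragraph("<b>%(filename)</b><br>%(title)<br>", {"filename": "report.pdf", "title": "Q3 <draft>"})` gives the file name in bold, then the escaped title; an extra `<br>` in the format simply adds more spacing between results.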
Probably the best approach for custom search interfaces would be to use the Python API, which can access most, if not all, Recoll functionality (and I'm willing to extend it). Of course there is a .so with a C++ API behind this, but I think the C++ API is quite unwieldy and it would be better to use the Python one. (And sorry I did not answer earlier: I rarely look at this page, and don't get email when comments are added.) jf
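A minimal sketch of a custom search front end along those lines. It assumes Recoll's Python bindings are installed (imported as `from recoll import recoll`) and that they provide `connect()`, `query()`, `execute()` and `fetchmany()` as in the Python examples of the Recoll manual; check the manual for your version, since the API has evolved. `search` and `hits_to_rows` are hypothetical names, and `connect` can be replaced by a stand-in so the helpers can be exercised without an index.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Optional[str]]
DEFAULT_FIELDS = ("url", "filename", "title")


def hits_to_rows(docs: Iterable[Any], fields: Sequence[str] = DEFAULT_FIELDS) -> List[Row]:
    """Copy selected fields of each result document into a plain dict.

    getattr with a default is used, so a field the document object does not
    expose comes back as None in this sketch.
    """
    return [{name: getattr(doc, name, None) for name in fields} for doc in docs]


def search(query_string: str, max_results: int = 20,
           connect: Optional[Callable[[], Any]] = None) -> List[Row]:
    """Run a query and return up to max_results rows (hypothetical helper)."""
    if connect is None:
        # Imported lazily so this module loads even where Recoll is not installed.
        from recoll import recoll
        connect = recoll.connect
    db = connect()                    # index of the default configuration
    query = db.query()
    query.execute(query_string)       # a Recoll query language string
    return hits_to_rows(query.fetchmany(max_results))
```

A runner could then loop over `search("xapian")` and print each row's `filename` and `url`, which avoids parsing command-line output.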