
Solr integration for external indexing and searching.


Introduction

collective.solr is an approach to integrate the Solr search engine with Plone. It provides an indexing processor for use with collective.indexing as well as a search API similar to the standard portal catalog. GenericSetup profiles can be applied to set up content indexing in Solr and use it as a backend for Plone’s site and live search facilities.
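Under the hood, a search routed through Solr ends up as an HTTP request against Solr's select handler. The following minimal sketch shows how a catalog-style query could be turned into such a request URL. The helper name `build_select_url`, the base URL `http://localhost:8983/solr` and the field names are assumptions for illustration. This is not collective.solr's actual search API.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, fields=("UID", "Title")):
    """Turn a catalog-style query dict into a Solr /select URL (hypothetical helper).

    Each key/value pair becomes a ``field:value`` clause; the clauses are joined
    with an explicit AND so the result does not depend on Solr's default operator.
    """
    clauses = [f"{field}:{value}" for field, value in sorted(query.items())]
    params = {
        "q": " AND ".join(clauses) if clauses else "*:*",  # match all if empty
        "rows": rows,
        "fl": ",".join(fields),  # only return the fields we need
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",
    {"portal_type": "Document", "review_state": "published"},
)
print(url)
```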

Current Status

The implementation is considered nearly finished. The package can be installed in a Plone 3.x site to enable indexing operations as well as searching (site and live search) using Solr. Doing so not only significantly improves search performance, especially for a large number of indexed objects, but also reduces the memory footprint of your Plone instance, since for most sites the SearchableText index can then be removed from the portal catalog. A sample buildout is provided for your convenience.

The code was written with an emphasis on minimalism, clarity and maintainability. It comes with extensive tests covering the code base. The package is currently used in production and is considered stable.

For outstanding issues and features still to be implemented, please see the to-do list included in the package as well as its issue tracker.

Installation

The following buildout configuration may be used to get started quickly:

[buildout]
extends =
  buildout.cfg
  http://svn.plone.org/svn/collective/collective.solr/trunk/buildout/solr-1.3.cfg

[instance]
eggs += collective.solr
zcml += collective.solr

After saving this as, say, solr.cfg, run buildout and start the Solr server and the Plone instance:

$ python bootstrap.py
$ bin/buildout -c solr.cfg
...
$ bin/solr-instance start
$ bin/instance start

Next, apply the “collective.solr (site search)” profile via the portal_setup tool or when creating a fresh Plone site. Then activate and configure the integration in the Plone control panel, and index any existing content using the provided maintenance view:

http://localhost:8080/plone/@@solr-maintenance/reindex

Facet information should then appear on Plone’s search results page.
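If you prefer to trigger the reindex from a script rather than a browser, a request along these lines should work. This is a hedged sketch: it assumes the maintenance view accepts HTTP basic authentication for a Manager account, and the helper `maintenance_request` as well as the credentials shown are placeholders.

```python
import base64
from urllib.request import Request


def maintenance_request(site_url, action, username, password):
    """Build an authenticated request for a ``@@solr-maintenance`` view.

    Assumes HTTP basic authentication with a Manager account; ``action`` is
    the view's sub-path, e.g. ``reindex``.
    """
    url = f"{site_url.rstrip('/')}/@@solr-maintenance/{action}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = maintenance_request("http://localhost:8080/plone", "reindex", "admin", "secret")

# To actually trigger the reindex (this can take a long time on large sites):
# from urllib.request import urlopen
# with urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```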

FAQs / Troubleshooting

“AssertionError: cannot use multiple direct indexers; please enable queueing”

Symptom

When installing additional packages or applying a GenericSetup profile, you get the following error:

AssertionError: cannot use multiple direct indexers; please enable queueing

Problem

Early versions of the package used a persistent local utility, which is still present in your ZODB. Since this utility has been replaced in later versions, two indexer instances are now registered. However, without queued indexing enabled, only one direct indexer is allowed at a time.

Solution

Simply reinstall the package via Plone’s control panel or the quick-installer. Alternatively, you can use the ZMI “Components” tab on your site root object, typically located at http://localhost:8080/plone/manage_components, to remove the broken utilities from the XML; search for “broken”.

Credits

This code was inspired by enfold.solr by Enfold Systems as well as work done at the snowsprint’08. The solr.py module is based on the original Python integration package from Solr itself.

Development was kindly sponsored by Elkjop.

Changelog

1.0b20 - Released January 26, 2010

  • Fix reindexing to always provide data for all fields defined in the schema as support for “updateable/modifiable documents” is only planned for Solr 1.5. See https://issues.apache.org/jira/browse/SOLR-139 for more info. [witsch]

  • Fix CSS issues regarding facet display on IE6. [witsch]
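The 1.0b20 fix follows from the fact that Solr (before the planned support for updatable documents) replaces a document as a whole, so a partial reindex has to resend every schema field. A minimal sketch of assembling such a complete document, assuming three hypothetical value sources: the attributes being reindexed, the stored fields returned by Solr, and values computed from the object for fields that are not stored.

```python
def build_full_document(schema_fields, updates, stored_doc, obj_values):
    """Assemble a complete Solr document for a partial reindex (hypothetical sketch).

    Each schema field is taken from the first source that has it:
      1. ``updates``    - the attributes explicitly being reindexed,
      2. ``stored_doc`` - values Solr returned for its stored fields,
      3. ``obj_values`` - values computed from the content object, needed for
                          fields that are indexed but not stored in Solr.
    Fields missing from all three sources are left out of the document.
    """
    doc = {}
    for field in schema_fields:
        for source in (updates, stored_doc, obj_values):
            if field in source:
                doc[field] = source[field]
                break
    return doc


doc = build_full_document(
    schema_fields=["UID", "Title", "SearchableText", "review_state"],
    updates={"Title": "New title"},
    stored_doc={"UID": "abc123", "Title": "Old title", "review_state": "private"},
    obj_values={"SearchableText": "full body text", "Title": "ignored"},
)
```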

1.0b19 - Released January 24, 2010

  • Fix partial reindexing to preserve data for indices that are not stored. [witsch]

  • Improve logging of auto-flushes for easier performance tuning. [witsch]

1.0b18 - Released January 23, 2010

  • Work around layout issue regarding facet counts on IE6. [witsch]

1.0b17 - Released January 21, 2010

  • Don’t confuse pre-configured filter queries with facet selections. [witsch]

  • Always display selected facets, even, or especially, without search results. [witsch]

1.0b16 - Released January 11, 2010

  • Remove catalogSync maintenance view since it would need to fetch additional data (for non-stored indices) from the objects themselves in order to work correctly. [witsch]

  • Fix reindex maintenance view to preserve data that cannot be fetched from Solr during partial indexing, i.e. indices that are not stored. [witsch]

  • Use wildcard searches for simple search terms to reflect Plone’s default behaviour. [witsch]

  • Fix drill-down for facet values containing white space. [witsch]

  • Add support for partial syncing of catalog and Solr indexes. [witsch]
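The wildcard behaviour from the 1.0b16 entry can be pictured as follows. This is a sketch only: the helper `mangle_term` and the rule for what counts as a “simple” term (a single word without any query syntax) are assumptions, not the package’s actual logic.

```python
import re

# Assumption: a "simple" term is a single word with no Solr query syntax.
SIMPLE_TERM = re.compile(r"^\w+$")


def mangle_term(term):
    """Append a trailing wildcard to simple terms, so "plone" also finds "plonegov"."""
    term = term.strip()
    if SIMPLE_TERM.match(term):
        return term + "*"
    return term  # phrases, existing wildcards etc. are left alone


for example in ["plone", "plone*", '"exact phrase"', "solr AND plone"]:
    print(example, "->", mangle_term(example))
```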

1.0b15 - Released October 12, 2009

1.0b14 - Released September 17, 2009

  • Fix query builder to use explicit ORs so that it becomes possible to change Solr’s default operator to AND. [witsch]

  • Remove relevance information from search results, as it doesn’t make sense to the user. [witsch]
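Why explicit ORs matter: if Solr’s default operator is changed to AND (the `defaultOperator` setting in schema.xml), a clause like `portal_type:(Document Event)` would require both values. A hypothetical helper, `or_clause`, that always spells out OR might look like this:

```python
def or_clause(field, values):
    """Build a clause matching any of ``values`` with explicit ORs (hypothetical).

    Values are quoted (with backslashes and double quotes escaped) so that
    multi-word values like "News Item" are treated as phrases.
    """
    if not values:
        raise ValueError("at least one value is required")
    quoted = ['"%s"' % v.replace("\\", "\\\\").replace('"', '\\"') for v in values]
    if len(quoted) == 1:
        return f"{field}:{quoted[0]}"
    return f"{field}:({' OR '.join(quoted)})"


print(or_clause("portal_type", ["Document", "News Item"]))
```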

1.0b13 - Released August 20, 2009

  • Fix reindex and catalogSync maintenance views to not pass invalid data back to Solr when indexing an explicit list of attributes. [witsch]

1.0b12 - Released August 15, 2009

  • Fix reindex maintenance view to keep any existing data when indexing a given list of attributes. [witsch]

  • Add support for facet dependencies: Specifying a facet “foo” like “foo:bar” only makes it show up when a value for “bar” has been previously selected. [witsch]

  • Allow indexer methods to raise AttributeError to prevent an attribute from being indexed. [witsch]
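The facet dependency syntax from the 1.0b12 entry can be sketched as a small filter over the configured facet names. The function `visible_facets` and the shape of its `selected` argument are assumptions, and only a single dependency per facet is handled:

```python
def visible_facets(facet_specs, selected):
    """Return the facet names to display, honouring "foo:bar" dependencies.

    A spec "foo:bar" means facet "foo" is only shown once a value for facet
    "bar" has been selected; a plain "foo" is always shown. ``selected`` maps
    facet names to lists of currently selected values (an assumed shape).
    """
    visible = []
    for spec in facet_specs:
        name, _, depends_on = spec.partition(":")
        if not depends_on or selected.get(depends_on):
            visible.append(name)
    return visible


facets = ["portal_type", "review_state", "Subject:portal_type"]
print(visible_facets(facets, {}))
print(visible_facets(facets, {"portal_type": ["Document"]}))
```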

1.0b11 - Released July 2, 2009

  • Fix maintenance view for adding/syncing single indexes using catalog data. [witsch]

  • Allow configuring the query parameters for which filter queries should be used (see http://wiki.apache.org/solr/FilterQueryGuidance for more info). [fschulze, witsch]

  • Encode unicode strings when building facet links. [fschulze, witsch]

  • Fix facet display to try to keep the given order of facets. [witsch]

  • Allow facet values to be translated. [witsch]
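The idea behind configurable filter queries (1.0b11) is that some parameters, typically restrictions such as content type or workflow state, are better sent as Solr `fq` parameters, which Solr caches separately and which don’t influence relevance scoring. A hypothetical sketch of that split; the function `split_query` and its arguments are illustrative assumptions:

```python
def split_query(params, filter_keys):
    """Split catalog-style parameters into a Solr ``q`` and a list of ``fq`` clauses.

    Hypothetical sketch: parameters named in ``filter_keys`` become separate
    filter queries; everything else is combined into the main query.
    """
    q_parts, fq_parts = [], []
    for key, value in params.items():
        clause = f'{key}:"{value}"'  # quoting keeps values with white space intact
        (fq_parts if key in filter_keys else q_parts).append(clause)
    return {"q": " AND ".join(q_parts) or "*:*", "fq": fq_parts}


print(split_query(
    {"SearchableText": "solr", "portal_type": "Document", "review_state": "published"},
    {"portal_type", "review_state"},
))
```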

1.0b10 - Released June 11, 2009

  • Range queries must not be quoted with the new query parser. [witsch]

  • Disable socket timeouts during maintenance tasks. [witsch]

  • Close the response object after searching in order to avoid ResponseNotReady errors triggering duplicate queries. [witsch]

  • Use proper way of accessing jQuery & fix IE6 syntax error. [fschulze]

  • Format relevance value for search results. [witsch]
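The 1.0b10 rule that range queries must not be quoted can be expressed as a small quoting helper. This is an assumed illustration (the helper `quote_value` and its range-detection regex are hypothetical), not the package’s query builder:

```python
import re

# Solr range syntax: [a TO b] (inclusive) or {a TO b} (exclusive).
RANGE = re.compile(r"^[\[{].+ TO .+[\]}]$")


def quote_value(value):
    """Quote a field value for the query parser, leaving range queries unquoted."""
    if RANGE.match(value):
        return value  # quoting would turn the range into a literal phrase
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_value("[2009-01-01T00:00:00Z TO NOW]"))
print(quote_value("News Item"))
```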

1.0b9 - Released May 12, 2009

1.0b8 - Released May 4, 2009

1.0b7 - Released April 28, 2009

  • Fix unintended (de)activation of the Solr integration during profile (re)application. [witsch]

  • Fix display of facet information with no active facets. [witsch]

  • Register import and export steps using ZCML. [witsch]

1.0b6 - Released April 20, 2009

  • Add support for faceted searches. [witsch]

  • Update code to comply with the PEP 8 style guidelines. [witsch]

  • Expose additional information provided by Solr, for example about headers and search facets. [witsch]

  • Handle edge cases like invalid range queries by quoting. [tesdal]

  • Parse and quote the query to filter invalid query syntax. [tesdal]

  • In solrSearchResults, if the passed-in request is a dict, look up the request to enable adaptation into PloneFlare. [tesdal]

  • Add support for objects with a ‘query’ attribute as search values. [tmog]

1.0b5 - Released December 16, 2008

  • Fix and extend logging in “sync” maintenance view. [witsch]

1.0b4 - Released November 23, 2008

  • Filter control characters to prevent indexing errors. This fixes http://plone.org/products/collective.solr/issues/1 [witsch]

  • Avoid using brains when getting all objects from the catalog for sync runs. [witsch]

  • Prefix output from maintenance views with a time-stamp. [witsch]
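Control characters such as NUL are not allowed in XML 1.0, so a document containing them makes an XML update request invalid. A minimal sketch of the filtering mentioned in the 1.0b4 entry, assuming documents are sent to Solr as XML update messages; the helper `strip_control_chars` is hypothetical:

```python
import re

# C0 control characters that XML 1.0 does not allow; tab, newline and
# carriage return are permitted and therefore kept.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text):
    """Remove control characters that would make an XML update request invalid."""
    return CONTROL_CHARS.sub("", text)


print(repr(strip_control_chars("Hello\x00 World\x07\n")))
```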

1.0b3 - Released November 12, 2008

  • Fix url fallback during schema retrieval. [witsch]

  • Fix issue regarding quoting of white space when searching. [witsch]

  • Make indexing operations more robust in case the schema is missing a unique key or couldn’t be parsed. [witsch]

1.0b2 - Released November 7, 2008

  • Make schema retrieval slightly more robust so that network failures don’t prevent access to the site. [witsch]

1.0b1 - Released November 5, 2008

  • Initial release [witsch]
