
GSA integration for external indexing and searching


Introduction

The collective.gsa package integrates a Plone site with a Google Search Appliance (GSA). It provides an indexing processor for collective.indexing as well as search capabilities.

Installation

Add collective.gsa to your buildout.cfg, in both the eggs and the zcml sections:

[buildout]

eggs = collective.gsa

[instance]
zcml =
    collective.gsa
    collective.gsa-overrides

After running buildout and restarting the server, you can install the package via the Quick Installer, either through the ZMI or through Plone's Add/Remove Products. After installation, the GSA settings and GSA maintenance configlets appear in the Plone Control Panel. Follow the field descriptions to set it up.

Global reindex

The GSA maintenance configlet provides a tool to reindex the whole site. On a large site this can run into memory problems, so the reindex can be run piece by piece by batching the objects.

If it suits you better to run many small batches, there is an example script, global_reindex.py, in the example folder which runs the batch reindexes repeatedly, as sketched below.
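A rough sketch of that approach follows. The gsa-maintenance view name comes from the package, but the form field name, credentials and batch size used here are assumptions made for illustration; consult the shipped example/global_reindex.py for the real parameters.

# Rough sketch of driving the batched reindex over HTTP. The form field
# name 'batch_size' and the use of basic auth are assumptions; see
# example/global_reindex.py in the package for the actual script.
import base64
import urllib
import urllib2

SITE = 'http://www.example.org/plone'    # reindex from the URL you want indexed
AUTH = base64.b64encode('admin:secret')  # a user allowed to use the configlet
BATCH = 200                              # objects per request

def run_batch():
    data = urllib.urlencode({'batch_size': BATCH})
    request = urllib2.Request(SITE + '/@@gsa-maintenance', data)
    request.add_header('Authorization', 'Basic ' + AUTH)
    return urllib2.urlopen(request).read()

# Run a handful of small batches; the real script keeps going until the
# maintenance view reports that everything has been reindexed.
for i in range(10):
    run_batch()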

Indexing

The collective.gsa package registers an adapter for IQueueIndexProcessor, and indexing is done via the collective.indexing package. When an object is reindexed, the content provider adapter is called to obtain the data.

The package contains content providers for objects implementing IATDocument, IATFile and IATContentType:
  • For document content types (Page, News Item etc.), the rendered main macro is sent (usually the page without the portlets and the header).

  • For file content types, the primary file field is sent.

  • For other Archetypes-based content types, the title and description are sent.

To add support for other types, create your own content provider implementing the IContentProvider interface and register it via ZCML. For details, look at the content_provider module and the package's configure.zcml.
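As an illustration, a custom content provider might look roughly like the sketch below. The import path for IContentProvider, the content() method name and the IMyContentType interface are assumptions made for this example; check the content_provider module for the real interface before copying it.

# A minimal sketch of a custom content provider; the interface module path,
# the method name and the target content type are assumptions for illustration.
from zope.component import adapts
from zope.interface import implements

from collective.gsa.interfaces import IContentProvider   # assumed import path
from my.package.interfaces import IMyContentType         # hypothetical content type


class MyTypeContentProvider(object):
    """Provides the textual content sent to the GSA feed for IMyContentType."""
    implements(IContentProvider)
    adapts(IMyContentType)

    def __init__(self, context):
        self.context = context

    def content(self):
        # Method name is an assumption; return whatever text should be indexed.
        return ' '.join((self.context.Title(),
                         self.context.Description(),
                         self.context.getText()))

# Register it in your package's configure.zcml, for example:
#   <adapter factory=".content_provider.MyTypeContentProvider" />

Mirror the existing registrations in collective.gsa's configure.zcml for the exact factory and interface declarations.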

The package supports dual indexing if you have two sites, e.g. a secure one for edit access and a public one for anonymous access. The object's identifier in the GSA is its URL, obtained via the object's absolute_url method. Therefore all indexing has to be done from the URL you want the content indexed under (i.e. not from localhost). In the GSA control panel you can set a dual base URL for the anonymous site; the URL is then constructed from the dual base URL plus the object's absolute_url_path.
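For illustration, the identifier construction can be thought of roughly as follows (a sketch only; the function and variable names are not the package's own):

# Sketch of how the GSA identifier is composed; names are illustrative.
def gsa_identifier(obj, dual_base_url=None):
    if dual_base_url:
        # e.g. 'http://www.example.org' + '/plone/folder/page'
        return dual_base_url.rstrip('/') + obj.absolute_url_path()
    return obj.absolute_url()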

When an object is reindexed, the feed is added to a persistent queue and removed only once it has been successfully sent to the GSA. Hence, if the GSA is unreachable, the feed will be sent when another object is reindexed.

The fact that the GSA received a feed does not mean the content will be indexed (e.g. if the URL is not in the Matched URLs settings). If your objects are not indexed, please check the GSA's Crawl and Index settings.

Searching

This package replaces the search template and the livesearch script so that they use the GSA as the search engine. Requests issued by these templates carry gsasearch=on; internal searches (such as navigation, folder contents etc.) do not set the flag and are therefore not sent to the GSA.
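For example, a request that should go through the GSA carries the flag, while internal catalog queries omit it. The snippet below is purely illustrative; only the gsasearch parameter comes from this package.

# Illustrative only: building a portal search request that carries the
# gsasearch switch; everything except 'gsasearch' is an arbitrary example.
import urllib

query = urllib.urlencode([('SearchableText', 'annual report'), ('gsasearch', 'on')])
print 'http://www.example.org/plone/search?' + query
# http://www.example.org/plone/search?SearchableText=annual+report&gsasearch=on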

Plone's advanced search still uses the default search_form template and does not use the GSA at all, because the GSA does not handle indexes the way Zope's ZCatalog does. However, you can use the GSA's own advanced search; its URL can be set in the local GSA control panel.

Current Status

The basic implementation is nearly finished and we aim to write the necessary tests for it.

Credit

This code was inspired by the collective.solr package and was kindly sponsored by the University of Leicester.

Changelog

1.0.4 - 2009-07-13

  • Added a render method to the overridden searchbox viewlet; fixes compatibility with Plone 3.1.2

  • Removed mechanize from the required packages (Zope 2 ships its own, different version)

1.0.3 - 2009-06-24

  • Included global_reindex script to run the reindex (by Steven Hayles)

  • The ‘Start over’ button in the gsa-maintenance view now only resets the ‘already reindexed objects’ counter

1.0.2 - 2009-06-08

  • Removed rank from LiveSearch if zero

  • Filter creators to only those existing in the current Plone instance

  • When reindexing files, commit after each one

  • Added a straight option for the GSA indexer to skip the persistent queue (for global reindex)

1.0.1 - 2009-05-29

  • Added better MemoryError handling

1.0 - Initial release

  • Initial release
