Solr crawler for BG Phoenics
Introduction
bg.crawler is a command-line frontend for feeding a tree of files (a directory) into a Solr server for indexing.
Usage
Command line options:
  blackmoon:~/src/bg.crawler> bin/solr-crawler --help
  usage: solr-crawler [-h] [--solr-url SOLR_URL] [--max-depth MAX_DEPTH]
                      [--batch-size BATCH_SIZE] [--tag TAG] [--clear-all]
                      [--clear-tag SOLR_CLEAR_TAG] [--verbose]
                      [--no-type-check]
                      <directory>

  Commandline parser

  positional arguments:
    <directory>           Directory to be crawled

  optional arguments:
    -h, --help            show this help message and exit
    --solr-url SOLR_URL   SOLR server URL
    --max-depth MAX_DEPTH
                          maximum folder depth
    --batch-size BATCH_SIZE
                          Solr batch size
    --tag TAG             Solr import tag
    --clear-all           Clear the Solr indexes before crawling
    --clear-tag SOLR_CLEAR_TAG
                          Remove all items from Solr indexed tagged with the
                          given tag
    --verbose             Verbose logging
    --no-type-check       Apply extension filter while crawling
--solr-url defines the URL of the Solr server
--max-depth limits the crawler to a given folder depth
--batch-size inserts N documents per batch before sending a commit to Solr (default behavior: every single add to the Solr index is committed)
--tag tags the imported document(s) with a string (this may be useful when importing different document sources into Solr while keeping the option to filter by tag at query time; see the second example below)
--clear-all clears the complete Solr index before running the import
--clear-tag removes all documents with the given tag before running the import
--verbose enables extensive logging
--no-type-check disables all type-check filtering and passes all file types to Solr
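
A hypothetical invocation combining several of these options (the Solr URL, tag name, batch size, depth, and directory below are placeholders, not package defaults):

  bin/solr-crawler --solr-url http://localhost:8983/solr \
                   --clear-tag manuals \
                   --tag manuals \
                   --batch-size 100 \
                   --max-depth 5 \
                   --verbose \
                   /data/documents

This re-imports the tree under /data/documents: documents previously imported with the manuals tag are removed first, and a commit is sent to Solr after every 100 added documents rather than after each individual add.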
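
Once documents are tagged, a query-time filter can restrict results to a single import source. A sketch using Solr's standard fq (filter query) parameter, assuming the tag is stored in a field named tag (the actual field name depends on the Solr schema in use):

  curl 'http://localhost:8983/solr/select?q=linux&fq=tag:manuals&wt=json'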
Licence
bg.crawler is published under the GNU General Public License version 2 (GPL v2)
Contributors
Changelog
0.1 (2011-11-11)
initial release [ajung]