Crawler for importing data from a filesystem directory into Solr
Introduction
bg.crawler is a command-line frontend for feeding a tree of files (a directory) into a Solr server for indexing.
Usage
Command line options:
    blackmoon:~/src/bg.crawler> bin/solr-crawler --help
    usage: solr-crawler [-h] [--solr-url SOLR_URL] [--max-depth MAX_DEPTH]
                        [--batch-size BATCH_SIZE] [--tag TAG] [--clear-all]
                        [--clear-tag SOLR_CLEAR_TAG] [--verbose] [--no-type-check]
                        <directory>

    Commandline parser

    positional arguments:
      <directory>           Directory to be crawled

    optional arguments:
      -h, --help            show this help message and exit
      --solr-url SOLR_URL   SOLR server URL
      --max-depth MAX_DEPTH
                            maximum folder depth
      --batch-size BATCH_SIZE
                            Solr batch size
      --tag TAG             Solr import tag
      --clear-all           Clear the Solr indexes before crawling
      --clear-tag SOLR_CLEAR_TAG
                            Remove all items from Solr indexed tagged with the
                            given tag
      --verbose             Verbose logging
      --no-type-check       Apply extension filter while crawling
--solr-url defines the URL of the Solr server
--max-depth limits the crawler to a given folder depth
--batch-size inserts N documents per batch before sending a commit to Solr (default behavior: every single add to the Solr index is committed individually)
--tag tags the imported document(s) with a string (this can be useful when importing different document sources into Solr, because it allows filtering by tag at query time)
--clear-all clear the complete Solr index before running the import
--clear-tag remove all documents with the given tag before running the import
--verbose enable extensive logging
--no-type-check if set, no file-type filtering is applied and all file types are passed to Solr
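To illustrate what --max-depth and --batch-size describe, here is a minimal Python sketch. It is not the bg.crawler implementation: the names walk_limited and feed_in_batches are hypothetical, the add and commit callables stand in for whatever Solr client is used, and the convention that depth 0 means "only files directly inside the directory" is an assumption of this sketch.

```python
import os
from typing import Any, Callable, Iterable, Iterator, Optional


def walk_limited(root: str, max_depth: Optional[int] = None) -> Iterator[str]:
    """Yield file paths below root, descending at most max_depth levels.

    Assumption of this sketch: depth 0 is the root directory itself,
    depth 1 its immediate subdirectories, and so on.
    """
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if max_depth is not None and depth >= max_depth:
            # Emptying dirnames in place tells os.walk not to descend further.
            dirnames[:] = []
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def feed_in_batches(
    docs: Iterable[Any],
    add: Callable[[Any], None],
    commit: Callable[[], None],
    batch_size: int = 1,
) -> None:
    """Call add() for every document and commit() after each full batch.

    With batch_size=1 every single add is followed by a commit, which
    mirrors the default behavior described for --batch-size.
    """
    pending = 0
    for doc in docs:
        add(doc)
        pending += 1
        if pending >= batch_size:
            commit()
            pending = 0
    if pending:
        # Commit the final, partially filled batch.
        commit()
```

In this sketch a larger batch size means fewer commits for the same number of documents, at the cost of documents becoming visible in the index only once their batch has been committed.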
Licence
bg.crawler is published under the GNU General Public License, version 2 (GPL 2)
Credits
bg.crawler is sponsored by BG Phoenics
Contributors
Changelog
0.1 (2011-11-11)
initial release [ajung]