Skip to main content

Easy extraction of keywords from search engine results pages (SERPs).

Project description

https://travis-ci.org/Parsely/serpextract.png?branch=master

serpextract provides easy extraction of keywords from search engine results pages (SERPs).

This module is possible in large part to the very hard work of the Piwik team. Specifically, we make extensive use of their list of search engines.

Installation

Latest release on PyPI:

$ pip install serpextract

Or the latest development version:

$ pip install -e git://github.com/Parsely/serpextract.git#egg=serpextract

Usage

Command Line

Command-line usage, returns the engine name and keyword components separated by a comma and enclosed in quotes:

$ serpextract "http://www.google.ca/url?sa=t&rct=j&q=ars%20technica"
"Google","ars technica"

You can also print out a list of all the SearchEngineParsers currently available in your local cache via:

$ serpextract -l

Python

from serpextract import get_parser, extract, is_serp, get_all_query_params

non_serp_url = 'http://arstechnica.com/'
serp_url = ('http://www.google.ca/url?sa=t&rct=j&q=ars%20technica&source=web&cd=1&ved=0CCsQFjAA'
            '&url=http%3A%2F%2Farstechnica.com%2F&ei=pf7RUYvhO4LdyAHf9oGAAw&usg=AFQjCNHA7qjcMXh'
            'j-UX9EqSy26wZNlL9LQ&bvm=bv.48572450,d.aWc')

get_all_query_params()
# ['key', 'text', 'search_for', 'searchTerm', 'qrs', 'keyword', ...]

is_serp(serp_url)
# True
is_serp(non_serp_url)
# False

get_parser(serp_url)
# SearchEngineParser(engine_name='Google', keyword_extractor=['q'], link_macro='search?q={k}', charsets=['utf-8'])
get_parser(non_serp_url)
# None

extract(serp_url)
# ExtractResult(engine_name='Google', keyword=u'ars technica', parser=SearchEngineParser(...))
extract(non_serp_url)
# None

Tests

There are some basic tests for popular search engines, but more are required:

$ pip install -r requirements.txt
$ nosetests

Caching

Internally, this module caches an OrderedDict representation of Piwik’s list of search engines which is stored in serpextract/search_engines.pickle. This isn’t intended to change that often and so this module ships with a cached version.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

serpextract-0.1.2.tar.gz (22.3 kB view details)

Uploaded Source

File details

Details for the file serpextract-0.1.2.tar.gz.

File metadata

  • Download URL: serpextract-0.1.2.tar.gz
  • Upload date:
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for serpextract-0.1.2.tar.gz
Algorithm Hash digest
SHA256 df27d4bc8f499e3a58dc85606572f990a18c3665eb32618e08007ce7980d92be
MD5 667eca93609fab803bb3206c27e2239d
BLAKE2b-256 df7612da838472191f407b7e85e5253930bb613be52800098a8bbe6d0e0322bd

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page