serpextract

Easy extraction of keywords from search engine results pages (SERPs).

Maintainers

Daniel.Blanchard emmett9001 kbourgoin msukmanowsky

Project description

serpextract provides easy extraction of keywords from search engine results pages (SERPs).

This module is possible in large part thanks to the hard work of the Piwik team. Specifically, we make extensive use of their list of search engines.

Installation

Latest release on PyPI:

$ pip install serpextract

Or the latest development version:

$ pip install -e git+https://github.com/Parsely/serpextract.git#egg=serpextract

Usage

Command Line

From the command line, serpextract prints the engine name and the keyword, each enclosed in quotes and separated by a comma:

$ serpextract "http://www.google.ca/url?sa=t&rct=j&q=ars%20technica"
"Google","ars technica"

You can also print out a list of all the SearchEngineParsers currently available in your local cache via:

$ serpextract -l
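
The quoted, comma-separated line printed for a single URL is convenient to consume from a script. The following is a minimal sketch, assuming the output follows ordinary CSV quoting rules (the README only states that the fields are quoted and comma-separated); `parse_serpextract_line` is a hypothetical helper, not part of serpextract:

```python
import csv
import io


def parse_serpextract_line(line):
    """Split one quoted, comma-separated output line into (engine, keyword)."""
    # csv.reader copes with commas that appear inside the quoted keyword.
    row = next(csv.reader(io.StringIO(line)))
    engine_name, keyword = row
    return engine_name, keyword


print(parse_serpextract_line('"Google","ars technica"'))
# ('Google', 'ars technica')
```

Using the csv module rather than splitting on commas keeps keywords that themselves contain a comma in one piece.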

Python

from serpextract import get_parser, extract, is_serp, get_all_query_params

non_serp_url = 'http://arstechnica.com/'
serp_url = ('http://www.google.ca/url?sa=t&rct=j&q=ars%20technica&source=web&cd=1&ved=0CCsQFjAA'
            '&url=http%3A%2F%2Farstechnica.com%2F&ei=pf7RUYvhO4LdyAHf9oGAAw&usg=AFQjCNHA7qjcMXh'
            'j-UX9EqSy26wZNlL9LQ&bvm=bv.48572450,d.aWc')

get_all_query_params()
# ['key', 'text', 'search_for', 'searchTerm', 'qrs', 'keyword', ...]

is_serp(serp_url)
# True
is_serp(non_serp_url)
# False

get_parser(serp_url)
# SearchEngineParser(engine_name='Google', keyword_extractor=['q'], link_macro='search?q={k}', charsets=['utf-8'])
get_parser(non_serp_url)
# None

extract(serp_url)
# ExtractResult(engine_name='Google', keyword=u'ars technica', parser=SearchEngineParser(...))
extract(non_serp_url)
# None
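
To make the idea behind these calls concrete, here is a toy sketch of keyword extraction that uses only the standard library. It is not serpextract's implementation: the `ENGINES` table and the `toy_extract` function are hypothetical, and the real module relies on Piwik's much larger list of search engines.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical, hand-written lookup table for illustration only.
ENGINES = {
    "google": {"name": "Google", "params": ["q"]},
    "bing": {"name": "Bing", "params": ["q"]},
}


def toy_extract(url):
    """Return (engine_name, keyword) for a known engine, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # parse_qs also percent-decodes values, so 'ars%20technica' becomes 'ars technica'.
    query = parse_qs(parts.query)
    for label in host.split("."):
        engine = ENGINES.get(label)
        if engine is None:
            continue
        # Try each query parameter this engine is assumed to use for keywords.
        for param in engine["params"]:
            values = query.get(param)
            if values:
                return engine["name"], values[0]
    return None


print(toy_extract("http://www.google.ca/url?sa=t&rct=j&q=ars%20technica"))
# ('Google', 'ars technica')
print(toy_extract("http://arstechnica.com/"))
# None
```

Matching on a single hostname label is a deliberate simplification; it would misfire on hosts such as `google.example.com`, which is one reason to prefer a curated engine list.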

Tests

There are some basic tests for popular search engines, but more are required:

$ pip install -r requirements.txt
$ nosetests

Caching

Internally, this module caches an OrderedDict representation of Piwik’s list of search engines, stored in serpextract/search_engines.pickle. The list is not expected to change often, so the module ships with a cached version. You can manually update the local cache via:

$ serpextract -u

This action currently requires PHP (we know, we know). We grab Piwik’s PHP array of all search engines, turn it into an OrderedDict, and store it in pickle form. Ideally, this search engine list would be available in a language-independent format such as JSON.
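
The trade-off between the pickle cache and a JSON alternative can be sketched with the standard library. The entries below are hypothetical placeholders; the keys and value layout of the real cache may differ.

```python
import json
import pickle
from collections import OrderedDict

# Hypothetical entries standing in for the real search engine list.
engines = OrderedDict([
    ("google.com", {"name": "Google", "params": ["q"]}),
    ("bing.com", {"name": "Bing", "params": ["q"]}),
])

# Pickle round trip: Python-only, but restores the OrderedDict type directly.
blob = pickle.dumps(engines, protocol=pickle.HIGHEST_PROTOCOL)
from_pickle = pickle.loads(blob)

# JSON round trip: readable from any language; object_pairs_hook
# rebuilds an OrderedDict so the key order is kept on load.
text = json.dumps(engines)
from_json = json.loads(text, object_pairs_hook=OrderedDict)

print(list(from_json))
# ['google.com', 'bing.com']
```

Besides portability, JSON avoids a known hazard of pickle: unpickling data from an untrusted source can execute arbitrary code.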

Release history

Version	Release date
0.7.3	Aug 16, 2021
0.7.2	Apr 21, 2021
0.7.1	Feb 18, 2021
0.7.0	Aug 12, 2020
0.6.3	Jul 19, 2017
0.6.2	May 4, 2017
0.6.1	Mar 6, 2017
0.6.0	Feb 28, 2017
0.5.0	Sep 23, 2016
0.4.1	Sep 6, 2016
0.4.0	Sep 6, 2016
0.3.0	Jan 8, 2016
0.2.10	Jan 5, 2016
0.2.9	Oct 9, 2015
0.2.8	Sep 10, 2015
0.2.7	Dec 16, 2014
0.2.6	Oct 21, 2013
0.2.5	Aug 28, 2013
0.2.4	Jul 20, 2013
0.2.3	Jul 19, 2013
0.2.2	Jul 18, 2013
0.2.1	Jul 17, 2013
0.2.0	Jul 17, 2013
0.1.2	Jul 5, 2013
0.1.1	Jul 2, 2013 (this version)
0.1.0	Jul 2, 2013

Download files

Download the file for your platform.

Source Distribution

serpextract-0.1.1.tar.gz (22.5 kB)

Uploaded Jul 2, 2013 Source

Built Distribution

serpextract-0.1.1-py2.7.egg (26.3 kB)

Uploaded Jul 2, 2013 Source

File details

Details for the file serpextract-0.1.1.tar.gz.

File metadata

Download URL: serpextract-0.1.1.tar.gz
Upload date: Jul 2, 2013
Size: 22.5 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for serpextract-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6826fc2840ef4678f7df1a9907bbc459177ed3cab59c697fe9914c481c31237b`
MD5	`8915934302396d35fdfadf2c492b5826`
BLAKE2b-256	`91ffcb746e188b834de6368da83c3244726ac6055db7fa28f60c74460fd98530`

File details

Details for the file serpextract-0.1.1-py2.7.egg.

File metadata

Download URL: serpextract-0.1.1-py2.7.egg
Upload date: Jul 2, 2013
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for serpextract-0.1.1-py2.7.egg
Algorithm	Hash digest
SHA256	`4e8ded0355496151f8939c9de05ccf96604f522b2f2b9469d2f79ad49bbdbe52`
MD5	`fb6cb2fcb86ea4e28939ef0c115b44bd`
BLAKE2b-256	`e9ea03577741b2e551ba62d7cb146286096bad5b52be3d5dfe8bbb3085bd481a`
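
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using hashlib; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("serpextract-0.1.1.tar.gz", "6826fc2840ef4678f7df1a9907bbc459177ed3cab59c697fe9914c481c31237b")` should return True for an intact copy of the source distribution listed above.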