
Client interface for Scrapinghub API

Project description

[Travis CI build status badge: https://secure.travis-ci.org/scrapinghub/python-scrapinghub.png?branch=master]

Requirements

Usage

The scrapinghub module is a Python library for communicating with the Scrapinghub API.

First, you connect to Scrapinghub:

>>> from scrapinghub import Connection
>>> conn = Connection('APIKEY')
>>> conn
Connection('APIKEY')
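
If you prefer not to hard-code the key, you can read it from the environment instead (a minimal sketch; the SCRAPINGHUB_APIKEY variable name is just an example, not something the library looks up on its own):

>>> import os
>>> conn = Connection(os.environ['SCRAPINGHUB_APIKEY'])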

You can list the projects available to your account:

>>> conn.project_ids()
[123, 456]

And select a particular project to work with:

>>> project = conn[123]
>>> project
Project(Connection('APIKEY'), 123)
>>> project.id
123

To schedule a spider run (it returns the job id):

>>> project.schedule('myspider', arg1='val1')
u'123/1/1'
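
Because schedule() returns the new job's id, you can capture it and look the job up right away (a small sketch combining calls shown elsewhere in this document):

>>> job_id = project.schedule('myspider', arg1='val1')
>>> job = project.job(job_id)
>>> job.id
u'123/1/1'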

To get the list of spiders in the project:

>>> project.spiders()
[
  {u'id': u'spider1', u'tags': [], u'type': u'manual', u'version': u'123'},
  {u'id': u'spider2', u'tags': [], u'type': u'manual', u'version': u'123'}
]
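
The listing is a plain list of dicts, so pulling out just the spider names, for example, is a one-line comprehension (a minimal sketch):

>>> [spider['id'] for spider in project.spiders()]
[u'spider1', u'spider2']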

To get all finished jobs:

>>> jobs = project.jobs(state='finished')

jobs is a JobSet. JobSet objects are iterable and yield Job objects when iterated, so you typically use them like this:

>>> for job in jobs:
...     # do something with job

Or, if you just want to get the job ids:

>>> [x.id for x in jobs]
[u'123/1/1', u'123/1/2', u'123/1/3']

To select a specific job:

>>> job = project.job(u'123/1/2')
>>> job.id
u'123/1/2'

To retrieve all scraped items from a job:

>>> for item in job.items():
...     # do something with item (it's just a dict)
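
Since each item is an ordinary dict, dumping a job's items to a JSON Lines file needs nothing beyond the standard library (a minimal sketch; the output filename is arbitrary):

>>> import json
>>> with open('items.jl', 'w') as f:
...     for item in job.items():
...         f.write(json.dumps(item) + '\n')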

To retrieve all log entries from a job:

>>> for logitem in job.log():
...     # logitem is a dict with logLevel, message, time
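
For instance, to print each entry's level and message (a sketch based on the fields listed above):

>>> for logitem in job.log():
...     print('%s: %s' % (logitem['logLevel'], logitem['message']))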

To get job info:

>>> job.info['spider']
'myspider'
>>> job.info['started_time']
'2010-09-28T15:09:57.629000'
>>> job.info['tags']
[]
>>> job.info['fields_count']['description']
1253
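
The timestamps in job.info are plain ISO-format strings; if you need datetime objects, parse them with the standard library (a minimal sketch):

>>> from datetime import datetime
>>> datetime.strptime(job.info['started_time'], '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(2010, 9, 28, 15, 9, 57, 629000)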

To mark a job with the tag 'consumed':

>>> job.update(add_tag='consumed')

To mark several jobs with the tag 'consumed' (JobSet also supports the update() method):

>>> project.jobs(state='finished').update(add_tag='consumed')
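
A common pattern is to process each finished job and then tag it so it is not picked up again (a sketch combining calls already shown in this document):

>>> for job in project.jobs(state='finished'):
...     for item in job.items():
...         pass  # process the item here
...     job.update(add_tag='consumed')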

To delete a job:

>>> job.delete()

To delete several jobs (JobSet also supports the delete() method):

>>> project.jobs(state='finished').delete()

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • scrapinghub-1.8.0.tar.gz (5.6 kB), uploaded as Source

Built Distribution

  • scrapinghub-1.8.0-py2-none-any.whl (5.9 kB), uploaded for Python 2

File details

Details for the file scrapinghub-1.8.0.tar.gz.

File metadata

  • Download URL: scrapinghub-1.8.0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for scrapinghub-1.8.0.tar.gz:

  • SHA256: ac4031c64e295f1bfc00e755e6f99d6c2ff71f0e8c797840d8eb2fd37875eda3
  • MD5: b4a90fa14848b2793cc477f0d93581f7
  • BLAKE2b-256: cae3e65535c2f6f44d64be4fab3ffe2454721988026d33d784ad464aaf538ea4


File details

Details for the file scrapinghub-1.8.0-py2-none-any.whl.

File hashes

Hashes for scrapinghub-1.8.0-py2-none-any.whl:

  • SHA256: ad32eb392fac3398d123a6d8171a7fca83b9999a956c6d3ae6e08bdd322e38d6
  • MD5: 732a845d1e07743d2b799402ff844398
  • BLAKE2b-256: 637d57d016ed866483315c24750b5a6762917c777e10cdabb4b3e4b283467f28
