Client interface for Scrapinghub API
Requirements
Python 2.6 or above
Requests library
Usage
The scrapinghub module is a Python library for communicating with the Scrapinghub API.
First, you connect to Scrapinghub:
>>> from scrapinghub import Connection
>>> conn = Connection('APIKEY')
>>> conn
Connection('APIKEY')
You can list the projects available to your account:
>>> conn.project_ids()
[123, 456]
And select a particular project to work with:
>>> project = conn[123]
>>> project
Project(Connection('APIKEY'), 123)
>>> project.id
123
To schedule a spider run (it returns the job id):
>>> project.schedule('myspider', arg1='val1')
u'123/1/1'
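The job id returned by schedule() is a slash-separated string whose first segment matches the project id in the examples here. As a minimal sketch, it could be split into its parts like this. The helper name parse_job_id, the JobKey container, and the reading of the second and third segments as a spider id and a job number are assumptions for illustration; they are not part of the scrapinghub library.

```python
from collections import namedtuple

# Hypothetical container; the field names reflect an assumed interpretation
# of the three segments and are not defined by the scrapinghub library.
JobKey = namedtuple('JobKey', ['project_id', 'spider_id', 'job_number'])


def parse_job_id(job_id):
    """Split a job id such as '123/1/1' into three integers."""
    parts = job_id.split('/')
    if len(parts) != 3:
        raise ValueError('expected three slash-separated parts: %r' % (job_id,))
    return JobKey(*(int(part) for part in parts))
```

For example, parse_job_id(u'123/1/1').project_id could be compared against project.id before working with the job.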
To get the list of spiders in the project:
>>> project.spiders()
[
  {u'id': u'spider1', u'tags': [], u'type': u'manual', u'version': u'123'},
  {u'id': u'spider2', u'tags': [], u'type': u'manual', u'version': u'123'}
]
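Because spiders() returns plain dicts, ordinary list comprehensions are enough to pull out what you need. The sketch below assumes only the keys shown in the output above ('id' and 'tags'); the function names spider_ids and spiders_with_tag are hypothetical helpers, not library calls.

```python
def spider_ids(spiders):
    """Return the 'id' of every spider description, in order."""
    return [spider['id'] for spider in spiders]


def spiders_with_tag(spiders, tag):
    """Return the ids of the spiders whose 'tags' list contains `tag`."""
    return [spider['id'] for spider in spiders if tag in spider['tags']]
```

With a live project this might be called as spider_ids(project.spiders()).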
To get all finished jobs:
>>> jobs = project.jobs(state='finished')
jobs is a JobSet. JobSet objects are iterable and yield Job objects when iterated, so you typically use them like this:
>>> for job in jobs:
...     pass  # do something with job
Or, if you just want to get the job ids:
>>> [x.id for x in jobs]
[u'123/1/1', u'123/1/2', u'123/1/3']
To select a specific job:
>>> job = project.job(u'123/1/2')
>>> job.id
u'123/1/2'
To retrieve all scraped items from a job:
>>> for item in job.items():
...     pass  # do something with item (it's just a dict)
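Since each item is a plain dict, one common thing to do with the iteration above is to dump the items as JSON Lines. This is a minimal sketch using only the standard json module; items_to_jsonl is a hypothetical helper name, and it assumes every item value is JSON-serializable.

```python
import json


def items_to_jsonl(items):
    """Serialize an iterable of item dicts as JSON Lines text.

    Keys are sorted so the output is stable between runs.
    """
    return ''.join(json.dumps(item, sort_keys=True) + '\n' for item in items)
```

With a live job, the result of items_to_jsonl(job.items()) could be written straight to a file.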
To retrieve all log entries from a job:
>>> for logitem in job.log():
...     pass  # logitem is a dict with logLevel, message, time
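Log entries can be filtered like any other list of dicts. The sketch below keeps only entries at or above a severity threshold. It assumes that logLevel is a number on the same scale as Python's logging module (for example 30 for WARNING); the text above does not state this, so check a real entry before relying on it. The name entries_at_or_above is a hypothetical helper.

```python
import logging


def entries_at_or_above(log_entries, min_level=logging.WARNING):
    """Keep the log entries whose 'logLevel' is >= min_level.

    Assumes 'logLevel' is numeric and comparable to logging-module levels.
    """
    return [entry for entry in log_entries if entry['logLevel'] >= min_level]
```

With a live job this might be called as entries_at_or_above(job.log(), logging.ERROR).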
To get job info:
>>> job.info['spider']
'myspider'
>>> job.info['started_time']
'2010-09-28T15:09:57.629000'
>>> job.info['tags']
[]
>>> job.info['fields_count']['description']
1253
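The started_time value shown above is a string, so it needs parsing before you can do date arithmetic with it. A minimal sketch follows, assuming every timestamp has the same shape as the example (ISO-style date and time with fractional seconds, no timezone suffix). The helper name parse_job_time is hypothetical, and the resulting datetime is naive because the example carries no timezone information.

```python
from datetime import datetime

# Format inferred from the single example value '2010-09-28T15:09:57.629000'.
JOB_TIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_job_time(value):
    """Parse a job timestamp string into a naive datetime."""
    return datetime.strptime(value, JOB_TIME_FORMAT)
```

With a live job this might be called as parse_job_time(job.info['started_time']).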
To mark a job with the tag consumed:
>>> job.update(add_tag='consumed')
To mark several jobs with the tag consumed (JobSet also supports the update() method):
>>> project.jobs(state='finished').update(add_tag='consumed')
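Tagging is useful for remembering which jobs have already been handled. The sketch below combines calls shown in this document (project.jobs(state='finished'), job.info['tags'], job.items(), job.update(add_tag=...)) into one loop that skips already tagged jobs. The function name consume_finished_jobs and the handle_item callback are hypothetical, and the sketch assumes job.info['tags'] reflects the tags currently stored for the job.

```python
def consume_finished_jobs(project, handle_item, tag='consumed'):
    """Process items of finished jobs not yet tagged, then tag them.

    Returns the ids of the jobs processed in this call.
    """
    processed = []
    for job in project.jobs(state='finished'):
        if tag in job.info['tags']:
            continue  # already handled in an earlier run
        for item in job.items():
            handle_item(item)
        # Tag only after every item was handled without an exception.
        job.update(add_tag=tag)
        processed.append(job.id)
    return processed
```

Tagging after the items are handled means a job whose processing raised an exception stays untagged and is picked up again on the next run.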
To delete a job:
>>> job.delete()
To delete several jobs (JobSet also supports the delete() method):
>>> project.jobs(state='finished').delete()
File details
Details for the file scrapinghub-1.8.0.tar.gz.
File metadata
- Download URL: scrapinghub-1.8.0.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ac4031c64e295f1bfc00e755e6f99d6c2ff71f0e8c797840d8eb2fd37875eda3
MD5 | b4a90fa14848b2793cc477f0d93581f7
BLAKE2b-256 | cae3e65535c2f6f44d64be4fab3ffe2454721988026d33d784ad464aaf538ea4
File details
Details for the file scrapinghub-1.8.0-py2-none-any.whl.
File metadata
- Download URL: scrapinghub-1.8.0-py2-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ad32eb392fac3398d123a6d8171a7fca83b9999a956c6d3ae6e08bdd322e38d6
MD5 | 732a845d1e07743d2b799402ff844398
BLAKE2b-256 | 637d57d016ed866483315c24750b5a6762917c777e10cdabb4b3e4b283467f28