Scrapyd-client

Scrapyd-client is a client for Scrapyd. It provides:

Command line tools:

  • scrapyd-deploy, to deploy your project to a Scrapyd server

  • scrapyd-client, to interact with your project once deployed

Python client:

  • ScrapydClient, to interact with Scrapyd within your Python code

It is configured using the Scrapy configuration file.

scrapyd-deploy

Deploying your project to a Scrapyd server involves:

  1. Eggifying your project.

  2. Uploading the egg to the Scrapyd server through the addversion.json webservice.

The scrapyd-deploy tool automates the process of building the egg and pushing it to the target Scrapyd server.
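
For illustration, the upload step is a plain HTTP POST to the addversion.json webservice. Here is a minimal sketch using the requests package, where the egg file name, project name and version are assumptions for the example:

import requests

# Sketch of the upload that scrapyd-deploy automates. The egg is assumed to
# have been built already; scrapyd-deploy also does that step for you.
with open("myproject.egg", "rb") as egg:
    response = requests.post(
        "http://localhost:6800/addversion.json",
        data={"project": "myproject", "version": "1287453519"},
        files={"egg": egg},
    )
print(response.json())  # the JSON response shown below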

Deploying a project

  1. Change (cd) to the root of your project (the directory containing the scrapy.cfg file)

  2. Eggify your project and upload it to the target:

    scrapyd-deploy <target> -p <project>

If you don’t have a setup.py file in the root of your project, one will be created. If you have one, it must set the entry_points keyword argument in the setup() function call, for example:

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = projectname.settings']},
)
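
The scrapy entry point is how the settings module is located when the project runs from inside the egg (the egg contains no scrapy.cfg).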

If the command is successful, you should see a JSON response, like:

Deploying myproject-1287453519 to http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "spiders": ["spider1", "spider2"]}

To save yourself from having to specify the target and project, you can configure your defaults in the Scrapy configuration file.

Versioning

By default, scrapyd-deploy uses the current timestamp for generating the project version. You can pass a custom version using --version:

scrapyd-deploy <target> -p <project> --version <version>

See Scrapyd’s documentation on how it determines the latest version.

If you use Mercurial or Git, you can pass HG or GIT respectively as the --version argument to use the current revision as the version. You can save yourself having to specify the version parameter by adding it to your target’s entry in scrapy.cfg:

[deploy]
...
version = HG

Note: The version keyword argument in the setup() function call in the setup.py file has no meaning to Scrapyd.

Include dependencies

  1. Create a requirements.txt file at the root of your project, alongside the scrapy.cfg file (see the example after this list)

  2. Use the --include-dependencies option when building or deploying your project:

    scrapyd-deploy --include-dependencies
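
A minimal example of such a file; the packages named are illustrative, not dependencies of scrapyd-client:

# requirements.txt -- example entries; list your project's own dependencies
requests>=2.31
beautifulsoup4==4.12.3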

Alternatively, you can install the dependencies directly on the Scrapyd server.

Include data files

  1. Create a setup.py file at the root of your project, alongside the scrapy.cfg file, if you don’t have one:

    scrapyd-deploy --build-egg=/dev/null

  2. Set the package_data and include_package_data keyword arguments in the setup() function call in the setup.py file. For example:

    from setuptools import setup, find_packages
    
    setup(
        name         = 'project',
        version      = '1.0',
        packages     = find_packages(),
        entry_points = {'scrapy': ['settings = projectname.settings']},
        package_data = {'projectname': ['path/to/*.json']},
        include_package_data = True,
    )
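
    At runtime, read these packaged files with pkgutil.get_data rather than building paths from __file__ (see Troubleshooting below).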

Local settings

You may want to keep certain settings local and not have them deployed to Scrapyd.

  1. Create a local_settings.py file at the root of your project, alongside the scrapy.cfg file

  2. Add the following to your project’s settings file:

    try:
        from local_settings import *
    except ImportError:
        pass

scrapyd-deploy doesn’t deploy anything outside of the project module, so the local_settings.py file won’t be deployed.
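
As a minimal illustration, a local_settings.py might hold development-only overrides; the settings below are ordinary Scrapy settings chosen for the example:

# local_settings.py -- development-only overrides, never deployed
LOG_LEVEL = "DEBUG"        # more verbose logging locally
HTTPCACHE_ENABLED = True   # cache responses while developing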

Troubleshooting

  • Problem: A settings file for local development is being included in the egg.

    Solution: See Local settings. Or, exclude the module from the egg. If using scrapyd-client’s default setup.py file, change the find_packages() call:

    setup(
        name         = 'project',
        version      = '1.0',
        packages     = find_packages(),
        entry_points = {'scrapy': ['settings = projectname.settings']},
    )

    to:

    setup(
        name         = 'project',
        version      = '1.0',
        packages     = find_packages(exclude=["myproject.devsettings"]),
        entry_points = {'scrapy': ['settings = projectname.settings']},
    )
  • Problem: Code using __file__ breaks when run in Scrapyd.

    Solution: Use pkgutil.get_data instead. For example, change:

    import os

    path = os.path.dirname(os.path.realpath(__file__))  # BAD: breaks when run from an egg
    data = open(os.path.join(path, "tools", "json", "test.json"), "rb").read()

    to:

    import pkgutil

    data = pkgutil.get_data("projectname", "tools/json/test.json")
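
    This works because pkgutil.get_data reads the resource through the package’s loader, which can serve files from a plain directory or from inside an egg.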
  • Be careful when writing to disk in your project, as Scrapyd will most likely be running under a different user, which may not have write access to certain directories. If you can, avoid writing to disk, and always use the tempfile module for temporary files.

  • If you use a proxy, use the HTTP_PROXY, HTTPS_PROXY, NO_PROXY and/or ALL_PROXY environment variables, as documented by the requests package.

scrapyd-client

For a reference on each subcommand, invoke scrapyd-client <subcommand> --help.

Where filtering with wildcards is possible, it is facilitated with fnmatch. The --project option can be omitted if a project is found in a scrapy.cfg file.

deploy

This is a wrapper around scrapyd-deploy.

targets

Lists all targets:

scrapyd-client targets

projects

Lists all projects of a Scrapyd instance:

# lists all projects on the default target
scrapyd-client projects
# lists all projects from a custom URL
scrapyd-client -t http://scrapyd.example.net projects

schedule

Schedules one or more spiders to be executed:

# schedules any spider
scrapyd-client schedule
# schedules all spiders from the 'knowledge' project
scrapyd-client schedule -p knowledge \*
# schedules any spider from any project whose name ends with '_daily'
scrapyd-client schedule -p \* \*_daily
# schedules spider1 in project1 specifying settings
scrapyd-client schedule -p project1 spider1 --arg 'setting=DOWNLOADER_MIDDLEWARES={"my.middleware.MyDownloader": 610}'

spiders

Lists spiders of one or more projects:

# lists all spiders
scrapyd-client spiders
# lists all spiders from the 'knowledge' project
scrapyd-client spiders -p knowledge

ScrapydClient

Interact with Scrapyd within your Python code.

from scrapyd_client import ScrapydClient

client = ScrapydClient()

for project in client.projects():
    print(client.jobs(project=project))
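
Assuming jobs() mirrors the response of Scrapyd’s listjobs.json webservice (an assumption, not documented above), you can summarize job states per project:

from scrapyd_client import ScrapydClient

client = ScrapydClient()

# Assumption: jobs() returns a mapping shaped like Scrapyd's listjobs.json
# response, with "pending", "running" and "finished" lists of jobs.
for project in client.projects():
    jobs = client.jobs(project=project)
    for state in ("pending", "running", "finished"):
        print(project, state, len(jobs.get(state, [])))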

Scrapy configuration file

Targets

You can define a Scrapyd target in your project’s scrapy.cfg file. Example:

[deploy]
url = http://scrapyd.example.com/api/scrapyd
username = scrapy
password = secret
project = projectname

You can now deploy your project without the <target> argument or -p <project> option:

scrapyd-deploy

If you have multiple targets, add the target name in the section name. Example:

[deploy:targetname]
url = http://scrapyd.example.com/api/scrapyd

[deploy:another]
url = http://other.example.com/api/scrapyd

If you are working with continuous deployment (CD) frameworks, you do not need to commit your secrets to your repository. You can use environment variable expansion like so:

[deploy]
url = $SCRAPYD_URL
username = $SCRAPYD_USERNAME
password = $SCRAPYD_PASSWORD

or using this syntax:

[deploy]
url = ${SCRAPYD_URL}
username = ${SCRAPYD_USERNAME}
password = ${SCRAPYD_PASSWORD}
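
Either form is expanded from the environment when the configuration file is read.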

To deploy to one target, run:

scrapyd-deploy targetname -p <project>

To deploy to all targets, use the -a option:

scrapyd-deploy -a -p <project>

While your target needs to be defined with its URL in scrapy.cfg, you can use netrc for username and password, like so:

machine scrapyd.example.com
    login scrapy
    password secret
