
Robots exclusion application for Django, complementing Sitemaps.

Project description

This is a basic Django application to manage robots.txt files following the robots exclusion protocol, complementing the Django Sitemap contrib app.

For installation instructions, see the install section of the documentation; for instructions on how to use this application and on what it provides, see the file “overview.txt” in the “docs/” directory or the documentation on Read the Docs: https://django-robots.readthedocs.io/

Supported Django versions

  • Django 1.8

  • Django 1.9

  • Django 1.10

  • Django 1.11

Supported Python versions

  • Python 2.7

  • Python 3.3, 3.4, 3.5, 3.6

Installation

Use your favorite Python installer to install it from PyPI:

pip install django-robots

Or get the source from the application site at:

http://github.com/jazzband/django-robots/

To install the robots app, follow these steps (a minimal settings sketch follows the list):

  1. Add 'robots' to your INSTALLED_APPS setting.

  2. Make sure 'django.template.loaders.app_directories.Loader' is in your TEMPLATES setting. It’s in there by default, so you’ll only need to change this if you’ve changed that setting.

  3. Make sure you’ve installed the sites framework.

  4. Run the migrate management command.
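
For reference, a minimal settings.py covering steps 1-3 could look like the following sketch (SITE_ID and the exact TEMPLATES layout depend on your project):

# settings.py -- sketch only; adapt to your project
INSTALLED_APPS = [
    # ...
    'django.contrib.sites',  # sites framework (step 3)
    'robots',                # this app (step 1)
]

SITE_ID = 1  # required by the sites framework

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,  # enables app_directories.Loader (step 2)
        'OPTIONS': {'context_processors': []},
    },
]

After that, run python manage.py migrate to create the tables (step 4).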

Sitemaps

By default, a Sitemap statement is automatically added to the resulting robots.txt by reverse matching the URL of the installed Sitemap contrib app. This is especially useful if you allow every robot to access your whole site, since the crawler then gets the URLs explicitly instead of having to follow every link.

To omit the sitemap link, set ROBOTS_USE_SITEMAP in your Django settings file to:

ROBOTS_USE_SITEMAP = False

If you want to use specific sitemap URLs instead of the one that is automatically discovered, set ROBOTS_SITEMAP_URLS to:

ROBOTS_SITEMAP_URLS = [
    'http://www.example.com/sitemap.xml',
]

If the sitemap view is wrapped in a decorator, the dotted-path reverse used to discover the sitemap URL does not work. To overcome this, give the sitemap instance a name in urls.py:

from django.conf.urls import url
from django.views.decorators.cache import cache_page

# sitemap_view is your sitemap view, e.g. django.contrib.sitemaps.views.sitemap
urlpatterns = [
    ...
    url(r'^sitemap\.xml$', cache_page(60)(sitemap_view), {'sitemaps': [...]}, name='cached-sitemap'),
    ...
]

and inform django-robots about the view name by adding the following setting:

ROBOTS_SITEMAP_VIEW_NAME = 'cached-sitemap'

Also use ROBOTS_SITEMAP_VIEW_NAME if you use custom sitemap views (e.g. Wagtail's custom sitemaps).

Initialization

To activate robots.txt generation on your Django site, add this line to your URLconf:

url(r'^robots\.txt$', include('robots.urls')),

This tells Django to build a robots.txt when a robot accesses /robots.txt. Then run the migrate management command to create the necessary tables, and create Rule objects in the admin interface or via the shell.
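
Putting it together, a minimal urls.py wiring the sitemap and robots.txt views might look like this sketch (the sitemaps dict and its entries are placeholders for your own sitemap classes):

from django.conf.urls import include, url
from django.contrib.sitemaps.views import sitemap

sitemaps = {
    # 'blog': BlogSitemap,  # placeholder for your own sitemap classes
}

urlpatterns = [
    url(r'^sitemap\.xml$', sitemap, {'sitemaps': sitemaps}),
    url(r'^robots\.txt$', include('robots.urls')),
]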

Rules

Rule - defines an abstract rule which is used to respond to crawling web robots, using the robots exclusion protocol, a.k.a. robots.txt.

You can link multiple URL patterns to allow or disallow the robot, identified by its user agent, to access the given URLs.

The crawl delay field is supported by some search engines and defines the delay between successive crawler accesses in seconds. If the crawl rate is a problem for your server, you can set the delay to 5 or 10 seconds, or whatever value is comfortable for your server, but it's suggested to start with small values (0.5-1) and increase them only as needed. Larger delay values decrease the maximum crawl rate to your web server.

The sites framework is used to enable multiple robots.txt files per Django instance. If no rule exists, every web robot is automatically allowed access to every URL.

Please have a look at the database of web robots for a full list of existing web robot user agent strings.
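
As an illustration, rules can also be created from the Django shell. The field names below (robot, sites, allowed, disallowed, crawl_delay) are assumptions based on the app's models; double-check them against robots/models.py for your version:

# python manage.py shell -- sketch; field names assumed, verify in robots/models.py
from django.contrib.sites.models import Site
from robots.models import Rule, Url

admin_url = Url.objects.create(pattern='/admin/')

rule = Rule.objects.create(robot='*', crawl_delay=1)
rule.disallowed.add(admin_url)
rule.sites.add(Site.objects.get_current())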

Host directive

By default, a Host statement is automatically added to the resulting robots.txt so that crawlers can identify the main website and avoid indexing mirrors.

To omit the Host directive, set ROBOTS_USE_HOST in your Django settings file to:

ROBOTS_USE_HOST = False

If you want to prefix the domain with the current request protocol (http or https, as in Host: https://www.mysite.com), add this setting:

ROBOTS_USE_SCHEME_IN_HOST = True

URLs

Url - defines a case-sensitive and exact URL pattern which is used to allow or disallow access for web robots.

A pattern without a trailing slash also matches files which start with the name of the given pattern, e.g., '/admin' matches /admin.html too.

Some major search engines allow an asterisk (*) as a wildcard to match any sequence of characters and a dollar sign ($) to match the end of the URL, e.g., '/*.jpg$' can be used to match all jpeg files.
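
For illustration, a rule that disallows the admin and all JPEG files for every user agent could produce output roughly like the following (domain and sitemap URL are placeholders, and the exact set of lines depends on your rules and settings):

User-agent: *
Disallow: /admin
Disallow: /*.jpg$
Crawl-delay: 1
Host: www.example.com
Sitemap: http://www.example.com/sitemap.xml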

Caching

You can optionally cache the generation of the robots.txt. Add or change the ROBOTS_CACHE_TIMEOUT setting with a value in seconds in your Django settings file:

ROBOTS_CACHE_TIMEOUT = 60*60*24

This tells Django to cache the robots.txt for 24 hours (86400 seconds). The default value is None (no caching).

Changelog

3.0 (2017-02-28)

  • Dropped support for Django < 1.8

  • Added support for Django 1.10 / 1.11

  • Improved admin changeform

  • Added support for protocol prefix to Host directive

  • Added support for sitemap named views (for non-standard sitemap views)

  • Fixed an error which resulted in doubling the scheme for sitemap

  • Fixed support for cached sitemaps

2.0 (2016-02-28)

  • Dropped support for Django 1.5

  • Added support for Django 1.9

  • Improved code / metadata quality

  • Added Host directive

  • Added support to detect the current site via the HTTP Host header

  • Added filter_horizontal for allowed and disallowed

  • Fixed error in which get_sitemap_urls modifies SITEMAP_URLS

  • Url patterns marked as safe in template

  • Disabled localization of decimal fields in template

1.1 (2015-05-12)

  • Fixed compatibility to Django 1.7 and 1.8.

  • Moved South migrations into different subdirectory so South>=1.0 is needed.

1.0 (2014-01-16)

  • BACKWARDS-INCOMPATIBLE change: The default behaviour of this app has changed to allow all bots, the opposite of the previous behaviour.

  • Fixed some backward compatibility issues.

  • Updated existing translations (Danish, German, French, Portuguese (Brazil), Russian).

  • Added Greek, Spanish (Spain), Japanese, Dutch, Slovak and Ukrainian translations.

0.9.2 (2013-03-24)

  • Fixed compatibility with Django 1.5. Thanks, Russell Keith-Magee.

0.9.1 (2012-11-23)

  • Fixed argument signature in new class based view. Thanks, mkai.

0.9 (2012-11-21)

  • Deprecated ROBOTS_SITEMAP_URL setting. Use ROBOTS_SITEMAP_URLS instead.

  • Refactored rule_list view to be class based. django-robots now requires Django >= 1.3.

  • Stopped returning 404 pages if there are no Rules set up on the site. Instead, disallow access for all robots.

  • Added an initial South migration. If you’re using South you have to “fake” the initial database migration:

    python manage.py migrate --fake robots 0001
  • Added initial Sphinx docs.

Bugs and feature requests

As always, your mileage may vary, so please don't hesitate to send feature requests and bug reports:

https://github.com/jazzband/django-robots/issues

Thanks!
