
=======================================
Robots exclusion application for Django
=======================================

This is a basic Django application to manage robots.txt files following the
`robots exclusion protocol`_, complementing the Django_ `Sitemap contrib app`_.

The robots exclusion application consists of two database models that are
tied together with a many-to-many relationship:

* Rules_
* URLs_

.. _Django: http://www.djangoproject.com/


Supported Django versions
-------------------------

* Django 1.6
* Django 1.7
* Django 1.8
* Django 1.9

Supported Python versions
-------------------------

* Python 2.6, 2.7
* Python 3.3, 3.4, 3.5


Installation
============

Use your favorite Python installer to install it from PyPI::

    pip install django-robots

Or get the source from the application site at::

    http://github.com/jazzband/django-robots/

Then follow these steps to install the application in your project:

1. Add ``'robots'`` to your INSTALLED_APPS_ setting (see the settings
   sketch below).
2. Make sure ``'django.template.loaders.app_directories.Loader'``
   is in your TEMPLATE_LOADERS_ setting. It's in there by default, so
   you'll only need to change this if you've changed that setting.
3. Make sure you've installed the `sites framework`_.
4. Run the ``syncdb`` or ``migrate`` management command (depending on
   whether you're using South or Django's built-in migrations).

.. _INSTALLED_APPS: http://docs.djangoproject.com/en/dev/ref/settings/#installed-apps
.. _TEMPLATE_LOADERS: http://docs.djangoproject.com/en/dev/ref/settings/#template-loaders
.. _sites framework: http://docs.djangoproject.com/en/dev/ref/contrib/sites/
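
A minimal settings sketch covering steps 1 and 3 might look like this
(``django.contrib.sitemaps`` is only needed if you want the automatic
``Sitemap`` statement described below)::

    # settings.py -- a minimal sketch, not a complete settings module.
    INSTALLED_APPS = (
        'django.contrib.sites',     # step 3: the sites framework
        'django.contrib.sitemaps',  # optional: enables the automatic Sitemap statement
        'robots',                   # step 1: this application
        # ... your other apps ...
    )

    SITE_ID = 1  # required by the sites framework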

Sitemaps
--------

By default, a ``Sitemap`` statement is automatically added to the resulting
robots.txt by reverse matching the URL of the installed `Sitemap contrib app`_.
This is especially useful if you allow every robot to access your whole site,
since the crawler then gets the URLs explicitly instead of having to discover
every link on its own.

To change the default behaviour and omit the sitemap link, set
``ROBOTS_USE_SITEMAP`` in your Django settings file to::

    ROBOTS_USE_SITEMAP = False

If you want to use specific sitemap URLs instead of the one that is
automatically discovered, set ``ROBOTS_SITEMAP_URLS`` to::

    ROBOTS_SITEMAP_URLS = [
        'http://www.example.com/sitemap.xml',
    ]

.. _Sitemap contrib app: http://docs.djangoproject.com/en/dev/ref/contrib/sitemaps/

Initialization
==============

To activate robots.txt generation on your Django site, add this line to your
URLconf_::

    url(r'^robots\.txt$', include('robots.urls')),

This tells Django to build a robots.txt when a robot accesses ``/robots.txt``.
Then `sync your database`_ to create the necessary tables, and create
``Rule`` objects in the admin interface or via the shell.
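
Putting it together, a minimal URLconf sketch could look like the following
(Django 1.8-style list of ``url()`` instances; wrap it in ``patterns('')`` on
older setups if your project still uses that style)::

    # urls.py -- a sketch; merge the pattern into your project's URLconf.
    from django.conf.urls import include, url

    urlpatterns = [
        url(r'^robots\.txt$', include('robots.urls')),
        # ... your other URL patterns ...
    ]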

.. _URLconf: http://docs.djangoproject.com/en/dev/topics/http/urls/
.. _sync your database: http://docs.djangoproject.com/en/dev/ref/django-admin/#syncdb

Rules
=====

``Rule`` - defines an abstract rule which is used to respond to crawling web
robots, using the `robots exclusion protocol`_, a.k.a. robots.txt.

You can link multiple URL patterns to a rule to allow or disallow the robot
identified by its user agent to access the given URLs.

The crawl delay field is supported by some search engines and defines the
delay between successive crawler accesses in seconds. If the crawl rate is a
problem for your server, you can set the delay to 5 or 10 seconds, or whatever
value is comfortable for your server, but it's suggested to start with small
values (0.5-1) and increase only as needed. Larger delay values increase the
time between successive crawl accesses and so decrease the maximum crawl rate
on your web server.

The `sites framework`_ is used to enable multiple robots.txt per Django instance.
If no rule exists, the application automatically allows every web robot access
to every URL.

Please have a look at the `database of web robots`_ for a full list of
existing web robots user agent strings.
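
For example, a rule that blocks every robot from ``/admin/`` with a
one-second crawl delay could be created from ``python manage.py shell``
roughly like this -- a sketch only, so double-check the field names
(``robot``, ``disallowed``, ``sites``, ``crawl_delay``) against
``robots/models.py`` or the admin::

    # A sketch, not a verbatim API reference; field names are assumptions.
    from django.contrib.sites.models import Site
    from robots.models import Rule, Url

    admin_url, _ = Url.objects.get_or_create(pattern='/admin/')

    rule = Rule.objects.create(robot='*', crawl_delay=1)  # '*' matches every user agent
    rule.disallowed.add(admin_url)                        # m2m link to the URL pattern
    rule.sites.add(Site.objects.get_current())            # rules are tied to a site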

.. _robots exclusion protocol: http://en.wikipedia.org/wiki/Robots_exclusion_standard
.. _database of web robots: http://www.robotstxt.org/db.html

Host directive
==============

By default, a ``Host`` statement is automatically added to the resulting
robots.txt to avoid mirrors and to select the main website properly.

To change the default behaviour and omit the host directive, set
``ROBOTS_USE_HOST`` in your Django settings file to::

    ROBOTS_USE_HOST = False

URLs
====

``Url`` - defines a case-sensitive, exact URL pattern which is used to
allow or disallow access for web robots.

A pattern without a trailing slash also matches files whose names start with
the pattern, e.g., ``'/admin'`` matches ``/admin.html`` too.

Some major search engines allow an asterisk (``*``) as a wildcard to match any
sequence of characters and a dollar sign (``$``) to match the end of the URL,
e.g., ``'/*.jpg$'`` can be used to match all jpeg files.
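
As an illustration of the matching rules above (assuming the model exposes a
``pattern`` field, as the admin suggests), such patterns could be stored as
``Url`` entries like this::

    # A sketch; the ``pattern`` field name is an assumption -- check robots/models.py.
    from robots.models import Url

    Url.objects.create(pattern='/admin')    # no trailing slash: also matches /admin.html
    Url.objects.create(pattern='/*.jpg$')   # wildcard/anchor, honoured only by some engines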

Caching
=======

You can optionally cache the generation of the ``robots.txt``. Add or change
the ``ROBOTS_CACHE_TIMEOUT`` setting with a value in seconds in your Django
settings file::

    ROBOTS_CACHE_TIMEOUT = 60*60*24

This tells Django to cache the ``robots.txt`` for 24 hours (86400 seconds).
The default value is ``None`` (no caching).
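
If you enable caching, make sure a cache backend is configured; assuming the
view uses Django's cache framework with the default cache alias, a sketch of
the relevant settings (the locmem backend is just an example) would be::

    # settings.py -- a sketch; pick the cache backend that fits your deployment.
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        },
    }

    ROBOTS_CACHE_TIMEOUT = 60 * 60 * 24  # cache the generated robots.txt for 24 hours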

.. include:: ../HISTORY.rst

Bugs and feature requests
=========================

As always your mileage may vary, so please don't hesitate to send feature
requests and bug reports:

https://github.com/jazzband/django-robots/issues

Thanks!

