Scrapy extension to write scraped items using MongoEngine documents

Bringing Scrapy and MongoEngine together.

scrapy-mongoengine-item is an extension that allows you to define Scrapy items using existing MongoEngine documents.

Documentation is available on Read the Docs.

Prerequisites

Both Python 2.7 and Python 3.5/3.6 are supported. For Python 3 you need Scrapy v1.1 or above.

Latest tested MongoEngine version is MongoEngine 0.17.0.

Installation

  1. Install latest stable version from PyPI:

    pip install scrapy-mongoengine-item

    or latest stable version from GitHub:

    pip install https://github.com/barseghyanartur/scrapy-mongoengine-item/archive/stable.tar.gz

    or latest stable version from BitBucket:

    pip install https://bitbucket.org/barseghyanartur/scrapy-mongoengine-item/get/stable.tar.gz

Introduction

MongoEngineItem is a class of item that gets its field definitions from a MongoEngine document. You simply create a MongoEngineItem and specify which MongoEngine document it relates to.

Besides getting the document fields defined on your item, MongoEngineItem provides a method to create and populate a MongoEngine document instance with the item data.

Usage

MongoEngineItem works as follows: you create a subclass and define its mongoengine_document attribute to be a valid MongoEngine document. With this you will get an item with a field for each MongoEngine document field.

In addition, you can define fields that aren’t present in the document, and even override fields that are present in the document by redefining them in the item.

Let’s see some examples:

Creating a MongoEngine document for the examples:

from mongoengine import fields, document

class Person(document.Document):

    name = fields.StringField(max_length=255)
    age = fields.IntField()

Defining a basic MongoEngineItem:

from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):

    mongoengine_document = Person

MongoEngineItem works just like Scrapy items:

p = PersonItem()
p['name'] = 'John'
p['age'] = 22

To obtain the MongoEngine document from the item, call the extra method MongoEngineItem.save():

person = p.save()
person.name
# 'John'
person.age
# 22
person.id
# an ObjectId assigned on save (MongoEngine's default primary key), not an integer

Calling MongoEngineItem.save() writes the document to the database immediately. To obtain an unsaved document instead, call it with commit=False:

person = p.save(commit=False)
person.name
# 'John'
person.age
# 22
person.id
# None
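
In a Scrapy project, the natural place to call save() is an item pipeline. The following is a minimal sketch rather than something shipped with this package: the class name MongoEngineSavePipeline is hypothetical, and it assumes a MongoEngine connection has already been opened elsewhere (for example with mongoengine.connect()) before any item is processed.

```python
class MongoEngineSavePipeline:
    """Hypothetical Scrapy item pipeline that persists items exposing save()."""

    def process_item(self, item, spider):
        # MongoEngineItem instances provide save(); plain dicts and ordinary
        # Scrapy items do not, so they pass through untouched.
        save = getattr(item, "save", None)
        if callable(save):
            save()
        # Scrapy expects the item to be returned so later pipelines can use it.
        return item
```

Such a pipeline would be enabled through Scrapy's ITEM_PIPELINES setting, like any other pipeline class.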

As said before, we can add other fields to the item:

import scrapy
from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):

    mongoengine_document = Person
    sex = scrapy.Field()

p = PersonItem()
p['name'] = 'John'
p['age'] = 22
p['sex'] = 'M'

We can also override the fields of the document with our own:

class PersonItem(MongoEngineItem):

    mongoengine_document = Person
    name = scrapy.Field(default='No Name')

This is useful for attaching properties to a field, such as a default or any other property that your project uses. Fields that exist only on the item and not on the document are not taken into account when calling MongoEngineItem.save().
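
To make the last point concrete, here is a toy illustration in plain Python of what "not taken into account" means for the data that reaches the document. It is not the library's implementation: split_item_data is a hypothetical helper, and the field names are those of the Person document and the extra sex field from the examples above.

```python
def split_item_data(item_data, document_field_names):
    """Separate item values into those that map onto the document and the rest."""
    document_data = {}
    extra_data = {}
    for key, value in item_data.items():
        if key in document_field_names:
            document_data[key] = value
        else:
            extra_data[key] = value
    return document_data, extra_data


# Field names of the Person document used throughout the examples.
person_fields = {"name", "age"}
item_data = {"name": "John", "age": 22, "sex": "M"}

document_data, extra_data = split_item_data(item_data, person_fields)
print(document_data)  # {'name': 'John', 'age': 22}
print(extra_data)     # {'sex': 'M'}
```

Only the values in document_data correspond to what a saved Person would carry; the sex value stays on the item and is available to the rest of the scraping code.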

Development

Testing

To run the tests in your working environment, type:

./runtests.py

To test against all supported Python versions, type:

tox

Running MongoDB

The easiest way is to run it via Docker:

docker pull mongo:latest
docker run -p 27017:27017 mongo:latest

Writing documentation

Keep the following hierarchy.

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************

License

GPL-2.0-only OR LGPL-2.1-or-later

Support

For any issues, contact me at the e-mail address given in the Author section.

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
