Scrapy extension to write scraped items using MongoEngine documents
Bringing Scrapy and MongoEngine together.
scrapy-mongoengine-item is an extension that allows you to define Scrapy items using existing MongoEngine documents.
Documentation is available on Read the Docs.
Prerequisites
Both Python 2.7 and Python 3.5/3.6 are supported. For Python 3 you need Scrapy v1.1 or above.
The latest tested MongoEngine version is 0.17.0.
Installation
Install the latest stable version from PyPI:
pip install scrapy-mongoengine-item
or the latest stable version from GitHub:
pip install https://github.com/barseghyanartur/scrapy-mongoengine-item/archive/stable.tar.gz
or the latest stable version from BitBucket:
pip install https://bitbucket.org/barseghyanartur/scrapy-mongoengine-item/get/stable.tar.gz
Introduction
MongoEngineItem is a class of item that gets its field definitions from a MongoEngine document. You simply create a MongoEngineItem and specify which MongoEngine document it relates to.
Besides defining the document fields on your item, MongoEngineItem provides a method to create and populate a MongoEngine document instance with the item data.
Usage
MongoEngineItem works as follows: you create a subclass and define its mongoengine_document attribute to be a valid MongoEngine document. With this you will get an item with a field for each MongoEngine document field.
In addition, you can define fields that aren’t present in the document, and even override fields that are present in the document by redefining them in the item.
Let’s see some examples:
Creating a MongoEngine document for the examples:
from mongoengine import fields, document

class Person(document.Document):
    name = fields.StringField(max_length=255)
    age = fields.IntField()
Defining a basic MongoEngineItem:
from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):
    mongoengine_document = Person
MongoEngineItem works just like Scrapy items:
p = PersonItem()
p['name'] = 'John'
p['age'] = 22
To obtain the MongoEngine document from the item, call the extra method MongoEngineItem.save():
person = p.save()
person.name
# 'John'
person.age
# 22
person.id
# ObjectId('...'), generated when the document is saved
By default, MongoEngineItem.save() saves the document as soon as it is called. To obtain an unsaved document instead, call it with commit=False:
person = p.save(commit=False)
person.name
# 'John'
person.age
# 22
person.id
# None
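In a Scrapy project, the save() call shown above would usually live in an item pipeline rather than in the spider. The following is a minimal sketch under stated assumptions, not something shipped with scrapy-mongoengine-item: the names MongoEngineSavePipeline and DemoItem are hypothetical, and a MongoEngine connection is assumed to have been opened elsewhere (for example with mongoengine.connect). DemoItem only stands in for a MongoEngineItem so that the sketch runs without MongoDB.

```python
class MongoEngineSavePipeline:
    """Hypothetical pipeline: persists any item that offers save()."""

    def process_item(self, item, spider):
        # Items without a save() method pass through untouched.
        save = getattr(item, "save", None)
        if callable(save):
            save()  # for a MongoEngineItem this writes the document
        return item  # hand the item on to the next pipeline


class DemoItem(dict):
    """Stand-in for a MongoEngineItem (assumption, for demonstration only)."""

    saved = False

    def save(self, commit=True):
        self.saved = commit
        return self


demo = DemoItem(name="John", age=22)
processed = MongoEngineSavePipeline().process_item(demo, spider=None)
print(processed is demo, demo.saved)  # True True
```

Such a pipeline would be enabled through Scrapy's ITEM_PIPELINES setting, using whatever module path the class has in your project.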
As said before, we can add other fields to the item:
import scrapy
from scrapy_mongoengine_item import MongoEngineItem

class PersonItem(MongoEngineItem):
    mongoengine_document = Person
    sex = scrapy.Field()

p = PersonItem()
p['name'] = 'John'
p['age'] = 22
p['sex'] = 'M'
We can also override the fields of the document with our own:
class PersonItem(MongoEngineItem):
    mongoengine_document = Person
    name = scrapy.Field(default='No Name')
This is useful for attaching properties to a field, such as a default or any other property that your project uses. Fields defined only on the item (not on the document) are not taken into account by MongoEngineItem.save().
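The behaviour described in this section (an item that accepts the document's fields plus extra item-only fields, and ignores the extra ones when building the document) can be pictured with a small library-free model. This is only an illustration: FakeDocument, SketchItem, document_fields, extra_fields and to_document are hypothetical names, and the code is not the implementation used by scrapy-mongoengine-item.

```python
# Library-free model of the behaviour described above. Every name here is
# hypothetical and is not part of scrapy-mongoengine-item.


class FakeDocument:
    """Stands in for a MongoEngine document with two declared fields."""

    document_fields = ("name", "age")

    def __init__(self, **values):
        for field_name in self.document_fields:
            setattr(self, field_name, values.get(field_name))


class SketchItem(dict):
    """Dict-like item that accepts document fields plus extra fields."""

    document_class = FakeDocument
    extra_fields = ("sex",)  # fields declared on the item only

    def __setitem__(self, key, value):
        allowed = self.document_class.document_fields + self.extra_fields
        if key not in allowed:
            # Scrapy items likewise raise KeyError for undeclared fields.
            raise KeyError("%s does not support field: %s"
                           % (type(self).__name__, key))
        super().__setitem__(key, value)

    def to_document(self):
        """Build a document from the document fields only."""
        document_values = {
            key: value
            for key, value in self.items()
            if key in self.document_class.document_fields
        }
        return self.document_class(**document_values)


item = SketchItem()
item["name"] = "John"
item["age"] = 22
item["sex"] = "M"  # kept on the item, ignored when building the document

person = item.to_document()
print(person.name, person.age, hasattr(person, "sex"))  # John 22 False
```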
Development
Testing
To run the tests in your working environment, type:
./runtests.py
To test with all supported Python versions, type:
tox
Running MongoDB
The easiest way is to run it via Docker:
docker pull mongo:latest
docker run -p 27017:27017 mongo:latest
Writing documentation
Keep the following heading hierarchy:
=====
title
=====
header
======
sub-header
----------
sub-sub-header
~~~~~~~~~~~~~~
sub-sub-sub-header
^^^^^^^^^^^^^^^^^^
sub-sub-sub-sub-header
++++++++++++++++++++++
sub-sub-sub-sub-sub-header
**************************
License
GPL-2.0-only OR LGPL-2.1-or-later
Support
For any issues contact me at the e-mail given in the Author section.