Get book information (isbn or search) from real bookstores.
Web scraping to get book information
This library retrieves book information. We can search by keywords, by ISBN, or with an advanced search, and we can paginate through the results.
We get the data from existing websites. We scrape:
for French books: http://www.librairie-de-paris.fr (also Decitre, but it’s less complete)
for Spain: http://www.casadellibro.com
for Germany: the site went down! This is the danger of web scraping.
We get the title and authors, the price, the publisher(s), the cover, etc.
Import data from an ods or csv file
If your file has an ‘isbn’ and a ‘quantity’ column, it’s easy, we will find all the book information.
If it has the title and the publisher, it’s doable but error-prone. We can still do it, but you should do an inventory of your stock afterwards.
See the odsimport module. It returns JSON. It’s your responsibility to add what you want to your database (this is done in Abelujo https://gitlab.com/vindarel/abelujo).
Usable, but work in progress.
Accepted format and columns
We can read ods and csv files.
a file with an “isbn” and a “quantity” column,
a file with the columns “title”, “publisher”, “isbn” (optional in this case), “shelf”, “distributor”, “quantity”. There is no “price” column. “authors” is optional (it can help to fetch the correct book).
If the file has no headers, use the “odsettings.py” configuration file (in particular, to set the csv delimiter, either “,” or “;”).
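As an illustration of the simplest accepted format, here is a minimal sketch that reads a csv with “isbn” and “quantity” columns using only the standard library. It is not the odsimport module: the function name read_isbn_quantities, the default quantity of 1 and the removal of hyphens from the isbn are assumptions made for this example.

```python
import csv
import io


def read_isbn_quantities(csv_text, delimiter=","):
    """Return a list of (isbn, quantity) tuples from csv text with a header row.

    Hypothetical helper, not part of the bookshops library.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for row in reader:
        # Normalize header names so that "ISBN" and " isbn " both match.
        clean = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore extra cells beyond the header
        }
        isbn = clean.get("isbn", "").replace("-", "")
        if not isbn:
            # Rows without an isbn would need the title/publisher lookup.
            continue
        # Assumption: an empty quantity cell means one copy.
        quantity = int(clean.get("quantity") or 1)
        rows.append((isbn, quantity))
    return rows


sample = "isbn;quantity\n9782918059363;3\n"
print(read_isbn_quantities(sample, delimiter=";"))
# prints [('9782918059363', 3)]
```

Each isbn collected this way can then be passed to a scraper to fetch the full book information.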
Why not Amazon?
Amazon kills the book industry and its employees. More importantly, we can add value to our results. We can link to a good, independent bookshop from within our application, we could order books from it, we could say whether it has copies in stock or not, etc. And… we learn a lot by doing this!
Why not Google Books?
It has very little data.
Why not the BNF (Bibliothèque Nationale de France)?
Because bookshops need recent books (they enter the BNF database only after a few months) and up-to-date information. There aren’t many tools for it either.
Install
Install from pypi:
pip install bookshops
It is usable, but not considered mature.
Use
Command line
You can try this lib on the command line with the following commands:
- livres: French data
- libros: Spanish data
- come and ask for more :)
For example:
livres antigone
or
livres 9782918059363
and you get the above screenshot.
As a library
But most of all, from within your program:
from bookshops.frFR.librairiedeparis.librairiedeparisScraper import Scraper as frenchScraper

scraper = frenchScraper("search keywords")
cards = scraper.search()
# we get a list of dictionaries with the title, the authors, etc.
Advanced search
Work in progress.
You can restrict a search to a specific publisher with the ed: prefix, for example ed:agone.
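To make the idea of a prefixed search term concrete, here is a minimal sketch of how a query could be split into free keywords and key:value filters. This is not the library's parser: the function split_query is hypothetical, and ed: is the only prefix this README mentions.

```python
def split_query(query):
    """Split a query into free keywords and `key:value` filters.

    Hypothetical helper for illustration, not the bookshops parser.
    """
    keywords = []
    filters = {}
    for word in query.split():
        key, sep, value = word.partition(":")
        if sep and key and value:
            # A word such as "ed:agone" becomes the filter {"ed": "agone"}.
            filters[key] = value
        else:
            keywords.append(word)
    return keywords, filters


print(split_query("antigone ed:agone"))
# prints (['antigone'], {'ed': 'agone'})
```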
Pagination
We do pagination:
scraper = frenchScraper("search keywords", PAGE=2)
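Building on the PAGE argument, here is a minimal sketch that collects results over several pages. It assumes that pages are numbered from 1, that search() returns a list, and that an empty list means there are no more results. None of this is confirmed here, so check it against the version you install. FakeScraper is a hypothetical offline stand-in used to keep the example runnable without network access.

```python
def search_all_pages(scraper_class, query, max_pages=3):
    """Collect results page by page until a page comes back empty."""
    cards = []
    for page in range(1, max_pages + 1):
        # Same call shape as in the README: Scraper(query, PAGE=n).search()
        results = scraper_class(query, PAGE=page).search()
        if not results:
            break
        cards.extend(results)
    return cards


class FakeScraper:
    """Hypothetical stand-in with the same call shape as the real scraper."""

    PAGES = {1: [{"title": "Antigone"}], 2: [{"title": "Antigone, suite"}]}

    def __init__(self, query, PAGE=1):
        self.query = query
        self.page = PAGE

    def search(self):
        return self.PAGES.get(self.page, [])


print(len(search_all_pages(FakeScraper, "antigone")))
# prints 2
```

With the real library, one would pass frenchScraper instead of FakeScraper. Keeping max_pages small avoids sending too many requests to the bookshop's site.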
Develop and test
See http://dev.abelujo.cc/webscraping.html
Development mode:
pip install -e .
Now you can edit the project and run the development version the way the lib is meant to be run, i.e. with the entry_points: livres, libros, etc.
Bugs and shortcomings
This is webscraping, so it doesn’t go without pitfalls:
the site can go down. It has already happened.
the site can change, in which case we would have to change our scraper too. This can be caught early with automated and frequent tests (work ongoing).
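One way such an automated test could look is sketched below: after a search, check that every card still carries the fields we expect, since a site redesign typically shows up as empty fields. This is not the project's test suite. The key names "title", "authors" and "price" are assumptions about the card dictionaries, and find_broken_cards is a hypothetical helper.

```python
# Assumed key names: adjust them to the actual card dictionaries.
REQUIRED_FIELDS = ("title", "authors", "price")


def find_broken_cards(cards, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) for every card lacking a required field.

    Empty values (empty string, empty list, None, 0) count as missing.
    """
    problems = []
    for index, card in enumerate(cards):
        missing = [field for field in required if not card.get(field)]
        if missing:
            problems.append((index, missing))
    return problems


cards = [
    {"title": "Antigone", "authors": ["Sophocle"], "price": 9.5},
    {"title": "", "authors": [], "price": 12},
]
print(find_broken_cards(cards))
# prints [(1, ['title', 'authors'])]
```

Run on a schedule against a known search, a non-empty result from this check would signal that the scraper needs updating.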
Licence
LGPLv3