DOM Traversing and Scraping using GraphQL
Project description
GDOM is the next generation of web-parsing, powered by GraphQL syntax and the Graphene framework.
Install it by running the following in your console:
pip install gdom
DEMO: Try GDOM online
Usage
You can either run gdom --test to start a test server for trying out queries, or run a query file directly:
gdom QUERY_FILE
This command writes the resulting JSON to standard output (or to another destination if specified via --output).
Your QUERY_FILE could look similar to this:
{
  page(url:"http://news.ycombinator.com") {
    items: query(selector:"tr.athing") {
      rank: text(selector:"td span.rank")
      title: text(selector:"td.title a")
      sitebit: text(selector:"span.comhead a")
      url: attr(selector:"td.title a", name:"href")
      attrs: next {
        score: text(selector:"span.score")
        user: text(selector:"a:eq(0)")
        comments: text(selector:"a:eq(2)")
      }
    }
  }
}
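The aliases in the query (items, rank, title, attrs and so on) become the keys of the resulting JSON, so the output can be consumed with a few lines of Python. The following is a minimal sketch, not part of GDOM itself. It assumes the result is nested as page, then items, and it tolerates an optional GraphQL-style "data" wrapper because the exact top-level shape is not stated here. The function name extract_items and the sample payload are hypothetical.

```python
import json


def extract_items(raw_json):
    """Return the list stored under page -> items, or [] if it is absent."""
    result = json.loads(raw_json)
    # Assumption: the output may or may not be wrapped in a "data" key.
    if isinstance(result, dict) and "data" in result:
        result = result["data"]
    page = (result or {}).get("page") or {}
    return page.get("items") or []


# Hypothetical, hand-written payload used only to exercise the function;
# it is not real GDOM output.
sample = json.dumps({
    "page": {
        "items": [
            {
                "rank": "1.",
                "title": "Example title",
                "sitebit": "example.com",
                "url": "http://example.com/",
                "attrs": {
                    "score": "10 points",
                    "user": "someone",
                    "comments": "3 comments",
                },
            }
        ]
    }
})

for item in extract_items(sample):
    print(item["rank"], item["title"], item["attrs"]["score"])
```

In practice the string passed to extract_items would be whatever gdom QUERY_FILE wrote to standard output.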
Advanced usage
To make a gdom query work with any page, rewrite the query file to take a $page variable. It should look something like this:
query ($page: String) {
  page(url:$page) {
    # ...
  }
}
Then run it, passing the page URL after the query file:
gdom QUERY_FILE http://news.ycombinator.com
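Because the page URL is a plain command-line argument, the same query file can be reused from a script. The sketch below wraps the command shown above with Python's standard subprocess module. It assumes gdom is installed and on the PATH, and the helper names build_command and run_query and the file name query.graphql are hypothetical.

```python
import json
import subprocess


def build_command(query_file, page_url=None):
    """Build the argument list for a gdom invocation."""
    command = ["gdom", query_file]
    if page_url is not None:
        # The URL follows the query file, as in: gdom QUERY_FILE URL
        command.append(page_url)
    return command


def run_query(query_file, page_url=None):
    """Run gdom and parse the JSON it writes to standard output.

    Assumption: gdom is installed and exits with status 0 on success.
    """
    completed = subprocess.run(
        build_command(query_file, page_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Print the command that would be executed for each page, without
# running it. "query.graphql" is a hypothetical file name.
for url in ["http://news.ycombinator.com"]:
    print(" ".join(build_command("query.graphql", url)))
```

Calling run_query("query.graphql", url) instead of printing would execute the command and return the parsed result.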
Download files
Source Distribution
File details
Details for the file gdom-1.0.0.tar.gz.
File metadata
- Download URL: gdom-1.0.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66b092ea846cf9b462d7fad23181cbc07603e43b662d39d7a2dacfe502af56ac
MD5 | f1ec05032cefc74d023fcdf8a24177f6
BLAKE2b-256 | 76791ccbf38c32576dbb29efdc35819f96a99768266cdf6dc1586aef0e9fbe73