Simple tools for downloading, cleaning, extracting and parsing content
snagit
Yet another scraping tool.
snagit lets you scrape multiple pages or documents, either by running script files or interactively in the REPL. For instance:
    $ snagit
    Type "help" for more information. Ctrl+c to exit
    > load http://httpbin.org/links/3/{} range='0-2'
    > print
    <html><head><title>Links</title></head><body>0 <a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>
    <html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> 1 <a href='/links/3/2'>2</a> </body></html>
    <html><head><title>Links</title></head><body><a href='/links/3/0'>0</a> <a href='/links/3/1'>1</a> 2 </body></html>
    > select a
    > print
    <a href="/links/3/1">1</a>
    <a href="/links/3/2">2</a>
    <a href="/links/3/0">0</a>
    <a href="/links/3/2">2</a>
    <a href="/links/3/0">0</a>
    <a href="/links/3/1">1</a>
    > unwrap_attr a href
    > print
    /links/3/1
    /links/3/2
    /links/3/0
    /links/3/2
    /links/3/0
    /links/3/1
    > list
    LOAD 'http://httpbin.org/links/3/{}' range='0-2'
    PRINT
    SELECT 'a'
    PRINT
    UNWRAP_ATTR 'a' 'href'
    PRINT
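To make the session concrete, here is a minimal, stdlib-only Python sketch of the same pipeline: fill the `{}` placeholder for each value in `range='0-2'`, then collect every `<a>` tag's `href` (roughly what `select a` followed by `unwrap_attr a href` produces). This is not snagit's implementation or API. `expand_range`, `AnchorHrefs` and `extract_hrefs` are hypothetical names, the standard library's `html.parser` stands in for BeautifulSoup, and the network fetch is omitted so the example runs offline on the first page printed above.

```python
from html.parser import HTMLParser


def expand_range(template, spec):
    """Hypothetical helper: fill a '{}' URL template with each integer
    in an inclusive 'start-end' range, like range='0-2' above."""
    start, end = (int(part) for part in spec.split("-"))
    return [template.format(i) for i in range(start, end + 1)]


class AnchorHrefs(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def extract_hrefs(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = AnchorHrefs()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    urls = expand_range("http://httpbin.org/links/3/{}", "0-2")
    print(urls[0], urls[-1])  # http://httpbin.org/links/3/0 http://httpbin.org/links/3/2
    # First page from the session above, used in place of a live request.
    page0 = ("<html><head><title>Links</title></head><body>0 "
             "<a href='/links/3/1'>1</a> <a href='/links/3/2'>2</a> </body></html>")
    print(extract_hrefs(page0))  # ['/links/3/1', '/links/3/2']
```

Running `extract_hrefs` over all three pages in order yields the six paths that the final `print` in the session shows.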
Features
Process data as either a text block, lines of text, or HTML (using BeautifulSoup)
Built-in scripting language
REPL for interactive command-line use
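The shape of the built-in scripting language can be inferred from the `list` output in the session above: one command per line, a command name, quoted positional arguments, and `key=value` options. The following is a minimal Python sketch of a line parser under that assumption only. `parse_line` and `parse_script` are hypothetical, `shlex` is used for the quoting rules, and none of this reflects snagit's actual grammar.

```python
import shlex


def parse_line(line):
    """Split one script line into (COMMAND, args, options).

    Assumes shell-style quoting and key=value options, as inferred from
    the `list` output; returns None for blank lines.
    """
    tokens = shlex.split(line)
    if not tokens:
        return None
    # The REPL accepts `load` while `list` echoes LOAD, so normalize case.
    command, args, options = tokens[0].upper(), [], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        # Only identifier-like keys count as options, so a URL such as
        # http://host/?q=1 stays a positional argument.
        if sep and key.isidentifier():
            options[key] = value
        else:
            args.append(token)
    return command, args, options


def parse_script(text):
    """Parse every non-blank line of a script."""
    parsed = (parse_line(line) for line in text.splitlines())
    return [item for item in parsed if item is not None]


if __name__ == "__main__":
    script = """
LOAD 'http://httpbin.org/links/3/{}' range='0-2'
PRINT
SELECT 'a'
UNWRAP_ATTR 'a' 'href'
"""
    for command in parse_script(script):
        print(command)
```

Normalizing the command name to upper case lets the same parser accept both the lowercase commands typed at the REPL prompt and the uppercase form that `list` prints.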
Requirements
Python 3.5+
bs4 (BeautifulSoup 4.x)
requests
strutil
cachely
For testing:
pytest
pytest-cov
Development and Testing
Assumptions: you have pip, virtualenv, and Invoke (which provides the inv command) installed.
    $ virtualenv snagit
    $ cd snagit
    $ source bin/activate
    $ git clone https://github.com/dakrauth/snagit.git
    $ cd snagit
    $ inv develop
    $ inv test
    $ inv cov
Download files
Source Distribution
snagit-0.3.0.tar.gz (13.3 kB)
Built Distribution
snagit-0.3.0-py3-none-any.whl (16.4 kB)
File details
Details for the file snagit-0.3.0.tar.gz.
File metadata
- Download URL: snagit-0.3.0.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | d7b03cbd8eccfe492de2e400e7bd23a4ba85d46f4836ef139902040f3c34c920
MD5 | 4be781b4cd3b554e40c1f5c06e423473
BLAKE2b-256 | 02f709f9d3f301932ff8dc13124623fa489f9b3dd8a7374da0d1514d98ea4769
File details
Details for the file snagit-0.3.0-py3-none-any.whl.
File metadata
- Download URL: snagit-0.3.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 348fde452098dc7deb024778e081128d15d176d4fff6e4cc48e7ae2d038a4b2f
MD5 | d73cd1848693930067b5fa4db31656dd
BLAKE2b-256 | 55d88db31f93564a87dcaff628574fe1334ce41f208356ba3c84548a3e363f09
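The published digests let you check a downloaded file before installing it. Below is a minimal Python sketch that compares a local file's SHA256 against the values in the tables above; the script name `verify_snagit.py` and the helper names are illustrative only. Command-line tools such as `sha256sum` do the same job.

```python
import hashlib
import os
import sys

# SHA256 digests copied from the file hashes tables above.
EXPECTED_SHA256 = {
    "snagit-0.3.0.tar.gz":
        "d7b03cbd8eccfe492de2e400e7bd23a4ba85d46f4836ef139902040f3c34c920",
    "snagit-0.3.0-py3-none-any.whl":
        "348fde452098dc7deb024778e081128d15d176d4fff6e4cc48e7ae2d038a4b2f",
}


def sha256_of_chunks(chunks):
    """Hash an iterable of byte strings incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_path(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    with open(path, "rb") as fh:
        return sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches_published(filename, hexdigest):
    """True if hexdigest equals the published SHA256 for this file name."""
    expected = EXPECTED_SHA256.get(os.path.basename(filename))
    return expected is not None and hexdigest.lower() == expected


if __name__ == "__main__":
    # Usage: python verify_snagit.py snagit-0.3.0.tar.gz snagit-0.3.0-py3-none-any.whl
    for path in sys.argv[1:]:
        ok = matches_published(path, sha256_of_path(path))
        print(path, "OK" if ok else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, and looking up the digest by base name lets the script accept paths such as `downloads/snagit-0.3.0.tar.gz`.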