A webmining CLI tool & library for Python.
minet is a webmining CLI tool & library for Python. It adopts a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.
minet also exposes its high-level programmatic interface as a library, so you can tweak its behavior at will.
Features
- Multithreaded, memory-efficient fetching from the web.
- Multithreaded, scalable crawling using a comfy DSL.
- Multiprocessed raw text content extraction from HTML pages.
- Multiprocessed scraping from HTML pages using a comfy DSL.
- URL-related heuristics utilities such as extraction, normalization and matching.
- Data collection from various APIs such as CrowdTangle.
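To give a rough idea of what URL normalization and matching involve, here is a minimal sketch built only on the Python standard library. It is not minet's own API: the names `normalize_url`, `same_resource` and `TRACKING_PARAMS`, as well as the specific rules applied (dropping the scheme, a leading `www.`, trailing slashes, fragments and a handful of tracking parameters), are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of query parameters considered irrelevant for matching.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def normalize_url(url):
    """Return a simplified form of `url` suitable for comparing URLs."""
    parts = urlsplit(url.strip())

    # `hostname` is already lowercased by urlsplit; drop a leading "www.".
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]

    # Drop tracking parameters and sort the rest so ordering does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )

    # The scheme and the fragment are deliberately left out of the result.
    normalized = host + parts.path.rstrip("/")
    if query:
        normalized += "?" + urlencode(query)
    return normalized


def same_resource(url_a, url_b):
    """Heuristic match: two URLs are deemed equal if they normalize alike."""
    return normalize_url(url_a) == normalize_url(url_b)


if __name__ == "__main__":
    print(normalize_url("https://www.Example.com/a/?utm_source=x&b=2&a=1"))
    print(same_resource("http://example.com/a?a=1&b=2",
                        "https://www.example.com/a/?b=2&a=1#top"))
```

Rules like these are heuristics: two URLs that normalize to the same string usually, but not always, point to the same resource, so how aggressive the normalization should be depends on the corpus at hand.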
Installation
minet can be installed using pip:
pip install minet
Cookbook
To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.
Usage