Minet

minet is a webmining CLI tool & library for Python. It takes a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.

In addition, minet exposes its high-level programmatic interface as a library so you can tweak its behavior at will.

Features

  • Multithreaded, memory-efficient fetching from the web.
  • Multithreaded, scalable crawling using a comfy DSL.
  • Multiprocessed raw text content extraction from HTML pages.
  • Multiprocessed scraping from HTML pages using a comfy DSL.
  • URL-related heuristics utilities such as extraction, normalization and matching.
  • Data collection from various APIs such as CrowdTangle.
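
To give a rough idea of what multithreaded fetching involves, here is a minimal sketch built only on the Python standard library. It is not minet's API: `fetch_status` and `threaded_fetch` are hypothetical names invented for this illustration, and the thread count and timeout are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Hypothetical helper: return (url, HTTP status or None, error or None)."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.status, None
    except Exception as exc:  # network failures, HTTP errors, malformed URLs
        return url, None, str(exc)


def threaded_fetch(urls, fetcher=fetch_status, threads=4):
    """Yield one result per URL, in input order, using a pool of threads.

    Results are yielded one at a time instead of being collected in a list.
    Note that ThreadPoolExecutor.map still submits every URL up front.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        yield from pool.map(fetcher, urls)


if __name__ == "__main__":
    # Example URLs are placeholders; replace them with your own.
    for url, status, error in threaded_fetch(["https://example.com"]):
        print(url, status, error)
```

The `fetcher` argument is injectable so the threading logic can be exercised without touching the network. minet's own fetching adds much more on top of this basic idea.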

Installation

minet can be installed using pip:

pip install minet
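
After installing, one way to confirm that the package is visible to your interpreter is to query its metadata. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); `installed_version` is a hypothetical helper name, not something minet provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed minet version, or None if pip install has not run.
    print(installed_version("minet"))
```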

Cookbook

To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.

Usage
