
A webmining CLI tool & library for Python.

Minet

minet is a webmining CLI tool & library for Python. It takes a lo-fi approach to webmining problems: you run its actions directly from your command line, and no database is needed, since raw data files are enough to get you going.

minet also exposes its high-level programmatic interface as a library, so you can tweak its behavior at will.

Features

  • Multithreaded, memory-efficient fetching from the web.
  • Multithreaded, scalable crawling using a comfy DSL.
  • Multiprocessed raw text content extraction from HTML pages.
  • Multiprocessed scraping from HTML pages using a comfy DSL.
  • URL-related heuristics utilities such as extraction, normalization and matching.
  • Data collection from various APIs such as CrowdTangle.
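
To give a rough idea of what URL normalization and matching heuristics look like, here is a minimal sketch using only the Python standard library. It does not use minet's own API or implementation: the function names `normalize_url` and `same_page`, as well as the list of tracking parameters, are hypothetical and chosen for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters to drop (an assumption, not minet's list).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

# Default ports that can be dropped without changing the target.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a simplified, comparable form of `url` (illustrative heuristic)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()

    # Lowercase the host and drop a leading "www.".
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Keep the port only when it is not the scheme's default.
    if parts.port and DEFAULT_PORTS.get(scheme) != parts.port:
        host = f"{host}:{parts.port}"

    # Drop trailing slashes, tracking parameters and the fragment,
    # and sort the remaining query parameters.
    path = parts.path.rstrip("/")
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted((k, v) for k, v in pairs if k not in TRACKING_PARAMS))

    return urlunsplit((scheme, host, path, query, ""))


def same_page(a, b):
    """Match two URLs by comparing their normalized forms."""
    return normalize_url(a) == normalize_url(b)
```

With this sketch, `same_page("https://example.com/", "https://www.example.com")` is true, because both URLs normalize to the same string. Real-world heuristics have to handle many more cases than this.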

Installation

minet can be installed using pip:

pip install minet
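
To check that the installation succeeded, one option is to query the installed package metadata from Python. The sketch below relies only on the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("minet")
    if version is None:
        print("minet is not installed in this environment")
    else:
        print(f"minet {version} is installed")
```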

Cookbook

To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.

Usage

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

minet-0.29.1.tar.gz (54.4 kB)

Uploaded Source

Built Distribution

minet-0.29.1-py3-none-any.whl (80.8 kB)

Uploaded Python 3

File details

Details for the file minet-0.29.1.tar.gz.

File metadata

  • Download URL: minet-0.29.1.tar.gz
  • Upload date:
  • Size: 54.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.9

File hashes

Hashes for minet-0.29.1.tar.gz

Algorithm    Hash digest
SHA256       b6047b5cd5bd4b1954131cdeaac85e3015f7d44c4e93101edd452b5fe4240de0
MD5          c52c7e46f145255e7cfa8396dda309ec
BLAKE2b-256  3668049adcaed0955766c429410c94994027ee6a4bef6a8839405be2b48f1b53
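
The digests listed above can be checked against a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `file_digests` and `matches_sha256` are hypothetical, and the path you pass in is whatever location you saved the file to. MD5 is left out here because it is not suitable for integrity checks against tampering.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Compute the SHA256 and BLAKE2b-256 hex digests of the file at `path`."""
    sha256 = hashlib.sha256()
    # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
    blake2b_256 = hashlib.blake2b(digest_size=32)

    # Read in chunks so large files do not need to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)

    return {
        "SHA256": sha256.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return file_digests(path)["SHA256"] == expected_hex.strip().lower()
```

For example, `matches_sha256("minet-0.29.1.tar.gz", "<SHA256 digest from the table>")` should return `True` for an intact download.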


File details

Details for the file minet-0.29.1-py3-none-any.whl.

File metadata

  • Download URL: minet-0.29.1-py3-none-any.whl
  • Upload date:
  • Size: 80.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.9

File hashes

Hashes for minet-0.29.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       c1bec4af2bd31796a5553321c987a6f22a75ee26938b5800ac8f59a1cb8022b6
MD5          921931ab85c7f5b0ad1ced12d1b812ed
BLAKE2b-256  fe2affc933567661a3b4b51219321a7cebafe9d8cd94a502c2210576da816e23

