minet

A webmining CLI tool & library for Python.

Minet

minet is a webmining CLI tool & library for Python. It adopts a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.

In addition, minet also exposes its high-level programmatic interface as a library so you can tweak its behavior at will.

Features

Multithreaded, memory-efficient fetching from the web.
Multithreaded, scalable crawling using a comfy DSL.
Multiprocessed raw text content extraction from HTML pages.
Multiprocessed scraping from HTML pages using a comfy DSL.
URL-related heuristics utilities such as normalization and matching.
Data collection from various APIs such as CrowdTangle.

Installation

minet can be installed using pip:

pip install minet

Cookbook

To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.

Usage

CLI

Global utilities

-h/--help/help
--version

Basic commands

crawl
fetch
extract
scrape
url-join
url-parse

Platform-related commands

crowdtangle (ct)
- leaderboard
- lists
- posts
- search
- summary
facebook (fb)
- comments
hyphe
- dump
mediacloud (mc)
- topic
  - stories

CLI

-h/--help

If you need help about a command, don't hesitate to use the -h/--help flag or the help command:

minet ct posts -h
# or:
minet ct posts --help
# or
minet help ct posts

To check the installed version of minet, you can use the --version flag:

minet --version
>>> minet x.x.x

crawl

usage: minet crawl [-h] [-d OUTPUT_DIR] [--resume] [--throttle THROTTLE] crawler

Minet Crawl Command
===================

Use multiple threads to crawl the web using minet crawling and
scraping DSL.

positional arguments:
  crawler                                 Path to the crawler definition file.

optional arguments:
  -h, --help                              show this help message and exit
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR  Output directory.
  --resume                                Whether to resume an interrupted crawl.
  --throttle THROTTLE                     Time to wait - in seconds - between 2 calls to the same domain. Defaults to 0.2.

examples:

. TODO:
    `minet crawl`

fetch

usage: minet fetch [-h] [--compress] [--contents-in-report] [-d OUTPUT_DIR]
                   [-f FILENAME] [--filename-template FILENAME_TEMPLATE]
                   [-g {firefox,chrome}] [-H HEADERS] [--resume]
                   [--standardize-encoding] [-o OUTPUT] [-s SELECT] [-t THREADS]
                   [--throttle THROTTLE] [--total TOTAL]
                   [--url-template URL_TEMPLATE] [-X METHOD]
                   column [file]

Minet Fetch Command
===================

Use multiple threads to fetch batches of urls from a CSV file. The
command outputs a CSV report with additional metadata about the
HTTP calls and will generally write the retrieved files in a folder
given by the user.

positional arguments:
  column                                          Column of the CSV file containing urls to fetch.
  file                                            CSV file containing the urls to fetch.

optional arguments:
  -h, --help                                      show this help message and exit
  --compress                                      Whether to compress the contents.
  --contents-in-report, --no-contents-in-report   Whether to include retrieved contents, e.g. html, directly in the report
                                                  and avoid writing them in a separate folder. This requires to standardize
                                                  encoding and won't work on binary formats.
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR          Directory where the fetched files will be written. Defaults to "content".
  -f FILENAME, --filename FILENAME                Name of the column used to build retrieved file names. Defaults to an uuid v4 with correct extension.
  --filename-template FILENAME_TEMPLATE           A template for the name of the fetched files.
  -g {firefox,chrome}, --grab-cookies {firefox,chrome}
                                                  Whether to attempt to grab cookies from your computer's browser.
  -H HEADERS, --header HEADERS                    Custom headers used with every requests.
  --resume                                        Whether to resume from an aborted report.
  --standardize-encoding                          Whether to systematically convert retrieved text to UTF-8.
  -o OUTPUT, --output OUTPUT                      Path to the output report file. By default, the report will be printed to stdout.
  -s SELECT, --select SELECT                      Columns to include in report (separated by `,`).
  -t THREADS, --threads THREADS                   Number of threads to use. Defaults to 25.
  --throttle THROTTLE                             Time to wait - in seconds - between 2 calls to the same domain. Defaults to 0.2.
  --total TOTAL                                   Total number of lines in CSV file. Necessary if you want to display a finite progress indicator.
  --url-template URL_TEMPLATE                     A template for the urls to fetch. Handy e.g. if you need to build urls from ids etc.
  -X METHOD, --request METHOD                     The http method to use. Will default to GET.

examples:

. Fetching a batch of url from existing CSV file:
    `minet fetch url_column file.csv > report.csv`

. CSV input from stdin:
    `xsv select url_column file.csv | minet fetch url_column > report.csv`

. Fetching a single url, useful to pipe into `minet scrape`:
    `minet fetch http://google.com | minet scrape ./scrape.json > scraped.csv`
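
The `column [file]` arguments above expect a CSV file whose header row names the url column. As a minimal sketch, assuming a column called `url_column` as in the examples (the helper `write_url_csv` is hypothetical and not part of minet), such a file could be produced with Python's standard `csv` module:

```python
import csv
import io


def write_url_csv(urls, handle, column="url_column"):
    """Write urls as a one-column CSV with a header row (hypothetical helper)."""
    writer = csv.writer(handle)
    writer.writerow([column])
    for url in urls:
        writer.writerow([url])


# Writing to an in-memory buffer here; pass an open file to create file.csv.
buffer = io.StringIO()
write_url_csv(["https://google.com", "https://twitter.com"], buffer)
csv_text = buffer.getvalue()
```

Once written to disk as `file.csv`, the result can be passed to `minet fetch url_column file.csv > report.csv`.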

extract

To use the extract command, you need to install the dragnet library. Because it is a bit cumbersome to install, it is not included in minet's dependencies yet.

Run the following commands in order (dragnet needs specific dependencies installed before it can compile its native files):

pip install lxml numpy Cython
pip install dragnet

usage: minet extract [-h] [-e {dragnet,html2text}] [-i INPUT_DIRECTORY]
                     [-o OUTPUT] [-p PROCESSES] [-s SELECT] [--total TOTAL]
                     [report]

Minet Extract Command
=====================

Use multiple processes to extract raw text from a batch of HTML files.
This command can either work on a `minet fetch` report or on a bunch
of files. It will output an augmented report with the extracted text.

positional arguments:
  report                                          Input CSV fetch action report file.

optional arguments:
  -h, --help                                      show this help message and exit
  -e {dragnet,html2text}, --extractor {dragnet,html2text}
                                                  Extraction engine to use. Defaults to `dragnet`.
  -i INPUT_DIRECTORY, --input-directory INPUT_DIRECTORY
                                                  Directory where the HTML files are stored. Defaults to "content".
  -o OUTPUT, --output OUTPUT                      Path to the output report file. By default, the report will be printed to stdout.
  -p PROCESSES, --processes PROCESSES             Number of processes to use. Defaults to 4.
  -s SELECT, --select SELECT                      Columns to include in report (separated by `,`).
  --total TOTAL                                   Total number of HTML documents. Necessary if you want to display a finite progress indicator.

examples:

. Extracting raw text from a `minet fetch` report:
    `minet extract report.csv > extracted.csv`

. Working on a report from stdin:
    `minet fetch url_column file.csv | minet extract > extracted.csv`

. Extracting raw text from a bunch of files:
    `minet extract --glob "./content/*.html" > extracted.csv`
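
To give an idea of what "raw text" means here, the sketch below strips tags from an HTML string using only the standard library. It is a crude stand-in written for illustration: it is not what `dragnet` or `html2text` do internally, and the names `TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text chunks of an HTML string joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


text = extract_text(
    "<html><head><style>p {}</style></head>"
    "<body><h1>Title</h1><p>Hello world</p></body></html>"
)
```

A real extraction engine also has to decide which parts of the page are actual content rather than navigation or ads, which this sketch does not attempt.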

scrape

TODO: document the scraping DSL

usage: minet scrape [-h] [-f {csv,jsonl}] [-g GLOB] [-i INPUT_DIRECTORY]
                    [-o OUTPUT] [-p PROCESSES] [--total TOTAL]
                    scraper [report]

Minet Scrape Command
====================

Use multiple processes to scrape data from a batch of HTML files.
This command can either work on a `minet fetch` report or on a bunch
of files. It will output the scraped items.

positional arguments:
  scraper                                         Path to a scraper definition file.
  report                                          Input CSV fetch action report file.

optional arguments:
  -h, --help                                      show this help message and exit
  -f {csv,jsonl}, --format {csv,jsonl}            Output format.
  -g GLOB, --glob GLOB                            Whether to scrape a bunch of html files on disk matched by a glob pattern rather than sourcing them from a CSV report.
  -i INPUT_DIRECTORY, --input-directory INPUT_DIRECTORY
                                                  Directory where the HTML files are stored. Defaults to "content".
  -o OUTPUT, --output OUTPUT                      Path to the output report file. By default, the report will be printed to stdout.
  -p PROCESSES, --processes PROCESSES             Number of processes to use. Defaults to 4.
  --total TOTAL                                   Total number of HTML documents. Necessary if you want to display a finite progress indicator.

examples:

. Scraping item from a `minet fetch` report:
    `minet scrape scraper.json report.csv > scraped.csv`

. Working on a report from stdin:
    `minet fetch url_column file.csv | minet scrape scraper.json > scraped.csv`

. Scraping a single page from the web:
    `minet fetch https://news.ycombinator.com/ | minet scrape scraper.json > scraped.csv`

. Scraping items from a bunch of files:
    `minet scrape scraper.json --glob "./content/*.html" > scraped.csv`

url-join

usage: minet url-join [-h] [-o OUTPUT] [-s SELECT] column1 file1 column2 file2

Minet Url Join Command
======================

Join two CSV files by matching them on columns containing urls. In
fact, the command will index the first file's urls into a
hierarchical trie before attempting to match the second file's ones.

positional arguments:
  column1                     Name of the url column in the first file.
  file1                       Path to the first file.
  column2                     Name of the url column in the second file.
  file2                       Path to the second file.

optional arguments:
  -h, --help                  show this help message and exit
  -o OUTPUT, --output OUTPUT  Path to the output joined file. By default, the join will be printed to stdout.
  -s SELECT, --select SELECT  Columns from the first file to keep, separated by comma.

examples:

. Joining two files:
    `minet url-join url webentities.csv post_url posts.csv > joined.csv`

. Keeping only some columns from first file:
    `minet url-join url webentities.csv post_url posts.csv -s url,id > joined.csv`
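
The idea of indexing urls into a hierarchical trie and then matching other urls against it can be illustrated with a toy prefix tree. This is only a sketch under simplifying assumptions: the tokenization (reversed host labels followed by path segments, with a leading `www.` dropped) and the names `url_tokens` and `UrlTrie` are hypothetical, and minet's actual implementation may well differ.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a url into hierarchical tokens: reversed host labels, then path segments."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    tokens = list(reversed(host.split("."))) if host else []
    tokens.extend(segment for segment in parts.path.split("/") if segment)
    return tokens


class UrlTrie:
    """Toy prefix tree mapping url prefixes to values (illustration only)."""

    def __init__(self):
        self.root = {}

    def set(self, url, value):
        node = self.root
        for token in url_tokens(url):
            node = node.setdefault(token, {})
        node[None] = value  # the None key marks the end of an indexed prefix

    def longest_match(self, url):
        """Return the value of the longest indexed prefix of url, or None."""
        node = self.root
        match = node.get(None)
        for token in url_tokens(url):
            node = node.get(token)
            if node is None:
                break
            if None in node:
                match = node[None]
        return match


trie = UrlTrie()
trie.set("https://www.example.org", "Example")
trie.set("https://www.example.org/blog", "Example Blog")
```

With this index, `trie.longest_match("https://example.org/blog/post-1.html")` picks the most specific indexed prefix, which is the kind of lookup a url-based join needs for each row of the second file.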

url-parse

usage: minet url-parse [-h] [-o OUTPUT] [-s SELECT] [--separator SEPARATOR]
                       [--total TOTAL]
                       column [file]

Minet Url Parse Command
=======================

Overload a CSV file containing urls with a selection of additional
metadata such as their normalized version, domain name etc.

positional arguments:
  column                      Name of the column containing urls.
  file                        Target CSV file.

optional arguments:
  -h, --help                  show this help message and exit
  -o OUTPUT, --output OUTPUT  Path to the output file. By default, the result will be printed to stdout.
  -s SELECT, --select SELECT  Columns to keep in output, separated by comma.
  --separator SEPARATOR       Split url column by a separator?
  --total TOTAL               Total number of lines in CSV file. Necessary if you want to display a finite progress indicator.

examples:

. Creating a report about a file's urls:
    `minet url-parse url posts.csv > report.csv`

. Keeping only selected columns from the input file:
    `minet url-parse url posts.csv -s id,url,title > report.csv`

. Multiple urls joined by separator:
    `minet url-parse urls posts.csv --separator "|" > report.csv`
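
As a rough idea of the kind of metadata involved, the sketch below derives a simplified normalized form and a hostname from a url using the standard library. The normalization rules and the keys `normalized_url` and `hostname` are assumptions made for illustration; they are not minet's actual rules or output columns.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return a rough normalized form and the hostname of a url (illustration only)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the scheme, the fragment and any trailing slash; keep the query string.
    normalized = host + parts.path.rstrip("/")
    if parts.query:
        normalized += "?" + parts.query
    return {"normalized_url": normalized, "hostname": host}


info = describe_url("https://www.Example.org/some/page/?id=3#section")
```

Normalizing urls this way makes variants of the same address (with or without `www.`, scheme or fragment) comparable, which is useful before grouping or joining rows.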

CrowdTangle

usage: minet crowdtangle [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT] [-t TOKEN]
                         {leaderboard,lists,posts,search,summary} ...

Minet Crowdtangle Command
=========================

Gather data from the CrowdTangle APIs easily and efficiently.

optional arguments:
  -h, --help                                show this help message and exit
  --rate-limit RATE_LIMIT                   Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT                Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN                   CrowdTangle dashboard API token.

actions:
  {leaderboard,lists,posts,search,summary}  Action to perform using the CrowdTangle API.
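
To make the `--rate-limit` option concrete: with the default of 6 hits per minute, calls spaced evenly would be 10 seconds apart. The sketch below shows one simple way to enforce such a limit. It is an illustration only, not a description of how minet schedules its calls, and `seconds_between_calls` and `RateLimiter` are hypothetical names.

```python
import time


def seconds_between_calls(rate_limit=6):
    """Smallest delay, in seconds, keeping evenly spaced calls under rate_limit hits per minute."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return 60.0 / rate_limit


class RateLimiter:
    """Sleep as needed so that successive calls stay within the allowed rate."""

    def __init__(self, rate_limit=6, clock=time.monotonic, sleep=time.sleep):
        self.interval = seconds_between_calls(rate_limit)
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `wait()` before each API request keeps a collection loop within the quota; the `clock` and `sleep` parameters exist only so the behavior can be checked without actually waiting.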

leaderboard

usage: minet crowdtangle leaderboard [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                                     [-t TOKEN] [--no-breakdown]
                                     [-f {csv,jsonl}] [-l LIMIT]
                                     [--list-id LIST_ID]

Minet CrowdTangle Leaderboard Command
=====================================

Gather information and aggregated stats about pages and groups of
the designated dashboard (indicated by a given token).

optional arguments:
  -h, --help                            show this help message and exit
  --rate-limit RATE_LIMIT               Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT            Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN               CrowdTangle dashboard API token.
  --no-breakdown                        Whether to skip statistics breakdown by post type in the CSV output.
  -f {csv,jsonl}, --format {csv,jsonl}  Output format. Defaults to `csv`.
  -l LIMIT, --limit LIMIT               Maximum number of posts to retrieve. Will fetch every post by default.
  --list-id LIST_ID                     Optional list id from which to retrieve accounts.

examples:

. Fetching accounts statistics for every account in your dashboard:
    `minet ct leaderboard --token YOUR_TOKEN > accounts-stats.csv`

lists

usage: minet crowdtangle lists [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                               [-t TOKEN]

Minet CrowdTangle Lists Command
===============================

Retrieve the lists from a CrowdTangle dashboard (indicated by a
given token).

optional arguments:
  -h, --help                  show this help message and exit
  --rate-limit RATE_LIMIT     Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT  Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN     CrowdTangle dashboard API token.

examples:

. Fetching a dashboard's lists:
    `minet ct lists --token YOUR_TOKEN > lists.csv`

posts

usage: minet crowdtangle posts [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                               [-t TOKEN] [--end-date END_DATE] [-f {csv,jsonl}]
                               [--language LANGUAGE] [-l LIMIT]
                               [--list-ids LIST_IDS]
                               [--partition-strategy PARTITION_STRATEGY]
                               [--resume]
                               [--sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}]
                               [--start-date START_DATE]
                               [--url-report URL_REPORT]

Minet CrowdTangle Posts Command
===============================

Gather post data from the designated dashboard (indicated by
a given token).

optional arguments:
  -h, --help                                      show this help message and exit
  --rate-limit RATE_LIMIT                         Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT                      Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN                         CrowdTangle dashboard API token.
  --end-date END_DATE                             The latest date at which a post could be posted (UTC!).
  -f {csv,jsonl}, --format {csv,jsonl}            Output format. Defaults to `csv`.
  --language LANGUAGE                             Language of posts to retrieve.
  -l LIMIT, --limit LIMIT                         Maximum number of posts to retrieve. Will fetch every post by default.
  --list-ids LIST_IDS                             Ids of the lists from which to retrieve posts, separated by commas.
  --partition-strategy PARTITION_STRATEGY         Query partition strategy to use to overcome the API search result limits. Should either be `day` or a number of posts.
  --resume                                        Whether to resume an interrupted collection. Requires -o/--output & --sort-by date
  --sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}
                                                  The order in which to retrieve posts. Defaults to `date`.
  --start-date START_DATE                         The earliest date at which a post could be posted (UTC!).
  --url-report URL_REPORT                         Path to an optional report file to write about urls found in posts.

examples:

. Fetching the 500 latest posts from a dashboard:
    `minet ct posts --token YOUR_TOKEN --limit 500 > latest-posts.csv`

search

usage: minet crowdtangle posts [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                               [-t TOKEN] [--end-date END_DATE] [-f {csv,jsonl}]
                               [--language LANGUAGE] [-l LIMIT]
                               [--list-ids LIST_IDS]
                               [--partition-strategy PARTITION_STRATEGY]
                               [--resume]
                               [--sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}]
                               [--start-date START_DATE]
                               [--url-report URL_REPORT]

Minet CrowdTangle Posts Command
===============================

Gather post data from the designated dashboard (indicated by
a given token).

optional arguments:
  -h, --help                                      show this help message and exit
  --rate-limit RATE_LIMIT                         Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT                      Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN                         CrowdTangle dashboard API token.
  --end-date END_DATE                             The latest date at which a post could be posted (UTC!).
  -f {csv,jsonl}, --format {csv,jsonl}            Output format. Defaults to `csv`.
  --language LANGUAGE                             Language of posts to retrieve.
  -l LIMIT, --limit LIMIT                         Maximum number of posts to retrieve. Will fetch every post by default.
  --list-ids LIST_IDS                             Ids of the lists from which to retrieve posts, separated by commas.
  --partition-strategy PARTITION_STRATEGY         Query partition strategy to use to overcome the API search result limits. Should either be `day` or a number of posts.
  --resume                                        Whether to resume an interrupted collection. Requires -o/--output & --sort-by date
  --sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}
                                                  The order in which to retrieve posts. Defaults to `date`.
  --start-date START_DATE                         The earliest date at which a post could be posted (UTC!).
  --url-report URL_REPORT                         Path to an optional report file to write about urls found in posts.

examples:

. Fetching the 500 latest posts from a dashboard:
    `minet ct posts --token YOUR_TOKEN --limit 500 > latest-posts.csv`

summary

usage: minet crowdtangle summary [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                                 [-t TOKEN] [--start-date START_DATE]
                                 [--total TOTAL]
                                 column [file]

Minet CrowdTangle Link Summary Command
======================================

Retrieve aggregated statistics about link sharing
on the Crowdtangle API and by platform.

positional arguments:
  column                      Name of the column containing the URL in the CSV file.
  file                        CSV file containing the inquired URLs.

optional arguments:
  -h, --help                  show this help message and exit
  --rate-limit RATE_LIMIT     Authorized number of hits by minutes. Defaults to 6.
  -o OUTPUT, --output OUTPUT  Path to the output file. By default, everything will be printed to stdout.
  -t TOKEN, --token TOKEN     CrowdTangle dashboard API token.
  --start-date START_DATE     The earliest date at which a post could be posted (UTC!).
  --total TOTAL               Total number of lines in CSV file. Necessary if you want to display a finite progress indicator.

examples:

. Computing a summary of aggregated stats for urls contained in a CSV file:
    `minet ct summary url urls.csv --token YOUR_TOKEN --start-date 2019-01-01 > summary.csv`

Facebook

usage: minet facebook [-h] {comments} ...

Minet Facebook Command
======================

Collects data from Facebook.

optional arguments:
  -h, --help  show this help message and exit

actions:
  {comments}  Action to perform to collect data on Facebook

comments

usage: minet facebook comments [-h] [-c COOKIE] [-o OUTPUT] url

Minet Facebook Comments Command
===============================

Scrape series of comments on Facebook.

positional arguments:
  url                         Url of the post from which to scrape comments.

optional arguments:
  -h, --help                  show this help message and exit
  -c COOKIE, --cookie COOKIE  Authenticated cookie to use or browser from which to extract it (support "firefox" and "chrome").
  -o OUTPUT, --output OUTPUT  Path to the output report file. By default, the report will be printed to stdout.

examples:

. Scraping the comments of a post:
    `minet fb comments`

Hyphe

dump

usage: minet hyphe dump [-h] [-d OUTPUT_DIR] [--body] url corpus

Minet Hyphe Dump Command
========================

Command dumping page-level information from a given
Hyphe corpus.

positional arguments:
  url                                     Url of the Hyphe API.
  corpus                                  Id of the corpus.

optional arguments:
  -h, --help                              show this help message and exit
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR  Output directory for dumped files. Will default to some name based on corpus name.
  --body                                  Whether to download pages body.

examples:

. Dumping a corpus into the ./corpus directory:
    `minet hyphe dump http://myhyphe.com/api/ corpus-name -d corpus`

Mediacloud

topic

stories

usage: minet mediacloud topic stories [-h] [-t TOKEN] [-o OUTPUT] topic_id

Minet Mediacloud Topic Stories Command
======================================

Retrieves the list of stories from a mediacloud topic.

positional arguments:
  topic_id                    Id of the topic

optional arguments:
  -h, --help                  show this help message and exit
  -t TOKEN, --token TOKEN     Mediacloud API token (also called key).
  -o OUTPUT, --output OUTPUT  Path to the output report file. By default, the report will be printed to stdout.

API

multithreaded_fetch

Function fetching urls in a multithreaded fashion.

from minet import multithreaded_fetch

# Most basic usage
urls = ['https://google.com', 'https://twitter.com']

for result in multithreaded_fetch(urls):
  print(result.url, result.response.status)

# Using a list of dicts

urls = [
  {
    'url': 'https://google.com',
    'label': 'Google'
  },
  {
    'url': 'https://twitter.com',
    'label': 'Twitter'
  }
]

for result in multithreaded_fetch(urls, key=lambda x: x['url']):
  print(result.item['label'], result.response.status)

Arguments:

iterator iterable: An iterator over urls or arbitrary items, if you provide a key argument along with it.
key ?callable: A function extracting the url to fetch from the items yielded by the provided iterator.
request_args ?callable: A function returning arguments to pass to the internal request helper for a call.
threads ?int [25]: Number of threads to use.
throttle ?float|callable [0.2]: Per-domain throttle in seconds. Or a function taking the domain and current item and returning the throttle to apply.
guess_extension ?bool [True]: Whether to attempt to guess the resource's extension.
guess_encoding ?bool [True]: Whether to attempt to guess the resource's encoding.
buffer_size ?int [25]: Max number of items per domain to enqueue into memory in hope of finding a new domain that can be processed immediately.
insecure ?bool [False]: Whether to ignore SSL certification errors when performing requests.
timeout ?float|urllib3.Timeout: Custom timeout for every request.
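
The `throttle` argument accepts a function taking the domain and the current item. A minimal sketch of such a function follows; the domains, the delays and the name `custom_throttle` are placeholders, and the exact form of the domain string passed by minet is an assumption.

```python
# Hypothetical per-domain delays, in seconds; the domains are placeholders.
SLOW_DOMAINS = {"example.org": 2.0, "example.com": 1.0}
DEFAULT_THROTTLE = 0.2  # the documented default


def custom_throttle(domain, item):
    """Return the delay to apply between two calls to the given domain."""
    return SLOW_DOMAINS.get(domain, DEFAULT_THROTTLE)


# Intended use, left commented out so the sketch runs without network access:
# from minet import multithreaded_fetch
# for result in multithreaded_fetch(urls, throttle=custom_throttle):
#     ...
```

A callable is handy when a few hosts need gentler treatment than the rest of the batch, since a single float applies the same delay to every domain.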

Yields:

A FetchWorkerResult having the following attributes:

url ?string: the fetched url.
item any: original item from the iterator.
error ?Exception: an error.
response ?urllib3.HTTPResponse: the http response.
meta ?dict: additional metadata:
- mime ?string: resource's mimetype.
- ext ?string: resource's extension.
- encoding ?string: resource's encoding.
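
Since `error`, `response` and `meta` are optional, it is safer to check for an error before reading the response. The sketch below does so on stand-in objects that mimic the attributes listed above. It assumes absent values are `None`, and `describe_result` is a hypothetical helper, not part of minet.

```python
from types import SimpleNamespace


def describe_result(result):
    """Turn a fetch result into a short, printable status line."""
    if result.error is not None:
        return "%s failed: %r" % (result.url, result.error)
    meta = result.meta or {}
    return "%s -> %s (%s)" % (result.url, result.response.status, meta.get("mime"))


# Stand-in objects mimicking the documented attributes, for illustration only.
ok = SimpleNamespace(
    url="https://example.org",
    item="https://example.org",
    error=None,
    response=SimpleNamespace(status=200),
    meta={"mime": "text/html"},
)
failed = SimpleNamespace(
    url="https://example.invalid",
    item="https://example.invalid",
    error=TimeoutError("timed out"),
    response=None,
    meta=None,
)

lines = [describe_result(ok), describe_result(failed)]
```

In a real loop, `describe_result(result)` would be called on each value yielded by `multithreaded_fetch`.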

multithreaded_resolve

Function resolving url redirections in a multithreaded fashion.

from minet import multithreaded_resolve

# Most basic usage
urls = ['https://bit.ly/whatever', 'https://t.co/whatever']

for result in multithreaded_resolve(urls):
  print(result.stack)

# Using a list of dicts

urls = [
  {
    'url': 'https://bit.ly/whatever',
    'label': 'Bit.ly'
  },
  {
    'url': 'https://t.co/whatever',
    'label': 'Twitter'
  }
]

for result in multithreaded_resolve(urls, key=lambda x: x['url']):
  print(result.stack)

Arguments:

iterator iterable: An iterator over urls or arbitrary items, if you provide a key argument along with it.
key ?callable: A function extracting the url to fetch from the items yielded by the provided iterator.
resolve_args ?callable: A function returning arguments to pass to the internal resolve helper for a call.
threads ?int [25]: Number of threads to use.
throttle ?float|callable [0.2]: Per-domain throttle in seconds. Or a function taking the domain and current item and returning the throttle to apply.
max_redirects ?int [5]: Max number of redirections to follow.
follow_refresh_header ?bool [False]: Whether to follow Refresh headers or not.
follow_meta_refresh ?bool [False]: Whether to follow meta refresh tags. It's more costly because we need to stream the start of the response's body and cannot rely on headers alone.
buffer_size ?int [25]: Max number of items per domain to enqueue into memory in hope of finding a new domain that can be processed immediately.
insecure ?bool [False]: Whether to ignore SSL certification errors when performing requests.
timeout ?float|urllib3.Timeout: Custom timeout for every request.

Yields:

A ResolveWorkerResult having the following attributes:

url ?string: the fetched url.
item any: original item from the iterator.
error ?Exception: an error.
stack ?list: the redirection stack.


This version

0.21.0

Jan 6, 2020

0.20.2

Oct 23, 2019

0.20.1

Oct 18, 2019

0.20.0

Oct 18, 2019

0.19.0

Oct 18, 2019

0.18.0

Oct 15, 2019

0.17.0

Oct 14, 2019

0.16.0

Oct 10, 2019

0.15.0

Oct 9, 2019

0.14.3

Oct 7, 2019

0.14.2

Oct 7, 2019

0.14.1

Oct 4, 2019

0.14.0

Oct 4, 2019

0.13.0

Oct 2, 2019

0.12.0

Oct 2, 2019

0.11.0

Sep 30, 2019

0.10.0

Sep 24, 2019

0.9.1

Sep 23, 2019

0.9.0

Sep 23, 2019

0.8.2

Sep 19, 2019

0.8.1

Sep 19, 2019

0.8.0

Sep 17, 2019

0.7.0

Jul 23, 2019

0.6.0

Jul 19, 2019

0.5.0

Jul 17, 2019

0.4.1

Jul 16, 2019

0.4.0

Jul 16, 2019

0.3.0

Jul 15, 2019

0.2.0

Jun 18, 2019

0.1.0

Mar 22, 2019

0.0.1

Feb 28, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

minet-0.21.0.tar.gz (61.1 kB)

Uploaded Jan 6, 2020 Source

Built Distribution

minet-0.21.0-py3-none-any.whl (65.2 kB)

Uploaded Jan 6, 2020 Python 3

File details

Details for the file minet-0.21.0.tar.gz.

File metadata

Download URL: minet-0.21.0.tar.gz
Upload date: Jan 6, 2020
Size: 61.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for minet-0.21.0.tar.gz
Algorithm	Hash digest
SHA256	`c7c3f50ab6f7de3b63cfa2fc2d65fafd33f7dd80aac73bb74226615f15ee12c5`
MD5	`894d6f0628bdd0acb9405a36c2a8ae1d`
BLAKE2b-256	`b00cfce8400f27d0139002a57c1db0c8a5ace1588b462f254931faa7c249b99d`

See more details on using hashes here.
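
The digests listed above can be used to check that a downloaded archive has not been corrupted or altered. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative and not part of minet itself.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the "File hashes" table for minet-0.21.0.tar.gz.
EXPECTED_SHA256 = "c7c3f50ab6f7de3b63cfa2fc2d65fafd33f7dd80aac73bb74226615f15ee12c5"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical download location: adjust to wherever the file was saved.
    archive = Path("minet-0.21.0.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same approach should work for the wheel by swapping in its file name and the SHA256 digest from its own table. SHA256 is used here rather than MD5 because MD5 is no longer considered collision-resistant.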

File details

Details for the file minet-0.21.0-py3-none-any.whl.

File metadata

Download URL: minet-0.21.0-py3-none-any.whl
Upload date: Jan 6, 2020
Size: 65.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.11.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for minet-0.21.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e1f4ea00673b76ee3b904da4ace29bb25cdda71e1e2463b2230c61ddf1a66b7`
MD5	`cbaa83173fc8ab1712d0f5923c2c2171`
BLAKE2b-256	`12612034c0e15db59d8937aa31ec92752bbba77d7ba6afb4d89800808b058edf`

