PO2Dataset

Python tool to extract sentences from po files and create language datasets for NLP machine learning and neural machine translation.

This command-line tool is intended to create dataset packages suitable for Argos Train.
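To illustrate what "extracting sentences from po files" means, here is a minimal sketch that pulls translated msgid/msgstr pairs out of PO text using only the standard library. It is not the tool's actual implementation: the function name extract_pairs and the sample content are made up for this example, and the parser deliberately ignores plural forms, msgctxt and obsolete entries.

```python
import ast

# Hypothetical sample PO content, used only for this illustration.
SAMPLE_PO = r'''
msgid ""
msgstr ""
"Project-Id-Version: demo\n"

#: app.py:10
msgid "Open file"
msgstr "Ireki fitxategia"

msgid "Save"
msgstr ""

msgid "Long "
"sentence"
msgstr "Esaldi "
"luzea"
'''


def extract_pairs(po_text):
    """Return (source, target) pairs for translated, non-header entries.

    Simplified sketch: plural forms, msgctxt and obsolete entries are skipped.
    """
    entries = []   # one dict per entry: {"msgid": ..., "msgstr": ...}
    field = None   # field that quoted continuation lines belong to
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            entries.append({"msgid": "", "msgstr": ""})
            field = "msgid"
            rest = line[len("msgid "):]
        elif line.startswith("msgstr "):
            field = "msgstr"
            rest = line[len("msgstr "):]
        elif line.startswith('"') and field:
            rest = line  # continuation of the current field
        else:
            field = None  # comment, blank line or unsupported keyword
            continue
        if entries:
            # PO strings use C-style escapes, which literal_eval can decode.
            entries[-1][field] += ast.literal_eval(rest)
    # Drop the header (empty msgid) and untranslated entries (empty msgstr).
    return [(e["msgid"], e["msgstr"]) for e in entries
            if e["msgid"] and e["msgstr"]]


pairs = extract_pairs(SAMPLE_PO)
for source, target in pairs:
    print(source, "->", target)
```

Each pair holds a source sentence and its translation, which is the kind of parallel text a translation dataset is built from. How the real tool filters entries and packages the result is not shown here.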

How to install

Manual installation

Clone the repository, then create and activate a virtual environment inside it using virtualenv:

git clone https://github.com/urtzai/po2dataset.git
virtualenv po2dataset
cd po2dataset
source ./bin/activate

Quick start guide

Create a dataset suitable for Argos Train

python po2dataset/po2dataset.py <path_to_po_file> --name <project_name> --source_code <source_lang_code> --target_code <target_lang_code> --ref "Some reference information of the project"

Where:

  • path_to_po_file: Path to the po file to process
  • name: The name of the project
  • source_code: Source language code (ISO 639)
  • target_code: Target language code (ISO 639)
  • ref: Reference information about the project
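When several po files need processing, it can help to assemble the command above programmatically. The following is a minimal sketch that builds the argument list from the documented options. The helper name build_command, the example path, the project name and the language codes are all hypothetical values chosen for illustration.

```python
import shlex


def build_command(po_path, name, source_code, target_code, ref,
                  script="po2dataset/po2dataset.py"):
    """Build the argument list for the quick start command shown above."""
    return [
        "python", script, po_path,
        "--name", name,
        "--source_code", source_code,
        "--target_code", target_code,
        "--ref", ref,
    ]


# Hypothetical example values; replace them with your own project's data.
cmd = build_command(
    "locale/eu/LC_MESSAGES/django.po",
    name="myproject",
    source_code="en",
    target_code="eu",
    ref="Translations of myproject",
)

# shlex.join renders the list as a copy-pasteable shell command.
printable = shlex.join(cmd)
print(printable)

# To actually run it from the repository root, one option would be:
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the reference text contains spaces.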

Support

If you run into any problems, do not hesitate to open an issue or contribute to the project by submitting a pull request.
