Build datasets with natural language
Project description
title: Synthetic Data Generator
short_description: Build datasets using natural language
emoji: 🧬
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
license: apache-2.0
hf_oauth: true
#header: mini
hf_oauth_scopes:
- read-repos
- write-repos
- manage-repos
- inference-api
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
🧬 Synthetic Data Generator
Build datasets using natural language
This repository contains the code for the free Synthetic Data Generator app, which is hosted on the Hugging Face Hub.
How does it work?
Distilabel Synthetic Data Generator is a tool that allows you to easily create high-quality datasets for training and fine-tuning language models. It leverages the power of distilabel and advanced language models to generate synthetic data tailored to your specific needs.
This tool simplifies the process of creating custom datasets, enabling you to:
- Define the characteristics of your desired application
- Generate system prompts and tasks automatically
- Create sample datasets for quick iteration
- Produce full-scale datasets with customizable parameters
- Push your generated datasets directly to the Hugging Face Hub
By using Distilabel Synthetic Data Generator, you can rapidly prototype and create datasets, accelerating your AI development process.
Do you want to run this locally?
You can simply clone the repository and run it locally with:
pip install -r requirements.txt
python app.py
Note that you do need to have an HF_TOKEN that can make calls to the free serverless Hugging Face Inference Endpoints. You can get one here.
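As a minimal sketch of the token requirement above, the following check can be run before starting the app. It assumes the token is read from an environment variable named HF_TOKEN, which the README implies but does not state outright; the helper name `require_hf_token` is hypothetical and is not part of the repository.

```python
import os


def require_hf_token(env=None):
    """Return the HF_TOKEN value from the given mapping, or raise if it is missing.

    Assumption: the app picks the token up from the HF_TOKEN environment variable.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running `python app.py`."
        )
    return token


if __name__ == "__main__":
    try:
        require_hf_token()
        print("HF_TOKEN found; the app can be started with `python app.py`.")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to diagnose than an authentication error raised later by an inference call.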
Do you need more control?
Each pipeline is based on a distilabel component, so you can easily run it locally or with other LLMs.
Check out the distilabel library for more information.
File details
Details for the file synthetic_dataset_generator-0.1.0.tar.gz.
File metadata
- Download URL: synthetic_dataset_generator-0.1.0.tar.gz
- Upload date:
- Size: 23.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.18.0 CPython/3.12.7 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe543e14e0419ef182080fc0cecd49d0e8aae507158e7c823364ca0fdd2e178a
MD5 | c6a4e1a42386758b4254a6f2b7c2740d
BLAKE2b-256 | 70b2890a04899b75ea5bd94dd6987ec6d0d6b7eee828bc417d6fa7ec876f7645
File details
Details for the file synthetic_dataset_generator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: synthetic_dataset_generator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 30.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.18.0 CPython/3.12.7 Darwin/24.0.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e5eebba9b044058fa631ca74ac4b64fbbeefe2017a15712ddd81f82a8ef48e35
MD5 | 2b342592a7fac651f541148bf328a921
BLAKE2b-256 | a12b1e6bb9a9662efc8059b96cc6bb48c02900acb712cd8d94a89918d8aea46b
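The published digests can be used to check a downloaded file. The sketch below uses only Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("synthetic_dataset_generator-0.1.0.tar.gz", "fe543e14e0419ef182080fc0cecd49d0e8aae507158e7c823364ca0fdd2e178a")` should return True for an intact copy of the source distribution listed above.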