
Watermarking and detection for speech audios


:loud_sound: AudioSeal: Proactive Localized Watermarking


Inference code for AudioSeal, a method for localized speech watermarking with state-of-the-art robustness and detector speed (training code is also available; see the updates below). More details can be found in the paper.

[arXiv] [Colab notebook][🤗Hugging Face]


Updates:

  • 2024-06-17: Training code is now available. Check the instructions!
  • 2024-05-31: Our paper was accepted at ICML'24 :)
  • 2024-04-02: We have updated our license to the full MIT license (including the license for the model weights). You can now use AudioSeal in commercial applications too!
  • 2024-02-29: AudioSeal 0.1.2 is out, with more bug fixes for resampled audio and updated notebooks.

Abstract

We introduce AudioSeal, a method for localized speech watermarking with state-of-the-art robustness and detector speed. It jointly trains a generator that embeds a watermark in the audio and a detector that detects the watermarked fragments in longer audio, even in the presence of editing. AudioSeal achieves state-of-the-art detection performance on both natural and synthetic speech at the sample level (1/16k second resolution), causes only limited alteration of signal quality, and is robust to many types of audio editing. AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.

:mate: Installation

AudioSeal requires Python >= 3.8, PyTorch >= 1.13.0, omegaconf, julius, and numpy. To install from PyPI:

pip install audioseal

To install from source, clone this repo and install in editable mode:

git clone https://github.com/facebookresearch/audioseal
cd audioseal
pip install -e .

:gear: Models

You can find all the model checkpoints on the Hugging Face Hub. We provide the checkpoints for the following models:

  • AudioSeal Generator. It takes an audio signal (as a waveform) as input and outputs a watermark of the same size as the input, which can be added to the input to watermark it. Optionally, it can also take a secret 16-bit message that will be encoded in the watermark.
  • AudioSeal Detector. It takes an audio signal (as a waveform) as input and outputs, for each sample of the audio (every 1/16k s), the probability that the input contains a watermark. Optionally, it may also output the secret message encoded in the watermark.

Note that the message is optional and has no influence on the detection output. It may be used, for instance, to identify a model version (up to $2^{16} = 65536$ possible choices).
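
As an illustration of the model-version use case, here is a minimal sketch of packing an integer ID into a 16-bit message and recovering it. The helper names (`int_to_bits`, `bits_to_int`), the example ID, and the most-significant-bit-first ordering are assumptions made for this sketch; AudioSeal itself only requires a binary vector of 16 bits.

```python
from typing import List

NBITS = 16  # message length stated in the model description above


def int_to_bits(value: int, nbits: int = NBITS) -> List[int]:
    """Encode an integer ID as a list of bits, most significant bit first."""
    if not 0 <= value < 2 ** nbits:
        raise ValueError(f"value must be in [0, {2 ** nbits - 1}]")
    return [(value >> shift) & 1 for shift in range(nbits - 1, -1, -1)]


def bits_to_int(bits: List[int]) -> int:
    """Decode a most-significant-bit-first list of bits back to an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


model_version = 1234  # hypothetical model-version ID
bits = int_to_bits(model_version)
print(bits)               # 16 zeros and ones
print(bits_to_int(bits))  # recovers the original ID
```

Wrapping the list as `torch.tensor([bits])` should give a tensor of shape (1, 16) that can be passed as the `message` argument for a batch of one.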

Note: The training code has been released for anyone who wants to build their own watermarker; see the section on training your own watermarking model below.

:abacus: Usage

AudioSeal provides a simple API to watermark an audio sample and to detect watermarks in it. Example usage:

from audioseal import AudioSeal

# model name corresponds to the YAML card file name found in audioseal/cards
model = AudioSeal.load_generator("audioseal_wm_16bits")

# Alternatively, load directly from a checkpoint
# model = Watermarker.from_pretrained(checkpoint_path, device=wav.device)

# A torch tensor of shape (batch, channels, samples) and a sample rate.
# It is important to resample the audio to the sample rate the model
# expects. In our case, we support 16 kHz audio.
wav, sr = ..., 16000

watermark = model.get_watermark(wav, sr)

# Optional: you can add a 16-bit message to embed in the watermark (requires `import torch`)
# msg = torch.randint(0, 2, (wav.shape[0], model.msg_processor.nbits), device=wav.device)
# watermark = model.get_watermark(wav, sr, message=msg)

watermarked_audio = wav + watermark

detector = AudioSeal.load_detector("audioseal_detector_16bits")

# High-level detection: returns a single score and the decoded message.
result, message = detector.detect_watermark(watermarked_audio, sr)

print(result)   # result is a float indicating the probability that the audio is watermarked
print(message)  # message is a binary vector of 16 bits


# Low-level detection: returns per-frame outputs.
result, message = detector(watermarked_audio, sr)

# result is a tensor of size batch x 2 x frames, indicating the probability (positive and negative) of watermarking for each frame
# A watermarked audio should have result[:, 1, :] > 0.5
print(result[:, 1, :])

# message is a tensor of size batch x 16, indicating the probability of each bit being 1.
# message will be a random tensor if the detector detects no watermarking in the audio
print(message)
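
The per-frame output above can be turned into time spans and a hard-decision message. The following is a minimal sketch, not part of the AudioSeal API: the helper names (`watermarked_segments`, `decode_bits`) are hypothetical, and it assumes the probabilities have already been converted to a flat Python list, for example with `result[0, 1, :].tolist()` for the first item of the batch.

```python
from typing import List, Sequence, Tuple


def watermarked_segments(
    probs: Sequence[float], sample_rate: int = 16000, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the probability stays above threshold."""
    segments = []
    start = None  # index of the first frame of the current span, if any
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a watermarked span begins here
        elif p <= threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None  # the span ended just before frame i
    if start is not None:
        # The audio ends while still inside a watermarked span.
        segments.append((start / sample_rate, len(probs) / sample_rate))
    return segments


def decode_bits(bit_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Threshold per-bit probabilities into a binary message."""
    return [1 if p > threshold else 0 for p in bit_probs]


# Synthetic example: 0.5 s clean, 1 s watermarked, 0.5 s clean at 16 kHz.
probs = [0.1] * 8000 + [0.9] * 16000 + [0.2] * 8000
print(watermarked_segments(probs))  # [(0.5, 1.5)]
print(decode_bits([0.9, 0.2, 0.7, 0.4]))  # [1, 0, 1, 0]
```

The 0.5 threshold follows the comment in the usage example; a different threshold may be preferable depending on the false-positive rate you can tolerate.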

Train your own watermarking model

See here for details on how to train your own watermarking model.

Want to contribute?

We welcome pull requests with improvements or suggestions. If you want to flag an issue or propose an improvement but don't know how to implement it, create a GitHub issue.

Troubleshooting

  • If you encounter the error ValueError: not enough values to unpack (expected 3, got 2), this is because we expect a batch of audio tensors as input. Add a dummy batch dimension to your input (e.g. wav.unsqueeze(0); see the example notebook for getting started).

  • On Windows machines, if you encounter the error KeyError raised while resolving interpolation: "Environment variable 'USER' not found", this is due to an old checkpoint uploaded to the model hub that is not compatible with Windows. Try to invalidate the cache by removing the files in C:\Users\<USER>\.cache\audioseal and run again.

  • If you use torchaudio to handle your audio and encounter the error Couldn't find appropriate backend to handle uri ..., this is because newer versions of torchaudio do not handle the default backend well. Either downgrade torchaudio to 2.1.0 or earlier, or install soundfile as your audio backend.
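
For the Windows cache issue above, the cleanup can also be scripted. This is a minimal sketch using only the standard library; `clear_audioseal_cache` is a hypothetical helper, not part of AudioSeal, and the default location assumes the cache lives in `.cache/audioseal` under the user's home directory, as in the path quoted above.

```python
import shutil
from pathlib import Path
from typing import Optional


def clear_audioseal_cache(cache_dir: Optional[Path] = None) -> bool:
    """Delete the cached checkpoints; return True if a directory was removed."""
    if cache_dir is None:
        # Assumed default location, e.g. C:\Users\<USER>\.cache\audioseal on Windows.
        cache_dir = Path.home() / ".cache" / "audioseal"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)  # removes the directory and everything in it
        return True
    return False
```

Calling `clear_audioseal_cache()` with no argument targets the assumed default location; the checkpoints should then be downloaded again the next time a model is loaded.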

License

  • The code in this repository is released under the MIT license as found in the LICENSE file.


Citation

If you find this repository useful, please consider giving a star :star: and please cite as:

@article{sanroman2024proactive,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and Défossez, Alexandre and Furon, Teddy and Tran, Tuan},
  journal={ICML},
  year={2024}
}

