Contains Retrieval Augmented Generation related utilities for Azure Machine Learning and OSS interoperability.

AzureML Retrieval Augmented Generation Utilities

This package is currently in alpha. Expect breaking changes and unstable behavior.

It contains utilities for:

  • Processing text documents into chunks appropriate for use in LLM prompts, with metadata such as the source URL.
  • Embedding chunks with OpenAI or HuggingFace embedding models, including the ability to update a set of embeddings over time.
  • Creating MLIndex artifacts from embeddings. An MLIndex is a YAML file capturing the metadata needed to deserialize different kinds of vector indexes for use in langchain. Supported index types:
    • FAISS index (via langchain)
    • Azure Cognitive Search index
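
To make the first two items concrete, here is a minimal, standard-library sketch of chunking a document while keeping source metadata, and of using a content hash to decide which chunks would need (re-)embedding on a later run. It does not use the azureml-rag API: the names Chunk, chunk_text and chunks_needing_embedding are hypothetical, and splitting by character count is a simplification chosen for illustration (a real pipeline would more likely split by tokens or document structure).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata needed to trace it back to its source."""
    content: str
    metadata: dict = field(default_factory=dict)

    @property
    def content_hash(self) -> str:
        # A stable hash lets a later run detect chunks whose text is unchanged.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def chunk_text(text: str, source_url: str, max_chars: int = 200, overlap: int = 20) -> list:
    """Split text into overlapping character windows, tagging each with its source."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    step = max_chars - overlap
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + max_chars]
        chunks.append(Chunk(piece, {"source_url": source_url, "chunk_index": index}))
        if start + max_chars >= len(text):
            break  # the current window already reached the end of the text
    return chunks


def chunks_needing_embedding(chunks: list, known_hashes: set) -> list:
    """Return only the chunks whose content has not been embedded before."""
    return [c for c in chunks if c.content_hash not in known_hashes]
```

Keeping a hash next to each stored embedding is one simple way to avoid paying for embeddings of unchanged text when a document set is refreshed.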

Getting started

You can install the AzureML RAG package via pip.

pip install azureml-rag
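
As a quick sanity check after installing, the sketch below looks up the installed version of a distribution using only the Python standard library (Python 3.8+). The helper name installed_version is hypothetical and not part of azureml-rag.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("azureml-rag"))
```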

MLIndex

An MLIndex file is a YAML file that describes an index of data and embeddings, together with the embeddings model used to produce them.

embeddings:
  dimension: 768
  kind: hugging_face
  model: sentence-transformers/all-mpnet-base-v2
  schema_version: '2'
index:
  api_version: 2021-04-30-Preview
  connection:
    id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace>/connections/<acs_connection_name>
  connection_type: workspace_connection
  endpoint: https://<acs_name>.search.windows.net
  engine: azure-sdk
  field_mapping:
    content: content
    filename: sourcefile
    metadata: meta_json_string
    title: title
    url: sourcepage
  index: azureml-rag-test-206e03b6-3880-407b-9bc4-c0a1162d6c70
  kind: acs
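
The field_mapping section pairs logical field names (content, filename, metadata, title, url) with the field names used in the underlying index. The sketch below illustrates one way a consumer could apply that mapping to a raw search result. It is not the package's implementation: to_logical_record and the sample document are hypothetical, the dictionary stands in for the parsed YAML (for example the result of PyYAML's yaml.safe_load), and treating the metadata field as a JSON-encoded string is an assumption based on the field name meta_json_string.

```python
import json

# Parsed form of part of the `index` section above.
index_config = {
    "kind": "acs",
    "field_mapping": {
        "content": "content",
        "filename": "sourcefile",
        "metadata": "meta_json_string",
        "title": "title",
        "url": "sourcepage",
    },
}


def to_logical_record(raw_doc: dict, field_mapping: dict) -> dict:
    """Rename index-specific field names to the logical names in field_mapping."""
    record = {}
    for logical_name, index_field in field_mapping.items():
        value = raw_doc.get(index_field)
        if logical_name == "metadata" and isinstance(value, str):
            # Assumption: the metadata field holds a JSON-encoded string.
            value = json.loads(value)
        record[logical_name] = value
    return record


# Hypothetical search result, shaped like a document stored in the index.
raw = {
    "content": "A compute instance is a managed workstation.",
    "sourcefile": "compute.md",
    "meta_json_string": json.dumps({"chunk_index": 0}),
    "title": "Compute instance",
    "sourcepage": "https://example.com/compute",
}
record = to_logical_record(raw, index_config["field_mapping"])
print(record["url"], record["metadata"])
```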

Create MLIndex

TODO: Link to Example Notebooks

Consume MLIndex

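from azureml.rag.mlindex import MLIndex

# uri_to_folder_with_mlindex must be set to the folder that contains the MLIndex file.
retriever = MLIndex(uri_to_folder_with_mlindex).as_langchain_retriever()
# Returns the documents most relevant to the query.
docs = retriever.get_relevant_documents('What is an AzureML Compute Instance?')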
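
A typical next step is to place the retrieved documents into an LLM prompt. The sketch below assumes only that each returned document exposes page_content and metadata attributes, as langchain Document objects do. The build_prompt helper, the prompt wording, and the "source" metadata key are hypothetical, and SimpleNamespace objects stand in for real retriever results so the example runs without an index.

```python
from types import SimpleNamespace


def build_prompt(question: str, documents, max_docs: int = 3) -> str:
    """Assemble a prompt from retrieved documents and a user question."""
    sections = []
    for number, doc in enumerate(documents[:max_docs], start=1):
        # "source" is an assumed metadata key; fall back when it is missing.
        source = doc.metadata.get("source", "unknown")
        sections.append(f"[{number}] (source: {source})\n{doc.page_content}")
    context = "\n\n".join(sections)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Stand-ins for retrieved documents so the sketch runs without a real index.
docs = [
    SimpleNamespace(page_content="A compute instance is a managed workstation.",
                    metadata={"source": "compute.md"}),
    SimpleNamespace(page_content="Compute clusters scale out training jobs.",
                    metadata={}),
]
prompt = build_prompt("What is an AzureML Compute Instance?", docs)
print(prompt)
```

Numbering the context sections makes it easier to ask the model to cite which chunk supports its answer.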

Changelog

0.1.6 (2023-05-31)

  • Fail crack_and_chunk task when no files were processed (usually because of a malformed input_glob)
  • Change update_acs.py to default push_embeddings=True instead of False.

0.1.5 (2023-05-19)

  • Add api_base back to MLIndex embeddings config for back-compat (until all clients start getting it from Workspace Connection).
  • Add telemetry for tasks used in pipeline components, not enabled by default for SDK usage.

0.1.4 (2023-05-17)

  • Fix bug where enabling rcts option on split_documents used nltk splitter instead.

0.1.3 (2023-05-12)

  • Support Workspace Connection based auth for Git, Azure OpenAI and Azure Cognitive Search usage.

0.1.2 (2023-05-05)

  • Refactored document chunking to allow insertion of custom processing logic

0.0.1 (2023-04-25)

Features Added

  • Introduced package
  • langchain Retriever for Azure Cognitive Search

This version

0.1.7
