Microsoft Azure Cognitive Search Client Library for Python


Azure Cognitive Search client library for Python

Azure Cognitive Search is a search-as-a-service cloud solution that gives developers APIs and tools for adding a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

The Azure Cognitive Search service is well suited for the following application scenarios:

- Consolidate varied content types into a single searchable index. To populate an index, you can push JSON documents that contain your content, or if your data is already in Azure, create an indexer to pull in data automatically.
- Attach skillsets to an indexer to create searchable content from images and large text documents. A skillset leverages AI from Cognitive Services for built-in OCR, entity recognition, key phrase extraction, language detection, text translation, and sentiment analysis. You can also add custom skills to integrate external processing of your content during data ingestion.
- In a search client application, implement query logic and user experiences similar to commercial web search engines.

Use the azure-search-documents client library to:

- Submit queries using simple and advanced query forms, including fuzzy search, wildcard search, and regular expressions.
- Implement filtered queries for faceted navigation, geospatial search, or to narrow results based on filter criteria.
- Create and manage search indexes.
- Upload and update documents in the search index.
- Create and manage indexers that pull data from Azure into an index.
- Create and manage skillsets that add AI enrichment to data ingestion.
- Create and manage analyzers for advanced text analysis or multilingual content.
- Optimize results through scoring profiles to factor in business logic or freshness.
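Fuzzy, wildcard, and regular-expression queries use the full Lucene query syntax, which `SearchClient.search` should select when you pass `query_type="full"`. In that syntax characters such as `+ - ( ) ~ * ? /` are operators, so raw user input can change the meaning of a query or make it invalid. The following is a minimal sketch, assuming you want to treat user input as literal words; `escape_lucene` and `fuzzy_terms` are hypothetical helpers, not part of the SDK.

```python
import re

# Characters with special meaning in the full Lucene query syntax.
# "&&" and "||" are operators as pairs; escaping each "&" and "|" is harmless.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Backslash-escape Lucene special characters (hypothetical helper)."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def fuzzy_terms(text):
    """Escape each word, then append "~" to make it a fuzzy term (hypothetical)."""
    return " ".join(escape_lucene(word) + "~" for word in text.split())
```

For example, `client.search(search_text=fuzzy_terms("luxery hotl"), query_type="full")` would send `luxery~ hotl~`, letting misspelled terms still match nearby words in the index.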

Source code | Package (PyPI) | API reference documentation | Product documentation | Samples

Getting started

Install the package

Install the Azure Cognitive Search client library for Python with pip:

pip install azure-search-documents

Prerequisites

Python 2.7, or 3.5 or later is required to use this package.
You need an Azure subscription and an Azure Cognitive Search service to use this package.

To create a new search service, you can use the Azure portal, Azure PowerShell, or the Azure CLI.

az search service create --name <mysearch> --resource-group <mysearch-rg> --sku free --location westus

See choosing a pricing tier for more information about available options.

Authenticate the client

All requests to a search service need an api-key that was generated specifically for your service. The api-key is the sole mechanism for authenticating access to your search service endpoint. You can obtain your api-key from the Azure portal or via the Azure CLI:

az search admin-key show --service-name <mysearch> --resource-group <mysearch-rg>

There are two types of keys used to access your search service: admin (read-write) and query (read-only) keys. Restricting access and operations in client apps is essential to safeguarding the search assets on your service. Always use a query key rather than an admin key for any query originating from a client app.

Note: The example Azure CLI snippet above retrieves an admin key so it's easier to get started exploring APIs, but it should be managed carefully.
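The SDK's `AzureKeyCredential` attaches the key to every request for you. To show what that amounts to at the HTTP level, here is a minimal sketch that builds, but does not send, a raw REST search request carrying the key in the `api-key` header. The helper `build_search_request` is hypothetical, and the `2020-06-30` api-version is an assumption; check the REST API reference for the versions your service supports.

```python
import json
import urllib.request

# Assumption: the GA REST API version contemporary with this SDK release.
API_VERSION = "2020-06-30"


def build_search_request(service_name, index_name, api_key, search_text, top=5):
    """Build (but do not send) a REST search request (hypothetical helper)."""
    url = "https://{}.search.windows.net/indexes/{}/docs/search?api-version={}".format(
        service_name, index_name, API_VERSION)
    body = json.dumps({"search": search_text, "top": top}).encode("utf-8")
    # The key travels in the "api-key" header; urllib stores header names
    # capitalized ("Api-key"), which is fine because HTTP headers are case-insensitive.
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
```

Because the key is sent verbatim with each request, a query key is the safer choice for anything that runs in a client app.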

We can use the api-key to create a new SearchClient.

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

index_name = "nycjobs"
# Get the service endpoint and API key from the environment
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]

# Create a client
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)

Send your first search request

To get running immediately, we're going to connect to a well-known sandbox Search service provided by Microsoft. This means you do not need an Azure subscription or Azure Cognitive Search service to try out this query.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# We'll connect to the Azure Cognitive Search public sandbox and send a
# query to its "nycjobs" index built from a public dataset of available jobs
# in New York.
service_name = "azs-playground"
index_name = "nycjobs"
api_key = "252044BE3886FE4A8E3BAA4F595114BB"

# Create a SearchClient to send queries
endpoint = "https://{}.search.windows.net/".format(service_name)
credential = AzureKeyCredential(api_key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)

# Let's get the top 5 jobs related to Microsoft
results = client.search(search_text="Microsoft", top=5)

for result in results:
    # Print out the title and job description
    print("{}\n{}\n".format(result["business_title"], result["job_description"]))

Key concepts

An Azure Cognitive Search service contains one or more indexes that provide persistent storage of searchable data in the form of JSON documents. (If you're brand new to search, you can make a very rough analogy between indexes and database tables.) The azure-search-documents client library exposes operations on these resources through three main client types.

SearchClient helps with:
- Searching your indexed documents using rich queries and powerful data shaping
- Autocompleting partially typed search terms based on documents in the index
- Suggesting the most likely matching text in documents as a user types
- Adding, updating, or deleting documents in an index
SearchIndexClient allows you to:
- Create, delete, update, or configure a search index
- Declare custom synonym maps to expand or rewrite queries
- Most of the SearchServiceClient functionality is not yet available in our current preview
SearchIndexerClient allows you to:
- Start indexers to automatically crawl data sources
- Define AI powered Skillsets to transform and enrich your data

The azure-search-documents client library (v11) is a brand new offering for Python developers who want to use search technology in their applications. There is an older, fully featured Microsoft.Azure.Search client library (v10) with many similar-looking APIs, so please be careful to avoid confusion when exploring online resources.

Examples

The following examples all use a simple Hotel data set that you can import into your own index from the Azure portal. These are just a few of the basics; please check out our Samples for much more.

- Querying
- Creating an index
- Adding documents to your index
- Retrieving a specific document from your index
- Async APIs

Querying

Let's start by importing our namespaces.

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

We'll then create a SearchClient to access our hotels search index.

index_name = "hotels"
# Get the service endpoint and API key from the environment
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]

# Create a client
credential = AzureKeyCredential(key)
client = SearchClient(endpoint=endpoint,
                      index_name=index_name,
                      credential=credential)

Let's search for a "luxury" hotel.

results = client.search(search_text="luxury")

for result in results:
    print("{}: {}".format(result["hotelId"], result["hotelName"]))
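To narrow results, `SearchClient.search` also accepts an OData `filter` expression. A frequent pitfall is quoting: string literals go in single quotes, and a single quote inside a value is escaped by doubling it. The hypothetical `build_filter` helper below sketches how optional criteria might be combined; the `Category` and `Rating` field names are assumptions borrowed from the sample hotel documents used elsewhere on this page, and they must be marked filterable in your index.

```python
def odata_string(value):
    """Quote a string literal for an OData filter; single quotes are doubled."""
    return "'{}'".format(value.replace("'", "''"))


def build_filter(category=None, min_rating=None):
    """Combine optional criteria into one filter expression (hypothetical helper).

    Field names "Category" and "Rating" are assumptions about the index schema.
    """
    clauses = []
    if category is not None:
        clauses.append("Category eq {}".format(odata_string(category)))
    if min_rating is not None:
        clauses.append("Rating ge {}".format(min_rating))
    # None means "no filter criteria", so results are not narrowed at all.
    return " and ".join(clauses) or None
```

For example, `client.search(search_text="luxury", filter=build_filter(min_rating=4))` should return only luxury hotels rated 4 or higher.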

Creating an index

You can use the SearchIndexClient to create a search index. Fields can be defined using convenient SimpleField, SearchableField, or ComplexField models. Indexes can also define suggesters, lexical analyzers, and more.

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import ( 
    ComplexField, 
    CorsOptions, 
    SearchIndex, 
    ScoringProfile, 
    SearchFieldDataType, 
    SimpleField, 
    SearchableField 
)

endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]

# Create a service client
client = SearchIndexClient(endpoint, AzureKeyCredential(key))

# Create the index
name = "hotels"
fields = [
    SimpleField(name="hotelId", type=SearchFieldDataType.String, key=True),
    SimpleField(name="baseRate", type=SearchFieldDataType.Double),
    SearchableField(name="description", type=SearchFieldDataType.String),
    ComplexField(name="address", fields=[
        SimpleField(name="streetAddress", type=SearchFieldDataType.String),
        SimpleField(name="city", type=SearchFieldDataType.String),
    ])
]
cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)
scoring_profiles = []

index = SearchIndex(
    name=name,
    fields=fields,
    scoring_profiles=scoring_profiles,
    cors_options=cors_options)

result = client.create_index(index)
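Each field model above corresponds to a JSON field definition in the REST API (for example, `SearchFieldDataType.String` is `Edm.String`, and a `ComplexField` is `Edm.ComplexType`). The service requires every index to have exactly one key field, and that field must be an `Edm.String`. The hypothetical `check_index_definition` helper below is a client-side sanity check over a REST-style dictionary mirroring the hotels index above; it is a sketch, not part of the SDK, and it only inspects top-level fields.

```python
def check_index_definition(index):
    """Return the key field name, or raise ValueError (hypothetical helper)."""
    keys = [f for f in index.get("fields", []) if f.get("key")]
    if len(keys) != 1:
        raise ValueError("expected exactly one key field, found {}".format(len(keys)))
    if keys[0].get("type") != "Edm.String":
        raise ValueError("key field '{}' must be Edm.String".format(keys[0]["name"]))
    return keys[0]["name"]


# REST-style equivalent of the hotels index defined with SDK models above.
hotels = {
    "name": "hotels",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True},
        {"name": "baseRate", "type": "Edm.Double"},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {"name": "address", "type": "Edm.ComplexType", "fields": [
            {"name": "streetAddress", "type": "Edm.String"},
            {"name": "city", "type": "Edm.String"},
        ]},
    ],
}
```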

Adding documents to your index

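You can upload, merge, merge-or-upload, and delete multiple documents in an index with a single batched request. Merging has a few special rules to be aware of; for example, a merge fails if the target document does not already exist, whereas merge-or-upload inserts it instead.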

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

index_name = "hotels"
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]

DOCUMENT = {
    'Category': 'Hotel',
    'HotelId': '1000',
    'Rating': 4.0,
    'Rooms': [],
    'HotelName': 'Azure Inn',
}

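# Create a client for the hotels index
client = SearchClient(endpoint, index_name, AzureKeyCredential(key))

result = client.upload_documents(documents=[DOCUMENT])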

print("Upload of new document succeeded: {}".format(result[0].succeeded))
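Under the hood, an indexing batch is a list of documents that each carry an `@search.action` property (`upload`, `merge`, `mergeOrUpload`, or `delete`). The sketch below builds such a payload in the shape the REST API's index-documents operation expects; `build_index_batch` is a hypothetical helper for illustration. With the SDK you would instead call `upload_documents`, `merge_documents`, `merge_or_upload_documents`, or `delete_documents`, or pass an `IndexDocumentsBatch` to `index_documents`.

```python
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_index_batch(actions):
    """Build a REST-style indexing payload from (action, document) pairs.

    Hypothetical helper; the SDK constructs this payload for you.
    """
    value = []
    for action, document in actions:
        if action not in VALID_ACTIONS:
            raise ValueError("unknown action: {}".format(action))
        # Each document carries its own action alongside its fields.
        entry = {"@search.action": action}
        entry.update(document)
        value.append(entry)
    return {"value": value}
```

A delete action only needs the document key, so `("delete", {"HotelId": "2"})` is a complete entry.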

Retrieve a specific document from an index

In addition to querying for documents using keywords and optional filters, you can retrieve a specific document from your index if you already know the key. You could get the key from a query, for example, and want to show more information about it or navigate your customer to that document.

import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

index_name = "hotels"
endpoint = os.environ["SEARCH_ENDPOINT"]
key = os.environ["SEARCH_API_KEY"]

client = SearchClient(endpoint, index_name, AzureKeyCredential(key))

result = client.get_document(key="1")

print("Details for hotel '1' are:")
print("        Name: {}".format(result["HotelName"]))
print("      Rating: {}".format(result["Rating"]))
print("    Category: {}".format(result["Category"]))
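`get_document` needs the exact key, and document keys may contain only letters, digits, underscores, dashes, and equal signs. When your natural identifier is something like a file path, a common approach, similar in spirit to the indexer `base64Encode` field-mapping function, is to store a URL-safe Base64 encoding of it as the key. The helpers below are a hypothetical sketch of that idea; whatever encoding you use at lookup time must match the one used when the documents were indexed.

```python
import base64


def encode_key(natural_key):
    """URL-safe Base64-encode a string into a valid document key (hypothetical)."""
    return base64.urlsafe_b64encode(natural_key.encode("utf-8")).decode("ascii")


def decode_key(document_key):
    """Recover the original string from a key produced by encode_key (hypothetical)."""
    return base64.urlsafe_b64decode(document_key.encode("ascii")).decode("utf-8")
```

For example, `client.get_document(key=encode_key("docs/2020/hotel #1.json"))` would look up a document indexed under the encoded form of that path.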

Async APIs

This library includes a complete async API supported on Python 3.5+. To use it, you must first install an async transport, such as aiohttp. See azure-core documentation for more information.

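import asyncio
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient

index_name = "hotels"
endpoint = os.environ["SEARCH_ENDPOINT"]
api_key = os.environ["SEARCH_API_KEY"]

async def main():
    client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
    # The async client is an async context manager that closes its transport on exit
    async with client:
        results = await client.search(search_text="hotel")
        async for result in results:
            print("{}: {}".format(result["hotelId"], result["hotelName"]))

# asyncio.run requires Python 3.7+
asyncio.run(main())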


Troubleshooting

General

The Azure Cognitive Search client raises exceptions defined in Azure Core.

Logging

This library uses the standard Python logging library for logging.
Basic information about HTTP sessions (URLs, headers, etc.) is logged at INFO
level.

Detailed DEBUG level logging, including request/response bodies and unredacted
headers, can be enabled on a client with the `logging_enable` keyword argument:
```python
import sys
import logging
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Create a logger for the 'azure' SDK
logger = logging.getLogger('azure')
logger.setLevel(logging.DEBUG)

# Configure a console output
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)

# This client will log detailed information about its HTTP sessions, at DEBUG level
client = SearchClient("<service endpoint>", "<index_name>", AzureKeyCredential("<api key>"), logging_enable=True)
```

Similarly, logging_enable can enable detailed logging for a single operation, even when it isn't enabled for the client:

result = client.search(search_text="spa", logging_enable=True)
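The opposite adjustment is sometimes useful in production: keeping the SDK quiet. Because the SDK's loggers live under the `azure` namespace, raising the level of that parent logger changes the effective level of every SDK logger that has no level of its own. A minimal sketch using only the standard library:

```python
import logging

# SDK loggers are children of the "azure" logger, so a level set here becomes
# the effective level of every SDK logger without a level of its own.
azure_logger = logging.getLogger("azure")

# WARNING filters out the INFO-level HTTP session summaries; it also filters
# the DEBUG output that logging_enable=True would otherwise emit.
azure_logger.setLevel(logging.WARNING)
```

Warnings and errors from the SDK still reach your handlers with this setting.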

Next steps

Contributing

See our Search CONTRIBUTING.md for details on building, testing, and contributing to this library.

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit cla.microsoft.com.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


Related projects

Microsoft Azure SDK for Python


Release history

- 11.6.0b8 (pre-release): Nov 21, 2024
- 11.6.0b7 (pre-release): Nov 18, 2024
- 11.6.0b6 (pre-release): Oct 8, 2024
- 11.6.0b5 (pre-release): Sep 19, 2024
- 11.6.0b4 (pre-release): May 7, 2024
- 11.6.0b3 (pre-release): Apr 9, 2024
- 11.6.0b2 (pre-release): Mar 6, 2024
- 11.6.0b1 (pre-release): Jan 31, 2024
- 11.5.2: Oct 31, 2024
- 11.5.1: Jul 30, 2024
- 11.5.0: Jul 16, 2024
- 11.4.0: Nov 13, 2023
- 11.4.0b11 (pre-release): Oct 12, 2023
- 11.4.0b10 (pre-release): Oct 10, 2023
- 11.4.0b9 (pre-release): Sep 12, 2023
- 11.4.0b8 (pre-release): Aug 8, 2023
- 11.4.0b7 (pre-release): Aug 8, 2023
- 11.4.0b6 (pre-release): Jul 11, 2023
- 11.4.0b5 (pre-release): Jul 11, 2023
- 11.4.0b4 (pre-release): Jul 11, 2023
- 11.4.0b3 (pre-release): Feb 7, 2023
- 11.4.0b2 (pre-release): Nov 8, 2022
- 11.4.0b1 (pre-release): Sep 8, 2022
- 11.3.0: Sep 6, 2022
- 11.3.0b8 (pre-release): Mar 8, 2022
- 11.3.0b7 (pre-release): Feb 8, 2022
- 11.3.0b6 (pre-release): Nov 19, 2021
- 11.3.0b5 (pre-release): Nov 9, 2021
- 11.3.0b4 (pre-release): Oct 5, 2021
- 11.3.0b3 (pre-release): Sep 8, 2021
- 11.3.0b2 (pre-release): Aug 10, 2021
- 11.3.0b1 (pre-release): Jul 7, 2021
- 11.2.2: Apr 14, 2022
- 11.2.1: Jan 11, 2022
- 11.2.0: Jun 8, 2021
- 11.2.0b3 (pre-release): May 11, 2021
- 11.2.0b2 (pre-release): Apr 13, 2021
- 11.2.0b1 (pre-release): Apr 6, 2021
- 11.1.0: Feb 10, 2021
- 11.1.0b4 (pre-release): Nov 10, 2020
- 11.1.0b3 (pre-release): Oct 6, 2020
- 11.1.0b2 (pre-release): Sep 8, 2020
- 11.1.0b1 (pre-release): Aug 11, 2020
- 11.0.0 (this version): Jul 7, 2020
- 1.0.0b4 (pre-release): Jun 9, 2020
- 1.0.0b3 (pre-release): May 4, 2020
- 1.0.0b2 (pre-release): Apr 7, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azure-search-documents-11.0.0.zip (290.6 kB)

Uploaded Jul 7, 2020 Source

Built Distribution

azure_search_documents-11.0.0-py2.py3-none-any.whl (213.4 kB)

Uploaded Jul 7, 2020 (Python 2, Python 3)

File details

Details for the file azure-search-documents-11.0.0.zip.

File metadata

Download URL: azure-search-documents-11.0.0.zip
Upload date: Jul 7, 2020
Size: 290.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for azure-search-documents-11.0.0.zip
Algorithm	Hash digest
SHA256	`1956025d7e09a9242c688173b771e37e450fba394bbb90c1d1c30502fd3da9c1`
MD5	`9cb587f02473dfcab6e8851190e7cc85`
BLAKE2b-256	`578ccdd28db6ab76ac50c8253a301318513dbe8693de44d69da17f1ac85851f1`

See the pip documentation on hash-checking mode for more details on using hashes.

File details

Details for the file azure_search_documents-11.0.0-py2.py3-none-any.whl.

File metadata

Download URL: azure_search_documents-11.0.0-py2.py3-none-any.whl
Upload date: Jul 7, 2020
Size: 213.4 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for azure_search_documents-11.0.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`cc381f220d27b1e9d2c32a9b68197222a10a35509e15e9f5ad96c30f9464f335`
MD5	`f5e55199a63ff1a3cb8c8590e58d55b3`
BLAKE2b-256	`d1fea8167d07aeab848cf9796b448fc7004f97a5d6cedbbddd9296073013a5ee`