Skip to main content

A Looker Cleanup Tool

Project description

image


Henry: A Looker Cleanup Tool

Henry is a command line tool that helps determine model bloat in your Looker instance and identify unused content in models and explores. It is meant to help developers cleanup models from unused explores and explores from unused joins and fields, as well as maintain a healthy and user-friendly instance.

Table of Contents

Status and Support

Henry is NOT supported or warranted by Looker in any way. Please do not contact Looker support for issues with Henry. Issues can be logged via https://github.com/looker-open-source/henry/issues

Installation

Henry requires python3.7+. It is published on PyPI and can be installed using pip:

$ pip install henry

For development setup, follow the Development setup below.

Usage

In order to display usage information, use:

$ henry --help

Global Options that apply to many commands

Authentication

Henry makes use of the Looker SDK to issue API calls and requires API3 credentials. These can provided either using an .ini file or environment variables as documented here. By default, the tool looks for a "looker.ini" file in the working directory. If the configuration file is named differently or located elsewhere, it must be specified using the --config-file argument.

Example .ini file:

[Looker]
# Base URL for API. Do not include /api/* in the url
base_url=https://self-signed.looker.com:19999
# API 3 client id
client_id=YourClientID
# API 3 client secret
client_secret=YourClientSecret
# Set to false if testing locally against self-signed certs. Otherwise leave True
verify_ssl=True

[Production]
base_url=https://production.looker.com:19999
client_id=YourClientID
client_secret=YourClientSecret
verify_ssl=True

Assuming the above ini file contents, Henry can be run as follows:

$ henry pulse --config-file=looker.ini --section=Looker

which due to defaults, is equivalent to

$ henry pulse

Running it using the details under the Production section can be done as follows:

$ henry pulse --section=Production

API timeout settings

By default, API calls have a timeout of 120 seconds. This can be overriden using the --timeout argument.

Output to File

If the --save flag is used the tool saves the results to your current working directory. Example usage:

$ henry vacuum models --save

saves the results in vacuum_models_{date}_{time}.csv in the current working directory.

Pulse Command

The command henry pulse runs a number of tests that help determine the overall instance health.

Analyze Command

The analyze command is meant to help identify models and explores that have become bloated and use vacuum on them in order to trim them.

analyze projects

The analyze projects command scans projects for their content as well as checks for the status of quintessential features for success such as the git connection status and validation requirements.

+-------------------+---------------+--------------+-------------------------+---------------------+------------------------+
| Project           |  # Models     | # View Files | Git Connection Status   | PR Mode             | Is Validation Required |
|-------------------+---------------+--------------+-------------------------+---------------------+------------------------|
| marketing         |       1       |      13      | OK                      | links               | True                   |
| admin             |       2       |      74      | OK                      | off                 | True                   |
| powered_by_looker |       1       |      14      | OK                      | links               | True                   |
| salesforce        |       1       |      36      | OK                      | required            | False                  |
| thelook_event     |       1       |      17      | OK                      | required            | True                   |
+-------------------+---------------+--------------+-------------------------+---------------------+------------------------+

analyze models

Shows the number of explores in each model as well as the number of queries against that model.

+-------------------+------------------+-----------------+-------------------+-------------------+
| Project           | Model            |  # Explores     | # Unused Explores |    Query Count    |
|-------------------+------------------+-----------------+-------------------+-------------------|
| salesforce        | salesforce       |        8        |         0         |       39923       |
| thelook_event     | thelook          |       10        |         0         |      166307       |
| powered_by_looker | powered_by       |        5        |         0         |       49122       |
| marketing         | thelook_adwords  |        3        |         0         |       40869       |
| admin             | looker_base      |        0        |         0         |         0         |
| admin             | looker_on_looker |       10        |         9         |        28         |
+-------------------+------------------+-----------------+-------------------+-------------------+

analyze explores

Shows explores and their usage. If the --min-queries argument is passed, joins and fields that have been used less than the threshold specified will be considered as unused.

+---------+-----------------------------------------+-------------+-------------------+--------------+----------------+---------------+-----------------+---------------+
| Model   | Explore                                 | Is Hidden   | Has Description   |   # Joins    | # Unused Joins |    # Fields   | # Unused Fields |  Query Count  |
|---------+-----------------------------------------+-------------+-------------------+--------------+----------------+---------------+-----------------+---------------|
| thelook | cohorts                                 | True        | False             |      3       |       0        |      19       |        4        |      333      |
| thelook | data_tool                               | True        | False             |      3       |       0        |      111      |       90        |      736      |
| thelook | order_items                             | False       | True              |      7       |       0        |      153      |       16        |    126898     |
| thelook | events                                  | False       | True              |      6       |       0        |      167      |       68        |     19372     |
| thelook | sessions                                | False       | False             |      6       |       0        |      167      |       83        |     12205     |
| thelook | affinity                                | False       | False             |      2       |       0        |      34       |       13        |     3179      |
| thelook | orders_with_share_of_wallet_application | False       | True              |      9       |       0        |      161      |       140       |     1586      |
| thelook | journey_mapping                         | False       | False             |      11      |       2        |      238      |       228       |      14       |
| thelook | inventory_snapshot                      | False       | False             |      3       |       0        |      25       |       15        |      33       |
| thelook | kitten_order_items                      | True        | False             |      8       |       0        |      154      |       138       |      39       |
+---------+-----------------------------------------+-------------+-------------------+--------------+----------------+---------------+-----------------+---------------+

Vacuum Information

The vacuum command outputs a list of unused content based on predefined criteria that a developer can then use to cleanup models and explores.

vacuum models

The vacuum models command exposes models and the number of queries against them over a predefined period of time. Explores that are listed here have not had the minimum number of queries against them in the timeframe specified. As a result it is safe to hide them and later delete them.

+------------------+---------------------------------------------+-------------------------+
| Model            | Explore                                     |     Model Query Count   |
|------------------+---------------------------------------------+-------------------------|
| salesforce       |                                             |          39450          |
| thelook          |                                             |         164930          |
| powered_by       |                                             |          49453          |
| thelook_adwords  |                                             |          38108          |
| looker_on_looker | user_full                                   |           27            |
|                  | history_full                                |                         |
|                  | content_view                                |                         |
|                  | project_status                              |                         |
|                  | field_usage_full                            |                         |
|                  | dashboard_performance_full                  |                         |
|                  | user_weekly_app_activity_period_over_period |                         |
|                  | pdt_state                                   |                         |
|                  | user_daily_query_activity                   |                         |
+------------------+---------------------------------------------+-------------------------+

vacuum explores

The vacuum explores command exposes joins and exposes fields that are below or equal to the minimum number of queries threshold (default=0, can be changed using the --min-queries argument) over the specified timeframe (default: 90, can be changed using the --timeframe argument).

Example: from the analyze function run above, we know that the cohorts explore has 4 fields that haven't been queried once in the past 90 days. Running the following vacuum command:

$ henry vacuum explores --model thelook --explore cohorts

provides the name of the unused fields:

+---------+-----------+----------------+------------------------------+
| Model   | Explore   | Unused Joins   | Unused Fields                |
|---------+-----------+----------------+------------------------------|
| thelook | cohorts   | users          | users.id                     |
|         |           |                | order_items.id               |
|         |           |                | order_items.id               |
|         |           |                | order_items.total_sale_price |
+---------+-----------+----------------+------------------------------+

If a join is unused, it's implying that fields introduced by that join haven't been used for the defined timeframe. For this reason fields exposed as a result of that join are not explicitly listed as unused fields.

It is very important to note that fields listed as unused in one explore are not meant to be completely removed from view files altogether because they might be used in other explores (via extensions), or filters. Instead, one should either hide those fields (if they're not used anywhere else) or exclude them from the explore using the fields LookML parameter.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/looker-open-source/henry/issues. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Code of Conduct

Everyone interacting in the Henry project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Copyright

Copyright (c) 2018 Joseph Axisa for Looker Data Sciences. See MIT License for further details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

henry-0.2.3.tar.gz (18.4 kB view details)

Uploaded Source

Built Distribution

henry-0.2.3-py3-none-any.whl (19.0 kB view details)

Uploaded Python 3

File details

Details for the file henry-0.2.3.tar.gz.

File metadata

  • Download URL: henry-0.2.3.tar.gz
  • Upload date:
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.7.7

File hashes

Hashes for henry-0.2.3.tar.gz
Algorithm Hash digest
SHA256 d41345ecc194fa479038549afabb7cb460fc42c2db4e0db5a4ef3881b20a87af
MD5 bf042ee3d9a63f2ad94872fd5a28a4dc
BLAKE2b-256 f3de962101648e838419a87462a4b3c62667773973941d397beb58ba77ca6ce2

See more details on using hashes here.

Provenance

File details

Details for the file henry-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: henry-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 19.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.44.1 CPython/3.7.7

File hashes

Hashes for henry-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9a0f456074e65296d4481f17bdf22a518f93c317b7298e8ce18a51cde4a6aa9c
MD5 630a3472bdaa8f9f805653c03bfd6fd2
BLAKE2b-256 a8cfccacecc53e6d497189d8487017f8e9c62b7fc8c676562ea1dbf98db63367

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page