Normalize a URL to a standard unicode encoding
Project description
urlnorm.py
Normalize a URL to a standard unicode representation
urlnorm normalizes a URL by:
- lowercasing the scheme and hostname
- converting the hostname to IDN (Unicode) format
- removing the default port if present (e.g., http://www.foo.com:80/)
- collapsing the path (./, ../, //, etc.)
- removing the last character of the hostname if it is '.'
- unescaping any percent escape sequences (where possible)
- uppercasing percent escapes (e.g., %3f => %3F)
- converting spaces and %20 to '+'
- converting an IP address encoded as an integer to dotted-quad notation
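Several of the steps above can be sketched with the Python 3 standard library. This is an illustration under stated assumptions, not urlnorm's own implementation: the names sketch_normalize, DEFAULT_PORTS, and UNRESERVED are hypothetical, and unescaping only RFC 3986 unreserved characters is one possible reading of "where possible". The sketch skips IDN conversion and the space/%20 rewrite, and assumes an http-like URL without an IPv6 literal host.

```python
import ipaddress
import posixpath
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports (an illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}
# Characters that RFC 3986 allows to appear unescaped anywhere in a URL.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
ESCAPE = re.compile(r"%([0-9a-fA-F]{2})")


def _fix_escape(match):
    """Unescape unreserved characters; uppercase every other escape."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else match.group(0).upper()


def sketch_normalize(url):
    """Apply a subset of the listed steps (assumption: not urlnorm's code)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Remove a single trailing dot from the hostname.
    if host.endswith("."):
        host = host[:-1]

    # Convert an integer-encoded IPv4 address to dotted-quad notation.
    if host.isascii() and host.isdigit() and int(host) < 2**32:
        host = str(ipaddress.IPv4Address(int(host)))

    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        port_text = ""
    else:
        port_text = ":%d" % port

    # Preserve any userinfo ("user:password@") in front of the host.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + host + port_text

    # Fix percent escapes first, then collapse //, ./ and ../ in the path.
    path = ESCAPE.sub(_fix_escape, parts.path or "/")
    keep_slash = path.endswith("/")
    path = posixpath.normpath(re.sub(r"/{2,}", "/", path))
    if keep_slash and path != "/":
        path += "/"

    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))
```

With this sketch, sketch_normalize("HTTP://WWW.Foo.com.:80/a/./b/../c//d/%7e%3f") returns "http://www.foo.com/a/c/d/~%3F", and sketch_normalize("http://3232235777:8080/") returns "http://192.168.1.1:8080/".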
Installation
pip install -U urlnorm
Or, to install from source:
pip install -e git+https://github.com/jehiah/urlnorm.git#egg=urlnorm
Examples
>>> import urlnorm
>>> urlnorm.norm("http://xn--q-bga.com./u/u/../%72/l/")
u'http://q\xe9.com/u/r/l/'
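Two of the transformations in this example, the punycode hostname becoming Unicode and %72 becoming r, can be reproduced with the Python 3 standard library alone. The following is a minimal sketch, not urlnorm's code; the helper name host_to_unicode is hypothetical, and the stdlib idna codec (IDNA 2003) is assumed to be an acceptable stand-in for whatever urlnorm uses internally.

```python
from urllib.parse import unquote, urlsplit


def host_to_unicode(host):
    """Decode a punycode ("xn--") hostname to Unicode (hypothetical helper)."""
    # Drop a single trailing dot before decoding.
    if host.endswith("."):
        host = host[:-1]
    # The stdlib "idna" codec converts each xn-- label back to Unicode.
    return host.encode("ascii").decode("idna")


parts = urlsplit("http://xn--q-bga.com./u/u/../%72/l/")
unicode_host = host_to_unicode(parts.hostname)  # 'q\xe9.com'
unescaped_path = unquote(parts.path)            # '/u/u/../r/l/'
```

Collapsing the remaining ../ segment in unescaped_path then yields the /u/r/l/ path shown in the output above.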
Download files
Source Distribution
File details
Details for the file urlnorm-1.1.tar.gz.
File metadata
- Download URL: urlnorm-1.1.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0421d9c897e602e54d2b11902641c9a07d9cabcf1a85770c9582cf59931e54e1
MD5 | 88cb46c51014da099ad9f1c0d3e14f14
BLAKE2b-256 | da3747470f778762fdc136c774fb68bc98ca15b9d663db5f3e6711ecab29e00d
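A downloaded archive can be checked against the SHA256 digest listed here. The sketch below uses only hashlib from the standard library; the function names and the idea of passing in a local file path are assumptions for illustration, not part of urlnorm or of any packaging tool.

```python
import hashlib

# SHA256 digest published for urlnorm-1.1.tar.gz.
EXPECTED_SHA256 = "0421d9c897e602e54d2b11902641c9a07d9cabcf1a85770c9582cf59931e54e1"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path):
    """True if the file at `path` has the published SHA256 digest."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Calling matches_published_digest with the path of the downloaded urlnorm-1.1.tar.gz should return True for an unmodified file.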