A tool for learning vector representations of words and entities from Wikipedia

Wikipedia2Vec
=============

[![Fury badge](https://badge.fury.io/py/wikipedia2vec.png)](http://badge.fury.io/py/wikipedia2vec)
[![CircleCI](https://circleci.com/gh/wikipedia2vec/wikipedia2vec.svg?style=svg)](https://circleci.com/gh/wikipedia2vec/wikipedia2vec)

Wikipedia2Vec is a tool for obtaining embeddings (vector representations) of words and entities from Wikipedia.
It is developed and maintained by [Studio Ousia](http://www.ousia.jp).

The tool learns embeddings that map words and entities into a single continuous vector space.
The resulting embeddings can be used as word embeddings, as entity embeddings, or as unified embeddings of both words and entities.
They have been used in state-of-the-art models for tasks such as [entity linking](https://arxiv.org/abs/1601.01343), [named entity recognition](http://www.aclweb.org/anthology/I17-2017), [entity relatedness](https://arxiv.org/abs/1601.01343), and [question answering](https://arxiv.org/abs/1803.08652).
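To make the idea of a unified vector space concrete, here is a minimal sketch in plain Python. It does not use the Wikipedia2Vec API: the vectors, the `EMBEDDINGS` table, and the `nearest_neighbors` helper are all invented for illustration. It only shows that, once words and entities live in the same space, a single similarity measure can compare a word with an entity directly.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from Wikipedia and have far more dimensions.
# Keys are (kind, name) pairs so words and entities share one table.
EMBEDDINGS = {
    ("word", "tokyo"): [0.9, 0.1, 0.0],
    ("word", "guitar"): [0.0, 0.2, 0.9],
    ("entity", "Tokyo"): [0.8, 0.2, 0.1],
    ("entity", "Japan"): [0.7, 0.4, 0.1],
    ("entity", "Jimi Hendrix"): [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(key, top_n=3):
    """Rank every other word or entity by cosine similarity to `key`."""
    query = EMBEDDINGS[key]
    scored = [
        (other, cosine(query, vec))
        for other, vec in EMBEDDINGS.items()
        if other != key
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # A word query can return entities, because both share one space.
    for (kind, name), score in nearest_neighbors(("word", "tokyo")):
        print(f"{kind:6s} {name:14s} {score:.3f}")
```

With these toy values, the nearest neighbor of the word `tokyo` is the entity `Tokyo`, and the nearest neighbor of the word `guitar` is the entity `Jimi Hendrix`. For the actual loading and query interface, refer to the project documentation.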

Documentation and pretrained embeddings are available online at [http://wikipedia2vec.github.io/](http://wikipedia2vec.github.io/).

Reference
---------

If you use Wikipedia2Vec in a scientific publication, please cite the following paper:

    @InProceedings{yamada-EtAl:2016:CoNLL,
      author    = {Yamada, Ikuya and Shindo, Hiroyuki and Takeda, Hideaki and Takefuji, Yoshiyasu},
      title     = {Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation},
      booktitle = {Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning},
      month     = {August},
      year      = {2016},
      address   = {Berlin, Germany},
      pages     = {250--259},
      publisher = {Association for Computational Linguistics}
    }

License
-------

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)


Source Distribution

wikipedia2vec-0.2.8.tar.gz (1.2 MB)

Uploaded Source

File details

Details for the file wikipedia2vec-0.2.8.tar.gz.

File metadata

  • Download URL: wikipedia2vec-0.2.8.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/28.8.0 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/3.6.0

File hashes

Hashes for wikipedia2vec-0.2.8.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7e1d7114399e58a87b75e3f6bd5d1fd99fe0e4269fc3965c83207d55eadd1277 |
| MD5 | 55fa95ca6855cecffba28c8d4cea83f9 |
| BLAKE2b-256 | 3a1c1b555415d635c83375c9b774a65382553f9197573a296a573c1f9199dbd5 |
