Private, personalized, searchable knowledge base built from your own notes.
Knowt
Knowt turns notes into knowledge. You can search your notes for the name of that person you saw at the cafe last week or even have a conversation with your past self about anything at all. It won't write your term paper for you, or draw you dreamy pictures, but it will help you remember the important things, the things that your favorite humans wrote down.
Getting started
My favorite humans these days are in open source communities like Hacker Public Radio.
So Knowt comes with all the show notes from every one of the 4,000+ HPR episodes recorded in its 15+ years of continuous broadcasting.
What questions do you have for the hundreds of agalmic contributors to HPR?
$ pip install knowt
$ knowt what is Haycyon?
Installation
Python virtual environment
To set up the project environment, follow these steps:
- Clone the project repository or download the project files to your local machine.
- Navigate to the project directory.
- Create a Python virtual environment in the project directory:
pip install virtualenv
python -m virtualenv .venv
- Activate the virtual environment (mac/linux):
source .venv/bin/activate
Install dependencies
Now that you have a virtual environment, you're ready to install the Python packages and download the language models (spaCy and BERT).
- Install the project and its required packages:
pip install -e .
- Download the small BERT embedding model (you can use whichever open source model you like):
python -c 'from sentence_transformers import SentenceTransformer; sbert = SentenceTransformer("paraphrase-MiniLM-L6-v2")'
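The downloaded model turns each sentence into a vector of numbers, so searching "by meaning" comes down to comparing vectors. Below is a minimal, stdlib-only sketch of that comparison, assuming cosine similarity as the metric (a common choice for sentence-transformers models, but not confirmed here as what Knowt uses). The functions `cosine_similarity` and `rank_by_meaning` are hypothetical names, and the toy 3-dimensional vectors stand in for real embeddings, which you would get from something like `sbert.encode(sentences)`.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # avoid dividing by zero for an all-zero vector
    return dot / (norm_u * norm_v)


def rank_by_meaning(query_vec, sentence_vecs, sentences, top_k=3):
    """Return the top_k (score, sentence) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(sentence_vecs, sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real sentence embeddings (hypothetical values).
sentences = [
    "Eat more leafy greens.",
    "The episode covers Linux kernels.",
    "Spinach is rich in iron.",
]
sentence_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.95], [0.85, 0.2, 0.05]]
query_vec = [1.0, 0.15, 0.0]  # pretend this encodes "healthy vegetables"

for score, text in rank_by_meaning(query_vec, sentence_vecs, sentences, top_k=2):
    print(f"{score:.3f}  {text}")
```

Neither of the top two sentences contains the words "healthy" or "vegetables"; they rank highly only because their vectors point in a similar direction to the query's.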
Quick start
You can search an example corpus of nutrition and health documents by running the search_engine.py script.
Search your personal docs
- Replace the text files in data/corpus with your own.
- Start the command-line search engine with:
python search_engine.py --refresh
The --refresh flag ensures that a fresh index is created from your documents.
Otherwise the script may ignore the data/corpus directory and reuse an existing index and corpus in the data/cache directory.
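The rebuild-or-reuse behavior of --refresh can be pictured with a small sketch. It assumes a single JSON cache file; `load_or_build_index`, `build_index`, and `index.json` are hypothetical names, not Knowt's actual code, and the "index" here is just each file's text so that the caching logic stays visible.

```python
import json
from pathlib import Path


def build_index(corpus_dir: Path) -> dict[str, str]:
    """Hypothetical stand-in for the real indexer: map file name -> text."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(corpus_dir.glob("*.txt"))
    }


def load_or_build_index(corpus_dir: Path, cache_dir: Path, refresh: bool = False) -> dict[str, str]:
    """Reuse a cached index unless refresh is requested or no cache exists yet."""
    cache_file = cache_dir / "index.json"
    if cache_file.exists() and not refresh:
        # The cached copy wins: edits to corpus_dir are NOT picked up here.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    index = build_index(corpus_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(index), encoding="utf-8")
    return index
```

This is why replacing the files in data/corpus without --refresh can leave you searching stale results: the cache check happens before the corpus is ever read.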
The search_engine.py script first segments the text files into sentences.
Then it creates a "reverse index" (also called an inverted index) by counting up words and character patterns in your documents.
It also creates semantic embeddings, so you can ask questions about vague concepts without knowing any of the words you used in your documents.
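The first two steps (sentence segmentation and the reverse index) can be sketched in plain Python. This is an illustration under stated assumptions, not Knowt's implementation: a naive regex splitter stands in for spaCy's sentence segmentation, "character patterns" are assumed to be character 3-grams of each word, scoring by summed feature counts is an assumption, and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace.
    (A stand-in for a proper segmenter such as spaCy's.)"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def features(sentence: str, n: int = 3) -> Counter:
    """Count lowercase words plus the character n-grams of each word."""
    counts = Counter()
    for word in re.findall(r"\w+", sentence.lower()):
        counts[word] += 1
        padded = f" {word} "  # spaces mark word boundaries
        for i in range(len(padded) - n + 1):
            counts["#" + padded[i:i + n]] += 1  # '#' prefix marks a char n-gram
    return counts


def build_reverse_index(docs: dict[str, str]):
    """Map each word / n-gram to {(doc_name, sentence_no): count}."""
    sentences = {}
    index = defaultdict(dict)
    for name, text in docs.items():
        for i, sent in enumerate(split_sentences(text)):
            sentences[(name, i)] = sent
            for feat, count in features(sent).items():
                index[feat][(name, i)] = count
    return index, sentences


def search(query: str, index, sentences, top_k: int = 3) -> list[str]:
    """Score sentences by the summed counts of query features they contain."""
    scores = Counter()
    for feat in features(query):
        for key, count in index.get(feat, {}).items():
            scores[key] += count
    return [sentences[key] for key, _ in scores.most_common(top_k)]


docs = {"notes.txt": "Met Alice at the cafe. She recommended a book on gardening! The soup was cold."}
index, sentences = build_reverse_index(docs)
print(search("cafe", index, sentences, top_k=1))  # ['Met Alice at the cafe.']
```

The character n-grams are what let a query like "garden" find the sentence containing "gardening" even though the exact word never appears; the semantic embeddings then go further, matching sentences that share no characters with the query at all.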
Contributing
Submit an Issue (bug or feature suggestion) or a Merge Request and someone will respond within the week.
Download files
Source Distribution
File details
Details for the file knowt-0.1.5.tar.gz.
File metadata
- Download URL: knowt-0.1.5.tar.gz
- Upload date:
- Size: 21.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f8d27f2d8863b7a0afd328548c5db1c7ac0381c07d26ae09ef99ae052695bde6
MD5 | 1e65273f670815df3a1e0feb0b5a64e9
BLAKE2b-256 | 4bbf30017bfde3e640bd07346f92799c3aab5eccc45bd7ff5d727a313ff2f5d2
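To check that a downloaded knowt-0.1.5.tar.gz matches the SHA256 digest listed above, the standard library is enough. This is a minimal sketch: `sha256_of` is a hypothetical helper name, and it assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large archive never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest copied from the file hashes table for knowt-0.1.5.tar.gz.
EXPECTED = "f8d27f2d8863b7a0afd328548c5db1c7ac0381c07d26ae09ef99ae052695bde6"

if __name__ == "__main__":
    archive = Path("knowt-0.1.5.tar.gz")  # assumed location of the download
    if archive.exists():
        actual = sha256_of(archive)
        print("OK" if actual == EXPECTED else f"MISMATCH: {actual}")
    else:
        print(f"{archive} not found")
```

On Linux, `sha256sum knowt-0.1.5.tar.gz` gives the same digest from the shell.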