Machine Learning, Statistics and Utilities around Developer Productivity, Company Productivity and Project Productivity
devml
Machine Learning, Statistics and Utilities around Developer Productivity
A few handy bits of functionality:
Checks out all repositories in a GitHub organization
Converts a tree of checked-out repositories on disk into a pandas DataFrame
Computes statistics on combined DataFrames
Get environment setup
Code is written to support Python 3.6 or greater. You can get that here: https://www.python.org/downloads/release/python-360/.
An easy way to run the project locally is to check the repo out and in the root of the repo run:
make setup
This creates a virtualenv in ~/.devml
Next, source that virtualenv:
source ~/.devml/bin/activate
Run make all (installs, lints and tests)
make all
# #Example output
#(.devml) ➜ devml git:(master) make all
#pip install -r requirements.txt
#Requirement already satisfied: pytest in /Users/noahgift/.devml/lib/python3.6/site-packages (from -r requirements.txt (line #1)
---------- coverage: platform darwin, python 3.6.2-final-0 -----------
Name Stmts Miss Cover
----------------------------------------------
devml/__init__.py 1 0 100%
devml/author_stats.py 6 6 0%
devml/fetch_repo.py 54 42 22%
devml/mkdata.py 84 21 75%
devml/org_stats.py 76 55 28%
devml/post_processing.py 50 35 30%
devml/state.py 29 9 69%
devml/stats.py 55 43 22%
devml/ts.py 29 14 52%
devml/util.py 12 4 67%
dml.py 111 66 41%
----------------------------------------------
TOTAL 507 295 42%
....
If you don’t use virtualenv or don’t want to use it, no problem: just run make all. It should probably work if you have Python 3.6 installed.
make all
Explore Jupyter Notebooks on Github Organizations
You can explore combined datasets here using this example as a starter:
https://github.com/noahgift/devml/blob/master/notebooks/github_data_exploration.ipynb
Explore Jupyter Notebooks on Repository Churn
You can explore file metadata using this example notebook:
https://github.com/noahgift/devml/blob/master/notebooks/repo_file_exploration.ipynb
All Files Churned by type:
Summary Churn Statistics by type:
Expected Configuration
The command-line tools expect you to create a project directory containing a config.json file. Inside config.json, you need to provide an OAuth token (a GitHub personal access token), stored under the key oath. You can find information about how to create one here: https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/.
Alternatively, you can pass these values in via the Python API or as command-line options. They stand for the following:
org: GitHub organization (to clone its entire tree of repos)
checkout_dir: directory to check the repos out into
oath: personal OAuth token generated from GitHub
➜ devml git:(master) ✗ cat project/config.json
{
"project" :
{
"org":"pallets",
"checkout_dir": "/tmp/checkout",
"oath": "<keygenerated from Github>"
}
}
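A minimal sketch of how these three values could be read from such a file with the standard library. `load_project_config` is a hypothetical helper written for illustration, not part of devml; it assumes only the `project` / `org` / `checkout_dir` / `oath` layout shown above, and the token value is a placeholder.

```python
import json
import tempfile
from pathlib import Path


def load_project_config(path):
    """Return (checkout_dir, oath, org) from a config.json laid out as above.

    Hypothetical helper for illustration; not devml's own loader.
    """
    with open(path, encoding="utf-8") as handle:
        project = json.load(handle)["project"]
    return project["checkout_dir"], project["oath"], project["org"]


# Write a throwaway config so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps({
        "project": {
            "org": "pallets",
            "checkout_dir": "/tmp/checkout",
            "oath": "placeholder-token",
        }
    }), encoding="utf-8")
    dest, token, org = load_project_config(config_path)

print(dest, org)
```

Keeping the token in a file outside version control, rather than in source code, is one reason to prefer a config file over hard-coded values.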
Basic command-line Usage
You can get stats for a single checkout, or for a directory full of checkouts, as follows:
python dml.py gstats author --path ~/src/mycompanyrepo(s)
Top Commits By Author:
author_name commits
0 John Smith 3059
1 Sally Joe 2995
2 Greg Mathews 2194
3 Jim Mayflower 1448
Basic API Usage (Converting a tree of repo(s) into a pandas DataFrame)
In [1]: from devml import (mkdata, stats)
In [2]: org_df = mkdata.create_org_df(path="/src/mycompanyrepo(s)")
In [3]: author_counts = stats.author_commit_count(org_df)
In [4]: author_counts.head()
Out[4]:
author_name commits
0 John Smith 3059
1 Sally Joe 2995
2 Greg Mathews 2194
3 Jim Mayflower 1448
4 Truck Pritter 1441
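For intuition, a count like the one above can be produced from any commit-level DataFrame with a plain pandas `groupby`. The sketch below assumes one row per commit with an `author_name` column; that layout and the sample rows are made up for illustration, and this is not devml's implementation of `stats.author_commit_count`.

```python
import pandas as pd

# Made-up commit log: one row per commit (assumed layout).
commits = pd.DataFrame({
    "author_name": ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"],
    "message": ["init", "fix", "docs", "test", "refactor", "ci"],
})

author_counts = (
    commits.groupby("author_name")
    .size()                                   # commits per author
    .reset_index(name="commits")              # back to a two-column frame
    .sort_values("commits", ascending=False)  # busiest authors first
    .reset_index(drop=True)
)
print(author_counts)
```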
Clone all repos in Github using API
In [1]: from devml import (mkdata, stats, state, fetch_repo)
In [2]: dest, token, org = state.get_project_metadata("../project/config.json")
In [3]: fetch_repo.clone_org_repos(token, org,
dest, branch="master")
2017-10-14 17:11:36,590 - devml - INFO - Creating Checkout Root: /tmp/checkout
2017-10-14 17:11:37,346 - devml - INFO - Found Repo # 1 REPO NAME: flask , URL: git@github.com:pallets/flask.git
2017-10-14 17:11:37,347 - devml - INFO - Found Repo # 2 REPO NAME: pallets-sphinx-themes , URL: git@github.com:pallets/pallets-sphinx-themes.git
2017-10-14 17:11:37,347 - devml - INFO - Found Repo # 3 REPO NAME: markupsafe , URL: git@github.com:pallets/markupsafe.git
2017-10-14 17:11:37,348 - devml - INFO - Found Repo # 4 REPO NAME: jinja , URL: git@github.com:pallets/jinja.git
2017-10-14 17:11:37,349 - devml - INFO - Found Repo # 5 REPO NAME: werkzeug , URL: git@githu
In [4]: !ls -l /tmp/checkout
total 0
drwxr-xr-x 21 noahgift wheel 672 Oct 14 17:11 click
drwxr-xr-x 25 noahgift wheel 800 Oct 14 17:11 flask
drwxr-xr-x 11 noahgift wheel 352 Oct 14 17:11 flask-docs
drwxr-xr-x 12 noahgift wheel 384 Oct 14 17:11 flask-ext-migrate
drwxr-xr-x 8 noahgift wheel 256 Oct 14 17:11 flask-snippets
drwxr-xr-x 14 noahgift wheel 448 Oct 14 17:11 flask-website
drwxr-xr-x 18 noahgift wheel 576 Oct 14 17:11 itsdangerous
drwxr-xr-x 23 noahgift wheel 736 Oct 14 17:11 jinja
drwxr-xr-x 18 noahgift wheel 576 Oct 14 17:11 markupsafe
drwxr-xr-x 4 noahgift wheel 128 Oct 14 17:11 meta
drwxr-xr-x 10 noahgift wheel 320 Oct 14 17:11 pallets-sphinx-themes
drwxr-xr-x 9 noahgift wheel 288 Oct 14 17:11 pocoo-sphinx-themes
drwxr-xr-x 15 noahgift wheel 480 Oct 14 17:11 website
drwxr-xr-x 25 noahgift wheel 800 Oct 14 17:11 werkzeug
Advanced CLI-Churn: Get churn by file type
Get the top ten files sorted by churn count with the extension .py:
✗ python dml.py gstats churn --path /Users/noahgift/src/flask --limit 10 --ext .py
2017-10-15 12:10:55,783 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/flask]
files churn_count line_count extension \
1 b'flask/app.py' 316 2183.0 .py
3 b'flask/helpers.py' 176 1019.0 .py
5 b'tests/flask_tests.py' 127 NaN .py
7 b'flask.py' 104 NaN .py
8 b'setup.py' 80 112.0 .py
10 b'flask/cli.py' 75 759.0 .py
11 b'flask/wrappers.py' 70 194.0 .py
12 b'flask/__init__.py' 65 49.0 .py
13 b'flask/ctx.py' 62 415.0 .py
14 b'tests/test_helpers.py' 62 888.0 .py
relative_churn
1 0.14
3 0.17
5 NaN
7 NaN
8 0.71
10 0.10
11 0.36
12 1.33
13 0.15
14 0.07
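The `relative_churn` column in the output above is consistent with `churn_count` divided by `line_count`, rounded to two decimals (for example, 316 / 2183 ≈ 0.14 for `flask/app.py`). A small sketch of that calculation on three rows copied from the output follows; that devml computes the column exactly this way is an assumption.

```python
import pandas as pd

churn = pd.DataFrame({
    "files": ["flask/app.py", "tests/flask_tests.py", "setup.py"],
    "churn_count": [316, 127, 80],
    # NaN means no line count was available for that file.
    "line_count": [2183.0, float("nan"), 112.0],
})

# Modifications per line of code; NaN line counts propagate to NaN.
churn["relative_churn"] = (churn["churn_count"] / churn["line_count"]).round(2)
print(churn)
```

Rows without a line count keep a `NaN` relative churn rather than raising an error, which matches the `NaN` entries in the output above.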
Get descriptive statistics for extension .py and compare to another repository
In this example, flask, this repo (devml) and cpython are compared by the median churn of their .py files.
(.devml) ➜ devml git:(master) python dml.py gstats metachurn --path /Users/noahgift/src/flask --ext .py --statistic median
2017-10-15 12:39:44,781 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/flask]
MEDIAN Statistics:
churn_count line_count relative_churn
extension
.py 2 85.0 0.13
(.devml) ➜ devml git:(master) python dml.py gstats metachurn --path /Users/noahgift/src/devml --ext .py --statistic median
2017-10-15 12:40:10,999 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/devml]
MEDIAN Statistics:
churn_count line_count relative_churn
extension
.py 1 62.5 0.02
(.devml) ➜ devml git:(master) python dml.py gstats metachurn --path /Users/noahgift/src/cpython --ext .py --statistic median
2017-10-15 12:42:19,260 - devml.post_processing - INFO - Running churn cmd: [git log --name-only --pretty=format:] at path [/Users/noahgift/src/cpython]
MEDIAN Statistics:
churn_count line_count relative_churn
extension
.py 7 169.5 0.1
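A summary like the MEDIAN tables above can be reproduced on any churn DataFrame with a `groupby` on the extension column. This is a sketch on made-up numbers, assuming the column names from the churn output; it is not the code behind `metachurn`.

```python
import pandas as pd

# Made-up per-file churn data using the column names from the churn output.
churn = pd.DataFrame({
    "extension": [".py", ".py", ".py", ".md", ".md"],
    "churn_count": [10, 2, 4, 1, 3],
    "line_count": [200.0, 50.0, 100.0, 20.0, 40.0],
})
churn["relative_churn"] = churn["churn_count"] / churn["line_count"]

# One row per extension, one median per numeric column.
columns = ["churn_count", "line_count", "relative_churn"]
median_stats = churn.groupby("extension")[columns].median()
print(median_stats.loc[[".py"]])
```

The median is less sensitive than the mean to a handful of very frequently changed files, which makes it a reasonable statistic for comparing repositories of different sizes.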
Deletion Statistics
Find all deleted files in a repository:
DELETION STATISTICS
files ext
0 b'tests/test_deprecations.py' .py
1 b'scripts/flask-07-upgrade.py' .py
2 b'flask/ext/__init__.py' .py
3 b'flask/exthook.py' .py
4 b'scripts/flaskext_compat.py' .py
5 b'tests/test_ext.py' .py
FAQ
What is Churn and Why Do I Care?
Code churn is the number of times a file has been modified. Relative churn is the number of times it has been modified relative to its lines of code. Research into software defects has shown that relative code churn is highly predictive of defects, i.e., the greater the relative churn, the higher the number of defects.
“Increase in relative code churn measures is accompanied by an increase in system defect density;”
You can read the entire study here: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/icse05churn.pdf
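To make the definition concrete, here is a sketch that counts churn from the output of `git log --name-only --pretty=format:` (the command shown in the log lines earlier), where each commit contributes one line per file it touched. The sample text and the `count_churn` helper are illustrative assumptions, not devml's code.

```python
from collections import Counter


def count_churn(git_log_output):
    """Count how many commits touched each file.

    Expects the text printed by `git log --name-only --pretty=format:`;
    blank lines separate commits and are skipped.
    """
    files = (line.strip() for line in git_log_output.splitlines())
    return Counter(name for name in files if name)


# Made-up output covering three commits.
sample = """\
flask/app.py
setup.py

flask/app.py

flask/app.py
flask/helpers.py
"""

churn = count_churn(sample)
for name, count in churn.most_common():
    print(name, count)
```

Dividing each count by the file's current line count would then give a relative churn figure.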