Python linting and testing
Without static typing or compile-time checks, writing bug-free Python code can be tricky. A common approach is to rely on unit tests to exercise all the code and trigger any bugs, but Python does have some static analysis and style checking tools available, like pylint and pep8.
Furthermore, it is nice to have pep8 style requirements on code that is checked into a shared repository, to keep things consistent across different authors. So before committing some code I often found myself fussing with pep8 problems as much as I was testing and fixing functional issues. My process was generally:
- Run pylint -E and fix any errors
- Run unit tests, fix more errors
- (optional) Run pep8 and fix style errors
- (optional) Run pylint without -E, fix warnings, feel guilty about long and short variable names and functions with too many arguments
- Repeat until there are no more errors
There are some problems with this:
1. pylint is noisy, there are lots of style complaints that you might not want to fix
2. If you have a lot of test cases it can take a long time to get to the ones you care about
3. pylint -E is also noisy when using frameworks like Django (the dreaded “Class X has no ‘objects’ member” error) because Django does a lot of metastuff that pylint cannot follow
4. Why can’t stupid pep8 whitespace errors fix themselves!
The good news is that problem 4 is already fixed via autopep8: we can just run autopep8 -i on the files we want to fix, and that will fix all sorts of pep8 spacing problems.
Problem 3 is also solvable via the pylint-django package, which sets up all the pylint special cases for the model meta magic.
Problem 1 can be solved by disabling the pylint message classes you do not want to see with the -d flag, but who wants to do all that customization and looking up of pep8 classes? A better way for lazy people is to install the prospector package. Prospector doesn’t just simplify pylint output; it also includes some other nice tools like mccabe and pyflakes, aggregates results across these tools, and eliminates some duplicates.
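As a rough illustration of the manual route, the sketch below assembles a pylint command line that disables whole message categories with -d and loads the pylint-django plugin. The package name myapp and the choice of the C (convention) and R (refactor) categories are assumptions made for the example; silence whatever you consider noise.

```shell
# "myapp" is a placeholder package name, not one from this project.
target="myapp"

# -d (short for --disable) takes message ids or whole categories:
# C = convention, R = refactor. --load-plugins pulls in pylint-django.
pylint_cmd="pylint -d C,R --load-plugins pylint_django $target"
echo "$pylint_cmd"

# Run it only if pylint is installed. The exit status is ignored here
# because pylint returns non-zero whenever it reports anything.
if command -v pylint >/dev/null 2>&1; then
    $pylint_cmd || true
fi
```

Every category you drop this way is a decision you have to make and maintain yourself, which is the customization work that prospector is meant to spare you.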
The next problem is 2. One could just always run all the unit tests when developing, but if you have 100s or 1000s of unit tests, it might be a few minutes before the run reaches the part of the test suite that covers the code you are working on. One solution is to selectively pick the package or module you are developing on the command line. This involves figuring out which files you are working on, picking the related tests, and putting it all on the command line. Another solution is to use an IDE that auto-runs your tests, which presumably will run the tests for the files you are changing first. This problem should be automatable though, right? By default, test discovery looks for files whose names start with test, so the general convention is to put all the tests for foo.py in a file called test_foo.py. So an efficient test runner solution would look like this:
- Get a list of files that you have modified from git, this is pretty easily scriptable with git diff --name-only. For example, you could get the changed files staged with git diff --name-only --cached.
- So now you have a list of files changed, you then pull out the changed files that are test files themselves and put them aside for a minute.
- Take the non test files and find their test file counterparts starting with test_
- Take the union of the synthesized test file set and the other changed test files.
- Specify this list of test files to the test runner.
For that matter, you can use the set of test files plus other modified files to select results from prospector and autopep8 as well. This is generally a good idea when dealing with a large legacy code base because you may not want to make a bunch of random pep8 whitespace changes to files that you are not actively working on.
Great, we are ready to put it all together now. First get the list of files:
pyfiles=$(git diff --cached --name-only | grep '\.py$')
This grabs all the staged Python files. We could instead use git diff --name-only HEAD^ HEAD to get the files changed in the last commit if we like. Next let’s filter out the files that no longer exist (like if we just did git rm):
existingfiles=""
for f in $pyfiles; do
    # Keep only the files that still exist in the working tree
    if [ -e "$f" ]; then
        existingfiles="$existingfiles $f"
    fi
done
Next, let’s generate a list of test files:
declare -A totest
for f in $existingfiles; do
    if [[ "$f" =~ "/test_" ]]; then
        # Already a test file: keep it as is
        totest["$f"]=1
    else
        # app/foo.py -> app/tests/foo.py -> app/tests/test_foo.py
        withtestpath=${f/\//\/tests\/}
        withtestprefix=$(echo "$withtestpath" | sed 's|\([^/]*\.py\)$|test_\1|')
        if [ -e "$withtestprefix" ]; then
            totest["$withtestprefix"]=1
        fi
    fi
done
In the above we make use of an associative array to store filenames to eliminate duplicates (if we are editing both test_foo.py and foo.py). The if branch collects files that are already test files. The else branch constructs test file paths from the given non-test files. $withtestpath injects “tests” after the first path element, reflecting the particular directory structure of this project. $withtestprefix prepends “test_” to the name of the Python file to get the full test path.
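To see the path mapping on its own, here is a minimal bash sketch that wraps it in a function and applies it to made-up paths. The helper name test_path_for and the sample paths are illustrative assumptions, and the prefix step uses parameter expansion in place of sed.

```shell
# test_path_for is a hypothetical helper name; the layout (tests/ directly
# under the first path element) is the one this project happens to use.
test_path_for() {
    local f=$1
    # Replace the first "/" with "/tests/": app/models.py -> app/tests/models.py
    local withtestpath=${f/\//\/tests\/}
    # Split into directory and file name, then prepend "test_" to the file name.
    local dir=${withtestpath%/*}
    local base=${withtestpath##*/}
    echo "$dir/test_$base"
}

test_path_for app/models.py        # app/tests/test_models.py
test_path_for app/views/admin.py   # app/tests/views/test_admin.py
```

The sketch assumes every path has at least one directory component; a bare file name such as setup.py would need separate handling.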
Now we are in a position to start using our file lists:
autopep8 -i $existingfiles
prospector | grep -E -A2 "$(echo "${existingfiles// /|}" | sed 's/^|//' | sed 's/|$//')"
We just run autopep8 (in-place) on all the existing files we changed, and then run prospector. Since prospector runs over all the files in the project, there is an extra filter to grep out just the contexts that relate to the files we modified.
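The grep pattern is the least obvious part of that pipeline, so here is a small sketch of how the space-separated file list becomes an alternation. The helper name files_to_pattern, the file names, and the sample lines are all made up for illustration; the sample lines only stand in for tool output and do not reproduce prospector's real format.

```shell
# files_to_pattern is a hypothetical helper; the file names are made up.
files_to_pattern() {
    local files=$1
    # Turn every space into "|", then drop a leading or trailing "|"
    # left behind by leading or trailing spaces in the list.
    echo "${files// /|}" | sed 's/^|//' | sed 's/|$//'
}

pattern=$(files_to_pattern " app/models.py app/views.py")
echo "$pattern"    # app/models.py|app/views.py

# Sample lines standing in for tool output, filtered the same way.
printf '%s\n' 'app/models.py: line 3' 'lib/other.py: line 9' | grep -E "$pattern"
```

Only the line mentioning app/models.py survives the filter. Note that an empty file list yields an empty pattern, which matches every line, so it is worth skipping the prospector step when nothing has changed.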
Finally, we convert our test files to packages and run them as tests:
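# Join the array keys into one string first; ${!totest[@]} cannot be
# combined with a substitution in the same expansion
testfiles="${!totest[*]}"
nopy=${testfiles//.py/}
packages=${nopy//\//.}
manage.py test $packages
Here we take the keys of the associative array, discard the .py suffixes, then convert the /s to .s and feed the result into the test tool (Django’s in this case).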
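The path-to-module conversion can also be tried in isolation. The sketch below is a per-file variant under stated assumptions: path_to_module is a hypothetical helper name, the path is made up, and it strips only a trailing .py instead of every occurrence of .py.

```shell
# path_to_module is a hypothetical helper name; the path is made up.
path_to_module() {
    local path=$1
    local nopy=${path%.py}      # strip the trailing .py suffix
    echo "${nopy//\//.}"        # turn every "/" into "."
}

path_to_module app/tests/test_models.py   # app.tests.test_models
```

Stripping only the suffix avoids mangling a directory or file name that happens to contain ".py" somewhere in the middle.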