Thursday, June 20, 2013

Question on StackOverflow: Best Unittest for Python programs using OptionParser() ?

Just posted this on stackoverflow. http://stackoverflow.com/questions/17223530/python-how-do-i-usefully-unittest-a-method-invoking-optionparser

MagicMock Returns different values each call

Recent discoveries about mock objects!

I'm trying to create a Python unittest that exercises a class method with a time() call in it.

That is I have this:

class DumObj(object):
def methodOne(self, intime):
while time.time() < intime:
... do stuff ...
so, I refactored this to:

class DumObj(object):
def nowTime(self):
return time.time()
def methodOne(self, intime):
while self.nowTime() < intime:
... do stuff ...
and have to test methodOne. How to get it to go through the loop once (or twice)?

I tried setting the return value to the output of a function:

import unittest
from mock import Mock, MagicMock
class TestDumObj(unittest.TestCase)
def test_methodOne(self):
d = DumObj()
retvals=[1,2,3,4,5]
def mf():
ret = retvals.pop()
return ret
d.nowTime = MagicMock(return_value=mf()) d.methodOne()

Doesn't work. Here's the interactive version:

>>> from mock import MagicMock
>>> retvals = [ 1, 2, 3, 4, 5 ]
>>> def mse(*args, **kwargs):
... ret = retvals.pop()
... return ret
...
>>> mm = MagicMock(return_value=mse())
>>> mm()
5
>>> mm()
5

Damn! I want MagicMock to return different values each time it's called. I want to have an array of return values for MagicMock, and each call it returns the next one. How to do this? Turns out it's using the side_effect param. Continuing from above:

>>> mm = MagicMock(side_effect=mse)
>>> mm()
4
>>> mm()
3
Yay! Solved: MagicMock returns different values each call. Now, tie it into the overall test:

import unittest
from mock import Mock, MagicMock
class TestDumObj(unittest.TestCase)
def test_methodOne(self):
d = DumObj()
retvals=[1,2,3,4,5]
def mse():
ret = retvals.pop()
return ret
d.nowTime = MagicMock(side_effect=mse) d.methodOne()

Works!

Tuesday, May 14, 2013

Drop-Thru Code Considered Harmful

Recently, I was describing my frustrations with some Python code that had been written by a now-departed co-worker.  I describe this code as "Drop-Thru Code".  The main characteristic is that it uses module scope for a non-trivial number of variables.

Module scope is functionally almost-global.  That is, it's so close to global you'll want to use it, but it's far enough away that you'll end up shooting yourself in the foot.

Consider this code snippet:

#!/bin/env python2.6
import os
import sys
varname = 33
vardict = { 'a' : 44 }
# other imports here
print "Starting!"
def something():
print "thing", varname
print "next"
something()
print "end"

This is what I call drop-thru code.  All the variables are global, and the call to something() will fail because varname is out of scope.

Contrast this with what I prefer, well-encapsulated code:

#!/bin/env python2.6
import os
import sys

class SomeThing(object):
    def __init__(self):
        self.varname = 33 
    def something(self):
        print "thing", self.varname
    def main(self):
        self.vardict = { 'a' : 44 }
        # other imports here
        print "Starting!"
        print "next"
        self.something()
        print "end" 

st = SomeThing()
st.main()

Nothing is global, it's obvious what the scope for things is.  Clean, beautiful, easy to understand.

There are lots of examples of scope problems.  I created this one for my team so they understood my frustration:

outside = 1
def function1():
    try:
        print "f1 outside: %s" % (outside)
        outside += 1
    except:
        print "no outside in function1."
print "a outside: %s" % (outside)
function1()                                     
print "b outside: %s" % (outside)
def function2():
    global outside
    try:
        print "f2 outside: %s" % (outside)
        outside += 1
    except:
        print "no outside in function2"
function2()
function2()
class Dum(object):
    def __init__(self):
        print "dum instantiated."
    def main(self):
        print "Dum main outside: %s" % (outside)
    def changeOutside(self, inval):
        outside = inval
        print "Dum changed: outside: %s" % (outside)
    def changeGlobalOutside(self, inval):
        global outside
        outside = inval
        print "Dum changed: outside: %s" % (outside)
d = Dum()
print "c outside: %s" % (outside)
d.main()
print "d outside: %s" % (outside)
d.changeOutside(33)
print "e outside: %s" % (outside)
d.main()
d.changeGlobalOutside(44)
d.main()
print "f outside: %s" % (outside)
______________________________________________________
output:
krice4@zaphod:~/checkouts/userSandboxes/krice4$ python scopeTest.py 
a outside: 1
no outside in function1.
b outside: 1
f2 outside: 1
f2 outside: 2
dum instantiated.
c outside: 3
Dum main outside: 3
d outside: 3
Dum changed: outside: 33
e outside: 3
Dum main outside: 3
Dum changed: outside: 44
Dum main outside: 44
f outside: 44

In short, Drop-Thru Code is considered harmful.  It allows for lots of scope problems that show up as bugs and frustrations later.  So, avoid them.  Put all the vars you can in a class and invoke that class, you'll be glad you did.  IMHO.

Friday, May 10, 2013

Subversion pre-commit hook script - Python files: prevent tabs, verify svn properties

Here's a precommit hook script I've modified from one I've used before.  I hope it comes in handy.  I'm going to try to submit it to the dev group of subversion itself for inclusion in the contrib/hook-scripts directory.

#!/bin/env python

"""
    pre-commit hook script that does several things:
    * prevents committing any python file containing a tab character.
    * checks if there are tabs in the source file and warns if so;
    * aborts if incorrect properties of eol-style and keywords 'id'.
"""
import sys
import os
import traceback
from optparse import OptionParser

#sys.stderr.write("NOTE:  pre-commit hook script enabled - checks for tabs, svn eol-style and id properties...\n")

def command_output(cmd):
    " Capture a command's standard output. "
    import subprocess
    return subprocess.Popen(
        cmd.split(), stdout=subprocess.PIPE).communicate()[0]

def files_changed(look_cmd):
    """ List the files added or updated by this transaction.

        "svnlook changed" gives output like:
          U   trunk/file1.cpp
          A   trunk/file2.py
    """
    def filename(line):
        return line[4:]

    def added_or_updated(line):
        return line and line[0] in ("A", "U")

    retval = []
    for line in command_output(look_cmd % "changed").split("\n"):
        if added_or_updated(line):
            retval.append(filename(line))
    #sys.stderr.write("files changed: %s" % (retval))
    return retval

def file_contents(filename, look_cmd):
    " Return a file's contents for this transaction. "
    return command_output("%s %s" % (look_cmd % "cat", filename))

def file_get_properties(filename, look_cmd):
    propslines = command_output("%s %s" % (look_cmd % "proplist -v", filename))
    res = {}
    for line in propslines.split('\n'):
        line = line.strip()
        if not line:
            continue
        k, v = line.split(' : ')
        res[k] = v
    return res

def contains_tabs(filename, look_cmd):
    " Return True if this version of the file contains tabs. "
    return "\t" in file_contents(filename, look_cmd)

def check_py_files(look_cmd):
    " Check Python files in this transaction are tab-free. "
   
    def is_py_file(fname):
        return os.path.splitext(fname)[1] == ".py"
   
    py_files_with_tabs    = set()
    py_files_bad_eolstyle = set()
    py_files_bad_exec     = set()
    py_files_bad_keywords = set()
    for ff in files_changed(look_cmd):
        if not is_py_file(ff):
            continue
        if contains_tabs(ff, look_cmd):
            py_files_with_tabs.add(ff)
        props = file_get_properties(ff, look_cmd)
       if props.get('svn:special'):
            sys.stderr.write("file %s has svn:special flag, probably a symlink. don't check other props." % ff)
            continue
        eolstyle = props.get('svn:eol-style')
        #sys.stderr.write("props: %s\neolstyle: '%s'\n" % (props, eolstyle))
        if eolstyle in ('native', 'LFFFFF'):
            py_files_bad_eolstyle.add(ff)
        execut   = props.get('svn:executable')
        if execut not in ['ON', '*']:
            py_files_bad_exec.add(ff)
        keywords = props.get('svn:keywords')
        if (not keywords) or ('Id' not in keywords.split()):
            py_files_bad_keywords.add(ff)

    preventCommit = False
    if len(py_files_with_tabs) > 0:
        sys.stderr.write("The following files contain tabs:\n%s\n"                                                                              % "\n".join(py_files_with_tabs))
        preventCommit = True
    if len(py_files_bad_exec) > 0:
        sys.stderr.write("The following py files are missing 'executable' property, committing anyway, but please fix this:\n%s\n"              % "\n".join(py_files_bad_exec))
        # note, do not prevent commit over this, just warn.
    if len(py_files_bad_keywords) > 0:
        sys.stderr.write("The following files don't have keywords property set to 'Id' at least.  Please fix this before committing:\n%s\n"     % "\n".join(py_files_bad_keywords))
        preventCommit = True
    if len(py_files_bad_eolstyle) > 0:
        sys.stderr.write("The following files don't have svn propset svn:eol-style 'LF', please do so before committing:\n%s\n"                 % "\n".join(py_files_bad_eolstyle))
        preventCommit = True

    return preventCommit

def main():
    usage = """usage: %prog REPOS TXN
        Run pre-commit options on a repository transaction."""

    parser = OptionParser(usage=usage)
    parser.add_option("-r", "--revision",
                      help="Test mode. TXN actually refers to a revision.",
                      action="store_true", default=False)
    errors = 0
    try:
        (opts, (repos, txn_or_rvn)) = parser.parse_args()
        look_opt = ("--transaction", "--revision")[opts.revision]
        look_cmd = "svnlook %s %s %s %s" % (
            "%s", repos, look_opt, txn_or_rvn)
        errors += check_py_files(look_cmd)
    except:
        parser.print_help()
        errors += 1
        sys.stderr.write("Pre-commit hook traceback: %s" % (traceback.format_exc()))
    return errors

if __name__ == "__main__":
    sys.exit(main())

Wednesday, May 08, 2013

Setting up Graphite / Django - MemoryError in Glyph

I'm setting up Graphite (under Django, of course) on a new box, and just saw the following error:

Exception Type: MemoryError
/opt/graphite/webapp/graphite/render/glyph.py in getExtents, line 221

specifically on the line: 
 F = self.ctx.font_extents()
 
turns out the problem was that I didn't have fonts installed.  New box, right?  Yah.
So, did apt-get install:
 
apt-get install cairo
apt-get install pycairo
apt-get install bitmap-fonts
 
The last one took a while.  But, now it's up and going!
 
 
BTW, the total list of apt-get I ran to get Graphite running on Centos-6.1 with python 2.6 was as follows:
 
apt-get install subversion
apt-get install httpd
apt-get install Django 
apt-get install mod_wsgi
apt-get install memcached
apt-get install python-devel
apt-get install  python-tools python-setuptools python-memcached python-pycurl
easy_install django-tagging
easy_install -U distribute
apt-get install libmysqlclient-dev
apt-get install mysql-libs
easy_install python-mysqldb
apt-get install MySQL-python
apt-get install cairo
apt-get install pycairo
apt-get install bitmap-fonts
 
Enjoy!

Oh, and if you are not satisfied with the quality of this posting, Complaints should be addressed to: 
 
Complaints Dept.
British Airways
Ingrams Drive 
Redditch, B79 5UT  
UNITED KINGDOM 
 
 

Wednesday, April 17, 2013

Too Smart for their Britches

I just ran into a bit of bash script that did this:

conflist=`cat www-conf/* |sort -u |sed ':a;N;$!ba;s/\n/ /g'`

On first glance, it's getting cat'ing all the files in a dir, sorting, and returning the result.  BUT WAIT, there's a funky sed script in there.

Sure, I get the 's/\n/ /g' -> get rid of intervening newlines.  But what of the other stuff?

It turns out, some smart-ass decided to code up some fancy schmantzy "I'm Smarter Than You" thumb-to-nose action.   How it works, courtesy of http://www.grymoire.com/Unix/Sed.html :

  • The semicolons separate commands.
  • The :a is a tag, so we can 'goto a' later.
  • The N command says append lines together including their newline character;
    •  The "n" command will print out the current pattern space (unless the "-n" flag is used), empty the current pattern space, and read in the next line of input. The "N" command does not print out the current pattern space and does not empty the pattern space. It reads in the next line, but appends a new line character along with the input line itself to the pattern space.
  • the $!ba command says, 'unless you're the last line of the file, branch to a (goto a).
    •   An easier way is to use the special character "$," which means the last line in the file.

So, the goal was to collapse all this data into a sorted list without newlines.  Note that if I echo $conflist, I see "a b c d e..." just like if I had left out those fancy branches.

What an utter waste of 10 minutes to figure this out, and another 10 to write this up for future reference and helping anyone else.

Questions:
  • Will I ever use this construct?  NO.  I'll find some other way to do it.  If I even need to, given Bash collapses things onto one line with an echo anyway.
  • Do I actively dislike the person who wrote this?  YES.  I don't know who it was, specifically.  But, it doesn't matter, I know what I need to know:  He wasted my time, and was what I would consider the professional equivalent of a braggart.  Code should be easy to read as a primary goal.
Note also, thanks for help:  http://www.catonmat.net/blog/sed-one-liners-explained-part-one/


Tuesday, April 02, 2013

Solved: MongoDB - How to add Binary data to a document using pymongo

I'm trying to add data to a MongoDB document. My data is tuples of doubles, and I'm adding to an array of them. This is a good example of how to add binary data to a mongodb document using pymongo. My doc looks like this (presume I have created it already):

$ mongo
mongos> use kevintest1
switched to db kevintest1
mongos> show collections
metricValue
system.indexes
mongos> db.metricValue.find()
{ "_id" : ObjectId("5159f82f64524c06f5cb1208"), "mtid" : 2, "seqno" : 0, "vals" : [ ] }
mongos> mtid2 = db.metricValue.findOne({ "mtid" : 2})
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [ ]
}
mongos> mtid2
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [ ]
}
mongos> mtid2.bsonsize()
Tue Apr  2 17:23:57 TypeError: mtid2.bsonsize is not a function (shell):1
mongos> Object.bsonsize(mtid2)
54
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
54

NOW, I take a break, and in a different window, I run the following program:
#!/usr/bin/python
import struct
import bson
import pymongo
from pymongo import Connection
from bson import Binary
from pprint import pprint, pformat
conn = pymongo.Connection('myboxname', 7333)
db = conn['kevintest1']
mvc = db.metricValue
print "have mvc.find: %s" % ([x for x in mvc.find()])
mybuffer = struct.pack("dd", 65535.7, 65535.8)
print "binary info to add:  %s" % (pformat(struct.unpack("dd", mybuffer)))
retval = mvc.update({ 'mtid': 2, 'seqno': 0}, { '$push': { 'vals' : Binary(mybuffer) }}, w=1)
print "retval of update: %s" % (retval)
print "After update, have mvc.find: %s" % ([x for x in mvc.find()]) 
 
So, I run it.  Then, I go back to the mongos window, and each time I run it, I check the bson size:

mongos> db.metricValue.findOne({ "mtid" : 2})
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [
        BinData(0,"ZmZmZvb/70CamZmZ+f/vQA==")
    ]
} 
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
78
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
102
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
126
 
There you have it. 24 bytes per inserted pair of doubles. Without packing, it is 30 bytes, so I would be saving 24/30=12/15=4/5 => 20%.