Sunday, November 30, 2014

Python Coding Best Practice Guide - Practical Goodness (prevent over-smart show-offs from killing your project)

Over the last 8? 9? years of coding in Python, I've had to deal with code written by a variety of people:

  • Some of these people were very good at coding so I could quickly understand what they were doing and modify it as needed.
  • Some of these people were very smart.
  • These two groups overlapped only haphazardly.

My definition of good code is stuff that the average joe programmer, maybe 2 years out of a C.S. degree or so (or the equivalent industry experience), can understand quickly and modify to meet the business-demand of the moment.

Bad code is that which has to be read, re-read, analyzed, tried out interactively to see how parts work, analyzed again, and drawn out with data-flow diagrams because it requires a lot of brainpower to hold all of the parts in your head at once.

Maybe I'm not that smart. Maybe I am. I like to think I'm smart. But, when it's 2 am and I have to figure something out Right Now to make it work before I can go back to sleep, or if I'm really working on a giant project and I only have an hour or so to make a small mod someone wants - that's not when I want to be analyzing something to trace back where the problem is.

Back 15 years ago, when I was full-time coding in C (or C++, somewhat rarely), I had a whole list of things that I thought the designers of C, Kernigan and Richie, got wrong. I still remember some of that list. But, the important part is the adage:

Just because you CAN do a thing doesn't mean you SHOULD do a thing.

In C, the verboten (or at least things I avoided) included:

  • extensive use of #define and #ifdef for things other than constants and MAX functions. Smart people show off and write complex crap.
  • unions. Yes, unions. You could declare a struct that has an alternate layout as a union, and switch between the two. It's ripe for stupidity.
  • lots of mallocs and frees. I put everything I could on the stack because if there was an exit path I missed, that memory wasn't getting freed anytime soon.
  • ...etc.

Some of this translates below into Python. I think the corollary to IFDEF in Python is decorators. You're doing the same thing - putting all sorts of logic into a wrapper. That is, you have to figure out the wrapper before they can figure out your code. if it's simple, great, go ahead. If not, you're shooting the person who comes after you.

Some decorators are golden - @property and @classmethod are great, no problem there because you can create getters/setters quickly and singletons, etc. But, a frequent example is something like, @deprecated which writes a lot message with the call stack before turning around and calling the function. BUT, it's just as easy to insert a function call at the top of the function, something like, GlobalCallDeprecated() which does the same thing. When a function works, why use a decorator? People like to show off how smart they are. Whup-tee-do, great, you're smart, so am I, get over it and write something I don't have to spend time figuring out.

With that in mind: Below is the list we're using here at (Large.Fortune-50.Company.Infrastructure.Dept.). This department generates graphs, thus the references to 'graph' boxes.


1.  pep-8
2.  use 4-space indentation.  Anyone touching code should use reindent to fix this.
    There is a program called 'reindent' available on all graph boxes.  it rewrites the file with 4 space indent.
3.  spacing (pep-8) is important, once space around equals in assignment, but none in calls:
    a = 5
    self.runThis(5, 6, abc=12)
4.  run pychecker on all code
        * alias pychecker="pychecker --limit=9999"
        * some unused vars are okay
5.  MOST except the shortest sets of code should be in classes that do nothing but can be invoked from a test.
    MOST outer scope routines should contain try/except blocks.
    When creating something that produces multiple outputs (measurements, for example), each set of measurements
      should be created in a separate try/except block, so a failure in one area does not lead to total failure.
    MOST classes should start with a capital letter, e.g., CamelCase.
    MOST modules should start with a lowercase letter, e.g., camelCase.py
    MOST module globals should be all caps with underscores, e.g., ALL_CAPS
    MOST methods should be lowercase first letter, e.g., camelCase() or camel_case()
    Slight preference for camelCase vs. under_scored var and method names, but both are okay.
    VERY FEW global variables should be used, if any, and all should be at the top of a file.
    Globals should be used in the init so they can be overriden by tests and other code.
    Filenames and/or explicit paths should be at the top, as globals, so they can be found / changed easily.


    import this
    import traceback

    VERY_FEW_GLOBALS = 1

    class Something(object):

        def __init__(self, logger=None):
            if logger:
                self.log = logger
            else:
                self.setupLogging()
            self.var1 = 'c'
            self.var2 = 'ff'
            self.veryFewGlobals = VERY_FEW_GLOBALS

        def blah1(self, instr):
            instr += 'a'
            self.var1 = instr

        def blah2(self, instr):
            instr += 'b'
            self.var1 = instr

        def main(self):
            self.log.info("starting.")  #whatever
            try:
                x = 'abc'
                self.blah1(x)
                self.var2 = self.blah2(x)
                self.var1 = str(self.var1) + ''
            except Exception, e:
                tb = traceback.format_exc()
                self.log.error("Something.main(): Exception: e: %s, tb: %s" % (e, tb))
            return

    if __name__ == '__main__':
        s = Something()
        s.main()

6.  All code should have at least two tests: a trivial test, and an instantiation test.
    Tests should be in a file named test_doSomething.py in a subdirectory tests.
    Monkey-patch / duck-punch methods to reduce and simplify the test.

    # for one test, run:  nosetests metricType:TestMetricType.test_simpleRetrieve
    class TestSomething(unittest.TestCase):

       def test_trivial(self):
           return True

       def test_instantiate(self):
           s = Something()
           assert s

       def test_blah1(self):
           s = Something()
           s.var1 = 'ddd'
           s.blah1('bb')
           assert s.var1 == 'abcab'

       def myroutine(self, instr):
           something = {
               'metricType'  : 'poweriq',
               'metric'      : metricName,
               'timestamp'   : timestamp,
               'value'       : value,
               'context'     : context,
           }

3.  Function calls with lots of parameters should not be indented to the max level - it makes for
    lots of whitespace, all the code on the right side of the screen, too-long lines, etc.

    # BAD - discouraged.
    someVarNameHere = self.callingFunctionNowHere(exceptionallyLongVarName=1,
                                                  freakishlyLongExceptionallyLongVarName=2,
                                                  additionallyLongExceptionallyLongVarName=4,
                                                  somethihng=5)
    # PREFERRED:
    someVarNameHere = self.callingFunctionNowHere(
        exceptionallyLongVarName=1,
        freakishlyLongExceptionallyLongVarName=2,
        additionallyLongExceptionallyLongVarName=4,
        somethihng=5
        )

    # ALSO POSSIBLE:
    someVarNameHere = self.callingFunctionNowHere(exceptionallyLongVarName=1,
        freakishlyLongExceptionallyLongVarName=2,
        additionallyLongExceptionallyLongVarName=4, somethihng=5 )

14. Comments added to the end of multiple lines are discouraged:

    # BAD - Discouraged
    var1 = self.callingSomethingHere(12)   # ESOC-123334
    var2 = self.callingSomethingHere(14)   # ESOC-123334

    # Preferred:

    # added for ESOC-123334:
    var1 = self.callingSomethingHere(12)
    var2 = self.callingSomethingHere(14)

15. Exceptionally long quotes should use the triple-quote concept to prevent problems with
    escaping quotes:

    x = ''' very long
        lines go here and
        continue at indentations
        presuming that's not a problem.
        '''

16.  It is generally quite possible to avoid line-continuation characters.
     Generally, long lines end up being inside a list.
     As with the above function calls with many params, just open the paren
     and keep adding variables.

17.  Line length limits of 80 or even 120 characters are far less than the width of our
     modern screens.  However, some limit should apply since the eye has a hard time aligning
     to very, very long lines.
     There's no hard/fast limit, it's up to us individually, but 150 characters is probably a good limit
     except in very unusual circumstances.

18.  Logging lines should start with the name of the routine, so we know where the line came from.
     self.log.info("someRoutine():  started routine to figure out blah.")

19.  Libraries incorporated into other programs should have the class name also.
     self.log.info("sendReconEvent.connect(): connected to %s" % (pformat(self.destination)))

20.  Getting command line options should always be its own routine.
     This is because test cases cannot parse or understand parseOptions.  Example:

     # pulled out so can be overridden in test routines, e.g., m.getOptionParser = lambda x: None
     def getOptionParser(self):
         return OptionParser()

     def getOptions(self):
         parser = self.getOptionParser()
         parser.add_option("-d", "--debug",   action="store_true", dest="debug",   default=False, help="debug mode for this program, writes debug messages to logfile." )
         parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="verbose mode for this program, prints a lot to stdout." )
         parser.add_option("-p", "--processes", action="store", type="int", dest="processes", default=10, help="number of processes to run.")
         (options, args)             = parser.parse_args()
         self.verbose                = options.verbose
         self.debug                  = options.debug
         self.numProcesses           = options.processes
         if (self.verbose):
             print "verbose=%s, debug=%s, processes=%s" % (self.verbose, self.debug, self.numProcesses)
         return

21.  Except in test routines and sorts, the use of lambda functions is strongly discouraged.  Example:
     return [lambda x, i=i : i * x for i in range(5)]   # WTF? What does it do? Tested: [x(2)for x in f]=>[0,2,4,6,8] Why? Dafuq?

22.  In general, the use of yield is troublesome and should be avoided except when strongly needed.
     Especially grievous is the use of multiple yield statements in one method / function, since tracing
     code execution paths with this is vastly nasty.

23.  In code, avoid the use of 'from blah import *' for many, many, many reasons.  Most proximate:  pychecker can't check the code now.

24.  Decorators:
     * Writing custom decorators is strongly discouraged as it obfuscates code.  
     * Generally acceptable decorators are ones that are included in standard libraries (incl. Django)
     * Decorators should be used as tools, not as weapons of mass confusion.

25.  Any but the most simplistic use should be avoided of the following:
        * getattr and setattr methods

26.  The overall architectural goal of all scripts should be SIMPLICITY of code readability by zombies (the on-call guy at 3 am). 
     This is versus other considerations like execution speed, code size, 'elegance', etc.

27.  No file or class should be named the same as any standard library method.

28.  No variables anywhere should be named with reserved words or the names of standard libraries.  Examples:
     type = 'a'
     resource = 'Something'
     int = 7
     long = True
     path = 'blah/blah1'   # os.path is common, sometimes people do 'from os import path' and then you're in trouble.

29.  The use of **args is discouraged.  It is way too easy to mess up.  It has to be called exactly correctly and used
     within a method exactly correctly.  That is, calls with foo(context) are bad but easy, must do foo(**context) instead,
     but you will not get an error message

     #So, instead of:
     context = { foo: 7, bar: 8, baz: 10 }
     someClassInstance.funcOne(baz=6, **context)
     def funcOne(baz=8, bar=9, **args):
         # magic happened, baz is 6 or 10, but probably not 8.  Good luck.

     # do this, it's utterly obvious even at a glance:
     miscVals = { foo: 7, bar: 8 }
     someClassInstance.funcOne(baz=8, context=miscVals)
     def funcOne(baz=8, bar=9, context={}):
         if baz != 8 and context.get('baz', None):  print "WHICH ONE?"
         baz = context.get('baz', None) or baz
         bar = context.get('bar', None) or bar

30.  To close db handles, instead of __del__ use atexit.register() as in:

     import mysql
     import atexit
     class Bar(object):
         def __init__(self):
             self.dbHandle = mysql.openConnection(...) # or something
             atexit.register(self.cleanup)
         def cleanup(self):
             try:
                 self.dbHandle.close()
                 print "db handle closed."
             except:
                 print "FYI, db close failed."

31.  Additional rules: http://docs.python-guide.org/en/latest/writing/style/
     with the exception of using lambdas, map, and filter.

32.  In a perfect world, we clean up our code to bring it to clarity and obviousness right away.
     Short of that, we should endeavor to clean up our code any time we touch it.
     This includes making scripts use classes and adding a test_blah.py file with a couple of tests in it.
     Before committing, we should always run pychecker.

33.  Functions and methods should not be absurdly long.  
     If they are too long, they should be broken up.
     A handy library for reducing method size is counterStats which allows obj=CounterStats(['x']); obj.x = 5 coding.