Wednesday, January 07, 2015

Custom Python Decorators Considered Harmful

In line with the classic "Go To Statement Considered Harmful" letter, I must extend the argument to cover the 'Decorators' language feature of Python.

"Just because you can do a thing doesn't mean you should do a thing."

Decorators, IMHO, as a general rule, are Evil in the classical sense: they tempt you with easy riches, but there are complications and ambiguities down the road that make life difficult.

I recently read a blog post advocating a @retry decorator that would automatically retry a function up to some number of times until it returned success.

As decorated, the method would fetch a URL a max number of times or until it got a certain HTTP success code. I've seen such methods used before at various places, but have found them to be VERY troublesome.

Test-driven development (TDD) requires that we create methods that can be tested through all their code paths, or at least most of them. Decorators make writing those tests more complicated as well.

So, here's my reasoning:

  • Without a decorator, code that retries n times is obvious in its function.
  • Presumably one would use a decorator so it could be shared across multiple functions, yet the undecorated version can be parameterized just as easily.
  • Modifying this code will happen, by industry averages, about six more times.
  • Each time it's modified, the person reading it will have to understand not just loops, but decorators and the ways they can go wrong.
  • Some of the people who come after you and modify this code will do it incorrectly, and that will cost more time.
  • Writing unit tests for decorated code is usually far more difficult, since how do you test a decorator without testing the code it decorates?
  • People will be tempted to extend the concept and write more decorators that do far more (database operations, disk writes, etc.), setting a bad precedent that decorators are in frequent use.
  • Some decorators are fine: @staticmethod, @classmethod, @property.  These are common and useful.  Further, some libraries like Twisted have some predefined that are useful.
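
To make the first bullet concrete, here is a sketch of the undecorated version (fetch_with_retry, flaky, and the names involved are illustrative, not from any real library): the entire retry policy sits in one plain loop that you can read, parameterize, and unit-test directly.

```python
import time

def fetch_with_retry(fetch, url, max_tries=3, delay=0):
    """Call fetch(url) up to max_tries times and return the first success.

    'Success' here means the call didn't raise -- a sketch, not a real
    HTTP client.  The whole retry policy is visible in one loop.
    """
    last_error = None
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            time.sleep(delay)   # back off between attempts
    raise last_error
```

Testing it is as direct as the code: hand it a stub that fails twice and succeeds on the third call, then assert on the result; there is no decorator machinery to peel apart.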

As a general rule, wherever I see custom decorators used, I'm sorely tempted to yank them for the future good of humanity. Of course, I'm not free to do that a lot of the time, but it's a temptation.

TL;DR version: If you have custom-decorated code, yank it if possible. Don't write custom decorators; they make for crazymaking code.

I think frequently people write this stuff just to prove they're smarter than I am. Perhaps that's true. But, I'm smart enough to realize that the simplest possible code, even if it takes more lines, is the best possible code.

-- Kevin J. Rice

9 years in Python and counting, and loving it (except rampant use of decorators and dependency injection).

Saturday, December 13, 2014

Pylint Broken on install: "control_pragmas = {'disable', 'enable'}"

So, wanting better Python code and having used pychecker for years and years, I was recently pointed to pylint again as a tool-of-choice.

Installing it was simple:

$ sudo pip install pylint
Downloading/unpacking pylint
Downloading pylint-1.4.0-py2.py3-none-any.whl (412kB): 412kB downloaded
Downloading/unpacking astroid>=1.3.2 (from pylint)
Downloading astroid-1.3.2-py2.py3-none-any.whl (163kB): 163kB downloaded
Downloading/unpacking six (from pylint)
Downloading six-1.8.0-py2.py3-none-any.whl
Downloading/unpacking logilab-common>=0.53.0 (from pylint)
Downloading logilab-common-0.63.2.tar.gz (196kB): 196kB downloaded
Running setup.py (path:/tmp/pip_build_root/logilab-common/setup.py) egg_info for package logilab-common
/usr/lib64/python2.6/distutils/dist.py:266: UserWarning: Unknown distribution option: 'test_require'
warnings.warn(msg)
package init file './test/__init__.py' not found (or not a regular file)
Downloading/unpacking unittest2>=0.5.1 (from logilab-common>=0.53.0->pylint)
Downloading unittest2-0.8.0-py2.py3-none-any.whl (94kB): 94kB downloaded
Downloading/unpacking argparse (from unittest2>=0.5.1->logilab-common>=0.53.0->pylint)
Downloading argparse-1.2.2-py2.py3-none-any.whl
Installing collected packages: pylint, astroid, six, logilab-common, unittest2, argparse
Compiling /tmp/pip_build_root/pylint/pylint/lint.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/lint.py', 585, 37, " control_pragmas = {'disable', 'enable'}\n"))

Compiling /tmp/pip_build_root/pylint/pylint/reporters/text.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/reporters/text.py', 136, 18, " for attr in ('msg', 'symbol', 'category', 'C')})\n"))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/abstract_abc_methods.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/abstract_abc_methods.py', 6, 31, 'class Parent(object, metaclass=abc.ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/abstract_class_instantiated_py2.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/abstract_class_instantiated_py2.py', 15, 34, 'class GoodClass(object, metaclass=abc.ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/abstract_class_instantiated_py3.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/abstract_class_instantiated_py3.py', 14, 34, 'class GoodClass(object, metaclass=abc.ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/class_members_py27.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/class_members_py27.py', 34, 38, 'class TestMetaclass(object, metaclass=ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/class_members_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/class_members_py30.py', 34, 38, 'class TestMetaclass(object, metaclass=ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/defined_and_used_on_same_line.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/defined_and_used_on_same_line.py', 29, 20, "with open('f') as f, open(f.read()) as g:\n"))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/no_name_in_module.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/no_name_in_module.py', 15, 26, "print('hello world', file=sys.stdout)\n"))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/old_style_class_py27.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/old_style_class_py27.py', 10, 29, 'class NotOldStyle2(metaclass=type):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/unbalanced_tuple_unpacking_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/unbalanced_tuple_unpacking_py30.py', 9, 20, ' first, second, *last = (1, 2, 3, 4)\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/undefined_variable_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/functional/undefined_variable_py30.py', 8, 19, ' def test(self)->Undefined: # [undefined-variable]\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/functional/yield_outside_func.py ...
SyntaxError: ("'yield' outside function", ('/tmp/pip_build_root/pylint/pylint/test/functional/yield_outside_func.py', 2, None, 'yield 1 # [yield-outside-function]\n'))

/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:4: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (1 == 1, 2 == 2), "no error"
/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:5: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (1 == 1, 2 == 2) #this should generate a warning
/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:7: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (1 == 1, ), "no error"
/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:8: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (1 == 1, )
/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:9: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (1 == 1, 2 == 2, 3 == 5), "no error"
/tmp/pip_build_root/pylint/pylint/test/input/func_assert_2uple.py:11: SyntaxWarning: assertion is always true, perhaps remove parentheses?
assert (True, 'error msg') #this should generate a warning
Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_bad_cont_dictcomp_py27.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_bad_cont_dictcomp_py27.py', 8, 9, ' for x in range(3)}\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_bad_exception_context_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_bad_exception_context_py30.py', 14, 25, ' raise IndexError from 1\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_continue_not_in_loop.py ...
SyntaxError: ("'continue' not properly in loop", ('/tmp/pip_build_root/pylint/pylint/test/input/func_continue_not_in_loop.py', 8, None, 'continue\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_e0108.py ...
SyntaxError: ("duplicate argument '_' in function definition",)

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_exec_used_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_exec_used_py30.py', 6, 22, "exec('a = 1', globals={})\n"))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_keyword_repeat.py ...
SyntaxError: ('keyword argument repeated', ('/tmp/pip_build_root/pylint/pylint/test/input/func_keyword_repeat.py', 8, None, 'function_default_arg(two=5, two=7)\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_kwoa_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_kwoa_py30.py', 3, 15, 'def function(*, foo):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_loopvar_in_dict_comp_py27.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_loopvar_in_dict_comp_py27.py', 8, 28, ' return {x: lambda: x for x in range(10)}\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_noerror_mcs_attr_access.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_noerror_mcs_attr_access.py', 14, 29, 'class Test(object, metaclass=Meta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_noerror_unused_variable_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_noerror_unused_variable_py30.py', 12, 21, ' nonlocal attr\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_return_outside_func.py ...
SyntaxError: ("'return' outside function", ('/tmp/pip_build_root/pylint/pylint/test/input/func_return_outside_func.py', 3, None, 'return\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_return_yield_mix_py_33.py ...
SyntaxError: ("'return' with argument inside generator",)

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_set_literal_as_default_py27.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_set_literal_as_default_py27.py', 5, 23, 'def function1(value={1}):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_syntax_error.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_syntax_error.py', 1, 9, 'def toto\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_undefined_metaclass_var_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_undefined_metaclass_var_py30.py', 8, 20, 'class Bad(metaclass=ABCMet):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_unused_import_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_unused_import_py30.py', 10, 21, 'class Meta(metaclass=abc.ABCMeta):\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_used_before_assignment_py30.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/test/input/func_used_before_assignment_py30.py', 10, 20, ' nonlocal cnt\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/func_w0705.py ...
SyntaxError: ("default 'except:' must be last", ('/tmp/pip_build_root/pylint/pylint/test/input/func_w0705.py', 28, None, '__revision__ += 1\n'))

Compiling /tmp/pip_build_root/pylint/pylint/test/input/syntax_error.py ...
Sorry: IndentationError: ('expected an indented block', ('/tmp/pip_build_root/pylint/pylint/test/input/syntax_error.py', 2, 5, "print('hop')\n"))
Compiling /tmp/pip_build_root/pylint/pylint/utils.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/pylint/pylint/utils.py', 59, 26, 'MSG_TYPES_LONG = {v: k for k, v in six.iteritems(MSG_TYPES)}\n'))

Compiling /tmp/pip_build_root/astroid/astroid/tests/testdata/python3/data/module.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/testdata/python3/data/module.py', 55, 32, " print('yo', end=' ')\n"))

Compiling /tmp/pip_build_root/astroid/astroid/tests/testdata/python3/data/module2.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/testdata/python3/data/module2.py', 100, 22, "print('bonjour', file=stream)\n"))

Compiling /tmp/pip_build_root/astroid/astroid/tests/unittest_brain.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/unittest_brain.py', 71, 33, ' self.assertSetEqual({"a", "b", "c"}, set(base.instance_attrs))\n'))

Compiling /tmp/pip_build_root/astroid/astroid/tests/unittest_inference.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/unittest_inference.py', 492, 44, ' self.assertSetEqual({n.__class__ for n in xxx.infered()},\n'))

Compiling /tmp/pip_build_root/astroid/astroid/tests/unittest_modutils.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/unittest_modutils.py', 248, 41, " {os.path.join(package, x) for x in ['__init__.py', 'module.py', 'module2.py', 'noendingnewline.py', 'nonregr.py']})\n"))

Compiling /tmp/pip_build_root/astroid/astroid/tests/unittest_scoped_nodes.py ...
SyntaxError: ('invalid syntax', ('/tmp/pip_build_root/astroid/astroid/tests/unittest_scoped_nodes.py', 556, 39, " expected_methods = {'__init__', 'class_method', 'method', 'static_method'}\n"))

Running setup.py install for logilab-common
/usr/lib64/python2.6/distutils/dist.py:266: UserWarning: Unknown distribution option: 'test_require'
warnings.warn(msg)
package init file './test/__init__.py' not found (or not a regular file)
package init file './test/__init__.py' not found (or not a regular file)
changing mode of build/scripts-2.6/pytest from 644 to 755
package init file './test/__init__.py' not found (or not a regular file)
Installing /usr/lib/python2.6/site-packages/logilab_common-0.63.2-py2.6-nspkg.pth
changing mode of /usr/bin/pytest to 755
package init file './test/__init__.py' not found (or not a regular file)
Successfully installed pylint astroid six logilab-common unittest2 argparse
Cleaning up...
Thinking this would be good, I did the normal first-use invocation:

[esm@graph301p:bin]$ pylint
Traceback (most recent call last):
  File "/usr/bin/pylint", line 11, in
    sys.exit(run_pylint())
  File "/usr/lib/python2.6/site-packages/pylint/__init__.py", line 22, in run_pylint
    from pylint.lint import Run
  File "/usr/lib/python2.6/site-packages/pylint/lint.py", line 585
    control_pragmas = {'disable', 'enable'}
                                ^
SyntaxError: invalid syntax

No joy.  Thinking my install had failed, I turned to the intertubewebs and found that the Pylint people had decided not to support Python 2.6 anymore.  It's apparently a Python 2.7-and-up game now.


I found out from here:  https://bitbucket.org/logilab/pylint/issue/390/py26-compatiblity-broken
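
If you're stuck on Python 2.6, pinning a pre-1.4 release (e.g. pip install 'pylint<1.4') should still install cleanly, since 1.4 is where 2.6 support was dropped. A tiny guard like the following makes the requirement explicit in your own tooling (the helper name is mine, not pylint's):

```python
import sys

def can_run_modern_pylint(version_info=None):
    """Pylint 1.4+ requires Python 2.7 or later; 2.6 is out."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= (2, 7)
```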

Not to worry, though, pychecker still works.

Sunday, November 30, 2014

Python Coding Best Practice Guide - Practical Goodness (prevent over-smart show-offs from killing your project)

Over the last 8? 9? years of coding in Python, I've had to deal with code written by a variety of people:

  • Some of these people were very good at coding so I could quickly understand what they were doing and modify it as needed.
  • Some of these people were very smart.
  • These two groups overlapped only haphazardly.

My definition of good code is stuff that the average joe programmer, maybe 2 years out of a C.S. degree or so (or the equivalent industry experience), can understand quickly and modify to meet the business-demand of the moment.

Bad code is that which has to be read, re-read, analyzed, tried out interactively to see how parts work, analyzed again, and drawn out with data-flow diagrams because it requires a lot of brainpower to hold all of the parts in your head at once.

Maybe I'm not that smart. Maybe I am. I like to think I'm smart. But, when it's 2 am and I have to figure something out Right Now to make it work before I can go back to sleep, or if I'm really working on a giant project and I only have an hour or so to make a small mod someone wants - that's not when I want to be analyzing something to trace back where the problem is.

Back 15 years ago, when I was full-time coding in C (or C++, somewhat rarely), I had a whole list of things that I thought the designers of C, Kernighan and Ritchie, got wrong. I still remember some of that list. But, the important part is the adage:

Just because you CAN do a thing doesn't mean you SHOULD do a thing.

In C, the verboten items (or at least the things I avoided) included:

  • extensive use of #define and #ifdef for anything other than constants and MAX-style macros. Smart people show off and write complex crap.
  • unions. Yes, unions. You could declare a struct that has an alternate layout as a union, and switch between the two. It's ripe for stupidity.
  • lots of mallocs and frees. I put everything I could on the stack because if there was an exit path I missed, that memory wasn't getting freed anytime soon.
  • ...etc.

Some of this translates into the Python advice below. I think the corollary to #ifdef in Python is decorators. You're doing the same thing - putting all sorts of logic into a wrapper. That is, the reader has to figure out the wrapper before they can figure out your code. If it's simple, great, go ahead. If not, you're shooting the person who comes after you.

Some decorators are golden - @property and @classmethod are great, no problem there, because you can create getters/setters quickly, singletons, etc. But a frequent example is something like @deprecated, which logs a message with the call stack before turning around and calling the function. BUT, it's just as easy to insert a function call at the top of the function, something like GlobalCallDeprecated(), which does the same thing. When a function call works, why use a decorator? People like to show off how smart they are. Whup-tee-do, great, you're smart, so am I; get over it and write something I don't have to spend time figuring out.
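
For comparison, here's what the plain-call version of deprecation looks like; warn_deprecated and old_api are hypothetical names, and the stdlib warnings module does the actual work:

```python
import warnings

def warn_deprecated(name):
    """Flag a deprecated routine with a plain function call -- no wrapper
    to unravel, and the notice sits right where the work happens."""
    warnings.warn("%s is deprecated" % name, DeprecationWarning, stacklevel=3)

def old_api(x):
    warn_deprecated("old_api")   # the first line tells the whole story
    return x * 2
```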

With that in mind: Below is the list we're using here at (Large.Fortune-50.Company.Infrastructure.Dept.). This department generates graphs, thus the references to 'graph' boxes.


1.  pep-8
2.  use 4-space indentation.  Anyone touching code should use reindent to fix this.
    There is a program called 'reindent' available on all graph boxes.  It rewrites the file with a 4-space indent.
3.  spacing (pep-8) is important: one space around equals in assignment, but none around keyword equals in calls:
    a = 5
    self.runThis(5, 6, abc=12)
4.  run pychecker on all code
        * alias pychecker="pychecker --limit=9999"
        * some unused vars are okay
5.  MOST code, except the shortest scripts, should be in classes that do nothing on import but can be invoked from a test.
    MOST outer scope routines should contain try/except blocks.
    When creating something that produces multiple outputs (measurements, for example), each set of measurements
      should be created in a separate try/except block, so a failure in one area does not lead to total failure.
    MOST classes should start with a capital letter, e.g., CamelCase.
    MOST modules should start with a lowercase letter, e.g., camelCase.py
    MOST module globals should be all caps with underscores, e.g., ALL_CAPS
    MOST methods should be lowercase first letter, e.g., camelCase() or camel_case()
    Slight preference for camelCase vs. under_scored var and method names, but both are okay.
    VERY FEW global variables should be used, if any, and all should be at the top of a file.
    Globals should be copied into instance attributes in __init__ so they can be overridden by tests and other code.
    Filenames and/or explicit paths should be at the top, as globals, so they can be found / changed easily.


    import logging
    import this
    import traceback

    VERY_FEW_GLOBALS = 1

    class Something(object):

        def __init__(self, logger=None):
            if logger:
                self.log = logger
            else:
                self.setupLogging()
            self.var1 = 'c'
            self.var2 = 'ff'
            self.veryFewGlobals = VERY_FEW_GLOBALS

        def setupLogging(self):
            logging.basicConfig()
            self.log = logging.getLogger('Something')

        def blah1(self, instr):
            instr += 'a'
            self.var1 = instr
            return instr

        def blah2(self, instr):
            instr += 'b'
            self.var1 = instr
            return instr

        def main(self):
            self.log.info("main(): starting.")
            try:
                x = 'abc'
                self.blah1(x)
                self.var2 = self.blah2(x)
            except Exception as e:
                tb = traceback.format_exc()
                self.log.error("Something.main(): Exception: e: %s, tb: %s" % (e, tb))
            return

    if __name__ == '__main__':
        s = Something()
        s.main()

6.  All code should have at least two tests: a trivial test, and an instantiation test.
    Tests should be in a file named test_doSomething.py in a subdirectory tests.
    Monkey-patch / duck-punch methods to reduce and simplify the test.

    # for one test, run:  nosetests metricType:TestMetricType.test_simpleRetrieve
    class TestSomething(unittest.TestCase):

       def test_trivial(self):
           return True

       def test_instantiate(self):
           s = Something()
           assert s

       def test_blah1(self):
           s = Something()
           s.var1 = 'ddd'
           s.blah1('bb')
           assert s.var1 == 'bba'

       def myroutine(self, instr):
           # layout example; metricName, timestamp, value, and context come from elsewhere
           something = {
               'metricType'  : 'poweriq',
               'metric'      : metricName,
               'timestamp'   : timestamp,
               'value'       : value,
               'context'     : context,
           }

7.  Function calls with lots of parameters should not be indented to the max level - it makes for
    lots of whitespace, all the code on the right side of the screen, too-long lines, etc.

    # BAD - discouraged.
    someVarNameHere = self.callingFunctionNowHere(exceptionallyLongVarName=1,
                                                  freakishlyLongExceptionallyLongVarName=2,
                                                  additionallyLongExceptionallyLongVarName=4,
                                                  something=5)
    # PREFERRED:
    someVarNameHere = self.callingFunctionNowHere(
        exceptionallyLongVarName=1,
        freakishlyLongExceptionallyLongVarName=2,
        additionallyLongExceptionallyLongVarName=4,
        something=5
        )

    # ALSO POSSIBLE:
    someVarNameHere = self.callingFunctionNowHere(exceptionallyLongVarName=1,
        freakishlyLongExceptionallyLongVarName=2,
        additionallyLongExceptionallyLongVarName=4, something=5)

14. Comments added to the end of multiple lines are discouraged:

    # BAD - Discouraged
    var1 = self.callingSomethingHere(12)   # ESOC-123334
    var2 = self.callingSomethingHere(14)   # ESOC-123334

    # Preferred:

    # added for ESOC-123334:
    var1 = self.callingSomethingHere(12)
    var2 = self.callingSomethingHere(14)

15. Exceptionally long quotes should use the triple-quote concept to prevent problems with
    escaping quotes:

    x = ''' very long
        lines go here and
        continue at indentations
        presuming that's not a problem.
        '''

16.  It is generally quite possible to avoid line-continuation characters.
     Generally, long lines end up being inside a list.
     As with the above function calls with many params, just open the paren
     and keep adding variables.

17.  Line length limits of 80 or even 120 characters are far less than the width of our
     modern screens.  However, some limit should apply since the eye has a hard time aligning
     to very, very long lines.
     There's no hard-and-fast limit; it's up to us individually, but 150 characters is probably a good
     ceiling except in very unusual circumstances.

18.  Logging lines should start with the name of the routine, so we know where the line came from.
     self.log.info("someRoutine():  started routine to figure out blah.")

19.  Libraries incorporated into other programs should have the class name also.
     self.log.info("sendReconEvent.connect(): connected to %s" % (pformat(self.destination)))

20.  Getting command line options should always be its own routine.
     This is because test cases cannot control the command line, so option parsing must be easy to override.  Example:

     # pulled out so it can be overridden in test routines, e.g., m.getOptionParser = lambda: None
     def getOptionParser(self):
         return OptionParser()

     def getOptions(self):
         parser = self.getOptionParser()
         parser.add_option("-d", "--debug",   action="store_true", dest="debug",   default=False, help="debug mode for this program, writes debug messages to logfile." )
         parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="verbose mode for this program, prints a lot to stdout." )
         parser.add_option("-p", "--processes", action="store", type="int", dest="processes", default=10, help="number of processes to run.")
         (options, args)             = parser.parse_args()
         self.verbose                = options.verbose
         self.debug                  = options.debug
         self.numProcesses           = options.processes
         if (self.verbose):
             print "verbose=%s, debug=%s, processes=%s" % (self.verbose, self.debug, self.numProcesses)
         return

21.  Except in test routines and sorts, the use of lambda functions is strongly discouraged.  Example:
     return [lambda x, i=i : i * x for i in range(5)]   # WTF? What does it do? Tested: [x(2)for x in f]=>[0,2,4,6,8] Why? Dafuq?

22.  In general, the use of yield is troublesome and should be avoided except when strongly needed.
     Especially grievous is the use of multiple yield statements in one method / function, since tracing
     code execution paths with this is vastly nasty.
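
     As a sketch of the preferred shape: accumulate into a list and return it, unless the data genuinely won't fit in memory. One entry point, one exit, nothing suspended mid-function.

```python
def even_squares(limit):
    """Return (not yield) the squares of the even numbers below limit.

    A caller can len() it, print it, or iterate it twice; there is no
    generator state to reason about when tracing execution paths.
    """
    results = []
    for n in range(limit):
        if n % 2 == 0:
            results.append(n * n)
    return results
```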

23.  In code, avoid the use of 'from blah import *' for many, many, many reasons.  Most proximate:  pychecker can't check the code now.

24.  Decorators:
     * Writing custom decorators is strongly discouraged as it obfuscates code.  
     * Generally acceptable decorators are ones that are included in standard libraries (incl. Django)
     * Decorators should be used as tools, not as weapons of mass confusion.

25.  Avoid all but the most simplistic uses of the following:
        * getattr and setattr methods
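
     The objection, sketched below with hypothetical names: attribute names passed as runtime strings defeat both the reader and tools like pychecker, because a typo silently creates a new attribute instead of failing loudly.

```python
class Config(object):
    def __init__(self):
        self.debug = False
        self.verbose = False

# DISCOURAGED: 'name' is data, so a typo like 'verbos' just creates a new
# attribute and the intended flag stays False.
def set_flag_dynamic(cfg, name, value):
    setattr(cfg, name, value)

# PREFERRED: plain attribute access; typos are visible to readers and linters.
def enable_verbose(cfg):
    cfg.verbose = True
```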

26.  The overall architectural goal of all scripts should be SIMPLICITY: code readable by zombies (the on-call guy at 3 am).
     This is versus other considerations like execution speed, code size, 'elegance', etc.

27.  No file or class should be named the same as any standard library module.

28.  No variables anywhere should be named with reserved words or the names of standard libraries.  Examples:
     type = 'a'
     resource = 'Something'
     int = 7
     long = True
     path = 'blah/blah1'   # os.path is common, sometimes people do 'from os import path' and then you're in trouble.
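
     A sketch of why this bites: rebinding a builtin is perfectly legal, and it silently breaks every later use of that name in the same scope.

```python
def bad_scope():
    int = 7               # shadows the builtin int()
    try:
        return int('42')  # TypeError: 'int' object is not callable
    except TypeError:
        return None

def good_scope():
    count = 7             # names that collide with nothing
    kind = 'a'
    return count + len(kind)
```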

29.  The use of **args is discouraged.  It is way too easy to mess up.  It has to be called exactly
     correctly and used within a method exactly correctly.  That is, a call like foo(context) silently
     binds the whole dict to the first positional parameter when you meant foo(**context), and you
     will not get an error message.

     # So, instead of:
     context = {'foo': 7, 'bar': 8, 'baz': 10}
     someClassInstance.funcOne(baz=6, **context)   # TypeError: multiple values for 'baz'
     def funcOne(baz=8, bar=9, **args):
         pass   # magic happens; was baz meant to be 6, 8, or 10?  Good luck.

     # do this, it's utterly obvious even at a glance:
     miscVals = {'foo': 7, 'bar': 8}
     someClassInstance.funcOne(baz=8, context=miscVals)
     def funcOne(baz=8, bar=9, context=None):
         context = context or {}
         if baz != 8 and context.get('baz', None):
             print "WHICH ONE?"
         baz = context.get('baz', None) or baz
         bar = context.get('bar', None) or bar

30.  To close db handles, instead of __del__ use atexit.register() as in:

     import mysql
     import atexit
     class Bar(object):
         def __init__(self):
             self.dbHandle = mysql.openConnection(...) # or something
             atexit.register(self.cleanup)
         def cleanup(self):
             try:
                 self.dbHandle.close()
                 print "db handle closed."
             except:
                 print "FYI, db close failed."

31.  Additional rules: http://docs.python-guide.org/en/latest/writing/style/
     with the exception of using lambdas, map, and filter.

32.  In a perfect world, we clean up our code to bring it to clarity and obviousness right away.
     Short of that, we should endeavor to clean up our code any time we touch it.
     This includes making scripts use classes and adding a test_blah.py file with a couple of tests in it.
     Before committing, we should always run pychecker.

33.  Functions and methods should not be absurdly long.  
     If they are too long, they should be broken up.
     A handy library for reducing method size is counterStats which allows obj=CounterStats(['x']); obj.x = 5 coding.




Thursday, September 04, 2014

SOLVED: MongoDB - Pymongo ConnectionFailure: function takes at most 4 arguments (5 given)

I was getting this odd message, which made pymongo fail both the Connection() and MongoClient() instantiations.  No connection?  Really?  Here's the message:

[user@boxnamehere:logs]$ python
Python 2.6.6 (r266:84292, Jul 20 2011, 10:22:43)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymongo import MongoClient
>>> mc = MongoClient('servername.here', 11222)
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 369, in __init__
    raise ConnectionFailure(str(e))
pymongo.errors.ConnectionFailure: function takes at most 4 arguments (5 given)

So, what's wrong?  It's tough to find out.  I went in, backed up, and modified the files in the lib64/python2.6/site-packages/pymongo and bson/ dirs.  I imported traceback and added tb = format_exc() at each level of try/except I needed.  This gave me a really long traceback, but at least I could see what was happening.
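
The pattern, boiled down (unpack here is a stand-in for the library internals that were failing, not a real pymongo function): catch, capture the formatted traceback, and re-raise with the traceback folded into the message, which is why the tracebacks below appear inside the exception text itself.

```python
import traceback

def unpack(data):
    # stand-in for the deep library call that was blowing up
    raise TypeError("function takes at most 4 arguments (5 given)")

def instrumented_call(data):
    try:
        return unpack(data)
    except Exception as exc:
        tb = traceback.format_exc()
        # fold the full traceback into the message so it survives re-raising
        raise type(exc)("%s\nTraceback was:\n%s" % (exc, tb))
```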

Once modified, my test program now produces:

user@boxname:~$ ./testPymongo.py
starting.
Traceback (most recent call last):
  File "./testPymongo.py", line 5, in
    mc = MC(host='servername.goes.here', port=11222, socketTimeoutMS=15000)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 373, in __init__
    raise ConnectionFailure(msg)
pymongo.errors.ConnectionFailure: function takes at most 4 arguments (5 given) Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 867, in __find_node
    member, nodes = self.__try_node(candidate)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 714, in __try_node
    {'ismaster': 1})
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 695, in __simple_command
    response = helpers._unpack_response(response)['data'][0]
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 117, in _unpack_response
    compile_re)
TypeError: function takes at most 4 arguments (5 given)
, Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 367, in __init__
    self._ensure_connected(True)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 936, in _ensure_connected
    self.__ensure_member()
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 807, in __ensure_member
    member, nodes = self.__find_node()
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 900, in __find_node
    raise AutoReconnect(', '.join(errors))
AutoReconnect: function takes at most 4 arguments (5 given) Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 867, in __find_node
    member, nodes = self.__try_node(candidate)
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 714, in __try_node
    {'ismaster': 1})
  File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 695, in __simple_command
    response = helpers._unpack_response(response)['data'][0]
  File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 117, in _unpack_response
    compile_re)
TypeError: function takes at most 4 arguments (5 given)

So, we see the problem: this pymongo version calls, at about line 117 of /usr/lib64/python2.6/site-packages/pymongo/helpers.py,

result["data"] = bson.decode_all(response[20:],
                                  as_class, tz_aware, uuid_subtype,
                                  compile_re)


But, when I go into /usr/lib64/python2.6/site-packages/bson/__init__.py, I see the function defined differently:

def decode_all(data, as_class=dict,
               tz_aware=True, uuid_subtype=OLD_UUID_SUBTYPE):
    """Decode BSON data to multiple documents.


So, I could fix this in the bson code, by adding an optional param.

Instead, though, I'm going to just remove the extraneous param since I'm thinking I don't need it.

Changeset: One file modified: /usr/lib64/python2.6/site-packages/pymongo/helpers.py

[user@server:pymongo]$ diff helpers.py helpers.py.orig
116,117c116,117
<                                      as_class, tz_aware, uuid_subtype
<                                     )
---
>                                      as_class, tz_aware, uuid_subtype,
>                                      compile_re)

*Solution*

Turns out, the problem isn't with the code in pymongo.  It's that MongoDB Inc. has chosen to bundle their own version of the 'bson' package in with pymongo, and that version doesn't engage in happy-kindergarten-play with the standalone PyPI version of bson.

So, to fix this, do the following (omitting the $ prompts so you can cut and paste).  Note the important step of NOT BELIEVING PIP when it says it uninstalled everything:


sudo pip uninstall bson
sudo pip uninstall pymongo
cd /usr/lib64/python2.6/site-packages
# Now, remove old versions of pymongo and bson.  Pip doesn't delete everything, dammit.
sudo rm -rf bson pymongo*
sudo pip install pymongo

Done!  Pymongo installs its own, proper version of bson.

Restart your processes and re-run your tests, it should work now.  Pymongo will connect to your mongo cluster properly, etc.

Note that the above problem crops up especially often when the box is under heavy load, because (as we saw above) the initial connection failure surfaces as an AutoReconnect.

Enjoy your newfound powers in peace and with goodwill towards all.


Tuesday, September 02, 2014

SOLVED: Skype crashes on startup on Ubuntu 14.04

Just had a problem with Skype crashing immediately upon startup on my Ubuntu 14.04 box (as of Sept. 2, 2014).  This was an unexpected, sudden, and seemingly random failure.

Skype would start up with the blue Skype splash screen, open the contacts window for several seconds, then the window would close with no message.

Needing an error message, I invoked it from the command line and saw:

me@boxname:~/.Skype$ skype
Gtk-Message: Failed to load module "overlay-scrollbar"
Gtk-Message: Failed to load module "unity-gtk-module"
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
... bunch of these ...

(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
Gtk-Message: Failed to load module "canberra-gtk-module"
Corrupt JPEG data: 3014 extraneous bytes before marker 0xd9
Aborted (core dumped)
me@boxname:~$


Found this set of directions: 

http://community.skype.com/t5/Linux/Skype-is-crashing-with-Corrupt-JPEG-data/td-p/3455587

Worked perfectly.

me@boxname:~$  cd
me@boxname:~$ tar -czf my.dot.skype.dir.tgz ~/.Skype
me@boxname:~$ sqlite3 ~/.Skype/my.profile.name/main.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from Messages where type = 68;
... bunch of crap here ...

sqlite> select count(*) from Messages where type = 68;
25
sqlite> delete from Messages where type = 68;
sqlite> .quit
me@boxname:~$ skype
... same error messages as above, but without the final 'Aborted (core dumped)'.
Success!

Thursday, August 28, 2014

SOLVED: diff 2 files, get results without markup

So, I want to do a log rotation and remove old files, keeping only the latest 3.  So, the script starts with:

$ fullLoglist="/tmp/full_loglist"
$ keepLoglist="/tmp/keep_loglist"
$ rmLoglist="/tmp/rm_loglist"

$ ls -lrt ${d}/rs*log* > ${fullLoglist}

 
First, I create a file with all the files listed:


$ cat ${fullLoglist}
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21


I only want the top 3 files, though, and to rm the older ones.  So, I extract the ones I want to keep:

$ cat ${fullLoglist} | head -3 > $keepLoglist
$ cat $keepLoglist
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46



Now, I need to get a diff of the rest of them so I know what to call rm on.  But how?  If I just do a diff, I get:

$ diff /tmp/full_loglist /tmp/keep_loglist
4,8d3
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21



I DON'T WANT the extra markup.  I want diff without markup.  I try to google:

linux diff without markup
linux diff without greater-than less-than
linux diff line format markup
linux diff supress markup
linux diff only different lines

I try various options, like:

$ diff --line-format "%L" --suppress-common-lines /tmp/full_loglist /tmp/keep_loglist
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21

This is WRONG: it prints all the lines, not just the different ones.  Ugh.

SOLUTION ONE:  comm -3

I found two solutions.  The first is to use comm with its -3 option, which suppresses lines that appear in both files and prints just the differences (note that comm expects its inputs to be sorted).  I hadn't ever heard of comm before, but it's nice:

$ comm -3 /tmp/full_loglist /tmp/keep_loglist
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21

SOLUTION TWO:  sort | uniq -u

I don't care about the ordering of these things, so I can use the simple solution of sort and uniq -u.  The uniq command normally collapses runs of duplicate lines; its -u option instead prints only lines that occur exactly once.

$ cat /tmp/full_loglist /tmp/keep_loglist | sort  | uniq -u
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21


SOLVED!   No >, no < signs with diff and no line number markup.


Script to implement this process, for your edification:

#!/bin/bash

cd /opt/storage
dirlist=`ls -d graphdb_[1234][ij]/rs-*`
fullLoglist="/tmp/full_loglist"
keepLoglist="/tmp/keep_loglist"
rmLoglist="/tmp/rm_loglist"

for d in $dirlist
do     
  echo "---------------------------"
  echo "dir: ${d}"
  ls ${d}/rs*log* > ${fullLoglist}
  ls -al ${fullLoglist}
  echo "full files: `cat ${fullLoglist}`"
  cat ${fullLoglist} | head -3 > $keepLoglist
  echo "keep files: `cat ${keepLoglist}`"
  cat $keepLoglist $fullLoglist | sort | uniq -u > $rmLoglist
  echo "rm   files: `cat ${rmLoglist}`"
  if [ -s $rmLoglist ]
  then
      echo "Removing files...."
      cat $rmLoglist | xargs rm -v
  else
      echo "No files to remove this time."
  fi
done


Done.








Tuesday, August 26, 2014

SOLVED: Multiprocess updates of shared dict-of-dicts

Earlier, I posted the following:

I've run into my very first bug in Python.  It's a known, existing bug, but it's still the first time I've actually encountered one spontaneously.

I should caveat that I occasionally forget things, and I've been using Python for 8 years, so it could be that this has happened before and I've forgotten it.

The problem is as follows:

* I'm working in multiple processes, so I'm sharing a data structure across processes.  It's a dict.  So, I do:

from multiprocessing import Manager
manager = Manager()
d = manager.dict()

and share the sucker around.  Then, I do:

self.ddict.setdefault(mname, {})
#self.ddict[mname][ts] = val
self.log.error("updating mname: %s w/ ts: %s, val: %s" % (mname, ts, val))
self.ddict[mname].update({ts:val})
self.log.error("ddict %s, mname at %s" % (self.ddict, self.ddict[mname]))


output: 

2014-08-26 11:20:57,221 MainThread ERROR updating mname: a.b.c.d w/ ts: 1409070057, val: 1.0
2014-08-26 11:20:57,221 MainThread ERROR ddict {'a.b.c.d': {}}, mname at {}
2014-08-26 11:20:57,222 MainThread ERROR ddict: {'a.b.c.d': {}}


According to http://bugs.python.org/issue6766, this is a known bug.  I'm using Python2.6 and cannot upgrade. 

Damnit.  Looking for a workaround.


SOLVED:

The problem is that the shared-memory manager owns the data structure and distributes updates to everyone using it. That means the manager has to know when the structure is updated in one process so it can tell the other processes to pull in the change.

If you're using a dict-of-dicts (DOD), the inner dict just looks like an opaque blob that process A changes by direct access to the object, without informing the manager of the change. So, to fix this, do:

manager = Manager()
dod = manager.dict()
process ONE: dod['a'] = { 'x' : 12 }
# process TWO sees this update of a = x:12
process TWO: dod['a']['x'] = 13
# process ONE has no idea this happened; the manager doesn't propagate the update.
# to fix:
process TWO: y = dod['a']; y['x'] = 13; dod['a'] = y
# reassigning dod['a'] informs the manager that the value changed.

Wednesday, January 15, 2014

WANTED: Complete List of WhiteHouse.Gov Petitions

I've created a petition on http://whitehouse.gov to modify copyright law so that no law enacted in the United States at the federal, state, or local level can be copyrighted.  This would prevent jerk corporations from claiming copyright when someone puts the code/law/edict on a webpage.

But there aren't enough signatures yet, so I tried advertising to my FB friends.  I have to get to 150 signatures before the petition becomes visible in the list of open petitions on the website.

That got me thinking:  how do I find all the other petitions that aren't available for viewing yet due to being not-advertised enough?

They provide a short URL for each petition, and the shortener is probably incremental.  Mine is http://wh.gov/lInfx (PLEASE SIGN IT), and I figure the other petitions might have URLs near it.

So, here's my program to find them:

#!/usr/bin/python

#http://wh.gov/lInfx
#http://wh.gov/lInfx

nums1 = [x for x in range(65, 91)]   # ord('A') through ord('Z'); range's end is exclusive
nums2 = [x for x in range(97, 123)]  # ord('a') through ord('z')
nums = []
nums.extend(nums1)
nums.extend(nums2)
print nums

with open("/tmp/urls.wh", "w") as OFH:
    for i in nums:
        for j in nums:
            for k in nums:
                for m in nums:
                    OFH.write("http://wh.gov/l%c%c%c%c\n" % (i, j, k, m))
                    

# wget -a out.log -t 1 --read-timeout=3 --max-redirect 1  --save-headers


No luck yet, it takes a long time to run.

https://petitions.whitehouse.gov/petition/holidays-muslim/6T6csRph
https://petitions.whitehouse.gov/petition//95WplFSK
https://petitions.whitehouse.gov/petition/muslim-should-have-holiday-their-holiday/dH8ZTMSf
https://petitions.whitehouse.gov/petition/please-protect-peace-monument-nassau-county-new-york-eisenhower-park/hkXX3082






Tuesday, January 07, 2014

Solved: How to install Python ibm_db DB2 driver on RHEL


Problem 1: I kept getting a message about include files not being installed.  First, I had to get a sysadmin to install them, but he didn't quite finish the job: I had to create the soft link myself.  That is, the include files are installed by default in /opt/ibm/db2/V10.1 but are linked to from /opt/db2inst/sqllib.  So:
$ sudo ln -s /opt/ibm/db2/V10.1/include /opt/db2inst/sqllib/include
That solved the include problem.

Problem 2: I was installing IBM's DB2 Python driver on a RHEL 6.2 Linux box, but I kept getting the message:
Detected 64-bit Python
Environment variable IBM_DB_HOME is not set. Set it to your DB2/IBM_Data_Server_Driver installation directory and retry ibm_db module install.
This was despite executing the userprofile script in /opt/db2inst/sqllib/userprofile, which set the environment vars properly: 
xxxxx@xxxxxx:~/making/ibm_db-2.0.4.1$ env | egrep -i ibm
IBM_DB_LIB=/opt/db2inst1/sqllib/lib
IBM_DB_DIR=/opt/db2inst1/sqllib
IBM_DB_HOME=/home/db2inst1/sqllib
IBM_DB_INCLUDE=/opt/db2inst1/sqllib/include
Dammit!  I couldn't get past this.  I tried both
  • sudo easy_install ibm_db 
  • sudo pip install ibm_db
This was to no avail.  I even tried putting the whole thing in a bash script and doing the export IBM_DB_LIB stuff there.  Again, failure.

Finally, I downloaded the source and ran python setup.py build, which worked, so I was halfway there.  Then I had to run sudo python setup.py install (since it installs into system directories): failure, with the above message about the missing IBM_DB_HOME environment variable.  But I could run env and see the vars there!  Since I was building from source, I edited setup.py and added a pprint of os.environ.  That made it obvious: I was executing under sudo, so as root, and root didn't execute the userprofile script, so no vars were being set.

Quick:  "man sudo" !!

SOLUTION:  The man page shows that to do this right, you must invoke sudo with -E to preserve the environment variables of the current environment.
$ sudo -E python setup.py install
Hurray!  Installed!


Friday, December 13, 2013

Python version of iostat.c

Ok, so I have a problem. I'm trying to create metrics for a Linux system and insert them into a local database. Constraints:
  • Runs on lots of machines;
  • machines may be heavily loaded;
  • Sometimes the kick-off time is delayed beyond a 1-minute standard (typically due to load);
  • I don't like having long-running subprocesses: they might stop/fail and I'd have to restart, they use memory, etc.;
  • I'm doing this in Python;
  • I can store state between runs in a pickle file;
  • I want to replicate the existing fields coming out of vmstat -s and iostat -x -D n (where n is the sample interval in seconds);
  • I want the values of these fields to match likewise.
So, I need to replicate iostat.c in python. I can get absolute numbers from iostat and vmstat and stat, and do the math myself, storing state from the last time I ran it and subtracting to get a diff.

Problem 1: Where is the source code for iostat.c? In Ubuntu at least (hoping RHEL/CentOS is similar), it turns out it's in the sysstat package. I found the sysstat source at: http://freecode.com/projects/sysstat.

Inside this package, there's source code for iostat as a file named iostat.c. (I have yet to find the file hosted online by itself.)
General plan:
  1. Read current values from /proc/stat, /proc/diskstats, and vmstat -s
  2. read values from previous run from disk
  3. find diffs
  4. use diffs to compute values needed
  5. write current values to disk for next time.
This will involve significant coding, don't know if I'll have space to post it all here...


Sunday, October 27, 2013

SOLVED: Ubuntu installation of Canon MX452 Inkjet Printer

Unlike many printers and Linux, this install went simply.
  • Unbox printer.
  • remove orange packing tape.
  • unbox power cords and usb cable, install.
  • open the front pull-down cover and the one under it; push down gently on the grey loops and insert the inkjet cartridges.
  • Put paper in at the bottom; it only holds 50 sheets or so.
  • turn on, wait.
  • open terminal window, type sudo ls and enter pw.
  • open browser, get download from http://support-sg.canon-asia.com/contents/SG/EN/0100515301.html
  • in terminal, cd ~/Downloads
  • tar -xzvf cnijfilter*
  • cd cnijfilter*
  • On printer, turn off and on again just in case.
  • sudo ./install.sh
  • follow prompts accepting defaults.
  • in browser, open google.com and print open page as test. Should hear printer working.
Done.

Tuesday, October 22, 2013

Optimizing Python - getting data out of memcache with struct.unpack

So, I have this Memcache data store that holds timestamps and values from a monitoring application. Since each memcache key corresponds to an hour's data, I only need to store 2 bytes for the number of seconds past the hour. I don't care about duplicate data being stored, but on retrieval I'd like to eliminate it if it exists.

Input data is: (ts, val), (ts, val), ... encoded using Python's struct.pack command. The ts (timestamp) is packed with format h (a signed 2-byte short). The val (value) is a 4-byte floating-point number, packed with format f.

The original version of this encoding was:

    def OLD_rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        while raw:
            ts, val, raw = raw[:2], raw[2:6], raw[6:]
            ts = timeOffset + struct.unpack('h', ts)[0]
            val = struct.unpack('f', val)[0]
            tsVals.append((ts, val))
        return tsVals
I found this version:
  • ran really slowly;
  • didn't eliminate duplicate values;
  • would really choke as the input data grew (as in 33,000 datapoints in an hour).

For the second version, I knew I had to stop with the copying of the data string over and over again, which I knew was eating major cycles.

Doing some math, I figured out I could iterate over the string, extracting each element and converting the two parts to python numbers.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

This was better timewise, but didn't remove duplicate data. I made the 'seen it yet' test occur even before the conversion to (int, float), which saved a bit of time doing useless conversions.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            if rawtime in seenTimes:
                continue
            seenTimes.add(rawtime)
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

Yet, it was STILL TOO SLOW. Where was the time going? I timed the various parts and found the slow bit was the conversion to int/float. That unpack was happening a lot and the time added up.

I tried the following but FAILED.

        # BAD DON'T USE ** BAD DON'T USE **
        elems  = rawLen / 6.0  # 6 bytes per - 2=time + 4=data.
        intElems = int(elems)
        if (elems != intElems):
            self.log.warning("elems non-integer: len: %s" % (rawLen))
            return []
        unp = struct.unpack("hf"*intElems, raw)
        # BAD DON'T USE ** BAD DON'T USE **

The above fails because, packed back-to-back, there's a word-alignment problem that unpack can't cope with: in native mode, struct expects each 'hf' pair to occupy 8 bytes (the float aligned on a 4-byte boundary), not our 6. The data would have to be packed as (short, padding, float) for the combined unpack to accept it.

But, I couldn't give up, this had to work better. So, I extract all the ints, string those together and unpack them, then do the same thing with the floats.

HERE IS THE FINAL VERSION:

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        try:
            rawLen = len(raw)
            times = ""
            vals  = ""
            for i in range(0, rawLen, 6):
                rawtime = raw[i:i+2]
                if rawtime in seenTimes:
                    continue
                times += rawtime
                vals  += raw[i+2:i+6]
            timesList = struct.unpack('h'*(len(times)/2), times)
            valsList  = struct.unpack('f'*(len(vals)/4),  vals)
            assert len(timesList) == len(valsList), "Lens of times and vals unequal, t=%s, v=%s" % (len(timesList), len(valsList))
            for i in range(0, len(timesList)):
                tsVals.append((timeOffset+timesList[i], valsList[i]))
            #self.log.debug("unpacked %d vals, len ts %s." % (len(timesList), len(tsVals)))
        except:
            tb = traceback.format_exc()
            self.log.debug("tb in rawDataToTsVals(): rawSize: %s, %s" % (rawLen, tb))
            pass
        
        if 0:  # debugging
            self.log.debug("tsvals: %s" % ( tsVals))
        return tsVals

Unpacking all the h's at the same time, and likewise the floats, makes everything align, and since it's one function call to struct, is very fast.

Enjoy!

Monday, September 09, 2013

Alternate Fortune Cookie Sayings

I had 'Panda Express' Chinese take-out for lunch, and found the fortune cookies to be too boring. Thus, I've created some of my own, in the hope they liven things up a bit. Feel free to reply with some, if you're inspired to do so.
  • Avoid doing new things, you might get hurt.
  • Your opinions are frequently wrong.
  • Now is not a good time to invest.
  • Your face betrays you.
  • Other people work harder than you do.
  • Your life has no meaning.
  • Your lucky number is Zero.
  • Avoid having opinions, you might be wrong.
  • Do not finish any projects tomorrow.
  • Avoid doing things that require too much thought.
  • Your efforts are doomed to failure.
  • Ask people for help, pleading stupidity.
  • Insult the nationality of all new people.
  • Your smile looks pathetic, only use it when begging.
  • Your kids will always ignore you.
  • Things are always as they seem.
  • Bad news is coming, in large amounts.
  • Fear your loved ones.
  • Sunrises bode not well for your financial future.
  • Steal everything before your friends stop liking you.
  • Wear red on Tuesdays to avoid a dishonorable death.
  • Cancel all your credit cards before it's too late.
  • Your wishes of last Wednesday will never come true.
  • Buy new underwear before they find out.
  • Several former friends are conspiring against you.
  • Never order fries with that.
  • People's clothes indicate if they like you.
  • Someone knows what you did that time.
  • Invest only in companies starting with the letter F.
  • Learn how to cook roadkill safely. Soon.
  • To see far, sleep on your roof.
  • Learn how to chop the head off an attacker.
  • As of yesterday, praying became useless.
  • The Antichrist was born last week.
  • To kill the bugs, wash all your clothes on hot.
  • Your spouse and your best friend have a secret.
  • When you see the flash, duck, it might help.
  • Horseback riding will be useless in 3 years.
  • Farm animals like you until they smell you.
  • Unplug your lamps and toaster before it's too late.
  • A phone call will soon make you cry loudly.
  • Fear everything, it's safer that way.
  • Put your affairs in order before tomorrow night.
  • Give away your possessions, but don't leave a note.
  • Contemplate everything, but don't commit, ever.
  • Physical pain can help you make hard decisions.
  • To change the inside, change the inside.
  • Lock up your women. 
Have any others?

Tuesday, August 13, 2013

Example of how to set JournalCommitInterval in MongoDB

I have found no way yet to find the current value of journalCommitInterval.

But, I have found how to set the journalCommitInterval.

We've got our mongodb set up to use a local disk for our journal, and data resides on a SAN-mounted mount point.

We'd like to set our journalCommitInterval up to a larger value to cut down on the IOPS to our local disk. This might have the added benefit of helping us run faster.

I've run the following command against a mongos, and it complains:

mongos> db.adminCommand({ "setParameter" : 1, "journalCommitInterval": 499});
{ "ok" : 0, "errmsg" : "journaling is off" }
When I run this against a specific daemon, it works up to a value of 499 (setting it to 500 generates a complaint/error):
shard-000:PRIMARY> db.adminCommand({ "setParameter" : 1, "journalCommitInterval": 499});
{ "ok" : 1 }
I have posed the following questions to 10gen:
  • Question 1: do I need to run this against each mongod individually?
  • Answer 1: Yes.
  • Question 2: does this persist across restarts of the daemon?
  • Answer 2: No. The daemon's journalCommitInterval returns to its default when the daemon is restarted. Thus, it's better to set it in the config file you have defined for the shard. Or, if you define it via the command line, do something like:
    /path/to/mongod --journalCommitInterval=499 ...
    
I have just created a minor request in core-server to expose the current value so we can verify it has been set correctly on all shards. If you agree this would be a handy feature to have, please watch and VOTE on the case, https://jira.mongodb.org/browse/SERVER-10508

Thursday, August 08, 2013

Solved: Installing Linux on Acer Aspire V5-122P-0643 Quadcore AMD laptop

So, my wife wanted an Acer Aspire laptop, primarily because of its size; the form factor was just right for her. But it came with Windows 8, which she really hates (she loves Ubuntu Linux and LibreOffice Writer). So, I downloaded Ubuntu Linux 13.04 Raring Ringtail and installed it on this Aspire V5-122P-0643 box. Complications?

1. Windows 8 doesn't like to give up control of the BIOS. I went into the system settings and booted from a USB key (onto which I'd already put Ubuntu's install image via another box).

2. Once installed, it wanted to reboot. I did so.

3. VAST multi-day hassles due to a black screen after boot, then a text prompt for user login instead of the graphical login page familiar to all Ubuntu users. I messed around for a long time trying various things. Running startx failed with a message about no screens. Rebooting with some recovery-mode setting got me in once, but I couldn't reproduce it.

Listing the installed hardware revealed the video chipset is an AMD Radeon 8280. This appears to be the only laptop, or in fact any device whatsoever, that uses this chipset, though Ubuntu's compatibility pages seem to think it's on some desktops somewhere.

4. FINALLY, I solved the problem. On my other laptop, I navigated to the download for AMD's proprietary driver and found the download destination.

Logged into Ubuntu using the user created during installation. Did 'sudo ls' and reentered my password so I had sudo privs without prompting. Then, downloaded the AMD driver from the link on this page: http://support.amd.com/us/gpudownload/linux/Pages/radeon_linux.aspx

the download link was http://www2.ati.com/drivers/linux/amd-driver-installer-catalyst-13-4-linux-x86.x86_64.zip, so I retyped it at the command prompt as:

$ wget http://www2.ati.com/drivers/linux/amd-driver-installer-catalyst-13-4-linux-x86.x86_64.zip

and it downloaded. I then did unzip amd*, then chmod 755 amd*, agreed to everything and installed it, rebooted per instructions, and voila! IT WORKED!

So, the answer is that the fglrx drivers (automatically installed as part of the amd drivers download, methinks) are the correct solution here.

The touch screen does work, but is of limited usefulness since the icons are so small on a normal Linux desktop, as they should be. After all, Windows 8 sucks at trying to unite the phone and desktop environments, because they're disparate platforms and everyone except Microsoft seems to know this.

Enjoy!

Friday, July 19, 2013

How to Disable password prompt during ssh login with key failure

I wanted to login to a bunch of servers and test whether I could get in, or if I got error messages.

After messing around a while, I worked it into:

for n in `cat allhosts.txt`; do echo "host: ${n}"; ssh -oConnectTimeout=2 -oKbdInteractiveAuthentication=no -oPasswordAuthentication=no -oStrictHostKeyChecking=no -oChallengeResponseAuthentication=no myuser@${n} echo '----------'; done

Thursday, June 20, 2013

Question on StackOverflow: Best Unittest for Python programs using OptionParser() ?

Just posted this on stackoverflow. http://stackoverflow.com/questions/17223530/python-how-do-i-usefully-unittest-a-method-invoking-optionparser

MagicMock Returns different values each call

Recent discoveries about mock objects!

I'm trying to create a Python unittest that exercises a class method with a time() call in it.

That is I have this:

class DumObj(object):
    def methodOne(self, intime):
        while time.time() < intime:
            ... do stuff ...

So, I refactored this to:

class DumObj(object):
    def nowTime(self):
        return time.time()
    def methodOne(self, intime):
        while self.nowTime() < intime:
            ... do stuff ...
and now I have to test methodOne. How do I get it to go through the loop once (or twice)?

I tried setting the return value to the output of a function:

import unittest
from mock import Mock, MagicMock

class TestDumObj(unittest.TestCase):
    def test_methodOne(self):
        d = DumObj()
        retvals = [1, 2, 3, 4, 5]
        def mf():
            ret = retvals.pop()
            return ret
        d.nowTime = MagicMock(return_value=mf())
        d.methodOne(6)

Doesn't work. Here's the interactive version:

>>> from mock import MagicMock
>>> retvals = [ 1, 2, 3, 4, 5 ]
>>> def mse(*args, **kwargs):
... ret = retvals.pop()
... return ret
...
>>> mm = MagicMock(return_value=mse())
>>> mm()
5
>>> mm()
5

Damn! I want MagicMock to return a different value each time it's called. I want to give MagicMock an array of return values and have each call return the next one. How do I do this? It turns out the answer is the side_effect param. Continuing from above:

>>> mm = MagicMock(side_effect=mse)
>>> mm()
4
>>> mm()
3
Yay! Solved: MagicMock returns different values each call. Now, tie it into the overall test:

import unittest
from mock import Mock, MagicMock

class TestDumObj(unittest.TestCase):
    def test_methodOne(self):
        d = DumObj()
        retvals = [1, 2, 3, 4, 5]
        def mse():
            ret = retvals.pop(0)  # pop from the front so values ascend
            return ret
        d.nowTime = MagicMock(side_effect=mse)
        d.methodOne(3)  # nowTime returns 1, 2, then 3, so the loop runs twice

Works!
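As a footnote, the mock library also accepts an iterable directly as side_effect, so the little helper function isn't strictly needed: each call returns the next item, and a call past the end raises StopIteration. A quick sketch:

```python
from unittest.mock import MagicMock  # the older standalone 'mock' package exposes the same name

# Passing an iterable as side_effect yields one item per call, in order.
mm = MagicMock(side_effect=[5, 4, 3])
print(mm())  # 5
print(mm())  # 4
print(mm())  # 3; a fourth call would raise StopIteration
```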

Tuesday, May 14, 2013

Drop-Thru Code Considered Harmful

Recently, I was describing my frustrations with some Python code that had been written by a now-departed co-worker.  I describe this code as "Drop-Thru Code".  The main characteristic is that it uses module scope for a non-trivial number of variables.

Module scope is functionally almost-global.  That is, it's so close to global you'll want to use it, but it's far enough away that you'll end up shooting yourself in the foot.

Consider this code snippet:

#!/bin/env python2.6
import os
import sys
varname = 33
vardict = { 'a' : 44 }
# other imports here
print "Starting!"
def something():
    print "thing", varname
print "next"
something()
print "end"

This is what I call drop-thru code.  All the variables are global.  The call to something() happens to work here, since reading a module-scope name from inside a function is allowed, but the moment something() tries to assign to varname it will blow up with UnboundLocalError unless you add a global declaration (see the scope examples below).

Contrast this with what I prefer, well-encapsulated code:

#!/bin/env python2.6
import os
import sys

class SomeThing(object):
    def __init__(self):
        self.varname = 33 
    def something(self):
        print "thing", self.varname
    def main(self):
        self.vardict = { 'a' : 44 }
        # other imports here
        print "Starting!"
        print "next"
        self.something()
        print "end" 

st = SomeThing()
st.main()

Nothing is global, and it's obvious what the scope of everything is.  Clean, beautiful, easy to understand.
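When a full class feels heavyweight, the same encapsulation can be had by passing state explicitly through function parameters. A minimal sketch of that alternative (the names below are illustrative, not from the original snippet):

```python
def something(varname):
    "All inputs arrive as parameters; nothing leaks from module scope."
    return "thing %s" % varname

def main():
    # State lives in locals and is handed around explicitly.
    varname = 33
    vardict = {'a': 44}
    print("Starting!")
    print("next")
    print(something(varname))
    print("end")
    return vardict

if __name__ == "__main__":
    main()
```

Either way, the point is the same: every variable has an obvious, narrow home.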

There are lots of examples of scope problems.  I created this one for my team so they understood my frustration:

outside = 1
def function1():
    try:
        print "f1 outside: %s" % (outside)
        outside += 1
    except:
        print "no outside in function1."
print "a outside: %s" % (outside)
function1()                                     
print "b outside: %s" % (outside)
def function2():
    global outside
    try:
        print "f2 outside: %s" % (outside)
        outside += 1
    except:
        print "no outside in function2"
function2()
function2()
class Dum(object):
    def __init__(self):
        print "dum instantiated."
    def main(self):
        print "Dum main outside: %s" % (outside)
    def changeOutside(self, inval):
        outside = inval
        print "Dum changed: outside: %s" % (outside)
    def changeGlobalOutside(self, inval):
        global outside
        outside = inval
        print "Dum changed: outside: %s" % (outside)
d = Dum()
print "c outside: %s" % (outside)
d.main()
print "d outside: %s" % (outside)
d.changeOutside(33)
print "e outside: %s" % (outside)
d.main()
d.changeGlobalOutside(44)
d.main()
print "f outside: %s" % (outside)
______________________________________________________
output:
krice4@zaphod:~/checkouts/userSandboxes/krice4$ python scopeTest.py 
a outside: 1
no outside in function1.
b outside: 1
f2 outside: 1
f2 outside: 2
dum instantiated.
c outside: 3
Dum main outside: 3
d outside: 3
Dum changed: outside: 33
e outside: 3
Dum main outside: 3
Dum changed: outside: 44
Dum main outside: 44
f outside: 44

In short, Drop-Thru Code is considered harmful.  It allows for lots of scope problems that show up as bugs and frustrations later, so avoid it.  Put all the vars you can in a class and invoke that class; you'll be glad you did.  IMHO.

Friday, May 10, 2013

Subversion pre-commit hook script - Python files: prevent tabs, verify svn properties

Here's a precommit hook script I've modified from one I've used before.  I hope it comes in handy.  I'm going to try to submit it to the dev group of subversion itself for inclusion in the contrib/hook-scripts directory.

#!/bin/env python

"""
    pre-commit hook script that does several things:
    * prevents committing any python file containing a tab character.
    * checks if there are tabs in the source file and warns if so;
    * aborts if incorrect properties of eol-style and keywords 'id'.
"""
import sys
import os
import traceback
from optparse import OptionParser

#sys.stderr.write("NOTE:  pre-commit hook script enabled - checks for tabs, svn eol-style and id properties...\n")

def command_output(cmd):
    " Capture a command's standard output. "
    import subprocess
    return subprocess.Popen(
        cmd.split(), stdout=subprocess.PIPE).communicate()[0]

def files_changed(look_cmd):
    """ List the files added or updated by this transaction.

        "svnlook changed" gives output like:
          U   trunk/file1.cpp
          A   trunk/file2.py
    """
    def filename(line):
        return line[4:]

    def added_or_updated(line):
        return line and line[0] in ("A", "U")

    retval = []
    for line in command_output(look_cmd % "changed").split("\n"):
        if added_or_updated(line):
            retval.append(filename(line))
    #sys.stderr.write("files changed: %s" % (retval))
    return retval

def file_contents(filename, look_cmd):
    " Return a file's contents for this transaction. "
    return command_output("%s %s" % (look_cmd % "cat", filename))

def file_get_properties(filename, look_cmd):
    propslines = command_output("%s %s" % (look_cmd % "proplist -v", filename))
    res = {}
    for line in propslines.split('\n'):
        line = line.strip()
        if not line:
            continue
        k, v = line.split(' : ')
        res[k] = v
    return res

def contains_tabs(filename, look_cmd):
    " Return True if this version of the file contains tabs. "
    return "\t" in file_contents(filename, look_cmd)

def check_py_files(look_cmd):
    " Check Python files in this transaction are tab-free. "
   
    def is_py_file(fname):
        return os.path.splitext(fname)[1] == ".py"
   
    py_files_with_tabs    = set()
    py_files_bad_eolstyle = set()
    py_files_bad_exec     = set()
    py_files_bad_keywords = set()
    for ff in files_changed(look_cmd):
        if not is_py_file(ff):
            continue
        if contains_tabs(ff, look_cmd):
            py_files_with_tabs.add(ff)
        props = file_get_properties(ff, look_cmd)
        if props.get('svn:special'):
            sys.stderr.write("file %s has svn:special flag, probably a symlink; skipping other property checks.\n" % ff)
            continue
        eolstyle = props.get('svn:eol-style')
        #sys.stderr.write("props: %s\neolstyle: '%s'\n" % (props, eolstyle))
        if eolstyle != 'LF':
            py_files_bad_eolstyle.add(ff)
        execut   = props.get('svn:executable')
        if execut not in ['ON', '*']:
            py_files_bad_exec.add(ff)
        keywords = props.get('svn:keywords')
        if (not keywords) or ('Id' not in keywords.split()):
            py_files_bad_keywords.add(ff)

    preventCommit = False
    if len(py_files_with_tabs) > 0:
        sys.stderr.write("The following files contain tabs:\n%s\n"                                                                              % "\n".join(py_files_with_tabs))
        preventCommit = True
    if len(py_files_bad_exec) > 0:
        sys.stderr.write("The following py files are missing 'executable' property, committing anyway, but please fix this:\n%s\n"              % "\n".join(py_files_bad_exec))
        # note, do not prevent commit over this, just warn.
    if len(py_files_bad_keywords) > 0:
        sys.stderr.write("The following files don't have keywords property set to 'Id' at least.  Please fix this before committing:\n%s\n"     % "\n".join(py_files_bad_keywords))
        preventCommit = True
    if len(py_files_bad_eolstyle) > 0:
        sys.stderr.write("The following files don't have svn propset svn:eol-style 'LF', please do so before committing:\n%s\n"                 % "\n".join(py_files_bad_eolstyle))
        preventCommit = True

    return preventCommit

def main():
    usage = """usage: %prog REPOS TXN
        Run pre-commit options on a repository transaction."""

    parser = OptionParser(usage=usage)
    parser.add_option("-r", "--revision",
                      help="Test mode. TXN actually refers to a revision.",
                      action="store_true", default=False)
    errors = 0
    try:
        (opts, (repos, txn_or_rvn)) = parser.parse_args()
        look_opt = ("--transaction", "--revision")[opts.revision]
        look_cmd = "svnlook %s %s %s %s" % (
            "%s", repos, look_opt, txn_or_rvn)
        errors += check_py_files(look_cmd)
    except:
        parser.print_help()
        errors += 1
        sys.stderr.write("Pre-commit hook traceback: %s" % (traceback.format_exc()))
    return errors

if __name__ == "__main__":
    sys.exit(main())
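To sanity-check the proplist parsing without a live repository, the parsing loop from file_get_properties can be exercised against canned svnlook proplist -v output. The sketch below copies that loop into a standalone function; the sample text is illustrative, not captured from a real repo:

```python
def parse_proplist(text):
    "Parse 'name : value' lines, as the hook does with svnlook proplist -v output."
    res = {}
    for line in text.split('\n'):
        line = line.strip()
        if not line:
            continue
        # Split on ' : ' (with spaces), so colons inside property names survive.
        k, v = line.split(' : ')
        res[k] = v
    return res

sample = """
  svn:eol-style : LF
  svn:keywords : Id
"""
props = parse_proplist(sample)
```

This makes it easy to verify that files with the expected properties would sail through the hook before wiring it into the repository.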