Tuesday, August 26, 2014

SOLVED: Multiprocess updates of shared dict-of-dicts

Earlier, I posted the following:

I've run into my very first bug in Python.  It's a known, existing bug, but it's still the first time I've actually encountered one spontaneously.

One caveat: I occasionally forget things, and I've been using Python for 8 years, so it's possible this has happened before and I've forgotten it.

The problem is as follows:

* I'm working in multiple processes, so I'm sharing a data structure across processes. It's a dict. So, I do:

from multiprocessing import Manager
manager = Manager()
d = manager.dict()

and share the sucker around.  Then, I do:

self.ddict.setdefault(mname, {})
#self.ddict[mname][ts] = val
self.log.error("updating mname: %s w/ ts: %s, val: %s" % (mname, ts, val))
self.log.error("ddict %s, mname at %s" % (self.ddict, self.ddict[mname]))


2014-08-26 11:20:57,221 MainThread ERROR updating mname: a.b.c.d w/ ts: 1409070057, val: 1.0
2014-08-26 11:20:57,221 MainThread ERROR ddict {'a.b.c.d': {}}, mname at {}
2014-08-26 11:20:57,222 MainThread ERROR ddict: {'a.b.c.d' {}}

According to http://bugs.python.org/issue6766, this is a known bug. I'm using Python 2.6 and cannot upgrade.

Damnit.  Looking for a workaround.


The problem is that the manager (a server process that every other process talks to through a proxy) owns the data structure and hands out updates to everyone using it. This means the manager has to know when the data structure is updated in one process, so that the other processes pull in the update. It only finds out about changes that are made through the proxy.
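
To see what the manager can and cannot notice, here's a minimal single-process sketch. The names inspect_nested, shared and inner are mine, purely for illustration. It only checks what type you get back when you read a nested value out of a managed dict, and whether a change to that value reaches the manager.

```python
from multiprocessing import Manager


def inspect_nested():
    """Return the proxy type, the nested value's type, and the stored value."""
    manager = Manager()
    try:
        shared = manager.dict()
        shared['a'] = {'x': 12}   # goes through the proxy, so the manager sees it

        inner = shared['a']       # comes back as a plain local dict (a copy)
        inner['x'] = 13           # changes only that local copy

        # Re-read through the proxy to see what the manager actually holds.
        return (type(shared).__name__, type(inner).__name__, shared['a']['x'])
    finally:
        manager.shutdown()


if __name__ == '__main__':
    print(inspect_nested())
```

This should print ('DictProxy', 'dict', 12): the outer object is a proxy, the inner one is an ordinary dict, and the 13 never made it back.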

If you're using a dict-of-dicts (DOD), the inner dict just looks like a blob to the manager. Process A gets its own copy of that blob and changes it by direct access to the object, without informing the DOD of the change. So, to fix this, do:

dod = Manager().dict()
process ONE: dod['a'] = { 'x' : 12 }
# process TWO sees this update of a = x:12
process TWO: dod['a']['x'] = 13
# process ONE has no idea this happened, manager doesn't bring in the update.
# to fix:
process TWO: y = dod['a']; y['x'] = 13; dod['a'] = y
# assigning back through dod informs manager that dod['a'] is different.
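
Here's the same thing as a runnable sketch with an actual second process. The function names (update_in_place, update_and_reassign, run) are made up for this example and are not from my real code. It assumes the worker functions live at module level so the child process can find them.

```python
from multiprocessing import Manager, Process


def update_in_place(dod):
    # Mutates a local copy of the inner dict; the manager never hears about it.
    dod['a']['x'] = 13


def update_and_reassign(dod):
    # Workaround: copy out, modify, then assign back through the proxy.
    inner = dod['a']
    inner['x'] = 13
    dod['a'] = inner


def run(worker):
    """Run worker in a child process and return the parent's view of dod['a']."""
    manager = Manager()
    try:
        dod = manager.dict()
        dod['a'] = {'x': 12}
        child = Process(target=worker, args=(dod,))
        child.start()
        child.join()
        return dod['a']
    finally:
        manager.shutdown()


if __name__ == '__main__':
    print(run(update_in_place))      # the update that gets lost
    print(run(update_and_reassign))  # the update that sticks
```

One thing to keep in mind: the copy-modify-reassign step is not atomic, so if several processes update the same key at once they can overwrite each other. Guarding the three steps with a manager.Lock() is one way to handle that.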