Thursday, September 04, 2014

SOLVED: MongoDB - Pymongo ConnectionFailure: function takes at most 4 arguments (5 given)

I was getting this odd message, which made pymongo fail when instantiating Connection() and MongoClient(). No connection? Really? Here's the message:

[user@boxnamehere:logs]$ python
Python 2.6.6 (r266:84292, Jul 20 2011, 10:22:43)
[GCC 4.4.5 20110214 (Red Hat 4.4.5-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymongo import MongoClient
>>> mc = MongoClient('servername.here', 11222)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 369, in __init__
    raise ConnectionFailure(str(e))
pymongo.errors.ConnectionFailure: function takes at most 4 arguments (5 given)
So, what's wrong? It's tough to find out. I backed up and then modified the files in /usr/lib64/python2.6/site-packages/pymongo/ and bson/. I imported traceback and added tb = traceback.format_exc() at each level of try/except, appending it to the error message. This gave me a really long traceback, but at least I could see what was happening.
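The tb = format_exc() trick can be sketched on its own. Below is a minimal Python 3 sketch. WrappedError and this decode_all are hypothetical stand-ins, not the real pymongo exception classes or the real bson function. The except clause captures the full traceback text and carries it into the outer exception's message. That is what turns a bare "takes at most 4 arguments" into something you can trace.

```python
import traceback


class WrappedError(Exception):
    """Hypothetical stand-in for pymongo's ConnectionFailure / AutoReconnect."""


def decode_all(data, as_class=dict, tz_aware=True, uuid_subtype=3):
    """Hypothetical stand-in for a four-parameter bson.decode_all."""
    return []


def call_with_traceback(func, *args):
    """Call func; on failure, re-raise with the full inner traceback in the message.

    This mirrors adding tb = format_exc() at each try/except level, so the
    outer error no longer hides where the inner one came from.
    """
    try:
        return func(*args)
    except Exception as e:
        tb = traceback.format_exc()
        raise WrappedError("%s %s" % (e, tb))


if __name__ == "__main__":
    try:
        # Five positional arguments to a four-parameter function raises TypeError
        call_with_traceback(decode_all, b"", dict, True, 3, None)
    except WrappedError as err:
        print(err)
```

On Python 3 you often don't need this. Raising inside an except block chains the exceptions, so the inner traceback is printed above the outer one ("During handling of the above exception, another exception occurred"). Python 2.6 has no chaining, hence the manual approach.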

Once modified, my test program now produces:

user@boxname:~$ ./testPymongo.py
starting.
Traceback (most recent call last):
File "./testPymongo.py", line 5, in <module>
    mc = MC(host='servername.goes.here', port=11222, socketTimeoutMS=15000)
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 373, in __init__
    raise ConnectionFailure(msg)
pymongo.errors.ConnectionFailure: function takes at most 4 arguments (5 given) Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 867, in __find_node
    member, nodes = self.__try_node(candidate)
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 714, in __try_node
    {'ismaster': 1})
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 695, in __simple_command
    response = helpers._unpack_response(response)['data'][0]
File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 117, in _unpack_response
    compile_re)
TypeError: function takes at most 4 arguments (5 given)
, Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 367, in __init__
    self._ensure_connected(True)
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 936, in _ensure_connected
    self.__ensure_member()
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 807, in __ensure_member
    member, nodes = self.__find_node()
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 900, in __find_node
    raise AutoReconnect(', '.join(errors))
AutoReconnect: function takes at most 4 arguments (5 given) Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 867, in __find_node
    member, nodes = self.__try_node(candidate)
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 714, in __try_node
    {'ismaster': 1})
File "/usr/lib64/python2.6/site-packages/pymongo/mongo_client.py", line 695, in __simple_command
    response = helpers._unpack_response(response)['data'][0]
File "/usr/lib64/python2.6/site-packages/pymongo/helpers.py", line 117, in _unpack_response
    compile_re)
TypeError: function takes at most 4 arguments (5 given)
So the problem is that this version of Pymongo calls bson.decode_all with five arguments, at about line 117 of /usr/lib64/python2.6/site-packages/pymongo/helpers.py:

result["data"] = bson.decode_all(response[20:],
                                  as_class, tz_aware, uuid_subtype,
                                  compile_re)

But when I look in /usr/lib64/python2.6/site-packages/bson/__init__.py, I see the function defined differently, with only four parameters:

def decode_all(data, as_class=dict,
               tz_aware=True, uuid_subtype=OLD_UUID_SUBTYPE):
    """Decode BSON data to multiple documents.

So I could fix this in the bson code by adding an optional parameter.

Instead, though, I'm just going to remove the extraneous argument, since I don't think I need it.

Changeset: One file modified: /usr/lib64/python2.6/site-packages/pymongo/helpers.py

[user@server:pymongo]$ diff helpers.py helpers.py.orig
116,117c116,117
<                                      as_class, tz_aware, uuid_subtype
<                                     )
---
>                                      as_class, tz_aware, uuid_subtype,
>                                      compile_re)
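To confirm this kind of mismatch without reading the source, you can ask Python whether a callable will accept a given number of positional arguments. Below is a minimal Python 3 sketch; inspect.signature doesn't exist on 2.6. The decode_all here is a hypothetical stand-in copied from the signature quoted above, and the default of 3 for uuid_subtype is an assumption for the stand-in. C-extension functions may not expose a signature, in which case the helper returns None.

```python
import inspect


def accepts_positional(func, n):
    """True if func can be called with n positional args, False if not,
    None if its signature can't be introspected (e.g. some C functions)."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        return None
    try:
        sig.bind(*range(n))  # dummy values; only the count matters here
    except TypeError:
        return False
    return True


# Hypothetical stand-in mirroring the four-parameter decode_all quoted above
def decode_all(data, as_class=dict, tz_aware=True, uuid_subtype=3):
    return []


print(accepts_positional(decode_all, 4))  # True
print(accepts_positional(decode_all, 5))  # False: helpers.py passes compile_re as a 5th argument
```

On a live box you could point the helper at the real bson.decode_all. Printing bson.__file__ then shows which copy of bson is actually being imported.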

*Solution*

It turns out the problem isn't with the code in pymongo. It's that MongoDB, Inc. has chosen to bundle its own version of the 'bson' package with pymongo, and that version doesn't engage in happy-kindergarten-play with the standalone bson package from PyPI.

So, to fix this, do the following (I've omitted the $ prompts so you can cut and paste). Note the important step of NOT BELIEVING PIP when it says it uninstalled everything:

sudo pip uninstall bson
sudo pip uninstall pymongo
cd /usr/lib64/python2.6/site-packages
# Now, remove old versions of pymongo and bson. Pip doesn't delete everything, dammit.
sudo rm -rf bson pymongo*
sudo pip install pymongo

Done! Pymongo installs its own, proper version of bson.
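To double-check that pip really left nothing behind, a small helper can list what's still in site-packages between the uninstall and the reinstall. This is a minimal sketch. The glob patterns are my assumption about what to look for (bson*, pymongo*, including any *.egg-info directories). It only lists entries; it doesn't delete anything.

```python
import fnmatch
import os


def leftover_entries(site_dir, patterns=("bson*", "pymongo*")):
    """Sorted names in site_dir matching any pattern (bson/, pymongo/, *.egg-info, ...)."""
    return sorted(
        name
        for name in os.listdir(site_dir)
        if any(fnmatch.fnmatch(name, pat) for pat in patterns)
    )
```

Usage: leftover_entries("/usr/lib64/python2.6/site-packages"). Anything it lists is a candidate for the rm -rf step.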

Restart your processes and re-run your tests; it should work now. Pymongo will connect to your mongo cluster properly, etc.

Note that the above problem occurs especially frequently if the box is under heavy load because (as we noted above) the initial connection raised an AutoReconnect.

Enjoy your newfound powers in peace and with goodwill towards all.

Tuesday, September 02, 2014

SOLVED: Skype crashes on startup on Ubuntu 14.04

Just had a problem with Skype crashing immediately upon startup on my Ubuntu 14.04 box (as of Sept. 2, 2014). This was an unexpected, sudden, and seemingly random failure.

Skype would start up with the blue Skype splash screen and open the contacts window for several seconds. Then the window would close with no message.

Wanting an error message, I ran it from the command line and saw:

me@boxname:~/.Skype$ skype
Gtk-Message: Failed to load module "overlay-scrollbar"
Gtk-Message: Failed to load module "unity-gtk-module"
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
... bunch of these ...
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
(skype:3825): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
Gtk-Message: Failed to load module "canberra-gtk-module"
Corrupt JPEG data: 3014 extraneous bytes before marker 0xd9
Aborted (core dumped)
me@boxname:~$

Found this set of directions:

http://community.skype.com/t5/Linux/Skype-is-crashing-with-Corrupt-JPEG-data/td-p/3455587

Worked perfectly.

me@boxname:~$ cd
me@boxname:~$ tar -czf my.dot.skype.dir.tgz ~/.Skype
me@boxname:~$ sqlite3 ~/.Skype/my.profile.name/main.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> select * from Messages where type = 68;
... bunch of crap here ...
sqlite> select count(*) from Messages where type = 68;
25
sqlite> quit
me@boxname:~$ skype
... same error messages as above, without the last one about 'Aborted (core dumped)'.
Success!
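The sqlite3 session above can also be scripted with Python's standard sqlite3 module. This sketch only counts the rows, like the SELECT COUNT(*) in the transcript. It assumes the Messages table and type column shown there, and it should be run with Skype closed. The function name is hypothetical.

```python
import sqlite3


def count_messages_of_type(db_path, msg_type=68):
    """Return how many rows in the Messages table have the given type."""
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM Messages WHERE type = ?", (msg_type,)
        ).fetchone()
        return count
    finally:
        conn.close()
```

Usage: count_messages_of_type(os.path.expanduser("~/.Skype/my.profile.name/main.db")), where my.profile.name is the placeholder from the transcript. Note that sqlite3.connect() silently creates an empty database if the path is wrong, and the query then fails with "no such table: Messages".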

Thursday, August 28, 2014

SOLVED: diff 2 files, get results without markup

So, I want to do a logrotate-style cleanup and remove old log files, keeping only the latest 3. The script starts with:

$ fullLoglist="/tmp/full_loglist"
$ keepLoglist="/tmp/keep_loglist"
$ rmLoglist="/tmp/rm_loglist"
$ ls ${d}/rs*log* > ${fullLoglist}

First, I create a file with all the files listed:

$ cat ${fullLoglist}
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21

I only want the top 3 files, though, and to rm the older ones. So, I extract the ones I want to keep:

$ cat ${fullLoglist} | head -3 > $keepLoglist
$ cat $keepLoglist
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46

Now, I need to get a diff of the rest of them so I know what to call rm on. But how? If I just do a diff, I get:

$ diff /tmp/full_loglist /tmp/keep_loglist
4,8d3
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
< graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21

I DON'T WANT the extra markup. I want diff without markup. I try to google:

linux diff without markup
linux diff without greater-than less-than
linux diff line format markup
linux diff supress markup
linux diff only different lines

I try various options, like:

$ diff --line-format "%L" --suppress-common-lines /tmp/full_loglist /tmp/keep_loglist
graphdb_1j/rs-47-a/rs-47-a.log
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T13-25-13
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-46-46
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21
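This is WRONG: it prints all the lines, not just the different ones. --line-format sets the format for every kind of line, including unchanged ones, and --suppress-common-lines only applies to side-by-side (-y) output. Ugh.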

SOLUTION ONE: comm -3

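I found two solutions. The first is comm with the -3 option. comm compares two sorted files line by line and prints three columns: lines only in the first file, lines only in the second, and lines in both. -3 suppresses the third column, leaving just the lines that differ. I'd never heard of comm before, but it's nice: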

$ comm -3 /tmp/full_loglist /tmp/keep_loglist
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21
SOLUTION TWO: sort | uniq -u

I don't care about the ordering of these things, so I can use the simpler combination of sort and uniq -u. Normally uniq collapses adjacent duplicate lines into one, which is why the input has to be sorted first. With -u, it prints only the lines that occur once and only once, so any file named in both lists shows up twice and is dropped.

$ cat /tmp/full_loglist /tmp/keep_loglist | sort | uniq -u
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T14-48-44
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-14-35
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T15-35-11
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-27T19-48-56
graphdb_1j/rs-47-a/rs-47-a.log.2014-08-28T13-44-21

SOLVED! No > or < signs, and no line-number markup.
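The same "in the full list but not in the keep list" result can be had in Python with a set lookup, which doesn't need sorted input and keeps the original order. This is a minimal sketch; the function names are hypothetical.

```python
def lines_only_in_first(first, second):
    """Lines of `first` that don't appear in `second`, in `first`'s order.

    Like `comm -23` on sorted input. For lists of unique file names where
    `second` is a subset of `first`, it matches `sort | uniq -u` as well.
    """
    drop = set(second)
    return [line for line in first if line not in drop]


def read_lines(path):
    """Read a file into a list of lines without trailing newlines."""
    with open(path) as fh:
        return [line.rstrip("\n") for line in fh]
```

Usage: lines_only_in_first(read_lines("/tmp/full_loglist"), read_lines("/tmp/keep_loglist")).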

Script to implement this process, for your edification:

#!/bin/bash

cd /opt/storage || exit 1
dirlist=`ls -d graphdb_[1234][ij]/rs-*`
fullLoglist="/tmp/full_loglist"
keepLoglist="/tmp/keep_loglist"
rmLoglist="/tmp/rm_loglist"

for d in $dirlist
do
    echo "---------------------------"
    echo "dir: ${d}"
    # -t lists newest first, so head -3 below keeps the 3 most recently modified logs
    ls -t ${d}/rs*log* > ${fullLoglist}
    ls -al ${fullLoglist}
    echo "full files: `cat ${fullLoglist}`"
    head -3 ${fullLoglist} > $keepLoglist
    echo "keep files: `cat ${keepLoglist}`"
    # lines that appear only once are the files that are not in the keep list
    cat $keepLoglist $fullLoglist | sort | uniq -u > $rmLoglist
    echo "rm   files: `cat ${rmLoglist}`"
    if [ -s $rmLoglist ]
    then
        echo "Removing files...."
        cat $rmLoglist | xargs rm -v
    else
        echo "No files to remove this time."
    fi
done

Done.
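For comparison, here's a minimal Python sketch of the same clean-up. It picks the files to keep by modification time (newest first), which is my reading of "keep only the latest 3". rotate() and files_to_remove() are hypothetical names. dry_run defaults to True, so nothing is deleted unless you ask.

```python
import glob
import os


def files_to_remove(pattern, keep=3):
    """Paths matching `pattern`, except the `keep` most recently modified ones."""
    paths = glob.glob(pattern)
    paths.sort(key=os.path.getmtime, reverse=True)  # newest first
    return paths[keep:]


def rotate(pattern, keep=3, dry_run=True):
    """Remove (or, with dry_run=True, only report) all but the newest `keep` files."""
    doomed = files_to_remove(pattern, keep)
    for path in doomed:
        if dry_run:
            print("would remove", path)
        else:
            os.remove(path)
            print("removed", path)
    return doomed
```

Usage: rotate("/opt/storage/graphdb_1j/rs-47-a/rs*log*", keep=3, dry_run=False). Loop over glob.glob("graphdb_[1234][ij]/rs-*") to cover every directory, as the bash script does.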

Tuesday, August 26, 2014

SOLVED: Multiprocess updates of shared dict-of-dicts

Earlier, I posted the following:

I've run into my very first bug in Python. It's a known, existing bug, but it's still the first time I've actually encountered one spontaneously.

I should caveat that I occasionally forget things, and I've been using Python for 8 years, so it could be that this has happened before and I've forgotten it.

The problem is as follows:

* I'm working in multiple processes, so I'm sharing a data structure across processes. It's a dict. So, I do

from multiprocessing import Manager
manager = Manager()
d = manager.dict()

and share the sucker around. Then, I do:

self.ddict.setdefault(mname, {})
#self.ddict[mname][ts] = val
self.log.error("updating mname: %s w/ ts: %s, val: %s" % (mname, ts, val))
self.ddict[mname].update({ts:val})
self.log.error("ddict %s, mname at %s" % (self.ddict, self.ddict[mname]))

output:

2014-08-26 11:20:57,221 MainThread ERROR updating mname: a.b.c.d w/ ts: 1409070057, val: 1.0
2014-08-26 11:20:57,221 MainThread ERROR ddict {'a.b.c.d': {}}, mname at {}
2014-08-26 11:20:57,222 MainThread ERROR ddict: {'a.b.c.d' {}}

According to http://bugs.python.org/issue6766, this is a known bug. I'm using Python 2.6 and cannot upgrade.

Damnit. Looking for a workaround.

SOLVED:

The problem is how the Manager works. It runs a separate server process that owns the real data structure, and each process gets a proxy that forwards operations to that server. The manager only knows about a change if the change goes through the proxy.

If you're using a dict-of-dicts (DOD), dod['a'] hands back a copy of the inner dict, which to the manager is just an opaque blob. When process A changes that copy directly, the DOD proxy is never told, so the change never reaches the manager. So, to fix this, do:

from multiprocessing import Manager
manager = Manager()
dod = manager.dict()
# process ONE:
dod['a'] = {'x': 12}    # process TWO sees this update of a = {'x': 12}
# process TWO:
dod['a']['x'] = 13      # changes only a local copy; the manager (and process ONE) never see it
# to fix, in process TWO: copy out, modify, then reassign through the proxy
y = dod['a']
y['x'] = 13
dod['a'] = y            # this assignment goes through the proxy, so the manager stores it
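Here is a runnable Python 3 version of the fix. For brevity, both edits are done in one process; the proxy behaves the same way whichever process makes the call. This is a minimal sketch and nested_update_demo is a hypothetical name. Choosing the 'fork' start method is only there so it runs as a plain script on Linux.

```python
import multiprocessing as mp


def nested_update_demo():
    """Return dict snapshots after (1) an in-place inner edit and (2) copy-modify-reassign."""
    # 'fork' (where available, e.g. Linux) just lets this run as a plain script;
    # the proxy behaviour is the same under any start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    with ctx.Manager() as manager:
        dod = manager.dict()  # proxy; the real dict lives in the manager's server process
        dod["a"] = {"x": 12}

        # BROKEN: dod["a"] returns a local copy of the inner dict,
        # so this edit never reaches the manager.
        dod["a"]["x"] = 13
        after_inplace = dod.copy()

        # WORKS: copy out, modify, then reassign through the proxy.
        inner = dod["a"]
        inner["x"] = 13
        dod["a"] = inner
        after_reassign = dod.copy()
    return after_inplace, after_reassign


if __name__ == "__main__":
    broken, fixed = nested_update_demo()
    print(broken)  # {'a': {'x': 12}}
    print(fixed)   # {'a': {'x': 13}}
```

Python 3.6+ offers another option: store a manager.dict() as the inner value. Shared objects can be nested there, so edits to the inner proxy go straight to the manager. That isn't available on 2.6.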

Wednesday, January 15, 2014

WANTED: Complete List of WhiteHouse.Gov Petitions

I've created a petition on http://whitehouse.gov to modify copyright law to prevent copyright of any law enacted in the United States at the Federal, State, or Local level. This would prevent jerk corporations from claiming copyright when someone tries to put the code/law/edict on a webpage.

But, there aren't enough signatures yet. So I tried advertising to my FB friends. I have to get to 150 before the petition is visible on the list of open petitions on the website.

That got me thinking: how do I find all the other petitions that aren't visible yet because they haven't been advertised enough?

They provide a short URL for each petition, and the shortener is probably incremental. Mine is http://wh.gov/lInfx (PLEASE SIGN IT), and I figure the other petitions might have URLs near it.

So, here's my program to find them:

#!/usr/bin/python
# Write every candidate short URL of the form http://wh.gov/lXXXX, where each X
# is an ASCII letter (mine is http://wh.gov/lInfx), to /tmp/urls.wh.
import itertools
import string

letters = string.ascii_uppercase + string.ascii_lowercase  # A-Z, then a-z
print(letters)

with open("/tmp/urls.wh", "w") as OFH:
    for combo in itertools.product(letters, repeat=4):
        OFH.write("http://wh.gov/l%s\n" % "".join(combo))

# wget -a out.log -t 1 --read-timeout=3 --max-redirect 1  --save-headers

No luck yet; it takes a long time to run.
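Walking the whole space takes a long time. If the shortener really is incremental, a narrower search may be enough: treat the code after "l" as a base-52 number and list only the codes within some distance of mine. This minimal sketch rests on two assumptions I can't verify. First, that the shortener counts through codes in A-Z then a-z order. Second, that it uses letters only. The helper names are hypothetical.

```python
import string

# ASSUMPTION: the shortener counts through codes in this order (A-Z, then a-z)
# and uses letters only. Neither is confirmed.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 52


def code_to_int(code):
    """Read a code like 'Infx' as a base-52 number."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def int_to_code(n, width):
    """Inverse of code_to_int, left-padded with 'A' to `width` characters."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, BASE)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def neighbours(code, radius):
    """All codes within +/- radius of `code`, clipped to the valid range."""
    width = len(code)
    centre = code_to_int(code)
    lo = max(0, centre - radius)
    hi = min(BASE ** width - 1, centre + radius)
    return [int_to_code(n, width) for n in range(lo, hi + 1)]


if __name__ == "__main__":
    for code in neighbours("Infx", 2):
        print("http://wh.gov/l" + code)
```

The printed URLs could be written to a file and fed to the wget command above, instead of the full list of several million.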

https://petitions.whitehouse.gov/petition/holidays-muslim/6T6csRph
https://petitions.whitehouse.gov/petition//95WplFSK
https://petitions.whitehouse.gov/petition/muslim-should-have-holiday-their-holiday/dH8ZTMSf
https://petitions.whitehouse.gov/petition/please-protect-peace-monument-nassau-county-new-york-eisenhower-park/hkXX3082

Tuesday, January 07, 2014

Solved: How to install Python ibm_db DB2 driver on RHEL

Solved: Installing Python's DB2 module (ibm_db) on Linux

Problem 1: I kept getting a message about missing include files. First, I had to get a sysadmin to install them, but he didn't quite finish the job, so I had to create the soft link myself. The include files are installed by default under /opt/ibm/db2/V10.1, but are supposed to be linked to from /opt/db2inst/sqllib. So:

$ sudo ln -s /opt/ibm/db2/V10.1/include /opt/db2inst/sqllib/include

That solved the include problem.

Problem 2: I was installing IBM's DB2 Python driver on a RHEL 6.2 Linux box, but I kept getting the message:

Detected 64-bit Python
Environment variable IBM_DB_HOME is not set. Set it to your DB2/IBM_Data_Server_Driver installation directory and retry ibm_db module install.

This was despite executing the userprofile script in /opt/db2inst/sqllib/userprofile, which set the environment vars properly:

xxxxx@xxxxxx:~/making/ibm_db-2.0.4.1$ env | egrep -i ibm
IBM_DB_LIB=/opt/db2inst1/sqllib/lib
IBM_DB_DIR=/opt/db2inst1/sqllib
IBM_DB_HOME=/home/db2inst1/sqllib
IBM_DB_INCLUDE=/opt/db2inst1/sqllib/include

Dammit! I couldn't get past this. I tried both

sudo easy_install ibm_db
sudo pip install ibm_db

This was to no avail. I even tried putting the whole thing in a bash script and doing the export IBM_DB_LIB stuff there. Again, failure.

Finally, I downloaded the source and ran python setup.py build, which worked, so I was halfway there. Then I had to run sudo python setup.py install, since it was to be installed in system directories. That failed with the same message about the missing IBM_DB_HOME environment variable. But I could run env and see the vars right there! Since I was working from source, I edited setup.py and added a pprint of os.environ. That made it obvious. Under sudo the install runs as root, sudo resets the environment by default, and root had never run the userprofile script, so none of the IBM_DB_* vars were set.
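The pprint(os.environ) check can live in a tiny standalone script instead of an edited setup.py. This is a minimal sketch. The variable list comes from the env output above, and check_env.py is a hypothetical filename. Running it as python check_env.py, sudo python check_env.py, and sudo -E python check_env.py should show the difference directly.

```python
import os
import pprint

# Variables the userprofile script set (see the env output above);
# IBM_DB_HOME is the one ibm_db's setup.py complained about.
IBM_DB_VARS = ("IBM_DB_HOME", "IBM_DB_LIB", "IBM_DB_DIR", "IBM_DB_INCLUDE")


def missing_vars(names, environ=None):
    """Return the names that are unset or empty in `environ` (default: os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # Same idea as the pprint of os.environ added to setup.py
    pprint.pprint({k: v for k, v in os.environ.items() if k.startswith("IBM_DB")})
    print("missing:", missing_vars(IBM_DB_VARS))
```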

Quick: "man sudo" !!

SOLUTION: The man page shows that to do this right, you must invoke sudo with -E (--preserve-env) to keep the variables from the current environment.

$ sudo -E python setup.py install

Hurray! Installed!

Friday, December 13, 2013

Python version of iostat.c

Ok, so I have a problem. I'm trying to create metrics for a Linux system and insert them into a local database. Constraints:

* Runs on lots of machines;
* Machines may be heavily loaded;
* Sometimes the kick-off time is delayed beyond a 1-minute standard (typically due to load);
* I don't like having long-running subprocesses: they might stop/fail and I'd have to restart them, they use memory, etc.;
* I'm doing this in Python;
* I can store state between runs in a pickle file;
* I want to replicate the existing fields coming out of vmstat -s and iostat -x -D n (where n is the sample interval in seconds);
* I want the values of these fields to match likewise.

So, I need to replicate iostat.c in python. I can get absolute numbers from iostat and vmstat and stat, and do the math myself, storing state from the last time I ran it and subtracting to get a diff.

Problem 1: Where is the source code for iostat.c? In Ubuntu at least (hoping RHEL/CentOS is similar), it turns out to be in the sysstat package. I found the sysstat source at: http://freecode.com/projects/sysstat.

Inside this package, there's source code for iostat as a file named iostat.c. I have yet to find it online, so here it is:

General plan:

* Read current values from /proc/stat, /proc/diskstats, and vmstat -s;
* Read values from the previous run from disk;
* Find the diffs;
* Use the diffs to compute the values needed;
* Write current values to disk for next time.
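The general plan above can be sketched for the /proc/diskstats part alone. This is a minimal sketch, not a port of iostat.c. Field positions follow the kernel's documented /proc/diskstats layout. The rate formulas are my understanding of what iostat -x reports: sectors are 512 bytes, and %util is the change in milliseconds spent doing I/O divided by the elapsed milliseconds. The interval comes from wall-clock time.time(), which iostat.c may compute differently. Function names and the state-file layout are hypothetical, and counter wraparound isn't handled.

```python
import os
import pickle
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors of 512 bytes


def parse_diskstats(text):
    """Map device name -> selected cumulative counters from /proc/diskstats text.

    Columns (0-based, after splitting on whitespace): 2 = device name,
    3 = reads completed, 5 = sectors read, 7 = writes completed,
    9 = sectors written, 12 = ms spent doing I/O.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpected lines
        stats[parts[2]] = {
            "rd_ios": int(parts[3]),
            "rd_sectors": int(parts[5]),
            "wr_ios": int(parts[7]),
            "wr_sectors": int(parts[9]),
            "io_ticks": int(parts[12]),
        }
    return stats


def compute_rates(prev, curr, seconds):
    """iostat -x style rates between two samples taken `seconds` apart."""
    rates = {}
    for dev, now in curr.items():
        before = prev.get(dev)
        if before is None:
            continue  # device appeared since the last run: no baseline yet
        d = {k: now[k] - before[k] for k in now}
        rates[dev] = {
            "r/s": d["rd_ios"] / seconds,
            "w/s": d["wr_ios"] / seconds,
            "rkB/s": d["rd_sectors"] * SECTOR_BYTES / 1024.0 / seconds,
            "wkB/s": d["wr_sectors"] * SECTOR_BYTES / 1024.0 / seconds,
            # share of the interval during which the device had I/O in flight
            "%util": 100.0 * d["io_ticks"] / (seconds * 1000.0),
        }
    return rates


def run_once(state_file, diskstats_path="/proc/diskstats"):
    """One scheduled run: read counters, diff against the pickled previous
    run (if any), then save the current counters for next time."""
    with open(diskstats_path) as fh:
        curr = parse_diskstats(fh.read())
    now = time.time()
    rates = {}
    if os.path.exists(state_file):
        with open(state_file, "rb") as fh:
            prev_time, prev = pickle.load(fh)
        if now > prev_time:
            rates = compute_rates(prev, curr, now - prev_time)
    with open(state_file, "wb") as fh:
        pickle.dump((now, curr), fh)
    return rates
```

Because the state lives in a pickle file, a delayed kick-off just means a longer interval, and dividing by the elapsed seconds absorbs it. No long-running subprocess is needed.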

This will involve significant coding; I don't know if I'll have space to post it all here...