Wednesday, April 17, 2013

Too Smart for their Britches

I just ran into a bit of bash script that did this:

conflist=`cat www-conf/* |sort -u |sed ':a;N;$!ba;s/\n/ /g'`

At first glance, it's cat'ing all the files in a dir, sorting them (and dropping duplicates, via -u), and returning the result.  BUT WAIT, there's a funky sed script in there.

Sure, I get the 's/\n/ /g' -> get rid of intervening newlines.  But what of the other stuff?

It turns out, some smart-ass decided to code up some fancy-schmancy "I'm Smarter Than You" thumb-to-nose action.  How it works, courtesy of http://www.grymoire.com/Unix/Sed.html :

  • The semicolons separate commands.
  • The :a is a label, so we can 'goto a' later.
  • The N command appends the next line of input to the pattern space, keeping a newline character between the two.
    •  The "n" command will print out the current pattern space (unless the "-n" flag is used), empty the current pattern space, and read in the next line of input. The "N" command does not print out the current pattern space and does not empty the pattern space. It reads in the next line, but appends a new line character along with the input line itself to the pattern space.
  • The $!ba command says, 'unless you're on the last line of input, branch to a (goto a)'.
    •   "$" is the special address that means the last line of input, and "!" negates it.
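
To make the loop concrete, here is a rough Python 3 emulation of what that sed script does with its pattern space. It is only a sketch: sed_join and sed_substitute_only are made-up names, and sed details such as what happens when N runs out of input are ignored. It also shows why a plain s/\n/ /g has nothing to match on its own: sed strips the trailing newline before a line lands in the pattern space.

```python
def sed_join(text):
    """Rough emulation of: sed ':a;N;$!ba;s/\\n/ /g'"""
    lines = text.splitlines()          # sed strips each line's trailing newline
    if not lines:
        return ""
    pattern_space = lines[0]           # first cycle: line 1 is read in
    next_line = 1
    while next_line < len(lines):      # $!ba -> keep branching to :a until the last line
        pattern_space += "\n" + lines[next_line]   # N -> append newline + next line
        next_line += 1
    return pattern_space.replace("\n", " ")        # s/\n/ /g on the whole buffer


def sed_substitute_only(text):
    """Rough emulation of: sed 's/\\n/ /g' (no loop)."""
    # Each cycle sees one line with no newline in it, so nothing is replaced;
    # sed then prints the line followed by a newline.
    return "".join(line.replace("\n", " ") + "\n" for line in text.splitlines())


sample = "a\nb\nc\n"
print(repr(sed_join(sample)))             # 'a b c'
print(repr(sed_substitute_only(sample)))  # 'a\nb\nc\n'
```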

So, the goal was to collapse all this data into a sorted list without newlines.  Note that if I echo $conflist (unquoted), I see "a b c d e..." just as I would if I had left out the sed entirely, because the shell's word splitting turns the newlines into spaces anyway.
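
The same comparison in Python 3, as a sketch. The sample lines are made up, sorted(set(...)) stands in for sort -u (it only matches byte-order sorting, as in the C locale), and " ".join(value.split()) approximates the word splitting an unquoted echo performs with the default IFS.

```python
# Hypothetical config lines, standing in for the contents of www-conf/*
raw_lines = ["b", "a", "c", "a", "e", "d"]

# sort -u: sorted, duplicates removed, one entry per line
after_sort = "\n".join(sorted(set(raw_lines)))

# With the sed loop: newlines become spaces before the variable is assigned
with_sed = after_sort.replace("\n", " ")

# Without it: the variable keeps its newlines, but an unquoted echo
# splits on whitespace and prints the words separated by single spaces
without_sed_echoed = " ".join(after_sort.split())

print(with_sed)             # a b c d e
print(without_sed_echoed)   # a b c d e
```

The two only diverge when the variable is used quoted, as in echo "$conflist", where the version without sed would print one entry per line.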

What an utter waste of 10 minutes to figure this out, and another 10 to write this up for future reference and helping anyone else.

Questions:
  • Will I ever use this construct?  NO.  I'll find some other way to do it.  If I even need to, given Bash collapses things onto one line with an unquoted echo anyway.
  • Do I actively dislike the person who wrote this?  YES.  I don't know who it was, specifically.  But, it doesn't matter, I know what I need to know:  He wasted my time, and was what I would consider the professional equivalent of a braggart.  Code should be easy to read as a primary goal.
Note also, thanks for help:  http://www.catonmat.net/blog/sed-one-liners-explained-part-one/


Tuesday, April 02, 2013

Solved: MongoDB - How to add Binary data to a document using pymongo

I'm trying to add data to a MongoDB document. My data is tuples of doubles, and I'm adding to an array of them. This is a good example of how to add binary data to a mongodb document using pymongo. My doc looks like this (presume I have created it already):

$ mongo
mongos> use kevintest1
switched to db kevintest1
mongos> show collections
metricValue
system.indexes
mongos> db.metricValue.find()
{ "_id" : ObjectId("5159f82f64524c06f5cb1208"), "mtid" : 2, "seqno" : 0, "vals" : [ ] }
mongos> mtid2 = db.metricValue.findOne({ "mtid" : 2})
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [ ]
}
mongos> mtid2
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [ ]
}
mongos> mtid2.bsonsize()
Tue Apr  2 17:23:57 TypeError: mtid2.bsonsize is not a function (shell):1
mongos> Object.bsonsize(mtid2)
54
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
54

NOW, I take a break, and in a different window, I run the following program:
#!/usr/bin/python
# Python 2 / pymongo 2.x: push a packed pair of doubles onto the 'vals' array.
import struct
import pymongo
from bson import Binary
from pprint import pformat

conn = pymongo.Connection('myboxname', 7333)
db = conn['kevintest1']
mvc = db.metricValue
print "have mvc.find: %s" % ([x for x in mvc.find()])
# Pack two doubles into 16 raw bytes (native byte order).
mybuffer = struct.pack("dd", 65535.7, 65535.8)
print "binary info to add:  %s" % (pformat(struct.unpack("dd", mybuffer)))
# $push appends the Binary value to 'vals'; w=1 asks for an acknowledged write.
retval = mvc.update({ 'mtid': 2, 'seqno': 0}, { '$push': { 'vals' : Binary(mybuffer) }}, w=1)
print "retval of update: %s" % (retval)
print "After update, have mvc.find: %s" % ([x for x in mvc.find()])
 
So, I run it.  Then I go back to the mongos window, and after each run I check the BSON size:

mongos> db.metricValue.findOne({ "mtid" : 2})
{
    "_id" : ObjectId("5159f82f64524c06f5cb1208"),
    "mtid" : 2,
    "seqno" : 0,
    "vals" : [
        BinData(0,"ZmZmZvb/70CamZmZ+f/vQA==")
    ]
} 
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
78
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
102
mongos> Object.bsonsize(db.metricValue.findOne({ "mtid" : 2}))
126
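
As a sanity check, the BinData payload shown above is just base64 over the 16 packed bytes. This Python 3 sketch decodes it back into the two doubles. The explicit little-endian format "<dd" is an assumption about the machine that ran the original script, whose "dd" format uses native byte order.

```python
import base64
import struct

# Base64 string copied from the BinData(0, ...) value in the shell output
payload = base64.b64decode("ZmZmZvb/70CamZmZ+f/vQA==")

print(len(payload))                     # 16 bytes: two 8-byte doubles
values = struct.unpack("<dd", payload)  # assumes little-endian packing
print(values)                           # (65535.7, 65535.8)
```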
 
There you have it: 24 bytes per inserted pair of doubles. Without packing, it is 30 bytes, so the packed form is 24/30 = 12/15 = 4/5 of the size => a saving of 20%.
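
Both numbers can be reproduced by adding up the BSON element layout by hand. This is a Python 3 sketch with made-up helper names, and it assumes that "without packing" means storing each pair as a nested two-element array of BSON doubles, which is my guess at where the 30 comes from.

```python
def cstring_size(key):
    # BSON keys are NUL-terminated strings
    return len(key.encode("utf-8")) + 1

def binary_element_size(key, payload_len):
    # type byte + key + int32 length + subtype byte + payload
    return 1 + cstring_size(key) + 4 + 1 + payload_len

def double_element_size(key):
    # type byte + key + 8-byte double
    return 1 + cstring_size(key) + 8

def double_array_element_size(key, count):
    # An array is an embedded document keyed "0", "1", ...:
    # int32 length + elements + terminating NUL
    inner = 4 + sum(double_element_size(str(i)) for i in range(count)) + 1
    return 1 + cstring_size(key) + inner

packed = binary_element_size("0", 16)          # first entry in 'vals', key "0"
unpacked = double_array_element_size("0", 2)   # assumed unpacked layout
saving = 1 - packed / unpacked

print(packed, unpacked)        # 24 30
print(round(saving * 100))     # 20
```

Array keys are decimal strings, so from the eleventh entry on (key "10") each element costs one more byte in either layout.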