Tuesday, October 22, 2013

Optimizing Python - getting data out of memcache with struct.unpack

So, I have this Memcache data store that holds timestamps and values from a monitoring application. Since each memcache key corresponds to an hour's data, I only need to store 2 bytes for the number of seconds past the hour. I don't care about duplicate data being stored, but on retrieval I'd like to eliminate it if it exists.

Input data is: (ts,val), (ts, val), ... encoded using Python's struct.pack command. The ts (timestamp) is (as noted) packed with format h (unsigned int). The val (value) is a floating point number of 4 bytes, packed with format f.

The original version of this encoding was:

    def OLD_rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        while raw:
            ts, val, raw = raw[:2], raw[2:6], raw[6:]
            ts = timeOffset + struct.unpack('h', ts)[0]
            val = struct.unpack('f', val)[0]
            tsVals.append((ts, val))
        return tsVals
I found this version:
  • ran really slowly;
  • didn't eliminate duplicate values;
  • would really choke the longer the input data (as in 33,000 datapoints in an hour).

For the second version, I knew I had to stop with the copying of the data string over and over again, which I knew was eating major cycles.

Doing some math, I figured out I could iterate over the string, extracting each element and converting the two parts to python numbers.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

This was better timewise, but didn't remove duplicate data. I made the 'seen it yet' test occur even before the conversion to (int, float), which saved a bit of time doing useless conversions.

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        for i in range(0, len(raw), 6):
            rawtime = raw[i:i+2]
            if rawtime in seenTimes:
                continue
            seenTimes.add(rawtime)
            ts = timeOffset + struct.unpack('h', rawtime)[0]
            val = struct.unpack('f', raw[i+2:i+6])[0]
            tsVals.append((ts, val))
        return tsVals

Yet, it was STILL TOO SLOW. Where was the time going? I timed the various parts and found the slow bit was the conversion to int/float. That unpack was happening a lot and the time added up.

I tried the following but FAILED.

        # BAD DON'T USE ** BAD DON'T USE **
        elems  = rawLen / 6.0  # 6 bytes per - 2=time + 4=data.
        intElems = int(elems)
        if (elems != intElems):
            self.log.warning("elems non-integer: len: %s" % (rawLen))
            return []
        unp = struct.unpack("hf"*intElems, raw)
        # BAD DON'T USE ** BAD DON'T USE **

The above fails because if we pack these things together, there's a word-alignment problem that unpack is unable to cope with. It would have to be something like (int, zeroes, float) to make the float align on a word boundary.

But, I couldn't give up, this had to work better. So, I extract all the ints, string those together and unpack them, then do the same thing with the floats.

HERE IS THE FINAL VERSION:

    def rawDataToTsVals(self, timeOffset, raw):
        tsVals = []
        seenTimes = set()
        try:
            rawLen = len(raw)
            times = ""
            vals  = ""
            for i in range(0, rawLen, 6):
                rawtime = raw[i:i+2]
                if rawtime in seenTimes:
                    continue
                times += rawtime
                vals  += raw[i+2:i+6]
            timesList = struct.unpack('h'*(len(times)/2), times)
            valsList  = struct.unpack('f'*(len(vals)/4),  vals)
            assert len(timesList) == len(valsList), "Lens of times and vals unequal, t=%s, v=%s" % (len(timesList), len(valsList))
            for i in range(0, len(timesList)):
                tsVals.append((timeOffset+timesList[i], valsList[i]))
            #self.log.debug("unpacked %d vals, len ts %s." % (len(timesList), len(tsVals)))
        except:
            tb = traceback.format_exc()
            self.log.debug("tb in rawDataToTsVals(): rawSize: %s, %s" % (rawLen, tb))
            pass
        
        if 0:  # debugging
            self.log.debug("tsvals: %s" % ( tsVals))
        return tsVals

Unpacking all the h's at the same time, and likewise the floats, makes everything align, and since it's one function call to struct, is very fast.

Enjoy!

Post a Comment