Package CHEM :: Package datatype :: Module compress :: Class GolombCodec

Class GolombCodec



Codec --+
        |
       GolombCodec


FORMAT:
  C_Alpha code        = len(Ns)
  1-bit flag          = counts? (T/F)=(1/0)
  C_Alpha code        = log_2 M
  C_Alpha code        = sum(counts) (optional)

  for each integer, with D = N_{i} - N_{i-1}
    q-bits of 1:    q = floor(D/M)
    1-bit of 0
    log_2 M bits:   r = D % M
    C_Alpha code      = count[N_{i}] (optional)
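
The per-integer part of the format above can be sketched in a few lines of Python. This is only an illustration, not the module's implementation: it assumes M = 2**log_m (so the remainder fits in log_m bits), assumes the first gap is measured from 0, and leaves out the C_Alpha header fields and the optional counts. The names rice_encode_gaps and rice_decode_gaps are hypothetical.

```python
def rice_encode_gaps(values, log_m):
    """Encode the gaps between sorted integers as a string of '0'/'1'."""
    m = 1 << log_m                    # assumption: M is a power of two
    bits = []
    prev = 0                          # assumption: first gap is measured from 0
    for n in sorted(values):
        q, r = divmod(n - prev, m)    # q = floor(D/M), r = D % M
        bits.append("1" * q)          # q bits of 1
        bits.append("0")              # 1 bit of 0
        if log_m:
            bits.append(format(r, "0{}b".format(log_m)))  # remainder, log_m bits
        prev = n
    return "".join(bits)


def rice_decode_gaps(bits, count, log_m):
    """Invert rice_encode_gaps; `count` plays the role of len(Ns)."""
    values = []
    pos = 0
    prev = 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":       # read the run of 1 bits
            q += 1
            pos += 1
        pos += 1                      # skip the terminating 0 bit
        r = int(bits[pos:pos + log_m], 2) if log_m else 0
        pos += log_m
        prev += (q << log_m) + r      # D = q*M + r, added to the previous value
        values.append(prev)
    return values
```

With log_m = 2, the keys [3, 10, 11, 40] have gaps 3, 7, 1, 29 and encode to 20 bits; decoding with count = 4 returns the original keys.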



Instance Methods
 
__init__(self, counts=False)
If counts=False, only the keys are stored, and they are returned as a set.
 
encodeL(self, data, logM=None, counts=None, info={})
 
decodeI(self, iter)
 
iter_decode(self, iter)

Inherited from Codec: __call__, decode, finger

Method Details

__init__(self, counts=False)
(Constructor)

 
>>> S=RandomDict(200,N=600)
>>> Ss=[RandomDict(200,N=600) for x in xrange(100)]
If counts=False, only the keys are stored, and they are returned as a set.
>>> G=GolombCodec()
>>> G.decode(G(S))==set(S.keys())
1
>>> G.decode(G([]))==set()
1
>>> G.decode(G([0]))==set([0])
1
Otherwise, a dictionary is returned.
>>> G=GolombCodec(counts=True)
>>> G.decode(G(S))==S
1
We can compute the Tversky similarity in several ways. First, by using Python set objects.
>>> T=numpy.array([len(set(S) & set(s))/float(len(set(S) | set(s))) for s in Ss])
Second, by converting them to Golomb fingerprints and using the fast C implementations.
>>> G=GolombCodec(); F=G.finger(S)
>>> Fs=[G.finger(s) for s in Ss]
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,10))
1
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,11))
1
The C implementation still works when counts are embedded, but it does not use them.
>>> G=GolombCodec(counts=True)
>>> F=G.finger(S)
>>> Fs=[G.finger(s) for s in Ss]
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,10))
1
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,11))
1
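
The set-based similarity used as the reference value in the doctest above can be written as a small pure-Python function. This is a sketch for illustration only: the name tversky and its alpha and beta parameters are hypothetical and are not claimed to correspond to the arguments of finger.parallel_sim. With alpha = beta = 1 the Tversky index reduces to the Tanimoto (Jaccard) ratio len(A & B) / len(A | B) computed in the doctest.

```python
def tversky(x, y, alpha=1.0, beta=1.0):
    """Tversky index of two key collections, treated as sets."""
    x, y = set(x), set(y)
    common = len(x & y)
    # Weighted count of keys found in only one of the two sets.
    denom = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption: two empty sets are treated as identical.
    return common / denom if denom else 1.0
```

For example, tversky({1, 2, 3}, {2, 3, 4}) shares two keys out of four distinct keys and gives 0.5.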
Overrides: Codec.__init__

encodeL(self, data, logM=None, counts=None, info={})

 
Overrides: Codec.encodeL

decodeI(self, iter)

 
Overrides: Codec.decodeI