__init__(self, counts=False)
(Constructor)
>>> S=RandomDict(200,N=600)
>>> Ss=[RandomDict(200,N=600) for x in xrange(100)]
If counts=False, only the keys are stored, and decoding returns them as
a set.
>>> G=GolombCodec()
>>> G.decode(G(S))==set(S.keys())
1
>>> G.decode(G([]))==set()
1
>>> G.decode(G([0]))==set([0])
1
With counts=True, decoding returns a dictionary instead.
>>> G=GolombCodec(counts=True)
>>> G.decode(G(S))==S
1
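The round trips above can be illustrated with a minimal sketch of
Golomb-Rice coding, the general technique the codec is named after. This is
not the GolombCodec implementation: the names rice_encode and rice_decode,
the parameter k, and the use of a '0'/'1' string as the bit container are
all assumptions made for illustration, and only the keys-only case
(counts=False) is covered.

```python
def rice_encode(keys, k=4):
    """Encode a set of non-negative integers as a string of '0'/'1' bits.

    Hypothetical helper: sorts the keys, then codes each gap between
    successive keys as a unary quotient followed by a k-bit remainder.
    """
    bits = []
    previous = -1
    for key in sorted(set(keys)):
        gap = key - previous - 1          # >= 0 because keys are distinct
        previous = key
        quotient = gap >> k
        remainder = gap & ((1 << k) - 1)
        bits.append("1" * quotient + "0")  # unary part, terminated by '0'
        if k:
            bits.append(format(remainder, "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bits, k=4):
    """Invert rice_encode and return the keys as a set."""
    keys = set()
    previous = -1
    pos = 0
    while pos < len(bits):
        quotient = 0
        while bits[pos] == "1":           # read the unary quotient
            quotient += 1
            pos += 1
        pos += 1                          # skip the terminating '0'
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        previous += (quotient << k) + remainder + 1
        keys.add(previous)
    return keys


example = {3, 20, 21, 200}
assert rice_decode(rice_encode(example)) == example
```

Small gaps cost only k + 1 bits each, which is why gap coding suits sparse
sets of keys. The value of k here is arbitrary.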
The Tversky similarity can be computed in several ways. First, by using
Python set objects.
>>> T=numpy.array([len(set(S) & set(s))/float(len(set(S) | set(s))) for s in Ss])
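The set expression above is the ratio of intersection size to union size,
which is the Tversky index with both weights equal to 1. A minimal sketch of
the general form follows. The function name tversky, the parameter names
alpha and beta, and the value returned for two empty sets are assumptions
made for illustration; they are not taken from the finger module.

```python
def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky index of two collections, compared by their keys only.

    Hypothetical helper: set(a) keeps only the keys when a is a dict,
    so any counts are ignored.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    if denominator == 0:
        return 1.0   # assumed convention for two empty sets
    return common / float(denominator)


# With alpha = beta = 1 this equals |A & B| / |A | B|.
A, B = {1, 2, 3}, {2, 3, 4}
assert tversky(A, B) == len(A & B) / float(len(A | B))
```

Setting alpha and beta to different values makes the measure asymmetric,
weighting the keys unique to one side more heavily than those unique to the
other.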
Second, by converting them to Golomb fingerprints and using the fast C
implementations.
>>> G=GolombCodec(); F=G.finger(S)
>>> Fs=[G.finger(s) for s in Ss]
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,10))
1
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,11))
1
The C implementation still works when counts are embedded, but it does
not use them.
>>> G=GolombCodec(counts=True)
>>> F=G.finger(S)
>>> Fs=[G.finger(s) for s in Ss]
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,10))
1
>>> numpy.alltrue(T==finger.parallel_sim(F,Fs,1,1,1,11))
1
Overrides: Codec.__init__