Tips
Guideline for search
Some useful tips for tuning of search parameters:
Need more accurate search results:
Set a larger
L
inrii.Rii.query()
. This usually improves the accuracy, but makes the search slower. The recommended way is to set a multiple ofrii.Rii.L0
, e.g.,e.query(q=q, L=4 * e.L0)
If you find changing
L
is not enough, please construct the rii with a largerM
value. Again, this change boosts the accuracy, but the runtime becomes slower. Don’t forget that the dimensionality of the vectorD
must be dividable byM
.If your codec is pq, please consider to switch to opq.
Want to make the search faster:
Please run
rii.Rii.reconfigure()
with a largernlist
, such ase.reconfigure(nlist=4*np.sqrt(e.N))
If your task is subset-search, please consider sorting
target_ids
before passing it to the query function, and callrii.Rii.query()
withsort_target_ids=False
.Try a smaller
L
such ase.query(q=q, L=int(e.L0 / 2))
. This is not strongly recommended because the accuracy gets worse).
Adding vectors sequentially
You might want to add vectors one by one. There are two ways to achieve that.
The first option is simply calling rii.Rii.add_configure()
everytime.
# Suppose X is a set of vectors (np.ndarray with the shape (N, D))
e = rii.Rii(fine_quantizer=codec)
for x in X:
e.add_configure(vecs=x.reshape(1, -1)) # Don't forget reshaping (D, ) to (1, D)
This works perfectly.
But this would take time if you would like to add many vectors by this way.
It is because the rii.Rii.reconfigure()
function is called
(inside rii.Rii.add_configure()
) whenever a new vector x
is added.
The reconfiguration step creates postings list from scratch,
that does not need to be run for every addition.
Alternatively, you can call rii.Rii.add()
for each x
without updating
the posting lists, and run
rii.Rii.reconfigure()
finally.
e = rii.Rii(fine_quantizer=codec)
for x in X:
e.add(vecs=x.reshape(1, -1)) # Don't forget reshaping (D, ) to (1, D)
e.reconfigure()
This is much faster. The final results from both ways are identical.
Please remember that you must call rii.Rii.reconfigure()
in the final step to create posting lists.
Note that, if you receive your data in a batch way, that can be handled in the same manner:
# X1 is a set of vectors (batch). Xs is a set of batches.
# You might receive Xs as a generator/iterator
# because the whole Xs is too large to read on memory at once
Xs = [X1, X2, X3]
# Running "add_configure" everytime
e1 = rii.Rii(fine_quantizer=codec)
for X in Xs:
e1.add_configure(vecs=X)
# Or, you can run "add" for each batch, and finally run "reconfigure"
e2 = rii.Rii(fine_quantizer=codec)
for X in Xs:
e2.add(vecs=X)
e2.reconfigure()
Verbose flag
You can turn on/off the verbose flag via e.verbose = True
or e.verbose = False
. The default value is
decided by the verbose flag of the codec.
Version
The version of the package can be checked via rii.__version__
.