Tips

Guideline for search

Some useful tips for tuning of search parameters:

Need more accurate search results:
- Set a larger L in rii.Rii.query(). This usually improves accuracy but makes the search slower. The recommended way is to set L to a multiple of rii.Rii.L0, e.g., e.query(q=q, L=4 * e.L0).
- If changing L is not enough, construct the Rii instance with a larger M value. Again, this improves accuracy but makes the search slower. Remember that the dimensionality of the vectors, D, must be divisible by M.
- If your codec is pq, consider switching to opq.
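
The first two tips can be explored with a pair of small helpers. This is only a sketch: `valid_M_values` and `sweep_L` are hypothetical names, not part of rii. `engine` is assumed to be an already-built rii.Rii instance, and `topk` is assumed to be accepted by rii.Rii.query() alongside `q` and `L`.

```python
import time


def valid_M_values(D):
    """Return every M that divides D evenly (D must be divisible by M)."""
    return [M for M in range(1, D + 1) if D % M == 0]


def sweep_L(engine, q, multipliers=(1, 2, 4, 8), topk=10):
    """Query with L set to several multiples of engine.L0 and time each run.

    Returns a list of (L, elapsed_seconds, ids) tuples so that accuracy
    and runtime can be compared side by side.
    """
    report = []
    for m in multipliers:
        L = m * engine.L0
        start = time.perf_counter()
        ids, dists = engine.query(q=q, topk=topk, L=L)
        elapsed = time.perf_counter() - start
        report.append((L, elapsed, ids))
    return report
```

For example, valid_M_values(12) returns [1, 2, 3, 4, 6, 12]. Comparing the ids returned at each L against a ground truth shows where a larger L stops paying off.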
Want to make the search faster:
- Run rii.Rii.reconfigure() with a larger nlist, such as e.reconfigure(nlist=int(4 * np.sqrt(e.N))).
- If your task is subset search, consider sorting target_ids before passing it to the query function, and call rii.Rii.query() with sort_target_ids=False.
- Try a smaller L, such as e.query(q=q, L=int(e.L0 / 2)). This is not strongly recommended because the accuracy gets worse.
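
The nlist tip can be wrapped in a small helper. This is a sketch under assumptions: `suggest_nlist` and `reconfigure_for_speed` are hypothetical names, the factor of 4 is taken from the tip above, and rounding down to an integer assumes that nlist is expected to be a whole number.

```python
import math


def suggest_nlist(n_vectors, factor=4):
    """Return int(factor * sqrt(N)), clamped to at least 1."""
    return max(1, int(factor * math.sqrt(n_vectors)))


def reconfigure_for_speed(engine, factor=4):
    """Rebuild the posting lists with a larger nlist and return the value used."""
    nlist = suggest_nlist(engine.N, factor)
    engine.reconfigure(nlist=nlist)
    return nlist
```

With 10,000 stored vectors, suggest_nlist(10000) gives 400. Returning the chosen value makes it easy to log which setting produced which query time.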

Adding vectors sequentially

You might want to add vectors one by one. There are two ways to achieve that.

The first option is simply to call rii.Rii.add_configure() every time.

# Suppose X is a set of vectors (np.ndarray with the shape (N, D))
e = rii.Rii(fine_quantizer=codec)
for x in X:
    e.add_configure(vecs=x.reshape(1, -1))  # Don't forget reshaping (D, ) to (1, D)

This works correctly, but it is slow when you add many vectors this way, because rii.Rii.reconfigure() is called (inside rii.Rii.add_configure()) whenever a new vector x is added. The reconfiguration step creates the posting lists from scratch, which does not need to happen for every addition.

Alternatively, you can call rii.Rii.add() for each x without updating the posting lists, and run rii.Rii.reconfigure() once at the end.

e = rii.Rii(fine_quantizer=codec)
for x in X:
    e.add(vecs=x.reshape(1, -1))  # Don't forget reshaping (D, ) to (1, D)
e.reconfigure()

This is much faster, and the final results of both approaches are identical. Remember that you must call rii.Rii.reconfigure() as the final step to create the posting lists.
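
A back-of-the-envelope cost model shows why the difference grows quickly. This is a toy model, not a measurement of rii: it assumes that one rebuild of the posting lists costs work proportional to the number of vectors currently stored, and it ignores the cost of encoding the vectors. The function names are hypothetical.

```python
def cost_add_configure_every_time(n_vectors):
    """Toy cost: rebuild after each addition, touching all vectors stored so far."""
    cost = 0
    for stored in range(1, n_vectors + 1):
        cost += stored  # one full rebuild over `stored` vectors
    return cost


def cost_add_then_reconfigure_once(n_vectors):
    """Toy cost: add everything first, then rebuild once over all vectors."""
    return n_vectors
```

Under this model, adding 1000 vectors one by one costs 500500 units with a rebuild per addition, versus 1000 units with a single rebuild at the end.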

Note that data arriving in batches can be handled in the same manner:

# X1 is a set of vectors (a batch). Xs is a set of batches.
# You might receive Xs as a generator/iterator
# because the whole Xs is too large to fit in memory at once
Xs = [X1, X2, X3]

# Running "add_configure" every time
e1 = rii.Rii(fine_quantizer=codec)
for X in Xs:
    e1.add_configure(vecs=X)

# Or, you can run "add" for each batch, and finally run "reconfigure"
e2 = rii.Rii(fine_quantizer=codec)
for X in Xs:
    e2.add(vecs=X)
e2.reconfigure()
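
When the batches come from a generator, the second pattern can be wrapped in a small helper. This is a sketch: `add_batches` is a hypothetical name, and `engine` is assumed to be a rii.Rii instance. Skipping reconfigure() when no batch arrives is a design choice of this sketch, not documented rii behavior.

```python
def add_batches(engine, batches):
    """Add every batch with add(), then rebuild the posting lists once.

    `batches` can be any iterable, including a generator that yields one
    batch at a time. Returns the number of batches consumed.
    """
    count = 0
    for batch in batches:
        engine.add(vecs=batch)
        count += 1
    if count > 0:
        # Assumption: there is nothing to rebuild if no data was added.
        engine.reconfigure()  # build the posting lists once, at the end
    return count
```

Because the helper only iterates over `batches`, each batch can be released from memory as soon as it has been added.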

Verbose flag

You can turn on/off the verbose flag via e.verbose = True or e.verbose = False. The default value is decided by the verbose flag of the codec.
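
If you only want to silence the output for one step, the flag can be toggled temporarily. This is a sketch: `quiet` is a hypothetical helper, not part of rii, and it assumes only that `verbose` is a readable and writable attribute, as described above.

```python
from contextlib import contextmanager


@contextmanager
def quiet(engine):
    """Temporarily set engine.verbose to False, then restore the old value."""
    previous = engine.verbose
    engine.verbose = False
    try:
        yield engine
    finally:
        # Restore the previous setting even if the body raised an error.
        engine.verbose = previous
```

For example, `with quiet(e): e.reconfigure()` would run the reconfiguration without messages and leave the flag as it was afterwards.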

Version

The version of the package can be checked via rii.__version__.