Similarity-based Retrieval (cont.)
Similarity-based retrieval requires a distance measure with the following properties:
- dist(x,y) ∈ [0,1], dist(x,x) = 0
- dist(x,y) = dist(y,x)
where x and y are any two objects in the database
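As an illustration, one distance that satisfies the properties above is a normalised Euclidean distance. This sketch assumes objects are represented as fixed-length numeric vectors whose components are already scaled to [0,1]. The slide does not specify a representation, so both the vector form and the function name `dist` are assumptions.

```python
import math

def dist(x, y):
    """Hypothetical distance: Euclidean distance between two equal-length
    vectors with components in [0, 1], scaled so the result lies in [0, 1]."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not x:
        return 0.0
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    # Each squared term is at most 1, so the sum is at most len(x);
    # dividing by sqrt(len(x)) maps the distance into [0, 1].
    return math.sqrt(squared) / math.sqrt(len(x))

x = (0.2, 0.9, 0.5)
y = (0.6, 0.1, 0.5)
print(dist(x, x))                # 0.0 (dist(x,x) = 0)
print(dist(x, y) == dist(y, x))  # True (symmetry)
print(0.0 <= dist(x, y) <= 1.0)  # True (range)
```

Scaling by the maximum possible distance is what keeps the result in [0,1]. An unscaled Euclidean distance would satisfy the other two properties but not the range requirement.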
Note: computing even a single distance often requires substantial computational effort
How to restrict the solution set to only the "most similar" objects:
- threshold dmax
(only objects t such that dist(t,q) ≤ dmax, where q is the query object)
- count k
(the k objects closest to q, i.e. the k nearest neighbours)
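Both restriction methods can be sketched as a brute-force linear scan over the database. This is a minimal illustration, assuming objects are tuples of numbers in [0,1] and using a hypothetical normalised Euclidean `dist`. The functions `range_query` and `knn_query` are illustrative names, not an established API.

```python
import heapq
import math

def dist(x, y):
    # Hypothetical distance: Euclidean distance scaled into [0, 1]
    # for vectors whose components lie in [0, 1].
    return math.dist(x, y) / math.sqrt(len(x))

def range_query(db, q, dmax):
    """Threshold search: every object t with dist(t, q) <= dmax."""
    return [t for t in db if dist(t, q) <= dmax]

def knn_query(db, q, k):
    """Count search: the k objects closest to q (k nearest neighbours)."""
    return heapq.nsmallest(k, db, key=lambda t: dist(t, q))

db = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (0.5, 0.4)]
q = (0.0, 0.0)
print(range_query(db, q, 0.25))  # [(0.1, 0.1), (0.2, 0.2)]
print(knn_query(db, q, 3))       # [(0.1, 0.1), (0.2, 0.2), (0.5, 0.4)]
```

The two methods differ in what the caller controls. A threshold can return anything from zero to all objects, while a count always returns exactly k objects (if the database has at least k) no matter how far away they are.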
BUT both methods above require knowing the distance between the query object and every object in the DB
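To make this cost concrete, the sketch below wraps a distance function in a counter and runs a naive threshold query and a naive k-NN query. It assumes a hypothetical 100-object database of 2-D points and a normalised Euclidean distance; the class `CountingDistance` is an illustrative helper, not part of any library.

```python
import heapq
import math

class CountingDistance:
    """Wraps a distance function and counts how often it is evaluated."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1
        return self.fn(x, y)

def normalised_euclidean(x, y):
    # Hypothetical distance in [0, 1] for vectors with components in [0, 1].
    return math.dist(x, y) / math.sqrt(len(x))

def range_query(db, q, dmax, dist):
    return [t for t in db if dist(t, q) <= dmax]

def knn_query(db, q, k, dist):
    return heapq.nsmallest(k, db, key=lambda t: dist(t, q))

db = [(i / 100, i / 100) for i in range(100)]
q = (0.0, 0.0)

d = CountingDistance(normalised_euclidean)
range_query(db, q, 0.05, d)  # a very small threshold...
print(d.calls)               # ...still costs 100 distance evaluations

d = CountingDistance(normalised_euclidean)
knn_query(db, q, 5, d)       # only 5 neighbours wanted...
print(d.calls)               # ...still costs 100 distance evaluations
```

However small dmax or k is, a plain scan evaluates dist(t,q) once per database object. Combined with the cost of a single distance computation noted above, this makes naive similarity search expensive on large databases.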