
Similarity-based Retrieval (cont)

Similarity-based retrieval requires a distance measure
  • dist(x,y) ∈ [0,1],     dist(x,x) = 0,     dist(x,y) = dist(y,x)
where x and y are two objects (in the database)
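
One possible distance measure with these properties, as a minimal sketch: it assumes each object has already been reduced to a feature vector whose components lie in [0,1]. That representation, and the choice of a normalised Euclidean distance, are assumptions made for illustration only.

```python
import math

def dist(x, y):
    """Normalised Euclidean distance between two equal-length feature vectors.

    Assumption: every component lies in [0, 1], so the result is also in [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if not x:
        raise ValueError("feature vectors must be non-empty")
    # Sum of squared component differences, divided by the dimension so that
    # the largest possible value (all components differ by 1) is exactly 1.
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(squared / len(x))
```

The function is symmetric because each squared difference is, and dist(x,x) = 0 because every difference is zero.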

Note: distance calculation often requires substantial computational effort


How to restrict the solution set to only the "most similar" objects:

  • threshold dmax   (only objects t such that dist(t,q) ≤ dmax)
  • count k   (the k closest objects, i.e. the k nearest neighbours)
BUT both methods require knowing the distance between the query object q and every object in the DB
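
Both restrictions can be sketched as a linear scan over the database, which makes the cost visible: each query computes dist once per stored object. The names within_threshold and k_nearest, the list-of-tuples database, and the normalised Euclidean dist are hypothetical choices for this sketch, not part of the original notes.

```python
import heapq
import math

def dist(x, y):
    """Normalised Euclidean distance; assumes components lie in [0, 1]."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def within_threshold(db, q, dmax):
    """Return every object t in db with dist(t, q) <= dmax (one dist call per object)."""
    return [t for t in db if dist(t, q) <= dmax]

def k_nearest(db, q, k):
    """Return the k objects of db closest to q, nearest first (one dist call per object)."""
    return heapq.nsmallest(k, db, key=lambda t: dist(t, q))

# Hypothetical database of one-dimensional feature vectors.
db = [(0.9,), (0.5,), (0.1,)]
q = (0.0,)
close = within_threshold(db, q, 0.6)
nearest = k_nearest(db, q, 2)
```

With n objects in the database, each function evaluates dist n times, which is exactly the problem the last line above points out.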