Similarity-based Retrieval (cont)
For some applications, Cost(dist(x,y)) is comparable to Tr
⇒ computing dist(t.val,q) for every tuple t is infeasible.
To improve this ...
- compute feature vector to capture "critical" object properties
- store feature vectors "in parallel" with objects (cf. signatures)
- compute distance using feature vectors (not objects)
i.e. replace dist(t.val,q) by dist'(vec(t.val),vec(q)) in the previous algorithm.
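
A minimal sketch of the idea above, not the course's implementation. It assumes objects are lists of numbers; vec() is a hypothetical feature extractor (mean, min, max), and dist_vec() stands in for dist' using Euclidean distance. All names are illustrative.

```python
import heapq
import math

def vec(obj):
    # Hypothetical feature extractor: summarise a list of numbers as (mean, min, max).
    return (sum(obj) / len(obj), min(obj), max(obj))

def dist_vec(u, v):
    # Stand-in for dist': Euclidean distance between two feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_features(objects):
    # Computed once (e.g. at insert time) and stored "in parallel" with the objects.
    return [vec(o) for o in objects]

def knn(features, q, k):
    # Scan the stored feature vectors only; the objects themselves are never read.
    vq = vec(q)
    return heapq.nsmallest(
        k, range(len(features)), key=lambda i: dist_vec(features[i], vq)
    )

objects = [[0, 0, 0, 0], [10, 10, 10, 10], [1, 2, 1, 2], [100, 90, 110, 100]]
features = build_features(objects)
nearest = knn(features, [1, 1, 2, 2], 2)   # indexes of the 2 closest objects
```

The result is only as good as the feature vector: objects that are close under dist' are close under dist only if vec() captures the properties that matter.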
Further optimisation: dimension-reduction to make vectors smaller
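
The notes do not say which dimension-reduction method is meant. As one possible illustration, the sketch below uses a random projection from d dimensions down to k; make_projection() and reduce_dim() are hypothetical names, and the seed is fixed only so that stored vectors and query vectors use the same projection.

```python
import random

def make_projection(d, k, seed=0):
    # k x d matrix of scaled Gaussian entries; must be reused for all vectors.
    rng = random.Random(seed)
    scale = 1 / k ** 0.5
    return [[rng.gauss(0, 1) * scale for _ in range(d)] for _ in range(k)]

def reduce_dim(v, proj):
    # Map a d-dimensional vector to k dimensions (one dot product per row).
    return [sum(p * x for p, x in zip(row, v)) for row in proj]

proj = make_projection(d=8, k=3)
small = reduce_dim([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], proj)
```

Smaller vectors mean more vectors per page and cheaper dist' computations, at the cost of distances that are only approximately preserved.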