Application of Geometry to Similarity
Search Problems
Table
of Contents
Preface
1. Introduction
1.1 The Importance of Similarity Search
1.2 Research on Similarity Search: An Overview
1.3 Challenges of Similarity Search
1.4 What Geometry Can Offer
1.5 Summary
2. Object Representation
2.1 Introduction
2.2 Feature Extraction
2.2.1 1-D Time Series
2.2.2 2-D Objects
2.2.3 Complex Objects
2.3 Distance Metrics
2.4 Intrinsic Dimensionality
2.5 Transformation and Dimensionality Reduction
2.6 Summary
3. Similarity Search Problems
3.1 Introduction
3.2 Similarity Search Problems
3.3 Properties of High Dimensional Data
3.4 Similarity Measures
3.5 Multidimensional Access Methods
3.6 Challenges of Similarity Search in High-Dimensional Space
3.7 The 2-Step Search Approach
3.8 Approximation
3.8.1 Lower-Bounding of Distance Metric
3.8.2 Indexability
3.8.3 Compression of Index
3.8.4 Transformation
3.8.5 Other Approaches
3.9 Performance Metrics
3.10 Summary
4. High-Dimensional Geometry and Similarity Search: An Introduction
4.1 Introduction
4.2 Elements
4.2.1 Point
4.2.2 Clusters of Points
4.2.3 Hyperplane
4.2.4 Hyperrectangle
4.2.5 Hypersphere
4.2.6 Iso-Hypersurface
4.3 Space Transformation
4.4 Bounded Volume
4.5 Surface Complexity
4.6 Some Important Iso-Hypersurfaces
4.6.1 Definitions
4.6.2 Iso-µ Hyperplanes
4.6.3 Iso-σ Hypercylinders
4.7 Summary
5. Similarity Search Problems in the Context of Geometry
5.1 Introduction
5.2 Transformations
5.3 Retrieval Sets
5.4 Hypercubic Approximation
5.5 The Diagonal Mapping
5.6 Unboundedness of Popular Transformations
5.7 Bounded Approximation
5.8 Approximation Using Iso-Hypersurfaces
5.9 Workspace Mapping
5.10 Data Distribution
5.11 Summary
6. Approximation of Search Sphere Using Iso-Hypersurfaces
6.1 Introduction
6.2 Lower Bounds in Sn
6.2.1 Bound of µ
6.2.2 Bound of σ
6.3 Approximation Using Iso-µ and Iso-σ Hypersurfaces
6.4 Approximation in Subspace Sm
6.5 Indexing and Search
6.6 Performance Evaluation
6.6.1 Settings
6.6.2 Configurations
6.6.3 Comparative Studies
6.6.4 Effectiveness of the Techniques
6.6.5 Efficiency
6.6.6 Effects of Unboundedness
6.7 Summary
7. Approximation of Search Sphere Using Iso-Hypertorus
7.1 Introduction
7.2 Approximation using Iso-(µ, σ) Hypersurface
7.2.1 Bound of µ and σ
7.2.2 Bounding the Search Sphere with Iso-(µ, σ) Hypersurface
7.2.3 Performance Analysis
7.2.4 Approximation in Subspace Sm
7.2.5 Lower Bounding Measure
7.2.6 Search Algorithm
7.3 Performance Evaluation
7.3.1 Settings
7.3.2 Effectiveness of Lower-Bounding Measures
7.3.3 Resources and Response Time
7.4 Summary
8. Similarity Measure in the Presence of Translation, Scaling and Inverse Scaling
8.1 Introduction
8.2 Background and Related Work
8.2.1 Similarity Search in the Presence of T/S/IS
8.2.2 Dimensionality-Reduction Approximation
8.3 Similarity Measures
8.3.1 Distance Measure Revisited
8.3.2 New Formula 1
8.3.3 New Formula 2
8.3.4 The Proposed Formula
8.4 Retrieval Procedure
8.5 Properties
8.6 Techniques Revisited
8.7 Performance Evaluation
8.7.1 Approximation Performance
8.7.2 Similarity Measure Performance
8.7.3 Performance in Realistic Environment
8.8 Summary
9.
Similarity Search in
9.1 Introduction
9.2 Related Work
9.3
9.3.1 Representation of the Relevant Set
9.3.2 The Principal Set
9.3.3 Query Expansion
9.4 Retrieval of the Relevant Set
9.4.1 Considerations
9.4.2 Determining the Exact Set
9.5 Approximation of Relevant Sets
9.5.1 Approximation Schemes
9.5.1.1 Minimum Bounding Hyperrectangle (MBR)
9.5.1.2 Minimum Bounding Hypersphere (MBS)
9.5.1.3 Minimum Bounding Iso Hypersurface (MBI)
9.5.1.4 Minimum Bounding Merged Hyperspheres (MBMH)
9.5.2 Boundary-based Minimum Bounding Hyperrectangle (bMBR)
9.6 Efficient Indexing and Search
9.7 Performance Study
9.7.1 Performance of Approximation Techniques
9.7.2 An Implementation
9.8 Summary
10. The Antenna Structure
10.1 Introduction
10.2 Generalization of the µ, σ Mapping
10.3 The Antenna Structure
10.4 Clustering
10.5 Indexing
10.6 Search
10.7 Properties
10.8 Performance Evaluation
10.8 Summary
11. Fast Similarity Search under Arbitrary Lp Norm
11.1 Introduction
11.2 L2Embedding
11.2.1 Embedding into Normed Space L2
11.2.2 Computation of Maximum Stretch
11.2.3 Approximation for Arbitrary Lp
11.3 Approximation
11.3.1 Properties of the Convex Function
11.3.2 Lower Bounding of General Lp Metrics
11.4 Summary
12. Concluding Remarks
12.1 Summary
12.2 A Final Thought
12.2.1 The Universal Dataspace: Beyond the Datacube
12.2.2 Relativity of Boundedness in Similarity Search
12.3 Future Work
12.3.1 More Analysis and More Tools
12.3.2 A Powerful Framework for Other Challenging Problems
Appendix A Popular Transformations
A.1 Singular Value Decomposition
A.2 Discrete Fourier Transform
A.3 Discrete Wavelet Transform
A.4 Piecewise Aggregate Approximation
A.5 Gram-Schmidt Orthogonalization
Bibliography