Application of Geometry to Similarity Search Problems

 

Table of Contents

 

Preface

 

1. Introduction

1.1 The Importance of Similarity Search

1.2 Research on Similarity Search: An Overview

1.3 Challenges of Similarity Search

1.4 What Geometry Can Offer

1.5 Summary

 

2. Object Representation

2.1 Introduction

2.2 Feature Extraction

2.2.1 1-D Time Series

2.2.2 2-D Objects

2.2.3 Complex Objects

2.3 Distance Metrics

2.4 Intrinsic Dimensionality

2.5 Transformation and Dimensionality Reduction

2.6 Summary

 

3. Similarity Search Problems

3.1 Introduction

3.2 Similarity Search Problems

3.3 Properties of High Dimensional Data

3.4 Similarity Measures

3.5 Multidimensional Access Methods

3.6 Challenges of Similarity Search in High-Dimensional Space

3.7 The 2-Step Search Approach

3.8 Approximation

3.8.1 Lower-Bounding of Distance Metric

3.8.2 Indexability

3.8.3 Compression of Index

3.8.4 Transformation

3.8.5 Other Approaches

3.9 Performance Metrics

3.10 Summary

 

4. High-Dimensional Geometry and Similarity Search: An Introduction

4.1 Introduction

4.2 Elements

4.2.1 Point

4.2.2 Clusters of Points

4.2.3 Hyperplane

4.2.4 Hyperrectangle

4.2.5 Hypersphere

4.2.6 Iso-Hypersurface

4.3 Space Transformation

4.4 Bounded Volume

4.5 Surface Complexity

4.6 Some Important Iso-Hypersurfaces

4.6.1 Definitions

4.6.2 Iso-µ Hyperplanes

4.6.3 Iso-σ Hypercylinders

4.7 Summary

 

5. Similarity Search Problems in the Context of Geometry

5.1 Introduction

5.2 Transformations

5.3 Retrieval Sets

5.4 Hypercubic Approximation

5.5 The Diagonal Mapping

5.6 Unboundedness of Popular Transformations

5.7 Bounded Approximation

5.8 Approximation Using Iso-Hypersurfaces

5.9 Workspace Mapping

5.10 Data Distribution

5.11 Summary

 

6. Approximation of Search Sphere Using Iso-Hypersurfaces

6.1 Introduction

6.2 Lower Bounds in Sn

6.2.1 Bound of µ

6.2.2 Bound of σ

6.3 Approximation Using Iso-µ and Iso-σ Hypersurfaces

6.4 Approximation in Subspace Sm

6.5 Indexing and Search

6.6 Performance Evaluation

6.6.1 Settings

6.6.2 Configurations

6.6.3 Comparative Studies

6.6.4 Effectiveness of the Techniques

6.6.5 Efficiency

6.6.6 Effects of Unboundedness

6.7 Summary

 

7. Approximation of Search Sphere Using Iso-Hypertorus

7.1 Introduction

7.2 Approximation using Iso-(µ, σ) Hypersurface

7.2.1 Bound of µ and σ

7.2.2 Bounding the Search Sphere with Iso-(µ, σ) Hypersurface

7.2.3 Performance Analysis

7.2.4 Approximation in Subspace Sm

7.2.5 Lower Bounding Measure

7.2.6 Search Algorithm

7.3 Performance Evaluation

7.3.1 Settings

7.3.2 Effectiveness of Lower-Bounding Measures

7.3.3 Resources and Response Time

7.4 Summary

 

8. Similarity Measure in the Presence of Translation, Scaling and Inverse Scaling

8.1 Introduction

8.2 Background and Related Work

8.2.1 Similarity Search in the Presence of T/S/IS

8.2.2 Dimensionality-Reduction Approximation

8.3 Similarity Measures

8.3.1 Distance Measure Revisited

8.3.2 New Formula 1

8.3.3 New Formula 2

8.3.4 The Proposed Formula

8.4 Retrieval Procedure

8.5 Properties

8.6 Techniques Revisited

8.7 Performance Evaluation

8.7.1 Approximation Performance

8.7.2 Similarity Measure Performance

8.7.3 Performance in Realistic Environment

8.8 Summary

 

9. Similarity Search in Multipoint Queries

9.1 Introduction

9.2 Related Work

9.3 Multipoint Queries

9.3.1 Representation of the Relevant Set 

9.3.2 The Principal Set

9.3.3 Query Expansion

9.4 Retrieval of the Relevant Set

9.4.1 Considerations

9.4.2 Determining the Exact Set

9.5 Approximation of Relevant Sets

9.5.1 Approximation Schemes

9.5.1.1 Minimum Bounding Hyperrectangle (MBR)

9.5.1.2 Minimum Bounding Hypersphere (MBS)

9.5.1.3 Minimum Bounding Iso Hypersurface (MBI)

9.5.1.4 Minimum Bounding Merged Hyperspheres (MBMH)

9.5.2 Boundary-based Minimum Bounding Hyperrectangle (bMBR)

            9.6 Efficient Indexing and Search

9.7 Performance Study

9.7.1 Performance of Approximation Techniques

9.7.2 An Implementation

9.8 Summary

 

10. The Antenna Structure

10.1 Introduction

10.2 Generalization of the µ, σ Mapping

10.3 The Antenna Structure

10.4 Clustering

10.5 Indexing

10.6 Search

10.7 Properties

10.8 Performance Evaluation

10.8 Summary

 

11. Fast Similarity Search under Arbitrary Lp Norm

11.1 Introduction

11.2 L2Embedding

11.2.1 Embedding into Normed Space L2

11.2.2 Computation of Maximum Stretch

11.2.3 Approximation for Arbitrary Lp

11.3 Approximation

11.3.1 Properties of the Convex Function

11.3.2 Lower Bounding of General Lp Metrics

11.4 Summary

 

12. Concluding Remarks

12.1 Summary

12.2 A Final Thought

12.2.1 The Universal Dataspace: Beyond the Datacube

12.2.2 Relativity of Boundedness in Similarity Search

12.3 Future Work

12.3.1 More Analysis and More Tools

12.3.2 A Powerful Framework for Other Challenging Problems

 

Appendix A Popular Transformations

A.1 Singular Value Decomposition

A.2 Discrete Fourier Transform

A.3 Discrete Wavelet Transform

A.4 Piecewise Aggregate Approximation

A.5 Gram-Schmidt Orthogonalization

 

Bibliography