Interface MultiMatcher<E>
Type Parameters:
E - the type of the elements being matched.
All Known Implementing Classes:
MultiMatcher.Default
public interface MultiMatcher<E>
Logic for bidirectionally and exclusively linking all matching elements from two collections according to equality and/or sufficient similarity.
Exclusively means that each element in either collection can be linked with at most one element from the other collection. Bidirectionally means that every link has two directions: if element A is linked to element B, element B is inherently linked to element A as well.
Equality and similarity are defined by Equalator and Similator functions that can be passed at creation time. All values controlling the matching algorithm can optionally be configured in the factory class if the default configuration is not desired. Additionally, a callback function for deciding found matches with questionable similarity can be injected.
This is a powerful general-purpose means of building associations between two sets of similar but not equal elements.
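As a rough illustration of the two kinds of functions, the sketch below uses standard java.util.function types as stand-ins. The actual Equalator and Similator interfaces are not reproduced here, so the shapes shown are assumptions, as is the convention that similarity lies in [0.0, 1.0] with 1.0 meaning a perfect match. The class name SimilaritySketch and its members are hypothetical. The similarity is a plain normalized Levenshtein ratio, not the library's Levenshtein.substringSimilarity, so its scores will differ from those of the library.

```java
import java.util.Locale;
import java.util.function.BiPredicate;
import java.util.function.ToDoubleBiFunction;

// Hypothetical stand-ins for an equality and a similarity function on strings.
final class SimilaritySketch
{
    // Equality stand-in: case-insensitive string equality.
    static final BiPredicate<String, String> EQUAL_IGNORE_CASE =
        (a, b) -> a.equalsIgnoreCase(b);

    // Similarity stand-in: case-insensitive normalized Levenshtein ratio.
    static final ToDoubleBiFunction<String, String> SIMILARITY =
        (a, b) -> similarity(a.toLowerCase(Locale.ROOT), b.toLowerCase(Locale.ROOT));

    // 1.0 for identical strings, decreasing towards 0.0 as the edit distance grows.
    static double similarity(String a, String b)
    {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b)
    {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++)
        {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++)
            {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row of the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

An equality function answers yes or no, whereas a similarity function grades how close two elements are. That grading is what allows a threshold to decide whether a pair is similar enough to be linked.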
A very simple use case is the formal recognition of a changed table column structure (for which this class was originally developed).
For example, given the following two hypothetical definitions (old and new) of column names:
Old:
- Name
- Firstname
- Age
- Address
- Freetext
- OtherAddress
- Email
New:
- firstname
- lastname
- age
- emailAddress
- postalAddress
- noteLink
- newColumn1
- someMiscAddress
Using a Similator (see Levenshtein.substringSimilarity(java.lang.String, java.lang.String)), the algorithm produces the following associations:
firstname       -1.00- Firstname
lastname        -0.75- Name
age             -1.00- Age
emailAddress    -0.71- Email
postalAddress   -0.77- Address
noteLink        [new]
newColumn1      [new]
someMiscAddress -0.56- OtherAddress
                   X   Freetext
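The exclusive linking described above can be approximated with a simple greedy, best-first strategy. This is a minimal sketch under stated assumptions, not the algorithm used by MultiMatcher.Default: the class ExclusiveMatchSketch and its match method are hypothetical, the similarity function is passed as a standard ToDoubleBiFunction, and elements are assumed to be distinct within each list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical greedy matcher: links the most similar pairs first, each element at most once.
final class ExclusiveMatchSketch
{
    // One potential link between a source index and a target index.
    static final class Candidate
    {
        final int source;
        final int target;
        final double similarity;

        Candidate(int source, int target, double similarity)
        {
            this.source = source;
            this.target = target;
            this.similarity = similarity;
        }
    }

    // Returns source -> target links; unmatched elements do not appear in the map.
    static <E> Map<E, E> match(
        List<E> sources,
        List<E> targets,
        ToDoubleBiFunction<? super E, ? super E> similator,
        double similarityThreshold
    )
    {
        // Collect every pair that is similar enough to be considered at all.
        List<Candidate> candidates = new ArrayList<>();
        for (int s = 0; s < sources.size(); s++)
        {
            for (int t = 0; t < targets.size(); t++)
            {
                double sim = similator.applyAsDouble(sources.get(s), targets.get(t));
                if (sim >= similarityThreshold)
                {
                    candidates.add(new Candidate(s, t, sim));
                }
            }
        }

        // Best candidates first; the sort is stable, so ties keep their input order.
        candidates.sort((a, b) -> Double.compare(b.similarity, a.similarity));

        boolean[] sourceLinked = new boolean[sources.size()];
        boolean[] targetLinked = new boolean[targets.size()];
        Map<E, E> links = new LinkedHashMap<>();
        for (Candidate c : candidates)
        {
            // Exclusivity: skip the pair if either side already has a partner.
            if (sourceLinked[c.source] || targetLinked[c.target])
            {
                continue;
            }
            sourceLinked[c.source] = true;
            targetLinked[c.target] = true;
            links.put(sources.get(c.source), targets.get(c.target));
        }
        return links;
    }
}
```

A purely greedy strategy can leave an element unmatched when its only acceptable partner has already been claimed by a slightly better pair. That situation is what the singleton precedence settings of this interface are concerned with.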
Nested Class Summary
Nested Classes
Modifier and Type    Interface                  Description
static class         MultiMatcher.Default<E>
Method Detail
-
similarityThreshold
double similarityThreshold()
-
singletonPrecedenceThreshold
double singletonPrecedenceThreshold()
A measure of how "eager" the algorithm is to find as many matches as possible. The lower this threshold, the more readily items with only a single potential match are preferred over pairs that actually match better, simply so that they are not left unmatched. To deactivate this special-casing, set the threshold to 1.0, so that only items that fit perfectly anyway take precedence over others.
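One possible reading of this rule, written as a predicate, is sketched below. It is an interpretation of the description above, not the library's implementation: the class SingletonPrecedenceSketch, the method takesPrecedence and the parameter candidateCount (the number of potential matches an item has) are all hypothetical.

```java
// Hypothetical reading of the singleton precedence rule described above.
final class SingletonPrecedenceSketch
{
    // An item takes precedence if it has exactly one potential match
    // and that match is at least as similar as the configured threshold.
    static boolean takesPrecedence(
        int candidateCount,
        double similarity,
        double singletonPrecedenceThreshold
    )
    {
        return candidateCount == 1 && similarity >= singletonPrecedenceThreshold;
    }
}
```

Under this reading, a threshold of 1.0 grants precedence only to singletons that match perfectly, which agrees with the deactivation hint in the description.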
-
singletonPrecedenceBonus
double singletonPrecedenceBonus()
-
noiseFactor
double noiseFactor()
-
validator
MatchValidator<? super E> validator()
-
setSimilarityThreshold
MultiMatcher<E> setSimilarityThreshold(double similarityThreshold)
-
setSingletonPrecedenceThreshold
MultiMatcher<E> setSingletonPrecedenceThreshold(double singletonPrecedenceThreshold)
-
setSingletonPrecedenceBonus
MultiMatcher<E> setSingletonPrecedenceBonus(double singletonPrecedenceBonus)
-
setNoisefactor
MultiMatcher<E> setNoisefactor(double noiseFactor)
-
setSimilator
MultiMatcher<E> setSimilator(Similator<? super E> similator)
-
setEqualator
MultiMatcher<E> setEqualator(Equalator<? super E> equalator)
-
setValidator
MultiMatcher<E> setValidator(MatchValidator<? super E> validator)
-
match
MultiMatch<E> match(XGettingCollection<? extends E> source, XGettingCollection<? extends E> target)
-
defaultSimilarityThreshold
static double defaultSimilarityThreshold()
-
defaultSingletonPrecedenceThreshold
static double defaultSingletonPrecedenceThreshold()
-
defaultSingletonPrecedenceBonus
static double defaultSingletonPrecedenceBonus()
-
defaultNoiseFactor
static double defaultNoiseFactor()
-
New
static <E> MultiMatcher<E> New()