Interface MultiMatcher<E>

  • Type Parameters:
    E - the type of the elements being matched.
    All Known Implementing Classes:
    MultiMatcher.Default

    public interface MultiMatcher<E>
    Logic for bidirectionally and exclusively linking all matching elements from two collections according to equality and/or sufficient similarity.

    Exclusively means that each element in either collection can be linked with at most one element from the other collection. Bidirectionally means that every link works in both directions: if element A is linked to element B, then element B is linked to element A as well.

    Equality and similarity are defined by Equalator and Similator functions that can be passed at creation time. All values controlling the matching algorithm can optionally be configured in the factory class if the default configuration is not desired. Additionally, a callback function can be injected to decide whether matches of questionable similarity are accepted.
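    The idea of exclusive, bidirectional linking driven by an equality function and a similarity function can be illustrated with a minimal, self-contained sketch. It is not the implementation behind this interface: the class ExclusiveLinkSketch, the method link and the stand-in prefixSimilarity are hypothetical names, plain java.util.function types stand in for Equalator and Similator, and the greedy best-pair strategy is only an assumption about how such linking could work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.ToDoubleBiFunction;

public class ExclusiveLinkSketch {

    // Hypothetical stand-in similarity: shared prefix length divided by the longer length.
    public static double prefixSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0;
        }
        int common = 0;
        while (common < a.length() && common < b.length()
                && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        return (double) common / max;
    }

    // result[i] is the index of the target element linked to source element i, or -1.
    // Because every target index is used at most once, the inverse mapping is implied.
    public static <E> int[] link(List<E> source, List<E> target,
                                 BiPredicate<E, E> equalator,
                                 ToDoubleBiFunction<E, E> similator,
                                 double similarityThreshold) {
        int[] sourceToTarget = new int[source.size()];
        Arrays.fill(sourceToTarget, -1);
        boolean[] targetTaken = new boolean[target.size()];

        // Pass 1: link equal elements first.
        for (int s = 0; s < source.size(); s++) {
            for (int t = 0; t < target.size(); t++) {
                if (!targetTaken[t] && equalator.test(source.get(s), target.get(t))) {
                    sourceToTarget[s] = t;
                    targetTaken[t] = true;
                    break;
                }
            }
        }

        // Pass 2: repeatedly link the most similar remaining pair until
        // no remaining pair reaches the threshold.
        while (true) {
            int bestS = -1;
            int bestT = -1;
            double best = Double.NEGATIVE_INFINITY;
            for (int s = 0; s < source.size(); s++) {
                if (sourceToTarget[s] >= 0) {
                    continue;
                }
                for (int t = 0; t < target.size(); t++) {
                    if (targetTaken[t]) {
                        continue;
                    }
                    double sim = similator.applyAsDouble(source.get(s), target.get(t));
                    if (sim > best) {
                        best = sim;
                        bestS = s;
                        bestT = t;
                    }
                }
            }
            if (bestS < 0 || best < similarityThreshold) {
                break;
            }
            sourceToTarget[bestS] = bestT;
            targetTaken[bestT] = true;
        }
        return sourceToTarget;
    }

    public static void main(String[] args) {
        int[] links = link(List.of("a", "bb", "ccc"), List.of("ccc", "bx", "zzzzzz"),
                String::equals, ExclusiveLinkSketch::prefixSimilarity, 0.5);
        System.out.println(Arrays.toString(links));
    }
}
```

    Storing one target index per source element and marking targets as taken is what makes the links exclusive in this sketch; an unlinked element on either side simply keeps no partner.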

    This is a powerful general-purpose means of building associations between two sets of similar but not equal elements.
    A very simple use case is the formal recognition of a changed table column structure (for which this class was originally developed).

    For example, given the following two hypothetical definitions (old and new) of column names:

    Old:

    • Name
    • Firstname
    • Age
    • Address
    • Freetext
    • Email
    • OtherAddress
    and

    New:
    • firstname
    • lastname
    • age
    • emailAddress
    • postalAddress
    • noteLink
    • newColumn1
    • someMiscAddress
    When using a case-insensitive modified Levenshtein Similator (see Levenshtein.substringSimilarity(java.lang.String, java.lang.String)), the algorithm produces the following associations:
     firstname       -1.00- Firstname
     lastname        -0.75- Name
     age             -1.00- Age
     emailAddress    -0.71- Email
     postalAddress   -0.77- Address
     noteLink        [new]
     newColumn1      [new]
     someMiscAddress -0.56- OtherAddress
     X Freetext
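    To give a feel for how a similarity score between two column names can arise, here is a minimal sketch of a case-insensitive similarity based on the plain Levenshtein edit distance, normalized by the longer length. This is not the modified algorithm behind Levenshtein.substringSimilarity, so its scores will generally differ from the values listed above (for example, it rates lastname and Name at 0.5 rather than 0.75). The class and method names are hypothetical.

```java
import java.util.Locale;

public class LevenshteinSimilaritySketch {

    // Classic edit distance (insert, delete, substitute) using two rolling rows.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive similarity in [0.0, 1.0]; 1.0 means equal ignoring case.
    public static double similarity(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        int max = Math.max(x.length(), y.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(x, y) / max;
    }

    public static void main(String[] args) {
        System.out.println(similarity("firstname", "Firstname"));
        System.out.println(similarity("lastname", "Name"));
    }
}
```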
     
    • Method Detail

      • similarityThreshold

        double similarityThreshold()
      • singletonPrecedenceThreshold

        double singletonPrecedenceThreshold()
        This is a measure of how "eager" the algorithm is to find as many matches as possible. The lower this threshold is, the more often items with only a single potential match are preferred over pairs that actually match better, so that they are not left unmatched. To deactivate this special casing, set the threshold to 1.0, meaning only items that fit perfectly anyway take precedence over others.
      • singletonPrecedenceBonus

        double singletonPrecedenceBonus()
      • noiseFactor

        double noiseFactor()
      • setSimilarityThreshold

        MultiMatcher<E> setSimilarityThreshold(double similarityThreshold)
      • setSingletonPrecedenceThreshold

        MultiMatcher<E> setSingletonPrecedenceThreshold(double singletonPrecedenceThreshold)
      • setSingletonPrecedenceBonus

        MultiMatcher<E> setSingletonPrecedenceBonus(double singletonPrecedenceBonus)
      • setNoisefactor

        MultiMatcher<E> setNoisefactor(double noiseFactor)
      • defaultSimilarityThreshold

        static double defaultSimilarityThreshold()
      • defaultSingletonPrecedenceThreshold

        static double defaultSingletonPrecedenceThreshold()
      • defaultSingletonPrecedenceBonus

        static double defaultSingletonPrecedenceBonus()
      • defaultNoiseFactor

        static double defaultNoiseFactor()
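
    The description of singletonPrecedenceThreshold() above can be read as a simple rule, sketched below under stated assumptions. The sketch assumes that an item counts as a "single potential match" item when exactly one candidate reaches the similarity threshold, and that it takes precedence only if that one similarity also reaches the singleton precedence threshold. The class and method names are hypothetical, and the actual algorithm (including the roles of the precedence bonus and the noise factor) may differ.

```java
public class SingletonPrecedenceSketch {

    // Returns the index of the only candidate at or above similarityThreshold,
    // or -1 if there is no such candidate or more than one.
    public static int singleCandidate(double[] similarities, double similarityThreshold) {
        int found = -1;
        for (int i = 0; i < similarities.length; i++) {
            if (similarities[i] >= similarityThreshold) {
                if (found >= 0) {
                    return -1; // more than one potential match
                }
                found = i;
            }
        }
        return found;
    }

    // Assumed rule: a singleton item takes precedence only if its one
    // potential match is at least as similar as the precedence threshold.
    public static boolean takesPrecedence(double[] similarities,
                                          double similarityThreshold,
                                          double singletonPrecedenceThreshold) {
        int candidate = singleCandidate(similarities, similarityThreshold);
        return candidate >= 0 && similarities[candidate] >= singletonPrecedenceThreshold;
    }

    public static void main(String[] args) {
        double[] row = {0.2, 0.6, 0.1};
        // A low precedence threshold lets the lone 0.6 candidate take precedence.
        System.out.println(takesPrecedence(row, 0.5, 0.55));
        // With a threshold of 1.0 only perfect fits take precedence.
        System.out.println(takesPrecedence(row, 0.5, 1.0));
    }
}
```

    Under this reading, a threshold of 1.0 switches the special casing off for everything except perfect fits, which matches the description above.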