Function std.numeric.gapWeightedSimilarityNormalized
The similarity computed by gapWeightedSimilarity has the drawback that it grows with the lengths of the two strings, even when the strings are not actually very similar. For example, the range ["Hello", "world"] becomes increasingly similar to the range ["Hello", "world", "world", "world", ...] as more instances of "world" are appended. To prevent that, gapWeightedSimilarityNormalized computes a normalized version of the similarity, defined as
gapWeightedSimilarity(s, t, lambda) / sqrt(gapWeightedSimilarity(s, s, lambda) * gapWeightedSimilarity(t, t, lambda)).
The function gapWeightedSimilarityNormalized (a so-called normalized kernel) is bounded in [0, 1], reaches 0 only for ranges that don't match in any position, and 1 only for identical ranges.
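As a rough illustration of the normalization, the sketch below divides the cross-similarity by the geometric mean of the two self-similarities and compares the result with the library function. The helper normalizedByHand is a hypothetical name introduced only for this example, and it assumes non-empty ranges (it does not guard against a zero self-similarity).

```d
import std.math : isClose, sqrt;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

// Hypothetical helper, not part of std.numeric: normalizes the raw kernel by hand.
// Assumes both ranges are non-empty, so the self-similarities are nonzero.
double normalizedByHand(string[] s, string[] t, double lambda)
{
    double st = gapWeightedSimilarity(s, t, lambda); // cross-similarity
    double ss = gapWeightedSimilarity(s, s, lambda); // self-similarity of s
    double tt = gapWeightedSimilarity(t, t, lambda); // self-similarity of t
    return st / sqrt(ss * tt);
}

void main()
{
    string[] s = ["Hello", "world"];
    string[] t = ["Hello", "world", "world", "world"];

    // The raw score of the padded pair exceeds that of the identical pair (s, s).
    assert(gapWeightedSimilarity(s, t, 1.0) > gapWeightedSimilarity(s, s, 1.0));

    // The hand-normalized value agrees with the library function and stays below 1.
    double n = gapWeightedSimilarityNormalized(s, t, 1.0);
    assert(isClose(n, normalizedByHand(s, t, 1.0)));
    assert(n < 1.0);
}
```

Passing lambda as a floating-point literal (1.0) makes the deduced type F a double, so all intermediate values are computed in floating point.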
Select!(isFloatingPoint!F, F, double) gapWeightedSimilarityNormalized(alias comp, R1, R2, F)
(
  R1 s,
  R2 t,
  F lambda,
  F sSelfSim = F.init,
  F tSelfSim = F.init
)
if (isRandomAccessRange!R1 && hasLength!R1 && isRandomAccessRange!R2 && hasLength!R2);
The optional parameters sSelfSim and tSelfSim are meant for avoiding duplicate computation. Many applications may have already computed gapWeightedSimilarity(s, s, lambda) and/or gapWeightedSimilarity(t, t, lambda). In that case, they can be passed as sSelfSim and tSelfSim, respectively.
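One situation where the optional parameters may help is scoring a single query against several candidate ranges: the query's self-similarity can be computed once and passed along on every call. The following is a minimal sketch under that assumption; scoreAll is a hypothetical helper written for this example, not a Phobos function.

```d
import std.math : isClose;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;

// Hypothetical helper: normalized similarity of `query` against each entry of `docs`.
double[] scoreAll(string[] query, string[][] docs, double lambda)
{
    // Self-similarity of the query: computed once, reused for every document.
    double querySelf = gapWeightedSimilarity(query, query, lambda);

    double[] scores;
    foreach (doc; docs)
    {
        // In a real application this value might come from a cache as well.
        double docSelf = gapWeightedSimilarity(doc, doc, lambda);
        scores ~= gapWeightedSimilarityNormalized(query, doc, lambda, querySelf, docSelf);
    }
    return scores;
}

void main()
{
    string[] query = ["Hello", "new", "world"];
    string[][] docs = [
        ["Hello", "brave", "new", "world"],
        ["Hello", "world"],
    ];

    double[] scores = scoreAll(query, docs, 0.5);

    // Supplying the self-similarities should not change the result.
    foreach (i, doc; docs)
        assert(isClose(scores[i], gapWeightedSimilarityNormalized(query, doc, 0.5)));
}
```

The self-similarities must be computed with the same lambda as the cross-similarity; mixing values obtained with different lambda would not yield the normalized kernel.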
Example
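import std.math : isClose, sqrt;
import std.numeric : gapWeightedSimilarity, gapWeightedSimilarityNormalized;
import std.stdio : writeln;

string[] s = ["Hello", "brave", "new", "world"];
string[] t = ["Hello", "new", "world"];
writeln(gapWeightedSimilarity(s, s, 1)); // 15 (self-similarity of s)
writeln(gapWeightedSimilarity(t, t, 1)); // 7 (self-similarity of t)
writeln(gapWeightedSimilarity(s, t, 1)); // 7 (cross-similarity)
// Normalized value: cross-similarity over the geometric mean of the self-similarities.
assert(isClose(gapWeightedSimilarityNormalized(s, t, 1),
    7.0 / sqrt(15.0 * 7), 0.01));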
Authors
Andrei Alexandrescu, Don Clugston, Robert Jacques, Ilya Yaroshenko