#Optimization and correctness of algorithm
1 messages ยท Page 1 of 1 (latest)
What exactly is your question. Maybe explain what you are doing.
function JaroSimilarity(string1: string, string2: string): number {
let charArray1 = string1.split("");
let charArray2 = string2.split("");
const s1 = string1.length;
const s2 = string2.length;
const matching_window = Math.floor(Math.max(s1, s2) / 2) - 1;
const pointerArr = new Array(s1).fill(false);
let m = 0;
const a1: string[] = [];
const a2: string[] = [];
for (let i = 0; i < charArray2.length; i++) {
let maxIndex = Math.min(i + matching_window, charArray1.length);
let minIndex = Math.max(0, i - matching_window);
for (let j = minIndex; j <= maxIndex; j++) {
if (charArray2[i] === charArray1[j]) {
if (pointerArr[j] === false) {
m++;
let matchVal = charArray1[j];
a2.push(matchVal);
a1.push(matchVal);
pointerArr[j] = true;
} else {
break;
}
}
}
}
if (m === 0) return 0;
let transposition = getTranspositions(a1, a2);
let value = (1 / 3) * (m / s1 + m / s2 + (m - transposition) / m);
return value;
}
function getTranspositions(a2: string[], a1: string[]) {
let count = 0;
a2.forEach((e, i) => e !== a1[i] && count++);
return count / 2;
}
function JaroWinklerSimilarity(string1: string, string2: string, p_optimal = 0.1): number {
const p = Math.min(p_optimal, 0.25);
let simJ = JaroSimilarity(string1, string2);
let l = getPrefixLength(string1, string2);
return simJ + l * p * (1 - simJ);
}
function getPrefixLength(s1: string, s2: string) {
let a1 = s1.trim().split("");
let a2 = s2.trim().split("");
let c = 0;
for (let i = 0; i < Math.min(a1.length, a2.length, 4); i++) {
if (a1[i] === a2[i]) c++;
else break;
}
return c;
}
I wanna know if there are any optimisations i can make
Do you have ANY test cases?
First you have to know you are making the correct output. That means test cases.
Generally when I'd doing algorithm discovery and optimization I have test cases first. Then I do baseline JMH bench.
PointerArray seems like a BitSet.
fuzzy match... regex?
It's really not worth bothering to optimize if you don't have test coverage.
yeah im gonna write some tests in jest rn
umm regex wont really be optimal for that tbh if u have similar ones and need scoring
"sassafras"
I did one some time ago, i just split the search text, then join it on .* and regex match
feels hacky 
If it work, it work.
You compile it before looping, and it is pretty fast.
Only 5 lines
i think we can benchmark later 
public static boolean fuzzyMatch(String search, String text){
Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
return matcher.find();
}```
xD
why didnt u inline the return it is bothering me 
Because I would not use it like that
I would separate the compiled pattern from the loop that search
public static void loop() {
List<String> texts = new ArrayList<>();
var filteredList = fuzzyMatchFilter("asd", texts);
}
public static List<String> fuzzyMatchFilter(String search, List<String> texts) {
Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
return texts.stream().filter(text -> pattern.matcher(text).find()).toList();
}```
my final ```java
public static void loop() {
List<String> texts = new ArrayList<>();
var filteredList = fuzzyMatchFilter("asd", texts);
}
public static List<String> fuzzyMatchFilter(String search, List<String> texts) {
Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
List<String> filtered = new ArrayList<>();
for (var text : texts) {
if (pattern.matcher(text).find()) {
filtered.add(text);
}
}
return filtered;
}```
nice
I really do not like stream api performance.
9 time longer in regex?
ig urs is faster tbh
its mostly machine difference
but this gives a sorted array so u can rank and see lol
rank and see?
does it matche smiley?
wdym
lmao why would anyone search emojis lol unless its for a emoji picker tbh
besides form would have some validation
why? it is a good character
do your app not manage UTF-32?
i do not need to its mostly for sorting through a json schema of text values
that the form would have
well that is if i dont give up on the ui