#Optimization and correctness of algorithm

1 messages ยท Page 1 of 1 (latest)

vague crownBOT
#

<@&987246964494204979> please have a look, thanks.

modest hatch
#

What exactly is your question. Maybe explain what you are doing.

exotic edge
#
function JaroSimilarity(string1: string, string2: string): number {
  let charArray1 = string1.split("");
  let charArray2 = string2.split("");

  const s1 = string1.length;
  const s2 = string2.length;

  const matching_window = Math.floor(Math.max(s1, s2) / 2) - 1;
  const pointerArr = new Array(s1).fill(false);

  let m = 0;

  const a1: string[] = [];
  const a2: string[] = [];

  for (let i = 0; i < charArray2.length; i++) {
    let maxIndex = Math.min(i + matching_window, charArray1.length);
    let minIndex = Math.max(0, i - matching_window);

    for (let j = minIndex; j <= maxIndex; j++) {
      if (charArray2[i] === charArray1[j]) {
        if (pointerArr[j] === false) {
          m++;
          let matchVal = charArray1[j];
          a2.push(matchVal);
          a1.push(matchVal);
          pointerArr[j] = true;
        } else {
          break;
        }
      }
    }
  }

  if (m === 0) return 0;

  let transposition = getTranspositions(a1, a2);
  let value = (1 / 3) * (m / s1 + m / s2 + (m - transposition) / m);
  return value;
}

function getTranspositions(a2: string[], a1: string[]) {
  let count = 0;
  a2.forEach((e, i) => e !== a1[i] && count++);
  return count / 2;
}

function JaroWinklerSimilarity(string1: string, string2: string, p_optimal = 0.1): number {
  const p = Math.min(p_optimal, 0.25);
  let simJ = JaroSimilarity(string1, string2);
  let l = getPrefixLength(string1, string2);
  return simJ + l * p * (1 - simJ);
}

function getPrefixLength(s1: string, s2: string) {
  let a1 = s1.trim().split("");
  let a2 = s2.trim().split("");

  let c = 0;
  for (let i = 0; i < Math.min(a1.length, a2.length, 4); i++) {
    if (a1[i] === a2[i]) c++;
    else break;
  }
  return c;
}

I wanna know if there are any optimisations i can make

modest hatch
exotic edge
#

yes

#

its for fuzzy matching

modest hatch
#

Do you have ANY test cases?

#

First you have to know you are making the correct output. That means test cases.

#

Generally when I'd doing algorithm discovery and optimization I have test cases first. Then I do baseline JMH bench.

#

PointerArray seems like a BitSet.

limber viper
#

fuzzy match... regex?

modest hatch
#

It's really not worth bothering to optimize if you don't have test coverage.

exotic edge
exotic edge
modest hatch
#

"sassafras"

limber viper
exotic edge
#

feels hacky pepekek

limber viper
#

You compile it before looping, and it is pretty fast.

#

Only 5 lines

exotic edge
#

i think we can benchmark later pepekek

limber viper
#
  public static boolean fuzzyMatch(String search, String text){
    Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(text);
    return matcher.find();
  }```
#

xD

exotic edge
limber viper
#

Because I would not use it like that

#

I would separate the compiled pattern from the loop that search

limber viper
# exotic edge why didnt u inline the return it is bothering me <:pepekek:646439889797251082>
  public static void loop() {
    List<String> texts = new ArrayList<>();
    var filteredList = fuzzyMatchFilter("asd", texts);
  }

  public static List<String> fuzzyMatchFilter(String search, List<String> texts) {
    Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
    return texts.stream().filter(text -> pattern.matcher(text).find()).toList();
  }```
#

my final ```java
public static void loop() {
List<String> texts = new ArrayList<>();
var filteredList = fuzzyMatchFilter("asd", texts);
}

public static List<String> fuzzyMatchFilter(String search, List<String> texts) {
Pattern pattern = Pattern.compile(String.join(".*", search.split("")), Pattern.CASE_INSENSITIVE);
List<String> filtered = new ArrayList<>();
for (var text : texts) {
if (pattern.matcher(text).find()) {
filtered.add(text);
}
}
return filtered;
}```

exotic edge
#

nice

limber viper
#

I really do not like stream api performance.

exotic edge
#

@limber viper

#

perhaps similar language and machine would give better benchmarks

limber viper
#

9 time longer in regex?

exotic edge
#

ig urs is faster tbh

#

its mostly machine difference

#

but this gives a sorted array so u can rank and see lol

limber viper
#

rank and see?

exotic edge
#

also matches typos

#

thats good

limber viper
#

does it matche smiley?

exotic edge
#

wdym

limber viper
#

๐Ÿ˜„

#

Unicode emoticon

exotic edge
#

lmao why would anyone search emojis lol unless its for a emoji picker tbh

limber viper
#

\u+1F600

#

Well people can use them anytime

exotic edge
#

besides form would have some validation

limber viper
#

why? it is a good character

exotic edge
#

well to check for special unicode characters

limber viper
#

Just allow them

#

why not?

exotic edge
#

because options will only have text

#

lol

limber viper
#

smiley is text?

#

I do not get the problem.

exotic edge
#

never mind

#

its a special unicode character

limber viper
#

do your app not manage UTF-32?

exotic edge
#

i do not need to its mostly for sorting through a json schema of text values

#

that the form would have

#

well that is if i dont give up on the ui

limber viper
#

How do you manage india, japan, thailand? and any non ascii char?

#

Is the dream for dev not to have a worldwide app?

exotic edge
#

i dont the form has a fixed array of english options

#

the options are inherently english

#

because java annotations are english

#

๐Ÿ˜„