#String#replace vs compiled LITERAL pattern replaceAll vs combined pattern

blazing nebula
#

I made a class whose purpose is to substitute named placeholders in the format ${name} with values.
I implemented it 3 ways that I think are equivalent, and I'm wondering about potential differences in output and performance.

https://paste.myst.rs/3pp2wakt

First, substitute1 vs substitute2:
a. Can they output different results?
b. Does the performance differ much (for any input)?

Second, substitute3:
a. Can it output different results?
b. Is it actually more efficient? (I expect it to only traverse the string once, as opposed to once for each placeholder name)

gritty rampart
#

Hi!
Let's start:

  1. Both substitute1 and substitute2 aim for the same result, but they can behave differently in a few cases:
    1.1. substitute1 uses plain string replacement (String#replace), which treats both the placeholder and the value literally. Since it runs one pass per placeholder, a value inserted by an earlier pass gets scanned again by later passes. For example, if a value contains text that matches another placeholder, that text may be replaced too.
    1.2. substitute2 uses a regex compiled with the LITERAL flag, so the placeholder is matched literally as well, and it has the same one-pass-per-placeholder behavior. The replacement string is not literal, though: unless the value is passed through Matcher.quoteReplacement(...), a $ or \ in a value is treated as a group reference or escape by replaceAll, which can change the output or throw.
  2. Performance: it depends on the input and on how many placeholders there are. Regex matching generally has somewhat more overhead than plain string replacement (compiling the pattern, creating a matcher), but the difference is unlikely to matter unless the inputs are very large or the method is called very often.
#
  3. substitute3 uses namesPattern. In most cases it should produce the same results as substitute1 and substitute2 for the same input string and substitution map. However, if namesPattern doesn't match exactly the same placeholders that substitute1 and substitute2 replace (for example because of how the names are escaped), the results can differ.
    3.1. Performance: yes, substitute3 is potentially more efficient than substitute1 and substitute2 because it traverses the input string once, while the other two traverse it once per placeholder. The gain grows with the size of the input and the number of placeholders.
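
To make points 1.1 and 1.2 concrete, here is a minimal sketch of the two sequential approaches. The pasted code isn't reproduced in this thread, so the class and method names below are hypothetical stand-ins, not the original substitute1/substitute2.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequentialSubstitution {
    // Hypothetical stand-in for substitute1: one String#replace pass per placeholder.
    static String viaStringReplace(String input, Map<String, String> values) {
        String result = input;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Hypothetical stand-in for substitute2: one LITERAL pattern per placeholder.
    // quoteReplacement keeps '$' and '\' in the value from being read as
    // group references or escapes by replaceAll.
    static String viaLiteralPattern(String input, Map<String, String> values) {
        String result = input;
        for (Map.Entry<String, String> e : values.entrySet()) {
            Pattern p = Pattern.compile("${" + e.getKey() + "}", Pattern.LITERAL);
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("a", "${b}"); // a value that looks like another placeholder
        values.put("b", "$1");   // a value containing '$'
        // Both print "$1 $1": the ${b} inserted for ${a} is replaced again in the second pass.
        System.out.println(viaStringReplace("${a} ${b}", values));
        System.out.println(viaLiteralPattern("${a} ${b}", values));
    }
}
```

Without the Matcher.quoteReplacement(...) call, the value $1 would be read as a reference to capture group 1, which doesn't exist in a literal pattern, so replaceAll would throw instead of inserting the text.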
blazing nebula
#

One clarification: I expect substitution keys to never overlap because I throw if they contain ${} and then wrap them in those characters.

I did forget that the values might have substitution names.
Wouldn't substitute3 avoid modifying values that contain substitutions (desired behavior)?

gritty rampart
#

Oh, you're right. I overlooked that when I was writing.

And yes, substitute3 should indeed avoid modifying values that contain substitutions. The matcher built from namesPattern scans the original input once, and the values from the substitutions map are only written to the output, which is never scanned again. So a value that happens to contain ${name} is left as-is, as long as the replacement step doesn't interpret the $ itself (for example, wrap the value in Matcher.quoteReplacement(...) if you use appendReplacement).
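
A rough sketch of that single-pass idea, assuming namesPattern captures the placeholder name in group 1. The class and method names are made up for illustration and the pattern is hard-coded here; this is not the original substitute3.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassSubstitution {
    // Hypothetical stand-in for substitute3. namesPattern is assumed to match
    // only known placeholders and to capture the name in group 1.
    static String substitute(String input, Pattern namesPattern, Map<String, String> values) {
        Matcher m = namesPattern.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());  // text between matches, copied as-is
            out.append(values.get(m.group(1)));  // value goes to the output, never rescanned
            last = m.end();
        }
        out.append(input, last, input.length()); // tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("a", "${b}");
        values.put("b", "$1");
        Pattern namesPattern = Pattern.compile("\\$\\{(a|b)\\}");
        // Prints "${b} $1": the ${b} that came from the value of a is left alone.
        System.out.println(substitute("${a} ${b}", namesPattern, values));
    }
}
```

Appending with a plain StringBuilder sidesteps the $ and \ handling of appendReplacement entirely, so no quoteReplacement call is needed in this variant.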

blazing nebula
#

Great, thanks for the help!

Now I need to figure out how to properly escape the names when creating namesPattern;
currently it makes a pattern like this
\Q${\E(\Q${collection}\E|\Q${source_id}\E|\Q${source_type}\E|\Q${pos}\E|\Q${values}\E)
so I think Pattern.quote(...) isn't intended for escaping individual pieces of expressions like I'm trying to do.

#

oh no I just derped and built it from this.substitutions instead of the substitutions arg, so it might still work if I fix that

gritty rampart
#

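Yeah, Pattern.quote(...) is intended to turn an entire string into one literal. It can't be wrapped around a piece that still has regex syntax in it (like the | between the names), so quote each name on its own and then join the quoted pieces.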

blazing nebula
#

ah ok

gritty rampart
#

final StringBuilder namesPatternBuilder = new StringBuilder("\\$\\{").append("(");

I've changed the namesPatternBuilder line to start with \\$\\{ to correctly escape ${ since $ and { have special meanings in regex

#

This should generate a pattern that matches the placeholders ${collection}, ${source_id}, etc.

blazing nebula
#

I still need to escape the names themselves, don't I?

#

in case a name is .* or something

gritty rampart
#

Yeah

#

If you want to treat .* as a literal sequence and not as regex metacharacters, you can do something like this:

#
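String key = entry.getKey();
// Pattern.quote wraps the whole key in \Q...\E, so every metacharacter in it is literal
String quotedKey = Pattern.quote(key);
// Quote the plain text "${" (no extra backslash, or the pattern would require a literal \)
namesPatternBuilder.append(Pattern.quote("${"))
                   .append(quotedKey)
                   .append("\\}");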
blazing nebula
#

What do the \Q and \E actually do? I'm not familiar with them from js and python regex.
could I use them to make String escape(String arg) that returns a string representing a pattern that will match the arg literally?

gritty rampart
#

\Q and \E are used to mark a portion of the pattern that should be treated as literal text. Basically, they tell the regex engine not to interpret anything between those two markers as regex syntax.
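
A small demo of the effect, using only java.util.regex.Pattern (the class name QuoteDemo is just for illustration):

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    public static void main(String[] args) {
        // Unquoted, .* is "any character, any number of times".
        System.out.println(Pattern.matches(".*", "abc"));        // true
        // Between \Q and \E, the same two characters only match themselves.
        System.out.println(Pattern.matches("\\Q.*\\E", "abc"));  // false
        System.out.println(Pattern.matches("\\Q.*\\E", ".*"));   // true
        // Pattern.quote produces this \Q...\E form for you.
        System.out.println(Pattern.quote("a.b"));                // \Qa.b\E
    }
}
```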

#

And yeah, you can do this:

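  public static String escape(String input) {
        // A literal \E inside the input would end the quote early, so close the
        // quote, emit an escaped \E, and reopen it (the same trick Pattern.quote uses).
        return "\\Q" + input.replace("\\E", "\\E\\\\E\\Q") + "\\E";
    }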
blazing nebula
#

oh that's what Pattern.quote(...) is doing, and it seems to be working as expected since I've fixed some oversights
\Q${\E(\Qcollection\E|\Qsource_id\E|\Qsource_type\E|\Qpos\E|\Qvalues\E)}
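
For reference, a pattern of that shape can be built by quoting each name separately and joining the results with |. This is only a sketch: NamesPatternBuilder and build are hypothetical names, and unlike the pattern above it also quotes the closing }.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class NamesPatternBuilder {
    // Builds \Q${\E(\Qname1\E|\Qname2\E|...)\Q}\E with the name captured in group 1.
    static Pattern build(Collection<String> names) {
        String alternatives = names.stream()
                .map(Pattern::quote)              // each name becomes its own literal
                .collect(Collectors.joining("|"));
        return Pattern.compile(
                Pattern.quote("${") + "(" + alternatives + ")" + Pattern.quote("}"));
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("collection", "source_id", ".*"));
        System.out.println(p.pattern());
        System.out.println(p.matcher("${.*}").matches());        // true: the name .* matches literally
        System.out.println(p.matcher("${anything}").matches());  // false: .* is not a wildcard here
    }
}
```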

#

substitute3 still isn't quite working because of some issue with the substitutions map, but I think everything relevant to this question is working/answered

#

TYSM for the help!

pliant elmBOT
#

Closed the thread.