#Cross checking every permutation and combination of set of letters with dictionary of words

57 messages · Page 1 of 1 (latest)

trim breach
#

I'd like to use regex to do this, but don't have lots of experience outside of simple tasks. Would it be suitable or am I forced to make every possible combination and permutation and check like that.

e.g. "doog" -> dog, god, good, go, do
all of those should return as true

light breachBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

icy forge
#

If I understand right, you have a set of letters, and a list of words, and you want to find out which words in that list can be made with your set of letters

#

do you have to use all the letters, or can there be leftovers?

trim breach
#

yep

#

there can be leftovers

#

so combinations or permutations

#

e.g. "doog" -> dog, god, good, go, do

#

only one of these uses all 4, but all of them are okay

icy forge
#

I don't think regex is the right tool for the job, if I'm right you should just be able to use a counting solution

trim breach
#

i said this in general:
i have a file with all english words which ive now succesfully parsed
but i want to minimise the dictionary to only be suitable words
so, a simple start was only adding words that were <= amount of starting letters
but i thought i could use regex to contain it to only words that could be made from the letters
the alternative being i create every combination and permutation of the letters, then cross check the dictionary and the permutations then keeping matches only, but it seems a bit inefficient
i thought regex would be a neat solution
my problem is ive only used regex when the letters are in a straight line, not jumbledd

trim breach
#

i havent heard of this before

icy forge
#

so basically you want a function from a set of letters, and a list of strings, and filter that list for words that can be made from those letters

#

You should just be able to count how many times each letter appears in each word, and compare that to your set of letters

trim breach
#

a hashmap of occurences?

icy forge
#

for example, doog is

d: 1
g: 1
o: 2

and dog is

d: 1
g: 1
o: 1

so you should be able to make dog from doog

trim breach
#

or dictionary actually is it

#

dict and hashmap are the same things right?

icy forge
#

a dict is an abstract data structure, a hashmap is one particular implementation that you can use to implement a dict

trim breach
#

gotcha

#

i havent learnt about structures properly yet so my understanding is a bit hazy

#

i plan on implementing a hashtable later for the dictionary, as i think itll be a good opportunity

icy forge
#

a hashmap could work, map from char to int, but given that you are only looking at english words i.e. characters from a-z you could just use a fixed length array

#

so array of 26 ints, where array[0] is the number of times a appears, array[1] is the number of times b appears and so on

#

makes things easier to do in c since you're just using native types

solemn gazelle
trim breach
#

original[0] = 'a'?

#

like

#

it corresponds to letter a?

#

do i need to come up with some logic to fit it into the array?

icy forge
#

so if your set of letters was "abacaba"
your array would look like
{4, 2, 1, 0, 0 ....... 0, 0}

trim breach
#

do i use ascii - 91 or whatever?

icy forge
#

you'd have to write a loop to compute that array from the word

trim breach
#

could i do array[letter - 65]++;

#

is that allowed in c

icy forge
#

you can index the array like this array[letter - 'a']

#

yep, cause char is basically just a byte, and c lets you do arithmetic on it

#

tho if you wanted it to work with any possible character you'd have to do an array of 256 numbers

trim breach
#

but im only doing words, and i have something to check that the original set of letters only contains letters

#

so 26 should be okay

icy forge
#

as long as you only have lowercase letters a-z then you're fine, otherwise you're going to be indexing that array out of bounds

trim breach
#

so i think that should be okay

#

they want us to display things in uppercase unfortunately

icy forge
#

easiest way to avoid UB is to just check the index is in bounds before accessing

trim breach
#

i think it looks a bit jarring but oh well

#

UB?

#

whats ub

icy forge
#

undefined behaviour

light breachBOT
#
Undefined Behavior

Undefined behavior occurs when you violate rules specified by the language. For example: Reading uninitialized memory, performing out-of-bounds memory access, signed integer overflow, and race conditions.

Example: Read Indeterminate Val.
int i; // default init,
       // indeterminate value
while(i < 10) {
    printf("%d\n", i++);
}
Example: Out-of-Bounds Access
int arr[10];
for(int i = 0; i < 20; i++) {
    arr[i] = i; // [i] out of
                // bounds
}
Consequences of UB

Compilers are not required to provide warnings or errors about UB. Often it is undetectable at compile-time. Performing actions which are UB can render your entire program's behavior undefined, leading to anything from crashing to summoning Eldritch Abominations.

See Also

Sanitizers can help identify UB. For more info, look into address sanitizer (ASan) and undefined behavior sanitizer (UBSan). See !howto asan

trim breach
#

ahh okay

trim breach
#

am i doing something wrong?

#

its working!

#

ty all!