#algos-and-data-structs
1 messages · Page 130 of 1
Unless you mean machine-learning algorithms, in which case, maybe.
For learning ds and algo
Yep, for introductory algorithms and data structures, you won't need calculus.
We need calculus when
most of the time you probably wont need calculus and you will probably know when you need it
as mentioned calculus is needed to fully understand how some machine learning algorithms work, I can't think of anywhere else where you would need it rn
either way it wont hurt to know it though the more maths you know in general the better
how is searching for an element in a dictionary constant time complexity?
ohk
Assuming that there are negligible key collisions, when you query for an element in the dictionary, the corresponding element could be found by simply looking up the hash table backing the dictionary. Since the time required to look up the hash table is independent of the number of entries, the time complexity is constant.
btw speaking of constant look up I recently was told that python lists are actually dynamic arrays and so have a lookup of O(1) not O(N)
but when i searched why numpy arrays where so much faster
I got this
something doesnt seem right here and why would it be called a list if its not a linked list...
Indeed. However, Python lists are dynamic arrays of pointers to objects, so the objects themselves aren't stored contiguously (that's not possible, since lists can have arbitrary elements of arbitrary sizes).
that I understand its just an array of pointers correct?
yup
That, and the fact that NumPy arrays and their operations are implemented in C make them a lot faster
I can understand it being a bit slower however numpy arrays seem to be much much faster
Faster in what?
why wouldnt the same be true for python built ins?
look up
They aren't.
import numpy as np
arr = np.ones(1000)
lst = [1]*1000
%timeit arr[520]
%timeit lst[520]
179 ns ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
38.9 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Slower by a lot, in fact
in favor of lists, note.
The looping in Python is awfully slow
I didn't know that indexing Python lists is faster than NumPy though
numpy arrays have vectorized operations, though, which allows them to be often like a hundred times faster on stuff like elementwise addition, etc. That's where they shine. In just access, they actually end up slower (probably because of the interface between numpy's C functions and Python) - but accessing elements of arrays/lists isn't really a performance-critical thing.
I would expect python lists to be 2 times slower at look up
NumPy probably needs to create objects on the fly
Basically should contain raw data (not Python objects but ints)
ah yeah, that's probably the reason
every i32 of a numpy array needs to be wrapped in a new Python object
that's the time expenditure
ahh fair
well do you guys know why python calls it a "list"?
I just find that part to be confusing since it really is more of an array than a list
I personally don't associate "list" with linked lists.
For one, Java calls dynamic arrays ArrayLists.
hmm didnt know that either lol
Maybe because list can be extensible - array has fixed size
"vector" is a common name for dynamic arrays (C++ uses that, say, and Rust)
why not that, no idea
but it still is an array it just creates a new array when needing to grow/shrink
class vector(list): # fixing convention
pass
Perhaps it makes the language easier to understand by new programmers
yeah, it conflicts somewhat with the mathematical concept of a vector
idk I learnt c before python and I found it to be more straight forward
They're named after the list abstract data type, not linked lists. This is similar to the naming of Java's List interface and C#'s List<T>
hmm do you know of any language that actually uses linked lists then?
You can implement them in C/C++
I always assumed list refers to some type of linked lists
done that before which is why I actually thought this
in c that is
Linked lists are pretty useless (in their basic form, at least) for most applications (except training in DSA)
Python uses a linked list for collections.deque, but not a basic one - it's a linked list of arrays (so elements are stored in blocks which are arrays, and each array is a node of a linked list). That way, you still has the extensibility of linked lists, but only have to follow a pointer every block_size elements rather than every time, and also waste less memory.
im so happy this is whats used instead of normal linked lists since it was always on my mind when I learnt about them
java has one in its stdlib
well thx everyone cleared a few things up (still dont agree with the naming though)
something something haskell? 😉
where did all of you learn so much about the inner workings of python btw?
I noticed a lot of people on this server know how things are implemented
By reading CPython source code lol
is there some kind of book/course everyone takes or is it just knowledge accumulated over time
"the code is the best documentation" from authors of Hashcat 😂
do you have any suggestions for where I could start looking into it?
knowledge accumulated over time for me for sure, I don't read CPython sources for fun - only a few tiny parts 😅
All depends what you want to know... When I wanted to see details of some intern mechanisms then I tried to look at the source code for that 
Probably no one knows all details
Pick some small subset 👍
Should grow with time
will do when I have future questions thx 🙂
sometimes someone like nedbat casually drops a revelation like "Python ints are infinitely-sized and implemented as dynamic arrays of 30-bit integers" and I track down a source for that
I knew about the infinite size but didn't know it grew in increments of 30 bits when needed
I've actually discovered that one on my own, by measuring the performance of int operations depending on their value
there was a noticable ladder pattern, with steps at around 2**30, 2**60, and so on
but I didn't know why that was - turns out, because that's when length changes.
I should probably read the rest of this series one day.
https://rushter.com/blog/tags/cpython/
I need a list that is able to do a key-value lookup. Any tips?
To boil it down, I need a datastructure that is ordered and behaves very similarly to a list. But each item in this list has a unique value I want to do a O(1) lookup for, like an index for databases
Right now I am thinking of just subclassing list, but adding a dict attribute that I "replicate" with all items added
do you need to append to the list? modify the order after creation?
Not really no, that is not a use-case I need to support
you could just use a dict then, since they maintain their insertion order now
Hmm alright, what doesn't sit right with me is that it's more of a dict than list. But making it a dict subclass make more sense yeah
yeah you'd have to index it with .keys(), but yeah you could sugarcoat that with a subclass
I was thinking d[d.keys()[0]], but yours makes more sense :p
Hey question about trees, I wrote a function to copy a tree into a dictionary and I'm wondering if this is a good way to do it, or if there is an easier way that I didn't think of. This seems to work, but I just kind of whipped it out and I don't really know if I can explain exactly how it works, so I'm open to a simpler idea. py def tree_to_dict(curr_node: TreeNode, _dict=None, _copy: dict = None) -> dict: """Reads a tree into a dictionary""" if not _copy: # this runs once at start of copy _dict = dict() _copy = _dict _dict[curr_node] = dict() # add node and node child dict to current dict _dict = _dict[curr_node] # set current dict to node child dict # repeat for each child recursively, passing in curretn dict for child in curr_node.children: tree_to_dict(child, _dict, _copy) return _copy Output of program, the list of names is generated as the web-crawler creates the tree, and the dictionary is the return of tree_to_dict() after it copies the tree. ```txt
1_USGS.gov |
2_USGS Libra
3_Resources
... snip ...
this is the output
{1_USGS.gov |: {
2_USGS Libra: {
3_Resources : {}, 3_America th: {}},
2_Overview -: {
3_U.S. Board: {}, 3_USGS - You: {}
}}}
>>> d.items()[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object is not subscriptable
Uh oh ;-;
mm shit
So I either need to do ```py
counter = 0
for item in d.items():
if counter == index:
return item
counter += 1
Or exhaust the `dict_items` object: ```py
return list(d.items())[index]
Any better option, because those are the only ones I can come up with
well, there's collections.namedtuple
Im just curious why would you do this?
I am making a script that, given a url, goes to some of the url's on that page and then on those pages, goes to some of their url's, and so on recursively. As the crawler moves around, it is generating a tree structure with nodes that have a one parent to many children relationship. I want to then store that tree in a dictionary so that the urls extracted from the crawl can be saved to JSON in a nested dict format
ahhh fair and then I guess you convert it back into a tree afterwards?
Since I can only save strings to the JSON, not the actual object, the actual tree is lost, but the data from the nodes, and the structure, is saved. Hmm maybe I can just pickle the whole tree though, I am not sure how that works but I could try it out
I think that could work but I've heard to be careful with pickle since its not safe?
Ooh yeah I'm not sure about the security, but it isn't really important for me atm, as I'm just messing around
But you're right that would probably be a consideration if it were business logic
she's fookin purrin as it is though
lol well if it aint broke dont fix it ygm
haha yeah I'm gonna put it down for today I think
can someone help me in how to find the minimum spanning tree using prims algorithm ? for double integer and the input must be read and showing the output in the console?
this might be a dumb question lol but what is the main difference when iterating through a for loop using enumerate and range(len(listHere))?
I know enumerate is used when you need keep a count of something
range(len(listHere)) will just give you the indices. enumerate(listHere) will give you the indices paired with the elements.
!eval ```py
mylist = ['a', 'b', 'c']
print(list(range(len(mylist))))
print(list(enumerate(mylist)))
@keen hearth :white_check_mark: Your eval job has completed with return code 0.
001 | [0, 1, 2]
002 | [(0, 'a'), (1, 'b'), (2, 'c')]
So you can do: ```py
for i, x in enumerate(xs):
...
Need help with discounts in python
Python is free
How is that related to algorithms and data structures?
for questions like this opening up a help channel #❓|how-to-get-help probably makes more sense
Hi, I had a question, if sets can store only non-mutable types, how can it store an object of the below class?
# class ListNode:
# def __init__(self, x):
# self.val = x
# self.next = None```
We can at any point change values of an obj by obj.val = some_value, so objs are mutable, right?
The hash value should be immutable
With user defined classes, it uses the object’s memory address to get a hash value
oh, okay, thanks
Also it's worth remembering that if you ever define __eq__ for your class, then __hash__ will become undefined, and you will need to define __hash__ yourself if you want objects of that class to be used in sets or as dictionary keys. The __hash__ function must be compatible with __eq__, so that if x == y, then hash(x) == hash(y).
Yeah
You’ll only be able to find the exact object that’s in there
That’s what happens if you do this:py class list_(list): pass
You can only find the exact list_ object in the set
so if we create another list class where the hash is computed using the list items and make it so that the equality checks for if the items are the same in 2 lists then we could use them as dict keys?
yes
you'll still run into
L = HashableList()
d = {L: 10}
L.append(15)
``` this being a problem
well if I change the list I wouldnt want it to have the same hash anyway in this particular instance
just make sure that if two lists are the "same" they will both have the same hash
yeah, but the hash table doesn't recompute the hashes
it can't know you've changed it
The problem is that suddenly, the hash in the table won't be actual
the table doesnt actually store the hashes though does it?
hmm ok maybe the implementation of dicts in python is different from what I thought
no matter the implementation, that'd be a problem unless the table can somehow detect when an object in it changes
because when a hash changes, that means that item needs to be put into a different bucket to be locatable.
the way I know hash tables to work is that they use some kind of hash function to figure out where things should go in the table
but dont actually store the hashes
they don't have to
Some implementations, yes - but even in them, the item needs to be in a very specific location depending on its hash, or the table won't work right (won't find the item).
I just realised that since we can get the keys from the table then it makes sense to store the keys aswell...
never really thought of it like that lol
my idea is that if you change the key then it refers to something else but the key will still be the original list that we gave for that specific value
I see why it wont work with python dicts now though since it also stores the key
Basically, you can use anything you want as a dictionary key as long as you implement __hash__, but you should take care that you don't mutate those keys. You're still responsible for maintaining the invariants, even if Python doesn't force you.
I don't think it would work with any implementation of a hash table, simply because different key -> different hash -> needs to be in a different location in the table. This is true whether or not you explicitly store the key.
yes however what I mean is that the if I change the list then it isnt the key I want anyway
but the dictionary can't know that you've mutated the key
the value is now stored in a spot the key doesn't map to
so if you try to access it with your new key, it won't work
so if I have list1 and lst2 as two different but equal lists in that they both have the same items they can both be used as keys for my value if I change lst1 now only lst2 will work
and If I later make a lst3 thats equivalent to lst2 it can be used aswell
the whole point I wont access that same value with the new key
ok, but now you just have a random value floating around
but the same key can still be used on it
lets say the key is
[0, 1]
I can always use that to access it
you mean your lists would be copy-on-write?
not sure what you mean by that
!e
from dataclasses import dataclass
@dataclass
class A:
lst: list
def __hash__(self):
return hash(tuple(self.lst))
def __eq__(self, other):
return self.lst == other.lst
st = set()
a = A([1,2,3])
st.add(a)
print(A([1,2,3]) in st)
a.lst.append(4)
print(A([1,2,3,4]) in st)
@vocal gorge :white_check_mark: Your eval job has completed with return code 0.
001 | True
002 | False
Here, the set does have an A([1,2,3,4]), but it's not found, because it's still in the location of the hash table where it was when it was an A([1,2,3]).
hm, but he says that's desired
^^
that sounds really evil tbh
this works pretty much like I wanted it to
but when your table resizes, they'll all be recomputed, and you'll have so many collisions
not saying its useful I just wanted to know if its possible
I don't know if hash is called again on table resize, actually
when you get the keys from a dict does it use pointers to the items you used initially to give you the keys or does it reconstruct them based on the stored hash?
well that would still be fine since the idea is that the hash shouldnt change but the question I just asked above is what might cause issues I think
It just iterates over the hash table, giving you the keys of all the nonempty buckets.
(the table holds references to all the keys and values)
but I mean how does it give you the keys
ahhh well that might cause issues then
it can't reconstruct them, a hash loses information
since if I change the list then its hash changes and now the user cant access the value using the key it thinks has the hash for that value
the hash shouldn't change, i think
hmm if the table made new objects that only have refrences in the table then it would solve this issue but that would be very inefficient
i'm not sure what you mean by that
that would limit dict keys to only objects that can be cloned.
and in Python, there's no builtin interface for being cloneable
well it happens with strings but in reverse
where if a string changes it creates a new string with a new pointer and discards the old refrence so it doesnt change what the dict is refrencing
well I tested it and the error wasnt what I expected lol
from dataclasses import dataclass
@dataclass
class A:
lst: list
def __hash__(self):
return hash(tuple(self.lst))
def __eq__(self, other):
return self.lst == other.lst
my_dict = {"test": "test_value"}
a = A([1,2,3])
my_dict[a] = "value for key [1, 2, 3]"
print(my_dict[A([1,2,3])])
# output: value for key [1, 2, 3]
print(my_dict)
# output: {'test': 'test_value', A(lst=[1, 2, 3]): 'value for key [1, 2, 3]'}
a.lst.append(4)
print(my_dict)
# output: {'test': 'test_value', A(lst=[1, 2, 3, 4]): 'value for key [1, 2, 3]'}
[print(my_dict[key]) for key in my_dict] # doesnt work, as expected
print(my_dict[a]) # desont work, as expected
print(my_dict[A([1,2,3])]) # doesnt work either anymore???
should i move this convo to esoteric python btw?
eh, it's algorithmic enough
When you do my_dict[A([1,2,3])], it calculates the hash of that key, finds the buckets where keys with that hash should be, then searches (using __eq__) for a key equal to the target one among them.
Since there aren't any such keys, you get a KeyError.
python uses open addressing so they aren't buckets
ahh because of collisions it has to search the bucket with the key again which causes this issue
(it's totally fine that there's another, non-equal key with the same hash there - that's just a hash collision as far as the table is concerned, happens all the time)
ok this makes more sense now so the value is pretty much inaccessible unless I change my eq so that does [1, 2, 3] is the same as [1, 2, 3, 4]
doing that would make it accessible by [1, 2, 3] after I change a but not by a since that searches in a different bucket
yup that worked
from dataclasses import dataclass
@dataclass
class A:
lst: list
def __hash__(self):
return hash(tuple(self.lst))
def __eq__(self, other):
return self.lst[0] == other.lst[0]
my_dict = {"test": "test_value"}
a = A([1,2,3])
my_dict[a] = "value for key [1, 2, 3]"
print(my_dict[A([1,2,3])])
# output: value for key [1, 2, 3]
print(my_dict)
# output: {'test': 'test_value', A(lst=[1, 2, 3]): 'value for key [1, 2, 3]'}
a.lst.append(4)
print(my_dict)
# output: {'test': 'test_value', A(lst=[1, 2, 3, 4]): 'value for key [1, 2, 3]'}
print(my_dict[A([1,2,3])]) # works now
# output: value for key [1, 2, 3]
[print(my_dict[key]) for key in my_dict] # doesnt work, as expected
print(my_dict[a]) # desont work, as expected
this ruins the eq function but allows it work as I wanted originally with hash tables assuming that no 2 tuples can have the same hash (I think they can so in rare cases it wont work as intended)
to make it work 100% the way I wanted without runing eq I would have to make a dict class as well that stores deep clones of keys instead of actual keys or only set deep clones of objects with other refrences as keys so that they wont change correct?
hmm although even then the key could be changed by referencing from the dict so really the only way to make this work is to make the list immutable which now makes more sense as to why python chooses to only allow immutable keys since there will be an issue on way or another if keys are mutable
@feral hound keys can be mutable, but
- a hash of an object must always stay the same
- if
a == bthenhash(a) == hash(b)
For example, classes are mutable, but they are hashable. Same with all custom object by default.
^^ its gona be hard to write a tldr for this lol
but this was already discussed and why it doesnt work if the key mutates later on violating rule 1 if the hash is based on the contents of the object
have a look at this
then this
the only way i can see that allows for truly mutable keys is to store deep copies as keys and return deep copies of the keys when user asks for them
but that would make it a lot slower for insertion and retrieval of keys
and would take up more memory
You only want to be using lightweight objects as keys anyway, like if you find yourself wanting to do anything too strange with them, maybe there are bigger picture design questions you need to rethink
100% this question isnt about how I could actually use this IRL but more of a hypothetical is it possible, to learn more about how things actually work
Right, so a dictionary is a hashtable, basically. Not quite exactly, since I guess it maintains insertion order now.
about that how does it even maintain insertion order?
I don't know how it maintains insertion order and still offers constant-time lookup, maybe somebody else knows
I can only think of it doing that by storing the insertion order for a key value as well and then sorting all the keys by that order
would still allow for constant time lookup if using the key
but would make iterating over the keys nlogn
I've never written code that even relies on it preserving insertion order, that's a pretty unexpected feature to me 😛
the comments in dictobject.c explain it
unless it stores a linked list with the order it wouldnt need to sort
will have a look rq
well that was a lot simpler than I thought lol
btw hashes are stored so they dont have to be recomputed for a resize correct?
probably
well I learned a lot today thx everyone 🙂
yeah, me too, now I know how dictionaries preserve insertion order 🙂
this explains how memory inefficient the hashmap structuring is
fix error had passed a code explaining how it works
i have a grid of blocks. some float to the top others fall to the bottom. How do i update the grid so that both types of blocks animate uniformly
I you have any questions just ask.
I think this question is better suited for a help channel #❓|how-to-get-help
why?
this chat is for algorithms and data structures
this isnt strictly animation. its an algorithm for how i should update a grid of blocks so that they animate uniformly. well i mean right now its mostly animation but at its core its still an algorithm
ok so let me try to exaplain
do you know which one?
yeah im on it
I had interesting math homework about arithmetic progressions, one exercise was cool enough for me to try to compare to looping in Python.
Guess who won. Spoilers: ||Math won, and by, considerably, a lot||.
I checked the byte code generated from the two functions, analyze()'s was shorter by 12 lines.
If anyone can help me optimize either, feel free.
If it's unclear, pos is the position of the first positive number in the sequence, numbers is the number of positive numbers in the sequence, smallest_divisible and biggest_divisible are the smallest and biggest positive numbers divisible by n in the sequence.
All of which, except for pos, were the answers for the more complex parts of an exercise.
!e
r = range(5, 51, 15)
print(r.start, r.stop, r.step)
@fervent saddle :white_check_mark: Your eval job has completed with return code 0.
5 51 15
hello.
im a starter here can i get the basic tutorials of algos and data structures
Hi, I have an array of values, how can I divide them into ranges where the values approximately increase/decrease/are equal? Is there any existing function in scipy/numpy that can solve it? That's what I want to achieve, red 1 annotates increasing values, 0 equal, -1 decreasing
idk about any library that does this but you could achieve it with a simple for loop
Yeah, that's how I'm doing it right now. But I'm trying to find existing solution to that
fair maybe someone else knows then gl
Use np.diff and then np.sign
I'm not sure it's possible to obtain a good-looking graph that easily.
Because the keyword is "approximately". If you just look at the sign of the first difference, you'll have intervals as short as 1 point.
Wait you want the inverse of that
and you'll basically never have 0, because that'd require two consequtive points to be equal.
you just want to round it a bit
divide by 3 and do np.trunc
then [-3,3] becomes 0 and you're golden
It would actually be easier if the question was framed as "integrate a function and plot it"
where 3 is something sensible you choose
huh?
@flat sorrel should be differentiate not integrate since he wants to know if dx/dy is positive or negative and he also doesnt care about the value just the sign
but that makes it more complicated anyway since he can just do this discretely using a for loop
nah, that won't work either
basically, my point is, if you want to get a good-looking graph, you can't look at just the consequtive points
you need to consider the average slope over several points, at the very least
he has to both round and use some sample rate to make it more approximate
oh average could work too
but it looks smooth
could use a weighted average with some threshold
it wouldn't work in the sense that if you have a line that smoothly grows from −99999 to 99999 it would show as 0 everywhere
depends on if that's fine basically
true but for his case I dont think theres much he can do about that
just fine tune the parameters until it gives what he wants
exactly
I do think however hes trying to find points where dx/dy is over or under some threshold
so that might be desirable if the slope is small enough
whoops just realised I meant to type dy/dx all those times not dx/dy
Sorry, was outside so couldn't respond properly.
Basically, OP is asking to find the values of a function given the sign of its derivative
...they are?
I read it as generating the kinda-sorta-derivative-plot(but smoothened) from a function
That's what I gather from the question, they want the function to increase/decrease/keep constant according to the red line, no?
Otherwise,
This should work
!e
import numpy as np
y = np.random.randint(0, 10, size=20)
print(y)
sign = np.sign(np.diff(y, prepend=y[0]))
print(sign)
@flat sorrel :white_check_mark: Your eval job has completed with return code 0.
001 | [7 5 5 4 3 5 8 4 0 1 5 9 4 2 1 8 9 5 1 0]
002 | [ 0 -1 0 -1 -1 1 1 -1 -1 1 1 1 -1 -1 -1 1 1 -1 -1 -1]
I think it's important to know what "approximately flat" means before you can really come up with a sensible way to do this.
To reduce noise, they could smoothen the graph however they want before passing it to np.diff
Sure, but it will almost never be exactly flat
Yeah, probably they need to add some sort of threshold for that
Exactly, so now you need to know what scale to set the threshold at
Is it an absolute scale? Is it relative to the height of the peaks? etc.
here's how unsmoothened looks like on even slightly spiky data:
import numpy as np
import matplotlib.pyplot as plt
X = np.linspace(-1,1,1000)
Y = (X + np.random.normal(size=X.shape,loc=0.0,scale=0.001))**2
plt.figure()
plt.plot(X,Y)
plt.plot(X,np.sign(np.diff(Y,prepend=[0])))
plt.show()
oof
Oh, didn't know the bot also had matplotlib
it doesn't, that's not a !e 😛
to smoothen it, you'd have to, uhh...
You'd have to do something that's equivalent to what we're trying to do, probably
!d scipy.signal.savgol_filter
scipy.signal.savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=- 1, mode='interp', cval=0.0)```
Apply a Savitzky-Golay filter to an array.
This is a 1-D filter. If *x* has dimension greater than 1, *axis* determines the axis along which the filter is applied.
Try this
Sounds nice. Honestly this question has more of a #data-science-and-ml flavor 🙂
there
def diff_average(arr, n=1):
return arr[n:] - arr[:-n]
n=10
plt.figure()
plt.plot(X,Y)
plt.plot(X,np.sign([0]*n+list(diff_average(Y,n))))
plt.show()
that's already a lot better, although proper filters named after smart people are probably a better choice
There are loads of ways you can define filters, in this case you essentially want to throw out high-frequency data. so there's something you can do with Fourier analysis, although that is probably not the smart way either.
(Fourier analysis will use future points, so if this is time series data, it will bias you in funny ways)
hmm, lemme try how well that works
Your parabola is very well approximated by a single cosine, so I'm guessing Fourier is going to look unreasonably good on this example 🙂
yup, leaving only the 20% lowest-frequency components makes it very smooth:
vals = np.fft.fft(Y)
freqs = np.fft.fftfreq(X.shape[-1])
vals[abs(freqs)>np.percentile(abs(freqs),20)] = 0
re_Y = np.fft.ifft(vals)
sign = np.sign(np.diff(re_Y,prepend=0))
plt.figure()
plt.plot(X,Y)
plt.plot(X,re_Y)
plt.plot(X,sign)
plt.show()
(that 0 at the beginning won't be there if one does the diff properly; I just can't be bothered. Besides it, it's perfect)
smoott
ah, so if you zoom in near (0, 0), you'll see something interesting
hmm, what?
since the curve is nearly horizontal there, you may see small trends in the noise
Which gets back to what I was saying about the scales involved. Do I want to be able to zoom in and see trends?
I don't actually see the trends in the middle since due to how I'm adding the noise (to the X, before applying the function), the magnitude of it is lowest in the middle - but good point anyway
ah, I see
But in any case, the "correct" way to remove noise probably depends on what sort of data we're measuring
Implement the state minimization algorithm in Python. The program should take a finite automata M=(Q, Alphabet, Transition Table, Initial state, set of final states) as input. The output should give the minimum equivalence state partition and the corresponding minimum state transition table. How do i do this? Can anyone help me ?
Sorry for not replying. This is the best effect I could achive using savgol_filter and np.sign with threshold. This is the code: https://pastebin.com/raw/s4MtD06Y It's far from perfect but I can't get any better. The problem with this approach is that using savgol_filter I lose exact position of peaks and there are short flat orange lines
import numpy as np
while True:
derivative_of_loss_function = (y - current_theta * x).dot(-x)
new_theta = current_theta - learning_rate * derivative_of_loss_function
current_theta = new_theta
loss_function = np.sum(1/2 * (y - current_theta * x)**2)
if loss_function < .01:
break
else:
pass
anyone see an error in the gradient descent algo?
my theta is blowing up and im not sure why
looks like np is not defined in your code
I have this code
def gap(a,b):
return abs(ord(a)-ord(b))
def func(s,k):
temp,max="",""
for i in range(len(s)-1):
temp+=s[i]
if gap(s[i],s[i+1])>k:
max=max if len(max)>len(temp) else temp
temp=""
return max```
my question is how do I append the last character of s to s?
for a general non-empty string you can do s += s[-1]
I would s += s[-1] right below temp=""?
possibly, though I'm not sure what that function is doing
my code is finding the longest k interspace substring
this is what am trying to solve @fiery cosmos
I tried it for some reason its still not working 😩
I see. Though why would you need to modify s? I think you are on to the right strategy with looping over and updating the max as you go along. I'm coding up a solution, one sec
when I try apple and 25, my output comes out empty for some reason
wouldn't 25 mean the only possible substrings are az...
oh, it says at most, my bad
!e I think this works. I used two ints for index and length rather than remember the max string, then I slice at the end ```py
def interspace(a, b):
return abs(ord(a) - ord(b))
def longest_k_interspace_substring(s, k): # returns length and first longest substring
if not s:
return 0, s
index = 0 # index of the start of the last k-interspace substring
longest_index, longest_length = 0, 1
for i in range(1, len(s)):
if interspace(s[i - 1], s[i]) > k:
if i - index > longest_length:
longest_length = i - index
longest_index = index
index = i
return longest_length, s[longest_index:longest_index + longest_length]
print(longest_k_interspace_substring('', 1)) # 0
print(longest_k_interspace_substring('a', 0)) # 1
print(longest_k_interspace_substring('wedding', 0)) # 2
print(longest_k_interspace_substring("ababbacaabbbb", 1)) # 6
print(longest_k_interspace_substring("apple", 25)) # 1```
@fiery cosmos :white_check_mark: Your eval job has completed with return code 0.
001 | (0, '')
002 | (1, 'a')
003 | (2, 'dd')
004 | (6, 'ababba')
005 | (1, 'a')
it says the interspace should be at most k
yeah, now I'm confused about the apple example 
it should be the entire string "apple"
that breaks other test cases
the problem with "apple" is there's never a trigger to update longest_length, yeah
Just put it outside the for loop as an extra case
Well or add a dummy character to the end of s
I think a little flag works where you just return the whole string if you don't find issues? Also gets rid of the empty string check
!e ```py
def interspace(a, b):
return abs(ord(a) - ord(b))
def longest_k_interspace_substring(s, k): # returns length and first longest substring
index = 0 # index of the start of the last k-interspace substring
longest_index, longest_length = 0, 1
found_too_much_space = False
for i in range(1, len(s)):
if interspace(s[i - 1], s[i]) > k:
found_too_much_space = True
if i - index > longest_length:
longest_length = i - index
longest_index = index
index = i
if not found_too_much_space:
return len(s), s
return longest_length, s[longest_index:longest_index + longest_length]
print(longest_k_interspace_substring('', 1)) # 0
print(longest_k_interspace_substring('a', 0)) # 1
print(longest_k_interspace_substring('wedding', 0)) # 2
print(longest_k_interspace_substring("ababbacaabbbb", 1)) # 6
print(longest_k_interspace_substring("apple", 25)) # 1
@fiery cosmos :white_check_mark: Your eval job has completed with return code 0.
001 | (0, '')
002 | (1, 'a')
003 | (2, 'dd')
004 | (6, 'ababba')
005 | (5, 'apple')
Though I'm not 100% certain 
no worry guys 😩 , already submitted. only passed 7/15 test cases with my solution
it had the label of being an easy question but I think it should have been labeled medium. Its funny because the other question was medium but I solved it fairly quick
Your code is extremely slow due to you assigning strings instead of indexes
but after seeing your solution @fiery cosmos , makes sense
Even if it's correct it'd take too long to run
Well it'll depend on the exact testcases. But yeah, that's why I wanted to avoid copying the string.
But I think you had the right idea, almighty, which is a good start 
would you consider that an "easy" question lol
I would
bordering on medium
😩
Well depends on how hard medium problems are 🙂
This might seem hard for you because you aren't familiar enough with strings ig
def peakElement(self,arr, n):
# Code here
return arr.index(max(arr))
is this right method to solve interview questions?
or i have to right more elaborative ish code even tho im using python
can someone pls explain more on it from interview perspective
They might want you to show that you understand a peak element doesn't have to be the max
it is first question for level 1 section already going above my head i feel so dumb
I think this is one of those binary search-like problems. It requires your to split the array into sub arrays left and right.
Iterate from element 1
but what if it is unsorted array
That’s fine.
Iterate from element 1. Check edge case where element 0 > element 1, if so then element 0 is a peak element. Else compare index 0 of right sub array with element 1
You need to use something like arr[2:][0]
is there any performance benefits with slicing?
slicing copies the part of the list, so it's pretty slow (compared to just iterating from that index) on big lists
but often you don't care about such speedups, only the asymptotic performance
Please explain the graph data structure and how to represent it?
there's a ton of ways - a common and nice one is a dictionary mapping a vertex's index to a list of indexes of all vertices that this vertex is connected to.
e.g. two points connected with an undirected edge would be
{0: [1], 1: [0]}
thanks
Hi,
I want to find distance between different object, from an arial view (captured by drone). Can anyone tell me where I can start my research? How can I achieve this?
Can you help me why the isin doesn't work?
Looking at the error, you're not running the code you're showing.
the rt is working but I want to simplify as the filter2. Can anyone tell me what is wrong with it?
You're doing .product, which is reserved for the product function on dataframes, and so refers to that instead of the ["product"] column.
I tried but as you can see, still doesn't work both
in this case find only Gorilla
and nothing from Poly
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
according to this, you can't pass two search queries. You can pass a regex to match, though.
thans
what do yo uwant to do?
a = 5 if 2 < 6 else 4?
oh, okay
got it
a_index+=1 if a[a_index]<b[b_index] else b_index+=1
How to make this work?
you can't do that
in general you cannot programmatically select variables
or rather, you should not
so I'll have to work with only one variable?
So, is there a way?
why dict?
what do you mean why dict
a is an array in this case
variables are basically stored in a dict that is kind-of internal to the interpreter
Oh, okay
To make the code shorter
you probably
care way too much
about code length
and way too little
about readability and maintainability
harsh truths hurt
if a[a_index]<b[b_index]:
a_index+=1
else:
b_index+=1```
This was what I was tryna shorten
not really sure why you need to shorten that...
also...
please add spaces between your operators
that hurts my eyes a bit
lmao
there's a thing called PEP8
you should probably check it out
it helps if your code largely complies to the standards most people follow
of course, if you only ever write code for yourself, you can do whatever you want
cool
hey
im trying to find the mode of a list
with numbers
but if there are equal amount of elements i want to print the higher number
but it always prints the lower one
is there anything i can do?
which approach are you using currently to find the mode?
multimode will find all modes, then you can take the max of them
>>> from statistics import multimode
>>>
>>> nums = [1, 2, 2, 3, 3, 4]
>>> modes = multimode(nums)
>>> modes
[2, 3]
>>> max(modes)
3
if you're using statistics.mode then according to the docs it returns the first encountered mode, so it likely depends on ordering
>>> from statistics import mode
>>> mode([2, 2, 3, 3])
2
>>> mode([3, 3, 2, 2])
3
#help-cake help please selection sort
Might be in the wrong chat-channel but, is there a good tutorial on how to work with python with multiple files?
Most tutorials and courses i saw, handle most of the information in a single .py or insite a jupyter notebook.
RN i am working on a Flask-api and i have a hard time to figure out how to split my code in smaller files without running in a circular import error..
i usually follow a tree system
as in the main file imports from secondary files and the secondary file imports from tertiary, etc.
but a tertiary file never imports from a secondary one, etc.
the primary one can also import from tertiary files as well
basically imports only go one way
this is computing fib(n+1) if that helps you
also I looked through a bunch of the solutions on there and they pretty much all did the same thing so as long as you get time complexity of O(N) and space complexity of O(1) your good
have you checked out the pins?
hi.
I have a forloop, where
for x in range(0, n):
//does n % 26 checks
what would be the runtime?
would doing n % 26 checks technically be O(1) because we know the behavior can't grow faster than a certain rate?
that was worded really badly but
1 + 2 + 3 + 4 + ... + (n - 1) + n
would be n^2,
but in this case it is
1 + 2 + 3 + 4 + 5 ... + 25 + 26 + 1 + 2 + 3 + 4 + ... + 25 + 26 + 1 + 2 + ... (n times)
thats O(n)
if 26 is constant
It’s O(n) pretty much for the reason you said, OP
Hello, check out the resources linked in the pinned messages of this channel.
I need help with this algorithm problem. I already submitted my answer:
`
Tile Arranger
Programming challenge description:
Write an algorithm to determine whether a collection of tiles can be rearranged to form a given word. Each tile has 1..N letters. You do not have to use each tile, but you cannot use any tile more than once. There can be several identical tiles.
You may assume len(word) >= 1 and size(tiles) >= 1
Input:
Newline separated list of strings provided through STDIN
<word>
<tile 0>
<tile 1>
…
<tile n>
Output:
"true" or "false"
Does anyone know how to do this? It seems like a DP problem(edited)
[12:53 PM]
Test 1
Input:
foobarbaz
foob
foo
ba
ba
r
z
Expected Output
true
Test 2
Input:
foobarbaz
foo
fooz
ba
r
z
Expected Output
false
Test 3
Input:
catsanddogs
ddogs
cat
san
Expected Output
true
similar to Leetcode 139. Word Break
`
Thank you
I don't even know how to approach this problem
Is there a limit on number of tiles
in a grid we can move 'N', 'S', 'W', 'E'. Each move is either +1 or -1 on respected axis x or y.
My task is to check positions that have been visited more than once, including 0, 0.
so, 'NS' would return 1, 'WEWNES' would return 2.
for some reason I find this harder than it should be. I think I just went down the wrong rabbit hole with this...
what have you tried?
Well, my initial approach was something like:
since it's given to me in line by line,
I was going to check the word with the tile and remove the letters if they were found
and keep iterating through the list
but that doesn't work
see example 2
I mean example 1
Well you can use a set to save all visited positions
But there must be a limit on the number of tiles and the length
If it's small you could just loop over each combination
well i made a 2d grid with comprehension. that is the size of my moves and I was thinking to keep checking each index in the list of lists if they have bee nvisited more than once
`
import sys
for line in sys.stdin:
print(line, end="'
`
import sys
for line in sys.stdin:
print(line, end="')
`
this is what I was given in the code editor
What do you mean by loop over each combination?
like make a word from all the tiles?
and manually check
Yeah I mean if it's small enough then you can just check every possible ways
so I even had trouble with the formatting because I'm not too familiar with sys.stdin.
`
import sys
for line in sys.stdin:
print(line, end="')
`
I did something like
`
list = []
for line in sys.stdin:
list += line
word = list.pop(0)
`
would something like this work?
you can list.append(line)
Hello there, how would a go about finding overlapping rectangles in a list/array efficiently, by a given threshold criteria? (e.g. Intersection over union score?)
ok i did it
I checked and it passes all three test cases
I had to change the code to permutations rather than combinations
import sys
from itertools import permutations
list = []
for line in sys.stdin:
list.append(line)
word = list.pop(0)
subsets = []
def add_powerset(list):
for i in range(0, len(list) + 1):
for element in permutations(list, i):
subsets.append(''.join(element))
add_powerset(list)
flag = 'false'
for subset in subsets[1:]:
if subset == word:
flag = 'true'
print(flag)
this is a brute force approach.
can someone help me with optimizations?
def arrange_tiles(target, tiles):
paths = []
for word in tiles:
if target.startswith(word):
paths.append([word])
no_match = False
while True:
no_match = True
for word in tiles:
for i, path in enumerate(paths):
if "".join(path + [word]) == target:
print("".join(path + [word]))
return True
elif target.startswith("".join(path + [word])) and paths[i].count(word) < tiles.count(word):
no_match = False
paths[i].append(word)
if no_match:
return False
target = "foobarbaz"
tiles = ["foo", "fooz", "ba", "r", "z"]
print(arrange_tiles(target, tiles))
this is my solution for it
it doesnt check all possibilities, it first loops over all the "tiles" and checks if the target starts with any of those it appends it to a list of possible paths
then it loops over the list of "tiles" again and checks if the target starts with the word appended to the end of a path if so append it to that path and keep repeating
this will go on until a complete path is found and return true
if in any of the loops it wasnt able to append any word to a path and we still havent found the target then return false
I will look over this. This seems better than my brute force solution
this takes into account only using each tile once right?
Hi there!, anyone around?
I have a grid in which I do moves. N, S, W, E. These are my moves along X and Y.
I need to keep track of positions I have visited more than once. This includes 0, 0
So let's say... NS would return 1 cause 0,0 --N--> (x=0, y=1) --S--> (x=0, y=0). Thus I have visited 0, 0 for the second time
I already told you how
my approach for this wast to make a grid which is the size of the my moves.
grid = [[0 for x in range(len(moves))] for y in range(len(moves))]
x = 0
y = 0
for move in moves:
if move == 'N':
y += 1
grid[x][y] = True```
yeah, I remember but I have no clue how to that with a set
I'm very new to all this
and I just don't understand how will I keep track of a visited position with a set when a set can contain only unique elements
!e
s = {(1, 2), (3, 4)}
print((1, 2) in s)
@glossy breach :white_check_mark: Your eval job has completed with return code 0.
True
I still have a lot to learn
Hmm I mean you can check if it's already inside the set
so I can make my set with a comprehension to a dynamic size? As the number of moves can be arbitrarily long
I don't know, but I feel like I should be able to understand this, but it's eludeing me soo hard lol
You can use the fact that every elements in the set is unique
All you have to do is add the position to the set
!e
s = {(1, 2), (3, 4)}
s.add((1, 2,))
print(s)
@glossy breach :white_check_mark: Your eval job has completed with return code 0.
{(1, 2), (3, 4)}
cheers, thank you!
I'll go and hack away at this.
Hopefully the correct channel - never did much with hashing.
I'm looking for an algorithm that turns a 31bit integer into 5 characters [0-9a-zA-Z].
The numbers are high enough that no doubles occur.
Wondering what could be popular hash algorithms to achieve that.
are you saying the first bit is never 0?
you have enough space for about half the range, "the numbers are high enough that doubles don't occur" taken literally means it always starts with 1
The first is always 1 and the last doesn't matter. So 30 from 32 bits that vary
y
wait I'm wrong the first bit is not always 1, actually 0 atm.
The numbers are periodic. And the chance for them the exhaust is basically zero.
what do you mean periodic
there's like a straightforward rule on what numbers can appear, always the same?
or you just know they aren't very dense
pls give me an online resource to master algorithm.
Cp-algorithms
thanks!
And codeforces / cses problem set for practice
Yes.
then you need to build on that rule
if all your algorithm knows is that it will never run out of space, it's not possible, or at least it's not hashing
is there like a minimum distance between numbers
"if i see 106 I know 108 never appears, too close"
yeah it does but i forgot to check for that in one place lol
def arrange_tiles(target, tiles):
paths = []
for word in tiles:
if target.startswith(word):
paths.append([word])
no_match = False
while True:
no_match = True
for word in tiles:
for i, path in enumerate(paths):
if "".join(path) == target:
print("".join(path))
return True
elif target.startswith("".join(path + [word])) and paths[i].count(word) < tiles.count(word):
no_match = False
paths[i].append(word)
if no_match:
return False
target = "foobarbaz"
tiles = ["foob", "foo", "ba", "ba", "r", "z"]
print(arrange_tiles(target, tiles))
this shouldnt have any issues
paths[i].count(word) < tiles.count(word)
this is whats making sure that each tile is only used once btw
I haven't come up with any polynomial solution for this problem sadly
I've thought of 2 different solutions and both are O(NMK) cant think of anything better tbh
N number of tiles
M number of paths
K how many times it has to loop over both to find a path could be seen as length of target / number of tiles that make it up
the number of paths already makes it exponential
well number of paths has to be less than or equal to number of tiles
and unless the tiles are very small or the target is very long then k shouldnt be bigger than 5 for most inputs and can be seen as a constant for the most part, k also can only be less than or equal to number of tiles
so its not that bad
and definitely better than trying all permutations
absolute worst case its O(N^3)
Your code sounds incorrect though
I mean words can overlap
So path[i].count(word) may not work correctly
nvm I got it wrong
But there's still a way to make it incorrect
when using statistics.mode() and there are equal amount of elements, such as numbers it always returns the lower number, but i want it to return the higher number, how do i do that?
I think your misunderstanding what paths is
Yeah that's why I said I got it wrong
its just a list of the tiles that currently make up the start of the target
ahh fair
but what do you mean by a way to make it incorrect
But your code will be wrong
give me a case where it doesnt work
If there're multiple ways to get a prefix of the string
owait
I think I got it wrong again
Or no
["z", "abc", "ab", "c"], target = "zabcc"
gimme a sec will change it
If it matches "z", "ab" and "c"
Then there's no way to get the last c
just an example
yeah I see what you mean now
So I think if you do this it has to be exponential
well the time complexity will still be O(NMK)
!e
def arrange_tiles(target, tiles):
paths = []
for word in tiles:
if target.startswith(word):
paths.append([word])
no_match = False
while not no_match:
no_match = True
new_paths = []
for word in tiles:
for path in paths:
if "".join(path) == target:
return True
elif target.startswith("".join(path + [word])) and path.count(word) < tiles.count(word):
no_match = False
new_paths.append(path+[word])
paths = new_paths
return False
target = "zabcc"
tiles = ["z", "abc", "ab", "c"]
print(arrange_tiles(target, tiles))
@feral hound :white_check_mark: Your eval job has completed with return code 0.
True
@glossy breach fixed it now
worst case its N^3
And your target also a string of n 'a's
It will generate n^2 paths after first iteration
n^3 paths after second
"aaa"
["a", "a", "a"]
ahh your right its because this
path.count(word) < tiles.count(word):
I need a better way to check for this lol
I don't think there's a way to do this in polynomial time without some complex data structure
path.count(word) < tiles.count(word): I think if we just fix this it will be NMK
Hmm
which should be N for most cases
I have an idea how to fix it gimme a sec
!e
def arrange_tiles(target, tiles):
paths = []
paths_index = []
for i, word in enumerate(tiles):
if target.startswith(word):
paths.append([word])
paths_index.append([i])
no_match = False
while not no_match:
no_match = True
new_paths = []
new_paths_index = []
print(paths)
for i, word in enumerate(tiles):
for indexs, path in zip(paths_index, paths):
if "".join(path) == target:
return True
elif target.startswith("".join(path + [word])) and i not in indexs:
no_match = False
new_paths.append(path+[word])
new_paths_index.append(indexs + [i])
paths = new_paths
paths_index = new_paths_index
return False
target = "foobarbaz"
tiles = ["foob", "foo", "ba", "ba", "r", "z"]
print(arrange_tiles(target, tiles))
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [['foob'], ['foo']]
002 | [['foo', 'ba'], ['foo', 'ba']]
003 | [['foo', 'ba', 'r'], ['foo', 'ba', 'r']]
004 | [['foo', 'ba', 'r', 'ba'], ['foo', 'ba', 'r', 'ba']]
005 | [['foo', 'ba', 'r', 'ba', 'z'], ['foo', 'ba', 'r', 'ba', 'z']]
006 | True
@glossy breach this fixes it
wait I can fix this a bit more
actually wait will have to use the old code again to fix it even more lol
nvm works both way all I had to do was add this
and path+[word] not in paths and path+[word] not in new_paths:
!e
def arrange_tiles2(target, tiles):
paths = []
paths_index = []
for i, word in enumerate(tiles):
if target.startswith(word):
paths.append([word])
paths_index.append([i])
no_match = False
while not no_match:
no_match = True
new_paths = []
new_paths_index = []
print(paths)
for i, word in enumerate(tiles):
for indexs, path in zip(paths_index, paths):
if "".join(path) == target:
return True
elif target.startswith("".join(path + [word])) and i not in indexs and path+[word] not in paths and path+[word] not in new_paths:
no_match = False
new_paths.append(path+[word])
new_paths_index.append(indexs + [i])
paths = new_paths
paths_index = new_paths_index
return False
target = "aaaa"
tiles = ["a", "a", "a", "a"]
print(arrange_tiles2(target, tiles))
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [['a'], ['a'], ['a'], ['a']]
002 | [['a', 'a']]
003 | [['a', 'a', 'a']]
004 | [['a', 'a', 'a', 'a']]
005 | True
@glossy breach
ahh also before I forget @dawn pebble
ig it's still exponential if you have n distinct tiles "a", "aa", "aaa", "aaaa", "aaaaa"... and target "aaaaaaaa..."
you will reach the word way before any of that becomes an Issue I think
I mean
!e
def arrange_tiles2(target, tiles):
paths = []
paths_index = []
for i, word in enumerate(tiles):
if target.startswith(word) and [word] not in paths:
paths.append([word])
paths_index.append([i])
no_match = False
while not no_match:
no_match = True
new_paths = []
new_paths_index = []
print(paths)
for i, word in enumerate(tiles):
for indexs, path in zip(paths_index, paths):
if "".join(path) == target:
return True
elif target.startswith("".join(path + [word])) and i not in indexs and path+[word] not in paths and path+[word] not in new_paths:
no_match = False
new_paths.append(path+[word])
new_paths_index.append(indexs + [i])
paths = new_paths
paths_index = new_paths_index
return False
target = "aaaa"
tiles = ["a", "aa", "aaa", "a"]
print(arrange_tiles2(target, tiles))
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [['a'], ['aa'], ['aaa']]
002 | [['aa', 'a'], ['aaa', 'a'], ['a', 'aa'], ['a', 'aaa'], ['a', 'a']]
003 | True
I mean
extend the target
to like
100 characters
Or 1000
sth like that
Just saying it'll be exponential
yeah it would be a problem in that case but dont think theres anything we can do about that
we can also skip the last step by returning as soon as we find a match but this is easier to debug rn
!e
def arrange_tiles2(target, tiles):
paths = []
paths_index = []
for i, word in enumerate(tiles):
if target.startswith(word) and [word] not in paths:
paths.append([word])
paths_index.append([i])
no_match = False
while not no_match:
no_match = True
new_paths = []
new_paths_index = []
print(paths)
for i, word in enumerate(tiles):
for indexs, path in zip(paths_index, paths):
if target.startswith("".join(path + [word])) and i not in indexs and path+[word] not in paths and path+[word] not in new_paths:
no_match = False
if "".join(path+ [word]) == target:
return True
new_paths.append(path+[word])
new_paths_index.append(indexs + [i])
paths = new_paths
paths_index = new_paths_index
return False
target = "aaaa"
tiles = ["a", "aa", "aaa", "a"]
print(arrange_tiles2(target, tiles))
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [['a'], ['aa'], ['aaa']]
002 | True
and [word] not in paths
and adding this to the first loop
hmm actually thinking about it now i think we can improve this further
rn its doing kind of a BFS if we make it do a DFS it will probably be faster for cases like this
target = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
tiles = ["a", "aa", "aaa", "a","a", "aa", "aaa", "a","a", "aa", "aaa", "a","a", "aa", "aaa", "a"]
@jolly fog I see you lol
I'm certain I don't know what you're talking about. 😇
Yeah but slower for most cases
Or depends on how you implement it
🤔
Anyway
I'll tell you when I have an idea of a polynomial time complexity solution
What's the actual problem you're solving?
this
Ah, I see, so if the tiles and the word were actually interesting, then it wouldn't take so long
But for a bunch of aaaaaaaaaaaaaa it's tricky
Even though it shouldn't be tricky since you know the answer just by adding the lengths
Unless I gave you a bunch of tiles with lengths that cannot be combined to add up to the "word"
yeah I'm making one that does does DFS rn and its already a lot faster
And then you're left with a well-known NP-hard problem: Given the total of a shopping bill, can you find which combination of items could have been bought to add up to that total?
So, if there are any combination of aaa tiles that can be combined to give the aaaaaaaaaaaaaaaaaaaaaa word, then DFS will find it fast. But if no combination of tiles can be found, you will still of course try all n! permutations.
There isn't, it's NP-hard
You might find a better constant, but you'll still have at least exponential scaling in the worst case
fair
But yeah, there is probably a smarter strategy than just trying all the combinations. It may be very complicated to code, though
Like prioritize larger tiles in your search, so you first try the longest aaaaa that is less than your word, then add the longest aaa that still fits, etc.
!e
def arrange_tiles3(target, tiles, paths):
if not paths:
for tile in tiles:
if target.startswith(tile) and [tile] not in paths:
paths.append([tile])
for path in paths:
for tile in tiles:
if "".join(path + [tile]) == target:
return True
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
targets = ["aaaaaaaaaaaaaaaaaaaaaaa", "aaaa", "foobarbaz", "zabcc", "foobarbaz"]
all_tiles = [["a", "aa", "aaa", "a","a", "aa", "aaa", "a","a", "aa", "aaa", "a","a", "aa", "aaa", "a"], ["a", "aa", "aaa", "a"], ["foob", "foo", "ba", "ba", "r", "z"], ["z", "abc", "ab", "c"], ["foo", "fooz", "ba", "r", "z"]]
all_expected = [True, True, True, True, False]
for target, tiles, expected in zip(targets, all_tiles, all_expected):
print(f"target: {target}")
print(f"tiles: {tiles}")
output = arrange_tiles3(target, tiles, [])
if output == expected:
print(f"{output} is correct\n")
else:
print(f"{output} is incorrect\n")
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | target: aaaaaaaaaaaaaaaaaaaaaaa
002 | tiles: ['a', 'aa', 'aaa', 'a', 'a', 'aa', 'aaa', 'a', 'a', 'aa', 'aaa', 'a', 'a', 'aa', 'aaa', 'a']
003 | True is correct
004 |
005 | target: aaaa
006 | tiles: ['a', 'aa', 'aaa', 'a']
007 | True is correct
008 |
009 | target: foobarbaz
010 | tiles: ['foob', 'foo', 'ba', 'ba', 'r', 'z']
011 | True is correct
... (truncated - too many lines)
Full output: https://paste.pythondiscord.com/zevurunozo.txt?noredirect
recursive DFS solution
we're trying to find an optimal way to solve this problem that someone asked about yesterday
Ah could u like ping the problem 😅😅
have a look here
To really optimize it, you'd have to have some information about the types of words and tiles we have. If they are just random sequences of letters, then you have no information to go on, and your search is forced to be "blind"
btw python does some weird shit when you put an empty list as a default argument to a function
yeah, don't do that
It only constructs the list once, and then statically reuses it every time the function is called
If you can select a word multiple times then I have a solution in O(length of target * number of tiles)
Ah hmm okay yeah that's a nice problem
lets see it
Otherwise I think it's impossible to reach polynomial time complexity
find it weird that it does that though shouldnt it make a scoped variable?
Python violates a lot of assumptions about scoped variables, it's one of its annoying properties
man I thought I was safe from that after using python instead of JS
for i in range(10):
pass
print(i)
yeah I noticed that aswell but i think for loops and if statements just arent scoped so i dnt mind it as much
First time I tried to write a DFS in python it took me like nearly 1 hour to realize that the list isn't copied when I pass it into the function
yeah I fall for it every time until I remember lol
Ya sometimes I actually want it to be copied
But yeah, if you're used to pass-by-value, it will be a surprise
If you want an efficient DFS, and the list might be very long, you don't want it copied 🙂
I think it was a backtrack related problem so I needed the list to be copied
global works for those situations
ikr I was only a newbie back then 🙂
Or if copying is very expensive, you might be able to design some way of only using the "diff" between two steps in the search. Will take a bit of thought to implement, though
But git commits are all "diff" information, for example
!e
value = 2
def test():
value = 5
print(value)
def test2():
global value
print(value)
test2()
test()
print(value)
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | 5
002 | 2
003 | 2
speaking of global is there a way to make test2 look for the value in test instead of globally
Just pass it as a parameter at that point, lol
I am not sure I completely understand the soln rn (I'm kinda sleepy lol I'll give it a read again)
But could anyone tell me if this would be a good way of doing it:
Mapping the tiles to their length
Like {2:["aa", "ab", " aa",... ],...}
And then seeing the total buffer length given of tiles... Like the useless tile length and break it down to sum of different tile lengths
And then do cases eliminating the particular tiles from the list.. Like if I got a buffer length of 3
It could be 1+2 , 1+1+1, or 0+3
So I make cases
1st time I'll remove a tile of length 2 and one of length one and then try to construct with the remaining tiles... And effectively run this case until I have exhausted removing all combinations of onr tile of length 2 and one tile of length 2
And then move ahead to the next case?
lol true I guess there wouldnt be a point to trying to do that with something like this now that I think about it
Oh shit that is a huge paragraph... I didn't realize lol
I think something like this can work but its more complicated and I think it still has the same time complexity as the current solutions
Actually if test2() is an inner function of test() as written, then just saying value will refer to the one in test() already
!e
value = 2
def test():
value = 5
def test2():
value = 3
print(value)
print(value)
test2()
test()
print(value)
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | 5
002 | 3
003 | 2
but then you run into this
thats the annoying thing about assignment and initialization being the same in python...
Actually that makes sense, remember test2() gets defined as part of test()
nah I get why its doing this
just find it annoying that I cant make it clear if I want to create a new value or update an existing one
Hmm well I guess.. I think time complexity for my solution would be somewhere in O(n^4-n^6) not sure lol ...
Anyways I am not very sure about how to calculate time complexity anyway so I'll leave it to y'all ig 😅
tbh rn I'm struggling to figure out my time complexity so dunno If I can work out yours lol
but i think my solution should be N^3 worst case
Aha then I guess that rules out mine by far lol
I cant say for certain yet
I think yours might have an edge somewhere
rn I have 2 solutions which both have the same worst case but are better in different situations
I'd probably want a table of my tile words based on what letters are in them, as well, so that I know not to try certain ones over certain regions of my target word
^^ another optimization but I dont think that affects the complexity
How do you guys are master in DSA?
No, the worst case complexity is exponential
How do you put into work?
?
What will be your recommendation on DSA for beginners?
what you mean?
honestly idk just do more problems maybe have a look at the pinned resources as well
Help me find the resources. Show me.
00:00:00 - Introduction
00:00:15 - Artificial Intelligence
00:03:14 - Search
00:14:17 - Solving Search Problems
00:25:57 - Depth First Search
00:28:30 - Breadth First Search
00:54:29 - Greedy Best-First Search
01:05:15 - A* Search
01:12:01 - Adversarial Search
01:14:09 - Minimax
01:36:17 - Alpha-Beta Pruning
01:45:28 - Depth-Limited Minimax
Thi...
have a look at this I found it quite helpful
That's for AI?
The problem contains the Subset Sum Problem as a special case, which is NP-complete. Therefore the problem itself is NP-complete.
https://en.wikipedia.org/wiki/Subset_sum_problem
they go over a good amount of DSA in that video
Ah Alright thanks.
The special case is precisely the one where the target word is aaaaaaaaaa and the tile words are different lengths, made of the character a
DFS deals with that particular case very well though
and still preforms decently on other cases
BFS is the one which had issues with that case
DFS takes factorial time if there is no combination of a words that can add up to the right length. I think it can still take a very long time if there is only one correct combination out of a very large number.
Basically, it is only performing well in your examples, because your examples have many possible solutions
give me an input that would cause issues
What if all my tile words have lengths which are prime numbers between 11 and 100, and the target word has length 1000
remember as well we cant reuse tiles for this problem which is why I dont think its an issue
yeah can be the exact same
["foob", "foo", "ba", "ba", "r", "z"]
this is a valid tileset that uses ba twice
Sure, then you can construct an example like what I mentioned
Of course, it's worth asking whether the worst-case complexity is relevant, if we can make the average case decently good
But the meaning of "average case" depends very strongly on what expectations we have about the input.
^^
tbf at this point this isnt even about being useful anymore but more about figuring out the challenge lol
I mean, you have an algorithm that solves it
The best algorithm will still have exponential worst-case complexity
nah but I mean solve it efficiently for all cases
If "efficiently" means "polynomial time", then no, you can't
I feel like there is always some kind of info we can extract to prevent the worst case
nope, you'll just end up spending so much time extracting the info that you end up taking exponential time anyway
DFS has factorial worst-case complexity, since it literally tries every permutation of inputs. If I gave you some very non-obvious inputs, you would not be able to predict whether DFS would find the answer quickly or not.
But I agree DFS is probably better than BFS, since I don't see the point of constantly checking whether small strings match at the beginning, without looking at the rest of the word.
"aaaaaaaaaaaaaaaaaaaaaaa"
["a", "aa", "aaa", "a","a", "aa", "aaa", "a"]
it took me a while ngl but yeah I see what you mean in cases like this it does try all permutations
but thats only for this kind of case in other cases it will exit before that even if the target cant be reached
the issue comes in from saying that we cant reuse tiles since
If the target can't be reached, it will try everything before it figures that out.
path "aaa" might not be equivalent to another path "aaa"
The smartest algorithm is gonna be DFS with some additional logic to figure out "obvious failures" in advance.
But the sky's the limit as far as how you want to define "obvious".
it doesnt try all permutations for something like this
foobarbaz
["foo", "fooz", "ba", "r", "z"]
I mean it does, it's just that when the letters don't match, you can fail earlier
this might be hard to explain but for the algo i wrote it wont try every single permutation only the ones that match the start of the target until they fail before it discards them
What I mean is "tries every permutation until failure" of course. We don't keep going down the same branch if the string doesn't match so far.
So yes, DFS is gonna be very good when string matches actually matter.
It's the aaa thing that will be hard
that is a pretty big difference
my intuition tells me there is definitely a way to circumvent the issue somehow but im too tired for that rn plus the solution is good enough rn I guess
There are many ways to optimize that though
You can do some kind of string hashing to compare in O(1)
Which can make it a lot faster
ok I have a different idea
if we make a list of lists called occurences that holds where each tile starts and ends in the target so it makes a list of ranges
we just have to find if there exists a list of ranges that covers the whole target without overlapping
weird question tbh but I think dictionary makes more sense
you can have each vertex as a key and its value be the edges
"foobarbaz"
["foo", "fooz", "ba", "r", "z"]
occurences = {
"foo": [(0, 3)]
"ba": [(3, 5), (6, 8)]
"r": [(5, 6)]
"z": [(8, 9)]
}
something like this used a dict to show the point but the values themselves dont actually matter and we only need to know which ranges we are allowed to use at the same time for example "ba" has 2 ranges and we can only use one of them at any given time
if we can figure out an efficent way to check if its possible to create a complete range from 0 to 9 without any overlap then we have a fast solution
@glossy breach @tropic glacier what do you think?
it doesnt have to be in this format either btw just using it as an initial example I think we could something with it if we store an id with each range to denote what item it belongs to and have the keys be the start of each range instead
start end id
occurences = { \/ \/ \/
0: [(3, 0)]
3: [(5, 1)]
6: [(8, 1)]
5: [(6, 2)]
8: [(9, 3)]
}
something like this
then we can start from 0 and try to stitch together a path if we find that we cant continue at any point then return false if we find a complete path we return true
I mean it'll be faster but it will never be polynomial
hmm I mean if we try every combination 100% but I think using this we could find a different way to quickly find if a gap exists or if there is no possible sum of ranges without an overlap
Hello, I need help with this code, because this list give me a IndexError
with open(persons_names_file_path,"r+") as f:
lineas = [linea.strip() for linea in f.readlines()]
with open(persons_identifier_img_file_path,"r+") as g:
lineas_Association = [lineaA.strip() for lineaA in g.readlines()]
if (person_name not in lineas):
f.write(f"{person_name}\n")
#num_linea = lineas.index(person_name)
persons_identifier_img = str(num_linea) + ".jpg"
elif (person_name in lineas):
redefinition = True
with open(persons_identifier_img_file_path,"r+") as f:
lineas = [linea.strip() for linea in f.readlines()]
if (redefinition == False):
if(both_coincidences == False):
f.write(f"{persons_identifier_img}\n")
print("No te conocia")
elif(both_coincidences == True):
print("Ya te conocia")
both_coincidences = False
if (redefinition == True):
f.truncate(0)
f.seek(0)
lineas[num_linea] = f"{persons_identifier_img}\n" #THIS IS THE ERROR LINE
f.writelines("\n".join(lineas))
print("Identifique una nueva imagen con tu personal, osea asociada a tu nombre")
redefinition = False
return text
And the error is :
lineas[num_linea] = f"{persons_identifier_img}\n"
IndexError: list assignment index out of range
num_linea is the line number containing a person's name in the persons_names_file_path file.
But then when I try to access that same line number in the persons_identifier_img_file_path txt file.
The second file has fewer lines, and so the index is out of range.
I hope you can help me
@feral hound wow this is pretty advanced. I simplified my code a lot to be a lot cleaner.
Essentially, my solution calculates all the permutations using all the tiles and then just checks to see if the word exists inside a permutation
yeah that works but its the slowest solution
the 2 solutions I posted both only check the possible permutations instead of all the permutations however even that given some cases can be very very bad which is why we tried to optimise it even more but dnt really have anything better rn
the 2 solutions are both good in different scenarios so it depends on your input which one will be better
if your solution is working well enough though thats all you need
as they say if it aint broke dnt fix it
so I'm having trouble understanding this line of your code:
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
I understand your code up to this point. Essentially, before, you're checking to see if a tile starts at the beginning of the word. Then you check to see if that tile plus another tile will be equal to your word.
I'm having trouble understanding when let's say, 3 tiles can be combined to form the target word.
ok wait which one are you looking at cause I have a few different solutions
also a part of that was redundant let me send something new in
def arrange_tiles3(target, tiles, paths):
if len("".join(tiles)) < len(target):
return False
if not paths:
for tile in tiles:
if target.startswith(tile) and [tile] not in paths:
paths.append([tile])
for path in paths:
for tile in tiles:
if "".join(path + [tile]) == target:
return True
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
have a look at this one
its my best solution for this problem but also the hardest to understand lol
@dawn pebble
oh ok. i was looking at different solution. I will take a look.
i do
depth first search
yea
thats what this is doing
okay. so please correct me if I'm wrong at any point:
- you compare the length of all tiles combined vs. length of target word, if less than, return False. Makes sense.
- you essentially create a list of paths with the first tile
also just realised theres a small mistake here gimme a sec
def arrange_tiles3(target, tiles, paths):
if len("".join(tiles)) < len(target):
return False
if not paths:
for tile in tiles:
if target.startswith(tile) and [tile] not in paths:
paths.append([tile])
for path in paths:
for tile in tiles:
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
there we go
the reason the other one was wrong was because it allowed a tile to be reused at the last check
so at this point, you have a list named paths with all potential first tiles.
for path in paths:
for tile in tiles:
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
then you're starting to check the first tiles + second tile
what does path.count(tile) < tiles.count(tile) do
this is to make sure that a tile isnt reused
so if the tile is "ba" we make sure that the number of "ba"s in the path is less than the number of "ba"s in the list of tiles before we append it to the path
that way we make sure that every tile is only used once
I had a different way to check for this as well where it checked the indexes of the tiles being added but this works too
okay
ah
then you recursively call the function again but this time, update the paths
so that it has the second tile
ok
yup
and for every call other than the initial first one the paths will only have 1 path
yup it completely looks through 1 branch before trying another
its nice but struggles at one particular input
i mean, this is a lot better than the brute force solution
i feel kinda sad bc i didn't pass the coding challenge but I guess it's a learning experience
!e
def arrange_tiles3(target, tiles, paths):
if len("".join(tiles)) < len(target):
return False
if not paths:
for tile in tiles:
if target.startswith(tile) and [tile] not in paths:
paths.append([tile])
for path in paths:
for tile in tiles:
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
target = "aaaaaaaaaaaaaaaab"
tiles = ["a", "aa", "aaa", "a","a", "aa", "aaa", "a", "a", "aa", "aaa", "a","a", "aa", "aaa", "a"]
print(arrange_tiles3(target, tiles, []))
@feral hound :warning: Your eval job timed out or ran out of memory.
[No output]
you can try running this yourself idk how long it will take but after 1 hr I still didnt get an answer lol
!e
def arrange_tiles3(target, tiles, paths):
if len("".join(tiles)) < len(target):
return False
if not paths:
for tile in tiles:
if target.startswith(tile) and [tile] not in paths:
paths.append([tile])
for path in paths:
for tile in tiles:
if target.startswith("".join(path + [tile])) and path.count(tile) < tiles.count(tile):
if "".join(path + [tile]) == target:
return True
if arrange_tiles3(target, tiles, [path + [tile]]):
return True
return False
target = "aaaaaaaaaaaaaaaab"
tiles = ["a", "aa", "aaa", "a","a", "aa", "aaa", "a", "a", "aa", "aaa", "a","a", "aa", "aaa", "a"]
print(arrange_tiles3(target, tiles, []))
@dawn pebble :warning: Your eval job timed out or ran out of memory.
[No output]
and all that is because of the b at the end it makes us go through every single permutation of the tiles before letting us know that it doesnt work
what coding challenge was it?
when you say ever single permutation, you also mean the combinations where there were different lengths?
every*
for example, if it was permutations(a, b) => (), (a), (b), (a, b), (b, a)
or just (a, b) and (b, a)
this
its the same as that for this particular input
only permutes (a, b) and (b, a)
ohh
and then checks to see if the word exists in (b, a) and (a, b)
are you sure it works for every case though?
so it uses all the tiles?
def main():
tiles = [line.strip() for line in sys.stdin]
word = tiles.pop(0)
for subset in permutations(tiles):
if word in "".join(subset):
return True
return False
print(main())
yeah,
i checked and it uses all the tiles
but it works for the first test case
i only checked for the 3 given test cases
oh opps
from itertools import permutations
and import sys
but you can change the inputs
wont be using sys but yeah
ahh I see why it works now
if word in "".join(subset):
you did this
yeah
that doesnt always work btw
!e
from itertools import permutations
def main():
tiles = ["foobarbazzz"]
word = "foobarbaz"
print([permutation for permutation in permutations(tiles)])
for subset in permutations(tiles):
if word in "".join(subset):
return True
return False
print(main())
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [('foobarbazzz',)]
002 | True
this should be false
!e
from itertools import permutations
def main():
tiles = ["fooba", "rbazzz"]
word = "foobarbaz"
print([permutation for permutation in permutations(tiles)])
for subset in permutations(tiles):
if word in "".join(subset):
return True
return False
print(main())
@feral hound :white_check_mark: Your eval job has completed with return code 0.
001 | [('fooba', 'rbazzz'), ('rbazzz', 'fooba')]
002 | True
that too but yeah you get the point
sigh
i had to go back so far to see the start of this conversation
i did this with this code:
def create_word(word, tiles):
if not word:
return True
for i, tile in enumerate(tiles):
if (
word.startswith(tile)
and create_word(word.removeprefix(tile), tiles[:i] + tiles[i + 1:])
):
return True
return False
this did go on for a while lol
i'm just salt-die
you can shrink it with any
hmm this is also DFS but much cleaner than my implementation lol
ngl been having trouble figuring that out
I thought it was NMK and N^3 worst case but that doesnt seem to be the case
def create_word(word, tiles):
return (
not word
or any(
word.startswith(tile)
and create_word(word.removeprefix(tile), tiles[:i] + tiles[i + 1:])
for i, tile in enumerate(tiles)
)
)
ok this is starting to hurt my head
do you do coding competitions?
no
thats a comprehension right?
wait, i'm having trouble understanding this clean code
yep
it is too advanced for me
ok makes sense now
do you understand comprehensions?
!e
nums = [i for i in range(10) if i % 2 ==0]
print(nums)
@feral hound :white_check_mark: Your eval job has completed with return code 0.
[0, 2, 4, 6, 8]
hes doing a version of this there
ok, this is the any shorthand?
any just checks if any of the arguments inside are true
it will immediately return True if one argument is true as well, so it short-circuits
