#algos-and-data-structs
1 messages · Page 60 of 1
||transform(x) = fast_mod(x * (2^96 %p))||
||Here 2^96%p is a "constant", and can be precomputed||
Wat why
Ah because
Of the 2^64
Damn that’s smart
This thing is quite elegant in the end
Crazy to think that it’s faster than mod
This is basically the standard method whenever you need fast modular operations
I don't know why more people don't use it on cf
I know why
We aren’t aware of it
At least the bottom part of cf which is like 90% active users 😁
I really don't like the wiki article https://en.m.wikipedia.org/wiki/Montgomery_modular_multiplication
In modular arithmetic computation, Montgomery modular multiplication, more commonly referred to as Montgomery multiplication, is a method for performing fast modular multiplication. It was introduced in 1985 by the American mathematician Peter L. Montgomery.
Montgomery modular multiplication relies on a special representation of numbers called ...
It took forever for me to understand.
In the end, it turns out their fancy Montgomery transform is just multiplying by 2^32
https://codeforces.com/contest/839/problem/D
i was solving this, and it is TLE'ing , but idk what minor changes to make
pwer = [0] * (int(2e5) + 1)
pwer[0] = 1
for i in range(1, int(2e5) + 1):
pwer[i] = (pwer[i-1] * 2) % MOD
def solve(case = None):
n = int(input())
arr = list(map(int, input().split()))
dp = [0] * (int(1e6) + 1)
for elem in arr:
for i in range(1, int(math.sqrt(elem)) + 1):
if not elem % i:
dp[i] += 1
if i * i != elem:
dp[elem // i] += 1
real_dp = [0] * (int(1e6) + 1)
for i in range(int(1e6) + 1):
if dp[i]:
real_dp[i] = dp[i] * pwer[dp[i]-1] % MOD
for i in range(int(1e6), 1, -1):
for j in range(2 * i , int(1e6) + 1, i):
real_dp[i] -= real_dp[j]
res = 0
for i in range(2, int(1e6) + 1):
res = (res + real_dp[i] * i) % MOD
print(res % MOD)
maybe high constant factor idk
there's like 4 big loops
but i think the major problem is calculating divisor worse case (2e5) * (1e3)
Can you link your submission
One algorithm that is good to know about is something I call divisor sieve
oh precalculating max divisors
divisors = [[] for _ in range(m)]
for d in range(1, m):
for mult in range(d, m, d):
divisors[mult].append(d)
This algorithm
For each possible divisor d, it loops over all numbers mult that divides d
Then it appends d to their divisor list
so that means sqrtN is a bad upper bound here
but it will AC in cpp i think
i see
easy enough
The sqrtN solution does not have the best possible time complexity, yes
got it
For efficiently solving this problem, you can do something similar
n = int(input())
A = [int(x) for x in input().split()]
m = max(A) + 1
A_cnt = [0] * m
for a in A:
A_cnt[a] += 1
divisor_cnt = [0] * m
for d in range(1, m):
divisor_cnt[d] = sum(A_cnt[mult] for mult in range(d, m, d))
Then put divisor_cnt as your dp variable
alr tysm
okay its so fast wow 740ms , faster than many cpp submissions
https://codeforces.com/contest/839/submission/277828946 Tried making one too
got below 0.5s
Tweaked it to run faster, but somehow this runs slower https://codeforces.com/contest/839/submission/277829972
Such a pypy moement
Here is another tweaked submission that runs signficiantly slower for no reasoon at all https://codeforces.com/contest/839/submission/277831459
Got it pretty short now
The only difference is between these two is that the bottom one uses for mult in range(2*d,m,d), while the other uses a while loop:
https://codeforces.com/contest/839/submission/277831910
https://codeforces.com/contest/839/submission/277831841
what the
Pypy is really weird I had a higher runtime for pre calculating till nth power of 2
in comparison to 2e5th power every test
Hello, I am new here. I want to learn python and be able to write code and get comfortable solving coding puzzles but I am a beginner with no experience. Looking for some help
is this the idea?
# Task: Find a*b % P without calling % P(costly)
P = 10**9 + 7
P_INV = pow(P, -1, 1 << 64)
R_INV = pow(1 << 64, -1, P)
# 2**64 | n + kP
# n + kP := 0
# => k = -n * P**-1
def k(n):
return -n * P_INV
def transform(n):
return (n << 32) % P
def fast_mod(n):
return n * R_INV % P
def mont_mulmod(a, b):
return fast_mod(transform(a) * transform(b) + k(a * b) * P)
I'm not really seeing why this would be faster
so I'm prob doing something wrong
what are some hash functions for counters that are quick to compute and can be added together post hash? for context, I'm reading the editorial of a problem:
given r rows (40) and c columns (10^5) of c colours (10^5, given as integers), what's the least amount of rows that need to be removed so every color is in the table the same number of times (-1 if no sol)
the editorial says to do meet in the middle with a hash function that can be aggregated (it also says to use multiple hash functions together, which I'm not too sure about either)
my first thought is to generate c random coefficients and take a linear combination with the counts modulo some large prime
in other words, a hash h s.t. h(x + y) = h(x) + h(y)? then I'd search for like 'homomorphic hashing' or something
linear combination mod p should be fine
# Task: Find a*b % P without calling % P(costly)
P = 10**9 + 7
MASK = (1 << 64) - 1
P_INV = pow(P, -1, 1 << 64)
TWO96 = pow(2, 96, P)
TWO128 = pow(2, 128, P)
def fast_mod(n):
# Find k such that n - k * p is divisible by 2^64
k = (n & MASK) * P_INV) & MASK # everything mod 2^64
return (n - k * P) >> 64
def transform(n):
return fast_mod(n * TWO96)
# fast mod mul can be implemented in many different ways
def mont_mulmod1(a, b):
return fast_mod(transform(a) * transform(b))
def mont_mulmod2(a, b):
return fast_mod(fast_mod(a * TWO128) * b)
It should be something more like this
The computation for k should ideally be implemented using int64 (or uint64) variables.
In pypy, it is possible to do this using __pypy__.intop module
The standard hashing of strings (rolling hash), has this property
(if you see strings as vectors of numbers that can be added elementwise)
I see, ty
won't this part theoretically overflow though?
k = ((n & MASK) * P_INV) & MASK # everything mod 2^64
In c++ overflow of uint64 is fine, it will just wrap around
In pypy you can get this same behaviour using __pypy__.intop
and that's fine?
wait it is, cause technically you just did a % 2^64
In C++ is it 100% fine to overflow uint (unsigned int), but not int (singned int)
Weird quirk of C++
Ye I want that % 2^64
Which is how overflow usually works
yeah, I just had to wrap my head around for a sec
I don't think I've ever thought of overflow as something I can use to my advantage tbh
If you want to compute something mod 2^64 it is nice
That isn't super common
But it happens
mind blown
any good resources on flow networks in particular? i keep running into stuff like min cut max flow
I'd say start learning about augmenting paths to solve maximal bipartite matching.
They generalize to solving Max-Flow (and more stuff to like Matroid problems)
maximal bipartite matching was actually what got me to ask this lmao
spent like 2 hours on a problem today and the editorial was just like "yeah this is pretty obviously just maximal bipartite matching so just do hungarian algorithm"
what a coincidence doing CSES download speed
I'm looking at https://cp-algorithms.com/graph/dinic.html
The goal of this project is to translate the wonderful resource http://e-maxx.ru/algo which provides descriptions of many algorithms and data structures especially popular in field of competitive programming. Moreover we want to improve the collected knowledge by extending the articles and adding new articles to the collection.
https://www.youtube.com/watch?v=ELcgI_C1mNM Here is a pretty good run down of bipartite matching. The augmenting path talk starts around 5 min mark
Matching problems are ubiquitous in real life, like matching students to schools, drivers to passengers, airplanes to airports, etc.
Maximum cardinality bipartite matching is the simplest matching problem, but the results and theory are fundamental to more complex matching problems.
We go over a simple example throughout the whole video, wher...
In my opinion, explaining bipartite matching using flows is the wrong way around
Max flow is the more general (and more complicated) problem
Speaking of matching, is there a formal problem for something opposite to matching like, Choosing some vertices from the first set and second set of a bipartite graph such that all chosen nodes in the first set cannot reach any of the chosen nodes in second set and vice versa. And the number of nodes chosen is maximum
maximum independent set
(I totally didn't ask chatgpt)
I'm building a group sorting/shuffle algorithm and I'm not exactly sure what it's called....
I have 4 groups (a,b,c,d) each group has several items. I need to shuffle items to neighboring groups (I.E. b can shuffle into a, b, and c but not d. c can shuffle into b, c and d but not a) I'm having a hard time understanding (on paper) how to implement the logic or what type of algorithm I'm looking for. Is this a common technique?
what's the logic behind the shuffling?
Not quite sure what your asking...?
The logic as far as I can see it seems pretty simple
A can be shuffled into A, B
B into A,B,C
C Into B,C,D
D Into C,D
It can actually that was a typo
It's a group with several items in it....?
what does it mean to shuffle a group? are you moving individual items in a group into another group, and do items have to be moved into the same group? what decides where an item goes?
The original group does not change a temporary list (array?) Would be used for the placement of items until the next time a shuffle was needed. An item (or object) would have properties (A, B, C, D) that determines where each item is allowed to be placed. Each item would only have 1 of those properties as they are mutually exclusive. Groups can overlap as defined above but does not change the items properties.
Ok.....
I have a tire shop with tires that come in several sizes. A, B, C, D
For the sake of argument (and not really practicality) tires can be matched with other tires from similar sizes (neighboring groups as defined above)
guys i'm having problem here
await self.db.execute(
"UPDATE auto_proxy SET proxy_prefix = ? proxy_mode = ? WHERE user_id = ?",
(user_id, is_proxy, prefix),
)
what is the correct way to set this up?
i can't update my database
i want to set multiple stuffs
where my user_id is indicated
nevermind
I've found py f"UPDATE field SET item = {valueA}"
Format strings to be easier to read as you can slap a variable directly into your strings
it worked for me
I don't think you're supposed to use f-strings
See this: https://www.pythondiscord.com/pages/tags/sql-fstring/
@shadow glen
no need, i use item = ? then i use a tuple for that
Okay, good
SQL injection is a fair point.
I'm pretty sure I have something working. I don't really want to slap my code in the channel as it's a paid endeavor. I appreciate your time to hear me out.
@regal spoke I'm doing another inv mod problem and I wanted to use your algorithm to find the inverse easily; but you never mentioned the complexity
Since 1 + 998244353 times 15 is divisible by 16, we have 16^-1 = (1+998244353 * 15) / 16 mod 998244353. So for 16 we get the "worst case".
So my intuition is that the function mod_inverse(a, p) will be O(a) in the worst case? What if then I can't use this https://media.discordapp.net/attachments/650401909852864553/1276566537200730163/Screenshot_20240823-173913.png?ex=66cb503b&is=66c9febb&hm=1fd925d0f8274fd78d3446912aad6d79d50fa83bedf1085f1e166f76f36a3e47&=&format=webp&quality=lossless&width=1661&height=358 because I don't want to precompute all of them (say ( a > 10^6 )), how should I proceed?
Time complexity of the thing I talked about is horrible.
Ah so I shouldn't use it, it was just for intuition
or for really big p and low a to do it by hand?
Btw you might not know this, but relatively recently, mod inverse was added to pow
So you can just use pow(a, -1, p)
Is the underlying algo the extended euclidean algorithm thing?
yes
Great, the last one I hadn't looked at
or if p is prime and on you're on an older python `pow(a, p-2, p)'
My favourite method is definitely using factorials
m = 10**6
MOD = 10**9 + 7
fac = [1] * m
for i in range(2, m):
fac[i] = fac[i - 1] * i % MOD
inv_fac = [pow(fac[-1], MOD - 2, MOD)] * m
for i in range(2, m)[::-1]:
inv_fac[i - 1] = inv_fac[i] * i % MOD
inv = [fac[i] * inv_fac[i - 1] % MOD for i in range(m)]
The reason for this is that inverse factorial is needed for n choose k
and thats very common in these kind of problems
is it better to start dp/recursion/memoisation from n-1 index or 0 index?
iterative usually faster than recursive
from index 0 or index n-1?
doesn't matter, iterative faster than recursive
and also, you didn't really give context to what's different between 'starting from 0' vs 'starting from n-1'
if you can post the code then maybe we can figure it out
if you're talking about implementation difficulty, recursive is usually more intuitive to code
vector<int> dp(n, -1);
dp[0] = 0;
dp[1] = abs(heights[0]-heights[1]);
for(int i{2}; i<dp.size(); i++){
dp[i] = min(
dp[i-1]+abs(heights[i-1]-heights[i]),
dp[i-2]+abs(heights[i-2]-heights[i])
);
}
return dp[n-1];
I can also write it in loop from dp.size() to 2 (return dp[0] in this case)
so which is more intuitive faster for solving, are their certain problem will are easier one way or another?
literally doesn't matter for performance in this case, you're just inverting the logic
which you use... whatever is simpler to you
isnt there a rule for this,
might be informal, but I see some patterns which make one way more easy.
there is no rule, if you see patterns that make it easier for you, use it
though people usually start the base case as 0 or 1
Yes, base case at index 0 is nice. Also for loops in increasing order is a little bit nicer than the reverse.
But it really doesn't matter
int i{2} Thats an interesting way of writing i = 2
it's actually the c++ way if you believe it
according to here
though that's in a for loop and idk what's the standard there
vscode autocompletes it to using = there and I can't be bothered
It can warn against some type of conversions. https://stackoverflow.com/questions/18222926/what-are-the-advantages-of-list-initialization-using-curly-braces
But I don't really see the point of it
I'd imagine it's more useful in bigger projects where it can catch potentially unintended behavior from devs
I'm sorry, why?
a^(p-2) % p = a^(-1) % p?
Fermat's little theorem
😛
wanted to fix the bad reply
a^p = a (mod p)
a^(p-1) = 1 (mod p)
a^(p-2) = a^-1 (mod p)
I meant the theorem in itself
Going from the theorem to the result is indeed easy 😄
Thank you Fermat
A day later after my question yesterday, finally ..
it was introduced to try to make initialization more uniform across different cases
for(int i{2}; i<dp.size(); i++) would you ever code like this?
I don't personally use it, at least not for primitive types
I know some people do use it
it does end up smoothing out some inconsistencies wrt trivial types
struct Foo {
int x;
int y;
Foo(int x_, int _y) {
x = x_;
y = y_;
}
};
struct Bar {
int x;
int y;
};
...
Foo foo1(42, 5); // works
Foo foo2 = {42, 5}; // didn't work historically
Foo foo3{42, 5}; // didn't work historically
Bar bar1(42, 5); // doesn't work
Bar bar2 = {42, 5}; // works
Bar bar3{42, 5}; // I think this worked, but not sure
...
it was a bit of a mess
Hi. Whats the usual way to store objects of a same class to access them?
I normally use a list, and when I need an specific object I'll do a list comprehension to locate it like: [x for x in cars if x.model == selected_car][0]
I realized I could have a dictionary were each key-value pair is a unique attribute of the object, and the value is the object itself. That way I could reference it by this unique attibute... but how would I handle multiple of them? A list of dictionaries? A dictionary with multiple levels? Or is there a "standard" method of doing this?
wdym by 'multiple of them'?
How would you store multiple cars and access 1 of them specifically later?
depends, ig? I don't think your situation is very clear to me rn
e.g. what about the value being a list of cars that has the attribute
Imagine you have this Car class with an attribute called model thats unique to every Car (obviously wrong, but just to simplify the example):
class Car:
def __init__(self, model, price):
self.model = model
self.price = price
cars = [Car("Fiesta", 5000), Car("Etios", 3000), Car("Clio", 2000)]
# I know model is unique, so I'm using a list comprehension to find the car where it matches and returning index 0, because I know there is only 1 that matches... this seems weird, it works, but I'm wondering if there is a "standard" or better way to do it
specific_car = [x for x in cars if x.model == 'Etios'][0]```
```py
code in here
```
Like that?
sure, so I assume you then used a dict like
cars = {
"Fiesta": Car("Fiesta", 5000),
"Etios": Car("Etios", 3000),
"Clio": Car("Clio", 2000)
}
specific_car = cars['Etios']
so what's the problem now?
Oh
No... I did something weird and stupid, but thats exactly was I looking for :p
Thanks a lot 🙂
So I’m solving this problem
Given an integer n, return the number of ways you can write n as the sum of consecutive positive integers.
And the answer is 1 + number of odd divisors that aren’t 1 and the number itself
I could formally prove it and it’s decent (works up to 10^9 easily)
—-
But I was wondering can I get the minimum number n such that there are exactly k ways to write n as a sum of consecutive positive integers
With k = 1000, my algorithm above + brute force (testing all n) took too long (I stopped it after like five seconds)
So I’m wondering if there’s an efficient way to solve this problem
I was thinking binary search instead of brute forcing but there’s no monotony so :/
ways == 1 + #odd_divisors
so... generate k-1 smallest odd divisors and multiply them together to get n?
But then I’d get additional divisors
Like 3^10 won’t get me only 10 odd divisors
It will also get me 3^2 and 3^8
So I’m gonna get too many
I think?
Although I guess I can count them easily
aren't those 2 included in the 10 divisors of 3^10?
Ah wait
Am I stupid
Yes
Sorry you’re right
I got confused
So 3^10 has 10 divisors and they’re all odd
So that’s the smallest we can make
Is it though
Im not sure
so that's 1 + 11 - 2 = 10, 10 ways (-2 cause you proved that you can't count 1 nor itself)
(3^10 has 11 divisors, including 3^0)
Im not sure it’s the smallest we can make though
it's certainly an upper bound, tho we prob can do better
I get across this problem in the round that just ended earlier
https://codeforces.com/contest/2003/problem/C
I proposed that the number of good pairs will be maximum if there is no consecutive same characters in the string, But I wasn't able to implement the code to reorder the string like that (and I couldn't quickly prove what I am doing is right so I just moved on)
Now the editorial says something like "sort the characters by frequency descending and then alternate between them"? I can roughly understand it but the code is still not clear to me (I can't really read C++ lol)
i.e. consider 5 * 3^5, which has (2 * 6) divisors, all odd because any 2 odd numbers multiplied is still odd
this would make 3^11(also 12 divisors) not the optimal
I did mess up my formula above it was 2 + number of odd divisors that aren’t number and itself
But that’s ok I’ll fix the by one errors later
Yes indeed
What if I want 10 divisors though
Im getting a bit confused by those one off error
I mean at this point you can just do the dumb thing and it'd still run fast right
i.e. for some target number of divisors n_d, generate an overestimated list of numbers that'd might have n_d divisors, filter those that don't out, and take the min from the rest
Real formula is just nb of odd divisors
Example 3^1=1+2 or 3^1=3
It has 2 odd divisor: 1 or itself
Not sure how you’d generate that list?
We’ll only be playing with 5 and 3 right?
If we want the smallest
Or maybe only primes?
if it gets large enough, it might be beneficial to also include 7, 11, 13, 17, etc.
primes, now that I think about it
n = int(input())
s = input()
counts = Counter(s)
result = []
maxheap: list[tuple[int, str]] = [] # (counts, char) counts will be neg to be maxheap
for c, cnt in counts.items():
heappush(maxheap, (-cnt, c))
while (maxheap):
cnt_c, c = heappop(maxheap)
if (not maxheap):
result.append(
result += c;
cnt_d, d = -1, '$' # dummy, won't be added to maxheap
else:
cnt_d, d = heappop(maxheap)
if result[-1] == c: # make sure to alternate
result.extend([d, c])
else:
result.extend([c, d])
cnt_c += 1
cnt_d += 1
if cnt_c != 0:
heappush(result, (cnt_c, c))
if cnt_d != 0:
heappush(result, (cnt_d, d))
print(''.join(result))
```roughly translated from my c++, might be weird because of that
E: actually they used a normal queue, so what I have has an extra `log`... ehh still passed tho who cares
E2: wait no I don't, the heap has `sigma` elements max, hell yeah
Yeah I think that’s the good way of approaching it
I should decompose 1000=2^3*5^3
And build the number
3^4 5^4 7^4 11 13 17
I feel like it’d work
But I’m not sure
Ah wait
I have way too many odd divisors
I think unless n or k get really large, then you should be fine doing dumb crap once you reach here
- gen a list of primes (sieve)
- for target
k, you need divisorsk-1, so choose the first ~(k-1)//2primes, generate candidates(can be dumb), filter those that actually havek-1divisors, then take themin
I had used 2 my bad
We only take primes that aren’t even
Otherwise we don’t create new odd divisors
And my formula was wrong lol, corrected it for posterity 💀
But all in all it works
noice
how would a normal queue work?
well I just passed with the max_heap
Idk but I just realized that my time complexity is the same as theirs, cause there's a max of sigma elements in the heap
I feel kinda dumb that I give up on that so easily
but it is my first ever time doing contests
how many did you get right?
in my first time, I only got 1 lol
I solved the first two and spent quite some time on C
then I just feel like I can't implement this even though I know the solution
I feel like the problem style is really different to what I have seen before
is fine
you'll do better 2nd time
...anyway I gotta sleep
i cycled the unique characters, that passed (eg aabcbac -> abc abc a)
oh right, that's probably better
Does anybody know of a good article/video on implementing and LL(1) parser for simple arithmetic grammar. I.E. ```
Expr -> Sum
Sum -> Prod ( '+' Prod)*
Prod -> NUMBER ( '' NUMBER )
(non-recursive)
should I learn oop first and then start dsa
Try to understand the Editorial. The minimize sum a_i * a_(i+1) expression.
After that, solving the problem is kind of trivial. Just alternate characters as often as possible.
I did not come up with the expression but kinda arrived at the conclusion by some guessing, the problem is that I couldn't write the code at that moment
do ppl really fully prove their solution like the editoral during contests?
people usually motivate it enough for them to be convinced that it should work
That depends. There are problems where people typically don't prove their results
- Greedy solutions
- Solved using patterns and guessing
For problem setters, 2. is a huge issue. A problem meant to be really difficult can turn out to be easy to guess the solution to
Especially using tools like oeis.org
Here is a problem I created on this theme https://open.kattis.com/problems/patrickstriangle
It was a real pain trying to make a problem like this
and not make it guessable using oeis
what does this have to do with oeis and guessing the solution
I don't quite understand it
are there problems really just asking you to find some specific sequences?
Yes, tons of combinatorial problems work like that
Do you not understand the problem itself?
no I mean this
Suppose the problem instead of asking for "Patrick's triangle" asked about the following triangle
This one is just straight up on oeis https://oeis.org/search?q=1+2+2+3+4+3+4+7+7+4&language=english&go=Search
So solving this version of the problem would have been trivial by guessing
Patrick's triangle was my attempt at making a problem of this kind, but without making it too easy to guess the answer
There is still a pretty simple pattern, but it is not super obvious
Another really cool example of guessing sequences is Berlekamp-Massey's algorithm https://codeforces.com/blog/entry/61306
Using it, your program can find linear reccurences (think fibonacci numbers)
This is surprisingly powerful. For example, the sequence #paths between s and t of length k, k = 1,2,3,4,... in an undirected graph, can be found using it
I belive it is the fastest algorithm out there for finding this sequence
The method works like this:
- Find the first 2*n numbers in the sequence with brute force
- Plug the numbers in Berlekamp-Massey
Here is a problem on that theme https://open.kattis.com/problems/forgottenhomework
It becomes easier yeah
Oop is the basics
It makes it easier to implement the dsa
idk how much it would help, most stuff is working with lists/dicts/whatever of built-in types
how hard is random forest?
As hard as it sounds
Hi everyone
you are literally looking at it
the string gets citisenship in british columbia?
but it gets highlighted like the b prefix
so it was in python at one point
when you change it it just gets red
bc get treated differently
and it says it is not supported by python 3.12
not python in general
please if someone knows tell me
there is no such string prefix as bc
and never was
and it is not planned
if you are worried about it, go report a bug to pycharm
Am I allowed to post code I need help with here or is that not allowed?
Or can I post a link to a reddit post for brevity?
sure, do it like so
```py
code in here
```
!paste or here, if it's too large for discord
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
I got help with pasting on main help channel. Just finisheed it. Writing question for here now. Thanx though.
So, the question. I'm trying to get alpaca dataframes to play with backtesting.py. I've tried to trim and match the dataframes as best as I could get but I'm now stuck.
Link to the paste.
https://paste.pythondiscord.com/U4FA
not familiar with those, but that is prob more suited for #data-science-and-ml for future ref
It deals with trading algos. It's just data structure issues.
So not here?
well, data structure as in dataframes right?
I'd say the 'data structs' in this channel are more like binary search trees, hashsets, etc
ok. I'll try there then
anyone know a simple python code that finds the critical path from a precedence/activity table (which suites micropython)?
check out what i made
I need the code that does the following (chatGPT cannot really do it):
- I have a mapping (126 data points) from 2d pixel coordiantes to 3d world coordinate
- I need code to find the extrinsic matrix transformation (I have already the intrinsic)
Can anyone suggest an algorithm that will separate the parts of different colors in this picture with straight lines? 4 pieces, none of the pieces have a completely certain intensity value.
what algorithm should I use to create an eficcient text search engine. I want to be able to find a string a substring, i.e. hello from ello.
and the index dosent need to be regenerated often, because the text is a bunch of books that are the same...
I can also implement it using another language and create python extension...
@brazen python Use lucene. It has python library as well.
thanks looks pretty good (meets all the requirements), I need to find some benchmarks now
do you know of any other alternatives I can compare too?
@brazen python From Rust: https://github.com/quickwit-oss/tantivy The idea is from lucene and written in rust lang.
https://github.com/terrier-org/terrier-core It has python library.
There would be many others in this domain. However for text search I would look at lucene and unless having any specific problem.
Solr and Elasticsearch is application based on Lucene.
For a autocomplete type search, I would be using a Trie data structure.
wow thanks so much for the resources
I was just looking for some Rust alternatives lol
is there an "off the shelf" library for autocomplete search?
All those library/application has the autocomplete feature.
There is another rust application: https://github.com/meilisearch/meilisearch
oh yeah right
I was looking for this one again, thanks
Guys can someone suggest me any project idea where I can leverage DS Algo concepts? Intending to level up my resume
hello i'm just starting leetcoding and i was wondering which algorithms do i need to know to begin
update on my TSP algorithm
I don't know if this question can be asked there but I try to do some abstraction with ABC and I get a attribute-defined-outside-init / W0201 from pylint when I want to set variable in the child class :/ Is it mandatory to add the variable in the init of the child class ?
not really the right channel, but it's probably a good idea to do so
otherwise you have this awkward situation where an attribute might or might not exist
probably better to initialize it to some value
like
class A(ABC):
def __init__(self):
self.x = None
class B(A):
def parse_args(self, args):
self.x = args.x
and in parse_args it says that I need to define x in init
I initialize it in the parent class
maybe ask in #python-discussion
I can't remember the exact semantics here
it's quite possible it's really a false positive
its geany IDE
are you using a genetic algorithm here ?
its a mix between genetic algorithm and a few manual algorithms
but most of the work is done using the manual algorithms
the best algorithm i have right now is one where it takes 2 points and checks if reversing any subset between those two points is a shorter distance
it goes like
for A in range(numnodes):
for B in range(A,numnodes):
path = currentpath.copy()
cdist = dist(path)
path[A:B] = reversed(path[A:B])
rdist = dist(path)
if rdist<cdist:
currentpath = path.copy()
sorta like that?
thats nice, its just like doing a mutation on a route in a genetic algorithm
i worked on a solution once which i read in the book "Math adventures with python" , it used an initial population of random routes and kept mutating and selecting the top routes to breed the population again, you can see it here : https://github.com/MrElyazid/MathAdventures/tree/main/geneticAlgorithms/TSP
TSP is cool !
Anyone good at graphs and wants to help? I have a question about testcase of todays graph problem, 2699.
are there some books/videos where the algorithms and especially runtimes (O notation code) are discussed with the help of python code examples?
from typing import List
from collections import deque
class Solution:
def shortestPathBinaryMatrix(self, grid: List[List[int]]) -> int:
ROWS, COLS = len(grid), len(grid[0])
if grid[0][0] == 1 or grid[ROWS - 1][COLS - 1] == 1:
return -1
visit = set()
queue = deque()
queue.append((0, 0))
visit.add((0, 0))
length = 1
while queue:
for _ in range(len(queue)):
r, c = queue.popleft()
if r == ROWS - 1 and c == COLS - 1:
return length
neighbors = [[0, 1], [0, -1], [1, 0], [-1, 0], [1, 1], [-1, -1], [-1, 1], [1, -1]]
for dr, dc in neighbors:
if (min(r + dr, c + dc) < 0 or
r + dr == ROWS or c + dc == COLS or
(r + dr, c + dc) in visit or grid[r + dr][c + dc] == 1):
continue
queue.append((r + dr, c + dc))
visit.add((r + dr, c + dc))
length += 1
return -1
im trying to understand something as why are we doing min(r + dr, c +dc) < 0
shouldnt we just if r + dr < 0 or c + dc < 0 instead
like how using that min function is simplying the process
OK
nvm ok ya that makes sense
cause if the smaller value is bigger than 0 then ofc the other one would be bigger than 0 too and if even one of the cordinates is out of bound we dont care about the other one
lol
still i feel like this is complicating the code for first time learners
but ya that makes sense
I often hear that modulus is slow
But I'm struggling to reproduce it locally
even on big numbers
(100,000,000 runs)
makes sense that %2 is fast
but %7524 should be really slow, shouldn't it? I was expecting at least a *2 factor
i'm on Python (3.12.4) not Pypy so there shouldn't be any sort of JIT magic
I do get a 'big' jump for integer division, but again, it's not even that big
Python is not the right place for benchmarking low level operations
But I do find some of your results weird
for sure but does that mean that in Python there isn't that much of a difference?
that's surprising to me
// and % should internally be identical
I'm running it on colab to check if my machine is being weird
but colab is 3.8 I think
Btw have you heard of divmod()
it does both right
Ye
a = 1234**20 + 5
%timeit a + 1234
%timeit a * 1234
%timeit a / 1234
%timeit a // 1234
%timeit a % 1234
23.1 ns ± 0.141 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
33.1 ns ± 0.737 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
55.9 ns ± 0.453 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
51.7 ns ± 0.758 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.8 ns ± 0.615 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
But like, from my understanding, // and % are both using divmod internally
more than likely you're mostly just measuring noise
well averaging runs on 10^7 loops or so should be good enough right?
is there any weird byte compiling stuff going on?
that's why I didn't use pypy
fear of JIT messing up with me
i get similar results with your numbers
modulus faster than integer division too
I also tried swapping the operations to see if the first ones were slower due to warm up or whatever but nah
it's fairly consistent
Just note that 1234**20 is huge, but 1234**20%1234 is tiny
ok but small numbers won't have much difference
because there's barely any operations
so i end up with the conclusion
mod isn't that much of a problem in python
compared to addition
What
it's running
a = 3*20 + 5
%timeit a + 7
%timeit a * 7
%timeit a / 7
%timeit a // 7
%timeit a % 7
gets me
12.8 ns ± 0.131 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
17.6 ns ± 0.262 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
19.8 ns ± 0.0741 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
14.9 ns ± 0.18 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
16.8 ns ± 0.69 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
it makes sense
but the difference is so small
import random
random.seed(42)
a = random.randint(1234**20, 1234**21)
b = random.randint(1234**20, 1234**21)
print(a, '\n', b)
%timeit a + b
%timeit a * b
%timeit a / b
%timeit a // b
%timeit a % b
23573866323295538555680340564482258280076714391144691590903200157
57500839459739648747853969033792067870446155720047493529779880425
26 ns ± 0.385 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
109 ns ± 1.04 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
118 ns ± 1.83 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
20.1 ns ± 0.211 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
20.7 ns ± 0.135 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
actually that's prob cheating again cause a < b
after swapping a, b around:
25.7 ns ± 0.32 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
108 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
117 ns ± 1.81 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
72.2 ns ± 0.512 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
69.8 ns ± 0.951 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```this seems more inline with the assessment that `// and % both use divmod`
the numbers you're showing us are like
insanely big
although the conclusions are to be fair more consistent with what I'd expect
Btw what times do you get with timeit pass
good question
a = 3*20 + 5
%timeit b = a + 7
%timeit b = a * 7
%timeit b = a / 7
%timeit b = a // 7
%timeit b = a % 7
%timeit pass
I'll run with assignment too
in case if it's not assigned it's ignored
although the fact that // is a bit slower than + consistently tells me I'm not only measuring noise
on my pc %timeit pass is 5.24 ns ± 0.0547 ns
13.2 ns ± 0.298 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
16.5 ns ± 0.228 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
19.1 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
14.1 ns ± 0.192 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
15.6 ns ± 0.195 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
4.01 ns ± 0.0306 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
small numbers: a small (significant but small) difference
very very very big numbers: fair enough
I'm a bit disappointed or at least wasn't expecting that
11 ns ± 0.0669 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
again, honestly quite aligned with my intuition
but the gaps aren't as much as i'd have expected
Unitary + being slow?
well it doesn't know in advance a is positive
What
ah no
i'm stupid
for some reason b = a is faster
and i can reproduce
that one seems counterintuitive fair enough
Can you bench PyPy too?
yeah
Pass being 3rd slowest 
if it's getting compiled away, maybe you want to interleave say print(b)s between each %timeit or smthn? idk how pypy works
no like
%timeit b = a + 7
print(b)
$timeit b = a * 7
...
```etc
ah
I usually try benchmarking something like
b = 0
for i in range(10**6):
b = (b + i) % a
That way, pypy cant really cheat
I'm more happy with that
Still Idk
(it's 10000000 times, i made a typo)
in Python %2 not being faster shocks me
Pypy ten times faster but %2 ain't faster
& 1 is so much faster
than % 2
the compiled couldn't have overwritten % 2 by & 1 ??
proof this works on codeforces too (&1 faster than %2)
I would not trust those numbers on cf
it does (seem to) work locally though
I guess it's not that important
don't even know why I got into looking at this
I would just not trust cf numbers in general
I should crawl every submission that uses % 2, and re run it with & 1
and plot the distributions
and submit it to pypy or the python team
😆
although I'm sure they thought about it already and have a good reason why they didn't replace %2 by &1
Technically there can be a difference between the two
If the thing on the left is some homemade class
fair
You're assuming you're working with integers. The AST optimizer doesn't investigate beyond names...
Ah that’s why in c++ they manage to get to speed up
Because it’s strongly typed?
That's pretty much the case for everything
&1 only works with int
Command "1" is not found
lol
>>> 1&1
1
>>> 1.0&1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'float' and 'int'
i usually use &1 anyway though
!e
print(1.0%1)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0.0
!e
print(2.0%1)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0.0
what is modulus with floats
it's 2
I'm stupid
its the same for integers and floats
but & is bitwise AND
!e
print(1.0%2)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
1.0
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0.0
this used in cellular automata also?? can you give the nodes more autonomy even?
damn thats a cool post
no
it uses a genetic algorithm
what are these cursed param names
In 1931 the International Commission on Illumination (CIE) published the CIE 1931 color spaces which define the relationship between the visible spectrum and the visual sensation of specific colors by human color vision. The CIE color spaces are mathematical models that create a "standard observer", which attempts to predict the perception of un...
im just copying them from this
this thing is gonna be really confusing anyway what with all the magic numbers, so i figured i might as well keep it close
you're trying to generate random colors that look good?
I swear I've unironically seen a newer scheme cooked up by 1 random guy that was proved to be better, and there was a website to test it
no
im trying to convert wavelengths to color
and i think i know the one you're talking about, OKLAB?
i implemented that one already a year or two ago
ye that one, nice
its pretty slow but it works
i can try optimizing it some time
damn its off i think
heres with better scaling
its supposed to look like this
oh one of my signs was of
got it
ok i dont completely underestand the difference between dfs and bfs when it comes to matrix
it makes sense to me in trees
i might need to study the topic more in depth
ok nvm makes sense to me now
backtracking
ok ya its the same thing as the tree idk why i was being dumb
code
!e
print(2.0%2)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0.0
can anyone suggest an algorithm that draws straight lines based on this picture, giving the starting and ending points of the lines?
I found an algorithm that draws lines, but the lines are not straight enough and do not give start and end points
I don't quite understand the third example output of this question
https://codeforces.com/contest/2008/problem/F
(I am not sure if I can ask here if the contest is just over and it is in hacking phase btw)
So for n = 5 and a = [1, 2, 3, 4, 5] the sum of products should be 85 and the answer should be 85 * 5^-1 mod 10**9 + 7? I tried multiple library codes to calculate the mod inverse of 5^-1 mod 10**9 + 7 and it should be 400000003. I got the output 17 at the end but how is the answer 500000012? The mod inverse is surely right so Idk what happens here
My code just in case
def pow_mod(x, n, m):
y = 1
while n > 0:
if n % 2:
y = y * x % m
n //= 2
x = x * x % m
return y
for _ in range(int(input())):
n = int(input())
a = [int(i) for i in input().split()]
s = sum(a)
MOD = 10 ** 9 + 7
ans = 0
for i, j in enumerate(a):
s -= j
ans = (ans + s * j % MOD) % MOD
print(ans * pow_mod(n, MOD - 2, MOD) % MOD)
Q isnt n in general
(almost)
tangent, but why not use the builtin pow?
n(n-1) / 2 actually
I vaguely rmb pow can do a**n mod m but I didn't see it when I checked the docs
pow(x, -1, m) works
well the signature is pow(base, exp, mod=None), so ye you can do that
tbh I'm not sure if pow needs to exist without the mod capabilities (**)
math.pow:
ok but actually what is the use of that
I'd assume compatibility...?
Unlike the built-in ** operator, math.pow() converts both its arguments to type float. Use ** or the built-in pow() function for computing exact integer powers.
dunno
I guess I have lost my common sense when doing contests
Math.pow is only there because it is part of cmath
I've seen many times how people with many import * accidentally call math.pow, and get weird bugs from that
(math.h in C, cmath in C++)
(cmath in python is a different thing)
Consider this problem with dijkstra
In an other post the user ask this
In the solution, the user mention that he will just create a dummy node and connect to the sources with weight 0 and run dijkstra from there to know who is the actual shortest source
But my lecturer say it wouldn't work fully
I don't understand why not
Can anyone explain?
what's the problem your lecturer gave you?
@narrow mica this is what he say
But I don't get how is this possible
A and B are the sources
S is the node introduced
T is the single destination
I meant to ask what the problem you're trying to solve is
multi-source shortest paths?
Yes
idk what this is on about then
to reiterate, the idea is to make a virtual node S and connect S with every source node with an edge of weight 0
then just run dijkstra from S to T, nothing should go wrong unless the graph has negative edges (then dijkstra won't work to begin with)
with a correct dijkstra, if S - A - X - T is faster than S - B - X - T, then S - A - X - T would appear be in the queue first so what they said is literally not going to happen
and tbh you don't even need the virtual node really, just a slight modification to dijkstra
def dijk(adj_list: list[list[Edge]], sources: list[int]):
dists = [inf] * len(adj_list)
for s in sources:
dists[s] = 0
qu = heapq.heapify(sources)
'proceed with dijkstra as normal'
That's a standard technique that definitely works.
However, had you used bfs instead of Dijkstra then 0 weight edges wouldn't be allowed
Bfs is the only case I know of where this technique "fails"
Maybe your teacher tried to tell you that his proof of Dijkstra assumes edge weights > 0?
That doesn't mean you are wrong, it just means that the proof you've been shown for Dijkstra is not a proof that your technique for multi source work's.
But this is just purely speculations on my part
Space complexity for the storage (only a constant amount of internal memory should be used during the search)```
I answered that we can arrange the in alphabetical order of the words and the save it in a file by turning it into like an array where each element contains the words and a character position to its children. Or use a formula to get there given a balanced tree. The point is I don't really understand the "space complexity" problem. How can you calculate a space complexity for a file on a disc? According to google: the definition goes: ```the total space taken by the algorithm with respect to the input size.```
Hi everyone,
I'm currently working on a lab for the algo, data structures and complexity course, which involves creating a concordance data structure on the file system and implementing a search program that retrieves word occurrences along with their surrounding context. For Task 3, we need to evaluate different data structures (binary search tree, sorted array, hash table, trie, and lazy hashing) for implementing the concordance on file. I need help with the following points:
-
Implementation Details: How would you go about implementing these data structures on file, especially considering we should use as little internal memory as possible? Are there any resources or examples that show how to handle pointers or references on disk, especially when dealing with large text files?
-
Performance Considerations: The task requires us to compare the speed (number of file reads and seeks per search), memory complexity for file storage, and the ease of construction and storage on file. Does anyone have insights or experience on which data structures are most efficient in these aspects? I'm particularly struggling to understand how to keep the search fast when the data is not in memory.
-
Why Lazy Hashing (Latmanshashning)?: In this lab, we are encouraged to use lazy hashing, also known as "latmanshashning." This method hashes only on the first three letters of the search key and then uses binary search to refine the results. It is particularly suited for searches with few disk accesses in large texts when the index can't fit in primary memory. I'm trying to fully grasp why this approach is preferred over other data structures like tries or hash tables. I understand that it maintains constant memory complexity, but I’m not clear on how it compares practically with the other options in terms of implementation complexity and speed.
are you in the same lab with the last person lol
wjat's mongodb and should i store my database in the hosting provider or mongodb??
if you don't know, you probably don't need it
and not related to this channel
Can someone help me with this problem. So, I’m supposed to modify a linear probe function for a hash map adt
The purpose of the original linear probe function is to find the next empty slot in an array to add a key value pair. It accepts a string key, hashes it by performing some function on the characters of the string key and returns an integer representing an index of the array. But since the hashing isn’t perfect, collisions are bound to happen. For instance if indices 5,6,7,8,9 are occupied and a new key to be added hashes to index 5, linear probe function when called will have to loop about 5 times (+1 increment) to get to index 10 which is an empty slot and then return that index. So the requirement for the upgrade is to make a hash function that returns a dynamic step size for each key so that the linear probe function can find an empty slot faster since the increment is no longer just +1. How should I go about doing this?
Does anyone know big o analysis?
if you have a question you should ask it directly rather than asking general things like that
Ive seen several people state
Performing DFS on a directed acyclic graph and sorting the vertices in descending order of finish times gives a topological ordering of vertices.
—
But if I have « two clusters »
Let’s say a->b->c and d->e
The finishing times are
a 3
b 2
c 1
d 2
e 1
And b and d aren’t supposed to be at the same level of a topological order; it should be a and d that are
Im struggling to make sense of it
wdym by 'level' and 'finishing time'?
for your graph, a b c d e is a valid topological ordering, so is d e a b c or d a b c e or many others
Ahhh
I also don't see how there can be the same finishing time
dfs should visit nodes 1 at a time
Yes I was convinced a and d had to be at the beginning
But a can be before or after d in the topological ordering
I had forgotten the true definition of a topological order
Makes sense now
I thought we’d reset the timer when running a new dfs at a new cluster
All clear now sorry I had gotten confused with several things
np
Note that depending on the order you start doing the DFS from, you might find for example b->c, then d->e, then a
The thing you call "clusters" are usually called weakly connected components
has anyone this being used before? or do you use another technique to check if one child doesnt exist
Can work but I find the basic approach easier if not node.left or not node.right. Why do you ask? It's not particularly important how you check so long as the logic is correct.
in this case it's slightly different because the original checks for exactly 1 node being None
Ah right, XOR.
Thanks! Was looking for the directed version of « connected components » when writing my message
I would not call that variable empty. It is a very confusing name
yes that I think would be the best way to go about it
@lru_cache(None)
def dfs(n_: int, k_: int) -> int:
# n_ number of planes in that direction
# k decay age
if n_ == 0:
return 1
if k_ == 1:
return 1
ans: int = 0
for new_n in range(n_-1, -1, -1):
number_of_planes_in_other_direction: int = n-new_n-1
ans += dfs(number_of_planes_in_other_direction, k_-1)
ans %= MOD
return (ans + 1) % MOD
Is it fair to say that this function call is of complexity O(k*n) O(k^n)?
I'm struggling to analyse functions with memoization
Each call to this function will generate n calls to a function with k = k-1
Even if those calls were to magically take O(1) thanks to an already filled, I feel like it's more on the order of k^n than k*n
Would this fairly simple analysis be correct?
how would it be n^k?
Sorry k^n
At k we have n calls, each of those n calls (we are at k - 1) call at most n other times dfs, so we already at n^2 calls
how many unique arguments are there?
Two
err, unique sets of parameters
Ah k*n
right
Which is why I initially thought memoization would work
But I figured I might call a lot of time
the cache
And I thought I ended up calling the cache at most k^n time
By this logic
(I didn't include the link to the problem statement because i'm not really interested in an answer to the problem but rather how to properly analyse my (faulty since it TLE) function - k and n are both integers that are <= 1000)
with the memoization this feels something like O(n² k)
ah yes
that actually makes more sense
because they don't all generate calls that will also generate other calls if we hit the cache (idk if it makes sense to you but it does to me)
i'm dumb
things will exit early due to the caching yeah
yes
ok that actually makes sense
It feels hard to analyse complexity when caching is involved
I could feel it was not k*n because I was TLEing but also not k^n because k = 100 and n = 100 took like 1 second on my computer locally
this thing will have some nice closed form formula
the result
Could be, I'm going to keep searching for it though (no spoiler :D)
(for reference it was that problem https://codeforces.com/problemset/problem/1498/C)
is there anything i can use for parsing and writing a format (KSP's '.cfg' format) that python doesnt natively have tools for, or should i just use strings and stuff
heres a sample of what that looks like
my main concern is how i'm gonna do tag nesting
for writing, maybe using f string BS and recursion?
one thing i specifically want to do but idk the name for it and idk the most efficient way to do it is sort of parsing nested stuff in strings
i did it once when i made a brainfuck interpreter to do loops but i dont remember where i found the algorithm for that from
feel free to contribute your solutions in any other language, aside JavaScript https://github.com/O-BERNARDOFOEGBU/DSA-solutions
also, don't forget to leave a star
Welcome to my repository of Data Structure and Algorithm (DSA) solutions! Here, you'll find my personal solutions to various DSA problems as listed on Algomap.io, a free Data Structure and ...
I watched the discrete math videos and so I hope that can help them out
Am about to take discrete math and am wondering what to expect
Did you see the pinned MIT OCW video? A skim over the topics might be helpful, just to get familiar with teh terms and ideas.
Can you give me the link as I watched Trefor bazett
Thank you
Did anybody here take it online and if so how was your experience with it
can someone help me im in 9th grade and im starting pyhton CSP but the teacher gave me like assignments that i dont even know anyhting about, does anyone know like the best website to learn pyhton on and like a one that would actually stick in ur memory, and how do i like make my code more accurate and shorter
I have some question on this problem
https://codeforces.com/problemset/problem/2010/C2
The easy version of this only has the string length up to 100, so I just brute force all the possible overlaps starting from 1 to n / 2. Although here the string length is up to 4 * 10^5, I think you can still do the same thing and would only need some method of fast substring comparison. Is there a better way to solve this problem?
you could use a polynomial hash modulo some large prime, and then verify with a string comparison when you've found a candidate with the same hash
yes that's what I have thought
but you would still try all possible positions
also the site just broke for me and can't render latex
for the easy version I try every substring starting from i = 1 to i = (n + 1) / 2
where s[i] = s[0]
so the hard version is just that but use hash to compare strings
isn't the criterion that a suffix of length k is equal to a prefix of length k?
well Idk about suffix stuff
well and k is large enough that a prefix suffix covers the entire string
they would overlap, yeah
that's the entire thing of the problem if I read it correctly
alr
the point of the polynomial hash is that you don't need to recompute the entire thing each time
if you have the hash of the k-1th prefix, you don't need to iterate through all k characters to compute the hash of the next one
it's fairly straightforward, when doing it modulo a prime the one thing you need to know about is the modular inverse of the base mod the prime
since you can't do regular division anymore
actually...
does that even show up in this case?
it does when you do a sliding window
I don't think so since you need the suffix and prefix to have powers in the same direction
but here we just expand a prefix/suffix
I thought the same as well
prefix extending is just adding coefficient * base ** power which you can compute along the way
and suffix should just be multiplying base and then adding coefficient
all modulo ofc
yup
how do you train a weak learn for gradient boosting?
Here is how you do it
MOD = 2**61 - 1
x = random.randrange(MOD)
hashes = [0]
powers = [1]
for c in S:
hashes.append((hashes[-1] * x + c) % MOD)
powers.append( powers[-1] * x % MOD)
# Here `hashes[k]` is the hash of `S[:k]`
# Compute rolling hash of S[l:r] in O(1) time
def rolling_hash(l, r):
return (hashes[r] - hashes[l] * powers[r - l]) % MOD
what is the x for ?
Rolling hash is a type of polynomial hash.
If S = [5, 10, 2], then the corresponding polynomial is simply 5*x^2 + 10*x + 2
To actually compute the hash, you need to fix a value of x
Yes
Think about this, for which x are two polynomials P_1(x) and P_2(x) the same (assuming they are different polynomials)?
Thats the same as asking, for which x is the polynomial P_1(x) - P_2(x) zero?
so if my x is fixed, they can find another random polynomial such that x computes identical hash for both. And construct another wrong string out of it?
Yes with birthday paradox
Create sqrt(2^61 - 1) random strings
so you should choose very large primes
By birthday paradox, likely two will share the same hash
Yes. 10^9 + 7 is not enough
(sorry for necroing but)
I'm not sure how that'd work? you need 2n numbers for berlekamp-massey to work, but the problem only gives you 2n-1 numbers?
That's very much intendend 
so... intended for b-m to not work, you mean?
There is a way to find one more number...
BM definitely works
hm... well, either I'm really dumb, or my BM implementation is incorrect(which would also imply I'm really dumb but shush)
(I think ||the 0th number is 1 if i == j else 0||?)
well then, something's wrong with my bm impl ig
With probability 2^61, yes
It doesn't matter
There are always gonna be some bad values for x
(another way is to choose multiple xs and check all hashes are the same)
what values are bad in general?
Just a warning about this. Two smaller modulo with a fixed x is really dangerous.
That's easy to hack
It is relatively easy to create a (non-zero) string that hashes to 0 in both mods
i c
Small mod is bad because of birthday paradox reasons
Fixed x is bad because people can construct "evil" strings that for example hash to 0
Non-prime mods are also bad. Most importantaly, for mod 2^64 there are strings that always hash to 0, no matter the value of x
(this sequence https://en.wikipedia.org/wiki/Thue–Morse_sequence)
if I calculate with positive coefficients then it's wrong
if I calculate with negative coefficients and set coeff = -coeff at the end, it's right
wut
how big of an issue is hacking even for this task? a false positive you can discard in O(n), I would assume you can't generate a ton of collisions even if I gave you a fixed x
Well if I make a short string with hash = 0
Then I can just repeat it (or/and combine different short strings with hash=0)
Then I could make tons of false collisions
Since the rolling hash function would just return 0
kinda feels like comparing overlapping prefix and suffix would make it harder to construct collisions 
conceptually I think this would be the solution
for n in reversed(range(1, len(s))):
if polyhash(s[:n]) == polyhash(s[-n:]):
if s[:n] == s[-n:]:
...found it...
I could make basically all of those if polyhash(s[:n]) == polyhash(s[-n:]): return true
While every s[:n] == s[-n:]: returns false

well I got mle with the 4 * 10^5 testcase
I need to store the hash and possibly a slice of the string
can you link your submission
It shouldn't MLE I think
Ah I found your submission
pre_hash.append(pre_hash[-1] * x + c % MOD) this is wrong
You need to mod everything
There is a missing parenthesis
Yeah I figured
https://codeforces.com/contest/2010/submission/279953225 It gets AC with that fixed
kinda off topic but the site couldn't render latex for me properly, but it works when I am in incognito mode or use other browsers (I was using firefox)
There is a "complete problemset" button
Maybe that works
nope
found a blog post about this but I don't want to click on the sus link
https://codeforces.com/blog/entry/103827
when trying tokenize these lines, i get an error because of a newline in the middle of a string. is there a way i can fix this, or should i just replace all new lines with a space?
your view UNTIL you can only see a tiny bit of green."
"TF_VR_SetIpd" "If you know your IPD, you can set it directly here.
If not, leave this field alone."
"TF_VR_SetEyeRelief" "If you know your eye relief, you can set it directly here.
If not, leave this field alone."
"TF_VR_UseControls" "Cursor keys to move, shift=faster, enter for next line.
D-pad to move, triggers=faster, A button for next line."```
this will not go through, and *will* result in a token error
Here is the code I'm attempting to use (https://stackoverflow.com/a/59143007):
def parse_valve_format(data):
dest = {}
stack = [dest]
for tok in tokenize.tokenize(io.BytesIO(data.encode()).readline):
if tok.type == token.STRING:
ts = ast.literal_eval(tok.string)
if isinstance(stack[-1], str):
# already a string on the stack?
# this has to be a key-value setting
key = stack.pop(-1)
stack[-1][key] = ts
else:
# otherwise assume we'll find a } soon
stack.append(ts)
elif tok.type == token.OP and tok.string == "{":
obj = {}
key = stack.pop(-1)
stack[-1][key] = obj
stack.append(obj)
elif tok.type == token.OP and tok.string == "}":
assert isinstance(stack[-1], dict), "stray }"
stack.pop(-1)
return dest
I cannot attach the file because it is too long, but that section of text is when the breakage occurs
I love message queues
is there a better channel suited for this? i'm losing my mind trying to find a way around this, i've lost six hours already
maybe open a help post #❓|how-to-get-help and ask in #python-discussion
hey guys
i js started coding yesterday
im watching a tutorial
but im wondering
do i watch the vid then start coding or do i code at the same time while hes explaining
watch and code at the same time so that you have a better understanding
hi guys, im trying to construct a graph in in O(E) time and aux space when i have given a list of edges wirh weight to construct them, is it possible? i cant use external libraries or set/hashmap
I have been stuck on this forever, please help me
Can you be more specific?
Wdym by construct a graph using a list of edges with weight
A list of edges is technically already a way to represent a graph
which tutorial
Learn Python basics in 1 hour! ⚡ This beginner-friendly tutorial will get you coding fast.
🚀 Want to dive deeper? Check out my Python mastery course:
- English edition: https://mosh.link/python-course
- Hindi (हिन्दी) edition: https://mosh.link/python-course-hindi
- Subscribe for more Python tutorials like this: https://goo.gl/6PYaGF
📕 Get th...
theres a 6 hour one but idk if i should watch it when this vid exists
this is the power of sets
i dug up some old code of mine to render marching squares and found it was really laggy (both of these are doing 80 particles)
i did a thing here and there but the biggest performance boost i got was switching a list to a set
did you do existence checks in a list? 🥴
if by that you mean membership yes
at least you know better now 🙃
this is some pretty old code but yeah
at the time i didnt know
I actually had something very similar happen when i was doing space engineers blueprint creation where i was generating a sphere, but i couldnt be sure it wouldnt try placing a block twice so i had a list of visited positions
switching that to a set massively sped it up
a good rule of thumb is that if you do a membership check on something other than a set or a dict there is a good chance you're doing something questionable
i sometimes do strings for checking if a char is in that though usually its not for anything performance demanding
I didn't know this
A membership check is like if foo in list?
right
so if you do a bunch of lookups into a thing, you'll want that part to be fast
I see, thank you! Makes sense
did more optimization, it can handle 2000 particles without a ton of lag
fps isnt like 60 but its watchable
I am studying Python for 6 months for now and still don't know how to make good algorithm, ):
hi
uh
i dont' exactly need help with cs
but i need help w discrete math
i assume cs ppl know it best
need help negating biconditionals
Wow that's really cool
my bad, thought this was offtopic, good work
Can someone help me fix an error in my algorithm?
I'm trying to solve a multi-agent path planning question.
I've been stuck for 3 days 🥲
Hey, stuck on a problem here.
Problem statement:
You are given N strings consisting of symbols ( and ). Find whether there is a way to rearrange these strings in such a way that they form a balanced bracket sequence, if there is - print the order, else - print -1.
Solution discussion: https://discord.com/channels/267624335836053506/1281690496963706923
Code: https://paste.pythondiscord.com/UNYQ
Passes the sample test cases and fails on the very next test (I'm not given the test that it fails on). Also this code correctly solves all the test cases I could come up with...
Test samples + discussion: #1281883204416180316 message
is this a codeforces problem?
Here is one helpful advice, any bracket string ())((()())(() has an equivalent string )((, where the last string only consist of left-brackets first, and then right brackets
yeah, figured that out a while ago, represented them as 2 numbers (amount of ) and amount of ()
Another thing to notice, the string you place first must only be a ((( string
it is not (at least the statement isn't from it and I couldn't find it there)
Infact, all ((( strings should go first
and all )) should go last
Well ye
so only the )( need sorting
Well kinda
which I first came up with a complicated sorting and then a person from the last thread suggested a sorting by number of ")" - number of "("
my complex sorting got wrong answer on test 4, and his wrong answer on test 50..
@tight cedar yours*
I tried sorting by ")" now
I think you want to sort all strings with #"(" - #")" >= 0 by ordering them in increasing #")" order
Then do the opposite for the strings with #"(" - #")" < 0. Order them in decreasing #"(" order
def order(strings):
pos = []
neg = []
for S,i in enumerate(strings):
if S.count('(') >= S.count(')'):
pos.append(i)
else:
neg.append(i)
pos.sorted(key = (lambda i: S[i].count(')')))
neg.sorted(key = (lambda i: S[i].count('(')), reverse = True)
return pos + neg
This way it will start with the ((( strings
and then greedily add strings with more (((( than )))
what if we have ((( and it greedily adds ))))(((((((((
Note that I add the strings in increasing order of #")" count
If the first two strings in the order is (((, and then ))))(((((((((, then it is impossible to begin with
Only way to save this is if there was a string like )))((((
In that case, according to my greedy sorting rule, )))(((( would be added before ))))((((((((( (because )))(((( has fewer ")")
seems like this actually works
this is actually just my first solution but simplified
and it makes sense
but when will it be impossible tho
You just have to check if the greedy sorting gives a valid bracket string
def check_if_balanced(S):
dif = 0 # left - right
for c in S:
dif += (c == '(') - (c == ')')
if dif < 0:
return False
return dif == 0
return dif == 0 (I made that mistake when I wrote the same thing)
uhh not sure if I did anything wrong with this
def valid(s: str):
bal = 0
for i in s:
if i == "(":
bal += 1
else:
bal -= 1
if bal < 0:
return False
return bal == 0
for _ in range(int(input())):
N = int(input())
bracket_types = ([], [])
lines = []
total_bal = 0
for i in range(N):
s = input()
lines.append(s)
left = s.count("(")
right = len(s) - left
bal = left - right
# categorize the strings
if bal >= 0:
bracket_types[0].append((left, right, i + 1))
else:
bracket_types[1].append((left, right, i + 1))
if total_bal != 0:
print(-1)
continue
bracket_types[0].sort(key=lambda x: x[1])
bracket_types[1].sort(key=lambda x: x[0], reverse=True)
ans = []
for b in bracket_types:
for _, _, i in b:
ans.append(i)
seq = "".join(lines[i - 1] for i in ans)
if valid(seq):
print(*ans)
else:
print(-1)
I forgot about the total bal thing but that shouldn't matter too much
Looks correct
I would code it something like this
def check_if_balanced(S):
dif = 0 # left - right
for c in S:
dif += (c == '(') - (c == ')')
if dif < 0:
return False
return dif == 0
t = int(input())
for _ in range(t):
n = int(input())
S = [input() for _ in range(n)]
lcount = [s.count('(') for s in S]
rcount = [s.count(')') for s in S]
pos = [i for i in range(n) if lcount[i] >= rcount[i]]
neg = [i for i in range(n) if lcount[i] < rcount[i]]
pos.sort(key = rcount.__getitem__)
neg.sort(key = lcount.__getitem__, reverse = True)
order = pos + neg
if check_if_balanced(''.join(S[i] for i in order)):
print(*order) # Warning, 0-index print
else:
print(-1)
I just got this case
1
4
)()()(()()
))(
()))()
(((
where the program outputs -1 (it calculated 4 1 3 2 but the actual answer is 4 1 2 3)
and I just submitted my solution that I wrote, pajenegod's solution and your solution and they all got wrong answer on test 18
yeah this is more complicated than I thought
Oh I see the issue
The lcount and rcount should be on the reduced strings
The ones looking like )))(((
So you would need to keep track of the minimum balance ever reached
whats this for btw?
It checks if a () string is "balanced"
(())) is not balanced
(())() is balanced
oohhh
so the same amount of open and closed?
or does it nees to be logically open and closed?
is.. )((()) balanced?
no
well i have two ideas
val = input()
balanced = abs(val.count("(")-val.count(")"))*-1+1
and then the second idea
val = input()
balanced = sum([ x=="(" - x==")" for x in val])
i think that works?
either of these work?
oohhh interestingg
val = input()
balanced = (abs(val.count("(")-val.count(")"))*-1+1) and val[0]=="(" and val[-1]==")")
lol, what about this?
ooohh good point yeah
There are some funny algorithms that work
def check(S):
while '()' in S:
S = S.replace('()','')
return not S
!e
def check(S):
while '()' in S:
S = S.replace('()','')
return not S
print(check('())(()'))
:white_check_mark: Your 3.12 eval job has completed with return code 0.
False
But this one runs really slow
val = input()
count = 0
for x in val:
count+=x=="("
count-=x==")"
if count<0:
break
print(count==0)
think i got a solution
Thats just this algorithm^
oh lmaooo
I saw there's a interactive runner / stress tester on pyrival but I don't really know how to use it
The "standard" interactive runner out there is the one codejam used
https://codeforces.com/blog/entry/66578?#comment-512288 you can find the code in this comment
[8, 2, 30, 2, 6, 2, 4, 114, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 442, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 58, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 1738, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 58, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 202, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 58, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 74, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 42, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 34, 2, 4, 8, 16, 32, 6890, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 58, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 202, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 58, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8, 16, 74, 2, 6, 2, 4, 18, 2, 6, 2, 4, 10, 2, 4, 8, 26, 2, 6, 2, 4, 10, 2, 4, 8, 18, 2, 4, 8,
... (truncated
i calculated the binary differences between each of the possible states that parenthesis are balanced
e.g.
1 0 = ()
and the next closest is
1 0 1 0 = ()()
with a difference of eight
# CBS (Conflict-Based Search) algorithm for multi-agent pathfinding
def cbs(agents, rail, starts, goals, max_timestep):
paths = []
for start, goal in zip(starts, goals):
path = get_path(start, None, goal, rail, max_timestep)
if not path or len(path) <= 1:
return None # No valid path
paths.append(path)
root = CBSNode(paths=paths, conflicts=detect_conflicts(paths), cost=sum(len(path) for path in paths))
open_list = [root]
while open_list:
node = heapq.heappop(open_list)
if not node.conflicts:
return node.paths # Return the conflict-free paths
conflict = node.conflicts[0]
for agent in [conflict.agent1, conflict.agent2]:
new_paths = list(node.paths)
new_paths[agent] = get_path(starts[agent], None, goals[agent], rail, max_timestep)
if not new_paths[agent] or len(new_paths[agent]) <= 1:
continue # No valid path
new_conflicts = detect_conflicts(new_paths)
new_node = CBSNode(paths=new_paths, conflicts=new_conflicts, cost=sum(len(path) for path in new_paths))
heapq.heappush(open_list, new_node)
return None # If no solution is found, return None
For some reason, my attempt at conflict based search keeps failing.Somebody please help
I've been stuck for 3 days
This question concerns expanding Covid vaccines and reaching herd immunity in the US. Assume that herd immunity is reached and the pandemic ends when 80% of a country’s population is vaccinated. (We will assume for simplicity that only one vaccine is needed.) Assume the following:
Population of the US = 330 million
Number of people already vaccinated = 32 million
Current vaccination rate = 1.2 million per day
Target Time Period (in weeks) input by the user
A Daily Increase Rate of vaccines as a fixed percentage of the base vaccination rate. (i.e., if this parameter is entered as 10, then the number of vaccines administered will increase by 10% of 1.2 million, which is .12 million, every day. So, it will be 1.2, 1.32, 1.44, 1.56…. millions on successive days)
And compute and print the following results:
Percentage of population vaccinated by the end of the target time period if the current vaccination rate is increased by the daily increase rate every day.
My code to this question:
uspopulation = 330_000_000
no_of_vaccinated = 32_000_000
vaccine_rate = 1_200_000
target_time = int(input("Target time periods (in weeks): "))
target_time_indays = 7 * target_time
daily_increase = int(input("Daily vaccination increase rate (in %): "))
vaccine_rate = vaccine_rate + ((daily_increase/100) * vaccine_rate * target_time_indays)
percentage_vaccinated = ((no_of_vaccinated + (target_time_indays * vaccine_rate)) / (uspopulation)) * 100
print(f"Percentage of population vaccinated by the end of the target period ({target_time_indays} days or {target_time} week(s)) is {percentage_vaccinated:.2f}%")```
Where did i go wrong? Why is it wrong?
whats the algorithm called where you converge to a point by essentially multiplying and dividing an offset, like the attached image
wdym by multiplying and dividing by an offset
looks a bit like simulating the step response of a second order system
yeah, not sure what you mean here 
sounds a bit like binary searching an exponent, but the plots don't match any clean convergence process
In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real-valued function. The most basic version starts with a real-valued function f, its derivative f′, and an ini...
gpt said, but didn't learn it this semester
oh here's the algorithm i threw together to do it
i think it's bugged though lol, it takes like 300 steps to converge in some cases
def weird_converging_algorithm(n:float,mult:float=2)->float:
#converges to n²
x=0
target = n*n
steps = 0
values = []
v = 1
overunder = 0
for _ in range(n**2):
if math.isclose(x,target): return x,steps,values
elif x>target:
if overunder==-1: v/=mult
if overunder==1: v*=mult
overunder = 1
x-=v
elif x<target:
if overunder==-1: v*=mult
if overunder==1: v/=mult
overunder = -1
x+=v
x+=1
values.append(x)
steps +=1
return x,steps,values
this seems weirdly complicated 
well yeah its testing the idea
not sure in which cases this would be useful
compared to say just doing some binary search variant
e.g. if you want some multiplicative action you can binary search for an exponent
Does anyone have a py template of iterative segment tree for range set and range query?
If its anywhere its in Pyrival
What's the advantage of learning or implementing algorithms?
I was thinking of how to solve a specific issue
with altering lists while iterating through them
for example
take a list with a certain number of elements
you cannot be sure of what their types are, if all of them are the same type, and what their values are
how can I remove all instances of a particular element?
for example
in the list ['a', 'l', 'p', 'h', 'a'], I want to remove all instances of the string 'a'
if I try to do a for loop, I will get an out range error
to this day
the simplest way to solve this issue I've found is to do it like this
!e
foo = list('alpha')
print(foo)
while 'a' in foo:
foo.remove('a')
print(foo)
:white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | ['a', 'l', 'p', 'h', 'a']
002 | ['l', 'p', 'h']
what are your thoughts?
Build a new list, append to it everything from the old one except 'a'
out of range
Bad stuff happens when you try to iterate thru smthn while modifying it
yeah, that's precisely my point and reason for coming here
Well see my suggestion above
Alternatively, use list comprehension if you've learned that
It'd look something like
new_l = [e for e in old_l if e != 'a']
beat me to it
list comprehension took me a while to get
to wrap my head around it
now I get the jokes about making a list comprehension that's multiple lines long lol
The advantage of learning them is that you gain more knowledge for applying them or making new ones. The advantage of implementing them is that you learn how to program complicated things based on abstract ideas.
What kind of range query? There is a pretty big difference between sum vs min/max
range sum query is more complicated than range min (or max) query
Oh it was about Range Assign/Set and Point Query (value of a[i]) on an array filled with 0s initially.
I was able to do it after referring to a blog on cf
Then you don't need any lazy segment tree at all
Yea, I saw a different implementation. They kept a counter of updates and then traversed from leaf to root and assigned the value with max counter value as it would be the last assigned value.
To allow for assignments, I usually store "time-stamps" in the segment tree. Then I have a seperate list that can map time-stamps into values
Yes. But for more complicated things (like range sum query), you need to do something completely different
So the trick unfortunately doesn't generalize
Hello
Is there like an all encompassing straight forward ultimate 100 hours ADS course ?
as in algos & data structs? (usually it's DSA instead of ADS)
probably not
https://codeforces.com/gym/101341/problem/A Found the problem on codeforces
I probably know where this guy got this problem now xd
yep, same testcases
Please do not discuss the selection tasks with other people. All tasks must be completed independently.
If I want to find the longest and second longest paths in a weighted tree
I planned on doing dfs from a random node
Then get the longest path from that node
Go to the farthest node
And do dfs again
And get the two biggest paths
On my example, I start a dfs from 1 (a random node ||that I picked to prove my reasoning doesn't work||) and get the longest path 1-2-4


