#algos-and-data-structs

1 messages · Page 69 of 1

haughty mountain
#

that's all

#

no fiddling with ±1 or whatever

#

if the function is true for the midpoint, then that's l

#

if false it's r

#

because that's the invariant

fervent saddle
#

I can't think about things like this

vocal gorge
#

also need to carefully make sure in your head that (your loop condition is such that) mid will never be the same as l or r, since otherwise it's an infinite loop.

haughty mountain
#

the end condition is that l and r straddle the transition as close as possible

fervent saddle
#

it's too hard to think about this

haughty mountain
#

i.e. l is the last true one, r is the first false one

haughty mountain
fervent saddle
#

why did they graduate me when I can't even write binary search?

haughty mountain
#

you are looking at some property that goes
true, true, ..., true, true, false, false, ..., false

#

you are interested in exactly where things go from true to false

#

start l at some known true value, start r at some known false value

fervent saddle
haughty mountain
#

then by checking whether the midpoint is true/false you move the limits to maintain the very simple invariant

haughty mountain
fervent saddle
#

you need to mathematically know exactly what they'll be when left approaches right

#

I don't know, I don't know what I mean

#

I'm not good at math

haughty mountain
#

with the formulation it's all about the invariant

#

at the end l and r are adjacent, and since we maintained the invariant we know f(l) is true and f(r) is false

#

e.g. f(x) could be something like arr[x] < value

#

and then l at the end would be the last index where the array element is < than the value

fervent saddle
#

what's an invariant

opal oriole
#

If you are computing sqrt(x), you can guess and check. And if you guess some value and then square it, you can see if you undershot or overshot x (too low or too high), you can then make your next guess the midpoint, and repeat.

#

(This is faster than checking every value between 0 and x linearly)

fervent saddle
#

why is writing binary search so hard for me?

opal oriole
#

With step by step?

fervent saddle
#

understanding the general idea is easy, but I try to write it and it's failing to find the value, or going out of bounds, or going into infinite loops.

fervent saddle
opal oriole
fervent saddle
#

I've written thousands of lines of code

#

class Solution:
    def search(self, nums: List[int], target: int) -> int:

        low = 0
        high = len(nums) - 1

        while low <= high:
            
            mid = (low + high) // 2

            if nums[mid] < target:
                low = mid + 1
            elif nums[mid] > target:
                high = mid - 1
            else:
                return mid
        
        return -1

I finally sort of understood it with this.
"low = mid + 1" and "high = mid - 1" because you want them to be at the start and end of the potential range, because that's how they start out, "low = 0, high = len(nums) - 1".

#

And it's "while low <= high" because "low < high" would stop at a range of length 1

#

I still don't really get it.

#

I can't visualize this, I can't think like this, I don't have good spacial intelligence.

opal oriole
#

What happens when low is equal to high?

fervent saddle
#

then that's the last index that will be checked

opal oriole
#

I mean can you give me line by line what happens?

#

Starting at the while loop?

fervent saddle
#

it either finds the value, decrements high below low, or increments low above high

opal oriole
#

No I mean right now given that low is equal to high in this exact moment, and you are currently on the while low <= high: line, how does this evaluate line by line?

#

(And given that the target is actually in there)

fervent saddle
#

it sets mid equal to low and high since (x + x) / 2 is x.

opal oriole
#

Ok, so lets say the array have length 1, and you don't have while low <= high, but rather while low < high, what happens?

fervent saddle
#

it wouldn't run the while loop

opal oriole
#

And so your result would be?

fervent saddle
#

it would say it couldn't find it

opal oriole
#

Which is not correct given my assumption that it's in there.

fervent saddle
#

it would stop checking when the potential range is size 1

opal oriole
#

So while low < high can't be right.

#

[1] search for 1.

fervent saddle
#

Yeah, I understood that it can't work by thinking about what happens when the array is size 1.

opal oriole
#

So something important to take away here, simple cases (length is 1) can show flaws quickly and easily.

fervent saddle
#

I'm frustrated that I even need to do that.

opal oriole
#

So it's important to play around with many simple cases.

#

Like length 0 to 5, missing value, not missing value, etc.

fervent saddle
#

This is such a fundamental and basic algorithm, I shouldn't need to struggle with it like that.

opal oriole
#

When mathematicians invent things, they start with simple cases like this.

#

This is why proof by mathematical induction is such a big thing.

#

You can start simple, a few basic cases to get the intuition (and catch a bunch of errors), then show it holds for infinity by being able to go from one rung to the next.

#

In science you start with a complex case (given to you by nature) and have to craft the simple cases yourself by holding as many variables constant as possible to isolate and remove possible things that get in the way.

#

(So it's additional steps, so that you can get to this nice mathematical setup (then do all of these problem solving methods))

#

(Here in leetcode/math world, you are given this so you don't need to do all that work, the rules are given to you, you don't need to find them)

opal oriole
fervent saddle
#

it's just way to hard.

#
class Solution:
    def findMin(self, nums: List[int]) -> int:
        
        low = 0
        high = len(nums) - 1

        while low < high:

            mid = (low + high) // 2

            if nums[mid] < nums[high]:
                high = mid
            else:
                low = mid + 1
        
        return nums[low]

This is the "find the minimum in a rotated sorted array" problem, and I'm staring at this, and I just can't explain to you why writing "len(nums)" instead of "len(nums) - 1" and "high = mid + 1" instead of "high = mid" breaks it.

#

I don't have the first clue how to even start figuring out why it works that way.

opal oriole
fervent saddle
#

well, i guess it would fail instantly if high is len(nums) since that's out of bounds

opal oriole
#

Correct.

#

nums[high] is not in bounds.

#

Indices can't be the length of the array and be valid, so code will often have length minus one all over the place.

#

This is also why range, slices and such things have inclusive lower/start and exclusive upper.

fervent saddle
#

I honestly just didn't read the error message

#

So that error makes sense

opal oriole
#

Error messages that cause a crash are great, they give you information. Silent errors are way worse, and worst is random errors (only errors in some rare cases (this is the target of exploits)).

fervent saddle
#

I know

#

I think I'm burning myself out, because I know I've done "search in rotated sorted array" before, but I just can't figure out how to write it right now.

#

mid = (((low + high) // 2) + start_idx) % len_nums
I think it's like this. it just took me like five minutes to think of that one line.

opal oriole
fervent saddle
#

yeah

#

I'm not getting any dopamine hits off of this, every time I solve one I just think "that took way too long, you need to do better"

haughty mountain
#

simple invariant, no ±1 in annoying places

fervent saddle
haughty mountain
#

what do you need to know other than the invariant being maintained?

fervent saddle
#

I just can't visualize it

haughty mountain
#

and the invariant is obviously being maintained

#
     mid
      v
1 1 1 1 1 0 0
^           ^
l           r
#

mid is 1, so we move l there to maintain the invariant

desert meadow
#

i came up with this solution it works in 0ms but idk if it's acceptable or not

haughty mountain
#
         mid
          v
1 1 1 1 1 0 0
        ^   ^
        l   r
```and here it's a zero, so r is moved
#

and this is the final state

1 1 1 1 1 0 0
        ^ ^
        l r
#

l and r straddle the transition between 1 and 0

fervent saddle
#

And then you just check both l and r and decide what to do from there?

haughty mountain
#

no need to check l and r

#

you know l fulfills the condition, and r doesn't

#

in particular l is the last one to fulfill the condition, and r is the first one to not

fervent saddle
#

Wouldn't l point to the maximum value and not the minimum?

desert meadow
# desert meadow ```python class Solution: def findMin(self, nums: List[int]) -> int: ...
class Solution:
    def findMin(self, nums: List[int]) -> int:
        
        mid = len(nums) // 2
        min_val = nums[mid]
        
        if nums[mid] > nums[0] and nums[mid] > nums[-1]:
            left, right = mid + 1, len(nums) - 1
            while left <= right:
                m = (left + right) // 2
                if nums[m] < min_val:
                    min_val = nums[m]
                    right = m - 1 
                else:
                    left = m + 1
        else:
            left, right = 0, mid - 1
            while left <= right:
                m = (left + right) // 2
                if nums[m] < min_val:
                    min_val = nums[m]
                    right = m - 1
                else:
                    left = m + 1

        return min_val

this works in O(log(n/2)), and uses binary search. The most optimal solution I could come up with. Also it assumes the array is rotated sorted array and is accepted on leetcode and works in 0ms.

haughty mountain
#

but yes, it's likely that r will be your answer for this task

fervent saddle
#

if nums[mid] < nums[high]:

fervent saddle
haughty mountain
#

the r at the end would be your answer

#

the one iffy case is if the array is already sorted and not shifted

#

you could probably just shift one of the starting conditions to make sure the invariant is correct

fervent saddle
#

It makes sense when you write it out, but I just can't quickly visualize it in my head. Particularly the end part.

#

I can't just instantly go "okay, low goes above high" or "high goes above low" or anything like that.

deft spruce
deft spruce
#

I didn't know about this approach. It is interesting to see this approach because this approach could be used for finding the threshold where a condition flips from True to False, so l would be largest value of something. Example: You can do a job for x days, as long as cost(x) ≤ budget. What’s the maximum x?

stoic aspen
#

hello guys i want to start to learn algorithms after basics i'm almost in the end i think i will end basics of python today or tomorrow including oop

#

GIVE ME SOME ADVICES

haughty mountain
#

it could be trivially adapted to search in float values as well

narrow mica
#

maybe you can do something with bisect in a pinch

haughty mountain
narrow mica
haughty mountain
#

funny is always an admirable motivation

haughty mountain
#

but you can't do something like "binary search to some relative and absolute error"

wheat pagoda
wide thistle
#

Hi everyone, I'm implementing deletion in binary search tree.

If you check BST deletion code , it's not working for few nodes like 20, 34 ..

can anyone please help?
I'm following this book - introduction to algorithms
Here is the BST:

            20
           /  \
         10    50
        /     /  \
       5     34   69
             / \
           21  48
               / \
             40   49
            /  \
           35  41
fervent saddle
#

Don't try to memorize AVL tree / red-black tree deletion mechanics, there's no way you can memorize all of it.

#

When I was in college they didn't make us memorize how it worked.

#

I mean I guess you could try to memorize it if you really wanted to.

opal oriole
#

Since you keep bringing up Neetcode, I looked into it and IMO it's not a good learning resource. It's a bunch of problems chosen from Leetcode rather than a properly structured and paced resource like a book. In addition, the video explanations are way too short and act like a magic trick where they quickly pull out insight after insight in such a rapid pace that it will leave the viewer felling inept (this is a timing / pacing issue). There is also the issue of many of the problems in the same difficulty rating not being even close to the same level of difficulty. And the progression within each section on the website jumping up way to quickly in difficulty (needs more in between steps / problems). I highly recommend switching away from it because its design will make you angry at yourself.

fervent saddle
#

I don't watch the videos because the way the guy talks is kind of irritating.

#

He's like "And NOW, I'm going to SHOW YOU, the crazy cool way that we can SOLVE THIS PROBLEM."

#

Idk if that makes sense in text, but yeah.

opal oriole
fervent saddle
#

Is there a better curated question resource?

opal oriole
#

There is also an additional anti-learning trap on the website. Since the solutions are just on another tab and so easy to access (one click), it becomes very tempting for beginners to just go look at the solution after failing for a bit. But to really learn you need to not give up, and refuse to look up the solution. If you are really stuck, it means you have to go solve some other (easier) problems first, then come back to it, not look at the solution.

#

Part of the reason why many math (all STEM) books don't give all the solutions is because of this, not just so teachers can use the other half for homework / tests.

opal oriole
#

(CLRS is kind of "learning the hard way" (may want a different book, there are many))

fervent saddle
#

Aren't there any free resources?

opal oriole
#

Since you are doing this for an interview specifically, one I have seen mentioned many times is: https://www.crackingthecodinginterview.com/

fervent saddle
#

Cracking the Coding Interview, the book that ruined everything

#

Now everyone needs to solve Leetcode hards because that book turned the interview process into an arms race.

#

"Let's ask our candidate Longest Substring Without Repeating Characters."
"NO that's too basic! We need to make them solve (insane nightmare genius problem)."

desert meadow
#

I think it's both good and bad

terse viper
#

You'll rarely get anything harder than a medium in an interview

fervent saddle
#

But Median of Two Sorted Arrays is hard tier

haughty mountain
#

actually, maybe it's even easier than that

tight cedar
#

the neetcode lists are for interview prep no? not learning completely from scratch

#

I think a 250 one with more easy problems were added

#

I think one thing I really hate about textbooks / college materials is that they are trying to 'be correct' and use a lot of generic notations when you don't have the proper abstractions built yet

stiff quartz
deft spruce
#

@fervent saddle I don't know what other people who are more experienced think about this, but this course that I payed for is good https://www.designgurus.io/course/grokking-the-coding-interview in the way that you can see which patterns are needed to solve DSA problems. That info is useful because you can ask LLM to give you problems from LeetCode that are related to each pattern and then if you don't know how to solve problem, you can ask LLM. I think that it is useful to see mapping between different patterns and solutions, at least for me it was a case that I was able to solve new problems when I realised how pattern relate to different solutions. So to not pay for that course, you can just see what patterns are used in the context of DSA problems and then ask LLM to give you LeetCode problems for each pattern

Tech Interview Preparation – System Design, Coding & Behavioral Courses | Design Gurus

Master 28 essential coding patterns with Grokking the Coding Interview, a comprehensive course featuring 500+ practice problems in Java, Python, C++, and more. Ace your next coding interview with proven patterns and exercises.

flint arch
paper lotus
#

Has anyone here ever organically actually had to use dynamic programming in their work

tight cedar
blissful cypress
solid rune
#

hey can anyoe guide me in machine learning / deep learning for python , I have intermediate knowledge of python(Upto OOP)

solid rune
#

okk thanks

stiff quartz
#

To solve this https://www.codechef.com/JAN231D/problems/KSIZEGCD, is there any way to guarantee that this approach works (or doesn't, I'm 99% sure it doesn't but I'm struggling to find an intuitive counter-example that makes it too slow):

  • answer array will be decreasing (easy to prove), and I feel like there shouldn't be that many distinct values (at most sqrt(n)? [1000,1,999,999,1,998,998,998,1,…] purely intuition, I don't know how to prove it or to prove that it's not true) so we just check the max gcd for random values of k (sparse table or gcd-queue-with-two-stacks-trick to have n*log(max(A)) complexity for a given subarray length) and if we have answer[i] = answer[j] then for all i < l < j we know answer[l] = answer[i].
  • We shouldn't have to sample too much till we completely fill answer so that should be fine? n^3/2*log(max(A))
solid rune
#

what all projects u working on??

regal spoke
#

Using that makes it relatively easy to solve a problem like the one you linked

#

For every index i, find all values of j where gcd(A[i:j]) decreases.

#

This can be done in n log (max A) time

#

As for what you are asking, I'm not sure that the sqrt(n) is true. But it is probably true unless someone has made a test case against it

stiff quartz
regal spoke
#

The problemsetters would really have to think about how to make good tests

#

since randomly generated test cases are not difficult

stiff quartz
#

Ah yes ok I see - you have to make custom test cases like my example if I wasn’t mistaken in my analysis to reach something actually bad

mild pike
#

Hi!!!

#

Can Anyone give me a roadmap to learn Dsa from starting

serene comet
#

I'm looking for a job

#

Help me pls i have skills

unreal swallow
blissful tundra
#

How dics and sets working with -1 -2 bcz if hash of -1 is -2 how python can access it

haughty mountain
analog ferry
icy birch
#

Does someone now what I can make with PyTorch?

stiff quartz
regal spoke
stiff quartz
#

don't you pay it the second you do a math.gcd?

#

I guess actually what I did is a n log(A)^2 now that I think about it

regal spoke
#

Computing gcd a list of n numbers is actually O(n + log max A)

stiff quartz
#

the way I did it, I never compute the gcd of a list of n numbers

#
  • I compute the gcds of the subarrays that start at n-1
  • and I use those (at most log(max(A))) to compute the gcds of the subarrays that start at n-2
    etc
#

(which is why I realized my solution is actually nlog(max(A))^2)

regal spoke
#

What would you say the time complexity is of this code?

import math
A = [int(x) for x in input().split()]
g = 0
for a in A:
  g = math.gcd(g, a)
print(g)
#

it is not n log max A

stiff quartz
#

Ah yes

#

because g can only "go down" log(max(A)) time

#

@regal spoke how is it that in our model of computation we consider one single math.gcd to be of log complexity whilst a % operation is considered of constant complexity

regal spoke
#

Gcd is clearly not a low level operation

stiff quartz
#

In my head, integer division for example is log(n) complexity but because we're only on 32/64 bit integers we say it's O(1)

#

(if we were mathematicians dealing with really big numbers we'd say log(n))

#

now it feels like gcd is a bit similar if we're doing it with the euclidean algorithm we could say we won't ever do more than 32/64 whatever operations so that's O(1)

stiff quartz
regal spoke
#

In practice the reason why we say it is log is because that is the algorithm that is being used

stiff quartz
#

Yeah I know but like wouldn't we say there is an "algorithm" behind integer division as well?

#

The line gets a bit blurry for me for those things as to what we assume is O(1) or not

regal spoke
#

In theory, I think people usually just dont care about log factors like this at all

stiff quartz
#

But maybe the argument is if the algorithm is "in the hardware" then we assume it's O(1)

regal spoke
#

Thing is, there are algorithms like half gcd that is able to non-trivially speed up gcd.

#

But probably that is not at all useful unless the numbers become huge

fervent saddle
#

I finally understood how Median of Two Sorted Arrays works

haughty mountain
#

some funny binary search?

#

or maybe ternary search

brazen flicker
#

Any good book for data structures with python?

fervent saddle
#

And now I can finally move on to the linked list problems on neetcode.

#

As soon as I write out the median of two lists solution.

fiery cosmos
#

hello

turbid veldt
turbid veldt
brazen flicker
turbid veldt
# brazen flicker Forms?

like mit and other like post forms i used to read them for the different types of properties and stuff

brazen flicker
turbid veldt
brazen flicker
#

Not sure wym

turbid veldt
#

your right it did not show up for me to wtf

#

they removed it noooo

brazen flicker
turbid veldt
blissful tundra
#

"Ppl who don't play blox fruits are a bad programmers " ms fact

halcyon plankBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

green galleon
#
class ExtendedList(list):
    def __init__(self, *args):
        super().__init__()

        map(self.append, args)

    # ... snip ...

items = ExtendedList(1, 2, 3)
print(items)

[]

why does the append function invoked by map not append to the instance of ExtendedList?

agile sundial
#

map is lazy

green galleon
#

thank you!!!

waxen valve
#

hello everyone. I have written some code to turn Rise of Kingdoms “mailcache” files (Persistent.Mail.…) into JSON.
Structure is straightforward — after a short header every file is a TLV map:

The walk itself works; I get a nice nested dict and dump it to JSON.

Problem: the numeric values I translate (mainly coordinates, levels, resource amounts) are very wrong, this is due to the probably some bits being used for other stuff other than holding the value that i am looking for.

#

if someone wants to help me with this i will show the input file, my code and the output json

#

its been a week i am trying to solve this

coarse pine
severe cipher
#

any resource for learning DSA in python

fervent saddle
#

https://neetcode.io/problems/copy-linked-list-with-random-pointer
here's my hash map solution to this problem:

class Solution:
    def copyRandomList(self, head: 'Optional[Node]') -> 'Optional[Node]':

        # empty list
        if head is None:
            return

        curr = head

        # dealing with the first node
        # # # # #
        head_copy = Node(curr.val, None, None)
        node_map = {head: head_copy, None: None}

        if head.random is not None:
            if head.random is not head:
                node_map[head.random] = Node(head.random.val, None, None)
                head_copy.random = node_map[head.random]
            else:
                head_copy.random = head_copy
        
        if head.next in node_map:
            head_copy.next = node_map[head.next]
        else:
            head_copy.next = Node(head.next.val, None, None)
        
        node_map[head.next] = head_copy.next
        # # # # #
        # # # # #

        curr = curr.next
        
        while curr is not None:
            
            curr_copy = node_map[curr]

            if curr.random is not None:
                if curr.random in node_map:
                    curr_copy.random = node_map[curr.random]
                else:
                    curr_copy.random = Node(curr.random.val, None, None)
                    node_map[curr.random] = curr_copy.random

            if curr.next in node_map:
                curr_copy.next = node_map[curr.next]
            else:
                curr_copy.next = Node(curr.next.val, None, None)
                node_map[curr.next] = curr_copy.next
            
            curr = curr.next
        
        return head_copy
#

and here's the hash map solution on the site:

class Solution:
    def copyRandomList(self, head: 'Optional[Node]') -> 'Optional[Node]':
        oldToCopy = collections.defaultdict(lambda: Node(0))
        oldToCopy[None] = None
        
        cur = head
        while cur:
            oldToCopy[cur].val = cur.val
            oldToCopy[cur].next = oldToCopy[cur.next]
            oldToCopy[cur].random = oldToCopy[cur.random]
            cur = cur.next
        return oldToCopy[head]
#

why am I so bad at coding? it's like three times more concise than what I wrote.

fervent saddle
#

Also how hard is it to come up with an O(1) space solution to this problem? Should I try, or is it one of those things where you should just look up how it's done? I see that the site recommends you to come up with an O(n) space solution.

naive oak
#

the answer itself requires O(n) space

frail venture
#

Hey guys I'm looking for some people to work as a team for my sql that I'm working on called ASTRiXe*SQL I want some people as testers and people who can be staff members for my server for Astrixe please dm me if interested 😊

fervent saddle
haughty mountain
#

using the input to temporarily store the data that would otherwise go in the hashmap (and presumably carefully restoring it afterwards)

#

kinda building a mutant linked list for a bit until you separate them

regal spoke
#

Pretty much any kind of hashing problem can be solved using this

stiff quartz
#

If N is around 10^5, and I run one dijkstra per binary search iteration (final complexity N*logN*log(max(a)) and max(a) can go up to 10^9), is it reasonable for my code to run for 11s with PyPy on codeforces? Or did I mess up somewhere?
10^5*log2(10^5)*log2(10^9) is around 10^7-10^8

merry dust
#

!d collections.defaultdict

halcyon plankBOT
#

class collections.defaultdict(default_factory=None, /[, ...])```
Return a new dictionary-like object. [`defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict) is a subclass of the built-in [`dict`](https://docs.python.org/3/library/stdtypes.html#dict) class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the [`dict`](https://docs.python.org/3/library/stdtypes.html#dict) class and is not documented here.

The first argument provides the initial value for the [`default_factory`](https://docs.python.org/3/library/collections.html#collections.defaultdict.default_factory) attribute; it defaults to `None`. All remaining arguments are treated the same as if they were passed to the [`dict`](https://docs.python.org/3/library/stdtypes.html#dict) constructor, including keyword arguments.

[`defaultdict`](https://docs.python.org/3/library/collections.html#collections.defaultdict) objects support the following method in addition to the standard [`dict`](https://docs.python.org/3/library/stdtypes.html#dict) operations:
fervent saddle
#

https://neetcode.io/problems/find-duplicate-integer

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        
        slow = fast = 0
        
        while True:
            
            fast = nums[fast]
            if fast == slow:
                break
            fast = nums[fast]
            if fast == slow:
                break
            
            slow = nums[slow]
        
        idx1 = 0
        idx2 = slow
        
        while idx1 != idx2:
            idx1 = nums[idx1]
            idx2 = nums[idx2]
        
        return idx1

Could someone please explain why the fuck this says "time limit exceeded" when I submit it??? It also passes the test when I run it without submitting.

haughty mountain
#

bug that causes an infinite loop maybe?

stiff quartz
haughty mountain
#

it looks like tortoise and hare algorithm, though I'm not sure why that's applicable, I would need to think

#

In computer science, cycle detection or cycle finding is the algorithmic problem of finding a cycle in a sequence of iterated function values.
For any function f that maps a finite set S to itself, and any initial value x0 in S, the sequence of iterated function values

      x
      
        0
      
...
haughty mountain
stiff quartz
fervent saddle
#

It can't be that, the numbers can't be 0.

haughty mountain
#

fine [2, 1, 3, 3]

fervent saddle
haughty mountain
#

What loop are you trying to find?

fervent saddle
#

The number that appears more than once is basically like a node that points to the start of a cycle.

#

So it ends up being that you can use the fast/slow cycle start detection algorithm for this.

fervent saddle
#

I should have said cycle instead of loop

stiff quartz
#

Just write it down it'll be obvious

stiff quartz
fervent saddle
#

It works when I click "run", but not "submit" ON THE SAME TEST CASE

stiff quartz
#

!e

from typing import List
class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        
        slow = fast = 0
        
        while True:
            
            fast = nums[fast]
            if fast == slow:
                break
            fast = nums[fast]
            if fast == slow:
                break
            
            slow = nums[slow]
        
        idx1 = 0
        idx2 = slow
        
        while idx1 != idx2:
            idx1 = nums[idx1]
            idx2 = nums[idx2]
        
        return idx1
solution = Solution()
solution.findDuplicate([1,3,4,2,2])
halcyon plankBOT
stiff quartz
#

clearly it doesn't work

fervent saddle
#

And the solution posted on the site works fine, so it's not just the question being broken.

fervent saddle
stiff quartz
#

it timed out

fervent saddle
#

Oh

stiff quartz
#

You implemented the algo wrong

stiff quartz
fervent saddle
#

Then why does it pass the two default test cases

#

I don't get it, mine does exactly the same thing as the one posted on the site. Why doesn't t work???

stiff quartz
#

again, take a piece of paper

#

write down the steps on my example

fervent saddle
#

How???

#

I did

stiff quartz
#

and you didn't end up in an infinite loop?

#

idx1 and idx2 should be alternating between 2 and 4 and never meeting if you correctly wrote down the steps

fervent saddle
stiff quartz
#

I don’t understand what you mean

#

The two test cases on your website have nothing to do with my example

fervent saddle
stiff quartz
#

But I didn’t talk about those test cases did I

stiff quartz
fervent saddle
#

Isn't that where you got it from?

stiff quartz
#

Well I don’t see it on your website

#

Im not sure I follow you

fervent saddle
#

If you click "run" then it shows a couple test cases.

#

Nvm, it's not relevant

stiff quartz
#

I don’t know this website sorry

#

All I know is that this is a good example for you to understand why your algorithm is wrong

haughty mountain
fervent saddle
#

Is it because I'm checking both index changes on the fast index?

fervent saddle
stiff quartz
#

And idx2 goes from 4 to 2

#

At the same speed

#

And they start ‘unsynchronized’

fervent saddle
#

Because I check if fast equals slow before fast has changed twice?

stiff quartz
#

Yes I think so

#

This is the first time I read about this algo though, all I can say is where your infinite loop is, not what the algorithm should look like

fervent saddle
#

I thought I tried to change that part, but I'll try again later.

haughty mountain
#

so the idea is to assume that the input represents a graph, and there is a guaranteed loop because of the duplicates?

stiff quartz
#

n nodes, n+1 directed edges if I’m not mistaken

fervent saddle
#

It's a linked list with a cycle in it

stiff quartz
#

That’s equivalent

opal oriole
stiff quartz
#

Whats the formal proof that the second loop ends (if the first one was done correctly) if anyone knows id be curious

haughty mountain
#

I think that's mostly trivial

haughty mountain
opal oriole
#

Since cycles are involved, you can probably guess than it involves modular arithmetic.

stiff quartz
#

Do we have any info the « meeting point » properties

haughty mountain
#

(the number of steps)
fast = slow + k*cycle_length is one thing you know

opal oriole
haughty mountain
stiff quartz
#

Yep

haughty mountain
#

i.e.

slow = k*cycle_length
#

slow is a multiple of the cycle length

stiff quartz
#

That’s only if it’s directly in the cycle?

haughty mountain
#

is it?

#

I think that holds in general

opal oriole
#

There is some point at which both slow and fast will be trapped in the cycle.

stiff quartz
haughty mountain
#

the second loop does a walk starting at the beginning and from slow, and because of being a multiple things will line up in the loop

#

and we are interested in the first time they line up, which is the start of the cycle

stiff quartz
opal oriole
haughty mountain
#

fast could have half the cycle length, but that's fine (I think)

stiff quartz
#

I edited my message my bad

stiff quartz
#

Makes sense

opal oriole
#

Yeah, the hard part of this problem is just realizing this implicit graph (and having already messed around with graphs). The cycle detection algorithm is not too hard to come up with if you went into it looking for that specifically and were therefore messing around with a small graph to begin with (you already see the graph).

#

The thing to add the list of heuristics / things to check with this one is when given constraints on the possible values, and there are coincidences (n length, 1 to n possible for the values (reused variable is a flag for relationships you can use to get a better algorithm)), it probably forms some math structure implicitly.

stiff quartz
fervent saddle
#

Okay, maybe that was what was wrong

#

I'll check soon

stiff quartz
# haughty mountain fast could have half the cycle length, but that's fine (I think)

And so the proof that the first loop ends is after they both are in the cycle (finite time for sure), if they are not equal mod cycle_length, each iteration, their difference* mod cycle_length increases by one (since one moves two steps and the other one step) so before cycle_length iteration they'll meet?

*if we order the nodes of the cycle

haughty mountain
#

that sounds correct

stiff quartz
#

I was afraid the time complexity would be some LCM which would be terrible just to achieve O(1) space

#

but it turns out it's linear so it's pretty cool

#

if I'm not mistaken

fervent saddle
#

Yeah it's linear

fiery cosmos
#

2 pointers 🥱

zealous dirge
#

Hi, im taking a course discrete structures, I may need some help

#

bayes theorem if anyone knows

fiery cosmos
#

Asked by google, Amazon , meta

#

Crazy easy too 😂

deft spruce
#

So when are heaps used in DSA problems? I know about these problems: fast finding min and max elements, when there is dynamical list of elements and we need to often do insertion or removal of element and when there are median problems so two heaps are then used

#

I guess they are also used for tracking something like occurrences of elements and then doing something with those occurrences

fiery cosmos
#

Adding 2 sorted arrays together returning a sorted merged array in O(n) time

fiery cosmos
#

Asked by google / meta / zon/ Microsoft / Tesla / TikTok / apple

#

if anyone wants to walk through it or needs help understanding lmk

narrow mica
haughty mountain
haughty mountain
deft spruce
#

I wonder whether they look for work email address like for example in Blind app

haughty mountain
#

I would be surprised if that one was used by any larger company

#

except maybe as a very simple warm up leading into something else

chrome pewter
chrome pewter
stiff quartz
fiery cosmos
#

So

urban shoal
#

hey guys i am unable to install File "<input>", line 1
pip3 install openai
^^^^^^^
SyntaxError: invalid syntax

fiery cosmos
urban shoal
fiery cosmos
naive oak
# fiery cosmos

hey i recently wrote a formal proof in a proof checker that this relation is a partial order

#

small world

fiery cosmos
#

Nice I love doing leetcode problems it’s so fun

haughty mountain
#

you literally just walk through the thing and see that the letters in the sequence appear in order

fiery cosmos
#

it seems difficult at first glance but you just set two pointers to the beggining of both arrays and then see if they equal each other or not and iterate through both arrays

#

O(n)

#

i will say this problem has a 48% acceptance rate so it might be more of a medium than easy

chrome pewter
fiery cosmos
#

one mans easy is another mans hard?

chrome pewter
chrome pewter
#

If it exists return the indices

#

Iterating it fully would make it O(n) no?

chrome pewter
fiery cosmos
#

Binary search would delete half of the array every iteration

#

Imagine the array was 1000 numbers

chrome pewter
#

Since it says "sorted"

#

Deleting half of what we don't need

#

Would reduce the tc to O(logn)

#

Right

fiery cosmos
#

But can you find the answer that way?

chrome pewter
#

I'll give it a try once and let yk if it doesn't work then O(n)?

fiery cosmos
#

You are looking to see if the target value provided is the sum of two numbers in a non decreasing array

#

Yes O(n)

chrome pewter
#

Okay give me 10

#

Wait a sec

#

For each element you've got to do bs

#

So it would be O(nlogn)

#

And the normal way would take O(n^2)

#

@fiery cosmos

fiery cosmos
#

if you have two pointers you can do it in O(n) time

#

set one pointer to the beggining of the index

#

set the other pointer to end of the index

#

and then iterate through the array from both sides

#

if your combo is > target then you can move your right pointer -1
and if your combo is < target you can move your left pointer +1

#

If you need more help I can send you some written code that might assist

chrome pewter
#

Two pointers is the most efficient here

#

Cause we iterate through only once

#

If sum > move right

#

Else move left

#

RAHHH I FEEL SO DUMB 😭

fiery cosmos
#

Lol

#

You can use recursion here too

#

But it would be much slower

#

And complicated

fiery cosmos
#

You can add some code to return whatever the problem wants you to return but essentially this is the logic

chrome pewter
#

Perfect

chrome pewter
chrome pewter
haughty mountain
silent pond
#

hi there,

#

is learning arrays and linked list helpful?

#

if so where can i learn them from.

#

with real examples.

clever bone
#

Hi, I am trying to convert a image in text through tesseract OCR, but preprocess is needed and I am not able to do it. Does someone know how can i get a good result?

#

here it is a image example to convert in text

stray walrus
# fiery cosmos

I've been solving a lot of problems with substrings so my first thought was recursion. But yeah then I saw the difficulty level.

#

The two pointer approach you suggested is actually the optimal.

#

I will add this to my list to make a note. Let me know if you want to solve some more questions together.

fiery cosmos
#

You can use greedy to solve but why 😂

plain quartz
#

I want to make an Project about the skills i learnt and how to integrate all things
I have understanding about - Python, numpy, pandas, streamlit, sql, matplot and powerBI

mellow helm
fiery cosmos
#

No why?

mellow helm
#

it kinda looked like he struggled

mellow helm
fiery cosmos
#

Took like 20 sec

mellow helm
#

like someone struggled to right that

mellow helm
fiery cosmos
#

Struggled to write legibly so others could read it lol

#

Writing is a dying art haha

mellow helm
#

man functions are so useful!

fiery cosmos
#

Frrr

mellow helm
#

PRAISE THE DEF

mellow helm
#

ALL HAIL THE DEFF

#

YEAA

#

A

fiery cosmos
#

Yes

mellow helm
#

have you ever used %s etc as placeholder before

#

most likely you should

#

js asking

#

..

#

you didnt, did you

fiery cosmos
#

Show me what you mean

mellow helm
#

☠️

#

bro can write long paragraphs of codes but not understand what % does

#

ill give you a hint

fiery cosmos
#

Modulo

mellow helm
#

% formatting

#

its a placeholder

#

like say for an example

#

this one is from ai

#

IM TEMPORARILY USING AI

#

BECAUSE IM TOO LAZI

#

product = "phone"
quantity = 2
message = "You bought %d %s(s)." % (quantity, product)
print(message)

#

dude why is the code so sketchy

fiery cosmos
#

Looks sue

#

Sus

mellow helm
fiery cosmos
#

I like doing f string formatting

mellow helm
fiery cosmos
#

I use em all the time

mellow helm
#

☠️

fiery cosmos
#

**




mellow helm
#

we're definitely opposite

#

i barely do f string formatting

#

i do % formmating all the time

#

YOU do f string formatting

fiery cosmos
#
*
**
***
****
*****
#

F strings make problems like these ez ^^

mellow helm
#

YOU DONT EVEN KNOW WHAT % FORMMATING IS im confused i thought you were pretty good at python

fiery cosmos
#

I do lol

mellow helm
#

k bet

fiery cosmos
#

I just don’t care to use it

mellow helm
#

yea its basically f string

#

wasted my time of my life

fiery cosmos
#

Yea but with brackets lol

mellow helm
#

hi

#

CURVY brackets

#

there are three types

fiery cosmos
#

Yee

mellow helm
#

(
{
[

#

hi

fiery cosmos
#

Isn’t the first a parenthesis

mellow helm
#

hi

mellow helm
#

hi

fiery cosmos
#

I can’t wait to do some leetcode when I get home

mellow helm
#

sound like

#

sounds kinda goofy

#

EHAT

mellow helm
#

lmao

#

what

fallow gulch
#

Tell me something I am starter to DSA

fiery cosmos
#

Currently learning sliding window 🪟

fiery cosmos
chrome pewter
fiery cosmos
#

yess

frank grove
#

hi everyone

#

I have a question about how many hours it took you to get good at Python?

#

especially to have a more or less understanding of algorithms

fiery cosmos
#

Like a month of daily study ~ 5 hours

fiery cosmos
fiery cosmos
wise timber
#

But when we invest time on Algorithams they pay off come with diff life pattern

wise timber
# fiery cosmos

Can you give me some tips or resources for learnings algorithams

woven python
#

@feral stump

fiery cosmos
wide kestrel
#

Hi guys please suggest to me what I should add in my GitHub profile

wise timber
wise timber
stiff quartz
wise timber
stiff quartz
fiery cosmos
#

I cooked

wanton root
# fiery cosmos
def row_finder(r):
    start = (r * (r + 1)) - (r - 1) * 2
    end = r * (r + 1) + 1
    numbers = list(range(start, end, 2))
    print(sum(numbers))

🔥

fiery cosmos
#

Hehe

#

Yea I should have done that

#

Made it more readable

umbral mountain
#

It's a cool technique if you haven't learnt it.

fiery cosmos
#

Yea I solved it using sliding window it’s neat

wide kestrel
wise timber
#

Implementation

#

Explore Algorithams, solve leetcode problems this will make improvements on logics

wide kestrel
#

Thankuuu i do leetcode also but i am not consistent in maintaining contributions on GitHub

hoary adder
#

anyone here who can provide some resources for dsa ?

fiery cosmos
#

Coding interview patterns pdf

stiff quartz
#

Anyone knows what I'm doing wrong, I'm trying to generate collisions in a Python dictionary with small numbers (so I can't just take multiples of 2^61-1), and it barely works:

This should have collisions:

evil_all_numbers: set[int] = set()

i = 1 & (2**17-1)
perturb = 1
while len(evil_all_numbers) < 100_000:
    evil_all_numbers.add(i)
    i = (5*i+ 1 + perturb) & (2**17-1)
    perturb >>= 5
my_dict: dict[int, int] = {}
for number in evil_all_numbers:
    my_dict[number] = 1

This shouldn't:

import random

random_all_numbers: set[int] = {1}
while len(random_all_numbers) < 100_000:
    random_all_numbers.add(random.randint(0, 2**32-1))
my_dict_random: dict[int, int] = {}
for number in random_all_numbers:
    my_dict_random[number] = 1

But accessing key 1 in the second dictionary takes 15.5 ns and in the first dictionary 27.4 ns (but too much noise to make the difference significantly different), so I guess that's a bit disappointing, I was expecting some O(n^2) blowup.

And 100_000 < 2^17 = 131072 so the internal length of the dictionary should indeed be 2^17? (not sure how I can access the actual internal length of the dictionary though)

EDIT I’m stupid, accessing key 1 won’t generate any collision

rare laurel
# wanton root ```py def row_finder(r): start = (r * (r + 1)) - (r - 1) * 2 end = r * (...

An arithmetic progression or arithmetic sequence is a sequence of numbers such that the difference from any succeeding term to its preceding term remains constant throughout the sequence. The constant difference is called common difference of that arithmetic progression. For instance, the sequence 5, 7, 9, 11, 13, 15, . . . is an arithmetic prog...

#

and if you expand & simplify that enough you can get to the simple closed form solution, ||r**3 + r||
math is beautiful

vocal gorge
haughty mountain
#

I might be misremembering

#

pithink

#define GROWTH_RATE(d) ((d)->ma_used*3)
fiery cosmos
#

What branch of math is this

rare laurel
regal spoke
#

You are "hacking" PyPy's version

regal spoke
#

Ye read the conversation

#

in the issue I linked

fiery cosmos
regal spoke
# stiff quartz Anyone knows what I'm doing wrong, I'm trying to generate collisions in a Python...

I read some of pypy source's code. https://github.com/pypy/pypy/blob/886dab661a8fa4aee61f7884b5c5f018ecf3a513/rpython/rtyper/lltypesystem/rdict.py#L575

Here is some python "pseudo code" for how the resizing works:

size = 16
items = 0

def insert_item():
    global size, items
    items += 1
    if 2 * size <= 3 * items:
        newsize = 1
        while newsize <= (size + min(items, 30000)) * 2:
            newsize *= 2
        size = newsize
        print('newsize =', newsize, 'happened when inserting', items, 'th item')

for _ in range(10**6):
    insert_item()

Whenever 2 * size <= 3 * items, the dict is resized. This guarantees that at all times, size > 3/2 * items.

#
newsize = 64 happened when inserting 11 th item
newsize = 256 happened when inserting 43 th item
newsize = 1024 happened when inserting 171 th item
newsize = 4096 happened when inserting 683 th item
newsize = 16384 happened when inserting 2731 th item
newsize = 65536 happened when inserting 10923 th item
newsize = 262144 happened when inserting 43691 th item
newsize = 1048576 happened when inserting 174763 th item
newsize = 4194304 happened when inserting 699051 th item
#

So for example, the moment you insert the 43691th item, the internal storage jumps 65536=2^16 to 262144=2^18

#

Because of this. The following is a "hack"

evil_for_x = {}

x = 1
i = x
perturb = x
while len(evil_for_x) < 43690:
    evil_for_x[i + 2**24] = 1
    i = (5*i+ 1 + perturb) & (2**17-1)
    perturb >>= 5

for _ in range(10**5):
    x in evil_for_x

But if you insert one more item. Then the internal size of the dict becomes 2^18, and thus 2^17 is not enough anymore. So this is not a "hack":

evil_for_x = {}

x = 1
i = x
perturb = x
while len(evil_for_x) < 43691:
    evil_for_x[i + 2**24] = 1
    i = (5*i+ 1 + perturb) & (2**17-1)
    perturb >>= 5

for _ in range(10**5):
    x in evil_for_x
regal spoke
#

Btw this is just a very fancy math formula for size *= 4. I've yet to see a case where the result is something different.

regal spoke
#

anything from 2^18 and up would work

#

Its only really needed when i = x

#

The point is that you dont want to put x into evil_for_x since then it would find x instantly when you do x in evil_for_x.

stiff quartz
#

Ah yes without it we wouldn’t generate collisions because we would find 1 at the first spot and not do any walk?

regal spoke
#

But you need to put something at x & mask

stiff quartz
#

Ok yes yes

#

Smart

#

And so granted we’re capable of reverse engineering or reading CPython source code we could do the same except we would perturb first?

regal spoke
#

I think dict in pypy and cpython are meant to be the same

#

Just that whoever translated cpythons source code into pypy messed up where to put perturb >>= 5

stiff quartz
#

To be fair perturb goes to 0 quickly enough for us to not care right?

regal spoke
#

Well in PyPys it is very unproductive

stiff quartz
#

Yeah I saw the person mentioning it on the issue

regal spoke
#

If two numbers are same & mask, then the perturb doesn't seperate them on the first iteration in PyPys implementation

stiff quartz
#

It seems unlikely that they are same & mask but with a different perturb

#

Unless intentionally done

#

Like you’d need to have the same remainder when dividing by mask

stiff quartz
#

Just a slight tweak is needed

regal spoke
#

Yes 100%

stiff quartz
#

I’ll try extensively tomorrow

regal spoke
#

I just took a closer look. Seems like there are two differences between PyPy and CPythons dicts

#
size = 8 if CPython else 16
items = 0

def insert_item():
    global size, items
    items += 1
    if 2 * size <= 3 * items:
        newsize = 1
        while newsize <= (size + min(items, 30000)) * 2:
            newsize *= 2
        size = newsize
        print('newsize =', newsize, 'happened when inserting', items, 'th item')

The initial size of their dictionaries are different

#

Here are the size increments for CPython

newsize = 32 happened when inserting 6 th item
newsize = 128 happened when inserting 22 th item
newsize = 512 happened when inserting 86 th item
newsize = 2048 happened when inserting 342 th item
newsize = 8192 happened when inserting 1366 th item
newsize = 32768 happened when inserting 5462 th item
newsize = 131072 happened when inserting 21846 th item
newsize = 524288 happened when inserting 87382 th item
newsize = 2097152 happened when inserting 349526 th item
#

The other difference is the position of the perturb >>= 5 line

stiff quartz
#

There might not be an answer but holy cow how do you read CPython source code I never am able to understand anything from it

regal spoke
#

But I'm able to confirm things from checking when the hacks work

#

!e

evil_for_x = {}

x = 1
i = x
perturb = x
while len(evil_for_x) < 87382:
    evil_for_x[i + 2**24] = 1
    perturb >>= 5
    i = (5*i+ 1 + perturb) & (2**17-1)
    
for _ in range(10**5):
    x in evil_for_x
halcyon plankBOT
regal spoke
#

!e

evil_for_x = {}

x = 1
i = x
perturb = x
while len(evil_for_x) < 87381:
    evil_for_x[i + 2**24] = 1
    perturb >>= 5
    i = (5*i+ 1 + perturb) & (2**17-1)
    
for _ in range(10**5):
    x in evil_for_x
halcyon plankBOT
regal spoke
#

The second one hangs, which shows that CPython does a resize at 87382

regal spoke
stiff quartz
#

Sorry

regal spoke
#

The size always grows in power of 2 or 4 (almost always a factor of 4). There are no primes anywhere as far as I can tell.

stiff quartz
regal spoke
#

There is no prime sieve or anything like that in the code. It just resizes whenever 2 * size <= 3 * items.

stiff quartz
#

!e


import sys

my_dict = {}
previous_size = sys.getsizeof(my_dict)

for i in range(100000):
    my_dict[i] = i
    current_size = sys.getsizeof(my_dict)
    if current_size != previous_size:
        print(f"Size changed: Elements = {len(my_dict)}, Size = {current_size} bytes")
        previous_size = current_size
halcyon plankBOT
# stiff quartz !e ```py import sys my_dict = {} previous_size = sys.getsizeof(my_dict) for ...

:white_check_mark: Your 3.13 eval job has completed with return code 0.

001 | Size changed: Elements = 1, Size = 224 bytes
002 | Size changed: Elements = 6, Size = 352 bytes
003 | Size changed: Elements = 11, Size = 632 bytes
004 | Size changed: Elements = 22, Size = 1168 bytes
005 | Size changed: Elements = 43, Size = 2264 bytes
006 | Size changed: Elements = 86, Size = 4688 bytes
007 | Size changed: Elements = 171, Size = 9304 bytes
008 | Size changed: Elements = 342, Size = 18512 bytes
009 | Size changed: Elements = 683, Size = 36952 bytes
010 | Size changed: Elements = 1366, Size = 73808 bytes
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/WNQBU6MMRTRQ3WJOFHH7OHX4OQ

stiff quartz
#

Or is it because they use split tables (not sure that’s the term, but an array with all the values in insertion order and the other where we do open addressing)

#

And they don’t resize both arrays at the same time

regal spoke
#

Let me try something

#

!e

size = 8
items = 0

def insert_item():
    global size, items
    items += 1
    if 2 * size <= 3 * items:
        newsize = 1
        while newsize <= (size + min(items, 30000)):
            newsize *= 2
        size = newsize
        print('newsize =', newsize, 'happened when inserting', items, 'th item')

for _ in range(10**6):
    insert_item()
halcyon plankBOT
# regal spoke !e ```py size = 8 items = 0 def insert_item(): global size, items items...

:white_check_mark: Your 3.13 eval job has completed with return code 0.

001 | newsize = 16 happened when inserting 6 th item
002 | newsize = 32 happened when inserting 11 th item
003 | newsize = 64 happened when inserting 22 th item
004 | newsize = 128 happened when inserting 43 th item
005 | newsize = 256 happened when inserting 86 th item
006 | newsize = 512 happened when inserting 171 th item
007 | newsize = 1024 happened when inserting 342 th item
008 | newsize = 2048 happened when inserting 683 th item
009 | newsize = 4096 happened when inserting 1366 th item
010 | newsize = 8192 happened when inserting 2731 th item
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/MEGQBZF5GHUS6SNMFTISIUCPDE

regal spoke
#

Now they are matching

#

I just changed the math formula so it grows by a factor of 2 instead of 4 at a resize

#

This is in CPythons source

#

Maybe CPython was updated and PyPy wasnt

stiff quartz
#

Thats a weird formula

stiff quartz
regal spoke
stiff quartz
#

Yeah ok i was wondering if you had simplified it and indeed you did

regal spoke
#

PyPy's source code has comments that claims it comes from CPython.

#

My guess is that CPython has changed a lot of things since then.

#
GitHub

BPO 17563 Nosy @rhettinger, @vstinner, @markshannon, @ericsnowcurrently Files resize.patch: Patch Note: these values reflect the state of the issue at the time it was migrated and might not reflect...

GitHub

BPO 32623 Nosy @gvanrossum, @tim-one, @rhettinger, @methane, @serhiy-storchaka, @1st1, @xgid PRs #5297#5300 Dependencies bpo-33205: GROWTH_RATE prevents dict shrinking Note: these values reflect th...

#

Seems every time there is a change, there is some kind of performance regression that people want to fix

stiff quartz
#

I was convinced it was *=4 because I had watched a very old video by a core dev at CPython that was explaining the concept of « modern dictionaries » but indeed that is not the case since 3.2 damn

#

Anyway thank you for the help!

#

Unrelated but do you know if you can hack a submission after the « hacking session » of a contest is done?

regal spoke
stiff quartz
#

Ah ok but if i solved the problem of a recent contest outside of that contest, I can’t seem to hack it

#

Kind of a shame tbf (unless I’m just blind and you can)

regal spoke
#

I've had submissions like that. Not much one can do unfortunately.

#

After going through CPython source, here is what I think is happening

#

!e

size = 8
items = 0

def log2(x):
    return x.bit_length() - 1

def insert_item():
    global size, items
    items += 1
    if size < 2**log2(3 * items):
        size = 2**log2(3 * items)
        print('newsize =', size, 'happened when inserting', items, 'th item')

for _ in range(10**6):
    insert_item()
halcyon plankBOT
# regal spoke !e ```py size = 8 items = 0 def log2(x): return x.bit_length() - 1 def ins...

:white_check_mark: Your 3.13 eval job has completed with return code 0.

001 | newsize = 16 happened when inserting 6 th item
002 | newsize = 32 happened when inserting 11 th item
003 | newsize = 64 happened when inserting 22 th item
004 | newsize = 128 happened when inserting 43 th item
005 | newsize = 256 happened when inserting 86 th item
006 | newsize = 512 happened when inserting 171 th item
007 | newsize = 1024 happened when inserting 342 th item
008 | newsize = 2048 happened when inserting 683 th item
009 | newsize = 4096 happened when inserting 1366 th item
010 | newsize = 8192 happened when inserting 2731 th item
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/YP2RPORQ3OA7WBBYT435CF3SGU

regal spoke
#

Honestly, it would be nice if PyPy's source was updated to match this.

regal spoke
stiff quartz
#

The decision of when to resize probably doesn’t have a perfect answer, no matter what you do it’ll help some use cases and be bad for other use cases

stiff quartz
still quiver
#
global  _start
extern  printf

section .data
    outFormat db  "It's %s", 0x0a, 0x00
    message db "Aligned!", 0x0a

section .text
_start:
    push rax
    call print          ; print string
    call Exit           ; Exit the program

print:
    mov rdi, outFormat  ; set 1st argument (Print Format)
    mov rsi, message    ; set 2nd argument (message)
    call printf         ; printf(outFormat, message)
    ret

Exit:
    mov rax, 60
    mov rdi, 0
    syscall

is this valid asm code

fiery cosmos
trim thorn
#

0 Time and Space Complexity
1 Arrays & Strings
2 Loops, Conditions, Midpoint Math
3 Hashing (HashMap, HashSet)
4 Two Pointers
5 Sliding Window
6 Binary Search
7 Sorting Algorithms
8 Prefix Sum / Difference Arrays
9 Stacks & Queues
10 Linked Lists
11 Recursion + Backtracking
12 Binary Trees
13 Binary Search Trees (BST)
14 Heaps / Priority Queues
15 Tries (Prefix Trees)
16 Graphs (DFS/BFS)
17 Topological Sort
18 Union-Find (DSU)
19 Greedy Algorithms
20 Dynamic Programming (DP)
21 Bit Manipulation
22 Segment Trees / BIT
23 Suffix Arrays / KMP
24 Shortest Path (Dijkstra, Bellman-Ford, Floyd)
25 MST (Kruskal, Prim)
26 Advanced Graphs (SCC, Bridges, Tarjan’s, 2-SAT)
27 Number Theory (GCD, Sieve, Mod)
28 Math for CP (Combinatorics, Inverse)
29 Game Theory, DP with Bitmasking
30 Heavy-Light Decomposition / Centroid Decomposition

i am about to start dsa in python so i just wanted to know if the above mentioned syllabus is ideal for it or not??

stiff quartz
#

Graph comes so late lol, doing it after trie is mad, where does that syllabus come from, it’s terrible

trim thorn
#

is it a complete trash?

inland haven
#

I will say, follow the basic structure. The structure of learning for dsa isn't much different for python per se.

trim thorn
#

and from where to follow that?

stiff quartz
regal spoke
# trim thorn 0 Time and Space Complexity 1 Arrays & Strings 2 Loops, Conditions, Midpoint Mat...

Typically university courses in "advanced algorithms" bunch the following together

16 Graphs (DFS/BFS)
14 Heaps / Priority Queues
24 Shortest Path (Dijkstra, Bellman-Ford, Floyd)
18 Union-Find (DSU)
25 MST (Kruskal, Prim)
19 Greedy Algorithms
20 Dynamic Programming (DP)

If you are interested in competitive programming. Then 22 Segment Trees is one of the essentials. But for some reason university courses never bring them up.

trim thorn
#

there are lot of courses online on dsa (with python) but most of them follow different syllabus and curriculum.

#

so i want to atleast be aware of ideal syllabus so that i can learn accordingly

trim thorn
regal spoke
#

Guess which python program runs faster?

([0]*2097153) * 22

or

([0]*2107153) * 22
#

Can someone please explain why tourist_mad

stiff quartz
#

👀

#

😖

regal spoke
#

I can maybe find a smaller example

stiff quartz
#

I can reproduce locally

regal spoke
#

Yes holyfuck

#

totally not a bug

stiff quartz
#

Can't reproduce on older versions of Python @regal spoke

stiff quartz
#

can't reproduce on pypy either

#

To be fair creating an issue on CPython github might be justified

regal spoke
#

With PyPy the bug also appears, I have the bug in all versions >= PyPy 7.3.17. but none older than that

stiff quartz
#

Ah, my version is 7.3.16 haven't updated it in a while

#

difference isn't that big here

regal spoke
stiff quartz
#

you should add a =

#

it's printing stuff locally lol

#

doesn't work on 3.12.5

regal spoke
#

Let me see if I can find better examples

regal spoke
stiff quartz
#

ah lol

#

yeah we're checking 262145 twice

regal spoke
#

Can you change 262145 to n and run it again

stiff quartz
#

yes ofc

#

perfect

#

(3.12.5 fyi)

#

Yeah idk why though, not sure we can find out why with little knowledge of Python source code

#

I tried every integer that ends in 145 up to 100_000

regal spoke
stiff quartz
#

I'm going to try all values then

#

up to 100 000

#

that'll be faster

#

to see

regal spoke
#

probably you wanna do something like

for _ in range(10**7//n):
  ([0]*n) * 22
#

To find what is the most expensive to copy independent of the size

stiff quartz
#

I don't understand what you mean

regal spoke
#

Basically what we are interested in is maximizingtime2copy/n

stiff quartz
regal spoke
#

That isnt too impressive

stiff quartz
#

Yep

#

nevermind I got it

#

There are some weird patterns in the 80_000 - 90_000 I think

#

!e

import time

for n in 82148, 82154:
    t = time.time()
    for _ in range(100):
        a = ([0] * n) * 22
    print(n, 'took', time.time() - t)
#

Not impressive

regal spoke
#

I think the bug is windows related

#

and then discord bot wont find it

stiff quartz
#

I'm on Python 3.12 as well

#

discord is on 3.13

halcyon plankBOT
regal spoke
#

Latest version of both CPython and PyPy has the performance bug for me

#

Even slightly older versions have it

stiff quartz
#

!e

import time

for n in 82485, 82490:
    t = time.time()
    for _ in range(100):
        a = ([0] * n) * 22
    print(n, 'took', time.time() - t)
halcyon plankBOT
stiff quartz
#

Also, you don't need the 22 @regal spoke

regal spoke
stiff quartz
#

without anything it's still there

#

a bit

#

definitely less impressive

#

--

some are just abnormaly slow, rather than some abnormaly fast

#

For the value 262145 it goes up suddently and eventually it goes back down to "baseline", not in a sudden manner though

runic flicker
#

Yo guys I m trying to learn python
how should i learn it?

#

Like from YouTube?

stiff quartz
regal spoke
stiff quartz
#

written as 262145 or written as 2**18+1

regal spoke
#

doesnt matter

stiff quartz
regal spoke
#

2**(18 + 1) is not correct

stiff quartz
#

ah yeah my bad lol

#

8 ms vs 11 ms itself is weird

#

if anything 2**18+1 would take time to compute so it should be ever so slightly slower i re-ran them they got much closer

regal spoke
#

On my computer the difference is closer to a factor of 3

stiff quartz
#

What python version

regal spoke
#

Been trying out on different ones. The current one I'm using is 3.12.4

stiff quartz
#

I'm downloading the 3.14 preview

#

@regal spoke Factor 5 with 3.14 and *14 (above was *22)

#

Order doesn't matter

regal spoke
#

Could you try them

stiff quartz
#

I'll even give you a negative control (that's not that negative) then

#

So it is the copying that's a problem

regal spoke
# stiff quartz

That 7.4 vs 1.5 is a lot higher than what I'm able to reach

stiff quartz
#

!e

print(7.4/1.5)
print(5.48/1.166)
halcyon plankBOT
stiff quartz
#

almost factor 5 yes

#

It'd be so fun to be like a CPython or PyPy core developer and try to fix those things, unfortunately, I don't know anything about their source code

#

If you make an issue, send me the link i'm curious

stiff quartz
regal spoke
#

So which pair of n to report?

stiff quartz
#

I think pick the smallest one, don't pick the big initial ones (you lowered it from 10^6 to 10^5)

stiff quartz
#

to me is the most relevant for them:

#
  • It's Python 3.14
#
  • And 7.4 vs 1.5 is insane
regal spoke
#

Does it take equal amount of time to make 14 copies of the bad one as 70 copies of the good one?

#

That would be funny

stiff quartz
#

I can check.

stiff quartz
#

Still on Python 3.14 preview

regal spoke
regal spoke
#

Created a bug report

stiff quartz
#

Lemme check if I can reproduce on Ubuntu

regal spoke
#

It really does seem like windows only

stiff quartz
pearl wigeon
#

Does any1 know a better version of matching a json data structure with a specific string algorithm

stiff quartz
# regal spoke https://github.com/python/cpython/issues/137121
import time

start_time = time.time()
for _ in range(200):
    a = ([0] * 262145)
    b = []
    for _ in range(14):
        b += a.copy()
print(time.time() - start_time)

start_time = time.time()
for _ in range(200):
    a = ([0] * 265665)
    b = []
    for _ in range(14):
        b += a.copy()
print(time.time() - start_time)

This doesn't trigger the bug

regal spoke
#

So it isnt super important that it is 1 + a power of two

stiff quartz
#
import time

for i in range(15, 20):
    print(f"Testing with 2^{i} elements:")
    
    start_time = time.time()
    for _ in range(200):
        a = ([0] * (2**i + 1)) * 14
    print(time.time() - start_time)

    start_time = time.time()
    for _ in range(200):
        a = ([0] * (2**i + 10)) * 14
    print(time.time() - start_time)

    print("---")

Gives us plenty of examples

stiff quartz
stiff quartz
#

@regal spoke I think I have a good idea of why there is a bug

#

You said it started with 7.3.17

#

And in 7.3.17 changelog they say

copy lists in multiplication with log2(n) arraycopys instead of just copying n times (string multiplication does it like this too)
regal spoke
#

I see

#

haha

#

Still doesn't explain why windows = 💀

stiff quartz
#

Yeah loads of things aren't explainable by this

#

but the coïncidence is too big to ignore haha

regal spoke
#

But at least this explains what changed starting from 7.3.17

stiff quartz
#

am I supposed to look for a method __mul__?

regal spoke
#
GitHub

The Python programming language. Contribute to python/cpython development by creating an account on GitHub.

GitHub

The Python programming language. Contribute to python/cpython development by creating an account on GitHub.

regal path
stiff quartz
#

it doubles doubles doubles till it's done

halcyon plankBOT
#

Objects/listobject.c line 795

assert(n > 0);```
regal spoke
stiff quartz
#

well I guess this is the problem

#

the last one is just a big copy

#

to add one element

#

but if you do a big copy to add 10 elements, it should be equally bad

regal spoke
#

I don't know the point of the doubling

stiff quartz
#

Idk

#

PyPy pretends it's nice because you only do log2(n) copies

#

but you do bigger copies

regal spoke
#

Actually nvm, the code looks fine

stiff quartz
regal spoke
#

Leaning more and more to that

stiff quartz
#

I only have gcc locally and I'm no C expert

regal spoke
#

Still, a factor of 5

#

is so much

stiff quartz
#

was it 3 on yours?

regal path
stiff quartz
#

how do i check that?

#

Python 3.14.0rc1 (tags/v3.14.0rc1:48f8831, Jul 22 2025, 17:09:57) [MSC v.1944 64 bit (AMD64)] on win32 does that help, MSC = MSVC?

regal spoke
regal path
stiff quartz
#

Is the move to make C code

#

and benchmark memcpy

#

with different compilers on Windows

#

around 2^N+1

regal spoke
#

Copying something of size (2^18 + 1)*2^i with memcopy, yes

stiff quartz
#

well 14 != 2^i

regal spoke
stiff quartz
#

you just need to copy many a times for that "log optimisation" kicks in

#

I think

regal spoke
#

With the last one being slightly off from a full doubling

stiff quartz
#

why is that not working anymore

#

what I have done wrong

regal spoke
#

Idk

stiff quartz
#

Yep, can't reproduce anymore

regal path
#

looks like it isn't msvc after all

#

uhhhhh

#

god my internet is so slow

stiff quartz
#

you want to know something even more bizarre?

regal path
#

okay I'm just confused

stiff quartz
#

If I put my computer to sleep and wake it up

#

I have around half a chance to be able to reproduce the bug

#

And I can only reproduce the bug if I'm lucky otherwise I HAVE TO PUT IT BACK TO SLEEP AND WAKE IT UP

regal path
#

well, we could mess around and create a "proper" mre using timeit

regal spoke
#

I'm getting more than 3 times slower on this

#

Honestly it is looking kind of random

stiff quartz
#

If I run the same ten times in a row

#

sometimes I have a factor 5

#

sometimes a factor 2

#

sometimes a factor 4

#

sometimes a factor 1.3

#

always the 2^k+1 list being slower than 2^k+10

#

That probably indicates a memcpy + Windows memory allocation problem?

regal spoke
#

As all things in life, the one at fault

#

the source of all evil

#

||is microsoft||

regal spoke
stiff quartz
#

Only for the evil number

regal path
#
Size: 262145, Multiplier: 14, Time taken: 7.914299 seconds, Average time taken: 0.007914
Size: 265665, Multiplier: 14, Time taken: 7.043591 seconds, Average time taken: 0.007044
Size: 262145, Multiplier: 15, Time taken: 7.701653 seconds, Average time taken: 0.007702
Size: 265665, Multiplier: 15, Time taken: 7.552365 seconds, Average time taken: 0.007552
Size: 262145, Multiplier: 16, Time taken: 27.139098 seconds, Average time taken: 0.027139
Size: 265665, Multiplier: 16, Time taken: 27.375930 seconds, Average time taken: 0.027376
#

(on linux)

#

yeah idk, it's inconsistent here as well

regal spoke
regal path
#

no clue

regal spoke
#

Logically, that should be roughly the same amount of data

regal path
#

I only have zram from swap so that shouldn't be a reason

#

yeah inconsistent af

regal spoke
#

Maybe a simple fix is to grow by slightly less than a factor of two

#

The problem with that factor of two is that it is possible to repeatedly hit the bad case

stiff quartz
stiff quartz
regal path
stiff quartz
#

Then all the data again (but you have twice as much data)

#

You can do less than a factor of 2

#

Of course in your case it’s only zeros

#

So you could

#

If you have real data you have to copy multiples of the length of the initial array

regal spoke
stiff quartz
#

That’s what was done before they did this

#

They’d have n memcopy of that initial size array

#

Instead of log2(n)

regal spoke
regal path
stiff quartz
regal path
#

roughly yeah

stiff quartz
#

I don’t see anything surprising though it might just be not in the same cache

#

Anymore

#

So significantly slower

#

But consistently between the two numbers

stiff quartz
#

If your array is 1,2,3 at step 1

#

And you wanna copy it 8 times

regal spoke
#

There is no reason why you need to do multiples of the orignal list

stiff quartz
#

But that means you need to keep the original list alive

regal spoke
#

no

stiff quartz
#

Ah ok no I guess

#

What would be the pseudo code though

regal spoke
#

Here is a way to allways call memcpy with increasing power of two sized chunks (except for the first and last call)

def copy_k(A, k):
  n = len(A)
  m = n * k
  B = [0] * m
  B[:n] = A
  i = n
  pow2 = 1
  while i < m:
      chunk = min(m - i, pow2)
      j = i % n
      B[i:i+chunk] = B[j:j+chunk]
      
      i += chunk
      pow2 *= 2
  return B
stiff quartz
#

memcpy with a power of 2

#

or memcpy with a power of 2 + 1

regal spoke
#

My point is just that one doesn't have to call memcpy with multiples of n

stiff quartz
#

Can you run that C code locally?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define COUNT1 262144   // 2^18
#define COUNT2 262145   // 2^18 + 1
#define COUNT3 262154   // 2^18 + 10
#define REPEATS 10000

void benchmark_copy(const int *src, int *dst, size_t count, const char *label) {
    size_t size_bytes = count * sizeof(int);

    clock_t start = clock();
    for (int i = 0; i < REPEATS; ++i) {
        memcpy(dst, src, size_bytes);
    }
    clock_t end = clock();

    double total_sec = (double)(end - start) / CLOCKS_PER_SEC;
    printf("%s: Copied %zu bytes (%zu ints) %d times\n", label, size_bytes, count, REPEATS);
    printf("%s: Total time: %.6f sec\n", label, total_sec);

    // Prevent optimizing away some stuff
    printf("%d \n", dst[14]);
}

int main(void) {
    static int src1[COUNT1], dst1[COUNT1];
    static int src2[COUNT2], dst2[COUNT2];
    static int src3[COUNT3], dst3[COUNT3];

    for (size_t i = 0; i < COUNT1; ++i) src1[i] = (int)i;
    for (size_t i = 0; i < COUNT2; ++i) src2[i] = (int)i;
    for (size_t i = 0; i < COUNT3; ++i) src3[i] = (int)i;
    
    // Do all benchmarks twice just in case
    benchmark_copy(src1, dst1, COUNT1, "COUNT1 (2^18)");
    benchmark_copy(src1, dst1, COUNT1, "COUNT1 (2^18)");
    benchmark_copy(src2, dst2, COUNT2, "COUNT2 (2^18 + 1)");
    benchmark_copy(src2, dst2, COUNT2, "COUNT2 (2^18 + 1)");
    benchmark_copy(src3, dst3, COUNT3, "COUNT3 (2^18 + 10)");
    benchmark_copy(src3, dst3, COUNT3, "COUNT3 (2^18 + 10)");

    return 0;
}
#

I get very suspicious results

#

And that might explain everything

regal spoke
#

I only have gcc setup atm

stiff quartz
#

I have gcc as well

#

I get that

#
COUNT1 (2^18): Copied 1048576 bytes (262144 ints) 10000 times
COUNT1 (2^18): Total time: 8.590000 sec
14
COUNT1 (2^18): Copied 1048576 bytes (262144 ints) 10000 times
COUNT1 (2^18): Total time: 8.605000 sec
14
COUNT2 (2^18 + 1): Copied 1048580 bytes (262145 ints) 10000 times
COUNT2 (2^18 + 1): Total time: 0.400000 sec
14
COUNT2 (2^18 + 1): Copied 1048580 bytes (262145 ints) 10000 times
COUNT2 (2^18 + 1): Total time: 0.400000 sec
14
COUNT3 (2^18 + 10): Copied 1048616 bytes (262154 ints) 10000 times
COUNT3 (2^18 + 10): Total time: 0.335000 sec
14
COUNT3 (2^18 + 10): Copied 1048616 bytes (262154 ints) 10000 times
COUNT3 (2^18 + 10): Total time: 0.333000 sec
14
#

But that really makes no sense, the difference is huge, we're talking *20

#

(And I kinda wrote pseudo code and asked a LLM to translate so it might not be great code)

#

With MSVC I get that:

#

At least that aligns with the GitHub issue

#

It does seem that gcc has its own problems with memcpy they just don' thit the same values

#

OR maybe my code is trash

regal spoke
#

Even though it makes stupid copies on the right hand side

stiff quartz
#

but look at the gaps

#

for memcpy

#

if it really is a memcpy bug that has a *20 gap

#

if you avoid it, even with stupid copies you'll be faster

regal spoke
#

That's what I think is happening. Or rather, I don't think my code uses memcpy at all

#

Since python needs its reference count it can't use memcpy

stiff quartz
halcyon plankBOT
#

Include/internal/pycore_list.h lines 56 to 65

_Py_memory_repeat(char* dest, Py_ssize_t len_dest, Py_ssize_t len_src)
{
    assert(len_src > 0);
    Py_ssize_t copied = len_src;
    while (copied < len_dest) {
        Py_ssize_t bytes_to_copy = Py_MIN(copied, len_dest - copied);
        memcpy(dest + copied, dest, (size_t)bytes_to_copy);
        copied += bytes_to_copy;
    }
}```
regal spoke
#

I'm just comparing copy_k(A, 20) to A*20 atm.

stiff quartz
#

since there is no list.__mul__ being called

#

Yes, in copy_k(A, 20), _Py_memory_repeat might not be called?

regal spoke
#

It doesn't

stiff quartz
#

So the bug isn't triggered

regal spoke
#

Python needs to keep track of reference counting

#

Only when it doesn't have to think about that can it use memcpy

stiff quartz
#

Ah ok I didn't know

regal spoke
#

So like if you do .insert on a list, then it can use memcpy/memmove

#

Since no reference counts are changing

stiff quartz
#

is that why .insert has such a low constant factor 💀

regal spoke
#

Ye .insert has insanely low constant factor because of that

stiff quartz
regal spoke
#

the right hand side there creates a new list

#

That requires updated counts