#algos-and-data-structs | Python | Page 53

exotic parrot Apr 26, 2024, 1:43 AM

#

space as in making code less cluttered, not memory

#

can the same be done with if statements?

regal spoke Apr 26, 2024, 1:45 AM

#

The closest thing I know about if statements is that you can use any and all

#

!e

A = [1,2,3]

print(all(a > 0 for a in A))

halcyon plankBOT Apr 26, 2024, 1:45 AM

#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

True

regal spoke Apr 26, 2024, 1:46 AM

#

This is a lot more convenient than using an if-statement in a for loop
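
For comparison, a minimal sketch of the loop that `all` replaces. The helper name `all_positive_loop` is made up for illustration; it is not from the chat.

```python
A = [1, 2, 3]

def all_positive_loop(values):
    # Explicit version: bail out on the first element that fails the test
    for a in values:
        if not a > 0:
            return False
    return True

print(all_positive_loop(A))     # same answer as the one-liner below
print(all(a > 0 for a in A))    # True only if every element passes
print(any(a > 2 for a in A))    # True if at least one element passes
```

Both `all` and `any` stop at the first element that decides the answer, so they do no more work than the hand-written loop.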

exotic parrot Apr 26, 2024, 1:46 AM

#

oh so you can use this to check if there are negative numbers in your list, if not remove all numbers that are larger than your 'wanted' number

regal spoke Apr 26, 2024, 1:47 AM

#

I didn't try to imply anything with my example other than that all and any can be nice to use

regal spoke Apr 26, 2024, 1:47 AM

#

exotic parrot oh so you can use this to check if there are negative numbers in your list, if n...

Not sure what you are talking about

exotic parrot Apr 26, 2024, 1:47 AM

#

in the context of the subset sum problem

regal spoke Apr 26, 2024, 1:48 AM

#

I wasn't talking about subset sum problem at all with my any/all example

exotic parrot Apr 26, 2024, 1:48 AM

#

yeah I know, I was just thinking of implementations 'out loud'

regal spoke Apr 26, 2024, 1:49 AM

#

I'd say try to go through this code and figure out what it is doing #algos-and-data-structs message
Especially this line is pretty nice return find_sum(A1, target - sum2) + find_sum(A2, sum2)

exotic parrot Apr 26, 2024, 1:49 AM

#

is there any function that lets you remove something from a list besides A.pop()

regal spoke Apr 26, 2024, 1:50 AM

#

exotic parrot is there any function that lets you remove something from a list besides A.pop(...

Yes. del A[3:10] removes A[3], A[4], ..., A[9] from A

#

So you can remove intervals using del
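
A small sketch of the removal options mentioned here, using made-up example lists:

```python
A = list(range(12))   # [0, 1, ..., 11]
del A[3:10]           # removes the elements at indices 3..9
print(A)              # [0, 1, 2, 10, 11]

B = ["a", "b", "c", "b"]
B.remove("b")         # removes the first element equal to "b"
del B[0]              # removes the element at index 0
print(B)              # ['c', 'b']
```

`del` works on positions (a single index or a slice), while `list.remove` works on a value.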

exotic parrot Apr 26, 2024, 1:50 AM

#

oh that's pretty handy

#

can it be combined with all/any?

regal spoke Apr 26, 2024, 1:51 AM

#

I don't see how del and all/any could be combined

exotic parrot Apr 26, 2024, 1:51 AM

#

like idk del any( a > x for a in A)

regal spoke Apr 26, 2024, 1:51 AM

#

Ah

#

Then you should do

#

[a for a in A if a <= x]

#

This creates a new list

#

that only contains the elements <= x
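
A short sketch of that filter with made-up values for `A` and `x`. The slice assignment at the end is an optional variant for when the original list object should be updated in place.

```python
A = [5, 1, 8, 3, 9, 2]
x = 4

kept = [a for a in A if a <= x]   # new list, A is untouched
print(kept)                        # [1, 3, 2]

A[:] = [a for a in A if a <= x]   # same filter, but overwrites A's contents
print(A)                           # [1, 3, 2]
```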

regal spoke Apr 26, 2024, 1:53 AM

#

exotic parrot like idk del any( a > x for a in A)

I've seen something like this in Matlab, but there is nothing like that in Python. Maybe in numpy there is something similar to it
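
For reference, the numpy feature that comes closest is boolean mask indexing. A minimal sketch, assuming numpy is installed and using made-up values:

```python
import numpy as np

A = np.array([5, 1, 8, 3, 9, 2])
x = 4

mask = A > x          # boolean array, one entry per element
print(A[~mask])       # [1 3 2]  (keeps the elements where the mask is False)
print(mask.any())     # True     (at least one element is larger than x)
```

Like the list comprehension, `A[~mask]` builds a new array rather than deleting from `A`.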

exotic parrot Apr 26, 2024, 1:53 AM

#

I assume putting the a before for a in... is just notation? or "defining "a" "

regal spoke Apr 26, 2024, 1:55 AM

#

!e

A = [i*i for i in range(10)]
print(A)

exotic parrot Apr 26, 2024, 1:55 AM

#

or does it just refer to the element that has to be put inside the new list

halcyon plankBOT Apr 26, 2024, 1:55 AM

#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

exotic parrot Apr 26, 2024, 1:56 AM

#

ahh alright, it's that simple

#

this stuff could've saved me a lot of time... dammit

#

I've been going through your code

#

U used set to make iterating faster, right?

regal spoke Apr 26, 2024, 2:01 AM

#

The time it takes for
x in A1sums:
depends on what data structure A1sums is

#

If A1sums is a list, then this is slow (O(n) time where n is size of A1sums)

#

But if A1sums is a set, then this is fast (O(1) time on average, no matter how big A1sums is)

exotic parrot Apr 26, 2024, 2:03 AM

#

how does it go over the set then?

#

I mean I know how a set works, I'm just wondering why it can get it out instantly without iterating

regal spoke Apr 26, 2024, 2:03 AM

#

set is a hashtable. It uses hashes to pretty much instantly tell if x lies in A1sums or not
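
To make the idea concrete, here is a toy hash set. This is only an illustration of the principle: the class `ToyHashSet` is hypothetical, and CPython's real `set` uses a different, more refined layout.

```python
class ToyHashSet:
    """Toy hash set with a fixed number of buckets (illustration only)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, x):
        # hash(x) picks one bucket, so the other buckets are never scanned
        return self.buckets[hash(x) % len(self.buckets)]

    def add(self, x):
        bucket = self._bucket(x)
        if x not in bucket:
            bucket.append(x)

    def __contains__(self, x):
        return x in self._bucket(x)


s = ToyHashSet()
for value in [0, 3, 7, 10, 26]:
    s.add(value)

print(10 in s)   # True
print(11 in s)   # False
```

A lookup only scans one short bucket, which is why it does not slow down as the set grows (a real implementation also adds buckets as it fills up).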

exotic parrot Apr 26, 2024, 2:03 AM

#

ohh ok

#

haven't seen much stuff about hashes but I assume they're just "addresses"?

regal spoke Apr 26, 2024, 2:05 AM

#

hash(x) is pretty much just a random-looking number computed from x

#

You should look up hash tables if you are interested in how they work

#

It's not super important for now

#

Just remember that x in A1sums: is instant if A1sums is a set

#

and slow if A1sums is a list
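
A rough way to see the difference yourself, sketched with `timeit`. The sizes are arbitrary and the exact timings depend on the machine, so no numbers are claimed here.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
needle = n - 1   # worst case for the list: the very last element

# Each call repeats the membership test 100 times
list_time = timeit.timeit(lambda: needle in as_list, number=100)
set_time = timeit.timeit(lambda: needle in as_set, number=100)

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```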

exotic parrot Apr 26, 2024, 2:07 AM

#

alr got it

#

the last return statement,
return find_sum(A1, target - sum2) + find_sum(A2, sum2) does this basically just start from our wanted number, and just keeps subtracting until we're left with target = 0

regal spoke Apr 26, 2024, 2:09 AM

#

The idea is that if we get to that line of the code

exotic parrot Apr 26, 2024, 2:09 AM

#

our target becomes the two summed numbers used to create initial target

regal spoke Apr 26, 2024, 2:09 AM

#

Then I know there exists sum1 in A1sums and sum2 in A2sums such that sum1 + sum2 = target

#

Now I recursively ask which numbers in A1 can be used to make sum1, and which numbers in A2 can be used to make sum2

exotic parrot Apr 26, 2024, 2:10 AM

#

ohh ok

#

and the iterations stop, after target is equal to a number which cannot be written as a sum of available numbers

regal spoke Apr 26, 2024, 2:11 AM

#

If it is not possible to hit the target, then the function returns None

exotic parrot Apr 26, 2024, 2:12 AM

#

let's say I'd want to print those results, what would be a good place to place the print statement?

#

considering we iterate using previous sums

#

would putting used numbers in a list be a viable option?

#

and then just iterating over this list

#

with "+" and then just "=" target

regal spoke Apr 26, 2024, 2:17 AM

#

!e

def find_sum(A, target):
  if sum(A) == target:
    return A
  if target == 0:
    return []

  A1 = A[:len(A)//2] # First half
  A2 = A[len(A)//2:] # 2nd half

  # Compute all possible sums of first half
  A1sums = [0]
  for a in A1:
    A1sums += [s + a for s in A1sums]

  # Compute all possible sums of 2nd half
  A2sums = [0]
  for a in A2:
    A2sums += [s + a for s in A2sums]

  # Check if target - sum2 = sum1, for sum1 in A1sums, and sum2 in A2sums
  A1sums = set(A1sums) # Make into set for fast lookup
  for sum2 in A2sums:
    if target - sum2 in A1sums:
      # Target is possible to reach!
      # Recursively find which numbers to use from A1 and A2
      # This runs very fast since A1 and A2 are tiny in comparison to A
      return find_sum(A1, target - sum2) + find_sum(A2, sum2)
  
  # Not possible
  return None

A = [1,2,3,4,5,6,7,8,9]
target = 26
print(find_sum(A, target))

halcyon plankBOT Apr 26, 2024, 2:17 AM

#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

[1, 3, 4, 5, 6, 7]
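
On the earlier question about printing the result with "+" and "=": since `find_sum` already returns the used numbers as a list (or `None`), one option is to format that list after the call. A minimal sketch, where `format_sum` is a hypothetical helper and `result` is simply the list printed above:

```python
def format_sum(numbers, target):
    # Hypothetical helper: turns [1, 3, 4] and 8 into "1 + 3 + 4 = 8"
    return " + ".join(str(n) for n in numbers) + f" = {target}"

result = [1, 3, 4, 5, 6, 7]   # what find_sum(A, 26) returned above
if result is None:
    print("no subset hits the target")
else:
    print(format_sum(result, 26))   # 1 + 3 + 4 + 5 + 6 + 7 = 26
```

Keeping the print outside the recursive function avoids printing partial results from the inner calls.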

exotic parrot Apr 26, 2024, 2:19 AM

#

wait... what does return [] do?

#

I mean I know it has smtg to do with the output, but how does it get filled

regal spoke Apr 26, 2024, 2:20 AM

#

Returns an empty list

#

Which corresponds to the empty subset

exotic parrot Apr 26, 2024, 2:22 AM

#

ah ok, how does the output list get filled then?

regal spoke Apr 26, 2024, 2:23 AM

#

The + on the return find_sum(A1, target - sum2) + find_sum(A2, sum2) line

#

The + here joins two lists

regal spoke Apr 26, 2024, 2:25 AM

#

exotic parrot ah ok, how does the output list get filled then?

If the target is 0 then [] is a valid output
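
A tiny sketch of what that `+` does, using the values the two top-level recursive calls return in the run above (traced by hand from the posted code):

```python
left = [1, 3, 4]    # find_sum([1, 2, 3, 4], 8)
right = [5, 6, 7]   # find_sum([5, 6, 7, 8, 9], 18)

print(left + right)   # [1, 3, 4, 5, 6, 7]
print([7] + [])       # [7]  (an empty list adds nothing to the result)
```

Each call returns a small list, and the caller glues its two halves together, so the final list is assembled on the way back up from the base cases.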

exotic parrot Apr 26, 2024, 2:26 AM

#

regal spoke The + here joins two lists

I think I don't see how the recursive functions return output, considering the output is the function itself

#

well I know it stops at some point, but what happens to those previous iterations

regal spoke Apr 26, 2024, 2:28 AM

#

It returns output just like any recursive function does

#

Recursive calls until it reaches a base case

exotic parrot Apr 26, 2024, 2:31 AM

#

yeah but this function should return [] after the last iteration right?

#

and I guess that's the thing I don't get

#

how is [1, 3, 4, 5, 6, 7]

#

ah nvm ok

exotic parrot Apr 26, 2024, 2:33 AM

#

exotic parrot yeah but this function should return [] after the last iteration right?

A is returned

#

ohhh

#

ok

#

nvm

#

I forgot that the target was changing, thus the sum which was returned by A

#

ok I see it now

#

alright, I think you helped me with some intuitive understanding of these recursive algorithms

#

thank you for your time, sorry for being a bit slow it's pretty late here

rigid trench Apr 26, 2024, 6:17 PM

#

Hey, where would I ask about optimizing numpy?

#

There doesn't really seem to be a great "optimization" space other than this one maybe

slender sandal Apr 26, 2024, 6:57 PM

#

Wdym

#

What do you need

haughty mountain Apr 26, 2024, 9:21 PM

#

@slender sandal presumably this

#

but I don't think that's asking the right question, numpy will for sure do that pretty efficiently

#

what's the overall operation being done? maybe how that overall computation is done can be optimized

pearl crag Apr 26, 2024, 11:35 PM

#

hello everyone, I'm a new member in the group, I was wondering if someone can help me, I have some problems using python

#

?

languid thistle Apr 27, 2024, 2:47 PM

#

guys I need a sanity check

#

    //first we sort, such that we can short-circuit after the first two items' sum is greater than the next item to be placed as the third item
    let result = []
    //it's each entry in candidates may be used once, not each number in candidates may be used once
    //results will hold an iteration-ordered list of each acceptable combination (also as an ordered list)
    //because there may be repeats in candidates, this may include multiple, identical lists
    candidates.sort(function (a,b){return a-b})
    console.log(`candidates: ${candidates}`)

    _combinationSum(0, [], target)

    function _combinationSum(i, usedIndices, remainder){
        console.log(`remainder: ${remainder} values used:`)
        console.log(usedIndices.map((e,i)=>candidates[i]))
        if (remainder==0){
            let usedVals = usedIndices.map((x, i)=>candidates[i])
            result.push(usedVals)
            return
        }
        else if (remainder <0){
         console.log('short circuiting with values')
         console.log(usedIndices.map((e,i)=>(candidates[i])))
         console.log(`and remainder ${remainder}`)
            
            return
        }
        else{
            for (let j=i; j<candidates.length;  j+= 1){
                const newRemainder = remainder-candidates[j]
                if (newRemainder < 0){break}
                let newUsedIndices = usedIndices.slice()
                newUsedIndices.push(j)
                _combinationSum(j+1, newUsedIndices, newRemainder)
            }
        }
    }
    return result
};


#

example target: 8
example input: [10,1,2,7,6,1,5]
example output: [[0,1,5],[0,3,4],[0,6],[1,3,4],[1,6],[3,5]]
expected output: [[1,1,6],[1,2,5],[1,7],[2,6]]

#

where am I going wrong?

haughty mountain Apr 27, 2024, 4:16 PM

#

sir, this is a python's||erver||

languid thistle Apr 27, 2024, 4:22 PM

#

haughty mountain sir, this is a python's||erver||

I literally forgot I switched to leetcoding in javascript.

#

probably shouldn't matter though

haughty mountain Apr 27, 2024, 4:34 PM

#

first off you're returning indices

languid thistle Apr 27, 2024, 4:38 PM

#

haughty mountain first off you're returning indices

yeah that was old code

#

examples are from the current code (it's big so I didn't wanna repost, just edited it)

haughty mountain Apr 27, 2024, 4:39 PM

#

languid thistle examples are from the current code (it's big so I didn't wanna repost, just edit...

no it's not?

#

0 is not an element

languid thistle Apr 27, 2024, 4:40 PM

#

haughty mountain no it's not?

hmmm?

haughty mountain Apr 27, 2024, 4:40 PM

#

those are for sure indices

languid thistle Apr 27, 2024, 4:40 PM

#

haughty mountain those are for sure indices

sure but observe this if (remainder==0){ let usedVals = usedIndices.map((x, i)=>candidates[i]) result.push(usedVals) return

#

it converts from the used indices to grabbing actual entries from candidates

haughty mountain Apr 27, 2024, 4:41 PM

#

the examples aren't updated

languid thistle Apr 27, 2024, 4:41 PM

#

I dunno, I can like just see it right there, the map over usedIndices into candidates and returned to usedVals whenever the remainder is 0

#

maybe discord has not sent that update to the server yet but doubtful

haughty mountain Apr 27, 2024, 4:42 PM

#

languid thistle Apr 27, 2024, 4:42 PM

#

oh sorry the outputs

#

I thought you meant the code as an example for some reason

#

my bad

#

```
candidates: [10,1,2,7,6,1,5]
target = 8

Output
[[1,1,2],[1,1,2],[1,1],[1,1,2],[1,1],[1,1]]
Expected
[[1,1,6],[1,2,5],[1,7],[2,6]]
```

haughty mountain Apr 27, 2024, 4:47 PM

#

so you're clearly extracting the wrong values

#

and duplicates of sets, presumably

#

just from the output I can tell this doesn't do what you think it does

usedIndices.map((x, i)=>candidates[i])

#

it's suspicious that all the values are the first few in the sorted sequence, no?

languid thistle Apr 27, 2024, 4:52 PM

#

haughty mountain just from the output I can tell this doesn't do what you think it does ``` usedI...

well what that does is grab the ith element in the array candidates

#

I would sort of agree with you except for the fact that the output is being built from a recursive call and you know how those can get whacky. Say, just repeatedly grabbing the first element or repeating over a recursive step which is already done, or not updating the input to subset a given recursive call's input for the next call

haughty mountain Apr 27, 2024, 4:54 PM

#

languid thistle well what that does is grab the ith element in the array candidates

i being what?

languid thistle Apr 27, 2024, 4:54 PM

#

haughty mountain `i` being what?

i is set to the index of the current element there. It's an optional parameter to map

#

map will take each element of an array, assign it to x, and assign i to whatever the index of that element is in the array

haughty mountain Apr 27, 2024, 4:55 PM

#

why would you want to use that index?

languid thistle Apr 27, 2024, 4:56 PM

#

well because, once the remainder hits 0, I need to take the indices whose values were used in computing the remainder and find their associated values to build the set that was subtracted from the initial sum

haughty mountain Apr 27, 2024, 4:56 PM

#

I'm not asking why you're trying to map things back to values

#

I'm saying

usedIndices.map((x, i)=>candidates[i])
is just incorrect

#

it's not doing what you want it to do

#

there is a reason all the sequences in the output are all the first few elements of the sorted sequence

languid thistle Apr 27, 2024, 4:59 PM

#

can you just explain why you think it is that part specifically that is not correct / not doing what I think it's doing?

#

I can explain why I think it does what I explained

haughty mountain Apr 27, 2024, 5:00 PM

#

you're always extracting the first few elements of candidates

#

you're not extracting the elements at the indices of usedIndices

languid thistle Apr 27, 2024, 5:00 PM

#

yeyp

#

oh

haughty mountain Apr 27, 2024, 5:01 PM

#

I'm assuming this would work fine

usedIndices.map(i=>candidates[i])

#

now things will actually have the target sum

#

but you'll probably have duplicates, which is a more fundamental problem

languid thistle Apr 27, 2024, 5:03 PM

#

haughty mountain I'm assuming this would work fine ``` usedIndices.map(i=>candidates[i]) ```

this will map the values to whatever is stored at the index of that value

#

so [400,2].map(i=>candidates[i]) will return [candidates[400], candidates[2]]

haughty mountain Apr 27, 2024, 5:03 PM

#

correct

#

you are storing indices

#

you want to extract the corresponding values in candidates

#

this does that
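
The same pitfall translated to Python, since this is a Python channel. A minimal sketch: `used_indices` is a made-up example of what the search might have stored, and `enumerate` plays the role of the `(x, i)` callback form.

```python
candidates = [1, 1, 2, 5, 6, 7, 10]   # the example input after sorting
used_indices = [0, 1, 4]               # hypothetical stored indices

# Like usedIndices.map((x, i) => candidates[i]):
# pos is the position inside used_indices (0, 1, 2), not the stored index
wrong = [candidates[pos] for pos, _ in enumerate(used_indices)]

# Like usedIndices.map(i => candidates[i]):
right = [candidates[i] for i in used_indices]

print(wrong)   # [1, 1, 2]
print(right)   # [1, 1, 6]
```

The wrong version always yields the first few entries of `candidates`, which matches the `[1,1,2]` and `[1,1]` lists in the posted output.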

languid thistle Apr 27, 2024, 5:04 PM

#

thanks dude

#

definitely definitely that is my issue. I went brain dead and was using indices derived automatically of the map function for their array entries to map instead of that same array's values which are, in fact, the relevant indices

#

much thank you for taking the time to enlighten me
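
On the duplicates point raised earlier: one common way to avoid repeated combinations is to sort and skip equal values at the same recursion depth. The sketch below is a Python version written for illustration, not the poster's code; the function name `combination_sum_unique` is made up, and it assumes positive candidates.

```python
def combination_sum_unique(candidates, target):
    candidates = sorted(candidates)
    results = []

    def search(start, chosen, remainder):
        if remainder == 0:
            results.append(list(chosen))
            return
        for j in range(start, len(candidates)):
            value = candidates[j]
            if value > remainder:
                break          # sorted, so every later value is too big as well
            if j > start and value == candidates[j - 1]:
                continue       # this value was already tried at this depth
            chosen.append(value)
            search(j + 1, chosen, remainder - value)
            chosen.pop()

    search(0, [], target)
    return results

print(combination_sum_unique([10, 1, 2, 7, 6, 1, 5], 8))
# [[1, 1, 6], [1, 2, 5], [1, 7], [2, 6]]
```

Storing values instead of indices in `chosen` also sidesteps the index-to-value mapping step entirely.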

rigid trench Apr 28, 2024, 3:02 AM

#

import cupy as cp
self.F = cp.random.random((res,res), dtype=cp.float32)

def addTo(self, arr):
    # Very slow.
    i0, i1= self.index # The middle of the right square
    wrap0 = i0 if i0!=0 else self.res # Fixes an issue with slices when the origin is 0
    wrap1 = i1 if i1!=0 else self.res
    arr[:-i0,:-i1]+=self.F[wrap0:,wrap1:] # A
    arr[:-i0,-i1:]+=self.F[wrap0:,:wrap1] # B
    arr[-i0:,:-i1]+=self.F[:wrap0,wrap1:] # C
    arr[-i0:,-i1:]+=self.F[:wrap0, :wrap1] # D
    return arr

This is my bottleneck, looking for ideas on how to speed it up.

lean walrus Apr 28, 2024, 3:47 AM

#

i have no experience with cupy (and little experience with numpy), but i think this might help: https://docs.cupy.dev/en/stable/reference/generated/cupy.roll.html#cupy.roll
the idea is to "shift" first array down-right by the size of A, and then add it to the original array
documentation says that "Elements that roll beyond the last position are re-introduced at the first", which is perfect for this case
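
A quick check of that idea, sketched in plain numpy (cupy mirrors most of this API, but that is not tested here). `add_quadrants` is a standalone copy of the posted slicing logic with made-up sizes; the comparison suggests the four slice additions equal a roll by `(-i0, -i1)`. Whether the roll is faster on a GPU is a separate question.

```python
import numpy as np

def add_quadrants(arr, F, index):
    # Same slicing as the posted addTo, written as a plain function
    res = F.shape[0]
    i0, i1 = index
    wrap0 = i0 if i0 != 0 else res
    wrap1 = i1 if i1 != 0 else res
    arr[:-i0, :-i1] += F[wrap0:, wrap1:]   # A
    arr[:-i0, -i1:] += F[wrap0:, :wrap1]   # B
    arr[-i0:, :-i1] += F[:wrap0, wrap1:]   # C
    arr[-i0:, -i1:] += F[:wrap0, :wrap1]   # D
    return arr

rng = np.random.default_rng(0)
res = 8
F = rng.random((res, res))

for index in [(3, 5), (0, 2), (0, 0)]:
    by_slices = add_quadrants(np.zeros((res, res)), F, index)
    by_roll = np.roll(F, shift=(-index[0], -index[1]), axis=(0, 1))
    print(index, np.allclose(by_slices, by_roll))   # True for each index
```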

rigid trench Apr 28, 2024, 4:01 AM

#

Yeah the issue is that roll I believe re-writes in memory

#

so while it works, it's slow

tender atlas Apr 28, 2024, 7:01 AM

#

Have you guys ever solved shortest path with DFS? I have a 20x20 grid where I am getting a path that is much too long. I tried path compression on the DFS path from the start point to the goal point to get an optimal path. Is it really possible to get an optimal path from DFS?

regal spoke Apr 28, 2024, 9:21 AM

#

tender atlas Have you ever guys solve DFS with shortest path? I have 20x20 grid where i am ge...

DFS for shortest paths is not really a thing. You need BFS

#

Also fyi, a 20x20 grid is tiny, so any shortest path program should run super fast for it
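
A minimal BFS sketch for a grid, assuming cells are 0 (free) or 1 (wall) and moves are up/down/left/right. The grid, the function name, and the cell encoding are made up for illustration.

```python
from collections import deque

def shortest_path_length(grid, start, goal):
    """Fewest steps from start to goal, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None

grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(shortest_path_length(grid, (0, 0), (3, 3)))   # 6
```

BFS visits cells in order of distance from the start, so the first time it reaches the goal it has found a shortest path. DFS gives no such guarantee.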

regal spoke Apr 28, 2024, 10:50 AM

#

rigid trench ```py import cupy as cp self.F = cp.random.random((res,res), dtype=cp.float32) ...

I don't understand why you are indexing with negative numbers like that. If the sizes of A,B,C,D are the same, then you should be able to index the left hand side and right hand side using the same (positive) numbers

haughty mountain Apr 28, 2024, 11:04 AM

#

rigid trench ```py import cupy as cp self.F = cp.random.random((res,res), dtype=cp.float32) ...

what are the sizes? how long does it take? and how long is it ok to take?

flat sorrel Apr 28, 2024, 12:31 PM

#

rigid trench ```py import cupy as cp self.F = cp.random.random((res,res), dtype=cp.float32) ...

Are you calling this function many times? If that's the case, perhaps you can batch the calls together to avoid looping in Python

regal spoke Apr 28, 2024, 12:53 PM

#

One more thing, A + D = D + A and B + C = C + B. So looks like you are computing all sums twice

flat sorrel Apr 28, 2024, 12:57 PM

#

btw what is the relationship between arr and self.F?

muted helm Apr 28, 2024, 1:00 PM

#

Hola guys, I have to convert my sha256 into something that is bigint compatible - just want to triple check but the below is correct right?:
def f_sha256_to_8_byte(f_sha256_input_value):
    return int(f_sha256_input_value, 16) % 2**63

slender sandal Apr 28, 2024, 1:12 PM

#

You can mask it via int(...) & ((1 << 63) - 1) to not use modulus and exponentiation

#

Also you should probably use the right number of bits to mask, so int(...) & ((1 << 255) - 1)

#

Oh I'm just now realizing the name of the function

#

SHA256 gives hashes of 8 32-bit integers, not 8 one byte long integers

muted helm Apr 28, 2024, 1:41 PM

#

Yeah but the idea would be to take sha and compress it into bigint which i understand to be 8 byte

#

This is just for internal house keeping where I have to use bigint

#

you are right it should be
def f_sha256_to_8_byte(f_sha256_input_value):
    return int(f_sha256_input_value, 16) % (2**63 -1)

#

But that should give the same result no?

flat sorrel Apr 28, 2024, 1:48 PM

#

if you're applying modulus anyway, why not simply truncate the string before converting it into an int?

muted helm Apr 28, 2024, 1:48 PM

#

I am fairly indifferent, only point is then I have to calculate how much to truncate

slender sandal Apr 28, 2024, 1:49 PM

#

flat sorrel if you're applying modulus anyway, why not simply truncate the string before con...

Even better imo

regal spoke Apr 28, 2024, 1:51 PM

#

flat sorrel if you're applying modulus anyway, why not simply truncate the string before con...

%(2**63 - 1) and %2**63 are not the same thing

#

oh I replied to the wrong comment

regal spoke Apr 28, 2024, 1:51 PM

#

muted helm you are right it should be def f_sha256_to_8_byte(f_sha256_input_value): re...

I meant this

muted helm Apr 28, 2024, 1:51 PM

#

I know, I meant the same as in using modulus or truncating

#

actually it won't be the same either

#

but it should serve the same purpose given sha256 is uniform

#

or am I completely on the moon?

rigid trench Apr 28, 2024, 1:52 PM

#

flat sorrel btw what is the relationship between `arr` and `self.F`?

arr is an accumulator.
The algorithm outline:
There are 6 matrices of size 256x256 (or any size). Each matrix has an offset. This avoids the need to roll the matrix, but makes the sum of two matrices more complicated.

For example, Matrix 1 may have the origin at (5,0). In which case the data is really [5:, 0:] concat to [:5,0:]
That data is collected at the accumulator at indices [:-5,0:] concat to [-5:,0:] We avoid concatting in practice by accumulating directly into the correct area of the matrix.

regal spoke Apr 28, 2024, 1:52 PM

#

Your code is wrong, you want either %2**63 or &(2**63 -1) (these are the same thing)
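
A small sketch of that equivalence. The input bytes are made up, and "bigint" is assumed here to mean a signed 64-bit integer column.

```python
import hashlib

hex_digest = hashlib.sha256(b"example record").hexdigest()   # made-up input
n = int(hex_digest, 16)

by_mod = n % 2**63
by_mask = n & (2**63 - 1)

print(by_mod == by_mask)   # True: both keep the low 63 bits
print(by_mod < 2**63)      # True: fits in a signed 64-bit integer

# Small numbers make the difference to % (2**3 - 1) visible
print(10 % 2**3, 10 & (2**3 - 1), 10 % (2**3 - 1))   # 2 2 3
```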

muted helm Apr 28, 2024, 1:53 PM

#

ah because mod also returns 0 I suppose?

regal spoke Apr 28, 2024, 1:53 PM

#

wut

#

% (2**63 - 1) is a weird operation that you should probably avoid using

muted helm Apr 28, 2024, 1:54 PM

#

Yeah my bad I confused 2 things

#

in any case, given that sha256 is uniform mod 2**63 should compress sha256 correctly to bigint right? (Fully appreciating that it materially increases collision risk, but that should be manageable from my end as I don't have more than a few million records)

#

I am happy to truncate instead if that is better

flat sorrel Apr 28, 2024, 1:55 PM

#

rigid trench arr is an accumulator. The algorithm outline: There are 6 matrix of size 256x256...

can you show an example of how you're calling the function?

#

I don't really see why the current function is "slow" enough to matter

rigid trench Apr 28, 2024, 1:56 PM

#

haughty mountain what are the sizes? how long does it take? and how long is it ok to take?

As fast as possible. This is used for graphics processing. Currently I can do the operation about 150x per second on my laptop, but the faster it is, the more iterations I can do per second

slender sandal Apr 28, 2024, 1:58 PM

#

muted helm I am fairly indifferent, only point is then I have to calculate how much to trun...

Slicing is one wonderful thing. Assuming your string represents a number in base 16, big endian: needed = int(value[-64:], 16)

rigid trench Apr 28, 2024, 1:59 PM

#

flat sorrel can you show an example of how you're calling the function?


class LatticeVector:
  def addTo(self, arr):
    # Very slow. [ above]

class LatticeFrame:
  def addTo(self):
    self.temp.fill(0)
    return self.Qs[4].addTo(self.Qs[3].addTo(self.Qs[2].addTo(self.Qs[1].addTo(self.Qs[0].addTo(self.temp)))))

class CubeLattice:
  def get(self):
    for i in range(6):
      self.output[i,:,:]=self.F[i].addTo()
    return self.output.view()

L=CubeLattice(256) # controllable texture size
cProfile.run('''for i in range(1000):
                     L.get()''')

muted helm Apr 28, 2024, 2:00 PM

#

slender sandal Slicing is one wonderful thing. Assuming your string represents a number in base...

This will become way too big no? If I wanted to slice shouldn't I do it after conversion to int?

flat sorrel Apr 28, 2024, 2:00 PM

#

rigid trench ```py class LatticeVector: def addTo(self, arr): # Very slow. [ above] c...

ok I think you should be batching your operations

#

e.g. by stacking the input array to 3D

haughty mountain Apr 28, 2024, 2:01 PM

#

rigid trench As fast as possible. This is used for graphics processing. Currently I can do th...

That's a wildly unhelpful answer to my questions 😔

rigid trench Apr 28, 2024, 2:01 PM

#

haughty mountain That's a wildly unhelpful answer to my questions 😔

The size is 6*256*256*5

#

in total

flat sorrel Apr 28, 2024, 2:01 PM

#

then modifying your function to handle batch input

def addTo(self, arr):
    # Very slow.
    i0, i1= self.index # The middle of the right square
    wrap0 = i0 if i0!=0 else self.res # Fixes an issue with slices when the origin is 0
    wrap1 = i1 if i1!=0 else self.res
    arr[..., :-i0,:-i1]+=self.F[wrap0:,wrap1:] # A
    arr[..., :-i0,-i1:]+=self.F[wrap0:,:wrap1] # B
    arr[..., -i0:,:-i1]+=self.F[:wrap0,wrap1:] # C
    arr[..., -i0:,-i1:]+=self.F[:wrap0, :wrap1] # D
    return arr

haughty mountain Apr 28, 2024, 2:02 PM

#

rigid trench The size is `6*256*256*5`

that's not that big, how long does it take in just plain numpy?

rigid trench Apr 28, 2024, 2:03 PM

#

haughty mountain that's not that big, how long does it take in just plain numpy?

I agree, it's not big. I'm CPU bound or something.

haughty mountain Apr 28, 2024, 2:03 PM

#

you're running stuff on a GPU, you're probably bound on transferring data to the GPU

rigid trench Apr 28, 2024, 2:03 PM

#

flat sorrel then modifying your function to handle batch input ```py def addTo(self, arr): ...

How does the ... work?

haughty mountain Apr 28, 2024, 2:04 PM

#

transferring data to the GPU is comparatively slow

#

hence stuff like batching

flat sorrel Apr 28, 2024, 2:04 PM

#

it skips the leading dimensions, so :-i0, :-i1 (for example) would correspond to the last 2 dimensions
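
A minimal numpy sketch of how `...` behaves. The array shapes (6 faces, 5 directions, 4x4 grid) are made up for illustration and are not the poster's actual layout.

```python
import numpy as np

batch = np.zeros((6, 5, 4, 4))      # hypothetical: 6 faces, 5 directions, 4x4 grid
F = np.arange(4.0).reshape(2, 2)

# "..." stands for "all leading axes", so this is batch[:, :, :2, :2]
batch[..., :2, :2] += F             # F is broadcast over the 6 x 5 leading entries

print(np.array_equal(batch[3, 1, :2, :2], F))   # True
print(batch[..., 2:, 2:].sum())                 # 0.0 (untouched region)

total = batch.sum(axis=1)           # reduce over the "direction" axis afterwards
print(total.shape)                  # (6, 4, 4)
```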

haughty mountain Apr 28, 2024, 2:04 PM

#

I still say get some baseline just based on numpy

muted helm Apr 28, 2024, 2:05 PM

#

slender sandal Slicing is one wonderful thing. Assuming your string represents a number in base...

Also is there a particular reason to slice instead of mod? I assume it is faster?

haughty mountain Apr 28, 2024, 2:05 PM

#

i.e. how long does this take on the CPU?

rigid trench Apr 28, 2024, 2:05 PM

#

flat sorrel it skips the leading dimensions, so `:-i0, :-i1` (for example) would correspond ...

No I need to sum the matrixes together

flat sorrel Apr 28, 2024, 2:05 PM

#

you can perform the sum (a reduction operation) over the leading axes at the end

slender sandal Apr 28, 2024, 2:05 PM

#

muted helm This will become way too big no? If I wanted to slice shouldnt I do it after con...

It will save int(..., 16) from parsing hex digests that are certainly too long

flat sorrel Apr 28, 2024, 2:06 PM

#

flat sorrel you can perform the sum (a reduction operation) over the leading axes at the end

it seems that self.F isn't changing between each call, that's why I suggested to do it in batch

rigid trench Apr 28, 2024, 2:07 PM

#

haughty mountain i.e. how long does this take on the CPU?

Numpy

ncalls tottime percall cumtime percall
30000 4.150 0.000 4.150 0.000 LatticeVector.py:41(addTo)
Cupy
30000 2.587 0.000 2.613 0.000 LatticeVector.py:41(addTo)

slender sandal Apr 28, 2024, 2:08 PM

#

slender sandal It will save `int(..., 16)` from doing its thing for certainly too long hex dige...

@muted helm
But if you're getting your hash from hashlib.sha256, you should be able to get precisely what you want by doing

int.from_bytes(hashlib_obj.digest())

haughty mountain Apr 28, 2024, 2:08 PM

#

do you have some minimal runnable example?

muted helm Apr 28, 2024, 2:08 PM

#

slender sandal @muted helm But if you're getting your hash from `hashlib.sha256`, yo...

But this doesn't compress it to bigint or am I missing something?

rigid trench Apr 28, 2024, 2:08 PM

#

flat sorrel it seems that `self.F` isn't changing between each call, that's why I suggested ...

self.F will eventually change between each call, but the full algorithm isn't coded yet

slender sandal Apr 28, 2024, 2:09 PM

#

muted helm But this doesnt compress it to bigint or am I missing something?

int.from_bytes literally turns a bytes-like object into an int object. Like what int(..., 16) would do in the end

flat sorrel Apr 28, 2024, 2:09 PM

#

rigid trench Self.F will eventually change between each call, but the full algorithm isn't co...

where are you getting that algorithm from? I think it would give us some context to work on

rigid trench Apr 28, 2024, 2:09 PM

#

flat sorrel where are you getting that algorithm from? I think it would give us some context...

This is Lattice Boltzmann fluid simulation, using setup D2Q5

#

I've adapted it for a cube

muted helm Apr 28, 2024, 2:10 PM

#

slender sandal `int.from_bytes` literally turns a bytes-like object into an `int` object. Like ...

yeah but it will still be too long no? Then I would need to slice out the first x digits to make it compatible with bigint

rigid trench Apr 28, 2024, 2:10 PM

#

(hence: 6 surfaces of square size)

rigid trench Apr 28, 2024, 2:13 PM

#

flat sorrel where are you getting that algorithm from? I think it would give us some context...

I've optimized the "Streaming" step by not rolling the data, and instead moving the origin point. However for the Collision step, I still need to know how many particles are in the 'real' cell. So I need to create an accumulator to add many matrixes with different offsets

#

Right now, that's 5 matrixes per cell, but a standard D2Q9 setup would require 9 (one for each direction + standing still).
A 3D model (which I might try) would require between 21 and 27 matrixes to be added together.

#

My gut tells me: all the data is already in the GPU, and it's just matrix addition. So I don't know if I'm doing something wrong, I feel like it should be able to add much more data

#

Otherwise, I'm bound to maximum 150 FPS

slender sandal Apr 28, 2024, 2:17 PM

#

muted helm yeah but it will still be too long no? Then I would need to slice out the first ...

What bigint are you talking about

flat sorrel Apr 28, 2024, 2:31 PM

#

rigid trench I've optimized the "Streaming" step by not rolling the data, and instead moving ...

... and the matrix to add each time may be different, right?

muted helm Apr 28, 2024, 2:33 PM

#

slender sandal What bigint are you talking about

bigint is the format I have to compress my sha256 hash into

flat sorrel Apr 28, 2024, 2:33 PM

#

so basically you have an accumulator array with shape (256, 256), and you want to add Q other matrixes (each also (256, 256) in size but with different offsets) to that accumulator array

#

@rigid trench am I understanding this correctly?
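
One possible way to batch that accumulation, sketched in plain numpy with made-up sizes. It gathers all Q wrapped frames with one indexing expression and checks the result against a loop of `np.roll`. The offset convention (accumulator cell `(r, c)` reads frame cell `(r + oy, c + ox)` with wraparound) is an assumption based on the posted `addTo`, and whether this is actually faster on a GPU has not been measured here.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, H, W = 5, 8, 8
frames = rng.random((Q, H, W))
oy = rng.integers(0, H, size=Q)   # row offset of each frame
ox = rng.integers(0, W, size=Q)   # column offset of each frame

# Loop version: one roll and one addition per frame
looped = np.zeros((H, W))
for q in range(Q):
    looped += np.roll(frames[q], shift=(-oy[q], -ox[q]), axis=(0, 1))

# Batched version: build all wrapped row/column indices at once
rows = (np.arange(H)[None, :] + oy[:, None]) % H   # shape (Q, H)
cols = (np.arange(W)[None, :] + ox[:, None]) % W   # shape (Q, W)
q_idx = np.arange(Q)[:, None, None]                # shape (Q, 1, 1)
batched = frames[q_idx, rows[:, :, None], cols[:, None, :]].sum(axis=0)

print(np.allclose(looped, batched))   # True
```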

slender sandal Apr 28, 2024, 2:36 PM

#

muted helm bigint is the format I have to compress my sha256 hash into

That means throwing away 75% of the hash though... If you want to, sure, mask only the first 64 bits ¯\_(ツ)_/¯

muted helm Apr 28, 2024, 2:37 PM

#

Yeah unfortunately I have to given that's the required values

#

but mod using 2**63 should do the same no?

slender sandal Apr 28, 2024, 2:40 PM

#

**64 I think

#

Or you could do

int.from_bytes(hashlib_obj.digest()[-8:])

which is a little cryptic

#

Not the security related definition of cryptic
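
A sketch showing that the byte slice, the hex slice, and the modulus agree. The input bytes are made up. Note that 8 bytes correspond to 16 hex digits, and the final mask to 63 bits assumes the target is a signed 64-bit column.

```python
import hashlib

digest = hashlib.sha256(b"example record").digest()   # 32 raw bytes
hex_digest = digest.hex()                              # 64 hex characters

from_bytes = int.from_bytes(digest[-8:], "big")   # last 8 bytes = low 64 bits
from_hex = int(hex_digest[-16:], 16)              # last 16 hex digits = same bits
from_mod = int(hex_digest, 16) % 2**64

print(from_bytes == from_hex == from_mod)   # True

signed_safe = from_bytes & (2**63 - 1)      # drop the top bit for a signed column
print(signed_safe < 2**63)                  # True
```

The byte order is passed explicitly because the default argument for `int.from_bytes` only exists in newer Python versions.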

flat sorrel Apr 28, 2024, 2:45 PM

#

so a basic numpy code for the computation would be something like

#

!e

import random
import numpy as np

def rand_q(height: int, width: int):
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 16
Q = 9
result = np.zeros((HEIGHT, WIDTH), dtype=float)
origin_frames = [rand_q(HEIGHT, WIDTH) for _ in range(Q)]

for origin, frame in origin_frames:
    # axis=(0, 1) shifts rows by y0 and columns by x0
    result += np.roll(frame, origin, axis=(0, 1))

print(result)

halcyon plankBOT Apr 28, 2024, 2:45 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | [[4.48619444 3.74053379 4.23902768 3.92672632 5.08335403 3.34358598
002 |   5.71451535 5.06600711 4.32237329 4.51426973 4.46561417 4.62321253
003 |   5.12898984 5.95249311 6.05474966 3.04084689]
004 |  [4.40489298 5.21467323 4.93804735 5.60822198 4.43054237 5.08963529
005 |   4.60860832 3.8062857  3.39699674 4.37238177 4.30100951 4.60762078
006 |   3.3029477  5.61203333 5.4895313  5.8631443 ]
007 |  [4.17443367 4.91755452 3.98782944 5.72625161 4.72554572 3.36503766
008 |   3.82932581 3.69205534 5.63291705 4.72333282 5.58175435 3.81820228
009 |   4.04194926 6.38564139 3.13121809 4.56128867]
010 |  [3.9851704  2.49233092 4.94240525 2.23528422 4.89501334 3.47701614
011 |   4.12929219 5.22873326 5.02825565 5.36588294 6.81544752 4.66081924
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/DLXG5P3X4I2ULWK6SMCBD5WK3Y
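
One detail worth checking when passing a `(y0, x0)` tuple to `np.roll`: the shift is only applied per axis when `axis` is given as well. Without `axis`, NumPy flattens the array and shifts it by the sum of the tuple. A small sketch on a toy array (the array and offsets are arbitrary):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# No axis: the array is flattened and shifted by 1 + 2 = 3 positions.
flat_shift = np.roll(a, (1, 2))
# With axis: 1 row down and 2 columns right, wrapping around the edges.
grid_shift = np.roll(a, (1, 2), axis=(0, 1))

same_as_flat = np.array_equal(flat_shift, np.roll(a.ravel(), 3).reshape(a.shape))
same_as_grid = np.array_equal(flat_shift, grid_shift)
print(same_as_flat, same_as_grid)
```

For the grid-offset use case discussed here, the `axis=(0, 1)` form is presumably the intended one.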

flat sorrel Apr 28, 2024, 2:48 PM

#

and then you have to perform this computation for every frame (up to 150 FPS)

rigid trench Apr 28, 2024, 3:03 PM

#

flat sorrel so basically you have an accumulator array with shape `(256, 256)`, and you want...

exactly correct

rigid trench Apr 28, 2024, 3:04 PM

#

flat sorrel !e ```py import random import numpy as np def rand_q(height: int, width: int): ...

Yes (though your origin should be an int so it's grid aligned, or else you'll get an error)

rigid trench Apr 28, 2024, 3:06 PM

#

flat sorrel and then you have to perform this computation for every frame (up to 150 FPS)

Yup. This is the task.

flat sorrel Apr 28, 2024, 3:06 PM

#

random.randrange returns an int, so it should be ok

#

is there any relationship between the origin and/or values of different Q matrices?

rigid trench Apr 28, 2024, 3:08 PM

#

Technically yes.
Consider the two Q matrices representing "go east" and "go west".
On timestep 1, they are both origin (0,0).
On timestep 2, they are on (-1,0) and (1,0)

#

after 256 steps, they'll both be back to (0,0)

flat sorrel Apr 28, 2024, 3:09 PM

#

how about within the same timestep?

rigid trench Apr 28, 2024, 3:09 PM

#

Within the same timestep you can represent their origin as:
originQ = timestep * (directionVectorQ)

flat sorrel Apr 28, 2024, 3:10 PM

#

direction vector being each combination of {-1, 0, 1} x {-1, 0, 1}?

rigid trench Apr 28, 2024, 3:10 PM

#

Yup, for D2Q9 setup, that's exactly correct

#

For D2Q5, you don't include the diagonal directions (there is some accuracy / time trade offs based on the number of directions included)
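
The origin rule and the two direction sets can be sketched as follows. This assumes a periodic 256-wide grid (which is what "back to (0,0) after 256 steps" suggests); the names `D2Q9`, `D2Q5` and `origin` are just for illustration.

```python
from itertools import product

GRID = 256  # grid size used in this discussion

# D2Q9: every combination of {-1, 0, 1} x {-1, 0, 1}
D2Q9 = list(product((-1, 0, 1), repeat=2))
# D2Q5: the same set without the four diagonal directions
D2Q5 = [d for d in D2Q9 if 0 in d]

def origin(timestep, direction, size=GRID):
    """origin_Q = timestep * direction_Q, wrapped onto the periodic grid."""
    return tuple((timestep * component) % size for component in direction)

print(len(D2Q9), len(D2Q5))
print(origin(1, (-1, 0)), origin(256, (1, 0)))
```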

flat sorrel Apr 28, 2024, 3:12 PM

#

but the values may be different for each Q, right?

rigid trench Apr 28, 2024, 3:12 PM

#

Yes the value matrixes are 100% unrelated

#

They represent "number of particles moving in direction Q, at timestep T, in this grid cell" which can be anything

flat sorrel Apr 28, 2024, 3:13 PM

#

are the values for the same direction vector fixed between timesteps?

rigid trench Apr 28, 2024, 3:13 PM

#

They are not fixed, but tend to be correlated with the previous step

#

The calculation is:
probability of interacting * CollisionDistribution(x,y) + (1- probability of interacting) * currentDistribution(x,y)
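
Written as code, that update is a weighted blend of the two distributions. A minimal sketch; the function name and the sample numbers are hypothetical, and how the interaction probability is derived is not shown here:

```python
def relax(current, collided, p_interact):
    """Blend the current and post-collision values for one direction Q."""
    return p_interact * collided + (1 - p_interact) * current

# With a 25% chance of interacting, the result sits a quarter of the way
# from the current value towards the collision value.
print(relax(2.0, 4.0, 0.25))
```

Because it only uses `*`, `+` and `-`, the same function also works elementwise on NumPy or CuPy arrays of matching shape.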

flat sorrel Apr 28, 2024, 3:15 PM

#

so those values cannot be computed in advance, since they are dependent on the previous step

rigid trench Apr 28, 2024, 3:15 PM

#

Yup

#

Steps are:

1. Move the origins (imagine every point is "moving", but instead we move the origin in the opposite direction)
2. Update each cell based on the collision probability

Repeat for each timestep

#

So this algorithm handles step 1

flat sorrel Apr 28, 2024, 3:18 PM

#

maybe you can use one of the Q matrices as the "anchor" instead of creating a fresh zero matrix (I doubt that would be much of an improvement though...)

rigid trench Apr 28, 2024, 3:18 PM

#

We use the sum of the number of particles moving in all directions Q to calculate the density, which feeds into the probability of collision

rigid trench Apr 28, 2024, 3:19 PM

#

flat sorrel maybe you can use one of the `Q` matrices as the "anchor" instead of creating a ...

Q = (0,0) is available yeah

#

But I can't clobber the values

#

I feel like maybe I can do something in parallel.
Like:
temp = Q[1,0] + Q[-1,0]
temp2 = Q[0,1] + Q[0,-1]
temp3 = Q[0,0]
temp3 += temp
temp3 += temp2

#

For D2Q5 it's not that much of a speedup, but it seems like I can get like, log2 behavior instead of linear
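
The pairing idea can be sketched generically as a tree-shaped sum. This is only an illustration of the round count, not a claim that it runs faster on a GPU: the additions inside one round are independent of each other, so they could in principle run concurrently. `pairwise_sum` is a hypothetical helper; it uses `+` (never `+=`), so the inputs are not clobbered.

```python
def pairwise_sum(items):
    """Sum items by adding neighbours round by round.

    Returns (total, rounds); rounds is ceil(log2(n)) for n items.
    """
    level = list(items)
    if not level:
        raise ValueError("need at least one item")
    rounds = 0
    while len(level) > 1:
        # Add neighbours (0,1), (2,3), ... into a new, shorter list.
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd item carries over unchanged
        level = merged
        rounds += 1
    return level[0], rounds

print(pairwise_sum([1, 2, 3, 4, 5]))  # 5 items, as in D2Q5
print(pairwise_sum(range(9)))         # 9 items, as in D2Q9
```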

flat sorrel Apr 28, 2024, 3:22 PM

#

you only have a relatively small number of matrices (<10 in your 2D case), the overhead of parallelizing the operation may be significant

rigid trench Apr 28, 2024, 3:22 PM

#

yeah fair

#

Do you suppose doing anything here with like, Numba, or something might help?

flat sorrel Apr 28, 2024, 3:23 PM

#

and since the values in each Q matrix are different, you can't use symmetry to reduce the number of operations

flat sorrel Apr 28, 2024, 3:24 PM

#

rigid trench Do you suppose doing anything here with like, Numba, or something might help?

I'm quite doubtful, but you can try

rigid trench Apr 28, 2024, 3:24 PM

#

When I run cupyx benchmark, it seems to imply that I'm CPU bound, but I'm not sure why. I don't think I'm copying things out of GPU...

flat sorrel Apr 28, 2024, 3:25 PM

#

can you guarantee that certain patches in each Q matrix contain only zero values? then you might be able to ignore those parts when performing addition

rigid trench Apr 28, 2024, 3:25 PM

#

Nope, these will all be non-zero

#

here's an example visualization (you can see the "noise" as the particles move around)

#

Actually it might not be visible in the gif

flat sorrel Apr 28, 2024, 3:29 PM

#

the only thing I could think of right now is writing a custom CUDA kernel to perform the addition with the given offset. that way, you can avoid having to use roll() which copies the array.

rigid trench Apr 28, 2024, 3:30 PM

#

flat sorrel the only thing I could think of right now is writing a custom CUDA kernel to per...

Interesting. Ok well I think if that's going to be the bottleneck, maybe I will accept it's about as optimized as my current skillset allows. Not sure I want to jump into writing custom CUDA code

flat sorrel Apr 28, 2024, 3:30 PM

#

yeah, that is going to be a pain xD

#

personally I don't have much experience in it either

rigid trench Apr 28, 2024, 3:32 PM

#

On my desktop I can hit around 210 FPS, which I guess is plenty. When I do the other steps it will likely still be >60 FPS which is probably fine

#

Obviously faster is better, but I guess as long as I maintain interactive speed it's OK

flat sorrel Apr 28, 2024, 3:35 PM

#

parallelizing might become more useful when you need to perform this computation at higher resolution (so the overhead isn't as much compared to the total computation time)

#

but in that case the FPS would probably be so low that you have to pre-compute your simulation before visualizing it

rigid trench Apr 28, 2024, 3:38 PM

#

Thanks, I'll do some more coding and circle back. Switching to Cupy already helped so much more than Numpy. I was getting something like 4 to 30 FPS before, so Cupy unlocked the project, and I wanted to check if I'm missing anything else before ending this round of optimization

flat sorrel Apr 28, 2024, 3:48 PM

#

@rigid trench ok I just thought of more ideas.

1. For each offset (which you know in advance), you can pre-compute the indices to assign and reshape them to (256, 256, 2) (assuming 2D), then you can use multidimensional indexing to assign the values without using roll(). However, the "random access" nature may cause this to actually be slower than your quadrant-tiling approach.
2. Consider your quadrant-tiling approach. Although the values are not symmetrical, the offsets are symmetrical. This opens up the possibility of batching the additions in each quadrant across multiple Q matrices (e.g. by considering which regions overlap). Actually, I think you may be able to move around the slice indices such that the indices on the arr side are the same for each Q, enabling you to batch the additions.
3. Reuse the accumulator matrix (keeping the same instance but resetting its values to zero between each function call) so that memory does not have to be reallocated.
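
For reference, a wrap-around add without `roll()` can be written with four slice pairs, one per quadrant. This is a sketch of the general idea, not the actual quadrant-tiling code referred to above (which was not posted here); `add_rolled` is a hypothetical name.

```python
import numpy as np

def add_rolled(acc, frame, dy, dx):
    """In-place acc += np.roll(frame, (dy, dx), axis=(0, 1)) without the rolled copy."""
    h, w = frame.shape
    dy %= h
    dx %= w
    acc[dy:, dx:] += frame[:h - dy, :w - dx]  # main block
    acc[dy:, :dx] += frame[:h - dy, w - dx:]  # columns that wrapped around
    acc[:dy, dx:] += frame[h - dy:, :w - dx]  # rows that wrapped around
    acc[:dy, :dx] += frame[h - dy:, w - dx:]  # corner that wrapped both ways

rng = np.random.default_rng(0)
frame = rng.random((16, 16))
acc = np.zeros((16, 16))  # one accumulator, reused for every call
add_rolled(acc, frame, 5, 7)
print(np.allclose(acc, np.roll(frame, (5, 7), axis=(0, 1))))
```

When an offset is zero, the corresponding slices are simply empty, so no special casing is needed.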

#

I have to go to bed soon so I hope this is clear enough for now xD

flat sorrel Apr 28, 2024, 4:20 PM

#

@rigid trench updated ^

haughty mountain Apr 28, 2024, 9:18 PM

#

flat sorrel !e ```py import random import numpy as np def rand_q(height: int, width: int): ...

so wait, this code but with HEIGHT = WIDTH = 256 should take roughly the same time as the actual computation?

#

err

#

oh wait

#

100 instead of 1000, let me delete and re-run

#

because python bot is dumb

#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9
origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')

halcyon plankBOT Apr 28, 2024, 9:22 PM

#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 0.5476519698277116ms

haughty mountain Apr 28, 2024, 9:22 PM

#

and locally

> python a.py
Time: 0.289999102242291ms

#

so stupid cheap pithink

#

You have: 1/0.54ms
You want:
        Definition: 1851.8519 / s

flat sorrel Apr 29, 2024, 2:00 AM

#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')

halcyon plankBOT Apr 29, 2024, 2:00 AM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 4.613355630135629ms

flat sorrel Apr 29, 2024, 2:00 AM

#

haughty mountain and locally ``` > python a.py Time: 0.289999102242291ms ```

the Q matrices are different in each timestep so this is a better representation

#

seems that memory allocation is the real bottleneck then, not the assignment using roll()

#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for i in range(n_runs):
  if i == 0:
    result = np.zeros((HEIGHT, WIDTH), dtype=float)
  else:
    result.fill(0)

  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')

halcyon plankBOT Apr 29, 2024, 2:03 AM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 4.700911450199783ms

flat sorrel Apr 29, 2024, 2:04 AM

#

hmm reusing the accumulator matrix doesn't seem to improve performance

haughty mountain Apr 29, 2024, 2:04 AM

#

!e Let's see how big a fraction is just the random number generation

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.zeros((height, width))
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')

#

err

flat sorrel Apr 29, 2024, 2:04 AM

#

put height and width in a tuple

halcyon plankBOT Apr 29, 2024, 2:04 AM

#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 1.2401868900633417ms

haughty mountain Apr 29, 2024, 2:05 AM

#

so a really big chunk of it

#

is just the random number generation

flat sorrel Apr 29, 2024, 2:05 AM

#

still 1.2ms is quite a bit more than 0.5ms

#

I wonder how @rigid trench determined that the assignment of the matrices is the bottleneck then

#

Perhaps we could get better gains by optimizing the computation of the values in Q

haughty mountain Apr 29, 2024, 2:08 AM

#

on the gpu maybe it is because it ends up bottlenecked by small data transfers to the gpu

#

which is why I was interested in seeing just plain numpy, which I assume would be quite fast already

flat sorrel Apr 29, 2024, 2:08 AM

#

that is true. maybe it would be better to perform the whole accumulation step in cpu before transferring the result into gpu

flat sorrel Apr 29, 2024, 2:10 AM

#

rigid trench Numpy > ncalls tottime percall cumtime percall > 30000 4.150 0.000 ...

not sure whether this is for the overall computation or just the accumulator function

rigid trench Apr 29, 2024, 2:10 AM

#

flat sorrel I wonder how @rigid trench determined that the assignment of the matrice...

There's more to it yeah

regal spoke Apr 29, 2024, 5:57 AM

#

slender sandal `**64` I think

%2**64 gives 64 bits but the result could be a Python big int, which kind of defeats the purpose. I think %2**63 returns a normal int (not a big int)
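
If "bigint" here means a signed 64-bit column (an assumption, since the target system was not named), the constraint is the value range rather than Python's `int` type, which grows as needed. A small sketch of two ways to stay in range; `hash_to_bigint` is a hypothetical helper:

```python
import hashlib

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def hash_to_bigint(data: bytes) -> tuple[int, int]:
    """Return (63-bit non-negative value, signed 64-bit value) from SHA-256."""
    tail = hashlib.sha256(data).digest()[-8:]
    non_negative = int.from_bytes(tail, "big") % 2**63  # drops the top bit
    signed = int.from_bytes(tail, "big", signed=True)   # keeps all 64 bits
    return non_negative, signed

non_negative, signed = hash_to_bigint(b"example input")
print(0 <= non_negative <= INT64_MAX, INT64_MIN <= signed <= INT64_MAX)
```

The modulo form loses one more bit; the signed form keeps all 64 but can be negative.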

dawn iris Apr 29, 2024, 10:51 AM

#

Does anyone have a preference between returning a tuple vs a dict from a func? A dict, for instance, keeps the code clean since you don't have to unpack it when a func sets or returns multiple vals, but a tuple on the other hand provides a consistent order, though having more return items makes the code ugly.

flat sorrel Apr 29, 2024, 10:53 AM

#

Why not use positional/keyword arguments instead of passing tuple/dict? In most cases I think doing so would be more convenient

dawn iris Apr 29, 2024, 11:15 AM

#

It's the return value I'm after: do I set the dict inside my func or do I return the tuple?

#

having like 8 items in a returned tuple kind of makes it ugly, and I'm hesitant about setting and retrieving keys from a dict

flat sorrel Apr 29, 2024, 11:19 AM

#

I see. For return values, I much prefer dict over tuple for the readability.

#

even though Python doesn't have object unpacking syntax like in JavaScript, if I have to return more than 2 items from a function then I would most likely use a dict

#

if the accessor syntax val = return_dict["key"] is too ugly, returning a dataclass is also a valid option

flat sorrel Apr 29, 2024, 11:29 AM

#

dawn iris having like 8 tuple returns kind of makes it ugly and i am hesitant on setting a...

I think NamedTuple might also do the trick, so you get to do it both ways
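
A short sketch of the `NamedTuple` option, with made-up field names, showing that the same return value supports both tuple-style unpacking and named access:

```python
from typing import NamedTuple

class Summary(NamedTuple):
    minimum: float
    maximum: float
    mean: float

def summarize(values) -> Summary:
    """Return several results at once without a bare tuple or a dict."""
    return Summary(min(values), max(values), sum(values) / len(values))

result = summarize([1.0, 2.0, 3.0])
low, high, mean = result         # positional unpacking, like a tuple
print(result.mean, result.maximum)  # named access, like a dataclass
print(result._asdict())          # dict view when one is needed
```

A `dataclass` gives the same named access but no positional unpacking, which can be preferable once the number of fields grows.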

hushed yoke Apr 29, 2024, 2:06 PM

#

guys best resource for dsa? ik im asking the most annoying basic ass question but please tell.

young totem Apr 29, 2024, 2:52 PM

#

hushed yoke guys best resource for dsa? ik im asking the most annoying basic ass question bu...

check pins plenty of gr8 resources

jolly mortar Apr 29, 2024, 4:42 PM

#

https://codeforces.com/contest/1969/problem/D
this was my solution

t = int(input())
for _ in range(t):
    n, k = map(int, input().split())
    *a, = map(int, input().split())
    *b, = map(int, input().split())
    b, a = zip(*sorted(zip(b, a)))
    prevn = [0]*(k+1)
    prevn[0] = max(b[0] - a[0], 0)
    for nn in range(1, n):
        newn = [None]*(k+1)
        newn[0] = prevn[0] + max(b[nn] - a[nn], 0)
        for kk in range(1, k+1):
            newn[kk] = max(prevn[kk-1] - a[nn], prevn[kk])
        prevn = newn
    print(prevn[-1])

which TLEd on test 11 😔
algorithmically can you do better than this O(nk) or is it a constant factor diff

rigid trench Apr 29, 2024, 5:10 PM

#

flat sorrel still 1.2ms is quite a bit more than 0.5ms

I'm worried maybe I'm preoptimizing, as it's making my code harder to understand

solemn moss Apr 29, 2024, 5:19 PM

#

jolly mortar https://codeforces.com/contest/1969/problem/D this was my solution ```py t = in...

I have smth about O(n log k) with c++

#

It's easy to do with ordered set, but python doesn't have it built-in :(

jolly mortar Apr 29, 2024, 5:20 PM

#

pls share 👉 👈

solemn moss Apr 29, 2024, 5:20 PM

#

O(nk) surely can't pass here

#

https://codeforces.com/contest/1969/submission/258770000

jolly mortar Apr 29, 2024, 5:21 PM

#

ty

solemn moss Apr 29, 2024, 5:23 PM

#

We want only values where B[i] > A[i], other are useless

Two sets - one is for values that we take but bob doesn't take them, second one for values that bob does take

Then we erase the value with the biggest A value from the second set, and update it with the new value that would get into it from first one (the one with max B value)

And the answer is the best from these possible answers

solemn moss Apr 29, 2024, 5:24 PM

#

solemn moss It's easy to do with ordered set, but python doesn't have it built-in :(

I guess we can do what we need here in python with heapq

#

Yep

#

Here is the same code in python https://codeforces.com/contest/1969/submission/258773267
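
The heap trick itself, independent of this particular problem, looks roughly like the sketch below. It is not the linked submission, just the generic building block: keep the k smallest values of a growing prefix in a max-heap (negated, since `heapq` is a min-heap) and evict the largest whenever the heap grows past k.

```python
import heapq

def k_smallest_prefix_sums(values, k):
    """For each prefix with at least k items, return the sum of its k smallest values."""
    heap = []   # negated values, so heap[0] is the largest value currently kept
    total = 0   # sum of the values currently kept
    sums = []
    for value in values:
        heapq.heappush(heap, -value)
        total += value
        if len(heap) > k:
            total += heapq.heappop(heap)  # pops -largest, so this subtracts it
        if len(heap) == k:
            sums.append(total)
    return sums

print(k_smallest_prefix_sums([5, 1, 4, 2, 3], 2))
```

Each step costs O(log k), giving O(n log k) overall instead of O(nk).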

jolly mortar Apr 29, 2024, 5:37 PM

#

oh

#

i get it now

#

thanks

buoyant wing Apr 29, 2024, 10:08 PM

#

https://discord.com/channels/267624335836053506/1234622571287417033

#

billybobby said to check it out and hes a mod soooo shrug

#

pls help tho, i really need it 😭

keen hatch Apr 29, 2024, 11:44 PM

#

hi yall, homework problem I'm a bit stumped on.
Given two sorted integer arrays A and B, merge B into A as one sorted array in O(n) time. Write the description and code of the algorithm.

def two_sorted_merge(a: list, b: list):
    a_point = 0
    b_point = 0

    # Run loop while both our pointers are in bounds
    while a_point <= len(a) - 1 and b_point <= len(b) - 1:
        # If our current element in A is greater than b[b_point], insert behind us and increase b_point
        if a[a_point] > b[b_point]:
            a.insert(a_point, b[b_point])
            b_point += 1  # Move our b pointer forward to look at the next element in the array

        # We need to increase our a pointer whether we just inserted an element from B or not. If we didn't then it's
        # time to move forward and see if the next element in A is greater than b[b_point]. If we did then all our
        # indexes in A just shifted up 1, and we need to increase to maintain our same position in the array.
        a_point += 1

    # If B has items remaining then b[b_point] >= max(a), so we can simply insert them all
    if b_point <= len(b) - 1:
        for i in range(b_point, len(b)):
            a.append(b[i])

    return a
this function does actually work but `a.insert(a_point, b[b_point])` makes the whole thing not O(n), and I'm having trouble imagining another way

modern gulch Apr 29, 2024, 11:52 PM

#

If it's not O(n), what do you think it is?

keen hatch Apr 30, 2024, 12:02 AM

#

modern gulch If it's not O(n), what do you think it is?

Well in the worst case like O(n^2) where n is the length of B right? with inputs like A = [1, 9999] and B = [2, 3, ..., 9998] that inner loop is gonna run for every element in B with an O(n) call every iteration

modern gulch Apr 30, 2024, 12:03 AM

#

keen hatch Well in the worst case like O(n^2) where n is the length of B right? with inputs...

There are len(a)+len(b) elements, right?

keen hatch Apr 30, 2024, 12:04 AM

#

ahuh

modern gulch Apr 30, 2024, 12:04 AM

#

Oh, your issue is the insert.

#

Why not just build a new list, rather than mutate a?

#

Or is it required to mutate a?

keen hatch Apr 30, 2024, 12:06 AM

#

the problem specifically says merge b into a

modern gulch Apr 30, 2024, 12:06 AM

#

Did they provide the stub, and did they return a?

keen hatch Apr 30, 2024, 12:07 AM

#

yeah from the example it looks like thats what they want

#

I wasn't sure because my professor's English isn't great but the example seems pretty clear 🙁

modern gulch Apr 30, 2024, 12:09 AM

#

yah, the normal solution to this is to merge into a new list, since you're just appending to the end.

keen hatch Apr 30, 2024, 12:09 AM

#

thats what I did at first but the example got in my head

#

I did email to ask for clarification

#

but tbh I just wanted to check to make sure I wasn't missing some super obvious way to do this

#

but it doesn't seem obvious in O(n) with the built in lists. implementing it as a linked list would make it work right?

modern gulch Apr 30, 2024, 12:10 AM

#

Hmm, I guess you could do the merge in reverse. Extend A, and find the max element and put it at a[max_position]

#

then work backwards

#

(max to min avoids modifying the elements as you're using them)

keen hatch Apr 30, 2024, 12:12 AM

#

that makes sense 🤔

#

would there be input combinations where unsorted elements in A get overwritten by elements that come in from B?

modern gulch Apr 30, 2024, 12:13 AM

#

Or even easier, maybe just merge into an empty list, then a.clear() and add the sorted list?

keen hatch Apr 30, 2024, 12:15 AM

#

meh I think if they want it in A they want me to work in place and not allocate a new list

#

I'll try the backwards merge, thank you very much for that suggestion
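
A minimal sketch of the backwards merge, assuming both inputs are sorted ascending (the function name is made up). On the overwriting question above: the write position always stays ahead of the unread part of `a`, because it equals the number of unread items in `a` plus the number of unread items in `b`, minus one, so nothing is clobbered before it is read.

```python
def merge_into(a, b):
    """Merge sorted list b into sorted list a in O(len(a) + len(b)), mutating a."""
    i = len(a) - 1   # last unread item of the original a
    j = len(b) - 1   # last unread item of b
    a.extend(b)      # make room; these slots get overwritten below
    k = len(a) - 1   # next slot to fill, moving right to left
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    # Once b is exhausted, the remaining items of a are already in place.
    return a

print(merge_into([1, 9999], [2, 3, 5]))
```

`a.extend(b)` is amortized O(len(b)), so the whole function stays linear, unlike repeated `a.insert(...)`.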

keen hatch Apr 30, 2024, 3:03 AM

#

modern gulch Hmm, I guess you could do the merge in reverse. Extend A, and find the max eleme...

I got this working perfectly and then he emailed me basically saying "who cares ur answer is fine" 🥲

modern gulch Apr 30, 2024, 3:04 AM

#

keen hatch I got this working perfectly and then he emailed me basically saying "who cares ...

Hah, well, you'll see these again if you ever leetcode :)

urban tundra Apr 30, 2024, 8:01 AM

#

I solved it correctly...
But I want to know: is my solution better than the approach given by the website?

And as we can see, the approach given has a time complexity of O(n*n), whereas my solution has a time complexity of O(n), right?

#

If anyone has any kind of suggestions etc.. pls let me know.. and ping me any number of times... not a problem... 😉

#

➕ ❕Another question: will having comments in my code increase the runtime, even by 1ms? I know the interpreter ignores them, but it still has to parse them, right? So would that affect the runtime even by 1ms?

haughty mountain Apr 30, 2024, 8:11 AM

#

urban tundra I solved it correctly... But I want to know.. that Is my solution better than t...

how is your thing O(n)?

urban tundra Apr 30, 2024, 8:11 AM

#

haughty mountain how is your thing O(n)?

sorry, it would be O(2*n)... right..?

#

haughty mountain Apr 30, 2024, 8:12 AM

#

O(2n) is the same as O(n)

#

your solution is not O(n)

urban tundra Apr 30, 2024, 8:12 AM

#

ik thats why i said.. O(n)

urban tundra Apr 30, 2024, 8:12 AM

#

haughty mountain your solution is not O(n)

why??

#

@haughty mountain there are no nested loops.. then how can it be.. O(n*n)??

haughty mountain Apr 30, 2024, 8:13 AM

#

number of nested loops doesn't automatically tell you the complexity

#

why are you assuming your prints are O(1)?

urban tundra Apr 30, 2024, 8:14 AM

#

haughty mountain why are you assuming your prints are O(1)?

Oh... I thought that would be negligible.. sorry idk much..

haughty mountain Apr 30, 2024, 8:14 AM

#

well, prints and the string operations

urban tundra Apr 30, 2024, 8:14 AM

#

haughty mountain well, prints and the string operations

then what is the exact time complexity??

haughty mountain Apr 30, 2024, 8:15 AM

#

your thing is also O(n²)

urban tundra Apr 30, 2024, 8:15 AM

#

🤯

haughty mountain Apr 30, 2024, 8:15 AM

#

e.g. "a"*n is O(n)

urban tundra Apr 30, 2024, 8:15 AM

#

Omg

haughty mountain Apr 30, 2024, 8:15 AM

#

it shouldn't be surprising

#

it's creating something of size n

#

so it can't be cheaper than O(n)

urban tundra Apr 30, 2024, 8:16 AM

#

O(2*(n*(3*n))) this is the correct time complexity ... right..?? @haughty mountain

#

I mean more precise...

haughty mountain Apr 30, 2024, 8:17 AM

#

that's missing the point of big O notation

urban tundra Apr 30, 2024, 8:17 AM

#

haughty mountain that's missing the point of big O notation

I mean more precise..

#

😅

urban tundra Apr 30, 2024, 8:18 AM

#

urban tundra ➕ ❕another ques..: does having comments in my code.. will increase runtime.. eve...

@haughty mountain your views on this??

haughty mountain Apr 30, 2024, 8:18 AM

#

urban tundra @haughty mountain your views on this??

it's not going to be significant

urban tundra Apr 30, 2024, 8:19 AM

#

haughty mountain it's not going to be significant

not even 1ms??

haughty mountain Apr 30, 2024, 8:21 AM

#

if you cared about performance on such a level you wouldn't use python anyway
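
One way to see why comments cost nothing at run time in CPython: they are dropped when the source is compiled, so the bytecode that actually executes is the same with or without them. A small sketch with two throwaway source strings:

```python
with_comment = compile("x = 1 + 2  # add the numbers\n", "<demo>", "exec")
without_comment = compile("x = 1 + 2\n", "<demo>", "exec")

# The executed instructions are byte-for-byte identical.
same_bytecode = with_comment.co_code == without_comment.co_code
print(same_bytecode)
```

The only cost is at compile time, when the tokenizer skips over the comment text, and imported modules normally have their compiled bytecode cached in `__pycache__` anyway.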

urban tundra Apr 30, 2024, 8:21 AM

#

@haughty mountain chatgpt is saying that it is.. O(n) not O(n*n)
🤔

haughty mountain Apr 30, 2024, 8:22 AM

#

it's wrong

urban tundra Apr 30, 2024, 8:22 AM

#

Okay.. dude

haughty mountain Apr 30, 2024, 8:24 AM

#

like just consider how many characters you even output overall

#

you're outputting O(n²) characters

#

you can not do better than O(n²)
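
The counting argument can be checked directly. The original program is not shown here, so this sketch assumes a typical pattern exercise where row i consists of i characters; the helper names are made up.

```python
def iter_rows(n):
    """Yield the rows of a simple triangle pattern, one string at a time."""
    for i in range(1, n + 1):
        yield "*" * i  # building this string alone costs O(i)

def total_chars(n):
    # Only one row is held in memory at a time: O(n) space, O(n^2) total output.
    return sum(len(row) for row in iter_rows(n))

for n in (10, 100, 1000):
    print(n, total_chars(n), n * (n + 1) // 2)
```

A single loop with no nesting still does O(n²) work here, because each iteration builds and prints a string whose length grows with n.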

urban tundra Apr 30, 2024, 8:26 AM

#

Oh.. then the space complexity is also O(n*n) ... right?? @haughty mountain

haughty mountain Apr 30, 2024, 8:26 AM

#

how do you count space complexity?

#

the program itself doesn't use more than O(n) memory at one time

urban tundra Apr 30, 2024, 8:27 AM

#

Can you pls .. give me some resources to learn and understand these concepts great like you.. @haughty mountain

#

if possible something that you referred

haughty mountain Apr 30, 2024, 8:28 AM

#

I don't have resources for this

#

it's basically just a math exercise of counting stuff

urban tundra Apr 30, 2024, 8:29 AM

#

Okk

#

@haughty mountain this website.. is showing space complexity of O(n*n)... 😅

#

see last line

haughty mountain Apr 30, 2024, 8:30 AM

#

your program doesn't do that

#

it just prints individual lines of output to an output stream

#

the thing that reads and displays the output would do that kind of thing

#

the program itself doesn't

urban tundra Apr 30, 2024, 8:33 AM

#

Ohkay...

haughty mountain Apr 30, 2024, 8:35 AM

#

urban tundra @haughty mountain this website.. is showing space complexity of O(n*n)... 😅

ok this thing is just straight garbage

#

err, that j in the range should be an i

#

but doesn't change it saying it's O(n²)

#

that code is O(n log n)

flat sorrel Apr 30, 2024, 8:37 AM

#

sounds like this is some AI-powered website

haughty mountain Apr 30, 2024, 8:37 AM

#

sure does

flat sorrel Apr 30, 2024, 8:38 AM

#

don't blindly trust the AI, especially when logical thinking or math is involved

#

they can and do make mistakes

haughty mountain Apr 30, 2024, 8:38 AM

#

I like that it got to
n + n/2 + n/3 + ...
and then just claims that's O(n²)

#

(technically it is, but only because O is an upper bound)

#

n + n/2 + n/3 + ... is even Θ(n log n)
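
The code that was analysed is not reproduced here, so the sketch below assumes a sieve-style double loop in which the inner loop steps by i. Its iteration count is exactly n/1 + n/2 + n/3 + ... (rounded down per term), which can be compared against n·ln(n):

```python
import math

def count_steps(n):
    """Count inner-loop iterations of a loop that visits every multiple of i up to n."""
    steps = 0
    for i in range(1, n + 1):
        for _ in range(i, n + 1, i):  # floor(n / i) iterations
            steps += 1
    return steps

for n in (100, 1000, 10000):
    print(n, count_steps(n), round(n * math.log(n)), n * n)
```

The count tracks n·ln(n) to within a term of size n, and falls further and further below n² as n grows, which is what Θ(n log n) means in practice.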

regal spoke Apr 30, 2024, 8:44 AM

#

keen hatch meh I think if they want it in A they want me to work in place and not allocate ...

Extending a list is essentially the same thing as allocating a new list

#

I highly doubt the point of the exercise is to do some kind of in place implementation because that wouldn't really make sense

keen hatch Apr 30, 2024, 8:45 AM

#

regal spoke Extending a list is essentially the same thing as allocating a new list

I didn't think about this I suppose thats true

solemn moss Apr 30, 2024, 8:45 AM

#

regal spoke Apr 30, 2024, 8:46 AM

#

keen hatch I didn't think about this I suppose thats true

Maybe they meant that the result should be stored inside A. Then you could do something like this

from heapq import merge as merge_sorted

def merge(A, B):
  C = list(merge_sorted(A, B))  # Merge the sorted lists A and B into a new list C
  A[:] = C  # Replace the content of A with C

#

Btw the merge sort algorithm done in-place is notoriously hard to implement (see https://stackoverflow.com/questions/2571049/how-to-sort-in-place-using-the-merge-sort-algorithm). So I really don't think they expect any kind of in-place solution.

flat sorrel Apr 30, 2024, 8:50 AM

#

solemn moss

I trivially got it to leak the system prompt. They really didn't put much thought into this site.

#

(At least add a disclaimer that the answer is AI-generated so that people won't be misled by it...)

keen hatch Apr 30, 2024, 8:52 AM

#

regal spoke Btw the merge sort algorithm done in-place is notorious to be hard to implement ...

That's interesting. I'm not sure if my 2nd solution is correct then because it didn't seem super difficult

haughty mountain Apr 30, 2024, 8:52 AM

#

well that's underwhelming

keen hatch Apr 30, 2024, 8:53 AM

#

I definitely just read too far into "merge B into A" as my professor emailed me back and said my first solution was fine lmao

regal spoke Apr 30, 2024, 8:53 AM

#

keen hatch That's interesting. I'm not sure if my 2nd solution is correct then because it d...

Whats your 2nd solution?

haughty mountain Apr 30, 2024, 8:55 AM

#

extend A, merge into A back to front

keen hatch Apr 30, 2024, 8:55 AM

#

what he said

regal spoke Apr 30, 2024, 8:55 AM

#

huh

#

O(n^2)?

haughty mountain Apr 30, 2024, 8:55 AM

#

no

regal spoke Apr 30, 2024, 8:56 AM

#

Oh I see now

#

Well that is not in-place

#

But I guess it is still pretty efficient memory wise

keen hatch Apr 30, 2024, 8:57 AM

#

yeah I realize now it's probably technically impossible to "in place" merge a list into another when the given list isn't big enough to hold all the elements from the 2nd lmao

haughty mountain Apr 30, 2024, 8:57 AM

#

in-place doesn't make much sense when you have two separate lists to start with 😛

keen hatch Apr 30, 2024, 8:57 AM

#

yea

regal spoke Apr 30, 2024, 8:58 AM

#

yes

keen hatch Apr 30, 2024, 8:59 AM

#

well, I hope my professor likes the over engineered solution

regal spoke Apr 30, 2024, 9:01 AM

#

stray fractal Apr 30, 2024, 9:02 AM

#

well i mean

keen hatch Apr 30, 2024, 9:03 AM

#

openai's api might be the death of interesting tech projects / startups 😵‍💫

#

i used to browse r/saas but half the shit posted there now just ends up being a wrapper of chatgpt with a different system prompt :/

haughty mountain Apr 30, 2024, 9:23 AM

#

lol

flat sorrel Apr 30, 2024, 9:26 AM

#

If you replace >= with >, it becomes O(n) which is closer to the real answer, I guess...

urban tundra Apr 30, 2024, 9:45 AM

#

haughty mountain ok this thing is just straight garbage

Ok thanks.. understood... 👍

#

I want to ask that.. my solution is OK??
because I think that the way I used If statement is maybe wrong.. idk
let me know, pls

flat sorrel Apr 30, 2024, 10:30 AM

#

urban tundra I want to ask that.. my solution is OK?? because I think that the way I used If ...

if you got the correct answer then it's fine. imo you kinda have to actually try in order to come up with an algorithm that is worse than O(n^2) for printing on a 2D screen

#

this isn't usually where you have to care about alg. complexity

urban tundra Apr 30, 2024, 11:17 AM

#

flat sorrel if you got the correct answer then it's fine. imo you kinda have to actually try...

wdym by "worse"??

flat sorrel Apr 30, 2024, 11:18 AM

#

having a higher complexity

urban tundra Apr 30, 2024, 11:23 AM

#

Ok 👍

lean walrus Apr 30, 2024, 5:20 PM

#

haughty mountain lol

is this conclusion produced by an LLM?

haughty mountain Apr 30, 2024, 5:24 PM

#

lean walrus is this conclusion produced by an LLM?

sure is

haughty mountain Apr 30, 2024, 5:25 PM

#

haughty mountain well that's underwhelming

@lean walrus https://www.bigocalc.com

Big O Calc

Calculate the time and space complexity of your code using Big O notation.

#

the prompt is also pretty garbage

#

the lowest of efforts

agile sundial Apr 30, 2024, 5:27 PM

#

😔

lean walrus Apr 30, 2024, 5:29 PM

#

😔

outer rain Apr 30, 2024, 5:50 PM

#

haughty mountain @lean walrus https://www.bigocalc.com

yooo free llama API

lean walrus Apr 30, 2024, 5:52 PM

#

bruh

#

i asked it to write philosophy essay

muted helm Apr 30, 2024, 6:46 PM

#

This might be really dumb, but why does Python in rare cases print with double quotes instead of single?

#

or well, it is not printing, but I can see in my list that it is double-quoted

#

[ 'xxxx',
"yyyy",
'xxxx']

hallow slate Apr 30, 2024, 7:23 PM

#

hey guys!

I have code in Python and I want it to run at scheduled times in the cloud without the need for my computer to be turned on.

I know I can upload the code to Google Cloud, AWS...

I wanted to know if you have any tips on which one is better or if there is another way that might be better (the code isn't very big, it's simple).

I would need to monitor it to know if it is running correctly.

haughty mountain Apr 30, 2024, 7:52 PM

#

muted helm This might be really dumb - but why is in rare casing python printing with doubl...

wrong channel for this question, but it tries to minimize the number of escaped quotes in the string

#

!e

print(repr('with singles'))
print(repr('that\'s a double!'))
print(repr('''"oh god, what's this, I guess use singles"'''))

halcyon plankBOT Apr 30, 2024, 8:04 PM

#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 'with singles'
002 | "that's a double!"
003 | '"oh god, what\'s this, I guess use singles"'

haughty mountain Apr 30, 2024, 8:06 PM

#

it feels slightly concerning I knew the exact output of these before running them 😅

lucid harbor May 1, 2024, 8:09 AM

#

how do i speed up this loop, it needs to be very fast, so i can run it like 120 times a second:

`def RSI_strategy_numba(data: pd.DataFrame, rsi_values, indicators) -> tuple[list[pd.DatetimeIndex], list[pd.DatetimeIndex]]:
    buy_dates, sell_dates, state = [], [], 0
    for idx, rsi in zip(data.index.values, rsi_values):
        # If we're in the buy state, check for a buy
        if rsi > indicators[0] and state == 0:
            buy_dates.append(idx); state = 1
        # Otherwise check for a sell
        elif rsi < indicators[1] and state == 1:
            sell_dates.append(idx); state = 0
    return buy_dates, sell_dates`

flat sorrel May 1, 2024, 8:12 AM

#

avoid using Python loops. try to make use of boolean indexing to vectorize conditional assignment

lucid harbor May 1, 2024, 8:17 AM

#

@flat sorrel I've tried that, but i cant figure out how.
Every answer does not seem to account for the fact that i can only sell after i buy and vice versa

flat sorrel May 1, 2024, 8:22 AM

#

maybe you can get the potential buy and sell dates (without regard for state) in the first pass, which can be easily vectorized

#

you can then deal with the state separately, simplifying the problem
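
A rough sketch of that two-pass idea with NumPy, in case it helps. It assumes the buy threshold is at least as large as the sell threshold, so a single value can never be both a buy and a sell signal; `rsi_signals`, `buy_level` and `sell_level` are illustrative names standing in for the original function and `indicators[0]` / `indicators[1]`.

```python
import numpy as np

def rsi_signals(index, rsi_values, buy_level, sell_level):
    """Vectorized version of the buy/sell state machine.

    Assumes buy_level >= sell_level, so no value triggers both signals.
    """
    index = np.asarray(index)
    rsi = np.asarray(rsi_values)

    # Pass 1: mark potential buys (+1) and sells (-1), ignoring state.
    signal = np.zeros(len(rsi), dtype=np.int8)
    signal[rsi > buy_level] = 1
    signal[rsi < sell_level] = -1

    # Pass 2: handle the state. Keep only positions that carry a signal,
    # then keep a signal only if it differs from the previous kept one,
    # because a repeated buy (or sell) is ignored by the state machine.
    pos = np.flatnonzero(signal)
    sig = signal[pos]
    if len(sig) == 0:
        return index[:0], index[:0]
    keep = np.ones(len(sig), dtype=bool)
    keep[1:] = sig[1:] != sig[:-1]
    pos, sig = pos[keep], sig[keep]

    # The machine starts in the "waiting to buy" state, so a leading sell is dropped.
    if sig[0] == -1:
        pos, sig = pos[1:], sig[1:]

    return index[pos[sig == 1]], index[pos[sig == -1]]
```

The remaining loops all run inside NumPy, which is the point of the suggestion above. Whether this beats a compiled loop depends on the array sizes.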

regal spoke May 1, 2024, 10:29 AM

#

lucid harbor how do i speed up this loop, it needs to be very fast, so i can run it like 120 ...

so i can run it like 120 times a second:
Without you telling us the sizes of things, it is impossible to judge how fast the code runs

vocal gorge May 1, 2024, 10:39 AM

#

why is this function called "numba"? Does it get @njit-compiled and you just omitted the decorator doing it?

#

(generally speaking, for a task like this I'd expect a numba function to be a good idea)

stoic sand May 1, 2024, 11:21 AM

#

is there any good course for DSA in Python?

lucid harbor May 1, 2024, 5:28 PM

#

flat sorrel maybe you can get the potential buy and sell dates (without regard for state) in...

Sorry for not replying, it was 10pm where i live. What do you mean by dealing with the state separately?

I know i could do rsi_values < indicators[0] to create a buy_mask, and then do something like data[buy_mask], but this would still use a loop wouldn't it? would it be faster?

fervent light May 1, 2024, 6:16 PM

#

uh

#

someone pinged me

flat sorrel May 2, 2024, 1:47 AM

#

lucid harbor Sorry, for not replying, it was 10pm where i live. What do you mean by dealing w...

Yes, it will use a loop, but in C, so it is much faster than the equivalent in Python

astral patio May 2, 2024, 6:37 AM

#

Can I use perfplot for several algorithms that I am benchmarking:

def aho_corasick(string_to_search: str, file_contents: Set[str]) -> bool:
"""
Aho-Corasick Algorithm

:param string_to_search: string to search
:param file_contents: set of words to search for
:return: True if any word from the dictionary is found in the text, False otherwise
"""

That is a sample of the algorithmic function I will be testing on alongside other algorithmic functions.

I want to benchmark based on input sizes of 10,000 to 1,000,000 rows
also based on a REREAD_ON_QUERY setting whether True or False
and lastly on number of string_to_search(queries) per second. How do I go about this?

modern gulch May 2, 2024, 9:50 PM

#

astral patio Can I use perfplot for several algorithms that I am benchmarking: def aho_coras...

I don't use perfplot, I just generate my results, store the timing information and plot the results directly.

modern gulch May 2, 2024, 9:50 PM

#

astral patio Can I use perfplot for several algorithms that I am benchmarking: def aho_coras...

Perfplot does seem interesting: are you having problems using it?

astral patio May 3, 2024, 1:31 AM

#

Yes, I am finding problems implementing it to my use case

astral patio May 3, 2024, 1:31 AM

#

modern gulch I don't use perfplot, I just generate my results, store the timing information ...

I can use your approach, would appreciate if you could walk me through

modern gulch May 3, 2024, 1:32 AM

#

astral patio I can use your approach, would appreciate if you could walk me through

Simply, write a loop that runs the calculation, calculates the run time, and writes (appends) it to a file (ie: a csv file).

astral patio May 3, 2024, 1:34 AM

#

modern gulch Simply, write a loop that runs the calculation, calculates the run time, and wri...

It has to be a csv yeah?

modern gulch May 3, 2024, 1:35 AM

#

astral patio It has to be a csv yeah?

CSV, JSON, whatever. Something persistent so you don't lose it between tests

#

CSV is convenient because you can just write line by line.

astral patio May 3, 2024, 1:35 AM

#

I was thinking of how to automate it when I can just be simplifying it and using other external factors

astral patio May 3, 2024, 1:38 AM

#

modern gulch CSV is convenient because you can just write line by line.

These are the criteria:

Unit tests for:

Showing different execution times for different file sizes from 10,000 to 1,000,000 with a client you write for testing purposes and cover these in your speed testing report,

Showing different execution times for file sizes vs. number of queries per second, up to the point that the server can not handle it anymore (document the limitations of the software),

The thing is I do not know how to wrap my head around it and I would really appreciate if I can get a step by step approach to get to chart it across multiple other algorithms

modern gulch May 3, 2024, 1:38 AM

#

Can you test a filesize of 10,000 right now?

astral patio May 3, 2024, 1:39 AM

#

Yeah, I should be able to. Although my PC be acting funny lately but I still could

#

That is why I am using CDE (Cloud Development Env)

modern gulch May 3, 2024, 1:40 AM

#

So, figure out how to:

#

Run a single test case and capture the execution time. Write this execution time to a CSV file.

#

Then, write a loop to run that test case for multiple file sizes

#

Plot the results from the CSV

#

I don't quite understand the file sizes / queries per second, but you'll have to repeat the process: figure out how to test for a certain file size / query per second.

#

But I'd deal with the first one first.
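
The steps above could be sketched roughly like this, using only the standard library. `run_benchmark`, `time_once` and `make_input` are hypothetical names; `make_input(size)` stands in for whatever builds the test data of a given size.

```python
import csv
import time

def time_once(func, *args):
    """Run func once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start

def run_benchmark(func, make_input, sizes, csv_path):
    """Time func for each input size and append one CSV row per run."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for size in sizes:
            data = make_input(size)           # build the input outside the timed region
            elapsed = time_once(func, data)
            writer.writerow([func.__name__, size, elapsed])
```

Calling something like `run_benchmark(my_search, make_rows, [10_000, 100_000, 1_000_000], "timings.csv")` once per algorithm would leave all results in a single file that can be plotted later.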

astral patio May 3, 2024, 1:42 AM

#

Okay, this should do

#

What do you use to plot the results from the CSV, I am not quite familiar with CSV manipulations

#

I only know it can be used in Excel and programming languages

#

Also I was wondering if matplotlib will work out but that is like setting myself up for more complexity

modern gulch May 3, 2024, 1:44 AM

#

astral patio Okay, this should do

Yah, you have lots of choices. Matplotlib, plotly, whatever. That's simple stuff. I wouldn't worry about it today: focus on creating that CSV

astral patio May 3, 2024, 1:44 AM

#

modern gulch I don't quite understand the file sizes / queries per second, but you'll have to...

And about this, it is a server and client scripting task that I am working on. The server reads from a 200k.txt file and if the client sends the exact line of data (that is query) to the server, it gives True or False

modern gulch May 3, 2024, 1:44 AM

#

But, afterwards, you can read the file using:

```py
import pandas as pd
df = pd.read_csv('yourfile.csv')
```

astral patio May 3, 2024, 1:44 AM

#

modern gulch Yah, you have lots of choices. Matplotlib, plotly, whatever. That's simple stuff...

Thanks man, really appreciate

astral patio May 3, 2024, 1:45 AM

#

modern gulch But, afterwards, you can read the file using: ```py import pandas as pd df = pd....

Then it is data analysis from there onwards, I should be safe

modern gulch May 3, 2024, 1:45 AM

#

astral patio And about this, it is a server and client scripting task that I am working on. T...

Yah, what I'm not sure about is how you'll test queries per second. There's some slightly more complicated solutions that come to mind (ie: using a rate limiter on your side)

astral patio May 3, 2024, 1:49 AM

#

rate limiter? that is new, I will look that up. That QPS was the genesis of my problem

modern gulch May 3, 2024, 1:50 AM

#

astral patio rate limiter? that is new, I will look that up. That QPS was the genesis of my p...

Yah, I was playing with one for a different purpose, to avoid overloading a vendors API that had a rate limit...

#

!pypi pyrate_limiter

halcyon plankBOT May 3, 2024, 1:50 AM

#

pyrate-limiter v3.6.1

Python Rate-Limiter using Leaky-Bucket Algorithm

Released on <t:1714496625:D>.

modern gulch May 3, 2024, 1:50 AM

#

I'm just not sure it's the right answer for load testing.
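
As a starting point, here is a hand-rolled pacing loop rather than `pyrate_limiter` itself. It is only a sketch: `send_query` is a hypothetical callable that sends one query to the server and waits for the reply, and a single-threaded loop like this cannot exceed the rate at which the server answers.

```python
import time

def run_at_rate(send_query, queries, qps):
    """Send queries at roughly `qps` per second and record each latency."""
    interval = 1.0 / qps
    latencies = []
    next_send = time.perf_counter()
    for query in queries:
        # Sleep until the scheduled send time, if we are ahead of schedule.
        delay = next_send - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        start = time.perf_counter()
        send_query(query)
        latencies.append(time.perf_counter() - start)
        next_send += interval
    return latencies
```

Raising `qps` step by step and watching when the latencies start to climb is one simple way to find the point where the server stops keeping up.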

astral patio May 3, 2024, 1:55 AM

#

Okay, I am going to start from there. Thanks

#

I also wanted to ask, based on testing, do I have to use the server with the tests, i.e. generation of the 10_000 rows file sizes

outer bane May 3, 2024, 1:57 AM

#

teoo

astral patio May 3, 2024, 1:58 AM

#

halcyon plank

And @modern gulch this should be at the test_client.py side, yeah? Not the server side

modern gulch May 3, 2024, 1:58 AM

#

yes

astral patio May 3, 2024, 1:58 AM

#

Okay thanks

modern gulch May 3, 2024, 1:59 AM

#

astral patio Okay thanks

You might want something like https://locust.io/

astral patio May 3, 2024, 1:59 AM

#

even if the server is reading on localhost?

modern gulch May 3, 2024, 1:59 AM

#

I dunno, start simple I guess

#

But load testing a server by running code on the server is probably a bad idea

outer bane May 3, 2024, 2:00 AM

#

Idk

#

Anyways

astral patio May 3, 2024, 2:02 AM

#

modern gulch I dunno, start simple I guess

I dont really understand

astral patio May 3, 2024, 2:03 AM

#

modern gulch But load testing a server by running code on the server is probably a bad idea

I mean, for this actually

modern gulch May 3, 2024, 2:04 AM

#

astral patio I mean, for this actually

I thought you just said "even if the server is reading on localhost?"

#

Doesn't that mean that you're running the load test from the server?

astral patio May 3, 2024, 2:14 AM

#

Yes, I would be running the load test from the server

#

sorry for the delayed response

#

want to fill up my daily log

#

You said it this way, "But load testing a server by running code on the server is probably a bad idea", and it made me wonder if I am to run a new code on the server to load test or do I have to create a new file for load test

cosmic swallow May 3, 2024, 4:14 AM

#

can someone tell me what is the time complexity of insert, findmin, heapify, and remove operation on a minheap?

swift arch May 3, 2024, 4:36 AM

#

not sure if it is just me, but I've barely seen tech companies that use Python implement DS principles like stack, heap, queue, etc. How come?

tardy monolith May 3, 2024, 1:21 PM

#

is this the right place to ask for help understanding the complexity of an algorithm?

tardy monolith May 3, 2024, 2:14 PM

#

i basically need to figure out the worst case complexity and write it in big O notation

#

if anyone thinks they can help, ping me or dm me and i can post the code and we can talk through it. it isnt that long its a brute force longest common substring algorithm that takes two string inputs

#

i just dont fully understand how to work out the complexity, i have a bit of understanding on the subject but its not fully there and would like some hints or pointers

haughty mountain May 3, 2024, 7:01 PM

#

tardy monolith if anyone thinks they can help, ping me or dm me and i can post the code and we ...

it is the right place, why not just post the code and then maybe someone decides to look at it?

tardy monolith May 3, 2024, 9:15 PM

#

haughty mountain it is the right place, why not just post the code and then maybe someone decides...

good point lol

#

so thats my little algorithm above for the longest substring brute force method

#

im trying to work out the worst case complexity

#

so far im assuming that a for loop is generally linear but with the slice operation in it which is also linear it becomes quadratic

#

and i do that twice for getting all the substrings of the left and right string

#

and my double for loop is also possibly quadratic.

#

but im not sure what the overall complexity is or if i understand it properly

haughty mountain May 3, 2024, 9:18 PM

#

first two loops are quadratic overall, the third one runs the risk of being cubic

tardy monolith May 3, 2024, 9:18 PM

#

so if i have two quadratic loops and a cubic how would that be written in big O notation

#

is it only the biggest time sink we care about or do they add together

haughty mountain May 3, 2024, 9:19 PM

#

you know how these asymptotic notations work?

tardy monolith May 3, 2024, 9:21 PM

#

my course im learning this stuff on is very crap at explaining complexity even though thats the whole point of it lol. i have literally had to self teach to try and fill in the gaps

#

so no that is a complete new word to me

haughty mountain May 3, 2024, 9:21 PM

#

in rough terms, f(x) is in O(g(x)) if f(x)/g(x) approaches a constant (or zero) when x gets large

#

O is basically ≤ but for asymptotics

#

so e.g. is x^2 in O(x^3)?

x^2/x^3 = 1/x

let x get large, 1/x tends to zero, which is ≤ some constant

#

similarly, is x^2 + x^3 in O(x^3)?

tardy monolith May 3, 2024, 9:24 PM

#

btw i am still here, i'm trying to absorb the info

haughty mountain May 3, 2024, 9:24 PM

#

(x^2 + x^3)/x^3 =
x^2/x^3 + x^3/x^3 = 1/x + 1

which is ≤ some constant, so yes

#

the effect of the definition is that you can typically ignore lower order terms because it just doesn't matter, e.g. x^2 is much smaller than x^3 as x grows large
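
A quick numeric check of the two examples above, as a sketch: it just evaluates f(x)/g(x) for growing x. The function names are made up for illustration.

```python
def ratio(f, g, x):
    """Return f(x) / g(x), the quantity the rough definition looks at."""
    return f(x) / g(x)

def square(x):
    return x**2

def cubic(x):
    return x**3

def mixed(x):
    return x**2 + x**3

for x in (10, 100, 1_000, 10_000):
    # The first ratio shrinks towards 0, the second settles near 1.
    print(x, ratio(square, cubic, x), ratio(mixed, cubic, x))
```

Both ratios stay bounded as x grows, which is why x^2 and x^2 + x^3 are both in O(x^3).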

tardy monolith May 3, 2024, 9:26 PM

#

right

#

so what made you think that the nested for loop could possibly be cubic earlier, because in my head it was quadratic

haughty mountain May 3, 2024, 9:27 PM

#

let's say your first loops were something quadratic, and third something cubic, then you have something like
a x^2 + b x^2 + c x^3

#

the first two terms would be negligible compared to the cubic term as x grows large

#

overall it's O(x^3)

#

(and you can go through the definition as well, computing the division and see what happens as x gets large)

haughty mountain May 3, 2024, 9:28 PM

#

tardy monolith so what made you think that the nested for loop could possibly be cubic earlier,...

so, you have two nested loops that could potentially be O(n) long

tardy monolith May 3, 2024, 9:28 PM

#

yes

haughty mountain May 3, 2024, 9:28 PM

#

and you have a string comparison which is also linear

tardy monolith May 3, 2024, 9:29 PM

#

ah got you

haughty mountain May 3, 2024, 9:29 PM

#

but I don't know if you can actually hit the worst case where every comparison is really expensive

#

which is why I went with it risking being cubic

#

it's possible there is some constraint on the strings you produce that makes it quadratic overall

tardy monolith May 3, 2024, 9:30 PM

#

so assuming this is cubic, when i run my timing checks on this (unless i am being really stupid and possibly using the timing code wrong), the numbers i was getting didnt seem to be cubic or quadratic

#

let me give you an example

#

im not sure if the runs and loops affect the outcome, but the times didnt really seem like they matched the complexity. unless im just more shockingly bad at math than i thought

#

lol which is quite possible

haughty mountain May 3, 2024, 9:32 PM

#

ok, saying quadratic is really us being sloppy

#

it's more O(n m)

#

where n and m are the sizes of the strings

#

err, let me re-read the code

#

let's say the lengths of the strings are n and m
first loops is like n^2
second loop is like m^2
third is n*m*something

#

fwiw, the third loop will depend a lot on the data

#

for most strings you would find a mismatch early and you don't have to do the full comparison

#

for bad cases you might need to do a lot of work comparing strings

tardy monolith May 3, 2024, 9:36 PM

#

ah that actually makes sense because i am talking about the complexity in terms of the left and right inputs

#

yeah this code does also have some prechecks to avoid the loops altogether it was just a bit long to post here

haughty mountain May 3, 2024, 9:37 PM

#

I think something like

```py
left = "A"*n
right = "A"*n
```

would cause cubic looking behavior

tardy monolith May 3, 2024, 9:37 PM

#

for the best case its O(1)

#

because i do a little check to see if any of the strings are empty or if they are identical and just return it which is a constant operation

#

but it was mainly this worst case part that was causing me the grief

haughty mountain May 3, 2024, 9:38 PM

#

haughty mountain I think something like ```py left = "A"*n right = "A"*n ```would cause cubic loo...

actually, maybe you're bailed out by the fact that the string comparison will check the length before comparing the contents...

tardy monolith May 3, 2024, 9:38 PM

#

ah with my finding the shortest of the two strings?

#

oh wait sorry i misread you

haughty mountain May 3, 2024, 9:39 PM

#

no, the if in the nested loop

tardy monolith May 3, 2024, 9:39 PM

#

i get what you mean

haughty mountain May 3, 2024, 9:40 PM

#

I suspect overall it probably ends up being proportional to n^2 + m^2 + n*m

#

you had one string being very small and the other being large, so the large^2 dominates

tardy monolith May 3, 2024, 9:42 PM

#

interesting, let me just check something with my timeit

#

2 secs

#

there we go

#

here check this

#

if the strings are equal sizes and i double the strings i get what looks more of what i was expecting

#

with the original O(n^3)

haughty mountain May 3, 2024, 9:46 PM

#

that looks quadratic

#

divide the adjacent times there

#

!e looks close to 4

print(123/37.2)
print(448/123)

halcyon plankBOT May 3, 2024, 9:46 PM

#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 3.3064516129032255
002 | 3.6422764227642275

haughty mountain May 3, 2024, 9:46 PM

#

well, close ish

tardy monolith May 3, 2024, 9:47 PM

#

yeah close is perfectly fine

haughty mountain May 3, 2024, 9:47 PM

#

(2*n)^2 = 4*n^2
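
The same check can be turned into an estimate of the exponent: if the input size doubles between timings, log2 of the ratio of adjacent times approximates the power. A small sketch using the three timings quoted above (`growth_exponents` is a made-up helper name, and the doubling between runs is taken from the description of the test):

```python
import math

def growth_exponents(times):
    """Estimate the exponent k in time ~ n**k from timings taken at doubling sizes."""
    return [math.log2(later / earlier) for earlier, later in zip(times, times[1:])]

times = [37.2, 123, 448]   # timings quoted above, input size doubled each step
print(growth_exponents(times))
```

Values close to 2 point at quadratic growth and values close to 3 at cubic growth. Small inputs tend to underestimate the exponent because constant overhead is still visible.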

tardy monolith May 3, 2024, 9:47 PM

#

so the original assumption of O(n^2) was more on the mark then

#

Depending on the size of the left and right strings that can change. I will say this stuff is fascinating lol

haughty mountain May 3, 2024, 9:49 PM

#

haughty mountain I suspect overall it probably ends up being proportional to n^2 + m^2 + n*m

the overall behavior is something like this, I'm just not sure if the last term is n*m or worse than that

tardy monolith May 3, 2024, 9:50 PM

#

thats ok im feeling better about the topic currently than i did when i started the chat

#

you have been super helpful

haughty mountain May 3, 2024, 9:51 PM

#

I would recommend getting used to the definition of O and related notations, the basics of it is just looking at f(x)/g(x) and seeing what happens as x grows large

tardy monolith May 3, 2024, 9:52 PM

#

i dont suppose you have a link to some reading or a youtube vid or something on this topic that you have found useful do you?

haughty mountain May 3, 2024, 9:52 PM

#

nothing specific

tardy monolith May 3, 2024, 9:53 PM

#

because i have to do all this again with another algorithm that does the same thing with recursion and dynamic programming which im not looking forward to deciphering lol

haughty mountain May 3, 2024, 9:54 PM

#

there are basically 3 ones you might see, and 2 more you are unlikely to see
O, Ω and Θ are fairly common
o and ω are rarer

#

it might help to know what they kinda correspond to in the math you're used to

#

I mentioned O is like ≤

tardy monolith May 3, 2024, 9:54 PM

#

yeah my course covered briefly the top 3

haughty mountain May 3, 2024, 9:54 PM

#

Ω is like ≥

#

Θ is like =

tardy monolith May 3, 2024, 9:55 PM

#

and as it goes along it briefly talks about complexity of certain things, but never about an algorithm as a whole and what the complexity comes to

haughty mountain May 3, 2024, 9:55 PM

#

o and ω are like < and > respectively

tardy monolith May 3, 2024, 9:56 PM

#

and caches the results, so i have to also figure out the complexity here

#

so i have recursion and splitting going on again

#

although i was dead proud of getting this one working lol

#

its a top down approach

#

my god my spellings awful today lol

haughty mountain May 3, 2024, 10:00 PM

#

haughty mountain o and ω are like < and > respectively

basically, look at f(x)/g(x), you can either have it go to infinity, to a (non-zero) constant, or to zero

infinity → ω
infinity or constant → Ω
constant → Θ
constant or zero → O
zero → o

#

recursion is always fun to analyze

tardy monolith May 3, 2024, 10:05 PM

#

by fun you mean depressing lol

#

from my looking around and working things out if a function calls itself its linear, and if it calls itself twice within itself its quadratic, then i also have to factor in what the recursion is doing

#

so in my case i am recursively calling the function twice within itself while performing a split on each

haughty mountain May 3, 2024, 10:06 PM

#

tardy monolith from my looking around and working things out if a function calls itself its line...

it's not that simple

tardy monolith May 3, 2024, 10:07 PM

#

but then my algorithm is also caching the results

#

yeah i didnt think it would be

haughty mountain May 3, 2024, 10:07 PM

#

the caching makes it more annoying, yes

haughty mountain May 3, 2024, 10:07 PM

#

haughty mountain it's not that simple

https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms)

Master theorem (analysis of algorithms)

In the analysis of algorithms, the master theorem for divide-and-conquer recurrences provides an asymptotic analysis for many recurrence relations that occur in the analysis of divide-and-conquer algorithms. The approach was first presented by Jon Bentley, Dorothea Blostein (née Haken), and James B. Saxe in 1980, where it was described as a "uni...
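
To illustrate why "calls itself twice means quadratic" does not hold, and how much caching changes things, here is a small sketch that counts calls for a Fibonacci-style recursion. This is a stand-in example and not the longest-common-substring code discussed above.

```python
from functools import lru_cache

def count_naive_calls(n):
    """Count how often a plain two-branch recursion runs its body."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        return k if k < 2 else fib(k - 1) + fib(k - 2)

    fib(n)
    return calls

def count_cached_calls(n):
    """Same recursion, but each distinct argument is computed only once."""
    calls = 0

    @lru_cache(maxsize=None)
    def fib(k):
        nonlocal calls
        calls += 1
        return k if k < 2 else fib(k - 1) + fib(k - 2)

    fib(n)
    return calls

print(count_naive_calls(20), count_cached_calls(20))
```

Without the cache the number of calls grows exponentially in n; with the cache the body runs once per distinct argument, so it grows linearly. For memoized code the usual approach is to count the distinct subproblems and multiply by the work done per subproblem.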

tardy monolith May 3, 2024, 10:09 PM

#

Nice i'll take a look.

#

thanks for the help, i dont want to sap away anymore of your time, but just know you have been a godsend

#

lol this link is painfully looking like my engineering courses calculus module lol

#

sigh

#

thanks again

crimson epoch May 4, 2024, 7:47 PM

#

i know basics of python
how to go about the DSA part
any resources or course recommendation?

jolly mortar May 5, 2024, 3:59 AM

#

check pins

patent junco May 5, 2024, 11:54 AM

#

i'm trying to manually write a parser for .gpl files and i'm stuck.
i have this code:

def listfile(input_file):
    global name
    global colors
    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break; 
    print(colors)

someone knows why i get name errors ?

patent junco May 5, 2024, 12:19 PM

#

Python 3.11.2 BTW

flat sorrel May 5, 2024, 12:24 PM

#

patent junco `Python 3.11.2` BTW

why are you using globals instead of passing them into your function?

patent junco May 5, 2024, 12:25 PM

#

flat sorrel why are you using globals instead of passing them into your function?

it's inside

#

and without was getting yet more errors…

flat sorrel May 5, 2024, 12:26 PM

#

patent junco it's inside

you should pass variables in via parameters instead of globals, otherwise the function will modify the value of the global which might mess up other code that's using it

#

speaking of globals, it seems that you have code that is outside of your function. can you show that as well?

patent junco May 5, 2024, 12:27 PM

#

moment

#

full file although will be later used in a bigger application

#

(and sorry for language)

flat sorrel May 5, 2024, 12:28 PM

#

it seems that the problem is that colors is not initialized

patent junco May 5, 2024, 12:28 PM

#

but where i should?

#

i want to make it function-only

flat sorrel May 5, 2024, 12:28 PM

#

probably at the start of the function

#

global colors doesn't actually initialize the value by itself

#

you should set colors = <some value>

patent junco May 5, 2024, 12:29 PM

#

flat sorrel `global colors` doesn't actually initialize the value by itself

but initializes variable existence?

flat sorrel May 5, 2024, 12:30 PM

#

no, it doesn't

patent junco May 5, 2024, 12:30 PM

#

flat sorrel you should set `colors = <some value>`

but what value? i want it to be empty and fill it later with a dict

flat sorrel May 5, 2024, 12:30 PM

#

set it to an empty list then

#

and then append to it inside the loop

patent junco May 5, 2024, 12:31 PM

#

a dict, not list

flat sorrel May 5, 2024, 12:31 PM

#

ok, then an empty dict

patent junco May 5, 2024, 12:31 PM

#

{"name":["1","2","3"]}
(after a file with one entry)

#

or a more practical example after running code (yep, .gpl is for graphics, at the end gonna make a gimp alternative):

output:
"colors", {"red":["255","0","0"],"green":["0","255","0"],"blue":["0","0","255"]}

patent junco May 5, 2024, 12:51 PM

#

https://www.w3schools.com/python/python_variables.asp

Python has no command for declaring a variable.
A variable is created the moment you first assign a value to it.
i feel like it should make variable on 13 line and that python shouldn't ever tell that variable isn't defined - should create it…

W3Schools offers free online tutorials, references and exercises in all the major languages of the web. Covering popular subjects like HTML, CSS, JavaScript, Python, SQL, Java, and many, many more.

flat sorrel May 5, 2024, 1:08 PM

#

patent junco https://www.w3schools.com/python/python_variables.asp > Python has no command fo...

but then what should be assigned to the variable?

#

it's not like in statically typed languages where the type of the variable is known in advance

#

so in Python, a variable only exists once you actually assign it an initial value

patent junco May 5, 2024, 1:23 PM

#

flat sorrel but then what should be assigned to the variable?

it should by itself create a dict and add this as a first entry to the dict

flat sorrel May 5, 2024, 1:24 PM

#

how would it know that it should be a dict?

patent junco May 5, 2024, 1:24 PM

#

flat sorrel how would it know that it should be a dict?

now did this and yet worse:

flat sorrel May 5, 2024, 1:25 PM

#

you didn't initialize the colors variable

#

you should assign it an empty dictionary (explicitly) at the start

narrow mica May 5, 2024, 1:25 PM

#

and also that's not valid python syntax (as also shown by the error)

flat sorrel May 5, 2024, 1:25 PM

#

like colors = {}

patent junco May 5, 2024, 1:25 PM

#

i want it IN ONE LINE to either initialize variable with proper type if doesn't exist or append if does

flat sorrel May 5, 2024, 1:26 PM

#

but your variable needs to exist for the whole function

patent junco May 5, 2024, 1:26 PM

#

flat sorrel but your variable needs to exist for the whole function

should exist at first appearance

#

and if doesnt - to be created in same place

flat sorrel May 5, 2024, 1:27 PM

#

but if the current_file is empty, you still return colors

#

so you can't only initialize it in the for loop

#

it must be outside

patent junco May 5, 2024, 1:28 PM

#

file error handling will do later

#

now trying to make it work on a full file

#

(and doesnt)

flat sorrel May 5, 2024, 1:30 PM

#

it would be a lot simpler if you just initialize the variables at the start of the function

narrow mica May 5, 2024, 1:33 PM

#

patent junco i want it IN ONE LINE to either initialize variable with proper type if doesn't ...

I don't think there's a good way to do that
if for some reason you must do it then

```py
>>> def my_fn():
...     globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
...
>>> test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'test' is not defined
>>> my_fn()
>>> test
{'a': 'hello!'}
>>>
```

which is in general pretty horrible code, both readability wise and performance wise (you're creating a new dictionary every time instead of updating an existing one)

patent junco May 5, 2024, 1:34 PM

#

i'd rather use this:

colors: dict[str, str] = {single[3][1:-1] : [single[0],single[1],single[2]]} if not colors else colors: dict[str, str] += {single[3][1:-1] : [single[0],single[1],single[2]]}

but why running similar code twice?

narrow mica May 5, 2024, 1:35 PM

#

patent junco i'd rather used this: ```py colors: dict[str, str] = {single[3][1:-1] : [single...

that won't work, and colors: type += is in general invalid python

>>> b = 'test' if not b else b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>>

In the above, I ran test twice to show that before I ran the function the variable isn't defined, but afterwards it is

flat sorrel May 5, 2024, 1:36 PM

#

Why not this?

def listfile(input_file):
    # Initialize colors here
    colors = {}

    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break;

#

just set colors once at the beginning

patent junco May 5, 2024, 1:36 PM

#

flat sorrel Why not this? ```py def listfile(input_file): # Initialize colors here c...

crappy

flat sorrel May 5, 2024, 1:36 PM

#

how so?

patent junco May 5, 2024, 1:36 PM

#

even assembly does better :/

flat sorrel May 5, 2024, 1:36 PM

#

if you want to be fast then you shouldn't use Python

narrow mica May 5, 2024, 1:37 PM

#

again it's just bad code though, it's way better to

>>> test = {}
>>> def my_fn():
...     global test
...     test |= {'a': 'hello!'}

even better if you don't use `global` at all

>>> def my_fn(test):
...     test |= {'a': 'hello!'}
...
>>> d = {'b': 'world!'}
>>> my_fn(d)
>>> d
{'b': 'world!', 'a': 'hello!'}

patent junco May 5, 2024, 1:43 PM

#

okay. maybe you didn't understand.
i want something that merges "create dictionary" and "append to dictionary" at once

#

sort of "if exists: append; else: create with initial value"

#

but at once

narrow mica May 5, 2024, 1:44 PM

#

patent junco okay. maybe you didnt understood. i want something that merges "create dictionar...

yes, and in fact

globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
this merges "create `test` as a dictionary if it doesn't exist" and "give me an updated dictionary if it exists" at once, but is in general horrible python (you should almost never have to touch `globals()`)

narrow mica May 5, 2024, 1:45 PM

#

narrow mica yes, and in fact ```py globals()['test'] = globals().get('test', {}) | {'a': 'he...

its performance is also really bad when compared to other solutions because you're making a new dictionary every time instead of updating an existing one

narrow mica May 5, 2024, 1:46 PM

#

patent junco sort of "if exists: append; else: create with initial value"

and I'm really not sure what's your gripe with simply

my_dict = {}  # initialize it as an empty dictionary

my_dict |= {'a': 'hello!'}  # update it later
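
For the per-key case, "create if missing, otherwise append" does exist as a one-liner, which may be what is being asked for here. A minimal sketch, assuming made-up rows in (r, g, b, name) order rather than a real palette file; the dictionary itself still has to be bound once before the loop:

```python
from collections import defaultdict

# Hypothetical sample rows (r, g, b, name); not taken from a real palette.
rows = [("255", "0", "1", "Red"), ("0", "0", "255", "Blue"), ("250", "0", "0", "Red")]

# 1) Plain dict: assigning to a key creates it or overwrites it.
colors = {}
for r, g, b, name in rows:
    colors[name] = [r, g, b]

# 2) setdefault: create the list if the key is missing, then append to it.
by_name = {}
for r, g, b, name in rows:
    by_name.setdefault(name, []).append((r, g, b))

# 3) defaultdict: same idea, with the factory creating the missing value.
grouped = defaultdict(list)
for r, g, b, name in rows:
    grouped[name].append((r, g, b))

print(colors)
print(by_name)
```

None of these need a check for whether the key already exists, but all of them need the container to exist first.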

patent junco May 5, 2024, 1:48 PM

#

narrow mica and I'm really not sure what's your gripe with simply ```py my_dict = {} # init...

cause i want it to initialize in place? it's very nested and in LOOP…

#

literally - trying to make a pseudo-csv parser

flat sorrel May 5, 2024, 1:49 PM

#

patent junco cause i want it to initialize in place? it's very nested and in LOOP…

but the code doesn't let you do that. you're already returning colors regardless of whether the loop is run

#

this makes colors a variable outside the loop

narrow mica May 5, 2024, 1:49 PM

#

patent junco cause i want it to initialize in place? it's very nested and in LOOP…

I don't think I get what you mean exactly
I don't see why it'd hurt so bad to change one line

colors = {}  # <-- the only thing you have to add
def listfile(input_file):
    global name
    global colors
    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break; 
    print(colors)

patent junco May 5, 2024, 1:50 PM

#

narrow mica I don't think I get what you mean exactly I don't see why it'd hurt so bad to ch...

cause it will be a module later?

narrow mica May 5, 2024, 1:50 PM

#

patent junco cause it will be a module later?

so the problem is people will be able to access colors, and you don't want that?

patent junco May 5, 2024, 1:50 PM

#

and i don't want importing it to create empty variable?

patent junco May 5, 2024, 1:50 PM

#

narrow mica so the problem is people will be able to access `colors`, and you don't want tha...

i want this var to be visible only when function is ran

flat sorrel May 5, 2024, 1:51 PM

#

then don't make it a global. initialize colors inside the function so it is local to the function

flat sorrel May 5, 2024, 1:51 PM

#

flat sorrel Why not this? ```py def listfile(input_file): # Initialize colors here c...

like this

patent junco May 5, 2024, 1:51 PM

#

it can appear but only when processing file…

flat sorrel May 5, 2024, 1:52 PM

#

my code does exactly that. the colors variable only exists inside the scope of listfile, i.e. when the function is called
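
A small sketch of that scoping claim, using a hypothetical stand-in function (`listfile_demo`) instead of the real parser: the name `colors` is bound only while the call runs, and only the returned object survives.

```python
def listfile_demo():
    # Local name: it exists only for the duration of this call.
    colors = {}
    colors["Red"] = ["255", "0", "1"]
    return colors

palette = listfile_demo()

# The module never gains a `colors` name from the call above,
# so a same-named module-level variable can still be created later.
leaked = "colors" in globals()
print(palette, leaked)
```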

patent junco May 5, 2024, 1:52 PM

#

exists before

flat sorrel May 5, 2024, 1:52 PM

#

patent junco exists before

what do you mean by this?

patent junco May 5, 2024, 1:53 PM

#

but isn't global

narrow mica May 5, 2024, 1:54 PM

#

patent junco but isn't global

so you need a global variable, that only exists while the function is running? why?

patent junco May 5, 2024, 1:54 PM

#

narrow mica so you need a global variable, that only exists while the function is running? w...

cause i want to be able to make same-named variable in general file later?

narrow mica May 5, 2024, 1:55 PM

#

patent junco cause i want to be able to make same-named variable in general file later?

and what's stopping you from doing that with the solutions we've given?

patent junco May 5, 2024, 1:55 PM

#

python isn't pythonish

narrow mica May 5, 2024, 1:55 PM

#

!e nothing in python is stopping you from "re-defining" variables (even though that notion is kinda weird)

my_var = 10
my_var = 'hello'
my_var = [123]
print(my_var)

halcyon plankBOT May 5, 2024, 1:55 PM

#

@narrow mica :white_check_mark: Your 3.12 eval job has completed with return code 0.

[123]

patent junco May 5, 2024, 1:56 PM

#

may i test something here?

flat sorrel May 5, 2024, 1:57 PM

#

you can use !e to execute python snippets here

patent junco May 5, 2024, 1:57 PM

#

i know but idk rules

narrow mica May 5, 2024, 1:57 PM

#

you can run snippets with !e, but if it involves multiple files it'd probably be easier to run locally or smthn

patent junco May 5, 2024, 1:57 PM

#

no, two/three liner ( i prefer short code)

narrow mica May 5, 2024, 1:57 PM

#

sure?

patent junco May 5, 2024, 1:58 PM

#

!e

var : str = [11]
print(var)

halcyon plankBOT May 5, 2024, 1:58 PM

#

@patent junco :white_check_mark: Your 3.12 eval job has completed with return code 0.

[11]

narrow mica May 5, 2024, 1:58 PM

#

(if you haven't noticed, python is not statically typed and you can kinda "do whatever" with variables)
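
To illustrate: annotations are stored as metadata but are not checked at runtime. A minimal sketch with a hypothetical function `describe`; a static checker such as mypy would flag the call, but the interpreter runs it without complaint.

```python
def describe(value: str) -> str:
    # The annotation claims str, but nothing enforces it at runtime.
    return type(value).__name__

result = describe([11])  # passing a list where str is annotated
print(result)
print(describe.__annotations__)
```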

patent junco May 5, 2024, 1:59 PM

#

tried to make it rusty xD

narrow mica May 5, 2024, 2:00 PM

#

I think more and more languages are taking up that syntax tbh, not just rust

patent junco May 5, 2024, 2:00 PM

#

i know, JS/ES too

narrow mica May 5, 2024, 2:06 PM

#

patent junco i know, JS/ES too

not just, e.g.
zig const x: u8 = 125;
kotlin val x: Int = 5
ocaml let x: string = "hello!";;

patent junco May 5, 2024, 2:11 PM

#

i know

regal spoke May 5, 2024, 3:04 PM

#

narrow mica I don't think I get what you mean exactly I don't see why it'd hurt so bad to ch...

Your code looks super weird

for line in current_file:
  ...
  for line in current_file:
     ...
  if not line:
    break;

#

Also, I'm pretty sure line cannot be empty. The smallest string it can be is a newline character. (also the ; shouldnt be there)
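
A quick sketch of that point, with `io.StringIO` standing in for an open text file (an assumption for the demo): every line yielded by iteration is non-empty, so `if not line` never fires inside the loop, and a blank line has to be detected after stripping.

```python
import io

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO("first\n\nlast")

lines = [line for line in f]
print(lines)

# Blank lines arrive as "\n", which is truthy; strip before testing.
blank = [line for line in lines if not line.strip()]
print(blank)
```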

patent junco May 5, 2024, 3:05 PM

#

regal spoke Your code looks super weird ```py for line in current_file: ... for line in...

yep, used something so needed to iterate twice

narrow mica May 5, 2024, 3:06 PM

#

I copied the code hacknorris wrote and added that 1 line, didn't look too thoroughly into it

regal spoke May 5, 2024, 3:08 PM

#

I'm not even sure what that double for loop does

#

Does the inner loop continue off after the line of the first loop?

#

Could you post an example of what you are trying to parse?

#

is it something like this?

Name: MyFile
#
"Red" 255 0 1
"Blue" 0 0 255

patent junco May 5, 2024, 3:31 PM

#

regal spoke Could you post an example of what you are trying to parse?

you'd wish

patent junco May 5, 2024, 3:32 PM

#

regal spoke is it something like this? ``` Name: MyFile # "Red" 255 0 1 "Blue" 0 0 255 ```

name at end but yep

#

gimp palettes

patent junco May 5, 2024, 3:33 PM

#

regal spoke I'm not even sure what that double for loop does

the first for is there to find where the color list starts, and the second loops over the colors themselves

flat sorrel May 5, 2024, 3:34 PM

#

using for loops on the same iterator multiple times is real dodgy imo. this is better served by a while loop

regal spoke May 5, 2024, 3:34 PM

#

patent junco first for is to be ever able to find from where is color list and second to loop...

The tricky thing is that iterators in Python (a file object is one) get used up when you iterate over them.
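
A file object is its own iterator, so a nested loop over the same file continues where the outer loop stopped and drains the rest. A minimal sketch, with `io.StringIO` standing in for the file and made-up contents:

```python
import io

f = io.StringIO("Name: Demo\n#\n1 2 3 a\n4 5 6 b\n")

outer_seen = []
inner_seen = []
for line in f:
    outer_seen.append(line.strip())
    if line.startswith("#"):
        # Same iterator: this consumes every remaining line.
        for line in f:
            inner_seen.append(line.strip())

# The outer loop gets nothing more once the inner loop is done.
print(outer_seen)
print(inner_seen)
```

This is why the double loop "works" for a file with a single `#` line, and why anything after the inner loop never sees another line.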

patent junco May 5, 2024, 3:35 PM

#

regal spoke The tricky thing is that generators in python can get used up when you iterate o...

and that's what i'm trying to use. it works. but idk how to. in one place. in the same line. create or append (depending on existence) a dictionary

flat sorrel May 5, 2024, 3:36 PM

#

this has nothing to do with the dictionary

#

it's about iterating through the file

patent junco May 5, 2024, 3:36 PM

#

but iterating works

flat sorrel May 5, 2024, 3:36 PM

#

does creating the dictionary at the start not solve the problem?

patent junco May 5, 2024, 3:36 PM

#

why ?

flat sorrel May 5, 2024, 3:37 PM

#

what does your code look like now? does it work?

patent junco May 5, 2024, 3:37 PM

#

flat sorrel what does your code look like now? does it work?

works? no

#

dictionary handling broke it

flat sorrel May 5, 2024, 3:37 PM

#

patent junco but iterating works

then why are you saying this?

patent junco May 5, 2024, 3:38 PM

#

cause iterating works but dictionary handling breaks code

flat sorrel May 5, 2024, 3:38 PM

#

how do you know that?

patent junco May 5, 2024, 3:38 PM

#

cause it'd show an error

flat sorrel May 5, 2024, 3:38 PM

#

ok, but what if you comment out that line? it shouldn't affect the iteration

patent junco May 5, 2024, 3:38 PM

#

then it wouldn't work

#

it's core function

flat sorrel May 5, 2024, 3:38 PM

#

so if you comment out that line and your iteration logic is correct, the function should not throw an error

#

it'll just return nothing

patent junco May 5, 2024, 3:39 PM

#

and it doesn't

#

dictionary is problem

flat sorrel May 5, 2024, 3:39 PM

#

and if it throws an error, that shows that the iteration logic is wrong

patent junco May 5, 2024, 3:39 PM

#

no, error is about dictionary (ain't blind)

flat sorrel May 5, 2024, 3:39 PM

#

regarding the dictionary, just set the variable to a dictionary at the start of the function

#

that should avoid the NameError since the variable is always defined inside the function

patent junco May 5, 2024, 3:46 PM

#

show me a shortcut for this:

colors += {single[3][1:-1] : [single[0],single[1],single[2]]} if colors else colors = {single[3][1:-1] : [single[0],single[1],single[2]]}

regal spoke May 5, 2024, 3:48 PM

#

According to https://developer.gimp.org/core/standards/gpl/ you should be able to parse .gpl like this:

colors = {}
for line in current_file:
  line = line.rstrip() # Remove trailing whitespace
  if line.startswith('GIMP Palette'): # Start of header
    pass
  elif line.startswith('Name: '):
    name = line.split(maxsplit=1)[1]
  elif line.startswith('Columns: '):
    columns = line.split(maxsplit=1)[1]
  elif line.startswith('#'): # comment line, should be ignored
    pass
  elif not line: # empty line, should be ignored
    pass
  else:
    r,g,b,color_name = line.split(maxsplit=3)
    colors[color_name] = [r,g,b]

Doing it this way solves the issue of having double for loops. It also is able to correctly parse .gpl files with tons of extra whitespace like this:

GIMP Palette
Name: bugslife_final.png-10
Columns: 16
#
191 180 180   Index 0
163 158 157   Index 1
145 136 132   Index 2
130 125 112   Index 3
… … …
56  50  49   Index 29
41  38  38   Index 30
23  23  23   Index 31
242 245 213   Index 32
227 232 181   Index 33
210 217 147   Index 34
195 204 118   Index 35
… … …
  0   0   0   Index 251
  0   0   0   Index 252
  0   0   0   Index 253
  0   0   0   Index 254
  0   0   0   Index 255

GIMP Developer - GIMP Palette Format Version 2 (.gpl)

#

Your code from earlier wouldn't be able to handle 0 0 0 Index 255 correctly. But doing line.split(maxsplit=3) works really well

#

!e

line = "  0   0   0   Index 255"
r,g,b,color_name = line.split(maxsplit=3)
print(r)
print(g)
print(b)
print(color_name)

halcyon plankBOT May 5, 2024, 3:53 PM

#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 0
002 | 0
003 | 0
004 | Index 255

flat sorrel May 5, 2024, 3:56 PM

#

!e

def listfile(input_file):
    name = None
    colors = {}
    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, cname = line.split(' ')
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '255 0 1 Red',
        '0 0 255 Blue',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()

halcyon plankBOT May 5, 2024, 3:56 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255')})

📎 test

flat sorrel May 5, 2024, 3:56 PM

#

there we go

#

notice how I used a context manager (with statement) when opening the file to ensure that it gets closed. also the walrus operator (:=) lets me assign the new value to the variable in the same statement as the while condition

regal spoke May 5, 2024, 4:00 PM

#

Btw I dont think storing the color in a dictionary like this is a good idea. The format allows colors without a name too.

#

"The color name is optional, i.e. that a color description line may only contain r g b"

#

So using a dictionary keyed by the color name is not a good approach
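
One possible alternative, sketched under assumptions: keep the colors in a list of records, with the name as an optional field. `PaletteColor` and `parse_color_line` are hypothetical names, not part of any code posted in this thread.

```python
from typing import NamedTuple, Optional

class PaletteColor(NamedTuple):  # hypothetical record type
    r: int
    g: int
    b: int
    name: Optional[str]

def parse_color_line(line: str) -> PaletteColor:
    # At most four fields: three channels plus an optional name,
    # which may itself contain spaces.
    r, g, b, *rest = line.strip().split(maxsplit=3)
    return PaletteColor(int(r), int(g), int(b), rest[0] if rest else None)

sample_lines = ["255 0 1 Red", "  0   0   0", "0 0 255 Dark Blue"]
colors = [parse_color_line(line) for line in sample_lines]
print(colors)
```

A list also keeps duplicate names and the original order, which a name-keyed dictionary would silently collapse.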

patent junco May 5, 2024, 4:01 PM

#

regal spoke So using a dictionary keyed by the color name is not a good approach

it will be used in bigger (gui) application and i will clearly mention it on codeberg in advanced wiki page…

flat sorrel May 5, 2024, 4:05 PM

#

!e

def listfile(input_file):
    name = None
    colors = {}
    num_unknown_colors = 0

    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(' ')
                if len(rest) == 0:
                    num_unknown_colors += 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
 
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255 0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 0',
        '0 255 0 Green',
        '4 0 0',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()

halcyon plankBOT May 5, 2024, 4:05 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})

📎 test

flat sorrel May 5, 2024, 4:05 PM

#

this version also takes care of blank lines and unnamed colors

#

(assuming that the file itself does not contain colors that are named UNKNOWN_{idx})

regal spoke May 5, 2024, 4:07 PM

#

flat sorrel !e ```py def listfile(input_file): name = None colors = {} num_unkno...

I think line.split(' ') is flawed. Tokenizing with something like line.split(maxsplit=3) is a lot nicer

#

split behaves very differently if you specify a delimiter or not
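
A short comparison of the variants on the sample line used earlier in the thread:

```python
line = "  0   0   0   Index 255"

with_sep = line.split(" ")       # every single space is a delimiter
no_sep = line.split()            # runs of whitespace count as one delimiter
limited = line.split(maxsplit=3) # same, but the 4th field keeps its spaces

print(with_sep)
print(no_sep)
print(limited)
```

With an explicit `" "` the result is full of empty strings and tabs are not treated as separators; without an argument, leading whitespace is dropped and any whitespace run splits once.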

flat sorrel May 5, 2024, 4:08 PM

#

that is true

#

!e

def listfile(input_file):
    name = None
    colors = {}
    num_unknown_colors = 0

    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(maxsplit=3)
                if len(rest) == 0:
                    num_unknown_colors += 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
 
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255\t0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 \t0',
        '0 255 0 Green',
        '4 0 0\t',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()

halcyon plankBOT May 5, 2024, 4:09 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})

📎 test

flat sorrel May 5, 2024, 4:09 PM

#

ok this should take care of tabs as well xD

regal spoke May 5, 2024, 4:09 PM

#

Ye. Also whitespace in the cname also works now

flat sorrel May 5, 2024, 4:12 PM

#

we can be a bit stricter and require that the header is actually in the first line, then optionally followed by name and columns

regal spoke May 5, 2024, 4:12 PM

#

flat sorrel we can be a bit stricter and require that the header is actually in the first li...

GIMP Palette and Name: are required, but Columns: is optional

#

but ye

flat sorrel May 5, 2024, 4:13 PM

#

Name: is only required in version 2

regal spoke May 5, 2024, 4:14 PM

#

oh ye

#

Here is a .gpl parser from the wild https://github.com/python-pillow/Pillow/blob/main/src/PIL/GimpPaletteFile.py
It just ignores the color names completely

flat sorrel May 5, 2024, 4:22 PM

#

patent junco show me a shortcut for this: ```py colors += {single[3][1:-1] : [single[0],sing...

Purplys already provided the solution to this here: #algos-and-data-structs message

But as mentioned, it is considered extremely hacky (#esoteric-python) and certainly not Pythonic. You can see that our solutions above assign the dictionary at the start of the function and work as expected

flat sorrel May 5, 2024, 4:23 PM

#

regal spoke Here is a .gpl parser from the wild https://github.com/python-pillow/Pillow/blob...

lol what is that regex in https://github.com/python-pillow/Pillow/blob/main/src/PIL/GimpPaletteFile.py#L41

halcyon plankBOT May 5, 2024, 4:23 PM

#

src/PIL/GimpPaletteFile.py line 41

if re.match(rb"\w+:|#", s):

patent junco May 5, 2024, 4:24 PM

#

flat sorrel Purplys already provided the solution to this here: https://discord.com/channels...

that hacky one didn't even worked xD

regal spoke May 5, 2024, 4:25 PM

#

flat sorrel lol what is that regex in https://github.com/python-pillow/Pillow/blob/main/src/...

Match for x...x: or #, where x (called \w in the regex) is a "word character", meaning alphabetic, numbers, ...
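
A small sketch of what that pattern accepts, using made-up sample lines; how Pillow acts on a match is not shown in the quoted line, so this only demonstrates the matching itself.

```python
import re

pattern = re.compile(rb"\w+:|#")

samples = [b"Name: Demo", b"Columns: 16", b"# comment", b"255 0 1 Red", b"GIMP Palette"]

# re.match anchors at the start of the line: either word characters
# immediately followed by a colon, or a leading '#'.
matched = [s for s in samples if pattern.match(s)]
print(matched)
```

So header fields like `Name:` and comment lines match, while color rows and the `GIMP Palette` magic line do not.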

flat sorrel May 5, 2024, 4:26 PM

#

I get that, but is it really necessary to use a regex?

regal spoke May 5, 2024, 4:26 PM

#

patent junco that hacky one didn't even worked xD

#algos-and-data-structs message is this one working for you?

patent junco May 5, 2024, 4:27 PM

#

probably would but i'd like to also avoid empty initialization

regal spoke May 5, 2024, 4:27 PM

#

Wut

patent junco May 5, 2024, 4:27 PM

#

that's what i was talking about

flat sorrel May 5, 2024, 4:27 PM

#

but the file is almost guaranteed to contain at least one color. so an initialization would be made in basically 100% of cases

patent junco May 5, 2024, 4:27 PM

#

i want to avoid x = {}

patent junco May 5, 2024, 4:28 PM

#

flat sorrel but the file is almost guaranteed to contain at least one color. so an initializ...

*empty initialization

flat sorrel May 5, 2024, 4:28 PM

#

why do you want to avoid this so badly?

regal spoke May 5, 2024, 4:28 PM

#

patent junco i want to avoid `x = {}`

then just do

assert len(colors) >= 1
return name, colors

patent junco May 5, 2024, 4:29 PM

#

i don't want this at start:

    name = None
    colors = {}
    num_unknown_colors = 0

flat sorrel May 5, 2024, 4:29 PM

#

patent junco i don't want this at start: ``` name = None colors = {} num_unknown...

yeah but why?

narrow mica May 5, 2024, 4:29 PM

#

originally the problem was they wanted a way to (in 1 line) initialize colors if colors isn't defined, or update colors if it is defined

patent junco May 5, 2024, 4:29 PM

#

cause i feel like there is oneliner which can do it same way as you can open a file in append mode (it doesn't give a f if file is empty or not)

regal spoke May 5, 2024, 4:30 PM

#

patent junco i don't want this at start: ``` name = None colors = {} num_unknown...

ots

flat sorrel May 5, 2024, 4:30 PM

#

I mean, you can, but it's not pretty

patent junco May 5, 2024, 4:30 PM

#

who cares?

narrow mica May 5, 2024, 4:30 PM

#

which led to this mess

globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}

idk how that worked out

flat sorrel May 5, 2024, 4:30 PM

#

people who read your code would care

patent junco May 5, 2024, 4:30 PM

#

narrow mica which lead to this mess ```py globals()['test'] = globals().get('test', {}) | {'...

doesn't work

patent junco May 5, 2024, 4:31 PM

#

flat sorrel people who read your code would care

if someone ever does (and yep, it's open)

flat sorrel May 5, 2024, 4:31 PM

#

!e

globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
print(test)

halcyon plankBOT May 5, 2024, 4:31 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

{'a': 'hello!'}

flat sorrel May 5, 2024, 4:31 PM

#

it does work

regal spoke May 5, 2024, 4:31 PM

#

patent junco cause i feel like there is oneliner which can do it same way as you can open a f...

So why then not just add a check that listfile doesn't return something empty?

patent junco May 5, 2024, 4:31 PM

#

i meant to make nonexistent dictionary to work like that

regal spoke May 5, 2024, 4:32 PM

#

I cannot make sense of why you would ever want to avoid colors = {}

patent junco May 5, 2024, 4:32 PM

#

like open(x,"a") to make dict working. regardless of existence or content

#

so what? i can't open an empty file in append mode too?

regal spoke May 5, 2024, 4:33 PM

#

I'm extremely confused here

patent junco May 5, 2024, 4:33 PM

#

anyway - right now getting more serious problems. let's end it

flat sorrel May 5, 2024, 4:40 PM

#

!e Well, this is extremely cursed but does what @patent junco wants I guess...

def get_var(ctx, name, default):
    return ctx.get(name, default)

def listfile(input_file):
    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored
    
            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(maxsplit=3)
                if len(rest) == 0:
                    num_unknown_colors = get_var(locals(), 'num_unknown_colors', 0) + 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
                
                colors = get_var(locals(), 'colors', {})
                colors[cname] = r, g, b
    
    return get_var(locals(), 'name', None), get_var(locals(), 'colors', {})

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255\t0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 \t0',
        '0 255 0 Green',
        '4 0 0\t',
    ])
    with open('test', 'w') as f:
        f.write(content)
    
    print(listfile('test'))

test()

halcyon plankBOT May 5, 2024, 4:40 PM

#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})

📎 test

patent junco May 5, 2024, 4:42 PM

#

i thought more of combining = and +=
:/

flat sorrel May 5, 2024, 4:43 PM

#

if you want to delve into more cursed code, you are welcome to join the madness in #esoteric-python

patent junco May 5, 2024, 4:48 PM

#

flat sorrel if you want to delve into more cursed code, you are welcome to join the madness ...

already there

regal spoke May 5, 2024, 4:52 PM

#

flat sorrel !e ```py def listfile(input_file): name = None colors = {} num_unkno...

Btw @patent junco note that if the listfile function fails for whatever reason (as in raises an exception), then nothing is returned.
That makes it impossible for an empty dictionary to leak out of listfile if you have assert len(colors)>=1 before the return.

#

So if the parsing fails, then it is as if no dictionary ever existed in the first place

#

On the other hand, if you had initialized colors = {} as a global variable, then it is a different story. But that is not what DarkLight's code is doing

south umbra May 6, 2024, 11:47 AM

#

I have JSON data coming from 5 different places, and I believe the data will pile up and get very big and messy. I built a RAG pipeline to pass the data to an LLM as a knowledge base so it can provide personalized answers.
Anyway, the thing is I am getting data from 5 different marketing channels, then combining it into 1 JSON per workspace for the RAG,
so my fear is that the data will pile up or break

so what should be the optimal approach?
what do u guys think?

#

flat sorrel May 6, 2024, 12:46 PM

#

won't the RAG process take longer than combining 5 jsons together?

#

I guess you can set a maximum lifespan for each data group to combine so that it won't hog up resources in case one of the 5 processes fails

pale prism May 6, 2024, 5:00 PM

#

(cross-asking from #data-science-and-ml - I asked them in case of numpy answers, but it's more of algo thing)

My partner has an interesting problem involving Jaccard index...
Comparing >300k unique subsets A (2^A), where |A| > 300 - and for each such set finding sets with which it has minimum Jaccard index...

We were thinking about representing the sets as 300 bits- that gives us fast calculation of the index itself (because bitwise operations), so only the number of calculations makes it costly -
bruteforce of everything-to-everything is (300k)² operations.

Does anyone have any ideas how to get it lower? We were thinking about clustering it somehow but |2^A| is so big it's hard to think of something that makes sense (there's a lot of pairs that don't intersect at all).

haughty mountain May 6, 2024, 5:04 PM

#

jaccard index here would be n_matching_bits/|A|?

#

which I guess just boils down to looking at n_matching_bits

#

i.e. hamming distance

boreal schooner May 6, 2024, 5:08 PM

#

What's the most performant way to solve a shopping cart queue?

I participated in an Amazon coding problem yesterday, and one of the questions was a really simple one. To paraphrase: "Develop a method to handle Amazon shopping carts that takes two inputs, items being the current shopping cart which is an array of product IDs, and queue being the operation IDs to apply to the shopping cart. If the ID is positive, append the product ID to the end of the cart; if the ID is negative, remove the first instance of the absolute value of the ID from the shopping cart. Return the processed shopping cart." I implemented a simple solution, and it passed all the simple test cases, however it failed all the test cases with large inputs. I ended up not having time to come up with a performant solution, my best being O(n^2) I think, which I will attach below. How would you have done it? I thought about using either a collections.deque or collections.defaultdict.

def problem_solution(items: list[int], query: list[int]) -> list[int]:
    for item in query:
        if item > 0:
            items.append(item)
            
        else:
            items.remove(abs(item))
            
    return items

pale prism May 6, 2024, 5:23 PM

#

haughty mountain jaccard index here would be `n_matching_bits/|A|`?

No, Jaccard would be computed the normal way.
J(x, y) = count_bits(x&y)/count_bits(x|y) (where count_bits counts 1s in the binary repr)

The /|A| makes no sense as it would only do size of intersection, not actual similarity of sets.

Also, *maximum, because obviously minimum is 0, lol. We want the most similar sets.

It's "just":
n > 300k
a_1,...,a_n \in 2^A

For each a_x find a_y (x!=y) such that J(a_x, a_y) >= J(a_x, a_i) for all i<=n, i!=x

(Theoretically there can be multiple a_y for a given a_x, and in that case we want all of them, but idk how to write it in a mathsy way)

So with bruteforce, you'd calculate everything-to-everything and then find max value for each a_x and indexes where that max is present. Or something.
(Or of course calculate along the way keeping only current max. But that's still 300k squared operations)

haughty mountain May 6, 2024, 5:24 PM

#

oh right, intersection over union

haughty mountain May 6, 2024, 5:41 PM

#

pale prism No Jaccard would be done normally. `J(x, y) = count_bits(x&y)/count_bits(x|y)` ...

depending on your latency requirements (300k)² might not be that bad

#

O(minutes)

vocal gorge May 6, 2024, 5:59 PM

#

boreal schooner What's the most performatic way to solve a shopping cart queue? I participated ...

Hmm, my idea would be to first make one pass over items and query to determine the counts of the result, and then another pass actually creating the final list, with the advantage that you can ignore ids that you know will not appear in the final list. I think that's still O(n^2) worst-case, though.

haughty mountain May 6, 2024, 5:59 PM

#

haughty mountain O(minutes)

throwing a GPU at this problem might not be the worst idea

#

(assuming these operations can be expressed in a way that the GPU likes)

vocal gorge May 6, 2024, 6:05 PM

#

vocal gorge Hmm, my idea would be to first make one pass over `items` and `query` to determi...

actually... the key I think is the fact it's always the first occurrence of the id that's removed. So first, for each id calculate how many times it has been removed (a dict[int,int]). Then you can iterate over items and (the positive elements in) query constructing the final list, for each id skipping the first removed_cnt[id] occurrences of it. That's O(n) total.

boreal schooner May 6, 2024, 6:17 PM

#

vocal gorge actually... the key I think is the fact it's always the first occurence of the i...

You mean something like this?

def sol2(items: list[int], query: list[int]) -> list[int]:
    dic = {}

    for item in query:
        if item > 0:
            items.append(item)
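        else:
            # Key by the positive id so the lookup in the second loop matches.
            key = abs(item)
            if key not in dic:
                dic[key] = [0, 1]  # [skipped so far, total removals]
            else:
                dic[key][1] += 1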

    output = []

    for item in items:
        if item in dic and dic[item][0] < dic[item][1]:
            dic[item][0] += 1
        else:
            output.append(item)

    return output

unborn sundial May 6, 2024, 6:30 PM

#

boreal schooner You mean something like this? ```py def sol2(items: list[int], query: list[int]...

import unittest
from collections import defaultdict 


def problem_solution(items: list[int], queries: list[int]) -> list[int]:
    # Positive queries are appended; work on a new list so the input is not mutated.
    combined = items + [query for query in queries if query > 0]
    # Negative queries each remove the first remaining occurrence of abs(query).
    elements_to_remove: dict[int, int] = defaultdict(int)
    for query in queries:
        if query < 0:
            elements_to_remove[abs(query)] += 1

    result = []
    for item in combined:
        if elements_to_remove[item] > 0:
            elements_to_remove[item] -= 1
            continue

        result.append(item)

    return result

class TestStringMethods(unittest.TestCase):

    def test_1(self):
        self.assertEqual(problem_solution([1,4,5], queries=[3]), [1,4,5,3])

    def test_2(self):
        self.assertEqual(problem_solution([1,4,5], queries=[-4]), [1,5])

if __name__ == '__main__':
    unittest.main()

or smth like this

pale prism May 6, 2024, 6:31 PM

#

haughty mountain O(minutes)

Uh... What?
We checked and it's ~1000s to get over all elements once (so ~3.33ms per element). That gives 1000s *300k for bruteforce. That's 1000×300000÷60÷60÷24 days. That's still 3k days. 💀

Even if it was 1ms per element it would still be 1k days. 💀
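The estimate can be checked in a few lines. This assumes, as the message does, that brute force costs one full 1000 s pass per element.

```python
n = 300_000            # number of elements, from the message above
pass_seconds = 1000    # reported time for one pass over all elements

per_element_ms = pass_seconds / n * 1000
brute_force_days = pass_seconds * n / 60 / 60 / 24

print(f"{per_element_ms:.2f} ms per element")
print(f"{brute_force_days:.0f} days for n full passes")
```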

Also, my partner says the bit representation actually doesn't make much sense because for a lot of those sets the bit count is like 10.
And they counted the A and it's even bigger, so yep, bit representation is completely useless. 😭 My experience with competitive programming didn't help - the bit representation was my first thought because it's common in such tasks in competitive programming

vocal gorge May 6, 2024, 6:32 PM

#

boreal schooner You mean something like this? ```py def sol2(items: list[int], query: list[int]...

Your implementation loses most items, I think, but very broadly yes. I meant like this:

import itertools
from collections import defaultdict


def sol2_cnt(items: list[int], query: list[int]) -> list[int]:
    removals = defaultdict(int)
    for item in query:
        if item < 0:
            removals[abs(item)] += 1
    res = []
    for el in itertools.chain(items, query):
        if el < 0:
            continue
        if removals[el] > 0:
            removals[el] -= 1
        else:
            res.append(el)
    return res

which seems to match the output of the naive solution
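One way to check that claim is a randomized comparison. The sketch below assumes the naive solution simply applies each query in order with `append` / `remove`, and that a removal only ever targets an id that is currently present (otherwise `list.remove` raises). `naive` and `random_case` are illustrative names.

```python
import itertools
import random
from collections import defaultdict


def naive(items, query):
    # Assumed shape of the naive solution: apply every query in order.
    items = list(items)
    for item in query:
        if item > 0:
            items.append(item)
        else:
            items.remove(abs(item))  # removes the first occurrence
    return items


def sol2_cnt(items, query):
    removals = defaultdict(int)
    for item in query:
        if item < 0:
            removals[abs(item)] += 1
    res = []
    for el in itertools.chain(items, query):
        if el < 0:
            continue
        if removals[el] > 0:
            removals[el] -= 1
        else:
            res.append(el)
    return res


def random_case(rng):
    # Build a query list whose removals are always valid at the time they run.
    items = [rng.randint(1, 5) for _ in range(rng.randint(0, 8))]
    current = list(items)
    query = []
    for _ in range(rng.randint(0, 12)):
        if current and rng.random() < 0.5:
            victim = rng.choice(current)
            current.remove(victim)
            query.append(-victim)
        else:
            new = rng.randint(1, 5)
            current.append(new)
            query.append(new)
    return items, query


rng = random.Random(0)
for _ in range(1000):
    items, query = random_case(rng)
    assert naive(items, query) == sol2_cnt(items, query), (items, query)
print("all cases agree")
```

The two agree because every removal takes the earliest remaining occurrence, so after all queries exactly the first k occurrences of each id are gone.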

haughty mountain May 6, 2024, 6:33 PM

#

pale prism Uh... What? We checked and it's ~1000s to get over all elements once (so ~3.33m...

isn't 1 element just |A| bits or did I misunderstand you?

#

going over 300k items should for sure not take minutes

#

or were you doing this in plain python or something?

#

even then that seems quite slow

#

3.33ms is a lot of time in computer world

haughty mountain May 6, 2024, 6:41 PM

#

haughty mountain 3.33ms is a lot of time in computer world

(as in, that'll be millions of instructions)

pale prism May 6, 2024, 6:44 PM

#

haughty mountain isn't 1 element just |A| bits or did I misunderstand you?

... My partner used shorthands when talking... apparently their "going over" meant just grabbing the sets for each thing*, not "going over with already grabbed sets". So my calculations might be way off 🤦 talking to my partner is sometimes hard when it comes to coding, lol. I should've realised it when they said about caching - but then said something about 1s... but when I asked "so going over all elements is 1s" they said that no, it takes 1000s. 💀

*each element being a set was already simplification, we have mapping from original element to set of traits it has

Idk whether i should push for the bit representation. Apparently most things have sets of size ~10, and we now know that |A| is at least 2k, so the theoretical bit representation gets longer and longer 💀

pale prism May 6, 2024, 6:48 PM

#

haughty mountain or were you doing this in plain python or something?

I think that 1k s was database? Idk anymore, seriously, talking to my partner is sometimes hard if you don't confirm several times that you're on the same page (see the "going over all elements" thing 💀).

My partner would prefer to do it in their codebase directly but I'm supposed to explore whether python could do it better.

haughty mountain May 6, 2024, 7:26 PM

#

even if it turns out to be hard to do better algorithmically, this is a task that's massively parallelizable

#

e.g. you can model this as K² problems of size (N/K)²

#

and then reduce to get the result
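A rough sketch of that split, assuming the sets are kept sparse as `frozenset`s (they were said to have about 10 elements each). The blocks are run sequentially here; each call to `best_in_block` is independent, so in principle it could be handed to a separate worker. All function names are illustrative.

```python
from itertools import product


def jaccard(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_in_block(sets, rows: range, cols: range) -> dict[int, tuple[float, list[int]]]:
    """Best partners for every row index, looking only at the column indices."""
    out = {}
    for x in rows:
        best, arg = -1.0, []
        for y in cols:
            if x == y:
                continue
            s = jaccard(sets[x], sets[y])
            if s > best:
                best, arg = s, [y]
            elif s == best:
                arg.append(y)
        out[x] = (best, arg)
    return out


def merge(a, b):
    """Reduce step: combine two partial answers for the same element."""
    if a[0] != b[0]:
        return max(a, b, key=lambda r: r[0])
    return (a[0], sorted(a[1] + b[1]))


def best_matches_blocked(sets, k):
    n = len(sets)
    size = max(1, -(-n // k))  # ceil(n / k), at least 1
    chunks = [range(i, min(i + size, n)) for i in range(0, n, size)]
    result = {}
    for rows, cols in product(chunks, chunks):  # K^2 independent blocks
        for x, partial in best_in_block(sets, rows, cols).items():
            result[x] = merge(result[x], partial) if x in result else partial
    return [result[x] for x in range(n)]


sets = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({9}), frozenset({1, 2})]
print(best_matches_blocked(sets, k=2))
```

The total work is still n², it is only spread over blocks; the reduce step is cheap because each element carries just its current best score and the indices that reach it.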

remote slate May 6, 2024, 11:00 PM

#

I have a network graph (minimum spanning tree) that I split into k groups (by removing the largest k-1 edges). this is expressed as a list of (X,Y) pairs. where X and Y denote nodes, and the existence of the pair denotes an edge between them.

I want to group nodes and assign labels to the groups. I've been staring at this problem for a couple hours now and am at a loss as to how to start.

#

I'm sure this is a solved problem. I seem to be blind as i can't find it already done.

haughty mountain May 6, 2024, 11:08 PM

#

just remove the edges and do some dfs traversal to find the graph components?

remote slate May 6, 2024, 11:27 PM

#

haughty mountain just remove the edges and do some dfs traversal to find the graph components? <:...

yes, those were words.

#

i think i have just brute-forced it. need to test.

haughty mountain May 6, 2024, 11:28 PM

#

https://en.wikipedia.org/wiki/Component_(graph_theory)

Component (graph theory)

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components ar...

#

fairly basic words

#

Start somewhere you haven't yet visited, visit everything you can and mark that as visited. That is a new group/component. Repeat

remote slate May 7, 2024, 12:56 AM

#

I think i have it. pretty cursed code.

#

haughty mountain May 7, 2024, 1:53 AM

#

remote slate

a more typical representation of a graph is an adjacency list, where you have a list of neighbors for every node

#

so you get something like

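# Adjacency list of the form {'node': ['neigh1', 'neigh2'], ...}
graph = {'a': ['b'], 'b': ['a'], 'c': []}  # small example graph

components = []
seen = set()
for v in graph:
  if v in seen:
    continue
  # Traverse the graph.
  stack = [v]
  component = []
  while stack:
    cur = stack.pop()
    if cur in seen:
      continue
    seen.add(cur)
    component.append(cur)
    for neigh in graph[cur]:
      stack.append(neigh)
  # Everything reached from v forms one component.
  components.append(component)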

#

where most of the logic is just the graph traversal

#

breaking some part of the logic out of the loop:

# Adjacency list of the form {'node': ['neigh1', 'neigh2'], ...}
graph = {'a': ['b'], 'b': ['a'], 'c': []}  # small example graph

components = []
seen = set()

def find_component(start):
  stack = [start]
  component = []
  while stack:
    cur = stack.pop()
    if cur in seen:
      continue
    seen.add(cur)
    component.append(cur)
    for neigh in graph[cur]:
      stack.append(neigh)
  return component

for v in graph:
  if v in seen:
    continue
  components.append(find_component(v))
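For the original question, where the graph arrives as a list of (X, Y) pairs, a sketch along the same lines could build the adjacency list first and then hand out one label per component. `label_components` and its optional `nodes` argument are hypothetical; the `nodes` argument is there because cutting edges out of a spanning tree can leave a node with no edges at all, and such a node never shows up in the pair list.

```python
from collections import defaultdict


def label_components(edges, nodes=()):
    """Map every node to a component label 0, 1, 2, ..."""
    graph = defaultdict(list)
    for x, y in edges:
        graph[x].append(y)
        graph[y].append(x)           # undirected: store both directions
    for node in nodes:
        graph.setdefault(node, [])   # isolated nodes still get an entry

    labels = {}
    next_label = 0
    for start in graph:
        if start in labels:
            continue
        stack = [start]
        while stack:
            cur = stack.pop()
            if cur in labels:
                continue
            labels[cur] = next_label
            stack.extend(graph[cur])
        next_label += 1
    return labels


print(label_components([(1, 2), (2, 3), (4, 5)], nodes=[6]))
```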

grizzled copper May 7, 2024, 3:25 AM

#

What's the most efficient undirected graph representation for this problem?
max 5000 vertices, max edges 10,000 or v(v-1)/2, where the operations needed are

1) Max number of vertices with a certain boolean value that can be reached, where the weight on each edge < an arbitrary amount
2) Minimum weight required to reach a vertex with a certain boolean value
3) Minimum amount of weight (from edges) required to connect all vertices

#

For problem 1) I was thinking of keeping a list of all vertices with the True tag and then checking whether they can be reached from the current vertex

#

This would be a problem if the majority of vertices had the True tag though

#

3) would be Minimum spanning tree

rigid trench May 7, 2024, 3:53 AM

#

grizzled copper For problem 1) I was thinking of keeping a list of all vertices with `True` tag ...

1 has weighted edges with a maximum travel budget. So you can't just keep a list, because it depends where you start.

#

Besides, the algorithm to solve the problem isn't the question

#

You're asking about the storage

grizzled copper May 7, 2024, 3:55 AM

#

rigid trench You're asking about the storage

Sorry, the fastest representation i guess?

rigid trench May 7, 2024, 3:56 AM

#

The representation would be basically just one of 3 options: adjacency matrix, adjacency list, or edge list, with the adjacency matrix and adjacency list being the most common
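As a small illustration, here is one weighted undirected graph in three common forms (edge list, adjacency matrix, adjacency list). The graph itself is made up, and using `None` for "no edge" in the matrix is just one possible convention.

```python
n = 3
edges = [(0, 1, 3), (0, 2, 2)]  # edge list: (u, v, weight)

# Adjacency matrix: matrix[u][v] is the weight, None when there is no edge.
matrix = [[None] * n for _ in range(n)]
# Adjacency list: adj[u] holds (neighbour, weight) pairs.
adj = [[] for _ in range(n)]
for u, v, w in edges:
    matrix[u][v] = matrix[v][u] = w   # undirected, so fill both directions
    adj[u].append((v, w))
    adj[v].append((u, w))

print(matrix)
print(adj)
```

The matrix gives O(1) edge lookups but needs V² cells (25 million at 5000 vertices); the adjacency list needs O(V + E) and makes iterating over neighbours cheap.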

#

I think you want the algorithm, not the memory representation

#

Are you looking for one data structure? Or is it three different questions?

#

I guess I don't really understand what the question is trying to ask

rigid trench May 7, 2024, 4:04 AM

#

grizzled copper 3) would be Minimum spanning tree

I think it's minimum spanning tree for all three, but only if (1) has positive weights

grizzled copper May 7, 2024, 4:05 AM

#

rigid trench Are you looking for one data structure? Or is it three different questions?

I guess a combination?

#

I don't really care how much memory it uses so long as it can enable the algorithms to be fast

grizzled copper May 7, 2024, 4:14 AM

#

rigid trench I think it's minimum spanning tree for all three, but only if (1) has positive w...

Yeah, it's all positive integer weights

#

max weight is 10^9, guaranteed that all vertices have a path to all other vertices ignoring weight and no more than one edge connecting the same two vertices

haughty mountain May 7, 2024, 8:39 AM

#

grizzled copper I don't really care how much memory it uses so long as it can enable the algorit...

what does your algorithm need?

#

an adjacency list is usually a good default

grizzled copper May 7, 2024, 9:02 AM

#

haughty mountain what does your algorithm need?

#algos-and-data-structs message this

#

maybe an adjacency list with classes so i can contain the boolean values?

haughty mountain May 7, 2024, 9:22 AM

#

feels way overkill

#

the boolean labels can just be a list (or dict, if your nodes aren't represented by integers 0, 1, 2, ...)

#

the adjacency list can store the weights, or even that can be kept separate in a dict, whatever is convenient

grizzled copper May 7, 2024, 10:03 AM

#

haughty mountain the boolean labels can just be a list (or dict, if your nodes aren't represented...

So something like

  graph = {
    1 : [(2, 3), (4, 2)],
    2 : [(1, 3)],
    4 : [(1, 2)],
  }

  values = [None, True, False, True]

?

haughty mountain May 7, 2024, 10:10 AM

#

kinda, you're missing some edges

grizzled copper May 7, 2024, 10:13 AM

#

haughty mountain kinda, you're missing some edges

eh? why

#

i meant it as vertice1 : [(vertice2, weight)]

haughty mountain May 7, 2024, 11:21 AM

#

grizzled copper eh? why

oh wait, I misread it

#

the indices look off though

#

compared to the values list

#

4 would be out of bounds

#

usually I let my vertices be represented by 0, 1, 2, ...

#

in which case you can have

graph = [
    [(1, 3), (2, 2)],
    [(0, 3)],
    [(0, 2)],
]
  
values = [True, False, True]

grizzled copper May 7, 2024, 11:27 AM

#

Ohh alright thanks

regal spoke May 7, 2024, 6:49 PM

#

grizzled copper What's the most efficient undirected graph representation for this problem? max ...

If this is a competitive programming problem, then those constraints make it sound like it wasn't made for Python.
Btw I do not understand your query 2.

#

Minimum weight required to reach a vertex with a certain boolean value

#

Reach from where?

#

As for graph representations, my guess is that storing the graph as a list of edges sorted by weight would be a good idea.

haughty mountain May 7, 2024, 8:41 PM

#

if you need to do MST-like stuff having a list of sorted edges helps, yeah
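For reference, a compact sketch of Kruskal's algorithm, which is the usual reason a sorted edge list is handy. It assumes vertices are numbered 0..n-1 and edges are `(u, v, weight)` triples; `mst_weight` is an illustrative name.

```python
def mst_weight(n: int, edges: list[tuple[int, int, int]]) -> int:
    """Total weight of a minimum spanning tree; vertices are 0..n-1."""
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    total = 0
    # Process edges from lightest to heaviest.
    for w, u, v in sorted((w, u, v) for u, v, w in edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two components: keep it
            parent[ru] = rv
            total += w
    return total


print(mst_weight(4, [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4)]))
```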

grizzled copper May 7, 2024, 11:11 PM

#

regal spoke If this is a competitive programming problem. Then those constraints makes it so...

There are several vertices x (1 <= x <= N) that have a boolean value. What is the minimum weight needed to reach any one of them from a given vertex, where the weight of a path is the largest edge weight on it and we want the path where that is lowest?
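Read that way, query 2 is a minimum-bottleneck path problem. One possible sketch, not something proposed in the thread, is a Dijkstra-style search that combines edge weights with `max` instead of `+`. It assumes an adjacency list of `(neighbour, weight)` pairs and a `flagged` list of booleans; the function name is made up.

```python
import heapq


def min_bottleneck_to_flagged(adj, flagged, start):
    """Smallest possible 'largest edge on the path' from start to any flagged vertex."""
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, v = heapq.heappop(heap)
        if cost > best.get(v, float("inf")):
            continue  # stale heap entry
        if flagged[v]:
            return cost
        for nxt, w in adj[v]:
            new_cost = max(cost, w)  # a path is as heavy as its heaviest edge
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None  # no flagged vertex is reachable


adj = [[(1, 3), (2, 2)], [(0, 3)], [(0, 2)]]
print(min_bottleneck_to_flagged(adj, [False, False, True], start=1))
```

With many queries on a static graph this per-query search may be too slow in Python, in which case an offline approach over sorted edges is probably the better fit.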

regal spoke May 7, 2024, 11:12 PM

#

Is the graph changing for you, or is it static?

grizzled copper May 7, 2024, 11:12 PM

#

Static

regal spoke May 7, 2024, 11:13 PM

#

So 3. is just MST?

grizzled copper May 7, 2024, 11:13 PM

#

I think so yeah

#

Im just not sure on 1

regal spoke May 7, 2024, 11:16 PM

#

In 1. are you given a vertex v and a value x, and then the answer is to compute the number of vertices reachable from v only using edges with weight <= x?

grizzled copper May 7, 2024, 11:16 PM

#

Yes

#

This is my graph structure atm:

{1: [(2, 8), (3, 4)], 2: [(1, 8), (5, 6), (6, 12)], 3: [(1, 4), (4, 7), (5, 20), (7, 15)], 4: [(3, 7), (6, 13)], 5: [(2, 6), (3, 20), (6, 15)], 6: [(2, 12), (4, 13), (5, 15), (7, 10)], 7: [(3, 15), (6, 10)]}
{1: False, 2: False, 3: False, 4: False, 5: False, 6: False, 7: False}

#

So 2 dicts

regal spoke May 7, 2024, 11:17 PM

#

In that case, there is a rather simple way to solve 1. "offline".

#

The trick is to build the graph, one edge at a time, starting from the edge with the smallest weight

#

Keep track of the graph's connected components using a DSU
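A sketch of how that offline idea might look, assuming all queries are known up front as `(v, x)` pairs and vertices are numbered 1..n. Queries are answered in order of increasing x, so each edge is added to the DSU exactly once. The count here includes v itself; counting only vertices with a True tag would need an extra per-component counter merged in `union`. `reachable_counts` is an illustrative name.

```python
def reachable_counts(n, edges, queries):
    """For each (v, x): number of vertices reachable from v via edges of weight <= x."""
    parent = list(range(n + 1))
    size = [1] * (n + 1)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        a, b = find(a), find(b)
        if a == b:
            return
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a            # attach the smaller component to the larger
        size[a] += size[b]

    edges = sorted(edges, key=lambda e: e[2])
    order = sorted(range(len(queries)), key=lambda i: queries[i][1])
    answers = [0] * len(queries)
    next_edge = 0
    for i in order:
        v, x = queries[i]
        # Add every edge that is light enough for this query.
        while next_edge < len(edges) and edges[next_edge][2] <= x:
            union(edges[next_edge][0], edges[next_edge][1])
            next_edge += 1
        answers[i] = size[find(v)]
    return answers


# Edge list of the 7-vertex graph posted above, as (u, v, weight).
edges = [(1, 2, 8), (1, 3, 4), (2, 5, 6), (2, 6, 12), (3, 4, 7),
         (3, 5, 20), (3, 7, 15), (4, 6, 13), (5, 6, 15), (6, 7, 10)]
print(reachable_counts(7, edges, [(1, 7), (6, 10), (5, 3), (1, 12)]))
```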