#algos-and-data-structs

1 messages · Page 53 of 1

exotic parrot
#

space as in making code less clutterd not memory

#

can the same be done with if statements?

regal spoke
#

The closest thing I know about if statements is that you can use any and all

#

!e

A = [1,2,3]

print(all(a > 0 for a in A))
halcyon plankBOT
#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

True
regal spoke
#

This is a lot more convinient than using an if-statement in a for loop

exotic parrot
#

oh so you can use this to check if there are negative numbers in your list, if not remove all numbers that are larger then your 'wanted' number

regal spoke
#

I didn't try to imply anything with my example other than that all and any can be nice to use

regal spoke
exotic parrot
#

in the context of the subsut sum problem

regal spoke
#

I wasn't talking about subset sum problem at all with my any/all example

exotic parrot
#

yeah I know, I was just thinking of implementations 'outloud'

regal spoke
#

I'd say try to go through this code and figure out what it is doing #algos-and-data-structs message
Especially this line is pretty nice return find_sum(A1, target - sum2) + find_sum(A2, sum2)

exotic parrot
#

is there any function that let's you remove something prom a list besided A.pop()

regal spoke
#

So you can remove intervals using del

exotic parrot
#

oh that's pretty handy

#

can it be combinden with all/any?

regal spoke
#

I don't see how del and all/any could be combined

exotic parrot
#

like idk del any( a > x for a in A)

regal spoke
#

Ah

#

Then you should do

#

[a for a in A if a <= x]

#

This creates a new list

#

that only contains the elements <= x

regal spoke
exotic parrot
#

I assume putting the a before for a in... is just notation? or "defining "a" "

regal spoke
#

!e

A = [i*i for i in range(10)]
print(A)
exotic parrot
#

or does it just refer to the element that has to be put inside the new list

halcyon plankBOT
#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
exotic parrot
#

ahh alright, it's that simple

#

this stuff could've saved me alot of time... dammit

#

I've been going through your code

#

U used set to make iterating faster, right?

regal spoke
#

The time it takes for
x in A1sums:
depends on what data structure A1sums is

#

If A1sums is a list, then this is slow (O(n) time where n is size of A1sums)

#

But if A1sums is a set, then this is fast (O(1) time where n is size of A1sums)

exotic parrot
#

how does it go over the set then?

#

I mean I know how a set works, I'm just wondering why it can get it out instantly without iterating

regal spoke
#

set is a hashtable. It uses hashes to pretty much instantly tell if x lies in A1sums or not

exotic parrot
#

ohh ok

#

haven't seen much stuff about hashes but I assume they're just "adresses"?

regal spoke
#

hash(x) is pretty much just a random number

#

You should look up hash tables if you are interested in how they work

#

Its not super important for now

#

Just remember that x in A1sums: is instant if A1sums is a set

#

and slow if A1sums is a list

exotic parrot
#

alr got it

#

the last return statement,
return find_sum(A1, target - sum2) + find_sum(A2, sum2) does this basically just start from our wanted number, and just keeps subtracting until we're left with target = 0

regal spoke
#

The idea is that if we get to that line of the code

exotic parrot
#

our target becomes the two summed numbers used to create initial target

regal spoke
#

Then I know there exists sum1 in A1sums and sum2 in A2sums such that sum1 + sum2 = target

#

Now I recursively ask for which numbers in A1 that can be used to make sum1, and which numbers in A2 that can be used to make sum2

exotic parrot
#

ohh ok

#

and the iterations stop, after target is equal to a number which cannot be written as a sum of available numbers

regal spoke
#

If it is not possible to hit the target, then the function returns None

exotic parrot
#

let's say I'd want to print those results, what would be a good place to place the print statement?

#

considering we iterate using previous sums

#

would putting used numbers in a list be a viable option?

#

and then just iterating over this list

#

with "+" and then just "=" target

regal spoke
#

!e

def find_sum(A, target):
  if sum(A) == target:
    return A
  if target == 0:
    return []

  A1 = A[:len(A)//2] # First half
  A2 = A[len(A)//2:] # 2nd half

  # Compute all possible sums of first half
  A1sums = [0]
  for a in A1:
    A1sums += [s + a for s in A1sums]

  # Compute all possible sums of 2nd half
  A2sums = [0]
  for a in A2:
    A2sums += [s + a for s in A2sums]

  # Check if target - sum2 = sum1, for sum1 in A1sums, and sum2 in A2sums
  A1sums = set(A1sums) # Make into set for fast lookup
  for sum2 in A2sums:
    if target - sum2 in A1sums:
      # Target is possible to reach!
      # Recursively find which numbers to use from A1 and A2
      # This runs very fast since A1 and A2 are tiny in comparision to A
      return find_sum(A1, target - sum2) + find_sum(A2, sum2)
  
  # Not possible
  return None

A = [1,2,3,4,5,6,7,8,9]
target = 26
print(find_sum(A, target))
halcyon plankBOT
#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

[1, 3, 4, 5, 6, 7]
exotic parrot
#

wait... what does return [] do?

#

I mean I know it has smtg to do with the output, but how does it get filled

regal spoke
#

Returns an empty list

#

Which corresponds to the empty subset

exotic parrot
#

ah ok, how does the output list get filled then?

regal spoke
#

The + on the return find_sum(A1, target - sum2) + find_sum(A2, sum2) line

#

The + here joins two lists

regal spoke
exotic parrot
#

well I know it stops at some point, but what happens to those previous iterations

regal spoke
#

It returns output just like any recursive function does

#

Recursive calls until it reaches a base case

exotic parrot
#

yeah but this function should return [] after the last iteration right?

#

and I guess that's the thing I don't get

#

how is ```[1, 3, 4, 5, 6, 7]

#

ah nvm ok

exotic parrot
#

ohhh

#

ok

#

nvm

#

I forgot that the target was changing, thus the sum which was returned by A

#

ok I see it now

#

alright, I think you helped me with some intuitive understanding of these recursive algorithms

#

thank you for your time, sorry for being a bit slow it's pretty late here

rigid trench
#

Hey, where would I ask about optimizing numpy?

#

There doesn't really seem to be a great "optimization" space other than this one maybe

slender sandal
#

Wdym

#

What do you need

haughty mountain
#

@slender sandal presumably this

#

but I don't think that's asking the right question, numpy will for sure do that pretty efficiently

#

what's the overall operation being done? maybe how that overall computation is done can be optimized

pearl crag
#

hello everyone , am new member in the group , i was wandering if someone can help me , i have some problems using python

#

?

languid thistle
#

guys I need a sanity check

#
    //first we sort, such that we can short-circuit after the first two items' sum is greater than the next item to be placed as the third item
    let result = []
    //it's each entry in candidates may be used once, not each number in candidates may be used once
    //results will hold an iteration-ordered list of each acceptable combination (also as an ordered list)
    //because there may be repeats in candidates, this may include multiple, identical lists
    candidates.sort(function (a,b){return a-b})
    console.log(`candidates: ${candidates}`)

    _combinationSum(0, [], target)

    function _combinationSum(i, usedIndices, remainder){
        console.log(`remainder: ${remainder} values used:`)
        console.log(usedIndices.map((e,i)=>candidates[i]))
        if (remainder==0){
            let usedVals = usedIndices.map((x, i)=>candidates[i])
            result.push(usedVals)
            return
        }
        else if (remainder <0){
         console.log('short circuiting with values')
         console.log(usedIndices.map((e,i)=>(candidates[i])))
         console.log(`and remainder ${remainder}`)
            
            return
        }
        else{
            for (let j=i; j<candidates.length;  j+= 1){
                const newRemainder = remainder-candidates[j]
                if (newRemainder < 0){break}
                let newUsedIndices = usedIndices.slice()
                newUsedIndices.push(j)
                _combinationSum(j+1, newUsedIndices, newRemainder)
            }
        }
    }
    return result
};

};```
#

example target: 8 example input: [10,1,2,7,6,1,5] example output: [[0,1,5],[0,3,4],[0,6],[1,3,4],[1,6],[3,5]] expected output: [[1,1,6],[1,2,5],[1,7],[2,6]]

#

where am I going wrong?

haughty mountain
#

sir, this is a python's||erver||

languid thistle
#

probably shouldn't matter though

haughty mountain
#

first off you're returning indices

languid thistle
#

examples are from the current code (it's big so I didn't wanna repost, just edited it

haughty mountain
#

0 is not an element

languid thistle
haughty mountain
#

those are for sure indices

languid thistle
#

it converts from the used indices to grabbing actual entries from candidates

haughty mountain
#

the examples aren't updated

languid thistle
#

I dunno, I can like just see it right there, the map over usedIndicies into candidates and returned to usedVals whenever the remainder is 0

#

maybe discord has not sent that update to the server yet but doubtful

haughty mountain
languid thistle
#

oh sorry the outputs

#

I thought you meant the code as an example for some reason

#

my bad

#

``
candidates:[10,1,2,7,6,1,5]
target =8

Output
[[1,1,2],[1,1,2],[1,1],[1,1,2],[1,1],[1,1]]
Expected
[[1,1,6],[1,2,5],[1,7],[2,6]]```

haughty mountain
#

so you're clearly extracting the wrong values

#

and duplicates of sets, presumably

#

just from the output I can tell this doesn't do what you think it does

usedIndices.map((x, i)=>candidates[i])
#

it's suspicious that all the values are the first few in the sorted sequence, no?

languid thistle
#

I would sort of agree with you except for the fact that the output is being built from a recursive call and you know how those can get whacky. Say, just repeatedly grabbing the first element or repeating over a recursive step which is already done, or not updating the input to subset a given recursive call's input for the next call

languid thistle
#

map will take each element of an array, assign it to x, and assign i to whatever the index of that element is in the array

haughty mountain
#

why would you want to use that index?

languid thistle
#

well because, once the remainder hits 0, I need to take the indices whose' values were used in computing the remainder and find their associated values to build the set that was subtracted from the initial sum

haughty mountain
#

I'm not asking why you're trying to map things back to values

#

I'm saying

usedIndices.map((x, i)=>candidates[i])
```is just incorrect
#

it's not doing what you want it to do

#

there is a reason all the sequences in the output are all the first few elements of the sorted sequence

languid thistle
#

can you just explain why you think it is that part specifically that is not correct / not doing w hat I think it's doing?

#

I can explain why I think it does what I explained

haughty mountain
#

you're always extracting the first few elements of candidates

#

you're not extracting the elements at the indices of usedIndices

languid thistle
#

yeyp

#

oh

haughty mountain
#

I'm assuming this would work fine

usedIndices.map(i=>candidates[i])
#

now things will actually have the target sum

#

but you'll probably have duplicates, which is a more fundamental problem

languid thistle
#

so [400,2].map(i=>candidates[i]) will return [candidates[400], candidates[2]]

haughty mountain
#

correct

#

you are storing indices

#

you want to extract the corresponding values in candidates

#

this does that

languid thistle
#

thanks dude

#

definitely definitely that is my issue. I went brain dead and was using indices derived automatically of the map function for their array entries to map instead of that same array's values which are, in fact, the relevant indices

#

much thank you for taking the time to enlighten me

rigid trench
#
import cupy as cp
self.F = cp.random.random((res,res), dtype=cp.float32)

def addTo(self, arr):
    # Very slow.
    i0, i1= self.index # The middle of the right square
    wrap0 = i0 if i0!=0 else self.res # Fixes an issue with slices when the origin is 0
    wrap1 = i1 if i1!=0 else self.res
    arr[:-i0,:-i1]+=self.F[wrap0:,wrap1:] # A
    arr[:-i0,-i1:]+=self.F[wrap0:,:wrap1] # B
    arr[-i0:,:-i1]+=self.F[:wrap0,wrap1:] # C
    arr[-i0:,-i1:]+=self.F[:wrap0, :wrap1] # D
    return arr

This is my bottleneck, looking for ideas on how to speed it up.

lean walrus
#

i have no experience with cupy (and little experience with numpy), but i think this might help: https://docs.cupy.dev/en/stable/reference/generated/cupy.roll.html#cupy.roll
the idea is to "shift" first array down-right by the size of A, and then add it to the original array
documentation says that Elements that roll beyond the last position are re-introduced at the first., which is perfect for this case

rigid trench
#

Yeah the issue is that roll I believe re-writes in memory

#

so while it works, it's slow

tender atlas
#

Have you ever guys solve DFS with shortest path? I have 20x20 grid where i am getting a path which so much larges. I tried path compression to make the dfs path from start point to goal point for getting an optimal path. Is it really possible to get an optimal path from dfs?

regal spoke
#

Also fyi, a 20x20 grid is tiny, so any shortest path program should run super fast for it

regal spoke
haughty mountain
flat sorrel
regal spoke
#

One more thing, A + D = D + A and B + C = C + B. So looks like you are computing all sums twice

flat sorrel
#

btw what is the relationship between arr and self.F?

muted helm
#

Hola guys, I have to convert my sha256 into something that is bigint compatible - just want to triple check but the below is correct right?:
def f_sha256_to_8_byte(f_sha256_input_value):
return int(f_sha256_input_value, 16) % 2**63

slender sandal
#

You can mask it via int(...) & ((1 << 63) - 1) to not use modulus and exponentiation

#

Also you should probably use the right number of bits to mask, so int(...) & ((1 << 255) - 1)

#

Oh I'm just now realizing the name of the function

#

SHA256 gives hashes of 8 32-bit integers, not 8 one byte long integers

muted helm
#

Yeah but the idea would be to take sha and compress it into bigint which i understand to be 8 byte

#

This is just for internal house keeping where I have to use bigint

#

you are right it should be
def f_sha256_to_8_byte(f_sha256_input_value):
return int(f_sha256_input_value, 16) % (2**63 -1)

#

But that should give the same result no?

flat sorrel
#

if you're applying modulus anyway, why not simply truncate the string before converting it into an int?

muted helm
#

I am fairly indifferent, only point is then I have to calculate how much to truncate

regal spoke
#

oh I replied to the wrong comment

muted helm
#

I know, I meant the same as in using modulus or truncating

#

actually it wont be the same either

#

but it should serve the same purpose given sha256 is uniform

#

or am I completely on the moon?

rigid trench
# flat sorrel btw what is the relationship between `arr` and `self.F`?

arr is an accumulator.
The algorithm outline:
There are 6 matrix of size 256x256 (or any size). Each matrix has an offset. This avoids then need to roll the matrix, but makes the sum of two matrixes more complicated.

For example, Matrix 1 may have the origin at (5,0). In which case the data is really [5:, 0:] concat to [:5,0:]
That data is collected at the accumulator at indicies [:-5,0:] concat to [-5:,0:] We avoid concatting in practice by accumulating directly into the correct area of the matrix.

regal spoke
#

Your code is wrong, you want either %2**63 or &(2**63 -1) (these are the sasme thing)

muted helm
#

ah because mod also returns 0 I suppose?

regal spoke
#

wut

#

% (2**63 - 1) is a weird operation that you should probably avoid using

muted helm
#

Yeah my bad I confused 2 thiungs

#

in any case, given that sha256 is uniform mod 2**63 should compress sha256 correctly to bigint right? (Fully appreciating that it materially increases collision risk, but that should be managable from my end as I dont have more than a few million records)

#

I am happy to truncate instead if that is better

flat sorrel
#

I don't really see why the current function is "slow" enough to matter

rigid trench
slender sandal
rigid trench
# flat sorrel can you show an example of how you're calling the function?

class LatticeVector:
  def addTo(self, arr):
    # Very slow. [ above]

class LatticeFrame:
  def addTo(self):
    self.temp.fill(0)
    return self.Qs[4].addTo(self.Qs[3].addTo(self.Qs[2].addTo(self.Qs[1].addTo(self.Qs[0].addTo(self.temp)))))

class CubeLattice:
  def get(self):
    for i in range(6):
      self.output[i,:,:]=self.F[i].addTo()
    return self.output.view()

L=CubeLattice(256) # controllable texture size
cProfile.run('''for i in range(1000):
                     L.get()''')
muted helm
flat sorrel
#

e.g. by stacking the input array to 3D

haughty mountain
rigid trench
#

in total

flat sorrel
#

then modifying your function to handle batch input

def addTo(self, arr):
    # Very slow.
    i0, i1= self.index # The middle of the right square
    wrap0 = i0 if i0!=0 else self.res # Fixes an issue with slices when the origin is 0
    wrap1 = i1 if i1!=0 else self.res
    arr[..., :-i0,:-i1]+=self.F[wrap0:,wrap1:] # A
    arr[..., :-i0,-i1:]+=self.F[wrap0:,:wrap1] # B
    arr[..., -i0:,:-i1]+=self.F[:wrap0,wrap1:] # C
    arr[..., -i0:,-i1:]+=self.F[:wrap0, :wrap1] # D
    return arr
haughty mountain
rigid trench
haughty mountain
#

you're running stuff on a GPU, you're probably bound on transferring data to the GPU

haughty mountain
#

transferring data to the GPU is comparatively slow

#

hence stuff like batching

flat sorrel
#

it skips the leading dimensions, so :-i0, :-i1 (for example) would correspond to the last 2 dimensions

haughty mountain
#

I still say get some baseline just based on numpy

muted helm
haughty mountain
#

i.e. how long does this take on the CPU?

rigid trench
flat sorrel
#

you can perform the sum (a reduction operation) over the leading axes at the end

slender sandal
flat sorrel
rigid trench
slender sandal
haughty mountain
#

do you have some minimal runnable example?

muted helm
rigid trench
slender sandal
flat sorrel
rigid trench
#

I've adapated it for a cube

muted helm
rigid trench
#

(hence: 6 surfaces of square size)

rigid trench
#

Right now, that's 5 matrixes per cell, but a standard D2Q9 setup would require 9 (one for each direction + standing still).
A 3D model (which I might try) would require between21 to 27 matrixes to be added together.

#

My gut tells me: all the data is already in the GPU, and it's just matrix addition. So I don't know if I'm doing something wrong, I feel like it should be able to add much more data

#

Otherwise, I'm bound to maximum 150 FPS

slender sandal
flat sorrel
muted helm
flat sorrel
#

so basically you have an accumulator array with shape (256, 256), and you want to add Q other matrixes (each also (256, 256) in size but with different offsets) to that accumulator array

#

@rigid trench am I understanding this correctly?

slender sandal
muted helm
#

Yeah unfortunately I have to given thats the required values

#

but mod using 2**63 should do the same no?

slender sandal
#

**64 I think

#

Or you could do

int.from_bytes(hashlib_obj.digest()[-8:])

which is a little cryptic

#

Not the security related definition of cryptic

flat sorrel
#

so a basic numpy code for the computation would be something like

#

!e

import random
import numpy as np

def rand_q(height: int, width: int):
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 16
Q = 9
result = np.zeros((HEIGHT, WIDTH), dtype=float)
origin_frames = [rand_q(HEIGHT, WIDTH) for _ in range(9)]

for origin, frame in origin_frames:
    result += np.roll(frame, origin)

print(result)
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | [[4.48619444 3.74053379 4.23902768 3.92672632 5.08335403 3.34358598
002 |   5.71451535 5.06600711 4.32237329 4.51426973 4.46561417 4.62321253
003 |   5.12898984 5.95249311 6.05474966 3.04084689]
004 |  [4.40489298 5.21467323 4.93804735 5.60822198 4.43054237 5.08963529
005 |   4.60860832 3.8062857  3.39699674 4.37238177 4.30100951 4.60762078
006 |   3.3029477  5.61203333 5.4895313  5.8631443 ]
007 |  [4.17443367 4.91755452 3.98782944 5.72625161 4.72554572 3.36503766
008 |   3.82932581 3.69205534 5.63291705 4.72333282 5.58175435 3.81820228
009 |   4.04194926 6.38564139 3.13121809 4.56128867]
010 |  [3.9851704  2.49233092 4.94240525 2.23528422 4.89501334 3.47701614
011 |   4.12929219 5.22873326 5.02825565 5.36588294 6.81544752 4.66081924
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/DLXG5P3X4I2ULWK6SMCBD5WK3Y

flat sorrel
#

and then you have to perform this computation for every frame (up to 150 FPS)

rigid trench
flat sorrel
#

random.randrange returns an int, so it should be ok

#

is there any relationship between the origin and/or values of different Q matrices?

rigid trench
#

Technically yes.
Consider the two Q matricies representing "go east" and "go west".
On timestep 1, they are both origin (0,0).
On timestep 2, they are on (-1,0) and (1,0)

#

after 256 steps, they'll both be back to (0,0)

flat sorrel
#

how about within the same timestep?

rigid trench
#

Within the same timestep you can represent their origin as:
originQ = timestep * (directionVectorQ)

flat sorrel
#

direction vector being each combination of {-1, 0, 1} x {-1, 0, 1}?

rigid trench
#

Yup, for D2Q9 setup, that's exactly correct

#

For D2Q5, you don't include the diagonal directions (there is some accuracy / time trade offs based on the number of directions included)

flat sorrel
#

but the values may be different for each Q, right?

rigid trench
#

Yes the value matrixes are 100% unrelated

#

They represent "number of particles moving in direction Q, at timestep T, in this grid cell" which can be anything

flat sorrel
#

are the values for the same direction vector fixed between timesteps?

rigid trench
#

They are not fixed, but tend to be correlated with the previous step

#

The calculation is:
probability of interacting * CollisionDistribution(x,y) + (1- probability of interacting) * currentDistribution(x,y)

flat sorrel
#

so those values cannot be computed in advance, since they are dependant on the previous step

rigid trench
#

Yup

#

Steps are:

  1. Move the origins (imagine every point is "moving", but instead we move the origin in the opposite direction)
  2. Update each cell based on the collision probability

Repeat for each timestep

#

So this algorithm handles step 1

flat sorrel
#

maybe you can use one of the Q matrices as the "anchor" instead of creating a fresh zero matrix (I doubt that would be much of an improvement though...)

rigid trench
#

We use the sum of the number of particles moving in all directions Q to calculate the density, which feeds into the probability of collision

rigid trench
#

But I can't clobber the values

#

I feel like maybe I can do something in parallel.
Like:
temp= Q1,0 + Q-1,0
temp2= Q0,1 + Q0,-1
temp3= Q0,0
temp3 += temp
temp3 += temp2

#

For d2Q5 it's not that much of a speedup, but it seems like I can get like, log2 behavior instead of linear

flat sorrel
#

you only have a relatively small number of matrices (<10 in your 2D case), the overhead of parallelizing the operation may be significant

rigid trench
#

yeah fair

#

Do you suppose doing anything here with like, Numba, or something might help?

flat sorrel
#

and since the values in each Q matrix are different, you can't use symmetry to reduce the number of operations

flat sorrel
rigid trench
#

When I run cupyx benchmark, it seems to imply that I'm CPU bound, but I'm not sure why. I don't think I'm copying things out of GPU...

flat sorrel
#

can you guarantee that certain patches in each Q matrix contain only zero values? then you might be able to ignore those parts when performing addition

rigid trench
#

Nope, these will all be non-zero

#

here's an example visualization (you can see the "noise" as the particles move around)

#

Actually it might not be visible in the gif

flat sorrel
#

the only thing I could think of right now is writing a custom CUDA kernel to perform the addition with the given offset. that way, you can avoid having to use roll() which copies the array.

rigid trench
flat sorrel
#

yeah, that is going to be a pain xD

#

personally I don't have much experience in it either

rigid trench
#

On my desktop I can hit around 210 FPS, which I guess is plenty. When I do the other steps it will likely still be >60 FPS which is probably fine

#

Obviously faster is better, but I guess as long as I maintain interactive speed it's OK

flat sorrel
#

parallelizing might become more useful when you need to perform this computation at higher resolution (so the overhead isn't as much compared to the total computation time)

#

but in that case the FPS would probably be so low that you have to pre-compute your simulation before visualizing it

rigid trench
#

Thanks, I'll do some more coding and circle back. Already doing Cupy helped so much more than Numpy. I was getting like between4-30 FPS before, so Cupy unlocked the project, and I wanted to check if I'm missing anything else before ending this round of optimization

flat sorrel
#

@rigid trench ok I just thought of more ideas.

  1. For each offset (which you know in advance), you can pre-compute the indices to assign and reshape them to (256, 256, 2) (assuming 2D), then you can use multidimensional indexing to assign the values without using roll(). However, the "random access" nature may cause this to actually be slower than your quadrant-tiling approach.

  2. Consider your quadrant-tiling approach. Although the values are not symmetrical, the offsets are symmetrical. This opens up the possibility of batching the additions in each quadrant across multiple Q matrices. (e.g. by considering which regions overlap) Actually, I think you may be able to move around the slice indices such that the indices on the arr side are the same for each Q, enabling you to batch the additions.

  3. Reuse the accumulator matrix (keeping the same instance but resetting its values to zero between each function call) so that memory does not have to be reallocated.

#

I have to go to bed soon so I hope this is clear enough for now xD

flat sorrel
#

@rigid trench updated ^

haughty mountain
#

err

#

oh wait

#

100 instead of 1000, let me delete and re-run

#

because python but is dumb

#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9
origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')
halcyon plankBOT
#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 0.5476519698277116ms
haughty mountain
#

and locally

> python a.py
Time: 0.289999102242291ms
#

so stupid cheap pithink

#
You have: 1/0.54ms
You want:
        Definition: 1851.8519 / s
flat sorrel
#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 4.613355630135629ms
flat sorrel
#

seems that memory allocation is the real bottleneck then, not the assignment using roll()

#

!e

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.random.rand(height, width)
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for i in range(n_runs):
  if i == 0:
    result = np.zeros((HEIGHT, WIDTH), dtype=float)
  else:
    result.fill(0)

  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 4.700911450199783ms
flat sorrel
#

hmm reusing the accumulator matrix doesn't seem to improve performance

haughty mountain
#

!e Let's see how big a fraction is just the random number generation

import random
import time

import numpy as np


def _rand_q(height: int, width: int) -> tuple[tuple[int, int], np.ndarray]:
    y0 = random.randrange(height)
    x0 = random.randrange(width)
    values = np.zeros((height, width))
    return (y0, x0), values

HEIGHT = WIDTH = 256
Q = 9

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
  result = np.zeros((HEIGHT, WIDTH), dtype=float)
  origin_frames = [_rand_q(HEIGHT, WIDTH) for _ in range(9)]
  for origin, frame in origin_frames:
      result += np.roll(frame, origin)
end = time.perf_counter()

print(f'Time: {1000*(end - start)/n_runs}ms')
#

err

flat sorrel
#

put height and width in a tuple

halcyon plankBOT
#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

Time: 1.2401868900633417ms
haughty mountain
#

so a really big chunk of it

#

is just the random number generation

flat sorrel
#

still 1.2ms is quite a bit more than 0.5ms

#

I wonder how @rigid trench determined that the assignment of the matrices is the bottleneck then

#

Perhaps we could get better gains by optimizing the computation of the values in Q

haughty mountain
#

on the gpu maybe it is because it ends up bottlenecked by small data transfers to the gpu

#

which is why I was interested in seeing just plain numpy, which I assume would be quite fast already

flat sorrel
#

that is true. maybe it would be better to perform the whole accumulation step in cpu before transferring the result into gpu

flat sorrel
regal spoke
# slender sandal `**64` I think

%2**64 gives 64 bits but the result could be a Python big int, which kind of defeats the purpose. I think %2**63 returns a normal int (not a big int)

dawn iris
#

Does anyone have a preference on using tuple vs dict on a func . dict for instance makes the code clean as you dont have to unpack it if a func sets or return multiple vals but tuple on other hand provides a consistent order but having more return items make the code ugly .

flat sorrel
#

Why not use position/keyword arguments instead of passing tuple/dict? In most cases I think doing so would be more convenient

dawn iris
#

its the return value i am after do i set the dict inside my func or do i return the tuple

#

having like 8 tuple returns kind of makes it ugly and i am hesitant on setting and retrieving a key from a dict

flat sorrel
#

I see. For return values, I much prefer dict over tuple for the readability.

#

even though Python doesn't have object unpacking syntax like in JavaScript, if I have to return more than 2 items from a function then I would most likely use a dict

#

if the accessor syntax val = return_dict["key"] is too ugly, returning a dataclass is also a valid option

flat sorrel
hushed yoke
#

guys best resource for dsa? ik im asking the most annoying basic ass question but please tell.

young totem
jolly mortar
#

https://codeforces.com/contest/1969/problem/D
this was my solution

t = int(input())
for _ in range(t):
    n, k = map(int, input().split())
    *a, = map(int, input().split())
    *b, = map(int, input().split())
    b, a = zip(*sorted(zip(b, a)))
    prevn = [0]*(k+1)
    prevn[0] = max(b[0] - a[0], 0)
    for nn in range(1, n):
        newn = [None]*(k+1)
        newn[0] = prevn[0] + max(b[nn] - a[nn], 0)
        for kk in range(1, k+1):
            newn[kk] = max(prevn[kk-1] - a[nn], prevn[kk])
        prevn = newn
    print(prevn[-1])

which TLEd on test 11 😔
algorithmically can you do better than this O(nk) or is it a constant factor diff

rigid trench
solemn moss
#

It's easy to do with ordered set, but python doesn't have it built-in :(

jolly mortar
#

pls share 👉 👈

solemn moss
#

O(nk) surely can't pass here

jolly mortar
#

ty

solemn moss
#

We want only values where B[i] > A[i], other are useless

Two sets - one is for values that we take but bob doesn't take them, second one for values that bob does take

Then we erase the value with the biggest A value from the second set, and update it with the new value that would get into it from first one (the one with max B value)

And the answer is the best from these possible answers

solemn moss
#

Yep

jolly mortar
#

oh

#

i get it now

#

thanks

buoyant wing
#

billybobby said to check it out and hes a mod soooo shrug

#

pls help tho, i really need it 😭

keen hatch
#

hi yall, homework problem I'm a bit stumped on.
Given two sorted integer arrays A and B, merge B into A as one sorted array in O(n) time. Write the description and code of the algorithm.

def two_sorted_merge(a: list, b: list):
    a_point = 0
    b_point = 0

    # Run loop while both our pointers are in bounds
    while a_point <= len(a) - 1 and b_point <= len(b) - 1:
        # If our current element in A is greater than b[b_point], insert behind us and increase b_point
        if a[a_point] > b[b_point]:
            a.insert(a_point, b[b_point])
            b_point += 1  # Move our b pointer forward to look at the next element in the array

        # We need to increase our a pointer whether we just inserted an element from B or not. If we didn't then it's
        # time to move forward and see if the next element in A is greater than b[b_point]. If we did then all our
        # indexes in A just shifted up 1, and we need to increase to maintain our same position in the array.
        a_point += 1

    # If B has items remaining then b[b_point] >= max(a), so we can simply insert them all
    if b_point <= len(b) - 1:
        for i in range(b_point, len(b)):
            a.append(b[i])

    return a```
this function does actually work but `a.insert(a_point, b[b_point]` makes the whole thing not O(n), and I'm having trouble imagining another way
modern gulch
#

If it's not O(n), what do you think it is?

keen hatch
modern gulch
keen hatch
#

ahuh

modern gulch
#

Oh, your issue is the insert.

#

Why not just build a new list, rather than mutate a?

#

Or is it required to mutate a?

keen hatch
#

the problem specifically says merge b into a

modern gulch
#

Did they provide the stub, and did they return a?

keen hatch
#

yeah from the example it looks like thats what they want

#

I wasnt sure because my professors english isn't great but example seems pretty clear 🙁

modern gulch
#

yah, the normal solution to this is to merge into a new list, since you're just appending to the end.

keen hatch
#

thats what I did at first but the example got in my head

#

I did email to ask for clarification

#

but tbh I just wanted to check to make sure I wasn't missing some super obvious way to do this

#

but it doesn't seem obvious in O(n) with the built in lists. implementing it as a linked list would make it work right?

modern gulch
#

Hmm, I guess you could do the merge in reverse. Extend A, and find the max element and put it at a[max_position]

#

then work backwards

#

(max to min avoids modifying the elements as you're using them)

keen hatch
#

that makes sense 🤔

#

would there be input combinations where unsorted elements in A get overwritten by elements that come in from B?

modern gulch
#

Or even easier, maybe just merge into an empty list, then a.clear() and add the sorted list?

keen hatch
#

meh I think if they want it in A they want me to work in place and not allocate a new list

#

I'll try the backwards merge, thank you very much for that suggestion

keen hatch
modern gulch
urban tundra
#

I solved it correctly...
But I want to know.. that Is my solution better than the approach given by website.??

and as we can see the approach given has time complexity of O(n*n) whereas mine solution is having time complexity of O(n) right...???

#

If anyone has any kind of suggestions etc.. pls let me know.. and ping me any number of times... not a problem... 😉

#

➕ ❕another ques..: does having comments in my code.. will increase runtime.. even by 1ms?? Ik that interpreter ignores it but it still parses it right.. so would it affect runtime even by 1ms??

urban tundra
haughty mountain
#

O(2n) is the same as O(n)

#

your solution is not O(n)

urban tundra
#

ik thats why i said.. O(n)

urban tundra
#

@haughty mountainthere are no nested loops.. then how it can be.. O(n*n)??

haughty mountain
#

number of nested loops doesn't automatically tell you the complexity

#

why are you assuming your prints are O(1)?

urban tundra
haughty mountain
#

well, prints and the string operations

urban tundra
haughty mountain
#

your thing is also O(n²)

urban tundra
#

🤯

haughty mountain
#

e.g. "a"*n is O(n)

urban tundra
#

Omg

haughty mountain
#

it shouldn't be surprising

#

it's creating something of size n

#

so it can't be cheaper than O(n)

urban tundra
#

O(2*(n*(3*n))) this is the correct time complexity ... right..??@haughty mountain

#

I mean more precise...

haughty mountain
#

that's missing the point of big O notation

urban tundra
#

😅

urban tundra
haughty mountain
urban tundra
haughty mountain
#

if you cared about performance on such a level you wouldn't use python anyway

urban tundra
#

@haughty mountain chatgpt is saying that it is.. O(n) not O(n*n)
🤔

haughty mountain
#

it's wrong

urban tundra
#

Okay.. dude

haughty mountain
#

like just consider how many characters you even output overall

#

you're outputting O(n²) characters

#

you can not do better than O(n²)

urban tundra
#

Oh.. then the space complexity is also O(n*n) ... right?? @haughty mountain

haughty mountain
#

how do you count space complexity?

#

the program itself doesn't use more than O(n) memory at one time

urban tundra
#

Can you pls .. give me some resources to learn and understand these concepts great like you.. @haughty mountain

#

if possible something that you referred

haughty mountain
#

I don't have resources for this

#

it's basically just a math exercise of counting stuff

urban tundra
#

Okk

#

@haughty mountain this websites.. is showing space complexity of O(n*n)... 😅

#

see last line

haughty mountain
#

your program doesn't do that

#

it just prints individual lines of output to an output stream

#

the thing that reads and displays the output would do that kind of thing

#

the program itself doesn't

urban tundra
#

Ohkay...

haughty mountain
#

err, that j in the range should be an i

#

but doesn't change it saying it's O(n²)

#

that code is O(n log n)

flat sorrel
#

sounds like this is some AI-powered website

haughty mountain
#

sure does

flat sorrel
#

don't blindly trust the AI, especially when logical thinking or math is involved

#

they can and do make mistakes

haughty mountain
#

I like that it got to
n + n/2 + n/3 + ...
and then just claims that's O(n²)

#

(technically it is, but only because O is an upper bound)

#

n + n/2 + n/3 + ... is even Θ(n log n)

regal spoke
#

I highly doubt the point of the exercise is to do some kind of in place implementation because that wouldn't really make sense

keen hatch
solemn moss
regal spoke
flat sorrel
# solemn moss

I trivially got it to leak the system prompt. They really didn't put much thought into this site.

#

(At least add a disclaimer that the answer is AI-generated so that people won't be misled by it...)

keen hatch
haughty mountain
#

well that's underwhelming

keen hatch
#

I definitely just read too far into "merge B into A" as my professor emailed me back and said my first solution was fine lmao

haughty mountain
#

extend A, merge into A back to front

keen hatch
#

what he said

regal spoke
#

huh

#

O(n^2)?

haughty mountain
#

no

regal spoke
#

Oh I see now

#

Well that is not in-place

#

But I guess it is still pretty efficient memory wise

keen hatch
#

yeah I realize now it's probably technically impossible to "in place" merge a list into another when the given list isn't big enough to hold all the elements from the 2nd lmao

haughty mountain
#

in-place doesn't make much sense when you have two separate lists to start with 😛

keen hatch
#

yea

regal spoke
#

yes

keen hatch
#

well, I hope my professor likes the over engineered solution

regal spoke
stray fractal
#

well i mean

keen hatch
#

openai's api might be the death of interesting tech projects / startups 😵‍💫

#

i used to browse r/saas but half the shit posted there now just ends up being a wrapper of chatgpt with a different system prompt :/

haughty mountain
flat sorrel
#

If you replace >= with >, it becomes O(n) which is closer to the real answer, I guess...

urban tundra
#

I want to ask that.. my solution is OK??
because I think that the way I used If statement is maybe wrong.. idk
let me know, pls

flat sorrel
#

this isn't usually where you have to care about alg. complexity

flat sorrel
#

having a higher complexity

urban tundra
#

Ok 👍

lean walrus
haughty mountain
haughty mountain
#

the prompt is also pretty garbage

#

the lowest of efforts

agile sundial
#

😔

lean walrus
#

😔

outer rain
lean walrus
#

i asked it to write philosophy essay

muted helm
#

This might be really dumb - but why is in rare casing python printing with double quotes instead of single

#

or well it is not printing, but I can see in my list that it is double cased

#

[ 'xxxx,
"yyyy",
'xxxx']

hallow slate
#

hey guys!

I have code in Python and I want it to run at scheduled times in the cloud without the need for my computer to be turned on.

I know I can upload the code to Google Cloud, AWS...

I wanted to know if you have any tips on which one is better or if there is another way that might be better (the code isn't very big, it's simple).

I would need to monitor it to know if it is running correctly.

haughty mountain
#

!e

print(repr('with singles'))
print(repr('that\'s a double!'))
print(repr('''"oh god, what's this, I guess use singles"'''))
halcyon plankBOT
#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 'with singles'
002 | "that's a double!"
003 | '"oh god, what\'s this, I guess use singles"'
haughty mountain
#

it feels slightly concerning I knew the exact output of these before running them 😅

lucid harbor
#

how do i speed up this loop, it needs to be very fast, so i can run it like 120 times a second:

`def RSI_strategy_numba(data: pd.DataFrame, rsi_values, indicators) -> tuple[list[pd.DatetimeIndex], list[pd.DatetimeIndex]]:
buy_dates, sell_dates, state = [], [], 0
for idx, rsi in zip(data.index.values, rsi_values):
# If were in the buy state, check for a buy
if rsi > indicators[0] and state == 0:
buy_dates.append(idx); state = 1
# Otherwise check for a sell
elif rsi < indicators[1] and state == 1:
sell_dates.append(idx); state = 0

return buy_dates, sell_dates`
flat sorrel
#

avoid using Python loops. try to make use of boolean indexing to vectorize conditional assignment

lucid harbor
#

@flat sorrel I've tried that, but i cant figure out how.
Every answer does not seem to account for the fact that i can only sell after i buy and vice versa

flat sorrel
#

maybe you can get the potential buy and sell dates (without regard for state) in the first pass, which can be easily vectorized

#

you can then deal with the state separately, simplifying the problem

regal spoke
vocal gorge
#

why is this function called "numba"? Does it get @njit-compiled and you just omitted the decorator doing it?

#

(generally speaking, for a task like this I'd expect a numba function to be a good idea)

stoic sand
#

is there any gud course for dsa in python.

lucid harbor
fervent light
#

uh

#

someone pinged me

flat sorrel
astral patio
#

Can I use perfplot for several algorithms that I am benchmarking:

def aho_corasick(string_to_search: str, file_contents: Set[str]) -> bool:
"""
Aho-Corasick Algorithm

:param string_to_search: string to search
:param file_contents: set of words to search for
:return: True if any word from the dictionary is found in the text, False otherwise
"""

That is a sample of the algorithmic function I will be testing on alongside other algorithmic functions.

I want to benchmark based on input sizes of 10,000 to 1,000,000 rows
also based on a REREAD_ON_QUERY setting whether True or False
and lastly on number of string_to_search(queries) per second. How do I go about this?

modern gulch
modern gulch
astral patio
#

Yes, I am finding problems implementing it to my use case

astral patio
modern gulch
modern gulch
#

CSV is convenient because you can just write line by line.

astral patio
#

I was thinking of how to automate it when I can just be simplifying it and using other external factors

astral patio
# modern gulch CSV is convenient because you can just write line by line.

These are the criterias:

Unit tests for:

Showing different execution times for different file sizes from 10,000 to 1,000,000 with a client you write for testing purposes and cover these in your speed testing report,

Showing different execution times for file sizes vs. number of queries per second, up to the point that the server can not handle it anymore (document the limitations of the software),

The thing is I do not know how to wrap my head around it and I would really appreciate if I can get a step by step approach to get to chart it across multiple other algorithms

modern gulch
#

Can you test a filesize of 10,000 right now?

astral patio
#

Yeah, I should be able to. Although my PC be acting funny lately but I still could

#

That is why I am using CDE (Cloud Development Env)

modern gulch
#

So, figure out how to:

#
  1. Run a single test case and capture the execution time. Write this execution time to a CSV file.
#
  1. Then, write a loop to run that test case for multiple file sizes
#
  1. Plot the results from the CSV
#

I don't quite understand the file sizes / queries per second, but you'll have to repeat the process: figure out how to test for a certain file size / query per second.

#

But I'd deal with the first one first.

astral patio
#

Okay, this should do

#

What do you use to plot the results from the CSV, I am not quite familiar with CSV manipulations

#

I only know it can be used in Excel and programming languages

#

Also I was wondering if matplotlib will work out but that is like setting myself up for more complexity

modern gulch
astral patio
modern gulch
#

But, afterwards, you can read the file using: ```py
import pandas as pd
df = pd.read_csv('yourfile.csv')

astral patio
modern gulch
astral patio
#

rate limiter? that is new, I will look that up. That QPS was the genesis of my problem

modern gulch
#

!pypi pyrate_limiter

halcyon plankBOT
modern gulch
#

I'm just not sure it's the right answer for load testing.

astral patio
#

Okay, I am going to start from there. Thanks

#

I also wanted to ask, b ased on testing, do I have to use the server with the tests, i.e generation of the 10_000 rows file sizes

outer bane
#

teoo

astral patio
# halcyon plank

And @modern gulch this should be at the test_client.py side, yeah? Not the server side

modern gulch
#

yes

astral patio
#

Okay thanks

modern gulch
astral patio
#

even if the server is reading on localhost?

modern gulch
#

I dunno, start simple I guess

#

But load testing a server by running code on the server is probably a bad idea

outer bane
#

Idk

#

Anyways

astral patio
modern gulch
#

Doesn't that mean that you're running the load test from the server?

astral patio
#

Yes, I would be running the load test from the server

#

sorry for the delayed response

#

want to fill up my daily log

#

You said it this way, "But load testing a server by running code on the server is probably a bad idea", and it made me wonder if I am to run a new code on the server to load test or do I have to create a new file for load test

cosmic swallow
#

can someone tell me what is the time complexity of insert, findmin, heapify, and remove operation on a minheap?

swift arch
#

not sure if it it just me but I barely seen tech companies that use Python implement DS principles like stack, heap, queue, etc how come?

tardy monolith
#

is this the right place to ask for help understanding the complexity of an algorithm?

tardy monolith
#

i basically need to figure out the worst case complexity and write it in big O notation

#

if anyone thinks they can help, ping me or dm me and i can post the code and we can talk through it. it isnt that long its a brute force longest common substring algorithm that takes two string inputs

#

i just dont fully understand how to work out the complexity, i have a bit of understanding on the subject but its not fully there and would like some hints or pointers

haughty mountain
tardy monolith
#

so thats my little algorithm above for the longest substring brute force method

#

im trying to work out the worst case complexity

#

so far im assuming that a for loop is generally linear but with the slice operation in it which is also linear it becomes quadratic

#

and i do that twice for getting all the substrings of the left and right string

#

and my double for loop is also possibly quadratic.

#

but im not sure what the overall complexity is or if i understand it properlly

haughty mountain
#

first two loops are quadratic overall, the third one runs the risk of being cubic

tardy monolith
#

so if i have two quadratic loops and a cupic how would that be written in big O notation

#

is it only the biggest time sink we care about or do they add together

haughty mountain
#

you know how these asymptotic notations work?

tardy monolith
#

my course im learnign this stuff on is very crap at explaining complexity even though thats the whole point of it lol. i have literally had to self teach to try and fill in the gaps

#

so no that is a complete new word to me

haughty mountain
#

in rough terms, f(x) is in O(g(x)) if f(x)/g(x) approaches a constant (or zero) when x gets large

#

O is basically ≤ but for asymptotics

#

so e.g. is x^2 in O(x^3)?

x^2/x^3 = 1/x

let x get large, 1/x tends to zero, which is ≤ some constant

#

similarly, is x^2 + x^3 in O(x^3)?

tardy monolith
#

btw i am still here, i'm trying to absorb the info

haughty mountain
#

(x^2 + x^3)/x^3 =
x^2/x^3 + x^3/x^3 = 1/x + 1

which is ≤ some constant, so yes

#

the effect of the definition is that you can typically ignore lower order terms because it just doesn't matter, e.g. x^2 is much smaller than x^3 as x grows large

tardy monolith
#

right

#

so what made you think that the nested for loop could possible by cubic earlier, because in my head it was quadratic

haughty mountain
#

let's say your first loops were something quadratic, and third something cubic, then you have something like
a x^2 + b x^2 + c x^3

#

the first two terms would be negligible compared to the cubic term as x grows large

#

overall it's O(x^3)

#

(and you can go through the definition as well, computing the division and see what happens as x gets large)

haughty mountain
tardy monolith
#

yes

haughty mountain
#

and you have a string comparision which is also linear

tardy monolith
#

ah got you

haughty mountain
#

but I don't know if you can actually hit the worst case where every comparison is really expensive

#

which is why I went with it risking being cubic

#

it's possible there is some constraint on the strings you produce that makes is quadratic overall

tardy monolith
#

so assuming this is cubic, when i run my timing checks on this, unless i am being really stupid. and possibly using the timing code wrong, the numbers i was getting didnt seem to be cubic or quadratic

#

let me give you an example

#

im not sure if the runs and loops effect the outcome, but the times didnt really seem like they matched the complexity. unless im just more shockingly bad at math than i thought

#

lol which is quite possible

haughty mountain
#

ok, saying quadratic is really us being sloppy

#

it's more O(n m)

#

where n and m are the sizes of the strings

#

err, let me re-read the code

#

let's say the lengths of the strings are n and m
first loops is like n^2
second loop is like m^2
third is n*m*something

#

fwiw, the third loop will depend a lot on the data

#

for most strings you would find a mismatch early and you don't have to do the full comparison

#

for bad cases you might need to do a lot of work comparing strings

tardy monolith
#

ah that actually makes sense because i am talking about the complexity in terms of the left and right inputs

#

yeah this code does also have some prechecks to avoid the loops altogether it was just a bit long to post here

haughty mountain
#

I think something like

left = "A"*n
right = "A"*n
```would cause cubic looking behavior
tardy monolith
#

for the best case its (1)

#

because i do a little check to see if any of the stings are empty or if they are identicle and just return it which is a constant operation

#

but it was mainly this worst case part that was causing me the grief

haughty mountain
tardy monolith
#

ah with my finding the shortest of the two strings?

#

oh wait sorry i misread you

haughty mountain
#

no, the if in the nested loop

tardy monolith
#

i get what you mean

haughty mountain
#

I suspect overall it probably ends up being proportional to n^2 + m^2 + n*m

#

you had one string being very small and the other being large, so the large^2 dominates

tardy monolith
#

interesting, let me just check something with my timeit

#

2 secs

#

there we go

#

here check this

#

if the strings are equal sizes and i double the strings i get what looks more of what i was expecting

#

with the original O(n^3)

haughty mountain
#

that looks quadratic

#

divide the adjacent times there

#

!e looks close to 4

print(123/37.2)
print(448/123)
halcyon plankBOT
#

@haughty mountain :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 3.3064516129032255
002 | 3.6422764227642275
haughty mountain
#

well, close ish

tardy monolith
#

yeah close is perfectly fine

haughty mountain
#

(2*n)^2 = 4*n^2

tardy monolith
#

so the original assumption of O(n^2) was more on the mark then

#

Depending on the size of the left and right strings that can change. I will say this stuff is fascinating lol

haughty mountain
tardy monolith
#

thats ok im feeling better about the topic currently than i did when i started the chat

#

you have been super helpful

haughty mountain
#

I would recomment getting used to the definition of O and related notations, the basics of it is just looking at f(x)/g(x) and see what happens as x grows large

tardy monolith
#

i dont suppose you have a link to some reading or a youtube vid or something on this topic that you have found usefull do you?

haughty mountain
#

nothing specific

tardy monolith
#

because i have to do all this again with another algorithm that does the same thing with recursion and dynamic programming which im not looking forward to deciphering lol

haughty mountain
#

there are basically 3 ones you might see, and 2 more you are unlikely to see
O, Ω and Θ are fairly common
o and ω are rarer

#

it might help to know what they kinda correspond to in the math you're used to

#

I mentioned O is like ≤

tardy monolith
#

yeah my course covered briefly the top 3

haughty mountain
#

Ω is like ≥

#

Θ is like =

tardy monolith
#

and as it goes along it briefly talks about complexity of certain thigns, but never about an algorithm as a whole and what the complexity comes to

haughty mountain
#

o and ω are like < and > respectively

tardy monolith
#

and caches the results, so i have to alo figure out the complexity here

#

so i have recursion and splitting going on again

#

although i was dead proud of getting this one working lol

#

its a top down aproach

#

my god my spellings awful today lol

haughty mountain
#

recursion is always fun to analyze

tardy monolith
#

by fun you mean depressing lol

#

from my looking around and working thins out if a function calls itself its linear, and if it calls itself twice withing itself its quadratic, then i also have to factor in what the recursion is doing

#

so in my case i am recursively calling the function twice within itself while performing a split on each

tardy monolith
#

but then my algorithm is also caching the results

#

yeah i didnt think it would be

haughty mountain
#

the caching makes it more annoying, yes

haughty mountain
# haughty mountain it's not that simple

In the analysis of algorithms, the master theorem for divide-and-conquer recurrences provides an asymptotic analysis for many recurrence relations that occur in the analysis of divide-and-conquer algorithms. The approach was first presented by Jon Bentley, Dorothea Blostein (née Haken), and James B. Saxe in 1980, where it was described as a "uni...

tardy monolith
#

Nice i'll take a look.

#

thanks for the help, i dont want to sap away anymore of your time, but just know you have been a godsend

#

lol this link is painfully looking like my engineering courses calculus module lol

#

sigh

#

thanks again

crimson epoch
#

i know basics of python
how to go about the DSA part
any resources or course recommendation ?'

jolly mortar
#

check pins

patent junco
#

i'm trying to write manually parser for .gpl files and i'm stuck.
i have this code:

def listfile(input_file):
    global name
    global colors
    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break; 
    print(colors)

someone knows why i get name errors ?

patent junco
#

Python 3.11.2 BTW

flat sorrel
patent junco
#

and without was getting yet more errors…

flat sorrel
# patent junco it's inside

you should pass variables in via parameters instead of globals, otherwise the function will modify the value of the global which might mess up other code that's using it

#

speaking of globals, it seems that you have code that is outside of your function. can you show that as well?

patent junco
#

moment

#

full file although will be later used in a bigger application

#

(and sorry for language)

flat sorrel
#

it seems that the problem is that colors is not initialized

patent junco
#

but where i should?

#

i want to make it function-only

flat sorrel
#

probably at the start of the function

#

global colors doesn't actually initialize the value by itself

#

you should set colors = <some value>

patent junco
flat sorrel
#

no, it doesn't

patent junco
flat sorrel
#

set it to an empty list then

#

and then append to it inside the loop

patent junco
#

a dict, not list

flat sorrel
#

ok, then an empty dict

patent junco
#

{"name":["1","2","3"]}
(after a file with one entry)

#

or a more practical example after running code (yep, .gpl is for graphics, at the end gonna make a gimp alternative):

output:
"colors", {"red":["255","0","0"],"green":["0","255","0"],"blue":["0","0","255"]}
patent junco
#

https://www.w3schools.com/python/python_variables.asp

Python has no command for declaring a variable.
A variable is created the moment you first assign a value to it.
i feel like it should make variable on 13 line and that python shouldn't ever tell that variable isn't defined - should create it…

flat sorrel
#

it's not like in statically typed languages where the type of the variable is known in advance

#

so in Python, you need to actually set the initial value of the variable upon declaring it

patent junco
flat sorrel
#

how would it know that it should be a dict?

patent junco
flat sorrel
#

you didn't initialize the colors variable

#

you should assign it an empty dictionary (explicitly) at the start

narrow mica
#

and also that's not valid python syntax (as also shown by the error)

flat sorrel
#

like colors = {}

patent junco
#

i want it IN ONE LINE to either initialize variable with proper type if doesn't exist or append if does

flat sorrel
#

but your variable needs to exist for the whole function

patent junco
#

and if doesnt - to be created in same place

flat sorrel
#

but if the current_file is empty, you still return colors

#

so you can't only initialize it in the for loop

#

it must be outside

patent junco
#

file error handling will do later

#

now trying to make it work on a full file

#

(and doesnt)

flat sorrel
#

it would be a lot simpler if you just initialize the variables at the start of the function

narrow mica
# patent junco i want it IN ONE LINE to either initialize variable with proper type if doesn't ...

I don't think there's a good way to do that
if for some reason you must do it then

>>> def my_fn():
...     globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
...
>>> test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'test' is not defined
>>> my_fn()
>>> test
{'a': 'hello!'}
>>>
```which is in general pretty horrible code, both readability wise and performance wise (you're creating a new dictionary every time instead of updating an existing one)
patent junco
#

i'd rather used this:

colors: dict[str, str] = {single[3][1:-1] : [single[0],single[1],single[2]]} if not colors else colors: dict[str, str] += {single[3][1:-1] : [single[0],single[1],single[2]]}

but why running similar code twice?

narrow mica
flat sorrel
#

Why not this?

def listfile(input_file):
    # Initialize colors here
    colors = {}

    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break; 
#

just set colors once at the beginning

flat sorrel
#

how so?

patent junco
#

even assembly does better :/

flat sorrel
#

if you want to be fast then you shouldn't use Python

narrow mica
#

again it's just bad code though, it's way better to

>>> test = {}
>>> def my_fn():
...     global test
...     test |= {'a': 'hello!'}
```even better if you don't use `global` at all
```py
>>> def my_fn(test):
...     test |= {'a': 'hello!'}
...
>>> d = {'b': 'world!'}
>>> my_fn(d)
>>> d
{'b': 'world!', 'a': 'hello!'}
patent junco
#

okay. maybe you didnt understood.
i want something that merges "create dictionary" and "append to dictionary" at once

#

sort of "if exists: append; else: create with initial value"

#

but at once

narrow mica
narrow mica
narrow mica
patent junco
#

literally - trying to make a pseudo-csv parser

flat sorrel
#

this makes colors a variable outside the loop

narrow mica
# patent junco cause i want it to initialize in place? it's very nested and in LOOP…

I don't think I get what you mean exactly
I don't see why it'd hurt so bad to change one line

colors = {}  # <-- the only thing you have to add
def listfile(input_file):
    global name
    global colors
    current_file=open(input_file,encoding='utf-8')
    for line in current_file:
        if line.startswith("Name:"):
            name = line.split(" ")[1]
        elif line.startswith("#"):
            for line in current_file:
                single = line.strip().split(" ") #here 
                colors[single[3][1:-1]] = [single[0],single[1],single[2]] #or here errors
        if not line:
            break; 
    print(colors)
patent junco
narrow mica
patent junco
#

and i don't want importing it to create empty variable?

patent junco
flat sorrel
#

then don't make it a global. initialize colors inside the function so it is local to the function

patent junco
#

it can appear but only when processing file…

flat sorrel
#

my code does exactly that. the colors variable only exists inside the scope of listfile, i.e. when the function is called

patent junco
#

exists before

flat sorrel
patent junco
#

but isn't global

narrow mica
patent junco
narrow mica
patent junco
#

python isn't pythonish

narrow mica
#

!e nothing in python is stopping you from "re-defining" variables (even though that notion is kinda weird)

my_var = 10
my_var = 'hello'
my_var = [123]
print(my_var)
halcyon plankBOT
#

@narrow mica :white_check_mark: Your 3.12 eval job has completed with return code 0.

[123]
patent junco
#

may i test something here?

flat sorrel
#

you can use !e to execute python snippets here

patent junco
#

i know but idk rules

narrow mica
#

you can run snippets with !e, but if it involves multiple files it'd probably be easier to run locally or smthn

patent junco
#

no, two/three liner ( i prefer short code)

narrow mica
#

sure?

patent junco
#

!e

var : str = [11]
print(var)
halcyon plankBOT
#

@patent junco :white_check_mark: Your 3.12 eval job has completed with return code 0.

[11]
narrow mica
#

(if you haven't noticed, python is not statically typed and you can kinda "do whatever" with variables)

patent junco
#

tried to make it rusty xD

narrow mica
#

I think more and more languages are taking up that syntax tbh, not just rust

patent junco
#

i know, JS/ES too

narrow mica
patent junco
#

i know

regal spoke
#

Also, I'm pretty sure line cannot be empty. The smallest string it can be is a newline character. (also the ; shouldnt be there)

patent junco
narrow mica
#

I copied the code hacknorris wrote and added that 1 line, didn't look too thoroughly into it

regal spoke
#

I'm not even sure what that double for loop does

#

Does the inner loop continue off after the line of the first loop?

#

Could you post an example of what you are trying to parse?

#

is it something like this?

Name: MyFile
#
"Red" 255 0 1
"Blue" 0 0 255
patent junco
#

gimp palettes

patent junco
flat sorrel
#

using for loops on the same iterator multiple times is real dodgy imo. this is better served by a while loop

regal spoke
patent junco
flat sorrel
#

this has nothing to do with the dictionary

#

it's about iterating through the file

patent junco
#

but iterating works

flat sorrel
#

does creating the dictionary at the start not solve the problem?

patent junco
#

why ?

flat sorrel
#

what does your code look like now? does it work?

patent junco
#

dictionary handling broke it

flat sorrel
patent junco
#

cause iterating works but dictionary handling breaks code

flat sorrel
#

how do you know that?

patent junco
#

cause it'd show an error

flat sorrel
#

ok, but what if you comment out that line? it shouldn't affect the iteration

patent junco
#

then it wouldn't work

#

it's core function

flat sorrel
#

so if you comment out that line and your iteration logic is correct, the function should not throw an error

#

it'll just return nothing

patent junco
#

and it doesn't

#

dictionary is problem

flat sorrel
#

and if it throws an error, that shows that the iteration logic is wrong

patent junco
#

no, error is about dictionary (ain't blind)

flat sorrel
#

regarding the dictionary, just set the variable to a dictionary at the start of the function

#

that should avoid the NameError since the variable is always defined inside the function

patent junco
#

show me a shortcut for this:

colors += {single[3][1:-1] : [single[0],single[1],single[2]]} if colors else colors = {single[3][1:-1] : [single[0],single[1],single[2]]}
regal spoke
#

According to https://developer.gimp.org/core/standards/gpl/ you should be able to parse .gpl like this:

colors = {}
for line in current_file:
  line = line.rstrip() # Remove trailing whitespace
  if line.startswith('GIMP Palette'): # Start of header
    pass
  elif line.startswith('Name: '):
    name = line.split(maxsplit=1)[1]
  elif line.startswith('Columns: '):
    columns = line.split(maxsplit=1)[1]
  elif line.startswith('#'): # comment line, should be ignored
    pass
  elif not line: # empty line, should be ignored
    pass
  else:
    r,g,b,color_name = line.split(maxsplit=3)
    colors[color_name] = [r,g,b]

Doing it this way solves the issue of having double for loops. It also is able to correctly parse .gpl files with tons of extra whitespace like this:

GIMP Palette
Name: bugslife_final.png-10
Columns: 16
#
191 180 180   Index 0
163 158 157   Index 1
145 136 132   Index 2
130 125 112   Index 3
… … …
56  50  49   Index 29
41  38  38   Index 30
23  23  23   Index 31
242 245 213   Index 32
227 232 181   Index 33
210 217 147   Index 34
195 204 118   Index 35
… … …
  0   0   0   Index 251
  0   0   0   Index 252
  0   0   0   Index 253
  0   0   0   Index 254
  0   0   0   Index 255
#

You code from earlier wouldnt be able to handle 0 0 0 Index 255 correctly. But doing line.split(maxsplit=3) works really well

#

!e

line = "  0   0   0   Index 255"
r,g,b,color_name = line.split(maxsplit=3)
print(r)
print(g)
print(b)
print(color_name)
halcyon plankBOT
#

@regal spoke :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | 0
002 | 0
003 | 0
004 | Index 255
flat sorrel
#

!e

def listfile(input_file):
    name = None
    colors = {}
    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, cname = line.split(' ')
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '255 0 1 Red',
        '0 0 255 Blue',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255')})
flat sorrel
#

there we go

#

notice how I used a context manager (with statement) when opening the file to ensure that it gets closed. also the walrus operator (:=) lets me assign the new value to the variable in the same statement as the while condition

regal spoke
#

Btw I dont think storing the color in a dictionary like this is a good idea. The format allows colors without a name too.

#

"The color name is optional, i.e. that a color description line may only contain r g b"

#

So using a dictionary keyed by the color name is not a good approach

patent junco
flat sorrel
#

!e

def listfile(input_file):
    name = None
    colors = {}
    num_unknown_colors = 0

    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(' ')
                if len(rest) == 0:
                    num_unknown_colors += 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
 
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255 0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 0',
        '0 255 0 Green',
        '4 0 0',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})
flat sorrel
#

this version also takes care of blank lines and unnamed colors

#

(assuming that the file itself does not contain colors that are named UNKNOWN_{idx})

regal spoke
#

split behaves very differently if you specify a delimiter or not

flat sorrel
#

that is true

#

!e

def listfile(input_file):
    name = None
    colors = {}
    num_unknown_colors = 0

    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored

            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(maxsplit=3)
                if len(rest) == 0:
                    num_unknown_colors += 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
 
                colors[cname] = r, g, b
    
    return name, colors

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255\t0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 \t0',
        '0 255 0 Green',
        '4 0 0\t',
    ])
    with open('test', 'w') as f:
        f.write(content)

    print(listfile('test'))

test()
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})
flat sorrel
#

ok this should take care of tabs as well xD

regal spoke
#

Ye. Also whitespace in the cname also works now

flat sorrel
#

we can be a bit stricter and require that the header is actually in the first line, then optionally followed by name and columns

regal spoke
#

but ye

flat sorrel
#

Name: is only required in version 2

regal spoke
#

oh ye

flat sorrel
halcyon plankBOT
#

src/PIL/GimpPaletteFile.py line 41

if re.match(rb"\w+:|#", s):```
patent junco
regal spoke
flat sorrel
#

I get that, but is it really necessary to use a regex?

regal spoke
patent junco
#

probably would but i'd like to also avoid empty initialization

regal spoke
#

Wut

patent junco
#

that's what i was talking about

flat sorrel
#

but the file is almost guaranteed to contain at least one color. so an initialization would be made in basically 100% of cases

patent junco
#

i want to avoid x = {}

flat sorrel
#

why do you want to avoid this so badly?

regal spoke
patent junco
#

i don't want this at start:

    name = None
    colors = {}
    num_unknown_colors = 0

narrow mica
#

originally the problem was they wanted a way to (in 1 line) initialize colors if colors isn't defined, or update colors if it is defined

patent junco
#

cause i feel like there is oneliner which can do it same way as you can open a file in append mode (it doesn't give a f if file is empty or not)

flat sorrel
#

I mean, you can, but it's not pretty

patent junco
#

who cares?

narrow mica
#

which lead to this mess

globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
```idk how that worked out
flat sorrel
#

people who read your code would care

patent junco
flat sorrel
#

!e

globals()['test'] = globals().get('test', {}) | {'a': 'hello!'}
print(test)
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

{'a': 'hello!'}
flat sorrel
#

it does work

regal spoke
patent junco
#

i meant to make nonexistent dictionary to work like that

regal spoke
#

I cannot make sense of why you would ever want to avoid colors = {}

patent junco
#

like open(x,"a") to make dict working. regardless of existence or content

#

so what? i can't open an empty file in append mode too?

regal spoke
#

I'm extremely confused here

patent junco
#

anyway - right now getting more serious problems. let's end it

flat sorrel
#

!e Well, this is extremely cursed but does what @patent junco wants I guess...

def get_var(ctx, name, default):
    return ctx.get(name, default)

def listfile(input_file):
    with open(input_file, 'r') as f:
        while line := f.readline():
            line = line.rstrip()
            if not line:
                continue    # Empty line which should be ignored
    
            if line.startswith('GIMP Palette'):
                pass
            elif line.startswith('Name:'):
                name = line.split(maxsplit=1)[1]
            elif line.startswith('Columns: '):
                columns = line.split(maxsplit=1)[1]
            elif line.startswith('#'):
                pass
            else:
                r, g, b, *rest = line.split(maxsplit=3)
                if len(rest) == 0:
                    num_unknown_colors = get_var(locals(), 'num_unknown_colors', 0) + 1
                    cname = f'UNKNOWN_{num_unknown_colors}'
                else:
                    cname, = rest
                
                colors = get_var(locals(), 'colors', {})
                colors[cname] = r, g, b
    
    return get_var(locals(), 'name', None), get_var(locals(), 'colors', {})

def test():
    content = '\n'.join([
        'Name: MyFile',
        '#',
        '1 0 0',
        '255\t0 1 Red',
        '0 0 255 Blue',
        '',
        '2 0 0',
        '3 0 \t0',
        '0 255 0 Green',
        '4 0 0\t',
    ])
    with open('test', 'w') as f:
        f.write(content)
    
    print(listfile('test'))

test()
halcyon plankBOT
#

@flat sorrel :white_check_mark: Your 3.12 eval job has completed with return code 0.

('MyFile', {'UNKNOWN_1': ('1', '0', '0'), 'Red': ('255', '0', '1'), 'Blue': ('0', '0', '255'), 'UNKNOWN_2': ('2', '0', '0'), 'UNKNOWN_3': ('3', '0', '0'), 'Green': ('0', '255', '0'), 'UNKNOWN_4': ('4', '0', '0')})
patent junco
#

i thought more of combining = and +=
:/

flat sorrel
#

if you want to delve into more cursed code, you are welcome to join the madness in #esoteric-python

regal spoke
#

So if the parsing fails, then it is as if no dictionary ever existed in the first place

#

On the other hand, if you had initialized colors = {} as a global variable, then it is a different story. But that is not what DarkLight's code is doing

south umbra
#

i have JSON Data coming from 5 different places i believe that the data will get pilled up also it would be very big and messy, actually i made a RAG to pass the data for LLM to have knowledge base so it can provide a personalized answer,
Anyways the thing is i am getting data from 5 different channels of marketing then passing it into 1 json for a workspace to the RAG
so my fear is that the data would get pilled up/can break

so what should be the optimal approach?
what do u guys think?

flat sorrel
#

won't the RAG process take longer than combining 5 jsons together?

#

I guess you can set a maximum lifespan for each data group to combine so that it won't hog up resources in case one of the 5 processes fails

pale prism
#

(cross-asking from #data-science-and-ml - I asked them in case of numpy answers, but it's more of algo thing)

My partner has an interesting problem involving Jaccard index...
Comparing >300k unique subsets A (2^A), where |A| > 300 - and for each such set finding sets with which it has minimum Jaccard index...

We were thinking about representing the sets as 300 bits- that gives us fast calculation of the index itself (because bitwise operations), so only the number of calculations makes it costly -
bruteforce of everything-to-everything is (300k)² operations.

Does anyone have any ideas how to get it lower? We were thinking about clustering it somehow but |2^A| is so big it's hard to think of something that makes sense (there's a lot of pairs that don't intersect at all).

haughty mountain
#

jaccard index here would be n_matching_bits/|A|?

#

which I guess just boils down to looking at n_matching_bits

#

i.e. hamming distance

boreal schooner
#

What's the most performatic way to solve a shopping cart queue?

I participated in an Amazon coding problem yesterday, and one of the questions was a really simple one. To paraphrase: "Develop a method to handle Amazon shopping carts that takes two inputs, items being the current shopping cart which is an array of product IDs, and queue being the operation IDs to apply to the shopping cart. If the ID is positive, append the product ID to the end of the cart, if the ID is negative, remove the first instance of the module of the ID from the shopping cart. Return the processed shopping cart." I implemented a simple solution, and it passed all the simple test cases, however it failed all the test cases with large inputs. I ended up not having time to come up with a performatic solution, my best being O(n^2) I think, which I will attach below. How would you have done it? I thought about using either a collections.deque or collections.defaultdict.

def problem_solution(items: list[int], query: list[int]) -> list[int]:
    for item in query:
        if item > 0:
            items.append(item)
            
        else:
            items.remove(abs(item))
            
    return items
pale prism
# haughty mountain jaccard index here would be `n_matching_bits/|A|`?

No Jaccard would be done normally.
J(x, y) = count_bits(x&y)/count_bits(x|y) (where count_bits counts 1s in the binary repr)

The /|A| makes no sense as it would only do size of intersection, not actual similarity of sets.

Also, *maximum, because obviously minimum is 0, lol. We want the most similar sets.

It's "just":
n > 300k
a_1,...,a_n \in 2^A

For each a_x find a_y (x!=y) such that J(a_x, a_y) >= J(a_x, a_i) for all i<n, i!=x

(Theoretically there can be multiple a_y for given a_x - and in such case we want all of them, but idk how it write in mathsy way)

So with bruteforce, you'd calculate everything-to-everything and then find max value for each a_x and indexes where that max is present. Or something.
(Or of course calculate along the way keeping only current max. But that's still 300k squared operations)

haughty mountain
#

oh right, intersection over union

haughty mountain
#

O(minutes)

vocal gorge
haughty mountain
#

(assuming these operations can be expressed in a way that the GPU likes)

vocal gorge
boreal schooner
unborn sundial
# boreal schooner You mean something like this? ```py def sol2(items: list[int], query: list[int]...
import unittest
from collections import defaultdict 


def problem_solution(items: list[int], queries: list[int]) -> list[int]:
    elements_to_add: list[int] = []
    elements_to_remove: dict[int,int] = defaultdict(lambda: 0)
    for query in queries:
        if query > 0:
            items.append(query)
        else:  
            elements_to_remove[abs(query)] += 1

    result = []    
    for item in items:
        if elements_to_remove[item] > 0:
            elements_to_remove[item] -= 1
            continue

        result.append(item)

    return result + elements_to_add

class TestStringMethods(unittest.TestCase):

    def test_1(self):
        self.assertEqual(problem_solution([1,4,5], queries=[3]), [1,4,5,3])

    def test_2(self):
        self.assertEqual(problem_solution([1,4,5], queries=[-4]), [1,5])

if __name__ == '__main__':
    unittest.main()

or smth like this

pale prism
# haughty mountain O(minutes)

Uh... What?
We checked and it's ~1000s to get over all elements once (so ~3.33ms per element). That gives 1000s *300k for bruteforce. That's 1000×300000÷60÷60÷24 days. That's still 3k days. 💀

Even if it was 1ms per element it would still be 1k days. 💀

Also, my partner says the bit representation actually doesn't make much sense because for a lot of those sets the bit count is like 10.
And they counted the A and it's even bigger, so yep, bit representation is completely useless. 😭 My experience with competitive programming didn't help - the bit representation was my first thought because it's common in such tasks in competitive programming

vocal gorge
# boreal schooner You mean something like this? ```py def sol2(items: list[int], query: list[int]...

Your implementation loses most items, I think, but very broadly yes. I meant like this:

def sol2_cnt(items: list[int], query: list[int]) -> list[int]:
    removals = defaultdict(int)
    for item in query:
        if item < 0:
            removals[abs(item)] += 1
    res = []
    for el in itertools.chain(items, query):
        if el < 0:
            continue
        if removals[el] > 0:
            removals[el] -= 1
        else:
            res.append(el)
    return res

which seems to match the output of the naive solution

haughty mountain
#

going over 300k items should for sure not take minutes pithink

#

or were you doing this in plain python or something?

#

even then that seems quite slow

#

3.33ms is a lot of time in computer world

haughty mountain
pale prism
# haughty mountain isn't 1 element just |A| bits or did I misunderstand you?

... My partner used shorthands when talking... apparently their "going over" meant just grabbing the sets for each thing*, not "going over with already grabbed sets". So my calculations might be way off 🤦 talking to my partner is sometimes hard when it comes to coding, lol. I should've realised it when they said about caching - but then said something about 1s... but when I asked "so going over all elements is 1s" they said that no, it takes 1000s. 💀

*each element being a set was already simplification, we have mapping from original element to set of traits it has

Idk whether i should push for the bit representation. Apparently most things have sets of size ~10, and we now know that |A| is at least 2k, so the theoretical bit representation gets longer and longer 💀

pale prism
# haughty mountain or were you doing this in plain python or something?

I think that 1k s was database? Idk anymore, seriously, talking to my partner is sometimes hard if you don't confirm several times that you're on the same page (see the "going over all elements" thing 💀).

My partner would prefer to do it in their codebase directly but I'm supposed to explore whether python could do it better.

haughty mountain
#

even if it turns out to be hard to do better algorithmically, this is a task that's massively parallelizable

#

e.g. you can model this as K² problems of size (N/K)²

#

and then reduce to get the result

remote slate
#

I have a network graph (minimum spanning tree) that I split into k groups (by removing the largest k-1 edges). this is expressed as a list of (X,Y) pairs. where X and Y denote nodes, and the existence of the pair denotes an edge between them.

I want to group nodes and assign labels to the groups. I've been staring at this problem for a couple hours now and am at a loss as to how to start.

#

I'm sure this is a solved problem. I seem to be blind as i can't find it already done.

haughty mountain
#

just remove the edges and do some dfs traversal to find the graph components? pithink

remote slate
#

i think i have just brute-forced it. need to test.

haughty mountain
#

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components ar...

#

fairly basic words

#

Start somewhere you haven't yet visited, visit everything you can and mark that as visited. That is a new group/component. Repeat

remote slate
#

I think i have it. pretty cursed code.

haughty mountain
# remote slate

a more typical representation of a graph is an adjacency list, where you have a list of neighbors for every node

#

so you get something like

graph = {'node': ['neigh1', 'neigh2'], ...}

components = []
seen = set()
for v in graph:
  if v in seen:
    continue
  # Traverse the graph.
  stack = [v]
  component = []
  while stack:
    cur = stack.pop()
    if cur in seen:
      continue
    seen.add(cur)
    component.append(cur)
    for neigh in graph[cur]:
      stack.append(neigh)
#

where most of the logic is just the graph traversal

#

breaking some part of the logic out of the loop:

graph = {'node': ['neigh1', 'neigh2'], ...}

components = []
seen = set()

def find_component(start):
  stack = [start]
  component = []
  while stack:
    cur = stack.pop()
    if cur in seen:
      continue
    seen.add(cur)
    component.append(cur)
    for neigh in graph[cur]:
      stack.append(neigh)
  return component

for v in graph:
  if v in seen:
    continue
  components.append(find_component(v))
grizzled copper
#

What's the most efficient undirected graph representation for this problem?
max 5000 vertices, max edges 10,000 or v(v-1)/2 where operations needed are

  1. Max number of vertices able to be reached with certain boolean value where weight on each edge < arbitrary amount
  2. Minimum weight required to reach a vertice with certain boolean value
  3. Minimum amount of weight (from edges) required to connect all vertices
#

For problem 1) I was thinking of keeping a list of all vertices with True tag and then checking if they are be able to be traversed to from current vertice

#

This would be a problem if the majority of vertices had the True tag though

#
  1. would be Minimum spanning tree
rigid trench
#

Besides, the algorithm to solve the problem isn't the question

#

You're asking about the storage

grizzled copper
rigid trench
#

The representation would be basically just one of 3 options. Adjacency Matrix, or Adjacency Lists being the most common

#

I think you want the algorithm, not the memory representation

#

Are you looking for one data structure? Or is it three different questions?

#

I guess I don't really understand what the question is trying to ask

rigid trench
grizzled copper
#

I don't really care how much memory it uses so long as it can enable the algorithms to be fast

grizzled copper
#

max weight is 10^9, guaranteed that all vertices have a path to all other vertices ignoring weight and no more than one edge connecting the same two vertices

haughty mountain
#

an adjacency list is usually a good default

grizzled copper
#

maybe an adjacency list with classes so i can contain the boolean values?

haughty mountain
#

feels way overkill

#

the boolean labels can just be a list (or dict, if your nodes aren't represented by integers 0, 1, 2, ...

#

the adjacency list can store the weights, or even that can be kept separate in a dict, whatever is convenient

grizzled copper
haughty mountain
#

kinda, you're missing some edges

grizzled copper
#

i meant it as vertice1 : [(vertice2, weight)]

haughty mountain
#

the indices look off though

#

compared to the values list

#

4 would be out of bounds

#

usually I let my vertices be represented by 0, 1, 2, ...

#

in which case you can have

graph = [
    [(1, 3), (2, 2)],
    [(0, 3)],
    [(0, 2)],
]
  
values = [True, False, True]
grizzled copper
#

Ohh alright thanks

regal spoke
#

Minimum weight required to reach a vertice with certain boolean value

#

Reach from where?

#

As for graph representations. My guess is that having the graph represented by a list of edges stored in sorted order (depending on the weight) would be a good idea.

haughty mountain
#

if you need to do MST-like stuff having a list of sorted edges helps, yeah

grizzled copper
regal spoke
#

Is the graph changing for you, or is it static?

grizzled copper
#

Static

regal spoke
#

So 3. is just MST?

grizzled copper
#

I think so yeah

#

Im just not sure on 1

regal spoke
#

In 1. are you given a vertex v and a value x, and then the answer is to compute the number of vertices rechable from v only using edges with weight <= x?

grizzled copper
#

Yes

#

This is my graph structure atm:

{1: [(2, 8), (3, 4)], 2: [(1, 8), (5, 6), (6, 12)], 3: [(1, 4), (4, 7), (5, 20), (7, 15)], 4: [(3, 7), (6, 13)], 5: [(2, 6), (3, 20), (6, 15)], 6: [(2, 12), (4, 13), (5, 15), (7, 10)], 7: [(3, 15), (6, 10)]}
{1: False, 2: False, 3: False, 4: False, 5: False, 6: False, 7: False}
#

So 2 dicts

regal spoke
#

In that case, there is an rather simply way to solve 1. "offline".

#

The trick is to build the graph, one edge at a time, starting from the edge with the smallest weight

#

Keep track of the graph's connected components using a DSU