#algos-and-data-structs
1 messages · Page 16 of 1
ok, so value(100%)-value(10%) is roughly value(10%) - value(1%)
so yeah, looks logarithmic
max being (0, 240) is kinda bizarre though
wrong way around I think
x = log(61440)
with some scaling factor
err, maybe I'm dumb
these things confuse me too
e**10 is 22k60 nowhere close to 33920
should be log, right?
right, this checks out
so log(percentage)
err
I need help for an exercice, Idk if it's the right place to ask it ??
probably better to use one of the help channels, under AVAILABLE HELP CHANNELS
prediction: 50% would be roughly 57297
lemme check quick
what would that do
get the byte representation of the float
(double, really)
57344 but close yea, what formula did you use
oh its a float?
wtf is that
ah so like õ¼º?
idk, log10(percentage) gives a straight line plotting, so I know it's basically right
the rest is just scaling things
maybe you mean bit representation idk
well its exactly 0.5 but yea its close
the values here are just based on the values of 1% and 100% you provided
that gives me 51902, how did you arrive at 52797?
In [52]: (log10(0.5) + 2) / 2 * (61440 - 33920) + 33920
Out[52]: 57297.827259663616
the 1% gets approximated (float precision issues)
is math.log different from log10?
math.log is the base e log
you can scale it and it will be correct
math.log(x)/math.log(e)
ok
let me work out if the formula ends up any nice
sure thaks
fwiw, they are interesting values
>>> bin(57344)
'0b1110000000000000'
>>> bin(61440)
'0b1111000000000000'
the 50% and 100% points
one bit flip away
hmm
(log2(x) + 1) * 2**12 + 57344
I guess that's nicer?
but still some magic values
why are you adding 57344
because I'm fitting my range to match your measured values
is it not possible to do this without adding magic values at all?
log(percentage) is proportional to the values
that's all I'm basing my things on
the multiplying and adding is to scale to fit the measured values
(0.5, 57344) and (1.0, 61440)
the 2**12 is 61440 - 57344
well 33920 is not for 0.01 but its for 0.010009765625 which is why the formulas are giving slight deviations
4096
that's why I wanted to use 0.5 and 1.0
which are exact in floating point
sadly things are still a bit off
!e
from math import log2
print((log2(0.010009765625) + 1) * 2**12 + 57344)
@haughty mountain :white_check_mark: Your 3.11 eval job has completed with return code 0.
34232.53301091567
i guess its probably better to leave it at that and add a note about it in the docstring
this formula is very accurate tho
and it is probably used at tons of more places (knobs)
except it's wrong for a bunch of values, no?
like the value close to 0.01
it's not far off, but still off
its off for the exact value as well
yes, that's what I mean
FL is coded in delphi so dunno what type of floats or encoding that uses
it should be log2'ing a value between 0.009494 and 0.009495
maybe there's some hard constant left to add to
right
In [66]: x = 33920
...: 2**((x - 57344) / 2**12 - 1)
Out[66]: 0.009494119999847414
so this is why I'm quite certain it's logarithmic
octave:35> data = [0.01 33920; 0.1 47512; 0.5 57344; 1.0 61440];
octave:36> plot(log10(data(:,1)), data(:, 2), 'o-')
very nice line
(log base is irrelevant, I used log10 because it was convenient at the time)
oh this is awesome
i always wanted a tool that could build a graph like that from data
there are loads of options for that
octave is basically open source matlab
you can use matplotlib from python
and there are heaps of other tools for generating some quick plots
so what's the formula for this then?
reverse formula
i.e get 0.5 from 57344
what language is this?
see left hand side of the prompt 😛
just solve this for x
i.e. this, though it could probably be simplified a tad
smart, thanks 😄
I can get it down to this, but it's not exactly easy to figure out where it comes from
perc = 2**(x / 2**12) / 2**15
or the reverse
x = 2**12 * log2(perc*2**15)
= 2**12 * (log2(perc) + 15)
these numbers come out very clean, which I guess is a good sign
I guess the third byte is still unexplained
though it might just be a flag
do you think its some kind of factor useful for the formula?
it might as well be some delphi logarithmic integer encoding
probably not useful for the value itself
granted 63 is also an interesting binary number
you are indeed good at math
0b00111111
try verifying 0.25 and 0.125
ok hang on
those should be exact floating point values
0.25: 53248
0.125: 49152
well that's perfect
In [83]: 2**12 * (log2(0.25) + 15)
Out[83]: 53248.0
In [84]: 2**12 * (log2(0.125) + 15)
Out[84]: 49152.0
neat
I mean, I can step through the whole process. I plotted and saw that log(perc) ~ x
I have two values I assume to be exact
(0.5, 57344) and (1.0, 61440)
ok, so log2(1.0) = 0 and log2(0.5) = -1
I would like to scale and translate this to match my values
i.e. I want to translate the range [-1, 0] to [57344, 61440]
[-1, 0]
[0, 1] # + 1 nice thing to start with
[0, 4096] # *(61440 - 57344) fix the scaling
[57344, 61440] # + 57344 shift to the correct place
x = (log2(perc) + 1) * (61440 - 57344) + 57344
the rest is just trying to simplify this
yea you had calculated that formula earlier
it's pretty standard I would say
try to figure out a relation that makes things proportional, and then scale things appropriately
with this formula now all I hopefully need is an adapter
granted, this is the kind of thing I encountered a bunch when studying in uni
unless the 3rd byte 63 does some of its own magic
because physics and numerical analysis
i never was quite a math guy
idk what the 3rd byte is
given that it's always either 0 or 63 it feels like some kind of flag
i am very suspicious that it is related somehow to the actual formula (if it might be different or some conditional thing)
maybe it encodes what kind of value it is
63 is not a single flag though its a bunch of them
if it's logarithmic/linear
yea like a unit
oh that's a good line of thinking
granted, this doesn't make too much sense
because 0 means very different things for linear and logarithmic
idk
maybe there are more types as well, which i misinterpreted as just a combined 4 byte integer
I feel pretty certain the first two bytes is the value though
but on a whole, 63 is the majority
holy shit u guys r big brains
not me
might as well make you a contributor for this formula you found out, big discovery! 😄
hi
my name is Pulkit Sharma
I'm amateur coder
right now i know CPP and a little Python
nice to meet you all
@haughty mountain hi there
You had mentioned you knew some binary parsing libraries before
I know about stuff like kaitai that's kinda made for this
there should be a few others
anyone know backtracking
stack = []
res = []
def backtrack(numbs):
res.append(stack)
for i in range(len(numbs)):
stack.append(numbs[i])
backtrack(numbs[i:])
stack.pop()
backtrack(nums)
return res
trying to return power set but its not working
for example nums is [0] and the power set is [[],[0]] but it returns set with two empty sets
first it puts the empty set into the res list
then it iterates over the set, the example set only contains one element
it adds 0 to stack then recursive call
the recursive call should add [0] to the stack as the stack contains [0], even the recursive call prints [0] for stack but it appends an empty list
Kaitai doesn't support serialisation tho
solved it but am wondering why the former didnt work
stack = []
res = []
def backtrack(numbs,curr):
res.append(curr)
for i in range(len(numbs)):
backtrack(numbs[i+1:],curr + [numbs[i]])
backtrack(nums,[])
return res
its the same idea implemented differently but the first implementation is more intuitive to me
dont want to make the same mistake onna test
hi! is this sorting algorithm a quicksort?
dze ccf bkw hwy pjk xce aux qtr xpa atm
ccf dze bkw hwy pjk xce aux qtr xpa atm
bkw ccf dze hwy pjk xce aux qtr xpa atm
aux bkw ccf dze hwy pjk xce qtr xpa atm
aux bkw ccf dze hwy pjk qtr xce xpa atm
atm aux bkw ccf dze hwy pjk qtr xce xpa
Where will I insert a key 25 in this AVL tree?
for sure not, e.g. you can't get from line 1 to 2 though a quick sort partitioningstep
also, what you have there is not a description of an algorithm...
I'm confused whts goin on in this removeFirstNode method(removes the head of the Linked List, sort of mimicking a stack).
They set, temp = head
but then, head = head.next, and then temp = None.
However, since temp = head, isn't temp = head.next as well?
And if you're setting temp = None, aren't you setting head.next = None as well?? How does Python even work here.
what the fuk is goin on
This looks like it was poorly translated from another language by someone who didn't understand it
yes mb I don't get Python at all, so I was only able to muster up tht much to explain wht im confused at
man this looks like simple shit I should alreayd know wtf
unless I'm completely missing something, it's equivalent to ```py
if not head:
return None
return head.next
what about deleting the first node?
temp isn't even doing anything
it'll get garbage collected if nobody is using it anymore
but if you keep the original head around you will still be able to access the whole original list
basically, yes
oooo so if I return the 2nd node only, then thts the same as taking out the first node completely
anything that you aren't keeping a reference to gets deleted eventually
Im a noob so this should be a simple question.
if "sample 1" or "sample 2" in submission.selftext.lower() or submission.name.lower() and submission.id not in posts_replied_to and post.author != r.user.me():
I want this to run if either "sample 1" OR "sample 2" is in either one of the two lists, as long as the last 2 and statements are true. Did I do this right?
I use Arduino language (similar to C/C++) so Im used to putting everything in parenthesis for order.
if (("sample 1" or "sample 2") in (submission.selftext.lower() or submission.name.lower())) and (submission.id not in posts_replied_to) and (post.author != r.user.me()):
Here I added parenthesis to demenstrate how I want it to behave.
Apparently, that's not correct. I may or may not have spammed the same test comment to the last 30 posts on the r/test subreddit. How do I fix this to work correctly?
Hi, I don't quite understand this question. If there is "aabc" shouldn't the output be "a0bc" since b is not repeating ?
What is the answer of this?
The if statement consists of two part: and?
Hey @sturdy vapor!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
is that even a question
https://paste.pythondiscord.com/abugaxivux
Hey there. I'm playing around with ways of optimizing run time for Monte Carlo simulations in Python using a basic dice game simulator. I'm trying to make this thing run as fast as possible on one core before I start trying to implement multiprocessing. Are there any ways which I can make this run faster than it already does? I've switched all while loops I previously had to for loops, removed a loop in roll_dice() by passing it trial_number and generated all the random numbers that will be used in the simulation beforehand using numpy.
I'm no backtracking expert but I can tell you a few things wrong with your code. You're appending reference copies of the list stack into res. Your backtracking algorithm eventually clears the contents of the list stack. You must copy the list by value and append that into res instead.
Anyway, I tried running your algorithm but it recurses indefinitely. It needs a proper stopping condition
it should be numbs[i+1:] in the recursive call
that should fix the infinite recursion
So for time complexity we use Big O and omega and theta, but they are not related to worst average best cases right?
Like Big O isn’t worst
Theta isn’t average
Omega isn’t best
?
Yes I know
think of them as <=, >= and =
I know but they are not related to worst best average?
I can give upper/lower/tight bounds on the average running time
Or another question how I know worst best avg
if you consider all possible inputs (and not an average over them) then O is the worst case and Ω is the best case
so what they end up meaning depends on what you're bounding
So O is not always mean worst case?
what do you mean by worst case?
I can give an upper bound on average complexity that's better than the overall worst case
@haughty mountain
remember the formula you gave me?
2**12 * (math.log2(obj) + 15)
well log2 doesn't work for 0
raises ValueError: math domain error
well yeah
it took me a day to figure that one out because construct was swallowing the exception
and serialising incorrectly
if anything it's -∞
you probably need to special case it, yeah
well 0 is the minimum value in uint16 representation
so I can just do if value == 0.0: return [0, 63] something like that
actually [0, 0] because the 63 comes down to 0 for a value of 0
0.0 shouldn't become zero, should it?
i mean that's what get stored when the knob is at absolute 0
yeah, I guess that's one way to get rid of the -inf issues
put a special value instead
but then what about for 0.00000001 type of extremely close to 0 values?
i should probably use math.isclose but then it might cause a change in values when serialised as-is which is undesired
fucking floats
you can for sure get values outside the valid range
it's not really a float issue
0.0 is special
that't about it
the other results should be clamped to the correct range
Not quite expected huh
it's what I expected 😛
maybe i should check if result is negative and just set it to 0 instead?
why not just special case zero and clamp the value?
unless you want the behavior you described
wdym by clamp the value?
what does the actual tool do?
is this different
if i set the knob value to 0.000001 something?
right
clamping means you have a valid range [a, b] you put the closest valid point
it just sets it to plain 0
ok, then what you propose matches their behavior
0.0001 gets set to something close by but not 0, 0.00001 gets set to plain 0
if valid range is [2, 4]
0 -> 2
1 -> 2
2 -> 2
3 -> 3
4 -> 4
5 -> 4
6 -> 4
that's a clamp operation
I don't think python has clamp, but you can implement it like
max(2, min(4, x))
@haughty mountain Have another scaling problem, this time for frequency:
90 Hz is 5777
1500 Hz is 33145
8000Hz is 55825
min 0 - 0hz
max 65536
(not 61440 😛
could you give some nice values? like some powers of 2
power of 2 frequencies I meant
since that's the side that I'm afraid will have rounding issues
yea right my bad
its just that this knob accepts a float from 0 to 1 too
128Hz (approx) is 7832
can't set it very exactly, in float its 0.1195
2048Hz - 37218 - 0.5679
but atleast its some curve
could be log(constant + x) 
do you have some smaller values as well?
like 128, 64, ...
the small values are the ones that will tell me if my guess is all wrong or not
4096Hz - 46565 - 0.710525512695313
ok its hard to go accurate with low values margin of error is high
just roughly correct is fine
I just want some nice spread over orders of magnitude
interesting looking value
but not a good number
and the scale begins from 10Hz, so 10Hz is 0
ends at 16k
interesting becausae
>>> 2**6
64
>>> 4**6
4096
exactly 10 and exactly 16000?
yes
not even close
(log(constant + freq), value) is so close to a line
maybe that's it?
ahh its not khz likely
no, that's log(constant + freq) on x axis
where constant = 160 for now
which is just a random guess
that's good for a random guess 😄
do you have any values outside 10 and 16000 where you are completely sure they are exact?
I guess 4096 and 63 was one?
err, you meant 65535 right?
no 65536
wat
its a 4 byte integer
do they use a 4 byte integer to be able to represent exactly 2**16?
but nothing above?
idk its everywhere around the format, spread like a plague
they use 2 bytes for storing 0-256
similarly 4 bytes for 0-65536
here is a test which passes (uses 65536 checks)
https://github.com/demberto/PyFLP/blob/596dcde80e7740a9219d4c233ddc219ac0846ac5/tests/test_mixer.py#L82-L92
tests/test_mixer.py lines 82 to 92
def test_insert_eq(get_insert: InsertFixture):
eq = get_insert("post-eq.fst").eq
assert eq.low.freq == 0
assert eq.low.gain == 1800
assert eq.low.reso == 0
assert eq.mid.freq == 33145
assert eq.mid.gain == 0
assert eq.mid.reso == 17500
assert eq.high.freq == 65536
assert eq.high.gain == -1800
assert eq.high.reso == 65536```
or maybe its harder to notice changes in tuples:
(0, 0, 0, 0)
and
(0, 0, 1, 0)
than
(255, 255, 0, 0)
is super hard to see when manually checking big chunks of bytearrays
4096 is integer representation for 63 hz
the knob only moves that much
I'm a fraction of a % off 😭
looks close at a glance with all the constants roughly fitted
still close enough ig
yea that looks accurate af
the frequency values were approx, as it was impossible to set the frequency exactly
my values are consistently off 
first 2 and last 1 point was used for computing constants
which is why they are at 0
or close to 0 anyway
not beautiful 🙂
-71718.57858698638 + 14165.208975887555*log(148.065401 + freq)
a different regression fit https://www.desmos.com/calculator/flxuyqogc7
rather than my manual hacky one 😛
that used only 3 points
granted this regression kinda has issues
since we have both very large and small numbers
but yknow, it's quite close
though it's a pretty weird format 
Why wouldn't they just simple store 10 to 16000 if they had 4 bytes to begin with
It can be even done in 2
it seems a bit suspect that the max would be at 16000Hz
It is
since that's still in quite audible range
The filters it uses are old and CPU cheap
Ig never updated since they were coded back in 2000s sometime
Nobody really uses that feature, its just there. I only myself use if I am running low on CPU
Any course for learn data strutture and algos from zero?
Is there an answer to if top down or bottom up dynamic programming is faster.
Or it just depends
Some in pinned messages
Thanks
depends, top down might not need to evaluate all states, if you need to evaluate everything bottom up avoids recursion overhead and the dp array is nice and compact in memory which gives you good cache usage
That still means no conclusive answer without more information right?
yes, depends on the task
I prefer bottom up is that version is sensible
in part because python recursion is terrible
Does anyone know what the worst case scenario is for the shell sort algorithm?
which shell sort algorithm?
wikipedia has a whole table of gap sequences and their complexities
oh that's a good question...
can anyone here help me with a complex number python assignment i have ?
I'm not sure, but I'll find out
the stack for keeping track of free space is trivial, no?
start with all indices in the stack
if you need a free space to put something in, pop
if you delete an element push the newly free index on the stack
right right that's exactly what i'm attempting to do. i think what is tripping me up is how to do the chaining pointers. also i'm getting list index out of range errors so i must be going wrong somewhere. one good thing is that i was able to implement my quadratic probing going properly on my own
sorry that's not my current error, current error is int object has no attribute keys
so.. i'm reading a bucket incorrectly i suppose
i'll need to maintain a linked list of pointers (the chain) to support searches, i think
its not really clear if we have to support all the standard operations of hash tables like delete, insert, etc from the handout
here's what i have so far:
and after reading i'm sure you can see where this logic came from
but i want to tease out the error before going and further changing or possibly breaking things
Hi guys. I got this question. Can anyone help me to figure out why my output is wrong please?
Question
Given a dict of accounts and a threshold int, make sure that all the accounts balance to meet the threshold
Example with threshold of 100
dict { "Acct1": 130, "Acct2": 90, "Acct3": 120, "Acct4": 70, }
Expected Output
Acct 1 -> Acct 2: 10 Acct 1 -> Acct 4: 20 Acct 3 -> Acct 4: 10
My code which is not working well
!e
def account_balance(account: dict, threshold: int) -> None:
temp_account = {}
accounts_list = []
for key, value in account.items():
if value > threshold:
temp_account[key] = abs(value - threshold)
accounts_list.append(key)
if value < threshold:
diff = abs(value - threshold)
for curr in accounts_list:
if curr != 0:
temp_account[curr] = temp_account[curr] - diff
print(f"{curr} -> {key} : {diff}")
dd = {
"Acct1": 130,
"Acct2": 90,
"Acct3": 120,
"Acct4": 70,
}
account_balance(dd, 100)
@last fulcrum :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | Acct1 -> Acct2 : 10
002 | Acct1 -> Acct4 : 30
003 | Acct3 -> Acct4 : 30
why do you need to maintain a linked list?
at least you don't need a separate linked list
your table is (several) linked lists already
with the pointers
i'm thinking of the series of pointers as a linked list where each pointer will point to the next array location if a given location is full
look at anywhere you assign to keys
wdym
or wait
you replaced a bucket with a key somewhere
look at any assignment to table[...] then
yes, you were correct
i have a new error, let me try to sort it
also, can i change
to include the for loop
to support when i have bucketsize > 1?
ah crap
then i'll have to return which bucket slot is open
so i'll need to also return i
huh? == vs is doesn't matter here
oh hey, my previous answer is probably relevant

i'm having issue passing variables from one function to another
maybe i make it an attribute of openslot?
sry poor naming
gonna change openslot function name to isopen
i just want to be able to yield that to another function
you could have it return an index or None
the return you have now is very awkward to use
concise
well, you can't do that with the code you posted
i know
should i change my else: return False to else: return None?
i'm not following 😦
I'm suggesting you return an optional result
i.e. on success return an int value (the index) on failure return None
because there is no such index
ok ok
looks like chaininsert has no scope for it
ah crap it doesn't color in here
oh i didn't implement your suggestion above yet 1s
what is slot here
oh nvm
what is this operator
:=
it's a fairly recent thing
walrus operator
basically assign and return the assigned value
oh wow
you could also write the assignment above the if to do it without :=
well not the to n bit but it gets you closer with an in line assignment
hmm now i just need to fix up my else clause and figure out the pointer situation
lol this is getting confusing with all of the attributes and various functions and variables
hmm
got an error i've never seen
pop from empty list
hmm
so it is inside of a for loop but i don't really seem to be handling the case when isopen(newbucket) returns None
so perhaps i should wrap everything in the else statement in a while loop?
and that would solve what?
hmm no idk
how many elements should be popped in chaininsert?
as many as it takes until self.isopen(newbucket) returns an openslot
newbucket being a pop from a stack
the stack just being the integers of the array indices
actually you have several problems here
oof
you're not following the links at all
and the stack should be popped at most once
if you actually need a new bucket to link to
🤦♂️
ok, so i had p = 0 so that after i write a key somewhere, i can keep track of where with p and then set table[loc].pointer = whatever my insertion location ends up being
ohhh
i just remembered, i only have to implement chaining for when bucketsize = 1
so i don't have to worry about updating individual slots for the pointers
@haughty mountain can you help me with my problem?
^^
your second if is pretty badly wrong
you can take more than is valid from an account, you can also end up taking more than you need
i guess i don't need that p var
do you follow links?
ohhh
i'm picking up what you're laying down
obviously i need to check the bucket pointer
and you should only give a new bucket if you absolutely need to
i should just go ahead and write key if the bucket pointer is None
you still have a big bug on top of that
i'll get there
how many times should you insert the key?
1
how many times could your code insert the key?
datalen
right
sheesh i am bad at this
hint: ||break||
oh duh, i have break in my other insert loops
ok, how far off am i from something viable
nvm
i see error
i'm just trying to get this running so i have plenty of time for the analysis and adding collision counting and stuff
the pointer doesn't tell where to insert, only used for lookup, correct?
not only lookup
you might hash to bucket h,
if h has room you can insert
if h is full but tells you that it's linked to bucket h', you'll need to check there
if h is full and not linked to anything...
so that logic nullifies my "if a pointer is None I should go ahead and write the key" statement
if h is full and not linked to anything, pop a new bucket from the stack
ok my slot = None is giving me trouble
slot being None means the bucket is full
but that's the same as when theres nothing there and i want to write somewhere
oh, nevermind
if there is nothing there i return the empty index as openslot
damn i was really close then i lost my logic trail
im basically breaking up your conditions above into 3 if elif else statements
fwiw you can also add functions to bucket to make the code easier to understand
e.g. why is isopen a function of the hash table?
it's kinda the bucket's job to know that
and you could even add operations to bucket like "try_insert"
oh you know why, so then it has scope of bucketsize and other stuff
which could return True/False depending on if the insert succeeded
if insert failed, try linked bucket if it exists
yeah that's obviously a more logical approach, though adding those conditionals would make the mental tracking of what's happening more difficult, for me anyhow
maybe not
probably not
you can have the bucket perform a simple operation on itself
and then you can write the rest of the code using that abstraction
what like bucket.isfull
ok, but then i'd need to modify my existing linearinsert and quadinsert functions
it might make them simpler even
alright, i'll try implementing this this evening
it's running now without errors so i'll probably start a v2 version
good abstractions can simplify code a lot
running without errors != working, i know
hmm that's helpful to know
so write a try_insert function for my Bucket class.. any other tips / tricks
try_insert will be very helpful in the hash table insert
you could probably imagine search having similarly good operations on the bucket
ok
You are given an array of strings arr. A string s is formed by the concatenation of a subsequence of arr that has unique characters.
Return the maximum possible length of s.
A subsequence is an array that can be derived from another array by deleting some or no elements without changing the order of the remaining elements.
``` Can anyone help me with this, I am thinking to use DP but I am not sure about the right approach
this feels like https://en.wikipedia.org/wiki/Exact_cover which is NP-complete
In the mathematical field of combinatorics, given a collection S of subsets of a set X, an exact cover is a subcollection S* of S such that each element in X is contained in exactly one subset in S*. In other words, S* is a partition of X consisting of subsets contained in S.
One says that each element in X is covered by exactly one subset in S*...
or similar at least
find biggest vs find a full cover
but they are the same problem is a full cover exists
so I doubt you'll find an efficient solution
Oof, like O(2^n) perhaps?
As an aside, it might be fun (or trivial) to think about if, using the exact cover definition, we could determine the compactness of a proper subset of a discrete space. I assume yes, since the subsets in this particular cover should be closed and bounded.. but.. idk. Fun to think about tho. Maybe.
Could anyone explain how to approach this problem for me?
I don't understand when doing dfs, when I've found the optimal edge (solution)
basically yes
i've gotten myself confused regarding how to mark the bucket pointers, but i think i can figure this out
mark?
still referring back to this as a guide
not mark them but how and when to designate a given bucket.pointer. my logic is as follows: if i try_insert and fail due to finding no None in my loop, then the bucket is full, and i should save that bucket table location in p so that later, after I successfully insert, I can go back to the original bucket I had first tried (the one thats full), and assign its pointer to point at the bucket that ultimately got the key written to it.
you don't need p for that part
oh right, i have h(key)
you should already have a variable for the current bucket, no?
for when you're following links
there are no links yet
i'm just trying to get a chaininsert going, nowhere near like deletion or lookup just yet. just trying to build the table
oh
are you saying, if try_insert fails, go and pop from the stack and assign current bucket's pointer to that, because being on the stack, we know it will not fail try_insert
chaininsert needs links though
yeah i'll make the buckets point from one to another with a singly linked list via bucket.pointer
currently you don't have logic for following the links though
i can save that for when i build a lookup function, unless i am misunderstanding
describe what happens if you insert 3 things with the same hash (bucket size 1)
with chaining for collision resolution?
yes
so the first one gets written directly to the bucket to which it hashes
the second one, i'll go there, do a try_insert, fail, pop a newbucket from my stack, go and write the second key to newbucket.keys. then i can write the original bucket pointer to newbucket, building a chain i can follow. i think what you're wanting me to find is that i need a bucket.pointer check during chaininsert itself
you do indeed
yeah i'm having a bit of a chicken and egg logic problem
and when adding a new bucket you can assign the index of the new bucket to pointer and then move to the new bucket
haven't used p yet i know
let p be the current bucket location
hmm it'll have to go after the hash calc then
technically yes
but not in the way you think
what should the initial value of p be?
itself?
oh maybe None?
idk. you're talking about the variable p to update bucket.pointer yes
not the .pointer attribute itself
you need to keep track of which bucket index you're working with
what's the first bucket index to look at?
self.table[hash]
so p = h
well, you don't really need the h
where h = primary_hash(key, self.modulo_int)
p = <the hash value>
hmm ok. i'm going to keep them separate for now but i see how p = h = hash value
it's just easy for me to see that and know i'm looking at the hashvalue
wait what
ok well yes i am populating the table with a for datum in data loop
ok if i can pull that out of the loop, i must be able to pull the hash out as well
bc i'm never rehashing for a given insert attempt
oh wow this was really inefficient huh
actually. why are you taking datalen as an argument?
good ?
i guess in my head i was structuring it like the for loops of the other insert functions, but this one is fundamentally different
why is datalen relevant to anything? 
do i not need a loop to follow all the possible pointers?
why is datalen relevant for that?
idk i will remove
shouldn't it just be the table size?
lms
so the problem statement sort of confuses me in this regard, it seemingly has conflicting statements about the size of the table
i guess for the bucket size 3 approach i'll just make the table length 40. but they say "There will be 120 table slots for all the schemes."
i'll make the loop run through tablesize
so the idea of p is to have it be the current bucket you look at
the start point is based on the hash
if you have a full bucket with a link, what should happen to p?
update it to the current bucket location
this
still curious about this
elaborate?
update p to the bucket.pointer from where try_insert failed
and not quite, since this ignores links
yes
yeah so i'm trying that and sort of caught in an infinite loop like in the case that self.table[p] itself has a pointer
which i can check with if self.table[p].pointer
how is the loop infinite?
it's not infinite. i just need to find the proper checks to enable following each next pointer successfully
i feel like there is a recursive solution staring me in the face
where i call chaininsert again
like, the logic in the loop should be fairly straightforward
i.e. this
no need for a recursive call
ok good lol cause that'd get complex passing p and everything
the idea is recursive though
for sure
the loop is basically the recursion unfolded
can i ask a dumb question
nvm i think i know the answer
why is this not seeing a pointer at that location:
it'll probably work still
also, why are you using h?
h is not relevant
the hash value only serves as the first bucket position
also, what's up with that if elif?
calling try_insert twice
h helps me visualize things. i could put in the primary hash call in for the index but i don't prefer to. the elif is because, there's going to (i think) be a third else clause in the logic. the way its written now, i call it again for visualization purpose so i can follow the steps
shoot i need a while loop huh
not seeing how to follow every bucket's pointer while they are all not None
i feel like i'm if elsing myself into oblivion rn
hmmm
i have an i running through tablesize
perhaps i can use self.table[i].pointer somehow
but h is the wrong thing
in anything after iteration 1
is this remotely in the right direction
i will drop h after checking the bucket and getting it's pointer
no
maybe a nextbucket variable?
the loop is only there because you couldn't make more than tablesize steps
why do you need nextbucket?
the thing im struggling with is the syntax for "follow the pointer to the next bucket, and if it also has a pointer, follow that pointer to the next, and so on."
you have p the current bucket
maybe think through a recursive formulation
not in code, but just on a a wordy description level
if table location at hash has a pointer, set p equal to pointer, call chaininsert on the key at table location p
so I will stress again, don't use h/hash for the indexing
it only serves as the first value of p
oooh
i need like a table[pointer] type jaun
still unable to grasp how to continuously follow pointers 🤦♂️
its sort of painful how bad i am at this at times
other times, it feels easy
did you remove h yet?
yeah i just used p there
so at the beginning of the loop p is always the current relevant node
so if you are updating p to follow the link in bucket p you are following the pointers
pls help
I suspect emails is empty
hmm
oh lol im stupid xD thx
the most highly sought after user in this server lol
the overlord
you need to if slot is not None
i want to insert there if i find that its currently None
keys is a list populated with None in each array location
Anyone?
we're kind of all at the mercy of more talented coders lol
def try_insert(self, key):
for idx, slot in enumerate(self.keys):
if slot is None:
self.keys[idx] = key
return True
return False
hi
@vast adder I've managed a considerable improvement in my image finder
when we were talking it took 8.8 seconds to find 7 instances of the same object
used several tricks specific to my images
didn't use any library or anything
I'm filtering through the big image 8 pixels at a time because the game is that way
wrote my own comparison function to skip a bunch of unnecessary steps
and other stuff
the transparency thing is working aswell
i feel like that has to be the way
because if there's no patterns you can exploit you're basically forced to go pixel by pixel
if you have time, could you provide some test code? would love to give this another try 😛
test code as in the code?
a minimal reproducible example that sufficiently explain what you are trying to optimise!
https://paste.pythondiscord.com/uqazutayek this would be a basic version of the code
Hey @forest tundra!
It looks like you tried to attach file type(s) that we do not allow (.7z). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
hi all, need help with my hash table program
bruh why is argparser so difficult to understand
why are you duplicating logic for the different hash functions?
the only difference is the primary_hash used
why not move that out and deduplicate the logic?
code duplication means editing stuff is more work, more error prone and more annoying
as a first step linearinsert can become
#
def linearinsert(self, key):
if self.hashtype == 'division':
primary_hash = primary_div_hash
elif self.hashtype == 'multiplication':
primary_hash = primary_mult_hash
for i in range(self.tablesize):
h = primary_hash(key, self.modulo_int)
j = ((h + i) % self.modulo_int)
if self.table[j].try_insert(key):
break
already the logic is deduplicated
now, why even do the hashtype check in the function?
we could do that outside the class and pass the hash function to the HashTable instead
boy am i glad you showed up
code is a mess rn
trying to write the main() bit with argparser
and your function might just become
#
def linearinsert(self, key):
for i in range(self.tablesize):
h = self.primary_hash(key, self.modulo_int)
j = ((h + i) % self.modulo_int)
if self.table[j].try_insert(key):
break
i'm not used to setting variables to existing functions, i'll need to do somthing like
primary_hash = primary_div_hash() correct
ok i made the change to linearinsert
so with this i can call from main and not need another for datum in data loop?
wdym?
nothing change other than moving the if conditions for the hash type out
to dedupe code
while i was testing, i was testing with
for datum in data: mytable.linearinsert(datum)
the thing I wrote doesn't change that
my point is you have a lot of duplicated code
im with you
which is annoying to maintain and edit
you should be able to also move the linear vs quadratic probing out, but that could come later
yeah there are a lot of different combinations of logic arising due to all the arguments im passing
getting rid of the hashtype ifs already helps a bunch
btw, your repr isn't ok
it should be returning a string
return str(self.table)
yeah, that works
your Bucket repr should probably be updated to include the pointer
that won't work, and why not format it in some nice way?
do you recall how they're asking for print statements
e.g.
return f'Bucket(keys={self.keys}, pointer={self.pointer})'
i think the pointer will make that difficult
how so?
they just want newlines with either 5 or 3 of the keys printed across so the format is nice for the grader. i guess i don't have to use the __repr__ and i can simply use the keys where necessary
you can override __str__ if you want to
let's not overcomplicate things
what i don't understand rn is the main() logic
the main like for data in datum loop
when i try to write it its just duplicated logic like if collision_resolution = 'linear' type thing
why are you opening a file before reading arguments?
aren't the file name you want among the args?
that's not in the main you posted 
i'm trying to write this loop into my main, since that was the driver of my test line
idk, what should i be doing
you're taking the input and output files as command line arguments, no?
currently you use neither
yes. alternatively i could write a defaultname_out.txt
alright i've changed input to be equal to args.input
and the output?
taking out this line: print(args.accumulate(args.integers))
ah frick
missing a with open('output.txt') as output:
where should i put that
and why the try?
🤷♂️
Move both prints to the same place and put the open right above. Opening should happen right before you do the printing, not long before.
before when you need it :^)
yeah thanks, i've done this.
i think the main thing i'm missing is a loop to write each datum into the table?
The idea is that you don't want to hold the file longer than needed.
right, no reason to keep a file open while you determine the input arguments won't work
Imagine you have a physical file on a clipboard and you want to eat lunch. You could either grab the clipboard, then eat lunch, then fill out the file. Or eat lunch, the grab the clipboard and fill it out. Someone else might want the clipboard and the first option would block them for a long time. It's not super relevant right now. But it also makes that part of the code feel like a complete thought / paragraph (one idea instead of multiple).
i see what you're saying. its about not tying up shared resources
It's easier to reason about non-interleaved things. And opening the file early is now something the reader needs to keep in the back of their mind the whole time even though it does not really come up until the end.
right of course
why is this not writing an output file:
running from command line with this argument:
oof
You can move the open closer to the prints.
keep scopes as small as possible 🙂
(sadly python kinda makes that weird because of its variable scoping)
ok im on it
hmm i need to implement some sort of collision counting i think for stats
just tested "my" hashing method (multiplication), appears to be working with linear probing
going to test with quadratic probing
appears to be working
is this chaining logic sound:
fwiw, you can do what I suggested and move that if-elif out further
you can have it outside the class and pass a function to the class
err, why do you have multiple try_insert?
The function seems very strange to me in multiple ways is really all I can say, if it is trying to do what I think it is.
one case is to try there, next case is to check if there is a pointer
self.stack seems to be a resource that can be depleted, as it never gets anything pushed to it except at initialization, and it seems to be just a range.
idk if your thing is correct, but you should only need one try_insert in the loop
for i in ..., yet i is never used.
think of it as a free list
Oh.
For me this would feel like a while loop.
yes
how can i confirm correctness
While failing to insert...
yeah that'd make more sense if i could figure that out
the one nice thing about a for loop is if you fall through, insertion is impossible
but rather than optimize, i need to get the core functionality of this thing going
but you can detect that in a while loop and a counter as well
like building the hashtable, writing it to an out, and doing their required print statement format
for loop vs while with counter
Yeah in another language like C I would go for while count < ... and failed to insert.
Or well, in C, I can put the failed to insert in the for.
yeah thats like pseudocode
@haughty mountain just told me about the new walrus operator
which i suppose you could use in such a structure
eg declaring variables in a loop header
write test cases where you validate that things are correct
like smaller input size and track the logic?
I think the main thing is the try to insert multiple times (in a single iteration) when we are looping for that.
in any case, the logic in the loop is quite weird, I think I can easily construct a case that makes it fail
I think 4 elements hashing to the same thing breaks it
actually, it's more broken than that
you never assign any pointers
do you actually have a mental model of what the thing should do?
ignoring the code
for chain insertion?
if you do, walk through the code and see if the code actually does what you want it to do
yes
ok
I asked yesterday, what is the behavior when you insert 3 elements with the same hash?
This simple sequence of operation will actually cover most logic
yeah, sure
i think i am missing updating the original collision location table[p].pointer
upon successful insertion
why would you update any pointer on successful insertion?
that's not the operation that means you need to update a pointer
is it like, if bucket == full, update bucket.pointer to point at the very next bucket?
no
if bucket == full, update it's pointer to be the free bucket popped from the stack
and for the logic in the loop, it all comes back to 
correct pointer update, wrong logic
if try_insert fails, update bucket[p].pointer = free bucket location popped from stack
still wrong
go back to my old statement I replied to earlier
re-think things...
you only need 1 try_insert
what you have is way overcomplicating it
hmm ok
and yeah, I can trivially break that code, two elements with the same hash breaks that code
i'm studying this:
you might hash to bucket h
a. if h has room you can insert
b. if h is full but tells you that it's linked to bucket h', you'll need to check there
c. if h is full and not linked to anything...
i could see how if there was a full flag, i wouldn't need try_insert more than once
maybe, a pointer existing indicates it is full
try_insert already tells you
what does try_insert respond if the bucket is full?
False
cool, so you have that info
a. is your if ... break
that part is fine
(though I would make it a return)
A bucket has two parts to it. One is being full or not...
so you have b and c left
i don't see how i can do it without using try_insert again
why do you need try_insert again?
let the "recursion" deal with that
a. not working means you need to find another bucket to put things in
there are two possibilities
b and c
no, doing it with a break/return if true is fine
to be blunt the rest of the loop logic is just about "find another bucket"
no insertion in the rest of the loop
close?
there you go, though you want an is not None for the pointer check
what about the chaining?
(because 0 can be a valid pointer)
what about it?
you said i wasn't doing it before, does this remedy that?
think through the scenario of inserting 3 elements with the same hash
walk through the code in your head
Draw it.
drawing/writing down how the table changes over time is helpful, yes
yeah no it makes sense. during the second insert, the initial if fails, we therefore continue to the elif. if the bucket we tried to insert in has a pointer (is not none), p is reassigned and we go back to the beginning of the loop, and try a new insertion with our new p value (the collision bucket's pointer location). if we fail an insert, and the collision bucket's pointer is None, we go and select a new location from the free stack
right
i mean, it makes sense that we're following a chain, idk about updating pointers
that's what i was on about
Well you have to set it at some point in the loop.
And you have 3 cases.
Would you do it in the first?
should that be a part of the insert functions or bucket defs
this
here
ah, here
your overall logic was wrong then, but...
In your loop's 3 cases you can use process of elimination. Where should you NOT update the pointers?
i need to remember where something has a collision, so i can update that buckets pointer to point at the bucket where i finally successfully insert
let me look
you don't need to remember anything
by the way, this'd be a great time to implement the required collision counters
definitely not case 1
if you're following a pointer, do you need to update that pointer?
no
If it's not None, do I want to change it?
no
lol, same case as I asked about yes
Two eliminated, only 3 options...
almost
right
got it
how can i track collisions? btw that has to be for all collision strategies, so this function likely isn't the place for it
you could technically do this I think, but it's not exactly obvious
p = self.table[p].pointer = self.stack.pop()
oh i see why
i thought you meant simply bc it was a double assignment. but i've got you
btw i ran this and i think there is something wrong, im also printing a count which counts my None's in my out, and i'm getting 92 when it should be 60. so i am failing to insert some elements
let me see if it works with the division hash
the hash shouldn't matter
something is off there as well
I would recommend throwing an exception after the loop
you should never get there
but as written your code would just silently fail if it does
like raise Exception
raise Exception('message') yeah
like same indentation as the for
it shouldn't happen, but you will want to know if it does
correct
after the for
after the for, not in it