#algos-and-data-structs
1 messages · Page 128 of 1
I tried to implement that in this code but I get error
n = 5
with open("file_text_chat.txt","r+") as f1:
#lineas = [linea.strip() for linea in f1.readlines()]
lineas = [linea.strip for linea in f1.readlines[-n:]]
with open("file_only_last_lines.txt","r+") as f2:
if palabra not in lineas:
num_linea = lineas.index(palabra)
f2.write(f"{palabra}\n")
I get this error message
Traceback (most recent call last):
File "n_ultimas_lineas.py", line 5, in <module>
lineas = [linea.strip for linea in f1.readlines[-n:]]
TypeError: 'builtin_function_or_method' object is not subscriptable
What should I modify?
anyone here can help me with coding problems tomorrow 8-10 am??
[9:29 AM]
coding questions like dynamic programming ,graph and trees
Oh dangit.. I forgot the parenthesis there
f1.readlines()[-n:]
My bad
I try with that code but it doesn't give me the correct lines
n = 5
with open("file_text_chat.txt","r+") as f1:
#lineas = [linea.strip() for linea in f1.readlines()]
lineas = [linea.strip for linea in f1.readlines()[-n:]]
print(lineas)
with open("file_only_last_lines.txt","r+") as f2:
f2.write(f"{lineas}\n")
This code write that:
[<built-in method strip of str object at 0x0000024EEED9ECB0>, <built-in method strip of str object at 0x0000024EEED59D30>, <built-in method strip of str object at 0x0000024EEEDE69F0>, <built-in method strip of str object at 0x0000024EEECB7E90>, <built-in method strip of str object at 0x0000024EEEE03480>]
strip is a function call
Use strip with paranthesis: "strip()"
Hope this helps:
https://www.google.com/amp/s/www.geeksforgeeks.org/python-reading-last-n-lines-of-a-file/amp/
oops- i forgot the parenthesis around strip too
what timezone?
India Standard Time
Time zone in India (GMT+5:30)
@winged ploverthank you very much it worked for me 😄
@rancid houndThank you very much, I tried the methods on that page and they also worked for me
!decorator
sounds like someone looking for help in an exam...
will let u know soon tysm
What's it for? As I've already said, if it's for an exam or some other thing where you'll be compared against other people, we will not help with it here.
yeah i know
it is related to web scrapping
now good?
indeed, it sounds like you want help with some sort of graded assignment
hmm
why is the time window so short, if it isn't some sort of graded assignment?
@floral lintel let's not spam #internals-and-peps which is a channel about python and not about algorithms
i can show you a solver i wrote with numpy --- maybe the structure of it will make sense anyway:
def possible(y, x):
"""
Return array of possible digits for location (y, x).
"""
row = grid[y]
column = grid[:, x]
gy = y - y % 3
gx = x - x % 3
subgrid = grid[gy: gy + 3, gx: gx + 3]
return np.argwhere(
~np.isin(N, row)
& ~np.isin(N, column)
& ~np.isin(N, subgrid)
)
def solve(where_empty=None):
"""
Sudoku solver with backtracking.
"""
if where_empty is None:
# Find coordinates of cells that are empty.
where_empty = np.argwhere(grid == 0)
if len(where_empty) == 0:
return True # Indicate grid is solved.
y, x = where_empty[0]
for n in possible(y, x):
grid[y, x] = n
if solve(where_empty=where_empty[1:]): # Done!
print(grid)
return
grid[y, x] = 0
This doesn't keep backtracking once the grid is solved, which i think is the issue you're having.
It returns a sentinel value to indicate it has a solved grid (True, in this case)
oh, wow. That is a very different way of solving my issue.
I gave up trying to get it to return a value, but this could work very well
given an infinite DAG where each edge has a weight 1, is there a better algorithm than just breath first search to find the shortest path to a point.
paths can be assumed finite
hmm, in my problem I can find an upper bound on the path
so IG it is not an infinite DAG and therefore I can just use normal graph search algorithms
you can have an infinite dag where a path between any two points is always finite
kinda trivially
but i don't think there's a fast algorithm for finding paths unless there's some nice property of your dag
A college maintains academic information about students in three separate lists
Course details: A list of pairs of form (coursecode,coursename), where both entries are strings. For instance,
[ ("MA101","Calculus"),("PH101","Mechanics"),("HU101","English") ]
Student details: A list of pairs of form (rollnumber,name), where both entries are strings. For instance,
[ ("UGM2018001","Rohit Grewal"),("UGP2018132","Neha Talwar") ]
A list of triples of the form (rollnumber,coursecode,grade), where all entries are strings. For instance,
[ ("UGM2018001", "MA101", "AB"), ("UGP2018132", "PH101", "B"), ("UGM2018001", "PH101", "B") ]. You may assume that each roll number and course code in the grade list appears in the student details and course details, respectively.
Your task is to write a function transcript (coursedetails,studentdetails,grades) that takes these three lists as input and produces consolidated grades for each student. Each of the input lists may have its entries listed in arbitrary order. Each entry in the returned list should be a tuple of the form
(rollnumber, name,[(coursecode_1,coursename_1,grade_1),...,(coursecode_k,coursename_k,grade_k)])
where the student has grades for k >= 1 courses reported in the input list grades.
The output list should be organized as follows.
1 The tuples shold sorted in ascending order by rollnumber
2 Each student's grades should sorted in ascending order by coursecode
Can someone kindly help me with this question?
guys what would be a good way to get all possible combinations of natural numbers whose sum = a given number
like3 = {(1,1,1), (1,2), (3,)}```i mean i tried a method and it works.. but i am not sure if its best because above 50-60 ish i think it starts to slow up 👀 my method being iterate and reach the number by addition like start from [(1,)] iterate to give [(1,1),(2,)] and so on
so uhm well wanted to ask if there is anything better 👀
idk, but you can use numba, that will speed it up.
hmm yeah i think.. that would speed it up a bit
super basic example using pointless code
like ~14x faster in that specific example (I haven't slept in a while so my math is prob wrong)
ah hmm yeah
but the thing is i am using tuples and sets
for a functionality in it
ahh ohkay hmm nice
pretty much any base python and a few libraries like numpy
Only downside to numba is that the first time you call a function it has to take a few seconds to compile, which will lead to an initial slow down of the program. But after the function compiles, it will be faster.
ahh ohkay
26.527116537094116
time to find it for 50
It is good if you call a function many times or if the function has a large loop or while statement.
Ahh it does have large loops yes
heyy all, I implemented the algo to find the longest path in the maze, but as the size of the grid increases, the time of computation increases, and yes the maze have multiple solutions, so I am approaching the longest path, is there any way to reduce the time of computation, I mean using VMs or something else, that I am not aware of...
or maybe GPUs, but I suppose they are mostly used for parallel computation rather than recursion one...
How are you defining the "longest" path? Can't I just keep retracing my steps to make a path longer?
Just have this node only have one child, yes. An AST is generally not binary, nodes often have many or only 1 child.
that's a pretty cursed way to do unary minus
just don't have it be a binary tree, yeah. It's fine to have different kinds of nodes
just use linked lists when 2 is too few children 😛
[1,2,3] -> List(ListStart, List(2, List(3, ListEnd)))
aren't binary trees beautiful
😔 ListStart and ListEnd
where's the unary function
okay okay, UnaryMinus(None, 5)
there's no reason for a binary ast if you are including any ary nodes
just complicates things
python's ast is just a general tree:
In [39]: ppast("[1, 2, 3, 4, 5]")
Expr
╰──List
├──Constant
│ ╰──1
├──Constant
│ ╰──2
├──Constant
│ ╰──3
├──Constant
│ ╰──4
├──Constant
│ ╰──5
╰──Load
yup, ASTs in general really aren't binary
If you’re reading a leetcode type of problem, are there different things that give hints on what kind of techniques or data structures you’ll probably need to use? Is there anything that talks about that?
Like can you figure out that you’ll probably need to use a hash table, or a graph, or dynamic programming, just by reading the problem?
sometimes, yeah
Is there like a cheat sheet or something for doing that?
i think it's just practice, a lot of questions are worded similarly
can a binary tree keep an object alive?
if it has a reference to it yes
can this be referenced?
node._left = Cacheable(...)
yes, through node._left
Thanks
Hello folks, I have a question.
from what I know nested for loops are usually O(n^2) but not all the time, my question is when you have 2 separate for loops, thats O(n) correct?
for i in somethingHere:
return something
for i in somethingHere:
return something```
Yeah, O(n)
for i in somethingHere:
# do something with somethingHere
for i in somethingHere:
# do something with somethingHere ```
O(n) with the size of somethingHere
Assuming somethingHere is the same thing for both loops
Otherwise, if it’s two independent things, it’s O(n + m)
Are these function both O(n) https://github.com/georgepittock/py-scripts/blob/main/scripts/numbers_in_jumbled_string.py? If I run the code in the comments at the top it looks like submitted is O(n) but new is O(log n) but I don’t understand how it could be?
If it is O(log n) could someone please explain how it is, I’m fairly new to big o notation and understand log n as “you don’t check every element just halving and halving again”
Counter iterates over each character in string so it's O(n)
I thought so, so what would explain why execution time appears to show it as O(log n)?
Or is that just an example of it’s up to the order of n but not necessarily order of n
Why do you believe it to be O(log n)?
could be many things, noise, internal optimization for short strings, etc.
If I run the execution time based on input size n it goes up logarithmically
are you sure it's actually logarithmic instead of just a small coefficient
That shouldn't be possible, as Counter(string) is O(n) already.
what range of input lengths are we talking about
Well uo to n = 4 the new function is slower than the submitted function but above that it gets faster, by the time n is 50-60 it’s 1/4 of the speed
Just from reading the numbers it seemed logarithmic so yeah not very sure :/
Consider the graphs of 1 + 0.2n and 0.5n
import perfplot
numbers = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
perfplot.show(
setup = lambda n: "".join(random.choices(numbers, k=n)),
kernels = [submitted, new],
n_range = list(range(10)) + list(2**i for i in range(4,10)),
xlabel = "len(string)",
target_time_per_measurement=1,
equality_check=lambda a,b:a==b
)
Looks extremely linear to me.
both grow at the order of n but one of them has this pesky 1 + at the beginning
in big-o notation we don't care about it because we only care about the rate of growth, it's also the same reason coefficients are discarded
Ah that makes it much clearer, thanks
and here's a loglog plot
An algorithm described as O(N^2) could in reality be something like 3 N ^ 2 + 15 N + 10 000 but since as N grows, the term N ^ 2 will have the most drastic effects the others are discarded.
only at below 2 numbers is the submitted version faster
You also have to remember that runtime complexity does not automatically imply performance
👀 that's a nice library
sleepsort is O(n) 🙂
Are you effectively saying here that for mostly reasonable N some functions will have O(N) performs better than a function which is O(log N)
I hate its interface (every time I use it, I have to go look at the github page, because the author didn't bother putting the same thing in the docstring or, god forbid, the type hints), but yeah, it is
No I'm saying that results between algorithms can be surprising and if you want to gauge performance you need to measure it.
>>> ?perfplot.show
Signature:
perfplot.show(
*args,
time_unit='s',
relative_to=None,
logx='auto',
logy='auto',
**kwargs,
)
Docstring: <no docstring>
you might notice that these are not all the arguments this function accepts, but you wouldn't be able to tell without looking at github 😩
Got it thank you both @vocal gorge @heady jewel really helpful, @vocal gorge am I ok to use one of those photos on my GitHub?
sure; you can also generate them yourself - the snippet I posted is what does it
@ivory questhere's a better one with more points and correctly labeled x axis
numbers = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
perfplot.show(
setup = lambda n: "".join(random.choices(numbers, k=n)),
kernels = [submitted, new],
n_range = list(range(1,10)) + list(2**i for i in range(4,15)),
xlabel = "number count",
target_time_per_measurement=1,
equality_check=lambda a,b:a==b,
logx=True,
logy=True
)
As a simple example, in something like C a simple O(n) linear scan for membership on an array is often faster than a O(log n) binary search until fairly significant sizes as reading sequential data is one of the things that CPUs love, and there is not really branching (no if statements) so the branch predictor doesn't fail as often. Runtime analysis does not give you the full picture, you always have to measure.
Are there any practical instances of sin(n) being part of the big O? 
sin(n) is bounded by 1 on either side
Of course
import math
import time
def f(n):
time.sleep(10 * math.sin(n / 1000))
By practical I mean part of an actual, useful algorithm 
https://www.online-python.com/0iB2dDrTWR
dang that was harder to figure out than i expected
Build and Run your Python code instantly. Online-Python is a quick and easy tool that helps you to build, compile, test your python programs.
anyone have xp with a*
my code runs in to an infinite ish loop in some particular mazes
id like to talk it over with someone
share your code
hello all, I have a bunch of partial call graphs that I created per script in the form of json files.
for example, it looks like this
I assume its a dictionary of dictionaries of lists?
does networkx accept formats like this? If it does, can I just concatenate the json files together? or will I need to perform cleanup on the files
Hey @wheat flare!
Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .json attachments, so here are some tips to help you travel safely:
• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)
• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:
oh
Hey @short mesa!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
it takes a while to get out of a boxed in start
something like this gets solved but after like 5 minutes
@stable pecan sorry for the ping but i got a few questions on networkx
do you prefer if we do here or in a help channel?
you can ask wherever
ok let me just repeat earlier from above
I have a bunch of partial call graphs that I created per script in the form of json files. https://cdn.discordapp.com/attachments/650401909852864553/883177261635874857/unknown.png
for example, it looks like this
I assume its a dictionary of dictionaries of lists?
does networkx accept formats like this? If it does, can I just concatenate the json files together? or will I need to perform cleanup on the files
heres an example sorry if its messy
i assume this is what i want then? https://networkx.org/documentation/stable/reference/generated/networkx.convert.from_dict_of_lists.html
please tell me if you need me to be clearer
you need to convert the json files to dictionary yes
ah i know when i read them they're already in dict format
can i just concatenate them? or i have to combine them into one big dictionary
i assume its the second one
either way, it's probably easier to do the second
from the example i showed here
so after combining, i'll use this to read it? https://networkx.org/documentation/stable/reference/generated/networkx.convert.from_dict_of_lists.html
yes just want to combine i have like 2k json files
i have another question but after this one
thanks for answering
are the key, value pairs the same or different between dictionaries?
for dictionaries that share keys
different keys but they have the same value
its a gigantic callgraph so a lot of functions may call one function
oh most format is like this i guess
should have said earlier sorry
{
"node1": ["node2", "node3"],
"node2": ["node3"],
"node3": []
}
there can also be another json file thats like
!e
from collections import defaultdict
my_dict_1 = {
"a": [1, 2, 3],
"b": [4],
"c": [],
}
my_dict_2 = {
"a": [2, 3, 4],
"c": [4],
"d": [2, 4],
}
master_dict = defaultdict(set)
def combine(graph):
"""
Add a dict to the master.
"""
for key, value in graph.items():
master_dict[key].update(value)
combine(my_dict_1)
combine(my_dict_2)
print(master_dict)
@stable pecan :white_check_mark: Your eval job has completed with return code 0.
defaultdict(<class 'set'>, {'a': {1, 2, 3, 4}, 'b': {4}, 'c': {4}, 'd': {2, 4}})
{
"node4": ["node1", "node3"],
"node5": ["node2"],
"node6": []
}
this is how i would combine them
oh sorry, you can assume the keys in every dict to be unique
but different keys from different dicts can share the same value
im sorry i should have clarified
every key is unique across all the dicts?
ok, then that's simpler because you can just use |=
{
"node1": ["node2", "node3"],
"node2": ["node3"],
"node3": []
}
{
"node4": ["node1", "node3"],
"node5": ["node2"],
"node6": []
}
it can be like this
and other dicts can have a key in one dict be a value in theirs
!e
my_dict_1 = {
"a": [1, 2, 3],
"b": [4],
"c": [],
}
my_dict_2 = {
"d": [2, 4],
}
master_dict = { }
master_dict |= my_dict_1
master_dict |= my_dict_2
print(master_dict)
@stable pecan :white_check_mark: Your eval job has completed with return code 0.
{'a': [1, 2, 3], 'b': [4], 'c': [], 'd': [2, 4]}
the |= operator will join them all
oh i see
and other dicts can have a key in one dict be a value in theirs
this covers this situation too
i would say
i see i see
you can just do:
for a_dict in iterable_of_dicts:
master_dict |= a_dict
yeah i just can loop through all the json files i have in a folder
and then you can convert master_dict into a graph
regarding the issue of duplicate pairs
networkx will just ignore it right
i dont really know if theres duplicates but theres potential to have it
keys are unique in a dictionary, so if you have the same key it will overwrite a previous one
oh i see
okay
ok i'll give it a shot then
one last thing
as you can see these 2 are the same functions
but the cursed json file outputs it differently for different dicts
i assume networkx is unable to find that they are the same
and i have to perform clean up on the json files
yeah, you can clean it up with regex or maybe modify how you're generating the json to begin with
yeah i have to clean up then alright
is it fair to ask about dictionary clean up then here?
well i would just comb through the raw json files and make sure everything is in the same format with some regex
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.boolean_mask._apply_mask_1d",
"tensorflow.python.ops.gen_math_ops.prod",
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.reshape",
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.concat",
"tensorflow.python.framework.ops.convert_to_tensor",
"tensorflow.python.framework.tensor_shape.as_shape",
"tensorflow.python.framework.tensor_util.constant_value",
"<builtin>.ValueError",
"tensorflow.python.framework.ops.name_scope",
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.shape"
],```
unfortunately its a bit harder
sorry if its a mess
this is one of the dictionaries in the json
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.reshape",
"tensorflow.python.framework.ops.convert_to_tensor",
you can see these two so i need to do some editing to the values
yes, but you can automate that with regex
to get array_ops.reshape
and ops.convert_to_tensor
oh can i?
i dont know like every value to look for though
i also need to change the key itself
that doesn't matter
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.boolean_mask": to array_ops.boolean_mask
really?
let's see if i can figure out a regex pattern for it
thank you so much
feel like i have to relearn regex everytime
i'll try to look around too
yeah i feel you
heres one if you want to test
my previous idea was
see if tensorflow exists in the value name,
extract depending if theres \\ or . in the name
but if regex works then its much easier
theres also the issue of changing the key pair i saw i had to make a new key which is a pain too
can i just use the last part of all the strings --- or do you need the preceding modules
give me a sec let me compare
"tensorflow.python.framework.fast_tensor_util.AppendObjectArrayToTensorProto" -> "AppendObjectArrayToTensorProto" or "fast_tensor_util.AppendObjectArrayToTensorProto"
the latter
"tensorflow.python.framework.tensor_util.constant_value"
"Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util.constant_value",
from these two i want tensor_util.constant_value
there's no way to tell a module from a larger directory unfortunately
yeah so is this idea all i have?
very messy splitting
cause i noticed the names either have \\ or not
ok
!e
import re
pattern = re.compile(r'(?:\S*\\)*(.+)')
m = re.match(pattern, "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util.SlowAppendBFloat16ArrayToTensorProto")
n = re.match(pattern, "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util.ExtractBitsFromBFloat16")
print(m.group(1), n.group(1))
@stable pecan :white_check_mark: Your eval job has completed with return code 0.
tensor_util.SlowAppendBFloat16ArrayToTensorProto tensor_util.ExtractBitsFromBFloat16
ooo
it seems to work
!e
pattern = re.compile(r'(?:\S*\\)*(.+)')
m = re.match(pattern, "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.expand_dims_v2")
n = re.match(pattern, "tensorflow.python.ops.gen_array_ops.list_diff")
print(m.group(1), n.group(1))
@wheat flare :white_check_mark: Your eval job has completed with return code 0.
array_ops.expand_dims_v2 tensorflow.python.ops.gen_array_ops.list_diff
for every value of a key i can run this regex then and just rewrite
on works for the \\ patterns
i was trying to run it on the json text, but i'm failing
might be easier to use on keys to the dictionary and just create a new dict
or find someone better at regex
i dont mind rewriting
so i just perform your regex as i iterate through it?
all the keys and their values
yeah, but still need some way to turn "tensorflow.python.ops.gen_array_ops.list_diff" into gen_array_ops.list_diff it's not obvious to me how you'd do --- perhaps you need a list of directory prefixes to ignore
possibly generated from your filesystem
i was thinking a horrible way
search the list for tensorflow
"tensorflow.python.ops.gen_array_ops.list_diff"
then just split it by . and append last 2
its a very big assumption that they're all represented like that
well i need a condition if your regex fails now
god this method is horribly messy..
i need to go through every list and make a new list then assign a new key
Rough psuedocode
1. for every key, go through the value (a list)
2. for every value, go through each of its element
3. perform filtering and append to a list (regex and whatever method i said up there) for the values
4. perform filtering on the key too
5. assign them together in a new dictionary and output a new json file
then i will obtain a gigantic dictionary of dictionaries of lists which will be converted to a graph using this ``` https://networkx.org/documentation/stable/reference/generated/networkx.convert.from_dict_of_lists.html
Please validate and thanks a lot for helping
how does networkx handle this btw? i guess its fine
it's all just strings to networkx
if the strings are different then they're different nodes
@stable pecan very sorry but after im done with whatever cursed thing i come up with, can you validate this step?
Sorry if you answered it already my head is a bit cluttered
yes you would use from_dict_of_lists
@stable pecan im so sorry for pinging again, but i have been testing a graph example on network x
f = open(r"C:\Users\User\Desktop\Work\SMU\Capstone\Datasets\callgraph\TensorflowPythonCallgraphs\tensor_util.json")
graph = json.load(f)
G = nx.from_dict_of_lists(graph)
for i in list(G.edges):
print(i)
said json
why does it have an edge from a "leaf" to a node
there you go, create_using= just pass nx.DiGraph
f = open(r"C:\Users\User\Desktop\Work\SMU\Capstone\Datasets\callgraph\TensorflowPythonCallgraphs\tensor_util.json")
graph = json.load(f)
G = nx.from_dict_of_lists(graph, create_using=nx.DiGraph)
for i in G.edges: # edges is iterable already, don't need to make a list first
print(i)
sorry for pinging im working on that filtering we made
thanks again
seemed to work
i have an idea to merge the different keys that are actually the same
wouldnt it just be overwritten when i combine them?
heres the code if you're curious its very messy
pattern = re.compile(r'(?:\S*\\)*(.+)')
old_dict = json.load("Some json file")
for key in old_dict:
value_list = []
#Check if the key contains tensorflow, if it does we want to filter it
if "tensorflow" in key:
#Assumption: the tensorflow keys have either \ or not, filter accordingly
#Example: Desktop\Work\tensorflow-master\tensorflow\python\framework\tensor_util._is_array_like -> tensor_util._is_array_like
if "\\" in key:
#Use that regex filtering to perform it
cleaned_key = re.match(pattern, key)
new_key = cleaned_key.group(1)
#Else split it with the dots and get the last 2 parts
#Example: tensorflow.python.util.nest.flatten -> nest.flatten
else:
parts = key.split(".")
new_key = ".".join(parts[-2:])
#Do the same on the values
for uncleaned_value in old_dict[key]:
if "tensorflow" in uncleaned_value:
if "\\" in uncleaned_value:
cleaned_value = re.match(pattern, uncleaned_value)
new_value = cleaned_value.group(1)
else:
parts = uncleaned_value.split(".")
new_value = ".".join(parts[-2:])
value_list.append(new_value)
elif "tensorflow" not in uncleaned_value:
value_list.append(uncleaned_value)
i have something similar but no regex, and doesn't look for "tensorflow" but instead just splits the keys, labeling them by the suffix, and preferring the . ones over the // ones:
In [6]: keys = [
...: "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.boolean_mask._apply_mask_1d",
...: "tensorflow.python.ops.gen_math_ops.prod",
...: "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.reshape",
...: "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.concat",
...: "tensorflow.python.framework.ops.convert_to_tensor",
...: "tensorflow.python.framework.tensor_shape.as_shape",
...: "tensorflow.python.framework.tensor_util.constant_value",
...: "<builtin>.ValueError",
...: "tensorflow.python.framework.ops.name_scope",
...: ]
In [7]: key_suffix = { }
...: for key in keys:
...: if "\\" in key:
...: split_key = key.split("\\")
...: split_key[-1:] = split_key[-1].split(".")
...: if split_key[-1] in key_suffix: # Key already in dict
...: continue
...: key_suffix[split_key[-1]] = split_key
...: else:
...: split_key = key.split(".")
...: key_suffix[split_key[-1]] = split_key
...:
In [8]: key_suffix
Out[8]:
{'_apply_mask_1d': ['Desktop',
'Work',
'tensorflow-master',
'tensorflow',
'python',
'ops',
'array_ops',
'boolean_mask',
'_apply_mask_1d'],
'prod': ['tensorflow', 'python', 'ops', 'gen_math_ops', 'prod'],
'reshape': ['Desktop',
'Work',
'tensorflow-master',
'tensorflow',
'python',
'ops',
'array_ops',
'reshape'],
'concat': ['Desktop',
'Work',
'tensorflow-master',
'tensorflow',
'python',
'ops',
'array_ops',
'concat'],
'convert_to_tensor': ['tensorflow',
'python',
'framework',
'ops',
'convert_to_tensor'],
'as_shape': ['tensorflow', 'python', 'framework', 'tensor_shape', 'as_shape'],
'constant_value': ['tensorflow',
'python',
'framework',
'tensor_util',
'constant_value'],
'ValueError': ['<builtin>', 'ValueError'],
'name_scope': ['tensorflow', 'python', 'framework', 'ops', 'name_scope']}
oh wow
after they are filtered you can rejoin them
this does assume there are no function names that are the same, hopefully that's true
interesting
let me try to understand them
mine kinda works already let me show output
New key: tensor_util
Old list is:
['tensorflow.python.util.tf_export.tf_export', 'Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util._generate_isinstance_check', '<builtin>.frozenset']
New list is:
['tf_export.tf_export', 'tensor_util._generate_isinstance_check', '<builtin>.frozenset']```
for "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util": [ "tensorflow.python.util.tf_export.tf_export", "Desktop\\Work\\tensorflow-master\\tensorflow\\python\\framework\\tensor_util._generate_isinstance_check", "<builtin>.frozenset" ],
another interesting idea is just to have the nodes in your graph be the suffix of the keys (the bare function or whatever) and have a dictionary that provides the modules the functions come from elsewhere
thats another alternative
you can save these properties on the nodes themselves in networkx
i'll visit that if my current one fails
it seems to work well i hope?
well its rather hard to understand, always hard to read toher people's code
im going off the assumption that all the keys are unique
the last idea would be simplest to implement, gimme a sec for an example
my method seems to work nicely, im curious about your method, always good to learn
In [9]: from collections import defaultdict
...: filtered_keys = defaultdict(dict)
...:
...: def clean_key(key):
...: if "\\" in key:
...: *_, last = key.split("\\")
...: *_, last = last.split(".")
...: filtered_keys[last]["directory path"] = key
...: else:
...: *_, last = key.split(".")
...: filtered_keys[last]["moduled path"] = key
...:
...: for key in keys:
...: clean_key(key)
...:
In [10]: filtered_keys
Out[10]:
defaultdict(dict,
{'_apply_mask_1d': {'directory path': 'Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.boolean_mask._apply_mask_1d'},
'prod': {'moduled path': 'tensorflow.python.ops.gen_math_ops.prod'},
'reshape': {'directory path': 'Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.reshape'},
'concat': {'directory path': 'Desktop\\Work\\tensorflow-master\\tensorflow\\python\\ops\\array_ops.concat'},
'convert_to_tensor': {'moduled path': 'tensorflow.python.framework.ops.convert_to_tensor'},
'as_shape': {'moduled path': 'tensorflow.python.framework.tensor_shape.as_shape'},
'constant_value': {'moduled path': 'tensorflow.python.framework.tensor_util.constant_value'},
'ValueError': {'moduled path': '<builtin>.ValueError'},
'name_scope': {'moduled path': 'tensorflow.python.framework.ops.name_scope'}})
networkx doesn't know the difference, anything that's hashable can be a node
but you can just feed networkx these suffixes
and keep this table to look up the paths
or attach the paths to the nodes after you build the graph
np
Hi I'm new to Python. I'm a freshman college student hoping to learn AI/Data Science. I'm looking for a laptop, and would a RTX 3050 GPU do me good?
idk, sometimes they need 3 gpu for the AI.
Probably a better fit for the #data-science-and-ml channel.
man i feel like a newb for doing {**dict1,**dict2} to join dictionaries lol
{**dict1,dict2}
i m ean
arg it keeps killing the asteriks
i lost some values using |= and im not sure why
The asteriks one also makes me lose values
Hm
like some keys no longer have the values associated with it
nwm
|= means what?
calls __ior__
🖐️
For a |= b it means a = a | b
!e
a = 0b101
b = 0b011
print(f"a = {a}")
print(f"b = {b}")
a |= b
print("a |= b")
print(f"a = {a}")
print(f"b = {b}")
@trim fiber :white_check_mark: Your eval job has completed with return code 0.
001 | a = 5
002 | b = 3
003 | a |= b
004 | a = 7
005 | b = 3
Things are getting real here
i want to imporve my dsa is there any free good source for that?
Hello 🙂 There are some resources linked in the pins of this channel.
ok thankyou
it seems nice
x = some_dict.pop(k)
some_dict[k] = None```
Is this amortized O(1), or does needing to resize the insertion ordered array make it amortized O(n)?
for _ in range(n):
x = some_dict.pop(k)
some_dict[k] = x```
Would that just be O(n), or would it be O(n * size_of_dict)?
Repeately popping and setting the same element?
I believe it's amortized O(1), because dicts simply don't shrink their tables
maybe at all, maybe unless they get really empty
i don't think they ever shrink
!e
import sys
d = {}
print("pure", sys.getsizeof(d))
d[0] = None
print("one element", sys.getsizeof(d))
n = 8
for i in range(n):
d[i] = i
print(f"{n} elements", sys.getsizeof(d))
for i in range(n):
value = d.pop(i)
d[i] = value
print(f"after loop", sys.getsizeof(d))
@trim fiber :white_check_mark: Your eval job has completed with return code 0.
001 | pure 64
002 | one element 232
003 | 8 elements 360
004 | after loop 640

dict never shrinks only grows more powerful
!e
import sys
d = {}
print("pure", sys.getsizeof(d))
d[0] = None
print("one element", sys.getsizeof(d))
n = 8
for i in range(n):
d[i] = i
print(f"{n} elements", sys.getsizeof(d))
for _ in range(1000000):
value = d.pop(0)
d[0] = value
print(f"after loop", sys.getsizeof(d))
@fervent saddle :white_check_mark: Your eval job has completed with return code 0.
001 | pure 64
002 | one element 232
003 | 8 elements 360
004 | after loop 640
It’s decreasing in size
So it’s O(n * size_of_dict)?
No wait, it’s just never increasing in size
!e ```py
import sys
d = {}
for n in range(10000):
d[n] = None
print(sys.getsizeof(d))
for n in range(10000):
del d[n]
print(sys.getsizeof(d))```
@fervent saddle :white_check_mark: Your eval job has completed with return code 0.
001 | 295000
002 | 295000
what are you asking is O(n) or O(n*dictsize)?
Wow
the search time of a dict item?
This
pretty sure its O(1) no?
getting an item is avg O(1) and setting an item too iirc when using hashes
python by default uses hashes for dicts, not red-black-trees or anything
But I was thinking about it resetting the iterated array
Because it fills with empty space
space or time complexity?
Time complexity
well yh thats bound to n for sure
!e py import sys d = {} print("pure", sys.getsizeof(d)) d[0] = None print("one element", sys.getsizeof(d)) n = 8 for i in range(n): d[i] = i print(f"{n} elements", sys.getsizeof(d)) for n in range(1000000): value = d.pop(n % 2) d[n % 2] = value print(f"after loop", sys.getsizeof(d))
@fervent saddle :white_check_mark: Your eval job has completed with return code 0.
001 | pure 64
002 | one element 232
003 | 8 elements 360
004 | after loop 640
I don’t get how is this not increasing the dict’s memory linearly. The insertion ordered array should be building up empty space
If it never decreases size, like this suggests
I don’t get it
Does it only account for the size of the hashed into array, not the consecutively filled array?
I think you have to know the arity of the operators
How would you know what args to give the - here 4 3 1 - +
How to separate a string of input characters into 2 substrings and save each of them in a variable.
input_text_to_check = "ElectrikVocal95#9525: what are we talking about?"
I need to create a condition regex that if what are we talking about? appears in the sentence(patron) then enter the condition...
Something like this:
regex_patron = r"\s*\¿?(?:what we were talking about | what we talked about before | what we talked about)\s*(about|)\s*((?:\w+\s*)+)?"
l = re.search(regex_patron, input_text_to_check, re.IGNORECASE)
if l:
Inside that if statement I would need to divide the sentence between the username and what the sentence really is, and to save each substring in a variable
Like this:
user_name = "ElectrikVocal95#9525" #without the ":"
text = "what are we talking about?"
I hope you can help me with this.
does igraph have a similar version of this?
my computer froze when i tried using networkx on a 20k node graph
networkx can handle 24k nodes and 52k edges right?
please ping if you want to discuss or soemthing
Hello. Is this channel about algorytm channel?
Awnser me if anyone is here...
:(

Yes. About algorithms and data structures.
Thanks!
hey @stable pecan i managed to fit my graph into networkx but it froze my entire computer
just want to ask if these pings are annoying
also wanted to ask about this question because i think you're one of the very few people here who know about these
up here
from what i understand they seem to look the same
networkx vs igraph
neat it has a from_networkx thing
i hate going through 2 things but i cant figure out how to make it load a json file
it says unknown format
please ping if you see this and would like to discuss
2. making it accept json or a dictionary of lists, cant seem to find it in the python documentation```
this is a pair thats in the json i fed to networkx -> converted to igraph
so im not sure whats wrong
yes nodes in igraph are always integers
same with graph-tool
you can store node names as properties of the node though
these are c-libraries wrapped in python so they are very good for large graphs though
Oh it doesnt do that automatically?
no
Oh boy
what you can do, if you want
is create your networkx graph, and use nx.relabel to get all integer nodes
but you'll want some list of nodes labels by index
int_to_labels = dict(enumerate(dict_of_lists))
labels_to_int = {v, k for k, v in int_to_labels}
new_graph = {labels_to_int[k]: [labels_to_int[node] for node in v] for k, v in dict_of_lists}
can also do this to not go through networkx
Oh if i want to feed the json directly to igraph ?
Either way it ends at the same result right
Im fine with using libraries
it will be faster not going through nx for this many nodes
networkx is dict-of-dict-of-dict structure for graphs, so it's a huge memory hog for large graphs
Can it reliably handle 20k nodes
it can reliably handle any nodes, if you have infinite memory
I can try it on a better computer but with like 32gb memory but its hard to say if it'll handle huh
sounds like plenty
This performs the steps you mention here?
This
this takes your dict of lists and creates two intermediate dictionaries that convert integers to labels and vice versa
Oops sorry didnt turn off ping
then another dictionary that's the relabeled graph
technically, labels_to_int could just be a list
would be more efficient
Then ill feed both to igraph or something?
Which would be this
i'm not sure if it has a dict_of_lists constructor, but you can create an iterable of tuples from the dict in any case
Assume its this i guess
So steps are
1. Use the code up there to generate new_graph and feed to networkx.
2. If it doesnt, create an iterate of tuples from the dict to feed
New graph should have this already?
Maybe ill just look for a better pc >< this is a lot of work
no the new graph doesn't have this already
In [3]: dict_of_lists = {
...: "zero": [],
...: "one": ["one", "two", "three"],
...: "two": ["three"],
...: "three": ["zero"],
...: }
...: int_to_labels = list(dict_of_lists)
...: labels_to_int = {label: i for i, label in enumerate(int_to_labels)}
...: new_graph = {labels_to_int[key]: [labels_to_int[node] for node in value] for key, value in dict_of_lists.items()}
In [4]: print(int_to_labels, labels_to_int, new_graph)
['zero', 'one', 'two', 'three'] {'zero': 0, 'one': 1, 'two': 2, 'three': 3} {0: [], 1: [1, 2, 3], 2: [3], 3: [0]}
here is an example
you have string labels in dict_of_lists
you make a list of the labels called int_to_labels
you make a dictionary of labels to integers named labels_to_int, and then you can use this last dictionary to relabel the entire graph
the last dictionary is all integers
if you need to know what label an integer node is referring to, you can look it up in int_to_labels
In [5]: int_to_labels[0]
Out[5]: 'zero'
I see, from there though int to labels only gets the keys right of the dict
yep, this assumes that all the labels will appear as keys in the dict
Ah there are unique labels in the values too
then should add them as keys
Ohh
Well i need to look at my json to figure that out it may be already like that
Ok i mostly understand what you do, how would that work for igraph now?
I guess we're just preprocessing the data
once you've converted everything to integer, you can make an igraph graph with it
Yes then i store node names as properties?
yes
The python version of this
Hm does this mean i have to iterate the whole dict and manually construct the graph then
Since it doesnt seem to accept the whole input
Well i cant find it from here
Sorry the documentation is rather confusing
Sounds like a lot of iteration
Going through every node and creating it then assigning after
In [39]: dict_of_lists = {
...: "zero": [],
...: "one": ["one", "two", "three"],
...: "two": ["three"],
...: "three": ["zero"],
...: }
...: int_to_labels = list(dict_of_lists)
...: labels_to_int = {label: i for i, label in enumerate(int_to_labels)}
...: new_graph = {labels_to_int[key]: [labels_to_int[node] for node in value] for key, value in dict_of_lists.items()}
...: g = ig.Graph(); g.add_vertices(len(int_to_labels)); g.add_edges(((u, v) for u, out in new_graph.items() for v in out))
...: g.vs["label"] = int_to_labels
In [40]: print(g)
IGRAPH U--- 4 5 --
+ attr: label (v)
+ edges:
1--1 1--2 1--3 2--3 0--3
In [41]: g.vs[0]["label"]
Out[41]: 'zero'
it's undirected, use directed graph
g = ig.Graph(directed=True)
is all you have to do
Ah i see
G.vs label just saves node properties
Interesting
Well saves it to the node under label
yeah, you can attach arbitrary properties to nodes or edges
Ill try to work it on a small file first
Thanks a lot
May i ask if im bothering with the pings
it's fine
I try to not do it too much but its confusing
there's not a lot of people that know all the graph libraries here
i've worked with them all at some point
well, "all" -- i've worked with nx, igraph, and graph-tool
Thanksss
Anything i should look out for or any extra tips?
Probably gonna have issues cause i need to learn it
a lot of it is just playing around in an interpreter and making sure the graph is doing what you want
After this is labeled i can try
i have a file somewhere that unifies the graph interface for all three libraries
Well the get shortest path stuff
Using names
print(g.get_shortest_paths("one", to="zero") )
I assume
Then itll be [one, three, zero]
oh yeah, https://gist.github.com/salt-die/ac6b7e75df258bdd4cff6e259ee50909 -- this was a bit of an experiment to create a single Graph that could use any library underneath, i didn't finish it
Wewww that looks neat
Salt die
Pepper
Then its againn
Salt die and then pepper
You guyz are not gonna give chance to anyone else to write : 😅 😂
One final thing before i go mess around, would this be correct then
Ill do this in a help channel next time
i think what will happen is ig will give you a list of Vertex objects and you can get the label from each of these
Hmm so i need to know the number associated to the label then
Guess i can just get it that way
more like, labels_to_int dictionary was created for this
g.get_shortest_paths(labels_to_int["one"], to=labels_to_int["zero"])
Ahh
Alright
Thabks a lot sorry for hogging the channel
Ill do it in a help channel next time
@south gorge sorry for taking the whole channel
Hey no problem i was just joking
I mean this had alot of serious in it
from itertools import product as prod
lst2=['d','e','f']
lst1=['a','b','c']
lst=[lst1,lst2]
list(prod(lst1,lst2))
('a', 'e'),
('a', 'f'),
('b', 'd'),
('b', 'e'),
('b', 'f'),
('c', 'd'),
('c', 'e'),
('c', 'f')]
How do I correct the code if I want to just pass 'lst' in list(prod(lst)) and get the same output instead of whats written
Like, pass a single list that contains all of the individual lists within itself.
Btw, the output is just the list itself if I try to pass just 'lst'
@stable pecan Hi sorry to bother you but I saw the conversation earlier about graphs and I've never used a graphing library before could you explain to me if it would be better for storing/creating a simple graph with node - connections than a dict would be
graph = {
"node_name": [children]
}
typically graph libraries already have algorithms made for things like shortest path or largest connected component
and other than that wouldn't it be even more efficient to create a node class that has reference to its connections?
so if you need to do any algorithmic thing with your graph, probably better off using the library
depends -- some libraries are actually written in c, so the graphs are smaller than anything you could make in python
networkx has large graphs though, because it's a pure python library
really?
yes, graph-tool and igraph both wrap c
I know it would probably be faster but dont understand why it would take less memory
because c objects are often contiguous memory
I dont think that would work for creating a graph though no?
you would need pointers either way
you don't need, you can just resize when adding nodes, but i don't know how they're implemented - just that it's possible
hmm but then I guess the maximum size would be smaller since it would require a fully empty block in memory?
whereas pointers as long as there is memory available it can keep expanding the graph and would also be faster to add connections dynamically?
so I guess its more about use case then if I know the graph wont ever change I could create it as a contiguous block otherwise use pointers to add nodes dynamically
both graph-tool and igraph don't allow arbitrary objects for nodes though, nodes can only be ints, so you can specialize your data structure here
ahh
and the nodes are always contiguous
so you cant have like 0, 5, 19 as your only nodes
do you know what kind of structure they use to create the graphs?
no idea, graph-tool wraps boost library, and igraph wraps something else
can just read their documentation
aight thx for the info appreciate it 🙂
https://igraph.org/ igraph docs
Hi guys, I'm looking for a good library for Trees(RBTree, BTree, AVLTree)
i implemented avl and binary trees in a library, i dunno a general tree library though
sacks/sets/avl_tree.py lines 55 to 63
class AVLTree(BinarySearchTree):
"""
A self-balancing binary search tree.
Notes
-----
This version of an AVL tree allows multiple of the same item to be inserted.
"""```
Thanks
is there a better way of finding the "length" of a number in binary other than len(bin(n)) - 2 for n > -1? by length i mean this ```
length(0b1010) == 4
length(0b000000010) == 2
^^^^^^^
Leading 0s ignored
can always take the log base 2 of the number
In [6]: from random import randrange
...: from math import log, ceil
...:
...: for _ in range(10):
...: c = randrange(100000)
...: print(c, len(bin(c)) - 2, ceil(log(c, 2)))
...:
19575 15 15
66502 17 17
13819 14 14
72240 17 17
17658 15 15
63511 16 16
49062 16 16
97095 17 17
43991 16 16
81209 17 17
Also, int.bit_length
!eval ```py
for i in range(10):
print(i, bin(i), i.bit_length())
@keen hearth :white_check_mark: Your eval job has completed with return code 0.
001 | 0 0b0 0
002 | 1 0b1 1
003 | 2 0b10 2
004 | 3 0b11 2
005 | 4 0b100 3
006 | 5 0b101 3
007 | 6 0b110 3
008 | 7 0b111 3
009 | 8 0b1000 4
010 | 9 0b1001 4
This comes in handy if you want to get an approximate log base 2 without importing math (for some reason).
It would probably be better to test this with eg c = int(10 ** random.uniform(0, 5)) rather than c = random.randrange(100000), so that you get a nice distribution of bit-lengths 🤔
kinda unlucky he didn't get any that weren't 5 digits lol
Ah, well it's because 9/10ths of the numbers in range(10_000) are 5-digits long.
at least a little bit unlucky then ¯_(ツ)_/¯
A little bit 😄 It has a 35% chance of happening.
ah i found it, its int(log2(x)) + 1
source: https://math.stackexchange.com/questions/1508902/given-a-number-how-to-find-the-length-of-its-binary-representation
this isn't exactly right
if log2 is an integer, you'll be one too high
isn't it just ceil? not floor + 1?
!eval ```py
import random, math
def f(x):
return len(bin(x)) - 1
def g(x):
return int(math.log2(x)) + 1
def test(x):
assert f(x) == g(x), f"{x}: {f(x)} != {g(x)}"
for _ in range(104):
x = int(2random.uniform(0, 5))
test(x)
for x in range(1, 10):
test(x)
print('Pass!')
you did log 0
log(0)
@keen hearth :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 14, in <module>
003 | File "<string>", line 10, in test
004 | AssertionError: 9: 5 != 4
Ah yeah 
- 2
!eval ```py
import random, math
def f(x):
return len(bin(x)) - 2
def g(x):
return int(math.log2(x)) + 1
def test(x):
assert f(x) == g(x), f"{x}: {f(x)} != {g(x)}"
for _ in range(104):
x = int(2random.uniform(0, 5))
test(x)
for x in range(1, 10):
test(x)
print('Pass!')
@keen hearth :white_check_mark: Your eval job has completed with return code 0.
Pass!
huh, interesting
Might not work for large numbers though.
this isn't a great random test because x is always a power of 2
random.uniform gives a float though
although it didn't tnecessarily test the case salt-die mentioned, what if it is a power of two
Ah yeah, the int is done after the exponentiation.
Some more thorough testing: #bot-commands message
How many bits of precision do python floats have?
they're doubles, so...53?
Ah right, so maybe we should test 2**54 and numbers around this.
that wouldn't work for any method involving math.log though, since it converts it to a float, right?
Yeah, found a bug! 😄 #bot-commands message
actually, what happens when you convert a really big int into a float. does it just round down?
oh, overflow
duh
Just realised we haven't tested negative numbers 👀
For which I think len(bin(x)) - 2 is wrong anyway, because theres a - sign.
yeah
log solution is wrong too
In [20]: cmath.log(-1234, 2)
Out[20]: (10.26912667914942+4.532360141827194j)
you have to discard the imaginary part
and you can't even choose which branch cut to use
that would give 11 right? which is correct?
yes, after you discard the imaginary part
but this doesn't work:
In [22]: int(cmath.log(-1234, 2)) + 1
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-92bdd2d35a8e> in <module>
----> 1 int(cmath.log(-1234, 2)) + 1
TypeError: can't convert complex to int
So, all in all, I think it's safest to go with int.bit_length 😄
Goes to show, always test at or and around the edge cases.
I'll have to review complex numbers to understand how this works 🤔
Well data structures are just how data is represented in computer programs. The way that you represent data can have a dramatic effect on how efficiently you can process that data.
For example, if you represent a list of numbers as an un-sorted array, to find out if a number is in that list, you have to check all of the numbers one-by-one.
Yes
But if you stored those numbers in Eg a sorted list, or a binary-search tree, you could narrow in on where the number should be by repeatedly ruling out half of the numbers.
Even better, if you store them in a hash-table, you can to straight to where the number should be, and check if it is there.
U mean this will just reduce some time?
You could start at the first page and go page-by-page until you find the page number you're looking for. Or you could go to the middle of the book, see whether the page number is less than or greater than the one you're looking for, then either look in the left-half or the right half, then repeat.
Sorry, I started typing that out so had to finish 😄
Yeah, so have you used a relational database before?
I haave used
Have you heard of "indexing a column"?
I have experience with both SQL and non SQL db
Yep I have heard about it before but not sure as I used SQL many days ago or months
If you index a column of a table, the database builds an auxiliary data-structure (generally a tree or a hash-table), which allows you to quickly look up rows of the table based on the value of that column.
Like primary key does?
Yep, I think when you specify a column as the primary key most databases automatically index that column.
Well maybe but as much as I know primary key is the special value of a cell which cannot be repeated in the table once more and thats why it helps the db to find it faster
As it is the only one with that value in the table
Or I should say a row
I always get confused in row and column lol
Ah yeah, but under the surface, the thing that actually makes it faster is that the database builds an index for the primary key.
Ah I get it
So how does data structs help to make it faster
Well if for example the index is a hash-table, the database can go straight to that row of the table. It doesn't have to scan through the rows of the table to find it.
Dictionaries in python are also hash tables.
So we need a hash table so thaat we can stop scanning all the rows
Can u plz explain me what are hash tables
Sry if I am irritating u by pinging and asking again and again
Erm, no problem, I'm in this channel anyway so I don't get the pings.
And I get like 100 pings a day anyway as a mod 😄
Lol
hash tables are tables made out of shredded potatoes
What!
Wth was that!
if you fry them they become hash brown tables
Yeah. So a hash-table stores key-value pairs. You can use a key to look up its associated value. The hash table uses the key itself to decide where to store it.
hash tables are really a reserved chunk of memory -- then keys are hashed which gives you which chunk in memory the key goes
and hamt is just bacon
if you have too many keys and a small table, your hash table will have poor performance!
I am still confused maybe a example will be helpful
have you used a python dictionary before?
Well yes but not too much
python dicts are hashmaps
Can u plz just code a simple example and tell me which one is the hash or whatever u call it
Like, we could implement our own hash-table in python: ```py
buckets = []
NUM_BUCKETS = 10
for _ in range(NUM_BUCKETS):
bucket = []
buckets.append(bucket)
def hash(key: str) -> int:
"""Tells you which bucket a given key should be stored in."""
# We just add up the ordinals of the characters of the key
# then take the remainder when you divide that by the number
# of buckets.
result = 0
for char in key:
result += ord(char)
return result % NUM_BUCKETS
def add(key: str):
"""Adds a key to the hash table."""
if contains(key):
return
# We want to add the key to a specific bucket.
bucket = buckets[hash(key)]
bucket.append(key)
def contains(key: str) -> bool:
"""Tells you whether the hash table contains the given key."""
# We only need to look in one bucket for the key (not all of them).
bucket = buckets[hash(key)]
return key in bucket
``` This example of a hash-table doesn't have values, just keys. (In practice, you would just use dict or set in python, you wouldn't implement your own hash table. If you were writing a program in C, you might implement your own.)
@steady maple 
I am reading it
Remember that algorithms and data structures are abstract concepts.
You can implement the same data-structure in any language.
@steady maple #bot-commands message
My implementation is not a particularly good implementation of a hash table, but is just to illustrate the concept.
Well its pretty confusing to me what it does as I am new to this data struct and algo
I saw it in bot cmds and it was printing something
Tbh, it would probably be best to learn this from a book or lecture. These concepts need to be learned carefully to understand them.
Well yea this is really confusing
the buckets represent different slots in memory and the hashing is a way to pick which slot a key goes in
Well isin't hashing the way by which websites store user data and password in their db encrypted
Yep, that's a related concept!
yes, that's one use of hashing
In that case they want the hashes to be evenly distributed for security reasons (to make it hard to guess the password from the hash).
In this case we want the hashes to be evenly distributed so that the data is evenly distributed in the table.
The key idea is that the hash function tells us where to look for a given key.
Sorry for mixing different meanings of the word "key" in one sentence 😄
U mean hash just updates the data and make it something like a primary key ?@keen hearth
So that the db gets it faster
This image from Wikipedia kind of illustrates it:
Hash-tables are an algorithm. Databases are one place where they are implemented, to make looking up rows by primary key faster.
Yep, but notice that the hash function tells us exactly where to look for certain data.
Otherwise, we would have to go through the table row-by-row to find it.
The hash function just points us straight to it.
Any data can be converted into a string of bytes, then you just hash that string of bytes.
Can u tell me how to convert something into bytes am I supposed to use ioBytes or any other method
I just mean in general, that's one way you could hash any kind of data.
Ok
all data is just bytes afterall
We don't talk about implementation details in this channel 
Python has a built-in hash function.
What!
But python also has a built-in hash-table called dict.
You would generally use dict rather than using hash directly.
Where and how to get that
dict?
ah right 😄
hash just takes an object and turns it into a number. It is what dict, a key-value hash-table, uses to decide where to store things.
!eval print(hash('hello'))
@keen hearth :white_check_mark: Your eval job has completed with return code 0.
5560446419581122223
Ah I got it
Example of a dict: ```py
my_dict = {
'Steve': 20,
'Amanda': 25,
}
yeah, most ints just hash to themselves
But what in case I want to get the hashed value back @keen hearth
I'm sure hash is meant to be one way
Anyway, I'd recommend picking up a book or online-course in algorithms to learn more.
There are some recommended resources in the pinned messages of this channel, and on our website:
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Neo4j client has builtin option for that as far as I remember 
yeah it has a pretty good renderer too iirc
A commonly used tool is Graphviz: https://graphviz.org
Graphs are specified in its DOT language: https://graphviz.org/doc/info/lang.html
What format is the graph in to begin with?
graphviz is great but if you need something easy and fast check https://csacademy.com/app/graph_editor/
it's really helpful in cp
Ooo nice!
!eval print(hash('hello'))
@steady maple :white_check_mark: Your eval job has completed with return code 0.
8854791851944303749
!eval print(hash('hello'))
@steady maple :white_check_mark: Your eval job has completed with return code 0.
7179474724399652171
LX help
Is it possible to get the hashed value back?
Ah, I think hashes aren't necessarily the same between runs of Python.
the db doesn't use the same hash function as python
python's hash function was mostly meant for speed
@knotty magnet Can u suggest me a hashing algo that can do it?
i think most databases will use BTrees actually
rust uses siphash for their hashmaps so that's probably fine
wonder what python's hashing algorithm is named 🤔
Hello! I have a question regarding matplotlib. As someone suggested, I might as well ask here. This is the message link with the specifics, feel free to ping if you have any solution!
#help-peanut message
plt.plot(x, y)
fig, ax = plt.subplots()
This plots a plot on a new figure, then creates a second figure with one subplot, which you apply all the formatting to. I'm not sure why you need subplots at all for only one plot, but either way, you should be calling it before plotting, and plot on the Axes object it provides you, like ax.plot(x, y).
!eval print(1 + 1)
@keen hearth :white_check_mark: Your eval job has completed with return code 0.
5
Wait what @elfin river
Just kidding 
nothing to see here.. just adding bits together.
Looking to see if anyone has any ideas for approaches to this as I'd be interested in hearing things.
So my problem is that I have a file that's too big to load into memory and I need to randomise the lines in the file. How would you approach it?
I've got a couple ideas related to splitting the file and then merging files, a little like splitting a deck of cards in half and then merging.
One of my mates also suggested I could do passes over the file, noting a line and then doing an in place swap. Then repeating this process.
But I'm curious if there's a better way / alternatives
Since Im sure there's a good way to do it. I just don't know what it is
One of my mates also suggested I could do passes over the file, noting a line and then doing an in place swap. Then repeating this process.
The lines are all the same length?
If they are then you could do something like this in "r+" mode. You can kind of index it using seek
Seems like splitting it up into files small enough to fit in memory, shuffling each individually, then randomly merging them back together would a reasonable way to do it. I don't think you can do an in-place swap, because, as mesolikey pointed out, the lines would have to be the same length.
Yea there not same length unfortunately
numpy question:
I have a viewing frustum that is represented by 6 planes (a plane is 4 floats: a,b,c for the normal, and d for the distance: ax+by+cz+d = 0). I have N elements that I want to render. each element has an axis-aligned bounding box represented by a bounding_box_min vector3 (x,y,z floats) and a bounding_box_max.
I'm trying to figure out which elements are completely outside the viewing frustum by checking where their bounding box lies on each plane.
This is my regular, procedural python code:
def is_bbox_contained_or_intersects(self, bbox_min: Vector3, bbox_max: Vector3) -> bool:
for plane in self._planes:
bbox_x = bbox_max[0] if plane[0] > 0 else bbox_min[0]
bbox_y = bbox_max[1] if plane[1] > 0 else bbox_min[1]
bbox_z = bbox_max[2] if plane[2] > 0 else bbox_min[2]
dot = plane[0] * bbox_x + plane[1] * bbox_y + plane[2] * bbox_z
if dot < -plane[3]:
return False
return True
self._planes is a list of the 6 planes (a plane is a Tuple[float, float, float, float]). Vector3 is Tuple[float, float, float]
I'm running this function per element.
How can I vectorize this with numpy?
np.where
for the first 3 lines
np.dot for the dot product
then just vectorised comparison
how would the np arrays look like though?
(I'm a complete numpy novice, apologies for the silly questions)
nah they're not silly
this is an interesting question
so you wanna make self._planes an array
a 6x4 array
so it would be a 6x4?
okay
and I'm going to assume
you want to stack the bounding boxes
which will make them Nx3 arrays?
where N is the number of elements to be rendered
so bbox_min and bbox_max will be Nx3, sure makes sense
that seems like the main axis of vectorisation to me
yeah
that's the sketched outline
of the solution
if you don't get it in a few hours ping me again here or in #data-science-and-ml
I'm not on my coding computer
thanks!
much appreciated
another numpy question - are there good type hints for numpy arrays and methods?
if you call:
seven(times(five()))
the definition for seven is:
def seven(f=None):
return 7 if not f else f(7)
let's substitute in times(five()) for f:
seven(times(five())) = 7 if not times(five()) else times(five())(7)
but what's times(five())?
times(five()) = lambda x: x * five()
what's five()?
five() = 5 if not f else f(5) = 5
going back up the chain:
times(five()) = lambda x: x * five()
times(5) = lambda x: x * 5
and finally:
seven(lambda x: x * 5) = 7 if not (lambda x: x * 5) else (lambda x: x * 5)(7) = (lambda x: x * 5)(7) = 7 * 5 = 35
not really.
AFAIK?
I could be wrong about this
haven't done much with numpy for at least a year
what kind of type hints
it could be nice to type hint the shape of ndarrays
are you looking for
oh
that would require dependent types
which we don't have in Python
@brave oak I mean the python type hinting: https://docs.python.org/3/library/typing.html
@brave oak regarding np.where, I'm not sure what my condition is for the 6x4 planes array vs. the bbox_min + bbox_max arrays?
yeah, so
two arrays of different shapes
are of type np.ndarray
how do you represent the shape?
keep in mind that shapes differ based on the values in the array
in other words, we have types depending on values
this is called dependent typing
not many languages have it
I see
okay
so
planes is 6x4, right
I really wanted it just as a comment for "this ndarray is of shape X at the moment"
ok, yeah
planes > 0 will also be 6x4
you don't need the distance, though
so you can take planes[:, :3] > 0
that gives you 6x3
now, you want a coordinate from bbox_max or bbox_min depending on whether the corresponding vavalue in plane is True or False, respectively
free meaning?
like not taking time based on the amount of values in it
ok sorry, I was distracting your main point
this is where
you use np.where
because np.where is basically
np.where(c, x, y) -> x if c else y
but broadcasted
np.where(planes[:, :3] > 0), bbox_max, bbox_min)
that's literally it?
well, no
because you want to broadcast it across all the bounding boxes
so the dimensions need to line up
basically, what that means is that each axis needs to "mean" the same thing
for example
right now a bbox is (3,)
and planes is (6, 3)
so the first axis of planes means "number of planes"
but the first axis of the bbox means "coordinate" (x, y, or z)
and this in fact corresponds to the second axis of planes
you see what I mean?
consider further
yes
that if you want to vectorise across bboxes
you need a third axis
because you can't "mix" that with either axis in planes
since, again, that third axis "means" something different
got all that?
I don't get the third axis
okay
one bbox is 3 coordinates
N bboxes
are Nx3
let's say
so in general
ugh
man I should have stuck with M
you have P planes, each of which is 3 points (ignore distance)
therefore Px3
planes[:, :3] is a view, but planes[:, :3] > 3 is not view --- numpy will create a new boolean matrix
yup, that's true
such manipulations always create new arrays (basically)
now (focus on the mins), you have B bounding boxes, each of which is also 3 points
so Bx3
you can manually call the functions with the out= parameter to reuse arrays if you want
i'm derailing though
yeah, simplest example is np.add
I don't know what the equivalent is for comparison actually
the "3" in both cases means "number of coordinates"
so when using np.where between two arrays
one of which represents the planes
and the other of which represents the bounding boxes
that needs to be the same axis
so we have one axis accounted for
next, we have the P axis (number of planes) and the B axis (number of bounding boxes)
these two mean different things
therefore they should not line up
ok so for that I reshape planes[:, :3] so the coordinate axis will be first
yes
yeah
that's about it
I need to go
but
play around with it
it's a good experience
🙂
OK, no idea how to combine the different number of planes with the different number of bboxes but I'll try 🙂
thanks!
@brave oak OK I'm kinda lost with shaping the planes array to support the np.where call so the condition, bbox_min and bbox_max are broadcastable. still trying though
@brave oak
If I'm not mistaken, if bbox_min and bbox_max are each 1000x3, and planes is 6x4, then:
planes_normal = all_planes[:, :3]
planes_greater_than_zero = planes_normal > 0
bbox = np.where(planes_greater_than_zero[:, None, :3], bbox_max, bbox_min)
Now bbox is 6x1000x3 which means, for each of the 6 planes, for each element, what is the x,y,z that I want to check for the bbox.
Now I'm having trouble with doing the dot product between planes_normal and bbox 🙂
my brain is currently fried, sorry
but
that doesn't seem so right..
the resultant shape should be 6, 1000, 3, right
the bbox array is correct, I think. I checked some values.
but I can't figure out yet how to do a dot product between the 3 in bbox (which is 6x1000x3), and the 3 in planes_normal (which is 6x3) while "Sharing" the 6 at the beginning.
The result of the dot product should be 6x1000, a dot product per plane, per element
np.tensordot(planes_normal, bbox, axes=(1,2)).shape
Out[120]: (6, 6, 1000)
I don't get the extra 6 here
