#data-science-and-ml
Looking for input on how to make the following snippet run more efficiently:
model = SentenceTransformer('efederici/sentence-bert-base')

def embedding(x):
    return np.transpose(np.expand_dims(model.encode(x), axis=-1))

def compute_title_similarity(row):
    source_title = row['linkSource']
    target_title = row['linkTarget']
    # embedding for strings
    source_title_embedding = embedding(source_title)
    target_title_embedding = embedding(target_title)
    # compute cosine similarity between the two titles
    return pairwise.cosine_similarity(source_title_embedding, target_title_embedding).item()

df_links['title_similarity_score'] = df_links.progress_apply(compute_title_similarity, axis=1)
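One hedged suggestion for the snippet above: most of the time is probably spent calling model.encode once per title, so a faster version would encode each column in a single batched call and then compute the row-wise cosine similarity with numpy. SentenceTransformer.encode does accept a list of strings; rowwise_cosine, the batch_size value and the demo arrays are made up for illustration so the block runs without downloading the model.

```py
import numpy as np

def rowwise_cosine(a, b):
    """Cosine similarity between matching rows of two (n, d) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den  # note: an all-zero row would divide by zero

# Intended use with the snippet above (not run here, it needs the model):
# src = model.encode(df_links['linkSource'].tolist(), batch_size=64)
# tgt = model.encode(df_links['linkTarget'].tolist(), batch_size=64)
# df_links['title_similarity_score'] = rowwise_cosine(src, tgt)

# tiny made-up "embeddings" just to show the function works
demo_src = np.array([[1.0, 0.0], [1.0, 1.0]])
demo_tgt = np.array([[2.0, 0.0], [1.0, -1.0]])
print(rowwise_cosine(demo_src, demo_tgt))
```

If many titles repeat across rows, encoding only the unique titles and looking the vectors up afterwards would likely save even more time.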
I was planning on making a simulation on prey vs predator
Collectively, predators (red) chase prey (green). Both have a speed and energy, and energy depletes over time. Predators need to eat to split; prey just have to survive.
and I was planning to use matplotlib to plot this. How would I go about writing the code after I've made the classes for the basic prey and predator stuff?
I mean the hidden layers and stuff.
what are you simulating? usually this sort of thing is a genetic algorithm
are you basing this off of something?
like are you trying to do something like this?
https://www.youtube.com/watch?v=tVNoetVLuQg
yea but on a smaller scale and slightly different
well it doesn't seem like he used hidden layers at all. It looks like the neural networks in that video are all single layer nets
each entity will need its own neural network
and every time they reproduce you'll need to randomly tweak the weights of the offspring by a certain amount
for the neural nets I recommend pytorch
I was trying to do it from pyGad.
This is beautiful...I wonder how difficult it would be to make it on PyGame... 
So, after making those separate entities, what would I have to do?
I've never used pyGad but it looks like it would also work
my guess is it would lag a lot
but I'm sure it would work on a small scale
Aaw...
This made me want to make some kind of AI x AI battle simulation, like some Real Time Strategy game
you would make a function that passes the input data to the entities, which passes that to their neural networks, which outputs their action
then you'd code them to take that action
which input layers should I make for the same?
the video guy had each entity fire several rays at different angles in front of them. Each ray's hit distance would be an input neuron
he's also got neurons for whether the hit object is prey/predator
What does it output?
it outputs a 2-vector
the first number in the vector is the speed it moves forward that turn
the second is angular velocity, which I'm pretty sure is how much it turns that turn
@thick seal consider this example
```py
import torch
import torch.nn as nn
import torch.nn.functional as F

neural_network = nn.Linear(6,2)
ray1 = 5.0
hit_type1 = 0
ray2 = 5.0
hit_type2 = 0.0
ray3 = 2.0
hit_type3 = 1.0
input_tensor = torch.tensor([[ray1,hit_type1,ray2,hit_type2,ray3,hit_type3]])
output = neural_network(input_tensor)
print(output)
```
if you have 3 rays pointing forward at different angles and they project 5 units forward
let's say 2 of them didn't hit anything
so their distance is 5
and their hit_type is 0, indicating no hit
but the third one hit a prey at distance 2
then you put all six values in a vector and pass it to the neural network
which outputs 2 values based on its random weights and biases
does that make sense?
Yea it does, but what does torch.tensor do?
pytorch doesn't work with regular python lists
cause they're mad slow
so you need to convert your values to a pytorch tensor, which is probably optimized to be really fast
Ok, and those will have the inputs for all the entities right?
no, each entity will have its own neural net and its own set of inputs and outputs
As a 2d list?
you have a class structure right?
Yes
you'd have a Prey class, and they would each have a neural_network variable
For prey and predators? yes.
yeah same for the Predator class
and you'd loop over all of them and call their neural network function
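A rough sketch of that structure, in plain Python so it runs without pytorch or PyGAD: each entity owns its own single-layer network, the loop calls it with that entity's six inputs, and reproduce copies the weights with a small random tweak. The Entity base class, the tanh squashing, the missing bias term and the mutation scale of 0.1 are all assumptions for illustration, not what the video author did.

```py
import math
import random

class Entity:
    """Hypothetical base class: one tiny single-layer network per entity."""
    N_INPUTS = 6   # 3 rays x (distance, hit_type)
    N_OUTPUTS = 2  # (forward speed, angular velocity)

    def __init__(self, rng, weights=None):
        self.rng = rng
        if weights is None:
            weights = [
                [rng.uniform(-1.0, 1.0) for _ in range(self.N_INPUTS)]
                for _ in range(self.N_OUTPUTS)
            ]
        self.weights = weights

    def act(self, inputs):
        # weighted sum per output, squashed to (-1, 1) with tanh
        return tuple(
            math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in self.weights
        )

    def reproduce(self, scale=0.1):
        # the child gets a copy of the weights with gaussian noise added
        child_weights = [
            [w + self.rng.gauss(0.0, scale) for w in row]
            for row in self.weights
        ]
        return type(self)(self.rng, child_weights)

class Prey(Entity):
    pass

class Predator(Entity):
    pass

rng = random.Random(0)
entities = [Prey(rng), Predator(rng)]
inputs = [5.0, 0.0, 5.0, 0.0, 2.0, 1.0]  # same six values as the example above

for entity in entities:
    speed, turn = entity.act(inputs)
    print(type(entity).__name__, round(speed, 3), round(turn, 3))

child = entities[0].reproduce()
```

In a real simulation each entity would get its own inputs from its own rays every step, rather than the shared list used here.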
Why is the list a 2d?
[[ray1,hit_type1,ray2,hit_type2,ray3,hit_type3]]
oh that's a pytorch thing. layers like nn.Linear are built to take a batch of inputs, so the extra brackets make it a 2d tensor of shape (1, 6): a batch with a single row
so in truth it's just one vector wrapped in a batch of one
if you're doing it the way the guy in the video did it, you'll only need vectors, but pytorch will still insist you put them all in tensor form
So I've to make the neural_network which takes those tensors then returns a tuple of speed and angular velocity?
yes. you'll need to code the raycasting too
So I would need to keep track of coords in the specified classes.
exactly
But how does the ray casting work?
If I got 2 entities nearby How can I check which ray was intersected?
what graphics thing are you using? pygame?
I don't think matplotlib is great for real time animations
well pygame doesn't have built in raycasting
but it does have built in rect collision
so what you could do is simply make a pygame rect with a 2 pixel width
and check if that's colliding with anything
you can do rect collision like this
https://www.geeksforgeeks.org/adding-collisions-using-pygame-rect-colliderect-in-pygame/
then you'll just need code to make the rectangle rotate with the prey/predator
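If rotating rects gets awkward, one alternative (a different technique from the rect trick above, and not something pygame provides) is to treat every entity as a circle and compute the hit distance with plain geometry. The function name and its parameters are made up for this sketch; angle is in radians.

```py
import math

def ray_circle_distance(origin, angle, center, radius, max_len):
    """Distance along the ray to the circle, or None if the ray misses it."""
    dx, dy = math.cos(angle), math.sin(angle)      # unit direction of the ray
    ox, oy = center[0] - origin[0], center[1] - origin[1]
    t_mid = ox * dx + oy * dy                       # centre projected onto the ray
    d2 = ox * ox + oy * oy - t_mid * t_mid          # squared distance centre-to-line
    if d2 > radius * radius:
        return None                                 # the line passes beside the circle
    half = math.sqrt(radius * radius - d2)
    t = t_mid - half                                # nearest intersection
    if t < 0:
        t = t_mid + half                            # ray starts inside the circle
    if t < 0 or t > max_len:
        return None                                 # behind the origin or out of range
    return t

print(ray_circle_distance((0, 0), 0.0, (3, 0), 1.0, 5.0))
print(ray_circle_distance((0, 0), 0.0, (0, 3), 1.0, 5.0))
```

Calling this for every ray against every nearby entity and keeping the smallest distance would give the ray's input value, and the type of the closest entity would give the hit_type.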
what do you mean by no borders?
like if you go off one end of the screen what should happen?
It would come on the other side of the screen
it would be very tricky to make it continuous, but you could just have it so that once you touch the edge you teleport to the other edge
although I could see the algorithm learning about that and abusing it
you might ask questions like that in #game-development since there might be people there who know more about that
There's an Animator Class in matplotlib which allows you to make animated plots
Can you give me an example of it? Can I make the space continuous with that?
Maybe you could make an if statement where if an agent escapes through an extreme, it appears at the other extreme of the window
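A minimal sketch of that wrap-around idea, assuming positions are plain x, y numbers and the window is width by height (both names are made up here). Python's % operator already handles negative values, so no if statement is needed.

```py
def wrap_position(x, y, width, height):
    """Teleport a position that left the window to the opposite edge."""
    return x % width, y % height

# an agent that went 5 units past the left edge and 10 past the bottom
print(wrap_position(-5, 610, 800, 600))
```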
what would `df[df['column_of_interest']]` return btw?
nothing, because bracket mismatch
"A tensor with rank 0 can be represented by a scalar"
Can someone explain what this means? I only know tensor is a 3 or more dimensional array
that's not really what a tensor is
aa
the word "rank" is also not uniquely defined.
a tensor in general is simply a linear transformation, possibly multilinear.
when rank is used to describe the number of "dimensions" (this is also not what dimension really means), then rank 1 means vector, rank 2 is a matrix, rank 3 is a 3-way array, and rank 0 is a scalar
Yea... this is also what the website stated
when rank is used to describe the dimensionality of a vector space (this is the usual definition of dimension), then rank 0 means it is the zero vector. this is probably not the definition that whatever you're reading is using
But I struggle to really understand it
what about it troubles you?
(and again, the number of "dimensions" has nothing to do with it being a tensor)
ahhh
I'm pretty lost right now... um can I start back from the basics
I thought vectors are 1D array, matrix are 2D arrays and tensors are 3D or more arrays
no
But using this, it seems to contradict with what you stated here
yeah almost all of that is wrong
Ahhhh
a vector is an element of a vector space satisfying the 8 axioms
a matrix is a linear transformation between vector spaces given a basis for each of the vector spaces
a tensor is also a linear transformation, more formally being independent of the basis
for example, polynomials can also be vectors
matrices and tensors can be vectors too
vectors can also be tensors 😛
these are just completely separate things in general, and sometimes they overlap
sadly learning about tensors WELL requires a fair amount of familiarity with abstract linear algebra
but let's take it a step back
let's assume by vectors you mean ordered tuples of real or complex numbers
so elements of R^n or C^n
Umm.. do you mind explaining in a little more detail? For example what are "axioms", what's a "linear transformation", what's a "vector space", etc.?
Linear algebra... I mean I did graduate from highschool taking maths and addmaths... but these are on a completely different level
ok if you know nothing let's just take a step back and make a working definition that will get you through this
again, let's consider vectors as collections of real or complex numbers
e.g. [1,2,3]
Yupp that I understand
a matrix can transform this vector into another vector

in particular, it applies a "linear transformation". this is important
a linear transformation has 2 properties
say we have a transformation T, and the vectors x and y
and say we have scalars a and b
hold on what's a scalar?
then T(x + y) = T(x) + T(y), and T(a*x) = a*T(x) means that T is linear
a scalar is an element of a field... which is to say, it's a number for which nice operations are defined. in common cases, a scalar is a number which can be multiplied by a vector in a simple way, and it can also be added and multiplied by other scalars
in the case of R^n and C^n, scalars are just real or complex numbers
let's take again [1,2,3]. if we take the scalar 3, we can multiply it by the vector. 3*[1,2,3] is normally computed as [3,6,9] in R^n
you can also do (3 + 5)[1,2,3] and this will have nice properties like being equal to 3[1,2,3] + 5[1,2,3]
and a couple more properties. objects that satisfy those properties are called scalars. in the case of R^n and C^n the name is clear, as multiplying by them simply "scales" all of the elements of whatever they multiply
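The two linearity properties and the scalar rules above can be checked numerically. This is only a sketch: T is taken to be multiplication by a made-up random matrix M, and shift is a deliberately non-linear counterexample; neither comes from the discussion itself.

```py
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))  # made-up matrix, only for illustration

def T(v):
    # multiplying by a fixed matrix is a linear transformation
    return M @ v

def shift(v):
    # adding a constant is NOT linear
    return v + 1.0

x, y = rng.normal(size=3), rng.normal(size=3)
a = 3.0

additive = np.allclose(T(x + y), T(x) + T(y))        # T(x + y) = T(x) + T(y)
homogeneous = np.allclose(T(a * x), a * T(x))        # T(a*x) = a*T(x)
shift_additive = np.allclose(shift(x + y), shift(x) + shift(y))
print(additive, homogeneous, shift_additive)          # True True False

v = np.array([1, 2, 3])
print(3 * v)                                          # [3 6 9]
print(np.array_equal((3 + 5) * v, 3 * v + 5 * v))     # True
```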
(by the way I'm reading and I take notes to visualize and digest it, keep going)
the tensor part gets more abstract. one usually thinks of tensors as being multilinear
earlier we saw T(x) and studied what needed to be true in order for T to be linear
now let's make a new transformation W, this time acting on 2 vectors
let's say W(x,y)
if we keep y constant, we can ignore it and focus on W(x). similarly we can keep x constant and consider W(y). if both W(x) and W(y) are linear, then W(x,y) is said to be bilinear
this generalizes to arbitrarily many vectors, which is called "multilinearity"
as it happens, T(x) and W(x,y) can ALWAYS be written as matrices in R^n and C^n for finite n
if T(x) = Mx, where M is a matrix, then M is the matrix that applies the linear transformation T
similarly if we make a tensor N so that W(x,y) = N _*_ (x,y). the problem is that _*_ is now not uniquely defined
matrix multiplication always works the same way. tensor products don't
you can define them arbitrarily
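Pandas has two data structures: a dataframe and a series.
So the above code uses df['column_of_interest'] as a boolean mask and returns a pandas data frame with the rows where it is True (it only works like that if the column holds booleans). A one-column data frame comes from df[['column_of_interest']], and if you hadn't wrapped df['column_of_interest'] with an additional df[] it will return a pandas series.
It's better understood when you try it out yourself. Remove the outer df[] and run it to see what it returns.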
however, you can always turn a tensor into a matrix with clever reindexing
in that case, a matrix can be a tensor. it's a (multi)linear transformation, and multiplication is defined as usual
when you keep the tensor as a "rank r" array, or as you called it, a 3 dimensional array, multiplication can be done in several ways and it's up to you to implement it correctly. people usually resort to einstein notation to make this clear
again, that 3 d array can be unfolded into a matrix if you like
now coming back to the rank
a rank 3 array has "3 dimensions"
a rank 2 array has 2
you can reshape tensors from one rank to another (with few restrictions)
nothing stops you from flattening a 3 d array into a long vector or a matrix
if you're careful, they can still represent the same original (multi)linear transformation
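To make the unfolding concrete, here is a small numpy sketch. N is a made-up 2 x 3 x 4 array standing in for a bilinear map W(x, y); the index pattern 'ijk,j,k->i' is one choice of contraction among several, which is exactly why einstein notation is handy. Unfolding N into a 2 x 12 matrix and pairing it with np.kron(x, y) gives the same numbers.

```py
import numpy as np

rng = np.random.default_rng(0)
N = rng.normal(size=(2, 3, 4))   # made-up "rank 3" array
x = rng.normal(size=3)
x2 = rng.normal(size=3)
y = rng.normal(size=4)

def W(x, y):
    # contract the 2nd axis of N with x and the 3rd axis with y
    return np.einsum('ijk,j,k->i', N, x, y)

# linear in x when y is held constant (same idea holds for y)
print(np.allclose(W(x + x2, y), W(x, y) + W(x2, y)))

# unfold N into a matrix: rows stay, the (j, k) pair becomes one long axis
N_unfolded = N.reshape(2, 12)
w_tensor = W(x, y)
w_matrix = N_unfolded @ np.kron(x, y)
print(np.allclose(w_tensor, w_matrix))
```

The reshape works here because numpy flattens the last two axes in the same order that np.kron lays out the products x[j] * y[k].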
one of the restrictions is rank 0
rank 1 is 1 column
rank 2 lets you have columns and rows (a matrix)
rank 3 is like a "cube" of entries
rank 0 is no rows, no columns, no nothing. just a single number
which in R^n and C^n, as we said previously, is a scalar
(again, be careful not to confuse this definition of rank with the one related to vector space dimensionality and linear independence)
i think that's about the shortest explanation you can get
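In numpy terms, this "number of dimensions" sense of rank is the ndim attribute, and the linear-algebra sense is a different function altogether. A quick sketch with made-up arrays:

```py
import numpy as np

examples = {
    "scalar": np.array(7.0),              # rank 0: a single number
    "vector": np.array([1.0, 2.0, 3.0]),  # rank 1
    "matrix": np.zeros((2, 3)),           # rank 2: rows and columns
    "cube": np.zeros((2, 3, 4)),          # rank 3
}
for name, arr in examples.items():
    print(name, arr.ndim, arr.shape)

# the other meaning of rank: number of linearly independent rows/columns
ones = np.ones((3, 3))
print(ones.ndim, np.linalg.matrix_rank(ones))
```

The last line shows the two meanings disagreeing: a 3 x 3 matrix of ones has two "dimensions" as an array but linear-algebra rank 1.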
ahh
Hold on @wooden sail are you free rn? Or do you prefer another time to like discuss about this further? I have quite a few questions about the explanation you provided...
i have about 15 minutes to spare
Alrightt so um first question, what do you mean by R^n and C^n?
I understand it in a mathematical way, where R power n and C power n
like exponentials stuff
not power, it represents cartesian products
it's a formal way of describing ordered tuples of numbers
isn't cartesian like the graph cartesian?
for example the tuple [a, b, c] has 3 real numbers
not quite
each of a, b, and c is real number
so all the possible values this vector can take are represented by the cartesian product R x R x R (one copy of R for each of a, b, c)
equivalently written as R^3
it's related to cartesian coordinates in the sense that any cartesian coordinate system CAN be generated and described as a cartesian product, sure
but not quite equivalent. cartesian products are a more general notion from set theory
right...
but for this, what do you mean by "all possible values this vector can take"?
well, a can have infinitely many values
yea
so can b and c
yup
I see... so "R" represents real number
.latex technically $\mathbb{R}$, but i was lazy
ahh I see
.latex $\mathbb{R}^3$, at that
.latex we can say that $a,b,c$ are real numbers more succinctly as $a,b,c \in \mathbb{R}$. similarly for complex numbers via $z \in \mathbb{C}$
oops that was meant to be a C, not a Z. Z are integers lmao
What are complex numbers? I've seen it a few times but never understood it
ok, nevermind that, don't worry about it for now
alright um
for this, does it mean:
all possible values the vector [a, b, c] can take = cartesian product of a x b x c = R^3?
yeah
I see.. I get the gist of it but I can't really make sense of it though...
(roughly, but let's leave the set notation aside for now as well)
especially the cartesian products
yup
take the sets A = {1, 2, 3} and B = {4, 5}. the cartesian product of the sets A and B, A x B, is the set of tuples {(1,4), (1,5), (2,4), (2,5), (3,4), (3,5)}
there's no other tuple you can make out of A and B when considering pairs of elements taken from them
right... I see
B x B would be (4,4), (4,5), (5,4), (5,5)
R x R is any possible tuple of 2 real numbers
So basically cartesian product of two sets is basically all the possible values when two sets "combine" in a way?
yep
I see...
that's good to know because it's a fairly common operation to have to do. not only when working with numpy, just in general when coding
hence why the itertools library has a function for cartesian products
I see...
!e
import itertools
x = ['beep', 'boop']
y = [1,2,3]
for tup in itertools.product(x, y):
    print(tup)
oof
@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.
001 | ('beep', 1)
002 | ('beep', 2)
003 | ('beep', 3)
004 | ('boop', 1)
005 | ('boop', 2)
006 | ('boop', 3)
there we go
In a way, for R^n, n is basically the number of real numbers?
right. that's how long your vector is
you usually see vectors specified as
.latex $x \in \mathbb{R}^n$, or also matrices $M \in \mathbb{R}^{n \times m}$
mhm
and the notation says it's a nesting of cartesian products, pretty much
like R^{2 x 3} is (R x R x R) x (R x R x R)
I see...
but doesn't that actually mean R^6?
nope, not the same thing
ahh... I see
you CAN make them equivalent through isomorphisms
but that depends on what kind of object you work with
when treated as vectors, yeah, the isomorphism is simple
as linear transformations, you're restricted by dimensionality
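A small numpy sketch of that distinction, using a made-up 2 x 3 array: as a bag of six numbers it reshapes to R^6 and back without losing anything, but only the 2 x 3 shape can act as a linear transformation from R^3 to R^2.

```py
import numpy as np

m = np.arange(6.0).reshape(2, 3)   # an element of R^{2 x 3}
flat = m.reshape(6)                # the same six numbers viewed in R^6
back = flat.reshape(2, 3)          # and back again, nothing lost
print(np.array_equal(m, back))

v = np.ones(3)
print(m @ v)                       # the 2 x 3 shape maps R^3 to R^2
```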
right... hold on um
in general cartesian products are not associative nor commutative
for this, what do you mean by "scalars are just real or complex numbers"?
I mean, R^n and C^n, they're just cartesian products? How does scalar come into play here?
i mean that for the canonical vector space R^n, the scalars are real numbers
since s*[a,b,c] = [sa, sb, sc] if all of a,b,c,s are real numbers
how many epochs are enough for transformers?
What do you mean by canonical vector space?
Yep I can understand this
the vector space R^n with the usual elementwise vector addition and multiplication of vectors by real numbers, which satisfy distributivity and associativity
that's the canonical vector space R^n, denoted as "R^n over R"
ah... I
Hold on @wooden sail you probably have to leave soon, do you know of any websites or any reliable source that I can use to research on all the above you mentioned?
I'm starting to get the gist of it but it's too different from what I've learnt about quite literally anything so far
most websites that cover these topics from the coding standpoint are plain wrong
😭
It's alright then... I'll dig as much as I can until you're available again. Do you know approximately when you'll be available again?
https://en.wikipedia.org/wiki/Real_coordinate_space you can try reading here
anywhere that explains the math should be fine
ooo alright, I'll check it up
this looks very friendly https://www.math.toronto.edu/gscott/WhatVS.pdf
can anyone help me on this?
on pytorch
can somebody help me 😢to connect neurons like this , that red marked lines for example. i want to connect like that (from scratch, not using any ai libraries)..
you're trying to set up the matrices for this? there's more than one valid parametrization, depending on how many trainable parameters you want to have
just say the name of this type of neural network
sometimes they call this intra-layer or lateral connection (the one pointing down in the same layer), and it can be rewritten in a way that it still has a feed forward structure
the other red arrow is trivial, that's a usual feed forward connection
idk if the overall thing has a name
got it, i got confused, will brush up concepts once again
i think its recurrent...
i thought of mentioning recurrent networks, but just given your drawing, there was no way to tell whether that was the case
since one has to specify that the node specifically depicts a value from the prior input
Anyone got a dataset for detecting images of diamonds?
Or a pretrained one specifically for diamonds?
wait..
but i want to connect only some of them... recurrent connects all
which is why i said there isn't really a name for it
mm
https://www.tutorialspoint.com/artificial_neural_network/artificial_neural_network_quick_guide.htm i think it can be here
i am asking because i want to create a brain-like network and test if it's conscious... i am making it based on my own theory
it doesn't have to have a special name for you to implement it
the math in the link you shared is enough to do it, sure
i would recommend using numpy for this, though if you'll involve training, maybe using jax or pytorch would be better. i like jax because it keeps the numpy syntax but allows autodiff
mmmm
i just sent a friend request
and it will take some time to design my network in paint, i will send it later... @wooden sail
Hi, I'm a physics student doing a small-ish coding project using python and I'm struggling.
It's about using scipy.stats to model a general covid19-like disease and its spread.
I have to recreate a graph that's provided to me.
This is an examined project, so I'm not asking for someone to do it for me, just need direction and help.
Is this the right server to be asking for this sort of stuff?
hmm you can try asking here, sure. alternatively you can try the mathematics server, the one with a torus as logo
should I make a thread in python help, since slowmode is enabled here
or is the slowmode just to prevent spam
i think this channel is your best bet. indeed, it's to discourage spam
Hmm, thank you for the help, but it appears all the images are from the front. You don't possibly know a source with pictures from different angles?
What is the job title typically for someone who deploys models into production? Not just training stuff but using cloud sdks
MLops engineer?
s, but i think it helps
s?
I’ve used it but I don’t know its syntax by heart, sadly
i need help brother asap
Say that you’d rather use a better framework, ha
Never
need some visualization thats all
it is but I dont have much time
to understand and process it
I know what insights I need to derive, the problem is the syntax
@steady basalt if you could help, that would be life saving moment for me
😦
I’m sorry but as I said I haven’t memorised Altair syntax
You have to spend the time going through the docs
Did you have a specific question about a specific syntax?
yes
i want the past and the present trend
using the chemical in a specific year
Syntax : I am clueless how to give the filter and the condition
wanna visualise this one
I’m not actually familiar with past and present trends, what are they?
Current year vs prior years averaged?
yes
Do u have the data
yes
It’s two columns
So you just want to plot a line chart
Well, it’s three; month, current value and past value
Have you been to the line charts page of Altair
yes
the syntax is the problem. I am unable to write the syntax for such a large data set
What if you use matplotlib and just don’t tell anyone
Not opening on mobile phone
lol next to impossible to view on mobile phone
I’m sure the syntax is the same for small data
If your variables are the same
And if you made monthly averages
Maybe try .melt on your dataframe, then give the chart x and y, with colour set to the variable column
Check Altair issues 968 on GitHub
ok
Say I am guessing the price of diamond auctions based on their images.
Each auction has a variable number of pictures; How do I pass this into an image classification?
Can be parallelized, sort of have "dynamic weights" (in most basic neural networks in deep learning, during the forward pass, the weights are held constant, here the output of one part is used as the weights of another), flexible enough that the network can choose to do whatever it wants in the end, the "dynamic weights" (avoiding the term "fast weights" although Schmidhuber will tell you that he invented this) need to add to one due to the softmax.
*Other reasons that are not fully understood yet.
*Avoids other issues of RNNs.
Damn...but just the parallelization can do that much?
I mean, each head acts independently from each other. Only in the end their output is concatenated
Currently, deep learning is mostly a game of "doing more with more", that is, throwing more hardware and data at the problem. If the design can't make good use of the hardware then that is a problem.
Aw... That's a bit lame...
It's still good for other reasons.
I mean...most technologies focus on doing more with less, like automotive engines, fuels...yet deep learning seems to focus on use more and more hardware in order to achieve good results...
(We do more with less, it's our focus and why we don't use deep learning, and by that I specifically mean backprop)
More data is still a problem and always will be though.
Better sample efficiency helps, but only so much.
Yeah, I know that the backpropagation is quite optimized to be viable
But then...I also got the impression that parallelization also helps with the backpropagation, doesn't it? Since the model can be shallow yet efficient
Even solves the vanishing gradients problem...
Too bad that my GAN collapsed after 100 epochs... It was going so well, but the discriminator got too good
Yes, it's all designed around making backprop work.
And making it fit the hardware (GPUs).
Yes, it's very hard to make GANs work.
So sad that, despite so many tricks to make them easier to work, they still end up being trial and error
Even with spectral normalization
Probably not data science but...
Today I got a working render of the Mandelbrot set, in python 3.
I felt it was data science tho so I put it here
I'm trying to replicate a model that I made a few years ago--all the libraries I used at that time have different APIs than they did then. But I'm not completely sure that my new training pipeline is the same as the original. I'm now trying to replicate the evaluation part so I can know for sure if the performance is the same, but the average loss at each epoch is almost the same this time around as it was then. Is that a pretty good indicator that I did it right?
Why wouldn't it be?
I'm too tired to write the evaluation part of the new pipeline, and I don't want to set myself up for disappointment
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
Please don't ask for code reviews of screenshots. Actual text makes it easier for everyone.
'''py
options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--headless") #allows the execution of a full version of the Chrome Browser
options.add_argument("--enable-javascript")
models_status = []
with open('data.json', 'w') as json_file:
    for page in range(1,10): #number of pages to check
        options = webdriver.Chrome()
        #use Chrome as the main browser
        options.get(f"site/{page}")
        bt = options.find_element(By.XPATH, '//*[@id="app"]/div/div/div/div/div[3]/button')
        bt.click()
        #Finds the location of The Age restriction button and clicks it
        time.sleep(10)
        wait = WebDriverWait(options, 30)
        #not necessary to wait that much, but my pc is slow
        bs = BeautifulSoup(options.page_source,'html.parser')
        #get the web page as html string
        divs = bs.find('div', id="app" ) #find where the data is
        string = divs.get_text(" ", strip=True)
        array = string.split()
        #split the data from the html into an array of strings
'''
i mean yeah its pretty slow, i try to parse some data in a json file
Use ``` instead of '''
```py
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("--headless")  # run Chrome without opening a window
options.add_argument("--enable-javascript")
models_status = []
# start one browser with the options above and reuse it for every page
driver = webdriver.Chrome(options=options)
with open('data.json', 'w') as json_file:
    for page in range(1, 10):  # number of pages to check
        driver.get(f"site/{page}")
        bt = driver.find_element(By.XPATH, '//*[@id="app"]/div/div/div/div/div[3]/button')
        bt.click()
        # finds the location of the age restriction button and clicks it
        time.sleep(10)
        wait = WebDriverWait(driver, 30)
        # not necessary to wait that much, but my pc is slow
        bs = BeautifulSoup(driver.page_source, 'html.parser')
        # get the web page as html string
        divs = bs.find('div', id="app")  # find where the data is
        string = divs.get_text(" ", strip=True)
        array = string.split()
        # split the data from the html into an array of strings
```
This isn't a data science question; try #1035199133436354600. And remember that lists are not arrays.
the error message seems to indicate that you're installing a library written in python 2.
actually it looks like it's coming from pip
do you have a really old version of pip??
i don't know how can i find it ?
pip --version
again error
I won't look at any more screenshots of text, FYI. please do actual text.
how did you install python?
srry
(base) C:\Users\HP>pip --version
pip 22.2.2 from C:\Users\HP\anaconda3\lib\site-packages\pip (python 3.9)
i used in conda it worked
glad it worked
looks like your python environment is cursed. probably because of the windows store.
how can i fix it? i'm facing these problems regularly
download python anew from the python website and don't use the windows store
ok i'll get back to you
@serene scaffold i have downloaded a new python latest version from python website now pycharm is showing me this error
PS C:\Users\HP\PycharmProjects\movie-recommender-system> pip install streamlit
pip : The term 'pip' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ pip install streamlit
+ ~~~
+ CategoryInfo : ObjectNotFound: (pip:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Hey guys, I'm working on the first Natural Language Processing project I've ever undertaken where I needed to use the pretrained word vectors from GloVe. I can find the files to download the vectors, but can't find any good documentation for how to actually access the word vectors in Python. Does anyone know of any quality resources and reference materials for using GloVe?
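For what it's worth, the downloadable GloVe files are plain text with one word per line followed by its vector components separated by spaces, so a small loader is often enough. This is a sketch under that assumption: load_glove is a made-up helper, the sample values are invented, and the file name in the comment is just one example from the glove.6B download.

```py
import io

def load_glove(lines):
    """Parse GloVe-style text lines into a dict of word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors

# made-up three-dimensional sample in the same layout as the real files
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
vecs = load_glove(sample)
print(vecs["cat"])

# with a real file (path is an example):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     vecs = load_glove(f)
```

For the larger files, converting each vector to a numpy array instead of a list would probably save memory.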
i'm doing ML project just learned about vectorization 1 hr earlier
are you sure that the python that you're using is the one that you just installed?
@serene scaffold it installed in the terminal, i used py -m pip instead of pip directly
yes because i have uninstalled the previous one
but when i'm trying to run the code an error comes up
PS C:\Users\HP\PycharmProjects\movie-recommender-system> streamlit run app.py
streamlit : The term 'streamlit' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ streamlit run app.py
+ ~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (streamlit:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
PS C:\Users\HP\PycharmProjects\movie-recommender-system> py -m ensurepip --upgrade
Looking in links: c:\Users\HP\AppData\Local\Temp\tmp9q9hg3gx
Requirement already satisfied: setuptools in c:\users\hp\appdata\local\programs\python\python311\lib\site-packages (65.5.0)
Requirement already satisfied: pip in c:\users\hp\appdata\local\programs\python\python311\lib\site-packages (22.3.1)
also upgraded my pip
@serene scaffold it has finally worked
powershell was not installed on my system, so i installed it. my pycharm terminal was giving me red alerts every time i opened it but i ignored them every time
finally it has worked properly
it was administrator problem
This is likely why you're getting the error:
- If you've pip installed Streamlit, ensure your PyCharm is using the same Python interpreter where Streamlit was installed.
- The Python interpreter your PyCharm is using isn't added to PATH. Have you been using PyCharm for your coding before now?
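One quick way to check both points is to run a few lines from inside PyCharm and compare them with the same lines run in a normal terminal. This is only a diagnostic sketch; interpreter_report is a made-up helper name.

```py
import shutil
import sys

def interpreter_report(tool="streamlit"):
    """Show which Python is running and whether pip and the tool are on PATH."""
    return {
        "python": sys.executable,            # the interpreter running this code
        "pip_on_path": shutil.which("pip"),  # None if pip is not on PATH
        "tool_on_path": shutil.which(tool),  # None if the tool is not on PATH
    }

for key, value in interpreter_report().items():
    print(key, "->", value)
```

If the two runs print different interpreters, packages installed from one will not be visible to the other, and running commands as `py -m pip` or `py -m streamlit run app.py` sidesteps PATH altogether.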
Ohh, it's working now 😊 👍
I am trying to split some stacked pdfs using computer vision using basic statistics such as std of column pixel density iteratively to merge pages, but the accuracy is relatively low. What are other ways possible to achieve this?
Basically, the problem tones down to page classification for a document, fully unsupervised.
Lmfao I had that exact error once
Only on windows
so you switched to mac?
Hi guys, what do you guys recommend for a good data analytics course?
I just finished a blogpost about image similarity based on the colors in an image:
https://technicallyruns.com/posts/similarity/
Feel free to ask questions 🙂
Hello guys, I'm making a machine learning model that can predict a movie's genre.
As we know, some movies have multiple genres, so how can I deal with that?
So if it can have 1 genre, you would probably use the softmax activation function for the last layer, which makes it so the results sum to one, and you pick the class with the highest result
But you could also use a sigmoid, which squashes each result to a value between 0 and 1 independently, instead of making all results sum up to 1
This way you are predicting for each genre if the movie has that genre or not
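A small sketch of the difference, using only the standard library. The genre names, the raw scores and the 0.5 threshold are made up for illustration; in a real network the scores would come from the last layer, and the sigmoid variant is usually trained with a binary cross-entropy loss per genre.

```python
import math


def softmax(logits):
    """Scores that compete with each other and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Independent scores, each between 0 and 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


genres = ["action", "comedy", "romance"]  # hypothetical label set
logits = [2.0, 1.5, -3.0]                 # made-up raw outputs for one movie

# Single-label: pick the one genre with the highest softmax score.
single = softmax(logits)
best_genre = genres[single.index(max(single))]

# Multi-label: keep every genre whose sigmoid score clears a threshold.
multi = sigmoid(logits)
predicted = [g for g, p in zip(genres, multi) if p >= 0.5]

print(best_genre)
print(predicted)
```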
@obtuse talon
https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff you can take a look at this article.
there are few different ways for approaching this problem
Thank you so much!
Hey, so, I'm encountering this error. I've been able to do this algorithm multiple times without any issue, but now it seems like refuses to work.
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
For reference, I'm running on a 3090 24 GB GPU
that means that pytorch can't find the proper cuda installation
could be caused by a lot of things, installing the wrong version of cuda or cudnn, a cpu-only version of pytorch, not adding cuda/cudnn binaries to the proper environment variables (there's a CUDA_HOME variable which it uses to find the installations), etc
@bright pasture are you running this in an environment?
Yes, I am.
I tried multiple environments, and it gave me the same result.
conda or some other environment manager?
Conda.
do you have cudatoolkit installed in the conda environment?
How do I know if I have it?
use conda list and see if it's in the package list
is that it?
Yes. Because most of the stuff was installed under pip
And I just checked the list for pip, cuda's not there either.
cuda won't show up under pip, its a conda package
and that should show pip packages as well
what version of pytorch do you have?
torch 1.13.1
try installing cuda: conda install cuda=11.7 -c nvidia
if that doesn't work then try reinstalling pytorch using the command they show on the website (i'd suggest doing this in a new environment so that you don't mess up anything you currently have in yours)
I'm kind of confused about the difference between probability and odds.
If the probability of something occurring is 1/6 - then the odds are:
1/6 ÷ 5/6 = 1/5
So the chance is 20%
But from the probability, I would assume the calculation would be 100/6 * 1 for 16%
I don't see how if you have a 1 in 6 probability of something occurring, the odds are that you have a 1/5 chance.
Does this mean that 20% of the time the 1 out of 1/6 will happen?
Hey does anybody know how to convert a non-ascii character to an ascii char? for ex: Ἁ -> A
I don't quite understand how you would go about doing that given that the majority of non-ascii characters look nothing like ascii characters
for example what would you convert ↔ to
or λ
assuming you already know which non-ascii characters should be converted to which ascii characters you could just use a dictionary to map them
just the ones that have an ASCII similar character
@austere swift im doing some NLP, and I keep seeing characters that are often used in Italian and other European languages that are not part of the ASCII set, so I want to convert them, for ex:
É, Í, ü, and so on
I think unicode/ascii folding might be what you're looking for (https://pypi.org/project/fold-to-ascii/ etc). Not sure on the best library or whatever for this, but it's something to start googling if you haven't heard the term before
usually that's handled by giving those unicode characters their own tokens as well
that way the model will also be able to understand the difference between E and É
since in those european languages, the accents do make a difference in the word, it's important that your model understands the difference between them
spaCy processes them pretty well, but after extracting all the entities I want to get the most popular, but the issue is that I got the same entities with different non-ASCII chars, for example: 'hannibal' and 'hanníbal'
yeah but im extracting all the names from a given Wikipedia page, and keeping the most popular, to count them all I need to first normalize them
Thanks bro, this seems like what im looking for...
You tell python to make all of those characters the other character? It’s a one liner
Dictionary, list, whatever
from the sounds of it, Spacecraft is right and you don't actually need this. E.g names should use consistent diacritics throughout
You process the characters
I suppose it can't hurt to verify that fact though
.
it does not follow that you first need to normalise them from what you say before
Start with A and work ur way through
15 minute job. Then you can convert any text
Instantly
yes, just list every single possible diacritic for every single alphabet under the sun
there's a reason unicode exists
There’s like 5 per letter no
Ok there’s 9 A on my iPhone
I’m pretty sure you have a list of those typed out for you
and your iphone is probably not exhaustive
Not hard to code this lol
Ok so what’s the issue webscraping A’s and assigning A. Same for every other letter
Get a wiki list or whatever
whenever I run my model like so:
epochs = 25
hist = model.fit(dataset, epochs=epochs, verbose=2)
I get the following output:
Epoch 1/25
4/4 - 1s - loss: 0.0000e+00 - accuracy: 1.0000 - 1s/epoch - 332ms/step
Epoch 2/25
4/4 - 2s - loss: 0.0000e+00 - accuracy: 1.0000 - 2s/epoch - 483ms/step
Epoch 3/25
4/4 - 1s - loss: 0.0000e+00 - accuracy: 1.0000 - 1s/epoch - 319ms/step
and so on and so forth
or... use the library that someone else has made for you to solve a complex and repeated problem
can anyone suggest what I might be doing wrong?
I didn’t know one existed, am offering a solution that would take 5-10 minutes
Incase one didn’t exist
the dataset is a keras dataset created via keras dataset from directory
one was linked
@fallow frost some quick googling also led me to this function, which may be easier to use since it's a builtin
before you participated in the conversation
!d unicodedata.normalize
unicodedata.normalize(form, unistr)
Return the normal form *form* for the Unicode string *unistr*. Valid values for *form* are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.
The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).
Awesome, problem solved
you might be accidentally including your labels in your input data
it could be caused by a lot of things though
So, I understand now. Probability is the chance/likelihood of an event occurring. Odds is the ratio of positive outcomes to negative outcomes.
It's kind of weird though because Odds are just another way of representing probability, but it's not connected to what we think of as meaningful => the chance of something happening
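A worked version of the 1/6 example, as a sketch with exact fractions. The two helper names are made up. The point is that 1/5 is a ratio of "happens" to "doesn't happen" (1 to 5), not a 20% chance, and converting it back gives the original 1/6.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds in favour = P(event) / P(not event)."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: odds / (1 + odds)."""
    return odds / (1 + odds)


p = Fraction(1, 6)               # e.g. rolling a specific face of a die
odds = odds_from_probability(p)  # 1 favourable outcome for every 5 unfavourable

print(odds)                         # 1/5
print(probability_from_odds(odds))  # 1/6
```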
!e ```py
import unicodedata
print(unicodedata.normalize("NFD", "à"))
```
@worn stratus :white_check_mark: Your 3.11 eval job has completed with return code 0.
à
that does not do what you think it does - it does what the docs say it does @steady basalt
you can do it with a combo of that plus string encoding though https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string/518232#518232
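A sketch of that combination using only the standard library: decompose each character with `unicodedata.normalize`, then drop everything that is not ASCII by encoding with `errors="ignore"`. The helper name `strip_accents` and the sample names are made up. Note that characters with no ASCII base letter (λ, ↔ and so on) simply disappear, so this only suits the "same name, different diacritics" counting case.

```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Fold accented Latin letters to their ASCII base letters."""
    # NFKD splits e.g. "í" into "i" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are not ASCII, so "ignore" drops them.
    return decomposed.encode("ascii", "ignore").decode("ascii")


names = ["hannibal", "hanníbal", "Émile", "Emile", "Zürich"]
counts = Counter(strip_accents(name).lower() for name in names)

print(strip_accents("hanníbal"))
print(counts.most_common())
```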
If I only had a penny for each time someone recommended this function...
Wdym? I don’t know what it does nor did I claim to
But a simple solution is beautiful soup + python default string and data structure
But you do you friend
.
There are more than one ways to solve a problem
Hello, how can i find a cut-off point so that anything less than x doesn't get included in my dataset? I was thinking of fitting a Gaussian distribution and removing anything below the mean or some number of standard deviations. am i on the right track?
my situation is that the above graph represents the total review count for each business
however, as shown in the graph, i have lots of review counts that are less than 500, so i'm not sure what a good cut-off point is. i want to keep review counts above 1000, since that means the business is popular
well, anything more than 3 std away from the mean is usually a good start. Getting into the math, for normally distributed data about 68% of your results fall within 1 std of the mean, and only about 0.3% fall more than 3 sigma away, so those are much more likely to be erroneous
thank you yes i am trying to implement this right now.
you could also say if the standard deviation is high relative to the scale you are working at. say you have a 5 star system and the standard deviation 2 (indicating its very controversial) you could include that. In the 5 star review system for example you would need more than 20 reviews of a lower error
yeah thanks, do you know any resources where I can learn how to implement std in python? I am trying to visualize it but most resources online use lots of code just to plot, and i dont understand some of the code
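A minimal sketch of both cut-off ideas with the standard library's `statistics` module. The review counts are invented, and `pstdev` is used because it matches what `np.std` computes by default. Review counts tend to be heavily skewed rather than Gaussian, so the plain "keep 1000 or more" filter may well be the more sensible of the two here.

```python
import statistics

# Made-up review counts per business.
review_counts = [12, 40, 55, 90, 150, 300, 480, 950, 1200, 20,
                 35, 60, 75, 110, 130, 200, 250, 400, 700, 9000]

mean = statistics.mean(review_counts)
std = statistics.pstdev(review_counts)  # population standard deviation


def z_score(x):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / std


# Statistical cut-off: flag anything more than 3 std from the mean.
outliers = [c for c in review_counts if abs(z_score(c)) > 3]

# Business-rule cut-off: keep only the popular businesses.
popular = [c for c in review_counts if c >= 1000]

print(outliers)
print(popular)
```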
I was trying to implement Linear Regression and I got this error:
**TypeError Traceback (most recent call last)
tensorflow/python/framework/fast_tensor_util.pyx in tensorflow.python.framework.fast_tensor_util.AppendObjectArrayToTensorProto()
/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/compat.py in as_bytes(bytes_or_text, encoding)
83 return bytes_or_text
84 else:
---> 85 raise TypeError('Expected binary or unicode string, got %r' %
86 (bytes_or_text,))
87
TypeError: Expected binary or unicode string, got 37**
This was my code:
CATEGORICAL_COLUMNS = ['sex', 'children', 'smoker', 'region']
NUMERIC_COLUMNS = ['age', 'bmi']
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = train_dataset[feature_name].unique()
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
print(feature_columns)
def make_input_fn(data_df, label_df, num_epochs=30, shuffle=True, batch_size=32):
    def input_function():
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
        if shuffle:
            ds = ds.shuffle(1000)
        ds = ds.batch(batch_size).repeat(num_epochs)
        return ds
    return input_function
train_input_fn = make_input_fn(train_dataset, test_dataset)
eval_input_fn = make_input_fn(train_labels, test_labels, num_epochs=1, shuffle=False)
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(result['accuracy'])
print(result)
it might help to show the code snippet for where the error occurs
code snippet? and what line is the error on
That is the code @robust jungle@prime hearth
It occurred on this line: ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
np.std
it finds the number required to create the percent breakdowns
so thats a distance
yes thank you, and i guess i meant plotting. i know .std() gives the standard deviation, but it is more about visualizing it. i think i got it, found something online:
```py
means = [-1, 1, 0]
std_values = [0.1, 0.25, 0.5]
plt.figure(figsize=(16, 9))
for mu, std in zip(means, std_values):
    # pdf stands for Probability Density Function; it gives the density at each value in the domain
    probabilities = norm.pdf(domain, mu, std)
    plt.plot(domain, probabilities, label=f"$\\mu={mu}$\n$\\sigma={std}$\n")
plt.legend()
plt.xlabel("Value")
plt.ylabel("Probability")
plt.show()
```
found this code snippet so gonna use it and modify the code using .std() and so forth
hmm okay i not sure why i get this error- is it because the domain or x-axis isnt like in correct order?
@robust jungle (same code as above)
The error occurred on "ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))"
its because for norm.pdf you need a smooth x domain
oh okay, my domain is the column from my dataframe
like np.linspace(np.min(),np.max(),blah)
oh okay thanks il try that
in the last plt.plot
see how you have xline, norm.pdf(xline, np.mean(data),np.std(data))
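A sketch of the "smooth domain" idea without NumPy or SciPy, so it runs on its own: `linspace` here is a hand-written stand-in for `np.linspace`, and `statistics.NormalDist(...).pdf` stands in for `norm.pdf`. The data values are made up and deliberately unsorted, which is what a raw dataframe column used as the x-axis would look like.

```python
from statistics import NormalDist, mean, pstdev


def linspace(start, stop, num):
    """Evenly spaced, sorted points from start to stop (like np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


data = [3.1, 0.4, 2.2, 1.7, 2.9, 0.8, 1.5]  # unsorted, made-up column values

# Fit the normal curve to the data...
dist = NormalDist(mu=mean(data), sigma=pstdev(data))

# ...but evaluate it on a smooth, ordered grid instead of on the raw column.
xline = linspace(min(data), max(data), 200)
density = [dist.pdf(x) for x in xline]

# With matplotlib this would then be: plt.plot(xline, density)
print(xline[:3])
print(max(density))
```

Plotting against the raw column connects the points in row order, which is why the line zig-zags; the grid keeps the x values increasing.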
oh me?
I'm getting this error currently:
Received a label value of 2 which is outside the valid range of [0, 1). Label values: 1 2 2 2 2 2 0 1 1 0 0 0 1 2 1 0
[[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_73146]
my confusion is because I thought sparse categorical crossentropy was supposed to take classes in categorical/int encoding rather than one hot
it can't be above or equal to 1
it looks like
[0,1)
so is it only binary? or am I missing something
no
or they want it normalized?
the confusing part is that when I look it up
it says to input it in integer format
no idea
I dont have any context
but normally things like that are scaling factors compared a maximum value
and 0,1) is implicit because if its 1 it already is the max and the scale factor is not scaling
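For what it's worth, the [0, 1) in that message is the range of valid class indices rather than a scaling requirement: sparse categorical crossentropy does take integer labels, but every label has to be smaller than the number of outputs of the final layer. A range of [0, 1) therefore suggests a last layer with a single unit while the labels go up to 2. That diagnosis is an assumption, since the model definition was not posted. The function below is a pure-Python stand-in to show the relationship, not Keras's implementation.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one sample with an integer class label."""
    num_classes = len(logits)  # one logit per output unit
    if not 0 <= label < num_classes:
        raise ValueError(
            f"label {label} is outside the valid range [0, {num_classes})"
        )
    m = max(logits)  # log-sum-exp with the max subtracted for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Three output units: labels 0, 1 and 2 are all valid.
loss = sparse_categorical_crossentropy([0.2, 0.1, 2.5], 2)
print(loss)

# One output unit: only label 0 is valid, so label 2 is rejected.
try:
    sparse_categorical_crossentropy([0.7], 2)
except ValueError as err:
    message = str(err)
    print(message)
```

If that is the cause, giving the final layer one unit per class (three here) would be the usual fix.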
Hi guys, I'm doing my undergrad honours right now and I want to code up Tacotron. I've taken a look at the paperswithcode site but the implementations there look pretty complicated - does anyone have an alternate way to get started?
Hii
I need help with Pandas, i am doing this to get 1,2,3 in my csv with NaN in other remaining columns
but getting
Clear? Like expand the x-axis to make the index more readable?
how do you expand it?
i read some stuff but they are like xlim or something which is not what we want
made some minor improvements but still the x axis looks really messy
that's literally the first link i clicked
i checked some other websites too but it didn't work
i wanna have this kind of scroll effect
Classification problems tend to be better when you're dealing with self-learning + supervised tuning, right?
Then, would it be better if I made a GAN where the Discriminator is a self-learning + tuned model?
I don't know if there's a paper testing this. I can only find papers with GANs generating new data without labels and one with a classifier in a conditional GAN
EDIT: Oh yes...this self-learning+supervised tuning (+distillation) is a recent discovery...so this may be why I couldn't find anything...
Oh...and are conditional GANs more stable than random GANs?
This is getting interesting...
Hi lets say I have an excel sheet that has deets like this
Name address
abc 123
def 456
ghi 789
Now ik there are two columns here and I want details columns wise so I just do like n=list(df.Name) and a =list(df.address) to generate them list wise
But now lets assume I have an excel sheet where I dont know how many and what columns are there
Now how can I fetch the details in the same manner as before (column wise list individually for each column)?? 
I tried to play with df.columns.values but it didnt work and Im stuck in this from long. Tried searching on docs but its confusing
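One way to do this without knowing the column names in advance is to loop over `df.columns`. A minimal sketch, with the small frame below built by hand to stand in for whatever `pd.read_excel` would return:

```python
import pandas as pd

# Stand-in for the sheet; in practice this would come from pd.read_excel(...).
df = pd.DataFrame({
    "Name": ["abc", "def", "ghi"],
    "address": [123, 456, 789],
})

# One list per column, keyed by column name, whatever the columns are.
columns_as_lists = {col: df[col].tolist() for col in df.columns}

for name, values in columns_as_lists.items():
    print(name, values)
```

`df.to_dict("list")` produces the same dictionary in a single call.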
Hey, I need help.
AttributeError '_OpNamespace' 'torchaudio' object has no attribute '_lfilter_core_loop'
I used to have a no module "torchaudio" error, but now I have this error.
hallo
if i compile a model with keras (loaded from folder), will the training progress be reset?
The weights of the loaded model should be whatever they were when you saved it
You can continue adjusting the weights if you want. But it would be up to you to know how many epochs you had trained it for prior
(unless keras behaves fundamentally differently than pytorch.)
I think that, in keras, you have to create (and compile) the model and then use model.load_weights(PATH)
Only then you can actually load the pretrained weights
Weird
Sorry if I gave misinformation
Btw remember that thing I said about the loss for my thing @hasty mountain ?
Yes
It didn't work at all?
Even though the loss was the same, the performance was terrible. But I eventually figured out what the problem was


my language model i've trained has plateaued
it is dropping VERY VERY slowly
inverse exponential improvement
hii
when we concat we obtain this using two databases
but I want one below another
I can't use column arguement as I am unsure of how long my row is
the problem is it has like 8 words
it puts 8 words together then repeats 6 words
do you want what?
What is the size of your dataset?
not very big, for training speed
i've been thinking of training a new model on http://norvig.com/big.txt
the lines are in the triple digits
like, change the order of concatenation?
word count is undetermined but i'll check
I dont want irregular columns
1st database should from beginning
8 305 words
well i don't think you should be doing that. give your columns name corresponding to columns in original dataframe
I cant do it because number of columns are way too much
To do it I must mention them
Aren't they just L1-L10?
Hm...strange...
this is just sample, main problem has some 3000
Are you using word embeddings, and a good depth model?
Just generate them then!
I didnt understand
clarify word embedding?
model = Sequential()
model.add(Embedding(vocab, 50, input_length=30, trainable=True))
model.add(GRU(150, recurrent_dropout=0.1, dropout=0.1))
model.add(Dense(vocab, activation='softmax'))
this is everything behind the model
96 000 ish parameters
i've had models with the same parameters, merely a smaller data-set be better at not repeating themselves
Strange, then... It shouldn't be just repeating words...
renamer = { n: f"L{n}" for n in range(3000) }
df.rename(columns=renamer, inplace=True)
Then your columns will have names from L0 to L2999
this is the same model that plateaued around the 1.0 loss and 0.64 accuracy
It outputs a phrase, then repeats that phrase?
indeed.
i have a model with a bigger data-set in the oven (74kb), and it'll train for 100 epochs
Maybe you didn't use a mark to let the model know when the phrase ends? Like <EOS> tokens and <pad> tokens
Can't I just simply add a list as a new row in database?
this data-set is 70 500 characters, and 13 394 words.
i'm pretty sure i don't have tokens no, not even sure if my model has a tokenizer
tokenizers were a concept i didn't quite grasp
How is that related? And well, not arbitrary list, but probably yes.
Probably that's the problem. How did you manage to make your model inputs to have the same size? And the outputs?
the data-sets are just different phrases (messages), separated by newlines that get filtered out
i have no idea, i'm pretty sure it processes in parts.
I only need to do that
i'm going to be honest i don't understand how this works quite fully lmfao
i'll paste code
Well then just do it
How 😭
Usually, you have to apply padding to shorter sentences in order to make them have the same length as the longer ones.
.append() (removed in pandas 2.0; use pd.concat there)
full training file
In order to make the model properly differentiate where a sentence ends, you use <EOS> (End of Sentence) token and <pad> for padding
guys i feel so dumb rn
in which columns you want 1 2 3 to go to
first 2 columns
so the method i have right now is to put it briefly, very dumb?
throwing a bunch of words at it and tellling it "figure it out"
import pandas as pd
df = pd.read_csv("C:\\Privat\\Python_VSC\\Data_Analytics_course\\Projects\\fcc-forum-pageviews.csv")
print(df[0])
it gives me an error message
basically
is your username privat?
I don't think that's how it works. You actually need to know which data you want to insert
what error message?
Exactly I dont know because columns are way too many
nah that is just a folder
i think i messed up my script path
I assume you want L1 = 1, L2 = 2, L3 = 3, then you just
newrow = [1,2,3]
df = pd.concat([df, pd.DataFrame([{ f"L{k}": v for k, v in enumerate(newrow, 1) }])], ignore_index=True)
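A runnable sketch of putting a shorter row below an existing frame, assuming the goal is "fill the first columns, leave the rest as NaN". The four-column frame and its L1..L4 names are made up; pairing the values with `df.columns` means the column names never have to be typed out, however many there are.

```python
import pandas as pd

# Made-up frame standing in for the real one with thousands of columns.
df = pd.DataFrame(
    [[10, 20, 30, 40], [50, 60, 70, 80]],
    columns=["L1", "L2", "L3", "L4"],
)

newrow = [1, 2, 3]  # shorter than the number of columns

# zip stops at the shorter input, so only the first three columns get values.
row_df = pd.DataFrame([dict(zip(df.columns, newrow))])

# Rows are stacked one below another; columns missing from row_df become NaN.
stacked = pd.concat([df, row_df], ignore_index=True)
print(stacked)
```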
@hasty mountain do you have any articles you read to pick up the knowledge of language models you have?
what you get when you just print(df)? is it empty?
oh i get smth
seems like my error was in the csv itself
date value
0 2016-05-09 1201
1 2016-05-10 2329
2 2016-05-11 1716
3 2016-05-12 10539
4 2016-05-13 6933
... ... ...
1299 2019-11-29 171584
1300 2019-11-30 141161
1301 2019-12-01 142918
1302 2019-12-02 220144
1303 2019-12-03 158549
with
print(df)
I have 1000s of rows for 1000s of columns
And I dont know which row ends at which column : D
but when i use
print(df[0 or any row])
i get the same error
print(df[3])
File "C:\Users\Rap\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\Rap\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 3
this error message
ye idk wtf is going on
if you want to get the first row it should be df.iloc[0]
ye, but shouldnt print(df[0]) simply print out the first row?
why does python have a problem with this?
iloc works btw
i think df[something] indexes a column, not a row
so when you want to get out rows, you do df.loc or df.iloc, when you just want to access series you can do either df[column] or df.loc[:, column]
but why doesnt
print(df[0])
gives me all the dates?
does df['date'] work?
0 does not work because df[key] looks up a column by its label, not by its position, and your columns are labelled 'date' and 'value'. For access by position use df.iloc[:, 0].
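A short sketch of the different lookups side by side, on a three-row stand-in for the pageviews frame (the values are copied from the printout above):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2016-05-09", "2016-05-10", "2016-05-11"],
    "value": [1201, 2329, 1716],
})

dates = df["date"]          # column by label
first_row = df.iloc[0]      # row by position
first_col = df.iloc[:, 0]   # column by position

# df[0] asks for a column *labelled* 0, which does not exist here.
try:
    df[0]
    missing = False
except KeyError:
    missing = True

print(first_row)
print(missing)
```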
probably...
I don't really know, I've started studying NLP recently. But I presume the tokens might do some help
oh yeah, i just looked it up in my book, you're right
The glorious, famous one, Attention is All you Need.
There's also some coursera courses that might be interesting. Try taking a look as listener.
they even said why the devs preferred the strings instead of the ints
sup i forgot smth again
df = (df['value']> min_border) & (df['value'] < max_border)
print(df)
date
2016-05-09 False
2016-05-10 True
2016-05-11 True
2016-05-12 True
2016-05-13 True
...
2019-11-29 True
2019-11-30 True
2019-12-01 True
how do i make this show the actual values again?
how resources-intensive is TF?
Like, is it worth to pay a VPS to run it instead?
I'm currently looking for a perspective to see if I should bother to spend for an appropriate amount of performance free of headaches of insufficient ram and stuff
df = df[ (df['value']> min_border) & (df['value'] < max_border) ]
print(df)
ah ty
so the pattern i'm noticing is that the bigger the dataset is the shorter the repetitions and the more common they are
so i probably shouldn't train that base model on something like big.txt
Uh... I don't know. But NLP models usually train on obscenely large amounts of data
When I tried to reproduce the Transformer, I used like... 6 sentences...and the performance can't get better than horrible
without any libraries at all, in python? not practically, no. your operations will be too slow
then yes
Whenever I import pytorch, I get the following error
Any ideas how to fix this?
/home/matthewbaggins/code/d2l/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:497: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
/home/matthewbaggins/code/d2l/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:529: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
Hello everyone
What IDE do you recommend for data scientist? I wanted to use vscode but the libraries don't work, now I'm looking for an IDE that doesn't consume a lot of ram.
libraries should work in vscode, make sure you're using the correct venv
but also, the IDE doesn't matter at all. anything is fine. all i'd say is "not pycharm" if you want low ram usage
i use spyder when i have to check intermediate or final values of variables after the code is done executing, but otherwise just text editor and terminal
Why do you think the library don't work? It ordinarily should work.
All you need to do is, change your python interpreter to conda or create a venv and pip install the necessary libraries.
I used Jupyter Notebook and Jupyter Lab most times. Then if I want to do deployment, I switch to VSCode
Sometimes vscode glitches, just delete the library and pip if needed and redownload. Also make sure you have everything updated fully
I personally use vscode anaconda since the environment has the bare minimum stuff for data science
Although jupyter notebook tends to be better at handling dataframes since i want to peek everytime i made changes
you can just do that with normal python with Python: Run Selection/Line in Python Terminal (default shortcut: shift+enter), but notebooks are fairly useful for that, specially when working with graphs
How can i use TensorFlow to recognise a fixed set of 21 characters(like in valorant - Raze,Jett)
(Should ideally return the index as an output found in the list)
I made some code as follows , can anyone suggest changes to it?
Hey @thick seal!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
To mention, the agent_mapping is just a dict mapping sample images names in the same dir (I have 200+) to the correct output.
see https://www.tensorflow.org/tutorials/keras/classification if you haven't yet
you probably just have to do some processing about the labels - I wouldn't even worry about being efficient, just loop though the labels and get the index
I'm having trouble downloading an nltk package. I'm using nltk.download() but it's not working. Please help me.
help me...
How do I choose the number of neurons in the hidden layer?
So I have to install python to anaconda?But what if I already have the normal python installed? Doesn't it cause problems?
I don't think there are hard-and-fast rules for that.
is this a meme? 😛
I would stay away from anaconda. the problems it was intended to solve have largely been solved by pip and venv.
Well I have downloaded the normal python and the vscode python extension, I gave it - m install pandas, it installed it but when I used it it said "import pandas could not be resolved from source"
Then I saw that to solve that I had to add the path as a virtual variable but it still doesn't work :(
I tried to use venv but I got that my pc does not allow scripts
is this on windows?
Yes
make sure you've downloaded python from the python website. don't use the one that comes with windows or the windows store
I also recommend this terminal over CMD or powershell: https://cmder.app/
Now that I remember I had the same python installed but it didn't run so I installed the Windows python and it worked
So I have to uninstall python from Windows, don't I?
no. if you download it from the python website, you can do things like py -m venv ./venv to make a venv
that's py, not python
and then ./venv/Scripts/activate
no i want that @serene scaffold
Hey folks. I'm kind of new to pandas and could use help.
I have two dfs. They have different lengths.
I performed a groupby aggregation on one of them, and I'd like to add that info to the first df where the value in a certain Column matches the index value of the Series I got from my groupby calculation.
Because they are different lengths, I'm getting an error.
I could do this the caveman way by just writing some Python loops, but I know there's a better way.
can you do print(df.head().to_dict()) for both dataframes and put it in the paste bin? please ping me when you have done this.
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
With respect, no, I can't. Data is proprietary. It's alright, I'll keep looking on my own.
I see. Unfortunately, the question is too abstract for me to help without seeing the data. Good luck!
could you make a minimum working example with random data instead of the true data?
Yes. Example dataframes that encapsulate all the relevant properties of the real ones would be sufficient.
I'll work on that and come back
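In the meantime, a minimal made-up example of the usual pattern for this: aggregate with `groupby`, then use `Series.map` to look each row's key up in the aggregated Series. The frames and column names (`customer`, `amount`, `total_paid`) are invented, and the real data may need a different key or aggregation.

```python
import pandas as pd

# Two frames of different lengths that share a key column.
orders = pd.DataFrame({
    "customer": ["a", "b", "a", "c"],
    "item": ["pen", "ink", "pad", "cap"],
})
payments = pd.DataFrame({
    "customer": ["a", "a", "b"],
    "amount": [10.0, 5.0, 7.5],
})

# The groupby result is a Series indexed by customer.
totals = payments.groupby("customer")["amount"].sum()

# map looks up each row's customer in that index; no match gives NaN.
orders["total_paid"] = orders["customer"].map(totals)
print(orders)
```

`orders.merge(totals.rename("total_paid"), left_on="customer", right_index=True, how="left")` is an equivalent route when several aggregated columns need to come across at once.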
I tried to add 60k + neurons, which was impossible
then I came to know about cnn but how can I implement it on a 300x300 img?
How many layers?
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(300,300)),
tf.keras.layers.Dense((), activation='relu'),
tf.keras.layers.Dense(len(valorant_agents))
])
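Some back-of-the-envelope arithmetic on why a flattened 300x300 image is heavy for dense layers and why a convolution is cheaper. This is a sketch only: the 128 dense units, the 3x3 kernel and the 32 filters are arbitrary example sizes, and a single-channel (grayscale) input is assumed because the input shape above is (300, 300).

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters


flat_inputs = 300 * 300  # Flatten turns the image into 90,000 values

# Every dense unit needs one weight per pixel.
dense = dense_params(flat_inputs, 128)

# A conv filter reuses the same small kernel across the whole image.
conv = conv2d_params(3, 3, 1, 32)

print(flat_inputs)  # 90000
print(dense)        # 11520128
print(conv)         # 320
```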
Once you download anaconda it automatically comes with a python interpreter. So that's what executes any python code you write on your Jupyter Notebook.
However, once you switch to VScode (the version that's not inside anaconda) i.e by downloading the VSCode IDE online, you'd have to do some few setup; like installing Microsoft python extension, downloading and installing the version of Python you want on your local machine; from the official python website (i.e if you don't have it installed already.)
Now, this Python.exe file you've installed is different from the python that comes with Anaconda. So in essence you have 2 python interpreters now.
1st from Anaconda distribution env, 2nd from the one you downloaded directly from python.org.
So, afterwards, you'd also have to go back to your VSCode and select/add the python interpreter you want to use.
I hope I didn't further confuse you.
For example here's mine.
Yes once you've downloaded a new python, you need to add it to path. I add both the pip directory and the python directory to path. once you've done that, exit your command prompt, then go back to your vscode, add the python interpreter to your vscode. that should solve the problem.
just i have one interpreter
Cool. Now you need to add python and your pip directory to path.
Good to know I'm not the only one who got problems with anaconda
And where do I do that?
But then, was installing packages through pip that hard back then? I remember when I tried to explore Python in, like, 2017 and even though I didn't know anything else from what's on Python's wiki, I already got angry with anaconda
And I had installed Anaconda because it's supposed "to make things easier for beginners"
Can anyone help me out on this?
on your windows laptop, press windows button + R, you should see this. Hit enter to open your command prompt. Once you're in, type this two things one after the other and hit enter
python --version
then hit enter
pip --version
then hit enter
Let me know what you're seeing once you've done that
and now what do i do?
The version your Vscode is seeing isn't the same python version your command prompt is showing as well.
Vscode => python 3.10.5
CMD => python 3.10.9
So two things might be the cause.
- You've not properly added python 3.10.9 to path
- You might have added it to path but it hasn't reflected on your vscode. (usually in this case, close your vscode and reopen it. It should pick it after a while)
I think it is the first one
so where do I get version 3.10.9 to add it to the path?
your cmd is showing python 3.10.9 as the active version in use. So you need to add python 3.10.9 interpreter to your vscode as well.
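A quick way to see which interpreter is actually running, as a minimal sketch: run this once from the VS Code terminal and once from cmd, then compare the output. It only uses the standard library; the function name `describe_interpreter` is made up for illustration.

```python
import platform
import sys


def describe_interpreter():
    """Return the path and version of the Python that is running this code."""
    return {
        "executable": sys.executable,          # full path to the python binary in use
        "version": platform.python_version(),  # version string of that interpreter
    }


info = describe_interpreter()
print(info["executable"])
print(info["version"])
```

If the two places print different paths, they are using different interpreters, and packages installed for one will not be visible to the other.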
hello, i am trying to compute a pearson correlation for business names, but i notice I get so many NaN values. I am following a tutorial, but they use a movie dataset and im using business names like Mcdonalds and Burger king etc. Does this mean i am doing something wrong, or that there is no correlation, or that i need more data? im only using about 1000 samples.
should i maybe use review id like a user id with their associated business recommendation? now that I am checking the tutorial, they use user_id as the index but i simply use my dataframe's default index. Not sure if this changes anything. My code:
data_table = pd.pivot_table(yelp_df,values='business_rating',columns='business_id')
data_table.head()
This might be more helpful
- Adding python to path https://www.youtube.com/watch?v=NPML38E6flQ
- Solving the environment problem in vscode https://www.youtube.com/watch?v=GqTsFOtZiQI
In this tutorial you will learn How to Add Python Installation location to Path Environment Variable in Windows 11 operating system.
Download and Install Python in Windows 11 OS
https://youtu.be/waO9Tw_ToJA
These are the settings adjustments you'll need to make to use a conda Python environment to execute your code inside of VSCode.
Did you encode the categorical features properly? Can you show the data type of at least the first 3 columns with NaN?
Look, to make things better I uninstalled all the python stuff, so before doing the video I install python from here https://www.python.org/downloads/ ?
hmm maybe i didnt do it right the feature engineering for business_id?
since it not integer
in the tutorial im following their dataset
Yes, download and install python; maybe 3.11 or 3.10
Pearson correlation is a statistical measure used to gauge/estimate how closely two variables are related. So if the columns you used don't have numeric values in them it won't work. Perhaps that's why you're getting NaN
oh okay thanks
oki
so now I already have the same version
i'll try with integers then
or in this case for business_id what can i use ?
i was thinking one hot encoding but maybe not
Awesome. Now you can pip install pandas and proceed with your work in VSCode. Meanwhile is there a reason why you don't use Jupyter Notebook / Jupyter Lab? 😀
hey i need help
maybe make each business have integer 1 as true and everything else false
i was working on an ai project but speech rec is not working
when i call it, it doesn't show up
cause almost everything I'm doing in the race I do in vscode and it's easier to use it and it doesn't take up much ram
but
That would make the business id column a categorical variable with two classes (if I understand what you've explained correctly). This will also lead to another problem when doing Pearson correlation.
You can't compute a Pearson correlation on a column that is discrete / a category.
so in essence;
Pearson correlation on Categorical column & Numeric column == Wrong Statistical Metric To Use
Pearson correlation on Categorical column & Categorical column == Wrong Statistical Metric To Use
Pearson correlation should only be used on two Numeric columns.
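To make the point above concrete, here is a rough sketch of Pearson's r in plain Python. The ratings are made-up numbers, not from the yelp data, and `pearson` is a hypothetical helper. It also shows two common reasons for getting NaN: fewer than two paired values, or a column with no variation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (NaN if undefined)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        return math.nan  # a single pair cannot define a correlation
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return math.nan  # one of the columns is constant
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Made-up ratings given by the same four users to two businesses.
ratings_a = [5, 3, 4, 1]
ratings_b = [4, 2, 5, 1]

r = pearson(ratings_a, ratings_b)               # well defined, positive
r_single = pearson([5], [4])                    # NaN: only one shared rating
r_constant = pearson([3, 3, 3, 3], ratings_b)   # NaN: no variation in one column
print(r, r_single, r_constant)
```

So a table full of NaN does not have to mean "no correlation"; it can simply mean the pairs of columns being compared share too few numeric values.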
You need to pip install pandas first.
Wouldn't it be better to do this in a virtual environment? Well, you can learn about that much later. Just pip install pandas from your terminal / cmd once it's installed, close the cmd/terminal and run the script again
aa
its working
thanks so much bro :3😭
Awesome. Now run the script again. pylance should start seeing pandas as a library that's 'customer-friendly' in your VS Code
You're welcome ✌🏾
Merry Christmas 🎉
merry christmas too
Is it returning any error message?
You invoked the wishme() function as well. What does this function do? Meanwhile, the speech recognition is supposed to return the text of what the microphone captured, right?
Ensure when you run the script... when it prints Listening (I hope you do see Listening being printed out) just say something, maybe, "Good morning the cyber guy". If the microphone doesn't catch any voice it will return "Say something again"; otherwise, it should return the text.
Oops... It appears TensorFlow hasn't been updated to support the new Python 3.11 yet. Downgrade to Python 3.10 and try again.
Note that pip can't install the interpreter itself (pip install python==3.10.9 won't work); install Python 3.10.9 from python.org instead.
whichever python installation you are using to run the Jupyter server does not have numpy installed
Please install numpy
ohh I see!! let me try it on jupyter
Have you created python virtual environment
yes I have :)
i'll keep you updated
how can I check if numpy is installed
still not working
weird cause it has worked in the past
pretty much just try to import it
or pip show numpy if you're using pip to manage your dependencies
P and Q are used a lot in the context of probability distributions. Which typically represents predicted and which represents actual?
are you sure the jupyter notebook is executing in the same env you have installed numpy?
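One way to check both things at once, as a small sketch using only the standard library: run the same cell in Jupyter and in VS Code and compare. `is_installed` is a hypothetical helper name.

```python
import importlib.util
import sys


def is_installed(module_name):
    """True if the running interpreter can find the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Which Python is executing this code, and can it see numpy?
print(sys.executable)
print("numpy installed:", is_installed("numpy"))
```

If the paths differ between the two tools, numpy is most likely installed in one environment but not in the other.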
and if I wait, how long do you think it will take for tensorflow to work with version 3.11?
I'm pretty sure it is but I can try again
it works on jupyter just not vs code
off the top of my head, p and P are usually used for the true distribution and probability, respectively, and q and Q as another computed quantity that should estimate or approximate the true one
but the notation depends entirely on what you're reading
thanks 😄 I'll stick to that convention, then
(realizing that some authors might not be using it.)
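For a concrete case of that convention, here is a small sketch of the KL divergence KL(p || q), where p plays the role of the true distribution and q the approximation. The numbers are made up and `kl_divergence` is a hypothetical helper; other texts may swap the roles.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p_true = [0.5, 0.5]    # "actual" distribution
q_model = [0.9, 0.1]   # "predicted" / approximating distribution

kl_pq = kl_divergence(p_true, q_model)  # positive: q does not match p
kl_pp = kl_divergence(p_true, p_true)   # zero: identical distributions
print(kl_pq, kl_pp)
```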
hmm not sure why this wouldn't work if my function scatterangles only takes in two parameters?
can you show where you defined that function?
```
import numpy as np
import matplotlib.pyplot as plt
# Define functions to compute the right-hand sides of the differential equations
def f_x(x, y, vx, vy):
    return -2 * y**2 * x * (1 - x**2) * np.exp(- (x**2 + y**2))
def f_y(x, y, vx, vy):
    return -2 * x**2 * y * (1 - y**2) * np.exp(- (x**2 + y**2))
def trajectory(impactpar, speed):
    maxtime = 10 / speed
    t = np.linspace(0, maxtime, 300)
    x = impactpar
    y = -2
    vx = 0
    vy = speed
    # Initialize arrays to store the solutions
    x_sol = np.empty(t.shape)
    y_sol = np.empty(t.shape)
    for i, _t in enumerate(t[:-1]):
        h = t[i+1] - _t
        k1_x, k1_y = h * vx, h * vy
        k2_x, k2_y = h * (vx + 0.5 * k1_x), h * (vy + 0.5 * k1_y)
        k3_x, k3_y = h * (vx + 0.5 * k2_x), h * (vy + 0.5 * k2_y)
        k4_x, k4_y = h * (vx + k3_x), h * (vy + k3_y)
        x += (k1_x + 2 * k2_x + 2 * k3_x + k4_x) / 6
        y += (k1_y + 2 * k2_y + 2 * k3_y + k4_y) / 6
        vx = f_x(x, y, vx, vy)
        vy = f_y(x, y, vx, vy)
        x_sol[i+1], y_sol[i+1] = x, y
    return x_sol, y_sol
x_sol,y_sol = trajectory(0.1, 0.1)
# Plot the resulting trajectory
plt.plot(x_sol, y_sol)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
# Solution to part (b)
def scatterangles(allb, speed):
    # Initialize an array to store the scatter angles
    angles = np.empty(allb.shape)
    # Loop over the impact parameter values
    for i, impactpar in enumerate(allb):
        # Solve the differential equations and store the final values of x and y
        _, vy = trajectory(impactpar, speed)
        # Compute the scatter angle
        angles[i] = np.arctan2(vy, 0)
    # Return the array of scatter angles
    return angles
allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)
# Plot the scatter angles as a function of impact parameter
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
```
the error
ah but if you read the error message, the issue is in line 64 of that function, not in the line where you call scatter angles
it looks like trajectory returns 2 values, but you asked it for 4
where you write _,_,_,vy it should be _, vy or something of the sort
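A tiny reproduction of that kind of mismatch, as a sketch with a made-up function `two_values` standing in for `trajectory`:

```python
def two_values():
    """Stand-in for a function that returns exactly two things."""
    return 1.0, 2.0


# Matching the number of returned values works fine.
_, second = two_values()

# Asking for four values from a two-value return raises ValueError.
try:
    _, _, _, fourth = two_values()
except ValueError as err:
    message = str(err)
    print(message)
```

The number of names on the left of the `=` has to match the number of values the function returns.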
I can't say when that's gonna happen but it's definitely gon be anytime next year. However, please DHYB 😀.
Wouldn't it be better to downgrade to Python 3.10.9 and move on with what you're building instead of waiting? I'd certainly downgrade if I was in that situation.
especially considering that the 3.11 speed boosts don't apply to ML libraries like tensorflow
tensorflow
ohh i see let me try fix this
that fixed it thank you!!
awesome
does math need a special library
those are two separate libraries
math is a library on its own.
math is part of the python standard libraries
ohh okay so import math m
numpy is a completely separate thing... which also does math, but on vectors and by calling C functions built on BLAS/LAPACK
oh right!
sure, try importing math. if you get a typing error (e.g. if vy is a numpy array), then you'll have to use numpy.atan2 instead of math.atan2. give it a shot and see
I'm probably gon have to learn JAX next year 🤔
do it now 😌
it's pretty similar to numpy tbh, but you do have to learn to sidestep conditionals and loops a little
tried import math as m doesn't work
the other stuff with (non)analytic jacobians and hessians requires a fair amount of math to get it right
and this doesn't
ah sorry, you had imported numpy as np. try np.atan2
I've not conquered PyTorch yet 😀. I don't want to jump to JAX just yet.
i've never used pytorch 😌
it wouldn't help you in jax btw. it's really just numpy with autodiff
it's not a machine learning framework, it's lower level than that
ooops hehe yes
there ARE frameworks built on top of it, and they're very similar to pytorch and tf
but error
!e
import numpy as np
print(np.atan2(1,0))
@wooden sail :x: Your 3.11 eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 2, in <module>
003 | File "/snekbox/user_base/lib/python3.11/site-packages/numpy/__init__.py", line 311, in __getattr__
004 | raise AttributeError("module {!r} has no attribute "
005 | AttributeError: module 'numpy' has no attribute 'atan2'. Did you mean: 'arctan2'?
aha, arctan2
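To sum up the difference, a short sketch (the example values are made up): `math.atan2` works on plain numbers, while `np.arctan2` works element by element on arrays.

```python
import math

import numpy as np

# Standard library: scalars only.
scalar_angle = math.atan2(1.0, 0.0)

# NumPy: element-wise on whole arrays.
vy = np.array([1.0, -1.0])
vx = np.array([0.0, 0.0])
array_angles = np.arctan2(vy, vx)

# Passing a multi-element array to math.atan2 fails with a TypeError.
try:
    math.atan2(vy, 0.0)
    math_accepts_arrays = True
except TypeError:
    math_accepts_arrays = False

print(scalar_angle, array_angles, math_accepts_arrays)
```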
Which Deep Learning framework do you use? JAX and TensorFlow?
i don't use a framework
omg smart let me try it
okay fixed now for new errors😂
Hey guys... if I use a GAN where the discriminator does not rely on labels (unsupervised learning)... will it assign label values that decrease infinitely?
This is what I'm getting so far...after 350 epochs
You pretty much use just numpy from scratch or ? 😳 Isn't that time consuming
numpy and jax, sure. yes, but there's no alternative. the point of the papers we publish is to make new stuff and show it works
it inherently means it doesn't exist
it's easier to make it from scratch for our tests. we don't care about making it into nice software for others to use, it's rather a test for the math in the papers so that it's more convincing
For my current (non work) project, I need to copy a model and concatenate more rows onto the last layer of the copy. And these are things that pytorch makes annoying to do. (And it annoys me that pytorch's API often lacks symmetry with numpy.) Does JAX make these any easier?
not the way you'd like, probably
that's not really a math operation, so it's also not differentiable
the way you'd do it is by embedding in a higher dimensional vector space, so by multiplying with an identity matrix with an extra row of all zeros
jax hates anything that depends on the vector length
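One way to read the suggestion above, sketched in NumPy rather than JAX to keep it minimal: multiplying by an identity matrix with an extra all-zero row maps the layer's weights into a space with one more output row. `W` is a made-up weight matrix, and this only creates the empty row; filling it with real values is a separate step.

```python
import numpy as np

# Made-up last-layer weights: 2 output rows, 3 inputs.
W = np.arange(6.0).reshape(2, 3)

# Embedding matrix: 2x2 identity with one extra row of zeros (shape 3x2).
E = np.vstack([np.eye(2), np.zeros((1, 2))])

# The product keeps the original rows and appends a zero row (shape 3x3).
W_big = E @ W
print(W_big)
```

Because the new size is fixed by the shape of `E` up front, nothing in the computation depends on a length that changes at run time.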
And I suppose that Jax is preoccupied with mathematical purity?
yep

it basically requires functional programming
(not quite, it's fairly relaxed, but you get the idea)
Do I? 
Why
My loss was negative...and it was a Binary Cross Entropy Loss
What would you want to change about it
i'd remove it, i don't think it makes sense with dynamic typing 😛 that's my hot take
You could be friends with godlygeek
My issue with type hinting is that you can't get very much specificity for data science stuff. Saying that a function "returns a DataFrame" is one of the least informative statements one could make.
yeah i guess that's mostly my issue with it. i can see it being more sensible in other applications, but it's not descriptive in any way for this stuff
i prefer a hefty numpy style docstring any day of the week
If we have a dataset in which the training set has let’s say 2 images of the same sample. Will random shuffling cause data leakage due to potentially 2 images of the same entity going to train and test or is this something not to be worried about ?
Spark, in Scala, has a dataset API, which is basically statically typed dataframes. One way you can type it is by defining a case class (dataclass) where the rows are attributes.
Someone should do that for pandas or polars
does anyone see why I'm getting an error from my code
there's a rejected pep that would have made expressions like pd.DataFrame[columns=['a', 'b'], dtypes=[int, str]] syntactically valid. I had been considering making some sort of validator, but I'm sure I never would have finished it even if the pep had been accepted.
I just want some way for my editor to actually become helpful when working with dataframes
right now it's just miserable
looks like you might have nested lists unknowingly. let's see if we can replicate it
!e
import numpy as np
np.array([1,2,[3,4]])
huh is the 3.10 version stripped down?
@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.
<string>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
sorry what do you mean
oo ?
!e
import numpy as np
x = np.array([1,2,3])
x[0] = np.array([4,5])
@wooden sail :x: Your 3.11 eval job has completed with return code 1.
001 | TypeError: only size-1 arrays can be converted to Python scalars
002 |
003 | The above exception was the direct cause of the following exception:
004 |
005 | Traceback (most recent call last):
006 | File "<string>", line 3, in <module>
007 | ValueError: setting an array element with a sequence.
is it this bit?
probably in the line where you assign stuff to x_sol and y_sol
x and y look like they're vectors (at a glance)
what needs to be done to make it scalar (not sure if this is a dumb question) or understand what is wrong
```
import numpy as np
import matplotlib.pyplot as plt
# Define functions to compute the right-hand sides of the differential equations
def f_x(x, y, vx, vy):
    return -2 * y**2 * x * (1 - x**2) * np.exp(- (x**2 + y**2))
def f_y(x, y, vx, vy):
    return -2 * x**2 * y * (1 - y**2) * np.exp(- (x**2 + y**2))
def trajectory(impactpar, speed):
    maxtime = 10 / speed
    t = np.linspace(0, maxtime, 300)
    x = impactpar
    y = -2
    vx = 0
    vy = speed
    # Initialize arrays to store the solutions
    x_sol = np.empty(t.shape)
    y_sol = np.empty(t.shape)
    for i, _t in enumerate(t[:-1]):
        h = t[i+1] - _t
        k1_x, k1_y = h * vx, h * vy
        k2_x, k2_y = h * (vx + 0.5 * k1_x), h * (vy + 0.5 * k1_y)
        k3_x, k3_y = h * (vx + 0.5 * k2_x), h * (vy + 0.5 * k2_y)
        k4_x, k4_y = h * (vx + k3_x), h * (vy + k3_y)
        x += (k1_x + 2 * k2_x + 2 * k3_x + k4_x) / 6
        y += (k1_y + 2 * k2_y + 2 * k3_y + k4_y) / 6
        vx = f_x(x, y, vx, vy)
        vy = f_y(x, y, vx, vy)
        x_sol[i+1], y_sol[i+1] = x, y
    return x_sol, y_sol
x_sol,y_sol = trajectory(0.1, 0.1)
# Plot the resulting trajectory
plt.plot(x_sol, y_sol)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
# Solution to part (b)
def scatterangles(allb, speed):
    # Initialize an array to store the scatter angles
    angles = np.empty(allb.shape)
    # Loop over the impact parameter values
    for i, impactpar in enumerate(allb):
        # Solve the differential equations and store the final values of x and y
        _, vy = trajectory(impactpar, speed)
        # Compute the scatter angle
        angles[i] = np.arctan2(vy, 0)
    # Return the array of scatter angles
    return angles
allb = np.arange(-2, 2, 0.001)
angles = scatterangles(allb, 0.1)
# Plot the scatter angles as a function of impact parameter
plt.plot(allb, angles)
plt.xlabel("Impact parameter")
plt.ylabel("Scatter angle")
plt.show()
```
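On the scalar question: `trajectory` returns two whole arrays, so `np.arctan2(vy, 0)` is also an array and can't be stored in the single slot `angles[i]`. One possible fix, as a sketch only, is to reduce the trajectory to one number first, for example the direction of its last step. The helper name `final_direction_angle` and the toy trajectory are assumptions, and whether the last step is the right definition of the scatter angle depends on the assignment.

```python
import numpy as np


def final_direction_angle(x_sol, y_sol):
    """Angle of the last step of a trajectory, as one float (hypothetical helper)."""
    dx = x_sol[-1] - x_sol[-2]
    dy = y_sol[-1] - y_sol[-2]
    return float(np.arctan2(dy, dx))


# Toy trajectory standing in for the output of trajectory():
# a straight line going up and to the right.
x_sol = np.linspace(0.0, 1.0, 5)
y_sol = np.linspace(0.0, 1.0, 5)

angles = np.empty(3)
angles[0] = final_direction_angle(x_sol, y_sol)  # one float, so this works

# Storing a whole array in one slot is what raises the error.
try:
    angles[1] = np.arctan2(y_sol, 0)
    array_assignment_failed = False
except (ValueError, TypeError):
    array_assignment_failed = True

print(angles[0], array_assignment_failed)
```

Inside `scatterangles` this would mean keeping both returned arrays (`x_sol, y_sol = trajectory(impactpar, speed)`) and assigning the single angle computed from them.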
honestly i can't answer that off the top of my head. i haven't done runge kutta methods in literally 10 years
thanks for your help:)
no prob, best of luck
thank youu
you can try asking in the mathematics server btw, the one with a torus as its logo. there's a channel there called "computational maths" or something similar which deals with numerics. on the other hand, they don't directly do python, so you might have to formulate your question in a more mathy way.
ooo okay might try that
sounds like a good idea
