#data-science-and-ml
Hello guys could anyone help me out with correlation coefficients in python
Short background, I need to do correlation analysis using ordinal data
but the thing is, if I do Spearman's correlation analysis,
I'm not sure whether it's possible to use it with multiple ordinal columns
I've done some research, but so far I haven't found anyone doing it with multiple ordinal columns
any advice?
or how do I calculate the correlation between multiple ordinal variables?
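Spearman's coefficient is defined for a pair of variables, so with several ordinal columns the usual approach is a matrix of pairwise coefficients. A minimal sketch, assuming the ordinal levels are already coded as integers in rank order (the column names and values below are made up for illustration):

```python
import pandas as pd

# Hypothetical survey data: each column is an ordinal rating (1 = low ... 5 = high).
df = pd.DataFrame({
    "satisfaction": [1, 2, 3, 4, 5, 3, 2],
    "ease_of_use":  [1, 3, 2, 5, 4, 3, 1],
    "price_rating": [5, 4, 3, 1, 2, 3, 4],
})

# Pairwise Spearman rank correlation between every pair of columns.
spearman_matrix = df.corr(method="spearman")
print(spearman_matrix.round(2))
```

Each off-diagonal entry is the Spearman coefficient for one pair of columns; the diagonal is always 1.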
Hi, I'm trying to install wget into site-packages on my D drive using !pip install wget. However, my Python is located on my C drive. Is it possible to install wget on the D drive instead?
@upbeat furnace I've only seen wget as a bash command. Not as a python package
IMO exams are there to test your ability to handle tough situations by putting you under a lot of stress, similar to actual life events. Surely you have met people who lost their cool or stayed home when the situation needed them. I for one would hate to work with such a person.
I think exams are intended to make it easy for the instructor to assign grades to everyone. What you've said sounds like a retroactive justification for what I see as evidence of their failure
Titles are often arbitrary. A "data scientist" might still write papers.
Hi guys,
Is there a way I can create a new column based on the x column in my dataframe that holds the difference between consecutive rows, i.e. the new column will be row1 - row0 of the x column and so on?
@versed gulch yes, you would just do df['new'] = df['a'] - df['b']
If you're trying to do operations between elements of the same column, please be more specific about the expected input and output
i.e. I want the first value to be NaN, then 65388.270 - 64624.593, and so on
What would the next one be? Two elements isn't enough to establish the rule
then 66151.947-65388.270
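For a row-to-row difference within one column, pandas has `Series.diff()`. A small sketch using the three values quoted above (the column name `x_diff` is just an illustrative choice):

```python
import pandas as pd

df = pd.DataFrame({"x": [64624.593, 65388.270, 66151.947]})

# diff() subtracts the previous row from the current one;
# the first row has no predecessor, so it becomes NaN.
df["x_diff"] = df["x"].diff()
print(df)
```

`df["x"] - df["x"].shift(1)` should give the same result if the shift needs to be spelled out.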
ok thanks
Also is there a way to take the tile numbers that have the same y values and put them into a list?
would anyone like to help me put a bunch of dicts and lists into a dataframe? I have the process done for one df, but instead of doing it 10 times I wanna do it in a loop in a single cell
and my brain's getting overloaded
Hey, has anyone worked with Celonis / pycelonis yet and encountered an issue?
I'm wondering if it is just a bug or I made a mistake in my query
does anyone know why interpolating doesn't work on a list of DataFrame objects?
let's say I have a list of dataframes, [pd.DataFrame(list[0]), pd.DataFrame(list[1]), etc.]. why doesn't for i in range(len(dataframes)): dataframes[i]['col'] = dataframes[i]['col'].interpolate(method='linear') work?
does pandas not create the dataframe as an object when reading from lists?
in fact, it's replacing them with None
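A guess, since the full code isn't shown: getting `None` back usually means `interpolate` was called with `inplace=True` (which returns `None`) and that return value was assigned to the column. Also note that `method` takes a string such as `"linear"`. A minimal sketch with made-up data that keeps the returned Series instead:

```python
import numpy as np
import pandas as pd

# Hypothetical raw lists with gaps, standing in for the real data.
raw = [[1.0, np.nan, 3.0], [10.0, np.nan, np.nan, 40.0]]
dataframes = [pd.DataFrame({"col": values}) for values in raw]

for frame in dataframes:
    # interpolate() returns a new Series; assign it back to the column.
    frame["col"] = frame["col"].interpolate(method="linear")

print(dataframes[1])
```

Looping over the frames directly avoids the index bookkeeping of `range(len(dataframes))`.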
hello folks... so I have this dict data with me, just wanted to know how to divide it into two parts, preferably 70:30
0. 0.04303315 0.03849002 0. 0. ]
[0. 1. 0. 0.12309149 0. 0.
0. 0. 0. 0.05913124 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. 0. 0. 0. ]
[0. 0.12309149 0. 1. 0.07216878 0.
0. 0. 0. 0. 0. ]
[0. 0. 0. 0.07216878 1. 0.06454972
0. 0. 0. 0. 0. ]
[0.0496904 0. 0. 0. 0.06454972 1.
0. 0. 0. 0. 0.14322297]
[0. 0. 0. 0. 0. 0.
1. 0. 0. 0. 0. ]
[0.04303315 0. 0. 0. 0. 0.
0. 1. 0.04472136 0.06201737 0. ]
[0.03849002 0. 0. 0. 0. 0.
0. 0.04472136 1. 0. 0.05547002]
[0. 0.05913124 0. 0. 0. 0.
0. 0.06201737 0. 1. 0. ]
[0. 0. 0. 0. 0. 0.14322297
0. 0. 0.05547002 0. 1. ]]```
nvm fixed
Hi all! Anyone familiar with sympy? I'm trying to solve an integral and need some help
How would i go about training a TTS model to replicate someone's voice, i have essentially a infinite amount of training data as the person is locked in my basement, and i'm just looking for a framework or a good guide.
The goal is not to make any TTS, the goal is to replicate the person's voice
Simply just wanna split this data into two parts and store it... may I know how? [0. 0. 0. 0. 0. 0. 0. 0.0766965 0.14142136 0. 1. 0.08164966] [0. 0. 0. 0. 0. 0. 0. 0.0766965 0.14142136 0. 1. 0.08164966]
its stored in a variable
MonkaS
Yes just split it by index
I assume you're talking about the emote
A = thelist[lenlist/5] would give u the first 20% right?
Sorry I missed a :
Put the colon before
Lenlist
so I have this stored in a variable called similarity... so how do I put this here?
can we use python to make ai
To split it in half, you index it as yourlist[:len(yourlist)//2]
sorry, it's my first time doing Python, so I'm new to this
oh sorry, I mean where do I put the variable which I store the list in
thelist will be that variable, right?
You store variables in python by just saying x = 1
So firsthalf = thecodeigaveyou means first half is that first half list
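Putting the slicing advice together for a 70:30 split. This is a sketch under the assumption that `similarity` is a plain list of rows; the identity-like matrix below is only a stand-in for the real data:

```python
# Hypothetical stand-in for the real similarity matrix: 10 rows of 10 values.
similarity = [[float(i == j) for j in range(10)] for i in range(10)]

# Integer arithmetic keeps the split index an int (slices reject floats).
split_at = len(similarity) * 7 // 10

first_part = similarity[:split_at]    # first 70% of the rows
second_part = similarity[split_at:]   # remaining 30%

print(len(first_part), len(second_part))
```

The same slices work on a NumPy array. If the rows should be split randomly rather than in order, shuffling first (or scikit-learn's `train_test_split`) would be the usual route.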
can we use python to make ai
Holy shit chat today
"make ai" is an extremely oversimplified way of putting it that doesn't really match reality, but yes, Python is used for working with AI
@sour tide go to main python channel not data for this it’s basic python
like learn what the user is interested in, or some simple ai
Yes python is the most popular for that
O thanks
funny thing, they actually sent me here lol
weird
did you guys know postman can give you code for python request for APIs?
i had no idea
hey nlp people, is there a framework or tool that we can define a grammar by notation or any other way, from a pool of words with pos tags, it stochastically creates sentences?
NLTK pos tags are too specific for my case. All I want to basically use is S NP V
spacy?
Does anyone know if I need machine learning for image processing/computer vision?
How much can I do without machine learning?
Let's say I want to create a project in which I analyze an image of a heart, and as a result I want to know whether or not that person has some sort of disease. Can I do that with only image processing/computer vision?
computer vision often, but not always involves machine learning
the difference is how much math you do yourself 😛
same with image processing, which i would usually put in a separate category, as it deals with different tasks in general (they do have some overlap)
what if I'm bad at math? xD
then image and signal processing in general are a bad idea, unless you're willing to sink in a lot of time
people get masters and phds in engineering and maths for signal processing/image processing/computer vision
especially if you wanna do it in a medical area
I see!
Does that mean that if I use ML, then I wouldn't need to dive in too deep in math?
hmm you dive into different math, but depending on how novel or old it is, you don't need to do it yourself
also consider that when i say "do math", i don't mean you're gonna go and multiply numbers and do integrals on paper, but rather that you'll formulate problems in a clever way and recognize good solution approaches
I see!
When you said math, I was thinking of stuff such as calculus and linear algebra
as a dumb example, noticing that you can find the coefficients of a polynomial of arbitrary degree by doing a linear regression, even though you'd normally associate "linear" with polynomials of order 1
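To make that example concrete: a polynomial is nonlinear in x but linear in its coefficients, so an ordinary least-squares solve recovers them. A minimal sketch with NumPy on synthetic, noise-free data (the cubic and its coefficients are arbitrary choices for illustration):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 25)

# Arbitrary cubic: y = c0 + c1*x + c2*x^2 + c3*x^3
true_coeffs = np.array([1.0, -3.0, 0.5, 2.0])
y = true_coeffs[0] + true_coeffs[1] * x + true_coeffs[2] * x**2 + true_coeffs[3] * x**3

# Design matrix with columns 1, x, x^2, x^3: the model is linear in the coefficients.
X = np.vander(x, N=4, increasing=True)

# Ordinary least squares, i.e. a linear regression on the powers of x.
fitted_coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(fitted_coeffs)
```

The "linear" in linear regression refers to the unknowns, not to the shape of the fitted curve.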
calculus and linear algebra are the bare basics, you won't get anywhere without those
I see!
What topics should I know?
besides calculus and linear algebra
probability and statistics in the multivariate case, some optimization
image processing methods often use basic physics and differential equations, e.g. when you optimally detect edges in an image or try to find regions that satisfy some condition, try to denoise, etc
I see! @wooden sail
and statistical signal processing itself
Image processing and computer vision are actually very interesting topics
i'd say optimization and sigproc are applications of the other topics... in a very handwavy way, because there's a lot to those topics in and of themselves
But, when should I use which, though?
I see!
being able to detect that is what makes you be good at it 😛 it depends on what you're doing
I see, that makes sense
I want to plot a scatter plot with the drop down as Species in R, not in shiny. The dropdown should have the Species (setosa, versicolor, and virginica) and by selecting one the plot should change. Can someone suggest me here?
we do R stuff here?
can anyone recommend a good general tensorflow tutorial, like how to get a grasp of how to use it
I'm afraid no, this is a Python community not R.
I have a pressure log with oscillations from which I need to extract some timing info - anyone have suggestions on how to do so (I'm mostly on pandas):
basically - extract the 'timestamp' of each red dot, and the duration of the black bar:
in this particular example, crossing the '490' value would work, but unfortunately there's a significant low frequency component that makes absolute value approach useless:
So using the derivative then maybe?
If the value suddenly decreases rapidly, then you found the red dot
And if it rapidly increases, that's the end of the black bar
that would make sense - I've been shying away from adding a low pass filter because I don't want to lose timing resolution (this is a ~20Hz signal), but maybe some additional step to ensure an 'extended' drop is noted, as opposed to 'jiggles'? I'm not sure if there are adaptive peak/threshold detection tools out there (and whether those would be too complicated for this)?
Not sure about low-pass filters, but if I were to do this task, I would probably just check for each point if the point x time steps ahead is at least 10-20 lower
If it is, then red point, and we start looking for the end of black bar
which is when the point x time steps ahead is at least 10-20 higher
That would probably already give "decent-ish" results
If you have a lot of data, you could maybe even train a simple rolling regression model
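The look-ahead idea described above can be sketched roughly like this. The function name, the synthetic signal, and the `lookahead` / `threshold` values are all assumptions for illustration and would need tuning on the real log:

```python
import numpy as np
import pandas as pd


def find_drop_intervals(pressure: pd.Series, lookahead: int, threshold: float):
    """Return (start, end) index labels for each drop-then-recover interval."""
    # Change between each sample and the sample `lookahead` steps ahead.
    change = pressure.shift(-lookahead) - pressure
    intervals = []
    start = None
    for label, delta in change.items():
        if np.isnan(delta):
            continue  # no sample `lookahead` steps ahead near the end
        if start is None and delta <= -threshold:
            start = label                     # sharp fall: the "red dot"
        elif start is not None and delta >= threshold:
            intervals.append((start, label))  # sharp rise: end of the "black bar"
            start = None
    return intervals


# Synthetic pressure log: flat, a dip of 20 units, then flat again.
pressure = pd.Series([500.0] * 5 + [480.0] * 5 + [500.0] * 5)
intervals = find_drop_intervals(pressure, lookahead=1, threshold=15.0)
print(intervals)
```

Because it compares each sample with a nearby one rather than with a fixed level such as 490, a slow drift in the baseline should matter much less. With a timestamp index, `end - start` gives the duration directly.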
what's up emyrs
I'm doing great myself I can't complain.
that's good man
No. Don't be alarmed. If you are on the production side you don't need to know any math at all, or even stats for that matter. There are many frameworks that run out of the box. Apart from that, whether you need ML or vanilla CV really depends on the problem.
What kind of things are you trying to detect from heart image?
I'm glad to hear that, thanks for the insight
I don't really know, it was just something I thought of. Is there any beginner project you'd recommend me to do?
I have a 1D array of vectorized words. How do I use it with an LSTM? It wants ndim 3, but I only have 2 (batch size, vector count)
For what, like for any image job?
MNIST is like the hello world of statistics datasets, and there are many tutorials.
For computer vision
Yes, something basic that involves image processing/cv with ml
Are there interesting platforms that I should look up, related to the topics I've mentioned above?
fall cohort about to start https://fullstackdeeplearning.com/course/

considering doing it
especially since im already running into problems at work with model deployment

pyimagesearch for more opencv projects and for practical applications sentdex tutorials are great in my opinion. not really theorizing everything.
or like anything at all
Thank you, I'll look it up! 🙂
What is this?
oh this isnt for you. this is mostly for the lurkers in chat. 
theres also an academic discount for the students and an accessibility discount for those from low income countries

havent read it myself, but i heard many from my podcasts say its a classic
Hey @ancient fractal!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
Can someone fix my python code? I am trying to Cluster a 2D array and calculate min and max value of each cluster. I got stuck, it does not output the results that I am looking for. Here is my code: ```py
import numpy as np
from collections import defaultdict
from scipy.cluster.vq import kmeans, vq

data = defaultdict(list)
arr =[[2, 230], [2, 233], [1, 676], [2, 233], [1, 698], [2, 233], [1, 685], [2, 234], [2, 236], [2, 232], [2, 261], [1, 674], [2, 262], [2, 236], [2, 267], [1, 690], [2, 261], [2, 231], [1, 540], [2, 231], [1, 696], [2, 233], [1, 528], [2, 231], [2, 232]]
for k in arr:
    data[k[0]].append(k[1])
data = dict(data)
new_data = defaultdict(list)
check_cluster_list = [len(x) for ii,x in data.items()]

def chunk(l: list, N: int):
    return [l[i:i+N] for i in range(0, len(l), N)]

arr_d = defaultdict(list)
for entry in arr:
    arr_d[entry[0]].append(entry[1])
chunks = {
    1: 3,
    2: 4,
}
for k, l in arr_d.items():
    number_of_clusters = chunks[k]
    if number_of_clusters > min(check_cluster_list):
        print("Clusters cannot be larger than",min(check_cluster_list))
        raise Exception(f"Clusters cannot be larger than {min(check_cluster_list)}")
    for indx, (id, y) in enumerate(data.items()):
        cluster_dict = defaultdict(list)
        codebook, _ = kmeans(np.array(y, dtype=float), number_of_clusters)
        cluster_indices, _ = vq(y, codebook)
        for i, val in enumerate(cluster_indices):
            cluster_dict[val].append(y[i])
        final_list = []
        for id_1,y_1 in cluster_dict.items():
            final_list.append([min(y_1), max(y_1)])
        new_data[id].append(final_list)
new_data = dict(new_data)
new_data = {id:y[0] for id,y in new_data.items()}
print(new_data)```
from the one and only andrew ng https://read.deeplearning.ai/the-batch/issue-152/
published yesterday
More papers have been published on AI than any person can read in a lifetime. So, in your efforts to learn, it’s critical to prioritize topic selection. I believe the most important topics for a technical career in machine learning are:
- Foundational machine learning skills.
- Deep learning.
- Math relevant to machine learning.
- Software development.
he goes into the specifics in his article

the most important topics for a technical career in machine learning are:
- Foundational machine learning skills.
no shit
obligatory "this article was written by an AI"

well his audience is for early career peeps or students
so he needs to state the obvious
please dont hate andrew ng

hes done a lot for the community
sure, i've done his ML course
hes doing a renewed version that uses python
unless youre referring to that one
this time theres RecSys + RL in the last module of the syllabus

yo, i built a model today which got an accuracy score of 90% and then ran some data on it from another year and got 80%, would this be a good model?
y and fx are the same thing in most contexts
is this a 2 dimensional field
do you know what a function is in general?
is this just used as an example
yes its something which can be applied to a variable
but then this confuses me as its being equated to y
so basically f(x) means you input x into a formula, in this case, and the result is y
then hes saying that if y = mx+c that multiplying by m and adding c is the function f?
yes
so the times m and + c are basically like applying a function to x and then spitting out y
how do u write f(x) without x just as f in text?
is it possible?
so you can have a function that u can apply to anything
like u said, *m+c
f = *m+c
is that even real?
ive never seen somene write that way
its always f(x)
i think it's just shorthand
I've never ever seen that, and someone told me that it doesn't exist
so ive always been confused
someone said that f on itsown is nothing
i mean technically no, the function needs an input
without the input there's no output
so this guy on the photo saying y = f(x)
hes just using it as an example for y = graphs?
yes
its not always the case or its always the case in graphs
u cant have a graph withuot some sort of x
what does it mean to say y vs y''
second derivative?
d2y/dx2 is physically on the graph what?
another location on the line?
shifted the dot so to speak?
a physical analogy would be:
- position
- velocity is the derivative of the position
- acceleration is the derivative of velocity
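The position / velocity / acceleration analogy can be checked numerically. A small sketch with NumPy, assuming a made-up trajectory y = 5t² (so y' = 10t and y'' = 10):

```python
import numpy as np

t = np.linspace(0.0, 2.0, 201)            # time samples
position = 5.0 * t**2                     # y

velocity = np.gradient(position, t)       # y'  (slope of the position curve)
acceleration = np.gradient(velocity, t)   # y'' (slope of the velocity curve)

# Look at an interior sample; the edge samples use less accurate one-sided differences.
print(t[100], position[100], velocity[100], acceleration[100])
```

So y'' is not another point on the original curve: at each x it says how quickly the slope itself is changing there, just as acceleration says how quickly the velocity is changing.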
the way i see it is a position yes
how does velocity come into play?
or acceleration?
the difference between the derivatives?
yeah i took physics
but in math they never explained this
it was just random numbers and no actual meaning
solving first and second order derivatives
to pass a question
without knowing literally what it means
how can a new point on the line related to velocity?
you mean gradient?
thers a new gradient
not sure about the acceleration part
What might it feel like to invent calculus?
In this first video of the series, we see how unraveling the nuances of a simp...
why the fuck didnt they just say so in school?
limit theorem wasnt touched at all
more like remember simple rules
finally learning about first principles properly
Have you taken Calculus 3?
no just 1
It expands on the uses of Calculus in three dimensions.
btw, if the limit of fx is undefined
for inputs x1, x2, x3, ..., does simple feature engineering, such as making a new feature x = x1*x2, make any substantial improvement?
I feel like more complex feature engineering might help more, but this kind of simple feature engineering doesn't do much.
whats that written as
L'Hopital's rule.
so u cant just say 0
No lol.
But if you don't read it and do it yourself, you won't understand.
oh so im wrong?
Don't understand what you're trying to say.
u can have random points on the graph that literally arent even touching the curve
so long as the function defines it
?
Wut?
Yup it’s true
I think what you are asking about are piecewise functions https://en.wikipedia.org/wiki/Piecewise
In mathematics, a piecewise-defined function (also called a piecewise function, a hybrid function, or definition by cases) is a function defined by multiple sub-functions, where each sub-function applies to a different interval in the domain. Piecewise definition is actually a way of expressing the function, rather than a characteristic of the f...
How to improve prediction in classification problems?
This is a broad question, but you could try feature engineering to make sure you are extracting as much relevant information from your data as you can. Also, test different models. If your dataset is small, maybe focus on Logistic Regression or Random Forest.
yes, I used random forest
and the object column I transform with one hot encoder
What are you trying to classify?
classification problem
contain of number and object columns
I drop ID
because it is not important
Need to Optimise my Program .......Help!!!
from numpy.core.fromnumeric import amax
from numpy.core.fromnumeric import amin

def peak_valley_detector(d, ker_sz, sigma, width):
    ds = gderiv(d, sigma, ker_sz)
    # print(ds)
    for j in range(1, int((ker_sz/2))+1):
        ds = np.delete(ds, 0)
        ds = np.delete(ds, len(ds)-1)
    d1 = ds
    idx1 = np.zeros(len(d1))
    idx2 = np.zeros(len(d1))
    for i in range(1, len(d1)-2):
        if (np.sign(d1[i-1])>=np.sign(d1[i+1])) and (d[i]>=0.5*(np.amax(d)+np.amin(d))):
            idx1[i] = 1
        else:
            idx1[i] = 0
    for i in range(1, len(d1)-2):
        if (np.sign(d1[i+1])>np.sign(d1[i-1])) and (d[i]<0.5*(np.amax(d)+np.amin(d))):
            idx2[i+1] = 1
        else:
            idx2[i+1] = 0
    index1 = np.where(idx1 == 1)
    index2 = np.where(idx2 == 1)
    flag = 0
    # indexo2 = [1,len(d1)]
    # for k in index2:
    #     indexo2.append(k)
    # index2 = indexo2
    index2 = np.append(index2, [1, len(d1)])
    index2 = np.sort(index2)
    # Amongst multiple close peaks detected, choose the highest peak, discard the rest.
    while not flag:
        flag = 1
        for i in range(1 , len(index1)-1):
            if abs(index1[i]-index1[i+1]) < width:
                flag = 0
                if d[index1[i]] < d[index1[i+1]]:
                    index1[i] = 9999
                else:
                    index1[i+1] = 9999
        irx1 = np.where(index1 == 9999)
        index1 = np.delete(index1 , irx1)
    flag = 0
    # Amongst multiple close valleys detected, choose the lowest valley, discard the rest.
    while not flag:
        flag = 1
        for i in range(1 , len(index2)-2):
            if abs(index2[i]-index2[i+1]) < width:
                flag = 0
                if d[index2[i]] > d[index2[i+1]]:
                    index2[i] = 9999
                else:
                    index2[i+1] = 9999
        irx2 = np.where(index2 == 9999)
        # print(irx2)
        index2 = np.delete(index2 , irx2)
    return index1,index2
are SQL related questions allowed in this channel? 8-)
Hi everyone! I need some help with implementing this technology in my application. demo.py uses SSDLite, I need to change it to ResNeXt101. How can I do it? Thanks in advance 🙂
Is there any database somewhere with images for computer vision?
overalaping images for NeRF especially
Where x=certain number
Fx under two conditions
like we can change the degree of the fitting polynomial in the LogReg algo using a Python library, can we also change the coefficients of that polynomial? or will we have to write all the code by hand?
Like a box plot?
You can use the #databases channel for that.
But if your question is about using SQL for ML or DS, I think you can use this channel.
@hasty kiln you are right 
hey quick question guys:
axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker='.')
this line seems to crash my code, any idea why that could be?
img = cv2.imread('IMG_7659.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
pixel_colors = img.reshape(-1, 3).astype(int)
norm = colors.Normalize(vmin=-1.,vmax=1.)
norm.autoscale(pixel_colors)
pixel_colors = norm(pixel_colors).tolist()
r, g, b = cv2.split(img)
fig = plt.figure()
axis = fig.add_subplot(1, 1, 1, projection='3d')
axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker='.')
axis.set_xlabel('Red')
axis.set_ylabel('Green')
axis.set_zlabel('Blue')
plt.show()```
it just doesn't show the plot
Thank you
Show the error traceback
scatter normally takes an array of x coordinates, and an array of y coordinates
you supply r, g and b
Not sure what those are supposed to be
And you are saying that the line is "crashing" your code, what does that mean?
@pliant star
Yeah its just doing nothing and loading properly
RGB are the color values of the image
I got them with cv2.split(img)
So what are you trying to plot?
The image itself?
Using the hsv ones
Don't really get it
yeah
Alright, so just do plt.imshow(img) and then plt.show()
Or use cv2.imshow('image', img) and then cv2.waitKey(0) iirc
This is a 3d scatter plot, you need to use something other than plt.scatter
I want sth like that, such that i can group the different pixel values
Oh hmm, nvm it seems it can be done with this as well
Did you follow this?
And what does it show now?
Can you show your screen after it shows the plot
I'm moving today, so I probably won't be able to look at it much later
But maybe someone else could help at that point ^^
!pastebin
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
pd.to_numeric(car_data["Price"])
pd.to_numeric(car_data["Doors"])
I thought the error was that they were strings
something something int can't be divided by string
but i don't think there are any strings here
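One thing worth checking here: `pd.to_numeric` returns a new Series and leaves the dataframe untouched, so calling it without assigning the result changes nothing. A sketch with made-up values for `car_data` (the real data isn't shown, and `price_per_door` is just an example calculation):

```python
import pandas as pd

# Hypothetical data where the numbers were read in as strings.
car_data = pd.DataFrame({"Price": ["4000", "5000", "7500"],
                         "Doors": ["4", "4", "2"]})

pd.to_numeric(car_data["Price"])   # result is discarded; the column is unchanged
print(car_data.dtypes)             # still not numeric

# Assign the converted Series back to the column to actually change it.
car_data["Price"] = pd.to_numeric(car_data["Price"])
car_data["Doors"] = pd.to_numeric(car_data["Doors"])

price_per_door = car_data["Price"] / car_data["Doors"]
print(price_per_door)
```

Printing `car_data.dtypes` before the arithmetic is a quick way to confirm whether any column is still stored as strings.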
@iron basalt Sorry for the ping, but I’ve been wondering if I could somehow “test” the monotonicity constraint? Could I generate some fake values? I want to see how the monotonicity constraint works. Because obviously it works but I need to know how. Any idea how to proceed with this?
Man I hate waiting networks to train so long
It’s been 2 hours
Still only 10% in
Do any of u use a tuner or just eyeball it
IIRC, the source has a boolean in QMix that lets you turn it off (on by default). And also you could check if the weights are ever negative or not. If you want to check the partial derivative, you could work it out by hand (or use a tool) to see that positive weights have the desired effect.
the output we get..is higher pg good or lower log
i want to classify racy or violent or border line videos( with audio), where can i get a good pre-trained model for that?
how does the partial derivative help?
so I work that out by hand to prove if the weights are always positive?
No, non-negative weights give that constraint. The weights being non-negative is given, it's what the code does. Show that given non-negative weights, you have that constraint. And then show that given that constraint your problem becomes more feasible. Then demonstrate it with an experiment (this is what the paper is all about, it's pretty straight forward even if the code seems like a lot (it's easy to say that you have some network in text, but it can be very annoying to code / the difference between stating something and actually doing it)).
(5) is just (1) again, not sure why they wrote it twice.
hey there, im currently building a timeseries database for pandas / dask dataframe data which can handle multiple billions of lines of dataframes. if anyone has a usecase for this and would like a specific feature hit me up! https://github.com/mercator-labs/oakstore
highspeed timeseries pandas dataframe database. Contribute to mercator-labs/oakstore development by creating an account on GitHub.
Can anyone give me a simple explanation as to why we have to specify the DataFrame name multiple times when using Pandas methods? e.g.
my_df[(my_df.value == 10) & (my_df.object == 'Sally')]
why do we need to keep repeating my_df?
How else can it work? The way this works is just combining a few pandas features:
- a comparison with a Series returns a Series of booleans (the comparison results for each element)
- boolean Series can be elementwise ANDed using &
- dataframes can be indexed with a boolean Series to select only the rows for which the corresponding element is True.
- If a column's name is a valid Python identifier, like value, you can access that column not just as df["value"] (like usual), but as df.value.
It's not magic. The expression on the inside of the square brackets here knows nothing about what it's used for, and can't possibly replace, say, value with my_df.value.
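The steps above can be spelled out one at a time. This is a sketch with a made-up `my_df`; the last line shows `DataFrame.query`, which is one way to avoid repeating the dataframe name because pandas parses the string and looks the column names up itself:

```python
import pandas as pd

# Hypothetical data with the two columns from the question.
my_df = pd.DataFrame({"value": [10, 10, 7],
                      "object": ["Sally", "Bob", "Sally"]})

value_is_10 = my_df["value"] == 10     # boolean Series, one entry per row
is_sally = my_df["object"] == "Sally"  # another boolean Series
mask = value_is_10 & is_sally          # elementwise AND of the two

filtered = my_df[mask]                 # keep only the rows where mask is True

# Same selection without repeating my_df.
same_rows = my_df.query("value == 10 and object == 'Sally'")
print(filtered)
```

Each intermediate name is an ordinary Python object, which is why the expression inside the brackets has to mention `my_df` explicitly.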
heartbreaking: the worst person you know just made a great point
I wanna start learning about AI programming but I don't know if I'm ready. I took a 20-hour course on Python before. Should I start learning data science and AI, or should I learn something else first?
learn some math (if you dont know already)
as in calculus and linear algebra
not necessarily before though
i know a little bit of math, 10th grade math))
yeah so perhaps you know derivatives, chain rule, product rule etc?
that stuff is important for ml
or perhaps you also had some stuff about vectors
only a little bit about derivatives and chain rule
in general you wont have to apply math yourself directly however having deeper understanding and intuition for those things is very useful
focus on Python and statistics first, I'd say
in my experience starting out similar to you once, the maths is really hard to just learn at will, just pick up the ideas over time and you'll be ok
like we discussed earlier, i'd fail a linalg/calc2 exam 10/10 times but i still managed to impress interviewers for junior DS roles, if its not faang youll be fine
people like to pretend that DS requires you to leave uni like Zuckerberg. this isn't the case, a lot of DS have had to learn for many years
Companies probably value your ability to produce and to make projects using tools more than your ability to do math off hand
Or so I’ve been told today
Check out this natural language query add-on for Pandas: https://pypi.org/project/askedith/
youtube has violence, racism, hate detection model for video, what kind of algo is applied there?
anomaly etc. or is it like seperate algorithm for each of those??
i only see specific models for hate speech, or for gore etc
but I want to make something that can detect all types of unpleasant behaviour
does someone know how to exclude that dtype int64?
having trouble when calculating the accuracy
Exclude from printing? Why do you need to?
yes, from printing
Why would removing that from printing help you calculate the accuracy better?
dont they need to be reported
I have a text corpus. I want to train an AI in such a way that it takes in an input sentence and predicts the next sentence. How do I setup the training and testing data, as well as sentence pairs, without letting the AI "cheat"
hey guys can anyone explain why its only capturing 100 items when there is 500 items on the website? https://www.pythonmorsels.com/p/2bfgr/
How do I fix this?
I get this error when I pip install pyaudio
How to fix it without installing visual c++ 14.0
???
required
The first few answers to google for "pyaudio windows" yield some great results (including the very same error message you get). Rather than copy/pasting them, I would suggest you try that.
In general, it helps a lot to google the error messages too
(I don't use windows, so I can't really help further)
Thanks
For u muaahHh ...
https://www.reddit.com/r/learnmachinelearning/comments/cxrpjz/comment/eyn8cna/?context=3
Termux is not compatible with ai-python, is it?
Thats secondary, without detection, no reporting
Trying to detect views from title, subscriber and days since published data
What I did for embedding data is I got each word's embedding (100 features) in title and then averaged it.
what else can I do for dimensional reduction of Embedding data
or like anything to increase the prediction quality
What gpu is that
Fast
why don't you want to install it? even if you find a workaround for this library, the C++ build tools are necessary for a lot of installations. it's one of the first things I install when I get a new Windows machine.
Hi folks
Someone by chance knows how to recognize objects in image by using
"image registration" method?
I'll be happy get any help, thanks in advance ! 🙏
do you have to do it in terms of whole sentences? do you want each sentence to be copied exactly from the corpus?
I have to do it in whole sentences, and I'm not sure about any other way to do the last part. I think I got it working. (For every two sentences, I put one in labels and one in inputs)
Guys, how do you use GPT-3 to create music melodies, lyrics, and other things that help with programming, engineering, or even learning English? I am completely new to programming and have only recently discovered GPT-3; please point me in the direction of where I can read or watch (YouTube) about it. I'm not sure if there is a UI for collaboration with GPT-3. Thanks...
anyone CURRENTLY doing the CNN course of the deep learning specialisation on Coursera?? I need to see the updated lab code.
Using matplotlib with Nodezator (pip install nodezator) for data visualization
I'm happy to announce the Nodezator app, a node editor for the Python programming language that turns Python functions into nodes. It is expected to be released in June 2022, the first app of the Indie Python project to be released. Visit its dedicated website: http://nodezator.com
http://indiepython.com
https://twitter.com/IndiePython
http://...
can I Remote Desktop into my machine from a laptop? and see the visualizations?
bc I’m not sure if I should buy a laptop or make my pc smaller
If you're getting started with AI, you don't want to start with GPT-3.
@pearl locust is this self promotion?
what about normal extensions? do I have to setup a server?
What can u recommend then? And if it a not a big deal for you, pls, can ya send me some links, where I can get started with gpt-3?
How can I identify if a word is a material or not?
Such as:
black titanium handle
I am able to identify the color
but I need to be able to identify the titanium
It is. I made it brief so as not to disrupt your conversation. I'm just sharing this app, since I think it can be useful for someone, specially since it is free of charge and on the public domain. That's all.
Here's the github page: https://github.com/KennedyRichard/nodezator
For some reason pytorch is giving me a single data point instead of a full batch, and I don't know why.
```py
dataset = torch.utils.data.TensorDataset(seqs, labels)
dataloader = DataLoader(dataset, batch_size=32, drop_last=True, shuffle=True)
for epoch in range(100):
    for i, data in enumerate(tqdm(dataset, desc=f"Epoch: {epoch}")):
        print(data)
```
Data here is just a tuple of one tensor from seqs and one from labels
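A likely cause, going only by the snippet above: the loop iterates over `dataset`, not `dataloader`, so each item is a single (sequence, label) pair. A minimal sketch of the difference, with made-up tensor shapes standing in for `seqs` and `labels`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 128 sequences of length 10 with binary labels.
seqs = torch.randn(128, 10)
labels = torch.randint(0, 2, (128,))

dataset = TensorDataset(seqs, labels)
dataloader = DataLoader(dataset, batch_size=32, drop_last=True, shuffle=True)

# Indexing (or iterating) the dataset yields one sample at a time.
single_seq, single_label = dataset[0]
print(single_seq.shape)

# Iterating the DataLoader yields batches of 32 samples.
for batch_seqs, batch_labels in dataloader:
    print(batch_seqs.shape, batch_labels.shape)
    break
```

So in the original loop, wrapping `dataloader` (not `dataset`) in `tqdm` should give full batches.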
Hey y'all, is there a rough formula or rule of thumb on how many iterations to try out when training a Self-Organizing Map (SOM) / Kohonen Map?
How are you getting the color right now?
I would start with a beginner data science book. the fundamentals of data science lend themselves well to AI.
I know it's tempting to jump to the newest and coolest things that are happening in AI, but you won't be able to understand them until you've been developing your knowledge for a long time. starting from the basics and working your way up can still be satisfying. you have to keep a positive attitude about learning
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Hello. Can I ask a statistics question here?
Or would #algos-and-data-structs be better?
Dont ask to ask 😉
Nvm I see the channel description says statistics.
If I have some z-scores and combine them into one z-score, is this a good approach? I have 1 sample so I can't use Stouffer's method.
I'm trying to make backpropagation myself, and one problem I have is that it doesn't work when a layer is using relu as its activation function instead of sigmoid, since the derivative I tried was relu(x)/x, and that can sometimes result in division by zero errors. Do I have the derivative wrong or is there something else I'm missing about it?
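For reference, relu(x)/x does give the right slope everywhere except at x = 0, where it divides by zero. The usual form is a step function: 1 where x > 0 and 0 elsewhere. A minimal NumPy sketch (the function names are just for illustration, and using 0 as the value at exactly x = 0 is a convention, not a requirement):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x).
    return np.maximum(0.0, x)

def relu_grad(x):
    # Slope of ReLU: 1 where x > 0, otherwise 0 (0 chosen at x == 0 by convention).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
print(relu_grad(x))
```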
@serene scaffold man, thanks a lot, I utterly appreciate your swift responses and advices so so much, be blessed and happy
is YOLO still used in industry??
u can use davinci on openais playground, but it isnt going to create melodies for u lol
statistics questions are better for here. #algos-and-data-structs is more about sorts, graph traversals, big-O, etc. as well as general CS theory stuff.
Ok.
What will create davinci then? In generally, what can I do with this davinci addition? Which else benefits it can get? Thanks
you can tell it do say something
speaking of R, no offense to R users, but i made it the 3rd round for a company and decided to cancel my upcoming interviews bc i found out they worked in R mostly (among other reasons)
if you do have to use R, for whatever reason, you can use R Studio or even Jupyter notebooks for it


it actually has some good stats packages
for monte carlo simulations, etc.
or bayesian stuff
but most of the time you dont need that
and if youre going to deploy, its better to be in python anyway
i feel like youre more marketable too
also you can switch to more software jobs easier if you want to later on
bro...
you should see javascript

that dot notation
but yeah it is kinda heavy in R
oh the tidyverse is good too if you ever have to work in R

oooooops. outputs are supposed to be between -1 and 1 and I used sigmoid. my bad
so guys should I buy a laptop to Remote Desktop into my pc or should I make my pc smaller? Rn I’m doing work in MARL
how in pytthon
might as well get a macbook pro
sm?
how to put this
you didnt like what i had to say about data engineering basics so why would you listen to me about bayesian stats...



dude does it on a dataframe
what now?
it’s not supported
cuda isn’t supported
no data engineering really just calculations
why u need that
Well it depends if u need something rly strong like RTX 3080 or better
If not the m1 pro has a gpu that’s alright
i must say, you're one of my biggest inspirations to pursue DS
this one is hella neat, what equation/function is that? 👀
I created a segmentation model with a train: 80%, val: 60 after 500 epochs. What is the problem? Is it overfitting or underfitting?
Where to get started with Data Science?
We're a large, friendly community focused around the Python programming language. Our community is open to those who wish to learn the language, as well as those looking to help others.
having been on here for a while I see that many share my sentiment that pandas is very difficult to use so I made an alt for CSV parsing
any feed back would be very nice
So, I'm reading this: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
I need to read a dataframe row by row - and then depending on the contents of certain cells, write stuff to a list of lists that I'll eventually write to a csv.
But it sounds like iterating over a dataframe is not a great idea - so how would someone do that?
For example, my df contains a column containing a uuid, a name, and then a bunch of other columns with boolean values - and based on those I want to write a line to a csv that has that uuid, name, and those boolean values.
Hello guys, I have a question
Supervised learning got performance measure techniques such a Cross-Val, Confusion-Matrix, Precision & Recall, ROC ..
How could we therefore evaluate a reinforced agent in RL ??
Heyy guys, what does this mean? 😅 " tensorflow 2.9.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.21.2 which is incompatible."
does it mean I have to use an older version of protobuf?
do you know what those three-part version numbers mean?
depending on the contents of certain cells
let's make this question not abstract. Please do print(df.head().to_dict('list')), show the text, and say which rows you don't want and why.
Requires protobuf version Less than 3.2 or more or equal than 3.9.2 is what I think
But I thougtht that wouldn't make sense
for version numbers, 3.2 is not the same as 3.20
they're not decimal numbers.
Ohh
when you have x.y.z, x, y, and z are each their own number
that's why we went from Python 3.9 to 3.10
Ahh thanks! 😅
look into "semantic versioning"
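To make the comparison concrete, here is a small sketch that treats each dot-separated part as its own integer and checks the constraint from the pip message (`protobuf<3.20,>=3.9.2`). The `parse` and `satisfies` helpers are hypothetical and only handle plain numeric versions; for real work the `packaging` library's `Version` class is the safer choice.

```python
def parse(version):
    # Hypothetical helper: "3.9.2" -> (3, 9, 2). Plain numeric versions only.
    return tuple(int(part) for part in version.split("."))

def satisfies(version):
    # The constraint from the pip message: >=3.9.2 and <3.20
    return parse("3.9.2") <= parse(version) < parse("3.20")

for v in ["3.2", "3.9.2", "3.19.6", "3.20", "4.21.2"]:
    print(v, satisfies(v))
```

Tuples compare element by element, so (3, 19, 6) sorts below (3, 20) even though 3.196 > 3.20 as decimals.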
do u think we shud have a pinned or channel description for all the peope who asking where to start?
I'd have to write it, and that is work.
Can my segmented image be in gray scale if it is multiclass?
whoa, that's an old-ass plot of mine you found
this was simulating, IIRC, a particle moving in a combination of constant magnetic and electric fields, for an electrodynamics homework task
I realised after making it that the task in question was in fact exactly analytically solvable (rather than only approximately so like I thought) by just applying a fourier transform 😔
haha yeah was going through some videos/pics for some ✨ inspiration ✨ and was taken aback by that
do you perhaps have the task because i really want to try doing that
it's hella cool!
Here, dug it up (and translated to English). Looks like it was a variable field, not a constant one.
.latex A dielectric can be described using a model in which each atom consists of a stationary positive charge and an electron moving around close to it. Their interaction is described by a simplified potential $\frac{m \omega_0^2 r^2}{2}$, where $r$ is the distance between the electron and the center of the atom, $m$ is the electron mass. Find the dielectric permittivity tensor of the medium in a variable magnetic field $B$ which is pointing along axis $z$ and has frequency $\omega$. Represent the answer in terms of the following parameter:
\[
\omega_{LT}= \frac{2 \pi e^2 n}{m \omega_0},
\]
where $n$ is the concentration of atoms, $e$ is the electron charge.
yay, surprised the bot handled that correctly.
wah, thank you!
(note: the permittivity tensor is at frequency omega, too. So you have a magnetic field at that frequency, and apply an electric field at that frequency, and get some electrons moving at that frequency as a result.)
what is this physics shit
What about a link to a pre existing website ha
ooh pretty nice
would rather you do the one for #career-advice instead

is sklearn's svr rbf gaussian?
Hey all , this is my new notebook on Support Vector Machines (from the book Hand's on Machine Learning) , do leave an upvote if you learn something. Cheers! https://www.kaggle.com/code/supreeth888/support-vector-machines-hand-s-on-ml/notebook
@night sequoia ur the perfect person to answer my question then
im literally writing a report on svr as we speak
if its ok can i use ur visualisastion code i cba to write my own
Cool ! Check that notebook and let me know what you think ?
yes actually it is, the rbf kernel is the gaussian kernel. you can use the kernel trick and specify it like this : (kernel="rbf")
do u think i shud visualise sample points to show how it works
or just use a randomised one like oyu
u can use the make_moons dataset
Can someone help to refer a text book of stats?
@tacit basin I mean to say for helping data science,this above book doesn't contain topic like p-test,z-test,chi-square test
It's considered one of the best books on the subject.
can i get ur guys take on normalising by dividing by 100 if values are between 80 and 300
my professor did it
he just put eveything on a 0-3 scale
lol
well theres v small values that are <1 and they were /100 also
so its the same scale
just not 0-1
I thought they talk p value in stats learning. Was Long time ago when i read it though...
they do
guys I am currently working on this binary classification dataset. I've made the model and want to now apply it to the actual test data. It contains around 1 mill rows and I fear it may restart my kernel due to insufficient memory. Is there anyway I could maybe get around this. Should I split the testing set or wot. I need a final csv of all the final scores
How many columns and what type of data? @nova matrix
It's a numerical data 190 columns
I mean it did have a few categorical strings but aim is to change them into int vals
Yeah so that might be too much to load at once
Not sure in what format it is stored right now, but you could just load it in batches
you know any good way to do that or any link to smth that shows it, cuz I've never done it like that b4
How is it stored right now?
csv format
This explains how to do it pretty well
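In case the link doesn't survive, a rough sketch of the batch approach with pandas' `chunksize` option. The tiny in-memory CSV, the column names, and the `score` function are placeholders; in practice you would pass the real file path and call your model's predict method on each chunk.

```python
import io
import pandas as pd

# Stand-in for the large test CSV (hypothetical columns).
csv_text = "id,f1,f2\n" + "\n".join(f"{i},{i * 0.1},{i % 3}" for i in range(10))

def score(chunk):
    # Placeholder for something like model.predict_proba(chunk[features])[:, 1]
    return chunk["f1"] * 2 + chunk["f2"]

results = []
# chunksize makes read_csv return an iterator of DataFrames instead of one big frame.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    results.append(pd.DataFrame({"id": chunk["id"], "score": score(chunk)}))

scores = pd.concat(results, ignore_index=True)
print(len(scores))
# scores.to_csv("final_scores.csv", index=False) would write the final file.
```

Only one chunk of features is held in memory at a time; the collected scores are just two columns, so they stay small.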
I'm using pandas to do expectation-maximization but I'm stuck with how the table is rearranged by the merge method after multiplying two factors.
this is what I have.
What I want rearrange column Dunett to the pattern of ('False', mild, severe, false, mild,severe, so and so forth)
That way it's easier for me to do a normalization
Does anyone have any suggestion?
Have you tried df.sort_values
Would it work though because I'm sorting Dunett, and Dunett only has False, mild, and severe
Oo just realized you wanted to order by columns too
I did it!! Praise God
In the end, I just had to add a new column seems like
Wow this is something I learned. To manipulate something in ur df, often if you add more columns to store values can help
is use the mnist dataset
i*
why is it showing 1875 instead of 60000 while training
?
# import libraries
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
from matplotlib import pyplot as plt
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
class_names = [0,1,2,3,4,5,6,7,8,9]
model = keras.Sequential([keras.layers.Flatten(input_shape=(28,28)),
keras.layers.Dense(128,activation='relu'),
keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(train_images,train_labels,epochs=3)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose = 1)
print("Test accuracy : ", test_acc)
i = 0
plt.figure()
plt.title(f"the number is : {train_labels[i]}")
plt.imshow(train_images[i])
plt.colorbar()
plt.grid(False)
plt.show()
print(class_names[test_labels[i]])
this is the code
batch_size will default to 32, so 60000/32 = 1875
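Spelled out as arithmetic: the progress bar counts batches (steps), not individual images. A tiny sketch, using the 60000 MNIST training images and the default batch size of 32:

```python
import math

num_samples = 60000   # MNIST training images
batch_size = 32       # Keras default when batch_size is not passed to fit()

# One step per batch; the last batch may be smaller, hence the ceiling.
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)
```

Passing `batch_size=64` to `model.fit` would roughly halve the number shown.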
Hi, I hope this is the right forum to pose this question
```py
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
data = sheet.get_all_values()
df = pd.DataFrame(data)
df_sorted = df.sort_values("GAMEs")
NAME GAMEs ATT COMP YRDs PCT TDs INTs
Mills 14 438 287 3468 65.5 18 8
Lance 19 318 208 2947 65.4 30 1
Trask 26 813 552 7386 67.9 69 15
Wilson 30 837 566 7652 67.6 56 15
Jones 30 556 413 6126 74.3 56 7
Fields 34 618 423 5761 68.4 67 9
Law 40 1138 758 10098 66.6 90 17
Book 45 1141 728 8948 63.8 72 20
Mond 46 1358 801 9661 59 71 27
Eger 46 1476 923 11436 62.5 94 27
num0 = df_sorted["GAMEs"]
num1 = ["green" if (g < max(num0)) else "red" for g in num0]
plt.figure(figsize = (10, 10))
plt.bar(x = df_sorted["NAME"], height = df_sorted["GAMEs"], width = 0.4, color = num1)
plt.grid(color = 'red', alpha = 0.2, linestyle = '--', linewidth = 1)
plt.xlabel("QB Names (x)", fontsize = 12)
plt.ylabel("Number of Games Played (y)", fontsize = 12)
plt.xticks(rotation = 45, fontsize = 12)
plt.yticks(fontsize = 12)
plt.title("Total Number of Games Played", fontsize = 12, fontweight = "bold")
plt.show()```
This starts at 14, I would like it to start at zero
adding plt.ylim(0, 50) does this
adding bottom = 0 or bottom = None, does not change the graph at all
I have tried everything and researched multiple sites that offer almost the same advice and none of it works. So decided to try here
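One possible cause, assuming the sheet values arrive as strings (which is what a call like `get_all_values()` typically returns): matplotlib then treats the heights as categories, so the axis begins at the first category ("14") rather than at zero. A sketch of converting the column to numbers before plotting, with a few made-up rows:

```python
import pandas as pd

# Hypothetical rows shaped like a sheet dump: header row first, every value a string.
data = [["NAME", "GAMEs"], ["Mills", "14"], ["Lance", "19"], ["Trask", "26"]]

df = pd.DataFrame(data[1:], columns=data[0])

# Convert the string column to real numbers so the bar heights are numeric.
df["GAMEs"] = pd.to_numeric(df["GAMEs"])
df_sorted = df.sort_values("GAMEs")
print(df_sorted.dtypes)
```

With numeric heights, `plt.bar` normally starts the bars at zero without needing `bottom` or `ylim`.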
2 dataframes in a list and this doesnt work....
for df in df_list:
    df.loc[~(df==0).any(axis=1)]
why
Because they are the same df and you are telling it to produce the same output?
you want to replace 0's in the df with 1's correct?
no just delete all 0
actually i want do delete rows that contain 1
in one row and zeroes in another, but i wrote a small example
and then use that on the list. Problem is it works without for, but not with for
1 for df in df_list:
----> 2 df.loc[~(df==0).any(axis=1)]
AttributeError: 'int' object has no attribute 'loc'
df.loc doesn't modify the original df, so doing df.loc[something] and discarding the result is like doing nothing.
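A small sketch of one way to keep the filtered result: rebuild the list instead of filtering inside a bare loop. The two frames are made-up examples. Separately, the `'int' object has no attribute 'loc'` error suggests that whatever is being looped over contains ints, not DataFrames; looping over a DataFrame directly, for instance, yields its column labels.

```python
import pandas as pd

# Hypothetical example frames.
df_list = [
    pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 0]}),
    pd.DataFrame({"a": [0, 2], "b": [7, 8]}),
]

# Keep only rows that contain no zeros, and store the result back in the list.
df_list = [df.loc[~(df == 0).any(axis=1)] for df in df_list]

for df in df_list:
    print(df)
```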
So I evaluated a number of models in terms of binary classification for different data. Now that I am writing a about it, I am unsure how to group these models thematically and especially regarding my table of contents. The models are:
Logistic Regression, Bernoulli Naive Bayes, Random Forest, Regularized Greedy Forest, XGB, Deep Neural Network, ROCKET, SVM, KNN, LSTM, GRU, RNN, Voting, Stacking, Bagging.
Easy ones:
Recurrent models: LSTM, GRU, RNN
Decision Trees: Random Forest, RGF, XGB
Ensemble Models: Voting, Stacking, Bagging
That leaves: Logistic Regression, Bernoulli Naive Bayes, Deep Neural Network, ROCKET, SVM, KNN,
As ROCKET uses convolutional kernels and basically resembles CNNs, I thought about grouping DNN and ROCKET as Neural Networks, but I mean the recurrent models above are neural networks as well.
SVM and KNN could be grouped as nonparametric models. That would leave Logistic Regression and Bernoulli. Logistic Regression despite its name could be categorized as generalized linear model in this binary classification approach. But Bernoulli is not really a linear model, is it?
I would love to categorize the models ideally in groups of 3 (give or take). Do you have suggestions on what kind of meta chapters / categories to choose? How would you go about categorizing these models?
If im going to pursue a career in Data analytics which languages, libraries, and tools do I need to know ?
I already know Python pretty well (OOP, Numpy, Pandas), some bash, and SQL pretty decently, what should be my next priority ?
@fallow frost try doing some actually projects that leverage or build on all of these skills
I have! I made quite a few, and im planning to upload 2 more to my github
but do you think I could even get an Internship or an entry level job yet with just these skills ?
are you a CS student currently?
at least in the US, you're probably not going to get a data scientist job without a degree.
do you have a degree in something else?
and it will take 7 months to finish, so I want to find something befoore
I dont have anything, not even a GED
but ik for a fact there are self taught data analyst in the US
which is why im doing the bootcamp, to get some kind of formal 'education'
tbh, if I were you, I would quit the bootcamp (if you can get your money back) and enroll in university.
sorry, but the bootcamp just isn't going to be valued the same as a degree by employers.
ofc but you cant say that there is no chance at all
why would you go with the option that gives you a worse chance...?
like for a senior data scientist I can understand, but for an ENTRY Data Analyst then It should be fairly easy if you have a solid portfolio
but for an ENTRY Data Analyst then It should be ~~fairly easy~~ not completely impossible if you have a solid portfolio
nah first of all I dont pay any money to the bootcamp until im hired
Anyway, I don't think I'm going to change your mind, but I hope this works out for you.
so they will have to find me a job
Yeah probably
but what do you do ? @serene scaffold
Data scientist ? analyst ?
I'm an AI developer, namely for AIs that involve language.
like NLP ?
yes
ahh ok
and what kind of degree do you have ?
bachelors in CS, with AI and DS-related coursework.
I probably wouldn't have gotten this job with only a bachelors if I didn't also publish as an undergrad.
ayee im doing the same 🤝
pardon my noobiness but wth is an undergrad
a bachelors degree (ie not a masters or phd)
so its the same as a bachelors ?
my degree is a bachelors, yes.
I should add that I don't think university is the perfect mechanism for imparting knowledge. it's just the most widely recognized.
I think that just learning arbitrary tools is the wrong way to do it. that's why I suggested to do more projects. preferably one that relies on some concept that you're not currently familiar with.
I could keep making more projects, but for me it seems like I already mastered the basics of Python and Its already OP for being a data analyst which the max i'll do with it is clean data.
altough I do agree I should make more projects that are specifically related to data analysis using Pandas and SQL
but even then, just knowing those three tools isnt enough, right ?
enough for what?
each job is going to have different criteria that they look for
which imo I will be able to find even b4 I finish the boot camp if I manage to master those tools
and btw im not trying to get a 'good' job, I really dont give a shit as long as im hired
then I could leverage my work experience to get smth better instead of doing 3-4 years of college
do you have some idea of the kind of position you'd like, or where you'd like to work? you could look at job descriptions and see how you compare to the sought-after profile
or whether you feel confident enough to do what the task descriptions contain
where it dosent matter, but I would love to make scripts/programs with python and work with pandas, since I really enjoy creating programs, and Im good at that imho
hmm but it makes more sense imo to focus on the task, not the tool. what if you don't even get the choice to work with python, for whatever reason?
I'm looking at the job descriptions on LinkedIn and 95% of the job postings are by agencies who dont know jack shit about programming or data science, but most are like: 'we're looking for an Entry level data analyst with 2+ years of experience who must know the following 10 technologies, and the ideal candidate will also be familiar with these additional 10 tools'
so its fucking ridiculous
exactly, so what should I learn ?
Like there is so much that im not sure where to continue
Data visualization: Tableau, Power BI, ETL, ML, pySpark and so many more
i guess i couldn't say, i guess my position is a little removed from the real world
looks like you'd wanna go into visualization next, in any case
Is there a way to pass multiple separators to pandas' read_csv()? Something akin to this:
so true
I've no idea how to use Regex, I need it to use both ; and , as separators
it sucks, Ill need to find a way to reach out to smaller companies directly
pandas.errors.ParserError: Expected 1140 fields in line 1029, saw 1144. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Used: df = pd.read_csv('products_2.csv', sep=r"\s+|;|,", engine='python')
@limber token chances are, this CSV actually represents nested tables
because what that error message is telling you is that if you use \s+|;|, as the delimiter, each row has different numbers of columns
and that's not allowed.
rekt
see if you can figure out which delimiter is used the same number of times in each row
that's the real file-level delimiter. if you need to break up the nested data, you can use .str.split(';').explode()
I'm not sure what to do.
The file is an import file for my work, but it was a faulty import file that set a bunch of attribute values as empty strings when it shouldn't have, my job is to recover the lost attributes
did you figure out which delimiter is used the same number of times in every row?
Looks like it's the comma
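If the comma is the file-level delimiter, the semicolons can be handled after loading. A minimal sketch, with a made-up two-column file standing in for `products_2.csv` (the column names are assumptions):

```python
import io
import pandas as pd

# Hypothetical file: commas separate columns, semicolons separate values nested in one cell.
csv_text = "sku,colors\nA1,red;blue\nB2,green\n"

df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Split the nested cell into a list, then give each list element its own row.
long_df = df.assign(colors=df["colors"].str.split(";")).explode("colors", ignore_index=True)
print(long_df)
```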
Hi everyone. Can someone please let me know what is the best way and resource to learn Pyspark?
guys do you face problems lately using chrome with google colab?
Nope 👍🏽
Hey guys,
Does anyone know how to fix this error?
ValueError: cannot reindex on an axis with duplicate labels
#help-orange full context
please show the whole error message from Traceback:, and the relevant code, and print(df.head().to_dict())
Ill do the print lets continue in orange channel
Hello everyone! I need your help, please. I've been investigating this for a long time but I don't have an answer. I need to export a dataframe to Excel, but in Excel the number format of the cells needs to be "Text", like in the pic. Whenever I export to Excel it sets "General" and not "Text".
I think "general" is the default option when you create an excel... try to convert it in your excel sheet instead
But I need to do it from python(pandas or other library) but automatically
there are excel libraries for python
Python for Excel compiles the best open-source Python libraries for working with Excel. It helps you choose the most suitable library for your use case.
have you also checked Pandas.ExcelWriter Method?
I took that but the number format gives me "general"
maybe try with XlsxWriter
I'd say, using YouTube or Buying courses on DataCamp/DataQuest/Coursera etc
but first you might have to create your excel file with pandas and then edit it with XlsxWriter
I will try, thx u Obserdo!
Hello! Does anyone here knows how to implement the NBEATS algorithm and can help me with it? Thanks in advance.
Hello. Does anyone know what these mean or how to replicate this format?
1.346899999999999977e+02
1.322500000000000000e+02
1.300000000000000000e+02
1.335200000000000102e+02
1.303079999999999927e+02
Hmm... These are supposed to be stock prices so I have no idea how it got turned into that.
it's just a way to print a float
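Those look like ordinary floats written in scientific notation with 18 decimal places, which is (for example) the default format used by `numpy.savetxt`. Where the file actually came from is a guess, but the format itself can be reproduced with plain Python:

```python
prices = [134.69, 132.25, 130.0]

# ".18e" = scientific notation with 18 digits after the decimal point.
for p in prices:
    print(format(p, ".18e"))

# Reading the long form back gives the same float as the short form.
print(float("1.346899999999999977e+02") == 134.69)
```

The odd trailing digits are just the nearest binary floating-point value to the price, shown with more digits than usual.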
hey guys, any idea why the code keeps crashing me?
```py
def detect(self, image, original_shape=None):
    height, width, channels = image.shape
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    self.net.setInput(blob)
    outs = self.net.forward(self.output_layers)
```
```
(tensorflow) SoSe22/appliedcomputerscienceinsports/detector » python main.py main deleted modified untracked
Traceback (most recent call last):
File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/main.py", line 55, in <module>
run()
File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/main.py", line 32, in run
processed_image = detector.detect(preprocessed_image, original_image.shape)
File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/test_ui/detector/__init__.py", line 32, in detect
blob = cv2.dnn.blobFromImage(image, 1, (416, 416), (0, 0, 0), True, crop=False)
cv2.error: OpenCV(4.6.0) /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:3689: error: (-215:Assertion failed) !dsize.empty() in function 'resize'```
(3, 2160, 1216) that's my image.shape
That normally means that the image is empty @pliant star maybe you didn't load it in correctly
Like a wrong path or something
Try cv2.imshow("image", image) and cv2.waitKey(0)
before the line that gives the error
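Another thing worth checking, given the reported shape of (3, 2160, 1216): that looks channels-first, while OpenCV functions expect (height, width, channels). Read as-is, OpenCV would see a 3-pixel-tall image with 1216 channels. A sketch of the reordering, assuming the first axis really is the colour channel:

```python
import numpy as np

# Stand-in for an image stored channels-first, matching the reported shape.
image_chw = np.zeros((3, 2160, 1216), dtype=np.uint8)

# Move the channel axis to the end: (C, H, W) -> (H, W, C).
# ascontiguousarray makes a proper copy in the new memory layout.
image_hwc = np.ascontiguousarray(image_chw.transpose(1, 2, 0))

height, width, channels = image_hwc.shape
print(height, width, channels)
```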
is it possible to detect 3 classes using only 2 "anchor boxes" when using bounding box technique??
why does he take 2
am i right if i think, we assume that more then 2 classes are highly unlikely in single bounding box??
Anchor boxes are a set of predefined bounding boxes of a certain height and width. These boxes are defined to capture the scale and aspect ratio of specific object classes you want to detect and are typically chosen based on object sizes in your training datasets.
This is what I found on anchor boxes
It seems that there may be 2 different shapes that are likely to contain a car/motorcycle/pedestrian etc.
@mint palm
Does that make sense?
So for each bounding box shape (2) you get a result for each class (8) whether that class is contained in a single box of the 3x3 grid
this may contradict what he said later. also, the 8 signifies the following:
is something detected
center of detection x
center of detection y
length of anchor box
width of anchor box
one hot representation for class1
one hot representation for class2
one hot representation for class3
these are stacked twice (as seen)
i think, this is actually the reason now
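The layout described above can be sketched as an array: each grid cell holds 8 numbers per anchor box (1 objectness flag, 4 box values, 3 class slots), stacked for 2 anchors, giving 16 values per cell. The grid size and the example label values are assumptions for illustration only.

```python
import numpy as np

grid, anchors, classes = 3, 2, 3
per_anchor = 1 + 4 + classes  # objectness + box (x, y, length, width) + one-hot classes

# Target tensor: one 16-long vector per grid cell.
y = np.zeros((grid, grid, anchors * per_anchor))

# Hypothetical label: centre cell, second anchor box, an object of class 3.
cell = y[1, 1].reshape(anchors, per_anchor)  # view onto the same memory as y
cell[1] = [1, 0.5, 0.5, 0.3, 0.9, 0, 0, 1]

print(y.shape)
print(y[1, 1])
```

So the number of anchor boxes (2) and the number of classes (3) are independent: every anchor slot carries its own 3-way class vector.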
Anyone know of a tokenizer that will parse a sentence like this
"You're amazing," she said.
['"', "you're", "amazing", ",", '"', "she", "said", "."]
" you're amazing , " she said .
you can probably do phrase chunking with spacy
Could you elaborate, google isn't being useful today
spaCy (with the upper C) is a general-purpose NLP library. are you familiar with it?
oh, I some-how read scipy. I had forgotten about spacy.
I forgot how amazing spacy's docs are
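As far as I know, spaCy's default English tokenizer splits contractions, so "You're" comes out as two tokens. If keeping "you're" whole matters, a plain regex (a swapped-in alternative, not spaCy) gets the output asked for above. This sketch only handles the straight apostrophe:

```python
import re

def tokenize(text):
    # Either a word (allowing internal apostrophes) or a single punctuation mark.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text.lower())

tokens = tokenize('"You\'re amazing," she said.')
print(tokens)
print(" ".join(tokens))
```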
what do I need to know to learn and use ML/AI
linear algebra, calculus, and probability/statistics, to start.
tysm
really 🙂
could someone help me with 3d patch extraction?
if you append dataframes to a list, such that each item in the list is a dataframe, is there a way to name these dataframes for reference? or do you just need to use the list index (listname[0], for example) and remember what each dataframe is meant to tie back to?
i think with my situation I may need to just keep track in my head what each df is. they're more or less in a certain order already, but it'd help to have a visual reminder
do all these dataframes have the same schema?
yeah, if by same schema you mean the same columns
then you should probably have one dataframe with multi-indexed rows.
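A quick sketch of both options: a dict gives each dataframe a name, and `pd.concat` on that dict builds the single multi-indexed dataframe suggested above. The frame names and columns are made up.

```python
import pandas as pd

# Named dataframes instead of a bare list.
frames = {
    "train": pd.DataFrame({"x": [1, 2], "y": [3, 4]}),
    "test": pd.DataFrame({"x": [5], "y": [6]}),
}
print(frames["train"])

# One dataframe with a two-level row index: (source name, original row number).
combined = pd.concat(frames, names=["source", "row"])
print(combined.loc["test"])
```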
unrelated to my other question.
I wish to make a model that creates space ships for a video game. I've got the data from the save files and the game stores them as json.
```json
"Parts": [
{
"IDString": "cosmoteer.armor_1x2_wedge_L",
"Location": [
-14,
6
],
"Rotation": 0
},
{
"IDString": "cosmoteer.armor_1x2_wedge_R",
"Location": [
-14,
8
],
"Rotation": 2
},
ect...```
Json is impractical for ML models, but I could turn the part list into an image, giving each tile an id. The big issue is parts come in a bunch of different sizes. (Luckily always rectangles.) So if I just tell it to predict a grid of part ids, it would likely start making impossible parts. But I can't see a better way.
Also having it output a rotation for every tile seems problematic, but inferencing the rotation from an output takes away control from the AI when multiple rotations could work.
I also need it to output a map of doors. Those are 1x1 so it's easier. my only issue here is this might be too much to throw at one model.
its this worth reading? http://aima.cs.berkeley.edu/
Unknown label type: 'continuous-multioutput' what's this error?
@chilly abyss I was looking into this earlier today. People suggested using pd.to_datetime(df.date) but I could not get it to work. You can do
import datetime
df['date'].dt.normalize() which sets all of the times to midnight, it obviously doesnt get rid of the time, but it may be useful
Thanks @tawdry urchin This was the error msg I got - 'AttributeError: Can only use .dt accessor with datetimelike values'
I am still checking online for solution, I think because of the '..+00' at the end of the ts (time series)
I will also try the solution you gave
@chilly abyss Sorry you are right, you need to use both, so you do
df.date = pd.to_datetime(df.date)
df.date = df.date.dt.normalize()
I was able to get it working like this in mine, hopefully it helps you out
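A self-contained sketch of the two steps, using made-up timestamps with a UTC offset (the real strings may look different). Passing `utc=True` is one way to get a consistent timezone-aware column, and `.dt.date` is an option if the time part should go away entirely.

```python
import pandas as pd

# Hypothetical timestamps with a UTC offset.
df = pd.DataFrame({"date": ["2022-06-01 13:45:00+00:00", "2022-06-02 08:10:00+00:00"]})

# Step 1: strings -> datetimes (the .dt accessor only works after this).
df["date"] = pd.to_datetime(df["date"], utc=True)

# Step 2: either reset the time to midnight, or keep only the calendar date.
df["midnight"] = df["date"].dt.normalize()
df["day"] = df["date"].dt.date
print(df)
```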
Im still learning myself, I basically have to learn python to create fixes in processes, so apologies if im missing anything
haha sounds good, im pretty comfortable with pandas, and ive done some pretty neat projects with python, its the classes/functions that I am mostly unfamiliar with, the code i am writing isn't really set up to run forever because im working with pretty bad data
in pandas is there a way to specify dtype as "dict"? both strings and dictionaries are "objects" and it's making preprocessing a dataset rather annoying
i.e. select all columns that are a dict
no. anything other than a primitive type (including strings) are object.
you probably shouldn't have dicts as elements of the dataframe. that sounds like a bad version of a better data model.
Not possible, I'm pretty sure. Dtypes are a numpy concept and essentially describe how an element is stored in memory. The dtype of object actually means "a pointer to some PyObject". Pointers to objects of different python types aren't different, so there's no dtypes for different python types.
(This is also exactly how python lists store elements.)
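Since the dtype can't tell you, one workaround is to check the Python type of the values themselves. A sketch with a made-up frame; `columns_containing` is a hypothetical helper, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sword", "bow"],
    "runes": [{"BLOOD_2": 1}, {}],
    "price": [10, 20],
})

def columns_containing(frame, py_type):
    # Hypothetical helper: names of columns where at least one cell is an instance of py_type.
    return [
        col for col in frame.columns
        if frame[col].map(lambda value: isinstance(value, py_type)).any()
    ]

print(columns_containing(df, dict))
```

The same check can be run right before one-hot encoding to see which column still holds a dict.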
can't do much about it as the dataset i'm running machine learning on has arbitrary dicts as an integral part
but thank you
Could you give an example? Im not sure I quite understand
it's pretty unlikely that there isn't a better way to do it, but we'd need to know what those dicts are (their key-value pairs, their types, and what they represent)
i'll give an example of a few:
"attributes": {
"mana_pool": 1,
"undead_resistance": 1
}
there's around 15 different possible keys, and maximum of 2 can be present in one row, value is always an int
"runes": {
"BLOOD_2": 1
}
there's an unknown amount of possible keys, but the value is always an int
"gems": {
"AMETHYST_2": "FINE",
"AMETHYST_1": "FINE",
"AMETHYST_0": "FINE"
}
unknown amount of keys, value is always a string
the thing is that i'm able to easily process those dicts to make them better to work with (i.e. similar to one hot encoding, convert the key to a column and the value becomes the cell for that column), however when trying to continue with preprocessing when i use pd.get_dummies to one hot encode all of the leftover strings, it's finding a dict in the dataset and i'm trying to figure out where it's hiding
I have never dealt with a dataset like that and truthfully im not sure what I would do, but luckily im not the smartest one here 🙂
haha it's all good
there must be somewhere where i'm forgetting to delete a column or smth after i preprocess it
but when i print the dataset columns none of them have dicts which is odd
just make sure your dataframe names are consistent throughout
pretty easy to bring the wrong dataframe in and then youre reviewing the correct one and working with a prior version
mhm
Numpy is designed to contain homogeneous plain old data (POD). It looks like you could simplify runes and gems since they don't seem like they should even be dicts, but rather lists.
i know that, that's what i'm doing... i can't change how the dataset is originally stored bc that's out of my control 😭
i'm just working with what i have
Convert them to lists and store them somewhere.
Then just use that, so you don't have to convert each time.
You can transform it however you want and store it however you want.
Take the bad data format and fix it.
The attributes looks like an array of booleans. If that is the case, since there are only 15 at most, you can use a bitmask.
(bitset)
How many different values the runes and gems can be changes how they can be stored.
If there is some fixed set of values you can improve it a lot.
not booleans, can range from 1-10
So 11 possible states for each item.
(Not there, or 1-10)
Well you can store those in a numpy array.
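A minimal sketch of that idea, assuming the full list of attribute names is known up front (the names below are made up for illustration, they are not from the dataset): each row becomes a fixed-length integer vector where 0 means "not there" and 1-10 is the level.

```python
import numpy as np

# Hypothetical attribute names; the real dataset has around 15 of them.
ATTRIBUTE_NAMES = ["mana_pool", "life_regeneration", "speed"]
INDEX = {name: i for i, name in enumerate(ATTRIBUTE_NAMES)}

def encode_attributes(attrs):
    """Turn {'name': level} into a fixed-length uint8 vector (0 = absent)."""
    vec = np.zeros(len(ATTRIBUTE_NAMES), dtype=np.uint8)
    for name, level in attrs.items():
        vec[INDEX[name]] = level
    return vec

# Three example rows: one attribute, two attributes, none.
rows = [{"mana_pool": 3}, {"speed": 10, "life_regeneration": 1}, {}]
encoded = np.stack([encode_attributes(r) for r in rows])
print(encoded)
```

An unknown key raises a KeyError here, which is probably what you want while the input format is still unverified.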
Runes and gems just look like two lists, that are dicts for some reason.
(But out of order?)
i blame the devs that made the game i'm running machine learning on lmao
but what i think you're not realizing is that i'm writing code on doing all of this rn
and i can handle that and i've already written code that converts these dicts into a better format
the issue is that when trying to run pd.get_dummies it's saying there's still a dict dtype present, even though when I check the dataframe i don't find any
Show code.
sheesh that was passive aggressive, one sec
sounds like some stardew valley shit
hypixel skyblock 🙃
btw the code isn't pretty but it works, wrote it without the intention of it being read by other people
df = full_df.drop(['uuid'], axis=1, errors='ignore')
if verbose: print('Starting enchantments')
df = df.join(pd.DataFrame(list(df['enchantments'])).fillna(0).add_suffix('_enchantment')).drop('enchantments', axis=1)
if verbose: print('Encoded enchantments')
if verbose: print('Starting ability_scroll')
df = df.join(pd.DataFrame(ability_scroll_mlb.transform(df['ability_scroll']), columns=ability_scroll_mlb.classes_).add_suffix('_ability_scroll')).drop('ability_scroll', axis=1)
if verbose: print('Encoded ability_scroll')
if verbose: print('Starting gems')
df = df.join(pd.DataFrame(list(df['gems'])).fillna('').drop('unlocked_slots', axis=1, errors='ignore').add_suffix('_gem')).drop('gems', axis=1)
if verbose: print('Finished gems')
if verbose: print('Started runes')
df = df.join(pd.DataFrame(list(df['runes'])).fillna('').add_suffix('_rune')).drop('runes', axis=1)
if verbose: print('Finished runes')
if verbose: print('Started necromancer_souls')
df = df.join(pd.DataFrame(necromancer_souls_mlb.transform(df['necromancer_souls'].apply(lambda x: list(map(lambda y: y['mob_id'], x)))), columns=necromancer_souls_mlb.classes_).add_suffix('_necromancer_soul')).drop('necromancer_souls', axis=1)
if verbose: print('Finished necromancer_souls')
if verbose: print('Started attributes')
df = df.join(pd.DataFrame(list(df['attributes'])).fillna(0).add_suffix('_attribute')).drop('attributes', axis=1)
if verbose: print('Finished attributes')
here's the code for converting all of those dicts/lists that ik are present in the dataset into a better format with pandas
@iron basalt
Where does the get_dummies happen?
this is the code that contains it (it is run directly after the code block above):
X = df.drop('price', axis=1)
y = df[['price']]
if verbose: print('Started encoding')
X = pd.get_dummies(X) # here
df_columns = X.columns.tolist()
scaler_X = StandardScaler()
scaler_X.fit(X)
scaler_y = StandardScaler()
scaler_y.fit(y)
What is X's type / how does it look like? Before get_dummies.
X should just be a df filled with ints/floats/strings assuming nothing wrong happened in the first code block
What is it in reality?
one moment
it is what i believe it should be
set(X.dtypes)
# {dtype('int64'), dtype('float64'), dtype('O')}
dtype('0') is something invalid, probably a dict.
Well, it can be in a numpy array, but it's kind of like object.
no it's just object (anything not a primitive, including strings)
when i checked all of the columns it's all either strings floats or ints
So it's the strings.
yeah... that's what i want, pd.get_dummies's purpose is to one hot encode all of the strings
What does print(X.dtypes) show?
well there's 300 columns by the time it gets to X, so when i print X.dtypes it shows a few int64s, a decent amount of float64s, and a whole lot of objects, and when i go to check all of the columns that show object they are all just strings
and i won't be able to send that here without flooding the channel, it doesn't even print everything to console, after some point it says "..." then goes to the end (but i checked the dtypes manually without relying on the console output)
And what is the exact error it gives?
Do your strings have some max size?
If your numpy arrays at any point contain different types of objects they will have the "object" type (like regular Python lists), but if it's all strings, with some max length, it can actually store that.
```py
>>> x = np.array([foo, "Hello"])
>>> x
array([<__main__.Foo object at 0x7f402d6a7f70>, 'Hello'], dtype=object)
>>> y = np.array(["Hello", "World"])
>>> y
array(['Hello', 'World'], dtype='<U5')
>>>
```
With a unicode dtype.
@serene scaffold how should I go about this now?
It's hard to spot something obviously wrong with it, so all I can say at this point is to make hand crafted test data and assume that the input is hostile, especially if there is no official format specification.
(And that "object" may mean mixed types (it's a design flaw of pandas, because Numpy was not really meant for this and it uses it))
i doubt there are mixed types, due to the fact that i make sure that all columns have the same type throughout it before i go ahead and process it
it's not letting me paste the error here so i'll pastebin it
https://pastebin.com/zqdaGebm @iron basalt here's the error
this happens on the X = pd.get_dummies(X) line
That is what I got when trying to get_dummies on a df that had both strings and dicts in the same column, dtype shows as "object" (for the column).
If it's all string it could in theory be something like <U... for the dtype, which would only be strings up to that length. It's both faster and in this case safer for detecting issues.
Pandas uses object for strings, a design flaw to an extent.
They want dynamic strings.
(arbitrary length)
(This gives issues to databases too in terms of speed and safety and all that)
how could i convert a string column to a unicode dtype
?
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def'],
    'test2': ['ghi', 'jkl']
})
print(test)
print(test.dtypes)
  test test2
0  abc   ghi
1  def   jkl
test     object
test2    object
dtype: object
If you make a numpy array with strings it will do it.
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def'],
    'test2': ['ghi', 'jkl']
})
print(np.array(test['test']))
array(['abc', 'def'], dtype=object)
@iron basalt hm
any way to coerce it into the unicode string dtype?
Yeah if you explicitly do dtype=whatever for a numpy array
Pandas seems allergic to the idea though.
yep, even when i converted it into a unicode string, when i try to put it back in the dataframe it decides to become an object again
Yeah,
```py
>>> x = np.array(["Hello", "World"])
>>> x
array(['Hello', 'World'], dtype='<U5')
```
Pretty annoying.
Makes debugging Pandas even harder.
what's worse, there's no way to limit casting
if i set any casting limits other than unsafe it decides that strings aren't strings anymore
the only way for it to work is casting unsafe
and the thing is, that converts the dict to a string too
🤦
Playing fast and loose with types.
Yeah, IDK, now gotta make a separate part that loops through it all and tries to find out the type to detect an issue, then try to narrow down what caused it and craft an example to test against. Ideally the input would have some spec. to make this way less hacky.
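A rough sketch of that "loop through it all and find out the type" step, assuming a pandas DataFrame like X above. The helper name find_non_string_cells and the demo frame are made up for illustration. It only looks at object columns and counts the Python type of every cell, so a stray dict (or None, or a list) shows up next to the column it lives in.

```python
import pandas as pd

def find_non_string_cells(df):
    """Return {column: {type_name: count}} for object columns holding anything but str."""
    report = {}
    for col, series in df.items():
        if series.dtype != object:
            continue  # numeric columns cannot hide a dict
        # Count the actual Python type of each cell in this column.
        type_counts = series.map(lambda v: type(v).__name__).value_counts()
        unexpected = type_counts.drop("str", errors="ignore")
        if not unexpected.empty:
            report[col] = unexpected.to_dict()
    return report

# Made-up frame: one clean string column, one with a dict and a None mixed in.
demo = pd.DataFrame({
    "clean": ["abc", "def", "ghi"],
    "dirty": ["abc", {"test": 1}, None],
    "num": [1, 2, 3],
})
print(find_non_string_cells(demo))
```

Unlike converting everything to strings first, this checks the cells as they are, so nothing gets silently cast before the check runs.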
seems i got something basic to work for that
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def', {'test': 1}],
    'test2': ['ghi', 'jkl', {'test': 2}]
})
def test_func(cell):
    try:
        json.loads(cell)
    except:
        print(cell)
    return cell
test['test'].apply(test_func)
it excepts whenever it encounters a dict
sorry whenever it doesn't encounter a dict
so i can kinda flip flop it
wait a minute no this doesn't work
first i need to convert it all to a string
oh you're kidding me
it expects double quotes
ok @iron basalt this one actually works:
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def', {'test': 1}],
    'test2': ['ghi', 'jkl', {'test': 2}]
})
test['test'] = np.array(test['test']).astype('unicode')
def test_func(cell):
    try:
        # only report cells whose string form evaluates to a dict
        if isinstance(eval(cell), dict):
            print(cell)
    except Exception:
        pass
    return cell
test['test'].apply(test_func)
alright trying to apply that to the entire dataset
istg if it passes cleanly i'm gonna punch a wall
Ok, just make sure there is nothing strange happening with the eval.
Don't want to suddenly start deleting root.
yeah that's not gonna be an issue thankfully
ISTFG
@iron basalt IT PASSED CLEANLY
IDEK WHAT TO SAY
Uh, i'm out of ideas for now.
me too
maybe i should use sklearn to onehotencode now
maybe that will work better than get dummies 😭
Personally I would have switched languages to something with static types and done it manually at this point. This seems a bit too complex for Pandas.
what language would you suggest?
R?
IDK, whichever you prefer for manual looping. Or one you think can handle it with its own equivalent of Pandas.
Or a different dataframe lib for Python.
thank you, ig i'll try polars now
It can convert to a Pandas df, so you could do your work before the get_dummies with it.
Then if get_dummies is still bugging, IDK.
```py
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...         "optional": [28, 300, None, 2, -30],
...     }
... )
>>> df
shape: (5, 5)
┌─────┬────────┬─────┬────────┬──────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ optional │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64      │
╞═════╪════════╪═════╪════════╪══════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 28       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 300      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ null     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ -30      │
└─────┴────────┴─────┴────────┴──────────┘
>>>
```
Polars has an actual str type.
Rather than just object.
>>> df2 = df.to_pandas()
>>> df2
   A  fruits  B    cars  optional
0  1  banana  5  beetle      28.0
1  2  banana  4    audi     300.0
2  3   apple  3  beetle       NaN
3  4   apple  2  beetle       2.0
4  5  banana  1  beetle     -30.0
>>> df2.dtypes
A             int64
fruits       object
B             int64
cars         object
optional    float64
dtype: object
>>>
(Pandas just decided to throw everything under "object" because it's using numpy)
possible to go from pandas -> polars?
if so then i can do the first parts in pandas
possibly then after that convert to polars
unless polars doesn't have a get dummies equivalent
then i'll have to refactor my entire processing setup 😀
oh it seems it does
polars.from_pandas and polars.get_dummies both exist
Yeah, maybe try all polars. Should not be too much work to switch it.
I find polars easier to read. Feels like database queries.
looking at the docs i agree
who tf decided pandas would be the standard
this seems so much better
If your Polars code looks like it could be Pandas code, it might run, but it likely runs slower than it should.
https://pola-rs.github.io/polars-book/user-guide/coming_from_pandas.html#key-syntax-differences
Standards by popularity are not really standards, but also standardization should only be done when the thing being standardized has had a high level of effort and hindsight to it (and is not rapidly changing).
Most big Python libs are constantly changing (open source) and all that, not really stable stuff to make standards of.
More abstract standards do exist though and are fine, like the one for generic array-like API for Python libs.
@iron basalt might be a dumb question, but is polars able to store dicts?
because i keep getting an error when trying to convert a string representation of a dict into a dict saying "tuple must be same length"
Hello darkness my old friend, over-fitting has come again.
IDK, storing a dict in a dataframe is a strange thing to do, not what they are really meant for.
Don't hate on pandas 
Also, there's pyspark for larger datasets. and SQL
I have not read through the entire Apache Arrow Columnar Format yet. Will get to it at some point. Only have a rough idea of how it works.
Manually would have been done with it, it might seem like more work, but you don't have to deal with any limitations / really learn anything new (and it can be done in pretty much any language).
Dataframes are really nice when your problem / data fits with them (same with databases).
Extracting data from some non-standard file format will always be a pain best done manually.
(If that file format is complex and/or highly structured / nested)
(CSV would be an example that is the opposite, well understood, simple, built-in support everywhere, fits well into what dataframes want to do)
im going to a machine learning event thingy and i was wondering what all i should bring on a flash drive
only thing i can think of is a matrix multiplication and normalizing algorithm
if my dataframe uses a datetime index, is it possible to find the "row number" of a specific time? I am wanting to drop all rows that occur after a certain time, but am unsure of how to do it
ping with response so i can see it
i would typically do something like
df.drop(df.tail(rows).index, inplace = True)
But because of the datetime index, I can't come up with an integer to put in place of "rows"
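A small sketch of two ways to do this, assuming the index is a sorted DatetimeIndex; the frame and the cutoff below are made up. Slicing by label avoids needing a row number at all, and searchsorted gives the integer position if one is really wanted.

```python
import pandas as pd

# Made-up frame with one row per day.
idx = pd.date_range("2022-01-01", periods=6, freq="D")
df = pd.DataFrame({"value": range(6)}, index=idx)
cutoff = pd.Timestamp("2022-01-04")

# Label slicing on a sorted datetime index keeps everything up to and including the cutoff.
kept = df.loc[:cutoff]

# The same rows via a boolean mask (also works when the index is not sorted).
kept_mask = df[df.index <= cutoff]

# If an integer "row number" is needed: position of the first row after the cutoff.
pos = df.index.searchsorted(cutoff, side="right")
kept_pos = df.iloc[:pos]
print(pos, len(kept))
```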
Is it possible for a function used with dataframe.apply to create a new dataframe?
I want to read each row in my original dataframe, and based on the values of certain columns, conditionally populate a new dataframe.
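One possible approach, sketched with made-up column names (kind, amount) and a made-up condition: rather than having the apply function build a dataframe, collect one dict per qualifying row and construct the new dataframe once at the end.

```python
import pandas as pd

# Made-up source frame.
df = pd.DataFrame({"kind": ["a", "b", "a"], "amount": [10, 0, 5]})

records = []
for row in df.itertuples(index=False):
    # Hypothetical condition: keep only positive amounts of kind "a".
    if row.kind == "a" and row.amount > 0:
        records.append({"kind": row.kind, "doubled": row.amount * 2})

# Build the new frame in one go from the collected rows.
new_df = pd.DataFrame(records)
print(new_df)
```

Building the frame once from a list tends to be simpler than appending to a dataframe row by row.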
Hey,
let's say i have a column with the same name in 2 dataframes that are in a list, and i want to remove all rows with 1 in that column. with ```
for df in df_list
It doesnt apply it to the dataframes. Works without for
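The loop body is missing above, so this is a guess at the cause: if the loop does something like df = df[...], that only rebinds the loop variable and the list keeps the original frames. A minimal sketch with a made-up column name (flag):

```python
import pandas as pd

df_list = [
    pd.DataFrame({"flag": [0, 1, 2], "x": [10, 20, 30]}),
    pd.DataFrame({"flag": [1, 1, 3], "x": [40, 50, 60]}),
]

# Rebinding the loop variable does not touch the frames stored in the list.
for df in df_list:
    df = df[df["flag"] != 1]
lengths_after_loop = [len(d) for d in df_list]

# Storing the filtered frames back into the list does.
df_list = [d[d["flag"] != 1] for d in df_list]
lengths_after_rebuild = [len(d) for d in df_list]
print(lengths_after_loop, lengths_after_rebuild)
```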
I am making a visualization here and the bar chart has lines on the bars
I want to remove these lines
here is the code
import pandas as pd
import plotly
import plotly.express as px
import plotly.io as pio
df = pd.read_csv("Caste.csv")
df = df[df['state_name']=='Maharashtra']
#df = df.groupby(['year','gender',],as_index=False)[['detenues','under_trial','convicts','others']].sum()
barchart = px.bar(data_frame=df,
                  x='year',
                  y='convicts',
                  color='gender',
                  opacity=1, orientation='v',
                  barmode='relative',
                  )
pio.show(barchart)
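One likely cause, though it is an assumption since the CSV isn't shown: if there are several rows per year and gender, each row is drawn as its own stacked segment, and the separators between segments look like lines on the bars. A sketch of aggregating first, using made-up rows in place of Caste.csv:

```python
import pandas as pd

# Made-up rows standing in for Caste.csv: several rows per (year, gender).
df = pd.DataFrame({
    "year": [2001, 2001, 2001, 2002, 2002],
    "gender": ["Male", "Male", "Female", "Male", "Male"],
    "convicts": [5, 7, 2, 4, 6],
})

# One row per (year, gender), so each bar has a single segment per colour.
totals = df.groupby(["year", "gender"], as_index=False)["convicts"].sum()
print(totals)

# totals can then be passed to px.bar(...) in place of df.
```

If the segments should stay separate, another option is hiding the outlines on the figure with barchart.update_traces(marker_line_width=0).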
@untold bloom cheers for your help in the help channel, was pulled into a meeting so wasn't able to follow through but thanks again ^^
Guys help pls creating a dataframe to count members with diff activity statuses. I was here yesterday but need help again #help-pear
🌟 👉 hi guys, if you are into generative AI/generative art, you must try this Python library https://github.com/jina-ai/discoart super easy to use as long as you know a little bit Python and smoothly run with free GPU on Google colab!
Dash is pretty cool
Hi Everyone,
I wanted to know how to find correlation between multiple variables in python?
I was looking at the numpy docs, where it said that with np.corrcoef() you can find correlation, but only of 2 variables
No I have been told to use only np.corrcoef
what's your question?
you can place the variables as columns (or rows) of the matrix (or matrices) you pass to np.corrcoef
you can put as many variables as you want into the columns or rows of the matrix, as long as you specify which axis you put them on. in this example, we see that, as one would expect from independent random vectors with mean 0, their correlation goes to 0 with the number of observations
In [16]: import numpy as np
In [17]: x = np.random.normal(0,1,(20,2))
In [18]: np.corrcoef(x, rowvar=False)
Out[18]:
array([[1. , 0.02356115],
[0.02356115, 1. ]])
In [19]: x = np.random.normal(0,1,(100,2))
In [20]: np.corrcoef(x, rowvar=False)
Out[20]:
array([[ 1. , -0.06573243],
[-0.06573243, 1. ]])
*edit on second glance it doesn't behave so nicely even with so many samples, but you still get the idea of how to use the func
you can have as many variables as you want, by default array rows are variables and columns are observations
In [25]: x = np.random.normal(0,1,(10000000,2))
In [26]: np.corrcoef(x, rowvar=False)
Out[26]:
array([[1.00000000e+00, 2.93622044e-04],
[2.93622044e-04, 1.00000000e+00]])
ok, this is better
you could alternatively compute it "by hand" by normalizing the rows or columns, and then computing X^T X or X X^T depending on how you arrange your data
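A sketch of the "by hand" version, assuming observations are in rows and variables in columns (random data, made up here). Normalizing means centering each column and scaling it to unit length, after which X^T X is the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 observations (rows), 3 variables (columns)

Xc = X - X.mean(axis=0)               # center each column
Xn = Xc / np.linalg.norm(Xc, axis=0)  # scale each column to unit length
corr_by_hand = Xn.T @ Xn              # (3, 3) matrix of pairwise correlations

# Should agree with numpy's own result up to floating point error.
print(np.allclose(corr_by_hand, np.corrcoef(X, rowvar=False)))
```

With variables in rows instead, the same steps apply along axis=1 and the product becomes X X^T.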
So I have four variables(ABCD)
A, B, C, D
1 2 3 4
5 6 7 8
11 12 13 14
first I was told to find pearson correlation between A and D variables
by using scipy.stats.pearsonr()
and now I have been told to find correlations between all variables using np.corrcoef()
aight. so you have your variables arranged as columns, and several observations arranged as the rows. that means you'd also have to use rowvar=False
Ohhhkayyyy Thanks🙏 🤩
In [32]: x = np.zeros((3,4))
In [33]: x[0,:] = np.arange(1,5)
In [34]: x[1,:] = np.arange(5,9)
In [35]: x[2,:] = np.arange(11,15)
In [36]: x
Out[36]:
array([[ 1., 2., 3., 4.],
[ 5., 6., 7., 8.],
[11., 12., 13., 14.]])
In [37]: np.corrcoef(x, rowvar=False)
Out[37]:
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
like so
SO when we don't use rowvar=False
Does np.corrcoef() give the correlation between each element in the columns?
yep
it would take the rows as defining variables, and each column would be an observation of the variables
the output would be 3x3
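To make the shapes concrete, a small sketch with made-up numbers (3 observations of 4 variables, not the values from the question):

```python
import numpy as np

# 3 observations (rows) of 4 variables A, B, C, D (columns); values are made up.
x = np.array([[1.0, 2.0, 4.0, 3.0],
              [5.0, 7.0, 6.0, 8.0],
              [11.0, 12.0, 14.0, 15.0]])

by_rows = np.corrcoef(x)                 # default: each row is a variable
by_cols = np.corrcoef(x, rowvar=False)   # each column is a variable
print(by_rows.shape, by_cols.shape)

# Transposing the data is equivalent to passing rowvar=False.
print(np.allclose(by_cols, np.corrcoef(x.T)))
```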
Okay got it thanks🎉
import requests
import pprint as pp
# Change this to be your API key.
MY_API_KEY="bla bla"
url = "https://beta3.api.climatiq.io/search"
query="hotel room"
query_params = {
    # Free text query can be written as the "query" parameter
    "query": query,
    # You can also filter on region, year, source and more
    # "AU" is Australia
    "region": "AU"
}
# You must always specify your AUTH token in the "Authorization" header like this.
authorization_headers = {"Authorization": f"Bearer: {MY_API_KEY}"}
# This performs the request and returns the result as JSON
response = requests.get(url, params=query_params, headers=authorization_headers).json()
# And here you can do whatever you want with the results
print(response.keys())
so i'm trying to form the URL for this API request but idk how
bc printing the json out on my console is lagging my machine
i can't even look at the json without my computer dying
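A sketch using only the standard library, assuming the goal is to see the final URL and to peek at the response without dumping all of it. The preview helper and the stand-in response are made up for illustration. urlencode builds the query string from the same kind of dict that gets passed as params, and the preview truncates the pretty-printed JSON.

```python
import json
from urllib.parse import urlencode

url = "https://beta3.api.climatiq.io/search"
query_params = {"query": "hotel room", "region": "AU"}

# Base URL plus the encoded query string.
full_url = f"{url}?{urlencode(query_params)}"
print(full_url)

def preview(obj, limit=300):
    """Pretty-print obj as JSON, cut off after `limit` characters."""
    text = json.dumps(obj, indent=2)
    return text if len(text) <= limit else text[:limit] + "\n... (truncated)"

# Stand-in for the real (large) API response.
fake_response = {"results": [{"id": i, "name": f"item {i}"} for i in range(1000)]}
print(preview(fake_response))
```

Printing response.keys() first, as the code above already does, and then previewing one key at a time keeps the console output small.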