#data-science-and-ml
im really not sure i think his point is to get me familiar with using mapping and reduction in pandas, but its throwing me for a loop
this is what i get with the max() applied
this is close to what i need it to display
this is the exact question he is asking verbatim. Find the State with the highest density of each of the race categories (e.g.Hispanic, White, Black, Native, Asian, Pacific) - (6 answers). Please note that "Puerto Rico" is not a state even though it is in the data.
surely theres census data that just has it by state
instead of counties
that would make it much easier
it only displays by state if i use groupby(['State'])
but he said not to do that. im not sure why
without it it looks like this:
without median it displays the highest density of a county within a state, so i would get 100, but alabama doesnt fully contain 100 percent of a race
median takes the average of all counties
mean is close in this usage if i use it i get the same answer
so this is almost the perfect answer, its just missing the name of the state
for instance, new mexico is the highest hispanic density, of 43.5
but it fails to display new mexico
it just shows that 43.5 is the highest density
but you cant use group by?
weird
i would ask someone who knows pandas more than me
i would just assign a title in print to each, or use a dictionary to do so, but he said it cant be hard typed. he didnt say i couldnt use groupby he just stated that i would never get the answer to work if i use groupby
your data frame, what are the row names?
if its states you can do this https://stackoverflow.com/questions/26640145/how-do-i-get-the-name-of-the-rows-from-the-index-of-a-data-frame

the row names are just index 0 - 74000
ahhh
this is why group by makes sense to me
maybe you can just return it using indexing
sorry sum is better here than.count
makes better sense
so i suppose that density = Hispanic/TotalPop for each state
even with that info i get lost still lol yikes
well thank you for taking the time to look through it with me @misty flint i appreciate it greatly.
np sad i couldnt help more
tbf i just started coding not long ago

i think its more of a data science question rather than a pandas question tho
but idk
SciPy's solve_ivp documentation and examples use time t as the independent variable such as dy/dt = f(t, y). But as far as I can tell, the solver can be used to solve ODE systems for space/position such as dy/dx = f(x, y). Is this true or is the solver restricted to ODEs in the time domain? Here's a link to Scipy's docs: https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.solve_ivp.html
im unfamiliar with scipy
the way im interpreting the documentation, it looks like t is fine as long as its a 1 dimensional variable..?
Yes, that's how I see it. You can define the function passed to the solver however you like as long as it meets the requirements of the solver. However, the returned solution object will always be in terms of y and t.
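A minimal sketch of that point, assuming a made-up equation dy/dx = -2y with y(0) = 1 (not from the original question). The callback's first argument is just the independent variable, here treated as a position x; the result object still exposes it as `sol.t`.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Right-hand side of dy/dx = -2*y. The first argument is the independent
# variable (x here), whatever it represents physically.
def rhs(x, y):
    return -2.0 * y

# Integrate from x = 0 to x = 1 with y(0) = 1.
sol = solve_ivp(rhs, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10)

x_values = sol.t      # the solver still names the independent variable "t"
y_values = sol.y[0]   # solution for the single state variable

# Compare the numerical end value with the exact solution exp(-2*x) at x = 1.
print(x_values[-1], y_values[-1], np.exp(-2.0))
```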
Anyone have experience with matplotlib as a way to plot data?
im learning that next, but i have minor experience in it
wups I got distracted
have you solved your problem
i have not
i am trying like mad to though its due in a few hours lol if you could help i would greatly appreciate you
@velvet thorn forgot to @ you
go through again what you're trying to do and where you are
ill start with the question i am trying to solve
Find the State with the highest density of each of the race categories (e.g.Hispanic, White, Black, Native, Asian, Pacific) - (6 answers). Please note that "Puerto Rico" is not a state even though it is in the data.
that is verbatim
okay
so i am using a csv from kaggle
sounds like a groupby problem
thats what i thought
so this is what i tried so far
d = df.groupby(['State'])[['Hispanic','White','Black','Native','Asian','Pacific']].mean()
d
and i get
a data frame with states as the index and race for columns with mean values per state
then i need to find which of those states are the highest for each race
so i apply max() and get this
which is the correct answer but not in the format he is requesting
what format do you want it to be in
he wants it to read:
hispanic: New Mexico
so new mexico is 43.5 but when i apply max it is now a series and no longer a dataframe
so i lose the state column
new mexico is 45.3
yea, but the state has to be calculated not hard coded
sec
>>> df[df['State'] != 'Puerto Rico'].groupby('State')[['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']].mean().idxmax()
Hispanic New Mexico
White Vermont
Black District of Columbia
Native Alaska
Asian Hawaii
Pacific Hawaii
like this? @rotund dagger
yes exactly
change to
i see how to drop district in that
df[~df['State'].isin({'Puerto Rico', 'District of Columbia'})]
ive been working on this for 3 days you are absolutely amazing!
hopefully i can cruise through the rest of the questions now
yw
yea, absolutely slayed it
my professor totally tried to throw me off that path too
hey
so I was
just thinking about your problem and
I feel like the statistical methodology is wrong?
because
what you're doing
is taking the mean of the percentages of each race, right?
but that doesn't necessarily represent the percentage of each race for that state
because each entry may have a different total population
do you get what I mean?
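A small sketch of the concern above, using a made-up four-row frame. It assumes the race columns are percentages of a `TotalPop` column, as the earlier "Hispanic/TotalPop" message suggests; the state names and numbers are invented for illustration. The plain mean of percentages and the population-weighted percentage can pick different states.

```python
import pandas as pd

# Tiny invented frame; the real census CSV is assumed to have the same
# column names, with race columns given as percentages of TotalPop.
df = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "TotalPop": [100, 900, 500, 500],
    "Hispanic": [90.0, 10.0, 30.0, 40.0],
})
races = ["Hispanic"]

# Plain mean of the per-row percentages (what groupby(...).mean() gives).
unweighted = df.groupby("State")[races].mean()

# Convert percentages to head counts, sum per state, then divide by the
# state's total population to get a population-weighted percentage.
counts = df[races].mul(df["TotalPop"], axis=0).div(100)
counts["State"] = df["State"]
counts["TotalPop"] = df["TotalPop"]
totals = counts.groupby("State").sum()
weighted = totals[races].div(totals["TotalPop"], axis=0).mul(100)

print(unweighted["Hispanic"].idxmax())  # A (mean of 90 and 10 is 50)
print(weighted["Hispanic"].idxmax())    # B (A is only 180 of 1000 people)
```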
so im trying to filter out a row in my dataframe between 2 minutes ( my dataframe has this kind of timestamp: 2021-01-31 15:46:33 ) and i can not understand how pandas between_time works.
Right now i have this:
peaks = peaks[peaks['time'].between_time('00:30', '00:32')]
which gives me the following error:
TypeError: Index must be DatetimeIndex
if i run df.dtypes it returns the column as datetime64[ns]
what am i doing wrong/do i need to supply more info?
datetime64 !== DatetimeIndex
Do a conversion
does the column datetime need to be the index aswell?
It's a type error, you need to fix the type so pandas recognize it
e.g convert ur datetime64 to normal DateTime by calling .tolist() and I think then Pandas should recognize it
thanks, ill try that
no
between_time implicitly filters on the index
so you want to set_index first
ya i figured, did this now df['time'] = pd.DatetimeIndex(df['time'])
df.set_index(keys='time', inplace=True, drop=False)
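For reference, a minimal sketch of the set_index approach on invented timestamps (the `value` column is hypothetical). It also shows one alternative that keeps the default integer index, using `DatetimeIndex.indexer_between_time`.

```python
import pandas as pd

peaks = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-31 00:29:59",
        "2021-01-31 00:30:15",
        "2021-01-31 00:31:40",
        "2021-01-31 15:46:33",
    ]),
    "value": [1, 2, 3, 4],
})

# between_time filters on the index, so move the datetime column there first.
indexed = peaks.set_index("time")
window = indexed.between_time("00:30", "00:32")
print(window)

# Alternative that keeps the integer index: get the positions of the
# matching rows from a temporary DatetimeIndex and select them with iloc.
positions = pd.DatetimeIndex(peaks["time"]).indexer_between_time("00:30", "00:32")
same_rows = peaks.iloc[positions]
```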
plss help in this... I am a begginer and new here... so cant access the voice ... plsss help me in this... and have to give a voice message coz... the problem was long to write
is pandas a good library to begin learning data science?
hello guys
can anyone plz help me in installing yolov4 on ubuntu VM
?
any tutorial is appreciated thx
Yes, check this repo. https://github.com/qfgaohao/pythorch-ssd read near the end of readme
It has the option of fine tuning too, so make sure to read the params and change it as needed if you want training from scratch. (ps. I don't recommend training from scratch)
Hello all,
I have a question based on feature selection. Based on this picture(mutual info) what is the best approach when we need to decide what feature we gonna take and which we gonna drop? Like is it ok if I take the first 7(including age) features or should I just take the first two since they have better results?
Don't choose the number of features directly from plot. You can instead decide on some threshold, say cumulative 0.9 or cutoff 0.005
Then take whatever n you get from that approach
Note that while first two features seem to be clearly stronger, it doesn't automatically make other features bad. There's still information there.
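A rough sketch of both threshold ideas, with invented mutual-information scores and feature names (only "age" comes from the question). The 0.005 cutoff and the 0.9 cumulative share are the example values suggested above, not recommendations.

```python
import numpy as np

# Hypothetical mutual-information scores, one per feature.
feature_names = np.array(["f0", "f1", "age", "f3", "f4"])
mi_scores = np.array([0.40, 0.30, 0.02, 0.004, 0.001])

# Option 1: keep every feature whose score clears a fixed cutoff.
cutoff = 0.005
keep_by_cutoff = feature_names[mi_scores >= cutoff]

# Option 2: sort descending and keep the smallest prefix whose cumulative
# share of the total score reaches 90%.
order = np.argsort(mi_scores)[::-1]
cumulative_share = np.cumsum(mi_scores[order]) / mi_scores.sum()
n_keep = int(np.searchsorted(cumulative_share, 0.9) + 1)
keep_by_cumulative = feature_names[order[:n_keep]]

print(list(keep_by_cutoff))      # ['f0', 'f1', 'age']
print(list(keep_by_cumulative))  # ['f0', 'f1']
```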
Would any of you guys know a way to visualize neural networks like with a library similar to sns ? I have seen ann_visualizer, but I am working on my own neural net without using Keras so i am not sure if that will work
```
  File "/mnt/disks/sdb/superai/ai/objectrecognition/vision/datasets/open_images.py", line 17, in __init__
    self.data, self.class_names, self.class_dict = self._read_data()
  File "/mnt/disks/sdb/superai/ai/objectrecognition/vision/datasets/open_images.py", line 63, in _read_data
    class_names = ['BACKGROUND'] + sorted(list(annotations['ClassName'].unique()))
```
focus on this part of the traceback. this is giving you a clue about where to look
see if you can read the code in that place to figure out why this error is happening
My first guess would be it expects the class names to be provided in a certain format, yeah
ehy guys can i ask here for tensorflow/keras questions?
yep, you can
so ive a few questions, first, im working on a image segmentation project for the first time using the cityscape dataset, ive 2 folders one containing the images, and one containing the masks, ive made a function to create a list containing the path of all the images, can i convert these lists in a keras dataset?
Can I get some ideas for a project using machine learning
For a project to practice you can use the covid data sets. I used the data set as well for my data science lecture class. The source code of the lecture is free on GITHUB: https://github.com/kienlef/Lecture_Covid_19_data_analysis
Python Projects: How to Build a Simple Trading Bot Skeleton in Python | Episode 1 by Third Eye Cyborg Podcast • A podcast on Anchor. This podcast episode goes into the code of a basic Python project. Let me know what you think, I am always open to feedback! https://anchor.fm/thirdeyecyborg/episodes/Python-Projects-How-to-Build-a-Simple-Trading-Bot-Skeleton-in-Python--Episode-1-epplkg
I will be using the knowledge that is covered in the Python Basics Series to conduct several Python Projects in the Episodes of this Podcast. In this episode I will be going over building a basic trading bot skeleton in the Python Programming Language. I also plan to go into other programming languages and technologies in future episodes.
Check ...
should i start learning pandas or mysql first
I'd say pandas
hello, can anyone recommend me good online visualization tools which can fetch data from an api
Hey, I'm currently collecting some data from Twitter Streaming API and I need to run the script all the time. What are my options in terms of free script hosting?
Does anyone know how a sub-par GPU affects tensorflow training?
theres really 2 main things about the gpu you need to worry about
first is vram
if you have a very low amount of vram then some larger models won't be able to train
or you'd have to lower the parameters of the model or the batch size to get it to fit
second is the gpus actual speed
this, unlike the vram, wouldn't completely stop you from training the models
it would just make it slower/faster
hey does anybody know python well i need help
Hi I'm having trouble with some data visualizing I'm not seeing the whole behavior of contours
I'm just getting a chunk of it
do you know how to resolve it?
matplotlib?
yeah
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0
zf =1000
N = 400 # Number of Runge-Kutta steps
h = (zf - z0)/N
def f(p, z ):
x = p[0]
U = p[1]
dx = U
dU= -(2./(1.+z)+(3..03/(2.(1.+z)**2.(.3(1.+z)*3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)**2.((1.+z)**3.+.7))
return array([dx, dU], float)
zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
ypoints.append(p[0])
vpoints.append(p[1])
k1 = h * f(p, z)
k2 = h * f(p + 0.5k1, z + 0.5h)
k3 = h * f(p + 0.5k2, z + 0.5h)
k4 = h * f(p + k3, z + h)
p = p + (k1 + 2k2 + 2k3 + k4)/6
z0q=1000 #interesting redshift values in QSO data(initial)
zfq=1100 #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size
def f(q, z ):
x = q[0]
U = q[1]
dx = U
dU= -(2./(1.+z)+(3..03/(2.(1.+z)**2.(.3(1.+z)*3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)**2.((1.+z)**3.+.7))
return array([dx, dU], float)
zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)
for z in zqpoints:
xqpoints.append(q[0])
vqpoints.append(q[1])
k1 = h * f(q, z)
k2 = h * f(q + 0.5k1, z + 0.5h)
k3 = h * f(q + 0.5k2, z + 0.5h)
k4 = h * f(q + k3, z + h)
q = q + (k1 + 2k2 + 2k3 + k4)/6
oh no can you put it in markdown
nvm
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0
zf =1000
N = 400 # Number of Runge-Kutta steps
h = (zf - z0)/N
def f(p, z ):
x = p[0]
U = p[1]
dx = U
dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))
return array([dx, dU], float)
zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
ypoints.append(p[0])
vpoints.append(p[1])
k1 = h * f(p, z)
k2 = h * f(p + 0.5k1, z + 0.5h)
k3 = h * f(p + 0.5k2, z + 0.5h)
k4 = h * f(p + k3, z + h)
p = p + (k1 + 2k2 + 2k3 + k4)/6
z0q=1000 #interesting redshift values in QSO data(initial)
zfq=1100 #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size
def f(q, z ):
x = q[0]
U = q[1]
dx = U
dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))
return array([dx, dU], float)
zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)
for z in zqpoints:
xqpoints.append(q[0])
vqpoints.append(q[1])
k1 = h * f(q, z)
k2 = h * f(q + 0.5k1, z + 0.5h)
k3 = h * f(q + 0.5k2, z + 0.5h)
k4 = h * f(q + k3, z + h)
q = q + (k1 + 2k2 + 2k3 + k4)/6
!e
You are not allowed to use that command here. Please use the #bot-commands channel instead.
ah dang it
ill just pull up a notebook rq
what were the libraries you used
numpy
matplotlib
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
%matplotlib inline
from numpy import array, arange
plt.rcParams["font.family"] = "serif"
fig, ax = plt.subplots()
level = [ -.0026, -0.0021,-.0016,-.0011,-.0005]
levels = [ -0.0000875,-0.0000701,-0.0000576,-0.0000450, -0.0000285]
plt.contour(W,M,xqpoints,10, cmap='jet');
#CS=ax.contour(Q,P,xpoints, levels, colors='black')
plt.colorbar()
#CS=ax.contour(W,M,xqpoints, level, colors='green')
#plt.ylim([-5,-2])
#ax.set_ylabel("$ α_{had,0} $")
#ax.set_xlabel("$φ'_{0}$")
#ax.clabel(CS, inline=1, fmt='%1.9f')
#ax.yaxis.grid(True, zorder=0)
#ax.xaxis.grid(True, zorder=0)
plt.show()
this is what I'm doing for the contour
dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))
this line
theres problems with the parentheses
dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*p[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))
copy paste it directly maybe?
oh yeah there is some difference
you raised to the power of two at one part
i dont have that in the original code
```py
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0
zf =1000
N = 400 # Number of Runge-Kutta steps
h = (zf - z0)/N

def f(p, z ):
    x = p[0]
    U = p[1]
    dx = U
    dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*p[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))
    return array([dx, dU], float)

zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
    ypoints.append(p[0])
    vpoints.append(p[1])
    k1 = h * f(p, z)
    k2 = h * f(p + 0.5*k1, z + 0.5*h)
    k3 = h * f(p + 0.5*k2, z + 0.5*h)
    k4 = h * f(p + k3, z + h)
    p = p + (k1 + 2*k2 + 2*k3 + k4)/6

z0q=1000 #interesting redshift values in QSO data(initial)
zfq=1100 #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size

def f(q, z ):
    x = q[0]
    U = q[1]
    dx = U
    # NOTE: p[1] below reads the global p left over from the first loop; q[1] may have been intended
    dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*p[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))
    return array([dx, dU], float)

zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)
for z in zqpoints:
    xqpoints.append(q[0])
    vqpoints.append(q[1])
    # NOTE: this loop steps with h (2.5), not the step size i (0.25) defined above
    k1 = h * f(q, z)
    k2 = h * f(q + 0.5*k1, z + 0.5*h)
    k3 = h * f(q + 0.5*k2, z + 0.5*h)
    k4 = h * f(q + k3, z + h)
    q = q + (k1 + 2*k2 + 2*k3 + k4)/6
```
this is the code
oh yeah its dif than the previous one
keeps giving me errors
like its not multiplying the k1 and k2 in the previous code
let me use this one
i think when you copied from this it got messed up
i think the key here is changing what you pass in in these lines:
zpoints = arange(z0, zf, h)
zqpoints = arange(z0q, zfq, i)
i believe
i could be wrong tho
so let me try
well if i do huge linspace then i can't see my paramters well enough
were you able to make the full contours anyhow though
what error?
it does look interesting
the problem here is if you increase the linspace parameters too much you get exponent overflow error
i think theres a mathematical solution instead
yeah mass is a power 10
maybe using log or something
and then youll visualize it that way instead
how much can you change the equation
i would start there maybe
what do you mean changing the equation
second derivative
2nd derivative of x
have you messed at all with the delta?
which delta?
what's/who's gm
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-10, 3,400)
w= np.linspace(-.0002,.0008,400)
dunno if its actually supposed to be there or if its matplotlib
you can try it
this is helpful thanks i still want to see the full behavior though
no worries thanks for your help
you should not
tag specific people
for help
Write a short Python program which, given an array of integers, a, calculates an array of the same length, p, in which p[i] is the product of all the integers in a except a[i].
Can someone help me with a question?
what's your problem
Write a short Python program which, given an array of integers, a, calculates an array of the same length, p, in which p[i] is the product of all the integers in a except a[i].
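One possible sketch for the exercise quoted above (nobody in the chat answered it, so this is not a confirmed solution). It builds p from two passes of running products and avoids division, so zeros in the input are handled too.

```python
def products_except_self(a):
    """Return p where p[i] is the product of every element of a except a[i]."""
    n = len(a)
    p = [1] * n

    # Forward pass: p[i] holds the product of everything left of i.
    running = 1
    for i in range(n):
        p[i] = running
        running *= a[i]

    # Backward pass: multiply in the product of everything right of i.
    running = 1
    for i in range(n - 1, -1, -1):
        p[i] *= running
        running *= a[i]

    return p

print(products_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
```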
I can't visualize the whole contour
I'm getting it partially
like
there's stuff
outside the bounds of the box
that you want to see?
yes
honestly there's way too much code + discussion there than I care to wade through
but
a good start would be
ax.axis, which lets you set the viewport bounds
if your range is very big
I would suggest
a different scale
log scale, in particular
the problem with that is I want to see the change in parameter in really small scales like order of 10^-5 if i do log scale it would just be -3,-4,-5...
but i want to see for values like .00005 and .00008 because I want to compare it with another set of data
think
you might want to visualise subsets
use axis to move the viewport
to parts you want to focus on
sounds too big otherwise
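A minimal sketch of the viewport suggestion, assuming an invented surface `Z` over the same `w` and `m` ranges used in the chat (the real data comes from the Runge-Kutta run). `ax.axis([xmin, xmax, ymin, ymax])` zooms to a subset without changing the scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up surface; only the parameter ranges are taken from the chat.
w = np.linspace(-0.0002, 0.0008, 200)
m = np.linspace(-5, 0, 200)
W, M = np.meshgrid(w, m)
Z = np.sin(4000 * W) * np.cos(M)

fig, ax = plt.subplots()
ax.contour(W, M, Z, 10, cmap="jet")

# Zoom in on a narrow strip of w while keeping a linear scale:
# the list is [xmin, xmax, ymin, ymax].
ax.axis([0.00005, 0.00008, -5, -2])
```

In a notebook, `plt.show()` after this would display the zoomed region; repeating the `ax.axis` call with other bounds moves the viewport.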
I have a weird question
Do you guys fully memorize plot methods or you do it copy pasta from google
mostly google here
You don't need to memorize anything...it comes with practice
And still if you're stuck somewhere, google is always there
im doing an internship right now, here they mostly document their google findings and present them to each other to teach everyone if its usefull
i need to build a dashboard which visualizes analysed data that suits the company application landscape
and i need to do the analyzing part myself too
and write a massive report about it too
I envy you
hey guys if i use the tf.data.Dataset.list_files can i specify to import only images that have a particular word in the name?
I'd say what you search for changes
like as you get better
usually you know what you want to do, just not how to do it in a specific context
so for example
for a certain problem, a beginner might search "how to find text with dynamic prefix"
which is something you can do with a regex
and you might just have forgotten the syntax
so you might instead look for "python lookbehind regex"
experience also helps you know what to search for
asking the right question is very important
I personally mostly Google things as well, there's too many other things to think about rather than worrying about memorizing specific syntax or parameters of an api call
over time, when presented with new problems, even if you don't know exactly how to solve it, you'll have a sense of what kind of approach is more likely to work
there is stuff that can be looked up easily (what a parameter is called) and stuff that cannot (how to reduce a set of complex business requirements to a viable technical architecture).
you want to be the kind of person who is good @ the latter.
to draw one final analogy: the best Scrabble players are not the best writers.
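As an illustration of the "python lookbehind regex" search mentioned above, here is a small sketch on an invented string. The `order-` prefix and the sample text are assumptions made up for the example.

```python
import re

text = "invoice-1042 shipped, order-77 pending, order-9001 cancelled"

# (?<=order-) is a lookbehind: match digits only when "order-" comes right
# before them, without including the prefix in the match itself.
order_ids = re.findall(r"(?<=order-)\d+", text)
print(order_ids)  # ['77', '9001']

# Python's re module requires lookbehinds to have a fixed width, so for a
# prefix of varying length a capturing group is a common alternative.
any_ids = re.findall(r"[a-z]+-(\d+)", text)
print(any_ids)  # ['1042', '77', '9001']
```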
Thanks for your amazing input
I agree
4 for loops and it takes forever to execute lol
well I just need to split it into country wise then
it takes forever
iterating over 1.75mil is ridiculous?
You can take the line country_city_list = df['City'].unique() out of the loop
Also, use df.loc instead of df when boolean masking, that way you avoid making copies of the dataframe
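The original loop was not posted as text, so this is only a sketch on an invented frame (`Country` and `Value` are hypothetical columns; `City` is from the message above). It shows the unique values computed once outside the loop, and a groupby that may replace the loop altogether.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["BR", "BR", "BR", "IT", "IT"],
    "City": ["Rio", "Rio", "Recife", "Rome", "Rome"],
    "Value": [1.0, 3.0, 5.0, 2.0, 4.0],
})

# Loop version: compute the unique cities once, before the loop starts.
country_city_list = df["City"].unique()
loop_means = {}
for city in country_city_list:
    loop_means[city] = df.loc[df["City"] == city, "Value"].mean()

# Vectorised version: one groupby computes the same per-city means
# without an explicit Python loop.
grouped_means = df.groupby("City")["Value"].mean()
```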
Thanks guys
I'll try changing it
seems like data are too large to do the 4 for loops
figure = plt.Figure(figsize=(0.5 * res_width / 100, 0.75 * res_height / 100), facecolor="#67676b")
figure.add_subplot(fc="#15151c").plot(df["Close time"], df["Close"].astype(float), "-" + colour)
figure.add_subplot(fc="#15151c").plot(df["Close time"], moving_avg, "-w")
error: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
figure.add_subplot(fc="#15151c").plot(df["Close time"], moving_avg, "-w")
How do I properly add another line to my figure
the graph comes out correctly but I wish to remove the deprecation warning
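One way to avoid the warning, sketched with invented data in place of `df["Close time"]`, `df["Close"]` and `moving_avg`: call `add_subplot` once, keep the returned Axes, and plot both lines on it. The figure size and line colours here are placeholders.

```python
from matplotlib.figure import Figure

figure = Figure(figsize=(6, 4), facecolor="#67676b")

# Create the Axes once and keep a reference to it.
ax = figure.add_subplot(facecolor="#15151c")

# Made-up data standing in for the close prices and the moving average.
x = [0, 1, 2, 3, 4]
close = [10.0, 11.0, 10.5, 12.0, 11.5]
moving_avg = [10.0, 10.5, 10.5, 10.9, 11.0]

# Both lines go on the same Axes, so add_subplot is not called a second time.
ax.plot(x, close, "-g")
ax.plot(x, moving_avg, "-w")
```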
Hi guys! I have a question! I have a dataset that contains unique orders in it for a number of accounts. So the first column in each row is account number, and the rest are info on a specific order for that account. Hence, any account may have a number of rows for different things they have ordered - all of them containing the unique account number. I would like to pull a list of unique account numbers that have NOT ordered any from a short list of items, which I would identify by keyword... Any ideas? You're saving a life here. Thank you, hope you're keeping safe
PS. Apologies if this is the wrong space for this query...
How is the dataset stored? (Please ping to reply for every message directed at me, even if you think I'm here.)
@serene scaffold appreciate you getting in touch Stelercus. Dataset stored in csv/xls.
can you show an example of the first few rows of the csv?
@serene scaffold Happy to. So I am attaching an example and will talk a bit about it
Hi, please share data sample,
@serene scaffold
@glad night please share it as text so that I can use it in a program.
namely as a csv
For future reference, text is the best format for anything pertaining to a question. Anything you can share as text and not as a screenshot, do that.
@serene scaffold @bleak fox Of course guys, give me a second. Just to comment on this - the idea is that if I run whatever code we come up with successfully, my 'exclusion criteria' would be the keywords 'voucher' and 'coupon', and in this specific case the one that does not contain them is AB000005
@serene scaffold Apologies - Discord turned it into a png! Attaching now
@serene scaffold @bleak fox
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
for future reference, share text as text in the chat. However I'll download it this time around.
@serene scaffold Thanks man I will.
Do you have pandas installed?
@serene scaffold Yes!
So for this dataframe, "unique customer number" is the index. And you'd like to get every index for which "coupon" and "voucher" are NOT a substring of the value in "product name", yes?
@serene scaffold Exactly that.
>>> df['Product Name'].str.contains('coupon')
Unique customer number
AB00001 False
AB00001 False
AB00001 False
AB00001 False
AB00001 False
AB00003 False
AB00003 True
AB00003 False
AB00005 False
AB00005 False
AB00005 False
this is part of the solution
@glad night the things you need to learn are how to use pd.Series.str.contains, how to do boolean logic with pandas, and how to select from a dataframe using a boolean dataframe.
!docs pandas.Series.str.contains
```
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
```
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
Parameters
**pat** : str. Character sequence or regular expression.
**case** : bool, default True. If True, case sensitive.
**flags** : int, default 0 (no flags). Flags to pass through to the re module, e.g. re.IGNORECASE.
**na** : scalar, optional. Fill value for missing values. The default depends on dtype of the array. For object-dtype, `numpy.nan` is used. For `StringDtype`, `pandas.NA` is used.
**regex** : bool, default True. If True, assumes the pat is a regular expression. If False, treats the pat as a literal string.
Returns
Series or Index of boolean values. A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.
See also... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html#pandas.Series.str.contains)
@serene scaffold Thanks a lot man - I will do some reading after work tonight.
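Putting the three pieces listed above together, a sketch on invented rows shaped like the example (the product names and the AB00004 account are made up). It flags rows that mention a keyword, reduces per account, and keeps the accounts with no flagged rows.

```python
import pandas as pd

# Made-up rows; the index plays the role of "Unique customer number".
df = pd.DataFrame(
    {"Product Name": ["shoes", "hat", "gift coupon", "socks", "Voucher 10", "belt"]},
    index=pd.Index(
        ["AB00001", "AB00001", "AB00003", "AB00003", "AB00004", "AB00005"],
        name="Unique customer number",
    ),
)

# True for every row whose product name mentions an excluded keyword.
# "coupon|voucher" is a regex alternation; case=False ignores capitalisation.
is_excluded = df["Product Name"].str.contains("coupon|voucher", case=False, na=False)

# An account is out if ANY of its rows matched, so reduce per account.
account_has_excluded = is_excluded.groupby(level=0).any()

# Keep the accounts where no row matched.
clean_accounts = account_has_excluded[~account_has_excluded].index.tolist()
print(clean_accounts)  # ['AB00001', 'AB00005']
```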
hello guys, im trying to import images into a tensorflow dataset, i need the dataset for image segmentation and i cant understand which is the correct way to create the dataset, does it need to have couples (image-mask)? or i need 2 datasets one containing the images and one containing the masks?
u'll get there, im not that special
anyone ever worked with a datalake and databricks togheter? how do you check for new available data? does a notebook save variables if you run it as a task?
So I have this numpy array as follows:
(I'm substituting in small numbers for the dimensions to make it easier to explain)
It's shape is (10, 10, 5, 5, 3)
it's a 10x10 array of 5x5x3 image tiles
I need to recombine it into a single 50x50x3 image
what's the best way to do this?
Obviously I could do it with two for-loops but I'd rather have a more efficient vectorized way
Good afternoon, can you tell me where a data science programmer can work?
Can you work in physics and chemistry labs?
I wanted to go to Astrophysics College to work with python, but I don't know if there is a library for that.
ehueuehue
In Brazil python is very popular, they use it a lot to create websites with back end and facial recognition programs.
Okay, thanks for helping me.
hoe to hak naza
You will need blue dye and the white house heuheuhueheeuheu
Does anyone here use pytorch or tensorflow? If so why do you use each one respectively and which would you prefer to use for a personal major web project that incorporates Django? What are the learning curves?
Hello! I'm currently working on a project where I need to convert the adjective form of a country name into its noun. For example, convert Italian to Italy, Spanish to Spain, etc...
Does anyone know how to achieve this? Its hard to search this on google, because the default search returns currency converters lol
ngl i looked at this for forever until i realized its supposed to be read bottom-up
depends
if you can guarantee
no misspellings
and a fixed + known set of source words
you can just use regex + replacement
otherwise things get more complex
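A sketch of the "regex + replacement" idea for the fixed, known-word case. The three-entry lookup table is hand-written for illustration; a real project would need a complete table from some source.

```python
import re

# Small hand-written lookup; these three entries are only examples.
DEMONYM_TO_COUNTRY = {
    "Italian": "Italy",
    "Spanish": "Spain",
    "French": "France",
}

# One alternation of all known adjectives, matched on word boundaries.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, DEMONYM_TO_COUNTRY)) + r")\b")

def to_country(text):
    """Replace each known adjective in text with its country name."""
    return pattern.sub(lambda match: DEMONYM_TO_COUNTRY[match.group(1)], text)

print(to_country("Italian and Spanish food"))  # Italy and Spain food
```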

in what order are the tiles?
depending on structure
some combination of np.concatenate and np.stack
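An alternative to concatenate/stack, sketched under the assumption that `tiles[r, c]` is the tile at grid row r and grid column c: a single transpose followed by a reshape. The loop version is included only to check the result.

```python
import numpy as np

rows, cols, tile_h, tile_w, channels = 10, 10, 5, 5, 3
tiles = np.arange(rows * cols * tile_h * tile_w * channels).reshape(
    rows, cols, tile_h, tile_w, channels
)

# Move each tile's pixel-row axis next to the grid-row axis (and likewise
# for columns), then merge each pair: (10, 5, 10, 5, 3) -> (50, 50, 3).
image = tiles.transpose(0, 2, 1, 3, 4).reshape(rows * tile_h, cols * tile_w, channels)

# Reference result built with the two explicit loops, for comparison.
expected = np.empty_like(image)
for r in range(rows):
    for c in range(cols):
        expected[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = tiles[r, c]

print(image.shape, np.array_equal(image, expected))
```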
Ooh perfect, thanks!
Is it ok if I post a referral link for $15 off of a dataquest subscription? I get a free year if someone signs up with it, but I don't want to break any rules regarding spam or solicitation. Someone here might be able to make use of it.
Seems like it shouldn't be an issue. I've found them useful for brushing up on R and Python for data manipulation. Full transparency: if four people sign up through the link I get a free year. I hope someone is able to make use of it. Link: app.dataquest.io/referral-signup/p2d1jh5t/
Take Developer Ecosystem Survey by JetBrains
https://t.co/S8US6FsJAQ?amp=1
Not really a Python related question, more of a data pre-processing question. I have a dataset which is about 600,000+ rows, would it make sense to remove rows to make it easier to work with or would that give me bias or incorrect results?
Depends on how you select the rows to be removed
If you do it in a completely random manner it should be reasonably fine.
Yeah I'm not deleting selected rows or such, it's all random.
However you should note that the more data you have, the easier it is for your model to generalize
Still, using less data may be sensible if you just want a proof-of-concept
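For a uniformly random subset, a minimal sketch with an invented single-column frame of 600,000 rows. The 10% fraction and the seed are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=600_000)})

# Draw 10% of the rows uniformly at random; random_state makes it repeatable.
subset = df.sample(frac=0.1, random_state=42)

print(len(subset))  # 60000
# The subset's summary statistics should stay close to the full data's.
print(df["x"].mean(), subset["x"].mean())
```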
If anyone here has a solution to this issue, please tell me.
https://github.com/tensorflow/tensorflow/issues/46247
System information Have I written custom code (as opposed to using a stock example script provided in TensorFlow): I have followed this tutorial using my own data. Tutorial: https://stackabuse.com/...
Hey guys ive this error when executing my script : Shapes (None, 256, 512, 3) and (None, 256, 512, 4) are incompatible.
how can i check what is giving it?
Find the line where the error occurs, which should tell you which two tensors have incompatible shapes
it says that happens when i train my model so i cant understand with part of my model is wrong
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
ok, where should i put this parameter?
If you need to debug your model you can just take the first few elements of the dataset to save time
Are you using the correct loss function? Seems that the error is occurring there
im honestly going blind right now, i tried both sparse_categorical_crossentropy and categorical_crossentropy
i keep receiving a different error everytime
this is my model, https://paste.pythondiscord.com/kesijaxipu.apache
as input i use pictures rgb of size 256,512
Can you display the model summary?
yes
this is my first try at image segmentation and im a bit lost tbh
https://paste.pythondiscord.com/boyakewuve.md here is the summary
How about the ground truth?
Ground truth = The correct "answer" you're supposed to predict
as u could understand im a beginner, so idont know exactly how to check the ground truth, i have two numpy arrays containing my images and my mask, i create them a dataset with the from tensor slices function and i feed it to my model just to check if my dataset was correct, but probably my model is not correct for the dataset im using
print the dataset directly, it should tell you the shapes of its components
<PrefetchDataset shapes: ((None, 256, 512, 3), (None, 256, 512, 3)), types: (tf.float32, tf.float32)>
oh ok, they should be my masks
that doesn't fit your model output
your model is outputting 4 masks
but the ground truth only has 3
as indicated by the last element of its shape
conv2d_23 (Conv2D) (None, 256, 512, 4) this part of the summary?
yes
i changed the last conv2d so now the output correspond , but i still get this : https://paste.pythondiscord.com/pumuvalote.sql this is probably given by my loss function right?
Can you describe the image segmentation task you are trying to achieve?
Taking a step back ^
i have the Cityscapes dataset and i need to build and train a model that can fit on a microcontroller (1.5MB). what i want to achieve is a model that generates a mask classifying the objects in the pictures, with an accuracy of 80%+
How are you generating the dataset?
i have a function that collects the paths of all the pictures and masks and gives me back 2 sorted lists, then i build 2 numpy arrays from those paths containing the images and the masks
and I suppose there are only 3 classes that have to be identified?
are they mutually exclusive?
there are 30 classes in the dataset
then why is the shape (..., 3)?
road , sky , person car, etc...
if there are 30 classes it should be (..., 30)
i see, seems logical. so now i need to sort some things out: first i need to import the classes correctly i guess, so that i have all 30 classes
i have json files containing the objects and the mask polygons
yeah
i guess i need to parse the json for every single mask right?
Not sure how it's formatted, it's on you to parse it
i just dont understand one thing, how can the model understand which pixel correspond to a class during the training?
It doesn't need to
You just have to ensure that the classes in your dataset are consistent, so that channel N always corresponds to class N
Then, whenever the model predicts a high value for channel N, the same is said for class N.
The model only needs to learn to predict a value for each channel (like in other tasks), which is implicitly the class (in this case)
ok, i have a json for every image/mask it is like : https://paste.pythondiscord.com/egiraqiqoj.json
you'll have to convert the labels into indices for the masks (channels)
u mean a list containing all the classes?
i made a dic containing all the classes like car:0 etc...
you can lookup the dict to get the index from the label name then
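A rough sketch of that lookup, assuming a tiny made-up label dictionary and a 2x3 grid of label names (none of these names or sizes come from the actual dataset). Each pixel's label is mapped to its index and then expanded so that channel N is 1 wherever the pixel belongs to class N:

```python
import numpy as np

# Hypothetical label -> channel index mapping (the real dataset has 30 classes).
CLASS_TO_INDEX = {"road": 0, "sky": 1, "car": 2}


def labels_to_one_hot(label_names, class_to_index):
    """Turn a 2-D grid of label names into an (H, W, num_classes) one-hot mask."""
    # Look up the channel index for every pixel.
    index_mask = np.array(
        [[class_to_index[name] for name in row] for row in label_names]
    )
    # Row i of the identity matrix is the one-hot vector for class i, so
    # indexing it with the mask yields one channel per class.
    return np.eye(len(class_to_index), dtype=np.float32)[index_mask]


toy_labels = [["sky", "sky", "car"],
              ["road", "road", "car"]]
one_hot = labels_to_one_hot(toy_labels, CLASS_TO_INDEX)
print(one_hot.shape)
```

With 30 classes the last dimension would be 30, which is what the model's final layer would then need to output.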
so now i just need to find a way to assign labels to the masks i guess
^
the dataset give me 3 masks, which one should i use? can i post here them?
a fully colorized one, named color.png, one that has the cars highlighted called instanceIds and one that seems to be greyscale called labelIds
dont worry u helped me a lot already
3 masks for every image
anyone got a good guide on picking a good alpha value?
```py
('clf', OneVsRestClassifier(MultinomialNB(alpha=0.1, fit_prior=True, class_prior=None))),
])
```
What's the difference between fit and transform? I still don't get it :/
Hi all, i'm not sure if anyone will see this, but while having a relaxing walk I had an idea to create a machine learning algorithm which will detect sequences in data, e.g. arithmetic sequences and geometric sequences, using an example data set. Now i'm home i've attempted to find an example dataset but I can't find one, and I wondered if there was any way you guys could help me find/create one to use? My template was going to be the pattern formula in the first column, then the pattern from 0n to 10n. Thank you sooo much to anyone that can help me!
do anyone know how to import numpy from the cloned github numpy repo?
I have cloned the numpy github repo to my local.
then in terminal i did import numpy as np
but I'm not able to access numpy method. for ex. np.sin()
getting error AttributeError: module 'numpy' has no attribute 'sin'
what can i do when my data set doesnt have the same amount of images per class?
the confusion matrix is a shiit xd every class (almost) filled with 0's
try changing the class weights
so having some classes that have a lower amount of images be weighted more
thats assuming you don't wanna do augmentation or anything which would be a better idea
i did augmentation
damn python bot has no scikit learn docs
that's how you would do the balancing class weights
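One common way to get such weights, sketched here with made-up labels: weight each class by n_samples / (n_classes * class_count), which is the formula scikit-learn documents for `class_weight='balanced'`. The function name and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily imbalanced labels: 90 images of class 0, 10 of class 1.
toy_labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(toy_labels)
print(weights)
```

A dictionary like this can usually be passed as the `class_weight` argument of Keras `Model.fit`, so mistakes on the rare classes cost more during training.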
```
001_Class 0.00 0.00 0.00 18
002_Class 0.00 0.00 0.00 17
003_Class 0.00 0.00 0.00 23
004_Class 0.00 0.00 0.00 18
005_Class 0.00 0.00 0.00 16
006_Class 0.00 0.00 0.00 22
007_Class 0.00 0.00 0.00 20
```
it looks like this almost all the matrix
what are you simulating @azure leaf
you should really just benchmark on your test set and plot the score for different values of alpha, this is usually a good way to determine optimal hyperparameters, unless you have an underlying idea of how much shift or smoothing you would like
How would i find all rows that have a certain value within one of their columns in a pandas dataframe? for example if i had a dataframe that has one column that contains a class, but the column can contain more than one class in it, how would i get all the rows that have a certain class in that column
the classes within the column are structured like "class1|class2|class3" etc
and they can have different amounts of classes as well, or just one single class
pandas.DataFrame.eval?
no i figured it out, it was pandas.Series.str.contains
I knew that was a thing i didnt know it supported checking for multiple items
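For anyone landing here later, a small sketch of how that can look, on a made-up frame (the column and class names are assumptions). One thing to watch: with the default `regex=True` the `|` in the search string means "or", and plain substring matching also matches longer names such as class10.

```python
import pandas as pd

# Hypothetical frame; the "classes" column holds pipe-separated class names.
df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "classes": ["class1|class2", "class3", "class2|class3", "class10"],
})

# regex=False treats the search string literally.
has_class2 = df[df["classes"].str.contains("class2", regex=False)]

# With the default regex=True, "class2|class3" means "class2 OR class3".
has_2_or_3 = df[df["classes"].str.contains("class2|class3")]

# Substring matching would also hit "class10" when searching for "class1";
# splitting on "|" and comparing whole tokens avoids that.
exactly_class1 = df[
    df["classes"].str.split("|").apply(lambda parts: "class1" in parts)
]

print(has_class2, has_2_or_3, exactly_class1, sep="\n\n")
```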
but thank you anyways
can someone help me convert unix to datetime?
df1['datetime'] = df1['unix'].apply(lambda x: datetime.utcfromtimestamp(x).strftime('%Y-%m-%d %H:00'))
this is my try. but it returns just OSError: [Errno 22] Invalid argument
the unix column looks like this in the dataframe:
0 1.612310e+12
1 1.612307e+12
2 1.612303e+12
3 1.612300e+12
4 1.612296e+12
...
33016 1.502957e+09
33017 1.502953e+09
33018 1.502950e+09
33019 1.502946e+09
33020 1.502942e+09
Name: unix, Length: 33021, dtype: float64
nevermind, figured it out with some help
Hi guys
it is in milliseconds and not normal seconds. just had to divide the x with 1000
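A hedged sketch of that fix using only the standard library. The column printed above seems to contain both values around 1e12 and values around 1e9, so dividing everything by 1000 may not be right for every row; the threshold below is an assumption, not something confirmed in the thread.

```python
from datetime import datetime, timezone


def to_utc_datetime(value):
    """Convert a Unix timestamp in seconds or milliseconds to a UTC datetime."""
    # Assumption: anything above 1e11 is milliseconds (1e11 seconds would be
    # roughly the year 5138, so real second-resolution data stays below it).
    seconds = value / 1000 if value > 1e11 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(to_utc_datetime(1.612310e12).strftime("%Y-%m-%d %H:00"))
print(to_utc_datetime(1.502957e09).strftime("%Y-%m-%d %H:00"))
```

If the whole column really is in milliseconds, `pd.to_datetime(df1['unix'], unit='ms')` does the conversion in one vectorised call.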
I was trying to launch anaconda navigator using cli on Linux
But for some reason it is not working
Anyone knows how to solve this issue?
what is the seed argument for on flow_from_directory?
I have a ImageDataGenerator object
like this
```py
rescale=1./255,
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
validation_split=0.975)
```
validation split is so high for testing purposes
and then i do this
```py
directory=data_dir, target_size=dimensions[:2],
seed=seed, subset='training')
```
but when i print train_generator.filenames
It always prints the same ones
no matter the seed
Hygy
hi! What would be the best approach to handling missing data in a time series of cryptocurrency data? I want to predict Ethereum prices, but most other currencies didn't exist when Ethereum was created, so I have a lot of missing data at the bottom of my table. What is the proper way to handle this? Nice day to all of you!
Can anybody help out with opencv? Thank you
hey guys which is the fastest model i can use for image segmentation? i need low accuracy (80%)
i have a list of 106 tokens
when i analyzed them i got this analysis , how can i predict the next token?
what i mean is to generate a few tokens which follows the similar pattern
ping me up if anyone decides to help
Dense(6,activation="sigmoid",kernel_initializer='glorot_uniform')(x)
what is the meaning of this source code? whereas so far sigmoid has only been used for binary classification
man ML is so hard
https://towardsdatascience.com/multi-label-text-classification-5c505fdedca8 is this a good article to learn off?
https://twitter.com/Br3Sc/status/1357328840431910913?s=20
Sup guys, pls if y'all don't mind take a look in my tweet about learn data science for free
hello data science people. I have a csv file and one of the column names is "Activities", under this column are a bunch of records where this value is "Sitting", which I dont want
how do I drop all records where the Activity value is "Sitting"? please
if anyone could help, pls pm me
df_filtered = df[df['Activities'] != 'Sitting']
How would you make that work for multiple fields? right now I have this, and it doesnt work:
`NotUsedActivites = unseen_df[unseen_df['Activity'] == 'Vibration' | (unseen_df['Activity'] == 'Drop_n_Pickup')].index
unseen_df.drop(NotUsedActivites, inplace = True)`
df_filtered = df[~df['Activity'].isin(['Vibration', 'Drop_n_Pickup'])]
I think this is what you're looking for
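A small sketch of dropping rows that match any of several values, on made-up data (only the column name and the two labels come from the messages above). The mask is negated with `~` so the matching rows are removed rather than kept, and the second form shows why the original attempt failed: `|` binds more tightly than `==`, so each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical data using the column and labels mentioned above.
unseen_df = pd.DataFrame({
    "Activity": ["Walking", "Vibration", "Drop_n_Pickup", "Sitting"],
    "value": [1, 2, 3, 4],
})

unwanted = ["Vibration", "Drop_n_Pickup"]

# ~ negates the boolean mask, so only rows NOT in the list are kept.
kept = unseen_df[~unseen_df["Activity"].isin(unwanted)]

# Equivalent with explicit comparisons, each wrapped in parentheses.
kept_alt = unseen_df[
    ~((unseen_df["Activity"] == "Vibration")
      | (unseen_df["Activity"] == "Drop_n_Pickup"))
]

print(kept)
```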
@olive tinsel
sure
ummm...i have these 318 samples of a token when i analyzed them i got this as output
so can i generate a few similar patterns of token using this data?
like predicting what the next token would be
@olive tinsel
oh my... I am no where near that good at python
im sorry, I thought it would be simple
ping me up anyone if u would like to help, thanks
anyone here have experience with optimizing a streamlit app so that a weak server can run it? i'm having some issues and can't seem to get it to just... not crash hah
what would you choose?
hmmm
leaning to 1. option
lots of rows?
yeah, few columns reduce accuracy quickly
answer to all questions is: "well, it depends"
EXACTLY
incorrectly labelled data but if you have low dimensionality data then this probably is an easy fix
yikes
rough one
with labels it also depends whether you realize it during your analysis
trueeeeee, do they give you a 2000 word essay as an answer xd
it is a question in a job application form
input your date, upload resume, answer this question
maybe answer with the ones that are most appropriate for the job youre applying for and the data youre working with?
junior data scientist for DS/AI consulting company
just submitted my application, picked the first and second option and wrote in the comment below that "it depends".
thanks for your help!
Heya, any chance to get some help with Selenium?
I am trying to filter HTML elements by using multiple filters, not sure how to accomplish that though
not sure bro
find by CSS selector?
Thanks for the hasty reply, was already able to fetch it using xpath -
for img in driver.find_elements_by_xpath("//img[contains(@data-automation, 'mosaic-grid-cell-image')]"):
But now I have another issue
The site I'm trying to fetch the data from loads the images when the images are close to being on screen
I tried scrolling slowly to the bottom of the screen and then fetch the links but seems like it doesn't work because the browser is in the background.
Got any idea how to get it to work with a headless setup?
Would love some pointers regarding it, tried moving the focus of the driver between the windows, no change whatsoever
hm
haven't worked with this in a long time, honestly
don't really remember, sorry
but
Well, I really appreciate you trying though
I am a bit sceptical that focus would matter
do you know this to be the case?
or is it due to the scrolling behaviour itself
I tried it a couple of times, once I tried running the script when the FireFox opened in the background and the other I tried executing it I was just looking at the browser - the two had whole different outcomes
Hello everyone, is there like a website or directory to know what version of Tensorflow would work with your laptop?
put it in the background but in a position where you can visually inspect it
I've been practicing on Google's colab but would like to run some stuffs on my PC (hp elite book 8440p).
The latest versions of TF are giving me issues, so I'm thinking of installing a lesser version.
Like 1.X versions. Thanks.
why do you think an older version would work?
Actually, I'm thinking an older version would work better as the latest versions (2.X) are not running on my PC.
I'm getting different issues which I can't resolve.
not running how
It's saying some stuff about DLL import error and stuff.
So I'm thinking maybe 1.X versions would work with my laptop since it's old.
hard to say how to fix it, but that seems to be the case
ummm I'll try, but don't forget that I strive to achieve a headless script
And I'm using a tiling WM, not sure how I'd be able to do so fast enough
no, I mean, to verify your hypothesis
namely, that it's focus that's causing the problem
Huh, seems like you were right, or rather that I was wrong
I had it in view but not under focus
Yet I was able to extract all needed elements
Got any idea how to deal with it though? It only raises more questions
I'm heading to bed, if you guys have any idea how to handle that beast I'd be more than happy to hear
Thanks!
Following this tuto
How (coding) can i say if a match between 2 images is good or not?
Is someone familiar with plotly?
hi guys, i have a question about finding the span between the start and end dates of a data frame in days, months and years. this is what i have tried: df['Date'] = pd.to_datetime(df['Date']) to change the column to datetime objects. then i did df['Date'].max().day - df['Date'].min().day. but i now see that this is wrong, because it takes the day-of-month of the last date and subtracts the day-of-month of the first date, which is not the actual number of days. for instance '2010-27-02' - '2000-01-02' will return 26
when really what i am trying to find "this data span is from 9 years, 3 months and 28 days".
bruh @rotund dagger can u send sc of pd df
@abstract zealot
the correct answer is 9 years, 7 months, and 25 days. im just not sure how to get there
via pandas that is
gimme one sec
thank you
youre basically trying to find the difference between 1st and last date?
or am i wrong
yea
sorry bro back now
because you sort the column there are a couple ways you could do it
calculating the difference like ```py
start = pd.to_datetime(df['Date'][0], format='%Y-%d-%m')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%d-%m')
difference = end - start
```
this will probs return a timedelta in days??? although im not sure
i cant remember much about it
if it does you can just manipulate it a bit to get the required format @rotund dagger
thank you i will mess around with that. i dont have to sort the column necessarily i was just messing with functions. im looking into relativedelta from the dateutil libray at the moment
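A rough sketch of the relativedelta route, on made-up dates (the frame below is hypothetical, only the `Date` column name comes from the thread). Using `min()` and `max()` avoids depending on row order, the plain subtraction gives the total day count, and `relativedelta` breaks the same span into calendar years, months and days.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical dates, in no particular order.
df = pd.DataFrame({"Date": ["2007-11-01", "2016-02-01", "2020-02-10", "2017-06-25"]})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# min()/max() find the endpoints without sorting or relying on row order.
start, end = df["Date"].min(), df["Date"].max()

# Plain day count from the resulting Timedelta.
total_days = (end - start).days

# Calendar-aware breakdown; to_pydatetime() hands dateutil plain datetimes.
span = relativedelta(end.to_pydatetime(), start.to_pydatetime())

print(f"{total_days} days in total")
print(f"{span.years} years, {span.months} months and {span.days} days")
```

Whether the last day counts as part of the span (an off-by-one difference) is a convention the assignment would have to settle.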
yea so if you print(difference) it should return number of days between those two dates
is it right? lol
it kind of does but its a bit skewed.
wym?
are you years-day-month?
yikes
youre years-month-day
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
try that
ok , sec ill let you know
@abstract zealot
i meant start.day rather lol in the diffday line, but its still slightly wrong
what i wrote should return that, i just tried
ahhhhh
it might not work because you need to sort values
try putting df = df.sort_values('Date')
so ```py
df = df.sort_values('Date')
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
print(difference)
```
can someone explain to me why boxplot isn't working?
this is my output
bruh
ok, what might be happening is whenever you sort the date column, pandas sorts by year, and doesnt know how to handle datetime objects, take away the sort_values
or convert the entire column to datetime objects and then sort it
sorting the column is what messed it up, that code should return the exact number of days between the first and last date in the column
you will need to rerun the block of code where you defined df
since you overwrote it when you said df = df.sort_values
run the top one where you say df=read_csv
then run my code
and see if it works
lmao
im struggling today xd
@supple minnow do you have repeats for those categories?
if you only do something once it appears like that
@abstract zealot all 3 implementations show 3128
except for 1 is not calculating the last day in the mix
omg im so sorry, you will need to put df['Date'] = pd.to_datetime(df['Date'])
then df.sort_values('Date')
take away all the other code, just make sure youre defining the dataframe, converting the date column to datetime objects, sorting the column, then using my code
idk man, i literally replicated this on my own system and got 3524
dam, let me try with just hardcoding those 2 dates
thats a good idea actually
lst = ['2007-11-01', '2016-02-01', '2020-02-10', '2017-06-25']
df = pd.DataFrame(lst, columns=['Date'])
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
print(difference)
this prints 3524 and is just simulating your situation
although your indexing is crazy on that df
try the df = df.sort_values('Date') again and print df
i dont think it reindexes it
or call it org = df.sort_values('Date') so you dont overwrite
anyway, ill make new code that shouldnt need you to sort anything
nah it doesnt
i think its an indxing thing
are you using publicly available data?
yea np
how do i rename images like photo0,photo1,photo2.... using csv file that contains the name? Thanks in advance!
if I have a pandas dataframe that has two different index values but multiple elements with each index, how would I split that up into multiple dataframes? Is there a better way to do this?
@astral path i think there is a way to do it let me try
thank you!
to clarify, you want a dataframe with url = abc, and values 1 12 3 4, and a second dataframe with url = def, and val = -190, -4, -5
yes, correct
so it will be something like this
Ye
df1 = pd.DataFrame([])
df2 = pd.DataFrame([])
then load the values you need in the dataframe, i have to leave for a sec, when i come back i will try to load it up and show displays
ok, thank you! i appreciate this
np
the bigger issue i have in particular though, is that I have 209 different indices that I want to make into 209 different dataframes
@astral path groupby apply
what would the apply do?
Can anyone suggest the best model for multiclass classification? I am thinking of Naive Bayes (Gaussian), but is there any better model?
@nocturne plover you can model your data by lots of distributions
wouldnt it depend on the question youre trying to answer
literally check out https://medium.com/@ciortanmadalina/overview-of-data-distributions-87d95a5cbf0a for a flavour of the different ways you can model data
@astral path did you solve your problem?
lemme know if you need help man
will do and thank you!
I did a df.groupby(["a", "b"]).x.mean() and I ended up with a,b as multiindex with my x-mean column, I'm trying to plot a separate plot for each a with b being on the x axis and x being on the y-axis
for more context just to let you guys know
not sure how to do that
im looping over every game from this NBA season and am trying to split each game into a new dataframe
im solving it a different way though so far:
vals = pd.DataFrame([])
df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()
display(df.loc[df.URL==urls[0]])
for thisURL in urls:
    gamestats = df.loc[df.URL==thisURL]
    display(gamestats['ShotDist'])
are you getting this from dictionaries?
no, from a csv
the data looks like this
can you post screenshot im not logged into google 
yeah thats what im doing
however instead, i'm looping over each URL and using df.loc to get the rows with that URL instead
emm
you could try something like ```py
for a, b in df.groupby(by='URL'):
    print(a)
    print(b)
    break
```
this would just give you an example thats why theres a break but basically it takes each element from url and makes a df out of it
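To make that concrete, a sketch on a made-up shot log (the values are invented; `URL` and `ShotDist` are the column names from the thread). It collects one sub-DataFrame per game in a dictionary and also shows how a single float per game could be gathered into its own frame.

```python
import pandas as pd

# Hypothetical shot log; "URL" identifies the game.
df = pd.DataFrame({
    "URL": ["game1", "game1", "game2", "game2", "game2"],
    "ShotDist": [3, 24, 15, 1, 27],
})

# Each iteration yields (group key, sub-DataFrame holding only that key's rows).
games = {url: game_df for url, game_df in df.groupby("URL")}

# One float per game, collected into its own DataFrame.
summary = df.groupby("URL")["ShotDist"].mean().reset_index(name="MeanShotDist")

print(games["game2"])
print(summary)
```

The mean is just a stand-in here; any function that turns a game's `ShotDist` series into a float could be used in its place.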
@astral path
right now i'm doing this
vals = pd.DataFrame([])
df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()
display(df.loc[df.URL==urls[0]])
for thisURL in urls:
    gamestats = df.loc[df.URL==thisURL]
    shotDists = gamestats['ShotDist']
    shotOutcomes = gamestats['ShotOutcome']
    for i in enumerate(shotOutcomes):
        if(shotOutcomes[i] == 'miss'):
            (incomplete)
what im trying to do specifically is get the ShotDists for both teams in a specific game, do an analysis with each teams separate series of ShotDists which returns a float, and then send that float to a separate dataframe for storage
anyone know how to convert keras model into pytorch model ?
oh good they got you @astral path
yeah but
i have an error
vals = pd.DataFrame([])
df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()
display(df.loc[df.URL==urls[0]])
for thisURL in urls:
    gamestats = df.loc[df.URL==thisURL]
    homestats = gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()]
    awaystats = gamestats['AwayPlay'].dropna()
    #homedists = homestats['ShotDist']
    #awaydists = awaystats['ShotDist']
    #homeoutcomes = homestats['ShotOutcome']
    #awayoutcomes = awaystats['ShotOutcome']
    display(homestats)
ValueError: Can only compare identically-labeled Series objects
i'm trying to make a new dataframe (homestats) which uses .loc on gamestats with the parameter of all gamestats elements in the column 'HomePlay' which are not NaN
@abstract zealot @rotund dagger any ideas?
Hmm I'll try that
i have no ideas but you should follow ken jee on YT
hes a big sports data science guy

i'll check him out!
thank you! will def be checking this out
i chose nba analytics for my focus in my DS class this semester
im looking it over now so far nothing jumps out
in this line gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()].... is the gamestats.HomePlay the same as saying gamestats['HomePlay']?
HomePlay is a column, but the environment might be confusing it with a method call
so could you do gamestats.loc[gamestats['HomePlay']==gamestats['HomePlay'].dropna()]
im fairly new to this so i could be entirely off base
@abstract zealot @astral path
hmm i dont know
i mean to get gameStats I did gamestats = df.loc[df.URL==thisURL]
i'll try it
Recoding it is the only way.. if I'm not wrong
u have recoding it?
wew I need more time to spend up
There's no other way... If you need to use same model in PyTorch cause both have separate backends (incompatible)
But there's something on GH called PyTorch Lightning which might make it easier to recode it in PyTorch. I'm not sure if it works but maybe yes.
Incompatible backend?
Easy to understand, thanks for the info!
you can easily convert pytorch to keras by using onnx as a middle man but pytorch doesnt allow loading onnx models
so now can you use onnx to convert from keras to pytorch?
pytorch doesnt allow loading onnx models
its the same error when i try that
so i think i know why
when i do homestats = gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()]
its trying to compare gamestats.HomePlay and gamestats['Homeplay'].dropna()
which are both of different lengths because .dropna() is removing the rows with NaN as the value for HomePlay
so how would I return a df with all the rows where gamestats['HomePlay'] is not NaN?
thanks!
gamestats[gamestats['HomePlay'].notna()]
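A tiny illustration of why this works, on made-up rows (the play descriptions are invented): `notna()` returns a boolean Series with the same index as the frame, so it can be used directly as a row mask, whereas the `==` comparison against the shorter `dropna()` result fails because the two Series no longer share the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical play-by-play rows: each row has either a home or an away play.
gamestats = pd.DataFrame({
    "HomePlay": ["makes 2-pt shot", np.nan, "misses 3-pt shot", np.nan],
    "AwayPlay": [np.nan, "makes free throw", np.nan, "turnover"],
})

# Boolean mask aligned with the frame's index: True where HomePlay is present.
homestats = gamestats[gamestats["HomePlay"].notna()]

# dropna(subset=...) is an equivalent spelling of the same filter.
homestats_alt = gamestats.dropna(subset=["HomePlay"])

print(homestats)
```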
How do you color your output?
Also, if any of you are learning data science here is a good website to start. app.dataquest.io/referral-signup/inxant5f/
someone could help me with syntax of keras and tensorflow, they seem having a conflict in my code
who can I dm about it?
How much math do I need to know before learning tensorflow?
hey guys how can i convert my rgb channels into classes in tensorflow?
Does anybody know why is plotting like this?
Bruh show code @supple minnow
If you just wanna use the TF APIs with their defaults... 0 math... but if you want to tune your model for better results then calculus and probability are definitely required!!!
DM Me
Ok, Thanks
sry for the late reply I just notice it. Code is in the picture: sns.boxplot(x='Fjob',y='repeated',data = data)
```py
valid_datagen = ImageDataGenerator(
    rescale=1. / 255,
    validation_split=0.2)
valid_generator = valid_datagen.flow_from_directory(
    directory=data_dir, target_size=dimensions[:2],
    seed=seed, subset='validation')
```
How can i get X_test and Y_test from this object?
I need it to plot confusion matrix
How exactly do you select features from an image (pixels) for subsequent input (X,y) to classifier/neural network?
normally you'd just take the pixels and put them directly into an array which you can later convert to a tensor
you can use something like opencv to read the image
why when i use plot_confusion_matrix i get "only classifiers supported"?
thank you! worked
Is this also a channel I can ask for help?
because I needed some help with matplotlib
go ahead
so whenever I use FuncAnimation, anything I return from its animate function seems to draw on top of everything
and you divide your axis with ```py
fig, ax = plt.subplots(<number rows>, <number columns>)
```
I used fig.add_axes
can you show code?
which part specifically?
maybe just the part where youre trying to plot?
what I'm trying to do btw is make it not draw over everything
yes i know
Idk what part still, mean my animation code?
you can dm me the code and i can suggest a fix if you dont want to post it here
100 ?
?
100 lines
166 in total
you just need to show the part where you start using matplotlib, if i need more ill letcha know
the entire code is just matplotlib really
dm me the code then xd
okay
anyone here work with h5py files
Aye could someone point me in the right direction? How could i use a pattern of data from a set of users to find similar users? At this point i don't even know what to google. I've got a pretty massive amount of users, and a decent sized subset of them that I know fit, i just need a way to find other users like them in the entire group. Sorry if this doesnt make sense
if I have a dataframe with two columns and each row is an integer from 3 to 37, how could I figure out the exact number of times a specific combination of values appears in the dataframe?
so like col1 and col2 are the two columns i want to analyze
and combo count contains the number of times a row appears
This sounds like a task for a typical recommender system. There are some simple algorithms for this, and advanced stuff too. Maybe look at this link for some simple starters: https://www.kdnuggets.com/2019/09/machine-learning-recommender-systems.html
ty cheers m8
What is your combo of numbers @astral path
try ```py
number = len(df[(df['col1'] == 3) & (df['col2'] == 32)])
```
How can i use plot_confusion_matrix from sklearn.metrics with the images on an ImageDataGenerator from keras?
will do! thank you
although how would I apply it to an entire dataframe?
depends what combos youre looking for
so rather than df['col1'] =='3', it would be df['col1'] == col1vals
yea
how would that work?
yeah
im making a scatterplot where the size of the point is dependent on the frequency that the x value and y value combination appears
in seaborn
maybe then try a different strategy like ```py
for e, i in df.groupby(by=['col1', 'col2']):
    print(f'The combo {e} appears {len(i)} times ')
```
does that work?
lemme check
instead of printing them then, just put them into a dictionary which you can use to form your plot
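A sketch of that idea on made-up integer columns (the values are invented; `col1` and `col2` are the names used above). `groupby(...).size()` counts every combination in one pass, and the result can be kept as a DataFrame or turned into a dictionary.

```python
import pandas as pd

# Hypothetical integer columns.
df = pd.DataFrame({
    "col1": [3, 3, 5, 3, 5],
    "col2": [32, 32, 10, 7, 10],
})

# size() counts the rows in every (col1, col2) group.
combo_counts = df.groupby(["col1", "col2"]).size().reset_index(name="count")

# Plain dict keyed by (col1, col2), if a lookup table is more convenient.
counts_by_combo = {
    (c1, c2): n for c1, c2, n in combo_counts.itertuples(index=False)
}

print(combo_counts)
```

For the scatterplot, `combo_counts` could then be passed to `sns.scatterplot(data=combo_counts, x='col1', y='col2', size='count')`, so the point size follows the frequency.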
nice, goodluck man ๐
ty!







