#data-science-and-ml


misty flint
rotund dagger
#

im really not sure i think his point is to get me familiar with using mapping and reduction in pandas, but its throwing me for a loop

misty flint
#

so his questions is..."for each race, find the states with the highest density" ?

rotund dagger
#

this is the exact question he is asking verbatim. Find the State with the highest density of each of the race categories (e.g. Hispanic, White, Black, Native, Asian, Pacific) - (6 answers). Please note that "Puerto Rico" is not a state even though it is in the data.

misty flint
#

surely theres census data that just has it by state

#

instead of counties

#

that would make it much easier

rotund dagger
#

it only displays by state if i use groupby(['State'])

#

but he said not to do that. im not sure why

misty flint
#

maybe he wants you to calculate the density in your algorithm

#

yikes

rotund dagger
misty flint
#

im still confused as to what youre getting returned

#

why did you use median

rotund dagger
#

without median it displays the highest density of a single county within a state, so i would get 100, but Alabama doesn't fully contain 100 percent of a race

#

median takes the average of all counties

misty flint
#

ah thats how youre using it

#

but wait

#

median isnt the average technically

rotund dagger
#

mean is close in this usage if i use it i get the same answer

misty flint
#

why did you use mean()

#

oh i see

#

well its good that its normally distributed

rotund dagger
#

so this is almost the perfect answer, its just missing the name of the state

#

for instance, new mexico is the highest hispanic density, of 43.5

#

but it fails to display new mexico

misty flint
#

but you cant use group by?

#

weird

#

i would ask someone who knows pandas more than me

rotund dagger
#

i would just assign a title in print to each, or use a dictionary to do so, but he said it can't be hard-coded. he didn't say i couldn't use groupby, he just said that i would never get the answer to work if i used groupby

misty flint
#

your data frame, what are the row names?

rotund dagger
#

the row names are just index 0 - 74000

misty flint
#

ahhh

rotund dagger
misty flint
#

maybe you can just return it using indexing

rotund dagger
#

sorry, sum is better here than .count()

#

so i suppose that density = Hispanic/TotalPop for each state

misty flint
#

^

#

thats what i would do

#

at least from the beginning

rotund dagger
#

even with that info i get lost still lol yikes

misty flint
#

@velvet thorn do you understand

#

we are both still noobs

rotund dagger
#

well thank you for taking the time to look through it with me @misty flint i appreciate it greatly.

misty flint
#

np sad i couldnt help more

#

tbf i just started coding not long ago

#

i think its more of a data science question rather than a pandas question tho

#

but idk

lapis sequoia
rotund dagger
#

im unfamiliar with scipy

misty flint
#

the way im interpreting the documentation, it looks like t is fine as long as its a 1 dimensional variable..?

lapis sequoia
#

Yes, that's how I see it. You can define the function passed to the solver however you like as long as it meets the requirements of the solver. However, the returned solution object will always be in terms of y and t.
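A minimal sketch of that convention, assuming the solver under discussion is `scipy.integrate.solve_ivp` (the chat only mentions scipy): the right-hand side is always called as `fun(t, y)`, and the result reports the independent variable as `sol.t` and the states as `sol.y`, whatever they mean in your problem. The decay equation and rate `k` are made up for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def decay(t, y):
    # Right-hand side of dy/dt = -k*y; solve_ivp always calls fun(t, y)
    k = 0.5
    return -k * y


t_eval = np.linspace(0.0, 10.0, 11)  # times at which the solution is reported
sol = solve_ivp(decay, t_span=(0.0, 10.0), y0=[1.0], t_eval=t_eval)

# sol.t has shape (n_points,), sol.y has shape (n_states, n_points)
print(sol.t.shape, sol.y.shape)
```

Even if your "time" is really something else (distance, redshift, ...), the solver still calls it `t`, so only the meaning changes, not the interface.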

lapis sequoia
#

Anyone have experience with matplotlib as a way to plot data?

rotund dagger
#

im learning that next, but i have minor experience in it

velvet thorn
#

have you solved your problem

rotund dagger
#

i have not

#

i am trying like mad to though its due in a few hours lol if you could help i would greatly appreciate you

#

@velvet thorn forgot to @ you

velvet thorn
#

uh

#

can you

velvet thorn
rotund dagger
#

ill start with the question i am trying to solve

#

Find the State with the highest density of each of the race categories (e.g. Hispanic, White, Black, Native, Asian, Pacific) - (6 answers). Please note that "Puerto Rico" is not a state even though it is in the data.

#

that is verbatim

velvet thorn
#

okay

rotund dagger
#

so i am using a csv from kaggle

velvet thorn
#

sounds like a groupby problem

rotund dagger
#

thats what i thought

#

so this is what i tried so far

#

d = df.groupby(['State'])[['Hispanic','White','Black','Native','Asian','Pacific']].mean()
d

#

and i get

#

a data frame with states as the index and races as columns, with mean values per state

velvet thorn
#

okay

#

and then

rotund dagger
#

then i need to find which of those states are the highest for each race

#

so i apply max() and get this

velvet thorn
#

what format do you want it to be in

rotund dagger
#

he wants it to read:

#

hispanic: New Mexico

#

so new mexico is 43.5 but when i apply max it is now a series and no longer a dataframe

#

so i lose the state column

#

new mexico is 45.3

velvet thorn
#

so basically

#

you want the index

#

to be the race

#

and the value to be the state?

rotund dagger
#

yea, but the state has to be calculated not hard coded

velvet thorn
#

sec

#
>>> df[df['State'] != 'Puerto Rico'].groupby('State')[['Hispanic', 'White', 'Black', 'Native', 'Asian', 'Pacific']].mean().idxmax()
Hispanic              New Mexico
White                    Vermont
Black       District of Columbia
Native                    Alaska
Asian                     Hawaii
Pacific                   Hawaii
#

like this? @rotund dagger

rotund dagger
#

yes exactly

velvet thorn
#

ye

#

there you go

rotund dagger
#

but with puerto rico and district dropped

#

omg you're a lifesaver

velvet thorn
#

change to

rotund dagger
#

i see how to drop district in that

velvet thorn
#

df[~df['State'].isin({'Puerto Rico', 'District of Columbia'})]
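The whole chain can be run end to end on a small made-up frame (the column names follow the chat; the numbers are invented). `idxmax()` returns the index label of each column's maximum, here the state name, which is exactly what plain `max()` discards:

```python
import pandas as pd

# Hypothetical county-level rows; the real data comes from the Kaggle CSV
df = pd.DataFrame({
    "State": ["New Mexico", "New Mexico", "Vermont", "Vermont",
              "Puerto Rico", "District of Columbia"],
    "Hispanic": [47.0, 40.0, 1.5, 2.0, 99.0, 10.0],
    "White": [40.0, 45.0, 95.0, 94.0, 0.5, 35.0],
})

races = ["Hispanic", "White"]

# Drop the rows that are not states before aggregating
states_only = df[~df["State"].isin({"Puerto Rico", "District of Columbia"})]

# Mean of the county percentages per state, then the state label of each column's max
winners = states_only.groupby("State")[races].mean().idxmax()

for race, state in winners.items():
    print(f"{race}: {state}")
```

This prints one `race: state` line per column, which matches the requested "hispanic: New Mexico" format without hard-coding any state names.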

rotund dagger
#

ive been working on this for 3 days you are absolutely amazing!

#

hopefully i can cruise through the rest of the questions now

velvet thorn
#

yw 👋

misty flint
#

wow a real superhero

rotund dagger
#

yea, absolutely slayed it

misty flint
#

you were so close to

rotund dagger
#

my professor totally tried to throw me off that path too

steel zealot
#

my minds blown

#

i found something out

velvet thorn
#

so I was

#

just thinking about your problem and

#

I feel like the statistical methodology is wrong?

#

because

#

what you're doing

#

is taking the mean of the percentages of each race, right?

#

but that doesn't necessarily represent the percentage of each race for that state

#

because each entry may have a different total population

#

do you get what I mean?
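A sketch of the population-weighted alternative, assuming each county row carries a `TotalPop` column next to the race percentages (the name used earlier in the thread; the values below are invented): turn percentages into head counts, sum per state, then divide by the state's total population.

```python
import pandas as pd

# Two hypothetical counties in state A with very different populations
df = pd.DataFrame({
    "State": ["A", "A", "B"],
    "TotalPop": [1_000_000, 1_000, 50_000],
    "Hispanic": [10.0, 90.0, 30.0],  # percent of each county's population
})

# Unweighted: every county counts equally, regardless of its size
unweighted = df.groupby("State")["Hispanic"].mean()

# Weighted: percentages -> people, summed per state, divided by state population
counts = df["Hispanic"] / 100 * df["TotalPop"]
weighted = counts.groupby(df["State"]).sum() / df.groupby("State")["TotalPop"].sum() * 100

print(unweighted)
print(weighted)
```

For state A the unweighted mean is 50%, while the weighted figure is about 10%, because the large county dominates the actual population.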

tall trail
#

so im trying to filter out a row in my dataframe between 2 minutes ( my dataframe has this kind of timestamp: 2021-01-31 15:46:33 ) and i can not understand how pandas between_time works.
Right now i have this:
peaks = peaks[peaks['time'].between_time('00:30', '00:32')]
which gives me the following error:
TypeError: Index must be DatetimeIndex

if i run df.dtypes it returns the column as datetime64[ns]

what am i doing wrong/do i need to supply more info?

last rivet
#

datetime64 != DatetimeIndex
Do a conversion

tall trail
#

does the datetime column need to be the index as well?

last rivet
#

It's a type error, you need to fix the type so pandas recognize it

#

e.g convert ur datetime64 to normal DateTime by calling .tolist() and I think then Pandas should recognize it

tall trail
#

thanks, ill try that

velvet thorn
#

between_time implicitly filters on the index

#

so you want to set_index first

tall trail
#

ya i figured, did this now df['time'] = pd.DatetimeIndex(df['time'])
df.set_index(keys='time', inplace=True, drop=False)
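A sketch of both routes on a toy frame (the `peaks`/`time` names follow the question; the timestamps are made up): `between_time` filters on a `DatetimeIndex`, so either move the column into the index first, or keep it as a column and build a mask from its time-of-day part.

```python
import datetime

import pandas as pd

# Hypothetical peaks table with a datetime64[ns] 'time' column
peaks = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-31 00:29:59",
        "2021-01-31 00:30:15",
        "2021-01-31 00:31:40",
        "2021-01-31 15:46:33",
    ]),
    "value": [1, 2, 3, 4],
})

# Option 1: between_time works on the index, so make 'time' the index first
by_index = peaks.set_index("time").between_time("00:30", "00:32")

# Option 2: keep 'time' as a column and compare its time-of-day component
t = peaks["time"].dt.time
mask = (t >= datetime.time(0, 30)) & (t <= datetime.time(0, 32))
by_mask = peaks[mask]

print(by_index)
print(by_mask)
```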

lapis sequoia
#

plss help with this... I am a beginner and new here... so I can't access voice... plsss help me with this... I have to give a voice message because... the problem was too long to write

atomic obsidian
#

is pandas a good library to begin learning data science?

rigid ledge
#

hello guys

#

can anyone plz help me in installing yolov4 on ubuntu VM

#

?

#

any tutorial is appreciated thx

ripe forge
#

It has the option of fine tuning too, so make sure to read the params and change it as needed if you want training from scratch. (ps. I don't recommend training from scratch)

supple minnow
#

Hello all,
I have a question based on feature selection. Based on this picture(mutual info) what is the best approach when we need to decide what feature we gonna take and which we gonna drop? Like is it ok if I take the first 7(including age) features or should I just take the first two since they have better results?

ripe forge
#

Don't choose the number of features directly from plot. You can instead decide on some threshold, say cumulative 0.9 or cutoff 0.005

#

Then take whatever n you get from that approach

#

Note that while first two features seem to be clearly stronger, it doesn't automatically make other features bad. There's still information there.
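Both selection rules can be sketched in a few lines. The scores below are invented (in practice they might come from something like `sklearn.feature_selection.mutual_info_classif`), and 0.9 / 0.005 are just the example thresholds from the messages above:

```python
import pandas as pd

# Hypothetical mutual-information scores, one per feature
mi = pd.Series(
    {"f1": 0.30, "f2": 0.25, "age": 0.04, "f4": 0.03, "f5": 0.008, "f6": 0.004, "f7": 0.001}
).sort_values(ascending=False)

# Rule 1: smallest set of top features that covers 90% of the total MI
share = mi.cumsum() / mi.sum()
n_cum = int((share < 0.9).sum()) + 1
by_cumulative = list(mi.index[:n_cum])

# Rule 2: every feature whose score clears a fixed cutoff
by_cutoff = list(mi[mi > 0.005].index)

print(by_cumulative)
print(by_cutoff)
```

The two rules can disagree (here 3 vs 5 features), which is why the threshold itself is worth validating, e.g. by comparing model scores for each candidate set.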

crisp gazelle
#

Would any of you guys know a way to visualize neural networks like with a library similar to sns ? I have seen ann_visualizer, but I am working on my own neural net without using Keras so i am not sure if that will work

ripe forge
#
```
  File "/mnt/disks/sdb/superai/ai/objectrecognition/vision/datasets/open_images.py", line 17, in __init__
    self.data, self.class_names, self.class_dict = self._read_data()
  File "/mnt/disks/sdb/superai/ai/objectrecognition/vision/datasets/open_images.py", line 63, in _read_data
    class_names = ['BACKGROUND'] + sorted(list(annotations['ClassName'].unique()))
```
focus on this part of the traceback. this is giving you a clue about where to look
#

see if you can read the code in that place to figure out why this error is happening

#

My first guess would be it expects the class names to be provided in a certain format, yeah

dusty anchor
#

hey guys, can i ask tensorflow/keras questions here?

ripe forge
#

yep, you can

dusty anchor
#

so i have a few questions. first, i'm working on an image segmentation project for the first time using the Cityscapes dataset. i have 2 folders, one containing the images and one containing the masks, and i've made a function that creates a list of the paths of all the images. can i convert these lists into a keras dataset?

delicate crane
#

Can I get some ideas for a project using machine learning

lapis sequoia
#

Python Projects: How to Build a Simple Trading Bot Skeleton in Python | Episode 1 by Third Eye Cyborg Podcast • A podcast on Anchor. This podcast episode goes into the code of a basic Python project. Let me know what you think, I am always open to feedback! https://anchor.fm/thirdeyecyborg/episodes/Python-Projects-How-to-Build-a-Simple-Trading-Bot-Skeleton-in-Python--Episode-1-epplkg


atomic obsidian
#

should i start learning pandas or mysql first

cerulean spindle
loud osprey
#

hello, can anyone recommend me good online visualization tools which can fetch data from an api

quick veldt
#

Hey, I'm currently collecting some data from Twitter Streaming API and I need to run the script all the time. What are my options in terms of free script hosting?

cerulean spindle
#

Does anyone know how a sub-par GPU affects tensorflow training?

austere swift
#

theres really 2 main things about the gpu you need to worry about

#

first is vram

#

if you have a very low amount of vram then some larger models won't be able to train

#

or you'd have to lower the parameters of the model or the batch size to get it to fit

#

second is the gpus actual speed

#

this, unlike the vram, wouldn't completely stop you from training the models

#

it would just make it slower/faster

carmine finch
#

hey does anybody know python well i need help

austere swift
#

this is a python server

#

a lot of people know python well

misty flint
#

dont ask to ask meme

foggy fern
#

Hi I'm having trouble with some data visualizing I'm not seeing the whole behavior of contours

#

I'm just getting a chunk of it

misty flint
#

i understand that feeling

foggy fern
#

do you know how to resolve it?

misty flint
#

matplotlib?

foggy fern
#

yeah

misty flint
#

well depends

#

whats your code

foggy fern
#

phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0
zf =1000
N = 400 # Number of Runge-Kutta steps
h = (zf - z0)/N
def f(p, z ):
    x = p[0]
    U = p[1]
    dx = U
    dU= -(2./(1.+z)+(3..03/(2.(1.+z)**2.(.3(1.+z)*3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)**2.((1.+z)**3.+.7))

    return array([dx, dU], float)

zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
    ypoints.append(p[0])
    vpoints.append(p[1])
    k1 = h * f(p, z)
    k2 = h * f(p + 0.5k1, z + 0.5h)
    k3 = h * f(p + 0.5k2, z + 0.5h)
    k4 = h * f(p + k3, z + h)
    p = p + (k1 + 2k2 + 2k3 + k4)/6
z0q=1000 #interesting redshift values in QSO data(initial)
zfq=1100 #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size
def f(q, z ):
    x = q[0]
    U = q[1]
    dx = U
    dU= -(2./(1.+z)+(3..03/(2.(1.+z)**2.(.3(1.+z)*3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)**2.((1.+z)**3.+.7))

    return array([dx, dU], float)

zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)

for z in zqpoints:
    xqpoints.append(q[0])
    vqpoints.append(q[1])
    k1 = h * f(q, z)
    k2 = h * f(q + 0.5k1, z + 0.5h)
    k3 = h * f(q + 0.5k2, z + 0.5h)
    k4 = h * f(q + k3, z + h)
    q = q + (k1 + 2k2 + 2k3 + k4)/6

misty flint
#

oh no can you put in in markdown

#

nvm

#
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0
zf =1000
N = 400        # Number of Runge-Kutta steps
h = (zf - z0)/N 
def f(p, z ):
    x = p[0]
    U = p[1]
    dx = U
    dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))

    return array([dx, dU], float)
zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
    ypoints.append(p[0])
    vpoints.append(p[1])
    k1 = h * f(p, z)
    k2 = h * f(p + 0.5k1, z + 0.5h)
    k3 = h * f(p + 0.5k2, z + 0.5h)
    k4 = h * f(p + k3, z + h)
    p = p + (k1 + 2k2 + 2k3 + k4)/6
z0q=1000    #interesting redshift values in QSO data(initial)
zfq=1100   #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size
def f(q, z ):
    x = q[0]
    U = q[1]
    dx = U
    dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))

    return array([dx, dU], float)
zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)

for z in zqpoints:
    xqpoints.append(q[0])
    vqpoints.append(q[1])
    k1 = h * f(q, z)
    k2 = h * f(q + 0.5k1, z + 0.5h)
    k3 = h * f(q + 0.5k2, z + 0.5h)
    k4 = h * f(q + k3, z + h)
    q = q + (k1 + 2k2 + 2k3 + k4)/6
#

!e

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

misty flint
#

ah dang it

#

ill just pull up a notebook rq

#

what were the libraries you used

#

numpy

#

matplotlib

foggy fern
#
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
%matplotlib inline
from numpy import array, arange
misty flint
#

thanks

#

its giving me an invalid syntax

foggy fern
#
plt.rcParams["font.family"] = "serif"
fig, ax = plt.subplots()
level = [ -.0026, -0.0021,-.0016,-.0011,-.0005]
levels = [ -0.0000875,-0.0000701,-0.0000576,-0.0000450, -0.0000285]
plt.contour(W,M,xqpoints,10, cmap='jet');
#CS=ax.contour(Q,P,xpoints, levels, colors='black')
plt.colorbar()
#CS=ax.contour(W,M,xqpoints, level, colors='green')
#plt.ylim([-5,-2])
#ax.set_ylabel("$ α_{had,0} $")
#ax.set_xlabel("$φ'_{0}$")
#ax.clabel(CS, inline=1, fmt='%1.9f')
#ax.yaxis.grid(True, zorder=0)
#ax.xaxis.grid(True, zorder=0)
plt.show()
#

this is what I'm doing for the contour

misty flint
#
```
dU= -(2./(1.+z)+(3..03/(2.(1.+z)2.(.3(1.+z)3.+.7))))p[1] +2.w.3np.exp(-2.x)((1.+z)/(.3(1.+z)**3.+.7))-(np.power(10.,m))x/((z+1.)2.*((1.+z)3.+.7))
```
#

this line

#

theres problems with the parentheses

foggy fern
#

is there?

#

i dont get any errors

#

i mean ((1.+z)3.+.7)) has 2 parenthesis

misty flint
#

the code look the same to you?

#

or is it different

foggy fern
#

but this doesnt change anything

#

same

misty flint
#

ah 2 instead of 4?

#

its not liking the 4

foggy fern
#

it works fine for me

misty flint
#

weird

#

might just be me then

foggy fern
#
 dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*p[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))
#

copy paste it directly maybe?

misty flint
#

oh yeah there is some difference

#

you raised to the power of two at one part

#

i dont have that in the original code

foggy fern
#
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-5, 0,400)
w= np.linspace(-.0002,.0008,400)
[M,W] =np.meshgrid(m,w)
z0 = 0           
zf =1000          
N = 400        # Number of Runge-Kutta steps
h = (zf - z0)/N 
def f(p, z ):
    x = p[0]
    U = p[1]
    dx = U  
    dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*p[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))
     
    return array([dx, dU], float)
zpoints = arange(z0, zf, h)
ypoints = []
vpoints = []
p = array([phi_0, phi_1], float)
for z in zpoints:
    ypoints.append(p[0])
    vpoints.append(p[1])
    k1 = h * f(p, z)
    k2 = h * f(p + 0.5*k1, z + 0.5*h)
    k3 = h * f(p + 0.5*k2, z + 0.5*h)
    k4 = h * f(p + k3, z + h)
    p = p + (k1 + 2*k2 + 2*k3 + k4)/6
z0q=1000    #interesting redshift values in QSO data(initial)
zfq=1100   #interesting redshift values in QSO data(final)
i = (zfq - z0q)/N #step size
def f(q, z ):
    x = q[0]
    U = q[1]
    dx = U     
    dU= -(2./(1.+z)+(3.*.03/(2.*(1.+z)**2.*(.3*(1.+z)**3.+.7))))*q[1] +2.*w*.3*np.exp(-2.*x)*((1.+z)/(.3*(1.+z)**3.+.7))-(np.power(10.,m))*x/((z+1.)**2.*((1.+z)**3.+.7))  # q[1]: this function's own state, not the global p
     
    return array([dx, dU], float)
zqpoints = arange(z0q, zfq, i)
xqpoints = []
vqpoints = []
q = array([ypoints[399], phi_1], float)

for z in zqpoints:
    xqpoints.append(q[0])
    vqpoints.append(q[1])
    k1 = i * f(q, z)  # step with i, the same spacing used to build zqpoints
    k2 = i * f(q + 0.5*k1, z + 0.5*i)
    k3 = i * f(q + 0.5*k2, z + 0.5*i)
    k4 = i * f(q + k3, z + i)
    q = q + (k1 + 2*k2 + 2*k3 + k4)/6

#

this is the code
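The two loops above repeat the same RK4 update with different step sizes (`h` and `i`) and different state names (`p` and `q`). Factoring the step into a helper that takes the derivative function and the step size as arguments makes those mix-ups harder. This is a minimal sketch, not the poster's model: it is checked on the toy equation x'' = -x, written as the pair (x, U) the same way the posted `f` does.

```python
import numpy as np


def rk4_step(f, y, z, dz):
    # One classical fourth-order Runge-Kutta step for dy/dz = f(y, z)
    k1 = dz * f(y, z)
    k2 = dz * f(y + 0.5 * k1, z + 0.5 * dz)
    k3 = dz * f(y + 0.5 * k2, z + 0.5 * dz)
    k4 = dz * f(y + k3, z + dz)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(f, y0, z_start, z_end, n_steps):
    # Fixed-step integration; the same dz advances z and feeds the RK stages
    dz = (z_end - z_start) / n_steps
    y = np.asarray(y0, dtype=float)
    z = z_start
    history = [y]
    for _ in range(n_steps):
        y = rk4_step(f, y, z, dz)
        z += dz
        history.append(y)
    return np.array(history)


def oscillator(y, z):
    # Toy system x'' = -x as the first-order pair (x, U) with U = x'
    x, u = y
    return np.array([u, -x])


traj = integrate(oscillator, [1.0, 0.0], 0.0, np.pi, 400)
print(traj[-1])  # close to [-1, 0], i.e. cos(pi) and -sin(pi)
```

Because everything is element-wise NumPy arithmetic, the same helper should also accept a state of shape (2, N) holding N parameter runs, as in the posted code.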

misty flint
#

oh yeah its dif than the previous one

#

keeps giving me errors

#

like its not multiplying the k1 and k2 in the previous code

#

let me use this one

foggy fern
foggy fern
#

yeah

#

i want to see their full behavior

#

like the whole contours

misty flint
#

i think the key here is changing what you pass in these lines:

#

zpoints = arange(z0, zf, h)

#

zqpoints = arange(z0q, zfq, i)

#

i believe

#

i could be wrong tho

#

so let me try

foggy fern
#

those are the initial and ending values though

#

for where I'm trying to calculate ode

misty flint
#

oh

#

well then i guess its just changing the linspace then

#

no?

foggy fern
#

well if i do a huge linspace then i can't see my parameters well enough

misty flint
#

thats a problem

#

bc i cant think of a better solution

foggy fern
#

were you able to make the full contours anyhow though

misty flint
#

kept giving me errors

foggy fern
#

what error?

foggy fern
#

it does look interesting

misty flint
#

the problem here is if you increase the linspace parameters too much you get exponent overflow error

#

i think theres a mathematical solution instead

foggy fern
#

yeah, mass is a power of 10

misty flint
#

maybe using log or something

#

and then youll visualize it that way instead

#

how much can you change the equation

#

i would start there maybe

foggy fern
#

what do you mean changing the equation

misty flint
#

what is dU here

#

or what does it represent

foggy fern
#

second derivative

misty flint
#

second derivative of what

#

of just U?

foggy fern
#

2nd derivative of x

misty flint
#

oh im dumb

#

lol

foggy fern
#

first derivative of U

#

no you're good

misty flint
#

have you messed at all with the delta?

foggy fern
#

which delta?

misty flint
#

ah nvm it didnt work

#

if only gm was here

#

they would know what to do

foggy fern
#

what's/who's gm

misty flint
#

@velvet thorn

#

can you help us please if you are free

foggy fern
#

whoa!!!!

#

there's a dip!

#

i didn't expect this

misty flint
#
phi_0 = np.linspace(0, 0,400)
phi_1 = np.linspace(0, 0,400)
m= np.linspace(-10, 3,400)
w= np.linspace(-.0002,.0008,400)
#

dunno if its actually supposed to be there or if its matplotlib

#

you can try it

foggy fern
#

this is helpful thanks i still want to see the full behavior though 😦

misty flint
#

same

#

sorry bud

#

i am still noob

foggy fern
#

no worries thanks for your help

misty flint
#

ngl i thought your issue was going to be easier

#

like mine 2 weeks ago

#

np

velvet thorn
#

tag specific people

#

for help

misty flint
#

were all struggling here

#

sorry

lapis sequoia
#

Write a short Python program which, given an array of integers, a, calculates an array of the same length, p, in which p[i] is the product of all the integers in a except a[i].

#

Can someone help me with a question?
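The question goes unanswered in the thread. One common approach, sketched here in plain Python as an illustration rather than the expected solution, makes a left-to-right pass of prefix products and a right-to-left pass of suffix products. It runs in O(n) and never divides, so zeros in the input are handled:

```python
def product_except_self(a):
    # p[i] = product of every element of a except a[i], computed without division
    n = len(a)
    p = [1] * n
    prefix = 1
    for i in range(n):  # left to right: p[i] holds the product of a[:i]
        p[i] = prefix
        prefix *= a[i]
    suffix = 1
    for i in range(n - 1, -1, -1):  # right to left: multiply in the product of a[i+1:]
        p[i] *= suffix
        suffix *= a[i]
    return p


print(product_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
```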

velvet thorn
lapis sequoia
#

Write a short Python program which, given an array of integers, a, calculates an array of the same length, p, in which p[i] is the product of all the integers in a except a[i].

foggy fern
#

I'm getting it partially

velvet thorn
#

there's stuff

#

outside the bounds of the box

#

that you want to see?

foggy fern
#

yes

velvet thorn
#

honestly there's way too much code + discussion there than I care to wade through

#

but

#

a good start would be

#

ax.axis, which lets you set the viewport bounds

#

if your range is very big

#

I would suggest

#

a different scale

#

log scale, in particular

foggy fern
#

the problem with that is I want to see the change in parameter in really small scales like order of 10^-5 if i do log scale it would just be -3,-4,-5...

#

but i want to see for values like .00005 and .00008 because I want to compare it with another set of data

velvet thorn
#

you might want to visualise subsets

#

use axis to move the viewport

#

to parts you want to focus on

#

sounds too big otherwise
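A sketch of the viewport suggestion on a made-up surface (the grid ranges mirror the poster's `w` and `m`; the function itself is invented): draw the contours over the full range, then draw them again with `ax.axis([xmin, xmax, ymin, ymax])` restricted to the small-parameter window of interest.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch; drop in a notebook
import matplotlib.pyplot as plt

# Made-up surface standing in for the real contour data
w = np.linspace(-0.0002, 0.0008, 200)
m = np.linspace(-5, 0, 200)
W, M = np.meshgrid(w, m)
Z = np.sin(W * 1e4) + M

fig, (full, zoom) = plt.subplots(1, 2, figsize=(10, 4))
full.contour(W, M, Z, 10)
full.set_title("full range")

zoom.contour(W, M, Z, 10)
# Move the viewport onto the sub-region you want to compare against other data
zoom.axis([0.00005, 0.00008, -2, 0])
zoom.set_title("zoomed viewport")
```

Keeping both panels side by side shows the overall shape and the fine detail at once, without switching to a log scale.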

lapis sequoia
#

I have a weird question

#

Do you guys fully memorize plot methods or you do it copy pasta from google

tall trail
#

mostly google here

fleet heath
#

And still if you're stuck somewhere, google is always there

lapis sequoia
#

I mean, do professionals do like this too?

#

like, google most of the time

tall trail
#

im doing an internship right now; here they mostly document their google findings and present them to each other to teach everyone if it's useful

lapis sequoia
#

Wow

#

Thats impressive

#

What do you do as an intern

tall trail
#

i need to build a dashboard which visualizes analysed data that suits the company application landscape

#

and i need to do the analyzing part myself too

#

and write a massive report about it too

lapis sequoia
#

I envy you

dusty anchor
#

hey guys if i use the tf.data.Dataset.list_files can i specify to import only images that have a particular word in the name?

velvet thorn
#

like as you get better

#

usually you know what you want to do, just not how to do it in a specific context

#

so for example

#

for a certain problem, a beginner might search "how to find text with dynamic prefix"

#

which is something you can do with a regex

#

and you might just have forgotten the syntax

#

so you might instead look for "python lookbehind regex"

#

experience also helps you know what to search for

#

asking the right question is very important

ripe forge
#

I personally mostly Google things as well, there's too many other things to think about rather than worrying about memorizing specific syntax or parameters of an api call

velvet thorn
#

over time, when presented with new problems, even if you don't know exactly how to solve it, you'll have a sense of what kind of approach is more likely to work

#

there is stuff that can be looked up easily (what a parameter is called) and stuff that cannot (how to reduce a set of complex business requirements to a viable technical architecture).

#

you want to be the kind of person who is good @ the latter.

#

to draw one final analogy: the best Scrabble players are not the best writers.

lapis sequoia
#

I agree

lapis sequoia
lapis sequoia
#

so it's very slow, right?

#

hmm

fleet heath
#

Why would it be ^?

#

Yepp

lapis sequoia
#

well I just need to split it into country wise then

#

it takes forever

#

iterating over 1.75mil is ridiculous?

hasty grail
#

You can take the line country_city_list = df['City'].unique() out of the loop

#

Also, use df.loc instead of df when boolean masking, that way you avoid making copies of the dataframe
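The loop itself is not shown in the chat, so this is a sketch under assumed column names (`Country`, `City` and `Sales` are hypothetical). It applies both suggestions, computing `unique()` once outside the loop and selecting with `df.loc`, and then shows `groupby`, which splits the frame country by country in a single pass instead of looping:

```python
import pandas as pd

# Hypothetical table; the real one has about 1.75 million rows
df = pd.DataFrame({
    "Country": ["IT", "IT", "ES", "ES", "ES"],
    "City": ["Rome", "Milan", "Madrid", "Madrid", "Seville"],
    "Sales": [10, 20, 5, 7, 3],
})

# Compute the unique values once, outside any loop, and select with .loc
cities = df["City"].unique()
per_city = {city: df.loc[df["City"] == city, "Sales"].sum() for city in cities}

# Usually much faster: let groupby do the splitting instead of a Python loop
per_city_grouped = df.groupby("City")["Sales"].sum()
per_country = {country: group for country, group in df.groupby("Country")}

print(per_city_grouped)
print(per_country["ES"])
```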

lapis sequoia
#

Thanks guys

#

I'll try changing it

#

seems like data are too large to do the 4 for loops

sullen hull
#
figure = plt.Figure(figsize=(0.5 * res_width / 100, 0.75 * res_height / 100), facecolor="#67676b")
figure.add_subplot(fc="#15151c").plot(df["Close time"], df["Close"].astype(float), "-" + colour)
figure.add_subplot(fc="#15151c").plot(df["Close time"], moving_avg, "-w")

error: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
figure.add_subplot(fc="#15151c").plot(df["Close time"], moving_avg, "-w")

#

How do I properly add another line to my figure
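The warning says the second `add_subplot()` call with identical arguments currently hands back the earlier axes instead of creating a new one. A sketch of the usual fix, with made-up data in place of the real price frame (the original `colour` variable is replaced by a fixed `"g"`): create the axes once and call `.plot` on it for each line.

```python
import pandas as pd
from matplotlib.figure import Figure

# Made-up prices standing in for the real "Close time" / "Close" columns
df = pd.DataFrame({
    "Close time": pd.date_range("2021-01-01", periods=10, freq="D"),
    "Close": [float(x) for x in range(10)],
})
moving_avg = df["Close"].rolling(3).mean()

figure = Figure(figsize=(6, 4), facecolor="#67676b")
# Create the axes once, keep a reference, and plot every line onto it
ax = figure.add_subplot(facecolor="#15151c")
ax.plot(df["Close time"], df["Close"], "-g")
ax.plot(df["Close time"], moving_avg, "-w")
```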

glad night
#

Hi guys! I have a question! I have a dataset that contains unique orders in it for a number of accounts. So the first column in each row is account number, and the rest are info on a specific order for that account. Hence, any account may have a number of rows for different things they have ordered - all of them containing the unique account number. I would like to pull a list of unique account numbers that have NOT ordered any from a short list of items, which I would identify by keyword... Any ideas? You're saving a life here. Thank you, hope you're keeping safe

#

PS. Apologies if this is the wrong space for this query...

serene scaffold
glad night
#

@serene scaffold appreciate you getting in touch Stelercus. Dataset stored in csv/xls.

serene scaffold
glad night
#

@serene scaffold Happy to. So I am attaching an example and will talk a bit about it

glad night
serene scaffold
#

@glad night please share it as text so that I can use it in a program.

#

namely as a csv

#

For future reference, text is the best format for anything pertaining to a question. Anything you can share as text and not as a screenshot, do that.

glad night
#

@serene scaffold @bleak fox Of course guys, give me a second. Just to comment on this - the idea is that if I run whatever code we come up with successfully, my 'exclusion criteria' would be the keywords 'voucher' and 'coupon', and in this specific case the one that does not contain them is AB000005

#

@serene scaffold Apologies - Discord turned it into a png! Attaching now

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

for future reference, share text as text in the chat. However I'll download it this time around.

glad night
#

@serene scaffold Thanks man I will.

serene scaffold
glad night
#

@serene scaffold Yes!

serene scaffold
glad night
#

@serene scaffold Exactly that.

serene scaffold
#
>>> df['Product Name'].str.contains('coupon')
Unique customer number
AB00001    False
AB00001    False
AB00001    False
AB00001    False
AB00001    False
AB00003    False
AB00003     True
AB00003    False
AB00005    False
AB00005    False
AB00005    False
#

this is part of the solution

#

@glad night the things you need to learn are how to use pd.Series.str.contains, how to do boolean logic with pandas, and how to select from a dataframe using a boolean dataframe.
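Putting those three pieces together on a small invented table (the column names follow the output above; the rows are made up): flag each order that mentions an excluded keyword, reduce the flags per account with `any()`, and keep the accounts whose flag is False.

```python
import pandas as pd

# Small made-up order table; the real one is read from CSV/XLS
orders = pd.DataFrame({
    "Unique customer number": ["AB00001", "AB00001", "AB00003", "AB00003", "AB00005"],
    "Product Name": ["Voucher pack", "Shoes", "Coupon book", "Hat", "Scarf"],
})

# 1. Flag orders whose product name mentions an excluded keyword (case-insensitive);
#    na=False treats missing product names as "no match"
excluded = orders["Product Name"].str.contains("voucher|coupon", case=False, regex=True, na=False)

# 2. Boolean logic per account: did the account ever order an excluded item?
ever_excluded = excluded.groupby(orders["Unique customer number"]).any()

# 3. Select the accounts that never did
clean_accounts = ever_excluded[~ever_excluded].index.tolist()
print(clean_accounts)  # ['AB00005']
```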

#

!docs pandas.Series.str.contains

arctic wedgeBOT
#
```
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
```
Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

**Parameters**

- **pat** (str): Character sequence or regular expression.
- **case** (bool, default True): If True, case sensitive.
- **flags** (int, default 0, no flags): Flags to pass through to the re module, e.g. re.IGNORECASE.
- **na** (scalar, optional): Fill value for missing values. The default depends on dtype of the array. For object-dtype, `numpy.nan` is used. For `StringDtype`, `pandas.NA` is used.
- **regex** (bool, default True): If True, assumes the pat is a regular expression. If False, treats the pat as a literal string.

**Returns**: Series or Index of boolean values. A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.

See also... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html#pandas.Series.str.contains)
glad night
#

@serene scaffold Thanks a lot man - I will do some reading after work tonight.

dusty anchor
#

hello guys, i'm trying to import images into a tensorflow dataset. i need the dataset for image segmentation and i can't work out the correct way to create it: does it need to hold pairs (image, mask)? or do i need 2 datasets, one containing the images and one containing the masks?

tall trail
#

anyone ever worked with a datalake and databricks together? how do you check for newly available data? does a notebook save variables if you run it as a task?

alpine mountain
#

Can anybody help me with this?

misty flint
#

i just got an interview for a DS internship

solid aurora
#

So I have this numpy array as follows:

#

(I'm substituting in small numbers for the dimensions to make it easier to explain)

#

Its shape is (10, 10, 5, 5, 3)

#

it's a 10x10 array of 5x5x3 image tiles

#

I need to recombine it into a single 50x50x3 image

#

what's the best way to do this?

#

Obviously I could do it with two for-loops but I'd rather have a more efficient vectorized way

hazy flax
#

Good afternoon, can you tell me where a data science programmer can work?

#

Can you work in physics and chemistry labs?

#

I wanted to go to Astrophysics College to work with python, but I don't know if there is a library for that.

#

😮

#

ehueuehue

#

In Brazil python is very popular, they use it a lot to create websites with back end and facial recognition programs.

#

Okay, thanks for helping me.

sturdy musk
#

hoe to hak naza

hazy flax
idle cave
#

Does anyone here use pytorch or tensorflow? If so why do you use each one respectively and which would you prefer to use for a personal major web project that incorporates Django? What are the learning curves?

stray ingot
#

Hello! I'm currently working on a project where I need to convert the adjective form of a country name into its noun. For example, convert Italian to Italy, Spanish to Spain, etc...

#

Does anyone know how to achieve this? It's hard to search for this on google, because the default search returns currency converters lol

misty flint
#

ngl i looked at this for forever until i realized its supposed to be read bottom-up

velvet thorn
#

if you can guarantee

#

no misspellings

#

and a fixed + known set of source words

#

you can just use regex + replacement

#

otherwise things get more complex
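For the fixed-vocabulary case, a minimal sketch of the regex + replacement idea. The four-entry `ADJ_TO_COUNTRY` dict is a hypothetical stand-in for the real list:

```python
import re

# Assumption: a fixed, known set of adjectives with no misspellings
ADJ_TO_COUNTRY = {
    "Italian": "Italy",
    "Spanish": "Spain",
    "French": "France",
    "German": "Germany",
}

# One alternation of all known adjectives; \b restricts matches to whole words,
# and re.escape guards against regex metacharacters in the keys.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, ADJ_TO_COUNTRY)) + r")\b")


def adjectives_to_countries(text):
    """Replace each known adjective in `text` with its country name."""
    return PATTERN.sub(lambda m: ADJ_TO_COUNTRY[m.group(1)], text)


print(adjectives_to_countries("Italian food and Spanish wine"))
```

Because of the word boundaries, a longer word such as "Germans" is left alone rather than being turned into "Germanys".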

misty flint
velvet thorn
#

depending on structure

#

some combination of np.concatenate and np.stack
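Both concatenate/stack and a loop-free transpose + reshape will do it. A minimal sketch, assuming `tiles[i, j]` is the tile at grid row `i`, column `j` (random data stands in for the real array):

```python
import numpy as np

# Hypothetical data: a 10x10 grid of tiles, each 5x5 pixels with 3 channels
tiles = np.random.rand(10, 10, 5, 5, 3)
n_rows, n_cols, tile_h, tile_w, channels = tiles.shape

# Axes are (tile_row, tile_col, pixel_row, pixel_col, channel).
# Move pixel_row next to tile_row (and pixel_col next to tile_col),
# then merge each pair of axes with reshape.
image = tiles.transpose(0, 2, 1, 3, 4).reshape(n_rows * tile_h, n_cols * tile_w, channels)

# Cross-check with the concatenate approach:
# join tiles left-to-right within each grid row, then stack the rows top-to-bottom
image_concat = np.concatenate(
    [np.concatenate(list(row), axis=1) for row in tiles], axis=0
)
```

The transpose is what makes the reshape correct: reshaping the original axis order directly would interleave pixels from different tiles.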

solid aurora
#

Ooh perfect, thanks!

fair shoal
#

Is it ok if I post a referral link for $15 off of a dataquest subscription? I get a free year if someone signs up with it, but I don't want to break any rules regarding spam or solicitation. Someone here might be able to make use of it.

fair shoal
#

Seems like it shouldn't be an issue. I've found them useful for brushing up on R and Python for data manipulation. Full transparency: if four people sign up through the link I get a free year. I hope someone is able to make use of it. Link: app.dataquest.io/referral-signup/p2d1jh5t/

lapis sequoia
earnest widget
#

Not really a Python related question, more of a data pre-processing question. I have a dataset which is about 600,000+ rows, would it make sense to remove rows to make it easier to work with or would that give me bias or incorrect results?

hasty grail
#

Depends on how you select the rows to be removed

#

If you do it in a completely random manner it should be reasonably fine.

earnest widget
#

Yeah I'm not deleting selected rows or such, it's all random.

hasty grail
#

However you should note that the more data you have, the easier it is for your model to generalize

#

Still, using less data may be sensible if you just want a proof-of-concept

rugged comet
dusty anchor
#

Hey guys, i get this error when executing my script: Shapes (None, 256, 512, 3) and (None, 256, 512, 4) are incompatible.

#

how can i check what is giving it?

hasty grail
dusty anchor
hasty grail
#

Can you paste the error log?

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hasty grail
#

Try compiling your model with run_eagerly=True

#

Should make it easier to debug

dusty anchor
#

ok, where should i put this parameter?

hasty grail
#

in model.compile

#

Did it work?

dusty anchor
#

it takes a while cuz ive a big dataset

hasty grail
#

If you need to debug your model you can just take the first few elements of the dataset to save time

#

Are you using the correct loss function? Seems that the error is occurring there

dusty anchor
#

im honestly going blind right now, i tried both sparse_categorical_crossentropy and categorical_crossentropy

#

i keep receiving a different error every time

hasty grail
#

What is the output shape of your model?

#

Also the same for your ground truth

dusty anchor
#

as input i use RGB pictures of size 256x512

hasty grail
#

Can you display the model summary?

dusty anchor
#

yes

#

this is my first try at image segmentation and im a bit lost tbh

hasty grail
#

How about the ground truth?

dusty anchor
#

u mean my metric?

#

i use categorical accuracy

hasty grail
#

Ground truth = The correct "answer" you're supposed to predict

dusty anchor
# hasty grail Ground truth = The correct "answer" you're supposed to predict

as u can tell im a beginner, so i dont know exactly how to check the ground truth. i have two numpy arrays containing my images and my masks, i create a dataset from them with the from_tensor_slices function and feed it to my model, just to check whether my dataset was correct, but probably my model is not correct for the dataset im using

hasty grail
#

print the dataset directly, it should tell you the shapes of its components

dusty anchor
#

<PrefetchDataset shapes: ((None, 256, 512, 3), (None, 256, 512, 3)), types: (tf.float32, tf.float32)>

hasty grail
#

ok so your ground truth is the second element

#

with shape (None, 256, 512, 3)

dusty anchor
#

oh ok, they should be my masks

hasty grail
#

that doesn't fit your model output

#

your model is outputting 4 masks

#

but the ground truth only has 3

#

as indicated by the last element of its shape

dusty anchor
#

conv2d_23 (Conv2D) (None, 256, 512, 4) this part of the summary?

hasty grail
#

yes

dusty anchor
hasty grail
#

Can you describe the image segmentation task you are trying to achieve?

#

Taking a step back ^

dusty anchor
#

i have the Cityscapes dataset and i need to build and train a model that can fit into a microcontroller (1.5 MB). what i want to achieve is a model that can generate a mask classifying the objects in the pictures, with an accuracy of 80%+

hasty grail
#

How are you generating the dataset?

dusty anchor
#

i have a function that gets all the paths of the pictures and the masks and gives me back 2 sorted lists containing them, then i generate 2 numpy arrays from the paths, one with the images and one with the masks

hasty grail
#

and I suppose there are only 3 classes that have to be identified?

#

are they mutually exclusive?

dusty anchor
#

there are 30 classes in the dataset

hasty grail
#

then why is the shape (..., 3)?

dusty anchor
#

road , sky , person car, etc...

hasty grail
#

if there are 30 classes it should be (..., 30)

dusty anchor
#

i probably did some mistakes then

#

i think the 3 is about the rgb of the picture

hasty grail
#

yeah

#

also your model should be outputting 30 channels, 1 per class

#

not 4

dusty anchor
#

i see, seems logical, so now i need to sort some things out. first i need to import the classes correctly i guess, so that i have all 30 classes

#

i have json files containing the objects and the mask polygons

hasty grail
#

yeah

dusty anchor
#

i guess i need to parse the json for every single mask right?

hasty grail
#

Not sure how it's formatted, it's on you to parse it

dusty anchor
#

i just dont understand one thing, how can the model understand which pixel corresponds to a class during training?

hasty grail
#

It doesn't need to

#

You just have to ensure that the classes in your dataset are consistent, so that channel N always corresponds to class N

#

Then, whenever the model predicts a high value for channel N, the same is said for class N.

#

The model only needs to learn to predict a value for each channel (like in other tasks), which is implicitly the class (in this case)

hasty grail
#

you'll have to convert the labels into indices for the masks (channels)

dusty anchor
#

u mean a list containing all the classes?

hasty grail
#

e.g. car = 0, person = 1

#

I think a dictionary would be more appropriate

dusty anchor
hasty grail
#

you can look up the dict to get the index from the label name then
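A small sketch of the label-to-index idea, assuming a hypothetical four-class list (the real names and their order come from the dataset's JSON annotations). In Keras, `categorical_crossentropy` expects one-hot targets whose last dimension equals the number of output channels, while `sparse_categorical_crossentropy` takes the integer index mask instead:

```python
import numpy as np

# Hypothetical class list; the real one comes from the dataset's annotations
class_names = ["road", "sky", "person", "car"]
class_to_index = {name: i for i, name in enumerate(class_names)}

# Hypothetical 2x3 mask where each pixel holds a class index
index_mask = np.array([
    [class_to_index["road"], class_to_index["road"], class_to_index["sky"]],
    [class_to_index["car"], class_to_index["person"], class_to_index["person"]],
])

# One channel per class: channel N is 1.0 where the pixel belongs to class N.
# Indexing an identity matrix with the mask gives shape (H, W, num_classes),
# matching a model whose last layer outputs num_classes channels.
one_hot = np.eye(len(class_names), dtype=np.float32)[index_mask]
```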

dusty anchor
#

so now i just need to find a way to assign labels to the masks i guess

hasty grail
#

^

dusty anchor
# hasty grail ^

the dataset gives me 3 masks for each image, which one should i use? can i post them here?

hasty grail
#

only 3 masks? which ones?

#

I have to go soon so maybe someone else can help

dusty anchor
dusty anchor
#

3 masks for every image

azure leaf
#

anyone got a good guide on picking a good alpha value?

#
```py
            # (the start of the Pipeline([...]) definition was cut off in the paste)
            ('clf', OneVsRestClassifier(MultinomialNB(alpha=0.1, fit_prior=True, class_prior=None))),
            ])
```
fallen yoke
#

What's the difference between fit and transform? I still don't get it :/
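In scikit-learn's convention, `fit` learns parameters from data, `transform` applies parameters that were already learned, and `fit_transform` does both on the same data. A hand-rolled scaler makes the split visible (an illustration only, not scikit-learn's code; the attribute names `mean_` and `std_` are chosen here):

```python
import numpy as np


class MinimalScaler:
    """Toy standardizer illustrating the fit/transform split."""

    def fit(self, X):
        # fit: learn the parameters (per-column mean and std) from X
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        # transform: apply the stored parameters; nothing is re-learned here
        return (X - self.mean_) / self.std_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_test = np.array([[2.0, 50.0]])

scaler = MinimalScaler().fit(X_train)   # statistics come from the training data only
X_test_scaled = scaler.transform(X_test)
```

Fitting on the training set and reusing those statistics on the test set is the usual reason the two steps are kept separate.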

quiet elk
#

Hi all, i'm not sure if anyone will see this, but while having a relaxing walk I had an idea to create a machine learning algorithm that detects sequences in data, e.g. arithmetic and geometric sequences, using an example dataset. Now that i'm home i've tried to find an example dataset but I can't find one, and I wondered if there was any way you guys could help me find/create one to use? My template was going to be the pattern formula in the first column, then the pattern from 0n to 10n. Thank you sooo much to anyone that can help me!
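One option is to generate such a dataset rather than search for one. A sketch, assuming small integer starting values, differences and ratios; the `type` column is an added label that a classifier could learn to predict:

```python
import csv
import io

N_TERMS = 11  # terms for n = 0, 1, ..., 10


def arithmetic(a, d):
    """a + d*n for n = 0..10"""
    return [a + d * n for n in range(N_TERMS)]


def geometric(a, r):
    """a * r**n for n = 0..10"""
    return [a * r ** n for n in range(N_TERMS)]


rows = []
for a in range(1, 4):
    for d in range(1, 4):
        rows.append([f"{a} + {d}n", "arithmetic"] + arithmetic(a, d))
    for r in range(2, 4):
        rows.append([f"{a} * {r}^n", "geometric"] + geometric(a, r))

# Written to an in-memory buffer here;
# use open("sequences.csv", "w", newline="") to write a real file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["formula", "type"] + [f"n{n}" for n in range(N_TERMS)])
writer.writerows(rows)
```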

woeful leaf
#

does anyone know how to import numpy from the cloned numpy github repo?
I have cloned the numpy github repo to my local machine.
then in the terminal i did import numpy as np
but I'm not able to access numpy functions, e.g. np.sin()

#

getting error AttributeError: module 'numpy' has no attribute 'sin'

woeful hamlet
#

what can i do when my dataset doesnt have the same amount of images per class?
the confusion matrix is a shiit xd every class (almost) filled with 0's

austere swift
#

try changing the class weights

#

so having some classes that have a lower amount of images be weighted more

#

thats assuming you don't wanna do augmentation or anything which would be a better idea

woeful hamlet
#

i did augmentation

austere swift
#

damn python bot has no scikit learn docs

#

that's how you would do the balancing class weights
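The "balanced" weighting that scikit-learn documents for `class_weight` is `n_samples / (n_classes * np.bincount(y))`. Computing it directly with NumPy, on hypothetical labels, gives a dict in the `{class_index: weight}` form that Keras' `model.fit(class_weight=...)` accepts:

```python
import numpy as np

# Hypothetical imbalanced labels: 80 / 15 / 5 images per class
y = np.array([0] * 80 + [1] * 15 + [2] * 5)

counts = np.bincount(y)                      # images per class
weights = len(y) / (len(counts) * counts)    # rarer classes get larger weights
class_weight = {i: float(w) for i, w in enumerate(weights)}
```

With these weights every class contributes the same total weight (count × weight) to the loss.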

woeful hamlet
#

```
                  precision    recall  f1-score   support

       001_Class       0.00      0.00      0.00        18
       002_Class       0.00      0.00      0.00        17
       003_Class       0.00      0.00      0.00        23
       004_Class       0.00      0.00      0.00        18
       005_Class       0.00      0.00      0.00        16
       006_Class       0.00      0.00      0.00        22
       007_Class       0.00      0.00      0.00        20
```
#

it looks like this almost all the matrix

azure leaf
#

anyone here familiar with MultinomialNB

#

and how to pick good alpha values

abstract zealot
#

what are you simulating @azure leaf

#

you should really just benchmark your test set and plot it for different values of alpha; this is usually a good way to determine optimal hyperparameters, unless you have an underlying idea of how much shift or smoothing you would like
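In scikit-learn's `MultinomialNB`, `alpha` is the additive (Laplace/Lidstone) smoothing parameter. A hand-rolled sketch of what it does, plus the suggested sweep, on hypothetical toy data. It scores each alpha on a validation split so that the final test set stays untouched; with scikit-learn itself, `GridSearchCV` over `alpha` does the same job:

```python
import numpy as np


def fit_multinomial_nb(X, y, alpha):
    """Multinomial naive Bayes with additive smoothing `alpha` (toy version)."""
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    # alpha is added to every per-class feature count before normalising,
    # so a feature never seen with a class still gets a small non-zero probability
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood


def predict_multinomial_nb(model, X):
    classes, log_prior, log_likelihood = model
    return classes[np.argmax(X @ log_likelihood.T + log_prior, axis=1)]


# Hypothetical toy data: two classes with different word-count profiles
rng = np.random.default_rng(0)
X = np.vstack([
    rng.multinomial(20, [0.5, 0.3, 0.1, 0.1], size=100),
    rng.multinomial(20, [0.1, 0.1, 0.3, 0.5], size=100),
])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(len(y))
train, val = order[:150], order[150:]

# Sweep alpha on the held-out split and keep the best-scoring value
scores = {}
for alpha in [0.01, 0.1, 0.5, 1.0, 2.0]:
    model = fit_multinomial_nb(X[train], y[train], alpha)
    scores[alpha] = float(np.mean(predict_multinomial_nb(model, X[val]) == y[val]))
best_alpha = max(scores, key=scores.get)
```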

austere swift
#

How would i find all rows that have a certain value within one of their columns in a pandas dataframe? for example if i had a dataframe that has one column that contains a class, but the column can contain more than one class in it, how would i get all the rows that have a certain class in that column

#

the classes within the column are structured like "class1|class2|class3" etc

#

and they can have different amounts of classes as well, or just one single class

lapis sequoia
#

pandas.DataFrame.eval?

austere swift
#

no i figured it out, it was pandas.Series.str.contains

#

I knew that was a thing, i just didnt know it supported checking for multiple items

#

but thank you anyways
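One caveat with `str.contains`: its pattern is a regex by default, so `"class1"` also matches `"class10"`, and `|` means "or". A small sketch on hypothetical data that contrasts this with exact membership:

```python
import pandas as pd

df = pd.DataFrame({"classes": ["class1|class2", "class10", "class2|class3", "class1"]})

# Regex substring match: "class1" is also found inside "class10"
naive = df[df["classes"].str.contains("class1")]

# "|" is regex alternation here, so this matches rows containing either name
either = df[df["classes"].str.contains("class1|class3")]

# Exact membership: split on the literal "|" (a one-character pattern is not
# treated as a regex by str.split), then check whether the class is one of the parts
exact = df[df["classes"].str.split("|").apply(lambda parts: "class1" in parts)]
```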

lapis sequoia
#

can someone help me convert unix to datetime?

#

df1['datetime'] = df1['unix'].apply(lambda x: datetime.utcfromtimestamp(x).strftime('%Y-%m-%d %H:00'))

#

this is my try. but it returns just OSError: [Errno 22] Invalid argument

#

the unix column looks like this in the dataframe:

```
0        1.612310e+12
1        1.612307e+12
2        1.612303e+12
3        1.612300e+12
4        1.612296e+12
             ...     
33016    1.502957e+09
33017    1.502953e+09
33018    1.502950e+09
33019    1.502946e+09
33020    1.502942e+09
Name: unix, Length: 33021, dtype: float64
```
lapis sequoia
#

nevermind, figured it out with some help

ashen nacelle
#

Hi guys

lapis sequoia
#

it is in milliseconds and not normal seconds. just had to divide x by 1000
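Judging by the printout above, the column may mix units: the newest rows are around 1.6e12 (milliseconds) while the oldest are around 1.5e9 (seconds), so dividing every value by 1000 would push the old rows back to January 1970. A vectorised sketch that handles both, assuming anything above 1e11 is milliseconds (1e11 seconds would be thousands of years in the future):

```python
import pandas as pd

# A few values shaped like the column above: ms for recent rows, seconds for old rows
df1 = pd.DataFrame({"unix": [1.612310e12, 1.612307e12, 1.502957e09, 1.502953e09]})

# Assumption: values above 1e11 are milliseconds; everything else is seconds
is_ms = df1["unix"] > 1e11
seconds = df1["unix"].where(~is_ms, df1["unix"] / 1000)

# Vectorised conversion instead of .apply(datetime.utcfromtimestamp ...),
# keeping the original "%Y-%m-%d %H:00" output format
df1["datetime"] = pd.to_datetime(seconds, unit="s").dt.strftime("%Y-%m-%d %H:00")
```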

ashen nacelle
#

I was trying to launch anaconda navigator using cli on Linux

#

But for some reason it is not working

#

Anyone knows how to solve this issue?

austere swift
#

lol i messed up on the accuracy reporting

woeful hamlet
#

what is the seed argument for on flow_from_directory?

#

I have an ImageDataGenerator object

#

like this

#
```py
# (the opening `... = ImageDataGenerator(` line was cut off in the paste)
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.975)
```
#

validation split is so high for testing purposes

#

and then i do this

#
```py
# (the opening `train_generator = ....flow_from_directory(` line was cut off in the paste)
    directory=data_dir, target_size=dimensions[:2],
    seed=seed, subset='training')
```
#

but when i print train_generator.filenames

#

It always prints the same ones

#

no matter the seed

remote summit
#

Hygy

old veldt
#

hi! What would be the best approach to handling missing data in cryptocurrency time series? I want to predict Ethereum prices, but most other currencies didn't exist yet when Ethereum was created, so I have a lot of missing data at the bottom of my table. What is the proper way to handle this? Nice day to all of you! 🌟

sterile saddle
#

Can anybody help out with opencv? 🙂 Thank you

dusty anchor
#

hey guys, which is the fastest model i can use for image segmentation? i only need modest accuracy (80%)

lapis sequoia
#

i have a list of 106 tokens
when i analyzed them i got this analysis , how can i predict the next token?

#

what i mean is to generate a few tokens which follows the similar pattern

#

ping me up if anyone decides to help

warm bane
#

Dense(6,activation="sigmoid",kernel_initializer='glorot_uniform')(x)

what does this code mean? so far i've only seen sigmoid used for binary classification
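A sigmoid on each of the 6 units gives 6 independent probabilities, which is the usual setup for multi-label classification (each label present or absent on its own, typically trained with binary cross-entropy); softmax is the one-label-per-sample choice. A NumPy sketch with made-up logits:

```python
import numpy as np

# Hypothetical raw outputs (logits) of a 6-unit final layer for one sample
logits = np.array([2.0, -1.0, 0.5, 3.0, -2.0, -0.5])

# Sigmoid squashes each unit independently: every value is in (0, 1),
# but they need not sum to 1, so several labels can be "on" at once
sigmoid = 1.0 / (1.0 + np.exp(-logits))
predicted_labels = np.flatnonzero(sigmoid > 0.5)

# Softmax makes the 6 outputs compete and sum to 1 (exactly one label per sample)
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()
```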

azure leaf
#

man ML is so hard

tall basin
#

https://twitter.com/Br3Sc/status/1357328840431910913?s=20
Sup guys, if y'all don't mind, pls take a look at my tweet about learning data science for free

Free resources to learn #MachineLearning and #DataScience for free 🆓:

I've been gathering resources from these wonderful people:
@svpino @PrasoonPratham

#100DaysOfCode #Python #codingtips #CodeNewbie #pythonprogramming #Python3 #Algorithm #data #pythonlearning #code #Tips #ML

olive tinsel
#

hello data science people. I have a csv file and one of the column names is "Activities", under this column are a bunch of records where this value is "Sitting", which I dont want

#

how do I drop all records where the Activity value is "Sitting"? please 🙂

olive tinsel
#

if anyone could help, pls pm me 🙂

stray owl
#

df_filtered = df[df['Activities'] != 'Sitting']

olive tinsel
# stray owl df_filtered = df[df['Activities'] != 'Sitting']

How would you make that work for multiple fields? right now I have this, and it doesnt work:
`NotUsedActivites = unseen_df[unseen_df['Activity'] == 'Vibration' | (unseen_df['Activity'] == 'Drop_n_Pickup')].index

unseen_df.drop(NotUsedActivites, inplace = True)`

stray owl
#

df_filtered = df[df['Activity'].isin(['Vibration', 'Drop_n_Pickup'])]

#

I think this is what you're looking for

#

@olive tinsel

olive tinsel
#

that worked 😄 thank you!!!!

#

MU!!!! to you too!
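For reference, the earlier attempt likely fails because `|` binds more tightly than `==`, so each comparison needs its own parentheses. Note also that `isin(...)` as written keeps the listed activities; prefixing it with `~` drops them instead. A sketch on hypothetical data:

```python
import pandas as pd

unseen_df = pd.DataFrame(
    {"Activity": ["Sitting", "Vibration", "Walking", "Drop_n_Pickup", "Walking"]}
)

# Parenthesise each comparison before combining them with |
mask = (unseen_df["Activity"] == "Vibration") | (unseen_df["Activity"] == "Drop_n_Pickup")

# Equivalent and shorter with isin; ~ inverts it to keep everything *except* these
not_used = ["Vibration", "Drop_n_Pickup"]
kept = unseen_df[~unseen_df["Activity"].isin(not_used)]
```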

lapis sequoia
#

hi

#

umhm

#

sorry but mind if i ask something?

olive tinsel
#

sure 🙂

lapis sequoia
#

ummm...i have these 318 samples of a token when i analyzed them i got this as output

#

so can i generate a few similar patterns of token using this data?

#

like predicting what the next token would be

#

@olive tinsel

olive tinsel
#

oh my... I am nowhere near that good at python

#

im sorry, I thought it would be simple 😢

lapis sequoia
#

;-;

#

datascience is not my subject as well

#

thats why asked

olive tinsel
#

same here 😦

#

I only started python 1 week ago

lapis sequoia
#

ping me up anyone if u would like to help, thanks

analog kiln
#

anyone here have experience with optimizing a streamlit app so that a weak server can run it? i'm having some issues and can't seem to get it to just... not crash hah

misty flint
#

gl

analog schooner
abstract zealot
#

2 and 6 maybe?

#

what do you think?

#

@analog schooner

analog schooner
#

2 is obvious choice for me

#

but not sure about another point

abstract zealot
#

hmmm

analog schooner
#

leaning to 1. option

abstract zealot
#

lots of rows?

analog schooner
#

yeah, few columns reduce accuracy quickly

abstract zealot
#

thats true

#

and last 4 are kind of vague

analog schooner
#

answer to all questions is: "well, it depends"

abstract zealot
#

EXACTLY

#

incorrectly labelled data but if you have low dimensionality data then this probably is an easy fix

#

yikes

#

rough one

analog schooner
#

with labels it also depends whether you realize it during your analysis

abstract zealot
#

trueeeeee, do they give you a 2000 word essay as an answer xd

analog schooner
#

it is a question in a job application form

#

input your date, upload resume, answer this question 😄

abstract zealot
#

maybe answer with the ones that are most appropriate for the job youre applying for and the data youre working with?

analog schooner
#

junior data scientist for a DS/AI consulting company

#

just submitted my application, picked the first and second options and wrote in the comment below that "it depends".

#

thanks for your help!

misty flint
#

good luck dude!

twin moth
#

Heya, any chance to get some help with Selenium?

#

I am trying to filter HTML elements by using multiple filters, not sure how to accomplish that though

abstract zealot
#

not sure bro

twin moth
# velvet thorn find by CSS selector?

Thanks for the hasty reply, was already able to fetch it using xpath -

```py
for img in driver.find_elements_by_xpath("//img[contains(@data-automation, 'mosaic-grid-cell-image')]"):
    ...  # (loop body not included in the message)
```
#

But now I have another issue

#

The site I'm trying to fetch the data from loads the images when the images are close to being on screen

#

I tried scrolling slowly to the bottom of the screen and then fetch the links but seems like it doesn't work because the browser is in the background.
Got any idea how to get it to work with a headless setup?

#

Would love some pointers regarding it, tried moving the focus of the driver between the windows, no change whatsoever

velvet thorn
#

haven't worked with this in a long time, honestly

#

don't really remember, sorry

#

but

twin moth
#

Well, I really appreciate you trying though 🙂

velvet thorn
#

I am a bit sceptical that focus would matter

#

do you know this to be the case?

#

or is it due to the scrolling behaviour itself

twin moth
#

I tried it a couple of times: once I ran the script with Firefox open in the background, and the other time I ran it while looking at the browser. The two had completely different outcomes

boreal summit
#

Hello everyone, is there like a website or directory to know what version of Tensorflow would work with your laptop?

velvet thorn
boreal summit
#

I've been practicing on Google Colab but would like to run some stuff on my PC (HP EliteBook 8440p).

#

The latest versions of TF are giving me issues, so I'm thinking of installing an older version.

#

Like 1.X versions. Thanks.

velvet thorn
boreal summit
#

I'm getting different issues which I can't resolve.

boreal summit
#

It's saying some stuff about DLL import error and stuff.

velvet thorn
#

your dependencies

#

might not be set up properly

#

hard to say

boreal summit
#

So I'm thinking maybe 1.X versions would work with my laptop since it's old.

velvet thorn
boreal summit
#

Okay, I'll see what I can do.

#

Thanks.

twin moth
#

And I'm using a tiling WM, not sure how I'd be able to do so fast enough

velvet thorn
#

namely, that it's focus that's causing the problem

twin moth
#

I had it in view but not under focus

#

Yet I was able to extract all needed elements

#

Got any idea how to deal with it though? It only raises more questions

twin moth
#

I'm heading to bed, if you guys have any idea how to handle that beast I'd be more than happy to hear
Thanks! 🙂

woeful hamlet
#

Following this tuto

#

How (coding) can i say if a match between 2 images is good or not?

opal sleet
#

Is someone familiar with plotly?

misty flint
#

i decided to apply for a Computational Linguistics/NLP minor bc im a clown

rotund dagger
#

hi guys, i have a question about finding the span from the start to the end of a data frame in days, months and years. this is what i have tried: `df['Date'] = pd.to_datetime(df['Date'])` to change to a datetime object, then `df['Date'].max().day - df['Date'].min().day`. but i now see that this is wrong, because it takes the day component of the last date and subtracts the day component of the first date, which is not actually the number of days. for instance '2010-27-02' - '2000-01-02' will return 26

#

when really what i am trying to find "this data span is from 9 years, 3 months and 28 days".

abstract zealot
#

bruh @rotund dagger can u send sc of pd df

rotund dagger
#

the correct answer is 9 years, 7 months, and 25 days. im just not sure how to get there

#

via pandas that is

abstract zealot
#

gimme one sec

rotund dagger
#

thank you

abstract zealot
#

youre basically trying to find the difference between 1st and last date?

#

or am i wrong

rotund dagger
#

yea

abstract zealot
#

sorry bro back now

#

because you sort the column there are a couple ways you could do it

#

calculating the difference like
```py
start = pd.to_datetime(df['Date'][0], format='%Y-%d-%m')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%d-%m')
difference = end - start
```

#

this will probs return a timedelta in days??? although im not sure

#

i cant remember much about it

#

if it does you can just manipulate it a bit to get the required format @rotund dagger

rotund dagger
#

thank you, i will mess around with that. i dont necessarily have to sort the column, i was just messing with functions. im looking into relativedelta from the dateutil library at the moment
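A sketch of the `relativedelta` route on a hypothetical unsorted column. `min()`/`max()` avoid any sorting or index lookups, and `relativedelta` counts calendar years, months and days, whereas subtracting two timestamps only gives a total number of days (`python-dateutil` is a pandas dependency, so it is normally already installed):

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# Hypothetical Date column, deliberately unsorted
df = pd.DataFrame({"Date": ["2012-03-15", "2007-11-01", "2017-06-25"]})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# min()/max() find the first and last dates without sorting or indexing
start = df["Date"].min().to_pydatetime()
end = df["Date"].max().to_pydatetime()

span = relativedelta(end, start)   # calendar-aware: years, months, days
total = end - start                # plain timedelta: total days only
print(f"{span.years} years, {span.months} months, {span.days} days ({total.days} days total)")
```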

abstract zealot
#

yea so if you print(difference) it should return number of days between those two dates

#

is it right? lol

rotund dagger
#

it kind of does but its a bit skewed.

abstract zealot
#

wym?

rotund dagger
#

i get 10 years 25 days and 5 months.

#

it should be 9 years 7 months and 25 days

abstract zealot
#

are you years-day-month?

#

yikes

#

youre years-month-day

#
```py
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
```
#

try that

rotund dagger
#

ok , sec ill let you know

#

i start.day rather lol in the diffday line, but still slightly wrong

abstract zealot
#

what i wrote should return that, i just tried

#

ahhhhh

#

it might not work because you need to sort values

rotund dagger
#

ohhhhhh i just realized i missed some values of what you had duh

#

sec

abstract zealot
#

try putting df = df.sort_values('Date')

#

so
```py
df = df.sort_values('Date')
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
print(difference)
```

supple minnow
rotund dagger
abstract zealot
#

bruh

#

ok, what might be happening is whenever you sort the date column, pandas sorts by year, and doesnt know how to handle datetime objects, take away the sort_values

#

or convert the entire column to datetime objects and then sort it

#

sorting the column is what messed it up, that code should return the exact number of days between the first and last date in the column

rotund dagger
#

true

#

let me retry without sort

abstract zealot
#

you will need to rerun the block of code where you defined df

#

since you overwrote it when you said df = df.sort_values

rotund dagger
#

so this is what i have and what ive re ran

abstract zealot
#

run the top one where you say df=read_csv

#

then run my code

#

and see if it works

#

lmao

#

im struggling today xd

#

@supple minnow do you have repeats for those categories?

#

if you only do something once it appears like that

rotund dagger
#

@abstract zealot all 3 implementations show 3128

#

except for 1 is not calculating the last day in the mix

abstract zealot
#

omg im so sorry, you will need to put df['Date'] = pd.to_datetime(df['Date'])

#

then df.sort_values('Date')

#

take away all the other code, just make sure youre defining the dataframe, converting the date column to datetime objects, sorting the column, then using my code

rotund dagger
abstract zealot
#

idk man, i literally replicated this on my own system and got 3524

rotund dagger
#

dam, let me try with just hardcoding those 2 dates

abstract zealot
#

thats a good idea actually

#
```py
import pandas as pd

lst = ['2007-11-01', '2016-02-01', '2020-02-10', '2017-06-25']
df = pd.DataFrame(lst, columns=['Date'])
start = pd.to_datetime(df['Date'][0], format='%Y-%m-%d')
end = pd.to_datetime(df['Date'].iloc[-1], format='%Y-%m-%d')
difference = end - start
print(difference)
```
#

this prints 3524 and is just simulating your situation

#

although your indexing is crazy on that df

rotund dagger
#

yea index runs from 0 - 142193

#

but unsorted lol

abstract zealot
#

try the df = df.sort_values('Date') again and print df

#

i dont think it reindexes it

#

or call it org = df.sort_values('Date') so you dont overwrite

rotund dagger
abstract zealot
#

anyway, ill make new code that shouldnt need you to sort anything

#

nah it doesnt

#

i think its an indexing thing

#

are you using publicly available data?

rotund dagger
#

yea ill send you the link

abstract zealot
#

i dont have acc can you send me data in dm

#

its only 4mb

rotund dagger
#

yea np
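The indexing hunch above is a likely culprit: `sort_values` reorders rows but keeps their original index labels, and on an integer index `df['Date'][0]` looks up the row *labelled* 0, not the first row. A sketch with the same four dates as the simulation above:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2016-02-01", "2007-11-01", "2020-02-10", "2017-06-25"]})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
df = df.sort_values("Date")   # rows move, but each keeps its original index label

print(df.index.tolist())      # [1, 0, 3, 2]

by_label = df["Date"].loc[0]       # what df['Date'][0] does here: the row labelled 0
by_position = df["Date"].iloc[0]   # the first row after sorting
difference = df["Date"].iloc[-1] - df["Date"].iloc[0]
```

Calling `df.reset_index(drop=True)` after sorting also makes labels and positions line up again.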

nova kelp
#

how do i rename images like photo0,photo1,photo2.... using csv file that contains the name? Thanks in advance!

astral path
#

if I have a pandas dataframe that has two different index values but multiple elements with each index, how would I split that up into multiple dataframes? Is there a better way to do this?

rotund dagger
#

@astral path i think there is a way to do it let me try

astral path
#

thank you!

rotund dagger
#

to clarify, you want a dataframe with url = abc, and values 1 12 3 4, and a second dataframe with url = def, and val = -190, -4, -5

astral path
#

yes, correct

rotund dagger
#

so it will be something like this

finite harness
#

Ye

rotund dagger
#

df1 = pd.DataFrame([])

#

df2 = pd.DataFrame([])

#

then load the values you need in the dataframe, i have to leave for a sec, when i come back i will try to load it up and show displays

astral path
#

ok, thank you! i appreciate this

rotund dagger
#

np

astral path
#

the bigger issue i have in particular though, is that I have 209 different indices that I want to make into 209 different dataframes

velvet thorn
#

the bigger issue i have in particular though, is that I have 209 different indices that I want to make into 209 different dataframes
@astral path groupby apply

astral path
#

what would the apply do?

nocturne plover
#

Can anyone suggest the best model for multiclass classification? I am thinking of Naive Bayes (Gaussian), but is there a better model?

abstract zealot
#

@nocturne plover you can model your data by lots of distributions

astral path
#

wouldnt it depend on the question youre trying to answer

abstract zealot
#

@astral path did you solve your problem?>

astral path
#

im close

#

colab is just being really slow as of right now so i dont know yet

abstract zealot
#

lemme know if you need help man

astral path
#

will do and thank you!

lapis sequoia
#

I did a df.groupby(["a", "b"]).x.mean() and I ended up with a,b as multiindex with my x-mean column, I'm trying to plot a separate plot for each a with b being on the x axis and x being on the y-axis

astral path
#

for more context just to let you guys know

lapis sequoia
#

not sure how to do that

astral path
#

im looping over every game from this NBA season and am trying to split each game into a new dataframe

#

im solving it a different way though so far:

```py
vals = pd.DataFrame([])

df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()

display(df.loc[df.URL==urls[0]])

for thisURL in urls:
  gamestats = df.loc[df.URL==thisURL]
  display(gamestats['ShotDist'])
```
abstract zealot
#

are you getting this from dictionaries?

astral path
#

no, from a csv

#

the data looks like this

abstract zealot
#

can you post screenshot im not logged into google sadge

astral path
#

yea sure

#

lol didnt mean to screenshot both screens

abstract zealot
#

np

#

so youre trying to create a new df for each unique URL?

astral path
#

yeah thats what im doing

#

however instead, i'm looping over each URL and using df.loc to get the rows with that URL instead

abstract zealot
#

emm

#

you could try something like
```py
for a, b in df.groupby(by='URL'):
    print(a)  # the group key (one URL)
    print(b)  # the rows for that URL, as a DataFrame
    break
```

#

this would just give you an example thats why theres a break but basically it takes each element from url and makes a df out of it

#

🙂 @astral path

astral path
#

right now i'm doing this

#
```py
vals = pd.DataFrame([])

df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()

display(df.loc[df.URL==urls[0]])

for thisURL in urls:
  gamestats = df.loc[df.URL==thisURL]
  shotDists = gamestats['ShotDist']
  shotOutcomes = gamestats['ShotOutcome']
  for i, outcome in enumerate(shotOutcomes):  # enumerate yields (position, value) pairs
    if outcome == 'miss':
      ...  # rest of the loop was not posted
```
#

(incomplete)

#

what im trying to do specifically is get the ShotDists for both teams in a specific game, do an analysis with each teams separate series of ShotDists which returns a float, and then send that float to a separate dataframe for storage
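The `groupby` suggestions above can be sketched as follows on hypothetical data. The `Team` column is an assumption standing in for however home and away shots are told apart in the real CSV, and `mean()` stands in for the per-team analysis that returns a float:

```python
import pandas as pd

# Hypothetical shot data: one row per shot
df = pd.DataFrame({
    "URL": ["abc", "abc", "abc", "def", "def"],
    "Team": ["home", "away", "home", "home", "away"],
    "ShotDist": [1, 12, 3, -190, -4],
})

# One DataFrame per game, keyed by URL, without looping over .unique() by hand
games = {url: group for url, group in df.groupby("URL")}

# One float per (game, team), collected into a new DataFrame:
# rows are URLs, columns are teams
summary = df.groupby(["URL", "Team"])["ShotDist"].mean().unstack("Team")
```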

abstract zealot
#

ahhhhhh okay i get you

#

looks pretty good, keep it up!

astral path
#

thank you!

#

i'll update if i need help or if i finish it

warm bane
#

anyone know how to convert keras model into pytorch model ?

rotund dagger
#

oh good they got you @astral path

astral path
#

yeah but

#

i have an error

#
```py
vals = pd.DataFrame([])

df.set_index(keys=['URL'], drop=False,inplace=True)
urls = df['URL'].unique().tolist()

display(df.loc[df.URL==urls[0]])

for thisURL in urls:
  gamestats = df.loc[df.URL==thisURL] 
  homestats = gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()]
  awaystats = gamestats['AwayPlay'].dropna()
  #homedists = homestats['ShotDist']
  #awaydists = awaystats['ShotDist']
  #homeoutcomes = homestats['ShotOutcome']
  #awayoutcomes = awaystats['ShotOutcome']
  display(homestats)
```
#

ValueError: Can only compare identically-labeled Series objects

#

i'm trying to make a new dataframe (homestats) which uses .loc on gamestats with the parameter of all gamestats elements in the column 'HomePlay' which are not NaN

#

@abstract zealot @rotund dagger any ideas?

misty flint
#

i have no ideas but you should follow ken jee on YT

#

hes a big sports data science guy

astral path
#

i'll check him out!

astral path
#

thank you! will def be checking this out

#

i chose nba analytics for my focus in my DS class this semester

rotund dagger
#

im looking it over now so far nothing jumps out

#

in this line gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()].... is the gamestats.HomePlay the same as saying gamestats['HomePlay']?

#

HomePlay is a column, but the environment might be confusing it with a method call

#

so could you do gamestats.loc[gamestats['HomePlay']==gamestats['HomePlay'].dropna()]

#

im fairly new to this so i could be entirely off base

#

@abstract zealot @astral path

astral path
#

hmm i dont know

#

i mean to get gameStats I did gamestats = df.loc[df.URL==thisURL]

#

i'll try it

nocturne plover
warm bane
nocturne plover
#

There's no other way... If you need to use same model in PyTorch cause both have separate backends (incompatible)

#

But there's something on GitHub called PyTorch Lightning which makes it easier to code it in PyTorch. I'm not sure if it works, but maybe.

warm bane
#

Incompatible backend?
Easy to understand, thanks for the info!

austere swift
#

you can easily convert PyTorch to Keras by using ONNX as a middleman, but PyTorch doesn't allow loading ONNX models

warm bane
#

so can you use ONNX to convert from Keras to PyTorch?

austere swift
#

PyTorch doesn't allow loading ONNX models

warm bane
#

so it can't be done?

#

okay

astral path
astral path
#

so i think i know why

#

when i do homestats = gamestats.loc[gamestats.HomePlay==gamestats['HomePlay'].dropna()]

#

it's trying to compare gamestats.HomePlay and gamestats['HomePlay'].dropna()

#

which have different lengths (and therefore different index labels), because .dropna() removes the rows where HomePlay is NaN

#

so how would I return a df with all the rows where gamestats['HomePlay'] is not NaN?

#

thanks!
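The diagnosis above matches how pandas behaves: `.dropna()` returns a shorter Series with fewer index labels, and `==` between two Series requires identical labels, hence "Can only compare identically-labeled Series objects". A boolean mask built from the full column avoids the mismatch. A minimal sketch on a made-up stand-in for `gamestats` (the rows are invented; only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Invented stand-in for gamestats; only the column names come from the question.
gamestats = pd.DataFrame({
    "HomePlay": ["Jump shot", np.nan, "Layup", np.nan],
    "AwayPlay": [np.nan, "Free throw", np.nan, "Dunk"],
    "ShotDist": [18.0, 15.0, 2.0, 1.0],
})

# Reproduce the original error: the right-hand side has fewer index labels.
try:
    gamestats.loc[gamestats.HomePlay == gamestats["HomePlay"].dropna()]
except ValueError as err:
    print("ValueError:", err)

# Fix 1: a boolean mask over the full column keeps the labels aligned.
homestats = gamestats.loc[gamestats["HomePlay"].notna()]

# Fix 2: let pandas drop the rows, checking only the chosen column for NaN.
awaystats = gamestats.dropna(subset=["AwayPlay"])

print(homestats)
print(awaystats)
```

Both fixes return whole rows (a DataFrame). By contrast, `gamestats['AwayPlay'].dropna()` returns only that one column as a Series, so a later `awaystats['ShotDist']` on it would raise a KeyError.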

velvet thorn
soft mango
#

How do you color your output?

earnest forge
#

Could someone help me with Keras and TensorFlow syntax? They seem to be conflicting in my code

#

who can I dm about it?

heavy bay
#

How much math do I need to know before learning tensorflow?

dusty anchor
#

hey guys, how can I convert my RGB channels into classes in TensorFlow?

supple minnow
abstract zealot
#

Bruh show code @supple minnow

bleak fox
supple minnow
woeful hamlet
#
```py
from tensorflow.keras.preprocessing.image import ImageDataGenerator

valid_datagen = ImageDataGenerator(
    rescale=1. / 255,
    validation_split=0.2)

valid_generator = valid_datagen.flow_from_directory(
    directory=data_dir, target_size=dimensions[:2],
    seed=seed, subset='validation')
```
#

How can i get X_test and Y_test from this object?

#

I need it to plot confusion matrix

bold olive
#

How exactly do you select features from an image (pixels) to use as input (X, y) to a classifier/neural network?

austere swift
#

normally you'd just take the pixels and put them directly into an array which you can later convert to a tensor

#

you can use something like OpenCV to read the image
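As described above, the usual starting point is to use the raw pixel values as features, so each image becomes one row of X after flattening. A minimal sketch, assuming the images are already loaded as equally sized height x width x 3 uint8 arrays (for example from `cv2.imread`, which returns BGR channel order); random arrays and made-up labels stand in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for loaded images: 10 RGB images of 32x32 pixels, uint8 in [0, 255].
# In practice these might come from cv2.imread (which returns BGR order).
images = rng.integers(0, 256, size=(10, 32, 32, 3)).astype(np.uint8)
y = rng.integers(0, 2, size=10)  # made-up binary labels

# Classic classifiers (e.g. scikit-learn) expect a 2-D matrix:
# one row per image, one column per pixel/channel value, scaled to [0, 1].
X_flat = images.reshape(len(images), -1).astype(np.float32) / 255.0

# CNNs usually keep the spatial layout: (n_images, height, width, channels).
X_cnn = images.astype(np.float32) / 255.0

print(X_flat.shape, X_cnn.shape, y.shape)
```

From there, `tf.convert_to_tensor(X_cnn)` (TensorFlow) or `torch.from_numpy(X_cnn)` (PyTorch) gives the tensor mentioned above.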

woeful hamlet
#

why do I get "only classifiers supported" when I use plot_confusion_matrix?
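For context: `sklearn.metrics.plot_confusion_matrix(estimator, X, y)` only accepts a fitted scikit-learn classifier, so a Keras model is rejected with that message (the function was also deprecated in scikit-learn 1.0 and removed in 1.2). A common workaround is to predict with the model yourself and build the matrix from true and predicted labels. A minimal sketch, assuming the validation generator is created with `shuffle=False` so that `valid_generator.classes` lines up with the output of `model.predict(valid_generator)`; made-up probabilities stand in for the model output here, and `confusion_matrix_from_labels` is a hypothetical helper name:

```python
import numpy as np

def confusion_matrix_from_labels(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes
    (same layout as sklearn.metrics.confusion_matrix)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

# In the real pipeline these would come from Keras (assumption, see above):
#   y_true = valid_generator.classes
#   probs  = model.predict(valid_generator)
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.2, 0.2],
])

y_pred = probs.argmax(axis=1)  # most likely class per image
cm = confusion_matrix_from_labels(y_true, y_pred, n_classes=probs.shape[1])
print(cm)
```

With scikit-learn installed, `confusion_matrix(y_true, y_pred)` returns the same table, and `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` (scikit-learn 1.0+) plots it without needing an estimator.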

astral path
elfin stream
#

Is this also a channel I can ask for help?

#

because I needed some help with matplotlib

abstract zealot
#

go ahead

elfin stream
#

so whenever I use FuncAnimation, anything I return from its animate function seems to draw on top of everything

abstract zealot
#

and you divide your axes with
```py
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 1)  # plt.subplots(<number of rows>, <number of columns>)
```

elfin stream
#

I used fig.add_axes

abstract zealot
#

can you show code?

elfin stream
#

which part specifically?

abstract zealot
#

maybe just the part where youre trying to plot?

elfin stream
#

what I'm trying to do btw is make it not draw over everything

abstract zealot
#

yes i know

elfin stream
#

I still don't know which part. Do you mean my animation code?

abstract zealot
#

you can dm me the code and i can suggest a fix if you dont want to post it here

elfin stream
#

there is a lot of code

#

should I post the whole thing?

abstract zealot
#

100 ?

elfin stream
#

?

abstract zealot
#

100 lines

elfin stream
#

166 in total

abstract zealot
#

you just need to show the part where you start using matplotlib; if I need more I'll letcha know 😄

elfin stream
#

the entire code is just matplotlib really

abstract zealot
#

dm me the code then xd

elfin stream
#

okay

lapis sequoia
#

anyone here work with h5py files

untold raft
#

hi

#

any Russian speakers here?

slow grove
#

Aye, could someone point me in the right direction? How could I use a pattern of data from a set of users to find similar users? At this point I don't even know what to google. I've got a pretty massive number of users, and a decent-sized subset of them that I know fit; I just need a way to find other users like them in the entire group. Sorry if this doesn't make sense

astral path
#

if I have a dataframe with two columns and each value is an integer from 3 to 37, how could I figure out the exact number of times a specific combination of values appears in the dataframe?

#

and combo count contains the number of times a row appears

barren prism
# slow grove Aye could someone point me in the right direction? How could i use a pattern of ...

This sounds like a task for a typical recommender system. There are some simple algorithms for this, and more advanced ones too. Maybe look at this link for some simple starting points: https://www.kdnuggets.com/2019/09/machine-learning-recommender-systems.html

slow grove
#

ty cheers m8
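One simple starting point in that direction (a swapped-in illustration, not taken from the linked article) is a similarity ranking: represent every user as a numeric feature vector, average the known-good users, and rank everyone else by cosine similarity to that average. A minimal sketch, assuming the user data has already been turned into a numeric matrix; `rank_similar_users` is a hypothetical helper and the tiny matrix is made up:

```python
import numpy as np

def rank_similar_users(features, seed_idx, top_k=5):
    """Rank non-seed users by cosine similarity to the mean seed direction."""
    X = np.asarray(features, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # leave all-zero rows as zeros
    X_unit = X / norms                 # unit-length rows

    centroid = X_unit[seed_idx].mean(axis=0)
    # Rows are unit length, so the dot product ranks users exactly as
    # cosine similarity to the centroid direction would.
    scores = X_unit @ centroid
    scores[seed_idx] = -np.inf         # don't return the seed users themselves

    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]

# Made-up data: 5 users x 2 features; user 0 is the known "good" user.
users = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05], [0.1, 0.9]]
idx, sims = rank_similar_users(users, seed_idx=[0], top_k=2)
print(idx, sims)
```

Normalising each row means users are compared by the shape of their activity pattern rather than its overall volume, which is usually what "users like them" means.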

abstract zealot
#

What is your combo of numbers, @astral path?

astral path
#

in the example, 3 and 32

#

is that what you're asking?

abstract zealot
#

try
```py
number = len(df[(df['col1'] == 3) & (df['col2'] == 32)])  # compare to ints, since the values are integers
```

woeful hamlet
#

How can i use plot_confusion_matrix from sklearn.metrics with the images on an ImageDataGenerator from keras?

astral path
#

will do! thank you

astral path
abstract zealot
#

depends what combos youre looking for

astral path
#

so rather than df['col1'] =='3', it would be df['col1'] == col1vals

abstract zealot
#

yea

astral path
#

how would that work?

abstract zealot
#

so you want to do this for every row

#

?

astral path
#

yeah

#

I'm making a scatterplot where the size of each point depends on how often that x and y value combination appears

#

in seaborn

abstract zealot
#

maybe then try a different strategy like
```py
for combo, rows in df.groupby(by=['col1', 'col2']):
    print(f'The combo {combo} appears {len(rows)} times')
```

#

does that work?

astral path
#

lemme check

abstract zealot
#

instead of printing them then, just put them into a dictionary which you can use to form your plot
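The dictionary idea above can also be done in one step: `groupby(...).size()` counts every (col1, col2) combination, and the result can be turned into a dict or into a small frame for the plot. A minimal sketch with made-up values; `col1`/`col2` are the placeholder column names used in this thread:

```python
import pandas as pd

# Made-up data with the placeholder column names from the thread.
df = pd.DataFrame({"col1": [3, 3, 5, 3, 5], "col2": [32, 32, 7, 10, 7]})

# Count of a single combination (values are ints, so compare to ints).
n_3_32 = len(df[(df["col1"] == 3) & (df["col2"] == 32)])

# Counts for every combination at once.
combo_counts = df.groupby(["col1", "col2"]).size()
combo_dict = combo_counts.to_dict()          # {(col1, col2): count}

# One row per combination, ready for a bubble-style scatter plot.
plot_df = combo_counts.reset_index(name="count")
print(n_3_32, combo_dict)
print(plot_df)
```

With seaborn, `sns.scatterplot(data=plot_df, x="col1", y="col2", size="count")` should then size each point by how often its combination appears.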

woeful hamlet
#

How can i use plot_confusion_matrix from sklearn.metrics with the images on an ImageDataGenerator from keras?

abstract zealot
#

not sure man

#

@astral path any success?

astral path
#

yep it worked!

#

now i just have to work it into the vis

abstract zealot
#

nice, good luck man 😄

astral path
#

ty!

woeful hamlet
#

How can i use plot_confusion_matrix from sklearn.metrics with the images on an ImageDataGenerator from keras?

rancid ruin
#

hey

#

can anybody help

#

@client.event
async def on_guild_join(user, guild, ctx):
    await ctx.send(f"{user} joined {guild}!")