#data-science-and-ml


royal thunder
#

oh great thanks

outer geyser
#

they've got calculus 2 as well, about a 7 hour course

royal thunder
#

wondering how many days it will take me to get good at machine learning

#

i had my friends say like 3 months and more

#

thanks for maths man

dawn turtle
#

I suppose this is more of a software best-practices type question, but why doesn't numpy use some sort of abstract representation and only evaluate the expression once it is required to be calculated (without an @ operation, or maybe there are even more efficiency gains you could make with this knowledge)? That would allow numpy to always minimize the complexity. Or maybe there is already a wrapper around array that does this?
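
For reference: NumPy evaluates eagerly, so `A @ B @ v` runs left to right as `(A @ B) @ v`. One existing helper that does choose the evaluation order for a chain of matrix products is `np.linalg.multi_dot`. A minimal sketch with made-up shapes and random data (it only covers matrix chains, not general lazy evaluation):

```python
import numpy as np

# Made-up sizes and random data, purely for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 50))
v = rng.standard_normal((50, 1))

eager = A @ B @ v                         # evaluated as (A @ B) @ v
reordered = A @ (B @ v)                   # same result, fewer multiplications
chained = np.linalg.multi_dot([A, B, v])  # NumPy picks the multiplication order

print(np.allclose(eager, chained), np.allclose(reordered, chained))
```

Because the last factor is a single column, multiplying `B @ v` first avoids the full matrix-matrix product.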

lapis sequoia
#

idk what exactly ur asking but hope someone answers ur q my man

#

yo so idk if anyone here remembers but im working on a gan

#

i been trying to run it

#

i get this error

#
```
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-16-41501876ca3c> in <module>()
     15 
     16 for example_input, example_target in t_in.take(1):
---> 17   generate_images(generator, example_input, example_target)
     18 
     19 EPOCHS = 150

7 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Index out of range using input dim 0; input has only 0 dims [Op:StridedSlice] name: strided_slice/
<Figure size 1080x1080 with 0 Axes>
```
digital zenith
#

hey

#

i have an issue

#

hlo

#

anyone

#

nvr mind

lapis sequoia
#

watup

#

@digital zenith

digital zenith
#

hey

#

uh

#

i wanna make an AI

lapis sequoia
#

yes

digital zenith
#

for that i need a speech recognition module

#

the current one is not compatible with python3.8

#

because pyaudio stopped at 2017

lapis sequoia
#

so use python 3.7?

digital zenith
#

no it is only compatible with 3.6

lapis sequoia
#

u can just use that lol

digital zenith
#

are there any other modules

#

or should i downgrade

lapis sequoia
#

never used it tho

digital zenith
#

yeah

#

this one needs pyaudio

digital zenith
#

so i guess i should downgrade

lapis sequoia
#

this need pyaudio as well?

digital zenith
#

yeah

#

i've seen this page

lapis sequoia
#

ye

#

probably ur best op

digital zenith
#

hmm

#

thank you

lapis sequoia
#

Hey guys, just wondering how I could make a function to check if these cities are present in the 'Kommun_name' column:
("Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro")

I did a test to see that it works by checking if 'Haparanda' exists which it does as you can see

native ridge
#

df.Kommun_name.is_in(YOUR_TUPLE)

lapis sequoia
#

Nice!

native ridge
#

#gladtohelp

lapis sequoia
#

Thanks!

#

#gladtohelp
@native ridge Like this? ```#Tuple with all the cities that already has an Ikea store
ikea_stores = ("Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro")

df_dropped.Kommun_name.is_in(ikea_stores)```

native ridge
#

Sorry, try isin, without the "_".
Hard to remember every single name...

#

This should return a Series of bool, with which you can index the DataFrame.

velvet thorn
#

yup, it's isin

velvet thorn
#

also, try to use a set ({}) with isin

#

{"Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro"}

lapis sequoia
#

How do I make the output show the complete row for each city, like with 'Haparanda'?

velvet thorn
#

filter on it

#

df[df_dropped.Kommun_name.isin(ikea_stores)]

#

exactly as you did in the previous cell

#

if you take the df[] out of it

#

you'll see that it also returns a boolean Series
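
A minimal sketch of the `isin` filter discussed above. The DataFrame contents, the `Population` column and the shortened store set are made up for illustration; only `Kommun_name`, `df_dropped` and `ikea_stores` come from the conversation.

```python
import pandas as pd

# Hypothetical stand-in for df_dropped; the numbers are invented.
df_dropped = pd.DataFrame({
    "Kommun_name": ["Haparanda", "Kiruna", "Malmö", "Lund"],
    "Population": [9_000, 22_000, 350_000, 125_000],
})

# Shortened version of the city collection, written as a set.
ikea_stores = {"Haparanda", "Malmö", "Stockholm"}

mask = df_dropped["Kommun_name"].isin(ikea_stores)  # boolean Series
with_store = df_dropped[mask]                       # only the matching rows

print(mask.tolist())
print(with_store)
```

Negating the mask with `~mask` would give the rows whose city is not in the set.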

lapis sequoia
#

Sweet!

#

Thank you all so much!

#

Got what I needed.

#

Thanks for the knowledge 🙂

lapis sequoia
#

Should I create a conditional function for this or what is the best way?

#

your condition checks if the name is equal to the whole tuple

#

not if the name is in the tuple

#

how do I check if the values in the tuple are present

#

in operator

#

and if that's the only purpose of that tuple you get a bit of performance by making it a set
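
A small sketch of the difference between `==` and `in` for plain Python values. The shortened set and the test name are placeholders.

```python
# Shortened, illustrative version of the city collection.
ikea_stores = {"Haparanda", "Malmö", "Stockholm"}
name = "Haparanda"

equal_to_collection = (name == ikea_stores)  # compares a str with the whole set
member_of_collection = (name in ikea_stores) # membership test

print(equal_to_collection, member_of_collection)
```

Membership tests on a set use hashing, which is where the small speed-up over a tuple comes from.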

royal thunder
#

how to make a dataset through webpages?

eternal fractal
#

how to make a dataset through webpages?
@royal thunder do you mean scraping data from web ?

royal thunder
#

yeah like that

#

here is an example

#

i wanna scrape some data from that website and make some csv file tho

fast plover
#

@paper niche that was the ticket, thank you.

eternal fractal
#

it teaches you to scrape data from web and make a visual representation of the data you scraped

#

if you got your dataframe
you may do this:
dataframe.to_csv("data.csv",sep=',',index=False)
this should get you your csv file
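
A minimal sketch of the last step, assuming the scraped records are already a list of dicts. The rows are made up, and an in-memory buffer stands in for `"data.csv"` so nothing is written to disk.

```python
import io
import pandas as pd

# Made-up records standing in for whatever was scraped.
rows = [
    {"title": "Example A", "price": 10.5},
    {"title": "Example B", "price": 7.0},
]
dataframe = pd.DataFrame(rows)

# Same call as above, but writing to a buffer instead of a file path.
buffer = io.StringIO()
dataframe.to_csv(buffer, sep=",", index=False)
csv_text = buffer.getvalue()

print(csv_text)
```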

royal thunder
#

thanks man

lapis sequoia
#

Hi guys,
I have a stationary time series and I'm trying to work out which model to use for forecasting it. I've run several analysis techniques and concluded that the series is stationary and normally distributed, but I couldn't work out which model would be right for forecasting. Here are pictures of my seasonal decomposition, ACF and PACF:

#

Looking at these graphs, what is the first conclusion that comes to mind? Can we say that a moving average is a good model here?

#

thanks in advance for ur help 🙂

#

(I know for most of you this is pretty basic but not for me so any help or insight to put me in the right path is highly appreciated py_guido )

tidal sonnet
#

@tidal sonnet how did you find the values of a, b, c to be 3, -1/2, and 0?
@heady hatch They were from a previous question, which I got correct, then they said to take the answer and plug them into the [a,b, c], then get the echelon form :(

slender nymph
#

hi good morning

#

a little question: how can i select a title of a column without using the name column

#

i want select 'BTC Returns' as string

#

maybe iloc?

#

df.iloc[:0, 5:6]

#

but can i select it as string

#

how*

heady hatch
#

@slender nymph Hey I'm not too sure what you're asking for.

Are you looking to grab one particular cell as a string? Or did you want them as a list of strings?

#

@tidal sonnet

from what I've found, A in REF ended up to be

A = [
      1, 1, 1,
      0, 1, 2,
      0, 0, 1
    ]

Then I would probably manipulate S the way A was manipulated.

I'm not super sure where a, b, c is supposed to come in.

lapis sequoia
#

can anyone suggest any good machine learning courses?

tidal sonnet
#

Interesting... Can you explain the method you used?

keen prism
eternal fractal
#

try running cmd as admin, had the same problem before when I installed scrapy, got it installed when I ran cmd as admin

#

a little question: how can i select a title of a column without using the name column
@slender nymph col_headers = dataframe.head()
try this

#

i use that for selecting column headers of csv
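
A note on the column-title question: `head()` returns the first rows as a DataFrame, while the column labels themselves live in `df.columns`. A minimal sketch, assuming 'BTC Returns' sits at position 5 as in the `iloc` attempt above; the other column names are invented.

```python
import pandas as pd

# Hypothetical frame; only 'BTC Returns' comes from the question.
df = pd.DataFrame(columns=["Date", "Open", "High", "Low", "Close", "BTC Returns"])

name = df.columns[5]        # label at position 5, as a plain str
all_names = list(df.columns)

print(name)
print(all_names)
```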

heady hatch
#

Ahh that makes a lot of sense that they want the column name. hahaha I thought they wanted the values from the column.

#

@tidal sonnet

Yea so I started with

A = [
      [1, 1, 1],
      [3, 2, 1],
      [2, 1, 2]
    ]

Then from there,
-3 * first row + second
-2 * first + third

then keep going from there to reach REF.

tidal sonnet
#

You can do it multiple times??

#

so that would give you

```
A = [
  [1, 1, 1],
  [0, -1, -2],
  [0, -1, 0]
]
```
#

That's what i got as well, difference being I had multiplied first row by 3 and 2 and subtracted it from the others. But where did you go next?

#

seeing that in row 3, the [a] and [c] are both the same? @heady hatch ?

heady hatch
#

@tidal sonnet and then I

multiplied -1 * second row + third
then divide third by 2.

tidal sonnet
#

[0, 1, 2],
[0, 0, 2]
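
The row operations described above can be replayed in NumPy. This is only a sketch of those specific steps on the `A` from the conversation, not a general elimination routine.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [3, 2, 1],
              [2, 1, 2]], dtype=float)

A[1] += -3 * A[0]   # -3 * first row + second  -> [0, -1, -2]
A[2] += -2 * A[0]   # -2 * first row + third   -> [0, -1,  0]
A[2] += -1 * A[1]   # -1 * second row + third  -> [0,  0,  2]
A[1] *= -1          # flip the sign of row 2   -> [0,  1,  2]
A[2] /= 2           # divide third row by 2    -> [0,  0,  1]

# Adding 0.0 only tidies up a negative zero in the printout.
print(A + 0.0)
```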

#

MY GOSH

#

@heady hatch THAT'S SO COOL

#

i didn't know that you didn't HAVE to use the first row

#

thank youuuuuuuuu

heady hatch
#

hahaha glad to be of help.

keen prism
#

@eternal fractal oh i didn't think about that

tidal sonnet
#

Did I do this correctly? @surreal ingot
If so, how can i get rid of the negatives that I get out for S?

#

Using Back Substitution

#

I am genuinely confused

heady hatch
#

I think you might have gotten the wrong @heady hatch . hahaha

#

On the other hand, the first S should be [15, -17, -7] I think.

#

Because 15 * -2 + 23 = -7.
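
A sketch of the same elimination with `S` carried along as an extra column, followed by back substitution. `A` and `S` are the values from the conversation; the unknowns are simply called `x` here, which is an assumption about how the quiz names them.

```python
import numpy as np

A = np.array([[1, 1, 1],
              [3, 2, 1],
              [2, 1, 2]], dtype=float)
S = np.array([15, 28, 23], dtype=float)

M = np.column_stack([A, S])   # augmented matrix [A | S]

M[1] += -3 * M[0]             # S entry: 28 - 3 * 15 = -17
M[2] += -2 * M[0]             # S entry: 23 - 2 * 15 = -7
first_S = M[:, 3].copy()      # S after the first two operations

M[2] += -1 * M[1]
M[1] *= -1
M[2] /= 2                     # M is now in row echelon form

# Back substitution, starting from the last row.
n = 3
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (M[i, 3] - M[i, i + 1:3] @ x[i + 1:]) / M[i, i]

print(first_S)
print(x)
```

`np.linalg.solve(A, S)` gives the same `x` and is a handy cross-check for hand calculations.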

#

@tidal sonnet

tidal sonnet
#

yea it's -7, i realized that just now

#

But I also can't figure out where the (r) is supposed to come in

heady hatch
#

I can't do math myself.

#

Hmm.

#

What do you mean by r?

tidal sonnet
#

then they tell me to take this answer, and plug it back into question 1

#

So i end up with something like

A =[[1, 1, 1], [3, 2, 1], [2, 1, 2]]
r = [3, -0.5, 0]
S = [15, 28, 23]
heady hatch
#

Hmm. let me think.

#

Yea same.

#

hahaha

#

Can you give me the problem in sequence. Like in screenshots? I might be able to better give some suggestions.

tidal sonnet
#

🤔

heady hatch
#

Because I feel like I'm getting bits and pieces and can't really connect back together.

tidal sonnet
#

i'm getting different numbers since i fixed me having -8 instead of -7

#

so it's actually looking like it's start to make sense... a bit

heady hatch
#

Okay cool cool cool.

tidal sonnet
#

15- 12 = 3 💀 not 2

heady hatch
#

:^)

tidal sonnet
#

Would I have to reflect the same change in the matrix?

#

changing it to
identity?

heady hatch
#

Uh what do you mean?

#

Isn't A the matrix?

tidal sonnet
#

Yea

#

that's what I meant...
like if i'd have to set it to

```
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```
heady hatch
#

Hm

#

That's RREF.

#

If it's only asking for REF, then I don't think so.

tidal sonnet
#

they wanted me to solve it 😁
THANK YOU SOOO MUCH

#

i've been on this practice quiz for 3 days now :(

heady hatch
#

Hey, we've learned something new today!

tidal sonnet
heady hatch
#

hahaha congratulations.

tidal sonnet
#

thank you alot m8

heady hatch
#

Glad to be of help.

tidal sonnet
#

i couldn't figure out at all how to get row 3...
but now i know that you don't have to use the first row

#

something else i'm curious about
If I hadn't multiplied by a negative scalar and added
but instead multiplied by a positive scalar and subtracted, would that have been the same?

heady hatch
#

It is the same.

#

You can think of it as 2 - 3 => 2 + (-3).

#

I prefer the add negative notation just to keep things uniform.

tidal sonnet
#

Ah

#

Thank you 🙇🏿‍♂️

#

I still wonder where r is supposed to come in 🤔

slender nymph
#

hello data scientists. has anyone done an OLS regression without the statsmodels module, using only numpy and pandas?
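
One possible way to do this, sketched with `np.linalg.lstsq`. The data and column names are made up, and the target is noise-free so the recovered coefficients are easy to check; it does not reproduce the summary statistics statsmodels reports.

```python
import numpy as np
import pandas as pd

# Made-up data: y = 1 + 2*x1 - 3*x2 exactly (no noise).
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0],
})
df["y"] = 1.0 + 2.0 * df["x1"] - 3.0 * df["x2"]

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(len(df)), df[["x1", "x2"]].to_numpy()])
y = df["y"].to_numpy()

# Least-squares solution of X @ beta ≈ y.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(beta)   # intercept, coefficient of x1, coefficient of x2
```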

spice marten
#

hello could somebody help me with scraping a web page? Basically I am trying to get this picture off a website but it's labeled as an event, which I think means that some javascript is being executed or something, so Beautiful Soup doesn't read it. Any ideas on what to do?

tidal sonnet
#

But i'm not sure where I went wrong

#

I tried finding the inverse...
They say that the above answer is the right one

old thorn
#

ok

past raptor
#

Trying to do a histogram with array([13., 23., 33., 48., 52., 48., 33.]).
So every element is one column. Instead, I get these elements sorted to their numerical value.
How do I fix this?

old thorn
#

let me think for a second its been a while since I've worked with this

past raptor
#

Alright, thanks sir

old thorn
#

no need to call me sir haha im only a teenager

past raptor
#

hahaha alrighty

old thorn
#

are you using google colab or what?

past raptor
#

jupyter

old thorn
#

hmm i never used jupyter but its similar to colab i think

#

what was your code for this line?

past raptor
#
```
ax1.hist(DS, density=True)
```
#

DS stands for the array

#

density, I tried turning off and on

old thorn
#

ok and what do you want this to be

past raptor
#

It seems like the axis is the problem

#

Oh i want

old thorn
#

yeah what do u want the histogram to represent

past raptor
#

so every element in the array has to represent one column

#

if I have an array of all 10, then all the columns have to be same height of 10

#

Is this clear?

old thorn
#

yes, I think I understand what you're asking

#

sorry I just haven't done these in a while, almost a year

past raptor
#

Oh, i see, well if u think u cant help me, dont worry

#

but if u have any hints at least of how to tweak

old thorn
#

no no I think I can it will take me just a bit to remember some things

past raptor
#

alrighty

old thorn
#

I might not be able to give the solution but I could definitely point you in the right direction

past raptor
#

thats more than enough

old thorn
#

I believe it might be a problem with the axis because your array seems to be only for the x - axis, you might have to make it to where the y -axis has the same array as your x - axis if you want the histogram to have the same height as its location on the x - axis

#

did that make sense?

#

I don't know if that is correct though

past raptor
#

let me digest that

old thorn
#

yeah go ahead, I am not the best at clearly explaining stuff but if you need clarity go ahead and ask

past raptor
#

oh, I think i get it. Because there is no linearity (correlation) between the two variables, the y axis is misrepresented

#

thus, showing that funny 0.00 to 0.07 value

#

on the y-axis

old thorn
#

yup

past raptor
#

oh, let me tweak on that, thanks mate

old thorn
#

I don't know why you got 0.00 - 0.07

past raptor
#

that was given by the program

#

but im trying to look for the y axis parameter

#

but cant find

old thorn
#

hmm well try tweaking around with the axis and if u need anymore help just ping me in this channel

past raptor
#

uhum, thanks mate

past raptor
#

@old thorn Hey, couldnt really find a way through this.. Is there more that you know?
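
A note on this one: `hist` sorts the values into bins and counts them, and `density=True` rescales the counts so the bars integrate to 1, which would explain a y-axis running from 0.00 to about 0.07. If every element should be its own column with height equal to its value, a bar chart is the closer fit. A minimal sketch, assuming `DS` is the array from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

DS = np.array([13., 23., 33., 48., 52., 48., 33.])

fig, ax1 = plt.subplots()
# One bar per element: x is the position in the array, height is the value.
bars = ax1.bar(np.arange(len(DS)), DS)
ax1.set_xlabel("element index")
ax1.set_ylabel("value")

heights = [bar.get_height() for bar in bars]
print(heights)
```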

keen root
#

Hi, I need some advice: I want to try to train a very (VERY) simple network, a simple perceptron, and for that there is an analytical solution, which involves the Moore-Penrose pseudoinverse. However, my input data is a bunch of binary strings like "00010111". Now, calculating that inverse through np.linalg.pinv(X_train) sometimes gives me a convergence error, but if I run it a second time then that error does not appear (no idea why). But if on the other hand I decide to build a keras model like this:

model_y=keras.models.Sequential([keras.layers.Dense(100, activation="relu",input_shape=[8]),
                                keras.layers.Dense(1, activation="linear", name="l3")])
model_y.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss='mse', metrics=[tf.keras.metrics.RootMeanSquaredError()])

I get "no learning" at all. My first guess is that this has something to do with the fact that my input consists of binary data, but does anyone have any ideas what can be done?
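
A minimal sketch of the analytical route, assuming the bit strings are first converted to a float array with one column per bit. The strings and targets are made up. A single linear neuron trained with MSE fits the same linear least-squares model, so this can serve as a baseline to compare a Keras run against.

```python
import numpy as np

# Made-up inputs and targets, purely for illustration.
bit_strings = ["00010111", "01100110", "11110000", "10101010"]
targets = np.array([1.5, -0.3, 2.0, 0.7])

# One row per string, one float column per bit.
X_train = np.array([[int(ch) for ch in s] for s in bit_strings], dtype=float)

# Extra column of ones so the model also has a bias term.
X_b = np.hstack([X_train, np.ones((len(X_train), 1))])

# Least-squares weights via the pseudoinverse.
w = np.linalg.pinv(X_b) @ targets
pred = X_b @ w

print(X_train.shape, w.shape)
print(pred)
```

With more unknowns than samples, as here, the pseudoinverse returns the minimum-norm weights that fit the data.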

heady hatch
#

Hmm How come you chose 100 units for the first layer?

keen root
#

oh, my bad, that was a mistake. It should only have the 1 neuron layer with the input shape specified

heady hatch
#

Ahh okay okay. What's the input/input shape?

keen root
#

the input are binary integers in the neurons like an array of [0,1,1,0,0,1,1,0] and the output will be a continuous variable

#

its a regression problem

heady hatch
#

That's what I was thinking of too. Any reason why you're using relu?

keen root
#

no reason at all. I'm way too inexperienced for it

lapis sequoia
#

to predict an answer to something i should just take the mean?

heady hatch
#

That is one form of prediction.

lapis sequoia
#

what others are there?

heady hatch
#

Depends on the context, let's say given some data you want to find some form of predictor for this set of data.

You can choose mean or median.

#

or maybe even mode.

lapis sequoia
#

wouldn't mean be the same as mode in the context of YES or NO?

#

and how would median be relevant for prediction?

heady hatch
#

So prediction without any kind of context is vague.

#

Could you clarify what you mean by prediction?

lapis sequoia
#

like say with a given age, the program tells you if it's more likely the person will say yes or no to something

#

like idk... do you have a bedtime?

#

i'm not working on a project. just trying to understand the basics

heady hatch
#

@keen root I think we can start really simple.

Just maybe something simple like

model = Sequential()
model.add(Dense(1, activation='linear', input_dim=input_len))
model.compile(...)

I don't know if this will work or not. Would start simple.

#

Yea of course. @lapis sequoia

So in your example, the prediction would be a yes or a no.

#

And I'm assuming what you're basing that prediction is on some stats or measurement, like mean?

lapis sequoia
#

yeah mean ig

#

but wouldn't mean just get the same result as mode? in that situation

heady hatch
#

Depends on your data.

past raptor
#

If I have a 4x4 np.array and I want to add all of the rows horizontally so that I end up with a single column, what is a way to do it?

[[69,0,86,8],           
[45,52,87,29],
[42,38,81,43],
[63,73,60,0]]
to
[[163]
[213]
[204]
[196]]
heady hatch
#

Let's say your data is 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 4.

I don't think the mean is the same as the mode here.
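
The three statistics for exactly that data, computed with the standard library:

```python
import statistics

# The example data from the message above: eight 1s, five 2s, four 3s, one 4.
data = [1] * 8 + [2] * 5 + [3] * 4 + [4]

mean_value = statistics.mean(data)      # 34 / 18, roughly 1.89
median_value = statistics.median(data)  # middle of the sorted values
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```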

lapis sequoia
#

that is true but what i am saying is data will either be 1 or 2

keen root
#

@keen root I think we can start really simple.

Just maybe something simple like

model = Sequential()
model.add(Dense(1, activation='linear', input_dim=input_len))
model.compile(...)

I don't know if this will work or not. Would start simple.
@heady hatch I've tried it before, though I'll try it again

heady hatch
#

@past raptor Are you using regular python or some library like NumPy?

lapis sequoia
#

as in yes or no

past raptor
#

numpy sir

keen root
#

My main concern is the fact that there is binary data at the entry. Is there anything special about it? Would I have to have any special care?

heady hatch
#

@past raptor You can do np.sum(data, axis=1)
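
A short sketch with the array from the question. `keepdims=True` is an optional extra that keeps the result as a 4x1 column instead of a flat array.

```python
import numpy as np

data = np.array([[69, 0, 86, 8],
                 [45, 52, 87, 29],
                 [42, 38, 81, 43],
                 [63, 73, 60, 0]])

row_sums = np.sum(data, axis=1)                # shape (4,)
column = np.sum(data, axis=1, keepdims=True)   # shape (4, 1)

print(row_sums)
print(column)
```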

past raptor
#

Let me try that!

heady hatch
#

@lapis sequoia Ahh okay so like 1 = yes and 2 = no?

So what would the mean represent?

lapis sequoia
#

the same as mode no?

heady hatch
#

@keen root I think once you've changed it into an array representation, it's not in binary format anymore. Instead it is an array/tensor of numbers.

#

@lapis sequoia Oh uh just to make sure we're on the same page, how do you calculate the mode?

lapis sequoia
#

whatever is the most used

#

right?

past raptor
#

@heady hatch thanks, it worked!

heady hatch
#

@lapis sequoia right, you might need to connect the dots for me.

I'm not sure how you're getting the mean and the mode to be the same here if all you have is 1 = yes and 2 = no.

lapis sequoia
#

like i would round the mean

#

bc 0.6 wouldnt be an answer

heady hatch
#

Right.

lapis sequoia
#

1.6 i mean

heady hatch
#

To focus a bit on the details here, so rounding the mean isn't the same as the mean itself.

lapis sequoia
#

yes i know that, but to make the mean into an answer wouldn't i have to round it?

#

i'm probably being dumb 😅

heady hatch
#

No no you're not, you're learning and we're discussing.

lapis sequoia
heady hatch
#

So straight up taking the mean and rounding it is very crude but it's one way to get predictions.
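
A small sketch of that crude predictor, assuming answers are coded 1 = yes and 2 = no. The answers are made up. When one answer has a clear majority, the rounded mean and the mode pick the same value; exact ties are the case where the two can differ.

```python
import statistics

# Made-up survey answers: 1 = yes, 2 = no.
answers = [1, 2, 2, 1, 2]

mean_value = statistics.mean(answers)       # 1.6, not a valid answer by itself
rounded_mean = round(mean_value)            # rounded to the nearest code
most_common = statistics.mode(answers)      # the majority answer

print(mean_value, rounded_mean, most_common)
```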

#

Do you know anything about linear regression or logistic regression?

lapis sequoia
#

i don't

heady hatch
#

Ahh.

To give you analogy.

Let's say we're creating an algorithm to predict whether someone will be asleep or not.

#

Our simple algorithm is just to take the mean of the data and probably apply some function.

lapis sequoia
#

what is sigmoid function?

#

oh

heady hatch
#

hahaha sorry don't want to throw too many things at you.

#

So linear or logistic regression are another kind of algorithms.
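
On the sigmoid question: the sigmoid squashes any number into the range 0 to 1, and logistic regression applies it to a line `m * x + b` to get a probability. A minimal sketch; the values of `m`, `b` and the age are made up and not fitted to any data.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real number to the interval (0, 1). Fine for moderate z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, hand-picked parameters for illustration only.
m, b = -0.2, 3.0
age = 10
probability_yes = sigmoid(m * age + b)   # sigmoid(1.0), about 0.73

print(sigmoid(0.0), probability_yes)
```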

lapis sequoia
#

and what do they do?

heady hatch
#

Similar to how we grab the mean and apply some function.

#

So linear regression will predict a number of some kind given an x.

#

I don't know how familiar you are with math.

#

but like y = mx + b.

lapis sequoia
#

mx being...

heady hatch
#

m = slope, x = data.

lapis sequoia
#

what is a slope?

#

this might be basic english lol

heady hatch
#

oh no worries, it's more of a math term. hahaha

#

Imagine the line y = x.

#

You know how it's just a diagonal line?

lapis sequoia
#

y = x would be a constant no

#

?

heady hatch
#

Right.

lapis sequoia
#

unless x has got something to do with y

heady hatch
#

And the slope of the function is the ratio of the vertical change over the horizontal change.

#

In y = x

#

slope = 1

#

But let's say y = 2 * x + 3

#

slope here is 2

lapis sequoia
#

and the +3 ?

#

does that have no effect on the slope?

heady hatch
#

So the +3 is something called the intercept.
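
A tiny worked check for `y = 2 * x + 3`. The two sample points are arbitrary.

```python
def line(x):
    return 2 * x + 3

# Any two different x values work; these are arbitrary.
x1, x2 = 1, 4

slope = (line(x2) - line(x1)) / (x2 - x1)   # vertical change / horizontal change
intercept = line(0)                         # value of y where x is 0

print(slope, intercept)
```

The `+ 3` shifts the whole line up without changing how steep it is, so it cancels out in the slope calculation.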

#

You might need to learn some basic algebra if you're unsure of all this.

lapis sequoia
#

i might know what these terms are in my language

#

i just don't recognize them in english that well

heady hatch
#

Ahh. hmm what language are you familiar with?

lapis sequoia
#

portuguese

heady hatch
#

Let me google it.

#

This is from google translation.

#

A inclinação de uma função linear
A inclinação de uma colina é chamada de declive. O mesmo vale para a inclinação de uma linha. A inclinação é definida como a razão entre a mudança vertical entre dois pontos, a elevação, e a mudança horizontal entre os mesmos dois pontos, a corrida.

#

Let me know if that makes sense. hahaha

lapis sequoia
#

kinda

#

so is slope "inclinação"?

odd yoke
#

yes

lapis sequoia
#

when predicting something should i add that for future learning or not? bc it might not be 100% correct

rustic apex
#

I have a list of stocks I’ve kept track of over a while. I want to get the stock price per cell, per day, and then see what the price was.

plucky spindle
#

Hello guys, I present a repo for a good cause, This repository is an initiative to share knowledge in data science to a community of Spanish-speaking practitioners, most of the content on this subject is in English, if you know techniques and methods of data science and machine learning you can share it with our study group through a pull request to be translated and serve as study material and expand the amount of understandable material, they can be, explain how a machine learning model works, some technique of cleaning or data exploration, a tutorial on how to use a module etc. etc .. Apart from participating in Hacktober fest and winning a shirt or planting a tree, you are helping a community of people who want to learn.

https://github.com/LATAM-Data-Science-Study-Group/Data-Science-Notebooks

agile wing
#

boom studying adnrew ng's course

#

soo far, pretty good course

mild topaz
#
```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 89, in <module>
    assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong
```
#

i am facing an AssertionError in my code

heady hatch
#

I think the dimensions of your training images are wrong. How does the tutorial setup the input?

mild topaz
#

@heady hatch in tutorial they have used imageDimensions = (32, 32, 3)

#

but when i pass the same imageDimensions then it is giving an error

heady hatch
#

And you both are using the same version of Tensorflow, right?

#

What about the data input? Similar shape?

#

Because that would be my guess, your data input might not be of right shape.

mild topaz
#

i think he has not mentioned about shape

heady hatch
#

So I would take it as an assumption here from the variable imageDimension.

#

Because it's looking for (32, 32, 3).

#

I would set that to be your shape.

mild topaz
#

i am using imageDimensions = (32, 32, 3)

#

can i share my code ? @heady hatch

heady hatch
#

Sure.

mild topaz
heady hatch
#

From what I'm seeing on line 89, you're checking

assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

right?

mild topaz
#

yes

heady hatch
#

So I think if I'm following the code correctly,

you're checking if (32, 3) == (32, 32, 3)?

#

Because images are of the shape, 32 x 32 x 3?

#

I guess I'm wondering how come you're checking the shape index 1 and on instead of just x_train.shape == imageDimensions?

mild topaz
#

in tutorial 6:06 please check line 66

#

i am following the same as shown in tutorial @heady hatch

heady hatch
#

Right right, I would ignore the tutorial real quick.

#

Try changing x_train.shape == imageDimensions real quick.

#

And let me know how that goes.

#

Oh wait.

#

probably something like

#

x_train[0].shape == imageDimensions

#

OH

#

Wait I get it now.

#

print your x_train.shape before the assert.

#

Sorry I'm not thinking too clearly.

#

So before the line
assert ..., add a print(x_train.shape)

mild topaz
#
```
(378,)
```
heady hatch
#

Yea that doesn't sound right.
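
A sketch of what the assert compares. For a properly loaded batch, `shape[1:]` drops the leading image count and leaves the per-image shape. One possible way to end up with `(378,)` is a list in which every entry is `None`, which is what `cv2.imread` returns when it cannot read a file. The arrays below are stand-ins, not the real data.

```python
import numpy as np

imageDimensions = (32, 32, 3)

# Stand-in for 378 correctly loaded 32x32 colour images.
good = np.zeros((378,) + imageDimensions, dtype=np.uint8)
good_ok = good.shape[1:] == imageDimensions   # (32, 32, 3) == (32, 32, 3)

# Stand-in for 378 failed reads: every entry is None.
bad = np.array([None] * 378)
bad_ok = bad.shape[1:] == imageDimensions     # () == (32, 32, 3)

print(good.shape, good_ok)
print(bad.shape, bad_ok)
```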

#

So on your line 65.

#

what's np.array(images)?

mild topaz
#

this is my console output```python
total classs detected : 24
noofClasses: 24
importing classes...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
data shapes
train (378,) (378,)
validation (95,) (95,)
test (119,) (119,)
(378,)
Traceback (most recent call last):

File "E:\demo3\image_classification.py", line 90, in <module>
assert (x_train.shape == (imageDimensions)), "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong```

heady hatch
#

Right right, sorry about the x_train.shape == imageDimensions.

#

I think there's something wrong with your images.

#

these lines.

#
count = 0
images = []
classNo = []
#mylist = os.listdir(path)
p = pl.Path(path)
mylist = [x for x in p.iterdir() if x.is_dir()]
print("total classs detected :", len(mylist))
noofClasses = len(mylist)
print("noofClasses:", noofClasses)
print("importing classes...")
for x in range(0, len(mylist)):
    myPicList = os.listdir(path+"/"+str(count))
    for y in myPicList:
        curImg = cv2.imread(path+"/"+str(count)+y)
        images.append(curImg)
        classNo.append(count)
    print(count, end = " ")
    count+=1
print(" ")
images = np.array(images)
classNo = np.array(classNo)
mild topaz
#

np.array(images) gives None

heady hatch
#

After images = np.array(images)

can you print images[0]?

#

and double check if that's what you expect it to be.

mild topaz
#

None

After images = np.array(images)

can you print images[0]?
@heady hatch

heady hatch
#

Yea.

#

I think you're not reading in the images properly.

#

I'm not familiar enough with the cv library, but I can help you debug.

mild topaz
#

I think you're not reading in the images properly.
@heady hatch ok

heady hatch
#

So on line 59.

#

after curImg = cv2.imread(path+"/"+str(count)+y)

#

add a print curImg.

#

and then add a break.

#

on line 57.

#

after myPicList = os.listdir(path+"/"+str(count)) add a print myPicList and then add a break.

mild topaz
#

add a print curImg.
@heady hatch python None 0 None 1 None 2 None 3 None 4 None 5 None 6 None 7 None 8 None 9 None 10 None 11 None 12 None 13 None 14 None 15 None 16 None 17 None 18 None 19 None 20 None 21 None 22 None 23

heady hatch
#

Be sure to add a break.

#

so something like

#
print(curImg)
break
#

So double check

#

is this where your images are?

#

path = r'E://demo3//india'

mild topaz
#

path = r'E://demo3//india'
@heady hatch yes

heady hatch
#

okay now, since you've imported os.

#

You can try something like

#

os.path.isfile(path_to_image)

#

and check to see if you have the right path.

#

You can print it anywhere.

mild topaz
#

os.path.isfile(path_to_image)
@heady hatch ok let me try...

#
```
os.path.isfile(r'E://demo3//india//0//a.jpg')
True
```
@heady hatch
heady hatch
#

Okay okay cool.

mild topaz
#

i think the input shape or dimensions is not proper i guess

heady hatch
#

I think it's your images.

#

Because you saw up there that it's printing None.

#

I think curImg = cv2.imread(path+"/"+str(count)+y) is incorrect.

mild topaz
#

okay, means my images are not in correct format?

heady hatch
#

Hmm.

#

Or maybe you're not reading them correctly.

#

like

#

I guess print path+"/"+str(count)+y

#

to make sure it's path to actual image.

#

Or actually no I think you might have a point, sorry for jumping the gun.

#

I think either path to images isn't correct

#

or something wrong with the images.

#

Since cv2.imread isn't reading them properly.

mild topaz
#

my images consists of rotated images also

heady hatch
#

but they're still in readable formats, right?

#

Oh wait.

#

I think

#

I might have an idea.

#

path+"/"+str(count)+y isn't this supposed to be path + "/" + str(count) + "//" + y?

#

Seeing how the files live in 'E://demo3//india//0//a.jpg'.

#

Double check if your path is correct.

#

It might also be path + "//" + str(count) + "//" + y

mild topaz
#

let me check path + "/" + str(count) + "//" + y this?

heady hatch
#

Yea

#

or even check path+"/"+str(count)+y.

#

Like add a line above curImg,

print(path+"/"+str(count)+y)

#

See if that's what you think it is.

mild topaz
#
```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 71, in <module>
    print(images[0])

IndexError: index 0 is out of bounds for axis 0 with size 0
```
heady hatch
#

Oh no no no.

#

print the path.

#

print(path+"/"+str(count)+y)

sage palm
mild topaz
#
```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```
@heady hatch
heady hatch
#

I don't know what you want me to say @mild topaz . hahaha

#

Hey @sage palm , what do you need help with?

#

Are you allowed to use libraries?

mild topaz
#

@heady hatch sorry, i got confused can u explain again

heady hatch
#

It's okay, so

#

Add a line between line 58 and 59.

#

print(path+"/"+str(count)+y).

sage palm
#

Thanks for answering! Yes, I'm allowed to use numpy

mild topaz
#
```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 59, in <module>
    print(path+"/"+str(count)+y)

NameError: name 'y' is not defined
```
@heady hatch
heady hatch
#

@mild topaz

Sorry looking at your code, your line 58 and 59 are these.

    for y in myPicList:
        curImg = cv2.imread(path+"/"+str(count)+y)

So instead of that
add a print statement there.

    for y in myPicList:
        print(path+"/"+str(count)+y)
        curImg = cv2.imread(path+"/"+str(count)+y)
mild topaz
#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```
@heady hatch
heady hatch
#

@sage palm Hmm I have some idea but what do you have in mind? Sorry I was typing a bunch of stuff but realized I should have asked you first.

#

Hey @mild topaz I really think you should check the path. Because I think your data is just filled with Nones.

mild topaz
#
```
total classs detected : 24
noofClasses: 24
importing classes...
['a.jpg', 'aa.jpg', 'aaa.jpg', 'aaaa.jpg', 'b.jpg', 'bb.jpg', 'bbb.jpg', 'bbbb.jpg', 'c1.jpg', 'cc.jpg', 'ccc.jpg', 'ccccc.jpg', 'download (7).jpg', 'download.jpg', 'ges.jpg', 'images (1).jpg', 'rfg.jpg', 't.jpg', 'tt.jpg', 'ttt.jpg', 'ttttt.jpg', 'z.jpg', 'z1.jpg', 'zz.jpg', 'zzzzz.jpg']
E://demo3//india///0a.jpg
0 ['a.jpg', 'aa.jpg', 'aaa.jpg', 'aaaa.jpg', 'bbb.jpg', 'bbbb.jpg', 'bbbbb.jpg', 'cdfg.jpg', 'cfd.jpg', 'download (3).jpg', 'download (4).jpg', 'download (5).jpg', 'download (6).jpg', 'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1``` @heady hatch
heady hatch
#

Okay yea. I think it's your path.

#

You noticed E://demo3//india///0a.jpg?

#

But your files live in E://demo3//india///0//a.jpg?

#

You're missing //.

mild topaz
#

But your files live in E://demo3//india///0//a.jpg?
@heady hatch a.jpg is name of my image file

heady hatch
#

Right

#

But

#

0a.jpg is not the file, right?

mild topaz
#

print(curImg) returns None

#

0a.jpg is not the file, right?
@heady hatch yes

heady hatch
#

and neither is 1a.jpg, right?

mild topaz
#

and neither is 1a.jpg, right?
@heady hatch correct

heady hatch
#

So I guess I'm wondering, how come you're trying to read those files if they don't exist?

mild topaz
heady hatch
#

Yes.

#

But you see how

#

You're printing out

#
'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1
sage palm
#

@heady hatch No problem, I will wait. This is a problem sheet which I'm working on for the upcoming exam. There are 6 pure math problems which I have done, but the ones in python I simply can't figure it out on my own. I'm not good at Python and the course has been a bit of a nightmare, so we have not learned what we should.

mild topaz
#
'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1

@heady hatch OH i see

#

why i am getting this but ?

heady hatch
#

@sage palm I can help you with the python but my linear algebra is a bit rusty. hahaha

I'm reading up on the converging series right now. But I would love to hear you breaking down the math portion if you can.

mild topaz
#

i am getting this with every folder

heady hatch
#

Yes.

#

Because you wrote curImg = cv2.imread(path+"/"+str(count)+y)

#

Your path is wrong.

#

I think it's supposed to be
curImg = cv2.imread(path+"/"+str(count)+"//"+y)
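For what it's worth, a minimal sketch of how the path could be built without hand-placed slashes, assuming the layout really is one numbered folder per class. The helper name `image_path` and the short `"india"` root are made up for illustration; `os.path.join` is the standard library call.

```python
import os


def image_path(root, class_index, filename):
    """Build <root>/<class_index>/<filename> with the separators handled for us."""
    return os.path.join(root, str(class_index), filename)


# The original concatenation glues the folder number onto the file name:
broken = "india" + "/" + str(0) + "a.jpg"

# Joining the three parts keeps the folder and the file name separate:
fixed = image_path("india", 0, "a.jpg")
```

Passing `fixed` to `cv2.imread` and checking the result for `None` before appending it to the list would catch a wrong path right away instead of at the split.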

mild topaz
#

okay, let me check

#

see this way i am getting here```python
14 ['1021.jpg', '123.jpg', '152.jpg', '52.jpg', '7856.jpg', 'a.jpg', 'aa.jpg', 'aaa.jpg', 'b.jpg', 'bb.jpg', 'bbb.jpg', 'c.jpg', 'cc.jpg', 'ccc.jpg', 'd.jpg', 'dd.jpg', 'ddd.jpg', 'e.jpg', 'ee.jpg', 'eee.jpg', 'images (1).jpg', 'images (2).jpg', 'images (4).jpg', 'images.jpg', 'pn_dl2.jpg', 'pn_dl9.jpg', 'x.jpg', 'xx.jpg', 'xxx.jpg']
E://demo3//india///151021.jpg
[[[184 181 150]
[184 181 150]
[161 158 127]
...
[198 201 199]
[ 56 61 62]
[ 0 0 3]]

[[162 165 139]
[190 193 168]
[175 178 153]
...```

heady hatch
#

Congratulations.

mild topaz
#
```
 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]]
23  
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.``` @heady hatch
sage palm
#

@heady hatch i will! But please give me a minute or two, i’m on my phone because my mac has frozen. Sorry about that

heady hatch
#

No worries, I can wait.

mild topaz
#

what is wrong in my case , can u plz help me to understand? @heady hatch

heady hatch
#

hmm

#

are images actual data?

#

I'm not sure what you've changed in your code.

mild topaz
#

i have changed this only
```python
for y in myPicList:
    print(path+"/"+str(count)+y)
    curImg = cv2.imread(path+"/"+str(count)+"//"+y)
```

#

@heady hatch can i share my code again?

heady hatch
#

Sure.

mild topaz
heady hatch
#

@mild topaz Oh remove line 64 and 65.

        print(curImg)
        break
mild topaz
#

see this

#
```
E://demo3//india///23download (16).jpg
E://demo3//india///23download (18).jpg
E://demo3//india///23download (19).jpg
E://demo3//india///23download (21).jpg
E://demo3//india///23download.jpg
E://demo3//india///23gfd.jpg
E://demo3//india///23gh.jpg
E://demo3//india///23images (10).jpg
E://demo3//india///23images (11).jpg
E://demo3//india///23images (12).jpg
E://demo3//india///23images.jpg
E://demo3//india///23iu.jpg
E://demo3//india///23ry.jpg
E://demo3//india///23uiop.jpg
E://demo3//india///23y.jpg
23  
data shapes 
train (377,) (377,)
validation (95,) (95,)
test (119,) (119,)
(377,)
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 96, in <module>
    assert (x_train.shape[1:]  == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong``` @heady hatch
heady hatch
#

before that can you print x_train[0]?

mild topaz
#

on which line?

heady hatch
#

Probably on line 80 or something.

#

after

x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train , test_size = validationRatio)
mild topaz
#

```
 [[232 245 253]
  [232 245 253]
  [232 245 253]
  ...
  [220 230 237]
  [221 231 241]
  [222 231 245]]]
data shapes 
train (377,) (377,)
validation (95,) (95,)
test (119,) (119,)
(377,)
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 96, in <module>
    assert (x_train.shape[1:]  == (imageDimensions)),  "the dimension of training images are wrong" 

AssertionError: the dimension of training images are wrong``` @heady hatch
sage palm
#

@heady hatch As I see there is no LaTeX bot on the server, but I can probably explain it without-

#

This is how you define the exponential function for a matrix.

#

It is very similar to the one for numbers, exp(kx), where k is a real number. In our definition of the exponential function this constant is a square matrix! Let us say an m x m matrix.

#

Taking a square matrix to the power of n means: A^n = A · A · ... ·A (n times)

#

Let's take an example

#

A=[[1,0],[0,1]]. So this is the Identity matrix of dimension 2 x 2. And A^3 = A · A · A.
The dot is just a symbol for a matrix product.

#

So let us look at the rest of the term: x^n/n!·A^n.
x is the variable, it can be 0 or negative. x^n just means x·...·x (n times)

#

@heady hatch are you there? 🙂

heady hatch
#

I am, I'm also worried about time since I will have to sleep soon.

#

I think I kinda understood everything so far.

#

I guess I was wondering, this is an infinite series that converges.

How do you calculate the convergence?
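One way to look at it, sketched with numpy since that is allowed: each term of the series is the previous term times xA/n, so once n gets large the terms shrink quickly and the sum can be stopped when the newest term is negligible. The function name `matrix_exp` and the `tol` / `max_terms` parameters are assumptions for this sketch, not something from the problem sheet.

```python
import numpy as np


def matrix_exp(A, x, tol=1e-12, max_terms=200):
    """Approximate exp(x*A) = sum over n of x^n/n! * A^n by a truncated series."""
    A = np.asarray(A, dtype=float)
    term = np.eye(A.shape[0])      # n = 0 term: x^0/0! * A^0 is the identity
    total = term.copy()
    for n in range(1, max_terms):
        # Next term from the previous one: multiply by A and by x/n.
        term = term @ A * (x / n)
        total += term
        # Stop once the newest term no longer changes the sum noticeably.
        if np.abs(term).max() < tol:
            break
    return total
```

For the identity matrix from the example, `matrix_exp([[1, 0], [0, 1]], 1.0)` should come out close to e times the identity. When x·A has large entries a plain sum like this loses accuracy, so it is only meant to mirror the definition; `scipy.linalg.expm` is the library routine if SciPy is permitted.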

sage palm
#

Alright, then just go to bed 🙂 sleep is important. Can we discuss it later, when you have time?

heady hatch
#

I would love that!

#

@mild topaz , we will have to deal with your issue tomorrow as well.

#

Good night you both.

mild topaz
#

if u dont mind can u give some hint to me so i can try something @heady hatch

sage palm
#

Good night! (just woke up. lol)

heady hatch
#

Alrighty. @mild topaz

So I'm not sure why your images have the shape of (377,).

Because if they were an ndarray, they should have multi-dimensions.

I would look into your images and see how you can make them (32, 32, 3).

so do stuff like print(images[0]) and stuff and try to track it down and see if they're what you expect them to be.
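One possible cause, and it is only a guess without seeing the script: if the images have different sizes, numpy cannot stack them into one block and ends up with a 1-D array of objects, which is where a shape like (377,) would come from. A small sketch of that effect with made-up blank images; `resize_nearest` is a stand-in written for this example so it runs without OpenCV.

```python
import numpy as np


def resize_nearest(img, height, width):
    """Nearest-neighbour resize using plain indexing (stand-in for a real resize)."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows][:, cols]


# Three fake "images" with different heights and widths.
images = [
    np.zeros((40, 60, 3), dtype=np.uint8),
    np.zeros((32, 32, 3), dtype=np.uint8),
    np.zeros((100, 80, 3), dtype=np.uint8),
]

# Mixed sizes can only be held as a 1-D array of objects.
ragged = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    ragged[i] = img

# After resizing everything to 32x32 the images stack into one 4-D array.
stacked = np.stack([resize_nearest(img, 32, 32) for img in images])
```

Here `ragged.shape` is `(3,)` while `stacked.shape` is `(3, 32, 32, 3)`. With OpenCV the resize step would normally be `cv2.resize(img, (32, 32))` right after reading each file.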

mild topaz
#

377 is a no of training images @heady hatch

spiral zealot
#

Hey, I implemented a GAN and will that be considered as a final year project?

sage palm
#

@heady hatch I have found a very nice method of implementing our problem in python. I will tell you about it when you wake up. I can also use one of the voice channels if you like.

royal thunder
#

can anyone explain this to me

#

over fitting the data confuses me

lapis sequoia
#

@royal thunder Are you confused about the sudden cut in the line plot?
It's a zoomed plot so you can imagine them connecting at infinity or something.

royal thunder
#

yeah

hushed flax
#

print("Hello World")

late halo
#

can I ask questions related to tensorflow here?

south gull
#

ye

spark stag
#

@royal thunder overfitting is where the algorithm trying to learn patterns in the data becomes too specialised to the data it's training on. As you can see in the picture, the predictions are very accurate on those data points, but consider a point halfway between the 2 rightmost points: the general trend is a straight line, yet the curve the algorithm has generated puts the prediction for that input far off the chart. That's where it can sometimes be better to use a simpler algorithm / simpler structure, because a simple straight line describes the data there quite well and, as the text says, the predictions from a linear model are more likely to be accurate on new data than that curve, which has horribly overfitted to the training data.
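A small sketch of the same idea with made-up numbers (not the data from the book's figure): six noisy points whose true trend is y = x, fitted once with a straight line and once with a polynomial forced through every point. `fit_line` and `fit_interpolant` are names invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Polynomial that passes through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


# Made-up training data: true trend y = x, plus a little noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.0, 1.5, 1.5, 3.5, 3.5, 5.0]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

new_x = 4.5  # halfway between the two rightmost training points
line_error = abs(line(new_x) - new_x)      # error of the simple model
wiggly_error = abs(wiggly(new_x) - new_x)  # error of the overfitted model
```

The polynomial has zero error on the training points, yet at the new point its error is roughly 1.36 against roughly 0.11 for the straight line, which is the overfitting effect described above.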

royal thunder
#

thanks @spark stag

marble bison
#

anyone know how to plot 3d vector fields without using quiver in matplotlib?

lapis sequoia
lapis sequoia
serene scaffold
#

Is this an accurate description of logistic regression?
A model that projects each object into an n-dimensional space and solves for the n-dimensional plane that best separates objects from different classes.

#

I guess it should be (n-1)dimensional

#

looks like I may have inadvertently described SVM instead.

lapis sequoia
#

The above definition is of any general linear classifier.
Both SVM and Logistic regression can separate different classes that are in N-Dimensional Hyperspace using an (N-1) - Dimensional HyperPlane.
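To make the hyperplane part concrete, a minimal sketch of the prediction side of logistic regression with hand-picked (not trained) weights. In 2 dimensions the boundary `w·x + b = 0` is a line, which is the (N-1)-dimensional hyperplane for N = 2. The weights, bias and function name are assumptions for illustration.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of the positive class: sigmoid of the linear score w.x + b."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked parameters: the boundary is the line x1 - x2 = 0.
weights = [1.0, -1.0]
bias = 0.0

on_boundary = predict_proba(weights, bias, [2.0, 2.0])    # score 0, probability 0.5
positive_side = predict_proba(weights, bias, [3.0, 1.0])  # score +2
negative_side = predict_proba(weights, bias, [1.0, 3.0])  # score -2
```

Points on the line get exactly 0.5, and the side of the line decides the class. An SVM with a linear kernel uses the same kind of boundary but chooses it by maximising the margin rather than the likelihood.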

#

@serene scaffold

serene scaffold
#

I see

rustic apex
#

I have this list of stocks I wrote down years ago. I just wrote down the ticker and if it was up or down. How would you use this? I want to display the stocks on a grid of how many I found per day, how often a “same” stock was written down when.... also the actual prices on a graph. What will I need?

marble bison
#

@lapis sequoia hey yeah thanks, i'll have a look at plotly. it's just when i try to make a 3d vector field function with quiver it doesn't like having the arrow directions as an input

heady hatch
#

Hey @sage palm , I would love to hear it. Voice channel will definitely work as well, let me know when you're free!

sage palm
#

@heady hatch good morning!

#

I'm still here. Have been sitting all night

heady hatch
#

Good morning to you too.

#

Were you able to solve the matrix exponential problem? Or are you in the debugging stage?

sage palm
#

No, unfortunately. I have only solved the proof-based math questions. I do not know python, so I need some help to get my thoughts implemented.

#

😄

heady hatch
#

Oh man, I'm totally excited to help you implement it.

sage palm
#

thanks!

#

I'm kinda slow mentally because of lack of sleep, so bear with me

#

Which channel to join?

heady hatch
#

Give me about 30 minutes, I'm going to get ready and be back!

sage palm
#

cool! 🔥

#

I have never tried voice channel. but i will figure it out.

sage palm
#

yes

heady hatch
#

@sage palm Alrighty, I'm back!

sage palm
#

👍

heady hatch
#

Yes indeed.

#

I'm not super familiar with Discord so I'll be figuring things out with you.

sage palm
#

Lol, I was about to say the same 😄

#

But I think a private discord call will be easier!

#

do you mind?

heady hatch
#

Nope not at all.

bold olive
#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

lapis sequoia
#

@bold olive It's not good practice to apply under-sampling or oversampling before the train and test split. You should first do a random split and then sample to create a balanced training set.

bold olive
#

I understand but then how do you balance the dataset according to a different label when y is different in the split?

lapis sequoia
#

@lapis sequoia Scaling helps in faster convergence to the optimal result. So you should do it almost all the time.

#

@lapis sequoia Robust scaling here is correct right ?

#

@bold olive I'm not able to understand your statement can you rephrase it or describe it more.
Do you want to know how to do the sampling?

bold olive
#

No.

#

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

I know how to sample using the existing target label, but how do I sample it according to a different label in the dataset while being in the same classification problem?

lapis sequoia
#

@lapis sequoia how to know which one to use between robust and standard ? as both they look the same

#

@lapis sequoia Both are good choices and you should get very similar results from them. You can choose either one.

#

@lapis sequoia thanks 🙂

#

@bold olive One option will be to increase the weight such that Male and Female have same count in dataset.

bold olive
#

Increase the weight where exactly?

lapis sequoia
#

are you using scikit-learn ?

bold olive
#

Yes.

#

```python
from sklearn.model_selection import train_test_split

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```
#

y is the cancer yes/no column, there is one more gender column and I want to balance the whole dataset according to that.

lapis sequoia
#

```python
X = [Gender, Cancer]

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=X[Gender],
                                                    test_size=0.25)
```
#

try this once and tell me if this works.

bold olive
#

Sure, hang on.

lapis sequoia
#

I don't think it will work.

bold olive
#

Yes, it doesn't.

#

Do you understand what I am trying to achieve though?

lapis sequoia
#
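import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

X_temp = pd.concat([X_train, y_train.rename("Cancer")], axis=1)  # assumes X_train is a DataFrame with a "Gender" column

rus = RandomUnderSampler(random_state=42)
X_temp_rus, _ = rus.fit_resample(X_temp, X_temp["Gender"])  # gender is only the sampler's target here

X_train_new = X_temp_rus.drop(columns=["Cancer"])  # features, now balanced by gender
Y_train_new = X_temp_rus["Cancer"]  # cancer stays the classification label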
#

You'll have to make the X[Gender] column as Y and concat the label into X_temp.
then do the sampling and extract your X_train and Y_train.
Is this making sense ?

#

@bold olive

bold olive
#

So in this way the sampling is done according to the gender but target label in the classification is still cancer? @lapis sequoia

fringe cove
#

hello can someone help me to solve this issue ? idk if the csv i downloaded is broken or my utf-8 encoding is not working ?

#

it is supposed to be "système"

bold olive
#

Not working out as I want it to unfortunately.

fringe cove
#

ok i found a solution using a western europe iso
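For anyone hitting the same thing, a minimal sketch of what is probably going on, assuming the file was saved as ISO-8859-1 (Latin-1) rather than UTF-8. The bytes below are built by hand to imitate such a file; no real CSV is read.

```python
# Bytes as a Western European (ISO-8859-1) export would store the word.
raw = "système".encode("iso-8859-1")

# Decoding them as UTF-8 cannot work: the byte for "è" is not valid UTF-8 here.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in restores the text.
fixed = raw.decode("iso-8859-1")
```

With a real file the same choice is made when opening it, for example `open(path, encoding="iso-8859-1")`, or through the `encoding` argument of `pandas.read_csv` if the CSV is loaded with pandas.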

restive basin
#

somewhat random question, but I had a thought. is a linter basically the same as a compiler stopping half way, or are they completely different beasts? I only have a very superficial understanding of compilers, but it seems to me a linter will need to do the same work of tokenization and building some kind of syntax tree.

#

there was something I noticed recently where the kotlin linter in intellij would warn me that while I did a check to see if something was null, the variable is mutable and thus the value could change to null at any time. I don't see how it could know that without doing all that stuff and working through the program in its entirety

dull musk
#

@rustic fern dutch

lapis sequoia
#

Out of curiosity, how long does it take to deploy a machine learning model for you guys?

#

@marble jasper So your models are deployed automatically? Is what I am hearing?

marble jasper
#

our pipelines that automatically ingest data for unsupervised learning, pretty much do it automatically, they're run in Airflow and I dunno, process takes a few minutes to push the models to the models store API, and update some database values. Next time something tries to use a model, the endpoint reads the latest model version from database and realises it doesn't have the model cached, so pulls it, and now it's in the models cache to be used for this and new requests.
for other kinds of model, someone has to compress it, and upload it to the models bucket, and update the path in the project that's using it so it downloads the model. Tag a build, CI takes care of deploying it

#

sorry, pasting message from earlier

odd yoke
#

depends on the infrastructure of the project i'm working on

#

at my current work, about 6 months

marble jasper
#

yes, I assume you're talking after all the training is done and there's a model ready to go into production

lapis sequoia
#

And you edit the hyperparameters

#

and k-instances

marble jasper
#

we have a bunch of stuff on Airflow that's ingesting data and running unsupervised learning to create new models, and then uploading it to our models store

lapis sequoia
#

Would a feature like that be beneficial or useless?

odd yoke
#

you're describing automl @lapis sequoia

marble jasper
#

so that pipeline is pretty much zero-touch. we have an internal model store API that you can post models to, and it tracks the latest version of a model for a given task; the Airflow ETL is pushing models in there and bumping the version number. Production systems that use the model query for the latest model, check their local cache for it, and pull it if it's newer
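A rough sketch of that "check the latest version, pull only if the cache is stale" pattern. Everything here is hypothetical: `InMemoryStore` stands in for the internal model store API, and the method names are invented for the example rather than taken from any real system.

```python
class InMemoryStore:
    """Hypothetical stand-in for a model store that tracks versions per task."""

    def __init__(self):
        self.versions = {}   # task -> list of models, index + 1 is the version
        self.downloads = 0   # counts how often a model was actually pulled

    def push(self, task, model):
        self.versions.setdefault(task, []).append(model)

    def latest_version(self, task):
        return len(self.versions[task])

    def download(self, task, version):
        self.downloads += 1
        return self.versions[task][version - 1]


class ModelCache:
    """Serves models from a local cache, refreshing when the store has a newer one."""

    def __init__(self, store):
        self.store = store
        self._cached = {}    # task -> (version, model)

    def get(self, task):
        latest = self.store.latest_version(task)
        cached = self._cached.get(task)
        if cached is None or cached[0] < latest:
            # Cache miss or stale entry: pull the newest model once.
            self._cached[task] = (latest, self.store.download(task, latest))
        return self._cached[task][1]


store = InMemoryStore()
store.push("clustering", "model-v1")
cache = ModelCache(store)
first = cache.get("clustering")    # pulled from the store
second = cache.get("clustering")   # served from the cache
store.push("clustering", "model-v2")
third = cache.get("clustering")    # newer version detected, pulled again
```

The version lookup happens on every request while the heavy download only happens when the number changes, which is what keeps a setup like this zero-touch.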

#

automl is pretty good

lapis sequoia
#

For side projects, it is a large platform to incorporate into

marble jasper
#

well, it's a google cloud product, they're upselling their entire cloud

lapis sequoia
#

What could make autoML better, if there was anything that could improve it?

#

infrastructure

odd yoke
#

automl as it is, at least from my point of view, is perfectly suited for companies without a data science department, that want to deploy simple-ish models, in such cases, having it managed for you on some cloud platform is a perfect fit imo

lapis sequoia
#

That is a good point

#

But doesn't take forever for data science teams to deploy models?

odd yoke
#

if you're building the entire platform from scratch, i think using NAS is about the same as developing a "traditional" model

lapis sequoia
#

*doesn't it

#

So why don't they automate that part yet

odd yoke
#

in a perfect world, all you'd have to do is to provide the saved model, define endpoints, and you're good

lapis sequoia
#

Right

#

So you are saying

marble jasper
#

probably they don't automate that part because the time cost of deploying the model is minuscule compared to the time it takes to do all the other stuff

lapis sequoia
#

I thought it usually takes weeks to months to deploy a single model

#

And then cleaning beforehand

marble jasper
#

depends what you mean by "deploy"

odd yoke
#

yeah, i think any sufficiently large company with a good devops team with ml engineers can do that part in a small proportion of the total time

lapis sequoia
#

I think I phrased this poorly

#

I am not an ML engineer haha I am new to data science

odd yoke
#

i have a friend working in a data science consulting company, and they can deploy models for clients in less than a day after the model is ready

#

because they invest a lot in devops

#

the company i work at, on the other hand, does not have any dev ops dpt because "we don't do development", and it takes literal months to get it ready on a project

marble jasper
#

yeah, ours is probably about 20 minutes, depending on how quickly you can convince someone to review the PR for the model version change, due to PR review policy. Assuming the model has been vetted already

odd yoke
#

generally we just quickly patch it together and leave it as is

marble jasper
#

the full release process depends on exactly what, but usually:

  • upload some files to bucket
  • change some docker files or env vars
  • commit and tag to trigger a CI build of it
  • go and edit the version you want in production in a different repo, and PR that
  • wait for someone to accept PR
  • someone has to run that deploy because that's not automatic (but could be, just a human-in-the-loop thing)
lapis sequoia
#

Ohh

marble jasper
#

this is assuming everyone agrees that the model is ready

#

I'm not sure what problem you're solving, because it sounds like to use your system it would require someone to hook up an API

#

sure

lapis sequoia
#

Also this is very helpful

marble jasper
#

I think for some companies that have a separate team for data pipelines and devops, this is probably not that useful, because our model deployment process isn't that different from other CD tasks (there's just an extra big model file somewhere to handle). Maybe for smaller teams like Igneous mentioned, who don't have dedicated devops?

lapis sequoia
#

Again I am not a data engineer or data scientist

#

or data analyst

#

@marble jasper What is something that would speed up the process within your work? What takes the longest?

#

Also, I should have asked this before, but are you a data scientist?

#

or data engineer

marble jasper
#

no, I lean more on the devops and backend side, but I manage some ML engineers

lapis sequoia
#

Ohh I see

marble jasper
#

my main gripe with our systems is Apache Airflow kind of sucks

lapis sequoia
#

Also, I am also new to Discord haha I didnt know I can jump into communities just recently

marble jasper
#

it just doesn't FEEL like a modern app, what with the weird limitations like not being able to schedule two tasks at the same time, etc

lapis sequoia
#

Do many companies use Apache Airflow?

marble jasper
#

probably quite a lot

#

I mean, AirBnB use it

#

most likely. it would be weird if they didn't

lapis sequoia
#

Ah I see, if you don't mind, and I understand if youre not comfortable with sharing this, but which company do you work at?

#

Idk if people share those details on discord or if that is not really common

#

Could you elaborate on apache airflow process?

marble jasper
#

Airflow runs pipelines. Our pipeline stages are mostly either docker images running in kubernetes, or calls to internal APIs. We use Airflow for some periodic data collection, and periodic generation of some models that use unsupervised learning

#

some runs have external triggers, most run on a timer (and pull available data from an ingestion API that collects data to be processed)

lapis sequoia
#

This is a really cool

#

So everything is just automated by cronjobs

marble jasper
#

they're not cron jobs, but those processes are on a timer

lapis sequoia
#

Ohh I see

marble jasper
#

Airflow uses python

lapis sequoia
#

When do you guys agree that a model is ready?

marble jasper
#

you define your pipeline in python. for the tasks on the timer, it's part of your DAG definition. Those python DAGs live in a repository. When you push to the repository, CI/CD takes it and inserts it into a folder that Airflow watches, and Airflow automatically loads this

#

so the process of defining a new pipeline is you just create a new python file in the DAGs repo, and commit it and tag for CI/CD build

lapis sequoia
#

So everything is mostly being automated by airflow

marble jasper
#

unsupervised learning, yes

#

there's a data ingestion API that handles collecting data and making clean formatted data available to some of the pipelines

#

it's slightly decoupled, because there are Airflow processes that perform the data collection, pushing the data to the ingestion API, and other pipelines that get data from the API. This is because Airflow isn't responsible for raw data storage, and also we get a data stream from elsewhere as well

#

but yes, this is unsupervised learning on ML that's already been defined. Everything else - exploring new ML algorithms and anything that requires supervised learning, that's all offline

lapis sequoia
#

Also, random question but you got forbes 30 under 30

#

thats super impressive

marble jasper
#

someone has to design the experiments, do the labelling, etc. etc. that's all desktop stuff

lapis sequoia
#

It was really not a question but rather a comment. That is insanely impressive

#

But this is really helpful information

#

A lot of people don't really help out this much with the process of how it flows or put in time to write it out. I truly appreciate it

#

I hope I didn't sound weird

sinful pewter
#

idk where i

#

lol

#

idk where i'd put this but i made a calculator function

#

just found it

lapis sequoia
#

Also, is there any way that I can reach out @marble jasper whenever ? That was really helpful

rustic apex
#

What’s the better alternative to web scraping?

lapis sequoia
#

@marble jasper Also, there is no devops team that detects data drif

#

drift

#

detecting data drift

rustic apex
#

I wrote down stocks that caught my eye years ago. As a test, I want to display the price per day and some more info. What type of way would you display stocks like this? There are duplicates

velvet thorn
#

What’s the better alternative to web scraping?
@rustic apex it depends.

#

if the data is publicly available through an API, that usually is better

rustic apex
#

@velvet thorn how can I display the gain/loss difference from my list? I have a lot of stock tickers listed

velvet thorn
#

@velvet thorn how can I display the gain/loss difference from my list? I have a lot of stock tickers listed
@rustic apex I don't understand the question

rustic apex
#

@velvet thorn it’s ticker names for stocks. How can I cycle through the list to show the +- of each to now?

velvet thorn
#

I actually have no idea what you want

#

perhaps an example would help

rustic apex
#

@velvet thorn I want to display a graph/line per stock to see how the trend has been since I wrote them down.

velvet thorn
#

but where are the numbers

rustic apex
#

@velvet thorn I didn’t write them down, just the date. That’s what I want to have added to them

velvet thorn
#

okay, what does each row represent?

rustic apex
#

Each row is a day

velvet thorn
#

I'm pretty sure each COLUMN is a day?

deep mason
#

what an odd sheet

rustic apex
#

@velvet thorn yes

velvet thorn
#

so what is each row?

#

this is a pretty weird way to store data, not gonna lie

rustic apex
#

@velvet thorn I guess not really anything, it’s a list of stocks by day. The day is at the top

deep mason
#

for real.. columns for days, u expect then rows to be symbols? but it isnt

velvet thorn
#

@velvet thorn it’s ticker names for stocks. How can I cycle through the list to show the +- of each to now?
@rustic apex so why does the day matter?

#

does it matter at all?

#

or do you just want to get all the ticker symbols in that DataFrame

rustic apex
#

@velvet thorn it’s when I found a stock, and want to know the difference between when I wrote it down again

velvet thorn
#

okay

#

so

#

for each stock

#

you want the difference between

#

the day it was entered (from the column)

#

and the present

#

and I'm assuming

#

you will get the prices

#

from some external API?

rustic apex
#

@velvet thorn yes

velvet thorn
#

okay

#

got it

#

no need to mention me if you're not replying to a specific message btw

rustic apex
#

@velvet thorn ok 👍 there’s sticks I wrote down at +100%, that then shot up even more at +800%, so I want to see how this list still holds up.

#

Stocks.... not sticks

velvet thorn
#

hm.

#

a CSV is a bit of a bad choice for this

#

okay what I would do

#

is apply some data transformation

#

so you have a 2-column DataFrame

#

ticker and date

#

then you can iterate through it and call an API

#

to get those prices
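A minimal sketch of that transformation in plain Python, assuming the sheet has one column per day with the tickers listed underneath. The tickers, dates and the name `to_long` are made up, and the price lookup is deliberately left out since it depends on whichever API ends up being used.

```python
def to_long(sheet):
    """Turn {date: [tickers...]} (one column per day) into (ticker, date) pairs."""
    pairs = []
    for date, tickers in sheet.items():
        for ticker in tickers:
            if ticker:                      # skip empty cells in shorter columns
                pairs.append((ticker, date))
    return pairs


def first_seen(pairs):
    """Earliest date each ticker was written down (handles duplicates)."""
    earliest = {}
    for ticker, date in pairs:
        if ticker not in earliest or date < earliest[ticker]:
            earliest[ticker] = date
    return earliest


# Made-up sheet: ISO dates as column headers, hypothetical tickers below them.
sheet = {
    "2018-03-01": ["AAA", "BBB", ""],
    "2018-03-02": ["BBB", "CCC", "AAA"],
}

pairs = to_long(sheet)
starts = first_seen(pairs)
```

Each `(ticker, date)` pair is then one price lookup, and `starts` gives the date to compare against today's price. In pandas the same reshaping is usually done with `DataFrame.melt`.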

rustic apex
#

Or should I just go by day?

#

Should I use web scraping from yahoo finance?

velvet thorn
#

Should I use web scraping from yahoo finance?
@rustic apex that's a separate question

#

one thing at a time

#

Or should I just go by day?
@rustic apex what do you mean by that?

#

isn't that implied here

#

ticker and date
@velvet thorn this

rustic apex
#

Oops, yes

#

So in Jupyter, should I display just one graph of stocks at a time?

velvet thorn
#

that would be up to you

#

you'd also need to think about what you're plotting in the first place

#

simple price?

#

some sort of moving average?

#

comparison to an index?

#

etc.

rustic apex
#

Ok, all of those 👍, I’ve seen tutorials to predict a stock, I want to try that later as well. There’s been a lot of stocks I’ve found at around 50¢/1$, and they ended up being $5, $10, $25

velvet thorn
#

yup

#

so like

#

you have a lot on your plate right now

#

I suggest you make a list of things you want to do

#

and work on them bit by bit

rustic apex
#

When I’ve watched tutorials and also some samples on Kaggle, it doesn't show an import from any API or url, it just has an analysis of the data

velvet thorn
#

okay

#

what are you getting at though

#

I don't really understand

#

where are you going to get the prices then

rustic apex
#

Well, yes I want the prices 👍, but which api or web scraping is best?

snow flax
#

What are some beginner projects for open cv that people have done

lapis sequoia
#

Are there any for just simply video classification ones or integrations within scale.ai

velvet thorn
#

Well, yes I want the prices 👍, but which api or web scraping is best?
@rustic apex depends on what you want.

#

I suggest you do some original research

velvet thorn
#

@whole roost you can ask about matplotlib here

#

anyway, to answer your question, a for loop would be appropriate

pale thunder
#

@snow flax I made a bunch of simple filters, like negative, pixelate, posterize, and just a whole bunch of aliases to cv2 built ins, like edge detection, blur, ...

royal thunder
#

anyone ?

#

i am currently learning machine learning

#

from hands on machine learning

#

i have this huge doubt anyway

#

should i learn the math first and then continue, or learn machine learning and the math for it in parallel?

bold olive
#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

#

Currently, I have this:


```python
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```

But this balances the dataset according to the cancer yes/no (column 2) label as y is defined that way. I want to perform the sampling with column 3 (gender) and then perform classification with 2.
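
One possible way to do this, sketched with plain pandas rather than imbalanced-learn: split first, balance only the training rows on the gender column, and only then pull out the cancer column as the label. The column names and the toy frame below are assumptions for illustration, not the real dataset, and `balance_by` is a hypothetical helper.

```python
import pandas as pd

def balance_by(df, column, random_state=42):
    """Downsample so every value of `column` appears equally often."""
    smallest = df[column].value_counts().min()
    return df.groupby(column, group_keys=False).sample(
        n=smallest, random_state=random_state
    )

# Toy stand-in for the training split (hypothetical column names).
train = pd.DataFrame({
    "gender": ["M"] * 6 + ["F"] * 3,
    "cancer": [1, 0, 0, 1, 0, 0, 1, 0, 1],
    "feature": range(9),
})

balanced = balance_by(train, "gender")   # equal M and F rows
X_res = balanced[["feature"]]            # features for the classifier
y_res = balanced["cancer"]               # label is still cancer yes/no
```

The same idea should also work with `RandomUnderSampler` by passing the gender column as `y` to `fit_resample` while leaving the cancer column inside `X`, then separating it afterwards.
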
tiny orchid
#

Hey

#

i am new here

#

can anyone guide me from where i should learn machine learning

#

It'd be awesome if you guys can help me 🙂

lapis sequoia
#

Hey guys, so I'm wondering what the best way is to fill those missing values. The dtypes come back as object. I dropped all rows where every value is NaN. This is the output

left moth
#

i dunno much but maybe putting mean values instead of dropping them might be better @lapis sequoia ?
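
A small sketch of that idea, assuming the object columns are a mix of numbers stored as text and real categories. A mean only makes sense once a column is numeric, so the numeric one is coerced first. The frame and column names here are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["31", "n/a", "45", None],        # numbers stored as strings
    "city": ["Oslo", None, "Lima", "Oslo"],  # genuinely categorical
})

# Coerce to numeric: anything unparseable becomes NaN, then fill with the mean.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].mean())

# For categorical columns, the most frequent value is a common fallback.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])
```

Whether mean or mode filling is appropriate depends on the data; for skewed numeric columns the median is often a safer choice.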

hushed flax
#

I have made Tic Tac Toe in Python

lapis sequoia
#

How can one detect data drift?

#

Or what are ways of avoiding that

whole roost
#

Hi : ) Thanks for the recommendation that I use this channel! I'm trying to figure out how to use ax.bar to, eventually, make a histogram. This is for homework, so much of the code I'm trying to use was provided and I'm modifying it. The error I'm getting is this: AttributeError: 'numpy.ndarray' object has no attribute 'bar'

#

I'm still quite unsure what ax.bar is technically supposed to do.

velvet thorn
#

@whole roost okay, I looked at your code

#

what do you want to return from plot_f_sampled?

#

it seems like you're returning an array

whole roost
#

Hm, how could I share a Word document that provides a lot of background information?

velvet thorn
#

never mind

#

@whole roost the main thing is:

pmf_for_test_plot = plot_f_sampled(n=15)

print("\nBegin homework 1, problem 3")
plot_pmf_samples(pmf_for_test_plot, x_lim=(0, 1), n=200)

here

#

you're passing pmf_for_test_plot into plot_pmf_samples, right?

#

so eventually it calls this:

#

plot_pmf(keep_count,bins)

#

however, the signature of plot_pmf is def plot_pmf(ax, pmf=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title='No title')

#

so you're passing keep_count as the ax argument, which expects an Axes

whole roost
#

An Axes instead of a y-array (like keep_count currently is, if I'm understanding it right)? Or is it feasible that I tell keep_count to add an Axes argument in addition?

velvet thorn
#

yes, and no

#

look up keyword arguments

#

to specify how to make each argument go where you want

whole roost
#

So I need to convert keep_count into an ax argument ...

velvet thorn
#

mp

#

no

#

you need to tell plot_pmf that you're not going to provide it an Axes

#

and for it to create its own

whole roost
#

Ah! So, make Axes an optional argument with Axes='default axes'?

velvet thorn
#

uh.

#

not exactly

#

look at its code

#

and think about how that would work

#

plot_pmf

whole roost
#

I'm ... trying. Unfortunately, due to lack of sleep, my initiative is pretty shot : /

#

def plot_pmf(ax, pmf=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title='No title'):
"""
Plot a pmf as a set of bars
:param ax: Figure axes. If None, will call subplots

#

this :param ax: comment seems to imply that ax should already default to something if not provided ... oh. plot_pmf_samples has axes, I should just provide them to plot_pmf as an argument, yeah?

#

Unsure how to reword this to be able to get the axes from it:

#

Make the subplots

f, (ax1, ax2, ax3) = plt.subplots(1, 3)
#

I'm a little reluctant to alter it or move it around, as it's part of the code I was provided.

#

I'm looking at the documentation: can I call 'ax' within plot_pmf_samples and have it know it's referring to the

#

f, (ax1, ax2, ax3) = plt.subplots(1, 3)

velvet thorn
#

this :param ax: comment seems to imply that ax should already default to something if not provided

#

nope

whole roost
#

ax1, ax2, ax3 from here?

velvet thorn
#

hint: you can just modify how you call plot_pmf

#

you don't need to modify plot_pmf itself

whole roost
#

Oh, three subplots, three axes, right?

#

(The homework has this as an example that my results should roughly resemble:

#

So the idea is that it provides me with an Axes for each subplot, and then I call the appropriate Axes when plotting?

#

Hah, fixed the error! But the function still doesn't quite return my plots when I run it.
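
For reference, a minimal sketch of the pattern discussed above. `plot_bars` is a hypothetical stand-in for the homework's `plot_pmf`, not the provided code. The point is that `plt.subplots(1, 3)` hands back a Figure and three Axes, and `.bar` has to be called on one of those Axes, not on a data array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

def plot_bars(ax, heights, x_vals, title="No title"):
    """Hypothetical stand-in for plot_pmf: draws onto the Axes it is given."""
    ax.bar(x_vals, heights, width=0.5)
    ax.set_title(title)
    return ax

# One Figure, three Axes: one per subplot.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)

# Positional and keyword forms both work; the Axes must land in `ax`.
plot_bars(ax1, heights=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title="pmf")
plot_bars(ax=ax2, heights=(0.5, 0.5), x_vals=(0, 1), title="coin")
plot_bars(ax3, (0.25, 0.25, 0.25, 0.25), (0, 1, 2, 3), title="uniform")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and finish with `plt.show()`; a function that builds plots but never shows or saves the figure will appear to do nothing.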

lapis sequoia
#

Do you guys recommend anything for learning Machine Learning? I'm trying Codecademy for K-Means clustering and I just don't understand it.

nimble obsidian
#

I've a bit of an odd numpy question -- given a list of 2d matrices, what would be a simple way of removing all transposed copies of a matrix, leaving only one (any version)
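
One possible approach, as a sketch: give each matrix a hashable key built from its shape, dtype and raw bytes, and treat a matrix and its transpose as the same item. `drop_transposed_copies` is a made-up name, and this assumes exact equality is what you want (it also drops exact duplicates, and float matrices that differ by rounding will not match).

```python
import numpy as np

def drop_transposed_copies(matrices):
    """Keep the first matrix of each group that is equal up to transposition."""
    seen = set()
    kept = []
    for m in matrices:
        m = np.asarray(m)
        # Keys for the matrix itself and for its transpose.
        keys = {(a.shape, a.dtype.str, a.tobytes()) for a in (m, m.T)}
        if seen.isdisjoint(keys):
            kept.append(m)
        seen |= keys
    return kept

a = np.array([[1, 2], [3, 4]])
b = a.T.copy()             # transposed copy of a
c = np.array([[5, 6, 7]])
d = c.T                    # transposed copy of c
s = np.eye(2, dtype=int)   # symmetric, its own transpose

unique = drop_transposed_copies([a, b, c, d, s])
```

Here `unique` keeps `a`, `c` and `s`, in the order they were first seen.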

sharp kettle
#

Do you guys recommend anything for learning Machine Learning? I'm trying Codecademy for K-Means clustering and I just don't understand it.
@lapis sequoia
Hi! Do you know this site:
https://towardsdatascience.com/complete-guide-to-data-visualization-with-python-2dd74df12b5e

Are you learning with scikit-learn?
https://scikit-learn.org/stable/search.html?q=KMEAN+CLUSTERING

rocky fjord
#

Hi,

Trying to get some outside perspective. I'm working on a project about housing prices. I am using a dataset that has 500 entries. With attributes of ('Monthly Mortage Payment', "Sq Ft", etc).

The question is, "How much monthly payment can one afford?" (Taking into account average income and debt).

I'm brainstorming ideas of how to answer it, and open for suggestions.

Restricted to (Pandas, Numpy, Seaborn, Matplotlib, and Scikit Learn).

gaunt slate
#

Hi,

Trying to get some outside perspective. I'm working on a project about housing prices. I am using a dataset that has 500 entries. With attributes of ('Monthly Mortage Payment', "Sq Ft", etc).

The question is, "How much monthly payment can one afford?" (Taking into account average income and debt).

I'm brainstorming ideas of how to answer it, and open for suggestions.

Restricted to (Pandas, Numpy, Seaborn, Matplotlib, and Scikit Learn).
@rocky fjord Have you tried linear regression?

bold olive
#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

Currently, I have this:


```python
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```


But this balances the dataset according to the cancer yes/no (column 2) label as y is defined that way. I want to perform the sampling with column 3 (gender) and then perform classification with 2.
lapis sequoia
#

How do you guys detect data drift while monitoring the quality of your models?

lapis sequoia
#

And in what cases do people run object classification algorithms on videos/images? For what purpose?

velvet thorn
#

And in what cases do people run object classification algorithms on videos/images? For what purpose?
@lapis sequoia lots of stuff.

#

are you talking about just simple classification?

#

i.e. into one of several mutually exclusive classes

#

e.g. DOG or CAT or HAMSTER

lapis sequoia
#

Like

#

Is there any tool out there that runs different models on videos/images at once?

#

For instance, I want to run a simple object classification model with tensorflow

#

Or pytorch with faster rcnn

velvet thorn
#

For instance, I want to run a simple object classification model with tensorflow
@lapis sequoia faster R-CNN isn't really

#

simple classification

lapis sequoia
#

Not a bad feature, but they should really make it in depth

velvet thorn
#

so basically

#

you're kinda looking for model orchestration

lapis sequoia
#

Also I am new to data science

velvet thorn
#

not really sure if there's a better term for it

lapis sequoia
#

Correct

velvet thorn
#

let's step back a bit

#

why do you want to do this?

lapis sequoia
#

I worded that poorly

#

So that I can use it in any of my projects

#

Idk if that exists

velvet thorn
#

hm

#

you're kinda asking for a lot

#

TBH

lapis sequoia
#

Yeah

#

I was wondering first if it existed

velvet thorn
#

each of those features exist

#

but all in one...I am not sure

lapis sequoia
#

Do you think it would help others out

#

if, besides data labelling, it had that next step of now running predictions on your video footages

#

That would be the best integration I would ever see

#

Could be completely wrong

odd yoke
#

what do you mean by "running predictions on your video footages" ?

lapis sequoia
#

Also, how do you guys monitor your models?

odd yoke
#

as in, what's special about video here

lapis sequoia
#

I am poorly wording all of this, I apologize for that. Meaning, running classification models on the videos

odd yoke
#

Also, how do you guys monitor your models?
at my current workplace, we have a dashboard that shows our metrics along with the pictures that were last taken, we can specify timeframes and stuff, but it's all really basic stuff

#

the hardest part is making the frontend pretty to be fair

lapis sequoia
#

Someone told me that they had to redeploy models

#

to get rid of data drift??

odd yoke
#

and to ensure the model is still relevant, we run campaigns every N months

#

yes, that's a problem

lapis sequoia
#

Oh wow every N months

odd yoke
#

you have to re-annotate every N <time unit> to ensure the data can still be represented by the algorithm

lapis sequoia
#

Are you guys alerted whenever the quality goes poor?

odd yoke
#

no, we can't know whether or not it goes bad

lapis sequoia
#

Ohhh

odd yoke
#

hence why we have to manually monitor, our annotation process is also extremely tedious, so we can't do it continuously

#

our clients don't want to hire annotators

lapis sequoia
#

So how can you guys conclude certain decisions,

#

Oh I see

#

And if you mind me asking, but do you work as an ML engineer or which side are you on?

#

So I am aware of the perspective speaking, because

odd yoke
#

My title is a bit unclear, but I guess that would be close to ML engineer ?

#

My official title is something along the lines of "Image Processing Engineer" so not very helpful

lapis sequoia
#

Ah I see, and why dont companies use Sagemaker's model monitoring for their companies?

#

Because I heard some do but some don't. Is there a reason behind that?

odd yoke
#

I'm not experienced with sagemaker, what does the model monitoring aspect of it do ?

lapis sequoia
#

I honestly just heard about it today earlier. I was speaking to a data analyst, and she said that for her job, she deploys models on Amazon's sagemaker

#

And because she does not write large python scripts, she can easily mimic data scientists' tools using SageMaker

#

Hence then I asked, how she monitored the quality constantly. She told me that Sagemaker has that feature?? I may be wrong

#

Don't know if the perspective was widely ranged because she was a data analyst at a consulting firm, so I cannot tell

odd yoke
#

sagemaker is probably very useful, it can be used for any part in the pipeline: annotation, analysis, training, verification, deployment
tho it's only for very generic problems last we checked, and it was very hard to customize models and stuff iirc

#

I'm only speaking from what my colleague that was supposed to explore sagemaker told us

lapis sequoia
#

Ohh, does Sagemaker benchmark models and automate many from once?

#

*at once

odd yoke
#

probably, but I guess you pay per model

#

or per resource used

lapis sequoia
#

Oh wow

odd yoke
#

amazon's ground truth was bad

#

that put us off directly to be honest

lapis sequoia
#

SHE TOLD ME ABOUT THAT

#

Oh sorry for the caps

odd yoke
#

i thought you were cheeringly agreeing with me lol

lapis sequoia
#

But she told me how it cannot automate classifications for training data

#

If I am not wrong on what it does

#

and many customers requested that feature

odd yoke
#

it's for annotating data

#

and managing data in general

lapis sequoia
#

I assumed Amazon would have such a service by that point

odd yoke
#

like creating versions and stuff

lapis sequoia
#

Ohhh

odd yoke
#

there are many annotation tools, but like, every single one we tried was missing something

#

so we made our own

lapis sequoia
#

Also, if your company was able to detect and get alerted by data drift, how impactful would it be to the overall decision making and makeup?

#

Oh thats smart

odd yoke
#

the impact would be huge

#

for the projects where it matters

#

i feel like it's either a non-issue, or it's crippling

lapis sequoia
#

That they dont even monitor carefully and all they do is ask their ML engineers to redeploy

odd yoke
#

that's what we do really

#

give me a minute, brb

lapis sequoia
#

Who makes the decision for you guys to deploy? Is it just by an automated timer?

#

Okay

odd yoke
#

so, we have clauses in the contract with our client that says we include in the product the price of maintenance, this includes going over to the site collecting data every N months (depends on the project, the client, etc) to evaluate the existing models and see if they need to relearn on new data

lapis sequoia
#

Ohh, and just for background, do you work at a large enterprise company or for a firm for private clients?

odd yoke
#

it's very large

lapis sequoia
#

Ohh okay

odd yoke
#

but we're a research branch, so we're a small team

lapis sequoia
#

So that is super interesting how they wait for a certain time period

#

Do people just put it aside??

odd yoke
#

put what aside ?

lapis sequoia
#

Model

#

quality

odd yoke
#

i'm certain some do yea

#

it's one of the annoying parts of ML that you only see after a long time

lapis sequoia
#

A lot of others have been telling me about that same sort of issue

#

I never thought that other people experienced it

odd yoke
#

i'm sure many ML solutions out there don't check for quality over time, either from ignorance, laziness, or malice

lapis sequoia
#

Why would you say malice?

bronze skiff
#

i think in a lot of fields monitoring data/model quality is important

#

ie distributional shifts leading to biased predictions, etc.
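
As a rough illustration of checking for a distributional shift without fresh labels, here is a sketch of the Population Stability Index (PSI) on a single numeric feature. It is one common heuristic, not what anyone in this chat said they use. The function name, bin count and synthetic data are assumptions.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference sample and a newer sample of one feature."""
    # Bin edges are quantiles of the reference data, so each reference
    # bin holds roughly the same share of points.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Use interior edges only: values outside the reference range fall
    # into the first or last bin.
    ref_idx = np.searchsorted(edges[1:-1], reference, side="right")
    cur_idx = np.searchsorted(edges[1:-1], current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=bins) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=bins) / len(current)
    # Clip to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)  # what the model was trained on
same_dist = rng.normal(0.0, 1.0, 5000)      # new data, no drift
shifted = rng.normal(1.0, 1.0, 5000)        # new data, mean has moved

psi_same = population_stability_index(train_feature, same_dist)
psi_shifted = population_stability_index(train_feature, shifted)
```

A frequently quoted rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but those cut-offs are conventions, not guarantees, and input drift does not by itself prove the model's accuracy has dropped.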

odd yoke
#

you can promise a product, develop and ship it, then forget about it, because you know that's where the real challenge is

lapis sequoia
#

And how does one check the quality of a model over time?

odd yoke
#

and you still got the client's money

lapis sequoia
#

Like store information of that model

bronze skiff
#

for example, our fully productionized pipelines run on kubeflow set to a job that triggers during periodic data ingests

lapis sequoia
#

Just save it over time

#

That is just crazy to me how no one yet fixed this

#

So it persists in large companies, but people havent yet found a way to just simply manage their model's infrastructure over time

#

Because this isnt the first I heard about this

bronze skiff
#

why do you say that? there are a shit ton of tools that allow you to trigger retrainings based on shifting metrics

#

it's not a new problem, and it has ways to go about it

odd yoke
#

i would say that isn't the problematic factor

#

the problem is getting said metrics

lapis sequoia
#

OHH

#

YEAH

#

Sorry my caps go off

#

But I do agree

bronze skiff
#

yeah, but i would say that's domain specific

odd yoke
#

completely

#

and that's the issue

bronze skiff
#

the MLops solutions are there

odd yoke
#

it needs to be tailored per project

bronze skiff
#

i mean, i use them all the time

#

haha

#

but it depends on what you're doing

lapis sequoia
#

I heard someone also say something about the metrics

#

beginner here . how to install tenforlow

#

I dont know if I heard it correctly from someone. But they did say it was about receiving the metrics as well

#

tamserlow

bronze skiff
#

try again

lapis sequoia
#

But I didnt know exactly what they meant

#

tensorflow