royal thunder Oct 2, 2020, 8:42 AM

#

oh great thanks

outer geyser Oct 2, 2020, 8:43 AM

#

they got calculus 2 also about 7 hours course

royal thunder Oct 2, 2020, 8:43 AM

#

wondering how many days I need to become good at machine learning

#

i had my friends say like 3 months and more

#

thanks for maths man

dawn turtle Oct 2, 2020, 10:47 AM

#

I suppose this is more of a software best-practices question, but why doesn't numpy use some sort of abstract representation and only evaluate the expression once the result is actually needed (without an @ operation, or maybe there are even more efficiency gains you could make with this knowledge)? That would allow numpy to always minimize the complexity. Or maybe there is already a wrapper around array that does this?

lapis sequoia Oct 2, 2020, 10:55 AM

#

idk what exactly ur asking but hope someone answers ur q my man

#

yo so idk if anyone here remembers but im working on a gan

#

i been trying to run it

#

i get this error

#

```
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-16-41501876ca3c> in <module>()
     15 
     16 for example_input, example_target in t_in.take(1):
---> 17   generate_images(generator, example_input, example_target)
     18 
     19 EPOCHS = 150

7 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Index out of range using input dim 0; input has only 0 dims [Op:StridedSlice] name: strided_slice/
<Figure size 1080x1080 with 0 Axes>
```

#

here's the code with the model and train step https://pastebin.com/dA6dS27C


digital zenith Oct 2, 2020, 11:10 AM

#

hey

#

i have an issue

#

hlo

#

anyone

#

nvr mind

lapis sequoia Oct 2, 2020, 11:15 AM

#

watup

#

@digital zenith

digital zenith Oct 2, 2020, 11:22 AM

#

hey

#

uh

#

i wanna make an AI

lapis sequoia Oct 2, 2020, 11:23 AM

#

yes

digital zenith Oct 2, 2020, 11:23 AM

#

for that i need a speech recognition module

#

the current one is not compatible with python3.8

#

because pyaudio stopped at 2017

lapis sequoia Oct 2, 2020, 11:23 AM

#

so use python 3.7?

digital zenith Oct 2, 2020, 11:23 AM

#

no it is only compatible with 3.6

lapis sequoia Oct 2, 2020, 11:23 AM

#

u can just use that lol

digital zenith Oct 2, 2020, 11:24 AM

#

is there any other modules

#

or should i downgrade

lapis sequoia Oct 2, 2020, 11:24 AM

#

i just looked this up https://pypi.org/project/SpeechRecognition/

PyPI

SpeechRecognition

Library for performing speech recognition, with support for several engines and APIs, online and offline.

#

never used it tho

digital zenith Oct 2, 2020, 11:24 AM

#

yeah

#

this one needs pyaudio

lapis sequoia Oct 2, 2020, 11:24 AM

#

oh :/

#

https://medium.com/towards-artificial-intelligence/creating-a-voice-recognition-application-with-python-57d8c3e55256

Medium

Creating a Voice Recognition Application with Python

Put your audio files and speeches into text with Python.

digital zenith Oct 2, 2020, 11:25 AM

#

so i guess i should downgrade

lapis sequoia Oct 2, 2020, 11:25 AM

#

this need pyaudio as well?

digital zenith Oct 2, 2020, 11:25 AM

#

yeah

#

i've seen this page

lapis sequoia Oct 2, 2020, 11:25 AM

#

ye

#

probably ur best op

digital zenith Oct 2, 2020, 11:25 AM

#

hmm

#

thank you

lapis sequoia Oct 2, 2020, 11:27 AM

#

Hey guys, just wondering how I could make a function to check if these cities are present in the 'Kommun_name' column:
("Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro")

I did a test to see that it works by checking if 'Haparanda' exists which it does as you can see

📎 Screenshot_2020-10-02_at_13.24.14.png

native ridge Oct 2, 2020, 11:29 AM

#

df.Kommun_name.is_in(YOUR_TUPLE)

lapis sequoia Oct 2, 2020, 11:29 AM

#

Nice!

native ridge Oct 2, 2020, 11:31 AM

#

#gladtohelp

lapis sequoia Oct 2, 2020, 11:31 AM

#

Thanks!

#

#gladtohelp
@native ridge Like this? ```#Tuple with all the cities that already has an Ikea store
ikea_stores = ("Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro")

df_dropped.Kommun_name.is_in(ikea_stores)```

#

No

📎 Screenshot_2020-10-02_at_13.34.00.png

native ridge Oct 2, 2020, 11:36 AM

#

Sorry, try isin, without the "_".
Hard to remember every single name...

#

This should return a Series of bool, with which you can index the DataFrame.

velvet thorn Oct 2, 2020, 11:37 AM

#

yup, it's isin

native ridge Oct 2, 2020, 11:37 AM

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html

velvet thorn Oct 2, 2020, 11:37 AM

#

also, try to use a set ({}) with isin

#

{"Borlänge", "Gävle", "Göteborg", "Haparanda", "Helsingborg" , "Jönköping", "Kalmar", "Karlstad", "Linköping", "Malmö", "Stockholm", "Sundsvall", "Uddevalla", "Umeå", "Uppsala", "Västerås", "Älmhult", "Örebro"}

lapis sequoia Oct 2, 2020, 11:38 AM

#

No worries, thanks. It worked but is there a more intuitive way so that it outputs just like the above?

📎 Screenshot_2020-10-02_at_13.37.18.png

#

So that the output shows the complete row for each city, like it did with 'Haparanda'

velvet thorn Oct 2, 2020, 11:38 AM

#

filter on it

#

df_dropped[df_dropped.Kommun_name.isin(ikea_stores)]

#

exactly as you did in the previous cell

#

if you take the df[] out of it

#

you'll see that it also returns a boolean Series
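
The filtering step described above can be sketched roughly like this. It is a minimal example on a toy frame: only the `Kommun_name` column and the `ikea_stores` name come from the chat, while the `value` column and the shortened city set are placeholders made up for illustration.

```python
import pandas as pd

# Toy stand-in for df_dropped; the "value" column is a placeholder.
df_dropped = pd.DataFrame({
    "Kommun_name": ["Haparanda", "Kiruna", "Malmö", "Luleå"],
    "value": [1, 2, 3, 4],
})
ikea_stores = {"Haparanda", "Malmö", "Stockholm"}  # shortened set for the demo

# isin returns a boolean Series with one True/False per row
mask = df_dropped["Kommun_name"].isin(ikea_stores)

# Indexing the frame with that Series keeps only the rows marked True
matches = df_dropped[mask]

print(mask.tolist())
print(matches)
```

Keeping the mask in its own variable makes it easy to inspect the boolean Series first and filter afterwards.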

lapis sequoia Oct 2, 2020, 11:39 AM

#

Sweet!

#

Thank you all so much!

#

Got what I needed.

#

Thanks for the knowledge 🙂

lapis sequoia Oct 2, 2020, 12:06 PM

#

I want to create a Has_store variable and assign the value 1 to cities that have an Ikea store and 0 otherwise. The cities in my tuple are the ones that have an Ikea store

📎 Screenshot_2020-10-02_at_14.04.09.png

#

Should I create a conditional function for this or what is the best way?

#

It sets all to yes now, what is the problem here?

📎 Screenshot_2020-10-02_at_14.10.06.png

#

your condition checks if the name is equal to the whole tuple

#

not if the name is in the tuple

#

how do I check if the values in the tuple are present

#

in operator

#

and if that's the only purpose of that tuple you get a bit of performance by making it a set
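
A rough sketch of the `Has_store` idea, assuming a toy frame with only the `Kommun_name` column (the real data is in the screenshots, so the rows here are placeholders). It shows the vectorized `isin` route and the row-by-row `in` operator route side by side.

```python
import pandas as pd

df = pd.DataFrame({"Kommun_name": ["Haparanda", "Kiruna", "Malmö"]})  # toy rows
ikea_stores = {"Haparanda", "Malmö", "Stockholm"}  # a set, shortened for the demo

# Comparing a name with == against the whole collection is not a membership test.
# isin checks each name against the set and gives True/False, which casts to 1/0.
df["Has_store"] = df["Kommun_name"].isin(ikea_stores).astype(int)

# Same result, one row at a time, with the `in` operator
df["Has_store_apply"] = df["Kommun_name"].apply(lambda name: int(name in ikea_stores))

print(df)
```

On larger frames the `isin` version is usually the better default, since it avoids calling a Python function once per row.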

#

📎 Screenshot_2020-10-02_at_14.19.24.png

royal thunder Oct 2, 2020, 12:58 PM

#

how to make a dataset through webpages?

eternal fractal Oct 2, 2020, 1:10 PM

#

how to make a dataset through webpages?
@royal thunder do you mean scraping data from web ?

royal thunder Oct 2, 2020, 1:12 PM

#

yeah like that

#

here is an example

#

https://data.oecd.org/

theOECD

OECD data

Find, compare and share OECD data.

#

i wanna scrape some data from that website and make some csv file tho

fast plover Oct 2, 2020, 1:14 PM

#

@paper niche that was the ticket, thank you.

eternal fractal Oct 2, 2020, 1:15 PM

#

https://www.dataquest.io/blog/web-scraping-beautifulsoup/
this site might help .. though it uses beautifulsoup

Dataquest

Tutorial: Web Scraping and BeautifulSoup – Dataquest

This intermediate tutorial teaches you use BeautifulSoup and Python to collect data from multiple pages on IMDB using a technique called web scraping.

#

it teaches you to scrape data from web and make a visual representation of the data you scraped

#

if you got your dataframe
you may do this:
dataframe.to_csv("data.csv",sep=',',index=False)
this should get you your csv file

royal thunder Oct 2, 2020, 1:19 PM

#

thanks man

lapis sequoia Oct 2, 2020, 1:30 PM

#

Hi guys,
I have a time series and I'm trying to figure out which model to use for forecasting it. I've run several analysis techniques and concluded that the series is stationary and normally distributed, but I can't tell what the right forecasting model would be. Here are pictures of my seasonal decomposition, ACF and PACF:

📎 Capture_decran_2020-10-02_a_12.31.40.png

#

📎 Capture_decran_2020-10-02_a_12.31.46.png

#

📎 Capture_decran_2020-10-02_a_12.31.55.png

#

Looking at these graphs, what is the first conclusion that comes to mind? Can we say that a moving average is a good model here?

#

thanks in advance for ur help 🙂

#

(I know for most of you this is pretty basic but not for me so any help or insight to put me in the right path is highly appreciated py_guido )

tidal sonnet Oct 2, 2020, 2:18 PM

#

@tidal sonnet how did you find the values of a, b, c to be 3, -1/2, and 0?
@heady hatch They were from a previous question, which I got correct, then they said to take the answer and plug them into the [a,b, c], then get the echelon form :(

slender nymph Oct 2, 2020, 2:38 PM

#

hi good morning

#

a little question: how can i select a title of a column without using the name column

#

thats the dataset

📎 unknown.png

#

i want select 'BTC Returns' as string

#

maybe iloc?

#

df.iloc[:0, 5:6]

#

but can i select it as string

#

how*

heady hatch Oct 2, 2020, 3:03 PM

#

@slender nymph Hey I'm not too sure what you're asking for.

Are you looking to grab one particular cell as a string? Or did you want them as a list of strings?

#

@tidal sonnet

from what I've found, A in REF ended up to be

A = [
      [1, 1, 1],
      [0, 1, 2],
      [0, 0, 1]
    ]

Then I would probably manipulate S the way A was manipulated.

I'm not super sure where a, b, c is supposed to come in.

lapis sequoia Oct 2, 2020, 3:30 PM

#

can anyone suggest any good machine learning courses?

tidal sonnet Oct 2, 2020, 3:38 PM

#

Interesting... Can you explain the method you used?

keen prism Oct 2, 2020, 3:40 PM

#

https://hatebin.com/hortmvvtid
anyone know why i can't install transformers? i could use a bit of help... thanks.

📎 unknown.png

eternal fractal Oct 2, 2020, 3:56 PM

#

try running cmd as admin, had the same problem before when I installed scrapy, got it installed when I ran cmd as admin

#

a little question: how can i select a title of a column without using the name column
@slender nymph col_headers = dataframe.head()
try this

#

i use that for selecting column headers of csv
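
For the column-title question, a minimal sketch: `head()` returns the first rows as a DataFrame, while the names themselves live in `df.columns`. The layout below is hypothetical; only the `'BTC Returns'` name and its position 5 (from the `iloc` attempt) come from the chat.

```python
import pandas as pd

# Hypothetical column layout; only "BTC Returns" at position 5 is from the chat.
df = pd.DataFrame(columns=["Date", "Open", "High", "Low", "Close", "BTC Returns"])

# The column labels are an Index; positional indexing gives a plain string
name_by_position = df.columns[5]

# The iloc slice from the chat also works if you read the label off the result
name_from_slice = df.iloc[:0, 5:6].columns[0]

# All labels as a regular list of strings
all_names = list(df.columns)

print(name_by_position, name_from_slice, all_names)
```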

heady hatch Oct 2, 2020, 4:04 PM

#

Ahh that makes a lot of sense that they want the column name. hahaha I thought they wanted the values from the column.

#

@tidal sonnet

Yea so I started with

A = [
      [1, 1, 1],
      [3, 2, 1],
      [2, 1, 2]
    ]

Then from there,
-3 * first row + second
-2 * first + third

then keep going from there to reach REF.

tidal sonnet Oct 2, 2020, 4:30 PM

#

You can do it multiple times??

#

so that would give you

```
A = [
  [1, 1, 1],
  [0, -1, -2],
  [0, -1, 0]
]
```

#

That's what i got as well, difference being I had multiplied first row by 3 and 2 and subtracted it from the others. But where did you go next?

#

seeing that in row 3, the [a] and [c] are both the same? @heady hatch ?

heady hatch Oct 2, 2020, 5:28 PM

#

@tidal sonnet and then I

multiplied -1 * second row + third
then divide third by 2.
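
The row operations above can be replayed in NumPy as a sanity check. This is only a sketch, and it assumes that S = [15, 28, 23] from later in the chat is the right-hand side of the system, so it is carried along as a fourth column.

```python
import numpy as np

# Augmented matrix [A | S]; treating S as the right-hand side is an assumption.
M = np.array([[1, 1, 1, 15],
              [3, 2, 1, 28],
              [2, 1, 2, 23]], dtype=float)

M[1] += -3 * M[0]   # -3 * first row + second  -> [0, -1, -2 | -17]
M[2] += -2 * M[0]   # -2 * first row + third   -> [0, -1,  0 |  -7]
M[2] += -1 * M[1]   # -1 * second row + third  -> [0,  0,  2 |  10]
M[2] /= 2           # divide third row by 2    -> [0,  0,  1 |   5]
M[1] *= -1          # flip sign of second row  -> [0,  1,  2 |  17]

# Back substitution, from the last row upwards
x3 = M[2, 3]
x2 = M[1, 3] - M[1, 2] * x3
x1 = M[0, 3] - M[0, 1] * x2 - M[0, 2] * x3
solution = np.array([x1, x2, x3])

print(M)
print(solution)
```

Each row operation is applied to the whole augmented row, which is the "manipulate S the way A was manipulated" step.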

tidal sonnet Oct 2, 2020, 5:54 PM

#

[0, 1, 2],
[0, 0, 2]

#

MY GOSH

#

@heady hatch THAT'S SO COOL

#

i didn't know that you didn't HAVE to use the first row

#

thank youuuuuuuuu

heady hatch Oct 2, 2020, 6:02 PM

#

hahaha glad to be of help.

keen prism Oct 2, 2020, 6:18 PM

#

@eternal fractal oh i didn't think about that

#

@eternal fractal https://hatebin.com/qkqzgoqemd
no success

tidal sonnet Oct 2, 2020, 6:36 PM

#

Did I do this correctly? @surreal ingot
If so, how can i get rid of the negatives that I get out for S?

📎 unknown.png

#

Using Back Substitution

#

:(

📎 unknown.png

#

I am genuinely confused

heady hatch Oct 2, 2020, 6:41 PM

#

I think you might have gotten the wrong @heady hatch . hahaha

#

On the other hand, the first S should be [15, -17, -7] I think.

#

Because 15 * -2 + 23 = -7.

#

@tidal sonnet

tidal sonnet Oct 2, 2020, 6:43 PM

#

yea it's -7, i realized that just now

#

But I also can't figure out where the (r) is supposed to come in

heady hatch Oct 2, 2020, 6:43 PM

#

I can't do math myself.

#

Hmm.

#

What do you mean by r?

tidal sonnet Oct 2, 2020, 6:44 PM

#

And this is correct

📎 unknown.png

#

then they tell me to take this answer, and plug it back into question 1

#

So i end up with something like

A =[[1, 1, 1], [3, 2, 1], [2, 1, 2]]
r = [3, -0.5, 0]
S = [15, 28, 23]

heady hatch Oct 2, 2020, 6:45 PM

#

Hmm. let me think.

#

Yea same.

#

hahaha

#

Can you give me the problem in sequence. Like in screenshots? I might be able to better give some suggestions.

tidal sonnet Oct 2, 2020, 6:47 PM

#

🤔

heady hatch Oct 2, 2020, 6:48 PM

#

Because I feel like I'm getting bits and pieces and can't really connect back together.

tidal sonnet Oct 2, 2020, 6:48 PM

#

i'm getting different numbers since i fixed me having -8 instead of -7

📎 unknown.png

#

so it's actually looking like it's start to make sense... a bit

heady hatch Oct 2, 2020, 6:49 PM

#

Okay cool cool cool.

tidal sonnet Oct 2, 2020, 6:49 PM

#

:o

📎 unknown.png

#

15- 12 = 3 💀 not 2

heady hatch Oct 2, 2020, 6:50 PM

#

:^)

tidal sonnet Oct 2, 2020, 6:50 PM

#

Would I have to reflect the same change in the matrix?

#

changing it to
identity?

heady hatch Oct 2, 2020, 6:51 PM

#

Uh what do you mean?

#

Isn't A the matrix?

tidal sonnet Oct 2, 2020, 6:51 PM

#

Yea

#

that's what I meant...
like if i'd have to set it to

```
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

heady hatch Oct 2, 2020, 6:52 PM

#

Hm

#

That's RREF.

#

If it's only asking for REF, then I don't think so.

tidal sonnet Oct 2, 2020, 6:52 PM

#

they wanted me to solve it 😁
THANK YOU SOOO MUCH

#

i've been on this practice quiz for 3 days now :(

heady hatch Oct 2, 2020, 6:53 PM

#

Hey, we've learned something new today!

tidal sonnet Oct 2, 2020, 6:53 PM

#

I checked the thing just now...

📎 unknown.png

heady hatch Oct 2, 2020, 6:53 PM

#

hahaha congratulations.

tidal sonnet Oct 2, 2020, 6:53 PM

#

thank you alot m8

heady hatch Oct 2, 2020, 6:53 PM

#

Glad to be of help.

tidal sonnet Oct 2, 2020, 6:53 PM

#

i couldn't figure out at all how to get row 3...
but now i know that you don't have to use the first row

#

something else i'm curious about
If I hadn't multiplied by a negative scalar and added
but instead multiplied by a positive scalar and subtracted, would that have been the same?

heady hatch Oct 2, 2020, 6:54 PM

#

It is the same.

#

You can think of it as 2 - 3 => 2 + (-3).

#

I prefer the add negative notation just to keep things uniform.

tidal sonnet Oct 2, 2020, 6:55 PM

#

Ah

#

Thank you 🙇🏿‍♂️

#

I still wonder where r is supposed to come in 🤔

slender nymph Oct 2, 2020, 7:18 PM

#

hello data scientists. has anyone done an OLS regression without the statsmodels module, using only numpy and pandas?

spice marten Oct 2, 2020, 7:21 PM

#

hello could somebody help me with scraping a web page? Basically I am trying to get this picture off a website but it's labeled as an event, which I think means that some JavaScript is being executed or something, so Beautiful Soup doesn't read it. Any ideas on what to do?

tidal sonnet Oct 2, 2020, 7:48 PM

#

📎 unknown.png

#

But i'm not sure where I went wrong

#

I tried finding the inverse...
They say that the above answer is the right one

#

But this is what I got out

📎 unknown.png

old thorn Oct 2, 2020, 8:07 PM

#

ok

past raptor Oct 2, 2020, 8:07 PM

#

Trying to do a histogram with array([13., 23., 33., 48., 52., 48., 33.]).
So every element is one column. Instead, I get these elements sorted to their numerical value.
How do I fix this?

📎 unknown.png

old thorn Oct 2, 2020, 8:08 PM

#

let me think for a second its been a while since I've worked with this

past raptor Oct 2, 2020, 8:08 PM

#

Alright, thanks sir

old thorn Oct 2, 2020, 8:08 PM

#

no need to call me sir haha im only a teenager

past raptor Oct 2, 2020, 8:08 PM

#

hahaha alrighty

old thorn Oct 2, 2020, 8:08 PM

#

are you using google colab or what?

past raptor Oct 2, 2020, 8:09 PM

#

jupyter

old thorn Oct 2, 2020, 8:09 PM

#

hmm i never used jupyter but its similar to colab i think

#

what was your code for this line?

past raptor Oct 2, 2020, 8:10 PM

#

```
ax1.hist(DS, density=True)
```

#

DS stands for the array

#

density, I tried turning off and on

old thorn Oct 2, 2020, 8:10 PM

#

ok and what do you want this to be

past raptor Oct 2, 2020, 8:10 PM

#

It seems like the axis is the problem

#

Oh i want

old thorn Oct 2, 2020, 8:11 PM

#

yeah what do u want the histogram to represent

past raptor Oct 2, 2020, 8:11 PM

#

so every element in the array has to represent one column

#

if I have an array of all 10, then all the columns have to be same height of 10

#

Is this clear?

old thorn Oct 2, 2020, 8:11 PM

#

yes, I think I understand what you're asking

#

sorry I just haven't done these in a while, almost a year

past raptor Oct 2, 2020, 8:12 PM

#

Oh, i see, well if u think u cant help me, dont worry

#

but if u have any hints at least of how to tweak

old thorn Oct 2, 2020, 8:13 PM

#

no no I think I can it will take me just a bit to remember some things

past raptor Oct 2, 2020, 8:13 PM

#

alrighty

old thorn Oct 2, 2020, 8:13 PM

#

I might not be able to give the solution but I could definitely point you in the right direction

past raptor Oct 2, 2020, 8:13 PM

#

thats more than enough

old thorn Oct 2, 2020, 8:15 PM

#

I believe it might be a problem with the axis because your array seems to be only for the x - axis, you might have to make it to where the y -axis has the same array as your x - axis if you want the histogram to have the same height as its location on the x - axis

#

did that make sense?

#

I don't know if that is correct though

past raptor Oct 2, 2020, 8:16 PM

#

let me digest that

old thorn Oct 2, 2020, 8:17 PM

#

yeah go ahead, I am not the best at clearly explaining stuff but if you need clarity go ahead and ask

past raptor Oct 2, 2020, 8:17 PM

#

oh, I think i get it. Because there is no linearity (correlation) between the two variables, the y axis is misrepresented

#

thus, showing that funny 0.00 to 0.07 value

#

on the y-axis

old thorn Oct 2, 2020, 8:17 PM

#

yup

past raptor Oct 2, 2020, 8:17 PM

#

oh, let me tweak on that, thanks mate

old thorn Oct 2, 2020, 8:17 PM

#

I don't know why you got 0.00 - 0.07

past raptor Oct 2, 2020, 8:18 PM

#

that was given by the program

#

but im trying to look for the y axis parameter

#

but cant find

old thorn Oct 2, 2020, 8:18 PM

#

hmm well try tweaking around with the axis and if u need anymore help just ping me in this channel

past raptor Oct 2, 2020, 8:19 PM

#

uhum, thanks mate

past raptor Oct 2, 2020, 8:41 PM

#

@old thorn Hey, couldnt really find a way through this.. Is there more that you know?
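
One possible way to get "every element is one column" is a bar chart instead of a histogram. `hist` counts how many values fall into each bin, and `density=True` rescales those counts so the area sums to 1, which would explain small fractions on the y-axis. The sketch below assumes `DS` is the array from the question and uses the element position as the x value.

```python
import matplotlib.pyplot as plt
import numpy as np

DS = np.array([13., 23., 33., 48., 52., 48., 33.])

fig, ax1 = plt.subplots()

# One bar per element: x is the element's position, height is its value
positions = np.arange(len(DS))
bars = ax1.bar(positions, DS)

ax1.set_xticks(positions)
ax1.set_xlabel("element index")
ax1.set_ylabel("value")
```

With an array of all 10s, this draws bars that are all 10 high, which matches the behaviour described in the question.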

keen root Oct 2, 2020, 8:41 PM

#

Hi, I need some advice: I want to try to train a very (VERY) simple network, a simple perceptron, and for that there is an analytical solution, which involves the Penrose Pseudo inverse. However, my input data is a bunch of binary strings like "00010111". Now, calculating the that inverse through np.linalg.pinv(X_train) gives me sometimes a convergence error, but if I run it a second time then that error does not appear (no idea why). But if on the other hand I decide to build a keras mode like this:

model_y=keras.models.Sequential([keras.layers.Dense(100, activation="relu",input_shape=[8]),
                                keras.layers.Dense(1, activation="linear", name="l3")])
model_y.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss='mse', metrics=[tf.keras.metrics.RootMeanSquaredError()])

I get "no learning" at all. My first guess is that this has something to do with the fact that my input consists of binary data, but does anyone have any ideas what can be done?

📎 unknown.png

heady hatch Oct 2, 2020, 9:54 PM

#

Hmm How come you chose 100 units for the first layer?

keen root Oct 2, 2020, 10:00 PM

#

oh, my bad, that was a mistake. It should only have the 1 neuron layer with the input shape specified

heady hatch Oct 2, 2020, 10:04 PM

#

Ahh okay okay. What's the input/input shape?

keen root Oct 2, 2020, 10:07 PM

#

the input are binary integers in the neurons like an array of [0,1,1,0,0,1,1,0] and the output will be a continuous variable

#

its a regression problem

heady hatch Oct 2, 2020, 10:13 PM

#

That's what I was thinking of too. Any reason why you're using relu?

keen root Oct 2, 2020, 10:14 PM

#

no reason at all. I'm way too inexperienced for it

lapis sequoia Oct 2, 2020, 10:14 PM

#

to predict an answer to something i should just take the mean?

heady hatch Oct 2, 2020, 10:15 PM

#

That is one form of prediction.

lapis sequoia Oct 2, 2020, 10:16 PM

#

what others are there?

heady hatch Oct 2, 2020, 10:19 PM

#

Depends on the context, let's say given some data you want to find some form of predictor for this set of data.

You can choose mean or median.

#

or maybe even mode.

lapis sequoia Oct 2, 2020, 10:19 PM

#

wouldn't mean be the same as mode in the context of YES or NO?

#

and how would median be relevant for prediction?

heady hatch Oct 2, 2020, 10:20 PM

#

So prediction without any kind of context is vague.

#

Could you clarify what you mean by prediction?

lapis sequoia Oct 2, 2020, 10:22 PM

#

like say with a given age, the program tells you if it's more likely the person will say yes or no to something

#

like idk... do you have a bedtime?

#

i'm not working on a project. just trying to understand the basics

heady hatch Oct 2, 2020, 10:23 PM

#

@keen root I think we can start really simple.

Just maybe something simple like

model = Sequential()
model.add(Dense(1, activation='linear', input_dim=input_len))
model.compile(...)

I don't know if this will work or not. Would start simple.

#

Yea of course. @lapis sequoia

So in your example, the prediction would be a yes or a no.

#

And I'm assuming what you're basing that prediction is on some stats or measurement, like mean?

lapis sequoia Oct 2, 2020, 10:26 PM

#

yeah mean ig

#

but wouldn't mean just get the same result as mode? in that situation

heady hatch Oct 2, 2020, 10:28 PM

#

Depends on your data.

past raptor Oct 2, 2020, 10:28 PM

#

If I have a 4x4 np.array and I want to add all of the rows horizontally so that I end up with a single column, what is a way to do it?

[[69,0,86,8],           
[45,52,87,29],
[42,38,81,43],
[63,73,60,0]]
to
[[163]
[213]
[204]
[196]]

heady hatch Oct 2, 2020, 10:29 PM

#

Let's say your data is 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 4.

I don't think the mean is the same as the mode here.
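
A quick check of that example with Python's standard `statistics` module. The data is the sequence from the message above; nothing else is assumed.

```python
from statistics import mean, median, mode

# 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 4
data = [1] * 8 + [2] * 5 + [3] * 4 + [4]

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # middle value of the sorted data
data_mode = mode(data)      # most frequent value

print(data_mean, data_median, data_mode)
```

Here the most frequent value is 1 while the average sits closer to 2, so the mean and the mode give different answers for this data.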

lapis sequoia Oct 2, 2020, 10:29 PM

#

that is true but what i am saying is data will either be 1 or 2

keen root Oct 2, 2020, 10:29 PM

#

@keen root I think we can start really simple.

Just maybe something simple like
model = Sequential()
model.add(Dense(1, activation='linear', input_dim=input_len))
model.compile(...)
I don't know if this will work or not. Would start simple.
@heady hatch I've tried it before, though I'll try it again

heady hatch Oct 2, 2020, 10:29 PM

#

@past raptor Are you using regular python or some library like NumPy?

lapis sequoia Oct 2, 2020, 10:29 PM

#

as in yes or no

past raptor Oct 2, 2020, 10:29 PM

#

numpy sir

keen root Oct 2, 2020, 10:30 PM

#

My main concern is the fact that there is binary data at the entry. Is there anything special about it? Would I have to have any special care?

heady hatch Oct 2, 2020, 10:30 PM

#

@past raptor You can do np.sum(data, axis=1)
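
A small sketch of that call on the array from the question. Plain `axis=1` returns a flat array of row sums; passing `keepdims=True` is one way to keep the result as a single column.

```python
import numpy as np

data = np.array([[69, 0, 86, 8],
                 [45, 52, 87, 29],
                 [42, 38, 81, 43],
                 [63, 73, 60, 0]])

# axis=1 adds across each row; the result has shape (4,)
row_sums = np.sum(data, axis=1)

# keepdims=True keeps the summed axis with length 1, giving shape (4, 1)
column = np.sum(data, axis=1, keepdims=True)

print(row_sums)
print(column)
```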

past raptor Oct 2, 2020, 10:30 PM

#

Let me try that!

heady hatch Oct 2, 2020, 10:31 PM

#

@lapis sequoia Ahh okay so like 1 = yes and 2 = no?

So what would the mean represent?

lapis sequoia Oct 2, 2020, 10:31 PM

#

the same as mode no?

heady hatch Oct 2, 2020, 10:31 PM

#

@keen root I think once you've changed it into an array representation, it's not in binary format anymore. Instead it is an array/tensor of numbers.
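
To illustrate the point, here is a rough sketch that turns bit strings into a numeric array and fits a single linear unit analytically with the pseudo-inverse. The helper `bits_to_array`, the 256 generated strings, and the made-up weights are all assumptions for the demo, not the data from the question.

```python
import numpy as np

def bits_to_array(bit_string):
    """Hypothetical helper: '00010111' -> array([0., 0., 0., 1., 0., 1., 1., 1.])."""
    return np.array([int(ch) for ch in bit_string], dtype=float)

# Toy inputs: every 8-bit string (the real training set will look different)
strings = [format(i, "08b") for i in range(256)]
X = np.stack([bits_to_array(s) for s in strings])   # shape (256, 8), plain numbers

# Made-up linear target so the analytical fit has a known answer
true_w = np.arange(1.0, 9.0)
true_b = 0.5
y = X @ true_w + true_b

# Append a column of ones so the bias is learned as an extra weight
X_aug = np.hstack([X, np.ones((len(X), 1))])

# Least-squares solution via the Moore-Penrose pseudo-inverse
w_hat = np.linalg.pinv(X_aug) @ y

print(w_hat[:8], w_hat[8])
```

Once the strings are arrays of 0.0 and 1.0 there is nothing special left about them from the model's point of view. This is the same least-squares problem a single `Dense(1, activation='linear')` layer trained with an MSE loss is trying to solve.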

#

@lapis sequoia Oh uh just to make sure we're on the same page, how do you calculate the mode?

lapis sequoia Oct 2, 2020, 10:32 PM

#

whatever is the most used

#

right?

past raptor Oct 2, 2020, 10:32 PM

#

@heady hatch thanks, it worked!

heady hatch Oct 2, 2020, 10:37 PM

#

@lapis sequoia right, you might need to connect the dots for me.

I'm not sure how you're getting the mean and the mode to be the same here if all you have is 1 = yes and 2 = no.

lapis sequoia Oct 2, 2020, 10:37 PM

#

like i would round the mean

#

bc 0.6 wouldnt be an answer

heady hatch Oct 2, 2020, 10:38 PM

#

Right.

lapis sequoia Oct 2, 2020, 10:38 PM

#

1.6 i mean

heady hatch Oct 2, 2020, 10:38 PM

#

To focus a bit on the details here, so rounding the mean isn't the same as the mean itself.

lapis sequoia Oct 2, 2020, 10:39 PM

#

yes i know that, but to make the mean into an answer wouldn't i have to round it?

#

i'm probably being dumb 😅

heady hatch Oct 2, 2020, 10:39 PM

#

No no you're not, you're learning and we're discussing.

lapis sequoia Oct 2, 2020, 10:40 PM

#

lemon_blush

heady hatch Oct 2, 2020, 10:40 PM

#

So straight up taking the mean and rounding it is very crude but it's one way to get predictions.
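
As a tiny illustration of the rounding idea, using the 1 = yes, 2 = no coding from the chat and a made-up list of answers:

```python
from statistics import mean, mode

answers = [1, 2, 2, 2, 1]   # made-up sample: 1 = yes, 2 = no

average = mean(answers)          # 1.6, not a valid answer by itself
rounded_mean = round(average)    # 2, i.e. "no"
most_common = mode(answers)      # 2, the majority answer

print(average, rounded_mean, most_common)
```

With only two possible answers, the rounded mean lands on the majority answer, so it matches the mode here. An exact tie (mean of 1.5) is the case where the two can disagree, since the result then depends on the rounding rule.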

#

Do you know anything about linear regression or logistic regression?

lapis sequoia Oct 2, 2020, 10:40 PM

#

i don't

heady hatch Oct 2, 2020, 10:41 PM

#

Ahh.

To give you analogy.

Let's say we're creating an algorithm to predict whether someone will be asleep or not.

#

Our simple algorithm is just to take the mean of the data and probably apply some function.

lapis sequoia Oct 2, 2020, 10:42 PM

#

what is sigmoid function?

#

oh

heady hatch Oct 2, 2020, 10:42 PM

#

hahaha sorry don't want to throw too many things at you.

#

So linear or logistic regression are another kind of algorithms.

lapis sequoia Oct 2, 2020, 10:43 PM

#

and what do they do?

heady hatch Oct 2, 2020, 10:43 PM

#

Similar to how we grab the mean and apply some function.

#

So linear regression will predict a number of some kind given an x.

#

I don't know how familiar you are with math.

#

but like y = mx + b.

lapis sequoia Oct 2, 2020, 10:43 PM

#

mx being...

heady hatch Oct 2, 2020, 10:43 PM

#

m = slope, x = data.

lapis sequoia Oct 2, 2020, 10:44 PM

#

what is a slope?

#

this might be basic english lol

heady hatch Oct 2, 2020, 10:44 PM

#

oh no worries, it's more of a math term. hahaha

#

Imagine the line y = x.

#

You know how it's just a diagonal line?

lapis sequoia Oct 2, 2020, 10:44 PM

#

y = x would be a constant no

#

?

heady hatch Oct 2, 2020, 10:45 PM

#

Right.

lapis sequoia Oct 2, 2020, 10:45 PM

#

unless x has got something to do with y

heady hatch Oct 2, 2020, 10:45 PM

#

And the slope of the function is ratio of the vertical change over the horizontal change.

#

In y = x

#

slope = 1

#

But let's say y = 2 * x + 3

#

slope here is 2

lapis sequoia Oct 2, 2020, 10:46 PM

#

and the +3 ?

#

does that have no effect on the slope?

heady hatch Oct 2, 2020, 10:46 PM

#

So the +3 is something called the intercept.

#

You might need to learn some basic algebra if you're unsure of all this.
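
The slope and intercept of the y = 2 * x + 3 example can be checked numerically. A minimal sketch, where the two x values are arbitrary picks:

```python
def line(x):
    """The example line y = 2 * x + 3."""
    return 2 * x + 3

x1, x2 = 1, 4   # any two different x values work

# Slope: vertical change divided by horizontal change
slope = (line(x2) - line(x1)) / (x2 - x1)

# Intercept: the value of y where x is 0
intercept = line(0)

print(slope, intercept)
```

Changing the +3 moves the whole line up or down without changing how steep it is, which is why it has no effect on the slope.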

lapis sequoia Oct 2, 2020, 10:46 PM

#

i might know what these terms are in my language

#

i just don't recognize them in english that well

heady hatch Oct 2, 2020, 10:47 PM

#

Ahh. hmm what language are you familiar with?

lapis sequoia Oct 2, 2020, 10:47 PM

#

portuguese

heady hatch Oct 2, 2020, 10:47 PM

#

Let me google it.

#

This is from google translation.

#

A inclinação de uma função linear
A inclinação de uma colina é chamada de declive. O mesmo vale para a inclinação de uma linha. A inclinação é definida como a razão entre a mudança vertical entre dois pontos, a elevação, e a mudança horizontal entre os mesmos dois pontos, a corrida.

#

Let me know if that makes sense. hahaha

lapis sequoia Oct 2, 2020, 10:49 PM

#

kinda

#

so is slope "inclinação"?

odd yoke Oct 2, 2020, 10:50 PM

#

yes

lapis sequoia Oct 2, 2020, 11:12 PM

#

when predicting something should i add that for future learning or not? bc it might not be 100% correct

rustic apex Oct 3, 2020, 2:18 AM

#

I have a list of stocks I’ve kept track of over a while. I want to get the stock price per cell, per day, and then see what the price was.

📎 image0.jpg

plucky spindle Oct 3, 2020, 3:47 AM

#

Hello guys, I'd like to present a repo for a good cause. This repository is an initiative to share data science knowledge with a community of Spanish-speaking practitioners, since most of the content on this subject is in English. If you know data science and machine learning techniques and methods, you can share them with our study group through a pull request; they will be translated to serve as study material and expand the amount of understandable material. Contributions can explain how a machine learning model works, a data cleaning or exploration technique, how to use a module, etc. Apart from participating in Hacktoberfest and winning a shirt or planting a tree, you are helping a community of people who want to learn.

https://github.com/LATAM-Data-Science-Study-Group/Data-Science-Notebooks

GitHub

LATAM-Data-Science-Study-Group/Data-Science-Notebooks

Notebooks with different data science and data exploration techniques - LATAM-Data-Science-Study-Group/Data-Science-Notebooks

agile wing Oct 3, 2020, 4:24 AM

#

boom studying Andrew Ng's course

#

soo far, pretty good course

mild topaz Oct 3, 2020, 6:10 AM

#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 89, in <module>
    assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong
```

#

i am following https://www.youtube.com/watch?v=SWaYRyi0TTs this tutorial

YouTube

Murtaza's Workshop - Robotics and AI

Traffic Signs Classification Using Convolution Neural Networks CNN ...

Train and classify Traffic Signs using Convolutional neural networks This will be done using OPENCV in real time using a simple webcam . CNNs have been gaining popularity in the past couple of years due to their ability to generalize and classify the data with high accuracy. ...


#

i am facing an AssertionError in my code

heady hatch Oct 3, 2020, 6:19 AM

#

I think the dimensions of your training images are wrong. How does the tutorial setup the input?

mild topaz Oct 3, 2020, 6:22 AM

#

@heady hatch in tutorial they have used imageDimensions = (32, 32, 3)

#

but when i pass the the same imageDimensions then it is giving an error

heady hatch Oct 3, 2020, 6:25 AM

#

And you both are using the same version of Tensorflow, right?

#

What about the data input? Similar shape?

#

Because that would be my guess, your data input might not be of right shape.

mild topaz Oct 3, 2020, 6:26 AM

#

i think he has not mentioned about shape

heady hatch Oct 3, 2020, 6:27 AM

#

So I would take it as an assumption here from the variable imageDimension.

#

Because it's looking for (32, 32, 3).

#

I would set that to be your shape.

mild topaz Oct 3, 2020, 6:29 AM

#

i am using imageDimensions = (32, 32, 3)

#

can i share my code ? @heady hatch

heady hatch Oct 3, 2020, 6:32 AM

#

Sure.

mild topaz Oct 3, 2020, 6:33 AM

#

https://paste.pythondiscord.com/usuxakuxis.coffeescript my code here ,please check line 30 and line 37 @heady hatch

heady hatch Oct 3, 2020, 6:36 AM

#

From what I'm seeing on line 89, you're checking

assert (x_train.shape[1:] == (imageDimensions)),  "the dimension of training images are wrong"

right?

mild topaz Oct 3, 2020, 6:37 AM

#

yes

heady hatch Oct 3, 2020, 6:37 AM

#

So I think if I'm following the code correctly,

you're checking if (32, 3) == (32, 32, 3)?

#

Because images are of the shape, 32 x 32 x 3?

#

I guess I'm wondering how come you're checking the shape index 1 and on instead of just x_train.shape == imageDimensions?
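
For reference, a small sketch of what the two checks compare, assuming x_train is a regular NumPy array holding a batch of images. The array below is made-up data, not the tutorial's dataset.

```python
import numpy as np

# Hypothetical batch: 5 images, each 32 x 32 pixels with 3 colour channels.
x_train = np.zeros((5, 32, 32, 3), dtype=np.uint8)
imageDimensions = (32, 32, 3)

print(x_train.shape)       # (5, 32, 32, 3): the first entry is the number of images
print(x_train.shape[1:])   # (32, 32, 3): the shape of a single image
print(x_train[0].shape)    # (32, 32, 3): same thing, read from the first image

# shape[1:] drops the batch dimension, which is the comparison the tutorial's assert makes.
batch_ok = x_train.shape[1:] == imageDimensions
```

So checking shape[1:] is reasonable for a proper 4-D batch; it only fails here because x_train does not have that form yet.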

mild topaz Oct 3, 2020, 6:40 AM

#

in tutorial 6:06 please check line 66

#

i am following the same as shown in tutorial @heady hatch

heady hatch Oct 3, 2020, 6:41 AM

#

Right right, I would ignore the tutorial real quick.

#

Try changing x_train.shape == imageDimensions real quick.

#

And let me know how that goes.

#

Oh wait.

#

probably something like

#

x_train[0].shape == imageDimensions

#

OH

#

Wait I get it now.

#

print your x_train.shape before the assert.

#

Sorry I'm not thinking too clearly.

#

So before the line
assert ..., add a print(x_train.shape)

mild topaz Oct 3, 2020, 6:44 AM

#

(378,)

heady hatch Oct 3, 2020, 6:44 AM

#

Yea that doesn't sound right.

#

So on your line 65.

#

what's np.array(images)?

mild topaz Oct 3, 2020, 6:44 AM

#

this is my console output```python
total classs detected : 24
noofClasses: 24
importing classes...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
data shapes
train (378,) (378,)
validation (95,) (95,)
test (119,) (119,)
(378,)
Traceback (most recent call last):

File "E:\demo3\image_classification.py", line 90, in <module>
assert (x_train.shape == (imageDimensions)), "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong```

heady hatch Oct 3, 2020, 6:45 AM

#

Right right, sorry about the x_train.shape == imageDimensions.

#

I think there's something wrong with your images.

#

these lines.

#

count = 0
images = []
classNo = []
#mylist = os.listdir(path)
p = pl.Path(path)
mylist = [x for x in p.iterdir() if x.is_dir()]
print("total classs detected :", len(mylist))
noofClasses = len(mylist)
print("noofClasses:", noofClasses)
print("importing classes...")
for x in range(0, len(mylist)):
    myPicList = os.listdir(path+"/"+str(count))
    for y in myPicList:
        curImg = cv2.imread(path+"/"+str(count)+y)
        images.append(curImg)
        classNo.append(count)
    print(count, end = " ")
    count+=1
print(" ")
images = np.array(images)
classNo = np.array(classNo)

mild topaz Oct 3, 2020, 6:46 AM

#

np.array(images) gives None

heady hatch Oct 3, 2020, 6:46 AM

#

After images = np.array(images)

can you print images[0]?

#

and double check if that's what you expect it to be.

mild topaz Oct 3, 2020, 6:47 AM

#

None

After images = np.array(images)

can you print images[0]?
@heady hatch

heady hatch Oct 3, 2020, 6:47 AM

#

Yea.

#

I think you're not reading in the images properly.

#

I'm not familiar enough with the cv library, but I can help you debug.

mild topaz Oct 3, 2020, 6:47 AM

#

I think you're not reading in the images properly.
@heady hatch ok

heady hatch Oct 3, 2020, 6:47 AM

#

So on line 59.

#

after curImg = cv2.imread(path+"/"+str(count)+y)

#

add a print curImg.

#

and then add a break.

#

on line 57.

#

after myPicList = os.listdir(path+"/"+str(count)) add a print myPicList and then add a break.

mild topaz Oct 3, 2020, 6:49 AM

#

add a print curImg.
@heady hatch
None 0 None 1 None 2 None 3 None 4 None 5 None 6 None 7 None 8 None 9 None 10 None 11 None 12 None 13 None 14 None 15 None 16 None 17 None 18 None 19 None 20 None 21 None 22 None 23

heady hatch Oct 3, 2020, 6:49 AM

#

Be sure to add a break.

#

so something like

#

print(curImg)
break

#

So double check

#

is this where your images are?

#

path = r'E://demo3//india'

mild topaz Oct 3, 2020, 6:51 AM

#

path = r'E://demo3//india'
@heady hatch yes

heady hatch Oct 3, 2020, 6:51 AM

#

okay now, since you've imported os.

#

You can try something like

#

os.path.isfile(path_to_image)

#

and check to see if you have the right path.

#

You can print it anywhere.

mild topaz Oct 3, 2020, 6:53 AM

#

os.path.isfile(path_to_image)
@heady hatch ok let me try...

#

os.path.isfile(r'E://demo3//india//0//a.jpg')
True
@heady hatch

heady hatch Oct 3, 2020, 6:54 AM

#

Okay okay cool.

mild topaz Oct 3, 2020, 6:55 AM

#

i think the input shape or dimensions is not proper i guess

heady hatch Oct 3, 2020, 6:55 AM

#

I think it's your images.

#

Because you saw up there that it's printing None.

#

I think curImg = cv2.imread(path+"/"+str(count)+y) is incorrect.

mild topaz Oct 3, 2020, 6:56 AM

#

okay, means my images are not in correct format?

heady hatch Oct 3, 2020, 6:57 AM

#

Hmm.

#

Or maybe you're not reading them correctly.

#

like

#

I guess print path+"/"+str(count)+y

#

to make sure it's path to actual image.

#

Or actually no I think you might have a point, sorry for jumping the gun.

#

I think either path to images isn't correct

#

or something wrong with the images.

#

Since cv2.imread isn't reading them properly.

mild topaz Oct 3, 2020, 6:59 AM

#

my images consists of rotated images also

heady hatch Oct 3, 2020, 7:00 AM

#

but they're still in readable formats, right?

#

Oh wait.

#

I think

#

I might have an idea.

#

path+"/"+str(count)+y isn't this supposed to be path + "/" + str(count) + "//" + y?

#

Seeing how the files live in 'E://demo3//india//0//a.jpg'.

#

Double check if your path is correct.

#

It might also be path + "//" + str(count) + "//" + y

mild topaz Oct 3, 2020, 7:02 AM

#

let me check path + "/" + str(count) + "//" + y this?

heady hatch Oct 3, 2020, 7:03 AM

#

Yea

#

or even check path+"/"+str(count)+y.

#

Like add a line above curImg,

print(path+"/"+str(count)+y)

#

See if that's what you think it is.

mild topaz Oct 3, 2020, 7:04 AM

#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 71, in <module>
    print(images[0])

IndexError: index 0 is out of bounds for axis 0 with size 0
```

heady hatch Oct 3, 2020, 7:05 AM

#

Oh no no no.

#

print the path.

#

print(path+"/"+str(count)+y)

sage palm Oct 3, 2020, 7:07 AM

#

Can some one please help me with this:

📎 unknown.png

mild topaz Oct 3, 2020, 7:07 AM

#

Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.``` @heady hatch

heady hatch Oct 3, 2020, 7:08 AM

#

I don't know what you want me to say @mild topaz . hahaha

#

Hey @sage palm , what do you need help with?

#

Are you allowed to use libraries?

mild topaz Oct 3, 2020, 7:09 AM

#

@heady hatch sorry, i got confused can u explain again

heady hatch Oct 3, 2020, 7:09 AM

#

It's okay, so

#

Add a line between line 58 and 59.

#

print(path+"/"+str(count)+y).

sage palm Oct 3, 2020, 7:11 AM

#

Thanks for answering! Yes, I'm allowed to use numpy

mild topaz Oct 3, 2020, 7:11 AM

#

```
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 59, in <module>
    print(path+"/"+str(count)+y)

NameError: name 'y' is not defined
```
@heady hatch

heady hatch Oct 3, 2020, 7:13 AM

#

@mild topaz

Sorry looking at your code, your line 58 and 59 are these.

    for y in myPicList:
        curImg = cv2.imread(path+"/"+str(count)+y)

So instead of that
add a print statement there.

    for y in myPicList:
        print(path+"/"+str(count)+y)
        curImg = cv2.imread(path+"/"+str(count)+y)

mild topaz Oct 3, 2020, 7:16 AM

#


Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.``` @heady hatch

heady hatch Oct 3, 2020, 7:19 AM

#

@sage palm Hmm I have some idea but what do you have in mind? Sorry I was typing a bunch of stuff but realized I should have asked you first.

#

Hey @mild topaz I really think you should check the path. Because I think your data is just filled with Nones.

mild topaz Oct 3, 2020, 7:21 AM

#

total classs detected : 24
noofClasses: 24
importing classes...
['a.jpg', 'aa.jpg', 'aaa.jpg', 'aaaa.jpg', 'b.jpg', 'bb.jpg', 'bbb.jpg', 'bbbb.jpg', 'c1.jpg', 'cc.jpg', 'ccc.jpg', 'ccccc.jpg', 'download (7).jpg', 'download.jpg', 'ges.jpg', 'images (1).jpg', 'rfg.jpg', 't.jpg', 'tt.jpg', 'ttt.jpg', 'ttttt.jpg', 'z.jpg', 'z1.jpg', 'zz.jpg', 'zzzzz.jpg']
E://demo3//india///0a.jpg
0 ['a.jpg', 'aa.jpg', 'aaa.jpg', 'aaaa.jpg', 'bbb.jpg', 'bbbb.jpg', 'bbbbb.jpg', 'cdfg.jpg', 'cfd.jpg', 'download (3).jpg', 'download (4).jpg', 'download (5).jpg', 'download (6).jpg', 'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1``` @heady hatch

heady hatch Oct 3, 2020, 7:21 AM

#

Okay yea. I think it's your path.

#

You noticed E://demo3//india///0a.jpg?

#

But your files live in E://demo3//india///0//a.jpg?

#

You're missing //.

mild topaz Oct 3, 2020, 7:22 AM

#

But your files live in E://demo3//india///0//a.jpg?
@heady hatch a.jpg is name of my image file

heady hatch Oct 3, 2020, 7:22 AM

#

Right

#

But

#

0a.jpg is not the file, right?

mild topaz Oct 3, 2020, 7:23 AM

#

print(curImg) return None

#

0a.jpg is not the file, right?
@heady hatch yes

heady hatch Oct 3, 2020, 7:23 AM

#

and neither is 1a.jpg, right?

mild topaz Oct 3, 2020, 7:23 AM

#

and neither is 1a.jpg, right?
@heady hatch correct

heady hatch Oct 3, 2020, 7:23 AM

#

So I guess I'm wondering, how come you're trying to read those files if they don't exist?

mild topaz Oct 3, 2020, 7:24 AM

#

see this directory of images @heady hatch

📎 unknown.png

heady hatch Oct 3, 2020, 7:24 AM

#

Yes.

#

But you see how

#

You're printing out

#

'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1

#

📎 unknown.png

#

📎 unknown.png

sage palm Oct 3, 2020, 7:26 AM

#

@heady hatch No problem, I will wait. This is a problem sheet which I'm working on for the upcoming exam. There are 6 pure math problems which I have done, but the ones in python I simply can't figure it out on my own. I'm not good at Python and the course has been a bit of a nightmare, so we have not learned what we should.

mild topaz Oct 3, 2020, 7:27 AM

#

'download (7).jpg', 'g.jpg', 'gg.jpg', 'ggg.jpg', 'images (1).jpg', 'qqq.jpg', 'r.jpg', 'rr.jpg', 'rrr.jpg', 's.jpg', 'ss.jpg', 'sss.jpg', 'ssss.jpg', 'z.jpg', 'zz.jpg', 'zzz.jpg']
E://demo3//india///1a.jpg
1

@heady hatch OH i see

#

but why am i getting this?

heady hatch Oct 3, 2020, 7:29 AM

#

@sage palm I can help you with the python but my linear algebra is a bit rusty. hahaha

I'm reading up on the converging series right now. But I would love to hear you breaking down the math portion if you can.

mild topaz Oct 3, 2020, 7:29 AM

#

i am getting this with every folder

heady hatch Oct 3, 2020, 7:29 AM

#

Yes.

#

Because you wrote curImg = cv2.imread(path+"/"+str(count)+y)

#

Your path is wrong.

#

I think it's supposed to be
curImg = cv2.imread(path+"/"+str(count)+"//"+y)
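
The same fix can be sketched with os.path.join, which inserts the separator between the class folder and the file name so it cannot be forgotten. This assumes the layout seen above (one numbered folder per class under the root); the helper name build_image_paths is made up for the example.

```python
import os

def build_image_paths(root):
    """Return (path, class_index) pairs for images stored as root/<class_index>/<file>."""
    pairs = []
    class_dirs = sorted(
        (d for d in os.listdir(root)
         if d.isdigit() and os.path.isdir(os.path.join(root, d))),
        key=int,
    )
    for class_name in class_dirs:
        class_dir = os.path.join(root, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            # os.path.join puts a separator between folder and file name,
            # so "0" and "a.jpg" become ".../0/a.jpg" rather than ".../0a.jpg".
            pairs.append((os.path.join(class_dir, file_name), int(class_name)))
    return pairs
```

Each path can then be passed to cv2.imread. Since cv2.imread returns None instead of raising when it cannot read a file, checking `curImg is None` right after the call catches a bad path immediately.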

mild topaz Oct 3, 2020, 7:30 AM

#

okay, let me check

#

see this way i am getting here```python
14 ['1021.jpg', '123.jpg', '152.jpg', '52.jpg', '7856.jpg', 'a.jpg', 'aa.jpg', 'aaa.jpg', 'b.jpg', 'bb.jpg', 'bbb.jpg', 'c.jpg', 'cc.jpg', 'ccc.jpg', 'd.jpg', 'dd.jpg', 'ddd.jpg', 'e.jpg', 'ee.jpg', 'eee.jpg', 'images (1).jpg', 'images (2).jpg', 'images (4).jpg', 'images.jpg', 'pn_dl2.jpg', 'pn_dl9.jpg', 'x.jpg', 'xx.jpg', 'xxx.jpg']
E://demo3//india///151021.jpg
[[[184 181 150]
[184 181 150]
[161 158 127]
...
[198 201 199]
[ 56 61 62]
[ 0 0 3]]

[[162 165 139]
[190 193 168]
[175 178 153]
...```

heady hatch Oct 3, 2020, 7:31 AM

#

Congratulations.

mild topaz Oct 3, 2020, 7:33 AM

#

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]]
23  
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 78, in <module>
    x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 2122, in train_test_split
    default_test_size=0.25)

  File "C:\Users\Admin\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)

ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.``` @heady hatch

sage palm Oct 3, 2020, 7:33 AM

#

@heady hatch i will! But please give me a minute or two, i’m on my phone because my mac has frozen. Sorry about that

heady hatch Oct 3, 2020, 7:34 AM

#

No worries, I can wait.

mild topaz Oct 3, 2020, 7:34 AM

#

what is wrong in my case , can u plz help me to understand? @heady hatch

heady hatch Oct 3, 2020, 7:38 AM

#

hmm

#

are images actual data?

#

I'm not sure what you've changed in your code.

mild topaz Oct 3, 2020, 7:39 AM

#

i have changed this only
    for y in myPicList:
        print(path+"/"+str(count)+y)
        curImg = cv2.imread(path+"/"+str(count)+"//"+y)

#

@heady hatch can i share my code again?

heady hatch Oct 3, 2020, 7:41 AM

#

Sure.

mild topaz Oct 3, 2020, 7:42 AM

#

https://paste.pythondiscord.com/bisuwuhitu.coffeescript my code here

heady hatch Oct 3, 2020, 7:43 AM

#

@mild topaz Oh remove line 64 and 65.

        print(curImg)
        break

mild topaz Oct 3, 2020, 7:44 AM

#

see this

#

E://demo3//india///23download (16).jpg
E://demo3//india///23download (18).jpg
E://demo3//india///23download (19).jpg
E://demo3//india///23download (21).jpg
E://demo3//india///23download.jpg
E://demo3//india///23gfd.jpg
E://demo3//india///23gh.jpg
E://demo3//india///23images (10).jpg
E://demo3//india///23images (11).jpg
E://demo3//india///23images (12).jpg
E://demo3//india///23images.jpg
E://demo3//india///23iu.jpg
E://demo3//india///23ry.jpg
E://demo3//india///23uiop.jpg
E://demo3//india///23y.jpg
23  
data shapes 
train (377,) (377,)
validation (95,) (95,)
test (119,) (119,)
(377,)
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 96, in <module>
    assert (x_train.shape[1:]  == (imageDimensions)),  "the dimension of training images are wrong"

AssertionError: the dimension of training images are wrong``` @heady hatch

heady hatch Oct 3, 2020, 7:44 AM

#

before that can you print x_train[0]?

mild topaz Oct 3, 2020, 7:45 AM

#

on which line?

heady hatch Oct 3, 2020, 7:45 AM

#

Probably on line 80 or something.

#

after

x_train, x_test, y_train, y_test = train_test_split(images, classNo, test_size = testRatio)
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train , test_size = validationRatio)

mild topaz Oct 3, 2020, 7:47 AM

#


 [[232 245 253]
  [232 245 253]
  [232 245 253]
  ...
  [220 230 237]
  [221 231 241]
  [222 231 245]]]
data shapes 
train (377,) (377,)
validation (95,) (95,)
test (119,) (119,)
(377,)
Traceback (most recent call last):

  File "E:\demo3\image_classification.py", line 96, in <module>
    assert (x_train.shape[1:]  == (imageDimensions)),  "the dimension of training images are wrong" 

AssertionError: the dimension of training images are wrong``` @heady hatch

sage palm Oct 3, 2020, 7:48 AM

#

@heady hatch As I see there is no LaTeX bot on the server, but I can probably explain it without-

#

So

📎 unknown.png

#

is how you define the exponential function for a matrix.

#

It is very similar to the one for numbers, exp(kx). There k is a real number. In our definition of the exponential function this constant is a square matrix! Let us say an m x m matrix.

#

Taking a square matrix to the power of n means: A^n = A · A · ... ·A (n times)

#

Let's take an example

#

A=[[1,0],[0,1]]. So this is the Identity matrix of dimension 2 x 2. And A^3 = A · A · A.
The dot is just a symbol for a matrix product.

#

So let us look at the rest of the term: x^n/n!·A^n.
x is the variable, it can be 0 or negative. x^n just means x·...·x (n times)
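
The definition above can be sketched in Python with NumPy by adding the terms x^n/n! · A^n until they stop mattering. This is only an illustration of the series as described, not the problem sheet's intended solution; the function name expm_series and the parameters tol and max_terms are made up for the example.

```python
import numpy as np

def expm_series(A, x=1.0, tol=1e-12, max_terms=200):
    """Approximate exp(x*A) = sum over n >= 0 of x^n / n! * A^n for a square matrix A."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("A must be a square matrix")
    term = np.eye(A.shape[0])      # n = 0 term: the identity matrix
    total = term.copy()
    for n in range(1, max_terms):
        # Each term is the previous one times (x*A)/n, so no explicit powers or factorials.
        term = term @ (x * A) / n
        total += term
        if np.linalg.norm(term) < tol:   # later terms are too small to matter
            break
    return total

print(expm_series([[1, 0], [0, 1]], x=1.0))   # close to e times the identity
```

The n! in the denominator eventually outgrows any power of x·A, which is why the terms shrink and the loop can stop early. For matrices with large entries a truncated sum loses accuracy, and scipy.linalg.expm is the usual library routine.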

#

@heady hatch are you there? 🙂

heady hatch Oct 3, 2020, 7:56 AM

#

I am, I'm also worried about time since I will have to sleep soon.

#

I think I kinda understood everything so far.

#

I guess I was wondering, this is an infinite series that converges.

How do you calculate the convergence?

sage palm Oct 3, 2020, 7:59 AM

#

Alright, then just go to bed 🙂 sleep is important. Can we discuss it later, when you have time?

heady hatch Oct 3, 2020, 8:00 AM

#

I would love that!

#

@mild topaz , we will have to deal with your issue tomorrow as well.

#

Good night you both.

mild topaz Oct 3, 2020, 8:00 AM

#

if u dont mind can u give some hint to me so i can try something @heady hatch

sage palm Oct 3, 2020, 8:02 AM

#

Good night! (just woke up. lol)

heady hatch Oct 3, 2020, 8:03 AM

#

Alrighty. @mild topaz

So I'm not sure why your images have the shape of (377,).

Because if they were an ndarray, they should have multi-dimensions.

I would look into your images and see how you can make them (32, 32, 3).

so do stuff like print(images[0]) and stuff and try to track it down and see if they're what you expect them to be.
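
One possible cause, offered as a guess rather than a diagnosis: if the files on disk have different sizes, NumPy cannot pack them into one (N, height, width, 3) block, and the result is a 1-D array of objects whose shape is just (N,). The sketch below reproduces that with made-up arrays and shows that bringing every image to 32 x 32 first gives the 4-D shape the assert expects. resize_nearest is a hypothetical NumPy-only stand-in so the example runs without OpenCV; in the real script, cv2.resize(curImg, (32, 32)) would do this job.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size[0], size[1], C)."""
    rows = np.arange(size[0]) * img.shape[0] // size[0]
    cols = np.arange(size[1]) * img.shape[1] // size[1]
    return img[rows][:, cols]

# Made-up "images" with different heights and widths, like mixed downloads.
images = [np.zeros((40, 60, 3), np.uint8), np.zeros((100, 80, 3), np.uint8)]

# Different sizes cannot share one block, so each image becomes one object entry.
ragged = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    ragged[i] = img
print(ragged.shape)    # (2,)

# After resizing, every image has the same shape and stacking works.
uniform = np.stack([resize_nearest(img, (32, 32)) for img in images])
print(uniform.shape)   # (2, 32, 32, 3)
```

If this is what is happening, printing images[0].shape and images[1].shape in the original script should show two different sizes.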

mild topaz Oct 3, 2020, 8:07 AM

#

377 is a no of training images @heady hatch

spiral zealot Oct 3, 2020, 9:16 AM

#

Hey, I implemented a GAN and will that be considered as a final year project?

sage palm Oct 3, 2020, 9:20 AM

#

@heady hatch I have found a very nice method of implementing our problem in python. I will tell you about it when you wake up. I can also use one of the voice channels if you like.

royal thunder Oct 3, 2020, 11:54 AM

#

can anyone explain this to me

#

📎 g1.png

#

over fitting the data confuses me

lapis sequoia Oct 3, 2020, 11:56 AM

#

@royal thunder Are you confused about the sudden cut in the line plot?
It's a zoomed plot so you can imagine them connecting at infinity or something.

royal thunder Oct 3, 2020, 11:57 AM

#

yeah

hushed flax Oct 3, 2020, 12:37 PM

#

print("Hello World")

late halo Oct 3, 2020, 12:40 PM

#

can I ask questions related to tensorflow here?

south gull Oct 3, 2020, 2:08 PM

#

ye

spark stag Oct 3, 2020, 2:14 PM

#

@royal thunder overfitting is when the algorithm that is trying to learn patterns in the data becomes too specialised to the data it's training on. As you can see in the picture, the predictions are very accurate on those data points, but consider a point halfway between the 2 rightmost points: the general trend is a straight line, yet the curve the algorithm has generated puts the prediction for that input far off the chart. That's why it can sometimes be better to use a simpler algorithm / simpler structure. A simple straight line describes the data there quite well and, as the text says, the predictions from a linear model are more likely to be accurate on new data than that curve, which has horribly overfitted to the training data.
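
The effect can be reproduced with a few lines of NumPy. This is a sketch on made-up data (a straight-line trend with small alternating noise), not the data behind the picture: a degree-7 polynomial passes through all 8 training points, while a straight line only follows the trend.

```python
import numpy as np

# Made-up data: a straight-line trend y = 2x + 1 with small alternating noise.
x = np.arange(8, dtype=float)
noise = 0.5 * (-1.0) ** np.arange(8)
y = 2 * x + 1 + noise

line = np.polyfit(x, y, deg=1)     # simple model: a straight line
wiggly = np.polyfit(x, y, deg=7)   # flexible model: passes through every point

def max_error(coeffs, xs, ys):
    """Largest absolute gap between the fitted curve and the given points."""
    return np.max(np.abs(np.polyval(coeffs, xs) - ys))

# On the training points the flexible model looks better...
train_line = max_error(line, x, y)
train_wiggly = max_error(wiggly, x, y)

# ...but halfway between the two rightmost points it drifts away from the trend.
x_new = np.array([6.5])
y_new = 2 * x_new + 1
new_line = max_error(line, x_new, y_new)
new_wiggly = max_error(wiggly, x_new, y_new)

print(train_line, train_wiggly)
print(new_line, new_wiggly)
```

The flexible fit has the smaller error on the points it was trained on and the larger error on the new point, which is the pattern described above.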

royal thunder Oct 3, 2020, 2:29 PM

#

thanks @spark stag

marble bison Oct 3, 2020, 3:17 PM

#

anyone know how to plot 3d vector fields without using quiver in matplotlib?

lapis sequoia Oct 3, 2020, 3:37 PM

#

@marble bison You can try other libraries like plotly.
https://plotly.com/python/3d-charts/
See if they fit your requirements.
Also check the 3D plot section of matplotlib. Everything that is possible is documented.
https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html

3D Charts

Plotly's Python graphing library makes interactive, publication-quality graphs online. Examples of how to make 3D charts.

lapis sequoia Oct 3, 2020, 4:02 PM

#

Looking at this graph of outliers for my timeseries, can we consider the series normally distributed?

📎 Capture_decran_2020-10-03_a_17.59.12.png

serene scaffold Oct 3, 2020, 4:02 PM

#

Is this an accurate description of logistic regression?
A model that projects each object into an n-dimensional space and solves for the n-dimensional plane that best separates objects from different classes.

#

I guess it should be (n-1)dimensional

#

looks like I may have inadvertently described SVM instead.

lapis sequoia Oct 3, 2020, 4:08 PM

#

The above definition is of any general linear classifier.
Both SVM and Logistic regression can separate different classes that are in N-Dimensional Hyperspace using an (N-1) - Dimensional HyperPlane.
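
A small sketch of the shared idea, with hypothetical weights chosen by hand rather than learned: a linear classifier scores a point with w·x + b, and the hyperplane is where that score is zero. Logistic regression additionally turns the score into a probability.

```python
import numpy as np

# Hypothetical weights for 2-D inputs; in 2-D the separating "hyperplane" is a line.
w = np.array([1.0, -1.0])
b = 0.0

def linear_score(points):
    """Signed score w.x + b: positive on one side of the hyperplane, negative on the other."""
    return points @ w + b

def logistic_probability(points):
    """Logistic regression squashes the same score into a probability."""
    return 1.0 / (1.0 + np.exp(-linear_score(points)))

points = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
print(linear_score(points))          # scores 2, -2 and 0
print(logistic_probability(points))  # above 0.5, below 0.5, exactly 0.5 on the boundary
```

Where SVM and logistic regression differ is in how w and b are chosen during training (largest margin versus best likelihood), not in the shape of the boundary.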

#

@serene scaffold

serene scaffold Oct 3, 2020, 4:09 PM

#

I see

rustic apex Oct 3, 2020, 4:11 PM

#

I have this list of stocks I wrote down years ago. I just wrote down the ticker and if it was up or down. How would you use this? I want to display the stocks on a grid of how many I found per day, how often a “same” stock was written down when.... also the actual prices on a graph. What will I need?

📎 image0.jpg

marble bison Oct 3, 2020, 4:19 PM

#

@lapis sequoia hey yeah thanks, ill have look at plotly. its just when i try to make a 3d vector field function with quiver it doesn't like having the arrow directions as an input

heady hatch Oct 3, 2020, 4:25 PM

#

Hey @sage palm , I would love to hear it. Voice channel will definitely work as well, let me know when you're free!

sage palm Oct 3, 2020, 4:28 PM

#

@heady hatch good morning!

#

I'm still here. Have been sitting all "night"

heady hatch Oct 3, 2020, 4:29 PM

#

Good morning to you too.

#

Were you able to solve the matrix exponential problem? Or are you in the debugging stage?

sage palm Oct 3, 2020, 4:30 PM

#

No, unfortunately. I have only solved the proof-based math question. I do not know python, so I need some help to get my thoughts implemented.

#

😄

heady hatch Oct 3, 2020, 4:31 PM

#

Oh man, I'm totally excited to help you implement it.

sage palm Oct 3, 2020, 4:31 PM

#

thanks!

#

I'm kinda slow mentally because of lack of sleep, so bear with me

#

Which channel to join?

heady hatch Oct 3, 2020, 4:32 PM

#

Give me about 30 minutes, I'm going to get ready and be back!

sage palm Oct 3, 2020, 4:32 PM

#

cool! 🔥

#

I have never tried voice channel. but i will figure it out.

sage palm Oct 3, 2020, 4:56 PM

#

yes

heady hatch Oct 3, 2020, 4:56 PM

#

@sage palm Alrighty, I'm back!

sage palm Oct 3, 2020, 4:56 PM

#

👍

heady hatch Oct 3, 2020, 4:56 PM

#

Yes indeed.

#

I'm not super familiar with Discord so I'll be figuring things out with you.

sage palm Oct 3, 2020, 4:57 PM

#

Lol, I was about to say the same 😄

#

But I think a private discord call will be easier!

#

do you mind?

heady hatch Oct 3, 2020, 4:57 PM

#

Nope not at all.

bold olive Oct 3, 2020, 5:02 PM

#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

lapis sequoia Oct 3, 2020, 5:15 PM

#

do you think I need to scale my data or just use it as it is?

📎 Capture_decran_2020-10-03_a_19.13.22.png

#

@bold olive It's not a good practice to apply under-sampling or oversampling before the train and test split. You should first do a random split and then sample to create a balanced training set.

bold olive Oct 3, 2020, 5:17 PM

#

I understand but then how do you balance the dataset according to a different label when y is different in the split?

lapis sequoia Oct 3, 2020, 5:19 PM

#

@lapis sequoia Scaling helps in faster convergence to the optimal result. So you should do it almost all the time.

#

@lapis sequoia Robust scaling here is correct right ?

#

@bold olive I'm not able to understand your statement can you rephrase it or describe it more.
Do you want to know how to do the sampling?

bold olive Oct 3, 2020, 5:20 PM

#

No.

#

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

I know how to sample using the existing target label, but how do I sample it according to a different label in the dataset while being in the same classification problem?

lapis sequoia Oct 3, 2020, 5:21 PM

#

@lapis sequoia how to know which one to use between robust and standard ? as both they look the same

#

@lapis sequoia Both are good choice and you should get very similar result from them. You can choose anyone.
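
For comparison, a sketch of what the two scalers compute, written in plain NumPy on a made-up feature. It mirrors the defaults of scikit-learn's StandardScaler (mean and standard deviation) and RobustScaler (median and interquartile range).

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # made-up feature with one outlier

# Standard scaling: subtract the mean, divide by the standard deviation.
standard = (values - values.mean()) / values.std()

# Robust scaling: subtract the median, divide by the interquartile range.
q25, q75 = np.percentile(values, [25, 75])
robust = (values - np.median(values)) / (q75 - q25)

print(standard)
print(robust)
```

On data without strong outliers the two give very similar pictures. They differ when outliers are present: the outlier pulls the mean and standard deviation, while the median and interquartile range barely move.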

#

@lapis sequoia thanks 🙂

#

@bold olive One option will be to increase the weight such that Male and Female have same count in dataset.

bold olive Oct 3, 2020, 5:26 PM

#

Increase the weight where exactly?

lapis sequoia Oct 3, 2020, 5:26 PM

#

are you using scikit-learn ?

bold olive Oct 3, 2020, 5:26 PM

#

Yes.

#


```python
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```

#

y is the cancer yes/no column, there is one more gender column and I want to balance the whole dataset according to that.

lapis sequoia Oct 3, 2020, 5:28 PM

#

X = [Gender, Cancer]

                                                    stratify=X[Gender], 
                                                    test_size=0.25)```

#

try this once and tell me if this works.

bold olive Oct 3, 2020, 5:30 PM

#

Sure, hang on.

lapis sequoia Oct 3, 2020, 5:31 PM

#

I don't think it will work.

bold olive Oct 3, 2020, 5:32 PM

#

Yes, it doesn't.

#

Do you understand what I am trying to achieve though?

lapis sequoia Oct 3, 2020, 5:32 PM

#

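import pandas as pd
from imblearn.under_sampling import RandomUnderSampler

# Put the cancer label back next to the features so both are resampled together
X_temp = pd.concat([X_train, y_train], axis=1)

rus = RandomUnderSampler(random_state=42)
# Balance on the gender column instead of the cancer label
X_temp_rus, _ = rus.fit_resample(X_temp, X_temp["Gender"])

# "Gender" and "Cancer" stand for the actual column names
X_train_new = X_temp_rus.drop(columns=["Cancer"])
Y_train_new = X_temp_rus["Cancer"]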

#

You'll have to make the X[Gender] column as Y and concat the label into X_temp.
then do the sampling and extract your X_train and Y_train.
Is this making sense ?
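
The same idea can be sketched with plain pandas, without imbalanced-learn: balance the rows on gender first, and only afterwards separate the features from the cancer label. The data and the column names gender, cancer and feature below are made up for illustration; in practice this would be applied to the training split only, as advised above.

```python
import pandas as pd

# Made-up data; "gender", "cancer" and "feature" are placeholder column names.
df = pd.DataFrame({
    "gender":  ["M", "M", "M", "M", "M", "M", "F", "F", "F"],
    "cancer":  [1, 0, 0, 1, 0, 0, 1, 0, 1],
    "feature": [0.2, 0.4, 0.1, 0.9, 0.5, 0.3, 0.8, 0.6, 0.7],
})

# Undersample so each gender appears as often as the rarer one.
n_per_group = int(df["gender"].value_counts().min())
balanced = df.groupby("gender").sample(n=n_per_group, random_state=42)

# Only now separate the features from the classification target.
X_balanced = balanced.drop(columns=["cancer"])
y_balanced = balanced["cancer"]

print(balanced["gender"].value_counts())
```

The sampling is driven by gender, but the target handed to the classifier is still cancer, and the cancer classes are left in whatever proportion the sampling produces.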

#

@bold olive

bold olive Oct 3, 2020, 5:38 PM

#

So in this way the sampling is done according to the gender but target label in the classification is still cancer? @lapis sequoia

fringe cove Oct 3, 2020, 5:43 PM

#

hello can someone help me to solve this issue? idk if the csv i downloaded is broken or my utf-8 encoding is not working?

📎 unknown.png

#

it is supposed to be "système"

#

i have the same for this, where it is supposed to be a " ' "

📎 unknown.png

bold olive Oct 3, 2020, 5:57 PM

#

Not working out as I want it to unfortunately.

fringe cove Oct 3, 2020, 6:07 PM

#

ok i found a solution using a western europe iso encoding
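
A sketch of why that helps, assuming the file was saved as ISO-8859-1 (the exact encoding of the downloaded CSV is not stated here): the byte used for "è" in that encoding is not valid UTF-8, so the text only comes back intact when it is read with the matching encoding. The file below is a made-up example written to a temporary folder.

```python
import os
import tempfile

# Write a small file the way a Western European (ISO-8859-1) export would.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", encoding="iso-8859-1") as f:
    f.write("nom;valeur\nsystème;1\n")

# Reading with the matching encoding returns the accented text intact.
with open(path, encoding="iso-8859-1") as f:
    good = f.read()

# Reading the same bytes as UTF-8 cannot decode the "è" byte; errors="replace"
# makes the mismatch visible instead of raising UnicodeDecodeError.
with open(path, encoding="utf-8", errors="replace") as f:
    bad = f.read()

print("système" in good)   # True
print("système" in bad)    # False
```

With pandas the equivalent is passing encoding="iso-8859-1" to pd.read_csv.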

restive basin Oct 3, 2020, 6:35 PM

#

somewhat random question, but I had a thought. is a linter basically the same as a compiler stopping half way, or are they completely different beasts? I only have a very superficial understanding of compilers, but it seems to me a linter will need to do the same work of tokenization and building some kind of syntax tree.

#

there was something I noticed recently where the kotlin linter in intellij would warn me that while I did a check to see if something was null, the variable is mutable and thus the value could change to null at any time. I don't see how it could know that without doing all that stuff and working through the program in its entirety
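
As a small illustration of that intuition in Python (not of how the Kotlin inspection in IntelliJ is built): a lint check can reuse the front end of a compiler, parsing the source into a syntax tree, and then inspect the tree instead of generating code. The check below, find_bare_excepts, is a made-up example rule.

```python
import ast

def find_bare_excepts(source):
    """Return line numbers of `except:` clauses that name no exception type."""
    tree = ast.parse(source)   # tokenizes and parses, like a compiler front end
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

code = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(code))   # [4]
```

Checks like the nullability warning need more than the tree alone (types and information about where a variable can change), which is closer to the analysis stages of a compiler than to its parser.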

dull musk Oct 3, 2020, 8:27 PM

#

@rustic fern dutch

lapis sequoia Oct 3, 2020, 11:10 PM

#

Out of curiosity, how long does it take to deploy a machine learning model for you guys?

#

@marble jasper So your models are deployed automatically? Is what I am hearing?

marble jasper Oct 3, 2020, 11:17 PM

#

our pipelines that automatically ingest data for unsupervised learning, pretty much do it automatically, they're run in Airflow and I dunno, process takes a few minutes to push the models to the models store API, and update some database values. Next time something tries to use a model, the endpoint reads the latest model version from database and realises it doesn't have the model cached, so pulls it, and now it's in the models cache to be used for this and new requests.
for other kinds of model, someone has to compress it, and upload it to the models bucket, and update the path in the project that's using it so it downloads the model. Tag a build, CI takes care of deploying it

#

sorry, pasting message from earlier

odd yoke Oct 3, 2020, 11:17 PM

#

depends on the infrastructure of the project i'm working on

#

~~at my current work, about 6 months~~

marble jasper Oct 3, 2020, 11:17 PM

#

yes, I assume you're talking after all the training is done and there's a model ready to go into production

lapis sequoia Oct 3, 2020, 11:18 PM

#

And you edit the hyperparameters

#

and k-instances

marble jasper Oct 3, 2020, 11:18 PM

#

we have a bunch of stuff on Airflow that's ingesting data and running unsupervised learning to create new models, and then uploading it to our models store

lapis sequoia Oct 3, 2020, 11:19 PM

#

How helpful would a feature like that be: beneficial or useless?

odd yoke Oct 3, 2020, 11:19 PM

#

you're describing automl @lapis sequoia

marble jasper Oct 3, 2020, 11:19 PM

#

so that pipeline is pretty much zero-touch. we have an internal model store API that you can post models to, and it tracks the latest version of a model for a given task; the Airflow ETL is pushing models in there and bumping the version number. Production systems that use the model query for the latest model, check their local cache for it, and pull it if it's newer
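
The flow described above (push a new version, query the latest, pull only when the local cache is stale) can be sketched like this. Everything here is hypothetical: ModelStore is an in-memory stand-in for the internal API, and the class and method names are invented for the example.

```python
class ModelStore:
    """Hypothetical in-memory stand-in for a model store API."""

    def __init__(self):
        self._models = {}   # task name -> list of model payloads, oldest first

    def push(self, task, model):
        """Store a new model and return its version number (1, 2, ...)."""
        self._models.setdefault(task, []).append(model)
        return len(self._models[task])

    def latest_version(self, task):
        return len(self._models[task])

    def pull(self, task, version):
        return self._models[task][version - 1]


class ModelClient:
    """Production-side helper that keeps one cached model per task."""

    def __init__(self, store):
        self.store = store
        self.cache = {}     # task name -> (version, model)
        self.pulls = 0      # counts downloads, to show when the cache is hit

    def get_model(self, task):
        latest = self.store.latest_version(task)
        cached = self.cache.get(task)
        if cached is None or cached[0] < latest:
            # Cache miss or stale entry: download the newer model once.
            self.cache[task] = (latest, self.store.pull(task, latest))
            self.pulls += 1
        return self.cache[task][1]
```

A request handler would call get_model on every request; the version lookup is cheap, and the download only happens right after the pipeline has pushed something new.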

#

automl is pretty good

lapis sequoia Oct 3, 2020, 11:21 PM

#

For side projects, it is a large platform to incorporate into

marble jasper Oct 3, 2020, 11:21 PM

#

well, it's a google cloud product, they're upselling their entire cloud

lapis sequoia Oct 3, 2020, 11:21 PM

#

What could make autoML better, if there was anything that could improve it?

#

infrastructure

odd yoke Oct 3, 2020, 11:22 PM

#

automl as it is, at least from my point of view, is perfectly suited for companies without a data science department, that want to deploy simple-ish models, in such cases, having it managed for you on some cloud platform is a perfect fit imo

lapis sequoia Oct 3, 2020, 11:22 PM

#

That is a good point

#

But doesn't take forever for data science teams to deploy models?

odd yoke Oct 3, 2020, 11:23 PM

#

if you're building the entire platform from scratch, i think using NAS is about the same as developing a "traditional" model

lapis sequoia Oct 3, 2020, 11:23 PM

#

*doesn't it

#

So why don't they automate that part yet

odd yoke Oct 3, 2020, 11:23 PM

#

in a perfect world, all you'd have to do is to provide the saved model, define endpoints, and you're good

lapis sequoia Oct 3, 2020, 11:23 PM

#

Right

#

So you are saying

marble jasper Oct 3, 2020, 11:24 PM

#

probably they don't automate that part because the time cost of deploying the model is minuscule compared to the time it takes to do all the other stuff

lapis sequoia Oct 3, 2020, 11:25 PM

#

I thought it usually takes weeks to months to deploy a single model

#

And then cleaning beforehand

marble jasper Oct 3, 2020, 11:25 PM

#

depends what you mean by "deploy"

odd yoke Oct 3, 2020, 11:25 PM

#

yeah, i think any sufficiently large company with a good devops team with ml engineers can do that part in a small proportion of the total time

lapis sequoia Oct 3, 2020, 11:25 PM

#

I think I phrased this poorly

#

I am not an ML engineer haha I am new to data science

odd yoke Oct 3, 2020, 11:26 PM

#

i have a friend working in a data science consulting company, and they can deploy models for clients in less than a day after the model is ready

#

because they invest a lot in devops

#

the company i work at, on the other hand, does not have any dev ops dpt because "we don't do development", and it takes literal months to get it ready on a project

marble jasper Oct 3, 2020, 11:26 PM

#

yeah, ours is probably about 20 minutes, depending on how quickly you can convince someone to review the PR for the model version change, due to PR review policy. Assuming the model has been vetted already

odd yoke Oct 3, 2020, 11:27 PM

#

generally we just quickly patch it together and leave it as is

marble jasper Oct 3, 2020, 11:29 PM

#

the full release process depends on exactly what, but usually:

upload some files to bucket
change some docker files or env vars
commit and tag to trigger a CI build of it
go and edit the version you want in production in a different repo, and PR that
wait for someone to accept PR
someone has to run that deploy because that's not automatic (but could be, just a human-in-the-loop thing)

lapis sequoia Oct 3, 2020, 11:29 PM

#

Ohh

marble jasper Oct 3, 2020, 11:29 PM

#

this is assuming everyone agrees that the model is ready

#

I'm not sure what problem you're solving, because it sounds like to use your system it would require someone to hook up an API

#

sure

lapis sequoia Oct 3, 2020, 11:32 PM

#

Also this is very helpful

marble jasper Oct 3, 2020, 11:32 PM

#

I think for some companies that have a separate team for data pipelines and devops, this is probably not that useful, because our model deployment process isn't that different from other CD tasks (there's just an extra big model file somewhere to handle). Maybe for smaller teams like Igneous mentioned, who don't have dedicated devops?

lapis sequoia Oct 3, 2020, 11:32 PM

#

Again I am not a data engineer or data scientist

#

or data analyst

#

@marble jasper What is something that would speed up the process within your work? What takes the longest?

#

Also, I should have asked this before, but are you a data scientist?

#

or data engineer

marble jasper Oct 3, 2020, 11:36 PM

#

no, I lean more on the devops and backend side, but I manage some ML engineers

lapis sequoia Oct 3, 2020, 11:36 PM

#

Ohh I see

marble jasper Oct 3, 2020, 11:36 PM

#

my main gripe with our systems is Apache Airflow kind of sucks

lapis sequoia Oct 3, 2020, 11:37 PM

#

Also, I am new to Discord haha, I only just recently found out I can jump into communities

marble jasper Oct 3, 2020, 11:37 PM

#

it just doesn't FEEL like a modern app, what with the weird limitations like not being able to schedule two tasks at the same time, etc

lapis sequoia Oct 3, 2020, 11:37 PM

#

Do many companies use Apache Airflow?

marble jasper Oct 3, 2020, 11:38 PM

#

probably quite a lot

#

I mean, AirBnB use it

#

most likely. it would be weird if they didn't

lapis sequoia Oct 3, 2020, 11:38 PM

#

Ah I see, if you don't mind, and I understand if you're not comfortable with sharing this, but which company do you work at?

#

Idk if people share those details on Discord or if that's not really common

#

Could you elaborate on apache airflow process?

marble jasper Oct 3, 2020, 11:44 PM

#

Airflow runs pipelines. Our pipeline stages are mostly either docker images running in kubernetes, or calls to internal APIs. We use Airflow for some periodic data collection, and periodic generation of some models that use unsupervised learning

#

some runs have external triggers, most run on a timer (and pull available data from an ingestion API that collects data to be processed)

lapis sequoia Oct 3, 2020, 11:46 PM

#

This is really cool

#

So everything is just automated by cronjobs

marble jasper Oct 3, 2020, 11:46 PM

#

they're not cron jobs, but those processes are on a timer

lapis sequoia Oct 3, 2020, 11:47 PM

#

Ohh I see

marble jasper Oct 3, 2020, 11:47 PM

#

Airflow uses python

lapis sequoia Oct 3, 2020, 11:47 PM

#

When do you guys agree that a model is ready?

marble jasper Oct 3, 2020, 11:48 PM

#

you define your pipeline in python. for the tasks on the timer, it's part of your DAG definition. Those python DAGs live in a repository. When you push to the repository, CI/CD takes it and inserts it into a folder that Airflow watches, and Airflow automatically loads this

#

so the process of defining a new pipeline is you just create a new python file in the DAGs repo, and commit it and tag for CI/CD build

lapis sequoia Oct 3, 2020, 11:49 PM

#

So everything is mostly being automated by airflow

marble jasper Oct 3, 2020, 11:49 PM

#

unsupervised learning, yes

#

there's a data ingestion API that handles collecting data and making clean formatted data available to some of the pipelines

#

it's slightly decoupled, because there are Airflow processes that perform the data collection, pushing the data to the ingestion API, and other pipelines that get data from the API. This is because Airflow isn't responsible for raw data storage, and also we get a data stream from elsewhere as well

#

but yes, this is unsupervised learning on ML that's already been defined. Everything else - exploring new ML algorithms and anything that requires supervised learning, that's all offline

lapis sequoia Oct 3, 2020, 11:53 PM

#

Also, random question but you got Forbes 30 under 30

#

thats super impressive

marble jasper Oct 3, 2020, 11:53 PM

#

someone has to design the experiments, do the labelling, etc. etc. that's all desktop stuff

lapis sequoia Oct 3, 2020, 11:54 PM

#

It was really not a question but rather a comment. That is insanely impressive

#

But this is really helpful information

#

A lot of people don't really help out this much with the process of how it flows or put in time to write it out. I truly appreciate it

#

I hope I didn't sound weird

sinful pewter Oct 3, 2020, 11:55 PM

#

idk where i

#

lol

#

idk where i'd put this but i made a calculator function

#

just found it

lapis sequoia Oct 4, 2020, 12:03 AM

#

Also, is there any way that I can reach out @marble jasper whenever ? That was really helpful

rustic apex Oct 4, 2020, 12:22 AM

#

What’s the better alternative to web scraping?

lapis sequoia Oct 4, 2020, 1:40 AM

#

@marble jasper Also, there is no devops team that detects data drif

#

drift

#

detecting data drift

rustic apex Oct 4, 2020, 2:45 AM

#

I wrote down stocks that caught my eye years ago. As a test, I want to display the price per day and some more info. What type of way would you display stocks like this? There are duplicates

📎 image0.jpg

velvet thorn Oct 4, 2020, 2:47 AM

#

What’s the better alternative to web scraping?
@rustic apex it depends.

#

if the data is publicly available through an API, that usually is better

rustic apex Oct 4, 2020, 2:50 AM

#

@velvet thorn how can I display the gain/loss difference from my list? I have a lot of stock tickers listed

velvet thorn Oct 4, 2020, 2:51 AM

#

@velvet thorn how can I display the gain/loss difference from my list? I have a lot of stock tickers listed
@rustic apex I don't understand the question

rustic apex Oct 4, 2020, 2:53 AM

#

@velvet thorn it’s ticker names for stocks. How can I cycle through the list to show the +- of each to now?

velvet thorn Oct 4, 2020, 2:53 AM

#

I actually have no idea what you want

#

perhaps an example would help

rustic apex Oct 4, 2020, 2:56 AM

#

@velvet thorn I want to display a graph/line per stock to see how the trend has been since I wrote them down.

velvet thorn Oct 4, 2020, 2:57 AM

#

but where are the numbers

rustic apex Oct 4, 2020, 2:57 AM

#

@velvet thorn I didn’t write them down, just the date. That’s what I want to have added to them

velvet thorn Oct 4, 2020, 2:57 AM

#

okay, what does each row represent?

rustic apex Oct 4, 2020, 2:57 AM

#

Each row is a day

velvet thorn Oct 4, 2020, 2:57 AM

#

I'm pretty sure each COLUMN is a day?

deep mason Oct 4, 2020, 2:58 AM

#

what an odd sheet

rustic apex Oct 4, 2020, 2:58 AM

#

@velvet thorn yes

velvet thorn Oct 4, 2020, 2:59 AM

#

so what is each row?

#

this is a pretty weird way to store data, not gonna lie

rustic apex Oct 4, 2020, 3:00 AM

#

@velvet thorn I guess not really anything, it’s a list of stocks by day. The day is at the top

deep mason Oct 4, 2020, 3:00 AM

#

for real.. columns for days, u expect then rows to be symbols? but it isnt

velvet thorn Oct 4, 2020, 3:00 AM

#

@velvet thorn it’s ticker names for stocks. How can I cycle through the list to show the +- of each to now?
@rustic apex so why does the day matter?

#

does it matter at all?

#

or do you just want to get all the ticker symbols in that DataFrame

rustic apex Oct 4, 2020, 3:00 AM

#

@velvet thorn it’s when I found a stock, and want to know the difference between when I wrote it down again

velvet thorn Oct 4, 2020, 3:00 AM

#

okay

#

so

#

for each stock

#

you want the difference between

#

the day it was entered (from the column)

#

and the present

#

and I'm assuming

#

you will get the prices

#

from some external API?

rustic apex Oct 4, 2020, 3:01 AM

#

@velvet thorn yes

velvet thorn Oct 4, 2020, 3:01 AM

#

okay

#

got it

#

no need to mention me if you're not replying to a specific message btw

rustic apex Oct 4, 2020, 3:02 AM

#

@velvet thorn ok 👍 there’s sticks I wrote down at +100%, that then shot up even more at +800%, so I want to see how this list still holds up.

#

Stocks.... not sticks

velvet thorn Oct 4, 2020, 3:03 AM

#

hm.

#

a CSV is a bit of a bad choice for this

#

okay what I would do

#

is apply some data transformation

#

so you have a 2-column DataFrame

#

ticker and date

#

then you can iterate through it and call an API

#

to get those prices
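
A rough sketch of that transformation with pandas, assuming the sheet has one column per day with tickers listed underneath. The tickers and dates below are made up, and the price lookup is left as a comment because no API has been chosen yet.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: one column per day, tickers listed below it.
wide = pd.DataFrame({
    "2020-01-06": ["AAA", "BBB", None],
    "2020-01-07": ["BBB", "CCC", "DDD"],
})


def to_ticker_date(wide_df):
    """Reshape a column-per-day sheet into one (ticker, date) row per entry."""
    entries = (
        wide_df.melt(var_name="date", value_name="ticker")
        .dropna(subset=["ticker"])  # short columns are padded with empty cells
        .assign(date=lambda d: pd.to_datetime(d["date"]))
        .sort_values(["ticker", "date"])
        .reset_index(drop=True)
    )
    return entries[["ticker", "date"]]


entries = to_ticker_date(wide)

for row in entries.itertuples(index=False):
    # This is where a price lookup for (row.ticker, row.date) would go.
    print(row.ticker, row.date.date())
```

A ticker that was written down on two different days stays as two rows, so each entry date can be compared with today's price separately.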

rustic apex Oct 4, 2020, 3:03 AM

#

Or should I just go by day?

#

Should I use web scraping from yahoo finance?

velvet thorn Oct 4, 2020, 3:04 AM

#

Should I use web scraping from yahoo finance?
@rustic apex that's a separate question

#

one thing at a time

#

Or should I just go by day?
@rustic apex what do you mean by that?

#

isn't that implied here

#

ticker and date
@velvet thorn this

rustic apex Oct 4, 2020, 3:04 AM

#

Oops, yes

#

So in Jupyter, should I display just one graph of stocks at a time?

velvet thorn Oct 4, 2020, 3:06 AM

#

that would be up to you

#

you'd also need to think about what you're plotting in the first place

#

simple price?

#

some sort of moving average?

#

comparison to an index?

#

etc.

rustic apex Oct 4, 2020, 3:08 AM

#

Ok, all of those 👍, I’ve seen tutorials to predict a stock, I want to try that later as well. There’s been a lot of stocks I’ve found at around 50¢/$1, and they ended up being $5, $10, $25

velvet thorn Oct 4, 2020, 3:08 AM

#

yup

#

so like

#

you have a lot on your plate right now

#

I suggest you make a list of things you want to do

#

and work on them bit by bit

rustic apex Oct 4, 2020, 3:17 AM

#

When I’ve watched tutorials and also some samples on Kaggle, it doesn't show an import from any API or URL, it just has an analysis of the data

velvet thorn Oct 4, 2020, 3:18 AM

#

okay

#

what are you getting at though

#

I don't really understand

#

where are you going to get the prices then

rustic apex Oct 4, 2020, 3:27 AM

#

Well, yes I want the prices 👍, but which api or web scraping is best?

snow flax Oct 4, 2020, 3:41 AM

#

What are some beginner projects for open cv that people have done

lapis sequoia Oct 4, 2020, 3:41 AM

#

Are there any for just simple video classification, or integrations with scale.ai?

velvet thorn Oct 4, 2020, 4:30 AM

#

Well, yes I want the prices 👍, but which api or web scraping is best?
@rustic apex depends on what you want.

#

I suggest you do some original research

velvet thorn Oct 4, 2020, 8:18 AM

#

@whole roost you can ask about matplotlib here

#

anyway, to answer your question, a for loop would be appropriate

pale thunder Oct 4, 2020, 8:24 AM

#

@snow flax I made a bunch of simple filters, like negative, pixelate, posterize, and just a whole bunch of aliases to cv2 built-ins, like edge detection, blur, ...

royal thunder Oct 4, 2020, 8:34 AM

#

anyone ?

#

i am currently learning machine learning

#

from hands on machine learning

#

i have this huge doubt anyway

#

whether to learn the math first and then continue, or to learn machine learning and the math for it in parallel?

bold olive Oct 4, 2020, 10:14 AM

#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

#

Currently, I have this:


```
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```

But this balances the dataset according to the cancer yes/no (column 2) label as y is defined that way. I want to perform the sampling with column 3 (gender) and then perform classification with 2.
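
One possible way to do this, sketched with plain pandas rather than imbalanced-learn: undersample on the gender column first, then pick the label afterwards. The `undersample_by` helper, the toy data and the column names are all hypothetical.

```python
import pandas as pd


def undersample_by(df, column, random_state=42):
    """Randomly drop rows so every value of `column` appears equally often."""
    smallest = int(df[column].value_counts().min())
    return df.groupby(column, group_keys=False).sample(
        n=smallest, random_state=random_state
    )


# Hypothetical toy data; the column names are placeholders.
data = pd.DataFrame({
    "cancer": [1, 0, 0, 1, 0, 0, 1, 0],
    "gender": ["M", "M", "M", "M", "M", "F", "F", "F"],
    "feature": [0.1, 0.4, 0.3, 0.9, 0.2, 0.5, 0.8, 0.7],
})

balanced = undersample_by(data, "gender")  # equal number of M and F rows
X = balanced[["feature"]]                  # features for the classifier
y = balanced["cancer"]                     # the label is still cancer yes/no
print(balanced["gender"].value_counts())
```

To rebalance only the training portion, the same helper could be applied after the split to a training frame that still contains the gender column.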

tiny orchid Oct 4, 2020, 11:38 AM

#

Hey

#

i am new here

#

can anyone guide me from where i should learn machine learning

#

It'd be awesome if you guys can help me 🙂

lapis sequoia Oct 4, 2020, 12:06 PM

#

Hey guys, so I'm wondering what the best way is to fill those missing values. The dtypes come back as object. I dropped all rows that are entirely NaN. This is the output

📎 missing.PNG

left moth Oct 4, 2020, 12:38 PM

#

i dunno much but maybe putting mean values instead of dropping them might be better @lapis sequoia ?
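
A small sketch of mean filling, assuming the numeric columns were read in as object dtype and need converting first. The frame and column names below are made up, since the real ones are only in the screenshot. The mean only makes sense for numeric columns, so the text column falls back to its most frequent value here.

```python
import pandas as pd

# Hypothetical frame where everything came in as object dtype.
df = pd.DataFrame({
    "age": ["31", None, "45"],
    "city": ["Oslo", None, "Lima"],
})

# Convert to numbers first; anything unparseable becomes NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical/text column: fill with the most frequent value instead.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

print(df)
```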

hushed flax Oct 4, 2020, 1:35 PM

#

I have made Tic Tac Toe in Python

arctic wedgeBOT Oct 4, 2020, 1:36 PM

#

Hey @hushed flax!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

lapis sequoia Oct 4, 2020, 1:42 PM

#

How can one detect data drift?

#

Or what are ways of avoiding that

whole roost Oct 4, 2020, 2:43 PM

#

Hi : ) Thanks for the recommendation that I use this channel! I'm trying to figure out how to use ax.bar to, eventually, make a histogram. This is for homework, so much of the code I'm trying to use was provided and I'm modifying it. The error I'm getting is this: AttributeError: 'numpy.ndarray' object has no attribute 'bar'

#

The code I'm using is this: https://repl.it/repls/GrowingScientificService#main.py

#

I'm still quite unsure what ax.bar is technically supposed to do.

velvet thorn Oct 4, 2020, 3:07 PM

#

@whole roost okay, I looked at your code

#

what do you want to return from plot_f_sampled?

#

it seems like you're returning an array

whole roost Oct 4, 2020, 3:09 PM

#

Hm, how could I share a Word document that provides a lot of background information?

velvet thorn Oct 4, 2020, 3:10 PM

#

never mind

#

@whole roost the main thing is:

pmf_for_test_plot = plot_f_sampled(n=15)

print("\nBegin homework 1, problem 3")
plot_pmf_samples(pmf_for_test_plot, x_lim=(0, 1), n=200)

here

#

you're passing pmf_for_test_plot into plot_pmf_samples, right?

#

so eventually it calls this:

#

plot_pmf(keep_count,bins)

#

however, the signature of plot_pmf is def plot_pmf(ax, pmf=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title='No title')

#

so you're passing keep_count as the ax argument, which expects an Axes

whole roost Oct 4, 2020, 3:13 PM

#

An Axes instead of a y-array (like keep_count currently is, if I'm understanding it right)? Or is it feasible that I tell keep_count to add an Axes argument in addition?

velvet thorn Oct 4, 2020, 3:14 PM

#

yes, and no

#

look up keyword arguments

#

to specify how to make each argument go where you want

whole roost Oct 4, 2020, 3:14 PM

#

So I need to convert keep_count into an ax argument ...

velvet thorn Oct 4, 2020, 3:14 PM

#

mp

#

no

#

you need to tell plot_pmf that you're not going to provide it an Axes

#

and for it to create its own

whole roost Oct 4, 2020, 3:15 PM

#

Ah! So, make Axes an optional argument with Axes='default axes'?

velvet thorn Oct 4, 2020, 3:16 PM

#

uh.

#

not exactly

#

look at its code

#

and think about how that would work

#

plot_pmf

whole roost Oct 4, 2020, 3:17 PM

#

I'm ... trying. Unfortunately, due to lack of sleep, my initiative is pretty shot : /

#

def plot_pmf(ax, pmf=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title='No title'):
"""
Plot a pmf as a set of bars
:param ax: Figure axes. If None, will call subplots

#

this :param ax: comment seems to imply that ax should already default to something if not provided ... oh. plot_pmf_samples has axis, I should just provide them to plot_pmf as an argument, yeah?

#

Unsure how to reword this to be able to get the axes from it:

#

Make the subplots

f, (ax1, ax2, ax3) = plt.subplots(1, 3)

#

I'm a little reluctant to alter it or move it around, as it's part of the code I was provided.

#

I'm looking at the documentation, and can I call 'ax' within plot_pmf_samples and have it know it's referring to the

#

f, (ax1, ax2, ax3) = plt.subplots(1, 3)

velvet thorn Oct 4, 2020, 3:29 PM

#

this :param ax: comment seems to imply that ax should already default to something if not provided

#

nope

whole roost Oct 4, 2020, 3:29 PM

#

ax1, ax2, ax3 from here?

velvet thorn Oct 4, 2020, 3:29 PM

#

hint: you can just modify how you call plot_pmf

#

you don't need to modify plot_pmf itself

whole roost Oct 4, 2020, 3:31 PM

#

Oh, three subplots, three axis, right?

#

(The homework has this as an example that my results should roughly resemble:

#

📎 unknown.png

#

So the idea is that it provides me with axis for each subplot, and then I call the appropriate axis when plotting?

#

Hah, fixed the error! But the function still doesn't quite return my plots when I run it.
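
For reference, a sketch of the kind of call that avoids the `'numpy.ndarray' object has no attribute 'bar'` error: pass one of the Axes from `plt.subplots` first, and send the data in by keyword. Only the `plot_pmf` signature is taken from the messages above; its body here is a stand-in, and the `keep_count` and `bins` values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_pmf(ax, pmf=(0.1, 0.8, 0.1), x_vals=(-1, 0, 1), title='No title'):
    """Stand-in body; only the signature is taken from the chat."""
    ax.bar(x_vals, pmf, width=0.3)
    ax.set_title(title)


# Make the subplots: one Axes object per panel.
f, (ax1, ax2, ax3) = plt.subplots(1, 3)

keep_count = np.array([0.2, 0.5, 0.3])  # made-up bar heights
bins = np.array([0.0, 0.5, 1.0])        # made-up x positions

# plot_pmf(keep_count, bins) would bind keep_count to `ax`, hence the error.
plot_pmf(ax1, pmf=keep_count, x_vals=bins, title="Sampled pmf")

# With a display available, plt.show() would render the figure.
```

If the figure still does not appear in the real code, a missing `plt.show()` at the end is one thing to check, though that is only a guess without seeing the final version.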

lapis sequoia Oct 4, 2020, 5:20 PM

#

Do you guys recommend anything for learning Machine Learning? I'm trying Codecademy for K-Means clustering and I just don't understand it.

nimble obsidian Oct 4, 2020, 5:43 PM

#

I've a bit of an odd numpy question -- given a list of 2d matrices, what would be a simple way of removing all transposed copies of a matrix, leaving only one (any version)
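
One simple way to do this, as a sketch: keep a matrix only if neither it nor its transpose has been kept already. The `drop_transposed_copies` name is made up, and this version also drops exact duplicates, which may or may not be wanted. It compares every matrix against every kept one, so it is quadratic in the list length.

```python
import numpy as np


def drop_transposed_copies(matrices):
    """Keep the first of each group of matrices that are equal up to transposition."""
    kept = []
    for m in matrices:
        m = np.asarray(m)
        # np.array_equal returns False for mismatched shapes, so no shape check is needed.
        seen = any(np.array_equal(m, k) or np.array_equal(m.T, k) for k in kept)
        if not seen:
            kept.append(m)
    return kept


a = np.array([[1, 2], [3, 4]])
row = np.array([[5, 6]])
unique = drop_transposed_copies([a, a.T, row, row.T])
print(len(unique))
```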

sharp kettle Oct 4, 2020, 5:44 PM

#

Do you guys recommend anything for learning Machine Learning? I'm trying Codecademy for K-Means clustering and I just don't understand it.
@lapis sequoia
Hi! Do you know this site:
https://towardsdatascience.com/complete-guide-to-data-visualization-with-python-2dd74df12b5e

Do you learn on scikit learn ?
https://scikit-learn.org/stable/search.html?q=KMEAN+CLUSTERING

rocky fjord Oct 4, 2020, 7:35 PM

#

Hi,

Trying to get some outside perspective. I'm working on a project about housing prices. I am using a dataset that has 500 entries, with attributes such as 'Monthly Mortgage Payment', 'Sq Ft', etc.

The question is, "How much monthly payment can one afford?" (Taking into account average income and debt).

I'm brainstorming ideas of how to answer it, and open for suggestions.

Restricted to (Pandas, Numpy, Seaborn, Matplotlib, and Scikit Learn).

gaunt slate Oct 4, 2020, 8:11 PM

#

Hi,

Trying to get some outside perspective. I'm working on a project about housing prices. I am using a dataset that has 500 entries, with attributes such as 'Monthly Mortgage Payment', 'Sq Ft', etc.

The question is, "How much monthly payment can one afford?" (Taking into account average income and debt).

I'm brainstorming ideas of how to answer it, and open for suggestions.

Restricted to (Pandas, Numpy, Seaborn, Matplotlib, and Scikit Learn).
@rocky fjord Have you tried linear regression?

bold olive Oct 4, 2020, 9:01 PM

#

What do I do if I want to apply an undersampling/oversampling technique on a different target column and then train the model with a different column as the label?

All the imbalanced-learn methods I have seen are applied after the training-test split, so at that point y is already defined.

Basically, I have two columns - cancer yes/no & gender M/F. I want to sample the dataset so that there are equal instances of M and F, and then proceed with my classification problem: cancer yes or no (irrespective of the no. of instances).

Currently, I have this:


```
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split

X = dataa.iloc[:, 10:26]
y = dataa.iloc[:, 2]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rus = RandomUnderSampler(random_state=42)
X_res, y_res = rus.fit_resample(X_train, y_train)
```


But this balances the dataset according to the cancer yes/no (column 2) label as y is defined that way. I want to perform the sampling with column 3 (gender) and then perform classification with 2.

lapis sequoia Oct 5, 2020, 12:15 AM

#

How do you guys detect data drift while monitoring the quality of your models?

lapis sequoia Oct 5, 2020, 12:34 AM

#

And in what cases do people run object classification algorithms on videos/images? For what purpose?

velvet thorn Oct 5, 2020, 12:37 AM

#

And in what cases do people run object classification algorithms on videos/images? For what purpose?
@lapis sequoia lots of stuff.

#

are you talking about just simple classification?

#

i.e. into one of several mutually exclusive classes

#

e.g. DOG or CAT or HAMSTER

lapis sequoia Oct 5, 2020, 12:42 AM

#

Like

#

Is there any tool out there that runs different models onto videos/images at once?

#

For instance, I want to run a simple object classification model with tensorflow

#

Or pytorch with faster rcnn

velvet thorn Oct 5, 2020, 12:48 AM

#

For instance, I want to run a simple object classification model with tensorflow
@lapis sequoia faster R-CNN isn't really

#

simple classification

lapis sequoia Oct 5, 2020, 12:48 AM

#

Not a bad feature, but they should really make it in depth

velvet thorn Oct 5, 2020, 12:48 AM

#

so basically

#

you're kinda looking for model orchestration

lapis sequoia Oct 5, 2020, 12:48 AM

#

Also I am new to data science

velvet thorn Oct 5, 2020, 12:48 AM

#

not really sure if there's a better term for it

lapis sequoia Oct 5, 2020, 12:49 AM

#

Correct

velvet thorn Oct 5, 2020, 12:49 AM

#

let's step back a bit

#

why do you want to do this?

lapis sequoia Oct 5, 2020, 12:49 AM

#

I worded poorly

#

So that I can use it in any of my projects

#

Idk if that exists

velvet thorn Oct 5, 2020, 12:52 AM

#

hm

#

you're kinda asking for a lot

#

TBH

lapis sequoia Oct 5, 2020, 12:52 AM

#

Yeah

#

I was wondering first if it existed

velvet thorn Oct 5, 2020, 12:52 AM

#

each of those features exist

#

but all in one...I am not sure

lapis sequoia Oct 5, 2020, 12:52 AM

#

Do you think it would help others out

#

Because if Scale.ai

#

beside data labelling had that next step of now running predictions on your video footages

#

That would be the best integration I would ever see

#

Could be completely wrong

odd yoke Oct 5, 2020, 12:55 AM

#

what do you mean by "running predictions on your video footages" ?

lapis sequoia Oct 5, 2020, 12:55 AM

#

Also, how do you guys monitor your models?

odd yoke Oct 5, 2020, 12:55 AM

#

as in, what's special about video here

lapis sequoia Oct 5, 2020, 12:55 AM

#

I am wording all of this poorly, I apologize for that. Meaning, running classification models on the videos

odd yoke Oct 5, 2020, 12:56 AM

#

Also, how do you guys monitor your models?
at my current workplace, we have a dashboard that shows our metrics along with the pictures that were last taken, we can specify timeframes and stuff, but it's all really basic stuff

#

the hardest part is making the frontend pretty to be fair

lapis sequoia Oct 5, 2020, 12:56 AM

#

Someone told me that they had to redeploy models

#

to get rid of data drift??

odd yoke Oct 5, 2020, 12:56 AM

#

and to ensure the model is still relevant, we run campaigns every N months

#

yes, that's a problem

lapis sequoia Oct 5, 2020, 12:57 AM

#

Oh wow every N months

odd yoke Oct 5, 2020, 12:57 AM

#

you have to re-annotate every N <time unit> to ensure the data can still be represented by the algorithm

lapis sequoia Oct 5, 2020, 12:57 AM

#

Are you guys alerted whenever the quality goes poor?

odd yoke Oct 5, 2020, 12:57 AM

#

no, we can't know whether or not it goes bad

lapis sequoia Oct 5, 2020, 12:57 AM

#

Ohhh

odd yoke Oct 5, 2020, 12:58 AM

#

hence why we have to manually monitor, our annotation process is also extremely tedious, so we can't do it continuously

#

our clients don't want to hire annotators

lapis sequoia Oct 5, 2020, 12:58 AM

#

So how can you guys conclude certain decisions,

#

Oh I see

#

And if you mind me asking, but do you work as an ML engineer or which side are you on?

#

So I am aware of the perspective speaking, because

odd yoke Oct 5, 2020, 12:58 AM

#

My title is a bit unclear, but I guess that would be close to ML engineer ?

#

My official title is something along the lines of "Image Processing Engineer" so not very helpful

lapis sequoia Oct 5, 2020, 12:59 AM

#

Ah I see, and why don't companies use SageMaker's model monitoring?

#

Because I heard some do but some don't. Is there a reason behind that?

odd yoke Oct 5, 2020, 12:59 AM

#

I'm not experienced with sagemaker, what does the model monitoring aspect of it do ?

lapis sequoia Oct 5, 2020, 1:00 AM

#

I honestly just heard about it today earlier. I was speaking to a data analyst, and she said that for her job, she deploys models on Amazon's sagemaker

#

And because she does not write large python scripts, she can easily mimic data scientists' tools using SageMaker

#

Hence then I asked, how she monitored the quality constantly. She told me that Sagemaker has that feature?? I may be wrong

#

Don't know if the perspective was widely ranged because she was a data analyst at a consulting firm, so I cannot tell

odd yoke Oct 5, 2020, 1:01 AM

#

sagemaker is probably very useful, it can be used for any part in the pipeline: annotation, analysis, training, verification, deployment
tho it's only for very generic problems last we checked, and it was very hard to customize models and stuff iirc

#

I'm only speaking from what my colleague that was supposed to explore sagemaker told us

lapis sequoia Oct 5, 2020, 1:02 AM

#

Ohh, does Sagemaker benchmark models and automate many from once?

#

*at once

odd yoke Oct 5, 2020, 1:02 AM

#

probably, but I guess you pay per model

#

or per resource used

lapis sequoia Oct 5, 2020, 1:03 AM

#

Oh wow

odd yoke Oct 5, 2020, 1:03 AM

#

amazon's ground truth was bad

#

that put us off directly to be honest

lapis sequoia Oct 5, 2020, 1:03 AM

#

SHE TOLD ME ABOUT THAT

#

Oh sorry for the caps

odd yoke Oct 5, 2020, 1:03 AM

#

i thought you were cheeringly agreeing with me lol

lapis sequoia Oct 5, 2020, 1:03 AM

#

But she told me how it cannot automate classifications for training data

#

If I am not wrong on what it does

#

and many customers requested that feature

odd yoke Oct 5, 2020, 1:03 AM

#

it's for annotating data

#

and managing data in general

lapis sequoia Oct 5, 2020, 1:04 AM

#

I assumed Amazon would have such a service by that point

odd yoke Oct 5, 2020, 1:04 AM

#

like creating versions and stuff

lapis sequoia Oct 5, 2020, 1:04 AM

#

Ohhh

odd yoke Oct 5, 2020, 1:04 AM

#

there are many annotation tools, but like, every single one we tried was missing something

#

so we made our own

lapis sequoia Oct 5, 2020, 1:04 AM

#

Also, if your company was able to detect and get alerted by data drift, how impactful would it be to the overall decision making and makeup?

#

Oh thats smart

odd yoke Oct 5, 2020, 1:05 AM

#

the impact would be huge

#

for the projects where it matters

#

i feel like it's either a non-issue, or it's crippling

lapis sequoia Oct 5, 2020, 1:05 AM

#

That they dont even monitor carefully and all they do is ask their ML engineers to redeploy

odd yoke Oct 5, 2020, 1:05 AM

#

that's what we do really

#

give me a minute, brb

lapis sequoia Oct 5, 2020, 1:05 AM

#

Who makes the decision for you guys to deploy? Is it just by an automated timer?

#

Okay

odd yoke Oct 5, 2020, 1:08 AM

#

so, we have clauses in the contract with our client that say the price of the product includes maintenance; this includes going over to the site and collecting data every N months (depends on the project, the client, etc) to evaluate the existing models and see if they need to relearn on new data

lapis sequoia Oct 5, 2020, 1:09 AM

#

Ohh, and just for background, do you work at a large enterprise company or for a firm for private clients?

odd yoke Oct 5, 2020, 1:10 AM

#

it's very large

lapis sequoia Oct 5, 2020, 1:10 AM

#

Ohh okay

odd yoke Oct 5, 2020, 1:10 AM

#

but we're a research branch, so we're a small team

lapis sequoia Oct 5, 2020, 1:10 AM

#

So that is super interesting how they wait for a certain time period

#

Do people just put it aside??

odd yoke Oct 5, 2020, 1:11 AM

#

put what aside ?

lapis sequoia Oct 5, 2020, 1:11 AM

#

Model

#

quality

odd yoke Oct 5, 2020, 1:11 AM

#

i'm certain some do yea

#

it's one of the annoying parts of ML that you only see after a lot of time has passed

lapis sequoia Oct 5, 2020, 1:11 AM

#

A lot of others have been telling me about that same sort of issue

#

I never thought that other people experienced it

odd yoke Oct 5, 2020, 1:12 AM

#

i'm sure many ML solutions out there don't check for quality over time, either from ignorance, laziness, or malice

lapis sequoia Oct 5, 2020, 1:12 AM

#

Why would you say malice?

bronze skiff Oct 5, 2020, 1:12 AM

#

i think in a lot of fields monitoring data/model quality is important

#

ie distributional shifts leading to biased predictions, etc.

odd yoke Oct 5, 2020, 1:13 AM

#

you can promise a product, develop and ship it, then forget about it, because you know that's where the real challenge is

lapis sequoia Oct 5, 2020, 1:13 AM

#

And how does one check the quality of a model over time?

odd yoke Oct 5, 2020, 1:13 AM

#

and you still got the client's money

lapis sequoia Oct 5, 2020, 1:13 AM

#

Like store information of that model

bronze skiff Oct 5, 2020, 1:13 AM

#

for example, our fully productionized pipelines run on kubeflow set to a job that triggers during periodic data ingests

lapis sequoia Oct 5, 2020, 1:13 AM

#

Just save it over time

#

That is just crazy to me how no one yet fixed this

#

So it persists in large companies, but people havent yet found a way to just simply manage their model's infrastructure over time

#

Because this isnt the first I heard about this

bronze skiff Oct 5, 2020, 1:15 AM

#

why do you say that? there are a shit ton of tools that allow you to trigger retrainings based on shifting metrics
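
As one concrete example of a "shifting metric", here is a sketch of the population stability index (PSI) for a single numeric feature, comparing a reference sample (e.g. training data) with recent production data. This is just one common choice and not necessarily what any tool mentioned here uses. The 0.1 and 0.25 cut-offs are a commonly quoted rule of thumb and would need tuning per project.

```python
import numpy as np


def population_stability_index(reference, current, bins=10):
    """PSI between two samples of one feature, using quantile bins of the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Inner bin edges from the reference sample; outer bins are open-ended.
    inner_edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]
    ref_counts = np.bincount(
        np.searchsorted(inner_edges, reference, side="right"), minlength=bins
    )
    cur_counts = np.bincount(
        np.searchsorted(inner_edges, current, side="right"), minlength=bins
    )

    # Clip to avoid log(0) when a bin is empty.
    eps = 1e-6
    ref_frac = np.clip(ref_counts / reference.size, eps, None)
    cur_frac = np.clip(cur_counts / current.size, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
same_dist = rng.normal(0.0, 1.0, 5000)
shifted = rng.normal(1.0, 1.0, 5000)  # simulated drift: the mean moved

for name, sample in [("same", same_dist), ("shifted", shifted)]:
    psi = population_stability_index(train_feature, sample)
    status = "retrain?" if psi > 0.25 else ("watch" if psi > 0.1 else "ok")
    print(name, round(psi, 3), status)
```

This only watches the input distribution and needs no labels, which is why it can run continuously. It does not say whether the model's accuracy has actually dropped; that still needs fresh annotations.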

#

it's not a new problem, and it has ways to go about it

odd yoke Oct 5, 2020, 1:15 AM

#

i would say that isn't the problematic factor

#

the problem is getting said metrics

lapis sequoia Oct 5, 2020, 1:15 AM

#

OHH

#

YEAH

#

Sorry my caps go off

#

But I do agree

bronze skiff Oct 5, 2020, 1:16 AM

#

yeah, but i would say that's domain specific

odd yoke Oct 5, 2020, 1:16 AM

#

completely

#

and that's the issue

bronze skiff Oct 5, 2020, 1:16 AM

#

the MLops solutions are there

odd yoke Oct 5, 2020, 1:16 AM

#

it needs to be tailored per project

bronze skiff Oct 5, 2020, 1:16 AM

#

i mean, i use them all the time

#

haha

#

but it depends on what you're doing

lapis sequoia Oct 5, 2020, 1:16 AM

#

I heard someone also say something about the metrics

#

beginner here . how to install tenforlow

#

I dont know if I heard it correctly from someone. But they did say it was about receiving the metrics as well

#

tamserlow

bronze skiff Oct 5, 2020, 1:17 AM

#

try again

lapis sequoia Oct 5, 2020, 1:17 AM

#

But I didnt know exactly what they meant

#

tensorflow

#data-science-and-ml
