#data-science-and-ml | Python | Page 361

subtle breach Dec 16, 2021, 6:02 PM

#

region column = northwest etc

#

4 plots

hasty nimbus Dec 16, 2021, 6:04 PM

#

you have missed northeast?

subtle breach Dec 16, 2021, 6:04 PM

#

#

thats what i got with your code

#

nevermind..

#

thx anyway

hasty nimbus Dec 16, 2021, 6:05 PM

#

is this not what you wanted ?

subtle breach Dec 16, 2021, 6:05 PM

#

sorry no, thought that be obvious. its ok ill figure it out

hasty nimbus Dec 16, 2021, 6:07 PM

#

subtle breach sorry no, thought that be obvious. its ok ill figure it out

Oops, sorry then..

#

lets hope you get help from someone here (I just came in to clear one of my doubts and then saw yours on the way and thought I might be able to help in some way)

desert oar Dec 16, 2021, 6:38 PM

#

subtle breach thats what i got with your code

are you looking for a smooth kernel density estimate instead of histogram?

subtle breach Dec 16, 2021, 6:39 PM

#

Yes initially

#

Needs to filter for 4 diff regions

desert oar Dec 16, 2021, 6:42 PM

#

so you already have code that groups the data into four regions and draws a histogram for each region

#

perhaps you can spend some time understanding that code, in order to figure out how to modify it appropriately?

#

alternatively, you could learn the more idiomatic way to do this with Seaborn, which uses a paradigm called "the grammar of graphics"

#

under that paradigm, this grid of subplots is called "faceting"

#

each subplot is a "facet"

#

and usually you create one facet per sub-group in the data, which appears to be precisely what you are looking for

#

https://seaborn.pydata.org/examples/faceted_lineplot.html

#

so here is a demonstration of using line plots

#

even better, here is an example with histograms https://seaborn.pydata.org/examples/faceted_histogram.html

#

Note: the row and col parameters control faceting

#

perhaps you might also want to read about displot https://seaborn.pydata.org/generated/seaborn.displot.html

#

and I suspect you also want to read about how to use col_wrap and col for "wrapped" facet plots

#

https://seaborn.pydata.org/generated/seaborn.FacetGrid.html this documentation page is long, but the parameter names are the same as in the higher-level plotting functions

#

so it's a good place to learn about what those individual parameters do

#

i'm surprised that there isn't a nice document explaining wrapped facet plots for seaborn, at least i couldn't find one

#

but here is one for a different plotting package: https://www.sharpsightlabs.com/blog/facet_wrap/. obviously you should ignore all the code, but hopefully the explanation and examples make sense


#

https://indico.cern.ch/event/626147/attachments/1456066/2247140/FloatingPoint.Handout.pdf 🙂

#

floating point numbers are not "real numbers" in the mathematical sense

#

it's unrealistic to expect them to sum exactly to 1

hasty nimbus Dec 16, 2021, 6:53 PM

#

If I had used float64, it would give 1

desert oar Dec 16, 2021, 6:53 PM

#

changing the floating-point number representation changes how precisely the numbers are stored, which changes the way errors are propagated through calculations

#

64 bits is far more precise than 32 bits (roughly 16 significant decimal digits instead of roughly 7)

#

so maybe with 32 bits you get enough error that it shows when you print the numbers
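
#

a small sketch of that difference, using made-up random data (an assumption, not the array from the question). float32 keeps a 24-bit significand and float64 a 53-bit one, and the leftover error after normalizing shows up at very different scales

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number.
eps32 = float(np.finfo(np.float32).eps)  # 2**-23
eps64 = float(np.finfo(np.float64).eps)  # 2**-52

rng = np.random.default_rng(0)
raw = rng.random(1000)  # hypothetical unnormalized weights


def normalization_error(dtype):
    """Normalize `raw` to sum to 1 in the given dtype and return |sum - 1|."""
    p = raw.astype(dtype)
    p = p / p.sum()
    return abs(float(p.sum()) - 1.0)


err32 = normalization_error(np.float32)
err64 = normalization_error(np.float64)
print(f"float32: eps={eps32:.3g}, |sum - 1|={err32:.3g}")
print(f"float64: eps={eps64:.3g}, |sum - 1|={err64:.3g}")
```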

#

i think at some point every practitioner of data analysis will be forced to learn about floating-point arithmetic and numerical stability

hasty nimbus Dec 16, 2021, 6:54 PM

#

is there any way to calculate the probability mass function (pmf) column wise?

desert oar Dec 16, 2021, 6:55 PM

#

i don't quite understand the question, is that a joint probability table? and you are trying to compute marginal or conditional probabilities?

hasty nimbus Dec 16, 2021, 6:55 PM

#

I would like to get an array, where the column sum could be 1..

#

or the probability sum of elements column wise would be 1, when the datatypes are of numpy float32 values..

desert oar Dec 16, 2021, 6:58 PM

#

if you are just trying to check the correctness of your code, don't be concerned by floating point errors on the order of 1e-6 or whatever they were

#

that said, why do you need 32 bit specifically?

hasty nimbus Dec 16, 2021, 6:59 PM

#

i am trying to calculate a loss function..

desert oar Dec 16, 2021, 6:59 PM

#

unless you have a very specific need to do otherwise, just use numpy's default float type, np.float64

#

ok, and like i said: floating-point arithmetic is never 100% accurate because most real numbers can't be represented exactly as floating-point numbers

#

so expect small errors

#

in some situations, it might be a problem if those errors start to accumulate throughout the sequence of computations

loud cave Dec 16, 2021, 7:00 PM

#

is there a name for the loss? maybe a reference would help us understand

desert oar Dec 16, 2021, 7:00 PM

#

in your case, it sounds like you are just concerned that your implementation is wrong because the numbers don't add up exactly to 1

#

i am telling you that your implementation is probably fine because those errors look like what you expect from accumulated floating-point errors, rather than the algorithm being wrong
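
#

for the column-wise case, a minimal sketch (the `counts` array is a hypothetical stand-in for the real data): divide each column by its own sum, then check the sums with a tolerance instead of expecting exactly 1

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.random((5, 3)).astype(np.float32)  # hypothetical non-negative scores

# Divide each column by its own sum so every column is a probability vector.
pmf = counts / counts.sum(axis=0, keepdims=True)

col_sums = pmf.sum(axis=0)
print(col_sums)                    # may print values slightly off 1, e.g. 0.99999994
print(np.allclose(col_sums, 1.0))  # tolerance-based check instead of ==
```

`np.allclose` uses a relative and an absolute tolerance, so errors on the order of 1e-7 pass while a genuinely wrong normalization would not.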

hasty nimbus Dec 16, 2021, 7:01 PM

#

loud cave is there a name for the loss? maybe a reference would help us understand

jensen shannon divergence
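
#

for reference, a plain numpy sketch of the Jensen-Shannon divergence between two discrete distributions, JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = 0.5 * (p + q). the function names are made up for illustration, and this is not necessarily how the loss in question was implemented

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p == 0 contribute 0 by convention."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()  # renormalize so small float errors in the inputs don't matter
    q = q / q.sum()
    m = 0.5 * (p + q)  # m > 0 wherever p > 0 or q > 0, so the logs stay finite
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))
print(js_divergence([1.0, 0.0], [0.0, 1.0]))
```

identical distributions give 0 and fully disjoint ones give ln 2, which is a handy sanity check. note that `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the distance, not the divergence).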

desert oar Dec 16, 2021, 7:01 PM

#

if you are still concerned and not convinced, i recommend you spend some time reading about floating point numbers and floating point arithmetic

#

this is just the reality of using computers, at some point you have to deal with the fact that they are physical computers and not idealized computing machines

hasty nimbus Dec 16, 2021, 7:02 PM

#

desert oar if you are still concerned and not convinced, i recommend you spend some time re...

I have understood what you are saying..

pastel valley Dec 16, 2021, 7:24 PM

#

yo what are those kernel options and verbosity for svm?

subtle breach Dec 16, 2021, 7:27 PM

#

Thanks Salt. I'll look it all up. Exhausted! 👍

pastel valley Dec 16, 2021, 7:35 PM

#

are svm kernels like optimizers in neural networks?

stuck gull Dec 16, 2021, 7:47 PM

#

Hey guys, anyone here familiar with pandas? I need help making a subset of a very large CSV file into another file to use.

rigid zodiac Dec 16, 2021, 8:16 PM

#

stuck gull Hey guys, anyone here familiar with pandas? I need help making a subset of a ver...

can you be more specific?

#

what do you mean making a subset??

glossy bobcat Dec 16, 2021, 8:19 PM

#

Hi, we are making a deep learning translator-like program with pytorch and would really appreciate some help with trying to normalize the data and transforming words (different values derived from words) into tensors

lapis sequoia Dec 16, 2021, 8:41 PM

#

look into lemmatization, stemming, and algorithms like word2vec

glossy bobcat Dec 16, 2021, 8:46 PM

#

thanks 🙂

olive patio Dec 16, 2021, 9:08 PM

#

Hey guys, I have a question related to normalization in grayscale images. https://stackoverflow.com/questions/70371050/finding-the-mean-and-std-of-pixel-values-for-grayscale-images-in-pytorch
Would appreciate any help

Stack Overflow

Finding the mean and std of pixel values for grayscale images in py...

I'm trying to normalize this grayscale xray images dataset https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
I have a few doubts
1)I looked up some of the projects done using the same d...

agile monolith Dec 16, 2021, 9:43 PM

#

x = np.linspace(0 , (2 * np.pi), 200)
def h2d(x):
    fig = plt.figure()
    n=np.arange(1, 12,2)
    print(n, 'n')
    xx,nn = np.meshgrid((x),(n))
    plt.plot(xx,nn)
    
    
    happrox = (1*(4/(np.pi)) * ((np.sin(nn*(xx))) / nn))
    happrox = np.cumsum(happrox,1)
    return(happrox)

happrox = h2d(x)
print(happrox)

#

This my code but it gives wack result for cumsum, is it coz i have imaginary numbers after the sin operation?

tidal bough Dec 16, 2021, 10:19 PM

#

why'd you have imaginary numbers here?

agile monolith Dec 16, 2021, 10:20 PM

#

tidal bough why'd you have imaginary numbers here?

wdym?

tidal bough Dec 16, 2021, 10:21 PM

#

I don't see anything here that'd result in imaginary results

agile monolith Dec 16, 2021, 10:25 PM

#

tidal bough I don't see anything here that'd result in imaginary results

it shouldn't yes

#

so why is the output wrong?

tidal bough Dec 16, 2021, 10:26 PM

#

Why do you think that it's wrong?

agile monolith Dec 16, 2021, 10:28 PM

#

because when i get it to check the code, jupyter notebook says the desired output is


y: array([[ 0.000000e+00, 8.033524e-02, 1.602705e-01, 2.394092e-01, 3.173611e-01, 3.937456e-01,
 4.681952e-01, 5.403578e-01, 6.099000e-01, 6.765098e-01, 7.398989e-01, 7.998050e-01,...

agile monolith Dec 16, 2021, 10:28 PM

#

tidal bough Why do you think that it's wrong?

the y array is the desired output, the x array is my output

tidal bough Dec 16, 2021, 10:28 PM

#

If I'm reading it right, even the shape isn't right.

agile monolith Dec 16, 2021, 10:29 PM

#

tidal bough If I'm reading it right, even the shape isn't right.

yes which is really scary

arctic wedgeBOT Dec 16, 2021, 10:30 PM

#

Hey @agile monolith!

It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

agile monolith Dec 16, 2021, 10:31 PM

#

doc isnt allowed damn

#

#

#

this is the task so it doesnt make sense for the shape to be (4,200)

#

@tidal bough im not trippin right?

tidal bough Dec 16, 2021, 10:34 PM

#

Yeah, it's somewhat weird that the shape isn't (6,200)

agile monolith Dec 16, 2021, 10:36 PM

#

tidal bough Yeah, it's somewhat weird that the shape isn't (6,200)

my shape is (0,6,200) apparently which if we consider it to be (6,200) it makes sense along with the instructions. but the wanted result is in the form (4,200)

slow vigil Dec 17, 2021, 12:26 AM

#

pretty noob question but does anyone know a good way to only view the first few lines of a massive json file in the terminal?

#

I'm getting these api responses that are so big my terminal won't print the whole thing, but it only prints the end of the file up, so I can't see the data structure

#

I'm writing them to a file currently, but I was wondering if there's an easy way to see the beginning in the terminal

tidal bough Dec 17, 2021, 12:47 AM

#

you can just .splitlines the JSON string and take the first few lines
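
#

something like this, as a sketch. `preview_json` and the fake `response` are made-up names for illustration; pretty-printing first gives the string meaningful lines to split on

```python
import json


def preview_json(obj, max_lines=20):
    """Return the first `max_lines` lines of a pretty-printed JSON document."""
    text = json.dumps(obj, indent=2)
    return "\n".join(text.splitlines()[:max_lines])


# Hypothetical stand-in for a large API response.
response = {"items": [{"id": i, "name": f"item-{i}"} for i in range(1000)]}
print(preview_json(response, max_lines=8))
```

if the response is already a JSON string, `json.loads` it first so that it can be re-dumped with indentation.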

lost zodiac Dec 17, 2021, 12:49 AM

#

is there a place where i can practice Data Science based Python questions?
ive checked Leetcode but its not there 😦

agile monolith Dec 17, 2021, 2:56 AM

#

tidal bough you can just `.splitlines` the JSON string and take the first few lines

i did it

#

lowkey very happy

#

no idea why or how it works

wicked grove Dec 17, 2021, 4:06 AM

#

<ipython-input-82-182241fd5e53> in <module>
      6                "stride": 2}
      7 
----> 8 Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
      9 print("Z's mean =\n", np.mean(Z))
     10 print("Z[0,2,1] =\n", Z[0, 2, 1])

<ipython-input-81-f5cd533d7e29> in conv_forward(A_prev, W, b, hparameters)
     86                     weights = W[:,:,:,c]
     87                     biases = b[:,:,:,c]
---> 88                     Z[i,h,w,c] = conv_single_step(a_slice_prev,weights,biases)
     89 
     90 

IndexError: index 4 is out of bounds for axis 3 with size 4

i cant understand where i am going wrong

hearty token Dec 17, 2021, 6:08 AM

#

I see that there's a slight improvement when it comes to training a contextual chatbot in deeplearning with data that fits this criteria:

Equal distribution of interrogative words
i.e. if one tag contains how do i water the plants?
Adding a question starting with how on another tag to balance it makes it more intelligent
Equal distribution of patterns (i.e. 10 patterns each tag)

I have some idea why this is the case but could anyone explain to me why exactly this is the case? (or if it isn't and it's a matter of chance)

rapid pelican Dec 17, 2021, 6:25 AM

#

desert oar if you are still concerned and not convinced, i recommend you spend some time re...

i know this isn't related to data science nor artificial intelligence but i remember you! congrats on getting the helper role!

inner pebble Dec 17, 2021, 8:10 AM

#

Thanks for answering. I m trying this

inner pebble Dec 17, 2021, 8:34 AM

#

It works just fine @polar acorn I think I ve lost myself in my own code as it seems just easy and logic today.
Thanks for it.

#

Guys, I have another question.
I think I m gonna use streamlit to create dashboards app for my company.
My question (and I ve asked it in game dev as well because I think the stakes are the same) is how can I share the built apps to my colleagues.

I can either build the app on the common file server and install a python env in that folder so everyone can launch the app from a shared folder, but then only one user can use it at a time. The other problem is that I need to create a file that launches the python env before the app file.
Sounds like a MacGyver solution.

Or I create an app just like any software and I can share the app to anyone locally. How can I do that?
Should I use docker to create an image?

odd meteor Dec 17, 2021, 8:45 AM

#

inner pebble Guys, I have another question. I think I m gonna use streamlit to create dashboa...

Seems like a problem Gradio can solve. Check out Gradio

inner pebble Dec 17, 2021, 8:47 AM

#

odd meteor Seems like a problem Gradio can solve. Check out Gradio

I m checking this out thanks for it

odd meteor Dec 17, 2021, 8:51 AM

#

inner pebble I m checking this out thanks for it

Also, I think HuggingFace has a feature called Spaces, you can host what you've built with Gradio on Spaces and multiple users can access it at the same time.

Well, since HuggingFace just acquired Gradio yesterday, I believe it'll definitely blossom into something more beautiful.

inner pebble Dec 17, 2021, 8:52 AM

#

ah yeah, something I didn't mention is that since it's working with company data I want it to stay local or on the intranet
HuggingFace's Spaces is a cloud hosting service?

odd meteor Dec 17, 2021, 8:58 AM

#

inner pebble ah yeah I didn t mention something is that as it s working with companies data I...

IDK for sure because I've not personally used Spaces. But what I do know is, whatever code / model you put out on HF spaces is still 100% yours.

inner pebble Dec 17, 2021, 8:58 AM

#

ok I m gonna check this out as well thanks @odd meteor

sleek tapir Dec 17, 2021, 10:01 AM

#

https://www.maths.unsw.edu.au/courses/math3371

#

is this course good

wicked grove Dec 17, 2021, 10:02 AM

#

hello i am really new tensorflow and i keep getting these errors

#

AttributeError                            Traceback (most recent call last)
<ipython-input-9-797be23c9feb> in <module>
      3 loss = tf.Variable((y-y_hat)**2,name='loss')
      4 #init = tf.initialize_all_variables
----> 5 with tf.Session() as session:
      6     #session.run(init)
      7     print(session.run(loss))

AttributeError: module 'tensorflow' has no attribute 'Session'

wicked grove Dec 17, 2021, 10:25 AM

#

odd meteor IDK for sure because I've not personally used Spaces. But what I do know is, wha...

sorry to ping you, could you please tell me why i am getting the above error

odd meteor Dec 17, 2021, 10:42 AM

#

wicked grove sorry to ping you, could you please tell me why i am getting the above error

Because tf.Session was removed from the top-level API in TensorFlow 2.0

https://stackoverflow.com/questions/55142951/tensorflow-2-0-attributeerror-module-tensorflow-has-no-attribute-session


wicked grove Dec 17, 2021, 10:43 AM

#

Ohhh okayy,thank you! So ill just use tf.print()?

odd meteor Dec 17, 2021, 10:48 AM

#

wicked grove Ohhh okayy,thank you! So ill just use tf.print()?

Use tf.compat.v1.Session() instead

modest mulch Dec 17, 2021, 11:28 AM

#

Anyone knows of deep learning models used for time series classification that I could read about?

mighty spoke Dec 17, 2021, 12:10 PM

#

Hi I'm getting this error: but not sure how to fix it
File "C:\Users\haris\Documents\Bsc stock project\centroid lags.py", line 68, in xvals
return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),

IndexError: index 0 is out of bounds for axis 0 with size 0

import pandas as pd  # import pandas package to read data more easily
import matplotlib.pyplot as plt  # imported pyplot to plot graphs
import datetime as dt  # date time to read first column of csv file
import numpy as np
from datetime import datetime
CL=[]
for i in range(100):
    df = pd.read_csv('TSLA.csv')
    df2 = pd.read_csv('NBM.V.csv')
    df0=pd.read_csv('file1.csv')
    df5=pd.read_csv('file2.csv')
    data1=df0
    data2=df5
    data1['Date'] = pd.to_datetime(data1['Date'])
    data2['Date'] = pd.to_datetime(data2['Date'])
    x1=(data1['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400
    x2=(data2['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400
    y1=data1['Close']
    y2=data2['Close']
    t0=[]
    d0=[]

#

    y2_mean = np.mean(y2)
    y1_stdv = np.std(y1)
    y2_stdv = np.std(y2)
    for i in range(len(data1)):
        for j in range(len(data2)):
            t=x2[j]-x1[i]
            t0.append(t)
            d = (y1[i]- y1_mean)*(y2[j] - y2_mean)/(y1_stdv*y2_stdv)
            d0.append(d)
           # return udcf
        #data=udcf(data1,data2)
    x, y = zip(*sorted(zip(t0, d0)))#ensures x and y values correspond to each others in pairs when sorted
    df4 = pd.DataFrame({'X' : x, 'Y' : y})  #we build a dataframe from the data
    #bins = create_bins(lower_bound=-6,width=3,quantity=30)
    bins=np.arange(min(x), max(x)+0.01, step=4.3)
    #bins2 = pd.IntervalIndex.from_tuples(bins, closed="left")
    categorical_object = pd.cut(x, bins)
    count=pd.value_counts(categorical_object)
    grp = df4.groupby(by = categorical_object)        #we group the data by the cut
    ret = grp.aggregate(np.mean)
    data2_new=df2.sample(frac = 0.7)
    data1_new=df.sample(frac = 0.7)
    dict = pd.DataFrame({'Date':data1_new['Date'],'Close': data1_new['Close']})
    kd = pd.DataFrame(dict)
    kd.to_csv('file2.csv', index=False)
    dict2 = pd.DataFrame({'Date':data2_new['Date'],'Close': data2_new['Close']})
    kd = pd.DataFrame(dict2)
    kd.to_csv('file3.csv', index=False) 
    x1,y1=zip(*sorted(zip(ret.X,ret.Y)))
    def lin_interp(x, y, i, percent_y):
        return x[i] + (x[i+1] - x[i]) * ((percent_y - y[i]) / (y[i+1] - y[i]))
    def xvals(x, y):
        percent_y = (max(y)*0.8)
        signs = np.sign(np.add(y, -percent_y))
        zero_crossings = (signs[0:-2] != signs[1:-1])
        zero_crossings_i = np.where(zero_crossings)[0]
        return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),
                lin_interp(x1, y1, zero_crossings_i[1], percent_y)]
    hmx = xvals(x1,y1)
    centroid=np.mean(hmx)
    CL.append(np.mean(centroid))

agile monolith Dec 17, 2021, 12:40 PM

#

mighty spoke Hi I'm getting this error: but not sure how to fix it File "C:\Users\haris\Doc...

Axis 0 size 0? Wth

mighty spoke Dec 17, 2021, 12:46 PM

#

agile monolith Axis 0 size 0? Wth

yhh it says this also: return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),lin_interp(x1, y1, zero_crossings_i[1], percent_y)]

IndexError: index 1 is out of bounds for axis 0 with size 1

#

but when I take it all out the for loop it runs without errors

agile monolith Dec 17, 2021, 1:07 PM

#

what exactly are you trying to do?

mighty spoke Dec 17, 2021, 1:37 PM

#

agile monolith what exactly are you trying to do?

I'm trying to find 80% of the max y value in a particular 70% sample, then I'm trying to find the x coordinates of where the line y_value=max(ret.Y)*0.8 crosses/intersects the curve between the 2 points either side of the peak. Then I try finding the midpoint of these two x coordinates and append it to a list, then I will try to bin these values and plot N (number of points in each bin) vs the binned data

#

ret.Y is my y data and ret.X is my x data
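
#

a hedged sketch of that crossing search which does not assume two crossings always exist (a random 70% sample may well produce fewer, which would explain the "size 0" and "size 1" IndexErrors). the function names are made up for illustration; it also compares `above[:-1]` with `above[1:]`, so the last pair of points is not skipped

```python
import numpy as np


def level_crossings(x, y, level):
    """Return the x positions where the piecewise-linear curve (x, y) crosses `level`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    above = y > level
    # Index i is a crossing when points i and i+1 lie on opposite sides of the level.
    idx = np.where(above[:-1] != above[1:])[0]
    # Linear interpolation between the two points that bracket each crossing.
    return x[idx] + (x[idx + 1] - x[idx]) * (level - y[idx]) / (y[idx + 1] - y[idx])


def centroid_at(x, y, fraction=0.8):
    """Midpoint of the first two crossings of fraction * max(y), or None if fewer than two exist."""
    crossings = level_crossings(x, y, fraction * np.max(y))
    if len(crossings) < 2:
        return None  # e.g. the peak sits at the edge of this resample
    return float(np.mean(crossings[:2]))


print(centroid_at([0, 1, 2, 3, 4], [0, 1, 2, 1, 0]))  # peak in the middle
print(centroid_at([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # peak at the edge: only one crossing
```

inside the resampling loop, a `None` result can simply be skipped instead of appended.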

agile monolith Dec 17, 2021, 1:39 PM

#

the issue starts b4 "and plot N (number of points in each bin) vs the binned data" but after "I'm trying to find 80% of the max y values in a particular 70% sample then I'm trying to find the x coordinates of where the line y_value=max(ret.Y)*0.8 crosses/intersects the 2 points either side of the peak, Then I try finding the midpoint of these two x coordinates and append them to a list"

#

ur samples have any nans?

mighty spoke Dec 17, 2021, 1:40 PM

#

lemme check one sec

agile monolith Dec 17, 2021, 1:40 PM

#

one of the operations u do is preventing u i think

#

coz the size doesnt match

#

check the size of all the lists

#

check what they are after the nans are filtered (if u have any)

mighty spoke Dec 17, 2021, 1:42 PM

#

when I print my sample data frame it doesn't contain any NaN

agile monolith Dec 17, 2021, 1:42 PM

#

check size

#

of each sample

mighty spoke Dec 17, 2021, 1:42 PM

#

oh yes

upbeat prism Dec 17, 2021, 1:48 PM

#

so how does one pass weight scaling to a loss function in pytorch? I see that BCELoss can take weights when an object is being created, but I go through my data in batches and I want the weights batch-wise. do you guys just create a new object for each batch or what?

#

I just pass a "reference" or whatever python calls it to my training function

desert oar Dec 17, 2021, 2:48 PM

#

pastel valley are svm kernels like optimizers in neural networks?

no, the svm kernel is more like hidden layers in the NN

desert oar Dec 17, 2021, 2:48 PM

#

rapid pelican i know this isn't related to data science nor artificial intelligence but i reme...

thanks!

pastel valley Dec 17, 2021, 2:52 PM

#

desert oar no, the svm kernel is more like hidden layers in the NN

how to choose parameters with it?

#

ill use polynomial kernel but what parameters i should put?

desert oar Dec 17, 2021, 2:52 PM

#

what do you mean?

#

a polynomial kernel turns y = wx into y = f(x) where f is a polynomial. the "parameter" is just the order of the polynomial. you probably don't want more than 2 or 3 imo
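
#

to make the "parameters" concrete, here is a numpy sketch of the polynomial kernel itself, K(x, y) = (gamma * <x, y> + coef0) ** degree. the argument names mirror the ones scikit-learn's SVC uses for `kernel="poly"`; the default values here are only illustrative

```python
import numpy as np


def polynomial_kernel(X, Y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel between every row of X and every row of Y."""
    return (gamma * (X @ Y.T) + coef0) ** degree


X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
K = polynomial_kernel(X, X, degree=2)
print(K)
```

`degree` is the order of the polynomial mentioned above; `gamma` and `coef0` only rescale and shift the inner product before it is raised to that power.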

pastel valley Dec 17, 2021, 2:53 PM

#

desert oar what do you mean?

this one

#

desert oar Dec 17, 2021, 2:54 PM

#

for the other kernels, treat them like hyperparameters in any other model, e.g. a neural network

#

you should familiarize yourself with how those kernels actually work (and what a kernel actually is)

pastel valley Dec 17, 2021, 2:54 PM

#

oh maybe thats why haha
thank you ill try to look into it

subtle breach Dec 17, 2021, 3:10 PM

#

desert oar you should familiarize yourself with how those kernels actually work (and what a...

You great man; your links really helped - so easy:

#

g = sns.FacetGrid(df2, col="region", height=3.5, aspect=.65)
g.map(sns.kdeplot, "charges")

#

#

easy!!

#

(just now need to add mean/median*).

desert oar Dec 17, 2021, 3:59 PM

#

subtle breach (just now need to add mean/median*).

these are still matplotlib plots, so you can get access to the Axes objects and use the usual .plot method to add lines

#

or better yet, the .axvline method

weary stag Dec 17, 2021, 4:56 PM

#

hey can anyone help me with this. nameError: 'app' is not defined on my 1 flask code

wicked grove Dec 17, 2021, 5:21 PM

#

online learning with relative preferences has resulted in a new framework for optimizing over sets of alternatives with only relative, subset-wise observations. This is a general framework that is applicable to many automated recommendation systems that can sequentially elicit only relative preferences from, say, human users, e.g., “Do you like X over Y (and Z)?” This study has yielded state-of-the-art learning algorithms that make optimal subset selection decisions in terms of regret and rank-order estimation error, along with new insights on how to efficiently make comparisons to elicit items’ utilities within a wide range of social choice models.

i came across this in a paper, can someone please explain it to me

serene scaffold Dec 17, 2021, 5:23 PM

#

weary stag hey can anyone help me with this. nameError: 'app' is not defined on my 1 flas...

sounds like a #web-development question, but the problem is that you referred to something that you haven't defined. There are built-in functions and classes that are already named when your program starts, but everything else you have to import or name in the code.

#

I would recommend using a code editor that highlights built-in names so that you start remembering which ones they are. It's helpful to know them since they're always available.

bronze skiff Dec 17, 2021, 5:39 PM

#

wicked grove online learning with relative preferences has resulted in a new framework for...

what is your previous expertise with ML? for example, does "regret minimization" mean anything to you?

serene scaffold Dec 17, 2021, 5:57 PM

#

Nothing can minimize my regret.

odd meteor Dec 17, 2021, 6:15 PM

#

serene scaffold Nothing can minimize my regret.

What about a 4-story building in Silicon Valley and a yacht? 😃

serene scaffold Dec 17, 2021, 6:16 PM

#

odd meteor What about a 4-story building in Silicon Valley and a yacht? 😃

if I can sell them both and buy a condo here, sure.

warped rapids Dec 17, 2021, 6:28 PM

#

Hey all! I want to fill the areas in between the lines and add a text label

#

Any idea on how to do it?

#

odd meteor Dec 17, 2021, 6:28 PM

#

weary stag hey can anyone help me with this. nameError: 'app' is not defined on my 1 flas...

I don't know much about model deployment (I'm assuming you're trying to deploy your model using Flask)

Ensure you've created app.py file

Hopefully, people into MLOps can help you.

warped rapids Dec 17, 2021, 6:30 PM

#

I know there's a fill_between function in matplotlib, but have no idea on how to implement it in the current state of my code

#

And it needs to be a different color in each section

odd meteor Dec 17, 2021, 6:31 PM

#

serene scaffold if I can sell them both and buy a condo here, sure.

😂

warped rapids Dec 17, 2021, 6:38 PM

#

Anyone any idea?

bronze skiff Dec 17, 2021, 6:56 PM

#

warped rapids

this can't be a real plot

#

also post your code if you're unsure of implementation

slow vigil Dec 17, 2021, 7:28 PM

#

Hey guys I have a pandas dataframe and the index is all labels. I need to use the value of the labels individually in another function and I was wondering how to go about getting them

loud cave Dec 17, 2021, 7:39 PM

#

slow vigil Hey guys I have a pandas dataframe and the index is all labels. I need to use th...

You can do df.index, or if you only need the distinct values, df.index.unique()

slow vigil Dec 17, 2021, 7:41 PM

#

I need each index value one at a time
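
#

a minimal sketch of getting the labels one at a time. the DataFrame and the `describe` function are hypothetical stand-ins for the real data and the real function

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])


def describe(label):
    """Hypothetical stand-in for the function that needs one label at a time."""
    return f"label={label}"


# Iterating over df.index yields the labels one by one.
results = [describe(label) for label in df.index]
print(results)

# If the row is needed as well, iterrows() yields (label, row) pairs.
for label, row in df.iterrows():
    print(label, row["value"])
```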

warped rapids Dec 17, 2021, 7:53 PM

#

bronze skiff this can't be a real plot

What do you mean?

#

It is

mighty spoke Dec 17, 2021, 8:48 PM

#

Hi, my code was running like 2 mins ago but now when I run it it says
File "pandas\_libs\parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__

EmptyDataError: No columns to parse from file

#

and in another tab using the same files it says the same thing

serene scaffold Dec 17, 2021, 9:36 PM

#

mighty spoke Hi I my code was running like 2 mins ago but now when I run it it says File "...

well, try printing the DataFrame right before the line that eventually causes that error. Remember to read the traceback from top to bottom to understand how the error happened

mighty spoke Dec 17, 2021, 9:44 PM

#

serene scaffold well, try printing the DataFrame right before the line that eventually causes th...

thankyou it worked

serene scaffold Dec 17, 2021, 9:45 PM

#

mighty spoke thankyou it worked

what I suggested was not a solution--it was only intended to help you figure out why you got the error

#

but I'm glad it worked, whatever that means

mighty spoke Dec 17, 2021, 9:45 PM

#

yhh it was reading data from a file with no data

arctic wedgeBOT Dec 17, 2021, 10:14 PM

#

:incoming_envelope: :ok_hand: applied mute to @dense atlas until <t:1639779871:f> (9 minutes and 58 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

hybrid mica Dec 17, 2021, 10:16 PM

#

How many types of machine learning are there?

odd meteor Dec 17, 2021, 10:21 PM

#

hybrid mica How many types of machine learning are there?

Supervised Learning, Unsupervised Learning, Reinforcement Learning, and more recently, Semi-Supervised Learning and Self-Supervised Learning 😫

delicate sphinx Dec 17, 2021, 10:32 PM

#

TENSORFLOW
So this is the output from my model. I have a tokenizer which can map integers to and from words, but with the output in this form I can't use it. Is there a way to return this to integer form? I looked into TextVectorization but my model uses an image and a question input to get an answer (in text) output. So I'm not sure if I can use it

#

Can anyone give me some tips on how to convert this into integers so I can tokenize it back into english?

#

I imagine I need some sort of TextVectorization layer as output from my merged model?

hollow sentinel Dec 18, 2021, 12:31 AM

#

#

i am so beyond confused

#

by that bottom arrow

#

x should be 2, y is 2, and z is 0

#

but i don't see how exactly y is two?

#

oh

#

i'm an idiot

#

lol

#

now it makes sense

hollow sentinel Dec 18, 2021, 1:27 AM

#

it's 2 diagonally

arctic crown Dec 18, 2021, 1:40 AM

#

in python tensorflow what does .loc() do?

wicked grove Dec 18, 2021, 2:01 AM

#

bronze skiff what is your previous expertise with ML? for example, does "regret minimization"...

oh no i do not know what it means. i had to read a professor's paper and write a mail

serene scaffold Dec 18, 2021, 2:26 AM

#

arctic crown in python tensorflow what does .loc() do?

example?

steel berry Dec 18, 2021, 3:39 AM

#

serene scaffold Nothing can minimize my regret.

this

steel berry Dec 18, 2021, 3:41 AM

#

arctic crown in python tensorflow what does .loc() do?

I am not aware of a loc method in tensorflow, perhaps you are using the pandas .loc indexer?
in which case you can refer to this for complete documentation https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

granite furnace Dec 18, 2021, 6:14 AM

#

super noob here. I have this pandas dataframe, and i would like to compare the second two columns to the first, outputting true if their signs match, and false otherwise.

#

_df['randforest_result'] = np.where((_df['randforest'] >= 0 and _df['y_test'] >= 0) or (_df['randforest'] < 0 and _df['y_test'] < 0), True, False)
I tried something like this but it's giving me issues about the truthiness of series/dataframes

shrewd saddle Dec 18, 2021, 6:31 AM

#

This probably is not the right place to ask this, but anyway, I am trying to do some elementary satellite image analysis with rasterio and earthpy. The NDVI is given as the normalized difference between the near infra-red and red bands. Water should appear yellow or red (negative values) and vegetation should appear green (positive value) in the NDVI result. But I am getting kinda the opposite

#

The green part is supposed to be water, and yellowish part is land and vegetation

#

this is the code:

ndvi = es.normalized_diff(stacked[4], stacked[3])

ep.plot_bands(ndvi, cmap="RdYlGn", cols=1, vmin=-1, vmax=1, figsize=(10, 14))

plt.show()

#

stacked[4] is NIR and stacked[3] is the red band
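
One way to sanity-check this, sketched with plain NumPy and made-up reflectance values (the numbers are assumptions, not real band data): NDVI is (NIR - red) / (NIR + red), so if the two bands end up swapped, for example because the stack index is off by one, every value flips sign and the colors invert.

```python
import numpy as np

# Hypothetical reflectances: first pixel water-like, second vegetation-like.
red = np.array([[0.10, 0.05]])
nir = np.array([[0.02, 0.50]])

# NDVI = (NIR - red) / (NIR + red)
ndvi = (nir - red) / (nir + red)

# Same formula with the bands swapped: every value changes sign.
swapped = (red - nir) / (red + nir)
```

Printing the value at a pixel you know is water should show quickly whether the bands in `stacked` are the ones you expect.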

hasty kiln Dec 18, 2021, 7:10 AM

#

import numpy as np
a = np.array([
          [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
          [10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]])

arr = np.array_split(a, 3, axis=1)

print(arr)              
[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]

Why is it split by column and not by row, even though I passed axis=1, which I thought meant splitting by row?
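
A short sketch comparing the two axes on the same 6x3 array: `axis` names the dimension that gets cut, so `axis=0` cuts along the rows and `axis=1` cuts along the columns.

```python
import numpy as np

a = np.arange(1, 19).reshape(6, 3)  # same 6x3 array as above

# axis=0 cuts along the row dimension: three blocks of 2 rows each.
by_rows = np.array_split(a, 3, axis=0)

# axis=1 cuts along the column dimension: three blocks of 1 column each.
by_cols = np.array_split(a, 3, axis=1)
```

So `axis=0` is the one to use if you want groups of rows.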

#

old grove Dec 18, 2021, 8:38 AM

#

if the difference is 0.5 between mean and median... can we use mean... eg i am analyzing avg no of likes on Instagram

bold timber Dec 18, 2021, 8:53 AM

#

from this code I got an error like this: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

how to fix that?

timber flame Dec 18, 2021, 9:42 AM

#

Bros, I've been a machine learning engineer for 1.5 years, with devops and automation experience too. I know python and am learning Go too, know SQL well and bash too. I had a rough week at my org and want to prepare asap for a couple of months and get a new job. Can someone tell me what specific sites, topics and courses I should start doing to get a new job? For Python btw. If u know what ML topics I should learn that would be cool.

tall loom Dec 18, 2021, 9:47 AM

#

What are the properties of a stop word without considering its semantics? I want to determine whether a word is a stop word for any random language, one I can think of is that it occurs in a high frequency within a text.

#

The inverse document frequency would be high too, but I don't want to work with a group of documents. Rather, for a single document.

#

For a multi-document approach, I can perhaps create a context vector and train a naive bayes classifier to determine if a word is stop word or not.

#

However, I need a lightweight way to just determine with some accuracy on a single document.

odd meteor Dec 18, 2021, 10:43 AM

#

tall loom What are the properties of a stop word without considering its semantics? I want...

Stopwords are literally those common words in every language that add zero value to our task, yet they tend to be littered everywhere in our documents most of the time.

A word can be a stopword and still not appear frequently in a document.

odd meteor Dec 18, 2021, 10:48 AM

#

timber flame Bros, I'm a 1.5 years machine learning engineer, with devops and automation expe...

Since you're already a Machine Learning Engineer, I think a demonstrable work experience would be sufficient in your case. Except there's a niche in ML you wanna go into

pastel valley Dec 18, 2021, 10:53 AM

#

yo in application of data augmentation for samples there could be a boundary like there will be a optimal amount of augmented data that best give result and there is something like a dataset that compose of too much augmented data to the point that it gives bad observation to the model?

odd meteor Dec 18, 2021, 10:57 AM

#

tall loom The inverse document frequency would be high too, but I don't want to work with...

Is there any reason why you'd wanna do this from scratch when there's already an easier option to achieve your task?

Just as we have English stopwords, there are stopwords in various other languages too. Libraries like spaCy, TfidfVectorizer etc already have a parameter that can handle such.

If you're working on a French text, just indicate you'd wanna remove stopwords in French with the appropriate parameter.

tall loom Dec 18, 2021, 11:07 AM

#

odd meteor Stopwords are literally those common words in every language that adds Zero valu...

Yes, that is true, but as I said, "without considering semantics", the high-frequency part is one of the general trend features and I only want to detect with some accuracy which is expected here. I am looking for some extra features which can help generalize with more certainty.

tall loom Dec 18, 2021, 11:08 AM

#

odd meteor Is there any reason why you'd wanna do this from sratch when there's already an ...

I have some low resource languages for which such stop word data does not exist. So I have to work with an unsupervised method

timber flame Dec 18, 2021, 11:09 AM

#

odd meteor Since you're already a Machine Learning Engineer, I think a demonstrable work ex...

Yes but I have to solve the coding questions, and get through the coding round. I haven't been doing any data structures and algorithms or leetcode etc

#

Plus my main work was data science related with a niche but I want to work in anything that is being offered

lethal lark Dec 18, 2021, 11:16 AM

#

I am doing a kaggle competition, getting low f1 score (it's stuck around 0.5) but high accuracy. Any suggestions on what might be causing this? my dataset is balanced and has two classes
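
A rough sketch of how the two metrics can drift apart, using made-up labels (an assumption, not the competition data). With two truly balanced classes, accuracy and F1 usually stay close, so one thing worth checking is whether the split being scored is as balanced as the training data, and which class is treated as positive.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 for the chosen positive class, from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Hypothetical evaluation split: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90
# Half the positives are missed, and 5 negatives are flagged by mistake.
y_pred = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85

acc, f1 = accuracy_and_f1(y_true, y_pred)  # accuracy 0.9, F1 0.5
```

Here accuracy is carried by the many easy negatives while F1 only looks at the positive class.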

odd meteor Dec 18, 2021, 11:16 AM

#

tall loom Yes, that is true, but as I said, "without considering semantics", the high-freq...

Aside from the high frequency, I doubt there's a way to easily get the kind of accuracy you'd love to see on the project if you're not familiar with the 'new' language or at least a native speaker of that 'new' language you wanna work with.

That's why I kinda feel using SpaCy or TfidfVectorizer or CountVectorizer's stopwords parameter is best.

It's an interesting project regardless 😊

tall loom Dec 18, 2021, 11:21 AM

#

odd meteor Aside the high frequency, I doubt if there's away to easily get the kind of accu...

SpaCy and the other packages deal with stopwords using a pre-defined dictionary for some language, right?

#

If such a dictionary does not exist in them, I don't think they will be able to work for a new low-resource language.

odd meteor Dec 18, 2021, 11:25 AM

#

tall loom Yes, that is true, but as I said, "without considering semantics", the high-freq...

Even going with the high frequency, this could easily mis-classify words that aren't stopwords as one.

Remember, even after removal of stop words, we still had to do Stemming and Lemmatization on a text. And most times, a stem of a word could very much appear frequently in a text too alongside the main culprit (stopwords)

tall loom Dec 18, 2021, 11:28 AM

#

However, certain features which I can think of are properties of NOT STOP WORDS instead of choosing STOP words. One could be the rank of the sentence a word appears in: given a document, I think the top paragraphs convey way more information, and as we go down it saturates and then concludes.
Such words in top sentences have a chance to be "keywords" rather than stop words, so words appearing in top sentences get lower weights.

Also frequency of a word within a sentence.

No. of sentences some word occurs in : More it is, the higher chances to be a stop word

Bigram frequency

Unique surrounding trigrams, if a word is surrounded by more unique words in a window of size 1, the greater chance it has to be a stop word.

And of course the frequency.

Combining these 6 features, I can train a classifier I guess
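
A minimal single-document sketch of three of these features (overall frequency, number of sentences a word occurs in, and unique neighbors in a window of size 1). The function name, the regex tokenizer, and the sample text are all assumptions for illustration; a real low-resource language may need a different tokenizer.

```python
import re
from collections import Counter, defaultdict

def stopword_features(text):
    """Per-word counts for one document: frequency, sentence spread, neighbor variety."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    freq = Counter()
    sentence_count = Counter()
    neighbors = defaultdict(set)
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence)
        freq.update(tokens)
        sentence_count.update(set(tokens))  # count each word once per sentence
        for i, tok in enumerate(tokens):
            if i > 0:
                neighbors[tok].add(tokens[i - 1])
            if i + 1 < len(tokens):
                neighbors[tok].add(tokens[i + 1])
    return {
        w: {
            "freq": freq[w],
            "sentences": sentence_count[w],
            "unique_neighbors": len(neighbors[w]),
        }
        for w in freq
    }

doc = "The cat sat on the mat. The dog chased the cat. A bird saw the dog."
features = stopword_features(doc)
```

On this toy text "the" scores highest on all three counts, which matches the intuition above, but the counts alone say nothing about how well that carries over to other languages.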

odd meteor Dec 18, 2021, 11:28 AM

#

tall loom SpaCy and the other packages deal with stopwords using a pre-defined dictionary ...

Yeah. Basically, I'd assumed it was majorly worked on by those who are fluent speakers or native speakers of that language.

tall loom Dec 18, 2021, 11:30 AM

#

odd meteor Even going with the high frequency, this could easily mis-classify words that ar...

Lemmatization and stemming will not exist for a low resource language as there is no pre-defined tree/rules for that language.

odd meteor Dec 18, 2021, 11:36 AM

#

To my understanding...

We all understand English, so that's why we can spot a stopword in English off the top of our heads without sweating it.

It'll be difficult to spot a stopword in Igbo Language if you don't at least understand the language to a reasonable level. Languages differ and can be complex at times... so I'm just thinking hmmm 🤔

Well, what do I know 🤷🏾‍♂️

tall loom Dec 18, 2021, 11:37 AM

#

tall loom However, certain features which I can think of is, properties of NOT STOP WORDS ...

@odd meteor Don't you think these features will be unique across any language? Even if its Igbo

tall loom Dec 18, 2021, 11:41 AM

#

odd meteor To my understanding... We all understand English, so that's why we can from t...

Yes, that's why we need something to handle it, more of a syntactic approach where the meaning is not known.

odd meteor Dec 18, 2021, 11:43 AM

#

tall loom @odd meteor Don't you think these features will be unique across any ...

I don't believe it will be tbh. Language can be complex and tend to differ in a lot of ways. Definitely, the only intersection I'd say all language would share when it comes to stopwords is their frequency of occurrence. And we need more than frequency of occurrence to get a reasonable result.

I don't believe those guys that worked on spaCy were able to come up with stopwords for several languages by just looking at frequency of occurrence in a text...

😂😂😂 To me, I believe it starts with grouping people who are native speakers of each language together to manually annotate, label or identify stopwords in that language

#

I might be wrong tho.... But that's what my brain is telling me they did

tall loom Dec 18, 2021, 11:45 AM

#

odd meteor I don't believe it will be tbh. Language can be complex and tend to differ in a ...

Yes, language can differ in a lot of ways, but the writing style for an article is consistent, as the frequency.

Which feature do you think from them will not be consistent across multiple languages? Because they are all term features, just like frequency (which you said will be consistent) , what differs them to not make them consistent? I am asking to gain more insights and better the features.

pastel valley Dec 18, 2021, 11:45 AM

#

yo can i generate more than 1000 augmented image per image
lets say i have 10 samples and i generated 1000 augmented images per sample so my total data set is 10010?

#

or based on the parameters or techniques that will be applied to the image there will be a limit on how much augmented images i can generate?

#

i am talking about the imagedatagen by keras

#

or if there are other library for image augmentation?

tall loom Dec 18, 2021, 11:48 AM

#

odd meteor I don't believe it will be tbh. Language can be complex and tend to differ in a ...

spaCy is open source, so I believe some contributors provided dictionaries of words for languages, frequency alone is definitely not the case. I will read the implementation of the module I guess

#

I saw, they just predefined a list of stop words for all languages, from different contributors.

odd meteor Dec 18, 2021, 11:52 AM

#

tall loom spaCy is open source, so I believe some contributors provided dictionaries of wo...

That's definitely a good way to go about it. I'm not an expert in NLP myself... All ideas are valid at this point 💡

odd meteor Dec 18, 2021, 11:54 AM

#

tall loom I saw, they just predefined a list of stop words for all languages, from differe...

Now, that's more like it. I kinda perceived that's what they'd do.

tall loom Dec 18, 2021, 11:56 AM

#

odd meteor That's definitely a good way to go about it. I'm not an expert in NLP myself... ...

I am only wondering as you said those features will not be consistent, but the frequency will be. So why is that the case, as implicitly most of them are working with some form of frequency and the rest are about order. If you can explain a bit on why you think they won't be consistent, then I can enhance them or maybe remove them. :)'

odd meteor Dec 18, 2021, 12:01 PM

#

timber flame Yes but I have to solve the coding questions, and get through the coding round. ...

Certificates aren't always necessary to break into tech, but if you're already good in Data Science and Machine Learning, then you might wanna consider learning Deep Learning.

Start from TensorFlow + Keras , then take the TensorFlow certification exam if you wanna have a certification.

If you can add PyTorch + TensorFlow + Keras in your repertoire that'd be super 🔥 🔥 🔥

timber flame Dec 18, 2021, 12:03 PM

#

Never once said "certificates"

#

Coding round isn't remotely related to deep learning

tall loom Dec 18, 2021, 12:06 PM

#

pastel valley yo can i generate more than 1000 augmented image per image lets say i have 10 sa...

Not sure about the implementation by imagedatagen, but by definition you can augment an image an infinite number of times!
However, the important aspect will be to check the redundancy among the augmented images, what parameters are u setting for augmentation depending upon the objective. because you can have 10 augmented images, and they are not providing sufficient NEW information for the neural network to learn anything significant.

pastel valley Dec 18, 2021, 12:09 PM

#

tall loom Not sure about the implementation by imagedatagen, but by definition you can aug...

do you think it can be a thesis topic? determine the optimal volume of augmented data for cnn models? like ill experiments using the same augmentation parameters and use 3 configurations
1 train the model with 30% of samples being augmented
2 train the model with 60% of samples being augmented
3 train the model with 90% of samples being augmented

will this topic hold value?

tall loom Dec 18, 2021, 12:12 PM

#

pastel valley do you think it can be a thesis topic? determine the optimal volume of augmented...

I think doing it from the redundancy and objective aspect with it will add more value, that is
Creating 10 augmented images for an "auto brightness" task will have high redundancy if the augmented images only have varying level of pixel brightness, but could be still optimal given the task.
However, for an object detection task, 10 of such images would be not optimal, in general (depends on the task again)

odd meteor Dec 18, 2021, 12:15 PM

#

tall loom I am only wondering as you said those features will not be consistent, but the f...

What I meant was that...

Languages can be fluid and dynamic, and as such, each language will be governed by different set of rules.

So I believe that identified features in language A will be consistent for language A but might not necessarily be in Language B and Language C. That kinda explains why spaCy had to use different contributors who supposedly are native speakers of the language they were assigned to work on.

pastel valley Dec 18, 2021, 12:19 PM

#

tall loom I think doing it from the redundancy and objective aspect with it will add more ...

what do you mean by auto brightness? classifying the brightness level? what i said earlier will be applied to a fish classification task could it be useful to determine the optimal amount of data for classifying fish images? or should i try to use a open source dataset and apply the configurations (1-3) and compare the result ?

#

do you think the result will be useful for others? or the topic itself is subjective on the task of what classification task is being conducted?

tall loom Dec 18, 2021, 12:19 PM

#

odd meteor What I meant was that... Languages can be fluid and dynamic, and as such, each...

spaCy didn't use an unsupervised approach for stop words, that's why they used available stop word data and for the languages they dont have data, there is no rigorous implementation yet.

I am only quoting this line which you said:

Definitely, the only intersection I'd say all language would share when it comes to stopwords is their frequency of occurrence.

The other features I have defined are almost from a frequency perspective only, so I believe they have to be consistent. But I was only discussing which feature among the ones I have said won't be consistent.

tall loom Dec 18, 2021, 12:21 PM

#

pastel valley what do you mean by auto brightness? classifying the brightness level? what i sa...

Like given an image, which has a bit darker areas, the task is to increase brightness optimally on those areas only.
Like, say image processing part of many cameras in night mode.

pastel valley Dec 18, 2021, 12:22 PM

#

tall loom Like given an image, which has a bit darker areas, the task is to increase brigh...

there are already existing studies about this or not?

wicked grove Dec 18, 2021, 12:23 PM

#

odd meteor That's definitely a good way to go about it. I'm not an expert in NLP myself... ...

hello can you please tell me why we add (x) at the end of dense

x = Flatten()(input)
x = Dense(dense_shape, activation='relu')(x)
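
In the Keras functional API, `Dense(...)` builds a layer object and the trailing `(x)` calls that object on the tensor `x`. A toy sketch of the same pattern in plain Python (the `Scale` class is hypothetical and only stands in for a layer):

```python
class Scale:
    """Toy stand-in for a layer: configured once, then called on an input."""

    def __init__(self, factor):
        # Configuration step, comparable to Dense(units, activation=...).
        self.factor = factor

    def __call__(self, values):
        # Runs when the instance is used like a function: layer(values).
        return [v * self.factor for v in values]

layer = Scale(3)         # step 1: build the layer object
out = layer([1, 2])      # step 2: call it on an input
same = Scale(3)([1, 2])  # both steps on one line, as in Dense(...)(x)
```

So `x = Dense(...)(x)` reads as "make a Dense layer, apply it to x, and keep the output as the new x".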

tall loom Dec 18, 2021, 12:27 PM

#

pastel valley do you think the result will be useful for others? or the topic itself is subjec...

What I meant is, the task of finding an optimal number of augmentation depends on the objective too!
" determine the optimal volume of augmented data for cnn models" + "Given a task"

If your task is fish classification, then it becomes
"determine the optimal volume of augmented data for cnn models if we are doing fish classification task"

It will depend on the data as well.

A better topic will be
"Given any random task, and data, finding an optimal number of augmentation images for a cnn model", which will also be an interesting contribution

tall loom Dec 18, 2021, 12:29 PM

#

pastel valley there are already existing studies about this or not?

This was an example on how changing the task will change the optimal number for augmentation as well. So it becomes difficult to generalize, but also a good problem.
I think going with a meta-heuristics way is a good start for this.

odd meteor Dec 18, 2021, 12:30 PM

#

timber flame Coding round isn't remotely related to deep learning

My response was with the assumption that you're interested in getting into another ML position. However, If you're more interested in coding round, then I'd suggest data structures and algorithm related tests.

Personally, the only coding test I've done during one of my ML job interview stages was a 2 hours Kaggle problem. 😂 About 11 of us were given a real life company data and asked to build a model and make prediction... F1 score was the metric used to rank submission. Top 4 submissions proceeded to the next level of the interview stage...

Again, it depends on your country as well. Other experienced guys here might be able to add one or two....

odd meteor Dec 18, 2021, 12:32 PM

#

tall loom spaCy didn't use an unsupervised approach for stop words, that's why they used a...

Okay, I get your point now.

stone marlin Dec 18, 2021, 12:33 PM

#

I'm in the US and have done ML-engineering + DS + DE roles --- it's a wildly different process depending on the company and their needs.

wicked grove Dec 18, 2021, 12:33 PM

#

wicked grove hello can you please tell me why we add (x) at the end of dense x = Flatten...

should i learn OOP in python to understand it better?

stone marlin Dec 18, 2021, 12:34 PM

#

So, depends on what you're into 0n3. You've done MLE for 1.5yrs, you prob know a direction you wanna move into. From there, perhaps look at some Indeed / BuiltIn postings to see what technologies and so forth some desired companies use.

#

I've never been asked to do anything related to deep learning w/rt my work/interviews but I'm also not in an industry which uses deep learning frequently, so YMMV.

pastel valley Dec 18, 2021, 12:34 PM

#

tall loom What I meant is, the task of finding an optimal number of augmentation depends o...

oh i see then it will not be a good topic unless i figure out a technique on the better topic you suggested

stone marlin Dec 18, 2021, 12:38 PM

#

Regardless of where you move into, if you haven't checked out Advent of Code (which is happening now!) check it out. Solving those problems will teach you a ton about software engineering and the like, and that's always a plus for applicants (at least in the fields I work). There's also a bunch'a channels here for it.

pastel valley Dec 18, 2021, 12:45 PM

#

stone marlin Regardless of where you move into, if you haven't checked out Advent of Code (wh...

where is this advent of code?

#

website?

#

or channel?

stone marlin Dec 18, 2021, 12:46 PM

#

It can be found at https://adventofcode.com/ and the associated channel here is #advent-of-code .

#

Oh, also, I meant this for 0n3, but it is useful for anyone.

pastel valley Dec 18, 2021, 12:49 PM

#

@tall loom yo sir i found a paper that classifies fish using cnn also but they did not apply augmentation on their dataset so if i redo their study and apply augmentation would it be a good contribution?
sorry i just dont know how to classify what is a good or nah topic because there i see more on solving low level problems like about the theories which is so hard for me and i see this classification tasks as topic and i dont know why our topic got revised and demanded a modification or alternative to data augmentation or find an issue on a cnn model that we can address which is soo much for me hahaha

pastel valley Dec 18, 2021, 12:50 PM

#

stone marlin It can be found at https://adventofcode.com/ and the associated channel here is ...

oh this like challenges ?

#

its like school festival but for programmers?

stone marlin Dec 18, 2021, 12:50 PM

#

It's a series of programming challenges, yes.

pastel valley Dec 18, 2021, 1:31 PM

#

@tall loom yo sir i go back about the image augmentation stuff
what if what i experiment is the parameters?
each configuration will have each parameters and tested
1 zoom, flip
2 color manipulation
3 patch erasing cropping
the best resulting parameters for image augmentation can be my contribution, for like
if there is anyone interested in training a model on classifying fish and they have those fish images, they can use the result of the experiment on which augmentation techniques they can apply on their model yeah? or nah?

#

they recommended me to think of an alternative way or better way for image augmentation for cnn models but i think everything is already there i cant figure out new technique

tall loom Dec 18, 2021, 1:37 PM

#

pastel valley @tall loom yo sir i found a paper that classifies fish using cnn al...

Here is how it will be:

1. You are getting better results by using image augmentation, on the methods which THEY have used.
2. There already exist other models which can classify fish, but they are not something that THEY have used, and perhaps they can generate better results without any augmentation, but for them baselines might already exist!

However, it can still get accepted, I know of a case personally who worked on some classification (can't disclose) on top of image augmentation, however they also used more dataset.
From a contributions perspective, it's not just better results though, it's more about WHAT is used to get better results. Augmentation is one way and it can still be considered as a topic, but if you are looking for novelty then this is not it, but that is just my opinion, as augmentation is standard practice these days for similar tasks.

pastel valley Dec 18, 2021, 1:42 PM

#

tall loom Here is how it will be: 1. You are getting better results by using image augment...

yeah i see other topics like a classification task with cnn also and they got accepted which is weird because we are pretty much the same almost only the samples are different

also the reason they say that i need revision is because i need to have some originality or something like what i see on cnn that i can change or what alternative method i can replace with image augmentation which is what you say a novel topic right? why are they expecting that kind of idea to me hahahaha

#

so i am trying to find something like a comparative analysis type of study which maybe acceptable than creating a whole new theory which is impossible for me

tall loom Dec 18, 2021, 1:45 PM

#

pastel valley @tall loom yo sir i go back about the image augmentation stuff what...

Yes, optimal parameter estimation is a good idea, although that is also task-dependent, and if you are restricting yourself to the task of "fish detection" then many things can be explored.
Although not generalized, it will still be a good analysis for this task and then it can be inferred by some empirical experiment on how it can be used further for similar if not all tasks.

But just testing won't be good enough, if you are doing only an iterative approach to find an optimum point, that will be data-dependent. But this can be a baseline for this data, the motive could be "How the standard practice of such and such hyperparameters for the fish classification task can be improvised if we modify this hyperparameter in this way, this you would have to show empirically of course, and then recommend something based on that"

pastel valley Dec 18, 2021, 1:46 PM

#

tall loom Yes, optimal parameter estimation is a good idea, although that is also task-dep...

i need to present a logical reason on why this configuration is better than that and that? is what you mean?

#

btw you are pro sir are you one of those who create papers and attend conferences?

tall loom Dec 18, 2021, 1:51 PM

#

pastel valley i need to present a logical reason on why this configuration is better than that...

Yes, you will start with that logical reason as your hypothesis and then you prove it by showing it from results with that experiment of tuning. You will either have to justify why the previous tasks have NOT explored your configuration OR you will be the first to suggest that configuration, for that task.
And the main theme should be how this idea can be implemented for other similar tasks.

pastel valley Dec 18, 2021, 1:51 PM

#

for example in this 3 configuration
1 zoom, flip
2 color manipulation
3 patch erasing cropping
i got 1 as the best result that implies a good parameter for data augmentation of fish images

i need to say why is it the winner
like because the 2nd configuration makes the image pixels change colors thus making the model confused because fish colors are important features(for example this is my observation on the model using this configuration)
and another reason for 3rd configuration

#

the reasons would be from the observed result of experimentation

tall loom Dec 18, 2021, 1:58 PM

#

pastel valley for example in this 3 configuration 1 zoom, flip 2 color manipulation 3 patch e...

The other way.
This is like, you did some experiment randomly, analyzed some configurations, and then found this should work well, and then you are making a theory of why it is working better. This works when there is a generalized data or generalized model. But here, it's one specific task on some data!
So you would have to then explain further on why that "image pixel theory" is correct, not just for this experiment, but for in general for similar tasks.
(You can try running on other datasets, previous and after results, etc)

#

Or you can also start by proposing that why enhancing some parameters should work better and is something that has not been done in previous works, then you give a theory, and then implement it and show that it is correct for a number of datasets, this is how it should work though, as you start with some hypothesis based on some observation and then you prove it.

pastel valley Dec 18, 2021, 2:01 PM

#

oh for example i have the results then i will try the configuration on example imagenet dataset and check the result with and without using the configuration

pastel valley Dec 18, 2021, 2:02 PM

#

tall loom Or you can also start by proposing that why enhancing some parameters should wor...

but if i fail to prove it do i also fail my semester? 😅 or is it still an acceptable discovery?

#

for example my hypo is the color manipulation is bad configuration and it turns out good dang what do i do hahaha

tall loom Dec 18, 2021, 2:04 PM

#

pastel valley but if i fail to prove it do i also fail my semester? 😅 or is it still an acce...

No, you won't fail, the conclusion would be that hyperparameter tuning for such and such parameters is not acceptable and is something that is not worth implementing, given the increase in efficiency is not sufficient to justify the extra data.

#

This helps others to NOT explore the part or explore in a different way(some other way which you can also suggest in end to explore further as ideas)

pastel valley Dec 18, 2021, 2:07 PM

#

oh so this what a real science topic is right?

tall loom Dec 18, 2021, 2:10 PM

#

pastel valley for example my hypo is the color manipulation is bad configuration and it turns ...

Also, the people judging matter in this case if it's for semester exams (a lot of bias occurs from panel members), you would have to present it in a different way then.
Keep a few hypotheses on side(like different configurations, you would have to read some papers on why people are exploring something), and if all of them fail on your data, then your project will be simply "effects of certain changes in a system" and tbh it will be an interesting study to explain why some tuning didn't work and why it worked for others. If you figure this out, you can definitely suggest some different directions to explore by end.

tall loom Dec 18, 2021, 2:11 PM

#

pastel valley oh so this what a real science topic is right?

Yes yes! Just make sure to not just analyze things and say something about them, it should be a comprehensive study and any claim should be justified with experiments+/mathematically or citing other works

pastel valley Dec 18, 2021, 2:12 PM

#

tall loom Also, the people judging also matter in this case if its for semester exams, ( a...

yeah there are different panel members by each group and maybe i draw the short straw on this one hahaha

lean hull Dec 18, 2021, 2:13 PM

#

trying to install Tensorflow/PyTorch on my Mac M1. is it possible without installing conda?
conda comes with too many unused packages and junk associated with it.

pastel valley Dec 18, 2021, 2:13 PM

#

tall loom Yes yes! Just make sure to not just analyze things and say something about them,...

its like everything should have a legitimate basis? this is the crucial part right?

serene scaffold Dec 18, 2021, 2:15 PM

#

lean hull trying to install Tensorflow/PyTorch on my Mac M1. is it possible without instal...

do you need CUDA?

#

just pip install torch should work on Mac but getting CUDA requires extra work, apparently.

pastel valley Dec 18, 2021, 2:17 PM

#

image augmentation is pretty popular so maybe there are alot of classification task that uses image augmentation that i can compare

tall loom Dec 18, 2021, 2:18 PM

#

pastel valley yeah there are different panel members by each group and maybe i draw the short ...

No, don't worry. Any study which concludes something with experiments and ideas is accepted. It should be robust though, the part after the results should be rigorous. For example: Just analyzing a dataset and then concluding something about it is not acceptable, you would have to justify the conclusion, its reason with robust evidence, and then further generalize it.

tall loom Dec 18, 2021, 2:23 PM

#

pastel valley its like everything should have a legitimate basis? this is the crucial part rig...

Yes, you cannot say something happened because you think this is why it should happen. Your reason should have evidence.
The only thing to be careful about is, "You are explaining to the panel what you thought should happen, its reason, and you concluded it didn't happen"
And turns out your reason has flaws OR something didn't happen and the cause of it is something to be studied before even starting. Discuss this with supervisor 🙂

lean hull Dec 18, 2021, 2:26 PM

#

serene scaffold do you need CUDA?

yeah, I do

serene scaffold Dec 18, 2021, 2:26 PM

#

lean hull yeah, I do

let me see

serene scaffold Dec 18, 2021, 2:27 PM

#

lean hull yeah, I do

looks like Anaconda is the recommended way to do it for Mac.

wicked grove Dec 18, 2021, 2:30 PM

#

tall loom Yes yes! Just make sure to not just analyze things and say something about them,...

Hello, im trying to do data augmentation for my dataset of 40 images and am a bit confused

#

Do i do the augmentation per image and generate 10 images

pastel valley Dec 18, 2021, 2:30 PM

#

tall loom Yes, you cannot say something happened because you think this is why it should h...

yeah we got advisers but they are pretty much busy anyways i think i got this topic and maybe give this idea to the panel and start drafting on how should this go
incase it got accepted hahhaha because base on my understanding is i need to have something original to add or modify to existing algorithms well maybe this experimental study could suffice the panel

thank you very much sir hoping to talk to you again with my future struggles 😅 👍

wicked grove Dec 18, 2021, 2:31 PM

#

wicked grove Do i do the augmentation per image and generate 10 images

Or should i go about it in batches??

#

```
train_datagen = ImageDataGenerator(rescale=1./255,rotation_range=45,horizontal_flip=True)
```
i have been stuck and idk what i should do after this :/

tall loom Dec 18, 2021, 2:36 PM

#

pastel valley yeah we got advisers but they are pretty much busy anyways i think i got this to...

Yes, even if it's not original, a rigorous study is sufficient for a thesis. However, some modifications and suggestions can help in publication as well if that is your goal. And dont call me sir! 😄

lament idol Dec 18, 2021, 2:45 PM

#

Is anyone here from the United Kingdom?

serene scaffold Dec 18, 2021, 2:46 PM

#

lament idol Is anyone here from the United Kingdom?

why do you ask?

lament idol Dec 18, 2021, 2:46 PM

#

Has anyone here done A-Level maths?
I wanted to ask if the knowledge for calculus from there is sufficient enough to do stuff in machine learning

#

Or would I need to learn anything extra?

serene scaffold Dec 18, 2021, 2:47 PM

#

lament idol Has anyone here done A-Level maths? I wanted to ask if the knowledge for calculu...

I can't speak for what the job market is like in the UK, but AI depends on linear algebra and prob/stat, so it depends on whether they will let you learn that theory on the job. I suspect not.

tall loom Dec 18, 2021, 2:48 PM

#

wicked grove Hello, im trying to do data augmentation for my dataset of 40 images and am a bi...

You have to augment every image in your training dataset

wicked grove Dec 18, 2021, 2:48 PM

#

And how many images should be generated per image? I think i am actually confused

lament idol Dec 18, 2021, 2:49 PM

#

serene scaffold I can't speak for what the job market is like in the UK, but AI depends on linea...

In that case, I think i should be fine then
I'm starting uni soon but I heard that machine learning is heavy on calculus, which the degree I am taking doesn't expand on further

serene scaffold Dec 18, 2021, 2:49 PM

#

lament idol In that case, I think i should be fine then I'm starting uni soon but I heard th...

it's more linear algebra than calculus.

lament idol Dec 18, 2021, 2:49 PM

#

So I was wondering whether I'd need to catch up on or learn anything extra
But there was this other book I found online called mathematics for machine learning

#

Would you recommend it?

wicked grove Dec 18, 2021, 2:50 PM

#

tall loom You have to augment every image in your training dataset

Also I'm augmenting before training the model. Is it necessary to augment on the fly, or can I augment beforehand, save the images to the drive, and use them in the CNN later?

serene scaffold Dec 18, 2021, 2:50 PM

#

lament idol So I was wondering whether I'd need to catch up on or learn anything extra But t...

you should learn linear algebra and probability/statistics. My degree (B.Sc. in computer science/data science, in the US) had both.

tall loom Dec 18, 2021, 2:50 PM

#

wicked grove And how many images should be generated per image? I think i am actually confuse...

So you split the data into train, test, and validation
The number of images depends on your motive, you can create a rescaled image, a section of the image, some rotation image, etc

lament idol Dec 18, 2021, 2:52 PM

#

serene scaffold you should learn linear algebra and probability/statistics. My degree (B.Sc. in...

Can i go into data science with limited knowledge in calculus? So just stuff like differentiation and integration and other things

serene scaffold Dec 18, 2021, 2:52 PM

#

lament idol Can i go into data science with limited knowledge in calculus? So just stuff lik...

No

lament idol Dec 18, 2021, 2:52 PM

#

serene scaffold No

What else would I need then? Apart from linear algebra and stats

serene scaffold Dec 18, 2021, 2:53 PM

#

lament idol What else would I need then? Apart from linear algebra and stats

in terms of math, or other general skills?

lament idol Dec 18, 2021, 2:53 PM

#

serene scaffold in terms of math, or other general skills?

Maths

tall loom Dec 18, 2021, 2:53 PM

#

wicked grove Also I'm augmenting before training the model,is it necessary to augment it on t...

You can augment it anytime, but the 'on the fly' is more like a function you are applying on the image, so you don't have to generate and save all those images, rather function(Image) goes in the network. (I think this is how the library should do, to save space and avoid reading writing all images, or at least that's how I would implement)

#

But from the learning perspective, generating 'on the fly' or providing input of all augmented images from saved data makes no difference.
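
To make the "function applied to the image" idea concrete, here is a minimal NumPy sketch of on-the-fly augmentation. It is an illustration only, not how Keras implements `ImageDataGenerator`: the names `augment` and `on_the_fly_batches` are made up, the images are random stand-ins, and quarter-turn rotations are swapped in for the arbitrary-angle rotation so the example needs nothing beyond NumPy.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of one H x W x C image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]            # horizontal flip
    k = int(rng.integers(0, 4))              # 0 to 3 quarter turns
    return np.rot90(image, k=k, axes=(0, 1)).copy()

def on_the_fly_batches(images, batch_size, rng):
    """Yield augmented batches forever; nothing is written to disk."""
    n = len(images)
    while True:
        order = rng.permutation(n)           # new shuffle on every pass
        for start in range(0, n, batch_size):
            chosen = order[start:start + batch_size]
            yield np.stack([augment(images[i], rng) for i in chosen])

rng = np.random.default_rng(0)
images = rng.random((40, 32, 32, 3))         # stand-in for 40 square training images
first_batch = next(on_the_fly_batches(images, batch_size=16, rng=rng))
```

Saving the outputs of `augment` to disk and reading them back later would feed the network the same kind of data; the generator just skips the storage step and produces fresh variants on every pass.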

serene scaffold Dec 18, 2021, 2:55 PM

#

lament idol Maths

graph theory, I guess

#

and discrete math

tall loom Dec 18, 2021, 2:55 PM

#

tall loom But from the learning perspective, generating 'on the fly' or providing input of...

All it is doing is, creating new data for the model to learn, and all goes in the network @wicked grove

lament idol Dec 18, 2021, 2:56 PM

#

serene scaffold and discrete math

I've found a book that teaches maths for machine learning

agile monolith Dec 18, 2021, 2:56 PM

#

stelercus

#

i need ur help

lament idol Dec 18, 2021, 2:56 PM

#

I'll just teach myself the math needed
Thanks for informing me of the math needed

odd meteor Dec 18, 2021, 2:56 PM

#

lament idol Has anyone here done A-Level maths? I wanted to ask if the knowledge for calculu...

I'm not in the UK but I did A-level in Math, Physics, and Economics. From my experience, my knowledge of A-level in math was sufficient for calculus but poor in linear algebra and statistics. I learned Linear Algebra and Statistics in the University.

lament idol Dec 18, 2021, 2:57 PM

#

odd meteor I'm not in the UK but I did A-level in Math, Physics, and Economics. From my exp...

Thank you for informing me
I'll learn the stuff needed then

agile monolith Dec 18, 2021, 2:57 PM

#

odd meteor I'm not in the UK but I did A-level in Math, Physics, and Economics. From my exp...

there are optional modules in a level maths for stats, linear algebra is not taught coz u are assumed to know it already

serene scaffold Dec 18, 2021, 2:57 PM

#

agile monolith i need ur help

the last time someone was sure I was the only one who could help them, some of their limbs were never found.

wicked grove Dec 18, 2021, 2:58 PM

#

tall loom So you split the data into train, test, and validation The number of images depe...

I have a separate folder w images for validation and test

lament idol Dec 18, 2021, 2:58 PM

#

agile monolith there are optionary modules in a level maths for stats, linear algebra is not ta...

I think linear algebra is taught in fm

agile monolith Dec 18, 2021, 2:58 PM

#

serene scaffold the last time someone was sure I was the only one who could help them, some of t...

hahahahahahaha

lament idol Dec 18, 2021, 2:58 PM

#

I don't take fm because I'm massive stoopid

agile monolith Dec 18, 2021, 2:59 PM

#

lament idol I think linear algebra is taught in fm

pretty sure linear algebra is heavy in c3 and c4

agile monolith Dec 18, 2021, 2:59 PM

#

lament idol I think linear algebra is taught in fm

r u taking physics a lvl?

lament idol Dec 18, 2021, 2:59 PM

#

agile monolith r u taking physics a lvl?

Comp sci, physics, maths

agile monolith Dec 18, 2021, 2:59 PM

#

@serene scaffold come #help-pretzel

arctic crown Dec 18, 2021, 2:59 PM

#

what library should i use as a beginner? like tensorflow, pytorch, keras

wicked grove Dec 18, 2021, 3:00 PM

#

tall loom All it is doing is, creating new data for the model to learn, and all goes in th...

Thank you soo much for the explanation!! I am doing this for a thesis paper. After getting more images i wanna preprocess it and then feed it into a cnn

serene scaffold Dec 18, 2021, 3:00 PM

#

arctic crown what library should i use as a beginner? like tensorflow, pytorch, keras

I would do something that isn't deep learning first.

#

So, none of them for now.

arctic crown Dec 18, 2021, 3:00 PM

#

then?

serene scaffold Dec 18, 2021, 3:00 PM

#

sklearn

agile monolith Dec 18, 2021, 3:00 PM

#

lament idol Comp sci, physics, maths

gg bad combo

wicked grove Dec 18, 2021, 3:00 PM

#

wicked grove Thank you soo much for the explanation!! I am doing this for a thesis paper. Aft...

Also another question
What should be the batch size for the augmentation

agile monolith Dec 18, 2021, 3:00 PM

#

seems like a good combo at first but thats horrible

arctic crown Dec 18, 2021, 3:00 PM

#

serene scaffold sklearn

ah ok got any tutorials ?

serene scaffold Dec 18, 2021, 3:01 PM

#

arctic crown ah ok got any tutorials ?

not off the top of my head

lament idol Dec 18, 2021, 3:01 PM

#

agile monolith gg bad combo

It's good enough for me
I enjoy physics and I want to take comp sci at uni

#

So it works imo

arctic crown Dec 18, 2021, 3:01 PM

#

agile monolith gg bad combo

how?

agile monolith Dec 18, 2021, 3:01 PM

#

lament idol Comp sci, physics, maths

if u taking physics u must do FM too coz 50% of the topics are same

#

is it ur second year?

arctic crown Dec 18, 2021, 3:01 PM

#

serene scaffold not off the top of my head

i just cant find any on yt

lament idol Dec 18, 2021, 3:02 PM

#

agile monolith if u taking physics u must do FM too coz 50% of the topics are same

I know a lot of people who take physics but not fm
Then again, idk the contents of fm
Also yeah, second year

arctic crown Dec 18, 2021, 3:03 PM

#

serene scaffold sklearn

is sklearn and scikit-learn the same?

lean hull Dec 18, 2021, 3:03 PM

#

serene scaffold looks like Anaconda is the recommended way to do it for Mac.

😭

agile monolith Dec 18, 2021, 3:05 PM

#

lament idol I know a lot of people who take physics but not fm Then again, idk the contents ...

yh thats usually the teachers stupidity not knowing how many topics are actually linked. they keep saying no FM is too hard when u literally cover most of that in physics too.

lament idol Dec 18, 2021, 3:06 PM

#

agile monolith yh thats usually the teachers stupidity not knowing how many topics are actually...

I plan on teaching myself fm once exams end so I'll try and draw comparisons then

agile monolith Dec 18, 2021, 3:07 PM

#

lament idol I plan on teaching myself fm once exams end so I'll try and draw comparisons the...

how r u getting along with ur a lvls so far?

arctic crown Dec 18, 2021, 3:08 PM

#

serene scaffold not off the top of my head

which one shoulds i watch?

serene scaffold Dec 18, 2021, 3:08 PM

#

arctic crown which one shoulds i watch?

PeepoShrug

lament idol Dec 18, 2021, 3:08 PM

#

agile monolith how r u getting along with ur a lvls so far?

Good, thanks for asking
I submitted my uni app about two weeks ago and I've heard back from 4/5 of my unis

odd meteor Dec 18, 2021, 3:08 PM

#

agile monolith there are optionary modules in a level maths for stats, linear algebra is not ta...

Well, I'm from 🇳🇬. Statistics was the last module in Math curriculum that briefly introduced us to stats. And it only covered introduction to measures of central tendency... nothing too serious.

I proceeded to do my major in Statistics afterwards though.

agile monolith Dec 18, 2021, 3:09 PM

#

lament idol Good, thanks for asking I submitted my uni app about two weeks ago and I've hear...

which ones did u apply for?

lament idol Dec 18, 2021, 3:09 PM

#

agile monolith which ones did u apply for?

UCL, QMUL, Loughborough, City, Notts

agile monolith Dec 18, 2021, 3:09 PM

#

odd meteor Well, I'm from 🇳🇬. Statistics was the last module in Math curriculum that bri...

there are many optional modules for maths u only did S1 there is also S2,S3,S4

agile monolith Dec 18, 2021, 3:10 PM

#

lament idol UCL, QMUL, Loughborough, City, Notts

all russel?

lament idol Dec 18, 2021, 3:10 PM

#

agile monolith all russel?

City and Loughborough are the odd ones out

wicked grove Dec 18, 2021, 3:10 PM

#

@tall loom im so sorry i had another doubt
While doing the augmentation for the entire dataset
What parameters should i set for the image to come about 1000
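
As far as I know there is no single augmentation parameter that sets the total: the generator keeps yielding batches for as long as batches are requested, so the count comes from how many passes are made over the 40 source images. A rough back-of-the-envelope sketch, with made-up helper names and assuming a batch size of 16:

```python
import math

def copies_per_image(n_source, target_total):
    """How many augmented variants of each source image are needed."""
    return math.ceil(target_total / n_source)

def batch_sizes_per_pass(n_source, batch_size):
    """Sizes of the batches produced by one pass over the source images."""
    full, rest = divmod(n_source, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

copies = copies_per_image(40, 1000)     # passes over the data set
sizes = batch_sizes_per_pass(40, 16)    # the last batch of a pass is smaller
total = copies * sum(sizes)
```

With 40 images that works out to 25 passes, each made of batches of 16, 16 and 8 images.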

odd meteor Dec 18, 2021, 3:11 PM

#

agile monolith there are many optional modules for maths u only did S1 there is also S2,S3,S4

Perhaps we have different A-level bodies. The one in my country is probably different from yours. Was yours Cambridge A-level?

agile monolith Dec 18, 2021, 3:13 PM

#

odd meteor Perhaps we have different A-level bodies. The one in my country is probably diff...

they are usually slightly different in other countries

agile monolith Dec 18, 2021, 3:13 PM

#

lament idol UCL, QMUL, Loughborough, City, Notts

gl fam, u need like 3 As for ucl

lament idol Dec 18, 2021, 3:14 PM

#

agile monolith gl fam, u need like 3 As for ucl

Thank you
I'm predicted A* A* A
So i think i should be fine

agile monolith Dec 18, 2021, 3:18 PM

#

yh u will be fine probs

lament idol Dec 18, 2021, 3:19 PM

#

I hope so
UCL is the only I haven't heard back from as of yet

odd meteor Dec 18, 2021, 3:19 PM

#

arctic crown which one shoulds i watch?

😃 Check all of them first before you settle for one. Drop anyone that doesn't work for you.

PS: I'd also advise learning from a more structured platform. Learning solely on YouTube can be overwhelming and also cause fatigue.

agile monolith Dec 18, 2021, 3:19 PM

#

lament idol I hope so UCL is the only I haven't heard back from as of yet

got a friend who went to ucl, everyday was a misery for him afaik

agile monolith Dec 18, 2021, 3:19 PM

#

lament idol I hope so UCL is the only I haven't heard back from as of yet

u will if the predicteds are correct

lament idol Dec 18, 2021, 3:19 PM

#

agile monolith got a friend who went to ucl, everyday was a misery for him afaik

Was it that bad? I've heard good things about it

agile monolith Dec 18, 2021, 3:20 PM

#

in terms of difficulty*

lament idol Dec 18, 2021, 3:20 PM

#

agile monolith u will if the predicteds are correct

I just about meet the requirements so hopefully I should get in

arctic crown Dec 18, 2021, 3:20 PM

#

odd meteor 😃 Check all of them first before you settle for one. Drop anyone that doesn't w...

yea ty
you got any structured platforms that i can learn from?

lament idol Dec 18, 2021, 3:20 PM

#

agile monolith in terms of difficulty*

Ah, that makes sense

#

My first choice is either qmul or ucl

wicked grove Dec 18, 2021, 3:23 PM

#

odd meteor 😃 Check all of them first before you settle for one. Drop anyone that doesn't w...

Is it important to learn oops and dsa in python for a career in ml or ds ?

odd meteor Dec 18, 2021, 3:26 PM

#

arctic crown yea ty you got any structured platforms that i can learn from?

1. Andrew Ng Coursera course on ML
2. Udemy.com
3. DataQuest.io
4. DataCamp.com
5. Kaggle.com/learn
6. Jovian.ai
7. Neuromatch Academy

arctic crown Dec 18, 2021, 3:26 PM

#

odd meteor 1. Andrew Ng Coursera course on ML 2. Udemy.com 3. DataQuest.io 3. DataCamp.com ...

ty

arctic crown Dec 18, 2021, 3:28 PM

#

odd meteor 1. Andrew Ng Coursera course on ML 2. Udemy.com 3. DataQuest.io 3. DataCamp.com ...

hows this one https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms

Machine Learning with Python: Zero to GBMs | Jovian

A beginner-friendly introduction to supervised machine learning, decision trees, and gradient boosting using Python and Scikit-learn. Earn a verified certificate of accomplishment by completing assignments & building a real-world project.

odd meteor Dec 18, 2021, 3:28 PM

#

arctic crown ty

In fact, Neuromatch Academy has one of the best customer-friendly courses on Deep Learning.

https://academy.neuromatch.io/

Check their YouTube channel also

Home

NMA 2021 has now concluded!
Thank you to all of our wonderful TAs, Students, Mentors, and Volunteers for an amazing experience.

If you missed it, the materials (videos and tutorials codebooks) will be available for free with no time limit
Scroll down to find content links for both of our 2021

wicked grove Dec 18, 2021, 3:29 PM

#

arctic crown which one shoulds i watch?

There's a youtube channel called DataSchool ,his tutorials were really good

agile monolith Dec 18, 2021, 3:35 PM

#

lament idol My first choice is either qmul or ucl

why queen mary?

#

they arent that good

#

compared to ucl

lament idol Dec 18, 2021, 3:35 PM

#

agile monolith why queen mary?

Close to where I live and my girlfriend is going there for medicine

odd meteor Dec 18, 2021, 3:39 PM

#

wicked grove Is it important to learn oops and dsa in python for a career in ml or ds ?

I didn't learn OOP before I started ML. However, from my experience it'll be great to know OOP in Python before starting ML.

You'll find it very useful when you start learning PyTorch, because PyTorch will assume you already know OOP. If you use other deep learning frameworks like TensorFlow + Keras then you won't really use OOP.

It's good to know at least two frameworks. Avoid being over-dependent on one DL framework.

PyTorch, TensorFlow + Keras, JAX, MXNet, Sonnet, CNTK, Caffe etc.

Just know at least two.

So I started DL with TensorFlow but I'm currently learning PyTorch now. But I had to learn OOP first before coming back to PyTorch.

arctic crown Dec 18, 2021, 3:40 PM

#

arctic crown hows this one https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms

@odd meteor

odd meteor Dec 18, 2021, 3:41 PM

#

arctic crown hows this one https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms

I can't really tell you which one to use. Just check all of them and go for the one that works for you.

Jovian is nice but so are other websites as well.

odd meteor Dec 18, 2021, 3:42 PM

#

arctic crown @odd meteor

I started with Andrew Ng course then I dropped it and moved to Udemy.

wicked grove Dec 18, 2021, 3:46 PM

#

odd meteor I didn't learn OOP before I started ML. However, from my experience it'll be gre...

Ohhh okayy ,thank youu😁 im learning tensorflow right now
And what about dsa? Or do i just practice Python on leetcode/hackerrank

odd meteor Dec 18, 2021, 3:48 PM

#

wicked grove Ohhh okayy ,thank youu😁 im learning tensorflow right now And what about dsa? Or...

DSA? Data Science Africa? 😀

arctic crown Dec 18, 2021, 3:53 PM

#

odd meteor I started with Andrew Ng course then I dropped it and moved to Udemy.

udemy is paid

stone marlin Dec 18, 2021, 3:54 PM

#

I've been curious --- most of y'all seem to be doing a lot of NN-type things, but I've seen very little of it in the field in the last ten-or-so years where I've done work [IoT, Travel, Loans...] besides a few ad hoc projects. Do your fields have you working with NNs quite a bit?

I think the most I've used'em is autoencoders for fraud / sensor fluctuations.

Edit: I didn't mean to sound judge-y here! I'm legit interested in what people are doing with NNs.

wicked grove Dec 18, 2021, 3:55 PM

#

odd meteor DSA? Data Science Africa? 😀

😂😂no no data structures and algorithms

odd meteor Dec 18, 2021, 4:02 PM

#

wicked grove 😂😂no no data structures and algorithms

Ooh... I wasn't a software developer before getting into ML so I don't really know beyond my undergrad Big O notation in csc class 😩 I don't know anything about binary tree either.. Lol

I do pick up new stuff along the way though. I was first a Statistician before ML blew up to become something I could no longer ignore .

odd meteor Dec 18, 2021, 4:05 PM

#

arctic crown udemy is paid

Oftentimes much value is gotten from paid services 😁 don't you think so?

No amount of free ML courses online can match the courses on platforms like DataCamp and DataQuest. I personally haven't seen one yet.

wicked grove Dec 18, 2021, 4:10 PM

#

odd meteor Ooh... I wasn't a software developer before getting into ML so I don't really kn...

ohhh alrightt!!i guess ill leave it for now then

#

@odd meteor i was trying out the data augmentation and this is what i did ... could you please tell me how i can i save the augmented images?

#

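```
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # assuming TensorFlow's bundled Keras

# Raw string so the backslashes in the Windows path are not read as escapes
train_path = r'D:\glaucoma_train\ODIR-5K\ODIR-5K\Glaucoma'
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=45, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_path, target_size=(512, 512), batch_size=16)
```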
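
One way to save the augmented images, sketched under a few assumptions: `flow_from_directory` accepts `save_to_dir`, `save_prefix` and `save_format`, and writes each generated image to that folder as batches are drawn. The helper names `draw_until` and `save_augmented` are made up for this example, and the `tensorflow.keras` import path is an assumption about the installed version. Note that `flow_from_directory` looks for one subfolder per class, so the path probably has to point at the folder that contains `Glaucoma` rather than at `Glaucoma` itself; that is a guess about the folder layout.

```python
from pathlib import Path

def draw_until(batches, target_images):
    """Consume batches until at least target_images images were produced."""
    produced = 0
    for batch in batches:
        produced += len(batch)
        if produced >= target_images:
            break
    return produced

def save_augmented(parent_dir, out_dir, target_images=1000, batch_size=16):
    # Imported here so draw_until can be tried without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    datagen = ImageDataGenerator(rotation_range=45, horizontal_flip=True)
    generator = datagen.flow_from_directory(
        parent_dir,                  # folder that CONTAINS the class folders
        target_size=(512, 512),
        batch_size=batch_size,
        class_mode=None,             # yield images only, no labels
        save_to_dir=str(out_dir),    # every generated image is written here
        save_prefix="aug",
        save_format="png",
    )
    # The generator never stops on its own, so stop once enough were written.
    return draw_until(generator, target_images)
```

A call might look like `save_augmented(r"D:\glaucoma_train\ODIR-5K\ODIR-5K", r"D:\glaucoma_aug")`, where the output folder is hypothetical. The `rescale=1./255` step is left out here on the assumption that rescaling can be applied later, when the saved files are loaded for training.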

odd meteor Dec 18, 2021, 4:13 PM

#

stone marlin I've been curious --- most of y'all seem to be doing a lot of NN-type things, bu...

I think Research oriented companies tend to use NN the most. DeepMind, Google, etc.

My company (folks at Research department actually) mostly worked on CV + GAN this year. With two research papers on the projects accepted at this year's NeurIPS.

stone marlin Dec 18, 2021, 4:16 PM

#

odd meteor I think Research oriented companies tend to use NN the most. DeepMind, Google, e...

This checks out. I've mainly seen them in academia or on things which are heavy users of image-processing [self-driving cars, satellite + agriculture, etc.] so I was interested in seeing who was doing what with them.

Out of all of the NNs I wanted to dip into, GANs are among the top ones. They're pretty interesting!

odd meteor Dec 18, 2021, 4:17 PM

#

wicked grove @odd meteor i was trying out the data augmentation and this is what i...

I don't know enough about Image Processing or CV yet to give an appropriate response at this time.

I just started my DL journey with NLP

stone marlin Dec 18, 2021, 4:18 PM

#

Dang, have fun with NLP. That's one I started up again because it seems to be in demand at more and more places. It's cool, I just don't remember a whole lot of it. And I def don't know the new techniques.

odd meteor Dec 18, 2021, 4:20 PM

#

stone marlin This checks out. I've mainly seen them in academia or on things which are heavy...

It's mind blowing to be honest 😀. I do see the wonders those guys in my company's Research Dept are making from time to time.

stone marlin Dec 18, 2021, 4:22 PM

#

Yeah, I get a little jealous that most of what I'm doing is fittin' GLMs and boostin' a bit! Haha, but NNs, even with LIME, are pretty terrible explainers and it's hard to say to a customer, "Hey, your XYZ is going to break." "What signals lead you to believe that?" "Idk, lol."

odd meteor Dec 18, 2021, 4:27 PM

#

stone marlin Yeah, I get a little jealous that most of what I'm doing is fittin' GLMs and boo...

Bruhhhhhh you're not the only one that's jealous. Why did I just start learning Deep Learning? Because I can't take it anymore.... I just have to know it too 😀 😂 😂

stone marlin Dec 18, 2021, 4:28 PM

#

I know what'cha mean. Maybe I'll jump down this rabbit hole too and try'ta learn some!

severe hare Dec 18, 2021, 6:20 PM

#

How do I plot a function in opencv

#

I want to plot a parabola

#

simple y = x^2

bronze skiff Dec 18, 2021, 6:21 PM

#

is... is matplotlib not sufficient for you?

severe hare Dec 18, 2021, 6:21 PM

#

bronze skiff is... is matplotlib not sufficient for you?

My teacher asked me to draw a function and then apply transformation on it in opencv
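
OpenCV has no function plotter, but a curve can be drawn as a polyline through sampled points. The sketch below is one possible approach, not necessarily what the teacher has in mind: it samples y = x^2, maps the samples to pixel coordinates, draws them with `cv2.polylines`, and applies a rotation with `cv2.getRotationMatrix2D` plus `cv2.warpAffine` as an example transformation. The canvas size, x range, rotation angle and helper names are all arbitrary choices.

```python
import numpy as np

def parabola_pixels(size=400, x_max=2.0, n=200):
    """Sample y = x^2 on [-x_max, x_max] and map it to pixel coordinates."""
    x = np.linspace(-x_max, x_max, n)
    y = x ** 2
    cols = (x + x_max) / (2 * x_max) * (size - 1)        # x -> column
    rows = (size - 1) - y / (x_max ** 2) * (size - 1)    # y -> row, image rows grow downward
    return np.stack([cols, rows], axis=1).round().astype(np.int32)

def draw_and_rotate(points, size=400, angle_deg=30):
    import cv2  # imported here so parabola_pixels works without OpenCV

    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    cv2.polylines(canvas, [points.reshape(-1, 1, 2)], isClosed=False,
                  color=(0, 255, 0), thickness=2)
    matrix = cv2.getRotationMatrix2D((size / 2, size / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(canvas, matrix, (size, size))
    return canvas, rotated

pts = parabola_pixels()
```

`draw_and_rotate(pts)` returns the original and the rotated canvas, which can then be shown with `cv2.imshow` or written out with `cv2.imwrite`.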

swift sigil Dec 18, 2021, 6:44 PM

#

Guys, can anyone tell me how statistics, AI, ML, DL, and probability are used in data science? I am very confused, so kindly give me a real-life example.

bronze skiff Dec 18, 2021, 6:50 PM

#

your confusion stems from probably not defining data science well

#

how would you define it?

arctic wedgeBOT Dec 18, 2021, 6:51 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1639854074:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia Dec 18, 2021, 7:58 PM

#

F

timber flame Dec 18, 2021, 8:31 PM

#

odd meteor My response was with the assumption that you're interested in getting into anoth...

Well thanks for your input 🤠 I don't know how most companies operate, but from what I've seen coding skills seem way more important (think MLops) than ML skills as a MLE, for Data Scientist roles what u say maybe true (more Kaggle less leetcode etc) but again thanks 👍

timber flame Dec 18, 2021, 8:31 PM

#

stone marlin I'm in the US and have done ML-engineering + DS + DE roles --- it's a wildly dif...

Right

timber flame Dec 18, 2021, 8:32 PM

#

stone marlin So, depends on what you're into 0n3. You've done MLE for 1.5yrs, you prob know ...

Same for me no deep learning, more and more of ML quick fix solutions then all about getting them deployed, AWS sagemaker + lambda ▶️

timber flame Dec 18, 2021, 8:32 PM

#

stone marlin Regardless of where you move into, if you haven't checked out Advent of Code (wh...

Yes ! That's the kind of stuff I need for a new job

timber flame Dec 18, 2021, 8:36 PM

#

stone marlin I've been curious --- most of y'all seem to be doing a lot of NN-type things, bu...

Data Scientists do use a lot of NN, one of my friends has a few papers published and is going to pursue a PhD too soon, got 1 patent too. He works at Uber in the computer vision section so yes he does regularly write purely data science oriented python programs ...

#

There are more and more jobs becoming like this

arctic wedgeBOT Dec 18, 2021, 8:57 PM

#

:incoming_envelope: :ok_hand: applied mute to @lusty sphinx until <t:1639861645:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

mighty spoke Dec 18, 2021, 8:57 PM

#

Hi, I'm trying to make a scatter plot with binned data on the x axis and the number of points in each bin on the y axis. Would anyone know how I would do this? Any help appreciated. My code:
```
x_cl, y_cl=zip(*sorted(zip(CL,peak)))

dat = pd.DataFrame({'CL' : x_cl, 'Y_VAL' : y_cl}) #we build a dataframe from the data

bins=np.arange(min(x_cl), max(x_cl), step=0.005)
categorical_object = pd.cut(x_cl, bins)
count=pd.value_counts(categorical_object)
grp = dat.groupby(by = categorical_object) #we group the data by the cut
plt.scatter()

plt.show()
```

agile monolith Dec 18, 2021, 9:04 PM

#

i need help in #help-cheese regarding data science

granite furnace Dec 18, 2021, 9:11 PM

#

pd.DataFrame.hist()

#

i think it has param options to use plt as backend as well

#

oh scatter

#

sorry

#

@mighty spoke https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html this looks like what youre trying to do

#

also https://matplotlib.org/stable/gallery/statistics/hexbin_demo.html
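
For the "bin on the x axis, count on the y axis" scatter specifically, a histogram already computes the needed numbers, so one option is to take the counts from `np.histogram` and scatter them against the bin centres. This is a sketch with made-up data standing in for `CL`; the helper names are hypothetical. Two details differ from the code in the question: the last bin edge is pushed past the maximum so the largest value is not dropped, and the counts stay in bin order (`pd.value_counts` sorts by count).

```python
import numpy as np

def bin_counts(values, step):
    """Count how many values fall into each fixed-width bin."""
    values = np.asarray(values, dtype=float)
    # Extend the last edge past the max so the largest value is not dropped.
    edges = np.arange(values.min(), values.max() + step, step)
    counts, edges = np.histogram(values, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, counts

def scatter_counts(centers, counts):
    import matplotlib.pyplot as plt  # imported here so bin_counts runs without matplotlib

    plt.scatter(centers, counts)
    plt.xlabel("CL (bin centre)")
    plt.ylabel("points per bin")
    plt.show()

rng = np.random.default_rng(0)
cl = rng.normal(loc=0.5, scale=0.02, size=500)   # made-up stand-in for the CL values
centers, counts = bin_counts(cl, step=0.005)
```

Calling `scatter_counts(centers, counts)` then draws one point per bin.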

bronze skiff Dec 18, 2021, 10:27 PM

#

stone marlin I've been curious --- most of y'all seem to be doing a lot of NN-type things, bu...

there's a lot of data out there that isn't purely tabular, and deep learning is less about neural networks and more about "architectures for creating certain inductive biases for un- or semi-structured data"

#

and actually, even for tabular data we use it

stone marlin Dec 18, 2021, 10:36 PM

#

I usually hear deep learning as using anything with hidden layers, but what sort of things are you doing with tabular data, for example?

bronze skiff Dec 18, 2021, 10:40 PM

#

i mean, you can build a basic model that takes a tabular dataset for regression and use a transformer-based model on it

#

there are some conveniences that it gives, one of which is it is significantly easier to write custom loss functions

granite furnace Dec 18, 2021, 10:42 PM

#

slightly off-topic, how important are custom loss functions? are the standard ones usually too generalized?

bronze skiff Dec 18, 2021, 10:42 PM

#

it depends on your problem, of course

#

and "custom" is relative to the specific package you're using and which functions are pre-installed

granite furnace Dec 18, 2021, 10:44 PM

#

I see, so you didn't necessarily mean loss functions you write yourself

bronze skiff Dec 18, 2021, 10:44 PM

#

like, if you're working on underwriting, you might use tweedie deviance as your loss function

#

some GBM packages support it, some don't

#

you will often have to write one yourself if it doesn't

#

but there are times where you have to write it yourself

#

and in that case you would wish that you were working with an autodiff library like pytorch
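
To illustrate why hand-written losses make autodiff attractive: boosting libraries that accept a custom objective typically want the gradient and Hessian of the loss with respect to the raw score, derived by hand. Below is a NumPy sketch for the Tweedie deviance with power 1 < p < 2 and a log link (mu = exp(f)). The function names and the choice p = 1.5 are assumptions for the example, and no particular library's callback signature is implied.

```python
import numpy as np

def tweedie_deviance(y, mu, p=1.5):
    """Unit Tweedie deviance for 1 < p < 2, with y >= 0 and mu > 0."""
    return 2 * (
        np.power(y, 2 - p) / ((1 - p) * (2 - p))
        - y * np.power(mu, 1 - p) / (1 - p)
        + np.power(mu, 2 - p) / (2 - p)
    )

def tweedie_grad_hess(y, raw_score, p=1.5):
    """Hand-derived derivatives of the deviance w.r.t. the raw score f, mu = exp(f)."""
    mu = np.exp(raw_score)
    grad = 2 * (np.power(mu, 2 - p) - y * np.power(mu, 1 - p))
    hess = 2 * ((2 - p) * np.power(mu, 2 - p) - (1 - p) * y * np.power(mu, 1 - p))
    return grad, hess

# Check the hand derivation against a central finite difference.
y, f, eps = 2.0, 0.3, 1e-6
grad, hess = tweedie_grad_hess(y, f)
numeric_grad = (
    tweedie_deviance(y, np.exp(f + eps)) - tweedie_deviance(y, np.exp(f - eps))
) / (2 * eps)
```

With an autodiff library only `tweedie_deviance` would need to be written; the derivative code and the finite-difference check are exactly the part that gets generated for you.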

granite furnace Dec 18, 2021, 10:46 PM

#

does this also extend to validation methods?

bronze skiff Dec 18, 2021, 10:47 PM

#

your validation method usually uses similar metrics to that used in model training

#

and if not, then you would need to check that

#

like, god forbid you have to write your own implementation of the hyvarinen scoring loss in lightgbm

granite furnace Dec 18, 2021, 10:50 PM

#

ml world is a scary place it seems. just took intro to AI this semester and was curious. thanks for humoring me

stone marlin Dec 18, 2021, 10:56 PM

#

Sure, but aren't transformer-based models NNs? I do agree that that kind of ensemble might be nice to do --- I don't think I've seen it done outside of CV though.

#

Oh, maybe I misunderstood what you were saying. It's not just plugging things into NNs, but rather a methodology of model-making around certain types of data. Maybe?

arctic crown Dec 18, 2021, 11:03 PM

#

is shaping the same in all frameworks?

lapis sequoia Dec 18, 2021, 11:05 PM

#

Hello, I was working on a binary classification NN with Pytorch and I got this issue: https://stackoverflow.com/q/70405429/13071340 , maybe you have any ideas on how can I fix it.

bronze skiff Dec 18, 2021, 11:05 PM

#

stone marlin Oh, maybe I misunderstood what you were saying. It's not just plugging things i...

yes

#

i mean, deep learning is kinda catch-all term, right?

#

but a lot of research is really centering around the idea of "inductive biases"

stone marlin Dec 18, 2021, 11:05 PM

#

Got'cha. Yeah, but I usually hear it specifically referring to things where you're using hidden layers to feature-find.

#

I'm not sure I've heard of Inductive Biases --- it seems that https://en.wikipedia.org/wiki/Inductive_bias is fairly general though.

bronze skiff Dec 18, 2021, 11:09 PM

#

a simple example of an inductive bias is in a conv net

#

if you have images, you would expect that the correlation between two pixels drops off heavily with distance

#

so a fully-connected neural net would perform not great on it

stone marlin Dec 18, 2021, 11:10 PM

#

Sure, this is also the principle behind k-NN though, no?

bronze skiff Dec 18, 2021, 11:10 PM

#

mostly because it has parameters for every distance on the image
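
A quick parameter count makes the point about "parameters for every distance". The numbers assume a 28 x 28 single-channel image, a fully-connected layer with as many outputs as inputs, and a single 3 x 3 convolution filter; all of these are arbitrary choices for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a square 2D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

pixels = 28 * 28
dense = dense_params(pixels, pixels)              # one weight per pixel pair
conv = conv2d_params(kernel=3, c_in=1, c_out=1)   # one small filter shared across the image
```

The dense layer learns a separate weight for every pair of pixels, near or far, while the convolution only has weights for a 3 x 3 neighbourhood and reuses them everywhere, which bakes in the assumption that nearby pixels matter most.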

stone marlin Dec 18, 2021, 11:11 PM

#

So, I guess the idea is "which inductive biases work in what way for what problem" is a research topic for NN stuff. Makes sense.

bronze skiff Dec 18, 2021, 11:11 PM

#

yeah, the inductive biases of knn is that "you are defined by what you're close to"

#

there's a lot of ML research these days on "how can i build my model architecture to exploit these inductive biases that i want to hold"

stone marlin Dec 18, 2021, 11:12 PM

#

Yeah. If we are gonna just take deep learning as "not shallow learning" (not given features a priori) then it would probably be good to know what features it can create and what proximity to cluster them around, via inductive biases.

bronze skiff Dec 18, 2021, 11:12 PM

#

that's a gist, yeah

stone marlin Dec 18, 2021, 11:14 PM

#

Yeah, that jives with what you noted about custom loss functions, because it's basically the same kind'a deal.

#

It would be nice to ensemble something onto a standard alg for tabular data (even for a toy problem), but my hesitance is always that I need interpretability for most of my jobs. But maybe just for fun I'll try something out on a toy data set and see what it picks up.

bronze skiff Dec 18, 2021, 11:18 PM

#

interpretability is a harder job, but at my job we approach it via a lot of ablation and perturbation testing

#

also, keeping a bunch of reasonable synthetic data sets around is wonders for debugging weird model predictions
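
A minimal sketch of perturbation testing on synthetic data, assuming a stand-in `predict` function with made-up weights in place of a real trained model. The helper name `perturbation_effect` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 3 features, only the first two matter to the toy "model".
X = rng.normal(size=(500, 3))


def predict(X):
    # Stand-in for a trained model; the weights are made up for the example.
    return 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.0 * X[:, 2]


baseline = predict(X)


def perturbation_effect(X, column):
    """Mean absolute change in predictions after shuffling one column."""
    X_perturbed = X.copy()
    X_perturbed[:, column] = rng.permutation(X_perturbed[:, column])
    return np.mean(np.abs(predict(X_perturbed) - baseline))


effects = [perturbation_effect(X, j) for j in range(X.shape[1])]
print(effects)
```

Because the data is synthetic, the expected ranking is known in advance (feature 0 matters most, feature 2 not at all), which is what makes this kind of data set handy for debugging.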

stone marlin Dec 18, 2021, 11:19 PM

#

Yeah, LIME + Perturbation was pretty much stock and standard when I was doing it, but LIME is sometimes a pretty big nightmare to work with if you're feature-heavy.

bronze skiff Dec 18, 2021, 11:19 PM

#

agreed

#

unfortunately, that's pretty state of the art unless you have someone whose full time job is to investigate this stuff

stone marlin Dec 18, 2021, 11:20 PM

#

Yeah, the other kin of LIME are pretty much more or less specific versions. I honestly don't know, with the computing power we have now, how we would get anything better than pert testing and maybe well-defined LIME stuff. But maybe someone clever is on it.

#

Maybe I'll try it on some easy, small-featured dataset and see if I can mess around with different kinds of transformers. Might be worth learning!

granite furnace Dec 19, 2021, 12:09 AM

#

This is a result of a HalvingRandomSearchCV(), which looks nothing like the example here https://scikit-learn.org/stable/auto_examples/model_selection/plot_successive_halving_iterations.html#sphx-glr-auto-examples-model-selection-plot-successive-halving-iterations-py
Can it still be considered "correct?"

#

RandomizedSearchCV took 24.04 seconds for 188 candidates parameter settings.
Model with rank: 1
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007622902978304772, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.00015997189211448186}

Model with rank: 2
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007830958292280829, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.0002491044511612459}

Model with rank: 3
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007622902978304772, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.00015997189211448186}

this is some more output of my search

hasty mountain Dec 19, 2021, 12:28 AM

#

Can someone give me a hint on how to avoid lack of convergence in DCGAN model? I'm kind of tired of looking at figures of squares full of random colored pixels...

#

My Adam optimizers have the same learning rate as the one used in Pytorch's tutorial for DCGAN, yet my model doesn't work.

stoic musk Dec 19, 2021, 3:38 AM

#

23 for y in range(len(pos)):
24 for k in range(d):
---> 25 angles[y,k] = np.sin(y/(10000**2*i/d)) if k % 2 == 0 else np.cos(y/(10000**2*i/d))
26
27

ValueError: setting an array element with a sequence.

Trying to code position encoding for an RNN Transformer
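
For comparison, a vectorised sketch of the standard sinusoidal positional encoding in NumPy. The function and argument names (`positional_encoding`, `n_positions`, `d_model`) are assumptions, not taken from the code above. As for the error itself: "setting an array element with a sequence" usually means the right-hand side evaluates to an array rather than a scalar (for example if `i` is an array), but that is a guess since the rest of the code is not shown.

```python
import numpy as np


def positional_encoding(n_positions, d_model):
    """Sinusoidal encoding: sin on even columns, cos on odd columns."""
    positions = np.arange(n_positions)[:, np.newaxis]  # shape (n_positions, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # shape (1, d_model)
    # Columns 2i and 2i+1 share the same frequency 1 / 10000**(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # shape (n_positions, d_model)
    encoding = np.zeros((n_positions, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding


pe = positional_encoding(50, 16)
print(pe.shape)  # (50, 16)
```

Working on whole arrays avoids assigning element by element, so there is no place for a sequence to end up in a scalar slot.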

vague moon Dec 19, 2021, 4:59 AM

#

Hey, I have been having trouble trying to install Bazel, is anybody willing to help me with the process. With bazelisk I'd think this would be really quick and easy I don't know why I'm running into so much trouble

hasty grail Dec 19, 2021, 6:37 AM

#

stoic musk 23 for y in range(len(pos)): 24 for k in range(d): ---> 25 ...

You'll have to provide more context than that, perhaps you should show more of your code

barren tide Dec 19, 2021, 7:36 AM

#

How do i connect a folder to google drive..

frail frost Dec 19, 2021, 8:40 AM

#

has anyone here tried openAI’s codex?
i need a little help with it

low spear Dec 19, 2021, 9:09 AM

#

does anyone know what the TensorFlow v2 equivalent of image_dim_ordering() is?

#

what does image_dim_ordering() do actually

wicked grove Dec 19, 2021, 9:57 AM

#

@serene scaffold hello i have a doubt in pandas
The length of this df is 7820
I want to make only the images with 0s to a total of 1000 and drop the rest

#

Can you please guide me

#

#

Should i split it into separate columns and use df.drop or is there a simpler way?

lapis sequoia Dec 19, 2021, 11:03 AM

#

wicked grove @serene scaffold hello i have a doubt in pandas The length of this df is 78...

If you only want 0s why use df? I'm not sure what you mean over here.

tired shuttle Dec 19, 2021, 11:16 AM

#

how do I connect to a sqlite3 database that's on another computer?

#

do I have to host it on a server?

wicked grove Dec 19, 2021, 11:51 AM

#

lapis sequoia If you only want 0s why use df? I'm not sure what you mean over here.

No no i want both 0s and 2s but i want to reduce the images having 0s to a 1000

#

And i do not know how i should do that

lapis sequoia Dec 19, 2021, 11:53 AM

#

wicked grove No no i want both 0s and 2s but i want to reduce the images having 0s to a 1000

You can find all images for 0 and then slice down to 1000?

wicked grove Dec 19, 2021, 12:07 PM

#

lapis sequoia You can find all images for 0 and then slice down to 1000?

How do i do that??

wicked grove Dec 19, 2021, 12:24 PM

#

Should i split the df such that i have 2 columns one w 0s and another w 2s or is there a better way?

lapis sequoia Dec 19, 2021, 1:27 PM

#

wicked grove Should i split the df such that i have 2 columns one w 0s and another w 2s or is...

that is the way i can think right now.

wicked grove Dec 19, 2021, 1:28 PM

#

Alrightt!!

#

Thank youu!!

serene scaffold Dec 19, 2021, 1:58 PM

#

wicked grove

I don't help with screenshots of DataFrames. Sorry sad_cat

wicked grove Dec 19, 2021, 2:06 PM

#

serene scaffold I don't help with screenshots of DataFrames. Sorry sad_cat

```py
merged_labels = pd.merge(train_labels1,labels )
merged_labels.tail(5)
index_names = merged_labels[ merged_labels['level'] == 1 ].index
merged_labels.drop(index_names, inplace = True)
#print(merged_labels.head(20))
merged_labels_norm=merged_labels[merged_labels['level']==0]
merged_labels_dr=merged_labels[merged_labels['level']==2]

#print(merged_labels_norm.head(6))
print(len(merged_labels_dr))
merged_labels_norm.iloc[0:1000,:]
merged_labels_dr.iloc[0:1000,:]
final_df1 = pd.merge(merged_labels_norm,merged_labels_dr)
print(final_df1.head(5))
```

#

this is what i tried

serene scaffold Dec 19, 2021, 2:06 PM

#

wicked grove ```py merged_labels = pd.merge(train_labels1,labels ) merged_labels.tail(5) inde...

I need to know what data is in it as text. df.head().to_dict('list')

wicked grove Dec 19, 2021, 2:06 PM

#

this is the df

#

```
0    10003_left    0
1    10003_right    0
2    10007_left    0
3    10007_right    0
4    10009_left    0
...    ...    ...
8403    19494_right    0
8404    19498_left    0
8405    19498_right    0
8406    194_left    0
8407    194_right    0
```

serene scaffold Dec 19, 2021, 2:07 PM

#

please do df.head().to_dict('list')

#

that's the only way that I'll read it. Otherwise I have to go back to what I was doing.

wicked grove Dec 19, 2021, 2:08 PM

#

oh okayy

serene scaffold Dec 19, 2021, 2:09 PM

#

if I understand correctly, you want to retain up to 1000 rows for which level is 0, ignoring the rest

#

try df.query("level == 0").head(1000)

wicked grove Dec 19, 2021, 2:10 PM

#

serene scaffold if I understand correctly, you want to retain up to 1000 rows for which `level` ...

yess exactly!!

wicked grove Dec 19, 2021, 2:10 PM

#

serene scaffold try `df.query("level == 0").head(1000)`

alright i will thank youu!!and i need to concat after that

#

but i get an empty df

serene scaffold Dec 19, 2021, 2:10 PM

#

concat with what?

wicked grove Dec 19, 2021, 2:11 PM

#

```
  '10003_right',
  '10007_left',
  '10007_right',
  '10009_left'],
 'level': [0, 0, 0, 0, 0]}
```

serene scaffold Dec 19, 2021, 2:11 PM

#

what do you need to concat?

wicked grove Dec 19, 2021, 2:11 PM

#

serene scaffold concat with what?

with a df that has level 2

#

the above df has images with level 2

serene scaffold Dec 19, 2021, 2:12 PM

#

what are you really trying to do? there's apparently more to it than just getting 1000 rows where level == 0

wicked grove Dec 19, 2021, 2:12 PM

#

i split it in the same way

wicked grove Dec 19, 2021, 2:12 PM

#

serene scaffold what are you really trying to do? there's apparently more to it than just gettin...

correct there are rows with level==2

serene scaffold Dec 19, 2021, 2:12 PM

#

what are all the unique values in the level column?

#

is it 0, 1, and 2 only?

#

and you need 1000 from each?

#

because that's just df.groupby('level').head(1000)

wicked grove Dec 19, 2021, 2:16 PM

#

i got itt!!i did this, it was a stupid mistake

#

```py
merged_labels_norm=merged_labels[merged_labels['level']==0]
merged_labels_dr=merged_labels[merged_labels['level']==2]

#print(merged_labels_norm.head(6))
print(len(merged_labels_dr))
merged_labels_norm=merged_labels_norm.iloc[0:1000,:]
merged_labels_dr=merged_labels_dr.iloc[0:1000,:]
final_df1 = pd.concat([merged_labels_norm, merged_labels_dr])
print(final_df1.head(2000))
```

serene scaffold Dec 19, 2021, 2:19 PM

#

my way is probably going to be faster.

wicked grove Dec 19, 2021, 2:20 PM

#

serene scaffold my way is probably going to be faster.

i will try it outt

wicked grove Dec 19, 2021, 2:20 PM

#

serene scaffold because that's just `df.groupby('level').head(1000)`

ohhh i did not think about thiss

wicked grove Dec 19, 2021, 2:20 PM

#

serene scaffold is it 0, 1, and 2 only?

only 0 and 2

serene scaffold Dec 19, 2021, 2:21 PM

#

then what I suggested would work. If there were values in the level column that you didn't want, only slightly more would be needed to ignore them.
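
A small sketch of that "slightly more", assuming a toy frame in place of the real labels table. The `image` column name and the cap of 3 rows are stand-ins chosen for the example; the conversation uses 1000.

```python
import pandas as pd

# Toy stand-in for the real labels table; "image" is an assumed column name.
df = pd.DataFrame({
    "image": [f"img_{n}" for n in range(12)],
    "level": [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 0, 2],
})

max_per_level = 3  # the conversation uses 1000

# Drop unwanted levels first, then keep the first rows of each remaining level.
capped = df[df["level"].isin([0, 2])].groupby("level").head(max_per_level)
print(capped["level"].value_counts().to_dict())
```

`groupby(...).head(n)` keeps the rows in their original order, so no separate split and `pd.concat` step is needed.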

wicked grove Dec 19, 2021, 2:24 PM

#

serene scaffold then what I suggested would work. If there were values in the level column that ...

alrightt!! thank youu😄 ill need slightly more or just 1000 is enough?

serene scaffold Dec 19, 2021, 2:27 PM

#

wicked grove alrightt!! thank youu😄 ill need slightly more or just 1000 is enough?

idk what you're trying to do, so idk.

lapis sequoia Dec 19, 2021, 2:41 PM

#

Do you control the body

#

Hey guys, i have these different set of files that are generated when i run my model. I want to organise all the files by putting them to respective folders. is there a smart way to organise them than manually selecting files and putting them to different folders?

#

serene scaffold Dec 19, 2021, 2:53 PM

#

lapis sequoia Hey guys, i have these different set of files that are generated when i run my m...

I don't read screenshots of text; are you doing it systematically based on their file name?

lapis sequoia Dec 19, 2021, 2:53 PM

#

yes

serene scaffold Dec 19, 2021, 2:53 PM

#

lapis sequoia Do you control the body

yes; I am the glasses

serene scaffold Dec 19, 2021, 2:54 PM

#

lapis sequoia yes

I recommend using pathlib
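
A minimal pathlib sketch of that idea, assuming the files can be grouped by the text before the first underscore in their name and that every file has an extension. The helper name `organise_by_prefix` and the file names in the demo are made up.

```python
from pathlib import Path
import shutil
import tempfile


def organise_by_prefix(folder):
    """Move each file into a subfolder named after the text before the first underscore."""
    folder = Path(folder)
    # sorted() materialises the listing before any folders are created.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        prefix = path.stem.split("_")[0]
        target_dir = folder / prefix
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))


# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["loss_run1.csv", "loss_run2.csv", "weights_run1.bin"]:
        (Path(tmp) / name).touch()
    organise_by_prefix(tmp)
    result = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file())
    print(result)
```

The grouping rule is the only part that depends on the actual naming scheme; swapping the `prefix` line for a suffix or a regular expression match leaves the rest unchanged.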

lapis sequoia Dec 19, 2021, 2:55 PM

#

okay!! do you some examples to follow through?

serene scaffold Dec 19, 2021, 3:44 PM

#

what were you expecting tensorflow to do with that array of objects?

#

do you understand why an array of objects that are lists is not a valid input?

#

that's part of it. arrays have to be "rectangular". you can't get around that by having an array of lists that are different lengths
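
A short NumPy illustration of the "rectangular" point, using made-up lists. Zero-padding is only one possible fix and is an assumption here, since the original data is not shown.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5], [6]]

# Forcing this into an array gives a 1-D array of Python lists, not a 2-D grid.
as_objects = np.array(ragged, dtype=object)
print(as_objects.shape)  # (3,)

# Pad every row with zeros up to the longest length so the result is rectangular.
max_len = max(len(row) for row in ragged)
padded = np.zeros((len(ragged), max_len), dtype=np.int64)
for i, row in enumerate(ragged):
    padded[i, :len(row)] = row

print(padded.shape)  # (3, 3)
```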

#

but also, just naively throwing data into the network isn't going to accomplish anything. it looks like you haven't done any kind of pre-processing.

lapis sequoia Dec 19, 2021, 3:59 PM

#

lapis sequoia okay!! do you some examples to follow through?

@serene scaffold ?

serene scaffold Dec 19, 2021, 3:59 PM

#

lapis sequoia okay!! do you some examples to follow through?

no

vague moon Dec 19, 2021, 4:46 PM

#

Hey, I have been having trouble trying to install Bazel. I am installing with Bazelisk, and when installing through the command line it seems to install correctly: I get the text Starting local Bazel server and connecting to it... [bazel release 4.2.1]. So I try to test it out by getting the version number, but receive the error 'bazel' is not recognized as an internal or external command, operable program or batch file.

lapis sequoia Dec 19, 2021, 4:47 PM

#

do you have pip

vague moon Dec 19, 2021, 4:48 PM

#

yes

zealous wolf Dec 19, 2021, 4:49 PM

#

I think an upgrade is required, so: sudo apt-get upgrade bazel

vague moon Dec 19, 2021, 4:54 PM

#

```
'sudo' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\*******\Desktop\bazelisk-master>apt-get upgrade bazel
'apt-get' is not recognized as an internal or external command,
operable program or batch file.
```

#

I'm running windows btw

zealous wolf Dec 19, 2021, 5:12 PM

#

okay

#

https://docs.bazel.build/versions/main/install-windows.html

#

this site may be helpful, have a look

bronze skiff Dec 19, 2021, 5:40 PM

#

vague moon Hey, I have been having trouble trying to install Bazel. I am installing with Ba...

you need to add bazel (or bazelisk) to your PATH

south gull Dec 19, 2021, 6:03 PM

#

I'm trying to use https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html but it can't even fit a second degree polynomial

#

https://cdn.discordapp.com/attachments/696888170167664691/922186959370194984/unknown.png

#

https://cdn.discordapp.com/attachments/696888170167664691/922187007252394004/unknown.png

#

blue are true values

#

heck, even giving it the optimal values as initial guess, it insist on making it linear
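
Without the code behind the screenshots the cause can't be pinned down. For comparison, here is a minimal working quadratic fit with `scipy.optimize.least_squares`, assuming synthetic data with made-up coefficients. One thing worth checking against it is that the residual function returns one value per data point and actually includes the squared term.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic data from a known quadratic; the coefficients are made up.
x = np.linspace(-3.0, 3.0, 50)
y_true = 2.0 * x**2 - 1.0 * x + 0.5


def residuals(params, x, y):
    a, b, c = params
    # One residual per data point, not a single summed error.
    return a * x**2 + b * x + c - y


fit = least_squares(residuals, x0=[1.0, 1.0, 1.0], args=(x, y_true))
print(fit.x)
```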

vague moon Dec 19, 2021, 7:14 PM

#

zealous wolf https://docs.bazel.build/versions/main/install-windows.html

Thank you

vague moon Dec 19, 2021, 7:14 PM

#

bronze skiff you need to add bazel (or bazelisk) to your PATH

will do

zealous wolf Dec 19, 2021, 7:15 PM

#

welcome

vague moon Dec 19, 2021, 7:17 PM

#

I'm trying to add bazel to path and reading how to download and add to path, I think the problem is I have a space in my path, my friend made my user name on windows "cum face" when he was helping repair my pc, so my file path to my user folder is C:\Users\cum face, and according to the bazel site "None of these paths should contain spaces or non-ASCII characters."

#

fun

bronze skiff Dec 19, 2021, 7:54 PM

#

vague moon I'm trying to add bazel to path and reading how to download and add to path, I t...

...lmfao

#

i'm sorry, that's fucking hilarious

south gull Dec 19, 2021, 8:03 PM

#

friend made my user name on windows "cum face" when he was helping repair my pc
wth, that's not nice at all

bronze skiff Dec 19, 2021, 8:09 PM

#

south gull > friend made my user name on windows "cum face" when he was helping repair my p...

that's called friendship

south gull Dec 19, 2021, 8:11 PM

#

guess I don't know what friendship is then

cold skiff Dec 19, 2021, 8:20 PM

#

what are some good beginner resources for ai and data science, especially for someone being self taught?
I know there are good modules like numpy and pandas, but I'd like a resource that helps me apply that to some projects if y'all know of any.
I have a project I'm working on but I need to see what a fleshed out data science project might look like

#

I found this resource a while back ago:

#

https://towardsdatascience.com/a-complete-26-week-course-to-learn-python-for-data-science-in-2022-e95b67551df4
but if y'all have any others I'd love to see it!

vague moon Dec 19, 2021, 8:31 PM

#

cold skiff what are some good beginner resources for ai and data science, especially for so...

If you get any good responses send them my way too

#

Trying to add bazel to path still, do I directly link to the exe in Path or the directory it is in
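
PATH entries are directories, so the entry to add is the folder that contains the executable, not the .exe itself. A small sketch for checking what the current shell can see, using only the standard library; the helper name `describe_command` is made up.

```python
import os
import shutil


def describe_command(name):
    """Report where a command resolves from on the current PATH, if anywhere."""
    location = shutil.which(name)
    if location is None:
        return f"{name!r} not found on PATH"
    return f"{name!r} resolves to {location}"


print(describe_command("bazel"))

# PATH is a list of directories separated by os.pathsep (';' on Windows).
for entry in os.environ.get("PATH", "").split(os.pathsep):
    print(entry)
```

Running it from a normal and from an elevated prompt shows whether the two sessions see different PATH values.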

odd meteor Dec 19, 2021, 8:49 PM

#

vague moon I'm trying to add bazel to path and reading how to download and add to path, I t...

You can always change it from "cum face" to a preferred name (that's if you wanna change it) 😂😂

vague moon Dec 19, 2021, 8:55 PM

#

I dont know if I can. So I changed my username but can't change the name of the folder from what I've tried, my original version of windows I bought was lost/corrupted during the repair process so I'm having to run a torrented version of windows at the moment which locks me out of changing some things until I activate windows

#

Finally I have installed bazel using chocolatey and it seems to work, it still gives me that error about a space in the file path but it seems to work now so fingers crossed.

hasty mountain Dec 19, 2021, 9:17 PM

#

south gull https://cdn.discordapp.com/attachments/696888170167664691/922186959370194984/unk...

Out of curiosity: which IDE and how?

vague moon Dec 19, 2021, 9:22 PM

#

new problem, if I type bazel into cmd I get 'bazel' is not recognized as an internal or external command, operable program or batch file. but if I run cmd as admin it works

#

I should be able to run bazel commands without running cmd as admin right?

south gull Dec 19, 2021, 9:39 PM

#

hasty mountain Out of curiosity: which IDE and how?

PyCharm
It's an option in the settings. I use it with high-contrast theme

hasty mountain Dec 19, 2021, 9:43 PM

#

south gull PyCharm It's an option in the settings. I use it with high-contrast theme

Thanks!

south gull Dec 19, 2021, 9:57 PM

#

e1happy

cosmic pelican Dec 19, 2021, 10:47 PM

#

hello, anyone knows how to fix this?

#

i need to set a fixed scale for both axis, using matplotlib
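
A minimal sketch of pinning both axes in matplotlib, assuming made-up data and ranges since the screenshot is not available. One related thing to check: if the values passed to `plot` are strings rather than numbers, matplotlib treats them as categories and the axis limits will not behave like a numeric scale, so converting with `float(...)` first may be needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 100, 1000, 10000], [0.0, 0.1, 2.5, 40.0], label="made-up timings")

# Pin both axes to explicit ranges instead of letting matplotlib autoscale.
ax.set_xlim(0, 10000)
ax.set_ylim(0, 50)
# Evenly spaced ticks on each axis.
ax.set_xticks(range(0, 10001, 2000))
ax.set_yticks(range(0, 51, 10))

ax.set_xlabel("size")
ax.set_ylabel("Duration (ms)")
ax.legend()
```

With an interactive backend, drop the `matplotlib.use("Agg")` line and finish with `plt.show()`.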

cosmic pelican Dec 19, 2021, 11:11 PM

#

wanna do it like this

cosmic pelican Dec 19, 2021, 11:26 PM

#

```py
# The imports and the def line were cut off in the original message;
# they are reconstructed from the names used below and the final Plot() call.
import json

import matplotlib.pyplot as plt
import numpy as np

ALGORITHMS = ['Insertion', 'Merge', 'Heap', 'Quick', 'Bubble', 'Selection', 'Counting']


def Plot():
    with open('Simulationn-algo.txt', 'r') as file:
        data = json.load(file)

    details = data['Simulation Details']
    size = np.array([entry['Size'] for entry in details])

    for name in ALGORITHMS:
        # Keep the first five items of each algorithm's entry, as in the original.
        durations = np.array([entry[f'{name} Sort'][0:5] for entry in details])
        plt.plot(size, durations, label=name)

    plt.xlabel('size')
    plt.ylabel("Duration (ms)")
    plt.title("Different Sorting Algorithms")
    # plt.legend()
    plt.show()


Plot()
```

south gull Dec 19, 2021, 11:46 PM

#

south gull I'm trying to use https://docs.scipy.org/doc/scipy/reference/generated/scipy.opt...

can someone help me? :(

grave frost Dec 19, 2021, 11:51 PM

#

cosmic pelican hello, anyone knows how to fix this?

your axes are off the grid

cosmic pelican Dec 19, 2021, 11:52 PM

#

grave frost your axes are off the grid

how to fix? i searched everywhere

grave frost Dec 19, 2021, 11:52 PM

#

cosmic pelican how to fix? i searched everywhere

you can't search it - that's the entire passive "off the grid"

cosmic pelican Dec 19, 2021, 11:53 PM

#

i searched for a fix

grave frost Dec 19, 2021, 11:54 PM

#

you apparently don't use reddit much - smart.

#

just google how to set scale for axes in matplotlib

cosmic pelican Dec 19, 2021, 11:56 PM

#

grave frost just google who to set scale for axes in matplotlib

didnt work 🙂

grave frost Dec 19, 2021, 11:57 PM

#

ask crypto then 😏

pastel valley Dec 20, 2021, 12:18 AM

#

yo how do I normalize in a cnn model? is there any layer that can do it on a tensor, or will it be a preprocessing step?

charred umbra Dec 20, 2021, 12:27 AM

#

Does anyone know what the newest proposed activation function for deep neural networks is?

arctic crown Dec 20, 2021, 12:39 AM

#

the shape is just how you see the list right?

#

list/array

slow vigil Dec 20, 2021, 12:49 AM

#

does anyone know if it's possible to use Spark Streaming with websockets?

crude karma Dec 20, 2021, 1:43 AM

#

do beginner data science projects need to be perfect

halcyon storm Dec 20, 2021, 1:55 AM

#

Help

#

So basically I am working on a machine learning assignment: creating an algorithm that will detect brain tumors based on brain tumor data sets. This assignment was set by my professor, but I don’t have experience with coding.

These are the steps I have already completed:

(1) Import the dataset into a fresh Google Colab project
(2) Split the dataset into training / testing / validation sets (thursday)

But I still need help with:
(3) Defining my classification model(s). You would probably want to try a few different models here. You can either build your own convolutional neural network (layer by layer) in tensorflow and train it from scratch, or you can modify an existing pre-trained network like VGG19, alter it to better suit our binary classification needs, and retrain it on the dataset.
(4) Train your model(s) and evaluate

hasty mountain Dec 20, 2021, 2:29 AM

#

halcyon storm So baicalky I am working on a machine learning assignment in creating an algorit...

If you don't have experience with coding, things might get complicated. Try seeing how to use VGG19.
It seems that scikit-learn also has a module especially for creating a neural network automatically, but I don't know how reliable that is.
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

However, my bias is to create your own, which isn't hard if you try doing that through keras/tensorflow.keras.

#

If your dataset is already preprocessed, then you just need to do something like this:

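from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()

model.add(Conv2D(filters=32, kernel_size=5, padding='same', strides=2, activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters=64, kernel_size=5, padding='same', activation='relu'))
# Flatten the feature maps so the final layer outputs one value per image.
model.add(Flatten())
# Sigmoid gives a probability between 0 and 1 for binary classification.
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])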

And your neural network is ready. Just fit it to your dataset and train it.

halcyon storm Dec 20, 2021, 2:36 AM

#

Ohh ok

hasty mountain Dec 20, 2021, 2:37 AM

#

However, it's a good idea to learn more about it, how those layers work, so you can enhance this neural network.

south gull Dec 20, 2021, 2:39 AM

#

omg, I tried to write a network too, but it's incredibly dumb
can you help me with it maybe? @hasty mountain

hasty mountain Dec 20, 2021, 2:40 AM

#

south gull omg, I tried to write a network too, but it's incredibly dumb can you help me wi...

It depends. If it's a Generative Adversarial Network, I can't. I got problems with those.

south gull Dec 20, 2021, 2:40 AM

#

Nooooooooooo

#

It's vanilla

hasty mountain Dec 20, 2021, 2:41 AM

#

south gull It's vanilla

Hm...What for?

south gull Dec 20, 2021, 2:42 AM

#

uhm, nothing
I can't even train it on data, from a polynomial function
I've only managed with linear function

halcyon storm Dec 20, 2021, 2:43 AM

#

Basically there was this dude that helped me with this. And he wasnt able to continue to help me bc he is busy. Is it ok if I add u onto a group chat so then u can see where we left off? @hasty mountain

south gull Dec 20, 2021, 2:44 AM

#

I wrote the backpropagation and stuff
but I'm doing something wrong with training and gradient descent I think

hasty mountain Dec 20, 2021, 2:44 AM

#

halcyon storm Basically there was this dude that helped me with this. And he wasnt able to con...

Nah, sorry man, I can't. I'll have to sleep soon and I'll be busy for the week.

halcyon storm Dec 20, 2021, 2:44 AM

#

Ohh okk

#

Do u know anyone who is willing to help me?

gloomy hatch Dec 20, 2021, 2:45 AM

#

Hey guys, I'm trying to figure out a way to calculate (efficiently) sets of density based clusters given a set of x,y coordinates -- does anyone have any articles or suggestions for that?
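
One common starting point for density-based clustering of (x, y) points is DBSCAN. A minimal sketch with scikit-learn, assuming made-up points; the `eps` and `min_samples` values are placeholders that would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight made-up blobs plus one far-away outlier.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
points = np.vstack([blob_a, blob_b, outlier])

# eps is the neighbourhood radius, min_samples the density threshold.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)
print(sorted(set(labels)))  # the label -1 marks noise points
```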

hasty mountain Dec 20, 2021, 2:45 AM

#

halcyon storm Basically there was this dude that helped me with this. And he wasnt able to con...

But from what you told there, if you use keras, probably the only thing you'll have to worry is about preprocessing your data, how the neural network works and probably that'll be all. Maybe just organizing the output of your neural network...

Keras is really intuitive and quite useful. Like...to train your neural network you just define it and then do model.fit(X, y) and to predict data, model.predict(X)
Problem would be if you're using Pytorch.

hasty mountain Dec 20, 2021, 2:45 AM

#

halcyon storm Did u know who is willing to help me ?

Unfortunately not. Sorry.

gloomy hatch Dec 20, 2021, 2:47 AM

#

gloomy hatch Hey guys, I'm trying to figure out a way to calculate (efficiently) sets of dens...

I also asked this question in #help-potato

hasty mountain Dec 20, 2021, 2:47 AM

#

south gull I wrote the backpropagation and stuff but I'm doing something wrong with trainin...

Oh...if you're using Pytorch, things are a bit complicated...

#

I just started using Pytorch some days ago.

south gull Dec 20, 2021, 2:48 AM

#

hasty mountain Oh...if you're using Pytorch, things are a bit complicated...

Im not. I've just hacked it myself using a matrix library
things are probably complicated by that

#

wanted to try writing it all

hasty mountain Dec 20, 2021, 2:49 AM

#

Uuuh...then I really can't help you, you're going through a quite difficult path.

south gull Dec 20, 2021, 2:50 AM

#

uhhh, well I was still hoping you could help me
I think my problem is, in my gradient descent

bronze skiff Dec 20, 2021, 2:51 AM

#

why don't you post it

hasty mountain Dec 20, 2021, 2:51 AM

#

However, maybe you could learn how to do that if you check the source code for tensorflow...which is a module that requires manually creating every single step in a neural network...and it's the hardest way, already.

hasty mountain Dec 20, 2021, 2:52 AM

#

south gull uhhh, well I was still hoping you could help me I think my problem is, in my gra...

Well...if you've gone that far, then you probably know much more than me

south gull Dec 20, 2021, 2:53 AM

#

oh ok ahehe
well, gl with Pytorch though!

hasty mountain Dec 20, 2021, 2:53 AM

#

Try checking the source code for some optimizers in pytorch and keras.

hasty mountain Dec 20, 2021, 2:53 AM

#

south gull oh ok ahehe well, gl with Pytorch though!

Thanks!

bronze skiff Dec 20, 2021, 2:53 AM

#

south gull oh ok ahehe well, gl with Pytorch though!

post the code you're having difficulties with

bronze skiff Dec 20, 2021, 2:54 AM

#

hasty mountain Try checking the source code for some optimizers in pytorch and keras.

that's a terrible way to first learn about gradient descent

#

since none of the gradients are exposed

hasty mountain Dec 20, 2021, 2:54 AM

#

Hm, I didn't know about that

south gull Dec 20, 2021, 2:55 AM

#

bronze skiff post the code you're having difficulties with

okay, it's not python though, so I hope that's alright
we started talking about this, because someone else mentioned it

bronze skiff Dec 20, 2021, 2:55 AM

#

gradient descent is literally params = params - learning_rate * gradient
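
A minimal NumPy sketch of that update rule fitting a quadratic, assuming made-up data, a mean squared error loss, and a hand-picked learning rate. The model here is linear in its parameters, so it is a much simpler case than the feed-forward network discussed in this thread.

```python
import numpy as np

# Made-up data on a quadratic curve: y = 2x^2 - x + 0.5.
x = np.linspace(-1.0, 1.0, 100)
y = 2.0 * x**2 - 1.0 * x + 0.5

# Model: y_hat = a*x^2 + b*x + c, trained on mean squared error.
features = np.stack([x**2, x, np.ones_like(x)], axis=1)  # shape (100, 3)
params = np.zeros(3)
learning_rate = 0.1

for step in range(5000):
    error = features @ params - y
    gradient = 2.0 * features.T @ error / len(x)  # d(MSE)/d(params)
    params = params - learning_rate * gradient

print(params)
```

Even on this easy problem a fixed step size takes thousands of iterations, so slow learning alone does not prove the gradient is wrong.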

#

lol sure

south gull Dec 20, 2021, 2:56 AM

#

C code

/*
Trains the network [pos] on the dataset [points/next] using [steps] with adaptive stride
*/
void adaptive_learn(framework* pos, int steps, void* points, int next(void*, point*)) {
     netw* vel = netw_init(pos->spec);    // The gradient
     point* point = point_init(pos);      // A point which gradient finder caches to 
     double error = INFINITY;             // error starts at infinity: i.e. as bad as possible
     int diag_hz = steps / 10;            // frequency of outputs
     print_vbar("DESCENDING");
     for (int i = 0; i < steps; i++) {
          double prev_error = error;                                       // save the error, before computing next
          error = next_gradient(pos, vel, point, points, next);            // sets vel
          double rate = minimize_diagonal(pos, vel, point, points, next);  // computes optimal learning_rate/step_size by minimizing in the direction of gradient
          netw_scale(vel, rate);                                           // scales the gradient by optimal
          if (i % diag_hz  == 0) print_step(pos->net, vel, error, rate);   // print stats
          if (is_error(prev_error, error)) {                               // gradient descent is wild
               if (i % diag_hz != 0) print_step(pos->net, vel, error, rate);// in case step was skipped, print the final step
               exit(2);
          };                       
          netw_add(pos->net, vel);                                         // move
     }
     netw_free(vel);
     point_free(point);
}

#

params = params - learning_rate * gradient
would be
netw_add(pos->net, vel);
in my code

bronze skiff Dec 20, 2021, 2:58 AM

#

so netw_add does a minus?

#

or is netw_scale already scaling by a negative learning rate

south gull Dec 20, 2021, 2:58 AM

#

exactly

#

the gradient is negative rather

#

from minimize_diagonal

bronze skiff Dec 20, 2021, 2:59 AM

#

wait

#

i thought your gradient was error

#

hence the next_gradient function

#

i don't see that error being used anywhere meaningful

south gull Dec 20, 2021, 3:01 AM

#

next gradient is stored into vel through side effects

bronze skiff Dec 20, 2021, 3:01 AM

#

fair enough

south gull Dec 20, 2021, 3:01 AM

#

I print the error to screen

#

so I can see what's going on lol

bronze skiff Dec 20, 2021, 3:01 AM

#

are you sure you have your minuses correct?

#

if you put a - in front of minimize_diagonal and run it

#

what do you get

south gull Dec 20, 2021, 3:02 AM

#

yeah! No, it can fit a straight line

#

it's just way too inefficient at learning

#

also I know the code is messy, but that's because it's C lol

bronze skiff Dec 20, 2021, 3:03 AM

#

what model are you fitting to a straight line?

#

another linear model?

south gull Dec 20, 2021, 3:04 AM

#

well, actually...
The training set is pairs of (x,y) coordinates

#

which lie on a straight line

bronze skiff Dec 20, 2021, 3:04 AM

#

right, okay

#

and your model is...?

south gull Dec 20, 2021, 3:04 AM

#

what's a model?

bronze skiff Dec 20, 2021, 3:04 AM

#

what's the network you're training

#

architecturally

south gull Dec 20, 2021, 3:06 AM

#

it's uhh Feed-Forward

#

I'm surprised this
params = params - learning_rate * gradient
should be enough

#

doesn't work for me at all

#

maybe it's too slow

#

how many steps are needed?

pale mural Dec 20, 2021, 3:22 AM

#

hey I've got this massive table (12 sets of a 12x21 table), what would be a good way to display this? any libraries or smthn I should use?

#

col/row labels are just numbers, and the cells are probabilities

inland zephyr Dec 20, 2021, 3:32 AM

#

guys I need suggestion about embedding a image. I have review some famous model for embedding (arcface, deepid, vgg) and I want to ask this one. is the embedded vector are normalized by the model or not? and is it normal practice to normalize the vector before further process like store it to vector db or just keep it like that?

#

and is it normal practice to mapping the cluster made by each vector (for example i use 26 times augmentation per image and want to check whether each vector are clustered perfectly per image or there is a slight mix)?

lapis sequoia Dec 20, 2021, 4:12 AM

#

pale mural col/row labels are just numbers, and the cells are probabilities

You can look at the heatmap. Matplotlib and seaborn will do your work.

pale mural Dec 20, 2021, 4:25 AM

#

lapis sequoia You can look at the heatmap. Matplotlib and seaborn will do your work.

hmm I was thinking something like that, kinda made one with tabulate and some colors

I like how it actually shows the numbers, but there's basically a bunch of these tables (yellow colored 12-21) and I've also got multiple different methods of calculating that set of tables so I'd like to show the difference in those too. Don't want to have to show 10 different tables 5 times. Any idea of how to compactly show that?

south gull Dec 20, 2021, 4:25 AM

#

inland zephyr guys I need suggestion about embedding a image. I have review some famous model ...

uhhhhhhhhhhhhhhhhhhhhhhhhh?

#

I don't understand your english

inland zephyr Dec 20, 2021, 4:27 AM

#

ehe sorry for my bad english

#

umm i wonder if normalizing the embedding vector from image is a common practice or not

#

and i wonder if cluster the vector from each augmented image is common to, to analyze if the augmentation works well to separate each entity

#

since i done some research about the effect of each embedding model and want to know what models and augmentation methods meet my expectation
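
Whether a given model already normalizes its output depends on the model, so it is worth checking the norm of a returned vector directly. A minimal sketch of L2 normalization, assuming embeddings are plain lists of floats (the vectors below are made-up examples, not outputs of any model named above):

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit Euclidean length (eps guards against all-zero vectors)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

emb_a = l2_normalize([3.0, 4.0])
emb_b = l2_normalize([6.0, 8.0])   # same direction, different magnitude

print(emb_a)              # unit-length vector
print(dot(emb_a, emb_b))  # cosine similarity of the two embeddings
```

Once vectors have unit length, the dot product equals cosine similarity, which is one reason normalizing before storing embeddings in a vector database is common.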

quasi parcel Dec 20, 2021, 4:49 AM

#

hi i hope everyone are doing well

arctic crown Dec 20, 2021, 4:49 AM

#

what does .linspace() do?

#

quasi parcel Dec 20, 2021, 4:52 AM

#

i need help with literal_eval and a pandas column

#

ValueError: malformed node or string: 0 [312020]

#

this is the error i am getting
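
The failing code is not shown in the log. This message typically appears when `ast.literal_eval` is handed something that is not a string, for example a whole pandas Series instead of one cell. A sketch under that assumption, with a hypothetical column name `codes`:

```python
import ast
import pandas as pd

# Hypothetical data: each cell holds the text form of a Python list.
df = pd.DataFrame({"codes": ["[312020]", "[1, 2, 3]"]})

def parse_cell(value):
    # literal_eval expects a string; leave anything else (NaN, already-parsed lists) untouched.
    return ast.literal_eval(value) if isinstance(value, str) else value

# ast.literal_eval(df["codes"]) would raise ValueError, because a Series is not a string.
df["codes_parsed"] = df["codes"].apply(parse_cell)
print(df["codes_parsed"].tolist())
```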

pale mural Dec 20, 2021, 5:08 AM

#

arctic crown

returns an array of 100 evenly spaced numbers between 0 and 70

#

*ordered small to big as well
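
The call being discussed is not visible in the log; going by the reply it was presumably something like `np.linspace(0, 70, 100)` (an assumption). A small sketch of what that returns:

```python
import numpy as np

# 100 evenly spaced values from 0 to 70, both endpoints included by default.
values = np.linspace(0, 70, 100)

print(values[:3])    # first few values
print(values[-1])    # the last value is the stop value
print(len(values))   # number of points requested
```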

lapis sequoia Dec 20, 2021, 5:42 AM

#

pale mural hmm I was thinking something like that, kinda made one with tabulate and some co...

You can use subplots. So basically all of them will be in one of them.

lapis sequoia Dec 20, 2021, 5:42 AM

#

pale mural hmm I was thinking something like that, kinda made one with tabulate and some co...

I may have an example for the same. Gimmi an hour or something.

pale mural Dec 20, 2021, 6:13 AM

#

lapis sequoia You can use subplots. So basically all of them will be in one of them.

would you know what kind I should look at? I'm scrolling through matplotlib's 'subplot' docs but not sure which I'd use
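
One way to read the subplots suggestion is a grid of small heatmaps, one per table, with a shared colour scale. A sketch using `plt.subplots` and `imshow`; the random data, the 3x4 layout, and the output file name are placeholders, not pale mural's actual tables:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
tables = rng.random((12, 12, 21))  # placeholder data: 12 tables of 12x21 probabilities

fig, axes = plt.subplots(3, 4, figsize=(16, 9), sharex=True, sharey=True)
for i, (ax, table) in enumerate(zip(axes.flat, tables)):
    # Fixed vmin/vmax so colours are comparable across panels.
    im = ax.imshow(table, vmin=0.0, vmax=1.0, cmap="viridis", aspect="auto")
    ax.set_title(f"table {i + 1}")

fig.colorbar(im, ax=axes, label="probability")  # one shared colour bar
fig.savefig("tables_grid.png", dpi=150)
```

To compare two calculation methods, plotting the difference between corresponding tables with a diverging colour map is one compact option.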

charred wedge Dec 20, 2021, 7:59 AM

#

What is the best way to retrieve and parse a streaming api in json format? Like if I want to search for a match in name:

heavy bay Dec 20, 2021, 9:46 AM

#

Hey, so I installed tensorflow but when I run my code I get this warning:
2021-12-20 15:15:23.864238: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
How do I rebuild tensorflow with the appropriate compiler flags?
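
The `I` after the timestamp marks this as an informational log line rather than an error, so the installed binary runs without a rebuild. If the message is only noise, one option is to raise TensorFlow's C++ log threshold through the `TF_CPP_MIN_LOG_LEVEL` environment variable before importing the library. A minimal sketch:

```python
import os

# Must be set before TensorFlow is imported: "1" hides INFO messages, "2" also hides warnings.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

# import tensorflow as tf  # import only after the variable is set
```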

bold timber Dec 20, 2021, 9:49 AM

#

Hi, I have a question about NLP. Must we remove a data text when it appears more than once?

#

like a use drop_duplicates??

uneven flame Dec 20, 2021, 9:51 AM

#

bold timber like a use drop_duplicates??

this might help-
https://indicodata.ai/blog/should-we-remove-duplicates-ask-slater/

Indico Data

Slater Victoroff

Should we remove duplicates from a data-set while training a Machin...

It…depends. Mostly it depends on what your goals are and what your dataset looks like....

#

but personally so far, reducing data redundancy worked well in most cases, so i prefer removing duplicates mostly
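
If removing duplicates is the chosen route, a sketch with pandas `drop_duplicates`; the column names and rows are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["good movie", "bad movie", "good movie", "Good movie "],
    "label": ["pos", "neg", "pos", "pos"],
})

# Light normalisation first, so near-identical strings compare equal.
df["text_norm"] = df["text"].str.strip().str.lower()
deduplicated = df.drop_duplicates(subset="text_norm", keep="first").reset_index(drop=True)
print(len(df), "->", len(deduplicated))
```

Deduplicating before the train/test split also keeps the same text from landing on both sides of the split.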

#

also check this out^

uneven flame Dec 20, 2021, 9:55 AM

#

uneven flame also check this out^

verbal dock Dec 20, 2021, 10:17 AM

#

import cv2
from PIL import Image
cam = cv2.VideoCapture(0)
def draw_rectangle(img, rect):
    (x, y, w, h) = rect
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
def draw_text(img, text, x, y):
    cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 255, 0), 2)
def predict(test_img):
    
    face, rect = detect_face(img)

    label = face_recognizer.predict(face)

    label_text = subjects[label]
 
    #draw a rectangle around face detected
    draw_rectangle(img, rect)
    #draw name of predicted person
    draw_text(img, label_text, rect[0], rect[1]-5)
    return img
def predict(test_img):
    #make a copy of the image as we don't want to chang original image
    img = test_img.copy()
    #detect face from the image
    face, rect = detect_face(img)

    #predict the image using our face recognizer 
    label, confidence = face_recognizer.predict(face)
    #get name of respective label returned by face recognizer
    global label_text
    label_text = subjects[label]
    return img
predicted_persons = []
while True:
    ret, frame = cam.read()
    if not ret:
        print("failed to grab frame")
        break
    cv2.imshow("Attendence...", frame)

    k = cv2.waitKey(1)
    if k%256 == 27:
        break
    elif k%256 == 32:
        # SPACE pressed
        stimg = cv2.imwrite("Student_Image.jpg", frame)
    studentimg = cv2.imread("Student_Image.jpg")
    Student_Prediction = predict(studentimg)
    #draw a rectangle around face detected
    draw_rectangle(Student_Prediction, rect)
    #draw name of predicted person
    draw_text(Student_Prediction, label_text, rect[0], rect[1]-5)
    predicted_persons.append(label_text)

    from openpyxl import Workbook
    book = Workbook()
    sheet = book.active
    row = (label_text)

    if row not in predicted_persons:
        sheet.append(row)
book.save("Today's Attendence.xlsx") 
cam.release()

cv2.destroyAllWindows()

#

While proceeding with the code, I'm getting the following error-

#

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-271c883c8a01> in <module>
     47         stimg = cv2.imwrite("Student_Image.jpg", frame)
     48     studentimg = cv2.imread("Student_Image.jpg")
---> 49     Student_Prediction = predict(studentimg)
     50     #draw a rectangle around face detected
     51     draw_rectangle(Student_Prediction, rect)

<ipython-input-14-271c883c8a01> in predict(test_img)
     22 def predict(test_img):
     23     #make a copy of the image as we don't want to chang original image
---> 24     img = test_img.copy()
     25     #detect face from the image
     26     face, rect = detect_face(img)

AttributeError: 'NoneType' object has no attribute 'copy'

#

Why am I getting this error and what can be the possible fixes for this?

lapis sequoia Dec 20, 2021, 10:23 AM

#

pale mural would you know what kind I should look at? I'm scrolling through matplotlib's 's...

Sure. Gimmi some time. I've been quite busy today.

verbal dock Dec 20, 2021, 10:24 AM

#

verbal dock Why am I getting this error and what can be the possible fixes for this?

I think studentimg is an image, but the error still reports it as 'NoneType'.

arctic wedgeBOT Dec 20, 2021, 10:28 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

vivid echo Dec 20, 2021, 10:40 AM

#

May be you are passing wrong input

#

The function isn't able to read the image properly, that's why you're getting the AttributeError

#

@verbal dock Check the path of the image
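
Some context for the path suggestion: `cv2.imread` returns `None` instead of raising when a file is missing or unreadable. In the posted loop the image is only written when SPACE is pressed, but it is read on every iteration, so before the first SPACE press the file may not exist yet. A sketch of a guard; the helper name and the `reader` parameter (standing in for `cv2.imread`) are hypothetical:

```python
import os

def read_image_or_none(path, reader):
    """Return the image loaded by reader(path), or None with a message if it cannot be read.

    `reader` stands in for cv2.imread, which returns None (no exception)
    when the file is missing or unreadable.
    """
    if not os.path.exists(path):
        print(f"{path} does not exist yet (was SPACE pressed to save a frame?)")
        return None
    image = reader(path)
    if image is None:
        print(f"{path} exists but could not be decoded as an image")
    return image

# Usage inside the capture loop (hypothetical wiring):
#   studentimg = read_image_or_none("Student_Image.jpg", cv2.imread)
#   if studentimg is None:
#       continue  # skip prediction for this frame

# Stand-in reader for demonstration; the file name is deliberately one that should not exist.
result = read_image_or_none("definitely_missing_file_for_demo.jpg", reader=lambda p: None)
print(result)
```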

lapis sequoia Dec 20, 2021, 10:50 AM

#

verbal dock ```py import cv2 from PIL import Image cam = cv2.VideoCapture(0) def draw_rectan...

I WISH I COULD BE THIS GOOD

bold timber Dec 20, 2021, 11:34 AM

#

If I have a dataset that label is doesn't match the feature. What can I do? Remove the data or replace the label value?

vivid echo Dec 20, 2021, 11:50 AM

#

share the error

inland zephyr Dec 20, 2021, 11:51 AM

#

bold timber If I have a dataset that label is doesn't match the feature. What can I do? Remo...

what do you mean by not match with the feature? can you elaborate this?

bold timber Dec 20, 2021, 11:55 AM

#

inland zephyr what do you mean by not match with the feature? can you elaborate this?

I have a headline classification dataset and the feature of headline is doesn't match in label of category. What should I do? remove the data or rename the label value?

vivid echo Dec 20, 2021, 11:58 AM

#

bold timber I have a headline classification dataset and the feature of headline is doesn't ...

rename the label values

bold timber Dec 20, 2021, 12:00 PM

#

vivid echo rename the label values

can we change the label value in the dataset?

pastel valley Dec 20, 2021, 12:01 PM

#

yo is normalization and image augmentation have up and downs on cnn models?

vivid echo Dec 20, 2021, 12:01 PM

#

Share the screenshot of the dataset and which algorithm you use for the classification?

vivid echo Dec 20, 2021, 12:02 PM

#

pastel valley yo is normalization and image augmentation have up and downs on cnn models?

Means ?

vivid echo Dec 20, 2021, 12:05 PM

#

bold timber can we change the label value in the dataset?

send me the dataset link

#

So I will guide you in better way

amber lark Dec 20, 2021, 12:37 PM

#

Hello, where can I start learn ML or deep learning?
I am with 0 knowledge.

pastel valley Dec 20, 2021, 12:49 PM

#

vivid echo Means ?

yo bro nevermind i got confused i cant even remember why i asked that hahaha

#

btw this is new question does cnn models naturally recognizes the shapes right? or their colors too?

serene scaffold Dec 20, 2021, 12:51 PM

#

amber lark Hello, where can I start learn ML or deep learning? I am with 0 knowledge.

try "Data Science from Scratch"

vivid echo Dec 20, 2021, 12:52 PM

#

pastel valley btw this is new question does cnn models naturally recognizes the shapes right? ...

yes colors are recognize by the cnn

pastel valley Dec 20, 2021, 12:52 PM

#

like convo layers detects edges therefor they can see the shapes

vivid echo Dec 20, 2021, 12:52 PM

#

like vehicle color detection

pastel valley Dec 20, 2021, 12:54 PM

#

vivid echo yes colors are recognize by the cnn

so if i train a model with a dataset that for example 2 balls same shape but one is blue and one is red and i used color space manipulation (image augmentation technique) this will then give disadvantage on the model right?

vivid echo Dec 20, 2021, 12:55 PM

#

I don't think so

#

can you give a reason why you feel it will give disadvantage?

pastel valley Dec 20, 2021, 1:01 PM

#

if the distinct feature that differentiate this 2 class is only color then altering their color in preprocessing(using image augmentation) can confuse the model?
for example applied red casting to blue ball and applied blue casting to red ball?@vivid echo

vivid echo Dec 20, 2021, 1:05 PM

#

Okay , I see

#

Now I got it.
You are right

pastel valley Dec 20, 2021, 1:06 PM

#

can this be a good thesis topic?

vivid echo Dec 20, 2021, 1:06 PM

#

we use augmentation the generate more training data so our model trains with better accuracy.

vivid echo Dec 20, 2021, 1:07 PM

#

pastel valley can this be a good thesis topic?

Yes

pastel valley Dec 20, 2021, 1:07 PM

#

vivid echo we use augmentation the generate more training data so our model trains with bet...

but if the generated data is somewhat questionable(like the color casting thing) then it can then give negative impact?

vivid echo Dec 20, 2021, 1:08 PM

#

yes that time model will confuse or give wrong prediction
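
A toy illustration of the concern, assuming two classes that differ only in colour and an augmentation that exchanges the red and blue channels (pixel values are made up):

```python
# Toy illustration: two classes that differ only in colour, as (R, G, B) pixels.
red_ball = (220, 30, 30)
blue_ball = (30, 30, 220)

def swap_red_blue(pixel):
    """A colour-space augmentation that exchanges the red and blue channels."""
    r, g, b = pixel
    return (b, g, r)

augmented_red = swap_red_blue(red_ball)
# The augmented "red" sample now has the same pixel values as a blue one,
# while still carrying the "red" label.
print(augmented_red == blue_ball)
```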

inland zephyr Dec 20, 2021, 1:22 PM

#

pastel valley can this be a good thesis topic?

actually i have researched this one

#

and yes

#

sometimes augmentation will give bad result depends on the characteristic of the object

pastel valley Dec 20, 2021, 1:23 PM

#

inland zephyr actually i have researched this one

oh is it already done?

#

published?

inland zephyr Dec 20, 2021, 1:23 PM

#

um nope

#

actually my personal project

#

but it could be a good thesis project

pastel valley Dec 20, 2021, 1:24 PM

#

oh i can still use this as thesis yeah?

#

i just need a topic

#

hahaha my 1st topic got rejected

pastel valley Dec 20, 2021, 1:29 PM

#

inland zephyr sometimes augmentation will give bad result depends on the characteristic of the...

so my topic would be selecting object to classify then if my experiment proves right then that object or similar objects to that are not good to use augmentation(that generate data with altered feature(color)) if my experiment proves me wrong then at least i did conduct experiment and have a thesis right?

#

😅

inland zephyr Dec 20, 2021, 1:39 PM

#

you can try that one

amber lark Dec 20, 2021, 1:49 PM

#

serene scaffold try "Data Science from Scratch"

Thx

delicate sphinx Dec 20, 2021, 3:44 PM

#

In Tensorflow, is there a way to use a tokenizer vocabulary as the vocabulary = ... parameter in the TextVectorization layer or should I only use a TextVectorization layer

wicked grove Dec 20, 2021, 4:02 PM

#

vivid echo we use augmentation the generate more training data so our model trains with bet...

While using the ImageDataGenerator, does it also generate original images? I used rotation_range=45 and got a few images which were like the original one

merry ridge Dec 20, 2021, 4:03 PM

#

Does anyone have any recommendation for a paid online course to learn Python for Data Science for a complete beginner? I have a coworker that is interested in learning and I already sent them a lot the free material, but I figured a well-structured paid one would be better if it is all being billed back to the company anyway.

bronze skiff Dec 20, 2021, 4:09 PM

#

merry ridge Does anyone have any recommendation for a paid online course to learn Python for...

buy a book

#

like grus' data science from scratch

#

or bishop's pattern recognition and machine learning

#

(the second is a personal recommendation)

#

first book has a python tutorial in it-- you can supplement it with beazley's python cookbook

desert oar Dec 20, 2021, 4:14 PM

#

yeah maybe at least bill the book to the company and give them a few hours a week to learn it

wicked grove Dec 20, 2021, 4:46 PM

#

desert oar yeah maybe at least bill the book to the company and give them a few hours a wee...

Hello,While using the imagedatagenerator in keras does it also generate original images? I used rotation_range=45 and got a few images which were like the original one

amber lark Dec 20, 2021, 4:47 PM

#

serene scaffold try "Data Science from Scratch"

Do I need to have a high math level?

serene scaffold Dec 20, 2021, 4:47 PM

#

amber lark Do I need to have a high math level?

it goes over the basics of the math you'll need

amber lark Dec 20, 2021, 4:48 PM

#

Ok

delicate sphinx Dec 20, 2021, 4:48 PM

#

delicate sphinx In Tensorflow, is there a way to use a tokenizer vocabulary as the vocabulary = ...

Figured this out but I got no clue how to use my TextVectorization layer lmao x-x

delicate sphinx Dec 20, 2021, 4:49 PM

#

wicked grove Hello,While using the imagedatagenerator in keras does it also generate original...

you might want to give examples or elaborate more, I've not touched on image generation but it could just be taking smaller images from the rest of the image to focus on each different part?

wicked grove Dec 20, 2021, 4:54 PM

#

delicate sphinx you might want to give examples or elaborate more, I've not touched on image gen...

I have this dataset called Refuge with 40 images i want to increase it to 120 images
I used imagedatagenerator in keras for it

#

So for data augmentation i am trying rotation
I set rotation_range to 45 degree but it generates a few images which are very similar to the original
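
A likely explanation, as far as I know: with `rotation_range=45` the angle for each generated image is drawn at random from the range -45 to +45 degrees, so some draws land close to 0 and look almost like the original. The sketch below only simulates that sampling with the standard library (it does not call Keras), assuming a uniform draw:

```python
import random

random.seed(0)
rotation_range = 45
angles = [random.uniform(-rotation_range, rotation_range) for _ in range(10_000)]

# Fraction of draws within 5 degrees of "no rotation".
near_original = sum(abs(a) < 5 for a in angles) / len(angles)
print(f"{near_original:.1%} of sampled rotations are within 5 degrees of the original")
```

Under that assumption roughly one draw in nine falls within 5 degrees of zero, which would match seeing a few near-original images among 120.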

delicate sphinx Dec 20, 2021, 5:34 PM

#

wicked grove So for data augmentation i am trying rotation I set rotation_range to 45 degree...

Have you checked the documentation? Personally I've never used an imagedatagenerator (and I use Tensorflow 2 anyway). Would this be of any assistance?https://stackoverflow.com/questions/34801342/tensorflow-how-to-rotate-an-image-for-data-augmentation

Stack Overflow

tensorflow: how to rotate an image for data augmentation?

In tensorflow, I would like to rotate an image from a random angle, for data augmentation. But I don't find this transformation in the tf.image module.

wicked grove Dec 20, 2021, 5:35 PM

#

delicate sphinx Have you checked the documentation? Personally I've never used an imagedatagener...

Yess i did check thiss
I actually also wanna know what is better for data augmentation
Imagedatagenerator or openCV
What do you think i should go for??

uneven flame Dec 20, 2021, 5:41 PM

#

wicked grove Yess i did check thiss I actually also wanna know what is better for data augmen...

https://towardsdatascience.com/top-python-libraries-for-image-augmentation-in-computer-vision-2566bed0533e

here are some libraries used for image data augmentation
if u just want to solve a classification problem u can use any data augmentation libraries
for object detections some times after the augmentation one might have to take care of the bounding box coordinates too. I am not sure but I think some of these libraries might have methods of doin that, and maybe there are some APIs which helps with that too. I am also trying to learn about it all atm.

Medium

Top Python libraries for Image Augmentation in Computer Vision

Featuring the best augmentation libraries (along with sample codes) for your next computer vision project

wicked grove Dec 20, 2021, 5:46 PM

#

uneven flame https://towardsdatascience.com/top-python-libraries-for-image-augmentation-in-co...

Can you please name the libraries,i cant access the article
Yess mine is just a classification problem

uneven flame Dec 20, 2021, 5:48 PM

#

wicked grove Can you please name the libraries,i cant access the article Yess mine is just a...

sure let me post screenshots for you-

wicked grove Dec 20, 2021, 5:49 PM

#

Thank you soo much!

arctic wedgeBOT Dec 20, 2021, 5:51 PM

#

:incoming_envelope: :ok_hand: applied mute to @uneven flame until <t:1640023262:f> (9 minutes and 58 seconds) (reason: attachments rule: sent 7 attachments in 10s).

rapid fog Dec 20, 2021, 5:53 PM

#

!unmute 509403906963406860

arctic wedgeBOT Dec 20, 2021, 5:53 PM

#

:incoming_envelope: :ok_hand: pardoned infraction mute for @uneven flame.

rapid fog Dec 20, 2021, 5:53 PM

#

@uneven flame Sorry, your message got zapped by our filters since it had quite a few attachments.

#

Would you like me to get them back for you? We have them in logs.

uneven flame Dec 20, 2021, 5:56 PM

#

rapid fog @uneven flame Sorry, your message got zapped by our filters since it ha...

No worries mate, I dm'd @wicked grove the details

rapid fog Dec 20, 2021, 5:56 PM

#

👌

hasty mountain Dec 20, 2021, 6:22 PM

#

Hey guys, about GANs and, specifically, DCGAN: is it possible to use some kind of automated hyperparameter tuning for the optimizer without collapsing the model? Or do I have to keep changing the learning rate manually and train for many epochs after each change to see how it goes?

#

I made a DCGAN that will only stop generating noise and generate something that slightly resembles the real images if I use more than 5000 epochs, so it kinda sucks to have to change the learning rate and wait for 5000 epochs every time.

bronze skiff Dec 20, 2021, 6:32 PM

#

i mean, you can use an LR scheduler

#

https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

hasty mountain Dec 20, 2021, 6:36 PM

#

bronze skiff i mean, you can use an LR scheduler

Oh, I see. Do you have a suggestion to how many epochs would be reasonable to make the LR change?

bronze skiff Dec 20, 2021, 6:37 PM

#

that's often a hard hyperparameter choice

#

some of those schedulers use the validation set to determine good times to lower/raise learning rates

#

so maybe check out those
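
To make the idea concrete, a hand-rolled sketch of a plateau-based schedule in plain Python. It illustrates the concept behind schedulers such as PyTorch's `ReduceLROnPlateau` but is not that class's implementation; the class name, defaults, and loss values are assumptions for illustration:

```python
class PlateauScheduler:
    """Hand-rolled sketch: multiply the learning rate by `factor` after
    `patience` consecutive epochs without improvement in the monitored loss."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr

scheduler = PlateauScheduler(lr=2e-4)
losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.65]  # made-up validation losses
history = [scheduler.step(loss) for loss in losses]
print(history)
```

For a GAN the same pattern would be applied twice, with one scheduler per optimizer (generator and discriminator), each stepped once per epoch.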

hasty mountain Dec 20, 2021, 6:40 PM

#

bronze skiff some of those schedulers use the validation set to determine good times to lower...

Can I simply apply a scheduler for the generator and another one for the discriminator at the same time? DCGANs feels like I'm always walking on eggshells.

bronze skiff Dec 20, 2021, 6:40 PM

#

yes

#

gans in general are very unstable, due to mode collapse

#

it might be good to look into modifications that try to get around those issues

#

like infoGANs and WGANs

#

https://www.depthfirstlearning.com/2019/WassersteinGAN

#

my favorite resource to learn about wgans

hasty mountain Dec 20, 2021, 6:42 PM

#

I see. I think model collapse isn't exactly the problem for me, since both the generator and discriminator loss function doesn't go to infinite and beyond. But the generator still will only generate noise until around 5000 epochs.

#

I've added some gaussian noise to the discriminator's conv2d layers and now I'm trying to change the learning rate.

bronze skiff Dec 20, 2021, 6:46 PM

#

that's usually what WGANs do-- accelerate training

#

but for now, tweaking learning rates is good to do

toxic kraken Dec 20, 2021, 7:07 PM

#

Hi! I have a silly question: I am coding a classifier estimator with sklearn, and I have a dataset of diamonds (size, color, clarity and cut).
Based on the first 3 features, i want to classify each sample by cut, this can be: ['Ideal', 'Premium', 'Very Good', 'Good', 'Fair'] .

So my target Y is a Series of strings.

My question: what is the difference between OneHotEncoder and simply putting numbers from 0 to 4?

delicate sphinx Dec 20, 2021, 7:19 PM

#

toxic kraken Hi! I have a silly question: I am coding a classifier estimator with sklearn, an...

well firstly, if you have 10000 examples, every output from a one-hot encoder model would give you 10000 items in a list

#

with 9999 "0" and 1 "1"

#

from my understanding of it

hasty mountain Dec 20, 2021, 7:25 PM

#

toxic kraken Hi! I have a silly question: I am coding a classifier estimator with sklearn, an...

I think some models won't work properly if you use 0 to 4 instead of one-hot encoder.

#

My head is on neural networks right now, and those demand one-hot encoder to be able to use soft max and classify classes correctly. Otherwise, I think the model would see the values from 0 to 4 as continuous, something like price prediction, so it would require a different structure.

#

You probably can use 0~4 values, but I think it would be more complicated to work with.
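
For comparison, a plain-Python sketch of the two encodings for the five cut classes. The sample values are made up. In scikit-learn, `LabelEncoder` is the tool documented for targets and `OneHotEncoder` for input features, and scikit-learn classifiers also accept string targets directly.

```python
cuts = ["Ideal", "Premium", "Very Good", "Good", "Fair"]  # the five classes from the question
samples = ["Good", "Ideal", "Fair", "Ideal"]                # made-up target values

# Integer (label) encoding: one number per sample.
class_to_index = {cut: i for i, cut in enumerate(cuts)}
integer_labels = [class_to_index[s] for s in samples]

# One-hot encoding: one vector per sample, with one slot per class (not per sample).
one_hot_labels = [[1 if i == idx else 0 for i in range(len(cuts))] for idx in integer_labels]

print(integer_labels)
print(one_hot_labels)
```

The one-hot vectors have length 5 regardless of how many rows the dataset has. Integer codes imply an ordering and spacing between classes, which matters when the encoded column is used as an input feature to a model that treats numbers as magnitudes.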

toxic kraken Dec 20, 2021, 7:29 PM

#

Thanks!

arctic wedgeBOT Dec 20, 2021, 7:38 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1640029708:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

arctic crown Dec 20, 2021, 8:29 PM

#

please help
so i am making a personal ai assistant its been over 5 to 6 months now
and now i want to add some ml algorithm for doing this:
lets say i wake up at 7 am and turn on the lights and i do this continuously every day till a week when the next week starts i want my my ai to automatically turn on the lights for me . another example could be lets say i set a alarm to wake up at 7 am everyday the more i do it the more it knows and does it itself.
which ml algorithm can i use to achieve this?

uneven flame Dec 20, 2021, 8:33 PM

#

arctic crown please help so i am making a personal ai assistant its been over 5 to 6 months n...

suppose u set alarm straight for 5 days to ring at 7am, u want the program to set an alarm at that same time on the 6th day, in case if u forget to set it yourself?

novel acorn Dec 20, 2021, 8:34 PM

#

Hey, so I have a question.

What could be happening here? according to the shape it is 1,445,477 entries but the index goes from 0 to 523,862, which seems pretty weird

arctic crown Dec 20, 2021, 8:46 PM

#

uneven flame suppose u set alarm straight for 5 days to ring at 7am, u want the program to se...

yup

novel acorn Dec 20, 2021, 8:48 PM

#

Already checked, seems normal honestly

desert oar Dec 20, 2021, 8:50 PM

#

this is difficult to follow. can you post a snippet of code and some kind of demonstration of what goes wrong?

#

!code see below for using code formatting:

arctic wedgeBOT Dec 20, 2021, 8:50 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

desert oar Dec 20, 2021, 8:50 PM

#

!paste or use our paste site:

arctic wedgeBOT Dec 20, 2021, 8:50 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar Dec 20, 2021, 8:51 PM

#

please avoid posting screenshots unless there's absolutely no other way to show what you are asking about

#

it's impossible to search, impossible to read for certain people, and generally puts a lot more burden on other people

#

likewise @novel acorn, can you please post this as a code block

#

check for extra whitespace around the names in the CSV

novel acorn Dec 20, 2021, 8:52 PM

#

hqahahahah it's because I'm using multiple versions and want to see what changes I'm doing

desert oar Dec 20, 2021, 8:53 PM

#

e.g. maybe your list has ['a', 'b', 'c'] but the csv has a , b , c which would be ['a ', ' b ', ' c']

#

then you need to manually inspect the files and find out what's going on

#

figure out which particular files are causing trouble, read one of them, and then print the columns with df.columns.tolist()

novel acorn Dec 20, 2021, 8:54 PM

#

desert oar likewise @novel acorn, can you please post this as a code block

sure, but I'm sending screenshots because of the resulting dataframe, not the code 😄

desert oar Dec 20, 2021, 8:54 PM

#

you can also put the output in a code block? or in the paste site. it doesn't have to be syntactically valid python code

novel acorn Dec 20, 2021, 8:54 PM

#

sure, look

#

yup, but that's the problem

#

1.41 million rows

#

default, the file didn't have an index

desert oar Dec 20, 2021, 8:57 PM

#

@olive jackal so what exactly is the problem with this? you are expecting one of these dataframes to have certain columns, but it doesn't have those columns?

desert oar Dec 20, 2021, 8:57 PM

#

novel acorn yup, but that's the problem

show the code you used to create this dataframe. it's probably just not sorted by its index value

#

i told you, the video isn't useful. at least it's not something i personally can use to help you with

novel acorn Dec 20, 2021, 8:58 PM

#

it may be, I actually had to do a join because I had every single year in different datasets

desert oar Dec 20, 2021, 8:59 PM

#

novel acorn it may be, I actually had to do a join because I had every single year in differ...

it's very likely that there's nothing wrong with the dataframe, but that the last row does not have the highest index value. first of all .shape should tell you the size of the dataframe, and that number will not lie to you. second, you can do .index.max() or equivalent

#

indexes are there to help you, but they can get confusing if you aren't used to working with them

novel acorn Dec 20, 2021, 9:00 PM

#

I'll try that

#


ff_id = pd.read_csv(path_customer, encoding='unicode_escape')

data_2018 = pd.read_csv(path_2018, encoding='unicode_escape')
data_2019 = pd.read_csv(path_2019, encoding='unicode_escape')
data_2020 = pd.read_csv(path_2020, encoding='unicode_escape')
data_2021 = pd.read_csv(path_2021, encoding='unicode_escape')

filtered_2018 = data_2018.merge(ff_id, on="ID", how="inner")
filtered_2019 = data_2019.merge(ff_id, on="ID", how="inner")
filtered_2020 = data_2020.merge(ff_id, on="ID", how="inner")
filtered_2021 = data_2021.merge(ff_id, on="ID", how="inner")

year_2018 = filtered_2018.drop(columns_to_drop, axis=1)
year_2019 = filtered_2019.drop(columns_to_drop, axis=1)
year_2020 = filtered_2020.drop(columns_to_drop, axis=1)
year_2021 = filtered_2021.drop(columns_to_drop, axis=1)

all_years = pd.concat([year_2018, year_2019, year_2020, year_2021])

#

that's the code I used to read it and to concat it in a single big dataset

#

desert oar Dec 20, 2021, 9:00 PM

#

novel acorn ```py ff_id = pd.read_csv(path_customer, encoding='unicode_escape') data_2018 ...

all those merges will likely reorder the data, yes

#

is the ID unique across rows? if so, consider setting it as the index for each dataframe

novel acorn Dec 20, 2021, 9:01 PM

#

yup, but it's a long id for unique customers

desert oar Dec 20, 2021, 9:02 PM

#

is it 1 customer per row? or more than 1?

novel acorn Dec 20, 2021, 9:02 PM

#

1 per row, but customers repeat because it's the movements of 4 years

desert oar Dec 20, 2021, 9:02 PM

#

that would be more than 1 row per customer then

novel acorn Dec 20, 2021, 9:02 PM

#

indeed

desert oar Dec 20, 2021, 9:02 PM

#

as in, the customer id's are not unique

#

are they unique in ff_id?

novel acorn Dec 20, 2021, 9:03 PM

#

yes, that dataset only has 3k rows

#

because it's the id of the customers that belong to certain category

desert oar Dec 20, 2021, 9:04 PM

#

and what's in each row of the other tables? some kind of transaction?

novel acorn Dec 20, 2021, 9:04 PM

#

yup, not transaction but information of a movement (cargo)

desert oar Dec 20, 2021, 9:05 PM

#

so customer 1234 can appear in any table multiple times? or customer 1234 can only appear in each table once, but they can appear in multiple tables?

novel acorn Dec 20, 2021, 9:05 PM

#

I'll try it 😄

desert oar Dec 20, 2021, 9:05 PM

#

@olive jackal i believe you can pass a list of column names to usecols= so you don't have to deal with the column name index business

#

desired_columns = ['a', 'b', 'c']

for file in files:
    tmp = pd.read_csv(file, nrows=0)
    columns_in_file = list(set(tmp.columns) & set(desired_columns))
    data = pd.read_csv(file, usecols=columns_in_file)  # read from the file path, not from `data`
    ...

novel acorn Dec 20, 2021, 9:07 PM

#

desert oar so customer 1234 can appear in any table multiple times? or customer 1234 can on...

first one, they can appear in any table multiple times and each time it may be a different movement

desert oar Dec 20, 2021, 9:10 PM

#

ok then @novel acorn. this probably won't change your outcome much, but in general i'd do something like this to make it clear what the unique identifiers are (and maybe wrap it up in a function to reduce duplication and the risk of typos):

# Customer data: one row per customer
ff_id = pd.read_csv(path_customer, encoding='unicode_escape').set_index("ID")

# Load cargo data: one row per shipment
def load_cargo_table(path):
    data = pd.read_csv(path, encoding='unicode_escape')
    data = data.join(ff_id, on="ID", how="inner")
    return data.drop(columns_to_drop, axis=1)
paths = [path_2018, path_2019, path_2020, path_2021]
all_years = pd.concat([load_cargo_table(path) for path in paths])

#

i'm not sure what you mean by that. but you can actually use the fact that it accepts callables to your advantage (this example is even given in the docs):

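desired_columns = ['a', 'b', 'c']

for file in files:
    # With a callable there is no need to read the header first:
    # pandas calls it on every column name and keeps those that return True
    data = pd.read_csv(file, usecols=lambda c: c in desired_columns)
    ...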

#

desired_columns = ['a', 'b', 'c']

def is_desired_column(c):
    return c in desired_columns

for file in files:
    # Same idea as the lambda version, with a named function
    data = pd.read_csv(file, usecols=is_desired_column)
    ...

novel acorn Dec 20, 2021, 9:14 PM

#

desert oar ok then @novel acorn. this probably won't change your outcome much, bu...

Thank you!

I'll try that

bronze skiff Dec 20, 2021, 9:17 PM

#

any opinions here on metaflow?

#

i don't know how i feel about stuffing everything into a single class

novel acorn Dec 20, 2021, 9:50 PM

#

desert oar ok then @novel acorn. this probably won't change your outcome much, bu...

tysm, this helped a lot in cleaning the code, but the solution ended up being resetting the index. probably, as you said, the merges and the concat (most likely the concat) left the indexes broken. after resetting it, everything seems fine.
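
For anyone hitting the same thing: by default `pd.concat` keeps each input's own row labels, so the combined frame can end up with repeated index values. A minimal sketch with made-up data (the frames and column names below are hypothetical, not the actual cargo tables):

```python
import pandas as pd

# Hypothetical stand-ins for two yearly cargo tables
cargo_2018 = pd.DataFrame({"ID": [1, 2], "weight": [10.0, 12.5]})
cargo_2019 = pd.DataFrame({"ID": [1, 3], "weight": [7.0, 9.0]})

# Default concat keeps each frame's own 0..n-1 labels, so labels repeat
stacked = pd.concat([cargo_2018, cargo_2019])
print(stacked.index.tolist())

# Either reset afterwards, or ask concat to renumber the rows directly
fixed = stacked.reset_index(drop=True)
also_fixed = pd.concat([cargo_2018, cargo_2019], ignore_index=True)
print(fixed.index.tolist())
```

Passing `ignore_index=True` is usually the tidier option when the old row labels carry no meaning, which seems to be the case here.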

hasty mountain Dec 20, 2021, 9:54 PM

#

@bronze skiff hey, just a quick question:
I've seen that usually GANs don't use Dropout layers, even though a dropout layer helps to avoid overfitting and thus helps with loss function.
However, since GANs are...kinda special and fragile, is this a bad idea?

(I tried using dropout(0.4) after each ReLU in my discriminator. I think I broke my GAN...)

novel acorn Dec 20, 2021, 10:01 PM

#

yup I know, I was given the data in different files because it was one file per year and they wanted me to do a global analysis

#

I want to learn sql, but due to college and work, I have little to no time, but it's in my to do list 😄

#

😮

#

I'll try it then, I'll see if in the following days I have a little time 😄

#

Thought it was as hard to learn as a new programming language hahahha

stone marlin Dec 20, 2021, 10:07 PM

#

I'd recommend messing around in pgexercises.com. They have very common questions, have solutions at the bottom which explain their process, and it's pretty fast to pick up SQL. This is in Postgres dialect, but most of the dialects are very, very similar to one another.

#

When I was doing data engineering, I recommended this to all of the interns who wanted to get into DE. Most [not all] were able to get up to speed relatively quickly and commit something to our codebase in a few weeks. :']

#

To optimize SQL calls takes quite a bit of experience and knowledge of the architecture --- but to do simple calls (which are 99% of the calls people prob will want) the learning curve is fairly low.

arctic crown Dec 20, 2021, 10:12 PM

#

in LinearRegression can you predict on a single dimension like x?

stone marlin Dec 20, 2021, 10:12 PM

#

As in, you have a list like [1, 4, 6, 1, 7, 8] and you want to regress on it?

#

They might wanna take the mean in a weirdly convoluted way.

novel acorn Dec 20, 2021, 10:14 PM

#

stone marlin When I was doing data engineering, I recommended this to all of the interns who ...

thank you!

I'll keep this in mind

stone marlin Dec 20, 2021, 10:15 PM

#

No problemo, I love SQL stuff. It's a really fantastic tool to learn and I'd argue that it's essential for most jobs in DS / DE / Analysis these days.

#

I don't think that it's a competition, both are very important to know.

#

Give me an example, I don't think I follow.

#

The kinds of data which I have been working with were not always able to be pulled with Looker/Tableau/PowerBI --- this was more of the Data Analyst's job. But I understand what you mean here. Nevertheless, if you're using a BI tool, the point of it is to be able to pass it around easily and modify it --- so I'm not sure I get where the "passed around and now it's unusable" thing is coming from, unless you mean that there's some people exporting to Excel, changing it, etc., which is bad practice in general.

arctic crown Dec 20, 2021, 10:22 PM

#

wait sorry my question doesn't make sense

stone marlin Dec 20, 2021, 10:22 PM

#

I think though that either way, knowing the proper structure to put things in is important as well. For ETL, you're going to need to know the proper ways to Extract (SQL, or whatever BI tool you're using if that's acceptable), Transform (whatever system you use here, Python/Spark/etc.), and Load (which is where the data structures come in).

arctic crown Dec 20, 2021, 10:23 PM

#

please help
so i am making a personal ai assistant its been over 5 to 6 months now
and now i want to add some ml algorithm for doing this:
lets say i wake up at 7 am and turn on the lights, and i do this every day for a week. when the next week starts, i want my ai to automatically turn on the lights for me. another example: lets say i set an alarm to wake up at 7 am every day. the more i do it, the more it learns the pattern and does it itself.
which ml algorithm can i use to achieve this?

stone marlin Dec 20, 2021, 10:23 PM

#

If the whole df is 90mb, it depends strongly on what you're doing, right? If your ETL is windowing over a whole bunch of stuff, that's more complicated than a single SELECT for the business team.

#

Moreover, it depends where the load is going to. To a DW for the business team? To a db for the DS team? Etc.

#

I think we're both saying right things here: it's important to know SQL (or, equally, the THEORY behind how querying works, since it's still the same thing in Tableau / whatever, just simplified), as well as how to produce a product relevant for the team you're handing it off to.

#

I'd never give my business team a raw Excel file. I give them a Looker view and they can export if they want, but they can't change it on looker and mess up anything.

#

Haha, okay, see, I'm the person on the data team people buy the coffee for. :']

arctic crown Dec 20, 2021, 10:26 PM

#

arctic crown please help so i am making a personal ai assistant its been over 5 to 6 months n...

simplified example:
suppose u set alarm straight for 5 days to ring at 7am, u want the program to set an alarm at that same time on the 6th day, in case if u forget to set it yourself?

stone marlin Dec 20, 2021, 10:26 PM

#

But yeah, if one is not going to be a Data Scientist / Data Engineer, then it's probably not AS important to get too deep in the weeds, you're totally right.

#

Ash, that's still a 2D regression. Your x-axis is day number, your y-axis is the time.

#

Yes, this is true. I tend to do smaller companies, so this is my bias, certainly.

#

Yeah, I like to have a "say" in the company, for some definition of "say". Most of my companies have been around 20 - 300 people, so it's a wide range, but def not a big company.

#

Haha, well, yes. We usually will have a data lake which only DBAs have access to, and they will then push data (with help from DS + Commercial) to a Data Warehouse so that we can attach Looker / Tableau / whatever to that. Then we'll have another smaller data warehouse for DS which is mostly lightly parsed Data Lake stuff.

#

That way commercial gets what they want and we don't have to do a ton of gross formatting AND we don't have to worry about weird business defs accidentally getting into the DS part. And then for the DS part, we're free to do whatever we want with that, we're the DBAs of that DW.

#

I don't think it's optimal, but it's definitely served us well! Which goes back to my comment: without SQL, I'd be SOL. Haha. But yeah, in a big company if you're not required to just get your data willy-nilly, SQL might not be a top thing to invest in.

#

Haha, yeah. For BI stuff, Tableau + Looker have served me very well. I didn't do much with PowerBI but it looks fine.

#

I've been meaning to learn it, a few companies I'm looking at use it and I don't know much about it.

#

Yeah, it's kind of weird, since Windows got WSL, a bunch of companies my friends are at have slowly transitioned off macs (because they're VERY expensive) and went over to windows. Since the devs get the WSL stuff w/ Linux for dev work, and everyone else is like, "whatever, windows is fine."

#

The only thing I don't wanna do is have to learn Azure cloud. I've already had to learn AWS and GCP stuff, I don't wanna learn another one, haha. But either way, that's a good task for me to do, just to look into it.

#

Yeeeeeep. I'm not sure the direction of the market, but it seems a few companies have been moving over. I'm not sure if it's price-point or what.

#

We'll see how it all goes. AWS certainly has marketshare and name. GCP is way friendlier to use, imo. I dunno about Azure yet. But in ten years we'll see where the market is, haha.

delicate sphinx Dec 20, 2021, 10:38 PM

#

How can I translate a softmax output (length of 16 data type of float) to integer again (I tokenized my words so the output float -> int -> string)

I'm using Tensorflow but at this point I'm open to anything

delicate sphinx Dec 20, 2021, 10:38 PM

#

delicate sphinx How can I translate a softmax output (length of 16 data type of float) to intege...

My vocabulary is about ~28,000 words so one-hot encoding might work but I'd rather look and see if there's a more dynamic approach first

stone marlin Dec 20, 2021, 10:38 PM

#

I'm not sure what Dax is, but I'd prob do most of my modeling in the SQL anyhow regardless, haha.

#

28,000 words with one-hot? Lawddddd that's a large, sparse dataset.

delicate sphinx Dec 20, 2021, 10:39 PM

#

stone marlin 28,000 words with one-hot? Lawddddd that's a large, sparse dataset.

yeah that's why I'm hoping for a more dynamic approach

#

I think TextVectorization gives a more dynamic approach but I couldn't wrap my head around it

arctic crown Dec 20, 2021, 10:39 PM

#

what does Linear Regression mean?

delicate sphinx Dec 20, 2021, 10:39 PM

#

I know there's a way of doing it but I've not found a guide or documentation on it

stone marlin Dec 20, 2021, 10:40 PM

#

I'm not excellent with NLP, so I'm sure someone else can guide you, Tenten. :'[

delicate sphinx Dec 20, 2021, 10:40 PM

#

Yeah I've been asking for days sadly 😦

stone marlin Dec 20, 2021, 10:40 PM

#

Ah, got it. Yeah, I have a weird feeling either GCP or Azure is going to pick up Amazon's pieces, but we'll see if they do.

stone marlin Dec 20, 2021, 10:41 PM

#

arctic crown what does Linear Regression mean?

It's making a line-of-best-fit. So, you have a lot of data points on an xy plot (for example) and you draw a line that's kind of in the middle that "fits the data" the best.

delicate sphinx Dec 20, 2021, 10:41 PM

#

But it's fine, I find a lot of AI based subjects don't have too many people talking about it (I think a lot of people are just really good and don't need help or it's more of a test and try method)

#

yeah

#

I'm not saying anyone's mean, I'm just saying 1) finding someone for your needs and 2) them seeing your messages are a hard-to-get combo

#

this server is lovely so every day I put the same question in xD

stone marlin Dec 20, 2021, 10:43 PM

#

Sorry, this is done in MSPaint because I'm on my laptop. For linear regression (in 2d) you have these little datapoints. For you, it might be the x-axis being day number, and the y axis being time to wake up. Something like that. Anyhow, these are the blue circles (they should be the same size, but I'm terrible at mspaint). Linear Regression allows you to draw this red line which "approximates" where the dots are kind of headed.
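
To make the red-line idea concrete, here is a rough sketch of a one-feature least-squares fit in plain Python. The wake-up data is invented for illustration, and `fit_line` is a hypothetical helper, not a library function. scikit-learn's `LinearRegression` fits the same kind of line, but its `fit`/`predict` expect a 2-D feature array, so a single list of day numbers would need reshaping into one column first.

```python
# Hypothetical data: day number (x) and wake-up time in minutes after midnight (y)
days = [1, 2, 3, 4, 5]
wake_minutes = [420, 425, 418, 422, 420]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(days, wake_minutes)

# Extend the line one step to guess the alarm time for day 6
predicted = slope * 6 + intercept
hours, minutes = divmod(round(predicted), 60)
print(f"Predicted alarm for day 6: {hours:02d}:{minutes:02d}")
```

If the wake-up time barely changes from day to day, the slope comes out near zero and the prediction is close to the plain average, which is the "taking the mean in a convoluted way" point made earlier.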

#

Tenten, can you give me a little more info on what you're doing? You're taking a corpus and some kind of bi/tri/whatever-grams stuff and putting that through a NN? And at the end, you want your activation function to kind of classify something so you need an integer?

delicate sphinx Dec 20, 2021, 10:44 PM

#

If you've heard of Visual Question Answering, that's what

stone marlin Dec 20, 2021, 10:44 PM

#

I'm mostly guessing here, but that's the kind'a thing I'd like to know about so I can try to help a bit better. Haha.

delicate sphinx Dec 20, 2021, 10:44 PM

#

I take an image and a question, and I try to output an answer

#

I'm using tensorflow due to its extremely powerful Keras API (and its own functions) as well as its documentation

#

but I don't see many examples that allow me to classify text in the capacity I need it - namely from:

String --> Integer tokens (preprocess/tokenize) --> float values (during model training/predicting) --> (output) integer tokens --> string

stone marlin Dec 20, 2021, 10:46 PM

#

Can you give me an example using a real string (fake values, obv). I'm not sure what you mean by integer tokens.

delicate sphinx Dec 20, 2021, 10:47 PM

#

so basically a question can be up to length 32:

"What are you doing?"

Would be standardized to lose punctuation and make it all lowercase

#

to become: "what are you doing"

#

then I use a tensorflow tokenizer that splits it at each whitespace

#

["what", "are", "you", "doing"]

#

then my tokenizer is fitted on my examples to give integer token values to my text

#

what --> 581, are --> 20, ....

[581, 20, 14, 3414]
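
The string-to-integer half of that pipeline can be sketched in plain Python. This is only an illustration of the steps described above, not the TensorFlow tokenizer itself: `word_to_id` is a hypothetical hand-written vocabulary using the ids from the example, and `OOV_ID` is an assumed id for unknown words.

```python
import string

def standardize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

# Hypothetical vocabulary; a fitted tokenizer would build this from the training text
word_to_id = {"what": 581, "are": 20, "you": 14, "doing": 3414}
OOV_ID = 1  # assumed id for out-of-vocabulary words

def encode(text):
    # Split on whitespace, then look each word up in the vocabulary
    words = standardize(text).split()
    return [word_to_id.get(w, OOV_ID) for w in words]

print(encode("What are you doing?"))
```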

#

Then it's put into my model through an embedding layer and dense layers, activations layers blah blah blah until it reaches the output where it's of the type:

[3.43456e-03, 3.90534e-04, ....]

#

That's 16 long
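
One thing worth spelling out here: softmax values are probabilities over classes, not scaled-down token ids, so they are not cast back to integers. The usual move is to take the index of the largest probability (argmax) and look that index up in a reverse vocabulary. A single vector of 16 floats can therefore only pick one of 16 classes. To pick words out of a ~28,000-word vocabulary, the output would need one probability per vocabulary entry (for each answer position). A minimal sketch under that assumption, with a made-up five-word vocabulary and made-up probabilities:

```python
# Hypothetical tiny vocabulary (id -> word); id 0 is assumed to be padding
id_to_word = {0: "", 1: "[UNK]", 2: "a", 3: "red", 4: "ball"}

# Hypothetical model output: one softmax distribution over the vocabulary
# for each answer position (4 positions x 5 vocabulary entries)
probs = [
    [0.01, 0.02, 0.90, 0.04, 0.03],
    [0.05, 0.05, 0.10, 0.70, 0.10],
    [0.02, 0.03, 0.05, 0.10, 0.80],
    [0.96, 0.01, 0.01, 0.01, 0.01],
]

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

# Most likely token id at each position, then ids back to words (skipping padding)
token_ids = [argmax(row) for row in probs]
answer = " ".join(id_to_word[i] for i in token_ids if i != 0)
print(token_ids, answer)
```

In TensorFlow the same step on a tensor would be `tf.argmax(probs, axis=-1)`. Whether the model's final layer actually has this shape is an assumption that would need checking against the real architecture.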

#

My tokenizer saved in txt file form is 11 KB lol

#

whoops wrong one

#

that's my tensorflow model

#

my tokenizer is 2.7MB