#data-science-and-ml

subtle breach
#

region column = northwest etc

#

4 plots

hasty nimbus
#

you have missed northeast?

subtle breach
#

thats what i got with your code

#

nevermind..

#

thx anyway

hasty nimbus
#

is this not what you wanted ?

subtle breach
#

sorry no, thought that'd be obvious. it's ok, i'll figure it out

hasty nimbus
#

let's hope you get help from someone here (I just came in to clear one of my doubts, then saw yours on the way and thought I might be able to help in some way)

desert oar
subtle breach
#

Yes initially

#

Needs to filter for 4 diff regions

desert oar
#

so you already have code that groups the data into four regions and draws a histogram for each region

#

perhaps you can spend some time understanding that code, in order to figure out how to modify it appropriately?

#

alternatively, you could learn the more idiomatic way to do this with Seaborn, which uses a paradigm called "the grammar of graphics"

#

under that paradigm, this grid of subplots is called "faceting"

#

each subplot is a "facet"

#

and usually you create one facet per sub-group in the data, which appears to be precisely what you are looking for

#

so here is a demonstration of using line plots

#

Note: the row and col parameters control faceting

#

and I suspect you also want to read about how to use col_wrap and col for "wrapped" facet plots

#

so it's a good place to learn about what those individual parameters do

#

i'm surprised that there isn't a nice document explaining wrapped facet plots for seaborn, at least i couldn't find one

#

floating point numbers are not "real numbers" in the mathematical sense

#

it's unrealistic to expect them to sum exactly to 1

hasty nimbus
#

If I had used float64, it would give 1

desert oar
#

changing the floating-point number representation changes how precisely the numbers are stored, which changes the way errors are propagated through calculations

#

64-bit floats carry roughly 15-16 significant decimal digits, versus about 7 for 32-bit floats

#

so maybe with 32 bits you get enough error that it shows when you print the numbers

#

i think at some point every practitioner of data analysis will be forced to learn about floating-point arithmetic and numerical stability

hasty nimbus
#

is there any way to calculate the probability mass function (pmf) column wise?

desert oar
#

i don't quite understand the question, is that a joint probability table? and you are trying to compute marginal or conditional probabilities?

hasty nimbus
#

I would like to get an array, where the column sum could be 1..

#

or the probability sum of elements column wise would be 1, when the datatypes are of numpy float32 values..

desert oar
#

if you are just trying to check the correctness of your code, don't be concerned by floating point errors on the order of 1e-6 or whatever they were
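A minimal sketch of the column-wise normalization being discussed, assuming the input is a non-negative 2-D NumPy array. The helper name `column_pmf` and the random `counts` data are made up for illustration. It checks the column sums against 1 with a tolerance instead of `==`:

```python
import numpy as np

def column_pmf(counts, dtype=np.float32):
    """Scale each column so that its entries sum to (approximately) 1."""
    counts = np.asarray(counts, dtype=dtype)
    return counts / counts.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
counts = rng.random((1000, 3))            # hypothetical non-negative data

pmf32 = column_pmf(counts, np.float32)
pmf64 = column_pmf(counts, np.float64)

# The sums are close to 1 but are not guaranteed to be exactly 1
print(pmf32.sum(axis=0))
print(pmf64.sum(axis=0))

# Compare with a tolerance that matches the precision in use
ok32 = np.allclose(pmf32.sum(axis=0), 1.0, rtol=0, atol=1e-4)
ok64 = np.allclose(pmf64.sum(axis=0), 1.0, rtol=0, atol=1e-12)
print(ok32, ok64)
```

The tolerances here are rough choices for this array size, not universal constants; larger arrays or longer chains of operations may need looser ones.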

#

that said, why do you need 32 bit specifically?

hasty nimbus
#

i am trying to calculate a loss function..

desert oar
#

unless you have a very specific need to do otherwise, just use NumPy's default float type, np.float64

#

ok, and like i said: floating-point arithmetic is never 100% accurate because most floating point numbers are not exact

#

so expect small errors

#

in some situations, it might be a problem if those errors start to accumulate throughout the sequence of computations

loud cave
#

is there a name for the loss? maybe a reference would help us understand

desert oar
#

in your case, it sounds like you are just concerned that your implementation is wrong because the numbers don't add up exactly to 1

#

i am telling you that your implementation is probably fine because those errors look like what you expect from accumulated floating-point errors, rather than the algorithm being wrong

hasty nimbus
desert oar
#

if you are still concerned and not convinced, i recommend you spend some time reading about floating point numbers and floating point arithmetic

#

this is just the reality of using computers, at some point you have to deal with the fact that they are physical computers and not idealized computing machines

hasty nimbus
pastel valley
#

yo what are those kernel options and verbosity for svm?

subtle breach
#

Thanks Salt. I look it all up. Exhausted! 👍

pastel valley
#

are svm kernels like optimizers in neural networks?

stuck gull
#

Hey guys, anyone here familiar with pandas? I need help making a subset of a very large CSV file into another file to use.

rigid zodiac
#

what do you mean making a subset??

glossy bobcat
#

Hi, we are making a deep learning translator-like program with pytorch and would really appreciate some help with normalizing the data and transforming words (different values derived from words) into tensors

lapis sequoia
#

look into lemmatization, stemming, and algorithms like word2vec

glossy bobcat
#

thanks 🙂

olive patio
#

Hey guys, I have a question related to normalization in grayscale images. https://stackoverflow.com/questions/70371050/finding-the-mean-and-std-of-pixel-values-for-grayscale-images-in-pytorch
Would appreciate any help

agile monolith
#
x = np.linspace(0 , (2 * np.pi), 200)
def h2d(x):
    fig = plt.figure()
    n=np.arange(1, 12,2)
    print(n, 'n')
    xx,nn = np.meshgrid((x),(n))
    plt.plot(xx,nn)
    
    
    happrox = (1*(4/(np.pi)) * ((np.sin(nn*(xx))) / nn))
    happrox = np.cumsum(happrox,1)
    return(happrox)

happrox = h2d(x)
print(happrox) 
#

This is my code but it gives a wack result for cumsum, is it coz i have imaginary numbers after the sin operation?

tidal bough
#

why'd you have imaginary numbers here?

agile monolith
tidal bough
#

I don't see anything here that'd result in imaginary results

agile monolith
#

so why is the output wrong?

tidal bough
#

Why do you think that it's wrong?

agile monolith
#

because when i get it to check the code, jupyter notebook says the desired output is


y: array([[ 0.000000e+00, 8.033524e-02, 1.602705e-01, 2.394092e-01, 3.173611e-01, 3.937456e-01,
 4.681952e-01, 5.403578e-01, 6.099000e-01, 6.765098e-01, 7.398989e-01, 7.998050e-01,...
agile monolith
tidal bough
#

If I'm reading it right, even the shape isn't right.

agile monolith
arctic wedgeBOT
#

Hey @agile monolith!

It looks like you tried to attach file type(s) that we do not allow (.docx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

agile monolith
#

doc isnt allowed damn

#

this is the task so it doesnt make sense for the shape to be (4,200)

#

@tidal bough im not trippin right?

tidal bough
#

Yeah, it's somewhat weird that the shape isn't (6,200)
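A hedged sketch of one likely fix, assuming the exercise wants the partial sums of the square-wave Fourier series (an assumption, since the task text is not shown). In the code above, `np.cumsum(happrox, 1)` accumulates along the x axis; accumulating over the harmonics means using axis 0. The function name `square_wave_partial_sums` is made up, and this does not explain the (4, 200) shape the checker expects, which depends on how many harmonics the task asks for:

```python
import numpy as np

def square_wave_partial_sums(x, n_max=11):
    """Partial sums of (4/pi) * sum_k sin(k x) / k over odd k <= n_max.

    Returns an array of shape (number of odd harmonics, len(x)); row j holds
    the sum of the first j + 1 harmonics.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(1, n_max + 1, 2)            # odd harmonics 1, 3, 5, ...
    xx, nn = np.meshgrid(x, n)                # both have shape (len(n), len(x))
    terms = (4 / np.pi) * np.sin(nn * xx) / nn
    return np.cumsum(terms, axis=0)           # accumulate over harmonics, not over x

x = np.linspace(0, 2 * np.pi, 200)
happrox = square_wave_partial_sums(x)
print(happrox.shape)
```

With `n_max=11` there are six odd harmonics, so the result has six rows; a different `n_max` changes the row count accordingly.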

agile monolith
slow vigil
#

pretty noob question but does anyone know a good way to only view the first few lines of a massive json file in the terminal?

#

I'm getting these api responses that are so big my terminal won't print the whole thing, but it only prints the end of the file up, so I can't see the data structure

#

I'm writing them to a file currently, but I was wondering if there's an easy way to see the beginning in the terminal

tidal bough
#

you can just .splitlines the JSON string and take the first few lines
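A small sketch of the `.splitlines` suggestion, assuming the API response has already been parsed into a Python object. The `response` dictionary and the helper name `json_head` are made up for illustration. Pretty-printing first matters, because a compact JSON response is often a single very long line:

```python
import json

def json_head(obj, n_lines=20):
    """Return the first n_lines of a pretty-printed JSON document."""
    text = json.dumps(obj, indent=2)
    return "\n".join(text.splitlines()[:n_lines])

# Hypothetical stand-in for a large API response
response = {"results": [{"id": i, "tags": ["a", "b"]} for i in range(1000)]}

print(json_head(response, n_lines=8))
```

If the response is already saved to a file with line breaks, `head -n 20 file.json` in the terminal gives a similar view.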

lost zodiac
#

is there a place where i can practice Data Science based Python questions?
ive checked Leetcode but its not there 😦

agile monolith
#

lowkey very happy

#

no idea why or how it works

wicked grove
#
```
<ipython-input-82-182241fd5e53> in <module>
      6                "stride": 2}
      7 
----> 8 Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
      9 print("Z's mean =\n", np.mean(Z))
     10 print("Z[0,2,1] =\n", Z[0, 2, 1])

<ipython-input-81-f5cd533d7e29> in conv_forward(A_prev, W, b, hparameters)
     86                     weights = W[:,:,:,c]
     87                     biases = b[:,:,:,c]
---> 88                     Z[i,h,w,c] = conv_single_step(a_slice_prev,weights,biases)
     89 
     90 

IndexError: index 4 is out of bounds for axis 3 with size 4
```

i cant understand where i am going wrong

hearty token
#

I see that there's a slight improvement when training a contextual chatbot with deep learning on data that fits these criteria:

  1. Equal distribution of interrogative words
    i.e. if one tag contains how do i water the plants?
    Adding a question starting with how on another tag to balance it makes it more intelligent
  2. Equal distribution of patterns (i.e. 10 patterns each tag)

I have some idea why this is the case but could anyone explain to me why exactly this is the case? (or if it isn't and it's a matter of chance)

rapid pelican
inner pebble
#

Thanks for answering. I'm trying this

inner pebble
#

It works just fine @polar acorn. I think I'd lost myself in my own code, as it seems easy and logical today.
Thanks for it.

#

Guys, I have another question.
I think I'm gonna use streamlit to create dashboard apps for my company.
My question (and I've asked it in game dev as well because I think the stakes are the same) is how can I share the built apps with my colleagues.

I can either build the app on the common file server and install a python env in that folder so everyone can launch the app from a shared folder, but then only one user can use it at a time. Another problem is that I need to create a file that launches the python env before the app file.
Sounds like a MacGyver solution.

Or I create an app just like any software and I can share the app to anyone locally. How can I do that?
Should I use docker to create an image?

odd meteor
inner pebble
odd meteor
# inner pebble I'm checking this out thanks for it

Also, I think HuggingFace has a feature called Spaces, you can host what you've built with Gradio on Spaces and multiple users can access it at the same time.

Well, since HuggingFace just acquired Gradio yesterday, I believe it'll definitely blossom into something more beautiful.

inner pebble
#

ah yeah, something I didn't mention is that, as it's working with company data, I want it to stay local or on the intranet
Is HuggingFace's Spaces a cloud hosting service?

odd meteor
inner pebble
#

ok I'm gonna check this out as well, thanks @odd meteor

sleek tapir
#

is this course good

wicked grove
#

hello i am really new to tensorflow and i keep getting these errors

#
```
AttributeError                            Traceback (most recent call last)
<ipython-input-9-797be23c9feb> in <module>
      3 loss = tf.Variable((y-y_hat)**2,name='loss')
      4 #init = tf.initialize_all_variables
----> 5 with tf.Session() as session:
      6     #session.run(init)
      7     print(session.run(loss))

AttributeError: module 'tensorflow' has no attribute 'Session'
```
wicked grove
odd meteor
# wicked grove sorry to ping you, could you please tell me why i am getting the above error
wicked grove
#

Ohhh okayy,thank you! So ill just use tf.print()?

odd meteor
modest mulch
#

Anyone knows of deep learning models used for time series classification that I could read about?

mighty spoke
#

Hi I'm getting this error but not sure how to fix it:
File "C:\Users\haris\Documents\Bsc stock project\centroid lags.py", line 68, in xvals
return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),

IndexError: index 0 is out of bounds for axis 0 with size 0

import pandas as pd  # import pandas package to read data more easily
import matplotlib.pyplot as plt  # imported pyplot to plot graphs
import datetime as dt  # date time to read first column of csv file
import numpy as np
from datetime import datetime
CL=[]
for i in range(100):
    df = pd.read_csv('TSLA.csv')
    df2 = pd.read_csv('NBM.V.csv')
    df0=pd.read_csv('file1.csv')
    df5=pd.read_csv('file2.csv')
    data1=df0
    data2=df5
    data1['Date'] = pd.to_datetime(data1['Date'])
    data2['Date'] = pd.to_datetime(data2['Date'])
    x1=(data1['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400
    x2=(data2['Date'] - dt.datetime(1970,1,1)).dt.total_seconds()/86400
    y1=data1['Close']
    y2=data2['Close']
    t0=[]
    d0=[]

#
```python
    y2_mean = np.mean(y2)
    y1_stdv = np.std(y1)
    y2_stdv = np.std(y2)
    for i in range(len(data1)):
        for j in range(len(data2)):
            t=x2[j]-x1[i]
            t0.append(t)
            d = (y1[i]- y1_mean)*(y2[j] - y2_mean)/(y1_stdv*y2_stdv)
            d0.append(d)
           # return udcf
        #data=udcf(data1,data2)
    x, y = zip(*sorted(zip(t0, d0)))#ensures x and y values correspond to each others in pairs when sorted
    df4 = pd.DataFrame({'X' : x, 'Y' : y})  #we build a dataframe from the data
    #bins = create_bins(lower_bound=-6,width=3,quantity=30)
    bins=np.arange(min(x), max(x)+0.01, step=4.3)
    #bins2 = pd.IntervalIndex.from_tuples(bins, closed="left")
    categorical_object = pd.cut(x, bins)
    count=pd.value_counts(categorical_object)
    grp = df4.groupby(by = categorical_object)        #we group the data by the cut
    ret = grp.aggregate(np.mean)
    data2_new=df2.sample(frac = 0.7)
    data1_new=df.sample(frac = 0.7)
    dict = pd.DataFrame({'Date':data1_new['Date'],'Close': data1_new['Close']})
    kd = pd.DataFrame(dict)
    kd.to_csv('file2.csv', index=False)
    dict2 = pd.DataFrame({'Date':data2_new['Date'],'Close': data2_new['Close']})
    kd = pd.DataFrame(dict2)
    kd.to_csv('file3.csv', index=False) 
    x1,y1=zip(*sorted(zip(ret.X,ret.Y)))
    def lin_interp(x, y, i, percent_y):
        return x[i] + (x[i+1] - x[i]) * ((percent_y - y[i]) / (y[i+1] - y[i]))
    def xvals(x, y):
        percent_y = (max(y)*0.8)
        signs = np.sign(np.add(y, -percent_y))
        zero_crossings = (signs[0:-2] != signs[1:-1])
        zero_crossings_i = np.where(zero_crossings)[0]
        return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),
                lin_interp(x1, y1, zero_crossings_i[1], percent_y)]
    hmx = xvals(x1,y1)
    centroid=np.mean(hmx)
    CL.append(np.mean(centroid))
```
mighty spoke
# agile monolith Axis 0 size 0? Wth

yhh it says this also: return [lin_interp(x1, y1, zero_crossings_i[0], percent_y),lin_interp(x1, y1, zero_crossings_i[1], percent_y)]

IndexError: index 1 is out of bounds for axis 0 with size 1

#

but when I take it all out the for loop it runs without errors

agile monolith
#

what exactly are you trying to do?

mighty spoke
# agile monolith what exactly are you trying to do?

I'm trying to find 80% of the max y value in a particular 70% sample, then I'm trying to find the x coordinates where the line y_value=max(ret.Y)*0.8 crosses the curve on either side of the peak. Then I try finding the midpoint of these two x coordinates and append it to a list, then I will try to bin these values and plot N (number of points in each bin) vs the binned data

#

ret.Y is my y data and ret.X is my x data

agile monolith
#

the issue starts before "and plot N (number of points in each bin) vs the binned data" but after the following: "I'm trying to find 80% of the max y values in a particular 70% sample then I'm trying to find the x coordinates of where the line y_value=max(ret.Y)*0.8 crosses/intersects the 2 points either side of the peak, Then I try finding the midpoint of these two x coordinates and append them to a list"

#

ur samples have any nans?

mighty spoke
#

lemme check one sec

agile monolith
#

one of the operations u do is preventing u i think

#

coz the size doesnt match

#

check the size of all the lists

#

check what they are after the nans are filtered (if u have any)

mighty spoke
#

when I print my sample data frame it doesn't contain any NaN

agile monolith
#

check size

#

of each sample

mighty spoke
#

oh yes
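A hedged sketch of a crossing finder that does not raise when a sample has fewer than two crossings, which is what "index 0 is out of bounds for axis 0 with size 0" and "index 1 ... with size 1" point to: `np.where` found zero or one crossing for that random sample. The function name `crossing_midpoint` and the choice to return `None` are assumptions, not the original code. It also compares every adjacent pair (`[:-1]` against `[1:]`), whereas `signs[0:-2] != signs[1:-1]` skips the last pair:

```python
import numpy as np

def crossing_midpoint(x, y, frac=0.8):
    """Midpoint of the first two x positions where y crosses frac * max(y).

    Returns None when fewer than two crossings exist.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    level = frac * y.max()
    above = y >= level
    # index i is kept when the curve crosses the level between i and i + 1
    idx = np.flatnonzero(above[:-1] != above[1:])
    if idx.size < 2:
        return None

    def interp(i):
        # linear interpolation between the two points around the crossing
        return x[i] + (x[i + 1] - x[i]) * (level - y[i]) / (y[i + 1] - y[i])

    return 0.5 * (interp(idx[0]) + interp(idx[1]))

print(crossing_midpoint([0, 1, 2, 3, 4], [0, 5, 10, 5, 0]))   # symmetric peak
print(crossing_midpoint([0, 1, 2, 3], [0, 1, 2, 3]))          # only one crossing
```

Inside the loop, a `None` result can simply be skipped instead of appended, so one awkward 70% sample does not stop the whole run.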

upbeat prism
#

so how does one pass weight scaling to a loss function in pytorch? I see that BCELoss can take weights when an object is being created, but I go through my data in batches and I want batch-wise weights. do you guys just create a new object for each batch or what?

#

I just pass a "reference" or whatever python calls it to my training function

desert oar
pastel valley
#

ill use polynomial kernel but what parameters i should put?

desert oar
#

what do you mean?

#

a polynomial kernel turns y = wx into y = f(x) where f is a polynomial. the "parameter" is just the order of the polynomial. you probably don't want more than 2 or 3 imo
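For reference, a sketch of the polynomial kernel in the form scikit-learn documents it, K(x, z) = (gamma * <x, z> + coef0) ** degree. Besides the order (`degree`), scikit-learn's `SVC` with `kernel="poly"` also exposes `gamma` and `coef0`. The function below is a plain NumPy illustration with made-up default values, not the library's implementation:

```python
import numpy as np

def polynomial_kernel(X, Z, degree=3, gamma=1.0, coef0=1.0):
    """Gram matrix K[i, j] = (gamma * <X[i], Z[j]> + coef0) ** degree."""
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    return (gamma * (X @ Z.T) + coef0) ** degree

X = np.array([[1.0, 2.0], [0.0, 1.0]])
K = polynomial_kernel(X, X, degree=2)
print(K)
```

Low degrees (2 or 3) keep the kernel values in a manageable range; higher degrees grow quickly unless the features are scaled.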

pastel valley
desert oar
#

for the other kernels, treat them like hyperparameters in any other model, e.g. a neural network

#

you should familiarize yourself with how those kernels actually work (and what a kernel actually is)

pastel valley
#

oh maybe thats why haha
thank you ill try to look into it

subtle breach
#

g = sns.FacetGrid(df2, col="region", height=3.5, aspect=.65)
g.map(sns.kdeplot, "charges")

#

easy!!

#

(just now need to add mean/median*).

desert oar
#

or better yet, the .axvline method

weary stag
#

hey can anyone help me with this. nameError: 'app' is not defined on my 1 flask code

wicked grove
#

online learning with relative preferences has resulted in a new framework for optimizing over sets of alternatives with only relative, subset-wise observations. This is a general framework that is applicable to many automated recommendation systems that can sequentially elicit only relative preferences from, say, human users, e.g., “Do you like X over Y (and Z)?” This study has yielded state-of-the-art learning algorithms that make optimal subset selection decisions in terms of regret and rank-order estimation error, along with new insights on how to efficiently make comparisons to elicit items’ utilities within a wide range of social choice models.

i came across this in a paper, can someone please explain it to me

serene scaffold
#

I would recommend using a code editor that highlights built-in names so that you start remembering which ones they are. It's helpful to know them since they're always available.

bronze skiff
serene scaffold
#

Nothing can minimize my regret.

odd meteor
serene scaffold
warped rapids
#

Hey all! I want to fill the areas in between the lines and put a text in each

#

Any idea on how to do it?

odd meteor
warped rapids
#

I know there's a fill_between function in matplotlib, but have no idea on how to implement it in the current state of my code

#

And it needs to be a different color in each section

warped rapids
#

Anyone any idea?

bronze skiff
#

also post your code if you're unsure of implementation

slow vigil
#

Hey guys I have a pandas dataframe and the index is all labels. I need to use the value of the labels individually in another function and I was wondering how to go about getting them

loud cave
slow vigil
#

I need each index value one at a time
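A minimal sketch of getting the index labels one at a time, using a made-up data frame and a placeholder function `describe` standing in for "another function":

```python
import pandas as pd

# Hypothetical frame whose index holds the labels
df = pd.DataFrame({"value": [10, 20, 30]}, index=["alpha", "beta", "gamma"])

def describe(label):
    """Placeholder for whatever function needs each label."""
    return f"label={label}"

# Iterating over df.index yields one label at a time
results = [describe(label) for label in df.index]
print(results)

# iterrows() gives the label together with its row, if both are needed
for label, row in df.iterrows():
    print(label, row["value"])
```

`df.index.tolist()` also works when a plain Python list of labels is more convenient.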

warped rapids
#

It is

mighty spoke
#

Hi, my code was running like 2 mins ago but now when I run it, it says
File "pandas_libs\parsers.pyx", line 549, in pandas._libs.parsers.TextReader.cinit

EmptyDataError: No columns to parse from file

#

and in another tab using the same files it says the same thing

serene scaffold
serene scaffold
#

but I'm glad it worked, whatever that means

mighty spoke
#

yhh it was reading data from a file with no data

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @dense atlas until <t:1639779871:f> (9 minutes and 58 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

hybrid mica
#

How many types of machine learning are there?

odd meteor
delicate sphinx
#

TENSORFLOW
So this is the output from my model, I have a tokenizer which can map integers to and from words, but as this is my output, I can't use it, is there a way to return this to integer form? I looked into TextVectorisation but my model uses an image and a question input to get an answer (in text) output. So I'm not sure if I can use it

#

Can anyone give me some tips on how to convert this into integers so I can tokenize it back into english?

#

I imagine I need some sort of TextVectorisation layer as output from my merged model?

hollow sentinel
#

i am so beyond confused

#

by that bottom arrow

#

x should be 2, y is 2, and z is 0

#

but i don't see how exactly y is two?

#

oh

#

i'm an idiot

#

lol

#

now it makes sense

hollow sentinel
#

it's 2 diagonally

arctic crown
#

in python tensorflow what does .loc() do?

wicked grove
serene scaffold
steel berry
steel berry
granite furnace
#

super noob here. I have this pandas dataframe, and i would like to compare the second two columns to the first, outputting true if their signs match, and false otherwise.

#
```python
_df['randforest_result'] = np.where((_df['randforest'] >= 0 and _df['y_test'] >= 0) or (_df['randforest'] < 0 and _df['y_test'] < 0), True, False)
```

I tried something like this but it's giving me issues about the truthiness of series/dataframes
shrewd saddle
#

This probably is not the right place to ask this, but anyway, I am trying to do some elementary satellite image analysis with rasterio and earthpy. The NDVI is given as the normalized difference between the near infra-red and red bands. Water should appear yellow or red (negative values) and vegetation should appear green (positive value) in the NDVI result. But I am getting kinda the opposite

#

The green part is supposed to be water, and yellowish part is land and vegetation

#

this is the code:

ndvi = es.normalized_diff(stacked[4], stacked[3])

ep.plot_bands(ndvi, cmap="RdYlGn", cols=1, vmin=-1, vmax=1, figsize=(10, 14))

plt.show()
#

stacked[4] is NIR and stacked[3] is the red band
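As a sanity check on the sign convention, here is the NDVI formula, (NIR - red) / (NIR + red), written out in NumPy with invented reflectance values. It only shows which way the sign goes; whether indices 4 and 3 really hold NIR and red depends on how `stacked` was built, which is not shown here:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference (nir - red) / (nir + red); NaN where the sum is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Invented reflectances: vegetation reflects strongly in NIR, water absorbs it
vegetation = ndvi([0.50], [0.10])
water = ndvi([0.02], [0.05])
print(vegetation, water)
```

With `cmap="RdYlGn"`, low values map to red and high values to green, so if water shows up green it is worth checking whether the two bands are swapped in the stack.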

hasty kiln
#
import numpy as np
a = np.array([
          [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
          [10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]])

arr = np.array_split(a, 3, axis=1)

print(arr)              
[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]

Why is it split by column and not by row, even though I passed axis=1? I thought 1 meant splitting on the basis of the row.
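A short sketch of the two axes side by side. In NumPy, `axis=0` runs down the rows and `axis=1` runs across the columns, so `array_split(a, 3, axis=1)` cuts the three columns apart, while `axis=0` cuts the six rows into three groups:

```python
import numpy as np

a = np.arange(1, 19).reshape(6, 3)   # same values as the array in the question

by_cols = np.array_split(a, 3, axis=1)   # split along the column axis
by_rows = np.array_split(a, 3, axis=0)   # split along the row axis

print([part.shape for part in by_cols])  # [(6, 1), (6, 1), (6, 1)]
print([part.shape for part in by_rows])  # [(2, 3), (2, 3), (2, 3)]
print(by_rows[0])
```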

old grove
#

if the difference is 0.5 between mean and median... can we use mean... eg i am analyzing avg no of likes on Instagram

bold timber
#

from this code I got an error like this: TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

how to fix that?

timber flame
#

Bros, I'm a machine learning engineer with 1.5 years of experience, plus devops and automation experience. I know python and am learning Go too, know SQL well and bash too. I had a rough week at my org and want to prepare asap for a couple of months and get a new job. Can someone tell me what specific sites, topics and courses I should start doing to get a new job. For Python btw. If u know what ML topics I should learn that would be cool.

tall loom
#

What are the properties of a stop word without considering its semantics? I want to determine whether a word is a stop word for any random language, one I can think of is that it occurs in a high frequency within a text.

#

The inverse document frequency would be high too, but I don't want to work with a group of documents. Rather, for a single document.

#

For a multi-document approach, I can perhaps create a context vector and train a naive bayes classifier to determine if a word is stop word or not.

#

However, I need a lightweight way to just determine with some accuracy on a single document.

odd meteor
odd meteor
pastel valley
#

yo when applying data augmentation, is there a boundary, i.e. an optimal amount of augmented data that gives the best result, and a point where a dataset is composed of so much augmented data that it gives the model bad observations?

odd meteor
# tall loom The inverse document frequency would be high too, but I don't want to work with...

Is there any reason why you'd wanna do this from scratch when there's already an easier option to achieve your task?

Just as we have English stopwords, there are stopwords in various other languages too. Libraries like spaCy, TfidfVectorizer etc. already have a parameter that can handle this.

If you're working on a French text, just indicate you'd wanna remove stopwords in French with the appropriate parameter.

tall loom
tall loom
timber flame
#

Plus my main work was data science related with a niche but I want to work in anything that is being offered

lethal lark
#

I am doing a kaggle compeitition, getting low f1 score (its stuck around 0.5) but high accuracy any suggestions on what might be causing this ? my dataset is balanced and has two classes

odd meteor
# tall loom Yes, that is true, but as I said, "without considering semantics", the high-freq...

Aside from the high frequency, I doubt there's a way to easily get the kind of accuracy you'd love to see on the project if you're not familiar with the 'new' language, or at least a native speaker of that 'new' language you wanna work with.

That's why I kinda feel using SpaCy or TfidfVectorizer or CountVectorizer's stopwords parameter is best.

It's an interesting project regardless 😊

tall loom
#

If such a dictionary does not exist in them, I don't think they will be able to work for a new low-resource language.

odd meteor
tall loom
#

However, certain features which I can think of is, properties of NOT STOP WORDS instead of choosing STOP words. One could be the rank of the sentence a word appears in, given a document, I think the top paragraphs convey way more information and as we go down it saturates and then concludes.
Such words in top sentences have a chance to be "keywords" rather than stop words, so words appearing in top sentences get lower weights.

Also frequency of a word within a sentence.

No. of sentences some word occurs in : More it is, the higher chances to be a stop word

Bigram frequency

Unique surrounding trigrams, if a word is surrounded by more unique words in a window of size 1, the greater chance it has to be a stop word.

And of course the frequency.

Combining these 6 features, I can train a classifier I guess
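A rough sketch of how a few of the features listed above could be computed for a single document with only the standard library. The sentence splitting and tokenization are deliberately naive (a regex split on `.`, `!`, `?`), and the feature names are made up; this illustrates the idea and is not a tested stop-word detector:

```python
import re
from collections import Counter

def term_features(text):
    """Per-word frequency features computed from a single document."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    tokenized = [re.findall(r"\w+", s) for s in sentences]

    term_freq = Counter(tok for sent in tokenized for tok in sent)
    sent_freq = Counter(tok for sent in tokenized for tok in set(sent))

    # Distinct neighbours within a window of size 1 on either side
    neighbours = {}
    for sent in tokenized:
        for i, tok in enumerate(sent):
            ctx = neighbours.setdefault(tok, set())
            if i > 0:
                ctx.add(sent[i - 1])
            if i + 1 < len(sent):
                ctx.add(sent[i + 1])

    n_sent = len(tokenized)
    return {
        tok: {
            "tf": term_freq[tok],                       # raw frequency
            "sentence_ratio": sent_freq[tok] / n_sent,  # share of sentences containing it
            "unique_neighbours": len(neighbours[tok]),  # variety of surrounding words
        }
        for tok in term_freq
    }

doc = "The cat sat on the mat. The dog ate the bone. A bird sang."
features = term_features(doc)
print(features["the"])
print(features["cat"])
```

These per-word dictionaries could then be turned into rows of a feature matrix for whatever classifier is chosen.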

odd meteor
tall loom
odd meteor
#

To my understanding...

We all understand English, so that's why we can spot a stopword in English off the top of our heads without sweating it.

It'll be difficult to spot a stopword in Igbo Language if you don't at least understand the language to a reasonable level. Languages differ and can be complex at times... so I'm just thinking hmmm 🤔

Well, what do I know 🤷🏾‍♂️

tall loom
tall loom
odd meteor
# tall loom Don't you think these features will be unique across any ...

I don't believe it will be tbh. Language can be complex and tend to differ in a lot of ways. Definitely, the only intersection I'd say all language would share when it comes to stopwords is their frequency of occurrence. And we need more than frequency of occurrence to get a reasonable result.

I don't believe those guys that worked on spaCy were able to come up with stopwords for several languages by just looking at frequency of occurrence in a text...

😂😂😂 To me, I believe it starts with grouping people who are native speakers of each language together to manually annotate, label or identify stopwords in that language

#

I might be wrong tho.... But that's what my brain is telling me they did

tall loom
# odd meteor I don't believe it will be tbh. Language can be complex and tend to differ in a ...

Yes, language can differ in a lot of ways, but the writing style for an article is consistent, as the frequency.

Which feature do you think from them will not be consistent across multiple languages? Because they are all term features, just like frequency (which you said will be consistent) , what differs them to not make them consistent? I am asking to gain more insights and better the features.

pastel valley
#

yo can i generate more than 1000 augmented image per image
lets say i have 10 samples and i genereted 1000 augmented images per sample so my total data set is 10010?

#

or based on the parameters or techniques that will be applied to the image there will be a limit on how much augmented images i can generate?

#

i am talking about the imagedatagen by keras

#

or if there are other library for image augmentation?

tall loom
#

I saw, they just predefined a list of stop words for all languages, from different contributors.

odd meteor
odd meteor
tall loom
odd meteor
timber flame
#

Never once said "certificates"

#

Coding round isn't remotely related to deep learning

tall loom
# pastel valley yo can i generate more than 1000 augmented image per image lets say i have 10 sa...

Not sure about the implementation by imagedatagen, but by definition you can augment an image an infinite number of times!
However, the important aspect will be to check the redundancy among the augmented images, what parameters are u setting for augmentation depending upon the objective. because you can have 10 augmented images, and they are not providing sufficient NEW information for the neural network to learn anything significant.

pastel valley
tall loom
# pastel valley do you think it can be a thesis topic? determine the optimal volume of augmented...

I think doing it from the redundancy and objective aspect with it will add more value, that is
Creating 10 augmented images for an "auto brightness" task will have high redundancy if the augmented images only have varying level of pixel brightness, but could be still optimal given the task.
However, for an object detection task, 10 of such images would be not optimal, in general (depends on the task again)

odd meteor
# tall loom I am only wondering as you said those features will not be consistent, but the f...

What I meant was that...

Languages can be fluid and dynamic, and as such, each language will be governed by different set of rules.

So I believe that features identified in language A will be consistent for language A but might not necessarily be in language B and language C. That kinda explains why spaCy had to use different contributors who supposedly are native speakers of the language they were assigned to work on.

pastel valley
#

do you think the result will be useful for others? or the topic itself is subjective on the task of what classification task is being conducted?

tall loom
# odd meteor What I meant was that... Languages can be fluid and dynamic, and as such, each...

spaCy didn't use an unsupervised approach for stop words, that's why they used available stop word data and for the languages they dont have data, there is no rigorous implementation yet.

I am only quoting this line which you said:

Definitely, the only intersection I'd say all language would share when it comes to stopwords is their frequency of occurrence.

The other features I have defined are almost from a frequency perspective only, so I believe they have to be consistent. But I was only discussing which feature among the ones I have said won't be consistent.

tall loom
pastel valley
wicked grove
tall loom
# pastel valley do you think the result will be useful for others? or the topic itself is subjec...

What I meant is, the task of finding an optimal number of augmentation depends on the objective too!
" determine the optimal volume of augmented data for cnn models" + "Given a task"

If your task is fish classification, then it becomes
"determine the optimal volume of augmented data for cnn models if we are doing a fish classification task"

It will depend on the data as well.

A better topic will be
"Given any random task and data, finding an optimal number of augmented images for a cnn model", which will also be an interesting contribution

tall loom
odd meteor
# timber flame Coding round isn't remotely related to deep learning

My response was with the assumption that you're interested in getting into another ML position. However, If you're more interested in coding round, then I'd suggest data structures and algorithm related tests.

Personally, the only coding test I've done during one of my ML job interview stages was a 2 hours Kaggle problem. 😂 About 11 of us were given a real life company data and asked to build a model and make prediction... F1 score was the metric used to rank submission. Top 4 submissions proceeded to the next level of the interview stage...

Again, it depends on your country as well. Other experienced guys here might be able to add one or two....

stone marlin
#

I'm in the US and have done ML-engineering + DS + DE roles --- it's a wildly different process depending on the company and their needs.

wicked grove
stone marlin
#

So, depends on what you're into 0n3. You've done MLE for 1.5yrs, you prob know a direction you wanna move into. From there, perhaps look at some Indeed / BuiltIn postings to see what technologies and so forth some desired companies use.

#

I've never been asked to do anything related to deep learning w/rt my work/interviews but I'm also not in an industry which uses deep learning frequently, so YMMV.

pastel valley
stone marlin
#

Regardless of where you move into, if you haven't checked out Advent of Code (which is happening now!) check it out. Solving those problems will teach you a ton about software engineering and the like, and that's always a plus for applicants (at least in the fields I work). There's also a bunch'a channels here for it.

pastel valley
#

website?

#

or channel?

stone marlin
#

Oh, also, I meant this for 0n3, but it is useful for anyone.

pastel valley
#

@tall loom yo sir i found a paper that classifies fish using cnn also but they did not apply augmentation on their dataset so if i redo their study and apply augmentation would it be a good contribution?
sorry i just dont know how to classify what is a good or nah topic because there i see more on solving low level problems like about the theories which is so hard for me and i see this classification tasks as topic and i dont know why our topic got revised and demanded a modification or alternative to data augmentation or find an issue on a cnn model that we can address which is soo much for me hahaha

pastel valley
#

its like school festival but for programmers?

stone marlin
#

It's a series of programming challenges, yes.

pastel valley
#

@tall loom yo sir i go back about the image augmentation stuff
what if what i experiment is the parameters?
each configuration will have each parameters and tested
1 zoom, flip
2 color manipulation
3 patch erasing cropping
the best resulting parameters for image augmentation can be my contribution for like
if there are anyone interested in training a model on classifying fish and they have those fish images they can use the result of the experiment on which augmentation techniques they can apply on their model yeah? or nah?

#

they recommended me to think of an alternative way or better way for image augmentation for cnn models but i think everything is already there i cant figure out new technique

tall loom
# pastel valley @tall loom yo sir i found a paper that classifies fish using cnn al...

Here is how it will be:

  1. You are getting better results by using image augmentation, on the methods which THEY have used.
  2. There already exists other models which can classify fish, but it is not something that THEY have used, and perhaps they can generate better results without any augmentation, but for them baselines might already exist!

However, it can still get accepted, I know of a case personally who worked on some classification (can't disclose) on top of image augmentation, however they also used more dataset.
From a contributions perspective, it's not just better results though, it's more about WHAT is used to get better results. Augmentation is one way and it can still be considered as a topic, but if you are looking for novelty then this is not it, but that is just my opinion, as augmentation is standard practice these days for similar tasks.

pastel valley
# tall loom Here is how it will be: 1. You are getting better results by using image augment...

yeah i see other topics like a classification task with cnn also and they got accepted which is weird because we are pretty much the same almost only the samples are different

also the reason they say that i need revision is because i need to have some originality or something like what i see on cnn that i can change or what alternative method i can replace with image augmentation which is what you say a novel topic right? why are they expecting that kind of idea to me hahahaha

#

so i am trying to find something like a comparative analysis type of study which maybe acceptable than creating a whole new theory which is impossible for me

tall loom
# pastel valley @tall loom yo sir i go back about the image augmentation stuff what...

Yes, optimal parameter estimation is a good idea, although that is also task-dependent, and if you are restricting yourself to the task of "fish detection" then many things can be explored.
Although not generalized, it will still be a good analysis for this task and then it can be inferred by some empirical experiment on how it can be used further for similar if not all tasks.

But just testing won't be good enough, if you are doing only an iterative approach to find an optimum point, that will be data-dependent. But this can be a baseline for this data, the motive could be "How the standard practice of such and such hyperparameters for the fish classification task can be improved if we modify this hyperparameter in this way". This you would have to show empirically of course, and then recommend something based on that.

pastel valley
#

btw you are pro sir are you one of those who create papers and attend conferences?

tall loom
# pastel valley i need to present a logical reason on why this configuration is better than that...

Yes, you will start with that logical reason as your hypothesis and then you prove it by showing it from results with that experiment of tuning. You will either have to justify why the previous tasks have NOT explored your configuration OR you will be the first to suggest that configuration, for that task.
And the main theme should be how this idea can be implemented for other similar tasks.

pastel valley
#

for example in this 3 configuration
1 zoom, flip
2 color manipulation
3 patch erasing cropping
i got 1 as the best result that implies a good parameter for data augmentation of fish images

i need to say why is it the winner
like because the 2nd configuration makes the image pixels change colors thus making the model confused because fish colors are important features(for example this is my observation on the model using this configuration)
and another reason for 3rd configuration

#

the reasons would be from the observed result of experimentation

tall loom
# pastel valley for example in this 3 configuration 1 zoom, flip 2 color manipulation 3 patch e...

The other way.
This is like, you did some experiment randomly, analyzed some configurations, and then found this should work well, and then you are making a theory of why it is working better. This works when there is a generalized data or generalized model. But here, it's one specific task on some data!
So you would have to then explain further on why that "image pixel theory" is correct, not just for this experiment, but in general for similar tasks.
(You can try running on other datasets, previous and after results, etc)

#

Or you can also start by proposing why enhancing some parameters should work better and is something that has not been done in previous works, then you give a theory, and then implement it and show that it is correct for a number of datasets. This is how it should work though, as you start with some hypothesis based on some observation and then you prove it.

pastel valley
#

oh for example i have the results then i will try the configuration on example imagenet dataset and check the result with and without using the configuration

pastel valley
#

for example my hypo is the color manipulation is bad configuration and it turns out good dang what do i do hahaha

tall loom
#

This helps others to NOT explore the part or explore in a different way(some other way which you can also suggest in end to explore further as ideas)

pastel valley
#

oh so this what a real science topic is right?

tall loom
# pastel valley for example my hypo is the color manipulation is bad configuration and it turns ...

Also, the people judging matter in this case if it's for semester exams (a lot of bias occurs from panel members), you would have to present it in a different way then.
Keep a few hypotheses on side(like different configurations, you would have to read some papers on why people are exploring something), and if all of them fail on your data, then your project will be simply "effects of certain changes in a system" and tbh it will be an interesting study to explain why some tuning didn't work and why it worked for others. If you figure this out, you can definitely suggest some different directions to explore by end.

tall loom
pastel valley
lean hull
#

trying to install Tensorflow/PyTorch on my Mac M1. is it possible without installing conda?
conda comes with too many unused packages and junk associated with it.

pastel valley
serene scaffold
#

just pip install torch should work on Mac, but CUDA isn't an option on an M1 since it needs an NVIDIA GPU.

pastel valley
#

image augmentation is pretty popular so maybe there are alot of classification task that uses image augmentation that i can compare

tall loom
tall loom
# pastel valley its like everything should have a legitimate basis? this is the crucial part rig...

Yes, you cannot say something happened because you think this is why it should happen. Your reason should have evidence.
The only thing to be careful about is, "You are explaining to the panel what you thought should happen, its reason and you concluded it didn't happen"
And turns out your reason has flaws OR something didn't happen and the cause of it is something to be studied before even starting. Discuss this with supervisor 🙂

lean hull
serene scaffold
serene scaffold
wicked grove
#

Do i do the augmentation per image and generate 10 images

pastel valley
# tall loom Yes, you cannot say something happened because you think this is why it should h...

yeah we got advisers but they are pretty much busy anyways i think i got this topic and maybe give this idea to the panel and start drafting on how should this go
incase it got accepted hahhaha because base on my understanding is i need to have something original to add or modify to existing algorithms well maybe this experimental study could suffice the panel

thank you very much sir hoping to talk to you again with my future struggles 😅 👍

wicked grove
#
```
train_datagen = ImageDataGenerator(rescale=1./255,rotation_range=45,horizontal_flip=True)
```
i have been stuck and idk what i should do after this:/
tall loom
lament idol
#

Is anyone here from the United Kingdom?

serene scaffold
lament idol
#

Has anyone here done A-Level maths?
I wanted to ask if the knowledge for calculus from there is sufficient enough to do stuff in machine learning

#

Or would I need to learn anything extra?

serene scaffold
tall loom
wicked grove
#

And how many images should be generated per image? I think i am actually confused

lament idol
serene scaffold
lament idol
#

So I was wondering whether I'd need to catch up on or learn anything extra
But there was this other book I found online called mathematics for machine learning

#

Would you recommend it?

wicked grove
serene scaffold
tall loom
lament idol
lament idol
serene scaffold
lament idol
tall loom
#

But from the learning perspective, generating 'on the fly' or providing input of all augmented images from saved data makes no difference.

serene scaffold
#

and discrete math

tall loom
lament idol
agile monolith
#

stelercus

#

i need ur help

lament idol
#

I'll just teach myself the math needed
Thanks for informing me of the math needed

odd meteor
lament idol
agile monolith
serene scaffold
wicked grove
lament idol
lament idol
#

I don't take fm because I'm massive stoopid

agile monolith
agile monolith
lament idol
agile monolith
arctic crown
#

what library should i use as a beginner? like tensorflow, pytorch, keras

wicked grove
serene scaffold
#

So, none of them for now.

arctic crown
#

then?

serene scaffold
#

sklearn

agile monolith
wicked grove
agile monolith
#

seems like a good combo at first but thats horrible

arctic crown
serene scaffold
lament idol
#

So it works imo

arctic crown
agile monolith
#

is it ur second year?

arctic crown
lament idol
arctic crown
agile monolith
lament idol
agile monolith
arctic crown
serene scaffold
lament idol
odd meteor
agile monolith
lament idol
agile monolith
agile monolith
lament idol
wicked grove
#

@tall loom im so sorry i had another doubt
While doing the augmentation for the entire dataset
What parameters should i set for the image to come about 1000
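
A rough sketch of the arithmetic behind this question, assuming the goal is about 1000 images in total and that every original image gets the same number of augmented copies. The function name and the count of 120 originals are made up for illustration.

```python
import math


def copies_per_image(n_originals, target_total, keep_originals=True):
    """Augmented copies to generate per original image to reach target_total."""
    if n_originals <= 0:
        raise ValueError("need at least one original image")
    # If the originals stay in the dataset, only the shortfall must be generated.
    needed = target_total - n_originals if keep_originals else target_total
    return max(0, math.ceil(needed / n_originals))


# Hypothetical numbers: 120 originals, aiming for roughly 1000 images.
print(copies_per_image(120, 1000))  # 8
```

ImageDataGenerator itself has no "images per image" setting. How many augmented images come out depends on how many batches are drawn from the generator, so a count like this would only decide when to stop drawing.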

odd meteor
agile monolith
agile monolith
lament idol
agile monolith
#

yh u will be fine probs

lament idol
#

I hope so
UCL is the only I haven't heard back from as of yet

odd meteor
# arctic crown which one shoulds i watch?

😃 Check all of them first before you settle for one. Drop anyone that doesn't work for you.

PS: I'd also advise learning from a more structured platform. Learning solely on YouTube can be overwhelming and also cause fatigue.

agile monolith
agile monolith
lament idol
agile monolith
#

in terms of difficulty*

lament idol
arctic crown
lament idol
#

My first choice is either qmul or ucl

wicked grove
arctic crown
odd meteor
# arctic crown ty

In fact, Neuromatch Academy has one of the best customer-friendly courses on Deep Learning.

https://academy.neuromatch.io/

Check their YouTube channel also

wicked grove
agile monolith
#

they arent that good

#

compared to ucl

lament idol
odd meteor
# wicked grove Is it important to learn oops and dsa in python for a career in ml or ds ?

I didn't learn OOP before I started ML. However, from my experience it'll be great to know OOP in Python before starting ML.

You'll find it very useful when you start learning PyTorch, because PyTorch will assume you already have the knowledge of OOP. If you use other Deep Learning frameworks like TensorFlow + Keras then you won't really use OOP.

It's good to know at least two frameworks. Avoid being over-dependent on one DL framework.

PyTorch, TensorFlow + Keras, JAX, MXNet, Sonnet, CNTK, Caffe etc.

Just know at least two.

So I started DL with TensorFlow but I'm currently learning PyTorch now. But I had to learn OOP first before coming back to PyTorch.

odd meteor
odd meteor
wicked grove
stone marlin
#

I've been curious --- most of y'all seem to be doing a lot of NN-type things, but I've seen very little of it in the field in the last ten-or-so years where I've done work [IoT, Travel, Loans...] besides a few ad hoc projects. Do your fields have you working with NNs quite a bit?

I think the most I've used'em is autoencoders for fraud / sensor fluctuations.

Edit: I didn't mean to sound judge-y here! I'm legit interested in what people are doing with NNs.

wicked grove
odd meteor
# wicked grove 😂😂no no data structures and algorithms

Ooh... I wasn't a software developer before getting into ML so I don't really know beyond my undergrad Big O notation in csc class 😩 I don't know anything about binary tree either.. Lol

I do pick up new stuff along the way though. I was first a Statistician before ML blew up to become something I could no longer ignore .

odd meteor
# arctic crown udemy is paid

Oftentimes much value is gotten from paid services 😁 don't you think so?

There's no amount of free ML course online that could match Courses on platforms like DataCamp and DataQuest. I personally haven't seen one yet.

wicked grove
#

@odd meteor i was trying out the data augmentation and this is what i did ... could you please tell me how i can i save the augmented images?

#
```
# raw string so the backslashes in the Windows path are not read as escape sequences
train_path = r'D:\glaucoma_train\ODIR-5K\ODIR-5K\Glaucoma'
train_datagen = ImageDataGenerator(rescale=1./255,rotation_range=45,horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_path,target_size=(512,512),batch_size=16)
```
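
One way to save the augmented images, sketched under some assumptions: `flow_from_directory` accepts `save_to_dir`, `save_prefix` and `save_format`, and writes each batch to disk as it is drawn. The helper name, the output folder and the number of batches are hypothetical. `flow_from_directory` expects `src_dir` to contain one subfolder per class, so the path to pass may be the parent of the `Glaucoma` folder rather than the folder itself.

```python
import os


def save_augmented(src_dir, out_dir, n_batches=10, batch_size=16):
    """Write augmented copies of the images under src_dir into out_dir."""
    # Imported here so the helper can be defined without loading TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    os.makedirs(out_dir, exist_ok=True)  # the target folder must exist
    # rescale is left out: it only matters when feeding a model.
    datagen = ImageDataGenerator(rotation_range=45, horizontal_flip=True)
    generator = datagen.flow_from_directory(
        src_dir,                 # parent folder with one subfolder per class
        target_size=(512, 512),
        batch_size=batch_size,
        class_mode=None,         # yield images only, no labels
        save_to_dir=out_dir,
        save_prefix="aug",
        save_format="png",
    )
    for _ in range(n_batches):   # every batch drawn is written to out_dir
        next(generator)
```

With these settings, roughly `n_batches * batch_size` files would end up in `out_dir`, as long as the source folder holds at least one image.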
odd meteor
stone marlin
odd meteor
stone marlin
#

Dang, have fun with NLP. That's one I started up again because it seems to be in demand at more and more places. It's cool, I just don't remember a whole lot of it. And I def don't know the new techniques.

odd meteor
stone marlin
#

Yeah, I get a little jealous that most of what I'm doing is fittin' GLMs and boostin' a bit! Haha, but NNs, even with LIME, are pretty terrible explainers and it's hard to say to a customer, "Hey, your XYZ is going to break." "What signals lead you to believe that?" "Idk, lol."

odd meteor
stone marlin
#

I know what'cha mean. Maybe I'll jump down this rabbit hole too and try'ta learn some!

severe hare
#

How do I plot a function in opencv

#

I want to plot a parabola

#

simple y = x^2

bronze skiff
#

is... is matplotlib not sufficient for you?

severe hare
swift sigil
#

Guys can anyone tell me how statistics, AI, ML, DL, and probability are used in Data Science? I am very confused, so kindly give me a real-life example.

bronze skiff
#

your confusion stems from probably not defining data science well

#

how would you define it?

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1639854074:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia
#

F

timber flame
timber flame
timber flame
timber flame
#

There are more and more jobs becoming like this

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lusty sphinx until <t:1639861645:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

mighty spoke
#

Hi i'm trying to plot(scatter plot) with binned data on the x axis and number of points in each bin on the y axis would anyone know how I would do this, any help appreciated, my code: ```x_cl, y_cl=zip(*sorted(zip(CL,peak)))

dat = pd.DataFrame({'CL' : x_cl, 'Y_VAL' : y_cl}) #we build a dataframe from the data

bins=np.arange(min(x_cl), max(x_cl), step=0.005)
categorical_object = pd.cut(x_cl, bins)
count=pd.value_counts(categorical_object)
grp = dat.groupby(by = categorical_object) #we group the data by the cut
plt.scatter()

plt.show()```
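
A minimal sketch of one way to get there, assuming the aim is one scatter point per bin (bin centre on x, number of values in the bin on y). It uses `np.histogram` instead of `pd.cut`, and the sample values stand in for `CL`. The helper name is made up.

```python
import numpy as np


def binned_counts(values, step=0.005):
    """Return (bin_centers, counts) for fixed-width bins covering all values."""
    values = np.asarray(values, dtype=float)
    # One extra step so the last edge lies at or beyond the maximum;
    # np.arange(min, max, step) alone would leave the largest values out.
    edges = np.arange(values.min(), values.max() + step, step)
    counts, edges = np.histogram(values, bins=edges)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, counts


# Made-up sample standing in for CL.
centers, counts = binned_counts([0.001, 0.002, 0.004, 0.007, 0.012, 0.013])
print(counts)  # [3 1 2]
```

`plt.scatter(centers, counts)` followed by `plt.show()` would then draw the plot.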

agile monolith
granite furnace
#

pd.DataFrame.hist()

#

i think it has param options to use plt as backend as well

#

oh scatter

#

sorry

bronze skiff
#

and actually, even for tabular data we use it

stone marlin
#

I usually hear deep learning as using anything with hidden layers, but what sort of things are you doing with tabular data, for example?

bronze skiff
#

i mean, you can build a basic model that takes a tabular dataset for regression and use a transformer-based model on it

#

there are some conveniences that it gives, one of which is it is significantly easier to write custom loss functions

granite furnace
#

slightly off-topic, how important are custom loss functions? are the standard ones usually too generalized?

bronze skiff
#

it depends on your problem, of course

#

and "custom" is relative to the specific package you're using and which functions are pre-installed

granite furnace
#

I see, so you didn't necessarily mean loss functions you write yourself

bronze skiff
#

like, if you're working on underwriting, you might use tweedie deviance as your loss function

#

some GBM packages support it, some don't

#

you will often have to write one yourself if it doesn't

#

but there are times where you have to write it yourself

#

and in that case you would wish that you were working with an autodiff library like pytorch
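
For reference, a sketch of what a hand-written Tweedie deviance could look like, restricted to the 1 < power < 2 range that is common in insurance-style problems. It is plain NumPy for evaluation only. A GBM package would additionally need the gradient and hessian, which is the part an autodiff library would take care of. The function name and the default power are assumptions.

```python
import numpy as np


def mean_tweedie_deviance(y_true, y_pred, power=1.5):
    """Mean Tweedie deviance for 1 < power < 2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if not 1 < power < 2:
        raise ValueError("this sketch only covers 1 < power < 2")
    if np.any(y_true < 0) or np.any(y_pred <= 0):
        raise ValueError("need y_true >= 0 and y_pred > 0")
    p = power
    # Unit deviance; it is zero exactly when y_pred equals y_true.
    dev = 2 * (
        np.power(y_true, 2 - p) / ((1 - p) * (2 - p))
        - y_true * np.power(y_pred, 1 - p) / (1 - p)
        + np.power(y_pred, 2 - p) / (2 - p)
    )
    return float(np.mean(dev))


print(mean_tweedie_deviance([0.0, 2.0], [1.0, 1.0]))
```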

granite furnace
#

does this also extend to validation methods?

bronze skiff
#

your validation method usually uses similar metrics to that used in model training

#

and if not, then you would need to check that

#

like, god forbid you have to write your own implementation of the hyvarinen scoring loss in lightgbm

granite furnace
#

ml world is a scary place it seems. just took intro to AI this semester and was curious. thanks for humoring me

stone marlin
#

Sure, but aren't transformer-based models NNs? I do agree that that kind of ensemble might be nice to do --- I don't think I've seen it done outside of CV though.

#

Oh, maybe I misunderstood what you were saying. It's not just plugging things into NNs, but rather a methodology of model-making around certain types of data. Maybe?

arctic crown
#

is shaping the same in all frameworks?

lapis sequoia
bronze skiff
#

i mean, deep learning is kinda catch-all term, right?

#

but a lot of research is really centering around the idea of "inductive biases"

stone marlin
#

Got'cha. Yeah, but I usually hear it specifically referring to things where you're using hidden layers to feature-find.

bronze skiff
#

a simple example of an inductive bias is in a conv net

#

if you have images, you would expect that the correlation between two pixels drops off heavily with distance

#

so a fully-connected neural net would perform not great on it

stone marlin
#

Sure, this is also the principle behind k-NN though, no?

bronze skiff
#

mostly because it has parameters for every distance on the image
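
To put rough numbers on that, a small sketch comparing parameter counts. The sizes are illustrative only: a 28x28 grayscale image, a fully-connected layer producing an output of the same size, and a single 3x3 convolution filter.

```python
def dense_params(height, width, out_units):
    """Weights plus biases of a fully-connected layer on a flattened image."""
    return height * width * out_units + out_units


def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a 2D convolution; independent of image size."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


print(dense_params(28, 28, 28 * 28))  # 615440
print(conv_params(3, 1, 1))           # 10
```

The dense layer learns a separate weight for every pair of pixels, near or far, while the convolution only has weights for a small neighbourhood and reuses them across the image.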

stone marlin
#

So, I guess the idea is "which inductive biases work in what way for what problem" is a research topic for NN stuff. Makes sense.

bronze skiff
#

yeah, the inductive biases of knn is that "you are defined by what you're close to"

#

there's a lot of ML research these days on "how can i build my model architecture to exploit these inductive biases that i want to hold"

stone marlin
#

Yeah. If we are gonna just take deep learning as "not shallow learning" (not given features a priori) then it would probably be good to know what features it can create and what proximity to cluster them around, via inductive biases.

bronze skiff
#

that's a gist, yeah

stone marlin
#

Yeah, that jives with what you noted about custom loss functions, because it's basically the same kind'a deal.

#

It would be nice to ensemble something onto a standard alg for tabular data (even for a toy problem), but my hesitance is always that I need interpretability for most of my jobs. But maybe just for fun I'll try something out on a toy data set and see what it picks up.

bronze skiff
#

interpretability is a harder job, but at my job we approach it via a lot of ablation and perturbation testing

#

also, keeping a bunch of reasonable synthetic data sets around is wonders for debugging weird model predictions

stone marlin
#

Yeah, LIME + Perturbation was pret much stock and standard when I was doing it, but LIME is sometimes a pretty big nightmare to work with if you're feature-heavy.

bronze skiff
#

agreed

#

unfortunately, that's pretty state of the art unless you have someone whose full time job is to investigate this stuff

stone marlin
#

Yeah, the other kin of LIME are pretty much more or less specific versions. I honestly don't know, with the computing power we have now, how we would get anything better than pert testing and maybe well-defined LIME stuff. But maybe someone clever is on it.

#

Maybe I'll try it on some easy, small-featured dataset and see if I can mess around with different kinds of transformers. Might be worth learning!

granite furnace
#

This is a result of a HalvingRandomSearchCV(), which looks nothing like the example here https://scikit-learn.org/stable/auto_examples/model_selection/plot_successive_halving_iterations.html#sphx-glr-auto-examples-model-selection-plot-successive-halving-iterations-py
Can it still be considered "correct?"

#
RandomizedSearchCV took 24.04 seconds for 188 candidates parameter settings.
Model with rank: 1
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007622902978304772, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.00015997189211448186}

Model with rank: 2
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007830958292280829, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.0002491044511612459}

Model with rank: 3
Mean validation score: 1.000 (std: 0.000)
Parameters: {'eps': 0.0007622902978304772, 'fit_intercept': 1.0, 'normalize': 1.0, 'tol': 0.00015997189211448186}

this is some more output of my search

hasty mountain
#

Can someone give me a hint on how to avoid lack of convergence in DCGAN model? I'm kind of tired of looking at figures of squares full of random colored pixels...

#

My Adam optimizers have the same learning rate as the one used in Pytorch's tutorial for DCGAN, yet my model doesn't work.

stoic musk
#

23 for y in range(len(pos)):
24 for k in range(d):
---> 25 angles[y,k] = np.sin(y/(10000**2*i/d)) if k % 2 == 0 else np.cos(y/(10000**2*i/d))
26
27

ValueError: setting an array element with a sequence.

Trying to code position encoding for an RNN Transformer
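
The error usually means the right-hand side is an array rather than a single number. A guess, since the rest of the code isn't shown, is that `i` is an array there. Below is a vectorised sketch of the usual sinusoidal encoding, where column `k` uses pair index `i = k // 2`; the function and variable names are made up.

```python
import numpy as np


def positional_encoding(n_positions, d):
    """Sinusoidal position encoding of shape (n_positions, d)."""
    pos = np.arange(n_positions)[:, np.newaxis]   # (n_positions, 1)
    k = np.arange(d)[np.newaxis, :]               # (1, d)
    i = k // 2                                    # index of each sin/cos pair
    angles = pos / np.power(10000.0, 2 * i / d)   # broadcasts to (n_positions, d)
    encoding = np.empty((n_positions, d))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])   # even columns
    encoding[:, 1::2] = np.cos(angles[:, 1::2])   # odd columns
    return encoding


print(positional_encoding(50, 16).shape)  # (50, 16)
```

Because everything is computed with broadcasting, no element-by-element assignment is needed and the scalar-versus-sequence problem does not come up.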

vague moon
#

Hey, I have been having trouble trying to install Bazel, is anybody willing to help me with the process. With bazelisk I'd think this would be really quick and easy I don't know why I'm running into so much trouble

hasty grail
barren tide
#

How do i connect a folder to google drive..

frail frost
#

has anyone here tried openAI’s codex?
i need a little help with it

low spear
#

does anyone know what is the TensorFlow v2 equivalent for image_dim_ordering()

#

what does image_dim_ordering() do actually
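
For context: in old Keras, `image_dim_ordering()` reported where the channel axis of an image tensor sits, as `'tf'` (channels last) or `'th'` (channels first). In TensorFlow 2 the corresponding call is `tf.keras.backend.image_data_format()`, which returns `'channels_last'` or `'channels_first'`. The sketch below only shows what the two values mean for a shape. The helper and the mapping dict are made up and it does not import TensorFlow.

```python
# Old Keras 1 values mapped to the names used by image_data_format().
LEGACY_TO_DATA_FORMAT = {"tf": "channels_last", "th": "channels_first"}


def input_shape(height, width, channels, data_format="channels_last"):
    """Shape of a single image under the given data format."""
    if data_format == "channels_last":
        return (height, width, channels)
    if data_format == "channels_first":
        return (channels, height, width)
    raise ValueError(f"unknown data format: {data_format!r}")


# In TensorFlow 2 the current setting would come from
# tf.keras.backend.image_data_format(); here it is passed in by hand.
print(input_shape(224, 224, 3, LEGACY_TO_DATA_FORMAT["th"]))  # (3, 224, 224)
```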

wicked grove
#

@serene scaffold hello i have a doubt in pandas
The length of this df is 7820
I want to make only the images with 0s to a total of 1000 and drop the rest

#

Can you please guide me

#

Should i split it into separate columns and use df.drop or is there a simpler way?

lapis sequoia
tired shuttle
#

how do I connect to a sqlite3 database that's on another computer?

#

do I have to host it on a server?
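
Some background that may help: SQLite has no server of its own, the database is just a file. Reaching it from another computer means either getting at that file (a shared or mounted folder, or a copy) or putting a small service in front of it. The sketch below assumes the file is reachable as a path and opens it read-only, which is the more cautious choice over a network share. A temporary file stands in for the shared one, and the table is invented.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_readonly(db_path):
    """Open an existing SQLite file read-only using a URI filename."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Stand-in for a database file that would live on a shared or mounted drive.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "example.db"
    writer = sqlite3.connect(db_file)
    writer.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, score REAL)")
    writer.execute("INSERT INTO runs (score) VALUES (0.93)")
    writer.commit()
    writer.close()

    reader = open_readonly(db_file)
    rows = reader.execute("SELECT id, score FROM runs").fetchall()
    reader.close()

print(rows)  # [(1, 0.93)]
```

If several machines need to write at the same time, a client/server database is usually the safer route, since file locking over network filesystems is a known weak spot for SQLite.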

wicked grove
#

And i do not know how i should do that

lapis sequoia
wicked grove
wicked grove
#

Should i split the df such that i have 2 columns one w 0s and another w 2s or is there a better way?

lapis sequoia
wicked grove
#

Alrightt!!

#

Thank youu!!

serene scaffold
wicked grove
# serene scaffold I don't help with screenshots of DataFrames. Sorry :sad_cat:
```
merged_labels = pd.merge(train_labels1,labels )
merged_labels.tail(5)
index_names = merged_labels[ merged_labels['level'] == 1 ].index
merged_labels.drop(index_names, inplace = True)
#print(merged_labels.head(20))
merged_labels_norm=merged_labels[merged_labels['level']==0]
merged_labels_dr=merged_labels[merged_labels['level']==2]

#print(merged_labels_norm.head(6))
print(len(merged_labels_dr))
merged_labels_norm.iloc[0:1000,:]
merged_labels_dr.iloc[0:1000,:]
final_df1 = pd.merge(merged_labels_norm,merged_labels_dr)
print(final_df1.head(5))
```
#

this is what i tried

serene scaffold
wicked grove
#

this is the df

#
```
0    10003_left    0
1    10003_right    0
2    10007_left    0
3    10007_right    0
4    10009_left    0
...    ...    ...
8403    19494_right    0
8404    19498_left    0
8405    19498_right    0
8406    194_left    0
8407    194_right    0
```
serene scaffold
#

please do df.head().to_dict('list')

#

that's the only way that I'll read it. Otherwise I have to go back to what I was doing.

wicked grove
#

oh okayy

serene scaffold
#

if I understand correctly, you want to retain up to 1000 rows for which level is 0, ignoring the rest

#

try df.query("level == 0").head(1000)

wicked grove
#

but i get an empty df

serene scaffold
#

concat with what?

wicked grove
#
  '10003_right',
  '10007_left',
  '10007_right',
  '10009_left'],
 'level': [0, 0, 0, 0, 0]}```
serene scaffold
#

what do you need to concat?

wicked grove
#

the above df has images with level 2

serene scaffold
#

what are you really trying to do? there's apparently more to it than just getting 1000 rows where level == 0

wicked grove
#

i split it in the same way

wicked grove
serene scaffold
#

what are all the unique values in the level column?

#

is it 0, 1, and 2 only?

#

and you need 1000 from each?

#

because that's just df.groupby('level').head(1000)

wicked grove
#

i got itt!!i did this, it was a stupid mistake

#
```
merged_labels_norm=merged_labels[merged_labels['level']==0]
merged_labels_dr=merged_labels[merged_labels['level']==2]

#print(merged_labels_norm.head(6))
print(len(merged_labels_dr))
merged_labels_norm=merged_labels_norm.iloc[0:1000,:]
merged_labels_dr=merged_labels_dr.iloc[0:1000,:]
final_df1 = pd.concat([merged_labels_norm, merged_labels_dr])
print(final_df1.head(2000))
```
serene scaffold
#

my way is probably going to be faster.

wicked grove
wicked grove
wicked grove
serene scaffold
#

then what I suggested would work. If there were values in the level column that you didn't want, only slightly more would be needed to ignore them.
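
A small sketch of the `groupby(...).head(n)` approach, including the "slightly more" needed to ignore unwanted levels. The DataFrame is a made-up stand-in for `merged_labels`, and the `image` column name is an assumption.

```python
import pandas as pd

# Made-up stand-in for merged_labels: an image name and a level per row.
df = pd.DataFrame({
    "image": [f"img_{n}" for n in range(12)],
    "level": [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2],
})

n_per_level = 3  # would be 1000 for the real data
wanted = df[df["level"].isin([0, 2])]            # drop level 1 first
subset = wanted.groupby("level").head(n_per_level)

print(subset["level"].value_counts().sort_index().to_dict())  # {0: 3, 2: 3}
```

This keeps the first `n_per_level` rows of each remaining level in their original order, so no separate slicing and `pd.concat` step is needed.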

wicked grove
serene scaffold
lapis sequoia
#

Do you control the body

#

Hey guys, i have these different set of files that are generated when i run my model. I want to organise all the files by putting them to respective folders. is there a smart way to organise them than manually selecting files and putting them to different folders?

serene scaffold
lapis sequoia
#

yes

serene scaffold
serene scaffold
lapis sequoia
#

okay!! do you have some examples to follow through?
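
One possible example, assuming the generated files can be told apart by their extension. The helper name, the `other` folder and the sample file names are all invented; the real grouping rule might be a filename prefix or pattern instead.

```python
import shutil
import tempfile
from pathlib import Path


def organise_by_extension(folder):
    """Move each file in folder into a subfolder named after its extension."""
    folder = Path(folder)
    moved = {}
    for path in sorted(folder.iterdir()):  # sorted() takes a snapshot first
        if not path.is_file():
            continue
        # "run1.csv" goes to "csv/"; files without an extension go to "other/".
        target_dir = folder / (path.suffix.lstrip(".").lower() or "other")
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = target_dir.name
    return moved


# Try it on throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["run1.csv", "run1.png", "run2.csv", "notes"]:
        (Path(tmp) / name).write_text("placeholder")
    moved = organise_by_extension(tmp)

print(moved)
```

Running it on a copy of the output folder first is a sensible precaution, since the files are moved rather than copied.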

serene scaffold
#

what were you expecting tensorflow to do with that array of objects?

#

do you understand why an array of objects that are lists is not a valid input?

#

that's part of it. arrays have to be "rectangular". you can't get around that by having an array of lists that are different lengths

#

but also, just naively throwing data into the network isn't going to accomplish anything. it looks like you haven't done any kind of pre-processing.
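
To illustrate the "rectangular" point: one common pre-processing step is padding the lists to a common length so they form a proper 2D array. A minimal NumPy sketch, with a made-up helper name and zero as an arbitrary padding value:

```python
import numpy as np


def pad_sequences_np(sequences, pad_value=0):
    """Right-pad lists of numbers to a common length; returns a 2D array."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len), pad_value, dtype=float)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded


ragged = [[1, 2, 3], [4, 5], [6]]
print(pad_sequences_np(ragged).shape)  # (3, 3)
```

Whether padding is the right choice depends on what the lists represent, which is where the rest of the pre-processing comes in.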

lapis sequoia
serene scaffold
vague moon
#

Hey, I have been having trouble trying to install Bazel. I am installing with Bazelisk, and when installing through command line it seems to install correctly, I will get the text Starting local Bazel server and connecting to it... [bazel release 4.2.1] So I try and test it out by getting the version number but receive the error 'bazel' is not recognized as an internal or external command, operable program or batch file.

lapis sequoia
#

do you have pip

vague moon
#

yes

zealous wolf
#

upgrade required i think, so sudo apt-get upgrade bazel

vague moon
#
```
'sudo' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\*******\Desktop\bazelisk-master>apt-get upgrade bazel
'apt-get' is not recognized as an internal or external command,
operable program or batch file.
```
#

I'm running windows btw

zealous wolf
#

okay

#

this site may be helpful, visit it once

bronze skiff
south gull
#

blue are true values

#

heck, even giving it the optimal values as initial guess, it insists on making it linear

zealous wolf
#

welcome

vague moon
#

I'm trying to add bazel to path and reading how to download and add to path, I think the problem is I have a space in my path, my friend made my user name on windows "cum face" when he was helping repair my pc, so my file path to my user folder is C:\Users\cum face, and according to the bazel site "None of these paths should contain spaces or non-ASCII characters."

#

fun

bronze skiff
#

i'm sorry, that's fucking hilarious

south gull
#

friend made my user name on windows "cum face" when he was helping repair my pc
wth, that's not nice at all

south gull
#

guess I don't know what friendship is then

cold skiff
#

what are some good beginner resources for ai and data science, especially for someone being self taught?
I know there are good modules like numpy and pandas, but I'd like a resource that helps me apply that to some projects if y'all know of any.
I have a project I'm working on but I need to see what a fleshed out data science project might look like

#

I found this resource a while back ago:

vague moon
#

Trying to add bazel to path still, do I directly link to the exe in Path or the directory it is in

odd meteor
vague moon
#

I dont know if I can. So I changed my username but can't change the name of the folder from what I've tried, my original version of windows I bought was lost/corrupted during the repair process so I'm having to run a torrented version of windows at the moment which locks me out of changing some things until I activate windows

#

Finally I have installed bazel using chocolatey and it seems to work, it still gives me that error about a space in the file path but it seems to work now so fingers crossed.

hasty mountain
vague moon
#

new problem, if I type bazel into cmd 'bazel' is not recognized as an internal or external command, operable program or batch file. but if I run cmd as admin it works

#

I should be able to run bazel commands without running cmd as admin right?

south gull
south gull
cosmic pelican
#

hello, anyone knows how to fix this?

#

i need to set a fixed scale for both axis, using matplotlib

cosmic pelican
#

wanna do it like this

cosmic pelican
#
```py
import json

import matplotlib.pyplot as plt
import numpy as np


def Plot():
    # Load the simulation results (JSON stored in a .txt file)
    with open('Simulationn-algo.txt', 'r') as file:
        data = json.load(file)

    size = []
    Insertion = []
    Merge = []
    Heap = []
    Quick = []
    Bubble = []
    Selection = []
    Counting = []

    # Collect the input size and the first five timings of each algorithm per run
    for details in data['Simulation Details']:
        size.append(details['Size'])
        Insertion.append(details['Insertion Sort'][0:5])
        Merge.append(details['Merge Sort'][0:5])
        Heap.append(details['Heap Sort'][0:5])
        Quick.append(details['Quick Sort'][0:5])
        Bubble.append(details['Bubble Sort'][0:5])
        Selection.append(details['Selection Sort'][0:5])
        Counting.append(details['Counting Sort'][0:5])

    _Insertion = np.array(Insertion)
    _Merge = np.array(Merge)
    _Heap = np.array(Heap)
    _Quick = np.array(Quick)
    _Bubble = np.array(Bubble)
    _Selection = np.array(Selection)
    _Counting = np.array(Counting)
    _size = np.array(size)

    plt.plot(_size, _Insertion, label='Insertion')
    plt.plot(_size, _Merge, label='Merge')
    plt.plot(_size, _Heap, label='Heap')
    plt.plot(_size, _Quick, label='Quick')
    plt.plot(_size, _Bubble, label='Bubble')
    plt.plot(_size, _Selection, label='Selection')
    plt.plot(_size, _Counting, label="Counting")

    plt.xlabel('size')
    plt.ylabel("Duration (ms)")
    plt.title("Different Sorting Algorithms")
    # plt.legend()
    plt.show()


Plot()
```
grave frost
cosmic pelican
grave frost
cosmic pelican
#

i searched for a fix

grave frost
#

you apparently don't use reddit much - smart.

#

just google how to set the scale for axes in matplotlib

cosmic pelican
grave frost
#

ask crypto then 😏

pastel valley
#

yo, how do you normalize in a cnn model? is there a layer that can do it on a tensor, or will it be preprocessing?

charred umbra
#

Does anyone know what the newest proposed activation function for deep neural networks is?

arctic crown
#

the shape is just how you see the list right?

#

list/array

slow vigil
#

does anyone know if it's possible to use Spark Streaming with websockets?

crude karma
#

do beginner data science projects need to be perfect

halcyon storm
#

Help

#

So basically I am working on a machine learning assignment: creating an algorithm that will detect brain tumors based on brain tumor data sets. This is an assignment from my professor, but I don’t have experience with coding.

These are the steps I have already completed:

(1) Import the dataset into a fresh Google Colab project
(2) Split the dataset into training / testing / validation sets (thursday)

But I still need help with
(3) Defining my classification model(s). You would probably want to try a few different models here.  You can either build your own convolutional neural network (layer by layer) in tensorflow and train it from scratch, or you can modify an existing pre-trained network like VGG19, alter it to better suit our binary classification needs, and retrain it on the dataset.
(4) Train your model(s) and evaluate

hasty mountain
# halcyon storm So basically I am working on a machine learning assignment in creating an algorit...

If you don't have experience with coding, things might get complicated. Try seeing how to use VGG19.
It seems that scikit-learn also has a module especially for creating a neural network automatically, but I don't know how reliable that is.
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

However, my bias is to create your own, which isn't hard if you try doing that through keras/tensorflow.keras.

#

If your dataset is already preprocessed, then you just need to do something like this:

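from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()

# Two convolutional layers; Keras infers the input shape the first time the model sees data
model.add(Conv2D(filters=32, kernel_size=5, padding='same', strides=2, activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(filters=64, kernel_size=5, padding='same', activation='relu'))
# Flatten the feature maps so the last layer outputs a single value per image
model.add(Flatten())
# Sigmoid gives a probability, which suits binary classification
model.add(Dense(1, activation='sigmoid'))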

And your neural network is ready. Just fit it to your dataset and train it.

halcyon storm
#

Ohh ok

hasty mountain
#

However, it's a good idea to learn more about it, how those layers work, so you can enhance this neural network.

south gull
#

omg, I tried to write a network too, but it's incredibly dumb
can you help me with it maybe? @hasty mountain

hasty mountain
south gull
#

Nooooooooooo

#

It's vanilla

hasty mountain
south gull
#

uhm, nothing
I can't even train it on data, from a polynomial function
I've only managed with linear function

halcyon storm
#

Basically there was this dude that helped me with this. And he wasnt able to continue to help me bc he is busy. Is it ok if I add u onto a group chat so then u can see where we left off? @hasty mountain

south gull
#

I wrote the backpropagation and stuff
but I'm doing something wrong with training and gradient descent I think

hasty mountain
halcyon storm
#

Ohh okk

#

Did u know who is willing to help me ?

gloomy hatch
#

Hey guys, I'm trying to figure out a way to calculate (efficiently) sets of density based clusters given a set of x,y coordinates -- does anyone have any articles or suggestions for that?

hasty mountain
# halcyon storm Basically there was this dude that helped me with this. And he wasnt able to con...

But from what you said there, if you use keras, probably the only things you'll have to worry about are preprocessing your data and understanding how the neural network works, and probably that'll be all. Maybe just organizing the output of your neural network...

Keras is really intuitive and quite useful. Like...to train your neural network you just define it and then do model.fit(X, y) and to predict data, model.predict(X)
Problem would be if you're using Pytorch.

hasty mountain
hasty mountain
#

I just started using Pytorch some days ago.

south gull
#

wanted to try writing it all

hasty mountain
#

Uuuh...then I really can't help you, you're going through a quite difficult path.

south gull
#

uhhh, well I was still hoping you could help me
I think my problem is, in my gradient descent

bronze skiff
#

why don't you post it

hasty mountain
#

However, maybe you could learn how to do that if you check the source code for tensorflow...which is a module that requires manually creating every single step in a neural network...and it's the hardest way, already.

hasty mountain
south gull
#

oh ok ahehe
well, gl with Pytorch though!

hasty mountain
#

Try checking the source code for some optimizers in pytorch and keras.

bronze skiff
bronze skiff
#

since none of the gradients are exposed

hasty mountain
#

Hm, I didn't know about that

south gull
bronze skiff
#

gradient descent is literally params = params - learning_rate * gradient
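
A minimal sketch of that update rule, assuming a made-up dataset of points on the line y = 2x + 1 and a mean squared error loss. The names `points`, `w`, `b` and `learning_rate` are illustrative and are not taken from the C code discussed in this thread:

```python
# Fit y = w * x + b to points on a straight line with plain gradient descent.
points = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]  # hypothetical data: y = 2x + 1

w, b = 0.0, 0.0        # parameters, starting from zero
learning_rate = 0.01   # fixed step size (an assumption; the thread uses an adaptive one)

for step in range(5000):
    # Gradient of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / len(points)
    grad_b = sum(2 * (w * x + b - y) for x, y in points) / len(points)
    # The update rule itself: params = params - learning_rate * gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

With a step size this small the loop needs a few thousand iterations to settle; a step size that is too large makes the error grow instead of shrink.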

#

lol sure

south gull
#

C code

/*
Trains the network [pos] on the dataset [points/next] using [steps] with adaptive stride
*/
void adaptive_learn(framework* pos, int steps, void* points, int next(void*, point*)) {
     netw* vel = netw_init(pos->spec);    // The gradient
     point* point = point_init(pos);      // A point which gradient finder caches to 
     double error = INFINITY;             // error starts at infinity: i.e. as bad as possible
     int diag_hz = steps / 10;            // frequency of outputs
     print_vbar("DESCENDING");
     for (int i = 0; i < steps; i++) {
          double prev_error = error;                                       // save the error, before computing next
          error = next_gradient(pos, vel, point, points, next);            // sets vel
          double rate = minimize_diagonal(pos, vel, point, points, next);  // computes optimal learning_rate/step_size by minimizing in the direction of gradient
          netw_scale(vel, rate);                                           // scales the gradient by optimal
          if (i % diag_hz  == 0) print_step(pos->net, vel, error, rate);   // print stats
          if (is_error(prev_error, error)) {                               // gradient descent is wild
               if (i % diag_hz != 0) print_step(pos->net, vel, error, rate);// in case step was skipped, print the final step
               exit(2);
          };                       
          netw_add(pos->net, vel);                                         // move
     }
     netw_free(vel);
     point_free(point);
}

#

params = params - learning_rate * gradient
would be
netw_add(pos->net, vel);
in my code

bronze skiff
#

so netw_add does a minus?

#

or is netw_scale already scaling by a negative learning rate

south gull
#

exactly

#

the gradient is negative rather

#

from minimize_diagonal

bronze skiff
#

wait

#

i thought your gradient was error

#

hence the next_gradient function

#

i don't see that error being used anywhere meaningful

south gull
#

next gradient is stored into vel through side effects

bronze skiff
#

fair enough

south gull
#

I print the error to screen

#

so I can see what's going on lol

bronze skiff
#

are you sure you have your minuses correct?

#

if you put a - in front of minimize_diagonal and run it

#

what do you get

south gull
#

yeah! No, it can fit a straight line

#

it's just way too inefficient at learning

#

also I know the code is messy, but that's because it's C lol

bronze skiff
#

what model are you fitting to a straight line?

#

another linear model?

south gull
#

well, actually...
The training set is pairs of (x,y) coordinates

#

which lie on a straight line

bronze skiff
#

right, okay

#

and your model is...?

south gull
#

what's a model?

bronze skiff
#

what's the network you're training

#

architecturally

south gull
#

it's uhh Feed-Forward

#

I'm surprised this
params = params - learning_rate * gradient
should be enough

#

doesn't work for me at all

#

maybe it's too slow

#

how many steps are needed?

pale mural
#

hey I've got this massive table (12 sets of a 12x21 table), what would be a good way to display this? any libraries or smthn I should use?

#

col/row labels are just numbers, and the cells are probabilities

inland zephyr
#

guys, I need a suggestion about image embeddings. I have reviewed some well-known embedding models (arcface, deepid, vgg) and want to ask: are the embedding vectors normalized by the model or not? and is it normal practice to normalize the vectors before further processing, like storing them in a vector db, or just keep them as they are?

#

and is it normal practice to map the clusters formed by the vectors (for example, I use 26 augmentations per image and want to check whether the vectors cluster cleanly per image or whether there is a slight mix)?

lapis sequoia
pale mural
# lapis sequoia You can look at the heatmap. Matplotlib and seaborn will do your work.

hmm I was thinking something like that, kinda made one with tabulate and some colors

I like how it actually shows the numbers, but there's basically a bunch of these tables (yellow colored 12-21) and I've also got multiple different methods of calculating that set of tables so I'd like to show the difference in those too. Don't want to have to show 10 different tables 5 times. Any idea of how to compactly show that?

south gull
#

I don't understand your english

inland zephyr
#

ehe sorry for my bad english

#

umm i wonder if normalizing the embedding vector of an image is common practice or not

#

and i wonder if clustering the vectors from each augmented image is common too, to analyze whether the augmentation works well to separate each entity

#

since i have done some research on the effect of each embedding model and want to know which models and augmentation methods meet my expectations

quasi parcel
#

hi i hope everyone are doing well

arctic crown
#

what does .linspace() do?

quasi parcel
#

i need help with literal_eval and a pandas column

#

ValueError: malformed node or string: 0 [312020]

#

this is the error i am getting

pale mural
# arctic crown

returns an array of 100 evenly spaced numbers between 0 and 70

#

*ordered small to big as well
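
As a rough illustration of that answer, here is a pure-Python sketch of what a call like `np.linspace(0, 70, 100)` computes. The arguments are inferred from the reply above, and the helper `linspace` below is a hypothetical stand-in, not NumPy's implementation:

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop, endpoints included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)   # distance between neighbouring values
    return [start + i * step for i in range(num)]


values = linspace(0, 70, 100)
print(len(values), values[0], values[1], values[-1])
```

The values come out in increasing order because `stop` is larger than `start`; swapping the two would give a decreasing sequence.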

lapis sequoia
lapis sequoia
pale mural
charred wedge
#

What is the best way to retrieve and parse a streaming api in json format? Like if I want to search for a match in name:

heavy bay
#

Hey, so I installed tensorflow but when I run my code I get this warning: 2021-12-20 15:15:23.864238: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
How do I rebuild tensorflow with the appropriate compiler flags?

bold timber
#

Hi, I have a question about NLP. Should we remove a text sample when it appears more than once?

#

like using drop_duplicates??

uneven flame
#

but personally so far, reducing data redundancy worked well in most cases, so i prefer removing duplicates mostly

#

also check this out^

uneven flame
verbal dock
#
import cv2
from PIL import Image
cam = cv2.VideoCapture(0)
def draw_rectangle(img, rect):
    (x, y, w, h) = rect
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
def draw_text(img, text, x, y):
    cv2.putText(img, text, (x, y), cv2.FONT_HERSHEY_PLAIN, 1.5, (0, 255, 0), 2)
def predict(test_img):
    
    face, rect = detect_face(img)

    label = face_recognizer.predict(face)

    label_text = subjects[label]
 
    #draw a rectangle around face detected
    draw_rectangle(img, rect)
    #draw name of predicted person
    draw_text(img, label_text, rect[0], rect[1]-5)
    return img
def predict(test_img):
    #make a copy of the image as we don't want to chang original image
    img = test_img.copy()
    #detect face from the image
    face, rect = detect_face(img)

    #predict the image using our face recognizer 
    label, confidence = face_recognizer.predict(face)
    #get name of respective label returned by face recognizer
    global label_text
    label_text = subjects[label]
    return img
predicted_persons = []
while True:
    ret, frame = cam.read()
    if not ret:
        print("failed to grab frame")
        break
    cv2.imshow("Attendence...", frame)

    k = cv2.waitKey(1)
    if k%256 == 27:
        break
    elif k%256 == 32:
        # SPACE pressed
        stimg = cv2.imwrite("Student_Image.jpg", frame)
    studentimg = cv2.imread("Student_Image.jpg")
    Student_Prediction = predict(studentimg)
    #draw a rectangle around face detected
    draw_rectangle(Student_Prediction, rect)
    #draw name of predicted person
    draw_text(Student_Prediction, label_text, rect[0], rect[1]-5)
    predicted_persons.append(label_text)

    from openpyxl import Workbook
    book = Workbook()
    sheet = book.active
    row = (label_text)

    if row not in predicted_persons:
        sheet.append(row)
book.save("Today's Attendence.xlsx") 
cam.release()

cv2.destroyAllWindows()
#

While proceeding with the code, I'm getting the following error-

#
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-271c883c8a01> in <module>
     47         stimg = cv2.imwrite("Student_Image.jpg", frame)
     48     studentimg = cv2.imread("Student_Image.jpg")
---> 49     Student_Prediction = predict(studentimg)
     50     #draw a rectangle around face detected
     51     draw_rectangle(Student_Prediction, rect)

<ipython-input-14-271c883c8a01> in predict(test_img)
     22 def predict(test_img):
     23     #make a copy of the image as we don't want to chang original image
---> 24     img = test_img.copy()
     25     #detect face from the image
     26     face, rect = detect_face(img)

AttributeError: 'NoneType' object has no attribute 'copy'
#

Why am I getting this error and what can be the possible fixes for this?

lapis sequoia
verbal dock
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

vivid echo
#

May be you are passing wrong input

#

The function is not able to read the image properly, that's why you are getting the AttributeError

#

@verbal dock Check the path of the image
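
A hedged sketch of how to make that failure visible. `cv2.imread` returns `None` instead of raising when it cannot read a file, and in the loop above it runs on every frame, including frames before SPACE has saved `Student_Image.jpg`. The helper `load_image` and its `reader` parameter are hypothetical names introduced here so the check can be shown without OpenCV installed:

```python
from pathlib import Path


def load_image(path, reader):
    """Read an image via `reader` and raise instead of silently returning None.

    `reader` is any function that behaves like cv2.imread, i.e. it takes a
    path string and returns the image, or None on failure.
    """
    if not Path(path).is_file():
        raise FileNotFoundError(f"No image file at {path!r}")
    img = reader(str(path))
    if img is None:
        raise ValueError(f"Could not decode {path!r} as an image")
    return img
```

In the attendance script this would be called as `studentimg = load_image("Student_Image.jpg", cv2.imread)`, ideally only inside the branch that runs after SPACE has been pressed.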

lapis sequoia
bold timber
#

If I have a dataset where a label doesn't match the features, what can I do? Remove the data or replace the label value?

vivid echo
#

share the error

inland zephyr
bold timber
bold timber
pastel valley
#

yo, do normalization and image augmentation have ups and downs for cnn models?

vivid echo
#

Share the screenshot of the dataset and which algorithm you use for the classification?

vivid echo
#

So I will guide you in better way

amber lark
#

Hello, where can I start learn ML or deep learning?
I am with 0 knowledge.

pastel valley
#

btw, a new question: do cnn models naturally recognize shapes? or colors too?

serene scaffold
vivid echo
pastel valley
#

like, conv layers detect edges, therefore they can see the shapes

vivid echo
#

like vehicle color detection

pastel valley
# vivid echo yes colors are recognize by the cnn

so if i train a model on a dataset with, for example, 2 balls of the same shape where one is blue and one is red, and i use color space manipulation (an image augmentation technique), this would put the model at a disadvantage, right?

vivid echo
#

I don't think so

#

can you give a reason why you feel it will give disadvantage?

pastel valley
#

if the only distinct feature that differentiates these 2 classes is color, then altering their color in preprocessing (using image augmentation) can confuse the model?
for example, applying a red cast to the blue ball and a blue cast to the red ball? @vivid echo

vivid echo
#

Okay , I see

#

Now I got it.
You are right
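
The concern can be made concrete with a toy example. It assumes each ball is reduced to a single RGB pixel and that the augmentation is a plain red/blue channel swap; both are simplifications chosen for illustration, not anything from an augmentation library:

```python
# One representative pixel per class, as (red, green, blue)
red_ball = (255, 0, 0)
blue_ball = (0, 0, 255)


def swap_red_blue(pixel):
    """A deliberately extreme color augmentation: exchange the red and blue channels."""
    r, g, b = pixel
    return (b, g, r)


augmented_red_ball = swap_red_blue(red_ball)

# After augmentation the "red" sample is pixel-for-pixel a blue ball,
# but it would still carry the label "red" in the training set.
print(augmented_red_ball == blue_ball)
```

When color is the only feature separating the classes, an augmentation like this produces contradictory training examples, which is why color-altering augmentations are usually restricted or disabled for such datasets.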

pastel valley
#

can this be a good thesis topic?

vivid echo
#

we use augmentation to generate more training data so our model trains with better accuracy.

vivid echo
pastel valley
vivid echo
#

yes, in that case the model will get confused or give wrong predictions

inland zephyr
#

and yes

#

sometimes augmentation will give bad results, depending on the characteristics of the object

pastel valley
#

published?

inland zephyr
#

um nope

#

actually my personal project

#

but it could be a good thesis project

pastel valley
#

oh i can still use this as thesis yeah?

#

i just need a topic

#

hahaha my 1st topic got rejected

pastel valley
#

😅

inland zephyr
#

you can try that one

amber lark
delicate sphinx
#

In Tensorflow, is there a way to use a tokenizer vocabulary as the vocabulary = ... parameter in the TextVectorization layer or should I only use a TextVectorization layer

wicked grove
merry ridge
#

Does anyone have any recommendation for a paid online course to learn Python for Data Science for a complete beginner? I have a coworker that is interested in learning and I already sent them a lot of the free material, but I figured a well-structured paid one would be better if it is all being billed back to the company anyway.

bronze skiff
#

like grus' data science from scratch

#

or bishop's pattern recognition and machine learning

#

(the second is a personal recommendation)

#

first book has a python tutorial in it-- you can supplement it with beazley's python cookbook

desert oar
#

yeah maybe at least bill the book to the company and give them a few hours a week to learn it

wicked grove
amber lark
serene scaffold
amber lark
#

Ok

delicate sphinx
delicate sphinx
wicked grove
#

So for data augmentation i am trying rotation
I set rotation_range to 45 degree but it generates a few images which are very similar to the original

delicate sphinx
# wicked grove So for data augmentation i am trying rotation I set rotation_range to 45 degree...

Have you checked the documentation? Personally I've never used an ImageDataGenerator (and I use TensorFlow 2 anyway). Would this be of any assistance? https://stackoverflow.com/questions/34801342/tensorflow-how-to-rotate-an-image-for-data-augmentation

wicked grove
uneven flame
# wicked grove Yess i did check thiss I actually also wanna know what is better for data augmen...

https://towardsdatascience.com/top-python-libraries-for-image-augmentation-in-computer-vision-2566bed0533e

here are some libraries used for image data augmentation
if u just want to solve a classification problem u can use any data augmentation libraries
for object detection, sometimes after the augmentation one might have to take care of the bounding box coordinates too. I am not sure, but I think some of these libraries might have methods for doing that, and maybe there are some APIs which help with that too. I am also trying to learn about it all atm.

wicked grove
uneven flame
wicked grove
#

Thank you soo much!

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @uneven flame until <t:1640023262:f> (9 minutes and 58 seconds) (reason: attachments rule: sent 7 attachments in 10s).

rapid fog
#

!unmute 509403906963406860

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: pardoned infraction mute for @uneven flame.

rapid fog
#

@uneven flame Sorry, your message got zapped by our filters since it had quite a few attachments.

#

Would you like me to get them back for you? We have them in logs.

uneven flame
rapid fog
#

👌

hasty mountain
#

Hey guys, about GANs and, specifically, DCGAN: is it possible to use some kind of automated hyperparameter tuning for the optimizer without collapsing the model? Or do I have to change the learning rate manually and train for many epochs after each change to see how it goes?

#

I made a DCGAN that will only stop generating noise and generate something that slightly resembles the real images if I use more than 5000 epochs, so it kinda sucks to have to change the learning rate and wait for 5000 epochs every time.

bronze skiff
#

i mean, you can use an LR scheduler

hasty mountain
bronze skiff
#

that's often a hard hyperparameter choice

#

some of those schedulers use the validation set to determine good times to lower/raise learning rates

#

so maybe check out those
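
A pure-Python sketch of the idea behind such a scheduler: lower the learning rate when a monitored loss stops improving. The class `ReduceOnPlateau`, its parameters and the loss values are hypothetical and only illustrate the logic. PyTorch (`torch.optim.lr_scheduler.ReduceLROnPlateau`) and Keras (the `ReduceLROnPlateau` callback) ship real implementations, and a GAN may need a different monitored quantity than a plain validation loss:

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")   # best loss seen so far
        self.bad_epochs = 0        # epochs since the last improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                # Plateau detected: shrink the learning rate, but not below min_lr
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = ReduceOnPlateau(lr=0.01, patience=2)
losses = [1.0, 0.8, 0.81, 0.82, 0.7, 0.71, 0.72, 0.73]  # made-up validation losses
history = [scheduler.step(loss) for loss in losses]
print(history)
```

The learning rate is halved twice in this run, each time after two consecutive epochs without a new best loss.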

hasty mountain
bronze skiff
#

yes

#

gans in general are very unstable, due to mode collapse

#

it might be good to look into modifications that try to get around those issues

#

like infoGANs and WGANs

#

my favorite resource to learn about wgans

hasty mountain
#

I see. I think mode collapse isn't exactly the problem for me, since neither the generator nor the discriminator loss goes to infinity and beyond. But the generator still only generates noise until around 5000 epochs.

#

I've added some gaussian noise to the discriminator's conv2d layers and now I'm trying to change the learning rate.

bronze skiff
#

that's usually what WGANs do-- accelerate training

#

but for now, tweaking learning rates is good to do

toxic kraken
#

Hi! I have a silly question: I am coding a classifier estimator with sklearn, and I have a dataset of diamonds (size, color, clarity and cut).
Based on the first 3 features, i want to classify each sample by cut, this can be: ['Ideal', 'Premium', 'Very Good', 'Good', 'Fair'] .

So my target Y is a Series of strings.

My question: what is the difference between OneHotEncoder and simply putting numbers from 0 to 4?

delicate sphinx
#

with 9999 "0" and 1 "1"

#

from my understanding of it

hasty mountain
#

My head is on neural networks right now, and those typically expect one-hot encoded targets so that softmax can classify the classes correctly. Otherwise, I think the model would see the values from 0 to 4 as continuous, something like price prediction, so it would require a different structure.

#

You probably can use 0~4 values, but I think it would be more complicated to work with.
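
A small pure-Python sketch of the two encodings for the cut labels from the question. The helper names are illustrative. As a hedged side note, scikit-learn classifiers generally treat the target as class labels and accept strings or integer codes directly, while `OneHotEncoder` is mainly meant for categorical input features:

```python
cuts = ['Ideal', 'Premium', 'Very Good', 'Good', 'Fair']

# Integer encoding: one number per class (implies an order and distances between classes)
cut_to_index = {cut: i for i, cut in enumerate(cuts)}


def one_hot(cut):
    """One-hot encoding: a vector with a single 1 at the class position (no implied order)."""
    vec = [0] * len(cuts)
    vec[cut_to_index[cut]] = 1
    return vec


sample = ['Good', 'Ideal', 'Fair']
as_integers = [cut_to_index[c] for c in sample]   # [3, 0, 4]
as_one_hot = [one_hot(c) for c in sample]         # [[0, 0, 0, 1, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
print(as_integers)
print(as_one_hot)
```

With integer codes, a model that treats the number as a quantity would see 'Fair' (4) as four times 'Premium' (1); the one-hot vectors are all equally far apart, which is why they are the usual input for a softmax output layer.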

toxic kraken
#

Thanks!

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1640029708:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

arctic crown
#

please help
so i am making a personal ai assistant its been over 5 to 6 months now
and now i want to add some ml algorithm for doing this:
let's say i wake up at 7 am and turn on the lights, and i do this every day for a week. when the next week starts, i want my ai to automatically turn on the lights for me. another example: let's say i set an alarm to wake up at 7 am every day; the more i do it, the more it knows, until it does it itself.
which ml algorithm can i use to achieve this?

uneven flame
novel acorn
#

Hey, so I have a question.

What could be happening here? according to the shape it is 1,445,477 entries but the index goes from 0 to 523,862, which seems pretty weird

novel acorn
#

Already checked, seems normal honestly

desert oar
#

this is difficult to follow. can you post a snippet of code and some kind of demonstration of what goes wrong?

#

!code see below for using code formatting:

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

desert oar
#

!paste or use our paste site:

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar
#

please avoid posting screenshots unless there's absolutely no other way to show what you are asking about

#

it's impossible to search, impossible to read for certain people, and generally puts a lot more burden on other people

#

likewise @novel acorn, can you please post this as a code block

#

check for extra whitespace around the names in the CSV

novel acorn
#

hqahahahah it's because I'm using multiple versions and want to see what changes I'm doing

desert oar
#

e.g. maybe your list has ['a', 'b', 'c'] but the csv has a , b , c which would be ['a ', ' b ', ' c']

#

then you need to manually inspect the files and find out what's going on

#

figure out which particular files are causing trouble, read one of them, and then print the columns with df.columns.tolist()
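
A minimal sketch of that check using only the standard library, with made-up file contents standing in for one of the troublesome CSVs. In pandas the equivalent cleanup would be along the lines of `df.columns = df.columns.str.strip()`:

```python
import csv
import io

# Hypothetical file contents: the header names are padded with spaces
raw = "a , b , c\n1,2,3\n"

header = next(csv.reader(io.StringIO(raw)))
print(header)            # ['a ', ' b ', ' c']

desired_columns = ['a', 'b', 'c']
print([name in desired_columns for name in header])    # no exact matches

# Stripping the whitespace makes the names match again
cleaned = [name.strip() for name in header]
print([name in desired_columns for name in cleaned])
```

Printing the raw header as a list, as suggested above with `df.columns.tolist()`, makes stray spaces visible because each name is shown inside quotes.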

novel acorn
desert oar
#

you can also put the output in a code block? or in the paste site. it doesn't have to be syntactically valid python code

novel acorn
#

sure, look

#

yup, but that's the problem

#

1.41 million rows

#

default, the file didn't have an index

desert oar
#

@olive jackal so what exactly is the problem with this? you are expecting one of these dataframes to have certain columns, but it doesn't have those columns?

desert oar
#

i told you, the video isn't useful. at least it's not something i personally can use to help you with

novel acorn
#

it may be, I actually had to do a join because I had every single year in different datasets

desert oar
#

indexes are there to help you, but they can get confusing if you aren't used to working with them

novel acorn
#

I'll try that

#

ff_id = pd.read_csv(path_customer, encoding='unicode_escape')

data_2018 = pd.read_csv(path_2018, encoding='unicode_escape')
data_2019 = pd.read_csv(path_2019, encoding='unicode_escape')
data_2020 = pd.read_csv(path_2020, encoding='unicode_escape')
data_2021 = pd.read_csv(path_2021, encoding='unicode_escape')

filtered_2018 = data_2018.merge(ff_id, on="ID", how="inner")
filtered_2019 = data_2019.merge(ff_id, on="ID", how="inner")
filtered_2020 = data_2020.merge(ff_id, on="ID", how="inner")
filtered_2021 = data_2021.merge(ff_id, on="ID", how="inner")

year_2018 = filtered_2018.drop(columns_to_drop, axis=1)
year_2019 = filtered_2019.drop(columns_to_drop, axis=1)
year_2020 = filtered_2020.drop(columns_to_drop, axis=1)
year_2021 = filtered_2021.drop(columns_to_drop, axis=1)

all_years = pd.concat([year_2018, year_2019, year_2020, year_2021])


#

that's the code I used to read it and to concat it in a single big dataset
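
One possible explanation for a shape of 1,445,477 rows with an index that only runs from 0 to 523,862: `pd.concat` keeps each input frame's own index by default, so the labels restart at 0 for every year. The sketch below uses tiny made-up frames (`year_a`, `year_b`) rather than the real data:

```python
import pandas as pd

# Two small stand-ins for the per-year tables, each with its own 0-based index
year_a = pd.DataFrame({"ID": [1, 2, 3]})
year_b = pd.DataFrame({"ID": [4, 5]})

# Default behaviour: the original index labels are kept, so they repeat
stacked = pd.concat([year_a, year_b])
print(stacked.shape, stacked.index.tolist())

# ignore_index=True renumbers the rows 0..n-1 in the combined frame
renumbered = pd.concat([year_a, year_b], ignore_index=True)
print(renumbered.shape, renumbered.index.tolist())
```

If that is what happened here, the largest index label would correspond to the biggest single year rather than to the total row count, and `all_years.reset_index(drop=True)` would be another way to renumber after the fact.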

desert oar
#

is the ID unique across rows? if so, consider setting it as the index for each dataframe

novel acorn
#

yup, but it's a long id for unique customers

desert oar
#

is it 1 customer per row? or more than 1?

novel acorn
#

1 per row, but customers repeat because it's the movements of 4 years

desert oar
#

that would be more than 1 row per customer then

novel acorn
#

indeed

desert oar
#

as in, the customer id's are not unique

#

are they unique in ff_id?

novel acorn
#

yes, that dataset only has 3k rows

#

because it's the id of the customers that belong to certain category

desert oar
#

and what's in each row of the other tables? some kind of transaction?

novel acorn
#

yup, not transaction but information of a movement (cargo)

desert oar
#

so customer 1234 can appear in any table multiple times? or customer 1234 can only appear in each table once, but they can appear in multiple tables?

novel acorn
#

I'll try it 😄

desert oar
#

@olive jackal i believe you can pass a list of column names to usecols= so you don't have to deal with the column name index business

#
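desired_columns = ['a', 'b', 'c']

for file in files:
    # Read only the header row to find out which columns this file has
    tmp = pd.read_csv(file, nrows=0)
    columns_in_file = list(set(tmp.columns) & set(desired_columns))
    # Then load just the desired columns that are actually present
    data = pd.read_csv(file, usecols=columns_in_file)
    ...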
novel acorn
desert oar
#

ok then @novel acorn. this probably won't change your outcome much, but in general i'd do something like this to make it clear what the unique identifiers are (and maybe wrap it up in a function to reduce duplication and the risk of typos):

# Customer data: one row per customer
ff_id = pd.read_csv(path_customer, encoding='unicode_escape').set_index("ID")

# Load cargo data: one row per shipment
def load_cargo_table(path):
    data = pd.read_csv(path, encoding='unicode_escape')
    data = data.join(ff_id, on="ID", how="inner")
    return data.drop(columns_to_drop, axis=1)
paths = [path_2018, path_2019, path_2020, path_2021]
all_years = pd.concat([load_cargo_table(path) for path in paths])
#

i'm not sure what you mean by that. but you can actually use the fact that it accepts callables to your advantage (this example is even given in the docs):

desired_columns = ['a', 'b', 'c']

for file in files:
    # usecols accepts a callable that is evaluated against each column name
    data = pd.read_csv(file, usecols=lambda c: c in desired_columns)
    ...
#
desired_columns = ['a', 'b', 'c']
def is_desired_column(c):
    return c in desired_columns

for file in files:
    # Same idea with a named function instead of a lambda
    data = pd.read_csv(file, usecols=is_desired_column)
    ...
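To try the callable form without real files, here is a minimal sketch that reads from in-memory CSV text. The two "files" (`csv_a`, `csv_b`) and their contents are invented for the example.

```python
import io
import pandas as pd

desired_columns = ['a', 'b', 'c']

# Two hypothetical "files" with different column sets.
csv_a = "a,b,x\n1,2,3\n4,5,6\n"
csv_b = "c,a,y\n7,8,9\n"

frames = []
for text in (csv_a, csv_b):
    # usecols accepts a callable: it is called with each column name
    # and the column is kept when the callable returns True.
    df = pd.read_csv(io.StringIO(text), usecols=lambda c: c in desired_columns)
    frames.append(df)

# Each frame holds only the desired columns that its file actually contained.
columns_per_file = [sorted(df.columns) for df in frames]
```

Columns that are missing from a given file are simply skipped, which is the situation the list-based version has to handle by reading the header first.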
bronze skiff
#

any opinions here on metaflow?

#

i don't know how i feel about stuffing everything into a single class

novel acorn
hasty mountain
#

@bronze skiff hey, just a quick question:
I've seen that usually GANs don't use Dropout layers, even though a dropout layer helps to avoid overfitting and thus helps with loss function.
However, since GANs are...kinda special and fragile, is this a bad idea?

(I tried using dropout(0.4) after each ReLU in my discriminator. I think I broke my GAN...)

novel acorn
#

yup I know, I was given the data in different files because it was one file per year and they wanted me to do a global analysis

#

I want to learn sql, but due to college and work, I have little to no time, but it's in my to do list 😄

#

😮

#

I'll try it then, I'll see if in the following days I have a little time 😄

#

Thought it was as hard to learn as a new programming language hahahha

stone marlin
#

I'd recommend messing around in pgexercises.com. They have very common questions, have solutions at the bottom which explain their process, and it's pretty fast to pick up SQL. This is in Postgres dialect, but most of the dialects are very, very similar to one another.

#

When I was doing data engineering, I recommended this to all of the interns who wanted to get into DE. Most [not all] were able to get up to speed relatively quickly and commit something to our codebase in a few weeks. :']

#

Optimizing SQL calls takes quite a bit of experience and knowledge of the architecture --- but for simple calls (which are 99% of the calls people prob will want) the learning curve is fairly low.
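To give a feel for what a "simple call" looks like, here is a rough sketch using Python's built-in sqlite3 module. The `movements` table, its columns, and the rows are hypothetical, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movements (customer_id INTEGER, year INTEGER, weight REAL)"
)
conn.executemany(
    "INSERT INTO movements VALUES (?, ?, ?)",
    [(1, 2018, 10.0), (1, 2019, 20.0), (2, 2019, 7.0)],
)

# Count movements and total weight per customer across all years.
rows = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS n_movements, SUM(weight) AS total_weight
    FROM movements
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
conn.close()
```

A SELECT with GROUP BY like this covers a lot of everyday analysis questions.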

arctic crown
#

in LinearRegression can you predict on a single dimension like x?

stone marlin
#

As in, you have a list like [1, 4, 6, 1, 7, 8] and you want to regress on it?

#

They might wanna take the mean in a weirdly convoluted way.

novel acorn
stone marlin
#

No problemo, I love SQL stuff. It's a really fantastic tool to learn and I'd argue that it's essential for most jobs in DS / DE / Analysis these days.

#

I don't think that it's a competition, both are very important to know.

#

Give me an example, I don't think I follow.

#

The kinds of data which I have been working with were not always able to be pulled with Looker/Tableau/PowerBI --- this was more of the Data Analyst's job. But I understand what you mean here. Nevertheless, if you're using a BI tool, the point of it is to be able to pass it around easily and modify it --- so I'm not sure I get where the "passed around and now it's unusable" thing is coming from, unless you mean that there's some people exporting to Excel, changing it, etc., which is bad practice in general.

arctic crown
#

wait sorry my question doesn't make sense

stone marlin
#

I think though that either way, knowing the proper structure to put things in is important as well. For ETL, you're going to need to know the proper ways to Extract (SQL, or whatever BI tool you're using if that's acceptable), Transform (whatever system you use here, Python/Spark/etc.), and Load (which is where the data structures come in).

arctic crown
#

please help
so i am making a personal ai assistant, it's been over 5 to 6 months now
and now i want to add some ml algorithm for doing this:
let's say i wake up at 7 am and turn on the lights, and i do this every day for a week. when the next week starts, i want my ai to automatically turn on the lights for me. another example: let's say i set an alarm to wake up at 7 am every day; the more i do it, the more it learns, until it does it by itself.
which ml algorithm can i use to achieve this?

stone marlin
#

If the whole df is 90mb, it depends strongly on what you're doing, right? If your ETL is windowing over a whole bunch of stuff, that's more complicated than a single SELECT for the business team.

#

Moreover, it depends where the load is going to. To a DW for the business team? To a db for the DS team? Etc.

#

I think we're both saying right things here: it's important to know SQL (or, equally, the THEORY behind how querying works, since it's still the same thing in Tableau / whatever, just simplified), as well as how to produce a product relevant for the team you're handing it off to.

#

I'd never give my business team a raw Excel file. I give them a Looker view and they can export if they want, but they can't change it on looker and mess up anything.

#

Haha, okay, see, I'm the person on the data team people buy the coffee for. :']

arctic crown
stone marlin
#

But yeah, if one is not going to be a Data Scientist / Data Engineer, then it's probably not AS important to get too deep in the weeds, you're totally right.

#

Ash, that's still a 2D regression. Your x-axis is day number, your y-axis is the time.

#

Yes, this is true. I tend to do smaller companies, so this is my bias, certainly.

#

Yeah, I like to have a "say" in the company, for some definition of "say". Most of my companies have been around 20 - 300 people, so it's a wide range, but def not a big company.

#

Haha, well, yes. We usually will have a data lake which only DBAs have access to, and they will then push data (with help from DS + Commercial) to a Data Warehouse so that we can attach Looker / Tableau / whatever to that. Then we'll have another smaller data warehouse for DS which is mostly lightly parsed Data Lake stuff.

#

That way commercial gets what they want and we don't have to do a ton of gross formatting AND we don't have to worry about weird business defs accidentally getting into the DS part. And then for the DS part, we're free to do whatever we want with that, we're the DBAs of that DW.

#

I don't think it's optimal, but it's definitely served us well! Which goes back to my comment: without SQL, I'd be SOL. Haha. But yeah, in a big company if you're not required to just get your data willy-nilly, SQL might not be a top thing to invest in.

#

Haha, yeah. For BI stuff, Tableau + Looker have served me very well. I didn't do much with PowerBI but it looks fine.

#

I've been meaning to learn it, a few companies I'm looking at use it and I don't know much about it.

#

Yeah, it's kind of weird: since Windows got WSL, a bunch of companies my friends are at have slowly transitioned off Macs (because they're VERY expensive) and gone over to Windows. The devs get the WSL stuff w/ Linux for dev work, and everyone else is like, "whatever, Windows is fine."

#

The only thing I don't wanna do is have to learn Azure cloud. I've already had to learn AWS and GCP stuff, I don't wanna learn another one, haha. But either way, that's a good task for me to do, just to look into it.

#

Yeeeeeep. I'm not sure the direction of the market, but it seems a few companies have been moving over. I'm not sure if it's price-point or what.

#

We'll see how it all goes. AWS certainly has marketshare and name. GCP is way friendlier to use, imo. I dunno about Azure yet. But in ten years we'll see where the market is, haha.

delicate sphinx
#

How can I translate a softmax output (a vector of 16 floats) back to an integer? (I tokenized my words, so the output needs to go float -> int -> string)

I'm using Tensorflow but at this point I'm open to anything

delicate sphinx
stone marlin
#

I'm not sure what Dax is, but I'd prob do most of my modeling in the SQL anyhow regardless, haha.

#

28,000 words with one-hot? Lawddddd that's a large, sparse dataset.

delicate sphinx
#

I think TextVectorization gives a more dynamic approach but I couldn't wrap my head around it

arctic crown
#

what does Linear Regression mean?

delicate sphinx
#

I know there's a way of doing it but I've not found a guide or documentation on it

stone marlin
#

I'm not excellent with NLP, so I'm sure someone else can guide you, Tenten. :'[

delicate sphinx
#

Yeah I've been asking for days sadly 😦

stone marlin
#

Ah, got it. Yeah, I have a weird feeling either GCP or Azure is going to pick up Amazon's pieces, but we'll see if they do.

stone marlin
delicate sphinx
#

But it's fine, I find a lot of AI based subjects don't have too many people talking about it (I think a lot of people are just really good and don't need help or it's more of a test and try method)

#

yeah

#

I'm not saying anyone's mean, I'm just saying 1) finding someone for your needs and 2) them seeing your messages are a hard-to-get combo

#

this server is lovely so every day I put the same question in xD

stone marlin
#

Sorry, this is done in MSPaint because I'm on my laptop. For linear regression (in 2d) you have these little datapoints. For you, it might be the x-axis being day number, and the y axis being time to wake up. Something like that. Anyhow, these are the blue circles (they should be the same size, but I'm terrible at mspaint). Linear Regression allows you to draw this red line which "approximates" where the dots are kind of headed.
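A rough sketch of that red line in plain Python, fitting ordinary least squares by hand. The wake-up times are made-up sample values (minutes after midnight), and the names `days`, `wake_minutes`, and `predict` are just for illustration.

```python
# Hypothetical data: x is the day number, y is the wake-up time
# in minutes after midnight (420 = 7:00 am).
days = [1, 2, 3, 4, 5, 6, 7]
wake_minutes = [420, 424, 419, 423, 421, 425, 422]

n = len(days)
mean_x = sum(days) / n
mean_y = sum(wake_minutes) / n

# Ordinary least squares with one feature:
# slope = covariance(x, y) / variance(x)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, wake_minutes))
variance = sum((x - mean_x) ** 2 for x in days)
slope = covariance / variance
intercept = mean_y - slope * mean_x

def predict(day):
    """Return the fitted line's value (minutes after midnight) for a day number."""
    return slope * day + intercept

next_day_estimate = predict(8)
```

The fitted line always passes through the point (mean of x, mean of y), which is a quick sanity check on the arithmetic.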

#

Tenten, can you give me a little more info on what you're doing? You're taking a corpus and some kind of bi/tri/whatever-grams stuff and putting that through a NN? And at the end, you want your activation function to kind of classify something so you need an integer?

delicate sphinx
#

If you've heard of Visual Question Answering, that's what

stone marlin
#

I'm mostly guessing here, but that's the kind'a thing I'd like to know about so I can try to help a bit better. Haha.

delicate sphinx
#

I take an image and a question, and I try to output an answer

#

I'm using TensorFlow due to its extremely powerful Keras API (and its own functions) as well as its documentation

#

but I don't see many examples that allow me to classify text in the capacity I need it - namely from:

String --> Integer tokens (preprocess/tokenize) --> float values (during model training/predicting) --> (output) integer tokens --> string

stone marlin
#

Can you give me an example using a real string (fake values, obv). I'm not sure what you mean by integer tokens.

delicate sphinx
#

so basically a question can be up to length 32:

"What are you doing?"

Would be standardized to lose punctuation and make it all lowercase

#

to become: "what are you doing"

#

then I use a tensorflow tokenizer that splits it at each whitespace

#

["what", "are", "you", "doing"]

#

then my tokenizer is fitted on my examples to give integer token values to my text

#

what --> 581, are --> 20, ....

[581, 20, 14, 3414]
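The steps so far (standardize, split on whitespace, map words to integers) can be sketched in plain Python. This is a stand-in for illustration, not the TensorFlow tokenizer itself; the vocabulary only contains the four words from the example, and `OOV_ID` is an assumed id for unknown words.

```python
import string

# Tiny hypothetical vocabulary; the ids mirror the example above.
word_to_id = {"what": 581, "are": 20, "you": 14, "doing": 3414}
OOV_ID = 1  # assumption: id used for words missing from the vocabulary

def standardize(text):
    """Lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

def encode(text):
    """Split on whitespace and map each word to its integer token."""
    return [word_to_id.get(word, OOV_ID) for word in standardize(text).split()]

tokens = encode("What are you doing?")
```

Keeping the reverse mapping (id to word) around is what makes it possible to turn integer outputs back into strings later.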

#

Then it's put into my model through an embedding layer and dense layers, activations layers blah blah blah until it reaches the output where it's of the type:

[3.43456e-03, 3.90534e-04, ....]

#

That's 16 long
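One common way to get from a vector like that back to an integer is to take the index of the largest value (argmax). The sketch below assumes each of the 16 positions stands for one candidate answer class, which may not match the actual model; `id_to_answer` and the probability values are invented for the example.

```python
# Assumption: the 16 outputs are class probabilities, one per candidate answer.
# Hypothetical lookup from class index back to an answer string.
id_to_answer = {i: f"answer_{i}" for i in range(16)}

def decode(probabilities):
    """Return (index, answer) for the highest-probability class."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, id_to_answer[best]

# Made-up softmax-like output where class 5 dominates.
probs = [0.01] * 16
probs[5] = 0.85

best_index, best_answer = decode(probs)
```

With a real model, `tf.argmax` or `numpy.argmax` on the prediction does the same job as the `max` call here.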

#

My tokenizer saved in txt file form is 11 KB lol

#

whoops wrong one

#

that's my tensorflow model

#

my tokenizer is 2.7MB