#data-science-and-ml


junior lintel
#

I am sorry for the silly/useless question but it is in order to “prepare” myself mentally :
How long does it take to have a brief knowledge of machine learning?

By brief learning I mean knowing how to make an “AI” without deep learning

fresh shadow
misty flint
#

it seems like theyre helping you already

misty flint
#

id say focus on the goal: what does that look like to you?

#

and incrementally learning just enough ML to solve your problem, and then over time, increasing your knowledge base from there

fresh shadow
#

please if you have any more and actual knowledge

#

it won't be long !

misty flint
#

well im on mobile but it looks like a join problem

#

sorry but i wont be able to answer without a laptop

fresh shadow
#

:// i see

#

is anyone else able to, please let me know

misty flint
#

ive tried typing code on mobile before

#

never again

#

will i make that mistake

fresh shadow
#

if ur able to look through that

#

if not then its fine

misty flint
#

JavaLim seems to have it handled

fresh shadow
#

yeah ! going good now

#

hello

#

is anyone available

#

they had to go, and even though its nearly done im back to needing some help with errors

#

please lmk if u can

upper spindle
#

sorry for asking the same questions about LSTMs, but should i be scaling the sentiment values, as they lie b/w [-1,1]?

fresh shadow
#

hello, anyone available to help me ?

serene scaffold
#

@fresh shadow please don't flood this channel with requests for help about the same question. You may be on your own for a bit.

fresh shadow
#

and i am literally almost done, so just kind of ticked and desperate

#

its taken up a lot of my time to explain it to the new helper who has offered, only for them to then leave again

fresh shadow
#

i understand i am texting quite a lot, but i think u can also understand that if all 3 helpers have left half way, that is a lot of time taken for me to re-explain

#

and not even end up at the end of/ finishing my code

misty flint
#

we're doing minitorch for my deep learning class and i have mixed feelings about it lol. idk if anyone else has tried it.

lone drum
#

full error
```python
  File "D:\college_project\modules\train.py", line 212, in post
    x_train = np.array(list(map(preprocessing, x_train)))
  File "D:\college_project\modules\train.py", line 309, in preprocessing
    images= grayscale(images) #convert to grayscale
  File "D:\college_project\modules\train.py", line 301, in grayscale
    images= cv2.cvtColor(images, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
```

desert oar
#

just reading its description idk why you'd use it for a deep learning class

misty flint
#

which is understandable but at the same time...trying to write the backprop algo into an ML library is a pain ngl

#

makes me appreciate all the ML libraries i take for granted that i use regularly

desert oar
#

i dont think you should waste time on numerical computing when trying to also teach deep learning

misty flint
#

this is true

#

its eaten up so much time

desert oar
#

yeah

#

seems like someone has an agenda with that syllabus

misty flint
#

hahaha

#

honestly i think it was mostly the TA's agenda

#

the prof is the dept head

#

so she has tenure and all that

#

she mostly just teaches the math part

desert oar
#

that bothers me

#

i really dont like classes where you dont learn what your peers would be learning

#

not that school should be a competition as such but

misty flint
#

yeah, idk, its def something, hence my mixed feelings

iron basalt
#

I actually really like this from the POV of being able to make something new. But with the goal in mind of applying ML as quickly as possible (and maybe also for getting the bigger picture first), not so much.

#

This is something to be done AFTER learning Pytorch.

misty flint
desert oar
#

right

#

learning how pytorch is implemented is cool

misty flint
#

like this could be an elective/advanced class or something after doing that

desert oar
#

but not really important

iron basalt
#

It's important for the more niche group that makes ML libraries and systems. But if you are not a systems programmer it's not applicable to you.

misty flint
#

yeah for sure

#

its not like ill be cranking out libraries lol

iron basalt
#

It depends what you do. I do crank out ML libs, but that is relatively niche.

misty flint
#

very interesting

#

well, for me, since its for a grade, ill just approach it from the perspective of working on my coding + problem solving skills

iron basalt
#

It's also perhaps nice from a sort of "I could make this if it suddenly got deleted from the internet" POV.

misty flint
#

do certain companies ask for custom made ML library functions or something?

iron basalt
misty flint
#

i mean if you cant talk about it too much i understand

#

ah interesting

#

i heard a podcast today about OSS business models and sustainability/maintenance over time

iron basalt
misty flint
#

it was kinda sad especially since so many devs use certain OSS

#

interesting interesting

iron basalt
#

"Systems programming" as people call it, although that is often associated with operating systems. I just use the term in general for systems which support some task and generally care about making that fast and stable (usually written in something like C, but in the case of ML also has bindings / a Python side to it (to allow non-systems people to interact with it and unite the systems and scientific community)).

misty flint
#

i can see that

desert oar
#

i'd call that "lower level" programming

misty flint
#

my company is still relatively new so out-of-the-box stuff works fine for them

desert oar
#

i definitely stick to "systems programming = close to hardware"

misty flint
#

they did ask me to see if i can sit on this one meeting to improve one of their "AI products"

#

i heard the problem and i was like "oh. you could probably use NLP to solve this"

iron basalt
#

Systems programming is not always close to hardware. It used to be but not so much now. Low level is when using something like a hardware description language IMO. C used to be considered high-level.

serene scaffold
#

Those were dark times

misty flint
#

thats wild

iron basalt
#

If you think assembly is a pain, try programming an FPGA with a HDL.

misty flint
#

my cousin uses C for work since he does embedded systems and gave me a mini-lecture on EE concepts and i was like "nope"

#

he says C IDEs are funky

desert oar
#

i said "lower level", not "low level" for a reason 😛

iron basalt
#

OpenCL feels like a blessing then.

#

Anyhow, someone has to be able to make stuff like Pytorch, and while you don't need that many people doing that, it still is a thing. Should it be taught in schools? Yeah, probably, but later, if only so that if something goes wrong, like a bug, it could be fixed by them or at least give a good report of it.

#

Also the way things are trending, stuff like FPGAs and other strange hardware is on the rise, and important for future ML.

misty flint
#

everything follows hardware

lone drum
#

```python
  File "D:\college_project\modules\train.py", line 213, in post
    x_validation = np.array(list(map(preprocessing, x_validation)))
  File "D:\college_project\modules\train.py", line 309, in preprocessing
    images= grayscale(images)  #convert to grayscale
  File "D:\college_project\modules\train.py", line 301, in grayscale
    images= cv2.cvtColor(images, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
```
can anyone help me with this?
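
A note on this traceback: the assertion `!_src.empty()` means `cv2.cvtColor` was handed an empty image. A common cause is that `cv2.imread` returned `None` for a path it could not read, so at least one element of `x_train` / `x_validation` is probably `None`. A minimal sketch of a guard, assuming the images sit in a plain list before preprocessing (`drop_unreadable` is a hypothetical helper, not part of OpenCV):

```python
def drop_unreadable(images):
    """Keep only images that were actually loaded.

    Assumption: each item is either None (the file could not be read)
    or an array-like object with a .size attribute, as NumPy arrays have.
    """
    kept = []
    for index, img in enumerate(images):
        # None or a zero-size array is what triggers the !_src.empty() assertion
        if img is None or getattr(img, "size", 1) == 0:
            print(f"skipping image {index}: nothing was loaded")
            continue
        kept.append(img)
    return kept
```

Running this on `x_train` before the `map(preprocessing, ...)` line would show which indices are empty. The matching labels would need the same filtering so images and labels stay aligned.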
gentle epoch
#

can Matplotlib/Pyplot do something like this? IE, plotting the opening of an angle with an arc and a bisecting line at the mid point ?

wicked cove
#

Hi everyone. If we have 27.3% missing values in 8 columns, is this considered a class imbalance?

wicked cove
stark phoenix
#
TypeError: 'int' object is not iterable
lyric dew
#

hi everyone!
I'm a physics undergraduate student, searching for ideas for a biophysics data analysis project
any suggestion (site or book or something)?😅

safe elk
lyric dew
safe elk
#

Think of using motion sensors used in VR trackers in this project

#

Then its all data analysis

neon imp
#

Figure out how to use sensors to measure leg angles, leverage with the ground and such.

weak kiln
#

Is anyone familiar with a module that works with Python 3.9 that can be used for conjugating verbs in English? Feels like a lot of the modules that can do this (nodebox, pattern.en) have fallen into serious disrepair.

#

please do ping me if you know about anything like that. lemon_cowboy

weak kiln
untold belfry
#

Hi everyone!

Do you have any idea where the error Columns must be same length as key could come from and how I can solve it?

#

It appeared after I tried to turn my last loop

def compare_row(value, index):
    return df[[strName, simRows]].apply(lambda y:
                                        y[simRows] + '|' + str(index + 1)
                                        if fuzz.partial_token_sort_ratio(value, y[strName]) > 90
                                        else y[simRows], axis=1)


for i in range(len(df)):
    df[simRows] = compare_row(df.at[i, strName], i)

into a pandas apply function.

#
def compare_row(value, index):
    return df[[strName, simRows]].apply(lambda y:
                                        y[simRows] + '|' + str(index + 1)
                                        if fuzz.partial_token_sort_ratio(value, y[strName]) > 90
                                        else y[simRows], axis=1)


df[simRows] = df[[strName, simRows]].apply(lambda x: compare_row(x[strName], 1), axis=1)
#

(Upper code works, but is slow)
(Code below doesn't work, but would be much faster)

neon imp
#

Not really, but if you break out your lambda function into a subroutine things would be a lot easier to understand.

untold belfry
#

Changed it, thanks for making me acknowledge it.

Basically I'm comparing a column with itself with the program above.

lone drum
#
```python
Traceback (most recent call last):
  File "D:\college_project\modules\model_train.py", line 15, in <module>
    model.add(Convolution2D(16, 3, 3, input_shape = ( 64, 64, 3), activation = 'relu'))
  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\layers\convolutional.py", line 473, in __init__
    super(Conv2D, self).__init__(
  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\layers\convolutional.py", line 105, in __init__
    super(_Conv, self).__init__(**kwargs)
  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\engine\base_layer.py", line 132, in __init__
    name = _to_snake_case(prefix) + '_' + str(K.get_uid(prefix))
  File "C:\Users\shubh\anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 74, in get_uid
    graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'
```
#

ping me when replying

misty flint
karmic moth
#

Has anyone used TF-IDF?

#

I have a small question: once we apply TF-IDF to our text data and use the result as the X features, should we pad the X features to a fixed sequence length before feeding them to a neural network?

karmic moth
#

to elaborate if we apply tfidf to a short sentence, lets say we get 20 features (columns)

#

but if it was a long sentence, we could get 20+ features

#

so is there any way i can have a fixed feature length, just so that i can have that fixed feature length set as the No of Neurons on the input layer of the NN
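
On the fixed-length question: a TF-IDF vector has one entry per vocabulary term, not one per word in the sentence. Once the vocabulary is fitted on the training corpus, every document maps to a vector of the same length, so no padding should be needed. A minimal pure-Python sketch, assuming whitespace tokenisation and a smoothed IDF (the helper names are made up for illustration; scikit-learn's `TfidfVectorizer` also produces fixed-width rows, and its `max_features` argument caps the width):

```python
import math
from collections import Counter

def fit_vocabulary(docs):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def tfidf_vectors(docs, vocab):
    """Return one vector of length len(vocab) per document."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many documents each known token appears
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks) if tok in vocab)
    vectors = []
    for toks in tokenized:
        counts = Counter(tok for tok in toks if tok in vocab)
        vec = [0.0] * len(vocab)  # same width for short and long documents
        for tok, count in counts.items():
            tf = count / len(toks)
            idf = math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0  # smoothed IDF (an assumption)
            vec[vocab[tok]] = tf * idf
        vectors.append(vec)
    return vectors

docs = ["the cat sat", "the cat sat on the mat and looked at the dog"]
vocab = fit_vocabulary(docs)
vectors = tfidf_vectors(docs, vocab)
print(len(vocab), [len(v) for v in vectors])
```

The input layer width would then be `len(vocab)`, regardless of how long any single sentence is.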

serene scaffold
untold belfry
misty flint
untold belfry
#

Right now I'm trying to get rid of my last for loop, but I'm not sure why it keeps telling me that Columns must be same length as key.

serene scaffold
#

!traceback

arctic wedgeBOT
#

Please provide the full traceback for your exception in order to help us identify your issue.
While the last line of the error message tells us what kind of error you got,
the full traceback will tell us which line, and other critical information to solve your problem.
Please avoid screenshots so we can copy and paste parts of the message.

A full traceback could look like:

Traceback (most recent call last):
  File "my_file.py", line 5, in <module>
    add_three("6")
  File "my_file.py", line 2, in add_three
    a = num + 3
TypeError: can only concatenate str (not "int") to str

If the traceback is long, use our pastebin.

untold belfry
#

One second, I will do.
Maybe also crosstab could be an option, just saw how it works.

gentle epoch
wicked cove
untold belfry
# serene scaffold Please always show the whole error message
Traceback (most recent call last):
  File "C:\Users\Adrian\PycharmProjects\ProjectMystery1\version2.py", line 22, in <module>
    df[simRows] = df[[strName, simRows]].apply(lambda x: compare_row(x[strName], 1), axis=1)
  File "C:\Users\Adrian\PycharmProjects\ProjectMystery1\venv\lib\site-packages\pandas\core\frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "C:\Users\Adrian\PycharmProjects\ProjectMystery1\venv\lib\site-packages\pandas\core\frame.py", line 3775, in _set_item_frame_value
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

But now that I look at it, it might be smarter not to work with a double apply, but with crosstab instead.
Is there a way, or a similar function, to do this (found it on the internet) but without the aggregation?

def cross_fuzz(df):
    ct = pd.crosstab(df['strings'], df['strings'])
    ct = ct.apply(lambda col: [fuzz.ratio(col.name, x) for x in col.index])
    return ct
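
A likely reading of the `ValueError` above: `compare_row` already returns a whole Series, so calling it once per row through an outer `apply(..., axis=1)` produces a DataFrame, and a multi-column DataFrame cannot be assigned to the single key `df[simRows]`. The sketch below reproduces that shape and then builds the "similar rows" column directly from pairwise comparisons. It is only an illustration: `difflib.SequenceMatcher` stands in for `fuzz.partial_token_sort_ratio`, and the column names and data are made up.

```python
from difflib import SequenceMatcher

import pandas as pd

def similarity(a, b):
    """Stand-in for a fuzzy ratio; returns a score between 0 and 100."""
    return 100 * SequenceMatcher(None, a, b).ratio()

df = pd.DataFrame({"name": ["apple pie", "apple pies", "banana"]})

# Each row's function call returns a Series, so the outer apply returns
# a DataFrame (one column per compared row), not a single column.
nested = df.apply(
    lambda row: df["name"].apply(lambda other: similarity(row["name"], other)),
    axis=1,
)
print(nested.shape)  # rows x rows

# One string per row instead: the 1-based positions of all similar rows.
names = df["name"].tolist()
df["similar_rows"] = [
    "|".join(str(j + 1) for j, other in enumerate(names) if similarity(value, other) > 90)
    for value in names
]
print(df)
```

This is still quadratic in the number of rows, so it is not a guaranteed speed-up. It mainly avoids building a Series per row and assigns a value of the right shape.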
gentle epoch
pastel valley
#
METRICS = [
      keras.metrics.TruePositives(name='tp'),
      keras.metrics.FalsePositives(name='fp'),
      keras.metrics.TrueNegatives(name='tn'),
      keras.metrics.FalseNegatives(name='fn'), 
      keras.metrics.CategoricalAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
]

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

this the right way to get evaluations ?

#

btw by building a model

model = Sequential()

model.add(Conv2D(16, (3, 3), padding='valid', activation='relu', input_shape=(144,144,3),name='input_tensor'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3), padding='valid', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='valid', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.5))
model.add(Flatten())

model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(train_generator.num_classes, activation='softmax', name='output_tensor'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=METRICS)

their initial weights are random right?
is there a way i can create a 2 models with same initial weights? like a perfectly identical model?
because ill try to train both using different kind of dataset but contains the same classes

weak kiln
#

do feel free to make a suggestion though

shut phoenix
serene scaffold
turbid knot
#

finally i made my code

snow helm
#

Hello, I am trying to take the correlation coefficient here using pandas of the 2nd two columns of data within each row for each corresponding country

#

For instance, for however many Afghanistan's there are I would take the correlation coefficient of (0.14, 0.222), (0.19, 0.183), (0.24, 0.24), and so on and have a number dedicated just toward that country

#

If I want to go down the whole list all the way to the end is there a way I can do that to only take the CC of certain row indices? Any help would be greatly appreciated.

serene scaffold
#

@snow helm so you want to group by country, and then treat the left column as x and the right column as y, and calculate the correlation coefficient?

#
from scipy.stats import pearsonr
df.groupby('Country').apply(lambda d: pearsonr(d['x'], d['y']).r)

something like that.

#

though it's not clear from your screenshot what data type this is. it looks like you actually have a numpy array.

snow helm
#

Yes that is exactly what I am trying to do but still struggling a tad @serene scaffold

#

I want a correlation coefficient of both columns for each country

#

This is my code

#
data = pd.read_csv('/users/rchan/Desktop/BCYear3/Data_Science/Homework/Homework4/owid-covid-data.csv')

country_stats2 = data[["location", "people_fully_vaccinated_per_hundred", "new_deaths_smoothed_per_million"]]

country_stats2.dropna(subset = ["people_fully_vaccinated_per_hundred"], inplace=True)
country_stats2.dropna(subset = ["new_deaths_smoothed_per_million"], inplace=True)

print(country_stats2.to_numpy())

columnA = country_stats2["people_fully_vaccinated_per_hundred"]
columnB = country_stats2["new_deaths_smoothed_per_million"]
serene scaffold
#

@snow helm try making it a dataframe with columns named Country, x, and y, and then do the thing I showed.

snow helm
#

so do not convert to numpy then

serene scaffold
#

no

snow helm
#

so right now it looks like this not in numpy form

#

now designate column x to the first variable and y to the second?

#

I believe this is what you are saying but I'm getting a keyerror saying columnA or 'x' in your case doesnt exist

#

KeyError: 'columnA'

#

Sorry still learning in my intro to ds class so im a bit lost on the proper syntaxing

serene scaffold
snow helm
#

but i designated column A to that literal column?

serene scaffold
#

so you get the same result as if you had looked up a dictionary key that's not there

snow helm
#

"people_fully_vaccinated_per_hundred"

serene scaffold
#

nope, it has to be based on d in the lambda

#

assigning a variable doesn't change the names of df columns

snow helm
#

So I would have to assign the actual column names in the lambda

#

by replacing columnA and B to the name of the actual thing

serene scaffold
#

just use the whole name of the column.

snow helm
#

Gotcha

#

AttributeError: 'tuple' object has no attribute 'r'

serene scaffold
#

let me think

snow helm
#

No problem

#

I appreciate the help by the way

#

Thank you very much

#
country_stats2.groupby("location").apply(lambda d: pearsonr(d['people_fully_vaccinated_per_hundred'], d['new_deaths_smoothed_per_million']).r)
serene scaffold
#

just remove the .r and see what happens

#

I thought it would return a named tuple, not a regular tuple

snow helm
#

ValueError: x and y must have length at least 2.

serene scaffold
#

but I was wrong. just like I always am sad_cat

serene scaffold
#

can you do country_stats2['location'].value_counts() so we can see?

snow helm
#

Quite possibly the list is like 10k rows

#

sure

#

Looks like there exists at least 1 with a count of 1

serene scaffold
#

and here I was, thinking that the Solomon Islands were innocent

snow helm
#

ahaha

serene scaffold
#

so, you'll have to drop rows for countries with only one row

snow helm
#

I dont think you can even take a proper CC with 1 row of data anyways

#

Not enough data points

serene scaffold
#

yeah, that's what the error from before was telling you lemon_sweat

snow helm
#

sorry im a newb 😦

serene scaffold
#

that's okay 🙂

snow helm
#

how exactly would i drop the guilty solomon islands from my list?

#

or those with count of 1

serene scaffold
#

I'm reading a book about neural networks right now in another tab, so I'm someone else's noob.

snow helm
#

From the looks of a grouping by count on my own it looks like there might be more with just 1 instance so ill have to eliminate all of those somehow

#

Would it be something like this?

df = df[df.groupby('ID').ID.transform(len) > 1]
#

Found that on stack

serene scaffold
#

thinking

#

I mean you can try that 😄 and then just chain .groupby("location").apply(lambda d: pearsonr(d['people_fully_vaccinated_per_hundred'], d['new_deaths_smoothed_per_million'])) onto it, without writing over any variables

snow helm
#

i mean it looks like it possibly worked?

serene scaffold
#

what was the result

snow helm
serene scaffold
#

wooo

snow helm
#
country_stats2 = country_stats2[country_stats2.groupby('location').location.transform(len) > 1]
serene scaffold
#

though you want to be careful about writing over df variables, since you lose whatever it was before

snow helm
#

Would it work if I kept the df variables?

#

Because I named my data frame as something else

#

As shown with 'country_stats2'

serene scaffold
#

depends on what calculations you're doing

snow helm
#

Ah

#

So it looks like it runs now the line you provided as well with lambda but I'm not seeing where my coefficients are exactly

#

Or I might just be clueless 🤣

serene scaffold
#

dataframe operations mostly create new dataframes, rather than modifying existing ones. so if you don't print it, it's just gone

snow helm
#

oh so I would have to print right away

serene scaffold
#

yes, or give it to a variable

snow helm
#

Now im trying to figure out which one is the CC now 🤣

serene scaffold
#

is this everything you ever wanted?

snow helm
#

I would think the CC would be just one number but i guess its of each column

serene scaffold
#

the left value is the CC

snow helm
#

And what would the right represent

#

oh wait

#

CC is between -1 and 1

#

im a bozo

#

The last part I just need to accomplish now is to put all of the CC's in a 1d array to take stats of it
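
The steps worked out in this exchange (drop countries with fewer than two rows, compute one coefficient per country, collect the coefficients for summary statistics) can be sketched as follows. This is a minimal illustration on made-up numbers, and it swaps in pandas' own `Series.corr` (Pearson by default) for `scipy.stats.pearsonr`, so each group yields a single number rather than a (coefficient, p-value) pair.

```python
import pandas as pd

x_col = "people_fully_vaccinated_per_hundred"
y_col = "new_deaths_smoothed_per_million"

# Toy data (invented): country "C" has a single row, like the case found above.
country_stats = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B", "C"],
    x_col: [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 5.0],
    y_col: [2.0, 4.0, 6.0, 3.0, 2.0, 1.0, 0.5],
})

# Keep only countries with at least two rows; a correlation needs two points.
group_sizes = country_stats.groupby("location")["location"].transform("size")
enough_rows = country_stats[group_sizes > 1]

# One Pearson coefficient per country, returned as a Series indexed by location.
coefficients = (
    enough_rows.groupby("location")[[x_col, y_col]]
    .apply(lambda d: d[x_col].corr(d[y_col]))
)

values = coefficients.to_numpy()  # 1-D array of coefficients
print(coefficients.describe())    # count, mean, quartiles (median is the 50% row)
```

Assigning the filtered frame to a new name (`enough_rows`) keeps the original `country_stats` intact, which is the concern raised earlier about overwriting DataFrame variables.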

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @rich kestrel until <t:1645384841:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

serene scaffold
#

oh boy

snow helm
#

Hmmm so I gotta find a way to iterate each country and just pull the left number

serene scaffold
#

lambda d: pearsonr(d['people_fully_vaccinated_per_hundred'], d['new_deaths_smoothed_per_million']) is the lambda

snow helm
#

Looks a lot nicer as well

serene scaffold
#

YAY

#

and no solomon island

snow helm
#

No solomon island indeed!

#

got my median number available to me as well with .describe()

#

and now just gotta notch box plot and i am done 😃

serene scaffold
#

Has anyone seen this use of a semicolon in math notation?

#

I think this is basically the point? you get a vector that shows the distance between the prediction and the desired value for each item, and then the cost is the average.

calm thicket
#

yeah, i've seen it, let me dig up what it means

serene scaffold
#

so I assume the semicolon is just separating arguments

calm thicket
#

ah right, separating variables and parameters

serene scaffold
#

instead of a comma
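
As an illustration of that reading (the book's exact formula is not shown here, so the squared distance and the symbols below are assumptions): what comes before the semicolon is the input the function is evaluated on, and what follows are the parameters being learned. The cost is then the average distance between prediction and desired value.

```latex
% x_i is the input (variable); w and b are parameters, set apart by the semicolon
\hat{y}_i = f(x_i;\, w, b)

% cost: average squared distance between prediction and desired value over n items
C(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left\lVert \hat{y}_i - y_i \right\rVert^{2}
```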

echo vigil
#

i was wondering if somebody had a good explanation of cache awareness and out of core computation and how these improve the performance of xgboost

sour edge
#

does anyone want to build an ai in a discord bot????

iron basalt
minor elbow
#

its a bad table cause they didnt label the rows and the columns, but if the rows are the predicted class and the columns are the actual class its correct

#

yeah not labelling axes is a big red flag for me

#

ah yeah the second plot is better

#

i find it easier to think of recall as "of the samples we applied the label to, what % did we label correctly" rather than in terms of TP/FN

#

oh no thats precision

#

or is it

#

gah i need to finish my coffee

#

recall is "of all the instances of label x, what proportion did our classifier actually give the label x"

#

its better for reasoning when u have more than 2 classes

stone marlin
#

This is the pic I always think about.

Precision is: "I said a lot of things were positives... but what percent were true positives?"
Recall is: "I got a lot of positives... but how many positives did I get out of the total possible?"
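
The two definitions above can be checked on a small example. A minimal sketch in plain Python with made-up labels; computing both values per label is what makes the definitions carry over to more than two classes:

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one label, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == label and t == label)
    fp = sum(1 for t, p in pairs if p == label and t != label)
    fn = sum(1 for t, p in pairs if p != label and t == label)
    # precision: of everything we labelled `label`, how much really was `label`
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: of everything that really was `label`, how much did we find
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["cat", "cat", "cat", "cat", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "dog"]

print(precision_recall(y_true, y_pred, "cat"))  # (1.0, 0.5)
print(precision_recall(y_true, y_pred, "dog"))  # (0.5, 1.0)
```

Every "cat" prediction is right (precision 1.0) but half the cats are missed (recall 0.5). For "dog" it is the other way round.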

upper spindle
#

how do i join these two columns together, so there is another column

#

so the two columns are side by side along with the date*

spark urchin
#

So I'm curious about generating unique sentences using AI, but I'm not sure where to start. My only requirement is it should be able to use a list like this:
```py
self.responses = [
    "good",
    "fine, how are you?",
    "understandable, have a great day",
    "Reddit is wonderful",
    "Python is better then JavaScript and C++",
    "The dev cat speel :p",
    "Hi guys! Today’s going great! ᕕ( ᐛ )ᕗ",
    "c:"
]
```
What do you guys recommend for making an AI?

serene scaffold
serene scaffold
arctic wedgeBOT
#

```py
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
```
Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
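
A minimal sketch of an index-based join for the "two columns side by side with the date" case. The frames, column names, and values below are hypothetical, since the original data was only shown in a screenshot:

```python
import pandas as pd

dates = pd.to_datetime(["2022-02-01", "2022-02-02", "2022-02-03"])

# Two single-column frames that share the same date index (made-up values).
prices = pd.DataFrame({"close": [10.0, 10.5, 10.2]}, index=dates)
sentiment = pd.DataFrame({"sentiment": [0.3, -0.1, 0.6]}, index=dates)

# join aligns rows on the index, so matching dates end up on the same row
combined = prices.join(sentiment)
combined.index.name = "date"

# turn the index back into an ordinary column: date | close | sentiment
side_by_side = combined.reset_index()
print(side_by_side)
```

If the date is a regular column rather than the index, `set_index("date")` on both frames first gives the same alignment.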
spark urchin
#

ok, so I've got something like this:
```py
[('that', 'stupid', 'fan', 'in', 'standingpad'),
 ('stupid', 'fan', 'in', 'standingpad', 's'),
 ('fan', 'in', 'standingpad', 's', 'raspberry'),
 ('in', 'standingpad', 's', 'raspberry', 'pi'),
 ('standingpad', 's', 'raspberry', 'pi', 'won'),
 ('s', 'raspberry', 'pi', 'won', 't'),
 ('raspberry', 'pi', 'won', 't', 'shut'),
 ('pi', 'won', 't', 'shut', 'up'),
 ('i', 'm', 'trying', 'to', 'study'),
 ('m', 'trying', 'to', 'study', 'for'),
 ('trying', 'to', 'study', 'for', 'a'),
 ('to', 'study', 'for', 'a', 'math'),
 ('study', 'for', 'a', 'math', 'test'),
 ('will', 'you', 'shut', 'up', 'man'),
 ('can', 'you', 'shut', 'up', 'for'),
 ('you', 'shut', 'up', 'for', '5'),
 ('shut', 'up', 'for', '5', 'minutes'),
 ('may', 'you', 'please', 'stop', 'asking'),
 ('if', 'you', 'won', 't', 'shut'),
 ('you', 'won', 't', 'shut', 'up'),
 ('won', 't', 'shut', 'up', 'i'),
 ('t', 'shut', 'up', 'i', 'll'),
 ('shut', 'up', 'i', 'll', 'just'),
 ('up', 'i', 'll', 'just', 'stop'),
 ('i', 'll', 'just', 'stop', 'here'),
 ('for', 'goodness', 'sake', 'let', 'me'),
 ('goodness', 'sake', 'let', 'me', 'watch'),
 ('sake', 'let', 'me', 'watch', 'youtube'),
 ('let', 'me', 'watch', 'youtube', 'c'),
 ('angry', 'garbling', 'noises', 'someone', 'stole'),
 ('garbling', 'noises', 'someone', 'stole', 'my'),
 ('noises', 'someone', 'stole', 'my', 'muffins')]
```

#

but how would I work with this?

serene scaffold
#

anyway, you also need 4-grams for the same sentences. once you've made the 4-grams, pick one randomly, and then pick a 5-gram that's the same except for the last token (since the 5-gram has an extra token on the end)

spark urchin
#

I just used the nltk library

serene scaffold
#

@spark urchin here's an example from an ngram generator I made a few years ago

#

using 3-grams and 4-grams

#

you have to count (n-1)grams and ngrams. or n and (n+1)grams, I guess, depending on how you think of it. two consecutive integers >2

#

and you pick one of the (n-1)grams that you have, and then pick an ngram that completes it

#

and then step forward by one token/gram and continue.

#

does that help?

#

in this example, "They fought with weapons" and "They fought with them" are sequences of 4 tokens that also appear in the data

#

but we randomly select one so that we can continue.

spark urchin
#

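so for selecting, something like this?:
```py
import random
from nltk.util import ngrams

def mood_response(self):
    # pick one stored response and split it into 4-grams and 3-grams
    rand_response = random.choice(self.responses)

    self.out4 = list(ngrams(rand_response.split(), 4))
    self.out3 = list(ngrams(rand_response.split(), 3))

    # pick one 4-gram and one 3-gram at random
    rand_response4 = random.choice(self.out4)
    rand_response3 = random.choice(self.out3)
```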
serene scaffold
#

default palette. they just need to be distinct.

#

it's black on my screen and then exports as white.

spark urchin
#

so I have an output like: slams head on CPU | head on CPU

serene scaffold
#

and then to start making a sentence, you randomly pick an (n-1)gram

spark urchin
#

so in this case, a 3gram?

serene scaffold
#

and then you randomly pick an ngram where the first n-1 grams are the same

#

this means that the first n grams will be an actual sequence of grams from the corpus.

#

does this much make sense? so far, you've just copied an ngram from the corpus, basically.

spark urchin
#

I don't get it

serene scaffold
#

okay, well I'll continue with the explanation

#

the window then slides forward

#

so now you're looking at (n-1) grams again

#

see how for the red section, it's an (n-1) gram of ("they", "fought", "with")?

spark urchin
#

yeah

serene scaffold
#

and then the next n-1gram is ("fought", "with", "the")?

spark urchin
#

ok

serene scaffold
#

each time, you're completing the current (n-1)gram with an ngram.

#

and then you move forward by one gram and do it again.

spark urchin
#

ahh

#

so something like this?:
```py
loop:
    rand = select_rand(ngrams)
    next_ngram = rand + 1
    merge(rand, next_ngram)
    continue
```

serene scaffold
#

you need a collection of (n-1)grams and ngrams. two separate lists/sets/whatever

#

there are guides for how to make them online.

spark urchin
#

I do have a set of 4grams and 3 grams

serene scaffold
#

but this is basically the original system for sentence generation.

spark urchin
#

so it's more like?:
```py
loop:
    ngram_3 = select_rand(ngrams)
    ngram_4 = get_ngram_4(ngram_3)
    merge(ngram_3, ngram_4)
    continue
```

serene scaffold
#

something like that, I guess

#

if you want to do it the simpler way, where you don't have to account for sentences ending, you can just stop after you reach a certain number of grams
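
The procedure described in this exchange (pick a random (n-1)-gram, complete it with an n-gram that starts with the same tokens, slide the window forward by one token, stop after a fixed number of tokens) can be sketched in plain Python. This is an illustrative version, not the generator mentioned above. The function names and the tiny corpus are made up, and it assumes `n >= 2` and whitespace tokenisation.

```python
import random
from collections import defaultdict

def make_ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(sentences, n):
    """Collect the (n-1)-grams and, for each one, the tokens that complete it to an n-gram."""
    starts = []
    continuations = defaultdict(list)
    for sentence in sentences:
        tokens = sentence.lower().split()
        starts.extend(make_ngrams(tokens, n - 1))
        for gram in make_ngrams(tokens, n):
            continuations[gram[:-1]].append(gram[-1])
    return starts, continuations

def generate(sentences, n=4, max_tokens=12, rng=None):
    rng = rng or random.Random()
    starts, continuations = build_model(sentences, n)
    if not starts:
        return ""
    output = list(rng.choice(starts))        # random (n-1)-gram to begin with
    while len(output) < max_tokens:          # simple stop rule: fixed length
        window = tuple(output[-(n - 1):])    # current (n-1)-gram
        options = continuations.get(window)
        if not options:                      # no n-gram completes this window
            break
        output.append(rng.choice(options))   # complete it, which slides the window
    return " ".join(output)

corpus = [
    "they fought with weapons made of stone",
    "they fought with them for many years",
    "we fought with the others",
]
print(generate(corpus, n=4, max_tokens=8, rng=random.Random(0)))
```

Because every appended token completes a window seen in the corpus, each run of `n` consecutive output tokens is an n-gram that actually occurs in the data. New sentences arise only where two corpus sentences share an (n-1)-gram.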

spark urchin
#

ok, thanks

misty flint
#

oh stelercus is here

#

i have an NLP question if youre interested

#

so we are trying to find out synonyms for words in this one domain

#

my initial idea is to find out the word embeddings of the words that we need synonyms for

#

so that we can figure out some sort of cosine similarity score between words that are synonyms or not
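
A minimal sketch of the cosine-similarity idea, assuming the embeddings are already available as plain lists of floats. The three-dimensional vectors and the terms below are invented for illustration; real embeddings would come from whichever (ideally domain-tuned) model is chosen.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(word, embeddings, top_k=2):
    """Rank all other words by cosine similarity to `word`."""
    scores = [
        (other, cosine_similarity(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

# Hypothetical toy embeddings, not taken from any real model.
embeddings = {
    "myocardial_infarction": [0.9, 0.1, 0.3],
    "heart_attack": [0.85, 0.15, 0.35],
    "fracture": [0.1, 0.9, 0.2],
}

print(most_similar("heart_attack", embeddings))
```

A threshold on the score would still have to be tuned by hand, since a high cosine similarity signals relatedness in general and not synonymy specifically.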

#

then i thought, hmm, this domain is unique enough where it would be better to use a model fine-tuned with words from this domain beforehand

#

@serene scaffold what would be your approach?

#

if youre busy, feel free to ignore

#

just curious

serene scaffold
misty flint
#

cool cool

#

im glad our initial ideas are the same

#

ill look more into it just wanted to make sure im going down the right path

#

thanks

serene scaffold
#

though one dilemma with word embeddings is that if there are homographs for that word, it's all the same

#

are you trying to find synonyms in context, or just in general? @misty flint

#

if you're looking for synonyms in context, one approach is to use BERT. though the top prediction from BERT might be something uninteresting, like a pronoun, or a word of a different part of speech that happens to also form a coherent result.

misty flint
#

theres a biomedical BERT that might give us better results

#

but i would have to test ir

#

it

serene scaffold
#

I used bioBERT in my paper

#

am I famous now?

#

🥺

misty flint
#

oh shoot

#

famous stelercus

serene scaffold
#

I think I also have a trivial commit in their repo

misty flint
#

you are def famous

#

in my world

serene scaffold
#

thx bb 💚

misty flint
serene scaffold
#

aww

#

I love kitties

misty flint
#

its one of my favorite emojis

serene scaffold
#

I wrote it like two years ago. I think it works.

misty flint
#

ill def look into it and try a few things

#

see what works best for our use case

#

if i end up using your repo, ill def reference you

iron basalt
compact parrot
#

(I hope it's the most relevant channel)
I am trying to implement fact searching, but I am stuck on finding facts in my language
So, would it be ok to translate the text to English, use English solutions, and translate the facts back to my language, or would the probability of mistakes be too high? 🤔

exotic thicket
#

Hello peeps, would someone mind explaining radiance and irradiance in computer vision? I searched the internet a lot but didn't get the idea.

iron basalt
# exotic thicket Hello, peps would someone mind interpreting radiance and irradiance in computer ...

In radiometry, radiance is the radiant flux emitted, reflected, transmitted or received by a given surface, per unit solid angle per unit projected area. Spectral radiance is the radiance of a surface per unit frequency or wavelength, depending on whether the spectrum is taken as a function of frequency or of wavelength. These are directional qu...

#

In radiometry, irradiance is the radiant flux received by a surface per unit area. The SI unit of irradiance is the watt per square metre (W⋅m−2). The CGS unit erg per square centimetre per second (erg⋅cm−2⋅s−1) is often used in astronomy. Irradiance is often called intensity, but this term is avoided in radiometry where such usage leads to con...

#

In geometry, a solid angle (symbol: Ω) is a measure of the amount of the field of view from some particular point that a given object covers. That is, it is a measure of how large the object appears to an observer looking from that point.
The point from which the object is viewed is called the apex of the solid angle, and the object is said to s...

round cove
#

or an API for that

iron basalt
#

Radiance - "Radiant flux emitted, reflected, transmitted or received by a surface, per unit solid angle per unit projected area. This is a directional quantity."

#

Irradiance - "Radiant flux received by a surface per unit area."

#

"Radiance is useful because it indicates how much of the power emitted, reflected, transmitted or received by a surface will be received by an optical system looking at that surface from a specified angle of view."
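
The two quoted definitions can be written as formulas. The symbols are assumptions chosen here, not taken from the quotes: $\Phi$ is radiant flux, $A$ is surface area, $\omega$ is solid angle, and $\theta$ is the angle between the surface normal and the direction considered.

```latex
% Irradiance: flux received per unit area
E = \frac{\mathrm{d}\Phi}{\mathrm{d}A}

% Radiance: flux per unit solid angle per unit projected area
L = \frac{\mathrm{d}^2\Phi}{\mathrm{d}A \,\cos\theta \,\mathrm{d}\omega}

% Irradiance is incoming radiance summed over the hemisphere above the surface
E = \int_{\Omega} L_i(\omega)\,\cos\theta \,\mathrm{d}\omega
```

In other words, irradiance only says how much light lands on a patch, while radiance also says from which direction it travels.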

river maple
#

please help me solve this

lapis sequoia
#

Which would be a place to draw our NNs to put in report or something?

upper spindle
lapis sequoia
upper spindle
rigid summit
#

Hello all
I am working on a time series LSTM problem

I have numeric data and also categorical data (weekdays, which I one-hot encoded). How do I combine these to create a sequence for my prediction? I want to predict the numerical values
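
One common way to do this, sketched under assumptions: put the numeric value and the seven one-hot weekday columns side by side as the features of each time step, then cut sliding windows so the input has the `(samples, timesteps, features)` shape an LSTM layer expects. The helper name `make_windows` and the window length of 5 are hypothetical.

```python
import numpy as np


def make_windows(numeric, weekday_onehot, timesteps):
    """Stack features per time step, then cut sliding windows.

    numeric:        shape (n,) or (n, 1), the series to predict
    weekday_onehot: shape (n, 7)
    Returns X of shape (samples, timesteps, features) and y of shape (samples,).
    """
    numeric = np.asarray(numeric, dtype=float).reshape(len(numeric), -1)
    features = np.hstack([numeric, np.asarray(weekday_onehot, dtype=float)])
    n_windows = len(features) - timesteps
    X = np.stack([features[i:i + timesteps] for i in range(n_windows)])
    y = numeric[timesteps:, 0]  # the numeric value right after each window
    return X, y


n = 20
values = np.arange(n, dtype=float)        # stand-in for the real numeric column
weekdays = np.eye(7)[np.arange(n) % 7]    # stand-in for the one-hot weekdays
X, y = make_windows(values, weekdays, timesteps=5)
```

Here `X` has 8 features per step: 1 numeric plus 7 weekday indicators.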

lapis sequoia
upper spindle
#

I executed the code combined = data.join(df1) where data and df1 are my dataframes

#

even when i try to tweak it, using how='right' to place it on the right, it still gives me a column of NaN values

lapis sequoia
#

show that code

#

hm btw i think you would need merge

#

since cols are different

upper spindle
#

i tried merge

#

i think, the issue is that, i cant get rid of the date column

lapis sequoia
#

not sure why stel suggested join

#

they have same vals right?

#

and you want to join in terms of their vals?

#

(the two cols)

upper spindle
#

yeye, i want to append the two columns so theyre side by side

#

and by date

upper spindle
#

i think the date column is an issue
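
A likely cause, though the real frames aren't shown: `DataFrame.join` lines rows up by index, not by a `Date` column. If one frame is indexed by date and the other by 0, 1, 2, ..., nothing matches and the joined column is all NaN. A minimal sketch with invented data and column names:

```python
import pandas as pd

# Hypothetical data: prices indexed by date, sentiment with a plain Date column.
prices = pd.DataFrame(
    {"close": [130.08, 131.20, 129.50]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
sentiment = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "sentiment": [0.21, -0.05, 0.40],
})

# join() aligns on the index: dates on the left never match 0, 1, 2 on the right.
misaligned = prices.join(sentiment[["sentiment"]])

# Give both frames the same datetime index first, then join.
sentiment_by_date = (
    sentiment.assign(Date=pd.to_datetime(sentiment["Date"])).set_index("Date")
)
aligned = prices.join(sentiment_by_date)
```

The other route is to keep `Date` as a column in both frames and use `merge(on='Date')`.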

lapis sequoia
#

so you want something like 130.08+0.21 something for date 2020....

upper spindle
#

yeye

rigid summit
#

Hello all
I am working on a time series LSTM problem

I have numeric data and also categorical data (weekdays, which I one-hot encoded). How do I combine these to create a sequence for my prediction? I want to predict the numerical values

upper spindle
#

in the same df

#

if that makes sense

lapis sequoia
#

then add both cols simply.
df['new_col'] = df.a + df.b

upper spindle
#

hmm, let me give it a try

lapis sequoia
#

I'll mess in my lab hold on

upper spindle
#

sure thing, no probs

lapis sequoia
#

!e

import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'A': np.arange(6) + 1})
df2 = pd.DataFrame({'Date': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'B': np.arange(6) + 3})
df3 = df.merge(df2, on='Date')
df3['C'] = df3.A+df3.B
print(df3)
arctic wedgeBOT
#

@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.

001 |   Date  A  B   C
002 | 0   K0  1  3   4
003 | 1   K1  2  4   6
004 | 2   K2  3  5   8
005 | 3   K3  4  6  10
006 | 4   K4  5  7  12
007 | 5   K5  6  8  14
lapis sequoia
#

done

#

you can get rid of A and B ofc

upper spindle
#

thanks, ill give it a try now aha

serene scaffold
upper spindle
#

dammit

#

is there a way to fill the missing values with the value from before? because i scraped the data it would be hard, as there isn't a row that can replace the missing data

shell depot
#

Hey,
is there anyone who worked with healthcare data standards like HL7 V2 or FHIR ?

serene scaffold
#

@upper spindle look into interpolation
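
A small sketch of what that looks like in pandas, using an invented series with gaps. `interpolate` draws a straight line between the known neighbours, while `ffill` simply repeats the last known value.

```python
import numpy as np
import pandas as pd

# Invented daily series with missing values.
s = pd.Series(
    [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

linear = s.interpolate(method="linear")  # fills 2.0, then 4.0 and 5.0
carried = s.ffill()                      # fills 1.0, then 3.0 and 3.0
```

Which one is appropriate depends on the data; carrying the last value forward never uses information from later rows.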

serene scaffold
shell depot
#

actually they are two :

  1. How can I convert some proper data to FHIR standard ?
  2. How can I transform an HL7 V2 data to FHIR data ?
lapis sequoia
waxen girder
#

I'm working with text data, specifically I'm trying to make sure links have been generated correctly based off of record names. I have extracted the relevant part of the link to check. Now I'm dealing with the issue of trying to understand how the programmer sanitized the name before generating the link. For example, in one instance he removed all commas in the name before generating the link. Should I bother trying to reverse engineer what he did? There seems to be a lot of edge cases. I'm thinking of using more of a heuristic and just split the name and check to see if each word in the name is in the link. And if that threshold is > 50% say the link is correctly generated.
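
The heuristic described above could be sketched like this. The function name, the word-splitting regex and the example names are assumptions for illustration; only the "more than 50% of the words appear in the link" rule comes from the message.

```python
import re


def link_matches_name(name, link, threshold=0.5):
    """True if more than `threshold` of the name's words occur in the link."""
    words = re.findall(r"[a-z0-9]+", name.lower())  # drops commas, ampersands, etc.
    if not words:
        return False
    link_lower = link.lower()
    hits = sum(word in link_lower for word in words)
    return hits / len(words) > threshold


ok = link_matches_name("Smith, John & Sons", "/records/smith-john-sons")
bad = link_matches_name("Acme Corp", "/records/smith-john-sons")
```

Because this is a substring check, short words such as "a" or "of" will match almost any link, so it may be worth ignoring very short words.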

waxen girder
#

I think this is working.

short heart
#

What does lstm len mean in bilstm, pytorch?

agile cobalt
#

might be long short term memory?

neat anvil
short heart
neat anvil
#

I don’t see that variable in the tutorial I linked, so I’m not sure what you are referring to.

snow helm
#
dict = {}

def alphaToInt256(x):
    ascii_values = [ord(char) for char in x]
    count = 0
    tracker = 0
    intValue = 0
    for i in ascii_values:
        count = count + 1
    for i in ascii_values:
        i = (i * 256 ** (count - 1))
        ascii_values[tracker] = i
        tracker = tracker + 1
        count = count - 1
    for i in ascii_values:
        intValue = intValue + i
    return intValue
    
numerical_code = list(map(alphaToInt256, iso_codes))
#print(numerical_code)

total_dpm = data.groupby("iso_code", as_index=False)[["total_deaths_per_million"]].max()
#print(total_dpm)

for i in range(len(numerical_code)):
    dict[numerical_code[i]] = total_dpm.iloc[i,1]

print(dict)

print("\nQ1 - nanquantile of dict : ", np.nanquantile(dict.values(), .25))
#

Just trying to take the values within my dictionary and get the nanquantiles of them, would anyone be able to provide some assistance?

#

Just having trouble accessing the values themselves since they dont support indexing

#

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
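
The error most likely comes from passing `dict.values()` straight to NumPy: it is a view object, so NumPy wraps it as a single object-dtype element that `isnan` cannot handle. Converting to a float array first avoids that. A minimal sketch with made-up keys and values:

```python
import numpy as np

# Made-up stand-in for the real dictionary of numeric code -> deaths per million.
deaths_by_code = {4280147: 12.5, 4279620: np.nan, 4279876: 40.0, 4281423: 7.5}

# Materialise the values as a float array before handing them to NumPy.
values = np.array(list(deaths_by_code.values()), dtype=float)
q1 = np.nanquantile(values, 0.25)  # NaN entries are ignored
```

Separately, naming a variable `dict` hides the built-in type, so a name like `deaths_by_code` is safer.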

maiden sundial
#

For code below I am getting result
df.groupby(["Outlet_Establishment_Year","Item_Type","Item_Outlet_Sales"]).max().sort_values(by=['Item_Outlet_Sales'])

But for below code I am getting "sort_values() got an unexpected keyword argument 'by'
df.groupby(["Outlet_Establishment_Year","Item_Type","Item_Outlet_Sales"]).size().sort_values(by=['Item_Outlet_Sales'])

Why sort_values is not working with size()?
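
The difference is in the return type: `.max()` on a groupby gives a DataFrame, whose `sort_values` takes `by`, while `.size()` gives a Series, whose `sort_values` has no `by` argument because there is only one column of values to sort. A small sketch with invented rows:

```python
import pandas as pd

# Invented rows using two of the column names from the question.
df = pd.DataFrame({
    "Outlet_Establishment_Year": [1999, 1999, 1999, 2004, 2004, 2009],
    "Item_Type": ["Dairy", "Dairy", "Snacks", "Dairy", "Dairy", "Snacks"],
})
group_cols = ["Outlet_Establishment_Year", "Item_Type"]

counts = df.groupby(group_cols).size()   # a Series, not a DataFrame
sorted_counts = counts.sort_values()     # no `by` needed or accepted

# To sort with `by`, turn it back into a DataFrame with a named count column.
counts_df = counts.reset_index(name="n_rows").sort_values(by="n_rows")
```

The column name `n_rows` is arbitrary.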

lapis bramble
#

can someone suggest a good course for SQL on youtube

serene scaffold
serene scaffold
serene scaffold
#

or the .size() one.

lapis bramble
#

freecodecamp has one but it is just a beginner course

serene scaffold
#

in either case, this is the data science channel. try asking in #databases

#

if you do end up finding a resource you like that meets your expectations, let me know in #community-meta so I can see about getting it on our website.

serene scaffold
misty flint
#

update:

#

IBM Watson is pretty decent at NLP

#

however, there are some serious deficiencies when applied to our domain

#

i think its just the nature of the data it was trained on aka probably not enough from our domain

serene scaffold
#

IBM Watson 🤮

misty flint
#

bro

#

apparently its super overpriced too

serene scaffold
#

I just don't like how pretentious IBM is, and how they're still coasting on the publicity that their Watson brand got for the success of a specific algorithm that's now a decade old and probably not even used.

misty flint
#

yeahhh

#

like

#

i feel like, at least for our use case, i could probably fine-tune GPT3 and it would probs perform better

serene scaffold
#

can you fine-tune gpt3? I thought there's only one instance of gpt3 and it's behind an api paywall

misty flint
#

but idk how much we have in our budget to try something new and i think they charge per token

#

this FEELS expensive

serene scaffold
#

but I think you can train your own gpt2, or something? (I'm mostly about information extraction so I'm not up-to-date on GPT-\d)

misty flint
#

but i dont actually know

misty flint
misty flint
#
OpenAI

OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.

serene scaffold
#

OpenAI
Pricing
Open?

misty flint
#

💀

#

ikr

neon heart
#

Why is my fillna method not working?

geo_AL_fips["BusinessType"] = geo_AL_fips["BusinessType"].fillna('N/A')

It works and then I write to csv and check the dataframe again and its back to nan

serene scaffold
#

well for one thing, why do you want to fill them with N/A?

#

because that's just a worse version of NaN, in many ways.

neon heart
serene scaffold
#

are you trying to make sure that NaN values are written a certain way when you call to_csv?

neon heart
serene scaffold
#

what is geo_AL_fips["BusinessType"].unique() intended to do?

neon heart
#

but when I read it in again it's back to nan. I just need it to read 'N/A' for a js application

serene scaffold
#

If all you're trying to do is write the null values in the CSV file as N/A, you can just do this

path = "data/state_data/geo/geo_fips/AL_fips_scraped.csv"
pd.read_csv(path).to_csv(path, na_rep='N/A')
neon heart
#

Yes but I only need it for this one column.

#

If it's just going to do that because thats how it interprets N/A I'll just come up with something else to change it to.

#

But that may come in handy in the future as well, thank you.

serene scaffold
#
path = "data/state_data/geo/geo_fips/AL_fips_scraped.csv"
df = pd.read_csv(path)
df['BusinessType'].fillna('N/A', inplace=True)
df.to_csv(path)
neon heart
#

Still does the same thing. fills with N/A, I re-write file, read it back in and it is back to nan.
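
What is probably happening: the file really does contain the text N/A, but `pd.read_csv` treats that string as one of its default missing-value markers and turns it back into NaN on the way in. Passing `keep_default_na=False` switches that behaviour off. A sketch with a made-up two-row frame and an in-memory buffer instead of the real file:

```python
import io

import pandas as pd

# Made-up frame standing in for the scraped data.
df = pd.DataFrame({"fips": [1001, 1003], "BusinessType": ["Retail", None]})
df["BusinessType"] = df["BusinessType"].fillna("N/A")

buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default parsing: the text N/A is on pandas' list of NaN markers.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# With the default list switched off, N/A stays a plain string.
buffer.seek(0)
literal_read = pd.read_csv(buffer, keep_default_na=False)
```

Note that `keep_default_na=False` applies to every column, so if other columns still need certain markers read as NaN, list them explicitly with `na_values`.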

serene scaffold
#

oh. can you see what's missing from my to_csv call? I included it earlier.

#

actually, hmm

#

well, I need to focus on work. sorry.

neon heart
#

No problem, thanks.

minor elbow
#

what does df.BusinessType.dtype say

#

if u want to store a literal 'N/A' it should be type object

vernal haven
#

i am noob looking for fun learning project based on lottery numbers draw history results, not sure what to do with csv of 100's of lotto results, any ideas please?! 🙂

minor elbow
#

or that they fit uniform distribution

vernal haven
#

benfords law... ok I will google this ty!, I was thinking about 'popularity' of numbers, or trying to find a recurring pattern! this is more analysis i think, not ML which was my first thought ha ha!

#

(like i say i am beginner to data science/ ML!) still got long way to go!

gray fossil
#

Ik you all must have heard this question several times but pls bear with me. I am done with the andrew ng course on machine learning and now I am feeling lost on how to actually code ml and ai, or what tutorials or resources to follow. could anyone pls help?

neon heart
minor elbow
#

u can convert it with like geo_AL_fips["BusinessType"] = geo_AL_fips["BusinessType"].fillna('N/A').astype('object')

#

although i woulda thought that happened automatically

neon heart
#

I ended up changing it to a different string variable but I was thinking that's the case. Edit: Was already obj. type

minor elbow
vernal haven
vernal haven
neat anvil
neat anvil
# gray fossil Ik you all must have heard this question several times but pls bare with me. I a...

The big libraries all have excellent tutorials for how to apply ML/AI for various datasets. See https://scikit-learn.org/stable/tutorial/index.html as an example. PyTorch, tensorflow, xgboost all have good introductions as well.

#

Writing a basic neural network “from scratch” (meaning just numpy functions) is also a good introductory project

lapis sequoia
#

yo is it possible to program a gf for myself?

odd meteor
#

!e

print('Hello World')
odd meteor
minor elbow
serene scaffold
#

The bot won't recognize it as a command if you edit it. It has to be a command the first time

#

!echo spam spam spam spam

arctic wedgeBOT
#

spam spam spam spam

serene scaffold
#

There's no particular reason why I did that.

odd meteor
#

!e

import pandas as pd
import numpy as np

df = pd.DataFrame({'Languages' : ['Lingala', 'Swahili', 'Twi', 'Yoruba'], 'Country': ['Congo', 'South Africa', 'Ghana', 'Nigeria']}) 

print(df) 
arctic wedgeBOT
#

@odd meteor :white_check_mark: Your eval job has completed with return code 0.

001 |   Languages       Country
002 | 0   Lingala         Congo
003 | 1   Swahili  South Africa
004 | 2       Twi         Ghana
005 | 3    Yoruba       Nigeria
odd meteor
serene scaffold
#

@odd meteor this significantly understates the linguistic diversity of Africa. And the cultural irrelevance of its borders.

odd meteor
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @exotic epoch until <t:1645487073:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

serene scaffold
#

rekt

lapis sequoia
serene scaffold
#

> having standards

untold belfry
#

Thanks again for your help over the last few days, @serene scaffold.
Improved my program a bit more and changed to using lists and later tuples to store my results.

Now I got a program which compares automatically columns by name for me. 😄

#

I mean, it still can and will be extended, but the core function does work very well already.

#

The main changes I will add from now on, are mostly only some kind of splitting the data and maybe add a renaming algorithm for very similar, but not equally named rows.

inland zephyr
#

I want one CNN block (FrontBlock) to share the same output with two other blocks (BuildingBlock and BuildingBlock_Two), then combine their results at the end with a multiply (EndModel). However, it seems my model just splits into BuildingBlock and BuildingBlock_Two, each with its own copy of the FrontBlock structure. This is how I create my model

def FrontBlock():
    input_front = Input(shape=(160, 160, 3))
    x = Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(input_front)
    x = Conv2D(filters=128, kernel_size=(3, 3), activation='relu')(x)
    out = Conv2D(filters=128, kernel_size=(3, 3), activation='relu')(x)
    HeadBlock = Model(input_front,out)
    return HeadBlock
def BuildingBlock():
    hdr = FrontBlock()
    x = Conv2D(filters=64,kernel_size=(3,3),activation='relu')(hdr.output)
    x = Conv2D(filters=64,kernel_size=(3,3),activation='relu')(x)
    x = BatchNormalization()(x)
    x = AveragePooling2D(pool_size=(3,3))(x)
    x = Flatten()(x)
    out = Dense(512, 'relu')(x)
    MiddleBlock = Model(hdr.inputs,out)
    return MiddleBlock
def BuildingBlock_Two():
    hdr = FrontBlock()
    x = DepthwiseConv2D(kernel_size=(3, 3), activation='relu')(hdr.output)
    x = Conv2D(filters=64, kernel_size=(5, 5), activation='relu')(x)
    x = MaxPooling2D(pool_size=(3, 3))(x)
    x= Flatten()(x)
    out = Dense(512, 'relu')(x)
    MidModel = Model(hdr.inputs,out)
    return MidModel
def EndModel():
    mid_one = BuildingBlock()
    mid_two = BuildingBlock_Two()
    x = Multiply()([mid_one.output,mid_two.output])
    out = LeakyReLU()(x)
    TailModel = Model([mid_one.inputs,mid_two.inputs],out)
    return TailModel
#

and this is when i call these line:

MComb = EndModel()
MComb.compile(optimizer='Adam')
plot_model(MComb, to_file=r'C:\Users\Admin\Pictures\modelo.png', show_shapes=True)
serene crystal
#

anyone know why matplotlibs bar.set_height doesn't do anything for me?

inland zephyr
#

it isn't stacked as i expected: the front block doesn't split its result into both branches (i can't be sure both share the same output from FrontBlock)
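
In the code above, each branch calls FrontBlock() itself, so two separate front ends get built, each with its own Input and its own weights. To share one, build the front once and feed the same output tensor into both branches. A scaled-down sketch: the function name, the smaller input size, filter counts and Dense width are assumptions chosen to keep it light, not the original architecture.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (
    AveragePooling2D, Conv2D, Dense, DepthwiseConv2D,
    Flatten, LeakyReLU, MaxPooling2D, Multiply,
)


def build_shared_model(input_shape=(32, 32, 3), units=16):
    inputs = Input(shape=input_shape)

    # Front block: created exactly once.
    x = Conv2D(8, (3, 3), activation="relu")(inputs)
    front_out = Conv2D(16, (3, 3), activation="relu")(x)

    # Branch one reads the shared front output.
    a = Conv2D(8, (3, 3), activation="relu")(front_out)
    a = AveragePooling2D(pool_size=(3, 3))(a)
    a = Dense(units, activation="relu")(Flatten()(a))

    # Branch two reads the very same tensor.
    b = DepthwiseConv2D(kernel_size=(3, 3), activation="relu")(front_out)
    b = MaxPooling2D(pool_size=(3, 3))(b)
    b = Dense(units, activation="relu")(Flatten()(b))

    out = LeakyReLU()(Multiply()([a, b]))
    return Model(inputs, out)


model = build_shared_model()
```

With this layout `plot_model` should show a single input and front block that fans out into the two branches before the Multiply.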

serene scaffold
serene crystal
serene scaffold
misty flint
#

@serene scaffold update: the VP wants me to see if we can replace Watson with GPT-3

#

but the thing is, he also told me that they dont have any FTE positions after i graduate

#

and i was like

#

welp

#

rip

#

i guess i will try my best before i leave

#

just so i can say ive done it before

#

idk

serene scaffold
pearl ice
serene scaffold
#

though this person uses str.format where I would use fstrings. and my way is better.

misty flint
serene scaffold
misty flint
#

i dont usually use the regular matplotlib but ill save your link for when i do

lapis sequoia
#

for ordinary least squares, how would I go about proving this?

#

Could someone explain or show me how?

serene scaffold
#

Interesting question, and one I should probably know the answer to. I'll see if I can investigate tomorrow.

lapis sequoia
#

Thanks @serene scaffold

rigid bronze
#

Hello plz help me
I don't know much statistics
Please suggest me some YouTube channel or any paid course that teach complete statistics for data science and ML

dusk tide
calm sequoia
#

For transforming a very large JSON dataset into a different schema should I use pyspark? I'm particularly interested since it seems very easy to use though AWS Glue

upper spindle
#

are validation sets always necessary

odd meteor
rigid summit
#

hello all

i have this array

```array([[1,2,3,4,5,6,7,8,9,0],
           [9,7,6,5,4,3,2,1,6,7],
           [7,6,5,4,3,7,8,9,0,1]])```


and i want to convert it to

```array([[1,[2,3,4,5,6,7,8,9,0]],
          [9,[7,6,5,4,3,2,1,6,7]],
          [7,[6,5,4,3,7,8,9,0,1]]])```


How can i go around it
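
A regular NumPy array can't hold rows shaped like `[1, [2, 3, ...]]` except as an object array, so the usual approach is to slice the first column away from the rest and keep two arrays. A sketch, with a plain Python list version in case the literal nested structure is needed:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
                [9, 7, 6, 5, 4, 3, 2, 1, 6, 7],
                [7, 6, 5, 4, 3, 7, 8, 9, 0, 1]])

first = arr[:, 0]   # shape (3,): the leading value of each row
rest = arr[:, 1:]   # shape (3, 9): everything after it

# The literal nested layout, as ordinary Python lists.
nested = [[int(row[0]), row[1:].tolist()] for row in arr]
```

Keeping `first` and `rest` as separate arrays is usually easier to work with downstream than the nested list.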
pastel valley
#

hi how can i plot the learning rate?

#

also why are there values like this on my metrics?

lapis sequoia
lapis sequoia
lapis sequoia
#

anyways, can you tell me which metric you want to plot? I just did it in my own notebook.

#

you can store the result of model.fit in a history variable and then use it.

pastel valley
lapis sequoia
#

we usually plot what changes with iterations, like accuracy.

pastel valley
lapis sequoia
pastel valley
#

these are the metrics i used

lapis sequoia
# pastel valley like this

so i think for this you will need to do experiment with different learning rates.
then you can plot with given data.

pastel valley
#

categoricalaccuracy for multiclass right?

lapis sequoia
#

may be tho. I'll need to check.

pastel valley
#

maybe i should manually calculate the precision and recall based on tp fp etc?

lapis sequoia
#

i can see your accuracy going up. so precision and recall should too. atleast more than 0.

acoustic halo
#

Your model isn't getting any true or false positives at all

lapis sequoia
acoustic halo
#

It looks like everything is being predicted as negative

#

And it just happens that negative is right 38% of the time

lapis sequoia
#

oh yes yes, tn and fn are all up and tp and fp are all 0.

lapis sequoia
#

@pastel valley you may wanna look at the model again then, currently it's predicting everything as negative.

#

why dont you show your model?

acoustic halo
#

There is something fundamentally wrong with your model and it's not learning anything

lapis sequoia
#

also why would you print tn, fn, tp, fp anyways since you are saying it's categorical.

acoustic halo
#

It's probably because the final layer is set up wrong, likely using a sigmoid layer for binary classifications when he wants to do something like softmax for multiclassification

forest bluff
#

how to create a python program to extract emails found within the txt file . and result in nested json format with No of times the email is repeated in the text.

lapis sequoia
forest bluff
#

@lapis sequoia

acoustic halo
#

f is undefined, probably because it can't read the file

#

specify your encoding

#

open(path, 'r', encoding='utf-8')
it might not be utf-8 though
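
For the original question (emails and how often each occurs, as JSON), a rough sketch. The regex is a simple approximation rather than a full email validator, and the output layout is only one guess at what "nested json" means; the function name and sample text are made up.

```python
import json
import re
from collections import Counter

# Simplified pattern: good enough for typical addresses, not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def count_emails(text):
    """Return {"emails": [{"address": ..., "count": ...}, ...]}, most frequent first."""
    counts = Counter(match.lower() for match in EMAIL_PATTERN.findall(text))
    return {
        "emails": [
            {"address": address, "count": count}
            for address, count in counts.most_common()
        ]
    }


sample = "Contact a@example.com or b@example.org. Again: A@example.com."
report = json.dumps(count_emails(sample), indent=2)
```

For a real file, read the text with something like `open(path, 'r', encoding='utf-8').read()` and pass it to `count_emails`.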

pastel valley
lapis sequoia
#

we assume that one class is positive and one is negative, we cannot define positive and negative if we have 5 classes.

pastel valley
#

@acoustic halo@lapis sequoia
it wasn't this bad last attempt. i just tried to input a custom learning rate rather than the default? what is the default learning rate if i did not put something there?

acoustic halo
#

It has nothing to do with your learning rate

pastel valley
pastel valley
acoustic halo
#

Your model is broken

#

You are doing multiclassification right? Your model is trying to do binary classification

pastel valley
#

hahaha damn thats worst

lapis sequoia
acoustic halo
#

Just post what your model looks like

pastel valley
#

Layer (type) Output Shape Param #

conv2d_32 (Conv2D) (None, 142, 142, 16) 448

max_pooling2d_33 (MaxPoolin (None, 71, 71, 16) 0
g2D)

conv2d_33 (Conv2D) (None, 69, 69, 32) 4640

max_pooling2d_34 (MaxPoolin (None, 34, 34, 32) 0
g2D)

conv2d_34 (Conv2D) (None, 32, 32, 64) 18496

max_pooling2d_35 (MaxPoolin (None, 16, 16, 64) 0
g2D)

dropout_11 (Dropout) (None, 16, 16, 64) 0

flatten_11 (Flatten) (None, 16384) 0

dense_40 (Dense) (None, 1024) 16778240

dense_41 (Dense) (None, 512) 524800

dense_42 (Dense) (None, 256) 131328

dense_43 (Dense) (None, 5) 1285

lapis sequoia
#

why don't you show the model? may be you fucked up your loss.

acoustic halo
#

What activation function is on your last dense layer?

pastel valley
#

softmax with 5units

acoustic halo
#

huh, that is surprising

pastel valley
#

ill try to retrain without using the learning rate of 0.007

lapis sequoia
acoustic halo
#

Can you give an example of some of the outputs of the model?

minor elbow
acoustic halo
#

Are they all exactly the same or do you get different results?

pastel valley
acoustic halo
#

0.001 is default

#

Does it always make the same prediction is what i mean?

pastel valley
#

this is the prediction of gt_model why its above 1?

acoustic halo
#

it's not above 1

pastel valley
#

o those e- hahaha

lapis sequoia
#

hm so you are applying softmax, it is converting to a probability distribution so it is doing what you are telling it to do.

acoustic halo
#

Okay, well I think your model is maybe alright then, true positive etc are probably just not a good metric for this

pastel valley
lapis sequoia
#

your expected y is like... [0,0,0,1,0..] ?

pastel valley
acoustic halo
pastel valley
#

i am retraining it now its looks ok

#

maybe the learning rate i entered is bad

#

i just now used the default

acoustic halo
#

What was it originally?

pastel valley
acoustic halo
#

yeah

lapis sequoia
#

hm i guess you trained less(or less slowly). also do not use tp tn kinda metrics here, well they are not logical for multiclass.

pastel valley
#

when i used the default optimizer='adam' i get this good result and when i tried to change the learning rate to 0.007 thats where everything is negative

lapis sequoia
pastel valley
#

btw whats this number ?

lapis sequoia
#

precision and recall..well we usually use them for 2 classes(if im not mistaken(i can be wrong))

lapis sequoia
acoustic halo
#

you can use tp/tn/fp/fn but its a bit of a pain, you have to do it treating each class individually as a binary classification problem to get the F1 score for EACH class

pastel valley
#

rows? what rows?

acoustic halo
#

Then you can use the F1 from each class to get the macro/micro f1 scores
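
The one-vs-rest idea can be sketched in plain NumPy: for each class, count tp/fp/fn as if that class were "positive" and everything else "negative", compute F1, then average the per-class scores for macro-F1. The labels below are invented.

```python
import numpy as np


def per_class_f1(y_true, y_pred, n_classes):
    """F1 for each class, treating it as positive and all others as negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


y_true = [0, 0, 1, 1, 2, 2]   # invented integer labels
y_pred = [0, 1, 1, 1, 2, 0]
f1_by_class = per_class_f1(y_true, y_pred, n_classes=3)
macro_f1 = f1_by_class.mean()  # unweighted average over classes
```

With one-hot targets and softmax outputs, take `argmax` along the class axis first to get integer labels like these.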

acoustic halo
pastel valley
#

wait wait lets talk it slowly

#

precision is the confidence of the model in predicting the right class
recall is the score of the model on how much its getting wrong prediction?

#

f1 score is?

acoustic halo
#

the harmonic mean of precision and recall, so one score that balances both (it's not the same thing as accuracy)

upper spindle
#

how could I calculate the (standard deviation)*7 for the first seven days, then the next 7 days then the 7 days after that, to get the weekly volatility from the column log returns. And put it into a dataframe, please, thanks in advance
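
One way to do this, assuming the rows are daily and in date order: number the rows, integer-divide by 7 to label each block of seven, then take the standard deviation per block. The helper name and the random data are made up; the `scale=7` default follows the question as asked.

```python
import numpy as np
import pandas as pd


def weekly_volatility(log_returns, window=7, scale=7):
    """Std of each consecutive block of `window` rows, multiplied by `scale`."""
    values = pd.Series(log_returns).reset_index(drop=True)
    week = np.arange(len(values)) // window  # 0 for rows 0-6, 1 for rows 7-13, ...
    out = values.groupby(week).std() * scale
    return out.rename_axis("week").reset_index(name="weekly_volatility")


rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, size=21), name="log_returns")  # fake data
vol = weekly_volatility(returns)
```

If the square-root-of-time convention is wanted instead, pass `scale=np.sqrt(7)`.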

pastel valley
acoustic halo
pastel valley
#

btw its doing good now model is being genius

acoustic halo
#

Yeah thats doing really well now

#

A little bit of hyperparam optimisation and you could probably bump it up another 0.5%

pastel valley
#

hyperparameter optimization is like trial and error right?

lapis sequoia
#

since this is image you can use pretrained models and add another layer on them

pastel valley
#

try this if its good maintain if bad try other is it how it works?

acoustic halo
#

You could try different optimizers, different layer sizes

#

different activation functions

#

loads of stuff

pastel valley
pastel valley
lapis sequoia
#

then what spagoose said^

#

but that's around 96% now right?

upper spindle
pastel valley
#

is the the image being processed by the model per epoch?

acoustic halo
#

How many images are you training?

#

Each epoch is 1 full round of training after every image has been used

lapis sequoia
pastel valley
#

6k
also i rescaled it its the same as normalize? right ? its good because it makes the training faster right?

lapis sequoia
#

you have a lot more options in image generator btw.

acoustic halo
#

When training through a single epoch, it does it in smaller batches of images

lapis sequoia
#

giving more rotated and zoomed out and inned and blurred images, it helps your model.

pastel valley
#

my folder is arranged like this

pastel valley
acoustic halo
#

The fact that you don't have separate training and test data is a problem though

pastel valley
acoustic halo
#

So it will train 5 images at a time

lapis sequoia
#

by giving same training and testing data, you are literally ignoring that overfitting exists in ML.

acoustic halo
#

6k/5 ~=1360

pastel valley
acoustic halo
#

Yeah, I just mean you can't tell if your model is good or not until you do that

pastel valley
acoustic halo
#

it will use all 6k

#

It just does it 5 at a time

lapis sequoia
acoustic halo
#

Each epoch is basically a load of smaller training steps

#

I would probably leave that as 32(default)

pastel valley
pastel valley
# lapis sequoia 1360 batches**

oh so it really the batch but with 5 images i get it now is it also considered as hyperparameter?
if i tweak the batch will it affect performance or just training speed?

acoustic halo
#

Yes and Both

pastel valley
acoustic halo
#

It probably wont make much of a difference tbh, plus there's not much point making hyperparam optimisations until you have the separate test set

pastel valley
#

if i dont include batch_size in flow_from_directory() then the default is 32?

pastel valley
lapis sequoia
#

If I'm having my custom layer in keras, how can i add non linearity function?

pastel valley
#

btw this means half of the weights are ignored right ?

acoustic halo
#

yeah and dropout is another hyperparameter

lapis sequoia
pastel valley
#

oh yeah its randomly selected every input ?

#

is it by input?

#

or pass?

#

or batch?

#

whats the term?

acoustic halo
#

I would assume each batch but i'm not sure

pastel valley
#

meaning half of the 16384 are only activated?

acoustic halo
#

The layer before

pastel valley
acoustic halo
#

No, thats wrong

lapis sequoia
pastel valley
acoustic halo
#

Basically, each epoch the model will train on 5 images, then the next 5, then the next 5 until all images have been trained

pastel valley
#

before flatting?

acoustic halo
#

Before being passed into flatten_12 yes

#

It will basically set half of the values passed into flatten to 0

lapis sequoia
#

but 5 by 5 yeah.

pastel valley
#

in imagination the 16,16,64 is being flattened to be 16384 but the half of it is ignored and dont apply activations for the dense_44?

acoustic halo
#

It will train all images over a single epoch, but it does it 5 at a time in smaller training steps is what i mean

acoustic halo
pastel valley
#

it will become zero or just example?

acoustic halo
#

will become 0

pastel valley
#

i thought its not getting passed

acoustic halo
#

It has to be the same shape still, it can't just not pass values
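
A NumPy sketch of what a dropout layer does during training, to make the "same shape, some values zeroed" point concrete. This is an illustration written for this thread, not Keras's actual implementation: each value is zeroed independently with probability `rate`, the survivors are scaled up by `1 / (1 - rate)`, and a fresh random mask is drawn on every forward pass. At inference time the layer passes values through unchanged.

```python
import numpy as np


def dropout_forward(x, rate, rng, training=True):
    """Inverted dropout: zero a random subset, rescale the rest, keep the shape."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True where the value survives
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((2, 16, 16, 64))        # same layout as the layer before flatten
dropped = dropout_forward(activations, rate=0.5, rng=rng)
```

So with a rate of 0.5, roughly half of the values reaching the flatten layer are 0 on any given pass, and it is a different half each time.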

pastel valley
#

oh i see it will retain the units

#

btw on f1 score if i got the precision and recall i just use to for the formula then thats the f1 score of my model?

acoustic halo
#

I think keras can calculate the f1 score for you

lapis sequoia
acoustic halo
#

Then use those f1 scores to get the macro-f1 score for the model

lapis sequoia
#

TensorflowAddons has f1Score, not yet in tf.

acoustic halo
#

So I would use whatever keras has built-in to do it for you (if it has any)

pastel valley
pastel valley
lapis sequoia
#

(i have not played with it yet, so try by yourself)

acoustic halo
#

tfa.metrics.F1Score

pastel valley
#

this is probably will come from the model.fit right? how can i retrieve the y_true , y_pred on training ?

#

or if i add it here will it automatically work?

#

i dont even know how the model knows that metrics isnt its just argument to fit()? is the keras.model.sequential expecting those metrics already maybe?

acoustic halo
#

Generally i only use the default metrics so you will have to play about and see what happens

#

If you are only bothered about the accuracy, just use that

pastel valley
#

oh i see but if i did it this should be what you mean by f1 score by class?

acoustic halo
#

Probably? I can't be bothered to do the math in my head

#

But like i said, if you don't specifically need it, just stick to accuracy

pastel valley
acoustic halo
#

It is the f1 for each class im fairly sure

pastel valley
#

btw my model is predicting 1s and 0s is it because its confident? or i can change it so that it outputs the probabily for each classes?

acoustic halo
#

What was the final layer?

pastel valley
#

softmax

acoustic halo
#

It should be a probability distribution then, maybe it is really confident then

#

Infact yeah obviously, its because you are training and testing on the same data

#

It knows it is correct because it's been trained on the test data so it is super confident

pastel valley
#

thats what probability distribution right? or i understand it wrong?

acoustic halo
#

Yeah thats right, except they add up to 1 not 0.9
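
A quick sketch of softmax itself, with made-up logits, showing why the outputs always sum to 1 and why a confident model prints values like `9.9e-01` next to ones like `2.3e-06`:

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


logits = np.array([[2.0, 1.0, 0.1, -1.0, 12.0]])  # invented scores for 5 classes
probs = softmax(logits)
```

When one logit is much larger than the rest, its probability is almost 1 and the others are tiny numbers shown in scientific notation, which can look like a column of 1s and 0s after rounding.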

pastel valley
#

ops hahaha i swear i counted it 1

#

anyways i learned i lot today thank you @acoustic halo@lapis sequoia
till next time 😅 👍 .

clever hinge
#

What is the difference between fit_transform & transform?

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2)

# Imputation
my_imputer = SimpleImputer()
imputed_X_train = pd.DataFrame(my_imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(my_imputer.transform(X_valid))

Using Imputation to address Missing Values

#

Can we use transform instead of fit_transform for the above example as the second argument is not passed

clever hinge
# clever hinge

It says it will first fit then it will return the transformed X. Can anyone tell me what it is going to do by fitting it

agile cobalt
#

you must first fit a transform before using it to transform the data, so that it will have the necessary information about the data you're fitting

#

if you check the documentation, it should have some fields ending with _ that are "learned" from the data

clever hinge
#

Ok, & what about transform? it is not fitting the data

odd meteor
agile cobalt
#

try to comment out the line with fit_transform and see what happens if you try to transform without fitting

#

that said... if the transformer you're using is doing something extremely simple that does not require any information about the data, it is possible that they just included it for compatibility with the rest of the api

clever hinge
#

Ok but X_train & X_valid are having the same features then why is one transformed differently from the other

#

except that X_train consists of 80% data & rest is in X_valid

odd meteor
# clever hinge Ok but X_train & X_valid are having the same features then why is one transformed...

Because that's generally how it works. Remember we apply the information learned from the train set on the validation/test set.

Recall that we only call fit() method on the train set when training a model and then call .predict() method on the holdout set ( val/test set) when making prediction. If you understand this concept without confusion, then you'll realize it's pretty much the same principle that's applied when calling .fit_transform() method on train set and then calling .transform() method on the val set.

neat anvil
# clever hinge Ok but X_train & X_valid are having the same features then why is one transformed...

They are transformed the same - that is the whole point. For example, if you are Imputing missing values by using the average of that feature for the data set, you want to "fit" that average to be from the training set data only. You then "transform" the training data by filling in missing values with the average from the training data. Then in the validation and future prediction input data, you "transform" them in the same way - by filling in missing values with the average of the training data.

#

like has been mentioned, the fit_transform method is just for convenience. You could also call fit and transform separately, both on the same dataset.

#

however you definitely do not want to call fit_transform or fit on the validation, testing, or future prediction data, because then you've changed your model pipeline.
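
a minimal sketch of the above, with made-up toy data (the "age" column and its values are just for illustration). `statistics_` is one of those trailing-underscore attributes that gets learned during fit:

```py
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data, invented for this example; NaN marks a missing value.
X_train = pd.DataFrame({"age": [20.0, 30.0, np.nan, 40.0]})
X_valid = pd.DataFrame({"age": [np.nan, 100.0]})

imputer = SimpleImputer()  # default strategy is "mean"

# fit_transform = fit (learn the column mean from X_train) + transform (fill NaNs).
imputed_train = imputer.fit_transform(X_train)
print(imputer.statistics_)  # mean of 20, 30, 40 learned from the training set

# transform only: reuse the training mean, learn nothing from X_valid.
imputed_valid = imputer.transform(X_valid)
print(imputed_valid)
```

the missing value in X_valid gets filled with the training mean, not with anything computed from X_valid itself.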

clever hinge
neat anvil
rose quarry
#

I have the following code which I've got from a book

```py
from matplotlib import pyplot as plt

mentions = [500, 505]
years = [2017, 2018]

plt.bar(years, mentions, 0.8)
plt.xticks(years)
plt.ylabel("# of times I heard someone say 'data science'")

# if you don't do this, matplotlib will label the x-axis 0, 1
# and then add a +2.013e3 off in the corner (bad matplotlib!)
plt.ticklabel_format(useOffset=False)

# misleading y-axis only shows the part above 500
plt.axis([2016.5, 2018.5, 499, 506])
plt.title("Look at the 'Huge' Increase!")
plt.show()
```
and this is the graph that is produced
#

but when I comment out that line, this is the graph that is produced

#

What exactly is the difference?

neat anvil
#

there may be none - are you using the exact same version of matplotlib as the code in the book?

#

If not, the behavior may be different

rose quarry
#

I'm not entirely sure,

#

It was printed in 2019, would it have changed since then?

neat anvil
tidal bough
#

I feel like it wouldn't even necessarily require a different mpl version; the defaults might be system-dependent.

neat anvil
#

yeah - it'd be very likely for the book author to have some plotting configuration saved on their computer that they were using to make sure plots looked good in print.

#

this is why we have Docker

onyx mica
#

1. Plot a 30Hz sine wave, with an amplitude of 40mV peak-to-peak over 3 cycles. Select an appropriate step size for the plot and add an appropriate title and axis labels with units.

2. Plot a 30Hz cosine wave, with an amplitude of 40mV peak-to-peak over 3 cycles. Select an appropriate step size for the plot and add an appropriate title and axis labels with units. Use the same worksheet, and the same time column from Q1.

trying this on excel
#

any tips

somber prism
#

guys i got one doubt, is it necessary to divide the image by 255 and also use Normalize from torchvision transforms? doesn't /255 already normalize the image and convert the pixel range to 0-1? then why do we have to use Normalize again, or am i mistaken?

tidal bough
#

Normalize doesn't act to make the range of values from 0 to 1, but to get a specific mean and std (usually 0 and 1)

#

observe that if all the values are from 0 to 1, the std is definitely less than 1, and the mean is probably higher than 0

somber prism
#

ohhh

agile cobalt
#

0~1 is something like MinMax if I recall correctly?

somber prism
#

yeh

somber prism
somber prism
tidal bough
#

yeah, though you could skip the division too, really - Normalize can take it, it's a linear transformation after all
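
rough numpy sketch of that, no torch needed. it assumes Normalize computes (x - mean) / std per channel; the mean/std numbers below are made up for illustration:

```py
import numpy as np

rng = np.random.default_rng(0)
# Fake 3-channel 8x8 image with raw pixel values in 0..255.
pixels = rng.integers(0, 256, size=(3, 8, 8)).astype(np.float64)

# Hypothetical per-channel statistics, expressed on the 0..1 scale.
mean = np.array([0.5, 0.4, 0.3]).reshape(3, 1, 1)
std = np.array([0.2, 0.25, 0.3]).reshape(3, 1, 1)

# Route 1: divide by 255 first, then normalize.
a = (pixels / 255.0 - mean) / std

# Route 2: skip the division and fold the factor 255 into mean and std.
b = (pixels - 255.0 * mean) / (255.0 * std)

print(np.allclose(a, b))  # both routes give the same tensor

# After /255 alone, values sit in 0..1, so the mean is above 0 and the std below 1.
scaled = pixels / 255.0
print(scaled.mean(), scaled.std())
```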

terse frigate
#

hi guys

#

this is an assignment given to me to get an internship

#

i have never done any of these

#

and i am still learning

#

what would be a good resource or a starting point?

#

is there any particular tool/library i should learn or use?

#

pls guide me

somber prism
#

would be a good resource to get into computer vision

misty flint
#

interesting problem PikaThink

#

any model trained on ImageNet should be sufficient. but yeah theres tons of resources out there

surreal badge
#

Hi, so I'm trying to line-graph music albums. I want the Y-axis to be the length of the longest album I'm going to use, and every song on the album to be dotted out at the right time on the timeline. I have no idea how to go about this at all. Can anyone point me in the right direction? Don't really know the right terms to search

misty flint
#

sorry, having trouble figuring out your question / visualizing it

surreal badge
#

i will draw an image

misty flint
#

ok

terse frigate
surreal badge
#

Hi, so I'm trying to line-graph music albums. I want the Y-axis to be the length of the longest album I'm going to use, and every song on the album to be dotted out at the right time on the timeline. I have no idea how to go about this at all. Can anyone point me in the right direction? Don't really know the right terms to search ( see a drawing of what I'm trying to do https://i.imgur.com/Pd13HsK.jpeg )

#

Now @misty flint, hope this makes it more clear.

misty flint
#

where the size of the bubble can indicate one of your axes

#

since youre technically comparing 3 things at once

#

hmm but if youre insistent, you could just graph this using excel, powerBI, or tableau

#

or any other BI tool

#

if you want to use python, matplotlib, seaborn, plotly

#

etc.

#

i recommend tableau or plotly

surreal badge
#

Im doing this just to learn so i want to use python.

misty flint
#

oh

#

then any of the python ones i mentioned should be fine
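
a possible starting point with matplotlib, just a sketch: the album names and song lengths are invented, and it guesses at the layout (one vertical line per album, one dot where each song starts), so it may not match the drawing exactly:

```py
from itertools import accumulate

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to get a window with plt.show()
import matplotlib.pyplot as plt

# Hypothetical albums: song lengths in seconds (made-up numbers).
albums = {
    "Album A": [180, 240, 200],
    "Album B": [300, 150, 210, 260],
}

def song_start_times(lengths):
    """Return the time (in seconds) at which each song starts."""
    return [0] + list(accumulate(lengths))[:-1]

fig, ax = plt.subplots()
for x, (name, lengths) in enumerate(albums.items()):
    starts = song_start_times(lengths)
    ax.plot([x, x], [0, sum(lengths)], color="gray")  # album timeline
    ax.scatter([x] * len(starts), starts)             # one dot per song start
ax.set_xticks(range(len(albums)))
ax.set_xticklabels(list(albums))
ax.set_ylabel("time into album (s)")
fig.savefig("albums.png")
```

the y-axis scales to the longest album automatically, since matplotlib fits the axis to the data.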

surreal badge
#

Thanks a lot i will check them out. i was using matplot first

misty flint
#

that one is ok. its just too ugly for me sometimes lol

surreal badge
#

haha, ye it doesn't look too twenty first century 😄

misty flint
#

even R's stuff looks cleaner

#

no offense to R. i like R for bioinformatics/stats

surreal badge
#

Thanks for the help.

lapis sequoia
lapis sequoia
#

This is the current output. After one iteration, I would like to save an np array of positions and one value for ln probability. After another iteration I would like to append to the file.

gloomy fulcrum
#

Hey, I'm building a script to analyze a text and find lines that correspond to a given subject matter. For example to grab all lines that correspond to a subject matter of ”love” or ”hate”, or maybe even something more specific like ”I have just fallen in love” or ”I’m feeling lonely”. Anyone know a library that would help me do such an analysis?

lapis sequoia
desert oar
#

i don't think numpy data files are "extensible"

#

or you can save to hdf5, which does allow you save multiple arrays in the same file

#

just call it mcmc_output_1.npz, mcmc_output_2.npz etc
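
something like this, as a sketch. the random arrays are stand-ins for the real sampler output, and `positions` / `ln_prob` are just names picked for the example:

```py
import tempfile
from pathlib import Path

import numpy as np

out_dir = Path(tempfile.mkdtemp())  # scratch directory for this example
rng = np.random.default_rng(0)

for i in range(1, 4):
    positions = rng.normal(size=(5, 2))  # stand-in for the walker positions
    ln_prob = float(rng.normal())        # stand-in for the ln probability
    # One file per iteration; each file holds both named arrays.
    np.savez(str(out_dir / f"mcmc_output_{i}.npz"), positions=positions, ln_prob=ln_prob)

# Reading everything back later, in iteration order.
for path in sorted(out_dir.glob("mcmc_output_*.npz")):
    with np.load(path) as data:
        print(path.name, data["positions"].shape, float(data["ln_prob"]))
```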

ornate ore
#

does anyone know how to take a data frame in python and convert it into a table in excel?

serene scaffold
ornate ore
#

A workbook in python

#

Sorry I meant data frame

#

Still new lol

serene scaffold
#

Save a dataframe as an excel file. Look for ExcelWriter in the pandas docs @ornate ore

#

I'm on mobile or I'd find the link for you

#

It's easy though.

ornate ore
#

okay thank you

rigid schooner
#

hay fam

#

who will send sales data to work on for fun?

brave sand
#

Has anyone gotten OpenAI to work?

#

I’ve downloaded it but it won’t let me run the Lunar Lander bc of Box2D not valid.

serene scaffold
forest bluff
#

how can i get output

#

in this format

#

in nested json

serene scaffold
# forest bluff

Is this a data science question? I'll only look at the code as actual text

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

forest bluff
serene scaffold
#

Next time, use the markdown format.

#
import re
from collections import  Counter
file = open('websiteData.txt', 'r', encoding='utf-8')
f = file.read()
h = re.findall('[A-Za-z0-9.+-]+@[A-Za-z0-9.-]+.[a-zA-Z]*', f)
count=Counter(h)
print(count)
for i in h:
    a = i.split('@')
    if (len(a[0]) <= 8):
        print('Company Email:',i)
    else:
        print('Human Email:',i)
#

Please copy and paste an example from websiteData.txt as text.

arctic wedgeBOT
#

Hey @forest bluff!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

forest bluff
#

@serene scaffold

serene scaffold
#

I think this pattern is better: [\w\.+-]+@[\w-]+\.[A-z]+

#

is the length of the website name really what tells you if it's a company or not?

#

you can also add named groups: (?P<name>[\w\.+-]+)@(?P<site>[\w-]+\.[A-z]+)

forest bluff
#

ok

serene scaffold
#

!e

import re

text = """Emonics LLC
Senior Big Data Engineer
Emonics LLC
Vancouver, British Columbia, Canada
send email to hr@vancuemonics.com
likecelebratesupport agent syneca.gregory@gmail.com"""

emails = re.finditer(
    r"""(?P<name>[\w\.+-]+)@(?P<site>[\w-]+\.[A-z]+)""",
    text
)

for email in emails:
    print(email.groupdict())
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | {'name': 'hr', 'site': 'vancuemonics.com'}
002 | {'name': 'syneca.gregory', 'site': 'gmail.com'}
serene scaffold
#

@forest bluff see?

#

anyway, I would put every email in a list, and then pass that list to Counter

forest bluff
#

okay

serene scaffold
#

and then you can make a new dict that's {email: {count: n, type: human/company}}

#

by iterating over the counter and checking the end of the email string
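
rough sketch of what I mean. the list of free-mail domains is a made-up rule just to show the shape of the result, so swap in whatever actually defines "company" for you. it also uses [A-Za-z] rather than [A-z], since [A-z] matches a few punctuation characters too:

```py
import re
from collections import Counter

text = """send email to hr@vancuemonics.com
support agent syneca.gregory@gmail.com
or again hr@vancuemonics.com"""

pattern = re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]+")
emails = pattern.findall(text)  # every email, duplicates included
counts = Counter(emails)        # email -> number of occurrences

# Hypothetical rule for illustration: well-known free mail domains count as "human".
FREE_MAIL = ("@gmail.com", "@yahoo.com", "@outlook.com")

summary = {
    email: {"count": n, "type": "human" if email.endswith(FREE_MAIL) else "company"}
    for email, n in counts.items()
}
print(summary)
```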

forest bluff
#

ok

#

i will try

tranquil glacier
#

!e
import re

text = """Emonics LLC
Senior Big Data Engineer
Emonics LLC
Vancouver, British Columbia, Canada
send email to hr@vancuemonics.com
likecelebratesupport agent syneca.gregory@gmail.com"""

emails = re.finditer(
emails = re.finditer(
    r"""(?P<name>[\w\.+-]+)@(?P<site>[\w-]+\.[A-z]+)""",
    text
)

for email in emails:
    print(email.groupdict())

arctic wedgeBOT
#

@tranquil glacier :white_check_mark: Your eval job has completed with return code 0.

001 | {'name': 'hr', 'site': 'vancuemonics.com'}
002 | {'name': 'syneca.gregory', 'site': 'gmail.com'}
royal crest
#

what would be the best way to go about making a new column in a data frame based on the values of some other columns?

for example if i want to make New based on A and B and the logic being
New_i = 1 if A_i == 1 or B_i == 1 else 0

  |  A  |  B  | New |
0    0     0     0
1    0     1     1
2    1     0     1
novel elbow
# royal crest what would be the best way to go about making a new column in a data frame based...

There are several ways:
One way is to do
df['new'] = df.apply(lambda x: 1 if (x.A==1) or (x.B==1) else 0, axis=1)
But it will execute row by row and that may be slow if data is big
Taking advantage of vectorization you can do:
df['new'] = ((df.A==1) | (df.B==1)).map({True: 1, False: 0})
But still the map part is not vectorized (it will be row by row). So, an even faster way will be:

```py
df['new'] = 0
df.loc[(df.A==1) | (df.B==1), 'new'] = 1
```
#

Just checked on my pc, with a dataframe of size 10k.
Method 1: 94.7 ms ± 1.27 ms
Method 2: 1.35 ms ± 79.2 µs
Method 3: 555 µs ± 146 µs

#

So yup, as always in python, avoid for loops D:
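
one more vectorized option, as a sketch on the 3-row example from the question (not timed here, so no claim about where it lands relative to the methods above): cast the boolean mask straight to int.

```py
import pandas as pd

# The small example table from the question.
df = pd.DataFrame({"A": [0, 0, 1], "B": [0, 1, 0]})

# The boolean mask is True where A == 1 or B == 1; astype(int) turns it into 1/0.
df["New"] = ((df["A"] == 1) | (df["B"] == 1)).astype(int)
print(df)
```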

sharp vapor
#

I want to write my pandas data frame (1371980 rows × 2 columns) to CSV, but when I use df.to_csv(path,index=False), it only writes 104000 rows. Can anyone help me with that? I am using Google Colab

terse frigate
#

hi @serene scaffold

terse frigate
acoustic halo
somber prism
somber prism
arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @compact stirrup until <t:1645615491:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia
#

hey guys, how do I put every word in a string into a list? The below image shows my example string

hollow sentinel
#

did they really just encourage list comprehensions in pandas

tulip breach
#

hi

#

does anyone have ai program that generate pictures from given sample

#

for example sample is folder with 1000 photos

#

and this program should generate similar photos

oak creek
#

hola

tulip breach
#

^

oak creek
#

generating picture?

#

that is a hard work damnn

tulip breach
#

yea

#

from given samples

oak creek
#

hmm

#

let me be honest

#

it is possible, yes

#

but in that condition

#

it will be the hardest thing ever done

tulip breach
#

or maybe there is already something like this

#

on internet

#

but i just cant find anything like this

oak creek
#

that used text instead of picture right?

tulip breach
#

no

oak creek
#

any example

tulip breach
#

i say maybe

#

i dont know if there is

oak creek
#

i don't think it will be done within 10k IMO

desert oar
#

i find it odd that he is using a line profiler and not just timing the execution

wooden forge
#

Hi everyone, I was wondering if anyone could help me find resources for a problem I have. Basically I'd like to plot a harmonic function and a scatter plot (on the same figure) and dynamically count how many times the harmonic function encounters the randomly placed dots on the figure. I'd love to make it dynamic, in terms of seeing the curve being traced and having a counter at the top showing how many times it meets a dot on the graph. Thanks in advance!

teal mortar
#

does anyone know why, to calculate softmax, we use exp(x) / exp(x).sum(dim=1)? why not use 10^x or 100^x, why do we specifically use e**x?

desert oar
desert oar
wooden forge
#

Would it be possible to have a counter during the animation tho? so getting real-time information from the graph? or highlighting the relevant dots from time to time?

teal mortar
wooden forge
#

basically I don't really know where to start on that, I can make a basic program with two harmonic functions, make a 2d array of some kind with random points and check how many times each function crosses a dot from the array

#

but how could I specify that the size of a dot isn't 0 (basically just a geometrical position) but a real dot with a radius ? should I make a specific function that creates a circle inside an array and then apply that randomly to each cell of the global array?

desert oar
#

And of course the natural logarithm is the inverse of an exponential function with base e
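
one thing that's easy to check numerically: since b**x == exp(x * ln(b)), a softmax with another base is the same as the usual softmax on rescaled inputs. small numpy sketch (the `base` parameter is something added for this example, not part of any library softmax):

```py
import numpy as np

def softmax(x, base=np.e):
    """Softmax over the last axis using an arbitrary base."""
    x = np.asarray(x, dtype=float)
    # base**x == exp(x * ln(base)); subtracting the max keeps exp() from overflowing.
    z = (x - x.max(axis=-1, keepdims=True)) * np.log(base)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([1.0, 2.0, 3.0])
print(softmax(logits))           # base e
print(softmax(logits, base=10))  # base 10: same ordering, sharper distribution

# Base 10 on x equals base e on x * ln(10).
print(np.allclose(softmax(logits, base=10), softmax(logits * np.log(10))))
```

so changing the base only rescales the inputs, which the layer before the softmax could learn to do anyway.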

teal mortar
desert oar
#

I think 3b1b has some good video about why it's so "natural"

#

Maybe not him, but I know somebody does

teal mortar
desert oar
wooden forge
#

my problem is that (not my only one), I don't know how to create a numpy array containing the dots in the first place

#

I would just :
Create array
Take random position
Apply some function to create dots of a certain radius there
return the array

#

Apply some function to create dots of a certain radius there is my issue

desert oar
wooden forge
#

😳

desert oar
#

and "advancing" in this case just means looping through the pre-computed function

wooden forge
#

ho ye !

#

And is it possible in an animated graph to get the current value of the point at a certain time, and then apply the function on that point to see if there are any nearby circles ?

#

I feel like this would be very slow, so maybe using time steps ?

desert oar
#

again, do this without worrying about animation first

#

just write a loop that prints a counter
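
a bare-bones sketch of that loop. the sine curve, window size, number of dots and radius are all arbitrary picks, and it assumes a dot counts as "met" once the curve comes within its radius (each dot counted once):

```py
import numpy as np

rng = np.random.default_rng(42)

# Pre-compute the curve: y = sin(x) sampled on a fine grid.
x = np.linspace(0, 4 * np.pi, 2000)
y = np.sin(x)

# Random dot centres inside the plotting window, all with the same radius.
n_dots, radius = 30, 0.1
dots = np.column_stack([
    rng.uniform(0, 4 * np.pi, n_dots),
    rng.uniform(-1.5, 1.5, n_dots),
])

hit = np.zeros(n_dots, dtype=bool)  # which dots the curve has already met
counter = 0
for xi, yi in zip(x, y):            # "advance" along the pre-computed curve
    dist = np.hypot(dots[:, 0] - xi, dots[:, 1] - yi)
    new_hits = (dist <= radius) & ~hit
    if new_hits.any():
        hit |= new_hits
        counter += int(new_hits.sum())
        print(f"x={xi:.2f}: counter={counter}")
```

note the dots are stored as (x, y) centres plus a radius, so there's no need to draw circles into a grid to test for a hit.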

wooden forge
#

oki

#

start easy and then more advanced

#

that makes sense thanks !

desert oar
#

yes, always

wooden forge
#

I'll try this !

#

this is exciting :3

#

see you then !

#

now I feel like an idiot because I want to make an array of 0 and 1 (1 means there is a dot), with randomly placed 1 but I don't know how
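
one way that might work, as a sketch (grid shape and number of dots are arbitrary): pick distinct flat indices at random and set those cells to 1.

```py
import numpy as np

rng = np.random.default_rng(0)
shape, n_dots = (10, 10), 15

grid = np.zeros(shape, dtype=int)

# Choose n_dots distinct cell indices out of the flattened grid, then set them to 1.
flat_idx = rng.choice(grid.size, size=n_dots, replace=False)
grid.flat[flat_idx] = 1

print(grid)
print(grid.sum())  # number of dots placed
```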