desert oar May 24, 2022, 2:12 PM

#

('Quantity', 'sum') this is a tuple

loud cove May 24, 2022, 2:12 PM

#

desert oar `('Quantity', 'sum')` this is a tuple

I copied your code.

desert oar May 24, 2022, 2:12 PM

#

my code does not include that

loud cove May 24, 2022, 2:12 PM

#

.

desert oar May 24, 2022, 2:12 PM

#

don't copy and paste. read, understand, and apply the knowledge to your own situation

loud cove May 24, 2022, 2:13 PM

#

.

#

t_q = group by orders and get sum quantity.
drop duplicates from original df then merge it back with t_q to get the results i want, which is working fine.

cerulean stream May 24, 2022, 2:15 PM

#

Hello, idk if this is the right channel to ask this but
how would I find the most common item in a 3D numpy array? bincount only works on 1D arrays. I have an array that's like [[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]] and I want the most common "innermost list", not the numbers
obviously the actual array would be much bigger but in this case [1, 2, 3] would be returned

desert oar May 24, 2022, 2:16 PM

#

loud cove t_q = group by orders and get sum quantity. drop duplicates from original df the...

but as you showed, the sum is wrong

#

the resulting dataframe is empty

#

and you get messed up data as a result

loud cove May 24, 2022, 2:17 PM

#

desert oar but as you showed, the sum is wrong

my merge is working perfectly fine.

#

my question is about the group by with the same exact code you evaluated here and not working.

#

nvm

wooden sail May 24, 2022, 2:18 PM

#

cerulean stream Hello, idk if this is the right channel to ask this but how would I find the mos...

one way to do this is to iterate through the inner arrays, subtract them (with broadcasting) from the nd array, and compute the l0 norm of the result. the inner array that yields the smallest l0 norm is the most "common" one... except that this doesn't tell you if the array repeats at all. it's a start, though

loud cove May 24, 2022, 2:18 PM

#

but yea doesn't matter im mainly interested on the group by thing

desert oar May 24, 2022, 2:19 PM

#

@loud cove i am actually running your code now with your data... give me a bit

loud cove May 24, 2022, 2:19 PM

#

wooden sail one way to do this is to iterate through the inner arrays, subtract them (with b...

wouldn't the mode from scipy.stats be it?

wooden sail May 24, 2022, 2:20 PM

#

possibly, if it takes nd arrays as valid objects

desert oar May 24, 2022, 2:20 PM

#

wooden sail one way to do this is to iterate through the inner arrays, subtract them (with b...

i was going to suggest reshaping this to Nx3, converting it to a list of tuples, and then using Counter or similar

#

your solution is pretty clever
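A minimal sketch of the reshape-and-count idea above, assuming the data is a NumPy array whose last axis holds the inner lists. The names `X`, `rows`, and `most_common_row` are illustrative and not from the original messages.

```python
from collections import Counter

import numpy as np

X = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]])

# Flatten every leading axis so each row is one innermost list (N x 3 here).
rows = X.reshape(-1, X.shape[-1])

# Tuples are hashable, so Counter can tally identical rows.
row_counts = Counter(map(tuple, rows.tolist()))
most_common_row, count = row_counts.most_common(1)[0]

print(most_common_row, count)
```

Unlike the subtraction approach, the count comes back alongside the row, so it is easy to tell whether the row repeats at all.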

serene scaffold May 24, 2022, 2:21 PM

#

cerulean stream Hello, idk if this is the right channel to ask this but how would I find the mos...

this channel is busy today, so try an individual help channel. but this is the correct topical channel for your question.

desert oar May 24, 2022, 2:21 PM

#

@loud cove it's possible that groupby interacts badly with the missing values in the string data

wooden sail May 24, 2022, 2:21 PM

#

alternatively, you can use an equivalence relation to do something similar. this should be better, on second thought. depending on what index gymnastics you are used to, you could keep the dimensions as is or reshape to a matrix, then use the outermost index and == my_array_at_this_iteration. again, summing over the resulting boolean array will give you the count you're after

loud cove May 24, 2022, 2:22 PM

#

desert oar <@965084284274765824> it's possible that groupby interacts badly with the missin...

yeah makes sense, that's why i think just separating then merging is the way to go.

desert oar May 24, 2022, 2:22 PM

#

well you'd have to merge anyway, so that solution is correct

loud cove May 24, 2022, 2:22 PM

#

desert oar well you'd have to merge anyway, so that solution _is_ correct

I was hoping that grouping would get it all at once, but doesn't matter.

desert oar May 24, 2022, 2:23 PM

#

however are you really trying to group and merge on all of these fields?

desert oar May 24, 2022, 2:23 PM

#

wooden sail alternatively, you can use an equivalence relation to do something similar. this...

i've never been the best at numpy index gymnastics, i'd be curious to see this solution

loud cove May 24, 2022, 2:23 PM

#

desert oar however are you really trying to group and merge on _all_ of these fields?

the id column is the important one, that is why i went with merge.

cerulean stream May 24, 2022, 2:23 PM

#

Okay thanks everyone Ill try them

wooden sail May 24, 2022, 2:24 PM

#

lemme set something up and show you an example

desert oar May 24, 2022, 2:24 PM

#

loud cove the id column is the important one, that is why i went with merge.

i see... maybe you just need groupby('Order ID') then?

loud cove May 24, 2022, 2:24 PM

#

desert oar i see... maybe you just need `groupby('Order ID')` then?

then I'd need to merge anyways

desert oar May 24, 2022, 2:25 PM

#

oh, i see... you were trying to avoid merging

#

yeah just do the join/merge

#

apparently groupby + null is a bad mix

#

@loud cove https://replit.com/@maximum__/groupby

loud cove May 24, 2022, 2:26 PM

#

it seems to be more about the duplicates

#

#

you see the dropping duplicates doesn't work

desert oar May 24, 2022, 2:27 PM

#

i think it might be dropping the nulls when grouping

#

i don't think it has to do with duplicates

#

import pandas as pd

text_columns = [
    "Sale Code",
    "Order ID",
    "Store Name",
    "Player First Name",
    "Player Last Name",
    "Shipping First Name",
    "Shipping Last Name",
    "Shipping Address",
    "Shipping City",
    "Shipping State",
    "Shipping Zip",
    "Billing Phone",
    "Billing Email",
]

dtypes = {c: "string" for c in text_columns}
dtypes.update({"Quantity": pd.Int64Dtype()})

df = pd.read_csv(
    "order_report.csv",
    usecols=text_columns + ["Quantity"],
    dtype=dtypes,
)

total_quantity = df.groupby('Order ID')["Quantity"].sum().rename('Total Quantity')
df = df.join(total_quantity, on='Order ID')

print(df[['Sale Code', 'Order ID', 'Quantity', 'Total Quantity']])

loud cove May 24, 2022, 2:27 PM

#

desert oar May 24, 2022, 2:27 PM

#

note the use of proper null-supporting Int64 and string dtypes

#

that's still not the point

#

the point is that there are nulls in the columns you are grouping on

#

it says it right here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

loud cove May 24, 2022, 2:28 PM

#

yea im just saying trying to dedupe even doesn't work

desert oar May 24, 2022, 2:28 PM

#

because it's irrelevant

#

that's the whole point of groupby - aggregating across duplicated values

#

you need to pass dropna=False to groupby
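A small sketch of what `dropna=False` changes, using a made-up frame rather than the real `order_report.csv`. By default `groupby` leaves out rows whose group key is null; the `dropna` parameter was added in pandas 1.1, so this assumes at least that version.

```python
import pandas as pd

# Made-up data: two rows have a missing "Order ID".
df = pd.DataFrame({
    "Order ID": ["A1", "A1", None, None, "B2"],
    "Quantity": [1, 2, 5, 7, 4],
})

# Default behaviour: rows with a null key are left out of every group.
default_totals = df.groupby("Order ID")["Quantity"].sum()

# dropna=False keeps the null key as its own group.
keep_null_totals = df.groupby("Order ID", dropna=False)["Quantity"].sum()

print(default_totals)
print(keep_null_totals)
```

With several grouping columns, a null in any one of them is enough to exclude the row under the default setting, which is one way a grouped result can come back smaller than expected or empty.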

loud cove May 24, 2022, 2:29 PM

#

im talking about the groups, not the aggregations

#

it is probably just the NaNs given that drop https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html didn't work either.

wooden sail May 24, 2022, 2:33 PM

#

cerulean stream Okay thanks everyone Ill try them

In [24]: import numpy as np

In [25]: X = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]])

In [26]: test = np.array([1,2,3]) #iterate over these

In [27]: counts_intermediate_step = ( X.reshape(4,3,order='F') == test ).dot(np.ones(3))

In [28]: counts = counts_intermediate_step == 3

In [29]: counts
Out[29]: array([ True,  True, False, False])

In [30]: result = sum(counts)

In [31]: result
Out[31]: 2

there must be a more clever way, but this works and should be more or less efficient. equivalently, you could use the subtraction approach i mentioned earlier. idk what is faster in numpy, a broadcasted difference or a boolean comparison

#

ofc there is no need to print nor keep the intermediate result, so you can take what says In [28] and call sum on it directly or multiply by a vector of ones from the left

#

this should extend to arbitrary-sized innermost dimensions and arbitrarily many axes or ways or whatever you call it (here you have a 3d or 3way array) as long as you're careful in the reshaping

#

on further thought, this can be done in like 2 lines using einsum, but i don't think it's much faster
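For counting every inner array at once instead of one `test` vector at a time, one option is `np.unique` with `axis=0` and `return_counts=True`. This is a swapped-in technique, not the einsum version mentioned above, and the variable names are illustrative.

```python
import numpy as np

X = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [10, 11, 12]]])

# One row per innermost list, regardless of how many leading axes X has.
rows = X.reshape(-1, X.shape[-1])

# Count for a single candidate: a row matches only if all entries are equal.
test = np.array([1, 2, 3])
matches_for_test = int((rows == test).all(axis=1).sum())

# Counts for all distinct rows in one call.
unique_rows, counts = np.unique(rows, axis=0, return_counts=True)
most_common = unique_rows[np.argmax(counts)]

print(matches_for_test, most_common, counts.max())
```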

desert oar May 24, 2022, 2:55 PM

#

good old einsum

arctic wedgeBOT May 24, 2022, 3:03 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1653405227:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

lapis sequoia May 24, 2022, 3:08 PM

#

@runic raft

#

Hi mate. I just realised that the JACCARD SIMILARITY that you taught me doesn't have all booleans. And one column is the sex column with 0 as females and 1 as males. Is it still fine to put them in the Jaccard? Seems alright because it's a relative measure standard for all rows. But just confirming.

loud cove May 24, 2022, 3:14 PM

#

So I have a data frame that I want to use to fill a pdf, anyone have a recommendation for a lib? I'll wrap it in a file and then move it to an executable for non python users to use.

stone pollen May 24, 2022, 3:20 PM

#

https://stackoverflow.com/questions/33155776/export-pandas-dataframe-into-a-pdf-file-using-python
maybe that will help

Stack Overflow

Export Pandas DataFrame into a PDF file using Python

What is an efficient way to generate PDF for data frames in Pandas?

loud cove May 24, 2022, 4:12 PM

#

stone pollen https://stackoverflow.com/questions/33155776/export-pandas-dataframe-into-a-pdf-...

not exactly what I wanted, but just converted to dictionary, looped through it, and went with this.
https://github.com/t-houssian/fillpdf

misty flint May 24, 2022, 4:40 PM

#

have you guys noticed the websites with good search engines vs. those that have crappy search engines

#

kekHands

#

makes me want to build my own sometimes

#

~~if anyone has good resources for that btw lmk~~ ID_blurryeyes

dusty valve May 24, 2022, 6:01 PM

#

How would i write a wordle solver using tensorflow and transformers?

serene scaffold May 24, 2022, 6:54 PM

#

dusty valve How would i write a wordle solver using tensorflow and transformers?

why transformers?

dusty valve May 24, 2022, 6:54 PM

#

serene scaffold why transformers?

because i was wondering if there was a pretrained model

serene scaffold May 24, 2022, 6:55 PM

#

dusty valve because i was wondering if there was a pretrained model

when one uses transformers in natural language stuff, it's because you care about the meaning of the words. in wordle, the fact that you're working with words doesn't even matter. you're just trying strings that match a known set of constraints.
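To make the "strings that match a known set of constraints" point concrete, here is a rough sketch of constraint filtering in plain Python. The word list, the `feedback` encoding (`g` green, `y` yellow, `-` grey), and the function names are all assumptions made for illustration; a real solver would also need a strategy for picking the next guess.

```python
from collections import Counter


def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, '-' grey."""
    result = ["-"] * len(guess)
    remaining = Counter()
    # First pass: mark greens and count the unmatched answer letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] += 1
    # Second pass: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and remaining[g] > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, observed):
    """Keep only words that would have produced the observed feedback."""
    return [w for w in candidates if feedback(guess, w) == observed]


words = ["crane", "crate", "trace", "grace", "brace"]
observed = feedback("crane", "trace")
remaining_words = filter_candidates(words, "crane", observed)
print(observed, remaining_words)
```

Nothing here depends on what the words mean, only on which letters sit where.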

dusty valve May 24, 2022, 6:55 PM

#

serene scaffold when one uses transformers in natural language stuff, it's because you care abou...

hmm, alr

#

and wow do you type fast

#

is that a steno keyboard i sense?

serene scaffold May 24, 2022, 6:56 PM

#

dusty valve and wow do you type fast

No, I've just been typing since I was like 7

dusty valve May 24, 2022, 6:57 PM

#

serene scaffold No, I've just been typing since I was like 7

nice

serene scaffold May 24, 2022, 6:58 PM

#

probably not that uncommon for a late millennial.

#

when you get to gen z, they probably have more experience with touchscreen keyboards than physical ones.

#

@dusty valve you might enjoy this deep dive: https://www.youtube.com/watch?v=v68zYyaEmEA

YouTube

3Blue1Brown

Solving Wordle using information theory

An excuse to teach a lesson on information theory and entropy.
Special thanks to these supporters: https://3b1b.co/lessons/wordle#thanks
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share the videos.

Contents:
0:00 - What is Wordle?
2:43 - Initial ideas
8:04 - Information theory...


dusty valve May 24, 2022, 7:01 PM

#

serene scaffold <@879807617260716143> you might enjoy this deep dive: https://www.youtube.com/wa...

thanks

lapis sequoia May 24, 2022, 7:07 PM

#

```py
model=DecisionTreeClassifier()
kfold_validation=KFold(10)

results=cross_val_score(model,X,y,cv=kfold_validation)
```
Can someone tell me the difference between this and
```py
model=DecisionTreeClassifier()
results=cross_val_score(model,X,y,cv=10)
```

serene scaffold May 24, 2022, 7:09 PM

#

model = DecisionTreeClassifier() 
kfold_validation = KFold(10)
results = cross_val_score(model, X, y, cv=kfold_validation)
# vs
model = DecisionTreeClassifier()
results = cross_val_score(model, X, y, cv=10)

Please use spaces in your code, so that it's easier to read.

One moment.

#

!docs sklearn.model_selection.cross_val_score

arctic wedgeBOT May 24, 2022, 7:10 PM

#

sklearn.model_selection.cross_val_score

```py
sklearn.model_selection.cross_val_score(estimator, X, y=None, *, groups=None, scoring=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', error_score=nan)
```
Evaluate a score by cross-validation.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation).

serene scaffold May 24, 2022, 7:11 PM

#

@lapis sequoia it appears that if you pass a KFold for cv=, then the KFold instance does the work of partitioning the dataset. Whereas if you pass an int, then cross_val_score decides how to partition the data into that many folds on its own.
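The scikit-learn documentation for `cross_val_score` states that an integer `cv` uses `StratifiedKFold` when the estimator is a classifier and `y` is binary or multiclass, and `KFold` otherwise. So for a `DecisionTreeClassifier`, `cv=10` and `cv=KFold(10)` are not necessarily the same splitter. A small sketch using the public helper `check_cv`, which applies that documented rule; the toy labels `y` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y = np.array([0, 1] * 10)  # toy binary labels, 10 per class

# classifier=True mirrors passing a classifier such as DecisionTreeClassifier.
resolved = check_cv(cv=10, y=y, classifier=True)
print(type(resolved).__name__)

# An explicit KFold instance is used exactly as given.
explicit = KFold(10)
print(type(explicit).__name__)
```

The practical difference is that stratified folds try to keep the class proportions similar in every fold, while plain `KFold` splits by position only.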

lapis sequoia May 24, 2022, 7:12 PM

#

So k-fold is the same as standard value of 10. Whereas you can send in different instances such as stratified k fold. Or stratified random?

serene scaffold May 24, 2022, 7:13 PM

#

sounds right to me.

lapis sequoia May 24, 2022, 7:13 PM

#

Or whatever the default way to split might be for cross_val_score. You just gotta put in the number of folds.

#

Sounds good

#

https://c.tenor.com/7Ypq9_9najcAAAAM/thumbs-up-double-thumbs-up.gif

serene scaffold May 24, 2022, 7:13 PM

#

or you could define your own generator that does it however you want

lapis sequoia May 24, 2022, 7:14 PM

#

Also. You know you were right. Feature selection is usually worthless for decision trees

#

At each split it automatically does a sort of "feature selection" so it knows what's good for it.

serene scaffold May 24, 2022, 7:14 PM

#

yay

lapis sequoia May 24, 2022, 7:15 PM

#

Based on information gain or some other index.

#

Also. Could you tell me something on how can I compare 2 models. I only have the final accuracy scores of them. And I compared them a bit on that. Is there something else I can do?

serene scaffold May 24, 2022, 7:24 PM

#

lapis sequoia Also. Could you tell me something on how can I compare 2 models. I only have the...

this is what, a multiclass classifier?

lapis sequoia May 24, 2022, 7:26 PM

#

@serene scaffold no just a binary classifier

serene scaffold May 24, 2022, 7:28 PM

#

lapis sequoia <@253696366952316929> no just a binary classifier

you could look at the confusion matrices

#

so you'll know not only which one has a better accuracy score, but also if the worse one is doing poorly in terms of false positives or false negatives.

lapis sequoia May 24, 2022, 7:30 PM

#

serene scaffold you could look at the confusion matrices

I used the k fold. So I wrote in the report that "no confusion matrices for you sorry :( "

#

Because there's no combined confusion matrix available. Only one for each fold.

#

And wrote, no point looking at one for each fold. It's worthless

#

Well, not like it's actually worthless. But I was lazy to write the code. Since I was getting the validation score directly from cross_val_score without having to generate the folds 🤪

serene scaffold May 24, 2022, 7:52 PM

#

lapis sequoia And wrote, no point looking at one for each fold. It's worthless

there is a point to looking at it for each fold, because you can sum all the confusion matrices and get a composite one.

lapis sequoia May 24, 2022, 7:53 PM

#

Can we?

#

Shit

#

Not gonna do it now though. Gonna take up a lot of my brain power

#

Is the composite one the average of the values in each entry? Might look ugly

serene scaffold May 24, 2022, 8:03 PM

#

lapis sequoia Is the composite one the average of the values in each entry? Might look ugly

each instance is part of the test data for one fold, so if you sum all the matrices, element-for-element (not averaging them), then it will show how every instance was classified
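A sketch of the element-wise sum for a binary classifier. The per-fold labels below are invented, and `confusion_matrix_2x2` is a hypothetical helper written for this example rather than a library function.

```python
import numpy as np


def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


# Hypothetical (true, predicted) labels from the test split of three folds.
folds = [
    ([0, 0, 1, 1], [0, 1, 1, 1]),
    ([0, 1, 1, 0], [0, 1, 0, 0]),
    ([1, 0, 0, 1], [1, 0, 0, 1]),
]

per_fold = [confusion_matrix_2x2(t, p) for t, p in folds]

# Sum element for element, not average: every instance is counted once.
composite = np.sum(per_fold, axis=0)
print(composite)
```

Because each instance lands in exactly one test fold, the entries of the composite matrix add up to the size of the dataset (12 in this toy case).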

mint palm May 24, 2022, 8:27 PM

#

when and how are custom loss functions made
i mean what are the symptoms they arent able to?
gradient descent not going down??
or irregularity or what?

#

@warm verge

warm verge May 24, 2022, 8:34 PM

#

Ok basically

#

Sometimes you get functions that you need to optimise which are just a mess

#

The gradients are too erratic or discontinuous on a local level to even make sense of anything

#

Or potentially your data isn't well suited to have a conventional error function

#

This will happen a lot in some problem domains, so you make a new loss function based on some method of calculation

mint palm May 24, 2022, 8:37 PM

#

warm verge The gradients are too erratic or discontinuous on a local level to even make sen...

oh ok, will i be able to infer that erratic behaviour from loss graph

#

got it thx

grave hare May 24, 2022, 8:40 PM

#

I have a dataset that i am trying to forecast by the day using 3 indicators to do so. the dataset is a list of orders that have happened over the last 6 months. some customers/order combinations are repeated, some are but taper off. I am needing to forecast what orders will fall on what day. forecasting would be no more than a month ahead. I'm thinking some type of time series modeling, but not sure how to go about it. any suggestions or directions?

scenic tulip May 24, 2022, 8:58 PM

#

well, you could isolate the orders that you know are repeated. find which day they fall on, then just add a timedelta that would increment the month and generate and new spreadsheet with the forecasted data @grave hare

#

the ones that taper off you would have to find at what rate they taper off, find the day and decrement the difference from the running value and just add it to that day of the next month @grave hare

mint palm May 24, 2022, 9:07 PM

#

i was wondering: as there are two or more approach to predict generally everything, then can you apply siamese network to every prediction model??

arctic wedgeBOT May 24, 2022, 9:59 PM

#

:incoming_envelope: :ok_hand: applied mute to @boreal summit until <t:1653430148:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

tranquil sage May 25, 2022, 2:50 AM

#

Can anyone explain why need to split the training set into mini-batch? If my data size is around 3k, what is the recommended mini-batch size? Training Text CNN classifier

misty flint May 25, 2022, 2:58 AM

#

mini-batch is a batch normalization method that can help the model train faster and sometimes improve model accuracy; dunno the recommended size since this is usually empirical and you have to try a few values for that hyperparameter.

theres a seminal paper about this entire concept from google

hallow patrol May 25, 2022, 2:58 AM

#

Hello team, I am having issues understanding the below code related to Data Aggregates

Code#1

data = [1, 2, 3, 4, 5, 6]

for i in range(1, 6):
    data[i - 1] = data[i]

for i in range(0, 6):
    print(data[i], end=' ')

Output

2 3 4 5 6 6

Code# 2

data = [[0, 1, 2, 3] for i in range(2)]
print(data[2][0])

Output

0

For code #1 I am not sure what the data[i - 1] = data[i] code is doing and for code #2 I do not know if [2] is referring to the range code portion and then [0] is the index to be applied on the list.

Thank you for any feedback

misty flint May 25, 2022, 2:59 AM

#

misty flint mini-batch is a batch normalization method that can help the model train faster ...

https://research.google/pubs/pub43442/

Google Research

Batch Normalization: Accelerating Deep Network Training by Reducing...

#

Cited by 36833

#

sounds about right

#

kekHands

misty flint May 25, 2022, 3:01 AM

#

hallow patrol Hello team, I am having issues understanding the below code related to Data Aggr...

try asking with #❓｜how-to-get-help - your question isnt really specific to this channel

tacit basin May 25, 2022, 3:27 AM

#

hallow patrol Hello team, I am having issues understanding the below code related to Data Aggr...

data[i - 1] = data[i] shifts elements of 'data' list to the left so [1,2,3,4,5,6] becomes [2,3,4,5,6,6]

Code#2 creates list of lists

>>> data = [[0,1,2,3] for i in range(2)]
>>> data
[[0, 1, 2, 3], [0, 1, 2, 3]]
>>> data[2][0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

The first index, e.g. data[1], selects one of the inner lists [0,1,2,3]; the second, as in data[1][0], is the index of an element within that list, so it gives 0.
In your case there are only two lists in the generated list (indices 0 and 1), so data[2] raises an IndexError.

tacit basin May 25, 2022, 3:31 AM

#

tranquil sage Can anyone explain why need to **split the training set into mini-batch**? If my...

Usually you pick the batch size (bs) to fill up GPU memory so training runs faster, unless you have a huge GPU, because too large a batch size can degrade training performance. Beyond that, you mostly tune it empirically.
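For reference, "mini-batch" here means splitting the training set into chunks and doing one gradient update per chunk; that is a separate idea from batch normalization, which normalizes activations within a batch. A rough sketch of the splitting itself in plain Python, where the function name, the seed, and the batch size of 32 are assumptions chosen only for illustration:

```python
import random


def iterate_minibatches(n_samples, batch_size, seed=0):
    """Yield lists of sample indices, one list per mini-batch."""
    indices = list(range(n_samples))
    # Shuffle once per epoch so batches differ from the stored order.
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


# About 3k samples, as in the question above.
batches = list(iterate_minibatches(n_samples=3000, batch_size=32))
print(len(batches), len(batches[0]), len(batches[-1]))
```

With 3000 samples and a batch size of 32 the last batch is smaller than the rest, which is normal; frameworks usually offer an option to drop or keep it.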

grand anvil May 25, 2022, 3:31 AM

#

Hi anyone knows how to save multiple ML model to a single pickle file? Thanks

dusk tide May 25, 2022, 4:01 AM

#

Anyone worked with song recommendation system?

rose agate May 25, 2022, 4:19 AM

#

grand anvil Hi anyone knows how to save multiple ML model to a single pickle file? Thanks

can you just put them all in a list and pickle the list?
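A minimal sketch of that suggestion. The dictionaries below are stand-ins for fitted models (any picklable objects behave the same way), and a dict is used instead of a list so each model can be looked up by name; both choices are assumptions for the example.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-ins for fitted models; replace with real estimator objects.
models = {
    "model_a": {"weights": [0.1, 0.2]},
    "model_b": {"weights": [0.3]},
}

path = Path(tempfile.mkdtemp()) / "models.pkl"

# One dump call writes the whole container to a single file.
with path.open("wb") as f:
    pickle.dump(models, f)

# One load call restores every model at once.
with path.open("rb") as f:
    restored = pickle.load(f)

print(sorted(restored))
```

As with any pickle file, only load it from a source you trust.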

lone vortex May 25, 2022, 5:23 AM

#

hey guys, does anyone know panda ?

fleet plover May 25, 2022, 5:29 AM

#

May I know why https://gist.github.com/buttercutter/b6f526c56e20f029d68e6f9041c3f5c0#file-gdas-py-L396 gives runtime error on inplace operation ?

arctic wedgeBOT May 25, 2022, 5:29 AM

#

gdas.py line 396

self.nodes[n-1].connections[ni].forward(x, types=types)  # Ltrain(w±, alpha)```

tacit basin May 25, 2022, 6:33 AM

#

lone vortex hey guys, does anyone know panda ?

hi A Human, yes quite some people here do know pandas.

summer oracle May 25, 2022, 7:37 AM

#

hi Is there anyone using doccano for relation annotation?

#

Sequence Labeling (NER part) works fine, but the 'relation' label function doesn't seem to be working

lone vortex May 25, 2022, 7:41 AM

#

tacit basin hi A Human, yes quite some people here do know pandas.

Ah I just started pandas and I have no idea what to do 😅

fair nimbus May 25, 2022, 11:31 AM

#

lone vortex Ah I just started pandas and I have no idea what to do 😅

The examples in the official pandas docs are quite good. They're a good lesson in minimal reproducible examples. Whenever I'm debugging something that has gotten too large to debug and too proprietary to share, it's worth breaking the problem down into smaller chunks, usually starting from the pandas examples themselves.

mild dirge May 25, 2022, 11:51 AM

#

I have a dataset of letter images I want to train a CNN on. But for pre-training I also have a dataset of the same characters, but a different font. What would be the best way for pre-training? since the model architecture doesn't have to change at all.

mild dirge May 25, 2022, 1:33 PM

#

Seems like an exam or hw

tacit basin May 25, 2022, 2:26 PM

#

lone vortex Ah I just started pandas and I have no idea what to do 😅

What do you want to do?

shut phoenix May 25, 2022, 2:48 PM

#

# classifying algorithm
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

training_path = tf.keras.utils.get_file('iris_training', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv')
testing_path = tf.keras.utils.get_file('iris_testing', 'https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv')
train = pd.read_csv(training_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(testing_path, names=CSV_COLUMN_NAMES, header=0)
training_y = train.pop('Species')
testing_y = test.pop('Species')

feature_column = []
for feature_name in train.keys():
  ...

why are we iterating over train

hollow hearth May 25, 2022, 3:05 PM

#

hey i need some help with some signal processing applications

#

i have this signal that can be composed of multiple fractional frequencies

#

and i want to use DFT to find those

#

but it only returns the integer frequencies

#

so after a ton of searching i heard about this sinc interpolation thing that i can use to estimate the frequency between two bins

#

but i for the life of me cant find any resource to help me with that
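One common way to get at frequencies between bins is to zero-pad the signal before the DFT: the padded transform samples the same underlying spectrum on a finer grid, which amounts to interpolating between the original bins with a periodic-sinc kernel. The sketch below uses a single made-up sinusoid; the sample rate, length, and padding factor are assumptions for illustration, and with several closely spaced components spectral leakage can shift the peaks.

```python
import numpy as np

sample_rate = 100.0  # Hz (assumed)
n_samples = 100      # 1 second of data, so DFT bins are 1 Hz apart
true_freq = 10.3     # Hz, deliberately between two bins

t = np.arange(n_samples) / sample_rate
signal = np.sin(2 * np.pi * true_freq * t)

# Plain DFT: the peak can only land on an integer-Hz bin.
coarse = np.abs(np.fft.rfft(signal))
coarse_freqs = np.fft.rfftfreq(n_samples, d=1 / sample_rate)
coarse_peak = coarse_freqs[np.argmax(coarse)]

# Zero-padded DFT: same data, spectrum sampled 64x more finely.
n_padded = 64 * n_samples
fine = np.abs(np.fft.rfft(signal, n=n_padded))
fine_freqs = np.fft.rfftfreq(n_padded, d=1 / sample_rate)
fine_peak = fine_freqs[np.argmax(fine)]

print(coarse_peak, fine_peak)
```

Zero-padding does not add information or improve the ability to separate two nearby tones; it only makes the location of an existing peak easier to read off.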

tacit basin May 25, 2022, 3:44 PM

#

shut phoenix ```py #classifying algoirthim CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', '...

It iterates over column names
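A tiny sketch showing that `DataFrame.keys()` yields the column labels, so the loop visits each remaining feature name once `Species` has been popped. The one-row frame is made up and stands in for the downloaded iris CSV.

```python
import pandas as pd

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Tiny made-up frame standing in for the downloaded iris CSV.
train = pd.DataFrame([[5.1, 3.5, 1.4, 0.2, 0]], columns=CSV_COLUMN_NAMES)

# pop removes the label column, leaving only the four feature columns.
training_y = train.pop('Species')

# keys() returns the column index, so iterating gives the column names.
feature_names = [feature_name for feature_name in train.keys()]
print(feature_names)
```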

mint palm May 25, 2022, 4:01 PM

#

doubt regarding siamese applicability:

#

i read that it need similar NN configuration and input types.....but

#

can i apply it onto something like following:
input 1 is lips

#

input 2 is nose

#

if i wanted to predict skin disease, and the feature that indicates skin disease is similar in both but differs in how visible (identifiable) it is and in the degree of identifiability

#

i mean both inputs are different in look, but used to identify same thing

charred cedar May 25, 2022, 4:15 PM

#

Hello, keen for some help around implementing linear multiple regression analysis in Python please.
I have the analysis working in terms of I can select my four independent variables and my one dependent variable, and execute the analysis.
The problem is I have a bunch of control variables such as gender, age, another variable, and a categorical variable. How do I control for these?

dim palm May 25, 2022, 4:36 PM

#

charred cedar Hello, keen for some help around implementing linear multiple regression analysi...

you have to one hot your categorical variable

#

You must create as many binary variables as modalities in your categorical variable

#

gender variable must be a binary (1 or 0) not a string

upper spindle May 25, 2022, 4:39 PM

#

what courses do people recommend for me to learn deep learning/ML

serene scaffold May 25, 2022, 4:46 PM

#

upper spindle what courses do people recommend for me to learn deep learning/ML

a lot of people recommend the Andrew Ng course, though I have not taken it.

tidal bough May 25, 2022, 4:49 PM

#

I heard something about that course being moved to python

#

definitely not done yet, though

upper spindle May 25, 2022, 4:50 PM

#

serene scaffold a lot of people recommend the Andrew Ng course, though I have not taken it.

thanks, i will take it on board

tidal bough May 25, 2022, 4:50 PM

#

tidal bough I heard something about that course being moved to python

https://www.reddit.com/r/Python/comments/uhzg3u/andrew_ngs_machine_learning_course_will_be/

r/Python - Andrew Ng's Machine Learning Course will be re-released ...

1,205 votes and 87 comments so far on Reddit

mild dirge May 25, 2022, 4:52 PM

#

I really liked this book
https://www.amazon.nl/Deep-Learning-Pytorch-Neural-Networks/dp/1617295264

Deep Learning with Pytorch: Build, Train, and Tune Neural Networks ...


#

It's specific to pytorch, but if you are a little bit familiar with most of the concepts of machine learning they do also explain some of the basics throughout

charred cedar May 25, 2022, 4:54 PM

#

dim palm you have to one hot your categorical variable

Hey Espwar, I have coded my variables as numbers. Gender is 0 or 1. Organisation (there is 3) is 1, 2, or 3.

mild dirge May 25, 2022, 4:55 PM

#

You want to 1 hot encode organisation @charred cedar

charred cedar May 25, 2022, 4:55 PM

#

What does that mean?

mild dirge May 25, 2022, 4:55 PM

#

otherwise you implicitly assume that organisation 1 and 2 are more similar to each other than 1 and 3 f.e.

#

So instead of having 1 number, let it be represented by 3 numbers

#

and 1 hot encoding means it's either [1, 0, 0] [0, 1, 0] or [0, 0, 1]

wooden sail May 25, 2022, 4:56 PM

#

1hot means to make a vector whose dimension is equal to the number of categories, with a 1 at the corresponding category and 0 everywhere else

mild dirge May 25, 2022, 4:56 PM

#

for organisation 1 2 or 3
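The encoding described above, sketched in plain Python. `one_hot` is a hypothetical helper written for this example, and the category list `[1, 2, 3]` mirrors the three organisation codes.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


organisations = [1, 2, 3]

encoded = {org: one_hot(org, organisations) for org in organisations}
print(encoded)
```

Each organisation gets its own 0/1 column, so no ordering or distance between organisation codes is implied.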

charred cedar May 25, 2022, 4:56 PM

#

So a true false for each orgid?

mild dirge May 25, 2022, 4:56 PM

#

jup

wooden sail May 25, 2022, 4:56 PM

#

pretty much

mild dirge May 25, 2022, 4:56 PM

#

basically

charred cedar May 25, 2022, 4:56 PM

#

Then feed all three in as independent variables?

mild dirge May 25, 2022, 4:56 PM

#

yes

charred cedar May 25, 2022, 4:57 PM

#

Alright let me code this real quick...

#

               coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.0455      0.315      6.490      0.000       1.425       2.666
nofeed        -0.1022      0.041     -2.473      0.014      -0.184      -0.021
notrain       -0.0940      0.042     -2.258      0.025      -0.176      -0.012
cont           0.3919      0.074      5.313      0.000       0.247       0.537
neur          -0.0182      0.047     -0.385      0.700      -0.111       0.075
gender        -0.0362      0.141     -0.258      0.797      -0.313       0.241
age            0.0194      0.007      2.919      0.004       0.006       0.032
orgidA         0.4694      0.145      3.235      0.001       0.184       0.755
orgidB         0.7681      0.141      5.460      0.000       0.491       1.045
orgidC         0.8080      0.146      5.524      0.000       0.520       1.096

#

So does this mean I have controlled for them in the analysis now? Forgive the dumb question please 😄
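Including the control variables as regressors is the usual meaning of "controlling for" them. One caveat not raised in the thread: when the model has an intercept, the full set of one-hot columns always sums to 1 and is therefore collinear with `const`. In the table above, `const` (2.0455) equals the sum of the three `orgid` coefficients (0.4694 + 0.7681 + 0.8080), which is consistent with that. A common remedy is to drop one category and treat it as the baseline. A sketch with made-up data, assuming pandas is used to build the columns:

```python
import pandas as pd

# Made-up rows; "orgid" stands in for the organisation variable.
df = pd.DataFrame({"orgid": ["A", "B", "C", "A"], "age": [30, 41, 25, 52]})

# drop_first=True omits the first category ("A"), which becomes the baseline.
org_dummies = pd.get_dummies(df["orgid"], prefix="orgid", drop_first=True)

# Combine the remaining dummies with the other regressors.
design = pd.concat([df[["age"]], org_dummies], axis=1)
print(list(design.columns))
```

The coefficients on the remaining dummies are then read as differences from the baseline organisation.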

wild dome May 25, 2022, 5:24 PM

#

in Pandas, how to group each red box into one column with 3 subcolumns?

#

this is what I want

#

a "parent" column, if that makes sense lol

serene scaffold May 25, 2022, 5:50 PM

#

wild dome in Pandas, how to group each red box into one column with 3 subcolumns?

pandas allows multiindexing, where you can have multiple "levels" of indexing on either axis. but if you do that, every column has to have both levels. not some of them

#

if you have those columns in a separate dataframe (with the same row indexing as the current dataframe), you can get that behavior
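One way to get a "parent" column level is `pd.concat` along `axis=1` with `keys`, which builds two-level column labels from separate frames that share a row index. The frames and the names `info` and `scores` below are invented for illustration.

```python
import pandas as pd

# Two made-up frames sharing the same row index.
info = pd.DataFrame({"name": ["a", "b"], "age": [30, 41]})
scores = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# keys supplies the outer ("parent") level for each frame's columns.
combined = pd.concat([info, scores], axis=1, keys=["info", "scores"])

print(combined.columns.tolist())
print(combined["scores"])
```

As noted above, every column ends up with both levels, including the ones that were not meant to be grouped.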

stone pollen May 25, 2022, 6:08 PM

#

what should i learn before trying to hop into data science and ml (considering i have basic python skills)

serene scaffold May 25, 2022, 6:17 PM

#

stone pollen what should i learn before trying to hop into data science and ml (considering i...

basic statistics, probability, and matrix/array algebra.

#

data science is applied statistics, in many ways. and then ML is a lot of statistical inference.

stone pollen May 25, 2022, 6:22 PM

#

thanks

tacit basin May 25, 2022, 6:28 PM

#

upper spindle what courses do people recommend for me to learn deep learning/ML

course.fast.ai and GitHub.com/fastai/fastbook

mint palm May 25, 2022, 8:46 PM

#

mint palm doubt regarding siamese applicability:

does no one know siamese? 😢

cold saddle May 25, 2022, 8:50 PM

#

I have 5000 documents of 5 different types, 1000 each. I want a model which will tell me which of the 5 kinds a document is.
Any advice on where to begin?

mild dirge May 25, 2022, 8:55 PM

#

what is a document?

#

It's like the most general definition of some data that you could give 😛

#

Is it just text, or also images, is it in image form, or do you have raw text as well? @cold saddle

cold saddle May 25, 2022, 9:28 PM

#

Sorry for delay I got rear ended lol. I’m gona take a step back and think about my problem.

#

Okay so I have invoices as images. They are scans but high quality ones. 600dpi I think. I only have 5 different ones so I don’t mind making a separate model for each one actually. Is there a recommendation on where to start for extracting information? I have tried pdf to text ocr solutions but I don’t think that’s the way forward as the formatting isn’t great.

mild dirge May 25, 2022, 9:47 PM

#

What differentiates the documents

#

If it's the content of the text, then you want to use that method (OCR to text)

#

If it's just the lay-out, a cnn might be good enough alrdy

spare briar May 25, 2022, 10:27 PM

#

mint palm does no one know siamese? 😢

Siamese networks are not a good model for your application. Siamese architecture is typically used with triplet or contrastive loss to compare the two input images. The model essentially learns an energy function that measures similarity.

In the case of your example a siamese architecture could model something like whether the nose and lips originate from the same person.

Since you just want to use two images as inputs for classification the simplest thing would be to have a CNN for each of them, concatenate the outputs then add a few dense layers

cold saddle May 25, 2022, 10:45 PM

#

mild dirge If it's just the lay-out, a cnn might be good enough alrdy

I'm going to focus on one kind of document first: invoices from one vendor. The reason I can’t just do OCR and regex is that the invoices come from China and the layout is similar but not identical. Sometimes they very obviously cut and glue stuff lol. I think I just need bounding boxes around the table with the lines and paragraphs. Then I can OCR and regex what I need

#

I think my best path is treating them as images and using CNN object detection. Since the docs are relatively similar I think I can be more specific than paragraphs and table

misty flint May 25, 2022, 11:56 PM

#

highly recommend streamlit for ML prototypes if you arent already using it

#

especially if you have to show your model or analyses to others

half jolt May 26, 2022, 2:25 AM

#

who offers services to parallelize a genetic algorithm in gpu python?

serene scaffold May 26, 2022, 2:26 AM

#

half jolt who offers services to parallelize a genetic algorithm in gpu python?

you're asking about cloud computation on a GPU?

half jolt May 26, 2022, 2:27 AM

#

serene scaffold you're asking about cloud computation on a GPU?

gpu with colab

#

I have the algorithm, I just want to parallelize

serene scaffold May 26, 2022, 2:36 AM

#

half jolt I have the algorithm, I just want to parallelize

a GPU is already massively parallel on the inside. what part do you want to parallelize?

half jolt May 26, 2022, 2:38 AM

#

serene scaffold a GPU is already massively parallel on the inside. what part do you want to para...

I just want to see even a small improvement in the time between cpu and gpu.

#

parallelize fitness, crossover, mutation

#

or whatever is possible in the code

serene scaffold May 26, 2022, 2:38 AM

#

half jolt I just want to see even a small improvement in the time between cpu and gpu.

if the algorithm would benefit from GPU computation, the performance improvement would be orders of magnitude.

#

GPUs are ideal for lots of independent, element-wise operations

half jolt May 26, 2022, 2:40 AM

#

serene scaffold if the algorithm would benefit from GPU computation, the performance improvement...

that is what I need, and my code isn't very long

#

I have tried with Cuda but I cannot understand very well, in that part I am still very new

serene scaffold May 26, 2022, 2:44 AM

#

half jolt I have tried with Cuda but I cannot understand very well, in that part I am stil...

do you know what an API is? CUDA is an API for the GPU as a piece of hardware that libraries like tensorflow and pytorch can use. as an AI developer, you don't have to actually think that hard about what CUDA is or does, except to know that you're using a CUDA-enabled device to speed up your computation.

half jolt May 26, 2022, 2:46 AM

#

serene scaffold do you know what an API is? CUDA is an API for the GPU as a piece of hardware th...

I also used pytorch but I didn't find any time change between the code I had and the one from pytorch

#

I want to hire someone who can help me. u be available?

iron basalt May 26, 2022, 2:50 AM

#

!rule 9

arctic wedgeBOT May 26, 2022, 2:50 AM

#

Rules

9. Do not offer or ask for paid work of any kind.

iron basalt May 26, 2022, 2:51 AM

#

What is your task for the genetic algorithm?

serene scaffold May 26, 2022, 2:51 AM

#

half jolt I also used pytorch but I didn't find any time change between the code I had and...

did you move the tensors to the GPU?

half jolt May 26, 2022, 2:51 AM

#

serene scaffold did you move the tensors to the GPU?

yes, with .to(device)

half jolt May 26, 2022, 2:54 AM

#

iron basalt What is your task for the genetic algorithm?

maximize profits from agricultural production

iron basalt May 26, 2022, 2:55 AM

#

What are your performance bottlenecks? Have you profiled it?

half jolt May 26, 2022, 2:58 AM

#

iron basalt What are your performance bottlenecks? Have you profiled it?

Yes, but what I want is to have comparison data between cpu and gpu

iron basalt May 26, 2022, 2:58 AM

#

Also need a bit more information on what kind of genetic algorithm / how it's implemented. Can it even be made parallel? By a GPU?

half jolt May 26, 2022, 3:00 AM

#

iron basalt Also need a bit more information on what kind of genetic algorithm / how it's im...

It would be enough for me to parallelize mutation and crossover

iron basalt May 26, 2022, 3:00 AM

#

Ok, but it depends on how that is done / represented. There are multiple ways, and the GPU is only good at some things (a lot, but there are limits to what it can do well / at all).

#

If you are dealing with a bunch of numeric arrays (e.g. big contiguous numpy arrays), then it may be the type of problem to run on a GPU.

#

Linear algebra computations.

#

Have you parallelized the algorithm without the GPU (on CPU)?

half jolt May 26, 2022, 3:03 AM

#

iron basalt Have you parallelized the algorithm without the GPU (on CPU)?

no

#

what library could i use?

iron basalt May 26, 2022, 3:04 AM

#

My answer depends on the type of computations being done. Are you doing things with numpy arrays and that is what is taking the most time?

#

(or arrays of numbers in general)

lapis sequoia May 26, 2022, 3:06 AM

#

hello, don't want to clog up the channel, but i was just wondering if I could get some help with a dataframe problem in pandas.

half jolt May 26, 2022, 3:07 AM

#

iron basalt My answer depends on the type of computations being done. Are you doing things w...

yes, the crossover part is the one that takes the longest, because my chromosomes are pairs (product, lots). as I said, it is an algorithm to maximize agricultural profits

median moat May 26, 2022, 3:08 AM

#

lapis sequoia hello, don't want to clog up the channel, but i was just wondering if I could ge...

Better to ask a question than to ask to ask. Especially if you don't want to clog the channel.

iron basalt May 26, 2022, 3:08 AM

#

half jolt yes, the crossing part is the one that takes the longest, because my chromosomes...

Ok, is it all currently implemented with pure/plain Python, or Numpy, or TF, or Pytorch, etc?

half jolt May 26, 2022, 3:09 AM

#

iron basalt Ok, is it all currently implemented with pure/plain Python, or Numpy, or TF, or ...

numpy

lapis sequoia May 26, 2022, 3:10 AM

#

median moat Better to ask a question then to ask to ask. Especially if you don't want to clo...

I'm in the help-grape channel, and I've tried to outline my problem as extensively as possible over there. was told this channel had people experienced in pandas.

iron basalt May 26, 2022, 3:10 AM

#

half jolt numpy

Ok, you can try using numba first, maybe simply telling numba to parallelize it will be fast enough (it can do CPU or GPU, but for now try just CPU (assuming your CPU has a decent number of cores/threads)).

#

After that, if you want it even faster, you can try using cupy if you are using an nvidia GPU, and pyopencl if not.

#

Or Pytorch and to device and all that. That can work too although it's a bit more than needed (it's a whole deep learning framework, not just for some generic computation on the GPU).

#

cupy basically gives you numpy on the GPU.

half jolt May 26, 2022, 3:12 AM

#

iron basalt After that, if you want even fast, you can try using cupy if you are using an nv...

i am working on google colab, with Tesla T4

iron basalt May 26, 2022, 3:13 AM

#

half jolt i am working on google colab, with Tesla T4

Ok, I would just try numba first, since it's very flexible and lets you even write Python code that gets run on the GPU.

#

(Or CPU in parallel)

#

(it also makes the numpy code run faster even without parallelism)

#

https://numba.pydata.org/

Numba: A High Performance Python Compiler

half jolt May 26, 2022, 3:14 AM

#

iron basalt Ok, I would just try numba first, since it's very flexible and lets you even wri...

I already tried with numba and pytorch, but I can't reduce the run time compared to the code I already have. That's why I was asking for a service, but I didn't know it was against the rules.

iron basalt May 26, 2022, 3:15 AM

#

half jolt I already tried with numba and pytorch, but I can't minimize the time with respe...

So you already have it running on the GPU?

#

I think I missed that, ok, so you tried pytorch.

half jolt May 26, 2022, 3:16 AM

#

iron basalt I think I missed that, ok, so you tried pytorch.

yes with pytorch too

#

maybe my algorithm cannot be parallelized 😦

iron basalt May 26, 2022, 3:17 AM

#

Yeah, but also could be how you are doing it.

#

IDK what to really say other than having to learn more about parallelization. It's too complicated of a topic, you often have to do some pretty big transformations on the algorithm to get it to parallelize well.

half jolt May 26, 2022, 3:21 AM

#

iron basalt IDK what to really say other than having to learn more about parallelization. It...

I've already exhausted all my possibilities :/

iron basalt May 26, 2022, 3:21 AM

#

Common ones are splitting the algorithm into multiple passes / phases (e.g. 1 for loop becomes 3 separate ones), flipping the data "touching" POV upside down (really hard to explain that one, it influences what synchronization is needed (if any)), removing branching (if statements).

#

Making local copies of data so that you don't need to have locks.
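As one illustration of the "removing branching" idea above, here is a minimal NumPy sketch. It assumes a mutation step that adds noise to a gene only when a random draw falls below a mutation rate. The names `genes`, `noise`, `draws`, and `rate` are hypothetical and are not taken from the code being discussed.

```python
import numpy as np

rng = np.random.default_rng(0)
genes = rng.random(8)            # hypothetical chromosome values
noise = rng.normal(0.0, 0.1, 8)  # hypothetical mutation noise
draws = rng.random(8)            # one random draw per gene
rate = 0.3                       # hypothetical mutation rate

# Branching version: one Python-level `if` per element.
mutated_loop = np.empty_like(genes)
for k in range(genes.size):
    if draws[k] < rate:
        mutated_loop[k] = genes[k] + noise[k]
    else:
        mutated_loop[k] = genes[k]

# Branch-free version: compute both outcomes, then select with a mask.
mask = draws < rate
mutated_vec = np.where(mask, genes + noise, genes)
```

Both versions produce the same array. The second one does a little extra arithmetic (it computes `genes + noise` everywhere), which is usually a good trade on hardware that prefers uniform work over per-element decisions.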

half jolt May 26, 2022, 3:23 AM

#

iron basalt Making local copies of data so that you don't need to have locks.

i have

iron basalt May 26, 2022, 3:24 AM

#

I guess a big one is make sure you are not constantly moving data back and forth from CPU to GPU and back. Do it all in one place (in batches).

half jolt May 26, 2022, 3:26 AM

#

iron basalt I guess a big one is make sure you are not constantly moving data back and forth...

Could you help me?

iron basalt May 26, 2022, 3:27 AM

#

half jolt You could help me?

If you can't show code then I don't think anyone can. I can only give general direction / hints.

half jolt May 26, 2022, 3:33 AM

#

iron basalt If you can't show code then I don't think anyone can. I can only give general di...

how can i show everything?

iron basalt May 26, 2022, 3:34 AM

#

!paste

arctic wedgeBOT May 26, 2022, 3:34 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

half jolt May 26, 2022, 3:39 AM

#

there it is, up to the crossover part

iron basalt May 26, 2022, 3:42 AM

#

This has a lot of pure python stuff happening in it, including pure/plain Python loops. If you can vectorize it with numpy it will be a lot faster.

half jolt May 26, 2022, 3:42 AM

#

iron basalt This has a lot of pure python stuff happening in it, including pure/plain Python...

in what parts?

iron basalt May 26, 2022, 3:43 AM

#

half jolt in what parts?

For example seleccionTorneo.

#

It seems you are finding the argmin.

#

In a plain Python loop.

#

A better understanding of how to vectorize things with numpy will go a long way.

half jolt May 26, 2022, 3:45 AM

#

and with GPU?

iron basalt May 26, 2022, 3:45 AM

#

You probably don't need the GPU. Not even multiple CPU cores.

#

Simply making proper use of numpy will give you a very large performance increase.

#

If it's still too slow, you can throw numba at it (after correctly applying numpy).

half jolt May 26, 2022, 3:47 AM

#

iron basalt You probably don't need the GPU. Not even multiple CPU cores.

but could i parallelize with cpu?

iron basalt May 26, 2022, 3:47 AM

#

Yes, probably, just from a short glance looks like it.

#

GPU probably too, but it is probably not needed, unless you really start scaling up really big.

half jolt May 26, 2022, 3:49 AM

#

iron basalt GPU probably too, but it is probably not needed, unless you really start scaling...

that's what I'm going for: right now there isn't much data, but it will start to increase, and that's why I'm trying to do it with the GPU

iron basalt May 26, 2022, 3:49 AM

#

If you don't want to learn more about numpy and just want to write in this style (hand written loops), then it's time to switch to a language that does numeric computation better.

#

(For example, if you translated this to something like C++ (pretty much as directly as possible), it would just be fast already (no parallelization))

half jolt May 26, 2022, 3:52 AM

#

iron basalt If you don't want to learn more about numpy and just want to write in this style...

Could I get a small code sample please?

iron basalt May 26, 2022, 3:54 AM

#

fo = np.inf
index = 0
for i in parents:
  _fo = fitness(poblacionInicial[i], rendimiento_kg_m2_prod, precio_kg_t_suma)
  if _fo < fo:
    fo = _fo
    index = i

#

float fo = INFINITY;  // from <cmath>
int index = 0;
for (int i = 0; i < num_parents; ++i) {
  float _fo = fitness(poblacionInicial[i], rendimiento_kg_m2_prod, precio_kg_t_suma);
  if (_fo < fo) {
    fo = _fo;
    index = i;
  }
}

half jolt May 26, 2022, 3:57 AM

#

iron basalt ```cpp float fo = F32_INFINITY; int index = 0; for (int i = 0; i < num_parents; ...

I meant a sample of optimizing with numpy 😦

#

sorry if i didn't express myself clearly

iron basalt May 26, 2022, 3:58 AM

#

half jolt I quoted optimizing with numpy 😦

Something like index = np.argmin(fitness(poblacionInicial, rendimiento_kg_m2_prod, precio_kg_t_suma)).

#

The idea with numpy is to avoid Python loops.

#

You operate on entire arrays rather than individual elements.

#

So fitness does not work on a specific i, but rather all of them (so no [i]).
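To make the "no [i]" point concrete, here is a small sketch that finds the same index twice: once with a hand-written loop and once at the array level. The fitness used here (a weighted sum per individual) is only a toy stand-in, and `population`, `weights`, `fitness_one`, and `fitness_all` are hypothetical names, not the functions from the pasted code.

```python
import numpy as np

def fitness_one(individual, weights):
    # Toy per-individual fitness: a weighted sum of the genes.
    return float(np.dot(individual, weights))

def fitness_all(population, weights):
    # Same computation for every row at once:
    # (n_individuals, n_genes) @ (n_genes,) -> (n_individuals,)
    return population @ weights

population = np.array([
    [3.0, 1.0, 2.0],
    [0.5, 0.5, 0.5],
    [2.0, 2.0, 2.0],
    [1.0, 0.0, 4.0],
])
weights = np.array([1.0, 2.0, 3.0])

# Loop version, in the style of the original snippet.
best_value = np.inf
best_index = 0
for i in range(len(population)):
    value = fitness_one(population[i], weights)
    if value < best_value:
        best_value = value
        best_index = i

# Array-level version: one fitness call, one argmin.
best_index_vec = int(np.argmin(fitness_all(population, weights)))
```

The per-row fitness values are 11, 3, 12, and 13, so both approaches pick index 1.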

half jolt May 26, 2022, 4:01 AM

#

iron basalt You operate on entire arrays rather than individual elements.

I understand, thanks for your help, I will try to optimize that way because I don't know how to do it with gpu

iron basalt May 26, 2022, 4:02 AM

#

(The reason why avoiding Python loops is important is because they are slow, and instead the looping happens inside numpy which is implemented in C so its loops are fast like in the C++ example).

iron basalt May 26, 2022, 4:03 AM

#

half jolt I understand, thanks for your help, I will try to optimize that way because I do...

It becomes more obvious how to make it work on the GPU after you have vectorized it (made it all work at the array level / numpy). It's often a first step.

#

(Because for example, cupy has many of the same functions as numpy, and so it can be pretty much one to one converted (so the code looks the same, just using the GPU))

#

(For example cupy also has argmin, https://docs.cupy.dev/en/stable/reference/generated/cupy.argmin.html which would run on the GPU)

half jolt May 26, 2022, 4:08 AM

#

iron basalt Something like `index = np.argmin(fitness(poblacionInicial, rendimiento_kg_m2_pr...

To use this part, I would have to modify the fitness function, right?

half jolt May 26, 2022, 4:08 AM

#

iron basalt (For example cupy also has argmin, https://docs.cupy.dev/en/stable/reference/gen...

Little by little I understand

iron basalt May 26, 2022, 4:09 AM

#

half jolt To use this part, I would have to modify the fitness function, right?

Yes, think more about operating on entire arrays rather than individual items. Think groups / chunks of data.

#

Computers like groups of the same type of thing. For speed, simplicity, etc.

half jolt May 26, 2022, 4:10 AM

#

iron basalt Yes, think more about operating on entire arrays rather than individual items. T...

I'll try to do it that way, thank you very much

iron basalt May 26, 2022, 4:10 AM

#

So your functions should take arrays as arguments, and apply array-level operations like argmin.

#

Now you might get into a situation in which you don't know how to vectorize it / don't know which numpy functions to use and don't see a way to do it. That is where numba comes in, it lets you make your own functions like argmin that are just as fast / operate on numpy arrays. Numba is made to work with numpy to fill any gaps in numpy (missing functions). It can also just make it a lot faster (and even run on the GPU, but also can parallelize on CPU (don't worry about this yet)).
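A rough sketch of what "make your own function like argmin" can look like with numba. The function `closest_index` is a made-up example, not something from numpy or from the pasted code. The `try`/`except` is only there so the sketch still runs (uncompiled) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the loop below to machine code
except ImportError:
    def njit(func):
        # Fallback assumption: without numba, run the plain Python function.
        return func

@njit
def closest_index(values, target):
    # Hand-written loop: index of the element closest to `target`.
    best_index = 0
    best_gap = np.inf
    for i in range(values.shape[0]):
        gap = abs(values[i] - target)
        if gap < best_gap:
            best_gap = gap
            best_index = i
    return best_index

values = np.array([1.0, 4.0, 9.0])
result = closest_index(values, 5.0)
```

The loop is written the "slow" way on purpose. With numba available, the decorated function is compiled the first time it is called, so the explicit loop stops being the bottleneck.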

half jolt May 26, 2022, 4:13 AM

#

Thanks for the help

half jolt May 26, 2022, 4:17 AM

#

iron basalt So your functions should take arrays are arguments, and apply array-level operat...

One last question, how could I vectorize or optimize the fitness function?

iron basalt May 26, 2022, 4:18 AM

#

half jolt One last question, how could I vectorize or optimize the fitness function?

for i,j in zip(individuo[0], individuo[1]):

#

Try splitting individuo into two different arrays. Or you can do some fancy numpy datatype stuff.

#

(So fitness takes two args)

#

So you can store stuff like this in general:
[(x, y), (x, y), (x, y), ...]

#

Or

#

[x, x, x, ...]
[y, y, y, ...]

#

1 array versus 2 arrays.

#

This part is a bunch of elementwise operations: rendimiento[i]*precio[i]*j

#

And the rest is a sum.

#

So say you have individuo_i and individuo_j.

#

rendimiento[individuo_i] gives you another array.

#

Numpy lets you use arrays with indices in them to index another array.

#

np.sum(rendimiento[individuo_i] * precio[individuo_i] * individuo_j)

#

>>> a = np.random.randint(10, size=10)
>>> a
array([6, 4, 2, 8, 8, 8, 3, 1, 7, 1])
>>> b = np.array([3, 1, 5])
>>> b
array([3, 1, 5])
>>> a[b]
array([8, 4, 8])
>>>

#

So, it's basically the same thing, just no hand written loop, working at the array level, and that includes indexing at the array level.
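Putting the pieces above together, here is a minimal sketch comparing the hand-written loop with the array-level expression. The numbers are made up, and it assumes `individuo_i` holds product indices and `individuo_j` holds the matching lot counts, as described in the messages above.

```python
import numpy as np

# Hypothetical per-product data (index = product id).
rendimiento = np.array([2.0, 3.5, 1.2, 4.0])
precio = np.array([10.0, 8.0, 15.0, 6.0])

# One individual, split into two arrays instead of pairs.
individuo_i = np.array([0, 2, 3])        # which products
individuo_j = np.array([5.0, 1.0, 2.0])  # how many lots of each

def fitness_loop(ind_i, ind_j):
    total = 0.0
    for i, j in zip(ind_i, ind_j):
        total += rendimiento[i] * precio[i] * j
    return total

def fitness_vectorized(ind_i, ind_j):
    # Index with whole arrays, multiply elementwise, then sum.
    return np.sum(rendimiento[ind_i] * precio[ind_i] * ind_j)

loop_result = fitness_loop(individuo_i, individuo_j)
vec_result = fitness_vectorized(individuo_i, individuo_j)
```

Both give 166.0 here (100 + 18 + 48). For a whole population stored as 2-D arrays, the same expression with `np.sum(..., axis=1)` would return one fitness value per row.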

half jolt May 26, 2022, 4:33 AM

#

iron basalt `np.sum(rendimiento[individuo_i] * precio[individuo_i] * individuo_j)`

this seems to be what i was looking for, thanks

tacit basin May 26, 2022, 4:38 AM

#

Could you share sample frames you want to join and code and output?

timid narwhal May 26, 2022, 5:49 AM

#

does anyone know how to turn the to_numpy output into an array with separated elements?

#

like [-35.2210673 -9.0063682 'Delmiro Gouveia 774, Maceió, Alagoas'] and add the commas into this [-35.2210673, -9.0063682, 'Delmiro Gouveia 774, Maceió, Alagoas']

#

I tried to do np.char.split but it doesnt work with nonstrings

tacit basin May 26, 2022, 5:56 AM

#

timid narwhal I tried to do np.char.split but it doesnt work with nonstrings

you want to convert this to python list?

timid narwhal May 26, 2022, 5:57 AM

#

yeah

tacit basin May 26, 2022, 6:03 AM

#

convert numpy array to python list?

royal crest May 26, 2022, 6:14 AM

#

list()
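For reference, a short sketch of the conversion, assuming the row came out of `to_numpy()` as an object array. The missing commas are only how NumPy prints arrays; the elements are already separate. `list()` works for a single row, while `.tolist()` also converts nested rows and NumPy scalars into plain Python objects.

```python
import numpy as np

row = np.array(
    [-35.2210673, -9.0063682, "Delmiro Gouveia 774, Maceió, Alagoas"],
    dtype=object,
)
row_as_list = row.tolist()  # plain Python list, printed with commas

# With a 2-D array, list() only converts the outer level.
table = np.array([[1.5, "a"], [2.5, "b"]], dtype=object)
outer_only = list(table)   # list of 1-D arrays
nested = table.tolist()    # list of lists

# With numeric arrays, .tolist() also unwraps NumPy scalars.
nums = np.array([1.5, 2.5])
python_float_type = type(nums.tolist()[0])
numpy_scalar_type = type(list(nums)[0])
```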

lone vortex May 26, 2022, 6:50 AM

#

Anyone can help with pandas, I am completely new to it

rose agate May 26, 2022, 7:00 AM

#

lone vortex Anyone can help with pandas, I am completely new to it

probably best to do a tutorial on it and try to use it yourself then ask questions when you get stuck. This might be helpful: https://towardsdatascience.com/python-for-data-science-basics-of-pandas-5f8d9680617e

Medium

Python for Data Science — A Guide to Pandas

The Complete Data Exploration Guide in 10 Minutes

lone vortex May 26, 2022, 7:07 AM

#

Ok thanks

bold timber May 26, 2022, 7:24 AM

#

What is the meaning of ord? Is ord=1 for calculating the Manhattan distance?

wooden sail May 26, 2022, 7:32 AM

#

the common vector norms you are familiar with are what is called l-p norms, which consist of the sum of the absolute values of the entries of the vector raised to the pth power, and then you take the pth root of the whole sum

#

ord = 1 means raised to the first power and taking the 1st root, i.e. the sum of absolute values. as you said, this is the manhattan distance

#

ord = 2 is the usual euclidean distance

#

infinity norms return the largest (ord=inf) or smallest (ord=-inf) absolute value among the elements, and 'fro' is short for Frobenius, which is similar to the 2-norm (euclidean distance) but for matrices (a double sum instead of a single sum)
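A small sketch of the `ord` values described above, using `np.linalg.norm` on a toy vector and matrix. The helper `lp_norm` is a hypothetical name; it only spells out the "sum of absolute values to the pth power, then the pth root" definition.

```python
import numpy as np

def lp_norm(x, p):
    # Sum of |x_n|^p, then the p-th root.
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])

manhattan = np.linalg.norm(x, ord=1)        # |3| + |-4|
euclidean = np.linalg.norm(x, ord=2)        # sqrt(3^2 + 4^2)
largest = np.linalg.norm(x, ord=np.inf)     # max |x_n|
smallest = np.linalg.norm(x, ord=-np.inf)   # min |x_n|

M = np.array([[1.0, 2.0], [3.0, 4.0]])
frobenius = np.linalg.norm(M, ord="fro")    # sqrt of the sum of all squared entries
```

For this `x` the four vector norms are 7, 5, 4, and 3, and `lp_norm(x, 1)` and `lp_norm(x, 2)` agree with the first two.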

bold timber May 26, 2022, 7:42 AM

#

wooden sail the common vector norms you are familiar with are what is called l-p norms, whic...

what is 'pth'? can you explain to me?

#

why is the result of this code one? can you explain the math to me?

wooden sail May 26, 2022, 7:44 AM

#

pth as in ordinal. e.g. 1st, 2nd, 3rd, 4th, 5th, 6th, etc

#

i don't think this server has a latex bot so i can't just write the math

#

lemme find an image

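#

[image: definition of the l-p norm, $\Vert x \Vert_p = \left(\sum_{n=1}^{N} \vert x_n \vert^p\right)^{1/p}$]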

bold timber May 26, 2022, 7:45 AM

#

bold timber why the result of this code is one? can you elaborate to me by math?

can you explain to me about this? @wooden sail

wooden sail May 26, 2022, 7:46 AM

#

if you substitute what you have into the equation i shared, and note that by default norm takes p = 2

#

we get sqrt((4/5)^2 + (3/5)^2) = sqrt (16/25 + 9/25) = sqrt(25/25) = 1
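The code in question was posted as an image and is not preserved here, so the following is only a guess at its shape: it assumes the vector was `[4/5, 3/5]` and that `norm` was called with its default `ord`.

```python
import numpy as np

v = np.array([4 / 5, 3 / 5])

by_hand = np.sqrt((4 / 5) ** 2 + (3 / 5) ** 2)  # the arithmetic written out above
by_numpy = np.linalg.norm(v)                    # default is the 2-norm for vectors
```

Both values come out as 1 up to floating-point rounding.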

bold timber May 26, 2022, 7:49 AM

#

wooden sail we get sqrt((4/5)^2 + (3/5)^2) = sqrt (16/25 + 9/25) = sqrt(25/25) = 1

ok, thank you so much

tacit basin May 26, 2022, 7:57 AM

#

lone vortex Anyone can help with pandas, I am completely new to it

What's your question re pandas?

wooden sail May 26, 2022, 8:02 AM

#

.latex $\left(\sum_{n=1}^{N} \vert x_n \vert^p\right)^{1/p}$

dusty valve May 26, 2022, 8:10 AM

#

how would i fit a dataset from a text file into a language prediction model?

peak ridge May 26, 2022, 8:22 AM

#

How much math is imp to learn data science data visualization and machine learning
I mean how much math is required for being a data scientist

gray orchid May 26, 2022, 8:25 AM

#

peak ridge How much math is imp to learn data science data visualization and machine learni...

calculus, linear algebra, probability, statistics, some discrete math

odd meteor May 26, 2022, 8:26 AM

#

dusty valve how would i fit a dataset from a text file into a language prediction model?

Clean + preprocess the text so you can extract numeric features from the text data. These extracted numeric features can then be passed to your ML model.

wooden sail May 26, 2022, 8:35 AM

#

peak ridge How much math is imp to learn data science data visualization and machine learni...

for being a data scientist, at least undergrad level multivar calc, linalg, and stats

#

but the higher the level, the better

peak ridge May 26, 2022, 8:41 AM

#

@gray orchid @wooden sail resources to learn plz!

gray orchid May 26, 2022, 8:43 AM

#

Oh, and the most important

#

the ability to find resources

wooden sail May 26, 2022, 8:43 AM

#

uni, spivak's calculus book, gilbert strang's linalg book

#

louis scharf's statistical signal processing book

#

and classics like randolph moses and petre stoica's spectral analysis of signals

dusty valve May 26, 2022, 8:54 AM

#

wooden sail and classics like randolph moses and petre stoica's spectral analysis of signals

.bm

wooden sail May 26, 2022, 8:57 AM

#

that book predates ML and AI becoming such hot buzzwords btw, so you'll find no mention of them. nowadays, most topics of signal processing, statistical analysis and optimization fall under that umbrella though. they overlap like 99% or one is a subset of the other, pretty much

terse frigate May 26, 2022, 9:03 AM

#

Ant colony optimization

#

can someone explain an approach

wooden sail May 26, 2022, 9:07 AM

#

you can set up a system of equations. kinda have to make an assumption on the desired quantity, but this should be a scalar factor. you can assume the output quantity is 1[units] * desired_percentage

#

with that in mind, you want a linear combination of the given percentages that yields the desired percentage

#

with the restriction that the sum of quantities equals 1

#

sounds like linear programming

terse frigate May 26, 2022, 9:08 AM

#

wooden sail you can set up a system of equations. kinda have to make an assumption on the de...

you mean the P^req

wooden sail May 26, 2022, 9:09 AM

#

mhm

terse frigate May 26, 2022, 9:11 AM

#

wooden sail with that in mind, you want a linear combination of the given percentages that y...

but i understand that, i just have no clue where to begin

terse frigate May 26, 2022, 9:11 AM

#

wooden sail sounds like linear programming

in terms of this

wooden sail May 26, 2022, 9:11 AM

#

idk, try lagrange multipliers?

#

you have one equation and a constraint

#

the equation is convex (but not strictly so, there might be many solutions)

wooden sail May 26, 2022, 9:59 AM

#

heh the solution is the same as for beamforming, but there's a pseudo inverse of a rank 1 matrix involved. i'll type it up after i eat

wooden sail May 26, 2022, 10:15 AM

#

ok let's give this a shot

#

.latex let's start by calling the desired percentage $p$, the given percentages $\boldsymbol{x} \in \mathbb{R}^n$, and our target quantity $\boldsymbol{w} \in \mathbb{R}^n$, i.e. the amount of each of the ingredients

#

oh latex is still not allowed here

#

oof

#

lemme grab my tablet

wooden sail May 26, 2022, 10:32 AM

#

#

#

underlined quantities are vectors, so that underlined 1 is a vector of 1s of size n @terse frigate

#

this should give you ONE solution. there are others, since xx^T is rank 1
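The handwritten derivation was posted as an image and is not preserved in this log, so the following is only a sketch of one way to get a solution of this kind, not necessarily the formula from the image. It assumes the two conditions described above (the weighted percentages add up to the desired percentage, and the quantities sum to 1), stacks them into one linear system, and takes the minimum-norm solution with a pseudo-inverse. The numbers are made up.

```python
import numpy as np

x = np.array([0.10, 0.25, 0.40])  # hypothetical ingredient percentages
p = 0.30                          # hypothetical desired percentage

# Two linear conditions on the quantities w:
#   x . w = p   (the mix hits the desired percentage)
#   1 . w = 1   (the quantities sum to one unit)
A = np.vstack([x, np.ones_like(x)])
b = np.array([p, 1.0])

# Minimum-norm solution of the underdetermined system A w = b.
w = np.linalg.pinv(A) @ b

achieved = x @ w
total = w.sum()
```

This picks one solution out of many. Nothing here forces the quantities to be non-negative, so if a negative entry appears, a linear program with bounds (as suggested earlier) would be the safer tool.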

mint palm May 26, 2022, 12:59 PM

#

in transfer learning we have two steps, right?

1. train on a task with a simple supervised ANN
2. use the pretrained model to further train the ANN to suit a similar but slightly different problem

but i read about transfer learning in two places:

1. the pretrained model (supervised ANN) was deployed and kept learning and improving afterwards
2. the pretrained model (supervised ANN) was trained again (supervised), and only then deployed

i mean, in the 2nd approach the model doesn't improve after deploying

#

are both these transfer learning??

#

i wanna implement the first one where it improve after deploying....but i am getting tutorial for 2nd only

wooden sail May 26, 2022, 1:02 PM

#

they're both transfer learning, since the idea behind that is to train a part of the network ahead of time, and then keep its parameters fixed while adding new, trainable parameters after the pre-trained network. how you do the training of the new part is a different matter

mint palm May 26, 2022, 1:05 PM

#

wooden sail they're both transfer learning, since the idea behind that is to train a part of...

makes sense

mint palm May 26, 2022, 1:31 PM

#

wooden sail they're both transfer learning, since the idea behind that is to train a part of...

and how does it learn after deployment?

wooden sail May 26, 2022, 1:34 PM

#

i'm not sure what methods are used for that

charred cedar May 26, 2022, 2:08 PM

#

Can anyone explain this stupid example code to me please?

>>> import statsmodels.api as sm
>>> import statsmodels.genmod.families.links as links
>>> from statsmodels.stats.mediation import Mediation
>>> probit = links.probit
>>> outcome_model = sm.GLM.from_formula("cong_mesg ~ emo + treat + age + educ + gender + income",
...                                     data, family=sm.families.Binomial(link=probit()))
>>> mediator_model = sm.OLS.from_formula("emo ~ treat + age + educ + gender + income", data)
>>> med = Mediation(outcome_model, mediator_model, "treat", "emo").fit()
>>> med.summary()

#

Specifically the string arguments... because it makes no sense...

#

https://www.statsmodels.org/stable/generated/statsmodels.stats.mediation.Mediation.html

#

This is meant to be an example implementation of a mediated regression analysis in Python with statsmodels

dusty valve May 26, 2022, 2:45 PM

#

i got this code -

#

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
print('done')

# read one sentence per line; set() drops duplicate lines
with open('users.txt') as f:
    sentences = set(f.read().split('\n'))

vocab_size = 1000
embedding_dim = 16
max_length = 16
trunc_type = 'post'
padding_type = 'post'
oov_token = '<OOV>'
training_size = 20000

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(sentences)

# pad/truncate to the same length the Embedding layer expects
sequences = pad_sequences(sequences, padding=padding_type,
                          truncating=trunc_type, maxlen=max_length)

# the model is only defined here; nothing is compiled, trained, or printed yet
model = keras.Sequential([keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
                          keras.layers.GlobalAveragePooling1D(),
                          keras.layers.Dense(6, activation='relu'),
                          keras.layers.Dense(1, activation='sigmoid')])

#

when i run it, all it shows is 2022-05-26 10:42:40.631846: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found 2022-05-26 10:42:40.632428: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. 2022-05-26 10:42:59.610722: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found 2022-05-26 10:42:59.611401: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303) 2022-05-26 10:42:59.623799: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: LAPTOP-KDFNN9DK 2022-05-26 10:42:59.625071: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: LAPTOP-KDFNN9DK 2022-05-26 10:42:59.626772: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

#

it only loads tf up, doesn't run the rest of the code

leaden crow May 26, 2022, 3:15 PM

#

hey, for some reason an image won't show up behind my data

wild dome May 26, 2022, 3:29 PM

#

in Pandas, how to merge the red boxes into a single cell? how to have multirows of rows with same value, from an existing dataframe

serene scaffold May 26, 2022, 3:46 PM

#

wild dome in Pandas, how to merge the red boxes into a single cell? how to have multirows ...

what would it mean if they were one cell?

#

do you want to sum them, or what?

#

or are you trying to have cells that span multiple rows? because you can't do that

desert oar May 26, 2022, 3:55 PM

#

you can make a cell that contains tuples, although usually you don't want to do that
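If the goal is only the look of row-spanning cells, one hedged option is a MultiIndex on the rows: pandas prints a repeated outer label once per group, although the underlying data still has one row per entry. The column names `group`, `item`, and `value` below are hypothetical, since the original screenshot is not preserved.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "item": ["x", "y", "x", "z"],
    "value": [1, 2, 3, 4],
})

# Move the repeated column into the row index; when printed,
# each group label is shown once for its block of rows.
indexed = df.set_index(["group", "item"])
print(indexed)

# Rows are still addressable individually through both levels.
value_a_y = indexed.loc[("A", "y"), "value"]
```

This changes how the frame is displayed and indexed, not how many rows it has, so it is not a true merged cell in the spreadsheet sense.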

lapis sequoia May 26, 2022, 4:25 PM

#

what is the difference between LSTM and RNN?

tacit basin May 26, 2022, 4:27 PM

#

charred cedar Specifically the string arguments... because it makes no sense...

Why doesn't it make sense?

charred cedar May 26, 2022, 4:28 PM

#

tacit basin Why doesn't it make sense?

Well the first question would be what is cong_mesg?

#

Second would be what does ~ mean?

#

Third would be: are all the +'s after that for control variables?

charred cedar May 26, 2022, 4:29 PM

#

tacit basin Why doesn't it make sense?

^, thanks for response

tacit basin May 26, 2022, 4:30 PM

#

What is data?

charred cedar May 26, 2022, 4:31 PM

#

I am assuming a dataframe but I don't know, this is example code

tacit basin May 26, 2022, 4:31 PM

#

I guess these are col names

charred cedar May 26, 2022, 4:31 PM

#

Yes they should be column names

tacit basin May 26, 2022, 4:32 PM

#

Ok so mystery solved? :)

charred cedar May 26, 2022, 4:32 PM

#

No unfortunately

#

It doesn't answer the three questions that have me stumped

tacit basin May 26, 2022, 4:38 PM

#

So first q. We assume they are col names

#

~ usually means neg, so i guess here is the same

#

i would guess are for features to include in linear model

serene scaffold May 26, 2022, 4:39 PM

#

desert oar you can make a cell that contains tuples, although usually you don't want to do ...

I think they want row-spanning cells like the ones you can have in excel

tacit basin May 26, 2022, 4:39 PM

#

charred cedar It doesn't answer the three questions that have me stumped

But you could get some data and verify these assumptions :)

charred cedar May 26, 2022, 4:40 PM

#

So I have Neuroticism (which is the column I want to use as a mediator), Lack of Feedback (which is the independent variable), and Job Satisfaction (which is the dependent variable). I also have Age, Gender, OrgidA, OrgidB, and OrgidC which are variables to control for. So how do you think I format those columns into the correct arguments?

#

import statsmodels.api as sm
import statsmodels.genmod.families.links as links
from statsmodels.stats.mediation import Mediation

probit = links.probit
outcome_model = sm.GLM.from_formula("neur ~ nofeed + jsat + age + gender + orgidA + orgidB + orgidC",
                                     df, family=sm.families.Binomial(link=probit()))
mediator_model = sm.OLS.from_formula("nofeed ~ jsat + age + gender + orgidA + orgidB + orgidC", df)
med = Mediation(outcome_model, mediator_model, "jsat", "nofeed").fit()
med.summary()

#

This gets some error which tells you nothing helpful.

#

D:\Projects\Python\135 Code\Git\BSN414\.venv\lib\site-packages\statsmodels\stats\mediation.py:372: RuntimeWarning: invalid value encountered in true_divide
  self.prop_med_tx = self.ACME_tx / self.total_effect

#

All column names are correct

tacit basin May 26, 2022, 4:41 PM

#

Never used it. Now on mobile hard to debug this . Sorry

charred cedar May 26, 2022, 4:41 PM

#

All good, I appreciate the help either way. Do you think this is the way the string arguments are done though?

tacit basin May 26, 2022, 4:42 PM

#

Let me check docs

charred cedar May 26, 2022, 4:42 PM

#

That is the annoying part, docs are useless. Do you need the link again though?

tacit basin May 26, 2022, 4:43 PM

#

These are R-style formulas I'm reading

#

https://www.statsmodels.org/dev/examples/notebooks/generated/glm_formula.html

charred cedar May 26, 2022, 4:43 PM

#

Yes this Python package is probably based on R

tacit basin May 26, 2022, 4:45 PM

#

https://www.statsmodels.org/dev/examples/notebooks/generated/formulas.html

#

Patsy https://patsy.readthedocs.io/en/latest/overview.html

#

Looks like y ~ x1 + x2, where y is the dependent var and x1, x2 are independent

#

Formula language https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-language

charred cedar May 26, 2022, 4:51 PM

#

I'll admit I don't understand this formula writing, and for a 3am read, these docs also aren't very clear.

#

Nonetheless the code snippet should be correct for a mediated regression analysis.

haughty topaz May 26, 2022, 4:54 PM

#

from sklearn.preprocessing import MultiLabelBinarizer

the_100_most_common_words = ['i', 'you', 'the', 'to', 'and', 'a', 'it', 'ross', 'monica', 'rachel', 'chandler', 'is', 'that', 'joey', 'phoebe', 'oh', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'you', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'what', 'na', 'out', "'re", 'at', 'yeah', 'no', 'so', 'scene', 'well', 'your', 'there', 't', 'hey', 'no', 'she', 'okay', 'ross', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'chandler', 'can', 'monica', 'joey', 'rachel', 'the', 'here', 'phoebe', 'm', 'it', 'uh', 'they', 'one', 'think', 'mean', 'did', 'so', 'all', 're', 'see', 'don', 'back', 'and', "'ll", 'from', 'he', 'okay', 'if', 'want', "y'know"]

mlb = MultiLabelBinarizer().fit([the_100_most_common_words])

sentence_to_transform = ["c'mon", ',', 'you', "'re", 'going', 'out', 'with', 'the', 'guy', '!']

vector = mlb.transform([sentence_to_transform])
print(vector)
print(len(vector[0]))

#

The length of the 100 most common words is 100
How come the length of the vector it creates is only 84?

charred cedar May 26, 2022, 4:56 PM

#

haughty topaz ```py from sklearn.preprocessing import MultiLabelBinarizer the_100_most_common...

You break the most common words into two lists firstly.

#

Is that intended?

haughty topaz May 26, 2022, 4:57 PM

#

no that's just a copy paste mistake

charred cedar May 26, 2022, 4:57 PM

#

Did you confirm the length of that list?

haughty topaz May 26, 2022, 4:58 PM

#

Yea yea it's 100

#

For sure

charred cedar May 26, 2022, 5:00 PM

#

Well, those are the dumb reasons checked off. I don't know enough about the sklearn package unfortunately.

#

Good luck fixing it.

haughty topaz May 26, 2022, 5:01 PM

#

haughty topaz ```py from sklearn.preprocessing import MultiLabelBinarizer the_100_most_common...

@serene scaffold You gave me this solution, would you know the issue?

serene scaffold May 26, 2022, 5:09 PM

#

haughty topaz @serene scaffold You gave me this solution, would you know the issue?

the list contains duplicates, and only 84 of the elements are unique.

#

!e

print(len(set(['i', 'you', 'the', 'to', 'and', 'a', 'it', 'ross', 'monica', 'rachel', 'chandler', 'is', 'that', 'joey', 'phoebe', 'oh', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'you', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'what', 'na', 'out', "'re", 'at', 'yeah', 'no', 'so', 'scene', 'well', 'your', 'there', 't', 'hey', 'no', 'she', 'okay', 'ross', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'chandler', 'can', 'monica', 'joey', 'rachel', 'the', 'here', 'phoebe', 'm', 'it', 'uh', 'they', 'one', 'think', 'mean', 'did', 'so', 'all', 're', 'see', 'don', 'back', 'and', "'ll", 'from', 'he', 'okay', 'if', 'want', "y'know"])))

arctic wedgeBOT May 26, 2022, 5:09 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
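
To make the duplicate problem concrete, here is a minimal sketch. The shortened word list is made up for illustration and is not the full list above. It counts the unique entries, lists the repeated words, and removes duplicates while keeping first-seen order:

```python
from collections import Counter

# Hypothetical, shortened stand-in for the full word list above.
words = ['i', 'you', 'the', 'ross', 'you', 'what', 'what', 'ross', 'the']

print(len(words))       # total entries, duplicates included
print(len(set(words)))  # unique entries, which is what MultiLabelBinarizer keeps

# Words that appear more than once.
duplicates = [w for w, n in Counter(words).items() if n > 1]
print(duplicates)

# dict.fromkeys keeps the first occurrence of each word, in order.
unique_words = list(dict.fromkeys(words))
print(unique_words)
```

If the vocabulary really has to contain 100 distinct words, deduplicating before taking the top 100 is probably the simplest fix.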

charred cedar May 26, 2022, 5:09 PM

#

Guess we missed a dumb reason.

#

😄

serene scaffold May 26, 2022, 5:10 PM

#

😄

wooden sail May 26, 2022, 5:36 PM

#

ok, let's give this another shot

#

.latex $\left( \sum_{n=1}^N \vert x_n \vert ^p \right)^\frac{1}{p}$ for the l-p norm

strange elbowBOT May 26, 2022, 5:37 PM

#

$latex.png$

wooden sail May 26, 2022, 5:37 PM

#

aight, cool

serene scaffold May 26, 2022, 5:45 PM

#

!docs pandas.DataFrame.groupby

arctic wedgeBOT May 26, 2022, 5:45 PM

#

pandas.DataFrame.groupby


DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.

serene scaffold May 26, 2022, 5:46 PM

#

groupby returns a grouped dataframe, whereon you can apply another operation, like mean

#

you would probably want to drop the name column before grouping, since it doesn't matter for this.

haughty topaz May 26, 2022, 5:53 PM

#

from sklearn.preprocessing import MultiLabelBinarizer

the_100_most_common_words = ['you', 'the', 'to', 'and', 'a', 'it', 'is', 'that', 'in', 'of', 'do', "n't", 'me', 'on', 'know', 'this', 'just', 'my', 's', 'with', 'what', 'her', 'we', 'have', "'m", 'was', 'for', 'are', 'not', 'he', 'like', 'up', 'be', 'na', 'out', "'re", 'at', 'so', 'your', 'there', 't', 'no', 'she', 'right', 'his', 'all', 'but', 'him', 'about', 'get', 'go', 'gon', 'got', 'can', 'here', 'm', 'uh', 'they', 'one', 'think', 'mean', 'did', 're', 'see', 'don', 'back', "'ll", 'from', 'okay', 'if', 'want', "y'know", 'look', 'now', 'over', 'really', 'guys', 'guy', 'as', 'how', 'then', 'who', 'phone', '‘', 'by', 'ah', "'ve", 'would', 'when', 'thing', 'down', 'going', 'good', 'were', 'tell', 'had', 'off', 'apartment', 'door', 'something']

mlb = MultiLabelBinarizer().fit([the_100_most_common_words])

sentence_to_transform = ["c'mon", ',', 'you', "'re", 'going', 'out', 'with', 'the', 'guy', '!']

vector = mlb.transform([sentence_to_transform])
print(vector)

#

[[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0]]

#

Why does this give this vector bruh, I can't with this MultiLabelBinarizer

#

Wtf it sorts the classes?

serene scaffold May 26, 2022, 5:56 PM

#

haughty topaz ``` [[0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 ...

you transformed one sentence, and you got a row vector. this looks like expected behavior.

haughty topaz May 26, 2022, 5:57 PM

#

Yeah but look at the classes I fit in the MultiLabelBinarizer

serene scaffold May 26, 2022, 5:59 PM

#

haughty topaz Yeah but look at the classes I fit in the MultiLabelBinarizer

why do you need it to preserve a given word order?

haughty topaz May 26, 2022, 5:59 PM

#

No it doesn't need to actually

#

but I didn't get why it did that

serene scaffold May 26, 2022, 5:59 PM

#

there's a mean() method

haughty topaz May 26, 2022, 5:59 PM

#

weird that it sorts the classes

serene scaffold May 26, 2022, 6:02 PM

#

haughty topaz weird that it sorts the classes

you could fit a MLB on something that isn't ordered, like a set, so in some respects, sorting it results in more consistent behavior.
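
A small sketch of the sorting behaviour, using a made-up five-word vocabulary rather than the list above. By default `MultiLabelBinarizer` stores the learned classes in sorted order; passing the `classes=` argument should pin the column order to the one you supply (the entries must be unique):

```python
import warnings

from sklearn.preprocessing import MultiLabelBinarizer

vocab = ['you', 'the', 'to', 'and', 'a']   # hypothetical short vocabulary
sentence = ['to', 'you', 'hello']          # 'hello' is not in the vocabulary

# Default: classes are learned from the data and stored in sorted order.
mlb_sorted = MultiLabelBinarizer().fit([vocab])
print(mlb_sorted.classes_)

# Passing classes= fixes the column order to the one provided.
mlb_ordered = MultiLabelBinarizer(classes=vocab).fit([vocab])
print(mlb_ordered.classes_)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")        # unknown labels only trigger a warning
    vector = mlb_ordered.transform([sentence])
print(vector)
```

Either way, `classes_` tells you which word each column of the output vector stands for.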

#

I would need to see dfAthletesClean.head().to_dict('list') as text to tell you.

#

but I'm not sure why you didn't just do dfAthletesClean.groupby(['sex', 'nationality'])['height'].mean()

quiet cloak May 26, 2022, 6:28 PM

#

I just looked through this channel and I did not understand a thing that was written in here. Computer science major so flop right now 🤦‍♂️

haughty topaz May 26, 2022, 6:34 PM

#

same

mint palm May 26, 2022, 6:39 PM

#

has anyone used NS2 before??

serene scaffold May 26, 2022, 6:41 PM

#

quiet cloak I just looked through this channel and I did not understand a thing that was wri...

data science is a whole thing unto itself, so if you haven't taken any courses related to it, there's no expectation that you'd understand it just from being a CS student.

wooden sail May 26, 2022, 6:43 PM

#

you could also do a phd in CS and never touch any of these topics

#

CS is pretty broad, and depending on your country, ranges from really software dev, to basically a branch of math

#

this stuff can fit somewhere in between

serene scaffold May 26, 2022, 7:10 PM

#

so, you're doing mean imputation. the best solution I can think of involves pd.merge, and that might be confusing for you.
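
As an alternative to the `pd.merge` route, group-wise mean imputation can also be sketched with `groupby(...).transform`. The DataFrame below is a hypothetical stand-in for `dfAthletesClean`; the real column names and values may differ:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfAthletesClean.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female", "male"],
    "nationality": ["USA", "USA", "KEN", "KEN", "USA"],
    "height": [1.80, np.nan, 1.60, 1.70, 1.90],
})

# Mean height of each (sex, nationality) group, aligned to the original rows.
group_mean = df.groupby(["sex", "nationality"])["height"].transform("mean")

# Fill only the missing heights with their own group's mean.
df["height"] = df["height"].fillna(group_mean)
print(df)
```

`transform` returns a Series with the same index as the original frame, which is why it can be passed straight to `fillna` without a merge.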

mint palm May 26, 2022, 7:11 PM

#

if a model is to be made, considering it's a pretraining model, how should i evaluate its appropriateness?
i mean, should i evaluate it normally, using accuracy, f1 score, confusion matrix etc??

wild dome May 26, 2022, 8:43 PM

#

serene scaffold what would it mean if they were one cell?

sorry I didn't explain it correctly, I meant merging each red column into one cell, like in Excel when you merge multiple cells

#

but now I wanna take a different approach

serene scaffold May 26, 2022, 8:47 PM

#

wild dome sorry I didn't explain it correctly, I meant merging each red column into one ce...

right, pandas doesn't support that. and it wouldn't really make sense in the context of what pandas is for.

wild dome May 26, 2022, 8:49 PM

#

consider this dataframe, and note how each color represent the same instance parameters, for example n=50, m=50, p=5, a=2 in red

I have multiple rows with these same parameters, and I want to group them by their average, so instead of having multiple rows, I want only one row of the same parameters but with the rest of the data being the average of the total rows

#

so the desired output is like

#

imagine the data from RGD to the right is the average of the original

agile cobalt May 26, 2022, 8:51 PM

#

re: the original thing
depending on what you want to do, perhaps you could use a MultiIndex or just df.groupby(), but "multirows" does not make much sense to me

serene scaffold May 26, 2022, 8:51 PM

#

wild dome consider this dataframe, and note how each color represent the same instance par...

you can achieve that with a groupby. but you said you want to "group them by their average", and that way of approaching it might cause you to make a mistake. you want to group by the (n, m, p, alpha) values and calculate the average of each group.

wild dome May 26, 2022, 8:52 PM

#

agile cobalt re: the original thing depending on what you want to do, perhaps you could use a...

yeah at this point I dropped the multirows idea

wild dome May 26, 2022, 8:52 PM

#

serene scaffold you can achieve that with a groupby. but you said you want to "group them by the...

lemme try that

serene scaffold May 26, 2022, 8:52 PM

#

I'm not completely sure how you'd achieve that when your columns are multiindexed

wild dome May 26, 2022, 8:53 PM

#

serene scaffold I'm not completely sure how you'd achieve that when your columns are multiindexe...

oh, well if I start getting weird mistakes I can remove the top headers and add them after grouping

serene scaffold May 26, 2022, 8:53 PM

#

if you do print(df.head().to_dict('list')) and show the text, I can experiment. No screenshots.

scenic tulip May 26, 2022, 8:54 PM

#

you could do groupby(n, m, p, alpha). pretty sure you can group multiple indexes eh? in sql you can

wild dome May 26, 2022, 8:54 PM

#

serene scaffold if you do `print(df.head().to_dict('list'))` and show the text, I can experiment...

{('instance', 'n'): [50, 50, 50, 50, 50],
 ('instance', 'm'): [50, 50, 50, 50, 50],
 ('instance', 'p'): [5, 5, 12, 12, 5],
 ('instance', 'alpha'): [2, 3, 2, 3, 2],
 ('RGD', 'OF'): [595, 824, 387, 595, 716],
 ('NI', 'OF'): [519, 626, 306, 358, 547],
 ('NI', 'time'): [0.2850522999999612,
  0.9070183999999699,
  0.2490571999999247,
  0.3609853000000385,
  0.35417499999994106],
 ('NI', 'improvement'): [12.77310924369748,
  24.02912621359223,
  20.930232558139537,
  39.831932773109244,
  23.60335195530726],
 ('FVS', 'OF'): [519, 626, 305, 438, 547],
 ('FVS', 'time'): [0.010051900000007663,
  0.04784549999999399,
  0.007143799999994371,
  0.005448199999818826,
  0.011858300000085364],
 ('FVS', 'improvement'): [12.77310924369748,
  24.02912621359223,
  21.188630490956072,
  26.386554621848738,
  23.60335195530726]}

#

ok I tried this code

df.groupby([("instance", "n"), ("instance", "m"), ("instance", "p"), ("instance", "alpha")]).mean()

#

the instance column is cursed lol now I'll try without top columns

serene scaffold May 26, 2022, 9:01 PM

#

wild dome ok I tried this code ```py df.groupby([("instance", "n"), ("instance", "m"), ("i...

that was the only solution I could come up with as well.

#

In [22]: poop.index.names
Out[22]: FrozenList([('instance', 'n'), ('instance', 'm'), ('instance', 'p'), ('instance', 'alpha')])

In [23]: poop.index.names = 'n m p alpha'.split()

In [24]: poop
Out[24]:
                  RGD     NI                          FVS
                   OF     OF      time improvement     OF      time improvement
n  m  p  alpha
50 50 5  2      655.5  533.0  0.319614   18.188231  533.0  0.010955   18.188231
         3      824.0  626.0  0.907018   24.029126  626.0  0.047845   24.029126
      12 2      387.0  306.0  0.249057   20.930233  305.0  0.007144   21.188630
         3      595.0  358.0  0.360985   39.831933  438.0  0.005448   26.386555

#

I had to think of a name for the resultant df, so I picked "poop" because I didn't like it.

#

but, uh, there you go
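
For reference, the steps above can be reproduced as a self-contained sketch. It uses only the first few columns of the dict posted earlier, and the names `keys` and `result` are chosen here for illustration:

```python
import pandas as pd

# Subset of the dict posted above; tuple keys become MultiIndex columns.
data = {
    ("instance", "n"): [50, 50, 50, 50, 50],
    ("instance", "m"): [50, 50, 50, 50, 50],
    ("instance", "p"): [5, 5, 12, 12, 5],
    ("instance", "alpha"): [2, 3, 2, 3, 2],
    ("RGD", "OF"): [595, 824, 387, 595, 716],
    ("NI", "OF"): [519, 626, 306, 358, 547],
}
df = pd.DataFrame(data)

# Group by the four instance columns and average everything else.
keys = [("instance", k) for k in ("n", "m", "p", "alpha")]
result = df.groupby(keys).mean()

# The index levels are named after the tuples; give them short names.
result.index.names = ["n", "m", "p", "alpha"]
print(result)
```

The two rows with n=50, m=50, p=5, alpha=2 collapse into one row holding their averages, which matches the first row of the output shown above.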

wild dome May 26, 2022, 9:12 PM

#

thanks

#

I have a question about the index

#

I removed the top headers and ran this code

results50.groupby("n m p alpha".split()).mean()

#

and if I add reset_index I get the following

results50.groupby("n m p alpha".split()).mean().reset_index()

#

#

why in the first case I had 2 rows in the headers? is it a multiindex too?

#

now for context, I'm gonna write this DF to a latex table, that's why I'd prefer a multirow

#

so I like the first output, without .reset_index, but idk why there are 2 rows in the headers

serene scaffold May 26, 2022, 9:16 PM

#

wild dome I removed the top headers and ran this code ```py results50.groupby("n m p alpha...

in the first screenshot, n m p alpha are names for the levels of indexing for the rows.
in the second screenshot, they are names of columns.

wild dome May 26, 2022, 9:17 PM

#

serene scaffold in the first screenshot, `n m p alpha` are names for the levels of indexing for ...

okay, and I cannot have both the multirows and the names for the columns, because as you mentioned is not supported right?

serene scaffold May 26, 2022, 9:18 PM

#

wild dome okay, and I cannot have both the multirows and the names for the columns, becaus...

even though in that visualization, it looks like the index levels span multiple rows, that's just for visualization. conceptually, every row has a value for every level of indexing. if you do print(results50.groupby("n m p alpha".split()).mean().index), you will see a sequence of tuples.
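
A minimal sketch of that point, with a made-up flat-column `results50` (the real frame has more columns). The grouped result carries one index tuple per row, and `reset_index()` turns those index levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical flat-column version of results50.
results50 = pd.DataFrame({
    "n": [50, 50, 50],
    "m": [50, 50, 50],
    "p": [5, 5, 12],
    "alpha": [2, 2, 3],
    "OF": [595, 716, 595],
})

grouped = results50.groupby("n m p alpha".split()).mean()
print(grouped.index.tolist())    # one tuple per row of the result

flat = grouped.reset_index()
print(flat.columns.tolist())     # ['n', 'm', 'p', 'alpha', 'OF']
```

For the LaTeX export, `DataFrame.to_latex` has a `multirow` option that may be worth a look; it only affects how the index is rendered, not how the data is stored.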

lapis sequoia May 26, 2022, 9:30 PM

#

Can someone please share links to other big servers of data science and ml

serene scaffold May 26, 2022, 9:30 PM

#

https://discord.gg/7hUcqkZa

#

https://discord.gg/zEYzDnR9

lapis sequoia May 26, 2022, 9:30 PM

#

I tried joining it once. But didn't get entry access

#

The DS one

#

Stuck in quarantine

serene scaffold May 26, 2022, 9:31 PM

#

that may be by design. the DS server tries to cater to a more knowledgeable crowd than we do.

lapis sequoia May 26, 2022, 9:32 PM

#

But how did they find out that I am not knowledgeable 🤪

#

I am just stuck outside

#

serene scaffold May 26, 2022, 9:33 PM

#

lapis sequoia But how did they find out that I am not knowledgeable 🤪

discord has a low barrier to entry, so not being knowledgeable about data science is the default assumption.

lapis sequoia May 26, 2022, 9:33 PM

#

Lol

#

Now you are just making jokes

serene scaffold May 26, 2022, 9:34 PM

#

did you read everything in the screenshot?

lapis sequoia May 26, 2022, 9:35 PM

#

Maybe that's how they determine if I am smart or not

#

Can you help me cheat on this "exam"

serene scaffold May 26, 2022, 9:35 PM

#

the "Hint:" part looks relevant

lapis sequoia May 26, 2022, 9:35 PM

#

yert

serene scaffold May 26, 2022, 9:35 PM

#

lapis sequoia Can you help me cheat on this "exam"

absolutely not

lapis sequoia May 26, 2022, 9:36 PM

#

Oh there's a question

#

Regarding other name to normal distribution

robust jungle May 26, 2022, 9:37 PM

#

after augmenting my data (yale faces dataset) my loss actually went up and my accuracy went down, what am I doing wrong?

lapis sequoia May 26, 2022, 9:37 PM

#

I broke out mate 😀

#

Smart kolv

serene scaffold May 26, 2022, 9:37 PM

#

lapis sequoia I broke out mate 😀

gj. how did you find out the answer, or did you already know it?

lapis sequoia May 26, 2022, 9:38 PM

#

Well. I am very smart and ||googled it||

serene scaffold May 26, 2022, 9:38 PM

#

lapis sequoia Well. I am very smart and ||googled it||

exactly; I think they don't want people who wouldn't first google it

median moat May 26, 2022, 9:51 PM

#

Reading and the ability to use Google?!? Impossible.

misty flint May 26, 2022, 10:00 PM

#

kekHands

lapis sequoia May 26, 2022, 10:25 PM

#

shipit

mighty relic May 26, 2022, 11:15 PM

#

Hi guys, I wanted to showcase my forecasting package here. I mentioned it six months ago. I am a professional forecaster and felt like this was a gap when it comes to large scale enterprise forecasting.
https://github.com/alexhallam/tablespoon

GitHub - alexhallam/tablespoon: 🥄✨Time-series Benchmark methods that are Simple and Probabilistic

#

I will be online for about an hour if anyone has any questions about it.

#

Also, here is a notebook to run through some examples https://github.com/alexhallam/tablespoon/blob/main/tablespoon.ipynb

#

If you click on "Open in Colab" you can run it in Google Colab.

austere steppe May 27, 2022, 12:19 AM

#

Hey everyone I have a problem on an exercise if someone can help me thanks

#

#

I can't show the linear regression on my scatter plot

#

It uses pandas, matplotlib and scikit-learn

mighty relic May 27, 2022, 12:26 AM

#

can you share your notebook link?

misty flint May 27, 2022, 1:37 AM

#

mighty relic Hi guys, am I wanted to showcase my forecasting package here. I mentioned it six...

interesting package. ill star it and ill let you know if i end up using it for work or something

#

pithink

meager portal May 27, 2022, 1:54 AM

#

I've been stuck on this for several months now and no video really explained it well. What weights do I use for the partial derivative? Do I transpose the matrices and get the dot product of them? Do I multiply all the derivatives of all the weights together with respect to the previous layer? What do I do?

main fox May 27, 2022, 2:10 AM

#

austere steppe I can't show the linear regression on my scatter plot

If your X is just one feature, it might have to do with how .predict() expects a 2d array
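
A minimal sketch of the 2D requirement, with made-up data standing in for the exercise's columns. scikit-learn estimators expect `X` with shape `(n_samples, n_features)`, so a single feature has to be reshaped before `fit` and `predict`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single-feature data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Reshape the single feature from shape (5,) to shape (5, 1).
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)
y_line = model.predict(X)    # one prediction per sample, shape (5,)
print(y_line)

# With matplotlib, the fitted line can then be drawn over the scatter plot:
# plt.scatter(x, y); plt.plot(x, y_line)
```

With a pandas DataFrame, selecting the feature as `df[["col"]]` (double brackets) instead of `df["col"]` gives the same 2D shape.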

barren wedge May 27, 2022, 3:00 AM

#

does anyone implement torch.jit in Bert model?

serene scaffold May 27, 2022, 3:04 AM

#

barren wedge does anyone implement torch.jit in Bert model?

not sure what you mean

#

also, keep in mind that "implement" does not mean the same thing as "use".

barren wedge May 27, 2022, 3:37 AM

#

serene scaffold also, keep in mind that "implement" does not mean the same thing as "use".

I think it's implement
because not as simple as use

#

https://pytorch.org/docs/stable/jit.html

wooden sail May 27, 2022, 4:02 AM

#

meager portal I've been stuck onto this for several months now and no video really explained i...

the example you show there does not appear to have any matrices at all, though you could generalize it to W_i being matrices and a_i and y being vectors. in the example you showed, all they have done is use the chain rule repeatedly, and the equations given are exactly what you would do to update the parameters: the gradients here are only products of the weights, and the only explanation really is "use the chain rule". as for the matrix case, there is no general expression for the derivative. some authors like expressing it all in einstein notation to hide the pain of the derivative being a 3-way tensor. you can also use some matrix unfoldings to turn it into a huge matrix. the easiest way is to find the expression component-wise and apply it that way, or use einsum to do the relevant operations

#

some things you can do are read about tensor unfoldings, einstein notation, and simply brush up your chain rule. since the weights and biases represent affine transformations, and the activation functions are usually "well-behaved", the derivatives are usually not very difficult to analyze component-wise if you write everything as a sum (or in einstein notation foregoing the sigmas)

#

you might find "the matrix cookbook" a useful read, although it presents some common matrix calculus results without any proof. the proofs follow from writing out the sum and doing it by hand 😛 not very difficult, but certainly tedious
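
To make the component-wise chain rule concrete, here is a small sketch for a made-up two-layer network (tanh hidden layer, linear output, half squared-error loss). The network, shapes, and names are assumptions for illustration and are not taken from the example in the question. The analytic gradients are written with `einsum` and checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
y = rng.normal(size=2)          # target vector
W1 = rng.normal(size=(4, 3))    # first-layer weights
W2 = rng.normal(size=(2, 4))    # second-layer weights

def loss(W1, W2):
    a1 = np.tanh(W1 @ x)
    return 0.5 * np.sum((W2 @ a1 - y) ** 2)

# Forward pass, keeping the intermediate values.
a1 = np.tanh(W1 @ x)
y_hat = W2 @ a1

# Backward pass: the chain rule, written component-wise.
delta2 = y_hat - y                             # dL/dy_hat
grad_W2 = np.einsum("i,j->ij", delta2, a1)     # dL/dW2[i, j] = delta2[i] * a1[j]
delta1 = (W2.T @ delta2) * (1.0 - a1 ** 2)     # dL/dz1, where z1 = W1 @ x
grad_W1 = np.einsum("i,j->ij", delta1, x)      # dL/dW1[i, j] = delta1[i] * x[j]

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, one weight at a time."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

num_W1 = numerical_grad(lambda W: loss(W, W2), W1)
num_W2 = numerical_grad(lambda W: loss(W1, W), W2)
print(np.allclose(grad_W1, num_W1, atol=1e-5))
print(np.allclose(grad_W2, num_W2, atol=1e-5))
```

Note where the transpose shows up: the error is pushed back through the second layer with `W2.T`, and each weight gradient is an outer product of the layer's error signal with that layer's input.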

glass lark May 27, 2022, 4:31 AM

#

is there a website for training in data science?

lapis sequoia May 27, 2022, 4:39 AM

#

Sharing the best Pandas cheat sheet I have found yet. In case someone else might be interested. It's super intuitive and easy to understand!

bold timber May 27, 2022, 5:52 AM

#

How do I grab the values 3, 5, 7?

drifting fjord May 27, 2022, 5:55 AM

#

bold timber How do I grab the values 3, 5, 7?

Transpose it and use np.diagonal

torpid cave May 27, 2022, 5:56 AM

#

Hey @mighty relic, went through the package. Looks nice. You're only doing 3 methods though (maybe I saw it wrong)

#

Are you going to include more in the future?

bold timber May 27, 2022, 5:58 AM

#

drifting fjord Transpose it and use np.diagonal

still get the same value

drifting fjord May 27, 2022, 6:01 AM

#

bold timber still get the same value

My bad check this out
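
The screenshots are not available, so the sketch below assumes the array was the 3x3 matrix holding 1 to 9, where 3, 5, 7 is the anti-diagonal. Transposing leaves the main diagonal unchanged, which would explain getting the same values; flipping the columns first is one way to reach the other diagonal:

```python
import numpy as np

# Assumed array; the original screenshot is not available.
a = np.arange(1, 10).reshape(3, 3)

main_diag = np.diagonal(a)               # top-left to bottom-right
transposed_diag = np.diagonal(a.T)       # identical to the main diagonal
anti_diag = np.diagonal(np.fliplr(a))    # top-right to bottom-left

print(main_diag, transposed_diag, anti_diag)
```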

bold timber May 27, 2022, 6:09 AM

#

drifting fjord My bad check this out

thank youu

cerulean violet May 27, 2022, 9:03 AM

#

Hello there,I am getting this error for my JARVIS AI well it aint workin

tacit basin May 27, 2022, 9:21 AM

#

cerulean violet Hello there,I am getting this error for my JARVIS AI well it aint workin

What error

sleek tapir May 27, 2022, 9:27 AM

#

for svm

#

can u tune C, gamma and kernel at the same time

#

kernel has [linear, rbf, sigmoid, poly]

#

or u cant do tat
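
Tuning all three together is possible with a grid search. The sketch below uses scikit-learn's `GridSearchCV` on the built-in iris data purely as a placeholder; the grid values are arbitrary examples, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # placeholder dataset

# A list of grids avoids trying gamma with the linear kernel, which ignores it.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf", "sigmoid", "poly"],
     "C": [0.1, 1, 10],
     "gamma": ["scale", 0.01, 0.1]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Every combination is cross-validated, so the cost grows with the product of the grid sizes; `RandomizedSearchCV` is a common alternative when the grid gets large.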

mint palm May 27, 2022, 10:43 AM

#

are there any rules for limiting regularization usage while pre training a model before deployment

sleek tapir May 27, 2022, 11:01 AM

#

hmmm

fading geyser May 27, 2022, 11:07 AM

#

bro u were right

#

it was corrupted

dusty valve May 27, 2022, 12:44 PM

#

how do i create a model and train it from a text file of sentences like

Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
because all the tutorials i tried to find didn't show me exactly how they worked

gray orchid May 27, 2022, 12:51 PM

#

https://arxiv.org/abs/1810.04805

arXiv.org

BERT: Pre-training of Deep Bidirectional Transformers for Language...

We introduce a new language representation model called BERT, which stands
for Bidirectional Encoder Representations from Transformers. Unlike recent
language representation models, BERT is...

young granite May 27, 2022, 1:04 PM

#

i created 7subplots onto a grid and now want to fig.update_layout but only the first subplot is changed how can i update all at the same time?:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = go.Figure()
fig = make_subplots(rows=7, cols=1,
        specs=[[{'type': 'surface'}],
               [{'type': 'surface'}],
               [{'type': 'surface'}],
               [{'type': 'surface'}],
               [{'type': 'surface'}],
               [{'type': 'surface'}],
               [{'type': 'surface'}],
               ])
count=0
for group_name in data:
    define= "7a direct"
    if define in group_name:
        count+=1
        trace = group_name
        df = data[group_name]
        df.drop_duplicates(subset ="name",
                         keep = False, inplace = True)
        z = df.drop(["name"], axis=1)
        fig.add_trace(go.Surface(z=z,
                                 y=df["name"],
                                 x=df.columns[1:],
                                 name=trace,
                                ),
                      row=0+count,
                      col=1,
                     )

mint palm May 27, 2022, 1:16 PM

#

what kind of notation is that?

#

"pi is represented by an
Artificial Neural Network (ANN), which is generated by
AI algorithms"

tidal bough May 27, 2022, 1:35 PM

#

huh, this looks to me like they either meant it's a set of 3 things (in which case they probably mean there are 3 kinds of NNs), or (less likely) an array of 3 things (in which case I guess they can be meaning that it's an NN consisting of 3 separate NNs).

final field May 27, 2022, 1:59 PM

#

can anyone help me with object detection with tensorflow?

dusty valve May 27, 2022, 2:03 PM

#

im following this tutorial in tf - https://www.tensorflow.org/text/tutorials/text_generation
but im getting this error

Input 0 of layer "gru" is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (100, 256)

Call arguments received by layer "my_model" (type MyModel):
  • inputs=tf.Tensor(shape=(100,), dtype=int64)
  • states=None
  • return_state=False
  • training=False

#

!paste

arctic wedgeBOT May 27, 2022, 2:04 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

dusty valve May 27, 2022, 2:04 PM

#

code is - https://paste.pythondiscord.com/letisedona

fierce pine May 27, 2022, 2:15 PM

#

I'm learning machine learning and doing an internship. Can anyone please give me some project ideas

fierce pine May 27, 2022, 2:20 PM

#

dusty valve im following this tutorial in tf - https://www.tensorflow.org/text/tutorials/tex...

What is ur program and what are u trying to achieve

dusty valve May 27, 2022, 2:20 PM

#

fierce pine What is ur program and what are u trying to achieve

a language prediction model that speaks like me

fierce pine May 27, 2022, 2:23 PM

#

dusty valve a language prediction model that speaks like me

Ohkk I'll see if i can solve any error tho I'm just a beginner 😅

#

Can u give me some project ideas please

mint palm May 27, 2022, 2:38 PM

#

tidal bough huh, this looks to me like they either meant it's a set of 3 things (in which ca...

From what i read, it seems that pi represents the knowledge the NN has learned

#

Can it be?

brittle skiff May 27, 2022, 3:08 PM

#

hello guys, can someone help me with bit strange question

"""
VAR 3

    min z = min(3x1 - 5x2 - 2x3 + 4x4)
    x1 + 7x2 + x3 + 7x4 <= 46
    3x1 - x2 + x3 + 2x4 <= 8
    2x1 + 3x2 - x3 + x4 <= 10
    xi >= 0, i = 1,2,3,4
"""

table: list = [[1, 7, 1, 7, 1, 0, 0],
               [3, -1, 1, 2, 0, 1, 0],
               [2, 3, -1, 1, 0, 0, 1],
               [-3, 5, 2, -4, 0, 0, 0],
               [46, 8, 10, 0]]

n, m = 4, 3


index_max_basis: int = table[-2].index(max(table[-2], key=abs))

max_basis_column: list = [column[index_max_basis] for column in table[:m+1]]
divided_basis: list = [float(f'{(i / j):.3f}') if j > 0 else 0 for i, j in zip(table[-1], max_basis_column)]

index_basis_row = divided_basis.index(min([x for x in divided_basis if x != 0]))

max_basis_row: list = table[index_basis_row][:]
max_basis_row.insert(0, table[-1][index_basis_row])

intersectDigit: int = max_basis_row[index_max_basis+1]

for column_id, column in enumerate(table):
    print(f"\nPART {column_id}\n")
    for row_id, row in enumerate(column):
        if len(max_basis_column) > column_id and row != max_basis_row[1:][row_id] and column != max_basis_column[column_id]:
            # print(f'{row} - ({max_basis_row[1:][row_id]} * {max_basis_column[column_id]}) / {intersectDigit}')
            print(row - (max_basis_row[1:][row_id] * max_basis_column[column_id]) / intersectDigit)
        elif row in max_basis_row and row == max_basis_row[row_id+1]:
            print(row / intersectDigit)
        else:
            print(row - (table[-1][index_basis_row] * max_basis_column[row_id]) / intersectDigit)

it's a linear programming task and i need to solve it using the Simplex method. i kinda made it but got stuck with the last IF ELSE...
Every other column and row works perfectly except *x2* and *F(x1)*
In screenshot 1, simply look at the **B** column, and what i got is in *screenshot 2*. If i change AND to OR i get *screenshot 3*. But i need both of them.
Hope u can help me. I can explain if needed.
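
For comparison, here is a sketch of a generic textbook tableau simplex in NumPy, applied to the VAR 3 problem. It is not a repair of the code above: the function name `simplex_min` and the tableau layout are choices made here, it assumes all right-hand sides are non-negative, and it has no protection against cycling. It may still help as a reference when checking individual pivot steps:

```python
import numpy as np

def simplex_min(c, A, b, tol=1e-12):
    """Minimise c @ x subject to A @ x <= b and x >= 0, assuming b >= 0."""
    m, n = A.shape
    # Tableau layout: [A | I | b] with the cost row [c | 0 | 0] at the bottom.
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = c
    basis = list(range(n, n + m))          # slack variables start in the basis

    while True:
        col = int(np.argmin(T[-1, :-1]))   # most negative reduced cost
        if T[-1, col] >= -tol:
            break                          # no improving column: optimal
        column = T[:m, col]
        positive = column > tol
        if not positive.any():
            raise ValueError("problem is unbounded")
        # Minimum-ratio test over rows with a positive pivot-column entry.
        ratios = np.full(m, np.inf)
        ratios[positive] = T[:m, -1][positive] / column[positive]
        row = int(np.argmin(ratios))

        # Pivot: scale the pivot row, then eliminate the column elsewhere.
        T[row] /= T[row, col]
        for r in range(m + 1):
            if r != row:
                T[r] -= T[r, col] * T[row]
        basis[row] = col

    x = np.zeros(n + m)
    x[basis] = T[:m, -1]
    return x[:n], -T[-1, -1]               # bottom-right cell holds -z

c = np.array([3.0, -5.0, -2.0, 4.0])
A = np.array([[1.0, 7.0, 1.0, 7.0],
              [3.0, -1.0, 1.0, 2.0],
              [2.0, 3.0, -1.0, 1.0]])
b = np.array([46.0, 8.0, 10.0])

x_opt, z_opt = simplex_min(c, A, b)
print(x_opt, z_opt)
```

Working the problem by hand gives x = (0, 4.75, 12.75, 0) with z = -49.25, reached after three pivots, which is a useful target when comparing intermediate tableaux.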

civic stone May 27, 2022, 4:21 PM

#

Hello Everyone,

I am working on Clustering Documents
i used TF-IDF matrix for vectorization
is there any other clustering algorithms that can work with TF-IDF matrix except K-Means and HAC ?

Thanks

bold canopy May 27, 2022, 4:32 PM

#

Hello,
I have the following code and I want to subtract the newly calculated gradient from my old weights. But instead of subtracting it from the 1 the weights start at, the result is just the (negative) gradient itself.

self.calculate_gradient(self.X_train, self.y_train, self.weights)
new_weights = self.weights - self.alpha * self.gradient
self.send_new_weights(new_weights)

In the screenshot you can see what's happening, but I want the new weights to come out as
1 - loss and not -loss

wooden sail May 27, 2022, 4:41 PM

#

the gradients are much too big, look at them

#

for practical purposes, that 1 may as well be 0 when you subtract a number 12 orders of magnitude larger

bold canopy May 27, 2022, 4:44 PM

#

I have to calculate the squared loss

#

but i don't really know how to calculate the gradient of the squared loss. i assumed it's 2*(y_pred - y) . x.T

#

of this

#

y_pred = np.dot(x, weights)
diff = y_pred - y
self.gradient = 2 * (np.dot(x.T, diff))

wooden sail May 27, 2022, 4:48 PM

#

what size are x, weights, and y?

bold canopy May 27, 2022, 4:48 PM

#

x is 50000, 406, weights 406,1 and y 50000,1

wooden sail May 27, 2022, 4:48 PM

#

ok

bold canopy May 27, 2022, 4:48 PM

#

i don't know if the formula i'm using for the gradient is right

strange elbowBOT May 27, 2022, 4:49 PM

#

Failed to render input.

View Logs

#

Failed to render input.

View Logs

wooden sail May 27, 2022, 4:51 PM

#

.latex we have the model $y_{\text{pred}} = X w$ and the loss $\Vert y - X w \Vert_2^2$

strange elbowBOT May 27, 2022, 4:51 PM

#

Failed to render input.

View Logs

wooden sail May 27, 2022, 4:51 PM

#

i wonder what the matter is, the log isn't super helpful

#

.latex we have the model $y_{pred} = X w$ and the loss $\Vert y - X w \Vert_2^2$

strange elbowBOT May 27, 2022, 4:52 PM

#

$latex.png$

wooden sail May 27, 2022, 4:52 PM

#

i guess it didn't like the text box in the subscript, weird

#

anyway

#

.latex the gradient w.r.t. w is indeex $X^T (X w - y)$

strange elbowBOT May 27, 2022, 4:53 PM

#

$latex.png$

wooden sail May 27, 2022, 4:53 PM

#

indeed* typo

#

and i missed a factor of 2, what's wrong with me today
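
A quick way to check a gradient formula like this is to compare it against finite differences. The sketch below does that for the loss ||Xw - y||^2 with the factor of 2 included. The small random arrays are stand-ins for the real (50000, 406) data, and the function names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))      # stand-in for the (50000, 406) matrix
y = rng.normal(size=(20, 1))
w = rng.normal(size=(4, 1))


def squared_loss(w):
    diff = X @ w - y
    return float(np.sum(diff ** 2))


def analytic_gradient(w):
    # Gradient of ||Xw - y||^2 with respect to w.
    return 2 * X.T @ (X @ w - y)


def numeric_gradient(w, eps=1e-6):
    # Central differences, one coordinate of w at a time.
    grad = np.zeros_like(w)
    for k in range(w.shape[0]):
        step = np.zeros_like(w)
        step[k, 0] = eps
        grad[k, 0] = (squared_loss(w + step) - squared_loss(w - step)) / (2 * eps)
    return grad


# Should print a value very close to zero if the formula is right.
print(np.max(np.abs(analytic_gradient(w) - numeric_gradient(w))))
```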

serene scaffold May 27, 2022, 4:53 PM

#

!otn a indeex

arctic wedgeBOT May 27, 2022, 4:53 PM

#

:ok_hand: Added indeex to the names list.

wooden sail May 27, 2022, 4:54 PM

#

lemon_angrysad

bold canopy May 27, 2022, 4:54 PM

#

Ok, then i guess my alpha has to be much smaller so the loss isn't so big any more

wooden sail May 27, 2022, 4:55 PM

#

for your info, the stability of gradient descent applied to linear least squares problems, if you keep your step size fixed, relies on the step size being SMALLER than 1/largest singular value squared of X

#

or equivalently, 1/largest eigenvalue of X^TX

bold canopy May 27, 2022, 4:57 PM

#

Ok thank you very much

wooden sail May 27, 2022, 4:58 PM

#

.latex though it seems you're working without the factor 1/2 in front, so revise that to $\frac{1}{2 \sigma^2(X)}$

strange elbowBOT May 27, 2022, 4:58 PM

#

$latex.png$

bold canopy May 27, 2022, 5:05 PM

#

which 1/2 factor ?

wooden sail May 27, 2022, 5:07 PM

#

some people like putting a 1/2 in front of their least squares cost so that the factor 2 that pops up in the gradient cancels out

#

your gradient has that factor 2 in front, which means the lipschitz constant is also twice as big

bold canopy May 27, 2022, 5:09 PM

#

but wouldn't it still be least squares if i put 1/n in front?

wooden sail May 27, 2022, 5:10 PM

#

you can put whatever scalar factor you want in front. this changes the minimum value, but not the minimizer 😛 just be careful with the step size because you need to account for the actual size of the lipschitz constant when doing gradient descent. otherwise, the algorithm will converge slower than it could, or will diverge altogether

#

here without that factor 2 in the denominator of the step size, the alg would diverge
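
A small numerical sketch of the step-size point, on synthetic data (the shapes, seed and iteration counts are arbitrary choices, not taken from the conversation). It runs plain gradient descent on ||Xw - y||^2, without the 1/2 factor, once with step 1/(2 sigma^2) and once with a step slightly above 1/sigma^2, where sigma is the largest singular value of X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=(5, 1)) + 0.1 * rng.normal(size=(200, 1))

sigma_max = np.linalg.norm(X, 2)          # largest singular value of X


def run_gd(step, iters=500):
    """Gradient descent on ||Xw - y||^2 (no 1/2 factor), starting from w = 0."""
    w = np.zeros((X.shape[1], 1))
    for _ in range(iters):
        w = w - step * 2 * X.T @ (X @ w - y)
    return float(np.sum((X @ w - y) ** 2))


# Reference answer from a direct least-squares solve.
w_best = np.linalg.lstsq(X, y, rcond=None)[0]
best_loss = float(np.sum((X @ w_best - y) ** 2))

safe = run_gd(1 / (2 * sigma_max ** 2))
too_big = run_gd(1.1 / sigma_max ** 2, iters=200)

print("least-squares optimum:", best_loss)
print("step 1/(2 sigma^2):   ", safe)
print("step 1.1/sigma^2:     ", too_big)
```

With the smaller step the loss should land on the least-squares optimum, while the larger step makes the loss grow without bound.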

bold canopy May 27, 2022, 5:11 PM

#

ill test it

woven coral May 27, 2022, 5:59 PM

#

hello

#

anyone working on transformer models ???

misty flint May 27, 2022, 6:12 PM

#

vector databases are really cool

#

very good for semantic search and RecSys

#

DoggoKek

woven coral May 27, 2022, 6:16 PM

#

bert,albert anyone knows???

misty flint May 27, 2022, 6:40 PM

#

serene scaffold !otn a indeex

kekHands

lapis sequoia May 27, 2022, 6:53 PM

#

#

Hi, I want to make an item-based recommendation system. I found some info on the internet and tried to rebuild their idea. They did the following: I always get this error...

#

fixed, thanks!

misty flint May 27, 2022, 7:21 PM

#

lapis sequoia Hi, I want to make an item-based recommendation system. I found some info on the...

if you are interested in different types of RecSys, i highly recommend going through this book chapter + notebooks https://d2l.ai/chapter_recommender-systems/index.html

#

there are different RecSys for different use cases and you can see the pros/cons of each

#

DoggoKek

lapis sequoia May 27, 2022, 7:23 PM

#

looks good indeed!

misty flint May 27, 2022, 7:24 PM

#

lapis sequoia looks good indeed!

Praise

mint palm May 27, 2022, 7:35 PM

#

In transfer learning, after pretraining, when we deploy the architecture it learns through unsupervised methods right??

#

but for a classifier model, how does it know during fine-tuning which cluster belongs to which class of the pretrained model?

velvet plover May 27, 2022, 8:22 PM

#

#

#

@spiral peak this is how it looks like

spiral peak May 27, 2022, 8:23 PM

#

velvet plover @spiral peak this is how it looks like

So it looks like you're doing a linear fit against all the data points. You need to sub-select

velvet plover May 27, 2022, 8:24 PM

#

how can i do it sorry im relatively new to python

spiral peak May 27, 2022, 8:25 PM

#

So for this, I would use numpy to select the values that exclude the first X amount and the last Y amount. Looking at your code I think that can be done when you define w=... and p=..., you can slice them further and only take the section you're interested in
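
A minimal sketch of that slicing idea, on made-up data. The arrays `w` and `p` stand in for the ones in the original code (which is not shown here), and `skip_start` / `skip_end` are hypothetical counts that would have to be chosen by looking at the plot.

```python
import numpy as np

# Synthetic stand-ins for `w` and `p`:
# flat at the start, linear in the middle, flat at the end.
w = np.arange(100, dtype=float)
p = np.clip(2.0 * w - 40.0, 0.0, 120.0)

skip_start, skip_end = 25, 25          # hypothetical counts to drop at each end
w_mid = w[skip_start:len(w) - skip_end]
p_mid = p[skip_start:len(p) - skip_end]

# Degree-1 polyfit returns (slope, intercept).
slope_all, intercept_all = np.polyfit(w, p, 1)
slope_mid, intercept_mid = np.polyfit(w_mid, p_mid, 1)

print("fit on everything:  ", slope_all, intercept_all)
print("fit on middle only: ", slope_mid, intercept_mid)
```

The fit has to use the sliced arrays for both x and y. Slicing only what gets plotted cuts the curve off but leaves the slope of the fitted line unchanged.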

velvet plover May 27, 2022, 8:25 PM

#

alright im going to try it now

#

@spiral peak it unfortunately didnt work

#

it cuts off the curve but the pitch doesnt fit

#

civic stone May 27, 2022, 9:15 PM

#

civic stone Hello Everyone, I am working on Clustering Documents i used TF-IDF matrix for...

Why nobody answering my question 😦

trim sapphire May 27, 2022, 11:47 PM

#

civic stone Why nobody answering my question 😦

What was your question?

trim sapphire May 27, 2022, 11:53 PM

#

velvet plover

Why you write your functions all spaced out? grumpchib

fierce pine May 28, 2022, 12:34 AM

#

From where can i start for machine learning

serene scaffold May 28, 2022, 12:51 AM

#

fierce pine From where can i start for machine learning

do you have a general understanding of what machine learning is? do you have a goal in mind?

worthy trail May 28, 2022, 2:14 AM

#

Any recommendations on books that teach you stats in Python? My stats knowledge is very basic so I would like to get comfortable with advanced concepts like p-values, probability distributions, chi square testing etc through Python before jumping into ML. Been working at an AI company as a backend engineer (Python) so understanding what data science talks about/does everyday would be nice lol

solemn laurel May 28, 2022, 2:26 AM

#

worthy trail Any recommendations on books that teach you stats in Python? My stats knowledge ...

https://greenteapress.com/thinkstats2/thinkstats2.pdf

worthy trail May 28, 2022, 2:29 AM

#

solemn laurel https://greenteapress.com/thinkstats2/thinkstats2.pdf

This looks great! Thanks!

bitter quarry May 28, 2022, 3:40 AM

#

I’m hella stuck in my programming project zzzz my head is gonna burst can someone help me list the salary range and their total

serene scaffold May 28, 2022, 3:43 AM

#

bitter quarry I’m hella stuck in my programming project zzzz my head is gonna burst can someon...

what is the problem? by the way, any expression involving `== True` or `== False` is wrong.

bitter quarry May 28, 2022, 3:47 AM

#

serene scaffold what is the problem? by the way, any expression involving `== True` or `== False...

I don’t really know either I’m so gone it’s my school’s project and I have to submit by tonight aaaaaaaa

serene scaffold May 28, 2022, 3:47 AM

#

bitter quarry I don’t really know either I’m so gone it’s my school’s project and I have to su...

you've shown code that does something. what does it do that is different from what you want?

bitter quarry May 28, 2022, 3:48 AM

#

I’m tryna get the total of job postings I aint get that yet

#

I’m not sure if I did everything else right

serene scaffold May 28, 2022, 3:48 AM

#

the total of job posting. what does that mean?

#

the number of job postings?

bitter quarry May 28, 2022, 3:49 AM

#

ya

serene scaffold May 28, 2022, 3:50 AM

#

and that's not the number of rows?

bitter quarry May 28, 2022, 3:51 AM

#

column

#

With different title

serene scaffold May 28, 2022, 3:51 AM

#

the number of columns should be how many fields you have. not how many instances you have.

bitter quarry May 28, 2022, 3:52 AM

#

did I do that wrongly

robust granite May 28, 2022, 6:36 AM

#

Hi people!
i have data set of states , cities across 5 years and some additional column on which ill perform analysis

#

But, the values of cities are changing across years. How do i manage that?

#

For example, lets say in 2011-13 it was New Yorrk but latter years it had name as New York

rose agate May 28, 2022, 7:02 AM

#

robust granite For example, lets say in 2011-13 it was New Yorrk but latter years it had name a...

My assumption is that the best way would probably be to loop over the city names and check pairwise similarity between them. You could try something like LCS, the longest common subsequence, which I assume should work pretty well. You could check if the LCS is within 1 or 2 of the actual length, which would indicate a minor misspelling, then change the names to match. If the names are really messed up you might look at word similarity with spaCy or something, but that seems overkill to me. @serene scaffold might be able to give some better ideas.

rose agate May 28, 2022, 7:10 AM

#

robust granite For example, lets say in 2011-13 it was New Yorrk but latter years it had name a...

something like this

names = ['New York', 'New Yorkk', 'Los Vegas', 'Las Vegas', 'Hollywood']
print('before:', names)

def lcs(X, Y, m, n):
    # Length of the longest common subsequence of X[:m] and Y[:n] (plain recursion).
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))

for i in range(len(names)):
    for j in range(i + 1, len(names)):  # compare each pair only once
        X = names[i]
        Y = names[j]

        if X != Y:
            LCS = lcs(X, Y, len(X), len(Y))
            # At most 2 characters of X fall outside the common subsequence.
            if len(X) - LCS <= 2:
                print("Similar names found:", X, 'and', Y)
                names[j] = names[i]

print('after:', names)

shy mural May 28, 2022, 7:34 AM

#

is there a way that i can find the gap of this door section

young granite May 28, 2022, 8:01 AM

#

shy mural is there a way that i can find the gap of this door section

if u got the 3D Data of the doorframe sure

random peak May 28, 2022, 8:01 AM

#

can i feed array to support vector?
or does it have to be a dataframe?

young granite May 28, 2022, 8:04 AM

#

how can i adjust subplots in plotly when i use fig.update only the last one is changed

shy mural May 28, 2022, 8:10 AM

#

young granite if u got the 3D Data of the doorframe sure

thanks

young granite May 28, 2022, 8:12 AM

#

shy mural thanks

its not what u want to hear i know but with a simple picture in that resolution u cant even approximate by functions

robust granite May 28, 2022, 8:32 AM

#

rose agate something like this ``` names = ['New York', 'New Yorkk', 'Los Vegas', 'Las Veg...

So its like you are matching the length of common string.?

rose agate May 28, 2022, 8:37 AM

#

robust granite So its like you are matching the length of common string.?

i was taking the difference between the length of string X and the LCS; if only a single character is wrong, that should just be 1. I did this instead of just checking if the LCS is large because I assume that some name pairs could have a high LCS but not actually be the same place. e.g. if there's a 'New Hampshire' and an 'Old Hampshire' the LCS would be 10 because they both contain ' Hampshire' (the word plus the space before it), but we wouldn't want to classify them as the same place
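
A variation on the same idea, as a sketch only: the LCS is computed with a bottom-up table instead of plain recursion (which gets slow on longer names), and the length check uses the longer of the two names so that a short name is not swallowed by a long one that happens to contain it. The function names and the `max_missing` parameter are made up for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, computed bottom-up."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def unify_names(names, max_missing=2):
    """Replace near-duplicates with the first spelling seen.

    Two names count as the same place when the longer one has at most
    `max_missing` characters outside their common subsequence.
    """
    canonical = []
    result = []
    for name in names:
        for seen in canonical:
            if max(len(name), len(seen)) - lcs_length(name, seen) <= max_missing:
                result.append(seen)
                break
        else:
            # No close match among the spellings seen so far.
            canonical.append(name)
            result.append(name)
    return result


names = ['New York', 'New Yorrk', 'Los Vegas', 'Las Vegas', 'Hollywood',
         'New Hampshire', 'Old Hampshire']
print(unify_names(names))
```

Note that the first spelling seen wins, so a misspelling that appears first becomes the canonical form. On real data it may be safer to pick the most frequent spelling, and very short names can still collide with a threshold of 2.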

rose agate May 28, 2022, 8:40 AM

#

shy mural is there a way that i can find the gap of this door section

if you know the length of the door handle you can probably compare by multiplying it by the ratio of the pixel widths
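
As a rough sketch of that ratio, with hypothetical numbers: measure the reference object and the gap in pixels, then scale by the reference's known real size. The function name and the figures below are illustrative only.

```python
def estimate_real_length(target_px, reference_px, reference_real):
    """Scale a pixel measurement by a reference object of known size.

    Only meaningful when both objects are at about the same distance
    from the camera and the image is not strongly distorted.
    """
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return reference_real * target_px / reference_px


# Hypothetical numbers: a 120 mm handle spans 300 px, the gap spans 10 px.
gap_mm = estimate_real_length(target_px=10, reference_px=300, reference_real=120)
print(gap_mm)
```

Doing this separately for each image gives two estimates in real units, which can then be compared even if the images were taken at different scales.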

shy mural May 28, 2022, 8:42 AM

#

rose agate if you know the length of the door handle you can probably compare by multiplyin...

i actually need to compare the gap of that section in two different door images. and find the difference of gap

harsh nexus May 28, 2022, 9:03 AM

#

Hey guys! I got an interesting problem in #help-pear about plotting a merged dataframe on 2 subplots with a shared Y axis, any help is welcome

dusty valve May 28, 2022, 12:00 PM

#

so, i've trained a language prediction model, (Sequential), do i just save it with model.save()?

harsh nexus May 28, 2022, 12:54 PM

#

what library did u use? Gensim?

desert oar May 28, 2022, 1:05 PM

#

dusty valve so, i've trained a language prediction model, (Sequential), do i just save it wi...

you always have to explain what library you are using, because there are many libraries and they all work differently

dusty valve May 28, 2022, 1:05 PM

#

desert oar you always have to explain what library you are using, because there are many li...

tensorflow

#

tf.keras.Sequential

austere swift May 28, 2022, 1:16 PM

#

yeah you can save it with model.save('filename.h5')

#

thats for saving the whole model, including the optimizer state and architecture, if you wanna save just the weights you can use the save_weights method instead, it works the same

merry glacier May 28, 2022, 1:57 PM

#

Where should I start with AI? I wanna try make some kind of text classification eventually but rn just need basics

austere swift May 28, 2022, 2:02 PM

#

learn the math behind it first

#

its primarily linear algebra and some calculus concepts

hollow sentinel May 28, 2022, 3:41 PM

#

wow

#

why did i not look at jason brownlee before this

harsh nexus May 28, 2022, 6:14 PM

#

I asked my question already but I'll try to simplify it more: I got 2 dataframes, it contains results of 2 different textfiles on an LDA model. I merged the two dataframes with an extra column ('Originated').

Next step: I want to visualise how each txt file scored on the LDA model. I make a figure with 2 subplots, a shared Y axis (with all the topics, IMPORTANT: they have some topics in common) and an inverted X axis (see image). Also: I'd like to color certain topics based on their category (which is also in the dataframe). It's really hard to succeed in this and I'm kinda stuck, 2 important things that won't work together: categorising by color AND getting the labels correct for BOTH subplots

#

My code for visualisation (using matplotlib pyplot as plt): https://paste.pythondiscord.com/puhojadife.py
My result:

serene scaffold May 28, 2022, 6:21 PM

#

harsh nexus I asked my question already but I'll try to simplify it more: I got 2 dataframes...

so the problem is making sure that the color code is the same in both subplots, or what?

harsh nexus May 28, 2022, 6:25 PM

#

So basically, a part of the Y axis is correct, generic economic language is the most important topic of Goodwin, it's category is correct and it's on the correct spot

Problem: in the other txt file, there is ALSO a score for this topic, so they should be NEXT to each other, not like this. They need to share this label/tick in some way.

The categories do seem to work 'okay'? I think, It's hard to say this way. Main focus is to get to clearly compare these 2 subplots

#

color code seems fine for the first subplot, as that's the only part that I can (kind of) evaluate

craggy pier May 28, 2022, 7:14 PM

#

What are the best free courses offers for basic programming in python from zero till database usage?

#

Can anyone give me a list at least?

#

good afternoon...

tacit basin May 28, 2022, 7:16 PM

#

craggy pier What are the best free courses offers for basic programming in python from zero ...

!resources

arctic wedgeBOT May 28, 2022, 7:16 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

craggy pier May 28, 2022, 7:31 PM

#

tacit basin !resources

Thanks, but only found 1 source that teaches database, what about the paid courses, any private company course recommendation?

craggy pier May 28, 2022, 7:31 PM

#

arctic wedge

Here, there's not so much options

tacit basin May 28, 2022, 7:32 PM

#

craggy pier Thanks, but only found 1 source that teaches database, what about the paid cours...

i am not that familiar with database courses for python, sorry!

#

but i mean python is still python, so you can learn python using any of the books/courses from resource page, then add database to it

tacit basin May 28, 2022, 7:37 PM

#

craggy pier Thanks, but only found 1 source that teaches database, what about the paid cours...

you may want to try asking in #python-discussion as well, as it's not only data science topic, more ppl in there, more chances for an answer

craggy pier May 28, 2022, 7:37 PM

#

tacit basin you may want to try asking in #python-discussion as well, as it's not only da...

ok!

craggy pier May 28, 2022, 7:37 PM

#

tacit basin but i mean python is still python, so you can learn python using any of the book...

anyway, thanks!

tacit basin May 28, 2022, 7:42 PM

#

craggy pier anyway, thanks!

free code camp is a good resource, they offer Python for Everybody course from University of Michigan, which gets good reviews. It starts from zero and one of the last chapters is databases: https://www.freecodecamp.org/learn/scientific-computing-with-python/

freeCodeCamp.org

Learn to Code — For Free

#

direct P4E link: https://www.py4e.com/

#

they also have Database certification: https://www.freecodecamp.org/learn/relational-database/

#

there's also #databases channel

#

@craggy pier look at pinned messages in #databases channel

craggy pier May 28, 2022, 7:49 PM

#

tacit basin free code camp is a good resource, they offer Python for Everybody course from ...

Oh, you strike out man! hahaha

tacit basin May 28, 2022, 7:51 PM

#

yeah #python-discussion could be overwhelming at times lol

fierce pine May 28, 2022, 7:54 PM

#

serene scaffold do you have a general understanding of what machine learning is? do you have a g...

Yeahh. I am doing an internship for the same but don't know where to start from. Would u recommend any site or something

hard idol May 28, 2022, 10:56 PM

#

in the initial population of the neat algorithm, is every input node connected to every output node?

worldly dawn May 28, 2022, 11:05 PM

#

hard idol in the initial population of the neat algorithm, is every input node connected t...

wouldn't that make the population being made of completely identical individuals?

#

From the paper:

In contrast, NEAT biases the search towards minimal-dimensional spaces by starting out with a uniform population of networks with zero hidden nodes (i.e., all inputs connect directly to outputs).

hard idol May 28, 2022, 11:06 PM

#

worldly dawn wouldn't that make the population being made of completely identical individuals...

but the connections themselves would have random weights

hard idol May 28, 2022, 11:07 PM

#

worldly dawn From the paper: ``` In contrast, NEAT biases the search towards minimal-dimensio...

isn't that just saying what i'm asking

worldly dawn May 28, 2022, 11:08 PM

#

sounds about it

hard idol May 28, 2022, 11:08 PM

#

oh alright

hard idol May 28, 2022, 11:09 PM

#

worldly dawn sounds about it

would any of the connections be disabled?

#

just in the inital population

worldly dawn May 28, 2022, 11:12 PM

#

hard idol would any of the connections be disabled?

No clue. I am interested in GA/GP and NEAT is on my list of the next items to implement. So haven't gone through the whole paper yet.
That said, https://macwha.medium.com/evolving-ais-using-a-neat-algorithm-2d154c623828 also mentions:

Firstly we need a blank population of networks. Each of these networks will initially only have the input and output nodes — no hidden nodes or connections.

hard idol May 28, 2022, 11:12 PM

#

no connections??

worldly dawn May 28, 2022, 11:17 PM

#

hard idol no connections??

For some papers, I find it useful to dig through the associated source code to clarify some specific points. You may find the definitive answer there: http://nn.cs.utexas.edu/soft-view.php?SoftID=4

#

(random online articles do make some assumptions sometimes which turn out to go against the source code of the paper)

hard idol May 28, 2022, 11:18 PM

#

oh okay thanks

worldly dawn May 28, 2022, 11:43 PM

#

hard idol oh okay thanks

let me know what you find though.
I am curious about it as I will soon start looking into it 🙂

hard idol May 28, 2022, 11:44 PM

#

worldly dawn let me know what you find though. I am curious about it as I will soon start loo...

i started looking at the python implementation

#

https://neat-python.readthedocs.io/en/latest/config_file.html#:~:text=to be)%20recurrent.-,initial_connection,-Specifies%20the%20initial
it seems like it takes in user input, it can either be all connected, not connected at all, or a chance of being connected

#

and a few other possibilities too

#

@worldly dawn
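
To make the three cases concrete, here is a toy sketch of how a starting genome's connections could be generated. This is not neat-python's code. The function, the mode names and the Gaussian weights are assumptions for illustration that loosely mirror the "all connected / not connected / chance of being connected" options described above.

```python
import random


def initial_connections(n_inputs, n_outputs, mode, prob=0.5, rng=random):
    """Return {(input_id, output_id): weight} for one starting genome.

    mode: "full"        every input connected to every output
          "unconnected" no connections at all
          "partial"     each possible connection kept with probability `prob`
    """
    if mode not in ("full", "unconnected", "partial"):
        raise ValueError(f"unknown mode: {mode!r}")

    connections = {}
    for i in range(n_inputs):
        for o in range(n_outputs):
            if mode == "unconnected":
                continue
            if mode == "partial" and rng.random() >= prob:
                continue
            connections[(i, o)] = rng.gauss(0.0, 1.0)   # random initial weight
    return connections


rng = random.Random(0)
for mode in ("full", "unconnected", "partial"):
    population = [initial_connections(3, 2, mode, rng=rng) for _ in range(5)]
    print(mode, [len(genome) for genome in population])
```

In the "full" case every genome has the same topology and differs only in its weights. In the "unconnected" case the genomes really are identical until mutation adds connections.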

worldly dawn May 28, 2022, 11:48 PM

#

makes sense for a library. Interesting to see by default there is no connection.

hard idol May 28, 2022, 11:50 PM

#

worldly dawn makes sense for a library. Interesting to see by default there is no connection.

yeah that doesnt really make sense to me

#

since in that case every organism in the initial population would be exactly identical

#

and it also wouldnt do anything at all

worldly dawn May 28, 2022, 11:54 PM

#

hard idol and it also wouldnt do anything at all

it would through mutation though.
I could see the argument about minimalism of the network with zero connection

hard idol May 28, 2022, 11:55 PM

#

worldly dawn it would through mutation though. I could see the argument about minimalism of t...

yeah thats true

#

but partial would also do the same

worldly dawn May 28, 2022, 11:55 PM

#

I don't think it would make or break it though

hard idol May 28, 2022, 11:55 PM

#

but ig its different chances

#

yeah

worldly dawn May 28, 2022, 11:57 PM

#

hard idol but partial would also do the same

that's probably what I would start with as it saves one generation

hard idol May 28, 2022, 11:58 PM

#

unless the randomness for partial and mutation is different

worldly dawn May 28, 2022, 11:58 PM

#

Starting with zero connections costs an extra step before any get tried, while giving everyone all the connections adds extra ones that evolution would have to figure out how to trim

hard idol May 28, 2022, 11:58 PM

#

true

#

i think i might agree with partial

#

wait actually

worldly dawn May 28, 2022, 11:59 PM

#

Comparing these starting points could be a fun project too, as a way to see which one could converge the fastest/most reliable way

hard idol May 28, 2022, 11:59 PM

#

nvm neither partial or mutation would have any new nodes

#

yeah

#

i also wanna compare the percentages on a 3d graph

worldly dawn May 29, 2022, 12:01 AM

#

I find quantiles useful too in these contexts

wind girder May 29, 2022, 1:29 AM

#

I want to find the intersection of a horizontal line to a contour line in plotly.

I cannot find an implementation of it
One said to use skimage.find_contours to find the contour line but it changes units

short heart May 29, 2022, 9:24 AM

#

How can I add custom augmentations to albumentations composition?

short heart May 29, 2022, 10:35 AM

#

short heart How can I add custom augmentations to albumentations composition?

irrelevant now

gray steppe May 29, 2022, 11:46 AM

#

centers = kmeans.cluster_centers_.reshape(10, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center, interpolation='nearest', cmap=plt.cm.binary)``` whats this code doing?

wooden sail May 29, 2022, 12:14 PM

#

the first line makes an image composed of several subplots. specifically, 2 rows with 5 columns each of subplots, of size (8,3) (i think this one is in inches, can't recall)

#

the second line seems to be doing some sort of kmeans clustering, i can't tell how exactly because i don't recognize the command. the result is reshaped into an array of size (10,8,8)

#

then, the axes (the object that contains the data to be plotted in each subfigure) are zipped together with the kmeans results. there are 10 subplots and 10 centers, so this iterates over them together

#

then the person removes the x and y ticks (the markings along the x and y axes)

#

and finally, in each of the subplots, an 8x8 image is displayed (of whatever it is that kmeans is returning here). since the image probably won't be 8x8 pixels (especially because of the size that was specified in the first line), they pick a flavor of interpolation to scale the figures up. 'nearest' essentially makes pixels bigger by just scaling them up, so the image will look blocky. cmap puts a colormap on the image. seems they just went for black and white

#

@gray steppe

gray steppe May 29, 2022, 12:23 PM

#

wooden sail the first line makes an image composed of several subplots. specifically, 2 rows...

Thank you. This is the output.

gray steppe May 29, 2022, 12:25 PM

#

wooden sail then, the axes (the object that contains the data to be plotted in each subfigur...

do you mean axi or axes? i am perplexed here.

wooden sail May 29, 2022, 12:26 PM

#

i mean axes, i'm not talking about the variable names

#

what the notebook calls "ax" there is a list of axes

gray steppe May 29, 2022, 12:27 PM

#

sorry, i am very poor in python.

#

so what's axi, center in this case?

wooden sail May 29, 2022, 12:27 PM

#

axes is the plural of axis, like in x and y axis

#

axi is an element of the list of axes there

gray steppe May 29, 2022, 12:27 PM

#

oh cool

wooden sail May 29, 2022, 12:28 PM

#

it seems like you should look up how for loops work in python

gray steppe May 29, 2022, 12:28 PM

#

so do you mind telling how's that for loop working?

wooden sail May 29, 2022, 12:28 PM

#

you should ask in python general or in a help channel, i think

gray steppe May 29, 2022, 12:28 PM

#

they won't answer

#


labels = np.zeros_like(clusters)
for i in range(10):
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]``` i am confused with this one as well.
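
One way to read that loop, sketched on toy data: for each cluster, find the points k-means put in it, look up their true labels, and relabel the whole cluster with the most common one. The arrays below are made up, and `np.bincount(...).argmax()` is used here in place of `scipy.stats.mode` only to keep the example NumPy-only.

```python
import numpy as np

# Toy stand-ins: `clusters` is what k-means assigned, `target` is the truth.
clusters = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
target = np.array([7, 7, 1, 3, 3, 3, 1, 1, 7, 1])

labels = np.zeros_like(clusters)
for i in range(3):                        # 3 clusters here, 10 in the snippet
    mask = (clusters == i)                # True where a point is in cluster i
    # Most frequent true label inside this cluster (the role of mode(...)[0]).
    most_common = np.bincount(target[mask]).argmax()
    labels[mask] = most_common            # relabel the whole cluster at once

print(labels)
```

The cluster numbers from k-means are arbitrary, so this step is what lets the predicted labels be compared with the true digits afterwards.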

#

these code snippets are from the https://www.telematika.org/py/pdsh_05.11-k-means/

Jupyter Snippet PDSH 05.11-K-Means

gray orchid May 29, 2022, 3:09 PM

#

gray steppe they won't answer

https://www.w3schools.com/python/default.asp

https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html

W3Schools offers free online tutorials, references and exercises in all the major languages of the web. Covering popular subjects like HTML, CSS, JavaScript, Python, SQL, Java, and many, many more.

gray steppe May 29, 2022, 4:21 PM

#

I was not expecting that.

gray orchid May 29, 2022, 4:29 PM

#

Or what?

loud apex May 29, 2022, 7:18 PM

#

Hello there

What is the roadmap to learn data science and ai? like should i learn data science then ai? and what are the libraries should i know? and if there are courses for beginners about ai that would be helpful

serene scaffold May 29, 2022, 7:20 PM

#

loud apex Hello there What is the roadmap to learn data science and ai? like should i lea...

!resources data science

arctic wedgeBOT May 29, 2022, 7:20 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

lapis sequoia May 29, 2022, 7:22 PM

#

loud apex Hello there What is the roadmap to learn data science and ai? like should i lea...

ai is not really a beginners topic, typically I'd advise:
basic numpy -> basic pandas -> matplotlib (optionally seaborn) -> mathematics & statistics -> more advanced numpy & pandas -> ai

loud apex May 29, 2022, 7:23 PM

#

i do know the basics of ai like naive classifier, NN and knn algorithm

#

thanks for the help

sour lynx May 29, 2022, 8:03 PM

#

how can i fix this? does anyone knows pls help me

serene scaffold May 29, 2022, 8:12 PM

#

sour lynx how can i fix this? does anyone knows pls help me

you apparently have string data in x_train or y_train, but the model you're using needs for it to be floating point numbers.

odd meteor May 29, 2022, 8:13 PM

#

sour lynx how can i fix this? does anyone knows pls help me

It appears you have a string in either x_train or y_train.

sour lynx May 29, 2022, 8:24 PM

#

but i used np method

#

is it wrong?

odd meteor May 29, 2022, 9:10 PM

#

sour lynx but i used np method

Did you vectorize your cleaned_text feature?

#

You need to vectorize your text and clean_text feature. It appears you didn't do that from your pics.

Use TfidfVectorizer or CountVectorizer + TfidfTransformer on those columns

lapis sequoia May 29, 2022, 9:14 PM

#

I was learning recommender systems. And I have a question. The dataset basically had rows with user no, item no and the corresponding rating. We then form n user x m item matrix using this.

The teacher taught to do train test split in this data. And then use the train data to find similarity between users. And then predict the ratings that are available in the test matrix based on train matrix.
But my question is, why didn't we simply predict each value in the whole data by finding similar users to the user at hand?
I don't see any data leakage happening here.

bold timber May 29, 2022, 9:18 PM

#

Hi, how to select the value of start_station_name that contain the words 'San Francisco' in dataframe?

serene scaffold May 29, 2022, 9:50 PM

#

bold timber Hi, how to select the value of start_station_name that contain the words 'San Fr...

There's a str.startswith method

#

Actually what you want is this @bold timber https://pandas.pydata.org/docs/reference/api/pandas.Series.str.contains.html
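
A minimal sketch of that, assuming the column is called `start_station_name` as in the question. The station names in the frame are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "start_station_name": [
        "San Francisco Caltrain (Townsend at 4th)",
        "Market at 10th",
        "Civic Center BART (San Francisco)",
        None,
    ]
})

# na=False treats missing names as "no match"; regex=False matches the text literally.
mask = df["start_station_name"].str.contains("San Francisco", na=False, regex=False)
print(df.loc[mask, "start_station_name"].tolist())
```

Unlike `str.startswith`, this also catches rows where the words appear in the middle or at the end of the name.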

bold timber May 29, 2022, 10:01 PM

#

serene scaffold Actually what you want is this @bold timber https://pandas.pydata.org/d...

Ok, thank you so much

bold timber May 29, 2022, 10:21 PM

#

if I have the plot like this, what the type of integral to calculate the area? definite or indefinite?

mint palm May 29, 2022, 10:26 PM

#

are there architectures that are intelligent enough to extract portions of video which are relevant for a prediction and eliminate other portion.

serene scaffold May 29, 2022, 10:41 PM

#

@bold timber do you know what definite and indefinite integrals are?

serene scaffold May 29, 2022, 10:46 PM

#

mint palm are there architectures that are intelligent enough to extract portions of video...

Look into "attention"

lapis sequoia May 29, 2022, 10:48 PM

#

bold timber if I have the plot like this, what the type of integral to calculate the area? d...

You want area under the curve with respect to x-axis?

bold timber May 29, 2022, 11:42 PM

#

serene scaffold @bold timber do you know what definite and indefinite integrals are?

Yes I know. Indefinite integrals don't have limits of integration

#

Vice versa

bold timber May 29, 2022, 11:42 PM

#

lapis sequoia You want area under the curve with respect to x-axis?

Yes
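
For reference, the area under a curve between two x-limits is a definite integral. A rough sketch of approximating it numerically with the trapezoidal rule, assuming the plotted curve is available as sampled `x` and `y` arrays. The function `x**2` and the limits 0 and 2 are made up, since the plot itself is not shown here.

```python
import numpy as np

# Hypothetical integration limits and curve (not taken from the original plot).
a, b = 0.0, 2.0
x = np.linspace(a, b, 1001)
y = x ** 2

# Trapezoidal rule: average of neighbouring heights times the step width, summed.
area = float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))
print(area)  # close to the exact value 8/3
```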

eager wedge May 30, 2022, 12:20 AM

#

I am doing a project regarding semantic segmentation. I am getting 98% accuracy and 0 loss on the first epoch. Why is it not working?

unet = models.Sequential()
unet.add(layers.Conv2D(64, (3,3), activation='relu', padding='same', input_shape=(i_size, i_size, 1)))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(128, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(256, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(512, (3,3), activation='relu', padding='same'))
unet.add(layers.MaxPool2D((2,2), padding='same'))
unet.add(layers.Conv2D(1024, (3,3), activation='relu', padding='same'))
unet.add(layers.Conv2D(512, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(256, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(128, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))
unet.add(layers.Conv2D(64, (3,3), activation='relu', padding='same'))
unet.add(layers.UpSampling2D((2,2)))

unet.add(layers.Conv2D(1, 1, padding="same", activation = "sigmoid"))

unet.compile(optimizer='Adam', loss="categorical_crossentropy", metrics=["accuracy"])

model_history = unet.fit(x_train, y_train,
epochs=100,
verbose = 1,
batch_size = 32,
validation_data = (x_test, y_test))

unet.summary()
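
One thing that may be worth checking (a guess from the code above, not a confirmed diagnosis): `categorical_crossentropy` expects a probability distribution over several classes on the last axis, while the final layer here outputs a single sigmoid channel. The NumPy sketch below writes both loss formulas by hand instead of calling Keras. It assumes the categorical loss renormalizes predictions over the last axis, which is what Keras does for probability inputs. The batch shape and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-7

# Made-up batch: 2 masks of 4x4 pixels with a single output channel.
y_true = rng.integers(0, 2, size=(2, 4, 4, 1)).astype(float)
y_pred = rng.uniform(0.05, 0.95, size=(2, 4, 4, 1))  # stand-in for sigmoid outputs


def categorical_ce(y_true, y_pred):
    # Renormalize over the class axis, then take -sum(y * log(p)).
    # With one channel, p / p == 1 everywhere, so log(p) is (almost) 0.
    p = y_pred / y_pred.sum(axis=-1, keepdims=True)
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(y_true * np.log(p)).sum(axis=-1).mean())


def binary_ce(y_true, y_pred):
    # Per-pixel binary cross-entropy, the usual pairing for one sigmoid channel.
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)).mean())


cat_loss = categorical_ce(y_true, y_pred)
bin_loss = binary_ce(y_true, y_pred)
print("categorical:", cat_loss)  # essentially zero, whatever the predictions are
print("binary:", bin_loss)       # clearly above zero for random predictions
```

If this is the cause, switching to `loss="binary_crossentropy"` in `unet.compile` would give a loss that actually depends on the predictions.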

misty flint May 30, 2022, 2:35 AM

#

lapis sequoia I was learning recommender systems. And I have a question. The dataset basically...

theres a million and one ways to build a Rec Sys. dif models/systems -> dif outcomes -> dif pros/cons. another thing to keep in mind is your use case as this affects how you build your system.

#

highly recommend eugene yan's content about RecSys

#

praise

rose agate May 30, 2022, 5:01 AM

#

maybe look for a dataset on a topic you're interested in, e.g. books, movies, sports, etc. you can then think of something you want to predict or explore more about

#

or make a bot that optimises a game, I've always wanted to do that

#

depends what type of game I guess. If it's a video game then it needs to interpret the image which is quite difficult. If it's something like chess/checkers/go then AI can do that for sure

iron basalt May 30, 2022, 5:05 AM

#

World models are pretty cool. You can make your AI simulate various things, real or virtual, one cool experiment is having it mimic various applications by learning models of them (e.g. copy a text editor).

#

(input is the window's buffer (pixels / video) and the keyboard and mouse (it's also the outputs in this case))

barren wedge May 30, 2022, 5:44 AM

#

Is it better to batch into BERT model or not?

lusty spear May 30, 2022, 9:26 AM

#

barren wedge Is it better to batch into BERT model or not?

Is it better to batch into BERT model or not?
Generally, yes. When the model processes inputs in a batch, the GPU processes them in parallel. But you're limited by the size of your GPU memory, so if you're running out of it, you'll need to decrease the batch size.
If you're using a huggingface pipeline, then AFAIK it's going to handle batching for you.

barren wedge May 30, 2022, 9:29 AM

#

lusty spear Is it better to batch into BERT model or not? Generally, yes. When the model pro...

is it faster to decrease the batch size or is it slower?

lapis sequoia May 30, 2022, 11:14 AM

#

guys i need help, i can code in C++ but when i entered the AI and data world i needed to learn python, and i don't know where to learn and practice it for data science

next sphinx May 30, 2022, 11:36 AM

#

What is the purpose of an activation function?
A. To decide whether a neuron will fire or not
B. To increase the depth of a neural network
C. To create connectivity among hidden layers
D. To normalize the inputs

flint mason May 30, 2022, 11:42 AM

#

how do i add a condition in python so the running tab is interrupted automatically if it's about to exceed available ram?

arctic wedgeBOT May 30, 2022, 12:24 PM

#

Hey @hollow prairie!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

gray steppe May 30, 2022, 1:15 PM

#

Hi guys, how can logistic regression be used as a classifier?

lime current May 30, 2022, 1:34 PM

#

hello guys, how can I use machine learning to detect fraudulent transactions in a dataset.
which ML algorithm will be suitable for it?

sharp leaf May 30, 2022, 2:21 PM

#

In order to test if the k-nn algorithm works properly for the selected parameters (parameter k and metric) and sample database, an appropriate methodology should be used. One of them is 1 versus the rest.

How does this 1 versus rest method work with k-nn? I want to implement this method for k-nn but I can't find any useful information that describes it.

light crescent May 30, 2022, 2:48 PM

#

hi, not exactly Python related but I'm asking here since I couldn't find anything online. does anyone know of an algorithm to generate a realistic set of values for a line chart/bar chart? basically, a "smoothed" set of random values with no big changes in values and ideally it should keep the random values within a neutral trend

wooden sail May 30, 2022, 2:49 PM

#

the easiest way would be to use either a gaussian distribution with a low variance around the true values, or a uniform distribution

light crescent May 30, 2022, 2:51 PM

#

wooden sail the easiest way would be to use either a gaussian distribution with a low varian...

that does seem like a simple solution! smoothness can be the stdev of the distribution

#

thank you 😁

wooden sail May 30, 2022, 2:51 PM

#

aight

#

let me cook up a MWE

#

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: xvals = np.arange(100)

In [4]: yvals = xvals + np.random.normal(size=100)

In [5]: plt.plot(xvals, yvals)
Out[5]: [<matplotlib.lines.Line2D at 0x23de33aa3a0>]

In [6]: plt.show()

#

#

just as an example. you can change the variance of the noise by multiplying it with a scalar. you could also low pass filter it if you wanted, to get it to look smoother
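
A sketch of both suggestions, building on the MWE above: scaling the noise and smoothing it with a moving average as a simple low-pass filter. The values of `noise_scale` and `window` are hypothetical knobs chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

xvals = np.arange(100)
noise_scale = 5.0  # hypothetical value; larger means a bumpier line
window = 9         # hypothetical moving-average width (odd, so it can be centred)

# Linear trend plus Gaussian noise. Multiplying by noise_scale scales the
# standard deviation by noise_scale (and the variance by noise_scale ** 2).
yvals = xvals + noise_scale * rng.normal(size=xvals.size)

# Simple low-pass filter: a moving average.
# mode="valid" drops the edges, where the window would not fit fully.
kernel = np.ones(window) / window
smooth = np.convolve(yvals, kernel, mode="valid")

# x positions matching the centres of the averaged windows.
smooth_x = xvals[window // 2 : -(window // 2)]
```

Plotting `smooth_x` against `smooth` next to the raw `xvals` and `yvals` shows the effect. For a flat "neutral trend", replace `xvals` in the `yvals` line with a constant.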

gloomy anvil May 30, 2022, 3:38 PM

#

hello party people

#

I've got a question, that I posted in stackoverflow: https://stackoverflow.com/questions/72436420/lstm-always-predicts-1s-for-binary-classifications
I figured I might ask here as well if you have some ideas why my LSTM always returns 1s in binary classification

Stack Overflow

LSTM always predicts 1s for binary classifications

I have a dataset (1800 rows, 55 columns) that I need to do binary classification on. I created a pipeline with different models (LogReg, XGB, RF, GRF, SVM, MLP) and one of them being an LSTM. I've ...

#

What else could I change about my model? I tried different configurations of nodes and hidden layers, different optimizers as well as learning rates

mint palm May 30, 2022, 3:44 PM

#

are transfer learning based architectures "regularly" fine tuned after deployment?

mild dirge May 30, 2022, 3:53 PM

#

Depends if the data distribution changes over time I'd think @mint palm

mint palm May 30, 2022, 4:04 PM

#

i was expecting the same....

hollow sentinel May 30, 2022, 4:16 PM

#

anyone know of websites to get data from besides kaggle?

#

i think kaggle is too clean

#

uci machine learning repo?

#

the problem is that companies don't really like putting their data out there anymore so it's difficult to come up with nice projects when the data isn't available

#

i wouldn't use a dataset from kaggle in a portfolio

half jolt May 30, 2022, 5:06 PM

#

Hello, could someone give me a hand and guide me how I could do this in an array type and not in a list like the example?

#

https://paste.pythondiscord.com/ehilucojet

ivory steppe May 30, 2022, 5:10 PM

#

Can anyone please guide me on how to determine whether an arm movement is clockwise or anticlockwise using computer vision?
Can anyone share some similar projects?

native rune May 30, 2022, 5:20 PM

#

can anyone clarify whether offline handwriting recognition (OHR) and optical character recognition (OCR) are the same?

harsh nexus May 30, 2022, 5:38 PM

#

serene scaffold so the problem is making sure that the color code is the same in both subplots, ...

Would making a separate list with the index of the label (topic) work? Right now I think I’m using the wrong index so all I gotta do is get the topic of the current row, get the ‘index’ in the array of all topics (y axis) with that topic and boom, i got the right y index that a bar should be on

serene scaffold May 30, 2022, 5:57 PM

#

harsh nexus Would making a separate list with the index of the label (topic) work? Right now...

you can add a color column to the dataframe and use that
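
A minimal sketch of the color-column idea, assuming a dataframe with a `topic` column. The data, the `count` column, the `y_index` column and the palette are all made up for illustration.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions.
df = pd.DataFrame({
    "topic": ["sports", "politics", "sports", "tech", "politics"],
    "count": [4, 7, 2, 5, 1],
})

# One fixed color per topic, derived from a sorted list so the mapping
# is the same no matter which subset of rows a subplot shows.
topics = sorted(df["topic"].unique())
palette = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]
color_of = {topic: palette[i % len(palette)] for i, topic in enumerate(topics)}

# Store the color and the y position alongside each row.
df["color"] = df["topic"].map(color_of)
df["y_index"] = df["topic"].map({topic: i for i, topic in enumerate(topics)})
print(df)
```

Both subplots can then take their bar colors from `df["color"].tolist()` and their positions from `df["y_index"]`, which keeps the color coding consistent.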

harsh nexus May 30, 2022, 5:59 PM

#

Good one

sleek fjord May 30, 2022, 6:09 PM

#

hii, i m getting cuda out of memory error, this is my GPU memory usage

#

how do i resolve the error?

#

RuntimeError: CUDA out of memory. Tried to allocate 396.00 MiB (GPU 0; 4.00 GiB total capacity; 3.05 GiB already allocated; 0 bytes free; 3.09 GiB reserved in total by PyTorch)

misty flint May 30, 2022, 6:19 PM

#

i have opened up the data engineering can of worms

#

and there are a million dif ways to move data from point A to point B

#

kekHands

#

ELT/ETL nightmare

#

i am starting to understand this space a little bit more

#

and why it needs its own role

serene scaffold May 30, 2022, 6:27 PM

#

sleek fjord `RuntimeError: CUDA out of memory. Tried to allocate 396.00 MiB (GPU 0; 4.00 GiB...

do you understand what the error message is telling you?

sleek fjord May 30, 2022, 6:32 PM

#

serene scaffold do you understand what the error message is telling you?

yes i understood the message, but i m not able to resolve it

#

I tried many different things, clearing the cache, reducing the batch size.. getting the memory usage, but still no luck

serene scaffold May 30, 2022, 6:36 PM

#

@sleek fjord you're trying to allocate almost 100 times more memory than your GPU has, so you might need to brace for the possibility that you simply can't accomplish this with your hardware

#

Do your tensors have a lot of zeros?

sleek fjord May 30, 2022, 6:38 PM

#

serene scaffold Do your tensors have a lot of zeros?

i dont understand this

serene scaffold May 30, 2022, 6:38 PM

#

sleek fjord i dont understand this

You're trying to allocate a tensor on the GPU. A tensor is basically the same as an array. So the question is, are the elements mostly zeros?

sleek fjord May 30, 2022, 6:39 PM

#

serene scaffold You're trying to allocate a tensor on the GPU. A tensor is basically the same as...

i dont think so

serene scaffold May 30, 2022, 6:41 PM

#

sleek fjord i dont think so

What kind of model are you training?

sleek fjord May 30, 2022, 6:41 PM

#

serene scaffold What kind of model are you training?

https://github.com/v-iashin/MDVC

GitHub

GitHub - v-iashin/MDVC: PyTorch implementation of Multi-modal Dense...

PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)

arctic wedgeBOT May 30, 2022, 7:01 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1653937878:f> (9 minutes and 59 seconds) (reason: discord_emojis rule: sent 35 emojis in 10s).

sleek fjord May 30, 2022, 7:14 PM

#

serene scaffold @sleek fjord you're trying to allocate almost 100 times more memory tha...

btw it is 396 MiB, not GiB

mild dirge May 30, 2022, 7:28 PM

#

So how big would you estimate your batch is in memory?

#

@sleek fjord

#

How many floats in a batch?
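
A back-of-the-envelope sketch for the numbers in the error message, assuming 4-byte float32 values. The helper `tensor_mib` and the example batch shape are hypothetical and not taken from the MDVC code.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2
GIB = 1024 ** 3


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory of one dense tensor with the given shape, in MiB."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / MIB


# Numbers copied from the error message.
requested_mib = 396.0
total_gib = 4.0

# The failed allocation is roughly 10% of the card, not a multiple of it.
fraction = requested_mib * MIB / (total_gib * GIB)

# How many float32 values fit in the requested 396 MiB.
floats_in_request = int(requested_mib * MIB) // BYTES_PER_FLOAT32

# Hypothetical input batch: 32 RGB images of 224x224 pixels.
example_mib = tensor_mib((32, 3, 224, 224))

print(fraction, floats_in_request, example_mib)
```

The input batch is usually only a small part of the total: weights, activations kept for the backward pass, gradients and optimizer state also live on the GPU, which is why 3.05 GiB can already be allocated before this request.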

upper spindle May 30, 2022, 11:28 PM

#

how good is the mit deep learning course by alexander amini

#

any opinions from anyone?

median dove May 30, 2022, 11:49 PM

#

Hello. I have been scratching my head about advanced projects but nothing comes to mind.
I want to make a project to impress a college and make them want to enroll me. For that I will need an advanced project but literally nothing comes to my mind.
What is an advanced AI project I could develop during the next year to impress some people? Thanks 🙂

lapis sequoia May 31, 2022, 12:15 AM

#

median dove Hello. I have been scratching my head about advanced projects but nothing comes ...

2d canvas AI bot in websocket

serene scaffold May 31, 2022, 12:25 AM

#

@median dove what kind of AI do you want to do?

median dove May 31, 2022, 12:27 AM

#

Well I’d like deep learning, I don’t have any real experience with AI but I would like to spend my highschool years on research and the development of an advanced model that colleges could like and offer me a place with them

serene scaffold May 31, 2022, 12:54 AM

#

median dove Well I’d like deep learning, I don’t have any real experience with AI but I woul...

"deep learning" is just machine learning with neural networks that have a lot of layers. all different kinds of AI may involve deep learning.

median dove May 31, 2022, 1:08 AM

#

serene scaffold "deep learning" is just machine learning with neural networks that have a lot of...

Okay, I’m not sure what kind of AI I want. I just want an advanced project and work on it until I graduate

celest flax May 31, 2022, 1:47 AM

#

im trying to get tensorflow to work

#

but

#

it just doesnt

#

is there any alternative

#

that works similarly

#

everytime i look up neural network and machine learning

#

it just shows tensorflow and tensorflow.keras

royal crest May 31, 2022, 1:48 AM

#

pyTorch

celest flax May 31, 2022, 1:48 AM

#

thank youuuu

#

so is that like

#

the same thing

#

ish

serene scaffold May 31, 2022, 1:57 AM

#

celest flax it just doesnt

you've already been informed that PyTorch is similar to tensorflow. and it is. but you should probably address why tensorflow "isn't working". because it's a very widely used library, and chances are, you're the one making the mistake.

celest flax May 31, 2022, 2:06 AM

#

m1 chip mac

#

that's why

#

pip is up to date

#

mac is up to date

#

pycharm and python up to date

#

iirc google didnt get full access to develop on m1 chips, not sure tho

hollow oasis May 31, 2022, 2:15 AM

#

Where should I start learning data-science and ai? like what are some good resources to start learning

serene scaffold May 31, 2022, 2:35 AM

#

hollow oasis Where should I start learning data-science and ai? like what are some good resou...

!resources data science

arctic wedgeBOT May 31, 2022, 2:35 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

half jolt May 31, 2022, 3:10 AM

#

Hello, could someone give me a hand and guide me how I could do this in an array type and not in a list like the example?
https://paste.pythondiscord.com/ehilucojet

south condor May 31, 2022, 4:26 AM

#

Please check out the LineaPy
LineaPy Data Science Workflow In Just Two Lines: MLOps Made Easy
https://towardsdatascience.com/lineapy-data-science-workflow-in-just-two-lines-mlops-made-easy-679f36ac63bd?sk=78a8fa6cf59180eb25177f64ee87d50e

Medium

LineaPy Data Science Workflow In Just Two Lines: MLOps Made Easy

Data engineering, simplified

#

Please check this one: basics, but still useful.
All about Python — 100 + Code Snippets, Tricks, Concepts and Important Modules
https://towardsdatascience.com/all-about-python-100-code-snippets-tricks-concepts-and-important-modules-9a9fda489b6b?sk=dc45d9ed480c8854cb15c48bfa13d672

Medium

All about Python — 100 + Code Snippets, Tricks, Concepts and Import...

Python is the most popular language as of now. It is used heavily in all fields from website building to artificial intelligence.

worldly dawn May 31, 2022, 4:29 AM

#

south condor Please check out the LineaPy LineaPy Data Science Workflow In Just Two Lines: ML...

Is this an ad?

misty flint May 31, 2022, 4:36 AM

#

celest flax iirc google didnt get full access to develop on m1 chips, not sure tho

no i just heard a recent podcast about how they released something specifically for M1 chips recently so check again and make sure your stuff is up to date. i mean, the podcast could also be wrong but the host seemed to know what he was talking about.

wooden sail May 31, 2022, 4:37 AM

#

celest flax m1 chip mac

i think you need to install metal for it to work well on mac https://developer.apple.com/metal/tensorflow-plugin/

Apple Developer

Tensorflow Plugin - Metal - Apple Developer

Find presentations, documentation, sample code, and resources for building macOS, iOS, and tvOS apps with the Metal framework.

pliant pewter May 31, 2022, 8:50 AM

#

What are the best discord communities for data science and AI/ML/DL?

wooden sail May 31, 2022, 8:54 AM

#

there's a data science and a math server that are part of the same network. both are good

pliant pewter May 31, 2022, 8:55 AM

#

I'm on a math server, it's just called Mathematics, is that the one?

#

Has a wireframe torus logo

wooden sail May 31, 2022, 9:10 AM

#

yeah that's the one. there's a channel on applied computational math

#

i'm pretty sure there's an AI-related server in their network, too

pliant pewter May 31, 2022, 9:19 AM

#

Yeah, I found the AI one. Cool

gray steppe May 31, 2022, 10:00 AM

#

hi guys, any help regarding this?

stable anchor May 31, 2022, 10:29 AM

#

wtf is this

#

@celest vine u want me to learn this s**t

wooden sail May 31, 2022, 10:42 AM

#

what are the usual regression model assumptions?

stable anchor May 31, 2022, 10:42 AM

#

https://tenor.com/view/science-statistics-chart-data-gif-17263757

Tenor

wooden sail May 31, 2022, 10:42 AM

#

i guess they're referring to optimality of least squares (LS) under additive white gaussian noise (AWGN). this isn't AWGN, and so the estimator should have a covariance matrix that accounts for this

#

or in other words, the estimator will depend on the inverse covariance of the income

#

it should become more or less clear if you go all the way back to the expression of the PDF of the data and formulate it as a maximum likelihood problem
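
A sketch of what weighting by the inverse covariance can look like in practice, assuming independent Gaussian noise whose standard deviation grows with the regressor and is known. The data is synthetic and the true coefficients (2 and 3) are made up. Under these assumptions the maximum likelihood estimate is weighted least squares, `beta = (X^T W X)^{-1} X^T W y`, with `W` the inverse noise covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heteroscedastic data: the noise std grows with x (assumed known here).
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
sigma = 0.5 * x
y = 2.0 + 3.0 * x + sigma * rng.normal(size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])

# Ordinary least squares: optimal only if the noise were white (equal variance).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted least squares: W = diag(1 / sigma^2) is the inverse noise covariance.
w = 1.0 / sigma ** 2
XtW = X.T * w  # same as X.T @ diag(w), without building the n x n matrix
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

print("OLS:", beta_ols)
print("WLS:", beta_wls)
```

Both estimates are unbiased in this setup. The weighted one has lower variance because noisy points count for less.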

celest vine May 31, 2022, 11:19 AM

#

stable anchor @celest vine u want me to learn this s**t

Sorry brother, do want you what

celest flax May 31, 2022, 11:52 AM

#

wooden sail i think you need to install metal for it to work well on mac https://developer.a...

will metal stop the illegal instruction error

wooden sail May 31, 2022, 11:56 AM

#

idk, i didn't see which error you got

celest flax May 31, 2022, 11:59 AM

#

plus conda just doesnt wanna find the tensorflow deps

#

lmao i followed what the conda website says and it still dont work

#

okay i manually installed the latest release

#

still gives `Process finished with exit code 132 (interrupted by signal 4: SIGILL)`

#

i did the entire apple instruction too

#data-science-and-ml
