#data-science-and-ml | Python | Page 21

hasty mountain Oct 7, 2022, 7:25 PM

#

Probably

#

Hey @serene scaffold, tell me a bit about LSTMs in text networks... If I make a GAN for text without any LSTMs, without any syllable/word/sentence sequence, my model will only generate text without any logic, right? ~~Even if I pass to the discriminator texts with some logic?~~ Never mind the latter. I remembered that if I don't shuffle my batches, the model will overfit

#

Can I simply pass sequences as inputs to both generator and discriminator without using LSTMs? Or should I use them in order to achieve better performance?

desert oar Oct 7, 2022, 7:29 PM

#

even with overfitting, loss shouldn't start at zero. also that test curve is weirdly smooth compared to the train curve. check for bugs in your code

novel python Oct 7, 2022, 7:34 PM

#

desert oar even with overfitting, loss shouldn't start at zero. also that test curve is wei...

alright, will double check the parameters

desert oar Oct 7, 2022, 7:35 PM

#

novel python alright, will double check the parameters

i'm less concerned about the parameters and more about the train/test split and/or how you're calculating loss

#

when in doubt, try to simplify, then work back upwards

novel python Oct 7, 2022, 7:53 PM

#

I'm creating sequences and then separating those sequences into train and test set with the following functions:


import numpy as np


def to_sequences(data, seq_len):
    d = []

    for index in range(len(data)+1 - seq_len):
        d.append(data[index: index + seq_len])

    return np.array(d)

def preprocess(data_raw, seq_len, train_split):

    data = to_sequences(data_raw, seq_len)

    num_train = int(train_split * data.shape[0])

    X_train = data[:num_train, :-1, :]
    y_train = data[:num_train, -1, :]

    X_test = data[num_train:, :-1, :]
    y_test = data[num_train:, -1, :]

    return X_train, y_train, X_test, y_test


X_train, y_train, X_test, y_test = preprocess(scaled_set1, SEQ_LEN, train_split=0.8)

#

what i'm finding weird is that the original dataset has 7 points (set1), but X_train and X_test contain only 3 and 1 sequences, respectively, while 5 sequences are possible:
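The counts follow from the code itself: `to_sequences` produces `len(data) - seq_len + 1` windows, and because the last step of each window is used as `y`, `X` only gets `seq_len - 1` steps. A minimal sketch on a hypothetical 7-point series (the actual `SEQ_LEN` isn't shown, so two values are tried) suggests that 3 train / 1 test is what you get with `SEQ_LEN = 4`, while "5 sequences" corresponds to `SEQ_LEN = 3`:

```python
import numpy as np


def to_sequences(data, seq_len):
    # Sliding windows of length seq_len; there are len(data) - seq_len + 1 of them
    return np.array([data[i:i + seq_len] for i in range(len(data) - seq_len + 1)])


def preprocess(data_raw, seq_len, train_split):
    data = to_sequences(data_raw, seq_len)
    num_train = int(train_split * data.shape[0])  # int() truncates, e.g. 0.8 * 4 -> 3
    # The last step of every window is the target, so X keeps seq_len - 1 steps
    X_train, y_train = data[:num_train, :-1, :], data[:num_train, -1, :]
    X_test, y_test = data[num_train:, :-1, :], data[num_train:, -1, :]
    return X_train, y_train, X_test, y_test


# Hypothetical stand-in for scaled_set1: 7 time steps, 1 feature
toy = np.arange(7, dtype=float).reshape(7, 1)
for seq_len in (3, 4):
    X_train, y_train, X_test, y_test = preprocess(toy, seq_len, train_split=0.8)
    print(seq_len, len(X_train), len(X_test))
```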

young granite Oct 7, 2022, 9:26 PM

#

wanted to use this old dataset of mine to try a few algorithms and a NN on it, but i'm struggling to find a suitable approach to build 1 big dataset out of n=349 dfs like the one shown:

#

so far i tried .pivot with the column "temp"

#

atm the dfs are all stored inside a dict with range(0, 350)
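One common way to merge a dict of DataFrames is `pd.concat`, which accepts a mapping and turns the dict keys into an outer index level. A minimal sketch, assuming each frame has the same columns (the `time` column and values here are hypothetical): it builds a long table first and only then pivots, if a wide "one temp column per df" layout is what `.pivot` was aiming for.

```python
import pandas as pd

# Hypothetical stand-in for the dict of DataFrames keyed 0..349, each with a "temp" column
dfs = {
    i: pd.DataFrame({"time": [0, 1, 2], "temp": [20.0 + i, 21.0 + i, 22.0 + i]})
    for i in range(3)
}

# Long format: stack all frames; the dict key becomes a regular "df_id" column
long_df = (
    pd.concat(dfs, names=["df_id", "row"])
    .reset_index(level="df_id")
    .reset_index(drop=True)
)

# Wide format: one "temp" column per original DataFrame, indexed by time
wide_df = long_df.pivot(index="time", columns="df_id", values="temp")
print(long_df.head())
print(wide_df)
```

The long form is usually the easier one to feed into scikit-learn or a NN; the wide form only works cleanly if every frame shares the same `time` values.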

soft badge Oct 7, 2022, 9:46 PM

#

people, what is the best place to learn machine learning?

young granite Oct 7, 2022, 9:48 PM

#

soft badge people what is the best place for learn Machine learning?

maybe kaggle works for u. in my opinion the tutorials are a bit too easy and then the competitions are really hard, but there are some good githubs from contributors u can check

soft badge Oct 7, 2022, 9:59 PM

#

I need to understand how it works and why it works, you know?

#

i was studying on w3schools but i don't like their ML blog

dusky mesa Oct 7, 2022, 10:00 PM

#

so i have these two csv files: the first one has the ingredients and the total amount we have of each, and the second one has each pastry, the price it sells for, and the amount of each ingredient needed to make it

Screen_Shot_2022-10-07_at_5.58.47_PM.png

Screen_Shot_2022-10-07_at_5.58.55_PM.png

#

sorry for cutting you off

#

but i need to find the best solution, given your circumstances. Output the total profit and how much of each pastry you have to make

#

i'm not here for the answer i want to actually understand how i would go about this

young granite Oct 7, 2022, 10:08 PM

#

divide the amounts by what u need for each recipe to see how many u could produce, after that u could work out the total price

dusky mesa Oct 7, 2022, 10:08 PM

#

by that you mean the max number of each pastry right?

young granite Oct 7, 2022, 10:09 PM

#

ye

#

u could then also check what's the best function for a mix of pastries with the given amounts

dusky mesa Oct 7, 2022, 10:13 PM

#

sorry if it's a dumb q, but just to make sure: i divide the total we have of each ingredient by the amount needed to find the max number of pastries we could make, right?

#

so for apple pie i got 158, croissant 79, poppy seed 51

young granite Oct 7, 2022, 10:16 PM

#

u need to consider that u can only ever produce the smallest of those amounts

#

so if
Y F S
1 2 3
is the result u can only do 1

young granite Oct 7, 2022, 10:17 PM

#

young granite wanted to use this old dataset of mine to try a bit of algorithm and nn on it bu...

got it i was dumbo

dusky mesa Oct 7, 2022, 10:18 PM

#

young granite so if Y F S 1 2 3 is the result u can only do 1

i'm a bit confused

#

so what i did was the max amount of each individual pastry that could be made with the total amount of ingredients

young granite Oct 7, 2022, 10:20 PM

#

yes but what if for example for x pastry u would need x sugar but >x flour

#

if it works out just fine thats good but u need to consider that

dusky mesa Oct 7, 2022, 10:20 PM

#

ohh ok yea i took that into account, i divided the total of each ingredient by the amount of it needed for apple pie

#

then i took the lowest amount
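The "take the lowest amount" step can be written in a couple of lines. A minimal sketch with hypothetical inventory and recipe numbers (the real values are only in the screenshots): each ingredient caps production at `stock // amount_needed`, and the smallest cap wins.

```python
# Hypothetical inventory and recipes; the real values are in the screenshots
stock = {"flour": 10, "sugar": 6, "butter": 8}
recipes = {
    "apple pie": {"flour": 3, "sugar": 2, "butter": 1},
    "croissant": {"flour": 2, "butter": 2},
}


def max_batches(recipe, stock):
    # Each ingredient allows floor(stock / amount needed) pastries;
    # the scarcest ingredient (smallest of those numbers) is the binding one
    return min(stock[ing] // need for ing, need in recipe.items())


for name, recipe in recipes.items():
    print(name, max_batches(recipe, stock))
```

These per-pastry maxima only hold if you make that single pastry; once you mix pastries they compete for the same ingredients.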

young granite Oct 7, 2022, 10:21 PM

#

👍

dusky mesa Oct 7, 2022, 10:21 PM

#

thats what you mean right

regal ingot Oct 7, 2022, 10:21 PM

#

how do i do A* search when my goal state is finding all the keys in a grid? Like, how do i calculate my heuristic?

dusky mesa Oct 7, 2022, 10:48 PM

#

young granite 👍

do yk what i should calculate next

#

how would i find the best combo to maximize profit
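Finding the best mix is an integer optimisation problem: choose a count for each pastry so that total ingredient use stays within stock and total revenue is maximal. For a handful of pastries, an exhaustive search over all counts up to each single-pastry maximum is enough. A minimal sketch with hypothetical numbers and prices (for larger problems, an integer linear programming solver such as `scipy.optimize.milp` or PuLP scales much better):

```python
from itertools import product

# Hypothetical data: ingredient stock, and (recipe, price) per pastry
stock = {"flour": 10, "sugar": 6, "butter": 8}
recipes = {
    "apple pie": ({"flour": 3, "sugar": 2, "butter": 1}, 10.0),
    "croissant": ({"flour": 2, "butter": 2}, 6.0),
}


def best_mix(recipes, stock):
    names = list(recipes)
    # Upper bound for each pastry if it were the only one made
    caps = [min(stock[i] // n for i, n in recipes[p][0].items()) for p in names]
    best = (0.0, None)
    for counts in product(*(range(c + 1) for c in caps)):
        # Check every ingredient: total used by this combination must fit the stock
        feasible = all(
            sum(k * recipes[p][0].get(ing, 0) for k, p in zip(counts, names)) <= have
            for ing, have in stock.items()
        )
        if feasible:
            revenue = sum(k * recipes[p][1] for k, p in zip(counts, names))
            if revenue > best[0]:
                best = (revenue, dict(zip(names, counts)))
    return best


print(best_mix(recipes, stock))
```

With these numbers the best mix (2 pies, 2 croissants) beats making only the pastry with the highest single maximum, which is why the per-pastry maxima alone don't answer the question.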

regal ingot Oct 7, 2022, 11:04 PM

#

well

soft badge Oct 7, 2022, 11:07 PM

#

people, what is the best place to learn machine learning, including the fundamentals?

regal ingot Oct 7, 2022, 11:12 PM

#

lol

red canyon Oct 7, 2022, 11:33 PM

#

Neural network from scratch 🤓

serene scaffold Oct 8, 2022, 12:22 AM

#

red canyon Neural network from scratch 🤓

Have you read it?

red canyon Oct 8, 2022, 12:32 AM

#

serene scaffold Have you read it?

yeah its really good

#

im not done though

serene scaffold Oct 8, 2022, 12:54 AM

#

red canyon yeah its really good

Good to know. It's on our resources page, but I can't actually verify that it's good

red canyon Oct 8, 2022, 1:04 AM

#

serene scaffold Good to know. It's on our resources page, but I can't actually verify that it's ...

Ok

serene scaffold Oct 8, 2022, 1:04 AM

#

Yes

latent cairn Oct 8, 2022, 1:34 AM

#

Add a new categorical column to df_housing called NOXCAT. This column categorizes the suburbs into towns with LOW, MEDIUM, and HIGH nitric oxides concentration (based on the variable NOX). The categorization should be based on quantiles of NOX as follows:
LOW (NOX <= 30% quantile)
MEDIUM (> 30% quantile; <= 70% quantile)
HIGH (> 70% quantile).

#

There is a dataset with a column NOX, all numbers with about 3 decimals.

I know this will be way off, but my attempt that keeps getting an error is:

itm_low= np.quantile(df_housing["NOX"], q=0.30)
itm_med= np.quantile(df_housing["NOX"], q=0.70)
itm_high= np.quantile(df_housing["NOX"], q=1)
df_housing['NOXCAT']= {"NOX": {(itm_low): "LOW", (itm_med): "MEDIUM", (itm_high): "HIGH"}}

Any assistance would be much appreciated!

hasty mountain Oct 8, 2022, 3:51 AM

#

Does anyone have a tip to get the closest float number from a certain input?
I'm testing a word prediction model and I'm trying to work with data within range [-1, 1]. The model is doing quite fine, but I'm having some problems when trying to convert my tokens back to words again.
How can I make an output which has value -0.0703 be converted to a word which has value (in my dictionary) -0.0702?
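For scalar token values, "closest float" is a nearest-neighbour lookup in one dimension: sort the dictionary values once, then binary-search each output with `np.searchsorted` and compare the two neighbouring entries. A minimal sketch; the vocabulary below is hypothetical:

```python
import numpy as np

# Hypothetical vocabulary: word -> scalar token value in [-1, 1]
vocab = {"the": -0.0702, "cat": 0.1534, "sat": 0.4410, "on": -0.5120}

words = np.array(list(vocab))
values = np.array(list(vocab.values()))
order = np.argsort(values)
sorted_vals, sorted_words = values[order], words[order]


def nearest_word(x):
    # Insertion point in the sorted values; the closest entry is just before or at it
    i = np.searchsorted(sorted_vals, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_vals)]
    best = min(candidates, key=lambda j: abs(sorted_vals[j] - x))
    return str(sorted_words[best])


print(nearest_word(-0.0703))
```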

hasty mountain Oct 8, 2022, 4:37 AM

#

Meh. I'll just stick to scikit-learn's nearest neighbours...

latent cairn Oct 8, 2022, 5:19 AM

#

I'm still trying to work my one out, ffs

wooden sail Oct 8, 2022, 5:26 AM

#

hasty mountain Does anyone have a tip to get the closest float number from a certain input? I'm...

the operation is in general not invertible. in special cases, you can reconstruct the values using sparse recovery, i.e. L1-regularized optimization

hasty mountain Oct 8, 2022, 5:27 AM

#

Oh... I see...

#

Uh...well...at least it worked with KNN...

#

I'm not even using embedding layers, since I'm using floats and not using one-hot encoding. Hope this doesn't prejudice the model too much.

latent cairn Oct 8, 2022, 5:31 AM

#

latent cairn Add a new categorical column to df_housing called NOXCAT. This column categorize...

My answer to this is;

is_small = df_housing['NOX'] <= df_housing['NOX'].quantile(.3)
is_large = df_housing['NOX'] > df_housing['NOX'].quantile(.7)
is_medium = ~(is_small | is_large)

df_housing['NOXCAT'] = df_housing['NOX'].mask(is_small, 'small').mask(is_large, 'large').mask(is_medium, 'medium')

print(df_housing['NOXCAT'])

Seems to work.

#

just need to change the names around to LOW, MEDIUM and HIGH
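The same binning can be done in one call with `pd.qcut`, passing the quantile edges and the labels directly. Its intervals are right-closed `(a, b]` with the first bin also including the minimum, which matches LOW ≤ 30% quantile, MEDIUM in (30%, 70%], HIGH > 70%. A minimal sketch on hypothetical NOX values (not the real `df_housing`):

```python
import pandas as pd

# Hypothetical toy NOX values standing in for df_housing["NOX"]
df = pd.DataFrame({"NOX": [0.385, 0.4, 0.45, 0.5, 0.538, 0.55, 0.6, 0.65, 0.7, 0.871]})

# Bin edges at the 0%, 30%, 70% and 100% quantiles, one label per bin
df["NOXCAT"] = pd.qcut(df["NOX"], q=[0, 0.3, 0.7, 1.0], labels=["LOW", "MEDIUM", "HIGH"])
print(df["NOXCAT"].value_counts(sort=False))
```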

#

maate, no wonder software engineers and data scientists are on the big bucks. Being proficient at Excel, I thought I was clever until I took on this stuff

shell crest Oct 8, 2022, 5:33 AM

#

!rule 8

arctic wedgeBOT Oct 8, 2022, 5:33 AM

#

Rules

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

shell crest Oct 8, 2022, 5:33 AM

#

Wait no

#

!rule 9

arctic wedgeBOT Oct 8, 2022, 5:33 AM

#

Rules

9. Do not offer or ask for paid work of any kind.

shell crest Oct 8, 2022, 5:34 AM

#

arctic wedge

Anyway other than this, anyone who claims to be 'paying well' is 99.95% guaranteed to not be paying well

latent cairn Oct 8, 2022, 5:34 AM

#

you sound really nice, I'm sure someone will be hanging to work for you

#

All the best chief

shell crest Oct 8, 2022, 5:36 AM

#

<@&831776746206265384>

wooden sail Oct 8, 2022, 6:16 AM

#

hasty mountain I'm not even using embedding layers, since I'm using floats and not using one-ho...

ok, not using embeddings does make it a lot easier. knn is certainly one way to do it, but it depends how. in theory you already know the centroids of the voronoi regions, as these are what you have in your dict already, so there's no need to compute them again

#

you could also use the 2-norm (euclidean distance) if the encoding is multidimensional
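Because the dictionary entries already are the centroids, decoding is a nearest-centroid lookup: compute the 2-norm from each model output to every dictionary vector and take the `argmin`. No fitting step is needed. A minimal sketch, assuming a hypothetical 2-D encoding:

```python
import numpy as np

# Hypothetical multidimensional encoding: one row per vocabulary word
words = ["the", "cat", "sat"]
codes = np.array([[0.1, -0.2], [0.7, 0.3], [-0.5, 0.9]])


def decode(outputs):
    # outputs: (n, d); broadcasting gives an (n, vocab_size) matrix of Euclidean distances
    d = np.linalg.norm(outputs[:, None, :] - codes[None, :, :], axis=-1)
    return [words[i] for i in d.argmin(axis=1)]


print(decode(np.array([[0.65, 0.25], [-0.4, 1.0]])))
```

This is the same answer a 1-nearest-neighbour classifier fitted on the dictionary would give; for very large vocabularies a KD-tree or similar index avoids the full distance matrix.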

hasty mountain Oct 8, 2022, 6:18 AM

#

wooden sail ok, not using embeddings does make it a lot easier. knn is certainly once way to...

Nice. Thanks. Then I'll stick to the knn

atomic shadow Oct 8, 2022, 6:45 AM

#

Hi, any idea which algorithm i can explore (or an article for reference) if i want to predict a coordinate value (x, y) based on x values (x1, x2, ..., xn)?

Eg:
x1   | x2   | x3    | ... | y
0.43 | 0.56 | 31.21 | ... | (3.51, 4.66)

I tried RandomForest but it gives me the error "ValueError: could not convert string to float: ''"

devout sail Oct 8, 2022, 6:46 AM

#

That's a Python error on your end, not an algorithmic error

atomic shadow Oct 8, 2022, 6:50 AM

#

devout sail That's a Python error on your end, not an algorithmic error

should I convert all my x values to float then?

devout sail Oct 8, 2022, 6:51 AM

#

They should be floats yeah

atomic shadow Oct 8, 2022, 6:55 AM

#

Let me try
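The `''` in the error message points at empty cells being read as strings. A minimal sketch of the cleanup with pandas, assuming the coordinate target has been split into two numeric columns (the column names and values below are hypothetical): `pd.to_numeric(errors="coerce")` turns unparseable entries into NaN so they can be dropped or imputed. scikit-learn's `RandomForestRegressor.fit` accepts a 2-D `y` of shape `(n_samples, n_outputs)`, so the (x, y) pair can be predicted jointly.

```python
import pandas as pd

# Hypothetical raw data read as strings, including an empty cell
raw = pd.DataFrame({
    "x1": ["0.43", "0.51", ""],
    "x2": ["0.56", "0.60", "0.70"],
    "target_x": ["3.51", "3.60", "3.70"],
    "target_y": ["4.66", "4.70", "4.80"],
})

# Convert every column to numbers; strings that can't be parsed (like '') become NaN
clean = raw.apply(pd.to_numeric, errors="coerce").dropna()

X = clean[["x1", "x2"]].to_numpy()
Y = clean[["target_x", "target_y"]].to_numpy()  # two target columns: the (x, y) coordinate
print(X.shape, Y.shape)
```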

versed gulch Oct 8, 2022, 8:41 AM

#

Hi, is there a way I can condense my code? What I'm doing here is taking a centre pixel and looking for neighbouring pixel values that are equal to 255

coords = zip(z_coords.tolist(), x_coords.tolist(), y_coords.tolist())
for z, x, y in coords:
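  # exclude edges/boundary of skeleton image (could pad the skeleton image in the future)
  # index 0 must be skipped too: z-1 == -1 would silently wrap around to the last slice
  if (z == 0 or x == 0 or y == 0
      or z == skel3d.shape[0] - 1 or x == skel3d.shape[1] - 1 or y == skel3d.shape[2] - 1):
    continue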
  # keep track of the neighbours
  neighbours = []
  
  # current slice
  neighbours.append(skel3d[z, x-1, y-1])
  neighbours.append(skel3d[z, x-1, y])
  neighbours.append(skel3d[z, x-1, y+1])
  # middle so exclude the actual centre voxel - except for prev and next slice
  neighbours.append(skel3d[z, x, y-1])
  neighbours.append(skel3d[z, x, y+1])
  
  neighbours.append(skel3d[z, x+1, y-1])
  neighbours.append(skel3d[z, x+1, y])
  neighbours.append(skel3d[z, x+1, y+1])
  
  # previous slice
  neighbours.append(skel3d[z-1, x-1, y-1])
  neighbours.append(skel3d[z-1, x-1, y])
  neighbours.append(skel3d[z-1, x-1, y+1])
  
  neighbours.append(skel3d[z-1, x, y-1])
  neighbours.append(skel3d[z-1, x, y])
  neighbours.append(skel3d[z-1, x, y+1])
  
  neighbours.append(skel3d[z-1, x+1, y-1])
  neighbours.append(skel3d[z-1, x+1, y])
  neighbours.append(skel3d[z-1, x+1, y+1])
  
  # next slice
  neighbours.append(skel3d[z+1, x-1, y-1])
  neighbours.append(skel3d[z+1, x-1, y])
  neighbours.append(skel3d[z+1, x-1, y+1])
  
  neighbours.append(skel3d[z+1, x, y-1])
  neighbours.append(skel3d[z+1, x, y])
  neighbours.append(skel3d[z+1, x, y+1])
  
  neighbours.append(skel3d[z+1, x+1, y-1])
  neighbours.append(skel3d[z+1, x+1, y])
  neighbours.append(skel3d[z+1, x+1, y+1])
  
  if neighbours.count(255) > 2:
    print(z, x, y)

compact star Oct 8, 2022, 8:48 AM

#

has anyone created their own Python implementation of NEAT? It would be really helpful if I could see it

strong sedge Oct 8, 2022, 9:44 AM

#

versed gulch Hi is there a way I can condense my code, what I'm doing here is taking a centre...

for loops ? lmao

devout sail Oct 8, 2022, 9:45 AM

#

versed gulch Hi is there a way I can condense my code, what I'm doing here is taking a centre...

Turn neighbors into a numpy array, compare the entire thing to 255, and apply a convolution on the result of the comparison

strong sedge Oct 8, 2022, 9:46 AM

#

^ would work better

devout sail Oct 8, 2022, 9:48 AM

#

Do a > 2 on the result of that, and send it to numpy.where to get the indices
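The suggestion above (compare to 255, convolve, threshold, then `numpy.where`) can be written with NumPy alone: summing the 26 shifted copies of a zero-padded mask is exactly a convolution with a 3x3x3 kernel of ones whose centre is 0 (`scipy.ndimage.convolve` with such a kernel and `mode="constant"` would do the same). A minimal sketch, assuming the voxels of interest are the skeleton voxels themselves; unlike the loop, edge voxels are handled through zero padding instead of being skipped:

```python
import numpy as np


def count_on_neighbours(skel3d, value=255):
    """Number of 26-connected neighbours equal to `value`, for every voxel."""
    on = (skel3d == value).astype(np.int32)
    padded = np.pad(on, 1)  # zero padding: voxels outside the volume count as "off"
    Z, X, Y = on.shape
    counts = np.zeros_like(on)
    for dz in (-1, 0, 1):
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dz == dx == dy == 0:
                    continue  # skip the centre voxel itself
                counts += padded[1 + dz:1 + dz + Z, 1 + dx:1 + dx + X, 1 + dy:1 + dy + Y]
    return on, counts


def junction_voxels(skel3d, value=255):
    on, counts = count_on_neighbours(skel3d, value)
    # Same test as neighbours.count(255) > 2, restricted to skeleton voxels
    return np.argwhere((on == 1) & (counts > 2))


# Hypothetical tiny skeleton: a centre voxel with three "on" neighbours
skel = np.zeros((3, 3, 3), dtype=np.uint8)
for z, x, y in [(1, 1, 1), (0, 1, 1), (2, 1, 1), (1, 0, 1)]:
    skel[z, x, y] = 255
print(junction_voxels(skel))
```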

serene scaffold Oct 8, 2022, 12:32 PM

#

Omg hi @devout sail

#

Wyd here

coral nimbus Oct 8, 2022, 1:09 PM

#

Hi guys, is there anyone here who has experience on number recognition by chance?

serene scaffold Oct 8, 2022, 1:10 PM

#

coral nimbus Hi guys, is there anyone here who has experience on number recognition by chance...

Please always ask your actual question, rather than asking if people know something

tacit nacelle Oct 8, 2022, 2:53 PM

#

hey! so I have this final year project on the theme of the city. I thought about doing a program to optimize the traffic light system. My idea is counting vehicles in each waiting queue (which I've already done using OpenCV and YOLOv3), but since I'm having problems implementing an algorithm I found that uses a conflict matrix, I'm looking for alternative things I can do in case I can't get the code working

#

something that uses vehicles detection

#

and that is not just programs but also math theories

arctic wedgeBOT Oct 8, 2022, 2:55 PM

#

Hey @tacit nacelle!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

viral oak Oct 8, 2022, 3:06 PM

#

Is it not recommended to try to make an AI crack sha256 hashes?

serene scaffold Oct 8, 2022, 4:14 PM

#

viral oak Is it not recommended to try make an AI crack sha256 hashes?

Ai can't do that.

lapis sequoia Oct 8, 2022, 4:33 PM

#

Is dumping a model with the pickle library fine?

#

I dumped it and loaded it in another script. But the predictions seem to be a bit off. So I was wondering if there's some issue with the model loading

#

Because my accuracy was good earlier

#

In fact it couldn't even predict the data it was trained on well

serene scaffold Oct 8, 2022, 4:35 PM

#

lapis sequoia Is dumping model with pickle library fine

If the predictions seem wrong, pickle probably isn't the problem. If there was a problem with using pickle, it would probably fail to load.

#

But for whatever library you used to train the model, I would use its native saving functionality.

lapis sequoia Oct 8, 2022, 4:36 PM

#

I did model = LogisticRegression()
model.fit(train, test)
pickle.dump(model, file)

#

Is that fine way?

serene scaffold Oct 8, 2022, 4:36 PM

#

See if there's a save method for the model object

lapis sequoia Oct 8, 2022, 4:37 PM

#

Hmm

#

I used sklearn

#

And official sklearn says to use pickle

#

Might be something wrong with my preprocessing maybe then
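One way to rule pickle in or out is to round-trip the model in memory and compare predictions on the same inputs. The sketch below uses a hypothetical parameter dict as a stand-in model so it runs without scikit-learn; with the real estimator you would compare `model.predict(X)` with `restored.predict(X)`. If they match, the drift comes from what is fed to the model in the Flask script (preprocessing, scaler, column order), and pickling a full scikit-learn `Pipeline` keeps that preprocessing together with the model.

```python
import pickle

import numpy as np

# Hypothetical stand-in for a fitted logistic regression: just its learned parameters
params = {"coef": np.array([0.5, -1.25]), "intercept": 0.1}


def predict(p, X):
    # Logistic regression decision rule: class 1 when the linear score is positive
    return (X @ p["coef"] + p["intercept"] > 0).astype(int)


X = np.array([[1.0, -2.0], [3.0, 2.0], [-1.0, 1.0]])

restored = pickle.loads(pickle.dumps(params))  # in-memory save/load round trip
same = np.array_equal(predict(params, X), predict(restored, X))
print("predictions identical after round trip:", same)
```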

viral oak Oct 8, 2022, 4:49 PM

#

serene scaffold Ai can't do that.

well i'm trying anyway

#

my lowest is 89 bits off out of 256 (Maximum error across 1,500 random hashes)

mild edge Oct 8, 2022, 5:09 PM

#

hey I want to learn AI can anyone share any roadmap

austere epoch Oct 8, 2022, 5:09 PM

#

@grand lion

#

#

whatever is plotted on my copied ax is slightly shifted

austere epoch Oct 8, 2022, 5:30 PM

#

(ignore, figured it out)

serene scaffold Oct 8, 2022, 5:33 PM

#

viral oak well i'm trying anyway

That's not how it works. AI can't learn stuff that's basically random.

viral oak Oct 8, 2022, 5:41 PM

#

serene scaffold That's not how it works. AI can't learn stuff that's basically random.

i mean, yeah, but worth a shot

agile cobalt Oct 8, 2022, 5:41 PM

#

not worth a shot
AI isn't magic, it's a science
it is not applicable to that task at all

serene scaffold Oct 8, 2022, 5:43 PM

#

viral oak i mean, yeah, but worth a shot

You could potentially overfit a model to the hashes in your training data, but it simply isn't possible to create a generalized model that can do this.

viral oak Oct 8, 2022, 5:43 PM

#

agile cobalt not worth a shot AI isn't magic, it's a science it is not applicable to that tas...

I just want to make hash cracking faster for 32 bytes

serene scaffold Oct 8, 2022, 5:43 PM

#

viral oak I just want to make hash cracking faster for 32 bytes

Then it absolutely is not "worth a shot" to do it this way.

viral oak Oct 8, 2022, 5:44 PM

#

serene scaffold Then it absolutely is not "worth a shot" to do it this way.

ok

austere epoch Oct 8, 2022, 5:51 PM

#

viral oak i mean, yeah, but worth a shot

Without reading above, it is literally impossible to predict randomness

#

By definition of what randomness is
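The "basically random" point can be made concrete with the standard library. SHA-256 has the avalanche property: flipping a single input bit changes roughly half of the 256 output bits, so similar inputs give unrelated outputs and there is no smooth structure for a model to fit. The sketch (inputs are hypothetical) also computes a chance baseline, i.e. how many bits a purely random 32-byte guess gets wrong, which is what any "bits off" figure should be compared against. `random.Random.randbytes` needs Python 3.9+.

```python
import hashlib
import random


def bit_diff(a: bytes, b: bytes) -> int:
    # Hamming distance between two equal-length byte strings
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Avalanche effect: flip one input bit and count changed output bits
msg = bytearray(32)  # hypothetical 32-byte input (all zeros)
h1 = hashlib.sha256(msg).digest()
msg[0] ^= 0x01
h2 = hashlib.sha256(msg).digest()
print("bits changed by a 1-bit input flip:", bit_diff(h1, h2))

# Chance baseline: error of a random 32-byte guess against real digests
rng = random.Random(0)
errors = [
    bit_diff(hashlib.sha256(rng.randbytes(32)).digest(), rng.randbytes(32))
    for _ in range(1500)
]
print("random-guess error: min", min(errors), "mean", sum(errors) / len(errors))
```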

mint palm Oct 8, 2022, 5:57 PM

#

so i am using ssh and was having the problem of the process getting "killed", probably due to resource utilisation.
I had to extract features using resnext3d on 1900 videos;
doing all of them at once was giving that "killed" error,
so i tried decreasing the number of videos i did feature extraction on.
on 10 videos it took 50 minutes.
I don't know if that's normal or not, please guide.

misty flint Oct 8, 2022, 5:58 PM

#

this was pretty nifty https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor

Discovering novel algorithms with AlphaTensor

In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matric...

#

tldr is this:

image_6842a8a6-f9b7-4368-9532-38d3e872acae20221008_125826.jpg

viral oak Oct 8, 2022, 6:00 PM

#

austere epoch By definition of what randomness is

yeah

tidal bough Oct 8, 2022, 6:52 PM

#

it's pretty cool that they managed to discover a new multiplication algorithm for 4x4 matrices over Z/2

#

hopefully they adjust their loss function and see if they can generate some novel algorithms that work on floats (or even just on ints)

proper wing Oct 8, 2022, 7:53 PM

#

hey, if anyone is familiar with deploying an ML model in ai can u check my q in broccoli pls thx

lapis sequoia Oct 8, 2022, 9:29 PM

#

#

I need help on part 5

#

import numpy as np
import pandas as pd

yellow_taxi = pd.read_csv('2018_Yellow_Taxi_Trip_Data.csv')
# For each month, print the entire row with the highest fare_amount
print("For each month, print the entire row with the highest fare_amount.")
# obtain month from pickupdate
yellow_taxi['month'] = pd.DatetimeIndex(yellow_taxi['tpep_pickup_datetime']).month
monthlyMaxFares = yellow_taxi.groupby(['month'])['fare_amount'].max()

# for month_index in range(1,13):
#    month_subset = yellow_taxi.loc[yellow_taxi['month']==month_index]
#    max_fare_row = month_subset.loc[month_subset['fare_amount']==np.max(month_subset['fare_amount'])]
#    if max_fare_row.shape[0] != 0:
#        print(max_fare_row)

serene scaffold Oct 8, 2022, 9:31 PM

#

lapis sequoia ```python import numpy as np import pandas as pd yellow_taxi = pd.read_csv('201...

Look into idxmax

#

Once you know the idxmax, you can get the rows
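`groupby(...)["fare_amount"].idxmax()` returns, for each month, the row label of the largest fare (the first one if there are ties); passing those labels to `.loc` gives the entire rows. A minimal sketch on hypothetical trips using the same column names as the taxi data:

```python
import pandas as pd

# Hypothetical trips with the same column names as the taxi CSV
yellow_taxi = pd.DataFrame({
    "tpep_pickup_datetime": ["2018-01-03 10:00", "2018-01-20 12:00",
                             "2018-02-01 09:00", "2018-02-14 18:00"],
    "fare_amount": [12.5, 40.0, 7.0, 22.0],
})
yellow_taxi["month"] = pd.to_datetime(yellow_taxi["tpep_pickup_datetime"]).dt.month

# Row label of the highest fare in each month, then the full rows for those labels
idx = yellow_taxi.groupby("month")["fare_amount"].idxmax()
max_rows = yellow_taxi.loc[idx]
print(max_rows)
```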

lapis sequoia Oct 8, 2022, 9:35 PM

#

i've never heard of this before lol but let me check

#

cuz i have a csv where i have to get the maximum of a certain column for every month and print the entire row

odd meteor Oct 8, 2022, 9:41 PM

#

lapis sequoia In fact it couldn't even predict the data it was trained on well

The new script where this model is loaded, is it running on the same machine you used to train the model in the first place? If not, then that's probably what caused the drift.

odd meteor Oct 8, 2022, 9:49 PM

#

lapis sequoia Is dumping model with pickle library fine

Yes it is. Given the drift you're currently experiencing, you might wanna try using joblib instead of pickle. Then compare and contrast if there's any significant change in your model performance.

odd meteor Oct 8, 2022, 9:59 PM

#

mild edge hey I want to learn AI can anyone share any roadmap

Hi Musk, check out this roadmap. You might wanna pay less attention to the timeline therein, as I don't find it very realistic.

Meanwhile.... I need a new Tesla 😊

serene scaffold Oct 8, 2022, 10:07 PM

#

odd meteor Hi Musk, checkout this roadmap. You might wanna pay less attention to the timeli...

Some of these need to be abbreviated so that deep learning can have more time. And NLP should probably be dropped completely

odd meteor Oct 8, 2022, 10:16 PM

#

serene scaffold Some of these need to be abbreviated so that deep learning can have more time. A...

Month 8 might probably be for learning a Deep Learning framework, who knows? 😄 I do find it ridiculously unrealistic to learn CV + NLP in 1 month. Is one month even enough to do justice to CV alone?

serene scaffold Oct 8, 2022, 10:30 PM

#

odd meteor Month 8 might probably be for learning a Deep Learning framework, who knows? 😄 ...

I doubt it

regal ingot Oct 8, 2022, 10:37 PM

#

how do i get a heuristic for multiple targets?

#

like my goal isn't an end point but finding all keys
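For "collect all keys", the usual trick is to make the search state `(position, set of keys collected)` rather than position alone, with the goal being any state that holds every key. A simple admissible heuristic is the Manhattan distance to the farthest uncollected key: you have to reach it anyway, and on a 4-connected grid with unit step costs Manhattan distance never overestimates. A minimal sketch; the grid encoding (`S` start, `#` wall, lowercase letters for keys) and the function name are hypothetical:

```python
import heapq


def collect_all_keys(grid):
    """A* over (position, collected-keys bitmask); returns the fewest steps or None."""
    rows, cols = len(grid), len(grid[0])
    keys, start = {}, None
    for r, line in enumerate(grid):
        for c, ch in enumerate(line):
            if ch == "S":
                start = (r, c)
            elif ch.islower():
                keys[(r, c)] = len(keys)  # bit index for this key
    goal_mask = (1 << len(keys)) - 1

    def h(pos, mask):
        # Manhattan distance to the farthest key not yet collected (0 if none left)
        return max((abs(pos[0] - kr) + abs(pos[1] - kc)
                    for (kr, kc), bit in keys.items() if not mask & (1 << bit)), default=0)

    best_g = {(start, 0): 0}
    frontier = [(h(start, 0), 0, start, 0)]  # (f = g + h, g, position, mask)
    while frontier:
        _, g, pos, mask = heapq.heappop(frontier)
        if mask == goal_mask:
            return g  # first goal state popped is optimal since h is consistent
        if g > best_g[(pos, mask)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = pos[0] + dr, pos[1] + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            nmask = mask | (1 << keys[(nr, nc)]) if (nr, nc) in keys else mask
            state = ((nr, nc), nmask)
            if g + 1 < best_g.get(state, float("inf")):
                best_g[state] = g + 1
                heapq.heappush(frontier, (g + 1 + h((nr, nc), nmask), g + 1, (nr, nc), nmask))
    return None  # some key is unreachable


print(collect_all_keys(["S.a", "...", "b.."]))
```

A tighter (still admissible) alternative is the length of a minimum spanning tree over the current position and the remaining keys, at the cost of more work per node.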

serene scaffold Oct 8, 2022, 10:55 PM

#

@regal ingot can you elaborate?

low bloom Oct 8, 2022, 10:57 PM

#

what's the best way to add metadata to a df in pandas (python)?

#

feel free to @ me

storm kelp Oct 8, 2022, 11:16 PM

#

Anyone here read Hands-On Machine Learning with ...?

#

Seems like a decent book

serene scaffold Oct 9, 2022, 12:26 AM

#

low bloom whats the best way to add metadata to a df in pandas python?

What metadata?

low bloom Oct 9, 2022, 12:26 AM

#

I think I figured it out
I am basically doing it this way

# adding metadata of sheet name into the df to be used later
df.sheet_name = sheet_name
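Setting an arbitrary attribute like `df.sheet_name` makes pandas emit a `UserWarning`, and such attributes are typically not carried over to the new DataFrames returned by most operations. pandas provides `DataFrame.attrs`, a dict meant for this kind of metadata (the docs mark it as experimental). If the sheet name must survive concatenation of several sheets, a plain column is the most robust option. A minimal sketch with a hypothetical sheet name:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
sheet_name = "Sheet1"  # hypothetical

# Metadata dict built into DataFrame (experimental in pandas)
df.attrs["sheet_name"] = sheet_name

# Alternative that survives pd.concat of many sheets: store it as a regular column
df["sheet_name"] = sheet_name
print(df.attrs, df.columns.tolist())
```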

serene scaffold Oct 9, 2022, 12:28 AM

#

low bloom I think I figured it out I am basically doing it this way ```python # ad...

This is too vague for me to understand what would be helpful for you. If you have a specific question, feel free to ask.

low bloom Oct 9, 2022, 12:28 AM

#

serene scaffold This is too vague for me to understand what would be helpful for you. If you hav...

thank you
no worries on this question though I think I am good for now

#

thank you though!

latent cairn Oct 9, 2022, 12:33 AM

#

Does anyone know of a credible paid tutoring service for pandas, numpy, matplotlib, scipy? Not sure I will use it, but curious what kind of resources are out there.

serene scaffold Oct 9, 2022, 12:33 AM

#

latent cairn Does anyone know of a credible paid tutoring service for pandas, numpy, matplotl...

I don't. You can just ask questions in this channel, and as long as they're well formulated, people will be happy to answer.

latent cairn Oct 9, 2022, 12:34 AM

#

latent cairn Add a new categorical column to df_housing called NOXCAT. This column categorize...

This

#

I tried to create 3 variables using numpy.quantile, just can’t get it to work

serene scaffold Oct 9, 2022, 12:35 AM

#

I would have answered that if I were at a desktop

#

Unfortunately I'm on vacation. So I only have my phone.

latent cairn Oct 9, 2022, 12:35 AM

#

It’s ok, thanks though

serene scaffold Oct 9, 2022, 12:35 AM

#

Try again on Tuesday

latent cairn Oct 9, 2022, 12:36 AM

#

Just frustrating, uni course pre-requisites were basically nothing. They sting you $3500 and the lectures assume a lot of prior knowledge, I’m on track for a fail.

#

Good way for them to make money I guess.

shell crest Oct 9, 2022, 12:41 AM

#

latent cairn Add a new categorical column to df_housing called NOXCAT. This column categorize...

To get better help, I suggest you come up with publicly available toy data and code

shell crest Oct 9, 2022, 12:49 AM

#

latent cairn Add a new categorical column to df_housing called NOXCAT. This column categorize...

At best I can give you this

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df.rename(columns={"sepal length (cm)": "a"}, inplace=True)
quantiles = np.quantile(df["a"], (0.2, 0.5, 0.8))
def cat(x):  # categorise
    if x < quantiles[0]: 
        return "LOW"
df["b"] = df["a"].apply(cat)

#

I don't want to ping Stelercus but let me know if you think that's too much help

latent cairn Oct 9, 2022, 12:59 AM

#

Cheers I’ll give it a run in a few hours

shell crest Oct 9, 2022, 1:00 AM

#

It won't solve your problem if that's what you're expecting

latent cairn Oct 9, 2022, 1:00 AM

#

I need to name the quantiles LOW MEDIUM HIGH and then do visualisations and regress each

#

That’s ok

#

I’m gonna try get my money back on the basis the course outline is misleading

#

Respect to those that are good at this, a lot to learn

serene scaffold Oct 9, 2022, 1:29 AM

#

shell crest I don't want to ping Stelercus but let me know if you think that's too much help

Do you think I'll eat you?

#

Your solution could be made a bit more performant with a loc assignment

#

I guess that's not actually a solution. And I strongly agree with your saying that creating a toy example is very helpful

latent cairn Oct 9, 2022, 1:50 AM

#

latent cairn My answer to this is; is_small = df_housing['NOX'] < df_housing['NOX'].quantile...

This was my answer, but when I regressed each against another column it omits LARGE, leading me to believe it’s wrong

#

It’s due tonight so I’ll submit that and hope for the best, problem is that part is only worth 5% 😵

#

The rest is probably even more difficult

shell crest Oct 9, 2022, 1:59 AM

#

serene scaffold Your solution could be made a bit more performant with a loc assignment

oh it's my first "let's do this" kind of solution

shell crest Oct 9, 2022, 2:00 AM

#

latent cairn This was my answer, but when I regressed each against another column it omits LA...

I think this means you don't quite understand the regression.

#

I'm not sure if you're regressing it correctly

latent cairn Oct 9, 2022, 2:02 AM

#

The newly created column NOXCAT in df_housing is a categorical column with three possible values (LOW, MEDIUM, and HIGH).

Create a set of dummy variables (for different values of NOXCAT).
Regress MEDV on the different NOX categories using the dummy variables. Choose the dummy variable coding in your regression such that the intercept reflects the MEDV value of suburbs in the MEDIUM category. Save the regression result as res_2 and print the regression result to the console.
Report the regression results from res_2 in your own words according to APA style and interpret the coefficients.
Hint: Look at pd.get_dummies.

#

ANSWER that doesn't provide coef for HIGH

import pandas as pd
import statsmodels.formula.api as smf

pd.get_dummies(df_housing['NOXCAT'], prefix="dummy")

#print (df_housing) - checked

mod_2 = smf.ols('MEDV ~ NOXCAT', data=df_housing)
res_2 = mod_2.fit()
print(res_2.summary())

#

I do note I see no HIGH values in NOXCAT, so I think my previous answer is wrong

#

That is what it produces
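The missing coefficient is expected rather than a bug: with treatment (dummy) coding, one category is absorbed into the intercept. statsmodels formulas (via patsy) treat a string column as categorical and use the first level in sorted order as the reference; with labels 'large', 'medium', 'small', 'large' sorts first, so it gets no coefficient of its own. In a formula the reference can be chosen explicitly, e.g. `smf.ols('MEDV ~ C(NOXCAT, Treatment(reference="MEDIUM"))', data=df_housing)`. The same idea with `pd.get_dummies` and plain least squares, on hypothetical toy data: dropping the MEDIUM dummy makes the intercept the MEDIUM mean and the other coefficients differences from it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_housing
df = pd.DataFrame({
    "NOXCAT": ["LOW", "LOW", "MEDIUM", "MEDIUM", "MEDIUM", "HIGH", "HIGH"],
    "MEDV": [30.0, 28.0, 25.0, 24.0, 23.0, 18.0, 16.0],
})

dummies = pd.get_dummies(df["NOXCAT"], dtype=float)  # columns: HIGH, LOW, MEDIUM
X = dummies.drop(columns="MEDIUM")  # MEDIUM becomes the reference category
X.insert(0, "Intercept", 1.0)

coef, *_ = np.linalg.lstsq(X.to_numpy(), df["MEDV"].to_numpy(), rcond=None)
result = dict(zip(X.columns, coef))
print(result)  # Intercept = mean MEDV of MEDIUM; HIGH, LOW = differences from it
```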

high creek Oct 9, 2022, 2:08 AM

#

Can I do MS in AI or NLP with a BS in IT?

serene scaffold Oct 9, 2022, 2:24 AM

#

high creek Can I do MS in AI or NLP with a BS in IT?

You might have to take some prerequisites before you would be able to start the MS courses. The best way to figure this out would be to look at the admissions websites for the programs.

There usually isn't an "MS in AI", and there definitely won't be one in NLP. It would probably be an MS in CS. If you know you want to do NLP, make sure there are research faculty at that university who specializes in it.

lapis sequoia Oct 9, 2022, 2:39 AM

#

odd meteor The new script where this model is loaded in, is it running on the same machine ...

It's the same machine but different consoles. I am using it with flask in anaconda second time. First time I trained it in jupyter.

high creek Oct 9, 2022, 2:55 AM

#

serene scaffold You might have to take some prerequisites before you would be able to start the ...

Do undergrad research?

lapis sequoia Oct 9, 2022, 2:55 AM

#

Are you in US btw?

serene scaffold Oct 9, 2022, 2:56 AM

#

high creek Do undergrad research?

I don't understand the question.

#

Please make sure that your question is a complete sentence. If you use an incomplete sentence thinking that I'll know what the intended full sentence is, there's a very high chance that we'll miscommunicate.

high creek Oct 9, 2022, 3:05 AM

#

serene scaffold I don't understand the question.

What you mean by this, "If you know you want to do NLP, make sure there are research faculty at that university who specializes in it."

high creek Oct 9, 2022, 3:06 AM

#

lapis sequoia Are you in US btw?

Yes

high creek Oct 9, 2022, 3:07 AM

#

serene scaffold You might have to take some prerequisites before you would be able to start the ...

There are faculty that specialize in it

serene scaffold Oct 9, 2022, 3:18 AM

#

high creek What you mean by this, "If you know you want to do NLP, make sure there are rese...

If you want to get an MS so that you can do NLP, you would get a CS degree and take the NLP courses. And if there's an NLP specialist, you could do thesis work with them.

high creek Oct 9, 2022, 3:19 AM

#

serene scaffold If you want to get an MS so that you can do NLP, you would get a CS degree and t...

Ah I understand

#

My major is fully Python

#

The CS major is heavily C language

serene scaffold Oct 9, 2022, 3:20 AM

#

high creek My major is fully Python

Basically all of NLP is done in Python. The "main" courses might be in C, but the AI ones won't be.

hasty mountain Oct 9, 2022, 3:20 AM

#

Can someone tell me some metrics and tricks using loss functions for Text Generator Models? I suppose there's some GANs for text in order to have a good conversation model, right? Perhaps some metric or trick to measure how much the generated text makes sense?

I know that SRGAN combined a pixel-wise MSE loss with an "adversarial loss" (as a weighted sum, or something like this), which can improve the quality of the GAN output.
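One widely used metric for text generators is perplexity: the exponential of the average negative log-likelihood the model assigns to the actual next tokens (lower is better). It measures how well text fits the model, not whether it "makes sense", so it is usually paired with overlap metrics such as BLEU or with human evaluation. A minimal sketch, assuming you already have the probability the model gave to each actual token (the lists below are hypothetical):

```python
import math


def perplexity(token_probs):
    # exp of the mean negative log-likelihood of the observed tokens
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# A model that is uniformly unsure among 4 options on every token
print(perplexity([0.25, 0.25, 0.25, 0.25]))
# A more confident model on the same tokens
print(perplexity([0.9, 0.8, 0.95, 0.7]))
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k tokens.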

high creek Oct 9, 2022, 3:20 AM

#

Ah I see

#

Is Data Science needed for NLP

serene scaffold Oct 9, 2022, 3:21 AM

#

high creek Is Data Science needed for NLP

There's overlap between what is considered data science and what is considered NLP. Beyond that, I don't agree with the premise of your question.

high creek Oct 9, 2022, 3:22 AM

#

Ok understood

serene scaffold Oct 9, 2022, 3:22 AM

#

"data science" isn't a well defined thing. Linear algebra is, however. And it's needed for NLP.

high creek Oct 9, 2022, 3:23 AM

#

Okay thx for your time

strong sedge Oct 9, 2022, 5:11 AM

#

tacit nacelle hey! so I have this final year project in the theme of city. I thought about doi...

I have always thought of making a reinforcement learning algorithm for better traffic lights, but never got to it

strong sedge Oct 9, 2022, 5:15 AM

#

odd meteor Hi Musk, checkout this roadmap. You might wanna pay less attention to the timeli...

can you guys take mock interviews lmao, I would attend 100%
I am kinda scared of them

#

lol

shell crest Oct 9, 2022, 5:46 AM

#

latent cairn I do note I see no HIGH values in NOXCAT, so I think my previous answer is wrong

You need to read what R is doing with categorical predictors.
e.g. https://rpubs.com/beane/n3_5

latent cairn Oct 9, 2022, 6:06 AM

#

shell crest You need to *read* what is R doing for categorical predictors. e.g. https://rpub...

Cheers, I think I need to convert the categorical values to int, I’ll have a better read later and play around. Have 3hrs left til it’s due 🤪 Thinking with what I’ve done will go close to 50% and keep me afloat for now…

torpid quartz Oct 9, 2022, 6:20 AM

#

I want to get into concepts of ai and ml, where should i start?

worldly dawn Oct 9, 2022, 6:22 AM

#

torpid quartz I want to get into concepts of ai and ml, where should i start?

https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow-dp-1492032646/dp/1492032646/ is a good intro.
I also like https://www.amazon.com/Artificial-Intelligence-Textbook-Charu-Aggarwal/dp/3030723569/

torpid quartz Oct 9, 2022, 6:23 AM

#

Are these all books?

#

Just asking

worldly dawn Oct 9, 2022, 6:24 AM

#

torpid quartz Are these all books?

the links seem to point at books

torpid quartz Oct 9, 2022, 6:24 AM

#

Thanks, got it

#

I also saw a course at fcc, do you think that is worth it?

worldly dawn Oct 9, 2022, 6:25 AM

#

torpid quartz I also saw a course at fcc, so toot think that is worth it?

I did not point at the fcc, so no, I don't think the fcc is worth it

torpid quartz Oct 9, 2022, 6:26 AM

#

Thanks

mild edge Oct 9, 2022, 9:04 AM

#

odd meteor Hi Musk, checkout this roadmap. You might wanna pay less attention to the timeli...

thanks dude Tesla Coming ur way😂

lapis sequoia Oct 9, 2022, 10:30 AM

#

Hi @serene scaffold

#

How are you

obsidian peak Oct 9, 2022, 10:36 AM

#

https://github.com/YashIndane/webcube

#

Rubiks Cube AI assistant

stark ember Oct 9, 2022, 11:09 AM

#

Is it normal for PyTorch CUDA models to show barely any usage in Task Manager?

#

It seems like it just uses a fraction of Copy and fills up VRAM, but it's not actually working too hard

#

Is it possible to somehow use more of the GPU in order to accelerate the workload, or is that just not possible?

#

Sorry if it's a silly question, it's my first time using CUDA

serene scaffold Oct 9, 2022, 11:21 AM

#

lapis sequoia How are you

I'm just fabulous as always

clever sorrel Oct 9, 2022, 11:22 AM

#

hey!

#

I need help with a CNN

gloomy anvil Oct 9, 2022, 12:07 PM

#

Hello y'all! Does anyone know the correct term for when you assume the prediction for tomorrow is the same as the value today? I think I read it on machinelearningmastery, but I can't find it and I also don't remember what it was called. I believe he introduced it as the simplest benchmark, to see whether a model can beat the simple assumption that value today == value tomorrow

wooden sail Oct 9, 2022, 12:17 PM

#

time or shift invariance, maybe?

#

or stationarity?

serene scaffold Oct 9, 2022, 12:18 PM

#

What about interpolation?

gloomy anvil Oct 9, 2022, 12:24 PM

#

wooden sail time or shift invariance, maybe?

not stationarity. it was a special word that i cannot remember... what i actually mean is a martingale sequence. but he used another (and in my eyes better) word for it when he created a baseline model

untold bloom Oct 9, 2022, 12:25 PM

#

"naive" model it is called

gloomy anvil Oct 9, 2022, 12:25 PM

#

while "martingale" always kind of implies that you double your stakes, he used a word like (not exactly) "autoregressive baseline"

untold bloom Oct 9, 2022, 12:26 PM

#

it's the model corresponding to a random walk

gloomy anvil Oct 9, 2022, 12:27 PM

#

untold bloom it's random walk's corresponding model

do you have some link or paper at hand ? Naive as search word always returns naive bayes 😄

untold bloom Oct 9, 2022, 12:28 PM

#

it's mentioned here https://otexts.com/fpp3/simple-methods.html

#

under "naive method"

#

although there it's spelled "naïve", with two dots on the "i"

gloomy anvil Oct 9, 2022, 12:29 PM

#

thank you so much!

#

This is what i was looking for! (even though i still believe brownlee used another word 😄 )

wooden sail Oct 9, 2022, 12:31 PM

#

you'd have to give more context 😛

gloomy anvil Oct 9, 2022, 12:31 PM

#

He actually called it naive! You are exactly right @untold bloom : https://machinelearningmastery.com/how-to-grid-search-naive-methods-for-univariate-time-series-forecasting/

Machine Learning Mastery

Jason Brownlee

How to Grid Search Naive Methods for Univariate Time Series Forecas...

Simple forecasting methods include naively using the last observation as the prediction or an average of prior observations. It is important to evaluate the performance of simple forecasting methods on univariate time series forecasting problems before using more sophisticated methods as their performance provides a lower-bound and point of comp...
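The naive method from the fpp3 link can be sketched in a few lines of NumPy: the forecast for each step is simply the previous observation, and its error is the bar any real model has to beat. The function names and the price series are made up for illustration.

```python
import numpy as np

def naive_forecast(series):
    """One-step-ahead naive forecast: the prediction for t+1 is the value at t.

    Returns the predictions for series[1:].
    """
    series = np.asarray(series, dtype=float)
    return series[:-1]

def naive_mae(series):
    """Mean absolute error of the naive forecast: the baseline a model should beat."""
    series = np.asarray(series, dtype=float)
    return float(np.mean(np.abs(series[1:] - naive_forecast(series))))

prices = [100, 102, 101, 105, 104]
print(naive_mae(prices))  # errors 2, 1, 4, 1 -> 2.0
```

Computing the same MAE for your model over the same time steps gives a direct comparison.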

#

finally found it 🙂

#

stupid follow up question: If i assume the price of X tomorrow is the same as price of X today in a naive model, how do i decide if I should buy or sell? I basically can only make the decision based on the differenced timeseries, right?

#

So if the change from yesterday to today was, let's say, 2%, I assume 2% for tomorrow as well. Whereas if today's price of X was 100 and I assume 100 for tomorrow as well, there is no room for decision-making, which would kind of imply a "hold" strategy, right?
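One way to make the idea in the question concrete: applied to the price level, the naive forecast always says "no change", so it carries no buy/sell information; applied to the differenced (percentage-change) series, it says "tomorrow's change equals today's change", and the sign of that change can drive a rule. A minimal sketch, where the buy/sell/hold rule and the function names are hypothetical illustrations rather than a tested strategy:

```python
import numpy as np

def naive_change_forecast(prices):
    """Naive forecast on the percentage-change series:
    tomorrow's change is predicted to equal today's change."""
    prices = np.asarray(prices, dtype=float)
    pct_change = np.diff(prices) / prices[:-1]  # change from t-1 to t
    return float(pct_change[-1])                # predicted change for t+1

def naive_signal(prices):
    """Hypothetical rule: 'buy' if the predicted change is positive,
    'sell' if it is negative, 'hold' if it is zero."""
    change = naive_change_forecast(prices)
    if change > 0:
        return "buy"
    if change < 0:
        return "sell"
    return "hold"

print(naive_signal([100, 98, 99.96]))  # last change was +2%, so the forecast is +2%
```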

hardy siren Oct 9, 2022, 1:09 PM

#

Given an image of a chess board, I would like to find out what piece each square contains. Is there any python package that could assist this task?

serene scaffold Oct 9, 2022, 1:57 PM

#

hardy siren Given an image of a chess board, I would like to find out what piece each square...

Don't look for libraries/packages. Look for techniques

#

The first step would be segmenting the chessboard image into each tile

#

This should be easy since a chessboard is already a grid

#

The second step is to classify each tile as either blank or what piece it is

#

For the second step, you would need training data that has different images of what those pieces could look like

#

Are these pictures of real chess boards, or virtual chess boards that are 2d?
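The first step (cutting the board into tiles) is mostly array slicing once the board is isolated. A minimal NumPy sketch, assuming the image has already been cropped exactly to the board's outer edge and is viewed straight on (e.g. a 2D screenshot); a photo of a real board would first need its corners located and a perspective correction. `split_into_tiles` is a hypothetical helper name. Each of the 64 tiles can then be passed to the classifier from the second step.

```python
import numpy as np

def split_into_tiles(board, n=8):
    """Split a cropped, axis-aligned board image of shape (H, W) or (H, W, C)
    into an n x n grid of tiles.

    Returns an array of shape (n, n, H // n, W // n) or (n, n, H // n, W // n, C),
    where tiles[row, col] is the square in that row and column.
    """
    board = np.asarray(board)
    h, w = board.shape[:2]
    th, tw = h // n, w // n
    board = board[: th * n, : tw * n]                       # drop leftover edge pixels
    tiles = board.reshape(n, th, n, tw, *board.shape[2:])   # (row, y, col, x, ...)
    return tiles.swapaxes(1, 2)                             # (row, col, y, x, ...)

image = np.zeros((400, 400, 3), dtype=np.uint8)  # stand-in for a loaded RGB screenshot
print(split_into_tiles(image).shape)             # (8, 8, 50, 50, 3)
```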

serene scaffold Oct 9, 2022, 3:12 PM

#

@hardy siren sorry, I was away for a bit. Look at 3blue1brown's series about neural networks. He makes a classifier for the MNIST dataset of images, which is a very similar problem.

storm kelp Oct 9, 2022, 3:12 PM

#

latent cairn Just frustrating, uni course pre-requisites were basically nothing. They sting y...

https://jakevdp.github.io/PythonDataScienceHandbook/
There is a second edition coming out this December which will also be online (although you may find a pdf of a pre-release version floating about 👀 )

#

Covers NumPy, Pandas, Matplotlib, and Scikit-learn

serene scaffold Oct 9, 2022, 3:18 PM

#

storm kelp https://jakevdp.github.io/PythonDataScienceHandbook/ There is a second edition c...

I have the pre release via my O'Reilly account. Perhaps this should go on our resources page?

heavy crow Oct 9, 2022, 3:19 PM

#

Does anyone know of research on extracting image features for structure from motion? A neural SIFT, so to speak. I've only found one or two papers that don't really delve deep into the subject.

storm kelp Oct 9, 2022, 3:24 PM

#

serene scaffold I have the pre release via my O'Reilly account. Perhaps this should go on our re...

Yeah if you can access the latest version there is no reason not to

lapis sequoia Oct 9, 2022, 4:52 PM

#

Do you have something like that in video format

storm kelp Oct 9, 2022, 5:11 PM

#

lapis sequoia Do you have something like that in video format

You'll never find a video with the level of detail in a textbook

swift furnace Oct 9, 2022, 5:35 PM

#

What do data scientists do? It is not clear to me, would anyone mind explaining it?

young granite Oct 9, 2022, 5:36 PM

#

swift furnace What do data scientists do? It is not clear to me, would anyone mind explaining ...

modern statistics

swift furnace Oct 9, 2022, 5:36 PM

#

young granite modern statistics

hmm

#

Would you give me an example of where data science could be used?

young granite Oct 9, 2022, 5:37 PM

#

drug tests or studies

#

correlation and causality

#

its a big book

swift furnace Oct 9, 2022, 5:37 PM

#

I've seen some people commenting on the use of Python for managing investments, would that be a case scenario where data science is used?

young granite Oct 9, 2022, 5:37 PM

#

u can use python for everything thats the neat thing bout it

swift furnace Oct 9, 2022, 5:38 PM

#

hmm

young granite Oct 9, 2022, 5:38 PM

#

so u generate data

#

u import the data into ur algorithm

#

u transform the data for better use of it

#

u can run different types of "tests" to see trends in ur data

#

u can visualise ur data

#

so u see there is no clear description

swift furnace Oct 9, 2022, 5:41 PM

#

I see

#

It is quite broad and can be used for anything depending on the context

young granite Oct 9, 2022, 5:41 PM

#

yes

#

like statistics

#

thats why i call it modern statistics

swift furnace Oct 9, 2022, 5:42 PM

#

I got quite interested in investment lately, would Python be a useful tool for analyzing data and then deciding on what would be a good investment?

young granite Oct 9, 2022, 5:42 PM

#

on the first part for sure

#

but what to buy is a bit hard to tell

swift furnace Oct 9, 2022, 5:43 PM

#

young granite but what to buy is a bit hard to tell

This part I'd be doing myself rly

young granite Oct 9, 2022, 5:43 PM

#

the market is not following any rules

swift furnace Oct 9, 2022, 5:43 PM

#

I'd only use Python for analyzing and showing important data

young granite Oct 9, 2022, 5:43 PM

#

ye that works

swift furnace Oct 9, 2022, 5:43 PM

#

hmm

young granite Oct 9, 2022, 5:43 PM

#

easy

swift furnace Oct 9, 2022, 5:44 PM

#

Apparently, data science and machine learning seem to be used together quite often, is ML useful for data science?

young granite Oct 9, 2022, 5:44 PM

#

yfinance or another API

#

for ML u need data so ofc

#

but its not always the best approach to a problem

#

sometimes the human brain works as well

swift furnace Oct 9, 2022, 5:45 PM

#

I see

#

Do u work as a data scientist if I may ask?

young granite Oct 9, 2022, 5:45 PM

#

i plan to do so

swift furnace Oct 9, 2022, 5:46 PM

#

cool

young granite Oct 9, 2022, 5:46 PM

#

yes

#

so i advise u to learn python

swift furnace Oct 9, 2022, 5:47 PM

#

young granite yes

any tips on where I could get started? assuming I already know Python

young granite Oct 9, 2022, 5:47 PM

#

depends on ur background, im new to data science as well

swift furnace Oct 9, 2022, 5:47 PM

#

young granite depends on ur background im new to data science aswell

well, I'm a backend developer

young granite Oct 9, 2022, 5:48 PM

#

well then u got more knowledge than me i guess 😄

#

but if u wanna analyse stocks i can give u my import tool for cryptocurrencies

lapis sequoia Oct 9, 2022, 5:53 PM

#

storm kelp You'll never find a video with the level of detail in a text book

Yeah, but I'm not used to reading textbooks. They get boring for me.

#

Plus I get overwhelmed by how slowly I progress through a book

#

Like 30 minutes a page

swift furnace Oct 9, 2022, 5:57 PM

#

young granite but if u wanna analyse stocks i can give u my import tool on crypto currencies

that'd be cool, do u have the code on github? if so, would you mind letting me see it?

young granite Oct 9, 2022, 5:58 PM

#

swift furnace that'd be cool, do u have the code on github? if so, would you mind letting me s...

It's not complex code, only the import part; I haven't managed to work on it further yet

swift furnace Oct 9, 2022, 5:59 PM

#

young granite its not complex code only the import part i had not yet managed to work on it fu...

that's cool

young granite Oct 9, 2022, 6:00 PM

#

import pandas as pd
from requests_html import HTMLSession

# Yahoo Finance lists cryptocurrencies 100 per page; fetch the first 1000
# (offsets 0, 100, ..., 900), reusing one HTTP session for all requests
session = HTMLSession()
pages = []

for offset in range(0, 1000, 100):
    resp = session.get(f"https://finance.yahoo.com/cryptocurrencies?offset={offset}&count=100")
    df = pd.read_html(resp.html.raw_html)[0]     # first <table> on the page
    df.index = range(offset, offset + len(df))   # global row numbers
    pages.append(df)

table = pd.concat(pages)                         # concatenate once at the end
Symbols = list(table.Symbol)

#

```py
import datetime as dt
import timeit

import numpy as np
import plotly.graph_objects as go
import yfinance as yf
from IPython.display import clear_output
from plotly.subplots import make_subplots

# One figure with two y-axes: opening price on the left, volume on the right
fig = make_subplots(specs=[[{"secondary_y": True}]])

start = "2020-01-01"
end = dt.datetime.now()
start_timer = timeit.default_timer()

# Symbols comes from the scraping snippet above
for count, symbol in enumerate(Symbols, start=1):
    data = yf.download(symbol, start, end)

    # Progress report; clear_output replaces the previous output in Jupyter
    clear_output(wait=True)
    runtime = timeit.default_timer() - start_timer
    print("Current progress:", np.round(count / len(Symbols) * 100, 2), "%")
    print("Current runtime:", np.round(runtime / 60, 2), "minutes")

    fig.add_trace(
        go.Scatter(x=data.index, y=data["Open"], name=symbol,
                   legendgroup=symbol, marker_color="green"),
        secondary_y=False,
    )
    fig.add_trace(
        go.Scatter(x=data.index, y=data["Volume"], name=symbol,
                   legendgroup=symbol, marker_color="red"),
        secondary_y=True,
    )

fig.show()
```

swift furnace Oct 9, 2022, 6:01 PM

#

thx for sharing :]

young granite Oct 9, 2022, 6:01 PM

#

my pleasure

marble obsidian Oct 9, 2022, 6:23 PM

#

@young granite I was working with yfinance yesterday! Nice coincidence. What are you working on? I am an experienced ML developer, so I can help answer a few questions if you have any

autumn kindle Oct 9, 2022, 6:24 PM

#

How can I measure someone's reading speed as they read a paragraph?

marble obsidian Oct 9, 2022, 6:25 PM

#

Interesting question. But my best guess is this is not something solved using ML, apart from the part where you track eye movement.
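The calculation itself is simple arithmetic once the reading time has been measured (with a start/stop button, timestamps, or eye tracking as suggested): words per minute is the word count divided by the elapsed minutes. A minimal sketch with a hypothetical helper name and a made-up timing:

```python
def words_per_minute(text, seconds):
    """Reading speed for one paragraph: word count divided by minutes taken.

    `seconds` is the measured reading time, obtained however you time the reader.
    """
    words = len(text.split())
    return words / (seconds / 60)

paragraph = "The quick brown fox jumps over the lazy dog near the river bank today"
print(words_per_minute(paragraph, 4.2))  # 14 words in 4.2 s, about 200 wpm
```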

meager crater Oct 9, 2022, 6:36 PM

#

Hey, does anyone know how to rewrite this compile call without string parameters?

m.compile(
    optimizer="RMSprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

Tried this, but got an error in model fitting:

m.compile(
    optimizer=keras.optimizers.RMSprop(),
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=[keras.metrics.Accuracy()]
)

#

I understand that it is a metrics issue, but I'm confused about why that would throw an error

ValueError: Shapes (None, 1) and (None, 10) are incompatible

Note: I'm getting this error only with Accuracy
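A likely explanation: with `metrics=["accuracy"]`, Keras (tf.keras 2.x at least) picks the accuracy variant from the shapes of the labels and predictions, so integer labels of shape (None, 1) against (None, 10) outputs get sparse categorical accuracy. `keras.metrics.Accuracy()` instead measures how often predictions equal labels element-wise, so it tries to compare the (None, 1) integer labels with the (None, 10) probability vectors directly. The object-based equivalent of the string version is probably `loss=keras.losses.SparseCategoricalCrossentropy()` with `metrics=[keras.metrics.SparseCategoricalAccuracy()]`. The NumPy sketch below (hypothetical helper names, not the Keras implementation) shows the difference between the two computations:

```python
import numpy as np

def sparse_categorical_accuracy(y_true, y_pred):
    """Share of rows whose highest-scoring class equals the integer label.

    y_true: integer labels, shape (n,) or (n, 1)
    y_pred: class probabilities or scores, shape (n, num_classes)
    """
    labels = np.asarray(y_true).reshape(-1)
    predicted = np.asarray(y_pred).argmax(axis=-1)
    return float(np.mean(labels == predicted))

def exact_match_accuracy(y_true, y_pred):
    """Simplified plain accuracy: element-wise equality of labels and
    predictions, which only makes sense when both have the same shape."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"Shapes {y_true.shape} and {y_pred.shape} are incompatible")
    return float(np.mean(y_true == y_pred))

labels = np.array([[2], [0], [1]])                 # shape (3, 1), like the (None, 1) labels
probs = np.array([[0.1, 0.2, 0.7],
                  [0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1]])                # shape (3, 3), like the (None, 10) outputs
print(sparse_categorical_accuracy(labels, probs))  # 2 of 3 rows correct
```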

fresh tiger Oct 9, 2022, 8:10 PM

#

Hi! I had a question regarding backwards propagation and gradient descent.

W <- W - alpha * dJ(W)/dW

From what I understand, the gradient in the formula above is computed via backpropagation in the neural network. In screenshot 1, I understand how the chain rule gives us the gradient of J(W) with respect to w2.

In screenshot 2, though, I am a bit confused about why we apply the chain rule that way compared to screenshot 3. Since we have the values of w1 and J(W), why do we need to apply the chain rule again for dy/dw1? Wouldn't that be an unnecessary extra step?

desert oar Oct 9, 2022, 8:20 PM

#

fresh tiger Hi! I had a question regarding backwards propagation and gradient descent. ``...

what is J here?

#

i assume W are all the model parameters

fresh tiger Oct 9, 2022, 8:21 PM

#

J would be the cost function, and yes W are all the model parameters

desert oar Oct 9, 2022, 8:23 PM

#

what is the context for screenshots 2 and 3?

fresh tiger Oct 9, 2022, 8:24 PM

#

to get the gradient of the cost function J(W) with respect to w1 (the slope of J(W) plotted against w1). This gradient is then used in the gradient descent formula to find the optimal w1

desert oar Oct 9, 2022, 8:25 PM

#

it looks like screenshot 2 is the "fully expanded" form of 3

fresh tiger Oct 9, 2022, 8:26 PM

#

Yes, but I'm just a bit confused about why we need to do that; in the video they said that we can't directly get dy/dw1 (screenshot 3)

#

hence why they apply chain rule again in screenshot 2

#

since we already have the values of dJ(W) and dw1

desert oar Oct 9, 2022, 8:30 PM

#

fresh tiger since we already have the values of dJ(W) and dw1

dw1 isn't really a "value". dJ/dw1 is notation indicating the derivative of J with respect to w1

fresh tiger Oct 9, 2022, 8:34 PM

#

Let's say we are at the stage where we want to find the optimal w1. Why would we even need the chain rule to find that gradient? Isn't it enough to just have the values of the cost for different values of w1?

desert oar Oct 9, 2022, 8:36 PM

#

you can do that if you know the closed form of that expression!

fresh tiger Oct 9, 2022, 8:41 PM

#

I came across this formula before

fresh tiger Oct 9, 2022, 8:41 PM

#

desert oar you can do that *if* you know the closed form of that expression!

Would that be the closed form?

#

If that's the case, is the chain rule preferred for performance reasons? I.e., would it be more expensive to use that closed form for each w (especially in networks with greater depth and width)?

desert oar Oct 9, 2022, 8:45 PM

#

fresh tiger I came across this formula before

it's hard to tell what this formula means

#

this looks a bit like the expression for one layer only

fresh tiger Oct 9, 2022, 8:46 PM

#

yes I think thats what it is

desert oar Oct 9, 2022, 8:46 PM

#

it might be illustrative to actually work through these expressions in their fully "expanded" form

#

it's good that you're asking these kinds of questions though

#

(and also a great example of why learning math from the videos is not usually that effective)

#

by fully "expanded", i mean write out a model with a small number of inputs, one output, and one small hidden layer, and then actually work through the backprop equations
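Following that suggestion, here is a tiny fully expanded example: one input x, one hidden unit with a tanh activation, one output y, and a squared-error cost J. Each gradient is a product of simple local derivatives, taken one layer at a time, and is checked against a finite-difference estimate; the last lines apply one step of the update W <- W - alpha * dJ/dW from the original question. The network, the values, and the function names are all made up for illustration.

```python
import math

def forward(w1, w2, x, t):
    z1 = w1 * x              # hidden pre-activation
    a1 = math.tanh(z1)       # hidden activation
    y = w2 * a1              # output
    J = 0.5 * (y - t) ** 2   # squared-error cost
    return z1, a1, y, J

def backward(w1, w2, x, t):
    """Gradients of J w.r.t. w1 and w2 via the chain rule, one layer at a time."""
    z1, a1, y, J = forward(w1, w2, x, t)
    dJ_dy = y - t                     # dJ/dy
    dJ_dw2 = dJ_dy * a1               # dy/dw2 = a1
    dJ_da1 = dJ_dy * w2               # dy/da1 = w2
    dJ_dz1 = dJ_da1 * (1 - a1 ** 2)   # d tanh(z)/dz = 1 - tanh(z)^2
    dJ_dw1 = dJ_dz1 * x               # dz1/dw1 = x
    return dJ_dw1, dJ_dw2

def numeric_grad(f, w, h=1e-6):
    """Central finite difference, used only to check the chain-rule result."""
    return (f(w + h) - f(w - h)) / (2 * h)

w1, w2, x, t = 0.5, -1.2, 2.0, 0.3
g1, g2 = backward(w1, w2, x, t)
n1 = numeric_grad(lambda w: forward(w, w2, x, t)[3], w1)
n2 = numeric_grad(lambda w: forward(w1, w, x, t)[3], w2)
print(g1, n1)  # chain rule vs. finite difference for w1
print(g2, n2)  # chain rule vs. finite difference for w2

# One gradient-descent step: W <- W - alpha * dJ/dW
alpha = 0.1
w1, w2 = w1 - alpha * g1, w2 - alpha * g2
```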

fresh tiger Oct 9, 2022, 8:50 PM

#

Alright, yes, that sounds good, I'll give that a try. I just had one thing to ask about the closed form stuff u mentioned. If we did have closed-form equations, would that be more efficient than the chain rule? I assume that's usually not the case, and finding the closed form seems harder/more computationally expensive than just using the chain rule?

desert oar Oct 9, 2022, 8:51 PM

#

it is, and in fact backpropagation lets us take advantage of a lot of repeated computation in practice and significantly reduce runtime by caching the shared intermediate results
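To make the caching point concrete: in a chain of layers, the gradients of all the weights share the factors coming from the layers above them. Backprop computes those factors once, carrying a running "upstream" gradient down the network and reusing activations cached during the forward pass, so all gradients cost one forward and one backward pass. Expanding each dJ/dw_k separately would recompute the shared factors for every weight (roughly quadratic in depth for this chain). A minimal sketch on a chain of scalar tanh layers, a hypothetical toy model:

```python
import math

def forward(ws, x):
    """Chain of scalar layers a_k = tanh(w_k * a_{k-1}); caches every activation."""
    acts = [x]
    for w in ws:
        acts.append(math.tanh(w * acts[-1]))
    return acts

def backprop(ws, x, t):
    """All dJ/dw_k in one backward pass, for J = 0.5 * (a_L - t)^2.

    `upstream` holds dJ/da for the current layer's output and is reused for the
    layer below, instead of rebuilding the whole chain-rule product per weight.
    """
    acts = forward(ws, x)
    upstream = acts[-1] - t                  # dJ/da_L
    grads = [0.0] * len(ws)
    for k in reversed(range(len(ws))):
        a_out, a_in = acts[k + 1], acts[k]
        local = 1 - a_out ** 2               # tanh'(z_k), from the cached output
        grads[k] = upstream * local * a_in   # dJ/dw_k
        upstream = upstream * local * ws[k]  # dJ/da_{k-1}, passed to the layer below
    return grads

print(backprop([0.5, -0.8, 1.3, 0.7], x=0.9, t=0.2))
```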

fresh tiger Oct 9, 2022, 8:52 PM

#

desert oar it is, and in fact backpropagation allows us to take advantage of a lot of repea...

just to confirm, its more expensive to use the closed form?

desert oar Oct 9, 2022, 8:53 PM

#

https://theorydish.blog/2021/12/16/backpropagation-≠-chain-rule/

Theory Dish

Lunjia Hu

Backpropagation ≠ Chain Rule

The chain rule is a fundamental result in calculus. Roughly speaking, it states that if a variable c is a differentiable function of intermediate variables b_1, …, b_n, and each …

fresh tiger Oct 9, 2022, 8:54 PM

#

Ahhh ok things are starting to click now

#

I also stumbled across this earlier:

#

https://www.quora.com/Is-backpropagation-in-neural-networks-the-same-concept-as-the-chain-rule

Quora

Is backpropagation in neural networks the same concept as the chain...

Answer (1 of 5): The chain rule is a mathematical formula.

There are many ways of computing that formula.

For example, if you have a formula a + b + c, you could compute a + b first, then add c, but you could also do b + c first, then add a, and so on.

Back-propagation is one particular way to...

#

desert oar Oct 9, 2022, 8:55 PM

#

fresh tiger just to confirm, its more expensive to use the closed form?

right. the closed form very quickly becomes ridiculous

fresh tiger Oct 9, 2022, 8:55 PM

#

Awesome, its super clear now! Thank you so much for all of ur help!! I really appreciate it 🙂

sterile fjord Oct 9, 2022, 9:43 PM

#

Hey, is anyone available to refactor a quick df.apply if statement into an np.where?

lapis sequoia Oct 9, 2022, 9:45 PM

#

Can anyone recommend a YouTube video that will help with creating bots that tell you different keywords to use, etc.? By bots I mean the ones that are meant to do stuff for you

#

I think this may have something to do with ai not so sure

alpine temple Oct 9, 2022, 10:05 PM

#

lapis sequoia Can anyone recommend a youtube video that will help with creating certain bots w...

You might want to start with reading up on Natural Language Processing.

#

And then I'd have an IterativeImputer object go over the columns with missing values after that point?

serene scaffold Oct 9, 2022, 11:45 PM

#

sterile fjord Hey is anyone available to refactor a quick df.apply if statement into a np.wher...

Please don't ask to ask. It wastes everyone's time, including your own. If you want help, show the series and the function you were applying.
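Since the original code was not posted, here is a generic sketch of that kind of refactor, using a hypothetical `price` column and made-up labels: `np.where(condition, value_if_true, value_if_false)` evaluates the condition on the whole column at once instead of calling a Python function for every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [120, 80, 200, 50]})  # hypothetical data

# Row-by-row version: calls a Python function once per row (slow on big frames)
df["tier_apply"] = df.apply(lambda row: "high" if row["price"] > 100 else "low", axis=1)

# Vectorized version: np.where(condition, value_if_true, value_if_false)
df["tier_where"] = np.where(df["price"] > 100, "high", "low")

print(df)
```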

serene scaffold Oct 9, 2022, 11:46 PM

#

lapis sequoia Can anyone recommend a youtube video that will help with creating certain bots w...

What do you mean "which key words to use"?

regal ingot Oct 10, 2022, 1:47 AM

#

O,O,O
O,O,O
O,O,T

#

can anyone help me with an A* search question
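Since the question itself is not shown, here is a generic A* sketch for a grid like the one above, assuming: the agent starts at the top-left cell, T is the goal, moves are up/down/left/right at cost 1, and cells marked "X" are walls (the posted grid has none). With unit-cost 4-way moves, the Manhattan distance never overestimates the remaining cost, so the first time the goal is popped from the queue, the path is a shortest one. The function name and the wall marker are hypothetical.

```python
import heapq

def a_star(grid, start, goal):
    """A* on a grid of cell strings; 'X' cells are walls, moves cost 1,
    heuristic is the Manhattan distance. Returns a list of (row, col) or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:                # rebuild the path by walking parents back
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue                    # stale entry, a cheaper route was found
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "X":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                         # goal unreachable

grid = [row.split(",") for row in ["O,O,O", "O,O,O", "O,O,T"]]
print(a_star(grid, start=(0, 0), goal=(2, 2)))
```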

lapis sequoia Oct 10, 2022, 4:45 AM

#

i am lost with my "machine learning" project attempting to predict the winner of this upcoming world cup

#

i recognize that to train a model i would need to find at least two correlated variables that somehow connect back to the team who won (in a match). however, team names are strings, not numbers, so they're not very useful in a correlation matrix

#

is my approach flawed? is there another (better) way to approach this problem?

remote vortex Oct 10, 2022, 5:28 AM

#

are there any books related to python ml for beginners that I could download?

worldly dawn Oct 10, 2022, 5:29 AM

#

remote vortex are there any books related to python ml for beginners that I could download?

https://www.amazon.com/gp/product/B0BHCFNY9Q/ is nice and available on kindle and oreilly

remote vortex Oct 10, 2022, 5:29 AM

#

worldly dawn <https://www.amazon.com/gp/product/B0BHCFNY9Q/> is nice and available on kindle ...

would it be good to read the data science one first before moving on to this book?

worldly dawn Oct 10, 2022, 5:30 AM

#

remote vortex would it be good to read the data science one first before moving on to this boo...

I don't know which book you are referring to

remote vortex Oct 10, 2022, 5:34 AM

#

worldly dawn I don't know which book are you are referring to

https://zlibrary.org/book/4989630/29a81f this one

Data Science from Scratch: First Principles with Python, Second Edi...

Data Science from Scratch: First Principles with Python, Second Edition | Joel Grus | download | Z-Library. Download books for free. Find books

worldly dawn Oct 10, 2022, 5:35 AM

#

remote vortex https://zlibrary.org/book/4989630/29a81f this one

doesn't look as modern

#

ideally, read both 🙂

remote vortex Oct 10, 2022, 5:39 AM

#

worldly dawn doesn't look as modern

alright cool

reef dock Oct 10, 2022, 5:57 AM

#

Hey what's a good source to get started on AWS for ML/Infra?

unreal flicker Oct 10, 2022, 7:07 AM

#

Did anyone solve the turing.com test for the data science stack?

lapis sequoia Oct 10, 2022, 9:52 AM

#

Do I need to learn data science before starting with AI?

meager crater Oct 10, 2022, 11:05 AM

#

lapis sequoia Do I need to learn data science before starting with AI?

yes

lapis sequoia Oct 10, 2022, 11:05 AM

#

Oh damn

#

Ty

random forum Oct 10, 2022, 12:50 PM

#

Hi, in TensorFlow I get a ValueError: Shapes (None, 1) and (None, 5) are incompatible. I am implementing an NLP scenario with multi-class classification

#

I have converted my training data and labels into numpy arrays

wooden sail Oct 10, 2022, 12:52 PM

#

what operation are you trying to do, exactly? this is telling you you placed something of size 1 where it should have been of size 5 (or backwards)

#

a common case where it happens is when you use something that expects a one-hot encoded vector, but you passed an int instead

#

e.g. [0,0,1,0,0] in one-hot vs [2] as an int
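For example, in Keras `categorical_crossentropy` expects one-hot labels of shape (n, num_classes), while `sparse_categorical_crossentropy` expects integer labels of shape (n,) or (n, 1). Either switch to the sparse loss or convert the labels; `tf.keras.utils.to_categorical` does the conversion, and the NumPy sketch below (hypothetical helper names) shows what it amounts to:

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Integer class labels, shape (n,) or (n, 1), -> one-hot matrix (n, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    return np.eye(num_classes, dtype=np.float32)[labels]  # row i of the identity is class i

def from_one_hot(one_hot):
    """One-hot (or probability) rows -> integer labels, shape (n,)."""
    return np.asarray(one_hot).argmax(axis=-1)

print(to_one_hot([2], 5))  # [[0. 0. 1. 0. 0.]]
```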

alpine temple Oct 10, 2022, 1:04 PM

#

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.30, random_state=123, stratify=[target])

ValueError: Found input variables with inconsistent numbers of samples: [11846, 1]

print(features.shape, target.shape) # (11846, 25) (11846,)

#

😐

#

I'm going to give this a shot: https://stackoverflow.com/questions/30813044/sklearn-found-arrays-with-inconsistent-numbers-of-samples-when-calling-linearre

desert oar Oct 10, 2022, 1:07 PM

#

lapis sequoia i recognize that to train a model i would need to find at least two correlated v...

you can think of "team" itself as a categorical variable, for which we have several encoding techniques. but i would not spend your energy trying to hammer your data into some generic format that people usually use for machine learning.

there is plenty of formal probability analysis that you can do with this. see for example the Elo ranking system https://en.m.wikipedia.org/wiki/Elo_rating_system, which predicts something akin to the probability of any one team beating any other team.
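For reference, the core of the Elo system fits in a few lines. A minimal sketch of the standard chess-style formulas; K = 20 is an assumed value, and football-specific variants add further adjustments (e.g. for home advantage) that are not included here. Ratings for national teams would come from replaying historical match results through `elo_update`; both function names are hypothetical.

```python
def elo_expected(r_a, r_b):
    """Expected score of A against B (a win counts 1, a draw 0.5)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=20):
    """New ratings after one match; score_a is 1 (A wins), 0.5 (draw) or 0 (A loses)."""
    e_a = elo_expected(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta  # points gained by one side are lost by the other

print(elo_expected(1600, 1400))  # about 0.76 for the 200-point favourite
print(elo_update(1500, 1500, 1))  # (1510.0, 1490.0)
```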

misty flint Oct 10, 2022, 1:07 PM

#

reef dock Hey what's a good source to get started on AWS for ML/Infra?

madewithml.com

#

look at the MLOps section

desert oar Oct 10, 2022, 1:08 PM

#

alpine temple ```py X_train, X_test, y_train, y_test = train_test_split(features, target, test...

no, stop guessing and trying to pattern-match to stackoverflow questions. look at the actual error message. clearly you provided arrays of different lengths. so what are the shapes of features and target?

alpine temple Oct 10, 2022, 1:09 PM

#

desert oar no, stop guessing and trying to pattern-match to stackoverflow questions. look a...

They each have the same number of rows, where one is a DataFrame and the other, the target, is a Series.

#

(11846, 25) (11846,) are not the same length?

desert oar Oct 10, 2022, 1:10 PM

#

alpine temple They each have the same number of rows, where one is a DataFrame and the other, ...

okay. i also see the shapes now, sorry. yes that SO post seems like a reasonable solution

#

you can try doing target.to_frame() to easily "upgrade" the series to dataframe

alpine folio Oct 10, 2022, 1:21 PM

#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

alpine temple Oct 10, 2022, 1:24 PM

#

desert oar you can try doing `target.to_frame()` to easily "upgrade" the series to datafram...

That worked in that it upgraded the series to a frame, but it didn't fix anything: I still got the same error message(!)

I've tried converting to np.arrays, passing in .values, and adding .to_frame(); the shapes read exactly the same (I'm not misreading those numbers below, am I?)

But none of these seem to have worked.

#

Wait - I resolved it.

#

Damn it.

#

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.30, random_state=123, stratify=[target])

I had read "array-like" as meaning you pass in a list of the columns you want to stratify by. So the stratify parameter should be stratify=target, not stratify=[target].
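A small reproduction of the fix on made-up data. `stratify` expects one array-like of per-row class labels with the same length as the data; `[target]` is a list containing one Series, which scikit-learn counts as a single sample, hence the `[11846, 1]` in the error message.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Small made-up dataset: 100 rows, 3 features, imbalanced binary target
rng = np.random.default_rng(123)
features = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
target = pd.Series([0] * 80 + [1] * 20, name="label")

# stratify takes the per-row labels themselves (same length as the data),
# not a list of columns; stratify=[target] is seen as 1 sample and raises ValueError
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.30, random_state=123, stratify=target
)

# Stratification keeps the 80/20 class ratio in both splits
print(y_train.mean(), y_test.mean())
```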

#

Thanks for your help, @desert oar

tame condor Oct 10, 2022, 1:47 PM

#

Hello, do you know how to make the attached picture? It looks to me like it was made with Python's Matplotlib, but is there any resource you can suggest on how to color the indicator and the SuperTrend buy and sell signals?

reef dock Oct 10, 2022, 1:51 PM

#

misty flint madewithml.com

cheers

misty flint Oct 10, 2022, 2:43 PM

#

reef dock cheers

np. take a look at fullstackdeeplearning, the 2022 course, if you are interested in diving deeper afterwards

reef dock Oct 10, 2022, 2:55 PM

#

Sure, will do. Thanks

alpine folio Oct 10, 2022, 3:10 PM

#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

lapis sequoia Oct 10, 2022, 4:29 PM

#

DL is a part of AI, right?

fresh tiger Oct 10, 2022, 4:37 PM

#

desert oar right. the closed form very quickly becomes ridiculous

Hi, I just had a random thought about this. I think I'm missing something basic, but, for example, in the image I uploaded earlier: https://cdn.discordapp.com/attachments/366673247892275221/1028761475453571132/unknown.png wouldn't we need some sort of closed-form formula to calculate du/dz1 and dz1/dw? I.e., wouldn't this still have the same problem as using the closed form instead of the chain rule?

wooden sail Oct 10, 2022, 5:14 PM

#

you do, but those are fairly simple to compute

#

from the perspective of y as a function of z1, all there is is a single layer

#

by using the chain rule, you take derivatives by considering a single layer at a time

#

those are a lot simpler because each layer is an affine transformation composed with a nonlinear function. that itself requires using the chain rule, but it's not so bad

desert oar Oct 10, 2022, 6:05 PM

#

alpine temple ```py X_train, X_test, y_train, y_test = train_test_split(features, target, test...

oops, i completely missed the stratify= kwarg. that was the original source of error that i suspected, but got thrown off

desert oar Oct 10, 2022, 6:07 PM

#

fresh tiger Hi, I just had a random thought regarding this. I think im missing something bas...

yes but the network is literally defined in terms of formulas like "z as a function of w1". and as Edd said, they are usually easy to differentiate

#

and when they are not necessarily easy to differentiate "symbolically" (what you learn in calculus class), they can probably still be differentiated using "automatic differentiation"

#

the latter is part of the magic behind the various deep learning and differentiable computing frameworks, and why specifically the property of a program or algorithm being differentiable is interesting: because you can actually use it in backpropagation, or rather you can backpropagate through it
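A minimal illustration of automatic differentiation, using forward mode with dual numbers: every primitive operation propagates its own local derivative via the product and chain rules, so no symbolic closed form of the whole expression is ever built. Deep learning frameworks mostly use reverse mode (which is what backpropagation is), so this is a sketch of the idea, not of how PyTorch or TensorFlow are implemented; only `+`, `*` and `tanh` are supported, and all names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """Number val + der * eps with eps**2 == 0; `der` carries the derivative along."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def tanh(x):
    t = math.tanh(x.val)
    return Dual(t, (1 - t * t) * x.der)  # chain rule: tanh'(u) * du

def derivative(f, x):
    """df/dx at x: seed der = 1 and run f once on a Dual."""
    return f(Dual(x, 1.0)).der

def f(w):
    return tanh(3 * w + 1) * w

print(derivative(f, 0.5))
```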

fresh tiger Oct 10, 2022, 6:23 PM

#

wooden sail by using the chain rule, you take derivatives by considering a single layer at a...

If, for example, this screenshot is z1, and x is the function in the screenshot nested many times, I'm mainly confused about how you can take a derivative one layer at a time when each layer depends on the previous one because of the nesting?

wooden sail Oct 10, 2022, 6:24 PM

#

fresh tiger If for example, this screenshot is z1, and x is the function in the screenshot n...

by using the chain rule 😛

#

let's take an easier example

#

you know the derivative of e^x is equal to e^x * dx/dx

#

now let's replace x with f(x)

#

the derivative of e^f(x) is e^f(x) df/dx (x)

#

or in a more general case of function composition, the derivative of g(f(x)) is g'(f(x)) * f'(x)

#

you can see that, inside of g and g', it's always f(x). we can black box this

#

then we treat f'(x) completely separately

#

it's just your usual chain rule
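A quick numerical check of the g(f(x)) rule, assuming f(x) = x^2 + 1 and g = exp (both chosen arbitrarily): compute z = f(x) once, treat it as a black box, multiply g'(z) by f'(x), and compare with a central finite difference of the full composition.

```python
import math


def f(x):
    return x ** 2 + 1.0


def f_prime(x):
    return 2.0 * x


def g(z):
    return math.exp(z)


def g_prime(z):
    return math.exp(z)


def chain_rule(x):
    z = f(x)                      # inner function evaluated once, then "black boxed"
    return g_prime(z) * f_prime(x)


def central_difference(h, x, eps=1e-6):
    return (h(x + eps) - h(x - eps)) / (2 * eps)


x = 0.7
analytic = chain_rule(x)
numeric = central_difference(lambda t: g(f(t)), x)
```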

fresh tiger Oct 10, 2022, 6:52 PM

#

Ahh ok, so if I am understanding this correctly, the black boxing of f(x) pretty much solves our issue of the deeply nested functions making things super complex?

wooden sail Oct 10, 2022, 6:52 PM

#

yep

proper wing Oct 10, 2022, 6:55 PM

#

hi, i have a question about pytorch CNNs

#

if we have a conv2d layer, then a maxpool layer, then a new conv2d layer, why is the new conv2d layer's input the same as the output of the 1st conv2d

#

even though there's a maxpooling layer

#

ah nvm

#

i think it's because it's channels and not image size
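For anyone with the same confusion: pooling shrinks height/width but leaves the number of channels alone, and the first argument of `nn.Conv2d` is `in_channels`, so the second conv's input matches the first conv's `out_channels`. A plain-Python sketch of the output-size formula given in the PyTorch `Conv2d`/`MaxPool2d` docs, traced through a hypothetical Conv2d(3, 16, 3) → MaxPool2d(2) → Conv2d(16, 32, 3) stack on a 32x32 input:

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def maxpool2d_out(size, kernel, stride=None, padding=0, dilation=1):
    stride = kernel if stride is None else stride  # MaxPool2d's stride defaults to kernel_size
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


channels, hw = 3, 32                      # hypothetical RGB 32x32 input
channels, hw = 16, conv2d_out(hw, 3)      # conv sets the channel count and shrinks H/W
after_conv1 = (channels, hw)
hw = maxpool2d_out(hw, 2)                 # pooling halves H/W, channels stay at 16
after_pool = (channels, hw)
channels, hw = 32, conv2d_out(hw, 3)      # so this conv needs in_channels=16
after_conv2 = (channels, hw)
```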

fresh tiger Oct 10, 2022, 6:59 PM

#

wooden sail yep

OH wait a sec, so basically we can find dJ(X)/dw1 without having to look further back in the NN, i.e. if we had even more layers before w1

wooden sail Oct 10, 2022, 6:59 PM

#

yep, chain rule

fresh tiger Oct 10, 2022, 7:00 PM

#

but for example, how does dy/dz1 get calculated?

wooden sail Oct 10, 2022, 7:00 PM

#

instead of expanding the composition and differentiating a complex function once, we take several easy derivatives

fresh tiger Oct 10, 2022, 7:00 PM

#

ok I think this solves my doubt

wooden sail Oct 10, 2022, 7:00 PM

#

exactly as we did in the example above

#

look, let's take g(f(x))

#

now let's call z = f(x)

#

we find the derivative of g(z) w.r.t. x

#

that's g'(z) z'

#

and z' = f'(x)

#

so g'(z) f'(x)

#

g'(z) doesn't need anything other than the derivative of g, evaluated at whatever z is. it doesn't matter what

#

z is just the previous layers evaluated at the given input

#

it's just chain rule

#

forget about the network

#

just review your calculus

fresh tiger Oct 10, 2022, 7:12 PM

#

Ok ok, I see now, this is super clear. Yeah, I checked out another video on the chain rule and things are connecting now. I now understand WHY we use the chain rule and what it actually does.

Just to confirm the process overall goes:

If we specify f(x) and the sigmoid as an activation function, we can then specify their derivatives in the code (I assume we would have to calculate/specify the derivatives of our functions if we are implementing backpropagation?). This lets us take the simple derivatives we talked about earlier and combine them via the chain rule to find, for example, the derivative of J(W) w.r.t. some w value that is super far back in the chain?

wooden sail Oct 10, 2022, 7:14 PM

#

right

#

if you can compute functions and their individual derivatives, you can do the same for their composition by using the chain rule
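A sketch of exactly that, for a hypothetical two-layer scalar network x → sigmoid(w1·x + b1) → sigmoid(w2·a1 + b2) with squared-error loss J = (y - t)^2. The sigmoid's derivative is specified by hand, and dJ/dw1 (the weight furthest from the loss) is just the product of the local derivatives, checked against a finite difference. The parameter values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    # derivative specified by hand: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(w1, b1, w2, b2, x):
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    y = sigmoid(z2)
    return z1, a1, z2, y


def loss(w1, b1, w2, b2, x, t):
    y = forward(w1, b1, w2, b2, x)[-1]
    return (y - t) ** 2


def grad_w1(w1, b1, w2, b2, x, t):
    z1, a1, z2, y = forward(w1, b1, w2, b2, x)
    dJ_dy = 2.0 * (y - t)
    dy_dz2 = sigmoid_prime(z2)
    dz2_da1 = w2
    da1_dz1 = sigmoid_prime(z1)
    dz1_dw1 = x
    # chain rule: multiply the simple local derivatives layer by layer
    return dJ_dy * dy_dz2 * dz2_da1 * da1_dz1 * dz1_dw1


params = dict(w1=0.3, b1=-0.1, w2=-1.2, b2=0.4, x=2.0, t=1.0)
analytic = grad_w1(**params)

eps = 1e-6
numeric = (loss(**dict(params, w1=params["w1"] + eps))
           - loss(**dict(params, w1=params["w1"] - eps))) / (2 * eps)
```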

fresh tiger Oct 10, 2022, 7:17 PM

#

Alright awesome, it's very clear to me now! Thank you both so much for all of your help and for bearing with me! I appreciate it a lot 😄

stark ember Oct 10, 2022, 7:50 PM

#

I'm trying to run a model on CUDA and am pretty clueless on what I'm doing - is there a way I can somehow get around this memory issue possibly at the cost of performance or is it a hard border of what I can and cannot run?

serene scaffold Oct 10, 2022, 7:55 PM

#

stark ember I'm trying to run a model on CUDA and am pretty clueless on what I'm doing - is ...

Can you show the part of the code that loads the model with all relevant import statements? And please don't ask people to read screenshots of text.

#

!code

arctic wedgeBOT Oct 10, 2022, 7:55 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

lapis sequoia Oct 10, 2022, 7:57 PM

#

serene scaffold What do you mean "which key words to use"?

Like the if statements, return, def, lists, the `and` keyword, things like that. Right now I'm doing a course, so I have a really good base for learning bot making, since it's a goal of mine for this year

lapis sequoia Oct 10, 2022, 7:59 PM

#

alpine temple You might want to start with reading up on Natural Language Processing.

Thank you very much I'll note that

stark ember Oct 10, 2022, 8:03 PM

#

serene scaffold Can you show the part of the code that loads the model with all relevant import ...

Sorry, I had screenshotted that without thinking 😅

And yes, here's the code (whisper is https://github.com/openai/whisper):

import whisper
from torch import cuda

def load_model(model_name: str = 'base'):
    device = 'cuda' if cuda.is_available() else 'cpu'

    print(f'Loading {model_name} model on {device.upper()}')
    return whisper.load_model(model_name, device=device)

#

I'm not doing any of the loading myself, just using the library's exposed functions

#

If you want the source for whisper.load_model, I can provide that as well:

def load_model(name: str, device: Optional[Union[str, torch.device]] = None, download_root: str = None, in_memory: bool = False) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        download_root = os.getenv(
            "XDG_CACHE_HOME", 
            os.path.join(os.path.expanduser("~"), ".cache", "whisper")
        )

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
    else:
        raise RuntimeError(f"Model {name} not found; available models = {available_models()}")

    with (io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims)
    model.load_state_dict(checkpoint["model_state_dict"])

    return model.to(device)

serene scaffold Oct 10, 2022, 8:08 PM

#

stark ember Sorry, I had screenshotted that without thinking 😅 And yes, here's the code (`...

My battery is running low. Hopefully I can look later. Others are welcome to.

stark ember Oct 10, 2022, 8:08 PM

#

Alright, thanks a lot for helping!

desert oar Oct 10, 2022, 8:24 PM

#

@fresh tiger it looks like you worked through it with Edd, but this is a great example of why it's valuable to actually go through the motions with specific (simple) cases, like some small neural network with 4 inputs, 2 hidden nodes, and 3 outputs + softmax, with MSE loss and sigmoid activations. that's well within reach of what you can write out and work through entirely by hand, even completely avoiding vector notation and working with sums of scalar terms
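A possible numpy version of that exercise, assuming sigmoid on the hidden layer, softmax on the output, and MSE averaged over the 3 outputs (the message leaves these details open). The gradients are written out with the chain rule and compared entry by entry with finite differences; the random weights and the one-hot target are made up.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def forward(W1, b1, W2, b2, x):
    h = sigmoid(W1 @ x + b1)          # 2 hidden units
    p = softmax(W2 @ h + b2)          # 3 output probabilities
    return h, p


def mse(p, y):
    return np.mean((p - y) ** 2)


def backward(W1, b1, W2, b2, x, y):
    h, p = forward(W1, b1, W2, b2, x)
    dL_dp = 2.0 * (p - y) / p.size
    J = np.diag(p) - np.outer(p, p)   # softmax Jacobian dp_i/dlogit_j (symmetric)
    dL_dlogits = J @ dL_dp
    dL_dW2 = np.outer(dL_dlogits, h)
    dL_dh = W2.T @ dL_dlogits
    dL_dz1 = dL_dh * h * (1.0 - h)    # sigmoid'(z) = h * (1 - h)
    dL_dW1 = np.outer(dL_dz1, x)
    return dL_dW1, dL_dz1, dL_dW2, dL_dlogits


rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0])
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=2)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=3)

dW1, db1, dW2, db2 = backward(W1, b1, W2, b2, x, y)


def numeric_grad(param, eps=1e-6):
    """Central differences, nudging one entry of `param` (W1, W2, ...) in place."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = mse(forward(W1, b1, W2, b2, x)[1], y)
        param[idx] = old - eps
        minus = mse(forward(W1, b1, W2, b2, x)[1], y)
        param[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad
```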

#

it's not the kind of thing you need to do more than once or twice before you get it

#

part of the value of a good course, or at least a good textbook, is having exercises like the above presented to you

desert oar Oct 10, 2022, 8:28 PM

#

stark ember Alright, thanks a lot for helping!

can you post the error too? i can't read that screenshot

stark ember Oct 10, 2022, 8:41 PM

#

desert oar can you post the error too? i can't read that screenshot

whisper> load large
Loading large model on CUDA
100%|█████████████████████████████████████| 2.87G/2.87G [05:57<00:00, 8.64MiB/s]
CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 7.12 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
whisper>

alpine folio Oct 10, 2022, 8:52 PM

#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

fringe anvil Oct 10, 2022, 8:55 PM

#

hello, sorry to annoy, i just started uni (last month) for a data science diploma. problem is, math was never a strength. is there any material out there that is clear and shows the maths directly in python?

cause looking at math formulas all day, i fried my brain 3 days in a row.

thanks for your time

alpine temple Oct 10, 2022, 9:38 PM

#

fringe anvil hello, sorry to annoy, i just started uni (last month) for data science diploma....

Not answering this one directly, but I am (somewhat) in the same boat as you. I'm in my third year now. Still struggling XD

Barbara Oakley's free "Learning How To Learn" course on Coursera is a must-do. It helps create a framework for working with math concepts, memory recall and comprehension.

#

Side note everyone - here's a silly question.

So this is a little right skewed, right?

#

Now watch before your very eyes as I REMOVE the slight right skewness.

```py
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# I am going to do a Log Transform of Min C, output the graph and p-value to see what improvement is shown.
logged_minc = df_imputed['Min °C'].copy()  # not logged yet
logged_minc = (logged_minc - logged_minc.min()) + 1  # shift so the minimum becomes 1 (log10(1) == 0)
logged_minc = np.log10(logged_minc)

plt.figure()
ax = sns.histplot(data=df_imputed, x=logged_minc, hue=df_imputed['Rain(Y/N)'], kde=True)
```

#

TADA!

The right skew is gone!

#

It is now a trailing left skew.

Am I doing something wrong here?
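One way to check this numerically instead of by eye is to compare the sample skewness before and after the transform. A sketch with made-up toy data (not the weather data): shifting so the minimum becomes 1 sends the smallest values to log10(1) = 0 and stretches them away from the rest, which here turns a positive (right) skew into a negative (left) one. `sample_skewness` and `shifted_log10` are illustrative helpers, not from any library.

```python
import numpy as np


def sample_skewness(values):
    """Fisher-Pearson moment coefficient of skewness (biased, m3 / m2**1.5)."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5


def shifted_log10(values):
    # same transform as the snippet above: shift so the minimum is 1, then log10
    x = np.asarray(values, dtype=float)
    return np.log10(x - x.min() + 1.0)


toy = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]   # mildly right-skewed toy sample
before = sample_skewness(toy)
after = sample_skewness(shifted_log10(toy))
```

On the real column, the same comparison would be `sample_skewness(df_imputed['Min °C'])` versus `sample_skewness(shifted_log10(df_imputed['Min °C']))`.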

fringe anvil Oct 10, 2022, 9:45 PM

#

alpine temple Not answering this one directly, but I am (somewhat) in the same boat as you. I'...

thanks, ill check out the course

alpine temple Oct 10, 2022, 9:53 PM

#

Log transform here as well: the spread of the values on the x-axis has narrowed, so the deep crevasse of nothing between 0 and the start of the distribution shrinks from about 10 units to about 1.

But I have an overwhelming feeling that what I should be saying in my report is that log transformations may not be the catch-all transformation for skewed data. Or is that taking it too far?

brave sand Oct 10, 2022, 9:56 PM

#

```py
elif args.attack_type == "remote":
    prisoner_loc = env.env.prisoner.location.copy()
    # distance from the prisoner to every known camera, as (camera index, distance) pairs
    dists = []
    for i in range(env.env.num_known_cameras):
        cam_loc = env.env.camera_list[i].location.copy()
        dist = np.linalg.norm(np.array(prisoner_loc) - np.array(cam_loc))
        dists.append((i, dist))
    # reverse=True sorts from farthest to nearest
    sorted_dists = sorted(dists, key=lambda x: x[1], reverse=True)
    # args.C distinct positions among the first 5 entries, i.e. among the 5 farthest cameras
    idx = np.random.choice(5, args.C, replace=False)
    attack_action = [sorted_dists[i][0] for i in idx]
```
can someone explain this chunk of code? it's supposed to only perturb the detection flag to True and set the detected location to be the camera's own location. how could I edit this code so it could specify the location of the camera as an action?
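A standalone sketch of what the `remote` branch computes, with the environment replaced by plain coordinates (`pick_far_cameras` and `attack_with_locations` are hypothetical names; the real code reads locations from `env.env`). It picks `num_attacked` distinct cameras at random among the 5 farthest from the prisoner, which is also why `args.C` must not exceed 5. Returning each camera's own location alongside its index is one possible reading of "specify the location of the camera as an action", not necessarily what the environment expects.

```python
import numpy as np


def pick_far_cameras(prisoner_loc, camera_locs, num_attacked, pool_size=5, rng=None):
    """Choose `num_attacked` distinct camera indices at random among the
    `pool_size` cameras farthest from the prisoner (mirrors the snippet above)."""
    rng = np.random.default_rng() if rng is None else rng
    prisoner = np.asarray(prisoner_loc, dtype=float)
    dists = []
    for i, loc in enumerate(camera_locs):
        dist = float(np.linalg.norm(prisoner - np.asarray(loc, dtype=float)))
        dists.append((i, dist))
    # reverse=True: farthest camera first
    sorted_dists = sorted(dists, key=lambda pair: pair[1], reverse=True)
    # positions 0..pool_size-1 of the sorted list, sampled without replacement
    idx = rng.choice(pool_size, num_attacked, replace=False)
    return [sorted_dists[i][0] for i in idx]


def attack_with_locations(prisoner_loc, camera_locs, num_attacked, rng=None):
    """Hypothetical variant: pair each chosen camera index with its own location."""
    ids = pick_far_cameras(prisoner_loc, camera_locs, num_attacked, rng=rng)
    return [(i, tuple(camera_locs[i])) for i in ids]


cameras = [(0, 0), (1, 0), (10, 10), (20, 0), (0, 30), (5, 5), (2, 2)]
action = attack_with_locations((0, 0), cameras, 3, rng=np.random.default_rng(0))
```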

sick moon Oct 10, 2022, 10:09 PM

#

https://www.youtube.com/watch?v=YqaNo0XfAD4
A quick talk I gave at PyHEP last September, organised by the people at CERN.
We talked about what Python can do in VR, not exactly related to particle physics, but they kindly invited us to show our work.

YouTube

HEP Software Foundation

PyHEP2022 3D and VR Industrial Use Cases in Python

Through several examples of practical use cases the talk will present our experiences of 3D and Virtual Reality, all implemented in Python with the help of our 3D package "HARFANG 3D" :

Human factor study of a railway station in virtual reality
Using a aircraft simulation sandbox for AI training
Tele-operating a humanoid robot in VR...

desert oar Oct 11, 2022, 3:57 AM

#

@fringe anvil @alpine temple it's unfortunately a disservice to students to try to force them to learn material that they don't have the prerequisites for. "in python" is also a bit of a challenge here. there might be some books that use numpy for linear algebra examples (i don't know of one), and i know there is at least one book using code examples to teach probability. but if you can at least specify a couple of things you don't understand, someone might be able to direct you to useful resources

#

realistically the only way to learn applied math is to learn math. you don't need a graduate degree, but you do need the fundamentals of linear algebra and multivariable calculus.

#

if you are significantly more comfortable with code than with traditional math notation, maybe a good exercise would be to translate traditional formulas into python functions
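As an example of that exercise, here are two textbook formulas translated almost symbol for symbol: the sample variance s² = 1/(n-1) Σ (x_i - x̄)² and the matrix-vector product y_i = Σ_j A_ij x_j. Each Σ becomes a `sum(...)` over a generator.

```python
def sample_variance(xs):
    n = len(xs)
    x_bar = sum(xs) / n                                        # x̄ = (1/n) Σ x_i
    return sum((x_i - x_bar) ** 2 for x_i in xs) / (n - 1)     # s² = 1/(n-1) Σ (x_i - x̄)²


def matvec(A, x):
    # y_i = Σ_j A_ij x_j : one sum per row i, running over the column index j
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
```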

lunar wharf Oct 11, 2022, 4:27 AM

#

Hey there! What's a good free resource to learn python for data science / ML? Something similar to TOP but for python?

lapis sequoia Oct 11, 2022, 4:36 AM

#

DL is a subcategory of ML, right?

rough magnet Oct 11, 2022, 4:44 AM

#

lapis sequoia DL is a subcategory of ML, right?

Yes, I believe so

lapis sequoia Oct 11, 2022, 4:45 AM

#

Ty

rough magnet Oct 11, 2022, 4:45 AM

#

lunar wharf Hey there! Whats a good free resource to learn python for data science / ML ? So...

Kaggle? Maybe

desert oar Oct 11, 2022, 5:43 AM

#

lunar wharf Hey there! Whats a good free resource to learn python for data science / ML ? So...

check the pinned messages, there might be something in there

#

@alpine temple @fringe anvil in addition to what i said before, check the pinned messages in this channel. look for the MML book, among other things

arctic wedgeBOT Oct 11, 2022, 5:49 AM

#

Hey @empty nacelle!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

undone storm Oct 11, 2022, 7:47 AM

#

Hey, I want to do some kind of benchmarking (multiple datasets with multiple different algorithms) and I was searching for some tooling to make it reproducible. I found DVC and mlflow; they seem to support running experiments with configuration files for the hyperparameters etc., but all the examples I could find used only one dataset and one algorithm (with different models of that algorithm). Does anyone know if those tools are appropriate for my use case as well, or are there better alternatives?

hushed kraken Oct 11, 2022, 10:21 AM

#

Hi, for my engineering project I have to predict how much energy solar panels receive. I'm new to deep learning, so would Keras be the best option to make my model, or are there better alternatives?

topaz wave Oct 11, 2022, 11:06 AM

#

hi, am new to machine learning and needed some assistance from experienced ppl

#

heeeeeelp

wooden sail Oct 11, 2022, 11:06 AM

#

you're gonna have to specify what with

topaz wave Oct 11, 2022, 11:09 AM

#

wooden sail you're gonna have to specify what with

well i learned numpy, pandas and matplotlib, made some projects, learned scikit-learn and did things with that. now i am confused whether i should go further with normal algorithms or just move on to deep learning

#

any suggestions?

#

my main goal is to go further in deep learning and make deep learning models

wooden sail Oct 11, 2022, 11:09 AM

#

what do you have in mind when you say "normal algorithms"

topaz wave Oct 11, 2022, 11:11 AM

#

wooden sail what do you have in mind when you say "normal algorithms"

like i learned the basics such as linear regression, logistic regression etc., but i've seen there are more, such as knn, naive bayes and others

#

so should i complete them and then go for deep learning?

#

or am i good to go?

wooden sail Oct 11, 2022, 11:12 AM

#

i would say you should complete those first, but i'm a big proponent of learning from the ground up. it really depends on how you prefer doing stuff

#

you'll pick up stuff that will be useful/necessary for deep learning

topaz wave Oct 11, 2022, 11:14 AM

#

hmm so according to you i should strengthen my basics before goin for deep learning and neural networks

wooden sail Oct 11, 2022, 11:14 AM

#

that would be my claim since ML is math

#

the more you know, the better

#

you can learn it ahead of time or at the same time, and i'm saying learning ahead of time is what i prefer, but that's personal flavor

topaz wave Oct 11, 2022, 11:16 AM

#

oh alright that helps

#

thanku

meager crater Oct 11, 2022, 12:23 PM

#

hey, does anyone have a labeled speech dataset in English?

worldly wren Oct 11, 2022, 12:31 PM

#

I dont know if this is the right place to ask this

#

but

#

I have a pandas dataframe which has a column

#

which is full of joined hashtags, lemme show you

#

#

so is there anything I can do to like

#

first take these rows , then make a list by separating those strings

#

then arrange in ascending order of hashtags used

desert oar Oct 11, 2022, 12:49 PM

#

worldly wren

where did you get this data from? the ÿ characters appear to be some kind of record separator, represented incorrectly because the original data is in a different text encoding from what pandas used to load the data

worldly wren Oct 11, 2022, 12:50 PM

#

I think that might be because

#

the guy who made the data must have used a Mac

#

or something. I am using win 11

desert oar Oct 11, 2022, 12:51 PM

#

do you know what program they might have used to make the data?

worldly wren Oct 11, 2022, 12:51 PM

#

📎 Instagram_data_2.csv

#

I have no idea honestly

desert oar Oct 11, 2022, 12:51 PM

#

i wonder if they just chose a byte that isn't valid ascii

worldly wren Oct 11, 2022, 12:52 PM

#

so what can I do with these hashtags or captions

#

am supposed to analyze the data

desert oar Oct 11, 2022, 12:52 PM

#

that's 0xFF which maybe was some overly-clever programmer's idea of a "character that nobody will use and will be obviously just a record separator"
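The 0xFF guess is easy to check in plain Python: in latin-1 (and cp1252) the single byte 0xFF decodes to 'ÿ' (U+00FF), and that byte can never appear in valid UTF-8. The bytes below are a made-up stand-in for the CSV contents, not the real file.

```python
raw = b"finance\xffmoney\xffstocks"   # hypothetical bytes using 0xFF as a separator

as_latin1 = raw.decode("latin-1")      # 0xFF -> U+00FF, displayed as 'ÿ'
tags = as_latin1.split("\xff")         # splitting on 'ÿ' recovers the separate tags

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False                    # 0xFF is never a valid byte in UTF-8
```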

desert oar Oct 11, 2022, 12:52 PM

#

worldly wren am supposed to analyze the data

split on ÿ of course

#

that way you will get a list of hashtags in each data frame cell, and you can proceed

#

!d pandas.Series.str.split

arctic wedgeBOT Oct 11, 2022, 12:53 PM

#

pandas.Series.str.split


```py
Series.str.split(pat=None, n=-1, expand=False, *, regex=None)
```
Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.

worldly wren Oct 11, 2022, 12:53 PM

#

where will I get the list tho

#

as a different column

desert oar Oct 11, 2022, 12:54 PM

#

worldly wren where will I get the list tho

...from the data?

#

can you clarify your question?

worldly wren Oct 11, 2022, 12:54 PM

#

well

#

I want to separate the hashtags

#

then ,

#

i want to count all the unique ones ,

#

and arrange them in ascending order

#

and make a bar graph out of it

desert oar Oct 11, 2022, 12:55 PM

#

okay, you want the number of unique hashtags in each row?

#

or the number of times each hashtag appears? or something else?

worldly wren Oct 11, 2022, 12:55 PM

#

no

#

or the number of hashtags that appears in the whole table

#

like including all rows

desert oar Oct 11, 2022, 12:56 PM

#

that's just one number, not something you want to plot with a bar chart

worldly wren Oct 11, 2022, 12:57 PM

#

but there will be many values right

#

for different hashtags

#

x axis will be the hashtag name

#

y axis will be its value

#

or count

desert oar Oct 11, 2022, 12:59 PM

#

worldly wren for different hashtags

it sounds like you are asking for the number of times each individual hash tag appears

worldly wren Oct 11, 2022, 12:59 PM

#

Yes

desert oar Oct 11, 2022, 1:00 PM

#

good question. it might be worth your while to at least make an attempt at it on your own

#

i will give you the hint that there is no single function or method that will do this for you

worldly wren Oct 11, 2022, 1:01 PM

#

welp I did try to do it before

#

also I am doing this for a school project

#

but I cant figure it out

desert oar Oct 11, 2022, 1:02 PM

#

worldly wren welp I did try to do it before

okay, and what did you try?

worldly wren Oct 11, 2022, 1:02 PM

#

well I tried that one split command

#

and made an effort to create a list

#

by making a virtual column

desert oar Oct 11, 2022, 1:02 PM

#

a virtual column?

worldly wren Oct 11, 2022, 1:02 PM

#

and placing the list values inside it

#

did not work

desert oar Oct 11, 2022, 1:03 PM

#

what is a virtual column?

worldly wren Oct 11, 2022, 1:03 PM

#

idk

#

in mysql they call it virtual column

#

I don't know what they call it in python

desert oar Oct 11, 2022, 1:04 PM

#

(keep in mind that pandas is not python, it's just a library written in python)

worldly wren Oct 11, 2022, 1:04 PM

#

Yup sorry about that

desert oar Oct 11, 2022, 1:05 PM

#

how did you attempt to create a virtual column? pandas doesn't really have that concept, so it's very likely that you just misunderstood what you were doing

worldly wren Oct 11, 2022, 1:06 PM

#

I tried to do something like that

#

because in sql, I did a question like that before

desert oar Oct 11, 2022, 1:07 PM

#

i'm asking you to describe specifically what you tried. pandas does not have virtual columns, so telling me that you tried to create one doesn't actually tell me anything!

#

how about this, can you share the code that you used?

#

!paste

arctic wedgeBOT Oct 11, 2022, 1:07 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

worldly wren Oct 11, 2022, 1:07 PM

#

desert oar how about this, can you share the code that you used?

i got rid of it

#

also it wasn't really a virtual column I think

#

I had just updated the column prob

#

but the process is big

desert oar Oct 11, 2022, 1:08 PM

#

indeed, i was going to suggest that but i didn't want to guess without seeing your code

#

in general, it's usually prudent to try out your code on a small sample of data before trying to use the full dataset

worldly wren Oct 11, 2022, 1:09 PM

#

so am I supposed to use loops?

desert oar Oct 11, 2022, 1:10 PM

#

worldly wren so am I supposed to use loops?

you often can avoid writing for loops in pandas, but conceptually almost always yes

#

using built-in pandas looping tools is usually much faster and considered better style
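A small illustration of the difference, on made-up data: both versions count the hashtags in each row, the first with an explicit Python loop and the second with pandas' `.str.len()`, which also accepts list elements. They give the same result; the second just keeps the looping inside pandas.

```python
import pandas as pd

tags = pd.Series([["a", "b"], ["c"], []])   # hypothetical Series of lists

# explicit Python loop over the elements
lengths_loop = pd.Series([len(t) for t in tags], index=tags.index)

# built-in pandas method doing the same per-element work
lengths_vec = tags.str.len()
```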

worldly wren Oct 11, 2022, 1:11 PM

#

can you just tell me the commands

#

syntax i mean

desert oar Oct 11, 2022, 1:12 PM

#

sorry, i won't do your homework for you

worldly wren Oct 11, 2022, 1:12 PM

#

I don't mind that , I know that

#

just asking you what commands will be used

#

like about the split thing you mentioned

desert oar Oct 11, 2022, 1:13 PM

#

worldly wren like about the split thing you mentioned

i posted a link to the docs for the split method

#

you would access the column with [] and then call .str.split on the result

worldly wren Oct 11, 2022, 1:13 PM

#

okay

desert oar Oct 11, 2022, 1:14 PM

#

if you are a practicing data scientist you must be able to read docs and combine them with your fundamental understanding in order to solve a problem

worldly wren Oct 11, 2022, 1:14 PM

#

so il get a list after that?

desert oar Oct 11, 2022, 1:14 PM

#

you can't rely on other people for that

desert oar Oct 11, 2022, 1:14 PM

#

worldly wren so il get a list after that?

you get a Series where each element is a list
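Roughly what that looks like on a toy frame (the column names `Hashtags`/`HashtagList` and the values are hypothetical). With a single-character pattern, `str.split` treats it as a literal string, and storing the result in a new column keeps the original data untouched.

```python
import pandas as pd

df = pd.DataFrame({"Hashtags": ["financeÿmoney", "moneyÿstocksÿfinance"]})

# access the column with [] and call .str.split on it
df["HashtagList"] = df["Hashtags"].str.split("ÿ")
```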

worldly wren Oct 11, 2022, 1:14 PM

#

I am planning to take CSE

desert oar Oct 11, 2022, 1:14 PM

#

this is stated in the documentation

worldly wren Oct 11, 2022, 1:14 PM

#

I am in 12th grade

desert oar Oct 11, 2022, 1:15 PM

#

ok, then imo you've had enough education to know that in order to learn you need to solve problems for yourself, instead of waiting for solutions to be provided to you

#

i'm sure you hear this from people all the time, but expectations only go up as you get older and gain more experience

worldly wren Oct 11, 2022, 1:16 PM

#

after my entrance test

#

am planning to learn everything properly

woeful hatch Oct 11, 2022, 1:17 PM

#

worldly wren I am in 12th grade

Mhm... I'm in 11th..

desert oar Oct 11, 2022, 1:17 PM

#

well i gave you plenty of advice so far. why don't you at least try to split the hashtags into lists?

worldly wren Oct 11, 2022, 1:18 PM

#

yup doing that now

desert oar Oct 11, 2022, 1:18 PM

#

assign them to a new column in the same dataframe perhaps, just in case you make a mistake and need to get the original

woeful hatch Oct 11, 2022, 1:18 PM

#

pandas ??

desert oar Oct 11, 2022, 1:18 PM

#

i also strongly suggest working with a small sample of this data

woeful hatch Oct 11, 2022, 1:18 PM

#

desert oar i also strongly suggest working with a small sample of this data

Is it better to work with csv files or json files while using pandas ??

#

I mostly use csv..

desert oar Oct 11, 2022, 1:18 PM

#

1000 rows should be more than enough to be "realistic" for testing your code, without worrying about performance on a bigger dataset

desert oar Oct 11, 2022, 1:19 PM

#

woeful hatch Is it better to work with csv files or json files while using pandas ??

ideally neither, parquet is better than both for a lot of uses. csv and json have their own, but different, problems

#

csv is good if you don't have complicated text in your data
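One concrete example of CSV's limits: it has no notion of list values, so a column of Python lists written with `to_csv` comes back from `read_csv` as strings that only look like lists. A minimal round trip through an in-memory buffer, with made-up data:

```python
import io

import pandas as pd

df = pd.DataFrame({"tags": [["finance", "money"], ["stocks"]]})

buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
round_tripped = pd.read_csv(buf)

first = round_tripped["tags"].iloc[0]   # a str such as "['finance', 'money']", not a list
```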

woeful hatch Oct 11, 2022, 1:20 PM

#

worldly wren yup doing that now

I think matrix calculations might be a part of your syllabus

woeful hatch Oct 11, 2022, 1:20 PM

#

desert oar csv is good if you don't have complicated text in your data

Mm...

woeful hatch Oct 11, 2022, 1:21 PM

#

desert oar ideally neither, parquet is better than both for a lot of uses. csv and json hav...

I really haven't worked much with data science....
Just doing some discord bot coding stuff ;-;

desert oar Oct 11, 2022, 1:21 PM

#

woeful hatch Mm...

there is no universally best data format. so really the question needs to be qualified: in what context, for what purpose?

woeful hatch Oct 11, 2022, 1:22 PM

#

desert oar there is no universally best data format. so really the question needs to be qua...

Mhm...
I was looking for a general answer but ya it's ok...

desert oar Oct 11, 2022, 1:22 PM

#

woeful hatch Mhm... I was looking for a general answer but ya it's ok...

the general answer is that there is no general answer

woeful hatch Oct 11, 2022, 1:22 PM

#

Mm....

desert oar Oct 11, 2022, 1:22 PM

#

you will encounter that in many many situations in programming, data science, and elsewhere in life

woeful hatch Oct 11, 2022, 1:23 PM

#

Hmm...

#

I've just started like 2 months ago...

#

Busy with exam portions now...

lapis sequoia Oct 11, 2022, 1:30 PM

#

Making a roadmap to self-study for a data analyst role, starting with Excel. I will also need statistics and probability, but which math topics do I need to study before statistics and probability? I graduated 30 years ago as an engineer, but this math stuff is buried deep. You may point me to Khan Academy courses, YouTube videos, books (incl. The Manga Guides to ... from No Starch), FreeCodeCamp, ... Anything that may help me know where to start and what's next.

meager forge Oct 11, 2022, 1:36 PM

#

Is there any way to assess the quality of audio?

wooden sail Oct 11, 2022, 3:06 PM

#

hoo boy, is there

serene scaffold Oct 11, 2022, 3:06 PM

#

lapis sequoia Making a roadmap to self-study for a data analyst role, starting with Excel. I will als...

if you want to be a data analyst, but not a "data scientist", this server may be of limited use for you, because we mostly use pandas or SQL for tabular data

wooden sail Oct 11, 2022, 3:06 PM

#

but it depends entirely on what kind of audio and the application

desert oar Oct 11, 2022, 3:07 PM

#

lapis sequoia Making a roadmap to self-study for a data analyst role, starting with Excel. I will als...

plain calculus and vector/matrix arithmetic is good enough to start. eventually you'll want multivariable calculus and linear algebra.

#

there is a "Math for ML" book in the pinned messages

#

keep in mind that probability is often considered a subset of pure math, or at least tends to straddle the line between pure and applied. you will probably want to focus on understanding the fundamentals and don't need to worry (at least not at first) about things like moment generating functions

#

i just came up with this now: if you can derive the binomial distribution on the back of an envelope, you are off to a good start in practical probability
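For reference, a sketch of that back-of-the-envelope derivation: any particular sequence with k successes in n independent trials has probability p^k (1-p)^(n-k), and there are C(n, k) such sequences. The brute-force version enumerates all 2^n sequences to confirm the formula.

```python
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    # number of ways to place the k successes, times the probability of one such sequence
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def brute_force_pmf(k, n, p):
    # add up the probability of every success/failure sequence with exactly k successes
    total = 0.0
    for seq in product([0, 1], repeat=n):
        if sum(seq) == k:
            total += p ** k * (1 - p) ** (n - k)
    return total
```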

serene scaffold Oct 11, 2022, 3:11 PM

#

desert oar plain calculus and vector/matrix arithmetic is good enough to start. eventually ...

I don't think they want to do ML though

lapis sequoia Oct 11, 2022, 3:11 PM

#

serene scaffold if you want to be a data analyst, but not a "data scientist", this server may be...

Actually, the roadmaps I found for data analyst also include SQL and python (pandas, numpy, matplotlib, ...)

desert oar Oct 11, 2022, 3:16 PM

#

serene scaffold I don't think they want to do ML though

well they asked about math for stats and probability

wooden sail Oct 11, 2022, 3:17 PM

#

stats and linalg are good for your soul anyway

#

they'll need that

desert oar Oct 11, 2022, 3:22 PM

#

@lapis sequoia http://mitran-lab.amath.unc.edu/courses/MATH768/biblio/introduction-to-prob-models-11th-edition.PDF

#

that's a pretty good book

#

it might be a little advanced for your level if you forgot all your math

#

but it's also full of the kinds of things that, if you can implement them in practice, will let you solve a huge variety of real problems in a variety of domains

#

a good introductory statistics book would also be a really good idea

#

let me see if there's a probability book that starts a little on the lighter side, so you can get into the fun stuff more quickly without worrying too much about math prereqs

#

you can do a lot without much more than high school algebra

#

the best "data analysts" in my experience are the ones who don't worry about learning fancy stuff but are incredibly solid with their fundamentals, and have one or two extremely powerful tools that they know how to use proficiently, like SQL and Excel, and also have substantial domain knowledge about whatever field they work in

worldly wren Oct 11, 2022, 3:38 PM

#

Hey I am back here

#

I managed to split the data a long time ago

#

was drinking tea

#

so now I have this

#

each row has a list of all the tags

#

how do I count how many times Finance or Money has been repeated

#

and stuff

#

serene scaffold Oct 11, 2022, 4:10 PM

#

worldly wren

is each element a list of strings, or a string that looks like a list of strings?

#

once you have a Series of lists of strings, you can do .explode().value_counts()

worldly wren Oct 11, 2022, 4:11 PM

#

List of strings

#

@serene scaffold

serene scaffold Oct 11, 2022, 4:12 PM

#

worldly wren List of strings

for Series[list[str]], .explode will give you a flat Series[str]. and then you can do value_counts on that

worldly wren Oct 11, 2022, 4:12 PM

#

I have a series

#

Of that column

#

What next

serene scaffold Oct 11, 2022, 4:12 PM

#

I already told you

worldly wren Oct 11, 2022, 4:13 PM

#

Okay, I'll try that

#

Rq

digital folio Oct 11, 2022, 4:13 PM

#


```py
x = PivotTable.loc[PivotTable.Retailer == "Bela", "Promotion Relevance (Cat)_Energy"]
print(x)
```

How can I get just the value?

serene scaffold Oct 11, 2022, 4:14 PM

#

digital folio ```py x = PivotTable.loc[PivotTable.Retailer == "Bela","Promotion Relevance (Ca...

use .at instead of .loc

digital folio Oct 11, 2022, 4:14 PM

#

serene scaffold use `.at` instead of `.loc`

giving me error

```py
x = PivotTable.at[PivotTable.Retailer == "Bela", "Promotion Relevance (Cat)_Energy"]
print(x)
```

worldly wren Oct 11, 2022, 4:15 PM

#

I got the number of values in a single row

serene scaffold Oct 11, 2022, 4:15 PM

#

digital folio giving me error ```py x = PivotTable.at[PivotTable.Retailer == "Bela","Promotion...

please do not ask people to read screenshots of text. please copy and paste text as text.

I guess you can do x = PivotTable.loc[PivotTable.Retailer == "Bela","Promotion Relevance (Cat)_Energy"].iat[0]
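A minimal sketch of why `.at` failed and a few ways to get a plain scalar, using a small made-up frame in place of `PivotTable` (the retailer names and numbers are illustrative only). `.loc` with a boolean mask always returns a Series, because more than one row could match, while `.at` expects a single row label and a single column label, not a mask.

```python
import pandas as pd

# Hypothetical stand-in for PivotTable; only the column names come from the chat.
pivot = pd.DataFrame({
    "Retailer": ["Alfa", "Bela"],
    "Promotion Relevance (Cat)_Energy": [0.25, 0.75],
})
col = "Promotion Relevance (Cat)_Energy"

# Boolean mask -> Series with one entry per matching row.
matches = pivot.loc[pivot.Retailer == "Bela", col]

first = matches.iat[0]   # first match, by position
only = matches.item()    # raises ValueError unless exactly one row matched

# .at takes a single row label, so make Retailer the index first.
by_label = pivot.set_index("Retailer").at["Bela", col]

print(first, only, by_label)
```

`.item()` is the stricter choice if a second matching row would mean something is wrong with the data.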

serene scaffold Oct 11, 2022, 4:16 PM

#

worldly wren I got the number of values in a single row

I'm not sure what you mean.

worldly wren Oct 11, 2022, 4:16 PM

#

Hold on

#

Leme show you

serene scaffold Oct 11, 2022, 4:16 PM

#

worldly wren Leme show you

Please do print(series.head().to_dict()) and put the text in the chat. I will not accept a screenshot.

#

series is whatever the NewHash column is.

worldly wren Oct 11, 2022, 4:17 PM

#

Okay

digital folio Oct 11, 2022, 4:18 PM

#

it worked

worldly wren Oct 11, 2022, 4:19 PM

#

{0: ['#finance', '#money', '#business', '#investing', '#investment', '#trading', '#stockmarket', '#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#dataanalyst', '#amankharwal', '#thecleverprogrammer'], 1: ['#healthcare', '#health', '#covid', '#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#dataanalyst', '#amankharwal', '#thecleverprogrammer'], 2: ['#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#deeplearning', '#machinelearningprojects', '#datascienceprojects', '#amankharwal', '#thecleverprogrammer', '#machinelearningmodels'], 3: ['#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#pythonlearning', '#pythondeveloper', '#pythoncoding', '#pythonprogrammer', '#amankharwal', '#thecleverprogrammer', '#pythonprojects'], 4: ['#datavisualization', '#datascience', '#data', '#dataanalytics', '#machinelearning', '#dataanalysis', '#artificialintelligence', '#python', '#datascientist', '#bigdata', '#deeplearning', '#dataviz', '#ai', '#analytics', '#technology', '#dataanalyst', '#programming', '#pythonprogramming', '#statistics', '#coding', '#businessintelligence', '#datamining', '#tech', '#business', '#computerscience', '#tableau', '#database', '#thecleverprogrammer', '#amankharwal']}

serene scaffold Oct 11, 2022, 4:19 PM

#

thank you, one moment

serene scaffold Oct 11, 2022, 4:21 PM

#

worldly wren `{0: ['#finance', '#money', '#business', '#investing', '#investment', '#trading'...

```
In [5]: s.explode().value_counts()
Out[5]:
#thecleverprogrammer        5
#amankharwal                5
#pythonprojects             5
#pythonprogramming          5
...
#stockmarket                1
#trading                    1
#investment                 1
#investing                  1
#database                   1
dtype: int64
```

Is this different from what you wanted? Do you need the value counts per row, rather than overall?

worldly wren Oct 11, 2022, 4:21 PM

#

overall

#

but this format is exactly what I want

serene scaffold Oct 11, 2022, 4:22 PM

#

great, so we did it meow_party

worldly wren Oct 11, 2022, 4:22 PM

#

well you did it

#

how do I get this though

#

oh

#

okay got it

serene scaffold Oct 11, 2022, 4:22 PM

#

```
2    [#data, #datascience, #dataanalysis, #dataanal...
3    [#python, #pythonprogramming, #pythonprojects,...
4    [#datavisualization, #datascience, #data, #dat...
dtype: object

In [11]: s.explode()
Out[11]:
0                #finance
0                  #money
0               #business
0              #investing
0             #investment
             ...
4        #computerscience
4                #tableau
4               #database
4    #thecleverprogrammer
4            #amankharwal
Length: 98, dtype: object
```

worldly wren Oct 11, 2022, 4:22 PM

#

leme try

serene scaffold Oct 11, 2022, 4:23 PM

#

We can also clip the hashtag, if you don't want that.

```
In [14]: s.explode().str[1:].value_counts()
Out[14]:
thecleverprogrammer        5
amankharwal                5
pythonprojects             5
pythonprogramming          5
python                     5
pythoncode                 4
```

worldly wren Oct 11, 2022, 4:24 PM

#

since I am going to make a graph out of em

#

I would prob need it

serene scaffold Oct 11, 2022, 4:24 PM

#

you can use the .str accessor to do string methods to every element at once.
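For example, on toy data rather than the real column: `.str[1:]` slices every string at once, and `.str.lstrip("#")` calls a string method on every element (the two only differ for tags that don't start with exactly one `#`). `value_counts` is a Series method, so it is called on the result, not on `.str`.

```python
import pandas as pd

# Toy hashtag lists, exploded to one string per element.
tags = pd.Series([["#python", "#ai"], ["#python"]]).explode()

no_hash = tags.str[1:]               # slice every string at once
also_no_hash = tags.str.lstrip("#")  # or call a string method on every element

# .str only holds string methods; value_counts is called on the resulting Series.
counts = no_hash.value_counts()
print(counts)
```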

worldly wren Oct 11, 2022, 4:27 PM

#

no attributes called value_counts

serene scaffold Oct 11, 2022, 4:27 PM

#

worldly wren no attributes called value_counts

show code

worldly wren Oct 11, 2022, 4:27 PM

#

print(x.explode().str.value_counts())

serene scaffold Oct 11, 2022, 4:28 PM

#

worldly wren `print(x.explode().str.value_counts())`

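you wrote .str.value_counts(), not .str[1:].value_counts(). value_counts is a Series method, so .str on its own doesn't have it.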

worldly wren Oct 11, 2022, 4:28 PM

#

I wanted the whole thing

#

so I was not supposed to do that?

serene scaffold Oct 11, 2022, 4:28 PM

#

then remove the .str part entirely

worldly wren Oct 11, 2022, 4:28 PM

#

oh okay

#

```
#thecleverprogrammer      117
#amankharwal              117
#python                   109
#machinelearning           97
#pythonprogramming         95
                         ...
#bigdataanalytics           1
#qrcodes                    1
#datascienceinterview       1
#facebook                   1
#boxplots                   1
Name: Hashtags, Length: 164, dtype: int64
```

#

Did it , all thanks to you

serene scaffold Oct 11, 2022, 4:29 PM

#

worldly wren Did it , all thanks to you

you shall surely progress in the pandas arts

worldly wren Oct 11, 2022, 4:30 PM

#

well thank you

#

@serene scaffold Very sorry to bother you again

#

but just wanted to confirm one thing

#

the command we used made a new series right?

#

nvm thats a dumb question

serene scaffold Oct 11, 2022, 4:39 PM

#

worldly wren nvm thats a dumb question

no it's not

serene scaffold Oct 11, 2022, 4:40 PM

#

worldly wren the command we used made a new series right?

yes, pretty much all pandas operations return new objects
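A quick way to check this yourself on toy data: the exploded result is a separate object, and the original Series is left untouched, so keep the result by assigning it to a name. (A few methods accept `inplace=True`, which is why it's "pretty much" all rather than all.)

```python
import pandas as pd

s = pd.Series([["#a", "#b"], ["#a"]])
exploded = s.explode()  # returns a new Series

# The original Series still holds two rows of lists.
print(len(s), len(exploded))
```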

weary mountain Oct 11, 2022, 4:44 PM

#

@worldly wren

worldly wren Oct 11, 2022, 4:44 PM

#

yeah?

weary mountain Oct 11, 2022, 4:44 PM

#

Hello

worldly wren Oct 11, 2022, 4:44 PM

#

hey

weary mountain Oct 11, 2022, 4:45 PM

#

It's my cat

#

To make your day better

worldly wren Oct 11, 2022, 4:45 PM

#

Thanks a lot man

weary mountain Oct 11, 2022, 4:45 PM

#

Mhm

worldly wren Oct 11, 2022, 4:45 PM

#

really needed a cute picture of a cat

#

but my day is made

#

I managed to get to a solution of something I have been trying to do for 2 days now

weary mountain Oct 11, 2022, 4:46 PM

#

Here comes woofer to help with your stress

serene scaffold Oct 11, 2022, 4:46 PM

#

weary mountain It's my cat

I love your kitty. but cat pics should go in one of the off-topic channels

weary mountain Oct 11, 2022, 4:46 PM

#

Ok

gloomy anvil Oct 11, 2022, 5:27 PM

#

hello yall! Can anyone of you tell me why statsmodels VECM.predict() returns sometimes float64 and sometimes complex128 arrays?

#

I don't understand what this triggers and statsmodels documentation does not say anything about this like always :/

#

also it seems to be random which prediction is complex128 and which is float64. not like "every third pred is X" or sth.

alpine folio Oct 11, 2022, 6:02 PM

#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

lapis sequoia Oct 11, 2022, 6:07 PM

#

is anyone familiar with RandomForestClassifier

fringe anvil Oct 11, 2022, 6:11 PM

#

desert oar <@206985846740615168> <@725199313549918228> it's unfortunately a disservice to s...

Thanks! I'm at work, but as soon as I get home I'll have a look

rough magnet Oct 11, 2022, 6:20 PM

#

lunar wharf Hey there! Whats a good free resource to learn python for data science / ML ? So...

I did also find https://machinelearningmastery.com/start-here/

tame ocean Oct 11, 2022, 6:26 PM

#

I just started with AI and I want to know how models work?

gloomy anvil Oct 11, 2022, 6:37 PM

#

tame ocean I just started with AI and I want to know how models work?

it's simple. you usually take exemplary data and pass it to a model (e.g. neural network):
2, 3 -> 5
3, 4 -> 7
6, 2 -> 8
....
while training the model with such data, it will learn in this case to add the first two numbers to find the desired output (here: 5,7,8). Now you take new data that the model has not seen yet to test if it really works:

#

4, 2 -> 6! Congrats you trained a model to add up numbers 🙂
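The same idea in a few lines, with one swap: instead of a neural network, this sketch fits plain linear least squares, a much simpler model chosen only because the example is so small. It learns weights from the three examples above and then predicts an unseen pair.

```python
import numpy as np

# The three training examples from the message above.
X = np.array([[2, 3], [3, 4], [6, 2]], dtype=float)
y = np.array([5, 7, 8], dtype=float)

# Fit y ≈ X @ w by least squares; for this data the best weights are [1, 1].
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Test on an input the model has not seen.
prediction = np.array([4.0, 2.0]) @ w
print(w.round(6), round(prediction, 6))
```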

tame ocean Oct 11, 2022, 6:38 PM

#

like how do the layers work tho?

#

@gloomy anvil

gloomy anvil Oct 11, 2022, 6:40 PM

#

tame ocean like how do the layers work tho?

each neuron takes the sum of its inputs times their weights, adds a bias, and then an activation function decides what gets passed on (e.g. ReLU passes the result to the next layer only if it's above zero). the outputs of one layer become the inputs of the next

#

if you are interested just watch a video on youtube on how neural nets work. it's probably easier to see some graphics than to explain it here via text
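A minimal sketch of that layer computation in numpy, assuming ReLU as the activation; the weights and biases are made-up numbers, whereas a real network learns them during training.

```python
import numpy as np

def relu(z):
    # Pass positive values through unchanged, replace negatives with 0.
    return np.maximum(z, 0.0)

def dense(inputs, weights, bias):
    # Weighted sum of the inputs for every neuron, plus that neuron's bias, then the activation.
    return relu(inputs @ weights + bias)

x = np.array([1.0, 2.0])                   # 2 input values
W1 = np.array([[0.5, -1.0], [0.25, 0.5]])  # 2 inputs -> 2 hidden neurons
b1 = np.array([0.0, -0.5])
W2 = np.array([[2.0], [3.0]])              # 2 hidden neurons -> 1 output
b2 = np.array([0.5])

hidden = dense(x, W1, b1)       # the outputs of this layer...
output = dense(hidden, W2, b2)  # ...are the inputs of the next one
print(hidden, output)
```

The second hidden neuron's weighted sum plus bias comes out negative, so ReLU passes 0 for it.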

tame ocean Oct 11, 2022, 6:49 PM

#

k

#

ty

plush jungle Oct 11, 2022, 6:50 PM

#

are there techniques to prevent your discriminator from learning faster than your generator in GANs?

#

my generator loss is almost always higher, and I'm told that ideally they should stay about equal until the generator starts to gradually get below .5 and the discriminator should end up at .5

desert oar Oct 11, 2022, 7:50 PM

#

worldly wren `#thecleverprogrammer 117 #amankharwal 117 #python ...

good job! note that this is the total number of times each hashtag appears, not the number of unique rows it appears in. that is, if a hashtag appears twice in the same row, it will be double counted.
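If you want the number of rows each hashtag appears in instead, one way is to drop duplicates within each list before exploding. A small sketch on toy data (the second list repeats `#pythonprojects`, just as row 3 of the sample above does):

```python
import pandas as pd

# Toy data shaped like the NewHash column: each element is a list of hashtags.
s = pd.Series([
    ["#python", "#pythonprojects", "#amankharwal"],
    ["#python", "#pythonprojects", "#pythonprojects"],
])

# Total occurrences: a tag repeated within one row is counted twice.
total_counts = s.explode().value_counts()

# Rows containing each tag: dict.fromkeys drops repeats within a row, keeping order.
row_counts = s.map(lambda tags: list(dict.fromkeys(tags))).explode().value_counts()

print(total_counts["#pythonprojects"], row_counts["#pythonprojects"])
```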

plush jungle Oct 11, 2022, 7:59 PM

#

ok what exactly is going on here?

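```py
import torch.nn as nn

# `opt` is the script's parsed command-line options (opt.channels, opt.img_size).

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            # Each block is a stride-2 conv, so it halves the image height and width.
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        # Each discriminator_block returns a list of layers; * unpacks them into Sequential.
        self.model = nn.Sequential(
            *discriminator_block(opt.channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image: four stride-2 blocks -> divide by 2 ** 4.
        # If the number of blocks changes, this exponent (and the 128 below, if the
        # last block's out_filters changes) must change to match.
        ds_size = opt.img_size // 2 ** 4
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1), nn.Sigmoid())

    def forward(self, img):
        out = self.model(img)
        out = out.view(out.shape[0], -1)
        validity = self.adv_layer(out)

        return validity
```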

#

this is not the usual way I see neural net layers defined

#

I saw a thing on the internet that suggested dumbing down the discriminator by removing a hidden layer

#

but every time I try to change any of the discriminator_block() lines it throws the following error

```
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x32768 and 8192x1)
```

#

how can I remove a hidden layer from this?
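Why the error appears: each `discriminator_block` is a stride-2 convolution that halves the height and width, but `ds_size = opt.img_size // 2 ** 4` is hardcoded for exactly four blocks, so the `nn.Linear` input size stops matching once a block is removed. A small pure-Python sketch of the shape arithmetic, assuming `opt.img_size` is 128 (that is what the 8192 in the error implies; the leading 128 in `128x32768` is presumably the batch size). The helper names are made up for illustration.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    # Conv2d output size with dilation 1; matches nn.Conv2d(in, out, 3, 2, 1) above.
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(img_size, channels):
    # channels: out_filters of each discriminator_block, in order.
    size = img_size
    for _ in channels:
        size = conv_out(size)  # every block halves height and width
    return channels[-1] * size * size

# Four blocks, as in the original code: what nn.Linear was built for.
print(flat_features(128, [16, 32, 64, 128]))  # 8192
# One block removed, last block still outputs 128 channels: what actually arrives.
print(flat_features(128, [16, 32, 128]))      # 32768
```

So after removing a block, the exponent in `ds_size` has to match the new block count (e.g. `2 ** 3` for three blocks), and the `128` in `nn.Linear(128 * ds_size ** 2, 1)` has to be the last block's `out_filters`.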

merry ridge Oct 11, 2022, 8:18 PM

#

Is there a way to vectorize this in some sensible way? I have a vector of random numbers, call it x. I need to create a numpy 1d array so that the ith element is a number drawn from a probability distribution with parameters that are a function of the ith element of x.

#

So for example, something like this:

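```py
import numpy as np
import scipy.stats

x = scipy.stats.norm(0, 1).rvs(10**5)
y = np.zeros(10**5)
for ind in range(x.shape[0]):
    # draw one sample whose mean and scale depend on x[ind] (the scale must be positive)
    y[ind] = scipy.stats.norm(x[ind], abs(x[ind]) / 2).rvs()
```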

except not glacially slow.

desert oar Oct 11, 2022, 8:23 PM

#

merry ridge So for example, something like this: ``` x = scipy.stats.norm(0,1).rvs(10**5) y...

is norm.rvs not vectorized over mean and variance?

merry ridge Oct 11, 2022, 8:24 PM

#

I don't think so, but it doesn't hurt to try. Let me see

#

it does not seem to work unless I am misunderstanding the syntax

desert oar Oct 11, 2022, 8:25 PM

#

```py
import numpy as np
import scipy.stats

x = scipy.stats.norm(0, 1).rvs(10**5)
y = scipy.stats.norm.rvs(x, np.abs(x) / 2, size=10**5)
```

or whatever the exact setup is

#

also numpy random is probably faster

#

scipy uses this object oriented framework that involves a lot of indirection internally

#

and i would be very very surprised if numpy rng norm was not vectorized over mean and variance

#

https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html
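A minimal sketch of the numpy version: `Generator.normal` broadcasts `loc` and `scale` element-wise, so one call replaces the whole loop. Taking `abs()` of the scale is an assumption added here, because a standard deviation cannot be negative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

x = rng.standard_normal(10**5)
# loc and scale are broadcast element-wise: y[i] ~ Normal(x[i], |x[i]| / 2).
# abs() is an assumption, because a standard deviation cannot be negative.
y = rng.normal(loc=x, scale=np.abs(x) / 2)
print(y.shape)
```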

merry ridge Oct 11, 2022, 8:28 PM

#

I'll check it out and swap if that works. I can't recall why I'm using scipy; when I brought it up, I was told that unless the reason is considerable, the speed increase from that swap isn't worth it because "the engineers want it this way".

desert oar Oct 11, 2022, 8:29 PM

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.rvs.html#scipy.stats.rv_continuous.rvs should be vectorized

#

numpy is usually simpler, scipy is good if you like the OO interface or want to reuse the object representing a specific distribution repeatedly

#

with vectorization both should be "fast enough"

merry ridge Oct 11, 2022, 8:36 PM

#

I think I must be just doing something weird with the syntax because your links suggest I can do it. That's helpful thank you

hasty mountain Oct 11, 2022, 8:40 PM

#

2 quick questions:
-> In a translation model, translating sentences is better than translating each word inside a sentence, right?
-> If so, then each sentence will be assigned to a token, right? I'll have a single value for an entire sentence, no matter how big that sentence is?

#

Oh...now I think I get it... I'll have to tokenize each word, but the input will be the entire sentence. So each sentence will be a sequence of tokens...
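A minimal sketch of that idea with a naive whitespace word tokenizer (an assumption for illustration; real translation models typically use subword tokenizers). Each word gets an integer id, and a sentence becomes a list of ids whose length varies with the sentence, rather than a single value. `build_vocab` and `encode` are hypothetical helper names.

```python
def build_vocab(sentences):
    # Id 0 is reserved for words never seen while building the vocabulary.
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))  # new words get the next free id
    return vocab

def encode(sentence, vocab):
    # One id per word, so the sequence is as long as the sentence.
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]

vocab = build_vocab(["the cat sat", "the dog sat"])
ids = encode("the dog ran", vocab)
print(vocab, ids)
```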

forest quartz Oct 11, 2022, 8:48 PM

#

hasty mountain 2 quick questions: -> In a translation model, translating sentences is better th...

1. Yes
2. I don't think so, because the model tokenizes each word, no? I'm not sure

hasty mountain Oct 11, 2022, 8:48 PM

#

forest quartz 1. Yes 2. i dont think so because the model tokenize each word no? I'm not sure

Thanks! I'll see what I can do, then

forest quartz Oct 11, 2022, 8:49 PM

#

some models require you to tokenize the words yourself, but if you use a ready-made/pretrained model from Hugging Face you can just feed in the whole sentence and you're done

hasty mountain Oct 11, 2022, 8:50 PM

#

Meh. The fun part is doing it all by myself...
even if it's through copying someone else's code

forest quartz Oct 11, 2022, 8:51 PM

#

yeah it's fun to code from scratch, but those big-ass models are fun to play with too

merry ridge Oct 11, 2022, 8:52 PM

#

desert oar https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.r...

I got it working now thanks again for your help.

meager forge Oct 11, 2022, 9:30 PM

#

Is there any way to assess the quality of audio?
To know whether it has disturbance in the sound
The audio is basically voice recordings

desert oar Oct 11, 2022, 11:23 PM

#

meager forge Is there any way to assess the quality of audio? To know wether it has disturban...

didn't someone already help you with this earlier today?

#

@RenegadeZed#4600 one more thing: if you feel like you are lacking in core intuition (many of us are), the 3blue1brown "essence of" video sequences are excellent

#

you won't learn the mechanical equation stuff, but you will probably come away with a much richer intuition than you had before

plush jungle Oct 11, 2022, 11:36 PM

#

meager forge Is there any way to assess the quality of audio? To know wether it has disturban...

do you have a database of voice recordings without any disturbance?

stoic compass Oct 12, 2022, 12:30 AM

#

Hey! I just wrote a data analysis project using Python on Jupyter Notebook and I really want someone to help me with a short review of it. Would you be up for this?

#

This is my first project and I want to get a second perspective from someone with more experience.

serene scaffold Oct 12, 2022, 12:35 AM

#

stoic compass Hey! I just wrote a data analysis project using Python on Jupyter Notebook and I...

if you put all the code in the paste bin (you'll have to copy and paste the code in each cell individually), I can look over it.

#

!paste

arctic wedgeBOT Oct 12, 2022, 12:35 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

stoic compass Oct 12, 2022, 12:36 AM

#

It is quite big, I don't think it will be readable here. 😦

serene scaffold Oct 12, 2022, 12:37 AM

#

stoic compass It is quite big, I don't think it will be readable here. 😦

that's why I told you to put it in the paste bin.

#

if it's too big to share, how were you expecting to get it reviewed?

stoic compass Oct 12, 2022, 12:38 AM

#

All right. I will try it later and ping you. Thanks a lot!

serene scaffold Oct 12, 2022, 12:41 AM

#

stoic compass All right. I will try it later and ping you. Thanks a lot!

Sure. Next time you want people to look at something, be it in the context of a question or whatever else, make sure that everything is available all at once. Don't ask people to commit to answering your question, or looking at your code, before sharing it.

#

If you had shared the code in your first message, I would be reviewing it right now. Now we're just wasting our time.

stoic compass Oct 12, 2022, 12:42 AM

#

That is for sure, thanks for your advice, I will keep this in mind.

lapis sequoia Oct 12, 2022, 2:03 AM

#

noob here, is it possible to use datetime dtype for training a ML model?

#

how should i approach the idea that the machine should pay attention (via feature selection) that the date or year column is pretty relevant considering

serene scaffold Oct 12, 2022, 2:15 AM

#

lapis sequoia noob here, is it possible to use datetime dtype for training a ML model?

what kind of model? please be specific.

lapis sequoia Oct 12, 2022, 2:15 AM

#

random tree

serene scaffold Oct 12, 2022, 2:16 AM

#

lapis sequoia random tree

are you sure you don't mean random forest?

lapis sequoia Oct 12, 2022, 2:16 AM

#

sorry, yes i mean random forest

#

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

serene scaffold Oct 12, 2022, 2:16 AM

#

what's the first and last year in the data?

lapis sequoia Oct 12, 2022, 2:17 AM

#

min() is 1930, max() is 2014

serene scaffold Oct 12, 2022, 2:19 AM

#

lapis sequoia min() is 1930, max() is 2014

how precise are the timestamps? to the day? hour?

lapis sequoia Oct 12, 2022, 2:19 AM

#

1930-2-15

serene scaffold Oct 12, 2022, 2:20 AM

#

lapis sequoia `1930-2-15`

what is the data? like what does each row represent?

lapis sequoia Oct 12, 2022, 2:20 AM

#

each row represents a football match

#

each row contains a date column, among other attributes ofc

serene scaffold Oct 12, 2022, 2:23 AM

#

lapis sequoia each row represents a football match

does the date actually have anything to do with what the model is supposed to learn?

serene scaffold Oct 12, 2022, 2:23 AM

#

lapis sequoia each row contains a date column, among other attributes ofc

> among other attributes ofc
I have no idea what they are unless you tell me

lapis sequoia Oct 12, 2022, 2:24 AM

#

serene scaffold does the date actually have anything to do with what the model is supposed to le...

i mean tbh idk -- maybe?

serene scaffold Oct 12, 2022, 2:25 AM

#

lapis sequoia i mean tbh idk -- maybe?

then I wouldn't use it.

lapis sequoia Oct 12, 2022, 2:25 AM

#

serene scaffold > among other attributes ofc I have no idea what they are unless you tell me

home team score, away team score, penalty kicks, home team id, away team id

serene scaffold Oct 12, 2022, 2:26 AM

#

lapis sequoia home team score, away team score, penalty kicks, home team id, away team id

what is the model supposed to learn?

lapis sequoia Oct 12, 2022, 2:27 AM

#

learn the different teams based on the attributes provided, ultimately to predict match outcomes (win/lose/draw)

serene scaffold Oct 12, 2022, 2:28 AM

#

Great. So far, you've said what your model is, what your features are, and what the model is supposed to learn. Next time you have a question, please say all of these things in your first message.

serene scaffold Oct 12, 2022, 2:28 AM

#

lapis sequoia learn the different teams based on the attributes provided, ultimately to predic...

isn't the winner whichever team has the higher score?

lapis sequoia Oct 12, 2022, 2:30 AM

#

well, the reason i ask is simply because year might have statistical significance tied to it. as we get closer to modernity, the recurring national teams should have more statistical advantage than non-recurring national teams. if that makes sense?

serene scaffold Oct 12, 2022, 2:31 AM

#

> statistical significance
this term has a specific meaning that isn't the one that you meant.

#

anyway

lapis sequoia Oct 12, 2022, 2:31 AM

#

noted

serene scaffold Oct 12, 2022, 2:32 AM

#

obviously, if a team has existed from 1930 all the way to the 2000s, then the players who compose that team have changed. so, can we assume that for any team, the players who compose that team are the same within a calendar year?

lapis sequoia Oct 12, 2022, 2:33 AM

#

i think that's a fair assumption

serene scaffold Oct 12, 2022, 2:34 AM

#

and do we know which teams have faster rates of turnover?

lapis sequoia Oct 12, 2022, 2:34 AM

#

hmm we do not, but i think if i did have that rate, i would certainly use it as a feature

#

ofc

serene scaffold Oct 12, 2022, 2:37 AM

#

it doesn't sound to me like you have enough features to do anything interesting. if you pick two teams, and ask "which is more likely to win", you can just pick whichever wins more often. if there are considerations other than whichever wins more, you don't really have features for that.

lapis sequoia Oct 12, 2022, 2:39 AM

#

you are exactly right, which is precisely the understanding that i have come to when approaching future models to train

serene scaffold Oct 12, 2022, 2:39 AM

#

hmm, I actually have an idea. do you know about time series forecasting?

lapis sequoia Oct 12, 2022, 2:42 AM

#

i am just entering the machine learning universe, so sadly no

serene scaffold Oct 12, 2022, 2:44 AM

#

> so sadly no
don't look at what you don't know as a negative. think of it as another thing you get to learn

lapis sequoia Oct 12, 2022, 2:44 AM

#

but for my original question: in either general cases or specifically random forest classifier cases, can we designate datetime dtypes as features? i ask only because i understand machine learning models can only accept int and float dtypes (atm)

serene scaffold Oct 12, 2022, 2:45 AM

#

in "normal" ML, the order of the observations (ie, the rows of data) doesn't really matter. but for time series stuff, the order is taken into account.

lapis sequoia Oct 12, 2022, 2:46 AM

#

Time series forecasting means to forecast or to predict the future value over a period of time. It entails developing models based on previous data and applying them to make observations and guide future strategic decisions. The future is forecast or estimated based on what has already happened.

#

whoa that sounds very interesting

serene scaffold Oct 12, 2022, 2:46 AM

#

lapis sequoia but for my original question: in either general cases or specifically random for...

in general, you would want to decide what level of precision you need (years, months, days, etc) and encode the time as an int of that many of that unit of time

#

so if you decide that you need to be as precise as months, and your time range starts at 1 January, 1970, and you want to encode 7 February, 1971, you would encode that as 14, because individual days don't matter, and February 1971 is the 14th month in the data.
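A small sketch of that month encoding with pandas, using the 1-based convention from the example (the start month is 1); the function name `months_since` is made up for illustration.

```python
import pandas as pd

def months_since(dates, start="1970-01-01"):
    # 1-based month index: the start month is 1, the next month is 2, and so on.
    # Days are ignored, so only the year and month of each date matter.
    dates = pd.Series(pd.to_datetime(dates))
    start = pd.Timestamp(start)
    return (dates.dt.year - start.year) * 12 + (dates.dt.month - start.month) + 1

print(months_since(["1970-01-01", "1971-02-07"]).tolist())
```

The second date is 7 February 1971, which comes out as 14, matching the example above.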

#

alternatively, if you're treating time as a sort of category (like the name of the month or the day of the week), you can one-hot encode those
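For example, a day-of-week label can be one-hot encoded with `pandas.get_dummies` (the dates below are arbitrary): each day name becomes its own 0/1 column, so the model gets no notion of which day comes after which.

```python
import pandas as pd

# Arbitrary dates: a Monday, a Friday and a Saturday in October 2022.
dates = pd.Series(pd.to_datetime(["2022-10-10", "2022-10-14", "2022-10-15"]))

# day_name() gives labels like "Monday"; get_dummies makes one column per label.
weekday = pd.get_dummies(dates.dt.day_name())
print(weekday)
```

Depending on the pandas version, the columns hold booleans or 0/1 integers; either works as model input.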

#

are you still with me? questions?

lapis sequoia Oct 12, 2022, 2:51 AM

#

i was just internalizing your points

#

but yes it makes sense

serene scaffold Oct 12, 2022, 2:51 AM

#

of course. take your time.

#

anyway, I don't know that you have enough features to do time series stuff, either. because if you had the turnover rate for each team, you could make a model that estimates how a team's turnover rate and past performance determines its future performance.

lapis sequoia Oct 12, 2022, 2:53 AM

#

so if i understand correctly your examples, the first example (month) requires ordering to be accounted for in the encoding process, while the other one is just for assigning an int encode?

serene scaffold Oct 12, 2022, 2:54 AM

#

lapis sequoia so if i understand correctly your examples, the first example (month) requires o...

the second one is just treating the month/day-of-week as a label. when you treat time things as a label, the model won't know which days or months come before or after each other. so you use that when you assume that certain things usually happen in/on certain days/months

lapis sequoia Oct 12, 2022, 2:55 AM

#

ok that makes total sense

serene scaffold Oct 12, 2022, 2:55 AM

#

for example, I used to work for Starbucks, and we knew that sales were higher on weekdays and on Friday especially, and that sales are especially low in July, and that sales are especially high in December.

#

and this is true week to week and year to year. so we don't really need to know that today is Friday, tomorrow is Saturday, and that Sunday comes afterward.

#

https://tenor.com/view/rebecca-black-friday-yesterday-was-thursday-cringe-pop-gotta-get-down-gif-19533718

#

did you love my dank reference?

lapis sequoia Oct 12, 2022, 2:58 AM

#

i have a follow up question about categorical dtypes, using the example of t-shirt sizes (S,M,L,XL)

serene scaffold Oct 12, 2022, 2:58 AM

#

yeah, you'd one hot encode those, because knowing that some of them are bigger than others doesn't really help you that much

iron basalt Oct 12, 2022, 2:59 AM

#

serene scaffold alternatively, if you're treating time as a sort of category (like the name of t...

This can also be used to do what the single int does. If your system wants binary inputs or 0 to 1, it can be used.

#

But ignoring the ordering can be useful as described.

#

Since you are doing Football I can give you the hint that you want injury data more than anything else.

serene scaffold Oct 12, 2022, 3:02 AM

#

iron basalt Since you are doing Football I can give you the hint that you want injury data m...

did fifa give you forbidden knowledge or something?

iron basalt Oct 12, 2022, 3:02 AM

#

(Not easy to get, that is very private information)

lapis sequoia Oct 12, 2022, 3:02 AM

#

serene scaffold yeah, you'd one hot encode those, because knowing that some of them are bigger t...

using another example, under the context of competition and sports, specifically high school sports: wouldn't you want to designate hierarchy with grade level (Freshman, Sophomore, Junior, Senior)?

iron basalt Oct 12, 2022, 3:02 AM

#

serene scaffold did fifa give you forbidden knowledge or something?

I have seen a lot of Football predictors.

#

It's something many want to try to do.

lapis sequoia Oct 12, 2022, 3:03 AM

#

i guess the specific sport im thinking of is wrestling (freshman vs sophomore)

iron basalt Oct 12, 2022, 3:03 AM

#

(And coach apps track injury data for maximum training efficiency too)

lapis sequoia Oct 12, 2022, 3:03 AM

#

i don't believe you'd want grade level to just be a label encode

#

idk i could be wrong totally

serene scaffold Oct 12, 2022, 3:04 AM

#

lapis sequoia using another example, under the context of competition and sports, specifically...

if you have a small set of categories, knowing that they're conceptualized in a certain order isn't really that helpful. and if you encode them as the integers 0 to 3, then depending on your model, you might get a prediction of 2.653, or something. and what are you going to do with that?

lapis sequoia Oct 12, 2022, 3:06 AM

#

i honestly dont know, maybe im overcomplicating this concept. i think i will experiment with the question using small pilot tests

serene scaffold Oct 12, 2022, 3:07 AM

#

lapis sequoia i honestly dont know, maybe im overcomplicating this concept. i think i will exp...

since your data is of limited use, I would take this opportunity to practice data manipulation. see if you can make a line plot to show each team's performance over the years

#

like what percentage of games they win each year

#

that information is available given the features you described, but you'd have to fiddle around with it.

iron basalt Oct 12, 2022, 3:12 AM

#

lapis sequoia i honestly dont know, maybe im overcomplicating this concept. i think i will exp...

As Stelercus wrote, plot some stuff, then try adding some more data if you have it (although not just any data randomly, try to pick something reasonable or it will just make it harder).

lapis sequoia Oct 12, 2022, 3:15 AM

#

@iron basalt since you are familiar with football predictions, do you have some examples to share off the top of your head? preferable beginner-friendly examples?

lapis sequoia Oct 12, 2022, 3:16 AM

#

serene scaffold since your data is of limited use, I would take this opportunity to practice dat...

and by extension, assign that "performance rating" to that year?

#

in a new column

iron basalt Oct 12, 2022, 3:18 AM

#

lapis sequoia <@119925597395877889> since you are familiar with football predictions, do you h...

You are using this dataset? https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017

serene scaffold Oct 12, 2022, 3:23 AM

#

lapis sequoia and by extension, assigned that "performance rating" to that year?

the performance rating would just be the number of wins divided by the number of games
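To store that ratio as a new column, you can use `groupby(...).transform`. It returns one value per original row, so every game row carries its team's rating for that year. This is a minimal sketch assuming a hypothetical layout with one row per team per game and a 0/1 `won` column.

```py
import pandas as pd

# Hypothetical layout: one row per (team, game); won is 1 for a win, else 0.
games = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021, 2021, 2021, 2021],
    "team": ["A", "A", "B", "A", "B", "B", "B"],
    "won":  [1, 0, 1, 1, 0, 0, 1],
})

by_team_year = games.groupby(["year", "team"])["won"]
# transform() broadcasts each group's result back onto that group's rows.
wins = by_team_year.transform("sum")
played = by_team_year.transform("count")
games["performance_rating"] = wins / played
print(games)
```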

lapis sequoia Oct 12, 2022, 3:26 AM

#

iron basalt You are using this dataset? https://www.kaggle.com/datasets/martj42/internationa...

not quite

iron basalt Oct 12, 2022, 3:26 AM

#

Stelercus brought this up, but teams are not static: they are made up of players who constantly move in and out, so without that data there is not much that can be done. @lapis sequoia

#

It's not a team name that wins, but a temporary group of people that wins.

#

(And this gets really complicated, because while the nice story is that there are star players who carry teams (which can still happen), the combinations of players are key in ways that are not obvious: two star players who do well on their own teams might fail when put on the same team.)

lapis sequoia Oct 12, 2022, 3:33 AM

#

iron basalt (And this gets really complicated, because while the nice story is that there ar...

these are very good points that i didn't think about

#

thank you both for entertaining my curious questions, i will leave you be

iron basalt Oct 12, 2022, 4:10 AM

#

serene scaffold https://tenor.com/view/rebecca-black-friday-yesterday-was-thursday-cringe-pop-go...

Sigh, now I have this robotic voice stuck in my head.

nova meadow Oct 12, 2022, 4:13 AM

#

Hi everyone,
I have been working on a project where I have to extract text from images, and I am using pytesseract for that. I am currently working on preprocessing and have used basic transformations: binarization, then dilation followed by erosion. It works well on some of the images, but on others it does not even detect the text. Can anyone suggest how to get better results?
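For reference, "dilation followed by erosion" is a morphological closing, which fills small gaps in foreground strokes. Below is a minimal NumPy-only sketch on a boolean mask where True marks text pixels; with OpenCV the usual call is `cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)`. OpenCV's morphology acts on nonzero (white) pixels, so for black text on white paper you would invert first. Otherwise the same operation can erase thin dark strokes instead of closing gaps in them. `sliding_window_view` needs NumPy 1.20 or later.

```py
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilate(mask, k=3):
    """A pixel becomes True if any pixel in its k x k neighborhood is True."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    return sliding_window_view(padded, (k, k)).max(axis=(-2, -1))


def erode(mask, k=3):
    """A pixel stays True only if every pixel in its k x k neighborhood is True."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=True)
    return sliding_window_view(padded, (k, k)).min(axis=(-2, -1))


def close(mask, k=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, k), k)


# A horizontal "stroke" of text pixels with a one-pixel break at column 5.
stroke = np.zeros((5, 11), dtype=bool)
stroke[2, 2:9] = True
stroke[2, 5] = False

closed = close(stroke)
print(closed[2].astype(int))  # → [0 0 1 1 1 1 1 1 1 0 0]
```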

iron basalt Oct 12, 2022, 4:29 AM

#

nova meadow Hi everyone, I have been working on a project where I have to extract text from...

What is different about the images it works on and the ones it does not?

nova meadow Oct 12, 2022, 4:35 AM

#

Images where the text is black on a white background, as in books and papers, give really good results; however, when the text is white, it isn't able to detect it.
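This fits Tesseract's own guidance: its documentation on improving output quality recommends dark text on a light background for version 4 and later. Below is a minimal NumPy sketch that flips a 0/255 binarized image when it looks like light text on a dark background. The rule "most pixels are background" is an assumption of this sketch, not a Tesseract feature. When only one block of the page has light text, you would crop that block and apply the check there instead of to the whole image.

```py
import numpy as np


def to_dark_text_on_light(binary):
    """Invert a 0/255 image if it looks like light text on a dark background.

    Assumption: the background covers most of the image, so if most pixels
    are dark, the polarity is probably flipped.
    """
    if (binary < 128).mean() > 0.5:
        return 255 - binary
    return binary


# Synthetic example: white "text" stroke on a black background.
img = np.zeros((10, 20), dtype=np.uint8)
img[4:6, 3:17] = 255

fixed = to_dark_text_on_light(img)
print(fixed[0, 0], fixed[5, 10])  # → 255 0
```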

iron basalt Oct 12, 2022, 4:41 AM

#

nova meadow Images where the text is in black with white background, is giving really good r...

And what does the text look like after pre-processing?

nova meadow Oct 12, 2022, 4:42 AM

#

#

This is one of the images where it failed to detect the text on the upper left

iron basalt Oct 12, 2022, 4:51 AM

#

nova meadow

Some adaptive thresholding might do the trick.

nova meadow Oct 12, 2022, 4:53 AM

#

I'm using adaptive thresholding because global thresholding methods weren't giving good results

iron basalt Oct 12, 2022, 4:54 AM

#

nova meadow I'm using adaptive thresholding because global thresholding methods weren't givi...

Is it detecting the smaller text in the image?

nova meadow Oct 12, 2022, 4:55 AM

#

Yes, it is detecting that text

iron basalt Oct 12, 2022, 4:56 AM

#

What colors are in the block with the "6 Person" text?

iron basalt Oct 12, 2022, 4:58 AM

#

nova meadow I'm using adaptive thresholding because global thresholding methods weren't givi...

If you put it through some edge detection what does it look like?

nova meadow Oct 12, 2022, 4:58 AM

#

I have not tried edge detection yet.

iron basalt Oct 12, 2022, 4:59 AM

#

If the edge detection gives some nice text without the block around it and it still does not work, it could be a multiple scales issue.
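One reason edge detection helps here is that gradient magnitude does not care whether text is dark-on-light or light-on-dark. Below is a minimal NumPy sketch of a 3x3 Sobel gradient magnitude. It is a stand-in for OpenCV's `cv2.Sobel` or `cv2.Canny`, and this sketch does not address the multiple-scales issue.

```py
import numpy as np


def sobel_magnitude(gray):
    """Gradient magnitude with 3x3 Sobel kernels; borders are edge-replicated."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)


dark_on_light = np.full((8, 8), 255, dtype=np.uint8)
dark_on_light[3:5, 2:6] = 0
light_on_dark = 255 - dark_on_light

# Inverting the image only flips the sign of the gradients, not their magnitude.
same = np.allclose(sobel_magnitude(dark_on_light), sobel_magnitude(light_on_dark))
print(same)  # → True
```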

nova meadow Oct 12, 2022, 5:00 AM

#

Alright. I will try this. Thanks for the inputs 🙂

iron basalt Oct 12, 2022, 5:04 AM

#

nova meadow Alright. I will try this. Thanks for the inputs 🙂

Try messing with your adaptive threshold parameters too first.

#

(block size and constant subtracted)
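Here is what those two knobs do. `cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, blockSize, C)` sets a pixel to 255 when it is brighter than the mean of its blockSize x blockSize neighborhood minus C, and to 0 otherwise. Below is a NumPy-only sketch of that rule on a synthetic image. Border handling and rounding may differ slightly from OpenCV.

```py
import numpy as np


def adaptive_mean_threshold(gray, block_size, c):
    """Binarize with a local threshold: 255 where pixel > (local mean - c), else 0.

    Sketch of OpenCV's ADAPTIVE_THRESH_MEAN_C + THRESH_BINARY rule, not a drop-in copy.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be odd and >= 3")
    r = block_size // 2
    g = gray.astype(np.float64)
    padded = np.pad(g, r, mode="edge")  # replicate borders
    # Integral image with a leading zero row/column, so any window sum is 4 lookups.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w, k = g.shape[0], g.shape[1], block_size
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    return np.where(g > local_mean - c, 255, 0).astype(np.uint8)


# Synthetic page: brightness ramps from 50 to 200 left to right, with one dark stroke.
gray = np.tile(np.linspace(50, 200, 40), (40, 1))
gray[20, 5:35] -= 40
gray = gray.astype(np.uint8)

binary = adaptive_mean_threshold(gray, block_size=11, c=10)
print(binary[20, 10:30].max(), binary[5].min())  # → 0 255
```

A larger `c` pushes more pixels to white (less noise, but faint strokes can disappear). `block_size` should be large enough to cover a stroke plus some background. If the window sits entirely inside a thick stroke, the stroke's pixels are compared with each other and the letter comes out hollow, which is the edge-detection-like behavior of small block sizes.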

nova meadow Oct 12, 2022, 5:06 AM

#

Yes, exactly. I did that because a small change in those was giving pretty different results. I was also looking for a way to figure out the optimal block size, but could not find one.

iron basalt Oct 12, 2022, 5:07 AM

#

When the block size is small it can act kind of like edge detection.

iron basalt Oct 12, 2022, 5:12 AM

#

nova meadow Yes, exactly. I did that because a small change in those was giving pretty diffe...

Did you do any blurring?

#

(e.g. Gaussian blur)

nova meadow Oct 12, 2022, 5:12 AM

#

Yes, I tried median blurring but blurring was capturing unnecessary noise

iron basalt Oct 12, 2022, 5:13 AM

#

nova meadow Yes, I tried median blurring but blurring was capturing unnecessary noise

Try Gaussian.

#

The blur size relative to the threshold block size is something to consider.
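Below is a NumPy-only sketch of a separable Gaussian blur, standing in for `cv2.GaussianBlur(gray, (ksize, ksize), sigma)`. Border handling may differ slightly. Unlike a median filter, it averages each pixel with its neighbors using weights that fall off with distance.

On the size question: as the blur kernel approaches the threshold block size, each blurred pixel drifts toward its block's mean. The pixel-minus-mean difference then shrinks and `C` ends up doing most of the work. Keeping the blur well below the block size is a reasonable starting point, but it is a rule of thumb, not a guarantee.

```py
import numpy as np


def gaussian_kernel_1d(ksize, sigma):
    """Normalized 1-D Gaussian weights; ksize must be odd."""
    if ksize % 2 == 0:
        raise ValueError("ksize must be odd")
    x = np.arange(ksize) - ksize // 2
    w = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()


def gaussian_blur(gray, ksize=5, sigma=1.0):
    """Separable Gaussian blur: filter rows, then columns (edge-replicated borders)."""
    w = gaussian_kernel_1d(ksize, sigma)
    r = ksize // 2
    g = np.pad(gray.astype(np.float64), r, mode="edge")
    # "valid" convolution drops the padding again, so the output keeps the input shape.
    rows = np.apply_along_axis(lambda v: np.convolve(v, w, mode="valid"), 1, g)
    return np.apply_along_axis(lambda v: np.convolve(v, w, mode="valid"), 0, rows)


noisy = np.full((9, 9), 200.0)
noisy[4, 4] = 0.0  # a single dark speck

blurred = gaussian_blur(noisy, ksize=5, sigma=1.0)
print(blurred[4, 4])  # the speck is spread out and much lighter than 0
```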

nova meadow Oct 12, 2022, 5:14 AM

#

Alright. I will try that

iron basalt Oct 12, 2022, 5:15 AM

#

Also try the Gaussian-weighted adaptive threshold if you are currently using the mean one.

nova meadow Oct 12, 2022, 5:15 AM

#

And block size is somewhat relative to the size of the image, right? Other than that, I could not come up with any rule for choosing the block size.

iron basalt Oct 12, 2022, 5:16 AM

#

nova meadow And block size is somewhat relative to size of the image right? Other than that ...

How quickly the lighting varies.

#

(And image size)

nova meadow Oct 12, 2022, 5:17 AM

#

okaay

iron basalt Oct 12, 2022, 5:18 AM

#

nova meadow okaay

The regular thresholding does not handle such variations (it's globally, not locally, applied).

#

(Imagine what happens when block size equals the image size)
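Here is that thought experiment as a toy. With uneven lighting, one global cutoff cannot separate text from background on both sides of the image. The NumPy sketch below uses made-up pixel values and compares a single global threshold (the image mean) with a threshold computed per half, which is the crudest form of a local threshold. Growing the "local" region to the whole image turns it back into the global case.

```py
import numpy as np

# Toy page with uneven lighting: dim left half, bright right half (made-up values).
img = np.zeros((6, 12))
img[:, :6] = 60      # dim background
img[:, 6:] = 220     # bright background
img[3, 1:5] = 20     # dark text on the dim side
img[3, 7:11] = 160   # dark text on the bright side


def binarize(region, t):
    """255 where the pixel is brighter than t (background), 0 otherwise (text)."""
    return np.where(region > t, 255, 0)


# One global cutoff (about 134 here): the whole dim half turns black,
# and the bright-side text turns white, so both texts are lost.
global_bin = binarize(img, img.mean())

# Per-half cutoffs track each half's lighting: text is 0, background 255 on both sides.
local_bin = np.hstack([binarize(img[:, :6], img[:, :6].mean()),
                       binarize(img[:, 6:], img[:, 6:].mean())])
print(local_bin[3])
```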

nova meadow Oct 12, 2022, 5:24 AM

#

aaah okaay. Got it.

torpid quartz Oct 12, 2022, 5:24 AM

#

I have no idea what you are talking about but it sounds cool

old widget Oct 12, 2022, 6:05 AM

#

Can anyone help me use this API to get the population statistics for each city in Canada? I have gone through the docs so many times but I can't figure it out
https://www12.statcan.gc.ca/wds-sdw/2021profile-profil2021-eng.cfm

2021 Census Profile Web Data Service User Guide

desert oar Oct 12, 2022, 11:23 AM

#

old widget Can anyone help me use this API to get the population statistic for each city in...

this isn't a trivial document to read, it's definitely written for experienced programmers to use

#

it looks like you need to go through the docs and figure out what "flowRef" you need

olive vigil Oct 12, 2022, 11:43 AM

#

Probably easier just to download the data here rather than use the API: https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/prof/details/download-telecharger.cfm?Lang=E

Census Profile Downloads, 2021

This product presents information from the Census of Population for various levels of geography. Data are from the 2021 Census of Population and are available according to the major releases of the 2021 Census release dates: February 9, 2022 – Population and dwelling counts; April 27, 2022 – Age, Sex at birth and gender, Type of dwelling; July 1...

maiden widget Oct 12, 2022, 12:25 PM

#

Is there any library that recognises only letters of the alphabet from audio, offline (i.e. without an API)?

I tried VOSK, but since it tries to recognise whole words and sentences, it makes lots of errors.

I only want letter recognition.

serene scaffold Oct 12, 2022, 1:18 PM

#

What does it need to do

lapis sequoia Oct 12, 2022, 1:35 PM

#

i don't really understand the difference between data science and machine learning. could someone explain it to me?

serene scaffold Oct 12, 2022, 1:48 PM

#

lapis sequoia i don't really understand the difference between data science and machine learni...

data science is basically just knowing how to analyze data with code. machine learning is where you have programs that adjust themselves ("learn") based on example data.

proper wing Oct 12, 2022, 2:23 PM

#

Hi, why is the input for the linear 320

#

when the output of the 2nd Conv2d layer is 20 (channels)?

serene scaffold Oct 12, 2022, 2:26 PM

#

proper wing Hi, why is the input for the linear 320

!code

arctic wedgeBOT Oct 12, 2022, 2:26 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold Oct 12, 2022, 2:26 PM

#

Please don't ask people to read screenshots of text.

proper wing Oct 12, 2022, 2:30 PM

#

..

#

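```py
import torch.nn as nn

class Net(nn.Module):  # class name assumed; not shown in the original message
    def __init__(self):
        super().__init__()
        # Simple CNN
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        # 320 must equal the flattened conv2 output (channels * height * width),
        # which depends on the input size and any pooling done in forward().
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 1)
```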

#

@serene scaffold like this
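The 320 most likely comes from pooling. This layer stack closely matches the network in older versions of PyTorch's MNIST example, whose `forward` (not shown in the message above) applies 2x2 max pooling after each conv. Assuming 28x28 single-channel inputs and that pooling, the spatial size goes 28 → 24 → 12 → 8 → 4, and 20 channels x 4 x 4 = 320. Below is a pure-Python sketch of the arithmetic, using the default stride 1 and no padding for the convs.

```py
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def fc1_in_features(h=28, w=28, channels=20):
    """Flattened size after conv(k=5) -> pool(2) -> conv(k=5) -> pool(2)."""
    for _ in range(2):
        h, w = conv_out(h, 5), conv_out(w, 5)                      # 5x5 conv, no padding
        h, w = conv_out(h, 2, stride=2), conv_out(w, 2, stride=2)  # 2x2 max pool
    return channels * h * w


print(fc1_in_features())  # → 320  (20 * 4 * 4)
```

With 32x32 inputs, the same stack gives 20 x 5 x 5 = 500, so the first `nn.Linear` has to change whenever the input size or the pooling changes.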

cold ridge Oct 12, 2022, 2:51 PM

#

Hii

#

Can anyone tell me how to automate data filtering in Excel using pandas?

serene scaffold Oct 12, 2022, 3:03 PM

#

cold ridge Can anyone tell me how to automate data filtering in excel using pandas ?

Do you know how to open the excel file in pandas?

cold ridge Oct 12, 2022, 3:12 PM

#

import pandas as pd

#

pd.read_excel(file)

cold ridge Oct 12, 2022, 3:14 PM

#

serene scaffold Do you know how to open the excel file in pandas?

This is the approach... right?
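Yes, `pd.read_excel` is the starting point; it needs an engine such as openpyxl for .xlsx files. Below is a minimal sketch of the filtering step itself. It uses an in-memory DataFrame with hypothetical columns `region` and `amount` so it runs without a file. In a real script you would read the sheet, apply the same boolean mask, and write the result back with `DataFrame.to_excel`.

```py
import pandas as pd


def filter_rows(df, region, min_amount):
    """Keep rows for one region with amount >= min_amount (hypothetical column names)."""
    mask = (df["region"] == region) & (df["amount"] >= min_amount)
    return df.loc[mask]


# Stand-in for: df = pd.read_excel("input.xlsx")  (file name is hypothetical)
df = pd.DataFrame({
    "region": ["North", "South", "North", "North"],
    "amount": [120, 300, 80, 250],
})

filtered = filter_rows(df, "North", 100)
print(filtered)
# In a script: filtered.to_excel("filtered.xlsx", index=False)
```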

lapis sequoia Oct 12, 2022, 3:42 PM

#

@serene scaffold

#

A very silly thing about pandas that I noticed was that for object dtype, it can store some entries as int and others as string.

#

That's very silly to me

#

Oh

#

Or maybe not. Maybe the mismatch was because of the "22.0" kind of strings.

lapis sequoia Oct 12, 2022, 3:44 PM

#

lapis sequoia A very silly thing about pandas that I noticed was that for object dtype. It can...

But I did notice this happening tbh

serene scaffold Oct 12, 2022, 3:45 PM

#

lapis sequoia A very silly thing about pandas that I noticed was that for object dtype. It can...

If a column is heterogeneous, the type will be object
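A quick way to see this: a column that mixes ints and strings, such as "22.0" read in as text, gets dtype `object`. That dtype only means "arbitrary Python objects", so each element keeps its own type. The values below are made up.

```py
import pandas as pd

s = pd.Series([22, "22.0", 7, "abc"])
print(s.dtype)  # → object

# Element-wise type check: the ints stay ints and the strings stay strings.
print(s.map(lambda x: isinstance(x, str)).tolist())  # → [False, True, False, True]
```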

lapis sequoia Oct 12, 2022, 3:45 PM

#

So if you check isinstance(x, str) on each element of an object-dtype column, some of them were str and others were int

lapis sequoia Oct 12, 2022, 3:45 PM

#

serene scaffold If a column is heterogenous, the type will be object

Oh shoot. I wasn't aware of that

#

I thought it always transforms the whole series back to the most generalised dtype

#

Like strings

#

So if it has strings and ints, all the ints would become strings

#

But it just keeps heterogeneous varieties

serene scaffold Oct 12, 2022, 3:48 PM

#

lapis sequoia I thought it always transforms the whole series back to the most generalised dty...

It can sort of do this for ints to floats, since any int can be represented as a float.
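Here is a small illustration of which mixes pandas upcasts and which it leaves as `object`. The values are made up.

```py
import pandas as pd

ints_and_floats = pd.Series([1, 2, 3.5])
ints_with_gap = pd.Series([1, 2, None])
ints_and_strings = pd.Series([1, "a"])

print(ints_and_floats.dtype)   # → float64 (every int here fits in a float)
print(ints_with_gap.dtype)     # → float64 (None becomes NaN, which is a float)
print(ints_and_strings.dtype)  # → object (no conversion to str, unlike R's c(1, "a"))
```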

lapis sequoia Oct 12, 2022, 3:48 PM

#

So I think it's worth converting object-dtype columns that hold numbers to a numeric dtype every time
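`pd.to_numeric` does that conversion. With `errors="coerce"`, anything it cannot parse becomes NaN instead of raising an error. Below is a minimal sketch on a made-up object column.

```py
import pandas as pd

# Hypothetical object column: a real int, numbers stored as text, and one bad entry.
raw = pd.Series([22, "22.0", "7", "n/a"], dtype=object)

clean = pd.to_numeric(raw, errors="coerce")
print(clean.dtype)     # → float64
print(clean.tolist())  # → [22.0, 22.0, 7.0, nan]
```

Coercion silently turns typos into NaN, so it is worth comparing `clean.isna().sum()` with `raw.isna().sum()` afterwards.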

lapis sequoia Oct 12, 2022, 3:48 PM

#

serene scaffold It can sort of do this for ints to floats, since any int can be represented as a...

Yeah, in R it did even for strings

#

I always thought that's how it works. I didn't know about the existence of heterogeneous dtypes. No one told me 😭

serene scaffold Oct 12, 2022, 3:49 PM

#

lapis sequoia I always thought that's how it works. I didn't know about the existence of heter...

Each column should be homogeneous. Rows often will not be.

lapis sequoia Oct 12, 2022, 3:49 PM

#

But I think I need to lemmatize you. Are you down?

serene scaffold Oct 12, 2022, 3:49 PM

#

You're going to lemmatize me?

#

Sounds violent.

lapis sequoia Oct 12, 2022, 3:50 PM

#

Gently lemmatize you*

serene scaffold Oct 12, 2022, 3:50 PM

#

You lemmatize words. Not people.

boreal cape Oct 12, 2022, 3:50 PM

#

my model keeps giving values of one kind

#

which leads to high accuracy. how do I fix that?

#

like I have two classes, yes and no

lapis sequoia Oct 12, 2022, 3:51 PM

#

serene scaffold Each column should be homogeneous. Rows often will not be.

Wdym by heterogeneous? I am understanding it as having more than one dtype

boreal cape Oct 12, 2022, 3:52 PM

#

and my model predicts no

lapis sequoia Oct 12, 2022, 3:52 PM

#

boreal cape my model keeps giving value of one kind

I also faced this issue. Maybe something with your training data

boreal cape Oct 12, 2022, 3:52 PM

#

like it has 12,000 no values

#

and 2,000 yes values

lapis sequoia Oct 12, 2022, 3:52 PM

#

In predictions?

boreal cape Oct 12, 2022, 3:53 PM

#

and it keeps on predicting no
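Those counts explain the "high accuracy". Assuming the evaluation data has the same 12,000 "no" / 2,000 "yes" split, a model that always answers "no" is right 12,000 / 14,000 ≈ 85.7% of the time. Below is a small pure-Python sketch that computes this majority-class baseline and the recall on "yes", which exposes the problem.

```py
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` labels that were predicted as `positive`."""
    hits = sum(t == p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)


y_true = ["no"] * 12_000 + ["yes"] * 2_000
always_no = ["no"] * len(y_true)

print(round(majority_baseline(y_true), 3))  # → 0.857
print(recall(y_true, always_no, "yes"))     # → 0.0
```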

serene scaffold Oct 12, 2022, 3:53 PM

#

lapis sequoia Wdym by heterogeneous. I am comprehending it as having more than one dtypes

Heterogeneous is more than one type. Homogeneous is only one type. It's the same distinction as homo or heterosexual

boreal cape Oct 12, 2022, 3:53 PM

#

no not in predictions

#

@lapis sequoia

lapis sequoia Oct 12, 2022, 3:53 PM

#

Yeah, so your input data is imbalanced

#

I also had data like yours

boreal cape Oct 12, 2022, 3:53 PM

#

but how do I fix that

lapis sequoia Oct 12, 2022, 3:53 PM

#

Someone suggested a solution here though

#

It was called class-weighted training or something like that
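That idea is usually called class weighting: mistakes on the rare class count for more in the loss. Below is a minimal sketch of the common "balanced" heuristic, weight = n_samples / (n_classes x count). This is the formula scikit-learn documents for `class_weight="balanced"`. For 12,000 "no" and 2,000 "yes" it gives about 0.58 and 3.5, so both classes contribute equally in total.

```py
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]); rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


labels = ["no"] * 12_000 + ["yes"] * 2_000
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})  # → {'no': 0.583, 'yes': 3.5}
```

The weights then go into training. For example, scikit-learn estimators such as `LogisticRegression` accept a `class_weight` dict. PyTorch's `nn.CrossEntropyLoss` takes a `weight` tensor ordered by class index.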