#data-science-and-ml


hasty mountain
#

Probably

#

Hey @serene scaffold, tell me a bit about LSTMs in text networks... If I make a GAN for text without any LSTMs, without any syllable/word/sentence sequence, my model will only generate text without any logic, right? Even if I pass texts with some logic to the discriminator? Never mind the latter. I remembered that if I don't shuffle my batch, the model will overfit

#

Can I simply pass sequences as inputs to both generator and discriminator without using LSTMs? Or should I use them in order to achieve better performance?

desert oar
#

even with overfitting, loss shouldn't start at zero. also that test curve is weirdly smooth compared to the train curve. check for bugs in your code

novel python
desert oar
#

when in doubt, try to simplify, then work back upwards

novel python
#

I'm creating sequences and then separating those sequences into train and test set with the following functions:


```python
import numpy as np

def to_sequences(data, seq_len):
    # slide a window of length seq_len over the data, one step at a time
    d = []

    for index in range(len(data)+1 - seq_len):
        d.append(data[index: index + seq_len])

    return np.array(d)

def preprocess(data_raw, seq_len, train_split):

    data = to_sequences(data_raw, seq_len)

    num_train = int(train_split * data.shape[0])

    # the last step of each window is the target, the rest is the input
    X_train = data[:num_train, :-1, :]
    y_train = data[:num_train, -1, :]

    X_test = data[num_train:, :-1, :]
    y_test = data[num_train:, -1, :]

    return X_train, y_train, X_test, y_test


X_train, y_train, X_test, y_test =\
 preprocess(scaled_set1, SEQ_LEN, train_split = 0.8)
```
#

what i'm finding weird is that the original dataset has 7 points (set1), but the X_train and X_test have only 3 and 1 sequence, respectively, while 5 sequences are possible:
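
A possible explanation, sketched under an assumption: `SEQ_LEN` is not shown above, and the snippet below assumes it was 4. Each window holds the inputs plus the target, so 7 points give 7 + 1 - 4 = 4 windows, and an 80% split of 4 windows is 3 for training and 1 for testing. Five windows would only appear with a window length of 3. The data here is made up.

```python
import numpy as np

def to_sequences(data, seq_len):
    # same sliding window as in the message above
    return np.array([data[i:i + seq_len] for i in range(len(data) + 1 - seq_len)])

set1 = np.arange(7, dtype=float).reshape(-1, 1)  # 7 made-up points, 1 feature

counts = {}
for seq_len in (3, 4):  # 4 is the assumed SEQ_LEN
    windows = to_sequences(set1, seq_len)
    num_train = int(0.8 * windows.shape[0])
    # (total windows, train windows, test windows)
    counts[seq_len] = (windows.shape[0], num_train, windows.shape[0] - num_train)

print(counts)
```

If that assumption holds, the numbers are expected behaviour rather than a bug: the window length counts the target step too.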

young granite
#

wanted to use this old dataset of mine to try a bit of algorithm and nn on it but i struggle to find a suitable approach to generate 1 big dataset out of n=349 dfs like shown:

#

so far i tried .pivot with the column "temp"

#

atm the dfs are all stored inside a dict with range(0, 350)

soft badge
#

people, what is the best place to learn machine learning?

young granite
soft badge
#

I need to understand how it works and why it works, you know?

#

i was studying from w3schools but i dont like their ML blog

dusky mesa
#

so i have these two csv files: the first one is the ingredients and the total amount we have of each, and the second file is the pastries, the price each pastry sells for, then the amount of ingredients needed to create that pastry

#

sorry for cutting you off

#

but i need to find the best solution, given your circumstances. Output the total profit and how much of each
pastry you have to make

#

i'm not here for the answer i want to actually understand how i would go about this

young granite
#

divide the amounts u have by the amounts each recipe needs to see how many u could produce, after that u could build total price

dusky mesa
#

by that you mean the max number of each pastry right?

young granite
#

ye

#

u could then also check whats the best function for a mix of pastrys with the given amount

dusky mesa
#

sorry if its dumb qs but just to make sure i divide the total we have of each ingredient by the amount needed to find max pastry we could make right

#

so for apple pie i got 158, croissant 79, poppy seed 51

young granite
#

u need to consider u can always only produce the least amount possible

#

so if
Y F S
1 2 3
is the result u can only do 1

dusky mesa
#

so what i did was the max amount of each individual pastry that could be made with the total amount of ingredients

young granite
#

yes but what if for example for x pastry u would need x sugar but >x flour

#

if it works out just fine thats good but u need to consider that

dusky mesa
#

ohh ok yea i took that into account i divided the total of each ingredient by the amount needed for apple pie

#

then i took the lowest amount

young granite
#

👍

dusky mesa
#

thats what you mean right

regal ingot
#

how do i do A* (star) search when my goal state is finding all the keys in a grid. Like how do i calculate my heuristic

dusky mesa
#

how would i find the best combo to maximize profit
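
One way to see the whole idea on a small scale is to brute-force it. The sketch below uses made-up stock, recipes and prices (none of these numbers come from the CSV files). It first computes the per-pastry maximum discussed above, then tries every combination up to those maxima. Larger problems of this shape are usually posed as integer linear programs instead.

```python
from itertools import product

# hypothetical data, not the real CSV contents
stock = {"flour": 10, "sugar": 6}
recipes = {
    "pie": {"flour": 2, "sugar": 2},
    "croissant": {"flour": 3, "sugar": 1},
}
prices = {"pie": 5, "croissant": 4}

def max_batches(recipe):
    # the scarcest ingredient limits how many can be made
    return min(stock[ing] // qty for ing, qty in recipe.items())

limits = {name: max_batches(recipe) for name, recipe in recipes.items()}

def feasible(plan):
    # a plan is feasible if no ingredient is used beyond its stock
    return all(
        sum(plan[name] * recipes[name].get(ing, 0) for name in plan) <= stock[ing]
        for ing in stock
    )

names = list(recipes)
best_profit, best_plan = 0, {name: 0 for name in names}
for amounts in product(*(range(limits[name] + 1) for name in names)):
    plan = dict(zip(names, amounts))
    profit = sum(plan[name] * prices[name] for name in names)
    if feasible(plan) and profit > best_profit:
        best_profit, best_plan = profit, plan

print(limits, best_profit, best_plan)
```

Note that the best mix here (2 of each) beats making only the most profitable pastry, which is why the per-pastry maximum alone is not the answer.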

regal ingot
#

well

soft badge
#

people, what is the best place to learn machine learning with fundamentals

regal ingot
#

lol

red canyon
#

Neural network from scratch 🤓

serene scaffold
red canyon
#

im not done though

serene scaffold
serene scaffold
#

Yes

latent cairn
#

Add a new categorical column to df_housing called NOXCAT. This column categorizes the suburbs into towns with LOW, MEDIUM, and HIGH nitric oxides concentration (based on the variable NOX). The categorization should be based on quantiles of NOX as follows:
LOW (NOX <= 30% quantile)
MEDIUM (> 30% quantile; <= 70% quantile)
HIGH (> 70% quantile).

#

There is a dataset with a column NOX, all numbers with about 3 decimals.

I know this will be way off, but my attempt that keeps getting an error is:

itm_low= np.quantile(df_housing["NOX"], q=0.30)
itm_med= np.quantile(df_housing["NOX"], q=0.70)
itm_high= np.quantile(df_housing["NOX"], q=1)
df_housing['NOXCAT']= {"NOX": {(itm_low): "LOW", (itm_med): "MEDIUM", (itm_high): "HIGH"}}

Any assistance would be much appreciated!
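
In the spirit of a toy example rather than a solution to the assignment: the sketch below bins a made-up column by its own 30% and 70% quantiles using `pd.cut`. The DataFrame `toy` and its values are invented for illustration. With the default `right=True`, each bin includes its upper edge, which matches the `<=` in the task wording.

```python
import numpy as np
import pandas as pd

# made-up values, not the housing data
toy = pd.DataFrame({"NOX": [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]})

# the two cut points; everything above the upper one is the top category
low, high = toy["NOX"].quantile([0.30, 0.70])

toy["NOXCAT"] = pd.cut(
    toy["NOX"],
    bins=[-np.inf, low, high, np.inf],  # (-inf, low], (low, high], (high, inf)
    labels=["LOW", "MEDIUM", "HIGH"],
)

print(toy)
```

The error in the attempt above likely comes from assigning a nested dict to a column; a column needs one value per row.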

hasty mountain
#

Does anyone have a tip to get the closest float number from a certain input?
I'm testing a word prediction model and I'm trying to work with data within range [-1, 1]. The model is doing quite fine, but I'm having some problems when trying to convert my tokens back to words again.
How can I make an output which has value -0.0703 be converted to a word which has value(in my dictionary) -0.0702?
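
A minimal sketch of the nearest-value lookup, assuming a one-dimensional encoding. The `vocab` dictionary is hypothetical; only the -0.0702 and -0.0703 values come from the question.

```python
import numpy as np

# hypothetical word-to-value dictionary
vocab = {"cat": -0.0702, "dog": 0.3150, "bird": 0.9000}

words = list(vocab)
values = np.array(list(vocab.values()))

def nearest_word(x):
    # pick the word whose stored value has the smallest absolute difference to x
    return words[int(np.argmin(np.abs(values - x)))]

print(nearest_word(-0.0703))
```

For multidimensional encodings the same idea applies with a vector norm in place of the absolute difference.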

hasty mountain
#

Meh. I'll just stick to scikit learn's nearest neighbours...

latent cairn
#

I'm still trying to work my one out, ffs

wooden sail
hasty mountain
#

Oh... I see...

#

Uh...well...at least it worked with KNN...

#

I'm not even using embedding layers, since I'm using floats and not using one-hot encoding. Hope this doesn't prejudice the model too much.

latent cairn
#

just need to change the names around to LOW, MEDIUM and HIGH

#

maate no wonder software engineers and data scientists are on the big bucks, being proficient at excel I thought I was clever until I took on this stuff

shell crest
#

!rule 8

arctic wedgeBOT
#

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

shell crest
#

Wait no

#

!rule 9

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

shell crest
# arctic wedge

Anyway other than this, anyone who claims to be 'paying well' is 99.95% guaranteed to not be paying well

latent cairn
#

you sound really nice, I'm sure someone will be hanging to work for you

#

All the best chief

shell crest
#

<@&831776746206265384>

wooden sail
#

you could also use the 2-norm (euclidean distance) if the encoding is multidimensional

hasty mountain
atomic shadow
#

Hi, any idea on which algorithm i can explore (or article for reference) if i want to predict a coordinate value (x,y) based on x values (x1, x2, ... , x (n))

Eg:
| x1 | x2 | x3 | ....... | y
0.43 | 0.56 | 31.21 | ....... | (3.51, 4.66)

I tried RandomForest but it gives me error - "ValueError: could not convert string to float: ''

devout sail
#

That's a Python error on your end, not an algorithmic error
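
A hedged sketch of what the cleanup could look like, assuming the data arrives as strings with some empty cells and the target stored as text like "(3.51, 4.66)". The frame `raw` and the column names `y_x` and `y_y` are made up. scikit-learn's `RandomForestRegressor` accepts a two-column target, so once the columns are numeric the coordinate can be predicted directly.

```python
import pandas as pd

# made-up rows; the empty string mimics the value behind the ValueError
raw = pd.DataFrame({
    "x1": ["0.43", "0.10", ""],
    "x2": ["0.56", "0.20", "0.30"],
    "y": ["(3.51, 4.66)", "(1.00, 2.00)", "(5.00, 6.00)"],
})

# empty or malformed cells become NaN instead of raising
features = raw[["x1", "x2"]].apply(pd.to_numeric, errors="coerce")

# split "(a, b)" into two float columns
targets = raw["y"].str.extract(r"\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)").astype(float)
targets.columns = ["y_x", "y_y"]

# drop rows that still have missing values
clean = pd.concat([features, targets], axis=1).dropna()
print(clean)
```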

atomic shadow
devout sail
#

They should be floats yeah

atomic shadow
#

Let me try

versed gulch
#

Hi is there a way I can condense my code, what I'm doing here is taking a centre pixel and looking neighbouring pixel values that are equal to 255

coords = zip(z_coords.tolist(), x_coords.tolist(), y_coords.tolist())
for z, x, y in coords:
  # exclude edges/boundary of skeleton image (maybe could pad skeleton image in the future)
  if z == skel3d.shape[0] - 1 or x == skel3d.shape[1] - 1  or y == skel3d.shape[2] -1:
    continue
  # keep track of the neighbours
  neighbours = []
  
  # current slice
  neighbours.append(skel3d[z, x-1, y-1])
  neighbours.append(skel3d[z, x-1, y])
  neighbours.append(skel3d[z, x-1, y+1])
  # middle so exclude the actual centre voxel - except for prev and next slice
  neighbours.append(skel3d[z, x, y-1])
  neighbours.append(skel3d[z, x, y+1])
  
  neighbours.append(skel3d[z, x+1, y-1])
  neighbours.append(skel3d[z, x+1, y])
  neighbours.append(skel3d[z, x+1, y+1])
  
  # previous slice
  neighbours.append(skel3d[z-1, x-1, y-1])
  neighbours.append(skel3d[z-1, x-1, y])
  neighbours.append(skel3d[z-1, x-1, y+1])
  
  neighbours.append(skel3d[z-1, x, y-1])
  neighbours.append(skel3d[z-1, x, y])
  neighbours.append(skel3d[z-1, x, y+1])
  
  neighbours.append(skel3d[z-1, x+1, y-1])
  neighbours.append(skel3d[z-1, x+1, y])
  neighbours.append(skel3d[z-1, x+1, y+1])
  
  # next slice
  neighbours.append(skel3d[z+1, x-1, y-1])
  neighbours.append(skel3d[z+1, x-1, y])
  neighbours.append(skel3d[z+1, x-1, y+1])
  
  neighbours.append(skel3d[z+1, x, y-1])
  neighbours.append(skel3d[z+1, x, y])
  neighbours.append(skel3d[z+1, x, y+1])
  
  neighbours.append(skel3d[z+1, x+1, y-1])
  neighbours.append(skel3d[z+1, x+1, y])
  neighbours.append(skel3d[z+1, x+1, y+1])
  
  if neighbours.count(255) > 2:
    print(z, x, y)
compact star
#

has anyone created their own python implementation of neat? It would be really helpful if I could see it

devout sail
strong sedge
#

^ would work better

devout sail
#

Do a > 2 on the result of that, and send it to numpy.where to get the indices
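
A sketch of the vectorised version, under the assumption that `z_coords`, `x_coords`, `y_coords` were the positions of the 255 voxels. It sums the 26 shifted copies of a zero-padded mask, which is the same as convolving with a 3x3x3 kernel of ones whose centre is zero (`scipy.ndimage.convolve` would do that in one call). The tiny test volume is made up.

```python
import itertools
import numpy as np

def count_neighbours(skel3d, value=255):
    mask = (skel3d == value).astype(np.int32)
    padded = np.pad(mask, 1)  # zero border, so edge voxels need no special case
    Z, X, Y = mask.shape
    counts = np.zeros_like(mask)
    for dz, dx, dy in itertools.product((-1, 0, 1), repeat=3):
        if (dz, dx, dy) == (0, 0, 0):
            continue  # skip the centre voxel itself
        counts += padded[1 + dz:1 + dz + Z, 1 + dx:1 + dx + X, 1 + dy:1 + dy + Y]
    return mask, counts

# made-up skeleton: a short line with one side voxel
skel3d = np.zeros((5, 5, 5), dtype=np.uint8)
for voxel in [(2, 2, 1), (2, 2, 2), (2, 2, 3), (2, 1, 2)]:
    skel3d[voxel] = 255

mask, counts = count_neighbours(skel3d)
# voxels that are set and have more than 2 set neighbours
hits = np.argwhere((counts > 2) & (mask == 1))
print(hits)
```

The padding also sidesteps the negative-index wraparound that the loop version would hit at index 0.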

serene scaffold
#

Omg hi @devout sail

#

Wyd here

coral nimbus
#

Hi guys, is there anyone here who has experience on number recognition by chance?

serene scaffold
tacit nacelle
#

hey! so I have this final year project on the theme of the city. I thought about writing a program to optimize a traffic light system. My idea is to count the vehicles in each waiting queue (which I've already done using opencv and yolov3), but since I'm having problems implementing an algorithm I found that uses a conflict matrix, I'm looking for alternative things I can do in case I can't get the code working

#

something that uses vehicles detection

#

and that is not just programs but also math theories

arctic wedgeBOT
#

Hey @tacit nacelle!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

viral oak
#

Is it not recommended to try make an AI crack sha256 hashes?

serene scaffold
lapis sequoia
#

Is dumping model with pickle library fine

#

I dumped it and loaded it in another script. But the predictions seems to be a bit off. So I was wondering if there's some issue with the model loading

#

Because my accuracy was good earlier

#

In fact it couldn't even predict well the data it was trained on

serene scaffold
#

But for whatever library you used to train the model, I would use its native saving functionality.

lapis sequoia
#

I did model=logisticreg()
Model.fit(train, test)
Pickle.dump(model)

#

Is that fine way?

serene scaffold
#

See if there's a save method for the model object

lapis sequoia
#

Hmm

#

I used sklearn

#

And official sklearn says to use pickle

#

Might be something wrong with my preprocessing maybe then
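
One way to rule out the pickling step is a round-trip check like the sketch below, on made-up data. It also puts the scaler and the model in one pipeline, so the preprocessing is saved together with the model instead of being redone by hand in the second script. Note that `pickle.dump` needs a file object as its second argument; `pickle.dumps` is used here only to keep the example self-contained.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# made-up training data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# preprocessing and model travel together
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

blob = pickle.dumps(model)     # in a script: pickle.dump(model, open_file)
restored = pickle.loads(blob)  # in the other script: pickle.load(open_file)

# identical predictions mean the round trip did not change the model
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

If a check like this passes, the difference is more likely in how the data is prepared before `predict` than in pickle itself.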

viral oak
#

my lowest is 89 bits off out of 256 (Maximum error across 1,500 random hashes)

mild edge
#

hey I want to learn AI can anyone share any roadmap

austere epoch
#

@grand lion

#

whatever plot is my copied ax, is slightly shifted

austere epoch
#

(ignore, figured it out)

serene scaffold
viral oak
agile cobalt
#

not worth a shot
AI isn't magic, it's a science
it is not applicable to that task at all

serene scaffold
viral oak
serene scaffold
austere epoch
#

By definition of what randomness is

mint palm
#

so i am using ssh and was having problem of process getting "killed" probably due to resource utilisation.
I had to extract feature using resnext3d on 1900 videos
doing all was giving that "killed" error
so i tried to decrease the number of videos i did feature extraction.
on 10 videos it took 50 minutes.
I don't know if that's normal or not, please guide.

misty flint
#

tldr is this:

viral oak
tidal bough
#

it's pretty cool that they managed to discover a new multiplication algorithm for 4x4 matrices over Z/2

#

hopefully they adjust their loss function and see if they can generate some novel algorithms that work on floats (or even just on ints)

proper wing
#

hey if anyone is familiar with deploying a ml model in ai can u check my q in broccoli pls thx

lapis sequoia
#

I need help on part 5

#
import numpy as np
import pandas as pd

yellow_taxi = pd.read_csv('2018_Yellow_Taxi_Trip_Data.csv')
# For each month, print the entire row with the highest fare_amount
print("For each month, print the entire row with the highest fare_amount.")
# obtain month from pickupdate
yellow_taxi['month'] = pd.DatetimeIndex(yellow_taxi['tpep_pickup_datetime']).month
monthlyMaxFares = yellow_taxi.groupby(['month'])['fare_amount'].max()

# for month_index in range(1,13):
#    month_subset = yellow_taxi.loc[yellow_taxi['month']==month_index]
#    max_fare_row = month_subset.loc[month_subset['fare_amount']==np.max(month_subset['fare_amount'])]
#    if max_fare_row.shape[0] != 0:
#        print(max_fare_row)
serene scaffold
#

Once you know the idxmax, you can get the rows
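
A small sketch of that idea on made-up trips (the column names match the code above, the values do not come from the real CSV): `groupby(...).idxmax()` gives the row label of the highest fare per month, and `.loc` pulls the full rows.

```python
import pandas as pd

# made-up trips
trips = pd.DataFrame({
    "tpep_pickup_datetime": [
        "2018-01-03 08:00", "2018-01-15 09:30",
        "2018-02-01 10:00", "2018-02-20 18:45",
    ],
    "fare_amount": [12.5, 52.0, 8.0, 30.0],
})

trips["month"] = pd.to_datetime(trips["tpep_pickup_datetime"]).dt.month

# index label of the max fare within each month
top_idx = trips.groupby("month")["fare_amount"].idxmax()

# entire rows for those labels
top_rows = trips.loc[top_idx]
print(top_rows)
```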

lapis sequoia
#

iv never heard of this before lol but let me check

#

cuz i have the csv i have to get maximum of certain column for every month and have to print down the entire row

odd meteor
odd meteor
odd meteor
serene scaffold
odd meteor
regal ingot
#

how do i get a heuristic for multiple targets

#

like my goal isn't an end point but finding all keys
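
One common way to set this up, sketched under assumptions (4-way movement, unit step cost, a grid of strings where `K` marks a key and `#` a wall, all of which are made up here): make the search state the pair (position, keys still missing), and use the Manhattan distance to the farthest remaining key as the heuristic. That never overestimates, because the farthest key must be reached at some point.

```python
import heapq

def heuristic(pos, remaining):
    # distance to the farthest remaining key; 0 once every key is collected
    if not remaining:
        return 0
    return max(abs(pos[0] - k[0]) + abs(pos[1] - k[1]) for k in remaining)

def collect_all_keys(grid, start):
    rows, cols = len(grid), len(grid[0])
    keys = frozenset(
        (r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "K"
    )
    start_state = (start, keys - {start})
    frontier = [(heuristic(*start_state), 0, start_state)]
    best = {start_state: 0}
    while frontier:
        _, cost, (pos, remaining) = heapq.heappop(frontier)
        if not remaining:
            return cost  # goal: no keys left to collect
        if cost > best[(pos, remaining)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = pos[0] + dr, pos[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] != "#":
                nxt = ((r, c), remaining - {(r, c)})
                if cost + 1 < best.get(nxt, float("inf")):
                    best[nxt] = cost + 1
                    heapq.heappush(
                        frontier, (cost + 1 + heuristic(*nxt), cost + 1, nxt)
                    )
    return None  # some key is unreachable

grid = ["..K", "...", "K.."]  # made-up 3x3 grid with two keys
steps = collect_all_keys(grid, (0, 0))
print(steps)
```

Tighter heuristics exist (for example, adding a spanning-tree estimate over the remaining keys), but this one is simple to reason about.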

serene scaffold
#

@regal ingot can you elaborate?

low bloom
#

whats the best way to add metadata to a df in pandas python?

#

feel free to @ me

storm kelp
#

Any one here read Hands on machine learning with ...?

#

Seems like a decent book

serene scaffold
low bloom
#

I think I figured it out
I am basically doing it this way

        # adding metadata of sheet name into the df to be used later
        df.sheet_name = sheet_name
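
A note on this approach, offered as a sketch rather than the recommended way: a plain attribute like `df.sheet_name` lives only on that one object, and pandas makes no promise to keep it on DataFrames derived from it. pandas also provides a `DataFrame.attrs` dictionary (marked experimental in its documentation) for this purpose, and keeping the frames in a dict keyed by sheet name avoids the question entirely. The names below are made up.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]})
sheet_name = "Q1"  # hypothetical sheet name

# option 1: the attrs dict intended for metadata
df.attrs["sheet_name"] = sheet_name

# option 2: keep the metadata outside the frame, as the dict key
frames = {sheet_name: df}

print(df.attrs, list(frames))
```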
serene scaffold
low bloom
#

thank you though!

latent cairn
#

Does anyone know of a credible paid tutoring service for pandas, numpy, matplotlib, scipy? Not sure I will use it but curious what kind of resources are out there.

serene scaffold
latent cairn
#

I tried to create 3 variables using numpy.quantile, just can’t get it to work

serene scaffold
#

I would have answered that if I were at a desktop

#

Unfortunately I'm on vacation. So I only have my phone.

latent cairn
#

It’s ok, thanks though

serene scaffold
#

Try again on Tuesday

latent cairn
#

Just frustrating, uni course pre-requisites were basically nothing. They sting you $3500 and the lectures assume a lot of prior knowledge, I’m on track for a fail.

#

Good way for them to make money I guess.

shell crest
shell crest
#

I don't want to ping Stelercus but let me know if you think that's too much help

latent cairn
#

Cheers I’ll give it a run in a few hours

shell crest
#

It won't solve your problem if that's what you're expecting

latent cairn
#

I need to name the quantiles LOW MEDIUM HIGH and then do visualisations and regress each

#

That’s ok

#

I’m gonna try get my money back on the basis the course outline is misleading

#

Respect to those that are good at this, a lot to learn

serene scaffold
#

Your solution could be made a bit more performant with a loc assignment

#

I guess that's not actually a solution. And I strongly agree with your saying that creating a toy example is very helpful

latent cairn
#

It’s due tonight so I’ll submit that and hope for the best, problem is that part is only worth 5% 😵

#

The rest is probably even more difficult

shell crest
shell crest
#

I'm not sure if you're regressing it correctly

latent cairn
#

The newly created column NOXCAT in df_housing is a categorical column with three possible values (LOW, MEDIUM, and HIGH).

Create a set of dummy variables (for different values of NOXCAT).
Regress MEDV on the different NOX categories using the dummy variables. Choose the dummy variable coding in your regression such that the intercept reflects the MEDV value of suburbs in the MEDIUM category. Save the regression result as res_2 and print the regression result to the console.
Report the regression results from res_2 in your own words according to APA style and interpret the coefficients.
Hint: Look at pd.get_dummies.

#

ANSWER that doesn't provide coef for HIGH

pd.get_dummies(df_housing['NOXCAT'], prefix="dummy")

#print (df_housing) - checked

mod_2 = smf.ols('MEDV ~ NOXCAT', data=df_housing)
res_2 = mod_2.fit()
print (res_2.summary())

#

I do note I see no HIGH values in NOXCAT, so I think my previous answer is wrong

#

That is what it produces
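
A toy illustration of how the dummy coding decides what the intercept means, using made-up numbers and plain least squares instead of the course's setup. Dropping the MEDIUM dummy makes MEDIUM the baseline, so the intercept becomes the MEDIUM group mean and the other coefficients are differences from it. In the statsmodels formula interface the same choice can be written as `C(NOXCAT, Treatment(reference="MEDIUM"))`.

```python
import numpy as np
import pandas as pd

# made-up data, two suburbs per category
toy = pd.DataFrame({
    "NOXCAT": ["LOW", "LOW", "MEDIUM", "MEDIUM", "HIGH", "HIGH"],
    "MEDV": [30.0, 32.0, 24.0, 26.0, 15.0, 17.0],
})

dummies = pd.get_dummies(toy["NOXCAT"], prefix="dummy", dtype=float)

# leave out the MEDIUM column so that MEDIUM is the reference category
X = dummies.drop(columns="dummy_MEDIUM")
X.insert(0, "Intercept", 1.0)

coef, *_ = np.linalg.lstsq(X.to_numpy(), toy["MEDV"].to_numpy(), rcond=None)
result = dict(zip(X.columns, coef))
print(result)
```

Note that the result of `pd.get_dummies` has to be kept (assigned or joined back) for it to be used; calling it on its own line discards it.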

high creek
#

Can I do MS in AI or NLP with a BS in IT?

serene scaffold
# high creek Can I do MS in AI or NLP with a BS in IT?

You might have to take some prerequisites before you would be able to start the MS courses. The best way to figure this out would be to look at the admissions websites for the programs.

There usually isn't an "MS in AI", and there definitely won't be one in NLP. It would probably be an MS in CS. If you know you want to do NLP, make sure there are research faculty at that university who specialize in it.

lapis sequoia
lapis sequoia
#

Are you in US btw?

serene scaffold
#

Please make sure that your question is a complete sentence. If you use an incomplete sentence thinking that I'll know what the intended full sentence is, there's a very high chance that we'll miscommunicate.

high creek
high creek
high creek
serene scaffold
high creek
#

My major is fully Python

#

The CS major is heavily C language

serene scaffold
hasty mountain
#

Can someone tell me some metrics and tricks using loss functions for Text Generator Models? I suppose there's some GANs for text in order to have a good conversation model, right? Perhaps some metric or trick to measure how much the generated text makes sense?

I know that SRGAN combined a content loss (pixel-wise MSE or a VGG feature loss) with a weighted "adversarial loss" to form its "perceptual loss" (or something like this), which can improve the GAN output.

high creek
#

Ah I see

#

Is Data Science needed for NLP

serene scaffold
high creek
#

Ok understood

serene scaffold
#

"data science" isn't a well defined thing. Linear algebra is, however. And it's needed for NLP.

high creek
#

Okay thx for your time

strong sedge
strong sedge
#

lol

shell crest
latent cairn
torpid quartz
#

I want to get into concepts of ai and ml, where should i start?

torpid quartz
#

Are these all books?

#

Just asking

worldly dawn
torpid quartz
#

Thanks, got it

#

I also saw a course at fcc, so do you think that is worth it?

worldly dawn
torpid quartz
#

Thanks

mild edge
lapis sequoia
#

Hi @serene scaffold

#

How are you

obsidian peak
#

Rubiks Cube AI assistant

stark ember
#

Is it normal for PyTorch CUDA models to show barely any usage in Task Manager?

#

It seems like it just uses a fraction of Copy and fills up VRAM, but it's not actually working too hard

#

Is it possible to somehow use more of the GPU in order to accelerate the workload, or is that just not possible?

#

Sorry if it's a silly question, it's my first time using CUDA

serene scaffold
clever sorrel
#

hey!

#

I need a help in cnn

gloomy anvil
#

Hello y'all! Does anyone of you know the correct term for when you assume the prediction for tomorrow is the same as the value today? I think I read it on machinelearningmastery, but I am unable to find it and I also don't remember what this was called. I believe he introduced it as the simplest benchmark in order to see if a model can beat the simple assumption that value today == value tomorrow

wooden sail
#

time or shift invariance, maybe?

#

or stationarity?

serene scaffold
#

What about interpolation?

gloomy anvil
# wooden sail time or shift invariance, maybe?

not stationarity. it was a special word that i cannot remember.... what i actually mean is a martingale sequence. but he used another word (and in my eyes a better word) for it when he created a baseline model

untold bloom
#

"naive" model it is called

gloomy anvil
#

while martingale always kind of implies that you double your stakes, he used a word (not it but like) "autoregressive baseline"

untold bloom
#

it's random walk's corresponding model

gloomy anvil
untold bloom
#

under "naive method"

#

although "i" is with 2 dots on it

gloomy anvil
#

thank you so much!

#

This is what i was looking for! (even though i still believe brownlee used another word 😄 )

wooden sail
#

you'd have to give more context 😛

gloomy anvil
#

Simple forecasting methods include naively using the last observation as the prediction or an average of prior observations. It is important to evaluate the performance of simple forecasting methods on univariate time series forecasting problems before using more sophisticated methods as their performance provides a lower-bound and point of comp...
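
A minimal sketch of the naive (persistence) baseline on made-up prices: the forecast for each step is simply the previous observation, and its error is the number a real model has to beat.

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0])  # made-up series

naive_pred = prices[:-1]  # forecast for t+1 is the value at t
actual = prices[1:]

# mean absolute error of the baseline
naive_mae = np.mean(np.abs(actual - naive_pred))
print(naive_mae)
```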

#

finally found it 🙂

#

stupid follow up question: If i assume the price of X tomorrow is the same as price of X today in a naive model, how do i decide if I should buy or sell? I basically can only make the decision based on the differenced timeseries, right?

#

So if the change from yesterday to today was let's say 2%, I assume 2% for tomorrow as well. Whereas if todays price of X was 100 and I assume 100 for tomorrow as well, there is no room for decisionmaking - which would kind of imply a "hold" strategy, right?

hardy siren
#

Given an image of a chess board, I would like to find out what piece each square contains. Is there any python package that could assist this task?

serene scaffold
#

The first step would be segmenting the chessboard image into each tile

#

This should be easy since chessboard is already a grid

#

The second step is to classify each tile as either blank or what piece it is

#

For the second step, you would need training data that has different images of what those pieces could look like

#

Are these pictures of real chess boards, or virtual chess boards that are 2d?
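
For the first step, a sketch that assumes the image is already cropped to the board and axis-aligned (a photo of a real board would need perspective correction first). The function name `split_board` and the stand-in array are made up.

```python
import numpy as np

def split_board(board):
    # board: array of shape (H, W) or (H, W, channels), cropped to the 8x8 grid
    h, w = board.shape[:2]
    th, tw = h // 8, w // 8
    board = board[:th * 8, :tw * 8]  # trim leftover pixels
    # result shape (8, 8, th, tw, ...): tiles[row, col] is one square
    return board.reshape(8, th, 8, tw, *board.shape[2:]).swapaxes(1, 2)

board = np.arange(64 * 64).reshape(64, 64)  # stand-in for a 64x64 grayscale image
tiles = split_board(board)
print(tiles.shape)
```

Each `tiles[row, col]` can then be fed to whatever classifier handles the second step.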

serene scaffold
#

@hardy siren sorry, I was away for a bit. Look at 3blue1brown's series about neural networks. He makes a classifier for the MNIST dataset of images, which is a very similar problem.

storm kelp
#

Covers NumPy, Pandas, Matplotlib, and Scikit-learn

serene scaffold
heavy crow
#

Does anyone know of research done on extracting features from images for structure from motion? A neural SIFT so to say. I've only found one or two Papers that don't really delve deep into the subject.

storm kelp
lapis sequoia
#

Do you have something like that in video format

storm kelp
swift furnace
#

What do data scientists do? It is not clear to me, would anyone mind explaining it?

swift furnace
#

Would you give me an example of where data science could be used?

young granite
#

drug tests or studies

#

correlation causality

#

its a big book

swift furnace
#

I've seen some people commenting on the use of Python for managing investments, would that be a case scenario where data science is used?

young granite
#

u can use python for everything thats the neat thing bout it

swift furnace
#

hmm

young granite
#

so u generate data

#

u import the data into ur algorithm

#

u transform the data for better use of it

#

u can run different types of "tests" to see trends in ur data

#

u can visualise ur data

#

so u see there is no clear description

swift furnace
#

I see

#

It is quite broad and can be used for anything depending on the context

young granite
#

yes

#

like statistics

#

thats why i call it modern statistic

swift furnace
#

I got quite interested in investment lately, would Python be a useful tool for analyzing data and then deciding on what would be a good investment?

young granite
#

on the first part for sure

#

but what to buy is a bit hard to tell

swift furnace
young granite
#

the market is not following any rules

swift furnace
#

I'd only use Python for analyzing and showing important data

young granite
#

ye that works

swift furnace
#

hmm

young granite
#

easy

swift furnace
#

Apparently, data science and machine learning seem to be used together quite often, is ML useful for data science?

young granite
#

yfinance or another API

#

for ML u need data so ofc

#

but its not always the best approach to a problem

#

sometimes human brain works aswell

swift furnace
#

I see

#

Do u work as a data scientist if I may ask?

young granite
#

i plan to do so

swift furnace
#

cool

young granite
#

yes

#

so i advise u to learn python

swift furnace
young granite
#

depends on ur background im new to data science aswell

swift furnace
young granite
#

well then u got more knowledge than me i guess 😄

#

but if u wanna analyse stocks i can give u my import tool on crypto currencies

lapis sequoia
#

Plus I get overwhelmed by how slow i proceed in a book

#

Like 30 minutes a page

swift furnace
young granite
young granite
#
```python
import pandas as pd
from requests_html import HTMLSession

session = HTMLSession()  # one session reused for every page
pages = []

# the listing shows 100 rows per page; walk offsets 0, 100, ..., 900
for offset in range(0, 1000, 100):
    resp = session.get(f"https://finance.yahoo.com/cryptocurrencies?offset={offset}&count=100")
    df = pd.read_html(resp.html.raw_html)[0].copy()
    # number the rows by their position in the full listing
    df.index = range(offset, offset + len(df))
    pages.append(df)

table = pd.concat(pages)

Symbols = list(table.Symbol)
```
#
```python
import yfinance as yf
import datetime as dt
import timeit
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from IPython.display import clear_output

fig=go.Figure()
fig = make_subplots(specs=[[{"secondary_y":True}]])

start = "2020-01-01"
end = dt.datetime.now()

count = 0
start_timer = timeit.default_timer()

for x in Symbols:
    count +=1
    
    symbol = x
    x = yf.download(x,
                    start,
                    end,
                   )
    
    stop_timer = timeit.default_timer()
    
    clear_output(wait=True)
    
    print("Current progress:",
          np.round((count/len(Symbols))*100, 2), "%",
          #end="\r"
         )
    print("Current runtime:", np.round((stop_timer-start_timer)/60, 2), "minutes")
    
    
    fig.add_trace(go.Scatter(
        y=x['Open'],
        x=x.index,
        name = symbol,
        legendgroup = symbol,
        marker_color = "green"
    ),
                  secondary_y=False
                 )
    fig.add_trace(go.Scatter(
        y=x['Volume'],
        x=x.index,
        name = symbol,
        legendgroup = symbol,
        marker_color = "red"
    ),
                  secondary_y=True
                 )```
swift furnace
#

thx for sharing :]

young granite
#

my pleasure

marble obsidian
#

@young granite I was working with yfinance yesterday! Nice coincidence. What are you working on? I am an experienced ML developer, so I can help answer a few questions if you have any

autumn kindle
#

How to measure speed rate when someone reading a paragraph

marble obsidian
#

Interesting question. But my best guess is this is not something solved using ML, apart from the part where you track eye movement.

meager crater
#

Hey anyone knows how to rewrite this compile without string parameter ?

m.compile(
    optimizer="RMSprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

Tried this, but got an error in model fitting:

m.compile(
    optimizer=keras.optimizers.RMSprop(),
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=[keras.metrics.Accuracy()]
)
#

I understand that it is metrics issue, however, confused on why would that throw an error

ValueError: Shapes (None, 1) and (None, 10) are incompatible

Note: I'm getting this error only with Accuracy
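
A hedged reading of the error, not a confirmed diagnosis: the string "accuracy" lets Keras choose an accuracy variant that fits the loss, while `keras.metrics.Accuracy()` compares labels and predictions element by element, so integer labels of shape (None, 1) clash with 10 class scores. `keras.metrics.SparseCategoricalAccuracy()` is the variant that takes the argmax of the scores first. The NumPy sketch below only mimics that computation on made-up data with 3 classes.

```python
import numpy as np

y_true = np.array([[2], [0], [1]])  # integer labels, shape (3, 1)
y_pred = np.array([                 # class scores, shape (3, 3)
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
])

# element-wise comparison needs equal shapes, which these do not have
shapes_match = y_true.shape == y_pred.shape

# sparse categorical accuracy: compare each label with the argmax of its scores
sparse_acc = np.mean(y_true[:, 0] == y_pred.argmax(axis=1))
print(shapes_match, sparse_acc)
```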

fresh tiger
#

Hi! I had a question regarding backwards propagation and gradient descent.

W <- W - alpha * dJ(W)/dW

From what I am understanding, the gradient in the formula above is retrieved via back propagation in the neural network. In screenshot1 I understand how this chain rule gives us the gradient for J(w) and w2.

In screenshot 2 though, I am a bit confused as to why we apply the chain rule in that way compared to screenshot 3. Since we have the values of w1 and J(w), why do we need to apply the chain rule again for dy/dw1? Wouldnt that be an unnecessary extra step?

desert oar
#

i assume W are all the model parameters

fresh tiger
#

J would be the cost function, and yes W are all the model parameters

desert oar
#

what is the context for screenshots 2 and 3?

fresh tiger
#

to get the gradient of the graph mapping the cost function J(W) against w1. This gradient is then used in the gradient descent formula to find the optimal w1

desert oar
#

it looks like screenshot 2 is the "fully expanded" form of 3

fresh tiger
#

Yes, but I am just a bit confused as to why we need to do that, in the video it was said that we cant directly get dy/dw1 (screenshot3)

#

hence why they apply chain rule again in screenshot 2

#

since we already have the values of dJ(W) and dw1

desert oar
fresh tiger
#

Lets say we are at the stage where we want to find the optimal w1. Why would we even need to use the chain rule to find that gradient, isnt it enough to just have the values of the cost, based on different vals of w1?

desert oar
#

you can do that if you know the closed form of that expression!

fresh tiger
#

I came across this formula before

fresh tiger
#

If thats the case, then is the chain rule preferred due to performance? Ie would it be more expensive to use that closed form for each w (especially in networks with higher depth and width)?

desert oar
#

this looks a bit like the expression for one layer only

fresh tiger
#

yes I think thats what it is

desert oar
#

it might be illustrative to actually work through these expressions in their fully "expanded" form

#

it's good that you're asking these kinds of questions though

#

(and also a great example of why learning math from the videos is not usually that effective)

#

by fully "expanded", i mean write out a model with a small number of inputs, one output, and one small hidden layer, and then actually work through the backprop equations

fresh tiger
#

Alright yes that sounds good, will give that a try. I just had one thing to ask about the closed form stuff u mentioned. If we did have closed form equations, would it be more efficient than the chain rule? I assume thats usually not the case, and finding the closed form seems harder/more computationally expensive than just using the chain rule?

desert oar
#

it is, and in fact backpropagation allows us to take advantage of a lot of repeated computation in practice and significantly reduce runtime by caching them
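
The "work through it by hand" suggestion can be sketched for the smallest possible case. The network below is an assumption made for illustration (one input, one tanh hidden unit, squared error), not the one from the screenshots: h = tanh(w1·x), y = w2·h, J = ½(y − t)². The term dJ/dy is computed once and reused for both weights, which is the caching mentioned above, and a finite-difference check confirms the chain-rule result.

```python
import numpy as np

def forward(w1, w2, x, t):
    h = np.tanh(w1 * x)      # hidden activation
    y = w2 * h               # output
    J = 0.5 * (y - t) ** 2   # cost
    return h, y, J

def backward(w1, w2, x, t):
    h, y, _ = forward(w1, w2, x, t)
    dJ_dy = y - t                        # computed once, reused below
    dJ_dw2 = dJ_dy * h                   # dJ/dy * dy/dw2
    dJ_dh = dJ_dy * w2                   # dJ/dy * dy/dh
    dJ_dw1 = dJ_dh * (1 - h ** 2) * x    # ... * dh/dw1
    return dJ_dw1, dJ_dw2

w1, w2, x, t = 0.5, -1.2, 2.0, 1.0  # made-up values
g1, g2 = backward(w1, w2, x, t)

# numerical check: nudge each weight and watch the cost change
eps = 1e-6
num_g1 = (forward(w1 + eps, w2, x, t)[2] - forward(w1 - eps, w2, x, t)[2]) / (2 * eps)
num_g2 = (forward(w1, w2 + eps, x, t)[2] - forward(w1, w2 - eps, x, t)[2]) / (2 * eps)
print(g1, num_g1, g2, num_g2)
```

The numerical check is the "just look at the cost for different values of w1" idea: it works, but it needs extra forward passes per weight, whereas backprop gets every gradient from one backward pass.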

fresh tiger
desert oar
fresh tiger
#

Ahhh ok things are starting to click now

#

I also stumbled across this earlier:

desert oar
fresh tiger
#

Awesome, its super clear now! Thank you so much for all of ur help!! I really appreciate it 🙂

sterile fjord
#

Hey is anyone available to refactor a quick df.apply if statement into a np.where?

lapis sequoia
#

Can anyone recommend a youtube video that will help with creating certain bots which tell u different key words to use etc., and by bots I mean the ones that are meant to do stuff for you

#

I think this may have something to do with ai not so sure

alpine temple
#

And then I have an IterativeImputer object go over the columns with missing values after that point?

serene scaffold
serene scaffold
regal ingot
#

O,O,O
O,O,O
O,O,T

#

can anyone help me with an a star search question

lapis sequoia
#

i am lost with my "machine learning" project attempting to predict the winner of this upcoming world cup

#

i recognize that to train a model i would need to find at least two correlated variables that somehow connect back to the team who won (in a match). however, i realize that team names are not numbers but strings so that's not very useful in a correlation matrix

#

is my approach flawed? is there another (better) way to approach this problem?

remote vortex
#

are there any books related to python ml for beginners that I could download?

remote vortex
worldly dawn
remote vortex
worldly dawn
#

ideally, read both 🙂

remote vortex
reef dock
#

Hey what's a good source to get started on AWS for ML/Infra?

unreal flicker
#

Did anyone solve the turing.com test for the data science stack?

lapis sequoia
#

Do I need to learn data science before starting with AI?

lapis sequoia
#

Oh damn

#

Ty

random forum
#

Hi in tensorflow, I get a ValueError: Shapes (None, 1) and (None, 5) are incompatible. I am implementing an NLP scenario which has a multi class classification

#

I have converted my training data and labels into numpy arrays

wooden sail
#

what operation are you trying to do, exactly? this is telling you you placed something of size 1 where it should have been of size 5 (or backwards)

#

common cases where it happens are where you use something that expected a one-hot encoded vector, but you returned an int instead

#

e.g. [0,0,1,0,0] in one-hot vs [2] as an int
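a small sketch of the two label formats, using numpy only and made-up labels for 5 classes. in Keras terms (worth double-checking against the docs for your version), `categorical_crossentropy` expects the one-hot form and `sparse_categorical_crossentropy` expects the integer form, so a `(None, 1)` vs `(None, 5)` mismatch usually means the labels and the loss disagree.

```py
import numpy as np

num_classes = 5
int_labels = np.array([2, 0, 4, 1])            # integer class ids, shape (4,)

# row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[int_labels]      # shape (4, 5)

# going back from one-hot to integer ids
back_to_int = one_hot.argmax(axis=1)

print(int_labels.reshape(-1, 1).shape)  # column of ints, the "(None, 1)" side
print(one_hot.shape)                    # one-hot rows, the "(None, 5)" side
```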

alpine temple
#
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.30, random_state=123, stratify=[target])

ValueError: Found input variables with inconsistent numbers of samples: [11846, 1]

print(features.shape, target.shape) # (11846, 25) (11846,)
#

😐

desert oar
# lapis sequoia i recognize that to train a model i would need to find at least two correlated v...

you can think of "team" itself as a categorical variable, for which we have several encoding techniques. but i would not spend your energy trying to hammer your data into some generic format that people usually use for machine learning.

there is plenty of formal probability analysis that you can do with this. see for example the Elo ranking system https://en.m.wikipedia.org/wiki/Elo_rating_system, which predicts something akin to the probability of any one team beating any other team.
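as a minimal sketch of the core Elo formulas (the ratings and the K-factor of 20 below are made-up values, and real football rating systems add extra adjustments): the expected score plays the role of a win probability, and ratings move toward whoever did better than expected.

```py
def elo_expected(rating_a, rating_b):
    """Expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a, rating_b, score_a, k=20):
    """Return updated ratings; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = elo_expected(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

print(elo_expected(1900, 1500))   # strong favourite, about 0.91
print(elo_update(1500, 1500, 1))  # equal teams, A wins
```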

misty flint
#

look at the MLOps section

desert oar
alpine temple
#

(11846, 25) (11846,) are not the same length?

desert oar
#

you can try doing target.to_frame() to easily "upgrade" the series to dataframe

alpine folio
#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

alpine temple
#

Wait - I resolved it.

#

Damn it.

#
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.30, random_state=123, stratify=[target])

I'd read "array-like" as if you were passing in an array of the columns you'd be stratifying by. So the stratify parameter should be stratify=target, not stratify=[target]
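A small sketch with toy labels (not the real data) of where the "1" in that error presumably comes from: wrapping the target in a list produces an object whose first dimension is 1, so it looks like a single sample next to the 11846 rows of features.

```py
import numpy as np

target = np.array([0, 1, 0, 1, 1, 0])   # toy labels, 6 samples

print(len(target))                 # number of samples in the target itself
print(len([target]))               # a list holding one array has length 1
print(np.asarray([target]).shape)  # one "sample" with 6 columns
```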

#

Thanks for your help, @desert oar

tame condor
#

Hello, do you know how to make the attached picture? Sounds like it was made with python Matplotlib to me but I was wondering is there any resource you can suggest how to color the indicator function and super trend buy and sell order?

reef dock
misty flint
# reef dock cheers

np. take a look at fullstackdeeplearning, the 2022 course, if you are interested in diving deeper afterwards

reef dock
#

Sure, will do. Thanks

lapis sequoia
#

Dl is a part of AI right?

fresh tiger
wooden sail
#

you do, but those are fairly simple to compute

#

from the perspective of y as a function of z1, all there is is a single layer

#

by using the chain rule, you take derivatives by considering a single layer at a time

#

those are a lot simpler because each layer is an affine transformation composed with a nonlinear function. differentiating that itself requires the chain rule, but it's not so bad

desert oar
desert oar
#

and when they are not necessarily easy to differentiate "symbolically" (what you learn in calculus class), they can probably still be differentiated using "automatic differentiation"

#

the latter is part of the magic behind the various deep learning and differentiable computing frameworks, and why specifically the property of a program or algorithm being differentiable is interesting: because you can actually use it in backpropagation, or rather you can backpropagate through it

fresh tiger
wooden sail
#

let's take an easier example

#

you know the derivative of e^x is equal to e^x * dx/dx

#

now let's replace x with f(x)

#

the derivative of e^f(x) is e^f(x) df/dx (x)

#

or in a more general case of function composition, the derivative of g(f(x)) is g'(f(x)) * f'(x)

#

you can see that, inside of g and g', it's always f(x). we can black box this

#

then we treat f'(x) completely separately

#

it's just your usual chain rule

fresh tiger
#

Ahh ok, so if I am understanding this correctly, the black boxing of f(x) pretty much solves our issue of the deeply nested functions making things super complex?

wooden sail
#

yep

proper wing
#

hi i got a question about pytorch cnn's

#

why after a conv2d layer, then a maxpool layer and we have a new conv2d layer why is the new conv2d layer input the same as the output layer of 1st conv2d

#

even tho theres a maxpooling layer

#

ah nvm

#

i think its cuz its channels and not image size
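a quick sketch of the shape bookkeeping, with made-up layer sizes and two hypothetical helper functions (these are not PyTorch calls, just the usual output-size arithmetic assuming dilation 1 and no pooling padding). the pool shrinks height and width but leaves the channel count alone, which is why the second conv's `in_channels` matches the first conv's `out_channels`.

```py
def conv2d_out(size, kernel, stride=1, padding=0):
    # spatial output size of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def maxpool2d_out(size, kernel, stride=None):
    # pooling stride defaults to the kernel size
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1

shapes = [(3, 32, 32)]                      # (channels, height, width) of the input

# conv1: in_channels=3, out_channels=6, kernel 5
c, h, w = shapes[-1]
shapes.append((6, conv2d_out(h, 5), conv2d_out(w, 5)))

# maxpool with kernel 2: channels unchanged, height and width halved
c, h, w = shapes[-1]
shapes.append((c, maxpool2d_out(h, 2), maxpool2d_out(w, 2)))

# conv2: in_channels must equal the 6 channels coming out of the pool
c, h, w = shapes[-1]
shapes.append((16, conv2d_out(h, 5), conv2d_out(w, 5)))

for shape in shapes:
    print(shape)
```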

fresh tiger
# wooden sail yep

OH wait a sec, so basically we can find the dJ(X)/dw1 without having to look further back in the NN, ie if we had even more layers before w1

wooden sail
#

yep, chain rule

fresh tiger
#

but for example, how does dy/dz1 get calculated?

wooden sail
#

instead of expanding the composition and differentiating a complex function once, we take several easy derivatives

fresh tiger
#

ok I think this solves my doubt

wooden sail
#

exactly as we did in the example above

#

look, let's take g(f(x))

#

now let's call z = f(x)

#

we find the derivative of g(z) w.r.t. x

#

that's g'(z) z'

#

and z' = f'(x)

#

so g'(z) f'(x)

#

g'(z) doesn't need anything other than the derivative of g, evaluated at whatever z is. it doesn't matter what

#

z is just the previous layers evaluated at the given input

#

it's just chain rule

#

forget about the network

#

just review your calculus

fresh tiger
#

Ok ok I see now this is super clear. Yeah I checked out another video on chain rule and things are connecting now, I now understand WHY we use the chain rule and what it actually does.

Just to confirm the process overall goes:

If we specify f(x) and the sigmoid as an activation function, we can specify the derivatives of those then in the code (I assume we would have to calculate/specify the derivative of our functions if we are implementing back propagation?). This allows us to then take these simple derivatives we talked about earlier via the chain rule and hence find the derivative of J(W) w.r.t some w value that is super far back in the chain for example?

wooden sail
#

right

#

if you can compute functions and their individual derivatives, you can do the same for their composition by using the chain rule
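a minimal sketch of that statement, with three arbitrary example functions (my choice, nothing special about them): each "layer" is a pair (function, derivative), the forward pass remembers what went into each layer, and the backward pass multiplies the individual derivatives evaluated at those remembered inputs.

```py
import math

# each layer is (f, f'), both taking a single number
layers = [
    (lambda x: 3.0 * x + 1.0, lambda x: 3.0),          # affine
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),    # nonlinearity
    (math.exp, math.exp),                              # e^x is its own derivative
]

def value_and_derivative(layers, x):
    inputs = []                      # cache: the input each layer saw
    for f, _ in layers:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    # walk backwards, multiplying each layer's derivative at its cached input
    for (_, df), z in zip(reversed(layers), reversed(inputs)):
        grad *= df(z)
    return x, grad

def composed(x):
    for f, _ in layers:
        x = f(x)
    return x

value, grad = value_and_derivative(layers, 0.3)

# compare against a central finite difference of the whole composition
eps = 1e-6
numeric = (composed(0.3 + eps) - composed(0.3 - eps)) / (2 * eps)
print(grad, numeric)
```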

fresh tiger
#

Alright awesome, Its very clear to me now! Thank you both so much for all of ur help and for bearing with me! I appreciate it a lot 😄

stark ember
#

I'm trying to run a model on CUDA and am pretty clueless on what I'm doing - is there a way I can somehow get around this memory issue possibly at the cost of performance or is it a hard border of what I can and cannot run?

serene scaffold
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

lapis sequoia
lapis sequoia
stark ember
#

I'm not doing any of the loading myself, just using the library's exposed functions

#

If you want the source for whisper.load_model, I can provide that as well:

def load_model(name: str, device: Optional[Union[str, torch.device]] = None, download_root: str = None, in_memory: bool = False) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        download_root = os.getenv(
            "XDG_CACHE_HOME", 
            os.path.join(os.path.expanduser("~"), ".cache", "whisper")
        )

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
    else:
        raise RuntimeError(f"Model {name} not found; available models = {available_models()}")

    with (io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims)
    model.load_state_dict(checkpoint["model_state_dict"])

    return model.to(device)
serene scaffold
stark ember
#

Alright, thanks a lot for helping!

desert oar
#

@fresh tiger it looks like you worked through it with Edd, but this is a great example of why it's valuable to actually go through the motions with specific (simple) cases, like some small neural network with 4 inputs, 2 hidden nodes, and 3 outputs + softmax, with MSE loss and sigmoid activations. that's well within reach of what you can write out and work through entirely by hand, even completely avoiding vector notation and working with sums of scalar terms

#

it's not the kind of thing you need to do more than once or twice before you get it

#

part of the value of a good course, or at least a good textbook, is having exercises like the above presented to you

desert oar
stark ember
# desert oar can you post the error too? i can't read that screenshot
whisper> load large
Loading large model on CUDA
100%|█████████████████████████████████████| 2.87G/2.87G [05:57<00:00, 8.64MiB/s]
CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 7.12 GiB already allocated; 0 bytes free; 7.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
whisper> 

fringe anvil
#

hello, sorry to annoy, i just started uni (last month) for data science diploma. problem is, math was never a strength. is there any material out there, that is clear, and shows the maths directly in python.

cause looking at math formulas all day, i fried my brain 3 days in a row.

thanks for your time

alpine temple
#

Side note everyone - here's a silly question.

So this is a little right skewed, right?

#

Now watch before your very eyes as I REMOVE the slight right skewness.

# I am going to do a Log Transform of Min C, output the graph and p-value to see what improvement is shown.

logged_minc = df_imputed['Min °C'].copy() # not logged yet
logged_minc = (logged_minc - logged_minc.min()) + 1
logged_minc = np.log10(logged_minc)

plt.figure()
ax = sns.histplot(data=df_imputed, x=logged_minc, hue=df_imputed['Rain(Y/N)'], kde=True)
#

TADA!

The right skew is gone!

#

It is now a trailing left Skew.

Am I doing something wrong here?

fringe anvil
alpine temple
#

Log transform here as well - Standard Deviation has narrowed in terms of the values on the x-axis to have the deep crevasse of nothing from 0 to the start of the distribution be from 10 units to about 1.

But I have an overwhelming feeling what I should be saying in my report is that Log Transformations may not be the catch-all transformation for skewed data? Or is that taking it too far?
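A small sketch of the same effect on synthetic data (a gamma sample standing in for a mildly right-skewed column, so not the real temperatures; the `skewness` helper is written out by hand here). The shift-to-1 followed by a log stretches the low end a lot, so a mild right skew can flip into a left skew.

```py
import numpy as np

def skewness(values):
    # third standardized moment; positive means right-skewed
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

rng = np.random.default_rng(0)
temps = rng.gamma(shape=20.0, scale=1.0, size=10_000)   # mildly right-skewed

shifted = temps - temps.min() + 1      # same shift as in the snippet above
logged = np.log10(shifted)

print("before:", skewness(temps))
print("after: ", skewness(logged))
```

Comparing the skewness before and after, rather than eyeballing the histogram, is one way to decide whether a transform actually helped.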

brave sand
#
```py
elif args.attack_type == "remote":
    prisoner_loc = env.env.prisoner.location.copy()
    dists = []
    for i in range(env.env.num_known_cameras):
        cam_loc = env.env.camera_list[i].location.copy()
        dist = np.linalg.norm(np.array(prisoner_loc)-np.array(cam_loc))
        dists.append((i, dist))
    sorted_dists = sorted(dists, key=lambda x: x[1], reverse=True)
    idx = np.random.choice(5, args.C, replace=False)
    attack_action = [sorted_dists[i][0] for i in idx]
```
can someone explain this chunk of code? it's supposed to only perturbs the detection flag to True and set the detected location to be the camera's own location. how could I edit this code so it could specify the location of the camera as an action?
sick moon
#

https://www.youtube.com/watch?v=YqaNo0XfAD4
A quick talk I gave to PyHEP, in last september, organised by the people at CERN.
We talked about what Python can do in VR, not exactly related to particle physics, but they kindly invited us to show our work.

Through several examples of practical use cases the talk will present our experiences of 3D and Virtual Reality, all implemented in Python with the help of our 3D package "HARFANG 3D" :

Human factor study of a railway station in virtual reality
Using a aircraft simulation sandbox for AI training
Tele-operating a humanoid robot in VR...
desert oar
#

@fringe anvil @alpine temple it's unfortunately a disservice to students to try to force them to learn material that they don't have the prerequisites for. "in python" is also a bit of a challenge here. there might be some books that use numpy for linear algebra examples (i don't know of one), and i know there is at least one book using code examples to teach probability. but if you can at least specify a couple of things you don't understand, someone might be able to direct you to useful resources

#

realistically the only way to learn applied math is to learn math. you don't need a graduate degree, but you do need the fundamentals of linear algebra and multivariable calculus.

#

if you are significantly more comfortable with code than with traditional math notation, maybe a good exercise would be to translate traditional formulas into python functions
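for example, a sketch of that exercise with the sample variance, s² = (1 / (n − 1)) · Σ (xᵢ − x̄)² (the data is made up, and the function names are just illustrative): every symbol in the formula becomes a line of code, and the standard library gives something to check against.

```py
import statistics

def mean(xs):
    # x-bar: sum of the values divided by how many there are
    return sum(xs) / len(xs)

def sample_variance(xs):
    # s^2: sum of squared deviations from the mean, divided by n - 1
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))
print(statistics.variance(data))   # should match the hand-written version
```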

lunar wharf
#

Hey there! Whats a good free resource to learn python for data science / ML ? Something similar to TOP but for python?

lapis sequoia
#

Dl is a sub category of ML right?

rough magnet
lapis sequoia
#

Ty

desert oar
#

@alpine temple @fringe anvil in addition to what i said before, check the pinned messages in this channel. look for the MML book, among other things

arctic wedgeBOT
#

Hey @empty nacelle!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

undone storm
#

Hey, I want to do some kind of benchmarking (multiple datasets with multiple different algorithms) and I was searching for some tooling in order to make it reproducible. I found DVC and mlflow, they seem to support running experiments with configuration files of the hyperparameters etc, but all I could find was with only one dataset and one algorithm (but different models of this algorithm). Does anyone know if those tools are appropriate for my use case as well, or are there better alternatives?

hushed kraken
#

Hi, for my engineering project I have to predict data of solar panels on how much energy they get. And I'm new to deep learning so would Keras be the best option to make my model or are there better alternatives?

topaz wave
#

hi, am new to machine learning and needed some assistance from experienced ppl

#

heeeeeelp

wooden sail
#

you're gonna have to specify what with

topaz wave
# wooden sail you're gonna have to specify what with

well i started by learning numpy, pandas and matplotlib, made some projects, then learned scikit-learn and did things with that. now i am confused whether i should go further with normal algorithms or just move on to deep learning

#

any suggestions?

#

my main goal is to go further in deep learning and make deep learning models

wooden sail
#

what do you have in mind when you say "normal algorithms"

topaz wave
#

so should i complete them and then go for deep learning?

#

or am i good to go?

wooden sail
#

i would say you should complete those first, but i'm a big proponent of learning from the ground up. it really depends on how you prefer doing stuff

#

you'll pick up stuff that will be useful/necessary for deep learning

topaz wave
#

hmm so according to you i should strengthen my basics before goin for deep learning and neural networks

wooden sail
#

that would be my claim since ML is math

#

the more you know, the better

#

you can learn it ahead of time or at the same time, and i'm saying learning ahead of time is what i prefer, but that's personal flavor

topaz wave
#

oh alright that helps

#

thanku

meager crater
#

hey anyone has speech labeled dataset in English?

worldly wren
#

I dont know if this is the right place to ask this

#

but

#

I have a pandas dataframe, which has a column

#

which is full with joined hashtags leme show you

#

so is there anything I can do to like

#

first take these rows , then make a list by separating those strings

#

then arrange in ascending order of hashtags used

desert oar
# worldly wren

where did you get this data from? the ÿ characters appear to be some kind of record separator, represented incorrectly because the original data is in a different text encoding from what pandas used to load the data

worldly wren
#

I think that might be because

#

the guy who made the data , must have used a mac

#

or something. I am using win 11

desert oar
#

do you know what program they might have used to make the data?

worldly wren
#

I have no idea honestly

desert oar
#

i wonder if they just chose a byte that isn't valid ascii

worldly wren
#

so what can I do with these hashtags or captions

#

am supposed to analyze the data

desert oar
#

that's 0xFF which maybe was some overly-clever programmer's idea of a "character that nobody will use and will be obviously just a record separator"

desert oar
#

that way you will get a list of hashtags in each data frame cell, and you can proceed

#

!d pandas.Series.str.split

arctic wedgeBOT
#

```py
Series.str.split(pat=None, n=-1, expand=False, *, regex=None)
```
Split strings around given separator/delimiter.

Splits the string in the Series/Index from the beginning, at the specified delimiter string.
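a minimal sketch of what that call does on a toy frame (the column names `Hashtags` and `NewHash` and the sample rows are assumptions for illustration, and it assumes the separator really is the single character ÿ):

```py
import pandas as pd

# two made-up rows with hashtags joined by the ÿ separator
df = pd.DataFrame({"Hashtags": ["#financeÿ#moneyÿ#data", "#pythonÿ#data"]})

# keep the original column and put the lists in a new one
df["NewHash"] = df["Hashtags"].str.split("ÿ")

print(df["NewHash"].tolist())
```

each cell of the new column is then a Python list of strings, one per hashtag.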
worldly wren
#

where will I get the list tho

#

as a different column

desert oar
#

can you clarify your question?

worldly wren
#

well

#

I want to separate the hashtags

#

then ,

#

i want to count all the unique ones ,

#

and arrange them in ascending order

#

and make a bar graph out of it

desert oar
#

okay, you want the number of unique hashtags in each row?

#

or the number of times each hashtag appears? or something else?

worldly wren
#

no

#

or the number of hashtags that appears in the whole table

#

like including all rows

desert oar
#

that's just one number, not something you want to plot with a bar chart

worldly wren
#

but there will be many values right

#

for different hashtags

#

x axis will be the hashtag name

#

y axis will be its value

#

or count

desert oar
worldly wren
#

Yes

desert oar
#

good question. it might be worth your while to at least make an attempt at it on your own

#

i will give you the hint that there is no single function or method that will do this for you

worldly wren
#

welp I did try to do it before

#

also I am doing this for a school project

#

but I cant figure it out

desert oar
worldly wren
#

well I tried that one split command

#

and made an effort to create a list

#

by making a virtual column

desert oar
#

a virtual column?

worldly wren
#

and placing the list values inside it

#

did not work

desert oar
#

what is a virtual column?

worldly wren
#

idk

#

in mysql they call it virtual column

#

I don't know what they call it in python

desert oar
#

(keep in mind that pandas is not python, it's just a library written in python)

worldly wren
#

Yup sorry about that

desert oar
#

how did you attempt to create a virtual column? pandas doesn't really have that concept, so it's very likely that you just misunderstood what you were doing

worldly wren
#

I tried to do something like that

#

because in sql, I did a question like that before

desert oar
#

i'm asking you to describe specifically what you tried. pandas does not have virtual columns, so telling me that you tried to create one doesn't actually tell me anything!

#

how about this, can you share the code that you used?

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

worldly wren
#

also it wasn't really a virtual column I think

#

I had just updated the column prob

#

but the process is big

desert oar
#

indeed, i was going to suggest that but i didn't want to guess without seeing your code

#

in general, it's usually prudent to try out your code on a small sample of data before trying to use the full dataset

worldly wren
#

so am I supposed to use loops?

desert oar
#

using built-in pandas looping tools is usually much faster and considered better style

worldly wren
#

can you just tell me the commands

#

syntax i mean

desert oar
#

sorry, i won't do your homework for you

worldly wren
#

I don't mind that , I know that

#

just asking you what commands will be used

#

like about the split thing you mentioned

desert oar
#

you would access the column with [] and then call .str.split on the result

worldly wren
#

okay

desert oar
#

if you are a practicing data scientist you must be able to read docs and combine them with your fundamental understanding in order to solve a problem

worldly wren
#

so il get a list after that?

desert oar
#

you can't rely on other people for that

desert oar
worldly wren
#

I am planning to take Cse

desert oar
#

this is stated in the documentation

worldly wren
#

I am in 12th grade

desert oar
#

ok, then imo you've had enough education to know that in order to learn you need to solve problems for yourself, instead of waiting for solutions to be provided to you

#

i'm sure you hear this from people all the time, but expectations only go up as you get older and gain more experience

worldly wren
#

after my entrance test

#

am planning to learn everything properly

woeful hatch
desert oar
#

well i gave you plenty of advice so far. why don't you at least try to split the hashtags into lists?

worldly wren
#

yup doing that now

desert oar
#

assign them to a new column in the same dataframe perhaps, just in case you make a mistake and need to get the original

woeful hatch
#

pandas ??

desert oar
#

i also strongly suggest working with a small sample of this data

woeful hatch
#

I mostly use csv..

desert oar
#

1000 rows should be more than enough to be "realistic" for testing your code, without worrying about performance on a bigger dataset

desert oar
#

csv is good if you don't have complicated text in your data

woeful hatch
woeful hatch
desert oar
# woeful hatch Mm...

there is no universally best data format. so really the question needs to be qualified: in what context, for what purpose?

woeful hatch
desert oar
woeful hatch
#

Mm....

desert oar
#

you will encounter that in many many situations in programming, data science, and elsewhere in life

woeful hatch
#

Hmm...

#

I've just started like 2 months ago...

#

Busy with exam portions now...

lapis sequoia
#

Making a roadmap to self-study for data analyst, starting with Excel. I will also need statistics and probability, but which math topics do you need to study before statistics and probability? Graduated 30 years ago as an engineer, but this math stuff is buried deep. You may point me to Khan Academy courses, YouTube videos, books incl. The Manga Guides to ... from nostarch, FreeCodeCamp, ... Anything that may help me to know where to start and what's next.

meager forge
#

Is there any way to assess the quality of audio?

wooden sail
#

hoo boy, is there

serene scaffold
wooden sail
#

but it depends entirely on what kind of audio and the application

desert oar
#

there is a "Math for ML" book in the pinned messages

#

keep in mind that probability is often considered a subset of pure math, or at least tends to straddle the line between pure and applied. you will probably want to focus on understanding the fundamentals and don't need to worry (at least not at first) about things like moment generating functions

#

i just came up with this now: if you can derive the binomial distribution on the back of an envelope, you are off to a good start in practical probability
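for reference, the result of that envelope derivation as a sketch in code (the function name is mine): any particular sequence with k successes in n independent trials has probability p^k (1 − p)^(n − k), and there are C(n, k) such sequences.

```py
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# exactly 2 heads in 4 fair coin flips: 6 arrangements out of 16
print(binomial_pmf(2, 4, 0.5))

# the probabilities over k = 0..n should add up to 1
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```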

serene scaffold
lapis sequoia
desert oar
wooden sail
#

stats and linalg are good for your soul anyway

#

they'll need that

desert oar
#

that's a pretty good book

#

it might be a little advanced for your level if you forgot all your math

#

but it's also full of the kinds of things that, if you can implement them in practice, you will be able to solve a huge variety of real problems in a variety of domains

#

a good introductory statistics book would also be a really good idea

#

let me see if there's a probability book that starts a little on the lighter side, so you can get into the fun stuff more quickly without worrying too much about math prereqs

#

you can do a lot without much more than high school algebra

#

the best "data analysts" in my experience are the ones who don't worry about learning fancy stuff but are incredibly solid with their fundamentals, and have one or two extremely powerful tools that they know how to use proficiently, like SQL and Excel, and also have substantial domain knowledge about whatever field they work in

worldly wren
#

Hey I am back here

#

I managed to split the data a long time ago

#

was drinking tea

#

so now I have this

#

each row has a list of all the tags

#

how do I count how many times Finance or Money has been repeated

#

and stuff

serene scaffold
# worldly wren

is each element a list of strings, or a string that looks like a list of strings?

#

once you have a Series of lists of strings, you can do .explode().value_counts()

worldly wren
#

List of strings

#

@serene scaffold

serene scaffold
worldly wren
#

I have a series

#

Of that column

#

What next

serene scaffold
#

I already told you

worldly wren
#

Okay il try that

#

Rq

digital folio
#

```py
x = PivotTable.loc[PivotTable.Retailer == "Bela","Promotion Relevance (Cat)_Energy"]
print(x)
```

How can I get just the value
digital folio
worldly wren
#

I got the number of values in a single row

serene scaffold
serene scaffold
worldly wren
#

Hold on

#

Leme show you

serene scaffold
#

series is whatever the NewHash column is.

worldly wren
#

Okay

digital folio
#

it worked

worldly wren
#

{0: ['#finance', '#money', '#business', '#investing', '#investment', '#trading', '#stockmarket', '#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#dataanalyst', '#amankharwal', '#thecleverprogrammer'], 1: ['#healthcare', '#health', '#covid', '#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#dataanalyst', '#amankharwal', '#thecleverprogrammer'], 2: ['#data', '#datascience', '#dataanalysis', '#dataanalytics', '#datascientist', '#machinelearning', '#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#artificialintelligence', '#ai', '#deeplearning', '#machinelearningprojects', '#datascienceprojects', '#amankharwal', '#thecleverprogrammer', '#machinelearningmodels'], 3: ['#python', '#pythonprogramming', '#pythonprojects', '#pythoncode', '#pythonlearning', '#pythondeveloper', '#pythoncoding', '#pythonprogrammer', '#amankharwal', '#thecleverprogrammer', '#pythonprojects'], 4: ['#datavisualization', '#datascience', '#data', '#dataanalytics', '#machinelearning', '#dataanalysis', '#artificialintelligence', '#python', '#datascientist', '#bigdata', '#deeplearning', '#dataviz', '#ai', '#analytics', '#technology', '#dataanalyst', '#programming', '#pythonprogramming', '#statistics', '#coding', '#businessintelligence', '#datamining', '#tech', '#business', '#computerscience', '#tableau', '#database', '#thecleverprogrammer', '#amankharwal']}

serene scaffold
#

thank you, one moment

serene scaffold
worldly wren
#

overall

#

but this format is exactly what I want

serene scaffold
#

great, so we did it meow_party

worldly wren
#

well you did it

#

how do I get this though

#

oh

#

okay got it

serene scaffold
#
2    [#data, #datascience, #dataanalysis, #dataanal...
3    [#python, #pythonprogramming, #pythonprojects,...
4    [#datavisualization, #datascience, #data, #dat...
dtype: object

In [11]: s.explode()
Out[11]:
0                #finance
0                  #money
0               #business
0              #investing
0             #investment
             ...
4        #computerscience
4                #tableau
4               #database
4    #thecleverprogrammer
4            #amankharwal
Length: 98, dtype: object
worldly wren
#

leme try

serene scaffold
#

We can also clip the hashtag, if you don't want that.

In [14]: s.explode().str[1:].value_counts()
Out[14]:
thecleverprogrammer        5
amankharwal                5
pythonprojects             5
pythonprogramming          5
python                     5
pythoncode                 4
worldly wren
#

since I am going to make a graph out of em

#

I would prob need it

serene scaffold
#

you can use the .str accessor to do string methods to every element at once.

worldly wren
#

no attributes called value_counts

serene scaffold
worldly wren
#

print(x.explode().str.value_counts())

serene scaffold
worldly wren
#

I wanted the whole thing

#

so I was not supposed to do that?

serene scaffold
#

then remove the .str part entirely
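A small sketch on toy data (three made-up rows, not the real column) showing the difference: `value_counts` is a method of the Series itself, not of the `.str` accessor, and `ascending=True` gives the ascending order that was asked for earlier.

```py
import pandas as pd

s = pd.Series([
    ["#finance", "#money", "#data"],
    ["#python", "#data"],
    ["#data", "#python"],
])

exploded = s.explode()                          # one hashtag per row
counts = exploded.value_counts(ascending=True)  # rarest first, most common last

print(counts)
print(hasattr(exploded.str, "value_counts"))    # the .str accessor has no value_counts

# counts.plot.bar() would draw the bar chart (needs matplotlib installed)
```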

worldly wren
#

oh okay

#

#thecleverprogrammer 117 #amankharwal 117 #python 109 #machinelearning 97 #pythonprogramming 95 ... #bigdataanalytics 1 #qrcodes 1 #datascienceinterview 1 #facebook 1 #boxplots 1 Name: Hashtags, Length: 164, dtype: int64

#

Did it , all thanks to you

serene scaffold
worldly wren
#

well thank you

#

@serene scaffold Very sorry to bother you again

#

but just wanted to confirm one thing

#

the command we used made a new series right?

#

nvm thats a dumb question

serene scaffold
serene scaffold
weary mountain
#

@worldly wren

worldly wren
#

yeah?

weary mountain
#

Hello

worldly wren
#

hey

weary mountain
#

It's my cat

#

To make your day better

worldly wren
#

Thanks a lot man

weary mountain
#

Mhm

worldly wren
#

really needed a cute picture of a cat

#

but my day is made

#

I managed to get to a solution of something I have been trying to do for 2 days now

weary mountain
#

Here comes woofer to help with your stress

serene scaffold
weary mountain
#

Ok

gloomy anvil
#

hello yall! Can anyone of you tell me why statsmodels VECM.predict() returns sometimes float64 and sometimes complex128 arrays?

#

I don't understand what this triggers and statsmodels documentation does not say anything about this like always :/

#

also it seems to be random which prediction is complex128 and which is float64. not like "every third pred is X" or sth.

alpine folio
#

Does anyone know how I can convert a pytorch geometric GNN model to ONNX? I can't seem to find any examples on this topic

lapis sequoia
#

is anyone familiar with RandomForestClassifier

fringe anvil
rough magnet
tame ocean
#

I just started with AI and I want to know how models work?

gloomy anvil
# tame ocean I just started with AI and I want to know how models work?

it's simple. you usually take exemplary data and pass it to a model (e.g. neural network):
2, 3 -> 5
3, 4 -> 7
6, 2 -> 8
....
while training the model with such data, it will learn in this case to add the first two numbers to find the desired output (here: 5,7,8). Now you take new data that the model has not seen yet to test if it really works:

#

4, 2 -> 6! Congrats you trained a model to add up numbers 🙂
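
To make the toy example concrete, here is a rough sketch that "trains" a two-weight linear model on those three rows with plain gradient descent. The learning rate and step count are assumed values chosen to work for this tiny dataset. It is not how a real neural network library is used, just the same idea at the smallest possible scale.

```python
# Toy training data from the message above: two inputs -> their sum.
examples = [((2.0, 3.0), 5.0), ((3.0, 4.0), 7.0), ((6.0, 2.0), 8.0)]

# The "model" is just two weights: prediction = w1 * a + w2 * b.
w1, w2 = 0.0, 0.0
learning_rate = 0.01  # assumed value, small enough for this data

for _ in range(5000):
    # Gradient of the mean squared error with respect to each weight.
    g1 = g2 = 0.0
    for (a, b), target in examples:
        error = (w1 * a + w2 * b) - target
        g1 += 2 * error * a / len(examples)
        g2 += 2 * error * b / len(examples)
    # Move each weight a small step against its gradient.
    w1 -= learning_rate * g1
    w2 -= learning_rate * g2


def predict(a, b):
    """Apply the learned weights to a new pair of numbers."""
    return w1 * a + w2 * b


print(w1, w2)          # both weights end up very close to 1
print(predict(4, 2))   # unseen input, very close to 6
```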

tame ocean
#

like how do the layers work tho?

#

@gloomy anvil

gloomy anvil
#

if you are interested just watch a video on youtube on how neural nets work. its probably easier to see some graphics than explaining it here via text

tame ocean
#

k

#

ty

plush jungle
#

are there techniques to prevent your discriminator from learning faster than your generator in GANs?

#

my generator loss is almost always higher, and I'm told that ideally they should stay about equal until the generator starts to gradually get below .5 and the discriminator should end up at .5

desert oar
plush jungle
#

ok what exactly is going on here?

```py
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        self.model = nn.Sequential(
            *discriminator_block(opt.channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image
        ds_size = opt.img_size // 2 ** 4
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1), nn.Sigmoid())

    def forward(self, img):
        out = self.model(img)
        out = out.view(out.shape[0], -1)
        validity = self.adv_layer(out)

        return validity
```
#

this is not the usual way I see neural net layers defined

#

I saw a thing on the internet that suggested dumbing down the discriminator by removing a hidden layer

#

but every time I try to change any of the discriminator_block() lines it throws the following error

```
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x32768 and 8192x1)
```
#

how can I remove a hidden layer from this?
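
A possible reading of that error, sketched in plain Python so it runs without PyTorch. Each block halves the height and width (kernel 3, stride 2, padding 1), so the size fed to `nn.Linear` depends on how many blocks there are and on the last block's `out_filters`. The image size of 128 and the three-block channel list below are inferred from the numbers in the error message, not shown in the chat, so treat them as assumptions.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial size after one Conv2d with these hyperparameters."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(img_size, block_channels):
    """Number of features entering the Linear layer after all conv blocks."""
    size = img_size
    for _ in block_channels:
        size = conv_out(size)
    return block_channels[-1] * size * size


# Assumed img_size=128, because 8192 = 128 channels * 8 * 8.
four_blocks = flattened_features(128, [16, 32, 64, 128])
# Assumed edit: one block removed, last block still outputs 128 channels.
three_blocks = flattened_features(128, [16, 32, 128])

print(four_blocks)   # what the Linear layer was built for
print(three_blocks)  # what the shortened conv stack actually produces
```

Under these assumptions, removing a block means the `2 ** 4` in `ds_size` has to become `2 ** 3`, and the `128` in `nn.Linear(128 * ds_size ** 2, 1)` has to match the `out_filters` of whichever block is now last.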

merry ridge
#

Is there a way to vectorize this in some sensible way? I have a vector of random numbers, call it x. I need to create a numpy 1d array so that the ith element is a number drawn from a probability distribution with parameters that are a function of the ith element of x.

#

So for example, something like this:

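import numpy as np
import scipy.stats
x = scipy.stats.norm(0,1).rvs(10**5)
y = np.zeros(10**5)
for ind in range(x.shape[0]):
  # one draw per element, parameterised by x[ind]; the scale must be positive
  y[ind] = scipy.stats.norm(x[ind], abs(x[ind])/2).rvs()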

except not glacially slow.

desert oar
merry ridge
#

I don't think so, but it doesn't hurt to try. Let me see

#

it does not seem to work unless I am misunderstanding the syntax

desert oar
#
x = scipy.stats.norm(0,1).rvs(10**5)
y = scipy.stats.norm.rvs(x, x/2, size=10**5)

or whatever the exact setup is

#

also numpy random is probably faster

#

scipy uses this object oriented framework that involves a lot of indirection internally

#

and i would be very very surprised if numpy rng norm was not vectorized over mean and variance
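
For reference, a short sketch of the NumPy version. `Generator.normal` broadcasts over array-valued `loc` and `scale`, so no Python loop is needed. Taking the absolute value for the scale is an assumption added here, because a standard deviation cannot be negative; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Parameters: one random number per output element.
x = rng.normal(0.0, 1.0, size=10**5)

# One draw per element of x: mean x[i], standard deviation |x[i]| / 2.
# np.abs is an assumption so that the scale is never negative.
y = rng.normal(loc=x, scale=np.abs(x) / 2)

print(y.shape)
```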

merry ridge
#

I'll check it out and swap if that works. I can't recall the reason why I am using scipy, other than that I brought it up and was told that unless the reason is considerable, the speed increase from that swap isn't worth it because "the engineers want it this way".

desert oar
#

numpy is usually simpler, scipy is good if you like the OO interface or want to reuse the object representing a specific distribution repeatedly

#

with vectorization both should be "fast enough"

merry ridge
#

I think I must be just doing something weird with the syntax because your links suggest I can do it. That's helpful thank you

hasty mountain
#

2 quick questions:
-> In a translation model, translating sentences is better than translating each word inside a sentence, right?
-> If so, then each sentence will be assigned to a token, right? I'll have a single value for an entire sentence, no matter how big that sentence is?

#

Oh...now I think I get it... I'll have to tokenize each word, but the input will be the entire sentence. So each sentence will be a sequence of tokens...

forest quartz
hasty mountain
forest quartz
#

some models require you to tokenize the words yourself, but if you use a ready-made/pretrained model from Hugging Face you can just feed in the whole sentence and you're done

hasty mountain
#

Meh. The fun part is doing it all by myself...
even if it's through copying someone else's code

forest quartz
#

yeah its fun to code from scratch but those big ass model is fun to play with too

merry ridge
meager forge
#

Is there any way to assess the quality of audio?
To know whether it has disturbance in the sound
The audio is basically voice recordings

desert oar
#

@RenegadeZed#4600 one more thing: if you feel like you are lacking in core intuition (many of us are), the 3blue1brown "essence of" video sequences are excellent

#

you won't learn the mechanical equation stuff, but you will probably come away with a much richer intuition than you had before

plush jungle
stoic compass
#

Hey! I just wrote a data analysis project using Python on Jupyter Notebook and I really want someone to help me with a short review of it. Would you be up for this?

#

This is my first project and I want to get a second perspective from someone with more experience.

serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

stoic compass
#

It is quite big, I don't think it will be readable here. 😦

serene scaffold
#

if it's too big to share, how were you expecting to get it reviewed?

stoic compass
#

All right. I will try it later and ping you. Thanks a lot!

serene scaffold
#

If you had shared the code in your first message, I would be reviewing it right now. Now we're just wasting our time.

stoic compass
#

That is for sure, thanks for your advice, I will keep this in mind.

lapis sequoia
#

noob here, is it possible to use datetime dtype for training a ML model?

#

how should i approach the idea that the machine should pay attention (via feature selection) that the date or year column is pretty relevant considering

serene scaffold
lapis sequoia
#

random tree

serene scaffold
lapis sequoia
#

sorry, yes i mean random forest

serene scaffold
#

what's the first and last year in the data?

lapis sequoia
#

min() is 1930, max() is 2014

serene scaffold
lapis sequoia
#

1930-2-15

serene scaffold
lapis sequoia
#

each row represents a football match

#

each row contains a date column, among other attributes ofc

serene scaffold
serene scaffold
serene scaffold
lapis sequoia
serene scaffold
lapis sequoia
#

learn the different teams based on the attributes provided, ultimately to predict match outcomes (win/lose/draw)

serene scaffold
#

Great. So far, you've said what your model is, what your features are, and what the model is supposed to learn. Next time you have a question, please say all of these things in your first message.

serene scaffold
lapis sequoia
#

well, the reason i ask is simply because year might have statistical significance tied to it. as we get closer to modernity, the recurring national teams should have more statistical advantage than non-recurring national teams. if that makes sense?

serene scaffold
#

statistical significance
this term has a specific meaning that isn't the one that you meant.

#

anyway

lapis sequoia
#

noted

serene scaffold
#

obviously, if a team has existed from 1930 all the way to the 2000s, then the players who compose that team have changed. so, can we assume that for any team, the players that compose that team are the same within a calendar year?

lapis sequoia
#

i think that's a fair assumption

serene scaffold
#

and do we know which teams have faster rates of turnover?

lapis sequoia
#

hmm we do not, but i think if i did have that rate, i would certainly use it as a feature

#

ofc

serene scaffold
#

it doesn't sound to me like you have enough features to do anything interesting. if you pick two teams, and ask "which is more likely to win", you can just pick whichever wins more often. if there are considerations other than whichever wins more, you don't really have features for that.

lapis sequoia
#

you are exactly right, which is precisely the understanding that i have come to when approaching future models to train

serene scaffold
#

hmm, I actually have an idea. do you know about time series forecasting?

lapis sequoia
#

i am just entering the machine learning universe, so sadly no

serene scaffold
#

so sadly no
don't look at what you don't know as a negative. think of it as another thing you get to learn lemon_hyperpleased

lapis sequoia
#

but for my original question: in either general cases or specifically random forest classifier cases, can we designate datetime dtypes as features. i ask only because i understand machine learning models can only accept int and float dtypes (atm)

serene scaffold
#

in "normal" ML, the order of the observations (ie, the rows of data) doesn't really matter. but for time series stuff, the order is taken into account.

lapis sequoia
#

Time series forecasting means to forecast or to predict the future value over a period of time. It entails developing models based on previous data and applying them to make observations and guide future strategic decisions. The future is forecast or estimated based on what has already happened.

#

whoa that sounds very interesting

serene scaffold
#

so if you decide that you need to be as precise as months, and your time range starts at 1 January, 1970, and you want to encode 7 February, 1971, you would encode that as 14, because individual days don't matter, and February 1971 is the 14th month in the data.

#

alternatively, if you're treating time as a sort of category (like the name of the month or the day of the week), you can one-hot encode those
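
A small sketch of both encodings using only the standard library. The function names `month_index` and `one_hot_weekday` are made up for illustration. The month index counts from 1, matching the example where February 1971 is the 14th month of a range starting in January 1970.

```python
from datetime import date


def month_index(d, start):
    """Ordinal encoding: which month of the data range d falls in (1-based)."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1


def one_hot_weekday(d):
    """Categorical encoding: a 7-element vector with a single 1 (Monday first)."""
    vec = [0] * 7
    vec[d.weekday()] = 1
    return vec


start = date(1970, 1, 1)
match_day = date(1971, 2, 7)

print(month_index(match_day, start))  # ordered: later dates get larger numbers
print(one_hot_weekday(match_day))     # unordered: each weekday is its own column
```

The first encoding keeps the ordering of time; the second deliberately discards it and treats each weekday as an independent category.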

#

are you still with me? questions?

lapis sequoia
#

i was just internalizing your points

#

but yes it makes sense

serene scaffold
#

of course. take your time.

#

anyway, I don't know that you have enough features to do time series stuff, either. because if you had the turnover rate for each team, you could make a model that estimates how a team's turnover rate and past performance determines its future performance.

lapis sequoia
#

so if i understand correctly your examples, the first example (month) requires ordering to be accounted for in the encoding process, while the other one is just for assigning an int encode?

serene scaffold
lapis sequoia
#

ok that makes total sense

serene scaffold
#

for example, I used to work for Starbucks, and we knew that sales were higher on weekdays and on Friday especially, and that sales are especially low in July, and that sales are especially high in December.

#

and this is true week to week and year to year. so we don't really need to know that today is Friday, tomorrow is Saturday, and that Sunday comes afterward.

#

did you love my dank reference?

lapis sequoia
#

i have a follow up question about categorical dtypes, using the example of t-shirt sizes (S,M,L,XL)

serene scaffold
#

yeah, you'd one hot encode those, because knowing that some of them are bigger than others doesn't really help you that much

iron basalt
#

But ignoring the ordering can be useful as described.

#

Since you are doing Football I can give you the hint that you want injury data more than anything else.

serene scaffold
iron basalt
#

(Not easy to get, that is very private information)

lapis sequoia
iron basalt
#

It's something many want to try to do.

lapis sequoia
#

i guess the specific sport im thinking of is wrestling (freshman vs sophomore)

iron basalt
#

(And coach apps track injury data for maximum training efficiency too)

lapis sequoia
#

i dont believe you'd want grade level to just be a label encode

#

idk i could be wrong totally

serene scaffold
lapis sequoia
#

i honestly dont know, maybe im overcomplicating this concept. i think i will experiment with the question using small pilot tests

serene scaffold
#

like what percentage of games they win each year

#

that information is available given the features you described, but you'd have to fiddle around with it.

iron basalt
lapis sequoia
#

@iron basalt since you are familiar with football predictions, do you have some examples to share off the top of your head? preferable beginner-friendly examples?

lapis sequoia
#

in a new column

serene scaffold
iron basalt
#

Stelercus brought this up, but teams are not static, they are made up entities with players constantly moving in and out, and so without that data there is not too much that can be done. @lapis sequoia

#

It's not a team name that wins, but a temporary group of people that wins.

#

(And this gets really complicated, because while the nice story is that there are star players that carry teams (can still happen), the combinations of players are key (in ways that are not obvious, like two star players in their own teams might be good, but when put on the same team fail))

lapis sequoia
#

thank you both for entertaining my curious questions, i will leave you be

iron basalt
nova meadow
#

Hi everyone,
I have been working on a project where I have to extract text from images, and I am using pytesseract for that. Currently, I am working on preprocessing and have used basic transformations: binarization, and dilation followed by erosion. It is working well on some of the images, but for other images it is not even detecting the text. Can anyone suggest how to get better results?

iron basalt
nova meadow
#

Images where the text is black on a white background, like in books and papers generally, give really good results. However, in images where the text is white, it isn't able to detect it

iron basalt
nova meadow
#

This is one of the images where it failed to detect the text on the upper left

iron basalt
nova meadow
#

I'm using adaptive thresholding because global thresholding methods weren't giving good results

iron basalt
nova meadow
#

Yes, it is detecting that text

iron basalt
#

What colors are happening for the block with the 6 Person.

iron basalt
nova meadow
#

I have not tried edge detection yet.

iron basalt
#

If the edge detection gives some nice text without the block around it and it still does not work, it could be a multiple scales issue.

nova meadow
#

Alright. I will try this. Thanks for the inputs 🙂

iron basalt
#

(block size and constant subtracted)

nova meadow
#

Yes, exactly. I did that because a small change in those was giving pretty different results. I was also looking as to how I can set block size if there is a way to figure out optimal value but could not find it.

iron basalt
#

When the block size is small it can act kind of like edge detection.

iron basalt
#

(e.g. Gaussian blur)

nova meadow
#

Yes, I tried median blurring but blurring was capturing unnecessary noise

iron basalt
#

The blur size relative to the threshold block size is something to consider.

nova meadow
#

Alright. I will try that

iron basalt
#

Also try Gaussian on the threshold if you are doing mean.

nova meadow
#

And block size is somewhat relative to size of the image right? Other than that I could not come up with any relation to figure out block size

iron basalt
#

(And image size)

nova meadow
#

okaay

iron basalt
# nova meadow okaay

The regular thresholding does not handle such variations (it's globally, not locally, applied).

#

(Imagine what happens when block size equals the image size)

nova meadow
#

aaah okaay. Got it.

torpid quartz
#

I have no idea what you are talking about but it sounds cool

old widget
desert oar
#

it looks like you need to go through the docs and figure out what "flowRef" you need

olive vigil
#

Probably easier just to download the data here rather than use the API: https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/prof/details/download-telecharger.cfm?Lang=E

maiden widget
#

is there any library to recognise only letters of the alphabet from audio without an API, i.e. offline?

I tried VOSK , but as it is trying to recognise all Words and sentence, it has lots of errors.

I only want letter recognition.

serene scaffold
#

What does it need to do

lapis sequoia
#

i don't really understand the difference between data science and machine learning could someone explain to me?

serene scaffold
proper wing
#

Hi, why is the input for the linear 320

#

when the output of the 2nd conv layer is 20

serene scaffold
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

serene scaffold
#

Please don't ask people to read screenshots of text.

proper wing
#

..

#
    def __init__(self):
        super().__init__()
        # Simple CNN
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 1)
#

Hi, why is the input for the linear 320

#

@serene scaffold like this
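
One way to arrive at 320, sketched as plain arithmetic. The snippet only shows `__init__`, so the input size and the pooling are assumptions: a 28x28 single-channel image (as in MNIST) and a 2x2 max pool after each convolution. The 20 is the number of channels after `conv2`, while `nn.Linear` sees channels times height times width.

```python
def conv_valid(size, kernel):
    """Spatial size after a convolution with stride 1 and no padding."""
    return size - kernel + 1


def max_pool(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window


size = 28                              # assumed input height and width
size = max_pool(conv_valid(size, 5))   # conv1: 28 -> 24, pool: 24 -> 12
size = max_pool(conv_valid(size, 5))   # conv2: 12 -> 8,  pool: 8 -> 4

out_channels = 20                      # from nn.Conv2d(10, 20, kernel_size=5)
flat_features = out_channels * size * size

print(size, flat_features)
```

With a different input size or no pooling in `forward`, the number would differ, so it is worth checking against the actual `forward` method.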

cold ridge
#

Hii

#

Can anyone tell me how to automate data filtering in excel using pandas ?

serene scaffold
cold ridge
#

import pandas as pd

#

pd.read_excel(file)

cold ridge
lapis sequoia
#

@serene scaffold

#

A very silly thing about pandas that I noticed was that for object dtype. It can store some entries as int. And others as string.

#

That's very silly to me

#

Oh

#

Or maybe not. Maybe the mismatch was because of the "22.0" kind of strings.

lapis sequoia
serene scaffold
lapis sequoia
#

So if you check isinstance(x, str) on each element of an object dtype column, some of them were str and others were int

lapis sequoia
#

I thought it always transforms the whole series back to the most generalised dtype

#

Like strings

#

So if it has strings and int. All the ints being string

#

But it just keeps heterogeneous varieties

serene scaffold
lapis sequoia
#

So I think it's worth changing dtype to numeric each time for object dtype numerical columns
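
A short sketch of what is going on, with made-up values. An `object` column holds arbitrary Python objects, so pandas does not unify the element types; `pd.to_numeric` does the conversion explicitly.

```python
import pandas as pd

# Hypothetical column mixing Python ints and numeric strings.
s = pd.Series([1, "22.0", "3"], dtype=object)

# Each element keeps its own Python type inside an object column.
element_types = s.map(type)

# Convert everything to a single numeric dtype. Passing errors="coerce"
# would turn unparseable entries into NaN instead of raising.
converted = pd.to_numeric(s)

print(element_types)
print(converted)
```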

lapis sequoia
#

I always thought that's how it works. I didn't know about the existence of heterogeneous dtypes. No one told me 😭

serene scaffold
lapis sequoia
#

But I think I need to lemmatize you. Are you down?

serene scaffold
#

You're going to lemmatize me?

#

Sounds violent.

lapis sequoia
#

Gently lemmatize you*

serene scaffold
#

You lemmatize words. Not people.

boreal cape
#

my model keeps giving value of one kind

#

which leads to high accuracy how do I fix that

#

like I have two class yes and no

lapis sequoia
boreal cape
#

and my model predicts no

lapis sequoia
boreal cape
#

like it has 12,000 no values

#

and 2,000 yes values

lapis sequoia
#

In predictions?

boreal cape
#

and it keeps on predicting no

serene scaffold
boreal cape
#

no not in predictions

#

@lapis sequoia

lapis sequoia
#

Yeah so your input data is bad

#

I also had data like yours

boreal cape
#

but how do I fix that

lapis sequoia
#

Someone suggested a solution here though

#

It was called weighted class training something
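
The idea being recalled is usually called class weighting. A rough sketch of the common "balanced" heuristic, `n_samples / (n_classes * class_count)`, applied to the 12,000 / 2,000 split mentioned above; the label list is synthetic. Many libraries accept such weights (scikit-learn, for example, has a `class_weight` parameter on several classifiers), but how they are passed in depends on the model, which the chat does not name.

```python
from collections import Counter

# Synthetic labels matching the class counts described in the chat.
labels = ["no"] * 12000 + ["yes"] * 2000

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Rare classes get a larger weight, so mistakes on them cost more in training.
class_weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

print(class_weights)
```

With an imbalance like this, accuracy alone is misleading, since always predicting "no" already scores about 86%; per-class precision and recall are more informative.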