#data-science-and-ml | Python | Page 397

lapis sequoia Apr 16, 2022, 9:08 PM

#

Hi, is there anyone who can help me with my problem? I'm trying to create a robot in pybullet but I don't know why my joints aren't working correctly and why gravity isn't working as it should.

serene scaffold Apr 16, 2022, 9:09 PM

#

lapis sequoia Hi, is there anyone who can help me with my problem? I'm trying to create a robo...

is this a DS/AI question?

lapis sequoia Apr 16, 2022, 9:09 PM

#

AI because pybullet is used to train RL models

#

Where should I ask if not here?

frank edge Apr 16, 2022, 10:12 PM

#

robotics is not equal to AI

lapis sequoia Apr 16, 2022, 10:50 PM

#

frank edge robotics is not equal to AI

yeah but Reinforcement Learning is part of AI and pybullet is being used to create virtual environments for training RL agents

serene scaffold Apr 16, 2022, 11:09 PM

#

The question is fine for this channel, though unfortunately it's not likely to be answered given that it's very niche @lapis sequoia @frank edge

rugged tide Apr 16, 2022, 11:14 PM

#

Hi there 👋

#

I'm applying for data science degree apprenticeships and was wondering whether or not people have concocted any opinions as to how good they are?

#

I would be training as a data scientist while earning roughly £20000 give or take depending on the company (if I managed to land the apprenticeship ofc), and after 4 years would have a bachelors in Data Science paid for

#

my main long-term concern would be career progression with only a bachelors degree, I've seen a few posts on reddit about how its much harder to progress without at least a masters, with many people choosing to get their PhDs

#

So my question is, would you agree with that or not? Thanks in advance

serene scaffold Apr 16, 2022, 11:24 PM

#

@rugged tide you might ask in #career-advice, asking for those who are familiar with the job market in Britain.

rugged tide Apr 16, 2022, 11:24 PM

#

My apologies, didn't see that channel.

rugged tide Apr 16, 2022, 11:25 PM

#

serene scaffold @rugged tide you might ask in #career-advice, asking for those w...

as for the question about the masters, the posts i've seen have been worldwide, do you have an opinion on that part?

serene scaffold Apr 16, 2022, 11:25 PM

#

In the US, it's harder to get your foot in the door with only a bachelors related to data science, but once you get a job, progression isn't necessarily stopped by not having a higher degree.

#

also idk what £20000 can get you in the UK, but if you take today's exchange rate for GBP->USD and try to live on that here, it wouldn't be that great. Are you sure you could live on that?

rugged tide Apr 16, 2022, 11:28 PM

#

serene scaffold also idk what £20000 can get you in the UK, but if you take today's exchange rat...

yeah up north you can live on that easily tbh

#

and if in london I can commute

serene scaffold Apr 16, 2022, 11:28 PM

#

I see

rugged tide Apr 16, 2022, 11:28 PM

#

rugged tide yeah up north you can live on that easily tbh

I say easily, I guess I mean for my lifestyle lol

#

I don't really drink or rave so its fine

serene scaffold Apr 16, 2022, 11:29 PM

#

There are other ways to burn money fast BingShrug

rugged tide Apr 16, 2022, 11:30 PM

#

this is also true

#

btw, would you like me to delete my relatively long-winded post?

serene scaffold Apr 16, 2022, 11:31 PM

#

what income prospects are you looking at if you have a bachelors? because people encouraged me to do community college before starting the CS program "to save money", but in taking longer to get my degree, I missed out on a few years of higher income.

#

So in retrospect, I lost money by not getting my degree in four years.

#

obviously the situation is different. I'm just pointing out that future income is a consideration.

rugged tide Apr 16, 2022, 11:33 PM

#

serene scaffold what income prospects are you looking at if you have a bachelors? because people...

it's not really like that here, it's usually 3 years to obtain a bachelors, and that would be around 27k in uni-fee debt plus another 20-30k maintenance loan debt, so roughly 55k I guess for 3 years of uni? This would allow me to obtain a degree in 4 years, with the degree fully paid for while also earning a little bit of money

#

so 1 extra year for the degree, but no debt, and far more exp

#

there are very conflicting opinions on degree apprenticeships here though, some people think they're amazing, other people say they suck

serene scaffold Apr 16, 2022, 11:35 PM

#

well, one thing you'll have to do as a data scientist is figure out why similar events have different outcomes, so I guess you can start doing that now 😄

rugged tide Apr 16, 2022, 11:35 PM

#

🤣 thanks

hoary rover Apr 17, 2022, 12:17 AM

#

serene scaffold also idk what £20000 can get you in the UK, but if you take today's exchange rat...

PPP is vastly different in the UK.

hoary rover Apr 17, 2022, 12:25 AM

#

rugged tide I would be training as a data scientist while earning roughly £20000 give or tak...

This is definitely more of a #career-advice discussion, but since you haven't been moaned at yet I'll put my answer here 🤣

Getting a good apprenticeship in the UK is incredibly difficult, and it's unbelievably competitive for what it is (a job and a place at a mid-rank uni). My advice is to take what you have, assuming you've just finished A-levels or equivalent, and hit up the best name-brand university you can (Bristol, Warwick, Birmingham, etc.). Coming from an elite university is what makes the most difference, even over experience in some cases. Personally, I'm finishing my master's in September and will be an EO for the ONS, and I had a better edge than most of the kids applying just because my university's name was shinier.

#

I was definitely dumber though.

wicked grove Apr 17, 2022, 3:45 AM

#

hello,in k fold cross validation are the weights initialized in each fold?

flat sable Apr 17, 2022, 4:14 AM

#

hello guys, I need help: I need to make a project in https://robotbenchmark.net/benchmark/obstacle_avoidance/ can someone give me some tutorials to learn about the control library, or any useful documentation?

hollow flare Apr 17, 2022, 5:07 AM

#

What is your opinion on blobcity ai cloud?

modest comet Apr 17, 2022, 5:09 AM

#

Hello, I'm just learning Python for my homework. I was asked to make a program with ImageAI, and when I tried, it gave an error. Can someone explain why?

austere swift Apr 17, 2022, 7:22 AM

#

modest comet Hello, I'm just learning python for my homework. Asked to make a program for ima...

the pip install failed

#

try restarting the kernel (as the message suggests)

#

i doubt anybody is gonna just make a model for you, thats a job you'd have to pay quite a bit for

#

you can try to make one and we'll help you if you run into any snags though

loud flame Apr 17, 2022, 7:27 AM

#

austere swift i doubt anybody is gonna just make a model for you, thats a job you'd have to pa...

its a school project-

loud flame Apr 17, 2022, 7:27 AM

#

austere swift you can try to make one and we'll help you if you run into any snags though

yeah I've already made one

#

but I just don't know how to improve it further

#

I've used every parameter for a random forest classifier ( best_params )

austere swift Apr 17, 2022, 7:27 AM

#

loud flame its a school project-

!rule 8

arctic wedgeBOT Apr 17, 2022, 7:27 AM

#

Rules

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

loud flame Apr 17, 2022, 7:28 AM

#

its not an exam

#

🗿

austere swift Apr 17, 2022, 7:28 AM

#

homework

#

we can't do it for you, but we can help you

#

which is what i explained earlier

loud flame Apr 17, 2022, 7:28 AM

#

yeah u don't need to help

#

but could u give tips on improving it ( after hyperparameter tuning )

#

is there anything else u can do on a model

#

just tell me, I'll research and do it myself

#

+I'm not able to balance the data

austere swift Apr 17, 2022, 7:29 AM

#

"can anyone create a good model for the dataset I'll give" sounds a lot like asking for someone to do it for you

austere swift Apr 17, 2022, 7:29 AM

#

loud flame but could u give tips on improving it ( after hyperparameter tuning )

try using a different type of model

#

like a neural network or an svm etc

loud flame Apr 17, 2022, 7:33 AM

#

austere swift like a neural network or an svm etc

welp SVM did very bad on the data

#

idk bout neural network

#

its more of a y/n model

loud flame Apr 17, 2022, 7:34 AM

#

austere swift "can anyone create a good model for the dataset I'll give" sounds a lot like ask...

😭 gomenasai, I'll be clearer next time

loud flame Apr 17, 2022, 7:35 AM

#

austere swift try using a different type of model

So far I've used :

Logistic Regression Model
Gradient Boosting Model
Random Forest Classifier
Decision Tree Classifier
Naive Bayes
Pipeline

#

Random Forest gave the best results but, the options to improve it are limited

austere swift Apr 17, 2022, 7:38 AM

#

well try neural networks

loud flame Apr 17, 2022, 7:41 AM

#

austere swift well try neural networks

will it give only 2 outputs?

#

isn't it a multiclass thing

austere swift Apr 17, 2022, 7:41 AM

#

neural networks can pretty much do whatever you want, it's just about how you configure them

#

you can give as many or as few outputs as you want

loud flame Apr 17, 2022, 7:41 AM

#

I have a doubt, is there any way I can increase the speed of my GridSearchCV

#

its been 3.5 hours

loud flame Apr 17, 2022, 7:41 AM

#

austere swift neural networks can pretty much do whatever you want, it's just about how you co...

oh i see

#

I'll try it then

austere swift Apr 17, 2022, 7:42 AM

#

loud flame + I have a doubt, is there any way I can increase the speed of my GridSearchCV

you can increase your n_jobs

#

which is just how many parallel jobs it'll run

loud flame Apr 17, 2022, 7:42 AM

#

austere swift which is just how many parallel jobs it'll run

oh

#

how much?

#

I did n_jobs=-1 before

austere swift Apr 17, 2022, 7:42 AM

#

well -1 already means "use all available cores", so that's the maximum you can use anyways
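
For a sense of why a grid search can run for hours even with `n_jobs=-1`: the number of model fits is the product of the sizes of every parameter list, times the number of CV folds. A minimal sketch in plain Python; the grid values and `cv_folds = 5` are made up for illustration and are not taken from the conversation.

```python
from math import prod

# Hypothetical random-forest grid (illustrative values, not the asker's actual grid).
param_grid = {
    "n_estimators": [100, 200, 500, 1000],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}
cv_folds = 5  # assumed number of cross-validation folds


def count_fits(grid, folds):
    """Number of models an exhaustive grid search has to train."""
    combinations = prod(len(values) for values in grid.values())
    return combinations * folds


total_fits = count_fits(param_grid, cv_folds)
print(total_fits)
```

Trimming the grid, or sampling it with scikit-learn's `RandomizedSearchCV`, cuts that count directly; `n_jobs` only spreads the same work across cores.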

loud flame Apr 17, 2022, 7:42 AM

#

oh 🗿

#

if I update a running jupyter cell

#

I just added

#

n_jobs=-1

#

into a running cell

#

will it update the parameter?

#

@austere swift

austere swift Apr 17, 2022, 7:44 AM

#

loud flame will it update the parameter?

you have to restart the cell

bronze spire Apr 17, 2022, 7:56 AM

#

Where can I start learning about Data Science?

lapis sequoia Apr 17, 2022, 8:13 AM

#

bronze spire Where can I start learning about Data Science?

Depends on your background, do you know the basics of Python and/or data analysis ?

mint palm Apr 17, 2022, 8:22 AM

#

what can happen if i encode my csv dataset into very high number of columns using hash encoders

crisp flax Apr 17, 2022, 8:36 AM

#

bronze spire Where can I start learning about Data Science?

Just learn statistics.

austere swift Apr 17, 2022, 9:11 AM

#

it was doing so good too :(

next phoenix Apr 17, 2022, 10:54 AM

#

Found this interesting. Important Advanced Regression Techniques with a project : https://medium.com/coders-mojo/day-37-60-days-of-data-science-and-machine-learning-series-2e78afca9680

Medium

Day 37: 60 days of Data Science and Machine Learning Series

Advanced Regression Techniques with project ( Part 1) …

lapis sequoia Apr 17, 2022, 11:16 AM

#

serene scaffold The question is fine for this channel, though unfortunately it's not likely to b...

I figured it out and it was not really connected to pybullet engine, the problem was that I just wrote colision instead of collision

fleet trail Apr 17, 2022, 11:39 AM

#

Hello, what would be the best model for a dataset containing 5000 rows and 2098 columns

grand scaffold Apr 17, 2022, 12:01 PM

#

austere swift it was doing so good too :(

F

#

And I once got a loss rate of 2 and was upset lmao

#

Yo why is this not working lemon_thinking

```py
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt("data.csv", delimiter=",")
X = dataset[:,0:3]
y = dataset[:,3]
# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=3, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict([4,5,7]) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
```

lapis sequoia Apr 17, 2022, 12:08 PM

#

import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x,y):
    m_curr = b_curr = 0
    iterations = 100000
    n = len(x)
    learning_rate = 0.001

    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y-y_predicted)])
        plt.plot(x,y_predicted, color = "green")
        md = -(2/n)*sum(x*(y-y_predicted))
        bd = -(2/n)*sum(y-y_predicted)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print ("m {}, b {}, cost {} iteration {}".format(m_curr,b_curr,cost, i))

x = np.array([10,9,11,12,6,5,7,6,12,14])
y = np.array([95,90,90,105,75,75,80,85,110,115])

gradient_descent(x,y)

Even after so many iterations the cost is still 20
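
One way to sanity-check that cost: for a straight-line fit, the smallest achievable mean squared error can be computed in closed form, and gradient descent cannot go below it. A minimal sketch using `np.polyfit` on the same `x` and `y`; the names `m_best`, `b_best` and `min_cost` are illustrative.

```python
import numpy as np

x = np.array([10, 9, 11, 12, 6, 5, 7, 6, 12, 14])
y = np.array([95, 90, 90, 105, 75, 75, 80, 85, 110, 115])

# Closed-form least-squares line: the best m and b any linear fit can reach.
m_best, b_best = np.polyfit(x, y, deg=1)

# Mean squared error at that optimum; gradient descent converges toward this value.
min_cost = np.mean((y - (m_best * x + b_best)) ** 2)

print(f"m {m_best:.3f}, b {b_best:.3f}, minimum cost {min_cost:.3f}")
```

On this data the minimum comes out at roughly 20, because the points do not lie on a single line. A cost that settles near 20 therefore looks like convergence rather than a bug. Calling `plt.plot` on every one of the 100000 iterations mostly just slows the loop down.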

grand scaffold Apr 17, 2022, 12:09 PM

#

Tryna make a basic neural network in keras

rose agate Apr 17, 2022, 1:44 PM

#

loud flame +I'm not able to balance the data

if your ratio of classes isn't close to being even try using SMOTE or undersampling
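
A minimal sketch of the undersampling option using only NumPy: keep every row of the rarest class and a random, equally sized subset of each larger class. The function name and toy data are illustrative; SMOTE itself lives in the separate imbalanced-learn package and is not shown here.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=n_keep, replace=False)
        for cls in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Toy imbalanced data: 90 negatives, 10 positives.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))
```

Resample only the training split, so the test set keeps the real class ratio.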

cedar plank Apr 17, 2022, 2:45 PM

#

hey guys

#

i want help

#

can anyone help me with a machine learning course pls

#

can anyone help me

serene scaffold Apr 17, 2022, 2:51 PM

#

cedar plank can anyone help me with a machine learning course pls

you have to ask your actual question, not give a teaser for it

cedar plank Apr 17, 2022, 2:55 PM

#

the question is that I want an advanced machine learning course

serene scaffold Apr 17, 2022, 2:55 PM

#

you want someone to tell you what advanced ML course you should take?

cedar plank Apr 17, 2022, 2:56 PM

#

yes i want suggestions

serene scaffold Apr 17, 2022, 2:56 PM

#

the andrew ng course seems to be popular. I have not taken it.

cedar plank Apr 17, 2022, 2:57 PM

#

are u joking or talking serious

serene scaffold Apr 17, 2022, 2:58 PM

#

I am being serious. there's an ML course taught by Andrew Ng that I hear about a lot. But I have not taken it personally, so I can't tell you how it is from experience.

tough frigate Apr 17, 2022, 3:35 PM

#

me neither

#

i prefer ebooks

analog kiln Apr 17, 2022, 3:36 PM

#

    class MyCell(tf.keras.layers.AbstractRNNCell):
        @property
        def output_size(self):
            return 16

        @property
        def state_size(self):
            return 16

        def call(self, inputs, states):
            alpha_t, alpha_t_prev = inputs, states[0]
            return alpha_t, [alpha_t]

    my_cell = MyCell()
    layer_res = tf.keras.layers.RNN(my_cell)(logits)

anything obviously wrong with this code? logits contains a tensor with shape [batch_size, timesteps, logits]. I know it doesn't do anything right now but i'm getting this error: TypeError: Cannot iterate over a scalar tensor.

#

is it the way i'm defining the output and state sizes?

hollow sentinel Apr 17, 2022, 3:38 PM

#

so i just tried to get anaconda on my computer

#

never again

#

anaconda's like that hot ex that you want back in your life bc you think it'll change

#

but then it's exactly the same

#

stel i know you're gonna read this

#

jupyter notebook fucking sucks

serene scaffold Apr 17, 2022, 3:45 PM

#

hollow sentinel anaconda's like that hot ex that you want back in your life bc you think it'll c...

except they're not even hot. you just had bad taste as a teen.

hollow sentinel Apr 17, 2022, 3:45 PM

#

HAHAHXCKDN

#

that describes my last one so well

#

anywaysssss

#

thonny is the shawty

serene scaffold Apr 17, 2022, 3:46 PM

#

there was an ask reddit where someone asked "what is your high school crush doing now?" and someone said "I'm 40. he's still a douchebag who spends all his time at the gym. but I was into that at the time."

hollow sentinel Apr 17, 2022, 3:47 PM

#

i was facetiming this girl last night and she was like full time real estate, full time law firm stuff, full time college student and she was dating this dude who had nothing going for him

#

and i was like why?

#

and then she said verbatim "i was bored"

#

oh i'm gonna preach this: everyone please pip install pyforest

#

you don't need to write import numpy as np, import pandas as pd, import sklearn

#

it lazily imports everything
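
For anyone curious what "lazily imports" means mechanically: the name exists up front, but the real import only runs the first time the name is used. A toy sketch of that idea with the standard library; `LazyModule` is a made-up class for illustration and not pyforest's actual implementation.

```python
import importlib


class LazyModule:
    """Placeholder that imports the real module on first attribute access."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for names not set in __init__, i.e. the module's own names.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


# Hypothetical usage, with a stdlib module standing in for numpy or pandas.
fractions = LazyModule("fractions")
loaded_before = fractions._module is not None  # nothing imported yet
half = fractions.Fraction(1, 2)                # first use triggers the import
loaded_after = fractions._module is not None   # now the real module is attached
print(loaded_before, half, loaded_after)
```

The trade-off is that a misspelled or missing package only fails at first use, not at the top of the file.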

inland belfry Apr 17, 2022, 3:50 PM

#

i am using mediapipe to make stick figures from video (sorry for the rick roll i was just using it as a test video)

serene scaffold Apr 17, 2022, 3:50 PM

#

inland belfry i am using mediapipe to make stick figures from video (sorry for the rick roll i...

does that use AI in some way?

inland belfry Apr 17, 2022, 3:50 PM

#

mediapipe

serene scaffold Apr 17, 2022, 3:50 PM

#

what is that

inland belfry Apr 17, 2022, 3:51 PM

#

look it up

serene scaffold Apr 17, 2022, 3:51 PM

#

no

inland belfry Apr 17, 2022, 3:51 PM

#

ok

#

it's a library that does pose detection and face tracking and stuff

coarse narwhal Apr 17, 2022, 4:01 PM

#

hollow sentinel so i just tried to get anaconda on my computer

what should i use instead of jupyter?

desert bear Apr 17, 2022, 4:21 PM

#

Hi, does someone know if you can export a yolov5 file in xml format? I already found https://pytorch.org/hub/ultralytics_yolov5/ but I can't find it there. I use OpenCV for all the image stuff

bronze spire Apr 17, 2022, 4:25 PM

#

lapis sequoia Depends on your background, do you know the basics of Python and/or data analysi...

I know the basics of python

#

@lapis sequoia So, from where should I learn

lapis sequoia Apr 17, 2022, 4:28 PM

#

bronze spire I know the basics of python

I like Medium articles (practical applications) and Codecademy (structured learning) but there are also lots of free materials on Kaggle for example

bronze spire Apr 17, 2022, 4:29 PM

#

lapis sequoia I like Medium articles (practical aplications) and Codecademy (structured learni...

Can you send me a link to something you recommend the most

lapis sequoia Apr 17, 2022, 4:31 PM

#

bronze spire Can you send me a link to something you recommend the most

Sure thing https://www.codecademy.com/learn/paths/data-science

Codecademy

Learn Data Science - Kick-Start Your Career | Codecademy

Become a Data Scientist. Data Science is one of the fastest growing fields in tech. Get this dream job by mastering the skills you need to analyze data with SQL and Python. Then, go even further by building Machine Learning algorithms.

bronze spire Apr 17, 2022, 4:31 PM

#

lapis sequoia Sure thing https://www.codecademy.com/learn/paths/data-science

Thanks

next phoenix Apr 17, 2022, 4:49 PM

#

Found this interesting : https://medium.datadriveninvestor.com/writing-efficient-python-code-part-2-4bf876712677?sk=c83f6267d0c74479f626d11fd59942a7

serene scaffold Apr 17, 2022, 5:01 PM

#

next phoenix Found this interesting : https://medium.datadriveninvestor.com/writing-efficient...

I see that your entire participation in this channel is posting content from that same author. This is tantamount to advertising, so we're going to remove you if you don't actually contribute.

hollow sentinel Apr 17, 2022, 5:06 PM

#

coarse narwhal what should i use instead of jupyter?

thonny

#

thonny mad cute

#

and gives nice debugging tips

#

mad easy to install and update packages

#

lightweight

#

people give it shit bc it’s for beginners but i prefer it

#

it basically has a rubber duck installed that talks to you about your code and tries to suggest what went wrong

#

more descriptive than some long ass error message you’ll run a rabbit hole for hours looking to solve

wicked grove Apr 17, 2022, 5:24 PM

#

hello, can someone please tell me how I can use Grad-CAM to correct my model?

#

I can see the areas where my model is making a mistake using Grad-CAM

cedar plank Apr 17, 2022, 5:41 PM

#

can anyone send a course from Codecademy to start in machine learning if I studied data analysis

grave frost Apr 17, 2022, 5:45 PM

#

cedar plank can anyone send a course from code academy to start in machine learning if i stu...

what did you study in data analysis

cedar plank Apr 17, 2022, 5:46 PM

#

i studied statistics and spreadsheets and business metrics

#

iknow this is not enough

grave frost Apr 17, 2022, 5:47 PM

#

is that a MOOC or a degree?

misty flint Apr 17, 2022, 5:58 PM

#

coarse narwhal what should i use instead of jupyter?

google colab

#

kekHands

#

aka their version of jupyter notebooks

#

no need to install anything

#

CLe_FeelsEvilLurk

#

~~only obscure errors~~ i mean what

#

RunFail

cedar plank Apr 17, 2022, 6:01 PM

#

grave frost is that a MOOC or a degree?

i think yes

grave frost Apr 17, 2022, 6:03 PM

#

> i think yes
yes

cedar plank Apr 17, 2022, 6:05 PM

#

grave frost > i think yes yes

ok yes

#

can you tell me a course to begin in ML in codecademy

grave frost Apr 17, 2022, 6:09 PM

#

yes

inland belfry Apr 17, 2022, 6:10 PM

#

coarse narwhal what should i use instead of jupyter?

notepad

#

:)

cedar plank Apr 17, 2022, 6:36 PM

#

cedar plank can you tell me a course to begin in ML in codecademy

???

lapis sequoia Apr 17, 2022, 8:00 PM

#

cedar plank ???

You can find all their courses in the catalogue

thorn bobcat Apr 17, 2022, 9:46 PM

#

yo

calm palm Apr 17, 2022, 10:10 PM

#

Hey small question since questions about pandas seem to fall short in the help channels, does anybody know of a quick way to replace values in a column with 'day' if it falls within a certain time of the day and 'night' if it falls within a certain time in a pandas dataframe? I've searched and dataframe.between_time does not seem to return it in a way such that I can set a certain column in those values to another value

calm palm Apr 17, 2022, 10:33 PM

#

Nvm I think I got it

upper mural Apr 17, 2022, 11:40 PM

#

Hello. Could anybody recommend a textbook on data science with Python that mainly focuses on quant methods?

#

if you know more than one book, please do refer a few for evaluation

barren barn Apr 18, 2022, 2:27 AM

#

This is probably the wrong section but is python able to make a script that grabs a =count(E2:Ewhatever the last one is) on a specific column in multiple sheets and print them in a new column of a specific file?

#

I have a small macro from Fiji I made/found parts of and it runs an analysis on a set of images within a folder and spits out all the excel sheets into a folder. So I have that backbone but I don't think it's going to work

#

so if anyone knows if this is easily possible, that would be great

bronze spire Apr 18, 2022, 2:31 AM

#

Any free source from where I can learn Data Science?

serene scaffold Apr 18, 2022, 2:49 AM

#

bronze spire Any free source from where I can learn Data Science?

!resources data science

arctic wedgeBOT Apr 18, 2022, 2:49 AM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

jaunty belfry Apr 18, 2022, 6:20 AM

#

hello

#

#

what can be the right answer for this question.?

#

I selected B as the right answer

#

but unfortunately its wrong

jovial sleet Apr 18, 2022, 6:28 AM

#

jaunty belfry

B looks like the correct answer. I'm guessing they messed up the question

jaunty belfry Apr 18, 2022, 6:30 AM

#

#

What about this?

#

@jovial sleet

jaunty belfry Apr 18, 2022, 6:31 AM

#

jovial sleet B looks like the correct answer. I'm guessing they messed up the question

for that one they gave C as answer

tough frigate Apr 18, 2022, 7:52 AM

#

anybody got interviewed at Uber for data analyst ? any help is appreciated.

vernal hull Apr 18, 2022, 8:26 AM

#

hi'

regal ingot Apr 18, 2022, 11:41 AM

#

hey

bronze spire Apr 18, 2022, 12:32 PM

#

serene scaffold !resources data science

Thanks

burnt pilot Apr 18, 2022, 1:07 PM

#

How can I manipulate the Names of a DataFrame that are within a specific range of Ids

#

using Pandas

serene scaffold Apr 18, 2022, 1:25 PM

#

burnt pilot How can I manipulate the Names of a Data Frame that are within a specific range ...

you can use `df.loc[start_id:end_id]` to get the desired rows, but beyond that, you'll have to be a lot more specific.

burnt pilot Apr 18, 2022, 1:35 PM

#

I have a DataFrame that looks like this:
Bertha,F,1320
Sarah,F,1288
Annie,F,1258
Clara,F,1226
Ella,F,1156
Florence,F,1063
Cora,F,1045
Martha,F,1040
Laura,F,1012
The header is ['Name','Gender','Id'] and I want to change the names that have an Id within the range 1180-1200 to John

burnt pilot Apr 18, 2022, 1:35 PM

#

serene scaffold you can use `df.loc[start_id:end_id]` to get the desired rows, but beyond that, ...

Is this enough

serene scaffold Apr 18, 2022, 1:37 PM

#

you would just need to do `df.loc[df['Id'].between(1180, 1200), 'Name'] = 'John'`

#

if you change the Id column to be the index, it would be `df.loc[1180:1200, 'Name'] = 'John'`
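
Both variants side by side, as a small runnable sketch on the first few sample rows from the question. The range 1200-1300 is an assumption chosen for illustration, since none of the sample Ids fall in 1180-1200. The index version sorts the index first so the label slice is well defined even when the bounds are not actual Ids.

```python
import pandas as pd

# Sample rows from the question; the header is ['Name', 'Gender', 'Id'].
df = pd.DataFrame(
    {
        "Name": ["Bertha", "Sarah", "Annie", "Clara", "Ella", "Florence"],
        "Gender": ["F"] * 6,
        "Id": [1320, 1288, 1258, 1226, 1156, 1063],
    }
)

# Illustrative range (assumption): chosen so that some sample rows match.
low, high = 1200, 1300

# Option 1: boolean mask on the Id column; between() is inclusive on both ends.
by_mask = df.copy()
by_mask.loc[by_mask["Id"].between(low, high), "Name"] = "John"

# Option 2: make Id the index and slice by label (also inclusive on both ends).
by_index = df.set_index("Id").sort_index()
by_index.loc[low:high, "Name"] = "John"

print(by_mask)
print(by_index)
```

In both cases the assignment changes the frame in place, so the result is seen by printing the frame afterwards.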

burnt pilot Apr 18, 2022, 1:39 PM

#

serene scaffold you would just need to do `df.loc[df['Id'].between(1180, 1200), 'Name'] = 'John'...

lemme test it

burnt pilot Apr 18, 2022, 1:40 PM

#

serene scaffold you would just need to do `df.loc[df['Id'].between(1180, 1200), 'Name'] = 'John'...

im just getting
```py
k:
John
```

serene scaffold Apr 18, 2022, 1:45 PM

#

burnt pilot lemme test it

I'd have to see the code that caused that to be printed.

#

like, what caused k: to be displayed?

burnt pilot Apr 18, 2022, 1:48 PM

#

```py
k = df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'
print('k:\n',k)
```

burnt pilot Apr 18, 2022, 1:48 PM

#

serene scaffold I'd have to see the code that caused that to be printed.

sure

serene scaffold Apr 18, 2022, 1:49 PM

#

burnt pilot ```py k = df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John' print('k:\n',k)...

you have to print df to see the result

#

`k = df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'` is the same as

df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'
k = 'John'

hoary rover Apr 18, 2022, 1:50 PM

#

^

#

df is the dataframe you created from df_1880 within your parameters.

burnt pilot Apr 18, 2022, 1:52 PM

#

serene scaffold you have to print `df` to see the result

No I fixed it
```py
k = df_1880.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'
print('k:\n',k)
```

#

its working now

#

when I look into variable explorer

serene scaffold Apr 18, 2022, 1:52 PM

#

burnt pilot No I fixed it ```py k = df_1880.loc[df_1880['Id'].between(42, 49), 'Name'] = 'Jo...

great. but k is still just going to be 'John'

burnt pilot Apr 18, 2022, 1:52 PM

#

but still it just shows John

#

yeah

hoary rover Apr 18, 2022, 1:52 PM

#

Yes, because you overwrote df_1880

serene scaffold Apr 18, 2022, 1:52 PM

#

k is irrelevant

#

`df.loc[...] = ...` is a method call. it's not actually doing assignment.

hoary rover Apr 18, 2022, 1:53 PM

#

k is just a variable you created. How do you want to display the output?

burnt pilot Apr 18, 2022, 1:53 PM

#

I see

#

I wanted to get smth like this
```py
Name
2117 Vince
2118 Vivian
2119 Whit
2120 Willaim
2121 Winifred
2122 Wirt
2123 Woodson
2124 Woody
2125 Worley
2126 Zed
```

hoary rover Apr 18, 2022, 1:54 PM

#

Yes. Then just write df_1880 underneath your input.

serene scaffold Apr 18, 2022, 1:54 PM

#

it's the same as `df.loc.__setitem__(x, y)`. it changes the state of `df`. it doesn't write any new variables. but since you stacked it with `k =`, it assigned to `k`
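
The same behaviour can be shown without pandas. A small sketch where `Recorder` is a made-up stand-in for `df.loc` that only records what `__setitem__` receives:

```python
class Recorder:
    """Tiny stand-in for df.loc: it only records what __setitem__ receives."""

    def __init__(self):
        self.calls = []

    def __setitem__(self, key, value):
        self.calls.append((key, value))


loc = Recorder()

# Chained assignment: the right-hand value is evaluated once, then bound to
# each target from left to right. `k` gets the value itself, while
# `loc["rows"] = ...` turns into loc.__setitem__("rows", "John").
k = loc["rows"] = "John"

print(k)          # the plain string, not a filtered frame
print(loc.calls)  # the single recorded __setitem__ call
```

To see the changed rows, print the frame (or a selection of it) after the assignment instead of capturing the assignment in a variable.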

hoary rover Apr 18, 2022, 1:55 PM

#

Pandas is the apple imac of python code. Please check the documentation.

burnt pilot Apr 18, 2022, 1:55 PM

#

hoary rover Pandas is the apple imac of python code. Please check the documentation.

I tried to but got lost in it

serene scaffold Apr 18, 2022, 1:55 PM

#

hoary rover Pandas is the apple imac of python code. Please check the documentation.

I appreciate where this is coming from, but the pandas docs are incomprehensible if you don't understand the basics of pandas.

#

but hoary rover is right that pandas works very differently from the rest of Python

burnt pilot Apr 18, 2022, 1:57 PM

#

serene scaffold I appreciate where this is coming from, but the pandas docs are incomprehensible...

Of course this is more advanced but I just got into it and dont know how it reacts to code so I dont know what to expect

#

```py
m = df_1880['Name'].value_counts()['Mary']
print('Mary :\n',m)
```
here I counted how many times the name Mary occurs

#

how would I do the same thing for 3 dataframes at once

serene scaffold Apr 18, 2022, 2:12 PM

#

burnt pilot how would I do the same thing for 3 dataframes at once

what distinguishes the three dataframes?

tidal sonnet Apr 18, 2022, 2:57 PM

#

what do you think about julia for datascience ?

pseudo wren Apr 18, 2022, 3:17 PM

#

maybe this is the dumbest error but i keep running into it

serene scaffold Apr 18, 2022, 3:18 PM

#

tidal sonnet what do you think about julia for datascience ?

I've heard that it performs well for CPU-bound work (which as you may know, is one of Python's greatest drawbacks), but it just doesn't have the ecosystem that Python has.

pseudo wren Apr 18, 2022, 3:18 PM

#

whenever i am trying to insert table values with sqlite3, i get a syntax error and i am not sure what's wrong with it or my spacing

#

idk

#

```py
      Car_Name
      Year
      Selling_Price
      Present_Price
      Kms_Driven
      Fuel_Type
      Seller_Type
      Transmission
      Owner
    )
    VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', tups)
connector.commit()
connector.close()
```

serene scaffold Apr 18, 2022, 3:18 PM

#

you might try #databases

pseudo wren Apr 18, 2022, 3:18 PM

#

near "Year": syntax error
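
A likely cause, judging only from the fragment shown: the column names in the INSERT list are not separated by commas, which is the kind of thing that produces `near "Year": syntax error`, and there are 22 `?` placeholders for 9 columns. A minimal sketch with an in-memory database; the table name `cars`, the column types, and the sample row are assumptions, since the start of the original statement is not visible.

```python
import sqlite3

connector = sqlite3.connect(":memory:")  # throwaway database for the demo
connector.execute(
    """
    CREATE TABLE cars (
        Car_Name TEXT, Year INTEGER, Selling_Price REAL, Present_Price REAL,
        Kms_Driven INTEGER, Fuel_Type TEXT, Seller_Type TEXT,
        Transmission TEXT, Owner INTEGER
    )
    """
)

# Hypothetical sample row: one tuple per row, one value per column.
tups = [("ritz", 2014, 3.35, 5.59, 27000, "Petrol", "Dealer", "Manual", 0)]

# Columns separated by commas, and exactly nine placeholders for nine columns.
connector.executemany(
    """
    INSERT INTO cars (
        Car_Name, Year, Selling_Price, Present_Price, Kms_Driven,
        Fuel_Type, Seller_Type, Transmission, Owner
    )
    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    """,
    tups,
)
connector.commit()

rows = connector.execute("SELECT Car_Name, Year FROM cars").fetchall()
print(rows)
connector.close()
```

Each tuple in `tups` has to carry the same number of values as there are placeholders, otherwise sqlite3 raises a binding error instead.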

#

thanks!

serene scaffold Apr 18, 2022, 3:19 PM

#

you should probably specify what flavor of SQL you're using as well

burnt pilot Apr 18, 2022, 3:25 PM

#

serene scaffold what distinguishes the three dataframes?

df_1880,df_1881,df_1882

serene scaffold Apr 18, 2022, 3:25 PM

#

burnt pilot df_1880,df_1881,df_1882

if the columns of these three DFs represent the same thing, they should be one DF.

#

are 1880-2 just years?

young harness Apr 18, 2022, 3:40 PM

#

kind of a beginner question but can someone explain buffer size and batch size to me in simple terms? (in ML i mean)

serene scaffold Apr 18, 2022, 3:48 PM

#

young harness kind of a beginner question but can someone explain buffer size and batch size t...

are you doing audio processing?

young harness Apr 18, 2022, 3:49 PM

#

serene scaffold are you doing audio processing?

nope, im following a dcgan tutorial

serene scaffold Apr 18, 2022, 3:49 PM

#

young harness nope, im following a dcgan tutorial

I haven't heard of buffer size before. batch size is the number of training instances that are passed through the network at once.

young harness Apr 18, 2022, 3:50 PM

#

in an epoch or an iteration?

serene scaffold Apr 18, 2022, 3:51 PM

#

in an iteration, I guess. an epoch is a pass over the entire training set.
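
The relationship between batch size, iterations and epochs can be sketched as a plain loop. All numbers here are hypothetical, and `iterate_batches` is a made-up helper, not something from the tutorial.

```python
import math


def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most `batch_size` samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Hypothetical numbers for illustration.
dataset = list(range(1000))   # 1000 training instances
batch_size = 64
epochs = 3

iterations_per_epoch = math.ceil(len(dataset) / batch_size)

total_iterations = 0
for epoch in range(epochs):                      # one epoch = one full pass over the data
    for batch in iterate_batches(dataset, batch_size):
        total_iterations += 1                    # one iteration = one batch through the network

print(iterations_per_epoch, total_iterations)
```

As for buffer size: if the tutorial is the TensorFlow DCGAN one, it is probably the size of the shuffle buffer passed to `Dataset.shuffle`, but that is a guess since the tutorial is not named.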

young harness Apr 18, 2022, 3:51 PM

#

thanks for the explanation, i was kinda confused :)

#

ima look more into what buffer size is though

burnt pilot Apr 18, 2022, 3:52 PM

#

serene scaffold are 1880-2 just years?

Yeah these are Years so its 3 years

serene scaffold Apr 18, 2022, 3:53 PM

#

burnt pilot Yeah these are Years so its 3 years

then Year should probably be a column. splitting things into a variable number of dataframes (like having a separate dataframe for each year in the data) is almost always bad.

burnt pilot Apr 18, 2022, 3:57 PM

#

So you suggest to create a DataFrame with these inside

tough tundra Apr 18, 2022, 3:58 PM

#

can someone help me to create a ChatBot Song Recommender System,
I can provide all the information which I have

serene scaffold Apr 18, 2022, 4:00 PM

#

burnt pilot So you suggest to create a DataFrame with these inside

yes. and then you can do df.groupby('Year')['Name'].value_counts()
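
A rough sketch of that suggestion, assuming each per-year frame has a `Name` column. The sample names are invented, and the three small frames only stand in for `df_1880`, `df_1881`, and `df_1882`.

```python
import pandas as pd

# Hypothetical stand-ins for df_1880, df_1881, df_1882.
df_1880 = pd.DataFrame({"Name": ["Mary", "Anna", "Mary"]})
df_1881 = pd.DataFrame({"Name": ["Mary", "Emma"]})
df_1882 = pd.DataFrame({"Name": ["Anna", "Anna", "Emma"]})

frames = {1880: df_1880, 1881: df_1881, 1882: df_1882}

# Tag each frame with its year, then stack everything into one DataFrame.
df = pd.concat(
    [frame.assign(Year=year) for year, frame in frames.items()],
    ignore_index=True,
)

# Per-year name counts, indexed by (Year, Name).
counts = df.groupby("Year")["Name"].value_counts()
```

With the year stored as a column, adding a fourth year is one more entry in `frames` rather than another variable.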

valid rapids Apr 18, 2022, 4:07 PM

#

Does anybody here have experience with gpt-2? I'm really interested in making a chatbot with it, but I have NO clue what I'm doing.

karmic valley Apr 18, 2022, 4:08 PM

#

https://paste.pythondiscord.com/luhoyasewu can you help me with some different code i want to work out average pixel whiteness of image. my code i wrote i think the library doesnt support transparency, i have transparency in my image

green rune Apr 18, 2022, 4:45 PM

#

do people most typically use jupyter whenever they're actually doing data analysis?

serene scaffold Apr 18, 2022, 4:50 PM

#

green rune do people most typically use jupyter whenever they're actually doing data analys...

most people use some kind of interactive setup to explore data, though the appropriateness of jupyter for different use cases is controversial.

green rune Apr 18, 2022, 4:52 PM

#

serene scaffold most people use some kind of interactive setup to explore data, though the appro...

so what is the most popular way to do it?

serene scaffold Apr 18, 2022, 4:57 PM

#

green rune so what is the most popular way to do it?

Jupyter is a fine way

green rune Apr 18, 2022, 5:07 PM

#

thank you!

#

learning it now

serene scaffold Apr 18, 2022, 5:07 PM

#

green rune learning it now

just don't become dependent on it. it's a tool for visualization and exploration. if it becomes the only way you write code, you're gonna have a bad time in the future.

green rune Apr 18, 2022, 5:18 PM

#

serene scaffold just don't become dependent on it. it's a tool for visualization and exploration...

whats the tool for actual reporting? would it be inside the terminal or are there other things like sublime text that would be better fit?

agile cobalt Apr 18, 2022, 5:21 PM

#

Jupyter notebooks are fine-ish for reporting if you use them effectively (i.e., use markdown cells to document things well and clean up the code) and then generate a PDF, or just use PowerPoint

green rune Apr 18, 2022, 5:21 PM

#

agile cobalt Jupyter notebooks are fine-ish for reporting if you use it effectively (i.e., us...

okay so coding can be done however but final reporting is usually done outside and probably in a more presentable way? such as screenshots for powerpoint?

agile cobalt Apr 18, 2022, 5:22 PM

#

usually not screenshots if possible

#

most libraries will have ways of outputting to a file

green rune Apr 18, 2022, 5:22 PM

#

agile cobalt usually not _screenshots_ if possible

ah okay so relying on jupyter is only bad if you don't know how to do the programming or final reporting separately?

agile cobalt Apr 18, 2022, 5:23 PM

#

that's not really the point

#

the issue about Jupyter is that the code tends to get messy, not as easy to isolate, and the global state is just a mess with things from even deleted cells still existing.
It is fine for data exploration and reporting though

#

it is bad if you do not organise your code or if you end up with something you cannot reproduce later

green rune Apr 18, 2022, 5:25 PM

#

ah okay I think I have a better idea of why now, and I see what you mean whenever I want to make outside changes you have to go back through it which can be tedious

agile cobalt Apr 18, 2022, 5:25 PM

#

Just to make sure - What exactly do you mean by that? (outside changes & going back through it)

green rune Apr 18, 2022, 5:26 PM

#

like if i have a csv and i want to change it, i cant just make changes and go back and continue working I have to rerun sections of the notebook that would create a df to make sure my changes were made

agile cobalt Apr 18, 2022, 5:27 PM

#

you really shouldn't simply "change it" in the middle of the process like that

serene scaffold Apr 18, 2022, 5:27 PM

#

A good rule of thumb for notebooks is that if some output needs to be reproducible, it should be possible to obtain it only by running each cell once in order.

agile cobalt Apr 18, 2022, 5:28 PM

#

changing your data is not something you should do often to the point of it being a concern, but if you do change anything about the source you should definitely shutdown/restart the kernel and rerun all

green rune Apr 18, 2022, 5:33 PM

#

thank you again

hoary rover Apr 18, 2022, 5:35 PM

#

Just to add, though the point has probably already been made: Jupyter keeps global state in the kernel between cell runs, which can get tedious at times, which is why it's often better and easier to write a script from scratch.

lapis sequoia Apr 18, 2022, 5:35 PM

#

Hi what's the best method to approach this question? logistic regression maybe?

hoary rover Apr 18, 2022, 5:36 PM

#

Yes. Each variable is categorical and it has <20 elements.

lapis sequoia Apr 18, 2022, 5:38 PM

#

hoary rover Yes. Each variable is categorical and it has <20 elements.

there are some numeric too

lapis sequoia Apr 18, 2022, 5:39 PM

#

hoary rover Yes. Each variable is categorical and it has <20 elements.

would it be enough to just fit the model using all variables and see their significance using a model summary in R (I use RStudio)?

hoary rover Apr 18, 2022, 6:20 PM

#

Yes. Make sure you diagnose gauss.

indigo garnet Apr 18, 2022, 6:29 PM

#

what is the 4 variable version of this class?

lapis sequoia Apr 18, 2022, 6:54 PM

#

indigo garnet what is the 4 variable version of this class?

do you mean four parameter version ? There is one that takes five parameters, of which three are optional https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

indigo garnet Apr 18, 2022, 6:55 PM

#

lapis sequoia do you mean four parameter version ? There is one that takes five parameters, of...

yea

indigo garnet Apr 18, 2022, 7:22 PM

#

lapis sequoia do you mean four parameter version ? There is one that takes five parameters, of...

Is it nn.Linear that takes in five parameters?

lapis sequoia Apr 18, 2022, 7:24 PM

#

indigo garnet Is it nn.Linear that takes in five parameters?

according to those docs yes

indigo garnet Apr 18, 2022, 7:41 PM

#

lapis sequoia according to those docs yes

Ohk cool, thanks for the help

lapis sequoia Apr 18, 2022, 7:42 PM

#

hoary rover Yes. Make sure you diagnose gauss.

diagnose gauss?

hoary rover Apr 18, 2022, 8:45 PM

#

Normality. Plus do all the other assumptions (independence, linearity etc)

misty flint Apr 18, 2022, 8:53 PM

#

ugh

#

cant get dask to cooperate

#

tragic

#

kekHands

quartz fable Apr 19, 2022, 1:33 AM

#

Hello, I have a problem to solve using Python. I need to build a school timetable, and I'm thinking of using deep learning or something like that. Could anyone here tell me what the best model for this problem is? Neural network, NLP, random forest? I have some rules, like whether a teacher can give classes in the morning or at night, and other rules.

serene scaffold Apr 19, 2022, 1:43 AM

#

quartz fable Hello, i've an problem to solve using python. i need to build an school timetabl...

what do you mean "school timetable"? also, if there's any way you can solve the problem without AI, that is better.

quartz fable Apr 19, 2022, 1:44 AM

#

serene scaffold what do you mean "school timetable"? also, if there's any way you can solve the ...

a school timetable is a board that has defined hours with 1 teacher, 1 physical space and 1 subject, like "William, 09:30AM - 10:30AM - Math - Room 5"

quartz fable Apr 19, 2022, 1:45 AM

#

serene scaffold what do you mean "school timetable"? also, if there's any way you can solve the ...

i solved this problem without AI but.. maybe with AI the solution can be more smart.

serene scaffold Apr 19, 2022, 1:45 AM

#

quartz fable i solved this problem without AI but.. maybe with AI the solution can be more sm...

if you can solve a problem without AI, the AI solution will be worse.

#

one uses AI when it's not possible to write a program that can always solve the problem. AI programs attempt to approximate human judgement

#

if there's an exact series of steps that is guaranteed to produce the correct result for something, do that.

safe elk Apr 19, 2022, 1:55 AM

#

misty flint ugh

Time for coffee im drinking mine rn

misty flint Apr 19, 2022, 1:59 AM

#

safe elk Time for coffee im drinking mine rn

bruh too late over here

#

kekHands

quartz fable Apr 19, 2022, 2:01 AM

#

serene scaffold if you can solve a problem without AI, the AI solution will be worse.

ok.. but if I have a problem with a lot of strings, is there an "ideal" model to work with? or do I transform all the strings into numbers?

desert oar Apr 19, 2022, 2:25 AM

#

quartz fable i solved this problem without AI but.. maybe with AI the solution can be more sm...

how did you solve it without ai? you might be confused about what ai actually is, at our current level of (non-fantasy, non-scifi) technology

willow karma Apr 19, 2022, 3:53 AM

#

What is a single metric I can use to calculate correlation amongst multiple variables? A correlation matrix doesn't accomplish it because it reports multiple correlations

next phoenix Apr 19, 2022, 4:08 AM

#

Found this interesting. https://medium.datadriveninvestor.com/python-advanced-modules-made-easy-part-1-181f4558a854?sk=e70d72b19b09c34dd8d0bd6a0a6cef8c

reef dock Apr 19, 2022, 4:13 AM

#

Could someone explain to me what PR AUC is?

#

Or has any resources that could help me understand that metric better?

serene scaffold Apr 19, 2022, 4:17 AM

#

!mute 899605898639597568 "1 week" It seems your only interest in our community is as a place to post Medium articles from the same author. This is not reddit. Please take your promotion elsewhere.

arctic wedgeBOT Apr 19, 2022, 4:18 AM

#

:incoming_envelope: :ok_hand: applied mute to @next phoenix until <t:1650946679:f> (6 days and 23 hours).

thorn venture Apr 19, 2022, 8:02 AM

#

I have a band column in a data frame. There are 5 values only spread throughout the 100 rows df . I want to make the df in 5 rows. All the same band values column data will be added into a single row. So there will be one row each for those 5 band. So 5 rows will be there. Can anyone please help me??

tough frigate Apr 19, 2022, 8:18 AM

#

Use pivot table or groupby function
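
A minimal sketch of the `groupby` route, assuming "added" means summing the numeric columns per band. The column names `value1` and `value2` and the sample data are invented for illustration.

```python
import pandas as pd

# Hypothetical data: a few bands repeated across many rows.
df = pd.DataFrame({
    "band": ["A", "B", "A", "C", "B", "A"],
    "value1": [1, 2, 3, 4, 5, 6],
    "value2": [10, 20, 30, 40, 50, 60],
})

# One row per distinct band, with the other columns summed within each band.
summed = df.groupby("band", as_index=False).sum()
```

If the goal is to collect the values rather than sum them, `df.groupby("band").agg(list)` keeps every value of a band in a single row instead.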

quartz fable Apr 19, 2022, 9:33 AM

#

desert oar how did you solve it without ai? you might be confused about what ai actually is...

I solved it with an API using C#. I've created an API, rules, everything.

jaunty mural Apr 19, 2022, 10:22 AM

#

hi there, need a help, can't plot the graphs like in excel

#

`
#%%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#%%
df_new_1 = pd.read_excel(r"C:\Users\Nikita\Documents\MEGAsync\current_test\Outline Proposed\night_minds\data_velocities_impr.xlsx",
index_col=None, header=None,
sheet_name="Dynamic velocities_1")

df_new_2 = pd.read_excel(r"C:\Users\Nikita\Documents\MEGAsync\current_test\Outline Proposed\night_minds\data_velocities_impr.xlsx",
index_col=None, header=None,
sheet_name="Dynamic velocities_2")

#%%
z_1 = df_new_1.iloc[1:, 1:len(df_new_1.columns)]
z_2 = df_new_2.iloc[1:, 1:len(df_new_2.columns)]
#%%
z_1 = z_1.rename(columns={
1: 0.1,
2: 0.15,
3: 0.2,
4: 0.25,
5: 0.3,
6: 0.35,
7: 0.4,
8: 0.45,
9: 0.5,
10: 0.55,
11: 0.6,
12: 0.65,
13: 0.7,
14: 0.75,
15: 0.8,
16: 0.85,
17: 0.9,
18: 0.95,
19: 1.0
})
z_2 = z_2.rename(columns={
1: 0.1,
2: 0.15,
3: 0.2,
4: 0.25,
5: 0.3,
6: 0.35,
7: 0.4,
8: 0.45,
9: 0.5,
10: 0.55,
11: 0.6,
12: 0.65,
13: 0.7,
14: 0.75,
15: 0.8,
16: 0.85,
17: 0.9,
18: 0.95,
19: 1.0
})
#%%
z_1 = z_1.rename({
1: 0.1,
2: 0.15,
3: 0.2,
4: 0.25,
5: 0.3,
6: 0.35,
7: 0.4,
8: 0.45,
9: 0.5,
10: 0.55,
11: 0.6,
12: 0.65,
13: 0.7,
14: 0.75,
15: 0.8,
}, axis="index")
z_2 = z_2.rename({
1: 0.1,
2: 0.15,
3: 0.2,
4: 0.25,
5: 0.3,
6: 0.35,
7: 0.4,
8: 0.45,
9: 0.5,
10: 0.55,
11: 0.6,
12: 0.65,
13: 0.7,
14: 0.75,
15: 0.8,
}, axis="index")
`

#

`
#%%
sns.set(style = "whitegrid")

f, (ax1, ax2) = plt.subplots(1, 2, figsize = (16, 9), dpi=160)

x_1 = z_1.index
y_1 = z_1.columns
Y_1, X_1 = np.meshgrid(y_1, x_1)

ax1.plot(X_1, Y_1, ".-", label="")
ax1.legend(loc="upper right")
ax1.set(xlabel=r"$d_n$",
ylabel=r"$\rho_{m}^{-} \cdot 10^7$ (Ом$\cdot$м)")
ax1.grid(True, which='major', color='#666666', linestyle='-', alpha=0.7)
ax1.minorticks_on()
`

#

here's the result in python

true elk Apr 19, 2022, 10:57 AM

#

jaunty mural hi there, need a help, can't plot the graphs like in excel

!code

arctic wedgeBOT Apr 19, 2022, 10:57 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

jaunty mural Apr 19, 2022, 11:17 AM

#

true elk !code

i want to plot every row as separated graph

#

where x is column name and y is the data in the row

#

@true elk i want a scatter plot

serene scaffold Apr 19, 2022, 11:58 AM

#

quartz fable I solved it with an api using C#, i ve create an api, rules, everything.

Do the rules always produce the correct result?

jaunty mural Apr 19, 2022, 11:59 AM

#

how to select every time each row separated?
for instance i need a first row, I use head(1) but then I need only 2nd row

arctic wedgeBOT Apr 19, 2022, 12:45 PM

#

Hey @jaunty mural!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

jaunty mural Apr 19, 2022, 12:49 PM

#

#

that's how i plot the first row of my table

#

here's the result

#

but how to plot the next line and next, and ... etc
this command didn't work for next lines
df_new_1.head(1), df_new_1.iloc[1, :]

serene scaffold Apr 19, 2022, 12:52 PM

#

@jaunty mural you can use df.iterrows, I guess.

#

when you do df.head(n), you get the first n rows, so it's not a way to select a specific row.

jaunty mural Apr 19, 2022, 12:54 PM

#

serene scaffold <@403280466930565121> you can use `df.iterrows`, I guess.

didn't work for me, i have tried

serene scaffold Apr 19, 2022, 1:00 PM

#

jaunty mural didn't work for me, i have tried

it didn't work in what way? saying that something "didn't work" is quite opaque.

jaunty mural Apr 19, 2022, 1:02 PM

#

serene scaffold it didn't work in what way? saying that something "didn't work" is quite opaque.

because each row through loop printed in two columns

serene scaffold Apr 19, 2022, 1:26 PM

#

jaunty mural because each row through loop printed in two columns

if you use iterrows, it will give you values from each column, for every row

jaunty mural Apr 19, 2022, 1:44 PM

#

serene scaffold if you use iterrows, it will give you values from each column, for every row

but i need different for each row all columns (values)

jaunty mural Apr 19, 2022, 2:01 PM

#

i have managed it!!!!

plt.scatter(df_new.iloc[i].index, df_new.iloc[i, :])
plt.show()
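
For anyone landing here later, a sketch of how that one-liner might be extended to every row, with one scatter plot per row. The small table is invented, and the `Agg` backend is only selected so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical table: column names are the x positions, each row is one series.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]],
    columns=[0.1, 0.15, 0.2],
)

# One subplot per row; squeeze=False keeps `axes` two-dimensional.
fig, axes = plt.subplots(len(df), 1, figsize=(6, 3 * len(df)), squeeze=False)

# df.iloc[i] is row i as a Series: its index holds the column names (x),
# its values hold that row's data (y).
for i, ax in enumerate(axes[:, 0]):
    row = df.iloc[i]
    ax.scatter(row.index, row.values)
    ax.set_title(f"row {i}")

# In an interactive session, plt.show() would display the figure here.
```

Unlike `head(n)`, which returns the first `n` rows, `iloc[i]` selects exactly one row by position.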

lapis sequoia Apr 19, 2022, 2:03 PM

#

hoary rover Normality. Plus do all the other assumptions (independence, linearity etc)

for independence I have some predictors that are correlated, what should I do? cause I can't really remove them if I want to check to see if they affect whether someone has heart disease or not?

rose agate Apr 19, 2022, 2:09 PM

#

Is there any way to get interactive outputs like this one in Jupyter in Spyder?

jaunty mural Apr 19, 2022, 2:11 PM

#

damn it, i don't understand why the second ax2 is bigger than ax1

ionic beacon Apr 19, 2022, 3:20 PM

#

< hii >

quartz fable Apr 19, 2022, 3:42 PM

#

serene scaffold Do the rules always produce the correct result?

Yep, but the result can be better

karmic valley Apr 19, 2022, 3:46 PM

#

hey trying to make loop

#

import skimage.io as io
image1 = io.imread(r"C:\Users\samay\part1.png")
image2 = io.imread(r"C:\Users\samay\part2.png")
images = [image1,image2]

for image in images:
    print(image[image[..., -1] != 0][...,0:-1].mean())

#

at the moment have to write every image

#

they are all called part1, part2, part3, etc

#

i want to do it automatically

spare briar Apr 19, 2022, 4:13 PM

#

iterate over the directory

#

over files in C:\Users\samay\

#

easiest way would be using list comprehension

images = [read(file) for file in directory if file is pngfile]

#

@karmic valley

serene scaffold Apr 19, 2022, 4:19 PM

#

quartz fable Yep, but the result can be better

how can the result be better than "correct"? does the problem not have definitive answers?

candid pollen Apr 19, 2022, 4:27 PM

#

hey im trying to run this Masked_RCNN for webcam from https://github.com/Cheng-Lin-Li/MachineLearning/tree/master/Competition/ObjectDetectionSegmentation.
but i got an error AlreadyExistsError: Another metric with the same name already exists.

karmic valley Apr 19, 2022, 4:28 PM

#

spare briar easiest way would be using list comprehension ``` images = [read(file) for file ...

oh i see. will try this!

spare briar Apr 19, 2022, 4:29 PM

#

good luck i wrote pseudocode but you should be able to replace with python functions

karmic valley Apr 19, 2022, 4:29 PM

#

do i have to specify the filepath first like stop at this C:\Users\samay\

spare briar Apr 19, 2022, 4:30 PM

#

python standard library has a module called os

#

it gives functions for iterating over a directory

#

like os.listdir

karmic valley Apr 19, 2022, 4:31 PM

#

ah okay i will try look that up, im still newbie

spare briar Apr 19, 2022, 4:31 PM

#

no problem

#

https://docs.python.org/3/library/os.html

karmic valley Apr 19, 2022, 4:32 PM

#

karmic valley ```py import skimage.io as io image1 = io.imread(r"C:\Users\samay\part1.png") im...

from this code, which ones shall i delete for now? line 2,3,4,?

#

and i replace that with file directory

spare briar Apr 19, 2022, 4:32 PM

#

you can replace lines 2-4 with the list comprehension

#

and it will work for a directory full of as many .png files as you want

#

put them all in a list

karmic valley Apr 19, 2022, 4:33 PM

#

oh so they are 2 separate lines of code. first read file directory then do list comprehension

spare briar Apr 19, 2022, 4:33 PM

#

nope you are reading in the list comprehension

#

folder = r"C:\Users\samay"
images = [io.imread(os.path.join(folder, file)) for file in os.listdir(folder) if file.endswith(".png")]

karmic valley Apr 19, 2022, 4:35 PM

#

spare briar images = [io.imread(file) for file in os.listdir("C:\Users\samay\") if file.ends...

ahh thanks i will add this to my code!

spare briar Apr 19, 2022, 4:35 PM

#

do you understand this list comprehension syntax

#

it is very powerful

karmic valley Apr 19, 2022, 4:36 PM

#

yeah what you wrote seems to make sense logically just didnt know how to do before

spare briar Apr 19, 2022, 4:36 PM

#

yup just try to remember the pattern, it is very useful

karmic valley Apr 19, 2022, 4:37 PM

#


import skimage.io as io
images = [io.imread(file) for file in os.listdir("C:\Users\samay") if file.endswith(".png")]

#for image in images:
print(image[image[..., -1] != 0][...,0:-1].mean())

#

do i still need for loop

#

or shall i put for loop on 2nd line

spare briar Apr 19, 2022, 4:37 PM

#

well images is a list

#

now you want to do something with each object in the list

#

so you need to iterate over images

karmic valley Apr 19, 2022, 4:38 PM

#

oh so your line of code adds them all to list?

spare briar Apr 19, 2022, 4:38 PM

#

right my line reads every .png file in C:\Users\samay into a list called images

karmic valley Apr 19, 2022, 4:38 PM

#


import skimage.io as io
images = [io.imread(file) for file in os.listdir("C:\Users\samay") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())

spare briar Apr 19, 2022, 4:39 PM

#

yeah that should work

karmic valley Apr 19, 2022, 4:39 PM

#

so this should read the list right?

#

nice!

spare briar Apr 19, 2022, 4:39 PM

#

need to import os too

karmic valley Apr 19, 2022, 4:39 PM

#

thank you!

#

oh yeah

#

adding stuff to list like this so powerful true! way easier than doing manually

spare briar Apr 19, 2022, 4:40 PM

#

you can also do it with dictionaries

#

{key: val for key, val in zip(list1, list2)}

karmic valley Apr 19, 2022, 4:41 PM

#

oh nice, will search that up too

serene scaffold Apr 19, 2022, 4:41 PM

#

though that happens to work out to be the same as dict(zip(list1, list2))
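
A quick sketch of that equivalence, plus a case where the comprehension does something `dict(zip(...))` cannot. The list contents are invented for illustration.

```python
list1 = ["part1", "part2", "part3"]
list2 = [0.41, 0.38, 0.52]

# Plain pairing: both spellings build the same dictionary.
by_comprehension = {key: val for key, val in zip(list1, list2)}
by_constructor = dict(zip(list1, list2))

# The comprehension earns its keep once keys or values are transformed
# or filtered along the way.
rounded_high = {
    key.upper(): round(val, 1)
    for key, val in zip(list1, list2)
    if val > 0.4
}
```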

spare briar Apr 19, 2022, 4:41 PM

#

im just trying to teach him that this is a generic pattern

serene scaffold Apr 19, 2022, 4:42 PM

#

right

karmic valley Apr 19, 2022, 4:42 PM

#

to learn more about this what can i search up exactly

spare briar Apr 19, 2022, 4:43 PM

#

these are called comprehensions

#

a related concept is generators

serene scaffold Apr 19, 2022, 4:43 PM

#

generator expressions are related, but generators in general are not.
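
A small sketch of the distinction, with invented names. The generator expression looks like a comprehension but is lazy, while a generator function is a separate feature built on `yield`.

```python
# List comprehension: the whole list is built immediately.
squares_list = [n * n for n in range(5)]

# Generator expression: same syntax with parentheses, values produced on demand.
squares_gen = (n * n for n in range(5))


def countdown(start):
    """Generator function: produces values with yield, no comprehension syntax."""
    while start > 0:
        yield start
        start -= 1


total = sum(squares_gen)      # consumes the generator expression
leftover = list(squares_gen)  # nothing left: a generator can be iterated only once
ticks = list(countdown(3))
```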

karmic valley Apr 19, 2022, 4:44 PM

#

ah got you

quartz fable Apr 19, 2022, 4:44 PM

#

serene scaffold how can the result be better than "correct"? does the problem not have definitiv...

It's not so hard.
If I choose a teacher to give class A, I need to put another teacher in class B. If the first teacher can also give class B and I put another teacher into class A, maybe that's better.

karmic valley Apr 19, 2022, 4:45 PM

#

images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

how come words after the if are in green

#

did i type wrong?

median moat Apr 19, 2022, 4:47 PM

#

That is just the formatting that is used when you do

#

!code

arctic wedgeBOT Apr 19, 2022, 4:48 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

karmic valley Apr 19, 2022, 4:48 PM

#

oh

#

got this error

#

true elk Apr 19, 2022, 4:50 PM

#

I've never used ML before, I think I have a good opportunity to use it in a project. Is 41 nodes in the output layer viable?

desert oar Apr 19, 2022, 4:52 PM

#

true elk I've never used ML before, I think I have a good opportunity to use it in a proj...

viable if necessary, but what are you actually doing? you might need a lot of data for something like that

true elk Apr 19, 2022, 4:53 PM

#

It's kinda of OCR but with only 41 possibilities

#

I have 30 hours of video to analyse, I can skip some frames (only need 1 data point per second)

#

And the number only ranges from -20° to +20°

#

I thought about doing manual detect on the pixels but the background is changing too much

#

It's not too hard for the data entry but I'm on a Mac so no CUDA/big GPU

#

and as it would be my first project, I don't want to get into a problem that might not be solvable with my current setup/knowledge

compact rose Apr 19, 2022, 4:58 PM

#

So guys, I have a question about machine learning. I am currently building a model based on a dataset about music popularity. The idea is that the company gets a model that can predict the popularity of each song. But now I have a question about data preparation: should I delete the songs with low popularity, or should I keep them? As shown in the screenshot, the music popularity column goes from 0 to 100, so should I delete values above 60/70, or are they good for training the model?

true elk Apr 19, 2022, 4:58 PM

#

desert oar viable if necessary, but what are you actually doing? you might need a lot of da...

what would you recommend in my case?

misty flint Apr 19, 2022, 5:14 PM

#

what are you trying to do? there are a few out of the box solutions that could just detect those numbers if needed

#

i would highly recommend doing some image processing however first

#

to increase accuracy

#

before feeding in frames from the videos

#

also i am not a Computer Vision guy so i will defer to someone with more expertise RunFail

true elk Apr 19, 2022, 5:22 PM

#

misty flint i would highly recommend doing some image processing however first

Huge pre processing was one of my ideas but I'm not sure how to implement it. It's not opaque numbers 😦

#

As I have a lot of data, and data entry is not hard, I thought going directly to ML would be a better option

true elk Apr 19, 2022, 5:26 PM

#

misty flint what are you trying to do? there are a few out of the box solutions that could j...

I've been stuck on other OCR projects so I can foresee some issues by going with an out of the box OCR solution. My best chance would be EasyOCR with a custom model, which kinda leads to the same situation I'm in right now

mild dirge Apr 19, 2022, 5:28 PM

#

true elk I've been stuck on other OCR projects so I can foresee some issues by going with...

If all images look similar to that, you could at least make a mask for the blue color of the digits

indigo garnet Apr 19, 2022, 5:29 PM

#

how to decide the output size for a convolutional 2d net?

true elk Apr 19, 2022, 5:30 PM

#

mild dirge If all images look similar to that, you could at least make a mask for the blue ...

Do you mind if I @ you in a help channel so you can explain your suggestion in more detail?

mild dirge Apr 19, 2022, 5:30 PM

#

It's not that complicated, a mask is just checking for each pixel if it falls in a certain color range

#

and then setting that pixel to 1, if it is in that range, otherwise 0

#

So you'll get an image the size of your original image with only 1's and 0's
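
A minimal NumPy sketch of that idea, assuming an RGB image stored as an array of shape (height, width, 3). The function name, the tiny test image, and the bounds for "blue-ish" are all made up, and real bounds would have to be tuned on actual frames.

```python
import numpy as np


def color_mask(image, lower, upper):
    """Return a 0/1 array marking pixels whose channels all lie in [lower, upper]."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    in_range = (image >= lower) & (image <= upper)  # per-channel test, shape (H, W, 3)
    return in_range.all(axis=-1).astype(np.uint8)   # 1 only if every channel passes


# Tiny 2x2 RGB image: one blue-ish pixel and three others.
image = np.array(
    [[[10, 20, 200], [200, 30, 30]],
     [[90, 90, 90], [250, 250, 250]]],
    dtype=np.uint8,
)

# Hypothetical bounds for "blue-ish": low red and green, high blue.
mask = color_mask(image, lower=(0, 0, 150), upper=(80, 80, 255))
```

OpenCV's `cv2.inRange` does essentially the same thing (returning 0/255 instead of 0/1) and is often applied after converting to HSV, where a colour range is easier to describe.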

true elk Apr 19, 2022, 5:31 PM

#

Like a threshold with upper and lower boundaries? Better to do it in the RGB mode?

misty flint Apr 19, 2022, 5:31 PM

#

true elk Huge pre processing was one of my ideas but I'm not sure how to implement it. It...

no, image processing =/= pre processing

#

using matlab image functions or opencv for image processing

true elk Apr 19, 2022, 5:32 PM

#

misty flint no, image processing =/= pre processing

sorry, I'm not sure I understood what you meant

mild dirge Apr 19, 2022, 5:32 PM

#

If you make the color range around the blue color of your display, then you will at least have filtered out the important stuff (hopefully)

misty flint Apr 19, 2022, 5:32 PM

#

thats a good initial approach

true elk Apr 19, 2022, 5:33 PM

#

And then manually doing the detection of the 7 segment digits on some specific pixel location?

#

I think I've done something with 7 seg in AoC this year 😄

mild dirge Apr 19, 2022, 5:33 PM

#

Not really sure what would be the best way of classifying it

#

Just making a classifier with an output for each possible outcome seems naive since a lot of outcomes will be very similar (like -19 and 19 f.e.)

true elk Apr 19, 2022, 5:34 PM

#

and maybe just discard the frames where there is too much blue

mild dirge Apr 19, 2022, 5:34 PM

#

true elk and maybe just discard the frames where there is too much blue

Depends if you want your model to classify images with a lot of blue too

true elk Apr 19, 2022, 5:34 PM

#

mild dirge Just making a classifier with an output for each possible outcome seems naive s...

damn forgot about that.. Thank god I asked here 😄

mild dirge Apr 19, 2022, 5:34 PM

#

If you remove all the blue-ish images, your model will surely perform bad on those

true elk Apr 19, 2022, 5:35 PM

#

mild dirge Depends if you want your model to classify images with a lot of blue too

nah, I have a lot of frames, I just need data points each 30 or 60 frames

#

the only thing is that I need high confidence

mild dirge Apr 19, 2022, 5:35 PM

#

I get that you have a lot of training data, but if you want it to work for those blue-ish images, you want to use those for training too

#

Removing "difficult to classify images" is a bad idea is what I'm trying to say

#

Unless you know your real data is not gonna contain any of those

karmic valley Apr 19, 2022, 5:38 PM

#

help

#


import skimage.io as io
import os
images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())

desert oar Apr 19, 2022, 5:38 PM

#

true elk It's kinda of OCR but with only 41 possibilities

41 distinct classes is more manageable than 41 continuous outputs. seems reasonable, start with MNIST practice and go up from there to your real data

karmic valley Apr 19, 2022, 5:38 PM

#

i getting error

#

  File "C:\Users\samay\Dropbox\Average pixel colour.py", line 5
    images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]
                                                                                                                     ^
SyntaxError: EOL while scanning string literal
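
A note on this error, since it went unanswered: a raw string literal cannot end in a single backslash, because the final `\"` stops the quote from closing the string, so Python reaches the end of the line while still inside the literal. Dropping the trailing backslash fixes the syntax error. Separately, `os.listdir` returns bare file names, so they would need to be joined with the folder before reading. Below is a sketch using `pathlib`, which sidesteps both problems. The helper name and the throwaway demo folder are invented for illustration.

```python
import tempfile
from pathlib import Path


def find_pngs(folder):
    """Return the full paths of all .png files in `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.png"))


# Demo with a throwaway folder. On the real machine the call would be something
# like find_pngs(r"C:\Users\samay\out\test\raw-image") with no trailing backslash.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("part1.png", "part2.png", "notes.txt"):
        (Path(tmp) / name).touch()
    png_names = [p.name for p in find_pngs(tmp)]

# Each returned path can then be read, e.g. io.imread(str(path)), inside the
# same for-loop as before.
```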

true elk Apr 19, 2022, 5:38 PM

#

so just to check if I understood correctly: doing blue mask on data + doing ML anyway (and keeping difficult images)?

mild dirge Apr 19, 2022, 5:39 PM

#

the blue mask is to make the images easier for whatever model you are planning to use

#

Since we already know the display is going to be blue

true elk Apr 19, 2022, 5:39 PM

#

desert oar 41 distinct classes is more manageable than 41 continuous outputs. seems reasona...

I actually have Kaggle tutorial on MNIST opened in my browser right now 😄

mild dirge Apr 19, 2022, 5:39 PM

#

mild dirge Since we already know the display is going to be blue

Your model won't have to learn that

karmic valley Apr 19, 2022, 5:39 PM

#

karmic valley ```py import skimage.io as io import os images = [io.imread(file) for file in o...

i think im so close

#

but missing some syntax

#

cant figure out

true elk Apr 19, 2022, 5:40 PM

#

Thanks for the help guys! Let's try to code this 😄

karmic valley Apr 19, 2022, 5:42 PM

#

hey anyone know what wrong with this:

import skimage.io as io
import os
images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())

mild dirge Apr 19, 2022, 5:44 PM

#

karmic valley hey anyone know what wrong with this: ```py import skimage.io as io import os ...

You already asked

#

twice in here and opened a help channel, try waiting for a reply pls

karmic valley Apr 19, 2022, 5:44 PM

#

thought noone saw sorry

#

cant figure it out

frozen marten Apr 19, 2022, 6:42 PM

#

How can a residual plot show both a normal distribution and constant variance (homoscedasticity) in the same graph?
(w.r.t. the Ordinary Least Squares assumptions)

bold timber Apr 19, 2022, 6:51 PM

#

What is the type of distance if I use p = 1.5?

#

I know p=1 is Manhattan and p=2 is Euclidean, but what is the distance if p=1.5?

mild dirge Apr 19, 2022, 6:53 PM

#

bold timber What is the type of distance if I use p = 1.5?

context?

bold timber Apr 19, 2022, 7:11 PM

#

mild dirge context?

in KNN

#

whether 1.5 is average distance?

misty flint Apr 19, 2022, 8:20 PM

#

bold timber I know p=1 is manhattan and p=2 is eucliden, but what is the distance if p=1.5?

instead of either Manhattan or Euclidean distance, youll get something in between. gimme a sec

#

#

p=1

#

#

p=2

#

now imagine a curve connecting the green and red blocks

#

but right between the previous two trajectories. and thats how you can visualize p=1.5

#

dont think it has a specific name. i think most just refer to it as a minkowski distance but with p=1.5

river sierra Apr 19, 2022, 8:25 PM

#

bold timber I know p=1 is manhattan and p=2 is euclidean, but what is the distance if p=1.5?

For a more mathematical/theoretical take, you should look into Lp spaces. Here is a good article on them: https://en.wikipedia.org/wiki/Lp_space?wprov=sfti1

Lp space

In mathematics, the Lp spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue (Dunford & Schwartz 1958, III.3), although according to the Bourbaki group (Bourbaki 1987) they were first introduced by Frigyes Riesz (...

bold timber Apr 19, 2022, 8:29 PM

#

misty flint dont think it has a specific name. i think most just refer to it as a minkowski ...

It means manhattan and euclidean are included as Minkowski distance?

river sierra Apr 19, 2022, 8:33 PM

#

every vector from the origin to the unit circle has a length of one, the length being calculated with length-formula of the corresponding p

bold timber Apr 19, 2022, 8:35 PM

#

ok thank you for the explanation

trail horizon Apr 19, 2022, 8:48 PM

#

hi channel

#

If i have a question regarding file names in a directory, where should I go ?

serene scaffold Apr 19, 2022, 8:51 PM

#

trail horizon If i have a question regarding file names in a directory, where should I go ?

a general help channel. see #❓｜how-to-get-help for instructions for how to get one.

molten gust Apr 19, 2022, 9:41 PM

#

I am currently in a course for data science, that is going to take 3.5 months.

As it is of utmost importance for me to journey with a high learning curve, it is of essence to communicate.

What literature would be very convenient to go through?

I want to get very competent, as fast as I can. It is important to be playful and accumulate experience, so to program a lot on basics and then on different projects that are challenging regarding a solid structure and a diverse set of functions and styles to write code, is a no brainer.

I have a need to be efficient in learning and practice.

Would very much appreciate every constructive advice and suggestion.

To get from zero -> A.I. Developer

Also working on a degree in physics at the same time.

median moat Apr 19, 2022, 9:46 PM

#

molten gust I am currently in a course for data science, that is going to take 3.5 months. ...

Well if you are at the completely zero stage right now as I presume I would start with Automate the Boring Stuff the book to kind of take you through the beginning. Lots of practice projects good information and free PDFs online are available. I don't know much past that but it is always a good base.

molten gust Apr 19, 2022, 9:47 PM

#

Thank you for your feedback, I am already on that, but I do not have the time to go about it the average way, on the way I will sort out by myself what will be of importance and what not, but someone with loads of experience can give me some more insights on what to focus more and on what to focus less.

mild dirge Apr 19, 2022, 9:51 PM

#

you want to learn how AI works, or data science in general or?

mint palm Apr 19, 2022, 9:52 PM

#

I know some basic type of data cleaning we do about anomalous data....but when its comes to data such as face motion, heart rate pulse, etc etc., what kind of cleaning is done...i mean what do we do

mild dirge Apr 19, 2022, 9:52 PM

#

And there aren't clear shortcuts, if you want to do data science in Python, you need to know python

#

also the more basic stuff like data structures/functions/classes etc.

molten gust Apr 19, 2022, 9:53 PM

#

Yeah I am doing a python - Data science course -> MYSQL / NOSQL -> AI / ML

#

But the average way is not working by itself, I don't have the time to idle through this. I need to push it.

So I just do everything that is of essence for everyone but with a faster pace? So I need to adjust my learning speed, my reading speed and processing speed like in every other discipline, I guess.

mild dirge Apr 19, 2022, 9:55 PM

#

I think the expectation might be a bit too high

#

How much experience do you have with python?

#

or coding in general?

river sierra Apr 19, 2022, 10:06 PM

#

molten gust I am currently in a course for data science, that is going to take 3.5 months. ...

What is your background in mathematics? Specifically, Linear Algebra, Probability Theory (and therefore statistics), and Optimization theory?

molten gust Apr 19, 2022, 10:07 PM

#

river sierra What is your background in mathematics? Specifically, Linear Algebra, Probabilit...

I am almost done with my physics degree.

river sierra Apr 19, 2022, 10:11 PM

#

You should be good on the theory then.

#

I found this link that might be helpful: https://realpython.com/learning-paths/math-data-science/

Math for Data Science (Learning Path) – Real Python

In this learning path, you'll gain the mathematical foundations you'll need to get ahead with data science.

#

Either way, to get your desired speed, you’ll need to read and write a lot of code

#

That’s the main way to learn

#

Especially at your pace

molten gust Apr 19, 2022, 10:50 PM

#

Thank you very much @river sierra

misty flint Apr 19, 2022, 10:53 PM

#

bold timber It means manhattan and euclidean are included as Minkowski distance?

yes

sick fjord Apr 19, 2022, 11:45 PM

#

is there anyone here good with tensorflow?

#

I would like to ask a few questions to see if what I want to do is viable

earnest abyss Apr 19, 2022, 11:52 PM

#

sick fjord is there anyone here good with tensorflow?

dontasktoask.com

wheat ice Apr 19, 2022, 11:52 PM

#

that url is blocked

#

rather, just say "please ask"

earnest abyss Apr 19, 2022, 11:53 PM

#

still, that page makes an excellent point.

sick fjord Apr 19, 2022, 11:54 PM

#

Let me rephrase.. is there anyone I can DM to ask questions

earnest abyss Apr 19, 2022, 11:56 PM

#

still, same problem. Could you give a general idea of what you want here? I have some general idea of machine learning, I dabbled in this stuff a while ago, but I might have some idea of what might be possible. However, if you're asking about specifics, I've got absolutely no clue.

#

if you think it would contravene the rules of this server, you'd be better off not asking at all

sick fjord Apr 20, 2022, 12:02 AM

#

sure. i have a 2-d histogram that has two "tails" of data. i would like to build a model that can essentially distinguish between the two with a level of confidence.

earnest abyss Apr 20, 2022, 12:04 AM

#

sick fjord sure. i have a 2-d histogram that has two "tails" of data. i would like to build...

Like this?

#

sorry for the crappy ms paint picture, art isn't exactly my forte

sick fjord Apr 20, 2022, 12:06 AM

#

here you go

earnest abyss Apr 20, 2022, 12:08 AM

#

so you want to distinguish between the upper and lower tails?
What would stop you from using traditional programming, what necessitates machine learning?

sick fjord Apr 20, 2022, 12:10 AM

#

I can and have already. I would like to compare the two and for practice

earnest abyss Apr 20, 2022, 12:11 AM

#

do you have thousands of different input datasets with two tails? Is that feasible?

#

I really don't know how that sort of thing would be done, it's not like the sort of optimization problem that machine learning is really good at

sick fjord Apr 20, 2022, 12:14 AM

#

It's been done before using svm

earnest abyss Apr 20, 2022, 12:17 AM

#

What would be the optimal machine learning strategy to detect and remove specific sounds from an input audio source?

sick fjord Apr 20, 2022, 12:24 AM

#

im going to throw a wild guess and say fourier transform

earnest abyss Apr 20, 2022, 12:25 AM

#

I looked into that, and my intended use case, removing laugh tracks, wouldn't work with the fourier transform because the frequencies of a laugh track are so similar to the frequencies of everyday speech

#

desert oar Apr 20, 2022, 1:29 AM

#

molten gust I am currently in a course for data science, that is going to take 3.5 months. ...

it sounds like you will have enough on your plate between the physics degree and this course. i would suggest not fixating on trying to achieve mastery over something complicated in a short period of time, and instead focusing on what you are learning in the course, and attempting to internalize and apply it as much as possible.

#

a good foundation in the basics is more valuable than a scattered sample of a lot of advanced things

pseudo wren Apr 20, 2022, 2:15 AM

#

I need to practice predictive modeling with linear regression

#

How can I do this

#

What are good resources

desert oar Apr 20, 2022, 2:39 AM

#

pseudo wren I need to practice predictive modeling with linear regression

start with the "ames" housing dataset (often used as a replacement for the older "boston housing" dataset). it's on kaggle

#

it's like the titanic dataset, but for regression instead of classification

pseudo wren Apr 20, 2022, 2:40 AM

#

Hmmm okay

#

Now the actual linear regression has a built in

#

And the predictive modeling built in as well

desert oar Apr 20, 2022, 2:43 AM

#

i don't know what you mean by that, sorry

pseudo wren Apr 20, 2022, 2:53 AM

#

desert oar i don't know what you mean by that, sorry

in order to do predictive modeling sklearn has built in functions for this right?

desert oar Apr 20, 2022, 2:56 AM

#

pseudo wren in order to do predictive modeling sklearn has built in functions for this right...

scikit-learn implements a large number of machine learning algorithms. you must decide if any of those algorithms are useful to your work.
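As a hedged illustration of what those built-ins look like for linear regression, here is a minimal sketch using scikit-learn's `LinearRegression`. The data is synthetic (an assumption made purely for the example), so the numbers say nothing about any real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 3*x + 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)

# Hold out a quarter of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)   # the "predictive modeling" step
r2 = model.score(X_test, y_test)      # R^2 on the held-out rows
print("slope:", model.coef_[0], "intercept:", model.intercept_, "R^2:", r2)
```

The same `fit` / `predict` / `score` pattern carries over to most other scikit-learn estimators, which makes it easy to try the workflow on a different dataset afterwards.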

molten gust Apr 20, 2022, 3:01 AM

#

desert oar it sounds like you will have enough on your plate between the physics degree and...

never said anything about mastery in a short time, but to go the average way or the way of mastery is about being in two different worlds

molten gust Apr 20, 2022, 3:02 AM

#

desert oar it sounds like you will have enough on your plate between the physics degree and...

and yes, it is important to focus on what I am confronted with, that's what I already do

I got myself a few books now and working these through as well, it will solidify what I already know and give me new perspectives. Thanks for your feedback!

indigo garnet Apr 20, 2022, 4:32 AM

#

how to calculate the output channels for conv2d layer, is it a random value or is there anyway to get the number for it?

misty flint Apr 20, 2022, 4:32 AM

#

pseudo wren in order to do predictive modeling sklearn has built in functions for this right...

yes. ditto what salt said

#

i would say after you understand the tooling, try and use it on a new and different dataset

bold timber Apr 20, 2022, 5:45 AM

#

can anyone explain to me what this type of error means: "ERROR! Session/line number was not unique in database. History logging moved to new session 12088"

#

I got it when I use bayesian search for tuning the hyperparameter

safe elk Apr 20, 2022, 7:48 AM

#

Ah if you ask the answer I think is it depends on your interests and your skills

tacit basin Apr 20, 2022, 7:59 AM

#

you can try kaggle comps, datasets for example

iron basalt Apr 20, 2022, 8:07 AM

#

Implement a Tsetlin Machine.

mint palm Apr 20, 2022, 8:18 AM

#

Does data such as temperature readings,and ECG need filtering?
What kind of filtering do these need

soft lance Apr 20, 2022, 9:34 AM

#

Hello, I need your help with pytorch.DataLoader. I want it to sample images from my custom Coco_Dataset_Manager, a batch of 3 images per iteration. Images in Coco have different sizes. Let's say DataLoader sampled 3 images with sizes (100, 100), (100, 100), (400, 400). I want to force it to cache these images in a "waiting room" and sample some more. When a waiting room of a particular size has 3 or more images, I want DataLoader to put the batch through collate_fn and return it.

#

How can I program such behaviour?

#

I'm thinking about writing a custom Sampler.
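The "waiting room" idea described above can be sketched framework-free as a generator that buffers samples per size and releases a batch once a room is full. This is only an illustration under stated assumptions: the name `batches_by_size`, the `size_of` callback and the toy `(name, size)` tuples are hypothetical, and leftovers that never fill a batch are simply dropped.

```python
from collections import defaultdict


def batches_by_size(samples, batch_size, size_of):
    """Yield lists of `batch_size` samples that all share the same size.

    Samples wait in a per-size "waiting room" until enough of them have arrived.
    Leftovers that never fill a batch are dropped in this sketch.
    """
    waiting_rooms = defaultdict(list)
    for sample in samples:
        room = waiting_rooms[size_of(sample)]
        room.append(sample)
        if len(room) == batch_size:
            yield list(room)  # hand out a copy, then empty the room
            room.clear()


# Toy stand-ins for images: (name, (height, width)) tuples.
stream = [
    ("a", (100, 100)), ("b", (100, 100)), ("c", (400, 400)),
    ("d", (400, 400)), ("e", (100, 100)), ("f", (400, 400)),
]
for batch in batches_by_size(stream, batch_size=3, size_of=lambda s: s[1]):
    print([name for name, _ in batch])
```

With PyTorch, one way to plug this in might be to run the generator inside an `IterableDataset` and pass `batch_size=None` to `DataLoader` so that automatic batching is disabled. If the image sizes are known up front (for example from the COCO annotation file), a custom `batch_sampler` that groups indices by size would avoid loading images only to park them.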

#

devout sail Apr 20, 2022, 10:46 AM

#

soft lance Hello, I need your help with `pytorch.DataLoader`. I want it to sample images fr...

Where are you taking the samples from? are you loading the images from a folder, or...

#

Trying to understand how much wiggle room you have

compact rose Apr 20, 2022, 11:20 AM

#

So guys, i have a doubt about machine learning. I am currently building a model based on a dataset about music popularity. The idea for model is that the company gets a model that can predict the popularity of each song. But now i have a doubt about data preparation that is : Should i delete the musics with low popularity or should i keep it? As shown in the screenshot, the column of music popularity goes from 0 to 100, so should i delete values above 60/70 or are they good to train the model?

small orbit Apr 20, 2022, 11:30 AM

#

how can i implement Keras Tuner into my code? https://nbviewer.org/urls/bpa.st/raw/A6JA

devout sail Apr 20, 2022, 11:45 AM

#

compact rose So guys, i have a doubt about machine learning. I am currently building a model ...

I don't see a screenshot

#

Also, a high value means unpopular?

#

Either way, I don't see why you would remove it

compact rose Apr 20, 2022, 11:46 AM

#

means popular

#

#

Sorry, forgot to put it ahhaha

devout sail Apr 20, 2022, 11:47 AM

#

compact rose So guys, i have a doubt about machine learning. I am currently building a model ...

You said here "delete [...] with low popularity" and then "delete values above 60/70", so I'm confused

compact rose Apr 20, 2022, 11:47 AM

#

below 60/70* sorry

devout sail Apr 20, 2022, 11:47 AM

#

yeah I'm not sure why that's necessary

#

If anything you want a good representation of the sample space

compact rose Apr 20, 2022, 11:50 AM

#

I was in a paradox where i was thinking " Well, my target is predicting songs with high popularity, but should i delete low popularity? However, with low popularity, the model will also understand what are bad musics"

devout sail Apr 20, 2022, 11:52 AM

#

yeah, basically your answer is that last sentence

#

Negative examples are important

#

If it only sees popular songs, then nothing stops it from learning to say that everything is popular

compact rose Apr 20, 2022, 11:53 AM

#

True, thanks mate! Thank you for your time 🙂

devout sail Apr 20, 2022, 11:53 AM

#

np

small orbit Apr 20, 2022, 11:54 AM

#

anyone?

dense harbor Apr 20, 2022, 12:18 PM

#

Hi everyone, I am currently solving the Titanic Disaster Problem on Kaggle.

I analysed the data, and then built a machine learning model and I got a score of around 0.78

I tried to improve my model (around 30 times by now) for better accuracy and no hope. I tried almost every possible technique to improve my score (using machine learning) and no hope. Can you suggest any proven technique I could've missed? (please don't send full code for the problem, I want to try on my own).
Many thanks in advance ☺️

true elk Apr 20, 2022, 12:23 PM

#

Hey guys, I'm back 😄

#

Capture_decran_2022-04-20_a_13.22.41.png

#

Do you think this image processing will be enough for classification? I'll try the MNIST dataset first to get some understanding on how to train a model

#

I've tried already a lot of things, I think HSV mask was my best option yet

soft lance Apr 20, 2022, 12:33 PM

#

devout sail Where are you taking the samples from? are you loading the images from a folder,...

From instances_train2017.json, fetching them through img_url

devout sail Apr 20, 2022, 12:34 PM

#

you're downloading the images on the fly each time you're training?

true elk Apr 20, 2022, 12:39 PM

#

Capture_decran_2022-04-20_a_13.38.59.png

#

Do you think I can get out of it with this data?

#

I need to classify those into -20 to 20, so 41 nodes output

#

I only need high confidence data, I can discard all the other ones

#

Yesterday, PC Camel suggested me to focus on processing first, then going for MNIST example for ML

#

Is CNN the right path for this task?

true elk Apr 20, 2022, 1:05 PM

#

😦

hazy knot Apr 20, 2022, 1:11 PM

#

Anyone have any suggestions for storing different versions of multiple models?

bronze spire Apr 20, 2022, 1:28 PM

#

I have finished the basics of Python and I've decided that I want to learn Data Science, where can I start from? Can someone give me a good source that's free?

true elk Apr 20, 2022, 1:42 PM

#

while waiting for an answer, I'm really amused by the amount of people requesting help for assignments or bad purposes Issou

warm oracle Apr 20, 2022, 2:01 PM

#

A lot of people want easy/fast solutions after all lol

spare briar Apr 20, 2022, 2:05 PM

#

bronze spire I have finished the basics of Python and I've decided that I want to learn Data ...

https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf

#

https://github.com/jeffrey-xiao/papers/blob/master/textbooks/designing-data-intensive-applications.pdf

true elk Apr 20, 2022, 2:10 PM

#

Is building my own ML with numpy a good idea to start with?

#

Thanks @serene scaffold 😄

warm oracle Apr 20, 2022, 2:26 PM

#

I'm planning to start a career as a Machine Learning Researcher.
I've already gotten the math side (Linear Algebra, Calculus, Statistics and Probabilities) somewhat covered. And currently learning TensorFlow/Keras and planning to study Pytorch after.
Tool-wise, is there anything else I need to make sure I know?

desert oar Apr 20, 2022, 2:31 PM

#

true elk Is CNN the right path for this task?

i think it is the right task, but you have a lot of noise. if you can do pre-processing to increase the signal-to-noise ratio that would help. and you will have to accept that some instances are not going to give usable results, e.g. the blank ones

#

it looks like maybe 1/3 to 1/2 of these are unusable

#

i think a "good" model should give ~0 confidence on all categories for those unusable ones

#

that's going to be the hard part imo, not obtaining false positives on the junk images

#

what is this task, anyway? is this the thermometer in your car dashboard? some industrial process that is hooked up to an LCD but not a usable computer?

true elk Apr 20, 2022, 2:33 PM

#

it's to detect the range of temperature in a game (NGL Biathlon)

#

Don't ask why 😄

#

Kidding

true elk Apr 20, 2022, 2:35 PM

#

desert oar that's going to be the hard part imo, not obtaining false positives on the junk ...

is going for a numpy as my ML machine a good idea in this context?

true elk Apr 20, 2022, 2:36 PM

#

true elk it's to detect the range of temperature in a game (NGL Biathlon)

I have the information that "Cold" should range from "-10 to -15" but I'd like to see the bell curve to check the repartition (and for all the other temperatures)

true elk Apr 20, 2022, 2:37 PM

#

desert oar i think it is the right task, but you have a _lot_ of noise. if you can do pre-p...

Any ideas for better pre-processing? The temperature is kind of transparent and blue, so on blue sky even an human can't read the temperature

plush glacier Apr 20, 2022, 2:38 PM

#

true elk Is CNN the right path for this task?

it seems like it although it might be better to pretrain a model on something like mnist and then maybe train the pretrained model on the data you have

#

or use a already pretrained model

desert oar Apr 20, 2022, 2:38 PM

#

true elk is going for a numpy as my ML machine a good idea in this context?

numpy is not really a machine learning framework. it is just a library for linear algebra. you will want to use a higher-level framework like pytorch that can take care of all the complicated mathematical optimization stuff for you

#

you can try mnist pre-training. you'll want to convert rgb to b&w first though

#

depending on your time and resources, you could also take like 1000 pictures of "clean" output, artificially add noise, and use that as pre-training

#

if this were an industrial process maybe i'd suggest doing that, idk if it's worth it for a video game
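The two preprocessing ideas above (converting RGB to black and white, and adding artificial noise to clean images for pre-training) could look roughly like this in NumPy. This is a sketch under assumptions: the helper names, the luma weights, the noise level and the fake 28x28 "clean" image are all illustrative choices, not something taken from the project being discussed.

```python
import numpy as np


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to (H, W) using common luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of `image`, clipped back to the 0..255 range."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 255.0)


rng = np.random.default_rng(0)
clean_rgb = np.zeros((28, 28, 3), dtype=np.uint8)
clean_rgb[10:18, 6:22] = 255  # a white bar standing in for a lit display segment
gray = to_grayscale(clean_rgb)

# Several differently-noised copies of the same clean image.
noisy_copies = [add_gaussian_noise(gray, sigma=40.0, rng=rng) for _ in range(5)]
```

Each noisy copy keeps the label of the clean image it came from, which is what makes this usable as cheap extra training data.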

#

i had no idea there was a biathlon video game..

true elk Apr 20, 2022, 2:40 PM

#

yeah I thought of doing just manual pixel detection as this is 7 segment digits

#

but also a good first project for ML

#

but maybe not the optimal first project aniblobsweat

desert oar Apr 20, 2022, 2:41 PM

#

no i think it's a great project

#

it's a relatively straightforward task but with lots of noise and a small amount of data

plush glacier Apr 20, 2022, 2:42 PM

#

desert oar numpy is not really a machine learning framework. it is just a library for linea...

i personally recommend tensorflow keras more for beginners

true elk Apr 20, 2022, 2:42 PM

#

I thought of going on Numpy directly to get my hands dirty (and understanding better what I'm doing), and also because all the MNIST video/tutorial I found just import it easily. I can't really do that with my current data

desert oar Apr 20, 2022, 2:42 PM

#

plush glacier i personally recommend tensorflow keras more for beginners

fair enough

#

i'd try it both ways tbh. pre-train on mnist, and not-pretraining on mnist. i am skeptical that handwritten numbers will be good pre-training for 7-segment lcd

desert oar Apr 20, 2022, 2:42 PM

#

true elk I thought of going on Numpy directly to get my hands dirty (and understanding be...

i wouldn't suggest trying to implement a neural network from scratch in numpy, that would be a good exercise if you wanted to get into ml engineering or numerical computing. but it would be a distraction in this project i think

plush glacier Apr 20, 2022, 2:43 PM

#

but it is worth to try tensorflow and pytorch i personally really want to learn pytorch but dont have the time yet

true elk Apr 20, 2022, 2:43 PM

#

From my understanding, pre-training wouldn't be beneficial in this case. The numbers are really in a fixed position!

desert oar Apr 20, 2022, 2:43 PM

#

the reason you want pre-training is that you have a very small dataset here

#

even smaller when you consider the number of "usable" items

plush glacier Apr 20, 2022, 2:43 PM

#

that is the entire dataset?

true elk Apr 20, 2022, 2:44 PM

#

I have 30 hours of footage at 30 fps, it's not enough?

desert oar Apr 20, 2022, 2:44 PM

#

oh

#

that's a lot

#

and they're all labeled? that is, you know the number for every frame?

plush glacier Apr 20, 2022, 2:44 PM

#

true elk I have 30 hours of footage at 30 fps, it's not enough?

well you might want to use semi supervised learning if they aren't labeled but if they are labeled use supervised learning

true elk Apr 20, 2022, 2:44 PM

#

This is where I'm at right now

Capture_decran_2022-04-20_a_15.44.40.png

true elk Apr 20, 2022, 2:45 PM

#

desert oar and they're all labeled? that is, you know the number for every frame?

I can do manual labelling to some extent

plush glacier Apr 20, 2022, 2:45 PM

#

are all numbers always the same at the same location?

true elk Apr 20, 2022, 2:45 PM

#

desert oar i wouldn't suggest trying to implement a neural network from scratch in numpy, t...

it's for learning purposes

true elk Apr 20, 2022, 2:45 PM

#

plush glacier are all numbers always the same at the same location?

exact same location!

desert oar Apr 20, 2022, 2:46 PM

#

as a side note, this game looks too realistic, my heart rate is going up just watching gameplay footage 😆

true elk Apr 20, 2022, 2:46 PM

#

That's why the fixed position pixel detection was my first idea

plush glacier Apr 20, 2022, 2:46 PM

#

true elk exact same location!

but will a 2 always look the same, or will it slightly change

true elk Apr 20, 2022, 2:46 PM

#

(+ I've done some 7 segment logic in AoC this year 😄 )

plush glacier Apr 20, 2022, 2:46 PM

#

so basically the same font

desert oar Apr 20, 2022, 2:46 PM

#

i assume it's the temperature in the top right corner here? https://www.youtube.com/watch?v=ZlJ6VPN8G0o

YouTube

NGL Biathlon

BIATHLON OLYMPICS AS ALEXANDER LOGINOV

In this video we run every biathlon race of the 2022 Olympics as Alexander Loginov in the game NGL Biathlon! Will we manage to win a medal?!

Download the game 1 - http://boosty.to/ngl_biathlon
Download the game 2 - http://patreon.com/biathlon
VKontakte group - http://vk.com/ngl_biathlon
Vasya's Instagram - https://www.instagram.com/vasya_ngl/
Game's Instagram -...

true elk Apr 20, 2022, 2:46 PM

#

yep

desert oar Apr 20, 2022, 2:47 PM

#

it looks like it's fixed position in a HUD, with varying backgrounds

true elk Apr 20, 2022, 2:47 PM

#

btw the game is free 😄

#

(or you can donate ❤️ )

desert oar Apr 20, 2022, 2:48 PM

#

gotta love open source gaming!

true elk Apr 20, 2022, 2:48 PM

#

Please don't suggest training Tesseract, I literally had nightmares with it. EasyOCR saved my life

plush glacier Apr 20, 2022, 2:48 PM

#

in that case you may not want to use ml

desert oar Apr 20, 2022, 2:48 PM

#

i think they just want to do this as a toy project to learn

plush glacier Apr 20, 2022, 2:48 PM

#

oh in that case use ml or multiple solutions

desert oar Apr 20, 2022, 2:48 PM

#

i actually love this idea. easy enough to download gameplay footage and DIY it too

#

maybe i'll do it too 😛

#

i need hands-on practice w/ image deep learning i think

#

what's the state of the art for semi-supervised learning? i looked into it several years ago and it seemed like it was kind of a dead end

#

active learning might be a better choice perhaps, i've used that successfully for record linkage / deduplication projects

true elk Apr 20, 2022, 2:50 PM

#

desert oar maybe i'll do it too 😛

if you want to pair-programming on this one, I'd love some help and peer review !

plush glacier Apr 20, 2022, 2:51 PM

#

desert oar i need hands-on practice w/ image deep learning i think

i'm trying to get some by downloading like 1600 images from pexels so all images are high res and then get a tile of 256,256 at a random point of the image and use that to train a super resolution model although i still need to make the discriminator and i'm planning on doing that today (i'm not expecting great results, it only has like 50k to 300k params, depending on if i use conv2d or depthwise separable convolution)

true elk Apr 20, 2022, 2:52 PM

#

plush glacier oh in that case use ml or multiple solutions

What kind of model would you recommend in my case? I've been suggested both CNN and CRNN

desert oar Apr 20, 2022, 2:53 PM

#

true elk if you want to pair-programming on this one, I'd love some help and peer review ...

i probably won't have time but i'd be interested to see your progress on this

plush glacier Apr 20, 2022, 2:53 PM

#

how often does the temperature change like how many frames are between there and do you want the challenge of a multi frame solution because that will be way harder

desert oar Apr 20, 2022, 2:53 PM

#

plush glacier i'm trying to get some by downloading like 1600 images from pexels so all images...

sounds fun

#

i know nothing about ML on videos, i'm curious to know what the multi-frame solution is

#

is it a 3D CNN over fixed-length durations of video? like 5 frames at a time

plush glacier Apr 20, 2022, 2:55 PM

#

desert oar is it a 3D CNN over fixed-length durations of video? like 5 frames at a time

maybe a ConvLSTM2d()

true elk Apr 20, 2022, 2:55 PM

#

plush glacier how often does the temperature change like how many frames are between there and...

That's the point of my project, check the time at the change of temperature and count seconds. The temperature changes are more likely to be each 3-10 seconds but I'd like to have data each second

#

so I need at least 1 good frame each 30

#

and it's okay if I can get 1 out of 100ish

#

I only need high confidence on the 1

#

Damn the more I think, the more I realise that image comparison would be much easier 😦

plush glacier Apr 20, 2022, 2:58 PM

#

true elk That's the point of my project, check the time at the change of temperature and ...

if you want extra challenge you might want to make a solution that considers multiple frames but that would be very hard

#

although you are also maybe able to average out like 3 frames and use that to get more clear data but that can result in that you are 2 frames off

true elk Apr 20, 2022, 3:00 PM

#

someone told me yesterday to not discard bad frames, but he didn't know about the whole context, so I'm a bit confused rn

plush glacier Apr 20, 2022, 3:02 PM

#

no i mean that you have like 30 frames each sec so you could make 30 predictions on each sec but what if you make it 28 groups of 3

#

and you use the average of each one

true elk Apr 20, 2022, 3:05 PM

#

damn this would be great for my image comparison solution! Get 30 frames of each second and doing kind of overlay/masking thing

#

like groups of 30

plush glacier Apr 20, 2022, 3:06 PM

#

i would just use averages because the shape and color doesn't really change

#

but that way you get 3 values for each frame so if the output is 2,2,5 you can say it is most likely a 2
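The averaging and voting idea above can be sketched like this: 30 frames give 28 overlapping groups of 3, each group is averaged into one cleaner frame, and the per-frame predictions are then combined by majority vote. The helper names and the random stand-in frames are assumptions for illustration only.

```python
from collections import Counter

import numpy as np


def sliding_mean(frames, window=3):
    """Average each run of `window` consecutive frames.

    `frames` has shape (n_frames, height, width); the result holds
    n_frames - window + 1 averaged frames.
    """
    frames = np.asarray(frames, dtype=np.float64)
    n_windows = frames.shape[0] - window + 1
    return np.stack([frames[i:i + window].mean(axis=0) for i in range(n_windows)])


def majority_vote(predictions):
    """Return the most common prediction, e.g. [2, 2, 5] -> 2."""
    return Counter(predictions).most_common(1)[0][0]


one_second = np.random.default_rng(0).random((30, 16, 32))  # 30 fake frames
averaged = sliding_mean(one_second, window=3)
print(averaged.shape)            # (28, 16, 32)
print(majority_vote([2, 2, 5]))  # 2
```

Averaging suppresses background noise that changes from frame to frame, while the fixed-position digits stay put. The cost, as noted above, is that a temperature change can be detected a frame or two late.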

true elk Apr 20, 2022, 3:08 PM

#

that would dilute my error rate

#

sooo.. where should I start 😄 ?

#

using my current processing on images, labelling a few, then throw it into keras/pytorch and pray for the best?

plush glacier Apr 20, 2022, 3:10 PM

#

true elk using my current processing on images, labelling a few, then throw it into keras...

i would say spend a bit more time on processing

true elk Apr 20, 2022, 3:12 PM

#

tbh I'm out of ideas on how to improve it 😦

plush glacier Apr 20, 2022, 3:12 PM

#

also are the numbers slightly transparent? if not you can use only that single color of the letters

true elk Apr 20, 2022, 3:13 PM

#

plush glacier also are the numbers slightly transparent? if not you can use only that single c...

yep.. they are

plush glacier Apr 20, 2022, 3:13 PM

#

could you send like 2 images to me that aren't pre-processed 1 that gives bad results with your current method and 1 that gives good results

#

and with to me i mean in this chat

true elk Apr 20, 2022, 3:14 PM

#

here you go

#

Good luck with that 2nd one 😄

plush glacier Apr 20, 2022, 3:27 PM

#

only way i can think of is to subtract the pixel value from the bottom right most pixel and then increase the brightness a lot

true elk Apr 20, 2022, 3:29 PM

#

I'm trying to np.mean some group of images, seem promising for now.

dapper dune Apr 20, 2022, 3:43 PM

#

hey there! Can some1 help me with nvidia DeepStream (docker)?

plush glacier Apr 20, 2022, 3:46 PM

#

true elk I'm trying to `np.mean` some group of images, seem promising for now.

subtracting the most common pixel could be a way

#

although i would have to say it is impossible to get the second image because there is nothing there

#

although a function like
```py
def try_to_get_number(image):
    # Use the mean pixel value as a rough estimate of the background.
    background = image.mean()
    return image - background
```

#

i wouldn't change your current way

desert oar Apr 20, 2022, 4:10 PM

#

i would also argue that some images are actually "unknown" and that a good model should indicate this

true elk Apr 20, 2022, 4:29 PM

#

desert oar i would also argue that some images are actually "unknown" and that a good model...

should I consider "unknown" as an output node or as a low confidence on all nodes?

plush glacier Apr 20, 2022, 4:35 PM

#

true elk should I consider "unknown" as an output node or as a low confidence on all node...

no as like a separate category that the model can classify
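The two options discussed here (a separate "unknown" class versus low confidence on every class) differ mainly in where the decision is made. The second option can be sketched as a confidence threshold on top of the model's output scores. Everything below is an assumption for illustration: the function names, the 0.9 threshold and the fake score vectors are not from the project.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_unknown(scores, labels, threshold=0.9):
    """Return the most likely label, or "unknown" when the model is not confident."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "unknown"


labels = list(range(-20, 21))  # the 41 temperature classes, -20 to 20
confident = [0.0] * 41
confident[labels.index(-12)] = 10.0  # one class clearly stands out
print(predict_or_unknown(confident, labels))   # -12
print(predict_or_unknown([0.0] * 41, labels))  # unknown
```

The alternative is to label the junk frames as their own class and train with 42 outputs. That needs labelled junk examples, but it lets the model learn what "unreadable" looks like instead of relying on a hand-picked threshold.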

frigid elk Apr 20, 2022, 4:45 PM

#

any mlops guys working on foundry (palantir)? .. looking for some best practices on workflow within that environment, ml pipeline in code repository specifically. how to keep code modular given the available toolset and proven libraries to provide reliable results while utilizing spark scalability and not wasting resources

jaunty belfry Apr 20, 2022, 5:30 PM

#

can somebody tell why there is no 1/2m in the cost function of L2 regularization whereas it is present in Linear regression?

#

lapis sequoia Apr 20, 2022, 6:01 PM

#

for I in range(5):
Print('I am going to fail the AP test')

jolly stone Apr 20, 2022, 6:17 PM

#

Can anyone give me a hint of some performative search algorithm? The search is to find a word

agile cobalt Apr 20, 2022, 7:36 PM

#

kinda sus GWllentThinkPika

long locust Apr 20, 2022, 7:37 PM

#

Hello, please don't post unapproved advertising. Thanks

prime hearth Apr 20, 2022, 7:55 PM

#

@jaunty belfry it's just a constant: 1/2 * 1/m

#

the 1/m is used for averaging

#

the 1/2 was put there to cancel the square power when we take the derivative

#

this does not actually change where the minimum is

#

it's just scaling
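
The scaling argument can be checked numerically. Below is a minimal sketch on made-up 1-D data (the data, the grid of candidate slopes and all variable names are assumptions for illustration): the sum of squared errors and the same quantity divided by 2m reach their minimum at the same slope.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + rng.normal(scale=0.1, size=50)
m = len(x)

# Evaluate both versions of the cost on a grid of candidate slopes.
thetas = np.linspace(0.0, 6.0, 601)
sse = np.array([np.sum((t * x - y) ** 2) for t in thetas])
cost_plain = sse              # no 1/(2m) factor
cost_scaled = sse / (2 * m)   # with the 1/(2m) factor

best_plain = thetas[np.argmin(cost_plain)]
best_scaled = thetas[np.argmin(cost_scaled)]
print(best_plain, best_scaled)
```

With an L2 penalty added, rescaling only the data term is equivalent to choosing a different regularization strength, which is presumably why some texts keep the factor and others drop it.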

small orbit Apr 20, 2022, 9:10 PM

#

how can i implement Keras Tuner into my code? https://nbviewer.org/urls/bpa.st/raw/A6JA

misty flint Apr 20, 2022, 9:17 PM

#

has anyone used geopandas before? PikaThink

#

do you recommend it

wheat hemlock Apr 20, 2022, 9:18 PM

#

Hey everyone! Happy to be here. I've been doing data science with Python for about a year, and now I want to use Django to create an API for my website.

Some questions. I will be presenting stats on a website:
- Should I update the stats on the backend directly through the DB, or via POST requests?
- MySQL or Postgres (or another DB) as the backend for the stats website?
- I will be doing historical analysis. Can I do this in Django views, or should I run the analysis beforehand and just present the already-updated information?

desert oar Apr 21, 2022, 12:29 AM

#

jaunty belfry can somebody tell why there in no 1/2m in cost function of L2 regularization whe...

where did you see this? dividing by a constant doesn't change the argmin so it shouldn't really matter anyway

desert oar Apr 21, 2022, 12:32 AM

#

wheat hemlock Hey Everyone! Happy to be here been doing data science with python for about a y...

postgres has the most features and is the easiest to administer imo. mysql i think has better scaling functionality but you don't need that.

> Should I be updating the stats on the backend directly through the db or using post
you can write directly to the database, but if you go through your own api endpoints then you maybe have a "safer" interface, with fewer ways to make mistakes, but then you have to deal with authentication for a privileged user that has the ability to write to the db, which maybe is more complexity than you want in your website

> I will be doing historical analysis
what is historical analysis?

modern cypress Apr 21, 2022, 12:36 AM

#

Say I have a list like [London, Paris, Chicago], is it better to index these in my data like [0, 1, 2] or to turn them into bools and have a new column for each? so like is_London, is_Paris, is_Chicago?

#

Or does this not have any effect?

desert oar Apr 21, 2022, 12:47 AM

#

modern cypress Say I have a list like `[London, Paris, Chicago]`, is it better to index these i...

neither? both? provide more context

#

are you talking about a series where each value is a list?

#

there's nothing inherently better about integers compared to strings. if anything, strings prevent you from making the mistake of putting your categorical-valued integers into a model, which will treat them incorrectly as continuous values

modern cypress Apr 21, 2022, 12:58 AM

#

desert oar neither? both? provide more context

I was looking back at some previous work, and some of the learning material I was provided and it said to change the categorical values in to indexes like that, and in other examples they were doing it the other way using pd.get_dummies(). So I was just unsure if one of the ways was better than the other

desert oar Apr 21, 2022, 1:09 AM

#

modern cypress I was looking back at some previous work, and some of the learning material I wa...

i dont think changing to numerical indexes does anything for you, unless you're using a specific model that treats integers as categoricals (maybe some random forest implementations do this)

#

ideally you'll use pd.Categorical for categorical data

wheat hemlock Apr 21, 2022, 1:09 AM

#

desert oar postgres has the most features and is the easiest to administer imo. mysql i thi...

Thanks for getting back to me! So essentially I will be keeping track of nft collections prices and volumes and want to be able to compare the change in the past lets say 7 days

desert oar Apr 21, 2022, 1:10 AM

#

wheat hemlock Thanks for getting back to me! So essentially I will be keeping track of nft col...

oh, i see. doing the queries and computations in the view will be the easiest from a software design perspective. it's probably the least efficient, but for a hobby project that's fine

wheat hemlock Apr 21, 2022, 1:11 AM

#

desert oar oh, i see. doing the queries and computations in the view will be the easiest fr...

This will be for a production website, Can I add you separately and can give you some SOL to consult me by chance?

desert oar Apr 21, 2022, 1:11 AM

#

no, sorry

#

i'd rather not give 1:1 private help

#

you might want to post these questions in #web-development or #databases , it sounds like the data science aspect of your project is unrelated to these questions

wheat hemlock Apr 21, 2022, 1:14 AM

#

Ok any help on how I can accomplish the loading of the data to the DB in the correct way, I am currently using mysql.connector but worried about slugs and timestamps then how I can do the analysis of historical data on the backend? Thank you so much

modern cypress Apr 21, 2022, 1:15 AM

#

desert oar i dont think changing to numerical indexes does anything for you, unless you're ...

One of the models I was testing on was actually random forest hahaha. Sounds good either way though, thanks for the help 👍

desert oar Apr 21, 2022, 1:15 AM

#

modern cypress One of the models I was testing on was actually random forest hahaha. Sounds goo...

yeah, whether to one-hot encode categorical variables in a random forest is a topic of debate. imo you shouldn't do it, leave them as categoricals

#

if your implementation requires "numbers", use LabelEncoder, otherwise leave them as-is

#

keep in mind that pd.Categorical is backed by an integer array anyway
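
A small sketch of the encodings discussed above, using the city example from the question (the variable names are illustrative). It shows the integer codes that back `pd.Categorical` next to the one-hot columns from `pd.get_dummies`.

```python
import pandas as pd

cities = pd.Series(["London", "Paris", "Chicago", "Paris"])

# Categorical: stores each distinct label once plus an integer code per row.
cat = pd.Categorical(cities)
categories = list(cat.categories)   # labels, sorted alphabetically by default
codes = cat.codes.tolist()          # one integer per row, indexing into categories

# One-hot: one indicator column per distinct label.
dummies = pd.get_dummies(cities, prefix="is")

print(categories, codes)
print(dummies)
```

The integer codes carry no ordering information, which is why feeding them to a model as if they were continuous values can mislead it.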

modern cypress Apr 21, 2022, 1:17 AM

#

Oh for real? Damn

#

But yeah, I had done it because it was in one of the lectures

#

It also said to try change values like "yes, no" or "Risk, NoRisk" to binary

pseudo wren Apr 21, 2022, 3:54 AM

#

I’m going to be honest

#

Idk how you guys do it

#

I feel like my head is going to burst trying to figure all this stuff out

normal jay Apr 21, 2022, 5:20 AM

#

how do you print the number of items from a column from an excel file? while also using np.unique

#

because the names on that column are repeated , so i need the number of names without it being repeated?

rose agate Apr 21, 2022, 5:24 AM

#

normal jay how do you print the number of items from a column from an excel file? while als...

if you have a pandas dataframe it should be something like len(np.unique(df.names))
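
A minimal sketch of that suggestion, assuming the column is called `names`. The DataFrame is built inline here so the example runs on its own; with a real spreadsheet it would come from `pd.read_excel`.

```python
import numpy as np
import pandas as pd

# Stand-in for a DataFrame loaded from an Excel file.
df = pd.DataFrame({"names": ["Ann", "Bob", "Ann", "Cy", "Bob"]})

n_unique_np = len(np.unique(df["names"]))   # the np.unique route asked about
n_unique_pd = df["names"].nunique()         # pandas equivalent (ignores NaN by default)
print(n_unique_np, n_unique_pd)
```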

ivory steppe Apr 21, 2022, 5:40 AM

#

I have CSV data with info on patients having MRI scans in various regions, plus population data. I want a predictive model that uses the population data to predict supply and demand, so that the number of MRI scans in a region can be predicted.
Can anyone help me approach this problem, or suggest some material/work to look at with a similar format and problem?

small orbit Apr 21, 2022, 5:48 AM

#

how can i implement Keras Tuner into my code? https://nbviewer.org/urls/bpa.st/raw/A6JA

gleaming pulsar Apr 21, 2022, 7:41 AM

#

does somebody know what this type of chart is called?

#

#

i want to implement it in a game which works in rounds

lapis sequoia Apr 21, 2022, 8:35 AM

#

gleaming pulsar does somebody knows how this type of chart is called?

isn't that a line graph with just its axis moved right and top

gleaming pulsar Apr 21, 2022, 8:36 AM

#

i realised it is called a bump chart

lapis sequoia Apr 21, 2022, 8:38 AM

#

Hi all, I would like to ask a question on how filters are activated
my resource is mainly deeplizard from YT for cnn
for a filter to be able to detect patterns in an image, they mentioned that the loss function between the input image and the output channel must be maximised
which is done with gradient ascent
I am guessing this is for training a cnn, would that be right?
my question is
if they are maximising the loss function for a filter
to be able to detect a feature from the image
won't it be easy to just have the filter (3x3 for this example) to just keep increasing the element values of the filter
like say that I have a filter, and it moves over a completely white image,
won't I be able to produce a filter that has a million for each value in the filter
and that would generate a very high value
and then when that filter moves over another part of the white image that may have a black spot, that black spot would be negligible
.
it was easy to understand back-propagation when we were trying to minimise the loss function, which brings the accuracy up
but I cant get over why the filter needs to have its loss function maximised, on top of that, maximising the loss function should be something that doesn't have an end right?
another question: is it that we maximise the convolutional layer's loss function, and minimise the loss function of the rest of the cnn, in two separate training events?

cinder matrix Apr 21, 2022, 9:25 AM

#

guys how do i get the accuracy of my model

#

and how do i evaluate it

warm oracle Apr 21, 2022, 9:42 AM

#

Man, google colab is high
I have this code

```py
def plot_pred(train_data=X_train,
              train_label=y_train,
              test_data=X_test,
              test_label=y_test,
              prediction=y_pred):
  plt.figure(figsize=(10, 7))
  plt.scatter(train_data, train_label, c='b', label="Training Data")
  plt.scatter(test_data, test_label, c='g', label="Testing Data")
  plt.scatter(test_data, prediction, c='r', label="Prediction Data")
  plt.legend()

plot_pred()
```
Which gives me an error on the `plot_pred()` saying that X and y need to be the same value.
But then I comment the prediction code, run it again, uncomment it, run it yet again and it works fine with no errors.

mild dirge Apr 21, 2022, 9:53 AM

#

are you using global variables as default values in your function? @warm oracle

#

That's probably messing stuff up

#

Also not passing any values in your function call

warm oracle Apr 21, 2022, 9:56 AM

#

Yea. I have them set that way as the only change would be the y_pred between models. So I can just go plot_pred(prediction=y_pred_1) or whichever it is lol

mild dirge Apr 21, 2022, 9:57 AM

#

You shouldn't use global variables in a function to begin with

#

let alone use them as default values

#

Pass them as arguments

cinder matrix Apr 21, 2022, 9:59 AM

#

colab is geh

#

bans me from gpu usage

warm oracle Apr 21, 2022, 9:59 AM

#

Ah I see. Thanks.

mild dirge Apr 21, 2022, 10:00 AM

#

" saying that X and y need to be the same value." It would also help to just give the error traceback btw

#

I assume it actually said that they need to be the same length

warm oracle Apr 21, 2022, 10:01 AM

#

Sorry, I had it fixed so didn't have the traceback to post.
lemme see if I can replicate the error

warm oracle Apr 21, 2022, 10:19 AM

#

"ValueError: x and y must be the same size"

mild dirge Apr 21, 2022, 10:26 AM

#

So the length of both arrays/lists differ

warm oracle Apr 21, 2022, 10:32 AM

#

Not really, as I set both X and y to be the same size, so I can understand it better before going into an actual dataset. As I'm still new to TensorFlow.

```py
X = tf.range(-100, 100, 4)
y = X + 10

X_train = X[:40]
y_train = y[:40]

X_test = X[:10]
y_test = y[:10]
```

mild dirge Apr 21, 2022, 10:34 AM

#

Well they do, otherwise you don't get an exception

#

If you check the error traceback you already know which line causes it

warm oracle Apr 21, 2022, 10:44 AM

#

Yea. Which, like I said, got fixed by commenting one line on and off again.

mild dirge Apr 21, 2022, 10:47 AM

#

warm oracle Yea. Which, like I said, got fixed by commenting one line on and off again.

Alright, well if it's fixed I don't know what you want

#

If it is still a problem, you aren't giving close to enough information to let us help you, otherwise I don't know why you are telling us this :/

warm oracle Apr 21, 2022, 10:48 AM

#

Was just an observation. Since I said it got fixed in my initial comment.

#

Guess those aren't allowed lol

mild dirge Apr 21, 2022, 10:49 AM

#

Yeah but commenting and uncommenting code doesn't fix anything, that makes no sense

#

It might be that you reran some code, that made some values change or something

#

Whatever it was, it made it so the length of two of those arrays/lists weren't the same size
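
One plausible explanation for the comment/uncomment effect (an assumption, since the full notebook is not shown): Python evaluates default argument values once, when the `def` statement runs. A function whose defaults are global arrays keeps pointing at the old arrays until the cell that defines it is run again. The names below are illustrative.

```python
import numpy as np

y_pred = np.zeros(3)            # stand-in for an earlier prediction

def n_points(prediction=y_pred):
    # The default was bound to the 3-element array when `def` ran.
    return len(prediction)

y_pred = np.zeros(10)           # a later cell produces a new prediction

stale = n_points()                    # still uses the old 3-element default
fresh = n_points(prediction=y_pred)   # passing the argument avoids the problem
print(stale, fresh)
```

Rerunning the cell that contains the `def` rebinds the defaults to the current globals, which would make the size mismatch disappear without any real code change.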

upper bluff Apr 21, 2022, 10:50 AM

#

i have a 2d array, something like:

[[0,2,3],
 [1,0,3],
 [2,1,0],
 [3,1,2]]

and i have a pandas dataframe something like:

now i want to get a new dataframe based on the indexes listed in the 2d array, for this example, it will be like:

 id   val   id_2  val
0 a    9      a    9
0 a    9      c    3
0 a    9      d    7
1 b    8      b    8
1 b    8      a    9
1 b    8      d    7
2 c    3      c    3
2 c    3      b    8
2 c    3      a    9
3 d    7      d    7
3 d    7      b    8
3 d    7      c    3

warm oracle Apr 21, 2022, 10:50 AM

#

That's why I was confused.
As I only reran the two cells that have the function defined (after commenting and uncommenting a line), and the one with the function call.

#

But don't worry about it lol.
Sorry for taking your time. And thanks again for the advice.

mild dirge Apr 21, 2022, 10:52 AM

#

yeah wasn't meant to sound so angry, just a bit confused

#

sorry if it came off aggressive 😛

warm oracle Apr 21, 2022, 10:53 AM

#

Don't worry. I'd react the same if someone told me what I just said lol.

upper bluff Apr 21, 2022, 10:56 AM

#

mild dirge yeah wasn't meant to sound so angry, just a bit confused

can you help me out with my problem?

mild dirge Apr 21, 2022, 10:57 AM

#

I am not that experienced with pandas srr

upper bluff Apr 21, 2022, 10:57 AM

#

ah aighty

mighty orchid Apr 21, 2022, 11:32 AM

#

upper bluff i have a 2d array, something like: ```py [[0,2,3], [1,0,3], [2,1,0], [3,1,2]...

can you please explain this, uhhh, better? edit: nvm

rose agate Apr 21, 2022, 11:32 AM

#

upper bluff i have a 2d array, something like: ```py [[0,2,3], [1,0,3], [2,1,0], [3,1,2]...

might be a better way to do it but I got this working

import pandas as pd
import numpy as np

index = [[0,2,3],
 [1,0,3],
 [2,1,0],
 [3,1,2]]

idse = ['a','b','c','d']
vals = [9,8,3,7]

data = {'id': idse, 'val': vals}
df = pd.DataFrame(data=data)

newdf = pd.DataFrame(np.repeat(df.values, len(index[0]), axis=0))

flat_list = [item for sublist in index for item in sublist]
newdf['id_2'] = df.id[flat_list].values
newdf['val_2'] = df.val[flat_list].values

#

produces

upper bluff Apr 21, 2022, 11:33 AM

#

SO GOOD YES

#

thats EXACTLY what i wanted!!!!

rose agate Apr 21, 2022, 11:34 AM

#

I did use np.repeat so the length of each sublist in the index would need to remain constant or it'd break

#

no worries

upper bluff Apr 21, 2022, 11:34 AM

#

yessss each sublist has equal length

upper bluff Apr 21, 2022, 11:45 AM

#

rose agate might be a better way to do it but I got this working ``` import pandas as pd im...

hey so, my original dataframe has around 40 columns along with the id column. is there any way to add the columns all together instead of individually?

rose agate Apr 21, 2022, 11:47 AM

#

are you trying to repeat those columns like with id and val or index them like with id_2 and val_2

#

hard to understand

upper bluff Apr 21, 2022, 11:47 AM

#

multiple columns of values

#

each id will have its own set of values

#

we are repeating ids

upper bluff Apr 21, 2022, 11:48 AM

#

rose agate are you trying to repeat those columns like with id and val or index them like w...

index like we did with id_2 and val_2

rose agate Apr 21, 2022, 11:49 AM

#

for each of the 40 cols?

upper bluff Apr 21, 2022, 11:49 AM

#

pd.concat([newdf, df[flatlist]], axis = 1, columns = [...])

#

maybe this

rose agate Apr 21, 2022, 11:50 AM

#

I can't really tell what you need without an example

upper bluff Apr 21, 2022, 11:50 AM

#

upper bluff i have a 2d array, something like: ```py [[0,2,3], [1,0,3], [2,1,0], [3,1,2]...

just like here, i have only one column of val, in the original dataframe i have 40 columns of val

rose agate Apr 21, 2022, 11:53 AM

#

let me think

rose agate Apr 21, 2022, 11:58 AM

#

upper bluff just like here, i have only one column of val, in the original dataframe i have ...

not sure if this is what you need but try this modification

# newdf['id_2'] = df.id[flat_list].values
# newdf['val_2'] = df.val[flat_list].values

cols = df.columns
for col in cols:
    newdf[col] = (df[col])[flat_list].values
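
The same result can be sketched without a per-column loop, assuming row `i` of the index array lists the rows to pair with row `i` of the DataFrame and every sublist has the same length. The helper name `pair_rows` and the `_2` suffix are illustrative choices, not part of the code above.

```python
import numpy as np
import pandas as pd

def pair_rows(df, index):
    """Repeat each row of df and place the rows selected by `index` beside it."""
    idx = np.asarray(index)
    n_rows, k = idx.shape
    # Left side: every original row repeated k times.
    left = df.iloc[np.repeat(np.arange(n_rows), k)].reset_index(drop=True)
    # Right side: all columns gathered by position in one step.
    right = df.iloc[idx.ravel()].reset_index(drop=True).add_suffix("_2")
    return pd.concat([left, right], axis=1)

df = pd.DataFrame({"id": list("abcd"), "val": [9, 8, 3, 7]})
index = [[0, 2, 3], [1, 0, 3], [2, 1, 0], [3, 1, 2]]
out = pair_rows(df, index)
print(out)
```

Because `iloc` selects whole rows, this handles 40 value columns the same way as one.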

upper bluff Apr 21, 2022, 12:00 PM

#

YES thanks a ton

#

this works epicly

proper swift Apr 21, 2022, 2:03 PM

#

Hi is there a way to apply a function starting with the 3rd instance of a value? I.e. ID numbers, on the the third count of an id number do x

I have the following df:

import random

import pandas as pd

ids = [1001, 1002,
       1003, 1004,
       1005, 1006,
       1007, 1008,
       1009, 1010]

numbers = list(range(1,11))

systems = ["ONE", "TWO"]

num = 40

sample1 = random.choices(ids, k=num)
sample2 = random.choices(systems, k=num)
sample3 = random.choices(numbers, k=num)

df = pd.DataFrame(zip(sample1, sample3, sample2), 
                 columns=['id', 'seq', 'system'])

df.sort_values(by=['id', 'seq'])

If the count of the IDs >= 3, then starting at the third row, shift all the values in the system column up by one

undone wind Apr 21, 2022, 2:12 PM

#

Im making a fairly basic music content-based recommender system, following some code I used before for a movie recommender system that had a much smaller dataset. Not sure if this is relevant but the biggest change I made was that the music dataset I used is way too large so I have the system create a sample and the index is reset.

When running the function that compares an input to the rest of the data and outputs a small list of most recommended artists I get an error Ive never seen before: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

#

Im unsure what this means and what I should do

#

```py
def recommendation(artist_name, sig=sig):
    index = indices[artist_name]
    sig_scores = list(enumerate(sig[index]))
    sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
    sig_scores = sig_scores[1:11]
    spotify_indices = [i[0] for i in sig_scores]
    return spotifydf['artist_name'].iloc[spotify_indices]
```

#


```
ValueError                                Traceback (most recent call last)
<ipython-input-27-95e154b531ed> in <module>
----> 1 recommendation(spotifyRec)

<ipython-input-25-19214a8b47a6> in recommendation(artist_name, sig)
      2     index = indices[artist_name]
      3     sig_scores = list(enumerate(sig[index]))
----> 4     sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
      5     sig_scores = sig_scores[1:11]
      6     spotify_indices = [i[0] for i in sig_scores]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

stoic trench Apr 21, 2022, 2:34 PM

#

Interesting

tidal bough Apr 21, 2022, 2:34 PM

#

undone wind ```def recommendation(artist_name, sig=sig): index = indices[artist_name] ...

Sounds like the rows of your original array aren't one-element, and so can't be compared. Basically, check what the elements of sig_scores are after the first line.

#

By the way, what you're doing can be done via np.argsort instead (it gives an array of indices such that the corresponding elements are in sorted order. That seems to be exactly what you're doing with all these lines).

undone wind Apr 21, 2022, 2:41 PM

#

tidal bough Sounds like the rows of your original array aren't one-element, and so can't be ...

tidal bough Apr 21, 2022, 2:42 PM

#

Yeah, looks like each element is an array, so how do you expect sorted to compare them? What's bigger, [0,1,2] or [1,-1,0]?

undone wind Apr 21, 2022, 2:43 PM

#

and yeah

#

#

it is different in my old version with movies

#

huh

desert oar Apr 21, 2022, 2:46 PM

#

undone wind

those are tuples, not arrays. python allows you to sort tuples

desert oar Apr 21, 2022, 2:47 PM

#

tidal bough Yeah, looks like each element is an array, so how do you expect `sorted` to comp...

just on principle i would expect that you could define the same ordering as you would define for tuples, i.e. compare elementwise until the tie is broken. but imo numpy is right to reject that as the default behavior

tidal bough Apr 21, 2022, 2:48 PM

#

yeah, sure. Anyway, sorted just does the equivalent of if a<b:, which for numpy arrays is not valid (a<b is an array of bools, and an array of bools can't be implicitly reduced to a single bool like that)
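
Both points can be reproduced on toy data (the arrays below are made up): `sorted` raises the same `ValueError` when the sort keys are multi-element arrays, while `np.argsort` on a 1-D score vector gives the ranking directly.

```python
import numpy as np

# 1-D scores: argsort returns the indices that would sort the array.
scores = np.array([0.2, 0.9, 0.5, 0.7])
top_two = np.argsort(scores)[::-1][:2]   # indices of the two largest scores

# 2-D rows: each sort key is an array, so `a < b` is an array of bools
# and cannot be reduced to a single True/False.
rows = np.array([[0, 1, 2], [1, -1, 0]])
try:
    sorted(enumerate(rows), key=lambda pair: pair[1])
    raised = False
except ValueError:
    raised = True

print(top_two, raised)
```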

undone wind Apr 21, 2022, 2:54 PM

#

im a little confused then why my original has tuples and this version uses arrays

desert oar Apr 21, 2022, 2:55 PM

#

undone wind im a little confused then why my original has tuples and this version uses array...

presumably because sig is an array

undone wind Apr 21, 2022, 2:56 PM

#

this is from the movies one that works

#

#

which seems to show that sig is an array here

#

(the one that doesnt work is a similar result)

desert oar Apr 21, 2022, 3:01 PM

#

proper swift Hi is there a way to apply a function starting with the 3rd instance of a value?...

i would just loop over rows in this case

from collections import defaultdict

df = ...

systems = ["ONE", "TWO"]

id_counts = defaultdict(lambda: 0)
for row in df.itertuples():
    id_counts[row.id] += 1
    if id_counts[row.id] >= 3:
        df.loc[row.Index, systems] += 1

#

you could also do this with groupby:

df = ...

systems = ["ONE", "TWO"]

df['id_count'] = df.groupby('id').cumcount()
df.loc[df['id_count'] >= 2, systems] += 1  # cumcount() is 0-based, so 2 is the third row

proper swift Apr 21, 2022, 3:06 PM

#

desert oar you could also do this with groupby: ```python df = ... systems = ["ONE", "TWO"...

Ah that might be what I'm looking for, let me give it a shot

desert oar Apr 21, 2022, 3:08 PM

#

or even combining the loop and groupby, which might be the best option if this dataframe is really big and you have a large number of duplicate id values:

df = ...

systems = ["ONE", "TWO"]

for _, group in df.groupby('id'):
    if len(group) > 2:
        inc_ids = group.index[2:]
        df.loc[inc_ids, systems] += 1

im not entirely sure about the semantics of += while looping over itertuples or groupby. if you want to be safer, you can construct a list of id's to modify first, and then do the modification in one shot after

#

df = ...

systems = ["ONE", "TWO"]

inc_ids = []
for _, group in df.groupby('id'):
    if len(group) > 2:
        inc_ids.extend(group.index[2:].tolist())
df.loc[inc_ids, systems] += 1

desert oar Apr 21, 2022, 3:09 PM

#

undone wind

can you post your entire code? both the "original" you mentioned as well as the new version that gives you a problem

undone wind Apr 21, 2022, 3:09 PM

#

ah

#

#

#

well then

undone wind Apr 21, 2022, 3:10 PM

#

desert oar can you post your entire code? both the "original" you mentioned as well as the ...

sure

proper swift Apr 21, 2022, 3:10 PM

#

@desert oar what I had so far was this:

def func(df_group):
    if len(df_group) >= 3:
       return df_group.system.shift(-1)
    else:
       return df_group.system

new_col = df.groupby(['id'], as_index=False).apply(func)

df['new'] = new_col.reset_index(level=0, drop=True)

desert oar Apr 21, 2022, 3:10 PM

#

undone wind

oh, in this case you have different array shapes

desert oar Apr 21, 2022, 3:10 PM

#

proper swift <@389497659087650836> what I had so far was this: ```py def func(df_group): ...

oh, i didn't realize what you meant with the shift

#

that would work too but that shift would apply to the entire group, not just the values after the 3rd

proper swift Apr 21, 2022, 3:12 PM

#

desert oar that would work too but that shift would apply to the entire group, not just the...

Yeah I want to move the system values up one. But only if the count of ID's is 3+. And the shift needs to start from row count 3 of each ID, as they already have preset values.

I think that's where im stuck at.

My code works on id's with counts of 3+, and it applies it to all the values. But I only want to apply the shift to the values starting from the third row count

desert oar Apr 21, 2022, 3:14 PM

#

wait... is system a column? it looked like a list of columns

proper swift Apr 21, 2022, 3:14 PM

#

desert oar wait... is `system` a column? it looked like a list of columns

Yeah its a column with only 2 values inside.

desert oar Apr 21, 2022, 3:14 PM

#

oh i see

#

i misread your original example

proper swift Apr 21, 2022, 3:17 PM

#

no worries, hopefully the problem is abit clearer now?

desert oar Apr 21, 2022, 3:17 PM

#

huh, that's a new one

#

let me see what i broke

#

found it

#

let me do this offline 😆 hang on

undone wind Apr 21, 2022, 3:18 PM

#

desert oar can you post your entire code? both the "original" you mentioned as well as the ...


# In[11]:
spotifydf = spotifydf.sample(frac =.1)
spotifydf = spotifydf.reset_index()


# In[12]:
spotifydf.head()


# In[13]:
spotifydf['popularity'] = spotifydf['popularity'].apply(str)


# In[14]:
spotifydf['genre'] = str(spotifydf['genre'])
spotifydf['genre'] = str(spotifydf['artist_name'])
spotifydf['genre'] = str(spotifydf['track_name'])


# In[15]:
spotifydf["content"] = spotifydf['genre'] + spotifydf['artist_name'] + spotifydf['track_name'] + spotifydf['popularity']

# In[16]:

from sklearn.feature_extraction.text import TfidfVectorizer

tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 3), stop_words = 'english')

spotifydf['content'] = spotifydf['content'].fillna('')

# In[17]:
tfvmatrix = tfv.fit_transform(spotifydf['content'])
# In[18]:
tfvmatrix
# In[19]:
tfvmatrix.shape
# In[20]:
from sklearn.metrics.pairwise import sigmoid_kernel
# In[21]:
sig = sigmoid_kernel(tfvmatrix, tfvmatrix)
# In[22]:
sig[0]
# In[23]:
indices = pd.Series(spotifydf.index, index=spotifydf['artist_name']).drop_duplicates()
# In[24]:
indices
# In[29]:
def recommendation(artist_name, sig=sig):
    index = indices[artist_name]
    sig_scores = list(enumerate(sig[index]))
    sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
    sig_scores = sig_scores[1:11]
    spotify_indices = [i[0] for i in sig_scores]
    return spotifydf['artist_name'].iloc[spotify_indices]
# In[26]:

spotifyRec = input("Enter the artist you would like a recommendation based on!")
# In[27]:
recommendation(spotifyRec)

desert oar Apr 21, 2022, 3:19 PM

#

undone wind ```spotifydf.head() # In[11]: spotifydf = spotifydf.sample(frac =.1) spotifydf ...

!paste i recommend using our paste site for longer samples like this

arctic wedgeBOT Apr 21, 2022, 3:19 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

undone wind Apr 21, 2022, 3:19 PM

#

ok

#

https://paste.pythondiscord.com/igedokepux

#

this is the original then

slate scarab Apr 21, 2022, 3:21 PM

#

I have a dumb question, I saw they had discord.js would it be better in the long run to start and stay in python or is discord.js a good start, I plan on working on creating my own version of Carl bot and have never tried to make a project this big before.

undone wind Apr 21, 2022, 3:22 PM

#

https://paste.pythondiscord.com/hexovufelu

slate scarab Apr 21, 2022, 3:22 PM

#

I have done some basic bots before and would like to go bigger

undone wind Apr 21, 2022, 3:22 PM

#

and this is the new version with music instead of movies @desert oar

desert oar Apr 21, 2022, 3:23 PM

#

!eval @proper swift

import numpy as np
import pandas as pd

ids = [
    1001, 1002, 1003, 1004, 1005,
    1006, 1007, 1008, 1009, 1010,
]
numbers = list(range(1,11))
systems = ["ONE", "TWO"]
num = 40
rng = np.random.default_rng()
sample1 = rng.choice(ids, size=num)
sample2 = rng.choice(systems, size=num)
sample3 = rng.choice(numbers, size=num)
df = pd.DataFrame(
    zip(sample1, sample3, sample2),
    columns=['id', 'seq', 'system'],
)


def shift_system(group):
    if len(group) < 3:
        return group
    return pd.concat((
        group.iloc[:2],
        group.iloc[2:].shift(-1)
    ))

df['new'] = (
    df.groupby('id')['system']
    .apply(shift_system)
    .reset_index(level=0, drop=True)
)
print(df)

arctic wedgeBOT Apr 21, 2022, 3:23 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |       id  seq system  new
002 | 0   1008    6    TWO  TWO
003 | 1   1002    5    ONE  ONE
004 | 2   1004    2    TWO  TWO
005 | 3   1001   10    ONE  ONE
006 | 4   1009    3    TWO  TWO
007 | 5   1008    6    ONE  ONE
008 | 6   1001    7    TWO  TWO
009 | 7   1004    2    TWO  TWO
010 | 8   1001    7    ONE  ONE
011 | 9   1008    2    ONE  ONE
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/doverelawu.txt?noredirect

desert oar Apr 21, 2022, 3:24 PM

#

i can't imagine why you want to do this though 😆

desert oar Apr 21, 2022, 3:24 PM

#

slate scarab I have a dumb question, I saw they had discord.js would it be better in the long...

this sounds like a good question for #discord-bots

proper swift Apr 21, 2022, 3:24 PM

#

desert oar i can't imagine _why_ you want to do this though 😆

Long story haha. Need to fix some data entry issues
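
For comparison, the same "shift from the third row of each id" idea can be sketched without `apply`, using `cumcount` to mark the rows and a grouped `shift` on just those rows. The tiny DataFrame is made up so the result is easy to check by eye; this is an alternative sketch, not the code that was run above.

```python
import pandas as pd

df = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2],
    "system": ["A", "B", "C", "D", "E", "F"],
})

# Position of each row within its id group, starting at 0.
pos = df.groupby("id").cumcount()
tail = pos >= 2                      # third row of each id and onwards

# Shift only the tail rows up by one within each id.
shifted = df[tail].groupby("id")["system"].shift(-1)

df["new"] = df["system"]
df.loc[tail, "new"] = shifted        # aligned on the original row index
print(df)
```

Ids with fewer than three rows have no tail rows, so they are left unchanged. The last row of each shifted tail becomes missing, as with any `shift(-1)`.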

undone wind Apr 21, 2022, 3:32 PM

#

wait

cinder matrix Apr 21, 2022, 3:32 PM

#

hi can someone help me, i am trying to find a way to evaluate my model, which is created using this article https://www.ivanlai.project-ds.net/post/conditional-text-generation-by-fine-tuning-gpt-2, which uses transformers. i came up with training another model but with the rnn/lstm structure, how would i do this?
both models should convert keywords to sentences; i am struggling to find a tutorial on training an rnn/lstm to do this

proper swift Apr 21, 2022, 3:38 PM

#

@desert oar thanks works as intended! Been stuck on that problem for the last 2 days

undone wind Apr 21, 2022, 3:49 PM

#

ok so I think the issue was possibly coming through a few things

#

so originally when using my method I got an error when concatenating columns saying: "TypeError: can only concatenate str (not "int") to str"

#

wait

#

maybe not 😂

#

yeah I cant work out why its making full arrays instead of tuples, I cant see anywhere why its doing this

#

oh wait, is it potentially because im doing it based off of artist_name, and one artist can have many songs within the dataframe, so each artist is assigned a multitude of values in sig?

serene scaffold Apr 21, 2022, 4:09 PM

#

@undone wind can you do print(df.head().to_dict('list')) and show the text in this chat?

#

and then we can talk about how to transform it to get your desired result. only text will do--no screenshots.

#

Please ping me when you do that and we can get into it.

undone wind Apr 21, 2022, 4:21 PM

#

{'index': [133553, 204593, 52399, 79490, 93264], 'genre': ['Reggae', 'Soundtrack', 'Blues', 'Opera', 'Indie'], 'artist_name': ['Bob Marley & The Wailers', 'Nick Glennie-Smith', 'Galactic', 'Giacomo Puccini', 'The Lagoons'], 'track_name': ['Bend Down Low - B Is Version', "Jack's Death", "You Don't know (featuring Glen David Andrews and The Rebirth Brass Band)", 'Un bel dì (From "Madama Butterfly")', 'California'], 'track_id': ['6bwr7Qgxrc0hERBOrapmVh', '34devHoJ8tjNLPgSaOpPuo', '5qh4q09WZTUMCkqXWR4l6l', '4jekropd6vkVfunMXZqwVh', '35QAUfIbfIXT3p3cWhaKxZ'], 'popularity': ['35', '28', '25', '20', '64'], 'acousticness': [0.44, 0.973, 0.0393, 0.9890000000000001, 0.276], 'danceability': [0.779, 0.14400000000000002, 0.701, 0.24, 0.7859999999999999], 'duration_ms': [213867, 98440, 244200, 296693, 261773], 'energy': [0.445, 0.203, 0.7659999999999999, 0.163, 0.6859999999999999], 'instrumentalness': [0.000151, 0.7829999999999999, 0.000816, 2.8499999999999998e-05, 0.6679999999999999], 'key': ['C', 'D#', 'G', 'C#', 'E'], 'liveness': [0.166, 0.11599999999999999, 0.23800000000000002, 0.317, 0.0416], 'loudness': [-7.791, -17.989, -6.285, -15.071, -7.18], 'mode': ['Major', 'Minor', 'Major', 'Major', 'Major'], 'speechiness': [0.0458, 0.0356, 0.0976, 0.05, 0.0289], 'tempo': [87.94, 74.194, 110.001, 89.719, 119.99700000000001], 'time_signature': ['4/4', '1/4', '4/4', '4/4', '4/4'], 'valence': [0.755, 0.0389, 0.588, 0.0388, 0.542], 'content': ['ReggaeBob Marley & The WailersBend Down Low - B Is Version', "SoundtrackNick Glennie-SmithJack's Death", "BluesGalacticYou Don't know (featuring Glen David Andrews and The Rebirth Brass Band)", 'OperaGiacomo PucciniUn bel dì (From "Madama Butterfly")', 'IndieThe LagoonsCalifornia']} @serene scaffold

#

`content` comes from `spotifydf["content"] = spotifydf['genre'] + spotifydf['artist_name'] + spotifydf['track_name']`

#

I build `tfvmatrix` from `content`

#

then `sig = sigmoid_kernel(tfvmatrix, tfvmatrix)` to make `sig`

#

```
sig[index]

array([[0.7616427 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76161052, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       ...,
       [0.7616156 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.7616192 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416]])
```

#

`sig` contains arrays with multiple values for some reason, whereas it should only contain one: the sigmoid value of each row in `content`

#

I don't know why, and I believe this is what I'm asking about

compact gazelle Apr 21, 2022, 4:38 PM

#

Question: how do I make a 2D array like this using numpy? The expected output is an array that runs from 137 to 166

serene scaffold Apr 21, 2022, 4:39 PM

#

compact gazelle Question, how to make 2D array like this using numpy? The output wants the array...

yes, you can do `arr.reshape(-1, 1)` to get one like that for any array. though you have to be careful that the array actually means something when you shape it like that.
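
For reference, a minimal sketch of that reshape. The 137 to 166 range comes from the question; treating the end point as inclusive is an assumption.

```python
import numpy as np

# Values 137..166 inclusive (the inclusive end point is an assumption).
arr = np.arange(137, 167)

# -1 lets numpy infer the row count; 1 fixes a single column.
col = arr.reshape(-1, 1)

print(arr.shape)  # (30,)
print(col.shape)  # (30, 1)
```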

serene scaffold Apr 21, 2022, 4:39 PM

#

undone wind {'index': [133553, 204593, 52399, 79490, 93264], 'genre': ['Reggae', 'Soundtrack...

thanks for posting it. I don't know that I understand what change you are trying to make

compact gazelle Apr 21, 2022, 4:40 PM

#

serene scaffold yes, you can do `arr.reshape(-1, 1)` to get one like that for any array. though ...

Ahh yes it works, thank you so much

desert oar Apr 21, 2022, 4:59 PM

#

@undone wind what is spotifyRec?

#

even the code snippets you posted don't include all of the code

#

don't make people guess at what you are doing here, if you can include the whole notebook please do so

undone wind Apr 21, 2022, 5:01 PM

#

https://paste.pythondiscord.com/fuxirature

desert oar Apr 21, 2022, 5:01 PM

#

this is the non-working one?

undone wind Apr 21, 2022, 5:01 PM

#

yes

desert oar Apr 21, 2022, 5:02 PM

#

@undone wind
```python
indices = pd.Series(spotifydf.index, index=spotifydf['artist_name']).drop_duplicates()
```

does `spotifydf` have a multi-index?

#

also it seems a bit weird that you're inverting the index and values like this

#

ok i see... you have this

`spotifydf = pd.read_csv(r'C:\Users\cens\Downloads\archive (3)\SpotifyFeatures.csv')`

so its index should just be the default RangeIndex

#

i see you also did `spotifydf.reset_index()` in cell 11

#

ok, so `sig` should be N x N where N is the number of rows in `spotifydf`
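
A small sketch of why `sig` comes out N x N. It assumes scikit-learn's documented definition of `sigmoid_kernel`, tanh(gamma * <x, y> + coef0) with defaults gamma = 1 / n_features and coef0 = 1, and reimplements it in plain numpy. The helper name `sigmoid_kernel_sketch` and the 3-row matrix are made up for illustration.

```python
import numpy as np

def sigmoid_kernel_sketch(X, Y, coef0=1.0):
    """tanh(gamma * <x, y> + coef0), with gamma = 1 / n_features."""
    gamma = 1.0 / X.shape[1]
    return np.tanh(gamma * (X @ Y.T) + coef0)

# Toy stand-in for tfvmatrix: 3 rows (songs), 4 features (terms).
X = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8, 0.0],
])

sig = sigmoid_kernel_sketch(X, X)
print(sig.shape)  # (3, 3): one row and one column per song

# Rows 0 and 1 share no terms, so their dot product is 0 and the
# kernel value is tanh(1), the 0.76159416 that fills the output above.
print(sig[0, 1], np.tanh(1.0))
```

So every pair of songs gets a score, and a single song's scores are one full row of the matrix, not one number.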

#

ahh i see why you invert the index, that's one way to do it. but `sig` is a plain numpy array, so that lookup is only valid if `spotifydf` has a RangeIndex. you're better off with `np.arange(len(spotifydf))` instead of `spotifydf.index`.
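
To illustrate the point about positions: a numpy array is indexed by position, so the lookup Series should store positions, which only coincide with the DataFrame index when that index is a fresh RangeIndex. The tiny frame and the identity matrix below are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"artist_name": ["A", "B", "C", "D"]})

# After filtering, the index keeps the old labels 1 and 3.
subset = df[df["artist_name"].isin(["B", "D"])]
print(list(subset.index))  # [1, 3]

# A 2 x 2 similarity matrix built from `subset` has positions 0 and 1 only.
sig = np.eye(len(subset))

# Storing positions keeps the lookup valid for the numpy array.
positions = pd.Series(np.arange(len(subset)), index=subset["artist_name"])
print(sig[positions["D"]])  # row 1 of sig: [0. 1.]

# Storing subset.index would give label 3, which is out of bounds for sig.
labels = pd.Series(subset.index, index=subset["artist_name"])
print(labels["D"])  # 3
```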

#

this notebook is also pretty messy. it's very likely that you have some weird intermediate state. did you try restarting the kernel and running it top to bottom?

#

if `sig` is indeed a 2d array, and if `artist_name` is a scalar (i.e. not a list/series/array), then `sig[index]` should be a 1d array because you deduplicated `indices`,

#

so if you get something different, then one of those assumptions is wrong

#

check the shape of `sig` and check that you are only passing plain "artist name" values, not arrays thereof

undone wind Apr 21, 2022, 5:09 PM

#

this is the full notebook for the one that works, too: https://paste.pythondiscord.com/hafoyezoco

undone wind Apr 21, 2022, 5:11 PM

#

desert oar check the shape of `sig` and check that you are only passing plain "artist name"...

```
index = indices[spotifyRec]
sig[index]

array([[0.7616427 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76161052, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       ...,
       [0.7616156 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.7616192 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416]])
```

#

contains multiple arrays for 1 artist name

desert oar Apr 21, 2022, 5:12 PM

#

undone wind ```index = indices[spotifyRec] sig[index] array([[0.7616427 , 0.76159416, 0....

show `index` as well as `spotifyRec`

#

i bet `spotifyRec` itself is not a scalar

#

this is why i always use `.at` when i expect to be indexing with scalars, if i accidentally pass a non-scalar it gives an error

undone wind Apr 21, 2022, 5:16 PM

#

desert oar show `index` as well as `spotifyRec`

how do you mean

desert oar Apr 21, 2022, 5:17 PM

#

undone wind how do you mean

what is `spotifyRec`? if it's not a single scalar, then it's going to produce an `index` that is not a scalar, which will produce a 2d array from `sig[index]`
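
A sketch of that failure mode on made-up data. Note that the lookup result can be non-scalar even when the key is a plain string: if the label is duplicated, `indices[name]` returns a Series of positions, and indexing a 2d array with several positions returns several rows.

```python
import numpy as np
import pandas as pd

artists = ["Galactic", "Bob Marley", "Galactic", "The Lagoons"]
indices = pd.Series(np.arange(len(artists)), index=artists)

sig = np.arange(16).reshape(4, 4)  # placeholder for the similarity matrix

one = indices["The Lagoons"]  # unique label: a single integer
many = indices["Galactic"]    # duplicated label: a Series of two positions

print(np.ndim(one), sig[one].shape)    # 0 (4,)
print(np.ndim(many), sig[many].shape)  # 1 (2, 4)
```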

undone wind Apr 21, 2022, 5:19 PM

#

do you just mean

```
indices

artist_name
Bob Marley & The Wailers        0
Nick Glennie-Smith              1
Galactic                        2
Giacomo Puccini                 3
The Lagoons                     4
                            ...  
Glass Animals               23267
Night Beats                 23268
Jackie Kashian              23269
311                         23270
Bruce Broughton             23271
Length: 23272, dtype: int64

spotifyRec = input("Enter the artist you would like a recommendation based on!")
```
?

#

and then `index = indices[artist_name]`

#

`spotifyRec` is just an input from the user

desert oar Apr 21, 2022, 5:22 PM

#

ok, can you confirm that `index` is also a scalar?

#

do this

`index = indices.at[artist_name]`

#

this way you definitely get an error if it's wrong
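
An explicit check is one way to make that failure loud regardless of how the lookup is spelled. This is only a sketch: the helper name `lookup_position` is hypothetical and the data is made up.

```python
import numpy as np
import pandas as pd

def lookup_position(indices, name):
    """Return the single row position for `name`, or raise."""
    value = indices.loc[name]  # KeyError if the label is missing
    if np.ndim(value) != 0:
        raise ValueError(f"{name!r} matches {len(value)} rows, expected 1")
    return int(value)

indices = pd.Series([0, 1, 2], index=["Galactic", "Bob Marley", "Galactic"])

print(lookup_position(indices, "Bob Marley"))  # 1

try:
    lookup_position(indices, "Galactic")
except ValueError as exc:
    print(exc)  # 'Galactic' matches 2 rows, expected 1
```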

#

also can you show me `sig.shape` to confirm that it is definitely 2d and not 3d?

undone wind Apr 21, 2022, 5:24 PM

#

desert oar Apr 21, 2022, 5:24 PM

#

alright

#

and can you print the value of `index` too? using the same `artist_name` that caused the problem before

undone wind Apr 21, 2022, 5:28 PM

#

#

#

I think I was right earlier on, multiple songs with the same artist

#

is what's causing arrays rather than tuples

#

the movie one works because there aren't any movies with the exact same title

#

would you say this is what's happening? @desert oar

desert oar Apr 21, 2022, 5:31 PM

#

undone wind would you say this is whats happening? @desert oar

yes, exactly. your "index inversion" didn't work as expected
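
One likely reason the inversion kept the duplicates, sketched on made-up data: `Series.drop_duplicates()` compares the values, and here the values are row numbers that are already unique, so repeated artist labels survive. `Index.duplicated()` checks the labels instead.

```python
import pandas as pd

artists = ["Galactic", "Bob Marley", "Galactic"]
indices = pd.Series(range(len(artists)), index=artists)

# Values 0, 1, 2 are all distinct, so nothing is dropped.
deduped = indices.drop_duplicates()
print(len(deduped))             # 3
print(deduped.index.is_unique)  # False

# Deduplicating on the labels keeps the first row per artist.
first_per_artist = indices[~indices.index.duplicated(keep="first")]
print(first_per_artist.to_dict())  # {'Galactic': 0, 'Bob Marley': 1}
```

Keeping only the first row per artist makes the lookup scalar again, but it silently ignores that artist's other songs.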

#

i think you need to reconsider these data structures

analog kestrel Apr 21, 2022, 5:34 PM

#

Hi all, I have a question regarding training a multilayer perceptron for mnist using the classes/functions that I was provided with. Is this an appropriate place to reach out for some assistance?

undone wind Apr 21, 2022, 5:35 PM

#

desert oar i think you need to reconsider these data structures

in the short term I guess I could change the dataframe so that it is 1:1 between song and artist, just to get it working

#

won't make the greatest recommender system, but I just want it working for now

#

and then is there a place anyone could recommend ( 😏 ) for learning more about content-based recommenders and implementing them

desert oar Apr 21, 2022, 5:37 PM

#

undone wind in the short term I guess I could change the dataframe so that it is 1:1 between...

no, you need to use an index that is actually unique

#

this has nothing to do with recommendation systems... this is a matter of being a bit smarter about numpy and pandas usage

#

if your recommender system works with songs, then you have a song recommender system

#

so you can ask for an artist, but then obviously you will get more than one song per artist

#

which is maybe fine, but then you need to be smarter about how you do the lookup

#

you need to get the list of song ids from the artist id

#

or you need to come up with an artist-level recommendation system, not a song-level recommendation system
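
A rough sketch of the first option on made-up data: collect every row position for the requested artist, average those rows of the similarity matrix, and rank the remaining songs. The function name `recommend_for_artist`, the averaging step, and the toy matrix are all assumptions for illustration, not the notebook's code.

```python
import numpy as np
import pandas as pd

def recommend_for_artist(df, sig, artist, top_n=2):
    """Rank songs by mean similarity to all of `artist`'s songs."""
    # Positions (not index labels) of every song by this artist.
    positions = np.flatnonzero(df["artist_name"].to_numpy() == artist)
    if positions.size == 0:
        raise KeyError(f"unknown artist: {artist!r}")

    # One similarity row per song, averaged into a single score vector.
    scores = sig[positions].mean(axis=0)
    scores[positions] = -np.inf  # never recommend the artist's own songs

    best = np.argsort(scores)[::-1][:top_n]
    return df.iloc[best]["track_name"].tolist()

df = pd.DataFrame({
    "artist_name": ["Galactic", "Galactic", "Bob Marley", "The Lagoons"],
    "track_name": ["Song A", "Song B", "Song C", "Song D"],
})
sig = np.array([
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.4, 0.3],
    [0.2, 0.4, 1.0, 0.3],
    [0.5, 0.3, 0.3, 1.0],
])

print(recommend_for_artist(df, sig, "Galactic"))  # ['Song D', 'Song C']
```

Averaging is just one way to combine the rows; taking the maximum per column would instead favour songs that are very close to any single track by the artist.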