#data-science-and-ml


lapis sequoia
#

Hi, is there anyone who can help me with my problem? I'm trying to create a robot in pybullet but I don't know why my joints aren't working correctly and why gravity isn't working as it should.

lapis sequoia
#

AI because pybullet is used to train RL models

#

Where should I ask if not here?

frank edge
#

robotics is not equal to AI

lapis sequoia
serene scaffold
#

The question is fine for this channel, though unfortunately it's not likely to be answered given that it's very niche @lapis sequoia @frank edge

rugged tide
#

Hi there 👋

#

I'm applying for data science degree apprenticeships and was wondering whether people have formed any opinions on how good they are?

#

I would be training as a data scientist while earning roughly £20000 give or take depending on the company (if I managed to land the apprenticeship ofc), and after 4 years would have a bachelors in Data Science paid for

#

my main long-term concern would be career progression with only a bachelor's degree. I've seen a few posts on Reddit about how it's much harder to progress without at least a master's, with many people choosing to get their PhDs

#

So my question is, would you agree with that or not? Thanks in advance

serene scaffold
#

@rugged tide you might ask in #career-advice, asking for those who are familiar with the job market in Britain.

rugged tide
#

My apologies, didn't see that channel.

rugged tide
serene scaffold
#

In the US, it's harder to get your foot in the door with only a bachelors related to data science, but once you get a job, progression isn't necessarily stopped by not having a higher degree.

#

also idk what £20000 can get you in the UK, but if you take today's exchange rate for GBP->USD and try to live on that here, it wouldn't be that great. Are you sure you could live on that?

rugged tide
#

and if in london I can commute

serene scaffold
#

I see

rugged tide
#

I don't really drink or rave so it's fine

serene scaffold
#

There are other ways to burn money fast BingShrug

rugged tide
#

this is also true

#

btw, would you like me to delete my relatively long-winded post?

serene scaffold
#

what income prospects are you looking at if you have a bachelors? because people encouraged me to do community college before starting the CS program "to save money", but in taking longer to get my degree, I missed out on a few years of higher income.

#

So in retrospect, I lost money by not getting my degree in four years.

#

obviously the situation is different. I'm just pointing out that future income is a consideration.

rugged tide
#

so 1 extra year for the degree, but no debt, and far more exp

#

there are very conflicting opinions on degree apprenticeships here though, some people think they're amazing, other people say they suck

serene scaffold
#

well, one thing you'll have to do as a data scientist is figure out why similar events have different outcomes, so I guess you can start doing that now 😄

rugged tide
#

🤣 thanks

hoary rover
hoary rover
# rugged tide I would be training as a data scientist while earning roughly £20000 give or tak...

This is definitely more of a #career-advice discussion, but since you haven't been moaned at yet I'll put my answer here 🤣

Getting a good apprenticeship in the UK is incredibly difficult, and it's unbelievably competitive for what it is (a job and a place at a mid-ranked uni). My advice, assuming you've just finished A-levels or equivalent, is to take what you have and apply to the best name-brand university you can (Bristol, Warwick, Birmingham, etc.). Coming from an elite university is what makes the most difference, in some cases even more than experience. Personally, I'm finishing my master's in September and will be an EO for the ONS, and I had an edge over most of the kids applying just because my university's name was shinier.

#

I was definitely dumber though.

wicked grove
#

hello, in k-fold cross-validation, are the weights initialized in each fold?
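For scikit-learn estimators the answer is yes: `cross_validate` (and `cross_val_score`) fit a fresh clone of the estimator in every fold, so each fold starts from an unfitted model. The minimal sketch below checks this on made-up data. If you write the fold loop yourself, for example with a Keras model, you have to re-create or re-initialize the model inside the loop. Otherwise, the weights learned in one fold carry over into the next.

```python
# Sketch on toy data: cross_validate fits a separate clone of the model in each fold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, n_features=5, random_state=0)  # made-up data
model = LogisticRegression(max_iter=1000)

result = cross_validate(model, X, y, cv=5, return_estimator=True)
fold_models = result["estimator"]  # one separately fitted model per fold

print(len(fold_models))                          # 5
print(all(m is not model for m in fold_models))  # True: each fold got its own copy
print(hasattr(model, "coef_"))                   # False: the original was never fitted
```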

flat sable
hollow flare
#

What is your opinion on blobcity ai cloud?

modest comet
#

Hello, I'm just learning Python for my homework. I was asked to make a program with imageAI, and when I tried, I got an error. Can someone explain why?

austere swift
#

try restarting the kernel (as the message suggests)

#

i doubt anybody is gonna just make a model for you, that's a job you'd have to pay quite a bit for

#

you can try to make one and we'll help you if you run into any snags though

loud flame
#

but I just don't know how to improve it further

#

I've used every parameter for a random forest classifier ( best_params )

austere swift
arctic wedgeBOT
#

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

loud flame
#

its not an exam

#

🗿

austere swift
#

homework

#

we can't do it for you, but we can help you

#

which is what i explained earlier

loud flame
#

yeah u don't need to help

#

but could u give tips on improving it ( after hyperparameter tuning )

#

is there anything else u can do on a model

#

just tell me, I'll research and do it myself

#

+I'm not able to balance the data

austere swift
#

"can anyone create a good model for the dataset I'll give" sounds a lot like asking for someone to do it for you

austere swift
#

like a neural network or an svm etc

loud flame
#

idk bout neural network

#

its more of a y/n model

loud flame
loud flame
#

Random Forest gave the best results but, the options to improve it are limited

austere swift
#

well try neural networks

loud flame
#

isn't it a multiclass thing

austere swift
#

neural networks can pretty much do whatever you want, it's just about how you configure them

#

you can give as many or as few outputs as you want
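The sketch below illustrates the "as many or as few outputs" point. It uses scikit-learn's `MLPClassifier` as a stand-in, which is an assumption since the thread didn't name a library. `MLPClassifier` sizes its output layer from the labels: a single logistic unit for a yes/no target, and one softmax unit per class for a multiclass target. In Keras the same choice is made explicitly, as `Dense(1, activation='sigmoid')` versus `Dense(n_classes, activation='softmax')`.

```python
# Sketch on random data: the output layer adapts to binary vs. multiclass targets.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# yes/no target: one logistic output unit
y_binary = (X[:, 0] > 0).astype(int)
binary_clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
binary_clf.fit(X, y_binary)

# three-class target (labels 0, 1, 2): one softmax output unit per class
y_multi = np.digitize(X[:, 0], [-0.5, 0.5])
multi_clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0)
multi_clf.fit(X, y_multi)

print(binary_clf.n_outputs_, binary_clf.out_activation_)  # 1 logistic
print(multi_clf.n_outputs_, multi_clf.out_activation_)    # 3 softmax
```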

loud flame
#
I have a doubt: is there any way I can increase the speed of my GridSearchCV?
#

its been 3.5 hours

austere swift
#

which is just how many parallel jobs it'll run

loud flame
#

how much?

#

I did n_jobs=-1 before

austere swift
#

well -1 is the maximum you can use anyways
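Since `n_jobs=-1` already uses every core, the remaining lever is how many fits the search runs. A full grid costs (number of combinations) × (number of folds) fits. The sketch below runs on made-up data and uses scikit-learn's `RandomizedSearchCV`, which samples only `n_iter` combinations. The parameter ranges are hypothetical. `class_weight="balanced"` is included as one way to handle an imbalanced y/n target without resampling the data.

```python
# Sketch: bound the search time by sampling n_iter combinations instead of trying them all.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# made-up imbalanced y/n dataset standing in for the real one
X, y = make_classification(n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0)

param_distributions = {  # hypothetical ranges
    "n_estimators": randint(50, 200),
    "max_depth": randint(2, 12),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions,
    n_iter=5,      # number of sampled combinations: the main time knob
    cv=3,          # fewer folds also means fewer fits
    n_jobs=-1,     # all CPU cores
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Editing a cell while it is running does not change the job in progress. The kernel executes the code as it was when the cell started, so an added `n_jobs=-1` only takes effect after the cell is run again.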

loud flame
#

oh 🗿

#

if I update a running jupyter cell

#

I just added

#

n_jobs=-1

#

into a running cell

#

will it update the parameter?

#

@austere swift

austere swift
bronze spire
#

Where can I start learning about Data Science?

lapis sequoia
mint palm
#

what can happen if i encode my csv dataset into very high number of columns using hash encoders

crisp flax
austere swift
#

it was doing so good too :(

next phoenix
lapis sequoia
fleet trail
#

Hello, what would be the best model for a dataset containing 5000 rows and 2098 columns

grand scaffold
#

And I once got a loss rate of 2 and was upset lmao

#

Yo why is this not working lemon_thinking

```py
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt("data.csv", delimiter=",")
X = dataset[:,0:3]
y = dataset[:,3]
# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=3, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict([4,5,7]) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
```
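The traceback wasn't posted, but there are two likely causes. First, `model.predict([4,5,7])` passes a single 1-D row, while the model was built with `input_dim=3` and expects a batch shaped `(n_samples, 3)`. Second, the loop prints five predictions, but only one sample was predicted.

The sketch below shows the shape fix on made-up data. It swaps in scikit-learn's `MLPClassifier`, which is an assumption made so the example runs without TensorFlow. With the Keras model, the analogous call is `(model.predict(np.array([[4, 5, 7]])) > 0.5).astype(int)`.

```python
# Sketch: predict() takes a 2-D batch (n_samples, n_features), not a bare 1-D row.
# MLPClassifier stands in for the Keras model; the shape rule is the same.
import numpy as np
from sklearn.neural_network import MLPClassifier

# made-up data in the question's layout: 3 feature columns, 0/1 target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 3))
y = (X.sum(axis=1) > 15).astype(int)

model = MLPClassifier(hidden_layer_sizes=(12, 8), max_iter=2000, random_state=0)
model.fit(X, y)

# one new sample: the extra brackets make the shape (1, 3)
new_sample = np.array([[4, 5, 7]])
print(new_sample.shape)           # (1, 3)
print(model.predict(new_sample))  # a single 0/1 prediction

# to summarize "the first 5 cases", predict on those 5 rows
predictions = model.predict(X[:5])
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
```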
lapis sequoia
#
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x,y):
    m_curr = b_curr = 0
    iterations = 100000
    n = len(x)
    learning_rate = 0.001

    for i in range(iterations):
        y_predicted = m_curr * x + b_curr
        cost = (1/n) * sum([val**2 for val in (y-y_predicted)])
        plt.plot(x,y_predicted, color = "green")
        md = -(2/n)*sum(x*(y-y_predicted))
        bd = -(2/n)*sum(y-y_predicted)
        m_curr = m_curr - learning_rate * md
        b_curr = b_curr - learning_rate * bd
        print ("m {}, b {}, cost {} iteration {}".format(m_curr,b_curr,cost, i))

x = np.array([10,9,11,12,6,5,7,6,12,14])
y = np.array([95,90,90,105,75,75,80,85,110,115])

gradient_descent(x,y)

Even after so many iterations the cost is still 20
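A cost that stalls around 20 most likely means the loop has already converged. These points don't lie on a straight line, so even the best possible line leaves some error. The sketch below checks this by computing the exact least-squares line with NumPy's `polyfit` and evaluating the same mean-squared-error cost the loop uses.

```python
# Sketch: the exact least-squares line for this data already has a cost of about 20.2.
import numpy as np

x = np.array([10, 9, 11, 12, 6, 5, 7, 6, 12, 14])
y = np.array([95, 90, 90, 105, 75, 75, 80, 85, 110, 115])

m, b = np.polyfit(x, y, deg=1)          # best slope and intercept
mse = np.mean((y - (m * x + b)) ** 2)   # same cost formula as in the loop

print(f"m={m:.4f}, b={b:.4f}, minimum cost={mse:.4f}")
# m=4.3341, b=52.1262, minimum cost=20.2044
```

If the loop's final `m` and `b` are close to these values, gradient descent has done its job, and more iterations cannot push the cost below this floor.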

grand scaffold
#

Tryna make a basic neural network in keras

rose agate
cedar plank
#

hey guys

#

i want help

#

can anyone help me with a machine learning course pls

#

can anyone help me

serene scaffold
cedar plank
#

the qustion that i want advanced machine learning course

serene scaffold
#

you want someone to tell you what advanced ML course you should take?

cedar plank
#

yes i want suggestions

serene scaffold
#

the andrew ng course seems to be popular. I have not taken it.

cedar plank
#

are u joking or talking serious

serene scaffold
#

I am being serious. there's an ML course taught by Andrew Ng that I hear about a lot. But I have not taken it personally, so I can't tell you how it is from experience.

tough frigate
#

me neither

#

i prefer ebooks

analog kiln
#
```py
class MyCell(tf.keras.layers.AbstractRNNCell):
    @property
    def output_size(self):
        return 16

    @property
    def state_size(self):
        return 16

    def call(self, inputs, states):
        alpha_t, alpha_t_prev = inputs, states[0]

        return alpha_t, [alpha_t]

my_cell = MyCell()
layer_res = tf.keras.layers.RNN(my_cell)(logits)
```

anything obviously wrong with this code? logits contains a tensor with shape [batch_size, timesteps, logits]. I know it doesn't do anything right now but i'm getting this error: TypeError: Cannot iterate over a scalar tensor.

#

is it the way i'm defining the output and state sizes?

hollow sentinel
#

so i just tried to get anaconda on my computer

#

never again

#

anaconda's like that hot ex that you want back in your life bc you think it'll change

#

but then it's exactly the same

#

stel i know you're gonna read this

#

jupyter notebook fucking sucks

serene scaffold
hollow sentinel
#

HAHAHXCKDN

#

that describes my last one so well

#

anywaysssss

#

thonny is the shawty

serene scaffold
#

there was an ask reddit where someone asked "what is your high school crush doing now?" and someone said "I'm 40. he's still a douchebag who spends all his time at the gym. but I was into that at the time."

hollow sentinel
#

i was facetiming this girl last night and she was like full time real estate, full time law firm stuff, full time college student and she was dating this dude who had nothing going for him

#

and i was like why?

#

and then she said verbatim "i was bored"

#

oh i'm gonna preach this: everyone please pip install pyforest

#

you don't need to write import numpy as np, import pandas as pd, import sklearn

#

it lazily imports everything

inland belfry
#

i am using mediapipe to make stick figures from video (sorry for the rick roll i was just using it as a test video)

serene scaffold
inland belfry
#

mediapipe

serene scaffold
#

what is that

inland belfry
#

look it up

serene scaffold
#

no

inland belfry
#

ok

#

it's a library that does pose detection and face tracking and stuff

coarse narwhal
desert bear
bronze spire
#

@lapis sequoia So, from where should I learn

lapis sequoia
bronze spire
lapis sequoia
serene scaffold
hollow sentinel
#

thonny mad cute

#

and gives nice debugging tips

#

mad easy to install and update packages

#

lightweight

#

people give it shit bc it’s for beginners but i prefer it

#

it basically has a rubber duck installed that talks to you about your code and tries to suggest what went wrong

#

more descriptive than some long-ass error message you'll go down a rabbit hole for hours trying to solve

wicked grove
#

hello, can someone please tell me how i can use Grad-CAM to correct my model?

#

i can see the areas where my model is making a mistake using Grad-CAM

cedar plank
#

can anyone send a course from Codecademy to start in machine learning if i studied data analysis?

grave frost
cedar plank
#

i studied statistics, spreadsheets and business metrics

#

I know this is not enough

grave frost
#

is that a MOOC or a degree?

misty flint
#

aka their version of jupyter notebooks

#

no need to install anything

#

only obscure errors i mean what

cedar plank
grave frost
#

i think yes
yes

cedar plank
#

can you tell me a course to begin in ML on Codecademy?

grave frost
#

yes

inland belfry
#

:)

lapis sequoia
thorn bobcat
#

yo

calm palm
#

Hey small question since questions about pandas seem to fall short in the help channels, does anybody know of a quick way to replace values in a column with 'day' if it falls within a certain time of the day and 'night' if it falls within a certain time in a pandas dataframe? I've searched and dataframe.between_time does not seem to return it in a way such that I can set a certain column in those values to another value

calm palm
#

Nvm I think I got it
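For reference, `between_time` filters the rows of a DataFrame that has a DatetimeIndex, which is why it doesn't hand back something you can assign into. Building a boolean mask from the hour works on an ordinary datetime column. The sketch below assumes a column named `time` and a day window of 06:00 to 17:59. Both are hypothetical.

```python
# Sketch: label each timestamp "day" or "night" from its hour.
import numpy as np
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2022-04-20 03:15", "2022-04-20 09:30", "2022-04-20 17:59", "2022-04-20 21:00",
])})

# hypothetical rule: hours 6..17 count as day, everything else as night
is_day = df["time"].dt.hour.between(6, 17)
df["period"] = np.where(is_day, "day", "night")
print(df)
```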

upper mural
#

Hello. Could anybody recommend a textbook on data science with Python that mainly focuses on quantitative methods?

#

if you know more than one book, please do refer a few for evaluation

barren barn
#

This is probably the wrong section but is python able to make a script that grabs a =count(E2:Ewhatever the last one is) on a specific column in multiple sheets and print them in a new column of a specific file?

#

I have a small macro from Fiji I made/found parts of and it runs an analysis on a set of images within a folder and spits out all the excel sheets into a folder. So I have that backbone but I don't think it's going to work

#

so if anyone knows if this is easily possible, that would be great

bronze spire
#

Any free source from where I can learn Data Science?

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

jaunty belfry
#

hello

#

what can be the right answer for this question?

#

I selected B as the right answer

#

but unfortunately its wrong

jovial sleet
# jaunty belfry

B looks like the correct answer. I'm guessing they messed up the question

jaunty belfry
#

What about this?

#

@jovial sleet

jaunty belfry
tough frigate
#

anybody got interviewed at Uber for data analyst ? any help is appreciated.

vernal hull
#

hi

regal ingot
#

hey

bronze spire
burnt pilot
#

How can I manipulate the names in a DataFrame whose IDs fall within a specific range

#

using Pandas

serene scaffold
burnt pilot
#

I have a DataFrame that looks like this, with the header ['Name', 'Gender', 'Id']:
```
Bertha,F,1320
Sarah,F,1288
Annie,F,1258
Clara,F,1226
Ella,F,1156
Florence,F,1063
Cora,F,1045
Martha,F,1040
Laura,F,1012
```
and I want to change the names that have an Id within the range 1180-1200 to John

serene scaffold
#

you would just need to do df.loc[df['Id'].between(1180, 1200), 'Name'] = 'John'

#

if you change the Id column to be the index, it would be df.loc[1180:1200, 'Name'] = 'John'
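The sketch below is a runnable version of both suggestions on the sample rows. None of the sample Ids fall between 1180 and 1200, so it uses 1040-1063 (a hypothetical range) to show a visible change. With `Id` as the index, label slicing such as `loc[1180:1200]` assumes a sorted index. The sample is in descending order, so the sketch sorts it first.

```python
import pandas as pd

original = pd.DataFrame({
    "Name": ["Bertha", "Sarah", "Annie", "Clara", "Ella",
             "Florence", "Cora", "Martha", "Laura"],
    "Gender": ["F"] * 9,
    "Id": [1320, 1288, 1258, 1226, 1156, 1063, 1045, 1040, 1012],
})

# 1) boolean mask on the Id column; between() includes both ends by default
df = original.copy()
df.loc[df["Id"].between(1040, 1063), "Name"] = "John"
print(df["Name"].tolist())
# ['Bertha', 'Sarah', 'Annie', 'Clara', 'Ella', 'John', 'John', 'John', 'Laura']

# 2) Id as the index: label slicing relies on a sorted index, so sort it first
by_id = original.set_index("Id").sort_index()
by_id.loc[1040:1063, "Name"] = "John"
print(by_id.loc[1040:1063, "Name"].tolist())  # ['John', 'John', 'John']
```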

serene scaffold
#

like, what caused k: to be displayed?

burnt pilot
#
k = df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'
print('k:\n',k)
serene scaffold
#

k = df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John' is the same as

df.loc[df_1880['Id'].between(42, 49), 'Name'] = 'John'
k = 'John'
hoary rover
#

^

#

df is the dataframe you created from df_1880 within your parameters.

burnt pilot
#

its working now

#

when I look into variable explorer

serene scaffold
burnt pilot
#

but still it just shows John

#

yeah

hoary rover
#

Yes, because you overwrote df_1880

serene scaffold
#

k is irrelevant

#

df.loc[...] = ... is a method call. it's not actually doing assignment.

hoary rover
#

k is just a variable you created. How do you want to display the output?

burnt pilot
#

I see

#

I wanted to get smth like this
```py
Name
2117 Vince
2118 Vivian
2119 Whit
2120 Willaim
2121 Winifred
2122 Wirt
2123 Woodson
2124 Woody
2125 Worley
2126 Zed
```

hoary rover
#

Yes. Then just write df_1880 underneath your input.

serene scaffold
#

it's the same as df.loc.__setitem__(x, y). it changes the state of df. it doesn't write any new variables. but since you stacked it with k =, it assigned to k
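The sketch below demonstrates this on a tiny made-up DataFrame. In a chained assignment, Python assigns the right-hand value to each target from left to right. So `k` simply receives the string, and `df` is changed in place. To see the modified rows, select them after the assignment.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Id": [42, 45, 50]})  # made-up rows

# both targets receive the same right-hand value: k gets "John", df.loc[...] is set to "John"
k = df.loc[df["Id"].between(42, 49), "Name"] = "John"
print(k)   # John

# the DataFrame itself was modified in place
changed = df.loc[df["Id"].between(42, 49), "Name"]
print(changed.tolist())  # ['John', 'John']
```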

hoary rover
#

Pandas is the apple imac of python code. Please check the documentation.

burnt pilot
serene scaffold
#

but Kingu is right that pandas works very differently from the rest of Python

burnt pilot
#
```py
m = df_1880['Name'].value_counts()['Mary']
print('Mary :\n',m)
```
here I counted how many times the name Mary occurs
#

how would I do the same thing for 3 dataframes at once
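One way to do the same count for several DataFrames is to loop over them. The sketch below uses small made-up stand-ins for `df_1880`, `df_1881` and `df_1882`; the last two names are assumptions. Comparing with `== "Mary"` and summing also returns 0 for a year where the name never appears, whereas `value_counts()['Mary']` would raise a KeyError.

```python
import pandas as pd

# hypothetical stand-ins for the three yearly DataFrames
df_1880 = pd.DataFrame({"Name": ["Mary", "Anna", "Mary"]})
df_1881 = pd.DataFrame({"Name": ["Mary", "Emma"]})
df_1882 = pd.DataFrame({"Name": ["Emma", "Ida"]})

frames = {"1880": df_1880, "1881": df_1881, "1882": df_1882}

# count per DataFrame, then the total across all of them
per_year = {year: int((df["Name"] == "Mary").sum()) for year, df in frames.items()}
total = sum(per_year.values())

print(per_year)  # {'1880': 2, '1881': 1, '1882': 0}
print(total)     # 3
```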

serene scaffold
tidal sonnet
#

what do you think about Julia for data science?

pseudo wren
#

maybe this is the dumbest error but i keep running into it

serene scaffold
pseudo wren
#

whenever i am trying to insert table values with sqlite3, i get a syntax error and i am not sure what's wrong with it or my spacing

#

idk

#
```py
      Car_Name
      Year
      Selling_Price
      Present_Price
      Kms_Driven
      Fuel_Type
      Seller_Type
      Transmission
      Owner
    )
    VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', tups)
connector.commit()
connector.close()
```
serene scaffold
pseudo wren
#

near "Year": syntax error

#

thanks!
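The error points at the column list in the posted code. There are no commas between the column names, so after `Car_Name` SQLite expects a comma or a closing parenthesis and stops at `Year`. The `VALUES` clause also has 22 placeholders for 9 columns, which would fail next.

The sketch below shows a corrected statement against an in-memory database. The table name `cars` and the rows are made up, and `tups` is assumed to be a list of 9-value tuples.

```python
import sqlite3

connector = sqlite3.connect(":memory:")  # throwaway database for the demo
cursor = connector.cursor()
cursor.execute('''
    CREATE TABLE cars (
        Car_Name TEXT, Year INTEGER, Selling_Price REAL, Present_Price REAL,
        Kms_Driven INTEGER, Fuel_Type TEXT, Seller_Type TEXT,
        Transmission TEXT, Owner INTEGER
    )
''')

# made-up rows: one tuple per row, 9 values each
tups = [
    ("ritz", 2014, 3.35, 5.59, 27000, "Petrol", "Dealer", "Manual", 0),
    ("sx4", 2013, 4.75, 9.54, 43000, "Diesel", "Dealer", "Manual", 0),
]

# commas between the column names, and exactly one ? per column
cursor.executemany('''
    INSERT INTO cars (
        Car_Name, Year, Selling_Price, Present_Price, Kms_Driven,
        Fuel_Type, Seller_Type, Transmission, Owner
    )
    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
''', tups)
connector.commit()

row_count = cursor.execute("SELECT COUNT(*) FROM cars").fetchone()[0]
print(row_count)  # 2
connector.close()
```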

serene scaffold
#

you should probably specify what flavor of SQL you're using as well

burnt pilot
serene scaffold
#

are 1880-2 just years?

young harness
#

kind of a beginner question but can someone explain buffer size and batch size to me in simple terms? (in ML i mean)

serene scaffold
young harness
serene scaffold
young harness
#

in an epoch or an iteration?

serene scaffold
#

in an iteration, I guess. an epoch is a pass over the entire training set.

young harness
#

thanks for the explanation, i was kinda confused :)

#

ima look more into what buffer size is though
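The terms fit together as follows:
- Batch size is the number of samples used in one iteration (one weight update).
- An epoch is one full pass over the data, so an epoch takes ceil(n_samples / batch_size) iterations.
- Buffer size, assuming it refers to a shuffle buffer like the one in TensorFlow's `tf.data.Dataset.shuffle(buffer_size)`, is how many samples are held in memory to shuffle from.

The sketch below is a simplified pure-Python model of that idea, not TensorFlow's code. A buffer size of 1 means no shuffling at all.

```python
import math
import random


def shuffle_with_buffer(items, buffer_size, seed=0):
    """Partially shuffle a stream while holding only `buffer_size` items in memory."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered item...
        buffer[idx] = item               # ...and put the new item in its slot
    rng.shuffle(buffer)                  # stream finished: emit what is left
    yield from buffer


def batch(items, batch_size):
    """Group a stream into lists of `batch_size` items (the last one may be smaller)."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


data = list(range(10))  # a 10-sample "training set"
batches = list(batch(shuffle_with_buffer(data, buffer_size=4), batch_size=3))
print(batches)                   # 4 batches holding every sample exactly once
print(math.ceil(len(data) / 3))  # 4 iterations (steps) per epoch
```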

burnt pilot
serene scaffold
burnt pilot
#

So you suggest to create a DataFrame with these inside

tough tundra
#

can someone help me to create a ChatBot Song Recommender System,
I can provide all the information which I have

serene scaffold
valid rapids
#

Does anybody here have experience with gpt-2? I'm really interested in making a chatbot with it, but I have NO clue what I'm doing.

karmic valley
#

https://paste.pythondiscord.com/luhoyasewu can you help me with some different code? I want to work out the average pixel whiteness of an image. With the code I wrote, I think the library doesn't support transparency, and my image has transparency.

green rune
#

do people most typically use jupyter whenever they're actually doing data analysis?

serene scaffold
green rune
serene scaffold
green rune
#

thank you!

#

learning it now

serene scaffold
# green rune learning it now

just don't become dependent on it. it's a tool for visualization and exploration. if it becomes the only way you write code, you're gonna have a bad time in the future.

green rune
agile cobalt
#

Jupyter notebooks are fine-ish for reporting if you use it effectively (i.e., use markdown cells to document it well and clean up the code) then generate a PDF, or just use Powerpoint

green rune
agile cobalt
#

usually not screenshots if possible

#

most libraries will have ways of outputting to a file

green rune
agile cobalt
#

that's not really the point

#

the issue about Jupyter is that the code tends to get messy, not as easy to isolate, and the global state is just a mess with things from even deleted cells still existing.
It is fine for data exploration and reporting though

#

it is bad if you do not organise your code or if you end up with something you cannot reproduce later

green rune
#

ah okay I think I have a better idea of why now, and I see what you mean whenever I want to make outside changes you have to go back through it which can be tedious

agile cobalt
#

Just to make sure - What exactly do you mean by that? (outside changes & going back through it)

green rune
#

like if i have a csv and i want to change it, i cant just make changes and go back and continue working I have to rerun sections of the notebook that would create a df to make sure my changes were made

agile cobalt
#

you really shouldn't simply "change it" in the middle of the process like that

serene scaffold
#

A good rule of thumb for notebooks is that if some output needs to be reproducible, it should be possible to obtain it only by running each cell once in order.

agile cobalt
#

changing your data is not something you should do often to the point of it being a concern, but if you do change anything about the source you should definitely shutdown/restart the kernel and rerun all

green rune
#

thank you again

hoary rover
#

Just to add in as well, though the point has probably already been made: the Jupyter kernel keeps the global state of everything you've run, which can get tedious at times. That's why it's often better and easier to write from scratch in a script.

lapis sequoia
#

Hi what's the best method to approach this question? logistic regression maybe?

hoary rover
#

Yes. Each variable is categorical and it has <20 elements.

lapis sequoia
lapis sequoia
hoary rover
#

Yes. Make sure you diagnose gauss.

indigo garnet
#

what is the 4 variable version of this class?

lapis sequoia
indigo garnet
lapis sequoia
indigo garnet
lapis sequoia
hoary rover
#

Normality. Plus do all the other assumptions (independence, linearity etc)

misty flint
#

ugh

#

cant get dask to cooperate

#

tragic

quartz fable
#

Hello, I have a problem to solve using Python. I need to build a school timetable, and I'm thinking of using deep learning or something like that. Could anyone here tell me the best model to choose for this problem? Neural network, NLP, random forest...? I have some rules, like whether a teacher can give classes in the morning or at night, and other rules.

serene scaffold
quartz fable
quartz fable
serene scaffold
#

one uses AI when it's not possible to write a program that can always solve the problem. AI programs attempt to approximate human judgement

#

if there's an exact series of steps that is guaranteed to produce the correct result for something, do that.
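The sketch below shows what that "exact series of steps" can look like for the timetable question. It is a small backtracking search that tries each slot for each lesson and undoes a choice when it leads to a dead end. The lessons, teachers, slots and availability rules are all made up. Real timetables with many rules are usually handed to a constraint solver, such as Google OR-Tools' CP-SAT, which performs this kind of search far more efficiently.

```python
# Hypothetical data: each lesson has a fixed teacher; each teacher has allowed slots.
SLOTS = ["Mon-AM", "Mon-PM", "Tue-AM", "Tue-PM"]
TEACHER = {
    "Math 1A": "Ana",
    "Math 1B": "Ana",
    "Physics 1A": "Ben",
    "History 1A": "Carla",
    "History 1B": "Carla",
}
AVAILABLE = {
    "Ana": {"Mon-AM", "Tue-AM"},    # morning-only teacher
    "Ben": set(SLOTS),
    "Carla": {"Mon-PM", "Tue-PM"},  # afternoon-only teacher
}


def is_valid(lesson, slot, schedule):
    """Check the rules for putting `lesson` in `slot`, given the lessons already placed."""
    teacher = TEACHER[lesson]
    if slot not in AVAILABLE[teacher]:
        return False
    # a teacher cannot teach two lessons in the same slot
    return all(
        not (TEACHER[other] == teacher and other_slot == slot)
        for other, other_slot in schedule.items()
    )


def solve(schedule=None):
    """Backtracking: place one lesson at a time, undo choices that lead to a dead end."""
    schedule = {} if schedule is None else schedule
    remaining = [lesson for lesson in TEACHER if lesson not in schedule]
    if not remaining:
        return dict(schedule)
    lesson = remaining[0]
    for slot in SLOTS:
        if is_valid(lesson, slot, schedule):
            schedule[lesson] = slot
            result = solve(schedule)
            if result is not None:
                return result
            del schedule[lesson]  # undo and try the next slot
    return None  # no valid slot for this lesson under the current choices


timetable = solve()
for lesson, slot in timetable.items():
    print(f"{slot}: {lesson} ({TEACHER[lesson]})")
```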

safe elk
misty flint
quartz fable
desert oar
willow karma
#

What is a single metric I can use to calculate correlation amongst multiple variables? A correlation matrix doesn't accomplish it because it reports multiple correlations

reef dock
#

Could someone explain to me what PR AUC is?

#

Or has any resources that could help me understand that metric better?
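PR AUC is the area under the precision-recall curve. You sweep the decision threshold and record two numbers at each step: precision (what fraction of predicted positives are real) and recall (what fraction of real positives were found). The area under the resulting curve summarizes the model. A random classifier's precision sits at the positive-class rate, which is one reason the metric is popular for imbalanced data. Two estimators are common:
- average precision, AP = Σ (R_n − R_{n−1}) · P_n
- the trapezoidal area under the curve

The two can differ slightly. The sketch below uses scikit-learn's metrics on a four-sample toy example.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

y_true = np.array([0, 0, 1, 1])             # actual labels
y_scores = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probability of class 1

# precision and recall at every threshold the scores allow
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

ap = average_precision_score(y_true, y_scores)  # step-wise sum of (R_n - R_{n-1}) * P_n
pr_auc = auc(recall, precision)                 # trapezoidal area under the curve

print(f"average precision = {ap:.4f}")  # 0.8333
print(f"trapezoidal PR AUC = {pr_auc:.4f}")
```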

serene scaffold
#

!mute 899605898639597568 "1 week" It seems your only interest in our community is as a place to post Medium articles from the same author. This is not reddit. Please take your promotion elsewhere.

arctic wedgeBOT
#

:incoming_envelope: :ok_hand: applied mute to @next phoenix until <t:1650946679:f> (6 days and 23 hours).

thorn venture
#

I have a band column in a DataFrame. There are only 5 distinct values spread throughout the 100-row DataFrame. I want to reduce the DataFrame to 5 rows: all the data for rows with the same band value should be combined into a single row, so there will be one row per band. Can anyone please help me?

tough frigate
#

Use pivot table or groupby function
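The sketch below follows the groupby suggestion on made-up data, since the real column names weren't shared. Grouping by `band` gives one row per band. The aggregation decides how the other columns are combined: either collect each band's values into a list, or reduce them with something like `sum`. `pivot_table(index="band", aggfunc=...)` gives the same shape.

```python
import pandas as pd

# made-up data: a "band" column with a few repeated values
df = pd.DataFrame({
    "band": ["A", "B", "A", "C", "B", "A"],
    "value": [1, 2, 3, 4, 5, 6],
    "label": ["x", "y", "z", "x", "y", "z"],
})

as_lists = df.groupby("band").agg(list)       # one row per band, values kept as lists
totals = df.groupby("band")["value"].sum()    # one number per band
print(as_lists)
print(totals)
```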

quartz fable
jaunty mural
#

hi there, need a help, can't plot the graphs like in excel

#

```py
#%%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#%%
path = r"C:\Users\Nikita\Documents\MEGAsync\current_test\Outline Proposed\night_minds\data_velocities_impr.xlsx"
df_new_1 = pd.read_excel(path, index_col=None, header=None, sheet_name="Dynamic velocities_1")
df_new_2 = pd.read_excel(path, index_col=None, header=None, sheet_name="Dynamic velocities_2")

#%%
# drop the first row and column, taking each slice from its own sheet
z_1 = df_new_1.iloc[1:, 1:len(df_new_1.columns)]
z_2 = df_new_2.iloc[1:, 1:len(df_new_2.columns)]

#%%
# column labels 1..19 -> 0.1, 0.15, ..., 1.0
col_map = {i: round(0.05 * (i + 1), 2) for i in range(1, 20)}
# row labels 1..15 -> 0.1, 0.15, ..., 0.8
row_map = {i: round(0.05 * (i + 1), 2) for i in range(1, 16)}

z_1 = z_1.rename(columns=col_map).rename(index=row_map)
z_2 = z_2.rename(columns=col_map).rename(index=row_map)
```

#

```py
#%%
sns.set(style="whitegrid")

f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 9), dpi=160)

x_1 = z_1.index
y_1 = z_1.columns
Y_1, X_1 = np.meshgrid(y_1, x_1)

ax1.plot(X_1, Y_1, ".-", label="")
ax1.legend(loc="upper right")
ax1.set(xlabel=r"$d_n$",
        ylabel=r"$\rho_{m}^{-} \cdot 10^7$ (Ω$\cdot$m)")
ax1.grid(True, which='major', color='#666666', linestyle='-', alpha=0.7)
ax1.minorticks_on()
```

#

here's the result in python

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

jaunty mural
#

where x is column name and y is the data in the row

#

@true elk i want a scatter plot

serene scaffold
jaunty mural
#

how to select every time each row separated?
for instance i need a first row, I use head(1) but then I need only 2nd row

arctic wedgeBOT
jaunty mural
#

that's how i plot the first row of my table

#

here's the result

#

but how to plot the next line and next, and ... etc
this command didn't work for next lines
df_new_1.head(1), df_new_1.iloc[1, :]

serene scaffold
#

@jaunty mural you can use df.iterrows, I guess.

#

when you do df.head(n), you get the first n rows, so it's not a way to select a specific row.
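The difference is shown below on a small made-up table. `head(n)` returns the first n rows as a DataFrame, `iloc[i]` returns exactly the row at position i (counting from 0) as a Series, and `iterrows()` walks through the rows one at a time.

```python
import pandas as pd

df = pd.DataFrame({0.1: [1.0, 2.0, 3.0], 0.15: [4.0, 5.0, 6.0]})  # made-up data

first_rows = df.head(2)  # DataFrame with the first 2 rows
second_row = df.iloc[1]  # Series with only the 2nd row

for label, row in df.iterrows():  # one (index label, row Series) pair per row
    print(label, row.tolist())
```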

jaunty mural
serene scaffold
jaunty mural
serene scaffold
jaunty mural
jaunty mural
#

i have managed it!!!!

```py
plt.scatter(df_new.iloc[i].index, df_new.iloc[i, :])
plt.show()
```
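That `iloc[i]` pattern extends to every row with a loop. Each row is a Series indexed by the column labels, so the column labels become the x values and the row's data become the y values, as asked. The sketch below uses a made-up table and draws all rows on one Axes with a legend.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# made-up table: column labels are the x values, each row is one series
df_new = pd.DataFrame(
    [[1.0, 1.5, 2.1], [0.8, 1.2, 1.9], [0.5, 0.9, 1.4]],
    columns=[0.1, 0.15, 0.2],
)

fig, ax = plt.subplots()
for i in range(len(df_new)):
    row = df_new.iloc[i]  # i-th row, indexed by the column labels
    ax.scatter(row.index, row.values, label=f"row {i}")
ax.legend()
```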

lapis sequoia
rose agate
#

Is there any way to get interactive outputs like this one in Jupyter in Spyder?

jaunty mural
#

damn it, i don't understand why the second ax2 is bigger than ax1

ionic beacon
#
< hii >
quartz fable
karmic valley
#

hey trying to make loop

#
import skimage.io as io
image1 = io.imread(r"C:\Users\samay\part1.png")
image2 = io.imread(r"C:\Users\samay\part2.png")
images = [image1,image2]

for image in images:
    print(image[image[..., -1] != 0][...,0:-1].mean())
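The expression inside that loop keeps only the pixels whose alpha channel (the last one) is non-zero, then averages their RGB values. The sketch below checks it on a tiny made-up RGBA array. It assumes the PNGs really have four channels. On a three-channel RGB image, `[..., -1]` would select the blue channel instead of alpha.

```python
import numpy as np

# made-up 2x2 RGBA image; alpha 0 in the last channel means fully transparent
image = np.array([
    [[255, 255, 255, 255], [0, 0, 0, 0]],
    [[100, 100, 100, 255], [0, 0, 0, 0]],
], dtype=np.uint8)

opaque = image[..., -1] != 0              # boolean mask, shape (2, 2)
rgb_of_opaque = image[opaque][..., 0:-1]  # shape (n_opaque_pixels, 3): RGB only
print(rgb_of_opaque.mean())               # (255*3 + 100*3) / 6 = 177.5
```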
#

at the moment i have to write every image

#

they are all called part1, part2, part3, etc

#

i want to do it automatically

spare briar
#

iterate over the directory

#

over files in C:\Users\samay\

#

easiest way would be using list comprehension

images = [read(file) for file in directory if file is pngfile]
#

@karmic valley

serene scaffold
candid pollen
spare briar
#

good luck i wrote pseudocode but you should be able to replace with python functions

karmic valley
#

do i have to specify the filepath first like stop at this C:\Users\samay\

spare briar
#

python standard library has a module called os

#

it gives functions for iterating over a directory

#

like os.listdir

karmic valley
#

ah okay i will try look that up, im still newbie

spare briar
#

no problem

karmic valley
#

and i replace that with file directory

spare briar
#

you can replace lines 2-4 with the list comprehension

#

and it will work for a directory full of as many .png files as you want

#

put them all in a list

karmic valley
#

oh so they are 2 separate lines of code. first read the file directory, then do the list comprehension

spare briar
#

nope you are reading in the list comprehension

#

images = [io.imread(file) for file in os.listdir("C:\Users\samay") if file.endswith(".png")]

karmic valley
spare briar
#

do you understand this list comprehension syntax

#

it is very powerful

karmic valley
#

yeah what you wrote seems to make sense logically just didnt know how to do before

spare briar
#

yup just try to remember the pattern, it is very useful

karmic valley
#

import skimage.io as io
images = [io.imread(file) for file in os.listdir("C:\Users\samay") if file.endswith(".png")]

#for image in images:
print(image[image[..., -1] != 0][...,0:-1].mean())
#

do i still need for loop

#

or shall i put for loop on 2nd line

spare briar
#

well images is a list

#

now you want to do something with each object in the list

#

so you need to iterate over images

karmic valley
#

oh so your line of code adds them all to list?

spare briar
#

right my line reads every .png file in C:\Users\samay into a list called images

karmic valley
#

import skimage.io as io
images = [io.imread(file) for file in os.listdir("C:\Users\samay") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())

spare briar
#

yeah that should work

karmic valley
#

so this should read the list right?

#

nice!

spare briar
#

need to import os too

karmic valley
#

thank you!

#

oh yeah

#

adding stuff to list like this so powerful true! way easier than doing manually

spare briar
#

you can also do it with dictionaries

#

{key: val for key, val in zip(list1, list2)}

karmic valley
#

oh nice, will search that up too

serene scaffold
#

though that happens to work out to be the same as dict(zip(list1, list2))
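As pointed out, the comprehension and `dict(zip(...))` build the same dictionary. The comprehension form pays off once you transform or filter entries while building them. A short illustration with made-up lists:

```python
keys = ["a", "b", "c"]
values = [1, 2, 3]

by_comprehension = {key: val for key, val in zip(keys, values)}
by_constructor = dict(zip(keys, values))
print(by_comprehension == by_constructor)  # True

# transform and filter while building: squares of the odd values only
squares_of_odd = {key: val ** 2 for key, val in zip(keys, values) if val % 2}
print(squares_of_odd)  # {'a': 1, 'c': 9}
```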

spare briar
#

I'm just trying to teach him that this is a generic pattern

serene scaffold
#

right

karmic valley
#

to learn more about this what can i search up exactly

spare briar
#

these are called comprehensions

#

a related concept is generators

serene scaffold
#

generator expressions are related, but generators in general are not.
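The distinction in concrete form:
- A list comprehension builds the whole list immediately.
- A generator expression uses the same syntax in parentheses and produces values lazily.
- A generator function is any function that uses `yield`, whether or not comprehension syntax is involved.

```python
squares_list = [n * n for n in range(5)]  # list comprehension: built right away
squares_gen = (n * n for n in range(5))   # generator expression: computed on demand


def squares(limit):                       # generator function: any function using yield
    for n in range(limit):
        yield n * n


print(squares_list)                  # [0, 1, 4, 9, 16]
print(list(squares_gen))             # [0, 1, 4, 9, 16]
print(list(squares(5)))              # [0, 1, 4, 9, 16]
print(sum(n * n for n in range(5)))  # 30, without building a list
```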

karmic valley
#

ah got you

quartz fable
karmic valley
#
images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

how come words after the if are in green

#

did i type wrong?

median moat
#

That is just the formatting that is used when you do

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

karmic valley
#

oh

#

got this error
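There are three likely sources of trouble in this thread's code:
- `"C:\Users\samay"` without the `r` prefix is a SyntaxError, because `\U` starts a Unicode escape.
- A raw string cannot end in a single backslash. In `r"...\raw-image\"` the final backslash escapes the closing quote, which is why the rest of the line turned green.
- `os.listdir` returns bare file names, so `io.imread(file)` only finds them if Python is already running inside that folder.

The sketch below builds full paths with `pathlib` and demonstrates on a throwaway folder. `png_paths` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def png_paths(folder):
    """Full paths of the .png files in `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.png"))


# demo on a throwaway folder with two PNG names and one other file
with tempfile.TemporaryDirectory() as tmp:
    for name in ["part1.png", "part2.png", "notes.txt"]:
        Path(tmp, name).touch()
    found = [p.name for p in png_paths(tmp)]
    # os.listdir equivalent: join each bare name with the folder
    found_os = sorted(os.path.basename(os.path.join(tmp, name))
                      for name in os.listdir(tmp) if name.endswith(".png"))

print(found)  # ['part1.png', 'part2.png']

# with the real folder (raw string, no trailing backslash):
# images = [io.imread(path) for path in png_paths(r"C:\Users\samay\out\test\raw-image")]
```

Name sorting puts `part10.png` before `part2.png`. If the order matters, sort on the number in the name instead.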

true elk
#

I've never used ML before, I think I have a good opportunity to use it in a project. Is 41 nodes in the output layer viable?

desert oar
true elk
#

It's kind of OCR but with only 41 possibilities

#

I have 30 hours of video to analyse; I can skip some frames (I only need one reading per second)

#

And the number only ranges from -20° to +20°

#

I thought about doing manual detect on the pixels but the background is changing too much

#

It's not too hard for the data entry but I'm on a Mac so no CUDA/big GPU

#

and as it would be my first project, I don't want to get into a problem that might not be solvable with my current setup/knowledge

compact rose
#

So guys, I have a doubt about machine learning. I am currently building a model based on a dataset about music popularity. The idea is that the company gets a model that can predict the popularity of each song. But now I have a doubt about data preparation: should I delete the songs with low popularity, or should I keep them? As shown in the screenshot, the music popularity column goes from 0 to 100, so should I delete values above 60/70, or are they good to train the model?

true elk
misty flint
#

what are you trying to do? there are a few out of the box solutions that could just detect those numbers if needed

#

i would highly recommend doing some image processing however first

#

to increase accuracy

#

before feeding in frames from the videos

#

also i am not a Computer Vision guy so i will defer to someone with more expertise

true elk
#

As I have a lot of data, and data entry is not hard, I thought going directly to ML would be a better option

true elk
mild dirge
indigo garnet
#

how do I decide the output size for a convolutional 2D net?

true elk
mild dirge
#

It's not that complicated, a mask is just checking for each pixel if it falls in a certain color range

#

and then setting that pixel to 1, if it is in that range, otherwise 0

#

So you'll get an image the size of your original image with only 1's and 0's
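Here is a minimal numpy version of the mask described above. It assumes an `(H, W, 3)` image and per-channel lower/upper bounds; the function name and the bounds are illustrative. OpenCV's `cv2.inRange` does the same check but writes 255 instead of 1. Note that OpenCV loads images in BGR order. Bounds for "blue-ish" are often easier to choose after converting to HSV with `cv2.cvtColor`.

```python
import numpy as np

def colour_mask(image, lower, upper):
    """1 where every channel of a pixel lies within [lower, upper], else 0."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = np.all((image >= lower) & (image <= upper), axis=-1)   # shape (H, W)
    return inside.astype(np.uint8)

# Made-up 1x2 image: one blue pixel, one red pixel (RGB order assumed)
img = np.array([[[0, 0, 255], [255, 0, 0]]], dtype=np.uint8)
mask = colour_mask(img, lower=(0, 0, 200), upper=(80, 80, 255))
```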

true elk
#

Like a threshold with upper and lower boundaries? Better to do it in the RGB mode?

misty flint
#

using matlab image functions or opencv for image processing

true elk
mild dirge
#

If you make the color range around the blue color of your display, then you will at least have filtered out the important stuff (hopefully)

misty flint
#

thats a good initial approach

true elk
#

And then manually doing the detection of the 7 segment digits on some specific pixel location?

#

I think I've done something with 7 seg in AoC this year 😄

mild dirge
#

Not really sure what would be the best way of classifying it

#

Just making a classifier with an output for each possible outcome seems naive, since a lot of outcomes will look very similar (e.g. -19 and 19)

true elk
#

and maybe just discard the frames where there is too much blue

mild dirge
true elk
mild dirge
#

If you remove all the blue-ish images, your model will surely perform badly on those

true elk
#

the only thing is that I need high confidence

mild dirge
#

I get that you have a lot of training data, but if you want it to work for those blue-ish images, you want to use those for training too

#

Removing "difficult to classify images" is a bad idea is what I'm trying to say

#

Unless you know your real data is not gonna contain any of those

karmic valley
#

help

#

import skimage.io as io
import os
images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())
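For reference, here is what that last line computes, moved into a function and run on a tiny made-up RGBA array. This assumes the PNGs have an alpha channel as their last channel. Calling `mean()` with no axis (as in the original) returns one number over all colour values; `axis=0` gives one mean per channel instead.

```python
import numpy as np

def mean_opaque_colour(image):
    """Per-channel mean colour of the pixels whose alpha is non-zero (RGBA assumed)."""
    opaque = image[..., -1] != 0     # boolean mask over pixels, shape (H, W)
    rgb = image[opaque][..., :-1]    # shape (N, 3): colour channels of opaque pixels only
    return rgb.mean(axis=0)          # drop axis=0 to get one overall number, as in the original

# Tiny made-up 2x2 RGBA image: one red and one blue opaque pixel, two transparent ones
img = np.zeros((2, 2, 4), dtype=np.uint8)
img[0, 0] = [255, 0, 0, 255]
img[0, 1] = [0, 0, 255, 255]
print(mean_opaque_colour(img))       # red and blue average to 127.5 each, green stays 0
```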
desert oar
karmic valley
#

I'm getting an error

#
  File "C:\Users\samay\Dropbox\Average pixel colour.py", line 5
    images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]
                                                                                                                     ^
SyntaxError: EOL while scanning string literal
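The sketch below addresses two problems in that line. First, a raw string cannot end with a single backslash. The backslash still escapes the closing quote, so the string runs to the end of the line, which is also why everything after it showed up green. Second, `os.listdir` returns bare file names, so each name has to be joined with the folder before `io.imread` can find it. The helper name is made up, and the skimage call is left commented out.

```python
import os

# r"...\" does not work: even in a raw string, a backslash escapes the closing quote,
# so the string never ends (hence "EOL while scanning string literal").
folder = r"C:\Users\samay\out\test\raw-image"   # no trailing backslash

def png_paths(folder):
    """Full paths of the .png files in `folder` (os.listdir only returns bare names)."""
    return [os.path.join(folder, name) for name in os.listdir(folder) if name.endswith(".png")]

# images = [io.imread(path) for path in png_paths(folder)]
```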

true elk
#

so just to check if I understood correctly: doing blue mask on data + doing ML anyway (and keeping difficult images)?

mild dirge
#

the blue mask is to make the images easier for whatever model you are planning to use

#

Since we already know the display is going to be blue

true elk
mild dirge
karmic valley
#

but missing some syntax

#

cant figure out

true elk
#

Thanks for the help guys! Let's try to code this 😄

karmic valley
#

hey anyone know what wrong with this:

import skimage.io as io
import os
images = [io.imread(file) for file in os.listdir(r"C:\Users\samay\out\test\raw-image\") if file.endswith(".png")]

for image in images:
  print(image[image[..., -1] != 0][...,0:-1].mean())
mild dirge
#

twice in here and opened a help channel, try waiting for a reply pls

karmic valley
#

thought noone saw sorry

#

cant figure it out

frozen marten
#

In a residual plot, how can the residuals show both a normal distribution and constant variance (homoscedasticity) in the graph?
(w.r.t. the Ordinary Least Squares assumptions)

bold timber
#

What is the type of distance if I use p = 1.5?

#

I know p=1 is Manhattan and p=2 is Euclidean, but what is the distance if p=1.5?

mild dirge
bold timber
#

Is p=1.5 some kind of average of the two distances?

misty flint
#

p=1

#

p=2

#

now imagine a curve connecting the green and red blocks

#

but right between the previous two trajectories. and thats how you can visualize p=1.5

#

dont think it has a specific name. i think most just refer to it as a minkowski distance but with p=1.5
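The general formula is $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$. Here is a small sketch of it; for real use, `scipy.spatial.distance.minkowski(u, v, p)` computes the same thing. For the points (0, 0) and (3, 4) it gives 7 for p=1, 5 for p=2, and about 5.58 for p=1.5. So p=1.5 lands between Manhattan and Euclidean, but it is not their average (which would be 6).

```python
import numpy as np

def minkowski(a, b, p):
    """(sum |a_i - b_i|^p)^(1/p): p=1 is Manhattan, p=2 is Euclidean."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float((diff ** p).sum() ** (1.0 / p))

for p in (1, 1.5, 2):
    print(p, minkowski([0, 0], [3, 4], p))
```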

river sierra
# bold timber I know p=1 is manhattan and p=2 is eucliden, but what is the distance if p=1.5?

For a more mathematical/theoretical take, you should look into Lp spaces. Here is a good article on them: https://en.wikipedia.org/wiki/Lp_space?wprov=sfti1

In mathematics, the Lp spaces are function spaces defined using a natural generalization of the p-norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue (Dunford & Schwartz 1958, III.3), although according to the Bourbaki group (Bourbaki 1987) they were first introduced by Frigyes Riesz (...

bold timber
river sierra
#

every vector from the origin to the unit circle has a length of one, the length being calculated with length-formula of the corresponding p

bold timber
#

ok thank you for the explanation

trail horizon
#

hi channel

#

If i have a question regarding file names in a directory, where should I go ?

serene scaffold
molten gust
#

I am currently in a course for data science, that is going to take 3.5 months.

As it is of utmost importance for me to journey with a high learning curve, it is of essence to communicate.

What literature would be very convenient to go through?

I want to get very competent, as fast as I can. It is important to be playful and accumulate experience, so to program a lot on basics and then on different projects that are challenging regarding a solid structure and a diverse set of functions and styles to write code, is a no brainer.

I have a need to be efficient in learning and practice.

Would very much appreciate every constructive advice and suggestion.

To get from zero -> A.I. Developer

Also working on a degree in physics at the same time.

median moat
molten gust
#

Thank you for your feedback, I am already on that, but I do not have the time to go about it the average way, on the way I will sort out by myself what will be of importance and what not, but someone with loads of experience can give me some more insights on what to focus more and on what to focus less.

mild dirge
#

you want to learn how AI works, or data science in general or?

mint palm
#

I know some basic types of data cleaning we do for anomalous data... but when it comes to data such as face motion, heart rate pulse, etc., what kind of cleaning is done? I mean, what do we do?

mild dirge
#

And there aren't clear shortcuts, if you want to do data science in Python, you need to know python

#

also the more basic stuff like data structures/functions/classes etc.

molten gust
#

Yeah I am doing a python - Data science course -> MYSQL / NOSQL -> AI / ML

#

But the average way is not working by itself, I don't have the time to idle through this. I need to push it.

So I just do everything that is of essence for everyone but with a faster pace? So I need to adjust my learning speed, my reading speed and processing speed like in every other discipline, I guess.

mild dirge
#

I think the expectation might be a bit too high

#

How much experience do you have with python?

#

or coding in general?

river sierra
molten gust
river sierra
#

You should be good on the theory then.

#

Either way, to get your desired speed, you’ll need to read and write a lot of code

#

That’s the main way to learn

#

Especially at your pace

molten gust
#

Thank you very much @river sierra

sick fjord
#

is there anyone here good with tensorflow?

#

I would like to ask a few questions to see if what I want to do is viable

wheat ice
#

that url is blocked

#

rather, just say "please ask"

earnest abyss
#

still, that page makes an excellent point.

sick fjord
#

Let me rephrase.. is there anyone I can DM to ask questions

earnest abyss
#

still, same problem. Could you give a general idea of what you want here? I have some general idea of machine learning, I dabbled in this stuff a while ago, but I might have some idea of what might be possible. However, if you're asking about specifics, I've got absolutely no clue.

#

if you think it would contravene the rules of this server, you'd be better off not asking at all

sick fjord
#

sure. i have a 2-d histogram that has two "tails" of data. i would like to build a model that can essentially distinguish between the two with a level of confidence.

earnest abyss
#

sorry for the crappy ms paint picture, art isn't exactly my forte

sick fjord
#

here you go

earnest abyss
#

so you want to distinguish between the upper and lower tails?
What would stop you from using traditional programming, what necessitates machine learning?

sick fjord
#

I can and have already. I would like to compare the two and for practice

earnest abyss
#

do you have thousands of different input datasets with two tails? Is that feasible?

#

I really don't know how that sort of thing would be done, it's not like the sort of optimization problem that machine learning is really good at

sick fjord
#

It's been done before using svm
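Suppose you have the raw (x, y) points behind the 2-D histogram, each labelled with the tail it belongs to. An SVM with probability estimates is then one way to get a per-point confidence. Below is a sketch on synthetic data: two made-up elongated clusters stand in for the tails. `SVC(probability=True)` fits Platt scaling internally, so `predict_proba` is available.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two made-up elongated clusters standing in for the upper and lower "tails"
upper = rng.normal(size=(200, 2)) * [3.0, 0.3] + [0.0, 2.0]
lower = rng.normal(size=(200, 2)) * [3.0, 0.3] + [0.0, -2.0]
X = np.vstack([upper, lower])
y = np.array([1] * 200 + [0] * 200)        # 1 = upper tail, 0 = lower tail

clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

# Columns of predict_proba follow clf.classes_, here [0, 1]
proba = clf.predict_proba([[0.0, 2.0], [0.0, -2.0]])
print(proba.round(3))
```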

earnest abyss
#

What would be the optimal machine learning strategy to detect and remove specific sounds from an input audio source?

sick fjord
#

im going to throw a wild guess and say fourier transform

earnest abyss
#

I looked into that, and my intended use case, removing laugh tracks, wouldn't work with the fourier transform because the frequencies of a laugh track are so similar to the frequencies of everyday speech

desert oar
#

a good foundation in the basics is more valuable than a scattered sample of a lot of advanced things

pseudo wren
#

I need to practice predictive modeling with linear regression

#

How can I do this

#

What are good resources

desert oar
#

it's like the titanic dataset, but for regression instead of classification

pseudo wren
#

Hmmm okay

#

Now the actual linear regression has a built in

#

And the predictive modeling built in as well
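Here is a minimal scikit-learn workflow for practising: split, fit, predict, score. The data is synthetic (a made-up linear relationship plus noise), so swap in a real regression dataset once the steps feel familiar.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))                     # one made-up feature
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)    # y = 3x + 5 + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```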

desert oar
#

i don't know what you mean by that, sorry

pseudo wren
desert oar
molten gust
molten gust
indigo garnet
#

how to calculate the output channels for conv2d layer, is it a random value or is there anyway to get the number for it?
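The number of output channels is not computed. It is a hyperparameter you choose: one channel per learned filter. Powers of two like 16, 32 and 64 are a common habit, not a rule. What *is* fixed by a formula is the spatial size of the output. The sketch below implements the formula given in the PyTorch `Conv2d` docs, assuming a square kernel and the same stride/padding in both directions.

```python
def conv2d_output_size(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output (height, width) of a 2-D convolution, per the formula in PyTorch's Conv2d docs.

    Assumes a square kernel and the same stride/padding/dilation along both axes.
    """
    def one_axis(n):
        return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return one_axis(h_in), one_axis(w_in)

print(conv2d_output_size(28, 28, kernel_size=3))              # (26, 26)
print(conv2d_output_size(28, 28, kernel_size=3, padding=1))   # (28, 28)
print(conv2d_output_size(32, 32, kernel_size=5, stride=2))    # (14, 14)
```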

misty flint
#

i would say after you understand the tooling, try and use it on a new and different dataset

bold timber
#

Can anyone explain what kind of error this is: "ERROR! Session/line number was not unique in database. History logging moved to new session 12088"

#

I got it when I used Bayesian search for hyperparameter tuning

safe elk
#

Ah if you ask the answer I think is it depends on your interests and your skills

tacit basin
#

you can try kaggle comps, datasets for example

iron basalt
#

Implement a Tsetlin Machine.

mint palm
#

Does data such as temperature readings and ECG need filtering?
What kind of filtering do these need?
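The answer depends on the signal and what you need from it. As a sketch, ECG is often band-pass filtered; a band of roughly 0.5 to 40 Hz is a common choice, but treat those cut-offs and the 250 Hz sampling rate below as assumptions. The low cut removes baseline wander and the high cut removes high-frequency noise. Slow signals such as temperature are often fine with a simple moving average or median filter. The example uses SciPy's Butterworth design with `filtfilt` (forward and backward filtering), so the output has no phase shift.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low=0.5, high=40.0, order=2):
    """Zero-phase Butterworth band-pass filter; the default band is an assumed ECG range."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)   # forward + backward pass, so no phase shift

fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
raw = 5.0 + np.sin(2 * np.pi * 10 * t)       # 10 Hz component sitting on a DC offset
clean = bandpass(raw, fs)                    # offset removed, 10 Hz component kept
```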

soft lance
#

Hello, I need your help with pytorch.DataLoader. I want it to sample images from my custom Coco_Dataset_Manager, a batch of 3 images per iteration. Images in Coco have different sizes. Let's say DataLoader sampled 3 images with sizes (100, 100), (100, 100), (400, 400). I want to force it to cache these images in a "waiting room" and sample some more. When a waiting room of a particular size has 3 or more images, I want DataLoader to put the batch through collate_fn and return it.

#

How can I program such behaviour?

#

I'm thinking about writing a custom Sampler.
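A custom sampler is a reasonable approach. Below is a sketch of a *batch* sampler, which you pass as `batch_sampler=` to `DataLoader`; that argument accepts any iterable that yields lists of indices. The sketch assumes image sizes are known up front, for example from the `width`/`height` fields in the COCO annotations, so no image has to be decoded just to bucket it. The class name and the `sizes` argument are made up for this example.

```python
import random
from collections import defaultdict

class SameSizeBatchSampler:
    """Yields lists of dataset indices whose images all share one size.

    `sizes[i]` is assumed to be the (height, width) of item i.
    """

    def __init__(self, sizes, batch_size=3, shuffle=True, seed=None):
        self.sizes = list(sizes)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        order = list(range(len(self.sizes)))
        if self.shuffle:
            self.rng.shuffle(order)
        waiting = defaultdict(list)             # the "waiting room": one list per size
        for idx in order:
            room = waiting[self.sizes[idx]]
            room.append(idx)
            if len(room) == self.batch_size:    # room is full: emit it as a batch
                yield list(room)
                room.clear()
        # leftovers (fewer than batch_size of some size) are dropped, like drop_last=True

    def __len__(self):
        counts = defaultdict(int)
        for size in self.sizes:
            counts[size] += 1
        return sum(c // self.batch_size for c in counts.values())
```

Usage would look like `DataLoader(dataset, batch_sampler=SameSizeBatchSampler(sizes, batch_size=3), collate_fn=my_collate)`, where `my_collate` stands in for your own collate function. Because every batch holds a single size, the collate function can simply stack the tensors.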

devout sail
#

Trying to understand how much wiggle room you have

compact rose
#

So guys, i have a doubt about machine learning. I am currently building a model based on a dataset about music popularity. The idea for model is that the company gets a model that can predict the popularity of each song. But now i have a doubt about data preparation that is : Should i delete the musics with low popularity or should i keep it? As shown in the screenshot, the column of music popularity goes from 0 to 100, so should i delete values above 60/70 or are they good to train the model?

small orbit
devout sail
#

Also, a high value means unpopular?

#

Either way, I don't see why you would remove it

compact rose
#

means popular

#

Sorry, forgot to put it ahhaha

devout sail
compact rose
#

below 60/70* sorry

devout sail
#

yeah I'm not sure why that's necessary

#

If anything you want a good representation of the sample space

compact rose
#

I was in a paradox where i was thinking " Well, my target is predicting songs with high popularity, but should i delete low popularity? However, with low popularity, the model will also understand what are bad musics"

devout sail
#

yeah, basically your answer is that last sentence

#

Negative examples are important

#

If it only sees popular songs, then nothing stops it from learning to say that everything is popular

compact rose
#

True, thanks mate! Thank you for your time 🙂

devout sail
#

np

small orbit
#

anyone?

dense harbor
#

Hi everyone, I am currently solving the Titanic Disaster Problem on Kaggle.

I analysed the data, then built a machine learning model and got a score of around 0.78 (78% accuracy).

I have tried to improve my model's accuracy (around 30 times by now), with no luck. I tried almost every technique I could think of to improve my score (using machine learning), still no luck. Can you suggest any proven technique I could've missed? (Please don't send full code for the problem; I want to try on my own.)
Many thanks in advance ☺️

true elk
#

Hey guys, I'm back 😄

#

Do you think this image processing will be enough for classification? I'll try the MNIST dataset first to get some understanding on how to train a model

#

I've tried already a lot of things, I think HSV mask was my best option yet

soft lance
devout sail
#

you're downloading the images on the fly each time you're training?

true elk
#

Do you think I can get out of it with this data?

#

I need to classify those into -20 to 20, so 41 nodes output

#

I only need high confidence data, I can discard all the other ones

#

Yesterday, PC Camel suggested that I focus on processing first, then go for the MNIST example for ML

#

Is CNN the right path for this task?

true elk
#

😦

hazy knot
#

Anyone have any suggestions for storing different versions of multiple models?

bronze spire
#

I have finished the basics of Python and I've decided that I want to learn Data Science, where can I start from? Can someone give me a good source that's free?

true elk
#

while waiting for an answer, I'm really amused by the number of people requesting help with assignments or for bad purposes

warm oracle
#

A lot of people want easy/fast solutions after all lol

true elk
#

Is building my own ML with numpy a good idea to start with?

#

Thanks @serene scaffold 😄

warm oracle
#

I'm planning to start a career as a Machine Learning Researcher.
I've already gotten the math side (Linear Algebra, Calculus, Statistics and Probabilities) somewhat covered. And currently learning TensorFlow/Keras and planning to study Pytorch after.
Tool-wise, is there anything else I need to make sure I know?

desert oar
# true elk Is CNN the right path for this task?

i think it is the right task, but you have a lot of noise. if you can do pre-processing to increase the signal-to-noise ratio that would help. and you will have to accept that some instances are not going to give usable results, e.g. the blank ones

#

it looks like maybe 1/3 to 1/2 of these are unusable

#

i think a "good" model should give ~0 confidence on all categories for those unusable ones

#

that's going to be the hard part imo, not obtaining false positives on the junk images
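Softmax outputs always sum to 1, so a standard classifier cannot literally give ~0 on every category. One common workaround is to keep a frame only when the top probability clears a threshold, sketched below. Another is to add an explicit "unknown" class (a 42nd output) trained on the blank/junk frames. The 0.95 threshold is an arbitrary placeholder to tune on held-out data.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confident_predictions(logits, threshold=0.95):
    """Return (best class per frame, mask of frames confident enough to keep)."""
    probs = softmax(logits)
    best = probs.argmax(axis=-1)
    keep = probs.max(axis=-1) >= threshold
    return best, keep

logits = np.array([[10.0, 0.0, 0.0],    # one clear winner -> kept
                   [1.0, 1.0, 1.0]])    # uniform, no idea -> rejected
best, keep = confident_predictions(logits)
```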

#

what is this task, anyway? is this the thermometer in your car dashboard? some industrial process that is hooked up to an LCD but not a usable computer?

true elk
#

it's to detect the range of temperature in a game (NGL Biathlon)

#

Don't ask why 😄

#

Kidding

true elk
true elk
true elk
plush glacier
#

or use an already pretrained model

desert oar
#

you can try mnist pre-training. you'll want to convert rgb to b&w first though

#

depending on your time and resources, you could also take like 1000 pictures of "clean" output, artificially add noise, and use that as pre-training

#

if this were an industrial process maybe i'd suggest doing that, idk if it's worth it for a video game

#

i had no idea there was a biathlon video game..
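Below is a sketch of the two pre-processing ideas above, assuming RGB frames stored as `(H, W, 3)` uint8 arrays. The first function converts to grayscale (MNIST digits are single-channel) using the standard BT.601 luma weights. The second adds Gaussian noise to clean crops to produce synthetic pre-training data. The noise level `sigma` is an arbitrary placeholder.

```python
import numpy as np

def to_grayscale(rgb):
    """(H, W, 3) uint8 RGB -> (H, W) float grayscale in [0, 1], BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb[..., :3].astype(np.float64) @ weights) / 255.0

def add_noise(gray, sigma=0.1, rng=None):
    """Noisy copy of `gray` (Gaussian noise, clipped back to [0, 1]) for synthetic training data."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(gray + rng.normal(0.0, sigma, size=gray.shape), 0.0, 1.0)
```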

true elk
#

yeah I thought of doing just manual pixel detection as this is 7 segment digits

#

but also a good first project for ML

#

but maybe not the optimal first project

desert oar
#

no i think it's a great project

#

it's a relatively straightforward task but with lots of noise and a small amount of data

plush glacier
true elk
#

I thought of going on Numpy directly to get my hands dirty (and understanding better what I'm doing), and also because all the MNIST video/tutorial I found just import it easily. I can't really do that with my current data

desert oar
#

i'd try it both ways tbh. pre-train on mnist, and not-pretraining on mnist. i am skeptical that handwritten numbers will be good pre-training for 7-segment lcd

desert oar
plush glacier
#

but it is worth trying TensorFlow and PyTorch. i personally really want to learn PyTorch but don't have the time yet

true elk
#

From my understanding, pre-training wouldn't be beneficial in this case. The numbers are really in a fixed position!

desert oar
#

the reason you want pre-training is that you have a very small dataset here

#

even smaller when you consider the number of "usable" items

plush glacier
#

that is the entire dataset?

true elk
#

I have 30 hours of footage at 30 fps, it's not enough?

desert oar
#

oh

#

that's a lot

#

and they're all labeled? that is, you know the number for every frame?

plush glacier
true elk
#

This is where I'm at right now

true elk
plush glacier
#

are all numbers always the same at the same location?

true elk
desert oar
#

as a side note, this game looks too realistic, my heart rate is going up just watching gameplay footage 😆

true elk
#

That's why the fixed position pixel detection was my first idea

plush glacier
true elk
#

(+ I've done some 7 segment logic in AoC this year 😄 )

plush glacier
#

so basically the same font

desert oar
#

i assume it's the temperature in the top right corner here? https://www.youtube.com/watch?v=ZlJ6VPN8G0o

In this video we run all the 2022 Olympic biathlon races as Alexander Loginov in the game NGL Biathlon! Will we manage to win a medal?!

Download the game 1 - http://boosty.to/ngl_biathlon
Download the game 2 - http://patreon.com/biathlon
VKontakte group - http://vk.com/ngl_biathlon
Vasya's Instagram - https://www.instagram.com/vasya_ngl/
Game's Instagram -...

true elk
#

yep

desert oar
#

it looks like it's fixed position in a HUD, with varying backgrounds

true elk
#

btw the game is free 😄

#

(or you can donate ❤️ )

desert oar
#

gotta love open source gaming!

true elk
#

Please don't suggest training Tesseract, I literally had nightmares with it. EasyOCR saved my life

plush glacier
#

in that case you may not want to use ml

desert oar
#

i think they just want to do this as a toy project to learn

plush glacier
#

oh in that case use ml or multiple solutions

desert oar
#

i actually love this idea. easy enough to download gameplay footage and DIY it too

#

maybe i'll do it too 😛

#

i need hands-on practice w/ image deep learning i think

#

what's the state of the art for semi-supervised learning? i looked into it several years ago and it seemed like it was kind of a dead end

#

active learning might be a better choice perhaps, i've used that successfully for record linkage / deduplication projects

true elk
plush glacier
# desert oar i need hands-on practice w/ image deep learning i think

i'm trying to get some by downloading about 1600 images from Pexels, so all images are high-res, then taking a 256×256 tile at a random point of each image and using that to train a super-resolution model. I still need to make the discriminator, and I'm planning on doing that today (I'm not expecting great results; it only has about 50k to 300k params, depending on whether I use Conv2D or depthwise separable convolution)

true elk
desert oar
plush glacier
#

how often does the temperature change, i.e. how many frames are there between changes? And do you want the challenge of a multi-frame solution? Because that will be way harder

desert oar
#

i know nothing about ML on videos, i'm curious to know what the multi-frame solution is

#

is it a 3D CNN over fixed-length durations of video? like 5 frames at a time

true elk
#

so I need at least 1 good frame each 30

#

and it's okay if I can get 1 out of 100ish

#

I only need high confidence on the 1

#

Damn the more I think, the more I realise that image comparison would be much easier 😦

plush glacier
#

although you are also maybe able to average out like 3 frames and use that to get more clear data but that can result in that you are 2 frames off

true elk
#

someone told me yesterday not to discard bad frames, but he didn't know the whole context, so I'm a bit confused rn

plush glacier
#

no i mean that you have like 30 frames each sec so you could make 30 predictions on each sec but what if you make it 28 groups of 3

#

and you use the average of each one

true elk
#

damn this would be great for my image comparison solution! Get 30 frames of each second and doing kind of overlay/masking thing

#

like groups of 30

plush glacier
#

i would just use averages because the shape and color doesn't really change

#

but that way you get 3 values for each frame so if the output is 2,2,5 you can say it is most likely a 2
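The voting idea (2, 2, 5 → 2) can be sketched as below. It assumes you already have one prediction per frame and use `None` for frames the model rejected. It votes over whole 1-second windows rather than sliding groups of 3, since only one reading per second is needed. `fps` and `min_agreement` are assumptions to tune.

```python
from collections import Counter

def vote_per_second(frame_preds, fps=30, min_agreement=0.6):
    """Collapse per-frame predictions into one reading per second by majority vote.

    Returns one entry per (possibly partial) second: the most common prediction,
    or None if it wins less than `min_agreement` of that second's usable frames.
    """
    readings = []
    for start in range(0, len(frame_preds), fps):
        window = [p for p in frame_preds[start:start + fps] if p is not None]
        if not window:
            readings.append(None)
            continue
        value, count = Counter(window).most_common(1)[0]
        readings.append(value if count / len(window) >= min_agreement else None)
    return readings

print(vote_per_second([2, 2, 5], fps=3))   # [2]
```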

true elk
#

that would dilute my error rate

#

sooo.. where should I start 😄 ?

#

using my current processing on images, labelling a few, then throw it into keras/pytorch and pray for the best?

plush glacier
true elk
#

tbh I'm out of ideas on how to improve it 😦

plush glacier
#

also are the numbers slightly transparent? if not you can use only that single color of the letters

plush glacier
#

could you send like 2 images to me that aren't pre-processed 1 that gives bad results with your current method and 1 that gives good results

#

and with to me i mean in this chat

true elk
#

Good luck with that 2nd one 😄

plush glacier
#

only way i can think of is to subtract the value of the bottom-right-most pixel from every pixel and then increase the brightness a lot

true elk
#

I'm trying to np.mean some group of images, seem promising for now.

dapper dune
#

hey there! Can someone help me with NVIDIA DeepStream (Docker)?

plush glacier
#

although i would have to say it is impossible to get the second image because there is nothing there

#

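although a function like
```py
import numpy as np

def try_to_get_number(image):
    bottom_right_pixel = image[-1, -1].astype(np.int16)  # assume this pixel is background
    diff = image.astype(np.int16) - bottom_right_pixel   # int16 so the subtraction can't wrap
    return np.clip(diff, 0, 255).astype(np.uint8)
```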

#

i wouldn't change your current way

desert oar
#

i would also argue that some images are actually "unknown" and that a good model should indicate this

true elk
plush glacier
frigid elk
#

any mlops guys working on foundry (palantir)? .. looking for some best practices on workflow within that environment, ml pipeline in code repository specifically. how to keep code modular given the available toolset and proven libraries to provide reliable results while utilizing spark scalability and not wasting resources

jaunty belfry
#

can somebody tell me why there is no 1/(2m) in the cost function of L2 regularization, whereas it is present in linear regression?

lapis sequoia
#

for i in range(5):
    print('I am going to fail the AP test')

jolly stone
#

Can anyone give me a hint about a performant search algorithm? The search is to find a word

agile cobalt
#

kinda sus

long locust
#

Hello, please don't post unapproved advertising. Thanks

prime hearth
#

@jaunty belfry it's just a constant: 1/2 * 1/m

#

the 1/m is used for averaging

#

the 1/2 was put there to cancel the square power when we take the derivative

#

this doesn't change where the minimum is; it only scales the derivative

#

it's just scaling
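Here it is written out in the usual notation: hypothesis $h_\theta$, $m$ examples, $n$ features, regularization strength $\lambda$, and the intercept $\theta_0$ left unpenalized. The common convention puts the same $\frac{1}{2m}$ on both terms, and the $\frac{1}{2}$ cancels the 2 from differentiating the squares. Texts that drop the constant on the penalty are describing the same model with $\lambda$ rescaled. For example, scikit-learn's `Ridge` minimizes $\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$ with no $\frac{1}{2m}$ at all, which is the first line below multiplied by $2m$, with $\alpha = \lambda$.

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
          + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
  + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)

\arg\min_\theta\, c\,J(\theta) = \arg\min_\theta J(\theta) \quad \text{for any constant } c > 0
```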

small orbit
misty flint
#

has anyone used geopandas before?

#

do you recommend it

wheat hemlock
#

Hey everyone! Happy to be here. I've been doing data science with Python for about a year, and now I want to use Django to create an API for my website.

Some questions; I will be presenting stats on a website:
- Should I be updating the stats on the backend directly through the DB, or by using POST requests?
- MySQL as the backend of a stats website, or Postgres, or another DB?
- I will be doing historical analysis: can I do this through Django views, or should I do the analysis beforehand and just present the already updated information?

desert oar
desert oar
# wheat hemlock Hey Everyone! Happy to be here been doing data science with python for about a y...

postgres has the most features and is the easiest to administer imo. mysql i think has better scaling functionality but you don't need that.

Should I be updating the stats on the backend directly through the db or using post
you can write directly to the database, but if you go through your own api endpoints then you maybe have a "safer" interface, with fewer ways to make mistakes, but then you have to deal with authentication for a privileged user that has the ability to write to the db, which maybe is more complexity than you want in your website

I will be doing historical analysis
what is historical analysis?

modern cypress
#

Say I have a list like [London, Paris, Chicago], is it better to index these in my data like [0, 1, 2] or to turn them into bools and have a new column for each? so like is_London, is_Paris, is_Chicago?

#

Or does this not have any effect?

desert oar
#

are you talking about a series where each value is a list?

#

there's nothing inherently better about integers compared to strings. if anything, strings prevent you from making the mistake of putting your categorical-valued integers into a model, which will treat them incorrectly as continuous values

modern cypress
# desert oar neither? both? provide more context

I was looking back at some previous work, and some of the learning material I was provided and it said to change the categorical values in to indexes like that, and in other examples they were doing it the other way using pd.get_dummies(). So I was just unsure if one of the ways was better than the other

desert oar
#

ideally you'll use pd.Categorical for categorical data

wheat hemlock
desert oar
wheat hemlock
desert oar
#

no, sorry

#

i'd rather not give 1:1 private help

#

you might want to post these questions in #web-development or #databases , it sounds like the data science aspect of your project is unrelated to these questions

wheat hemlock
#

Ok any help on how I can accomplish the loading of the data to the DB in the correct way, I am currently using mysql.connector but worried about slugs and timestamps then how I can do the analysis of historical data on the backend? Thank you so much

modern cypress
desert oar
#

if your implementation requires "numbers", use LabelEncoder (scikit-learn intends it for target labels; OrdinalEncoder is the equivalent for feature columns), otherwise leave them as-is

#

keep in mind that pd.Categorical is backed by an integer array anyway
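Here is a small illustration of the two encodings discussed, on a made-up column. The first is integer codes via a categorical dtype (what `.cat.codes` exposes); the second is one-hot columns via `pd.get_dummies`. Integer codes imply an order and spacing (Chicago < London < Paris) that linear models will take literally, which is why one-hot is the usual choice for them. Tree-based models are generally less sensitive to this.

```python
import pandas as pd

cities = pd.Series(["London", "Paris", "Chicago", "Paris"])   # made-up column

as_cat = cities.astype("category")      # stored as integer codes + a lookup of categories
codes = as_cat.cat.codes                # Chicago=0, London=1, Paris=2 (categories are sorted)
one_hot = pd.get_dummies(cities, prefix="is")   # columns is_Chicago, is_London, is_Paris

print(codes.tolist())                   # [1, 2, 0, 2]
print(one_hot.columns.tolist())         # ['is_Chicago', 'is_London', 'is_Paris']
```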

modern cypress
#

Oh for real? Damn

#

But yeah, I had done it because it was in one of the lectures

#

It also said to try change values like "yes, no" or "Risk, NoRisk" to binary

pseudo wren
#

I’m going to be honest

#

Idk how you guys do it

#

I feel like my head is going to burst trying to figure all this stuff out

normal jay
#

how do you print the number of items from a column from an excel file? while also using np.unique

#

because the names on that column are repeated , so i need the number of names without it being repeated?
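Here is a sketch for counting the distinct names in one column. The file name and the column name `name` are placeholders, and a small DataFrame stands in for the Excel sheet so the example runs as-is.

```python
import numpy as np
import pandas as pd

# df = pd.read_excel("names.xlsx")   # hypothetical file name; .xlsx needs openpyxl installed
df = pd.DataFrame({"name": ["Ana", "Bo", "Ana", "Cy", "Bo"]})   # stand-in for the real sheet

unique_names = np.unique(df["name"])          # sorted array of distinct names
print(len(unique_names))                      # 3
print(df["name"].nunique())                   # 3, the same count straight from pandas

names, counts = np.unique(df["name"], return_counts=True)   # how often each name appears
```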

rose agate
ivory steppe
#

I had csv data involving the info of patients having mri scan in various regions and the population data. Now I wanted to have a predictive model which could predict the probability of the supply and demand with the population data so that the occurence of mri scans could be predicted in that region.
Can anyone help with approaching this problem, or suggest some material/work with a similar format and problem to look at?

small orbit
gleaming pulsar
#

does somebody know what this type of chart is called?

#

i want to implement it in a game which works in rounds

lapis sequoia
gleaming pulsar
#

i realised it is called a bump chart

lapis sequoia
#

Hi all, I would like to ask a question on how filters are activated
my resource is mainly deeplizard from YT for cnn
for a filter to be able to detect patterns in an image, they mentioned that the loss function between the input image and the output channel must be maximised
which is done with gradient ascent
I am guessing this is for training a cnn, would that be right?
my question is
if they are maximising the loss function for a filter
to be able to detect a feature from the image
won't it be easy to just have the filter (3x3 for this example) to just keep increasing the element values of the filter
like say that I have a filter, and it moves over a completely white image,
won't I be able to produce a filter that has a million for each value in the filter
and that would generate a very high value
and then when that filter moves over another part of the white image that may have a black spot, that black spot would be negligible
.
it was easy to understand back-propagation when we were trying to minimise the loss function, which brings the accuracy up
but I can't get over why the filter needs to have its loss function maximised; on top of that, maximising the loss function should be something that doesn't have an end, right?
another question: is it that we maximise the convolutional layer's loss function, and minimise the rest of the loss function of the CNN, in two separate training events?
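I read the video as describing activation maximization for *visualizing* filters; that is an assumption, not confirmed here. In that setting, gradient ascent is not how the CNN is trained. The trained filter is frozen, and the *input image* is adjusted to make that filter's activation as large as possible, while normal training still minimizes the loss. The blow-up worry is real, which is why the input is kept bounded (clipped to a pixel range or renormalized). Below is a toy numpy sketch with one hypothetical 3×3 vertical-edge filter and a unit-norm constraint on the input patch.

```python
import numpy as np

def maximize_activation(filt, steps=200, lr=0.1, seed=0):
    """Gradient ascent on an input patch to maximize a fixed filter's response sum(filt * patch).

    The filter is never updated. The patch is rescaled to unit L2 norm after every
    step, so the activation stays bounded instead of growing forever.
    """
    rng = np.random.default_rng(seed)
    patch = rng.normal(size=filt.shape)
    patch /= np.linalg.norm(patch)
    for _ in range(steps):
        grad = filt                        # d/d(patch) of sum(filt * patch) is just filt
        patch = patch + lr * grad          # ascent step on the input, not on the filter
        patch /= np.linalg.norm(patch)     # project back onto the unit sphere
    return patch

vertical_edge = np.array([[1.0, 0.0, -1.0]] * 3)   # hypothetical trained 3x3 filter
best_patch = maximize_activation(vertical_edge)
print(np.round(best_patch, 3))
print((vertical_edge * best_patch).sum())           # about sqrt(6) = ||filter||, not unbounded
```

Under the unit-norm constraint, the best patch is the filter's own pattern scaled to length 1, so the activation levels off at the filter's norm.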

cinder matrix
#

guys how do i get the accuracy of my model

#

and how do i evaluate it
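For a classifier, accuracy is the fraction of predictions that match the true labels, measured on data the model was not trained on. Below is a plain numpy sketch. `sklearn.metrics.accuracy_score(y_true, y_pred)` computes the same number, and a compiled Keras model reports its loss and compiled metrics via `model.evaluate(x_test, y_test)`.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (classification only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((y_true == y_pred).mean())

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```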

warm oracle
#

Man, google colab is high
I have this code

```py
def plot_pred(train_data=X_train,
              train_label=y_train,
              test_data=X_test,
              test_label=y_test,
              prediction=y_pred):
  plt.figure(figsize=(10, 7))
  plt.scatter(train_data, train_label, c='b', label="Training Data")
  plt.scatter(test_data, test_label, c='g', label="Testing Data")
  plt.scatter(test_data, prediction, c='r', label="Prediction Data")
  plt.legend()

plot_pred()
```
Which gives me an error on the `plot_pred()` call saying that X and y need to be the same value.
But then I comment the prediction code, run it again, uncomment it, run it yet again and it works fine with no errors.
mild dirge
#

are you using global variables as default values in your function? @warm oracle

#

That's probably messing stuff up

#

Also not passing any values in your function call

warm oracle
#

Yea. I have them set that way as the only change would be the y_pred between models. So I can just go plot_pred(prediction=y_pred_1) or whichever it is lol

mild dirge
#

You shouldn't use global variables in a function to begin with

#

let alone use them as default values

#

Pass them as arguments
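One plausible explanation for the "comment, uncomment, rerun" behaviour (not confirmed, since the traceback wasn't posted) is that default argument values are evaluated only once, when the `def` runs. If `y_pred` was reassigned after the function cell ran, the default still pointed at the old array until the `def` cell was run again. Below is a minimal illustration with plain lists.

```python
y_pred = [1, 2, 3]

def show(prediction=y_pred):   # the default is evaluated once, when `def` runs
    return len(prediction)

y_pred = [1, 2, 3, 4, 5]       # rebinding the global afterwards...
print(show())                  # 3: the default still points at the old list

def show_fixed(prediction):    # passing the data explicitly avoids the surprise
    return len(prediction)

print(show_fixed(y_pred))      # 5
```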

cinder matrix
#

colab is geh

#

bans me from gpu usage

warm oracle
#

Ah I see. Thanks.

mild dirge
#

" saying that X and y need to be the same value." It would also help to just give the error traceback btw

#

I assume it actually said that they need to be the same length

warm oracle
#

Sorry, I had it fixed so didn't have the traceback to post.
lemme see if I can replicate the error

warm oracle
#

"ValueError: x and y must be the same size"

mild dirge
#

So the length of both arrays/lists differ

warm oracle
#

Not really, as I set both X and y to be the same size so I can understand it better before moving on to an actual dataset, since I'm still new to TensorFlow.

```python
import tensorflow as tf

X = tf.range(-100, 100, 4)  # 50 values
y = X + 10

X_train = X[:40]
y_train = y[:40]

X_test = X[40:]  # the last 10 values; X[:10] would reuse training points
y_test = y[40:]
```
mild dirge
#

Well they do, otherwise you don't get an exception

#

If you check the error traceback you already know which line causes it

warm oracle
#

Yea. Which, like I said, got fixed by commenting one line on and off again.

mild dirge
#

If it's still a problem, you aren't giving nearly enough information for us to help you; otherwise I don't know why you're telling us this :/

warm oracle
#

Was just an observation. Since I said it got fixed in my initial comment.

#

Guess those aren't allowed lol

mild dirge
#

Yeah but commenting and uncommenting code doesn't fix anything, that makes no sense

#

It might be that you reran some code, that made some values change or something

#

Whatever it was, it made two of those arrays/lists different lengths

upper bluff
#

i have a 2d array, something like:

```python
[[0,2,3],
 [1,0,3],
 [2,1,0],
 [3,1,2]]
```

and i have a pandas dataframe something like:

```
  id  val
0  a    9
1  b    8
2  c    3
3  d    7
```

now i want to get a new dataframe based on the indexes listed in the 2d array; for this example, it would be:

```
  id  val  id_2  val_2
0  a    9     a      9
0  a    9     c      3
0  a    9     d      7
1  b    8     b      8
1  b    8     a      9
1  b    8     d      7
2  c    3     c      3
2  c    3     b      8
2  c    3     a      9
3  d    7     d      7
3  d    7     b      8
3  d    7     c      3
```
warm oracle
#

That's why I was confused.
As I only reran the two cells that have the function defined (after commenting and uncommenting a line), and the one with the function call.

#

But don't worry about it lol.
Sorry for taking your time. And thanks again for the advice.

mild dirge
#

yeah wasn't meant to sound so angry, just a bit confused

#

sorry if it came off as aggressive 😛

warm oracle
#

Don't worry. I'd react the same if someone told me what I just said lol.

upper bluff
mild dirge
#

I am not that experienced with pandas srr

upper bluff
#

ah aighty

mighty orchid
rose agate
# upper bluff i have a 2d array, something like: [[0,2,3], [1,0,3], [2,1,0], [3,1,2]...

might be a better way to do it, but I got this working

```python
import pandas as pd
import numpy as np

index = [[0,2,3],
 [1,0,3],
 [2,1,0],
 [3,1,2]]

idse = ['a','b','c','d']
vals = [9,8,3,7]

data = {'id': idse, 'val': vals}
df = pd.DataFrame(data=data)

# repeat every row once per entry in its sublist (assumes equal-length sublists)
newdf = pd.DataFrame(np.repeat(df.values, len(index[0]), axis=0), columns=df.columns)

# flatten the 2d index and look up the paired rows
flat_list = [item for sublist in index for item in sublist]
newdf['id_2'] = df.id[flat_list].values
newdf['val_2'] = df.val[flat_list].values
```
#

produces

upper bluff
#

SO GOOD YES

#

that's EXACTLY what i wanted!!!!

rose agate
#

I used np.repeat, so every sublist in the index needs to be the same length or it'll break
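If the sublists ever differ in length, `np.repeat` can still do the job when it is given one repeat count per row instead of a single number. A sketch, where `pair_rows` is a hypothetical helper name; `add_suffix('_2')` produces the `id_2`/`val_2` names and works for any number of columns:

```python
import numpy as np
import pandas as pd

def pair_rows(df: pd.DataFrame, index: list) -> pd.DataFrame:
    # Repeat row i of df once per entry in index[i]; counts may differ per row
    counts = [len(sub) for sub in index]
    left = df.iloc[np.repeat(np.arange(len(df)), counts)].reset_index(drop=True)
    # Flatten the sublists and fetch the paired rows by position
    flat = [i for sub in index for i in sub]
    right = df.iloc[flat].reset_index(drop=True).add_suffix('_2')
    return pd.concat([left, right], axis=1)

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'], 'val': [9, 8, 3, 7]})
print(pair_rows(df, [[0, 2, 3], [1, 0, 3], [2, 1, 0], [3, 1, 2]]))
print(pair_rows(df, [[1], [], [0, 3], [2]]))  # ragged: 'b' gets no partners
```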

#

no worries

upper bluff
#

yessss each sublist has equal length

upper bluff
rose agate
#

are you trying to repeat those columns like with id and val or index them like with id_2 and val_2

#

hard to understand

upper bluff
#

multiple columns of values

#

each id will have its own set of values

#

we are repeating ids

upper bluff
rose agate
#

for each of the 40 cols?

upper bluff
#

pd.concat([newdf, df.iloc[flat_list].reset_index(drop=True)], axis = 1).set_axis([...], axis = 1)

#

maybe this

rose agate
#

I can't really tell what you need without an example

upper bluff
rose agate
#

let me think

rose agate
upper bluff
#

YES thanks a ton

#

this works epicly

proper swift
#

Hi, is there a way to apply a function starting with the 3rd instance of a value? I.e. for ID numbers, on the third occurrence of an ID number, do x

I have the following df:

```python
import random
import pandas as pd

ids = [1001, 1002,
       1003, 1004,
       1005, 1006,
       1007, 1008,
       1009, 1010]

numbers = list(range(1,11))

systems = ["ONE", "TWO"]

num = 40

sample1 = random.choices(ids, k=num)
sample2 = random.choices(systems, k=num)
sample3 = random.choices(numbers, k=num)

df = pd.DataFrame(zip(sample1, sample3, sample2), 
                 columns=['id', 'seq', 'system'])

df = df.sort_values(by=['id', 'seq'])  # sort_values returns a new frame, so assign it
```

If an ID appears 3 or more times, then starting at its third row, shift the values in the system column up by one

undone wind
#

I'm making a fairly basic content-based music recommender system, following some code I used before for a movie recommender system that had a much smaller dataset. Not sure if this is relevant, but the biggest change I made is that the music dataset is way too large, so I have the system take a sample and then reset the index.

When running the function that compares an input to the rest of the data and outputs a short list of the most recommended artists, I get an error I've never seen before: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

#

I'm unsure what this means and what I should do

#
```python
def recommendation(artist_name, sig=sig):
    index = indices[artist_name]
    sig_scores = list(enumerate(sig[index]))
    sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
    sig_scores = sig_scores[1:11]
    spotify_indices = [i[0] for i in sig_scores]
    return spotifydf['artist_name'].iloc[spotify_indices]
```
#

```
ValueError                                Traceback (most recent call last)
<ipython-input-27-95e154b531ed> in <module>
----> 1 recommendation(spotifyRec)

<ipython-input-25-19214a8b47a6> in recommendation(artist_name, sig)
      2     index = indices[artist_name]
      3     sig_scores = list(enumerate(sig[index]))
----> 4     sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
      5     sig_scores = sig_scores[1:11]
      6     spotify_indices = [i[0] for i in sig_scores]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
stoic trench
#

Interesting

tidal bough
#

By the way, what you're doing can be done via np.argsort instead (it gives an array of indices such that the corresponding elements are in sorted order. That seems to be exactly what you're doing with all these lines).
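A sketch of the `np.argsort` version, assuming `sig_row` is a single 1-D row of the similarity matrix (the helper name `top_k_indices` is hypothetical). Negating the scores gives a descending order, and `kind="stable"` keeps ties in their original order, matching what `sorted(..., reverse=True)` does; the `[1:k + 1]` slice mirrors the original `sig_scores[1:11]`, which skips the top hit (normally the query item itself):

```python
import numpy as np

def top_k_indices(sig_row, k: int = 10) -> np.ndarray:
    sig_row = np.asarray(sig_row)
    if sig_row.ndim != 1:
        # a 2-D input is exactly the situation that makes sorted() fail
        raise ValueError(f"expected a 1-D row, got shape {sig_row.shape}")
    # positions ordered by descending score; stable so ties keep their original order
    order = np.argsort(-sig_row, kind="stable")
    # skip the best match, then take the next k
    return order[1:k + 1]

print(top_k_indices(np.array([1.0, 0.2, 0.9, 0.5]), k=2))  # [2 3]
```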

tidal bough
#

Yeah, looks like each element is an array, so how do you expect sorted to compare them? What's bigger, [0,1,2] or [1,-1,0]?

undone wind
#

and yeah

#

it is different in my old version with movies

#

huh

desert oar
# undone wind

those are tuples, not arrays. python allows you to sort tuples

desert oar
tidal bough
#

yeah, sure. Anyway, sorted just does the equivalent of if a<b:, which for numpy arrays is not valid (a<b is an array of bools, and an array of bools can't be implicitly reduced to a single bool like that)
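A minimal reproduction of the error: when the second element of each tuple is a NumPy array, `sorted` has to turn `a < b` into a single bool, and an element-wise comparison of multi-element arrays cannot be reduced that way. With scalar scores (i.e. a 1-D row), the same call works:

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([1, -1, 0])
print(a < b)  # element-wise: [ True False False]

pairs = [(0, a), (1, b)]
raised = False
try:
    sorted(pairs, key=lambda x: x[1], reverse=True)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A 1-D row gives scalar elements, which sorted() can compare
row = np.array([0.2, 0.9, 0.5])
ranked = sorted(enumerate(row), key=lambda x: x[1], reverse=True)
print([i for i, _ in ranked])  # [1, 2, 0]
```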

undone wind
#

I'm a little confused, then, why my original has tuples and this version uses arrays

desert oar
undone wind
#

this is from the movies one that works

#

which seems to show that sig is an array here

#

(the one that doesn't work gives a similar result)

desert oar
#

you could also do this with groupby:

```python
df = ...

systems = ["ONE", "TWO"]

df['id_count'] = df.groupby('id').cumcount()  # 0-based: 0, 1, 2, ... within each id
df.loc[df['id_count'] >= 2, systems] += 1  # >= 2 selects the third occurrence onward
```
proper swift
desert oar
#

or even combining the loop and groupby, which might be the best option if this dataframe is really big and you have a large number of duplicate id values:

```python
df = ...

systems = ["ONE", "TWO"]

for _, group in df.groupby('id'):
    if len(group) > 2:
        inc_ids = group.index[2:]  # rows from the third occurrence onward
        df.loc[inc_ids, systems] += 1
```

i'm not entirely sure about the semantics of += while looping over itertuples or groupby. if you want to be safer, you can build a list of the index labels to modify first, and then do the modification in one shot afterwards

#
```python
df = ...

systems = ["ONE", "TWO"]

inc_ids = []
for _, group in df.groupby('id'):
    if len(group) > 2:
        inc_ids.extend(group.index[2:].tolist())
df.loc[inc_ids, systems] += 1
```
desert oar
# undone wind

can you post your entire code? both the "original" you mentioned as well as the new version that gives you a problem

undone wind
#

ah

#

well then

proper swift
#

@desert oar what I had so far was this:

```python
def func(df_group):
    if len(df_group) >= 3:
        return df_group.system.shift(-1)
    else:
        return df_group.system

new_col = df.groupby(['id'], as_index=False).apply(func)

df['new'] = new_col.reset_index(level=0, drop=True)
```
desert oar
desert oar
#

that would work too but that shift would apply to the entire group, not just the values after the 3rd

proper swift
desert oar
#

wait... is system a column? it looked like a list of columns

proper swift
desert oar
#

oh i see

#

i misread your original example

proper swift
#

no worries, hopefully the problem is a bit clearer now?

desert oar
#

huh, that's a new one

#

let me see what i broke

#

found it

#

let me do this offline 😆 hang on

undone wind
# desert oar can you post your entire code? both the "original" you mentioned as well as the ...

```python
# In[11]:
spotifydf = spotifydf.sample(frac=.1)
spotifydf = spotifydf.reset_index()


# In[12]:
spotifydf.head()


# In[13]:
spotifydf['popularity'] = spotifydf['popularity'].apply(str)


# In[14]:
# astype(str) converts each value; str(spotifydf['genre']) would turn the whole
# Series into one string and copy it into every row
spotifydf['genre'] = spotifydf['genre'].astype(str)
spotifydf['artist_name'] = spotifydf['artist_name'].astype(str)
spotifydf['track_name'] = spotifydf['track_name'].astype(str)


# In[15]:
# spaces keep words from adjacent columns from fusing into one token
spotifydf["content"] = (spotifydf['genre'] + ' ' + spotifydf['artist_name'] + ' '
                        + spotifydf['track_name'] + ' ' + spotifydf['popularity'])

# In[16]:

from sklearn.feature_extraction.text import TfidfVectorizer

tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}', ngram_range=(1, 3), stop_words = 'english')

spotifydf['content'] = spotifydf['content'].fillna('')

# In[17]:
tfvmatrix = tfv.fit_transform(spotifydf['content'])
# In[18]:
tfvmatrix
# In[19]:
tfvmatrix.shape
# In[20]:
from sklearn.metrics.pairwise import sigmoid_kernel
# In[21]:
sig = sigmoid_kernel(tfvmatrix, tfvmatrix)
# In[22]:
sig[0]
# In[23]:
# note: drop_duplicates() compares the values (row numbers), so repeated
# artist names are still in the index
indices = pd.Series(spotifydf.index, index=spotifydf['artist_name']).drop_duplicates()
# In[24]:
indices
# In[29]:
def recommendation(artist_name, sig=sig):
    index = indices[artist_name]
    sig_scores = list(enumerate(sig[index]))
    sig_scores = sorted(sig_scores, key = lambda x: x[1], reverse=True)
    sig_scores = sig_scores[1:11]
    spotify_indices = [i[0] for i in sig_scores]
    return spotifydf['artist_name'].iloc[spotify_indices]
# In[26]:

spotifyRec = input("Enter the artist you would like a recommendation based on!")
# In[27]:
recommendation(spotifyRec)
```
desert oar
arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

undone wind
#

ok

#

this is the original then

slate scarab
#

I have a dumb question: I saw there's discord.js. Would it be better in the long run to start and stay with Python, or is discord.js a good start? I plan on creating my own version of Carl bot and have never tried to make a project this big before.

undone wind
slate scarab
#

I have done some basic bots before and would like to go bigger

undone wind
#

and this is the new version with music instead of movies @desert oar

desert oar
#

!eval @proper swift

```python
import numpy as np
import pandas as pd

ids = [
    1001, 1002, 1003, 1004, 1005,
    1006, 1007, 1008, 1009, 1010,
]
numbers = list(range(1,11))
systems = ["ONE", "TWO"]
num = 40
rng = np.random.default_rng()
sample1 = rng.choice(ids, size=num)
sample2 = rng.choice(systems, size=num)
sample3 = rng.choice(numbers, size=num)
df = pd.DataFrame(
    zip(sample1, sample3, sample2),
    columns=['id', 'seq', 'system'],
)


def shift_system(group):
    if len(group) < 3:
        return group
    return pd.concat((
        group.iloc[:2],
        group.iloc[2:].shift(-1)
    ))

df['new'] = (
    df.groupby('id')['system']
    .apply(shift_system)
    .reset_index(level=0, drop=True)
)
print(df)
```
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |       id  seq system  new
002 | 0   1008    6    TWO  TWO
003 | 1   1002    5    ONE  ONE
004 | 2   1004    2    TWO  TWO
005 | 3   1001   10    ONE  ONE
006 | 4   1009    3    TWO  TWO
007 | 5   1008    6    ONE  ONE
008 | 6   1001    7    TWO  TWO
009 | 7   1004    2    TWO  TWO
010 | 8   1001    7    ONE  ONE
011 | 9   1008    2    ONE  ONE
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/doverelawu.txt?noredirect

desert oar
#

i can't imagine why you want to do this though 😆

proper swift
undone wind
#

wait

cinder matrix
proper swift
#

@desert oar thanks works as intended! Been stuck on that problem for the last 2 days
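For larger frames, the same result as `shift_system` can probably be had without `apply`: number the rows within each id with `cumcount()`, shift `system` up by one within each id with `groupby(...).shift(-1)`, and keep the shifted value only from the third occurrence onward. This is a sketch on a small hand-made frame rather than the random one above; like the original, it uses the rows' existing order within each id, so sort first if `seq` order matters:

```python
import pandas as pd

df = pd.DataFrame({
    'id':     [1001, 1001, 1001, 1001, 1002, 1002],
    'seq':    [1, 2, 3, 4, 1, 2],
    'system': ['ONE', 'TWO', 'TWO', 'ONE', 'ONE', 'TWO'],
})

# 0-based position of each row within its id: 0, 1, 2, ...
pos = df.groupby('id').cumcount()
# next row's system within the same id (NaN for the last row of each id)
next_system = df.groupby('id')['system'].shift(-1)
# keep the original value for the first two occurrences, the shifted value afterwards
df['new'] = df['system'].where(pos < 2, next_system)
print(df)
```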

undone wind
#

ok so I think the issue was possibly coming from a few things

#

so originally when using my method I got an error when concatenating columns saying: "TypeError: can only concatenate str (not "int") to str"

#

wait

#

maybe not 😂

#

yeah I can't work out why it's making full arrays instead of tuples, I can't see anywhere why it's doing this

#

oh wait, is it potentially because I'm doing it based on artist_name, and one artist can have many songs within the dataframe, so each artist is assigned multiple rows in sig?

serene scaffold
#

@undone wind can you do print(df.head().to_dict('list')) and show the text in this chat?

#

and then we can talk about how to transform it to get your desired result. only text will do--no screenshots.

#

Please ping me when you do that and we can get into it.

undone wind
#

{'index': [133553, 204593, 52399, 79490, 93264], 'genre': ['Reggae', 'Soundtrack', 'Blues', 'Opera', 'Indie'], 'artist_name': ['Bob Marley & The Wailers', 'Nick Glennie-Smith', 'Galactic', 'Giacomo Puccini', 'The Lagoons'], 'track_name': ['Bend Down Low - B Is Version', "Jack's Death", "You Don't know (featuring Glen David Andrews and The Rebirth Brass Band)", 'Un bel dì (From "Madama Butterfly")', 'California'], 'track_id': ['6bwr7Qgxrc0hERBOrapmVh', '34devHoJ8tjNLPgSaOpPuo', '5qh4q09WZTUMCkqXWR4l6l', '4jekropd6vkVfunMXZqwVh', '35QAUfIbfIXT3p3cWhaKxZ'], 'popularity': ['35', '28', '25', '20', '64'], 'acousticness': [0.44, 0.973, 0.0393, 0.9890000000000001, 0.276], 'danceability': [0.779, 0.14400000000000002, 0.701, 0.24, 0.7859999999999999], 'duration_ms': [213867, 98440, 244200, 296693, 261773], 'energy': [0.445, 0.203, 0.7659999999999999, 0.163, 0.6859999999999999], 'instrumentalness': [0.000151, 0.7829999999999999, 0.000816, 2.8499999999999998e-05, 0.6679999999999999], 'key': ['C', 'D#', 'G', 'C#', 'E'], 'liveness': [0.166, 0.11599999999999999, 0.23800000000000002, 0.317, 0.0416], 'loudness': [-7.791, -17.989, -6.285, -15.071, -7.18], 'mode': ['Major', 'Minor', 'Major', 'Major', 'Major'], 'speechiness': [0.0458, 0.0356, 0.0976, 0.05, 0.0289], 'tempo': [87.94, 74.194, 110.001, 89.719, 119.99700000000001], 'time_signature': ['4/4', '1/4', '4/4', '4/4', '4/4'], 'valence': [0.755, 0.0389, 0.588, 0.0388, 0.542], 'content': ['ReggaeBob Marley & The WailersBend Down Low - B Is Version', "SoundtrackNick Glennie-SmithJack's Death", "BluesGalacticYou Don't know (featuring Glen David Andrews and The Rebirth Brass Band)", 'OperaGiacomo PucciniUn bel dì (From "Madama Butterfly")', 'IndieThe LagoonsCalifornia']} @serene scaffold

#

content comes from spotifydf["content"] = spotifydf['genre'] + spotifydf['artist_name'] + spotifydf['track_name']

#

I run tfv.fit_transform on content to get tfvmatrix

#

then sig = sigmoid_kernel(tfvmatrix, tfvmatrix) to make sig

#
```
sig[index]

array([[0.7616427 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76161052, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       ...,
       [0.7616156 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.7616192 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416]])
```
#

sig[index] contains multiple arrays for some reason, whereas it should only contain one: the sigmoid kernel row for that artist's content

#

i don't know why, and I believe this is what I am asking about
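Some context on what `sig` holds: scikit-learn documents `sigmoid_kernel` as tanh(gamma * <x, y> + coef0), with `gamma` defaulting to 1/n_features and `coef0` to 1, and it returns one row and one column per input row. That is also why so many entries print as 0.76159416: that is tanh(1), the value for two rows whose TF-IDF vectors share no terms. Indexing the N x N result with one integer gives a 1-D row, while indexing it with several positions gives a 2-D block. A NumPy-only re-implementation to show the shapes (`sigmoid_kernel_np` is a hypothetical stand-in, not the sklearn function):

```python
import numpy as np

def sigmoid_kernel_np(X: np.ndarray, Y: np.ndarray, gamma=None, coef0: float = 1.0) -> np.ndarray:
    # tanh(gamma * <x, y> + coef0) for every pair of rows (x from X, y from Y)
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    return np.tanh(gamma * (X @ Y.T) + coef0)

# Three toy "documents": rows 0 and 2 are identical, row 1 shares no terms with them
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0]])
sig = sigmoid_kernel_np(X, X)

print(sig.shape)          # (3, 3): N x N
print(sig[0].shape)       # (3,): a scalar index gives one 1-D row
print(sig[[0, 2]].shape)  # (2, 3): several indices give a 2-D block
print(sig[0, 1])          # tanh(1.0), about 0.7616
```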

compact gazelle
#

Question: how do I make a 2D array like this using numpy? The output should be an array that starts at 137 and ends at 166
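The screenshot with the target layout isn't in the log, so the shape here is an assumption: 137 through 166 is 30 numbers, which fits, for example, 5 rows by 6 columns. `np.arange` excludes its stop value, so the stop has to be 167 to include 166, and `reshape` then fills the values in row by row:

```python
import numpy as np

# stop is exclusive, so 167 is needed to include 166 (30 values in total)
arr = np.arange(137, 167).reshape(5, 6)  # (5, 6) is assumed; (6, 5), (3, 10), ... also fit
print(arr)
```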

serene scaffold
serene scaffold
compact gazelle
desert oar
#

@undone wind what is spotifyRec?

#

even the code snippets you posted don't include all of the code

#

don't make people guess at what you are doing here, if you can include the whole notebook please do so

undone wind
desert oar
#

this is the non-working one?

undone wind
#

yes

desert oar
#

@undone wind
```python
indices = pd.Series(spotifydf.index, index=spotifydf['artist_name']).drop_duplicates()
```

does `spotifydf` have a multi-index?
#

also it seems a bit weird that you're inverting the index and values like this

#

ok i see... you have this

```python
spotifydf = pd.read_csv(r'C:\Users\cens\Downloads\archive (3)\SpotifyFeatures.csv')
```

so its index should just be the default RangeIndex

#

i see you also did spotifydf.reset_index() in cell 11

#

ok, so sig should be N x N where N is the number of rows in spotifydf

#

ahh i see why you invert the index, that's one way to do it. but sig is a plain numpy array, so it's only valid if you use a RangeIndex, you're better off with np.arange(len(spotifydf)) instead of spotifydf.index.

#

this notebook is also pretty messy. it's very likely that you have some weird intermediate state. did you try restarting the kernel and running it top to bottom?

#

if sig is indeed a 2d array, and if artist_name is a scalar (i.e. not a list/series/array), then sig[index] should be a 1d array because you deduplicated indices,

#

so if you get something different, then one of those assumptions is wrong

#

check the shape of sig and check that you are only passing plain "artist name" values, not arrays thereof
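One detail worth checking against the reasoning above: `Series.drop_duplicates()` removes duplicate values, not duplicate index labels. In `indices` the values are row numbers, which are all distinct, so every repeated artist name survives in the index, and `indices[artist_name]` then returns a Series of several row numbers, which makes `sig[index]` 2-D. A small sketch with made-up names; `Index.duplicated` is one way to keep only the first row per artist:

```python
import pandas as pd

artist_name = pd.Series(['A', 'B', 'A'])  # made-up names; 'A' has two songs
indices = pd.Series(range(len(artist_name)), index=artist_name).drop_duplicates()

print(len(indices))   # 3: the values 0, 1, 2 are distinct, so nothing was dropped
print(indices['A'])   # a Series with two entries (0 and 2), not a scalar

# keep only the first row for each artist label
first = indices[~indices.index.duplicated(keep='first')]
print(first['A'])     # 0
```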

undone wind
undone wind
# desert oar check the shape of `sig` and check that you are only passing plain "artist name"...
```
sig[index]

array([[0.7616427 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76161052, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       ...,
       [0.7616156 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.76163003, 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416],
       [0.7616192 , 0.76159416, 0.76159416, ..., 0.76159416, 0.76159416,
        0.76159416]])
```
#

contains multiple arrays for 1 artist name

desert oar
#

i bet spotifyRec itself is not a scalar

#

this is why i always use .at when i expect to be indexing with scalars, if i accidentally pass a non-scalar it gives an error

undone wind
desert oar
# undone wind how do you mean

what is spotifyRec? if it's not a single scalar, then it's going to produce an index that is not a scalar, which will produce a 2d array from sig[index]

undone wind
#

do you just mean

```
indices

artist_name
Bob Marley & The Wailers        0
Nick Glennie-Smith              1
Galactic                        2
Giacomo Puccini                 3
The Lagoons                     4
                            ...  
Glass Animals               23267
Night Beats                 23268
Jackie Kashian              23269
311                         23270
Bruce Broughton             23271
Length: 23272, dtype: int64

spotifyRec = input("Enter the artist you would like a recommendation based on!")
```
?
#

and then index = indices[artist_name]

#

spotifyRec is just an input from the user

desert oar
#

ok, can you confirm that index is also a scalar?

#

do this

index = indices.at[artist_name]
#

this way you definitely get an error if it's wrong

#

also can you show me sig.shape to confirm that it is definitely 2d and not 3d?

undone wind
desert oar
#

alright

#

and can you print the value of index too? using the same artist_name that caused the problem before

undone wind
#

i think I was right earlier on, multiple songs with the same artist

#

is what's causing arrays rather than tuples

#

the movie one works because there aren't any movies with the exact same title

#

would you say this is whats happening? @desert oar

desert oar
#

i think you need to reconsider these data structures

analog kestrel
#

Hi all, I have a question regarding training a multilayer perceptron for mnist using the classes/functions that I was provided with. Is this an appropriate place to reach out for some assistance?

undone wind
#

won't make the greatest recommender system but I just want it working for now

#

and then is there a place anyone could recommend ( 😏 ) for learning more about content based recommenders and implementing them

desert oar
#

this has nothing to do with recommendation systems... this is a matter of being a bit smarter about numpy and pandas usage

#

if your recommender system works with songs, then you have a song recommender system

#

so you can ask for an artist, but then obviously you will get more than one song per artist

#

which is maybe fine, but then you need to be smarter about how you do the lookup

#

you need to get the list of song ids from the artist id

#

or you need to come up with an artist-level recommendation system, not a song-level recommendation system
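A sketch of the artist-level option, under assumptions not in the original: rows of `sig` line up with the positional rows of the DataFrame (as after `reset_index()`), an artist's songs are combined by averaging their similarity rows, and the ranked songs are then collapsed to unique artists. `recommend_artists` is a hypothetical name, and averaging is only one choice (taking the max would favour artists with a single very close song):

```python
import numpy as np
import pandas as pd

def recommend_artists(df: pd.DataFrame, sig: np.ndarray, artist: str, k: int = 10) -> list:
    names = df['artist_name'].to_numpy()
    # Positional row numbers (0..N-1) of this artist's songs, matching the rows of sig
    positions = np.flatnonzero(names == artist)
    if positions.size == 0:
        raise KeyError(artist)
    # sig[positions] is 2-D (one row per song); average it into one score per song
    scores = sig[positions].mean(axis=0)
    # Rank songs best-first, then keep each artist's first appearance in that order
    order = np.argsort(-scores, kind="stable")
    ranked = pd.unique(names[order])
    # Drop the query artist and keep the top k
    return [a for a in ranked if a != artist][:k]

df = pd.DataFrame({'artist_name': ['A', 'A', 'B', 'C']})
sig = np.array([[1.0, 0.9, 0.8, 0.1],
                [0.9, 1.0, 0.6, 0.2],
                [0.8, 0.6, 1.0, 0.3],
                [0.1, 0.2, 0.3, 1.0]])
print(recommend_artists(df, sig, 'A'))  # ['B', 'C']
```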