#data-science-and-ml | Python | Page 392

lapis sequoia Mar 31, 2022, 6:38 PM

#

It's just changing the start point/ offset

#

I think i probably can't do that

#

Oh I can

#

https://c.tenor.com/qdg13PqYbxMAAAAM/yes-baby.gif

fickle hinge Mar 31, 2022, 6:51 PM

#

Hey guys

#

So I have a dataset which contains like 28 records.
Is it feasible to run ANN on this dataset?

#

It's a regression problem

misty flint Mar 31, 2022, 7:26 PM

#

oh man i should just take a bayesian class tbh kekHands
honestly im not even trying to go too far into bayesian stats. i only listened to that podcast bc one of my favorite podcasters was a guest on that show

#

but thanks for the guidance

#

i bookmarked the references anyway

#

kekHands

desert oar Mar 31, 2022, 7:58 PM

#

fickle hinge So I have a dataset which contains like 28 records. Is it feasible to run ANN on...

depending on the problem/data, maybe it would be ok with a small network (small number of features and hidden nodes)

#

but 28 is really small. like maybe even small for traditional linear regression

#

depending on what you actually want to do, you might want to take a more statistical approach to solving your problem

fickle hinge Mar 31, 2022, 8:07 PM

#

Hmm okay
Thank you my sir

sweet sequoia Mar 31, 2022, 8:28 PM

#

[0, 1, 3, 5, 21, 22, 22, 24, 25, 25, 26, 27, 31, 32, 34, 40, 40, 42, 43, 44, 47, 50, 52, 55, 56, 56, 57, 58, 58, 59, 60, 63, 74, 76, 76, 80, 83, 84, 84, 86, 86, 87, 88, 90, 91, 91, 95, 97, 97, 100]
okay so I have a sorted array like this. To get half of it I do
half_data = data[:len(data)//2]
but I also want all the data after the half,
i.e. with the first half of the data dropped

#

Um how do I get the second half of the data? or well 24th index

#

got it
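A minimal sketch of the slicing being asked about, using the list above. The same midpoint index that ends the first slice also starts the second one, and starting at a fixed index such as 24 works the same way.

```python
data = [0, 1, 3, 5, 21, 22, 22, 24, 25, 25, 26, 27, 31, 32, 34, 40, 40, 42, 43, 44,
        47, 50, 52, 55, 56, 56, 57, 58, 58, 59, 60, 63, 74, 76, 76, 80, 83, 84, 84, 86,
        86, 87, 88, 90, 91, 91, 95, 97, 97, 100]

mid = len(data) // 2       # 25 for these 50 values
first_half = data[:mid]    # indices 0..24
second_half = data[mid:]   # indices 25..49: everything after the first half

from_24 = data[24:]        # or start the slice at index 24 explicitly
print(second_half[:5])
```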

lapis sequoia Mar 31, 2022, 9:16 PM

#

sweet sequoia got it

What was the problem that was happening?

thin palm Mar 31, 2022, 10:58 PM

#

does this graph make sense

#

Space X missions cost

#

Screen_Shot_2022-03-31_at_4.59.06_PM.png

desert oar Mar 31, 2022, 11:04 PM

#

thin palm

maybe just say "Missions", not "Total Missions" on the y axis

#

also the graph makes sense but it's pretty ugly data for a histogram

#

what is the bin size? you should state that imo

#

with small datasets, histograms are really sensitive to the bin size

#

i'd suggest maybe adding a "rug plot" to the bottom showing all the individual missions

#

https://en.wikipedia.org/wiki/Rug_plot

Rug plot

A rug plot is a plot of data for a single quantitative variable, displayed as marks along an axis. It is used to visualise the distribution of the data. As such it is analogous to a histogram with zero-width bins, or a one-dimensional scatter plot.
Rug plots are often used in combination with two-dimensional scatter plots by placing a rug plot o...
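A sketch of desert oar's suggestions in matplotlib: a fixed bin width that is stated in the title, plus a rug of one tick per mission under the bars. The `costs` values are hypothetical placeholders, not real SpaceX figures, and the bin width of 10 is an arbitrary choice; with a dataset this small it is worth trying a few widths.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder costs (USD millions), NOT real SpaceX figures
costs = np.array([50, 52, 57, 60, 61, 62, 67, 90, 95, 97, 150])

bin_width = 10
start = costs.min() // bin_width * bin_width
bins = np.arange(start, costs.max() + bin_width, bin_width)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(costs, bins=bins, edgecolor="black")
# rug plot: one tick per mission along the x axis
ax.plot(costs, np.zeros_like(costs), "|", color="black", markersize=15)
ax.set_xlabel("Cost (USD millions)")
ax.set_ylabel("Missions")
ax.set_title(f"Mission cost (bin width = {bin_width})")
```

In a notebook, the figure shows up on its own (or call `plt.show()`).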

mighty spoke Mar 31, 2022, 11:07 PM

#

Hi I keep trying to run my code but it's not running properly
```python
import scipy.integrate
import numpy as np
from matplotlib import pyplot as plt
from scipy.constants import G, m_e, m_p, c, hbar

# defining constants
C = 0.86
Ye = [6/12, 26/56]  # electrons per nucleon: carbon-12 nuclei and iron-56 nuclei
h_bar = hbar
mp = m_p
me = m_e

# natural unit for density; the exponents on pi and h_bar are assumed (lost in the paste):
# m_p m_e^3 c^3 / (3 pi^2 hbar^3)
rh0_0 = (mp*me**3*c**3)/(3*np.pi**2*h_bar**3)

# natural unit of length, cycling through the 2 values of Ye
R0Val = []
for i in Ye:
    R0 = np.sqrt((C*3*i*me*c**2)/(4*np.pi*G*mp*rh0_0))
    R0Val.append(R0)

# defining the two coupled first-order ODEs, p = [M, q]
def rhs3(x, p):
    M, q = p
    q = max(q, 0.0)  # density can't go negative; keeps q**(2/3) real
    if q == 0.0:
        return [0.0, 0.0]  # past the surface nothing changes
    g = q**(2/3)/(3*np.sqrt(1 + q**(2/3)))  # gamma factor as a function of q
    dMdx = 3*q*x**2
    dqdx = -(C*q*M)/(g*x**2)  # divide by the whole product g*x**2
    return [dMdx, dqdx]

# stop integrating at the surface, where the density q reaches zero
def surface(x, p):
    return p[1]
surface.terminal = True

x0 = 1e-6  # start just off the centre: x = 0 would divide by x**2 = 0
sol = scipy.integrate.solve_ivp(rhs3, [x0, 1], [0, 1], events=surface, dense_output=True)
x = np.linspace(x0, sol.t[-1], 1000)
M = sol.sol(x)[0, :]
q = sol.sol(x)[1, :]
```

thin palm Apr 1, 2022, 12:16 AM

#

desert oar also the graph makes sense but it's pretty ugly data for a histogram

thanks so much for your help on all my questions last few days man. Been super clutch, now going to write my 5 page report. Cheers

#

Does anyone know any good ways to share my Jupyter Notebooks? I know the company probably wants instructions on how to use it

desert oar Apr 1, 2022, 12:28 AM

#

thin palm Does anyone know any good ways to share my Jupyter Notebooks? I know the company...

you can share the .ipynb file as-is and they will be able to see the outputs you saved to it

#

you can also use nbconvert to export it to an html file or even pdf
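A sketch of the nbconvert route through its Python API (`HTMLExporter`), assuming `nbformat` and `nbconvert` are installed. A tiny in-memory notebook stands in for the real one, and `analysis.html` is a hypothetical file name. From a terminal the equivalent is `jupyter nbconvert --to html analysis.ipynb`; `--to pdf` additionally needs a LaTeX installation.

```python
from nbconvert import HTMLExporter
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Stand-in notebook; for a real file use nbformat.read("analysis.ipynb", as_version=4)
nb = new_notebook(cells=[
    new_markdown_cell("# Mission cost analysis"),
    new_code_cell("print(1 + 1)"),
])

# body is a complete, self-contained HTML page that opens in any browser
body, resources = HTMLExporter().from_notebook_node(nb)

with open("analysis.html", "w", encoding="utf-8") as f:
    f.write(body)
```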

thin palm Apr 1, 2022, 12:36 AM

#

desert oar you can share the `.ipynb` file as-is and they will be able to see the outputs y...

so in my instructions they ask for "we’d also like you to include any code you wrote along the way to generate it."
Is it still okay to share the .ipynb or are they wanting something different

mild dirge Apr 1, 2022, 12:47 AM

#

thin palm Does anyone know any good ways to share my Jupyter Notebooks? I know the company...

Make a readme for this

mild dirge Apr 1, 2022, 12:47 AM

#

thin palm so in my instructions they ask for "we’d also like you to include any code you ...

and code can be inside a jupyter notebook

thin palm Apr 1, 2022, 12:53 AM

#

mild dirge Make a readme for this

man thank you for reminding me!!!!

bold timber Apr 1, 2022, 1:07 AM

#

Hi, I have a question: why did I get different versions of PyTorch?

serene scaffold Apr 1, 2022, 1:38 AM

#

@bold timber must be different virtual environments

bold timber Apr 1, 2022, 1:54 AM

#

serene scaffold @bold timber must be different virtual environments

How to fix that?

serene scaffold Apr 1, 2022, 1:57 AM

#

bold timber How to fix that?

you need to start the jupyter notebook using the same virtual environment for the pip call on the left. unfortunately it would be very difficult to walk you through this remotely.

#

you can run `import sys; print(sys.executable)` in the notebook, and `which pip` in the terminal, and see if they're in the same folder.
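A sketch of that check, run from a notebook cell. `sys.executable` is the kernel's interpreter and `shutil.which("pip")` is the pip a terminal would pick up first on PATH. Running pip as `<kernel interpreter> -m pip` is always tied to the kernel's environment, so installing that way avoids the mismatch.

```python
import shutil
import subprocess
import sys

# the interpreter this notebook kernel is actually running
print("kernel python:", sys.executable)
# the pip that a bare `pip install ...` in a terminal would use
print("pip on PATH:  ", shutil.which("pip"))

# pip bound to *this* interpreter; its output shows which site-packages it installs into
result = subprocess.run([sys.executable, "-m", "pip", "--version"],
                        capture_output=True, text=True)
print(result.stdout.strip() or result.stderr.strip())
```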

pseudo wren Apr 1, 2022, 2:12 AM

#

I am trying to write a helper function for this json tuple that i am currently working with

#

i tried to unpack the tuple

#

but it doesn't appear to be working

#

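```python
import json

# r is presumably the response from the earlier API request
strings = []
for row in r.json():
    strings.append(json.dumps(row))  # each record dict -> one JSON string
strings

tooples = []
for row in strings:
    tooples.append((row,))  # wrap each string in a 1-tuple for sqlite3
tooples
```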

#

this is the conversion i did

#

i want to pull the entire row based on race

#

however my attempts at unpacking this are falling short so i'm not sure what i'm forgetting in trying to access this data

#

i'm using sqlite3

serene scaffold Apr 1, 2022, 2:19 AM

#

no one is going to want to look at this. if you paste a few lines of it into the chat as text, that would be sufficient to establish what is happening.

urban prism Apr 1, 2022, 2:28 AM

#

!paste

arctic wedgeBOT Apr 1, 2022, 2:28 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

pseudo wren Apr 1, 2022, 2:29 AM

#

@urban prism thank you

#

https://paste.pythondiscord.com/erofeqelac

#

here are a few lines of the output

urban prism Apr 1, 2022, 2:32 AM

#

Any context? :) @pseudo wren

pseudo wren Apr 1, 2022, 2:33 AM

#

yes!

#

so

#

i converted this dictionary to be able to put into my sqlite3 server

#

i converted it into a tuple

#

and now i want to write a helper function that will parse through the "race" section and return rows based on race

#

i also want to be able to return rows based on year

#

i'm new to using sqlite3 with python

#

so i'm not super sure how to execute this. I tried to unpack it like a regular tuple, but it didn't work.
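A sketch of a different approach with the standard-library `sqlite3` module: instead of storing each record as one JSON string inside a 1-tuple, give each field its own column and filter with parameterized `SELECT` queries. The table and function names are hypothetical; the first record is the one from the eval further down, the second is invented for illustration. With the real data, `rows` would be `r.json()`.

```python
import sqlite3

# In practice: rows = r.json()  (a list of dicts, one per record)
rows = [
    {"year": "2019", "leading_cause": "Alzheimer's Disease (G30)", "sex": "Female",
     "race_ethnicity": "Asian and Pacific Islander", "deaths": "50"},
    # hypothetical second record, for illustration only
    {"year": "2010", "leading_cause": "Example cause", "sex": "Male",
     "race_ethnicity": "Hispanic", "deaths": "12"},
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE deaths (year TEXT, leading_cause TEXT, sex TEXT, "
    "race_ethnicity TEXT, deaths INTEGER)"
)
# named placeholders read each value straight out of the dicts
conn.executemany(
    "INSERT INTO deaths VALUES (:year, :leading_cause, :sex, :race_ethnicity, :deaths)",
    rows,
)
conn.commit()

def rows_by_race(race):
    # "?" placeholders let sqlite3 quote the value safely
    return conn.execute("SELECT * FROM deaths WHERE race_ethnicity = ?", (race,)).fetchall()

def rows_by_year(year):
    return conn.execute("SELECT * FROM deaths WHERE year = ?", (str(year),)).fetchall()

print(rows_by_race("Asian and Pacific Islander"))
```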

urban prism Apr 1, 2022, 2:36 AM

#

Oh. I'd normally just use pandas, so I don't really know

pseudo wren Apr 1, 2022, 2:40 AM

#

I did import pandas

#

But this is like

#

A tuple

#

With a string of dictionary inside

#

So i’m trying to figure out the best approach

bold timber Apr 1, 2022, 2:46 AM

#

serene scaffold you need to start the jupyter notebook using the same virtual environment for th...

It happens when I import torch. How do I fix this?

urban prism Apr 1, 2022, 2:53 AM

#

pseudo wren So i’m trying to figure out the best approach

Maybe get the string tuple and split it from the commas?

desert oar Apr 1, 2022, 3:18 AM

#

thin palm so in my instructions they ask for "we’d also like you to include any code you ...

the ipynb is the code. but you should at least also list the packages you installed that are required to run it.

serene scaffold Apr 1, 2022, 3:24 AM

#

bold timber It happens when I import torch. How do I fix this?

did you try looking into "restart jupyter notebook kernel"?

#

your question about why your torch versions were different is fine as there weren't any obvious leads for how to resolve it. but if you ask "how do I fix this?" every time something goes wrong, you won't really develop any debugging skills.

kindred crag Apr 1, 2022, 3:37 AM

#

anyone help plz

serene scaffold Apr 1, 2022, 3:43 AM

#

kindred crag anyone help plz

this question belongs in #tools-and-devops

kindred crag Apr 1, 2022, 3:48 AM

#

serene scaffold this question belongs in #tools-and-devops

ty

pseudo wren Apr 1, 2022, 3:51 AM

#

urban prism Maybe get the string tuple and split it from the commas?

It needs the commas to be read the correct way

misty flint Apr 1, 2022, 3:59 AM

#

pseudo wren It needs the commas to be read the correct way

!e

data = (1, '{"year": "2019", "leading_cause": "Alzheimer\'s Disease (G30)", "sex": "Female", "race_ethnicity": "Asian and Pacific Islander", "deaths": "50", "death_rate": "7.719849741", "age_adjusted_death_rate": "6.207494885"}')

import json

def tuple_unpacker(data):
  (tuple1, tuple2) = data
  dictionary = json.loads(tuple2)
  return dictionary["race_ethnicity"]

# function call
tuple_unpacker(data)

arctic wedgeBOT Apr 1, 2022, 3:59 AM

#

@misty flint :warning: Your eval job has completed with return code 0.

[No output]

misty flint Apr 1, 2022, 3:59 AM

#

ah shoot; forgot to print kekHands

#

well it returns this

#

but the thing is

#

this is just for unpacking for one row (but you can loop through it, etc.)

#

if you want it to return rows by race_ethnicity or by year, i recommend 1) unpacking and then 2) sending it to either an actual SQL database or pandas dataframe

#

so you can use the groupby function on it

#

otherwise you would have to create some sort of sorting algorithm if you try to do it all in one place

#

which does not sound like fun to me

#

kekHands

#

but i mean you could do it if you want

#

i guess if its already unpacked, you could do a lot of things to it already

#

PikaThink
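A sketch of misty flint's suggestion with pandas: unpack every `(row_id, json_string)` tuple, parse the string with `json.loads`, and build a single DataFrame, after which filtering by race or year and `groupby` each take one line. The first tuple is the one from the eval above, trimmed to three fields; the second is hypothetical.

```python
import json

import pandas as pd

# (row_id, json_string) tuples like the one in the eval above; the second is hypothetical
data = [
    (1, '{"year": "2019", "race_ethnicity": "Asian and Pacific Islander", "deaths": "50"}'),
    (2, '{"year": "2010", "race_ethnicity": "Hispanic", "deaths": "12"}'),
]

# unpack every tuple, parse its JSON string into a dict, and build one DataFrame
df = pd.DataFrame([json.loads(payload) for _row_id, payload in data])
df["deaths"] = df["deaths"].astype(int)

asian_pi = df[df["race_ethnicity"] == "Asian and Pacific Islander"]  # rows by race
rows_2019 = df[df["year"] == "2019"]                                   # rows by year
deaths_by_race = df.groupby("race_ethnicity")["deaths"].sum()
print(deaths_by_race)
```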

pseudo wren Apr 1, 2022, 4:36 AM

#

;-; don't make me cry rex

#

@misty flint

#

i guess i could add a for loop

safe elk Apr 1, 2022, 5:35 AM

#

pseudo wren @misty flint

Lmao coffee bribe

hybrid mica Apr 1, 2022, 6:51 AM

#

why is logistic regression a classification model?

paper trellis Apr 1, 2022, 7:17 AM

#

hey not sure if this is the right channel for this question, but what kind of graphs do you guys suggest if im trying to display frequency of data by location?
for example imagine a square separated into 3x3 sections. Each section has a numerical data associated with it, and I'd like to see which section gives me a high/low/most often etc

rotund isle Apr 1, 2022, 7:42 AM

#

I have a question
For sufficiently complex feature mappings, what problematic issue will we encounter that is particular to Logistic Regression

safe elk Apr 1, 2022, 7:48 AM

#

paper trellis hey not sure if this is the right channel for this question, but what kind of gr...

2D histogram

#

https://numpy.org/doc/stable/reference/generated/numpy.histogram2d.html
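A sketch with `numpy.histogram2d` for the 3x3 square. `bins=3` over a fixed `range` gives the nine sections, and plain counts answer "how often". Passing the numeric values as `weights` gives per-section totals, and per-section means follow from those. The points and values are hypothetical.

```python
import numpy as np

# Hypothetical observations: (x, y) position inside a unit square plus a value for each
x = np.array([0.10, 0.20, 0.50, 0.90, 0.95])
y = np.array([0.10, 0.15, 0.50, 0.90, 0.10])
values = np.array([3.0, 4.0, 10.0, 1.0, 2.0])

grid = dict(bins=3, range=[[0, 1], [0, 1]])  # split the square into 3x3 sections

counts, x_edges, y_edges = np.histogram2d(x, y, **grid)     # observations per section
sums, _, _ = np.histogram2d(x, y, weights=values, **grid)  # total value per section
means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

print(counts)
```

To draw it, `plt.imshow(counts.T, origin="lower", extent=[0, 1, 0, 1])` shows the grid as a heatmap. The transpose is needed because the first axis of the result follows `x`.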

paper trellis Apr 1, 2022, 7:49 AM

#

safe elk 2D histogram

Thanks! I'll give it a try, from pictures looks like something that's really applicable for my case

safe elk Apr 1, 2022, 7:50 AM

#

hybrid mica why is logistic regression a classification model?

It's a binary classifier

safe elk Apr 1, 2022, 7:50 AM

#

paper trellis Thanks! I'll give it a try, from pictures looks like something that's really app...

Actually used that once lmao

hybrid mica Apr 1, 2022, 7:50 AM

#

does it output a number between 0 and 1? so logistic regression cannot be used with more than 2 groups? couldn't you use any regression model for binary classification?

safe elk Apr 1, 2022, 7:51 AM

#

hybrid mica does it output a number between 0 and 1? so logistic regression cannot be used w...

Two groups only

#

Ok regression can be done with two groups if we set a threshold value

hybrid mica Apr 1, 2022, 7:54 AM

#

so
features... binary value
1 x y z 1
2 a b c 0
...

safe elk Apr 1, 2022, 7:54 AM

#

https://en.wikipedia.org/wiki/Logistic_regression

Logistic regression

In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event taking place, such as the probability of a team winning, of a patient being healthy, etc. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc. Each object being d...

#

Says Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail which is represented by an indicator variable, where the two values are labeled "0" and "1"

#

So binary class

hybrid mica Apr 1, 2022, 7:55 AM

#

is the output of logistic regression a binary class, or a decimal between 0 and 1; will it say 0, or something like 0.21?

#

source: https://www.analyticsvidhya.com/blog/2020/11/popular-classification-models-for-machine-learning/

Analytics Vidhya

Saurabh Gupta

Classification Models in Machine Learning | Classification Models

Classification is a basic type of problem every data scientist must know. Let's have a look at various classification models in ML.

safe elk Apr 1, 2022, 8:00 AM

#

If it is a binary logistic regression 0 or 1 ...if plain logistic regression it can have intermediate I think

#

So least squares in linear regression vs sigmoid in logistic regression

#

https://realpython.com/logistic-regression-python/

Logistic Regression in Python – Real Python

In this step-by-step tutorial, you'll get started with logistic regression in Python. Classification is one of the most important areas of machine learning, and logistic regression is one of its basic methods. You'll learn how to create, evaluate, and apply a model to make predictions.

#

Sample there did binary class

#

Typical logistic regression use

#

Because they did a "common case of logistic regression applied to binary classification"
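A sketch of the point being worked out here, in plain NumPy. Logistic regression outputs a probability between 0 and 1 (the sigmoid of a linear score), and it only becomes a 0/1 classifier once you apply a threshold, 0.5 by convention. The weights below are hypothetical, not fitted. In scikit-learn, `LogisticRegression` exposes these two outputs as `predict_proba` and `predict`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for two features (not fitted to any data)
w = np.array([1.5, -0.8])
b = -0.2

X = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 2.0]])
proba = sigmoid(X @ w + b)           # continuous output, e.g. 0.21-style values
labels = (proba >= 0.5).astype(int)  # the 0/1 class comes from thresholding
print(proba.round(3), labels)
```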

safe elk Apr 1, 2022, 8:12 AM

#

pseudo wren ;-; don't make me cry rex

Dont cry it wont fix your code lmao

safe elk Apr 1, 2022, 8:19 AM

#

rotund isle I have a question For sufficiently complex feature mappings, what problematic ...

It is a linear classifier, like linear regression: if the relationship between the variables is nonlinear or complex, the model won't be very accurate

rotund isle Apr 1, 2022, 8:33 AM

#

safe elk It is linear classifier like linear regression if the relationship between the ...

yeah but that's not particular to logistic regression, that's in linear regression too

mint palm Apr 1, 2022, 8:34 AM

#

#

#

are these perfectly fine?

#

i hope

#

🤞

safe elk Apr 1, 2022, 8:44 AM

#

rotund isle yeah but that's not particular to logistic regression, that's in linear regressi...

If the distribution is Gaussian, use linear regression; if it's binomial, use logistic regression... using logistic regression where the distribution doesn't match will hurt model accuracy

hybrid mica Apr 1, 2022, 9:12 AM

#

how do you get the coefficients and intercept for multiple linear regression?
is jupyter notebook recommended for data science / machine learning?

safe elk Apr 1, 2022, 10:02 AM

#

hybrid mica how do you get the coefficients and intercept for multiple linear regression? is...

See https://codeburst.io/multiple-linear-regression-sklearn-and-statsmodels-798750747755

Medium

Multiple Linear Regression: Sklearn and Statsmodels

In my last article https://medium.com/@subarna.lamsal1/linear-regression-normally-vs-with-seaborn-fff23c8f58f8 , I gave a brief…

#

Jupyter notebook good for EDA and small scale...if you are going to deploy your model to prod avoid it

odd meteor Apr 1, 2022, 10:37 AM

#

Stochastic Gradient Descent updates the weights n times per epoch (one full pass over the data).

n = sample size / number of observations.

So if your data has 5000 rows, the sample size = 5000, and SGD updates the weights once per sample, i.e. 5000 times per epoch in this case.

The main difference between Gradient Descent & SGD is just how often the algorithm updates the weights.

(Batch) Gradient Descent uses all the data to compute a single update, so it updates the weights just once per epoch. Its smooth steps make it more prone to getting stuck in a local minimum, and the noisier SGD updates can help escape one.

There's also Mini-Batch Stochastic Gradient Descent.

They are all variants of Gradient Descent. The major difference to me is just how each algorithm updates the weight.

odd meteor Apr 1, 2022, 10:42 AM

#

safe elk Jupyter notebook good for EDA and small scale...if you are going to deploy your ...

I usually use JNB for modelling part, then switch to VSCode for deployment.

safe elk Apr 1, 2022, 10:44 AM

#

odd meteor I usually use JNB for modelling part, then switch to VSCode for deployment.

Makes sense...

odd meteor Apr 1, 2022, 10:44 AM

#

hybrid mica how do you get the coefficients and intercept for multiple linear regression? is...

By calling `model.coef_` for weights (slope in Statistics) and
`model.intercept_` for bias (in ML lingo) 😀
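A minimal sketch with scikit-learn's `LinearRegression` on hypothetical noise-free data built from y = 2 + 3·x1 - x2, so the fitted attributes are easy to check. `coef_` holds one weight per feature and `intercept_` holds the bias.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 (no noise), so the fit is exact
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]

model = LinearRegression().fit(X, y)
print(model.coef_)       # one weight per feature, ≈ [3, -1]
print(model.intercept_)  # bias term, ≈ 2
```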

small orbit Apr 1, 2022, 11:46 AM

#

How do I get pandarallel to work? It just runs for hours without any error messages, and the progress bars show 0.00%. code: https://pastebin.com/WTHKDDSp


#

https://usercontent.irccloud-cdn.com/file/02tYZoCT/Screenshot.png

drowsy hemlock Apr 1, 2022, 12:51 PM

#

how does this code give me a negative values of y?

#

```python
import numpy as np
import matplotlib.pyplot as plt

x=np.arange(0,100)
y=x**5
plt.scatter(x,y)
plt.show()
```

#

serene scaffold Apr 1, 2022, 1:20 PM

#

@drowsy hemlock thanks for giving a reproducible example! looks like the problem is that your integers are overflowing. if you do x = np.arange(0, 100).astype(float), the problem should go away.

#

the orange line is when I converted everything from integers to floats

#

#

!e

import numpy as np
result = np.arange(0, 100, 10) ** 5
print(result)

arctic wedgeBOT Apr 1, 2022, 1:31 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | [         0     100000    3200000   24300000  102400000  312500000
002 |   777600000 1680700000 3276800000 5904900000]

serene scaffold Apr 1, 2022, 1:31 PM

#

hmmmmmm, bad example, I guess

drowsy hemlock Apr 1, 2022, 1:34 PM

#

serene scaffold @drowsy hemlock thanks for giving a reproducible example! looks like the ...

ok thanks a lot.....

#

so will .astype fix overflow error while handling large datasets

serene scaffold Apr 1, 2022, 1:37 PM

#

the problem is the size of the numbers, not the size of the dataset

drowsy hemlock Apr 1, 2022, 1:37 PM

#

i meant large numbers in the datasets😅
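A likely reason the bot's eval above didn't reproduce it: the bot runs on Linux, where NumPy's default integer is int64, while on Windows NumPy versions before 2.0 default to int32. The int32 maximum is 2,147,483,647, and 80**5 = 3,276,800,000 already exceeds it. NumPy integer arrays wrap around silently instead of raising, which is where the negative values come from. This sketch pins the dtypes so the difference shows on any platform.

```python
import numpy as np

x32 = np.arange(0, 100, dtype=np.int32)
x64 = np.arange(0, 100, dtype=np.int64)

y32 = x32 ** 5               # wraps around once values pass 2**31 - 1
y64 = x64 ** 5               # 99**5 ≈ 9.5e9 still fits in int64
yf = x32.astype(float) ** 5  # float64: no wraparound (exact integers only up to 2**53)

print(np.iinfo(np.int32).max)  # 2147483647
print(y32[80], y64[80])        # 80**5 overflows int32 only
print((y32 < 0).any(), (y64 < 0).any())
```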

orchid heart Apr 1, 2022, 2:01 PM

#

Hi everybody, I have a quick question, maybe someone has an idea. I am trying to learn what kind of model-name branding different companies use. Quick example:
apple: iphone 13s, iphone 8s, iphone 12 pro…
Samsung: galaxy S21, galaxy A52, galaxy note…
So if a new model name (G note 54) occurs without its brand, my model should guess the most probable brand.
I know it's quite easy here, but this is just for explanation.
My main issue is that I don't know how to transform my model names into a vector for the ML algorithm to use.
All I find are libraries for real text document classification, but in my case it's not really a whole text, only different model names that I want to transform, and I need to be able to transform newly occurring model names in the same way.

#

i am trying ascii encoding right now but idk if this is the smartest way

lavish rune Apr 1, 2022, 2:24 PM

#

Make a Python program that continuously asks the user for a positive integer greater than 0. For every number entered, your program must display pass if the sum of the squares of the even digits is less than the sum of the squares of the odd digits; otherwise it must display fail. No input checking is required.

#

I’m stuck on this question

#

Can anyone help me

serene scaffold Apr 1, 2022, 2:26 PM

#

@lavish rune this isn't a data science question, so try a help channel. see #❓｜how-to-get-help

serene scaffold Apr 1, 2022, 2:30 PM

#

orchid heart Hi everybody i have quick question maybe someone has an idea. I am trying to lea...

you can't really do this in a generalized way. knowing that something is the "iPhone 12" only tells you the brand and the order it was put on the market. those are two features. the names of Samsung phones tell you different things (there's the S series and the A series, whereas iPhone just has one series)

mild dirge Apr 1, 2022, 2:31 PM

#

odd meteor Stochastic Gradient Descent updates the weight n-times. n = sample size /numbe...

Thanks for the reply. But my problem was more that people just consider mini-batch GD as SGD

#

Some people just use them as synonyms which is confusing :/

serene scaffold Apr 1, 2022, 2:31 PM

#

odd meteor Stochastic Gradient Descent updates the weight n-times. n = sample size /numbe...

.bm

desert oar Apr 1, 2022, 2:32 PM

#

@odd meteor @mild dirge the point i was making is that they are synonymous because they have the same properties. it's just scaling the batch size from 1 to "N".

#

yes, smaller batches => more weight updates and smaller (but possibly more erratic) update steps

#

batch size is a parameter to the optimization algorithm, just like learning rate is
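A sketch of desert oar's framing in NumPy, for plain linear regression on hypothetical noise-free data, using a single function whose only knob is `batch_size`. With n rows, each epoch makes ceil(n / batch_size) updates. So batch size 1 is SGD (n updates per epoch, as odd meteor described), batch size n is full-batch GD (one update per epoch), and anything in between is mini-batch. The learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit linear-regression weights by mini-batch gradient descent on mean squared error.

    batch_size=1 gives stochastic GD; batch_size=len(X) gives full-batch GD.
    Returns the weights and the total number of weight updates performed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # MSE gradient on this batch only
            w -= lr * grad
            updates += 1
    return w, updates

# Hypothetical noise-free data from y = 1 + 2x (first column of X is the bias term)
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x

w_sgd, n_sgd = minibatch_gd(X, y, batch_size=1)     # 20 updates per epoch
w_mini, n_mini = minibatch_gd(X, y, batch_size=8)   # ceil(20 / 8) = 3 updates per epoch
w_full, n_full = minibatch_gd(X, y, batch_size=20)  # 1 update per epoch
print(n_sgd, n_mini, n_full)                         # 1000 150 50
```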

orchid heart Apr 1, 2022, 2:33 PM

#

serene scaffold you can't really do this in a generalized way. knowing that something is the "iP...

oh okay but won't the model learn that model names including iphone are more likely apple, and that a model name including galaxy or note is samsung?

serene scaffold Apr 1, 2022, 2:33 PM

#

orchid heart oh okay but wont the model learn model names including iphone is more likely app...

well, iPhone is always apple.

#

these aren't things you need ML for.

desert oar Apr 1, 2022, 2:34 PM

#

orchid heart oh okay but wont the model learn model names including iphone is more likely app...

you can do that, yes. you can tokenize the model name into words and even just look at a frequency table. most model names will uniquely identify a company.
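A sketch of the tokenize-and-count idea in plain Python, using the phone names from orchid heart's example as hypothetical training data. Each name is split into lowercase words, the word counts are tallied per brand, and the words of a new name vote on its brand.

```python
from collections import Counter, defaultdict

# Hypothetical training data taken from the example above
known = {
    "apple": ["iphone 13s", "iphone 8s", "iphone 12 pro"],
    "samsung": ["galaxy s21", "galaxy a52", "galaxy note"],
}

# frequency table: word -> how many times it appeared under each brand
token_brand_counts = defaultdict(Counter)
for brand, names in known.items():
    for name in names:
        for token in name.lower().split():
            token_brand_counts[token][brand] += 1

def guess_brand(name):
    """Sum the per-brand counts of every known word in `name`; None if no word is known."""
    votes = Counter()
    for token in name.lower().split():
        votes.update(token_brand_counts.get(token, Counter()))
    return votes.most_common(1)[0][0] if votes else None

print(guess_brand("G note 54"))  # 'note' was only seen under samsung
```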

orchid heart Apr 1, 2022, 2:34 PM

#

serene scaffold well, iPhone is *always* apple.

like i said, it's just a simplified example, but that's what i am generally trying to do

desert oar Apr 1, 2022, 2:34 PM

#

the best you can do for forecasting is just incrementing the number at the end 😆

compact rose Apr 1, 2022, 2:34 PM

#

Hello guys, I have a question. I was working on a dataset and something caught our attention. We are working with a Spotify dataset where we want to predict song popularity. We are now in the data preparation phase and we saw that some songs are duplicated. However, we can't just call duplicated() because there are songs with the same name but different values (in other words, they just share the name, but the songwriter is different or it's a cover). In the screenshot that I posted, you can see six rows, where four of them are repeated and the others aren't. The only thing that changes is the song popularity value; they are the same song because they have the same values in all the other features. We want to keep the highest value (or the mean) and remove the duplicated rows. What line of code can I use?
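A sketch for the exact case described here, where rows match on every feature except popularity, assuming the data is in a pandas DataFrame. The column names are hypothetical stand-ins for the real ones. Option 1 keeps the most popular row in each group, and option 2 replaces each group with its mean popularity.

```python
import pandas as pd

# Hypothetical rows: the first three are the same song and differ only in song_popularity;
# row 3 has a different duration, so it counts as a different version
df = pd.DataFrame({
    "song_name": ["Hey", "Hey", "Hey", "Hey", "Other"],
    "song_duration_ms": [200000, 200000, 200000, 215000, 180000],
    "speechiness": [0.05, 0.05, 0.05, 0.07, 0.10],
    "song_popularity": [60, 72, 65, 40, 55],
})

feature_cols = [col for col in df.columns if col != "song_popularity"]

# Option 1: sort by popularity so drop_duplicates keeps the highest one in each group
keep_max = (
    df.sort_values("song_popularity", ascending=False)
      .drop_duplicates(subset=feature_cols, keep="first")
      .sort_index()
)

# Option 2: collapse each group of identical features to its mean popularity
keep_mean = df.groupby(feature_cols, as_index=False)["song_popularity"].mean()

print(keep_max)
```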

serene scaffold Apr 1, 2022, 2:35 PM

#

desert oar you can do that, yes. you can tokenize the model name into words and even just l...

it's not clear if they're actually doing NLP or if they're just trying to encode the product names as features for some other purpose

desert oar Apr 1, 2022, 2:36 PM

#

compact rose Hello guys, I have a question. I was working on a dataset and something ca...

copying my response from the help channel:

a song is not uniquely identified by its name. you can have 2 songs of the same name by 2 different artists
they might be 2 different versions of the same song, or they might be 2 completely different songs that have the same title. you can have 2 different versions of the same song by the same artist that are substantially different. you can have 2 cover versions of the same song that are more or less interchangeable from a listener's perspective.

so my first question is: why do you even want to deduplicate at all? what does "unique" actually mean in your project?

#

and copying your reply:

Oh, I didn't notice that you had a channel for data science, sorry. However, if you look at the columns, there are various columns that show us whether anything changed. Some of them are completely different songs, and we can check that by looking at the values (I just noticed that my screenshot didn't catch any of them), but others only change the song popularity value, so we can tell they are duplicates! But I appreciate your answer, because it made me think about it!

orchid heart Apr 1, 2022, 2:38 PM

#

serene scaffold it's not clear if they're actually doing NLP or if they're just trying to encode...

maybe it's actually encoding as features, because i want to find the nearest neighbour of an unknown model name afterwards to guess the brand
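One way to turn short names into vectors for a nearest-neighbour lookup, sketched with scikit-learn. Instead of ASCII codes, it uses character n-grams (2- and 3-grams via `TfidfVectorizer(analyzer="char_wb")`), so names that share fragments like "note" or "iphone" end up close under cosine distance. The training names are the hypothetical examples from above, and the n-gram range and `n_neighbors=1` are arbitrary choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training names and brands from the example above
names = ["iphone 13s", "iphone 8s", "iphone 12 pro", "galaxy s21", "galaxy a52", "galaxy note"]
brands = ["apple", "apple", "apple", "samsung", "samsung", "samsung"]

# char_wb builds character n-grams inside word boundaries, e.g. "note" -> " n", "no", "not", ...
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
X = vectorizer.fit_transform(names)

# nearest neighbour under cosine distance; new names go through the same fitted vectorizer
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(X, brands)

print(knn.predict(vectorizer.transform(["G note 54", "iPhone 14"])))
```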

desert oar Apr 1, 2022, 2:38 PM

#

so you intend to use the genre scores to detect duplicates, that is a great idea @compact rose

#

you can't really solve this with one line of code though

#

you can use scipy.spatial.distance.pdist to compute the pairwise distances between all rows in an array, and then set a threshold on distance

#

or you can use hdbscan or hierarchical clustering

#

you could even manually label pairs and train a model, but that seems like overkill in this case

#

you can make this a lot faster if you apply the distance operation within groups of song titles, because songs with different titles probably aren't duplicates

#

of course you might need to do some text processing on the song titles first
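A sketch of the pairwise-distance idea, assuming a pandas DataFrame. Rows are compared only within groups of lightly normalized titles, and the numeric features are min-max scaled so that duration in milliseconds doesn't dominate. Pairs whose Euclidean distance falls under a threshold get flagged. The column names, the scaling and `threshold=0.01` are all assumptions to tune on the real data.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def near_duplicate_pairs(df, feature_cols, title_col="song_name", threshold=0.01):
    """Index pairs of rows whose normalized titles match and whose scaled features are close."""
    feats = df[feature_cols].astype(float)
    # min-max scale each feature so large-valued columns don't swamp 0-1 audio features
    span = (feats.max() - feats.min()).replace(0, 1)
    scaled = (feats - feats.min()) / span
    title_key = df[title_col].str.strip().str.lower()  # light text cleanup of titles
    pairs = []
    for _, group in df.groupby(title_key):             # only compare songs with the same title
        if len(group) < 2:
            continue
        dist = squareform(pdist(scaled.loc[group.index].to_numpy()))  # Euclidean by default
        i, j = np.triu_indices(len(group), k=1)
        close = dist[i, j] <= threshold
        idx = group.index.to_numpy()
        pairs.extend((int(a), int(b)) for a, b in zip(idx[i[close]], idx[j[close]]))
    return pairs

# Hypothetical rows; "hey " differs from "Hey" only by case and whitespace
songs = pd.DataFrame({
    "song_name": ["Hey", "hey ", "Hey", "Other"],
    "song_duration_ms": [200000, 200000, 230000, 200000],
    "speechiness": [0.05, 0.05, 0.30, 0.05],
})
print(near_duplicate_pairs(songs, ["song_duration_ms", "speechiness"]))
```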

compact rose Apr 1, 2022, 2:43 PM

#

desert oar _copying my response from the help channel:_ a song is not uniquely identified ...

Yeah, but notice that it would change the values of the columns. If one was live, the value in the liveness column would increase. If it was from another person, at least the speechiness would change by some decimal. I just want to perfect it and learn how we can change it 😄

desert oar Apr 1, 2022, 2:43 PM

#

btw where did you get this dataset? seems interesting to work with

desert oar Apr 1, 2022, 2:43 PM

#

compact rose Yeah, but notice that it would change the values of columns. If one was live, th...

what do you mean "from another person"? what is each row in this dataset exactly?

compact rose Apr 1, 2022, 2:47 PM

#

desert oar what do you mean "from another person"? what is each row in this dataset exactly...

Check rows 4, 5, 6 and 7. In those rows, all values are the same except song popularity. If it were from another person, the speechiness would change, because it measures how speech-like the track is. We can also look at the song duration column, which gives the length of each song. This is an assumption, but I think most covers, remakes or similar versions differ by at least a second or a millisecond.

compact rose Apr 1, 2022, 2:48 PM

#

desert oar btw where did you get this dataset? seems interesting to work with

It is a class project. I am learning Python and this is for a predictive model. However, if you search for a Spotify predictive dataset, you can find it ^^ I can also pass you the dataset if you wish

gleaming osprey Apr 1, 2022, 2:49 PM

#

I found a maths function that might be useful as a error function, should I show?

#

maybe I didnt find it, but its cool

odd meteor Apr 1, 2022, 2:50 PM

#

desert oar @odd meteor @mild dirge the point i was making is that the...

Yes that's true.

desert oar Apr 1, 2022, 2:50 PM

#

compact rose Check rows 4, 5, 6 and 7. In those rows, all values are the same except so...

so each row is one user's judgement of a song? as in, it's their opinion of how the song sounds?

#

or each row is one song on spotify?

desert oar Apr 1, 2022, 2:51 PM

#

compact rose Check rows 4, 5, 6 and 7. In those rows, all values are the same except so...

yes, song duration is also a good indicator of similarity. maybe good enough that you don't need to mess with the genre stuff

#

i see a lot of spotify datasets compiled from their web api

compact rose Apr 1, 2022, 2:54 PM

#

It would be a Spotify song, at least that is the information we had. However, one thing our teacher made us understand is that datasets always have many errors, and in most of the projects we work on, we are going to use 75% of our resources in data preparation. So we think some of these values may have been put there on purpose by the professor, so we can learn how to solve them! 😄

compact rose Apr 1, 2022, 2:55 PM

#

desert oar i see a lot of spotify datasets compiled from their web api

I can't say it is, however the columns are very similar to it. We have done some research and found there what the acceptable values for each column would be

compact rose Apr 1, 2022, 2:56 PM

#

desert oar yes, song duration is also a good indicator of similarity. maybe good enough tha...

I think it is the best. At least that is how we are going to justify deleting some rows in this dataset

desert oar Apr 1, 2022, 2:58 PM

#

compact rose It would be a Spotify song, at least that is the information we had. Howe...

75% of our resources in data preparation
yep, that is to be expected

#

sounds like you are going in the right direction then

compact rose Apr 1, 2022, 3:02 PM

#

Thank you a lot! It means a lot to hear that! Sometimes I find myself spending hours figuring out how to do something simple, but I think that's the learning curve talking to me ^^ But I have a question: would you guys recommend learning from Datacamp? Our professor enrolled us as 'premium' members, and I have been using my free time to learn there. I am now learning more things to do in Python, but I also want to learn SQL and Power BI. Do any of you have any references for those? ^^

hybrid mica Apr 1, 2022, 3:19 PM

#

jupyter notebook vs jupyter lab?

misty flint Apr 1, 2022, 3:39 PM

#

doesnt matter

#

kekHands

vestal ocean Apr 1, 2022, 4:29 PM

#

how can i reverse the effect of this?

#

dont want to print the full df anymore

serene scaffold Apr 1, 2022, 4:31 PM

#

vestal ocean how can i reverse the effect of this?

I guess you can replace the Nones with ints
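Assuming the earlier call was something like `pd.set_option("display.max_rows", None)` (the screenshot isn't available), here is a sketch of both fixes: reset the option to pandas' default, or lift the limit only inside an `option_context` block. The `df` here is a hypothetical stand-in.

```python
import pandas as pd

# Assumed: the setting that made pandas print every row
pd.set_option("display.max_rows", None)

# Undo it globally: back to pandas' default truncation
pd.reset_option("display.max_rows")

# Or show everything for one print only, leaving the global setting alone
df = pd.DataFrame({"n": range(5)})  # hypothetical stand-in DataFrame
with pd.option_context("display.max_rows", None):
    print(df)
```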

tough frigate Apr 1, 2022, 4:31 PM

#

why not just use .head()?
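The screenshot isn't preserved, so this sketch assumes the setting in question was `pd.set_option("display.max_rows", None)`, the usual way to make pandas print every row. `pd.reset_option` puts a single option back to its default, and `pd.option_context` limits a change to one `with` block so there is nothing to undo afterwards. The 100-row frame is made up for illustration.

```python
import pandas as pd

# Made-up frame, long enough to be truncated by default
df = pd.DataFrame({"x": range(100)})

pd.set_option("display.max_rows", None)   # assumed earlier setting: print every row
pd.reset_option("display.max_rows")       # undo it: back to pandas' default limit
short = repr(df)                          # truncated again (first and last few rows)

# Alternative: show everything only inside this block, nothing to undo afterwards
with pd.option_context("display.max_rows", None):
    full = repr(df)
```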

#

By the way, I wanna have a solid understanding of data preparation, got any resources? i'd appreciate that.

arctic crown Apr 1, 2022, 4:34 PM

#

please help, i want to get started with ML but i don't know where to start

serene scaffold Apr 1, 2022, 4:36 PM

#

arctic crown please help, i want to get started with ML but i don't know where to start

did you learn about k nearest neighbors yet?

arctic crown Apr 1, 2022, 4:37 PM

#

not yet, i want to start from the beginning

serene scaffold Apr 1, 2022, 4:38 PM

#

that pretty much is the beginning. it's one of the simplest kinds of classifiers.
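To make "one of the simplest kinds of classifiers" concrete, here is a from-scratch k-nearest-neighbours sketch: to classify a new point, find the k training points closest to it and take a majority vote of their labels. The toy data and the `knn_predict` name are made up for illustration; scikit-learn's `KNeighborsClassifier` is the library version of the same idea.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify one point x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)  # count the labels among them
    return votes.most_common(1)[0][0]             # most frequent label wins

# Two made-up clusters: "a" around (1, 1), "b" around (5, 5)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # a
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # b
```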

tough frigate Apr 1, 2022, 4:41 PM

#

serene scaffold that pretty much is the beginning. it's one of the simplest kinds of classifiers...

let him first understand what ML is, followed by supervised and unsupervised learning, you know

arctic crown Apr 1, 2022, 4:41 PM

#

tough frigate let him first understand what is ML and followed by Supervised and Unsupervised ...

yea

#

where can i do that?

#

i haven't found a good tutorial yet

tough frigate Apr 1, 2022, 4:43 PM

#

you can go through kaggle's official site

#

pretty much decent tuts

vestal ocean Apr 1, 2022, 5:04 PM

#

if i have a dataframe like this, how could i plot a line graph with 10 different lines, with date on the x axis and streams on the y?

#

the 10 different lines for 10 different artists

tough frigate Apr 1, 2022, 5:07 PM

#

use seaborn

#

and set hue="Artist"

vestal ocean Apr 1, 2022, 5:08 PM

#

tough frigate and set hue="Artist"

could u possibly provide an example?

#

not familiar with seaborn

jaunty belfry Apr 1, 2022, 5:09 PM

#

tough frigate Apr 1, 2022, 5:09 PM

#

vestal ocean could u possibly provide an example?

dry lichen Apr 1, 2022, 5:10 PM

#

hai

jaunty belfry Apr 1, 2022, 5:10 PM

#

can someone tell me what .values[0] does here? I checked the documentation, but there they only use .values, not .values[] with an index

vestal ocean Apr 1, 2022, 5:11 PM

#

tough frigate

thank you, i take it flights is the name of the df?

tough frigate Apr 1, 2022, 5:11 PM

#

yep

dry lichen Apr 1, 2022, 5:11 PM

#

is there any projects or teams where I could join to learn some stuff about this topic?

tough frigate Apr 1, 2022, 5:12 PM

#

jaunty belfry can someone tell what is .values[0] here? I checked the documentation but their ...

it's fetching the first value from the array that you get from .values

jaunty belfry Apr 1, 2022, 5:14 PM

#

tough frigate its fetching the first value from the array that you get after .values

there is only one value being fetched by the code

#

without values[0]

#

@tough frigate

tough frigate Apr 1, 2022, 5:15 PM

#

lol your code is weird, it says: if your income's first value is less than 35000, then return the gender's first value, or else return male

jaunty belfry Apr 1, 2022, 5:17 PM

#

no @tough frigate ...it's just picking a random row from the dataset

#

then checking if this random row's income is less than 35000

tough frigate Apr 1, 2022, 5:17 PM

#

what are you trying to do there?

jaunty belfry Apr 1, 2022, 5:17 PM

#

maybe you misread

#

its just a question in my course

#

and this values[0] thing confused me

#

its a quiz

#

basically

tough frigate Apr 1, 2022, 5:20 PM

#

.values results in an array of income, and if you do .values[0] that means it fetches the first value from your income column
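A small illustration of what `.values[0]` does on a one-row sample (the frame below is a made-up stand-in for the quiz data). `df.sample(1)` returns a one-row DataFrame, `.values` turns a column of it into a length-1 NumPy array, and `[0]` pulls out the scalar. A length-1 NumPy array happens to be accepted by `if` (its truth value is that of its single element), which may be why `[0]` looked unnecessary; a pandas Series, however, raises `ValueError` in an `if`. `.iloc[0]` or `.item()` are common alternatives for getting the scalar.

```python
import pandas as pd

# Hypothetical stand-in for the quiz data
df = pd.DataFrame({"income": [30000, 52000, 41000],
                   "gender": ["F", "M", "F"]})

row = df.sample(1, random_state=0)   # a one-row DataFrame, not a single value
incomes = row["income"].values       # NumPy array of length 1
income = incomes[0]                  # the scalar stored inside it

# `if row["income"] < 35000:` would raise ValueError, because a pandas Series
# has no single truth value; comparing the scalar is unambiguous.
if income < 35000:
    answer = row["gender"].values[0]
else:
    answer = "M"
```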

jaunty belfry Apr 1, 2022, 5:27 PM

#

@tough frigate ...thanks

#

it just wasn't needed in this code

#

thats what was confusing

vestal ocean Apr 1, 2022, 5:33 PM

#

@tough frigate im trying to rotate the x-axis labels but i get this output

tough frigate Apr 1, 2022, 5:34 PM

#

lol i don't remember everything, just go through the documentation

vestal ocean Apr 1, 2022, 5:40 PM

#

tough frigate lol i dont remember every stuff, just go through the documentation

yep found it, is it possible to show specific ticks like for each month and not the individual days without adding them together for the month?

#

cos i was a little unsure how to go about adding all the values for each month
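One way to get one tick per month while still plotting every daily point is to give the x-axis a `matplotlib.dates.MonthLocator` and a `DateFormatter`; the data itself is left untouched. The date range and the `Date`/`Streams` column names below are assumptions standing in for the real DataFrame.

```python
import matplotlib
matplotlib.use("Agg")            # off-screen rendering; not needed in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily data spanning several months
dates = pd.date_range("2022-01-01", "2022-04-30", freq="D")
df = pd.DataFrame({"Date": dates, "Streams": range(len(dates))})

fig, ax = plt.subplots()
ax.plot(df["Date"], df["Streams"])                        # every daily point is still drawn
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one major tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))  # labels like "Jan 2022"
fig.autofmt_xdate()                                       # rotate labels so they don't overlap
```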

#

for a df like this, how could i add all the streams for each month for each artist?
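A sketch for monthly totals per artist, assuming the DataFrame is in long format with `Date`, `Artist`, and `Streams` columns (names guessed from the thread, data made up): convert each date to its month with `.dt.to_period("M")`, then group by artist and month and sum.

```python
import pandas as pd

# Hypothetical daily streams for two artists
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-05", "2022-01-20", "2022-02-03",
                            "2022-01-07", "2022-02-10", "2022-02-11"]),
    "Artist": ["A", "A", "A", "B", "B", "B"],
    "Streams": [10, 20, 5, 1, 2, 3],
})

monthly = (
    df.assign(Month=df["Date"].dt.to_period("M"))   # 2022-01-05 -> Period 2022-01
      .groupby(["Artist", "Month"], as_index=False)["Streams"]
      .sum()                                        # one row per (artist, month)
)
```

For plotting, `monthly["Month"].dt.to_timestamp()` turns the periods back into dates.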

inland mantle Apr 1, 2022, 5:48 PM

#

what is the difference between a framework and a library? For example, OpenCV and TensorFlow

misty flint Apr 1, 2022, 5:58 PM

#

from what i understand, a framework is more "opinionated" than a library - that is my loose working definition

#

kekHands

robust jungle Apr 1, 2022, 6:03 PM

#

does anyone have any simple projects to get comfortable with the basics of tensorflow?

#

just finished a few of google's tutorials on it

bright garden Apr 1, 2022, 6:22 PM

#

robust jungle does anyone have any simple projects to get comfortable with the basics of tenso...

MNIST is a very popular dataset to get into neural networks

#

https://www.tensorflow.org/datasets/catalog/mnist

#

Though I'm not sure if you're comfortable with the level of programming they're doing for this

#

But definitely a classic to try out whenever you can

robust jungle Apr 1, 2022, 6:26 PM

#

bright garden MNIST is a very popular dataset to get into neural networks

wonderful

#

thanks

glad flume Apr 1, 2022, 6:45 PM

#

can someone help me with matplotlib?

#

i didnt get the meaning of this image

#

pseudo wren Apr 1, 2022, 6:54 PM

#

```py
import json

def helper_function_1(x):
    (tuple1, tuple2) = x
    dictionary = json.loads(tuple2)
    return dictionary["race_ethnicity"]
```

#

helper_function_1(tooples)

#

so i am trying to unpack my tuple in order to access the race and ethnicity section of the dictionary housed inside of it

#

however when i call the helper function i wrote, it says too many values to unpack

#

how should i go about fixing this

serene scaffold Apr 1, 2022, 7:03 PM

#

inland mantle what is the difference between a framework and a library? For example, OpenCV and...

libraries have functions and classes that you can import and use however you'd like. frameworks tend to have a bunch of parts that partially implement some solution, and you supply a few key parts, and it uses them to achieve the rest.

it's also sort of a matter of perspective. the two neural network libraries are library-like when you're just making tensors and using them, but start to behave more like frameworks if you use the Sequential class to make a model, since that involves saying what layers you want, and it manages passing data through it.

#

CC @misty flint ^

misty flint Apr 1, 2022, 7:07 PM

#

this is a good mental framework kekHands

#

thanks for the CC

#

DoggoKek

tough frigate Apr 1, 2022, 7:08 PM

#

pseudo wren how should i go about fixing this

Give more variables when you unpack. First learn what parameters that function takes and what output it returns

misty flint Apr 1, 2022, 7:09 PM

#

serene scaffold libraries have functions and classes that you can import and use however you'd l...

start to behave more like frameworks if you use the Sequential class to make a model, since that involves saying what layers you want, and it manages passing data through it.
def 100% agree with this perspective part; i can see this making sense for both pieces

pseudo wren Apr 1, 2022, 7:09 PM

#

tough frigate Give more variables when you unpack, first learn what parameters does that funct...

i should explain a bit better

#

so right now i am experimenting with my first sqlite3 server

#

to put it in the server i had to pack it into a tuple

misty flint Apr 1, 2022, 7:10 PM

#

stelercus, if you had a substack, i would subscribe and read it tbh @serene scaffold

#

kekHands

pseudo wren Apr 1, 2022, 7:10 PM

#

i am now trying to unpack the tuple to manipulate the data inside

#

i created the function to unpack the tuple to access the data

#

i thought if i tried entering my toople variable as a parameter, it would work
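A likely cause of "too many values to unpack", assuming `tooples` holds the whole result of a query (for example from `cursor.fetchall()`): that is a list with one tuple per row, so `(tuple1, tuple2) = x` tries to spread thousands of rows over two names. Calling the helper once per row avoids that. The in-memory table, its column names, and the sample JSON are hypothetical.

```python
import json
import sqlite3

def helper_function_1(x):
    # x is ONE row: (id, json_text)
    (tuple1, tuple2) = x
    dictionary = json.loads(tuple2)
    return dictionary["race_ethnicity"]

# Hypothetical table mirroring the setup described: an id plus a JSON string
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(1, json.dumps({"race_ethnicity": "group A"})),
     (2, json.dumps({"race_ethnicity": "group B"})),
     (3, json.dumps({"race_ethnicity": "group C"}))],
)

rows = conn.execute("SELECT id, payload FROM records ORDER BY id").fetchall()

# helper_function_1(rows) raises ValueError: 3 rows can't be unpacked into 2 names.
# Apply the helper to each row instead:
values = [helper_function_1(row) for row in rows]
```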

serene scaffold Apr 1, 2022, 7:11 PM

#

misty flint stelercus, if you had a substack, i would subscribe and read it tbh <@!253696366...

what is that

errant fern Apr 1, 2022, 7:13 PM

#

can i ask python pipeline related questions here ?

tough frigate Apr 1, 2022, 7:13 PM

#

pseudo wren i thought if i tried entering my toople variable as a parameter, it would work

I don't think that's how it works: if your tuple has 3 elements, then when unpacking you either create three different variables or use an asterisk to get the remaining elements inside a single variable, making it a tuple

errant fern Apr 1, 2022, 7:13 PM

#

errant fern can i ask python pipeline related questions here ?

I'm getting this : Attempting to weight transformer "elo_offensive_1", but it is not present in transformer_list.

serene scaffold Apr 1, 2022, 7:14 PM

#

errant fern I'm getting this : Attempting to weight transformer "elo_offensive_1", but it is...

sounds like you're doing a deep learning thing. try showing the whole error message from Traceback and the relevant code in a markdown block

#

!traceback

pseudo wren Apr 1, 2022, 7:15 PM

#

tough frigate I don't think that's how it works, if your tuple has 3 elements, so when unpacki...

i understand the number of arguments has to be equal to the number of elements

#

however this would be an inconvenient way to solve this

#

as this is an entire data set with thousands of elements

errant fern Apr 1, 2022, 7:15 PM

#

serene scaffold sounds like you're doing a deep learning thing. try showing the whole error mess...

can i share the entire Colab notebook instead? i don't know what's relevant in this case

pseudo wren Apr 1, 2022, 7:15 PM

#

so what i'm looking for is a way to pass an argument that will unpack the entire thing

serene scaffold Apr 1, 2022, 7:16 PM

#

errant fern can i share the entire colab notebook instead ? cuz i don't know what are releva...

you can put the link to it here, but people are more likely to help when the question is distilled.

tough frigate Apr 1, 2022, 7:17 PM

#

pseudo wren as this is an entire data set with thousands of elements

Try using * before your variable name, maybe that works; there is a way to unpack all the remaining elements into a single variable
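A quick sketch of starred unpacking (the sample tuple is made up). The `*` goes before the name, at most one starred name is allowed per assignment, and it collects the leftovers into a list, not a tuple.

```python
record = ("id-1", "2022-04-01", "group A", "extra", "more")

first, *rest = record       # first -> "id-1", rest -> every remaining item
*head, last = record        # head -> all but the last item, last -> "more"
a, b, *middle, z = record   # the starred name can also sit in the middle

print(type(rest).__name__)  # list
```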

errant fern Apr 1, 2022, 7:18 PM

#

serene scaffold you can put the link to it here, but people are more likely to help when the que...

oooh i see

misty flint Apr 1, 2022, 7:18 PM

#

serene scaffold what is that

what is a substack? heres an example https://technically.substack.com/

#

and they usually are sent to your emails too

#

so you get articles in your inbox

#

this 'Technically' guy is hilarious tbh

errant fern Apr 1, 2022, 7:20 PM

#

serene scaffold sounds like you're doing a deep learning thing. try showing the whole error mess...

ValueError Traceback (most recent call last)

<ipython-input-188-caa54ef5dede> in <module>()
1 #fitting
----> 2 pipeline.fit(df_train_test, y_oh)

6 frames

/usr/local/lib/python3.7/dist-packages/sklearn/pipeline.py in _validate_transformer_weights(self)
1059 if name not in transformer_names:
1060 raise ValueError(
-> 1061 f'Attempting to weight transformer "{name}", '
1062 "but it is not present in transformer_list."
1063 )

ValueError: Attempting to weight transformer "elo_offensive_1", but it is not present in transformer_list.

#

pipeline.fit(df_train_test, y_oh) : the line of code with problem

serene scaffold Apr 1, 2022, 7:23 PM

#

misty flint what is a substack? heres an example https://technically.substack.com/

how is it different from a blog

misty flint Apr 1, 2022, 7:29 PM

#

serene scaffold how is it different from a blog

it is a blog

#

kekHands

errant fern Apr 1, 2022, 7:35 PM

#

errant fern pipeline.fit(df_train_test, y_oh) : the line of code with problem

pipeline = Pipeline([

# Feature Union to concatenate features
('union', FeatureUnion(
    transformer_list=[

        # Pipeline elo scores team 1
        ('elo_scores_1', Pipeline([
            ('elo_sc_1', ItemSelector(key=['elo_offensive_1','elo_defensive_1', 'elo_home_offensive_1', 
                                          'elo_home_defensive_1'])),
            ('MinMaxScaler', MinMaxScaler()), 
        ])),
        
        # Pipeline elo scores team 2
        ('elo_scores_2', Pipeline([
            ('elo_sc_2', ItemSelector(key=['elo_offensive_2','elo_defensive_2', 'elo_away_offensive_2', 
                                          'elo_away_defensive_2'])),
            ('MinMaxScaler', MinMaxScaler()), 
        ])),
        
    ],

    # 1.0 for all
    transformer_weights={
        'elo_offensive_1': 1.0,
        'elo_defensive_1': 1.0,
        'elo_home_offensive_1' : 1.0,
        'elo_home_defensive_1' : 1.0,
        'elo_offensive_2': 1.0,
        'elo_defensive_2': 1.0,
        'elo_away_offensive_2' : 1.0,
        'elo_away_defensive_2' : 1.0,
    },
)),

#Classifieur
('Classifieur', LinearSVC(random_state = 1, verbose=1)),

])

#LinearSVC(random_state = 1, verbose=1)
#RandomForestClassifier(n_estimators = 100, max_depth = 4, min_samples_split = 500)
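The traceback points at `transformer_weights`: its keys must be the names given in `transformer_list` (here `'elo_scores_1'` and `'elo_scores_2'`), not the column names inside each sub-pipeline, so a key like `'elo_offensive_1'` is reported as "not present in transformer_list". Below is a trimmed sketch with the keys fixed. `ItemSelector` is not part of scikit-learn, so a column-selecting `FunctionTransformer` stands in for it, and the random data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler
from sklearn.svm import LinearSVC

cols_1 = ["elo_offensive_1", "elo_defensive_1", "elo_home_offensive_1", "elo_home_defensive_1"]
cols_2 = ["elo_offensive_2", "elo_defensive_2", "elo_away_offensive_2", "elo_away_defensive_2"]

def select(cols):
    # Stand-in for the notebook's ItemSelector: keep only these DataFrame columns
    return FunctionTransformer(lambda X: X[cols])

pipeline = Pipeline([
    ("union", FeatureUnion(
        transformer_list=[
            ("elo_scores_1", Pipeline([("select", select(cols_1)), ("scale", MinMaxScaler())])),
            ("elo_scores_2", Pipeline([("select", select(cols_2)), ("scale", MinMaxScaler())])),
        ],
        # Keys are the transformer names above, not column names
        transformer_weights={"elo_scores_1": 1.0, "elo_scores_2": 1.0},
    )),
    ("Classifieur", LinearSVC(random_state=1)),
])

# Made-up training data with the same column names
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(40, 8)), columns=cols_1 + cols_2)
y = (X["elo_offensive_1"] > X["elo_offensive_2"]).astype(int)
pipeline.fit(X, y)
```

Since every weight is 1.0, dropping `transformer_weights` altogether would give the same result.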

vestal ocean Apr 1, 2022, 8:27 PM

#

Does anyone know how i could add the streams for each day for all the months for each artist?

#

My df looks like this

lavish swift Apr 1, 2022, 8:32 PM

#

anyone know of any good resources for how to work with the apply() method on a dataframe?

hybrid mica Apr 1, 2022, 8:36 PM

#

# Evaluating the Model Performance
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

in the sklearn library, do these lines of code calculate the R-squared value or the adjusted R-squared value?

agile cobalt Apr 1, 2022, 8:39 PM

#

lavish swift anyone know of any good resources for how to work with the `apply()` method on a...

step 1: don't
jokes aside, look up pandas dataframe apply and check the documentation. That said, you should avoid using it as much as possible - pandas supports many operations in a way that will be much faster than whatever you want to use apply() for
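To illustrate the point about `apply()`: a row-wise `apply` calls a Python function once per row, while the equivalent column arithmetic runs as a single vectorized operation and gives the same result. The tiny `price`/`qty` frame is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise apply: one Python function call per row (slow on large frames)
total_apply = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorized: a single operation over whole columns
total_vec = df["price"] * df["qty"]
```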

#

!d sklearn.metrics.r2_score

arctic wedgeBOT Apr 1, 2022, 8:40 PM

#

sklearn.metrics.r2_score

```py
sklearn.metrics.r2_score(y_true, y_pred, *, sample_weight=None, multioutput='uniform_average')
```

R² (coefficient of determination) regression score function.

Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R² score of 0.0.

Read more in the [User Guide](https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score).

agile cobalt Apr 1, 2022, 8:40 PM

#

Note that r2_score calculates unadjusted R² without correcting for bias in sample variance of y.
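Since `r2_score` returns plain R², adjusted R² has to be computed by hand with the usual formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples scored (e.g. `len(y_test)`) and p the number of predictors; it needs n > p + 1. The `adjusted_r2` helper and the numbers below are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# Made-up test targets and predictions from a model with 2 predictors
y_test = np.array([3.0, -0.5, 2.0, 7.0, 4.2, 1.1])
y_pred = np.array([2.5, 0.0, 2.1, 7.8, 3.9, 1.0])

r2 = r2_score(y_test, y_pred)
adj = adjusted_r2(y_test, y_pred, n_features=2)   # always <= r2 when r2 < 1
```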

lavish swift Apr 1, 2022, 8:45 PM

#

agile cobalt step 1: don't jokes aside, look up `pandas dataframe apply` and check the docume...

Thanks! too funny...I'm often the person telling people not to use apply() 😂 so I definitely get it. In my case though, since I'm working with parsing and evaluating strings series/columns, it seems like the most appropriate route.

agile cobalt Apr 1, 2022, 8:47 PM

#

depending on what it's for, you might be able to just use string accessor methods, maybe with a bit of regex
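A sketch of the "string accessor plus a bit of regex" route, on a made-up column of version strings: `str.extract` returns one column per capture group (NaN where the pattern doesn't match) and `str.contains` gives a boolean mask, both without `apply()`.

```python
import pandas as pd

# Hypothetical column of version strings
s = pd.Series(["v1.2.3", "v10.0.1", "release-2.5.0", "n/a"])

# One capture group per output column; NaN where the pattern doesn't match
parts = s.str.extract(r"(\d+)\.(\d+)\.(\d+)")
parts.columns = ["major", "minor", "patch"]

# Boolean filter with the same pattern
has_version = s.str.contains(r"\d+\.\d+\.\d+")
```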

lavish swift Apr 1, 2022, 8:48 PM

#

sadly, that's not going to work in this case.

#

but you're right about the pandas docs...which I usually find fairly helpful! but for apply, they really aren't.

#

I'm definitely looking to dig more into complicated apply

serene river Apr 1, 2022, 9:09 PM

#

does anyone have a good vtk tutorial ?

misty flint Apr 1, 2022, 9:29 PM

#

i thought this was funny

#

if you haven't seen GPT-3's Codex model, i'd check it out

mild dirge Apr 1, 2022, 10:01 PM

#

Looking at some lecture slides concerning deep Q learning/network (DQN), and the slide says the following:

#

I don't fully understand what the difference is between expected maximum value, and maximum expected value

#

Found some explanation online, but that did not really clarify it

#

This explanation

#

Could someone maybe give an example or a more intuitive explanation?

misty flint Apr 1, 2022, 10:28 PM

#

pithink

#

ah i see

#

hmm idk if i can explain it over mobile tbh

#

i might sketch something if i was on desktop

#

maybe someone else can help

mild dirge Apr 1, 2022, 10:47 PM

#

Yeah dw, I'm going to sleep; if someone replies I'll be able to see it tomorrow, but I would appreciate it, thanks in advance 🙂

stuck schooner Apr 1, 2022, 11:52 PM

#

Hey, can anyone give me some quick help? I am using StandardScaler() on a dataframe; it works well, but it returns an ndarray, so I lose the index and other information as well

#

I found this post : https://stackoverflow.com/questions/35723472/how-to-use-sklearn-fit-transform-with-pandas-and-return-dataframe-instead-of-num

#

Which mentions the sklearn_pandas library

agile cobalt Apr 1, 2022, 11:54 PM

#

You shouldn't really care about losing that metadata

#

if you need it back later, pass it when creating another dataframe
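A minimal sketch of passing the metadata back in: `fit_transform` returns a bare NumPy array, but the original frame's `index` and `columns` can be handed straight to `pd.DataFrame`, so row labels from `train_test_split` survive. The column names and index values are made up.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up frame with a non-default index, as after train_test_split
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [10.0, 20.0, 40.0]},
                  index=[101, 205, 317])

scaled = StandardScaler().fit_transform(df)                           # plain NumPy array
scaled_df = pd.DataFrame(scaled, index=df.index, columns=df.columns)  # labels restored
```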

stuck schooner Apr 1, 2022, 11:54 PM

#

I have a df containing some columns that are continuous and named attr_cont = ['col1', 'col2', etc..]

agile cobalt Apr 1, 2022, 11:54 PM

#

but usually you should not do any further manipulation with it as a dataframe after scaling

stuck schooner Apr 1, 2022, 11:55 PM

#

But I think DataFrameMapper() expects a pd.Index(['col1', 'col2', etc..]) object

#

and returns those weird column names

Capture_decran_2022-04-02_a_01.55.37.png

agile cobalt Apr 1, 2022, 11:56 PM

#

why do you care about losing the index etc?

stuck schooner Apr 1, 2022, 11:56 PM

#

I am seeing no documentation on how to create such Index([]) object from a list

#

Because the indices are linked to categorical attributes contained in the dataset

#

And they have already been through train_test_split()

agile cobalt Apr 1, 2022, 11:57 PM

#

if the index itself is a feature, use df.reset_index() to throw it into a column

stuck schooner Apr 1, 2022, 11:57 PM

#

Index isn't a feature

agile cobalt Apr 1, 2022, 11:57 PM

#

if it is "linked" to another feature, you probably should not keep it anyway

stuck schooner Apr 1, 2022, 11:57 PM

#

but df= df[attribute_continuous] + df[attribute_categorical]

#

and i want to scale only the continuous attribute

agile cobalt Apr 1, 2022, 11:58 PM

#

use a ColumnTransformer instead of separating it yourself

#

https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

#

(maybe also look into Pipelines while you're at it, they allow for you to define the model - including pre-processing steps - as one single thing instead of applying multiple steps yourself)
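A sketch of the ColumnTransformer route for "scale only the continuous attributes": list the continuous columns for `StandardScaler` and pass the categorical ones through untouched. The output is still a NumPy array (scaled columns first, then the passthrough ones, in the order listed), so the labels are re-attached afterwards. Column names, data, and index are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical column lists in the spirit of attr_cont from the thread
attr_cont = ["col1", "col2"]
attr_cat = ["colour"]

df = pd.DataFrame(
    {"col1": [1.0, 2.0, 3.0, 4.0],
     "col2": [10.0, 20.0, 30.0, 40.0],
     "colour": ["red", "blue", "red", "green"]},
    index=[7, 3, 9, 1],                 # e.g. a shuffled index after train_test_split
)

ct = ColumnTransformer([
    ("scale", StandardScaler(), attr_cont),   # scale only the continuous columns
    ("keep", "passthrough", attr_cat),        # leave the categorical ones as they are
])

out = ct.fit_transform(df)   # NumPy array, columns in the order of the transformers
out_df = pd.DataFrame(out, index=df.index, columns=attr_cont + attr_cat).infer_objects()
```

In scikit-learn 1.2 and later, `ct.set_output(transform="pandas")` can return a DataFrame directly, with prefixed column names such as `scale__col1`.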

stuck schooner Apr 2, 2022, 12:00 AM

#

Thanks. The teacher just told us: "use StandardScaler on the continuous attributes in a new dataframe, then use the two dataframes to train the model again...."

#

I don't think they even realize the sort of problem this causes..

#

on the data itself ..

#

Thanks I will look into it

urban prism Apr 2, 2022, 12:16 AM

#

Can someone help me with this?
https://stackoverflow.com/questions/71704956/running-evaluation-on-tensorflow-object-detection-api-memory-allocation-issue

desert oar Apr 2, 2022, 12:27 AM

#

stuck schooner Which mentioned sklearn_pandas library

You don't need this package now, look up ColumnTransformer

stuck schooner Apr 2, 2022, 12:28 AM

#

desert oar You don't need this package now, look up ColumnTransformer

It returns a NumPy array and doesn't transform the df itself. I'm digging deeper into it right now

#

There is this solution, but it implies going through a Pipeline with the last step being:
("pandarizer", FunctionTransformer(lambda x: pd.DataFrame(x, columns=["x", "y"])))

#

https://datascience.stackexchange.com/a/90627

iron basalt Apr 2, 2022, 1:39 AM

#

mild dirge Could someone maybe give an example or a more intuitive explanation*?

Expected value of a die is 3.5, roll that die a bunch of times and take the max, that value is probably above the expected value (overestimated).

#

So in Q-learning, the values are pretty bad at first and so it will probably overestimate badly. So it will explore a bunch of overestimated state space making learning slow at first, and even after a long time, it's still generally an overestimation (not as bad as it is at first).

#

Expected-max, and max-expected, the order matters (you can try with a die).

#

(Which will probably be larger?)
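The die hint can be checked exactly. With two fair dice, the maximum of the expected values is max(3.5, 3.5) = 3.5, while the expected value of the maximum of the two rolls is 161/36 ≈ 4.47. The same thing happens in Q-learning: when several actions have noisy value estimates, taking the max over those estimates is biased upward even if each estimate is individually unbiased (in general E[max] ≥ max E, because max is convex; this is Jensen's inequality). The simulation at the end assumes five actions whose true values are all 0 and whose estimates carry standard-normal noise, purely for illustration.

```python
import random
from fractions import Fraction
from itertools import product

faces = range(1, 7)
die_mean = Fraction(sum(faces), 6)   # 7/2 for a fair die

# Max of the expected values: both dice have mean 7/2
max_of_expected = max(die_mean, die_mean)

# Expected value of the max: average max(a, b) over all 36 equally likely outcomes
expected_of_max = Fraction(sum(max(a, b) for a, b in product(faces, faces)), 36)

print(max_of_expected)   # 7/2
print(expected_of_max)   # 161/36, about 4.47

# Q-learning flavour: 5 actions with true value 0, each estimated with unit noise.
# The max of the noisy estimates is positive on average, i.e. an overestimate.
random.seed(0)
trials = 10_000
avg_max_estimate = sum(
    max(random.gauss(0.0, 1.0) for _ in range(5)) for _ in range(trials)
) / trials
```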

hybrid mica Apr 2, 2022, 7:45 AM

#

classifier = LogisticRegression(random_state=0)

changing the random state value on this line does not seem to do anything to my results - would anyone happen to know why?

tough frigate Apr 2, 2022, 8:30 AM

#

Random state says no matter how many times you run your model, it will take the same observations to get you the same results. So removing that parameter will result in some change.

#

Geez, these bank datasets are awful, a lot to clean there

toxic marlin Apr 2, 2022, 10:44 AM

#

https://imgur.com/a/3W9Hmfk guys, what could I say about this image? it's a web server's data in Splunk

#

with what appears to be their assets/pages

#

and i did the least common ones

#

could i argue these are the ones that may have the least bugs/or most bugs because nobody visits them to try and exploit them?

exotic thicket Apr 2, 2022, 11:32 AM

#

Hello guys, is there a solved problem on the concepts of perspective projection, weak projection, and orthographic projection?

jaunty belfry Apr 2, 2022, 12:03 PM

#

@tough frigate

#

import numpy as np

def add_to_neg_elements(mat, num):
    mat[mat<0] = mat[(mat<0)+num] 
    return mat

#

Can you tell what's wrong with this code? I'm getting an error: IndexError: index 101 is out of bounds for axis 0 with size 4

tacit basin Apr 2, 2022, 12:46 PM

#

jaunty belfry ```py import numpy as np def add_to_neg_elements(mat, num): mat[mat<0] = ma...

I think you want

mat[mat<0] = mat[mat<0] + num
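Why the original raised the IndexError: `(mat<0)+num` adds `num` to a boolean array, turning each False/True into `num`/`num + 1` (e.g. 100 and 101 when `num` is 100), and `mat[...]` then treats those integers as row indices into a 4-row array. Using the mask itself to select and update the negative entries avoids that; the 2×2 input is just an example.

```python
import numpy as np

def add_to_neg_elements(mat, num):
    """Add num to every negative entry of mat, in place, and return mat."""
    mask = mat < 0      # boolean array, same shape as mat
    mat[mask] += num    # same as mat[mask] = mat[mask] + num
    return mat

m = np.array([[-1, 2], [3, -4]])
result = add_to_neg_elements(m, 100)
print(result.tolist())  # [[99, 2], [3, 96]]
```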

tacit basin Apr 2, 2022, 12:49 PM

#

tough frigate Random state says no matter how many times you run your model, it will take the ...

there are plenty of random states to set in Python to get exactly reproducible results; it depends on the library as well
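One likely reason changing `random_state` did nothing: scikit-learn's `LogisticRegression` documentation says the parameter is only used when `solver` is `'sag'`, `'saga'`, or `'liblinear'` (to shuffle the data), and the default solver, `'lbfgs'`, is deterministic. It does not pick which observations are used. A quick check on made-up data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up binary classification data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Default solver ("lbfgs"): random_state is ignored, so both fits match
a = LogisticRegression(random_state=0).fit(X, y)
b = LogisticRegression(random_state=42).fit(X, y)
print(np.allclose(a.coef_, b.coef_))  # True
```

With `solver="saga"` (or `"sag"`/`"liblinear"`) the seed affects the shuffling, though a well-converged fit usually lands on nearly the same coefficients anyway.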

prisma mist Apr 2, 2022, 12:54 PM

#

how to pop rows from a pd df? first 5 rows.. then next 5 rows and so on

prisma mist Apr 2, 2022, 1:12 PM

#

from what i can tell there is no easy way to pop rows from a df?

bold timber Apr 2, 2022, 1:13 PM

#

I want to run a neural network model, but the kernel keeps dying like this. What happened? How do I fix this problem?

bright garden Apr 2, 2022, 1:19 PM

#

bold timber I want to run the model Neural Network, but the kernel is always dead like this...

Try to find a log, it might help you understand why the kernel died. A common cause is not enough RAM to support the loaded data

prisma mist Apr 2, 2022, 1:32 PM

#

❓ how to pop rows off a df ❓

bold timber Apr 2, 2022, 1:34 PM

#

bright garden Try to find a log, it might help you understand why the kernel died. A common ca...

can you guide me to handle this? my ram is 16gb and it still doesn't work

bright garden Apr 2, 2022, 1:35 PM

#

prisma mist ❓ how to pop rows off a df ❓

bright garden Apr 2, 2022, 1:35 PM

#

bold timber can you guide me to handle this? my ram is 16gb and it still doesn't work

Alright so I found this https://docs.qubole.com/en/latest/troubleshooting-guide/jupy-notebook-ts/analyze-jupy-notebook-logs.html

#

Try to see if you can access the Kernel logs from here

prisma mist Apr 2, 2022, 1:37 PM

#

bright garden

that's a column pop

bright garden Apr 2, 2022, 1:41 PM

#

prisma mist that's a column pop

Ohh yes, you're right

#

hmm, there doesn't seem to be an inbuilt function to do this

#

I found this df.T.pop(index)

#

and then you can transpose back to the original dataframe
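Transposing works for a single row, but `df.T` on mixed column types produces object columns. A simpler pattern for "first 5 rows, then the next 5", sketched with a hypothetical `pop_rows` helper on made-up data, is to slice with `iloc` and keep the remainder:

```python
import pandas as pd

def pop_rows(df, n):
    """Hypothetical helper: split df into (its first n rows, everything after)."""
    return df.iloc[:n], df.iloc[n:]

df = pd.DataFrame({"x": range(12)})

chunks = []
rest = df
while not rest.empty:
    chunk, rest = pop_rows(rest, 5)   # take 5 rows, keep the remainder for next time
    chunks.append(chunk)

print([len(c) for c in chunks])  # [5, 5, 2]
```

If you only need to walk through the frame in blocks, `for start in range(0, len(df), 5): block = df.iloc[start:start + 5]` does the same without reassigning anything.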

mild dirge Apr 2, 2022, 1:49 PM

#

bold timber can you guide me to handle this? my ram is 16gb and it still doesn't work

First try to just lower the data you load in (lower batch size) see if that works

#

Try like 10 or something if it's images

#

At least then you can pretty much confirm it's a memory issue

modest shuttle Apr 2, 2022, 2:08 PM

#

Hello,
Which is better, Prophet or NeuralProphet?

hybrid mica Apr 2, 2022, 2:33 PM

#

what does this loss actually represent? (this is an artificial neural network for regression) is there a good rule of thumb to determine how many hidden layers and how many neurons in each hidden layer to use in an artificial neural network?

serene scaffold Apr 2, 2022, 2:34 PM

#

hybrid mica what does this loss actually represent? (this is an artificial neural network fo...

the loss is an aggregate measurement of how far off your model is from getting it right. it looks like your loss is the same after each epoch, so your model is no longer learning.

hybrid mica Apr 2, 2022, 2:35 PM

#

the decrease in loss is incredibly slow from 30 to 100; should i just stop the epochs at 30?

serene scaffold Apr 2, 2022, 2:35 PM

#

hybrid mica the decrease in loss is incredibly slow from 30 to 100; should i just stop the e...

once the rate of change for the loss gets this slow, you might as well stop, yes.
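A rule like "stop once the loss stops improving much" is what early stopping automates. The hypothetical `should_stop` helper is a plain-Python sketch of that logic: stop when the best loss of the last `patience` epochs is not at least `min_delta` lower than the best loss before them. Keras ships a callback for this, `tf.keras.callbacks.EarlyStopping(monitor=..., min_delta=..., patience=...)`, usually pointed at validation loss rather than training loss.

```python
def should_stop(losses, patience=5, min_delta=1e-4):
    """Hypothetical helper: True once the best loss of the last `patience` epochs
    is not at least `min_delta` lower than the best loss seen before them."""
    if len(losses) <= patience:
        return False                      # not enough history yet
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta

# Loss still dropping quickly: keep training
print(should_stop([1.0, 0.8, 0.6, 0.4, 0.2, 0.1], patience=2, min_delta=1e-3))  # False
# Loss has flattened out: stop
print(should_stop([1.0, 0.5, 0.3, 0.2999, 0.29995, 0.29992, 0.29991, 0.29993],
                  patience=5, min_delta=1e-3))  # True
```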

#

I don't know that there is

hybrid mica Apr 2, 2022, 2:37 PM

#

it says 'Adding the input layer and the first hidden layer' but there is just one line of code in that cell. Is this code incorrect? Could you please explain?

bold timber Apr 2, 2022, 2:38 PM

#

mild dirge First try to just lower the data you load in (lower batch size) see if that work...

Previously I used batch_size = 128, then I tried 64 and it still doesn't work. Why is this happening?

serene scaffold Apr 2, 2022, 2:38 PM

#

hybrid mica it says 'Adding the input layer and the first hidden layer' but there is just on...

ann is a Sequential network, which just means that for any data that goes into it, it's a straight pass through the sequence of layers. Then you have two dense layers that the data passes through. and then the output layer is the one with the answer, so to speak.

mild dirge Apr 2, 2022, 2:39 PM

#

bold timber Previously I use batch_size = 128, then I try to use 64 and still doesn't work. ...

Did you try a really low amount like 10 or 5?

hybrid mica Apr 2, 2022, 2:39 PM

#

but where is the first input layer created in the lines of code? and where is the first hidden layer created? there is one line of code but two things are created?

serene scaffold Apr 2, 2022, 2:40 PM

#

that might actually be a mistake in the code. I only see three layers created here.

hybrid mica Apr 2, 2022, 2:41 PM

#

so only the input layer is being created here?

serene scaffold Apr 2, 2022, 2:43 PM

#

hybrid mica so only the input layer is being created here?

as far as I can tell, that's the first/input layer, and is thus not a hidden layer. where did you get this notebook?

#

you can do len(ann.layers) to see how many layers it has. but make sure you've run each cell exactly once when you do that, or you'll get an incorrect answer.

hybrid mica Apr 2, 2022, 2:47 PM

#

ok - they say that the input layer is automatically created depending on the number of features; ill check it with what you have advised to do

hybrid mica Apr 2, 2022, 2:50 PM

#

serene scaffold you can do `len(ann.layers)` to see how many layers it has. but make sure you've...

output of this is 3

serene scaffold Apr 2, 2022, 2:56 PM

#

hybrid mica output of this is 3

then I guess the notebook is wrong

vestal ocean Apr 2, 2022, 3:04 PM

#

How could i create a legend and have it in this format for a seaborn plot?

modest shuttle Apr 2, 2022, 3:07 PM

#

Hello,
Which is better, Prophet or NeuralProphet?

bold timber Apr 2, 2022, 3:34 PM

#

mild dirge Did you try a really low amount like 10 or 5?

I just tried a batch size of 10, but it still doesn't work

mild dirge Apr 2, 2022, 3:35 PM

#

Try updating your libraries maybe

chrome junco Apr 2, 2022, 3:38 PM

#

vestal ocean How could i create a legend and have it in this format for a seaborn plot?

is this matplotlib?

vestal ocean Apr 2, 2022, 3:39 PM

#

chrome junco is this matplotlib?

Yep

chrome junco Apr 2, 2022, 3:40 PM

#

you have to plot like this

#

plt.legend(loc='upper left')

vestal ocean Apr 2, 2022, 3:41 PM

#

I've currently got my graphs set out like this for all 6 subplots

vestal ocean Apr 2, 2022, 3:41 PM

#

chrome junco ``` plt.legend(loc='upper left') ```

i tried that but i wanted to have it on its own not in a graph

chrome junco Apr 2, 2022, 3:43 PM

#

handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center')

#

https://stackoverflow.com/questions/9834452/how-do-i-make-a-single-legend-for-many-subplots-with-matplotlib

#

There is also a nice function get_legend_handles_labels() you can call on the last axis (if you iterate over them) that would collect everything you need from label= arguments:

#

you basically make them into one big axes and it pulls the legend from them all into one

#

@vestal ocean

vestal ocean Apr 2, 2022, 3:44 PM

#

chrome junco you basically make them into one big axes and it pulls the legend from them all ...

So would i need to change the code structure of mine?

chrome junco Apr 2, 2022, 3:45 PM

#

yes,

#

you would need to make them into subplots

#

ax1

#

ax2

#

etc

#

https://matplotlib.org/3.5.0/api/_as_gen/matplotlib.pyplot.subplots.html

#

I have a question for people that work with LSTM models

#

I am trying to predict asset prices from 60 days into the future

#

and it takes historical data and trains based off that data

#

but i have a problem with over fitting

#

ill show you

#

Epoch 1/500
237/237 [==============================] - 12s 52ms/step - loss: 3.8080e-04

#

Epoch 100/500
237/237 [==============================] - 11s 44ms/step - loss: 1.3932e-04

#

Epoch 200/500
237/237 [==============================] - 11s 45ms/step - loss: 1.1043e-04

#

Epoch 300/500
237/237 [==============================] - 11s 46ms/step - loss: 7.9581e-05

#

Epoch 400/500
237/237 [==============================] - 11s 47ms/step - loss: 6.8104e-05

#

Epoch 500/500
237/237 [==============================] - 11s 47ms/step - loss: 5.6542e-05

#

as you can see it goes from 1-4 until like 275, then it skyrockets and hovers around 9

vestal ocean Apr 2, 2022, 3:50 PM

#

chrome junco you would need to make them into subplots

Any idea how I could move it into the place of the empty subplot?

chrome junco Apr 2, 2022, 3:50 PM

#

idk why my model is overfitting

chrome junco Apr 2, 2022, 3:51 PM

#

vestal ocean any idea how i could move it to take the place of the empty plot?

to replace the first subplot with the legend?

vestal ocean Apr 2, 2022, 3:51 PM

#

this is the code I'm using

#

yeah

#

for the plot^

chrome junco Apr 2, 2022, 3:52 PM

#

the reason it's there is because of the fig.legend

#

it's applying to the full figure

#

you would need to apply it only to the subplot you want it in

#

I'll look online for some examples to make myself clearer

#

https://www.google.com/search?q=apply+legend+to+subplot&rlz=1C1CHBF_enCA971CA971&oq=apply+legend+to+subplot&aqs=chrome..69i57.4244j0j7&sourceid=chrome&ie=UTF-8

#

https://stackoverflow.com/questions/27016904/matplotlib-legends-in-subplot

Matplotlib legends in subplot

#

this is for multiple legends, one for each subplot

#

so far that's all I can find

#

but I think that if you apply the full legend to a single subplot of your choice it should work
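Here is a minimal sketch of that idea. It assumes the empty slot is the first subplot of a 3x3 grid; both that placement and the cosine data are placeholders. It hides that subplot's axes and draws the legend inside it, using handles collected from a populated subplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig, axes = plt.subplots(3, 3, figsize=(9, 9))
legend_ax = axes.flat[0]  # assumed: the subplot left empty for the legend

# Placeholder data in the other eight subplots
for ax in axes.flat[1:]:
    for k in range(1, 4):
        ax.plot(x, np.cos(k * x), label=f"line {k}")

handles, labels = axes.flat[1].get_legend_handles_labels()
legend_ax.axis("off")  # hide the empty subplot's frame and ticks
legend = legend_ax.legend(handles, labels, loc="center")
```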

#

https://paste.pythondiscord.com/curocufofe

#

my LSTM stock prediction model

#

If anyone has time to tell me why it's overfitting, please don't hesitate to DM me

warped dagger Apr 2, 2022, 4:53 PM

#

yo, can someone help me find a dataset with different graphs like a straight line, parabola, hyperbola, etc. for an ML program? (couldn't find anything on Kaggle 😭)
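If no ready-made dataset turns up, you can generate one. Here is a minimal sketch that assumes a classification setup: each sample is a noisy curve sampled at 50 points, and its label is the curve's shape. The coefficient ranges, the noise level and the `make_curve_dataset` name are all arbitrary choices.

```python
import numpy as np


def make_curve_dataset(n_per_class=100, n_points=50, noise=0.05, seed=0):
    """Return (X, y, class_names): noisy lines, parabolas and hyperbolas."""
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 5.0, n_points)  # start at 1 so a / x stays finite
    shapes = {
        "line": lambda a, b: a * x + b,
        "parabola": lambda a, b: a * x**2 + b,
        "hyperbola": lambda a, b: a / x + b,
    }
    X, y = [], []
    for label, f in enumerate(shapes.values()):
        for _ in range(n_per_class):
            a = rng.uniform(0.5, 2.0)   # random scale
            b = rng.uniform(-1.0, 1.0)  # random offset
            X.append(f(a, b) + rng.normal(0.0, noise, n_points))
            y.append(label)
    return np.array(X), np.array(y), list(shapes)
```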

arctic wedgeBOT Apr 2, 2022, 5:23 PM

#

Hey @thin palm!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

vestal ocean Apr 2, 2022, 5:29 PM

#

If I have a graph like this that currently shows the streams for the top 10 global artists, could anyone recommend a way of also showing the top 10 European artists on the same plot?

#

Preferably not in the same line format, and I may reduce it from top 10 to top 5 so it's cleaner

thin palm Apr 2, 2022, 5:32 PM

#

vestal ocean If i have a graph like this that is currently showing the streams for the top 10...

instead of a solid line plot you could use dashed lines

#

so the European artists stand out

vestal ocean Apr 2, 2022, 5:34 PM

#

thin palm instead of a line plot you could do dash plots

Do you mean a combination of both?

#

Like solid lines for global and dashed lines for European?
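Here is a minimal matplotlib sketch of that combination: solid lines for the global artists and dashed lines for the European ones. The artist names and stream counts are placeholders, not real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

weeks = list(range(1, 9))
# Placeholder stream counts (not real data)
global_top = {
    "Artist A": [5, 6, 7, 7, 8, 9, 9, 10],
    "Artist B": [4, 4, 5, 6, 6, 7, 8, 8],
}
european_top = {
    "Artist C": [3, 3, 4, 4, 5, 5, 6, 6],
    "Artist D": [2, 3, 3, 4, 4, 4, 5, 5],
}

fig, ax = plt.subplots(figsize=(8, 4))
for name, streams in global_top.items():
    ax.plot(weeks, streams, linestyle="-", label=f"{name} (global)")
for name, streams in european_top.items():
    ax.plot(weeks, streams, linestyle="--", label=f"{name} (Europe)")
ax.set_xlabel("Week")
ax.set_ylabel("Streams")
ax.legend()
```

Colors still cycle automatically, so the line style alone separates the two groups. Trimming each dict to five artists gives the cleaner top-5 version.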

vestal ocean Apr 2, 2022, 5:39 PM

#

thin palm instead of a line plot you could do dash plots

like this?

cunning condor Apr 2, 2022, 6:44 PM

#

Hi all, I was wondering what the best way (resources) is to learn AI using Python. Are there any books, videos, courses or modules that you guys can recommend?

I am currently studying Computer Science A-Level (College-equivalent).

#

I'm in my final year and will be heading off to uni later this year.

cunning condor Apr 2, 2022, 7:58 PM

#

Awesome. I'll start with those then, thank you so much 🙂

thin palm Apr 2, 2022, 8:40 PM

#

vestal ocean like this?

This is also an option. You could also do a scatter plot and set the hue by country
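`hue` is seaborn's name for coloring points by a category, as in `seaborn.scatterplot(data=df, x=..., y=..., hue="country")`. The same effect in plain matplotlib takes one `scatter` call per country. Here is a minimal sketch with hypothetical records:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical records: (artist, country, week, streams); placeholder values
records = [
    ("Artist A", "US", 1, 9.5), ("Artist A", "US", 2, 9.8),
    ("Artist B", "UK", 1, 7.1), ("Artist B", "UK", 2, 7.4),
    ("Artist C", "DE", 1, 5.2), ("Artist C", "DE", 2, 5.0),
]

# Group points by country so each country gets its own color and legend entry
by_country = {}
for artist, country, week, streams in records:
    xs, ys = by_country.setdefault(country, ([], []))
    xs.append(week)
    ys.append(streams)

fig, ax = plt.subplots()
for country, (xs, ys) in by_country.items():
    ax.scatter(xs, ys, label=country)
ax.set_xlabel("Week")
ax.set_ylabel("Streams (millions)")
ax.legend(title="Country")
```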

cunning condor Apr 2, 2022, 9:08 PM

#

Hi, I was trying to install Face_Recognition using pip, but an error occurred. I found the issue, but was wondering if it's fine to install again. Will it overwrite the old files, or do I have to do something so that they don't collide?

#

Nvm, it just says "already satisfied" for packages that are already installed. But a new error has occurred. I'll use one of the help channels.

#

🙂

desert oar Apr 3, 2022, 12:40 AM

#

cunning condor Hi, I was trying to install Face_Recognition using PIP, but an error occur. I fo...

Yes, in general pip uninstalls the old version before installing the new one, e.g. if you upgrade versions

vagrant marsh Apr 3, 2022, 1:22 AM

#

Hi guys, I have a question: do you know the maximum number of items the apriori algorithm supports in Python?

serene scaffold Apr 3, 2022, 1:27 AM

#

vagrant marsh Hi guys, i have a question, do you know how many items at most the apriori algor...

I would think this depends on the memory of your computer rather than a set limit in the code.
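Memory is the practical limit because the number of possible itemsets explodes. With n distinct items there are C(n, k) possible k-itemsets, so the worst-case number of candidates grows very quickly. Apriori prunes any candidate with an infrequent subset, so real counts are usually far lower and this is only an upper bound. `candidate_counts` is a hypothetical helper:

```python
from math import comb


def candidate_counts(n_items, max_k):
    """Worst-case number of k-itemsets for k = 1..max_k over n_items items."""
    return {k: comb(n_items, k) for k in range(1, max_k + 1)}


counts = candidate_counts(1000, 3)
# counts == {1: 1000, 2: 499500, 3: 166167000}
```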

misty flint Apr 3, 2022, 3:39 AM

#

really interesting read if you're into RecSys and Personalization https://eugeneyan.com/writing/patterns-for-personalization/

eugeneyan.com

Patterns for Personalization in Recommendations and Search

A whirlwind tour of bandits, embedding+MLP, sequences, graph, and user embeddings.

#

the tl;dr

#

blobhyperthink

desert oar Apr 3, 2022, 3:41 AM

#

it's nice to have a tldr with actual use cases for various things

misty flint Apr 3, 2022, 3:41 AM

#

desert oar it's nice to have a tldr with actual use cases for various things

right? I need everything like this

#

kekHands

strange zealot Apr 3, 2022, 5:32 AM

#

I am working on a time series problem to predict sales. Could someone point me to a resource that helps me break down the problem?

worldly dawn Apr 3, 2022, 6:30 AM

#

strange zealot i am working on a time series problem to predict sales could someone point me to...

https://otexts.com/fpp3/ ?

ashen arch Apr 3, 2022, 7:31 AM

#

Hey, anyone got some time?
I've got a question about reinforcement learning that I want to ask

wise pelican Apr 3, 2022, 8:17 AM

#

Hoping someone with knowledge of generating animated graphs with matplotlib can help me figure this out.
My program takes in a CSV with one column for timestamps and another for the values recorded at those timestamps.
I currently animate it to show 60 timestamps of data at a time, scrolling from left to right to show the next timestamp and its recorded value.
The problem is that the timestamps are irregularly spaced, and my current function for animating the data points doesn't handle that.

```python
def init_line():
  line.set_data([], [])
  return (line,)

def animate_line(i):
  line.set_data(x.array[:i], y.array[:i])
  ax.set_xlim(x.array[i] - x.array[60], x.array[i] + x.array[60])
  return (line,)

anim = animation.FuncAnimation(
  my_fig,
  animate_line,
  init_func=init_line,
  frames=length,
  interval=float(100/6),
  blit=True,
  save_count=50,
)
```
So what happens is that 1 data point is plotted per video frame (at 60 fps that's 1/60 ≈ 0.016667 seconds), which does not match how far apart the data points actually are (0.08 s to 0.2 s).
How would I go about making each data point appear in the video at the moment given by its timestamp?

steady basalt Apr 3, 2022, 10:05 AM

#

What Cook's distance value is classed as an extreme outlier?

#

4/n is just the cutoff for non-extreme ones, right?
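Cut-offs for Cook's distance are rules of thumb rather than fixed definitions. D_i > 4/n is a common screening threshold that marks points for a closer look, and D_i > 1 is a common stricter threshold for highly influential points. The sketch below computes D_i for an ordinary least squares fit with an intercept, using D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2. Here e_i is the residual, h_ii the leverage, p the number of coefficients and s^2 the residual variance. The function name and the toy data are assumptions.

```python
import numpy as np


def cooks_distance(x, y):
    """Cook's distance for each observation of an OLS fit y ~ 1 + x."""
    X = np.column_stack([np.ones(len(x)), x])  # design matrix with intercept
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # leverages: diagonal of the hat matrix H = X (X^T X)^-1 X^T
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid**2 / (p * s2) * h / (1 - h) ** 2


# Toy data: a straight line with one corrupted point at the end
x = np.arange(20, dtype=float)
y = 2 * x + 1 + 0.1 * np.sin(x)
y[19] = 0.0
d = cooks_distance(x, y)
flag_screen = d > 4 / len(x)  # common screening threshold
flag_strict = d > 1           # common "highly influential" threshold
```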

tidal bough Apr 3, 2022, 10:23 AM

#

wise pelican Hoping anyone with knowledge of generating animated graphs with matplotlib can h...

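You can track the elapsed time in the update function by adding one frame interval, dt, to it each frame, and only move the window when the time reaches the next timestamp. Note that `interval=100/6` is in milliseconds, so dt is 1/60 s if your timestamps are in seconds.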

#

so something like

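dt = 1 / 60  # seconds per frame (interval=100/6 ms); assumes x holds timestamps in seconds
t = x.array[0]  # start the clock at the first timestamp
shown = 0  # number of points currently drawn
def animate_line(i):
    global t, shown
    t += dt
    # reveal every point whose timestamp has been reached (loop in case several fall within one frame)
    while shown < len(x.array) and t >= x.array[shown]:
        shown += 1
    if shown:
        # update the data and scroll the window to the newest point (same window width as before)
        line.set_data(x.array[:shown], y.array[:shown])
        ax.set_xlim(x.array[shown - 1] - x.array[60], x.array[shown - 1] + x.array[60])
    return (line,)
# FuncAnimation needs enough frames to reach the last timestamp, e.g.
# frames=int((x.array[-1] - x.array[0]) / dt) + 1
# with blit=True the tick labels won't redraw when xlim changes; use blit=False if they must update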
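An alternative that avoids the global clock is to precompute, for every video frame, how many samples have a timestamp at or before that frame's time, using `np.searchsorted`. Here is a minimal sketch, assuming the timestamps are in seconds and sorted; `points_per_frame` is a hypothetical helper.

```python
import numpy as np


def points_per_frame(timestamps, fps=60):
    """For each video frame, the number of samples whose timestamp has been reached."""
    ts = np.asarray(timestamps, dtype=float)
    n_frames = int(np.ceil((ts[-1] - ts[0]) * fps)) + 1
    frame_times = ts[0] + np.arange(n_frames) / fps  # wall-clock time of each frame
    return np.searchsorted(ts, frame_times, side="right")


counts = points_per_frame([0.0, 0.5, 1.5], fps=2)
# counts -> array([1, 2, 2, 3])
```

Inside `animate_line(i)` you would then call `line.set_data(x.array[:counts[i]], y.array[:counts[i]])` and pass `frames=len(counts)` and `interval=1000 / fps` to `FuncAnimation`.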

abstract sinew Apr 3, 2022, 11:04 AM

#

Hiya, I'm training a CNN on the MNIST dataset. I'm doing k-fold cross-validation right now, and I've noticed that the loss varies quite a bit between folds. Is that something I should be worried about?

Score per fold
> Fold 1 - Loss: 0.1064145416021347 - Accuracy: 97.18992114067078%
> Fold 2 - Loss: 0.05745554342865944 - Accuracy: 98.25581312179565%
> Fold 3 - Loss: 0.0768362432718277 - Accuracy: 97.86821603775024%
> Fold 4 - Loss: 0.09651391208171844 - Accuracy: 97.86821603775024%
> Fold 5 - Loss: 0.07439924031496048 - Accuracy: 98.06201457977295%
> Fold 6 - Loss: 0.1636662632226944 - Accuracy: 96.60852551460266%
> Fold 7 - Loss: 0.10331138223409653 - Accuracy: 97.48061895370483%
> Fold 8 - Loss: 0.06524728238582611 - Accuracy: 98.44810962677002%
> Fold 9 - Loss: 0.07992955297231674 - Accuracy: 97.47817516326904%
> Fold 10 - Loss: 0.09972762316465378 - Accuracy: 97.47817516326904%
------------------------------------------------------------------------
Average scores for all folds:
> Accuracy: 97.67377853393555 (+- 0.5145294907723429)
> Loss: 0.09235015846788883

#

what will you be working on

hybrid mica Apr 3, 2022, 11:07 AM

#

I'm learning how to implement deep learning

abstract sinew Apr 3, 2022, 11:07 AM

#

notebook should be fine

hybrid mica Apr 3, 2022, 11:07 AM

#

When would you use JupyterLab over Jupyter Notebook? And when would you use an IDE like Spyder?

abstract sinew Apr 3, 2022, 11:08 AM

#

Now that I look at it, JupyterLab might be nice for displaying graphs, since you can move them around in a separate window

#

I might have to look into that myself

#

yeah, Spyder might be nice too

tough frigate Apr 3, 2022, 11:22 AM

#

hybrid mica im learning how to implement deep learning

yeah, just started with that

mild dirge Apr 3, 2022, 11:27 AM

#

abstract sinew Hiya, I'm training a cnn on the MNIST dataset. I'm doing k-fold cross validation...

It doesn't seem to vary that much tbh

#

You already have high accuracy, so you only get a few cases wrong. If you get one more or one fewer wrong, the loss will be quite a bit different

abstract sinew Apr 3, 2022, 11:38 AM

#

mild dirge You already have a high accuracy, so you only guess a few cases wrong. And if yo...

haha yeah, you might be right. The problem is that I'm trying to tweak some parameters, and it's difficult to tell whether it's improving or not

mild dirge Apr 3, 2022, 11:39 AM

#

how many test samples?

#

And as long as the average loss over the folds does not have a very high deviation, it should be a good (or at least stable) measure
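One way to put a number on that deviation is to take the ten fold losses from the log above and report the mean together with the sample standard deviation and the standard error of the mean. A minimal numpy sketch:

```python
import numpy as np

# Per-fold test losses copied from the 10-fold log above
fold_losses = np.array([
    0.1064145416021347, 0.05745554342865944, 0.0768362432718277,
    0.09651391208171844, 0.07439924031496048, 0.1636662632226944,
    0.10331138223409653, 0.06524728238582611, 0.07992955297231674,
    0.09972762316465378,
])

mean = fold_losses.mean()
std = fold_losses.std(ddof=1)          # sample standard deviation across folds
sem = std / np.sqrt(len(fold_losses))  # standard error of the mean loss
print(f"loss = {mean:.4f} +- {std:.4f} (SEM {sem:.4f})")
```

The folds share most of their training data, so the fold scores are not independent and the standard error is only a rough guide. Still, a parameter change that moves the mean loss by much less than the fold-to-fold spread is hard to tell apart from noise.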

abstract sinew Apr 3, 2022, 11:41 AM

#

I'm doing 10 splits, so it's 1/10 for testing

#

out of 10318 total

mild dirge Apr 3, 2022, 11:41 AM

#

Alright

#

you get a more accurate measure with a higher number of folds

#

there's even leave-one-out cross validation
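For reference, k-fold splitting partitions the shuffled indices into k test blocks, and leave-one-out is the special case k = n. scikit-learn's `KFold` and `LeaveOneOut` handle this for you; the `kfold_indices` function below is a hypothetical minimal numpy version:

```python
import numpy as np


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; k == n_samples gives leave-one-out."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)  # shuffle once, then cut into k blocks
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)  # everything not in this block
        yield train_idx, test_idx


folds = list(kfold_indices(10318, 10))
test_sizes = [len(test) for _, test in folds]  # 1031 or 1032 samples each
```

With 10318 samples, leave-one-out would mean 10318 training runs, which is usually impractical for a CNN.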

abstract sinew Apr 3, 2022, 11:43 AM

#

I'll look into that

#

Do you think I should revert to a model where I haven't tuned the rest of the parameters yet? That should make training a bit faster too

#

And then once I've found what works best, combine all of the best parts and see where we're at

#

I think I might do that

livid crane Apr 3, 2022, 11:44 AM

#

from pycoingecko import CoinGeckoAPI
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from datetime import datetime
cg=CoinGeckoAPI()
# print(cg.get_price(ids="bitcoin",vs_currencies="eth"))


class crypto:
    def __init__(self,crypto,currency):
        self.x_cords=[]
        self.y_cords=[]
        self.crypto_curr=crypto
        self.vs_currency=currency
        plt.title(self.crypto_curr)

    def getName(self):
        return self.crypto_curr
    
    def insertion(self):
        self.x_cords.append(datetime.now())
        self.y_cords.append(cg.get_price(ids=self.crypto_curr,vs_currencies=self.vs_currency)[self.crypto_curr][self.vs_currency])
    
    def plotting(self):
        plt.plot(self.x_cords,self.y_cords)
    
    def start(self):
        plt.gcf().canvas.manager.set_window_title(f"Live Plotting {self.crypto_curr}")
        plt.tight_layout()
        FuncAnimation(plt.gcf(),self.plotting,interval=1000)
        plt.show()
        

Btc=crypto("bitcoin","usd")
Btc.insertion()
Btc.plotting()
Btc.start()

#

plot lines are not showing

#

can someone help me
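A few things in the code above are likely to stop the line from appearing:

- `FuncAnimation` calls its function with a frame number, but `plotting` takes no argument.
- The animation object is not stored, so it can be garbage-collected before it runs.
- `insertion` is only called once, and a single point drawn with the default solid line style and no marker is invisible.

Here is a minimal reworked sketch. To keep it runnable offline, it replaces the CoinGecko call with a hypothetical random-walk `fake_price`. Swap `cg.get_price(...)` back in for real data.

```python
import random
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for a live window
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def fake_price(previous):
    """Hypothetical stand-in for cg.get_price(...) so the sketch runs offline."""
    return previous + random.uniform(-50, 50)


class LivePlot:
    def __init__(self, name, start_price=40000.0):
        self.name = name
        self.x_cords = []
        self.y_cords = []
        self.last_price = start_price
        self.fig, self.ax = plt.subplots()
        self.ax.set_title(name)
        self.ax.xaxis_date()  # treat x values as datetimes
        # One line updated in place; the marker makes a single point visible
        (self.line,) = self.ax.plot([], [], marker="o")

    def update(self, frame):  # FuncAnimation passes the frame number
        self.last_price = fake_price(self.last_price)
        self.x_cords.append(datetime.now())
        self.y_cords.append(self.last_price)
        self.line.set_data(self.x_cords, self.y_cords)
        self.ax.relim()           # recompute data limits from the new points
        self.ax.autoscale_view()  # and rescale the axes to fit them
        return (self.line,)

    def start(self):
        self.fig.canvas.manager.set_window_title(f"Live Plotting {self.name}")
        # Keep a reference: an unreferenced animation can be garbage-collected and never run
        self.anim = FuncAnimation(self.fig, self.update, interval=1000, cache_frame_data=False)
        plt.show()
```

Usage: `LivePlot("bitcoin").start()`.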

mild dirge Apr 3, 2022, 11:45 AM

#

abstract sinew I think I might do that

Well, it's the combinations that give good results

#

you can keep other params constant and tune one, and then the next etc.

#

But this might not give the most satisfactory results

abstract sinew Apr 3, 2022, 11:46 AM

#

My thinking is that it should make the difference more obvious when I'm tuning if it's not already a good model

mild dirge Apr 3, 2022, 11:46 AM

#

Yeah, but you can't be sure it's the best parameter value

#

since the other values will be different

abstract sinew Apr 3, 2022, 11:46 AM

#

It should make more mistakes between each model

#

What I've been doing is combining them once I've found a good parameter

mild dirge Apr 3, 2022, 11:47 AM

#

There are more efficient search methods than a plain grid search though, I believe

#

so maybe look into that instead
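Random search is one such alternative. It samples every hyperparameter jointly on each trial instead of tuning one parameter at a time. In the minimal sketch below, `toy_val_loss` is a hypothetical stand-in for training the CNN and returning its validation loss, and the search space values are placeholders.

```python
import math
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample n_trials parameter combinations from `space`; keep the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_val_loss(lr, dropout):
    # Hypothetical objective with its minimum at lr=0.01, dropout=0.2
    return (math.log10(lr) + 2) ** 2 + (dropout - 0.2) ** 2


space = {"lr": [0.1, 0.01, 0.001], "dropout": [0.0, 0.2, 0.5]}
best_params, best_score = random_search(toy_val_loss, space, n_trials=200)
```

For continuous hyperparameters such as the learning rate, it is common to sample from a distribution (e.g. log-uniform) rather than a fixed list. More sample-efficient options such as Bayesian optimization also exist.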

abstract sinew Apr 3, 2022, 11:47 AM

#

mild dirge since the other values will be different

Yeah I'm kind of worried about that

mild dirge Apr 3, 2022, 11:47 AM

#

and MNIST shouldn't need a monstrous model, so training/testing shouldn't take too long per iteration, right?

abstract sinew Apr 3, 2022, 11:48 AM

#

haha

#

well

#

😅