#data-science-and-ml

latent glacier
#

okay thanks!!

young ridge
#

Hello guys could anyone help me out with correlation coefficients in python

#

Short background, I need to do correlation analysis using ordinal data

#

but the thing is, if I do Spearman's correlation analysis,

#

I'm not sure it's possible to use Spearman's correlation with multiple ordinal columns

#

I've done some research, but so far I haven't found anyone doing it with multiple ordinal columns

#

any advice?

#

or how do I calculate the correlation between multiple ordinal variables?
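
A minimal sketch of one way to do this, assuming the ordinal columns are already encoded as ordered integers (the column names and values here are made up for illustration). pandas can compute the Spearman rank correlation for every pair of columns in one call:

```python
import pandas as pd

# Hypothetical survey answers on 1-5 ordinal scales.
df = pd.DataFrame({
    "satisfaction": [1, 2, 3, 4, 5, 3],
    "recommend": [1, 1, 3, 5, 5, 2],
    "effort": [5, 4, 3, 2, 1, 3],
})

# Pairwise Spearman rank correlation between every pair of columns.
rho = df.corr(method="spearman")
print(rho.round(2))
```

The result is a square matrix, so each cell is still a two-variable correlation. If the columns hold labels such as "low" or "high", they would first need to be mapped to ordered integers.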

upbeat furnace
#

Hi, I'm trying to install wget into site-packages on my D drive using !pip install wget. However, my Python is located on my C drive. Is it possible to install wget on the D drive instead?

serene scaffold
#

@upbeat furnace I've only seen wget as a bash command. Not as a python package

unique flame
#

IMO exams are there to test your ability to handle tough situations by putting you under a lot of stress, similar to actual life events. Surely you have met people who lost their cool or stayed home when the situation needed them. I for one would hate to work with such a person.

serene scaffold
#

Titles are often arbitrary. A "data scientist" might still write papers.

versed gulch
#

Hi guys,

Is there a way I can create a new column based on the x column in my dataframe that holds the difference between consecutive rows, i.e. the new column will be row1 - row0 of the x column and so on?

serene scaffold
#

@versed gulch yes, you would just do df['new'] = df['a'] - df['b']

#

If you're trying to do operations between elements of the same column, please be more specific about the expected input and output

versed gulch
#

i.e. I want the first value to be Nan then do 65388.270 - 64624.593 and so on

serene scaffold
versed gulch
#

then 66151.947-65388.270

serene scaffold
#

@versed gulch look into rolling

#

Or even just diff
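
A small sketch of the diff suggestion, using the three values quoted above and assuming the column is called x as in the question:

```python
import pandas as pd

df = pd.DataFrame({"x": [64624.593, 65388.270, 66151.947]})

# Each row minus the previous row; the first row has no predecessor, so it is NaN.
df["x_diff"] = df["x"].diff()
print(df)
```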

versed gulch
#

ok thanks

#

Also is there a way to take the tile numbers that have the same y values and put them into a list?

steady basalt
#

would anyone like to help me put a bunch of dicts and lists into a dataframe? I have the process done for one df, but instead of doing it 10 times I want to do it in a loop in a single cell

#

and my brain's getting overloaded

pulsar cosmos
#

Hey, has anyone worked with Celonis / pycelonis yet and encountered an issue?

#

I'm wondering if it is just a bug or I made a mistake in my query

steady basalt
#

does anyone know why interpolating doesn't work on a list of DataFrame objects?

#

let's say I have a list of dataframes, [pd.DataFrame(list[0]), pd.DataFrame(list[1]), etc.]. Why doesn't for i in range(len(dataframes)): dataframes[i]['col'] = dataframes[i]['col'].interpolate(method='linear') work?

#

does pandas not create the dataframe as an object when reading from lists?

#

in fact, it's replacing them with None
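
A minimal sketch of the loop described above, assuming each DataFrame has a numeric column named col (the sample values are made up). Written like this, the interpolation works on a list of DataFrames. One guess about the None values: calling interpolate with inplace=True returns None, so assigning that result back would overwrite the column.

```python
import numpy as np
import pandas as pd

# Hypothetical raw lists with gaps.
raw = [[1.0, np.nan, 3.0], [10.0, np.nan, np.nan, 40.0]]
dataframes = [pd.DataFrame({"col": values}) for values in raw]

for df in dataframes:
    # interpolate returns a new Series, so assign it back to the column.
    df["col"] = df["col"].interpolate(method="linear")

print(dataframes[1])
```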

sour tide
#

hello folks... so I have this dict data with me. I just wanted to know how to divide it into two parts, preferably 70:30

```
  0.         0.04303315 0.03849002 0.         0.        ]
 [0.         1.         0.         0.12309149 0.         0.
  0.         0.         0.         0.05913124 0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.         0.         0.         0.        ]
 [0.         0.12309149 0.         1.         0.07216878 0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.07216878 1.         0.06454972
  0.         0.         0.         0.         0.        ]
 [0.0496904  0.         0.         0.         0.06454972 1.
  0.         0.         0.         0.         0.14322297]
 [0.         0.         0.         0.         0.         0.
  1.         0.         0.         0.         0.        ]
 [0.04303315 0.         0.         0.         0.         0.
  0.         1.         0.04472136 0.06201737 0.        ]
 [0.03849002 0.         0.         0.         0.         0.
  0.         0.04472136 1.         0.         0.05547002]
 [0.         0.05913124 0.         0.         0.         0.
  0.         0.06201737 0.         1.         0.        ]
 [0.         0.         0.         0.         0.         0.14322297
  0.         0.         0.05547002 0.         1.        ]]
```
steady basalt
#

nvm fixed

rotund dock
#

Hi all! Anyone familiar with sympy? I'm trying to solve an integral and need some help

lapis sequoia
#

How would i go about training a TTS model to replicate someone's voice, i have essentially a infinite amount of training data as the person is locked in my basement, and i'm just looking for a framework or a good guide.

The goal is not to make any TTS, the goal is to replicate the person's voice

sour tide
#

its stored in a variable

steady basalt
lapis sequoia
steady basalt
#

A = thelist[lenlist/5] would give u the first 20% right?

#

Sorry I missed a :

#

Put the colon before

#

Lenlist

sour tide
astral parrot
#

can we use python to make ai

steady basalt
#

Splitting it in half you index as yourlist[:len(yourlist)//2]
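
A short sketch of the 70:30 split by slicing, assuming the data is (or can be turned into) a list; the items list is a stand-in for the real data. Integer division is used because a slice index has to be an int in Python 3.

```python
items = list(range(10))          # stand-in for the real data

cut = len(items) * 7 // 10       # integer index at the 70% mark
first_part = items[:cut]         # first 70%
second_part = items[cut:]        # remaining 30%

print(len(first_part), len(second_part))
```

For a dict, the same slicing would work on list(d.items()).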

sour tide
#

sorry, it's my first time doing python so I'm new to this

steady basalt
#

I don’t understand what u want

#

I just gave u the code

sour tide
sour tide
steady basalt
#

You store variables in python by just saying x = 1

#

So firsthalf = thecodeigaveyou means first half is that first half list

astral parrot
#

can we use python to make ai

steady basalt
#

Holy shit chat today

agile cobalt
steady basalt
#

@sour tide go to main python channel not data for this it’s basic python

astral parrot
steady basalt
#

Yes python is the most popular for that

astral parrot
#

O thanks

sour tide
steady basalt
#

weird

hollow sentinel
#

did you guys know postman can give you code for python request for APIs?

#

i had no idea

ocean swallow
#

hey NLP people, is there a framework or tool where we can define a grammar, by notation or any other way, so that it stochastically creates sentences from a pool of words with POS tags?

#

NLTK pos tags are too specific for my case. All I want to basically use is S NP V

dull granite
#

spacy?

ocean swallow
#

this looks good tbh

swift furnace
#

Does anyone know if I need machine learning for image processing/computer vision?
How much can I do without machine learning?

#

Let's say I want to create a project in which I analyze an image of a heart, and then I as a result I want to know whether or not that person has some sort of disease, can I do that with only image processing/computer vision?

wooden sail
#

computer vision often, but not always involves machine learning

#

the difference is how much math you do yourself 😛

#

same with image processing, which i would usually put in a separate category, as it deals with different tasks in general (they do have some overlap)

swift furnace
wooden sail
#

then image and signal processing in general are a bad idea, unless you're willing to sink in a lot of time

#

people get masters and phds in engineering and maths for signal processing/image processing/computer vision

#

especially if you wanna do it in a medical area

swift furnace
#

I see!

#

Does that mean that if I use ML, then I wouldn't need to dive in too deep in math?

wooden sail
#

hmm you dive into different math, but depending on how novel or old it is, you don't need to do it yourself

#

also consider that when i say "do math", i don't mean you're gonna go and multiply numbers and do integrals on paper, but rather that you'll formulate problems in a clever way and recognize good solution approaches

swift furnace
#

I see!

#

When you said math, I was thinking of stuff such as calculus and linear algebra

wooden sail
#

as a dumb example, noticing that you can find the coefficients of a polynomial of arbitrary degree by doing a linear regression, even though you'd normally associate "linear" with polynomials of order 1
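
A small sketch of that example, assuming a made-up quadratic as the ground truth. The model is nonlinear in x but linear in its coefficients, so ordinary least squares on the columns 1, x, x^2 recovers them:

```python
import numpy as np

# Assumed ground truth for the demo: y = 2 - 3x + 0.5x^2
x = np.linspace(-2.0, 2.0, 21)
y = 2.0 - 3.0 * x + 0.5 * x**2

# Design matrix with columns 1, x, x^2.
A = np.vander(x, N=3, increasing=True)

# Linear least squares for the coefficient vector.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)
```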

#

calculus and linear algebra are the bare basics, you won't get anywhere without those

swift furnace
#

I see!

swift furnace
#

besides calculus and linear algebra

wooden sail
#

probability and statistics in the multivariate case, some optimization

#

image processing methods often involve basic physics and differential equations, e.g. when you optimally detect edges in an image, try to find regions that satisfy some condition, try to denoise, etc

swift furnace
#

I see! @wooden sail

wooden sail
#

and statistical signal processing itself

swift furnace
#

Image processing and computer vision are actually very interesting topics

wooden sail
#

i'd say optimization and sigproc are applications of the other topics... in a very handwavy way, because there's a lot to those topics in and of themselves

swift furnace
wooden sail
steady basalt
#

anyone wanna help me code a weird nested loop list

#

for statement

velvet rover
#

I want to plot a scatter plot with the drop down as Species in R, not in shiny. The dropdown should have the Species (setosa, versicolor, and virginica) and by selecting one the plot should change. Can someone suggest me here?

hollow sentinel
#

we do R stuff here?

dusty valve
#

can anyone recommend a good general tensorflow tutorial, like how to get a grasp of how to use it

odd meteor
quick eagle
#

I have a pressure log with oscillations from which I need to extract some timing info - anyone have suggestions on how to do so (I'm mostly on pandas):
basically - extract the 'timestamp' of each red dot, and the duration of the black bar:

#

in this particular example, crossing the '490' value would work, but unfortunately there's a significant low frequency component that makes absolute value approach useless:

mild dirge
#

So using the derivative then maybe?

#

If the value suddenly decreases rapidly, then you found the red dot

#

And if it rapidly increases, that's the end of the black bar

quick eagle
#

that would make sense - I've been shying away from adding a low pass filter because I don't want to lose timing resolution (this is a ~20Hz signal), but maybe some additional step to ensure an 'extended' drop is noted, as opposed to 'jiggles'? I'm not sure if there are adaptive peak/threshold detection tools out there (and whether those would be too complicated for this)?

mild dirge
#

Not sure about low-pass filters, but if I were to do this task, I would probably just check for each point if the point x time steps ahead is at least 10-20 lower

#

If it is, then red point, and we start looking for the end of black bar

#

which is when the point x time steps ahead is at least 10-20 higher

#

That would probably already give "decent-ish" results
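
A rough sketch of the look-ahead check described above, assuming the log is a pandas Series. The function name, the lookahead of 2 samples, the threshold of 10 and the sample values are all made up for illustration. Because it compares each point with a later point rather than with a fixed level, a slow drift in the baseline should matter less.

```python
import pandas as pd

def find_dips(pressure, lookahead=2, threshold=10.0):
    """Return (start, end) index labels of dips found by a look-ahead test."""
    ahead = pressure.shift(-lookahead)          # value `lookahead` samples later
    falling = (pressure - ahead) >= threshold   # a large drop is coming up
    rising = (ahead - pressure) >= threshold    # a large rise is coming up
    dips, start = [], None
    for label, is_falling, is_rising in zip(pressure.index, falling, rising):
        if start is None and is_falling:
            start = label                       # candidate "red dot"
        elif start is not None and is_rising:
            dips.append((start, label))         # end of the "black bar"
            start = None
    return dips

pressure = pd.Series([500, 500, 500, 480, 480, 480, 480, 500, 500, 500], dtype=float)
dips = find_dips(pressure)
print(dips)
```

With a timestamp index, the bar duration would be end minus start for each pair.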

#

If you have a lot of data, you could maybe even train a simple rolling regression model

hollow sentinel
odd meteor
hollow sentinel
#

that's good man

ocean swallow
# swift furnace what if I'm bad at math? xD

No. Don't be alarmed. If you are on the production side you don't need to know any math at all, or even stats for that matter. There are many frameworks that run out of the box. Apart from that, whether you need ML or vanilla CV really depends on the problem.

#

What kind of things are you trying to detect from heart image?

swift furnace
swift furnace
rough mountain
#

I have a 1D array of vectorized words. How do I use it with an LSTM? It wants ndim 3, but I only have 2 (batch size, vector count)

ocean swallow
#

MNIST is like the hello-world dataset, and there are many tutorials for it.

#

For computer vision

swift furnace
#

Are there interesting platforms that I should look up, related to the topics I've mentioned above?

misty flint
#

considering doing it

#

especially since im already running into problems at work with model deployment

ocean swallow
#

or like anything at all

swift furnace
misty flint
#

oh this isnt for you. this is mostly for the lurkers in chat. DoggoKek

#

theres also an academic discount for the students and an accessibility discount for those from low income countries

misty flint
#

havent read it myself, but i heard many from my podcasts say its a classic

arctic wedgeBOT
ancient fractal
#

Can someone fix my python code? I am trying to Cluster a 2D array and calculate min and max value of each cluster. I got stuck, it does not output the results that I am looking for. Here is my code: ```py
import numpy as np
from collections import defaultdict
from scipy.cluster.vq import kmeans, vq

data = defaultdict(list)

arr =[[2, 230], [2, 233], [1, 676], [2, 233], [1, 698], [2, 233], [1, 685], [2, 234], [2, 236], [2, 232], [2, 261], [1, 674], [2, 262], [2, 236], [2, 267], [1, 690], [2, 261], [2, 231], [1, 540], [2, 231], [1, 696], [2, 233], [1, 528], [2, 231], [2, 232]]

for k in arr:
    data[k[0]].append(k[1])

data = dict(data)

new_data = defaultdict(list)

check_cluster_list = [len(x) for ii,x in data.items()]

def chunk(l: list, N: int):
    return [l[i:i+N] for i in range(0, len(l), N)]

arr_d = defaultdict(list)
for entry in arr:
    arr_d[entry[0]].append(entry[1])

chunks = {
1: 3,
2: 4,
}

for k, l in arr_d.items():
    number_of_clusters = chunks[k]

    if number_of_clusters > min(check_cluster_list):
        print("Clusters cannot be larger than",min(check_cluster_list))
        raise Exception(f"Clusters cannot be larger than {min(check_cluster_list)}")


for indx, (id, y) in enumerate(data.items()):
  cluster_dict = defaultdict(list)

  # Use this id's own cluster count rather than the value left over from the loop above.
  codebook, _ = kmeans(np.array(y, dtype=float), chunks[id])
  cluster_indices, _ = vq(y, codebook)


  for i, val in enumerate(cluster_indices):
     cluster_dict[val].append(y[i])
  final_list = []
  for id_1,y_1 in cluster_dict.items():
    final_list.append([min(y_1), max(y_1)])
  new_data[id].append(final_list)


new_data = dict(new_data)
new_data = {id:y[0] for id,y in new_data.items()}
print(new_data)```
misty flint
#

published yesterday

#

More papers have been published on AI than any person can read in a lifetime. So, in your efforts to learn, it’s critical to prioritize topic selection. I believe the most important topics for a technical career in machine learning are:

  • Foundational machine learning skills.
  • Deep learning.
  • Math relevant to machine learning.
  • Software development.
#

he goes into the specifics in his article

tidal bough
#

the most important topics for a technical career in machine learning are:

  • Foundational machine learning skills.
    no shit
#

obligatory "this article was written by an AI"

misty flint
#

well his audience is for early career peeps or students

#

so he needs to state the obvious

#

please dont hate andrew ng

#

hes done a lot for the community

tidal bough
#

sure, i've done his ML course

misty flint
#

unless youre referring to that one

#

this time theres RecSys + RL in the last module of the syllabus

thorny aurora
#

yo, i built a model today which got an accuracy score of 90% and then ran some data on it from another year and got 80%, would this be a good model?

steady basalt
#

@ano

#

@spare briar

#

how is y' the same as f'x?

#

unless y=fx

thorny aurora
#

y and fx are the same thing in most contexts

steady basalt
#

how ?

#

y is a variable and fx is a function of x?

thorny aurora
#

is this a 2 dimensional field

steady basalt
#

no idea

#

prob

thorny aurora
#

do you know what a function is in general?

steady basalt
#

is this just used as an example

steady basalt
#

but then this confuses me as its being equated to y

thorny aurora
#

so basically f(x) means you input x into a formula in this case and the result y

steady basalt
#

then hes saying that if y = mx+c that multiplying by m and adding c is the function f?

thorny aurora
#

yes

steady basalt
#

so f in and of itself also exists, but I've never seen it

#

represented

thorny aurora
#

so the times m and + c are basically like applying a function to x and then spitting out y

steady basalt
#

how do u write f(x) without x just as f in text?

#

is it possible?

#

so you can have a function that u can apply to anything

#

like u said, *m+c

thorny aurora
#

uh im not sure what you're asking

#

x is just an input, it can be any number

steady basalt
#

f = *m+c

#

is that even real?

#

I've never seen someone write it that way

#

its always f(x)

thorny aurora
#

i think it's just shorthand

steady basalt
#

but can u write f on its own

#

without an x

#

just an arbitary function

thorny aurora
#

my calculus teacher did a lot

#

especially with derivatives

steady basalt
#

ive enver ever seen that, and someone told me that it doesnte xist

#

exist

#

so ive always been confused

#

someone said that f on itsown is nothing

thorny aurora
#

i mean technically no, the function needs an input

#

without the input there's no output

steady basalt
#

so this guy on the photo saying y = f(x)

#

hes just using it as an example for y = graphs?

thorny aurora
#

yes

steady basalt
#

its not always the case or its always the case in graphs

#

u can't have a graph without some sort of x

thorny aurora
#

in 2 dimensions yes

#

in 3 dimensions it changes a bit

steady basalt
#

what does it mean to say y vs y''

#

second derivative?

#

d2y/dx2 is physically on the graph what?

#

another location on the line?

#

shifted the dot so to speak?

worldly dawn
steady basalt
#

the way i see it is a position yes

#

how does velocity come into play?

#

or acceleration?

#

the difference between the derivatives?

worldly dawn
#

velocity is how fast you move

#

And acceleration is how much the velocity changes

steady basalt
#

yeah i took physics

#

but in math they never explained this

#

it was just random numbers and no actual meaning

#

solving first and second order derivatives

#

to pass a question

#

without knowing literally what it means

#

how can a new point on the line relate to velocity?

#

you mean gradient?

#

there's a new gradient

#

not sure about the acceleration part

iron basalt
# steady basalt it was just random numbers and no actual meaning

What might it feel like to invent calculus?
Special thanks to these supporters: http://3b1b.co/lessons/essence-of-calculus#thanks

In this first video of the series, we see how unraveling the nuances of a simp...

steady basalt
#

why the fuck didnt they just say so in school?

#

limit theorem wasnt touched at all

#

more like remember simple rules

#

finally learning about first principles properly

dull granite
steady basalt
#

no just 1

dull granite
#

It expands on the uses of Calculus in three dimensions.

steady basalt
#

btw, if the limit of fx is undefined

mint palm
#

for inputs x1, x2, x3, ..., does simple feature engineering, such as making a new feature x = x1*x2, give any substantial improvement?
I feel like more complex feature engineering might help more, but simple feature engineering like this doesn't do much.

steady basalt
#

whats that written as

dull granite
steady basalt
#

so u cant just say 0

dull granite
#

No lol.

steady basalt
#

its 1?

#

or whatever

#

the final stop is

#

wait

dull granite
#

It's undefined.

#

1/0 is undefined.

steady basalt
#

the limit is what causes the undefined on the x axis

#

right?

#

so you say

dull granite
#

Study some calculus book dude.

#

Concepts are difficult ngl.

steady basalt
#

if the lim fx = 1

#

u say approaches 1

dull granite
#

But if you don't read it and do it yourself, you won't understand.

steady basalt
#

oh so im wrong?

dull granite
#

Don't understand what you're trying to say.

steady basalt
#

u can have random points on the graph that literally arent even touching the curve

#

so long as the function defines it

#

?

dull granite
#

Wut?

steady basalt
#

Yup it’s true

spare briar
#

I think what you are asking about are piecewise functions https://en.wikipedia.org/wiki/Piecewise

In mathematics, a piecewise-defined function (also called a piecewise function, a hybrid function, or definition by cases) is a function defined by multiple sub-functions, where each sub-function applies to a different interval in the domain. Piecewise definition is actually a way of expressing the function, rather than a characteristic of the f...

barren wedge
#

How to improve prediction in classification problems?

main fox
barren wedge
main fox
#

What are you trying to classify?

barren wedge
#

I drop ID
because it is not important

faint cargo
#

Need to Optimise my Program .......Help!!!

#
import numpy as np
def peak_valley_detector(d, ker_sz, sigma, width):
  ds = gderiv(d, sigma, ker_sz)
  # print(ds)
  for j in range(1, int((ker_sz/2))+1):
    ds = np.delete(ds, 0)
    ds = np.delete(ds, len(ds)-1)
  d1 = ds
  idx1 = np.zeros(len(d1))
  idx2 = np.zeros(len(d1))
  for i in range(1, len(d1)-2):
    if (np.sign(d1[i-1])>=np.sign(d1[i+1])) and (d[i]>=0.5*(np.amax(d)+np.amin(d))):
      idx1[i] = 1
    else:
      idx1[i] = 0
  for i in range(1, len(d1)-2):
    if (np.sign(d1[i+1])>np.sign(d1[i-1])) and (d[i]<0.5*(np.amax(d)+np.amin(d))):
      idx2[i+1] = 1
    else:
      idx2[i+1] = 0
  
  # np.where returns a tuple of arrays; take the first element to get the index array.
  index1 = np.where(idx1 == 1)[0]
  index2 = np.where(idx2 == 1)[0]

  flag = 0
  
  # indexo2 = [1,len(d1)]
  # for k in index2:
  #   indexo2.append(k)

  # index2 = indexo2  
  
  index2 = np.append(index2, [1, len(d1)])
  index2 = np.sort(index2)

  # Amongst multiple close peaks detected, choose the highest peak, discard the rest.
  while not flag:
    flag = 1
    for i in range(1 , len(index1)-1):
      if abs(index1[i]-index1[i+1]) < width:
        flag = 0 
        if d[index1[i]] < d[index1[i+1]]:
          index1[i] = 9999
        else:
          index1[i+1]  = 9999

    irx1 = np.where(index1 == 9999)
    index1 = np.delete(index1 , irx1)

  flag = 0   
      
  # Amongst multiple close valleys detected, choose the lowest valley, discard the rest.
  while not flag:
    flag = 1
    for i in range(1 , len(index2)-2):
      if abs(index2[i]-index2[i+1]) < width:
        flag = 0 
        if d[index2[i]] > d[index2[i+1]]:
          index2[i] = 9999
        else:
          index2[i+1]  = 9999

    irx2 = np.where(index2 == 9999)
    # print(irx2)
    index2 = np.delete(index2 , irx2)
  return index1,index2
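
One possible speed-up, sketched under the assumption that d and d1 are 1D NumPy arrays as in the function above. The two detection loops recompute np.amax(d) and np.amin(d) on every iteration, and the midpoint can be computed once and the comparisons done on whole arrays. The helper name candidate_extrema is made up, and the sketch only covers the candidate detection, not the merging of close peaks.

```python
import numpy as np

def candidate_extrema(d, d1):
    """Vectorized version of the two candidate-detection loops."""
    d = np.asarray(d, dtype=float)
    s = np.sign(np.asarray(d1, dtype=float))
    mid = 0.5 * (d.max() + d.min())      # computed once, not per iteration
    i = np.arange(1, len(s) - 2)         # same index range as the loops
    peaks = i[(s[i - 1] >= s[i + 1]) & (d[i] >= mid)]
    valleys = i[(s[i + 1] > s[i - 1]) & (d[i] < mid)] + 1
    return peaks, valleys

# Demo on a synthetic signal, with np.gradient standing in for gderiv.
t = np.linspace(0.0, 4.0 * np.pi, 200)
signal = np.sin(t)
peaks, valleys = candidate_extrema(signal, np.gradient(signal))
print(len(peaks), len(valleys))
```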
sleek wolf
#

are SQL related questions allowed in this channel? 8-)

hushed sail
brazen spire
#

Is there any database somewhere with images for computer vision?

#

overlapping images for NeRF especially

steady basalt
#

Fx under two conditions

mint palm
#

just as we can change the degree of the fitted polynomial in a LogReg algorithm using a Python library, can we also change the coefficients of that polynomial? Or will we have to write all the code by hand?

rose agate
#

Like a box plot?

hasty kiln
serene scaffold
#

@hasty kiln you are right lemon_hyperpleased

pliant star
#

hey quick question guys:
axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker='.')

this line seems to crash my code, any idea why that could be?

#
```py
import cv2
import matplotlib.pyplot as plt
from matplotlib import colors

img = cv2.imread('IMG_7659.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

pixel_colors = img.reshape(-1, 3).astype(int)
norm = colors.Normalize(vmin=-1.,vmax=1.)
norm.autoscale(pixel_colors)
pixel_colors = norm(pixel_colors).tolist()

r, g, b = cv2.split(img)

fig = plt.figure()
axis = fig.add_subplot(1, 1, 1, projection='3d')
axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker='.')
axis.set_xlabel('Red')
axis.set_ylabel('Green')
axis.set_zlabel('Blue')
plt.show()
```
it just doesn't show the plot
pliant star
#

There is none

#

Doesnt show anything

mild dirge
#

scatter normally takes an array of x coordinates, and an array of y coordinates

#

you supply r, g and b

#

Not sure what those are supposed to be

#

And you are saying that the line is "crashing" your code, what does that mean?

#

@pliant star

pliant star
#

Yeah its just doing nothing and loading properly

#

RGB are the color values of the images

#

I got them with cv2.split(img)

mild dirge
#

So what are you trying to plot?

pliant star
#

The rgb values of the images

#

I‘d like to group them later

mild dirge
#

The image itself?

pliant star
#

Using the hsv ones

mild dirge
#

Don't really get it

pliant star
#

yeah

mild dirge
#

Alright, so just do plt.imshow(img) and then plt.show()

#

Or use cv2.imshow('img', img) and then cv2.waitKey(0) iirc

pliant star
#

No i dont want to show the img itself

mild dirge
#

This is a 3d scatter plot, you need to use something other than plt.scatter

pliant star
#

I want sth like that, such that i can group the different pixel values

mild dirge
#

scatter is for 2d

#

I thought at least

pliant star
#

Ah okay

#

Idk

#

I want to group the different colors

mild dirge
#

Oh hmm, nvm it seems it can be done with this as well

#

Did you follow this?

#

And what does it show now?

#

Can you show your screen after it shows the plot

pliant star
#

I need to group different colors in an image

#

Will do once i’m home

mild dirge
#

I'm moving today, so I probably won't be able to look at it much later

#

But maybe someone else could help at that point ^^
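
One guess at why the window seems to hang, not confirmed in the chat: a full-size photo has millions of pixels, and a 3D scatter with that many points can take a very long time to draw. A sketch of plotting only a random sample of pixels follows. The helper name sample_pixels and the limit of 5000 points are assumptions, and a random array stands in for the RGB image.

```python
import numpy as np

def sample_pixels(img, max_points=5000, seed=0):
    """Return r, g, b arrays and matching face colors for a random pixel sample."""
    pixels = img.reshape(-1, 3)
    if len(pixels) > max_points:
        rng = np.random.default_rng(seed)
        keep = rng.choice(len(pixels), size=max_points, replace=False)
        pixels = pixels[keep]
    face_colors = pixels / 255.0   # scale 0-255 values into the 0-1 range
    return pixels[:, 0], pixels[:, 1], pixels[:, 2], face_colors

# Stand-in for the RGB array produced by cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
img = np.random.default_rng(1).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
r, g, b, face_colors = sample_pixels(img)
print(r.shape, face_colors.shape)
```

The sampled arrays can then be passed to the same call as before: axis.scatter(r, g, b, facecolors=face_colors, marker='.').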

hollow sentinel
#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
#
pd.to_numeric(car_data["Price"])
pd.to_numeric(car_data["Doors"])
#

i thought the error was that they were strings

#

something something int can't be divided by string

#

but i don't think there are any strings here

brave sand
#

@iron basalt Sorry for the ping, but I’ve been wondering if I could somehow “test” the monotonicity constraint? Could I generate some fake values? I want to see how the monotonicity constraint works. Because obviously it works but I need to know how. Any idea how to proceed with this?

hollow sentinel
#

that would do it

#

price and doors is an object

#

i did it
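
A minimal sketch of the likely fix, assuming Price and Doors were read in as strings (object dtype); the sample values are made up. pd.to_numeric returns a new Series, so the result has to be assigned back to the column:

```python
import pandas as pd

# Hypothetical data where the numbers were read in as strings.
car_data = pd.DataFrame({"Price": ["4000", "5000", "7000"], "Doors": ["4", "4", "3"]})

# Calling pd.to_numeric without assigning the result leaves the column unchanged.
car_data["Price"] = pd.to_numeric(car_data["Price"])
car_data["Doors"] = pd.to_numeric(car_data["Doors"])

price_per_door = car_data["Price"] / car_data["Doors"]
print(car_data.dtypes)
```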

steady basalt
#

Man I hate waiting networks to train so long

#

It’s been 2 hours

#

Still only 10% in

#

Do any of u use a tuner or just eyeball it

iron basalt
sour tide
#

the output we get..is higher pg good or lower log

mint palm
#

i want to classify racy or violent or border line videos( with audio), where can i get a good pre-trained model for that?

brave sand
brave sand
# iron basalt

so I work that out by hand to prove if the weights are always positive?

iron basalt
# brave sand so I work that out by hand to prove if the weights are always positive?

No, non-negative weights give that constraint. The weights being non-negative is given, it's what the code does. Show that given non-negative weights, you have that constraint. And then show that given that constraint your problem becomes more feasible. Then demonstrate it with an experiment (this is what the paper is all about, it's pretty straight forward even if the code seems like a lot (it's easy to say that you have some network in text, but it can be very annoying to code / the difference between stating something and actually doing it)).

#

(5) is just (1) again, not sure why they wrote it twice.
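
A toy check with fake values, as asked about above. This is not the paper's network, only a sketch assuming one hidden layer with a non-decreasing activation (tanh) and weights forced non-negative with an absolute value. With those assumptions the output should never decrease as the input grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights made non-negative; biases can have any sign.
W1 = np.abs(rng.normal(size=(1, 8)))
b1 = rng.normal(size=8)
W2 = np.abs(rng.normal(size=(8, 1)))
b2 = rng.normal(size=1)

def net(x, out_weights):
    hidden = np.tanh(x @ W1 + b1)      # tanh is non-decreasing
    return (hidden @ out_weights + b2).ravel()

x = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)   # fake inputs in increasing order

y = net(x, W2)
is_monotone = bool(np.all(np.diff(y) >= -1e-12))  # small tolerance for rounding

# Counter-check: flipping the output weights negative breaks the property.
y_flipped = net(x, -W2)
flipped_is_monotone = bool(np.all(np.diff(y_flipped) >= -1e-12))

print(is_monotone, flipped_is_monotone)
```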

quartz raptor
fallen portal
#

Can anyone give me a simple explanation as to why we have to specify the DataFrame name multiple times when using Pandas methods? e.g.

my_df[(my_df.value == 10) & (my_df.object == 'Sally')]

why do we need to keep repeating my_df?

tidal bough
# fallen portal Can anyone give me a simple explanation as to why we have to specify the DataFra...

How else can it work? The way this works is just combining a few pandas features:

  1. a comparison with a Series returns a Series of booleans (the comparison results for each element)
  2. boolean series can be elementwise ANDed using &
  3. dataframes can be indexed with a boolean Series to select only the rows for which the corresponding element is True.
  4. If a column's name is a valid Python identifier, like value, you can access that column not just as df["value"] (like usual), but as df.value.
    It's not magic. The expression on the inside of the square brackets here knows nothing about what it's used for, and can't possibly replace, say, value with my_df.value.
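
The steps listed above can be sketched one at a time, using made-up data with the same column names as the question. The last line shows DataFrame.query, which takes the condition as a string and so avoids repeating the DataFrame name:

```python
import pandas as pd

my_df = pd.DataFrame({"value": [10, 10, 7], "object": ["Sally", "Bob", "Sally"]})

is_ten = my_df["value"] == 10          # 1. comparison gives a boolean Series
is_sally = my_df["object"] == "Sally"
mask = is_ten & is_sally               # 2. elementwise AND of two boolean Series
selected = my_df[mask]                 # 3. boolean indexing keeps the True rows

# Same filter written as a query string.
same = my_df.query("value == 10 and object == 'Sally'")
print(selected)
```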
lapis sequoia
tidal bough
#

heartbreaking: the worst person you know just made a great point

blazing lagoon
#

I wanna start learning about AI programming but I don't know if I'm ready. I took a 20-hour course on Python before. Should I start learning data science and AI, or should I learn something else before learning this?

quartz raptor
#

learn some math (if you dont know already)

#

as in calculus and linear algebra

#

not necessarily before though

blazing lagoon
#

i know a little bit of math, 10th grade math))

quartz raptor
#

yeah so perhaps you know like derivatives, chain-rule, product-rule etc?

#

that stuff is important for ml

#

or perhaps you also had some stuff about vectors

blazing lagoon
#

only a little bit about derivatives and chain rule

quartz raptor
#

in general you wont have to apply math yourself directly however having deeper understanding and intuition for those things is very useful

blazing lagoon
#

thank you

#

you know some good courses about machine learning , etc?

steady basalt
#

in my experience starting similar to you once, the maths is really hard to just learn at will, just pick up the ideas over time and ull be ok

#

like we discussed earlier, i'd fail a linalg/calc2 exam 10/10 times but i still managed to impress interviewers for junior DS roles, if its not faang youll be fine

#

people like to pretend reality is that DS requires you to leave uni like zuckerberg, this isnt the case, alot of ds have had to learn for many years

#

Companies probably value your ability to produce and to make projects using tools more than your ability to do math off hand

#

Or so I’ve been told today

plain zephyr
mint palm
#

youtube has violence, racism, hate detection model for video, what kind of algo is applied there?
anomaly etc. or is it like seperate algorithm for each of those??

#

i only see specific models for hate speech, or for gore etc

#

but i want to make a something that can detect all type of unpleasant behaviour

shrewd locust
#

does someone know how to exclude that dtype int64?

#

having trouble when calculating the accuracy

tidal bough
shrewd locust
#

yes, from printing

mild dirge
#

Why would removing that from printing help you calculate the accuracy better?

steady basalt
rough mountain
#

I have a text corpus. I want to train an AI in such a way that it takes in an input sentence and predicts the next sentence. How do I setup the training and testing data, as well as sentence pairs, without letting the AI "cheat"

surreal brook
untold quail
#

How do I fix this?

#

I get this error when I pip install pyaudio

#

How to fix it without installing visual c++ 14.0

#

???

spare briar
#

required

untold quail
#

Yes

#

Is there no other way?

worldly dawn
# untold quail Is there no other way?

The first few answers to google for "pyaudio windows" yield some great results (including the very same error message you get). Rather than copy/pasting them, I would suggest you try that.
In general, it helps a lot to google the error messages too

#

(I don't use windows, so I can't really help further)

untold quail
#

Thanks

mint palm
blissful nymph
#

resize doesn't work and causes this too

ocean swallow
#

Trying to detect views from title, subscriber and days since published data

#

What I did for embedding data is I got each word's embedding (100 features) in title and then averaged it.
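
A sketch of that averaging step, assuming a lookup from word to a 100-dimensional vector. The random vectors, the tiny vocabulary and the function name title_vector are placeholders, not the actual embedding model used here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings: one random 100-dimensional vector per known word.
vocab = ["how", "to", "train", "a", "model"]
embeddings = {word: rng.normal(size=100) for word in vocab}

def title_vector(title, embeddings, dim=100):
    """Average the embeddings of the known words in a title."""
    vectors = [embeddings[w] for w in title.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(dim)           # no known words: fall back to zeros
    return np.mean(vectors, axis=0)

vec = title_vector("How to train a model", embeddings)
print(vec.shape)
```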

#

what else can I do for dimensional reduction of Embedding data

#

or like anything to increase the prediction quality

steady basalt
#

What gpu is that

ocean swallow
#

it is 2070

steady basalt
#

Fast

serene scaffold
# untold quail Is there no other way?

why don't you want to install it? even if you find a workaround for this library, the c++ build tools are necessary for a lot of installations. it's one of the first things I install when I get a new Windows machine.

lavish obsidian
#

Hi folks
Does someone by chance know how to recognize objects in an image using the
"image registration" method?
I'd be happy to get any help, thanks in advance! 🙏

serene scaffold
rough mountain
nimble valley
#

Guys, how do you use GPT-3 to create music melodies, lyrics, and other things that help with programming, engineering, or even learning English? I am completely new to programming and have only recently discovered information about GPT-3; please point me in the direction of where I can read or watch (youtube) about it. I'm not sure if there is a UI for collaboration with GPT-3. Thanks...

mint palm
#

anyone CURRENTLY doing the cnn course of deep learning specialisation on coursera?? i need to see updated lab code.

pearl locust
brave sand
#

can I Remote Desktop into my machine from a laptop? and see the visualizations?

#

bc I’m not sure if I should buy a laptop or make my pc smaller

serene scaffold
#

@pearl locust is this self promotion?

brave sand
#

what about normal extensions? do I have to setup a server?

nimble valley
minor turret
#

How can I identify if a word is a material or not?

#

Such as:
black titanium handle

#

I am able to identify the color

#

but I need to be able to identify the titanium

pearl locust
rough mountain
#

For some reason pytorch is giving me a single data point instead of a full batch, and I don't know why.

```python
dataset = torch.utils.data.TensorDataset(seqs, labels)
dataloader = DataLoader(dataset, batch_size=32, drop_last=True, shuffle=True)

for epoch in range(100):
    for i, data in enumerate(tqdm(dataset, desc=f"Epoch: {epoch}")):
        print(data)
```
#

Data here is just a tuple of one tensor from seqs and one from labels
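A likely cause, judging only from the snippet: the loop iterates over `dataset`, which yields one `(sequence, label)` sample at a time, rather than over `dataloader`, which is the object that does the batching. A minimal sketch with made-up tensor shapes (100 samples of 8 features):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up stand-ins for the real seqs and labels tensors.
seqs = torch.randn(100, 8)
labels = torch.randint(0, 2, (100,))

dataset = TensorDataset(seqs, labels)
dataloader = DataLoader(dataset, batch_size=32, drop_last=True, shuffle=True)

batch_shapes = []
# Iterate over the DataLoader, not the Dataset, to get batches.
for batch_seqs, batch_labels in dataloader:
    batch_shapes.append((tuple(batch_seqs.shape), tuple(batch_labels.shape)))
```

With `drop_last=True`, the 100 samples give three full batches of 32 and the remaining 4 samples are skipped for that epoch.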

gloomy anvil
#

Hey y'all, is there a rough formula or rule of thumb on how many iterations to try out when training a Self-Organizing Map (SOM) / Kohonen Map?

rough mountain
serene scaffold
# nimble valley What can u recommend then? And if it a not a big deal for you, pls, can ya send ...

I would start with a beginner data science book. the fundamentals of data science lend themselves well to AI.

I know it's tempting to jump to the newest and coolest things that are happening in AI, but you won't be able to understand them until you've been developing your knowledge for a long time. starting from the basics and working your way up can still be satisfying. you have to keep a positive attitude about learning

#

!resources data science

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

primal gyro
#

Hello. Can I ask a statistics question here?

gloomy anvil
primal gyro
#

If I have some z-scores and combine them into one z-score, is this a good approach? I have 1 sample so I can't use Stouffer's method.

pulsar hull
#

I'm trying to make backpropagation myself, and one problem I have is that it doesn't work when a layer is using relu as its activation function instead of sigmoid, since the derivative I tried was relu(x)/x, and that can sometimes result in division by zero errors. Do I have the derivative wrong or is there something else I'm missing about it?
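For reference, the derivative of ReLU is a step function: 1 where the input is positive and 0 elsewhere, so no division is needed. A minimal NumPy sketch; using 0 as the gradient at exactly `x == 0` is a common convention, not the only possible choice.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def relu_grad(x):
    """Derivative of ReLU with respect to its input.

    1.0 where x > 0, otherwise 0.0 (including at x == 0 by convention).
    """
    return (x > 0).astype(float)


x = np.array([-2.0, 0.0, 3.0])
activations = relu(x)
grads = relu_grad(x)
```

In backpropagation the incoming gradient is multiplied element-wise by `relu_grad` of the pre-activation values.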

nimble valley
#

@serene scaffold man, thanks a lot, I really appreciate your swift responses and advice so much, be blessed and happy

mint palm
#

is YOLO still used in industry??

steady basalt
serene scaffold
nimble valley
steady basalt
#

you can tell it to say something

misty flint
#

speaking of R, no offense to R users, but i made it to the 3rd round for a company and decided to cancel my upcoming interviews bc i found out they worked in R mostly (among other reasons)

#

if you do have to use R, for whatever reason, you can use R Studio or even Jupyter notebooks for it

#

it actually has some good stats packages

#

for monte carlo simulations, etc.

#

or bayesian stuff

#

but most of the time you dont need that

#

and if youre going to deploy, its better to be in python anyway

#

i feel like youre more marketable too

#

also you can switch to more software jobs easier if you want to later on

#

bro...

#

you should see javascript

#

that dot notation

#

but yeah it is kinda heavy in R

#

oh the tidyverse is good too if you ever have to work in R

ocean swallow
brave sand
#

so guys should I buy a laptop to Remote Desktop into my pc or should I make my pc smaller? Rn I’m doing work in MARL

steady basalt
steady basalt
misty flint
#

but

steady basalt
#

sm?

misty flint
#

how to put this

#

you didnt like what i had to say about data engineering basics so why would you listen to me about bayesian stats...

steady basalt
#

what data engineering does it require

#

cant u just use stats models or smtn

misty flint
#

ok. try it

steady basalt
#

dude does it on a dataframe

#

what now?

brave sand
#

cuda isn’t supported

steady basalt
#

no data engineering really just calculations

steady basalt
brave sand
#

I use pytorch

steady basalt
#

Well it depends if u need something rly strong like RTX 3080 or better

#

If not the m1 pro has a gpu that’s alright

still dirge
#

i must say, you're one of my biggest inspirations to pursue DS
this one is hella neat, what equation/function is that? 👀

eager wedge
#

I created a segmentation model with a train: 80%, val: 60 after 500 epochs. What is the problem? Is it overfitting or underfitting?

swift furnace
#

Where to get started with Data Science?

tacit basin
burnt citrus
#

having been on here for a while I see that many share my sentiment that pandas is very difficult to use so I made an alt for CSV parsing

#

any feedback would be very nice

royal garnet
#

So, I'm reading this: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

I need to read a dataframe row by row - and then depending on the contents of certain cells, write stuff to a list of lists that I'll eventually write to a csv.

But it sounds like iterating over a dataframe is not a great idea - so how would someone do that?

#

For example, my df contains a column containing a uuid, a name, and then a bunch of other columns with boolean values - and based on those I want to write a line to a csv that has that uuid, name, and those boolean values.
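A minimal sketch of doing this without a Python-level row loop, assuming made-up column names (`is_active`, `is_admin`) for the boolean flags: filter with a boolean mask, select the wanted columns, and let pandas write the CSV.

```python
import io

import pandas as pd

# Made-up data standing in for the real dataframe.
df = pd.DataFrame({
    "uuid": ["a1", "b2", "c3"],
    "name": ["alpha", "beta", "gamma"],
    "is_active": [True, False, True],
    "is_admin": [False, False, True],
    "score": [1.0, 2.0, 3.0],
})

flag_cols = ["is_active", "is_admin"]

# Keep rows where at least one flag is set, and only the columns of interest.
selected = df.loc[df[flag_cols].any(axis=1), ["uuid", "name"] + flag_cols]

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
selected.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# If a list of lists is still needed, it can be taken from the result.
rows = selected.values.tolist()
```

If the per-row logic really cannot be expressed with masks, `itertuples()` is usually a faster fallback than `iterrows()`.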

haughty root
#

Hello guys, I have a question

#

Supervised learning has performance measurement techniques such as Cross-Val, Confusion-Matrix, Precision & Recall, ROC ..

#

How could we therefore evaluate a reinforced agent in RL ??

upbeat furnace
#

Heyy guys, what does this mean? 😅 " tensorflow 2.9.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.21.2 which is incompatible."

#

does it mean I have to use an older version of protobuf?

serene scaffold
serene scaffold
upbeat furnace
#

But I thought that wouldn't make sense

serene scaffold
#

they're not decimal numbers.

upbeat furnace
#

Ohh

serene scaffold
#

when you have x.y.z, x, y, and z are each their own number

#

that's why we went from Python 3.9 to 3.10
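A small sketch of the comparison using plain tuples; `parse_version` is a made-up helper that only handles purely numeric versions (real tools such as pip use fuller rules).

```python
def parse_version(text):
    """Turn '3.9.2' into (3, 9, 2) so each part compares as its own number."""
    return tuple(int(part) for part in text.split("."))


installed = parse_version("4.21.2")
lower = parse_version("3.9.2")   # protobuf >= 3.9.2
upper = parse_version("3.20")    # protobuf < 3.20

# Tuples compare part by part, left to right.
satisfies = lower <= installed < upper

# 3.10 is newer than 3.9 because 10 > 9 in the second position.
python_order_ok = parse_version("3.9") < parse_version("3.10")

# An example of a version that would fit the constraint.
older_ok = lower <= parse_version("3.19.4") < upper
```

Here `satisfies` is false because the first part, 4, is already larger than 3, which is why the message asks for an older protobuf.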

upbeat furnace
#

Ahh thanks! 😅

serene scaffold
steady basalt
#

do u think we shud have a pinned message or channel description for all the people asking where to start?

serene scaffold
eager wedge
#

Can my segmented image be in gray scale if it is multiclass?

tidal bough
#

this was simulating, IIRC, a particle moving in a combination of constant magnetic and electric fields, for an electrodynamics homework task

#

I realised after making it that the task in question was in fact exactly analytically solvable (rather than only approximately so like I thought) by just applying a fourier transform 😔

still dirge
#

do you perhaps have the task because i really want to try doing that

#

it's hella cool!

tidal bough
#

.latex A dielectric can be described using a model in which each atom consists of a stationary positive charge and an electron moving around close to it. Their interaction is described by a simplified potential $\frac{m \omega_0^2 r^2}{2}$, where $r$ is the distance between the electron and the center of the atom and $m$ is the electron mass. Find the dielectric permittivity tensor of the medium in a variable magnetic field $B$ which is pointing along axis $z$ and has frequency $\omega$. Represent the answer in terms of the following parameter:
\[
\omega_{LT}= \frac{2 \pi e^2 n}{m \omega_0},
\]

where $n$ is the concentration of atoms, $e$ is the electron charge.

strange elbowBOT
tidal bough
#

yay, surprised the bot handled that correctly.

tidal bough
# strange elbow

(note: the permittivity tensor is at frequency omega, too. So you have a magnetic field at that frequency, and apply an electric field at that frequency, and get some electrons moving at that frequency as a result.)

steady basalt
wooden sail
#

ooh pretty nice

misty flint
sage fulcrum
#

Hm

#

Fail to build pycocotools in linux ?

#

Without conda , does anyone have any fix

steady basalt
#

is sklearns svr rbf gaussian?

night sequoia
steady basalt
#

@night sequoia ur the perfect person to answer my question then

#

im literally writing a report on svr as we speak

#

if its ok can i use ur visualisation code i cba to write my own

night sequoia
steady basalt
#

so is sklearn using gaussian rbf?

#

oh nvm

#

theres only one kernel

night sequoia
steady basalt
#

do u think i shud visualise sample points to show how it works

#

or just use a randomised one like you

night sequoia
#

u can use the make_moons dataset

sharp sinew
#

Can someone help to refer a text book of stats?

tacit basin
# sharp sinew Can someone help to refer a text book of stats?
GitHub

An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani, 2013): Python code - GitHub - JWarmenhoven/ISLR-python

sharp sinew
#

@tacit basin I mean for helping with data science; the book above doesn't contain topics like p-test, z-test, chi-square test

tacit basin
steady basalt
#

can i get ur guys take on normalising by dividing by 100 if values are between 80 and 300

#

my professor did it

#

he just put eveything on a 0-3 scale

#

lol

#

well theres v small values that are <1 and they were /100 also

#

so its the same scale

#

just not 0-1

tacit basin
#

I thought they talk about p-values in stats learning. Was a long time ago when I read it though...

steady basalt
#

yo does keras have attention function

#

oh damn it does

nova matrix
#

guys I am currently working on this binary classification dataset. I've made the model and want to now apply it to the actual test data. It contains around 1 mill rows and I fear it may restart my kernel due to insufficient memory. Is there any way I could maybe get around this. Should I split the testing set or wot. I need a final csv of all the final scores

mild dirge
#

How many columns and what type of data? @nova matrix

nova matrix
#

I mean it did have a few categorical strings but aim is to change them into int vals

mild dirge
#

Yeah so that might be too much to load at once

#

Not sure in what format it is stored right now, but you could just load it in batches

nova matrix
#

you know any good way to do that or any link to smth that shows it, cuz I've never done it like that b4

mild dirge
#

How is it stored right now?

nova matrix
#

csv format

mild dirge
#

This explains how to do it pretty well
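A minimal sketch of batch loading with the `chunksize` argument of `pd.read_csv`, assuming a tiny in-memory CSV and a made-up `score` function in place of the real file and the real `model.predict`:

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for the 1M-row file on disk.
csv_source = io.StringIO(
    "id,feature\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))
)


def score(chunk):
    """Hypothetical stand-in for model.predict on one chunk."""
    return chunk["feature"] * 2


parts = []
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(csv_source, chunksize=4):
    parts.append(pd.DataFrame({"id": chunk["id"], "score": score(chunk)}))

# Only the small id/score frames are kept, not the full feature data.
scores = pd.concat(parts, ignore_index=True)
```

`scores.to_csv(...)` then gives the final file; if even the scores are too large, each part can be appended to the output file inside the loop instead.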

tropic tiger
#

I'm using pandas to do expectation-maximization but I'm stuck with how the table is rearranged by the merge method after multiplying two factors.

#

this is what I have.
What I want is to rearrange column Dunett into the pattern ('False', mild, severe, false, mild, severe, and so on)

#

That way it's easier for me to do a normalization

#

Does anyone have any suggestion?

crude shadow
tropic tiger
crude shadow
#

Oo just realized you wanted to order by columns too

tropic tiger
#

maybe jump in this thread

#

I asked a question on here

tropic tiger
#

I did it!! Praise God

#

In the end, I just had to add a new column seems like

#

Wow this is something I learned. To manipulate something in ur df, adding more columns to store values can often help

grave marten
#

is use the mnist dataset

#

i*

#

why is it showing 1875 instead of 60000 while training

#

?

#
# import libraries
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
from matplotlib import pyplot as plt

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

class_names = [0,1,2,3,4,5,6,7,8,9]

model = keras.Sequential([keras.layers.Flatten(input_shape=(28,28)),
                          keras.layers.Dense(128,activation='relu'),
                          keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
model.fit(train_images,train_labels,epochs=3)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose = 1)
print("Test accuracy : ", test_acc)


i = 0
plt.figure()
plt.title(f"the number is : {train_labels[i]}")
plt.imshow(train_images[i])
plt.colorbar()
plt.grid(False)
plt.show()
print(class_names[test_labels[i]])

this is the code
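The 1875 in the progress bar is the number of batches per epoch, not the number of images. `model.fit` uses `batch_size=32` when none is given, so the arithmetic is:

```python
import math

n_samples = 60000   # MNIST training images
batch_size = 32     # Keras default when model.fit gets no batch_size

# One progress-bar step per batch.
steps_per_epoch = math.ceil(n_samples / batch_size)
```

Passing a different `batch_size` to `model.fit` changes the step count accordingly; all 60000 images are still used each epoch.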

tacit basin
lost ivy
#

Hi, I hope this is the right forum to pose this question

#
```
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

data = sheet.get_all_values()
df = pd.DataFrame(data)
df_sorted = df.sort_values("GAMEs")

NAME    GAMEs    ATT    COMP    YRDs    PCT    TDs    INTs
Mills    14    438    287    3468    65.5    18    8
Lance    19    318    208    2947    65.4    30    1
Trask    26    813    552    7386    67.9    69    15
Wilson    30    837    566    7652    67.6    56    15
Jones    30    556    413    6126    74.3    56    7
Fields    34    618    423    5761    68.4    67    9
Law     40    1138    758    10098    66.6    90    17
Book    45    1141    728    8948    63.8    72    20
Mond    46    1358    801    9661    59    71    27
Eger    46    1476    923    11436    62.5    94    27

num0 = df_sorted["GAMEs"]
num1 = ["green" if (g < max(num0)) else "red" for g in num0]

plt.figure(figsize = (10, 10))

plt.bar(x = df_sorted["NAME"], height = df_sorted["GAMEs"], width = 0.4, color = num1)

plt.grid(color = 'red', alpha = 0.2, linestyle = '--', linewidth = 1)

plt.xlabel("QB Names (x)", fontsize = 12)
plt.ylabel("Number of Games Played (y)", fontsize = 12)

plt.xticks(rotation = 45, fontsize = 12)
plt.yticks(fontsize = 12)

plt.title("Total Number of Games Played", fontsize = 12, fontweight = "bold")

plt.show()
```
#

This starts at 14, I would like it to start at zero

#

adding plt.ylim(0, 50) does this

#

adding bottom = 0 or bottom = None, does not change the graph at all

#

I have tried everything and researched multiple sites that offer almost the same advice and none of it works. So decided to try here
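One possible cause, assuming `sheet` is a spreadsheet client whose `get_all_values()` returns every cell as text: the `GAMEs` column then holds strings, and matplotlib treats string heights as categories, so the axis starts at the first label ("14") instead of 0. A sketch of converting the column before plotting, with a shortened, made-up copy of the data:

```python
import pandas as pd

# Shortened stand-in for sheet.get_all_values(): header row plus string cells.
data = [
    ["NAME", "GAMEs"],
    ["Mills", "14"],
    ["Lance", "19"],
    ["Eger", "46"],
]

# Use the first row as column names instead of as data.
df = pd.DataFrame(data[1:], columns=data[0])

# Convert the text cells to real numbers before sorting or plotting.
df["GAMEs"] = pd.to_numeric(df["GAMEs"])
df_sorted = df.sort_values("GAMEs")
```

With numeric heights, `plt.bar` draws from a baseline of 0 by default and `plt.ylim(0, 50)` behaves as expected.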

stoic viper
#

2 dataframes in a list and this doesnt work....

for df in df_list:
    df.loc[~(df==0).any(axis=1)]
#

why

lost ivy
stoic viper
#

no its not the same df

#

i just want rows with zeroes to be gone

lost ivy
#

you want to replace 0's in the df with 1's correct?

stoic viper
#

no just delete all 0

#

actually i want do delete rows that contain 1

#

in one row and zeroes in another, but i wrote a small example

#

and then use that on the list. Problem is it works without for, but not with for
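The expression inside the loop builds a filtered copy and then throws it away, so the dataframes in the list never change. A sketch that stores the results back, using two made-up dataframes:

```python
import pandas as pd

# Made-up dataframes standing in for the real list.
df_list = [
    pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 6]}),
    pd.DataFrame({"a": [7, 8], "b": [0, 9]}),
]

# .loc[...] returns a new dataframe, so the result has to be kept.
df_list = [df.loc[~(df == 0).any(axis=1)] for df in df_list]
```

Reassigning `df = ...` inside a plain `for df in df_list` loop would not work either, because that only rebinds the loop variable.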

lost ivy
#

1 for df in df_list:
----> 2 df.loc[~(df==0).any(axis=1)]

AttributeError: 'int' object has no attribute 'loc'

tidal bough
gloomy anvil
#

So I evaluated a number of models in terms of binary classification for different data. Now that I am writing about it, I am unsure how to group these models thematically and especially regarding my table of contents. The models are:

Logistic Regression, Bernoulli Naive Bayes, Random Forest, Regularized Greedy Forest, XGB, Deep Neural Network, ROCKET, SVM, KNN, LSTM, GRU, RNN, Voting, Stacking, Bagging.

Easy ones:
Recurrent models: LSTM, GRU, RNN

Decision Trees: Random Forest, RGF, XGB

Ensemble Models: Voting, Stacking, Bagging

That leaves: Logistic Regression, Bernoulli Naive Bayes, Deep Neural Network, ROCKET, SVM, KNN,

As ROCKET uses convolutional kernels and basically resembles CNNs, I thought about grouping DNN and ROCKET as Neural Networks, but I mean the recurrent models above are neural networks as well.

SVM and KNN could be grouped as nonparametric models. That would leave Logistic Regression and Bernoulli. Logistic Regression despite its name could be categorized as generalized linear model in this binary classification approach. But Bernoulli is not really a linear model, is it?

I would love to categorize the models ideally in groups of 3 (give or take). Do you have suggestions on what kind of meta chapters / categories to choose? How would you go about categorizing these models?

fallow frost
#

If im going to pursue a career in Data analytics which languages, libraries, and tools do I need to know ?

#

I already know Python pretty well (OOP, Numpy, Pandas), some bash, and SQL pretty decently, what should be my next priority ?

serene scaffold
#

@fallow frost try doing some actual projects that leverage or build on all of these skills

fallow frost
#

but do you think I could even get an Internship or an entry level job yet with just these skills ?

fallow frost
serene scaffold
fallow frost
#

Im self taught 😭

#

but I just enrolled in a boot camp

serene scaffold
#

do you have a degree in something else?

fallow frost
#

and it will take 7 months to finish, so I want to find something before

fallow frost
#

which is why im doing the bootcamp, to get some kind of formal 'education'

serene scaffold
serene scaffold
fallow frost
#

ofc but you cant say that there is no chance at all

serene scaffold
#

why would you go with the option that gives you a worse chance...?

fallow frost
#

like for a senior data scientist I can understand, but for an ENTRY Data Analyst then It should be fairly easy if you have a solid portfolio

serene scaffold
#

but for an ENTRY Data Analyst then It should be ~~fairly easy~~ not completely impossible if you have a solid portfolio

fallow frost
serene scaffold
#

Anyway, I don't think I'm going to change your mind, but I hope this works out for you.

fallow frost
#

so they will have to find me a job

#

Yeah probably

#

but what do you do ? @serene scaffold

#

Data scientist ? analyst ?

serene scaffold
fallow frost
#

like NLP ?

serene scaffold
#

yes

fallow frost
#

ahh ok
and what kind of degree do you have ?

serene scaffold
#

bachelors in CS, with AI and DS-related coursework.

#

I probably wouldn't have gotten this job with only a bachelors if I didn't also publish as an undergrad.

muted pendant
fallow frost
#

pardon my noobiness but wth is an undergrad

serene scaffold
fallow frost
#

so its the same as a bachelors ?

serene scaffold
#

my degree is a bachelors, yes.

fallow frost
#

ok

#

So what other tools should I study next

#

what do you recommend ?

serene scaffold
#

I should add that I don't think university is the perfect mechanism for imparting knowledge. it's just the most widely recognized.

serene scaffold
fallow frost
#

but even then, just knowing those three tools isnt enough, right ?

serene scaffold
fallow frost
#

so thats why im asking what else I should learn

#

to find a job

serene scaffold
#

each job is going to have different criteria that they look for

fallow frost
#

which imo I will be able to find even b4 I finish the boot camp if I manage to master those tools

#

and btw im not trying to get a 'good' job, I really dont give a shit as long as im hired

#

then I could leverage my work experience to get smth better instead of doing 3-4 years of college

wooden sail
#

do you have some idea of the kind of position you'd like, or where you'd like to work? you could look at job descriptions and see how you compare to the sought-after profile

#

or whether you feel confident enough to do what the task descriptions contain

fallow frost
wooden sail
#

hmm but it makes more sense imo to focus on the task, not the tool. what if you don't even get the choice to work with python, for whatever reason?

fallow frost
# wooden sail do you have some idea of the kind of position you'd like, or where you'd like to...

I'm looking at the job descriptions on LinkedIn and 95% of the job postings are by agencies who dont know jack shit about programming or data science, but most are like: 'we're looking for an Entry level data analyst with 2+ years of experience who must know the following 10 technologies, and the ideal candidate will also be familiar with these additional 10 tools'
so its fucking ridiculous

fallow frost
#

Like there is so much that im not sure where to continue
Data visualization: Tableau, Power BI, ETL, ML, pySpark and so many more

wooden sail
#

i guess i couldn't say, i guess my position is a little removed from the real world

#

looks like you'd wanna go into visualization next, in any case

limber token
#

Is there a way to pass multiple separators to pandas' read_csv()? Something akin to this:

fallow frost
limber token
fallow frost
limber token
#

Used: df = pd.read_csv('products_2.csv', sep="\s+|;|,", engine='python')

serene scaffold
#

because what that error message is telling you is that if you use \s+|;|, as the delimiter, each row has different numbers of columns

#

and that's not allowed.

limber token
#

Hm

#

Libre Office reads it perfectly, but it has a size limit

serene scaffold
#

rekt

limber token
#

And I prefer to manipulate data in Pandas anyway

#

What should I do?

serene scaffold
#

see if you can figure out which delimiter is used the same number of times in each row

#

that's the real file-level delimiter. if you need to break up the nested data, you can use .str.split(';').explode()
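A minimal sketch of that two-step approach, assuming a made-up file where `,` is the real column delimiter and one column (`tags`) holds `;`-separated values:

```python
import io

import pandas as pd

# Made-up CSV: commas separate columns, semicolons separate nested values.
raw = io.StringIO("sku,name,tags\n1,chair,wood;oak\n2,lamp,metal\n")

# Step 1: read with the single delimiter that is consistent across rows.
df = pd.read_csv(raw, sep=",")

# Step 2: split the nested column into lists, then one row per list item.
df["tags"] = df["tags"].str.split(";")
exploded = df.explode("tags", ignore_index=True)
```

The other columns are repeated for each exploded value, so `sku` 1 appears twice in `exploded`.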

limber token
#

I'm not sure what to do.
The file is an import file for my work, but it was a faulty import file that set a bunch of attribute values to empty strings when it shouldn't have. My job is to recover the lost attributes

serene scaffold
regal warren
#

Hi everyone. Can someone please let me know what is the best way and resource to learn Pyspark?

spiral furnace
#

guys do you face problems lately using chrome with google colab?

mild dirge
#

Nope 👍🏽

terse dagger
#

Hey guys,

Does anyone know how to fix this error?

ValueError: cannot reindex on an axis with duplicate labels

#help-orange full context

serene scaffold
terse dagger
#

Ill do the print lets continue in orange channel

river cloud
#

Hello everyone! I need your help, please. I've been investigating this for a long time but I don't have an answer. I need to export a dataframe to Excel, but in Excel I need the "Number Format" to be "Text", like in the pic, because whenever I export to Excel it is set to "General" and not "Text"

spiral furnace
river cloud
spiral furnace
#

have you also checked Pandas.ExcelWriter Method?

river cloud
spiral furnace
odd meteor
spiral furnace
#

but first you might have to create your excel file with pandas and then edit it with XlsxWriter

paper quarry
#

Hello! Does anyone here know how to implement the NBEATS algorithm and can help me with it? Thanks in advance.

boreal cape
#

Hello. Does anyone know what these mean or how to replicate this format?

1.346899999999999977e+02
1.322500000000000000e+02
1.300000000000000000e+02
1.335200000000000102e+02
1.303079999999999927e+02

mild dirge
#

It's scientific notation

#

the top one is like 1.346899.. * 10 ^ 02

#

@boreal cape

boreal cape
#

Hmm... These are supposed to be stock prices so I have no idea how it got turned into that.

wooden sail
#

it's just a way to print a float

mild dirge
#

Yeah, it doesn't change anything about the numbers

#

the number is still 134.6899...
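A quick sketch showing that these strings parse back to ordinary floats, and how the same notation can be produced. The 18-decimal exponent layout matches the `%.18e` format that `numpy.savetxt` uses by default, which may be where the file came from (an assumption).

```python
# Two of the values from the file, as text.
raw = ["1.346899999999999977e+02", "1.322500000000000000e+02"]

# float() understands scientific notation directly.
prices = [float(text) for text in raw]

# Format for display with two decimals.
formatted = [f"{price:.2f}" for price in prices]

# Going the other way: print a float in the same 18-decimal notation.
scientific = f"{132.25:.18e}"
```

The long tail of digits in the first value is just the closest binary floating-point number to 134.69 written out in full.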

boreal cape
#

OH

#

Lightbulb

#

Y'all just solved 2 days of frustration

pliant star
#

hey guys, any idea why the code keeps crashing?

```python
def detect(self, image, original_shape=None):
        height, width, channels = image.shape

        blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)

        self.net.setInput(blob)
        outs = self.net.forward(self.output_layers)
```
#
(tensorflow)   SoSe22/appliedcomputerscienceinsports/detector » python main.py                                                                        main deleted modified untracked
Traceback (most recent call last):
  File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/main.py", line 55, in <module>
    run()
  File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/main.py", line 32, in run
    processed_image = detector.detect(preprocessed_image, original_image.shape)
  File "/Users/s/dev/uni/SoSe22/appliedcomputerscienceinsports/detector/test_ui/detector/__init__.py", line 32, in detect
    blob = cv2.dnn.blobFromImage(image, 1, (416, 416), (0, 0, 0), True, crop=False)
cv2.error: OpenCV(4.6.0) /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:3689: error: (-215:Assertion failed) !dsize.empty() in function 'resize'```
#

(3, 2160, 1216) that's my image.shape

mild dirge
#

That normally means that the image is empty @pliant star maybe you didn't load it in correctly

#

Like a wrong path or something

#

Try cv2.imshow("image", image) and cv2.waitKey(0)

#

before the line that gives the error
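Another thing worth checking, given the reported shape `(3, 2160, 1216)`: that looks like a channels-first array (C, H, W), while OpenCV functions generally expect height, width, channels. This is an assumption about what the preprocessing step produced. A sketch of reordering the axes with NumPy before the image reaches `blobFromImage`:

```python
import numpy as np

# Dummy array with the shape reported above, assumed to be (C, H, W).
chw = np.zeros((3, 2160, 1216), dtype=np.uint8)

# Move the channel axis to the end: (C, H, W) -> (H, W, C).
# ascontiguousarray makes a real copy in the new memory layout.
hwc = np.ascontiguousarray(np.transpose(chw, (1, 2, 0)))

height, width, channels = hwc.shape
```

If the array is already in H, W, C order at some earlier point, passing that version to the detector avoids the transpose entirely.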

mint palm
#

is it possible to detect 3 classes using only 2 "anchor boxes" when using bounding box technique??

#

why does he take 2

#

am i right if i think we assume that more than 2 classes are highly unlikely in a single bounding box??

mild dirge
#
Anchor boxes are a set of predefined bounding boxes of a certain height and width. These boxes are defined to capture the scale and aspect ratio of specific object classes you want to detect and are typically chosen based on object sizes in your training datasets.
#

This is what I found on anchor boxes

#

It seems that there may be 2 different shapes that are likely to contain a car/motorcycle/pedestrian etc.

#

@mint palm

#

Does that make sense?

#

So for each bounding box shape (2) you get a result for each class (8) whether that class is contained in a single box of the 3x3 grid

mint palm
mint palm
rough mountain
#

Anyone know of a tokenizer that will parse a sentence like this
"You're amazing," she said.
['"', "you're", "amazing", ",", '"', "she", "said", "."]

#

" you're amazing , " she said .

serene scaffold
rough mountain
serene scaffold
rough mountain
#

I forgot how amazing spacy's docs are
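For the exact output asked about above (contractions kept whole, punctuation as separate tokens, everything lowercased), a small standard-library sketch is also possible. This is a plain regex, not spaCy's tokenizer, and it will be cruder on real text; `simple_tokenize` is a made-up name.

```python
import re

# A word, optionally followed by apostrophe parts (you're), or one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def simple_tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = simple_tokenize('"You\'re amazing," she said.')
joined = " ".join(tokens)
```

Joining the tokens with spaces gives the second form from the question.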

fiery dust
#

what do I need to know to learn and use ML/AI

serene scaffold
eager wedge
#

could someone help me with 3d patch extraction?

grand vapor
#

if you append dataframes to a list, such that each item in the list is a dataframe, is there a way to name these dataframes for reference? or do you just need to use the list index (listname[0], for example) and remember what each dataframe is meant to tie back to?

#

i think with my situation I may need to just keep track in my head what each df is. they're more or less in a certain order already, but it'd help to have a visual reminder
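One common option is a dict instead of a list, so each dataframe is stored under a descriptive key. A minimal sketch with made-up names and data:

```python
import pandas as pd

# Keys act as the names; the values are the dataframes.
frames = {
    "sales": pd.DataFrame({"x": [1, 2]}),
    "returns": pd.DataFrame({"x": [3]}),
}

# Look one up by name instead of by position.
sales = frames["sales"]

# Dicts keep insertion order, so looping still follows the original order.
names = list(frames)
row_counts = {name: len(frame) for name, frame in frames.items()}
```

`pd.concat(frames)` also accepts a dict and uses the keys as the outer index level if the frames need to be combined later.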

serene scaffold
grand vapor
serene scaffold
rough mountain
#

unrelated to my other question.

I wish to make a model that creates space ships for a video game. I've got the data from the save files and the game stores them as json.

```
  "Parts": [
    {
      "IDString": "cosmoteer.armor_1x2_wedge_L",
      "Location": [
        -14,
        6
      ],
      "Rotation": 0
    },
    {
      "IDString": "cosmoteer.armor_1x2_wedge_R",
      "Location": [
        -14,
        8
      ],
      "Rotation": 2
    },
etc...
```
JSON is impractical for ML models, but I could turn the part list into an image, giving each tile an id. The big issue is that parts come in a bunch of different sizes (luckily always rectangles). So if I just tell it to predict a grid of part ids, it would likely start making impossible parts. But I can't see a better way.
#

Also having it output a rotation for every tile seems problematic, but inferring the rotation from an output takes away control from the AI when multiple rotations could work.

#

I also need it to output a map of doors. Those are 1x1 so it's easier. my only issue here is this might be too much to throw at one model.

fiery dust
chilly abyss
#

Hi all

#

pls how can I split this date and time using pandas lib?

shrewd locust
#

Unknown label type: 'continuous-multioutput' what's this error?

tawdry urchin
#

@chilly abyss I was looking into this earlier today. People suggested using pd.to_datetime(df.date) but I could not get it to work. You can do
import datetime
df['date'].dt.normalize() which sets all of the times to midnight. It obviously doesn't get rid of the time, but it may be useful

chilly abyss
#

Thanks @tawdry urchin This was the error msg I got - 'AttributeError: Can only use .dt accessor with datetimelike values'
I am still checking online for solution, I think because of the '..+00' at the end of the ts (time series)

#

I will also try the solution you gave

tawdry urchin
#

@chilly abyss Sorry you are right, you need to use both, so you do
df.date = pd.to_datetime(df.date)
df.date = df.date.dt.normalize()
I was able to get it working like this in mine, hopefully it helps you out
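A minimal sketch of splitting the date and the time into separate columns, assuming made-up timestamps that carry a UTC offset written as `+00:00` (the exact text format in the real data may differ):

```python
import pandas as pd

# Made-up timestamps with a UTC offset at the end.
df = pd.DataFrame({
    "date": ["2022-07-01 12:34:56+00:00", "2022-07-02 08:00:00+00:00"],
})

# Strings must become datetimes before the .dt accessor can be used.
df["date"] = pd.to_datetime(df["date"], utc=True)

df["day"] = df["date"].dt.date           # calendar date only
df["time"] = df["date"].dt.time          # time of day only
df["midnight"] = df["date"].dt.normalize()  # same day, time set to 00:00
```

The "Can only use .dt accessor with datetimelike values" error from earlier is what pandas raises when the column is still text at the point `.dt` is used.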

#

Im still learning myself, I basically have to learn python to create fixes in processes, so apologies if im missing anything

chilly abyss
#

It's alright, we learn together. 🙂

#

I'm new to python too.

tawdry urchin
#

haha sounds good, im pretty comfortable with pandas, and ive done some pretty neat projects with python, its the classes/functions that I am mostly unfamiliar with, the code i am writing isn't really set up to run forever because im working with pretty bad data

tropic matrix
#

in pandas is there a way to specify dtype as "dict"? both strings and dictionaries are "objects" and it's making preprocessing a dataset rather annoying

#

i.e. select all columns that are a dict

serene scaffold
tidal bough
# tropic matrix i.e. select all columns that are a dict

Not possible, I'm pretty sure. Dtypes are a numpy concept and essentially describe how an element is stored in memory. The dtype of object actually means "a pointer to some PyObject". Pointers to objects of different python types aren't different, so there's no dtypes for different python types.
(This is also exactly how python lists store elements.)
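Since the dtype cannot tell the two apart, one workaround is to inspect the values themselves. A sketch with made-up data; `columns_holding` is a hypothetical helper, and it scans every cell, so it is meant for debugging rather than hot paths.

```python
import pandas as pd

# Made-up frame: one string column, one dict column, one numeric column.
df = pd.DataFrame({
    "name": ["sword", "staff"],
    "attributes": [{"mana_pool": 1}, {"undead_resistance": 1}],
    "price": [10, 20],
})


def columns_holding(frame, python_type):
    """Return the names of columns where any cell is an instance of python_type."""
    return [
        col
        for col in frame.columns
        if frame[col].map(lambda value: isinstance(value, python_type)).any()
    ]


dict_cols = columns_holding(df, dict)
```

Running the same check right before the encoding step can show which column still carries a stray dict.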

tropic matrix
#

but thank you

tawdry urchin
#

Could you give an example? Im not sure I quite understand

serene scaffold
tropic matrix
# serene scaffold it's pretty unlikely that there isn't a better way to do it, but we'd need to kn...

i'll give an example of a few:

"attributes": {
  "mana_pool": 1,
  "undead_resistance": 1
}

there's around 15 different possible keys, and maximum of 2 can be present in one row, value is always an int

"runes": {
  "BLOOD_2": 1
}

there's an unknown amount of possible keys, but the value is always an int

"gems": {
  "AMETHYST_2": "FINE",
  "AMETHYST_1": "FINE",
  "AMETHYST_0": "FINE"
}

unknown amount of keys, value is always a string

the thing is that i'm able to easily process those dicts to make them better to work with (i.e. similar to one hot encoding, convert the key to a column and the value becomes the cell for that column), however when trying to continue with preprocessing when i use pd.get_dummies to one hot encode all of the leftover strings, it's finding a dict in the dataset and i'm trying to figure out where it's hiding
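A sketch of the key-to-column conversion described here that also drops the original dict column in the same step, which is one place a leftover dict can hide. The data and the `attr_` prefix are made up for illustration.

```python
import pandas as pd

# Made-up rows; the last one has no attributes at all.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "attributes": [
        {"mana_pool": 1, "undead_resistance": 1},
        {"mana_pool": 2},
        {},
    ],
})

# One column per key; missing keys become NaN, filled here with 0.
expanded = (
    pd.DataFrame(df["attributes"].tolist(), index=df.index)
    .fillna(0)
    .astype(int)
)

# Drop the dict column and attach the expanded columns in one expression.
result = df.drop(columns="attributes").join(expanded.add_prefix("attr_"))
```

For string-valued dicts such as the gems, the same expansion works without the `astype(int)` step, and the resulting string columns can go through one-hot encoding afterwards.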

tawdry urchin
#

I have never dealt with a dataset like that and truthfully im not sure what I would do, but luckily im not the smartest one here 🙂

tropic matrix
#

there must be somewhere where i'm forgetting to delete a column or smth after i preprocess it

#

but when i print the dataset columns none of them have dicts which is odd

tawdry urchin
#

just make sure your dataframe names are consistent throughout

#

pretty easy to bring the wrong dataframe in and then youre reviewing the correct one and working with a prior version

iron basalt
tropic matrix
#

i'm just working with what i have

iron basalt
#

Convert them to lists and store them somewhere.

#

Then just use that, so you don't have to convert each time.

#

You can transform it however you want and store it however you want.

#

Take the bad data format and fix it.

#

The attributes looks like an array of booleans. If that is the case, since there are only 15 at most, you can use a bitmask.

#

(bitset)
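
A rough sketch of the bitmask idea, assuming the attributes really are just present/absent flags. The attribute list here is hypothetical (the real one has around 15 names), and the encoding throws away the integer values, so it only fits if those don't matter:

```python
# Hypothetical, fixed list of attribute names; the real dataset has about 15.
ATTRIBUTE_NAMES = ["mana_pool", "undead_resistance", "life_regeneration"]
BIT = {name: 1 << i for i, name in enumerate(ATTRIBUTE_NAMES)}

def to_bitmask(attributes):
    """Set one bit per attribute key that is present; values are ignored."""
    mask = 0
    for name in attributes:
        mask |= BIT[name]
    return mask

def from_bitmask(mask):
    """Recover the attribute names whose bits are set."""
    return [name for name in ATTRIBUTE_NAMES if mask & BIT[name]]

mask = to_bitmask({"mana_pool": 1, "undead_resistance": 1})
print(mask, from_bitmask(mask))
```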

#

How many different values the runes and gems can be changes how they can be stored.

#

If there is some fixed set of values you can improve it a lot.

tropic matrix
iron basalt
#

(Not there, or 1-10)

#

Well you can store those in a numpy array.

#

Runes and gems just look like two lists, that are dicts for some reason.

#

(But out of order?)

tropic matrix
#

but what i think you're not realizing is that i'm writing code on doing all of this rn

#

and i can handle that and i've already written code that converts these dicts into a better format

#

the issue is that when trying to run pd.get_dummies it's saying there's still a dict dtype present, even though when I check the dataframe i don't find any

tropic matrix
#

sheesh that was passive aggressive, one sec

tawdry urchin
#

sounds like some stardew valley shit

tropic matrix
tropic matrix
# iron basalt Show code.

btw the code isn't pretty but it works, wrote it without the intention of it being read by other people

df = full_df.drop(['uuid'], axis=1, errors='ignore')

if verbose: print('Starting enchantments')
df = df.join(pd.DataFrame(list(df['enchantments'])).fillna(0).add_suffix('_enchantment')).drop('enchantments', axis=1)
if verbose: print('Encoded enchantments')

if verbose: print('Starting ability_scroll')
df = df.join(pd.DataFrame(ability_scroll_mlb.transform(df['ability_scroll']), columns=ability_scroll_mlb.classes_).add_suffix('_ability_scroll')).drop('ability_scroll', axis=1)
if verbose: print('Encoded ability_scroll')

if verbose: print('Starting gems')
df = df.join(pd.DataFrame(list(df['gems'])).fillna('').drop('unlocked_slots', axis=1, errors='ignore').add_suffix('_gem')).drop('gems', axis=1)
if verbose: print('Finished gems')
    
if verbose: print('Started runes')
df = df.join(pd.DataFrame(list(df['runes'])).fillna('').add_suffix('_rune')).drop('runes', axis=1)
if verbose: print('Finished runes')
    
if verbose: print('Started necromancer_souls')    
df = df.join(pd.DataFrame(necromancer_souls_mlb.transform(df['necromancer_souls'].apply(lambda x: list(map(lambda y: y['mob_id'], x)))), columns=necromancer_souls_mlb.classes_).add_suffix('_necromancer_soul')).drop('necromancer_souls', axis=1)
if verbose: print('Finished necromancer_souls')
    
if verbose: print('Started attributes')
df = df.join(pd.DataFrame(list(df['attributes'])).fillna(0).add_suffix('_attribute')).drop('attributes', axis=1)
if verbose: print('Finished attributes')

here's the code for converting all of those dicts/lists that ik are present in the dataset into a better format with pandas

#

@iron basalt

iron basalt
tropic matrix
# iron basalt Where does the get_dummies happen?

this is the code that contains it (it is run directly after the code block above):

X = df.drop('price', axis=1)
y = df[['price']]

if verbose: print('Started encoding')

X = pd.get_dummies(X) # here

df_columns = X.columns.tolist()

scaler_X = StandardScaler()
scaler_X.fit(X)

scaler_y = StandardScaler()
scaler_y.fit(y)
iron basalt
#

What is X's type / what does it look like? Before get_dummies.

tropic matrix
tropic matrix
#

one moment

tropic matrix
iron basalt
#

dtype('O') is the generic object dtype, so there's probably a dict in there.

#

Well, it can be in a numpy array, but it's kind of like object.

tropic matrix
#

when i checked all of the columns it's all either strings floats or ints

iron basalt
#

So it's the strings.

tropic matrix
iron basalt
#

What does print(X.dtypes) show?

tropic matrix
# iron basalt What does print(X.dtypes) show?

well there's 300 columns by the time it gets to X, so when i print X.dtypes it shows a few int64s, a decent amount of float64s, and a whole lot of objects, and when i go to check all of the columns that show object they are all just strings

#

and i won't be able to send that here without flooding the channel, it doesn't even print everything to console, after some point it says "..." then goes to the end (but i checked the dtypes manually without relying on the console output)

iron basalt
#

And what is the exact error it gives?

#

Do your strings have some max size?

#

If your numpy arrays at any point contain different types of objects they will have the "object" type (like regular Python lists), but if it's all strings, with some max length, it can actually store that.

#
>>> x = np.array([foo, "Hello"])
>>> x
array([<__main__.Foo object at 0x7f402d6a7f70>, 'Hello'], dtype=object)
>>> y = np.array(["Hello", "World"])
>>> y
array(['Hello', 'World'], dtype='<U5')
>>> 
With a unicode dtype.
limber token
iron basalt
#

(And that "object" may mean mixed types (it's a design flaw of pandas, because Numpy was not really meant for this and it uses it))

tropic matrix
#

it's not letting me paste the error here so i'll pastebin it

#

this happens on the X = pd.get_dummies(X) line

iron basalt
#

That is what I got when trying to get_dummies on a df that had both strings and dicts in the same column, dtype shows as "object" (for the column).

tropic matrix
#

hm

#

..... you need to read up we've been discussing this for the past hour or two

iron basalt
#

If it's all string it could in theory be something like <U... for the dtype, which would only be strings up to that length. It's both faster and in this case safer for detecting issues.

tropic matrix
#

that's weird, i've never had pandas give me a unicode dtype

#

let me test smth rq

iron basalt
#

Pandas uses object for strings, a design flaw to an extent.

#

They want dynamic strings.

#

(arbitrary length)

#

(This gives issues to databases too in terms of speed and safety and all that)

tropic matrix
#

how could i convert a string column to a unicode dtype

#

?

#
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def'],
    'test2': ['ghi', 'jkl']
})

print(test)

print(test.dtypes)
  test test2
0  abc   ghi
1  def   jkl
test     object
test2    object
dtype: object
iron basalt
#

If you make a numpy array with strings it will do it.

tropic matrix
#
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def'],
    'test2': ['ghi', 'jkl']
})

print(np.array(test['test']))
array(['abc', 'def'], dtype=object)
#

@iron basalt hm

#

any way to coerce it into the unicode string dtype?

iron basalt
#

Yeah if you explicitly do dtype=whatever for a numpy array

#

Pandas seems allergic to the idea though.

tropic matrix
iron basalt
#

Yeah:

>>> x = np.array(["Hello", "World"])
>>> x
array(['Hello', 'World'], dtype='<U5')

#

Pretty annoying.

#

Makes debugging Pandas even harder.

tropic matrix
#

if i set any casting limits other than unsafe it decides that strings aren't strings anymore

#

the only way for it to work is casting unsafe

#

and the thing is, that converts the dict to a string too

#

🤦

iron basalt
#

Playing fast and loose with types.

#

Yeah, IDK, now gotta make a separate part that loops through it all and tries to find out the type to detect an issue, then try to narrow down what caused it and craft an example to test against. Ideally the input would have some spec. to make this way less hacky.
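
That "loop through it all and find out the type" step could look roughly like this. It's only a sketch on the toy frame from this chat; `type_census` is a made-up helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({
    "test": ["abc", "def", {"test": 1}],
    "test2": ["ghi", "jkl", "mno"],
})

def type_census(frame):
    """Map each object-dtype column to a count of the Python types it holds."""
    census = {}
    for col in frame.select_dtypes(include="object").columns:
        counts = frame[col].map(lambda v: type(v).__name__).value_counts()
        census[col] = counts.to_dict()
    return census

census = type_census(df)

# Columns holding more than one Python type are the suspicious ones.
mixed = {col: kinds for col, kinds in census.items() if len(kinds) > 1}
print(mixed)

# Row labels of the dict cells in one suspicious column.
is_dict = df["test"].map(lambda v: isinstance(v, dict)).to_numpy()
bad_rows = df.index[is_dict].tolist()
print(bad_rows)
```

It doesn't convert anything to strings first, so a dict stays a dict and can't be confused with a string that happens to look like one.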

tropic matrix
#
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def', {'test': 1}],
    'test2': ['ghi', 'jkl', {'test': 2}]
})

def test_func(cell):
    try:
        json.loads(cell)
    except:
        print(cell)
    
    return cell
    
test['test'].apply(test_func)
#

it excepts whenever it encounters a dict

#

sorry whenever it doesn't encounter a dict

#

so i can kinda flip flop it

#

wait a minute no this doesn't work

#

first i need to convert it all to a string

#

oh you're kidding me

#

it expects double quotes

#

ok @iron basalt this one actually works:

#
test = pd.DataFrame.from_dict({
    'test': ['abc', 'def', {'test': 1}],
    'test2': ['ghi', 'jkl', {'test': 2}]
})

test['test'] = np.array(test['test']).astype('unicode')

def test_func(cell):
    try:
        # only print cells whose text evaluates to a dict
        if isinstance(eval(cell), dict):
            print(cell)
    except Exception:
        pass
    
    return cell
    
test['test'].apply(test_func)
#

alright trying to apply that to the entire dataset

#

istg if it passes cleanly i'm gonna punch a wall

iron basalt
#

Ok, just make sure there is nothing strange happening with the eval.

#

Don't want to suddenly start deleting root.
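
If the cell text isn't fully trusted, `ast.literal_eval` from the standard library is a safer stand-in for `eval` here: it only accepts Python literals and refuses function calls. A minimal sketch (`parse_dict_literal` is a made-up helper name):

```python
import ast

def parse_dict_literal(text):
    """Return the dict that `text` spells out, or None if it is not a dict literal."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None

parsed = parse_dict_literal("{'test': 1}")
not_a_dict = parse_dict_literal("abc")
refused = parse_dict_literal("__import__('os').getcwd()")
print(parsed, not_a_dict, refused)
```

Unlike `json.loads`, it is fine with single-quoted keys, which is what `str(some_dict)` produces.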

tropic matrix
#

yeah that's not gonna be an issue thankfully

#

ISTFG

#

@iron basalt IT PASSED CLEANLY

#

IDEK WHAT TO SAY

iron basalt
#

Uh, i'm out of ideas for now.

tropic matrix
#

me too

#

maybe i should use sklearn to onehotencode now

#

maybe that will work better than get dummies 😭

iron basalt
#

Personally I would have switched languages to something with static types and done it manually at this point. This seems a bit too complex for Pandas.

tropic matrix
#

R?

iron basalt
#

Or a different dataframe lib for Python.

tropic matrix
iron basalt
#

Then if get_dummies is still bugging, IDK.

#
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...         "optional": [28, 300, None, 2, -30],
...     }
... )
>>> df
shape: (5, 5)
┌─────┬────────┬─────┬────────┬──────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ optional │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64      │
╞═════╪════════╪═════╪════════╪══════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 28       │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 300      │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ null     │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ -30      │
└─────┴────────┴─────┴────────┴──────────┘
>>> 
Polars has an actual str type.
#

Rather than just object.

#
>>> df2 = df.to_pandas()
>>> df2
   A  fruits  B    cars  optional
0  1  banana  5  beetle      28.0
1  2  banana  4    audi     300.0
2  3   apple  3  beetle       NaN
3  4   apple  2  beetle       2.0
4  5  banana  1  beetle     -30.0
>>> df2.dtypes
A             int64
fruits       object
B             int64
cars         object
optional    float64
dtype: object
>>> 
#

(Pandas just decided to throw everything under "object" because it's using numpy)

tropic matrix
#

if so then i can do the first parts in pandas

#

possibly then after that convert to polars

#

unless polars doesn't have a get dummies equivalent

#

then i'll have to refactor my entire processing setup 😀

#

oh it seems it does

#

polars.from_pandas and polars.get_dummies both exist

iron basalt
#

I find polars easier to read. Feels like database queries.

tropic matrix
#

who tf decided pandas would be the standard

#

this seems so much better

iron basalt
#

Standards by popularity are not really standards, but also standardization should only be done when the thing being standardized has had a high level of effort and hindsight to it (and is not rapidly changing).

#

Most big Python libs are constantly changing (open source) and all that, not really stable stuff to make standards of.

#

More abstract standards do exist though and are fine, like the one for generic array-like API for Python libs.

tropic matrix
#

@iron basalt might be a dumb question, but is polars able to store dicts?

#

because i keep getting an error when trying to convert a string representation of a dict into a dict saying "tuple must be same length"

charred light
#

Hello darkness my old friend, over-fitting has come again.

iron basalt
charred light
#

Also, there's pyspark for larger datasets. and SQL

iron basalt
iron basalt
#

Dataframes are really nice when your problem / data fits with them (same with databases).

#

Extracting data from some non-standard file format will always be a pain best done manually.

#

(If that file format is complex and/or highly structured / nested)

#

(CSV would be an example that is the opposite, well understood, simple, built-in support everywhere, fits well into what dataframes want to do)

violet gull
#

im going to a machine learning event thingy and i was wondering what all i should bring on a flash drive

#

only thing i can think of is a matrix multiplication and normalizing algorithm

grand vapor
#

if my dataframe uses a datetime index, is it possible to find the "row number" of a specific time? I am wanting to drop all rows that occur after a certain time, but am unsure of how to do it

violet gull
grand vapor
#

i would typically do something like

df.drop(df.tail(rows).index, inplace = True)

But because of the datetime index, I can't come up with an integer to put in place of "rows"
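
With a datetime index the row number usually isn't needed, since the index can be compared against a timestamp directly. A small sketch with invented data and a hypothetical cutoff:

```python
import pandas as pd

# Six hourly rows starting at 09:00; the data is invented.
idx = pd.date_range("2022-01-01 09:00", periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": range(6)}, index=idx)

cutoff = pd.Timestamp("2022-01-01 11:30")  # hypothetical cutoff time

# Keep only rows at or before the cutoff; no integer position needed.
kept = df[df.index <= cutoff]

# If the row number is still wanted: position of the first row after the
# cutoff (this assumes the index is sorted).
first_after = df.index.searchsorted(cutoff, side="right")
print(len(kept), first_after)
```

`df.iloc[:first_after]` then gives the same rows as the boolean filter.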

royal garnet
#

Is it possible for a function used with dataframe.apply to create a new dataframe?

#

I want to read each row in my original dataframe, and based on the values of certain columns, conditionally populate a new dataframe.
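
One common pattern for this, rather than having the applied function build a dataframe itself, is to collect one dict per qualifying row and construct the new frame once at the end. A sketch with invented columns and an invented condition:

```python
import pandas as pd

orders = pd.DataFrame({
    "item": ["a", "b", "c"],
    "qty": [5, 0, 12],
    "price": [2.0, 3.0, 1.5],
})

# Collect one dict per qualifying row, then build the new frame once.
records = []
for row in orders.itertuples(index=False):
    if row.qty > 0:
        records.append({"item": row.item, "total": row.qty * row.price})

summary = pd.DataFrame(records)
print(summary)
```

When the condition is simple, a boolean mask such as `orders[orders["qty"] > 0]` followed by column arithmetic avoids the Python loop entirely.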

stoic viper
#

Hey,

#

Let's say I have 2 dataframes in a list that share a column name, and I want to remove all rows with 1 in that column, with:
for df in df_list:

#

It doesnt apply it to the dataframes. Works without for

stoic viper
#
df = df[df['Column_Name'] == 0]
#

example of what i do.
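
The likely reason: inside the loop, `df = ...` only rebinds the loop variable to a new filtered frame, and the list keeps pointing at the originals. A minimal sketch of one way around it, with invented data:

```python
import pandas as pd

df_list = [
    pd.DataFrame({"Column_Name": [0, 1, 0], "x": [1, 2, 3]}),
    pd.DataFrame({"Column_Name": [1, 1, 0], "x": [4, 5, 6]}),
]

# Rebinding the loop variable `df` does not touch the list,
# so build a new list of filtered frames and rebind the list name instead.
df_list = [df[df["Column_Name"] == 0] for df in df_list]

print([len(df) for df in df_list])
```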

lofty elk
#

I am making a visualization here and the bar chart has lines on the bars
I want to remove these lines
here is the code

#

import pandas as pd 
import plotly 
import plotly.express as px 
import plotly.io as pio 

df = pd.read_csv("Caste.csv")
df = df[df['state_name']=='Maharashtra']
#df = df.groupby(['year','gender',],as_index=False)[['detenues','under_trial','convicts','others']].sum()


barchart = px.bar(data_frame=df, 
    x='year', 
    y='convicts', 
    color='gender', 
    opacity=1, orientation='v', 
    barmode='relative',
)

pio.show(barchart)
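
The lines are most likely the boundaries between stacked segments: px.bar draws one rectangle per dataframe row, so several rows with the same year and gender get stacked with a visible seam between them. Summing first, as the commented-out groupby hints at, leaves one segment per group. A pandas-only sketch with invented numbers standing in for the CSV:

```python
import pandas as pd

# Stand-in for the Maharashtra subset of Caste.csv; values are invented.
df = pd.DataFrame({
    "year": [2001, 2001, 2001, 2002],
    "gender": ["Male", "Male", "Female", "Male"],
    "convicts": [10, 5, 2, 7],
})

# One row per (year, gender), so each bar is drawn as a single segment.
agg = df.groupby(["year", "gender"], as_index=False)["convicts"].sum()
print(agg)
```

Passing `agg` to px.bar in place of `df` should then give solid bars. If the per-row segments are wanted but not their outlines, `barchart.update_traces(marker_line_width=0)` is the other usual option.
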
serene briar
#

@untold bloom cheers for your help in the help channel, was pulled into a meeting so wasn't able to follow through but thanks again ^^

terse dagger
#

Guys help pls creating a dataframe to count members with diff activity statuses. I was here yesterday but need help again #help-pear

zealous burrow
lofty elk
#

Dash is pretty cool

ancient pendant
#

Hi Everyone,
I wanted to know how to find correlation between multiple variables in python?

#

I was looking at the numpy documentation, where it said that with np.corrcoef() you can find correlation, but only between 2 variables

#

No I have been told to use only np.corrcoef

wooden sail
#

what's your question?

#

you can place the variables as columns (or rows) of the matrix (or matrices) you pass to np.corrcoef

wooden sail
# ancient pendant No I have been told to use only np.corrcoef

you can put as many variables as you want into the columns or rows of the matrix, as long as you specify which axis you put them on. in this example, we see that, as one would expect from independent random vectors with mean 0, their correlation goes to 0 with the number of observations

In [16]: import numpy as np

In [17]: x = np.random.normal(0,1,(20,2))

In [18]: np.corrcoef(x, rowvar=False)
Out[18]:
array([[1.        , 0.02356115],
       [0.02356115, 1.        ]])

In [19]: x = np.random.normal(0,1,(100,2))

In [20]: np.corrcoef(x, rowvar=False)
Out[20]:
array([[ 1.        , -0.06573243],
       [-0.06573243,  1.        ]])

*edit on second glance it doesn't behave so nicely even with so many samples, but you still get the idea of how to use the func

tacit basin
wooden sail
#
In [25]: x = np.random.normal(0,1,(10000000,2))

In [26]: np.corrcoef(x, rowvar=False)
Out[26]:
array([[1.00000000e+00, 2.93622044e-04],
       [2.93622044e-04, 1.00000000e+00]])

ok, this is better

#

you could alternatively compute it "by hand" by normalizing the rows or columns, and then computing X^T X or X X^T depending on how you arrange your data
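
A sketch of the "by hand" route, assuming variables are in columns and observations in rows (the random data is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, (200, 3))  # 200 observations (rows) of 3 variables (columns)

# Center each column and scale it to unit Euclidean norm.
centered = x - x.mean(axis=0)
z = centered / np.linalg.norm(centered, axis=0)

# Entry (i, j) is the dot product of normalized columns i and j,
# which is the Pearson correlation of variables i and j.
corr_by_hand = z.T @ z

print(np.allclose(corr_by_hand, np.corrcoef(x, rowvar=False)))
```

With variables in rows instead, the same thing works by normalizing along axis 1 and computing `z @ z.T`.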

ancient pendant
# wooden sail what's your question?

So I have four variables(ABCD)
A, B, C, D
1 2 3 4
5 6 7 8
11 12 13 14

first I was told to find pearson correlation between A and D variables
by using scipy.stats.pearsonr()

and now I have been told to find correlations between all variables using np.corrcoef()

wooden sail
#

aight. so you have your variables arranged as columns, and several observations arranged as the rows. that means you'd also have to use rowvar=False

ancient pendant
#

Ohhhkayyyy Thanks🙏 🤩

wooden sail
#
In [32]: x = np.zeros((3,4))

In [33]: x[0,:] = np.arange(1,5)

In [34]: x[1,:] = np.arange(5,9)

In [35]: x[2,:] = np.arange(11,15)

In [36]: x
Out[36]:
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.],
       [11., 12., 13., 14.]])

In [37]: np.corrcoef(x, rowvar=False)
Out[37]:
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

like so

ancient pendant
#

So when we don't use rowvar=False,
does np.corrcoef() give the correlation between each element in the columns?

wooden sail
#

yep

#

it would take the rows as defining variables, and each column would be an observation of the variables

#

the output would be 3x3
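
A quick way to see the difference is to compare output shapes on the same 3x4 array from above:

```python
import numpy as np

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [11., 12., 13., 14.]])

# Default rowvar=True: each of the 3 rows is a variable.
rows_as_variables = np.corrcoef(x)

# rowvar=False: each of the 4 columns is a variable.
columns_as_variables = np.corrcoef(x, rowvar=False)

print(rows_as_variables.shape, columns_as_variables.shape)
```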

ancient pendant
#

Okay got it thanks🎉

hollow sentinel
#
import requests
import pprint as pp

# Change this to be your API key.
MY_API_KEY="bla bla"

url = "https://beta3.api.climatiq.io/search"
query="hotel room"

query_params = {
    # Free text query can be written as the "query" parameter
    "query": query,
    # You can also filter on region, year, source and more
    # "AU" is Australia
    "region": "AU"
}

# You must always specify your AUTH token in the "Authorization" header like this.
authorization_headers = {"Authorization": f"Bearer: {MY_API_KEY}"}

# This performs the request and returns the result as JSON
response = requests.get(url, params=query_params, headers=authorization_headers).json()

# And here you can do whatever you want with the results
print(response.keys())
#

so i'm trying to form the URL for this API request but idk how

#

bc printing the json out on my console is lagging my machine

#

i can't even look at the json without my computer dying
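
A standard-library sketch of both pieces: building the full URL from the base URL plus query params, and previewing a large JSON payload without dumping all of it to the console. The `preview` helper and the shape of the stand-in `response` are assumptions, since the real response isn't shown here:

```python
import json
from urllib.parse import urlencode

url = "https://beta3.api.climatiq.io/search"
query_params = {"query": "hotel room", "region": "AU"}

# The full URL that a GET request with these params would hit.
full_url = f"{url}?{urlencode(query_params)}"

def preview(data, limit=300):
    """Return at most `limit` characters of pretty-printed JSON, plus a marker."""
    text = json.dumps(data, indent=2)
    return text if len(text) <= limit else text[:limit] + "\n... (truncated)"

# Stand-in for the parsed response; the real structure is an assumption.
response = {"results": [{"id": i} for i in range(1000)]}

print(full_url)
print(preview(response))
```

With requests itself, keeping the Response object (calling `.json()` on it later instead of chaining it onto `requests.get(...)`) also lets you read the final URL from its `.url` attribute.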