#data-science-and-ml


final kiln
#

I think it's just adding a dimension of size 1 to the end of it, but not sure

river cape
final kiln
#

it's probably there because of broadcasting rules

river cape
#

One more thing have you heard of cross validation score?

#

In that we divide the training set into train-test folds and then compute the accuracies right?

#

What does a fold mean?

final kiln
untold bloom
#

scalers (like StandardScaler() or MinMaxScaler()) expect their input to be 2D

#

single output regressors (which are most of them, like LinearRegression() or RandomForestRegressor()) yield their .predict output as 1D

#

so one must reshape the 1D prediction output to be 2D to be inverse-transformable via those scalers

#

so you reshape a 1D input of shape (N,) to be (N, 1)

#

.reshape(-1, 1) is one of the ways, another is [:, None] or [:, np.newaxis]

#

in your code, I presume you'll find a similar sort of reshaping to fit sc2 to your training target values in the first place

#

because again, they expect a 2D input, your target is 1D
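
#

a minimal sketch of the shapes involved, numpy only; the values are made up and `y_pred` just stands in for whatever a single-output `.predict` returns:

```python
import numpy as np

# A 1D array of shape (N,), like the output of a single-output regressor's .predict
y_pred = np.array([0.5, -1.2, 2.0])

# Three equivalent ways to get a 2D column of shape (N, 1)
a = y_pred.reshape(-1, 1)
b = y_pred[:, None]
c = y_pred[:, np.newaxis]

print(y_pred.shape, a.shape, b.shape, c.shape)

# Going back to 1D once the scaler is done with it
flat = a.ravel()
```

the `(N, 1)` version is what you'd hand to a scaler's `inverse_transform`; `ravel()` gets you back to `(N,)` afterwards if the rest of the code expects 1D.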

untold bloom
final kiln
#

I usually use unsqueeze

river cape
#

by the use of double square brackets

#

[[6.5]] is in 2D right

desert oar
#

sometimes error messages are unhelpful, but even knowing where the error came from (the "traceback" part) is useful

#

and it's especially useful when asking for help, because otherwise you're forcing other people to guess at what the problem might be

mellow vector
#

is there a meaningful difference between a) df.drop(index = 50) and b) df = df.loc[~(df.index == 50)] also, should i focus less on bracket notation?

desert oar
#

it doesn't help here, but don't forget that ~(x == y) is just x != y

mellow vector
#

ya i thought about that after the fact

desert oar
mellow vector
#

regarding 50 though, im not working with unique indexes

#

is generating a series still overkill?

desert oar
desert oar
mellow vector
#

i don't have a choice in that, it's the coursework

desert oar
#

oh?

#

is this a multiindex? that's different

mellow vector
#

no it's just the instructors preference i guess but thats kind of besides the point

desert oar
#

!e ```python
import pandas as pd
df = pd.DataFrame({"i": [1, 2, 2], "y": [4, 5, 6]}).set_index("i")
df = df.drop(index=1)
print(df)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 |    y
002 | i   
003 | 2  5
004 | 2  6
desert oar
#

!e ```python
import pandas as pd
df = pd.DataFrame({"i": [1, 2, 2], "y": [4, 5, 6]}).set_index("i")
df = df.drop(index=2)
print(df)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 |    y
002 | i   
003 | 1  4
desert oar
#

looks like it works

#

so yes, use .drop when available
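
#

for comparison, a small sketch of both spellings on the same toy frame with a non-unique index (the frame is the one from the eval above, the variable names are mine):

```python
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 2], "y": [4, 5, 6]}).set_index("i")

dropped = df.drop(index=2)       # removes every row labelled 2
masked = df.loc[df.index != 2]   # same rows, via a boolean mask

print(dropped.equals(masked))
```

one practical difference: `df.drop(index=50)` raises a `KeyError` when the label isn't there (unless you pass `errors="ignore"`), while the mask version just returns the frame unchanged.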

mellow vector
#

hmm think i will, just out of curiosity do you happen to know if pandas handles different index types differently? if it were a rangeindex it could obviously jump straight to it

#

eh nvm, i should practice reading docs

untold bloom
final kiln
#

no way I actually need that sqrt right ? there's no floor nor anything, so there's an assumption that the result is always an integer, which means that I can simplify it from wtv properties guarantee that assumption

untold bloom
#

that's the 1D array I meant

final kiln
#

the result is not only an integer, it is also odd

#

otherwise the -1 and division by 2 wouldnt be an integer

#

(odd number) = sqrt(x), what can I say about x?

ancient drift
#

anyone know why my acc is constant for all epochs (ive just shown 2 but it stays constant after that) when i use binary cross entropy as a loss function? seems to calculate just fine when i switch to normal non-binary cross entropy

lapis sequoia
#

has anyone already experienced training loss and inference results becoming strange after a tensorflow version upgrade?

#

it's very weird, there is no warning or deprecation notice

#

is pytorch more safe?

final kiln
#

it's allocated once when the model is instantiated and then reused across all layers

final kiln
final kiln
#

most times double checking your code works, in my experience at least

ancient drift
final kiln
#

one per class

ancient drift
final kiln
#

so the loss goes down, but acc remains stable with repeated values

#

that is kinda odd

#

maybe the API for the binary cross entropy has something different to it

#

it's optimizing for something other than acc

#

that is, you may be using binary wrong

ancient drift
#

maybe, ill check the docs

#

funny thing is when i run this model on my test data it performs the best in acc

final kiln
#

sounds like a deep debugging is required

#

never had this problem at least

untold bloom
river cape
untold bloom
#

indeed

river cape
#

For polynomial regression, how do we decide the degree of the polynomial?

final kiln
#

dont go too high tho, cuz after a certain power floating point stops working at certain ranges

river cape
final kiln
#

that looks like an x2 or x3

final kiln
#

it works the same as when you're fitting a network

#

the degree is a hyperparameter

#

and you gotta check for overfitting

river cape
#

Is this overfitting?

final kiln
#

I don't know, which points were used for fitting and which are being used for validation/test ?

river cape
#

X_grid = np.arange(min(X),max(X),0.01)
X_grid = X_grid.reshape(len(X_grid),1)
plt.scatter(X,Y,color = 'red')
plt.plot(X_grid,lin_reg2.predict(poly.fit_transform(X_grid)),color='Blue')
plt.title("Polynomial Regression Results")
plt.xlabel("Position Level")
plt.ylabel("Salaries")
plt.show()

#

Only difference is the degree

#

One is 4 and the other is 24

final kiln
#

you gotta select a certain percentage of points at random

#

and not use them during fitting

#

then calculate the error on those points after the fit

#

if that's high, you got an overfit
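
#

roughly like this, as a sketch with numpy only; the data is synthetic (a noisy cubic I made up), and the degrees 1, 3 and 9 are arbitrary picks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: a cubic trend plus noise, x already in [-1, 1]
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x**3 - x + rng.normal(scale=0.1, size=x.size)

# Hold out a random 25% of the points; they are never used for fitting
idx = rng.permutation(x.size)
n_val = x.size // 4
val_idx, train_idx = idx[:n_val], idx[n_val:]

def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial `coeffs` on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    errors[degree] = (
        mse(coeffs, x[train_idx], y[train_idx]),   # error on the fitted points
        mse(coeffs, x[val_idx], y[val_idx]),       # error on the held-out points
    )
    print(degree, errors[degree])
```

the training error can only go down as the degree goes up, so it's the held-out error that tells you something: if it's much higher than the training error, that's the overfit.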

river cape
#

Or if a model is overfitting?

final kiln
#

there's like, an infinite number of curves you can pass through those points

#

you want the one that'll be most useful to you

final kiln
#

I think it's doing that cuz you didn't give it enough time to fit

#

24 degrees is a lot, most of physics happens in the 1st 2nd and 3rd

#

oh, one way to improve it within the same number of iterations would be to do an x -> x - 5 substitution, so that the center of the approx is at x = 5

river cape
#

Thing is when i visualize the results normally, in the graph it shows me a straight line to each point

#

plt.scatter(X,Y,color = 'red')
plt.plot(X,lin_reg2.predict(X_poly),color = 'blue')
plt.title("Polynomial Regression Results")
plt.xlabel("Position Level")
plt.ylabel("Salaries")
plt.show()

#

This is for normal visualization

final kiln
#

makes sense

#

24 is too high

#

you gotta do the train/val split too

#

try like, 5 or 6 degrees

river cape
river cape
final kiln
#

I mean you can gauge it by looking at the graph, but the correct way is using a split yeah

#

It's also a good idea to normalize the data

#

Divide the x axis by 10 and the y axis by 1e6

#

It helps prevent overflow
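
#

A quick way to see the effect, as a sketch: compare the condition number of the polynomial design matrix before and after rescaling x. The levels 1..10 and degree 6 are assumptions on my part, not taken from the actual dataset:

```python
import numpy as np

x = np.arange(1, 11, dtype=float)   # assumed position levels 1..10
degree = 6

# Design matrix with columns x**6, ..., x**1, x**0
raw = np.vander(x, degree + 1)

# Same features after centring and scaling x to [-1, 1]
x_scaled = (x - x.mean()) / (x.max() - x.min()) * 2
scaled = np.vander(x_scaled, degree + 1)

cond_raw = np.linalg.cond(raw)
cond_scaled = np.linalg.cond(scaled)
print(f"{cond_raw:.3e} vs {cond_scaled:.3e}")
```

The smaller the condition number, the less the fit suffers from floating point error, which is the same reason high degrees on raw x go wrong.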

mild grotto
#

@wooden sail a little animation I made, thanks

final kiln
#

chat gpt almost nailed this transcription, I think if I'm more careful with my scribbles I can get it to do all my latex

wooden sail
final kiln
#

this is the math for the cuda kernel

#

I already caught an error when writing this

#

chat gpt was also not very good, might as well just code it right away

#

dirac should be lower lower on the third eqn

#

4 to 5 is actually wrong, first term is running over too many l's

#

i gotta limit it to those where k = k'

#

there's also one more symmetry consideration besides Mkk' = Mk'k, which is talking about the coordinates of the two vectors being dotted, but also, there's symmetry with cc', since the dot product itself is commutative, that is, qncc' = qnc'c

dusky abyss
#

i have 3 classes represented by 0, 1 and 8, when i use tf's to_categorical on them all i get are 1 0 and 0 1 which is 2 classes, i checked the documentation and from what i understand it should work with multiple classes, i even changed my classes to 0 1 and 2 but in that case i only get a single class, 1

#

nvm its working now, had to explicitly tell it i had 3 classes
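
#

for anyone hitting the same thing, a numpy-only sketch that maps arbitrary labels like 0, 1, 8 to contiguous class indices before one-hot encoding (as far as I understand, to_categorical sizes its output from the largest label + 1 unless num_classes is passed, so non-contiguous labels give extra columns; the sample labels here are made up):

```python
import numpy as np

labels = np.array([0, 1, 8, 8, 0, 1])

# Map the raw labels to contiguous class indices 0..K-1
classes, class_idx = np.unique(labels, return_inverse=True)

# One row per sample, one column per distinct class
one_hot = np.eye(len(classes), dtype=int)[class_idx]

print(classes)
print(one_hot)
```

`classes` keeps the original labels, so `classes[one_hot.argmax(axis=1)]` gets you back from a one-hot row to 0, 1 or 8.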

final kiln
#

this is gonna require some faith

#

or maybe a drawing to convince myself it makes sense

#

but again, the idea is that the metric tensor is symmetric so I save half the operations on a given calculation of the dot product, and the dot product is commutative, so I save half the operations cuz the resulting scores matrix is then also symmetric

#

half at each step

clear dove
#

Hey, everyone

#

I am currently in 2nd year in the AI & DS field and I need a mini project that I can present at my University and also use for my portfolio. Any ideas??

long canopy
#

@final kiln what times have you been getting on your instance+container startups?

#

am trying to see if I want to go instance -> container -> task or just instance -> task

#

am gonna run a couple of tests later

final kiln
#

if you have the image locally it should be like a couple seconds

#

usually better after you've initted it at least once before

long canopy
#

a custom minimal arch image or something

#

but it feels like it might be less efficient on a large scale

final kiln
#

it's a lot worse if I don't do it

#

haven't found a good solution for this

long canopy
final kiln
#

the nvidia stuff is heavy

#

I almost wonder if I should yield and just use the base AMI

#

but I don't want vendor lockin

long canopy
#

because virtualization can cause fuckups here

final kiln
#

like really big in comparison with the cpu version

long canopy
#

i mean in terms of speed

final kiln
#

haven't compared really

#

but ought to be faster, tho might not be

long canopy
#

hm i'll be running these benchmarks in the coming week

long canopy
final kiln
#

unless you got a mismatch between the image arch and the machine arch

long canopy
#

you're not on cloud?

final kiln
long canopy
#

yeah the instance is a virtual machine, it's not bare metal

final kiln
#

idk if they virtualize the gpu

#

I recall doing something like having multiple processes use the gpu at the same time, and there were no guard rails anywhere

#

might as well just do a 1:1 mounting of the gpu into the VM's right

sturdy thistle
#

“If you want to go fast, go alone. If you want to go far, go together”

Hey there! I'm currently self-studying statistics as a prerequisite for artificial intelligence. So, I'm looking to join a community of like-minded individuals. If you're also starting to learn prerequisites for AI, I'd love to connect. We could share knowledge, update each other on the topics we're covering each day, and discuss our plans for tomorrow or the week ahead. Let me know if you're interested in teaming up to support each other's learning journey!

umbral delta
#

Can someone help with part b?

final kiln
#

mixing Taylor and Fourier features I see

umbral delta
#

yes

umbral delta
final kiln
#

so isn't it the same problem but with L1 instead of L2

umbral delta
open raven
#

I'm encoding 2 categorical variables stored in one pandas data frame made up of 2 columns. Each column label is a string of alphabetic characters, no whitespace.
A one-hot encoder is used, instantiated with arguments handle_unknown='ignore', sparse_output=False

The encoder delivers a data frame whose column labels are a RangeIndex, i.e. numeric.

The expectation is for the new feature labels to be the concatenation of the original feature name and the encoded category. But the one-hot encoder delivers data frame columns labeled numerically. Instantiating the encoder with argument feature_name_combiner='concat' doesn't help.

What am I missing?

If I understand the sklearn.preprocessing.OneHotEncoder API documentation properly (constructor parameter feature_name_combiner), the encoder should in these circumstances deliver concatenated labels for the encoded features: old feature name plus encoded category.

open raven
#

Well, get_feature_names_out() helps. Call it on the fitted encoder object and use the result as the columns of the data frame holding the encoded categories.

desert oar
#

that is, sum( | y_predicted - y_actual | ) instead of sqrt( sum( ( y_predicted - y_actual )^2 ) )
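
#

a small sketch of the two norms on a residual vector, numpy only; the numbers are made up:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 2.0, 7.0])
y_predicted = np.array([2.5, 5.0, 4.0, 3.0])
residual = y_predicted - y_actual

l1 = np.sum(np.abs(residual))        # L1 norm: sum of absolute residuals
l2 = np.sqrt(np.sum(residual ** 2))  # L2 norm: root of the sum of squares

print(l1, l2)
```

for a single residual the two are the same number; the difference only shows up once you sum over several points, where L2 weights the big misses more heavily.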

desert oar
kindred blade
#

LMAO

gritty vessel
#

Is this the correct way to feed data in cnn?

#

I have data like each row represent an Image

#

I want to capture spatial features as well as temporal features

#

Each array is of 1000*1000

serene scaffold
#

if you're going to post a screenshot, make sure it's only exactly what you need to share.

gritty vessel
#

Oh I'm sorry for the overlook

gritty vessel
gritty vessel
#

The first two columns are date and time, and the other columns are latitude, longitude, Imgtir1, imgtir2, imgvis, imgswir; each of these is a 1000*1000 array in each row

serene scaffold
#

what are you using for the neural network

gritty vessel
#

For extracting spatial features from my data

serene scaffold
#

"Imgtir1 imgtir2 imgvis imgswir" -- what do these mean?

serene scaffold
gritty vessel
#

Oh sorry

#

I'm using tensorflow

serene scaffold
#

"Imgtir1 imgtir2 imgvis imgswir" -- what do these mean?

gritty vessel
#

They are short wave infrared electromagnetic light and visible light

#

I am working on satellite data

serene scaffold
#

so you probably need your data as a tensor with this shape
(num_rows, 1000, 1000, 4)

#

so basically, each image is a 3d array, two dimensions for height and width, and one dimension for those four... parts?

#

spectra, maybe?

#

dataframes are strictly two-dimensional. so you don't want it as that.
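
#

a sketch of building that tensor with numpy, assuming you can pull one 2D array per band and per row out of whatever holds them now; sizes are shrunk so it runs instantly, and the random arrays are placeholders:

```python
import numpy as np

num_rows, height, width = 3, 8, 8   # small stand-ins for 26 and 1000 x 1000

rng = np.random.default_rng(0)
# One (height, width) array per band and per row (placeholder data)
bands = {
    name: [rng.random((height, width)) for _ in range(num_rows)]
    for name in ("imgtir1", "imgtir2", "imgvis", "imgswir")
}

# Stack rows first, then put the four bands on a trailing channel axis
per_band = [np.stack(bands[name]) for name in bands]   # each (num_rows, height, width)
images = np.stack(per_band, axis=-1)                   # (num_rows, height, width, 4)

print(images.shape)
```

`images[i, :, :, k]` is then band `k` of row `i`, which is the channels-last layout tensorflow's conv layers expect by default.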

gritty vessel
#

I used

#

From_records

#

So it stored the array as it is in it

#

Just to get an idea of how my data

#

is. I then stacked lat and long in one array and then stacked the other 4 as channels

#

So it was like this 26,1000,1000,6

#

I was confused about whether I should pass lat lon as features or not, as coordinates are important for catching spatial features, right?

#

Afterwards I also wanted to capture temporal features, so what I did was first pass it through the cnn without dates; afterwards I will set date and time as index and pass it through an lstm

gritty vessel
#

As labels I have two arrays each of 1000*1000

#

They are named as flash and count

#

So flash is 1 wherever we observe a flash, and count is the number of flashes in that particular area

final kiln
#

Ngl, never a good sign when you're solving eqns in latex and a tilde just floats off to another letter

#

now I have to redo everything

#

deltas on u should be up up too

#

and there you go, the cuda kernel

#

looks complicated but it's not, the deltas are if statements, and only one term is computed for a given (u, l)

#

and the super and subscripts are just indexes, so they're like M^n_l = matrix_array[n][l]

#

so each term will be computed in parallel and used to fill a matrix shaped (n, u, l) which I can then reduce sum along l, no space is wasted because every position in this matrix is filled. The f and g mappings are easily constructed using one of numpy's or pytorch's triu functions
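
#

a sketch of what I mean by the triu mappings, in numpy; reading f and g as "flat index l -> (row, col) of the upper triangle" is how I'd set it up here, and the size n = 4 and the random values are just for illustration:

```python
import numpy as np

n = 4
# f[l], g[l] give the (row, col) of the l-th entry of the upper triangle
f, g = np.triu_indices(n)

rng = np.random.default_rng(0)
packed = rng.random(f.size)          # n * (n + 1) / 2 free parameters

# Rebuild the full symmetric matrix from the packed values
M = np.zeros((n, n))
M[f, g] = packed
M[g, f] = packed

print(f.size, np.allclose(M, M.T))
```

only the packed vector has to be stored and trained; the full matrix is never needed inside the kernel, it's rebuilt here just to check the symmetry.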

#

I gotta do the gradient, but it's quite easy to calculate since it's all just simple multiplication

final kiln
#

here's the draft for the full treatment

grizzled sail
#

(excuse the phrasing, i'm still learning) when someone is writing a neural network or other similar kind of ai, and it comes to the programming of the model itself, i seem to see a lot of people writing nn's comprised of only dense layers like in the image attached? can someone explain to me what the purpose or point is, how they compare with writing a neural network comprised of conv2d or LSTM layers (i do understand these kinds of nn's are used for different purposes but thats beside my point)

wooden sail
#

if you have shift invariance, convolutions make sense

#

if you have slow variance along one axis, then LSTMs make sense

grizzled sail
#

so there is actually logic behind it? cause ive only ever written a nn totally alone and with zero guidance so i assumed it was supposed to be a component of a bigger whole, not it's own standalone layer

wooden sail
#

dense layers are just general affine transformations. the network can learn to make them shift invariant, but also not. you can use these if you know nothing at all about the problem
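
#

in numpy terms, a sketch of a dense layer without the activation (sizes and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 5, 3, 2

x = rng.normal(size=(batch, n_in))
W = rng.normal(size=(n_in, n_out))   # learned weights
b = rng.normal(size=n_out)           # learned bias

def dense(x, W, b):
    """Affine map applied row by row; the activation is left out."""
    return x @ W + b

out = dense(x, W, b)
print(out.shape)
```

the only thing the layer demands of its input is that the last dimension matches `n_in`; anything else raises a shape error at the matrix product.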

grizzled sail
#

(as in yes it is A standalone layer, but that it was supposed to work in conjunction with the "actual" nn)

#

also, do different nn layers require specific inputs? cause i assumed i could just add a different kind of layer (for example an LSTM layer) but i started to get an error (this was a few days ago and i just undid it for the time being)

wooden sail
#

wdym by "specific inputs"

#

as long as the shape is correct, a layer won't care what the input is and will enforce its special conditions

grizzled sail
#

i guess i meant "specifically shaped inputs" in that case

wooden sail
#

same as a sine function doesn't care what number you give as an input, it'll always treat it as an angle in radians and return a number between -1 and 1

#

then yes, each layer requires a specifically shaped input

grizzled sail
#

i'm going to go and try running it again and see if i can get the shape bit right 👀

crisp raptor
#

Petition to use Q-learning with self driving cars

final kiln
crisp raptor
#

It's a joke dude

final kiln
#

Uhm, I didn't get it

crisp raptor
#

Reinforcement learning

final kiln
#

I'm not very knowledgeable about reinforcement learning. Gotta get on it eventually

trim needle
#

Is it possible to train a T5 model WITHOUT teacher forcing?

#

I don't understand any of this. I want my model to NEVER generate a specific token at the beginning. I'm so desperate that I've given it an extra penalty... and yes, the model doesn't use Teacher Forcing.

mellow vector
#

Morning DS/ML

hollow sentinel
#

I have a data science portfolio project idea. I got this tortilla price dataset from Mexico from Kaggle. My hypothesis is that supermarkets offer lower prices for tortillas compared to convenience stores and traditional markets in Mexico. Is this a good portfolio project to have?

#

I was also thinking of creating a website to showcase all my data science projects

#

I just don’t know what impacts the project itself is going to have. Like if it’s impressive enough for an employer.

#

I’ll do it anyways I guess?

final kiln
#

honestly just do it, once you get into the meat of the problem you'll know how to layer it more and more cuz you'll start having endless ideas

#

I need less ideas rn

#

have too many and there's not enough time for all

final kiln
#

more expensive, and significantly lower quality cuz they just buy it from the same supplier at lower quantities but closer to expire, or they just buy it from the supermarket and resell it

#

ig stuff specific to the place, like here they produce rice, so ig their rice is gonna be cheaper and super natural

#

whereas super market will be processed

hollow sentinel
subtle imp
#

I need to plot two graphs into one plot, but each having their own axes, e.g. like the following 2 gaussians, with one being rotated 90 degrees cw. Is there an easy way to do this? I've tried for some time and didn't manage it. The only thing that I could imagine working is that I plot the second one on the y axis and scale all the values to fit in the range of the first.

wooden sail
#

you'd just need to swap the axes

#

!e

import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.arange(N)
y = np.exp(-N/1000*(x - N//2)**2)*50
plt.plot(x, y)
plt.plot(y, x)
plt.savefig("biggest_oof.png")  # save before show(), which can leave an empty figure behind
plt.show()
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.12 eval job has completed with return code 0.

wooden sail
subtle imp
#

Hmm, could work if I scale the data to fit the ranges, I just would have thought that there might be a option to maybe have 2 subplots overlapping. Thanks for the help!

subtle imp
#

also tried this before, but they always have shared axis, but if I turn the plots the don't x and y are swapped

#

Maybe just normalize the whole data to [0,1] and then use the approach from before, just removing the ticks and that should work
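
#

Another option that avoids rescaling, as a sketch: put a second, transparent axes on top of the first one so each curve keeps its own x and y scales. The two Gaussians and their parameters are made up, and the Agg backend is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4, 4, 200)
g1 = np.exp(-x**2 / 2)                  # first Gaussian
g2 = 50 * np.exp(-(x - 1)**2 / 0.5)     # second one, on a very different scale

fig, ax1 = plt.subplots()
ax1.plot(x, g1, color="tab:blue")
ax1.set_xlabel("x (first curve)")
ax1.set_ylabel("density (first curve)")

# Second axes at the same position, without a frame so the first shows through
ax2 = fig.add_axes(ax1.get_position().bounds, frameon=False)
ax2.plot(g2, x, color="tab:red")        # swapped arguments: rotated by 90 degrees
ax2.xaxis.tick_top()
ax2.yaxis.tick_right()
ax2.xaxis.set_label_position("top")
ax2.yaxis.set_label_position("right")
ax2.set_xlabel("density (second curve)")
ax2.set_ylabel("x (second curve)")

fig.canvas.draw()  # render; use fig.savefig(...) or plt.show() to actually look at it
```

The first curve reads off the bottom and left axes, the rotated one off the top and right axes, and neither is rescaled to fit the other.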

open raven
#

OneHot Encoder acting on pandas data frame. How to prevent fit_transform method from placing old index labels in new column?

serene scaffold
spark nimbus
#

Are there any guides on converting pandas-on-pyspark code to pyspark SQL? I have to convert a few thousand LoC next week 😓

open raven
void crescent
#

can someone please explain why my model gives the exact same prediction every time. it works for single predictions but when i mass predict it gives the same output for all predictions.

def process_image(img):
  resized = np.zeros((50,50,3))
  resized[:, :, :3] = read_img

  img_tensor = tf.convert_to_tensor(resized)
  img_tensor = tf.expand_dims(img_tensor, axis=0)

  return img_tensor

img_label_arr = random.choices(combined_data, k=4)
print(img_label_arr)

for il in img_label_arr:
  label = il[1]
  read_image = il[0]
  plt.imshow(read_image)
  plt.title(f"Label: {label}")
  plt.axis("off")
  plt.show()

  image_tensor = process_image(read_image)

  predictions = model.predict(image_tensor)

  print("Predictions:", predictions)


  prediction = benign_or_malignant(predictions[0][0])
  print("Prediction:", prediction)
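
#

One thing that stands out in the snippet above: `process_image(img)` never uses its `img` argument. It reads `read_img`, which isn't defined inside the function, so it must come from the surrounding scope. If that name holds one fixed image, every call builds the same tensor, which would explain identical predictions. A sketch of the function using its argument, with numpy standing in for the tensorflow calls so it runs on its own (the random images are placeholders):

```python
import numpy as np

def process_image(img):
    """Return `img` as a float batch of one, shape (1, 50, 50, 3)."""
    resized = np.zeros((50, 50, 3))
    resized[:, :, :3] = img          # use the argument, not a global like read_img
    return np.expand_dims(resized, axis=0)

rng = np.random.default_rng(0)
img_a = rng.random((50, 50, 3))
img_b = rng.random((50, 50, 3))

batch_a = process_image(img_a)
batch_b = process_image(img_b)
print(batch_a.shape, np.array_equal(batch_a, batch_b))
```

Note that nothing here actually resizes: the assignment only works if `img` is already 50x50x3.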
final kiln
#

almost there

final kiln
#

refined this a bit more

wooden sail
#

may i ask what your motivation behind doing this is?

final kiln
#

it halves the number of floating point calculations, and also halves the amount of memory

wooden sail
#

not that, that i get

#

just the idea of learning metric tensors is common, so i would expect this to already exist in several flavors

final kiln
#

couldn't even find anyone using quadratic forms

#

like, xMx.T

#

the transformer does it implicitly

wooden sail
#

you might've looked with the wrong key words

final kiln
#

uhm, it is possible

wooden sail
#

every single time you read mahalanobis distance, this is what they're doing

final kiln
#

let me see

wooden sail
#

.wa s mahalanobis distance

strange elbowBOT
wooden sail
#

awesome

final kiln
#

right, but I haven't seen this concept used in ML

wooden sail
#

it's used everywhere

final kiln
#

where ?

wooden sail
#

anywhere you read "maximum likelihood" or under the name "mahalanobis distance" just as above

#

this is how optimization has been done for the past 100 years

final kiln
#

I'm not sure I follow

wooden sail
#

but maybe you needed to look under statistical methods

final kiln
#

this is an attention mechanism

wooden sail
#

machine learning people call this by the statistical name

#

this is why many problems in optimization have several names: the different communities don't talk with each other

final kiln
#

I have not seen quadratic forms be used explicitly in deeplearning NLP

wooden sail
#

the statistical mahalanobis distance squared is the same as using a metric tensor on a point of a manifold

#

you don't have to call it by name for it to be equivalent though
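
#

a small numpy sketch of the equivalence: the squared mahalanobis distance is the quadratic form (x - y) M (x - y)^T with M the inverse covariance, and the plain euclidean distance is the same form with M = I. the data is synthetic and the mixing matrix is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                             [0.0, 1.0, 0.3],
                                             [0.0, 0.0, 0.5]])

cov = np.cov(data, rowvar=False)
M = np.linalg.inv(cov)               # plays the role of the metric tensor

def quadratic_form(x, y, M):
    """(x - y) M (x - y)^T for row vectors x and y."""
    d = x - y
    return float(d @ M @ d)

x, y = data[0], data[1]
d2_mahalanobis = quadratic_form(x, y, M)
d2_euclidean = quadratic_form(x, y, np.eye(3))   # identity metric

print(d2_mahalanobis, d2_euclidean)
```

here M is estimated from data; in a learned layer it would be a trainable parameter instead, but the expression being evaluated is the same.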

final kiln
#

I would've at least seen the equation I imagine

#

I do recall seeing something in computer vision

wooden sail
#

any term that involves any sort of mean squared error or maximum likelihood or maximum a posteriori or bayesian methods

#

anything that builds up a covariance matrix of some sort or a hessian matrix

#

they're all doing this, with a different name

final kiln
#

I'm confused because the stuff you mention seems to be related to loss right, this is a layer

wooden sail
#

no, not necessarily only loss

#

if you look up deep unfolding algorithms, they treat iterations of older algorithms as layers of a neural network

final kiln
#

I think it's very telling that they dont mention quadratic form on the 2017 paper

wooden sail
#

any unfolding of a 2nd order or higher method (e.g. quasi newton), or even of linear methods, involves products with gramian matrices (another name they go under)

final kiln
#

they're using a quadratic form as a layer and don't mention it

wooden sail
#

i would never take "they're not using my preferred terminology" as a sign

final kiln
#

it's not a matter of preferred terminology

wooden sail
#

quadratic forms are super standard and everyone expects the rest to take them for granted

final kiln
#

it's a no mention of it

#

they're using this whole CS Inspired terminology to talk about something that is just xMy.T

wooden sail
#

all newton methods do that too and they won't call it metric tensor nor quadratic form

final kiln
#

I would imagine you'd use the simple eqn tho

wooden sail
#

you'd be surprised

final kiln
#

I was

wooden sail
final kiln
#

it's less than a page and it's not just doing one product, I'm calculating a bunch of stuff at the same time

#

like, im not explaining a layer

#

im placing it in a form to feed it to the gpu

wooden sail
#

at any rate, the area you're looking for under statistical optimization is called "information geometry" and it's all about learning manifolds and metric tensors to do parallel transport

#

all right

final kiln
#

the actual motivation is that it is intuitive, in fact, even people who are not math savvy find it to be an interesting concept. our brains like geometry so I see it as the way to make networks interpretable

#

ig the layer can exist somewhere under some other name, but it still wouldn't clash with my objective

past meteor
#

Anyone have specific tips to debug the location of tensors and/or memory related issues with torch?

#

I have a 70k parameter neural network that should be on my GPU (I literally call .to('cuda') on it) and when I call .to('cuda') on it closer to where I do inference it goes OOM trying to allocate 60GB... for a 70k parameter network

final kiln
#

can only be the data right

#

are you doing backprop ? the more batches, the more gradients it stores, I've also found that loss functions tend to be inefficient with their allocations
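
#

rough arithmetic behind that, assuming float32 weights and reading 60GB as GiB (both assumptions):

```python
n_params = 70_000
bytes_per_float32 = 4

param_bytes = n_params * bytes_per_float32
print(f"parameters: {param_bytes / 1024:.0f} KiB")

# How many float32 values a 60 GiB allocation corresponds to
alloc_bytes = 60 * 1024**3
n_values = alloc_bytes // bytes_per_float32
print(f"60 GiB holds {n_values:,} float32 values")
print(f"that is about {n_values / n_params:,.0f}x the parameter count")
```

the weights are a few hundred KiB, so an allocation that size has to be a tensor of data, activations or gradients, not the model itself.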

abstract rune
#

I wrote a determinant calculator, for dimension 200, it takes around 10-11 secs
written in Go

meager ridge
abstract rune
#

i compared numpy solution with mine
numpy takes 1.2 secs while mine takes 6 secs

#

numpy literally takes less than a second ughh

wooden sail
#

how does your algorithm compute the determinant?

abstract rune
#

i find the row-echelon form and then use this

#

I wrote the code for REF in Go

func RowEchelonForm(A [][]float64) ([][]float64, int) {
    matrix := Copy(A)
    var determinantFactor = 1
    rows := NumberOfRows(A)

    for i := 0; i < rows; i++ {
        r_idx, c_idx := LeftMostColumnWithNonZeroEntry(matrix, i)
        // r_idx = represents the row_index of the entry which is non-zero; it needs to be same as "i"; or else swap it.
        // c_idx = represents the col_index of the entry which is non-zero; on that

        if r_idx == -1 || c_idx == -1 {
            break
        }
        if r_idx != i {
            matrix, _ = RowSwitch(r_idx, i, matrix)
            determinantFactor *= -1
        }
        column, _ := GetColumnAt(c_idx, matrix)

        for j := r_idx + 1; j < rows; j++ {
            scalar := -1 * (float64(column[j]) / float64(column[i]))
            matrix, _ = RowAddition(scalar, j, i, matrix)
        }

    }
    return matrix, determinantFactor
}

func LeftMostColumnWithNonZeroEntry(A [][]float64, currentRow int) (int, int) {
    for i := 0; i < NumberOfCols(A); i++ {
        for j := currentRow; j < NumberOfRows(A); j++ {
            if A[j][i] != 0 {
                return j, i
            }
        }
    }
    return -1, -1
}

#

The code is a mess I know 😅

wooden sail
#

looks about right, i think LU decomposition is the most common, which is the same as row reducing

#

google does seem to say C is a little faster than go, which would explain the difference

abstract rune
#

1:5 ?

wooden sail
#

depends on the actual code, but people in google searches claim 3 to 20x factor

abstract rune
#

C is a factor, but my code is creating a lot of variables, and I am not mutating the original matrix (tryna do functional programming, (tho using for loops lmao))

wooden sail
#

that certainly makes it slower

#

see how it performs over several matrix sizes

#

but usually BLAS/LAPACK is hard to beat
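
#

for reference, a sketch of the same row-reduction idea done in place on a single copy, in python/numpy rather than go; it adds partial pivoting, which the go version above doesn't do, and the 6x6 random matrix is just a test input:

```python
import numpy as np

def det_gaussian(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = np.array(a, dtype=float)     # one copy up front, then work in place
    n = m.shape[0]
    sign = 1.0
    for i in range(n):
        # Pick the largest pivot in column i for numerical stability
        p = i + int(np.argmax(np.abs(m[i:, i])))
        if m[p, i] == 0.0:
            return 0.0
        if p != i:
            m[[i, p]] = m[[p, i]]    # row swap flips the sign of the determinant
            sign = -sign
        # Eliminate everything below the pivot in one vectorised update
        factors = m[i + 1:, i] / m[i, i]
        m[i + 1:, i:] -= np.outer(factors, m[i, i:])
    return sign * float(np.prod(np.diag(m)))

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
print(det_gaussian(a), np.linalg.det(a))
```

the point is that each row operation touches only the rows below the pivot instead of building a new matrix; copying the whole matrix per row operation adds a full matrix copy to every step of the inner loop.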

abstract rune
#

whats BLAS, LAPACK ??

wooden sail
#

the libraries (usually written in c or fortran) that numpy wraps. they're libraries optimized for special linear algebra operations, also optimized for specific processor architectures

#

quite vexing because it automatically implements SIMD and parallelization

#

you can usually only beat it in cases of special composite matrices where you can split the total action of a matrix into smaller ones... for which you use BLAS/LAPACK 😛 instead of doing it naively for the whole matrix

past meteor
#

I'll look again tomorrow

#

Maybe I just needed sleep

abstract rune
#

damn it

#

it is quite complex stuff ugh

desert oar
#

i definitely encouraged him to pursue it 😆

#

you know about the manifold learning stuff and i don't, so maybe this has already been investigated and i didn't realize

desert oar
# final kiln couldn't even find anyone using quadratic forms

i'm with Edd though, this does show up everywhere all the time. when i said nobody has done it before, i specifically meant that i wasn't aware of anyone looking into this particular restatement of the attention mechanism. sorry if i wasn't clearer about that before.

wooden sail
#

it could be that no one has done it for the attention mechanism in particular, i have never read an NLP paper. the idea is overall standard and you find it in any book on optimization though

final kiln
#

It's like arguing against matrix mul

desert oar
wooden sail
#

yeah

#

from what little i recall of what is done in attention, the matrix is neither symmetric nor square

desert oar
wooden sail
#

that's interesting in its own right

#

a reasonable mapping where this even makes sense is nice to think about

desert oar
#

wdym by a reasonable mapping?

wooden sail
#

how to make it so that the vectors participating in the bilinear form are in the same vector space

#

one where this metric makes sense

final kiln
#

They all go through the same projection

desert oar
#

this i think was the derivation:

Q = X @ Wq
K = X @ Wk
V = X @ Wv

Q @ K.T == (X @ Wq) @ (X @ Wk).T
        == (X @ Wq) @ (Wk.T @ X.T)
        == X @ (Wq @ Wk.T) @ X.T
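
#

a quick numerical check of that chain of equalities, numpy only; the sizes and random matrices are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4

X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_k))
Wk = rng.normal(size=(d_model, d_k))

scores = (X @ Wq) @ (X @ Wk).T
M = Wq @ Wk.T                        # (d_model, d_model), rank at most d_k
scores_via_M = X @ M @ X.T

print(np.allclose(scores, scores_via_M), M.shape)
```

note that Wq and Wk are rectangular, but their product M is square; it just isn't symmetric in general, and its rank is capped at d_k.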
final kiln
#

Before being dotted

wooden sail
#

what's what here

desert oar
#

and then the idea was to set M = (Wq @ Wk.T) and impose that M is symmetric and square, right?

wooden sail
#

that's a pretty strong constraint

final kiln
#

Yeah, like, I think the ideal thing to do is quadratic, but I wanted to explore metric just for the sake of it, in early experiments I found it was actually way easier to interpret what it was doing

desert oar
# wooden sail what's what here

X is the sequence, and Wq Wk Wv are the attention projection matrices. then Q K and V are the query key and value matrices. following the attention-is-all-you-need notation

wooden sail
#

without any special justification, this completely changes the data manifold

#

there's no reason why this should be better. it's gonna be faster due to structure and have nice geometric properties

#

not necessarily useful ones

#

nice to investigate though

desert oar
#

right. i was very curious what that would do to the model 😆

final kiln
#

It can also have a regularization effect maybe

wooden sail
#

yeah, that'd be the case

final kiln
#

The early experiment was an array sorter

#

So it would take a sequence 2, 3, 1 and output 1, 2, 3

#

With metric I could look at the distances between embeddings and see that they were actually being sorted along an axis

#

I have been posting everything here

#

But it was way back idk if I can retrieve it

desert oar
wooden sail
#

i thought the rectangularness of the matrices in attention stuff is what lets you deal with sequences of irregular lengths

#

i wouldn't know though

#

maybe that just comes from the choice of tokenization and embedding

#

i've barely touched this application

desert oar
#

i think you're right about the tokenization thing. stelercus is the NLP expert though

final kiln
#

The same projection is applied to every embedding and then every embedding "dots" with each other embedding; the sequence length is accounted for by the cross dotting

wooden sail
#

and from what i gathered, you have some pair of sets of input vectors that you elementwise inner product using this symmetric (positive definite???) matrix to get a new vector whose entries are the products?

#

or you have a different one per pair of vectors in the set

final kiln
#

If you're asking about 2017 scaled dot product, you have 3 projections. The results from two of them are used to produce the matrix whose entries are the "cross dots". Then you use that matrix as a transformation on the third projection

#

So it's like the matrix of dot products is an MLP layer constructed on the fly

#

There's softmax applied to that matrix before using it as a transformation, and the values are scaled by 1/sqrt(dim of proj space), so they called it scaled dot product
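
The description above can be sketched in NumPy roughly as follows. This is a single-head version with no masking or batching, and the shapes and random weights are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # the three projections
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # "cross dots", scaled by 1/sqrt(projection dim)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # weights act as a transformation on V

rng = np.random.default_rng(0)
n, d, k = 4, 6, 3                             # arbitrary sizes
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, k)) for _ in range(3))
out, weights = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)               # (4, 3) (4, 4)
```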

#

In my case I threw away the three projections and use only one, followed by a quadratic form

wooden sail
#

i'm just trying to figure out how much efficiency one can squeeze out

#

compared to, say, letting M for one pair of vectors be L + L^T for a lower triangular L, which makes it easier to guarantee symmetry

#

though that's not a problem if you compute the gradients by hand considering the symmetry explicitly
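
A small sketch of the `L + L^T` parametrization mentioned above, assuming the free parameters are simply the entries of the lower triangle (diagonal included). The helper name `symmetric_from_lower` is made up for illustration. The result is symmetric by construction, though nothing here makes it positive definite.

```python
import numpy as np

def symmetric_from_lower(params, d):
    """Build M = L + L.T from d*(d+1)/2 free parameters."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = params   # fill the lower triangle row by row
    return L + L.T                   # note: diagonal entries end up doubled

d = 4
params = np.arange(1.0, d * (d + 1) // 2 + 1)   # 10 placeholder parameters
M = symmetric_from_lower(params, d)
print(np.allclose(M, M.T))                      # True
```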

final kiln
#

Yeah I went the custom kernel route, that's what I was calculating earlier

#

But there's another layer to this

#

Which is that the particulars of the attention mechanism might not even be important

#

There's a study where they substituted the attention for an avg pooling and it still worked out fine

#

Caveat is

#

It was for vision and it's a paper on arxiv

#

So I'm reproducing their results but for NLP, and also exploring this other side with the metric tensor condition thing

#

I got a bunch of layers done, scaled dot product, quadratic, avg pooling, etc etc. Now I'm finishing up this one

#

But early results

#

For sentiment analysis it doesn't care one bit what you use for attention

wooden sail
#

if you have a deep enough network and enough data, the architecture doesn't matter much 😛 idk

#

fun times implementing the layers, but why don't you test it on what's already there? other than the street cred

final kiln
#

Well it's stranger than that cuz the rest of the network doesn't really make the embeddings interact

desert oar
final kiln
#

So like, I even used identity and it worked out

past meteor
desert oar
#

my intuition for vision is that you have much less of a requirement for long-range relationships between video frames because so much more information is available already in each frame. which is why the "token mixing" mechanism matters a lot less for video

wooden sail
desert oar
final kiln
wooden sail
#

ok, that answers the question

wooden sail
#

"we replaced everything with dense layers and threw as many A100's as we could at it. it just works."

past meteor
#

"Hyperparameters? We have graduate students exploring different sets as their master's thesis"

wooden sail
#

the reality of that hurts

past meteor
#

The famed grad student descent

final kiln
#

I hope you don't mind that I screenshot this hilarious exchange

past meteor
#

I had a bit of an evil thought today

#

The results of my modelling are beating the ones of the prev batch by 10-20 %

#

The previous ones weren't bad either. Maybe an idea would be to present $client with middling results the first time and then show better ones the second 😩

#

that way, you always end the project on a positive note (I wouldn't actually do this)

wooden sail
#

you kinda rediscovered academia tbh

past meteor
#

Isn't this salami publishing and frowned upon

wooden sail
#

yes

#

also everyone does it

quasi sparrow
#

Hi everyone

I just started working on a regression task that uses data from IoT devices, each device has a different sampling rate.

The question is how can I create samples for my dataset if all the sensor readings (my features) have a different time stamp.

#

And also, when it comes to putting the model in production. How can I present the data for inference if all the data points are not available at the same time. Is there some kind of buffering technique that I could use?

#

I know there are tools in the cloud to stream data, but I am working on a problem where I won’t have access to the cloud, except for training, so I need to rely on open source.

#

Any feedback helps, thank you!

wooden sail
#

do you know anything about the measurement data? if you want to use the latest data and you know the data varies slowly, you can think of extrapolation methods

#

if you're ok with introducing a delay, storing some n previous samples makes sense, and then you could interpolate the data of all sensors to the timestamp of the sensor that was updated last
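
One way the buffering-plus-interpolation idea could look, as a rough sketch. It assumes each sensor keeps a short buffer of `(timestamps, values)` with ascending timestamps in seconds, and it reads "the sensor that was updated last" as the sensor whose newest reading is the oldest, so every other sensor can be interpolated rather than extrapolated. The sensor names and numbers are invented.

```python
import numpy as np

def align_buffers(buffers):
    """Interpolate every sensor to one common timestamp.

    buffers: dict of name -> (timestamps, values), timestamps ascending.
    """
    # Oldest "newest reading" across sensors: all sensors have data up to this time.
    t_query = min(t[-1] for t, _ in buffers.values())
    sample = {name: float(np.interp(t_query, t, v)) for name, (t, v) in buffers.items()}
    return t_query, sample

buffers = {
    "temperature": (np.array([0.0, 8.0, 16.0]), np.array([70.0, 72.0, 74.0])),
    "motor_speed": (np.array([10.0, 11.0, 12.0, 13.0]),
                    np.array([1500.0, 1510.0, 1520.0, 1530.0])),
}
t_query, sample = align_buffers(buffers)
print(t_query, sample)   # 13.0 {'temperature': 73.25, 'motor_speed': 1530.0}
```

The same function can build training rows offline and inference rows online, which keeps the two consistent. Note that `np.interp` holds the edge value outside the buffered range, so a buffer that starts too late silently degrades to "last known value".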

quasi sparrow
#

Yes, the data comes from an industrial machine. The data points are: temperatures, motor speeds, pressure, etc. The sample rates are between 1 to 8 seconds.

#

I think I could try both methods and see which one gets the most accurate model! Thank you for the advice.

primal pelican
#

Greetings Community:

I am a JavaScript developer and I have a task of getting text out of PDFs and images.
I searched Google and found out that pytesseract and PaddleOCR are very good OCR tools.

Any suggestions for which Python libraries I can use to fulfil my task and get at least 80% accuracy?

serene scaffold
#

extracting text from PDFs isn't bad if it's just plain text. but PDFs with lots of images and stuff are hell on earth

abstract rune
#

tabula is also a great tool!!

mild grotto
#

Now my particles will follow the gradient as they drop heat. This causes them to have pseudo collisions and bounce off of other particles

hallow radish
#

Hello guys, when merging two dataframes and returning selected columns, if the values are NaN it matches with all the NaNs and returns duplicates instead of returning NaN. What would be the ideal way to resolve this?
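
A minimal reproduction, assuming the NaNs are in the merge key (the column names `key`, `x`, `y` are made up). pandas treats null keys as equal to each other when merging, so one possible fix is to drop null keys from the right-hand frame before a left merge.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": ["a", np.nan, np.nan], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", np.nan, np.nan], "y": [10, 20, 30]})

# NaN keys match each other: 1 row for "a" plus 2 x 2 rows for the NaNs.
naive = left.merge(right, on="key", how="left")
print(len(naive))        # 5

# Remove null keys on the right so the NaN rows on the left find no match.
fixed = left.merge(right.dropna(subset=["key"]), on="key", how="left")
print(len(fixed))        # 3
print(fixed["y"].tolist())   # [10.0, nan, nan]
```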

jovial tinsel
#

is there a possible way to run a python script in ios?

final kiln
#

with apple, anything could be impossible

final kiln
#

I'll get the full results once I'm done with this and prep a series of datasets

wooden sail
#

(i'm asking, i don't know)

final kiln
wooden sail
#

if you have enough layers after the transformer block, you can do whatever

final kiln
#

I didn't see much sense in having an avg pooling attention split into several heads, so there's also nothing additional there like projections

final kiln
#

simple average too if im not mistaken, so no weights there

#

my reasoning for why it works is that it's just counting good words and bad words and using that to decide if it's a positive or negative review

#

the interesting part is gonna be when I get it to do next token prediction, which I suspect it won't work, would make no sense if it did

#

no wait it's the other way around, I first project them and then avg

#

there's only one projection matrix for all embeddings tho so the point still stands

#

it's not the capacity of the output layer that is doing it

#

that explanation would also have been the first thing I would've thought

molten elk
#

Does anyone know if what can be done in R can be done in Python easily?

#

I'm not sure if it's worth investing time to learn R

final kiln
#

Never seen it be used outside of school

past meteor
#

Whether you should care about them is a different question 😄

molten elk
#

What if we use numpy?

boreal gale
# molten elk Does anyone know if what can be done in R can be done in Python easily?

for the most part, what can be done in R can also be accomplished in python quite easily.

but it's worth noting that R has first-class time series support and there exists implementations of some niche models like multinomial logistic regression, where python might not offer as much out-of-the-box support (though statsforecast in python has been a thing for some time, maybe R's time series capabilities are already matched in python?)

past meteor
#

They haven't been matched yet imo

#

For niche things at least

#

Also models like GAMs aren't as great in python. I think the valid question is, should you care about GAMs, maybe not

#

I don't think most people should learn it though

#

R has two great niches, it's easier to do descriptive statistics, plotting, ... etc. in it because dplyr, ggplot2 and co have a way more intuitive API than Pandas and matplotlib. The first niche is let's say social science researchers that don't want to dive deep into coding but want to do data analysis and ML

#

The second niche is cutting edge statistics. Some implementations are only in R land. I think the same applies for MATLAB.

The vast majority of people aren't in both scenarios so I'd just say: learn python 😉

final kiln
#

my take is that you cant go wrong with learning more things and languages in particular always have some new ways of thinking about things, tho I personally dont like R, py covers all my bases

#

I think that if you learn enough languages with similar paradigms, learning an additional one within that paradigm is not very hard, but learning languages that work very differently take time to get used to

past meteor
#

Most people don't learn R tho, I never learnt it. I learnt how to do things in it

final kiln
#

yeah I used it for a stats class

#

I liked matlab

past meteor
#

I haven't touched it in a year but I'm probably still faster at certain things with it than python

final kiln
#

there's also different phases right, most modern languages will be optimized so you can learn the basics quickly and be able to use it, but it might take a long time to master them

buoyant vine
#

Tbh i'm not sure really how true that is anymore

#

I think outside of some super specific industry thing which maybe is bundled up with a bunch of legacy stuff

#

It seems in general Python can go beyond what R can, especially with the number crunching related tasks, whether that be with Numba + Numpy or even Torch and tensors

final kiln
#

sticking with py is a good strategic choice career wise, cuz it is used in a variety of industry scenarios, not just ML and data science

past meteor
#

For my thesis I wanted to do a statistical test that had no credible python implementations

#

For R the implementation was from the author

#

Again, it's a question of whether or not you care about the advantages R can give you, because they exist but my entire point is that they're niche enough the vast majority of people shouldn't

final kiln
#

forward kernel is done (probably), i should throw a party

past meteor
#

I'm also hoping Polars becomes better integrated in the ecosystem because what I never got over was how incoherent pandas was

#

And matplotlib

final kiln
#

I wonder if the spreadsheet+python synergy could substitute pandas

past meteor
#

The whole idea of an index in pandas is just ... idk

#

It makes the library deeply strange

final kiln
#

I really like excel, and I really like py, so both of them together, could work

#

I think there's a company that used spreadsheets as their production db for an unreasonably long time

past meteor
#

That's just already Pandas/Polars?

final kiln
#

I suppose the difference would be that I like using one over the other

#

if I do like it

past meteor
#

Speaking of legacy, I don't know how it is abroad but here Java and C# dominated so most Python projects are Greenfields and data projects

#

I'm curious how it'll look like 5 and 10 years down the line 😅

final kiln
#

i've seen takes that in 5 to 10 years legacy will be much bigger problem because of tools like copilot

#

well, in excel they're just using the dataframes api, so no improvement

final kiln
#

the amount of ways I can shoot myself in the foot in cpp

desert oar
desert oar
#

i think the ideal scenario would be to not have separated indexes and data, but to have built-in "indexes" in the sense of a database index that can be attached to a data frame

#

data.table has that, but it's a bit auto-magical, compared to a proper database where you have a relatively wide range of control over what kind of index to use

#

the polars devs have expressed that they don't want to add that to polars, because they felt like it wasn't necessary (which they are totally wrong about, but it's their library and their choice), but they also said it should be easy enough to build a sidecar index thing that works with polars (not totally wrong)

#

for example in data.table you get to specify exactly one column as the primary index, which it uses to sort the rows and uses binary search for lookup and join operations. and you get to specify more columns as secondary indexes, but i don't remember offhand what kind of data structure it maintains for those and what kinds of optimizations it provides.

#

but unlike in pandas the index column does not become separated from the data, it's just the sorting key

#

at the opposite end of the spectrum is xarray, which is basically pandas but multi-dimensional and fully embracing the separation of "coordinates" (the xarray equivalent of a pandas index) and "values" (data)

#

so it's kind of use-case-dependent

desert oar
#

i think pandas indexes are great when you have clear obvious choices. it enables you to get much better performance than you otherwise might be able to do with a "dumb" data frame library that lacks a query execution engine

past meteor
#

They won performance in different places

desert oar
#

and it does help keep things organized by separating "id" columns from everything else, which i find appealing and convenient when "exporting" data to numpy or torch for use in ML. it also makes equi-joins on keys and as-of joins on timestamps very convenient.

#

so i like it in its own time and place. but i'm not exactly clamoring for the index/data separation in other data frame libraries either. i really just want the option to designate certain columns as "coordinate" columns (without physically separating them from the data) and to define database-style indexes for performance as needed. but at that point maybe i just want to start embedding duckdb.

past meteor
#

I also don't like .loc and .iloc and the general error message that goes something like "setting a copy on a slice on a dataframe..."

desert oar
#

i think the distinction between loc and iloc is important if you are going to be using the indexes

#

the error messages and UX/API... yeah

past meteor
#

We've been stockholmed into believing these make sense

#

But they don't

left tartan
#

(My obligatory: I just do everything in sql -because- of pandas's ridiculous api)

desert oar
#

i think it makes perfect sense if you believe that it makes sense to separate coordinates from data

past meteor
#

Just have .filter or .where be how you operate with data

desert oar
#

it's not a ridiculous API if you buy into the index-data separation

#

but if you don't like the index-data separation then frankly i think pandas itself is just not the tool for you. it's a core part of the pandas design, like it or not

#

that, or you ignore indexes and suffer with slow linear scans for every filtering operation. but you probably should just use polars then.

past meteor
#

That's the thing - it's tightly integrated with most of the DS stack you can't get around it

#

I use polars wherever I can

desert oar
#

is it? you could probably just skip pandas entirely except with seaborn and statsmodels

#

it's very integrated into the community though

#

so it will be hard to find a DS team that doesn't use it at some point

#

and it wouldn't be fair to prohibit colleagues from using it

past meteor
#

I have my interns using Pandas of course

left tartan
desert oar
#

tldr: it makes sense if you don't like the index/coordinate-data separation, but i maintain that the loc/iloc distinction makes sense and is necessary when that separation exists.

#

UX/API design for interacting with that distinction is another matter

past meteor
#

Oh, I didn't argue loc/iloc being different is bad

desert oar
#

oh, maybe that was just billy 😆

past meteor
#

One is for labels and the other for positions, that much is clear
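
A tiny illustration of the label/position split, using a made-up frame whose integer index is deliberately out of order:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "value"]    # row whose index label is 0
by_position = df.iloc[0, 0]      # first row, first column, whatever its label

print(by_label, by_position)     # 20 10
```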

#

I'm arguing that the idea of having indexes on a df is bad yes

#

If you have them, then yeah, you need both

#

Indexes in Pandas' way

desert oar
#

yeah, fair. it's like how maybe R didn't need to also be a radical lisp to be a useful stats language

left tartan
#

I think my problem is that there are many analytical cases where there's no clear index/data separation

past meteor
#

I think being able to add your index ad-hoc / when reading data would've made more sense. Kind of like how data.table does it

left tartan
#

Or, more particularly, that the indices vary depending on the query

past meteor
#

groupby().reset_index() 😩

#

(Aside from Pandas I have them doing dbt + duckdb)

#

My judgement call when selecting their stack was that SQL, dbt and Pandas will go a longer way for them than Polars early career wise (even though it's no secret that's my fav)

desert oar
desert oar
#

normally i do DVC + ad-hoc scripts

past meteor
past meteor
final kiln
#

I've refined this a bit more. I'm not sure if I can call x_i the thing itself or the coordinates of the thing in a given basis. I should also preface the 4th paragraph with why I'm defining so much new stuff; the motivation is that I'm defining the memory layout for the tensors

pastel gate
#

I'm learning data science and my teacher sent me this code,

But the accuracy seems to be at 100% which I find impossible. Is there anything wrong with this code?

#

Oh I can't send a file

#

import pandas as pd
import numpy as np
import sys
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import learning_curve
from sklearn import metrics
import scikitplot as skplt
from sklearn.model_selection import LearningCurveDisplay, ShuffleSplit
from sklearn.model_selection import train_test_split
df = pd.read_csv('data.csv',sep=';')
X = df.drop(['k'], axis=1)
y = df['k']
X_train, X_other, Y_train, Y_other = train_test_split(X, y, test_size=0.2, random_state=42)

X_test, X_val, Y_test, Y_val = train_test_split(X_other, Y_other, test_size=0.5, random_state=42)

#

x = X_train['w']
y = X_train['l1']
k = Y_train
for i,row in X_train.iterrows():
    if k[i]==1:
        plt.plot(x[i],y[i],'rx')
    else:
        plt.plot(x[i],y[i],'gx')
plt.xlabel("Weight")
plt.ylabel("First")
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
ac=pd.DataFrame({'k':[],'Accuracy':[]},index = [])
for k in [1,3,5]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train,Y_train)
    y_pred=knn.predict(X_val)
    accuracy = accuracy_score(Y_val, y_pred)
    row=pd.DataFrame({'k':[k],'Accuracy':[accuracy]})
    ac = pd.concat([ac, row],ignore_index=True,axis=0)
data_p = scaler.transform([[70,16,15]])
knn.predict(data_p)
data_p = scaler.transform([[60,16,15]])
knn.predict(data_p)
ac['Accuracy'].max()
ac[ac['Accuracy']==ac['Accuracy'].max()]

k=int(ac[ac['Accuracy']==ac['Accuracy'].max()]['k'].values[0])
knn_super = KNeighborsClassifier(n_neighbors=k)
knn_super.fit(X_train,Y_train)
y_calculated_class=knn_super.predict(X_test)
y_probability_classs=knn_super.predict_proba(X_test)
knn_super.fit(X_train,Y_train)
y_calculated_class=knn_super.predict(X_train)
y_probability_classs=knn_super.predict_proba(X_train)
mac_pom=confusion_matrix(Y_train,y_calculated_class)
mac_pom
#PPV
for i in range(len(mac_pom)):
    print('PPV',i, mac_pom[i,i]/mac_pom[:,i].sum())
    print('TPR',i,mac_pom[i,i]/(mac_pom[i].sum()))
    print('TNR',i,(mac_pom.sum()-mac_pom[i].sum()-mac_pom[:,i].sum()+mac_pom[i,i])/(mac_pom.sum()-mac_pom[i].sum()))
print(metrics.classification_report(Y_train, y_calculated_class))
fig, ax = plt.subplots(figsize=(2, 2))
cmd = metrics.ConfusionMatrixDisplay.from_predictions(Y_train, y_calculated_class,ax=ax)
plt.savefig('Knn_macierz_pomylek.pdf')

final kiln
pastel gate
#

Honestly I had like one theoretical lesson of data science and no practice and I don't know if the code is correct or not

final kiln
#

Have you tried printing out some results ?

#

Like, take the model and feed it some X and see if it is the correct Y

#

And also, print the Y

#

Usually in these cases theres something off with the data and the model converges to outputting some constant

pastel gate
#

I used a different data set, much larger, and it still gave me 100% accuracy

#

honestly I'm still trying to figure out how exactly this works because like I said I had only one theoretical lesson of data science

#

And I think this was a good data set because I got it from kaggle and it seemed ok

final kiln
#

Try printing out the output, it will be clearer from there

pastel gate
#

I don't want to tell my teacher that his code is wrong unless I'm absolutely sure it is 😅

final kiln
#

And 100% accuracy is possible if the dataset is artificially constructed for it

pastel gate
#

I used two different data sets so it's not likely

final kiln
#

What does the output of the model print out ?

pastel gate
#

While I was looking through it I found that the accuracy for every k is the same as well. My teacher said that it was because of the data given, but I changed and expanded the data. Can the problem lie here?
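
Two things in the pasted code are worth checking before blaming the data. First, the final confusion matrix is computed on `X_train`, the same rows the model was fitted on. Second, `X_val` is passed to `knn.predict` without going through `scaler.transform`, unlike `X_train` and `X_test`. The sketch below uses invented noise data (not the original `data.csv`) to show why scoring a 1-nearest-neighbour model on its own training rows gives 100% even when the labels are meaningless.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in data: three features, labels are pure noise on purpose.
X = rng.normal(loc=[70.0, 16.0, 15.0], scale=[10.0, 2.0, 2.0], size=(300, 3))
y = rng.integers(0, 2, size=300)
X_train, y_train = X[:200], y[:200]
X_held, y_held = X[200:], y[200:]

scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=1).fit(scaler.transform(X_train), y_train)

# Each training point is its own nearest neighbour, so this is always perfect.
train_acc = knn.score(scaler.transform(X_train), y_train)
# Held-out rows, scaled with the same scaler, give the honest number.
held_acc = knn.score(scaler.transform(X_held), y_held)
print(train_acc, held_acc)
```

If the validation accuracies tie for every k, the selection line picks the first row, which is k=1 in that loop, and that is exactly the case where training-set accuracy is trivially perfect.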

desert oar
final kiln
#

My resume be like "I use Ricci btw"

muted hollow
#

Guys, what to do with the dataset that have the number of positive sample way more than the other. Should I downsample it or ignore

final kiln
#

Assuming you're using cross entropy loss

past meteor
#

Or ... do nothing and look at your metrics to decide (ROC, DET, ...) an operating point

#

that's my preferred choice over downsampling, upsampling and the like
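
A sketch of picking an operating point from the ROC curve instead of resampling. The scores are synthetic, and Youden's J (`tpr - fpr`) is only one possible criterion, chosen here as an assumption. In practice the threshold would come from the cost of each error type.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Imbalanced synthetic problem: 900 positives, 100 negatives.
y_true = np.r_[np.ones(900, dtype=int), np.zeros(100, dtype=int)]
scores = np.r_[rng.normal(1.0, 1.0, 900), rng.normal(-1.0, 1.0, 100)]

fpr, tpr, thresholds = roc_curve(y_true, scores)
best = np.argmax(tpr - fpr)          # Youden's J statistic
threshold = thresholds[best]
y_pred = (scores >= threshold).astype(int)
print(threshold, tpr[best], fpr[best])
```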

desert oar
#

that. you already have 2000 samples in the smallest case. that's probably enough.

#

start with not worrying about it

#

imbalance is less bad than outright not having enough data to make a good decision about one particular class

odd meteor
odd meteor
supple inlet
#

Im running a mistral 7B instruct model on my tesla p40 (24gb vram) and im getting a CUDA out of memory error for my gpu?

orchid lintel
#

Anyone got an efficient way of turning a Sparse Polars df into the short arrays that can define a CSR array? I tried turning every column into a 1-element list of Structs (each containing a non-zero value and its corresponding Row) but I kept getting weird errors around duplicate columns (apparently this is a known bug with Structs atm)

serene scaffold
mild grotto
#

Fixed a bug. Turns out int(1.9-2)==0 not -1
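
For reference, `int()` truncates toward zero while `math.floor` rounds toward negative infinity, which is the difference at play here:

```python
import math

x = 1.9 - 2                  # roughly -0.1
truncated = int(x)           # truncation toward zero
floored = math.floor(x)      # rounding down
print(truncated, floored)    # 0 -1
```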

#

I was wondering why my stuff was sticking to the 0 line

tidal bough
versed pilot
manic latch
#
import matplotlib.pyplot as plt
import numpy as np
from io import BytesIO
plt.clf()
uber = 8
forge_frag = 1600
nitro_value = 175
for uber in range(8, 12):
    i = 0
    x = 1000
    x_axis = []
    y_axis = []
    lol = False
    lol2 = False
    for z in range(100):
        nitro = (nitro_value * (z + 1))
        i += x
        i += 1500 / (uber - 7) - forge_frag * (10 * (uber - 7) - 10)
        f_p_n = i / nitro
        x_axis.append(f_p_n)
        y_axis.append(nitro)
        x += 2000
    plt.plot(y_axis, x_axis, label=f"Uber {uber}")
plt.axhline(y = 0, color = 'b', linestyle = 'dashed', label = "0 ea") 
plt.axhline(y = 160, color = 'r', linestyle = 'dashed', label = "160 ea")
plt.legend()
plt.title("Flux cost per nitro (Vaults)")
plt.xlabel('Nitro')
plt.ylabel('Flux per nitro')
data = BytesIO()
plt.savefig(data, format="png")
data.seek(0)
#

I have this bit of code, however it's missing another X axis which is supposed to show 0 to 100 in intervals of 5

#

I wanted that axis to be on top...and grid the whole graph based on it

#

been looking on stack overflow with no real results, it ends up swallowing the whole graph and creating a new one over it, it seems
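
One possible approach, sketched with a placeholder curve rather than the real flux formula: `Axes.secondary_xaxis` adds a top axis that is a rescaled view of the bottom one, so nothing gets drawn over. It assumes the 0 to 100 axis is the loop counter, i.e. `nitro / nitro_value`. The vertical lines are drawn by hand at the top-axis tick positions to act as the grid.

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

nitro_value = 175
z = np.arange(100)
nitro = nitro_value * (z + 1)
flux_per_nitro = 1000 + 10 * z        # placeholder data, NOT the original formula

fig, ax = plt.subplots()
ax.plot(nitro, flux_per_nitro, label="placeholder")

# Top axis: same data, labelled in steps = nitro / nitro_value.
top = ax.secondary_xaxis(
    "top",
    functions=(lambda n: n / nitro_value, lambda s: s * nitro_value),
)
steps = np.arange(0, 101, 5)
top.set_ticks(steps)
top.set_xlabel("Step")

# Grid based on the top axis: one light vertical line per top tick.
for s in steps:
    ax.axvline(s * nitro_value, color="0.85", linewidth=0.8, zorder=0)

ax.set_xlabel("Nitro")
ax.set_ylabel("Flux per nitro")
ax.set_title("Flux cost per nitro (Vaults)")
ax.legend()
```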

final kiln
#

the full scaled dot product attention from 2017, I'm gonna use these to show the equivalence to the quadratic form, and then argue in favor of just using the quadratic form and then try to make the case for the further restriction of it being a metric

#

N_k is not a matrix, probly not the best convention now that I think about it but N acts on an index to produce its maximum range

open raven
#

pandas DataFrame.to_numpy( )

Did this method never run through deprecation stage? In pandas 2.2 no more available - code runs onto „object has no attribute” error, yet the API Reference not reporting its support. As for pandas 2.1.2 the method is still present however under no warning of deprecation.

tidal bough
arctic wedgeBOT
#

pandas/core/frame.py line 1857

def to_numpy(
tidal bough
#

are you sure you're actually getting an error accessing it?

open raven
final kiln
#

tomorrow is gonna be intense, I was supposed to have finished this stuff by yesterday, and this deadline was already a pushback from mid last week

final kiln
# desert oar this i think was the derivation: ``` Q = X @ Wq K = X @ Wk V = X @ Wv Q @ K.T =...

there's actually a good reason for doing it the way they did it, it reduces the number of parameters. when you do Wq@Wk.T you're getting back a dxd matrix; basically, with their way, as long as you choose a k such that k < d/2 you're using fewer parameters to make a mathematically equivalent layer

it is possible however, to cook a better and still mathematically equivalent layer with even less parameters if you were to stick with the quadratic form, Im gonna include a small proof of this on my report thing
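
The parameter counts behind the `k < d/2` remark, as a small arithmetic sketch. The values `d=512`, `k=64` are arbitrary examples, and the symmetric count refers to the symmetric-M constraint discussed earlier in the channel, not to the construction promised for the report. One caveat: the factored form `Wq @ Wk.T` has rank at most k, so the equivalence only covers matrices M of that rank.

```python
def score_matrix_params(d, k):
    factored = 2 * d * k             # Wq and Wk, each d x k
    full = d * d                     # one unconstrained d x d matrix M
    symmetric = d * (d + 1) // 2     # M constrained to be symmetric
    return factored, full, symmetric

factored, full, symmetric = score_matrix_params(d=512, k=64)
print(factored, full, symmetric)     # 65536 262144 131328
print(factored < full)               # True, since k < d/2
```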

harsh scarab
#

Guys I need some help. I'm working on a Cars Dataset to predict the price based on cars infos, my problem is i donno how to get insight from the Car's Model Feature since there are 2736 unique values and my dataset has 27k rows

final kiln
harsh scarab
#

this is how my dataset looks

#

I think that there will be some relation between the Model and the Price

#

but the problem is there are 2736 unique values and as you can see the dataset is very large

#

@final kiln

final kiln
harsh scarab
final kiln
#

ah you are doing EDA

harsh scarab
#

yeah

final kiln
#

maybe PCA can help somehow

harsh scarab
#

for the make, for example, i spotted that even though some cars occur less frequently, they affect the price a lot

#

like Ferrari, Rolls-Royce

#

which is logical

#

but there are only 58 Make

#

so it was easy

final kiln
#

you can likely show this very well with a color coded histogram

#

x axis would be price bins, y axis would be count

#

and then each bar would have an assortment of colors showing the percentage that goes to each class

harsh scarab
final kiln
#

no

harsh scarab
#

hmmm

final kiln
#

2k classes might still look cluttered even in my proposed plot tho

#

maybe get the statistics for each class and plot that

#

like avg, std, etc

#

then a scatter plot with some error bars

#

labeled with the car model
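
A sketch of the per-class statistics step, with invented rows and assumed column names `Model` and `Price`. Filtering on `count` keeps rare models, whose standard deviation is undefined or unreliable, from cluttering the plot.

```python
import pandas as pd

cars = pd.DataFrame({
    "Model": ["Civic", "Civic", "Civic", "F8", "F8", "Corolla"],
    "Price": [20000, 22000, 21000, 280000, 300000, 19000],
})

stats = (
    cars.groupby("Model")["Price"]
        .agg(["count", "mean", "std"])
        .sort_values("mean", ascending=False)
)
frequent = stats[stats["count"] >= 2]   # models with enough rows to estimate spread
print(frequent)
```

The resulting `mean` and `std` columns can be passed to `matplotlib.pyplot.errorbar` as `y` and `yerr`, with the index as tick labels.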

harsh scarab
#

that sounds good

#

imma try that

#

thank you

toxic mortar
#

I want to build unsupervised, semantic-based cluster grouping of key information in uploaded files. I am deciding between ChromaDB and Qdrant for the vector db, and between density-based and centroid-based clustering algorithms. For example, I have 10 uploaded hedge fund files, and I want to add them into the vector db and find information such as: two advisers said stock X will go up, one said it will go down, etc. Have you solved a similar problem? If yes, or if you have any suggestions for starting out and choosing an optimal tech stack, please lmk. ty

buoyant vine
#

How much data is there?

#

Can you not fit this into a system locally? If so, I would use Neither and use PyNNDescent with SKlearn to create your clustering pipelines

#

A) It is faster to search B) less hassle and C) faster construction

warm copper
#

Is there anyone online here?

hybrid prism
#

Hey

warm copper
#

😄

#

I have a question.

hybrid prism
#

Yes

warm copper
#

I think my teacher is crazy

hybrid prism
#

I'm not that good with Python but sure

warm copper
#

Im working on a project

hybrid prism
hybrid prism
warm copper
#

and my teacher asked us to use k-means clustering

#

and then find the accuracy

hybrid prism
#

For what

#

Oh k

warm copper
#

but k means clustering doesnt have accuracy for predictions

hybrid prism
#

L

#

Skill issue

#

Nah I'm jk

#

What do you need help in

warm copper
#

I mean that question must be wrong

tropic mirage
#

hi guys, is full stack development required to become an ML/AI engineer?

hybrid prism
#

Nah

clever owl
#

Hey guys I have a list of similar words, they are similar by typos i.e. INDONAKANO COMPANY, INDOKSANO COMPNY..., I would like to group these similar typo words together. What is the common practice for doing this?

muted hollow
#

Guys, is it bad to clean the samples for bert-base-uncased for a classification problem like sentiment analysis?

hot bridge
rain nymph
#

Hey guys! I am building an NLP virtual assistant. So far I have built up to sentiment analysis, where the machine can understand if my given text is positive or negative. I am now working on how the machine can understand the entity and open the application that I name in the query. Example = “can you open calculator please?”
By NER I get this output: (S Can/MD you/PRP open/VB calculator/NN please/NN ?/.)
I'm using the NLTK libraries but idk how I would write a function that makes the machine understand that it has to open the calculator. I was thinking of pattern matching but that gets very tedious. Am I going in the right direction, or is there anything I should consider for entity recognition that I'm currently lacking? Thank you for your help :D

#

also Im using datasets as nltk corpora

hollow sentinel
#

import pandas as pd

#

Import "pandas" could not be resolved from source

#

what does that mean 😿

past meteor
#

You can always double check by opening the terminal and running pip list | grep pandas

hollow sentinel
pastel gate
#

when I use knn algorithm do I have to have a validation group?

serene scaffold
pastel gate
#

Basically my teacher said that when you're using the knn algorithm you should have:

  • training group
  • Validation group where you check which k is best
  • Test group - to test final accuracy of the model
#

And the validation group messed up my model, so I was wondering if I could check which k is best just on the test group

serene scaffold
#

how did the validation group "mess up your model"?

pastel gate
#

I'm new to this so I basically have zero experience

pastel gate
serene scaffold
#

that doesn't mean that the validation set "messed up your model". the only set that has any influence on the model's behavior is the training set.

#

if your instructor told you that you need to have a validation set, then you do

#

but in general, to train a model, you only need a training set. (but if you don't have a test set, then you'll have no way of knowing how well it performs.)
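
The three-set workflow from the teacher's list could look roughly like this. The data is synthetic (`make_classification`), and the split sizes and candidate k values are arbitrary assumptions. The test set is touched exactly once, after k has been fixed on the validation set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=5, random_state=0)
X_train, X_other, y_train, y_other = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_other, y_other, test_size=0.5, random_state=0)

# Fit the scaler on the training rows only, then apply it to all three sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s, X_test_s = (scaler.transform(a) for a in (X_train, X_val, X_test))

# Validation set: choose k.
val_acc = {
    k: KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train).score(X_val_s, y_val)
    for k in (1, 3, 5, 7)
}
best_k = max(val_acc, key=val_acc.get)

# Test set: report the final number once.
test_acc = KNeighborsClassifier(n_neighbors=best_k).fit(X_train_s, y_train).score(X_test_s, y_test)
print(best_k, test_acc)
```

Picking k on the test set instead would make the reported accuracy optimistic, because the test rows would then have influenced a modelling choice.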

hollow sentinel
#

so stel i read this medium article and it said to not use seaborn for "default visualizations"

#

bc it doesn't generate the most impressive ones. what should i use instead?

serene scaffold
hollow sentinel
serene scaffold
#

but don't take anything on medium for granted. most of the content on there is written by wannabe influencers.

hollow sentinel
#

isn't seaborn built on top of mpl?

serene scaffold
#

they all are.

hollow sentinel
#

idk i wanna create some more impressive data visualizations for my portfolio

#

maybe tableau is the answer?

hollow sentinel
#

should i put the link here? this guy put it exclusive to medium ppl only

serene scaffold
#

If you want

hollow sentinel
#

i think he's right tbh

lofty thorn
#

hi..
can anyone please teach me how to calculate the inter quartile range..
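
The interquartile range is the 75th percentile minus the 25th percentile. A short sketch with NumPy on made-up numbers:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # made-up sample

# Default NumPy percentiles use linear interpolation between data points.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
print(q1, q3, iqr)  # 2.75 6.25 3.5
```

Textbooks sometimes use a different quartile convention (for example the median of each half, which gives 2.5 and 6.5 for these numbers), so a book's answer can differ slightly from NumPy's default.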

serene scaffold
lofty thorn
#

i was reading book and there came this..i didn't write any code for it

#

is this didn't find fair to you?

past meteor
lofty thorn
past meteor
#

Another option is plotly

hollow sentinel
#

also there's missing values... do i impute the values e.g fill in the missing ones with an average? or do i just drop the missing entirely?

#

this isn't a school project or anything if anyone is concerned about helping me

#

just my own curiosity

#

if you always fill in your missing values with imputed data, wouldn't your analysis be skewed?
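
A small sketch of what mean imputation does to a column, on made-up prices: the mean stays the same, but the spread shrinks because every filled value sits exactly at the centre.

```python
import numpy as np
import pandas as pd

prices = pd.Series([10.0, 12.0, np.nan, 14.0, np.nan, 20.0])  # made-up values

dropped = prices.dropna()
imputed = prices.fillna(prices.mean())

print("missing share:", prices.isna().mean())
print("mean dropped/imputed:", dropped.mean(), imputed.mean())
print("std  dropped/imputed:", dropped.std(), imputed.std())
```

So the answer to "would it be skewed" depends on how much is missing: with a small missing share the effect on the spread is minor, with a large share it can noticeably understate the variability.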

final kiln
#

Math is hard

hollow sentinel
#

the old kanye would've imputed everything with the mean

#

i miss the old kanye

#

but now i fr don't know what to do

#

if i drop the values too, there's a problem

final kiln
#

I'm out of context here, who's Kanye

hollow sentinel
past meteor
past meteor
final kiln
#

Ah Kanye west

final kiln
#

I have a symmetric matrix, and for some reason my brain can't think of a way to optimize the matrix mul

final kiln
#

The matrix is laid out as a 1d array

#

this is the context

#

it worked out great when it was M_kk' cuz the result was a number and I could partition the sum, but now Im stuck

hollow sentinel
#
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("/Users/rahuldas/Desktop/Tortilla Dataset/tortilla_prices.csv")
print(df.head())
print(df.info()) 
print(df.shape)
print(df.columns)
print(df.dtypes)
sns.distplot(df["Price per kilogram"])
plt.show()
price_per_kilogram_missing = df["Price per kilogram"].isna().sum()
print(price_per_kilogram_missing)
print("hello world")
final kiln
#

s is symmetric in cc', and u=F(c,c'), F is the way Im flattening it

hollow sentinel
#

!pastein

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

hollow sentinel
#

does anyone know why nothing is being outputted?

#

not even the hello world works

final kiln
#

what's the error that it throws

hollow sentinel
#

no error either

final kiln
#

not even seg fault

hollow sentinel
#

nada

final kiln
#

that's suss

hollow sentinel
#

i changed distplot to displot

final kiln
#

you gonna have to move the print line by line to try to figure out the guilty line

#

something is crashing the program silently

#

and wtv it is kill it with fire cuz programs shouldnt do that

hollow sentinel
hollow sentinel
#

2024-03-24 11:24:54.258 Python[61800:5006159] WARNING: Secure coding is automatically enabled for restorable state! However, not on all supported macOS versions of this application. Opt-in to secure coding explicitly by implementing NSApplicationDelegate.applicationSupportsSecureRestorableState:.

final kiln
hollow sentinel
#

my guy what in the world of fuck is that

final kiln
hollow sentinel
#

i can't w it

final kiln
#

I think I might prefer remaining unemployed than being forced to use one again

hollow sentinel
#

nah

#

ok i think i got the number. 6390.

#

there are 6390 rows of data missing for that specific column of Price Per Kilogram

#

based on that, is there any way i can conclude i should be imputing?

#

fuck it we ball, i'll impute anyways

lofty thorn
#

i don't get it

hollow sentinel
lofty thorn
#

very basic

hollow sentinel
#

yea i would recommend augmenting that with some extra work w a course or two

#

that book really isn't beginner friendly

wooden sail
# past meteor <@467435887236612106> ?

if you know the matrix ahead of time and it's manageably small, you can think of diagonalizing it. aside from that, i think computing either y^T(Mx) or (y^TM)x is much more efficient than looping for every single element as naive einstein notation would suggest. if you're coding it yourself, you might consider using strassen's algorithm for whichever product you associate with the matrix

#

i pinged the wrong person

#

@final kiln

final kiln
#

im taking the L on it tho, I've already got what I wanted to do done

#

actually, that does give me the idea, I could just like, not care about memory and tank the repeated calculations, I'd still be squeezing out performance cuz I'd be doing two matrix mul in one operation, I'd just take the same amount of memory instead of half
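
A sketch of that "just spend the memory" route, assuming the 1D layout is the row-major upper triangle of the symmetric matrix (the actual flattening F is not shown in the chat, so this ordering is a guess). It rebuilds the full matrix once and then evaluates y^T(Mx) with two ordinary products.

```python
import numpy as np


def unpack_symmetric(packed, n):
    """Rebuild a full n x n symmetric matrix from its packed upper triangle."""
    full = np.zeros((n, n), dtype=packed.dtype)
    rows, cols = np.triu_indices(n)      # row-major upper-triangle positions
    full[rows, cols] = packed
    full[cols, rows] = packed            # mirror across the diagonal
    return full


def bilinear(y, packed, x):
    """Compute y^T M x, associating as y^T (M x)."""
    M = unpack_symmetric(packed, x.shape[0])
    return y @ (M @ x)


packed = np.array([1.0, 2.0, 3.0])       # encodes [[1, 2], [2, 3]]
x = np.array([1.0, 1.0])
y = np.array([1.0, 0.0])
print(bilinear(y, packed, x))            # 3.0
```

The unpacked matrix costs n*n entries instead of n*(n+1)/2, which is the memory trade-off mentioned above.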

hollow sentinel
#
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("/Users/rahuldas/Desktop/Tortilla Dataset/tortilla_prices.csv")
sns.set(rc={
    "figure.figsize" : (11.7, 8.27)
})
print(df.head())
print(df.info()) 
print(df.shape)
print(df.columns)
print(df.dtypes)
print("hello world")
price_per_kilogram_missing = df["Price per kilogram"].isna().sum()
print(price_per_kilogram_missing)


price_per_kilogram_missing_mean = df["Price per kilogram"].mean()
print(price_per_kilogram_missing_mean)
df["Price per kilogram"] = df["Price per kilogram"].fillna(price_per_kilogram_missing_mean)
print(df["Price per kilogram"].isna().sum())
sns.kdeplot(df["Price per kilogram"], shade=True)
#plt.show()
sns.barplot(df, x = "Price per kilogram", y = "State", hue = "State")
plt.show()
#

anyone have any ideas on how to make this visualization more eye-friendly?

#

i might remove the error bars

#

but when i did i still have one

#

aguascalientes

#

weird ash

final kiln
#

i have no idea if any of this makes sense >.>

#

ig if backprop dont work ill know why

hollow sentinel
#

just curious

#

i mean there's 32 cities here

#

maybe show the top 5 values?

final kiln
#

i think im making this more complicated than it needs to be, I can take the derivative with respect to one of the symbols already present, cuz it's symmetric with respect to the choice of either p

final kiln
hollow sentinel
final kiln
hollow sentinel
#

i always run into this problem with data visualizations

final kiln
#

potentially normalize the height with respect to the largest bar

hollow sentinel
#

whipped up a quick graph of what it could look like instead

#

the names are the problem tho

final kiln
#

remove the gradient, just a solid color is standard

toxic mortar
#

@final kiln GIGACHAD NAME bro

final kiln
#

like, if it has no information it is not useful

final kiln
toxic mortar
#

"cool name"

final kiln
final kiln
#

this is likely correct, when c = c' it becomes the derivative of x**2, which is 2x

#

kind of a crazy way to express it tho

hollow sentinel
#

better?

#

weird

#

i can't get it to have the highest value at the top
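
One likely cause is that `sort_values` returns a new object rather than sorting in place, and that the bar order is decided by the plotting call anyway. A sketch with pandas on made-up data that computes the per-state median, sorts it, and keeps the top entries:

```python
import pandas as pd

# Toy stand-in for the tortilla data (values are made up for illustration).
df = pd.DataFrame({
    "State": ["A", "A", "B", "B", "C", "C"],
    "Price per kilogram": [10.0, 12.0, 20.0, 22.0, 15.0, 17.0],
})

# sort_values returns a new object; keep the result instead of discarding it.
median_by_state = (
    df.groupby("State")["Price per kilogram"]
      .median()
      .sort_values(ascending=False)
)

top_states = median_by_state.head(2)   # e.g. head(5) on the real data
order = top_states.index.tolist()
print(order)  # ['B', 'C']
```

The `order` list can then be passed to seaborn, for example `sns.barplot(data=df[df["State"].isin(order)], x="Price per kilogram", y="State", order=order)`. On seaborn 0.12 or newer, `errorbar=None` should drop the error bars.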

final kiln
#

and try dividing by the height of the largest bar

hollow sentinel
final kiln
#

price per kg = (price per kg) / max(price per kg )

#

im gonna assume my crazy equations are correct and move on, cuz I gotta get stuff done

#

I'll have the opportunity to refine them once I have a unit test on the entire layer

hollow sentinel
hollow sentinel
# final kiln price per kg = (price per kg) / max(price per kg )
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import colorcet as cc
df = pd.read_csv("/Users/rahuldas/Desktop/Tortilla Dataset/tortilla_prices.csv")
sns.set(rc={
    "figure.figsize" : (11.7, 8.27)
})
print(df.head())
print(df.info()) 
print(df.shape)
print(df.columns)
print(df.dtypes)
print("hello world")
price_per_kilogram_missing = df["Price per kilogram"].isna().sum()
print(price_per_kilogram_missing)


price_per_kilogram_missing_mean = df["Price per kilogram"].mean()
print(price_per_kilogram_missing_mean)

df.sort_values("Price per kilogram", ascending = False)

df["Price per kilogram"] = df["Price per kilogram"].fillna(price_per_kilogram_missing_mean)
print(df["Price per kilogram"].isna().sum())
df["Price per kilogram"] = (df["Price per kilogram"])/(df["Price per kilogram"].max())
sns.set_style("whitegrid")
sns.kdeplot(df["Price per kilogram"], shade=True);
#plt.show()
sns.barplot(df,estimator=np.median, x = "Price per kilogram", y = "State", color = "blue");
sns.despine(left = True
            );
plt.show()

#

what in the world is that

#

at the top part of the graph

#

very odd behavior

#

i don't think it likes it

final kiln
#

and now im gonna curl up and cry cuz I still didnt get everything done and that means im gonna be working on this til nite

final kiln
# final kiln

there's something that doesn't sit quite right with me tho, even if they are equivalent, isn't the one that performs a project to a lower dimensional space still a bottleneck ? like, if the embedding dimension is 1000000, and the space where it's projected to has 10 dimensions, like, there should be a higher risk of loss of information

#

even tho I can multiply both weights and get a 1000000x1000000 matrix

#

I'm willing to accept that in the ideal math world I can funnel and recover any information regardless of how tight the bottleneck is, but I think there is gonna be some sort of limit IRL, if nothing else, just coming out of the fact that IRL we dont operate on the reals, we operate on the lattice of floating point numbers

#

im sure someone has already figured out this stuff

lapis sequoia
#

i thought scipy optimize minimize is some insane math but all its actually doing is changing every single parameter and seeing how it affects the loss and that is what all solvers do apparently

#

it just changes them one by one

tidal bough
#

i mean, it depends on the algorithm

neon crystal
#

anyone have good recommendation for resources on time series forecasting in python? I have a course by jm portilla on udemy but that one is kinda outdated now.

lapis sequoia
# tidal bough i mean, it depends on the algorithm

i tried all of ones that dont require gradient, i made it recreate an image, and then train a pytorch model, so i tried "Nelder-Mead", "Powell", "CG","BFGS","TNC","COBYLA" and "SLSQP", and i logged the image it outputs after each time it evaluates it, it changes each pixel one at a time and then puts it back to original value and does the next pixel
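
The one-pixel-at-a-time pattern matches how a gradient is estimated numerically: when no `jac` is supplied, the gradient-based methods in `scipy.optimize.minimize` (BFGS, CG, TNC, SLSQP) approximate it with finite differences. A rough sketch of forward differences, not SciPy's actual implementation:

```python
import numpy as np


def forward_difference_gradient(f, x, eps=1e-6):
    """Estimate the gradient of f at x by nudging one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        nudged = x.copy()              # start again from the original point
        nudged[i] += eps               # change a single parameter
        grad[i] = (f(nudged) - f0) / eps
    return grad


def loss(p):
    target = np.array([1.0, -2.0, 3.0])
    return float(np.sum((p - target) ** 2))


g = forward_difference_gradient(loss, [0.0, 0.0, 0.0])
print(g)  # close to the exact gradient [-2, 4, -6]
```

That is n + 1 loss evaluations per gradient, which gets expensive for an image with one parameter per pixel. Nelder-Mead, Powell and COBYLA are derivative-free and work differently, so "that is what all solvers do" is too broad; passing an analytic gradient via `jac=` avoids the per-parameter probing entirely.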

gritty vessel
#

hey how can i handle a memory error? it says unable to allocate 2.01 gigs

#

but i have 16 gigs

#

and i was monitoring on task manager enough memory was there

kindred isle
#

Can anyone help with conda installation? I have two different ones installed and idk which one to keep

final kiln
#

Unless you've installed stuff that took you a long time to install or anything like that

final kiln
honest reef
#

How are you guys doing

supple inlet
#

Im getting a Out of Memory error, im trying to run Mistral-7B-Instruct-v0.2 model and i have a tesla p40 (24gb vram):

OutOfMemoryError                          Traceback (most recent call last)
Cell In[7], line 2
      1 model_inputs = encodeds.to(device)
----> 2 model.to(device)

OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.87 GiB of which 47.00 MiB is free. Including non-PyTorch memory, this process has 23.82 GiB memory in use. Of the allocated memory 23.68 GiB is allocated by PyTorch, and 1.14 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

final kiln
supple inlet
#

Its only the current task im doing

pale thunder
#

When dealing with the gini index for the purposes of deciding a split in a decision tree, you compute the gini index of samples on either side of the split, then take a weighted average of the indexes. However, a gini index is supposed to be the probability of a sample being misclassified - that is, the probability of random.choice(samples).class_ != random.choice(samples).class_ - the correct way to compute this for the two splits would be a different formula entirely. Why is the weighted average used?
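
One way to read the weighted average: pick a child node with probability proportional to its size, then draw two samples (with replacement) from that child. The chance they disagree is exactly the size-weighted average of the children's Gini impurities. A small sketch with made-up labels:

```python
from collections import Counter


def gini(labels):
    """Chance that two draws (with replacement) from labels have different classes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted average of the children's Gini impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


left, right = [0, 0, 0, 1], [1, 1]       # made-up class labels
parent_impurity = gini(left + right)
child_impurity = split_gini(left, right)
print(parent_impurity, gini(left), gini(right), child_impurity)
```

Here the parent impurity is 0.5 and the split brings it down to about 0.25. Drawing both samples from the pooled data would just give the parent impurity back, which does not depend on the split at all, so it cannot be used to compare candidate splits.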

lapis sequoia
#

wait, how do you deal with the gini index

features = ["SEX","AGE","YEAR","EDUC","INCWAGE","WKSWORK2"]
sampling_weight = 'ASECWT'

df = sampled_df[features + [sampling_weight]]

df.isna().sum()

df = df[
(df['AGE'].between(21, 64)) &
(df['INCWAGE'].between(0, 99999998)) &
((df['WKSWORK2'] >= 1) & (df['WKSWORK2'] <= 6))
]

df['EDUC'] = df['EDUC']

df = df.drop('ASECWT',axis=1)

import numpy as np

df.info()
df.isna().sum()
a = 0.10
df['EDUC'] = df['EDUC'] * a

df['LOG_INCWAGE'] = np.log1p(df['INCWAGE'])

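# keep the mean in a plain variable so the lambda below can see it
mean_hours_worked = df['WKSWORK2'].mean()
df['mean_hours_worked'] = mean_hours_worked

df['WKSWORK2'] = df['WKSWORK2'].apply(lambda x: mean_hours_worked if 40 <= x <= 46 else x)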

print(df)

df['LOG_INCWAGE_PER_WEEK'] = df['LOG_INCWAGE'] - np.log1p(df['WKSWORK2'])

df['MARKET_EXP'] = df['AGE'] - df['EDUC'] - 6

print(df)

print(df.dtypes)

df['YEAR_OF_EXP_SQUAREd'] = df['MARKET_EXP']**2

final kiln
#

With 8gb of memory that also is occupied with the rest of the system

#

Tho I did make a larger swap file

supple inlet
final kiln
supple inlet
final kiln
supple inlet
#

heres all the code:

from transformers import AutoTokenizer, AutoModelForCausalLM
device = 'cuda'
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)

final kiln
#

Ah Ive never used the transformers lib. But, aren't you potentially loading the same model twice ?

final kiln
#

Not sure if you can use ollama to finetune

supple inlet
#

I've heard a lot about ollama, i haven't tried it yet.

supple inlet
final kiln
#

Try to debug this line by line, start with loading only one model to the GPU

#

And see the effect of that on the memory

#

There's gonna be a line where it gets to 20gb, which I think it shouldn't right

supple inlet
#

Thanks ill give it a go

final kiln
#

Yeah Mistral 7b should be like 4gb on the gpu
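
A back-of-the-envelope sketch of why the load can run out of memory on a 24 GB card. The parameter count is an approximation for a 7B-class model, and only the weights are counted (no activations or cache).

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


N_PARAMS = 7.24e9  # approximate parameter count (assumption)

for name, nbytes in [("float32", 4), ("float16", 2), ("4-bit", 0.5)]:
    print(f"{name:8s} {weight_memory_gib(N_PARAMS, nbytes):5.1f} GiB")
```

That comes to roughly 27 GiB in float32, 13.5 GiB in float16, and a bit over 3 GiB at 4 bits, so the "like 4gb" figure only holds for a 4-bit quantized model. As far as I know, `from_pretrained` loads in float32 unless told otherwise, which would explain the 23.68 GiB in the traceback; passing `torch_dtype=torch.float16` to `from_pretrained` is the usual way to halve that.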

supple inlet
#

Just running the tokenizer line it takes up 150mb lol. And it successfully encodes my message to a tensor

supple inlet
final kiln
lapis sequoia
#

how do you add weights to variables in python
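
If this is about survey sampling weights (like the `ASECWT` column in the snippet further up), one common reading is "use the weights when computing statistics" rather than multiplying the variables themselves. A sketch with NumPy on made-up wages and weights:

```python
import numpy as np

wages = np.array([30000.0, 50000.0, 90000.0])   # made-up values
weights = np.array([1.0, 1.0, 2.0])             # made-up sampling weights

plain_mean = wages.mean()
weighted_mean = np.average(wages, weights=weights)
print(plain_mean, weighted_mean)
```

The weighted mean here is (30000 + 50000 + 2 * 90000) / 4 = 65000. Many estimators also accept weights directly at fit time, so the weight column would need to be kept rather than dropped before modelling.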

wooden sail
#

especially in the linear case, the recoverability conditions are well known

final kiln
wooden sail
#

the link between unique recoverability of high dimensional vectors from low dimensional ones is through so-called "sparse recovery", where the constraint is that the projection matrix needs to satisfy special identifiability conditions and the vectors you're looking for in high dimensions are sparse or have a sparse linear representation

#

the property can be thought of as approximately preserving distances between the vectors in the original vector space even after projecting them to a lower dimensional one

#

a popular formulation is via the "restricted isometry property" using the johnson-lindenstrauss lemma

hollow sentinel
#
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
df = pd.read_csv("/Users/rahuldas/Desktop/Tortilla Dataset/tortilla_prices.csv")

print(df.head())
print(df.info()) 
print(df.shape)
print(df.columns)
print(df.dtypes)
print("hello world")
price_per_kilogram_missing = df["Price per kilogram"].isna().sum()
print(price_per_kilogram_missing)


price_per_kilogram_missing_mean = df["Price per kilogram"].mean()
print(price_per_kilogram_missing_mean)
df["Price per kilogram"] = df["Price per kilogram"].fillna(price_per_kilogram_missing_mean)
print(df["Price per kilogram"].isna().sum())
sns.set_style("whitegrid")
sns.kdeplot(df["Price per kilogram"], shade=True);
#plt.show()
fig, ax = plt.subplots(figsize=(6, 6))
# drawing the plot
sns.boxplot(data=df, x = "Store type", y = "Price per kilogram", color = "lightblue", ax=ax);
plt.xticks(rotation=90)
sns.despine(left=True, right=True, top=True, bottom=True)
#plt.show()

df["Date"] = pd.to_datetime(df[["Year", "Month", "Day"]])
print(df.columns)
print(df.head())
sns.lineplot(x = "Date", y = "Price per kilogram", data=df)
plt.show()
#
Traceback (most recent call last):
  File "/Users/rahuldas/Desktop/Tortilla Dataset/Tortilla Data Analysis.py", line 34, in <module>
    sns.lineplot(x = "Date", y = "Price per kilogram", data=df)
  File "/Users/rahuldas/Library/Python/3.9/lib/python/site-packages/seaborn/relational.py", line 508, in lineplot
    p._attach(ax)
  File "/Users/rahuldas/Library/Python/3.9/lib/python/site-packages/seaborn/_base.py", line 1135, in _attach
    converter.update_units(seed_data)
  File "/Library/Python/3.9/site-packages/matplotlib/axis.py", line 1717, in update_units
    self._update_axisinfo()
  File "/Library/Python/3.9/site-packages/matplotlib/axis.py", line 1729, in _update_axisinfo
    info = self.converter.axisinfo(self.units, self)
  File "/Library/Python/3.9/site-packages/matplotlib/dates.py", line 1882, in axisinfo
    return self._get_converter().axisinfo(*args, **kwargs)
  File "/Library/Python/3.9/site-packages/matplotlib/dates.py", line 1799, in axisinfo
    majloc = AutoDateLocator(tz=tz,
  File "/Library/Python/3.9/site-packages/matplotlib/dates.py", line 1333, in __init__
    super().__init__(tz=tz)
  File "/Library/Python/3.9/site-packages/matplotlib/dates.py", line 1132, in __init__
    self.tz = _get_tzinfo(tz)
  File "/Library/Python/3.9/site-packages/matplotlib/dates.py", line 236, in _get_tzinfo
    raise TypeError(f"tz must be string or tzinfo subclass, not {tz!r}.")
TypeError: tz must be string or tzinfo subclass, not <matplotlib.category.UnitData object at 0x1291503a0>.
(base) rahuldas@Das ~ % 
#

what does this error mean?

#

some kind of type mismatch
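
A likely cause: with both `plt.show()` calls commented out, the line plot is drawn onto the same Axes as the boxplot, whose x-axis is already categorical ("Store type"), and matplotlib then trips when it tries to treat that axis as dates. A sketch of the usual fix, giving the date plot its own Axes; the data and store names are made up, and plain matplotlib stands in for the seaborn calls.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the tortilla data (values are made up for illustration).
df = pd.DataFrame({
    "Year": [2024, 2024, 2024],
    "Month": [1, 2, 3],
    "Day": [1, 1, 1],
    "Store type": ["Store A", "Store B", "Store A"],
    "Price per kilogram": [20.0, 12.5, 21.0],
})
df["Date"] = pd.to_datetime(df[["Year", "Month", "Day"]])

# First figure: categorical x-axis.
fig_cat, ax_cat = plt.subplots(figsize=(6, 6))
ax_cat.bar(df["Store type"].tolist(), df["Price per kilogram"])

# Second figure: a fresh Axes, so the date axis does not inherit category units.
fig_date, ax_date = plt.subplots(figsize=(8, 4))
ax_date.plot(df["Date"], df["Price per kilogram"])
```

With seaborn the same idea is `sns.lineplot(x="Date", y="Price per kilogram", data=df, ax=ax_date)`, or simply calling `plt.figure()` before the line plot.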

final kiln
#

Not isometry, wait

#

Ah idk, I thought you could say they have the same cardinality

wooden sail
#

even with n = m, general matrices are not invertible

#

with n != m, they cannot be invertible

hollow sentinel
#
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   State               278886 non-null  object        
 1   City                278886 non-null  object        
 2   Year                278886 non-null  int64         
 3   Month               278886 non-null  int64         
 4   Day                 278886 non-null  int64         
 5   Store type          278886 non-null  object        
 6   Price per kilogram  278886 non-null  float64       
 7   Date                278886 non-null  datetime64[ns]
wooden sail
#

one looks for special conditions under which left invertibility is possible

final kiln
#

I might've not expressed what I meant correctly

hollow sentinel
#

i thought it accepted that type

final kiln
#

If I have a linear map from Rn to Rm

hollow sentinel
#

seemingly it doesn't

final kiln
#

That's a matrix right, and I can decompose it into two matrix that multiply into the original

wooden sail
#

the cardinality of a vector space is card(field)^dimension, so also the cardinality is not the same btw

final kiln
#

So that means I have a map like

Rn -> Rk -> Rm

wooden sail
#

yeah, and neither of the two will be invertible

final kiln
#

If k is very small, I don't find it intuitive that this composition could represent the original one

#

Because there's information being compressed, whereas in the original, there was not

wooden sail
#

that's what i'm telling you, the operation is not invertible in general

final kiln
#

I'm saying I didn't express what I meant correctly

wooden sail
#

what do you mean to say, then?

#

you had already said that at the beginning

final kiln
#

There's compression going on right, so the second transformation must be recovering something from the first

wooden sail
#

for one, it cannot be done linearly

final kiln
#

Not sure I follow

wooden sail
#

the dimension of the intermediate R^k also has to satisfy special properties

#

how's your linear algebra?

final kiln
#

No like, n -> k and then k-> m

wooden sail
#

if the dimension of the vector space the data is in originally is larger than k, then you irremediably lose data and can't do anything about it

final kiln
wooden sail
#

what you're asking is exactly what the johnson-lindenstrauss lemma discusses

final kiln
#

So again, I don't find it intuitive that the composition can fully represent the n -> m

#

But it's possible cuz it's a matrix mul

wooden sail
#

you'll need to review your linear algebra

final kiln
#

I disagree, understanding the math is different from having an intuitive picture of it

wooden sail
#

i gave you an intuitive explanation in terms of isometry too but ok

#

at any rate, if you look up sparse recovery you'll find any level of abstraction and detail you like about the topic

final kiln
final kiln
# final kiln No like, n -> k and then k-> m

I think the explanation is gonna be that the set of maps that you can build, which come from a composition of linear maps like this one, and where k is much smaller than both m and n, are not gonna be very complicated from the get go, so there's not a lot of information that needs to flow from one side to the other
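
That intuition can be checked numerically: a map built as Rn -> Rk -> Rm has rank at most k, so the family of such maps is much smaller than the family of all m x n matrices. A sketch with random matrices; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 8, 2, 6

B = rng.normal(size=(k, n))   # R^n -> R^k
A = rng.normal(size=(m, k))   # R^k -> R^m
W = A @ B                     # composed map R^n -> R^m, shape (m, n)

full = rng.normal(size=(m, n))  # an unconstrained map for comparison

print(W.shape, np.linalg.matrix_rank(W), np.linalg.matrix_rank(full))
```

The composed matrix has the same shape as the unconstrained one but only rank 2, so it can represent an arbitrary n -> m map only when that map already has rank at most k.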

toxic mortar
# toxic mortar I want to build unsupervided learning semantic-based cluster grouping of key inf...

Regarding this question, I built a pipeline that is not giving me the results I was hoping for. Pipeline is following:

Input:
    List of file IDs for document body extraction.
Steps:
    1) get_document_sentences()
       Aggregates sentences within documents by semantic similarity. 
       Outputs a list of lists, where the inner list contains semantically grouped sentences, and the outer list aggregates these groups across documents.
       - Sentences are initially split using '. ' (dot_split).
       - Semantically similar sentences within a document are grouped using vectorization and clustering (semantic_split).
    2) cluster_sentences()
       - Further clusters the semantically grouped sentences across all documents to identify broader themes or contexts.
       - Takes as input a list of lists, where each list represents a context, and clusters these contexts across documents using specified clustering techniques (e.g., agglomerative, DBSCAN, kMeans).
       - Outputs a list of lists, with each inner list containing sentences from various documents that share a similar context.
Output:
    List of semantically grouped lists.
#

Problem is that I get one dense "centroid" cluster and the rest are very sparse, and I dont have optimal number of clusters. I fine-tune it for one example group, but I overfit it and cant generalize

#

These are hyperparams I use:

 methods = [
        ('agglomerative', {'distance_threshold': 1.2, 'linkage': 'ward'}),
        ('dbscan', {'eps': 4.0, 'min_samples': 2, 'n_neighbors': 50}),
        ('kmeans', {'n_clusters': 30})
    ]
def cluster_sentences(sentences, cluster_method='agglomerative', **kwargs):
    sentence_vectors = vectorize_text(sentences)
    if cluster_method == 'agglomerative':
        model = AgglomerativeClustering(n_clusters=None, **kwargs)
    elif cluster_method == 'dbscan':
        n_neighbors = min(len(sentences), kwargs.pop('n_neighbors'))
        nn_descent = NNDescent(sentence_vectors, n_neighbors=n_neighbors, metric='euclidean')
        distances, indices = nn_descent.neighbor_graph
        n_samples = sentence_vectors.shape[0]
        indptr = np.arange(0, n_samples * n_neighbors + 1, n_neighbors)
        precomputed_distance_matrix = csr_matrix((distances.ravel(), indices.ravel(), indptr), shape=(n_samples, n_samples))
        precomputed_distance_matrix = sort_graph_by_row_values(precomputed_distance_matrix)
        model = DBSCAN(metric='precomputed', **kwargs)
    elif cluster_method == 'kmeans':
        n_clusters = min(len(sentences), kwargs.get('n_clusters'))
        kwargs.pop('n_clusters', None)
        model = KMeans(n_clusters=n_clusters, **kwargs)
    else:
        raise ValueError("Unsupported clustering method.")

    if cluster_method != 'dbscan':
        model.fit(sentence_vectors)
        labels = model.labels_
    else:
        labels = model.fit_predict(precomputed_distance_matrix)

    return labels
#

I mean I can experiment with hyperparam tuning like gridsearch,random search, kfolds etc, but I am not sure how to establish validation metrics for unsupervised learning
If anyone did something similar to what I am trying to build, please let me know if i fkd up the pipeline logic

#

How to find an optimal number of clusters
How to evaluate clustering?
Everything I found about evaluation and hyperparam optimization is about supervised learning
Do I start labeling data?
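
One label-free option is an internal metric such as the silhouette coefficient, which compares how close each point is to its own cluster versus the nearest other cluster. Below is a hand-rolled sketch in NumPy so it runs on its own; scikit-learn ships the same metric as `sklearn.metrics.silhouette_score`. It assumes at least two clusters and would treat DBSCAN's noise label -1 as just another cluster.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance (no ground truth needed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)          # convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster
        # (the distance to itself is 0, so divide by count - 1).
        a = dists[i, same].sum() / (same.sum() - 1)
        # Smallest mean distance to any other cluster.
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])   # made-up points
good = silhouette(X, [0, 0, 1, 1])
bad = silhouette(X, [0, 1, 0, 1])
print(good, bad)
```

Scores near 1 mean tight, well-separated clusters and scores near or below 0 mean overlapping ones, so sweeping `n_clusters` or `distance_threshold` and keeping the setting with the highest score is one way to tune without labels. Labeling a small sample is still useful as a sanity check on whether the metric agrees with what "same context" means for these documents.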

lucid wadi
#

from analysis.inception import InceptionV3 isnt working in my python script:
PS C:\users\zayga\VATr-pp-main> python .\generate.py text --text hello
Traceback (most recent call last):
File "C:\users\zayga\VATr-pp-main\generate.py", line 2, in <module>
from generate import generate_text, generate_authors, generate_fid, generate_page, generate_ocr, generate_ocr_msgpack
File "C:\users\zayga\VATr-pp-main\generate\__init__.py", line 1, in <module>
from generate.text import generate_text
File "C:\users\zayga\VATr-pp-main\generate\text.py", line 5, in <module>
from generate.writer import Writer
File "C:\users\zayga\VATr-pp-main\generate\writer.py", line 15, in <module>
from models.model import VATr
File "C:\users\zayga\VATr-pp-main\models\model.py", line 7, in <module>
from analysis.inception import InceptionV3
ModuleNotFoundError: No module named 'analysis.inception'

#

can sb help

#

dm me if you can because im exhausted ty sm if you can

feral wind
#

hi guys so im working on a chatbot project and this is the error that i got:


You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface.

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

i tried doing the pip install openai==0.28 but it didnt work, and the tutorial on github is hard to understand, can someone explain to me
wooden sail
#

it would also mean that you have to work with a positive semidefinite metric tensor instead of a positive definite one. otherwise you lose the low dimensional projection, which arguably is what the authors intended to have

#

you had mentioned an approach where the matrices are identities and then they use max pooling. that's the same as a low dimensional projection

carmine wharf
#

Hi everybody, I have a question. I have to buy a new laptop. I want to do some deep learning and some CNN for computer vision. Do you think it make sense to buy one with an Nvidia GPU such as RTX 4050 ? Would that be enough to train models ? or is it better to have a dedicated server or a googlecolab with GPU to do that ?

final kiln
# wooden sail it would also mean that you have to work with a positive semidefinite metric ten...

I'm gonna re read their paper, but from what I recall they don't motivate their choices, tho their code and choice of hyperparameters is very telling, they always use the k that make the projection more efficient than using a quadratic directly - I even have a suspicion they started with quadratic and came up with this, but my impression is that they were just building on top of previous approaches and "accidentally" stumbled upon this

final kiln
#

but no reason to go overboard and you can get by with not having it, I don't and my setup is super efficient, I must spend on average less than 5 cents per day on gpu, some days I use more than others ofc, but avg it out and if you use it mindfully with a good setup, it's much cheaper

#

furthermore, and this is my personal take, the trend is gonna be towards decentralized ML training

#

if nothing else, I'll personally make it happen since I've had the idea in the back of my mind for while, but there's quite a lot of smart people pushing for it already

wooden sail
carmine wharf
final kiln
#

embedding they mention for sure ofc

#

the word space appears once when talking about the 1/sqrt()

#

"Due to the reduced dimension of each head, the total computational cost
is similar to that of single-head attention with full dimensionality" they do mention it

#

yeah in fact I think the "multi-headed" part was also central to the transformer innovation, which makes sense

#

I gotta re read the entire thing, they actually do motivate some of these things quite well

final kiln
carmine wharf
final kiln
# carmine wharf 6 to 8 GB from what I saw

that's pretty small in the context of ML, but still useful, and there's also cool tricks like gradient accumulation that let you train on more data than what fits in the gpu
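
A sketch of why gradient accumulation works, using a linear model in NumPy instead of a real training loop: averaging the gradients of equal-sized micro-batches gives the same gradient as one big batch, so the big batch never has to fit in memory at once. The data and sizes are made up.

```python
import numpy as np


def mse_gradient(w, X, y):
    """Gradient of mean squared error for a linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

full_batch = mse_gradient(w, X, y)

# Process the same 8 rows as 4 micro-batches of 2, accumulating the gradients.
n_micro = 4
accumulated = np.zeros_like(w)
for Xb, yb in zip(np.split(X, n_micro), np.split(y, n_micro)):
    accumulated += mse_gradient(w, Xb, yb) / n_micro   # scale so the sum is an average

print(np.allclose(full_batch, accumulated))  # True
```

In a framework training loop this corresponds to calling backward on several small batches and stepping the optimizer only once every few batches, which gives a larger effective batch size on a small GPU at the cost of more time per step.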

carmine wharf
#

ok, did not know that

#

I will keep investigating, see if I can make up my mind

#

thanks for your help 😄

pure pond
#

hello people, im trying to get more practical experience with pytorch and I have a question about the conventional way to select a row of a tensor. I have a standard scaler which has been fit on a dataset with (x,y) rows,cols. I also have a dataset with a getitem. It gets a row from my torch tensor, but in order to scale it, I need to transform the selected "row" from shape (y) into (1,y). (If im doing something weird here let me know)

My question then is, whats the more standard way to do it?
x = X_train[i : i+1], or
x = X_train[i].reshape(1, -1)

tidal bough
#

the latter is IMO more readable.
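
For reference, a sketch showing that the options produce the same shape. It uses NumPy arrays, on the assumption that the row gets converted to NumPy for the scaler anyway; basic indexing and `reshape` behave the same way on torch tensors, where `X_train[i].unsqueeze(0)` is another equivalent.

```python
import numpy as np

X_train = np.arange(12, dtype=float).reshape(4, 3)   # 4 rows, 3 columns
i = 2

by_slice = X_train[i : i + 1]            # slicing keeps the row dimension
by_reshape = X_train[i].reshape(1, -1)   # indexing drops it, reshape restores it
by_newaxis = X_train[i][None, :]         # same idea via a new leading axis

print(by_slice.shape, by_reshape.shape, by_newaxis.shape)  # (1, 3) (1, 3) (1, 3)
```

All three hold the same values, so the choice is mostly about which one reads best in the `__getitem__`.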

winter nacelle
#

I need to know how they got these calculations.
Can somebody help me?