#data-science-and-ml


dry geyser
#

"BUT THEY BEAT KASPAROV, SON"

#

"there is no chess board"

#

on a serious note

#

write_json is a blessing to isolate test cases with polars

final kiln
#

I wouldn't use polars, spark is the way

dry geyser
#

sssshhh

final kiln
#

Polars may promise multi processing, but spark actually delivers

dry geyser
#

for my use case polars is absolutely great

#

i rewrote all my validation and coalescing into expr

#

and it didnt suck

final kiln
#

Yeah I remember some parts being nice, but when it came time to scale it didn't deliver

#

There were also issues with some of the interfaces

#

And the IO didn't work as promised

#

That one killed me >.<

dry geyser
#

i didnt see any great features in spark for data transformation in the context of what im doing/need to do

final kiln
#

It will use all your cores while maintaining ram below 4gb

#

Spark is not without fault tho, there's some serious issues with memory leaking

#

I just re create the session when I use it, but it's not ideal cuz of the random split

#

It's probly something I'm doing wrong tho, somehow

#

Clearly states java heap

dry geyser
#

java will never beat rust in anything

#

re memory and performance

final kiln
#

Execution is everything

#

You can benchmark C to be slower than python

#

If you don't got the skills for C

dry geyser
#

not that java is impossible to run anywhere constrained... you could argue that smartcard applets are java bytecode and indeed they are.... but thats besides the point.

#

writing shit C that performs worse than python sounds like a challenge

#

whereas writing shit py is arguably trivial

#

just like js

#

the entry bar is low

final kiln
#

All I'm saying is that the language is not everything if the person writing doesn't do a good job

dry geyser
#

there are intrinsic/inherent requirements for certain languages that make the mistakes also comparatively more complex. ex. in C memory safety.

final kiln
#

I don't even know if java allows interoperability with low level anyway

dry geyser
#

rust is opinionated to an unbearable level if you write shit code

#

it can...

final kiln
#

I don't think that will protect you enough

dry geyser
#

android is java and you have tons of JNI/native code

#

most android software protections are actually using native code to interop with java for obfuscation etc

final kiln
#

The type system can't predict if you're reading your CSV right

#

Or if you're making good use of your cores

dry geyser
#

sure, but thats a different issue

final kiln
#

It's not, that's what drove me away from polars

dry geyser
#

rust forces constraints for a reasonable baseline of safety

final kiln
#

Spark is written in Scala apparently

dry geyser
#

btw ontopic: how can I "pretty print" a df in polars?

#

the console output scales horizontally

buoyant vine
#

just printing the df should work

#

although it will cap the width

dry geyser
#

i run terminal like it's 1973

#

80x75

#

yeah

#

gimme deathscrolls plz

buoyant vine
#

I assume print(df) makes an ugly output with that size

dry geyser
#

@buoyant vine also where do i send you a case of beer/drinks/whatever?

buoyant vine
#

😅 I wouldn't worry about it

dry geyser
#

lol

buoyant vine
#

Unless you have a miracle cure for whooping cough lol

dry geyser
#

i believe in the honorable practice of displaying gratefulness to those who help

#

i mean, i wouldnt rob a hospital of cough suppressants, but if that pays the moral debt....

final kiln
#

There's a context manager that lets you alter the width of the rows

#

Scala seems appropriate for this stuff

#

Rust may be the wrong choice as it's more of a systems language

buoyant vine
dry geyser
#

whats the keyword?

#

re ctx manager

final kiln
#

Ask gpt

dry geyser
#

lol

final kiln
#

But it was something Config

buoyant vine
#

well that is an unhelpful answer lol

final kiln
#

What am I gonna do, lie ?

dry geyser
#

humiliate yourself saying 'i dunno'

final kiln
#

That's exactly what I said tho

#

While pointing you to where I got it in the first place

dry geyser
#

gpt just threw a pandas answer to me

#

derp

#

ever polite.

#

lets hope those fearsome hostile AIs never become a thing. i bet she will remember.

dry geyser
#

gpt-4 is pretty awful for polars

#

out of date and also gets quite some things awfully wrong

final kiln
#

That's why you cross check with the docs

buoyant vine
#

Or alternatively you just go straight to the docs 😅 Makes stuff much quicker and simpler

final kiln
#

This model has like 100M params, gonna take forever to train

buoyant vine
#

shouldn't take too long

final kiln
dry geyser
#

@buoyant vine im missing more code samples

final kiln
#

one is 500dim, the other is 1000dim (they are transformers )

dry geyser
#

@buoyant vine quick q: im using a public csv for testing, and have been teasing the idea of doing the coalescing and grouping entirely via expr:

mapping = {
    "CUSTOMER_FORENAME": "first_name",
    "CUSTOMER_SURNAME": "last_name",
    "CUSTOMER_GENDER": "gender"
}

structured_column = pl.struct({
    new_key: pl.col(old_key) for new_key, old_key in mapping.items()
}).alias("person")

How can I retrieve this to create a dict properly containing the keys-values? ex. person : { first_name: ...., }

#

df.select(structured_column)?

buoyant vine
#

hmm

#

I think that should work

#

Or at least I can't see anything wrong with the idea

#

or it might be a with_columns otherwise

dry geyser
#

pprint.pprint(df_with_struct.rows(named=True)[0]) still shows the column names though

#

{'person': {'CUSTOMER_FORENAME': 'JOHN',
'CUSTOMER_GENDER': 'male',
'CUSTOMER_SURNAME': 'DOE'}}

buoyant vine
#

Isn't that what you have defined as your new_key?

dry geyser
#

if i swap them they obviously dont exist: polars.exceptions.ColumnNotFoundError: first_name

fossil forum
#

Hello everyone, I am an AI/ML engineer working in the US. We have recently started a discord channel aimed at

  • Sharing AI ideas
  • Finding project mates for AI projects
  • Study groups to learn AI
  • Resource sharing
  • Networking

This is in its early stages, but if you are interested in leading conversations and building a beautiful AI community, join this channel.
Reach out to me if you are interested

final kiln
#

this one is 50, 200, 500, 1000 if im not mistaken

buoyant vine
dry geyser
#

i could but that seems against my ocd tendency towards not touching the original columns

buoyant vine
#

could do a .rename(mapping) before so the columns are correct before they go into the struct

dry geyser
#

lemme try the rename

#

@buoyant vine where do i place the rename_fields? can i concat/chain it directly to the pl.struct?

final kiln
#

I know for sure its gonna do the thing, but it sure is taking a while

buoyant vine
#

I think you can just do .alias("person").rename_fields(["field_1", "field_2"])

#

it expects the new fields to be in the order you defined the struct in

dry geyser
#

.struct.rename... seems to work

#

yep

#

structured_column = pl.struct({
    new_key: pl.col(original_column) for new_key, original_column in mapping.items()
}).struct.rename_fields(list(mapping.values())).alias("person")

final kiln
#

you know you can also do stuff using sql, probly easy for those kinds of operations

#

easier *

dry geyser
#

@final kiln i havent got started with the sql side yet

#

does it go thru the same engine?

#

exprs seem blazing fast

buoyant vine
#

it uses the datafusion engine yeah

#

but it goes through a few more hoops

final kiln
#

i mean if all you're doing is renaming stuff

#

i think its about to do the thing, or am i losing my mind already

dry geyser
#

@buoyant vine now im rewriting the dynamic expression stuff. the basic things like "if this bool column is set to True, then the field value is foo" does not seem too complicated

#

i wrote my own sandboxed asteval-like expression engine, but it was horribly slow

buoyant vine
#

👍

final kiln
dry geyser
#

you should

final kiln
#

Ah I see the issue tho, it's just super slow

#

It's still on the third slice after an hour

dry geyser
#

it was hailing here so ill be cranking out shitcode

#

is there a way to limit/condition an expression to the presence of a non null value in a specific column?

buoyant vine
#

can combine the expression with a col.is_not_null() expr

#

i.e. (pl.col(col_name).is_not_null() & other_expr)

dry geyser
#

ex. if column X is not null and set to boolean true, set a new column FOO to value XYZ

buoyant vine
#

hmm

#

should be able to

dry geyser
#

on it

#

@buoyant vine is it possible to add a new field to a struct without recomposing it?

buoyant vine
#

I dont think so

dry geyser
#

i suppose then the way to do it is to create an intermediate column

#

and add it

#

can map_* be used to do something like what i asked earlier re conditional field values?

buoyant vine
#

it basically is a pandas.apply method

#

i.e. it gives you the column value, and expects a value returned

#

what you do inbetween those points it doesn't really care about

#

the problem is it limits your performance significantly

dry geyser
#

expr?

buoyant vine
#

dont you want pl.concat_list(list(fields)).unique().drop_nulls().alias(name) rather than doing it after the explode?

dry geyser
#

lemme check

#

polars.exceptions.InvalidOperationError: unique operation not supported for dtype list[str]

buoyant vine
#

what if you do .arr.unique?

final kiln
#

I changed the LR schedule, increased the warmup period.

I really need to look up the rationale behind the 2017 LR scheduler

#

It had the opposite effect on the scheduler, but it looks like it improved the situation

Which would be awesome, except that it totally means idk what I'm doing >.>
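If the "2017 LR scheduler" is the warmup schedule from "Attention Is All You Need" (an assumption, the log does not say), then a longer warmup also lowers the peak learning rate, which may explain the unexpected effect. A small sketch of that formula; `d_model = 512` and the warmup values are illustrative, not the settings used here:

```python
def noam_lr(step: int, d_model: int, warmup_steps: int) -> float:
    """Learning rate of the 'Attention Is All You Need' schedule at a given step."""
    step = max(step, 1)  # the formula is undefined at step 0
    # Linear ramp during warmup, inverse square root decay afterwards.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


d_model = 512  # illustrative model width
for warmup in (4000, 8000):
    # The peak is reached exactly at the end of warmup.
    peak = noam_lr(warmup, d_model, warmup)
    print(f"warmup={warmup}: peak lr={peak:.2e}")
```

The peak equals `(d_model * warmup_steps) ** -0.5`, so doubling the warmup divides the maximum LR by about 1.41.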

#

Ok so.

With max LR of 500e-6 it stayed up there for a long time but I could notice a slight slope downwards + the mini batch loss was becoming more stable.

A smaller max LR (which implies smaller LR throughout), has not changed it very much except that the slope downwards has increased, but still nothing major

late ruin
#

hi, need a quick help, i've got an xlsx file that contains these numeric columns, they are float64, now as you can see, they are not very pretty id say, i mean the way they are represented are too long, i tried to change using with as type by doing

df['FF','Rs','Rsh','VOC(mV)','jsc(µA/cm2)']= df['FF','Rs','Rsh','VOC(mV)','jsc(µA/cm2)'].astype(double)

which haven't worked cause of the name of each column, would love to hear some tips and tricks

final kiln
#

My intention was to increase max LR to speed up the process. But since the opposite occurred, it means that the model is overshooting the minima.

left tartan
late ruin
#

id like to change the way each column has its information displayed

left tartan
#

There's more advanced things you can do to render it differently, but that might be enough
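The suggestion this reply refers to is not in the log, so the following is only a sketch of two common pandas options for shortening how float64 columns look. The frame is invented and uses two of the column names from the question; note that selecting several columns takes a list, `df[[...]]`:

```python
import pandas as pd

# Invented data using two of the column names from the question.
df = pd.DataFrame(
    {
        "FF": [0.7312345678, 0.6898765432],
        "Rs": [12.3456789012, 9.8765432109],
        "label": ["a", "b"],
    }
)
cols = ["FF", "Rs"]

# Option 1: change only how floats are printed; stored values are untouched.
with pd.option_context("display.float_format", "{:.3f}".format):
    shown = df.to_string()
    print(shown)

# Option 2: actually round the stored values in the selected columns.
rounded = df.copy()
rounded[cols] = rounded[cols].round(3)
print(rounded)
```

Option 1 is usually the safer choice when the full precision is still needed for later calculations.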

dry geyser
#

@buoyant vine polars.exceptions.ComputeError: expected array dtype

late ruin
final kiln
dry geyser
#

@buoyant vine the fields array is Expr(col(thecolumn))

#

phone1, phone2, etc

buoyant vine
#

what is your code rn?

languid moss
#

Hello, I need to develop an AI to play the 2048 Game, I have a large experience with Python, I know i have to use Tensorflow but I just need a roadmap for my learning, thanks in advance !

late ruin
final kiln
dry geyser
#

taking break, stuck on this one

final kiln
#

It's also possible that it's just not a good idea to do d=1000 and N=12, since that's equivalent to what they used in 2017. And they used a lot more GPU than what I'm using.

#

I'm gonna let this one roll and start doing from 800

long canopy
#

anything out there to scrape or query github?

tidal bough
#

and googling pypi github gets me at least one popular library implementing it, even.

long canopy
#

hm right, i hope the rate limit doesn't affect me

#

maybe there's a github dataset of repositories somewhere?

tidal bough
#

a dataset of what info from the repos?

final kiln
#

one is batch size of 32. the other 128

#

32 has higher learning rate, 128 lower

#

one is betting that the gradient calculation is accurate enough, so I just need to double down on them (double the updates and larger steps), the other is betting that the model was overshooting the minima so it needs to calculate more accurate gradients and take smaller steps

long canopy
#

thanks a lot!

final kiln
# final kiln

New approach, I'm starting from 500, which I already saw that converges, then increase it til 1000 in chunks of 100

late ruin
#

hey im trying to normalize my data, in the next way (pic) , but the 'B.C' column is string and id like to keep it without applying anything to it, any suggestions?

tidal bough
#

well, simplest way would be normalized_df["B.C."] = df["B.C."]. :p

#

the nice way would be to exclude that column from df before doing this stuff
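A sketch of the "exclude it first" approach. The picture with the original normalization is not in the log, so z-score scaling is assumed here, and the data is invented:

```python
import pandas as pd

# Invented data: one string column and two numeric ones.
df = pd.DataFrame(
    {
        "B.C.": ["a", "b", "c"],
        "FF": [0.70, 0.72, 0.74],
        "Rs": [10.0, 12.0, 14.0],
    }
)

# Keep only the numeric columns, so "B.C." never enters the arithmetic.
numeric = df.select_dtypes(include="number")

# Assumed normalization: subtract the mean, divide by the standard deviation.
normalized_df = (numeric - numeric.mean()) / numeric.std()

# Put the untouched string column back in front.
normalized_df.insert(0, "B.C.", df["B.C."])
print(normalized_df)
```

Any other normalization, such as min-max scaling, can be swapped in on the `numeric` frame the same way.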

late ruin
#

im omega pepega lol ty

final kiln
long canopy
#

will two prompts with the same tokens in different positions have different vector embeddings?

final kiln
long canopy
#

ah right that makes sense

teal lance
#

Can anybody help with this error ☝🏽

neon crystal
#

Hey guys I am trying to run GMMHMM model for regime detection on a time series. But I am not sure about the amount of clusters distribution I need. I remember from normal ML for K means I can use metrics like WSS and other methods based on the centroids. But now I am working with Gaussian distributions here what metrics can I use? I am thinking maybe KL or JS divergence but are these really a good metrics to use?

signal holly
#

can anyone help mentor me for my ml journey
I keep giving up easily relying only on myself

limber mesa
teal lance
limber mesa
#

Jep that sounds about right

dense pawn
# signal holly can anyone help mentor me for my ml journey I keep giving up easily relying only...

I completely understand the struggle with giving up easily. While I'm not at the point of being a mentor someone, I'm also on this ML journey and I've found this roadmap helpful: [https://i.am.ai/roadmap/#note] It includes specific steps and resources that helped me stay motivated when I felt stuck. Perhaps some of them could be useful for you too!

limber mesa
final kiln
#

I've checked the commit hashes, compared them using github, there's no code differences

#

800 and 900 are missing because celery failed, I need to implement a circuit breaker and a timeout thing

#

This is turning out to be random. I need to step back and reflect on why this is random and how to make it, not be random. Otherwise I won't get anywhere with this.

#

I'm reshuffling the batches to prevent the network from capturing any patterns coming from the order in which it sees the sequences

#

My hypothesis is that the small batch size is at fault.

As it stands, it might be rolling the dice until it finds a sequence of batches that happens to accurately represent the gradient landscape. When it finds enough of them in a row, it finds the direction towards the local minimum; from there I suppose the slope is large and all directions funnel towards the minimum, so the loss starts decreasing sharply. Once that is no longer the case, the situation regresses to where it was but at a lower level, which would explain why they all converge to more or less the same value.

#

Guesswork is no good here. Since this setup is highly efficient memory wise, I can fit the transformer from 2017 and possibly their batch configuration. There's not gonna be a one to one correspondence but I can calculate the information content on each batch from 2017 and try to match it in my batches. The model hyper parameters are more or less the same too, especially if the MetaFormer stuff translates to NLP.

hushed scarab
#

Hello everyone; let's say you want to train a dataset, where can I find the data instead of creating it from scratch?

versed pilot
versed pilot
#

It's focused on interferometry, you didn't say what exactly you are doing with SAR

tired otter
#

Hi. Question about variational autoencoders. Is the main idea that during encoding, latent space is sampled from multidimensional distribution which is shaped by parameters obtained from input data?

mental radish
#

Hey! I tried to fit the flux of cosmic rays versus energy for AllParticles&H. I used the CRDB package to extract the data. Then, i used the power law to fit. However, the shape of x is (1236,) and the error in y is (1236,2). how do i fix the shape issue?

Here is my code:

x = t_combined.e
y = t_combined.value
err = t_combined.err_sta
lsq = LeastSquares(x, y, err, power_law)
m = Minuit(lsq, a=1, gamma=-2.0)
plt.errorbar(x, y, err, fmt="o", label="data")
plt.plot(x, (x, *m.values), label="fit") # what does this line do?
ax.scatter(x, y, label="Combined original data", marker="x")
a_fit = minuit.values.a
gamma_fit = minuit.values.gamma
x_fit = np.logspace(np.log10(t_combined['e'].min()), np.log10(t_combined['e'].max()), 100)
y_fit = power_law(x_fit, a_fit, gamma_fit)
ax.plot(x_fit, y_fit, label="Fitted power law", linestyle='--', color='red')
plt.xlabel(r"𝐸𝑘 [GeV]")
plt.ylabel(r"𝐸𝑘 d𝐽/d𝐸𝑘 [1/(m2 s sr)]")
plt.title('Power Law Fit')
plt.legend()
plt.xscale('log')
plt.yscale('log')
plt.show()
print("Fitted parameters (a, gamma):", m)
plt.show()
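On the shape question: a (1236, 2) error array usually means one lower and one upper error per point, whereas a least-squares cost takes a single error per point. A NumPy-only sketch of one way to reconcile the two, assuming the columns of `err_sta` are (lower, upper); the small arrays are stand-ins for `t_combined.e`, `t_combined.value` and `t_combined.err_sta`:

```python
import numpy as np

# Stand-ins for the real data: N points, and an (N, 2) error array.
x = np.array([1.0, 10.0, 100.0])
y = np.array([5.0, 0.05, 0.0005])
err = np.array([[0.4, 0.6], [0.004, 0.006], [0.00004, 0.00006]])

# One symmetric error per point for the fit: the mean of lower and upper.
err_sym = err.mean(axis=1)  # shape (N,), same as x and y

# For plotting, asymmetric errors are expected as shape (2, N).
err_for_plot = err.T

print(err_sym.shape, err_for_plot.shape)
```

With these, the calls would become `LeastSquares(x, y, err_sym, power_law)` and `plt.errorbar(x, y, yerr=err_for_plot, fmt="o")`. Averaging is only one choice; taking the larger of the two errors is a more conservative alternative.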

dry geyser
#

dang

#

still stuck with the dedup + struct expt

#

expr

merry oak
final kiln
#

behold, mah pipline

#

it now uses pull requests to centralize note taking

#

so I open a PR, it automatically detects if it's an experiment, creates it in mlflow, when I merge it starts running it

sharp zenith
#

the AI bubble blew up ? looks like there's nothing new in huggingface.co for a long time

agile owl
#

maybe there's just too many piglets for the teats

#

so is polars just generally faster than pandas

#

or is it situational

final kiln
#

may I try to convert you to the religion of Spark

zinc moat
#

What's the best low pc computing cost object detector? I want create my own security cam (only a detector of people)

final kiln
zinc moat
#

ty

#

I'll see

final kiln
#

Now I can use the PRs as logbooks for each experiment. And they can refer each other and all that stuff so everything is gonna be neatly organized

sterile talon
dry geyser
#

any polars guy around?

agile owl
final kiln
agile owl
#

Polars is written from the ground up with performance in mind. Its multi-threaded query engine is written in Rust and designed for effective parallelism. I

#

pandas is also that way afaik

final kiln
#

I mean ig the lib wasn't even able to read my data so idk

#

All I know is spark took my kaggle and got CPU up to 300% with no effort from my part

#

While polars wasn't able to read a basic CSV in lazy mode thing

iron basalt
agile owl
#

under what conditions might pandas be faster

#

or is it universally the case

final kiln
#

I will risk and say always cuz I've never seen pandas be fast in my life

agile owl
#

pandas is faster than pyspark lol

#

if it's a small dataset

#

the cost of spinning up the workers is not worth it oftentimes

final kiln
#

Ig if the dataset is small I'll be using python constructs

agile owl
#

that is not performant at all

#

you need to use numpy or pyarrow

final kiln
#

It's way more performant than pandas

iron basalt
# agile owl under what conditions might pandas be faster

Maybe in some specific single-threaded cases the Numpy operations used are faster in their C/Fortran implementations than those written in Rust (unlikely, even a simple loop with optimizations (auto-vectorization) enabled in LLVM will be fast).

final kiln
#

If you're using numpy

agile owl
#

that's just factually incorrect

#

that's not a python construct though

#

that's numpy

final kiln
#

True

#

Still

#

Small dataset, a dict + numpy does the trick for me

agile owl
#

by small dataset I mean smaller than RAM

#

or your RAM budget

#

I still wouldn't use a dict for nested indexes or anything like that

final kiln
#

Why do you find the spark init so expensive tho, the memory management alone makes it worth it

agile owl
#

because you don't need memory management if the data is smaller than your RAM budget?

final kiln
#

It's nice if you can cap it a 4gb at will

final kiln
agile owl
#

for what purpose

final kiln
#

For the purpose of having memory for your other stuff

boreal gale
#

my 2 cents is that polars is generally faster than pandas (pandas<2 for sure, less so for pandas>=2)
but i don't think time series support in polars is really fully there yet (if you care about those stuff, iirc you deal with finance so i thought this would be relevant)

final kiln
#

Memory leaking and etcs too ofc

#

If you want performance

#

You think memory management

agile owl
#

that is not what I think at all

final kiln
#

Then you're thinking wrong

agile owl
#

when I think performance I think using all the memory I have

#

because that's the axiom of computer science

final kiln
#

Idk none of that, all I know is that performance is about where you put your memory and how you lay it out

#

And pandas and polars are not the tool for that

agile owl
#

the tradeoff between using memory and not using memory is that you use the memory to get things done faster

#

im not worried about constraining memory usage

final kiln
#

I think you're arguing against a straw

agile owl
#

you said i need memory management but in fact that stuff has an overhead

#

and i was asking for performance

final kiln
#

Memory constraint is not the only thing in memory management

final kiln
agile owl
#

not willing to go that far

final kiln
#

As the code will be specifically made for your use case

#

Then numpy and polars are the next best thing afaik

#

Ah and spark ofc

agile owl
final kiln
#

Spark being better since it makes better use of resources

agile owl
#

I'm not "against" spark but adding dependencies especially ones that need separate runtimes has a cost

final kiln
#

Execution is everything

#

If it's a well done thing, you don't care

#

That's my take at least, I was very impressed with it

iron basalt
#

Why is there any issue here? Just download and try all of them on your data. Measure it.

final kiln
agile owl
#

I would have to rewrite a lot of code and I'm not sure if it's the best use of my time rn vs other things I need to write for this project

final kiln
#

Otherwise we just arguing about the size of the angels wings

agile owl
#

so I was trying to figure out as much as I could from ppl who have used both

iron basalt
agile owl
#

that's what I'm trying to figure out.. lol

#

if it's like 5% faster

#

then there's no point right now

warm copper
#

hey guys

iron basalt
warm copper
#

I have a question

slim wigeon
#

can someone help me to debug my implementation of DNN and backprop from scratch? i cannot provide provide any more information about the problem im having in the code because idk where the problem is
my goal is to write DNN using only numpy, but after i finished the implementation, my implementation of DNN just doesnt learn
i have been debugging for a few days and i couldnt find the problem
please dm me and ill send you the code

agile owl
#

it's bound by different resources at different stages. the ultimate bound is the CPU-GPU mem interface

#

at some stages it's bound by my shitty python code

#

at others by pandas implementations

#

but CPU bound in general

iron basalt
warm copper
#
# here we initialize a random data matrix X and random numerical labels y
import numpy as np

X = np.random.randn(10,3)
y = np.random.randn(10,1)

# we also initialize a hypothetical hyperplane defined by w and b
w = np.random.randn(1,3)
b = -1

# (i) find the numerical labels predicted by the model (w,b) for the points in X
#     your code should be a single numpy line
#     hint: we wrote this equation for a single point x in class
#           try to generalize it by expressing everything in terms of matrices

# your code goes here

y_predicted = np.dot(X, w.T) + b

print(y_predicted)

# (ii) find the updated weights after one application of gradient descent with lr = 0.1
#      your code should be a single numpy line

y_ = np.random.randn(10,1)

w_updated = w - 0.1 * np.dot((y_predicted - y_).T, X)

print(w_updated)
final kiln
warm copper
#

for the last part of the code

#

should I use y_ or y

boreal gale
# agile owl whats the problem with time series in polars?

i think my old gripe was just there is no groupby rolling and/or the interface was fairly clunky for my specific usecase - this is probably fixed, i can't recall my exact issue

my latest gripe is that ewm_mean in polars doesn't take a times like in pandas

warm copper
#

y_ was given in the last part of question

#

kinda felt weird to me

agile owl
#

so it's a reinforcement learning environment

iron basalt
agile owl
#

the environment itself is computationally complex and run on the CPU

warm copper
#

I used y_ nevertheless

agile owl
#

the GPU is doing the actual network

warm copper
#

it must be there for a reason right?

#

@final kiln

agile owl
#

and it's bound by the CPU-GPU memory interface

agile owl
#

but I'm also reading/writing with DB

warm copper
#

I dunno?

#

it was prewritten in the code

final kiln
#

I think you gotta use y right ? Since that's your data

warm copper
#

whats y_ doing there then?

#

🥲

final kiln
#

Compare y_predict with y and apply grad desc

final kiln
iron basalt
agile owl
#

i'm doing numpy operations on it but I use pandas to read from sql

#

also to index it

final kiln
#

My love for numpy is undying

iron basalt
#

Ok, what kind of operations are you doing on the CPU on that data?

agile owl
#

I could have the data in an in-memory database instead of postgres

iron basalt
agile owl
#

yeah

#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

agile owl
#

don't mind the asserts I'm still cleaning it up

#

I just wanted to quiet the typechecker without comments all over the place

wooden sail
warm copper
#

here @wooden sail

wooden sail
#

here what? what am i looking at

warm copper
#

the question

final kiln
#

Ig they want to apply gradient descent with y_

iron basalt
agile owl
#

why

iron basalt
#

Memory allocations.

final kiln
#

Memory management

agile owl
#

so it's copying everything?

wooden sail
iron basalt
warm copper
#

yeah

agile owl
#

isn't that what Spark does too?

final kiln
#

Can't you ask the professor

iron basalt
#

If you allocate in a loop on the heap, all performance goes out the window.

warm copper
#

for y_predicted I used this line of code

#
y_predicted = np.dot(X, w.T) + b
#

@wooden sail

final kiln
#

Like it's becoming a matter of interpretation

agile owl
#

so are you saying I should make a big numpy array first

#

and then assign the individual elements

iron basalt
# agile owl isn't that what Spark does too?

Not sure what it does, but normally you would either just avoid allocation entirely (probably not needed), or if you really need to, a fast memory arena (arenas are often used by big fast data projects (probably Spark does)).

#

A memory arena basically just being a pre-allocated chunk that is O(1) to allocate on.

warm copper
#
```python
y_ = np.random.randn(10,1)

w_updated = w - 0.1 * np.dot((y_predicted - y_).T, X)

print(w_updated)
```
for the second part of the question
agile owl
#

so should I just make a big numpy array with the shape of the output

#

and then assign the elements

#

rather than concatenating
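A sketch of that idea: allocate the output once and fill it in place, instead of growing it with `np.concatenate` inside the loop. `compute_row` is a hypothetical stand-in for whatever produces each piece, and the sizes are arbitrary:

```python
import numpy as np

n_steps, width = 1000, 8
rng = np.random.default_rng(0)


def compute_row(i: int) -> np.ndarray:
    """Hypothetical per-step computation returning one row of the output."""
    return rng.standard_normal(width) + i


# Growing an array with np.concatenate in a loop allocates a new array and
# copies everything accumulated so far on every iteration.
# Preallocating does a single allocation and only writes each row once.
out = np.empty((n_steps, width))
for i in range(n_steps):
    out[i] = compute_row(i)

print(out.shape)
```

This only works when the final shape is known up front; if it is not, collecting rows in a Python list and calling `np.stack` once at the end is a reasonable fallback.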

warm copper
#

what do you think? @wooden sail sounds good to me but Im not sure why we dont use y at all

wooden sail
warm copper
#

lol

iron basalt
#

Any way you can avoid memory allocation. Without that, you have no chance at fast speeds in a tight loop.

final kiln
#

That's why asking the person who wrote it is the best option

warm copper
#

hmmm

#

I asked openai just now

#

it did what I did hahahaha

final kiln
#

Be careful with those things when using for learning.

iron basalt
wooden sail
#

i can't comment on the gradient since the cost function isn't written there

warm copper
#

yup

#

weird function

iron basalt
#

Consider something like np.sum(a + b) in Numpy. Numpy has to run its elementwise addition, and then sum on that. That is looping over all the elements twice. But something that chains operations together like polars can just do that in a single loop.
```c
int sum = 0;
for (int i = 0; i < N; ++i) {
    sum += a[i] + b[i];
}
```

#

So even though it may be nicely vectorized and whatever in numpy, it's still just doing more work.
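To make the extra work visible from the NumPy side, here is a small sketch comparing the plain expression with two variants that avoid a fresh temporary array. The array size is arbitrary, and no timing claims are made:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1_000_000)
b = rng.standard_normal(1_000_000)

# Plain version: one pass to build a temporary a + b, a second pass to sum it.
total_naive = np.sum(a + b)

# Still two passes, but the temporary is a buffer that can be reused
# across iterations instead of being allocated every time.
scratch = np.empty_like(a)
np.add(a, b, out=scratch)
total_buffered = scratch.sum()

# No temporary at all, using sum(a + b) == sum(a) + sum(b).
total_split = a.sum() + b.sum()

print(total_naive, total_buffered, total_split)
```

The last variant only works because addition distributes over the sum; for general chained expressions a fused engine or a compiled loop is needed to get the single pass.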

slim wigeon
#

is there a place where i can upload python notebook to share it?

wooden sail
final kiln
#

I think it's least squares, I recall him showing a slides from a lecture

sterile talon
iron basalt
final kiln
#

Tensorflow ftw

iron basalt
#

(This is especially important for reducing allocations)

warm copper
#

1.  y_predicted

If you meant y_predicted, it represents the predicted output of the model for the given data matrix X. After calculating y_predicted = np.dot(X, w.T) + b, this variable holds the model's predicted values for each corresponding sample in X.

2. y_

If you actually did mean y_, it seems to be  an unused and undefined variable  in the provided context. Perhaps it was meant to serve a purpose but got left in the code without assignment.

Explanation

Here's a breakdown of what typically happens in a simple linear regression model, as seems to be intended in your code:

y: This variable holds the true labels or target values for each data point in your dataset.
y_predicted: This would store the model's predicted labels after being calculated using the weights w, bias b, and the data X.
Model Update: y_predicted and y are then used to compute the errors in the model's predictions and to update the weights w through gradient descent.
Let me know if you have a specific portion of the code where you're seeing y_ and I can provide a more tailored explanation!






# here we initialize a random data matrix X and random numerical labels y
import numpy as np

X = np.random.randn(10,3)
y = np.random.randn(10,1)

# we also initialize a hypothetical hyperplane defined by w and b
w = np.random.randn(1,3)
b = -1

# (i) find the numerical labels predicted by the model (w,b) for the points in X
#  your code should be a single numpy line
#  hint: we wrote this equation for a single point x in class
#     try to generalize it by expressing everything in terms of matrices

# your code goes here

y_predicted = np.dot(X, w.T) + b

print(y_predicted)

# (ii) find the updated weights after one application of gradient descent with lr = 0.1
#   your code should be a single numpy line

y_ = np.random.randn(10,1)

w_updated = w - 0.1 * np.dot((y_predicted - y).T, X)

print(w_updated)

what does y_ do here




In this code, y_ serves as a source of randomness or noise, and it does not directly contribute to the linear regression process. Here's why:

 Gradient Descent Calculation:

 Let's focus on the line where you update the weights using gradient descent:

Python
w_updated = w - 0.1 * np.dot((y_predicted - y).T, X)

 Notice that  in this expression, you are using the following:

y_predicted: The values predicted by the current model.
y: The true labels for your data.
X: The data matrix itself.
 The calculation doesn't involve y_. The goal of gradient descent is to update the weights w to improve the model's accuracy. This improvement is based on the difference between the true labels (y) and the model's predictions (y_predicted).

 Possible Scenarios for y_:

 Typo:   It's very possible that y_ is the result of a typo, and the intention was to reuse y (the true labels) instead.

 Experimentation: The code might be  intended for experimentation. Perhaps the developer had the idea of introducing random noise during the gradient descent step, although this is not a standard practice in a typical linear regression context.

 Key Point:  When performing linear regression with gradient descent, the key elements for weight updates are the true labels (y), the predicted labels (y_predicted), and the data matrix (X).

 Let me know if you'd like me to analyze a different section of code or explore  more advanced variations on  gradient descent!
#

I asked Gemini

#

Im gonna use y instead of y_

wooden sail
#

you need to ask your lecturer because the way it's written, y is not used anywhere

final kiln
#

R u in the US, idk if Gemini ultra is already the Gemini latest in the EU

wooden sail
#

what cost function are you using?

warm copper
#

non

#

none

wooden sail
#

yes you are

warm copper
#

he just wants one line of code there

wooden sail
#

you are using one, otherwise there is no gradient to speak of

final kiln
#

It's the square function from one of your lectures

wooden sail
#

what is the gradient being computed of?

#

that's the cost function

warm copper
#

lr 0.1?

final kiln
#

0.1*dy/dw, 0.1 is the dw

#

In the numerator, not the other one

#

calculus of variations anyone

wooden sail
#

.latex the way you have written it, in column vector form, would be
\[
\bm{y} = \bm{Xw} + \bm{b}
\]
for which the gradient, assuming a least squares cost of the form
\[
\Vert \bm{y} - \bm{Xw} - \bm{b} \Vert_2^2
\]
is
\[
g(\bm{w}) = 2(\bm{X}^T\bm{Xw} - \bm{X}^T(\bm{y} - \bm{b}))
\]

strange elbowBOT
wooden sail
#

which you'd then scale by 0.1

#

gradient means you took the derivative of something. what did you take the derivative of?

warm copper
#

this is an exercise

wooden sail
#

because what you got does not match a least squares function. what was it instead, then?

warm copper
#

it doesnt have any previous stuff

#

in the code

wooden sail
#

then it doesn't make sense

#

i can't say "find the derivative" and not tell you what to take the derivative of

#

it says there you wrote the equation in class, so the answer is in your notes from class

tidal bough
wooden sail
agile owl
#
Invalid index type "tuple[slice, ndarray[Any, dtype[bool_]]]" for "_LocIndexerFrame"; expected type "slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | Callable[[DataFrame], slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | list[<nothing>]] | list[<nothing>] | tuple[slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | list[<nothing>] | tuple[Index[Any] | Series[bool] | ndarray[Any, dtype[bool_]] | list[bool] | str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | list[Any] | slice | tuple[str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex, ...], ...] | Callable[..., Any], list[<nothing>] | slice | Series[bool] | Callable[..., Any]]"Mypyindex

thanks mypy!

wooden sail
#

this is pretty contrived

warm copper
#

the question is called Linear Regression with numpy 1liners

wooden sail
#

standard linear regression is based on least squares

#

go to your notes and find the cost function that was used

final kiln
tidal bough
wooden sail
tidal bough
#

yup, y_ shouldn't exist

wooden sail
#

which is fine, X has full rank with high probability

warm copper
#

I think its probably a typo

#

Im using y

#

y_predicted - y

#

y_ doesnt make any sense there

wooden sail
tidal bough
#

and i like it this way

final kiln
wooden sail
#

i'd use y_ because it's what they put in the code block, but you have to ask them

#

and go verify in your notes whether it's least squares, cuz your gradient looks wrong

agile owl
#

I'm starting to think that typecheckers just can't handle get item

#

at least not with pandas

iron basalt
warm copper
#

@wooden sail this is from the notes

wooden sail
#

ok, so it's least squares and you're absorbing the scaling factors into the 0.1

warm copper
#

yup

#

what I dont understand is

#

this is not a linear regression

#

I dont see any fitting or nothing

#

feels stupid

wooden sail
#

yes there is

warm copper
#

where is the fit function?

final kiln
#

L

warm copper
#

oh the learning rate?

final kiln
#

no, L is the error being minimized I think

#

in your notes

wooden sail
#

it failed

final kiln
#

r u writing latex manually

wooden sail
#

wdym manually

warm copper
#

so should I use y_ or y?

wooden sail
final kiln
#

without at least real time feedback of the result

#

there's a nice web app thing for it

warm copper
#

what? lol

wooden sail
#

.latex "regression" is another word for "fitting" or "finding parameters". you're doing gradient descent on the function
\[
L = \frac{1}{n} \Vert \bm{y} - (\bm{Xw} + \bm{b}) \Vert_2^2
\]

strange elbowBOT
wooden sail
#

there we go

tidal bough
#

edd, crunching his bones: back in my day we had to write latex on a piece of paper, and get it right the first time

wooden sail
#

this function L is what you're minimizing. and you're doing so by tuning w via gradient descent
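
A minimal numpy sketch of that idea, assuming the loss L above and the row-vector `w` of shape (1, 3) from the exercise. The names `loss`, `grad_w`, `n` and the seeded `rng` are made up here for illustration. The factor 2/n is the scaling that the exercise one-liner folds into the 0.1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))   # data matrix, one point per row
y = rng.standard_normal((10, 1))   # target labels
w = rng.standard_normal((1, 3))    # row-vector weights, as in the exercise
b = -1.0
n = X.shape[0]

def loss(w):
    # mean squared error L = (1/n) * ||y - (X w^T + b)||^2
    residual = y - (X @ w.T + b)
    return float((residual ** 2).sum() / n)

def grad_w(w):
    # analytic gradient of L with respect to w, same shape as w: (1, 3)
    y_predicted = X @ w.T + b
    return (2.0 / n) * (y_predicted - y).T @ X

# one gradient descent step with learning rate 0.1
w_updated = w - 0.1 * grad_w(w)

# finite-difference check of the analytic gradient
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(w.shape[1]):
    step = np.zeros_like(w)
    step[0, j] = eps
    numeric[0, j] = (loss(w + step) - loss(w - step)) / (2 * eps)

print(np.allclose(numeric, grad_w(w), atol=1e-5))
print(loss(w_updated), loss(w))
```

Swapping `y` for `y_` changes nothing structurally: whichever label vector goes into `loss` is the one that has to go into `grad_w`.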

final kiln
wooden sail
#

no paper writer worth their salt uses that

warm copper
#

so what do you guys think I should do?

#

for my question

wooden sail
#

use y_, and more importantly, go review your course material

warm copper
#

thats all the course material

wooden sail
#

cuz it sounds like you aren't grasping the key ideas

final kiln
#

what's wrong with it, i'd say it's quite handy

warm copper
#

I dont see any mention of y_ in my notes

wooden sail
#

it's just slower. i'll just write a large chunk of raw tex and compile it later

wooden sail
#

i keep telling you, the code you were given is not consistent

#

you could use either y or y_ and it will work

#

this is just a parameter you evaluate into the loss function L

#

i assure you the person grading won't care either, but since they went through the trouble of making y_ in the new cell, just use that

warm copper
#

so this is my loss function

wooden sail
#

yes

#

that's the same as i wrote above

#

the sum of squared errors, which you then minimize to achieve the "least" value

#

hence "least squares"

warm copper
#

this is a and b

wooden sail
#

a is the same as w in your task

warm copper
#

yup

final kiln
#

what is the question

wooden sail
#

the real question is, why do they insist on using row vectors. yuck

warm copper
final kiln
#

I have been converted actually

#

row is better cuz pytorch

wooden sail
#

no, pytorch bad cuz row

warm copper
#

lol

#

I did a great job with the rest of the questions tho

final kiln
warm copper
#

😄

agile owl
#

I'd guess it's a lot more intuitive for most people to think of tables as a set of rows stitched together than to think of it as columns where each index represents a different individual

warm copper
#

okay then y_predicted - y_

final kiln
#

yeah once it gets to several dimensions row is easier to think about

#

like (x, y, z, d, v, c) shapes

wooden sail
final kiln
#

linalg is agnostic to this tho

wooden sail
#

it honestly makes no difference as long as you're consistent and keep in mind your fundamental vector spaces

#

but math books canonically use column vectors, so

final kiln
warm copper
#

If our given data set is linearly separable, does the same hold true for the transformed set? In the following cells you can plot a transformed version of the Iris dataset, so that you see how it behaves (for your choice of 𝑎, 𝑏, 𝑐.) But you should also try and justify your answer in a theoretical way: if there exists a 'good' perceptron for the original data set, what would be the weights for the perceptron that works on the transformed set? Are there any issues that might arise?

#

I answered this question using linear algebra

wooden sail
#

linalg would be the way

warm copper
wooden sail
#

rank-reducing transformations will give you a nontrivial kernel

wooden sail
tidal bough
warm copper
#

he wanted us to use chatGPT for this question as well

slim wigeon
#

guys, i need help with finding out why my implementation of dnn is not learning 😭 i have been trying to debug for 3 days and im slowly going insane

warm copper
#

But I assume linear transformations dont cause loss of linear separability? @wooden sail

final kiln
wooden sail
#

go ahead and try to separate your data if T is the zero matrix

warm copper
#

oh hsit yeah

wooden sail
#

the same will be true for any T that is not full rank

#

well, there's a discussion to be had about domains, kernels, and pre-images

#

this is exactly what i meant about your fundamental subspaces

#

you can play with the rank-nullity theorem or the fundamental theorem of linear algebra here

#

things get a little bit more tricky for nonzero b because it becomes an affine transformation, but the spirit of the discussion is the same
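
A small numpy sketch of both cases, assuming b = 0 and toy 2-D points invented for illustration (`X_pos`, `X_neg`, `T`, `T_full` are hypothetical names, not from the assignment). A rank-deficient T maps points from different classes onto each other, while an invertible T lets you carry the old weights over.

```python
import numpy as np

# two classes that are separable only along the second coordinate
X_pos = np.array([[0.0, 1.0], [1.0, 2.0]])
X_neg = np.array([[0.0, -1.0], [1.0, -2.0]])

# a rank-1 transformation that projects onto the first coordinate
T = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# row-vector convention: each point x maps to x @ T.T
Z_pos = X_pos @ T.T
Z_neg = X_neg @ T.T

print(np.linalg.matrix_rank(T))        # rank of the projection
print(np.array_equal(Z_pos, Z_neg))    # the two classes now coincide

# for an invertible T, a separating w for x gives w @ inv(T) for the transformed points
T_full = np.array([[2.0, 1.0],
                   [0.0, 1.0]])
w = np.array([0.0, 1.0])               # separates the original data (bias 0)
w_new = w @ np.linalg.inv(T_full)
scores_before = X_pos @ w
scores_after = (X_pos @ T_full.T) @ w_new
print(np.allclose(scores_before, scores_after))
```

The difference between the two classes lies in the kernel of `T`, which is why the projection cannot keep them apart.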

warm copper
#

😄

#

Arent you a math wiz

#

you wanna see how stupid my DSA assignment is?

#

@wooden sail

agile owl
#

linear algebra has always hurt my head tbh

wooden sail
#

i want to sleep 😔

agile owl
#

I prefer calculus

#

it's a lot more intuitive to me

warm copper
#

Option 1:  all arriving passengers are placed in a single queue, and service stations take passengers from the front of that queue.

Option 2:  each service station has its own queue, and arriving passengers are dispatched to a queue according to one of many policies:

2.A:  round robin (1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, ...).

2.B:  arriving passenger is placed in a shortest queue.

2.C:  arriving passenger is placed in a random queue.

 Inputs to the simulation:
The duration of the simulation measured in minutes (D:  make it arbitrarily long, do not worry about it being or not being realistic).
The average arrival rate measured in minutes (A:  arrivals are random, but on average there is one new passenger every A minutes), 
The average service rate measured in minutes (S:  service rates are random, but on average they need about S minutes of service).
For the sake of this study, make sure to crowd the system, by choosing S >> 5*A, without causing an overflow of your queues.  Also choose D to be long enough to get rid of any transitory effects.

 Outputs of the Simulation for each queuing policy:
The duration of the simulation (which may be longer than the input parameter, as when check-in closes, there may be passengers in the waiting queues and service stations).
The maximum length of the queue for each queue.
The average and maximum waiting time for each queue.
The rate of occupancy of each service station (percentage of time each station was busy).
If you want: show the real-time evolution of the queues during the run-time simulation.
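
A rough sketch of Option 1 only (one shared queue, several stations), using just the standard library. Everything here is an assumption for illustration: the function name `simulate_single_queue`, exponential arrival and service times, and the example parameters, which keep the system stable (S = 4*A with 5 stations) rather than following the S >> 5*A instruction.

```python
import heapq
import random

def simulate_single_queue(duration, mean_interarrival, mean_service, stations=5, seed=0):
    """Option 1: one shared FIFO queue feeding `stations` identical service stations."""
    rng = random.Random(seed)
    free_at = [0.0] * stations          # time at which each station next becomes free
    heapq.heapify(free_at)
    busy_time = 0.0
    waits = []
    t = 0.0
    while True:
        t += rng.expovariate(1.0 / mean_interarrival)   # next arrival time
        if t > duration:                                # check-in closes
            break
        station_free = heapq.heappop(free_at)           # earliest available station
        start = max(t, station_free)                    # wait if every station is busy
        service = rng.expovariate(1.0 / mean_service)
        heapq.heappush(free_at, start + service)
        waits.append(start - t)
        busy_time += service
    end_time = max(max(free_at), duration)              # let the last passengers finish
    return {
        "served": len(waits),
        "avg_wait": sum(waits) / len(waits) if waits else 0.0,
        "max_wait": max(waits, default=0.0),
        "end_time": end_time,
        "occupancy": busy_time / (stations * end_time),
    }

stats = simulate_single_queue(duration=10_000, mean_interarrival=1.0, mean_service=4.0)
print(stats)
```

The per-station policies (2.A to 2.C) would need one queue per station and a dispatch rule at arrival time, which this sketch does not cover.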
final kiln
#

try linear algebra with complex numbers

warm copper
#

I asked teacher if I could use Markov Queues from Queue Theory

#

M/M/c

#

He didnt even know what that was

final kiln
#

what is it

#

just searched on google, ofc queue theory is a thing in math

warm copper
#

this was my response

#

and he was like what?

agile owl
#

where does that say markov queue though

warm copper
#

Its a markovian model

final kiln
#

everything circles back to markov

agile owl
#

glad I don't need to know that

warm copper
#

similar to what I learnt

#

in my stats class

final kiln
#

check knot theory

#

it's also a thing

#

surprisingly useful tho

warm copper
#

oh god

#

you will learn markov chains as a part of your AI journey @agile owl

agile owl
#

I know what they are

#

I meant the whole queueing thing

final kiln
#

where are markov chains used in ai ?

warm copper
#

its an implementation of markov chains on queues

agile owl
#

theyre used in reinforcement learning

final kiln
#

oh, havent gotten to that stuff

warm copper
#

yup

agile owl
#

state-action-reward-state-action-reward...

final kiln
#

makes sense

warm copper
#

whoever this markov guy was

#

he was genius

agile owl
#

the learning curves in reinforcement learning go up because we are trying to maximize rewards instead of minimizing losses

#

I like to joke that it's the optimistic branch of ML

final kiln
agile owl
#

yeah

warm copper
#

Andrey Andreyevich Markov (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research later became known as the Markov chain. He was also a strong, close to master-level chess player.
Markov and his younger brother Vladimir Andreevich Markov (1871–1897) proved the M...

agile owl
#

the way it fits is random exploration of states

#

but it has what is called a policy gradient

#

so it's partially random in that the state it explores are randomly generated but it optimizes the policy gradient which is the function that links actions to rewards

#

a | s -> R

final kiln
#

interesting

#

so you have a state, and you throw a displacement at random and see if that improves the reward

warm copper
agile owl
#

yeah

#

your actions lead to states

final kiln
#

so it's proposal based right

#

like, if a certain displacement doesnt work you throw it away

agile owl
#

yeah it learns to make the right action given the state to optimize the rewards

#

not just for one step but for an entire episode of steps

warm copper
#

are you studying Data Science? @agile owl

agile owl
#

yea sure

final kiln
#

wait so it's like a genetic thing

#

i need to read up on this stuff

warm copper
agile owl
#

sure

#

I already got a degree in it they just didn't cover reinforcement learning very much so I'm self-studying it now

warm copper
#

yeah we didnt either

#

we will do in my grad program

final kiln
#

i might go for phd, really liking ml and stuff

#

down the line tho

warm copper
#

queue theoryy

#

learn some of that

final kiln
#

i wonder what other mundane things mathematicians have turned into fields

agile owl
#

I think reinforcement learning is really a lot more exciting than static learning problems

#

that's how they get robots to walk etc.

final kiln
#

the thing that drew me to ML was llms, really fascinated by them

agile owl
#

I would be surprised if chatgpt didn't use some kind of reinforcement learning

final kiln
warm copper
#

it does

agile owl
#

I think every time you tell it it did something wrong it learns from that

tidal bough
#

google "RLHF".

agile owl
#

plus side is you get to use your CPU and GPU at the same time 🙂

final kiln
warm copper
#

Gemini is good

agile owl
#

reinforcement learning to the moon

#

I need to learn more about the implementation of these models instead of using sb3 though

#

I want to try to reimplement SAC in rust from scratch using their version of torch

final kiln
#

rust has torch ?

agile owl
#

it has a torch alternative

#

that has a torch-like API

final kiln
#

python is awesome but when the project starts growing I feel the need for type safety

#

mypy dont cut it cuz a lot of libs dont have stubs

warm copper
#

I also like NLP @agile owl

agile owl
#

I like NLP too but more for quantizing things like sentiment than generation

warm copper
#

cool!

final kiln
#

ive been burning a lot of gpu for sentiment analysis

#

,_.

warm copper
#

yeah

#

I dont do GPU intensive stuff on my computer

agile owl
#

I have an RTX 2080 Ti

#

it does alright

final kiln
#

I wrote a data and training pipeline in github actions

warm copper
#

I did a CNN implementation once during my undergrad. My computer shut down @agile owl RTX 2080

final kiln
#

models are trained in spot

warm copper
#

not Ti tho

final kiln
#

they all train during the night

agile owl
#

I think the Ti has 3 extra gb of VRAM

warm copper
#

I had to use Google Colab

#

50 bucks a month

#

REEEEE

agile owl
#

VRAM is super expensive did you guys know that

warm copper
#

no o.O

agile owl
#

the good stuff is

warm copper
#

is that why Graphics Cards are expensive?

final kiln
agile owl
#

part of the reason yes

final kiln
#

everything is automated

warm copper
#

I have 2080 and it cant run Cities Skylines 2 properly

#

saddddd

agile owl
#

I'm honestly somewhat surprised it took people so long to realize the potential of GPUs for machine learning

#

they had GPUs for a long time

warm copper
#

but they were not as capable

agile owl
#

I don't think they have changed the basic premise of it that much

#

but in relation to the CPU they always had more but worse cores didn't they?

warm copper
#

also we didnt have capable computers back then

#

I mean Im 90s kids

agile owl
#

right but think about it

#

using GPUs for compute

#

that only really became a thing when NVIDIA wrote Cuda

#

even though in principle it could have been done earlier

#

is it because the GPU instruction sets are proprietary?

#

I don't understand why no one did something like CUDA independently

final kiln
warm copper
#

yeah thats a good question @agile owl

#

o_O I dont know either

agile owl
#

I asked chatGPT and it gave me a reasonable sounding answer

warm copper
#

what did it say?

agile owl
#

lack of standardization

warm copper
#

ohhh

agile owl
#

limited programmability of earlier generations of GPUs

#

lack of awareness

final kiln
#

given how hard it is to use the AMD environment, may make sense

#

ive spent a whole weekend trying to get an amd gpu in the cloud to do stuff

#

took me an hour to do the same in nvidia

agile owl
#

I got AMD GPUs to work with ROCm years ago

#

it was alright

final kiln
#

ig my issue is the lack of docker support

agile owl
#

the experience of getting it set up sucked tho

final kiln
#

it's not zero support, but everything is so badly done

agile owl
#

it's like one dude

final kiln
#

10gb docker images for ex

agile owl
#

or it used to be

#

it was like literally one dude

final kiln
#

fr ?

agile owl
#

lol

final kiln
#

like one person supporting it ?

agile owl
#

I remember looking for resources about it and it was ONE GUY answering everyone's questions on github

final kiln
#

yeah that explains it

agile owl
#

i'd be surprised if they haven't grown the team

#

I mean he probably had a few coworkers but still

final kiln
#

behind every bad code is one overworked developer

agile owl
#

I still think AMD is underinvesting in AI

#

I don't understand it

#

if they could undercut Nvidia in AI it would be a massive coup

final kiln
#

massive corps are hard to change

#

people don't get compensated for innovating

#

so they kinda dont

#

I think AI was nvidias end game from the start

agile owl
#

I mean they don't need to be the first mover

#

being the second mover is also good if there's only one other company

#

they just need to invest in having something as good or better than cuda

final kiln
#

there seems to be a market vacuum of sorts

agile owl
#

or even slightly worse

final kiln
#

no competitor to nvidia

agile owl
#

exactly

#

even if you're worse than nvidia if you're a viable option and can compete on price

#

their problem is rocm is barely viable

#

it just has to be good enough to work with common libraries

#

and they would get a massive sales boost from people going for the value alternative

final kiln
#

idk if it's easy to do that, you're saying like make cheaper gpu rite

agile owl
#

they already make cheaper GPU

#

I'm saying they need the SOFTWARE

#

so people buy them for compute

final kiln
#

ah, yeah that's for sure

agile owl
#

their gpus are usually slightly worse than nvidias

#

from a hardware perspective

#

but the software can be worse too

#

it just has to like, work

final kiln
#

I personally dont care or would notice the hardware

agile owl
#

which rocm barely does

#

you'd notice it would take like 10-20% longer to train

final kiln
#

I do notice the complete lack of support on the software side

agile owl
#

but the GPUs are like half the cost

#

so it's value from that perspective

final kiln
#

I mean rn I just schedule the thing and let it do it during the night

#

so I wouldn't notice it

agile owl
#

I'm making a webserver to provide a UI for model creation

final kiln
#

yeah I get the feeling a lot of modelling can be done with UI or some DSL

agile owl
#

the alternative is to write a DSL

#

but then it wouldn't have charts

final kiln
#

something Like labview could be cool

#

both a UI and a DSL

agile owl
#

this server is currently just for training an already curated dataset and plotting results

#

the next step I want to do is add the ability to do ETL from different APIs into a joined table before standardizing and slicing for CV etc.

final kiln
#

have you tried mlflow

agile owl
#

is that some paid service

final kiln
#

a webserver that does graphs sounds an awful lot like it

final kiln
agile owl
#

does it work for reinforcement learning

final kiln
#

its what Ive been using to log my stuff

#

you kinda just do .log_metric("metric_name", some_val)

#

and it saves it and you can see it real time on the UI

#

theres also an auto log feature, but ive never used it

#

like it does some magic that you dont even have to explicitly log stuff

agile owl
#

like in a callback?

final kiln
#

there's a lot of automation done by me, each experiment is a PR, when I merge it automatically runs the training loops and they appear as runs

final kiln
agile owl
#

I'd rather roll my own

#

I don't like magic

final kiln
#

I meant the auto log feature

#

what im doing you just do .log_param, .log_metric and .log_artifact

#

so it backs up your models and everything

#

surely you'd prefer not to reimplement all this

#

there are others similar to this

agile owl
#

I'm honestly not sure if that API can work with sb3

#

without doing deep surgery

#

they already have their own logging functions

#

I just need to visualize them

#

I already have the points

final kiln
#

technically it's an http api

#

been using it with pytorch, works quite well

agile owl
#

I mean, getting access to that state at the right level of granularity to plot it

warm copper
#

someone asked me what ML algorithms don't require regularization and which ones do

#

I said this

#

There are some ML algorithms where overfitting is not a problem at all. For example Naive Bayes is known for its conditional independence which makes it resistant to overfitting. KNN is another algorithm that is resistant to overfitting as it works by memorizing the training data. Random forest is also resilient when it comes to overfitting thanks to the way it combines many independently trained decision trees.
#

am I right?

final kiln
#

knn doesnt overfit ?

agile owl
#

knn has a very good calibration

final kiln
#

i thought every model can overfit

agile owl
#

it depends on what your assumptions are

#

some people say RL can't overfit but I'm pretty sure it can as soon as you introduce different data to the same environment

final kiln
#

it's way too late for me to use my brain to learn new stuff

warm copper
#

I found this online:

#

Non-parametric: KNN doesn't learn a fixed set of parameters. It essentially relies on memorizing the training data.
Focus on local regions: KNN makes predictions based on localized neighborhoods in the data space, reducing its susceptibility to extreme patterns that might mislead parametric models.
final kiln
#

i mean yeah, does it even make sense to say "knn overfits"

#

it's just a database query almost
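
A tiny pure-Python sketch of the "database query" view, with a made-up toy dataset and a hypothetical `knn_predict` helper. It also shows where overfitting can creep in: with k=1 the model reproduces every training label, including a noisy one, and a larger k smooths that out.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of (features, label) pairs; distance is squared Euclidean
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# a toy training set with one deliberately mislabeled point at (0.1, 0.1)
train = [
    ((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((0.0, 0.2), "a"),
    ((0.1, 0.1), "b"),                      # label noise
    ((1.0, 1.0), "b"), ((1.2, 1.0), "b"), ((1.0, 1.2), "b"),
]

query = (0.1, 0.12)                          # sits inside the "a" cluster
print(knn_predict(train, query, k=1))        # follows the noisy neighbor
print(knn_predict(train, query, k=3))        # the vote smooths the noise out
```

So k plays the role of a regularization knob here, even though nothing is being "trained".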

agile owl
#

it's an interesting property though

#

I think you could make a more sophisticated model based on the same principle

final kiln
#

its called a transformer

warm copper
#

is there any other algorithm that uses memorization?

final kiln
#

likely yeah

#

not neural networks tho, they do compression

#

im gonna sleep

warm copper
#

Locally Weighted Regression: A non-parametric regression method that fits simple models to localized subsets of the training data. The focus is on predictions made close to a query point, relying more heavily on training examples in that local region.

#

found this

#

Locally Weighted Regression
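
A short numpy sketch of locally weighted linear regression for one feature, assuming a Gaussian kernel with a bandwidth `tau` (the function name `lwr_predict`, the kernel choice, and the toy sine data are all assumptions for illustration). A separate weighted least squares problem is solved for every query point.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.3):
    """Locally weighted linear regression prediction at a single query point."""
    # add an intercept column so the local model is y ~ theta0 + theta1 * x
    A = np.column_stack([np.ones(len(X)), X])
    q = np.array([1.0, x_query])
    # Gaussian kernel: training points near the query get weights close to 1
    weights = np.exp(-((X - x_query) ** 2) / (2.0 * tau ** 2))
    W = np.diag(weights)
    # weighted least squares: solve (A^T W A) theta = A^T W y
    theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return float(q @ theta)

X = np.linspace(0.0, 6.0, 40)
y = np.sin(X)
print(lwr_predict(X, y, 1.5), np.sin(1.5))   # local fit next to the true value
```

Like KNN, the whole training set has to be kept around at prediction time.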

agile owl
#

if you're localizing to time then that's just a rolling window

warm copper
#

interesting read

agile owl
#

you could like weight by time too

#

exponentially weight the cost with some halflife with respect to time
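
Roughly like this, as a pure-Python sketch. The helper names `halflife_weights` and `weighted_squared_error` and the example ages are made up for illustration.

```python
def halflife_weights(ages, halflife):
    """Weight for each observation given its age; the weight halves every `halflife` units."""
    return [0.5 ** (age / halflife) for age in ages]

def weighted_squared_error(y_true, y_pred, weights):
    """Squared-error cost where each term is scaled by its time-decay weight."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)

ages = [0, 5, 10, 20]                 # days since each observation
weights = halflife_weights(ages, halflife=10)
print(weights)                        # [1.0, 0.7071..., 0.5, 0.25]
```

An observation one halflife old counts half as much in the cost as one from today.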

warm copper
#

When to use Locally Weighted Linear Regression?
When n (number of features) is small.
If you don’t want to think about what features to use.

quartz karma
#

Hi, does anyone know how to get simba work on top of spark? is it enough to just install the driver?

agile owl
#

so I got my code working with polars and it's actually slower

#

by quite a bit

#

gonna profile it and see what the problem seems to be

#

same stuff as where it was taking a lot of time with pandas except the flamechart is shallower

#

and it's taking longer

#

I feel like I got memed

#

I'm not sure my results were actually correct because I got a different end result but I'll save learning polars for a new project

#

I think it kind of stands to reason that a dataframe with an index built in is going to be faster than doing a filter on a column...

teal lance
rapid isle
#

guys is there a difference between the two

worldly dawn
rapid isle
#

sorry could u refer me to the respective channel

#

and no thats why im asking

worldly dawn
#

that's also why I am asking you

#

so you can think about it

dry geyser
#

morning!

#

An interesting side effect in polars:

        filtered_df = df.with_columns(
            pl.when(
                pl.col(column).is_not_null() & pl.col(column).str.contains(phone_regexp)
            ).then(
                pl.col(column).str.extract_groups(phone_regexp)
            ).otherwise(
                pl.lit(None)
            ).alias(column)
        )

This will create a structured column with as many None values as capture groups

#

I tried to fix that behavior to no avail

#

[.when([(col("PHONE3").is_not_null()) & (col("PHONE3").str.contains([String(^(?:(?P<country_code>+\d+)[\s-]+)?(?P<number>(?:\d[\s-]*)+)$)]))]).then(col("PHONE3").str.extract_groups()).otherwise(null.cast(Struct([Field { name: "country_code", dtype: String }, Field { name: "number", dtype: String }]))).alias("PHONE3")]

#

the dataframe does not contain any nulls in the columns parsed

#
 'phones': [{'country_code': None, 'number': None},
                 {'country_code': None, 'number': '5551234'},
                 {'country_code': None, 'number': '5551234'},
                 {'country_code': None, 'number': '5551234'}]}
sonic yoke
#

Guys can someone recommend a good beginner course for python for datascience

dry geyser
#

[2024-02-12 11:45:17,350] [MainProcess:MainThread] INFO: CSV: Processed 499999 lines in 1.66 seconds, 300776.14 lines/second

#

rewrote all the coalescing/transforms into expr engine query plans

final kiln
feral blade
#

Does pytorch have yolo model? My lab asked to train object detecting using it and specifically asked to switch to pytorch for this assignment.

molten acorn
#

Hi, I have a pandas dataframe which is grouped by a column named 'run'. Each group should have more or less the same amount of rows. In this dataframe, there is another column called 'total_data' and I would like to merge these groups into a single group, effectively eliminating the need for a 'run' column. While merging it would be nice if it took the mean of the row value of 'total_data' horizontally across each 'run' group, rather than the mean on the column itself. The end result should be a Series with the same amount of rows as a 'run' group. Could someone please assist me with this? I've been trying to solve this with AI but I can't seem to figure out the right combination of functions to call. Any help will be appreciated

#

Hopefully I explained that correctly. I'm new to pandas so I'm not sure if I'm describing the problem correctly

left tartan
#

Do they have a common index or something to "join" them on?

molten acorn
#

They have all the same columns, might differ in the number of rows by less than 5%

left tartan
#

Yes, so sounds like you want to do a left outer join then.

#
df1.merge(df2, how="outer", ....)
#

There's a few different ways to merge. You can use "on=[list of columns to join on]", or left_index/right_index if the indexes is what you want to use.

molten acorn
#

Ok, great, this gives me a path to follow. This will then be possible to merge values by taking the mean?

#

Across the dataframes

#

I'll experiment, thank you for the help
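
One possible way to get the row-wise mean across runs without merging, sketched on made-up data. It assumes rows line up by their position inside each run; the `step` column is a hypothetical helper name. `groupby(...).cumcount()` numbers the rows within each group, and a second `groupby` averages across runs.

```python
import pandas as pd

# toy data: three runs of slightly different length (hypothetical values)
df = pd.DataFrame({
    "run":        [1, 1, 1, 2, 2, 2, 3, 3],
    "total_data": [10.0, 20.0, 30.0, 12.0, 22.0, 32.0, 14.0, 24.0],
})

# number the rows inside each run: 0, 1, 2, ... (position within the run)
df["step"] = df.groupby("run").cumcount()

# average total_data across runs for each position; runs that are too short are skipped
mean_by_step = df.groupby("step")["total_data"].mean()
print(mean_by_step)
```

The result is a Series with one value per row position, as long as the longest run.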

final kiln
#

Not using seeds was a mistake

#

But also, I may be coming to the realization that 16gb of GPU is not gonna be enough to train this thing. What else could make this thing always converge to the same value other than it's just too small. Bert uncased is like 100M parameters

jagged latch
#

I have a question to those experienced in Dash. Can you call another function that you defined earlier in the Python script to generate the dataframe inside a Callback function to replace the old dataframe with a new one?

#

Like I have a program that generates a Pandas dataframe based on a given date and I want to implement a date picker that will replace the data with different data from the other dates entered by the user in the text box.

#

Inside the Callback function, I called the function that originally generated the dataframe so that it can generate a new dataframe with the new date.

agile owl
#
       spot = self.data.loc[curr_date, "spot"]
        spot_window = self.data.loc[prev_dates, "spot"].to_frame()
        log_spot_window = spot_window.apply(np.log)
        if self.current_step > 0:
            spot_returns = log_spot_window.unstack().diff().dropna().iloc[-1]
            spot_returns = spot_returns.values
        else:
            spot_returns = np.zeros(self.no_symbols)
        spot_window_vals = spot_window.values
        spot_values = spot.values
        spot_rank = get_percentile(spot_values, spot_window_vals, axis=0)

@left tartan pd.loc and pd.unstack are faster than pl.filter and pl.pivot in this code. particularly in the definition of spot_window and spot_returns

left tartan
#

(I'm leaving this for the resident polars experts πŸ™‚

left tartan
agile owl
#

my difficulty is always figuring out the right place to start and stop

#

for the appropriate amount of context

#

so basically

left tartan
agile owl
#

the starting state is I have the dataframe that represents the entire chunk deserialized as self.data

#

self.dates is the list of unique dates in data

#

I have an index counting which date I'm on

#

I need to update the current state given the date and self.data

#

!pastebin

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

agile owl
#

the class is 550 lines so don't wanna post it and be rude unless you're really interested

left tartan
#

Looks like you want the percentile of spot price over a window from current date -10 to +1?

agile owl
#

that's part of it

#

I should probably split this up into multiple methods

#

at the end you see everything that goes into the return

#

lines 31-38

#

the docstring is also wrong

#

I forgot to update it

#

I was originally clustering for a single stock and haven't updated it

#

there's obviously some repetition and dumbness going on here but the critical part is I think the part I posted originally

#

polars profile:

#

pandas profile

#

the two most expensive high-level polars operations are filter and pivot

#

this corresponds to the pandas indexing and pandas unstack

#

so that kind of makes sense

#

but the polars version is just 2x slower

#

unfortunately I lost the polars code I had because I never committed it out of disgust at the results

#

surely not having an index comes at some cost though

#

if pandas can't beat polars at a loc index vs a generic filter then I'd be surprised because the whole reason pandas has the badness it does have is so it can optimize around indexing

final kiln
#

It always comes down to the data

agile owl
#

I mean, it also comes down to how the data is structured for access

#

I don't understand how polars gets rid of the index and doesn't pay a price

#

you either have an index that's set as a discrete action, set it every time you do an operation which seems extremely expensive, or don't have it

final kiln
#

Ah I was talking about my thing

agile owl
#

oo mb

final kiln
#

Need to step back and re process the data using all the lessons I learned til now

agile owl
#

the polars code was something like this before I tossed it

spot = self.data.filter(pl.col("date") == curr_date)
spot_window = self.data.filter(pl.col("date").is_in(prev_dates))

...
spot_returns = log_spot_window.pivot(index="date", columns="ticker", values="spot").drop("date").diff()
final kiln
#

A 40M parameter model having the same loss graph as a 1M parameter one, like I can't even

final kiln
#

Probly not advisable

#

I think

final kiln
#

It's also possible that these things converge slowly, that graph looks an awful lot like mine

#

well more or less, x axis is number of steps w max range being the end of the dataset

jagged latch
#

I have a question. I'm trying to load into my Dash App with Debug Mode enabled. Does it usually take long?

past meteor
agile owl
#

I didn't commit it unfortunately but it was something like this:

spot = self.data.filter(pl.col("date") == curr_date)
spot_window = self.data.filter(pl.col("date").is_in(prev_dates))

...
spot_returns = log_spot_window.pivot(index="date", columns="ticker", values="spot").drop("date").diff()
#

I don't think polars has diff actually

#

but if my polars code was correct to begin with then I probably wouldn't have had this issue in the first place hah

past meteor
#

It has diff

agile owl
#

I also don't understand why polars should be faster than pandas if it never indexes the data?

past meteor
#

They answer this very question in their docs

final kiln
#

Are there docs for training transformers? I'd be really happy if there were

#

I'm just trying all this stuff until something sticks, not very efficient

past meteor
#

There's just no Pandas index weirdness

#

that's the nuance

agile owl
#

I see

#

so is filter wrong then

past meteor
#

Giving an index based on integer position in Pandas is also just a bit pointless

#

It's the default, of course there's smarter ways to do it

#

But I don't see people doing that, like picking an index in Pandas that aligns with their data access patterns

#

You also need to check what type of index Pandas uses, as you know in DBs there's many different kinds

#

Hash based indexes don't give you a lot if you're filtering like <
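
A small sketch of what "picking an index that aligns with your access patterns" could look like in pandas. The data and column names are made up for illustration; the point is only the contrast between a boolean-mask scan on the default positional index and a label lookup on a sorted (date, ticker) index.

```python
import pandas as pd

# Hypothetical unsorted data with the default RangeIndex.
raw = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-02", "2024-01-01", "2024-01-02", "2024-01-01"]
        ),
        "ticker": ["BBB", "AAA", "AAA", "BBB"],
        "spot": [55.0, 100.0, 110.0, 50.0],
    }
)
target = pd.Timestamp("2024-01-02")

# Default positional index: a date lookup compares every row in the column.
mask_result = raw[raw["date"] == target]

# Index chosen to match the access pattern (look up by date, then ticker).
indexed = raw.set_index(["date", "ticker"]).sort_index()
loc_result = indexed.loc[target]  # DataFrame indexed by ticker
```

Sorting after set_index matters here: pandas can only use its faster lookup paths on a MultiIndex when the index is sorted.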

agile owl
#

it's a multiindex on date as datetime64 and ticker as a string

#

I'm not using lt or gt just equals and isin

past meteor
#

hmm then a hash index is good

agile owl
#

the issue isn't really with the DB

past meteor
#

Yeah, I didn't mean that

#

I just linked this as a general resource about indexes

agile owl
#

ah gotcha

past meteor
#

Because they aren't magic ✨

past meteor
#

or eager only?

agile owl
#

I believe the read_database returned a lazy frame

past meteor
#

No

agile owl
#

it returns an eager frame?

past meteor
#

yes

agile owl
#

then i was using eager

past meteor
#

Never use eager unless you have to imo

#

that's a speedup left on the table

agile owl
#

can you read from db using lazy

past meteor
#

No, query the db and call .lazy() immediately

agile owl
#

thanks you're a lot more helpful than the polars discord

#

they have a beginner questions channel where no one answers beginner questions lol

past meteor
#

🤣

agile owl
#

typical of the rust community memes I have to admit

past meteor
#

👈 polars simp #1

agile owl
#

they have a reputation for being elitists and thinking people haven't done enough work if they need help

#

I used to think like that then I realized being eager to ask questions has almost no downside on the internet

#

as long as they are somewhat reasonable..

past meteor
#

Answering questions makes you think

#
• they're without obligation
#

Nobody loses, unless someone is spamming or so

#

Oh, good addition about the lazy API is that it removes all footguns

#

You can't iterrows, maprows or whatever I see people doing

agile owl
#

well time to rewrite everything I wrote yesterday but with lazy this time

#

I thought I must have really screwed something up bad so I tossed it all

#

lesson learned

past meteor
#

last recommendation

agile owl
#

I also didn't feel like branching because I was lazy

past meteor
#

It will sound crazy

#

but 1) read all the docs. do it while you're eating cornflakes or whatever 2) browse through the method names in the API ref

#

Polars has the GOD tier method for time series

#

The danger is, if you focus on translating Pandas to Polars you'd never find it

agile owl
#

reading the whole API sounds a bit ambitious

past meteor
#

Not reading it, just look at the names

#

ofc I didn't read the entire API

agile owl
#

here's a funny thing

#

I asked copilot to retrieve what I wrote yesterday but it said it's not allowed to do it

#

but when I start retyping it it suggests what I had written

past meteor
#

That there

#

Is sentience /s

agile owl
#

that would be an interesting use for AI

#

"help I accidentally deleted this code I forgot to commit, replay your telemetry buffer"

#

XD

past meteor
#

While you're at it replay Jeff Bezos' credit card details thanks

boreal gale
boreal gale
#

probably 1 year ago or even more

past meteor
#

This time last year is when I turned the >1h Pandas data pipeline to ~15s polars

#

And I definitely used group by dynamic

#

Maybe it was longer ago then yeah

agile owl
#

this is in the environment step function in a reinforcement learning setup