verbal venture Jun 21, 2023, 12:58 AM

#

ok, I want to create 20 classes. So I need 20 subfolders within train and test, correct? how would the class names get created within the code after

small wedge Jun 21, 2023, 12:59 AM

#

uh you can segment images however you like, they could all be in the same file and have the label class in the file name for example

#

cat_0001.png for example if cat was a class. Note that this is for single-label classification (i.e. only one of the classes can be chosen); it will be a little more complicated for multi-label classification

small wedge Jun 21, 2023, 1:01 AM

#

verbal venture ok, I want to create 20 classes. SO I need 20 subfolders within train and test c...

your model's output for class prediction will be a vector of length 20, so converting from index of the vector to class name and back is quite simple
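
A minimal sketch of the filename-label idea and the index/name conversion described above. The class names, the `<class>_<number>.png` naming pattern, and the helpers `label_from_filename` and `predicted_class` are assumptions for illustration, not code from this conversation.

```python
from pathlib import Path

# Hypothetical class names; in practice there would be 20 of them.
CLASS_NAMES = ["cat", "dog", "horse"]

# Index <-> name lookups: position i in the model's output vector means CLASS_NAMES[i].
class_to_idx = {name: i for i, name in enumerate(CLASS_NAMES)}
idx_to_class = {i: name for name, i in class_to_idx.items()}


def label_from_filename(path):
    """Return the class index encoded in a name like 'cat_0001.png'."""
    class_name = Path(path).stem.rsplit("_", 1)[0]
    return class_to_idx[class_name]


def predicted_class(scores):
    """Map a score vector (one entry per class) to the best-scoring class name."""
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    return idx_to_class[best_idx]


print(label_from_filename("train/cat_0001.png"))  # 0
print(predicted_class([0.1, 0.7, 0.2]))           # dog
```

With this layout the folder structure stops mattering: all images can sit in one directory, and the same `CLASS_NAMES` list is reused at prediction time.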

verbal venture Jun 21, 2023, 1:04 AM

#

okay awesome, and just wondering, there's a subclass of male + female

#

so train/test -> male/female -> 0..20 for each folder

#

how would that work having the male + female subclass?

#

I can make some UI thing that just calls 2 separate models if it's too complicated to incorporate that subdivision of folders @small wedge

small wedge Jun 21, 2023, 1:08 AM

#

verbal venture how would that work having the male + female subclass?

idk what you mean by subclass, is this some data you would pass to the input of the model or expect it to classify images as male/female?

verbal venture Jun 21, 2023, 1:09 AM

#

small wedge idk what you mean by subclass, is this some data you would pass to the input of ...

Yeah

small wedge Jun 21, 2023, 1:09 AM

#

that wasn't a yes/no question XD

#

I'm trying to think of it mathematically for the model's input/output

#

you have an input vector of pixels in the image (/ whatever data you want in addition to that)

#

and you have an output vector of your 20 classes

#

does this subclass fall into one of those categories? or is it just an organizational thing?

verbal venture Jun 21, 2023, 1:11 AM

#

The model has to work on both males + females but it has to detect whether the user uploaded photo is male or female

#

It’s age prediction but I can’t compare the ages of males + females

small wedge Jun 21, 2023, 1:12 AM

#

I see

#

well you can either have 2 models, the user inputs whether the picture is male or female, and pass that to the appropriate model

#

or you can use multi-label classification, with 21 classes

#

and have the model predict male/female as one of the classes
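
One possible reading of the second option, sketched in plain Python: the first 20 outputs are mutually exclusive age classes and the 21st is an independent male/female score. The layout of the vector, the 0.5 threshold, and which label the last output stands for are all assumptions here.

```python
import math

NUM_AGE_CLASSES = 20  # from the conversation; the output layout below is assumed


def decode_output(logits):
    """Split a 21-long raw output vector into (age_class_index, sex_label)."""
    if len(logits) != NUM_AGE_CLASSES + 1:
        raise ValueError("expected 20 age logits plus 1 sex logit")
    age_logits, sex_logit = logits[:NUM_AGE_CLASSES], logits[NUM_AGE_CLASSES]
    # Age: exactly one class wins, so take the argmax.
    age_idx = max(range(NUM_AGE_CLASSES), key=lambda i: age_logits[i])
    # Sex: an independent yes/no output, squashed with a sigmoid.
    p_female = 1.0 / (1.0 + math.exp(-sex_logit))
    return age_idx, ("female" if p_female >= 0.5 else "male")


example = [0.0] * 21
example[7] = 3.2    # strongest age class
example[20] = -1.5  # negative logit -> probability below 0.5
print(decode_output(example))  # (7, 'male')
```

The two-model alternative needs no decoding step: the user-supplied male/female choice simply selects which 20-class model to call.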

cold osprey Jun 21, 2023, 5:01 AM

#

!code

arctic wedgeBOT Jun 21, 2023, 5:01 AM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

cold osprey Jun 21, 2023, 5:12 AM

#

Huh is this python

#

Why are there ;

#

And std::cout

plucky bolt Jun 21, 2023, 5:12 AM

#

cold osprey Huh is this python

C++ but I am using matplotlib with it
The syntax is a bit different but that shouldn't matter all that much. I'm trying to figure out why there is that additional straight blue line going from the origin to the current data point being plotted in every update.

stable remnant Jun 21, 2023, 5:33 AM

#

I have a projects based on image processing and computer vision I need help with, If there's any professional ML engineer willing to help please let me know, Thank you!

plucky bolt Jun 21, 2023, 5:40 AM

#

stable remnant I have a projects based on image processing and computer vision I need help with...

What sort of projects?

stable remnant Jun 21, 2023, 5:43 AM

#

plucky bolt What sort of projects?

Pardon me, It's only 1 project!

plucky bolt Jun 21, 2023, 5:45 AM

#

stable remnant Pardon me, It's only 1 project!

You need to include some details to explain what it is you are trying to do and to what extent and maybe why.

magic dune Jun 21, 2023, 5:48 AM

#

!paste

#

https://paste.pythondiscord.com/tizivataru

magic dune Jun 21, 2023, 5:48 AM

#

magic dune https://paste.pythondiscord.com/tizivataru

    
self.weights[layer_idx] -= self.learning_rate * np.dot(self.deltas[layer_idx + 1],
  File "<__array_function__ internals>", line 180, in dot
ValueError: shapes (5,3) and (1,4) not aligned: 3 (dim 1) != 1 (dim 0)

#

line 85

#

help

#

It is a nn
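
The pasted code is not reproduced here, so the following only illustrates the shape rule behind the error: `np.dot(A, B)` needs `A.shape[1] == B.shape[0]`. The layer sizes and the convention that weights have shape `(n_in, n_out)` are assumptions; a missing transpose, or arrays still sized for the binary case, are common causes of this kind of mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, n_in, n_out = 5, 4, 3                  # assumed sizes for illustration
activations = rng.normal(size=(batch, n_in))  # inputs to the layer
deltas = rng.normal(size=(batch, n_out))      # error signal at the layer's outputs
weights = rng.normal(size=(n_in, n_out))
learning_rate = 0.1

# (n_in, batch) @ (batch, n_out) -> (n_in, n_out): inner dimensions match,
# and the result has the same shape as the weights it updates.
grad = np.dot(activations.T, deltas)
assert grad.shape == weights.shape
weights -= learning_rate * grad

# Swapping the operands without a transpose fails: inner dims are 3 and 5.
try:
    np.dot(deltas, activations)
except ValueError as err:
    print("shape error:", err)
```

Printing `.shape` of both operands right before the failing line usually shows which array has the unexpected size.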

stable remnant Jun 21, 2023, 5:56 AM

#

plucky bolt You need to include some details to explain what it is you are trying to do and ...

Well, it's a freelance project for a business they have a requirement of image processing and computer vision, Looking for someone who has expertise in this field for support!

magic dune Jun 21, 2023, 5:56 AM

#

Works binary but not with multiple classes

potent sky Jun 21, 2023, 8:21 AM

#

stable remnant Well, it's a freelance project for a business they have a requirement of image p...

you'd have better luck just describing your problem here or in the help forum
people generally are less eager to get on a DM with someone
plus by putting it here you get the combined expertise of the community rather than one person

supple prawn Jun 21, 2023, 1:13 PM

#

is it better to watch tutorials than read books

#

also, is this https://github.com/ossu/data-science a good curriculum to follow for data science

simple tapir Jun 21, 2023, 1:38 PM

#

@mild dirge @left tartan , thanks a lot guys! 🙏

glossy aspen Jun 21, 2023, 1:55 PM

#

supple prawn also, is this https://github.com/ossu/data-science a good curriculum to follow f...

Too broad and why there's a java section. Maybe better to look for more specific courses

supple prawn Jun 21, 2023, 3:28 PM

#

glossy aspen Too broad and why there's a java section. Maybe better to look for more specific...

can you suggest any which i can do as a beginner

glossy aspen Jun 21, 2023, 3:50 PM

#

supple prawn can you suggest any which i can do as a beginner

I don't remember a specific curriculum now but I would suggest zoomcamps like this:

https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

sick ember Jun 21, 2023, 3:53 PM

#

Hey everyone I have a quick question

glossy aspen Jun 21, 2023, 3:53 PM

#

supple prawn can you suggest any which i can do as a beginner

Also khan academy courses would be easier to follow for statistics rather than the MIT calculus courses

sick ember Jun 21, 2023, 3:53 PM

#

Is loss in training in any way related to the number of GPUs my laptop has?

#

For some reason the loss in my model is extremely high with extremely low accuracy

#

Even though I seem to have done everything right

mild dirge Jun 21, 2023, 3:57 PM

#

The number of gpu your laptop has?

#

Why do you make a relation between gpu and loss?

sick ember Jun 21, 2023, 3:58 PM

#

Umm I was watching a tutorial on CNN, and I was doing the exact same thing as the tutorial

#

But I’m getting different outcome

mild dirge Jun 21, 2023, 3:59 PM

#

Probably not the exact same, or you got a very unlucky run

sick ember Jun 21, 2023, 3:59 PM

#

https://youtu.be/n2MxgXtSMBw

YouTube

The Semicolon

Convolutional Neural Networks (CNN) Implementation with Keras - Python

#CNN #ConvolutionalNerualNetwork #Keras #Python #DeepLearning #MachineLearning

In this tutorial we learn to implement a convnet or Convolutional Neural Network or CNN in python using keras library with Tensor flow backend.

Convolutional Neural Networks are a varient of neural network specially used in feature extraction from images. In this v...

#

Here is the tutorial

#

Something just felt wrong

glossy aspen Jun 21, 2023, 4:00 PM

#

sick ember But I’m getting different outcome

The initial weights would be different

sick ember Jun 21, 2023, 4:00 PM

#

glossy aspen The initial weights would be different

Weights?

potent sky Jun 21, 2023, 4:00 PM

#

How much of a difference is there

mild dirge Jun 21, 2023, 4:00 PM

#

Have you learned about linear regression and perceptrons yet?

sick ember Jun 21, 2023, 4:01 PM

#

mild dirge Have you learned about linear regression and perceptrons yet?

I have watched some videos on it

mild dirge Jun 21, 2023, 4:01 PM

#

You should have a solid understanding of those before you even begin looking at CNN

#

Start with the basics first would be my advice, there's many things that could make a cnn give bad results

glossy aspen Jun 21, 2023, 4:02 PM

#

sick ember Weights?

When you create NN you have connections between the neurons. In the beginning of the code you are giving some random numbers to them. So try several times to see if the loss decreases by chance
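
A plain-Python illustration of the point about random starting weights; the helper `init_weights` and the layer sizes are made up, and the tutorial's Keras code initializes its weights through the library rather than like this.

```python
import random


def init_weights(n_inputs, n_outputs, seed=None):
    """Return an n_inputs x n_outputs grid of small random starting weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_outputs)]
            for _ in range(n_inputs)]


# Without a fixed seed, each call (each "run") starts from different weights.
run_a = init_weights(3, 2)
run_b = init_weights(3, 2)
print(run_a == run_b)  # almost certainly False

# With a fixed seed, the starting point is reproducible.
print(init_weights(3, 2, seed=42) == init_weights(3, 2, seed=42))  # True
```

Different starting points explain small differences between runs; a loss that stays extremely high across several runs points to something else in the code or data.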

sick ember Jun 21, 2023, 4:03 PM

#

mild dirge You should have a solid understanding of those before you even begin looking at ...

I have taken linear algebra so I do understand

sick ember Jun 21, 2023, 4:03 PM

#

glossy aspen When you create NN you have connections between the neurons. In the beginning of...

Thank you I will try that!

harsh bane Jun 21, 2023, 4:19 PM

#

Hoi, for stable diffusion, how do i start off making a script that loads last of everything on boot, and has "if not detecting X extensions/models locally, install extension through extensions, install from url tab", restart webui, then read from new extension to fetch model", then reads it all to confirm it's there.

left tartan Jun 21, 2023, 4:55 PM

#

sick ember I have taken linear algebra so I do understand

If you haven’t watched, this series is excellent visualization of the topic: https://www.3blue1brown.com/lessons/neural-networks

3Blue1Brown - But what is a Neural Network?

An overview of what a neural network is, introduced in the context of recognizing hand-written digits.

#

Even if you know it, it’s a fun watch

sick ember Jun 21, 2023, 4:58 PM

#

left tartan If you haven’t watched, this series is excellent visualization of the topic: htt...

He’s my favorite math YouTuber, I watched that entire series❤️👍

verbal venture Jun 21, 2023, 5:26 PM

#

does anyone know of a link/anywhere to know about what neural network architectures to create based off the problem

#

say if I wanted to classify 196 classes, how would I know what NN to create

#

is there a formal process or any resource cheat sheet

west oyster Jun 21, 2023, 6:20 PM

#

I want to write a python program for a typical data analytics workload: collect data, clean it, do some prediction, and display insights/dashboard on a website. I want to write it in a modular way so folks can replace a component with a different one, not even in Go. What's a good approach to do that? write a python program that makes calls to other python executables (out of process)?

#

And if someone has a codebase that I can get inspiration from, please share it

#

I found this one but I am not sure how to run it yet: https://github.com/tdpetrou/Build-an-Interactive-Data-Analytics-Dashboard-with-Python-Oreilly
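
A rough sketch of the out-of-process idea from the question: each stage is any executable that reads JSON on stdin and writes JSON on stdout, so a stage can be swapped for a program in another language. The JSON-over-pipes contract, the stage contents, and the `run_pipeline` helper are assumptions for illustration, not an established convention.

```python
import json
import subprocess
import sys

# Each stage is a command line. Here both stages are tiny inline Python
# programs, purely for illustration; any executable honoring the contract works.
CLEAN_STAGE = [sys.executable, "-c", (
    "import json, sys\n"
    "rows = json.load(sys.stdin)\n"
    "json.dump([r for r in rows if r.get('value') is not None], sys.stdout)\n"
)]
SUMMARY_STAGE = [sys.executable, "-c", (
    "import json, sys\n"
    "rows = json.load(sys.stdin)\n"
    "json.dump({'count': len(rows), 'total': sum(r['value'] for r in rows)}, sys.stdout)\n"
)]


def run_pipeline(data, stages):
    """Feed data through each stage in order, passing JSON between processes."""
    payload = json.dumps(data)
    for command in stages:
        result = subprocess.run(command, input=payload, capture_output=True,
                                text=True, check=True)
        payload = result.stdout
    return json.loads(payload)


rows = [{"value": 2}, {"value": None}, {"value": 5}]
print(run_pipeline(rows, [CLEAN_STAGE, SUMMARY_STAGE]))  # {'count': 2, 'total': 7}
```

The trade-off is process start-up and serialization cost per stage; for large datasets, passing file paths instead of the data itself may be the more practical contract.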

timid kiln Jun 21, 2023, 10:02 PM

#

So I get a warning if I do this:

# code #1
output_df.dropna(subset=['flow'], inplace=True)
# compiler returns this message:
# See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

# But no warning here, code #2:
output_df = output_df.dropna(subset=['flow'])

What I'm trying to do is remove all rows where flow has a value of None.

I looked up that link and I don't see where it applies to what I'm (doing).

I know I can import warnings and filter out the warning but... is that the right thing to do? Seems like I could just do #2 above and move along but... what's the right thing to do?

left tartan Jun 21, 2023, 10:06 PM

#

timid kiln So I get a warning if I do this: ```py # code #1 output_df.dropna(subset=['flow'...

In general, avoid inplace. https://towardsdatascience.com/why-you-should-probably-never-use-pandas-inplace-true-9f9f211849e4?gi=611f8dfd661e

Medium

Why You Should Probably Never Use pandas inplace=True

Truly it is a curse on the library and a pox on thee if you use it
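
A small sketch of the reassignment style recommended above. It assumes `output_df` was produced by filtering another DataFrame, which is a common source of the view-versus-copy warning; the data and the columns other than `flow` are made up.

```python
import pandas as pd

raw = pd.DataFrame({
    "site": ["a", "b", "c", "d"],
    "flow": [1.5, None, 3.0, None],
    "keep": [True, True, True, False],
})

# Filtering yields a derived frame; an explicit .copy() makes it independent of `raw`.
output_df = raw[raw["keep"]].copy()

# Reassign the result instead of using inplace=True.
output_df = output_df.dropna(subset=["flow"])

print(output_df["site"].tolist())  # ['a', 'c']
print(len(raw))                    # 4, the original frame is untouched
```

Reassigning to the same name does not pile up copies: the old frame is released once nothing refers to it.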

timid kiln Jun 21, 2023, 10:07 PM

#

"a pox on thee..." 😄 love it

serene scaffold Jun 21, 2023, 10:08 PM

#

I never use inplace with pandas

#

if I ever make koalas, that won't have it

tidal bough Jun 21, 2023, 10:08 PM

#

it's a bit annoying in pandas to do things nicely and immutably
like, plenty of assigns

timid kiln Jun 21, 2023, 10:08 PM

#

Most times for what I'm doing I want to modify the underlying data. Otherwise i run the risk of making more and more copies of a dataframe, making the code more confusing. I think. Probably.

left tartan Jun 21, 2023, 10:08 PM

#

timid kiln Most times for what I'm doing I want to modify the underlying data. Otherwise i...

Read the whole article.

timid kiln Jun 21, 2023, 10:08 PM

#

left tartan Read the whole article.

yessir, will do rn 🙂 Thank you for the link.

#

I just used .melt for the first time today so seems timely.

left tartan Jun 21, 2023, 10:09 PM

#

also, my real answer is: don't use pandas, but I wont go there (and sometimes you have to)

timid kiln Jun 21, 2023, 10:10 PM

#

left tartan also, my real answer is: don't use pandas, but I wont go there (and sometimes yo...

OOOoooo tell me more please. Always looking for alternatives/options and such.

serene scaffold Jun 21, 2023, 10:10 PM

#

left tartan also, my real answer is: don't use pandas, but I wont go there (and sometimes yo...

you prefer polars, or what?

left tartan Jun 21, 2023, 10:10 PM

#

serene scaffold you prefer polars, or what?

I do a lot with pycompute + duckdb now

timid kiln Jun 21, 2023, 10:10 PM

#

So, essentially database tables/queries?

left tartan Jun 21, 2023, 10:11 PM

#

Polars is fast(er) than pandas, but I don't want to learn yet another pythonic dataframe api that's not as good as sql.

timid kiln Jun 21, 2023, 10:11 PM

#

I avoided pandas for a while and just used sqlite because I'm more comfortable with sql.

serene scaffold Jun 21, 2023, 10:11 PM

#

there's a lot that I dislike about pandas. but I'm too familiar with it to want to learn something else. and it's so well documented.

timid kiln Jun 21, 2023, 10:11 PM

#

"avoid" being relative... I have plenty of dataframes around the code things.

left tartan Jun 21, 2023, 10:11 PM

#

timid kiln So, essentially database tables/queries?

Yah, everyone needs to know pandas, but eventually you hit the wall.

#

pyarrow starts to change the game a bit tho.

worn stratus Jun 21, 2023, 10:33 PM

#

left tartan Polars is fast(er) than pandas, but I don't want to learn yet another pythonic d...

polars has a super intuitive API, if you know how to do something in SQL, you pretty much know how to do it in Polars. But you get the advantages of being able to do a ton of stuff in polars that is much harder in SQL - complex aggregation, pivoting etc - alongside composability which makes it workable as part of actual software rather than one off analysis/ETL

I was never super familiar with Pandas, but I really like polars

left tartan Jun 21, 2023, 10:34 PM

#

I wish polars and pandas would get together and make a love child.

worn stratus Jun 21, 2023, 10:34 PM

#

I don't really know what Pandas has to offer

#

that's a lie - it has more IO options

tidal bough Jun 21, 2023, 10:35 PM

#

indexes

worn stratus Jun 21, 2023, 10:35 PM

#

but the main reason it's so popular is that it was the first mover

tidal bough Jun 21, 2023, 10:35 PM

#

what would be a messy join in polars is often something trivial like df1 + df2 in pandas, thanks to indexes.

left tartan Jun 21, 2023, 10:36 PM

#

(something something duckdb .... )

tidal bough Jun 21, 2023, 10:36 PM

#

but yeah, polars is very nice

iron basalt Jun 21, 2023, 10:37 PM

#

I would recommend using Polars instead of Pandas unless you have to or if Pandas is fast enough and you are already familiar with it.

worn stratus Jun 21, 2023, 10:37 PM

#

tidal bough what would be a messy `join` in polars is often something trivial like `df1 + df...

do you have an example? Ime with Pandas, indexes were largely an annoyance rather than helpful

left tartan Jun 21, 2023, 10:38 PM

#

pandas indices are helpful for that specific case: for (in sql terms) natural joins across two df's

worn stratus Jun 21, 2023, 10:38 PM

#

iron basalt I would recommend using Polars instead of Pandas unless you have to or if Pandas...

familiarity and company tooling built on top of Pandas are the things that are hamstringing polars over Pandas

tidal bough Jun 21, 2023, 10:39 PM

#

worn stratus do you have an example? Ime with Pandas, indexes were largely an annoyance rathe...

this section in Modern Pandas has some impressive examples: https://tomaugspurger.net/posts/modern-3-indexes/#indexes-for-alignment
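
A tiny illustration of the index alignment being discussed, with made-up data: arithmetic between two pandas objects matches rows by index label rather than by position.

```python
import pandas as pd

# Two tables keyed by the same labels, stored in different orders.
df1 = pd.DataFrame({"sales": [10, 20, 30]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"sales": [1, 2, 3]}, index=["c", "a", "b"])

# Addition aligns rows by index label, not by position.
total = df1 + df2
print(total.sort_index()["sales"].tolist())  # [12, 23, 31]
```

Without an index, the same result would need an explicit join on a key column followed by adding the two value columns.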

pure hinge Jun 21, 2023, 10:41 PM

#

Since there is no finance sub here, thought I'd leave this here:

https://github.com/IlyaKipnis/PythonBacktesting/blob/main/edhec_perfa.py
https://github.com/IlyaKipnis/PythonBacktesting/blob/main/Return_portfolio.py

Some utility functions to rebuild some of R's financial ecosystem for portfolio allocation backtesting in Python considering that Zipline/Pyfolio are prone to breaking. More functions will be upcoming in the edhec_perfa file.

GitHub

PythonBacktesting/edhec_perfa.py at main · IlyaKipnis/PythonBacktes...

Backtest asset allocation strategies in Python with only a background in pandas necessary - PythonBacktesting/edhec_perfa.py at main · IlyaKipnis/PythonBacktesting

left tartan Jun 21, 2023, 10:41 PM

#

pure hinge Since there is no finance sub here, thought I'd leave this here: https://github...

thanks, that's actually very timely. That's pretty much what I just finished writing.

#

(i mean, not exactly of course, but yah)

pure hinge Jun 21, 2023, 10:42 PM

#

Heh--yeah, used a lot of chatGPT for this translating from R

left tartan Jun 21, 2023, 10:42 PM

#

zipline was frustrating

pure hinge Jun 21, 2023, 10:42 PM

#

But so many jobs say "must have Python, must have Python", though IDK which packages people use for testing signal based trading systems and limit orders

#

Market orders can be done just using pandas, but all of quantstrat's depth from R is just...nope nope nope

#

I looked at Zipline's code once and said "I am not touching that incomprehensible mess"

left tartan Jun 21, 2023, 10:43 PM

#

Yah, I opted for (i'm a huge duckdb stan) duckdb over pandas, but started from probably the same place

verbal venture Jun 21, 2023, 10:44 PM

#

hey guys weird question, but how do I find out the input dimensions of my model layers

pure hinge Jun 21, 2023, 10:44 PM

#

Have you tried asking chatGPT yet?

verbal venture Jun 21, 2023, 10:44 PM

#

yeah, just returns so many errors

#

plus i actually need to learn this

pure hinge Jun 21, 2023, 10:44 PM

#

Seems you're not specific enough in your query then.

verbal venture Jun 21, 2023, 10:45 PM

#

I should learn it anyway. It's a cnn. 50x256x256x196

pure hinge Jun 21, 2023, 10:45 PM

#

not familiar with them, but doesn't the cnn library itself have a way of outputting that data?

verbal venture Jun 21, 2023, 10:45 PM

#

do my model layers matter with the exception of the output (classes)

#

if you use transfer learning yeah

#

if you want to make your own model you set the input feature/outputs
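
On finding layer dimensions: the spatial size after a convolution or pooling layer follows a fixed formula, so it can be worked out by hand. The sketch below assumes no dilation and a hypothetical stack of two 3x3 convolutions (padding 1), each followed by a 2x2 max pool; `conv_out` and `feature_count` are illustrative helpers, not library functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_count(input_size, channels_out):
    """Flattened features after: conv3x3(pad 1) -> pool2 -> conv3x3(pad 1) -> pool2."""
    s = conv_out(input_size, kernel=3, padding=1)  # 3x3 conv with padding 1 keeps the size
    s = conv_out(s, kernel=2, stride=2)            # 2x2 max pool halves it
    s = conv_out(s, kernel=3, padding=1)
    s = conv_out(s, kernel=2, stride=2)
    return channels_out * s * s


print(feature_count(224, 128))  # 401408 = 128 * 56 * 56
print(feature_count(256, 128))  # 524288 = 128 * 64 * 64
```

The flattened count is what the first fully connected layer needs as its input size, so it changes whenever the image resolution does.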

worn stratus Jun 21, 2023, 10:46 PM

#

left tartan pandas indices are helpful for that specific case: for (in sql terms) natural jo...

I see what you mean - it also seems like a pretty uncommon use, nicer with the Pandas way, but also less explicit and more magical - which often causes more pain than it saves

left tartan Jun 21, 2023, 10:47 PM

#

worn stratus I see what you mean - it also seems like a pretty uncommon use, nicer with the ...

Yah, I really hate them (pandas indices)

tidal bough Jun 21, 2023, 10:47 PM

#

yeah, they are generally not nice

#

it's also horrible that pandas makes you use them. like, if you want to do a join, usually you have to do set_index (it can only join on column in one of the dfs, not both)

agile cobalt Jun 21, 2023, 10:49 PM

#

you can specify left_on and right_on for pandas.merge iirc?

tidal bough Jun 21, 2023, 10:49 PM

#

yup, that one you can I think

#

also today I tried to do a cross join in pandas and it just. doesn't work right. the good way according to google is, I shit you not,

pd.merge(
    df1.assign(_tmp=0),
    df2.assign(_tmp=0),
    on="_tmp",
).drop(columns="_tmp")

agile cobalt Jun 21, 2023, 10:52 PM

#

I've never had to do a cross merge before, but does how="cross" just not work?
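
For reference, a short sketch of both `pd.merge` options mentioned in this exchange, on made-up frames. `how="cross"` is available in pandas 1.2 and later.

```python
import pandas as pd

sizes = pd.DataFrame({"size": ["S", "M"]})
colors = pd.DataFrame({"color": ["red", "green", "blue"]})

# Cross join: every row of `sizes` paired with every row of `colors`.
combos = pd.merge(sizes, colors, how="cross")
print(len(combos))  # 6

# Joining on differently named columns, no set_index needed.
orders = pd.DataFrame({"order_id": [1, 2], "cust": [10, 11]})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Ann", "Bob"]})
joined = pd.merge(orders, customers, left_on="cust", right_on="customer_id")
print(joined["name"].tolist())  # ['Ann', 'Bob']
```

On older pandas versions the temporary-key trick shown above is the usual fallback.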

worn stratus Jun 21, 2023, 10:52 PM

#

@left tartan I hadn't ever looked at it till now, but DuckDB is very interesting

DuckDB looks like a solid api, and from a glance has good Polars support. I think I actually have a usecase where it will save me a ton of effort

left tartan Jun 21, 2023, 10:52 PM

#

Yah, polars & pyarrow... feel free to dm me, I'm super into it right now.

worn stratus Jun 21, 2023, 10:54 PM

#

Arrow has been great for getting everything to be able to talk to everything else.

glossy aspen Jun 21, 2023, 11:03 PM

#

left tartan Yah, I really hate them (pandas indices)

I use numpy almost for everything. Do I really need something like pandas? Or is it necessary for production level processing?

left tartan Jun 21, 2023, 11:04 PM

#

glossy aspen I use numpy almost for everything. Do I really need something like pandas? Or is...

You're really asking about dataframes (or tables): I only know my world, but everything in my world ends up in a table/dataframe of some type. Secondly, there's a trend towards Arrow away from Numpy.

#

So, numpy is still important and will be for a long time, but what we're really talking about is vectorized operations and there are multiple ways to get there. Dataframes are just containers to make those operations convenient (perhaps too much of a simplification?)

tidal bough Jun 21, 2023, 11:05 PM

#

glossy aspen I use numpy almost for everything. Do I really need something like pandas? Or is...

How do you manage to use numpy when you have heterogenous data?

#

Or do you use structured arrays?

glossy aspen Jun 21, 2023, 11:07 PM

#

tidal bough How do you manage to use numpy when you have heterogenous data?

I put np.nan values - if I understand the question correctly

tidal bough Jun 21, 2023, 11:07 PM

#

No, I mean, when you have several columns of wildly different types.

glossy aspen Jun 21, 2023, 11:08 PM

#

left tartan So, numpy is still important and will be for a long time, but what we're really ...

I am not an industry guy so trying to understand/learn thanks

left tartan Jun 21, 2023, 11:09 PM

#

Yah, the machine learning libraries tend to tell us what we must use. scikit-learn wants numpy, so we end up with pandas+numpy data types.

glossy aspen Jun 21, 2023, 11:09 PM

#

tidal bough No, I mean, when you have several columns of wildly different types.

I usually don’t have them but I use dictionaries for it
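
For context on the structured-array question, a minimal numpy example with made-up field names: one array holds records whose fields each have their own dtype.

```python
import numpy as np

# One record per row; each field has its own dtype.
people = np.array(
    [("ann", 31, 62.5), ("bob", 45, 80.1)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

print(people["age"].mean())                         # 38.0
print(people[people["age"] > 40]["name"].tolist())  # ['bob']
```

This covers mixed column types without pandas, though without the grouping, joining, and missing-value helpers a dataframe brings.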

tidal bough Jun 21, 2023, 11:10 PM

#

anyway, a pandas dataframe is pretty much a bunch of equal-length numpy arrays (one per column) collected into a table-like structure

#

with nice methods to work on single columns, multiple columns, selecting rows, etc.

#

when working on a single column it's often easier to do it the numpy way, but few datasets have one column.
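
The description above can be sketched directly, using made-up data: a DataFrame built from one equal-length numpy array per column, with a single column pulled back out as a plain array.

```python
import numpy as np
import pandas as pd

# One equal-length array per column, each with its own dtype.
columns = {
    "name": np.array(["ann", "bob", "cy"]),
    "age": np.array([31, 45, 27]),
    "weight": np.array([62.5, 80.1, 70.0]),
}
df = pd.DataFrame(columns)

# Multi-column work is convenient on the frame...
over_30 = df[df["age"] > 30]
print(over_30["name"].tolist())  # ['ann', 'bob']

# ...while a single column can be handled as a plain numpy array.
ages = df["age"].to_numpy()
print(type(ages).__name__, ages.max())  # ndarray 45
```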

glossy aspen Jun 21, 2023, 11:12 PM

#

tidal bough when working on a single column it's often easier to do it the numpy way, but fe...

I think I understand the problem thanks. As I remember pandas is also numpy based and it won’t make so much difference

tidal bough Jun 21, 2023, 11:12 PM

#

yeah, it's very connected to numpy

#

whereas e.g. polars, not so much (you can easily convert columns to numpy arrays but internally they're actually arrow I believe)

left tartan Jun 21, 2023, 11:14 PM

#

Pandas is headed in that direction too, but in baby steps.

glossy aspen Jun 21, 2023, 11:14 PM

#

tidal bough whereas e.g. polars, not so much (you can easily convert columns to numpy arrays...

Never used them but I’ll check them (numpy vs arrow articles etc.)

tidal bough Jun 21, 2023, 11:15 PM

#

left tartan Pandas is headed in that direction too, but in baby steps.

this would break so much stuff

#

first thing that comes to my mind is that I have numba functions working on pandas dataframes, and I have some doubts numba works with arrow

pale hemlock Jun 21, 2023, 11:15 PM

#

hmmm

left tartan Jun 21, 2023, 11:17 PM

#

tidal bough this would break so much stuff

Yah, that’s why it’s opt in right now.

pure hinge Jun 22, 2023, 12:37 AM

#

tidal bough How do you manage to use numpy when you have heterogenous data?

That's what Pandas is for.

verbal venture Jun 22, 2023, 1:48 AM

#

is anyone able to tell me why my model has 0.00045%

serene scaffold Jun 22, 2023, 2:04 AM

#

verbal venture is anyone able to tell me why my model has 0.00045%

what model? 0.00045% what?
if you ask a question, ask yourself if you've given enough information for anyone to answer it.

verbal venture Jun 22, 2023, 2:05 AM

#

ok, what's the difference between _, predicted = torch.max(output, 1) and _, predicted = torch.max(output.data, 1)

serene scaffold Jun 22, 2023, 2:05 AM

#

I don't know what output is.

verbal venture Jun 22, 2023, 2:06 AM

#

for data in train_loader: images, labels = data. output = model(images)

#

CNN model

#

I can send you the full model actually to see if you see anything wrong with it.

serene scaffold Jun 22, 2023, 2:07 AM

#

what does print(type(output)) show?

verbal venture Jun 22, 2023, 2:07 AM

#

side question I was also considering doing a masters in AI. how much did that prep you for your job

serene scaffold Jun 22, 2023, 2:08 AM

#

I'm pursuing a masters currently, I got a job with just a bachelors in CS, but only because I had a publication under my belt.

#

in general, a masters is basically a requirement for entry level ML jobs.

verbal venture Jun 22, 2023, 2:09 AM

#

how'd you get a publication

serene scaffold Jun 22, 2023, 2:10 AM

#

one of my professors wanted to publish with me. and I thank god (I am an atheist) for this every day.

verbal venture Jun 22, 2023, 2:10 AM

#

cool

#

so what's the diff between output and outputs.data. I see it used in different models

#

mainly outputs with NN, and .data with CNNs

serene scaffold Jun 22, 2023, 2:11 AM

#

I still need the answer to the most recent question that I asked you.

verbal venture Jun 22, 2023, 2:12 AM

#

yeah just waiting for my model to be done

verbal venture Jun 22, 2023, 2:26 AM

#

serene scaffold I still need the answer to the most recent question that I asked you.

output is torch.tensor

#

outputs.data is also torch.tensor

serene scaffold Jun 22, 2023, 2:29 AM

#

verbal venture outputs.data is also torch.tensor

looks like they're basically the same, and .data is there for historic reasons https://stackoverflow.com/questions/51743214/is-data-still-useful-in-pytorch

Stack Overflow

Is .data still useful in pytorch?

I'm new to pytorch. I read much pytorch code which heavily uses tensor's .data member. But I search .data in the official document and Google, finding little. I guess .data contains the data in the

#

I would just ignore it (and not use it)
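
A quick check of that conclusion on a stand-in tensor (the values are made up and take the place of `model(images)`): `torch.max` along dimension 1 picks the same class indices with or without `.data`.

```python
import torch

# Raw scores for a batch of 2 samples and 3 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3]], requires_grad=True)

# torch.max along dim 1 returns (max values, indices of the max).
_, predicted = torch.max(outputs, 1)
_, predicted_via_data = torch.max(outputs.data, 1)

print(predicted.tolist())                          # [1, 0]
print(torch.equal(predicted, predicted_via_data))  # True

# If only the indices are needed, argmax says so directly.
print(outputs.argmax(dim=1).tolist())              # [1, 0]
```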

verbal venture Jun 22, 2023, 2:29 AM

#

ok so my model is still fucked

#

are you able to take a look?

#

but it shouldn't be fucked. it's like pretty dense

#

```py
import torch
import torch.nn as nn

# num_classes, device and batch_size are assumed to be defined earlier in the notebook


class Model(nn.Module):
    def __init__(self, num_classes=num_classes):
        super(Model, self).__init__()

        # Feature extractor: two conv blocks, each ending in a 2x2 max pool
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=False),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=False),
            nn.MaxPool2d(kernel_size=2, stride=2),

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # Classifier head; 128 * 56 * 56 corresponds to 224x224 inputs after the two poolings
        self.fc_layers = nn.Sequential(
            nn.Linear(128 * 56 * 56, 512),
            nn.ReLU(inplace=False),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        x = self.conv_layers(x)
        print("conv layers", x.shape)
        x = torch.flatten(x, 1)
        print("after flattening", x.shape)
        x = self.fc_layers(x)
        print("After FC layers", x.shape)

        return x


model = Model()

model.parameters

def training(model, train_loader, loss_fn, optimizer, num_epochs):
    model.train()
    model.to(device)  # using GPU if available

    for epoch in range(1):
        epoch_train_loss = 0.0
        correct = 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()

            outputs = model(images)

            train_loss = loss_fn(outputs, labels)

            train_loss.backward()
            optimizer.step()

            epoch_train_loss += train_loss.item()

            _, predicted = torch.max(outputs.data, 1)

            correct += (predicted == labels).sum().item()
        accuracy = 100/batch_size * correct / len(train_loader)

        # the start of this print line was cut off in the paste
        print(f"Accuracy: {accuracy:.4f}")
```

serene scaffold Jun 22, 2023, 2:34 AM

#

replicating all your work locally is more than I'm willing to commit to--sorry

#

though in general, please always use markdown blocks for pasting code

#

!code

verbal venture Jun 22, 2023, 2:35 AM

#

don't replicate. just take a look and see why the accuracy might be .5%. just in general: epochs were 30, batch size 64, pretty dense network, nn.CrossEntropyLoss() etc. accuracy shouldn't be half a percent

serene scaffold Jun 22, 2023, 2:36 AM

#

I don't know

#

though it looks like you only have one epoch

#

for epoch in range(1):

verbal venture Jun 22, 2023, 2:37 AM

#

it should still hit like 30% from that tho

#

I can't trust chatgpt either. spews so much nonsense

serene scaffold Jun 22, 2023, 2:37 AM

#

why should it reach 30% after one epoch?

verbal venture Jun 22, 2023, 2:37 AM

#

cuz if it's a good model it can't start at .5% can it?

#

I've trained like 3 models, but basically all the good ones started relatively good

#

but what do I know

serene scaffold Jun 22, 2023, 2:38 AM

#

do enough epochs to get a noticeable diminishing rate of return on the loss

#

and then if it's still performing poorly, you can reevaluate

verbal venture Jun 22, 2023, 2:40 AM

#

it's either a bug or I made something insanely shitty

#

cuz I think even like a single MLP could have gotten more of a prediction

#

should I do correct / len(train_loader.dataset) or / train_loader
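
A small sketch of the two denominators in question. In PyTorch, `len(train_loader)` is the number of batches while `len(train_loader.dataset)` is the number of samples, so dividing the correct count by the dataset size is the direct way to get a per-sample accuracy. The numbers below (1000 samples, 5 correct) are made up for illustration only:

```python
def accuracy_percent(correct: int, total_samples: int) -> float:
    """Accuracy as a percentage of the samples actually evaluated."""
    return 100.0 * correct / total_samples


# Hypothetical numbers: 1000 samples with batch size 64 gives 16 batches,
# and the last batch holds only 40 samples.
dataset_size = 1000
batch_size = 64
num_batches = -(-dataset_size // batch_size)  # ceil division, like len(train_loader)
correct = 5

by_dataset = accuracy_percent(correct, dataset_size)
# Formula from the pasted loop: it assumes every batch is full.
by_batches = 100 / batch_size * correct / num_batches

print(f"per-sample: {by_dataset:.4f}%  per-batch formula: {by_batches:.4f}%")
```

The two values differ only slightly (the batch-based formula undercounts when the last batch is short), so this denominator alone would not explain an accuracy stuck near 0.5%.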

serene scaffold Jun 22, 2023, 2:41 AM

#

you should do more epochs

#

you even said "just in general: epochs were 30"

#

but then it was secretly actually 1.

verbal venture Jun 22, 2023, 2:42 AM

#

I should do more than that?

#

ah 😂

#

are product level networks run on thousands of epochs?

serene scaffold Jun 22, 2023, 2:42 AM

#

no

#

for epoch in range(1):
this means that regardless of how many epochs you said you did, you did one.

verbal venture Jun 22, 2023, 2:43 AM

#

right

#

so if a dataset is like a million images

serene scaffold Jun 22, 2023, 2:43 AM

#

did you even actually do 30?

verbal venture Jun 22, 2023, 2:43 AM

#

the network would just be even more dense and the epochs would be low yeah?

#

no I did 1 for compute

#

but running 30 rn

#

kaggle notebook gonna explode

serene scaffold Jun 22, 2023, 2:44 AM

#

what did you say this model does?

verbal venture Jun 22, 2023, 2:45 AM

#

there's 196 classes of cars. a CNN to classify them

#

I gpt prompted the best network to do so, which seems to be a VGG imitation but from scratch and much lighter

#

my accuracy is steadily at 0.5%

#

does anyone know why this accuracy is so low? 😂
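
A quick sanity check on that number, assuming the 196 classes are roughly balanced (an assumption, the class distribution is not stated in the chat): uniform random guessing gives about 1/196 accuracy, which is almost exactly the 0.5% being reported. A cross-entropy loss hovering around ln(196) would point the same way.

```python
import math

num_classes = 196

# Accuracy expected from uniform random guessing over balanced classes.
chance_accuracy = 100.0 / num_classes

# Cross-entropy of a uniform prediction over all classes (natural log).
chance_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.2f}%")
print(f"chance cross-entropy: {chance_loss:.3f}")
```

If the training loss sits near that value and never drops, the network is probably not learning at all, which suggests looking at the learning rate, the labels, or the input pipeline rather than at the accuracy formula.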

proven sigil Jun 22, 2023, 3:51 AM

#

I'm building a multi-class classifier that classifies conversation text into one of 8 sentiments.


Within each message, there is a conversation id, which is basically which conversation the message takes place in. Each message is either the start of a conversation or a reply from the previous message. There is also a sentiment, which represents the emotion that the person who sent the message is feeling. There are 8 sentiments: Angry, Curious to Dive Deeper, Disgusted, Fearful, Happy, Sad, and Surprised.

Sentiment Analysis: Build a multi-class sentiment analysis model based on this dataset.

I'm using sentence_transformer to transform text into embeddings, then using OneVsRestClassifier using a simple estimator.

But it's taking too long to train on my machine.
How can I speed this up?

serene scaffold Jun 22, 2023, 4:02 AM

#

@proven sigil how many cores does your machine have, and what value did you set for n_jobs, if any?

queen cradle Jun 22, 2023, 4:51 AM

#

glossy aspen I use numpy almost for everything. Do I really need something like pandas? Or is...

NumPy and Pandas have different use cases. Pandas is designed for columnar data, much like a SQL database. Many of its operations work with one column or a small number of columns, paying no attention to the others: Summary statistics like means, standard deviations, and counts; grouping; joins; and so on. Pandas uses one-dimensional arrays almost exclusively. Polars is so tightly focused on columnar data that, as far as I'm aware, it supports only one-dimensional arrays, and I think it even requires that items at adjacent indices are placed in adjacent memory locations. NumPy, on the other hand, is designed for scientific computing: Linear algebra, numerical integration, signal processing, solving PDEs, that sort of thing. Multidimensional arrays are fundamental to these applications. You can't even make sense of linear algebra without two-dimensional arrays! NumPy also needs to work with arrays where adjacent indices may refer to non-adjacent memory locations (i.e., the arrays may have arbitrary strides). For example, this allows NumPy to extract a column from a matrix without copying: It creates a new array which points to the same memory as the original matrix but has a stride that makes each successive array index skip over a whole row of the matrix. This sort of operation happens constantly in scientific computing applications, but it would be highly unusual for the columnar data that Pandas and Polars target.

You can get away with using NumPy alone, but it's not designed for data manipulation and is a bad tool for that. You can get away with Pandas or Polars alone if you want to manipulate your data (grouping, filtering, joining, and so on) but don't want to do prediction or statistical inference. However, something as simple as linear regression requires linear algebra and hence NumPy or equivalent. PyTorch and Tensorflow are more akin to NumPy than to Pandas because the use cases they're intended for are more like scientific computing applications.
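
To make the stride point concrete, here is a minimal sketch of extracting a column as a view. The array contents are arbitrary; the 8-byte item size comes from choosing `int64` explicitly.

```python
import numpy as np

# 3x4 matrix in row-major (C) order with 8-byte integers.
matrix = np.arange(12, dtype=np.int64).reshape(3, 4)

# Basic slicing returns a view onto the same memory, not a copy.
column = matrix[:, 1]

print(matrix.strides)  # bytes to step along each axis: (row step, column step)
print(column.strides)  # each step along the column skips a whole row
print(np.shares_memory(matrix, column))

# Writing through the view modifies the original matrix.
column[0] = -1
print(matrix[0, 1])
```

The column view has a stride of 32 bytes (one full row of four 8-byte items), which is the "skip over a whole row" behaviour described above.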

#

I wouldn't mind. But I don't have much time this week and next, so I have no idea when I'll get around to it.

glossy aspen Jun 22, 2023, 4:59 AM

#

queen cradle NumPy and Pandas have different use cases. Pandas is designed for columnar data,...

Probably I'll continue to use only numpy for my processing algorithms but thanks

thin geyser Jun 22, 2023, 6:18 AM

#

Is it possible to use grad cams for gans? I'm trying to overlay a heatmap on an input image to localise the area which is causing a change in the output image.

errant spear Jun 22, 2023, 6:42 AM

#

I am curious, what is the most appropriate algorithm for predicting stock prices?

hoary wigeon Jun 22, 2023, 6:45 AM

#

Hi there!,

I'm trying to use shap[all] and I'm facing an error saying that

cuda extension was not built during install!
ImportError: cannot import name '_cext_gpu' from partially initialized module 'shap' (most likely due to a circular import) (/home/cosmix/.local/lib/python3.8/site-packages/shap/__init__.py)

Has anyone been through this error and know how to fix it?

storm valve Jun 22, 2023, 6:45 AM

#

errant spear I am curious, what is the most appropriate algorithm for predicting stock prices...

i dont think anyone generally agrees with this

errant spear Jun 22, 2023, 6:46 AM

#

storm valve i dont think anyone generally agrees with this

I do not understand.

#

As in, there is no most appropriate one?

storm valve Jun 22, 2023, 6:46 AM

#

as in there's no general agreement about the "most appropriate one"

errant spear Jun 22, 2023, 6:47 AM

#

Intriguing.

#

Any idea about what you believe the most appropriate ones would be?

errant remnant Jun 22, 2023, 7:14 AM

#

Is this good channel to ask machine learning questions?

Anyways how to train our model for images it failed to recognise again without training whole dataset again

grand spindle Jun 22, 2023, 8:08 AM

#

give it more data until it gets them all right

proven sigil Jun 22, 2023, 9:32 AM

#

errant spear I am curious, what is the most appropriate algorithm for predicting stock prices...

None 🙅‍♂️

proven sigil Jun 22, 2023, 9:38 AM

#

serene scaffold @proven sigil how many cores does your machine have, and what value did ...

My machine has 8 cores. I did not set n_jobs :/. I'm using SGDClassifier

young granite Jun 22, 2023, 9:53 AM

#

@storm valve how was this one method called which gives different forecastings something with "M.." but i cant recall it

hasty mountain Jun 22, 2023, 10:42 AM

#

Hey guys, about how to calculate ROC-AUC curve, can someone help me with thresholding for Binary Classification?
If I'm using a Neural Network for Binary Classification which uses a Log Sigmoid activation function to make the classification...how should I proceed with thresholding selection?

I was thinking about simply using my output argmax, since it's the most obvious way and it's how my model is optimized(I suppose it's more or less how the BCE Loss works, even with Logits...), but when checking the source code of the model I'm using(TrimNet for drug toxicity prediction), I've found that they used scikit-learn's precision_recall_curve, which applies thresholding automatically.

I've seen that the threshold is used to get the True Positive Rate and the False Positive Rate. However, I'm simply using True Positive Predictions/Predictions and False Positive Predictions/Predictions, where the predictions are provided by my model.

I don't know how I should proceed with threshold selection, especially since my model outputs are all in log scale. It appears to me that trying to use a threshold in this situation would be a bit arbitrary and prone to cherry picking...

hasty mountain Jun 22, 2023, 11:02 AM

#

hasty mountain Hey guys, about how to calculate ROC-AUC curve, can someone help me with thresho...

Hm... I've double checked the TrimNet's source code. They actually used scikit-learn's metrics.precision_recall_curve to calculate the precision through scikit-learn's metrics.auc. For ROC-AUC, they used the model predictions and the labels themselves.

Well, this solves my problem, but I don't really get the difference... Shouldn't ROC-AUC calculate both the precision and recall of the model?

hasty mountain Jun 22, 2023, 12:14 PM

#

I only got time to check this properly now. The thing is, no matter how my model would make correct predictions, (pred == label) would always return a mask of False booleans.

The funny thing is...I had converted my pred to numpy arrays, but kept my label as Pytorch tensor, which caused them to be interpreted as different elements.

#

Hours studying, reviewing and burning neurons on the math of ROC-AUC, and the solution was just a matter of .cpu().numpy()

#

py_guido

#

Thanks for the help. I hope I can now implement my ROC-AUC calculation properly.

high stump Jun 22, 2023, 3:52 PM

#

I hope this is the right channel for this question. 🙏 I'm new to ML and I am looking for open-source algorithms that have been built to predict the mechanical and or chemical properties of materials, any materials. Is there a place where I can start looking? Thanks.

stone glacier Jun 22, 2023, 3:53 PM

#

Question:

Can anyone suggest a better text corpus model than Word2Vec?
My recommender system used Word2Vec but it's not that good

#

#

I should be getting the LOTR titles and the 3 hobbit titles in top 5..but I got this

#

(I did try TF-IDF, but it was even worse...)

left tartan Jun 22, 2023, 4:36 PM

#

high stump I hope this is the right channel for this question. 🙏 I'm new to ML and I am l...

Huh, that’s a fascinating question I’ve never thought about. No idea but would love to know if there is one. I know pharma does a lot of ‘similar’ modeling for, well, pharma reasons. Example: https://medium.com/geekculture/drug-target-interaction-prediction-through-python-4af9e76fc90 Eager to hear if there’s anything here.

Medium

Drug-Target interaction prediction through Python

In this post, I present Python code snippets to predict drug-target interaction using SVD (Singular Value Decomposition) and Matrix…

tidal bough Jun 22, 2023, 4:42 PM

#

high stump I hope this is the right channel for this question. 🙏 I'm new to ML and I am l...

I recently attended a presentation on the topic of predicting mechanical properties from composition, and according to it you pretty much want USPEX.
It's not open source, though.

hoary jay Jun 22, 2023, 5:27 PM

#

queen cradle I wouldn't mind. But I don't have much time this week and next, so I have no ide...

just a paragraph haven't completed the whole paper yet, but tomorrow is the abstract submission only, do you mind if i DM

#

just nervous because it's the first time, Just wish to know if the style of writing is all good and the information in the abstract is interesting enough to catch an eye, basically just need criticism

glossy aspen Jun 22, 2023, 6:20 PM

#

high stump I hope this is the right channel for this question. 🙏 I'm new to ML and I am l...

They use graph neural networks and pytorch geometric library to model them: https://pytorch-geometric.readthedocs.io/en/latest/get_started/colabs.html

storm valve Jun 22, 2023, 7:34 PM

#

young granite @storm valve how was this one method called which gives different forec...

no clue

high stump Jun 22, 2023, 7:44 PM

#

@left tartan, @tidal bough, and @glossy aspen. Thank you very much you lot for the leads.

hasty mountain Jun 22, 2023, 7:59 PM

#

hasty mountain Thanks for the help. I hope I can now implement my ROC-AUC calculation properly.

I got ROC-AUC with negative values yert

#

Will it fix it if I use abs()?

||I'm joking...I guess...||

#

I know that the most appropriate way would be to use integrals of TPR(x) and FPR(x) to calculate the area (instead of simply decomposing the ROC grid into triangles and squares). But I don't really know how would I define those functions...
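
A sketch of how the area is usually obtained without closed-form TPR(x) and FPR(x) functions: sweep the threshold over the sorted scores, collect the (FPR, TPR) points, and apply the trapezoidal rule (scikit-learn's `metrics.auc` is also trapezoidal). The helper names and the four-sample data are made up for illustration. Only the ranking of the scores matters, so log-scale outputs can be used as they are.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists from sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Descending score order: each step lowers the decision threshold.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score are counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] + y[k]) / 2 for k in range(len(x) - 1))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(labels, scores)
auc = trapezoid_auc(fpr, tpr)
reversed_auc = trapezoid_auc(fpr[::-1], tpr[::-1])
print(auc, reversed_auc)
```

One common cause of a negative area in a hand-rolled version is integrating with the FPR values in decreasing order, which flips the sign as `reversed_auc` shows; sorting the points by increasing FPR is a better fix than `abs()`.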

frail quarry Jun 22, 2023, 8:15 PM

#

maybe not the best place, but can't think of which other room would be better suited. I'm attempting to clip a LAS point cloud file using polygons in a geopandas GeoDataFrame. it is currently taking quite a long time to do each one, specifically at this line in my code (it takes about 5 seconds to execute) within_polygon = np.array([polygon["geometry"].intersects(Point(point[0], point[1])) for point in coords])
any ideas on how to better do this?
full function:

def clip_las():
    start = datetime.datetime.now()
    
    # Iterate through the prepped_segments dataframe
    for index, polygon in prepped_segments.iterrows():
        print("Processing " + polygon["NAME"])
        poly_start = datetime.datetime.now()
        
        box_path = os.path.join(PROJECT_DIR, polygon[BOX_ID_FIELD] + BOX_SUFFIX)

        ## Read in the LAS file
        las = laspy.read(os.path.join(box_path, polygon[BOX_ID_FIELD] + ".las"))
        
        # Get coordinates of points
        coords = np.vstack((las.x, las.y, las.z)).transpose()
        
        # Get boolean array of points within the polygon
        within_polygon = np.array([polygon["geometry"].contains(Point(point[0], point[1])) for point in coords])
        print("Filtered points in", str(datetime.datetime.now() - poly_start) + " seconds")

        # Get the points within the polygon
        clipped_points = las.points[within_polygon]

        # Create a new laspy file
        new_las = laspy.LasData(las.header)
        
        # Add the clipped points to the new laspy file
        new_las.points = clipped_points
        new_las.write(os.path.join(box_path, "Final", "las", polygon["NAME"] + ".las"))
        
        print("Clipped " + polygon["NAME"] + " in " + str(datetime.datetime.now() - poly_start) + " seconds")
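
One possible direction, sketched under assumptions: the slow part is building one `Point` object per LAS point in a Python loop, so testing all points at once with array operations should help. The function below is a plain NumPy even-odd ray-casting test with a bounding-box prefilter. It is a swapped-in technique, not what shapely does internally, the name `points_in_polygon` is hypothetical, and it only handles a single exterior ring (no holes, no MultiPolygons). Points lying exactly on the boundary may land on either side.

```python
import numpy as np


def points_in_polygon(x, y, vertices):
    """Boolean mask of points inside a simple polygon (even-odd ray casting).

    x, y: 1-D arrays of point coordinates.
    vertices: (n, 2) sequence of exterior ring vertices, open or closed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vertices = np.asarray(vertices, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)

    # Bounding-box prefilter: in a large cloud most points fail this cheap test.
    xmin, ymin = vertices.min(axis=0)
    xmax, ymax = vertices.max(axis=0)
    candidates = (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)
    cx, cy = x[candidates], y[candidates]

    hit = np.zeros(cx.shape, dtype=bool)
    n = len(vertices)
    for i in range(n):  # loop over edges, vectorized over points
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        crosses = (y1 > cy) != (y2 > cy)  # edge straddles the ray's height
        x_at_y = x1 + (cy - y1) * (x2 - x1) / (y2 - y1)
        hit ^= crosses & (cx < x_at_y)  # toggle on every crossing to the right

    inside[candidates] = hit
    return inside


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
xs = np.array([1.0, 3.0, 1.0, -1.0, 0.5])
ys = np.array([1.0, 1.0, 3.0, -1.0, 1.5])
mask = points_in_polygon(xs, ys, square)
print(mask)
```

In the function above, something like `points_in_polygon(las.x, las.y, polygon["geometry"].exterior.coords)` could replace the list comprehension, assuming each geometry is a single Polygon. If Shapely 2.x is installed, its own vectorized predicates are probably the simpler choice and also handle holes.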

clever egret Jun 22, 2023, 9:20 PM

#

Hello all. I just joined the server and am interested in hanging out with other devs related to this channel's topic.

#

Is this place really active?

mild dirge Jun 22, 2023, 9:20 PM

#

It's more so for asking questions about DS and AI, not many people use it as a social hub atm

clever egret Jun 22, 2023, 9:21 PM

#

I see. Thank you for your response.

sleek harbor Jun 22, 2023, 9:36 PM

#

How does sklearn calculate mutual information and why are the results different each time (has random_state)? What's the randomness about? I thought the formula for mutual information was just like this.. I see no reason for something random to go on under the hood..

left tartan Jun 22, 2023, 9:42 PM

#

The source contains the note: " # Add small noise to continuous features as advised in Kraskov et. al." https://github.com/scikit-learn/scikit-learn/blob/364c77e04/sklearn/feature_selection/_mutual_info.py#L391

arctic wedgeBOT Jun 22, 2023, 9:42 PM

#

sklearn/feature_selection/_mutual_info.py line 391

def mutual_info_classif(

left tartan Jun 22, 2023, 9:43 PM

#

I don't know more than that, but that could explain why you see differences.

sleek harbor Jun 22, 2023, 9:43 PM

#

left tartan The source contains the note: " # Add small noise to continuous features ...

hmmmmm.. ok, but how are continuous features binned? there's no option to chose bin parameters..

#

hmm.. 3 neighbors.. I gotta read that link tomorrow. Brain shutting down :/

left tartan Jun 22, 2023, 9:46 PM

#

yah, i dunno, I'd have to read through the paper to get this. _compute_mi_cc doesn't seem to bin, tho

verbal venture Jun 22, 2023, 9:52 PM

#

someone give me a billion dollar idea using CNNs

sleek harbor Jun 22, 2023, 9:52 PM

#

verbal venture someone give me a billion dollar idea using CNNs

anime

verbal venture Jun 22, 2023, 9:54 PM

#

sleek harbor anime

what

sleek harbor Jun 22, 2023, 9:56 PM

#

verbal venture what

make a bot that goes through all the anime in the world, analyses it, then takes requests, and generates new anime

hasty mountain Jun 22, 2023, 10:21 PM

#

sleek harbor make a bot that goes through all the anime in the world, analyses it, then takes r...

Already done. Anime-GAN, I think there's also Anime-DCGAN

#

Someone probably also did it with Stable Diffusion...

||but mine will be better||

left tartan Jun 22, 2023, 10:24 PM

#

verbal venture someone give me a billion dollar idea using CNNs

Make a cnn that analyzes cnn (the news)

hasty mountain Jun 22, 2023, 11:24 PM

#

stone glacier Question: Can anyone suggest a better text corpus model than Word2Vec? My r...

BERT? I know BERT is a classifier, but I don't know if it could be seen as "corpus model"

(I don't really know what a "text corpus model" would be specifically, but...well, usually Transformers are a jack-of-all-trades in NLP...and in most tasks...)

serene scaffold Jun 23, 2023, 12:20 AM

#

hasty mountain BERT? I know BERT is a classifier, but I don't know if it could be seen as "corp...

BERT can be fine-tuned for classification, but it's a language model.

#

(Source: I just spent all day fighting BERT to classify some shit)

nimble shale Jun 23, 2023, 5:35 AM

#

I was looking around and wondering if I could get any help to be pointed in the right direction. In general I was curious if it would be possible to determine a video game's internal resolution based off an image, since the output resolution can be different from its internal resolution. Right now the most straightforward way I can think of is to count the pixels on a diagonal line. Though I'm wondering if there's any good opportunity here to get more into a machine learning method or some more computer vision methods. Mostly just looking for resources that would be very applicable to this or really any guidance. I'm decently experienced with python but not so much with ML, computer vision, data science, etc

cold osprey Jun 23, 2023, 7:03 AM

#

idk, seems like theres a definitive way to do it right

#

How would an ML approach be better? speed, accuracy etc.?

sleek harbor Jun 23, 2023, 7:31 AM

#

How do you encode categorical variables before doing feature selection with mutual information/chi2/etc and are mutual information scores calculated between categorical/categorical x continuous/continuous features comparable?
If the categorical feature is ordinal, then no problem.. u just encode it as such. But what if it's nominal? Do you OHE? That'd be a bit weird, cus u get a bunch of scores for each value of the category, instead of just a score per category.. However, that just might be a good thing.. maybe in Blue, Red, Green, Yellow, Orange - yellow and orange aren't as important as the rest and could be safely discarded. But then again, that would (in my understanding) strongly affect comparability to other MI scores. And since MI doesn't really care about the ordering, would it make more sense to just ordinally encode the feature to get its MI score, even if it's nominal, and after that reencode it with OHE (after deciding whether or not to keep the entire feature)?
Are discrete numeric features treated exactly the same way as ordinally encoded categorical ones? Are the scores comparable? What about compared to continuous features?

nimble shale Jun 23, 2023, 7:37 AM

#

cold osprey idk, seems like theres a definitive way to do it right

Yeah I was trying to see if there's a more direct way to do it. Edge detection seemed like it could help, but wouldn't be consistent. And with different forms of AA and upscaling, well that just makes it a more difficult problem. I tried to see if there were any definitive ways to do it but haven't found much in my search

pseudo moon Jun 23, 2023, 8:13 AM

#

I am training a denoising autoencoder, but often the model outputs black images (values that are extremely close to 0). Sometimes, it can output good results but the next time I try training it again, it will give a pitch black image. I am using 3 conv2d layers for the encoder and 3 conv2dtranspose layers for the decoder, with relu activation except sigmoid for the last layer. Strangely enough, when I switch to Dense layers instead of convolutional layers, the model will always result in some output although not as good as when conv layers are used. Does anyone know what might be the problem here that always gives me black outputs?

hasty mountain Jun 23, 2023, 9:18 AM

#

pseudo moon I am training a denoising autoencoder, but often the model outputs black images ...

When I started playing with VAEs, I had many problems regarding the fact that they work around distributions...so, when you sample your images, did you remember to denormalize them?

#

If you're using RGB images, your VAE probably works with a Normal Distribution, so you have to de-Normalize the output. If you don't, you may get images that don't really correspond to the training data.

#

Also...it seems that, depending on the dataset, some outputs are more prone to generate dark images... I have a VAE trained on CIFAR100 that has this likelihood. But when I use a custom dataset(and on a simpler VAE), this doesn't happen.

hasty mountain Jun 23, 2023, 9:26 AM

#

sleek harbor How do you encode categorical variables before doing feature selection with mutu...

I don't know about that, but looks like you may get interested in how ROC-AUC score (a metric that is essentially categorical) is applied to regression tasks

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

#

Though...it seems to be a bit annoying... ~~and prone to cherry picking~~

sleek harbor Jun 23, 2023, 9:29 AM

#

hasty mountain Though...it seems to be a bit annoying... ~~and prone to cherry picking~~

I don't see how you would use roc auc for feature importance/selection, which is what I'm currently concerned with

hasty mountain Jun 23, 2023, 9:29 AM

#

Maybe not for feature selection, but it could inspire you somehow.
Maybe adopting a threshold for certain continuous variables to be considered "classes"... At least this is what seems to be done in ROC-AUC. Maybe it could also work for feature selection.

past meteor Jun 23, 2023, 9:41 AM

#

sleek harbor How do you encode categorical variables before doing feature selection with mutu...

Is your problem that you encode color to 3 levels for example red green and blue and you want to know if you need to throw out color in its entirety?

#

What's stopping you from encoding your variable with 15 colors to Red, Blue, Green and Rest if the others are unimportant

#

If you're using K-1 dummies you can even just drop the final category

sleek harbor Jun 23, 2023, 9:43 AM

#

past meteor Is your problem that you encode `color` to 3 levels for example red green and bl...

I want to know what the correct approach is in general. Do I throw out the entire feature, do I throw out certain categories of the feature? How do I do so?

sleek harbor Jun 23, 2023, 9:43 AM

#

past meteor What's stopping you from encoding your variable with 15 colors to `Red`, `Blue`,...

the question is, how do I find out if they're unimportant?

past meteor Jun 23, 2023, 9:44 AM

#

With whatever feature importance algorithm that your model offers

#

Your background was in economics right? Then I'll just say three words you probably heard a million times: beware of multicollinearity

sleek harbor Jun 23, 2023, 9:49 AM

#

past meteor Your background was in economics right? Then I'll just say three words you proba...

that was next on my list.. how do I deal with that? The amount of approaches.. Like, do I use PCA? If yes, then do I replace all existing features with the principal components? If yes then I potentially lose some information and relationships the model might pick up, especially if I use a polynomial transform.. If instead of replacing I just add them, then that's basically adding to the problem by adding the solution to the problem. So my plan was to just weed out the less "important" features with mutual information, and then do some sequential feature selection with what's left to determine what works best (with cv ofc), thus kinda bypassing the need to deal with multicollinearity.. kinda

pseudo moon Jun 23, 2023, 9:55 AM

#

hasty mountain When I started playing with VAEs, I had many problems regarding the fact that th...

denormalize ?

wooden sail Jun 23, 2023, 10:02 AM

#

sleek harbor that was next on my list.. how do I deal with that? The amount of approaches.. L...

this IS dealing with multicollinearity :p

sleek harbor Jun 23, 2023, 10:04 AM

#

wooden sail this IS dealing with multicollinearity :p

ok, but I still don't know the answer to this #data-science-and-ml message to do that correctly :'/

latent shore Jun 23, 2023, 10:08 AM

#

Hey @past meteor can i ask you about K-means clustering in dms?

past meteor Jun 23, 2023, 10:10 AM

#

latent shore Hey @past meteor can i ask you about K-means clustering in dms?

I'd prefer you just ask here

latent shore Jun 23, 2023, 10:10 AM

#

It doesnt let me put pdfs and py files

#

But can you check my python help thread

past meteor Jun 23, 2023, 10:11 AM

#

I'm not going to open PDFs and py files, the discord has a paste functionality

latent shore Jun 23, 2023, 10:11 AM

#

It deletes the code i copy and paste too idk why

#

But i put screenshots on python-help

past meteor Jun 23, 2023, 10:12 AM

#

sleek harbor that was next on my list.. how do I deal with that? The amount of approaches.. L...

What's preventing you from just using regularisation?

#

Considering you're mentioning a polynomial transform I can assume you're using a linear model? Why not use Lasso / elastic net and solve multiple problems at once

sleek harbor Jun 23, 2023, 10:13 AM

#

past meteor What's preventing you from just using regularisation?

well.. regularisation doesn't exactly remove the problem.. it just smooths it out, so to speak. And I do intend to use regularization, on top of whatever else I do :3

past meteor Jun 23, 2023, 10:14 AM

#

Do you know L1 regularisation and specifically Lasso?

sleek harbor Jun 23, 2023, 10:14 AM

#

past meteor Considering you're mentioning a polynomial transform I can assume you're using a...

not necessarily. I'll be using a polynomial transform largely to just catch interactions between features

sleek harbor Jun 23, 2023, 10:16 AM

#

past meteor Do you know L1 regularisation and specifically Lasso?

I do know about it, yes. But can you say with certainty that using it will be the best move in removing unnecessary feature for, say, a tree based model? Probably not, so I'll probably stick to inbuilt regularization methods for whatever model I go with

past meteor Jun 23, 2023, 10:17 AM

#

And we're back to multicollinearity, you need to carefully define what it is you mean by unnecessary feature and say it in the mirror 25 times haha

sleek harbor Jun 23, 2023, 10:17 AM

#

I haven't decided on the model yet, btw. I'm just messing with features so far

past meteor Jun 23, 2023, 10:17 AM

#

For decision trees you can just add 2 features that are 100 % noise and remove all features that have a lower feature importance than those 2

sleek harbor Jun 23, 2023, 10:18 AM

#

past meteor And we're back to multicollinearity, you need to carefully define what it is you m...

by unnecessary I mean, those that contain little to no useful information about the target variable. In this case, I decided to go with mutual information to choose what is and isn't necessary

past meteor Jun 23, 2023, 10:18 AM

#

sleek harbor by unnecessary I mean, those that contain little to no useful information about ...

They might contain information about the target but 2 other features might contain exactly the same info

sleek harbor Jun 23, 2023, 10:19 AM

#

past meteor For decision trees you can just add 2 features that are 100 % noise and remove a...

🤔 I've never heard about that.. I probably won't be trying that now, but if u got a link I could add to my read list, wouldn't say no

past meteor Jun 23, 2023, 10:19 AM

#

Like in general you'll do the same thing right, you'll remove the variable. The nuance is in how you present the results etc etc

sleek harbor Jun 23, 2023, 10:19 AM

#

past meteor They might contain information about the target but 2 other features might conta...

that I'll solve with cross validating sequential feature selection - those that harm the model will automatically be removed

past meteor Jun 23, 2023, 10:20 AM

#

Oh you're going with stepwise 💀

#

https://journalofbigdata.springeropen.com/articles/10.1186/s40537-018-0143-6

SpringerOpen

Step away from stepwise - Journal of Big Data

Background Stepwise regression is a popular data-mining tool that uses statistical significance to select the explanatory variables to be used in a multiple-regression model. Findings A fundamental problem with stepwise regression is that some real explanatory variables that have causal effects on the dependent variable may happen to not be stat...

cold osprey Jun 23, 2023, 10:21 AM

#

latent shore It deletes the code i copy and paste too idk why

!code

arctic wedgeBOT Jun 23, 2023, 10:21 AM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

sleek harbor Jun 23, 2023, 10:21 AM

#

past meteor Oh you're going with stepwise 💀

this time, lets say yes, I go with that 💀

past meteor Jun 23, 2023, 10:21 AM

#

Don't do it, this is such a well studied thing

#

It's not just me

sleek harbor Jun 23, 2023, 10:21 AM

#

pfff.. time to read

latent shore Jun 23, 2023, 10:22 AM

#

cold osprey !code

Thank you

sleek harbor Jun 23, 2023, 10:23 AM

#

past meteor It's not just me

ok, I'll read later.. :3 pause that for a minute
Let's say I don't go with stepwise, but use permutation importance (feature importance) and cross val to select the cutoff threshold.. that'd work right?

past meteor Jun 23, 2023, 10:24 AM

#

Yeah, do it by adding a few features that are 100 % noise (say 2) because that automatically tells you where your cutoff could be
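
A minimal sketch of the noise-probe idea combined with permutation importance. Everything here is an assumption for illustration: the data is synthetic, and ordinary least squares stands in for whatever model is eventually chosen. Any fitted estimator with a held-out set could be dropped in instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X_real = rng.normal(size=(n, 3))
# Hypothetical target: depends on features 0 and 1 only, feature 2 is irrelevant.
y = 3.0 * X_real[:, 0] - 2.0 * X_real[:, 1] + rng.normal(scale=0.5, size=n)

# Append two pure-noise probe columns.
n_probes = 2
X = np.hstack([X_real, rng.normal(size=(n, n_probes))])

# Simple split so importance is measured on held-out rows.
X_train, X_test = X[:350], X[350:]
y_train, y_test = y[:350], y[350:]


def add_bias(a):
    """Append a column of ones for the intercept."""
    return np.hstack([a, np.ones((len(a), 1))])


# Stand-in model: ordinary least squares.
coef, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)


def mse(features):
    return float(np.mean((add_bias(features) @ coef - y_test) ** 2))


baseline = mse(X_test)
importances = []
for j in range(X.shape[1]):
    increases = []
    for _ in range(20):  # repeat the shuffle to average out randomness
        shuffled = X_test.copy()
        shuffled[:, j] = rng.permutation(shuffled[:, j])
        increases.append(mse(shuffled) - baseline)
    importances.append(float(np.mean(increases)))

# Keep only real features that beat the best-scoring noise probe.
probe_cutoff = max(importances[-n_probes:])
selected = [j for j in range(X_real.shape[1]) if importances[j] > probe_cutoff]
print(importances, selected)
```

The cutoff comes from the data rather than from a hand-picked threshold. With strongly correlated features the importances get shared between them, so the multicollinearity caveat from earlier still applies.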

latent shore Jun 23, 2023, 10:25 AM

#

runfile('C:/Users/Ayla/Desktop/sonson.py', wdir='C:/Users/Ayla/Desktop')
C:\Users\Ayla\anaconda3\lib\site-packages\numpy\core\fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
c:\users\ayla\desktop\sonson.py:71: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use matplotlib.colormaps[name] or matplotlib.colormaps.get_cmap(obj) instead.
colors = plt.cm.get_cmap('rainbow', num_labels)
Traceback (most recent call last):

File ~\anaconda3\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)

File c:\users\ayla\desktop\sonson.py:107
plot_clusters(X_transformed, kmeans.centroids, kmeans.labels)

File c:\users\ayla\desktop\sonson.py:77 in plot_clusters
plt.scatter(transformed_centroids[:, 0], transformed_centroids[:, 1], marker='+', color='red', label='Centroids')

IndexError: index 0 is out of bounds for axis 1 with size 0

latent shore Jun 23, 2023, 10:25 AM

#

latent shore runfile('C:/Users/Ayla/Desktop/sonson.py', wdir='C:/Users/Ayla/Desktop') C:\User...

Do you know what this error means?

lapis sequoia Jun 23, 2023, 10:26 AM

#

latent shore runfile('C:/Users/Ayla/Desktop/sonson.py', wdir='C:/Users/Ayla/Desktop') C:\User...

transformed_centroids[:, 0] fails here, so transformed_centroids is probably empty along its second axis (shape (n, 0)) rather than a proper 2D array

sleek harbor Jun 23, 2023, 10:26 AM

#

past meteor Yeah, do it by adding a few features that are 100 % noise (say 2) because tha au...

ok, but that "solves" the correlation problem. But that still doesn't answer this #data-science-and-ml message how to encode cat features before calculating mut info and are results comparable between feature types?

latent shore Jun 23, 2023, 10:26 AM

#

lapis sequoia transformed_centroids[:, 0] , transformed_centroids --> this variable is probabl...

So what can i do

#

#

📎 winequality-red1.csv

lapis sequoia Jun 23, 2023, 10:28 AM

#

latent shore So what can i do

I am not sure how you defined that variable, could you share the snippet of code where you did?

latent shore Jun 23, 2023, 10:28 AM

#

These are codes and data

past meteor Jun 23, 2023, 10:30 AM

#

sleek harbor ok, but that "solves" the correlation problem. But that still doesn't answer *th...

You'd just one-hot or target encode it

sleek harbor Jun 23, 2023, 10:30 AM

#

past meteor You'd just one-hot or target encode it

so which is it??? do I one hot, or do I ordinal?

past meteor Jun 23, 2023, 10:31 AM

#

That's a question for you not me

#

I would not ordinal encode colours because you'll get something like 1, 2, 3, 4, 5

#

And Red (1) - Green (2) being -1 makes no sense

#

I might ordinal encode high cardinality categorical variables if I'm working with a tree algorithm because they have a degree of invariance to this problem

lapis sequoia Jun 23, 2023, 10:33 AM

#

latent shore

thanks. looking at the code, I think the PCA and KMeans classes seem fine. please make sure your "kmeans.centroids" is a 2D array so that you can index along its 2nd axis.
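One way to check that before plotting, as a sketch: `check_centroids` is a hypothetical helper, not part of the shared script. The second half shows that an array with an empty second axis raises the same kind of IndexError as in the traceback.

```python
import numpy as np

def check_centroids(centroids, n_components=2):
    """Raise a readable error if centroids cannot be indexed as [:, 0] and [:, 1]."""
    centroids = np.asarray(centroids, dtype=float)
    if centroids.ndim != 2 or centroids.shape[1] < n_components:
        raise ValueError(
            f"expected shape (k, >={n_components}), got {centroids.shape}"
        )
    return centroids

good = check_centroids([[0.1, 0.2], [1.0, 1.5]])
print(good[:, 0], good[:, 1])

# An array whose second axis is empty fails on [:, 0] with an IndexError.
empty = np.empty((3, 0))
try:
    empty[:, 0]
except IndexError as exc:
    print("IndexError:", exc)
```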

past meteor Jun 23, 2023, 10:33 AM

#

A degree, but it's definitely not perfect

#

That leaves you with one-hot vs target encoding. Target encoding loses a lot of information, but killing your model by expanding high cardinality categories is worse.

sleek harbor Jun 23, 2023, 10:34 AM

#

past meteor And Red (1) - Green (2) being -1 makes no sense

but for feature selection purposes, forget the model. Obv if I ordinally encode a feature for selection I'll reencode as one hot before passing it to the model, but before that. Before we get to the model, it makes a big difference: ord vs one hot
If I select a certain threshold, I'll get vastly different results afterwards

past meteor Jun 23, 2023, 10:35 AM

#

TL;DR

Low cardinality: One hot
High cardinality and tree: ordinal encoder
High cardinality and not a tree: target encoding
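The three options from the TL;DR, sketched with plain pandas on a made-up dataframe (column names and values are illustrative only). The target encoding here is the naive per-level mean; in practice it would be fitted on training folds only to avoid leaking the target.

```python
import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "green", "blue", "green", "red", "blue"],
    "city": ["a", "b", "c", "a", "b", "a"],  # stand-in for a high-cardinality column
    "target": [1, 0, 0, 1, 1, 0],
})

# Low cardinality: one-hot, one 0/1 column per level.
one_hot = pd.get_dummies(df["colour"], prefix="colour", dtype=int)

# High cardinality with a tree model: integer codes (the order is arbitrary).
ordinal = df["city"].astype("category").cat.codes

# High cardinality without a tree: replace each level by the mean target of that level.
target_enc = df["city"].map(df.groupby("city")["target"].mean())

print(one_hot.columns.tolist())
print(ordinal.tolist())
print(target_enc.round(3).tolist())
```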

past meteor Jun 23, 2023, 10:36 AM

#

sleek harbor but for feature selection purposes, forget the model. Obv if I ordinally encode ...

It depends on how MI is specifically calculated

#

If it treats it like something continuous then ordinal encoding would not make sense

sleek harbor Jun 23, 2023, 10:37 AM

#

past meteor It depends on how MI is specifically calculated

it treats it as discrete categories (continuous gets binned)

#

*actually implementations differ, but that's the easiest explanation

past meteor Jun 23, 2023, 10:38 AM

#

How is the binning done, Is it done exactly at the cutoffs of your ordinal values or do some get grouped?

latent shore Jun 23, 2023, 10:38 AM

#

lapis sequoia thanks. looking at the code, I think PCA and KMeans classes seems fine. please m...

Okay

#

Thank you

#

If you can help with editing the code let me know please @lapis sequoia

past meteor Jun 23, 2023, 10:39 AM

#

Even if it does it at exactly the right granularity, the issue is that, imo, you still want to OHE it at the feature selection level so you can see which specific levels are relevant.

sleek harbor Jun 23, 2023, 10:39 AM

#

past meteor How is the binning done, Is it done exactly at the cutoffs of your ordinal value...

binning is for continuous variable, they just get grouped into bins, like for a histogram, and those bins are used as categories to calculate the score. But that's not the main question. What do I do with nominal categorical variables is what I'm asking, not continuous

past meteor Jun 23, 2023, 10:40 AM

#

sleek harbor binning is for continuous variable, they just get grouped into bins, like for a ...

... yes but does your implementation treat nominal categorical as continuous

sleek harbor Jun 23, 2023, 10:41 AM

#

past meteor Even if it it does it at exactly the right granularity, the issue is still that ...

as discrete* so either ohe or ordinal encoding works

sleek harbor Jun 23, 2023, 10:41 AM

#

sleek harbor but for feature selection purposes, forget the model. Obv if I ordinally encode ...

but like this

past meteor Jun 23, 2023, 10:41 AM

#

Are you using scikit-learn's implementation? I'll just read the docs

lapis sequoia Jun 23, 2023, 10:41 AM

#

sleek harbor but for feature selection purposes, forget the model. Obv if I ordinally encode ...

is it kaggle titanic problem?

sleek harbor Jun 23, 2023, 10:41 AM

#

past meteor Are you using sci-kit learn's implementation? I'll just read the docs

yeah. The docs aren't very clear on anything tho

sleek harbor Jun 23, 2023, 10:41 AM

#

lapis sequoia is it kaggle titanic problem?

yeah

lapis sequoia Jun 23, 2023, 10:42 AM

#

What is your doubt exactly? I couldn't catchup with the convo

sleek harbor Jun 23, 2023, 10:43 AM

#

lapis sequoia What is your doubt exactly? I couldn't catchup with the convo

this is the original question: #data-science-and-ml message

past meteor Jun 23, 2023, 10:47 AM

#

Based on the docs it looks like if you make them ordinal and then specify that it's discrete you should be fine but then indeed you're computing the MI for the entire feature and not the levels

#

So if you want to drop a feature in its totality that works

#

If you want to know what specific levels have a high MI, then you should OHE
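A small sketch of both views with scikit-learn's `mutual_info_classif`, on a toy "deck" column where only level A carries signal (the data is made up, not the Titanic set). With `discrete_features=True` the ordinal codes are treated as category labels, so their arbitrary order should not matter.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
deck = rng.choice(list("ABCD"), size=500)
# Toy target: only deck "A" carries signal.
y = ((deck == "A") & (rng.random(500) < 0.9)).astype(int)

# Whole-feature view: ordinal codes, flagged as discrete.
codes = pd.Series(deck).astype("category").cat.codes.to_numpy().reshape(-1, 1)
mi_feature = mutual_info_classif(codes, y, discrete_features=True, random_state=0)

# Per-level view: one binary column per deck.
ohe = pd.get_dummies(pd.Series(deck), dtype=int)
mi_levels = mutual_info_classif(ohe.to_numpy(), y, discrete_features=True, random_state=0)

print("whole feature:", mi_feature[0])
print(dict(zip(ohe.columns, mi_levels)))
```

In this discrete case each per-level score should come out no larger than the whole-feature score, since a single on/off column is a coarser version of the full feature. That is worth keeping in mind when one threshold is applied to both kinds of score.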

past meteor Jun 23, 2023, 10:48 AM

#

sleek harbor this is the original question: https://discord.com/channels/267624335836053506/3...

Does this answer your question or did I miss something?

past meteor Jun 23, 2023, 10:49 AM

#

sleek harbor but for feature selection purposes, forget the model. Obv if I ordinally encode ...

For the titanic dataset specifically the second plot makes more sense, not all decks are important so you could drop those levels once you one hot encode. The first plot doesn't give you that information

sleek harbor Jun 23, 2023, 10:51 AM

#

past meteor If you want to know what specific levels have a high MI, then you should OHE

if I OHE, will the resulting scores be comparable to scores of other features? As far as I understand - no, and that would kinda defeat the purpose, because I wouldn't be able to set a threshold for elimination. Unless, that is, I set a separate threshold for each group of features created by the one hot encoding.. but then I'd end up with many thresholds.

And the other unrelated question: are MI scores between categorical features comparable to MI scores between continuous features?

past meteor Jun 23, 2023, 10:52 AM

#

sleek harbor if I OHE, will the resulting scores be comparable to scores of other features? A...

It will be comparable I believe

lapis sequoia Jun 23, 2023, 10:53 AM

#

sleek harbor if I OHE, will the resulting scores be comparable to scores of other features? A...

for continuous, there could be a little loss of information since we bin the values, so it depends. but still comparable in most cases.

past meteor Jun 23, 2023, 10:53 AM

#

(I still believe you should just use proper regularization and then you can leave this be)

#

Plot your data vs. the target etc

lapis sequoia Jun 23, 2023, 10:57 AM

#

sleek harbor How do you encode categorical variables before doing feature selection with mutu...

I think @past meteor already answered most part of it. As, its more about experimentation and results could still vary irrespective of MI scores. maybe you can calculate MI individually, eliminate few categories after OHE. Also keep the original feature and concatenate that with selected OHE features, that way probably some of the information about eliminated categories will still exist for model to learn.

sleek harbor Jun 23, 2023, 10:57 AM

#

past meteor It will be comparable I believe

U sure OHE will be comparable to others? I mean, all the information that they contain is just on/off for one category of the initial feature.. that doesn't seem like a lot, tho it can be useful.. Say you have a high cardinality feature, with a bunch of categories, but OHE is basically dividing their importance by the cardinality, making their MI scores very low, meaning they might just all get eliminated, if you use a threshold comparable to the others.. Idk if I explained this well, if not I can try clarifying what I mean

sleek harbor Jun 23, 2023, 10:59 AM

#

lapis sequoia for continuous, there could be little loss of infomation as we will be doing bin...

hmm.. ok, that kinda makes sense, maybe, I think 🤔 or not? I mean, discrete scores are deterministic, where continuous ones aren't

wooden sail Jun 23, 2023, 10:59 AM

#

makes sense after normalization, considering that misclassification is a full mistake

#

since the classes are orthogonal to each other, mistakes there automatically yield a large distance

sleek harbor Jun 23, 2023, 11:02 AM

#

wooden sail since the classes are orthogonal to each other, mistakes there automatically yie...

this kinda went over my head.. 😅 where r classes orthogonal?

wooden sail Jun 23, 2023, 11:03 AM

#

when you use one hot

lapis sequoia Jun 23, 2023, 11:04 AM

#

sleek harbor hmm.. ok, that *kinda* makes sense, maybe, I think 🤔 or not? I mean, descrete s...

yes right, it's a bit tricky for continuous as it depends on number of bins. high bins --> overfit, too low bins --> we miss useful patterns. I try to plot the features to judge what could be the ideal bins to be used.

sleek harbor Jun 23, 2023, 11:05 AM

#

wooden sail when you use one hot

so.. normalizing after one hot makes the scores comparable?

latent shore Jun 23, 2023, 11:05 AM

#

@lapis sequoia the code worked but how can i make it do iterations?

wooden sail Jun 23, 2023, 11:06 AM

#

one hot is normalized by default. it makes an orthonormal basis for the labels

#

if you normalize the other scores, you can compare them to ones derived from one hot vectors

lapis sequoia Jun 23, 2023, 11:07 AM

#

latent shore @lapis sequoia the code worked but how can i make it do iterations?

iterations for? I think you are already doing Kmeans iterations while kmeans.fit in your code.

latent shore Jun 23, 2023, 11:08 AM

#

lapis sequoia iterations for? I think you are already doing Kmeans iterations while kmeans.fit...

The professor wants us to show the outputs of 4 iterations, idk how. I will send screenshots of what output she expects

sleek harbor Jun 23, 2023, 11:08 AM

#

wooden sail if you normalize the other scores, you can compare them to ones derived from one...

huh.. never thought about that. looks like sklearn standardizes all features by default when calculating MI, so I guess the scores are comparable

#

that kinda clicked now, thx

lapis sequoia Jun 23, 2023, 11:09 AM

#

latent shore The professor wants us to make 4 iterations out puts idk how i will send screen...

probably you can use k means elbow method and show the plot of clustering results based on different number of clusters, or simply print out output vectors shape.

sleek harbor Jun 23, 2023, 11:10 AM

#

@past meteor @lapis sequoia thx for ur time, that was really helpful. I think I kinda get it now, tho will likely wake up tomorrow and have to revisit the entire conversation again.. :p

latent shore Jun 23, 2023, 11:10 AM

#

lapis sequoia probably you can use k means elbow method and show the plot of clustering result...

#

The output i get is the first iteration plot only

#

It doesnt give me 2nd 3rd and 4th

lapis sequoia Jun 23, 2023, 11:12 AM

#

maybe the if condition in your Kmeans class (fit function) is already satisfied at the 1st iteration (and it breaks the iteration loop), you may try removing it once?

latent shore Jun 23, 2023, 11:13 AM

#

Remove the condition?

#

I will try

lapis sequoia Jun 23, 2023, 11:13 AM

#

just keep self.iteration>200 for once
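For reference, a minimal NumPy sketch of a k-means loop that keeps the centroids and labels of every iteration so each one can be plotted. `kmeans_history` is a hypothetical stand-in, not the KMeans class from the shared script, and it has no convergence check on purpose so all 4 iterations are recorded.

```python
import numpy as np

def kmeans_history(X, k, n_iter=4, seed=0):
    """Run n_iter k-means updates and return (centroids, labels) after each one."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    history = []
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if it has none.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        history.append((centroids.copy(), labels.copy()))
    return history

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 3, 6)])
history = kmeans_history(X, k=3)
for i, (centroids, labels) in enumerate(history, start=1):
    print(f"iteration {i}:", np.round(centroids, 2).tolist())
```

The guard for empty clusters also avoids taking the mean of an empty slice, which is what NumPy's "Mean of empty slice" warning complains about.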

past meteor Jun 23, 2023, 11:13 AM

#

Like nis says, no substitute for empirically validating if your model is better with or without the feature 🙂

past meteor Jun 23, 2023, 11:32 AM

#

I wonder why factor analysis isn't more common in CS-aligned data science communities: https://github.com/MaxHalford/prince

#

lapis sequoia Jun 23, 2023, 11:35 AM

#

past meteor I wonder why factor analysis isn't more common in CS-aligned data science commun...

I use FA often, at least when I want to compare results with PCA. We even used it in our winning solution for one of the featured kaggle competitions.

past meteor Jun 23, 2023, 11:38 AM

#

Great to hear. I'm a kaggler myself but I do it strictly for fun 🙂

#

Tend to do the tabular playground series. Sometimes with people from my cohort over the weekend or so. Never won anything though 🤣

lapis sequoia Jun 23, 2023, 11:40 AM

#

past meteor Tend to do the tabular playground series. Sometimes with people from my cohort o...

aaha that's nice, winning is not important, it's the learning which interests me. I am competing for 3-4 years now haha.

past meteor Jun 23, 2023, 11:41 AM

#

Yup, Kaggle + Lurking on Reddit + university (in no particular order) were what taught me data science. It's a great community.

lapis sequoia Jun 23, 2023, 11:43 AM

#

Hahah quite the same for me, I recently joined full time, but the community still interests me to be active in competitions and also contribute.

past meteor Jun 23, 2023, 11:43 AM

#

It's funny because recently I had an idea to make something where you have an LLM that has the right answers to projects (could be data, could be software) where people submit their answers and get feedback from the LLM on how to improve based on 1 or more model solutions.

lapis sequoia Jun 23, 2023, 11:45 AM

#

past meteor It's funny because recently I had an idea to make something where you have an LL...

That's a nice use case.

past meteor Jun 23, 2023, 11:45 AM

#

Several technical challenges in making it but overall the idea is to get "senior dev" tier advice based on model solutions to help people improve their skills - for free. A blue sky idea, I know

lapis sequoia Jun 23, 2023, 11:47 AM

#

haha I won't claim myself as an expert in LLMs, but yeah this sounds interesting. So, you are planning to finetune the LLM models or do some prompt engineering and build an app/software upon it?

past meteor Jun 23, 2023, 11:48 AM

#

I'm in a research lab. I'll let the idea "simmer" with our LLM gurus first. Initially my idea was to just prompt engineer and build an app on top of it.

#

Scope isn't big enough to be done at my work I think so my guess is that I'll try it as a hobby project 🙂

lapis sequoia Jun 23, 2023, 11:49 AM

#

ok_handbutflipped

unique flame Jun 23, 2023, 12:06 PM

#

My labelimg programs closes when labelling. I labelled yesterday and had three classes. I then continued today and it just keeps crashing. When looking at the classes.txt it gets changed to the first class I draw. Anyone had a similar thing and know how to fix?

unique flame Jun 23, 2023, 12:39 PM

#

Nvm found a workaround

potent sky Jun 23, 2023, 1:59 PM

#

@past meteor if ydm, what did you end up doing for that synthetic tabular data generation?

past meteor Jun 23, 2023, 2:01 PM

#

Haven't done it yet but my approach is clear - I'll make a graphical model and generate it from there. If I want to make it more noisy I'll add a VAE in there.

#

But it should be possible to make it pretty noisy at the PGM level already

potent sky Jun 23, 2023, 2:05 PM

#

past meteor Haven't done it yet but my approach is clear - I'll make a graphical model and g...

makes sense. you were looking for a simpler approach right? none convincing?

past meteor Jun 23, 2023, 2:06 PM

#

Simple but it needs to feel realistic. We might use the data for internal / external training so making it yourself means you get to inject whatever issues you want to cover.

potent sky Jun 23, 2023, 2:20 PM

#

ah right, fair enough

pine escarp Jun 23, 2023, 3:19 PM

#

Best site to practice pandas.

sleek harbor Jun 23, 2023, 3:21 PM

#

pine escarp Best site to practice pandas.

I think just sticking to ur bio is a good idea

past meteor Jun 23, 2023, 3:22 PM

#

I'd say, read part of the Pandas user guide on their website and then do some Kaggle @pine escarp

boreal gale Jun 23, 2023, 3:24 PM

#

pine escarp Best site to practice pandas.

try to help in this discord 😉 people come with interesting (explicitly related to pandas or otherwise) problem, solving them is good practice!
also maybe https://github.com/ajcr/100-pandas-puzzles
and https://pandas.pydata.org/docs/getting_started/tutorials.html

pine escarp Jun 23, 2023, 3:26 PM

#

Thank you all for your suggestions, I'll definitely try them all. emoji_71

#

I have been looking for a site that offers exercises to do.

sleek harbor Jun 23, 2023, 3:27 PM

#

past meteor I'd say, read part of the Pandas user guide on their website and then do some Ka...

should you standardize one hot encoded features if you standardize all your other features? Specifically asking about standardizing (mean 0, std 1), not normalizing. I think you should, but.. for some reason I couldn't find any confirmation on google.. :/

lapis sequoia Jun 23, 2023, 3:27 PM

#

pine escarp Best site to practice pandas.

kaggle mini courses are good as well.

sleek harbor Jun 23, 2023, 3:27 PM

#

pine escarp I have been looking for a site that offers exercises to do.

If you pay, you can do some exercises on datacamp, but.. the exercises are really bad, I wouldn't recommend. Just kaggle and practice

pine escarp Jun 23, 2023, 3:28 PM

#

Thank you, I'll try them!

pine escarp Jun 23, 2023, 3:28 PM

#

sleek harbor If you pay, you can do some exercises on datacamp, but.. the exercises are reall...

Ohh, I see. Thank you.

#

emoji_71

sleek harbor Jun 23, 2023, 3:29 PM

#

pine escarp Ohh, I see. Thank you.

are you only interested in pandas, or numpy and visualization too?

pine escarp Jun 23, 2023, 3:32 PM

#

sleek harbor are you only interested in pandas, or numpy and visualization too?

Yes!

#

I'm still a beginner, so I'm starting with pandas.

#

I'll learn numpy and matplotlib next.

#

And SQL simultaneously.

#

Pandas so far have been interesting.

pine escarp Jun 23, 2023, 3:34 PM

#

lapis sequoia kaggle mini courses are good as well.

Train me to smoke data as well. xD

lapis sequoia Jun 23, 2023, 3:35 PM

#

pine escarp Train me to smoke data as well. xD

😂😂

#

Can't reveal the secret sauce, but getting addicted to kaggle was the key for me ;)) might be something else for you.

past meteor Jun 23, 2023, 3:36 PM

#

sleek harbor should you standardize one hot encoded features if you standardize all your othe...

Yes if you're using L1 or L2, it's critical that your variables are on the same scale then

sleek harbor Jun 23, 2023, 3:36 PM

#

pine escarp I'll learn numpy and matplotlib next.

well once you're done with the theory for those, if you want some beginner friendly exercises, I'd recommend the 5 ones at the end of this course: https://www.freecodecamp.org/learn/data-analysis-with-python/#data-analysis-with-python-projects
I don't remember much about them honestly, but I remember I enjoyed them. That said, they aren't exactly very good from the perspective of "high quality material", there are a few bugs in the exercises themselves (which you may or may not encounter, depending on your approach). But I think that's a good thing, cus you get to try figuring out exactly what is wrong. Long story short, try it if u feel like it. Don't have to go through the material if u already know the theory, can just do the exercises

past meteor Jun 23, 2023, 3:38 PM

#

The strength of your regularisation would be a teeny tiny bit stronger on your OHE'd variables

#

I think either way it might not make a difference. I mostly don't do it I think

sleek harbor Jun 23, 2023, 3:39 PM

#

but there's no harm if I feel like it, right?)

pine escarp Jun 23, 2023, 3:40 PM

#

sleek harbor well once you're done with the theory for those, if you want some beginner frien...

Thank youuu! I'll try them. kannapog

pine escarp Jun 23, 2023, 3:41 PM

#

lapis sequoia Can't reveal the secret sauce, but getting addicted to kaggle was the key for me...

Kaggle, I'll definitely try it.

hasty mountain Jun 23, 2023, 3:58 PM

#

pseudo moon denormalize ?

Yes. I don't know if you're using the MSE Loss for your VAE, but, in reality, the Decoder doesn't generate any image, it just generates parameters for a distribution (for RGB images, usually a Normal distribution, and for grayscale, usually Bernoulli).

So, in order to convert each value in the Decoder output from a Normal distribution to a proper image, you have to denormalize it by multiplying by the dataset standard deviation and adding the dataset mean (which would be the mean of the Normal distribution).

with torch.no_grad():
    input_noise = torch.randn_like(z).unsqueeze(-1).unsqueeze(-1)

    saving_image = decoder(input_noise)
                
saving_image = saving_image.view(saving_image.size(0), saving_image.size(2), saving_image.size(3), saving_image.size(1))
saving_image = saving_image.cpu().numpy()
saving_image = (saving_image * STD) + MEAN

#

||Yes, I'm the kind of guy who reject the use of .permute() in favor of .view() to convert torch tensors to numpy arrays||
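A small check of what the two operations do to the layout, sketched in NumPy for convenience: `reshape` plays the role of `.view()` and `transpose` the role of `.permute()`. The tiny tensor is made up for illustration.

```python
import numpy as np

# One "image" in channels-first layout: (N, C, H, W) = (1, 3, 2, 2).
# Channel 0 holds 0..3, channel 1 holds 4..7, channel 2 holds 8..11.
x = np.arange(12).reshape(1, 3, 2, 2)

reshaped = x.reshape(1, 2, 2, 3)       # reinterprets the same memory order
transposed = x.transpose(0, 2, 3, 1)   # actually moves the channel axis last

# Pixel (0, 0) should hold one value from each channel: 0, 4, 8.
print(reshaped[0, 0, 0])    # [0 1 2]
print(transposed[0, 0, 0])  # [0 4 8]
```

Both results have shape (N, H, W, C), but only the transposed one keeps each pixel's channels together, so reshaping alone can scramble the saved image.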

#

Using this approach, your output goes from something like this:

#

To something like this:

pseudo moon Jun 23, 2023, 4:10 PM

#

Interesting, but I was talking about DAE/denoising autoencoder which I believe is different from VAE?

hasty mountain Jun 23, 2023, 4:10 PM

#

It is? pithink

#

I thought Denoising AutoEncoder was a Variational AutoEncoder optimized in a way that the generative factor is severely decreased

#

Ok, sorry then. Your model probably has nothing to do with what I said. yert

pseudo moon Jun 23, 2023, 4:15 PM

#

hasty mountain I thought Denoising AutoEncoder was a Variational AutoEncoder optimized in a way...

I've never thought of it that way pithink

past meteor Jun 23, 2023, 4:17 PM

#

Regular autoencoders can be used for denoising as well

pseudo moon Jun 23, 2023, 4:17 PM

#

That is true

past meteor Jun 23, 2023, 4:17 PM

#

The general idea would be to add noise to your input and try to reconstruct the original

pseudo moon Jun 23, 2023, 4:18 PM

#

Do you think autoencoders should include batchnorm?

hasty mountain Jun 23, 2023, 4:18 PM

#

The term "autoencoder" became very confusing to me after I learned about diffusion models...
Every paper on latent diffusion uses the term "autoencoder" to actually refer to "variational autoencoder", but they never use the "variational" term.

past meteor Jun 23, 2023, 4:18 PM

#

If the bottleneck is small enough you could also just denoise naturally without adding noise.

#

Been a while since I used any autoencoder. I did a fun project some time ago where I used an AE to do an adversarial attack on resnet-50

#

Can also be used to self-supervised train nearly everything, cool stuff, cool stuff, ...

potent sky Jun 23, 2023, 6:03 PM

#

pine escarp Best site to practice pandas.

there are a few github repos that basically go through a series of mini projects that involve heavy use of pandas
and they're designed in such a way that you get exposed to the different features
that could be helpful
but definitely kaggle comps + pandas user guide 💯

potent sky Jun 23, 2023, 6:04 PM

#

hasty mountain The term "autoencoder" became very confusing to me after I learned about diffusi...

yeah VAEs
ig they kinda laid it out in the stable diffusion paper so it became shorthand

grave summit Jun 23, 2023, 6:09 PM

#

hy guys

#

I have a Pandas dataframe containing a time series, one column contains every day of the year 2022 with every hour, so we get 24 rows per day and the other column is a price

#

I would like to calculate the mean price of each month, I built a DIY solution with counters variables etc but it somehow fucks up at the end and includes more rows than wanted for each month from february on

#

i would like to know if there is a quick method to get the prices for the days of a given month only by selecting only these rows to calculate the mean

#

i was able to do it by comparing the date with operators to each beginning of a new month

tidal bough Jun 23, 2023, 6:13 PM

#

grave summit i would like to know if there is a quick method to get the prices for the days o...

I'd add a month column and groupby by it.

#

and the month column you can calculate by... well, depends on what way your date is represented. either something from pd.Series.dt, or something from pd.Series.str.

grave summit Jun 23, 2023, 6:14 PM

#

i did it by pd.to_datetime

#

with a format

#

Y-m-d H

tidal bough Jun 23, 2023, 6:16 PM

#

pandas has a lot of stuff for manipulating datetime columns, such as https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.month.html.
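Put together, the month-column approach could look roughly like this. The dataframe is a made-up stand-in with the layout described above (hourly rows for 2022, dates stored as "Y-m-d H" strings), and the column names `date` and `price` are assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in data: one row per hour of 2022, with random prices.
stamps = np.arange("2022-01-01T00", "2023-01-01T00", dtype="datetime64[h]")
df = pd.DataFrame({
    "date": pd.Series(stamps).dt.strftime("%Y-%m-%d %H"),  # strings like "2022-01-01 00"
    "price": np.random.default_rng(0).uniform(20, 200, size=len(stamps)),
})

# Parse the strings, derive the month, then group by it.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H")
df["month"] = df["date"].dt.month
monthly_mean = df.groupby("month")["price"].mean()
rows_per_month = df.groupby("month").size()

print(monthly_mean.round(2))
print(rows_per_month.loc[2])  # hourly rows in February
```

Checking `rows_per_month` is a quick way to confirm that no month picks up extra rows.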

grave summit Jun 23, 2023, 6:17 PM

#

ok im going to work with this

#

thanks alot

#

#

I got this

#

@tidal bough

tidal bough Jun 23, 2023, 6:33 PM

#

what am I supposed to see here? I don't see a month column.

hasty mountain Jun 23, 2023, 6:45 PM

#

potent sky yeah VAEs ig they kinda laid it out in the stable diffusion paper so it became s...

AutoEncoders are really forsaken.

#

They have such a nice theory and idea, but it's so difficult to find anyone explaining it correctly. It's just "Make neural network, output smaller than input, then make another neural network, output equal desired image"

wooden sail Jun 23, 2023, 6:48 PM

#

for a regular autoencoder, that's really mostly it lol

#

what else do you think is missing?

hasty mountain Jun 23, 2023, 6:53 PM

#

It just feels a bit... too simple...and boring.

wooden sail Jun 23, 2023, 6:53 PM

#

the easiest way to think of it is that you generate a pair of functions, and they are inverses of each other

#

that's... all, really

hasty mountain Jun 23, 2023, 6:53 PM

#

pithink

wooden sail Jun 23, 2023, 6:54 PM

#

you don't need the "encoded" part to be lower dimensional

#

it doesn't have to be images

#

the architecture of each of the networks doesn't matter, varies by application

#

the cost func also

#

"autoencoder" is just a very general framework. that's why it's "boring"

#

you can use it in conjunction with any concrete application and details

hasty mountain Jun 23, 2023, 6:55 PM

#

I'm not used to simple things in neural networks...not since I discarded the use of keras

wooden sail Jun 23, 2023, 6:55 PM

#

i work a lot with things that are technically autoencoders, and you wouldn't recognize them at all

#

stuff like task based machine learning also falls under autoencoders a lot of the time

#

same with self supervised learning

#

in some sense, it's even just an interpretation of a network

#

cuz it doesn't have to be 2 in the first place

hasty mountain Jun 23, 2023, 6:56 PM

#

pithink

wooden sail Jun 23, 2023, 6:57 PM

#

(it also doesn't have to be "networks")

potent sky Jun 23, 2023, 6:57 PM

#

wooden sail the easiest way to think of it is that you generate a pair of functions, and the...

This ^

hasty mountain Jun 23, 2023, 6:57 PM

#

So...if I have a function that receives a 32x32x3 image...and outputs a single value (0 or 1)...can I call it an autoencoder?

wooden sail Jun 23, 2023, 6:58 PM

#

no because it doesn't satisfy the main condition stargazer just referenced again

hasty mountain Jun 23, 2023, 6:58 PM

#

Oh, yes...indeed...

wooden sail Jun 23, 2023, 6:58 PM

#

if you took the image and did whatever you want with it, spitting whatever out

#

then you take that whatever and remake something very close to the original image

#

you have yourself an autoencoder

hasty mountain Jun 23, 2023, 6:59 PM

#

Data augmentation = autoencoder?

wooden sail Jun 23, 2023, 6:59 PM

#

you could generate 10 million petabytes of data as that whatever, and it's still an autoencoder

hasty mountain Jun 23, 2023, 6:59 PM

#

pithink

lapis sequoia Jun 23, 2023, 6:59 PM

#

any network performing compression and reconstruction is basically AE ig

wooden sail Jun 23, 2023, 6:59 PM

#

it doesn't even have to be compression

potent sky Jun 23, 2023, 6:59 PM

#

Any function

#

And then inversing

wooden sail Jun 23, 2023, 6:59 PM

#

that's one of the most common cases because you use them in parameter estimation, but it does not have to be the case

#

literally just a forward-inverse pair

#

the key idea that makes it so powerful is that, if you put the pieces together in a special way, you can get the forward function to give you something useful, and the training is done based on the inverse function's output

#

essentially making the pair of functions "train themselves"

#

you only need input data, not even labeled pairs

#

but that's on the application side and varies greatly depending on what you want to do

#

the main idea is: forward-inverse pair

hasty mountain Jun 23, 2023, 7:02 PM

#

I think I'm an autoencoder fan and didn't even know that...because I enjoy using self-learning and unsupervised learning models...

wooden sail Jun 23, 2023, 7:02 PM

#

hasty mountain I think I'm an autoencoder fan an didn't even know that...because I enjoy using ...

these are often autoencoders

#

not always. but often times

#

in the very classical sense, like clustering, that's not an autoenc

hasty mountain Jun 23, 2023, 7:03 PM

#

Maybe I should review the maths on a loss function I've been using to extract the minimum entropy... It uses the argmax of a softmax function as label pithink

wooden sail Jun 23, 2023, 7:05 PM

#

let your mind be flexible

#

a lot of problems can be solved by noticing two things are actually the same thing if you close one eye and tilt your head

#

that's a big part of research and problem solving

#

so yeah, review your maths and get your intuition rolling

past meteor Jun 23, 2023, 7:50 PM

#

hasty mountain Maybe I should review the maths on a loss function I've been using to extract th...

Small bits I wanna add:

AE's are a bit similar to PCA in their application domain. PCA explicitly has orthogonal eigenvectors. AE's do not (think of what cons this specifically has). PCA is a linear method (obvious cons). Solution(s): people are looking towards beta-VAEs as well.

#

There's kernel PCA that has the best of both worlds. I like kernel methods in general but you can barely apply them in reality

silent spire Jun 23, 2023, 7:52 PM

#

Can anyone explain siamese neural networks like i'm a 5 year old

past meteor Jun 23, 2023, 7:55 PM

#

silent spire Can anyone explain siamese neural networks like i'm a 5 year old

Take a neural network. You have 3 inputs, 2 are of the same class and 1 isn't.

Give it input 1 and predict vector V1 (positive class)
Give it input 2 and predict vector V2 (positive class)
Give it input 3 and predict vector V3 (negative class)

You compute the Euclidean distance between all points. You reward the model for having a low distance between V1 and V2 but also being far from V3. (triplet loss)

Is that clear enough?
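As a sketch, a common form of that loss in NumPy. It assumes V1 is treated as the anchor and uses a margin of 1.0; both choices are assumptions, not something fixed by the explanation above.

```python
import numpy as np

def triplet_loss(v1, v2, v3, margin=1.0):
    """Hinge-style triplet loss with v1 as anchor, v2 as positive, v3 as negative."""
    d_pos = np.linalg.norm(v1 - v2)  # should be small
    d_neg = np.linalg.norm(v1 - v3)  # should be large
    return max(d_pos - d_neg + margin, 0.0)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])

# Negative far away: nothing left to penalise.
print(triplet_loss(anchor, positive, np.array([3.0, 0.0])))  # 0.0
# Negative almost as close as the positive: the loss is positive.
print(triplet_loss(anchor, positive, np.array([0.2, 0.0])))
```

The same network produces all three vectors, which is where the "siamese" (shared weights) part comes in.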

hasty mountain Jun 23, 2023, 7:57 PM

#

past meteor Small bits I wanna add: AE's are a bit similar to PCA in their application dom...

Hm...interesting...
https://lilianweng.github.io/posts/2018-08-12-vae/#beta-vae

past meteor Jun 23, 2023, 7:58 PM

#

Important thing to remember next to all the math is that you want to get that sweet sweet property back that you had in PCA where each component (~ neuron in the information bottleneck) carries different information (is orthogonal)
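To make the "each component is orthogonal" property concrete, here is a small sketch of PCA via the SVD on made-up data. The data, the random seed and the bottleneck size of 2 are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples, 5 correlated features (made up for illustration).
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                 # PCA works on centered data

# Rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                     # keep a 2-dimensional bottleneck
codes = Xc @ components.T               # "encode"
X_hat = codes @ components              # "decode"

gram = components @ components.T        # identity if the components are orthonormal
code_cov = np.cov(codes, rowvar=False)  # off-diagonals near 0: the codes are uncorrelated
print(np.round(gram, 6))
```

A plain autoencoder trained on the same data gives no such guarantee for its bottleneck units, which is the gap the beta-VAE discussion is about.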

#

Bit of a handwavy explanation but Edd can fill in the blanks haha

hasty mountain Jun 23, 2023, 7:59 PM

#

Each neuron -> a different component? pithink

past meteor Jun 23, 2023, 8:02 PM

#

From my slides of uni. That information bottleneck exists in both PCA and autoencoders

#

Hence why we covered them in the same lecture

#

Does this make sense or is it confusing?

wooden sail Jun 23, 2023, 8:04 PM

#

you can set constraints either on G or on z to get (near) orthogonality in the representing basis, like in pca

#

you don't necessarily need that part though. really depends on the problem at hand

#

i can give you an example of one we deal with at work

past meteor Jun 23, 2023, 8:05 PM

#

A position I applied for was using beta-VAE's specifically in physics

#

Because for them it was important and interesting that they were (near) orthogonal

wooden sail Jun 23, 2023, 8:07 PM

#

we often want to solve a so-called "inverse problem", where we measure data, we have some idea of the physical process, and we want to extract relevant parameters. so what we can do is take the parameters and let them be x in this image. use a physically-motivated forward model, some approximate solution of a differential equation, and let it be G. this part may or may not have trainable parameters. this spits out a z, which is technically "encoded" if you choose to interpret this as an encoder, but usually z is much higher dimensional than x in this application. then let a deep network be F, and have it estimate x. once all is said and done, F can be used in stand-alone fashion on real measurement data, provided that our G was good enough, and it inverts the problem. e.g. like locating objects using x-rays
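A toy sketch of that x -> G -> z -> F -> x pipeline. Everything here is an assumption made for illustration: the forward model G is a random linear map with a little noise rather than a solution of a differential equation, and F is a linear least-squares map standing in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy forward model G: 2 physical parameters -> 10 measurements.
A = rng.normal(size=(10, 2))

def G(x, noise=0.01):
    """Simulate measurements z from parameters x (rows are samples)."""
    z = x @ A.T
    return z + noise * rng.normal(size=z.shape)

# 1) Simulate training pairs (x, z) with the forward model.
x_train = rng.uniform(-1.0, 1.0, size=(500, 2))
z_train = G(x_train)

# 2) Fit F so that F(z) ~ x. A linear least-squares map stands in for the deep network.
W, *_ = np.linalg.lstsq(z_train, x_train, rcond=None)

def F(z):
    return z @ W

# 3) Use F stand-alone on a new "measurement".
x_true = np.array([[0.3, -0.7]])
x_est = F(G(x_true))
print(x_true, x_est)
```

Note that z is higher dimensional than x here, as in the message, so G is not a compressing encoder.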

#

the properties that G and z satisfy depend entirely on the application in general though, so yeah. that's where your domain expertise comes in

past meteor Jun 23, 2023, 8:10 PM

#

That's really interesting

#

Gonna shill my own stuff: like I mentioned, last time I used an autoencoder it was to make image-dependent noise that works on out-of-sample examples and turns every example into an airplane

#

So essentially, it's a cheap and easy way to "attack" any model, provided you have access to its gradients which is totally unrealistic except in toy problems haha

wooden sail Jun 23, 2023, 8:17 PM

#

ooh nice

iron basalt Jun 23, 2023, 9:03 PM

#

hasty mountain It just feels a bit... too simple...and boring.

Autoencoders encode, automatically.

#

The encoding can be whatever.

iron basalt Jun 23, 2023, 9:04 PM

#

wooden sail it doesn't even have to be compression

^

barren fable Jun 24, 2023, 12:34 AM

#

i'm confused a bit in these pics, as i can understand the first column in has heart disease (True Positive) represents the actual data of training data am i right or wrong? and if im wrong what r 4 corners refer to?

serene scaffold Jun 24, 2023, 12:37 AM

#

barren fable i'm confused a bit in these pics, as i can understand the first column in has he...

so the point of the algorithm is to figure out if someone has heart disease or not. and there are two different ways the algorithm can be incorrect: it can say that someone without heart disease does have it (a false positive), or it can say that someone with heart disease doesn't have it (a false negative)

#

whereas if the algorithm says they have it, and they actually do, that's a true positive. and you can probably figure out what a true negative is.

barren fable Jun 24, 2023, 12:42 AM

#

serene scaffold so the point of the algorithm is to figure out if someone has heart disease or n...

i will explain to u what i understood and if i got smt wrong tell me the right thing, first
so we divide the data whatever into 4 fold cross validation or whatever.. so we divide the data let's say into 75% of training data and 25% for testing data, the actual column of "has heard disease" & "does not have heart disease" it represents the real data which is training data is that right?

serene scaffold Jun 24, 2023, 12:42 AM

#

barren fable i will explain to u what i understood and if i got smt wrong tell me the right t...

sounds like you have a couple different concepts mixed up.

#

k-fold cross validation is where you divide the data into k partitions, and then you do the algorithm k times, and each partition "takes a turn" being the test data.

#

though it sounds like you might understand that much well enough

#

> the actual column of "has heard disease" & "does not have heart disease" it represents the real data which is training data is that right?
all the data--both the training data and the test data--is "real".

barren fable Jun 24, 2023, 12:50 AM

#

serene scaffold though it sounds like you might understand that much well enough

yep u r right

serene scaffold Jun 24, 2023, 12:51 AM

#

for each data point, there's what it actually is, and what the model predicts that it is.

#

#

here it is where it says which is true positive (TP), false positive (FP), etc.

barren fable Jun 24, 2023, 12:53 AM

#

serene scaffold > the actual column of "has heard disease" & "does not have heart disease" it re...

i mean the output, ok what i meant that we took like 75% of data and we took 25% for testing to c yk which method will gave us lowest errors..

barren fable Jun 24, 2023, 12:54 AM

#

serene scaffold here it is where it says which is true positive (TP), false positive (FP), etc.

cool so what does true positive false positive etc.. represent? training or testing?

serene scaffold Jun 24, 2023, 12:55 AM

#

barren fable cool so what does true positive false positive etc.. represent? training or test...

you do those counts when you test the algorithm

barren fable Jun 24, 2023, 12:57 AM

#

barren fable i'm confused a bit in these pics, as i can understand the first column in has he...

@serene scaffold the point that made me truly confused in this pic that he said the columns r the actual data and the rows r the predicted data

#

forget about this one, let's look at another example

serene scaffold Jun 24, 2023, 12:59 AM

#

barren fable @serene scaffold the point that made me truly confused in this pic that he ...

there isn't separate predicted and actual data. there's predictions about the data. this is a key distinction

#

the algorithm does not produce new data.

barren fable Jun 24, 2023, 1:00 AM

#

so here, we got actual 142 ppl had heart disease, and our prediction misclassified 29 ppl that's right?

serene scaffold Jun 24, 2023, 1:00 AM

#

barren fable so here, we got actual 142 ppl had heart disease, and our prediction misclassifi...

the algorithm misclassified 29 + 22 people.

barren fable Jun 24, 2023, 1:02 AM

#

serene scaffold the algorithm misclassified `29 + 22` people.

u r right, but exactly it misclassified 29 ppl said they does not have heart disease but they have right?

serene scaffold Jun 24, 2023, 1:02 AM

#

barren fable u r right, but exactly it misclassified 29 ppl said they does not have heart dis...

right

barren fable Jun 24, 2023, 1:04 AM

#

serene scaffold right

ok so i got here the last 2 question
the first one is, the 142 is that the all real data? like we took the all data which is 142 ppl had heart disease and we tested it or it's a sample like 75% and we tested on 25%?

serene scaffold Jun 24, 2023, 1:18 AM

#

barren fable ok so i got here the last 2 question the first one is, the 142 is that the all r...

There's no fake data. It's all the real data.

The number of data instances in the test data is the sum of the four squares. 142 + 22 + 29 + 100
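The arithmetic can be sketched as follows. The four counts are the ones quoted in the chat; the chat only confirms that 29 are false negatives, so which cell holds 142, 22 and 100 is an assumption here.

```python
# Counts quoted in the chat. The cell assignment of 142, 22 and 100 is assumed.
tp = 142  # has heart disease, predicted heart disease
fp = 22   # no heart disease, predicted heart disease
fn = 29   # has heart disease, predicted no heart disease
tn = 100  # no heart disease, predicted no heart disease

total = tp + fp + fn + tn        # every test instance lands in exactly one cell
misclassified = fp + fn          # the two kinds of mistakes
accuracy = (tp + tn) / total     # fraction of test instances predicted correctly

print(total, misclassified, round(accuracy, 3))
```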

#

@barren fable make sense?

barren fable Jun 24, 2023, 1:48 AM

#

serene scaffold @barren fable make sense?

i was watching a vid to explain it so i'll tell u what i understood step by step

barren fable Jun 24, 2023, 1:54 AM

#

serene scaffold @barren fable make sense?

look at these 2
the one on the right said the columns r actual data and the rows r the predicted data
the one on the left said the columns r the predicted (which means as he explained the values coming out of the model) and rows r the expected (which means as he said the actual data the model is supposed to predict)
so im confused which one is right? or he transposed the row and columns?

serene scaffold Jun 24, 2023, 1:57 AM

#

barren fable look at these 2 the one on the right said the columns r actual data and the rows...

> actual data and the rows r the predicted data
Banish this from your mind

#

there's not actual data and predicted data. there's just the data. this is very important.

barren fable Jun 24, 2023, 1:58 AM

#

serene scaffold there's not actual data and predicted data. there's just the data. this is very ...

well um that what the video said not me 😂

serene scaffold Jun 24, 2023, 1:58 AM

#

barren fable well um that what the video said not me 😂

then find a better video bing_shrug

#

there's just the data. and there's what the data actually is (actual), and what the model says the data is (predicted)

#

no new data is created.

#

there's no standard for whether rows should represent actual, or if columns should represent actual.

#

but the point is what it represents.

barren fable Jun 24, 2023, 2:01 AM

#

damn that's the answer i was looking for

#

i got it rn

serene scaffold Jun 24, 2023, 2:01 AM

#

great

barren fable Jun 24, 2023, 2:02 AM

#

serene scaffold great

so r we using cross validation in this process or no? like dividing the data for training and testing?

serene scaffold Jun 24, 2023, 2:02 AM

#

barren fable so r we using cross validation in this process or no? like dividing the data for...

this is a confusion matrix. you can have a confusion matrix whether you're doing cross validation or not.

#

cross validation is where you do the whole process multiple times, but you divide the data into train and test differently each time. and you see how much of an impact that makes on the performance

#

if there's a big difference between the best time and the worst time, then something is probably wrong
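A minimal sketch of how the k partitions "take turns" as test data, using only NumPy. The function name `k_fold_indices`, the sample count and the seed are hypothetical; in practice a library splitter would normally be used instead.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=20, k=4))
for train_idx, test_idx in splits:
    # Here you would fit on train_idx, score on test_idx, and record the score.
    print(len(train_idx), len(test_idx))
```

Comparing the recorded scores across the k runs is what shows whether performance depends heavily on how the data was split.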

#

(for other people reading this, I'm trying to explain this as simply as possible, using terms that the asker has already used)

barren fable Jun 24, 2023, 2:07 AM

#

serene scaffold cross validation is where you do the whole process multiple times, but you divid...

yo Stel thx bro u r a lifesaver ❤️

serene scaffold Jun 24, 2023, 2:07 AM

#

barren fable yo Stel thx bro u r a lifesaver ❤️

yw
are you taking a class or something?

barren fable Jun 24, 2023, 2:08 AM

#

serene scaffold yw are you taking a class or something?

rn nope

carmine nest Jun 24, 2023, 3:00 AM

#

CP1

royal crest Jun 24, 2023, 4:12 AM

#

What's CP1?

young pewter Jun 24, 2023, 6:05 AM

#

What are some potential problems with linear regression?

#

and what exactly does linear regression do?

#

i googled the comparison between linear and logistic regressions and linear solves regression problems (not too sure what that means) while logistic regression solves classification problems, which i get more but an explanation would still help a lot

slender kestrel Jun 24, 2023, 7:00 AM

#

young pewter What are some potential problems with linear regression?

it works best mostly for linear data it cant handle complex data like a polynomial type data

slender kestrel Jun 24, 2023, 7:00 AM

#

young pewter and what exactly does linear regression do?

find the best fitting line through your data thats it

young pewter Jun 24, 2023, 7:01 AM

#

slender kestrel find the best fitting line through your data thats it

wb logistic regression?

slender kestrel Jun 24, 2023, 7:01 AM

#

young pewter wb logistic regression?

well it fits a sigmoid function through the data. i can send you a video link that explains it very clearly

#

i cant send images in here so the sigmoid function is sort of hard to explain ;-;
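Since the picture could not be posted, here is a small sketch of the sigmoid in code. The coefficients `w` and `b` are hypothetical "already fitted" values for a one-feature model, chosen only to show the shape of the curve.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients for a one-feature model.
w, b = 1.5, -3.0
x = np.array([0.0, 2.0, 4.0])
p = sigmoid(w * x + b)           # predicted probability of the positive class
labels = (p >= 0.5).astype(int)  # threshold the probability to get a class
print(p, labels)
```

The output is a probability, which is why logistic regression is used for classification while linear regression predicts an unbounded number.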

young pewter Jun 24, 2023, 7:02 AM

#

u can dm?

slender kestrel Jun 24, 2023, 7:02 AM

#

sure

#

hello i need some advice regarding the machine learning career that i am pursuing so please let me know if anyone can help

wooden sail Jun 24, 2023, 7:09 AM

#

you can fit a polynomial with linear regression

slender kestrel Jun 24, 2023, 7:33 AM

#

wooden sail you can fit a polynomial with linear regression

umm iirc its polynomial regression right ?

#

please correct me if i am wrong

wooden sail Jun 24, 2023, 7:34 AM

#

splitting them into categories like that does you a disservice

#

in both cases, you set up a matrix problem of the form y = Ax + b, and you solve it as A^-1 (y-b)

simple tapir Jun 24, 2023, 7:35 AM

#

Hi guys, why is the graphics of cost function in gradient descent algorithm shown parabolically? There's no x^2 or something in linear regression

wooden sail Jun 24, 2023, 7:36 AM

#

the cost function usually does have x^2s in it though

simple tapir Jun 24, 2023, 7:37 AM

#

Isn't the cost function (theta.X)/m though?

wooden sail Jun 24, 2023, 7:37 AM

#

.latex you usually use something of the form
\[
J(\bm{x}) = \Vert f(\bm{x}) - y \Vert_2^2
\]

strange elbowBOT Jun 24, 2023, 7:37 AM

#

$latex.png$

simple tapir Jun 24, 2023, 7:37 AM

#

in a linear regression model

#

oh

#

MSE

wooden sail Jun 24, 2023, 7:37 AM

#

oof i forgot to make the y bold. but anyway, MSE has the squares in the name 😛

#

also when you set up y = Ax, it often does not actually have a solution

past meteor Jun 24, 2023, 7:38 AM

#

The intuition is that the word linear means that one unit of increase in your variable means a certain increase in your target, given by your coefficient

wooden sail Jun 24, 2023, 7:38 AM

#

you instead minimize the error using some metric, and MSE is a common metric

#

not all metrics involve squares, but most of them are nonlinear and so you get curves
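A quick sketch of why the cost curve looks like a parabola even though the model is linear. It assumes a one-parameter model `y_hat = theta * x` and made-up data; the point is that the MSE is quadratic in `theta`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # toy data generated with slope 2

def mse(theta):
    """Mean squared error of the linear model y_hat = theta * x."""
    return np.mean((theta * x - y) ** 2)

thetas = np.linspace(0.0, 4.0, 9)
costs = np.array([mse(t) for t in thetas])

# Expanding the square shows the cost is a quadratic in theta:
# J(theta) = mean(x^2) * theta^2 - 2 * mean(x*y) * theta + mean(y^2)
a, b, c = np.mean(x**2), -2 * np.mean(x * y), np.mean(y**2)
quadratic = a * thetas**2 + b * thetas + c
best_theta = thetas[np.argmin(costs)]
print(best_theta)
```

The parabola is a plot of cost against the parameter, not of the fitted line against the data.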

slender kestrel Jun 24, 2023, 7:38 AM

#

wooden sail in both cases, you set up a matrix problem of the form y = Ax + b, and you solve...

but is int the meaning of linear regression supposed to fitting a linear fuction aka a line ?

simple tapir Jun 24, 2023, 7:38 AM

#

^

wooden sail Jun 24, 2023, 7:38 AM

#

slender kestrel but is int the meaning of linear regression supposed to fitting a linear fuction...

a linear function is not a line, that's the problem 😛

#

a linear function is a function such that f(u + cv) = f(u) + cf(v)

past meteor Jun 24, 2023, 7:39 AM

#

And that's a big assumption. Some things are great in the beginning but they start sucking. For example temperature vs happiness. You keep getting happier the warmer it gets but when it's 50 ° c you get sadder

#

That's a typical non linear relationship

slender kestrel Jun 24, 2023, 7:39 AM

#

wooden sail a linear function is not a line, that's the problem 😛

bro wait you just shook my entire math foundation i thought linear fucntion =line lol

simple tapir Jun 24, 2023, 7:39 AM

#

So, we don't mean Ax + b with "linear function" then?

wooden sail Jun 24, 2023, 7:40 AM

#

Ax + b is an affine transformation, that's actually not even linear

#

unless you use some tricks
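A small numeric check of the definition, plus one common version of the "trick" (appending a constant 1 to the input, which is an assumption about what was meant). The matrices and vectors are arbitrary examples, and a single check like this illustrates the property rather than proving it.

```python
import numpy as np

A = np.array([[2.0, 0.0], [1.0, 3.0]])
b = np.array([1.0, -1.0])
u, v, c = np.array([1.0, 2.0]), np.array([-3.0, 0.5]), 4.0

linear = lambda x: A @ x
affine = lambda x: A @ x + b

def is_linear(f):
    """Check f(u + c*v) == f(u) + c*f(v) for the sample vectors above."""
    return np.allclose(f(u + c * v), f(u) + c * f(v))

# The usual trick: append a constant 1 to x so that Ax + b becomes one matrix product.
A_aug = np.hstack([A, b[:, None]])     # [A | b]
augment = lambda x: np.append(x, 1.0)  # [x, 1]
same = np.allclose(A_aug @ augment(u), affine(u))
print(is_linear(linear), is_linear(affine), same)
```

With the augmented input, the affine map is linear in the vector `[x, 1]`, which is how a bias term is usually folded into the matrix.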

simple tapir Jun 24, 2023, 7:40 AM

#

Otherwise, it wouldn't really work like how gradient descent works

past meteor Jun 24, 2023, 7:40 AM

#

You can solve the problem by taking other models or new features, maybe you make a variable called temperature below 30 and temperature above 30

wooden sail Jun 24, 2023, 7:40 AM

#

gradient descent has NOTHING to do with linear functions

#

at all

#

gradient descent itself does a linearization in a neighborhood of a point, the function you minimize does not have to be linear. only differentiable
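A sketch of gradient descent on a function that is differentiable but not linear. The function `exp(w) - 2w`, the starting point and the learning rate are arbitrary choices for illustration.

```python
import math

def f(w):
    return math.exp(w) - 2.0 * w  # differentiable, clearly not linear

def grad_f(w):
    return math.exp(w) - 2.0

w = 0.0                           # starting point
learning_rate = 0.1
for _ in range(200):
    # Step against the slope of the local linear approximation at w.
    w -= learning_rate * grad_f(w)

w_star = math.log(2.0)            # analytic minimizer, where grad_f(w) = 0
print(w, w_star)
```

Only the gradient at the current point is used at each step, which is the "linearization in a neighborhood" mentioned above.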

slender kestrel Jun 24, 2023, 7:41 AM

#

wooden sail gradient descent has NOTHING to do with linear functions

yea i meant its just finding the minima of a the damm loss function '

simple tapir Jun 24, 2023, 7:42 AM

#

So that parabolic graphic depends on the problem. In some cases, Ax + b is used which doesn't make it a parabolic and in some cases where Ax+b doesn't work, MSE is preferred

#

right

slender kestrel Jun 24, 2023, 7:42 AM

#

wooden sail a linear function is a function such that f(u + cv) = f(u) + cf(v)

so you trynna say that i can do polynomical regression with linear regression care to explain how please ?

simple tapir Jun 24, 2023, 7:43 AM

#

wooden sail gradient descent has NOTHING to do with linear functions

Doesn't it though? If it's a line, how would you find the local minimum with learning rate?

wooden sail Jun 24, 2023, 7:43 AM

#

simple tapir Doesn't it though? If it's a line, how would you find the local minimum with lea...

none of those things are related to linearity

simple tapir Jun 24, 2023, 7:43 AM

#

it'd be probably x = 0

simple tapir Jun 24, 2023, 7:44 AM

#

wooden sail none of those things are related to linearity

I mean in the cases where the loss function is defined as Ax's

wooden sail Jun 24, 2023, 7:45 AM

#

say we have a polynomial of degree 2. you can easily generalize what i'm about to do. we let y = ax^2 + bx + c. to do regression, we need pairs of observations (x_n, y_n). then we can write N equations of the form y_n = ax_n^2 + bx_n + c

#

we can write that in matrix form as follows

slender kestrel Jun 24, 2023, 7:46 AM

#

wooden sail say we have a polynomial of degree 2. you can easily generalize what i'm about t...

right

wooden sail Jun 24, 2023, 7:47 AM

#

.latex
\[
\begin{bmatrix}
y_1 \\
y_2 \\
y_3 \\
\vdots
\end{bmatrix}
=
\begin{bmatrix}
x_1^2 && x_1 && 1 \\
x_2^2 && x_2 && 1 \\
x_3^2 && x_3 && 1 \\
\vdots
\end{bmatrix} \cdot \bm{x}
\]

strange elbowBOT Jun 24, 2023, 7:47 AM

#

Failed to render input.

wooden sail Jun 24, 2023, 7:47 AM

#

sigh one second

grave summit Jun 24, 2023, 7:47 AM

#

hello guys, I am trying to detect seasonalities in a financial time series that has a price for each hour of the day for an entire year. I would like to do this by using FFT, in order to do this i need to use a window function for tappering the time series, any of you got some advice on how to choose the window function ?

#

I`m trying to highlights two types of seasonalities, macro (between months and weeks) and micro (between days and hours)

strange elbowBOT Jun 24, 2023, 7:49 AM

#

Failed to render input.

wooden sail Jun 24, 2023, 7:49 AM

#

ooof man, i hate this bot

slender kestrel Jun 24, 2023, 7:49 AM

#

grave summit hello guys, I am trying to detect seasonalities in a financial time series that ...

umm why you wanna find seasonality using fft there are other ways to do it right

slender kestrel Jun 24, 2023, 7:50 AM

#

wooden sail ooof man, i hate this bot

lol thnx a lot man for trying

wooden sail Jun 24, 2023, 7:50 AM

#

.latex
\[
\begin{bmatrix}
y_1
y_2
y_3
\vdots
\end{bmatrix}
=
\begin{bmatrix}
x_1^2 && x_1 && 1 \\
x_2^2 && x_2 && 1 \\
x_3^2 && x_3 && 1 \\
&& \vdots &&
\end{bmatrix} \cdot \bm{x}
\]

strange elbowBOT Jun 24, 2023, 7:50 AM

#

$latex.png$

wooden sail Jun 24, 2023, 7:50 AM

#

ok there we go

#

i guess those should've been horizontal dots in the y vector, but nevermind

grave summit Jun 24, 2023, 7:51 AM

#

@slender kestrel In the end i aim at getting a column vector containing one number for each hourly price by which i will multiply to apply my seasonality

#

fourier might be a good way of doing so i thought

wooden sail Jun 24, 2023, 7:51 AM

#

this effectively turns the polynomial fitting problem into one of the form y = Ax with a vandermonde matrix A, and we find x via linear regression
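The construction above can be sketched in a few lines of NumPy. The true coefficients, the noise level and the seed are made up for illustration; `np.vander` builds rows of the form `[x_n^2, x_n, 1]`.

```python
import numpy as np

rng = np.random.default_rng(0)
x_obs = np.linspace(-2.0, 2.0, 25)
# Toy observations from y = 1.5 x^2 - 0.5 x + 2 plus a little noise.
y_obs = 1.5 * x_obs**2 - 0.5 * x_obs + 2.0 + 0.05 * rng.normal(size=x_obs.size)

# Each row is [x_n^2, x_n, 1], matching the matrix in the message above.
A = np.vander(x_obs, N=3)

# y = A @ coeffs generally has no exact solution, so solve it in the least-squares sense.
coeffs, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
a, b, c = coeffs
print(a, b, c)
```

The fit is nonlinear in x_n but linear in the unknowns (a, b, c), which is why ordinary linear regression applies.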

#

also, seasonality is indeed found via FFTs

slender kestrel Jun 24, 2023, 7:52 AM

#

strange elbow

ooh i see how you are trying to explain linear regression with polynomial regression you trynna say that polynomial regression is an extension of linear if we assume x1^2 as a dimension right

wooden sail Jun 24, 2023, 7:52 AM

#

that decomposes your data into sinusoids of predefined frequencies, letting you find what the periodicity of the data is

slender kestrel Jun 24, 2023, 7:52 AM

#

slender kestrel ooh i see how you are trying to explain linear regression with polynomial regres...

@wooden sail

grave summit Jun 24, 2023, 7:52 AM

#

yeah that's why i want to use fft

wooden sail Jun 24, 2023, 7:52 AM

#

slender kestrel ooh i see how you are trying to explain linear regression with polynomial regres...

the two problems are the same thing, you can write your polynomials as vectors

slender kestrel Jun 24, 2023, 7:53 AM

#

wooden sail that decomposes your data into sinusoids of predefined frequencies, letting you ...

very true but seasonality can be found via auto correlation too

slender kestrel Jun 24, 2023, 7:53 AM

#

wooden sail the two problems are the same thing, you can write your polynomials as vectors

got it got it what you were trynna say

wooden sail Jun 24, 2023, 7:53 AM

#

you would usually still do an FFT of the autocorrelation

grave summit Jun 24, 2023, 7:54 AM

#

i can do both since i want to learn about seasonalities

wooden sail Jun 24, 2023, 7:55 AM

#

as for the window functions, using no window function is equivalent to using a rectangular function. this convolves the spectrum with a sinc, which has good and bad properties

grave summit Jun 24, 2023, 7:55 AM

#

blackman might be good i heard

wooden sail Jun 24, 2023, 7:55 AM

#

it has the highest resolution in the sense that the peaks are the narrowest, but in exchange you get side lobes that might make it seem like there are other frequency components

#

blackman and blackman harris are alternatives. those try to remove the side lobes but make the main lobe wider
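A sketch of the trade-off on a synthetic tone. The record length, the tone frequency (deliberately between two FFT bins) and the "bins 40 and up" cutoff are arbitrary choices for illustration.

```python
import numpy as np

n = 256
t = np.arange(n)
# A tone at 10.5 cycles per record: it falls between FFT bins, so it leaks.
signal = np.sin(2 * np.pi * 10.5 * t / n)

def leakage_ratio(window):
    """Largest magnitude far from the tone (bins 40+), relative to the peak."""
    spectrum = np.abs(np.fft.rfft(signal * window))
    return spectrum[40:].max() / spectrum.max()

rect = leakage_ratio(np.ones(n))          # no window == rectangular window
blackman = leakage_ratio(np.blackman(n))  # lower side lobes, wider main lobe
print(rect, blackman)
```

The Blackman ratio comes out much smaller, at the cost of a broader peak around the tone itself.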

grave summit Jun 24, 2023, 7:56 AM

#

I mean the size of my window function will affect the type of seasonalities i get from my analysis right?

slender kestrel Jun 24, 2023, 7:56 AM

#

wooden sail you would usually still do an FFT of the autocorrelation

if the auto correlation graph of a function is sinusoidal or cosinusoidal it wont require fft iirc

wooden sail Jun 24, 2023, 7:56 AM

#

it depends which things your data is sensitive to: false positives in the trends, or high resolution (closely spaced frequency components)

grave summit Jun 24, 2023, 7:56 AM

#

how can i test for this ? @wooden sail

hexed ibex Jun 24, 2023, 7:56 AM

#

#

@pine escarp '

slender kestrel Jun 24, 2023, 7:57 AM

#

wooden sail blackman and blackman harris are alternatives. those try to remove the side lobe...

i am not usually impressed but you have impressed me today by your immense amount of knowledge

wooden sail Jun 24, 2023, 7:57 AM

#

grave summit how can i test for this ? @wooden sail

the easiest way is to just try a couple of them tbh, this is part of the exploratory part

grave summit Jun 24, 2023, 7:57 AM

#

ok i do a few fft with different windows an check the results

wooden sail Jun 24, 2023, 7:58 AM

#

slender kestrel i am not usually impressed but you have impressed me today by your immense amoun...

given how long i've been in uni, fourier and linear algebra are the types of things you could wake me up at 3 am to ask me questions about, and i should be able to answer

grave summit Jun 24, 2023, 7:58 AM

#

which major did you do ?

slender kestrel Jun 24, 2023, 7:58 AM

#

wooden sail given how long i've been in uni, fourier and linear algebra are the types of thi...

i mean linear algebra i am okish at that but fourier naah i i didnt like that a bit in uni

wooden sail Jun 24, 2023, 7:59 AM

#

i did telecomm in bsc, comms and sig proc in msc, and doing more sig proc in phd

hexed ibex Jun 24, 2023, 7:59 AM

#

@pine escarp are you there

slender kestrel Jun 24, 2023, 7:59 AM

#

wooden sail i did telecomm in bsc, comms and sig proc in msc, and doing more sig proc in phd

ooh can i dm you sometime for advice ? if you dont mind it ?

wooden sail Jun 24, 2023, 8:00 AM

#

i'd rather not 😛

slender kestrel Jun 24, 2023, 8:00 AM

#

wooden sail i'd rather not 😛

lol fine

#

i too wanted to learn more about time series data and seasonality coz the articles available on medium teach you not too much

#

so wanted to ask you more about that stuff

wooden sail Jun 24, 2023, 8:01 AM

#

i think maybe zestar is a better person to ask. i probably know the stuff with different names, but my approaches are "unorthodox" compared to what you usually see in data science

grave summit Jun 24, 2023, 8:01 AM

#

i also have one last question Edd

slender kestrel Jun 24, 2023, 8:02 AM

#

wooden sail i think maybe zestar is a better person to ask. i probably know the stuff with d...

zestar75 that person ?

wooden sail Jun 24, 2023, 8:02 AM

#

yeah

grave summit Jun 24, 2023, 8:02 AM

#

in the end as i said i would like to get a different coefficient for each hourly price of my series by which i multiply it to take into account seasonalities

slender kestrel Jun 24, 2023, 8:02 AM

#

wooden sail i think maybe zestar is a better person to ask. i probably know the stuff with d...

i shall try pinging them @past meteor

grave summit Jun 24, 2023, 8:02 AM

#

will i be able to get this out of my fft analyisis ?

#

as I'm not only aiming at a graphical spectrum analysis

past meteor Jun 24, 2023, 8:03 AM

#

What's the question?

slender kestrel Jun 24, 2023, 8:03 AM

#

past meteor What's the question?

i too wanted to learn more about time series data and seasonality coz the articles available on medium teach you not too much
so wanted to ask you more about that stuff

past meteor Jun 24, 2023, 8:04 AM

#

https://otexts.com/fpp3/

Forecasting: Principles and Practice (3rd ed)

wooden sail Jun 24, 2023, 8:04 AM

#

grave summit will i be able to get this out of my fft analyisis ?

hmm not with a single fft. a single fft will assign one coefficient to each frequency/period. so say something repeats every 30 minutes, you will get one coefficient for the whole data, telling you how strong the 30 minute repetition component is. if you instead want to see how this coefficient changes every x hours, you would instead use a "spectrogram". this splits the data into sub windows, and then FFTs each of them
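A hand-rolled sketch of that idea on a made-up hourly series: four weeks of data with a daily cycle whose strength grows each week. The one-week, non-overlapping sub-windows and the Hann window are assumptions chosen to keep the example short.

```python
import numpy as np

hours = np.arange(24 * 7 * 4)              # four weeks of hourly samples
week = hours // (24 * 7)
# Toy series: a daily cycle whose strength grows week by week (made up).
amplitude = 1.0 + week
prices = amplitude * np.sin(2 * np.pi * hours / 24)

win_len = 24 * 7                           # one-week sub-windows, no overlap
window = np.hanning(win_len)
segments = prices.reshape(-1, win_len)     # shape: (4 weeks, 168 hours)
spectrogram = np.abs(np.fft.rfft(segments * window, axis=1))

daily_bin = win_len // 24                  # 7 cycles per week = 24-hour period
daily_strength = spectrogram[:, daily_bin] # one coefficient per week
peak_bins = spectrogram.argmax(axis=1)
print(peak_bins, daily_strength)
```

Each row of `spectrogram` is the FFT of one week, so `daily_strength` shows how the 24-hour component changes from week to week.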

past meteor Jun 24, 2023, 8:05 AM

#

Forecasting is more business related time series analysis but I think this book is a great place to start, afterwards you can go for a more advanced text (this one is also rigorous, but quite practical)

slender kestrel Jun 24, 2023, 8:05 AM

#

past meteor Forecasting is more business related time series analysis but I think this book ...

alright thnx mate ! also i am looking for advice for my machine learning career can you help with that ?

wooden sail Jun 24, 2023, 8:06 AM

#

the best advice is going to uni if you wanna learn it well 😛

pine escarp Jun 24, 2023, 8:06 AM

#

hexed ibex

Since when do you get this error?

grave summit Jun 24, 2023, 8:06 AM

#

yeah agree, stem degrees are best learnt at uni unless you are a genius urself

past meteor Jun 24, 2023, 8:07 AM

#

Yeah, university is key

hexed ibex Jun 24, 2023, 8:07 AM

#

from starting only

pine escarp Jun 24, 2023, 8:08 AM

#

hexed ibex from starting only

So ever since you installed anaconda, you get this error for jupyter notebook?

slender kestrel Jun 24, 2023, 8:08 AM

#

wooden sail the best advice is going to uni if you wanna learn it well 😛

i am in uni 😭 they dont teach a crap they teach theoretical stuff not the pratical thing ... i have completed specializations like the one by andrew ng completed the statquest play list for stat and machine learning but i still think i am missing stuff

hexed ibex Jun 24, 2023, 8:08 AM

#

pine escarp So ever since you installed anaconda, you get this error for jupyter notebook?

yes

wooden sail Jun 24, 2023, 8:08 AM

#

slender kestrel i am in uni 😭 they dont teach a crap they teach theoretical stuff not the pra...

if you understand the theoretical stuff, the practical part follows immediately

grave summit Jun 24, 2023, 8:08 AM

#

thanks edd gonna try some stuff and be back

wooden sail Jun 24, 2023, 8:08 AM

#

read the papers where they discuss implementation details

slender kestrel Jun 24, 2023, 8:09 AM

#

wooden sail read the papers where they discuss implementation details

yup i recently worked on implementing lipnet myself was able to do it but took me a hell lot of time

pine escarp Jun 24, 2023, 8:10 AM

#

hexed ibex yes

https://github.com/ContinuumIO/anaconda-issues/issues/12735

GitHub

"Exit code: 1" in anaconda navigator after reinstalling · Issue #12...

Actual Behavior After reinstalling Anaconda, I get the error message "Exit code: 1" when I start apps using the Anaconda Navigator. e.g. "jupyter notebook" or "CMD.exe"...

slender kestrel Jun 24, 2023, 8:11 AM

#

wooden sail read the papers where they discuss implementation details

i am looking for research internships in universities you think my level is ok to apply for them ?

wooden sail Jun 24, 2023, 8:11 AM

#

there are internships at all levels

pine escarp Jun 24, 2023, 8:11 AM

#

hexed ibex yes

Did you try opening the notebook using the anaconda prompt?

slender kestrel Jun 24, 2023, 8:12 AM

#

wooden sail there are internships at all levels

any other advice you can give me (other than going to uni) to improve myself in datascience more ?

wooden sail Jun 24, 2023, 8:13 AM

#

read extra books and papers, not just what they ask of you in uni. if you find any tasks/projects/topics you're interested in, go ahead and play around with them

slender kestrel Jun 24, 2023, 8:14 AM

#

wooden sail read extra books and papers, not just what they ask of you in uni. if you find a...

any book you want to recommend ?

wooden sail Jun 24, 2023, 8:14 AM

#

gilbert strang's linalg and axler's linalg done right

#

boyd's convex optimization

hexed ibex Jun 24, 2023, 8:15 AM

#

pine escarp Did you try opening the notebook using the anaconda prompt?

no

past meteor Jun 24, 2023, 8:15 AM

#

slender kestrel i am in uni 😭 they dont teach a crap they teach theoretical stuff not the pra...

My uni was very theoretical as well but the trick is to do practical work on the side

wooden sail Jun 24, 2023, 8:15 AM

#

louis scharf's statistical signal processing too

pine escarp Jun 24, 2023, 8:16 AM

#

hexed ibex no

Try it, it might work.

hexed ibex Jun 24, 2023, 8:16 AM

#

pine escarp Did you try opening the notebook using the anaconda prompt?

can you come in voice chat i will share the screen

pine escarp Jun 24, 2023, 8:16 AM

#

hexed ibex can you come in voice chat i will share the screen

Oh, sure.

slender kestrel Jun 24, 2023, 8:16 AM

#

wooden sail louis scharf's statistical signal processing too

alright thnx imma look into them !

past meteor Jun 24, 2023, 8:16 AM

#

I did internships, I was active in my city's data science community as a student, ...

#

Kaggle was a big help too

slender kestrel Jun 24, 2023, 8:18 AM

#

past meteor Kaggle was a big help too

yup i make all my models on kaggle itself so far i have worked on a toxicity detector a flower classification project lip net did some basic eda projects and worked on analyzing data for my professors also completed various courses about the theory part

#

so now i am looking for internships but i want to do research internships in universities but i always have this feeling that i dont know much so thats why i was asking you guys if its ok to apply or not ?

past meteor Jun 24, 2023, 8:20 AM

#

Apply

lapis sequoia Jun 24, 2023, 8:20 AM

#

slender kestrel yup i make all my models on kaggle itself so far i have worked on a toxicity det...

Should be easy to grab research internship in unis, not many students do ML/ data science. Get your hands dirty with coding part, you already have theoretical knowledge I presume, gather few research ideas, and reach out to professors directly or ask your professor to refer you.

slender kestrel Jun 24, 2023, 8:21 AM

#

past meteor Apply

alright any specific universities you would like to recommend ?

slender kestrel Jun 24, 2023, 8:21 AM

#

lapis sequoia Should be easy to grab research internship in unis, not many students do ML/ da...

thnx mate imma try reaching out to some professors in universities

hexed ibex Jun 24, 2023, 8:21 AM

#

@pine escarp join voice chat

past meteor Jun 24, 2023, 8:22 AM

#

slender kestrel alright any specific universities you would like to recommend ?

Your own university?

pine escarp Jun 24, 2023, 8:22 AM

#

hexed ibex @pine escarp join voice chat

Done.

slender kestrel Jun 24, 2023, 8:23 AM

#

past meteor Your own university?

ooh see the problem with my university is there arent many opportunities in there thats why i am looking for some foreign universities to apply to

hexed ibex Jun 24, 2023, 8:23 AM

#

i cant share the screen

slender kestrel Jun 24, 2023, 8:23 AM

#

slender kestrel ooh see the problem with my university is there arent many opportunities in ther...

everything i have learned is on my own basically no help from uni

past meteor Jun 24, 2023, 8:23 AM

#

To do an internship? Do you have internship credits in your program?

pine escarp Jun 24, 2023, 8:24 AM

#

hexed ibex i cant share the screen

Ohh.

slender kestrel Jun 24, 2023, 8:24 AM

#

past meteor To do an internship? Do you have internship credits in your program?

yup thats for the last semester like the 8 th semester and i dont want to wait till the last semester

pine escarp Jun 24, 2023, 8:25 AM

#

hexed ibex i cant share the screen

I'll tell you what to do.
Search anaconda prompt in search and open it.
Type jupyter notebook once it's ready to use.

past meteor Jun 24, 2023, 8:25 AM

#

How are you going to do an internship abroad when you still have class in your home university pithink

slender kestrel Jun 24, 2023, 8:26 AM

#

past meteor How are you going to do an internship abroad when you still have class in your h...

remote internships program ? and my university will only let me convert any other semester to internship semester if am doing a research project in some renowned university smh

#

down side to this is

#

i gotta study in the 8th semester

#

instead of doing internship at that time

#

so i am more inclined towards finding a remote internship

past meteor Jun 24, 2023, 8:28 AM

#

Idk how easy to find remote internships at foreign universities are. The remote part also defeats the purpose of an internship imo.

slender kestrel Jun 24, 2023, 8:28 AM

#

past meteor Idk how easy to find remote internships at foreign universities are. The remote ...

well so i am in a mess ducky_skull basically

past meteor Jun 24, 2023, 8:29 AM

#

Are you in Europe, if so can you do an Erasmus?

slender kestrel Jun 24, 2023, 8:29 AM

#

past meteor Are you in Europe, if so can you do an Erasmus?

naah bro india .-.

past meteor Jun 24, 2023, 8:29 AM

#

Totally fine as well, many indian students here doing their master's

slender kestrel Jun 24, 2023, 8:30 AM

#

i guess i should wait till graduation ;-; then only i can be free smh

past meteor Jun 24, 2023, 8:30 AM

#

You'll be fine, don't worry

pine escarp Jun 24, 2023, 8:30 AM

#

I'm from India as well.

slender kestrel Jun 24, 2023, 8:30 AM

#

pine escarp I'm from India as well.

ayo fellow indian lol hello

past meteor Jun 24, 2023, 8:31 AM

#

If you already know you want to be in data science / ML you can already start the practical side of things.

slender kestrel Jun 24, 2023, 8:31 AM

#

past meteor You'll be fine, don't worry

🫂 thnx mate

slender kestrel Jun 24, 2023, 8:31 AM

#

past meteor If you already know you want to be in data science / ML you can already start th...

yup trying to implement research paper on my own these days

pine escarp Jun 24, 2023, 8:31 AM

#

slender kestrel ayo fellow indian lol hello

Are you from north?

slender kestrel Jun 24, 2023, 8:31 AM

#

pine escarp Are you from north?

yup

lapis sequoia Jun 24, 2023, 8:32 AM

#

slender kestrel naah bro india .-.

Mitacs global link - Canada, DAAD - Eu uni, DSSG for UK uni, NTHU for taiwanese uni... there are tons of research intern / summer intern programs. you can apply for them

pine escarp Jun 24, 2023, 8:32 AM

#

slender kestrel yup

That's cool!

lapis sequoia Jun 24, 2023, 8:32 AM

#

and not to mention you can do research intern at IITs with simple cold mailing any professor

#

given you got skills ofc

slender kestrel Jun 24, 2023, 8:33 AM

#

lapis sequoia and not to mention you can do research intern at IITs with simple cold mailing a...

ooh imma try that planning to spam their dms lol jk

pine escarp Jun 24, 2023, 8:33 AM

#

lapis sequoia and not to mention you can do research intern at IITs with simple cold mailing a...

Data smoker.

slender kestrel Jun 24, 2023, 8:33 AM

#

lapis sequoia Mitacs global link - Canada, DAAD - Eu uni, DSSG for UK uni, NTHU for taiwanese ...

you too indian my guy ?

silent spire Jun 24, 2023, 3:00 PM

#

Hey guys, I'm new here

shadow viper Jun 24, 2023, 3:19 PM

#

silent spire Hey guys, I'm new here

Hey, welcome
Hope you enjoy your stay

#

I hope you improve and impact here also

barren fable Jun 24, 2023, 4:16 PM

#

@serene scaffold yo hru?

serene scaffold Jun 24, 2023, 4:42 PM

#

barren fable @serene scaffold yo hru?

I'm just fabulous as always.

sleek harbor Jun 24, 2023, 4:56 PM

#

am I correct in assuming:

1. you should not standardize principal components after PCA
2. when using RandomState for reproducing results and comparing CV scores for different models you should initialize a new RandomState instance for each estimator (not declare a rng variable at the top of the program, and pass it down to any object that accepts a random_state parameter) in order to prevent them from influencing each other by consuming the RNG?

pls confirm or correct me if u don't mind 🤗
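
A minimal sketch of the concern in point 2, using only NumPy. `fake_estimator` is a hypothetical stand-in for anything that accepts a `random_state` and consumes random numbers; it is not a real library class. It suggests that with one shared `RandomState` instance, what the second consumer draws depends on how much the first one consumed, while a separate seed per consumer avoids that:

```python
import numpy as np

def fake_estimator(random_state, n_draws):
    """Hypothetical stand-in for an estimator that consumes random numbers."""
    # Accept either an int seed or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    return rng.random_sample(n_draws)

# Shared instance: what B draws depends on how many numbers A consumed first.
shared = np.random.RandomState(0)
fake_estimator(shared, 3)                  # "estimator A", small
b_after_small_a = fake_estimator(shared, 2)

shared = np.random.RandomState(0)
fake_estimator(shared, 10)                 # "estimator A", larger
b_after_big_a = fake_estimator(shared, 2)

# Separate seeds: B's draws do not change when A changes.
fake_estimator(0, 3)
b_independent_1 = fake_estimator(1, 2)
fake_estimator(0, 10)
b_independent_2 = fake_estimator(1, 2)

print(np.allclose(b_after_small_a, b_after_big_a))    # shared stream
print(np.allclose(b_independent_1, b_independent_2))  # per-estimator seeds
```

Whether this matters in practice depends on how strictly the CV scores need to be comparable across models.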

wooden sail Jun 24, 2023, 4:57 PM

#

what are you calling "principal component" here? the vectors or the coefficients?

#

ah i had misread that as normalize. standardize as in making them have mean 0 and var 1 would indeed ruin them. PCA gives an orthonormal basis, so normalization is already taken care of there. due to orthogonality you can also straightforwardly conclude that all the coefficients of the vectors are between -1 and 1 if the input vectors to be PCA'd have magnitude 1. the distribution of the coefficients per input vector is arbitrary though and you can't change it, otherwise they don't synthesize the data back.

shadow viper Jun 24, 2023, 5:09 PM

#

wooden sail ah i had misread that as normalize. standardize as in making them have mean 0 an...

I'm really sorry but I gotta ask... Did y'all learn all this PCA stuff from school or self taught?
I'm new to this that's why I'm asking

sleek harbor Jun 24, 2023, 5:10 PM

#

wooden sail what are you calling "principal component" here? the vectors or the coefficients...

I meant what you get when you call fit_transform (sklearn.decomposition.PCA), could be using the wrong terminology here..

past meteor Jun 24, 2023, 5:10 PM

#

sleek harbor am I correct in assuming: 1. you should *not* standardize principal components ...

You need to rescale your data to unit variance before doing PCA. Afterwards it should still be unit variance, if in doubt just plot your data

wooden sail Jun 24, 2023, 5:10 PM

#

shadow viper I'm really sorry but I gotta ask... Did y'all learn all this PCA stuff from scho...

learned matrix decompositions in uni

wooden sail Jun 24, 2023, 5:10 PM

#

sleek harbor I meant what you get when you call fit_transform (sklearn.decomposition.PCA), co...

idk what that gives you

#

PCA decomposes into orthogonal vectors and their coefficients, sklearn should give you both

sleek harbor Jun 24, 2023, 5:11 PM

#

shadow viper I'm really sorry but I gotta ask... Did y'all learn all this PCA stuff from scho...

fully self taught

past meteor Jun 24, 2023, 5:11 PM

#

sleek harbor am I correct in assuming: 1. you should *not* standardize principal components ...

I'd set the seed on every single instance if I'm really bothered about having complete reproducibility

#

But that's mostly because I'm unsure if you'll have it fully reproducible if you set the seed on the top

#

Personally I'm mostly worried about the reproducibility of my data splitting and not more than that (specifically each estimator)

shadow viper Jun 24, 2023, 5:12 PM

#

wooden sail learned matrix decompositions in uni

Cool...

shadow viper Jun 24, 2023, 5:13 PM

#

sleek harbor fully self taught

Same here... YouTube is my best friend right now
Thanks, I will get there

sleek harbor Jun 24, 2023, 5:13 PM

#

wooden sail PCA decomposes into orthogonal vectors and their coefficients, sklearn should gi...

I'm guessing it's the vectors then, cus it's not the things I call "loadings", which I think is the coefficients. Getting the terminology right is the hardest part..

lapis sequoia Jun 24, 2023, 5:13 PM

#

wooden sail idk what that gives you

it basically returns a numpy array with each col representing a Principal component, and they are ordered by variance.

past meteor Jun 24, 2023, 5:13 PM

#

Just code up PCA with numpy

#

It's ~5-10 lines of code. Do it once and the algorithm will make sense forever

wooden sail Jun 24, 2023, 5:14 PM

#

closer to 3 😛

#

anyway, the principal vectors are orthonormal, and you want them that way

#

they have unit norm already

sleek harbor Jun 24, 2023, 5:14 PM

#

sounds like "rewrite tensorflow in 100 lines of C++" to me..

past meteor Jun 24, 2023, 5:15 PM

#

No, PCA is very simple

sleek harbor Jun 24, 2023, 5:15 PM

#

I'm not that good at numpy

lapis sequoia Jun 24, 2023, 5:15 PM

#

sleek harbor am I correct in assuming: 1. you should *not* standardize principal components ...

doesn't need to worry about seed or random states in ml based algorithms, it gets harder to reproduce results when dealing with NNs specifically tensorflow, where it is almost impossible to reproduce same results even if we keep everything same :))

past meteor Jun 24, 2023, 5:15 PM

#

1) calculate covariance matrix 2) calculate eigenvalues, eigenvectors 3) sort by eigenvalues 4) do matrix multiplication

#

Each of these things have a numpy "verb" so it's just chaining stuff that exists off the shelf together 🙂
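
The four steps above can be sketched roughly like this. The toy data, the variable names and the samples-in-rows layout are assumptions made for the example, not anything from the original messages:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples (rows) x 3 features (columns), with correlated columns.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
X_centered = X - X.mean(axis=0)

# 1) covariance matrix (features x features)
cov = np.cov(X_centered, rowvar=False)

# 2) eigenvalues and eigenvectors; eigh is meant for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(cov)

# 3) sort by eigenvalue, largest first (eigh returns ascending order)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) matrix multiplication: project the centered data onto the eigenvectors
scores = X_centered @ eigvecs

print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))       # orthonormal basis
print(np.allclose(scores.var(axis=0, ddof=1), eigvals))  # variance per component
```

In this sketch the variance of each column of `scores` equals the matching eigenvalue, so rescaling the columns to unit variance afterwards would throw away the "ordered by variance" information.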

sleek harbor Jun 24, 2023, 5:16 PM

#

maybe when I buy some more IQ

past meteor Jun 24, 2023, 5:17 PM

#

No, you're 100 % smart enough to do this @sleek harbor don't underestimate yourself

#

And once you do it, you'll have something like "wow, was that all that there was to it?"

sleek harbor Jun 24, 2023, 5:17 PM

#

hmm.. sounds fishy 🐟

wooden sail Jun 24, 2023, 5:18 PM

#

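import numpy as np
data = ... # size n x m; let n be the data length, m the number of samples
centered_data = data - np.mean(data, axis=1, keepdims=True) # keepdims so the n x 1 mean broadcasts over the m samples
cov = centered_data @ centered_data.T / data.shape[1] # size n x n covariance matrix
principal_components, _, _ = np.linalg.svd(cov) # columns are the principal vectors, largest variance first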

#

that's a PCA for you

shadow viper Jun 24, 2023, 5:18 PM

#

sleek harbor hmm.. sounds fishy 🐟

No lol... He's right

wooden sail Jun 24, 2023, 5:19 PM

#

using an SVD has the advantage of canonically being ordered by the size of the singular values, so it saves you the sorting. it's also equivalent to the EVD for symmetric matrices, which all covariance matrices are
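
A quick check of that equivalence, as a sketch on made-up data in the same n x m layout (features in rows, samples in columns). For a covariance matrix the singular values should match the sorted eigenvalues, and the vectors should match up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 50))                 # n = 4 features, m = 50 samples
A = A - A.mean(axis=1, keepdims=True)
cov = A @ A.T / A.shape[1]

# SVD: singular values already come out largest first.
U, s, _ = np.linalg.svd(cov)

# EVD: eigh returns ascending eigenvalues, so one extra sorting step is needed.
w, V = np.linalg.eigh(cov)
order = np.argsort(w)[::-1]
w, V = w[order], V[:, order]

same_values = np.allclose(s, w)
# Each vector is only defined up to sign, so compare |u_i . v_i| with 1.
same_vectors = np.allclose(np.abs(np.sum(U * V, axis=0)), 1.0)
print(same_values, same_vectors)
```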

lapis sequoia Jun 24, 2023, 5:20 PM

#

Reminded me of funny incident, When I was doing titanic and other tutorial problems to learn, I extracted features from the indexing column provided(random string values), and I got little boost in cv scores, I got so happy. It was probably a boost from randomness or param tweaks haha

sleek harbor Jun 24, 2023, 5:20 PM

#

wooden sail ```py data = ... # size n x m; let n be the data length, m the number of samples...

that last line feels like cheating. Also I don't quite know the math of SVD. I understand PCA via the visualization in this vid :p https://youtu.be/FgakZw6K1QQ

YouTube

StatQuest with Josh Starmer

StatQuest: Principal Component Analysis (PCA), Step-by-Step

Principal Component Analysis, is one of the most useful data analysis and machine learning methods out there. It can be used to identify patterns in highly complex datasets and it can tell you what variables in your data are the most important. Lastly, it can tell you how accurate your new understanding of the data actually is.

In this video, I...

▶ Play video

wooden sail Jun 24, 2023, 5:21 PM

#

sleek harbor that last line feels like cheating. Also I don't quite know the math of SVD. I u...

then do EVD instead

#

but idk what you find to be "cheating" about it, it's identical to the EVD

#

no one ever computes eigenvalues and eigenvectors by hand for anything larger than a 3x3 matrix, if that's what bothers you

sleek harbor Jun 24, 2023, 5:22 PM

#

wooden sail but idk what you find to be "cheating" about it, it's identical to the EVD

I mean the part where there's just one function call. That's cool, but I was expecting.. more code 🤣

wooden sail Jun 24, 2023, 5:22 PM

#

i told you it was like 4 lines

#

so did zestar 😛

#

if you use the EVD instead, you need 1 more line to sort by eigenvalue

past meteor Jun 24, 2023, 5:23 PM

#

means = mean(threes);
threes_cent = threes - means;
covariance_matrix = cov(threes);
[V, d] = eigs(covariance_matrix , i);
decomposed = threes_cent * V * V';
decomposed = decomposed + means;

#

This is PCA in full, in matlab

abstract rune Jun 24, 2023, 5:23 PM

#

can someone help with this, why it is not renamed ?

past meteor Jun 24, 2023, 5:24 PM

#

Matlab and numpy are twins so it should be readable

sleek harbor Jun 24, 2023, 5:24 PM

#

abstract rune can someone help with this, why it is not renamed ?

your X is capitalized, when it shouldn't be :3

abstract rune Jun 24, 2023, 5:25 PM

#

Thanks a lot @sleek harbor

#

this silly error

past meteor Jun 24, 2023, 5:26 PM

#

Most courses "forced" us to do algorithms by hand, which was a good thing in hindsight because then you know what it's doing

abstract rune Jun 24, 2023, 5:26 PM

#

guys I am using this book to get started with ML

https://www.amazon.in/Machine-Learning-Python-Manaranjan-Pradhan/dp/8126579900/ref=sr_1_1_sspa?crid=3DEK3K0JUHFAJ&keywords=machine+learning+with+python&qid=1687627589&sprefix=machine+learning+%2Caps%2C1272&sr=8-1-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&psc=1

Machine Learning using Python | IM | BS | e

This book is written to provide a strong foundation in machine learning using Python libraries by providing real-life case studies and examples. It covers topics such as foundations of machine learning, introduction to Python, descriptive analytics and predictive analytics. Advanced machine learn...

past meteor Jun 24, 2023, 5:27 PM

#

Fundamentally the building blocks aren't hard, just having a high level understanding is fine. Then later you can ask yourself questions like "why the covariance matrix", "why the eigenvalues", "how do eigenvalues relate to the cov matrix", "why is the reconstruction error ~ 0 if num_components == num_features"
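
For the last of those questions, a small sketch on random toy data (all names here are made up for the example). It reconstructs the data from the first k principal vectors and suggests the error shrinks as k grows and is about 0 once k equals the number of features, because the full set of eigenvectors is an orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))                # 100 samples, 4 features
mean = X.mean(axis=0)
Xc = X - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest variance first

def reconstruction_error(k):
    """Mean squared error after keeping only the first k components."""
    V = eigvecs[:, :k]
    X_hat = Xc @ V @ V.T + mean              # project down, project back, un-center
    return float(np.mean((X - X_hat) ** 2))

errors = [reconstruction_error(k) for k in range(1, 5)]
print(errors)
```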

#

@abstract rune https://www.statlearning.com/ is the best course text for ML in my humble opinion. Yes the examples are in R but there are Python equivalents for nearly every algo.

An Introduction to Statistical Learning

abstract rune Jun 24, 2023, 5:29 PM

#

this is the table of contents of this book

abstract rune Jun 24, 2023, 5:31 PM

#

abstract rune this is the table of contents of this book

I think I will stick to this book because continuously changing books will be a hassle, can someone who has learned ML confirm the table of contents of this book is fine

past meteor Jun 24, 2023, 5:31 PM

#

it's fine

sleek harbor Jun 24, 2023, 5:35 PM

#

past meteor @abstract rune https://www.statlearning.com/ is the best course text for...

I'm waiting for the python version.. should come out this summer

past meteor Jun 24, 2023, 5:35 PM

#

Just go with the R version for now, code is a relatively small part of the book

wooden sail Jun 24, 2023, 5:36 PM

#

you can learn all the stuff detached from coding

#

then the lang doesn't matter

past meteor Jun 24, 2023, 5:37 PM

#

Like if you see lm(income ~ age + year_of_experience + level_of_education) it's trivial to map that to Python's linear regression
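
As a rough illustration of that mapping, here is an ordinary least squares fit with plain NumPy. The data is synthetic and the coefficients are invented for the example; also, R would treat `level_of_education` as categorical if it were a factor, while this sketch assumes it is numeric:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
age = rng.uniform(20, 60, n)
years_of_experience = rng.uniform(0, 30, n)
level_of_education = rng.integers(1, 5, n).astype(float)

# Synthetic "true" relationship, made up for the example.
income = 1000 + 50 * age + 200 * years_of_experience + 500 * level_of_education
income = income + rng.normal(scale=10.0, size=n)

# income ~ age + years_of_experience + level_of_education
# R's formula adds the intercept implicitly; here it is an explicit column of ones.
design = np.column_stack([np.ones(n), age, years_of_experience, level_of_education])
coef, *_ = np.linalg.lstsq(design, income, rcond=None)

names = ["intercept", "age", "years_of_experience", "level_of_education"]
print(dict(zip(names, coef.round(1))))
```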

#

I feel like the Python version will use statsmodels or something cursed anyway so that's a wrap

sleek harbor Jun 24, 2023, 5:38 PM

#

as long as u can guess that lm stands for lin model..

past meteor Jun 24, 2023, 5:38 PM

#

The book definitely mentions that lm stands for linear model 🙂

sleek harbor Jun 24, 2023, 5:39 PM

#

past meteor The book definitely mentions that `lm` stands for linear model 🙂

what does income ~ age mean? cus i read that as "income is not age"

past meteor Jun 24, 2023, 5:39 PM

#

income is a function of all those variables

sleek harbor Jun 24, 2023, 5:39 PM

#

now how would I guess that?

past meteor Jun 24, 2023, 5:39 PM

#

I just dropped that here out of context, in the book there's a logical flow so when you'd read it, it'd make sense from the context

#

lm(income ~ log(age) + years_of_experience + level_of_education + (years_of_experience * level_of_education)) is possible as well, very flexible stuff

sleek harbor Jun 24, 2023, 5:40 PM

#

i was debating on going for a masters in financial algorithms for analysis (with R).. maybe I should've went for it..

past meteor Jun 24, 2023, 5:41 PM

#

R as a programming language is so horrible

sleek harbor Jun 24, 2023, 5:41 PM

#

good thing I didn't go for it then :3

past meteor Jun 24, 2023, 5:41 PM

#

But it has a few nice ideas for specifically statistics

#

I'm talking about the language itself, not what can be done in it

#

I'm generally not a fan of languages that are dynamically typed and have less strong typing than Python. You know, languages that do a lot of casting like JS, PHP and R

#

You gotta be really awake because they'll do stuff that is imo silently failing, sort(c("1", 2, "3", "four", 5, 6)) is equivalent to sorted(["1", 2, "3", "four", 5, 6]) . In Python the latter (luckily) errors out while in those langs it does not
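
A small sketch of the Python side of that comparison. Passing `key=str` is just one way to state the intended comparison explicitly, roughly mimicking how R's `c()` coerces everything to character first:

```python
mixed = ["1", 2, "3", "four", 5, 6]

try:
    sorted(mixed)
    outcome = "sorted without complaint"
except TypeError as exc:
    # Python refuses to compare str with int instead of casting silently.
    outcome = f"TypeError: {exc}"
print(outcome)

# Being explicit about the comparison works: compare everything as text.
as_text = sorted(mixed, key=str)
print(as_text)
```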

wooden sail Jun 24, 2023, 5:46 PM

#

python still does plenty of that though, and it's my biggest beef with it

#

😩

past meteor Jun 24, 2023, 5:48 PM

#

As dynamically typed languages go it's well designed imo but ig there's a limit to what you can do

#

Hence why I use mypy judiciously. For stuff that's not strictly data I might someday look for an alternative with more type safety but doesn't feel as verbose as idk Java.

weary warren Jun 24, 2023, 6:05 PM

#

could anyone help me with a whatsapp gpt trying to integrate stripe into it. having difficulties creating the cancel subscription

lavish lily Jun 24, 2023, 6:19 PM

#

If i run the OpenAI CLI fine_tunes.follow command and my stream disconnects is it still being trained and processed in the backend?

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i <model id>

iron basalt Jun 24, 2023, 7:36 PM

#

past meteor You gotta be really awake because they'll do stuff that is imo silently failing,...

My favorite Python related bug is setting a member variable on a object, but there was a typo and it just silently creates a new member with that name.

#

Then I spend an hour wondering why the variable has the wrong value.
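
A sketch of that bug, plus one possible guard. The class and attribute names are hypothetical, and `__slots__` is only one option (a linter or type checker would flag the typo too):

```python
class Loose:
    def __init__(self):
        self.learning_rate = 0.1

class Strict:
    # Only the listed attribute names may ever be set on instances.
    __slots__ = ("learning_rate",)

    def __init__(self):
        self.learning_rate = 0.1

loose = Loose()
loose.learning_rtae = 0.5        # typo: silently creates a brand new attribute
print(loose.learning_rate)       # the real attribute is unchanged

strict = Strict()
try:
    strict.learning_rtae = 0.5   # same typo, now rejected
    typo_caught = False
except AttributeError:
    typo_caught = True
print(typo_caught)
```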

serene scaffold Jun 24, 2023, 7:54 PM

#

iron basalt My favorite Python related bug is setting a member variable on a object, but the...

everyone should just use getters and setters /s

native umbra Jun 24, 2023, 8:16 PM

#

Hello, guys should I start taking "world quant data science lab" course, or should I focus on something else?

serene scaffold Jun 24, 2023, 8:21 PM

#

native umbra Hello, guys should I start taking "world quant data science lab" course, or s...

who's teaching that course, what does it cover, what does the course expect you to know before taking it, and how much of that material do you already know?

lapis sequoia Jun 24, 2023, 9:07 PM

#

Is there a really good guide to kaggle that has like the top 5-10 or so challenges that ramp up in difficulty so you can learn as you go?

somber panther Jun 24, 2023, 9:15 PM

#

matplotlib, what are left and bottom in add_axes?

simple tapir Jun 24, 2023, 9:26 PM

#

hey

#

I've learned from a tutorial that ridge regression basically sets the theta of useless features to closer to 0. But I didn't get how it knows whether a feature is useless

agile cobalt Jun 24, 2023, 9:31 PM

#

it draws all features closer to zero
the usual gradient descent process just outweights that effect for the actually useful ones

simple tapir Jun 24, 2023, 9:32 PM

#

How does gradient descent know which one is useful?

agile cobalt Jun 24, 2023, 9:32 PM

#

the same way it knows what to update for normal linear regression models

simple tapir Jun 24, 2023, 9:32 PM

#

ah

#

so instead of going forward everytime it updates itself, it starts from 0, right?

agile cobalt Jun 24, 2023, 9:33 PM

#

it isn't "totally useless" / "totally useful", more of "at some point it is useful enough to resist the pull towards 0"

simple tapir Jun 24, 2023, 9:33 PM

#

Yeah and I wanted to know the criteria that the computer uses to determine whether it's useful

agile cobalt Jun 24, 2023, 9:34 PM

#

that'd be the loss function and back-propagation mechanism (aka calculating gradients)

simple tapir Jun 24, 2023, 9:34 PM

#

Ah I see

agile cobalt Jun 24, 2023, 9:35 PM

#

what Ridge regression changes compared to linear regression in the end of the day is just adding a term to the loss function that increases the loss based on the weights
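
A minimal sketch of that extra term on synthetic data. Note it uses the closed-form ridge solution instead of gradient descent, purely to keep the example short; the data, `lam` values and function names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))
true_w = np.array([3.0, 0.0, -2.0])          # the middle feature is useless
y = X @ true_w + rng.normal(scale=0.5, size=200)

def ridge_loss(w, lam):
    # ordinary squared error + penalty that grows with the size of the weights
    return float(np.sum((y - X @ w) ** 2) + lam * np.sum(w ** 2))

def ridge_fit(lam):
    # closed-form minimizer of ridge_loss (no intercept in this sketch)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

fits = {lam: ridge_fit(lam) for lam in (0.0, 10.0, 1000.0)}
for lam, w in fits.items():
    print(lam, w.round(3))
```

With `lam = 0.0` this is plain linear regression; as `lam` grows, every weight gets pulled toward 0, and the useful ones resist longer because shrinking them costs more squared error.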

grave summit Jun 24, 2023, 9:36 PM

#

Hello guys, i need some advices

simple tapir Jun 24, 2023, 9:36 PM

#

Ridge regression draws all the features to closer to 0 where lasso regression draws them to 0 but I don't think that it'd affect the model much, since 0.0001 can be assumed as 0. Why would you choose ridge over lasso, though?

grave summit Jun 24, 2023, 9:36 PM

#

Let's set some background here, I am studying a financial time series which is representing the hourly price of electricity for each day of the year 2022 so I have a Pandas DataFrame containing two columns:

The first one is a TimeIndex containing the date and hour in a datetime format of Pandas. The second one contains the price associated to each hour.

I am studying this sample to make some predictions on the hourly price of let's say 2024. For this purpose I would like to do an in depth study of the seasonality patterns in this time series, on multiple granularity levels (hours, days, weeks, months and quarters).

In the end I would like to obtain a column vector of 8760 scalars corresponding to the 8760 hours of the year that are priced. Those scalars would represent a seasonality coefficient that I will use to make my predictions for the year coming.

Now comes the reason that i am here, I thought of doing this search for seasonality using FFT and an appropriate window function. I would like to know from you guys which window function should I use for this purpose, I know each one has its advantages and disadvantages. I would also like to know how should I choose the width of my window as obviously this will have an effect on the FFT performed. I am also open to advices on how to complete this seasonality study, would you do it another way? Which tool would you use?

This is a general question, I am looking for other people's opinion on how to do this research, I am not encountering any particular coding problem for the moment

simple tapir Jun 24, 2023, 9:36 PM

#

grave summit Let's set some background here, I am studying a financial time series which is r...

copy paste?

simple tapir Jun 24, 2023, 9:37 PM

#

simple tapir Ridge regression draws all the features to closer to 0 where lasso regression dr...

and this btw

agile cobalt Jun 24, 2023, 9:38 PM

#

simple tapir Ridge regression draws all the features to closer to 0 where lasso regression dr...

iirc Ridge and Lasso do the same thing?

the difference is that Ridge operates on the square of the weights, while Lasso operates on their absolute value

#

https://scikit-learn.org/stable/modules/linear_model.html#regression
https://scikit-learn.org/stable/modules/linear_model.html#lasso

scikit-learn

1.1. Linear Models

The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the features. In mathematical notation, if\hat{y} is the predicted val...

simple tapir Jun 24, 2023, 9:40 PM

#

hmm I see

#

thanks a lot!

left tartan Jun 24, 2023, 10:06 PM

#

grave summit Let's set some background here, I am studying a financial time series which is r...

Are you familiar with arima? Since I see seasonality, just wanted to make sure

past meteor Jun 24, 2023, 10:20 PM

#

iron basalt My favorite Python related bug is setting a member variable on a object, but the...

Big oof, I do kind of like being able to do monkey patching though

#

I never use it because it's a bit janky but I ... like the fact I can do it

past meteor Jun 24, 2023, 10:22 PM

#

simple tapir I've learned from a tutorial that ridge regression basically sets the theta of u...

The intuition is that your model has a "budget" it can spend to get a certain performance because you punish it for increasing the weights

#

So you can just remember, for now, that it finds a way to get the best value for money performance wise, which means setting some values closer to 0.

#

As for why they don't hit exactly 0 with ridge regression and they do with lasso, you can look at the equations for that.

Write out the partial derivative of an arbitrary coefficient for L2 loss and see under what conditions beta gets to be exactly 0. It's at lambda (regularization strength) -> inf. For lasso this is not the case. (frequentist pov)

There's also the Bayesian statistics way to look at it. No regularization == uniform prior, ridge == gaussian prior and lasso == laplacian prior. If you look at the laplacian, you see a nice big peak at 0 with a big drop off. There's a high probability to be exactly 0 while the gaussian has a lot more "mass" around 0.

Honestly, idk how much value knowing this is if you're a "practitioner" and not someone making methods 🙂
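
The partial derivative exercise above can be sketched in the simplest possible setting: a single coefficient whose unpenalized estimate is `z`. The one-dimensional objectives in the comments are an assumption chosen to keep the math short, not the general multi-feature case:

```python
import numpy as np

def ridge_1d(z, lam):
    # argmin over b of 0.5*(b - z)**2 + 0.5*lam*b**2
    # derivative: (b - z) + lam*b = 0  ->  b = z / (1 + lam)
    return z / (1.0 + lam)

def lasso_1d(z, lam):
    # argmin over b of 0.5*(b - z)**2 + lam*abs(b)
    # solution is soft thresholding: anything with |z| <= lam lands on 0
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([2.0, 0.3, -0.1])   # unpenalized estimates
lam = 0.5

ridge = ridge_1d(z, lam)
lasso = lasso_1d(z, lam)
print(ridge)   # every entry shrinks, none is exactly 0 for finite lam
print(lasso)   # small entries are set exactly to 0
```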

iron basalt Jun 24, 2023, 10:31 PM

#

past meteor I never use it because it's a bit janky but I ... like the fact I can do it

Could still have that, but require manually calling setattr or something.

past meteor Jun 24, 2023, 10:32 PM

#

iron basalt Could still have that, but require manually calling `setattr` or something.

Stop being sane and very reasonable 🤣

queen cradle Jun 24, 2023, 10:59 PM

#

grave summit Let's set some background here, I am studying a financial time series which is r...

First of all, you can do an FFT on the whole time series if you like. An FFT length of 8760 should be easy.

Second, the FFT only gives you the results you want if everything lines up perfectly in time, and for calendars it doesn't. Suppose, for example, that there is a monthly effect (for example, perhaps something happens on the first of the month). Unfortunately, months don't all have the same number of days: January has 31 days, February has 28, and so on. So these effects will be unevenly spaced and hence not clearly visible if you do an FFT. Or suppose that you think there's a weekly component (a pretty reasonable guess, since people's activity is different on weekdays and weekends). The year is not a round number of weeks: It's 365 days, and 365 = 52 * 7 + 1. Consequently no FFT frequency bin corresponds exactly to weekly effects, and a weekly component gets smeared across neighbouring bins.

Third, if you only have a single year of data then you will need to smooth your results pretty heavily. You say you want 8760 scalars, one for each hour of the year. Well, you started with 8760 scalars, one for each hour of the year. If the price in each hour of the year was independent of the price in every other hour of the year, then the maximum likelihood estimate of next year's price would just be last year's price. The only reason why your problem is complicated is because you expect that the prices are not independent. Really your question about seasonality is about how to measure certain kinds of non-independence. And what you have to hope is that the interdependence of the different variables is strong enough and discoverable enough that you can actually predict 8760 scalars reliably.
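
The point about weekly effects can be checked with a short sketch, assuming one sample per hour and a non-leap year (8760 samples). `nearest_bin` is a hypothetical helper written for this example:

```python
import numpy as np

n_hours = 8760
# Frequencies of the FFT bins in cycles per hour; bin k sits at k / 8760.
freqs = np.fft.rfftfreq(n_hours, d=1.0)

def nearest_bin(period_hours):
    """Return (index of the closest FFT bin, whether it matches exactly)."""
    target = 1.0 / period_hours
    k = int(np.argmin(np.abs(freqs - target)))
    exact = bool(np.isclose(freqs[k], target, rtol=0.0, atol=1e-12))
    return k, exact

daily = nearest_bin(24)     # 8760 / 24 = 365 cycles per year, a whole number
weekly = nearest_bin(168)   # 8760 / 168 is about 52.14, not a whole number
print(daily, weekly)
```

A daily cycle lands exactly on a bin, while a weekly cycle falls between bins 52 and 53, which is where window functions and leakage come into the picture.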

odd meteor Jun 24, 2023, 11:33 PM

#

grave summit Let's set some background here, I am studying a financial time series which is r...

In addition to Kyle's answer, you might find some value in this year's ClimateAI workshop at ICLR in Rwanda https://colab.research.google.com/github/bitstoenergy/iclr-tutorial/blob/main/SmartMeterDataAnalytics_Tutorial.ipynb#scrollTo=qcgRU09V3LMh

Google Colaboratory

dire violet Jun 25, 2023, 1:14 AM

#

hi, im new to ml and im trying to build a food recommendation system. i'm not sure what type of model to use though. i did a little bit of research and found that a hybrid of collaborative filtering and content-based filtering could be ideal for what i want. for hybrid, i'm not sure where to find these models to use. in addition, how does one train a model? I understand you feed it data but what type of data do i feed

serene scaffold Jun 25, 2023, 1:33 AM

#

dire violet hi, im new to ml and im trying to build a food recommendation system. i'm not su...

How one trains a model depends on a lot of things. For the moment, I'm not clear on what you want the user experience to be

dire violet Jun 25, 2023, 1:35 AM

#

what do you mean by user experience? like what data the user will have?

serene scaffold Jun 25, 2023, 1:46 AM

#

dire violet what do you mean by user experience? like what data the user will have?

What information are they expected to give to the model

dire violet Jun 25, 2023, 1:47 AM

#

serene scaffold What information are they expected to give to the model

i was planning like
gender, age, ethnicity, food type preferences (spicy etc) and as they interact with the app, it'd gather more info. clicks, ratings, reviews etc

odd relic Jun 25, 2023, 1:58 AM

#

what are your thoughts on this? It seems to have dropped real quick which concerns me. When training on a smaller dataset it took 80 epochs to go from 10K to 14.2, now when I moved to a dataset of 10K images it has this behavior

serene scaffold Jun 25, 2023, 3:27 AM

#

dire violet i was planning like gender, age, ethnicity, food type preferences (spicy etc) a...

what do you plan to do with the gender, age, and ethnicity information?

#

do you expect that that will inform their food preferences in some way?

odd relic Jun 25, 2023, 5:15 AM

#

oh man this is weird lol

cursive crown Jun 25, 2023, 5:17 AM

#

Hi everyone, I think I found a bug in pandas

import pandas as pd

date_1 = pd.to_datetime("2012-02-05")
print(date_1 - pd.offsets.MonthBegin())
# prints 2012-02-01 (prints a timestamp but this is the date)

date_2 = pd.to_datetime("2012-02-01")
print(date_2 - pd.offsets.MonthBegin())
# prints 2012-01-01

I want date_2 to remain as is because that's the beginning of the month. How do I do this if both kinds of dates are in the same column?

agile cobalt Jun 25, 2023, 5:22 AM

#

cursive crown Hi everyone, I think I found a bug in `pandas` ```python import pandas as pd d...

just floor instead of using an offset?

#

never mind, not sure

#

maybe .replace(day=1)

cursive crown Jun 25, 2023, 5:29 AM

#

agile cobalt never mind, not sure

That works👍. Never knew there was a replace for Timestamp. Thanks!!

dire violet Jun 25, 2023, 5:29 AM

#

serene scaffold what do you plan to do with the gender, age, and ethnicity information?

well age and ethnicity i feel like have some impact on food preferences so to answer your question, yes

cursive crown Jun 25, 2023, 5:35 AM

#

cursive crown That works👍. Never knew there was a `replace` for `Timestamp`. Thanks!!

@agile cobalt my mistake, the series dtype is datetime64[ns] and it has no replace method. I tried accessing .dt and then calling replace, which also doesn't work

agile cobalt Jun 25, 2023, 5:42 AM

#

no clue, my last guess would be trying something like se - timedelta(se.dt.days - 1, unit='days') but idk which kind of timedelta would fit

#

something like resample might work depending on what exactly you're doing

odd relic Jun 25, 2023, 5:47 AM

#

dire violet i was planning like gender, age, ethnicity, food type preferences (spicy etc) a...

I feel like this can just be a non-AI app

#

AI seems overkill

#

just do some research

#

like what does gender have anything to do with it

#

food preferences yes

#

but again, you dont need AI for that

cold osprey Jun 25, 2023, 5:50 AM

#

AI everything innit

dire violet Jun 25, 2023, 5:52 AM

#

odd relic just do some research

how bout for dietary preferences? cuisines and stuff. if not, how would i accomplish this

cursive crown Jun 25, 2023, 6:02 AM

#

agile cobalt no clue, my last guess would be trying something like `se - timedelta(se.dt.days...

Thanks! But I resolved it by doing date_series.dt.to_period('M').dt.to_timestamp()
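
For reference, a minimal sketch of the ideas from this exchange, assuming a toy Series named `date_series` (made-up values) of dtype datetime64[ns]. Subtracting `MonthBegin` from a date that already sits on the 1st moves it to the previous month start; as far as I know that is the documented behaviour of anchored offsets rather than a bug. The variants below leave a first-of-month date unchanged:

```python
import pandas as pd

# Toy column mixing mid-month dates and a date already on the 1st.
date_series = pd.Series(pd.to_datetime(["2012-02-05", "2012-02-01", "2012-03-31"]))

# Approach from the chat: round-trip through a monthly Period.
via_period = date_series.dt.to_period("M").dt.to_timestamp()

# Working form of the timedelta idea: subtract (day - 1) days from each date.
# Note this keeps any time-of-day component, unlike the Period round-trip.
via_timedelta = date_series - pd.to_timedelta(date_series.dt.day - 1, unit="D")

# For a single Timestamp, rollback() only moves dates that are not on the offset.
single = pd.offsets.MonthBegin().rollback(pd.Timestamp("2012-02-01"))

print(via_period.tolist())
print(via_timedelta.tolist())
print(single)
```

The Period round-trip is probably the most readable of the three for a whole column; `rollback` is handy when only one `Timestamp` is involved.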

young pewter Jun 25, 2023, 6:10 AM

#

anybody doing the kaggle project?

#

the spaceship titanic

lusty lotus Jun 25, 2023, 7:37 AM

#

have i implemented my training loop correctly? the LR goes really high after I implemented L1 and L2 for my regression problem but the shape of the graph is still kinda the same, it's just that the numerical value of the error itself got really high
https://pastebin.com/G5hngfd6


def cycle(self, X_train, y_train, X_val, y_val, best_score, l1_lamb...

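
One possible reading of the jump in the error value, sketched under assumptions since the linked code is not reproduced here: if the L1 and L2 penalties are added to the number being plotted, that number grows by the size of the penalty terms even when the fit itself is unchanged, so the curve keeps its shape but sits higher. The helper `regularized_loss` and the names `l1_lambda` and `l2_lambda` below are hypothetical, not taken from the pastebin.

```python
import numpy as np

def regularized_loss(w, X, y, l1_lambda=0.0, l2_lambda=0.0):
    """Return (data_loss, total_loss) for a linear model y ~ X @ w."""
    residual = X @ w - y
    data_loss = np.mean(residual ** 2)            # plain MSE, no penalty
    l1_penalty = l1_lambda * np.sum(np.abs(w))    # lasso term
    l2_penalty = l2_lambda * np.sum(w ** 2)       # ridge term
    return data_loss, data_loss + l1_penalty + l2_penalty

# Toy data generated from known weights, so the fit error is zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.5, 4.0])
y = X @ true_w

mse, total = regularized_loss(true_w, X, y, l1_lambda=0.1, l2_lambda=0.1)
print("data loss:", mse)
print("data loss + penalties:", total)
```

Logging the unpenalized error separately keeps runs with and without regularization comparable on the same plot.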

simple tapir Jun 25, 2023, 7:39 AM

#

past meteor As for why they don't hit *exactly* with ridge regression and they do with lasso...

ooh, great explanation, thanks a lot! 🙏

young pilot Jun 25, 2023, 8:49 AM

#

We need a freelancer who can build a prediction model from our dataset according to our requirements.

hybrid mica Jun 25, 2023, 11:11 AM

#

I have trained a self organising map. Do SOMs have a stochastic nature? Why am I getting the same SOM after re-running the code?
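
SOM training is normally stochastic in two places: the initial weights and the order in which samples are presented. Getting an identical map on every run usually suggests the random seed is fixed somewhere, or the initialization is deterministic. A toy NumPy sketch of that effect, not any particular library's implementation; `train_som` and its parameters are made up for illustration, and the learning rate and neighbourhood width are kept constant for brevity:

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr=0.5, sigma=1.0, seed=None):
    """Tiny SOM: random weight init and random sample order, both drawn from `seed`."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            # Best matching unit: node whose weight vector is closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU on the grid.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights

data = np.random.default_rng(42).random((50, 3))
same_a = train_som(data, seed=0)
same_b = train_som(data, seed=0)
other = train_som(data, seed=1)

print("same seed, same map:", np.array_equal(same_a, same_b))
print("different seed, same map:", np.allclose(same_a, other))
```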

slender kestrel Jun 25, 2023, 11:15 AM

#

odd relic what are your thoughts on this? It seems to have dropped real quick which concer...

lol it did go down real quick i have made a couple of models it never really goes down that quick imo

slender kestrel Jun 25, 2023, 11:17 AM

#

odd relic oh man this is weird lol

someone had the same issue on stack overflow i can link you the question if you want

lapis sequoia Jun 25, 2023, 12:34 PM

#

Anyone help plz ?

latent tundra Jun 25, 2023, 2:03 PM

#

I programmed a simple 2d topdown shooter and want to apply a tensorflow agent on it. Should I use the absolute coordinates of the enemy or the coordinates relative to the player as input?

bronze jacinth Jun 25, 2023, 2:04 PM

#

lapis sequoia Anyone help plz ?

whats the command you used for this?

past meteor Jun 25, 2023, 2:24 PM

#

latent tundra I programmed a simple 2d topdown shooter and want to apply a tensorflow agent on...

Depends on how you want your agent to act

#

I'd say relative - when I was doing Nethack RL stuff I expressed everything in relative coordinates

latent tundra Jun 25, 2023, 2:26 PM

#

Yea, now that I really think about it there is probably no advantage to giving absolute coordinates, and they would probably be converted internally to relative coords or even just the distance

past meteor Jun 25, 2023, 2:27 PM

#

If you don't encode your agent's position and you're using absolute coordinates, you'd have a bad agent I think
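
A small sketch of the relative-coordinate idea, assuming 2D positions given as `(x, y)` pairs; the function name `relative_observation` and the choice to append the distance are illustrative, not part of any TensorFlow Agents API:

```python
import numpy as np

def relative_observation(player_xy, enemy_xy):
    """Build an observation from the enemy's position relative to the player."""
    player = np.asarray(player_xy, dtype=np.float32)
    enemy = np.asarray(enemy_xy, dtype=np.float32)
    offset = enemy - player               # (dx, dy) as seen from the player
    distance = np.linalg.norm(offset)     # straight-line distance
    return np.concatenate([offset, [distance]]).astype(np.float32)

obs = relative_observation(player_xy=(100, 50), enemy_xy=(103, 54))
print(obs)
```

The same observation comes out wherever the pair sits on the map, so the agent does not have to learn the subtraction itself.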

dusk tide Jun 25, 2023, 3:27 PM

#

I am working on a movies dataset. These 2 images contain the collection names with the sum and mean of revenue and budget for each collection.
When I took the sum of revenue, the Harry Potter collection came in 1st place.
But when I took the mean of revenue, the Avengers collection came in 1st place.
So which of the two is truly the bigger success?
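
A sketch of why the two rankings can disagree, using made-up numbers for two hypothetical collections (not real box-office data): the sum rewards collections with more films, while the mean measures revenue per film. Showing the film count next to both usually makes the difference obvious.

```python
import pandas as pd

# Made-up numbers: collection A has more films, collection B earns more per film.
movies = pd.DataFrame({
    "collection": ["A", "A", "A", "A", "B", "B"],
    "revenue": [300, 300, 300, 300, 500, 500],
})

summary = movies.groupby("collection")["revenue"].agg(["sum", "mean", "count"])
print(summary)
print("top by sum:", summary["sum"].idxmax())
print("top by mean:", summary["mean"].idxmax())
```

Which one counts as "success" depends on the question being asked: total franchise earnings or typical earnings per film.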

#data-science-and-ml

Question:

.latex
\[
\begin{bmatrix}
y_1 \\
y_2 \\
y_3 \\
\vdots
\end{bmatrix}
\]