#data-science-and-ml


muted crypt
#

Makes sense yeah

past meteor
#

Do you rescale your output?

muted crypt
#

I've done that with just one of the coordinates, latitude vs time and just a column

past meteor
#

I assume you're using MSE or something similar but lat, lon and altitude are on different scales

muted crypt
#

I guess that is wrong

#

This is the X data

past meteor
#

Like, on paper what you're doing is that you have a 3D input that is fed into your model and together with the latent vector Z it produces a 3D output. This 3D output is what produces the loss, this is pretty much like multi task learning.

muted crypt
past meteor
#

What you should be predicting is the real trajectory

muted crypt
#

indeed, but in the train dataframe you do need to have it, right?

cold osprey
#

huh

past meteor
#

Are you using PyTorch or TensorFlow? I'm going to look for an example

muted crypt
#

Keras

muted crypt
muted crypt
past meteor
#

You just need a SimpleRNN and Dense layer to start with

muted crypt
#

as different flights are unrelated, can they all be dumped in the same array?

young granite
#

array is just a form of input

past meteor
#

I don't know about Keras specifically but you typically end up with a dataset that looks like this: num_steps x features x num_outputs so you can have unrelated ones

#

num_outputs is your y_train

#

You're conditioning over windows meaning if your window is of size 3 it'll look like [1,2,3] -> 3(pred) ; [2, 3, 4] -> 4 (pred) ; [3, 4, 5] -> 5 (pred) ....
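A minimal sketch of the windowing described above, assuming each flight is a pair of equally long arrays (intended and actual positions) and that the target is the actual value at the last step of each window. The function name `make_windows` and the toy data are made up for illustration. Windows are built per flight and only then concatenated, so unrelated flights never end up in the same window. Keras recurrent layers expect input shaped (samples, timesteps, features), which is what `X_all` has here.

```python
import numpy as np

def make_windows(intended, actual, window):
    """Build (X, y) where X[i] is `window` consecutive intended steps
    and y[i] is the actual value at the last step of that window."""
    intended = np.asarray(intended, dtype=float)
    actual = np.asarray(actual, dtype=float)
    n = len(intended) - window + 1
    X = np.stack([intended[i:i + window] for i in range(n)])
    y = actual[window - 1:]
    return X, y

# Two hypothetical flights with one feature each (5 and 4 time steps)
intended_a = np.arange(1, 6, dtype=float).reshape(-1, 1)
actual_a = intended_a + 0.5
intended_b = np.arange(10, 14, dtype=float).reshape(-1, 1)
actual_b = intended_b - 0.5

# Window each flight separately, then stack the windows
parts = [make_windows(i, a, window=3)
         for i, a in [(intended_a, actual_a), (intended_b, actual_b)]]
X_all = np.concatenate([p[0] for p in parts])  # (samples, timesteps, features)
y_all = np.concatenate([p[1] for p in parts])  # (samples, features)
```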

muted crypt
#

Say that I have 50 flights, each with 10 features. The features vary over time, so the first flight has a latitude that varies over time, plus speed, altitude... Do all of these have to be an array inside every cell of the matrix, or is it better to have an id column and simply have as many rows as there are points across all the flights?

#

This is where I struggle the most, I don't really find the best approach on the dataset you mentioned

muted crypt
#

I had tested this for 12 flights

#

FPLlat being the latitude of the intended trajectory and TELlat the latitude of the real trajectory

muted crypt
#

Like [1_intended, 2_intended, 3_intended] -> 3_real (pred)?

#

The thing here is that i don't want to extend a timeseries but rather generate a new one from a given one, this messes up my head

past meteor
#

You need to try putting it into words what your task is. Are you trying to map a coordinate from intended to flown (fully markovian) or are you trying to map a coordinate from intended to flown, given the past few coordinates (markovian over a window)

#

Depending on how these drones work, your initial results and what you want you may even need a bi-directional RNN (but don't start doing this yet)

#

Maybe a reasonable feature-set and approach is using a feedforward net (no RNN) with [X,Y,Z, starting X, starting Y, starting Z, time since start, time to end] as features

muted crypt
past meteor
#

Why? I think that potentially points in the middle are the ones that are off. The points close to the start and close to the end are typically similar (according to the pics)

#

You can also make the problem easier by predicting the diff between intended and flown etc etc

#

You just need to focus on understanding your task better on the ML side I think

muted crypt
#

I've been working on that for months and I feel like I have all the necessary data to feed it into a NN but can't figure it out because I haven't been able to find any similar example

muted crypt
muted crypt
cold osprey
#

seems good

#

loss should be very close to 0

past meteor
muted crypt
#

but i've done that by training the model with just a feature (latitude, and the time stamps) of a single flight

cold osprey
#

try with lat long and alt

muted crypt
past meteor
#

Are you only doing one of the 3 dimensions now @muted crypt ?

past meteor
#

Okay I'll try explaining what I meant again with a lot less jargon:

muted crypt
#

My dataframe is originally just a column, should I do 3 columns now and then what the Y is the next 3 values of lat, long, alt too?

past meteor
#

Your task: X, Y, Z (intended) -> X, Y, Z (actual) for all time steps T for all flights.

muted crypt
past meteor
#

Looking at your images, intended != actual typically in the middle of your flight (look at your plots)

#

So a baseline model (could be linear regression even 😄) is the following: time_since_start, time_to_end and you predict 3 things: difference_intended_flown_X, difference_intended_flown_Y, difference_intended_flown_Z

#

My intuition is that this model is already going to be quite good! 😄 This is a drastic simplification of your problem

cold osprey
#

diff_X, diff_Y, diff_Z as a function of time since start and time to end?

#

hmmm

past meteor
#

No. EDIT: actually yes, I misread.

muted crypt
#

how can I know the diff_X before the flight? does diff_X mean the distance to the real? or the distance to the start?

past meteor
muted crypt
#

Just the positions where the drone will fly by and at which time

#

and velocity for instance of each segment, from which you can get the time

past meteor
#

So, I suspect that the model will be used to "adjust" the intended path to correspond to the actual path ahead of flying right? (and not during)

past meteor
#

Then you can definitely make a model like I described above

muted crypt
#

so you can't "rely" on previous information from that intended flight

past meteor
#

Even if it's a bad model, you need to make it imo because it's a good baseline to compare other models to

cold osprey
#

is the predicting real-time? as in as the drone flies, it will show the predicted path it will take

past meteor
#

Based on looking at this I suspect that there is indeed a relationship between the time to start, time to end and the difference between intended and flown

muted crypt
cold osprey
#

with the simple model zestar proposed:

  1. have intended flight path
  2. use model to get diffX diffY diffZ
  3. use diff(s) and intended flight path to get predicted flight path
  4. compare to actual flight path
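
The four steps above can be sketched like this, on synthetic data. Everything here is an assumption for illustration: the flights are made up (straight intended path, actual path bulging mid-flight), and a small random forest stands in for the baseline, because a purely linear model in these two features cannot produce a bump in the middle of a flight.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def synthetic_flight(n_points, duration):
    """Hypothetical flight: straight intended path, actual path bulging mid-flight."""
    t = np.linspace(0.0, duration, n_points)
    intended = np.column_stack([t, 2.0 * t, np.full(n_points, 50.0)])
    bump = np.sin(np.pi * t / duration)             # 0 at both ends, 1 in the middle
    diff = np.column_stack([0.5 * bump, -0.3 * bump, 0.1 * bump])
    actual = intended + diff + rng.normal(scale=0.01, size=diff.shape)
    features = np.column_stack([t, duration - t])   # time_since_start, time_to_end
    return features, intended, actual

# Training set: target is the per-axis difference actual - intended
train = [synthetic_flight(200, d) for d in (60.0, 80.0, 100.0)]
X_train = np.vstack([f for f, _, _ in train])
y_train = np.vstack([a - i for _, i, a in train])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# 1. have the intended flight path of a new flight
X_new, intended_new, actual_new = synthetic_flight(200, 80.0)
# 2. use the model to get diffX, diffY, diffZ
pred_diff = model.predict(X_new)
# 3. combine diffs and intended path into a predicted path
predicted_path = intended_new + pred_diff
# 4. compare to the actual path (and to using the intended path as-is)
rmse_model = np.sqrt(np.mean((predicted_path - actual_new) ** 2))
rmse_intended = np.sqrt(np.mean((intended_new - actual_new) ** 2))
```

Comparing `rmse_model` against `rmse_intended` shows whether the model beats simply assuming the drone follows the plan exactly.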
muted crypt
#

so combining diff with intended I see

#

Yet the model has to take both intended and real right?

cold osprey
#

for zestar's model, the model only takes time since start and time to end as inputs

past meteor
#

Exactly that

cold osprey
#

it doesn't 'care' about the X Y Z per se

past meteor
muted crypt
#

and what would be the Y in the model then?

cold osprey
#

X and Y - lat and long

past meteor
cold osprey
#

Z - altitude

#

oh soz

#

u mean that y

past meteor
#

Indeed

muted crypt
past meteor
#

Yeah, it's easy to compute this no?

#

For all flights you subtract the intended from the actual

muted crypt
#

here a big question arises though, do you take into account the time shift?

cold osprey
#

what is the time shift?

past meteor
#

What do you mean with time shift indeed?

muted crypt
cold osprey
#

i think we r using euclidean here

past meteor
#

Your time series are not aligned?

cold osprey
#

oh

muted crypt
cold osprey
#

real link doesnt work

#

whats FLPturn and FLPwpt?

muted crypt
past meteor
#

So for the intended one you have a lot less samples than for flown?

muted crypt
#

So the Real is the data recorded by the drone in 0.1 second increments ('secs' column); the intended is just the trajectory to be followed

past meteor
#

Because it generates a bunch of waypoints

cold osprey
#

hmmm

muted crypt
cold osprey
#

and we want to get from intended to real without actually flying the drone

past meteor
#

Does the drone pass every waypoint?

muted crypt
#

well not exactly on top, but very close at least

cold osprey
#

1st data row of real corresponds to 1st waypoint?

past meteor
#

Okay, then I would only predict 12 points per flight, the ones closest to the waypoint

muted crypt
past meteor
#

Why? Otherwise you're making big assumptions about the flight. You can't upsample the data / linearly interpolate between waypoints to have the same sample rate as the drone unless you're 100 % sure the drone is programmed to move between waypoints in a straight line

cold osprey
#

need a way to align them

past meteor
#

if you are sure that the path between the waypoints is linear then you can upsample to get the same sample rate as the flight

muted crypt
muted crypt
#

Like this:
The evolution of the error along the flight

cold osprey
#

ah

past meteor
#

Look, if you're sure that the drone flies in a straight line you should upsample between the waypoints to have an observation every 0.1 s

cold osprey
#

^

muted crypt
#

I've interpolated the intended trajectory so that the number of rows matches the number of rows in the real one, is that what you mean?
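
A sketch of that interpolation, assuming straight-line motion between waypoints and known waypoint times. The function name, the waypoint values and the 0.1 s grid are made up for illustration.

```python
import numpy as np

def upsample_waypoints(wp_time, wp_xyz, sample_time):
    """Linearly interpolate each coordinate of the waypoints onto sample_time."""
    wp_xyz = np.asarray(wp_xyz, dtype=float)
    return np.column_stack([
        np.interp(sample_time, wp_time, wp_xyz[:, k])
        for k in range(wp_xyz.shape[1])
    ])

# Hypothetical intended trajectory: 3 waypoints reached at t = 0, 10 and 20 s
wp_time = np.array([0.0, 10.0, 20.0])
wp_xyz = np.array([[0.0, 0.0, 50.0],
                   [100.0, 0.0, 50.0],
                   [100.0, 50.0, 60.0]])

# Same sampling grid as the recorded flight (one row every 0.1 s)
sample_time = np.linspace(0.0, 20.0, 201)
intended = upsample_waypoints(wp_time, wp_xyz, sample_time)
```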

past meteor
#

yes

muted crypt
#

I've done this yeah

cold osprey
#

with the straight line assumption of the intended data, then can try zestar simple model first then

past meteor
#

Then you don't need to align them or what am I missing?

cold osprey
#

he's already aligned it right

muted crypt
#

Depends on what you mean by align

past meteor
#

I'd truncate your flight dataset to be between waypoint 1 and the last way point as well

muted crypt
#

It's impossible to align them perfectly; you can find the best time shift, as I like to call it (the amount of time lag between real and intended)
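
One way such a time shift could be estimated, as a sketch only: try every lag up to some maximum and keep the one with the lowest mean squared error. This assumes both series are already on the same sampling grid, have the same length, and that the real flight lags behind the intended one (non-negative lag). `best_lag` is a hypothetical helper, not what was actually used here.

```python
import numpy as np

def best_lag(intended, real, max_lag):
    """Return (lag, mse): the shift in samples of `real` relative to
    `intended` that gives the lowest mean squared error."""
    intended = np.asarray(intended, dtype=float)
    real = np.asarray(real, dtype=float)
    n = len(intended)
    best, best_err = 0, np.inf
    for lag in range(max_lag + 1):
        # Compare intended[0 .. n-lag) with real[lag .. n)
        err = np.mean((intended[: n - lag] - real[lag:]) ** 2)
        if err < best_err:
            best, best_err = lag, err
    return best, best_err

# Toy example: the "real" signal is the intended one delayed by 7 samples
t = np.linspace(0.0, 10.0, 200)
intended = np.sin(t)
real = np.concatenate([np.full(7, intended[0]), intended[:-7]])
lag, err = best_lag(intended, real, max_lag=20)
```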

muted crypt
#

I have this now for instance

#

For 16 flights, already interpolated

past meteor
#

After truncating it to be between way point 1 and N I'm not sure you need to align them? Especially if you've already interpolated

muted crypt
muted crypt
cold osprey
#

not for the time since start and time to end model

muted crypt
past meteor
#

You'd need to create variables such as time since waypoint and time to waypoint
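
A sketch of how those two variables could be computed, assuming the time at which each waypoint is reached is known and sorted. The helper name and the example times are made up.

```python
import numpy as np

def waypoint_time_features(sample_time, wp_time):
    """For each sample, return (time since previous waypoint, time to next waypoint)."""
    sample_time = np.asarray(sample_time, dtype=float)
    wp_time = np.asarray(wp_time, dtype=float)
    # Index of the first waypoint at or after each sample, kept inside [1, last]
    nxt = np.searchsorted(wp_time, sample_time, side="left")
    nxt = np.clip(nxt, 1, len(wp_time) - 1)
    prev = nxt - 1
    return sample_time - wp_time[prev], wp_time[nxt] - sample_time

# Waypoints at t = 0, 10, 30: the two segments have different durations
since_wp, to_wp = waypoint_time_features([0, 2, 10, 12, 30], [0, 10, 30])
```

Because the segments differ in length, the two features are not simply mirrored copies of each other.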

cold osprey
#

^

past meteor
#

If the drone passes each waypoint then the idea is that it deviates from the path between waypoints

muted crypt
#

yet this is just 2 arrays which are flipped, aren't they?
time since start: [0, 1, 2, 3, 4]
time to end: [4, 3, 2, 1, 0]

past meteor
#

If the time between each waypoint is equal then yes

cold osprey
#

for 5 data points

muted crypt
cold osprey
#

wait, is the model predicting for each waypoint or on all points(from interpolation)

past meteor
#

but I can imagine you could be 2 time points from a given waypoint but 7 time points to the next

muted crypt
#

so the time to end refers to the time to the next waypoint?

past meteor
#

yes

muted crypt
#

How does that differ from the actual end of the flight?

past meteor
#

My assumption is that the drone comes pretty close, if not exactly, to each waypoint and it deviates in the middle

muted crypt
past meteor
#

Like, you can calculate this easily before you build a model to see if it's true

muted crypt
#

it won't deviate a thing on these horizontal long segments

past meteor
#

All of this is "feature engineering" and is the cornerstone of ML. You have to be a bit creative haha, if you're creative enough you can really simplify the problem for LSTMs to linear regression

past meteor
muted crypt
#

where the peaks correspond to the turns

#

yet for the rest it is not really true

past meteor
#

What is this? The difference?

muted crypt
past meteor
#

Just add the segment type as a variable for your model

muted crypt
#

Difference from real to intended

past meteor
#

And add time between waypoints

muted crypt
#

do you mean to categorize each segment?

past meteor
#

CurvaturePassed etc.

muted crypt
#

I mean a turn is not a segment but it falls into a certain length of 2 segments

past meteor
#

Can't you know if you're turning between 2 waypoints by looking at the X, Y and Z coordinates?

past meteor
#

In a straight line you only have X that varies, no?

muted crypt
#

Because you cant just have 3 points like a triangle and tell where are the turns

past meteor
#

Can there even be a turn between 2 way points?

#

Especially since you said you linearly interpolate

muted crypt
past meteor
#

Tbh, I can't chat all evening but you just need to "distill" your knowledge of the problem into variables and simplify the problem as much as possible

muted crypt
past meteor
#

And afterwards you need to build a baseline model

#

Then you start relaxing your assumptions 1 by 1 and creating more powerful models. At the end of this process you get to RNN, LSTMs, maybe even bi-directional RNNs

muted crypt
#

yeah the thing is that there not an example of something similar

past meteor
#

Using time from waypoint and time to waypoint to predict the diffs is already making many assumptions that you can then start relaxing later on (or you add variables to make more reliable assumptions)

muted crypt
#

and really, predicting something like temperature is pretty easy, but this is much more different and I don't really know why

past meteor
#

No, the thing with predicting temperature is that they've already done everything I've described, and it's documented and makes sense because all of the tricks/thinking are written down already 😛

muted crypt
muted crypt
past meteor
#

I always use a mix of Ridge, Lasso, Random Forest, xgboost, SVMs (depending on my dataset's size) and neural networks

muted crypt
#

and can you predict 3 columns at a time for instance?

past meteor
#

This fits 3 models, neural networks on the other hand fit all of it at once with 3 output neurons
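
What "this" referred to is not preserved in the log. One way to get the "three separate models" behaviour in scikit-learn is `MultiOutputRegressor`, which fits one copy of a regressor per target column. The sketch below assumes that setup, on synthetic data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))   # e.g. two time features
Y = np.column_stack([X[:, 0],              # stand-ins for diffX, diffY, diffZ
                     X[:, 1] ** 2,
                     X[:, 0] * X[:, 1]])

# GradientBoostingRegressor handles one target at a time; the wrapper fits one per column
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=0))
model.fit(X, Y)

pred = model.predict(X[:5])       # one row per sample, one column per target
n_models = len(model.estimators_)
```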

muted crypt
#

wait but the y is diffX, diffY, diffZ. so now I have to compute the error in 3 dimensions. I just had the absolute distance :(

past meteor
#

Yes...

untold cliff
#

If I have a categorical feature with lots of categories and some portion of these categories don't have a lot of instances in the dataset, like maybe less than 10 for each of them, does it make sense to group them all into just one new category, since I believe they wouldn't provide much information because they have very few instances?
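
For reference, the grouping being asked about could look like this in pandas. This is a sketch: the threshold of 10 comes from the question, while the helper name and the label "other" are arbitrary choices.

```python
import pandas as pd

def group_rare(series, min_count=10, other_label="other"):
    """Replace every category seen fewer than `min_count` times with `other_label`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    return series.where(~series.isin(rare), other_label)

s = pd.Series(["a"] * 20 + ["b"] * 15 + ["c"] * 3 + ["d"] * 2)
grouped = group_rare(s, min_count=10)
counts_after = grouped.value_counts().to_dict()
```

If this is done, the set of rare categories should be determined on the training split only and then reused on validation and test data.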

muted crypt
#

damn this is mad but I guess I'll try it

past meteor
#

You need to ensure diffX, diffY and diffZ are on the same scale and then take the mean of the error
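
A sketch of what "same scale, then mean" can look like, assuming each axis is divided by the standard deviation of its training differences. The numbers are invented: think of two small horizontal axes and a much larger altitude axis.

```python
import numpy as np

def scaled_mean_error(diff_true, diff_pred, scale):
    """Mean absolute error after dividing each axis by its own scale."""
    return float(np.mean(np.abs(diff_true - diff_pred) / scale))

# Hypothetical training differences; the third axis is on a much larger scale
train_diffs = np.array([[0.0, 0.0, 0.0],
                        [2.0, 20.0, 200.0]])
scale = train_diffs.std(axis=0)             # per-axis scale: [1, 10, 100]

diff_true = np.array([[1.0, 10.0, 100.0]])
diff_pred = np.zeros_like(diff_true)

raw_error = float(np.mean(np.abs(diff_true - diff_pred)))       # dominated by the third axis
scaled_error = scaled_mean_error(diff_true, diff_pred, scale)   # each axis counts equally
```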

muted crypt
#

oh yes do you recommend scaling?

past meteor
#

It's not that mad tbh 🤷‍♂️

past meteor
muted crypt
#

then doing the model

muted crypt
tall tulip
#

We use the ADF test and the KPSS test to check stationarity. Is there any method available to check the seasonality of the data?
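
One simple check, sketched here with plain NumPy, is to look at the autocorrelation at candidate seasonal lags: a strong positive value at lag 24 in hourly data suggests a daily cycle. The series below is synthetic. statsmodels also provides `plot_acf` and `seasonal_decompose` for the same purpose.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic hourly series with a 24-hour cycle, two weeks long
t = np.arange(24 * 14)
series = np.sin(2 * np.pi * t / 24)

ac = {lag: autocorr(series, lag) for lag in (6, 12, 24, 48)}
```

Here the autocorrelation is strongly positive at lags 24 and 48 and strongly negative at lag 12, which is the signature of a 24-step period.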

past meteor
plucky raft
#

hey guys, im trying to save a file to a variable like so
dataset = 'filename'

#

but its not working

#

do i have to include the path to the file?

#

or the extensions?
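
Assigning `'filename'` only stores that text in the variable; nothing is read from disk. A sketch of the difference, assuming the file is a CSV and pandas is being used (the message does not say which library or file type is involved). The temporary file is created here only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file, written to a temp folder just for this example
folder = Path(tempfile.mkdtemp())
csv_path = folder / "filename.csv"
csv_path.write_text("a,b\n1,2\n3,4\n")

dataset_name = "filename"         # just a string, no file is opened
dataset = pd.read_csv(csv_path)   # reads the file: needs the path and the extension
```

A bare name like `"filename.csv"` works only if the file sits in the current working directory; otherwise the full or relative path is needed.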

faint mist
#

I just figured out interesting observation when dealing with time series problems

#

I usually train my model after scaling down the data in values between 0 and 1

#

The model will always learn to predict a value between 0-1

#

However, in case of regression this may be limiting the model capability to generalise

#

For example if the maximum value in the training dataset is 1000

#

Scaled down to 0-1 the 1000 will become 1

#

On the other hand, if the maximum value in the testing split is 2000

#

Scaled down based on the scaler of the training data set

#

The value will be 2

#

The model will never predict 2

past meteor
#

That's precisely why fitting your normalization stuff on your entire dataset is cheating

faint mist
#

Yes, this what i did at first

#

Then normalised only the training split and used the same scaler for testing
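
The effect described above, reproduced with scikit-learn's `MinMaxScaler` on toy numbers (maximum of 1000 in training, 2000 in testing):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0], [250.0], [1000.0]])
test = np.array([[500.0], [2000.0]])

# Fit the scaler on the training split only, then reuse it for the test split
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)   # values in [0, 1]
test_scaled = scaler.transform(test)     # 2000 maps to 2.0, outside [0, 1]
```

Whether a model can output 2 depends on the model: tree ensembles and networks with a sigmoid output cannot go beyond the range seen in training, while a network with a linear output layer can in principle.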

past meteor
#

I did an experiment a while ago with synthetic time series and noticed that if you have preprocessing such as normalization and do not update it over time (especially if there is a trend), the drift in the normalization alone is enough to kill your model

faint mist
#

Exactly!

past meteor
#

I refit the normalization online at each timestep y_actual became available
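
A sketch of that idea, assuming plain standardization recomputed from all values observed so far. This is not the code from the experiment; the helper name and the toy trend are made up.

```python
import numpy as np

def expanding_standardize(y, min_history=2):
    """Standardize y[t] with the mean and std of y[:t] only (no look-ahead)."""
    y = np.asarray(y, dtype=float)
    out = np.full(len(y), np.nan)
    for t in range(min_history, len(y)):
        past = y[:t]
        std = past.std()
        if std > 0:
            out[t] = (y[t] - past.mean()) / std
    return out

y = np.arange(10.0)                            # series with a pure upward trend

# Statistics frozen after the first 5 points: later values drift far from 0
static = (y - y[:5].mean()) / y[:5].std()
# Statistics refit at every step: values stay in a moderate range
online = expanding_standardize(y)
```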

faint mist
#

Makes sense, but again I still think this limits the model capability

wheat snow
#

i want to analyze my youtube data. And i received a huge html package for that... I dont know much about html's so is it possible to pull that data out and restore it in a csv format?

faint mist
faint mist
#

its a service provided by github

#

chatgpt optimized for coding

wheat snow
#

uhhh

#

i dunno what u mean sorry

faint mist
#

Sorry if I confused you

wheat snow
#

ah

#

i see

faint mist
wheat snow
#

basiccly AI pylance

#

but it knows what project ur working on

#

so it knows what command you might need next?

faint mist
#

yes it will suggest multiple

#

you can ask chatgpt too

#

these tools are really amazing if you want a head start

wheat snow
#

true bruh last time i used chatgpt was in january

#

it sucked back then lmao

faint mist
#

then you take it from there and modify as needed

wheat snow
faint mist
#

Yes

#

Will speed up the process

#

outsource the labor work

wheat snow
#
from bs4 import BeautifulSoup
import csv

# Open the HTML file and read its contents
with open('Wiedergabeverlauf.html', encoding='utf8') as file:
    contents = file.read()

# Parse the HTML data using BeautifulSoup
soup = BeautifulSoup(contents, 'html.parser')

# Find the table containing the watch history data
# (the class name is a guess about the file's structure)
table = soup.find('table', {'class': 'table-section'})
if table is None:
    raise SystemExit("No table with class 'table-section' found; inspect the HTML and adjust the selectors")

# Create a list to hold the extracted data
data = []

# Loop through each row in the table and extract the data
for row in table.find_all('tr'):
    # Look up the title and watch time elements for each video
    title_tag = row.find('a', {'class': 'content-link'})
    time_tag = row.find('span', {'class': 'accessible-description'})
    # Skip rows (e.g. headers) that lack either element instead of failing on None
    if title_tag is None or time_tag is None:
        continue

    # Add the data to the list
    data.append([title_tag.text.strip(), time_tag.text.strip()])

# Save the data to a CSV file
with open('watch_history.csv', 'w', newline='', encoding='utf8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Watch Time'])
    writer.writerows(data)

"This code uses BeautifulSoup to parse the HTML data and find the table containing the watch history data. It then loops through each row in the table and extracts the title and watch time for each video, and saves the data to a list. Finally, it saves the data to a CSV file called 'watch_history.csv'.

Note that the above code assumes that the watch history data is contained within a table with the class 'table-section'. If your HTML file has a different structure, you may need to modify the code accordingly."

#

im not sure bout that

#

idk if the watch history data is stored in tables...

#

it looks like that

#

here a better pic

#

@faint mist normal that the script runs so long? i mean its a 50MB html

faint mist
#

Hmm, Ideally no

#

I will leave it for someone else to pitch in and help you with the matter

#

I am no expert in parsing html files and not sure how to help

#

I apologize

past meteor
wheat snow
faint mist
#

if you get what I mean

#

It will be close

#

ofc

#

but it could be closer

hard thicket
#

Hi, this might not be the right channel, so apologies if that's so (let me know and I'll delete / move it)
Looking for input on how people like to develop data pipelines for AWS from development to production, i.e. how do you start locally, when do you move to AWS, what account separation from production do you go through; anything and everything would be interesting.
We have some new projects where I'm trying AWS Glue / EMR (for PySpark) and I'm not sure what resources to make for the team around an idp and/or a testable, workable starting point

wheat snow
#

smth is wrong... that script has been running for like the last hour

#

and nothing happened

#

no errors... its just processing

waxen tusk
#

Thoughts on Data Factory?

dusty bay
#

I want to make a plot from a csv file using matplotlib. I have made the code but there is an error 'csv2df' object has no attribute 'plt'. Can anyone help me. Here is the code.

import pandas as pd
import matplotlib.pyplot as plt


class csv2df():
    
    def __init__(self):
        self.df = pd.read_csv("RMS level.csv")
        self.sheet = self.df[3:]
        
    def plot(self):
        self.x = self.sheet["RMS Level"]
        self.plt.plot(self.x)
        
        
show = csv2df()
show.plt.show()
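
The error comes from `self.plt` and `show.plt`: `plt` is the module imported at the top, not an attribute of the object, and `plot()` is never called. A corrected sketch follows, with some assumptions: the class takes the path as an argument, a small stand-in CSV is generated so the example runs on its own, and the non-interactive Agg backend is used so no window is needed.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


class Csv2Df:
    def __init__(self, path):
        self.df = pd.read_csv(path)
        self.sheet = self.df[3:]          # skip the first 3 rows, as in the original

    def plot(self):
        # plt is the imported module, so it is used directly, not via self
        fig, ax = plt.subplots()
        ax.plot(self.sheet["RMS Level"])
        return fig


# Hypothetical stand-in for "RMS level.csv"
csv_path = Path(tempfile.mkdtemp()) / "RMS level.csv"
csv_path.write_text("RMS Level\n" + "\n".join(str(v) for v in range(10)) + "\n")

show = Csv2Df(csv_path)
fig = show.plot()
n_points = len(fig.axes[0].lines[0].get_ydata())
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` after `show.plot()`.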

astral path
#

i'm working on a project where i'm trying to predict a player's success after four seasons in a basketball video game based on their high school ratings. basically, there's 20 features for a player in high school, and i'm trying to predict a specific statistic (PER) in the game during their senior year. the catch is that players who have a particularly high rating for their high school features won't play until their senior season, so there should be a soft limit for how good a player is, and if a player is too good, their predicted PER should also be lower.

my current model fails to take this into account and will predict the best high school players to have high PER as a senior, even though most of them won't return for multiple seasons. How do I fix this?

velvet abyss
#

i applied for a data engineering job

#

I mean, an internship to be more exact

#

How should I prepare myself I somehow reach the interview phase?

restive path
#

Hello

#

For those who started in data science without experience in a job, what is the most common thing they are asked to do?

stark zenith
patent pivot
cinder urchin
#

Hello. Does anyone know of or have a chatbot? If not, can you tell me the name of a model that I can use with the "transformers" library that doesn't need a lot of memory to work? I tried a few models and only 2 managed to crash my computer?

past meteor
tall tulip
#

I have a dataset with 5-minute timestamps which I resampled to hourly data. The data has daily and 12-day seasonality and is not stationary. I made the data stationary and then used a SARIMAX model, which gives a negative AIC value, but when I try to predict, it gives me a straight line. I also tried auto ARIMA, but it still didn't work for me. How can I improve its accuracy?

here is the model summary:

boreal gale
restive path
#

Hey guys, any data science on here?

A question regarding the learning of mathematics: according to what I have read, a data scientist must learn a lot, but in reality the most important areas are algebra, calculus and statistics. Could you say what the most important topics within algebra and calculus are?

wooden sail
#

what do you mean when you say algebra here? mathematicians say algebra to mean abstract algebra, which is way different from your high school algebra. what one uses very often in data science is linear algebra, which is one of the elementary parts of abstract algebra

mint palm
#

how does CLS token work with transformer?

wooden sail
#

regarding calculus, really all of it. you'll be looking at gradients, jacobians and hessians (so multivar calc) and integration used for optimization

sleek harbor
#

could someone explain this behavior of optuna? When I set the sampler as optuna.samplers.TPESampler(n_startup_trials=300) with 300-400 random initial samplings everything is fine.. at first. You can indeed see it taking 300-400 random hyperparameter combinations, after which the graph becomes more stable as the "smart algorithms" kick in.. but that only lasts for around 200 samplings.. after which it seems that optuna reverts to random sampling again..! How can this be explained? Is it supposed to be like this? I can't make sense of it..

agile cobalt
#

the alternative would be pretty much overfitting then staying there

past meteor
#

I should look at Optuna sometime 🤔 I always just use scikit-learn's hyperparameter tuner or Keras Tuner (even with Torch etc.) if I need more flexibility.

agile cobalt
#

remember to be careful when tuning hyperparameters, otherwise you might end up overfitting your model's hyperparameters to your test/validation data

past meteor
#

wdym? You should never test on your test set before you've fixed 1 set of hyper parameters

restive path
agile cobalt
#

do you call it like
train / validation / test
or
train / test / validation
or only
train / test

#

in my mind, validation is after freezing everything, test is how you would measure if it gets better or worse, not sure what is the standard

past meteor
#

train /validation (find best model + hyperparameters) => test once
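
That workflow as a sketch with scikit-learn, on synthetic data. Ridge and its `alpha` are only stand-ins for "a model and its hyperparameter".

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

# Hold out the test set first, then split the rest into train / validation
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

# Pick the hyperparameter using the validation set only
val_scores = {}
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_scores[alpha] = mean_squared_error(y_val, model.predict(X_val))
best_alpha = min(val_scores, key=val_scores.get)

# Refit with the chosen value and touch the test set exactly once
final = Ridge(alpha=best_alpha).fit(X_trainval, y_trainval)
test_mse = mean_squared_error(y_test, final.predict(X_test))
```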

wooden sail
#

(sub)spaces, linear transformations, projections, diagonalization/EVD, SVD, low rank approximation. in fact, the other stuff (calculus and statistics) will always be applied on TOP of linear algebra

past meteor
sleek harbor
# agile cobalt remember to be careful when tuning hyperparameters, otherwise you might end up o...

this is really infuriating.. there's not much you could do here, but the way optuna just takes that one lucky hyperparam combo and claims it is the "best".. and it does that all the time, even when you really could choose an optimal combination.. it just throws out this random combo that happened to get lucky, even while you can see the actual algorithm at work moving in another direction.. why don't they fix this? it's obvious this is random luck, not actually a good combo..

past meteor
agile cobalt
sleek harbor
wooden sail
past meteor
#

I personally only vaguely know about TPE, hence why I would not touch it over the ones I know and trust like Bayes opt (sequential problems) or random search

agile cobalt
#

it is also possible that the method you are using is just not appropriate for your model

past meteor
#

If you can run your trials in parallel I think random search and iteratively making your grids smaller is a good option

#

Assuming you have many combinations otherwise you could just run grid search ofc

sleek harbor
past meteor
wooden sail
sleek harbor
past meteor
#

Why do you care about the average best? What is your exact problem?

sleek harbor
wooden sail
#

all you can do is compare to the results you get

past meteor
#

I did a bunch of graduate courses on global optimization for non-convex problems. This is one of the god particles.

wooden sail
#

there's probably a parameter you pass to optuna to choose the cost function with which it picks the hyper params

past meteor
#

There's ideas you can do if you want good results on average but I'm curious to know what your exact problem is? Is it really just hyperparameter tuning?

sleek harbor
past meteor
sleek harbor
cold osprey
#

coz its evaluated on the batch size

#

if its evaluated on the entire dataset, then its cost function?

wooden sail
wooden sail
royal void
#

Hi, I need to find a way to get the size of the center cluster on these maps, do you know a way to compute that? Like in the first one I would like a size of 5 and in the second one of 1 or 2

cold osprey
#

is this just a semantic thing?

royal void
#

whoops i just mixed the first and the second *

past meteor
wooden sail
sleek harbor
wooden sail
#

you can pick an "average best" if you like, but there's no special reason why that would be any better

royal void
wooden sail
#

what did you try?

past meteor
#

Tbh the plot you showed doesn't even tell the full story as we can't see what parameters were tried

#

If I were you I would most likely do a small search around the n lowest points and call it day @sleek harbor

wooden sail
past meteor
#

But most likely I would just select whatever came up lowest, hyperparameter tuning is imo something that is rarely worth it time vs. reward wise

restive path
# wooden sail almost all of it, since it's the bread and butter
  1. Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
  2. Inner and outer products, matrix multiplication rule and various algorithms, inverse matrix
  3. Special matrices: square matrix, identity matrix, triangular matrix, idea of sparse and dense matrices, unit vectors, symmetric matrix, Hermitian, skew-Hermitian and unitary matrices
  4. Matrix factorization/LU decomposition concept, Gauss/Gauss-Jordan elimination, solving the linear equation system Ax=b
  5. Vector space, basis, span, orthogonality, orthonormality, linear least squares
  6. Eigenvalues, eigenvectors, diagonalization, singular value decomposition
royal void
# wooden sail what did you try?

I tried to define a threshold by taking the mean value, as I have a lot of points, and to define the radius as the first value below the mean + 0.01 (to be a little higher), but in the first image, for example, the cluster expands a little even when we are below the threshold

#

sorry for my english I'm french

wooden sail
wooden sail
past meteor
#

I disagreeish on these being the essentials because there's so much abstraction in ML nowadays that you can get away with knowing less

#

If you want to make novel stuff then yes, it is the bare minimum

restive path
cold osprey
wooden sail
wooden sail
royal void
#

Here I get below the threshold in the orange square but I would like to get the red square as the size of the cluster

#

Oh yes make a fit should work

past meteor
#

Even in my context (applied research) I don't think anyone remembers what SVD is or how to do PCA from their time in uni

royal void
wooden sail
#

oh oof, in C. well my suggestion would be to set up the math on paper and then code that :p but there surely exists a library that can help you with it

wooden sail
#

but you should understand it inside out

past meteor
#

I'm not even talking about by hand, I meant the general procedure 🤣

wooden sail
#

the conceptual understanding is the most important

past meteor
#

People I work with know what it does and why you'd need it but not the internals

royal void
past meteor
#

For most stuff in my context that is more than enough. In pure industry you can get away with even less

wooden sail
#

i think that's the most important, yeah. if you understand that, you can read an algorithm and understand why it'd work

sleek harbor
#

@wooden sail @past meteor I'm kinda dumb, so pls bear with me a bit. Am I wrong in assuming, that in such a graph, where we want the lowest value, that 1 (or a value very close to 1) would be the best obvious choice? Cus that's what I'd want to get as the "best" hyperparam value. However, usually, just because of how the dataset is split, and random factors one can't control, with enough repetitions and tries, some combination of parameters (and even if we are just tuning this one hyperparam) will have a lower target value (y) at a value with a lower than 1 param value (x).. those (or that one) combo will Not be good when you try it on another dataset.. no? I suck at talking, so I'm not even sure I'm getting my point across..

wooden sail
#

what even is x here

#

what are we looking at

past meteor
#

The number of trials I suppose?

#

1 is most likely the last trial

sleek harbor
#

a hyperparameter, eta, its values

wooden sail
#

and what does that control?

past meteor
sleek harbor
#

the learning rate basically

#

of XGBoost

#

just chose a random hyperparam.. a similar picture could be painted for many hyperparams

wooden sail
#

well, you have 2 hyperparams, yeah?

past meteor
#

The value you get in hyper parameter tuning is the average over all of your folds you tried the parameters on
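
What that averaging looks like in practice, as a sketch with scikit-learn. `GradientBoostingRegressor` and its `learning_rate` are used as a stand-in for XGBoost and `eta`, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for eta in (0.05, 0.3, 1.0):
    model = GradientBoostingRegressor(learning_rate=eta, n_estimators=50, random_state=0)
    # One score per fold; the sign is flipped so that lower is better
    fold_mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    # The number reported for this eta is the mean over all 5 folds
    results[eta] = (fold_mse.mean(), fold_mse.std())
```

Keeping the per-fold standard deviation next to the mean makes it easier to see when a "winning" value only wins because of one unusually easy fold.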

sleek harbor
wooden sail
#

or any other hyperparam for that matter

past meteor
#

Hence why you can take the best one. It should be relatively robust and not something that wildly overfits on your data

sleek harbor
wooden sail
#

what is "close to 1"

#

this will depend entirely on the problem at hand

sleek harbor
past meteor
#

It's fit on ALL folds

#

it's not 1 hyperparam instance on 1 fold

sleek harbor
wooden sail
#

there's no reason why eta has to be close to 1 always, and as zestar says, i would expect any hyperparameter tuning tool to already average over all the folds and trials

wooden sail
past meteor
#

The default eta of xgboost is apparently 0.3 so I wouldn't know why it should be close to 1

sleek harbor
wooden sail
#

what?

#

if there's a problem with all the folds, there's a problem with your dataset

sleek harbor
past meteor
#

I'm not sure you fully understand k-fold and/or hyperparameter tuning?

sleek harbor
past meteor
#

9 times out of 10 for something like xgboost I don't tune it 🤷‍♂️

sleek harbor
wooden sail
#

certainly, that can happen

#

but are you aware that this problem is at least as difficult as the original one you were solving?

sleek harbor
wooden sail
#

optimizing the hyperparams is a completely separate optimization problem of its own

#

not only that, you won't even be able to check you got the "best" or even "good" hyperparams

past meteor
#

It can indeed happen that your specific instance of hyperparameters do a strangely good job on one fold which biases the result on average but you have to draw the line somewhere imo

wooden sail
#

you validate, and if it performs well, you call it a day

#

you can only check by using arbitrarily large amounts of data

past meteor
#

It's also fine to be "lazy" with hyperparameters imo. For boosting type models I would only tune the rounds of boosting I'm doing

#

Intuition tells me that this is likely the most important hyperparam (unlike for bagged models)

#

Overfitting is mostly related to fitting too many models and not the complexity of each individual one in sequential set-ups

sleek harbor
#

idk.. I just feel like a value of eta of 0.13 would be objectively a bad choice, especially when you can see a graph of points that look to steadily improve the closer to 1 u get.. to me it seems like that value of 0.13 is pretty much an outlier that should be ignored, since other values around it seem to be on average worse than those closer to 1. Which imo means one should choose a value closer to the average "good". The thing that bugs me is that the optimization algorithm, as far as I understand, agrees with me on that, cus it keeps "suggesting" values closer to 1. But since those values, tho they improve on average, don't manage to "abnormally score" the way 0.13 did, 0.13 remains the "winner". I would choose a winner that, say, scores the best among the best group of 10 consecutive averages..
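
The "best group of consecutive averages" idea can be sketched roughly like this. Everything here is hypothetical: `smoothed_best` is not part of any tuning library, the trial values are made up, and the window is counted in neighbouring trials (sorted by eta), not in eta distance.

```python
import numpy as np

def smoothed_best(params, scores, window=3):
    """Return the param at the centre of the window with the lowest mean score."""
    order = np.argsort(params)
    p, s = np.asarray(params)[order], np.asarray(scores)[order]
    # Rolling mean over `window` neighbouring trials (sorted by param value).
    rolling = np.convolve(s, np.ones(window) / window, mode="valid")
    start = int(np.argmin(rolling))
    return float(p[start + window // 2]), float(rolling[start])

# Made-up trials: 0.13 is a lone lucky score, values near 0.9 are good on average.
etas   = [0.05, 0.10, 0.13, 0.20, 0.50, 0.80, 0.90, 1.00]
scores = [0.60, 0.58, 0.30, 0.57, 0.45, 0.36, 0.34, 0.35]
print(etas[int(np.argmin(scores))])   # raw winner of the search
print(smoothed_best(etas, scores))    # winner after averaging over neighbours
```

On this toy data the raw winner is 0.13, while the neighbourhood-averaged winner sits at 0.9. Whether that is the better choice on real data still has to be checked on a held-out set.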

sleek harbor
#

personally, to me, the eta graph looked a lot more informative, with a visible trend.. this looks.. pretty much random (already narrowed down a bit tho, when the range was 30-500 u could see that too low and too high results aren't good)

past meteor
#

If you're tuning multiple hyperparameters then the importance of n_estimators might be subsumed

#

Kind of similar to collinearity

sleek harbor
#

anyone have a guide to how to tune them properly? cus.. I see tons of various methods, and some of them seem fundamentally wrong to me. For example, the popular "tune one at a time" seems to be a strange choice to me, specifically because of collinearity..

past meteor
#

Tune one at a time is bad as well because the surface is non-convex and some parameters are just unimportant

#

I would only tune n_estimators and call it a day. Maybe tuning 5 others would be better than that one but this is such an easy one to tune because it's discrete, you can grid search it even if you want

frigid lion
#

hey so i've been learning data science for a while, displaying, analyzing data and mostly machine learning models using scikit-learn and the math behind them but I hear a lot about numpy and ye i've learned about it but still i don't feel like there are so many options that I use in it for it to be talked about so much.
I just want to know how much are you actually using numpy while doing any data science projects

#

and ye i know that many other libraries are based on numpy as well but I just dk if i'm missing sth about it that i don't use it that often by just calling something straight from numpy

#

not sure if you know what i mean but whenever some1 mentions data science 2 things that are mentioned are numpy and pandas and I don't know what it means to have knowledge of numpy

wooden sail
#

it gives you the most control, but you need to know how to do all the math

frigid lion
#

what does it mean

wooden sail
#

it means, if someone gives you some math, e.g. from a recent paper, you can implement it yourself on numpy (because no library will have an implementation of something recent)

frigid lion
#

is there any way to train sth like that because i dont see myself need to ever do things like this so far

wooden sail
#

by doing/reading math and implementing it yourself from scratch

#

for example many people try to set up basic neural networks from scratch using numpy

#

it helps you review both your math and numpy at the same time

frigid lion
#

i havent got to neural networks yet so far so cant speak about it

wooden sail
#

things like linear regression, then

#

anything you've ever done with pandas can also be done with numpy

frigid lion
#

oh k maybe ill try it then this seems a bit hard to code from scratch but i may give it a try

#

or it just seems like that and may turn out not that hard

#

ive implemented knn, naive bayes and decision tree from scratch

#

how do you think linear regression compares to it when it comes to coding from scratch

wooden sail
#

linear regression should be a lot easier

#

it's a good problem to practice many things though. pseudo inverses, gradient descent, newton methods, etc
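
A minimal sketch of two of the approaches mentioned (pseudo-inverse and gradient descent) on synthetic data. The data, learning rate and iteration count are assumptions chosen so the example converges, not recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # bias column + 2 features
true_w = np.array([0.5, 2.0, -1.0])
y = X @ true_w + 0.05 * rng.normal(size=200)

# 1) Closed form via the pseudo-inverse: w = pinv(X) @ y
w_pinv = np.linalg.pinv(X) @ y

# 2) Gradient descent on the mean squared error
w_gd = np.zeros(3)
lr = 0.1
for _ in range(2000):
    grad = 2.0 / len(y) * X.T @ (X @ w_gd - y)
    w_gd -= lr * grad

print(w_pinv, w_gd)
```

Both routes should land on essentially the same weights. The pseudo-inverse is exact in one step, while gradient descent is the version that generalises to models without a closed form.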

frigid lion
#

ok thanks a lot

wooden sail
#

from the things you mentioned though, sounds like you're already pretty familiar with numpy

wheat snow
#

is this the right place to ask for help on transforming an html to an csv file?

#

since csv is kinda data science related

hasty mountain
#

Does anyone have experience with PyTorch Geometric? I'd really like to know how its DataLoader does its batching process. It feels like it simply considers batch size = 1 for every sample, and then modifies the tensor dimensions so the model can analyze the graph node, its edges and bonds...

(Yes, I've tried reading the docs, but still didn't figure it out)

#

I'm trying to implement an unsupervised pre-training process on a Graph Neural Network, so the way the API is batching the samples is causing me some trouble...

hasty mountain
#

It gets annoyingly slow when it's too big, something that I find strange, but ok...

granite bronze
#

question, im pretty new to python and i wanna learn ai and machine learning. what kinds of things would you suggest me know how to do as a prerequisite, and also do you have any tutorials you would suggest me watch/read when it does come time to learning?

#

sorry if this question is out of place btw

stark zenith
granite bronze
#

thx man i will check that out

stark zenith
#

No problem, enjoy! Try to really commit to it, follow along with the notebooks, and make your own projects.

inland heath
#

hey im trying to use regularisation to improve a linear regression i did. i have an excel spreadsheet with x and y values and i'm not sure how to split the data so that i have a dataset of x and y train and another dataset of x and y test which have to be a numpy array (the extracted data from the excel spreadsheet is in the form of a list within a list (inner list is row values)

#

if yall can provide any suggestions feel free to ping me :)
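
One way the split could look, as a sketch only: `rows` is a placeholder for the list of lists read from the spreadsheet (assumed to hold `[x, y]` per row), and the 80/20 ratio is an arbitrary choice.

```python
import numpy as np

# Placeholder [x, y] rows standing in for the spreadsheet contents.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8],
        [6.0, 12.2], [7.0, 13.9], [8.0, 16.1], [9.0, 18.0], [10.0, 20.1]]

data = np.asarray(rows, dtype=float)        # shape (n_rows, 2)
rng = np.random.default_rng(42)
idx = rng.permutation(len(data))            # shuffle row indices
n_test = int(0.2 * len(data))               # hold out 20% for testing

test_idx, train_idx = idx[:n_test], idx[n_test:]
x_train, y_train = data[train_idx, :1], data[train_idx, 1]   # x kept 2D: (n, 1)
x_test, y_test = data[test_idx, :1], data[test_idx, 1]
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

If scikit-learn is already installed, `sklearn.model_selection.train_test_split` does the same shuffling and slicing in one call.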

lapis sequoia
#

.

zinc nova
#

hey hi everyone , anyone interested in nlp and classification of texts ?

wooden sail
#

as for the regularization, what are you trying to do? which property are you trying to enforce?

young granite
#

does anyone have an idea how i could 3d plot complex numbers in a "unit sphere"?

wooden sail
#

what are you trying to plot?

young granite
#

this just came up to my mind and would be a cool way to show distribution

wooden sail
#

i don't see where the 3d part comes in though

young granite
#

i could do it 2D on the unit circle

#

but for many datapoints it gets unstructured

wooden sail
#

why the unit circle or sphere though

young granite
#

to get a better understanding visualization of the distribution

wooden sail
#

the distribution of what

young granite
#

the complex values

wooden sail
#

i'm not sure i follow what you're trying to do

#

let's forget for a second that they're complex numbers, because they're isomorphic to R2. so we have a set of points in R2. why would you want to project them onto the unit circle for this? this gets rid of the magnitude information and keeps only the phase

gloomy saddle
#

isn't it more I and Q for stuff like this, magnitude and phase?

#

e.g. frequency for X, magnitude for Z, Phase for Y?

young granite
#

thats why i try to figure out how i can use the magnitude as z

wooden sail
#

you only have 1 input axis though. if you want to see the magnitude, you'd just get frequency vs magnitude

#

what's the actual problem? you have some data in spectral domain, and you want to figure out which frequencies have some property?

young granite
wooden sail
#

compare them to what?

young granite
#

each other

wooden sail
#

ok

#

that's very different

#

cuz then you have vectors in C^n, where n is the length of the spectrum. you'd have to do some sort of projection first

young granite
#

why tho lets assume we got a spectrum resulting in 5 freqs, when i plot them all into the unit sphere and lets say another one i can directly compare?

wooden sail
#

hold up

#

each spectrum you want to compare has 5 frequency bins? 5 samples, each one a complex number?

young granite
wooden sail
#

ok. the sphere here is the 4-sphere, a 4-dimensional object in 5d space

#

if you want something you can visualize, you have to do a projection onto a lower dimensional space first

#

and again, projecting onto the sphere gets rid of the magnitude information and leaves only the angle of the vector

young granite
#

mhhh

wooden sail
#

it keeps info regarding relative magnitudes of the complex values relative to each other in each spectrum

past meteor
#

@wooden sail I'm curious how you would solve an issue we had at work recently:

wooden sail
#

is that enough info? you tell us

young granite
wooden sail
#

i'm not sure why you wanted to project onto the sphere yet. there are cases where it makes sense, but visualization is also a completely separate matter

young granite
wooden sail
#

depends on what i'm looking for

young granite
#

would u at all do something like this

gloomy saddle
#

start by better explaining what your trying to visualise, what you have described is very fuzzy?

young granite
#

similarities distributions etc.

past meteor
#

We had a 3d point cloud with each point being an EMG sensor. It's a person moving along a line from back to front (but the direction differs from person to person) the task was to find the right heel

wooden sail
#

are we doing a statistical comparison or a deterministic one regarding shape?

young granite
#

i draw something, give me a sec 😄

past meteor
#

So we had measurements every few ms. of the position of each sensor. obviously people are moving (raising, lowering their body parts and thus the sensors)

lone plaza
#

Is this efficient enough for the cost function?

past meteor
wooden sail
lone plaza
#

Is there a built-in np function that takes care of 0s and sets them to something slightly bigger than 0, so as to avoid taking a log of 0?
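
As far as I know there is no dedicated NumPy function for this. The usual trick is to clip the values with `np.clip` before taking the log. A minimal sketch, where `safe_log` and the `eps` value are my own assumptions:

```python
import numpy as np

def safe_log(p, eps=1e-12):
    """log(p) with p clipped into [eps, 1] so that zeros do not produce -inf."""
    return np.log(np.clip(p, eps, 1.0))

p = np.array([0.0, 0.5, 1.0])
print(safe_log(p))
```

For a cross-entropy cost that also contains `log(1 - p)`, clipping to `[eps, 1 - eps]` protects both terms.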

wooden sail
young granite
#
1) spectrum, 2) FFT, 3) complex values, 4) sphere plot
past meteor
#

We have some activation values but it's mostly X, Y, Z we're working with. after that we use the EMG activation of a reference point to make our models

#

The stick figure models are a good one! I know it from the context of facial recognition but I hadn't thought of applying it here

wooden sail
past meteor
#

We have a heuristic in place right now, I'll try and see if I can make what you're suggesting work indeed

young granite
wooden sail
#

what?

young granite
#

i keep 5 freq of the resulting FFT

#

or in that example 3

wooden sail
#

ok so the original thing isn't a spectrum

#

cuz the fft yields the spectrum

gloomy saddle
#

1 is normally your raw input data

wooden sail
#

and here i mean spectrum as in spectral domain, its physical meaning notwithstanding

young granite
#

so yes 1 is input data

wooden sail
#

anyway. you have some data, you fft it to get the spectral domain, you keep some fourier bins

young granite
wooden sail
#

do you keep the same bins for all the data?

young granite
#

i can choose whether i keep the 5 with highest power spectrum or [:5]
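
The two options (first 5 bins vs. the 5 bins with the highest power) could look roughly like this. The signal, sampling rate and frequencies are made up for the sketch.

```python
import numpy as np

fs = 100                                    # assumed sampling rate in Hz
t = np.arange(200) / fs                     # 2 seconds of samples
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
          + 0.1 * rng.normal(size=t.size))

spectrum = np.fft.rfft(signal)              # complex Fourier bins
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
power = np.abs(spectrum) ** 2

first5 = spectrum[:5]                       # option A: always the same bins
top5_idx = np.sort(np.argsort(power)[-5:])  # option B: the 5 strongest bins
top5 = spectrum[top5_idx]
print(freqs[top5_idx])
```

With option B the selected bin indices can differ from signal to signal, so two `top5` vectors are only comparable entry by entry if the indices happen to match.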

wooden sail
#

ok. and after doing this, we wanna check how similar the bins are

young granite
#

so not necessarily the 5 highest and therefore could differ

gloomy saddle
#

and after getting frequency and magnitude, a 2 dimensional value, now what? e.g. are you say slicing the input into small time periods, and plotting how the FFT changes over time?

wooden sail
#

in this case the meaning of the fourier axis doesn't really matter

#

these are basically just vectors in C^n

#

is the magnitude of the bins important? or only their ratios?

#

e.g. is the vector [10, 5] the same as the vector [2, 1]? or is the "energy content" important?

young granite
wooden sail
#

ok, then the magnitude doesn't matter and you can indeed project on the unit sphere
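
A small sketch of what projecting onto the unit sphere does to the `[10, 5]` vs `[2, 1]` example (`to_unit_sphere` is just an illustrative helper name):

```python
import numpy as np

def to_unit_sphere(v):
    """Scale a complex vector to unit Euclidean norm (drops the overall magnitude)."""
    v = np.asarray(v, dtype=complex)
    return v / np.linalg.norm(v)

a = to_unit_sphere([10, 5])
b = to_unit_sphere([2, 1])
print(np.allclose(a, b))   # both vectors point in the same direction
```

After normalisation only the ratios between bins (and their phases) are left, which is the "energy content doesn't matter" case.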

#

that can make the distance... tricky to measure, but we can ignore that for now

young granite
#

yeh i think the idea is pretty cool but i struggle with embedding the code 😄

wooden sail
#

now we have unit vectors in C^n. and you want to project this to R^3 you say

young granite
#

to get all values inside the sphere

wooden sail
#

that's fairly difficult. hmm

#

i don't think there's a very meaningful way of doing that tbh

young granite
#

mhhh

wooden sail
#

the only way to guarantee you get real values out of a function with complex inputs is to make it a constant function 😛

#

you can make 2 spheres, one for the real parts and another for the imaginary parts

young granite
#

thats fairly simple 😄

#

i didnt know u are that pragmatic edd 😄

#

😛

wooden sail
#

i'm usually a "why visualize" kind of person tbh

#

all right, and then this still leaves the problem that we need a matrix that maps from C^5 to C^3 while approximately preserving distances

young granite
wooden sail
#

the thing is that low dimensional representations never tell the full story 😛 projections lose information

young granite
#

+1

#

just get best of both worlds id say 😛

wooden sail
#

in this case, for example, if your C^5 vectors do not have a sparse representation, it'll be very difficult to embed them while preserving distance

#

the easiest approach is to make a random matrix size 3 x 5 where the entries are random, and just use that
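
A rough sketch of the random-matrix idea, assuming complex Gaussian entries and made-up unit vectors in C^5. The scaling factor is my choice so that squared lengths are preserved on average; nothing here guarantees that individual distances survive.

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 made-up unit vectors in C^5 standing in for the normalised Fourier bins
V = rng.normal(size=(20, 5)) + 1j * rng.normal(size=(20, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Random complex Gaussian 3 x 5 matrix, scaled so squared lengths are kept on average
P = (rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))) / np.sqrt(2 * 3)
W = V @ P.T                                   # projected vectors in C^3

def pairwise(M):
    """Matrix of Euclidean distances between all rows of M."""
    return np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)

d5, d3 = pairwise(V), pairwise(W)
mask = ~np.eye(len(V), dtype=bool)            # ignore the zero diagonal
ratio = d3[mask] / d5[mask]
print("distance ratio min/max:", ratio.min(), ratio.max())
```

The spread between the smallest and largest ratio gives a feel for how much the projection distorts distances. Comparing `d5` directly avoids that distortion entirely.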

young granite
#

wont it be possible to use the PS for Z and norm them?

wooden sail
#

what's PS?

young granite
#

power spectrum, but nah then i lose information

#

mhhh

wooden sail
#

right, you'd lose info

#

you can try, why not. compare it to the approach with 2 spheres

young granite
#

always a pleasure to hear (read lel) ur thoughts ❤️

#

but then i would only represent data in 1/2 the sphere

#

so maybe not the PS

wooden sail
#

also note that a matrix with 5 columns has a spark that is at most 4, i.e. in the BEST case, we take 4 columns and they're now linearly dependent. that means you can only really COMPLETELY discern vectors that are 2-sparse

#

which is pretty strict

young granite
#

🗿

#

2 spheres it is then 😄

#

but ill see what i can come up with after ur input

#

maybe i ask a colleague aswell what he thinks bout this

wooden sail
#

this will be a problem regardless of what you do, i'm just saying you will very likely not get anything useful out of this approach

#

regardless of using power spectrum or not

young granite
#

mhh

wooden sail
#

the problem is projecting down to C^3

young granite
#

so better sticking with 2D?

wooden sail
#

better not project and do it in C^5, then make plots of the distances

#

the more you project, the worse the problem gets

#

but go ahead and try. maybe we'll be pleasantly surprised. but if it doesn't give anything interesting, you shouldn't be surprised

young granite
#

pushing boundries lel

wooden sail
#

try making one sphere in R^3 using the power spectrum, and two spheres (real and imag) using the complex fourier bins and see if anything looks nice

obsidian peak
lone plaza
#

Is there any experienced python developer who's willing to look through my self written ai? Nothing impressive tho, it is just a proof of concept for me

young granite
#

@wooden sail i created worms 🗿

wooden sail
#

lol

young granite
#

generated sine functions with noise and some freqs

#

but thats it for now i guess, first discussing this with my colleague next week so i dont waste more time xD

steady bronze
#

hey guys do i need to pay for the open ai gpt api
because when i create a api key and try to use it its not working

from langchain.llms import OpenAI
llm = OpenAI()
llm("explain large language models in one sentence")

this is my code but the response i get is
RateLimitError: You exceeded your current quota, please check your plan and billing details.
i have never even used my api key before
i just created it

spiral smelt
#

Hello, I was just wondering whether anyone had any experience in neural network image classification? I've written a Python script that image classifies two categories, however I would like to extend it to 10 categories. Any help would be really appreciated, because I'm a bit lost on how to do this 🙂

cold osprey
#

increase outputs to 10 at your fully connected layer

spiral smelt
#

Adds in our layers

Adds a convolutional layer and a max pooling layer

Has 16 filters (3,3 pixels in size)

Stride moving one pixel by one

Extracts the relevant information to make a classification

Applies a relu activation - taking into account non-linear patterns

Image shape is going to be 256 wide by 256 high, 3 channels deep

model.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(256,256,3)))
model.add(MaxPooling2D())

Adds a convolutional layer and a max pooling layer

Has 32 filters (3,3 pixels in size)

Stride moving one pixel by one

model.add(Conv2D(32, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())

Adds a convolutional layer and a max pooling layer

Has 16 filters (3,3 pixels in size)

Stride moving one pixel by one

model.add(Conv2D(16, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())

Flattens to remove the channels value

model.add(Flatten())

256 values will now be the output

model.add(Dense(256, activation='relu'))

Creates a single output, 0 or 1

model.add(Dense(1, activation='sigmoid'))

Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.

model.compile('adam', loss=tf.losses.BinaryCrossentropy(), metrics=['accuracy'])

Displays how the model transforms the data

model.summary()

spiral smelt
cold osprey
#
model.add(Dense(9, activation='sigmoid'))

9 or 10

#

should be 10, one for each class

spiral smelt
#

Okay one second I'll have a go 🙂

spiral smelt
cold osprey
#

how does ur data look

spiral smelt
#

I have a folder called 'data' within the folder I have three sub-folders 'train' , 'test' and 'validation', within those folder is 10 categories that contain different items of clothing

cold osprey
#

u using data loaders or?

spiral smelt
# cold osprey how does ur data look

would it be worth me sharing my entire code? thank you so much for this, been working on this for about 30 hours :/ im using os to load the data from the directories

cold osprey
#

sure

#

im in a dota game rn tho hahha

spiral smelt
#

Oh don't worry if you're busy 🙂 I can keep working on it @cold osprey

sleek harbor
#

I've never seen this done before (summing up results of predictions of the test set made with models trained on train-validation sets across kfolds, and then divided by the total folds). Is this a common practice? Cus so far I've only come across the popular "refit all training data with best cross val results and then predict test data with that model".. never seen something like this before in courses or tutorials, but it does kinda make sense
source: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html
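
What that averaging boils down to, as a NumPy-only sketch. The assumptions are loud here: a closed-form ridge regression (`fit_ridge`, a hypothetical helper) stands in for XGBoost, and the data is synthetic. It is not the code from the linked post.

```python
import numpy as np

def fit_ridge(X, y, lam=0.1):
    """Closed-form ridge regression; stands in for whatever model is being tuned."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 2.0])
X_train = rng.normal(size=(120, 3))
y_train = X_train @ w_true + 0.1 * rng.normal(size=120)
X_test = rng.normal(size=(30, 3))

k = 5
folds = np.array_split(rng.permutation(len(X_train)), k)
test_pred = np.zeros(len(X_test))
for i in range(k):
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w = fit_ridge(X_train[train_idx], y_train[train_idx])
    test_pred += X_test @ w        # sum the test predictions of each fold's model
test_pred /= k                     # ... then divide by the number of folds
```

This is effectively a small ensemble of k models, each trained on a slightly different subset, as opposed to one model refit on all the training data.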

quartz ivy
cold osprey
#

yes

#

can just use CrossEntropyLoss

#

hmm thats pytorch

#

not sure what the tf equivalent is

spiral smelt
cold osprey
#
model.add(Dense(1, activation='sigmoid'))
#

if u change this to 10 and the loss to CategoricalCrossentrypy, what happens?

#

the way uve set up ur code is abit weird too

#
```
if yhat < 0.5:
    print(f'Predicted class is dress.')
else:
    print(f'Predicted class is hat.')
```
like this bit
#

are u following a course for this or?

spiral smelt
cold osprey
#

ah ic

spiral smelt
# cold osprey ah ic

Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.

model.compile('adam', loss=tf.losses.CategoricalCrossentrypy(), metrics=['accuracy']) - so when I ran this it came up with this error

#

AttributeError Traceback (most recent call last)
Cell In[52], line 2
1 # Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.
----> 2 model.compile('adam', loss=tf.losses.CategoricalCrossentrypy(), metrics=['accuracy'])

File ~\lib\site-packages\tensorflow\python\util\lazy_loader.py:59, in LazyLoader.getattr(self, item)
57 def getattr(self, item):
58 module = self._load()
---> 59 return getattr(module, item)

AttributeError: module 'keras.api._v2.keras.losses' has no attribute 'CategoricalCrossentrypy'

cold osprey
#

lel theres a typo

#

CategoricalCrossentrypy - > CategoricalCrossentropy

spiral smelt
#

yeah just realised sorry

spiral smelt
# cold osprey lel theres a typo

so now when I run - # Model.fit takes in the training data

Epochs is how long we're going to train for

Passes through the validation data, to see how well the model is performing in real time

Stores in a variable called history

hist = model.fit(train, epochs=20, validation_data=val, callbacks=[tensorboard_callback]) - it comes out as:

cold osprey
#

yes epochs is how many times we pass through the whole dataset

spiral smelt
#

I'm getting an error when I run it saying: ValueError: Shapes (None, 1) and (None, 10) are incompatible

cold osprey
#

where is the error from?

#

like which line

spiral smelt
#

hist = model.fit(train, epochs=20, validation_data=val, callbacks=[tensorboard_callback]) - its coming from this

#

oh wait one sec

spiral smelt
cold osprey
#

does ur data only have 2 classes?

#

how does y look for ur data?

spiral smelt
#

10 categories, but maybe I didn't set it up right, should I print y?

#

so this is how I set up the classes:

#

Builds an image dataset, using keras

test = tf.keras.utils.image_dataset_from_directory('data/test')
train = tf.keras.utils.image_dataset_from_directory('data/train')
val= tf.keras.utils.image_dataset_from_directory('data/validation')

#

this is the output: Found 249 files belonging to 10 classes.
Found 3054 files belonging to 10 classes.
Found 194 files belonging to 10 classes.

#

Allow us to convert to a numpy iterator, allows access to the image dataset

data_iterator_test = test.as_numpy_iterator()
data_iterator_train = train.as_numpy_iterator()
data_iterator_val = val.as_numpy_iterator()

cold osprey
#

ye

#

looking at ur code

spiral smelt
#

Thank you, honestly I appreciate this so much

cold osprey
#

basically the last layer should output 10 numbers

#

logits or probabilities

#

which the highest will be what it classifies the image as

#
```
{'T-shirt/top': 0,
 'Trouser': 1,
 'Pullover': 2,
 'Dress': 3,
 'Coat': 4,
 'Sandal': 5,
 'Shirt': 6,
 'Sneaker': 7,
 'Bag': 8,
 'Ankle boot': 9}
```
then u would have something like this
#

so say the first '0th' was the highest, then its a tshirt/top
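
In NumPy terms, "the highest of the 10 numbers wins" looks roughly like this. The logits are made up, and the class order is taken from the mapping above. With `image_dataset_from_directory` the real order comes from the folder names (the dataset's `class_names` attribute), so it may differ.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def softmax(logits):
    """Turn 10 raw scores into probabilities that sum to 1."""
    z = logits - np.max(logits)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Made-up output of the last layer for one image
logits = np.array([0.1, 0.3, 0.2, 2.5, 0.0, -1.0, 0.4, 0.1, -0.5, 0.2])
probs = softmax(logits)
pred = int(np.argmax(probs))         # index of the highest probability
print(pred, class_names[pred])
```

This replaces the `if yhat < 0.5` check from the two-class version: instead of thresholding one number, take the `argmax` over ten.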

spiral smelt
#

okay that makes sense, so how do I assign the categories to their number

#

So I guess at the moment it's only assigning to either 0 or 1 and not the entire range

cold osprey
#

ya when u set ur last layer to output 1 only, its outputting one number which u then see if its < 0.5 or >= 0.5 (yhat)

#

which is a ok way to do it but harder when u want to modify it for multiclass classification

#

what i wouldve done for binary classification is just output 2 classes with the same idea as 10 classes

#

i think the error is coming from how the data is set up hmmmm

#

am comparing to my pytorch code rn

#

been a while since i used tensorflow

spiral smelt
#

Okay, thank you, I'm googling too, to see what solution there is

cold osprey
#

could u print one of ur data and see how it looks like?

spiral smelt
#

okay I think I figured out the label problem I included this:

#

num_classes = 10 # Replace 10 with the actual number of classes in your dataset

test = test.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
train = train.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
val = val.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
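
Why this fixes the `Shapes (None, 1) and (None, 10) are incompatible` error, sketched in NumPy: `CategoricalCrossentropy` expects one row of 10 values per image, while the dataset was handing over a single integer per image. The labels below are made up.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 0, 9])                 # integer labels, shape (batch,)
one_hot = np.eye(num_classes)[labels]        # shape (batch, 10), a single 1 per row
print(labels.shape, one_hot.shape)
print(one_hot.argmax(axis=1))                # recovers the integer labels
```

Two alternatives that should avoid the manual `tf.one_hot` step, to be checked against the installed TensorFlow version: passing `label_mode='categorical'` to `image_dataset_from_directory`, or keeping integer labels and using `SparseCategoricalCrossentropy` as the loss.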

#

I'm now running the testing, which is working (yay!). I'll let you know the results

cold osprey
#

👍

spiral smelt
cold osprey
#

if loss is going down and accuracy/other metrics are going up, should be fine

spiral smelt
#

Checks which class is assigned to which image

Checks that they've been scaled correctly

fig, ax = plt.subplots(ncols=1, figsize=(20,20))
for idx, img in enumerate(batch_train[0][:10]):
ax[idx].imshow(img)
ax[idx].title.set_text(batch_train[1][idx])

#

it doesn't display a grid of images, with their number assigned to them
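
A likely cause: `plt.subplots(ncols=1)` returns a single Axes object, so `ax[idx]` cannot be indexed. A sketch of a 2 x 5 grid, using random placeholder arrays in place of `batch_train[0]` and `batch_train[1]` (assumed to be scaled images and one-hot labels):

```python
import matplotlib
matplotlib.use("Agg")                         # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((10, 64, 64, 3))          # placeholder for batch_train[0][:10], values in 0..1
labels = np.eye(10)[rng.integers(0, 10, 10)]  # placeholder one-hot labels like batch_train[1][:10]

fig, ax = plt.subplots(nrows=2, ncols=5, figsize=(20, 8))
for idx, img in enumerate(images):
    a = ax.flat[idx]                          # ax is a 2x5 array of Axes; .flat walks it in order
    a.imshow(img)
    a.set_title(str(int(labels[idx].argmax())))  # show the class index, not the one-hot row
    a.axis("off")
```

In a notebook the figure shows up at the end of the cell. In a script, call `plt.show()` or `fig.savefig(...)`.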

#

on my 4th Epoch, it's being incredibly slow

cold osprey
#

training on a gpu?

#

if no, u can try google colab for free gpu

spiral smelt
spiral smelt
cold osprey
#

if ure using tensorboard, i think u can view the loss and accuracy in real time?

spiral smelt
#

I'm on epoch 9 and it says the loss is 2.1518 and the accuracy is 0 :/

#

Sorry was looking at the wrong metric the accuracy is 0.2603 but isn't improving

cold osprey
#
```
model.add(Dense(1, activation='sigmoid'))
```
may need to change this to softmax (with 10 output units)
#

id suggest looking for a tutorial on multi class classification and working from that instead

#

also pytorch > tensorflow hahah

#

high chance the problem is from the data

#

else the model just isnt good enough

spiral smelt
cold osprey
#

u can use more layers too

#

or bigger layers

spiral smelt
spiral smelt
frigid lion
#

hey so atm im doing jose portilla machine learning course on udemy and i would also like to do the andrew ng course on coursera but i see a lot of the content is behind the paywall do you think the free part of the course is good enough or wont make much sense without the paid lessons as well

#

i will soon end the jose portilla course i have just a few lessons left

past meteor
#

My tip: go for a book after that course

frigid lion
#

which book?

#

and why do you think so

past meteor
frigid lion
#

ive been reading a bit from this book while taking this course cuz jose recommended it as well

#

do you have any idea doe if the andrew ng course makes sense if i were not to pay for it

past meteor
#

Normally you can always audit courses, which is follow them for free but some content is "hidden"

frigid lion
#

ye i know i can audit for free but the amount of the things that are locked behind paywall seems like a lot and i feel like these are also important topics that are there

past meteor
#

You can also read the scikit-learn user guide. Some things might not make sense but you can google the terms to understand them better

#

You should read chapter 6, 10 and then 1, 2, 3 and 4

frigid lion
#

if some1 else has some knowledge about the andrew course i'd appreciate as well

cold osprey
#

but i already knew some sk learn before this

#

mainly regression

#

now doing a pytorch course then have some personal projects planned

past meteor
cold osprey
#

ah okay

past meteor
#

Some formulas aren't derived fully either so sometimes it feels like they're making a "jump" but that also means it's pretty hands-on

cold osprey
#

hand wavy maths is my fav kind

#

xd

past meteor
#

For me it depends. I did an entire course on just support vector machines in uni. Most of it was math, most of it was fun. Doesn't really make you significantly better at using SVMs though 🤷‍♂️

sinful kelp
#

I did a course in machine learning which was very maths and stats focused. It felt like it gave me a good foundation for a lot of the concepts, but when it comes to actual machine learning, there seemed to be a bit of a disconnect between the ideas and the actual methods in practice.

past meteor
#

My first ML course actually only made sense to me after I did other courses... It was very theoretical and also covered stuff that is not really relevant like theta subsumption, inductive logic programming, ...

cold osprey
#

hmm i dont have formal education for ML

#

but got the maths from my degree

sinful kelp
#

I would say that the most useful concepts mainly came from statistics. I have found Bayesian statistics a very useful way to think about ML and data in general.

past meteor
#

Bayesian stuff is cool until you run out of memory and that's the part they don't talk about in stats classes. In ML classes they will, they'll also tell you variational inference exists but they won't tell you that the probabilities you get out of it aren't great.

sinful kelp
#

I would agree with that.

cerulean kayak
#

does anyone know of an alternative to feature importance? at me if u respond

next valley
#

Foundation in the mathematical theory and concepts are important if you want to make novel models, if you're just copy and pasting pre made models all you really need to get going are some hands to manipulate the data to fit the inputs of the pre made model

cerulean kayak
# next valley Foundation in the mathematical theory and concepts are important if you want to ...

okay so I have a model that I made (you dont have to read it all but I wanted to be as specific as possible):

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
x = data.drop('Rings', axis=1)
y = data['Rings']
x_train, x_test, y_train,y_test=train_test_split(x,y,test_size=0.3)
clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf_gini.fit(x_train, y_train)
y_pred_gini = clf_gini.predict(x_test)
print("Accuracy with gini index: {0:0.4f}".format(accuracy_score(y_test,y_pred_gini)))

and then I got 0.2706 which is of course abysmal (note data is a pandas DataFrame created by a read_csv function.) and I want to be like "because this is real bad we need to find out what is throwing us off." and I know the answer is we need to drop the sex column, but I don't know how to come to that conclusion. My friend did this using a random forest instead of a decision tree, so he used feature importance. I read online that feature importance is more for random forest than decision tree, so what should I use?

mint palm
#

I have an interview for Data Science role. I realise i am more into ML stuff, and been a while since I did my project and statistics stuff on R.
Can someone give me list of topic that you would revise before interview? Also i realise i forgot about various distributions, quantiles, QQ plot etc. so please try to include related important stuff.
Thanks in advance for your time.

sinful kelp
cerulean kayak
mint palm
mint palm
agile cobalt
#

max depth 3 also sounds a bit shallow, though that depends on your data

cerulean kayak
agile cobalt
next valley
# cerulean kayak okay so I have a model that I made (you dont have to read it all but I wanted to...

There are many ways. Also, random forests are technically just a bunch of decision trees that average their results to give an output
since you are using a decision tree classifier, it's best to first see your data's total columns, as decision trees branch based on column averages

One reason may be that your data has way more dimensions than your tree can handle, so perhaps it'll pay off to increase the depth of the tree

If the depth of the tree exceeds the total columns of the data, there may be issues with the data itself. Try checking whether the data is complete, i.e. there are no missing values

It may also be the case that the data itself is non-linear in nature, so it'll be hard for a decision tree to model the data

agile cobalt
#

but about the original question, as far as I am aware, using random forests (even if your final model isn't a random forest) is the best way to automatically determine which features are or aren't important
usually you may want to filter by hand as well
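
For what it's worth, a single sklearn decision tree also exposes `feature_importances_`, and `sklearn.inspection.permutation_importance` works with any fitted estimator. A minimal sketch, assuming made-up data with one informative column and one pure-noise column (the column roles and sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data: column 0 determines the label, column 1 is pure noise.
rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([signal, noise])
y = (signal > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Impurity-based importances, available on a single tree as well as a forest.
print("impurity importances:", tree.feature_importances_)

# Permutation importance: the drop in test accuracy when one column is shuffled.
result = permutation_importance(
    tree, X_test, y_test, n_repeats=10, random_state=0
)
print("permutation importances:", result.importances_mean)
```

A column whose permutation importance sits near zero (or below) is a candidate to drop and re-test, which is one way to arrive at "drop this column" from evidence.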

cerulean kayak
next valley
#

you can also speed up inference or sending data for inference via dimensionality reduction techniques such as PCA, which is basically finding which features (columns) influence the labels (prediction) the most

Of course, this also comes at the cost of you potentially discarding important information that may pop up in the future that would be important to your model

#

@cerulean kayak that help?

cerulean kayak
#

give me a second, I vaguely know what you are talking about. If you can't tell, I'm in a college class and a lot of my problems stem from the fact that I know stuff but I don't know its specific name.

next valley
#

PCA is principal component analysis

sinful kelp
cerulean kayak
sinful kelp
next valley
#

Kind of, it's more of an analysis of what columns you don't need, hence "analysis" in principal component analysis

#

@cerulean kayak

cerulean kayak
#

okay and PCA is not limited to clustering algorithms? because I know it deals with KNN, which is a clustering algorithm

sinful kelp
#

no PCA is a method for transforming your data into a new space where each axis explains how the data varies.

#

It's typically used for visualizing high-dimensional data (probably where you saw it being used for KNNs), but it can also be used to generate new, more relevant features from your data
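
A minimal sketch of that idea with sklearn's `PCA`, assuming made-up data where two of the three columns are nearly redundant (the data is an illustrative assumption). Note that no labels are passed in: the new axes are ordered only by how much of the variance in X they explain.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: column 1 is almost a multiple of column 0, column 2 is independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([
    base,
    2 * base + 0.05 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

# Project onto the two directions of largest variance (fit uses X only, no y).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # three columns reduced to two
print(pca.explained_variance_ratio_)  # share of total variance per new axis
```

Each new axis is a linear combination of the original columns, so PCA produces new features rather than picking a subset of the old ones.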

cerulean kayak
cold osprey
#

as in drop them from X

next valley
cold osprey
#

like if u know feature_45 is not useful, u may not even query the data from the database say

sinful kelp
#

The predictions are learnt automatically from the model (e.g. the decision tree). The feature engineering (pre-processing of the data) can often be done by hand.

agile cobalt
cerulean kayak
cold osprey
#

oh hmmm

#

gender isnt ordinal tho

#

i guess it doesnt matter for a tree based model?

agile cobalt
#

"i"?

cold osprey
#

no it does i think

agile cobalt
#

a tree based model would do something like 1 goes left, 2 and 3 goes right or 1, 2 goes left and 3 goes right

cold osprey
#

I for 'i dont know'

cerulean kayak
cold osprey
#

which implies some order in the feature

agile cobalt
#

no, it does treat it as numbers - I'm just using the discrete labels because those are all the possible values

cold osprey
#

ah okay

sinful kelp
agile cobalt
#

but yeah it cannot do 1, 3 left, 2 right in one split I think

cold osprey
#

yeah idt it can if its treating it as numbers
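
A small sketch of that limitation, assuming a made-up nominal feature coded as 1, 2, 3 where only the middle code is the positive class (data and names are illustrative). A depth-1 tree on the integer codes can only cut at a threshold, so it cannot isolate code 2; with one indicator column per category it can.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up nominal feature coded as integers; only category 2 is positive.
codes = np.array([1, 2, 3] * 50)
y = (codes == 2).astype(int)

X_int = codes.reshape(-1, 1)                                      # single integer column
X_onehot = (codes[:, None] == np.array([1, 2, 3])).astype(int)    # one column per category

# One split each: a threshold on the code vs. a threshold on an indicator column.
stump_int = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_int, y)
stump_onehot = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_onehot, y)

acc_int = stump_int.score(X_int, y)
acc_onehot = stump_onehot.score(X_onehot, y)
print(f"integer codes, one split: {acc_int:.3f}")
print(f"one-hot, one split:       {acc_onehot:.3f}")
```

A deeper tree can still recover "2 vs. the rest" from integer codes with two splits, so the cost of integer coding is extra depth rather than an outright failure.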

cerulean kayak
#

so is mapping it like this wise? because im basing this off a lab my ta did for a DT and they did the same thing but with doors on a car: {2 doors:2, 3 doors:3 4+ doors : 3}

cold osprey
#

read up ordinal vs nominal data

#

stuff like gender, brand & colour is nominal

#

generation (boomer, millennial, gen Z) is an example of ordinal

#

hmm thinking about it, generation may not necessarily be ordinal too, depending on the context

cloud marsh
#

cupy basically reimplements numpy methods to use CUDA where possible right? how are dependencies resolved for higher level projects that depend on numpy?

cerulean kayak
#

okay and real quick @agile cobalt what do you think I should do for the depth of my tree?

next valley
cold osprey
#

Just experimented with a transfer learning model

#

does the constant up and down fluctuations of loss and accuracy mean anything?

next valley
#

the academic term for it is high variance iirc. There are a lot of things that can affect this, and it depends highly on what exactly your model is and how you are feeding it the data

cold osprey
#

for background, its a EfficientNet B0 model that im tuning the fully connected layer to classify 3 food classes

#

proly an overkill model but ye

next valley
#

may be that the layers you didn't freeze have too high of a learning rate set to them

#

or it may be beneficial to increase the batch size

cold osprey
#

0.001 seems pretty small hahah

#

batch size is 32 en. lemme double it

#

another q, do we need train val test datasets for NNs?

#

or is 2 sets enough

next valley
#

what?

#

oh

cold osprey
#

currently im only splitting my data into train and test

next valley
#

a train / val (dev) / test split is extremely important when fine-tuning a model. Without a separate test set you risk overfitting your model to the val/dev set as well when trying to address variance issues between your train and val/dev sets
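
One common way to get three sets is to call sklearn's `train_test_split` twice. A minimal sketch, assuming placeholder arrays and a 700/150/150 split (both are illustrative choices, not taken from the training code being discussed):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 samples, 3 classes (an assumption for illustration).
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000) % 3

# First split: hold out the test set, untouched until the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=150, stratify=y, random_state=0
)
# Second split: carve a validation set out of what remains, for tuning.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=150, stratify=y_trainval, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

Hyperparameters (learning rate, batch size, dropout, number of epochs) are chosen against the validation set, and the test set is scored once at the end.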

cold osprey
#

cool

#

thats what i thought

#

larger batch size = more gpu memory usage, coz more data has to be in memory when updating the model params?

next valley
#

depends on where you are loading the batches to yes

#

also i would suggest you reduce the epoch count, it doesn't seem like the dataset you are fine tuning it to is large enough to justify 100 iterations on it

cold osprey
#

yeah haha

#

50 seems more than enough

#

oh hmm something went weird

#

epoch 81 onwards, both loss became nan

#

and accuracy tanked

next valley
#

something may be wrong with your data

cold osprey
next valley
#

oh

cold osprey
#

some params went to zero in the model im guessing

#

or overflow?

next valley
#

no, try clearing the variable holding the accuracy train_accuracy test_accuracy

gloomy saddle
#

probably means something went wrong in one of your scoring or loss functions 🙂
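
The training code in question isn't shown here, so this is not a diagnosis, just one common way a loss turns into NaN: an overflow inside a hand-written softmax. A minimal NumPy sketch, assuming made-up logits (both function names are hypothetical):

```python
import numpy as np

def naive_softmax_xent(logits, label):
    """Cross-entropy via a textbook softmax; overflows for large logits."""
    exp = np.exp(logits)
    probs = exp / exp.sum()
    return -np.log(probs[label])

def stable_softmax_xent(logits, label):
    """Same quantity, computed with the max-subtraction (log-sum-exp) trick."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

# Made-up logits with one very large value, e.g. after weights have blown up.
logits = np.array([1000.0, 10.0, -5.0])

with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax_xent(logits, 0)   # exp(1000) is inf, inf / inf is nan
stable = stable_softmax_xent(logits, 0)

print("naive: ", naive)
print("stable:", stable)
```

Framework losses that take raw logits usually apply this trick internally, so a NaN there more often points at the inputs, the learning rate, or a division by zero in custom averaging code.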

next valley
#

note how train_accuracy and test_accuracy go beyond 80 epochs

cold osprey
#

ya coz loss at epoch 80 onwards is nan

next valley
#

i have no clue what your code structure is but it may have been that you forgot to reinitialize the variables you used to graph the loss and accuracy

cold osprey
#

ye i have hella code

#

sec

#

!code

arctic wedgeBOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

cold osprey
#

there

#

at the bottom, i have a train function which i call with all the parameters i need

#

seems like the only way loss can be nan is the len(dataloader) being 0

#

line 69 and 122

#

gradients...

next valley
#

unfortunately I use tensorflow, but based on my limited knowledge of pytorch my assumption is that you didn't update the test_dataloader variable for the new epoch

#

try adding a print statement with print(len(test_dataloader)) at line 122

#

that's the best i can come up with to verify that you did things right

cold osprey
#

haha i cba tbh since i wont be running 100 epochs

#

will leave it for future me to figure out when i run a model that does require that many epochs

next valley
#

other than that I'm unsure what parts of the model you are training, but I think adding some form of regularization may help, or maybe reshuffle the images in the dataset to see if you have some batches that are easier than others

cold osprey
#

the base model EfficientNet B0

#

trained on 1 mil images, 1k classes

next valley
#

like, the whole model

cold osprey
#

yeye

#

pretty strong base model

#

just fine tuned on the fully connected layer

next valley
#

oh, I thought you were training all layers. By the whole model I meant that you unfroze all the layers

cold osprey
#

nah nah hahaha

#

froze all the CNN bits of it

next valley
#

besides those things, i guess another thing to help with regularization to smooth out training loss would be to add dropout to the MLP if it isn't already there, I'm unsure about the architecture of efficientnet

cold osprey
#

yep there a dropout layer

#

p=0.2

#

can maybe increase it

next valley
#

Yup

cerulean kayak
next valley
cerulean kayak
#

also if you are not supposed to do mapping to values for nominal data, should I use dummies instead?

next valley
# cerulean kayak okay. > it's best to first see your datas total columns as decision trees branc...

Poorly worded on my part

Decision trees work at a high level by choosing a column and then deciding how to branch into another column
Let's say that your data is a matrix/tensor of dimensions/shape [m, n]
Therefore in theory the most optimal branch depth would be where depth = n before the tree starts looping over all columns

On the subject of mapping to values for nominal data, traditionally it doesn't matter as decision trees can also split on nominal data

however sklearn decision tree classifier cannot handle nominal data and therefore you must transform it to be ordinal

#

@cerulean kayak

#

Also, when i mean optimal i mean by accuracy, if you want optimal in terms of accuracy and inference speed then you'll need to figure out how to reduce how many features (columns) are in your data set

#

Hence you can perform pca to prune features that are deemed irrelevant

cerulean kayak
cold osprey
#

u can one hot encode them
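
In pandas the usual shortcut for this is `pd.get_dummies`. A minimal sketch, assuming a tiny made-up frame with a `Sex` column holding the values M, F and I (the column names and numbers are illustrative, not the real dataset):

```python
import pandas as pd

# Tiny made-up frame; values are placeholders, not real measurements.
df = pd.DataFrame({
    "Sex": ["M", "F", "I", "M"],
    "Length": [0.455, 0.53, 0.33, 0.44],
})

# Replace the nominal column with one indicator column per category.
encoded = pd.get_dummies(df, columns=["Sex"])
print(encoded.columns.tolist())
print(encoded)
```

The encoded frame can then be passed to `DecisionTreeClassifier.fit` in place of the original one. For a pipeline that has to apply the same encoding to unseen data, sklearn's `OneHotEncoder` is the more robust choice.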

cerulean kayak
#

%$#$@#&%
okay...

next valley
#

It all depends on how you manipulate your data, welcome to machine learning where 80% of the time is asking how tf do i make my data work

cerulean kayak
#

o trust me, yesterday my model had 0.76 accuracy and today it has 0.23 and i changed the print from a .format to print(f"")

cold osprey
#

sounds like a bigger problem than the print

cerulean kayak
#

well i have 2 other witnesses who say the same thing

cold osprey
#

Anyone got any sklearn subclassing code i can refer to?

#

or is that not a thing?

serene scaffold
cold osprey
#

like how its done in pytorch/tensorflow

serene scaffold
#

an estimator? not sure what you're referring to.

cold osprey
#

any classifier

#

nothing this complicated but what im saying is, whats the diff of declaring a classifier like

```py
lin_reg = LinearRegression()
```
#

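and
```py
from sklearn.linear_model import LinearRegression

class SpecialLinearRegression(LinearRegression):
    def __init__(self):
        # Call the parent constructor so fit_intercept etc. are still set.
        super().__init__()

special_lin_reg = SpecialLinearRegression()
```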
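
Subclassing sklearn estimators is a thing, with one convention to respect: `__init__` should only store its arguments as attributes of the same name, because `get_params` and `clone` read the constructor signature. A minimal sketch, assuming a hypothetical `ClippedLinearRegression` that clips its predictions to a range (the class and its `low`/`high` parameters are made up for illustration):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

class ClippedLinearRegression(LinearRegression):
    """Hypothetical subclass: ordinary least squares with clipped predictions."""

    def __init__(self, low=0.0, high=1.0, fit_intercept=True):
        # Only store parameters here; no validation or computation.
        super().__init__(fit_intercept=fit_intercept)
        self.low = low
        self.high = high

    def predict(self, X):
        # Reuse the parent's prediction, then clip it into [low, high].
        return np.clip(super().predict(X), self.low, self.high)

# Made-up data whose targets run from -1 to 2.
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 3 * X.ravel() - 1

model = ClippedLinearRegression().fit(X, y)
preds = model.predict(X)
params = clone(model).get_params()

print(preds.min(), preds.max())
print(params)
```

Compared with a plain `LinearRegression()`, the difference is the overridden behaviour; the subclass still works with `cross_val_score`, `GridSearchCV` and pipelines because it keeps the estimator interface.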

cloud marsh
#

what are some good options for managing data you plan on sending to tensorboard?

#

or just tensorboard tooling and data science log/benchmark data in general

dense oar
#

Any good resource recommendations (websites, books, etc) for learning AI with Python? For complete beginners

cloud marsh
magic dune
#
import numpy as np
import matplotlib.pyplot as plt


class NeuralNetwork:
    def __init__(self, layers, lr, epoch, X, t):
        self.lr = lr
        self.epoch = epoch
        self.layers = layers
        self.X = X
        self.t = t
        self.weights = {layer_idx: np.random.randn(layers[layer_idx + 1], layers[layer_idx]) / 5 for layer_idx in
                        range(len(layers) - 1)}
        self.bias = np.random.randn((len(layers) - 1), 1) / 5
        self.z_dict = {i: np.zeros((layers[i])) for i in range(len(layers))}
        self.z_dict[0] = X[0].flatten()
        delta_3 = (self.z_dict[2][0] - t[0]) * (self.z_dict[2][0] * (1 - self.z_dict[2][0]))
        delta_4 = (self.z_dict[2][1] - t[1]) * (self.z_dict[2][1] * (1 - self.z_dict[2][1]))
        self.delta = np.array([delta_3, delta_4])
        self.plot_data = []

    def forward(self):
        for z in self.X:
            z = z.reshape(-1, 1)
            for layer_idx in range(1, (len(self.layers))):
                a = np.matmul(self.weights[(layer_idx - 1)], z) + self.bias[(layer_idx - 1)]
                z = 1 / (1 + np.exp(-a))
                self.z_dict[layer_idx] = z.flatten()
            error = 0.5 * (z.flatten() - self.t) ** 2
        total_error = np.sum(error)
        return total_error

    def sigmoid(self, z):
        return z * (1 - z)

    def backward(self):
        for l in reversed(range(len(self.weights))):
            diag = np.diag(self.delta)
            arr = np.array([self.z_dict[l], self.z_dict[l]])
            new_derivatives = np.matmul(diag, arr)
            self.weights[l] = self.weights[l] - (self.lr * new_derivatives)
            self.bias[l] = sum(self.delta)
            sigmoid_arr = np.diag(self.sigmoid(self.z_dict[l]))
            self.delta = np.matmul(sigmoid_arr, np.matmul(self.weights[l].T, self.delta))
        return self.weights, self.bias

    def train(self):
        for e in range(self.epoch):
            total_error = self.forward()
            self.plot_data.append([e, total_error])
            self.backward()
            print(f"{e}: {total_error}")
        return self.weights, self.bias,

    def predict(self):
        return self.z_dict[2]

    def plot(self):
        data = np.array(self.plot_data)
        plt.scatter(data[:, 0], data[:, 1])
        print(data[:, 0])
        print(data[:, 1])
        plt.xlabel("Epoch")
        plt.ylabel("Total Error")
        plt.show()



if __name__ == '__main__':
    X = np.array([[0.05, .10]])
    t = np.array([1.00, 3.00])
    lr = 0.5
    n = 2
    H = 2
    output = 2
    epoch = 40000
    layers = [n, H, output]
    nn = NeuralNetwork(layers, lr, epoch, X, t)
    nn.train()
    print(nn.predict())
    nn.plot()

rate my neural network code?

cloud marsh
magic dune
cloud marsh
# magic dune lol

i have no control-f here. wtf lol. does that sigmoid function work?

magic dune
cloud marsh
# dense oar Any good resource recommendations (websites, books, etc) for learning AI with Py...

quantecon has a few good books on finance, but also a good intro to python & data science:

I would start here: basics on the python ecosystem for data science programming for econ/finance: https://python-programming.quantecon.org/intro.html

when i looked at finance/econ in the past, it seemed that getting data sets and access to data streams were about as complicated as any programming.

cloud marsh
wooden sail
#

also jordan normal forms are not diagonalization in general

#

what np diag does is one of two things:

  • if you give it a vector, it spits out a diagonal matrix that is 0 everywhere except on its diagonal where it has your vector
  • if you give it a matrix, it takes the diagonal of that matrix and spits it out as a vector
cloud marsh
#

so, it's maybe useful when you want the variances and not the covariances

wooden sail
#

!e

import numpy as np
M = np.random.normal(size=(3,3))
m = np.diag(M)
print(M)
print(m)

M_hat = np.diag(m)
print(M_hat)
arctic wedgeBOT
#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[-0.36296454 -1.46717512  2.73763072]
002 |  [ 1.42493807  0.14579904  0.99692921]
003 |  [-1.60806289  0.37930145  1.77627134]]
004 | [-0.36296454  0.14579904  1.77627134]
005 | [[-0.36296454  0.          0.        ]
006 |  [ 0.          0.14579904  0.        ]
007 |  [ 0.          0.          1.77627134]]
wooden sail
#

that could be an example, sure

past meteor
cloud marsh
#

why would errors like this "Unsatisfied version of shared singleton module @jupyter-widgets/base" occur?

i'm getting a similar warning when trying to render textures with k3d. i'm trying to evaluate whether I can write data to the texture and update it, but it's not rendering.

https://github.com/jupyterlab/jupyterlab-desktop/issues/576

#

i was looking into diagonalizability about 6 months ago. i can't remember why, but i came across jordan normal form. i guess it's what you can do if it's not diagonalizable @wooden sail if i could ever get past the boilerplate of setting up environments/languages, then i could actually apply things and then probably retain them.

cloud marsh
#

ok i guess i could just use PyVista/Trame then

flint gazelle
#

Why can't I use an activation function in the last layer here:
```python3
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(70,), dtype=tf.int8),
    tf.keras.layers.Dense(400, activation='relu'),
    tf.keras.layers.Dense(2000, activation='relu'),
    tf.keras.layers.Dense(1500, activation='relu'),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(400, activation='relu'),
    tf.keras.layers.Dense(200, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```

Epoch 1/2000
518/518 [==============================] - 13s 22ms/step - loss: 0.4892 - mae: 0.4892 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 2/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 3/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 4/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 5/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 6/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873

#

but if i use no activation function

Epoch 1/2000
518/518 [==============================] - 13s 22ms/step - loss: 0.7076 - mae: 0.7076 - val_loss: 0.2777 - val_mae: 0.2777
Epoch 2/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.2634 - mae: 0.2634 - val_loss: 0.2562 - val_mae: 0.2562
Epoch 3/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.2405 - mae: 0.2405 - val_loss: 0.2353 - val_mae: 0.2353
Epoch 4/2000
518/518 [==============================] - 13s 24ms/step - loss: 0.2238 - mae: 0.2238 - val_loss: 0.2233 - val_mae: 0.2233
Epoch 5/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.2098 - mae: 0.2098 - val_loss: 0.2134 - val_mae: 0.2134
Epoch 6/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.1987 - mae: 0.1987 - val_loss: 0.2025 - val_mae: 0.2025

using any other activation function in the output layer results in no progress in training. I also tried to just use

def custom(x):
    return tf.clip_by_value(x, clip_value_min=0, clip_value_max=1)

The output has to be between 0 and 1. Can anyone help me here ? I appreciate any help.

flint gazelle
#

x_shape(70,) y_shape(1,)

#

x is a chessboard 8*8 + extra info 6 bytes and y the evaluation score for the board

past meteor
#

y is real valued between 0 and 1?

flint gazelle
#

yes

past meteor
#

Then you need a linear activation and not sigmoid in your last layer

flint gazelle
#

but

def custom(x):
    return tf.clip_by_value(x, clip_value_min=0, clip_value_max=1)

is linear, right ?

past meteor
#

Linear is the default, just remove sigmoid.

flint gazelle
#

yeah i know but the output values have to be between 0 and 1. If i just dont use a activation function. There will be values higher than 1 and lower than 0

past meteor
#

Have you tested this so far?

flint gazelle
#

Yes

past meteor
#

And it was larger than 0 and 1?

flint gazelle
#

Yes

past meteor
#

You can also just use a linear activation and clip inside of your forward method EDIT: it's called call in tensorflow

#

What loss are you using?

flint gazelle
#

mae

#

I also tried mse but mae worked better

#

But to me it's still weird that I can't use any activation function, even the clip one. And I think I said something wrong: it's not the score of the board but the expectancy that white might win. 1 is white completely winning, 0.75 is white having the advantage, 0.5 is even, 0.25 is a disadvantage for white and 0 is completely losing, but with continuous values, so any value in between is possible

#

The y values for training are also between 0 and 1, continuous

past meteor
#

For example for predicting pixels I've done linear => sigmoid before with binary cross entropy loss

#

MSE could work as well in this case

flint gazelle
#

I am currently trying but it doesnt look that promising

past meteor
#

I'd also just make your network a lot smaller

flint gazelle
#

i have to finish training to see that

lavish kraken
past meteor
#

Helping people debug their networks is hard if I'm not sitting next to them 🤣

flint gazelle
#

Yeah, but i appreciate your help. Sitting training the network the whole weekend here.

#

So some values are still a little bit above like 1.07 when just using mse and linear activation function. I will just clip the values after evaluating as you said.
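
Clipping after prediction is a one-liner with `np.clip`. The sketch below, which uses made-up prediction values, also shows one plausible reason (not a confirmed diagnosis) why clip as an output activation can stall training: its gradient is zero wherever the raw output falls outside [0, 1], so those samples send no learning signal back.

```python
import numpy as np

def clip01(x):
    """Clamp values into [0, 1], like the custom activation in the thread."""
    return np.clip(x, 0.0, 1.0)

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at a scalar x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Post-hoc clipping of raw model outputs (made-up numbers).
raw_preds = np.array([-0.03, 0.42, 1.07])
clipped = clip01(raw_preds)
print(clipped)

# Gradient of the clip function inside and outside the allowed range.
grad_inside = numeric_grad(clip01, 0.5)
grad_outside = numeric_grad(clip01, 1.5)
print(grad_inside, grad_outside)
```

Training with a linear output and clipping only at inference time keeps the gradient intact during training, which is the approach settled on above.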

#

Green is mse, yellow is mae

#

So it actually is a little better now

#

Do you have any further advice to increase the accuracy of the model ?

past meteor
#

Making it smaller for starters and trying cross-entropy loss

flint gazelle
#

I don't quite understand how I should use a cross-entropy loss. I thought these were used for classification, with integers representing the class labels, whereas I use continuous values. Should I divide into value ranges, so 0 -> 0.1, 0.1 -> 0.2 and so on?

past meteor
#

No you just drop it in. Cross-entropy works for anything between [0,1] (look at the formula). This is what I did when I was training an autoencoder, pixel space is [0,1] so I could use sigmoid => MSE or sigmoid => cross-entropy.

#

Each loss function is tied to a different likelihood, so you're optimising for something different than you would with MSE (Gaussian noise η for MSE vs. Y ~ Bernoulli for cross-entropy). You can reason about what makes more sense in your case, I think there are arguments for both! 🙂 OR you just try it out and see which one works best empirically
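
To make the "look at the formula" point concrete: binary cross-entropy, -(y·log p + (1 - y)·log(1 - p)), is defined for any target y in [0, 1], and for a fixed y it is smallest when the prediction p equals y. A minimal NumPy sketch, assuming a made-up target of 0.75 (the function name and the `eps` guard are illustrative choices):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Elementwise BCE; y_true may be any value in [0, 1], not just 0 or 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep log() away from 0
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Soft target, e.g. "white has an advantage".
target = 0.75

# Evaluate the loss over a grid of candidate predictions.
candidates = np.linspace(0.01, 0.99, 99)
losses = binary_cross_entropy(target, candidates)
best = candidates[np.argmin(losses)]

print("prediction with the lowest loss:", best)
```

So no binning into ranges is needed. In Keras the matching setup would be a sigmoid output with the built-in binary cross-entropy loss, fed the continuous targets directly.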

warm iron
#

Hey guys, I was trying to install the tensorflow library but my cmd doesn't work, as it says "pip is not recognized as an internal or external command"

#

what should I do to resolve this issue?

flint gazelle
#

You have to add the directory where Python is installed to the PATH in your environment variables

#

but it's recommended to use a virtual environment

warm iron
#

this is the python file as well as library

flint gazelle
past meteor
flint gazelle
warm iron
flint gazelle
#

No

#

just look up how to set up a virtual environment, for example with conda, and then activate the environment and you can get started

warm iron
flint gazelle
#

You can, but this can become an issue later when you have other projects using the same interpreter, because the dependencies might have mismatching versions and so on. If you want a more user-friendly way, you can download PyCharm; they have virtual environments built into their IDE

flint gazelle
#

wrong channel

#

one below

blissful vine
#

Oops

lapis sequoia
#

Hi guys,

I have images for 60 patients which give cell types and whether the cell is cancerous or not.

And then I have images for 40 other patients which only tell whether the cell is cancerous or not.

How can I make use of the extra 40 patients' data to train cell type classification in a CNN?

queen cradle
lapis sequoia
#

Just the cell type

#

Cell is X type

queen cradle
#

You don't need to identify cancerous versus non-cancerous?

lapis sequoia
#

Nope

queen cradle
#

In that case I think the data where you don't have the cell type is useless.

lapis sequoia
#

Really

#

Can't be

#

In the assignment it specifically says you need to find a way to use it

queen cradle
#

I guess I can imagine training an autoencoder with it.

#

Okay, maybe it's not useless.

lapis sequoia
#

Gpt says to use that extra data to augment the images

#

And then dropping the extra labels

#

It says that will give extra info to the augmentor

queen cradle
#

Don't ever trust ChatGPT.

#

It literally has no idea what it's talking about.

lapis sequoia
#

Yeah but it helps when I have no idea what I am talking about too

queen cradle
#

You said you had an assignment. What kind of course is this?

lapis sequoia
#

ML

queen cradle
#

Is the assignment specifically about certain architectures?

lapis sequoia
#

Nope. Just needed to classify images

#

And justify the choice

#

Gpt said cnn #1 CHoice. I trusted it

queen cradle
#

What kinds of classifiers are you familiar with?

lapis sequoia
#

I think I know the simple ones

#

New to Neural Nets

queen cradle
# lapis sequoia Gpt said cnn #1 CHoice. I trusted it

Please don't. It's a language model. It got some text as input and it generates text as output. It has no understanding. If you want proof, ask it to do arithmetic. Or just, "reverse the digits of 3141592653589793238462643383279".

queen cradle
lapis sequoia
#

The reversed digits of 3141592653589793238462643383279 are 9723834362468943975859382659413.
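
That reply can be checked in a couple of lines of Python, since reversing a string of digits is just a slice. A minimal sketch (the variable names are illustrative; `claimed` is the answer quoted above):

```python
digits = "3141592653589793238462643383279"

# Reverse the string with a slice that steps backwards.
reversed_digits = digits[::-1]

# The answer quoted in the message above.
claimed = "9723834362468943975859382659413"

print(reversed_digits)
print("matches the quoted answer:", reversed_digits == claimed)
```

The two strings already disagree at the seventh digit, which is the point being made about trusting generated text without checking it.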