#data-science-and-ml
Do you rescale your output?
I've done that with just one of the coordinates, latitude vs time and just a column
I assume you're using MSE or something similar but lat, lon and altitude are on different scales
Like, on paper what you're doing is that you have a 3D input that is fed into your model and together with the latent vector Z it produces a 3D output. This 3D output is what produces the loss, this is pretty much like multi task learning.
what I don't get is how it can compute the loss if you don't provide it also the real trajectory
What you should be predicting is the real trajectory
indeed, but in the train dataframe you do need to have it, right?
huh
Are you using Pytorch or Tensorflow? I'm going to look for an example
Keras
that would help because I'm starting to get lost
https://www.tensorflow.org/guide/keras/rnn is a good place to start
okay I'll check that out! thank you!
before leaving though, can you provide a small guidance on what your approach would be? is it, testing first for one flight and then when loss=0, add other flights?
You just need a SimpleRNN and Dense layer to start with
as different flights are unrelated, can they all be dumped in the same array?
array is just a form of input
I don't know about Keras specifically but you typically end up with a dataset that looks like this: num_steps x features x num_outputs so you can have unrelated ones
num_outputs is your y_train
You're conditioning over windows meaning if your window is of size 3 it'll look like [1,2,3] -> 3(pred) ; [2, 3, 4] -> 4 (pred) ; [3, 4, 5] -> 5 (pred) ....
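A minimal sketch of the windowing described above, assuming two aligned 1-D series. `make_windows` is a hypothetical helper, not a Keras function; it pairs each window of intended points with the real value at the window's last step, i.e. the [1, 2, 3] -> 3 pattern.

```python
import numpy as np

def make_windows(intended, real, size):
    """Pair each window of `size` intended points with the real value
    at the window's last time step (hypothetical helper)."""
    intended = np.asarray(intended)
    real = np.asarray(real)
    n = len(intended) - size + 1
    X = np.stack([intended[i:i + size] for i in range(n)])
    y = real[size - 1:]
    return X, y

# Made-up aligned series standing in for intended and real latitude.
intended = np.array([1, 2, 3, 4, 5])
real = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
X, y = make_windows(intended, real, size=3)
```

Each row of `X` is one window and `y` holds the matching targets, so unrelated flights can be windowed separately and then stacked.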
say that I have 50 flights, each with 10 features. They are variable, so for the first flight the latitude, speed, altitude... all vary over time. Do all of these have to be an array inside every cell of the matrix, or is it better to have an id column and simply have as many rows as there are points across all the flights?
This is where I struggle the most, I don't really find the best approach on the dataset you mentioned
I had tested this for 12 flights
With FPLlat being the latitude of the intended trajectory and TELlat the latitude of the real trajectory
yet once again the same question appears for me. These train on a single sequence and looking at the previous values it gives a prediction. Should the Y be the real and the window size contain the intended points?
Like [1_intended, 2_intended, 3_intended] -> 3_real (pred)?
The thing here is that i don't want to extend a timeseries but rather generate a new one from a given one, this messes up my head
You need to try putting it into words what your task is. Are you trying to map a coordinate from intended to flown (fully markovian) or are you trying to map a coordinate from intended to flown, given the past few coordinates (markovian over a window)
Depending on how these drones work, your initial results and what you want you may even need a bi-directional RNN (but don't start doing this yet)
Maybe a reasonable feature-set and approach is using a feedforward net (no RNN) with [X,Y,Z, starting X, starting Y, starting Z, time since start, time to end] as features
Why? I think that potentially points in the middle are the ones that are off. The points close to the start and close to the end are typically similar (according to the pics)
You can also make the problem easier by predicting the diff between intended and flown etc etc
You just need to focus on understanding your task better on the ML side I think
So what I've done is a study of this already, like the evolution of the error along the trajectory, finding the weak points, comparing both datasets... but now I have to implement this in ML
I've been working on that for months and I feel like I have all the necessary data to feed it into a NN but can't figure it out because I haven't been able to find any similar example
and this is fully markovian, no previous info from the flown flight. Just the intended points
is that what you mean by loss=0? I think I've managed something
Indeed this is what I meant
but i've done that by training the model with just a feature (latitude, and the time stamps) of a single flight
try with lat long and alt
how?
Are you only doing one of the 3 dimensions now @muted crypt ?
in this test, yes
Okay I'll try explaining what I meant again with a lot less jargon:
My dataframe is originally just one column. Should I use 3 columns now, with the Y being the next 3 values of lat, long, alt too?
Your task: X, Y, Z (intended) -> X, Y, Z (actual) for all time steps T for all flights.
please do. By the way I am not a computer scientist or similar so this is very appreciated. I have to deliver this as a thesis but my knowledge is quite low, hence my dumb questions
Looking at your images, intended != actual typically in the middle of your flight (look at your plots)
So a baseline model (could even be linear regression) is the following: the features are time_since_start and time_to_end, and you predict 3 things: difference_intended_flown_X, difference_intended_flown_Y, difference_intended_flown_Z
My intuition is that this model is already going to be quite good! This is a drastic simplification of your problem
No. EDIT: actually yes, I misread.
how can I know the diff_X before the flight? does diff_X mean the distance to the real? or the distance to the start?
Do you need to know all of these quantities before the flight?
Just the positions where the drone will fly by and at which time
and velocity for instance of each segment, from which you can get the time
So, I suspect that the model will be used to "adjust" the intended path to correspond to the actual path ahead of flying right? (and not during)
that is correct
Then you can definitely make a model like I described above
so you can't "rely" on previous information from that intended flight
Even if it's a bad model, you need to make it imo because it's a good baseline to compare other models to
is the predicting real-time? as in as the drone flies, it will show the predicted path it will take
Based on looking at this I suspect that there is indeed a relationship between the time to start, time to end and the difference between intended and flown
I would like to do a simple model yes, but it really varies a lot, the flown trajectories are quite unexpected sometimes so I'm not sure it will perform too nice
with the simple model zestar proposed:
- have intended flight path
- use model to get diffX diffY diffZ
- use diff(s) and intended flight path to get predicted flight path
- compare to actual flight path
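The four steps above can be sketched as follows. Everything here is an assumption for illustration: the flights are synthetic, `make_flight` is a hypothetical helper, and a random forest stands in for "any regression model" (it handles the three outputs natively).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_flight(duration, n=100):
    """Synthetic stand-in for one aligned flight (made-up dynamics)."""
    t = np.linspace(0.0, duration, n)
    feats = np.column_stack([t, duration - t])   # time_since_start, time_to_end
    intended = np.column_stack([t, 2.0 * t, np.full(n, 50.0)])
    bump = t * (duration - t)                    # zero at both ends, largest mid-flight
    diff = np.column_stack([0.1 * bump, -0.05 * bump, 0.02 * bump])
    return feats, intended, intended + diff

train = [make_flight(d) for d in (8.0, 9.0, 11.0, 12.0)]
X_train = np.vstack([f for f, _, _ in train])
y_train = np.vstack([actual - intended for _, intended, actual in train])

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)                      # y = [diff_X, diff_Y, diff_Z]

feats, intended, actual = make_flight(10.0)      # held-out flight
predicted = intended + model.predict(feats)      # intended path + predicted diffs

err_baseline = np.abs(intended - actual).mean()  # error with no correction at all
err_model = np.abs(predicted - actual).mean()
```

Comparing `err_model` with `err_baseline` shows whether the correction helps at all, which is the point of having a baseline.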
so combining diff with intended I see
Yet the model has to take both intended and real right?
for zestar's model, the model only takes time since start and time to end as inputs
Exactly that
it doesn't 'care' about the X Y Z per se
You can use this: https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html to wrap any regression model to predict diff_X, diff_Y and diff_Z. Do note you're fitting three models at the same time.
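A small sketch of the wrapper, assuming made-up features and targets. `Ridge` is just one possible base model, and the columns of `Y` only stand in for diff_X, diff_Y and diff_Z.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))   # e.g. time_since_start, time_to_end
# Made-up targets standing in for diff_X, diff_Y, diff_Z.
Y = np.column_stack([X[:, 0] - X[:, 1], 0.5 * X[:, 0], 2.0 * X[:, 1]])
Y += rng.normal(scale=0.01, size=Y.shape)

# One independent Ridge model is fitted per output column.
wrapped = MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, Y)
pred = wrapped.predict(X[:5])
```

After fitting, `wrapped.estimators_` holds the three separate models, which is what "fitting three models at the same time" amounts to.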
Examples using sklearn.multioutput.MultiOutputRegressor: Comparing random forests and the multi-output meta estimator
and what would be the Y in the model then?
lat or long ig
X and Y - lat and long
y = [diff_X, diff_Y and diff_Z]
Indeed
and we get this one beforehand right?
Yeah, it's easy to compute this no?
For all flights you subtract the intended from the actual
here a big question arises though, do you take into account the time shift?
what is the time shift?
What do you mean with time shift indeed?
i think we r using euclidean here
Your time series are not aligned?
check this out:
Intended: https://paste.pythondiscord.com/budagaqufi
Real: https://paste.pythondiscord.com/oduxafuziv
updated!
So for the intended one you have a lot less samples than for flown?
So the Real is the data recorded by the drone in 0.1 second increments ('secs' column); the intended is just the trajectory to be followed
Because it generates a bunch of waypoints
hmmm
it's just the waypoint that you specify -> the drone flies towards them
and we want to get from intended to real without actually flying the drone
Does the drone pass every waypoint?
correct
yes:
well not exactly on top, but very close at least
1st data row of real corresponds to 1st waypoint?
Okay, then I would only predict 12 points per flight, the ones closest to the waypoint
not necessarily, it's when the drone sensor is turned on
Why? Otherwise you're making big assumptions about the flight. You can't upsample the data / linearly interpolate between waypoints to have the same sample rate as the drone unless you're 100 % sure the drone is programmed to move between waypoints in a straight line
need a way to align them
if you are sure that the path between the waypoints is linear then you can upsample to get the same sample rate as the flight
theoretically it has to. The goal is basically to learn the patterns from all the data that I have (I have much more intended + real dataframes) and then apply it to a new intended to predict the real
I've done that
Like this:
The evolution of the error along the flight
ah
Look, if you're sure that the drone flies in a straight line you should upsample between the waypoints to have an observation every 0.01 s
^
I've interpolated the intended trajectory so that the number of rows matches the number of rows in the real one, is that what you mean?
yes
I've done this yeah
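A minimal sketch of that interpolation under the straight-line assumption. The waypoint times and latitudes are made up, and the 0.1 s spacing follows the 'secs' column mentioned earlier.

```python
import numpy as np

# Hypothetical waypoints: time (s) and latitude at each waypoint.
wp_time = np.array([0.0, 10.0, 20.0])
wp_lat = np.array([41.0, 41.5, 41.2])

# Timestamps of the recorded flight, one sample every 0.1 s.
secs = np.linspace(0.0, 20.0, 201)

# Straight-line assumption: linear interpolation between waypoints.
lat_interp = np.interp(secs, wp_time, wp_lat)
```

The same call would be repeated for longitude and altitude, so the intended path ends up with one row per recorded sample.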
with the straight line assumption of the intended data, then can try zestar simple model first then
Then you don't need to align them or what am I missing?
he's already aligned it right
Depends on what you mean by align
I'd truncate your flight dataset to be between waypoint 1 and the last way point as well
It's impossible to align them perfectly, but you can find the best time shift, as I like to call it (the amount of time lag between real and intended)
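One way to estimate such a time shift is a brute-force search over candidate lags. This is a sketch under assumptions: both series have the same length and sample rate, the real series can only lag behind the intended one, and `best_lag` is a hypothetical helper.

```python
import numpy as np

def best_lag(intended, real, max_lag):
    """Return the shift (in samples) of `real` relative to `intended`
    that minimises the mean squared difference (hypothetical helper)."""
    best, best_err = 0, np.inf
    for lag in range(max_lag + 1):
        n = len(intended) - lag
        err = np.mean((real[lag:] - intended[:n]) ** 2)
        if err < best_err:
            best, best_err = lag, err
    return best

t = np.arange(300) * 0.1
intended = np.sin(t)
real = np.sin(t - 0.5)          # made-up: real lags intended by 0.5 s
lag = best_lag(intended, real, max_lag=20)
```

The lag is returned in samples, so with 0.1 s spacing it converts to seconds as `lag * 0.1`.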
yes that can be done too
I have this now for instance
For 16 flights, already interpolated
After truncating it to be between way point 1 and N I'm not sure you need to align them? Especially if you've already interpolated
yes, some corrections should be done as well
let's say that this is already aligned though, which almost is, is it even the right format for the model?
not for the time since start and time to end model
and apart from these 2 extra columns? this shouldn't be too hard
You'd need to create variables such as time since waypoint and time to waypoint
^
If the drone passes each waypoint then the idea is that it deviates from the path between waypoints
yet this is just 2 arrays which are flipped, aren't they?
time since start: [1, 2, 3, 4, 5]
time to end: [5 ,4 ,3 ,2 ,1 ]
If the time between each waypoint is equal then yes
for 5 data points
it will never cross it exactly as the resolution is very different. See how many decimals it shows in the csv
wait, is the model predicting for each waypoint or on all points(from interpolation)
but I can imagine you could be 2 time points from a given waypoint but 7 time points to the next
so the time to end refers to the time to the next waypoint?
yes
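A sketch of the two features, assuming the time at which each waypoint is passed is known. The waypoint times and sample times are made up so that one sample sits 2 s after a waypoint and 7 s before the next.

```python
import numpy as np

wp_time = np.array([0.0, 2.0, 11.0])         # hypothetical waypoint pass times (s)
secs = np.array([0.0, 1.0, 2.0, 4.0, 10.0, 11.0])

# Index of the last waypoint already passed at each sample.
prev_idx = np.searchsorted(wp_time, secs, side="right") - 1
# Index of the next waypoint (clamped to the last one at the end of the flight).
next_idx = np.minimum(prev_idx + 1, len(wp_time) - 1)

time_since_wp = secs - wp_time[prev_idx]
time_to_wp = wp_time[next_idx] - secs
```

Because the segments have different durations, the two arrays are not simply flipped copies of each other.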
How does that differ from the actual end of the flight?
My assumption is that the drone comes pretty close, if not exactly, to each waypoint and it deviates in the middle
if we have this we make sure to know that we are in the middle of the flight
just in turns though
Like, you can calculate this easily before you build a model to see if it's true
it won't deviate a thing on these horizontal long segments
All of this is "feature engineering" and is the cornerstone of ML. You have to be a bit creative haha, if you're creative enough you can really simplify the problem for LSTMs to linear regression
You can add the segment type as a feature as well then?
this is it:
where the peaks correspond to the turns
yet for the rest it is not really true
What is this? The difference?
yes
Just add the segment type as a variable for your model
Difference from real to intended
And add time between waypoints
this is much harder than it seems somehow
do you mean to categorize each segment?
https://paste.pythondiscord.com/budagaqufi.py don't you have this as a variable?
CurvaturePassed etc.
I mean a turn is not a segment but it falls into a certain length of 2 segments
Can't you know if you're turning between 2 waypoints by looking at the X, Y and Z coordinates?
this is not reliable sadly
In a straight line you only have X that varies, no?
but again you need to add more rows or something in here
Because you cant just have 3 points like a triangle and tell where are the turns
Can there even be a turn between 2 way points?
Especially since you said you linearly interpolate
no, that's why it's hard to categorize that
Tbh, I can't chat all evening but you just need to "distill" your knowledge of the problem into variables and simplify the problem as much as possible
when I interpolate i'm just adding points to the segments, on top
And afterwards you need to build a baseline model
Then you start relaxing your assumptions 1 by 1 and creating more powerful models. At the end of this process you get to RNN, LSTMs, maybe even bi-directional RNNs
yeah the thing is that there's not an example of something similar
using time from waypoint, time to waypoint to predict the diffs is already making many assumptions that you can then start relaxing later on (or you add variables to make more reliable assumptions)
and really predicting something like temperature is pretty easy but this is much more different and I don't really know why
No, the thing with predicting temperature is that they've already done all of what I've said, and that it's just documented and makes sense because all of the tricks/thinking are written down already
and what model would you use here?
i mean, one thing is predicting the continuation of a sequence and the other predicting the whole sequence based on 12 points
I always use a mix of Ridge, Lasso, Random Forest, xgboost, SVMs (depending on my dataset's size) and neural networks
and can you predict 3 columns at a time for instance?
yes...
Examples using sklearn.multioutput.MultiOutputRegressor: Comparing random forests and the multi-output meta estimator
This fits 3 models, neural networks on the other hand fit all of it at once with 3 output neurons
wait but the y is diffX, diffY, diffZ. so now I have to compute the error in 3 dimensions. I just had the absolute distance :(
Yes...
If I have a categorical feature with lots of categories, and some of these categories don't have a lot of instances in the dataset (maybe fewer than 10 each), does it make sense to group them all into one new category? I believe they wouldn't provide much information because they have so few instances.
damn this is mad but I guess I'll try it
You need to ensure diffX, diffY and diffZ are on the same scale and then take the mean of the error
oh yes do you recommend scaling?
It's not that mad tbh 🤷‍♂️
If you're unsure I'd always recommend scaling
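A sketch of why scaling matters here, with made-up diffs: latitude and longitude errors in degrees, altitude error in metres. `StandardScaler` is one possible choice, and the all-zero "prediction" is only a placeholder.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up diffs: lat/lon errors in degrees, altitude error in metres.
diffs = np.column_stack([rng.normal(0, 1e-4, 500),
                         rng.normal(0, 1e-4, 500),
                         rng.normal(0, 5.0, 500)])

raw_mse_per_axis = (diffs ** 2).mean(axis=0)   # altitude dominates on the raw scale

scaler = StandardScaler().fit(diffs)           # fit on training targets only
diffs_scaled = scaler.transform(diffs)

pred_scaled = np.zeros_like(diffs_scaled)      # placeholder "prediction"
mse_per_axis = ((diffs_scaled - pred_scaled) ** 2).mean(axis=0)
mse = mse_per_axis.mean()                      # one number; each axis weighs equally
```

On the raw scale the altitude term swamps the other two, while after scaling each axis contributes equally to the mean.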
fair
so it is essentially adding the diff and time to start/end here?
then doing the model
also one last question, wouldn't the time to waypoint be problematic? As the segments are relatively short, many rows will have the same times yet the diff can be quite different
We use the ADF test and the KPSS test to check stationarity. Is there any method available to check the seasonality of the data?
Is there a big difference in 0,1 seconds? 
hey guys, im trying to save a file to a variable like so
dataset = 'filename'
but its not working
do i have to include the path to the file?
or the extensions?
I just made an interesting observation when dealing with time series problems
I usually train my model after scaling down the data in values between 0 and 1
The model will always learn to predict a value between 0-1
However, in case of regression this may be limiting the model capability to generalise
For example if the maximum value in the training dataset is 1000
Scaled down to 0-1 the 1000 will become 1
On the other hand, if the maximum value in the testing split is 2000
Scaled down based on the scaler of the training data set
The value will be 2
The model will never predict 2
That's precisely why fitting your normalization stuff on your entire dataset is cheating
Yes, this what i did at first
Then normalised only the training split and used the same scaler for testing
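The numbers from this example can be reproduced with a small sketch. `MinMaxScaler` is assumed here since the values are scaled to 0-1; the minimum of 0 and the extra data points are made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0], [250.0], [1000.0]])   # training maximum is 1000
test = np.array([[500.0], [2000.0]])           # test maximum is 2000

scaler = MinMaxScaler().fit(train)             # fit on the training split only
train_scaled = scaler.transform(train)         # 1000 becomes 1
test_scaled = scaler.transform(test)           # reusing the scaler, 2000 becomes 2

# Predictions made in scaled space can be mapped back to the original units.
restored = scaler.inverse_transform(test_scaled)
```

The scaler itself has no trouble producing 2; whether the model can output it depends on the model, for example on whether its last layer is bounded.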
I did an experiment a while ago with synthetic time series and I noticed that if you have preprocessing such as normalization and you do not update it over time (esp. if you have a trend), the drift in the normalization alone is enough to kill your model
Exactly!
I refit the normalization online at each timestep, as y_actual became available
Makes sense, but again I still think this limits the model capability
i want to analyze my youtube data. And i received a huge html package for that... I dont know much about html's so is it possible to pull that data out and restore it in a csv format?
Check out github copilot, will speed up this kind of labor work
got a link? 
Sorry if I confused you
basically AI Pylance
but it knows what project ur working on
so it knows what command you might need next?
yes it will suggest multiple
you can ask chatgpt too
these tools are really amazing if you want a head start
then you take it from there and modify as needed
i think i'll just use it to prepare my data for analysis, i want to analyze it on my own tho
from bs4 import BeautifulSoup
import csv
# Open the HTML file and read its contents
with open('Wiedergabeverlauf.html', encoding='utf8') as file:
    contents = file.read()
# Parse the HTML data using BeautifulSoup
soup = BeautifulSoup(contents, 'html.parser')
# Find the table containing the watch history data
table = soup.find('table', {'class': 'table-section'})
# Create a list to hold the extracted data
data = []
# Loop through each row in the table and extract the data
# (if no such table exists, there is nothing to loop over)
for row in (table.find_all('tr') if table is not None else []):
    # Extract the title and watch time for each video
    title = row.find('a', {'class': 'content-link'})
    time = row.find('span', {'class': 'accessible-description'})
    # Skip rows that lack either element instead of crashing on None
    if title is None or time is None:
        continue
    # Add the data to the list
    data.append([title.text.strip(), time.text.strip()])
# Save the data to a CSV file
with open('watch_history.csv', 'w', newline='', encoding='utf8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Watch Time'])
    writer.writerows(data)
"This code uses BeautifulSoup to parse the HTML data and find the table containing the watch history data. It then loops through each row in the table and extracts the title and watch time for each video, and saves the data to a list. Finally, it saves the data to a CSV file called 'watch_history.csv'.
Note that the above code assumes that the watch history data is contained within a table with the class 'table-section'. If your HTML file has a different structure, you may need to modify the code accordingly."
im not sure bout that
idk if the watch history data is stored in tables...
it looks like that
here a better pic
@faint mist normal that the script runs so long? i mean its a 50MB html
Hmm, Ideally no
I will leave it for someone else to pitch in and help you with the matter
I am no expert in parsing html files and not sure how to help
I apologize
You can do an inverse transformation after predicting btw
no worries mate
Yes, but that's in theory; in the real world it will never be able to predict a value higher than 1. In other words, the model will never have the capability of predicting the next "All time high"
if you get what I mean
It will be close
ofc
but it could be closer
Hi, might not be the right channel so apologies if that's so (let me know and I'll delete / move it)
Looking for input on how people like to develop data pipelines for aws from development to production. I.e. how do you start locally, when do you move to aws, what account separation from production do you go through; any and everything would be interesting.
We have some new projects where I'm trying aws glue / emr (for pyspark) and I'm not sure what resources to make for the team around an idp and/or a testable, workable starting point
smth is wrong... that script has been running for like the last hour
and nothing happend
no errors... its just processing
Thoughts on Data Factory?
I want to make a plot from a csv file using matplotlib. I have made the code but there is an error 'csv2df' object has no attribute 'plt'. Can anyone help me. Here is the code.
import pandas as pd
import matplotlib.pyplot as plt
class csv2df():
    def __init__(self):
        self.df = pd.read_csv("RMS level.csv")
        self.sheet = self.df[3:]
    def plot(self):
        self.x = self.sheet["RMS Level"]
        self.plt.plot(self.x)
show = csv2df()
show.plt.show()
i'm working on a project where i'm trying to predict a player's success after four seasons in a basketball video game based on their high school ratings. basically, there's 20 features for a player in high school, and i'm trying to predict a specific statistic (PER) in the game during their senior year. the catch is that players who have a particularly high rating for their high school features won't play until their senior season, so there should be a soft limit for how good a player is, and if a player is too good, their predicted PER should also be lower.
my current model fails to take this into account and will predict the best high school players to have high PER as a senior, even though most of them won't return for multiple seasons. How do I fix this?
i applied for a data engineering job
I mean, an internship to be more exact
How should I prepare myself if I somehow reach the interview phase?
Hello
For those who started in data science without experience in a job, what is the most common thing they are asked to do?
I think you'd just need to do show.plot() ?
they may also need to change self.plt.plot(self.x) to self.plt = plt.plot(self.x) otherwise they are still referencing self.plt which does not exist
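A possible corrected version, as a sketch rather than the only fix: `plt` is the imported module, so it is called directly instead of through `self` or the instance. The DataFrame is passed in with made-up values so the example runs without "RMS level.csv", and the Agg backend is an assumption so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

class csv2df:
    def __init__(self, df):
        # The original reads pd.read_csv("RMS level.csv"); a DataFrame is
        # passed in here so the example is self-contained.
        self.df = df
        self.sheet = self.df[3:]             # drop the first three rows, as in the original

    def plot(self):
        # plt is the module imported above, not an attribute of self.
        return plt.plot(self.sheet["RMS Level"])

df = pd.DataFrame({"RMS Level": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
show = csv2df(df)
lines = show.plot()
# With an interactive backend, plt.show() would now open the window.
```

The two changes are `plt.plot(...)` inside the method and calling `show.plot()` followed by `plt.show()` rather than `show.plt.show()`.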
Hello. Does anyone know or have a chatbot? If not, can you tell me the name of a model that I can use with the "transformers" library that doesn't need a lot of memory to work? I tried a few models and 2 of them managed to crash my computer.
It's the bread and butter of data in the azure stack, it's pretty intuitive imo
I have a dataset with 5 min timestamps which I changed to hourly data. The data has daily and 12-day seasonality and is not stationary. I made the data stationary and then used a SARIMAX model, which gives a negative AIC value, but when I try to predict it gives me a straight line. I also tried auto arima, but it still didn't work for me. How can I improve its accuracy?
here is the model summary:
you seem to be just using an AR(1) model here..? which is not the full capability of SARIMAX
i assume you are using statsmodels, have a look here, https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX-parameters
particularly the order parameter.
Hey guys, any data science on here?
A question about learning mathematics: from what I have looked into, a data scientist must learn a lot, but in reality the most important areas are algebra, calculus and statistics. If you had to name the most important topics in algebra and calculus, what would they be?
what do you mean when you say algebra here? mathematicians say algebra to mean abstract algebra, which is way different from your high school algebra. what one uses very often in data science is linear algebra, which is one of the elementary parts of abstract algebra
how does CLS token work with transformer?
regarding calculus, really all of it. you'll be looking at gradients, jacobians and hessians (so multivar calc) and integration used for optimization
could someone explain this behavior of optuna? When I set the sampler as optuna.samplers.TPESampler(n_startup_trials=300) with 300-400 random initial samplings everything is fine.. at first. You can indeed see it taking 300-400 random hyperparameter combinations, after which the graph becomes more stable as the "smart algorithms" kick in.. but that only lasts for around 200 samplings.. after which it seems that optuna reverts to random sampling again..! How can this be explained? Is it supposed to be like this? I can't make sense of it..
if I had to guess, it just assumes that it reached a local minimum/maximum then starts going further away from it to try to find a different (hopefully the global) minimum/maximum?
that is, going further into X direction wouldn't make it any better, so it tries to find another Z direction that might make it better
the alternative would be pretty much overfitting then staying there
I should look at Optuna sometime. I always just use scikit-learn's hyperparameter tuner or Keras tuner (even with Torch etc.) if I need more flexibility.
remember to be careful when tuning hyperparameters, otherwise you might end up overfitting your model's hyperparameters to your test/validation data
wdym? You should never test on your test set before you've fixed 1 set of hyper parameters
If I asked you what are the most important contents of linear algebra, which would you tell me?
do you call it like
train / validation / test
or
train / test / validation
or only
train / test
in my mind, validation is after freezing everything, test is how you would measure if it gets better or worse, not sure what is the standard
train /validation (find best model + hyperparameters) => test once
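A sketch of that workflow on synthetic data. The 60/20/20 ratios and the `Ridge` model are arbitrary choices for illustration; for time series the splits would normally be chronological rather than random.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=300)

# 60 % train, 20 % validation, 20 % test.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Pick the hyperparameter on the validation split only.
alphas = [0.01, 1.0, 100.0]
val_scores = {a: Ridge(alpha=a).fit(X_train, y_train).score(X_val, y_val) for a in alphas}
best_alpha = max(val_scores, key=val_scores.get)

# Touch the test split once, with the chosen hyperparameter fixed.
test_score = Ridge(alpha=best_alpha).fit(X_train, y_train).score(X_test, y_test)
```

The test score is computed a single time at the end, so it is not used to choose anything.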
almost all of it, since it's the bread and butter
(sub)spaces, linear transformations, projections, diagonalization/EVD, SVD, low rank approximation. in fact, the other stuff (calculus and statistics) will always be applied on TOP of linear algebra
I think in most literature / texts validation is what you select hyperparameters on, hence why k-fold crossvalidation etc.
this is really infuriating.. there's not much you could do here, but the way optuna just takes that one lucky hyperparam combo and claims it is the "best".. and it does that all the time, even when you really could chose an optimal combination.. it just throws out this random combo that happened to get lucky, even while you can see the actual algorithm at work moving in another direction.. why don't they fix this? it's obvious this is random luck, not actually a good combo..
You should read about the optimization algo that you're using
whatever is happening, the issue is probably you using the tool incorrectly, not the tool itself being objectively bad
like zestar said, make sure to read the documentation and perhaps even relevant papers if you haven't yet
I did, but that didn't give me much.. the algorithm itself seems to work, somewhat at least. But then it says that some other combo is the "best" just because it scores once
are you maybe under the impression that non convex optimization is easy? this approach is very similar to simulated annealing, which is a good heuristic. but heuristics and local optimality are about as good as it gets
I personally only vaguely know about TPE, hence why I would not touch it over the ones I know and trust like Bayes opt (sequential problems) or random search
it is also possible that the method you are using is just not appropriate for your model
If you can run your trials in parallel I think random search and iteratively making your grids smaller is a good option
Assuming you have many combinations otherwise you could just run grid search ofc
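A sketch of random search with a shrinking range, for one hyperparameter. The objective is a made-up stand-in for a validation loss, and the shrink factor and number of rounds are arbitrary choices.

```python
import numpy as np

def objective(x):
    """Made-up validation loss with its minimum at x = 3."""
    return (x - 3.0) ** 2

rng = np.random.default_rng(0)
low, high = -10.0, 10.0
for _ in range(4):                              # 4 rounds, each with a narrower range
    samples = rng.uniform(low, high, size=30)   # these trials could run in parallel
    losses = objective(samples)
    best = samples[np.argmin(losses)]
    width = (high - low) / 4.0                  # halve the range around the current best
    low, high = best - width, best + width
```

Within a round the trials are independent, which is why this approach parallelises well compared with sequential methods.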
random search, same as even grid search, results in the same problem tho. The best isn't actually the best, and u have to look at the graphs to see that
Why do you care about the exact best?
there is no good way of finding a global optimum for nonconvex problems. if you find one, you'd win a nice prize
that's the thing.. I don't. I care about the average best.. but it gives me the single one best
Why do you care about the average best? What is your exact problem?
but you could at least get something close to good..
no way of knowing what "good" is without knowing what the best is
all you can do is compare to the results you get
I did a bunch of graduate courses on global optimization for non-convex problems. This is one of the god particles.
there's probably a parameter you pass to optuna to choose the cost function with which it picks the hyper params
There's ideas you can do if you want good results on average but I'm curious to know what your exact problem is? Is it really just hyperparameter tuning?
lets just say i have a curve, and I'd like to get a value close to the bottom of that curve. But the curve isn't a perfect line.. I'd still like the averaged bottom, not one dot somewhere to the side that randomly happens to be lower than the average bottom
Why do you want this though? Is this really just hyperparameter tuning or not?
yeah, it's just hyperparameter tuning.. and curiosity
Q on cost function and loss function. When we pass a loss function to say a pytorch neural network, thats a loss function right?
coz its evaluated on the batch size
if its evaluated on the entire dataset, then its cost function?
this isn't really what's happening though. you get a different curve for each set of hyperparams, they parametrize a family of curves. then you pick among the curves with some criterion
"cost function" just refers to what you're minimizing
Hi, I need to find a way to get the size of the center cluster on these maps, do you know a way to compute that? Like in the first one I would like a size of 5 and in the second one of 1 or 2
is this just a semantic thing?
whoops i just mixed the first and the second *
there's no guarantees that this line is smooth and continuous
you could compute some statistics on the background noise looking at the corners of the image, then use that to define a threshold
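A sketch of that idea on a synthetic map. The corner size, the factor of 3 standard deviations and the way the width is measured are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 64
yy, xx = np.mgrid[:size, :size]
# Synthetic map: Gaussian blob in the centre plus background noise.
blob = 5.0 * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 4.0 ** 2))
image = blob + rng.normal(loc=1.0, scale=0.2, size=(size, size))

# Background statistics from the four 8x8 corners (assumed to be signal-free).
c = 8
corners = np.concatenate([image[:c, :c].ravel(), image[:c, -c:].ravel(),
                          image[-c:, :c].ravel(), image[-c:, -c:].ravel()])
threshold = corners.mean() + 3.0 * corners.std()   # the factor 3 is an arbitrary choice

mask = image > threshold
# Rough size: number of above-threshold pixels along the central row.
cluster_width = int(mask[size // 2].sum())
```

The threshold adapts to each map's own noise level instead of relying on a fixed offset above the mean.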
what I was talking about would prove a problem even if we had just one hyperparameter tho.. is that really that difficult to fix?
yes, this is a very difficult problem in general with no good solution
you can pick an "average best" if you like, but there's no special reason why that would be any better
I tried but it's not really efficient...
what did you try?
Tbh the plot you showed doesn't even tell the full story as we can't see what parameters were tried
If I were you I would most likely do a small search around the n lowest points and call it day @sleek harbor
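The "averaged bottom" idea can be sketched with a moving average over the scores. This is a plain smoothing step on made-up data, not something Optuna does itself, and the window length is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 201)                    # hyperparameter values tried
true_curve = (x - 1.0) ** 2                       # made-up loss with its bottom at x = 1
scores = true_curve + rng.normal(scale=0.05, size=x.size)

raw_best = x[np.argmin(scores)]                   # single lowest point, may be a lucky draw

# Average each score with its neighbours, then take the minimum of the smoothed curve.
k = 21
kernel = np.ones(k) / k
smoothed = np.convolve(scores, kernel, mode="valid")
x_valid = x[k // 2 : -(k // 2)]                   # x values at the window centres
smooth_best = x_valid[np.argmin(smoothed)]
```

This only works when the tried values can be ordered along one axis; with several hyperparameters, re-evaluating the few lowest points a number of times serves a similar purpose.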
the cost is in general defined w.r.t. all the data. the batches part comes later. but yeah the distinction is just semantic
But most likely I would just select whatever came up lowest, hyperparameter tuning is imo something that is rarely worth it time vs. reward wise
- Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
- Inner and outer products, matrix multiplication rule and various algorithms, inverse matrix
- Special matrices: square matrix, identity matrix, triangular matrix, idea of sparse and dense matrices, unit vectors, symmetric matrix, Hermitian, skew-Hermitian and unitary matrices
- Matrix factorization/LU decomposition concept, Gauss/Gauss-Jordan elimination, solving the linear equation system Ax=b
- Vector space, basis, span, orthogonality, orthonormality, linear least squares
- Eigenvalues, eigenvectors, diagonalization, singular value decomposition
I tried to define a threshold by taking the mean value, as I have a lot of points, and to define the radius as the first value below the mean + 0.01 (to be a little higher), but in the first image for example the cluster extends a little further even where we are below the threshold
sorry for my english I'm french
this?
wdym by "expands a little"? it's larger than you'd like it to be? (your english is fine)
these are the bare essentials, yeah
oh my, reminds me of uni
I disagreeish on these being the essentials because there's so much abstraction in ML nowadays that you can get away with knowing less
If you want to make novel stuff then yes, it is the bare minimum
are you a data scientist? is it what is most used?
this tbh, i dont rmb most of the maths ive learnt
ah, you mean the shape keeps going but falls under the noise floor. ok. yeah so, as soon as you have noise, it's not always possible to recover the shape perfectly. if you have a model for the shape we're looking for, we might be able to do better. for example, we can fit a 2d gaussian to the image
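A rough sketch of the 2D Gaussian idea, assuming the image is a small grid of non-negative intensities with the background already removed. Instead of a full least-squares fit it estimates the centre and spread from intensity-weighted moments, which is cruder but needs only loops, so it should port to C almost line for line. `gaussian_moments` and the synthetic image are made up for illustration.

```python
import math

def gaussian_moments(img):
    """Estimate centre and standard deviations of a blob from image moments.

    img is a list of rows of non-negative intensities (background removed).
    Returns (cx, cy, sx, sy) in pixel units.
    """
    total = sum(sum(row) for row in img)
    if total <= 0:
        raise ValueError("image has no signal")
    # Intensity-weighted mean position along x (columns) and y (rows).
    cx = sum(v * x for row in img for x, v in enumerate(row)) / total
    cy = sum(v * y for y, row in enumerate(img) for v in row) / total
    # Intensity-weighted variance around that centre.
    var_x = sum(v * (x - cx) ** 2 for row in img for x, v in enumerate(row)) / total
    var_y = sum(v * (y - cy) ** 2 for y, row in enumerate(img) for v in row) / total
    return cx, cy, math.sqrt(var_x), math.sqrt(var_y)

# Synthetic, noise-free blob centred at (12, 8) with sigma_x = 3, sigma_y = 2.
img = [[math.exp(-((x - 12) ** 2) / (2 * 3 ** 2) - ((y - 8) ** 2) / (2 * 2 ** 2))
        for x in range(25)] for y in range(17)]
cx, cy, sx, sy = gaussian_moments(img)
```

With real, noisy data you would probably subtract the noise floor and clip negatives to zero first, otherwise noise far from the blob inflates the variances. A box of roughly `cx ± 2*sx` by `cy ± 2*sy` could then serve as the cluster size.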
i'm doing a phd in signal processing rn. the things you listed are the things you should be able to do with your hands tied behind your back if someone suddenly wakes you up at 3 am
Here I can get below the threshold in the orange square but I would like to get the red square as the size of the cluster
Oh yes, doing a fit should work
In industry less so
Even in my context (applied research) I don't think anyone remembers what SVD is or how to do PCA from their time in uni
but I have no idea on how to do this in C lmao but thank you I'm going to try !!
oh oof, in C. well my suggestion would be to set up the math on paper and then code that :p but there surely exists a library that can help you with it
i mean, you should never do an SVD by hand unless your problem is AT MOST 3D
but you should understand it inside out
I'm not even talking about by hand, I meant the general procedure 🤣
the conceptual understanding is the most important
People I work with know what it does and why you'd need it but not the internals
I already did a linear regression i guess that I can make a gaussian fit
For most stuff in my context that is more than enough. In pure industry you can get away with even less
i think that's the most important, yeah. if you understand that, you can read an algorithm and understand why it'd work
@wooden sail @past meteor I'm kinda dumb, so pls bear with me a bit. Am I wrong in assuming, that in such a graph, where we want the lowest value, that 1 (or a value very close to 1) would be the best obvious choice? Cus that's what I'd want to get as the "best" hyperparam value. However, usually, just because of how the dataset is split, and random factors one can't control, with enough repetitions and tries, some combination of parameters (and even if we are just tuning this one hyperparam) will have a lower target value (y) at a value with a lower than 1 param value (x).. those (or that one) combo will Not be good when you try it on another dataset.. no? I suck at talking, so I'm not even sure I'm getting my point across..
a hyperparameter, eta, its values
and what does that control?

the learning rate basically
of XGBoost
just chose a random hyperparam.. a similar picture could be painted for many hyperparams
well, you have 2 hyperparams, yeah?
The value you get in hyper parameter tuning is the average over all of your folds you tried the parameters on
could be 2, could be 1, could be 100, the question would be the same
right, so then comes my point. why does it matter whether eta is close to 1?
or any other hyperparam for that matter
Hence why you can take the best one. It should be relatively robust and not something that wildly overfits on your data
because being close to 1 has "proven" to "consistently" provide good results?
yeah.. but it could just be overfit to your cv fold combination...
would you rather have 1 or 0.1 in that picture?
there's no reason why eta has to be close to 1 always, and as zestar says, i would expect any hyperparameter tuning tool to already average over all the folds and trials
whatever gives the lower loss. the value of the hyperparam itself doesn't matter
The default eta of xgboost is apparently 0.3 so I wouldn't know why it should be close to 1
it can still be overfit to ALL the folds together, as in a different set of folds would result in drastically different results
??
but that's my whole problem.. they don't average over trials.. only over folds, but that's not enough
I'm not sure you fully understand k-fold and/or hyperparameter tuning?
different values will work differently on different datasets, obviously.. the point is that I'm tuning parameters for this dataset.. if one set of hyperparams were objectively the best for all datasets nobody would tune them in the first place...
9 times out of 10 for something like xgboost I don't tune it 🤷‍♂️
no.. it just means that.. 🤦 I can't explain this. For the same reason cv exists in the first place, repeated k-fold was invented to compensate for the problems of k-fold, which is great, but it doesn't fix the problem entirely, only helps.. You can overfit to a combination of k-folds the same way you can overfit to a random split
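To make the repeated k-fold point concrete, here is a small sketch assuming numpy. `repeated_kfold_scores` and `mse_of_mean` are hypothetical names, and the "model" is just the training mean, standing in for whatever is actually being tuned. It shows how much the score moves when only the split changes.

```python
import numpy as np

def repeated_kfold_scores(n_samples, score_fn, k=5, repeats=10, seed=0):
    """Return an array of shape (repeats, k) of validation scores.

    score_fn(train_idx, val_idx) -> float is a placeholder for
    "fit on train_idx, evaluate on val_idx".
    """
    rng = np.random.default_rng(seed)
    scores = np.empty((repeats, k))
    for r in range(repeats):
        perm = rng.permutation(n_samples)      # a fresh shuffle per repeat
        folds = np.array_split(perm, k)
        for i, val_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            scores[r, i] = score_fn(train_idx, val_idx)
    return scores

# Toy problem: predict y with the training mean, score with validation MSE.
rng = np.random.default_rng(42)
y = rng.normal(size=200)

def mse_of_mean(train_idx, val_idx):
    return float(np.mean((y[val_idx] - y[train_idx].mean()) ** 2))

scores = repeated_kfold_scores(len(y), mse_of_mean, k=5, repeats=10)
per_repeat = scores.mean(axis=1)   # what a single k-fold run would report
overall = scores.mean()            # average over folds and repeats
spread = per_repeat.std()          # how much the split alone moves the score
```

If two hyperparameter settings differ by less than `spread`, the difference is hard to tell apart from split noise, which is roughly the concern raised here.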
certainly, that can happen
but are you aware that this problem is at least as difficult as the original one you were solving?
that's cool. I'm obviously a noob and have no idea what I'm talking about.. maybe I shouldn't tune at all.. I'm just trying to understand here
optimizing the hyperparams is a completely separate optimization problem of its own
not only that, you won't even be able to check you got the "best" or even "good" hyperparams
It can indeed happen that your specific instance of hyperparameters do a strangely good job on one fold which biases the result on average but you have to draw the line somewhere imo
you validate, and if it performs well, you call it a day
you can only check by using arbitrarily large amounts of data
It's also fine to be "lazy" with hyperparameters imo. For boosting type models I would only tune the rounds of boosting I'm doing
Intuition tells me that this is likely the most important hyperparam (unlike for bagged models)
Overfitting is mostly related to fitting too many models and not the complexity of each individual one in sequential set-ups
idk.. I just feel like an eta of 0.13 would objectively be a bad choice, especially when you can see a graph of points that seem to steadily improve the closer to 1 you get.. to me it seems like that value of 0.13 is pretty much an outlier that should be ignored, since the values around it are on average worse than those closer to 1. Which imo means one should choose a value closer to the average "good". The thing that bugs me is that the optimization algorithm, as far as I understand, agrees with me on that, because it keeps "suggesting" values closer to 1. But since those values, though they improve on average, don't manage to "abnormally score" the way 0.13 did, 0.13 remains the "winner". I would choose a winner that, say, scores the best among the best group of 10 consecutive averages..
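The "best group of consecutive averages" idea can be sketched in a few lines. This is only an illustration of the proposal above, not something Optuna does by itself as far as I know. `best_by_window` and the trial values are made up.

```python
def best_by_window(trials, window=10):
    """Pick the parameter at the centre of the window with the lowest mean loss.

    trials is a list of (param_value, loss) pairs in any order.
    """
    ordered = sorted(trials)                     # sort by parameter value
    if len(ordered) < window:
        raise ValueError("need at least `window` trials")
    best_start, best_mean = 0, float("inf")
    for start in range(len(ordered) - window + 1):
        mean = sum(loss for _, loss in ordered[start:start + window]) / window
        if mean < best_mean:
            best_start, best_mean = start, mean
    centre_param = ordered[best_start + window // 2][0]
    return centre_param, best_mean

# Made-up trials: loss improves towards eta = 1, plus one lucky outlier at 0.13.
trials = [(i / 20, 1.0 - 0.5 * (i / 20)) for i in range(1, 21)]
trials.append((0.13, 0.40))
eta, mean_loss = best_by_window(trials, window=5)
raw_best = min(trials, key=lambda t: t[1])[0]
```

Here `raw_best` is the outlier at 0.13, while the windowed choice lands at 0.9, where the neighbouring trials are consistently good. The window size is another knob, so this moves the problem rather than removing it.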
that's great, when you have enough experience to have intuition.. which I do not. Btw, the optuna algorithm strongly disagrees with that statement..
personally, to me, the eta graph looked a lot more informative, with a visible trend.. this looks.. pretty much random (already narrowed down a bit tho, when the range was 30-500 u could see that too low and too high results aren't good)
If you're tuning multiple hyperparameters then the importance of n_estimators might be subsumed
Kind of similar to collinearity
anyone have a guide to how to tune them properly? cus.. I see tons of various methods, and some of them seem fundamentally wrong to me. For example, the popular "tune one at a time" seems to be a strange choice to me, specifically because of collinearity..
Tune one at a time is bad as well because the surface is non-convex and some parameters are just unimportant
I would only tune n_estimators and call it a day. Maybe tuning 5 others would be better than that one but this is such an easy one to tune because it's discrete, you can even grid search it if you want
hey so i've been learning data science for a while: displaying and analyzing data, and mostly machine learning models using scikit-learn and the math behind them. I hear a lot about numpy and ye i've learned about it, but i still don't feel like i use enough of it for it to be talked about so much.
I just want to know how much are you actually using numpy while doing any data science projects
and ye i know that many other libraries are based on numpy as well but I just dk if i'm missing sth about it that i don't use it that often by just calling something straight from numpy
not sure if you know what i mean but whenever some1 mentions data science 2 things that are mentioned are numpy and pandas and I don't know what it means to have knowledge of numpy
scikit and pandas are built on top of numpy. that is to say, numpy is comparatively "low level" and requires you to code the math yourself
it gives you the most control, but you need to know how to do all the math
ye this i know but for example if job offer says knowledge of numpy
what does it mean
it means, if someone gives you some math, e.g. from a recent paper, you can implement it yourself on numpy (because no library will have an implementation of something recent)
is there any way to train sth like that because i dont see myself need to ever do things like this so far
by doing/reading math and implementing it yourself from scratch
for example many people try to set up basic neural networks from scratch using numpy
it helps you review both your math and numpy at the same time
i havent got to neural networks yet so far so cant speak about it
things like linear regression, then
anything you've ever done with pandas can also be done with numpy
oh k maybe ill try it then this seems a bit hard to code from scratch but i may give it a try
or it just seems like that and may turn out not that hard
ive implemented knn, naive bayes and decision tree from scratch
how do you think linear regression compares to it when it comes to coding from scratch
linear regression should be a lot easier
it's a good problem to practice many things though. pseudo inverses, gradient descent, newton methods, etc
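For reference, a minimal numpy sketch of linear regression from scratch, done two of the ways mentioned above: the pseudo-inverse and plain gradient descent. The data is synthetic and the learning rate and iteration count are arbitrary choices that happen to work for it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = X @ true_w + true_b + 0.01 * rng.normal(size=200)

# Append a column of ones so the intercept is learned as one more weight.
A = np.column_stack([X, np.ones(len(X))])

# 1) Closed form via the pseudo-inverse: w = pinv(A) @ y
w_pinv = np.linalg.pinv(A) @ y

# 2) Gradient descent on the mean squared error.
w_gd = np.zeros(A.shape[1])
lr = 0.1
for _ in range(500):
    grad = 2.0 / len(y) * A.T @ (A @ w_gd - y)
    w_gd -= lr * grad
```

Both should end up close to `[2, -1, 0.5]`. Comparing the two is a decent sanity check when you write the gradient by hand.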
ok thanks a lot
from the things you mentioned though, sounds like you're already pretty familiar with numpy
is this the right place to ask for help on converting an HTML file to a CSV file?
since csv is kinda data science related
Does anyone have experience with PyTorch Geometric? I'd really like to know how its DataLoader does its batching process. It feels like it simply considers batch size = 1 for every sample, and then modifies the tensor dimensions so the model can analyze the graph node, its edges and bonds...
(Yes, I've tried reading the docs, but still didn't figure it out)
I'm trying to implement an unsupervised pre-training process on a Graph Neural Network, so the way the API is batching the samples is causing me some trouble...
Nevermind, that was an error in my code. The batching is working fine now. Or at least it seems to be...
It gets annoyingly slow when it's too big, something that I find strange, but ok...
question, im pretty new to python and i wanna learn ai and machine learning. what kinds of things would you suggest me know how to do as a prerequisite, and also do you have any tutorials you would suggest me watch/read when it does come time to learning?
sorry if this question is out of place btw
Honestly the fast.ai course is pretty good. It more puts you in a spot where you can do something with it, then works backwards from there.
thx man i will check that out
No problem, enjoy! Try to really commit to it, follow along with the notebooks, and make your own projects.
hey im trying to use regularisation to improve a linear regression i did. i have an excel spreadsheet with x and y values and i'm not sure how to split the data so that i have a dataset of x and y train and another dataset of x and y test which have to be a numpy array (the extracted data from the excel spreadsheet is in the form of a list within a list (inner list is row values)
if yall can provide any suggestions feel free to ping me :)
.
hey hi everyone , anyone interested in nlp and classification of texts ?
you can split them at random, and try different realizations of the split
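A small sketch of such a random split, assuming the spreadsheet rows are already in a list of `[x, y]` lists as described. The `rows` values here are a stand-in for the real data, and the 25% test fraction is arbitrary.

```python
import numpy as np

# Stand-in for the rows read from the spreadsheet: [x, y] per row.
rows = [[float(i), 3.0 * i + 1.0] for i in range(20)]

data = np.asarray(rows, dtype=float)      # shape (n_rows, 2)
rng = np.random.default_rng(0)            # change the seed for another split
perm = rng.permutation(len(data))
n_test = int(0.25 * len(data))

test_idx, train_idx = perm[:n_test], perm[n_test:]
x_train, y_train = data[train_idx, :1], data[train_idx, 1]
x_test, y_test = data[test_idx, :1], data[test_idx, 1]
```

`x_train` is kept 2D (one column) because most fitting routines expect a feature matrix. scikit-learn's `train_test_split` does the same job in one call if that library is available.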
as for the regularization, what are you trying to do? which property are you trying to enforce?
does one has an idea how i could 3d plot complex numbers in a "unit sphere"?
what are you trying to plot?
i was thinking of a way to plot and do a kind of clustering of FFT frequencies maybe with their magnitude
this just came up to my mind and would be a cool way to show distribution
i don't see where the 3d part comes in though
why the unit circle or sphere though
to get a better understanding visualization of the distribution
the distribution of what
the complex values
i'm not sure i follow what you're trying to do
let's forget for a second that they're complex numbers, because they're isomorphic to R2. so we have a set of points in R2. why would you want to project them onto the unit circle for this? this gets rid of the magnitude information and keeps only the phase
isn't it more I and Q for stuff like this, magnitude and phase?
e.g. frequency for X, magnitude for Z, Phase for Y?
thats why i try to figure out how i can use the magnitude as z
you only have 1 input axis though. if you want to see the magnitude, you'd just get frequency vs magnitude
what's the actual problem? you have some data in spectral domain, and you want to figure out which frequencies have some property?
kinda i got some spectrum and want to compare the resulting frequencies
compare them to what?
each other
ok
that's very different
cuz then you have vectors in C^n, where n is the length of the spectrum. you'd have to do some sort of projection first
why tho lets assume we got a spectrum resulting in 5 freqs, when i plot them all into the unit sphere and lets say another one i can directly compare?
hold up
each spectrum you want to compare has 5 frequency bins? 5 samples, each one a complex number?
yes 5 complex value per spectrum
ok. the sphere here is the 4-sphere, a 4-dimensional object in 5d space
if you want something you can visualize, you have to do a projection onto a lower dimensional space first
and again, projecting onto the sphere gets rid of the magnitude information and leaves only the angle of the vector
mhhh
it keeps info regarding relative magnitudes of the complex values relative to each other in each spectrum
@wooden sail I'm curious how you would solve an issue we had at work recently:
is that enough info? you tell us
i would struggle to build something like this tbh
i'm not sure why you wanted to project onto the sphere yet. there are cases where it makes sense, but visualization is also a completely separate matter
how would u compare complex values of spectrums then?
depends on what i'm looking for
would u at all do something like this
start by better explaining what you're trying to visualise; what you have described is very fuzzy
similarities distributions etc.
We had a 3d point cloud with each point being an EMG sensor. It's a person moving along a line from back to front (but the direction differs from person to person) the task was to find the right heel
this is also completely different
are we doing a statistical comparison or a deterministic one regarding shape?
i'll draw something, give me a sec
So we had measurements every few ms. of the position of each sensor. obviously people are moving (raising, lowering their body parts and thus the sensors)
Is this efficient enough for the cost function?
text not image :/
wdym?
looks like some type of entropy, what exactly is your question?
Is there a built-in np function that takes care of 0 and sets them to something slightly bigger than 0, so as to avoid taking a log of 0?
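One common way to do this is `np.clip` with a small lower bound before taking the log. The sketch below assumes the cost is an entropy-like sum, since the actual function was only posted as an image. `safe_entropy` and `eps` are illustrative names.

```python
import numpy as np

def safe_entropy(p, eps=1e-12):
    """Shannon entropy -sum(p * log(p)) with probabilities clipped away from 0."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(np.clip(p, eps, None))))

h = safe_entropy([0.5, 0.5, 0.0])   # the zero entry contributes 0, not nan
```

If SciPy is available, `scipy.special.xlogy(p, p)` handles the `0 * log(0)` case directly and avoids choosing an `eps`.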
i'm not sure what kind of data an EMG sensor gives, but what comes to mind are those stick figure models. maybe the parameters of one of those could be fit given the sensor info
1) spectrum, 2) FFT, 3) complex values, 4) sphere plot
We have some activation values but it's mostly X, Y, Z we're working with. after that we use the EMG activation of a reference point to make our models
The stick figure models are a good one! I know it from the context of facial recognition but I hadn't thought of applying it here
what's the difference between spectrum and fft here
We have a heuristic in place right now, I'll try and see if I can make what you're suggesting work indeed
condensation of data
what?
1 is normally your raw input data
and here i mean spectrum as in spectral domain, its physical meaning notwithstanding
ok that's where we were mismatching
so yes 1 is input data
anyway. you have some data, you fft it to get the spectral domain, you keep some fourier bins
do you keep the same bins for all the data?
i can choose whether i keep the 5 with highest power spectrum or [:5]
ok. and after doing this, we wanna check how similar the bins are
so not necessarily the 5 highest and therefore could differ
and after getting frequency and magnitude, a 2 dimensional value, now what? e.g. are you say slicing the input into small time periods, and plotting how the FFT changes over time?
in this case the meaning of the fourier axis doesn't really matter
these are basically just vectors in C^n
is the magnitude of the bins important? or only their ratios?
e.g. is the vector [10, 5] the same as the vector [2, 1]? or is the "energy content" important?
i would argue only the ratios
ok, then the magnitude doesn't matter and you can indeed project on the unit sphere
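In code, "project on the unit sphere" just means dividing each vector by its norm. A minimal numpy sketch, with `to_unit_sphere` as a made-up helper name, using the `[10, 5]` vs `[2, 1]` example from above:

```python
import numpy as np

def to_unit_sphere(v):
    """Scale a complex vector to unit Euclidean norm (drops overall magnitude)."""
    v = np.asarray(v, dtype=complex)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalise the zero vector")
    return v / norm

a = to_unit_sphere([10, 5])
b = to_unit_sphere([2, 1])
c = to_unit_sphere([1, 2])
dist_ab = np.linalg.norm(a - b)   # same ratios, so this is ~0
dist_ac = np.linalg.norm(a - c)   # different ratios, clearly non-zero
```

Note that this straight-line distance still treats two spectra that differ only by a common phase factor as different points, which may or may not be what you want.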
that can make the distance... tricky to measure, but we can ignore that for now
yeh i think the idea is pretty cool but i struggle with embedding the code ๐
now we have unit vectors in C^n. and you want to project this to R^3 you say
to get all values inside the sphere
that's fairly difficult. hmm
i don't think there's a very meaningful way of doing that tbh
mhhh
the only way to guarantee you get real values out of a function with complex inputs is to make it a constant function
you can make 2 spheres, one for the real parts and another for the imaginary parts
i'm usually a "why visualize" kind of person tbh
all right, and then this still leaves the problem that we need a matrix that maps from C^5 to C^3 while approximately preserving distances
cause looks nice and makes it easy to understand for topic foreign persons
the thing is that low dimensional representations never tell the full story. projections lose information
in this case, for example, if your C^5 vectors do not have a sparse representation, it'll be very difficult to embed them while preserving distance
the easiest approach is to make a 3 x 5 matrix with random entries, and just use that
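A sketch of that random-matrix projection, assuming numpy and made-up spectra. The complex Gaussian entries and the `1/sqrt(2 * n_out)` scaling are my own choice so that squared norms are preserved on average. Any single pair of distances can still be distorted a lot with only 3 output dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 5, 3

# Complex Gaussian entries, scaled so squared norms are preserved on average.
P = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(2 * n_out)

# Made-up data: 100 spectra with 5 complex bins each.
spectra = rng.normal(size=(100, n_in)) + 1j * rng.normal(size=(100, n_in))
projected = spectra @ P.T          # shape (100, 3), still complex

# Compare one pairwise distance before and after the projection.
d_before = np.linalg.norm(spectra[0] - spectra[1])
d_after = np.linalg.norm(projected[0] - projected[1])
```

The output is still complex, so plotting it needs a further step such as the separate real and imaginary parts.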
wont it be possible to use the PS for Z and norm them?
what's PS?
right, you'd lose info
you can try, why not. compare it to the approach with 2 spheres
always a pleasure to hear (read lel) ur thoughts ❤️
but then i would only represent data in 1/2 the sphere
so maybe not the PS
also note that a 3 x 5 matrix has a spark that is at most 4, i.e. in the BEST case, any 4 columns we take are linearly dependent. since telling k-sparse vectors apart needs spark > 2k, that means you can only really COMPLETELY discern vectors that are 1-sparse
which is pretty strict
2 spheres it is then
but ill see what i can come up with after ur input
maybe i'll ask a colleague as well what he thinks about this
this will be a problem regardless of what you do, i'm just saying you will very likely not get anything useful out of this approach
regardless of using power spectrum or not
mhh
the problem is projecting down to C^3
so better sticking with 2D?
better not project and do it in C^5, then make plots of the distances
the more you project, the worse the problem gets
but go ahead and try. maybe we'll be pleasantly surprised. but if it doesn't give anything interesting, you shouldn't be surprised
pushing boundaries lel
try making one sphere in R^3 using the power spectrum, and two spheres (real and imag) using the complex fourier bins and see if anything looks nice
all good i got it
Is there any experienced python developer who's willing to look through my self-written ai? Nothing impressive tho, it is just a proof of concept for me
@wooden sail i created worms
lol
somewhat clustering
generated sine functions with noise and some freqs
but thats it for now i guess, first discussing this with my colleague next week so i dont waste more time xD
hey guys do i need to pay for the open ai gpt api
because when i create a api key and try to use it its not working
from langchain.llms import OpenAI
llm = OpenAI()
llm("explain large language models in one sentence")
this is my code but the response i get is
RateLimitError: You exceeded your current quota, please check your plan and billing details.
i have never even used my api key before
i just created it
Hello, I was just wondering whether anyone had any experience in neural network image classification? I've written a Python script that classifies images into two categories, however I would like to extend it to 10 categories. Any help would be really appreciated, because I'm a bit lost on how to do this
increase outputs to 10 at your fully connected layer
# Adds in our layers
# Adds a convolutional layer and a max pooling layer
# Has 16 filters (3,3 pixels in size)
# Stride moving one pixel by one
# Extracts the relevant information to make a classification
# Applies a relu activation - taking into account non-linear patterns
# Image shape is going to be 256 wide by 256 high, 3 channels deep
model.add(Conv2D(16, (3,3), 1, activation='relu', input_shape=(256,256,3)))
model.add(MaxPooling2D())
# Adds a convolutional layer and a max pooling layer
# Has 32 filters (3,3 pixels in size)
# Stride moving one pixel by one
model.add(Conv2D(32, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())
# Adds a convolutional layer and a max pooling layer
# Has 16 filters (3,3 pixels in size)
# Stride moving one pixel by one
model.add(Conv2D(16, (3,3), 1, activation='relu'))
model.add(MaxPooling2D())
# Flattens to remove the channels value
model.add(Flatten())
# 256 values will now be the output
model.add(Dense(256, activation='relu'))
# Creates a single output, 0 or 1
model.add(Dense(1, activation='sigmoid'))
# Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.
model.compile('adam', loss=tf.losses.BinaryCrossentropy(), metrics=['accuracy'])
# Displays how the model transforms the data
model.summary()
Sorry where about would I put this, I'm very new to this
Okay one second I'll have a go
So I did that but it still doesn't work. I think the problem at the moment I need to assign 0 to 9 to 10 categories before hand but at the moment I haven't figured it out
how does ur data look
I have a folder called 'data' within the folder I have three sub-folders 'train' , 'test' and 'validation', within those folder is 10 categories that contain different items of clothing
u using data loaders or?
would it be worth sharing my entire code? thank you so much for this, been working on this for about 30 hours :/ im using os to load the data from the directories
Oh don't worry if you're busy, I can keep working on it @cold osprey
donez
I've never seen this done before (summing up results of predictions of the test set made with models trained on train-validation sets across kfolds, and then divided by the total folds). Is this a common practice? Cus so far I've only come across the popular "refit all training data with best cross val results and then predict test data with that model".. never seen something like this before in courses or tutorials, but it does kinda make sense
source: https://aetperf.github.io/2021/02/16/Optuna-+-XGBoost-on-a-tabular-dataset.html
Databases, Dataviz, Machine Learning.
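For anyone curious what that fold-averaging looks like, here is a stripped-down sketch assuming numpy, with a least-squares fit standing in for XGBoost and synthetic data. It is not the linked post's code, only the same idea: train one model per fold and average their test-set predictions instead of refitting once on all the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

def fit_linear(X, y):
    """Least-squares fit; stands in for any model with fit/predict."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

k = 5
folds = np.array_split(rng.permutation(len(X_train)), k)
test_pred = np.zeros(len(X_test))
for i in range(k):
    # Train on every fold except fold i (fold i would be the validation set).
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    w = fit_linear(X_train[train_idx], y_train[train_idx])
    test_pred += X_test @ w / k        # running average over the k models

test_mse = float(np.mean((test_pred - y_test) ** 2))
```

This is effectively a small ensemble of k models, each trained on a slightly different subset, which is probably why it "kinda makes sense".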
you can't use binary cross entropy for multi class, need to change that, i think
yes
can just use CrossEntropyLoss
hmm thats pytorch
not sure what the tf equivalent is
Okay thank you, I'm still looking into how to fix it. AI is really new to me
https://github.com/KatieCook12/Neural-Networks/blob/f775208e1302c14905ff7b2a4e2a643afe028807/Python - here's the code I've already written
model.add(Dense(1, activation='sigmoid'))
if u change this to 10 and the loss to CategoricalCrossentrypy, what happens?
Computes the crossentropy loss between the labels and predictions.
the way uve set up ur code is abit weird too
```
if yhat < 0.5:
    print(f'Predicted class is dress.')
else:
    print(f'Predicted class is hat.')
``` like this bit
are u following a course for this or?
Thank you, just coding it now. I'm following a YouTube video
ah ic
# Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.
model.compile('adam', loss=tf.losses.CategoricalCrossentrypy(), metrics=['accuracy']) - so when I ran this it came up with this error
AttributeError Traceback (most recent call last)
Cell In[52], line 2
1 # Compiles the model using the 'adam' optimiser. Specifying what the loss is. The metric tracked is accuracy, shows how well the model is classifying either 0 or 1.
----> 2 model.compile('adam', loss=tf.losses.CategoricalCrossentrypy(), metrics=['accuracy'])
File ~\lib\site-packages\tensorflow\python\util\lazy_loader.py:59, in LazyLoader.__getattr__(self, item)
57 def __getattr__(self, item):
58 module = self._load()
---> 59 return getattr(module, item)
AttributeError: module 'keras.api._v2.keras.losses' has no attribute 'CategoricalCrossentrypy'
yeah just realised sorry
so now when I run - # Model.fit takes in the training data
# Epochs is how long we're going to train for
# Passes through the validation data, to see how well the model is performing in real time
# Stores in a variable called history
hist = model.fit(train, epochs=20, validation_data=val, callbacks=[tensorboard_callback]) - it comes out as:
yes epochs is how many times we pass through the whole dataset
I'm getting an error when I run it saying: ValueError: Shapes (None, 1) and (None, 10) are incompatible
hist = model.fit(train, epochs=20, validation_data=val, callbacks=[tensorboard_callback]) - its coming from this
oh wait one sec
Unfortunately I'm still getting the error
10 categories, but maybe I didn't set it up right, should I print y?
so this is how I set up the classes:
Builds an image dataset, using keras
test = tf.keras.utils.image_dataset_from_directory('data/test')
train = tf.keras.utils.image_dataset_from_directory('data/train')
val= tf.keras.utils.image_dataset_from_directory('data/validation')
this is the output: Found 249 files belonging to 10 classes.
Found 3054 files belonging to 10 classes.
Found 194 files belonging to 10 classes.
Allow us to convert to a numpy iterator, allows access to the image dataset
data_iterator_test = test.as_numpy_iterator()
data_iterator_train = train.as_numpy_iterator()
data_iterator_val = val.as_numpy_iterator()
Thank you, honestly I appreciate this so much
basically the last layer should output 10 numbers
logits or probabilities
which the highest will be what it classifies the image as
```
{'T-shirt/top': 0,
 'Trouser': 1,
 'Pullover': 2,
 'Dress': 3,
 'Coat': 4,
 'Sandal': 5,
 'Shirt': 6,
 'Sneaker': 7,
 'Bag': 8,
 'Ankle boot': 9}``` then u would have something like this
so say the first '0th' was the highest, then its a tshirt/top
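Spelled out as a sketch, assuming numpy: the last layer gives 10 scores, softmax turns them into probabilities, and the index of the largest one is the predicted class. The `logits` values are invented, and the class list is the Fashion-MNIST style naming used above. With your own folders the names would differ.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Made-up output of the final 10-unit layer for one image.
logits = [0.1, 0.3, 2.5, 0.0, 1.1, -0.4, 0.9, 0.2, -1.0, 0.05]
probs = softmax(logits)
pred_idx = int(np.argmax(probs))
pred_name = class_names[pred_idx]
```

With `image_dataset_from_directory`, the index-to-folder mapping should be available as `train.class_names` (alphabetical by folder name), as long as it is read before calling `.map` on the dataset.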
okay that makes sense, so how do I assign the categories to their number
So I guess at the moment it's only assigning to either 0 or 1 and not the entire range
ya when u set ur last layer to output 1 only, its outputting one number which u then check: is it < 0.5 or >= 0.5 (yhat)
which is a ok way to do it but harder when u want to modify it for multiclass classification
what i wouldve done for binary classification is just output 2 classes with the same idea as 10 classes
i think the error is coming from how the data is set up hmmmm
am comparing to my pytorch code rn
been a while since i used tensorflow
Okay, thank you, I'm googling too, to see what solution there is
could u print one of ur data and see how it looks like?
okay I think I figured out the label problem I included this:
num_classes = 10 # Replace 10 with the actual number of classes in your dataset
test = test.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
train = train.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
val = val.map(lambda x, y: (x / 255, tf.one_hot(y, num_classes)))
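Why this fixes the `Shapes (None, 1) and (None, 10) are incompatible` error, as far as I can tell: the dataset yields integer labels, while `CategoricalCrossentropy` expects one row of 10 values per image to match the 10-unit output. A numpy sketch of the two label formats, with invented probabilities:

```python
import numpy as np

num_classes = 10
y_int = np.array([3, 0, 9])                    # integer labels, shape (3,)
y_onehot = np.eye(num_classes)[y_int]          # shape (3, 10), like tf.one_hot

# Made-up softmax outputs for three images, shape (3, 10).
rng = np.random.default_rng(0)
probs = rng.random((3, num_classes))
probs /= probs.sum(axis=1, keepdims=True)

# Categorical cross-entropy uses the one-hot rows ...
ce_onehot = -np.sum(y_onehot * np.log(probs), axis=1)
# ... while the "sparse" variant indexes with the integer labels directly.
ce_sparse = -np.log(probs[np.arange(len(y_int)), y_int])
```

Both give the same loss, so two alternatives to the manual `tf.one_hot` mapping should also work: passing `label_mode='categorical'` to `image_dataset_from_directory`, or keeping integer labels and using `tf.keras.losses.SparseCategoricalCrossentropy`.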
I'm now running the testing, which is working (yay!) I'll let you know the results
It might take a while cause my laptop's slow, and I've set it to 20 epochs
if loss is going down and accuracy/other metrics are going up, should be fine
Hopefully, sorry one other thing. So I want to see what number is assigned to each image. When I run this:
# Checks which class is assigned to which image
# Checks that they've been scaled correctly
fig, ax = plt.subplots(ncols=1, figsize=(20,20))
for idx, img in enumerate(batch_train[0][:10]):
    ax[idx].imshow(img)
    ax[idx].title.set_text(batch_train[1][idx])
it doesn't display a grid of images, with their number assigned to them
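A likely cause, judging only from the snippet: `ncols=1` creates a single axis, so `ax[idx]` cannot be indexed for 10 images, and after the `tf.one_hot` mapping each label is a vector rather than one number. A hedged sketch of a fix, assuming matplotlib and numpy. `show_batch` is a made-up helper and the random images stand in for `batch_train`.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_batch(images, labels, n=10):
    """Plot the first n images in one row, titled with their integer class."""
    n = min(n, len(images))
    fig, axes = plt.subplots(ncols=n, figsize=(2 * n, 2))
    axes = np.atleast_1d(axes)                 # subplots returns a bare Axes when n == 1
    for idx in range(n):
        axes[idx].imshow(images[idx])
        label = labels[idx]
        # One-hot rows (after tf.one_hot) are turned back into a class index.
        if np.ndim(label) > 0:
            label = int(np.argmax(label))
        axes[idx].set_title(str(label))
        axes[idx].axis("off")
    return fig

# Stand-in batch: 10 random RGB images in [0, 1] with one-hot labels.
rng = np.random.default_rng(0)
images = rng.random((10, 32, 32, 3))
labels = np.eye(10)[rng.integers(0, 10, size=10)]
fig = show_batch(images, labels)
```

With the real data this would be called as `show_batch(batch_train[0], batch_train[1])`, followed by `plt.show()` outside a notebook.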
on my 4th Epoch, it's being incredibly slow
That's awesome, I'll check it out
On epoch 7 now, the tension is getting to me
if ure using tensorboard, i think u can view the loss and accuracy in real time?
Yeah it doesn't look good though, I'm hoping it'll improve
I'm on epoch 9 and it says the loss is 2.1518 and the accuracy is 0 :/
Sorry was looking at the wrong metric the accuracy is 0.2603 but isn't improving
```
model.add(Dense(1, activation='sigmoid'))
``` may need to change this to relu
id suggest looking for a tutorial on multi class classification and working from that instead
also pytorch > tensorflow hahah
high chance the problem is from the data
else the model just isnt good enough
My uni supplied the data so I have to use it, but I guess I'll write about it in the report. I tried relu but it didn't work so I've changed it to softmax
Thank you, I'm just re-running it again, hopefully the outcome will be better
Accuracy is looking better this time
hey so atm im doing jose portilla machine learning course on udemy and i would also like to do the andrew ng course on coursera but i see a lot of the content is behind the paywall do you think the free part of the course is good enough or wont make much sense without the paid lessons as well
i will soon end the jose portilla course i have just a few lessons left
My tip: go for a book after that course
My personal favourite is statlearning.com
ive been reading a bit from this book while taking this course cuz jose recommended it as well
do you have any idea doe if the andrew ng course makes sense if i were not to pay for it
Normally you can always audit courses, which is follow them for free but some content is "hidden"
ye i know i can audit for free but the amount of the things that are locked behind paywall seems like a lot and i feel like these are also important topics that are there
You can also read the scikit-learn user guide. Some things might not make sense but you can google the terms to understand them better
You should read chapter 6, 10 and then 1, 2, 3 and 4
ok thanks
if some1 else has some knowledge about the andrew course i'd appreciate as well
how maths heavy is this?
this was my first ml book https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/
but i already knew some sk learn before this
mainly regression
now doing a pytorch course then have some personal projects planned
Imo it's not super math heavy, there's no proofs or so in the book
ah okay
Some formulas aren't derived fully either so sometimes it feels like they're making a "jump" but that also means it's pretty hands-on
For me it depends. I did an entire course on just support vector machines in uni. Most of it was math, most of it was fun. Doesn't really make you significantly better at using SVMs though 🤷‍♂️
I did a course in machine learning which was very maths and stats focused. It felt like it gave me a good foundation for a lot of the concepts, but when it comes to actual machine learning, there seemed to be a bit of a disconnect between the ideas and the actual methods in practice.
My first ML course actually only made sense to me after I did other courses... It was very theoretical and also covered stuff that is not really relevant like theta subsumption, inductive logic programming, ...
I would say that the most useful concepts mainly came from statistics. I have found Bayesian statistics a very useful way to think about ML and data in general.
Bayesian stuff is cool until you run out of memory and that's the part they don't talk about in stats classes. In ML classes they will, they'll also tell you variational inference exists but they won't tell you that the probabilities you get out of it aren't great.
I would agree with that.
does anyone know of an alternative to feature importance? at me if u respond
A foundation in the mathematical theory and concepts is important if you want to make novel models. If you're just copying and pasting pre-made models, all you really need to get going are some hands to manipulate the data to fit the inputs of the pre-made model
okay so I have a model that I made (you dont have to read it all but I wanted to be as specific as possible):
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
x = data.drop('Rings', axis=1)
y = data['Rings']
x_train, x_test, y_train,y_test=train_test_split(x,y,test_size=0.3)
clf_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf_gini.fit(x_train, y_train)
y_pred_gini = clf_gini.predict(x_test)
print("Accuracy with gini index: {0:0.4f}".format(accuracy_score(y_test,y_pred_gini)))
and then I got 0.2706, which is of course abysmal (note: data is a pandas DataFrame created by read_csv), and I want to be like "because this is real bad, we need to find out what is throwing us off." I know the answer is that we need to drop the Sex column, but I don't know how to come to that conclusion. My friend did this using a random forest instead of a decision tree, so he used feature importance. I read online that feature importance is more for random forests than decision trees, so what should I use?
I have an interview for Data Science role. I realise i am more into ML stuff, and been a while since I did my project and statistics stuff on R.
Can someone give me list of topic that you would revise before interview? Also i realise i forgot about various distributions, quantiles, QQ plot etc. so please try to include related important stuff.
Thanks in advance for your time.
can you be a bit more specific? what type of position is it? at what level?
would you mind if I dmed you?
please.
entry level but i expect some detailed question too as its a startup
you know that a random forest is literally a bunch of decision trees thrown together right?
a single decision tree is an extremely limited model
max depth 3 also sounds a bit shallow, though that depends on your data
including the target variable (y): 9. Do you have any idea how I should find out a good depth?
you mean number of columns or?..
There are many ways. Also, a random forest is technically just a bunch of decision trees that average their results to give an output
since you are using a decision tree classifier, it's best to first see your datas total columns as decision trees branch based columns averages
One reason may be that your data has way more dimensions than your tree can handle, so perhaps it'll pay off to increase the depth of the tree
If the depth of the tree exceeds the total columns of the data, there may be issues with the data itself; try checking that the data is complete, i.e. that there are no missing values
It may also be the case that the data itself is non linear in nature, therefore it'll be hard for a decision tree to model the data
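One way to answer the "what depth?" question empirically is to compare a few values of max_depth with cross-validation. The sketch below assumes scikit-learn is installed and uses synthetic data from make_classification as a stand-in for the real table, so the numbers it prints say nothing about the dataset discussed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: 8 numeric feature columns).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

scores_by_depth = {}
for depth in range(1, 11):
    clf = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=0)
    # Mean accuracy over 5 folds; every fold is held out exactly once.
    scores_by_depth[depth] = cross_val_score(clf, X, y, cv=5).mean()

best_depth = max(scores_by_depth, key=scores_by_depth.get)
for depth, score in scores_by_depth.items():
    print(f"max_depth={depth:2d}  mean CV accuracy={score:.4f}")
print("best depth:", best_depth)
```

Swapping X and y for the real feature table and target would give a depth that is chosen from held-out scores rather than guessed.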
but about the original question, as far as I am aware, using random forests (even if your final model isn't a random forest) is the best way to automatically determine which features are or aren't important
usually you may want to filter by hand as well
yes.
you can also speed up inference, or sending data for inference, via dimensionality reduction techniques such as PCA, which basically finds the directions (combinations of columns) along which your data varies the most; note that it doesn't look at the labels at all
Course this also comes at the cost of you potentially discarding important information that may pop up in the future that would be important to your model
@cerulean kayak that help?
give me a second, I vaguely know what you are talking about: if you can't tell, I'm in a college class and a lot of my problems stem from the fact that I know stuff but I don't know its specific name.
Pca is principal component analysis
is this a form of preprocessing?
you could also consider using SHAP to determine feature importance https://towardsdatascience.com/using-shap-values-to-explain-how-your-machine-learning-model-works-732b3f40e137
Learn to use a tool that shows how each feature affects every prediction of the model
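For the original question, two options that work on a single decision tree are the fitted tree's own feature_importances_ attribute and scikit-learn's permutation_importance. This is a minimal sketch on synthetic data (an assumption, not the dataset from the question); a column whose permutation importance is near zero or negative on held-out data is a candidate for dropping.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 3 informative columns and 3 pure-noise columns.
X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(x_train, y_train)

# Impurity-based importances: available on a single tree, not only on forests.
print("impurity-based:", clf.feature_importances_.round(3))

# Permutation importance: shuffle one column at a time on held-out data
# and measure how much the score drops.
result = permutation_importance(clf, x_test, y_test, n_repeats=10, random_state=0)
print("permutation:   ", result.importances_mean.round(3))
```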
Kind of, it's more of an analysis of what columns you don't need, hence "analysis" in principal component analysis
@cerulean kayak
okay, and PCA is not limited to clustering algorithms? because I know it deals with knn, which is a clustering algorithm
no PCA is a method for transforming your data into a new space where each axis explains how the data varies.
It's typically used for visualizing high-dimensional data (probably where you saw it being used for KNNs), but it can also be used to generate new, more relevant features from your data
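To make "a new space where each axis explains how the data varies" concrete, here is a small PCA sketch in plain NumPy via the SVD. The toy data is made up: the third column is almost a copy of the first, so nearly all of the variance fits on two new axes. Note that no labels appear anywhere in the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumption): 200 rows, 3 columns, column 2 is nearly a copy of column 0.
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                       # PCA works on centred data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)         # share of total variance per new axis
Z = Xc @ Vt.T                                 # the data expressed on the new axes

print("explained variance ratio:", explained_ratio.round(3))
print("first axis as a mix of the original columns:", Vt[0].round(3))
```

Each new axis is a weighted mix of the original columns, which is why PCA does not directly say "drop column k".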
also what the heck do you mean by hand as well? That can't be machine learning.
as in drop them from X
Actually, most of machine learning is sifting through a sample of the dataset by hand to make sense of it at first
like if u know feature_45 is not useful, u may not even query the data from the database say
The predictions are learnt automatically from the model (e.g. the decision tree). The feature engineering (pre-processing of the data) can often be done by hand.
look into your data and make sure that the things you are feeding into the model makes sense before putting your data into a model?
data collection and preprocessing are extremely important steps, despite not being part of the model itself
okay so ya. a line I omited was
sex_map={'M':1,'F':2,'I':3} #
data['Sex']=data['Sex'].map(sex_map)
which is kinda an example of that.
"i"?
no it does i think
a tree based model would do something like 1 goes left, 2 and 3 goes right or 1, 2 goes left and 3 goes right
I for 'i dont know'
infant apparently
still treats it as categorical? thought it would do <=2 go left, <1 go right
which implies some order in the feature
no, it does treat it as numbers - I'm just using the discrete labels because those are all the possible values
ah okay
what type of explanation are you hoping to find for this project? If you determined that this feature hurts your model's performance, that's evidence enough to remove it from training.
but yeah it cannot do 1, 3 left, 2 right in one split I think
yeah idt it can if its treating it as numbers
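The point about splits can be checked by enumerating every grouping that a single numeric threshold can produce on the codes 1, 2, 3. This is plain Python with no library involved; the helper name threshold_partitions is made up for the illustration.

```python
# Integer codes as in the mapping from the chat: M -> 1, F -> 2, I -> 3.
codes = {"M": 1, "F": 2, "I": 3}

def threshold_partitions(values):
    """Return every left/right grouping that a single `value <= t` split can make."""
    values = list(values)
    ordered = sorted(values)
    partitions = []
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2  # midpoint threshold between neighbouring codes
        left = frozenset(v for v in values if v <= t)
        right = frozenset(v for v in values if v > t)
        partitions.append((left, right))
    return partitions

parts = threshold_partitions(codes.values())
for left, right in parts:
    print(sorted(left), "|", sorted(right))

# {1, 3} versus {2} is not reachable in one split because 2 lies between 1 and 3.
wanted = (frozenset({1, 3}), frozenset({2}))
print("can isolate F in one split:", wanted in parts or wanted[::-1] in parts)
```

A tree can still separate F from the rest with two consecutive splits, so the integer mapping costs depth rather than making the grouping impossible.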
so is mapping it like this wise? because im basing this off a lab my ta did for a DT and they did the same thing but with doors on a car: {2 doors:2, 3 doors:3 4+ doors : 3}
read up ordinal vs nominal data
stuff like gender, brand & colour is nominal
generation (boomer, millenial, genZ) is an example of ordinal
hmm thinking about it, generation may not necessarily be ordinal too, depending on the context
cupy basically reimplements numpy methods to use CUDA where possible right? how are dependencies resolved for higher level projects that depend on numpy?
okay and real quick @agile cobalt what do you think I should do for the depth of my tree?
bruh, did you not read the large ass blurb i sent your first?
Just experimented with a transfer learning model
does the constant up and down fluctuations of loss and accuracy mean anything?
academic term for it is called high variance iirc, there are a lot of things that can affect this and it depends highly on what exactly your model is and how you are feeding the data
for background, its a EfficientNet B0 model that im tuning the fully connected layer to classify 3 food classes
proly an overkill model but ye
may be that the layers you didn't freeze have too high of a learning rate set to them
or it may be beneficial to increase the batch size
0.001 seems pretty small hahah
batch size is 32 en. lemme double it
another q, do we need train val test datasets for NNs?
or is 2 sets enough
currently im only splitting my data into train and test
a train/val(dev)/test split is extremely important when fine-tuning a model. Without a separate test set, you risk overfitting to the val/dev set as well while trying to fix variance issues between your train and val/dev sets
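A three-way split can be sketched with nothing more than a shuffled index array. The fractions and the function name three_way_split below are arbitrary choices for the example, not something from this conversation.

```python
import numpy as np

def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the indices 0..n_samples-1 and cut them into train/val/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The val indices are what you look at while tuning; the test indices are touched once at the end.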
cool
thats what i thought
larger batch size = more gpu memory usage, coz more data has to be in memory when updating the model params?
depends on where you are loading the batches to yes
also i would suggest you reduce the epoch count, it doesn't seem like the dataset you are fine tuning it to is large enough to justify 100 iterations on it
yeah haha
50 seems more than enough
oh hmm something went weird
epoch 81 onwards, both loss became nan
and accuracy tanked
something may be wrong with your data
oh
no, try clearing the variable holding the accuracy train_accuracy test_accuracy
probably means something went wrong in one of your scoring or loss functions
note how train_accuracy and test_accuracy go beyond 80 epoches
ya coz loss at epoch 80 onwards is nan
i have no clue what your code structure is but it may have been that you forgot to reinitialize the variables you used to graph the loss and accuracy
there
at the bottom, i have a train function which i call with all the parameters i need
seems like the only way loss can be nan is the len(dataloader) being 0
line 69 and 122
gradients...
unfortunately I use tensorflow, but based on my limited knowledge of pytorch my assumption is that you didn't update the test_dataloader variable for the new epoch
try adding a print statement with print(len(test_dataloader)) at line 122
that's the best i can come up with to verify that you did things right
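Whatever the cause turns out to be, it helps to find the first epoch where the loss stops being a finite number instead of reading it off the plot. A minimal sketch in plain Python; the loss values are made up and first_bad_epoch is a hypothetical helper, not part of the training code discussed here.

```python
import math

def first_bad_epoch(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for epoch, loss in enumerate(losses):
        if not math.isfinite(loss):
            return epoch
    return None

# Made-up loss history for illustration only.
history = [0.92, 0.61, 0.48, float("nan"), float("nan")]
print(first_bad_epoch(history))
```

Calling the same check inside the training loop and stopping at the first hit keeps the batch that triggered it available for inspection.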

haha i cba tbh since i wont be running 100 epochs
will leave it for future me to figure out when i run a model that does require that many epochs
other than that I'm unsure what parts of the model you are training, but I think adding some form of regularization may help, or maybe reshuffle the images in the dataset, in case some batches are easier than others
oh, I thought you were training all layers; when I said the whole model I meant you unfroze all the layers
besides those things, I guess another thing to help with regularization and smooth out the training loss would be to add dropout to the MLP if it isn't already there; I'm unsure about the architecture of EfficientNet
Yup
1). understand and read are 2 very different verbs
2). I am not as smart as you. And you've been saying a lot of stuff, so that could refer to a lot of different posts.
You could have asked me questions about which parts sounded confusing
okay.
it's best to first see your datas total columns as decision trees branch based columns averages
what do you mean by "datas total columns as decision tree branch based columns averages"?
also if you are not supposed to do mapping to values for nominal data, should I use dummies instead?
Poorly worded on my part
Decision trees work at a high level by choosing a column and then deciding how to branch into another column
Let's say that your data is a matrix/tensor of dimensions/shape [m, n]
Therefore in theory the most optimal branch depth would be where depth = n before the tree starts looping over all columns
On the subject of mapping to values for nominal data, traditionally it doesn't matter as decision trees can also split on nominal data
however sklearn's DecisionTreeClassifier cannot handle string categories directly, and therefore you must encode them as numbers first (e.g. an ordinal mapping)
@cerulean kayak
Also, when i mean optimal i mean by accuracy, if you want optimal in terms of accuracy and inference speed then you'll need to figure out how to reduce how many features (columns) are in your data set
Hence you can perform pca to prune features that are deemed irrelevant
no I don't care about speed. Python's motto should be "what speed, lol"
but seriously just accuracy.
also, so will mapping sex to 1s and 2s make it inaccurate?
u can one hot encode them
%$#$@#&%
okay...
Yes if there's more than 2 items you are trying to map but as someone else said, just one hot encode them, which will not decrease accuracy
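One-hot encoding the Sex column could look like the sketch below, using pandas.get_dummies. The tiny frame and the Length values are invented; only the column name Sex and its M/F/I values come from the conversation.

```python
import pandas as pd

# Tiny made-up frame; only the column name 'Sex' and its M/F/I values come from the chat.
data = pd.DataFrame({"Sex": ["M", "F", "I", "F"], "Length": [0.45, 0.35, 0.21, 0.53]})

# One indicator column per category, so no order between M, F and I is implied.
encoded = pd.get_dummies(data, columns=["Sex"])
print(encoded.columns.tolist())
print(encoded)
```

The encoded frame can be passed to the tree in place of the integer-mapped column.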
It all depends on how you manipulate your data, welcome to machine learning where 80% of the time is asking how tf do i make my data work
o trust me, yesterday my model had 0.76 accuracy and today it has 0.23 and i changed the print from a .format to print(f"")
sounds like a bigger problem than the print
well i have 2 other witnesses who say the same thing
what sklearn thing did you want to subclass?
an estimator maybe?
like how its done in pytorch/tensorflow
an estimator? not sure what you're referring to.
any classifier
https://towardsdatascience.com/how-to-build-a-custom-estimator-for-scikit-learn-fddc0cb9e16e found this
Implementing a custom ensemble model with under-sampling for imbalanced data
nothing this complicated, but what I'm saying is: what's the difference between declaring a classifier like
```py
lin_reg = LinearRegression()
```
and
```py
class SpecialLinearRegression(LinearRegression):
    def __init__(self):
        pass

special_lin_reg = SpecialLinearRegression()
```
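A subclass only starts to differ from the plain estimator once it overrides something. As a hedged illustration (the class name and the clipping behaviour are invented for this example), here is a subclass that fits exactly like LinearRegression but clips its predictions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class ClippedLinearRegression(LinearRegression):
    """Hypothetical subclass: same fitting, but predictions are clipped to [0, 1]."""

    def predict(self, X):
        # Reuse the parent's prediction, then change only the output range.
        return np.clip(super().predict(X), 0.0, 1.0)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.5, 1.0, 1.5])

plain = LinearRegression().fit(X, y)
special = ClippedLinearRegression().fit(X, y)
print(plain.predict([[3.0]]), special.predict([[3.0]]))
```

Leaving __init__ untouched keeps the parent's constructor parameters, which is usually what you want when only the behaviour of one method changes.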
what are some good options for managing data you plan on sending to tensorboard?
or just tensorboard tooling and data science log/benchmark data in general
Any good resource recommendations (websites, books, etc) for learning AI with Python? For complete beginners
what do you mean by complete beginner?
import numpy as np
import matplotlib.pyplot as plt


class NeuralNetwork:
    def __init__(self, layers, lr, epoch, X, t):
        self.lr = lr
        self.epoch = epoch
        self.layers = layers
        self.X = X
        self.t = t
        self.weights = {layer_idx: np.random.randn(layers[layer_idx + 1], layers[layer_idx]) / 5
                        for layer_idx in range(len(layers) - 1)}
        self.bias = np.random.randn((len(layers) - 1), 1) / 5
        self.z_dict = {i: np.zeros((layers[i])) for i in range(len(layers))}
        self.z_dict[0] = X[0].flatten()
        delta_3 = (self.z_dict[2][0] - t[0]) * (self.z_dict[2][0] * (1 - self.z_dict[2][0]))
        delta_4 = (self.z_dict[2][1] - t[1]) * (self.z_dict[2][1] * (1 - self.z_dict[2][1]))
        self.delta = np.array([delta_3, delta_4])
        self.plot_data = []

    def forward(self):
        for z in X:
            z = z.reshape(-1, 1)
            for layer_idx in range(1, (len(layers))):
                a = np.matmul(self.weights[(layer_idx - 1)], z) + self.bias[(layer_idx - 1)]
                z = 1 / (1 + np.exp(-a))
                self.z_dict[layer_idx] = z.flatten()
            error = 0.5 * (z.flatten() - t) ** 2
            total_error = np.sum(error)
        return total_error

    def sigmoid(self, z):
        return z * (1 - z)

    def backward(self):
        for l in reversed(range(len(self.weights))):
            diag = np.diag(self.delta)
            arr = np.array([self.z_dict[l], self.z_dict[l]])
            new_derivatives = np.matmul(diag, arr)
            self.weights[l] = self.weights[l] - (self.lr * new_derivatives)
            self.bias[l] = sum(self.delta)
            sigmoid_arr = np.diag(self.sigmoid(self.z_dict[l]))
            self.delta = np.matmul(sigmoid_arr, np.matmul(self.weights[l].T, self.delta))
        return self.weights, self.bias

    def train(self):
        for e in range(self.epoch):
            total_error = self.forward()
            self.plot_data.append([e, total_error])
            self.backward()
            print(f"{e}: {total_error}")
        return self.weights, self.bias,

    def predict(self):
        return self.z_dict[2]

    def plot(self):
        data = np.array(self.plot_data)
        plt.scatter(data[:, 0], data[:, 1])
        print(data[:, 0])
        print(data[:, 1])
        plt.xlabel("Epoch")
        plt.ylabel("Total Error")
        plt.show()


if __name__ == '__main__':
    X = np.array([[0.05, .10]])
    t = np.array([1.00, 3.00])
    lr = 0.5
    n = 2
    H = 2
    output = 2
    epoch = 40000
    layers = [n, H, output]
    nn = NeuralNetwork(layers, lr, epoch, X, t)
    nn.train()
    print(nn.predict())
    nn.plot()
rate my neural network code?
ok well thank god it's not javascript
lol
i have no control-f here. wtf lol. does that sigmoid function work?
have u ever heard of squash function in math
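On the sigmoid question: the method named sigmoid in the posted class returns z * (1 - z), which is the derivative of the logistic function written in terms of its output z, not the function itself. A small NumPy check of that identity against a finite-difference estimate (the function names here are chosen for the example):

```python
import numpy as np

def sigmoid(a):
    """Logistic ("squashing") function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad_from_output(z):
    """Derivative of sigmoid with respect to its input, given z = sigmoid(a)."""
    return z * (1.0 - z)

a = np.array([-2.0, 0.0, 3.0])
z = sigmoid(a)

# Central finite difference as an independent estimate of the derivative.
eps = 1e-6
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad_from_output(z), numeric))
```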
quantecon has a few good books on finance, but also a good intro to python & data science:
I would start here: basics on the python ecosystem for data science programming for econ/finance: https://python-programming.quantecon.org/intro.html
-
quantecon with python: this is probably a bit too much if you're in HS, but it gives you information about where to find data sets for econ/finance. https://python.quantecon.org/intro.html
-
quantecon with julia (mostly the same problems as above, but in Julia) https://julia.quantecon.org/intro.html
-
network economics: https://networks.quantecon.org/
when i looked at finance/econ in the past, it seemed that getting data sets and access to data streams were about as complicated as any programming.
i don't think i've heard it called that.
so z is a probability then. does np.diag actually do diagonalization (like with Jordan Normal Form, I might be mixing up terminology here) or does it just take the diagonal?
not at all
also jordan normal forms are not diagonalization in general
what np diag does is one of two things:
- if you give it a vector, it spits out a diagonal matrix that is 0 everywhere except on its diagonal where it has your vector
- if you give it a matrix, it takes the diagonal of that matrix and spits it out as a vector
so, it's maybe useful when you want the variances and not the covariances
!e
import numpy as np
M = np.random.normal(size=(3,3))
m = np.diag(M)
print(M)
print(m)
M_hat = np.diag(m)
print(M_hat)
@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.
[[-0.36296454 -1.46717512  2.73763072]
 [ 1.42493807  0.14579904  0.99692921]
 [-1.60806289  0.37930145  1.77627134]]
[-0.36296454  0.14579904  1.77627134]
[[-0.36296454  0.          0.        ]
 [ 0.          0.14579904  0.        ]
 [ 0.          0.          1.77627134]]
that could be an example, sure
Do you still need this?
why would errors like this "Unsatisfied version of shared singleton module @late shell-widgets/base" occur?
i'm getting a similar warning when trying to render textures with k3d. i'm trying to evaluate whether I can write data to the texture and update it, but it's not rendering.
i was looking into diagonalizability about 6 months ago. i can't remember why, but i came across jordan normal form. i guess it's what you can do if it's not diagonalizable @wooden sail if i could ever get past the boilerplate of setting up environments/languages, then i could actually apply things and then probably retain them.
ok i guess i could just use PyVista/Trame then
Why can't I use an activation function in the last layer here:
```python3
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(70,), dtype=tf.int8),
    tf.keras.layers.Dense(400, activation='relu'),
    tf.keras.layers.Dense(2000, activation='relu'),
    tf.keras.layers.Dense(1500, activation='relu'),
    tf.keras.layers.Dense(1000, activation='relu'),
    tf.keras.layers.Dense(400, activation='relu'),
    tf.keras.layers.Dense(200, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```
Epoch 1/2000
518/518 [==============================] - 13s 22ms/step - loss: 0.4892 - mae: 0.4892 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 2/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 3/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 4/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 5/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
Epoch 6/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.4891 - mae: 0.4891 - val_loss: 0.4873 - val_mae: 0.4873
but if i use no activation function
Epoch 1/2000
518/518 [==============================] - 13s 22ms/step - loss: 0.7076 - mae: 0.7076 - val_loss: 0.2777 - val_mae: 0.2777
Epoch 2/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.2634 - mae: 0.2634 - val_loss: 0.2562 - val_mae: 0.2562
Epoch 3/2000
518/518 [==============================] - 11s 22ms/step - loss: 0.2405 - mae: 0.2405 - val_loss: 0.2353 - val_mae: 0.2353
Epoch 4/2000
518/518 [==============================] - 13s 24ms/step - loss: 0.2238 - mae: 0.2238 - val_loss: 0.2233 - val_mae: 0.2233
Epoch 5/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.2098 - mae: 0.2098 - val_loss: 0.2134 - val_mae: 0.2134
Epoch 6/2000
518/518 [==============================] - 11s 21ms/step - loss: 0.1987 - mae: 0.1987 - val_loss: 0.2025 - val_mae: 0.2025
using any other activation function in the output layer results in no progress in training. I also tried to just use
def custom(x):
    return tf.clip_by_value(x, clip_value_min=0, clip_value_max=1)
The output has to be between 0 and 1. Can anyone help me here ? I appreciate any help.
What is your x and y?
x_shape(70,) y_shape(1,)
x is a chessboard 8*8 + extra info 6 bytes and y the evaluation score for the board
y is real valued between 0 and 1?
yes
Then you need a linear activation and not sigmoid in your last layer
but
def custom(x):
    return tf.clip_by_value(x, clip_value_min=0, clip_value_max=1)
is linear, right ?
Linear is the default, just remove sigmoid.
yeah I know, but the output values have to be between 0 and 1. If I just don't use an activation function, there will be values higher than 1 and lower than 0
Have you tested this so far?
Yes
And it went outside of 0 and 1?
Yes
You can also just use a linear activation and clip inside of your forward method EDIT: it's called call in tensorflow
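One property worth knowing about both output activations tried above: their gradients can vanish. The sigmoid's slope shrinks towards zero for large pre-activations, and a hard clip has slope exactly zero outside its range. Whether that is what stalled the network in this thread is an assumption; the NumPy sketch below only shows the numbers.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Example pre-activation values feeding the output unit (chosen for illustration).
pre_activations = np.array([0.5, 5.0, 20.0, 50.0])

# d/da sigmoid(a) = s * (1 - s): shrinks towards zero as |a| grows.
s = sigmoid(pre_activations)
sigmoid_grad = s * (1.0 - s)

# d/da clip(a, 0, 1) is 1 inside (0, 1) and exactly 0 outside it.
clip_grad = ((pre_activations > 0.0) & (pre_activations < 1.0)).astype(float)

for a, gs, gc in zip(pre_activations, sigmoid_grad, clip_grad):
    print(f"a={a:5.1f}  sigmoid'={gs:.2e}  clip'={gc:.0f}")
```

With a zero or near-zero slope at the output, almost no gradient reaches the earlier layers, which would look like a loss that never moves.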
What loss are you using?
mae
I also tried mse but mae worked better
But to me it's still weird that I can't use any activation function, even the clip one. And I think I said something wrong: it's not the score of the board but the expectancy that white might win. 1 is white completely winning, 0.75 is white having an advantage, 0.5 is even, 0.25 is a disadvantage for white and 0 is completely losing, but with continuous values, so any value in between is possible
The y values for training are also continuous between 0 and 1
For example for predicting pixels I've done linear => sigmoid before with binary cross entropy loss
MSE could work as well in this case
I am currently trying but it doesnt look that promising
I'd also just make your network a lot smaller
i have to finsih training to see that
if you want learn about XAI with python --https://github.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python
Helping people debug their networks is hard if I'm not sitting next to them 🤣
Yeah, but i appreciate your help. Sitting training the network the whole weekend here.
So some values are still a little bit above like 1.07 when just using mse and linear activation function. I will just clip the values after evaluating as you said.
Green is mse, yellow is mae
So it actually is a little better now
Do you have any further advice to increase the accuracy of the model ?
Making it smaller for starters and trying cross-entropy loss
I don't quite understand how I should use a cross-entropy loss. I thought these were used for classification, with integers representing the class labels, whereas I use continuous values. Should I divide into value ranges, so 0 -> 0.1, 0.1 -> 0.2 and so on?
No you just drop it in. Cross-entropy works for anything between [0,1] (look at the formula). This is what I did when I was training an autoencoder, pixel space is [0,1] so I could use sigmoid => MSE or sigmoid => cross-entropy.
Each loss function corresponds to a different likelihood, so you're optimising for something different than you would with MSE (\eta ~ Gaussian vs. Y ~ Bernoulli). You can reason about what makes more sense in your case, I think there are arguments for both! OR you just try it out and see which one works best empirically
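The "look at the formula" point can be checked numerically: for a target y anywhere in [0, 1], the binary cross-entropy -(y log p + (1 - y) log(1 - p)) is smallest when the prediction p equals y. A plain-Python sketch, with the target 0.75 picked to match the "white has an advantage" example:

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """Cross-entropy between a target y in [0, 1] and a prediction p in (0, 1)."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

y = 0.75  # a "soft" target, not a class label
candidates = [i / 100 for i in range(1, 100)]
best_p = min(candidates, key=lambda p: binary_cross_entropy(y, p))
print(best_p)
```

So no binning into value ranges is needed; the continuous target can be used directly.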
Hey guys, I was trying to install the TensorFlow library but my cmd doesn't work, as it says "'pip' is not recognized as an internal or external command"
what should I do to resolve this issue?
You have to add the directory where Python is installed to the PATH in your environment variables
but it's recommended to use a virtual environment
this is the python file as well as library
sigmoid => cross-entropy seems to be working quite well. Thank you
Tip: make your neurons a power of 2 and really make your network smaller. Add early stopping. Sprinkle in some drop-out if you're overfitting
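Early stopping itself is a small piece of logic: stop once the validation loss has not improved for a set number of epochs. A framework-free sketch, where the function name, the patience value and the loss numbers are all invented for illustration:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never triggers."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss                      # new best validation loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: improvement stops after the third epoch.
val_losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.42, 0.44]
print(early_stopping_epoch(val_losses))
```

Keras ships this behaviour as a callback, so in practice the loop above would not be written by hand.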
pip is in Scripts
so should I put the library file inside scripts?
No
just look up how to set up a virtual environment, for example with conda, then activate the environment and you can get started
I can't do it on a normal python environment?
You can, but this can become an issue later when you have other projects using the same interpreter, because the dependencies might have mismatching versions and so on. If you want a more user-friendly way, you can download PyCharm; it has virtual environment support built into the IDE
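For reference, a virtual environment can also be created without conda or an IDE, using the venv module from the standard library. The helper below is a sketch (create_env and the .venv folder name are arbitrary choices); calling pip as "python -m pip" avoids depending on pip itself being on PATH.

```python
import sys
import venv
from pathlib import Path

def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts executables in Scripts\, other platforms in bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(env_dir) / bin_dir / exe

# Example use (left as comments so nothing is created when this file is imported):
#   env_python = create_env(".venv")
#   then, from a terminal:  <env_python> -m pip install tensorflow
```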
alright I will do that thanks
Oops
Hi guys,
I have images for 60 patients which give cell types and whether the cell is cancerous or not.
And then I have images for 40 other patients which only tell whether the cell is cancerous or not.
How can I make use of the extra 40 patients' data to train cell type classification in a CNN?
What do you want your classifier to output? Should it output, "This cell is cancerous and the type is ..." or "This cell is not cancerous"? Or something else?
You don't need to identify cancerous versus non-cancerous?
Nope
In that case I think the data where you don't have the cell type is useless.
Really
Can't be
In the assignment it specifically says you need to find a way to use it
I guess I can imagine training an autoencoder with it.
Okay, maybe it's not useless.
Gpt says to use that extra data to augment the images
And then dropping the extra labels
It says that will give extra info to the augmentor
Yeah but it helps when I have no idea what I am talking about too
You said you had an assignment. What kind of course is this?
ML
Is the assignment specifically about certain architectures?
Nope. Just needed to classify images
And justify the choice
Gpt said cnn #1 CHoice. I trusted it
What kinds of classifiers are you familiar with?
Please don't. It's a language model. It got some text as input and it generates text as output. It has no understanding. If you want proof, ask it to do arithmetic. Or just, "reverse the digits of 3141592653589793238462643383279".
What kinds of things do you know how to train?
The reversed digits of 3141592653589793238462643383279 are 9723834362468943975859382659413.
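The quoted reply is easy to check, since reversing a string of digits is a single slice in Python:

```python
digits = "3141592653589793238462643383279"
claimed = "9723834362468943975859382659413"  # the reply quoted in the chat

reversed_digits = digits[::-1]
print(reversed_digits)
print("matches the quoted reply:", reversed_digits == claimed)
```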


