#data-science-and-ml
Stelercus 👉 👈 
I appreciate that you think I'm the expert, but I am not.
but orange name
mods are orange
admins are tomato
but tomato
anyway, if the experiment setup gives the model a clear signal, and the model architecture is appropriate, the model will converge on something. it just won't necessarily be the best possible model.
my example proves it won't
what won't what?
huh
my example shows how convergence is impossible (clearly it's not) so my reasoning must be wrong somewhere
you said "it won't", which is short for "x will not y", but idk what x and y are.
my example proves that even with a model having a clear signal and an appropriate architecture that the model will not converge on something
I think one of those two things might not be true. or the hyperparameters are bad (which I guess is a third condition that I didn't mention)
my example doesn't involve hyperparameters
well, I've never done reinforcement learning
ask me about interactive LLMs.
🤧 
Right. I actually had validation code before (which I accidentally left out of the post here)
# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Determine number of GPUs
if device.type == 'cuda':
    num_gpus = torch.cuda.device_count()
    print(f"There are {num_gpus} CUDA devices available.")
else:
    num_gpus = 0
    print("No CUDA devices available.")
When I run this on my local machine, I get:
No CUDA devices available.
Initializing a new model.
Parameters to be optimized: 7041970
When I run it on a new pod (with different hardware) I get:
There are 2 CUDA devices available.
Initializing a new model.
Parameters to be optimized: 7041970
ill keep reading books until i find an answer to this
okay, so make sure it's always device = torch.device('cuda'). If you don't have a GPU available, there's no point trying to continue.
In the case where you have 2 CUDA devices, that appears to be where you end up with the "tensors are on different devices" problem. I've never run an experiment on more than one GPU.
Thanks for the reply.
Yeah, I suppose I can bail out if there is no CUDA.
But as far as the "tensors are on different devices" problem goes, I am stumped. Just to make it clearer, I am now calling .cuda() on everything in train.py that will accept it, but it's still not working. Also, I am now only working with one GPU, in the hope that I can get it running on one before trying multiple.
//...
train_data = torch.load("assets/output/train.pt").cuda()
valid_data = torch.load("assets/output/valid.pt").cuda()
//...
if update:
    try:
        model = torch.load("assets/models/model.pt").cuda()
        print("Loaded existing model to continue training.")
    except FileNotFoundError:
        print("No existing model found. Initializing a new model.")
        model = GPTLanguageModel(vocab_size=len(vocab)).cuda()
else:
    print("Initializing a new model.")
    model = GPTLanguageModel(vocab_size=len(vocab)).cuda()
//...
train_loss = estimate_loss(model, train_data).cuda()
valid_loss = estimate_loss(model, valid_data).cuda()
time = current_time().cuda()
//...
# sample batch of data
x_batch, y_batch = get_batch(train_data).cuda()
# evaluate the loss
logits, loss = model(x_batch, y_batch).cuda()
//...
torch.save(model, "assets/models/model.pt").cuda()
print("Model saved")
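A minimal sketch of one common way to avoid the "tensors are on different devices" error: choose a single device once, then move the model and each batch tensor to it with `.to(device)`, instead of calling `.cuda()` on return values such as tuples, timestamps, or the result of `torch.save`. `TinyModel` and the random batches are hypothetical stand-ins for `GPTLanguageModel` and `get_batch`, which are not shown in the thread.

```python
import torch
import torch.nn as nn

# Pick one device up front; everything below follows it.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


class TinyModel(nn.Module):
    """Hypothetical stand-in for GPTLanguageModel (not shown in the thread)."""

    def __init__(self, vocab_size: int = 16, dim: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x, y):
        logits = self.head(self.embed(x))
        loss = nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1)
        )
        return logits, loss


# Move the model's parameters once.
model = TinyModel().to(device)

# Move each tensor of the batch; tuples and plain Python values have no .cuda().
x_batch = torch.randint(0, 16, (4, 5)).to(device)
y_batch = torch.randint(0, 16, (4, 5)).to(device)

# Outputs are produced on the same device as the inputs and parameters.
logits, loss = model(x_batch, y_batch)
```

If the error persists with a setup like this, a likely place to look is tensors created inside the model or the batching helper (for example with `torch.arange` or `torch.zeros`), which default to the CPU unless given `device=...`. When loading a checkpoint, `torch.load(path, map_location=device)` places the loaded tensors on the chosen device.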
Does anyone know how can i normalize the points so i can use the mask with roboflow?
fixed
Because off-policy algorithms can update a target policy that differs from the behaviour policy. Q-learning, which Deep Q-learning is built on, is off-policy in exactly this sense.
I'd check out the paper I sent you
Hey guys, I've been training an AI COCO model for image detection lately in Google Colab. I was wondering if anyone had a link or two explaining how I can download the model I've trained, so that I can import it into a new file and just plug in an image to be detected, rather than re-running the entire model on Colab every time I restart my PC.
just download the weights/checkpoints, and the configuration file @wild loom
you can run !find . -type f -name *.{ckpt,pth} i think
otherwise maybe this !find . -type f \( -name "*.ckpt" -o -name "*.pth" \)
those are frequent but it may be .keras etc depending on what is the framework and format.
okay thank you for your help I'll try that out
Guys which field or topic of machine learning should i focus more
because i think that i understand most of the supervised learning topics and their codes are pretty similar
same goes for unsupervised
is making a custom loss function good in certain cases?
Yeah, it can be useful if you want to penalize certain behaviour of your model
For example, increasing the loss for underrepresented classes and decreasing the loss for overrepresented classes.
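As a rough illustration of that idea, here is a minimal pure-Python sketch of a binary cross-entropy where errors on the rare positive class count more. The function name `weighted_bce`, the `pos_weight` parameter, and the data are made up for this example.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy where positive examples count `pos_weight` times."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical data: 1 positive out of 5, and the model misses it.
y_true = [1, 0, 0, 0, 0]
y_prob = [0.2, 0.1, 0.1, 0.1, 0.1]

plain = weighted_bce(y_true, y_prob)                     # every example counts equally
weighted = weighted_bce(y_true, y_prob, pos_weight=4.0)  # 4 negatives per positive
```

With the weight applied, the missed positive dominates the loss, so the optimizer is pushed harder to fix it.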
How to do semantic segmentation?
Is it a better option than bounding boxes for object detection
So basically, I am trying to make a project and I have already used the COCO dataset
I want to combine it with the Mapillary vistas dataset which is more specific for street objects
Havent found any resources regarding object detection using bounding boxes but only results for semantic segmentation
if the loss is shown to be nan what does that mean?
nvm researched it
it means something like a division by zero or multiplication/division by infinity happened somewhere
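A small sketch of how NaN shows up in floating-point arithmetic, using only the standard library. The clamping value `1e-7` is an arbitrary choice for illustration.

```python
import math

inf = float("inf")

# Operations with no well-defined result produce NaN under IEEE 754 arithmetic.
print(inf - inf)   # nan
print(0.0 * inf)   # nan
print(inf / inf)   # nan

# NaN is contagious: once it appears, every later result that uses it is NaN too.
loss = 0.5 + (inf - inf)
print(math.isnan(loss))  # True

# A frequent source in practice is log(0) inside a cross-entropy term;
# clamping the probability away from 0 keeps the value finite.
p = 0.0
safe = -math.log(max(p, 1e-7))
print(math.isfinite(safe))  # True
```

In array libraries such as NumPy or PyTorch, `log(0)` evaluates to `-inf` instead of raising, and a later `0 * -inf` then turns the loss into NaN.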
oh ol
Can you guys suggest some datasets which I can train a model on or use a preexisting model on which can be used for Object detection NOT Semantic Segmentation for detecting objects on the street. Already using coco dataset want to combine with another
kaggle? google datasets?
decided to use open images
Any tutorials you guys can suggest or any links to code a custom object detector - Specifically for tensorflow. Like Object Detection API (A good tutorial which you find useful for this would be much appreciated)
Also if I already know tensorflow , should I still learn pytorch?
YoloV8 and others require pytorch
A question about TorchRL:
checkpoint = torch.load('ppo_model.pth')
actor_net.load_state_dict(checkpoint['actor_net_state_dict'])
value_net.load_state_dict(checkpoint['value_net_state_dict'])
ProbabilisticActor_policy_module.load_state_dict(checkpoint['probabilistic_actor_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
Adam_optimizer_used.load_state_dict(checkpoint['optimizer_state_dict'])
GAE_advantage_module.load_state_dict(checkpoint['gae_state_dict'])
maximum_average_reward = checkpoint['maximum_reward_tensor']
For some reason the average reward decreases after I load the model. I saved the states of Actor Network, Value Network, ProbabilisticActor state, Cosine Annealing Learning Rate Scheduler state, Adam optimiser state, Generalised Advantage Estimator State and maximum average reward number.
Do I need to save
- sub-batch ReplayBuffer state
- SyncDataCollector experience collector class state
- ClipPPOLoss class state
- Do I need to save the model before .step() method of my Cosine Annealing LR Scheduler or after?
https://discord.com/channels/267624335836053506/1263119686867091487
Hello guys, hope you're good. I have a tiny problem: I have a zip that contains well-structured Python code files (a CTGAN model) and I want to expose it through an API so I can use it in a desktop app. Where should I deploy the code first?
Book says I’m correct
Stochastic approximation theory proves that if the random exploration chance doesn’t decay to zero convergence is impossible
So most cost functions let us reach a local minimum, and which local minimum you end up in depends on the random weights and biases that were set initially. Isn't that inefficient? You are missing out on much better minima, so maybe something like trying several random initializations would give us a higher chance of reaching a lower local minimum.
So what if your exploration coefficient (entropy) = 0% until you are stuck at the same reward too long, at which point exploration chance will start increasing until a better reward is reached?
This is something that is commonly done yeah.
But if the search space is so big, you can't find every local minimum (and thus finding the global minimum is almost never possible)
But it's great that you come to these conclusion by yourself 
oh that makes more sense
Thx man
Yeah he mentioned that
❤️
I’ve never heard of that but maybe
yeah it's called adaptive learning
finding local min/max is very easy. finding global min/max requires us to be able to either try all numbers or differentiate the function itself.
the most widely used method is to have multiple different starting points so that a large area of numbers is covered
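The multiple-starting-points idea can be sketched in a few lines. The cost function here is a made-up 1-D example with several local minima, and the learning rate and step count are arbitrary; it is an illustration, not a recipe for neural networks.

```python
import math
import random


def cost(x):
    """Hypothetical 1-D cost with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)


def cost_grad(x):
    """Derivative of `cost` with respect to x."""
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from one starting point."""
    for _ in range(steps):
        x -= lr * cost_grad(x)
    return x


rng = random.Random(0)
starts = [rng.uniform(-6.0, 6.0) for _ in range(20)]
finishes = [descend(x0) for x0 in starts]

single = finishes[0]             # what one random init would have given
best = min(finishes, key=cost)   # best result over all restarts
```

Each run only slides into the nearest basin; keeping the best of several runs can only match or improve on any single one, but it still gives no guarantee of finding the global minimum.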
Chaining? Yes
you should absolutely never use inplace
and yes, chaining is pretty common
Never? Why not? I've read two articles on it now and I think it offers performance benefits for methods like drop, and fillna
Specifically this https://sourcery.ai/blog/pandas-inplace/
the gain is negligible compared to the headaches it can cause
in particular, that exact article you linked is saying that you should not use it for drop
I think they made a mistake there, since drop is shown in their chart as being possible to do without making a copy. They then say that if that's the case, its ok to use inplace
drop is not going to copy data one way or the other
drop is the only method that has a green checkmark in their chart, that is later listed as a method to avoid using inplace with
that confusion is all the more reason to just never use it
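For reference, a short sketch of the chained style next to `inplace=True`, assuming pandas is installed. The DataFrame contents are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, None, 3.0], "b": ["x", "y", "z"], "c": [None, 2.0, 3.0]}
)

# Chained style: each call returns a new DataFrame, `df` itself is untouched.
cleaned = (
    df.dropna(subset=["a"])   # drop rows where "a" is missing
      .drop(columns=["b"])    # drop a column
      .fillna({"c": 0.0})     # fill remaining gaps in "c"
)

# inplace=True returns None, so it cannot be chained and is easy to misuse.
result = df.copy().drop(columns=["b"], inplace=True)
print(result)  # None
```

Assigning the result of an `inplace=True` call, as in the last line, is one of the headaches mentioned above: the variable silently ends up as `None`.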
hi! for a project, i'm interested in creating a small application that would be able to read/parse through links and mark down certain info like the article name, the original language the article is in, etc. how would i be able to do this? could someone point me to a youtube video or an api i'd be able to use? i feel like i could possibly incorporate ai to automate this process. thanks!
is there anywhere i can ask R related questions? if anyone is knowledgeable i can ask by dm as well, i have a least squares question
depends on what you want to do. hyper graph neural nets are on the rise
collect from roboflow
They are similar, use whatever you want. I use keras 3 currently, it's great.
if you understand neural networks and python well enough, you should be able to switch between tensorflow and pytorch if the situation requires it.
that is: no.
I have no idea what those are, I assume you need a solid understanding of graph theory to get into those?
honestly, there is no situation where i would require tensorflow over pytorch
I've never used tensorflow at my job.
it seems like the only people who use tensorflow are tutorial authors
rightfully so, that package is VERY overrated
i'm just starting with it, yes, you do need some background
it's a bit more complex than graph neural nets
i hate the fact that with neural nets intuition is just out the window
wdym?
there are neurons which take input and have different coefficients, those coefficients produce a result
an oversimplification
but even then the intuition of the researcher towards the data is never better than the neuron itself since neuron really do not explain anything
yeah that's like saying that cars are like horses
yeah google has been lacking on maintaining the framework, a lot of the data processing functions especially don't work the way they're documented (bc they don't maintain the documentation either) or don't work at all
is it necessary to convert images to Grayscale ( for CNN )
because I am dealing with weather images!
necessary no, i don't think so, but may be wrong
not necessary but a 3D CNN is a little harder to implement, adding RGB channels makes it 3D while grayscale keeps it 2D
a mixed cnn where one model takes the grayscale and the other the rgb values of the pixels maybe?
but then you would need more than 1 layer of input neurons :/
an image is a 3 D tensor, you can feed that to a CNN network
what is this?
I am directly loading dataset using dataset!
and my first input is 3[channels] for CNN
^ and use 3D convolution
well, it's 2D
the D in a convolution are the dimensions of the window, not of the cube
is that with converting the image to grayscale?
2 D means you specify the kernel window size
I mean wouldn't you add another dimension of corresponding kernels for the added RGB dimension?
1D would still be 3D, but you specify 1 dimension
no, that's a bit confusing at first
the depth is always automatically set to the depth of the input data
you only specify the area for 2D, and the height for 1D convolutions.
yeah so it would go 1D -> [x], 2D -> [x,y], 3D -> [x,y,z]
and so should the kernels, if an image has a dimensional depth of 3 then the kernel matrix would be 3D
okay, after the debate, just explain it to me in simple terms!
3D convs you set the depth, they aren't very common id say
oh wait nvm I see where 2D is used for RGB
When we do 2d convolution with RGB images we are, actually, doing 3d convolution. For this we still use the pytorch 2d_conv layers. When we do 3d convolution of a set of RGB images, we are doing 4d convolution and can use the 3d conv layer. My question is: what is the difference, if any, between using the 3d conv layer for a set of grayscale i...
interestingly, what stuff like pytorch will do is apply 2d convolutions separately to each layer of color, then add the results up
yeah I saw that on the forum
however, this turns out to be equivalent to doing a 3d convolution if you ignore all of the outputs that aren't fully overlapping
yes, that's what i was trying to say
I see now
you're free to interpret it as you like for this one
but each has != weights
true
it's not the same kernel
(you can make it have the same weights)
sure, but that's not commonly the case
in any case, the way pytorch does it by default can be written on paper both as several 2d and a single 3d convolution
just kind of a boring 3d conv
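The two readings discussed above can be checked numerically. This is a NumPy sketch with random data, not PyTorch's actual implementation; `corr2d_valid` is a helper written for this example.

```python
import numpy as np


def corr2d_valid(img, ker):
    """Single-channel 2-D cross-correlation, fully overlapping positions only."""
    H, W = img.shape
    kh, kw = ker.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out


rng = np.random.default_rng(0)
image = rng.normal(size=(3, 6, 6))    # (channels, height, width), e.g. RGB
kernel = rng.normal(size=(3, 3, 3))   # one 3x3 slice per input channel

# View 1: a separate 2-D pass per channel, results added up.
per_channel_sum = sum(corr2d_valid(image[c], kernel[c]) for c in range(3))

# View 2: one 3-D window that spans the full depth and slides only in y and x.
full_depth = np.empty((4, 4))
for i in range(4):
    for j in range(4):
        full_depth[i, j] = np.sum(image[:, i:i + 3, j:j + 3] * kernel)
```

Both views give the same 4x4 output. The kernel is 3-D, but it only moves along two axes, which is why the layer is called a 2-D convolution.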
yes, it's one of a single step, if i understand correctly?
like the same transition happens from 1D to 2D convolution imho
every higher dimensional convolution is just a giant 1D convolution
well, you just had to trigger the topic that made me a meme
@unkempt apex so imho you'd just use conv2D to be brief. others may disagree
N-D convolution can be represented as a multi-level block-toeplitz matrix
the number of dimensions and the order of unrolling the multi-way array determines whether you have blocks that are toeplitz, or toeplitz blocks
so in that sense yes, you can unfold multidim convolutions into an operation that looks like a 1d conv
where is the image from? text has so many weird words lol
its from kaggle, this article https://www.kaggle.com/code/shivamb/3d-convolutions-understanding-use-case
interesting, there are several misspellings in that single par
but the image looks right
ptrblck the nvidia guy from pytorch forums, he is so clever lol
To add to this / explain it a bit if you look it up. You can represent a ton of operations as a matrix multiplication by just setting a bunch of entries to 0 (often representing lack of connection / edge / interaction) (which then can be skipped for performance reasons). It's kind of like adding 0 to an expression. So why do this? So you can write it down in linear algebra form to analyze it.
yeah the amount of times his answers carried my torch projects 💀

nice
i'll have to read, guess i have 0s in my brain
i assume that if there is a zero you may skip reading the other matrix's row, or col, but not anything else
You skip the reading / loading / multiply / add.
yeah
For example, if you have, say, two vectors [0, 1, 0, 0] and [1, 2, 3, 4] and you want to element-wise multiply them, you can skip almost all of the work if you know the index of the 1 in the first, one-hot vector.
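That example, written out as a tiny sketch in plain Python:

```python
one_hot = [0, 1, 0, 0]
values = [1, 2, 3, 4]

# Dense version: four multiplications, three of them by zero.
dense = [a * b for a, b in zip(one_hot, values)]

# Sparse version: remember only where the 1 is and read a single entry.
hot_index = one_hot.index(1)
sparse = [0] * len(values)
sparse[hot_index] = values[hot_index]

print(dense, sparse)  # [0, 2, 0, 0] [0, 2, 0, 0]
```

In a real sparse format the index would be stored up front, so even the `one_hot.index(1)` scan would not be needed.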
matrix mult optimisation is quite nice
yea, if 1234 is a matrix, still more savings
Yes, if your matrix is like 80% zeros, you get massive gains.
but that's unlikely if those are actual weights, but i guess it's useful somehow, like you indicated..
Not if your weights are sparse...
if the conv kernel is comparatively small wrt the image, you immediately have humongous sparsity (and you'll notice this is super often the case in CNNs)
so something along the lines of a piecewise?
{ 0 if V1_n || V2_n == 0
{ V1_n * V2_n if V1_n and V2_n != 0
And it gets even better, as it's shared weights.
but are you assuming some of the entries in the kernel are 0s?
As you may have guessed, graph problems usually don't have everything connected to everything, often the opposite, they are sparse.
they don't need to be
the linear transformation acting on the image is the same size as the image, and everything outside the kernel size is automatically 0
I like to show this video to get across the idea, it's well made: https://www.youtube.com/watch?v=0fHkKcy0x_U
Ever run into this funny little puzzle? It appears in Legend of Zelda: Link's Awakening, LEGO Star Wars: The Skywalker Saga, and in a 1995 electronic toy called Lights Out. It turns out that this game has some pretty rich math. In this video, we'll learn about modular arithmetic and the matrix inverse. We'll also learn about substitution ciphers...
yeah i had to write some graph parsing a while ago
actually, that happens in vector encoding of characters
(this also is an example of why you want to have it (the problem) in linear algebra form)
uhm..i don't think i get that but thanks for trying to explain
maybe i can cook something up. in the 1D case, imagine we have a vector of length 15 and we want to convolve it with a convolution kernel [1,2,1]
the matricized transformation would look like this
gonna take some time 🙂
!e
import numpy as np
import scipy.linalg as slin
import matplotlib.pyplot as plt
N = 15
kernel = np.zeros(2*N - 1)
kernel[13] = 1
kernel[14] = 2
kernel[15] = 1
M = slin.toeplitz(kernel[N-1:], np.flipud(kernel[:N]))
plt.imshow(M)
plt.savefig("biggest_oof.png")
ugh
from my terminal
you get a nice sparse, toeplitz matrix representing the convolution
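A NumPy-only sketch that builds the same kind of banded matrix by hand and checks that multiplying by it matches an ordinary 1-D convolution. The test signal `x` is arbitrary, and the manual loop is used here instead of `scipy.linalg.toeplitz` so the example needs only NumPy.

```python
import numpy as np

N = 15
kernel = np.array([1.0, 2.0, 1.0])

# Row i holds the kernel centred on position i; everything else stays 0.
M = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        k = i - j + 1              # which kernel tap links input j to output i
        if 0 <= k < len(kernel):
            M[i, j] = kernel[k]

x = np.arange(N, dtype=float)      # arbitrary test signal

as_matrix = M @ x
as_conv = np.convolve(x, kernel, mode="same")

# Fraction of entries in M that are exactly zero.
sparsity = 1.0 - np.count_nonzero(M) / M.size
```

Every diagonal of `M` is constant, which is what makes it Toeplitz, and most of its entries are zero even for this small example.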
what should be count of epochs?
for training CNN?
I have dataset with approx. 1200 images which are divided into 4 classes
put many (say 200) and use the "early stop callback" (i.e search about it.) @unkempt apex
with patience
200? ohh, I was literally only training for 10 epochs and analyzing the output
earlystopping is good!
great 🙂
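A framework-free sketch of what early stopping with patience does. The list of validation losses is a made-up stand-in for "train one epoch, then evaluate"; in Keras the same idea is available as the `EarlyStopping` callback with its `patience` argument.

```python
def train_with_early_stopping(val_losses, patience=3, max_epochs=200):
    """Return (epochs_run, best_epoch) for a given sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                # No improvement for `patience` epochs in a row: stop.
                break
    return epochs_run, best_epoch


# Made-up curve: improves for a while, then starts overfitting.
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.70, 0.75, 0.80, 0.90]
epochs_run, best_epoch = train_with_early_stopping(losses, patience=3)
print(epochs_run, best_epoch)  # 7 3
```

Setting a generous epoch limit such as 200 is then safe, because training ends on its own once the validation loss stops improving. In practice you would also keep a copy of the weights from `best_epoch`.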
In RL if I have an agent with the pure goal of staying alive and is rewarded after every day it’s still alive. Is there an analytical difference in making the new reward per day constant (1, 1, 1) vs increasing (1, 2, 3) ?
do agent scores get reset each day?
No they get reset on death
End of episode
regardless of the answer to that, there is one difference depending on your implementation. Say you are rewarding agents x score per day survived and 5 score for getting berries, if x increases proportionally, that will make berries less impactful as a source of reward and thus change what the "optimal policies" are for that task during the training.
Book says rewarding for getting berries is wrong
idk what you're actually doing I'm just giving a hypothetical
I am only rewarding for days survived
what kind of RL are you talking about here, q learning?
Deep q
if days alive is the only source of reward and death setting the agent score to 0 for the rest of the sim is the only punishment, I can't think of any analytical reason that increasing the reward over time would change the training compared to keeping it constant
but
maybe there is one idk
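One way to poke at this question is to compute the discounted return for both reward schemes at different survival lengths. This is only an illustration: the discount factor `gamma = 0.9` is an assumption and is not stated in the thread.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


gamma = 0.9  # assumed discount factor; not stated in the thread

for days in (1, 5, 10, 20):
    constant = discounted_return([1] * days, gamma)
    increasing = discounted_return(list(range(1, days + 1)), gamma)
    print(days, round(constant, 2), round(increasing, 2))
```

Because every reward is positive, both returns grow with each extra day survived, so both schemes prefer longer episodes. What differs is the scale of the values the Q-network has to fit, which may affect how training behaves in practice even if the preferred behaviour is the same.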
Ty
Hello everyone, can anyone tell me where to find or buy Spanish proxies?
eBay

sorry, i had some delay to read it, but that's a nice expl from the post:
In a 3-dimensional convolution, you would use a 4-dimensional filter, which still uses all input channel, but moves in all 3 volumetric dimensions.
The method is very similar to a 2-dimensional convolution with an additional depth dimension the filter moves along.
thanks !
ah I see what its getting at with 3D convolution being used mostly for something like video convolution
Could be useful for interesting problems. Video seems likely indeed @fervent shore
maybe chemical reactions, for example
ahh, the blogpost talks about drug discovery, interesting! i'm just tweaking / rewriting a net for a similar purpose.
I know I shouldn't ask to ask, but would anyone mind to do some sorta code review for a non-linear regression?
i don't have enough knowledge, also, it may be better in the #algos-and-data-structs ? not a problem for me though
regression is ai, right?

I mean...barely but yeah...
it is normally included in most books, i guess i consider ai=dl which is unfair
It's fine either way, it seemed to be producing reasonable result
So the code is right?
by non-linear you mean polynomial?
Yes
i may be able to read it then
Oh, would you mind?
i wouldn't. if you put it on github, i'll check it out in a codespace
would be good if it's got unittests
or whatever is called in python
that's a neat way to test it, adding some tests
https://github.com/sunnyayyl/machine-learning
Do note, the code is quite bad
don't worry im a very mediocre coder
This is the first time I'm implementing this kind of algorithm
probably won't say much either
why did you call it non-linear though? I think in terms of linear algebra is still linear, but may be wrong
Just me being stupid
yeah, in terms of linear algebra it isn't non-linear, bc it's linear in terms of the coefficients
Huh, so when is it not linear?
you don't use x1,x2,...xn as variables but as constants.
deep linearning is that
lol, deep learning
i'll go for a walk though, i'll eventually sit aand read it
np, i've got nothing to do
looks perfect
what i found to be a problem is when you have large values and they are closely spaced, i think
there is some conditions where it failed (other code, not yours.)
Oh I can't find dataset, I'm just randomly generating value
i'll link you one
Oh, thanks!
check their datasets https://github.com/mljs/regression-polynomial
It's kinda sad, they're used by 1.8k, but only have 15 stars
why sad?
Kinda funny, I suppose? Usually, I would kinda expect used project to have a considerable amount of star
As appreciation, I suppose
Where are the axis title tho

Dunno what to name the axis tho, it's random data
@partial(jit, static_argnames="data_size")
def generate_data(key: Array, data_size: int) -> tuple[float32, float32]:
    _, subkey = random.split(key)
    x = jnp.sort(
        random.uniform(key=subkey, shape=(data_size,), minval=-500, maxval=500)
    )
    # x = jnp.arange(-200.0, 200.0, step=30.0)
    y = 2 * jnp.pow(x, 2) + 6 * jnp.pow(x, 3) + 1
    return x, y
x and y, I suppose
Yeah, just don't leave it blank (good habit)
Fair enough
units can be au arbitrary units
Imagine if I got x and y flipped, it'll be so embarrassing
Oh and the 1e8, is that *10^8?
out of curiosity, what happens if you feed a parabola lying on its side
like y = +- sqrt(x)
Let me see
Looks like it
Wait
It's because I got nan
x has to be positive..!
Ohhhhh
Woops
I was imagining thing
/j
Your guess is correct
It's a straight line
@partial(jit, static_argnames="data_size")
def generate_data(key: Array, data_size: int) -> tuple[float32, float32]:
    _, subkey = random.split(key)
    x = jnp.sort(random.uniform(key=subkey, shape=(data_size,), minval=0, maxval=500))
    # x = jnp.arange(-200.0, 200.0, step=30.0)
    # y = 2 * jnp.pow(x, 2) + 6 * jnp.pow(x, 3) + 1
    y1 = +jnp.sqrt(x)
    y2 = -jnp.sqrt(x)
    return x, jnp.add(y1, y2)
but the points should be a parabola
Just realised it
Weird
Why is y all 0
but you get 2 points right (x,y1), (x,y2)
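A NumPy sketch of why `y` came out as all zeros, and of the "two points per x" alternative. It uses NumPy instead of JAX to stay dependency-light, and `np.polyfit` stands in for the gradient-descent fit in the thread.

```python
import numpy as np

x = np.linspace(0.0, 500.0, 50)

# Adding the two branches gives sqrt(x) + (-sqrt(x)) = 0 everywhere.
summed = np.sqrt(x) + (-np.sqrt(x))

# To keep both branches, stack them as separate points instead of adding them.
xs = np.concatenate([x, x])
ys = np.concatenate([np.sqrt(x), -np.sqrt(x)])

# A least-squares polynomial in x cannot follow both branches at once;
# by symmetry the best fit is (numerically) the flat line y = 0.
coeffs = np.polyfit(xs, ys, deg=2)
```

So even with both branches present, a model of the form y = f(x) ends up with coefficients close to zero, since a sideways parabola is not a function of x.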
Oh I can just *-1
yes
abs
nice
Not a perfect straight line it seemed
w: [-2.7694678e-06 5.7525358e-06 -2.8457648e-06] b: 1.0132458783118636e-07
now you can set y2=0
no, just removing it, should fit
What do you mean?
ohh that's gradient descent?
Yup
you can solve it with a single linear algebra formula, it's got exact solution i think
@jit
def grad_decend(
    w: Array,
    b: float32,
    learning_rate: float32,
    x_train: Array,
    y_train: Array,
):
    w_grad = jacfwd(lambda w: cost(w, b, x_train, y_train))(w)
    b_grad = grad(cost, argnums=1)(w, b, x_train, y_train)
    temp_w = w - learning_rate * w_grad
    temp_b = b - learning_rate * b_grad
    return temp_w, temp_b, w_grad, b_grad
Thanks
grad_descent is i think
It does
My good ol' reliable spelling (mistake)
Apparenly even pycharm is shouting at me for the spelling mistake
makes sense
Linear and polynomial regression with least squared error has a closed-form solution
Can I, a newbie to ml, implement that tho
Gradient descent seems to be quite simple
(and good enough)
With numpy it should be fine
you have to calc a couple of matrix transpositions and are done
in numpy this means X.T I think
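A sketch of the closed-form route with NumPy, where the transpose is written `X.T`. The data is synthetic, the polynomial matches the one generated earlier in the thread, and the x range is kept small on purpose, since large x values make the normal equation badly conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=200)
y = 1.0 + 2.0 * x**2 + 6.0 * x**3      # same polynomial shape as in the thread

# Design matrix: one column per power of x (1, x, x^2, x^3).
X = np.vander(x, N=4, increasing=True)

# Normal equation: solve (X^T X) w = X^T y rather than forming an explicit inverse.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same problem and is usually the more robust choice.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both results should come out close to `[1, 0, 2, 6]`, the coefficients used to generate the data, without any learning rate or iteration count to tune.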
What's it called?
try linear regression wikipedia
and links to other methods
or polynomial regression directly...
tbh, these days you may be better off doing gradient descent, but idk
Y
Looks intimidating, but I'll have a look
it's basic linear algebra unless you dig into it
not saying it's easy, but it's that
Wikipedia might have full maths details, maybe you can find simpler things
||Funny thing, I only have basic math knowledge, I am only in high school||
I do surprisingly know more than what school teach
Oh
So are you getting the derivative and set it to 0 and solve?
yes, that's one approach
How would you implement it w/o manually doing the solving?
probably other users know better about the exact details, i don't remember/know that much
I've literally done this a few years ago like 4 years or smthing I have no idea if I can find it somewhere
but you don't need to code the derivatives
That's a big if, really, I haven't even been taught how to do derivatives yet (I know some differentiation), but there's nothing stopping me from self teaching ngl
I mean my school is still teaching you how to use if statement
you may wait until learning basics of matrices, imho
I know how to do them
Even though it's not taught
I know multiplication at least
that's good..
Doesn't hurt learning something new
this is a good idea @ocean pawn
If I can learn python when I'm 11 or something, why can't I learn ml now (cope)
Any good resources?
Wikipedia is really intimidating
Looks like calculus is where I start off with
It's fine, I can understand most math notation
I hope, at least
You can fool around with libraries, implementing is another story
I kinda wanna understand it
The reason I can never understand keras/tf
is because I have no idea why I am doing certain things
By knowing how it works
I actually understand what and why
I guess I'm just really stubborn and wanna understand it
if you type"least squares matrix formula"
and go to images/videos
you will realise what of those may fit your level. try it
Thanks!
Oh and to get the derivative for gradient descent, calculus is used, right?
gradient descent isn't used here
No I meant
In case where gradient descent is used
How do they get derivative?
Is it calculus?
yes, the derivative is calculus
Currently, I only understand what the derivative mean, but I don't know how to do it myself
So I'll see
Thanks, everyone
(I do wanna do it myself, cause, why not, I am thankfull for Jax's autograd tho)
You only need derivatives and partial derivatives (not that much more complicated)
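To make "partial derivative" concrete, here is a small pure-Python sketch for a straight-line fit: the derivatives are worked out by hand with the chain rule and then checked against a numerical estimate. The data and the starting values of `w` and `b` are made up; JAX's `grad` automates the by-hand step.

```python
def cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_grad(w, b, xs, ys):
    """Partial derivatives of `cost`, worked out by hand with the chain rule."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def numeric_grad(w, b, xs, ys, h=1e-6):
    """Central finite differences: nudge one parameter at a time."""
    dw = (cost(w + h, b, xs, ys) - cost(w - h, b, xs, ys)) / (2 * h)
    db = (cost(w, b + h, xs, ys) - cost(w, b - h, xs, ys)) / (2 * h)
    return dw, db


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # made-up data on the line y = 2x + 1

analytic = cost_grad(0.5, 0.0, xs, ys)
numeric = numeric_grad(0.5, 0.0, xs, ys)
```

The two results agree to several decimal places, and at the true parameters `w = 2, b = 1` both partial derivatives are zero, which is the "set the derivative to 0" condition mentioned earlier.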
ur welcome 🙂
Worst case scenario: I'll understand it in two years (I'll eventually learn them in school)
certainly, i think we just didn't want to make you waste time and feel frustrated. it's important to have a good learning path.
It should be fine, I know differentiation, which should be enough to start learning about derivative
Differentiation is calculating the derivative.
Oh, I know basic differentiation, but not all of the rules
Yeah, my sentence is stupid
Nice debate at mlst https://www.youtube.com/watch?v=8LxTWIaInok
(Hotz vs Leahy)
did u guys watch it?
Any of you web scrape frequently?
i do sometimes, can't help w anything very intricate though
which text preprocessing for nlp ml tasks are y'all using, given that tf.keras.preprocessing.text.Tokenizer is deprecated
wait wut. keras 3!
yup
why is https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer deprecated 😦
it is still keras no?
because keras is not part of tf
hmm i guess. but not sure why tf.keras
sounds good.
yeah, read the getting started
thoughts on spacy & nltk?
they seem to do similar things, for like tokenization at least
no idea what that is sorry, maybe others know
no worries
just curious. how did u figure out this information
actually, from here https://keras.io/getting_started/
at the bottom of the page
Starting with TensorFlow 2.16, doing pip install tensorflow will install Keras 3. When you have TensorFlow >= 2.16 and Keras 3, then by default from tensorflow import keras (tf.keras) will be Keras 3.
thanks!
welcome !
i was so excited about keras 3, cuz it's multibackend, but you will know everythin from that page
i forgot to google lol
for real
altho i think it is kinda awkward
like i am sure tensorflow can do the same thing as pytorch and vice versa. who would bother learning both of them and using 2 for one task lol
i could be wrong though
i read from somewhere that some people think keras is shit
cuz it is like "tensorflow wrapper"
i think it is more than that but i could be wrong
to me it's a symbolic computation layer
independent of any backend, and extremely powerful
s got almost as many gh stars as pytorch (i know, that's not always the best metric.)
Iirc it’s the other way around
Tensorflow is a Keras wrapper
the low level calculations are carried out by tf (or other backends now..)
matmul and such
I see a wonderful course in udemy for all AI/ML enthusiast in a discounted price of only 9.9 USD, lesser than the udemy original deal's program price which is 13 USD. Discount coupon is only valid till 2 days with unlimited redemptions within 2 days. Sharing here as it might help anyone from this community.
coupon code: CAIDPROGRAMDISCOUNT
Helllllllo
Is there anyone that can help me? I am learning Python currently and want to learn AI/ML for fun
https://youtube.com/playlist?list=PLnfmfrpiDBPgmrfT_04epUpubx33OU8GV&si=31V6xy_vyJ1QUuSL
Here is the playlist that I created on my own. Is it a good playlist or not? I want to learn it but have no guidance. Can anyone guide me? I am a complete beginner, I started learning Python just 2 weeks ago
Anyone can DM me if they want to guide me
it seems like both my validation loss (binary cross entropy) and accuracy curves are going up together, does anyone know how to interpret this?
I would imagine the model is overfitting since my train loss is decreasing (and accuracy is 100%)
but these curves still don't make much sense to me
Are the labels correct?
Oh i guess binary classification
Hard to say without reading the code for me
Yepp the labels are correct. It's just basically whether the stock market moves up/down in the next 5 days (so I guess that should be pretty random)
Yeah that's my guess too, pretty odd behaviour nonetheless
If the data is random it memorises the input, then acc and loss for training improve, the other 2 shouldn't, although you said validation accuracy goes up?
Yep, training accuracy goes to 100% and loss goes to near-zero. But that doesn't explain why validation accuracy keeps going up too 😂
Sorry i meant validation acc
A better question rather is whether I should care about the validation loss going up or if I should just care about the accuracy since that's what I'm after anyway
Uhmm i think you should care but am not an expert
def _step(self, batch, _, prefix: str):
    x, y = batch
    y_hat: torch.Tensor = self.forward(x).squeeze()
    loss = self.criterion(y_hat, y)
    acc = 100 * (y_hat.round() == y).float().mean()
    self.log(f"{prefix}_loss", loss)
    self.log(f"{prefix}_acc", acc)
    return loss
same code for both validation & training steps
Is the validation dataset too small?
total dataset is about 2500 rows, did a 60-40 split
similar results with 80-20 split too
Sounds smallish for the task isnt it?
It is, but unfortunately all I've got to work with
which explains why the training accuracy so easily goes up to 100%
Other answers explain well how accuracy and loss are not necessarily exactly (inversely) correlated, as loss measures a difference between raw output (float) and a class (0 or 1 in the case of binary classification), while accuracy measures the difference between thresholded output (0 or 1) and class. So if raw outputs change, loss changes but accuracy is more "resilient" as outputs need to go over/under a threshold to actually change accuracy.
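A small illustration of the quoted point, with made-up numbers (the probabilities below are assumptions, not the poster's data): as predictions get more confident, more of them can land on the right side of 0.5 while the few wrong ones become very expensive, so accuracy and cross-entropy loss rise together.

```python
import math

def bce(probs, labels):
    """Mean binary cross-entropy for predicted probabilities."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps)))
    return total / len(labels)

def accuracy(probs, labels):
    """Fraction of predictions on the correct side of the 0.5 threshold."""
    return sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)

labels = [1, 1, 1, 0, 0, 0]
# Early in training: hesitant predictions near 0.5, half of them wrong.
early = [0.55, 0.45, 0.45, 0.45, 0.45, 0.55]
# Later: very confident predictions, four right and two confidently wrong.
late = [0.99, 0.99, 0.01, 0.01, 0.01, 0.99]

for name, probs in (("early", early), ("late", late)):
    print(f"{name}: acc={accuracy(probs, labels):.2f} loss={bce(probs, labels):.2f}")
```

Accuracy only sees which side of the threshold a prediction falls on; the loss also sees how far off the confidently wrong ones are.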
Id try to get 10x that n at least, I'd try reducing the network params as well
This makes sense too, maybe the model is just learning to predict values close to 0.5
Oh yeah, I'll try a smaller network
Yes, so that you can get an idea about the behaviour
Yeah no, similar results with smaller models too
Yeah, that's a good post
Maybe it's a better strategy predicting amount of movement in the stock than classifying up/down, that might get rid of the "bad predictions getting worse" problem
Possibly, or using rnns, unless it is an rnn
a simple MLP for now
Cause those should capture more info
So normally you want smth that takes context info i think
I doubt there's enough data to train one well, but worth a shot
Actually rnn would be if you need to predict a continuation of the sequence of the real values i think, so you might be fine. But i wonder whether some rnn like structure wouldn't do better.
Maybe someone else can help further
hello everyone! i hope everyone is doing well.
i am learning Bellman equations and I have some questions on different concepts on RL. I would greatly appreciate it if anyone could answer some of my questions :D
https://docs.google.com/document/d/1JlRIBYSIJKypfkLOEA0P7AOLKCUhmjquXbunD40WqhE/edit
Hey Ive been following a book to build my own LLM and its all coded in Python, ive made a blog post about it and it would be great to get some input from this community about it. https://theclouddude.co.uk/building-my-own-llm-a-journey-into-language-models-building-a-tokenizer
As Im quite new to Python....
That's really not a good thing
Need regularization, early_stopping for example etc
early_stopper = EarlyStopper(patience = 5, min_delta = 0.01)
is this good?
but after 10 epochs the training stops
https://stackoverflow.com/questions/50284898/keras-earlystopping-which-min-delta-and-patience-to-use
I'd use something smaller, 0.0001 i think, but the post explains it better
0.0001 ?
Yeah try
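To see why `min_delta` matters here, a rough sketch of what such a stopper might do. The `EarlyStopper` class in the chat is the poster's own and its code isn't shown, so this version (including the `should_stop` method name) is an assumption:

```python
class EarlyStopper:
    """Hypothetical early stopper: stop after `patience` epochs without
    the validation loss improving by more than `min_delta`."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement large enough to count: reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def epochs_before_stop(val_losses, stopper):
    """Number of epochs that run before the stopper fires."""
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)

# A validation loss that keeps improving, but only by 0.001 per epoch.
slow_losses = [1.0 - 0.001 * i for i in range(20)]

strict = epochs_before_stop(slow_losses, EarlyStopper(patience=5, min_delta=0.01))
lenient = epochs_before_stop(slow_losses, EarlyStopper(patience=5, min_delta=0.0001))
print(strict, lenient)
```

With `min_delta = 0.01` the slow but steady improvements never count, so training stops as soon as patience runs out; with `0.0001` the same curve is allowed to keep going.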
Which is better for ML/AI beginners- tensorflow or pytorch? I am open to learning both though
the question is basically what do i learn first
pytorch
aight ty
tf is absolutely good tho
yeah but it is predicting cloudy image, as rainy!!😂
Skill issue 
yeah!💀
Not even joking, if the problem is tf, the problem probably isn't tf 
Pytorch is good too tho
tf??
what should be dropout rate for CNN?
0.5 is current!
because the model is overfitting too quickly!! should I increase that?
ignore that!!
current accuracy is actually good enough, it's 93
tensorflow=tf
I don't use that!
only pytorch!
The misunderstanding was big here lol
I was responding to you saying pytorch as an absolute choice just saying tf=tensorflow is also good
tf is tensorflow
how can I workaround input?
maybe hardcode it?
i mean when deploying i dont have access to type text
but still i want to show to someone result
type name of product
text = input("type name of product (e.g beer): ")
test = pred(text)
print("predicted label index e.g 0 - chips: ", test)
predicted = ""
if test == y[test]:
    predicted = yTxt[test]
print("do you want to add", predicted + "?")
accuracy = np.sum(y == preds) / len(y)
print("Model Accuracy = {}".format(accuracy))
it stops on input
no problem when running locally but then i type text to input
here I dont have possibility
here on render deployed
hmm maybe replace input with HTML input
i mean change to graphical from console
so then I must use e.g streamlit
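Short of moving to a web UI, one possible workaround is to only call `input()` when there is actually a terminal, and otherwise take the text from a command-line argument or an environment variable. This is a sketch; the `PRODUCT_NAME` variable and the `get_product_name` helper are made up for illustration:

```python
import os
import sys

def get_product_name(argv=None, env=None, default="beer"):
    """Return the product name without requiring an interactive terminal."""
    argv = sys.argv[1:] if argv is None else argv
    env = os.environ if env is None else env
    if argv:
        # 1) first command-line argument, e.g. `python app.py chips`
        return argv[0]
    if "PRODUCT_NAME" in env:
        # 2) hypothetical environment variable set in the hosting dashboard
        return env["PRODUCT_NAME"]
    if sys.stdin is not None and sys.stdin.isatty():
        # 3) a real terminal is attached, so prompting is safe
        return input("type name of product (e.g beer): ")
    # 4) nothing available: fall back to a hardcoded demo value
    return default

print(get_product_name(argv=["chips"], env={}))
```

Locally this still prompts as before; on a host with no console it falls through to the argument, the variable, or the default instead of hanging on `input()`.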
Guys what do you think of this Value Network Architecture for Inverse Double Pendulum v.4?
value_net = nn.Sequential(
    nn.LazyLinear(num_cells, device=device),  # num_cells = 256
    nn.Tanh(),
    nn.ReLU(),
    nn.LazyLinear(num_cells, device=device),  # num_cells = 256
    nn.Tanh(),
    nn.LazyLinear(num_cells, device=device),  # num_cells = 256
    nn.Tanh(),
    nn.LazyLinear(1, device=device),
)
Is there anything that can be improved?
Hey, thanks for your help yesterday
I think I managed to understand derivative
Seemed like power rule and chain rule is enough
For the time being at least
(I even managed to understand how to do derivative for sigmoid function)
I assume I'll need it for binary classification? (Right?)
guys, if anyone has studied deep learning, does anyone know why mini batch gradient descent is said to be more efficient than batch gradient descent? I'm watching Andrew Ng's deep learning course 2. Without any parallelization, I would think mini batch is inherently the same as, if not slower than, batch computing, because you are purposely breaking up the already vectorized operations in favor of looping serially over chunks of the training samples
it sounds like you misunderstand what aspects of model training can be parallelized
I'll elaborate when I get a chance
in the meantime, can you tell me what you think the difference is between batch and mini-batch?
batch is you consider the entire parameter vector theta and compute gradients on theta
mini batch is you split up theta into subsets and compute gradient descent on those subsets
Right but you make updates after each minibatch you calculated
Where you are on the gradient therefore changes, so you cannot parallelize those calculations
right so i understand you can parallelize each mini-batches gradient computation, but im clarifying whether mini batch is inherently faster than batch (without parallelizing each batches gradient computation)
and also in what manner is it faster, in computation time, or for convergence
That was a typo, you can't parallelize minibatches
Anyone here familiar with discretization schemes in quant finance for the heston model? I am trying to implement a model but having a few issues (one really)
It's faster in convergence time because you trade the accuracy of your gradient estimate for the speed of your convergence. Say you have a 10,000 sample dataset and in batch you use every sample before taking a step. Now say in mini batch you take a step (update your model) after every 20 samples: you have made 500 updates to your model in a single epoch instead of 1.
so you are updating more frequently but each epoch of updates is slower
The amount of wait time/compute is technically more in mini batch (per epoch) than in batch yes. But the actual convergence speed is faster for mini batch.
so when you say converges faster, you mean it takes less number of epochs to converge?
That's one way to put it yes.
so less epochs that are slower, is somehow faster than more epochs that are faster?
"slower" and "faster" referring to the amount of compute you have to do yes, but its sort of misleading to put it that way here. In general when we talk about speed in training we mean convergence speed. In terms of convergence one mini batch epoch can be "faster" than multiple batch epochs, because the number of steps you take down the gradient in mini batch is so much more.
one way to think about this is an analogue to float precision. We generally use 32 bit floats as the default for machine learning models because, although you can get more precise gradient estimates using 64 bit floating points, that extra precision doesn't help us converge any faster really. mini batch GD is taking that idea to the extreme, we don't need every sample to get a gradient estimation that lets us step in the right direction; assuming our dataset is properly balanced then that handful of samples should be a good enough idea of where to go for us to safely take a step.
wait so in minibatch do you use every mini batch in an epoch or no?
yeah you make updates on every mini batch per epoch
yes, mini batch gives us much faster convergence
and why is that? I feel like I am going in circles, but you are still just breaking up a larger task into smaller tasks but a lot more tasks
think of it like this
a batch epoch is: compute gradient estimate across all samples -> update model
two steps
a mini batch epoch is: compute gradient estimate on batch 1 -> update model -> compute gradient estimate on batch 2 -> update model -> ...
you are basically having lots of little "batch epochs" in a single "minibatch epoch"
you are basically getting the same result per 2 steps, just with mini batch you do a hell of a lot less computational work to get that result, and do it many times
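The two kinds of epoch described above can be sketched on a toy problem. Everything here is an assumption for illustration: a one-parameter model `y = w * x`, noise-free data, and the 10,000 samples / batch size 20 from the earlier example.

```python
import random

def grad(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def batch_epoch(w, xs, ys, lr):
    # One gradient estimate over all samples, then one update.
    return w - lr * grad(w, xs, ys), 1

def minibatch_epoch(w, xs, ys, lr, batch_size):
    # Estimate on batch 1 -> update -> estimate on batch 2 -> update -> ...
    updates = 0
    for start in range(0, len(xs), batch_size):
        xb = xs[start:start + batch_size]
        yb = ys[start:start + batch_size]
        w -= lr * grad(w, xb, yb)
        updates += 1
    return w, updates

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(10_000)]
ys = [3.0 * x for x in xs]  # true weight is 3.0

w_batch, n_batch = batch_epoch(0.0, xs, ys, lr=0.1)
w_mini, n_mini = minibatch_epoch(0.0, xs, ys, lr=0.1, batch_size=20)

print(f"batch:      {n_batch} update,  mse={mse(w_batch, xs, ys):.4f}")
print(f"mini-batch: {n_mini} updates, mse={mse(w_mini, xs, ys):.4f}")
```

Both epochs touch every sample exactly once, so the gradient arithmetic is about the same; the difference is that the mini-batch epoch has already moved the weight 500 times with it.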
ok let me use an analogy, if I have to take a flight from Los Angeles to New York, I could take one 6 hour flight or I could take three 2 hour flights, but the total flight time is still the same, 6 hours.
(In practice, the overhead of setting up each flight plus the layovers is not insignificant, so you would spend more time taking 3 flights than 1)
I don't think it's a great analogy because the speed of the flights is the same or worse when you break it up. A better analogy might be:
You have 10 seconds to get as far down a hill as you can, you can lay on your stomach and measure exactly the angle of the hill in front of you then gently take a step, or you can just take a glance and jump down the hill. Both will get you in about the same place, but you can do the second one way more times in 10 seconds than you can do the first
but in mb gradient descent, your step size is smaller than in b gradient descent
what makes you say that? 
the magnitude of your step down the gradient might be smaller, or the noise of your data might mean on average the step you take doesn't get you as close to the local/global minimum
but 100 mini batch steps vs 10 batch steps favors mini batch greatly
you will be much farther down the gradient in mb
ok wait side question, at the end of all epochs you would have an array of costs for each minibatch, how do you reconcile that at the end for a scalar cost
no you do the updates per mini batch, you don't calculate all the costs then update at the end
it's the exact same mathematical algorithm as batch gd you just do it on way less samples, over and over until you run out of samples
it is not
between epochs?
you cannot parallelize mini batch
ig you could technically parallelize a single minibatch calculation across multiple processors and aggregate but I don't think anyone does that, you don't need parallelization because it's so much faster than regular GD
Yes and no. You really should view most of stochastic gradient descent as matrix multiplication. You multiply your batch, a tensor, with your weights (a matrix or a tensor), calculate the loss and subsequently do your update. Multiplying matrices is parallelizable
spent the last 3 hours trying to figure out
why read_csv() wont read all the rows in my text file.
tried polars , tried pandas nothing worked,
pandas did work but only kinda ,
my raw data had 40k rows , but it was only reading 36k and dropping shit with no warnings or errors what so ever.
after trying a billion things setting an argument 'quoting=3' and it just fucking works,
but the lines that were being dropped had no quotes in them to begin with
I wanna die
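A likely explanation, sketched with the standard library's `csv` module rather than pandas (pandas' `quoting=3` is the same constant as `csv.QUOTE_NONE`). The data below is made up: one field opens with a quote character, and everything up to the next quote is swallowed into that field, including rows that contain no quotes at all.

```python
import csv
import io

# Row 1 has a field that starts with a quote; the next quote is on row 3.
raw = 'id,name\n1,"oversized\n2,bolt\n3,washer 1/2"\n4,nut\n'

# Default dialect: the quote opens a quoted field that runs across line
# breaks until the next quote, so rows 1-3 collapse into a single record.
default_rows = list(csv.reader(io.StringIO(raw)))

# QUOTE_NONE (== 3): quote characters are treated as ordinary text.
literal_rows = list(csv.reader(io.StringIO(raw), quoting=csv.QUOTE_NONE))

print(len(default_rows), "rows with default quoting")
print(len(literal_rows), "rows with quoting=csv.QUOTE_NONE")
print(default_rows[1])
```

That would match the symptom: no error or warning, fewer rows than expected, and the "missing" lines themselves look perfectly clean because the stray quote is somewhere above them.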
i mean you could just compute -alpha * dtheta for each parallel process, and just add them all up after the parallel processes finish to get the final cost no?
you could absolutely do that, but it's not worth the overhead unless your dataset is larger than memory and you're doing it distributed
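A sketch of that idea for a single update step, on a toy one-parameter model (the model and data are assumptions): each "worker" computes the gradient on its shard of the batch, and averaging the shard gradients gives the same gradient as the whole batch, as long as the shards are equal in size.

```python
def grad(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [0.5, -1.0, 2.0, 1.5, -0.5, 1.0, 3.0, -2.0]
ys = [3.0 * x for x in xs]
w = 1.0

# Gradient computed in one go over the whole batch.
full_gradient = grad(w, xs, ys)

# Same batch split into 4 equal shards; each could live on its own process.
shard_size = 2
shard_gradients = [
    grad(w, xs[i:i + shard_size], ys[i:i + shard_size])
    for i in range(0, len(xs), shard_size)
]
combined_gradient = sum(shard_gradients) / len(shard_gradients)

print(full_gradient, combined_gradient)
```

This only parallelizes the work inside one step. The next step still has to wait for the updated weights, which is why successive mini-batches stay serial.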
Matrix multiplication with numpy is already multithreaded and with jax/torch you can go a step further and run it on a gpu for even more parallelism EDIT: fixed it, thanks to yo
Ah, the second numpy is meant to be torch
you're saying that running in a for loop serially for all mini batches is not significantly slower than running mini batches in parallel?
Oh, I think I get your question now
You need to do it serially because you do an update of the weights after each batch
if you could imagine one is O(N) would the other not be O(N / n) where n is number of processes
o rite totally forgot about that lol
guys, i was wondering what is the best module for me to learn and start with as a beginner, pytorch or tensorflow
Those are both for the same thing. You shouldn't start with either. Start by learning about what "data" is in the context of data science and AI and how to manipulate it.
pytorch is more used. tensorflow is still maintained by google,
but they have shifted more towards jax
Like math resources on the basic implementation?
all u really need to know is autograd for the very basics
which is just chain rule
Hello, are you guys familiar with prototype_path and class name script?
you can just ask the question
I don't mind if it's complex implementation as long as it explain how it can calculate for optimizing the weights and biases
yeah the paper I linked does that
Hey guys, Can anyone help me with setting up the apache spark environment?
nice resource, ty
Hello guys
supppppp
hello
I’m having trouble understanding how it’s supposed to work
The path in the script isn’t working I’m guessing it’s because of .npy file
I'm making a project on an fred website economy data, can someone help me generate questions regarding it?
i am new in making personal project i haven't got confidence till now to create my own questions for a project that i'm making
this is how the data looks
Hi! I want to ask a question, so I want to create a chatbot and i want this chatbot to get information from the websites using ai. i know python a little bit i completed some python 3 courses and im going to start the intermediate course and like review python. However, I don't know where to start building the chatbot and which ai should i use etc.
can someone tell me what do i need to create a chatbot? some says use openai assistant other says use gemini and while others say use dialogflow etc.
What about it? What's your question?
keras guides are really superb...
im reading this one for curiosity.. https://keras.io/guides/serialization_and_saving/
If you are deciding the batch size in a model then what's the use of number of steps per epoch param?
isn't it automatically decided by dividing the number of data points in the dataset by the batch size?
i'd think so
your name reminds me of Angelina Jolie and Billy bob marriage, it was wild
the steps per epoch may be useful when you feed the whole data set
Yeah I thought the same
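For a dataset of known size the arithmetic is just a ceiling division, sketched below. The helper and its `drop_last` flag are made up for illustration. An explicit steps-per-epoch setting mostly earns its keep when the data source has no fixed length, such as a generator that repeats forever.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to see every sample once.

    drop_last=True discards the final, smaller batch (hypothetical flag).
    """
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(2500, 32))                  # last batch is partial
print(steps_per_epoch(2500, 32, drop_last=True))  # partial batch dropped
```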
Hello respected members , as you all can see that i have just joined in this community
I want to become an AI engineer but i dont knows the proper steps
Can anyone guide me about the roadmap to become one
what stage are you at currently? are you in high school or what?
I just complete my high school
will you be going to college/university for computer science?
what do you mean by that?
like i am still not sure whether i will get a seat in the desired uni
cz thats the only good in uni in my city
what country are you in?
so i m trying not to depend on my uni too much
INDIA
I hail from north eastern india which doesnt have good educational instituition thats y
you will need a university education in CS with an emphasis in AI to be able to have a career in this space.
okay, so take as many AI-related courses as you can, and make sure you're taking the appropriate math prerequisites
I have aplied for CS in Data science and AI
Yes the uni has already set courses which include discrete maths , linear algebra etc
What would you do if your uni doesnt provide you with such classes ?
find another uni
Nice
lol
I mean wouldnt you recommend just learn the topic by yourself with books ?
Sir may i ask from which country do u hail frm
you can do that if you want, but your chances of getting a job are so low that you need another plan.
it takes alot of effort and hardwork to do it without a good uni
yep
and luck.
i also think so
true that
Okay. I just have one DS class which teaches the basic algorithms with Neural Networks.
what aside from uni studies what can i do to achieve my desired goal
I have modules like Data Driven Decision where you work with python mostly to predict something
see if you can work with research professors who specialize in AI.
But it's not a lot of theory
Like i dont have any coding background
that's fine. they'll teach you when you get there.
and i want to learn Python
there are online courses by good profs
by myself
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
that's easy
Is there any steps or routine i need to follow like a roadmap
for clear basics I'd recommend apna college. youre from India So im guessing yk hindi
@serene scaffold which country are u from
@bleak reef yes i k them
@serene scaffold are u a working professional
@serene scaffold what type of work u do like ai or web developer etc
I'm a computational linguist for a lab.
@bleak reef are u also from india
yeah
yeah in 3rd year of my CS degree
@bleak reef is there anything i need to be aware of before starting my cs degree as i will start my first sem from next month
@serene scaffold is there any roadmap to learn python
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
profs will guide you there just fine
@bleak reef actually i dont believe in the teaching faculty of uni cz i heard the faculty is average
I would recommend watching some random python yt course to grasp the basics and then looking up the libraries you'll probably need.
And work with code in general
But if you want to be better than others you should know how to communicate well and network with professionals
oh i see
and be good at maths
hmm
yeah
Is there tips to learn code better
Don't just read or watch books/courses, actually type in the code yourself and try to apply what you learn by making your own projects
Active practice is far better than just passively absorbing material
What is that?
I specialize in language technology
yes
@serene grail thnx
Hello everybody. I was wondering if anyone knew good resources to learn how to build AI applications using Python before college?
@warm copper the slopes and the means are functionally the same
For instance, if you're analysing the impact of different factors on delivery times, regression could help you. It can easily predict how changes in these factors affect the overall supply chain. If you're comparing the average delivery times across different regions or transportation methods, ANOVA would be more appropriate.
if you have a binary variable yes/no and you make a dummy for "YES", the intercept is the mean of the "NO" group
and the coefficient on the "YES" dummy is the difference between the two means
Can you not see how this is the same?
It's presented differently in R but it's fundamentally the same thing repackaged
You know that you need to use a t-test, but are you stumped on what kind of t-test to use?
When choosing a t-test, you will need to consider two things: whether the groups being compared come from a single population or two different populations, and whether you want to test the difference in a specific direction.
This video is part of the H...
Hence why more and more programs don't teach ANOVA anymore altogether
I literally used ANOVA during my internship
Don't take this the wrong way but are you reading what I'm saying?
I am but I still use them for different purposes
If you ONLY have one categorical variable with 3 levels and you make 2 dummies
red, blue and green. You make a column for red and a column for green
It's clear that beta_0 is the mean of blue, yes?
It's also clear that beta_1 and beta_2 express how the means of red and green differ from the mean of blue, right?
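The dummy-coding claim can be checked numerically. A sketch with made-up measurements for the three colours, solving ordinary least squares through the normal equations with a tiny hand-written solver so it runs without any libraries:

```python
def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mean(values):
    return sum(values) / len(values)

# Made-up data; blue is the baseline level (it gets no dummy column).
groups = {"blue": [10.0, 12.0, 14.0], "red": [15.0, 17.0], "green": [7.0, 9.0, 8.0]}

rows, y = [], []
for colour, values in groups.items():
    for value in values:
        # Design row: intercept, dummy for red, dummy for green.
        rows.append([1.0, float(colour == "red"), float(colour == "green")])
        y.append(value)

# Normal equations: (X^T X) beta = X^T y
k = 3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
beta_0, beta_red, beta_green = solve(xtx, xty)

print("beta_0    :", beta_0, " mean(blue):", mean(groups["blue"]))
print("beta_red  :", beta_red, " mean(red) - mean(blue):", mean(groups["red"]) - mean(groups["blue"]))
print("beta_green:", beta_green, " mean(green) - mean(blue):", mean(groups["green"]) - mean(groups["blue"]))
```

The regression coefficients are the group means in disguise, which is the sense in which a one-way ANOVA and a regression on dummies are the same model.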
I used ANOVA in DOE a lot too
Instead of Regression
Yeah I understand that
but you cant do logistic regression with ANOVA
multiple regression in my bulletpoint doesnt refer to linear regression
it can be logistic or polynomial regression too
Exactly
All I'm saying is that if it's a stats heavy data scientist role and you don't see that linear regression subsumes anova
you may have an issue during the interview
And for that reason I avoid those roles like the plague
I mean ANOVA can be used as a linear regression for sure. You can use them interchangeably
but the output i get from each of them are different
but they achieve the same thing
Other fun questions I get at interviews is how random forest and xgboost work
For some reason every technical interview has asked that
The ANOVA table puts out more statistical stuff
like p values, f score degree of freedom
you don't see that kind of stuff on linear regression output
Depends on the library of course
We used Stata (🤢) for this stuff in uni and afaik you get F scores there with your regression output
this is from my undergrad thesis @past meteor
I'm very picky, are you sure you want me to look? 😂
yup
My first question is, did the people that die specifically die due to heart failure or could it be anything
If you had a car crash in september are you added to the group that passed?
they died specifically from it
stepwise regression is a no-go
dataset part of project states that
many papers on why you shouldn't use stepwise
Well we only learnt stepwise, logistic and linear regression in that course
damn
It was called Applied Regression Analysis
no
damn
The only lasso I know is the regularization technique
Yeah
which I learnt in ML
that one
https://www.reddit.com/r/statistics/comments/7bvo6m/why_is_stepwise_regression_criticized/
From what I've been taught about stepwise regression, the problem is how very atheoretical it is - and also how dependent it is on sample characteristics. Since predictors are often at least a little correlated, using the exact same set of variables with stepwise selection on two different datasets will often get you a VERY different solution.
Hell, if you run it forward and backward on the same dataset you'll often get very different solutions.
For models that are inherently multivariate, pretending they aren't (doing a bunch of pairwise comparisons, one variable at a time) is generally not the best way to go.
I guess that if your prof is teaching you stepwise, then I can see why you did it
I would use DT on this dataset now
I'm mostly missing a residual analysis
😄
Your regression formula assumes a linear relationship between each variable with 0 interactions
Well, you could plot the residuals wrt each variable and find out if it's homoscedastic or not indeed
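A rough numeric stand-in for that residual plot, on synthetic data (not the thesis dataset): fit a straight line, then compare the spread of the residuals at low and high values of the predictor. The data is built so the noise grows with `x`, which is the pattern a residual plot would show as a funnel.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data: y = 2x plus a disturbance whose size is proportional to x.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (0.1 * x if i % 2 else -0.1 * x) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# xs is already sorted, so the first half holds the low-x residuals.
half = len(xs) // 2
spread_low = sum(abs(r) for r in residuals[:half]) / half
spread_high = sum(abs(r) for r in residuals[half:]) / (len(xs) - half)

print(f"mean |residual| for low x:  {spread_low:.2f}")
print(f"mean |residual| for high x: {spread_high:.2f}")
```

Roughly equal spreads would be consistent with homoscedastic errors; a clear gap like this one is a sign they are not. With several predictors the same comparison would be repeated per variable.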
isn't the bachelor thesis at the very end?
strange
Nah this was more like a class project
It was a very tough class tho
We had 24 peeps when we started and ended with 8
I glossed over all the statistical tests because it's been too long for me and I never use them at work
yeah, I think it's mostly stepwise and no iterative, "data driven" approach to modelling
but I guess you saw that in other classes
Also with a tad of omitted variable bias https://en.wikipedia.org/wiki/Omitted-variable_bias
Mostly matters for interpretation of your results
Hence why I'm always afraid of interpreting regression coefficients, it's risky business if you're not a statistician pur sang
Or maybe I take this too seriously 🤷
try now lol
im using transformers
to detect colon cancer
I managed 99 percent accuracy
👽
I'm not gonna read this ngl haha
LOL
it's very notebooky
yusss
you need to read it top to bottom
And can't join in the middle and see what's going on
i said to myself
it's 10 pm for me on a friday, aint gonna do that rn 🤣
if I cant get a job i will become a college prof
What I'd look for here is if you're leaking data or not
i will do a phd and stay in the college
If you get 99 % accuracy you should be worried, not happy imo
transformers are really powerful
basicall you are training your model on a pretrained model
I used ViT
which is what you do with say resnet as well
yup
what is worrying me is the fluctuations in epochs @past meteor
Epoch: 3 | train loss: 0.1296 | test accuracy: 1.00
Epoch: 3 | train loss: 0.1374 | test accuracy: 1.00
Epoch: 4 | train loss: 0.2167 | test accuracy: 0.94
Epoch: 4 | train loss: 0.2823 | test accuracy: 0.94
Epoch: 4 | train loss: 0.5354 | test accuracy: 0.88
Epoch: 4 | train loss: 0.2172 | test accuracy: 0.94
Epoch: 4 | train loss: 0.0596 | test accuracy: 1.00
something is off
# Get the next batch for testing purposes
test = next(iter(test_loader))
test_x = test[0]
that `iter()` is either redundant or a bug
it's redundant
but what do you think about the epochs @agile cobalt
why such high fluctuations?
Epoch: 6 | train loss: 0.1894 | test accuracy: 0.94
Epoch: 6 | train loss: 0.0428 | test accuracy: 1.00
Epoch: 6 | train loss: 0.1503 | test accuracy: 0.94
Epoch: 7 | train loss: 0.3682 | test accuracy: 0.94
Epoch: 7 | train loss: 0.3046 | test accuracy: 0.94
idk batch size too small?
its 16
don't you have 8 classes or so
never mind, I got a bit confused
(thinking about how many classes it'll see each iteration, but that probably shouldn't matter)
lol
I changed my batch size to 64 from 16 now
and LR to 0.01 from 0.00001
Epoch: 0 | train loss: 2.2913 | test accuracy: 0.12
Epoch: 0 | train loss: 1.7133 | test accuracy: 0.27
Epoch: 1 | train loss: 2.4397 | test accuracy: 0.11
Made it worse lol
maybe try smaller just to see what happens
or even 4
0.01 learning rate is probably too high though
EPOCHS = 50
BATCH_SIZE = 8
LEARNING_RATE = 0.000001
Im gonna do this lol
Epoch: 5 | train loss: 0.6605 | test accuracy: 1.00
Epoch: 5 | train loss: 1.0521 | test accuracy: 0.75
Epoch: 5 | train loss: 0.5068 | test accuracy: 1.00
Epoch: 5 | train loss: 0.7508 | test accuracy: 1.00
lol 🥲
Why is a single epoch printed out more than once?
That in and of itself is a bit strange
I increased dropout layer to 0.5
it goes through each batch size
you can make it print once
it gives more detail on whats going on each batch size
evaluating on the validation set each batch is strange
I'd just have 1 line per epoch tbf
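Part of the jumpiness in the logs above may simply be that the accuracy is measured on a single test batch: with 16 samples the only possible values are multiples of 1/16 (0.94 and 0.88 are what 15/16 and 14/16 look like at two decimals), and with 8 samples one mistake costs 12.5 points. A sketch of accumulating over all test batches once per epoch instead. It is written with plain lists so it runs anywhere; with PyTorch the loop would go over `test_loader` in the same way, and `predict` stands in for the model.

```python
def evaluate(predict, batches):
    """Accuracy over every batch, not just the first one."""
    correct = 0
    total = 0
    for inputs, labels in batches:
        predictions = predict(inputs)
        correct += sum(int(p == y) for p, y in zip(predictions, labels))
        total += len(labels)
    return correct / total

# Toy stand-ins for a model and a test loader (both hypothetical).
def predict(inputs):
    return [x % 2 for x in inputs]

test_batches = [
    ([1, 2, 3, 4], [1, 0, 1, 1]),  # 3 of 4 correct
    ([5, 6], [1, 0]),              # 2 of 2 correct
]

print(f"test accuracy: {evaluate(predict, test_batches):.2f}")
```

Calling this once at the end of each epoch gives one line per epoch and a number that is not dominated by one or two samples.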
fixed it
Anyway, if your model's performance is really good odds are you shouldn't be celebrating but rather looking for where you have a leak
If you've searched exhaustively and found nothing, then you can celebrate
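One cheap first check for a leak is to look for identical samples on both sides of the split. A sketch, assuming each sample is available as raw bytes (for images, the file contents); the helper names are made up:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash of one sample."""
    return hashlib.sha256(sample).hexdigest()

def find_leaks(train_samples, test_samples):
    """Indices of test samples whose exact bytes also appear in the training set."""
    train_hashes = {fingerprint(s) for s in train_samples}
    return [i for i, s in enumerate(test_samples) if fingerprint(s) in train_hashes]

train = [b"image-a", b"image-b", b"image-c"]
test = [b"image-d", b"image-b"]

print(find_leaks(train, test))
```

This only catches byte-for-byte duplicates. Other kinds of leakage, such as several images from the same patient landing on both sides, or preprocessing fitted on the full dataset, need a look at how the split was made.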
I did 50 epochs of training on my model, and on the 11th epoch I got 92% training and validation accuracy. How can I like select the epoch with the best training and validation when I train it next so that I can make that configuration the permanent one?
Also, when doing the hyperparameter search, the best parameters change on every run. So I finally ran it, saw good parameters, hard coded those in, and got better results. Why isn't that the standard instead of them changing every time
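For the "select the best epoch" part, a common pattern is to keep a copy of the weights every time the validation score improves and use that copy at the end. A framework-free sketch: `train_one_epoch` and `validate` are placeholders for the real training and validation code, and the "model" is a plain dict.

```python
import copy

def train_with_best_checkpoint(model_state, train_one_epoch, validate, epochs):
    """Train for `epochs` epochs and return the state from the best one."""
    best_acc = float("-inf")
    best_epoch = None
    best_state = None
    for epoch in range(epochs):
        train_one_epoch(model_state)
        acc = validate(model_state)
        if acc > best_acc:
            # New best on the validation set: remember these weights.
            best_acc, best_epoch = acc, epoch
            best_state = copy.deepcopy(model_state)
    return best_state, best_epoch, best_acc

# Toy stand-ins: validation accuracy peaks at the third epoch.
fake_val_acc = [0.50, 0.80, 0.92, 0.90, 0.85]

def train_one_epoch(state):
    state["epochs_trained"] += 1

def validate(state):
    return fake_val_acc[state["epochs_trained"] - 1]

state = {"epochs_trained": 0}
best_state, best_epoch, best_acc = train_with_best_checkpoint(
    state, train_one_epoch, validate, epochs=5
)
print(best_epoch, best_acc, best_state)
```

In PyTorch the thing to copy or save would be `model.state_dict()`. The selection should be driven by the validation score, not the training score. As for the search results changing between runs, random initialization and shuffling are the usual suspects, so fixing the random seeds should make runs easier to compare.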
what library do u use for deep learning
i use pytorch
ok , from where did u learn it
ok , still at school 😦
are you?
i've only read the quick start
then see neural nine video
The pytorch documentation has a "learn" section
it's how I learnt pytorch. It's mostly the same as Numpy and Tensorflow
no books, no videos, just the docs
is it possible to learn it from courses
that's how you should learn to use tools imo. If the tools have bad docs, just use a different one if you have a choice
m now learning pandas
I really don't like the idea of learning from courses
I explained last time already why not
You should pick a book like the ones in the pinned post I list
They all have exercises
i know
The second pinned message is about books I recommend
from Raggy?
do i need to learn API's
"APIs" are a broad concept. if you're learning how to use <x programming thing>, you're learning the api of x.
Do you mean backend APIs / Web development or what stelercus is talking about (the correct usage of the term API)
is there an API concept in ML (i just heard it so sorry for any confu)
Can you first clarify what you mean with API first
it doesn't need to be perfect, just describe it in your own words so I understand what you mean
Epoch: 31 | train loss: 0.1147 | test accuracy: 1.00
Epoch: 32 | train loss: 0.0760 | test accuracy: 0.75
Epoch: 33 | train loss: 0.0796 | test accuracy: 0.88
Epoch: 34 | train loss: 0.0718 | test accuracy: 0.88
Epoch: 35 | train loss: 0.0762 | test accuracy: 0.88
Epoch: 36 | train loss: 0.0627 | test accuracy: 1.00
Epoch: 37 | train loss: 0.0548 | test accuracy: 0.88
Epoch: 38 | train loss: 0.0607 | test accuracy: 1.00
Epoch: 39 | train loss: 0.0512 | test accuracy: 0.88
Epoch: 40 | train loss: 0.0575 | test accuracy: 1.00
Epoch: 41 | train loss: 0.0567 | test accuracy: 1.00
Epoch: 42 | train loss: 0.0506 | test accuracy: 1.00
Epoch: 43 | train loss: 0.0588 | test accuracy: 1.00
Epoch: 44 | train loss: 0.0361 | test accuracy: 1.00
Epoch: 45 | train loss: 0.0434 | test accuracy: 1.00
Epoch: 46 | train loss: 0.0401 | test accuracy: 1.00
Epoch: 47 | train loss: 0.0331 | test accuracy: 1.00
Epoch: 48 | train loss: 0.0440 | test accuracy: 1.00
Epoch: 49 | train loss: 0.0330 | test accuracy: 1.00
i just heard it on a data scientist roadmap
Pretty good accuracy! @past meteor
I would probably get a lower train loss if I ran it for another 10 epochs
it can be just a misunderstanding
So, they're using API in the colloquial but incorrect way as a way for the outside world to interact with your ML models over the internet
Throw away that roadmap.
Pick a basic ML concept (not a library) and write some code that exemplifies that concept. You'll inevitably need at least one ML library to accomplish it. Just learn whatever minimal amount of that library's API that you need to do it.
Don't worry about it
Stelercus is spot on
I started learning data science in exactly that way. I downloaded all of my data from facebook (... I use messenger a lot) and did a data analysis project with it
u mean like learning from projects
sure.
use kaggle @glass ridge
Along the way I learnt the basics of pandas, storing data in DBs with Python, what JSON is, making ML models using sklearn, ...
Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.
This is a project you can easily copy for whatever platform you use because of GDPR, like you can ask discord for all of your data
And then you can do some analysis on that
Doesn't need to be advanced, but it's more productive than roadmaps and whatnot
these are good places
1. A very low train loss is absolutely a bad thing
2. A very high accuracy is very suspicious
what project
I just told you 😭
it was really high
Epoch: 0 | train loss: 1.7844 | test accuracy: 0.25
Epoch: 1 | train loss: 1.3510 | test accuracy: 0.75
Epoch: 2 | train loss: 1.3775 | test accuracy: 1.00
Epoch: 3 | train loss: 0.8661 | test accuracy: 0.88
Epoch: 4 | train loss: 0.7971 | test accuracy: 1.00
went from 180 percent to 3 percent
why would it be a bad thing
you aim to minimize train loss
I recently had a very good model and I presented it to my clients (both are PhD + multiple post doc tier data scientists)
their obvious reaction was "where did you make an error?"
what do u mean by dataanalytics , is it clearing and representing data
And that is with me presenting my results with loads of skepticism
first of all transformers usually lead to very low train loss
you don't
that's not the point of training models, it's the exact opposite 😔
??????
you need high accuracy
it doesn't matter if it's classification, regression or clustering
I'm sorry but this is patently false
I'm being hard on you because you're interviewing
If you say this in an interview it's over
alright then lets have a shitty classification model that doesn't accurately classify images
that's not guaranteed to be the case
and high accuracy doesn't mean that the model will perform well on instances outside the dataset
accuracy might also be the wrong metric
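A standard illustration of accuracy being the wrong metric, with invented numbers: on a screening task where only 5 of 100 cases are positive, a "model" that always answers negative scores 95% accuracy while finding none of the cases that matter.

```python
def accuracy(preds, labels):
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def recall(preds, labels):
    """Share of the actual positives that were predicted positive."""
    positives = [p for p, y in zip(preds, labels) if y == 1]
    return sum(positives) / len(positives)

labels = [1] * 5 + [0] * 95   # 5 positive cases out of 100
always_negative = [0] * 100   # degenerate classifier

print(f"accuracy: {accuracy(always_negative, labels):.2f}")
print(f"recall:   {recall(always_negative, labels):.2f}")
```

Which metric is the right one depends on the problem; for something like cancer detection, missed positives are usually the costly error, so recall or a per-class breakdown says more than a single accuracy figure.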

