#data-science-and-ml

verbal venture
#

the training data is text + image embeddings of image clips (embedded vectors of image frames) and their respective captions. the other 2 names are video datasets

violet gull
#

Stelercus 👉 👈 ducky_sphere

serene scaffold
violet gull
#

but orange name

serene scaffold
#

mods are orange
admins are tomato

violet gull
#

but tomato

serene scaffold
#

anyway, if the experiment setup gives the model a clear signal, and the model architecture is appropriate, the model will converge on something. it just won't necessarily be the best possible model.

serene scaffold
#

what won't what?

violet gull
#

my example shows that convergence is impossible (clearly it's not), so my reasoning must be wrong somewhere

serene scaffold
#

you said "it won't", which is short for "x will not y", but idk what x and y are.

violet gull
serene scaffold
#

I think one of those two things might not be true. or the hyperparameters are bad (which I guess is a third condition that I didn't mention)

violet gull
#

my example doesn't involve hyperparameters

serene scaffold
#

well, I've never done reinforcement learning
ask me about interactive LLMs.

violet gull
#

🤧 lemon_sentimental

autumn comet
#

Right. I actually had validation code before (which I accidentally left out of the post here)

    # Check if CUDA is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Determine number of GPUs
    if device.type == 'cuda':
        num_gpus = torch.cuda.device_count()
        print(f"There are {num_gpus} CUDA devices available.")
    else:
        num_gpus = 0
        print("No CUDA devices available.")

When I run this on my local machine, I get:

No CUDA devices available.
Initializing a new model.
Parameters to be optimized: 7041970

When I run it on a new pod (with different hardware) I get:

There are 2 CUDA devices available.
Initializing a new model.
Parameters to be optimized: 7041970
violet gull
#

ill keep reading books until i find an answer to this

serene scaffold
autumn comet
#

Thanks for the reply.

Yeah, I suppose I can bail out if there is no CUDA.

But as far as the "tensors are on different devices" problem goes, I am stumped. Just to make it clearer, I am now calling .cuda() on everything in train.py that will accept it, but it is still not working. Also, I am now only working with one GPU, in the hope that I can get it running on one before trying to get it to run on multiple.

//...
        
    train_data = torch.load("assets/output/train.pt").cuda()
    valid_data = torch.load("assets/output/valid.pt").cuda()

    //...

    if update:
        try:
            model = torch.load("assets/models/model.pt").cuda()
            print("Loaded existing model to continue training.")
        except FileNotFoundError:
            print("No existing model found. Initializing a new model.")
            model = GPTLanguageModel(vocab_size=len(vocab)).cuda()
        
    else:
        print("Initializing a new model.")
        model = GPTLanguageModel(vocab_size=len(vocab)).cuda()

//...
            train_loss = estimate_loss(model, train_data).cuda()
            valid_loss = estimate_loss(model, valid_data).cuda()

            time = current_time().cuda()
//...

        # sample batch of data
        x_batch, y_batch = get_batch(train_data).cuda()

        # evaluate the loss
        logits, loss = model(x_batch, y_batch).cuda()
        //...

    torch.save(model, "assets/models/model.pt").cuda()
    print("Model saved")
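
A note on the snippet above: `.cuda()` only exists on tensors and modules, so calling it on a tuple (which is what `get_batch` and `model(...)` appear to return), on a plain Python value, or on the result of `torch.save` (which returns `None`) raises an error instead of moving anything. Below is a minimal single-device sketch, assuming a stand-in `nn.Linear` model and a hypothetical `get_batch` helper (neither is the actual code from train.py): pick one `device`, move the model once, and move each tensor of a batch as it is created.

```python
import os
import tempfile

import torch
import torch.nn as nn

# One device object, reused everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model (hypothetical; the real one would be GPTLanguageModel)
model = nn.Linear(8, 2).to(device)

# The dataset can stay on the CPU; only the batches are moved
train_data = torch.randn(32, 8)
train_labels = torch.randint(0, 2, (32,))


def get_batch(data, labels, batch_size=4):
    """Hypothetical helper: sample a batch, then move each tensor to `device`."""
    idx = torch.randint(0, data.shape[0], (batch_size,))
    # .to(device) is called per tensor, not on the returned tuple
    return data[idx].to(device), labels[idx].to(device)


x_batch, y_batch = get_batch(train_data, train_labels)
logits = model(x_batch)
loss = nn.functional.cross_entropy(logits, y_batch)

# Save the weights and reload them onto whatever device is available
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)
model.load_state_dict(torch.load(path, map_location=device))
```

Saving the `state_dict` rather than the whole model object is a design choice in this sketch; `map_location` is what lets a checkpoint written on a GPU pod load on a CPU-only machine.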
digital heath
#

Hello

#

How are you

#

I need help guys

frosty fulcrum
#

Does anyone know how can i normalize the points so i can use the mask with roboflow?

frosty fulcrum
#

fixed

past meteor
#

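Because off-policy algorithms can update a target policy that is different from the behaviour policy. Deep Q-learning is one of them: its update bootstraps from the greedy (max-Q) action, regardless of which action the behaviour policy actually took.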

#

I'd check out the paper I sent you

wild loom
#

Hey guys, I've been training an AI COCO model for image detection lately in Google Colab. I was wondering if anyone had a link or two that would explain how I can download the model I've trained, so that I can import it into a new file and just plug in an image to be detected, rather than re-running the entire thing on Colab every time I restart my PC.

lapis sequoia
#

just download the weights/checkpoints, and the configuration file @wild loom

#

you can run !find . -type f -name *.{ckpt,pth} i think

#

otherwise maybe this !find . -type f \( -name "*.ckpt" -o -name "*.pth" \)

#

those are the common ones, but it may be .keras etc. depending on the framework and format.
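
The same search can be done from a Python cell, which sidesteps shell quoting. A small sketch, assuming the checkpoint uses one of the extensions mentioned above; the `find_checkpoints` name and the extension list are illustrative, not part of any library.

```python
from pathlib import Path


def find_checkpoints(root=".", extensions=(".ckpt", ".pth", ".keras")):
    """Return all files under `root` whose suffix is in `extensions`, sorted."""
    root_path = Path(root)
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.suffix in extensions
    )


# List every candidate checkpoint below the current directory
for path in find_checkpoints("."):
    print(path)
```

In Colab, a file found this way can then be downloaded with `files.download(str(path))` from `google.colab`, or copied to a mounted Drive folder.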

wild loom
#

okay thank you for your help I'll try that out

remote stream
#

Guys, which field or topic of machine learning should I focus on more?

#

because i think that i understand most of the supervised learning topics and their codes are pretty similar

#

same goes for unsupervised

deep sleet
#

is making a custom loss function good in certain cases?

mild dirge
#

For example, increasing the loss for underrepresented classes and decreasing the loss for overrepresented classes.
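
A rough sketch of that idea in plain Python, assuming binary labels and inverse-frequency weights (one common choice among several; the function names are made up for illustration):

```python
import math


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        per_sample = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * per_sample
    return total / len(labels)


labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # class 1 is underrepresented
probs = [0.1] * 8 + [0.4, 0.4]            # predicted P(class 1)

weights = class_weights(labels)
unweighted = weighted_bce(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_bce(probs, labels, weights)
print(weights, unweighted, weighted)
```

With these weights the two misclassified minority samples dominate the loss, so the optimizer is pushed harder to fix them.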

deep sleet
#

oh

#

Ty

unique spoke
#

How to do semantic segmentation?

#

Is it a better option than bounding boxes for object detection

#

So basically, I am trying to make a project and I have already used the COCO dataset

#

I want to combine it with the Mapillary vistas dataset which is more specific for street objects

#

Haven't found any resources regarding object detection using bounding boxes, only results for semantic segmentation

deep sleet
#

if the loss is shown to be nan what does that mean?

unique spoke
#

nvm researched it

wooden sail
unique spoke
#

Can you guys suggest some datasets that I can train a model on (or use a pre-existing model with) for object detection, NOT semantic segmentation, for detecting objects on the street? I'm already using the COCO dataset and want to combine it with another.

unique spoke
#

Any tutorials you guys can suggest or any links to code a custom object detector - Specifically for tensorflow. Like Object Detection API (A good tutorial which you find useful for this would be much appreciated)

#

Also if I already know tensorflow , should I still learn pytorch?

#

YOLOv8 and others require PyTorch

lapis sequoia
#

A question about TorchRL:

    checkpoint = torch.load('ppo_model.pth')
    actor_net.load_state_dict(checkpoint['actor_net_state_dict'])
    value_net.load_state_dict(checkpoint['value_net_state_dict'])
    ProbabilisticActor_policy_module.load_state_dict(checkpoint['probabilistic_actor_state_dict'])
    scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
    Adam_optimizer_used.load_state_dict(checkpoint['optimizer_state_dict'])
    GAE_advantage_module.load_state_dict(checkpoint['gae_state_dict'])
    maximum_average_reward = checkpoint['maximum_reward_tensor']

For some reason the average reward decreases after I load the model. I saved the states of Actor Network, Value Network, ProbabilisticActor state, Cosine Annealing Learning Rate Scheduler state, Adam optimiser state, Generalised Advantage Estimator State and maximum average reward number.

Do I need to save

  1. sub-batch ReplayBuffer state
  2. SyncDataCollector experience collector class state
  3. ClipPPOLoss class state
  4. Do I need to save the model before .step() method of my Cosine Annealing LR Scheduler or after?
    https://discord.com/channels/267624335836053506/1263119686867091487
oblique isle
#

Hello guys, hope you're good. I have a tiny problem: I have a zip that contains well-structured Python code files (a CTGAN model), and I want to expose it through an API so I can use it in a desktop app. Where should I deploy the code first?

violet gull
#

Stochastic approximation theory proves that if the random exploration chance doesn’t decay to zero convergence is impossible

deep sleet
#

So most cost functions let us reach a local minimum, and which local minimum you end up in depends on the random initial weights and biases. Isn't that inefficient? You could be missing out on much better minima, so maybe something like trying several random initializations would give us a higher chance of reaching a lower local minimum.

lapis sequoia
mild dirge
#

But if the search space is so big, you can't find every local minimum (and thus finding the global minimum is almost never possible)

#

But it's great that you come to these conclusions by yourself ok_handbutflipped

deep sleet
#

Thx man

violet gull
lapis sequoia
cedar tusk
#

finding local min/max is very easy. finding global min/max requires us to be able to either try all numbers or differentiate the function itself.

#

the most widely used method is to have multiple different starting points so that a large area of numbers is covered
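
A toy illustration of the multiple-starting-points idea, assuming a hand-picked 1D function with two local minima and plain gradient descent (the function, learning rate, and number of starts are all arbitrary choices for the sketch):

```python
import random


def f(x):
    """Toy non-convex function with two local minima."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Follow the negative gradient from x0; ends in whichever basin x0 is in."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


def multi_start(n_starts=10, low=-2.0, high=2.0, seed=0):
    """Run gradient descent from several random starts and keep the best result."""
    rng = random.Random(seed)
    candidates = [gradient_descent(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(candidates, key=f)


best = multi_start()
single = gradient_descent(1.5)   # a start that lands in the shallower minimum
print(best, f(best), single, f(single))
```

More starts raise the chance of hitting the deeper basin, but nothing here guarantees the global minimum, which matches the point made above about large search spaces.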

scenic parcel
#

Is this a common philosophy?

violet gull
#

Chaining? Yes

agile cobalt
# scenic parcel

you should absolutely never use inplace

and yes, chaining is pretty common

scenic parcel
agile cobalt
scenic parcel
agile cobalt
scenic parcel
#

drop is the only method that has a green checkmark in their chart, that is later listed as a method to avoid using inplace with

agile cobalt
#

that confusion is all the more reason to just never use it

desert cedar
#

hi! for a project, i'm interested in creating a small application that would be able to read/parse through links and mark down certain info like the article name, the original language the article is in, etc. how would i be able to do this? could someone redirect me to a youtube video or an api i'd be able to use? i feel like i could possibly incorporate the usage of ai to automate this process. thanks!

cedar tusk
#

is there anywhere i can ask R related questions? if anyone is knowledgeable i can ask over DM as well, i have a least squares question

lapis sequoia
lapis sequoia
serene scaffold
#

that is: no.

serene grail
cedar tusk
serene scaffold
#

it seems like the only people who use tensorflow are tutorial authors

cedar tusk
lapis sequoia
#

it's a bit more complex than graph neural nets

cedar tusk
#

i hate the fact that with neural nets intuition is just out the window

lapis sequoia
#

wdym?

cedar tusk
#

there are neurons which take input and have different coefficients, those coefficients produce a result

#

an oversimplification

#

but even then the intuition of the researcher towards the data is never better than the neuron itself, since neurons really do not explain anything

lapis sequoia
#

yeah that's like saying that cars are like horses

cedar tusk
#

obviously nets are made to think instead of the researchers

#

to mimic the brain

fervent shore
unkempt apex
#

is it necessary to convert images to Grayscale ( for CNN )
because I am dealing with weather images!

lapis sequoia
#

necessary no, i don't think so, but may be wrong

fervent shore
cedar tusk
#

but then you would need more than 1 layer of input neurons :/

lapis sequoia
#

an image is a 3 D tensor, you can feed that to a CNN network

unkempt apex
fervent shore
lapis sequoia
#

well, it's 2D

#

the D in a convolution are the dimensions of the window, not of the cube

fervent shore
lapis sequoia
#

2 D means you specify the kernel window size

fervent shore
#

I mean wouldn't you add another dimension of corresponding kernels for the added RGB dimension?

lapis sequoia
#

1D would still be 3D, but you specify 1 dimension

#

no, that's a bit confusing at first

#

the depth is always automatically set to the depth of the input data

#

you only specify the area for 2D, and the height for 1D convolutions.

fervent shore
#

yeah so it would go 1D -> [x], 2D -> [x,y], 3D -> [x,y,z]
and so should the kernels, if an image has a dimensional depth of 3 then the kernel matrix would be 3D

unkempt apex
#

okay, after the debate, just explain it to me simply!

lapis sequoia
#

3D convs you set the depth, they aren't very common id say

fervent shore
#

oh wait nvm I see where 2D is used for RGB

#
wooden sail
#

interestingly, what stuff like pytorch will do is apply 2d convolutions separately to each layer of color, then add the results up

fervent shore
#

yeah I saw that on the forum

wooden sail
#

however, this turns out to be equivalent to doing a 3d convolution if you ignore all of the outputs that aren't fully overlapping
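
To make that equivalence concrete, here is a small pure-Python sketch, assuming "valid" cross-correlation (which is what deep learning frameworks usually call convolution), a single output channel, and a kernel whose depth equals the number of input channels. All function names are illustrative.

```python
def corr2d(image, kernel):
    """'Valid' 2D cross-correlation of one channel with one 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv2d_multichannel(image, kernel):
    """Per-channel 2D correlations, summed; inputs are [channel][row][col]."""
    per_channel = [corr2d(ch, k) for ch, k in zip(image, kernel)]
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [
        [sum(pc[i][j] for pc in per_channel) for j in range(out_w)]
        for i in range(out_h)
    ]


def corr3d_full_depth(image, kernel):
    """3D correlation, keeping only the one depth offset with full overlap."""
    depth, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_h = len(image[0]) - kh + 1
    out_w = len(image[0][0]) - kw + 1
    return [
        [
            sum(image[d][i + a][j + b] * kernel[d][a][b]
                for d in range(depth) for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 3-channel 3x3 "image" and a 3-channel 2x2 kernel with different weights per channel
image = [[[c + i + j for j in range(3)] for i in range(3)] for c in range(3)]
kernel = [[[1, 0], [0, -1]], [[2, 1], [1, 0]], [[0, 1], [1, 1]]]

print(conv2d_multichannel(image, kernel))
print(corr3d_full_depth(image, kernel))
```

Both functions print the same 2x2 result; the kernel never slides along the channel axis, which is why the layer is still called 2D.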

lapis sequoia
#

yes, that's what i was trying to say

fervent shore
#

I see now

wooden sail
#

you're free to interpret it as you like for this one

lapis sequoia
#

but each has != weights

fervent shore
#

true

lapis sequoia
#

it's not the same kernel

wooden sail
#

(you can make it have the same weights)

lapis sequoia
#

sure, but that's not commonly the case

wooden sail
#

in any case, the way pytorch does it by default can be written on paper both as several 2d and a single 3d convolution

#

just kind of a boring 3d conv

lapis sequoia
#

yes, it's one of a single step, if i understand correctly?

#

like the same transition happens from 1D to 2D convolution imho

fervent shore
#

every higher dimensional convolution is just a giant 1D convolution Pepe_Hmmm

wooden sail
#

well, you just had to trigger the topic that made me a meme

lapis sequoia
#

@unkempt apex so imho you'd just use conv2D to be brief. others may disagree

wooden sail
#

N-D convolution can be represented as a multi-level block-toeplitz matrix

#

the number of dimensions and the order of unrolling the multi-way array determine whether you have blocks that are toeplitz, or toeplitz blocks

#

so in that sense yes, you can unfold multidim convolutions into an operation that looks like a 1d conv

lapis sequoia
fervent shore
lapis sequoia
#

interesting, there are several misspellings in that single par

#

but the image looks right

#

ptrblck the nvidia guy from pytorch forums, he is so clever lol

iron basalt
# wooden sail N-D convolution can be represented as a multi-level block-toeplitz matrix

To add to this / explain it a bit if you look it up. You can represent a ton of operations as a matrix multiplication by just setting a bunch of entries to 0 (often representing lack of connection / edge / interaction) (which then can be skipped for performance reasons). It's kind of like adding 0 to an expression. So why do this? So you can write it down in linear algebra form to analyze it.

fervent shore
lapis sequoia
#

i assume that if there is a zero you may skip reading the other matrix's row, or col, but not anything else

iron basalt
lapis sequoia
#

yeah

iron basalt
#

For example. If you have say two vectors: [0, 1, 0, 0] and [1, 2, 3, 4] and you want to element-wise multiply them. You can skip almost all of the work if you know the index of the 1 in the first one-hot vector.
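
Spelled out as a sketch (plain Python lists; the `sparse_matvec` helper and its triple format are assumptions for illustration, not a library API):

```python
def dense_elementwise(a, b):
    """Multiply every pair of entries, zeros included."""
    return [x * y for x, y in zip(a, b)]


def one_hot_elementwise(hot_index, b):
    """Same result when `a` is one-hot: only one product can be non-zero."""
    out = [0] * len(b)
    out[hot_index] = b[hot_index]
    return out


def sparse_matvec(entries, v, n_rows):
    """Matrix-vector product from (row, col, value) triples of non-zero entries."""
    out = [0] * n_rows
    for row, col, value in entries:
        out[row] += value * v[col]
    return out


a = [0, 1, 0, 0]
b = [1, 2, 3, 4]
print(dense_elementwise(a, b))
print(one_hot_elementwise(a.index(1), b))

# A 3x4 matrix with only three non-zero entries: 3 multiplications instead of 12
entries = [(0, 1, 2), (1, 3, 5), (2, 0, -1)]
print(sparse_matvec(entries, b, n_rows=3))
```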

lapis sequoia
#

matrix mult optimisation is quite nice

#

yea, if 1234 is a matrix, still more savings

iron basalt
#

Yes, if your matrix is like 80% zeros, you get massive gains.

lapis sequoia
#

but that's unlikely if those are actual weights, but i guess it's useful somehow, like you indicated..

iron basalt
#

Not if your weights are sparse...

lapis sequoia
#

why would they?

#

ah maybe in graphs, but only in the first layers

wooden sail
#

if the conv kernel is comparatively small wrt the image, you immediately have humongous sparsity (and you'll notice this is super often the case in CNNs)

fervent shore
iron basalt
lapis sequoia
iron basalt
wooden sail
#

the linear transformation acting on the image is the same size as the image, and everything outside the kernel size is automatically 0

iron basalt
#

I like to show this video to get across the idea, it's well made: https://www.youtube.com/watch?v=0fHkKcy0x_U

Ever run into this funny little puzzle? It appears in Legend of Zelda: Link's Awakening, LEGO Star Wars: The Skywalker Saga, and in a 1995 electronic toy called Lights Out. It turns out that this game has some pretty rich math. In this video, we'll learn about modular arithmetic and the matrix inverse. We'll also learn about substitution ciphers...

lapis sequoia
#

yeah i had to write some graph parsing a while ago

#

actually, that happens in vector encoding of characters

iron basalt
lapis sequoia
wooden sail
#

maybe i can cook something up. in the 1D case, imagine we have a vector of length 15 and we want to convolve it with a convolution kernel [1,2,1]

#

the matricized transformation would look like this

lapis sequoia
#

gonna take some time 🙂

wooden sail
#

!e

import numpy as np
import scipy.linalg as slin
import matplotlib.pyplot as plt

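N = 15
# Zero-padded kernel of length 2N-1 with [1, 2, 1] placed around index N-1
kernel = np.zeros(2*N - 1)
kernel[N-2:N+1] = [1, 2, 1]

# First column: kernel from index N-1 onward; first row: kernel[:N], reversed
M = slin.toeplitz(kernel[N-1:], np.flipud(kernel[:N]))
plt.imshow(M)
plt.savefig("biggest_oof.png")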
#

ugh

#

from my terminal

#

you get a nice sparse, toeplitz matrix representing the convolution
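
A quick way to check that claim without numpy: build the same kind of banded matrix by hand and compare the matrix-vector product against a direct zero-padded convolution. This is a sketch under the assumption of "same"-size output and an odd-length kernel; the helper names are made up.

```python
def conv_same(signal, kernel):
    """Direct 1D convolution with zero padding; output has the signal's length."""
    n, k = len(signal), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0
        for j in range(k):
            idx = i + half - j          # convolution flips the kernel
            if 0 <= idx < n:
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def toeplitz_matrix(n, kernel):
    """n x n matrix with M[i][c] = kernel[i - c + half], zero outside the band."""
    half = len(kernel) // 2
    return [
        [kernel[i - c + half] if 0 <= i - c + half < len(kernel) else 0
         for c in range(n)]
        for i in range(n)
    ]


def matvec(matrix, v):
    """Plain dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


signal = list(range(1, 16))             # vector of length 15
kernel = [1, 2, 1]
M = toeplitz_matrix(len(signal), kernel)

nonzeros = sum(1 for row in M for entry in row if entry != 0)
print(matvec(M, signal) == conv_same(signal, kernel), nonzeros, len(M) ** 2)
```

Only the three central diagonals are populated, so most of the 225 entries are zero, and the sparsity grows with the signal length.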

unkempt apex
#

what should the number of epochs be for training a CNN?

#

I have dataset with approx. 1200 images which are divided into 4 classes

lapis sequoia
#

set a large number (say 200) and use the "early stopping" callback (search for it) @unkempt apex

#

with patience
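
For reference, the logic behind such a callback is small. A framework-free sketch, assuming validation loss is the monitored quantity and using a made-up loss curve (Keras ships this as `keras.callbacks.EarlyStopping` with a `patience` argument; the class below is only an illustration of the idea, not that implementation):

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Made-up validation losses standing in for a real training run
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.71, 0.72, 0.73]

stopper = EarlyStopping(patience=3)
stopped_epoch = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_epoch = epoch
        break

print(stopped_epoch, stopper.best)
```

With a generous epoch budget like 200, the patience value is what actually decides how long training runs.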

unkempt apex
#

earlystopping is good!

lapis sequoia
#

great 🙂

violet gull
#

In RL, if I have an agent whose only goal is staying alive and that is rewarded after every day it's still alive, is there an analytical difference between making the reward per day constant (1, 1, 1) vs increasing (1, 2, 3)?

small wedge
violet gull
#

End of episode

small wedge
#

regardless of the answer to that, there is one difference depending on your implementation. Say you are rewarding agents x score per day survived and 5 score for getting berries, if x increases proportionally, that will make berries less impactful as a source of reward and thus change what the "optimal policies" are for that task during the training.

violet gull
#

Book says rewarding for getting berries is wrong

small wedge
#

idk what you're actually doing I'm just giving a hypothetical

violet gull
#

I am only rewarding for days survived

small wedge
#

what kind of RL are you talking about here, q learning?

violet gull
#

Deep q

small wedge
#

if days alive is the only source of reward and death setting the agent score to 0 for the rest of the sim is the only punishment, I can't think of any analytical reason that increasing the reward over time would change the training compared to keeping it constant

#

but shrug maybe there is one idk
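
One possible difference, sketched under assumptions that are not from the discussion above: returns are undiscounted and survival time is random. With a constant reward the return is the number of days T, while with an increasing reward it is T(T+1)/2, which weights long episodes more heavily and so can rank a risky policy above a safe one. The two policies below are invented for illustration.

```python
def episode_return(days_survived, reward_fn):
    """Undiscounted return: sum of the daily rewards up to the day of death."""
    return sum(reward_fn(day) for day in range(1, days_survived + 1))


def expected_return(outcomes, reward_fn):
    """`outcomes` is a list of (probability, days_survived) pairs."""
    return sum(p * episode_return(days, reward_fn) for p, days in outcomes)


def constant(day):
    return 1


def increasing(day):
    return day


safe_policy = [(1.0, 10)]                 # always survives exactly 10 days
risky_policy = [(0.5, 1), (0.5, 18)]      # dies on day 1 half the time, else 18 days

for name, reward_fn in [("constant", constant), ("increasing", increasing)]:
    print(name,
          expected_return(safe_policy, reward_fn),
          expected_return(risky_policy, reward_fn))
```

Under the constant reward the safe policy has the higher expected return (10 vs 9.5); under the increasing reward the risky one does (55 vs 86), so the preferred policy can change.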

lapis sequoia
#

there is their discord invite in the website.

hearty crown
#

Hello everyone, can anyone tell me where to find or buy Spanish proxies?

hearty crown
lapis sequoia
lapis sequoia
#

thanks !

fervent shore
#

ah I see what it's getting at with 3D convolution being used mostly for something like video convolution

lapis sequoia
#

Could be useful for interesting problems. Video seems likely indeed @fervent shore

#

maybe chemical reactions, for example

#

ahh, the blogpost talks about drug discovery, interesting! i'm just tweaking / rewriting a net for a similar purpose.

ocean pawn
#

I know I shouldn't ask to ask, but would anyone mind doing some sorta code review for a non-linear regression?

lapis sequoia
lapis sequoia
#

I mean...barely but yeah...

#

it is normally included in most books, i guess i consider ai=dl which is unfair

ocean pawn
#

It's fine either way, it seemed to be producing reasonable result

#

So the code is right?

lapis sequoia
#

by non-linear you mean polynomial?

ocean pawn
lapis sequoia
#

i may be able to read it then

ocean pawn
#

Oh, would you mind?

lapis sequoia
#

i wouldn't mind; if you put it on github i'll check it out in a codespace

#

would be good if it's got unittests

#

or whatever is called in python

#

that's a neat way to test it, adding some tests

ocean pawn
lapis sequoia
#

don't worry im a very mediocre coder

ocean pawn
#

This is the first time I'm implementing this kind of algorithm

lapis sequoia
#

probably won't say much either

#

why did you call it non-linear though? I think in terms of linear algebra it's still linear, but I may be wrong

lapis sequoia
#

oh np, i just wasn't sure

#

iirc it's a similar sol to linear regression

ocean pawn
#

In my brain, not straight line must be non-linear

#

I'm probably wrong

lapis sequoia
#

yeah, in terms of linear algebra it isn't non-linear, bc it's linear in terms of the coefficients

lapis sequoia
#

you don't use x1,x2,...xn as variables but as constants.

#

deep linearning is that

#

lol, deep learning

#

i'll go for a walk though, i'll eventually sit and read it

ocean pawn
#

It's fine, I don't expect anyone to check it for me

#

It looks like it's working

lapis sequoia
#

np, i've got nothing to do

ocean pawn
#

The graph looks right

lapis sequoia
#

looks perfect

#

what i found to be a problem is when you have large values and they are closely spaced, i think

#

there are some conditions where it failed (other code, not yours.)

ocean pawn
#

Oh I can't find a dataset, I'm just randomly generating values

lapis sequoia
#

i'll link you one

ocean pawn
#

Oh, thanks!

lapis sequoia
ocean pawn
#

It's kinda sad, they're used by 1.8k, but only have 15 stars

lapis sequoia
#

why sad?

ocean pawn
#

Kinda funny, I suppose? Usually, I would expect a widely used project to have a considerable number of stars

#

As appreciation, I suppose

lapis sequoia
#

oh, JS isn't like that for math

#

but will eventually get there

spare forum
ocean pawn
#

His fault

spare forum
ocean pawn
#

Dunno what to name the axis tho, it's random data

#
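from functools import partial

import jax.numpy as jnp
from jax import Array, jit, random

@partial(jit, static_argnames="data_size")
def generate_data(key: Array, data_size: int) -> tuple[Array, Array]:
    _, subkey = random.split(key)
    # Sorted sample points, uniform over [-500, 500)
    x = jnp.sort(
        random.uniform(key=subkey, shape=(data_size,), minval=-500, maxval=500)
    )
    # x = jnp.arange(-200.0, 200.0, step=30.0)
    # Cubic target: y = 6x^3 + 2x^2 + 1
    y = 2 * jnp.pow(x, 2) + 6 * jnp.pow(x, 3) + 1
    return x, y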
#

x and y, I suppose

spare forum
#

Yeah, just don't leave it blank (good habit)

ocean pawn
lapis sequoia
#

units can be a.u. (arbitrary units)

ocean pawn
#

Imagine if I got x and y flipped, it'll be so embarrassing

ocean pawn
lapis sequoia
#

out of curiosity, what happens if you feed a parabola lying on its side

#

like y = +- sqrt(x)

ocean pawn
lapis sequoia
#

i guess it throws a straight line or smth

#

or it could fail to solve

ocean pawn
#

Does numpy sqrt return both positive and negative?

#

Or just positive

lapis sequoia
#

idk

ocean pawn
#

Wait

#

It's because I got nan

lapis sequoia
#

x has to be positive..!

ocean pawn
#

Ohhhhh

#

Woops

#

I was imagining things

#

/j

#

Your guess is correct

#

It's a straight line

#
@partial(jit, static_argnames="data_size")
def generate_data(key: Array, data_size: int) -> tuple[float32, float32]:
    _, subkey = random.split(key)
    x = jnp.sort(random.uniform(key=subkey, shape=(data_size,), minval=0, maxval=500))
    # x = jnp.arange(-200.0, 200.0, step=30.0)
    # y = 2 * jnp.pow(x, 2) + 6 * jnp.pow(x, 3) + 1
    y1 = +jnp.sqrt(x)
    y2 = -jnp.sqrt(x)
    return x, jnp.add(y1, y2)
lapis sequoia
#

but the points should be a parabola

ocean pawn
#

Weird

#

Why is y all 0

lapis sequoia
#

why are you adding y1, y2?

#

id return x,y1,y2 maybe, in this case

ocean pawn
#

Not -2

lapis sequoia
#

but you get 2 points right (x,y1), (x,y2)

ocean pawn
#

Fixed it

#

Wait

#

How do I do the other half

lapis sequoia
#

also, you can return (x,y1), (x, -y1) isn't it?

#

no need to calc 2 sqrt

ocean pawn
#

Oh I can just *-1

lapis sequoia
#

yes

spare forum
#

abs

ocean pawn
#

I am stupid

#

I am looking for concat, not add

#

Fixed

lapis sequoia
#

nice

ocean pawn
#

Not a perfect straight line it seemed

w: [-2.7694678e-06  5.7525358e-06 -2.8457648e-06] b: 1.0132458783118636e-07
lapis sequoia
#

now you can set y2=0

ocean pawn
#

Gradient descent must be going mad

lapis sequoia
#

no, just removing it, should fit

ocean pawn
lapis sequoia
#

ohh that's gradient descent?

ocean pawn
lapis sequoia
#

you can solve it with a single linear algebra formula, it's got an exact solution i think

ocean pawn
#
@jit
def grad_decend(
    w: Array,
    b: float32,
    learning_rate: float32,
    x_train: Array,
    y_train: Array,
):
    w_grad = jacfwd(lambda w: cost(w, b, x_train, y_train))(w)
    b_grad = grad(cost, argnums=1)(w, b, x_train, y_train)
    temp_w = w - learning_rate * w_grad
    temp_b = b - learning_rate * b_grad
    return temp_w, temp_b, w_grad, b_grad
lapis sequoia
#

but it's great however you do it

#

looks good

ocean pawn
lapis sequoia
#

grad_descent is i think

ocean pawn
ocean pawn
lapis sequoia
#

makes sense

spare forum
#

Linear and polynomial regression with least squared error have a closed-form solution

ocean pawn
#

Can I, a newbie to ml, implement that tho

#

Gradient descent seems to be quite simple

#

(and good enough)

lapis sequoia
#

yes, you can, but it can be tricky with edge cases

#

check wikipedia

spare forum
#

With numpy it should be fine

lapis sequoia
#

you have to calc a couple of matrix transpositions and are done

#

in numpy this means X.T I think

ocean pawn
#

What's it called?

lapis sequoia
#

try linear regression wikipedia

#

and links to other methods

#

or polynomial regression directly...

ocean pawn
#

Least-squares estimation

#

This?

lapis sequoia
#

tbh, these days you may be better off doing gradient descent, but idk

spare forum
ocean pawn
#

Looks intimidating, but I'll have a look

lapis sequoia
#

it's basic linear algebra unless you dig into it

#

not saying it's easy, but it's that

spare forum
#

Wikipedia might have full maths details, maybe you can find simpler things

ocean pawn
#

||Funny thing, I only have basic math knowledge, I am only in high school||
I do surprisingly know more than what school teaches

#

Oh

#

So are you taking the derivative, setting it to 0, and solving?

lapis sequoia
#

yes, that's one approach

ocean pawn
lapis sequoia
#

probably other users know better about the exact details, i don't remember/know that much

spare forum
#

I've literally done this a few years ago like 4 years or smthing I have no idea if I can find it somewhere

lapis sequoia
#

but you don't need to code the derivatives

ocean pawn
#

That's a big if, really, I haven't even been taught how to do derivatives yet (I know some differentiation), but there's nothing stopping me from self teaching ngl

#

I mean my school is still teaching how to use if statements

lapis sequoia
#

you may wait until learning basics of matrices, imho

ocean pawn
#

Even though it's not taught

#

I know multiplication at least

lapis sequoia
#

that's good..

ocean pawn
#

pithink Doesn't hurt learning something new

lapis sequoia
ocean pawn
#

If I can learn python when I'm 11 or something, why can't I learn ml now (cope)

ocean pawn
#

Wikipedia is really intimidating

lapis sequoia
#

uhmm...

#

i took a look but most look intimidating tbh

ocean pawn
#

Looks like calculus is where I start off with

ocean pawn
#

I hope, at least

spare forum
#

You can fool around with libraries, implementing is another story

ocean pawn
#

The reason I can never understand keras/tf

#

is because I have no idea why I am doing certain things

#

By knowing how it works

#

I actually understand what and why

#

I guess I'm just really stubborn and wanna understand it

lapis sequoia
#

if you type "least squares matrix formula"

#

and go to images/videos

#

you will realise what of those may fit your level. try it

ocean pawn
#

Thanks!

lapis sequoia
#

in the end, the sol looks like this: (A^T A)^(-1) A^T b

#

or similar.
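
Written out as a sketch, assuming a degree-d polynomial fit to n points with squared-error loss. Here b is the vector of target values (not the bias term from the gradient descent code), w holds the coefficients, and d and n are symbols introduced only for this illustration:

```latex
\[
A = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^d
\end{pmatrix},
\qquad
L(w) = \lVert A w - b \rVert^2
\]
\[
\nabla_w L = 2 A^\top (A w - b) = 0
\;\Longrightarrow\;
A^\top A \, w = A^\top b
\;\Longrightarrow\;
w = (A^\top A)^{-1} A^\top b
\]
```

The powers of x sit inside A as fixed numbers, so the model is linear in w even though the fitted curve is not a straight line. The last step assumes A^T A is invertible, which is where the edge cases mentioned earlier tend to come from.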

ocean pawn
#

Oh, and to get the derivative for gradient descent, calculus is used, right?

lapis sequoia
#

gradient descent isn't used here

ocean pawn
#

In case where gradient descent is used

#

How do they get derivative?

#

Is it calculus?

lapis sequoia
#

yes, the derivative is calculus

ocean pawn
#

Currently, I only understand what the derivative mean, but I don't know how to do it myself

#

So I'll see
Thanks, everyone

#

(I do wanna do it myself, cause, why not, I am thankful for Jax's autograd tho)

spare forum
#

You only need derivatives and partial derivatives (not that much more complicated)

lapis sequoia
#

ur welcome 🙂

ocean pawn
#

Worst case scenario: I'll understand it in two years (I'll eventually learn them in school)

lapis sequoia
#

certainly, i think we just didn't want to make you waste time and feel frustrated. it's important to have a good learning path.

ocean pawn
serene scaffold
ocean pawn
#

Yeah, my sentence is stupid

lapis sequoia
#

did u guys watch it?

#

Any of you web scrape frequently?

#

i do sometimes, can't help w anything very intricate though

visual violet
#

which text preprocessing for nlp ml tasks are yall using, given that tf.keras.preprocessing.text.Tokenizer is deprecated

visual violet
#

wait wut. keras 3!

lapis sequoia
#

yup

visual violet
#

hmm the newest version

#

so prob legit

lapis sequoia
#

tf 2.16 uses keras3 already

#

under the hood

visual violet
#

it is still keras no?

lapis sequoia
#

because keras is not part of tf

visual violet
#

hmm i guess. but not sure why tf.keras

lapis sequoia
#

it is not very well defined actually, but you'd rather do import keras

#

should work

visual violet
#

sounds good.

lapis sequoia
#

yeah, read the getting started

visual violet
#

thoughts on spacy & nltk?

#

they seem to do similar things, for like tokenization at least

lapis sequoia
#

no idea what that is sorry, maybe others know

visual violet
#

no worries

visual violet
lapis sequoia
#

Starting with TensorFlow 2.16, doing pip install tensorflow will install Keras 3. When you have TensorFlow >= 2.16 and Keras 3, then by default from tensorflow import keras (tf.keras) will be Keras 3.

visual violet
#

thanks!

lapis sequoia
#

welcome !

#

i was so excited about keras 3, cuz it's multibackend, but you will know everything from that page

visual violet
#

i forgot to google lol

visual violet
#

altho i think it is kinda awkward

#

like i am sure tensorflow can do the same thing as pytorch and vice versa. who would bother learning both of them and using 2 for one task lol

#

i could be wrong though

lapis sequoia
#

that's the point

#

you learn only keras

visual violet
#

i read from somewhere that some people think keras is shit

#

cuz it is like "tensorflow wrapper"

#

i think it is more than that but i could be wrong

lapis sequoia
#

to me it's a symbolic computation layer

#

independent of any backend, and extremely powerful

#

it's got almost as many gh stars as pytorch (i know, that's not always the best metric.)

fervent shore
#

Tensorflow is a Keras wrapper

lapis sequoia
#

the low level calculations are carried out by tf (or other backends now..)

#

matmul and such

obsidian mesa
unkempt apex
#

selenium , lxml

lapis sequoia
#

Helllllllo

#

Anyone can DM me if they want to guide me

bright garden
#

it seems like both my validation loss (binary cross entropy) and accuracy curves are going up together, does anyone know how to interpret this?

#

I would imagine the model is overfitting since my train loss is decreasing (and accuracy is 100%)

bright garden
lapis sequoia
#

Are the labels correct?

#

Oh i guess binary classification

#

Hard to say without reading the code for me

bright garden
lapis sequoia
#

(cant open the images here)

#

Sounds like overfitting

bright garden
lapis sequoia
#

If data is random it memorises the input, then acc and loss for training improve, the other 2 shouldn't, although you said validation accuracy goes up?

bright garden
lapis sequoia
#

Sorry i meant validation acc

bright garden
#

A better question rather is whether I should care about the validation loss going up or if I should just care about the accuracy since that's what I'm after anyway

lapis sequoia
#

Uhmm i think you should care but am not an expert

bright garden
#
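def _step(self, batch, _, prefix: str):
    x, y = batch
    # squeeze(-1) drops only the trailing output dimension, so a batch of size 1 keeps its batch axis
    y_hat: torch.Tensor = self.forward(x).squeeze(-1)

    loss = self.criterion(y_hat, y)
    # round() thresholds at 0.5, which assumes y_hat is a probability (sigmoid output), not a raw logit
    acc = 100 * (y_hat.round() == y).float().mean()
    self.log(f"{prefix}_loss", loss)
    self.log(f"{prefix}_acc", acc)

    return loss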
#

same code for both validation & training steps

lapis sequoia
#

Is the validation dataset too small?

bright garden
#

similar results with 80-20 split too

lapis sequoia
#

Sounds smallish for the task isnt it?

bright garden
#

which explains why the training accuracy so easily goes up to 100%

lapis sequoia
#

Maybe you can add a bit of noise

#

Idk what is the std strategy

bright garden
#

Other answers explain well how accuracy and loss are not necessarily exactly (inversely) correlated, as loss measures a difference between raw output (float) and a class (0 or 1 in the case of binary classification), while accuracy measures the difference between thresholded output (0 or 1) and class. So if raw outputs change, loss changes but accuracy is more "resilient" as outputs need to go over/under a threshold to actually change accuracy.

https://stats.stackexchange.com/questions/282160/how-is-it-possible-that-validation-loss-is-increasing-while-validation-accuracy
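
A tiny sketch of that effect, with made-up numbers (the probabilities and labels below are assumptions for illustration, not from the actual model): accuracy only looks at which side of 0.5 a prediction lands on, while binary cross entropy punishes a single confident mistake heavily, so both can go up at once.

```python
import math

def bce(probs, labels):
    """Mean binary cross-entropy for predicted probabilities in (0, 1)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

labels = [1, 1, 1, 0, 0]
early = [0.55, 0.45, 0.45, 0.40, 0.45]   # hesitant predictions, two of them wrong
late = [0.95, 0.90, 0.85, 0.05, 0.999]   # mostly confident and right, one confident mistake

print("early:", accuracy(early, labels), round(bce(early, labels), 3))
print("late: ", accuracy(late, labels), round(bce(late, labels), 3))
```

In this toy case accuracy rises from 3/5 to 4/5 while the loss more than doubles, all because of the last sample.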

lapis sequoia
#

Id try to get 10x that n at least, I'd try reducing the network params as well

bright garden
#

This makes sense too, maybe the model is just learning to predict values close to 0.5

bright garden
lapis sequoia
#

Yes, so that you can get an idea about the behaviour

bright garden
#

Yeah no, similar results with smaller models too

bright garden
#

Maybe it's a better strategy predicting amount of movement in the stock than classifying up/down, that might get rid of the "bad predictions getting worse" problem

lapis sequoia
#

Possibly, or using rnns, unless it is an rnn

bright garden
lapis sequoia
#

Cause those should capture more info

#

So normally you want smth that takes context info i think

bright garden
lapis sequoia
#

Actually rnn would be if you need to predict a continuation of the sequence of the real values i think, so you might be fine. But i wonder whether some rnn like structure wouldn't do better.

#

Maybe smone else can help further

lusty lotus
graceful garden
spare forum
#

Need regularization, early_stopping for example etc

unkempt apex
#
early_stopper = EarlyStopper(patience=5, min_delta=0.01)
is this good?
#

but after 10 epochs the training stops

lapis sequoia
#

I'd use smaller than 0.0001 i think but post says better

unkempt apex
#

0.0001 ?

lapis sequoia
#

Yeah try
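
The `EarlyStopper` class itself isn't shown in the chat, so the version below is a hypothetical stand-in with the same two arguments. It is only a sketch of why a large `min_delta` can end training after a handful of epochs: improvements smaller than `min_delta` are counted as no improvement at all.

```python
class EarlyStopper:
    """Hypothetical early stopper: stop after `patience` epochs without
    an improvement of at least `min_delta` in validation loss."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def epochs_run(losses, **kwargs):
    """Return the epoch (1-based) at which training would stop."""
    stopper = EarlyStopper(**kwargs)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(losses)

# made-up validation loss that improves slowly but steadily (0.001 per epoch)
losses = [1.0 - 0.001 * i for i in range(50)]

print(epochs_run(losses, patience=5, min_delta=0.01))    # stops early
print(epochs_run(losses, patience=5, min_delta=0.0001))  # runs all 50 epochs
```

With these assumed losses, `min_delta=0.01` stops at epoch 6 even though the loss is still going down, while `min_delta=0.0001` lets the run finish.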

misty shuttle
#

Which is better for ML/AI beginners- tensorflow or pytorch? I am open to learning both though

#

the question is basically what do i learn first

unkempt apex
#

pytorch

misty shuttle
#

aight ty

unkempt apex
#

again stopping at 14

#

goal is 100

spare forum
unkempt apex
#

yeah but it is predicting cloudy image, as rainy!!😂

spare forum
#

Skill issue ducky_concerned

unkempt apex
#

yeah!💀

spare forum
#

Not even joking, if the problem is tf, the problem probably isn't tf ducky_concerned

#

Pytorch is good too tho

unkempt apex
#

tf??

#

what should be dropout rate for CNN?

#

0.5 is current!

#

because the model is getting overfitted too quickly!! should I increase that?

#

ignore that!!

#

current accuracy is good enough actually, which is 93

lapis sequoia
#

tensorflow=tf

unkempt apex
#

I don't use that!
only pytorch!

spare forum
#

The misunderstanding was big here lol

#

I was responding to you saying pytorch as an absolute choice just saying tf=tensorflow is also good

lunar wharf
verbal oar
#

how can I workaround input?
maybe hardcode it?

#

i mean when deploying i dont have access to type text

#

but still i want to show to someone result

#

type name of product

#
text = input("type name of product (e.g beer): ")
test = pred(text)

print("predicted label index e.g 0 - chips: ", test)

predicted = ""
if test == y[test]:
    predicted = yTxt[test]

print("do you want to add", predicted + "?")

accuracy = np.sum(y == preds) / len(y)

print("Model Accuracy = {}".format(accuracy))
#

it stops on input

#

no problem when running locally but then i type text to input

#

here I dont have possibility

#

here on render deployed

#

hmm maybe replace input with HTML input

#

i mean change to graphical from console

#

so then I must use e.g streamlit
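
If a full web UI is more than is needed for a demo, one possible workaround is to stop calling `input()` and take the product name from a command-line argument or an environment variable instead. This is only a sketch: the `PRODUCT_NAME` variable name and the `"beer"` default are assumptions, not anything the hosting platform defines.

```python
import os
import sys

def get_product_name(argv, env, default="beer"):
    """Pick the product name without blocking on input().

    Order of preference: first command-line argument, then the
    (hypothetical) PRODUCT_NAME environment variable, then a default.
    """
    if len(argv) > 1:
        return argv[1]
    return env.get("PRODUCT_NAME", default)

text = get_product_name(sys.argv, os.environ)
print("product name:", text)
```

The result can then be passed to `pred(text)` exactly as before, and the script no longer hangs waiting for a console that isn't there.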

lapis sequoia
#

Guys what do you think of this Value Network Architecture for Inverted Double Pendulum v4?


value_net = nn.Sequential(
    nn.LazyLinear(num_cells, device=device), # num_cells = 256
    nn.Tanh(),
    nn.ReLU(),
    nn.LazyLinear(num_cells, device=device), # num_cells = 256
    nn.Tanh(),
    nn.LazyLinear(num_cells, device=device), # num_cells = 256
    nn.Tanh(),
    nn.LazyLinear(1, device=device),
)
Is there anything that can be improved?
ocean pawn
#

I think I managed to understand derivative

#

Seemed like power rule and chain rule is enough

#

For the time being at least

#

(I even managed to understand how to do derivative for sigmoid function)

#

I assume I'll need it for binary classification? (Right?)
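
For reference, applying the chain rule to sigmoid(x) = 1 / (1 + e^(-x)) gives sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). A quick sanity check of that formula against a finite-difference estimate (the helper names are just for this sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The two columns should agree to many decimal places, which is a handy way to check any derivative worked out by hand.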

wild coral
#

guys, if anyone has studied deep learning, does anyone know why mini batch gradient descent is said to be more efficient than batch gradient descent? I'm watching Andrew Ng's deep learning course 2. Without any parallelization, I would think mini batch is inherently the same if not slower than batch computing, because you are purposely breaking up the already vectorized operations in favor of looping sequentially over the training samples

serene scaffold
#

I'll elaborate when I get a chance

#

in the meantime, can you tell me what you think the difference is between batch and mini-batch?

wild coral
small wedge
#

Right but you make updates after each minibatch you calculated

#

Where you are on the gradient therefore changes, so you cannot parallelize those calculations

wild coral
#

right so i understand you can parallelize each mini-batches gradient computation, but im clarifying whether mini batch is inherently faster than batch (without parallelizing each batches gradient computation)

#

and also in what manner is it faster, in computation time, or for convergence

small wedge
#

That was a typo, you can't parallelize minibatches

formal flume
#

Anyone here familiar with discretization schemes in quant finance for the heston model? I am trying to implement a model but having a few issues (one really)

small wedge
#

It's faster in convergence time because you trade the accuracy of your gradient estimate for the speed of your convergence. Say you have a 10,000 sample dataset and in batch you use every sample before taking a step. Now say in mini batch you take a step (update your model) after every 20 samples: you have made 500 updates to your model in a single epoch instead of 1.

wild coral
#

so you are updating more frequently but each epoch of updates is slower

small wedge
#

The amount of wait time/compute is technically more in mini batch (per epoch) than in batch yes. But the actual convergence speed is faster for mini batch.

wild coral
#

so when you say converges faster, you mean it takes less number of epochs to converge?

small wedge
#

That's one way to put it yes.

wild coral
small wedge
#

one way to think about this is an analogue to float precision. We generally use 32 bit floats as the default for machine learning models because, although you can get more precise gradient estimates using 64 bit floating points, that extra precision doesn't help us converge any faster really. mini batch GD is taking that idea to the extreme, we don't need every sample to get a gradient estimation that lets us step in the right direction; assuming our dataset is properly balanced then that handful of samples should be a good enough idea of where to go for us to safely take a step.

wild coral
small wedge
#

yeah you make updates on every mini batch per epoch

wild coral
#

ok

#

is MB faster in runtime compared to Batch GD? as in takes less time to converge

small wedge
#

yes, mini batch gives us much faster convergence

wild coral
#

and why is that? I feel like I am going in circles, but you are still just breaking up a larger task into smaller tasks but a lot more tasks

small wedge
#

think of it like this

#

a batch epoch is: compute gradient estimate across all samples -> update model

#

two steps

#

a mini batch epoch is: compute gradient estimate on batch 1 -> update model -> compute gradient estimate on batch 2 -> update model -> ...

#

you are basically having lots of little "batch epochs" in a single "minibatch epoch"

#

you are basically getting the same result per 2 steps, just with mini batch you do a hell of a lot less computational work to get that result, and do it many times

wild coral
small wedge
#

I don't think it's a great analogy because the speed of the flights is the same or worse when you break it up. A better analogy might be:

You have 10 seconds to get as far down a hill as you can, you can lay on your stomach and measure exactly the angle of the hill in front of you then gently take a step, or you can just take a glance and jump down the hill. Both will get you in about the same place, but you can do the second one way more times in 10 seconds than you can do the first

wild coral
#

but in mb gradient descent, your step size is smaller than in b gradient descent

small wedge
#

what makes you say that? pithink

wild coral
#

thats what andrew ng said lol

#

not exactly sure why that is the case

small wedge
#

the magnitude of your step down the gradient might be smaller, or the noise of your data might mean on average the step you take doesn't get you as close to the local/global minimum

#

but 100 mini batch steps vs 10 batch steps favors mini batch greatly

#

you will be much farther down the gradient in mb

wild coral
#

ok wait side question, at the end of all epochs you would have an array of costs for each minibatch, how do you reconcile that at the end for a scalar cost

small wedge
#

no you do the updates per mini batch, you don't calculate all the costs then update at the end

#

it's the exact same mathematical algorithm as batch gd you just do it on way less samples, over and over until you run out of samples
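
A minimal sketch of that point on a toy problem (one weight, noiseless data, learning rate 0.1, all of which are assumptions chosen to keep it short): both runs below go through the same 10,000 samples exactly once, the only difference is how often the weight gets updated along the way.

```python
import random

def grad(w, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def run_epoch(w, xs, ys, batch_size, lr=0.1):
    """One pass over the data, updating w after every batch.

    Returns the final weight and the number of updates made.
    """
    updates = 0
    for start in range(0, len(xs), batch_size):
        xb = xs[start:start + batch_size]
        yb = ys[start:start + batch_size]
        w -= lr * grad(w, xb, yb)
        updates += 1
    return w, updates

rng = random.Random(0)
xs = [rng.random() for _ in range(10_000)]
ys = [3.0 * x for x in xs]  # the true weight is 3.0

w_batch, n_batch = run_epoch(0.0, xs, ys, batch_size=len(xs))  # batch GD
w_mini, n_mini = run_epoch(0.0, xs, ys, batch_size=20)         # mini-batch GD

print("batch:     ", n_batch, "update(s), w =", w_batch)
print("mini-batch:", n_mini, "update(s), w =", w_mini)
```

After one epoch the batch run has taken a single step away from 0, while the mini-batch run has taken 500 steps and sits essentially at the true weight. The per-epoch arithmetic is about the same; what changes is how much progress each epoch buys.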

wild coral
#

oh right

#

then that is not parallelizable then?

small wedge
#

it is not

wild coral
#

between epochs?

small wedge
#

you cannot parallelize mini batch

#

ig you could technically parallelize a single minibatch calculation across multiple processors and aggregate but I don't think anyone does that, you don't need parallelization because it's so much faster than regular GD

past meteor
# wild coral then that is not parallelizable then?

Yes and no. You really should view most of stochastic gradient descent as matrix multiplication. You multiply your batch, a tensor, with your weights (a matrix or a tensor), calculate the loss and subsequently do your update. Multiplying matrices is parallelizable

gray citrus
#

spent the last 3 hours trying to figure out
why read_csv() wont read all the rows in my text file.

tried polars , tried pandas nothing worked,

pandas did work but only kinda ,
my raw data had 40k rows , but it was only reading 36k and dropping shit with no warnings or errors what so ever.

after trying a billion things setting an argument 'quoting=3' and it just fucking works,
but the lines that were being dropped had no quotes in them to begin with

I wanna die
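
For what it's worth, `quoting=3` is `csv.QUOTE_NONE`. One possible explanation (a guess, not verified against the actual file) is a stray `"` on some earlier line: with default quoting, everything up to the next `"` gets swallowed into one field, so the lines that vanish are the quote-free ones in between. A sketch with the standard `csv` module and made-up data:

```python
import csv
import io

# made-up file: a stray quote opens on the "1" line and closes on the "3" line
data = 'a,b\n1,"x\n2,y\n3,z"\n4,w\n'

# default quoting: the quoted field spans three physical lines
default_rows = list(csv.reader(io.StringIO(data)))

# QUOTE_NONE (the value 3): quote characters are treated as ordinary text
raw_rows = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONE))

print(len(default_rows), default_rows)
print(len(raw_rows), raw_rows)
```

Here the default parse returns 3 rows and the `QUOTE_NONE` parse returns 5, with no warning in either case.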

wild coral
past meteor
#

Matrix multiplication with numpy is already multithreaded and with jax/torch you can go a step further and run it on a gpu for even more parallelism EDIT: fixed it, thanks to yo

hearty depot
#

^

#

and also askshually numpy is cpu only 🤓

past meteor
#

Ah, the second numpy is meant to be torch

wild coral
past meteor
#

You need to do it serially because you do an update of the weights after each batch

wild coral
#

if you could imagine one is O(N) would the other not be O(N / n) where n is number of processes

wild coral
past meteor
#

code it up 😄

#

shouldn't take longer than a couple of hours

#

at most

glass ridge
#

guys, i was wondering what is the best module for me to learn and start with as a beginner, pytorch or tensorflow

serene scaffold
hearty depot
late lichen
#

Uhm guys can you give me resources about back prop?

#

Or gradient descent?

#

Ping pls

small wedge
hearty depot
#

which is just chain rule

true gulch
#

Hello, are you guys familiar with prototype_path and class name script?

hollow sentinel
late lichen
small wedge
#

yeah the paper I linked does that

marble turtle
#

Hey guys, Can anyone help me with setting up the apache spark environment?

lapis sequoia
urban pendant
#

Hello guys

hollow sentinel
orchid forge
#

hello

true gulch
#

The path in the script isn’t working I’m guessing it’s because of .npy file

orchid forge
#

I'm making a project on an fred website economy data, can someone help me generate questions regarding it?

#

i am new in making personal project i haven't got confidence till now to create my own questions for a project that i'm making

#

this is how the data looks

dark minnow
#

Hi! I want to ask a question, so I want to create a chatbot and i want this chatbot to get information from the websites using ai. i know python a little bit i completed some python 3 courses and im going to start the intermediate course and like review python. However, I don't know where to start building the chatbot and which ai should i use etc.

can someone tell me what do i need to create a chatbot? some says use openai assistant other says use gemini and while others say use dialogflow etc.

topaz abyss
#

webscraping?

#

you want it to webscrape?

left tartan
lapis sequoia
#

keras guides are really superb...

deep sleet
#

If you are deciding the batch size in a model then what's the use of number of steps per epoch param?

#

isn't it automatically decided by dividing the number of data points in the dataset by the batch size?

lapis sequoia
#

i'd think so

orchid forge
lapis sequoia
#

the steps per epoch may be useful when you feed the whole data set
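
As a rough sketch of the arithmetic (the function and its `drop_last` flag are illustrative names, not a specific library's API): when the dataset size is known, the number of steps follows directly from the batch size, so an explicit value mostly matters when the data source has no known length, such as a generator that never ends.

```python
import math

def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to see every sample once.

    With drop_last=True the final, smaller batch is skipped.
    """
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(10_000, 20))
print(steps_per_epoch(1_000, 64))
print(steps_per_epoch(1_000, 64, drop_last=True))
```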

deep sleet
slender meadow
#

Hello respected members , as you all can see that i have just joined in this community

#

I want to become an AI engineer but i dont knows the proper steps

#

Can anyone guide me about the roadmap to become one

serene scaffold
slender meadow
#

I just complete my high school

serene scaffold
slender meadow
#

yes

#

but its 50\50 percent

serene scaffold
#

what do you mean by that?

slender meadow
#

like i am still not sure whether i will get a seat in the desired uni

#

cz thats the only good in uni in my city

serene scaffold
#

what country are you in?

slender meadow
#

so i m trying not to depend on my uni too much

#

INDIA

#

I hail from north eastern india which doesnt have good educational instituition thats y

serene scaffold
#

you will need a university education in CS with an emphasis in AI to be able to have a career in this space.

slender meadow
#

Yes i have opted for that

#

Most probably i will get it

serene scaffold
#

okay, so take as many AI-related courses as you can, and make sure you're taking the appropriate math prerequisites

slender meadow
#

I have aplied for CS in Data science and AI

#

Yes the uni has already set courses which include discrete maths , linear algebra etc

lapis sequoia
lapis sequoia
#

Nice

bleak reef
#

lol

lapis sequoia
#

I mean wouldnt you recommend just learn the topic by yourself with books ?

slender meadow
#

Sir may i ask from which country do u hail frm

serene scaffold
bleak reef
#

it takes alot of effort and hardwork to do it without a good uni

slender meadow
#

i also think so

bleak reef
lapis sequoia
#

Okay. I just have one DS class which teaches the basic algorithms with Neural Networks.

slender meadow
#

what aside from uni studies what can i do to achieve my desired goal

lapis sequoia
#

I have modules like Data Driven Decision where you work with python mostly to predict something

serene scaffold
lapis sequoia
#

But it's not a lot of theory

slender meadow
#

Like i dont have any coding background

serene scaffold
slender meadow
#

and i want to learn Python

bleak reef
slender meadow
#

by myself

serene scaffold
arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

bleak reef
slender meadow
#

Is there any steps or routine i need to follow like a roadmap

bleak reef
slender meadow
#

@serene scaffold which country are u from

#

@bleak reef yes i k them

#

@serene scaffold are u a working professional

#

@serene scaffold what type of work u do like ai or web developer etc

serene scaffold
slender meadow
#

@bleak reef are u also from india

slender meadow
#

@serene scaffold sounds pretty impressive

#

@bleak reef are u a student

bleak reef
#

yeah in 3rd year of my CS degree

slender meadow
#

@bleak reef is there anything i need to be aware of before starting my cs degree as i will start my first sem from next month

#

@serene scaffold is there any roadmap to learn python

arctic wedgeBOT
#
Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

bleak reef
slender meadow
#

@bleak reef actually i dont believe in the teaching faculty of uni cz i heard the faculty is average

lapis sequoia
#

I would recommend watching some random python yt course to grasp the basics and then look up the libraries you probably need.

#

And work with code in general

bleak reef
slender meadow
#

oh i see

bleak reef
#

and be good at maths

slender meadow
#

hmm

slender meadow
#

Is there tips to learn code better

serene grail
deep sleet
serene scaffold
deep sleet
#

oh so stuff similar to chat gpt?

#

sorry for my ignorance

serene scaffold
#

yes

slender meadow
#

@serene grail thnx

umbral blaze
#

Hello everybody. I was wondering if anyone knew good resources to learn how to build AI applications using Python before college?

past meteor
#

@warm copper the slopes and the means are functionally the same

warm copper
#

For instance, if you're analysing the impact of different factors on delivery times, regression could help you. It can easily predict how changes in these factors affect the overall supply chain. If you're comparing the average delivery times across different regions or transportation methods, ANOVA would be more appropriate.

past meteor
#

if you have a binary variable yes/no and you make a dummy the intercept is the mean of yes or no

#

and the "YES" variable is the difference of the mean

#

Can you not see how this is the same?

#

It's presented differently in R but it's fundamentally the same thing repackaged

warm copper
past meteor
#

Hence why more and more programs don't teach ANOVA anymore altogether

warm copper
#

I literally used ANOVA during my internship

past meteor
#

Don't take this the wrong way but are you reading what I'm saying?

warm copper
#

I am but I still use them for different purposes

past meteor
#

If you ONLY have 1 categorical variable with 3 levels and you make 2 dummies

#

red, blue and green. You make a column for red and a column for green

#

It's clear that beta_0 is the mean of blue, yes?

#

It's also clear that the coefficients beta_1 and beta_2 are expressing the difference of the means of red and green compared to blue, right?
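
A small sketch of that claim with made-up numbers (the data and the tiny least-squares solver are assumptions for illustration, not anyone's actual analysis): fit a regression on an intercept plus red and green dummies, and compare the coefficients with the plain group means.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# made-up measurements per colour
data = {"blue": [10.0, 12.0, 14.0], "red": [20.0, 22.0], "green": [5.0, 7.0, 9.0, 11.0]}

rows, ys = [], []
for colour, values in data.items():
    for v in values:
        # columns: intercept, red dummy, green dummy (blue is the baseline)
        rows.append([1.0, float(colour == "red"), float(colour == "green")])
        ys.append(v)

beta_0, beta_red, beta_green = ols(rows, ys)
mean = {c: sum(v) / len(v) for c, v in data.items()}

print("beta_0    ", beta_0, "| mean(blue)             ", mean["blue"])
print("beta_red  ", beta_red, "| mean(red) - mean(blue)  ", mean["red"] - mean["blue"])
print("beta_green", beta_green, "| mean(green) - mean(blue)", mean["green"] - mean["blue"])
```

The intercept lands on the blue mean and each dummy coefficient on that group's difference from blue, which is the same comparison of group means that a one-way ANOVA is built on.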

warm copper
#

I used ANOVA in DOE a lot too

#

Instead of Regression

#

Yeah I understand that

#

but you cant do logistic regression with ANOVA

#

multiple regression in my bulletpoint doesnt refer to linear regression

#

it can be logistic or polynomial regression too

past meteor
#

Exactly

#

All I'm saying is that if it's a stats heavy data scientist role and you don't see that linear regression subsumes anova

#

you may have an issue during the interview

#

And for that reason I avoid those roles like the plague

warm copper
#

I mean ANOVA can be used as a linear regression for sure. You can use them interchangebly

#

but the output i get from each of them are different

#

but they achieve the same thing

past meteor
#

Other fun questions I get at interviews is how random forest and xgboost work

#

For some reason every technical interview has asked that

warm copper
#

ANOVA table puts out more statistical stuff

#

like p values, F score, degrees of freedom

#

you don't see that kind of stuff on linear regression output

past meteor
#

Depends on the library of course

#

We used Stata (🤢) for this stuff in uni and afaik you get F scores there with your regression output

warm copper
#

this is from my undergrad thesis @past meteor

past meteor
#

I'm very picky, are you sure you want me to look? 😂

warm copper
#

yup

past meteor
#

My first question is, did the people that die specifically die due to heart failure or could it be anything

#

If you had a car crash in september are you added to the group that passed?

warm copper
#

they died specifically from it

past meteor
#

stepwise regression is a no-go

warm copper
#

dataset part of project states that

past meteor
#

many papers on why you shouldn't use stepwise

warm copper
#

Well we only learnt stepwise, logistic and linear regression in that course

past meteor
#

damn

warm copper
#

It was called Applied Regression Analysis

past meteor
#

stepwise + AIC = 🥴

#

No Lasso?

warm copper
#

no

past meteor
#

damn

warm copper
#

The only lasso I know is the regularization technique

past meteor
#

Yeah

warm copper
#

which I learnt in ML

past meteor
#

that one

#

https://www.reddit.com/r/statistics/comments/7bvo6m/why_is_stepwise_regression_criticized/

From what I've been taught about stepwise regression, the problem is how very atheoretical it is - and also how dependent it is on sample characteristics. Since predictors are often at least a little correlated, using the exact same set of variables with stepwise selection on two different datasets will often get you a VERY different solution.

Hell, if you run it forward and backward on the same dataset you'll often get very different solutions.

For models that are inherently multivariate, pretending they aren't (doing a bunch of pairwise comparisons, one variable at a time) is generally not the best way to go.

#

I guess that if your prof is teaching you stepwise, then I can see why you did it

warm copper
#

I would use DT on this dataset now

past meteor
#

I'm mostly missing a residual analysis

warm copper
#

😄

past meteor
#

Your regression formula assumes a linear relationship between each variable and the outcome, with 0 interactions

warm copper
#

Homoscedasticity

#

I learnt all those later in my degree

past meteor
#

Well, you could plot the residuals wrt each variable and find out if it's homoscedastic or not indeed

#

isn't the bachelor thesis at the very end?

warm copper
#

Assumptions of Linearity

#

Nag

past meteor
#

strange

warm copper
#

Nah this was more like a class project

#

It was a very tough class tho

#

We had 24 peeps when we started and ended with 8

past meteor
#

I glossed over all the statistical tests because it's been too long for me and I never use them at work

#

yeah, I think it's mostly stepwise and no iterative, "data driven" approach to modelling

#

but I guess you saw that in other classes

warm copper
#

yup

#

This is my current project for deep learning @past meteor

past meteor
past meteor
#

Mostly matters for interpretation of your results

#

Hence why I'm always afraid of interpreting regression coefficients, it's risky business if you're not a statistician pur sang

past meteor
#

Or maybe I take this too seriously 🤷

warm copper
#

try now lol

#

im using transformers

#

to detect colon cancer

#

I managed 99 percent accuracy

#

👽

past meteor
#

I'm not gonna read this ngl haha

warm copper
#

LOL

past meteor
#

it's very notebooky

warm copper
#

yusss

past meteor
#

you need to read it top to bottom

#

And can't join in the middle and see what's going on

warm copper
#

i said to myself

past meteor
#

it's 10 pm for me on a friday, aint gonna do that rn 🤣

warm copper
#

if I cant get a job i will become a college prof

past meteor
#

What I'd look for here is if you're leaking data or not

warm copper
#

i will do a phd and stay in the college

past meteor
#

If you get 99 % accuracy you should be worried, not happy imo

warm copper
#

transformers are really powerful

#

basically you are training your model on top of a pretrained model

#

I used ViT

past meteor
#

which is what you do with say resnet as well

warm copper
#

yup

#

what is worrying me is the fluctuations in epochs @past meteor

#

Epoch: 3 | train loss: 0.1296 | test accuracy: 1.00
Epoch: 3 | train loss: 0.1374 | test accuracy: 1.00
Epoch: 4 | train loss: 0.2167 | test accuracy: 0.94
Epoch: 4 | train loss: 0.2823 | test accuracy: 0.94
Epoch: 4 | train loss: 0.5354 | test accuracy: 0.88

#

Epoch: 4 | train loss: 0.2172 | test accuracy: 0.94
Epoch: 4 | train loss: 0.0596 | test accuracy: 1.00

#

something is off

agile cobalt
#
# Get the next batch for testing purposes
test = next(iter(test_loader))
test_x = test[0]
that `iter()` is either redundant or a bug
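
One possible reading of the jumpy numbers: 0.94 and 0.88 are what 15/16 and 14/16 round to, which would fit test accuracy being measured on a single batch of 16 pulled with `next(iter(test_loader))` rather than on the whole test set. A sketch of the difference, using a plain list of batches as a stand-in for the real loader and a toy rule as a stand-in for the model (both are assumptions):

```python
def accuracy_on_batch(predict, batch):
    """Accuracy on one (inputs, labels) batch."""
    xs, ys = batch
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

def accuracy_on_loader(predict, loader):
    """Accuracy accumulated over every batch in the loader."""
    correct = total = 0
    for xs, ys in loader:
        correct += sum(predict(x) == y for x, y in zip(xs, ys))
        total += len(ys)
    return correct / total

def predict(x):
    """Toy 'model': predicts class 1 for odd inputs, 0 for even ones."""
    return x % 2

# stand-in for a DataLoader: a list of (inputs, labels) batches
test_loader = [
    ([1, 2, 3, 4], [1, 0, 1, 0]),
    ([5, 6, 7, 8], [1, 0, 1, 1]),
    ([9, 10, 11, 12], [0, 0, 0, 0]),
]

print("first batch only:", accuracy_on_batch(predict, next(iter(test_loader))))
print("whole test set:  ", accuracy_on_loader(predict, test_loader))
```

With only a handful of samples per batch, a single wrong prediction moves the number by several points, so the per-batch figure bounces around even when the model itself has barely changed.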
warm copper
#

it's redundant

#

but what do you think about the epochs @agile cobalt

#

why such high fluctuations?

#
Epoch:  6 | train loss: 0.1894 | test accuracy: 0.94
Epoch:  6 | train loss: 0.0428 | test accuracy: 1.00
Epoch:  6 | train loss: 0.1503 | test accuracy: 0.94
Epoch:  7 | train loss: 0.3682 | test accuracy: 0.94
Epoch:  7 | train loss: 0.3046 | test accuracy: 0.94
agile cobalt
#

idk batch size too small?

warm copper
#

its 16

agile cobalt
#

don't you have 8 classes or so

warm copper
#

yup

#

why?

agile cobalt
#

never mind, I got a bit confused
(thinking about how many classes it'll see each iteration, but that probably shouldn't matter)

warm copper
#

lol

#

I changed my batch size to 64 from 16 now

#

and LR to 0.01 from 0.00001

#

Epoch: 0 | train loss: 2.2913 | test accuracy: 0.12
Epoch: 0 | train loss: 1.7133 | test accuracy: 0.27
Epoch: 1 | train loss: 2.4397 | test accuracy: 0.11

#

Made it worse lol

agile cobalt
#

maybe try smaller just to see what happens

warm copper
#

yeah maybe like 8?

#

for batch_size?

agile cobalt
#

or even 4

0.01 learning rate is probably too high though

warm copper
#

EPOCHS = 50
BATCH_SIZE = 8
LEARNING_RATE = 0.000001

#

Im gonna do this lol

#
Epoch:  5 | train loss: 0.6605 | test accuracy: 1.00
Epoch:  5 | train loss: 1.0521 | test accuracy: 0.75
Epoch:  5 | train loss: 0.5068 | test accuracy: 1.00
Epoch:  5 | train loss: 0.7508 | test accuracy: 1.00
#

lol 🥲

past meteor
#

That in and of itself is a bit strange

warm copper
#

I increased dropout layer to 0.5

#

it goes through each batch

#

you can make it print once

#

it gives more detail on what's going on in each batch

past meteor
#

evaluating on the validation set each batch is strange

#

I'd just have 1 line per epoch tbf

warm copper
#

fixed it

past meteor
#

Anyway, if your model's performance is really good odds are you shouldn't be celebrating but rather looking for where you have a leak

#

If you've exhaustively searched and you find nothing then you can celebrate
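
One cheap check to start with (a sketch only, and just one of several ways data can leak): look for test samples that also appear byte-for-byte in the training set. The sample bytes below are placeholders; in practice they would be the raw file contents.

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash of one sample, used to spot exact duplicates."""
    return hashlib.sha256(sample).hexdigest()

def find_leaks(train_samples, test_samples):
    """Indices of test samples whose content also appears in the training set."""
    train_hashes = {fingerprint(s) for s in train_samples}
    return [i for i, s in enumerate(test_samples) if fingerprint(s) in train_hashes]

# placeholder data standing in for image files
train = [b"img-a", b"img-b", b"img-c"]
test = [b"img-d", b"img-b"]

print(find_leaks(train, test))
```

This only catches exact copies. Near-duplicates, such as crops or patches from the same slide or patient, need a split done at the source level instead.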

harsh sun
#

I did 50 epochs of training on my model, and on the 11th epoch I got 92% training and validation accuracy. How can I like select the epoch with the best training and validation when I train it next so that I can make that configuration the permanent one?

#

Also, when doing the hyperparameter search, when I get the best parameters on the next run it always changes. So I finally ran it and I saw good parameters and I hard coded those in and I got better results. Why isn't that the standard instead of them changing every time
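
A common pattern for the first question is to keep a copy of the model state whenever the validation score improves, then restore that copy at the end. The sketch below uses toy stand-ins (a dict as the "model", scripted validation accuracies), so every name in it is hypothetical; with PyTorch the saved object would typically be a copy of `model.state_dict()`.

```python
import copy

def train_with_best_checkpoint(state, epochs, train_one_epoch, evaluate):
    """Train for `epochs` epochs and return the state from the best-scoring one."""
    best_score = float("-inf")
    best_state = None
    best_epoch = None
    for epoch in range(1, epochs + 1):
        train_one_epoch(state)
        score = evaluate(state)
        if score > best_score:
            best_score, best_epoch = score, epoch
            best_state = copy.deepcopy(state)  # snapshot, so later epochs can't overwrite it
    return best_state, best_epoch, best_score

# toy stand-ins: "training" bumps a counter, validation accuracy is scripted
val_acc = [0.70, 0.85, 0.92, 0.90, 0.88]

def train_one_epoch(state):
    state["epoch"] += 1

def evaluate(state):
    return val_acc[state["epoch"] - 1]

model_state = {"epoch": 0}
best_state, best_epoch, best_score = train_with_best_checkpoint(
    model_state, 5, train_one_epoch, evaluate
)
print(best_epoch, best_score, best_state)
```

On the second question, results that change from run to run usually come from random initialisation and shuffling, so fixing the random seeds is the usual first step toward repeatable searches.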

glass ridge
warm copper
#

i use pytorch

glass ridge
warm copper
#

college

#

😄

#

work

glass ridge
warm copper
#

are you?

glass ridge
#

1 year to graduate and choose a specializaton

past meteor
#

Sadly the answer will be the same as I give you with numpy 😅

#

The documentation

glass ridge
#

then see neural nine video

past meteor
#

The pytorch documentation has a "learn" section

#

it's how I learnt pytorch. It's mostly the same as Numpy and Tensorflow

#

no books, no videos, just the docs

glass ridge
past meteor
#

that's how you should learn to use tools imo. If the tools have bad docs, just use a different one if you have a choice

glass ridge
past meteor
#

I really don't like the idea of learning from courses

#

I explained last time already why not

#

You should pick a book like the ones in the pinned post I list

#

They all have exercises

past meteor
#

do you know how to find pinned messages on discord?

#

no problem if you don't

glass ridge
#

i know

past meteor
#

The second pinned message is about books I recommend

glass ridge
#

from Raggy?

past meteor
glass ridge
serene scaffold
past meteor
glass ridge
past meteor
#

Can you first clarify what you mean with API first

#

it doesn't need to be perfect, just describe it in your own words so I understand what you mean

warm copper
#
Epoch:  31 | train loss: 0.1147 | test accuracy: 1.00
Epoch:  32 | train loss: 0.0760 | test accuracy: 0.75
Epoch:  33 | train loss: 0.0796 | test accuracy: 0.88
Epoch:  34 | train loss: 0.0718 | test accuracy: 0.88
Epoch:  35 | train loss: 0.0762 | test accuracy: 0.88
Epoch:  36 | train loss: 0.0627 | test accuracy: 1.00
Epoch:  37 | train loss: 0.0548 | test accuracy: 0.88
Epoch:  38 | train loss: 0.0607 | test accuracy: 1.00
Epoch:  39 | train loss: 0.0512 | test accuracy: 0.88
Epoch:  40 | train loss: 0.0575 | test accuracy: 1.00
Epoch:  41 | train loss: 0.0567 | test accuracy: 1.00
Epoch:  42 | train loss: 0.0506 | test accuracy: 1.00
Epoch:  43 | train loss: 0.0588 | test accuracy: 1.00
Epoch:  44 | train loss: 0.0361 | test accuracy: 1.00
Epoch:  45 | train loss: 0.0434 | test accuracy: 1.00
Epoch:  46 | train loss: 0.0401 | test accuracy: 1.00
Epoch:  47 | train loss: 0.0331 | test accuracy: 1.00
Epoch:  48 | train loss: 0.0440 | test accuracy: 1.00
Epoch:  49 | train loss: 0.0330 | test accuracy: 1.00
glass ridge
warm copper
#

Pretty good accuracy! @past meteor

#

I would probably get a lower train loss if I ran it for another 10 epochs

glass ridge
past meteor
#

So, they're using API in the colloquial but incorrect way as a way for the outside world to interact with your ML models over the internet

serene scaffold
# glass ridge i just heard it on a data scientist roadmap

Throw away that roadmap.
Pick a basic ML concept (not a library) and write some code that exemplifies that concept. You'll inevitably need at least one ML library to accomplish it. Just learn whatever minimal amount of that library's API that you need to do it.

past meteor
#

Don't worry about it

warm copper
#

i deem my model as a success

#

😛

past meteor
#

Stelercus is spot on

#

I started learning data science in exactly that way. I downloaded all of my data from facebook (... I use messenger a lot) and did a data analysis project with it

glass ridge
serene scaffold
warm copper
#

use kaggle @glass ridge

past meteor
#

Along the way I learnt the basics of pandas, storing data in DBs with Python, what JSON is, making ML models using sklearn, ...

warm copper
past meteor
#

This is a project you can easily copy for whatever platform you use because of GDPR, like you can ask discord for all of your data

#

And then you can do some analysis on that

#

Doesn't need to be advanced, but it's more productive than roadmaps and whatnot

warm copper
#

these are good places

past meteor
past meteor
warm copper
#

it was really high

#

Epoch: 0 | train loss: 1.7844 | test accuracy: 0.25
Epoch: 1 | train loss: 1.3510 | test accuracy: 0.75
Epoch: 2 | train loss: 1.3775 | test accuracy: 1.00
Epoch: 3 | train loss: 0.8661 | test accuracy: 0.88
Epoch: 4 | train loss: 0.7971 | test accuracy: 1.00

#

train loss went from 1.78 to 0.03

#

why would it be a bad thing

#

you aim to minimize train loss

past meteor
#

I recently had a very good model and I presented it to my clients (both are PhD + multiple post doc tier data scientists)

#

their obvious reaction was "where did you make an error?"

glass ridge
past meteor
#

And that is with me presenting my results with loads of skepticism

warm copper
#

first of all transformers usually lead to very low train loss

past meteor
#

that's not the point of training models, it's the exact opposite 😔

warm copper
#

??????

past meteor
#

The point is not minimising the train loss

#

it's training something that generalizes

warm copper
#

you need high accuracy

past meteor
#

If you make the training loss go to 0

#

you're typically not generalizing

warm copper
#

Im doing image classification

#

lower the loss the better the accuracy

past meteor
#

it doesn't matter if it's classification, regression or clustering

#

I'm sorry but this is patently false

warm copper
#

Im supposed to aim 95 percent accuracy

#

for the project

past meteor
#

I'm being hard on you because you're interviewing

#

If you say this in an interview it's over

warm copper
#

alright then lets have a shitty classification model that doesn't accurately classify images

serene scaffold
warm copper
#

first of all

#

the model is already given