#data-science-and-ml | Python | Page 122

wooden sail May 25, 2024, 7:01 AM

#

all estimates are random variables and have their own statistical distributions

past meteor May 25, 2024, 7:01 AM

#

Yeah, the MSE and the variance are the same under the condition that the estimator is unbiased

wooden sail May 25, 2024, 7:02 AM

#

specifically, the variance of the error and the MSE of the error
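For reference, the relationship being discussed is the standard decomposition of the mean squared error of an estimator. The symbols $\hat\theta$ (the estimator) and $\theta$ (the fixed quantity being estimated) are notation chosen here, not taken from the chat:

```latex
\operatorname{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \operatorname{Var}(\hat\theta) + \big(\mathbb{E}[\hat\theta] - \theta\big)^2
```

When the estimator is unbiased the squared-bias term vanishes, so the MSE and the variance coincide.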

past meteor May 25, 2024, 7:02 AM

#

Idk if you'll ever do reinforcement learning @spring field but there understanding and working with the bias of estimators has practical tradeoffs

#

Whereas in stats you can argue it's a bit fuzzy, unless you're making new methods and you want them to have certain sane properties

wooden sail May 25, 2024, 7:06 AM

#

i'd also suggest to take some time to digest the idea. i've found a lot of people struggle with it

past meteor May 25, 2024, 7:06 AM

#

Honestly depends on what you want to do though

#

It's something I had to learn in statistics and it goes into the bucket of things I never use

spring field May 25, 2024, 7:07 AM

#

past meteor Idk if you'll ever do reinforcement learning @spring field but there und...

I will, in fact, I have already, lol

wooden sail May 25, 2024, 7:07 AM

#

i would say you do use it. this is one of the nails in the coffin of trying to say whether your model is good by looking at the values of the MSE 😛 a mistake many people make

past meteor May 25, 2024, 7:07 AM

#

Maybe having passive knowledge of it helped me? We can never answer that question

past meteor May 25, 2024, 7:08 AM

#

wooden sail i would say you do use it. this is one of the nails in the coffin of trying to s...

There are different ways to arrive at this conclusion

wooden sail May 25, 2024, 7:09 AM

#

certainly

#

we did recently submit a paper exactly on this matter btw

#

since it turns out a lot of results are misinterpreted in papers

#

the MSE do be tricky

spring field May 25, 2024, 7:10 AM

#

MSE of the variance? like how far off the predicted variance is from the true one? isn't that just square of their difference? not like you can have multiple variances, can you?

past meteor May 25, 2024, 7:10 AM

#

In your niche it's very true / important but I'm curious how many PhDs can answer this coherently. That doesn't make it right though

#

But you get my point

wooden sail May 25, 2024, 7:10 AM

#

spring field MSE of the variance? like how far off the predicted variance is from the true on...

you started off right 😛 since your population sample is chosen "at random", that makes your estimated variance also a random variable with its own mean and variance

spring field May 25, 2024, 7:11 AM

#

past meteor Yeah, the MSE and the variance are the same under the condition that the estimator i...

yes, that's understandable, I just noticed the formulas are similar enough 😁
but I'm glad I started this whole educational thing rn :p

wooden sail May 25, 2024, 7:11 AM

#

you can also compute its MSE
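A minimal simulation sketch of the point above: the estimated variance is itself a random variable, so it has its own bias, variance and MSE. Everything here (the Gaussian data, sample size 5, true variance 4.0, the function names) is an assumption made for illustration.

```python
import random

def sample_variance(xs, ddof):
    """Variance estimate dividing by (n - ddof): ddof=0 is biased, ddof=1 is unbiased."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

def simulate(n_samples=5, n_trials=20000, true_var=4.0, seed=0):
    """Draw many small samples and record both variance estimates for each one."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5
    biased, unbiased = [], []
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        biased.append(sample_variance(xs, ddof=0))
        unbiased.append(sample_variance(xs, ddof=1))
    return biased, unbiased

def summarize(estimates, true_value):
    """Empirical bias, variance and MSE of a list of estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    var = sum((e - mean) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, var, mse

biased, unbiased = simulate()
for name, est in [("ddof=0", biased), ("ddof=1", unbiased)]:
    bias, var, mse = summarize(est, 4.0)
    print(f"{name}: bias={bias:.3f} var={var:.3f} mse={mse:.3f} var+bias^2={var + bias**2:.3f}")
```

For each estimator the printed `mse` matches `var + bias^2`; for the unbiased one the bias is close to zero, so its MSE and variance are nearly equal.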

past meteor May 25, 2024, 7:11 AM

#

If there's one thing I've learnt, it's that if you stare at any concept long enough you start to see it's more nuanced than you initially thought

wooden sail May 25, 2024, 7:11 AM

#

past meteor In your niche it's very true / important but I'm curious how many PhDs can answe...

that's probably the case tbh, i need to talk to real people at some point

pearl marten May 25, 2024, 7:12 AM

#

1

spring field May 25, 2024, 7:12 AM

#

wooden sail specifically, the variance of the error and the MSE of the error

variance of the error means using the mean of the error as well, right?

spring field May 25, 2024, 7:13 AM

#

wooden sail i'd also suggest to take some time to digest the idea. i've found a lot of peopl...

will do

wooden sail May 25, 2024, 7:13 AM

#

spring field variance of the error means using the mean of the error as well, right?

well, not necessarily

#

you could directly compute the variance

past meteor May 25, 2024, 7:13 AM

#

Do you have an actual statistics course?

#

If so, they'll teach this

wooden sail May 25, 2024, 7:13 AM

#

you could compute it through the MSE by doing a bias-covariance decomposition though

past meteor May 25, 2024, 7:14 AM

#

Notions of what an unbiased estimator is, is kind of important ye

spring field May 25, 2024, 7:14 AM

#

wooden sail i would say you do use it. this is one of the nails in the coffin of trying to s...

mmm, I have realized that the loss doesn't have any particular meaning aside from smaller is better

wooden sail May 25, 2024, 7:14 AM

#

"small number good"

past meteor May 25, 2024, 7:16 AM

#

Here at least comp sci got less meaningful statistics than us

#

Econometrics was the best course I took in terms of statistical modelling

spring field May 25, 2024, 7:17 AM

#

past meteor Do you have an actual statistics course?

not yet, I kinda dropped out of the ME course I was doing, cuz... reasons
anyway, I'm planning on going to a different uni now for sth in cs, ds, stats, ml/ai, applied math, not entirely sure quite yet, so I'll probably have stats, yes

past meteor May 25, 2024, 7:18 AM

#

spring field not yet, I kinda dropped out of the ME course I was doing, cuz... reasons anyway...

You're just self learning all this? You're not enrolled right now? If that's the case you'll be perfectly fine 😄

#

The economics side of it is very boring but econometrics teaches you all the pitfalls of modelling in practice

#

Many of my comp sci peers didn't take it and were a lot worse at modelling

wooden sail May 25, 2024, 7:19 AM

#

past meteor Many of my comp sci peers didn't take it and were a lot worse at modelling

https://tenor.com/view/zoolander-blue-steel-ben-stiller-duck-face-gif-8425169


past meteor May 25, 2024, 7:20 AM

#

No ML course teaches you about heteroscedasticity, predicted vs actual plots, analysing residuals, omitted variable bias, multicollinearity, ...

#

It's just about ML models and nothing more

#

Which is a mistake imho

spring field May 25, 2024, 7:20 AM

#

past meteor You're just self learning all this? You're not enrolled right now? If that's the...

not self learning exactly, lol, it's a... special course on ML, unrelated to uni, but the stats and some other stuff I'm covering mostly myself ig, yeah

past meteor May 25, 2024, 7:21 AM

#

Maybe multicollinearity is one that is touched on

#

To motivate why you need regularisation

wooden sail May 25, 2024, 7:21 AM

#

past meteor No ML course teaches you about heteroscedasticity, predicted vs actual plots, an...

these should kinda be prereqs even, not even part of ML courses

past meteor May 25, 2024, 7:22 AM

#

wooden sail these should kinda be prereqs even, not even part of ML courses

I agree-ish

wooden sail May 25, 2024, 7:23 AM

#

you activated my trap card

past meteor May 25, 2024, 7:23 AM

#

I'm honestly bad at giving precise definitions of things related to math

#

I can give you a hand wavy one

#

The only context in which we spoke about it in ML was with ill-posed problems

wooden sail May 25, 2024, 7:24 AM

#

most people would mention the explicit ones like multiplying or adding a term that makes the problem easier in some sense

past meteor May 25, 2024, 7:24 AM

#

Like a matrix where one row is a linear combination of another

wooden sail May 25, 2024, 7:24 AM

#

fishing out a particular solution, changing the type of convexity, etc

past meteor May 25, 2024, 7:24 AM

#

Something something about the rank

wooden sail May 25, 2024, 7:25 AM

#

roughly anything that makes a problem easier by restricting the solution space

past meteor May 25, 2024, 7:25 AM

#

By adding an additional regularisation term you can solve it uniquely

#

I feel like "what is regularisation" is a question you can answer in 5 different ways

#

Especially if you're asking for practical applications, how to interpret it and so on

wooden sail May 25, 2024, 7:26 AM

#

i also didn't like it right now, i was gonna add "or changing its geometry"

past meteor May 25, 2024, 7:26 AM

#

I could talk about L2 regularisation being a Gaussian prior in the Bayes world and so on
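A sketch of that correspondence, assuming a linear model with Gaussian noise of variance $\sigma^2$ and an independent zero-mean Gaussian prior of variance $\tau^2$ on each weight (the notation $X$, $y$, $w$, $\sigma$, $\tau$, $\lambda$ is introduced here, not taken from the chat):

```latex
\hat{w}_{\text{MAP}}
  = \arg\max_w \; \log p(y \mid X, w) + \log p(w)
  = \arg\min_w \; \frac{1}{2\sigma^2}\lVert y - Xw \rVert_2^2 + \frac{1}{2\tau^2}\lVert w \rVert_2^2
  = \arg\min_w \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}
```

Under these assumptions a tighter prior (smaller $\tau$) corresponds to a larger L2 penalty.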

wooden sail May 25, 2024, 7:27 AM

#

now define "capacity"

past meteor May 25, 2024, 7:27 AM

#

You can add noise to all your input, that's regularisation

past meteor May 25, 2024, 7:27 AM

#

past meteor Like a matrix where one row is a linear combination of another

And it solves this problem

wooden sail May 25, 2024, 7:28 AM

#

that would be a restriction of the parameter space

#

making it low dimensional

#

(it's not the only kind, but that works)

wooden sail May 25, 2024, 7:28 AM

#

past meteor And it solves this problem

tikhonov walks into the room
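A small sketch of the Tikhonov idea mentioned here, assuming NumPy is available. The matrix is a made-up example in which one row is a linear combination of the others, and `ridge_solve` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical design matrix: the third row is the sum of the first two,
# so X is rank deficient and X.T @ X is singular.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

def ridge_solve(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, the Tikhonov / ridge normal equations."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

gram = X.T @ X
print("det of X^T X:", np.linalg.det(gram))        # numerically zero: no unique solution
w = ridge_solve(X, y, lam=0.1)
print("regularised solution:", w)
print("fit with a tiny penalty:", X @ ridge_solve(X, y, lam=1e-6))
```

Adding `lam * I` shifts every eigenvalue of `X.T @ X` up by `lam`, which is why the regularised system has exactly one solution for any `lam > 0`.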

past meteor May 25, 2024, 7:29 AM

#

Ah yes hearing "Tikhonov" sends me back to the university auditorium

wooden sail May 25, 2024, 7:29 AM

#

L2 and L1 don't actually do that though, those aren't restrictions. they change the geometry

#

that was my beef with it

#

if anything L2 reg increases the dimension of the parameter space to make it smoother

#

as zestar says, increasing the rank of the matrix

#

more degrees of freedom

#

yes but that's not a restriction 😛 as you pointed out before

#

you can explicitly do this though, e.g. with how you complain about needing to use rectangular matrices instead of low rank square ones

#

THAT is a restriction

#

an additive term changes geometry and promotes a behavior, but does not restrict it

#

that's exactly it. and the intuition is indeed correct

#

but a restriction eliminates the possibility of something happening

#

https://en.wikipedia.org/wiki/Restriction_(mathematics)

#

i bring it up because a restriction is something properly defined

#

i would be surprised if you could in the ML context

past meteor May 25, 2024, 7:33 AM

#

Why is early stopping considered regularisation

wooden sail May 25, 2024, 7:33 AM

#

cuz we don't have a whole lot of guarantees for ML

past meteor May 25, 2024, 7:33 AM

#

Is there any basis for it?

wooden sail May 25, 2024, 7:34 AM

#

i recommend looking at fisher information (this is what i study), which measures the information models carry about their parameter spaces. it's related to entropy, but they're not the same

wooden sail May 25, 2024, 7:35 AM

#

past meteor Is there any basis for it?

i guess in the sense they prevent overfitting

past meteor May 25, 2024, 7:35 AM

#

Then the definition is reduced to "thing that prevents over fitting"

wooden sail May 25, 2024, 7:35 AM

#

i would interpret it as wanting a low dimensional solution

peak ridge May 25, 2024, 7:35 AM

#

hmm

#

im building the next big thing
and a big thanks to this server

wooden sail May 25, 2024, 7:36 AM

#

overfitting means your parameter space has a structure that contains even the noise, which is higher dimensional than the original parameter space (remember the previous noise discussion)

#

early stopping would keep your parameter space lean before you go on to fit the noise, so something like using a subspace of the full parameter space of your model

#

(not necessarily a vector space)
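One way to make this concrete, as a rough sketch only: for plain gradient descent on a least-squares problem started at zero, the norm of the iterate grows with the number of steps, so stopping early leaves you with a smaller solution than running to convergence. The data, sizes, step size and the name `gd_path` are all assumptions for illustration, and this says nothing about deep networks.

```python
import numpy as np

def gd_path(X, y, lr, n_steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, starting from w = 0."""
    w = np.zeros(X.shape[1])
    norms, losses = [], []
    for _ in range(n_steps):
        w = w - lr * (X.T @ (X @ w - y))
        norms.append(float(np.linalg.norm(w)))
        losses.append(float(0.5 * np.sum((X @ w - y) ** 2)))
    return w, norms, losses

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))                    # hypothetical noisy regression data
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.5, size=20)

lr = 1.0 / np.linalg.norm(X, 2) ** 2             # 1 / largest eigenvalue of X^T X
w_full, norms, losses = gd_path(X, y, lr, 2000)

print("||w|| after 10 steps:  ", norms[9])
print("||w|| after 2000 steps:", norms[-1])
```

The training loss keeps going down while the weight norm keeps going up, so the stopping point picks a place along that path.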

peak ridge May 25, 2024, 7:37 AM

#

sure

past meteor May 25, 2024, 7:37 AM

#

wooden sail early stopping would keep your parameter space lean before you go on to fit the ...

Okay this is a nice way to look at it

wooden sail May 25, 2024, 7:39 AM

#

the whole discussion on "dimension" is pretty powerful

#

linalg too stronk

#

yeah smth like that

#

dimension accounts for it automatically 😌

past meteor May 25, 2024, 7:43 AM

#

It's 2024 and I still haven't learnt how to use stan

wooden sail May 25, 2024, 7:44 AM

#

treating it like a manifold with charts of dimension <= d might work

#

there's a reason topological methods are hot stuff

past meteor May 25, 2024, 7:44 AM

#

https://mc-stan.org


wooden sail May 25, 2024, 7:44 AM

#

past meteor It's 2024 and I still haven't learnt how to use stan

https://tenor.com/view/meme-gif-23357291


#

stop when d starts to increase

#

i think incorporating time is a problem tbh

#

referring to deciding after how many iterations you have reached a solution, not to time

#

you can make the analogy, but you may inadvertently introduce misconceptions on how the optimization process works

#

if you compose a function with itself 10 times, where is the time axis?

#

gradient-based methods are like fixed-point iterations. you compose a function with itself N times and hope you ended up at a good solution
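A tiny sketch of that view: one gradient-descent update is just a function, and "training" is that function composed with itself N times. The toy objective f(x) = (x - 3)^2, the learning rate and the helper names are assumptions for illustration.

```python
def make_step(grad, lr):
    """One gradient-descent update as a plain function x -> x - lr * grad(x)."""
    return lambda x: x - lr * grad(x)

def compose_n(g, n):
    """Return g composed with itself n times."""
    def composed(x):
        for _ in range(n):
            x = g(x)
        return x
    return composed

def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

step = make_step(grad, lr=0.1)
x_final = compose_n(step, 100)(0.0)

print(x_final)        # close to the minimiser 3.0
print(step(3.0))      # the minimiser is a fixed point of the update
```

Nothing in `compose_n` refers to time; N is only a count of how often the map is applied.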

#

choosing to attach the number N to time is an arbitrary choice

#

which is an arbitrary choice 😛

past meteor May 25, 2024, 7:49 AM

#

Is there an equivalence for gradient descent with early stopping when you have a closed form solution?

wooden sail May 25, 2024, 7:49 AM

#

time need not be how you parametrize the curve the fixed point iteration traced

wooden sail May 25, 2024, 7:49 AM

#

past meteor Is there an equivalence for gradient descent with early stopping when you have a...

i don't think so. you'd need an explicit regularizer if it's done in one step

#

like L2 or L1

#

in that case you can formulate regularization as a bias

#

lead me astray daddy, but do it gently

#

idk why i said that but i think it captures the sentiment

past meteor May 25, 2024, 7:50 AM

#

It's kind of nasty though, not like L2/L1

wooden sail May 25, 2024, 7:51 AM

#

this is fine, but one can make the argument that we use math to model things outside our standard intuition too

past meteor May 25, 2024, 7:51 AM

#

I use it because it's simpler to implement and reason about

wooden sail May 25, 2024, 7:51 AM

#

that motivates the abstract, axiomatic approach of maths from the last 100 years

#

it's more powerful

#

(and less intuitive)

#

idk about clearly, that very much seems to be your preferred flavor only because you studied something related to physics

#

there's a distinction between what is actually happening and how you choose to interpret it. the way you interpret it may bring limitations and introduce misconceptions

#

certainly

#

a lot of momentum techniques, as the name implies in the first place, are studied exactly as you say

#

i think some of the popular interpretations of nesterov acceleration have something to do with damping of a spring or something like that

#

just as there are other studies that use none of that

#

yeah needs some statistical flavor on top

#

that's essentially it, yeah? you have a network that spits something out and then you evaluate it in a loss function that maps to R

#

so the composition of network and loss function is a functional acting on the inputs

#

but then you have a chicken and egg problem since calc of vars exists separately from physics 😛

#

it's also there if you include the data, since now you have to include expectation operations

#

every calculus student: wdym integrals are easier than sums

#

hilbert spaces are already directly connected to ML

#

R^n and C^n are hilbert spaces

#

the complex number part is usually kinda trivial, since you don't need the structure of C^n. cost functions f:C^n -> R are anyway not complex analytic

#

so you study them by isomorphism to R^(2n) (wirtinger calculus or splitting real and imag)

#

not really "larger" anyway though

#

since complex floats are internally represented as 2 float 64s

#

so internally what you have in the computer is also R^(2n) with special multiplication and addition structure
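A short NumPy sketch of this, assuming the default `complex128` dtype. The helper `mul_as_real` is a hypothetical name used to show the "special multiplication structure" with real arithmetic only.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j])        # dtype complex128 by default
as_reals = z.view(np.float64)          # same memory, read as (real, imag) float64 pairs

print(z.dtype, z.dtype.itemsize)       # each element occupies 16 bytes
print(as_reals)

def mul_as_real(a, b):
    """Multiply complex numbers a * b using a real 2x2 matrix acting on (re, im)."""
    A = np.array([[a.real, -a.imag],
                  [a.imag,  a.real]])
    return A @ np.array([b.real, b.imag])

print(mul_as_real(1 + 2j, 3 - 4j))     # real and imaginary part of the product
print((1 + 2j) * (3 - 4j))
```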

smoky robin May 25, 2024, 8:37 AM

#

I have a question and for which I will have to post a picture is that okay

spring field May 25, 2024, 8:44 AM

#

sure

latent girder May 25, 2024, 9:01 AM

#

Hi, is this any good? Feel free to recommend, thanks. I just recently finished a basic python course and wanna jump into this stuff. I'm a complete beginner btw

spring field May 25, 2024, 9:08 AM

#

might wanna practice Python for a bit first

latent girder May 25, 2024, 9:09 AM

#

Alright thanks

lapis sequoia May 25, 2024, 11:31 AM

#

latent girder Hi, is this any good? Feel free to recommend, thanks. I just recently finished a...

Read the ISLR textbook and just look up stuff you want to learn. Anytime there is some sort of crash course, it is usually not good. Data School is good for beginners.

smoky robin May 25, 2024, 11:43 AM

#

#

My task is this

#

The only question I have is that by attributes it means column?

spring field May 25, 2024, 12:01 PM

#

ngl, but features and channels mean roughly the same thing to me, if you could provide some additional context, like that last task mentioned or a sample of the dataset or sth

past meteor May 25, 2024, 12:14 PM

#

If you have a multivariate time series you have n channels over T time points. Each channel has a measurement for each time point t ∈ {1, ..., T}. If you "flatten" it and give it to, say, a traditional ML model, you have T * n features
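A minimal sketch of that flattening with NumPy. The sizes (4 samples, 5 time points, 3 channels) are made up for illustration.

```python
import numpy as np

n_samples, T, n = 4, 5, 3                       # hypothetical sizes
# One multivariate series per sample: shape (samples, time points, channels).
series = np.arange(n_samples * T * n, dtype=float).reshape(n_samples, T, n)

# Flatten time and channels into one feature axis for a tabular model.
flat = series.reshape(n_samples, T * n)

print(series.shape, "->", flat.shape)
# Feature index t * n + c holds channel c at time point t.
print(flat[0, 2 * n + 1] == series[0, 2, 1])
```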

spring field May 25, 2024, 12:35 PM

#

so it's 6 params each embedded in 38 dimensions? over time

#

sth like
(batch, sequence, features, channels)

lapis sequoia May 25, 2024, 12:37 PM

#

why is scipy.optimize never used? Are there just better methods? I know hyperparameter tuning is better, but, yeah.

past meteor May 25, 2024, 12:38 PM

#

lapis sequoia why is scipy.optimize never used? Are there just better methods? I know hyperpar...

Used for what exactly?

#

I use it infrequently

lapis sequoia May 25, 2024, 12:40 PM

#

optimization. I remember a couple of years ago, I used it to optimize some residual for lasso regression. I do not know, ML is optimization heavy.

past meteor May 25, 2024, 12:40 PM

#

lapis sequoia optimization. I remember a couple of years ago, I used it to optimize some resid...

For that you can just use Lasso or ElasticNet directly in sklearn

lapis sequoia May 25, 2024, 12:41 PM

#

I had to use it years ago in grad school

#

he wanted us to do it that way for some reason

wooden sail May 25, 2024, 12:42 PM

#

scipy optimize will either use heuristics for the gradients and hessians, or require you to provide them explicitly
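A small sketch of both routes with `scipy.optimize.minimize`, assuming SciPy is installed. The quadratic objective and its minimiser at (1, -2) are a toy example chosen here.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Toy objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad_f(x):
    """Hand-derived gradient of f."""
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

x0 = np.zeros(2)

# Without `jac`, BFGS approximates the gradient numerically (finite differences).
res_numeric = minimize(f, x0, method="BFGS")

# With `jac`, the solver uses the gradient you provide.
res_exact = minimize(f, x0, method="BFGS", jac=grad_f)

print(res_numeric.x, res_numeric.nfev)
print(res_exact.x, res_exact.nfev)
```

Supplying the gradient usually needs fewer function evaluations, which is one reason this style fits problems that are easy to write down explicitly.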

past meteor May 25, 2024, 12:42 PM

#

Nothing wrong with that. It just requires a bit more steps than just using sklearn. I use scipy directly when I have to

wooden sail May 25, 2024, 12:43 PM

#

you use them when your optimization problem is nice and easy to formulate explicitly

lapis sequoia May 25, 2024, 12:43 PM

#

Hessians man.

wooden sail May 25, 2024, 12:43 PM

#

with deep learning this is not the case, so you use something different

lapis sequoia May 25, 2024, 12:43 PM

#

That is when there is more than one variable to optimize, right? I do not remember

wooden sail May 25, 2024, 12:43 PM

#

no

lapis sequoia May 25, 2024, 12:43 PM

#

what is it

wooden sail May 25, 2024, 12:43 PM

#

hessians? or which part?

past meteor May 25, 2024, 12:43 PM

#

How do they fit lasso again? Coordinate descent?

lapis sequoia May 25, 2024, 12:44 PM

#

yeah, second order partials: [f11,f12] [f21,f22]

wooden sail May 25, 2024, 12:44 PM

#

past meteor How do they fit lasso again? Coordinate descent?

that's an option. usually some form of iterative shrinkage/thresholding with (possibly block coordinate) descent
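A bare-bones sketch of iterative shrinkage/thresholding (ISTA) for the lasso objective, written with NumPy. This is an illustration of the technique named above and not how any particular library implements it; scikit-learn's `Lasso`, for instance, uses coordinate descent and scales the squared-error term differently. The function names and the identity-matrix example are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    """Minimise 0.5 * ||X w - y||^2 + lam * ||w||_1 by gradient step + shrinkage."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# With an identity design the lasso solution is just soft-thresholding of y.
X_id = np.eye(3)
y_id = np.array([3.0, -0.5, 1.5])
w_id = ista(X_id, y_id, lam=1.0)
print(w_id)
```

The thresholding step is what sets small coefficients exactly to zero, which is the sparsity lasso is known for.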

lapis sequoia May 25, 2024, 12:44 PM

#

youngs theorem

wooden sail May 25, 2024, 12:44 PM

#

lapis sequoia yeah, second order partials: [f11,f12] [f21,f22]

you can do it for the case with a single variable as well

lapis sequoia May 25, 2024, 12:46 PM

#

wooden sail you can do it for the case with a single variable as well

How?

wooden sail May 25, 2024, 12:46 PM

#

lapis sequoia How?

you differentiate twice 😛

past meteor May 25, 2024, 12:46 PM

#

The last time I used scipy was when I handrolled some time series methods because I don't like most implementations 😦

lapis sequoia May 25, 2024, 12:47 PM

#

wooden sail you differentiate twice 😛

that is just a second derivative of one variable

wooden sail May 25, 2024, 12:47 PM

#

well, if there is only one variable, that's all you have

#

the hessian is the jacobian of the gradient. in the univariate case, that's just the second derivative of the one variable

#

in more dimensions you get also the cross derivatives

warm pebble May 25, 2024, 1:35 PM

#

can somebody help me make an object detection ai

astral prism May 25, 2024, 1:36 PM

#

warm pebble can somebody help me make an object detection ai

use yolo

#

the best

warm pebble May 25, 2024, 1:36 PM

#

ok but i want to create my own

astral prism May 25, 2024, 1:37 PM

#

warm pebble ok but i want to create my own

yeah u can do it

warm pebble May 25, 2024, 1:37 PM

#

ok i tried to download yolo but i dont know what file to choose on github

astral prism May 25, 2024, 1:37 PM

#

u gonna use yolo to train ur own ai model

flat sigil May 25, 2024, 1:37 PM

#

finally having actual success 🙏

astral prism May 25, 2024, 1:38 PM

#

warm pebble ok i tried to download yolo but i dont know what file to choose on github

u can search on kaggle

#

instead on github

warm pebble May 25, 2024, 1:46 PM

#

astral prism u can search on kaggle

what do i do to install it

astral prism May 25, 2024, 1:55 PM

#

warm pebble what do i do to install it

check my github the latest repo car detection one

#

u will understand

warm pebble May 25, 2024, 2:07 PM

#

astral prism check my github the latest repo car detection one

what is your github

astral prism May 25, 2024, 2:08 PM

#

warm pebble what is your github

its in my profile

lapis sequoia May 25, 2024, 2:38 PM

#

wooden sail the hessian is the jacobian of the gradient. in the univariate case, that's just...

you need to know the signs, so for example: f12 = f21, right? This was a while ago.

wooden sail May 25, 2024, 2:39 PM

#

i don't understand your question

lapis sequoia May 25, 2024, 2:43 PM

#

the cross partials, like, f11 *f22 -( f12)**2 < 0; if f11 < 0; then it is at a maximum, if f22 < 0; maximum as well, in order for the cross-partials to hold, the signs need to be known. And that is only known by knowing if f12 = f21 or something

wooden sail May 25, 2024, 2:44 PM

#

lapis sequoia the cross partials, like, f11 *f22 -( f12)**2 < 0; if f11 < 0; then it is at a m...

for functions with continuous second order partial derivatives, the order of differentiation does not matter and the hessian is immediately symmetric

#

what one looks for is whether all of the eigenvalues of the hessian are positive or negative
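A sketch of the eigenvalue check with NumPy, using a finite-difference Hessian. The test functions and helper names are assumptions for illustration. In two variables this agrees with the textbook determinant test: f11 * f22 - f12**2 > 0 with f11 < 0 gives a maximum, with f11 > 0 a minimum, and a negative determinant gives a saddle.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the matrix of second partials of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def classify(H, tol=1e-6):
    """Classify a critical point from the signs of the Hessian eigenvalues."""
    eig = np.linalg.eigvalsh((H + H.T) / 2.0)   # symmetrise, then symmetric eigensolver
    if np.all(eig > tol):
        return "minimum"
    if np.all(eig < -tol):
        return "maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle"
    return "inconclusive"

bowl = lambda p: p[0] ** 2 + p[0] * p[1] + p[1] ** 2
saddle = lambda p: p[0] ** 2 - p[1] ** 2
cap = lambda p: -(p[0] ** 2 + p[1] ** 2)

for name, fn in [("bowl", bowl), ("saddle", saddle), ("cap", cap)]:
    H = numerical_hessian(fn, [0.0, 0.0])
    print(name, classify(H), "symmetric:", np.allclose(H, H.T, atol=1e-6))
```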

lapis sequoia May 25, 2024, 2:45 PM

#

wooden sail for functions with continuous second order partial derivatives, the order of dif...

show me

#

f12 does not always equal f21

wooden sail May 25, 2024, 2:45 PM

#

lapis sequoia f12 does not always equal f21

https://en.wikipedia.org/wiki/Symmetry_of_second_derivatives#Schwarz's_theorem

#

no, not always, but many functions you deal with in optimization do work this way

#

particularly under the condition i mentioned above

lapis sequoia May 25, 2024, 2:48 PM

#

wooden sail no, not always, but many functions you deal with in optimization do work this wa...

Gotcha. Sorry, it has been a while and I forgot. The jacobian determinant is just to make sure the IFT holds or something and it is a matrix of first order derivatives, right? and it cannot = 0

smoky robin May 25, 2024, 5:00 PM

#

My model is giving me a recall score of 1

#

Yet other methods (accuracy, precision, f1 score) are showing accuracy of around 95

past meteor May 25, 2024, 5:02 PM

#

That's very possible

#

A recall of 1 means that you classified all actual positives as being of the positive class

smoky robin May 25, 2024, 5:03 PM

#

Is that not over fitting of data?

past meteor May 25, 2024, 5:03 PM

#

But you can have false positives, which impacts your precision, accuracy and f1 score

#

Not necessarily

#

Imagine your data is 50/50 distributed between + and -

#

In your case you have 51/49 for example

smoky robin May 25, 2024, 5:04 PM

#

It is 45/46

past meteor May 25, 2024, 5:04 PM

#

All those that are + were classified as + but you misclassified a few extra as + that should've been -
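A worked sketch of that situation with made-up counts: 45 actual positives and 46 actual negatives, every positive found, and 5 negatives wrongly flagged as positive. The confusion-matrix numbers are hypothetical, not taken from the asker's model.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# No false negatives (recall is perfect), but 5 false positives.
m = binary_metrics(tp=45, fp=5, fn=0, tn=41)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Recall only looks at the actual positives, so false positives leave it untouched while pulling the other three metrics below 1.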

#

then this shouldn't be too hard to figure out, look at the data 🙂

smoky robin May 25, 2024, 5:05 PM

#

I get it a little bit

#

Thank you very much

arctic silo May 25, 2024, 8:10 PM

#

datacamp it worth for learning data science and practising projects

serene scaffold May 25, 2024, 8:28 PM

#

arctic silo datacamp it worth for learning data science and practising projects

Are you asking a question?

flat sigil May 25, 2024, 8:40 PM

#

is it bad that the average minimum fitness of each generation is staying relatively constant?

lapis sequoia May 25, 2024, 9:13 PM

#

Topics in ML that are the most advanced, what would you guys/girls, say?

lapis sequoia May 25, 2024, 9:14 PM

#

flat sigil is it bad that the average minimum fitness of each generation is staying relativ...

What are these generations? This generation or whatever would dominate all previous generations in terms of fitness by miles

flat sigil May 25, 2024, 9:30 PM

#

lapis sequoia What are these generations? This generation or whatever would dominate all previ...

Wdym?

flat sigil May 25, 2024, 10:06 PM

#

Thx. After each training generation I just wrote the max, mean and min fitness of that generation to a SQLite db and then used matplotlib to plot it and scipy to make the smoothed curve

#

lol

#

Honestly I have never used dataframes before.

lapis sequoia May 25, 2024, 10:49 PM

#

Any of you into Game Theory? I heard it was used in RL. Kind of obsessed with Game Theory and IO. Like, I have heard RL borrows from Game Theory, how?

hollow escarp May 25, 2024, 11:39 PM

#

Any ideas how I can optimize an onnx model for arm64 architecture?

real phoenix May 26, 2024, 6:40 AM

#

I want to webscrape IMDB top 1000 movie subtitles to classify them using NLP. Any suggestions as to how to proceed. There are permissions issues I am running into. I'd also be ok with using a dump of these files. Thank you.

hollow escarp May 26, 2024, 10:53 AM

#

hollow escarp Any ideas how I can optimize an onnx model for arm64 architecture?

I need to speed up my model object detection for raspberry pi devices

river cape May 26, 2024, 11:27 AM

#

Hi guys does the word2vec need sentences or is it fine even if we feed only the words?

past meteor May 26, 2024, 12:51 PM

#

lapis sequoia Any of you into Game Theory? I heard it was used in RL. Kind of obsessed with Ga...

I did a bit of game theory in my bachelors. I think they're quite different because the classical RL setting just has a single agent whereas game theory is more of a multi-agent thing. When you have multiple agents there are game theoretic insights.

Another way you can look at it is from this perspective: in RL you have to learn the optimal policy and game theory actually gives you an answer to that question in the form of a Nash equilibrium (wherever present)

arctic silo May 26, 2024, 1:42 PM

#

serene scaffold Are you asking a question?

yeah

left tartan May 26, 2024, 1:43 PM

#

real phoenix I want to webscrape IMDB top 1000 movie subtitles to classify them using NLP. An...

What's the issue? Is this a scraping question or a processing question?

left tartan May 26, 2024, 1:43 PM

#

hollow escarp Any ideas how I can optimize an onnx model for arm64 architecture?

Theres no details here. Whats slow? Why is it slow? What is your code? Etc.

hollow escarp May 26, 2024, 2:02 PM

#

left tartan Theres no details here. Whats slow? Why is it slow? What is your code? Etc.

So basically I'm running an object detection script on a machine without a GPU, so it runs on the CPU. What more details do you need?

left tartan May 26, 2024, 2:05 PM

#

hollow escarp So basically I'm running an object detection script on a machine without a GPU, so it runs ...

That's like zero details. So I guess explain more?

teal lance May 26, 2024, 2:07 PM

#

Is it possible to use lightweight charts and yfinance together? 🤔

humble mesa May 26, 2024, 4:16 PM

#

yes.
https://lightweight-charts-python.readthedocs.io/en/latest/examples/yfinance.html

orchid forge May 26, 2024, 4:25 PM

#

oh god, i just downloaded a dataset from kaggle, to make a project out of it. im not able to import a simple excel file

humble mesa May 26, 2024, 4:26 PM

#

May you share the code?

orchid forge May 26, 2024, 4:26 PM

#

sure hold on

humble mesa May 26, 2024, 4:26 PM

#

dw take your time

#

I would use pandas if i were you

orchid forge May 26, 2024, 4:29 PM

#

humble mesa May 26, 2024, 4:30 PM

#

Is the file name correct? (Try removing the whitespace of your filename)

#

/ is the file in the current working directory

#

On my end works, considering the correct file path

orchid forge May 26, 2024, 4:34 PM

#

orchid forge May 26, 2024, 4:34 PM

#

orchid forge

it looks like this

humble mesa May 26, 2024, 4:34 PM

#

Okay, is this file in the same directory as your notebook

orchid forge May 26, 2024, 4:34 PM

#

humble mesa On my end works, considering the correct file path

i have to pip it oh oh ok

humble mesa May 26, 2024, 4:34 PM

#

don't forget the !pip

orchid forge May 26, 2024, 4:35 PM

#

so everything is good with my file name?

humble mesa May 26, 2024, 4:35 PM

#

And if you have a virtual environment, use python -m if you are inside your shell

orchid forge May 26, 2024, 4:35 PM

#

there's no white space?

orchid forge May 26, 2024, 4:35 PM

#

humble mesa And if you have a virtual environment, use python -m if you are inside your shel...

what does that mean

humble mesa May 26, 2024, 4:35 PM

#

File name should be fine, just wasn't sure how pandas reacts to whitespaces xD

orchid forge May 26, 2024, 4:36 PM

#

oh

humble mesa May 26, 2024, 4:36 PM

#

orchid forge what does that mean

Okay my sr. data scientist told me that recently. If you do a normal pip install everything will land in your base python

#

This is bad, since some libraries might clash when different projects need different versions

#

so we use venvs (virtual environments).

Those are isolated environments where all libraries for your project are separated from the base python and other project environments

#

(At least this is how I understood it)
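A small sketch related to the `python -m` advice above, using only the standard library. Running pip as `python -m pip` ties the install to the interpreter you are actually using, and comparing `sys.prefix` with `sys.base_prefix` tells you whether that interpreter lives in a virtual environment. The helper names are made up for illustration.

```python
import sys

def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix

def pip_install_command(package):
    """Build a pip command bound to the interpreter that is currently running."""
    return [sys.executable, "-m", "pip", "install", package]

print("inside a venv:", in_virtualenv())
print("command:", " ".join(pip_install_command("pandas")))
```

The command is only built and printed here, not executed.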

orchid forge May 26, 2024, 4:38 PM

#

humble mesa so we use venvs (virtual environments). Those are isolated environments where ...

oh

humble mesa May 26, 2024, 4:39 PM

#

idk if that is the real technical answer but it has worked on my end since then 🙂

#

does it work now?

orchid forge May 26, 2024, 4:41 PM

#

orchid forge

im getting the same error

humble mesa May 26, 2024, 4:42 PM

#

Hm ok.. wait let me check something really quick

orchid forge May 26, 2024, 4:42 PM

#

k

humble mesa May 26, 2024, 4:42 PM

#

You are on windows, right?

orchid forge May 26, 2024, 4:42 PM

#

FileNotFoundError: [Errno 2] No such file or directory: 'salaries (2).csv.csv'

orchid forge May 26, 2024, 4:42 PM

#

humble mesa You are on windows, right?

ya

humble mesa May 26, 2024, 4:43 PM

#

#

maybe it has to do with this (idk only an educated guess)

orchid forge May 26, 2024, 4:44 PM

#

oh k lemme check

lapis sequoia May 26, 2024, 4:44 PM

#

past meteor I did a bit of game theory in my bachelors. I think they're quite different beca...

Game Theory has sequential move games (like chess), where you have to wait for player 1 to move first and the second player acts based on the moves that the first player makes. Simultaneous move games are when both players act without knowing what the other player is doing. These games have a game table (most common game). A pure-strategy Nash equilibrium is when no player can get a better payoff by changing their own move, given what the other player(s) have done. There does not have to be exactly one Nash equilibrium: there can be no pure-strategy Nash equilibrium, and there can be more than one. This is just simple game theory grounding. Game Theory becomes nonsense hard when it comes to Auctions. Auctions are extremely hard. I digress. The Nash equilibrium in sequential move games is the 'subgame perfect equilibrium'. There is first and second mover advantage. And sequential/simultaneous move games can be combined. There are also mixing games, mixed-strategy Nash equilibrium, Bayesian Nash equilibrium, signaling games (hard), infinitely repeated prisoner's dilemma (my favorite), Bertrand pricing games, Cournot games, voting games, incentive design (principal-agent), collective action games. 'Agents' are players. I do not know. My friend is in RL and would ask me for Game Theory advice. Game Theory is incredibly underrated and it is not a joke.
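A short sketch of the pure-strategy case for two-player simultaneous games given as a payoff table. The payoff numbers for the three toy games are conventional textbook-style values chosen here, and `pure_nash_equilibria` is a hypothetical helper name.

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """payoffs[(row_move, col_move)] = (row player's payoff, column player's payoff)."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = payoffs[(r, c)]
        # Neither player can gain by deviating alone.
        row_ok = all(payoffs[(r2, c)][0] <= u_row for r2 in rows)
        col_ok = all(payoffs[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

prisoners_dilemma = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
                     ("D", "C"): (0, -3), ("D", "D"): (-2, -2)}
matching_pennies = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
                    ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
coordination = {("A", "A"): (2, 1), ("A", "B"): (0, 0),
                ("B", "A"): (0, 0), ("B", "B"): (1, 2)}

print(pure_nash_equilibria(prisoners_dilemma))   # one equilibrium
print(pure_nash_equilibria(matching_pennies))    # none in pure strategies
print(pure_nash_equilibria(coordination))        # more than one
```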

orchid forge May 26, 2024, 4:47 PM

#

humble mesa

do i have to add the file name right after i have pasted the whole path to that file?

humble mesa May 26, 2024, 4:49 PM

#

So you mean something like this C:\path\to\ file (2).csv ?

orchid forge May 26, 2024, 4:49 PM

#

ya

humble mesa May 26, 2024, 4:50 PM

#

There is no space after the \ afaik

orchid forge May 26, 2024, 4:50 PM

#

nope

humble mesa May 26, 2024, 4:50 PM

#

damn

#

Ok let me google

orchid forge May 26, 2024, 4:51 PM

#

im so sorry, i feel so embarrassed

humble mesa May 26, 2024, 4:51 PM

#

nah dw

#

We all have been there

I'm not an expert either. Literally in my 2nd semester xD

orchid forge May 26, 2024, 4:54 PM

#

omg i did it

humble mesa May 26, 2024, 4:54 PM

#

how??

orchid forge May 26, 2024, 4:54 PM

#

i got it done

humble mesa May 26, 2024, 4:54 PM

#

Awesome

orchid forge May 26, 2024, 4:54 PM

#

thanks

#

im just a stupid beginner, im the only person with dumb questions

humble mesa May 26, 2024, 4:56 PM

#

Nono you are all good 🙂

I initially learned Java before my uni and now I am in my 2nd semester of python so I am new as well

#

What was the fix btw?

orchid forge May 26, 2024, 4:57 PM

#

i made a stupid mistake, i didnt add 'r'
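
For context on why the `r` matters: in a normal Python string literal the backslash starts an escape sequence, so a Windows path can silently change. A minimal sketch, assuming a made-up path (the folder and file names below are not from this conversation):

```python
from pathlib import PureWindowsPath

# Hypothetical path, for illustration only.
plain = "C:\new_data\table.csv"      # \n and \t are read as newline and tab
raw = r"C:\new_data\table.csv"       # the r prefix keeps backslashes literal
escaped = "C:\\new_data\\table.csv"  # doubling the backslashes also works

print(repr(plain))     # shows the newline and tab that crept in
print(raw == escaped)  # True

# pathlib sidesteps the issue by joining the parts explicitly
joined = PureWindowsPath("C:/", "new_data", "table.csv")
print(joined)
```

Forward slashes are another option that usually works on Windows as well.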

humble mesa May 26, 2024, 4:58 PM

#

Happens xD

I literally got roasted by Java sometimes for not adding a ; in line 42

#

see it positively: you will never forget the r 😄

orchid forge May 26, 2024, 5:02 PM

#

ya

#

lol

misty shuttle May 26, 2024, 5:04 PM

#

how do i proceed with ML if i've learnt numpy, pandas, matplotlib and scikit learn?

humble mesa May 26, 2024, 5:05 PM

#

What helped me was doing a data visualisation project

#

I analysed the top 1000 streamers world wide and ran some statistics on a public dataset.

#

So my go-to answer would be:

1. get some dataset
2. Try to visualise the data 
3. Upload to git

#

btw i like matplotlib as well, however plotly express is perfect for interactive elements

misty shuttle May 26, 2024, 6:44 PM

#

humble mesa So my go-to answer would be: ```py 1. get some dataset 2. Try to visualise the d...

thanks!

hollow escarp May 26, 2024, 7:29 PM

#

left tartan That's like zero details. So I guess explain more?

Could you say exactly what details you need? Code? Model?

serene scaffold May 26, 2024, 7:29 PM

#

hollow escarp Could you say exactly what details you need? Code? Model?

All of those, yes.

hollow escarp May 26, 2024, 7:30 PM

#

https://universe.roboflow.com/roboflow-universe-projects/license-plate-recognition-rxg4e i use this model converted to onnx using onnx cli

Roboflow

License Plate Recognition Object Detection Dataset and Pre-Trained ...

10126 open source license-plates images plus a pre-trained License Plate Recognition model and API. Created by Roboflow Universe Projects

#

!pastebin

arctic wedgeBOT May 26, 2024, 7:32 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

hollow escarp May 26, 2024, 7:33 PM

#

https://paste.pythondiscord.com/LD3A that's the code I use for object detection

#

```py
if not frame_queue.empty():
    frame = frame_queue.get()
    # cv2.imshow("Frame", frame)
    height, width = frame.shape[:2]
    processed_image = prepare_image_input(frame)
    plate = detect_closest_license_plate(
        session, processed_image, height, width, app_logger
    )
```
Here is how i trigger that function for camera stream

#

How can I send you my model in onnx format to not get message from @arctic wedge ?

#

Some model info from https://netron.app/

Netron

Visualizer for neural network, deep learning and machine learning models.

#

@serene scaffold what else do you need?

lapis sequoia May 26, 2024, 10:08 PM

#

does anyone have a really good guide to NLPs? I have been working on some, but I just want a good guide

#

one that is not outdated

serene scaffold May 26, 2024, 10:14 PM

#

lapis sequoia does anyone have a really good guide to NLPs? I have been working on some, but I...

Nothing is "an nlp". No one talks about "NLPs" to refer to programs.

What specific thing do you want to make?

lapis sequoia May 26, 2024, 10:15 PM

#

ok, so like, CountVectorizer, TfidfVectorizer: when one starts the train/test split, I get this one error all of the time. I do not know if it is because I used TfidfVectorizer instead of CountVectorizer or vice versa

serene scaffold May 26, 2024, 10:18 PM

#

lapis sequoia ok, so like, countvectorize,tfidvectorize, when one starts the train/test split,...

There isn't an overarching guide to all of nlp because it's a broad category of AI. Whatever you're working on is concerned with a small subset of what is considered to be nlp.

If you need help in relation to an error message, be sure to always always show the whole error message

lapis sequoia May 26, 2024, 10:21 PM

#

serene scaffold There isn't an overarching guide to all of nlp because it's a broad category of ...

How broad? and this:
UnimplementedError: Graph execution error:

#skiping error

Node: 'binary_crossentropy/Cast'
Cast string to float is not supported
[[{{node binary_crossentropy/Cast}}]] [Op:__inference_train_function_21541]

serene scaffold May 26, 2024, 10:27 PM

#

lapis sequoia How broad? and this: UnimplementedError: Graph execution error: #skiping error...

I'm not sure how to quantify how broad it is.

It looks like you have some data structure that's supposed to contain only numbers, but it contains strings.
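
One common way to end up with `Cast string to float is not supported` is passing text labels straight to a loss such as binary cross-entropy. A minimal sketch of mapping labels to 0/1 first, assuming that is what happened here (the label values are made up):

```python
# Hypothetical labels; the real column and values will differ.
labels = ["positive", "negative", "negative", "positive"]

# Build a stable string -> integer mapping, then convert to floats.
classes = sorted(set(labels))                      # ['negative', 'positive']
to_index = {name: i for i, name in enumerate(classes)}
y = [float(to_index[name]) for name in labels]

print(to_index)  # {'negative': 0, 'positive': 1}
print(y)         # [1.0, 0.0, 0.0, 1.0]
```

The same check applies to the features: anything fed to the network has to be numeric by the time training starts.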

lapis sequoia May 26, 2024, 10:28 PM

#

I will figure it out. It always happens during one step. I do not know, it is confusing me

serene scaffold May 26, 2024, 10:30 PM

#

lapis sequoia I will figure it out. It always happens during one step. I do not know, it is con...

I'm on mobile, so I can't dig in more at the moment. You might make a reproducible example of the problem you're having

lapis sequoia May 26, 2024, 10:32 PM

#

serene scaffold I'm on mobile, so I can't dig in more at the moment. You might make a reproducib...

alright, will not be hard to do, thank you.

lapis sequoia May 26, 2024, 10:52 PM

#

serene scaffold I'm on mobile, so I can't dig in more at the moment. You might make a reproducib...

it is this, every time. Sorry, not trying to bother you. This is the error, and this is the snippet of code I am talking about:

raise ValueError(Errors.E1041.format(type=type(doc_like)))
ValueError: [E1041] Expected a string, Doc, or bytes as input, but got: <class 'list'>

def lemmatize_reviews(df):
    df['Quote'] = df['Quote'].apply(lemmatize_text)
    return df

from nltk.tokenize import word_tokenize

def lemmatize_text(text):
    doc = nlp(text)
    lemmatized_text = ' '.join([token.lemma_ for token in doc])
    return lemmatized_text

def do_tokenization(text):
    token_words = word_tokenize(text)
    return token_words

df['Quote'] = df['Quote'].apply(do_tokenization)

from sklearn.feature_extraction.text import TfidfVectorizer,CountVectorizer

tfid = TfidfVectorizer(preprocessor=do_tokenization)

df['Quote'] = df['Quote'].apply(lemmatize_text)
df['Anime'] = df['Anime'].apply(lemmatize_text)

cv = CountVectorizer()
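
A possible reading of the traceback, offered as a guess: `do_tokenization` is applied to `df['Quote']` first, so each cell becomes a list of tokens, and `lemmatize_text` then hands that list to `nlp`, which only accepts a string, `Doc`, or bytes. The sketch below reproduces the ordering problem with a stand-in for `nlp` (`fake_nlp` is hypothetical and only mimics the type check), so it runs without spaCy or NLTK:

```python
def fake_nlp(text):
    # Hypothetical stand-in for the spaCy pipeline: mimics only the type check.
    if not isinstance(text, str):
        raise ValueError(
            f"Expected a string, Doc, or bytes as input, but got: {type(text)}"
        )
    return text.split()

def lemmatize_text(text):
    return " ".join(fake_nlp(text))

def do_tokenization(text):
    return text.split()  # stand-in for word_tokenize

quote = "the quick brown fox"  # made-up sample text

# Order used in the snippet: tokenize first, then lemmatize, so a list reaches nlp
try:
    lemmatize_text(do_tokenization(quote))
    failed = False
except ValueError as err:
    failed = True
    print(err)

# Lemmatize the raw string first; tokenize afterwards if tokens are needed
tokens = do_tokenization(lemmatize_text(quote))
print(tokens)
```

If that is the cause, keeping the column as plain strings until after lemmatization should avoid the error.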

spring field May 26, 2024, 10:53 PM

#

lapis sequoia it is this. Everytime, Sorry, not trying to bother you, this is the error,and th...

!code could you please format that

arctic wedgeBOT May 26, 2024, 10:53 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

frosty spear May 26, 2024, 10:54 PM

#

print("Code")

sand bloom May 26, 2024, 11:24 PM

#

Hii, can i ask for simple help on a Collaborative filtering question?

serene scaffold May 26, 2024, 11:25 PM

#

sand bloom Hii, can i ask for simple help on a Collaborative filtering question?

Yes. Be sure to always ask a complete question that someone can start answering.

small ore May 26, 2024, 11:32 PM

#

" implemented sklearn library for cosine" ???? It sounds like you implemented that class in sklearn

#

As in the source code

sand bloom May 26, 2024, 11:35 PM

#

yes sorry

sand bloom May 27, 2024, 2:11 AM

#

Hi no worries sorted my issue my question was just dumbly asked

spring field May 27, 2024, 2:13 AM

#

PER in RL sort of seems similar to DML and triplet loss where you try to get the furthest same class and closest other class to your current thing you're checking to sort of push away the other class and bring the same class closer
similarly PER should enable both retrieving low score actions in states thus making it less likely to make such actions in the future (pushing it away) and bring actions that had greater rewards closer (if making an analogy for triplet loss)
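
For comparison with the analogy, a sketch of proportional prioritized experience replay as it is usually described: transitions are sampled with probability proportional to (|TD error| + eps)^alpha, so what gets replayed more often is the surprising transitions rather than low- or high-reward ones as such. The TD errors and hyperparameters below are made up:

```python
import random

td_errors = [0.1, 2.0, 0.5, 0.0]  # invented TD errors, one per stored transition
ALPHA = 0.6   # assumed: 0 gives uniform sampling, 1 fully proportional
EPS = 0.01    # assumed: keeps zero-error transitions sampleable

priorities = [(abs(d) + EPS) ** ALPHA for d in td_errors]
total = sum(priorities)
probs = [p / total for p in priorities]

random.seed(0)
batch = random.choices(range(len(td_errors)), weights=probs, k=5)
print([round(p, 3) for p in probs])
print(batch)
```

The transition with the largest TD error (index 1) gets the largest sampling probability, while the zero-error one still has a small chance of being drawn.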

stray tide May 27, 2024, 8:10 AM

#

Hey everyone! 😊
I'm diving back into programming after a bit of a break and planning to create a stock market prediction ML model. I've got some basic ML knowledge from Kaggle, but I'm looking for some buddies to learn and build this together. Programming alone can get pretty dull, and having friends makes it so much more fun! If you're interested, hit me up! Let's make this project awesome together!

stiff urchin May 27, 2024, 8:36 AM

#

what concepts do i need to learn in calculus for machine learning? Already learnt integration and differentiation, what else?

stiff urchin May 27, 2024, 8:54 AM

#

do i need to learn each and every topic for ml?

#

not backend engineering, ig i would need to learn different different languages for that.. currently im planning for research

wooden sail May 27, 2024, 8:57 AM

#

research of what kind

#

the applied approach requires less maths, but stuff like designing new architectures and studying their properties requires a fair amount of math

#

all of those are made up terms trying to keep up with the tasks people end up doing at work, idk how meaningful it is to distinguish them strictly when companies don't

#

i don't think the experience will be much different in most places

toxic mortar May 27, 2024, 9:11 AM

#

do you have phd? @final kiln

#

Have you looked what europe-based ml programs for msc/phd can offer?

stiff urchin May 27, 2024, 9:14 AM

#

actually, im still in high school 😅 currently want to learn ml as a hobby so in future i can become data scientist. and im even preparing for one of the toughest exams in the world (jee; you might know about it), i believe it consists of some of the concepts of calculus, it'll help building my foundation and then I'll start with advanced topics

toxic mortar May 27, 2024, 9:14 AM

#

Does it make sense to do a phd in ml if u are not aiming for r&d roles?

#

I will be soon getting into it and I'll share what I hear/find w u

past meteor May 27, 2024, 9:18 AM

#

The term data scientist is so dead

#

ML engineer will perish soon as well

#

Nobody, it's going to be a game of revolving chairs

#

Data scientist nowadays means "does anything with data"

#

ML engineer will soon mean "does anything with ML", including calling openai's APIs

#

Just wait, I saw the same thing gradually happen to data science

#

Well, nowadays it just means "makes dashboards in powerbi" anyway 😭

humble mesa May 27, 2024, 10:59 AM

#

past meteor Well, nowadays it just means "makes dashboards in powerbi" anyway 😭

Well we have a dedicated powerbi team separated from the AI team and Data Science team. Unfortunately, some believe that AI = LLMs and LLMs = M365 Co Pilot

past meteor May 27, 2024, 11:02 AM

#

Yeah, I'm working exclusively on deep learning right now but I'm not sure if there will be any other project like mine after it's done

dusty talon May 27, 2024, 11:03 AM

#

hey uh im new to ai i tried to make a classifier ai im not sure why its saying ball is a verb though any idea? i have a feeling its lack of data because im using not a lot of data but ??
https://paste.pythondiscord.com/YGAQ

humble mesa May 27, 2024, 11:03 AM

#

JEEEEEEZ try a bow approach

#

I mean I am in my 2nd semester. I honestly haven't worked so much with that but what I would do is take random text

build bigrams
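
A minimal sketch of the bigram step in plain Python, assuming simple whitespace tokenization (the sample sentence is made up):

```python
from collections import Counter

text = "the ball is red and the ball is round"  # made-up sample text
words = text.lower().split()

# Pair each word with the one that follows it.
bigrams = list(zip(words, words[1:]))
counts = Counter(bigrams)

print(counts.most_common(2))
```

Here `('the', 'ball')` and `('ball', 'is')` each appear twice; counts like these can then be used as features for a classifier.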

#

This is way easier to do with a library (let me search for a good one)

https://maartengr.github.io/KeyBERT/guides/countvectorizer.html#basic-usage

CountVectorizer - KeyBERT

Leveraging BERT to extract important keywords

dusty talon May 27, 2024, 11:10 AM

#

alright let me try tysm

spring field May 27, 2024, 11:28 AM

#

pop!

edgy elm May 27, 2024, 12:08 PM

#

Guys, I have a question about a confusion matrix: why doesn't my image show the numbers, when they do show up when I print it?

#

like this

spring field May 27, 2024, 12:32 PM

#

hard to tell without the code

#

yeah, I was thinking about its applications in RL as well

misty shuttle May 27, 2024, 1:14 PM

#

humble mesa So my go-to answer would be: ```py 1. get some dataset 2. Try to visualise the d...

can you explain what exactly a data-visualisation project is? I dont get how its related to ML sorry

edgy elm May 27, 2024, 1:29 PM

#

spring field hard to tell without the code

thx for replying the problem was solved, the seaborn lib must be 0.13.0

orchid forge May 27, 2024, 2:10 PM

#

hola

#

im making a project on this type of data

#

can some one help me with what type of questions i should come up with to make a project about it?

uncut plaza May 27, 2024, 4:25 PM

#

orchid forge can some one help me with what type of questions i should come up with to make a...

you can do time series analysis with that data, maybe predict the salary with experience level

orchid forge May 27, 2024, 4:26 PM

#

uncut plaza you can do time series analysis with that data, maybe predict the salary with exp...

ok

uncut plaza May 27, 2024, 4:26 PM

#

is this a personal project or related to school?

orchid forge May 27, 2024, 4:26 PM

#

personal project

uncut plaza May 27, 2024, 4:27 PM

#

okay that's awesome, keep working on it

orchid forge May 27, 2024, 4:28 PM

#

thanks man. on to it haha

analog bolt May 27, 2024, 5:32 PM

#

So making an AI that does one thing, e.g, guessing a country based on its shape, is relatively simple, but how much harder is making an AI that is able to do 2 things, e.g, can recognise countries based on shape, but can also recognise countries based on their flags?

torpid dune May 27, 2024, 5:32 PM

#

hi everyone! i was looking for an all in one ml ecosystem, as lightning studio AI, does someone use it? How about speed? do you have any better recommendation for realtime inference?

uncut plaza May 27, 2024, 5:47 PM

#

analog bolt So making an AI that does one thing, e.g, guessing a country based on its shape,...

it's not that hard. you can use the same model you built for predicting from shape and make it predict from flags: same model, different dataset

analog bolt May 27, 2024, 5:49 PM

#

uncut plaza its not that hard you can use the same model you did for predicting using shape ...

No, I mean being able to do both at the same time

#

How much harder would it be to create a model that can do both?

uncut plaza May 27, 2024, 5:49 PM

#

depends on what your definition of hard is

#

like i said it should be the same

spare briar May 27, 2024, 6:41 PM

#

torpid dune hi everyone! i was looking for an all in one ml ecosystem, as lightning studio A...

what model are you serving?

torpid dune May 27, 2024, 6:42 PM

#

context: I'm building a multi-agent system with langgraph.
But this system contains multiple models. One of them is a vision model.
The other ones are simple LLM inference endpoints.

The vision model is currently an NVIDIA endpoint, but it takes too long, around 2-3 seconds. I'm looking to deploy a self-hosted vision model to try to reduce the inference time.

spare briar May 27, 2024, 6:45 PM

#

i would recommend tensorrt + triton inference server

#

prod ready, fast, maintained well

torpid dune May 27, 2024, 6:49 PM

#

spare briar prod ready, fast, maintained well

it seems very cool, thanks for the suggestion! i'll take a look. It contains a grpc server, great!

spare briar May 27, 2024, 6:50 PM

#

yes we use grpc and have had good experience

earnest finch May 27, 2024, 7:02 PM

#

hello everyone, i wanted to learn reinforcement learning so came here to ask suggestions for good resources available online
i have completed my college math and have a decent understanding of statistics, lin alg and calculus.
i have worked with simple ANNs like making it from scratch (shoutout to andrej karpathy for it). unfortunately i couldnt find any videos from him which teach RL so wanted your opinions on where else/who else can i watch for learning how RL's work from scratch. im fine with papers as well (as long as they dont include any complicated math notations/vocabulary 😅 as i still consider myself to be a beginner)

spring field May 27, 2024, 7:05 PM

#

you can start here I guess https://medium.com/free-code-camp/an-introduction-to-reinforcement-learning-4339519de419

past meteor May 27, 2024, 7:08 PM

#

earnest finch hello everyone, i wanted to learn reinforcement learning so came here to ask sug...

I did it by reading this book http://incompleteideas.net/book/the-book-2nd.html and implementing every algorithm they show pseudocode for

earnest finch May 27, 2024, 7:09 PM

#

ahhh will check them out, thank you

past meteor May 27, 2024, 7:09 PM

#

The devil is in the details and just reading the text was not enough for me to understand the nuance

#

Especially when you start comparing on-policy vs off-policy methods etc

earnest finch May 27, 2024, 7:10 PM

#

thank you so much

past meteor May 27, 2024, 7:11 PM

#

It was pretty fun. After writing this stuff from scratch (just numpy + matplotlib) everything became clear 😄

spring field May 27, 2024, 7:12 PM

#

I'm doing policy stuff now... it's a bit of a confusing topic to say the least and I don't understand why it's not converging waaaaaaaaaahhhhhh I mean it is, so maybe the issue is more with policy gradient not being efficient enough on its own 🤷

past meteor May 27, 2024, 7:12 PM

#

spring field I'm doing policy stuff now... it's a bit of a confusing topic to say the least a...

it's worth to start from the beginning of the book imho

earnest finch May 27, 2024, 7:12 PM

#

maybe i will understand the magical words used once i read the books 😄

past meteor May 27, 2024, 7:12 PM

#

there's a very logical trajectory to all of this

earnest finch May 27, 2024, 7:12 PM

#

im assuming i start with full pdf link?

past meteor May 27, 2024, 7:13 PM

#

beginning with policy gradient methods would've confused the heck out of me

#

Also, many of the algorithms are proven to converge under specific conditions

spring field May 27, 2024, 7:13 PM

#

I mean, I didn't begin with policy gradients, I'm there now

past meteor May 27, 2024, 7:13 PM

#

like, on-policy + tabular => converges

#

off-policy + function approximation => can possibly not converge etc
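
The on-policy/off-policy distinction shows up directly in the update target. A minimal tabular sketch with made-up numbers: SARSA bootstraps from the action that was actually taken next, while Q-learning bootstraps from the greedy action regardless of what was taken.

```python
ALPHA, GAMMA = 0.5, 0.5  # assumed step size and discount, chosen for easy arithmetic

def sarsa_update(q, s, a, r, s_next, a_next):
    # On-policy: the target uses the next action the agent really took.
    target = r + GAMMA * q[s_next][a_next]
    return q[s][a] + ALPHA * (target - q[s][a])

def q_learning_update(q, s, a, r, s_next):
    # Off-policy: the target uses the best next action, whatever was taken.
    target = r + GAMMA * max(q[s_next].values())
    return q[s][a] + ALPHA * (target - q[s][a])

# Invented Q-table with two states and two actions.
q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 2.0}}

print(sarsa_update(q, "s0", "right", 1.0, "s1", "left"))  # 0.75
print(q_learning_update(q, "s0", "right", 1.0, "s1"))     # 1.0
```

Same transition, different updated value, purely because of which next action the target looks at.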

spring field May 27, 2024, 7:15 PM

#

nvm, I'm gonna just go and read that book, lmao

past meteor May 27, 2024, 7:15 PM

#

policy gradient should be on-policy though iirc

spring field May 27, 2024, 7:17 PM

#

I'm just currently going over all those methods really, I think aim is to get familiar with a bunch of network types and then specialize later? the course I mentioned a couple days ago

river cape May 27, 2024, 7:17 PM

#

How do you handle imbalance data?

spring field May 27, 2024, 7:18 PM

#

tbf, from a learning perspective, making multiple passes over learning material, going deeper and deeper on each iteration is a better method than just going one by one and trying to have a deep understanding immediately

spring field May 27, 2024, 7:19 PM

#

river cape How do you handle imbalance data?

by using weights, but I think it depends on what exactly you're doing, cuz apparently for say next token prediction you don't want to do that

past meteor May 27, 2024, 7:19 PM

#

river cape How do you handle imbalance data?

doing nothing

#

if that's your distribution, that's your distribution

spring field May 27, 2024, 7:19 PM

#

squint hmmm

river cape May 27, 2024, 7:19 PM

#

spring field by using weights, but I think it depends on what exactly you're doing, cuz appar...

Lets say my dataset has 20% of no cases and 80% of yes cases
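
For an 80/20 split like this one, the weighting approach can be sketched with the "balanced" heuristic, weight = n_samples / (n_classes * class_count), which is the formula scikit-learn documents for `class_weight="balanced"`. The counts below are made up to match the ratio:

```python
from collections import Counter

# Made-up labels matching the 80% yes / 20% no ratio.
labels = ["yes"] * 80 + ["no"] * 20

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Rarer classes get proportionally larger weights.
weights = {cls: n_samples / (n_classes * c) for cls, c in counts.items()}
print(weights)  # {'yes': 0.625, 'no': 2.5}
```

Whether weighting helps at all depends on the task and on what the predictions are used for.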

past meteor May 27, 2024, 7:20 PM

#

spring field tbf, from a learning perspective, making multiple passes over learning material,...

9 times out of 10 I'd agree but the book is so well written I think it's the kind of text you can go through and just get it in a single pass

river cape May 27, 2024, 7:20 PM

#

Should I take 20% yes cases only?

past meteor May 27, 2024, 7:21 PM

#

It has a logical structure that makes RL seem coherent, if you go method by method (as is actually not a bad idea for traditional ML) you're kind of losing out on that

spring field May 27, 2024, 7:21 PM

#

past meteor 9 times out of 10 I'd agree but the book is so well written I think it's the kin...

I was more talking about the whole course that I'm doing where we don't really go thaaaat much in depth of all the models we cover

past meteor May 27, 2024, 7:21 PM

#

oooooh

#

yeah ok that makes sense

#

then I agree

spring field May 27, 2024, 7:22 PM

#

but if I'm gonna be doing some more RL afterwards, I'll definitely check out that book, bookmarked it

past meteor May 27, 2024, 7:24 PM

#

If your course touched on RL it's probably a really good one

#

Maybe we should pin it?

#

As a nice "overview" type of course

spring field May 27, 2024, 7:26 PM

#

it's, uhh, a private one so to speak, but it is good, yes 😁

past meteor May 27, 2024, 7:27 PM

#

fair enough! 🙂

lusty relic May 27, 2024, 7:44 PM

#

Is Levenshtein distance a good option for spell checking if I have sequences of words in a database for search recommendations?
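
For reference, a small sketch of how edit distance could be used to suggest corrections against stored phrases. The phrase list and the `max_distance` cutoff are hypothetical, and a linear scan like this is only reasonable for small lists:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical stored search phrases.
phrases = ["machine learning", "deep learning", "reinforcement learning"]

def suggest(query, max_distance=3):
    best = min(phrases, key=lambda p: levenshtein(query, p))
    return best if levenshtein(query, best) <= max_distance else None

print(levenshtein("kitten", "sitting"))  # 3
print(suggest("machin lerning"))         # machine learning
```

Comparing whole phrases treats a typo in any word the same way; comparing word by word is another option when the phrases are long.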

past meteor May 27, 2024, 7:48 PM

#

I'll link this again as what you should do with unbalanced data

https://scikit-learn.org/stable/auto_examples/model_selection/plot_cost_sensitive_learning.html

scikit-learn

Post-tuning the decision threshold for cost-sensitive learning

Once a classifier is trained, the output of the predict method outputs class label predictions corresponding to a thresholding of either the decision_function or the predict_proba output. For a bin...
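
The idea in the linked example, sketched without scikit-learn: keep the classifier as trained and pick the decision threshold that minimises a business cost instead of defaulting to 0.5. The probabilities, labels, and costs below are invented for illustration:

```python
# Invented predicted probabilities and true labels (1 = rare positive class).
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

COST_FN = 10.0  # assumed cost of missing a positive
COST_FP = 1.0   # assumed cost of a false alarm

def total_cost(threshold):
    cost = 0.0
    for p, y in zip(probs, y_true):
        pred = 1 if p >= threshold else 0
        if pred == 0 and y == 1:
            cost += COST_FN
        elif pred == 1 and y == 0:
            cost += COST_FP
    return cost

# Scan a grid of thresholds and keep the cheapest one.
candidates = [i / 20 for i in range(1, 20)]  # 0.05 ... 0.95
best = min(candidates, key=total_cost)
print(best, total_cost(best), total_cost(0.5))
```

With these numbers the cheapest threshold sits below 0.5, because a missed positive is assumed to cost ten times more than a false alarm. In practice the threshold should be chosen on held-out data.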

#

this is true but typically not the case, at least in applications where people talk about unbalanced data

spring field May 27, 2024, 7:53 PM

#

alright, suppose it's a classification task, you'd want to handle class unbalance there for sure, right?

past meteor May 27, 2024, 7:53 PM

#

There's also the misconception that the method will just ignore the minority class in favour of the majority

#

That's not necessarily true

past meteor May 27, 2024, 7:53 PM

#

spring field alright, suppose it's a classification task, you'd want to handle class unbalanc...

no

past meteor May 27, 2024, 7:54 PM

#

past meteor I'll link this again as what you should do with unbalanced data https://scikit...

This example is classification

spring field May 27, 2024, 7:55 PM

#

I can understand that specific example

past meteor May 27, 2024, 7:55 PM

#

yeah I agree on it too but the thing is

#

I wouldn't want to mention it because I don't like talking about downsampling/upsampling because that's what people like doing which is just the wrong thing to do in many cases

#

For instance, I did a course on biometrics (making models for fingerprint, retina, face, ... detection) and we didn't downsample or anything funky a single time

#

You can imagine how unbalanced these datasets are.

#

We just used common sense, less sexy ways to deal with imbalance

spring field May 27, 2024, 7:59 PM

#

past meteor For instance, I did a course on biometrics (making models for fingerprint, retin...

mmm, doesn't that use DML or sth?

past meteor May 27, 2024, 7:59 PM

#

DML?

spring field May 27, 2024, 7:59 PM

#

Deep Metric Learning

#

how face detection works and stuff

past meteor May 27, 2024, 7:59 PM

#

yeah, that's one of the things we did

#

but there's many other ways

spring field May 27, 2024, 8:00 PM

#

AVE?

#

ah, these, we did cover them too at some point

past meteor May 27, 2024, 8:03 PM

#

Looking back at it, we did eigenfaces, fisher faces, local binary patterns and deep metric learning

#

I also did deep metric learning for retina detection

#

well, classification/retrieval

#

damn, sift features are literally stone age tech now

past meteor May 27, 2024, 8:05 PM

#

past meteor damn, sift features are literally stone age tech now

this ugly thing is a retina, but placed horizontally instead of in a circle

#

It's your eye

#

they're unique identifiers, even between twins

#

ah shit

#

true

#

https://tenor.com/view/biometrics-steve-webb-fbi-international-eye-scanner-retina-scanner-gif-25897822

Tenor

#

like this stuff

#

was a cool course in hindsight, kind of like applied computer vision

#

the one I sent?

#

it's legit ye but these were the originals, before preprocessing

spring field May 27, 2024, 8:09 PM

#

past meteor damn, sift features are literally stone age tech now

how do you get this from that?

past meteor May 27, 2024, 8:10 PM

#

spring field how do you get this from that?

honest answer? A bunch of preprocessing I didn't write myself

spring field May 27, 2024, 8:10 PM

#

oh, I just don't see the correlation, lol

past meteor May 27, 2024, 8:11 PM

#

hmm?

spring field May 27, 2024, 8:11 PM

#

is it like, unrolled?

past meteor May 27, 2024, 8:11 PM

#

yes

spring field May 27, 2024, 8:12 PM

#

idk, looks a bit more than just split in half

past meteor May 27, 2024, 8:13 PM

#

the code I copy pasted is nasty

#

but it works 🤷

#

Whenever people at work talk about "neural nets are too heavyweight" I show them stuff like this to show how finicky pre-NN comp vision was

#

Way more effort for worse results 😉 😄

spring field May 27, 2024, 8:16 PM

#

mmm, cuz I'm thinking this

past meteor May 27, 2024, 8:16 PM

#

spring field mmm, cuz I'm thinking this

Ah that's fair actually

spring field May 27, 2024, 8:17 PM

#

idk, the pupil is weird anyway

past meteor May 27, 2024, 8:17 PM

#

fsr I didn't ask as many questions about this as you guys are, but that's a good thing on your part

#

I just accepted the preprocessing had processed

#

I actually haven't used vision transformers

#

CNNs only 👴

spring field May 27, 2024, 8:18 PM

#

I did not need to know that...

#

I saw some performance graphs and that did seem to be the case
though seeing where they pay their attention is quite fun in ViTs

#

and I assume ViViT probably does beat a CNN, cuz a CNN can't really do frame processing, I mean, ig you could try do an RNN + CNN 🤔

past meteor May 27, 2024, 8:19 PM

#

This was a lot of fun. I made a novel method where one neural net outputs image-specific noise that, when added to any image, tricks a second neural net into believing it's an airplane.

#

The noise is very visible but that's because I didn't tune my loss function, I used a basic heuristic

spring field May 27, 2024, 8:20 PM

#

ah, the likely future of AI cybersecurity 🤣

past meteor May 27, 2024, 8:21 PM

#

I wonder if something similar can be done to open source transformers

#

You need whitebox access to the gradients, which you absolutely do for llama etc.
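
Why gradient access matters can be sketched with the fast gradient sign method (FGSM) on a tiny logistic-regression "model". This is a different and much simpler technique than the noise-generating network described above, and the weights and input are invented:

```python
import math

w = [2.0, -1.0, 0.5]  # invented model weights
b = 0.0
x = [0.5, 0.2, 0.4]   # invented input whose true label is 1
y = 1

def predict(v):
    z = sum(wi * vi for wi, vi in zip(w, v)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(g):
    return (g > 0) - (g < 0)

# For the logistic (cross-entropy) loss, d(loss)/d(x_i) = (p - y) * w_i.
# Computing it requires the weights, i.e. whitebox access.
p = predict(x)
grad = [(p - y) * wi for wi in w]

EPSILON = 0.5  # assumed perturbation budget per feature
x_adv = [xi + EPSILON * sign(g) for xi, g in zip(x, grad)]

print(round(p, 3), round(predict(x_adv), 3))
```

The original input is classified as class 1, and the perturbed one falls below 0.5, so the prediction flips. For a deep network the same gradient comes from backpropagation instead of a closed form.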

lapis sequoia May 27, 2024, 8:21 PM

#

past meteor You need whitebox access to the gradients, which you absolutely do for llama etc...

Gradients, huh?

past meteor May 27, 2024, 8:23 PM

#

I should do more ML projects. I never do (aside from my actual job)

spring field May 27, 2024, 8:24 PM

#

same here (except I don't have a job yet...)

past meteor May 27, 2024, 8:33 PM

#

Yeah, I think Matiss is talking about video

#

RNN+CNN is truly a thing for those applications yeah

#

@final kiln https://boards.eu.greenhouse.io/otainsightltd/jobs/4334136101 thoughts? Sounds suspect to me.

#

Type of role where I suspect you'll be doing dashboards 24/7

#

I applied, nothing to lose

#

Want to know the secret of the high callback ratio? (I suspect I'll get a callback for this one very soon)

#

Here most ML people can't really code well enough to put things into prod

#

yes

#

If you can, and have demonstrable experience doing so you're ahead

#

Notice how using version control is a nice to have 😭

#

I see it with some of my colleagues. One does all his "development" inside a browser IDE in our SaaS. Doesn't version control anything "because it is saved in the browser"

#

Brilliant guy, but that's reality

#

I detest browser IDEs

#

Made me really dislike databricks

#

I need my vs code with my very specific colour template, keybindings, ...

#

I sync it between work and private

#

I just want them to give me a way to SSH into their environment

#

and code with vscode

#

ah, then it's np

#

Also went for this because I'm curious how much they pay https://www.solita.fi/positions/ai-specialist-with-a-focus-on-genai-5878783003/

#

I wouldn't do Gen AI focused roles ngl

#

The risk is losing touch with actually training models and the difficulties with deploying them

#

If you're just doing it with existing APIs you're effectively a backend SWE

#

Just my 2 cents

#

yeah, this stuff is actually engaging

#

because people (incl. myself) don't really understand it

#

It was the same issue with my current job. All our projects were too abstract for regular people with no tangible use case

#

"Spinal deformity detection using wavelet features", "glucose prediction models for people with type 1 diabetes", ...

The only thing we made that people could understand was some IoT computer vision thing

#

It had a UI, clear output etc

#

That's a factor as well

#

I'm gonna let the NLP train pass me I think

#

Haven't done enough in it and for some reason I can't be bothered to either

#

Just did the info retrieval thing in uni (which is the foundation for RAGs etc) but didn't do the actual NLP course

#

The cost of training, deploying, RLHF, ...

hollow escarp May 27, 2024, 9:18 PM

#

hollow escarp https://universe.roboflow.com/roboflow-universe-projects/license-plate-recogniti...

@left tartan is data mentioned here enough?

#

https://www.easypaste.org/file/xqYJMfw0/license.plate.detector.onnx?lang=en

EasyPaste.org

license_plate_detector.onnx

#

Here is my model converted to onnx format

craggy patio May 27, 2024, 9:44 PM

#

!paste

arctic wedgeBOT May 27, 2024, 9:44 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

hollow escarp May 27, 2024, 9:55 PM

#

craggy patio !paste

for who is that message?

hollow escarp May 27, 2024, 10:34 PM

#

I don't think that pasting the model to the pastebin makes sense

#

Also, I'd be really glad for any ideas on how I can speed up my model

craggy patio May 27, 2024, 10:41 PM

#

oh that's actually really funny mb

#

I did that for myself

#

I was pasting code in the rust discord server but it was too big

#

I needed the link for pastebin but I didn't know it off the top my head so I went to a random channel and did ! paste

#

sorry for the confusion

left tartan May 27, 2024, 10:47 PM

#

hollow escarp <@738234281146712084> is data mentioned here enough?

I can't look today, but perhaps someone else can.

spring field May 27, 2024, 11:11 PM

#

ViViT uses transformers, lol

torpid dune May 27, 2024, 11:11 PM

#

past meteor If you're just doing it with existing APIs you're effectively a backend SWE

completely agree.

spring field May 27, 2024, 11:14 PM

#

I was just wondering if CNN + RNN could do sth with videos

trail whale May 28, 2024, 2:51 AM

#

Ok I have a long and convoluted question for someone with a bigger brain than me:
So I know there are tons of free GPT/LM services online where I can send a prompt and get a response back. And I know the way this works is that it tokenizes my input, turns those tokens into vectors, sends them through a neural network, and outputs a confidence value for every possible token, i.e. how sure it is that each token follows the text so far. I'm wondering if there's an API or a cloud-based service where I can send a string as input and get back the confidence values for the next token in the string. For example, if the input is "The sun is", the output could be [["hot", 98], ["big", 95], ...]
TLDR: Is there an API where I can get the list of the raw output nodes of a GPT?

agile cobalt May 28, 2024, 3:10 AM

#

you may as well just run Llama or Phi locally instead

much of the time the tokens are not entire words, but rather fragments that may make no sense on their own like "un"

#

you can specify logprobs for the OpenAI GPT API though, https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs

spare briar May 28, 2024, 3:12 AM

#

openai completions api exposes logprobs but at most top 5

#

wonder if this is related to message size? or to make teacher-student distillation more difficult?
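
A minimal sketch of what to do with such log-probabilities once you have them, assuming they arrive as a token-to-logprob mapping (in the OpenAI chat API they are requested with `logprobs=True` and `top_logprobs=N`). The numbers below are made up for illustration, and `logprobs_to_percent` is a hypothetical helper, not part of any library. Since these are probabilities over the whole vocabulary, the top few sum to at most 100%, so an output like `[["hot", 98], ["big", 95]]` would not occur as written.

```python
import math

def logprobs_to_percent(top_logprobs):
    """Convert {token: log-probability} into [(token, percent)], most likely first."""
    ranked = sorted(top_logprobs.items(), key=lambda item: item[1], reverse=True)
    return [(token, round(100 * math.exp(lp), 1)) for token, lp in ranked]

# Made-up log-probabilities for the token after "The sun is" (illustrative only).
example = {" big": math.log(0.1), " hot": math.log(0.5), " shining": math.log(0.3)}
ranked = logprobs_to_percent(example)
print(ranked)
```

Running a small open model locally, as suggested above, gives access to the full distribution instead of only the top few entries.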

spring field May 28, 2024, 3:13 AM

#

agile cobalt you may as well just run Llama or Phi locally instead much of the time the toke...

quick question, can a TokenLearner be applied to GPTs as well, similar to how it works with ViT? so that it can learn how to split words itself or sth?

#

alright, ig a TokenLearner specifically probably cannot, but the general idea at least

#

network learns how to split words/sentences itself

agile cobalt May 28, 2024, 3:16 AM

#

TokenLearner itself appears to be specifically made for working with multi-dimensional data, so yeah probably not a good idea to use it directly on text

spare briar May 28, 2024, 3:16 AM

#

tokenizers are typically trained separately (most commonly by byte-pair encoding or unigram model https://github.com/google/sentencepiece)

there is some work like this https://arxiv.org/abs/2106.12672 on tokenizers learned e2e with pretraining but it isn't used in sota LLMs

arXiv.org

Charformer: Fast Character Transformers via Gradient-based Subword ...

State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-...
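
A toy sketch of the byte-pair encoding idea mentioned above: start from single characters and repeatedly merge the most frequent adjacent pair into a new token. This is a simplified illustration, not the SentencePiece implementation; real tokenizers work on bytes or normalized text, handle word boundaries and use far larger corpora. The function names and the tiny corpus are assumptions made up for the example.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Learn `num_merges` merge rules from a whitespace-separated corpus."""
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

merges, vocab_words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(vocab_words)
```

The learned merge list is the "token mapping": it is derived purely from pair frequencies in the training text, with no gradient-based learning involved.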

agile cobalt May 28, 2024, 3:17 AM

#

spring field network learns how to split words/sentences itself

the overall idea... iirc the token mappings for text transformers are already generated automatically from the text distribution in the training data, and you probably wouldn't get as much benefit from it in text as you may get for images

spring field May 28, 2024, 3:17 AM

#

agile cobalt TokenLearner itself appears to be specifically made for working with multi dimen...

yeah, it's used with vision transformers, it's pretty cool, I implemented one some days ago #data-science-and-ml message

spring field May 28, 2024, 3:19 AM

#

agile cobalt the overall idea... iirc the token mappings for text transformers are already ge...

mmm, I see, how does this "automatic generation" happen though pg_rofl
(I could also just google it...)

agile cobalt May 28, 2024, 3:19 AM

#

I don't know in enough detail to explain it

spring field May 28, 2024, 3:20 AM

#

fair enough

orchid forge May 28, 2024, 5:27 AM

#

can someone give me some EDA tutorial ?

#

good one

#

course i mean

past meteor May 28, 2024, 6:03 AM

#

orchid forge can someone give me some EDA tutorial ?

I'd say the best thing you can do is participate in tabular playground competitions in Kaggle (look this up). Do your own EDA and look at other solutions afterwards.

bitter ibex May 28, 2024, 6:14 AM

#

Hi, I am working on a project using sklearn and I am facing an error in my final project.
If anybody can help me solve my problem, I would be grateful. Just ping me in DMs.
Thanks

odd meteor May 28, 2024, 6:50 AM

#

bitter ibex Hi I am working on a project using sklearn and i am facing an error in my final ...

You'll get help much faster if you state what exactly you're having problem with here.

orchid forge May 28, 2024, 9:01 AM

#

past meteor I'd say the best thing you can do is participate in tabular playground competiti...

Is it on kaggle
Really?

past meteor May 28, 2024, 9:02 AM

#

yes, make an account and check it out

left tartan May 28, 2024, 9:25 AM

#

This is my favorite resource for EDA topics: https://www.itl.nist.gov/div898/handbook/.

pliant heron May 28, 2024, 9:25 AM

#

```python
# Screen time analysis is the task of analyzing and creating a report on which applications and websites are used by the user and for how long. Apple devices have one of the best ways of creating a screen time report.
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("Screentime_App_Details.csv")
# print(data.head())
# print(data.isnull().sum())
# print(data.describe())

figure = px.bar(data_frame=data, x="Date", y="Usage", color="App", title="Usage Graph")
figure.show()
```

Please, can someone help me fix this error?
I have tried deactivating my antivirus but the issue still persists

summer anvil May 28, 2024, 9:27 AM

#

pliant heron ```python # Screen time analysis is the task of analyzing and creating a report ...

you need to host the server first to run it idk

pliant heron May 28, 2024, 9:28 AM

#

summer anvil you need to host the server first to run it idk

how to do that?

#

thankyou

summer anvil May 28, 2024, 9:28 AM

#

pliant heron how to do that?

idk what are you trying to do

left tartan May 28, 2024, 9:28 AM

#

pliant heron ```python # Screen time analysis is the task of analyzing and creating a report ...

Where are you running this? From a notebook?

pliant heron May 28, 2024, 9:28 AM

#

left tartan Where are you running this? From a notebook?

from vs code

left tartan May 28, 2024, 9:29 AM

#

In a .py or .ipynb?

pliant heron May 28, 2024, 9:29 AM

#

left tartan In a .py or .ipynb?

.py

worn adder May 28, 2024, 9:29 AM

#

i wanna get into this sector, i know basic python, can anyone tell me what exactly it is and what the roadmap looks like? thank you

pliant heron May 28, 2024, 9:29 AM

#

i m looking for this output

left tartan May 28, 2024, 9:30 AM

#

Yah, so you probably want to just write the image or figure.to html

summer anvil May 28, 2024, 9:30 AM

#

then use simplehttpserver or smt idk

pliant heron May 28, 2024, 9:30 AM

#

how to do it

left tartan May 28, 2024, 9:30 AM

#

https://plotly.com/python/getting-started/ (write html, I meant)

summer anvil May 28, 2024, 9:30 AM

#

pliant heron how to do it

or just use live server extension

pliant heron May 28, 2024, 9:31 AM

#

thanks billy and hoang

left tartan May 28, 2024, 9:31 AM

#

The explore chapter, if that's what you mean, yes

#

It's a very thorough document

#

Yup, exactly. And practitioners get frustrated with the recent trend to throw ML at data and forego EDA

past meteor May 28, 2024, 9:47 AM

#

Too many people never look at the data

#

When I say look, I really just mean: sample some of it (images, records, documents, whatever it may be) and read it

#

no graphs just looking

#

From the questions that come here that's often implicit (not looking)

#

As for EDA

#

I've said it many times, I think you should avoid doing ML and your first line of defence is a good EDA such that you could find heuristics that solve the problem sufficiently

#

This is usually a thing with tabular data only imho

#

Agreed but when it's about EDA I guess the assumption is that it's tabular data

#

Or at most time series

#

There's degrees to it though. One of my colleagues is a bio-engineer more analytics focused (+ has more domain knowledge in our project)

#

The EDAs he does are things I'd never dream of

#

They have tons of depth

#

Sometimes they're complex in a bad way but still, loads of depth

night kettle May 28, 2024, 2:49 PM

#

Guys, please, can someone help me plot my data just like in the graph below?

stray tide May 28, 2024, 3:03 PM

#

I am making a stock market price prediction model. How do I give the timestamp feature to the model?

#

I have given these, not sure if they are of any use

#add day of  features
df['day_week']=df['Datetime'].dt.dayofweek
df['day_month']=df['Datetime'].dt.day
df['month']=df['Datetime'].dt.month

df['time']=df['Datetime'].map(lambda a:(pd.Timedelta(a-pd.to_datetime('2023-1-14 9:15:00+05:30'))/pd.Timedelta('5m'))%(24*12))
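
One common way to feed calendar features like these to a model is a cyclical (sine/cosine) encoding, so that the last slot of a cycle ends up next to the first instead of far away from it. The sketch below uses only the standard library; `cyclical` and `time_features` are hypothetical helper names, and the slot is counted from midnight here rather than from the 9:15 reference used above. Whether such features carry any signal for prices is a separate question.

```python
import math
from datetime import datetime

SLOTS_PER_DAY = 24 * 12  # number of 5-minute slots in a day

def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def time_features(ts):
    """Build cyclical time-of-day and day-of-week features for one timestamp."""
    slot = (ts.hour * 60 + ts.minute) // 5          # 5-minute slot since midnight
    slot_sin, slot_cos = cyclical(slot, SLOTS_PER_DAY)
    dow_sin, dow_cos = cyclical(ts.weekday(), 7)    # Monday = 0, like pandas dayofweek
    return {"slot_sin": slot_sin, "slot_cos": slot_cos,
            "dow_sin": dow_sin, "dow_cos": dow_cos}

features = time_features(datetime(2023, 1, 16, 9, 15))
print(features)
```

With a plain integer encoding, slot 287 and slot 0 look maximally different even though they are five minutes apart; on the circle they are neighbours.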

#

!code

spring field May 28, 2024, 3:34 PM

#

I don't think Lisan here is gonna like my suggestion ( pg_rofl ), but... RNNs are usually what you'd use for time series data such as this and you'd simply provide the data in a sequence
probably could use transformers for it as well since they have positional encoding, but idk
other than RNNs, there are a bunch of other time series thingies in ML like ARMA, ARIMA, SARIMA, SARIMAX and whatnot

dusty talon May 28, 2024, 3:43 PM

#

Is it better to make my own AI from scratch, or to use pre-built libraries? I'm still new to AI, and I'm not sure whether I should first learn how LLMs, RNNs etc. work and then program my own from scratch, or whether I should use a library like TensorFlow, torch or Hugging Face.
Everything is still a bit confusing for me. I do understand Python, so I guess that would be my starting point. I do suck at maths though.

I'm kinda looking for a starting point, and where to go from there.

stray tide May 28, 2024, 3:50 PM

#

dusty talon Is it better to make my own ai from scratch? or use pre built libraries already ...

if you are starting I would strongly suggest you to use a pre-built model to learn

#

Once you get the hang of it you might try coding it on your own, but I'm not sure it will ever be needed in a practical scenario. I might be wrong, that's just what I know

spring field May 28, 2024, 3:55 PM

#

but of course, it's wishful thinking at best, but if you believe there are patterns to find, might as well I guess, besides it's not a terrible learning experience anyway

spring field May 28, 2024, 3:57 PM

#

stray tide if you are starting I would strongly suggest you to use a pre-built model to lea...

I would not suggest that, I'd suggest doing it all from the beginning, implementing a simple Feed-Forward network, by hand, using say only numpy and matplotlib for dataviz, then ramp up the complexity, then move onto pre-trained models

#

reminds me of that time monkeys throwing darts beat the market... (I just found an article on Forbes and it's actually a bit more than that, the explanations and stuff, it's quite interesting)

#

one can hope waaaaaaaaaahhhhhh

frank dagger May 28, 2024, 4:16 PM

#

anyone familiar with assigning weights to datapoints in a dataset and then training these weights in parallel with the original dataset, in order to determine which datapoints are more reliable when sampling together a prediction?

hollow escarp May 28, 2024, 4:46 PM

#

hollow escarp https://universe.roboflow.com/roboflow-universe-projects/license-plate-recogniti...

Does anyone have an idea how I can speed up my model?

#

Or the recognition process, without using a GPU

narrow tiger May 28, 2024, 4:53 PM

#

any generative AI videos u guys recommend for adding custom knowledge to models / creating agents, to get used to the basics? thanks

past meteor May 28, 2024, 5:16 PM

#

spring field I don't think Lisan here is gonna like my suggestion (<:pg_rofl:8374364447703040...

This is the right answer

#

There's been a lot of research on this

river cape May 28, 2024, 5:21 PM

#

Hey guys, I have just finished these concepts in NLP: text preprocessing, one-hot encoding, bag of words, n-grams, TF-IDF, Word2Vec and POS tagging. What else should I learn to improve in NLP?

past meteor May 28, 2024, 5:23 PM

#

It is

#

Oh, I was talking about time series in general

#

Not stocks

hollow escarp May 28, 2024, 5:35 PM

#

Any ideas what additional image transformations I should apply to make this image easier for OCR to read?

#

That's the image before any preprocessing operations

#

Im using easyocr

#

For OCR actions

hollow escarp May 28, 2024, 5:37 PM

#

hollow escarp Any ideas what additional img transformations should i make to make this img eas...

But I'm wondering if I should make my images better quality, or whether there are any additional preprocessing techniques I could use

#

How to do that?

#

Thx, let me try it

narrow tiger May 28, 2024, 5:40 PM

#

can i run mistral on a 1660 Ti?

hollow escarp May 28, 2024, 5:40 PM

#

I think that's not that easy a task. I'm making OCR for license plates, so I would need a lot of time and resources to create a good model

narrow tiger May 28, 2024, 5:43 PM

#

have you tried creating this?

#

it maybe if you can make it see where liq is

#

and price is fractal

#

it might work to some extent

#

using smart money concepts

hollow escarp May 28, 2024, 5:44 PM

#

I will try it. I'm not really good at training on datasets, most of the time I used ones already created by someone else

narrow tiger May 28, 2024, 5:44 PM

#

cool can u suggest any good read on 'fractal time series'

hollow escarp May 28, 2024, 5:45 PM

#

But for sure I will try it right after the connectedComponents function

narrow tiger May 28, 2024, 5:46 PM

#

i never said u should use numbers for this

#

price is fractal
by this i mean price action repeats on different time frames
so if u can make a model that can learn different patterns on different time frames and can also see where most of the liq is

#

it might be possible to predict direction to some extent from higher to lower time frames

#

ngl i have no idea what that means
but as a trader when we say price is fractal it means
it behaves same way on different time frames

#

liq runs same happens on all time frames

#

tag buy side and then to sell side immediately

#

buy side liquidity == people whose buy orders got filled from being short (liquidated)

#

what do u mean by "it is not present in the numbers "

#

bro u don't have to believe me
but u should look up 'smart money concepts'
that's how people understand what market is about to do

#

and i been trading for 3 years now

#

haha people do say just by looking at raw price action you can predict if something at scale is gonna happen

#

going by that definition you can't really predict the market ever no matter what you do

#

but u can always use probability and risk management to make educated guess

#

nope u can't
the only way to 100% accurately predict is by knowing what everyone involved is gonna do with certainty

#

but try delta neutral strategies, you are good with math, u can come up with something like that i am sure

#

there is the theoretical part and there is what actually happens

#

news has little effect on markets

#

people who are sources of the news usually have already taken their sides of the bet so news is almost always priced in

#

well let me know what you come up with

#

try paper trading
i have built a lot of trading bots, most of them only make money under some specific conditions or just lose
thanks for the resources hopefully i'll learn something new

hollow escarp May 28, 2024, 6:21 PM

#

Okay, I think trying to get results that good with OCR doesn't make sense. @final kiln could you share some more details about training the UNET model?

#

What dataset did you use

#

Unfortunately I don't have "my data". I mean, I found a dataset with license plate images here https://universe.roboflow.com/roboflow-universe-projects/license-plate-recognition-rxg4e

Roboflow

License Plate Recognition Object Detection Dataset and Pre-Trained ...

10126 open source license-plates images plus a pre-trained License Plate Recognition model and API. Created by Roboflow Universe Projects

#

There is no such repo

#

By label, do you mean the location of the license plate or the actual text?

#

Nope

#

What do you mean by that?

#

Okay i could try to generate such data

#

Okay, assuming I have a set which includes every letter in 40 random license plates from the set above, how should I train on that data? As I told you, I'm really bad at training stuff

#

Oh okay

#

Okay, I'm grateful for your advice, I will definitely try to train such a model

#

https://keras.io/api/data_loading/ so I should just use this code with my prepared data from the license plate dataset?

Keras documentation: Data loading

#

Okay thxx

fallow coyote May 28, 2024, 6:59 PM

#

I'm currently doing a guided project (stock price predictor). I'm commenting throughout my code so I can use it as a guide when I go on to make a solo ML project. Is what I'm doing useful for improving my ML programming skills? What can I do to improve them further?

dusty talon May 28, 2024, 7:02 PM

#

stray tide if you are starting I would strongly suggest you to use a pre-built model to lea...

okay tysm

hollow escarp May 28, 2024, 7:10 PM

#

I'm wondering about one thing: the images provided in the dataset are "too good". What I mean is that my camera doesn't record at the high quality of the images in the dataset

#

Also, I know that my camera setup could be better, so I'm also thinking about first trying a different angle

river cape May 28, 2024, 7:13 PM

#

Hi guys is advanced NLP known as Gen AI?

hollow escarp May 28, 2024, 7:14 PM

#

Okay

river cape May 28, 2024, 7:15 PM

#

No idea, I just finished text preprocessing, one-hot encoding, bag of words, n-grams, TF-IDF, Word2Vec and POS tagging. So I was wondering what to do next

#

I am sorry but I dont have any idea about what should I learn next

serene scaffold May 28, 2024, 7:17 PM

#

river cape Hi guys is advanced NLP known as Gen AI?

No. There's no widely recognized gradations of NLP. But even if there were, none of them would be "generative AI", since you can have generative AIs that have nothing to do with language.

#

Technologies like ChatGPT are interactive, generative language models. And interactive generative language models are in vogue at the moment. But they aren't the only application of language models. Or the only AI technologies that are both interactive and generative.

river cape May 28, 2024, 7:22 PM

#

serene scaffold No. There's no widely recognized gradations of NLP. But even if there were, none...

Okay I got that point

#

I am sorryyy🥲

river cape May 28, 2024, 7:23 PM

#

serene scaffold Technologies like ChatGPT are interactive, generative language models. And inter...

What should I focus on next?

serene scaffold May 28, 2024, 7:24 PM

#

river cape No idea I just finished ext preprocessing , One hot Encoding , Bag Of Words, N -...

Try making a model that classifies documents, which are emails, as SPAM or NOT SPAM

river cape May 28, 2024, 7:24 PM

#

serene scaffold Try making a model that classifies documents, which are emails, as SPAM or NOT S...

I did that

serene scaffold May 28, 2024, 7:24 PM

#

river cape I did that

how did you do it?

river cape May 28, 2024, 7:25 PM

#

serene scaffold how did you do it?

I have used naive bayes classifier

serene scaffold May 28, 2024, 7:26 PM

#

this isn't a serious suggestion.

serene scaffold May 28, 2024, 7:26 PM

#

river cape I have used naive bayes classifier

how did you represent each email?

river cape May 28, 2024, 7:27 PM

#

serene scaffold how did you represent each email?

In terms of vectors? I used the word2vec to give me the vectors of each word in the email and then as whole for the email

serene scaffold May 28, 2024, 7:28 PM

#

river cape In terms of vectors? I used the word2vec to give me the vectors of each word in ...

what was the dimensionality of the vectors?

river cape May 28, 2024, 7:29 PM

#

serene scaffold what was the dimensionality of the vectors?

I have kept the dimensionality of those vectors as 100

serene scaffold May 28, 2024, 7:29 PM

#

river cape I have kept the dimensionality of those vectors as 100

okay, try making a feed-forward neural network that takes those same vectors and outputs a true/false prediction
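
A rough sketch of that suggestion in plain Python: average the word vectors of an email into one 100-dimensional vector, then pass it through a small feed-forward network ending in a sigmoid. Everything here is a stand-in: the random vectors replace real word2vec output, the hidden size of 16 is an arbitrary choice, and the weights are untrained, so the printed probability is meaningless until the network is trained (for example with gradient descent on a binary cross-entropy loss).

```python
import math
import random

DIM = 100  # embedding size mentioned above

def mean_pool(word_vectors):
    """Average word vectors into one fixed-size document vector."""
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count
            for i in range(len(word_vectors[0]))]

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_spam_probability(doc_vector, params):
    """Forward pass: 100 inputs -> 16 hidden units (ReLU) -> 1 output (sigmoid)."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(doc_vector, w1, b1))
    return sigmoid(dense(hidden, w2, b2)[0])

rng = random.Random(0)
params = [init_layer(DIM, 16, rng), init_layer(16, 1, rng)]
# Stand-in for word2vec output: five random "word vectors" for one email.
email_words = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]
doc_vector = mean_pool(email_words)
probability = predict_spam_probability(doc_vector, params)
print(f"P(spam) = {probability:.3f}")
```

Thresholding the output at 0.5 gives the true/false prediction; in practice a library such as Keras or PyTorch would handle the layers and the training loop.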

river cape May 28, 2024, 7:30 PM

#

serene scaffold okay, try making a feed-forward neural network that takes those same vectors and...

You mean instead of using an ML algorithm, I should make a DL model that outputs true or false?

serene scaffold May 28, 2024, 7:31 PM

#

river cape You mean instead using an ML algorithm , I should make a DL model with outputs a...

deep learning is a subset of machine learning (and don't listen to anyone who tells you otherwise)

river cape May 28, 2024, 7:31 PM

#

serene scaffold deep learning is a subset of machine learning (and don't listen to anyone who te...

Sure bud , you are my coach ! Okay I shall make that now and let you know

narrow tiger May 28, 2024, 7:59 PM

#

how does a local running ollama model hold soo much info?

#

it is only like 4gb

#

by having info i mean
it is answering all the questions?
my understanding is LLMs just copy paste knowledge that they have seen before (they aren't able to think)
so how can a 4gb model answer all those questions

wooden sail May 28, 2024, 8:09 PM

#

narrow tiger by having info i mean it is answering all the questions? my understanding is LLM...

this isn't quite right

#

it's better to think of it as building a function that assigns tokens a probability of occurring together

#

no copy pasting is going on

narrow tiger May 28, 2024, 8:10 PM

#

mb didn't mean literally copypasting

wooden sail May 28, 2024, 8:10 PM

#

it's more like building a function that you feed some text into, and it tells you what it thinks comes next by assigning a probability to all the tokens it knows

#

and as for how much predictive power, let's do a quick calculation

narrow tiger May 28, 2024, 8:11 PM

#

wooden sail it's better to think of it as building a function that assigns tokens a probabil...

wait just like any other machine learning algo
where they fit some function and do gradient descent to reduce the cost function

spring field May 28, 2024, 8:11 PM

#

yep

wooden sail May 28, 2024, 8:11 PM

#

with the distinction that the prediction is not deterministic

narrow tiger May 28, 2024, 8:11 PM

#

ohh ok ok now i see how hallucination might occur

wooden sail May 28, 2024, 8:12 PM

#

not all models have a random sampling effect

narrow tiger May 28, 2024, 8:12 PM

#

this is magic 😂

spring field May 28, 2024, 8:12 PM

#

wooden sail with the distinction that the prediction is not deterministic

do they actually do a random selection with the predicted weights or do they pick the max? or is max usually used for classification?

wooden sail May 28, 2024, 8:13 PM

#

at any rate: 4GB. these models often use 32 bit floats. we can fit 4*10^9/32 floats in that much memory

wooden sail May 28, 2024, 8:13 PM

#

spring field do they actually do a random selection with the predicted weights or do they pic...

it does use a random selection among the top scores

#

.wa s 4*10^9/32

strange elbowBOT May 28, 2024, 8:13 PM

#

Wolfram Alpha

125000000

wooden sail May 28, 2024, 8:13 PM

#

that's the number of parameters in the model
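
Redoing this estimate with the units kept explicit: file size is measured in bytes and a 32-bit float takes 4 bytes, so 4 GB holds about 4*10^9 / 4 = 10^9 such values, not 1.25*10^8 (that figure comes from dividing bytes by bits). The sketch below also shows 16-bit and 4-bit weights, on the assumption (not stated in the chat) that locally run models are often stored in reduced precision. `parameter_count` is a hypothetical helper and ignores file metadata.

```python
def parameter_count(file_size_bytes, bits_per_parameter):
    """Rough number of parameters that fit in a file of the given size."""
    return file_size_bytes * 8 // bits_per_parameter

size = 4 * 10**9  # 4 GB, taken as 4e9 bytes
for bits in (32, 16, 4):
    billions = parameter_count(size, bits) / 1e9
    print(f"{bits:>2}-bit weights: ~{billions:.0f} billion parameters")
```

Either way the point stands: a few gigabytes is enough for a function with billions of adjustable parameters.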

spring field May 28, 2024, 8:13 PM

#

mmm, ig that makes sense, otherwise it would indeed be rather boring if it were deterministic

wooden sail May 28, 2024, 8:14 PM

#

spring field mmm, ig that makes sense, otherwise it would indeed be rather boring if it were ...

right, it makes it sound less natural

#

asking exactly the same question on different sessions should net you slightly different responses
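
A small sketch of the difference between picking the max and sampling among the top scores. The scores the network produces for a given input are deterministic; the randomness comes from this last selection step. The logits are made up, and `sample_next_token` is a hypothetical helper showing one common scheme (top-k with a temperature), not the exact procedure of any particular model.

```python
import math
import random

def greedy_next_token(logits):
    """Deterministic choice: always take the highest-scoring token."""
    return max(logits, key=logits.get)

def sample_next_token(logits, k=3, temperature=1.0, rng=random):
    """Keep the k best tokens, softmax their scaled scores, and sample one."""
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    scaled = [score / temperature for _, score in top]
    peak = max(scaled)                               # subtract max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]   # unnormalized softmax weights
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up scores for the token after "The sun is".
logits = {" hot": 4.0, " bright": 3.5, " big": 3.0, " purple": -2.0}
rng = random.Random(42)
samples = [sample_next_token(logits, k=3, rng=rng) for _ in range(200)]
print(greedy_next_token(logits), {token: samples.count(token) for token in set(samples)})
```

Lowering the temperature towards zero makes sampling behave more and more like the greedy choice, which is why the same question can give identical or varying answers depending on the settings.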

narrow tiger May 28, 2024, 8:16 PM

#

it kinda makes sense
i have only done very basic neural networks and the statistics did work
but to think that it can work for something like natural language, to "guess" what comes next and still be able to make a sentence which makes sense, is unbelievable

wooden sail May 28, 2024, 8:16 PM

#

whether it learns how to assign probabilities correctly depends entirely on the training data though

#

there are lots of cool experiments tricking language models into spitting out tokens in ways that make no sense

#

you can get stuff that doesn't make up words at all

narrow tiger May 28, 2024, 8:18 PM

#

is there a basic grammar defined, like it has to map tokens to real words?

wooden sail May 28, 2024, 8:18 PM

#

no

#

that's all learned

narrow tiger May 28, 2024, 8:18 PM

#

so how does it know what "Apoptosis" is

wooden sail May 28, 2024, 8:18 PM

#

it doesn't

#

but in the training data, the tokens making up that word only appear in special combinations, so seeing that in a sentence in the correct context (with other tokens in the correct order) tells it to assign the upcoming tokens a high probability

narrow tiger May 28, 2024, 8:20 PM

#

is there like a very basic LLM you can train yourself

#

to better understand the process

wooden sail May 28, 2024, 8:21 PM

#

there should be, try scrolling up. i recall someone training a model of their own in the past few days

#

maybe matiiss, actually

#

i also remember that person getting random tokens sometimes

spring field May 28, 2024, 8:21 PM

#

I mean, it makes sense

narrow tiger May 28, 2024, 8:21 PM

#

like for ml i did an example for
[1,0,0,1] so had to fit the line by first 3 elements and train to predict the 4th

spring field May 28, 2024, 8:21 PM

#

wooden sail i also remember that person getting random tokens sometimes

that wasn't me then 😁

narrow tiger May 28, 2024, 8:22 PM

#

wooden sail there should be, try scrolling up. i recall someone training a model of their ow...

thanks i will

wooden sail May 28, 2024, 8:23 PM

#

i think i was thinking of this

#

maybe not, idk

spring field May 28, 2024, 8:24 PM

#

that was an RNN

wooden sail May 28, 2024, 8:24 PM

#

ah ic, i wasn't paying attention

spring field May 28, 2024, 8:24 PM

#

I mean, I did attempt next token with GPTs as well, I just forgot to actually do the rollouts 😁

#

and also I implemented it like a translator...

wooden sail May 28, 2024, 8:26 PM

#

that does cut the parameters from N^2 to 2N - 1

#

N if you use a symmetric kernel

#

we used a similar trick in a recent paper cuz a model required a few petabytes of memory otherwise

#

you give something up though. CNNs enforce shift invariance, which is something not all sentences have

#

could be interesting. there will definitely be a tradeoff between memory gains and accuracy as you play with the window size
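
A sketch of the parameter counting above, assuming the comparison is between an unconstrained N x N weight matrix and a convolution-like (Toeplitz) one in which entry (i, j) depends only on the offset i - j. The messages that prompted this are not in the log, so that reading is an assumption, and the helper names are made up for the example.

```python
def toeplitz(kernel, n):
    """n x n matrix whose entry (i, j) depends only on the offset i - j."""
    if len(kernel) != 2 * n - 1:
        raise ValueError("kernel needs one value per offset from -(n-1) to n-1")
    return [[kernel[i - j + n - 1] for j in range(n)] for i in range(n)]

def symmetric_kernel(half):
    """Expand N values (offsets 0..N-1) so that offsets +d and -d share a weight."""
    n = len(half)
    return [half[abs(d)] for d in range(-(n - 1), n)]

n = 4
dense_parameter_count = n * n                                        # 16 free values
general = toeplitz([float(v) for v in range(2 * n - 1)], n)          # 2N - 1 = 7 free values
symmetric = toeplitz(symmetric_kernel([1.0, 0.5, 0.25, 0.125]), n)   # N = 4 free values

for row in general:
    print(row)
```

Every diagonal of `general` is constant, which is the shift invariance being traded for the memory savings.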

hollow escarp May 28, 2024, 8:57 PM

#

hollow escarp https://universe.roboflow.com/roboflow-universe-projects/license-plate-recogniti...

@final kiln hey, sorry for the ping, but this question has been sitting here for some time now and I see you've got a lot of knowledge about AI. I'm wondering how I could speed up my object detection model? (using CPU, can't use GPU)

#

the pinned message has some info about the model

hollow escarp May 28, 2024, 8:58 PM

#

hollow escarp https://www.easypaste.org/file/xqYJMfw0/license.plate.detector.onnx?lang=en

I also sent the model itself

#

Im using onnxruntime

#

They told me here that using onnxruntime is a really good solution for deploying object recognition in Docker

#

I had a problem with putting my YOLO object detection into Docker

#

and people here told me to use onnxruntime for it

#

Oh, I remember what the problem was

#

The size

#

@final kiln

#

I need to deploy it to field devices, so it cant be that big

#

Thats the size of my docker now

#

And i know i can cut like 100MB more from it's compressed size

#

Nope, wait i will send you my docker

#

It's a bit of a mess because I was trying some different stuff with multi-platform images, but here you go:

```dockerfile
FROM --platform=$TARGETPLATFORM python:3.11.9-slim-bullseye

ARG TARGETPLATFORM

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libgl1 \
        libglib2.0-0 \
        gcc \
        libhdf5-dev \
        cmake \
        g++ \
        libffi-dev \
        pkg-config && \
        if [ "$TARGETPLATFORM" = "linux/arm/v7" ]; then \
            apt-get install -y --no-install-recommends ninja-build; \
        fi

RUN apt-get install -y --no-install-recommends libssl-dev 

WORKDIR /app

RUN pip install --upgrade pip setuptools wheel && \
    pip install --no-binary=h5py h5py --no-cache-dir

COPY requirements.txt .
# COPY /wheels .

# RUN if [ "$TARGETPLATFORM" = "linux/arm/v7" ]; then \
#     pip install onnxruntime-1.16.0-cp311-cp311-linux_armv7l.whl; \
#     apt-get install python3-scipy -y --no-install-recommends; \
# fi

RUN pip install --no-cache-dir -r requirements.txt --find-links https://download.pytorch.org/whl/cpu 

COPY . .

RUN echo "Building for $TARGETPLATFORM"

CMD ["python", "main.py"]```

#

What do you mean "no you did"?

#

You can cut a bit by using multi-stage builds

#

But it wont be much

#

No, I followed the installation for CPU only

#

Oh you mean in old img

hollow escarp May 28, 2024, 9:20 PM

#

hollow escarp Thats the size of my docker now

I thought you were talking about that one

#

Ye so i installed it with CUDA there

#

But even without CUDA the image was like 5GB, or like 4.7GB

#

I've heard about model quantization but never looked into the subject

#

I tried a bunch of ways but none of them gave me any improvement

#

Okay, so i should grab some controllers with GPU or start being AWS slave XD

#

Ye i know, im just joking

#

I will read through those options really carefully. I really appreciate all your support

#

Ye i know that

fallow coyote May 28, 2024, 10:25 PM

#

I swear it always feels like im going back to square one with my ML journey. i see you all with somewhat clear aims to be in ML/AI/DS judging by the level of knowledge you all have, whereas I'm just a uni dropout doing this shit because it seems interesting, not knowing how much fucking work it requires with no aim in what to use it in

fallow coyote May 28, 2024, 10:42 PM

#

I dropped out of 1st year. Got sick of the studying. Applying to apprenticeships (engineering and it). Idk wtf i want to do

#

Still waiting for confirmation if im successful or not. It just feels shit. When i have no aim or guide, i cant focus in doing something. Like with the ML/AI. I cant put the hours needed to be good at it if i dont know what im going to do in the future with it

#

Well its what i plan to do. Do an apprenticeship and then use that to get into a degree apprenticeship scheme. I just want to do something to pass the time and have it be beneficial for me in the future

#

I swear bro, it's like I'm constantly going back to square one every few days

fallow coyote May 28, 2024, 11:13 PM

#

It used to be my dream but when you spend so many years studying and then for it to not go anywhere and seeing your dreams collapse in front of you with barely anyone helping you out, it destroys and changes how you think about things

serene scaffold May 29, 2024, 12:01 AM

#

fallow coyote Well its what i plan to do. Do an apprenticeship and then use that to get into a...

Are apprenticeships a common thing where you live? I've never heard of anyone in the US getting into AI from an apprenticeship.

fallow coyote May 29, 2024, 12:20 AM

#

serene scaffold Are apprenticeships a common thing where you live? I've never heard of anyone in...

Im from the UK. Ton of apprenticeship schemes over here

#

I applied to an IT apprenticeship with BT and an engineering apprenticeship with another company. If i do well, ill do a degree apprenticeship where i can get a degree through my apprenticeship (no need to pay for anything)

latent girder May 29, 2024, 5:28 AM

#

is there any good interactive free data science tutorial/course? Most yt videos are boring. Full of non-stop talking

#

oh nvm I mean just python, not the whole data science stuffs.

#

sry about that

fallow coyote May 29, 2024, 7:14 AM

#

Ill see how it goes. Hopefully i get something back from them

peak ridge May 29, 2024, 8:11 AM

#

@final kiln

#

how was the job interview?

#

on paper im millionaire

#

on paper valuation , my share price

#

you'll get the job

#

nice.

#

do u have LangChain knowledge tho?

#

as someone who hires,
I would say it's not about just interview
Some ppl interview to hire "actually" and some to reject, it's just matter of requirement and urgency, scarcity

#

for backup

#

they have options, but to be on the safer side they say "i'll get back to u"
sometimes even after 6 months, when the other hire fucks up, they'll get back to u

#

that's sad.bad but reality.

#

oh, what was different?

Maybe i will try to make our company like this

#

what was so different in this Interview Vs Others @final kiln

#

can u give me a bit detailed insight, maybe i could apply it too

#

notice everything and please do lemme know with insights

#

so she took personalized interest in ur work and gave u more ideas and stuff

#

very very good, i do this but i'll take this more into consideration

#

crazy

#

where are u from ?

#

which country

#

oh

#

why?

#

why didnt u like the HR

#

(im trying to scrape everything that a developer like u thinks of a interviewer/a company, so i can maybe identify something wrong or apply something seeming to be useful)

#

ok, what kind of personality ppl
"developers" like you wanna work with

azure igloo May 29, 2024, 8:26 AM

#

hi! sorry for interrupting, quick question. I was wondering how I could figure out how much memory capacity/processing power would be ideal for my AI model? i'm using a python wrapper called faster whisper to live transcribe voice into text and running it on my normal CPU isn't cutting it at all

peak ridge May 29, 2024, 8:30 AM

#

okay, i get it.

azure igloo May 29, 2024, 8:31 AM

#

okay

#

i'm eventually planning to move it to a raspberry pi

#

how would that work?

#

oh hm

#

me and my friends were planning to make a robotic dog with speech recognition and a camera

#

is a raspberry pi not ideal for that?

#

so essentially use an online/cloud API

#

gotcha ty

covert aspen May 29, 2024, 9:36 AM

#

Let's say I have a dataframe that consists of 1000 rows. 700 of those rows belong to class A and the rest 300 belong to class B. Is there a function in Pandas that takes in an imbalanced dataframe and randomly removes rows from the class with excess rows such that the final dataframe contains equal number of each class?

untold bloom May 29, 2024, 10:27 AM

#

there is no function to specifically do that

#

but you can identify the major class, get the number of samples to be dropped from it, sample that amount of bad indexes, and drop them from the frame

#

a way

jaunty helm May 29, 2024, 10:40 AM

#

covert aspen Let's say I have a dataframe that consists of 1000 rows. 700 of those rows belon...

built-in, no. if you can't be bothered, `imblearn` has samplers. if you just want to use pandas, do what Nahita suggested
like `.groupby` the class, `.apply` a `.sample(n)` with n equal to the number of elements in the minority class
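The groupby-and-sample idea above can be sketched roughly like this. It is a minimal sketch under assumptions: the column name `label` and the toy 700/300 frame are made up to mirror the question, and it uses `DataFrameGroupBy.sample` (pandas 1.1 or newer) as a shortcut for `.apply` with `.sample(n)`.

```python
import pandas as pd

# Toy imbalanced frame: 700 rows of class A, 300 rows of class B.
# The column names "feature" and "label" are assumptions for this sketch.
df = pd.DataFrame({
    "feature": range(1000),
    "label": ["A"] * 700 + ["B"] * 300,
})

# Size of the smallest class; every class is cut down to this many rows.
n_minority = df["label"].value_counts().min()

# Randomly keep n_minority rows from each class (random undersampling).
balanced = df.groupby("label").sample(n=n_minority, random_state=0)

print(balanced["label"].value_counts())
```

Dropping `random_state` gives a different random subset on each run. If pulling in another dependency is fine, `RandomUnderSampler` from `imblearn.under_sampling` covers the same use case.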

covert aspen May 29, 2024, 10:41 AM

#

jaunty helm built-in, no. if you can't be bothered, `imblearn` has samplers. if you just wan...

Alrighty, I'll do it. Not that I'm too lazy to do it, I just thought it'd be great if there was an in-built way of doing it. However, it's a trivial task.

covert aspen May 29, 2024, 10:42 AM

#

untold bloom but you can identify the major class, get the number of samples to be dropped fr...

Oh, alright. 👍🏻

jaunty helm May 29, 2024, 10:43 AM

#

or actually, if you don't need any randomness you can just .groupby('class').head(100) to get the first 100 of each class (and .tail for the last 100)

sick sonnet May 29, 2024, 12:55 PM

#

what is the absolute fastest embedding model with acceptable performance (must be open source) right now?

agile cobalt May 29, 2024, 12:56 PM

#

the "acceptable performance" will vary greatly based on which task you want to use it for, and the model performance may vary a lot depending on what your data is like

traditional search engine? RAG? Which type of data you're working with?
(QA, text or pdf documents, how structured is it, which language(s))

#

I'm guessing only text, but the way you framed your question it's also ambiguous whether you want a model that only accepts text inputs, only images, or is multi-modal

agile cobalt May 29, 2024, 1:03 PM

#

sick sonnet what is the absolute fastest embedding model with acceptable performance (must b...

see the above questions

but seems like https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 might be a good default choice

sick sonnet May 29, 2024, 1:04 PM

#

how does it compare to something more primitive such as TfidfVectorizer?

sick sonnet May 29, 2024, 1:05 PM

#

agile cobalt the "acceptable performance" will vary greatly based on which task you want to u...

it's a rag with a mix of structured and unstructured textual data

agile cobalt May 29, 2024, 1:07 PM

#

sick sonnet how does it compare to something more primitive such as TfidfVectorizer?

you have to fit TfidfVectorizer on your data, and it lacks much of the semantic meaning which embedding models try to preserve

#

I'm also not sure what its output shape would end up as for a large dataset?
most embedding models create a vector with a few hundred to a few thousand dimensions
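On the output-shape question: a TF-IDF vector has one entry per vocabulary term, so its width grows with the corpus, while an embedding model such as all-MiniLM-L6-v2 always returns a fixed-size dense vector (384 dimensions for that model). The sketch below is plain Python, not scikit-learn's implementation; it uses an unsmoothed `log(N / df)` weight, whereas `TfidfVectorizer` applies a smoothed variant and normalisation by default. The three toy documents are made up.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [doc.split() for doc in docs]

# The vocabulary is fitted on the data: one dimension per distinct term.
vocab = sorted({token for doc in tokenized for token in doc})

# Document frequency: in how many documents each term appears.
doc_freq = Counter(token for doc in tokenized for token in set(doc))


def tfidf_vector(tokens):
    """Return a dense TF-IDF vector with one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] * math.log(len(docs) / doc_freq[term]) for term in vocab]


vectors = [tfidf_vector(doc) for doc in tokenized]

print("vector width == vocabulary size:", len(vectors[0]), len(vocab))
print("non-zero entries in first vector:", sum(1 for v in vectors[0] if v != 0))
```

With a real corpus the vocabulary can reach tens or hundreds of thousands of terms, which is why scikit-learn returns a sparse matrix of shape `(n_documents, n_terms)`.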

narrow tiger May 29, 2024, 2:05 PM

#

i am running mistral on localhost and it is taking a few seconds before responding

agile cobalt May 29, 2024, 2:06 PM

#

unless you're running it on a GPU, that is to be expected

narrow tiger May 29, 2024, 2:06 PM

#

also ollama services is running on my system as soon as my system starts
whenever i call the mistral endPoint does it start mistral each time? bcz i don't see it using too much resources

#

i don't see ollama/mistral using my gpu

#

unless those --variation seed versions are ollama??

agile cobalt May 29, 2024, 2:08 PM

#

it should take way more memory than that

narrow tiger May 29, 2024, 2:09 PM

#

thanks i'll try to run it on gpu

#

it isn't even using much cpu lol

agile cobalt May 29, 2024, 2:10 PM

#

if your GPU does not support it you could try a smaller model like phi3, but I wouldn't expect amazing performance or quality from anything you can run without a GPU

narrow tiger May 29, 2024, 2:12 PM

#

mistral 7B is running on my CPU i5 5-600k? shouldn't it also be able to run on a 1660Ti with 6gb vram i think

lapis sequoia May 29, 2024, 2:31 PM

#

do you ever use training data on a grid search?

spring field May 29, 2024, 3:43 PM

#

bit pytorch specific but what's the difference (if any) between

```py
model.forward(...)
model.forward(...)
...
optimizer.step()
```

and

```py
with torch.no_grad():
    model.forward(...)
model.forward(...)
...
optimizer.step()
```

#

that's what I thought, I just don't understand it 😄
like, aren't gradients calculated and added during backprop?

#

oh the gradients of only that layer?

#

for context I was doing A2C RL and the former worked while the latter did whatever it did, lol
also speaking of RL it seemed to me that using too big of a hidden size overfit the score function without actually increasing the total reward

past meteor May 29, 2024, 3:52 PM

#

spring field bit pytorch specific but what's the difference (if any) between ```py model.forw...

I recommend going into a debugger and looking at the gradients of your weights in #1 and #2 to compare
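The comparison suggested above can also be done in a few lines outside the debugger. This is only a sketch under assumptions: the `nn.Linear` layer and the product-of-two-outputs loss are stand-ins, not the A2C setup from the question. It shows that an output computed under `torch.no_grad()` is treated as a constant, so no gradient flows back through that forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in for the real network
x = torch.randn(8, 4)

# Case 1: both forward passes are tracked by autograd,
# so the gradient flows through both factors of the product.
a = model(x)
b = model(x)
grad_both = torch.autograd.grad((a * b).sum(), model.weight)[0]

# Case 2: the first forward pass runs under no_grad,
# so its output has no graph and acts as a plain constant.
with torch.no_grad():
    a_const = model(x)
b = model(x)
grad_mixed = torch.autograd.grad((a_const * b).sum(), model.weight)[0]

print("tracked output requires grad:", a.requires_grad)
print("no_grad output requires grad:", a_const.requires_grad)
print("grad_both is twice grad_mixed:",
      torch.allclose(grad_both, 2 * grad_mixed, atol=1e-5))
```

In this toy loss, half of the gradient signal disappears once one pass is wrapped in `no_grad`. In an actor-critic loss the effect depends on which term uses the untracked output, so the weights can end up with quite different updates.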

spring field May 29, 2024, 3:56 PM

#

I went into the debugger a couple times while trying to figure out why it was doing whatever it did and that never crossed my mind 😁 will do joe_salute

young ledge May 29, 2024, 3:57 PM

#

anyone knows how to make jupyter vs code extension store outputs and variables when i close vs code? kinda annoying having to retrain the model every time i wanna do smt with it after closing vs code
i could save and load the model into a file but jupyter saving outputs wouldve been more convenient
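For the save-and-load route mentioned above, a minimal sketch with the standard library looks like this. The `params` dict is a made-up stand-in for whatever the training run produced, and the file name is arbitrary. Notebook outputs are stored in the `.ipynb` file when it is saved, but kernel variables live in the kernel process and are gone once it shuts down, so writing the trained parameters to disk is the usual workaround.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for learned parameters; a real model would have its own state.
params = {"weights": [0.25, -1.5, 3.0], "bias": 0.1}

# Arbitrary location for this sketch; any writable path works.
path = Path(tempfile.gettempdir()) / "trained_params.pkl"

# Save once, right after training finishes.
with path.open("wb") as f:
    pickle.dump(params, f)

# In a later session: load instead of retraining.
with path.open("rb") as f:
    restored = pickle.load(f)

print(restored == params)
```

Most frameworks ship their own version of this, for example `torch.save(model.state_dict(), path)` in PyTorch. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.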

hollow escarp May 29, 2024, 8:40 PM

#

Hi, does anyone know how i could threshold that img so the letters get really high contrast?

hollow escarp May 29, 2024, 9:49 PM

#

I tried a bunch of techniques but even with this img im still having trouble with that OCR

#

maybe thats because im using easyocr for that task

#

Idk

#

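```py
import cv2
import numpy as np

def preprocess_license_plate_image(img):
    # Upscale 2x so small characters cover more pixels
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Work on a single grayscale channel
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Smooth out sensor noise before sharpening
    img = cv2.GaussianBlur(img, (5, 5), 0)

    # 3x3 sharpening kernel to emphasise character edges
    kernel_sharpening = np.array([[0, -1, 0],
                                  [-1, 5, -1],
                                  [0, -1, 0]])
    img = cv2.filter2D(img, -1, kernel_sharpening)
    # Binarise with a local (adaptive) threshold: block size 11, constant 2
    img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)

    return img
```
Code used for preprocessing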

#

I even achieved that preprocessed img

#

But im still having trouble reading it correctly

toxic mortar May 29, 2024, 10:06 PM

#

Why dont you fine-tune some neural net?

#

Also try tesseract-ocr

#

I had a very pleasant exp w it

hollow escarp May 29, 2024, 10:10 PM

#

Ye i know that, but im still concerned about your method because EasyOCR is already using an AI model for that

#

https://github.com/JaidedAI/EasyOCR

GitHub

GitHub - JaidedAI/EasyOCR: Ready-to-use OCR with 80+ supported lang...

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc. - JaidedAI/EasyOCR

uncut marten May 29, 2024, 10:35 PM

#

does anyone know how to create a chatbot?

spring field May 29, 2024, 10:47 PM

#

hollow escarp I even achieved that preprocessed img

btw, I can't tell if it's an S or a 5

#

really looking forward to that 4th point lolDog

hollow escarp May 29, 2024, 10:52 PM

#

In a good environment yes

#

But in field not really

#

i mean i still have some options for different camera setups

spring field May 29, 2024, 10:52 PM

#

anyway, currently I'm doing GAN stuff

hollow escarp May 29, 2024, 10:53 PM

#

spring field btw, I can't tell if it's an S or a 5

Thats actually how the license plate letters are printed

#

Thats S

#

Thats like base img which i preprocess

spring field May 29, 2024, 10:54 PM

#

hollow escarp Thats S

mmm, had my suspicions on that, but like... a model understanding that? meh, maybe

hollow escarp May 29, 2024, 10:54 PM

#

What do you mean by that?

spring field May 29, 2024, 10:55 PM

#

well, a bit in reverse, but yeah

hollow escarp May 29, 2024, 10:55 PM

#

yes but also i know that i didnt setup the camera in best possible way

spring field May 29, 2024, 10:56 PM

#

hollow escarp Thats like base img which i preprocess

this is gonna be a bit sidestepped, but have you considered using a better quality camera?

hollow escarp May 29, 2024, 10:56 PM

#

spring field this is gonna be a bit sidestepped, but have you considered using a better quali...

Currently not XD

#

I mean i still have a few not tested configurations from which i will definitely start

modest steeple May 30, 2024, 12:02 AM

#

hi i have this picture and this is my code:
```py
from PIL import Image
from pytesseract import pytesseract
import enum
import pyautogui
import os

class OS(enum.Enum):
    Windows = 1

class Language(enum.Enum):
    ENG = 'eng'
    ARB = 'ara'  # Tesseract's language code for Arabic is 'ara'

class ImageReader:

    def __init__(self, os: OS):
        if os == OS.Windows:
            windows_path = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
            pytesseract.tesseract_cmd = windows_path
            #print("Running on: Windows\n")

    def extract_text(self, image_path: str, lang: Language) -> str:
        img = Image.open(image_path)
        extracted_text = pytesseract.image_to_string(img, lang=lang.value).strip()  # Strip the text to remove newlines and spaces
        return extracted_text

if __name__ == '__main__':
    save_path = r'C:\Users\falcon\Desktop\my_lexis_plus_bot\images1'
    file_name = "big_question.png"
    save_path1 = os.path.join(save_path, file_name)
    left = 1250
    top = 400
    width = 600
    height = 42
    # Capture the screen region that contains the question
    screenshot = pyautogui.screenshot(region=(left, top, width, height))
    screenshot.save(save_path1)
    screenshot.show()
    ir = ImageReader(OS.Windows)
    # Read back the file that was just saved, independent of the working directory
    bigq = ir.extract_text(save_path1, lang=Language.ENG)
    print(bigq)
```

is there any way to make my code detect the underscores and include them in the output?
because the output is only the sentence without the underscores, like this: Nadia is much more than her sister.

narrow tiger May 30, 2024, 1:27 AM

#

use an img reader that supports _

#

how come LLMs are not deterministic if they have predefined weights and they predict what word should come next shouldn't they be deterministic?

serene scaffold May 30, 2024, 1:32 AM

#

narrow tiger how come LLMs are not deterministic if they have predefined weights and they pre...

they might be if you turn the temperature to 0

#

there's this: https://152334h.github.io/blog/non-determinism-in-gpt-4/

152334H

Non-determinism in GPT-4 is caused by Sparse MoE

It’s well-known at this point that GPT-4/GPT-3.5-turbo is non-deterministic, even at temperature=0.0. This is an odd behavior if you’re used to dense decoder-only models, where temp=0 should imply greedy sampling which should imply full determinism, because the logits for the next token should be a pure function of the input sequence & the model...
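The temperature point can be illustrated with a toy sampler. This is a sketch, not how any particular LLM runtime implements decoding: the three logits are invented, and treating temperature 0 as a plain argmax is the usual convention since dividing by zero is undefined. With fixed weights the logits are a deterministic function of the input; the randomness comes from the sampling step.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits, then apply a numerically stable softmax.
    scaled = [logit / temperature for logit in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy = [sample_next_token(logits, 0, rng) for _ in range(50)]
sampled = [sample_next_token(logits, 1.0, rng) for _ in range(200)]

print("distinct tokens at temperature 0:", set(greedy))
print("distinct tokens at temperature 1:", set(sampled))
```

As the linked post argues, hosted models can still vary at temperature 0 for reasons outside this sketch, such as batching effects in sparse MoE routing or floating-point non-associativity on GPUs.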

nova matrix May 30, 2024, 2:23 AM

#

I am building a predictive model, consider it to be based on stocks etc. There are columns for search trends which show trends rated from 1 to 100. Could be a very useful variable for the model but most of the data for those search trend columns is missing. Any idea on how I could leverage whatever is given?

winged hornet May 30, 2024, 6:45 AM

#

Is anyone of you experienced in opencv in python

hasty grail May 30, 2024, 7:42 AM

#

winged hornet **Is anyone of you experienced in opencv in python**

Please just ask the question directly. (Don't ask to ask: https://nohello.net/en/)

winged hornet May 30, 2024, 9:22 AM

#

hasty grail Please just ask the question directly. (Don't ask to ask: <https://nohello.net/e...

I need some software developers (good in opencv) to work in my startup

hasty grail May 30, 2024, 9:26 AM

#

This isn't really the place to advertise job openings

#

!rule 9

arctic wedgeBOT May 30, 2024, 9:26 AM

#

Rules

9. Do not offer or ask for paid work of any kind.

hearty token May 30, 2024, 11:56 AM

#

Hi guys, I am attempting to generate vector embeddings for a very structured document. It is divided at four levels, the book, the division, the title and then the content. I have been thinking about how to prepare the material for chunking, especially in a way that maximizes on the rigid structure of the documents. My plan is to divide them by the smallest block, the content, and put a block of text above the content, containing the book, division and title. Every chunk will have this header block and the content. How does this idea sound? One concern I have is the repetition of the header block for divisions that has a lot of titles and content, so it would be the same header block over and over, and I wonder if that will be counterproductive to search.

hollow escarp May 30, 2024, 1:53 PM

#

Anyone knows here how could i quntize my object detection model?

serene scaffold May 30, 2024, 1:58 PM

#

hollow escarp Anyone knows here how could i quntize my object detection model?

you mean quantize?

hollow escarp May 30, 2024, 1:58 PM

#

serene scaffold you mean quantize?

Yes

#

Sorry for spelling mistake

serene scaffold May 30, 2024, 2:00 PM

#

hearty token Hi guys, I am attempting to generate vector embeddings for a very structured doc...

From my understanding of your description, this is not what we'd consider a "very structured document". Regardless, having the "header block" repeated shouldn't be an issue as long as you tokenize it as one unit.

#

You might also exclude the header block from each chunk, if it's only there to tell you where chunk boundaries are.

hearty token May 30, 2024, 2:06 PM

#

serene scaffold You might also exclude the header block from each chunk, if it's only there to t...

My intention is to supply some metadata that is not contained within the chunk content, rather than to use as a delimiter for the chunk boundaries. It is good to know that the repetition won't cause any issues since i feel that it is a key component for each chunk to make sense. Thanks for your comment.
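The header-block plan described above could look something like this. It is a sketch under assumptions: the `Section` fields follow the book/division/title/content levels from the message, while the header wording, the dict layout, and the sample sections are invented. Keeping the same values in a separate `metadata` field is an extra design choice, not something from the discussion; it allows filtering by book or division without relying on the embedded text.

```python
from dataclasses import dataclass


@dataclass
class Section:
    book: str
    division: str
    title: str
    content: str


def build_chunk(section: Section) -> dict:
    """Prefix the content with its position in the document hierarchy."""
    header = (
        f"Book: {section.book}\n"
        f"Division: {section.division}\n"
        f"Title: {section.title}"
    )
    return {
        "text": f"{header}\n\n{section.content}",  # this string gets embedded
        "metadata": {
            "book": section.book,
            "division": section.division,
            "title": section.title,
        },
    }


# Invented sample data: two titles under the same division.
sections = [
    Section("Book I", "Division 2", "Definitions", "In this division, a term means..."),
    Section("Book I", "Division 2", "Scope", "This division applies to..."),
]

chunks = [build_chunk(s) for s in sections]
for chunk in chunks:
    print(chunk["text"], end="\n---\n")
```

If the repeated header turns out to dominate similarity scores for short contents, one option to experiment with is embedding only title plus content and leaving book and division to metadata filters.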

buoyant vine May 30, 2024, 2:06 PM

#

hollow escarp Anyone knows here how could i quntize my object detection model?

Easiest way IMO is convert the model to ONNX, and then use onnxruntime to quantize the model

hollow escarp May 30, 2024, 2:06 PM

#

buoyant vine Easiest way IMO is convert the model to ONNX, and then use onnxruntime to quanti...

i already have my model in onnx

#

But im still having trouble in doing that

buoyant vine May 30, 2024, 2:07 PM

#

What have you tried so far? are you following https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantizing-an-onnx-model ?

onnxruntime

Quantize ONNX models

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

hollow escarp May 30, 2024, 2:07 PM

#

tried bunch of methods and they didnt speed up My model

buoyant vine May 30, 2024, 2:07 PM

#

they are typically not magic

#

the speedup will depend on hardware

#

typically they only reliably reduce memory overhead, not speed.
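The memory side of that is easy to see with the arithmetic itself. The sketch below is a hand-rolled affine float32 to uint8 quantization in NumPy, not onnxruntime's implementation; the function names and the random test tensor are made up. It stores each value in one byte instead of four and keeps a scale and zero point to map back.

```python
import numpy as np


def quantize_uint8(x):
    """Affine-quantize a float32 array to uint8; returns (q, scale, zero_point)."""
    # Make sure the representable range contains 0.0.
    lo = min(float(x.min()), 0.0)
    hi = max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant input
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale


rng = np.random.default_rng(0)
weights = rng.normal(size=1000).astype(np.float32)  # stand-in for a weight tensor

q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)

print("bytes as float32:", weights.nbytes)
print("bytes as uint8:  ", q.nbytes)
print("max round-trip error:", float(np.abs(weights - restored).max()))
print("quantization step:   ", scale)
```

The storage drops by a factor of four and the round-trip error stays on the order of one quantization step. Whether inference also gets faster depends on the runtime having int8 kernels for the target hardware.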

hollow escarp May 30, 2024, 2:08 PM

#

https://www.easypaste.org/file/xqYJMfw0/license.plate.detector.onnx?lang=en thats my onnx model file

hollow escarp May 30, 2024, 2:08 PM

#

buoyant vine the speedup will depend on hardware

Im using arm64 architecture on production

buoyant vine May 30, 2024, 2:09 PM

#

and what type are you trying to quantize to?

hollow escarp May 30, 2024, 2:09 PM

#

buoyant vine and what type are you trying to quantize to?

From float32 to uint8

#

https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/cpu followed the repo mentioned in docs

GitHub

onnxruntime-inference-examples/quantization/image_classification/cp...

Examples for using ONNX Runtime for machine learning inferencing. - microsoft/onnxruntime-inference-examples

#

And it didnt work well

buoyant vine May 30, 2024, 2:17 PM

#

I'm not sure if you're going to gain much via float32 -> int8

#

their intermediary results are likely going back to int32 operations

#

I can't see anything wrong with the model itself

#

Looking at the cpu provider code, most of the operations do not work via int8 they get cast internally

#

Well in theory if the data is int8, you can at minimum do 16 operations per AVX2 register on the CPU compared to 8 fp32 operations.
But on arm this is a bit different since NEON & SVE behave a bit differently

#

But normally smaller int operations = more throughput loosely

hollow escarp May 30, 2024, 2:32 PM

#

buoyant vine I'm not sure if you're going to gain much via float32 -> int8

Okay, so your telling me that even if my quantization process went well i still wouldnt gain that much speed

buoyant vine May 30, 2024, 2:33 PM

#

GPUs normally have a lot of specialization around the quantization, so they more commonly see a significant speedup with the quantization

#

But on the CPU you are more at the mercy of the execution provider's willingness to specialize to datatypes

buoyant vine May 30, 2024, 2:34 PM

#

hollow escarp Okay, so your telling me that even if my quantization process went well i still ...

It depends, but I can't see a huge amount of specialization for int8 on CPU providers with onnxruntime.

If you're on ARM, you can try using the ArmNN provider, which might have more specializations

hollow escarp May 30, 2024, 2:35 PM

#

buoyant vine It depends, but I can't see a huge amount of specialization for int8 on CPU prov...

Okay, whats ARMNN provider ?

buoyant vine May 30, 2024, 2:35 PM

#

https://onnxruntime.ai/docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.html

onnxruntime

Arm - Arm NN

Instructions to execute ONNX Runtime with Arm NN on Armv8 cores

#

There is also https://onnxruntime.ai/docs/execution-providers/community-maintained/ACL-ExecutionProvider.html

onnxruntime

Arm - ACL

Instructions to execute ONNX Runtime with the ACL Execution Provider

#

Not sure what Arm chip it will be running on so not sure if you can run any of these

hollow escarp May 30, 2024, 2:43 PM

#

Thx, maybe this will give me better results

untold wadi May 30, 2024, 6:08 PM

#

Hi there,

I developed a webapp with Flask recently. I'd like to add a scraping task in the background (something I've already dev'd) but things get complicated after :

I have lots of videos, from which I'd like to extract some texts but not necessarily all. I've tested a bit with Google's Vision AI, which works very well, but I end up with a lot of extraneous text. So my question is, what do you think is the best way to clean up the data with OCR tech? I have several solutions in mind, for example training a model to detect only the texts I want (but I don't know where to start with this method), I've also thought about using classic regexes but this solution seems extremely limited and not suitable for me. Can you think of any other viable solutions for me? Any frameworks? Ways to train AI models? etc.

Thanks for taking the time to read already and have a nice day!

buoyant shoal May 30, 2024, 8:20 PM

#

hi, can someone clarify what exactly should i be doing for this question ( i don't particularly understand cuz i'm new to pytorch)

#

Oh yeah the quoted "question 1" is just this:

#

https://paste.pythondiscord.com/TGBQ

#

Anyway this is my code for question 1, but idk what to do for q3), could someone help?

mild dirge May 30, 2024, 9:22 PM

#

buoyant shoal hi, can someone clarify what exactly should i be doing for this question ( i don...

If you have a model, you should make a batch size of 1, and give this batch of 1 image (shape would probably something like (1, nr_channels, height, width)), and then pass the result to the loss function with the desired output (i.e. the input image).

#

This should then give you a loss for a single image. Do this multiple times, and plot the resulting losses. @buoyant shoal
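The batch-of-one idea above can be sketched as follows. Everything concrete here is an assumption: random tensors stand in for the MNIST images so the snippet runs without a download, and the small fully connected autoencoder is a placeholder for whatever model the assignment defines.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 20 fake greyscale images shaped like MNIST (1 x 28 x 28).
images = torch.rand(20, 1, 28, 28)
loader = DataLoader(TensorDataset(images), batch_size=1, shuffle=False)

# Placeholder autoencoder; the assignment's model would go here instead.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),
    nn.ReLU(),
    nn.Linear(32, 28 * 28),
    nn.Sigmoid(),
)
loss_fn = nn.MSELoss()

per_image_losses = []
model.eval()
with torch.no_grad():  # evaluation only, no weight updates
    for (batch,) in loader:  # batch has shape (1, 1, 28, 28)
        reconstruction = model(batch)
        target = batch.flatten(start_dim=1)  # the input image is the target
        per_image_losses.append(loss_fn(reconstruction, target).item())

    # MSE over all images at once, for comparison with the per-image values.
    full_mse = loss_fn(model(images), images.flatten(start_dim=1)).item()

print("number of per-image losses:", len(per_image_losses))
print("mean of per-image losses:", sum(per_image_losses) / len(per_image_losses))
print("MSE over all images:     ", full_mse)
```

Because every image has the same number of pixels, the mean of the per-image losses matches the MSE over the whole set, which is the overlap between the questions noted in the discussion. `plt.plot(per_image_losses)` would then give the per-image plot.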

#

Bit unsure about the terminology your teacher uses. The grammar is also a bit poor. I assume with "total loss" (Q1) they mean the "MSE" over 600 images. But that would give the same answer as question 3, so they probably mean "sum of squared error"?

#

Does that make sense @buoyant shoal?

mild dirge May 30, 2024, 9:31 PM

#

mild dirge Bit unsure about the terminology your teacher uses. The grammar is also a bit po...

Could also be the sum of MSE of the 100 epochs for question 1. That would make more sense I guess. Question 3 would then be the MSE of epoch 0 (untrained). But they also want a plot of the loss of each separate image I would think.

buoyant shoal May 30, 2024, 9:37 PM

#

mild dirge If you have a model, you should make a batch size of 1, and give this batch of 1...

hi, so wait this is wrong?

#

```py
# Create a data loader for training data with a batch size of 600
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=600)
```

mild dirge May 30, 2024, 9:39 PM

#

Do you have the entire assignment description?

buoyant shoal May 30, 2024, 9:39 PM

#

mild dirge Do you have the entire assignment description?

I think this is it basically

#

The code was like already there for the most part and was just filling in trivial details

#

but iirc they already had batch_size = 100 on the code

#

i just changed it to 600

mild dirge May 30, 2024, 9:40 PM

#

Do you have the original code then?

#

Without what you put in it

buoyant shoal May 30, 2024, 9:40 PM

#

buoyant shoal https://paste.pythondiscord.com/TGBQ

This is the original code

#

oh

#

okay yes it's more or less the same

#

```py
# Import torch library and the other usual libraries
# Based on code by Naveen on nomidl.com

import torch
import torchvision
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
from torchvision import transforms
```

#

```py
# Define a data transformation to convert digit images to tensors
transform = transforms.ToTensor()
# Load the MNIST datasets for training and validation
# images are 28x28 pixel images of handwritten digits in a greyscale
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
valid_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create a data loader for training data with a batch size of 100
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=100)
```

#

yeah the rest is all the same actually

#

i moved all the stuffs from the jupyter notebook

mild dirge May 30, 2024, 9:43 PM

#

Hmm alright. There is just some contradicting stuff, like they initially use a batch size of 100, which means they give 100 images, calculate MSE, update model, do this until all images are used, then the epoch is done.

#

But they want to plot the MSE of a batch of 600 for every epoch (for question 1), which is a bit random, Why pick 600 random images and save the MSE of that? why not just use the MSE of all images?

buoyant shoal May 30, 2024, 9:44 PM

#

mild dirge Hmm alright. There is just some contradicting stuff, like they initally use a ba...

wait i think maybe it has to do with the train_dl line?

#

train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=100)

#

this thing like doesn't even run entirely

mild dirge May 30, 2024, 9:45 PM

#

What do you mean?

buoyant shoal May 30, 2024, 9:45 PM

#

hold on

#

huh okay what

#

it works now but okay basically