#data-science-and-ml
what's the initial image size?
3, 128, 128
if __name__ == "__main__":
    # Ensure Kafka topics are created
    create_kafka_topics()
    websocket.enableTrace(True)
    ws = websocket.WebSocketApp(
        f"wss://ws.finnhub.io?token={finnhub_api_key}",
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws.on_open = on_open
where:
def on_message(ws, message):
    """Callback function to handle incoming WebSocket messages."""
    data = json.loads(message)
    if data.get('type') == 'trade':
        for trade in data['data']:
            symbol = trade['s']
            record = {
                'symbol': symbol,
                'timestamp': datetime.fromtimestamp(trade['t'] / 1000.0).strftime('%Y-%m-%d %H:%M:%S'),
                'price': trade['p'],
                'volume': trade['v']
            }
            latest_trade_data[symbol] = record  # Update latest trade data
            try:
                future = producer.send(kafka_topic_data, key=symbol, value=record)
                future.add_callback(delivery_report)
                future.add_errback(lambda exc: logger.error(f"Failed to send record to Kafka: {exc}"))
            except Exception as e:
                logger.error(f"Failed to send record to Kafka: {e}")
and the kernel preserves the dimensions?
kernel size for all layers are same ( pool )
so yeah!
what
and for conv it is 3
with padding 1 I assume
yeah
stride = 1 for conv
stride = 2 for pool
so should I start with image or directly with conv1?
I'm not entirely sure, but this makes sense to me
cuz obvs you start with an image
after convolution you have more layers, but size is unchanged, after maxpool same number of layers, but size is halved, then again, after convolution, more layers and size unchanged and then after max pool, same number of layers but size is halved, then it goes through the linear transforms
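A rough sketch of the shape bookkeeping described above, using the usual output-size formula `(size + 2*padding - kernel) // stride + 1`. The "layers" in the message read as channels here. The pool kernel size of 2 and the channel counts 16 and 32 are assumptions, since the chat never states them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(in_shape, conv_channels):
    """Follow a (channels, height, width) shape through conv -> pool blocks."""
    c, h, w = in_shape
    shapes = [(c, h, w)]
    for out_c in conv_channels:
        # 3x3 conv, stride 1, padding 1: more channels, same height/width.
        h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
        c = out_c
        shapes.append((c, h, w))
        # Max pool, stride 2 (kernel 2 is an assumption): same channels, halved size.
        h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)
        shapes.append((c, h, w))
    return shapes


shapes = trace_shapes((3, 128, 128), conv_channels=[16, 32])  # channel counts are made up
for s in shapes:
    print(s)
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("inputs to the first linear layer:", flat_features)
```

The last shape multiplied out gives the input size the first linear layer would need under these assumptions.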
just send me image for architecture block!
I didn't save it 
though I had the forethought to screenshot all the parts, so you should be able to easily just recreate it off that screenshot
okay!
Has anyone been able to use open rest api for calculating driving distances between coordinates? Ideally I'd like to have a table built out with distance between combination of various lat lon.
Has anyone used OSRM or similar method to call distance?
I watched this video, he only explains how 1 cell works
What's the need/ project tho, do you need that much
Sample it for learning purpose, for anything serious you don't use your personal computer
what more do you need though?
you just push all the tokens in a sequence through that one cell
You want to calculate the distance with point a point b lon lat coordinate as if it took a certain path?
Tbh I'm dealing with a similar type of data rn, but I have the odometer at the end and start of the trip
I am looking to build a matrix containing driving distance between all the possible coordinates.
Then based on a coordinate I am at, it can tell me the other closest coordinates near by.
So far I have found two solutions though unsure of the cost. Distance Matrix API from Google, and Travel Cost Matrix by ESRI. I was curious if there are others someone has already tried.
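For the "matrix plus nearest neighbours" idea, here is a minimal sketch that uses straight-line (great-circle) distance rather than driving distance. It is only a stand-in: a routing service such as OSRM or the paid APIs mentioned above would be needed for real road distances, but a cheap table like this could shortlist candidates first. The coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def distance_matrix(points):
    """Full pairwise distance table as a list of rows."""
    return [[haversine_km(p, q) for q in points] for p in points]


def nearest(matrix, i, k=1):
    """Indices of the k closest other points to point i."""
    others = [j for j in range(len(matrix)) if j != i]
    return sorted(others, key=lambda j: matrix[i][j])[:k]


points = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]  # made-up (lat, lon) pairs
matrix = distance_matrix(points)
print(nearest(matrix, 0, k=2))
```

Driving distance is never shorter than this figure, so the table is a lower bound, not a replacement.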
I want to know if we have a network with for example 2 hidden layers , each containing 3 cells , how would that work?
I suppose you're looking for something like this https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
when looking in the horizontal direction, it's the same cell just being passed all the tokens sequentially, in the vertical direction it's different cells though
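A toy sketch of that horizontal/vertical picture, with scalar hidden states so it stays readable. The class name, weights and inputs are all made up for illustration; a real RNN uses weight matrices and vector states.

```python
import math


class TinyRNNCell:
    """Scalar RNN cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""

    def __init__(self, w_x, w_h, b=0.0):
        self.w_x, self.w_h, self.b = w_x, w_h, b

    def step(self, x, h_prev):
        return math.tanh(self.w_x * x + self.w_h * h_prev + self.b)


def run_layer(cell, sequence):
    """Horizontal direction: the same cell sees every token in turn."""
    h, outputs = 0.0, []
    for x in sequence:
        h = cell.step(x, h)
        outputs.append(h)
    return outputs


def run_stack(cells, sequence):
    """Vertical direction: each layer has its own cell and reads the layer below."""
    for cell in cells:
        sequence = run_layer(cell, sequence)
    return sequence


tokens = [1.0, 0.5, -0.5]  # made-up inputs; the weights below are arbitrary too
stack = [TinyRNNCell(0.8, 0.3), TinyRNNCell(-0.5, 0.9)]
print(run_stack(stack, tokens))
```

Each layer keeps its own hidden state across time steps and hands only its per-step outputs upward.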
though as far as I can tell, despite this containing several cells, it's still a single hidden layer 
hmmmmmmmmmm
or you just can't have two hidden layers with three cells each and it's just gonna be 6 hidden layers
that is, unless your hidden layers contain sublayers, meaning, you have an RNN, some other thing, then another RNN, then some other thing, then another RNN and then this whole sequence repeats twice, I guess you could consider that group one hidden layer?
right, something like
[
  [
    RNN
    Linear
    RNN
    Linear
  ],
  [
    RNN
    Linear
    RNN
    Linear
  ]
]
this would be 2 hidden layers with 2 RNN cells each
cuz you could do say this
[
  [
    RNN
    RNN
    RNN
  ],
  [
    RNN
    RNN
    RNN
  ]
]
but that's equivalent to just
[
  RNN
  RNN
  RNN
  RNN
  RNN
  RNN
]
and I mean, sure, you can flatten this one out as well, or even make it into 4 hidden layers with an RNN and a linear layer each
it also depends on what you consider a layer, is an activation function a layer as well?
[
  Embedding
  [
    RNN <Tanh>
    Linear <Sigmoid> |
    RNN <Tanh>       |
    |--- + ----------| (residual)
    |
    LayerNorm
    RNN <Tanh>
    Linear <Sigmoid>
  ] x 2
  Softmax
]
sth like this could be 2 hidden layers with 3 RNN cells each I guess?
[
  Embedding
  [
    RNN <Tanh> ----|
    RNN <Tanh>  <- | - (not included in the residual)
    RNN <Tanh>     |
    |-- + ---------| (residual)
    |
    Linear <Sigmoid>
  ] x 2
  Softmax
]
or sth like this, for instance, idk, just dreaming up some architectures, lol
(also need an embedding layer before all of these architectures, lol (added to the last two, but same goes for the previous stuff))
if i have numerical data and i need to preprocess it by addressing input data skew, running SMOTE, and standardizing the data, would this preprocessing pipeline be acceptable:
log transform input features -> split data into train/test -> SMOTE training data -> combine training + testing data -> standardize inputs -> split back into training + testing (using same split approach as step 2)
standardizing should be fitted on training only @coral field
why so?
Because you shouldn't use information from the test set
It is a minor thing, but formally it should be done like that
If your test set has a different distribution, and you standardize based on those values, you already use info from the test set, so your results could be kinda biased in your favor, which is bad.
alright, but if the distribution is really similar, im assuming the performance wouldn't matter too much?
(is standardization the same as normalization?)
Yeah, like I said, it is minor and probably won't matter. But if the distributions were slightly different, it could bias your results, so it is just good form to do it like that.
got it. thanks so much!
one last thing, if my training and test sets are very different, yet standardization was fit based on both sets combined, would the test set results generally be a lot weaker, or would it depend
if you fit and transform separately then probably yes. Just to make it clear, you should fit on the training data (so you get train_mean, train_SD) and then you do
test = (test - train_mean) / train_SD
train = (train - train_mean) / train_SD
If the distributions were very different and you did
test = (test - test_mean) / test_SD
train = (train - train_mean) / train_SD
Your model would likely perform worse.
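A small sketch of the "fit on train, apply to both" recipe above, using only the standard library. The numbers are made up, and the test values are shifted on purpose so the effect is visible.

```python
from statistics import mean, pstdev


def fit_standardizer(train):
    """Learn the mean and standard deviation from the training values only."""
    return mean(train), pstdev(train)


def standardize(values, train_mean, train_sd):
    """Apply the training statistics to any split."""
    return [(v - train_mean) / train_sd for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up feature values
test = [10.0, 11.0, 12.0]          # deliberately shifted distribution

train_mean, train_sd = fit_standardizer(train)
train_scaled = standardize(train, train_mean, train_sd)
test_scaled = standardize(test, train_mean, train_sd)

print(train_scaled)
print(test_scaled)  # far from 0: the shift stays visible to the model
```

Standardizing the test split with its own mean and SD would recentre it on 0 and hide that shift from the model.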
so this is dumb but I didn't know that each layer passes its hidden states to the next, that was my question initially
what is a residual?
Well in precalc it's the actual minus predicted value.
Yeah I know that, but I don't get it in this context
I don't understand the code at all sorry.
No worries
in simple terms (in this case meaning that I have barely any ideas as to why, lol) you just add the input to a layer to that layer's output, it's a skip connection, you can also use concatenation instead methinks, but that requires more memory
the layers don't pass the hidden states to other layers
they pass the outputs of the cells to the next cells and such
to me I intuitively see it as adding earlier data to some more processed data to add more context, in conv nets this can be understood even better as you're essentially merging earlier (thus more/larger) features with the little ones, so like, the network knows about both the smaller and bigger details in the original input
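The "add the input to the layer's output" idea fits in a few lines. This is a bare sketch with plain lists; `layer` is a made-up stand-in for whatever block sits inside the skip connection.

```python
def layer(x):
    """Stand-in for any layer whose output has the same size as its input."""
    return [0.5 * v for v in x]  # arbitrary transformation for illustration


def residual_block(f, x):
    """Skip connection: add the block's input back onto its output."""
    return [xi + fi for xi, fi in zip(x, f(x))]


x = [1.0, -2.0, 4.0]
print(residual_block(layer, x))
```

Addition needs the input and output to have the same shape, which is why concatenation (with its extra memory) is the alternative when they differ.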
idk which framework, in keras you'd do (pseudocode):
encoder = Encoder(...)(input)
output = Decoder(...)(encoder)
AE = Model(input, output)
can't you just create a new AE class and use the encoder/decoder models within i.e set them in init, call them in forward?
class AE(torch.nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded
i can't really help more than that now though, that's just a sketch-idea
Oh gotcha sorry
hey um, im about to start college and im taking AI and ML courses in my college, i have some knowledge in AI and ML like about activation functions, layers, how layer values are calculated and some other things but i feel like im nowhere near to the actual stuff, can someone provide me with ways to learn and where to start from? (i dont mind learning from the start), thanks in advance
correct me if im wrong but isnt college there to teach you that?
Anyways im sure you can find some great courses on Coursera, Udemy, etc.
There is an ML math book in the pinned messages on this channel
yeah but im afraid if they could teach me better than me
look at pinned messages
What textbooks do you suggest for neural networks? Or where do I read research papers. I have heard when getting into these it's important to read many papers
Aight
Thank you!
ight thanks
ig not an introductory book, judging by who's commented
Does anyone know how to denormalize w and b? I can do it for w, just not b
lambda x: (x - x_train_mean) / x_train_std  # normalize
lambda n: n * x_train_std + x_train_mean #denormalize
x_train_mean and x_train_std both have the shape of w, so it works
But for b, which is just a scalar, how do I denormalize it
I just want to denormalize it so I can plot the graph
Denormalizing w works, but I can't do it for b
The gradient looks right, which mean w is denormalized properly, but b is still off
Thanks!
Is what I'm doing completely wrong? Can you even undo normalization for parameter?
you can do it for a list of values, the meaning depends on your specific problem
but a parameter no, you can't; the mean is itself, and std is 0.
You seem confused
You normalized the data x, not parameters or whatever
I know
well that's batch normalisation
I see
yeah
Uh no what am I saying
I think I have to denormalized the prediction parameters before plotting?
Uh what am I saying
Sorry nvm you don't have to, but if you fitted with normalized x just plot normalized x and y
Oh I want to plot it without it normalized? Or does it not matter
Cause I am scattering the raw data
If I predict a line, then get the prediction, I invert the normalization on the prediction
Is that the correct way?
yes
I'm just getting myself confused, trying to plot the function
so if you normalised x and y, then you undo it, or plot both normed.
i thought you meant w and b originally
That's my first idea, also my stupid idea
not stupid, it's confusing because NNs do normalise things internally too
(it's called batch normalisation, though it normalises layer activations rather than w itself.)
Oh, I've not gone that far yet, I'll keep that in mind
I new learner Python guys help me
let's say that you fit a line to data, if you normalise x and y, then your solution parameters will correspond to that scenario
plt.scatter(x_raw[:, 0], x_raw[:, 1], c=y_train)
x1 = jnp.arange(-5, 5)
x2 = ((-b) - x1 * w[0]) / w[1]
print(f"x1: {x1} x2: {x2}")
plt.plot(x1, x2)
So I am trying to draw a decision boundaries
How do I plot this, if my x_train is normalized
plot(x_normalized, y)
Oh, what if I am trying to plot a decision boundaries
You literally fitted your model to predict y as a function of x_normalized
Yes, I am getting myself confused
I'm trying to plot the decision boundary of a logistic regression
Do I just invert normalized x1 before plotting?
I don't have a computer next to me so I can't test my guess
Or should I give up and scatter the normalized data and call it a day
I got to be honest I am a bit confused
Is x_1 here the inputs and x_2 the output? Or are those 2 inputs?
if x_1 is the input, and it was trained on normalised data, just do (x_1*std)+mean; where mean and std are the ones you got from the dataset.
Oh I'm trying to generate a boundary line
all you have to do is scale the final output of the network appropriately, not the prediction params
The equation is derived from
When it's =0 sigmoid is 0.5
So that's the decision boundaries
But I have no idea how to scale this output
Am I stupid, or do I invert normalized x1 and x2
Is that it?
Wait no
I am confused
if what you did is to fit x_1, x_2 (within z) and found w1, w2, and b to predict g(z)
the same op you did on training data, you do now on inputs (that's my take from your desc. i can be wrong.)
If x1, x2 is fit into z, g(z) should always be 0.5 right?
Because the x1w1+x2w2+b=0
And sigmoid of 0 is 0.5
yeah i meant 0.5
Even knowing that, my brain still gets confused when I want to plot a decision boundaries that fit the not normalized data
so you have to norm it, using (x_s - mean)/std imho
cuz that transform was done to the training data
Do I normalized x1 and x2
Just realized I pasted the code twice
well, those correspond to normed data, if your training data was
step one, plot the decision boundaries for the normalized data
step two, inverse the operation you used to normalize it and apply that to scale the plot
Oh, just plot normalized data and scale the whole thing?
So if x axis is x1 and y axis is x2, I can just inverse the normalization for x and y axis?
if your normalization function can also be described as x_normalized = a*x + b, then you can just do x = (x_normalized - b) / a
Like this, right?
just make sure you do it in the correct order. if you subtracted first and divided second, then you need to multiply first and add second

the cute name for the reversal of order of operations during inversion is "shoes and socks theorem"
if you put your socks on first and then your shoes, then you first need to take off your shoes and then take your socks off later
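One way to get a boundary line in raw coordinates is to fold the normalization into the parameters. Substituting x_norm = (x - mean) / std into w . x_norm + b = 0 gives w_raw = w / std and b_raw = b - sum(w * mean / std). The sketch below assumes per-feature mean/std normalization as discussed above; all numbers are made up.

```python
def to_raw_params(w, b, mean, std):
    """Rewrite w . ((x - mean) / std) + b as w_raw . x + b_raw."""
    w_raw = [wi / si for wi, si in zip(w, std)]
    b_raw = b - sum(wi * mi / si for wi, mi, si in zip(w, mean, std))
    return w_raw, b_raw


def boundary_x2(x1, w, b):
    """Solve w[0]*x1 + w[1]*x2 + b = 0 for x2."""
    return (-b - w[0] * x1) / w[1]


# Made-up parameters (learned on normalized inputs) and training statistics.
w, b = [2.0, -1.0], 0.5
mean, std = [10.0, 40.0], [2.0, 5.0]

w_raw, b_raw = to_raw_params(w, b, mean, std)
x1_raw = 12.0
x2_raw = boundary_x2(x1_raw, w_raw, b_raw)
print(x1_raw, x2_raw)
```

Plotting `x2_raw` against raw `x1` values then lines up with a scatter of the raw data. The other route mentioned above, computing the line in normalized space and inverting each axis, should give the same points.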
in respect to polars lazyframe are lazy queries similar to lazy computing lazy loading? or is it a coined term by polars?
I can't find any good technical discussions of how laziness/lazyframe in polars works
my real issue is the notable slowness of dataframe.unique() but I'm trying to find the true issue as I doubt i'll get solutions to that
laziness is a more general concept in computer science, polars didn't coin it
show me the code
I'm aware of the term laziness in CS more curious if thats what polars is doing
yea sure why not
yes, in a way that is similar to SQL. Several instructions are stored and then a query optimizer is run
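To make "instructions are stored, then run" concrete, here is a toy illustration in plain Python. It is not how Polars is implemented and `ToyLazyFrame` is a made-up class. It only shows the general pattern of building a plan and executing it on `collect()`.

```python
class ToyLazyFrame:
    """Records operations and only runs them when collect() is called."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = tuple(plan)

    def filter(self, predicate):
        return ToyLazyFrame(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        return ToyLazyFrame(self._rows, self._plan + (("select", columns),))

    def explain(self):
        return " -> ".join(name for name, _ in self._plan)

    def collect(self):
        # A real optimizer would reorder or merge steps here before running them.
        rows = self._rows
        for name, arg in self._plan:
            if name == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


rows = [{"id": 1, "price": 5.0}, {"id": 2, "price": 50.0}]  # made-up data
query = ToyLazyFrame(rows).filter(lambda r: r["price"] > 10).select("id")
print(query.explain())   # nothing has been computed yet
print(query.collect())
```

Because the whole plan is known before anything runs, an engine is free to rearrange it, which is where the speedups come from.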
processing time is like 0.5-0.77 seconds
ahh so it has no inherent downsides? just optimal for repeated queries
the unique method calls lazy frame
return (
    self.lazy()
    .unique(subset=subset, keep=keep, maintain_order=maintain_order)
    .collect(_eager=True)
)
on the former it's naturally lazily evaluated and the latter it's eagerly evaluated
return wrap_ldf(self._df.lazy())
yes
don't get me wrong i may be misunderstanding something ofc hence why i asked if there was more technical writing on it
I'm having a very very hard time to understand what you are asking
methods on the polars DataFrame class have identical implementations to the Lazyframes from me reading the underlying code
any good pytorch tutorials?
The code you just removed could definitely benefit from using LazyFrame. Specifically, you need to swap out pl.read_parquet for pl.scan_parquet
pytorch/deep learning tutorials
The pytorch docs
would that result in a faster read if only one read is done? 
So, it seems to me from what you've sent me that the eager versions are implemented by doing lazy -> computation -> collect
But that's kind of irrelevant to the point of what LazyFrame is all about
aight
yea i'm using the eager version
yea I was curious on how the lazyFrame works i was trying to read more than it
it's that the query plan functionality of polars can see "oh you're doing a group by and 2 aggregations, those can be done concurrently" and so on
to see if i can speed up my computation, i'm only doing a singular read tho not multiple reads
Yeah, if I were you I'd start by not using the eager side of polars
Do everything in lazy
I'm looking for the right docs page because this is discussed at length
In the user guide
hopefully this question isnt lazy (no pun intended) but this would result in overall faster processing?
the lazy API allows Polars to apply automatic query optimization with the query optimizer
Because of this exact reason it is
Or rather, it can be
ahh then its on me for starting with pl.DataFrames then
seems like columns in lazyFrame has no setter
Polars typically has two methods, read_X and scan_X
Use scan_parquet and you'll have a lazyframe
pardon my ignorance btw like 2 weeks ago i moved off from pandas to polars
Did you read the user guide?
yes and the documentation
hmmm
wdym with this?
hence why i was able to transition everything easily
Also read this one again https://docs.pola.rs/user-guide/lazy/query-plan/#non-optimized-query-plan
you need to use rename opposed to assigning
You can compare the non-optimized query plan vs the optimized one if you want to see if you're getting any gains from using a lazyframe
ahh
You can assign new columns with a lazyframe
.with_columns works there
Or how were you assigning columns?
how i was assigning columns, i appreciate the help, i'll read more
yea it does
huh?
Anyway, I use polars extensively at work. I'd always tell people to use LazyFrame, especially Pandas users
Even if you get no performance benefits at the very least it disables all anti-patterns you can do (anti-patterns you can freely do in Pandas as well)
The only operations you can do are quite optimized
i'm really a C++ guy i'm ngl not bad at python or nun but i've only been using python for work recently so i appreciate the help
yea i grasp what you're saying
There's certain things you cannot do like sorting iirc but you basically need to discipline yourself to collect once and sort at the end or sort at the front and then go lazy. Something like that. Not collecting multiple times and so on
So like, from a UX pov it's nudging you to more efficient code
most of the efficient stuff is written in C++ bindings here but i understand what you're saying and i think what you recommend is best so i'll go that route
its not much to improve anyway better to write the proper pattern early
In your work project?
yea python is more a top layer thing if that makes sense
if something is too slow usually rewrite it in C++
I think swapping what you have to lazy is trivial. You had none of the anti-patterns. Aside from sorting I think you could just replace read with scan and you're done
ahh makes sense i've got more reading/re-reading to do on my end appreciate the patience
Anyway, if you have any experience with DBs, just think of it as SQL and query plans there
yea I understand wym
easy peasy
actually kinda insane i'm ngl but yea appreciate the help a lot
not saying you won't get help here but #web-development may be better
Oh shit, i don't know why it switched, didn't notice, thanks : )
wait
you still need to .collect()
It (obviously) takes 0s because it didn't do any work
i did, only issue is collect is a tad bit slow but i'm reading more
ahh
So the second function should also return a DataFrame at the end
but inside it should only deal with LazyFrames and the very last line should be return df.collect()
yea thats what i did
The screenshot has the second one's return type as a LazyFrame though?
that's cause i'm actually querying the in memory lazy frame in a separate function
i'm fine with using lazyframe until i'm complete with all operations
albeit now i'm weighing if we should write the cleaned data opposed to recleaning data sets on every read thats something i'll prob pick up in a meeting
For sure. My pointer here is using something like the "medallion architecture"
It's a buzzword for basically writing raw data in its original schema (bronze), cleaning it a little and writing it (silver) and then maybe doing a couple of final transforms for your end application(s) (gold)
yea i'm aware of it
perfect
my project is historical analysis still in the earlier stages so we're still discussing long term stuff
i got moved from the real time analysis team lol
everyone is concerned with performance but not sure how to obtain it so i've been mostly fronting that load
Polars is a fine choice then
yea its a major improvement over pandas team is small as well so not a huge issue of adoption
pandas had us discerning if to scrap the original python/C++ architecture lol
pretty sure I read somewhere that the "eager API" is just the lazy API but it collects after every operation or smthn
you'll prob see a nice difference when you queue up a large amount of ops then collect, allowing the optimizer to do its thing
yea thats where i got confused early on when reading the source code the eager api was literally just the lazy api calling collect
That was new to me as well, nice that I learnt that
Thanks everyone! I plotted the decisions boundary (for data that isn't normalized but parameter is trained on normalized data) ! Thanks!
nice :-)
in a single string? you may get the column as a list and join the list ?
nvm i figured it :))
The final solution is quite easy, no idea why I didn't think of it earlier
Thanks everyone for your help
np. nice quote
how do i unload datatypes from memory that i no longer need
What was your solution?
Because the best solution involves .str.cat
You don't unload data types, you unload data.
stores the column values into list and then conc all list entries in single string
Try doing df['article'].str.cat(sep=' ')
i mean i want to unload specific data types that are no longer relevant
It sounds like you think "data" and "data type" mean the same thing.
Are you trying to delete columns from the dataframe?
Int is a data type.
You can delete every int. And you can delete every int in a given column. But you can't delete the whole concept of int from your code.
i meant like lets say i used a string to create dictionary but i no longer need the original string
no longer need string in memory*
The string will get deleted automatically.
If you want to delete a column, do del df['col']
Data gets deleted automatically when the last variable/reference to it goes out of scope
If you want to delete something before that, you need to delete all the references to it, so that there are none left.
but what if it never goes out of scope, i imagine out of scope like going out of for loop or function
When you leave a function, yes. Not when you leave a loop, though.
In which case you follow the second part of what I said
pretty sure if u intialize anything inside for loop , then its gets deleted from memory when u leave for loop
That is false.
really
Yes. If you define a variable in a loop, the last value for that variable persists after the loop
Until the end of the scope
interesting
Same for the "loop variable"
i think imma restudy global and local scopes
Sounds good
Keep in mind that python makes it impossible to delete data directly. But it's guaranteed that a variable will never refer to deleted data.
You can only delete data indirectly by deleting all its references.
how to delete references written in previous lines of code
like ones where the compiler already ran
i think if compiler sees it no-longer referenced in future, it decides it went out of scope
You say compiler, but you mean interpreter.
The interpreter doesn't look ahead to see that a reference won't be used again. It only deals with that at the end of a scope.
But you can manually delete references (not values) with the del keyword
!e
a = 5
print(a)
del a
print(a)
:x: Your 3.12 eval job has completed with return code 1.
001 | 5
002 | Traceback (most recent call last):
003 | File "/home/main.py", line 4, in <module>
004 | print(a)
005 | ^
006 | NameError: name 'a' is not defined
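A slightly longer sketch of the same points: names bound in a loop outlive the loop, and `del` removes a name, not the object, which only goes away once nothing refers to it. `Blob` is a made-up placeholder class, and the immediate clean-up shown here is CPython behaviour (reference counting), so treat it as an assumption on other interpreters.

```python
import gc
import weakref


class Blob:
    """Placeholder for some large object."""


for i in range(3):
    last = i * 10
print(i, last)  # both names are still bound after the loop ends

a = Blob()
b = a                   # second reference to the same object
probe = weakref.ref(a)  # lets us observe the object without keeping it alive

del a
print(probe() is not None)  # object survives: b still refers to it

del b
gc.collect()  # CPython frees it once the count hits zero; this call is just a safety net
print(probe() is None)
```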
I'm glad
i love that programming communities help each other out
A lot of programmers got a lot of help when they were starting (I did) and want to pay it forward
print("Hello World, Im New Here")
Have anyone used lets-plot for python before?
The interactive nature of the plot looks interesting, but the syntax is quite different from plt
hey guys is this book a good resource for learning stats? I took it in college years ago and since then have forgotten much of the calculus and linear algebra. Could it give me a good understanding from the basics? https://www.oreilly.com/library/view/practical-statistics-for/9781492072935/
https://www.oreilly.com/library/view/essential-math-for/9781098102920/
and is this enough for the refresher of Linear algebra and calculus. Now I would love to go through a course and go over the theory and proofs but don't got the time
Do you think it would make sense to make a real jarvis using AI?
are you asking if it would make sense, or if it's currently possible?
I'm asking if it makes sense
what does it mean for it to "make sense" to make something?
someone could ask "do you think it would make sense to make pencils?", but I don't understand why someone would ask that.
I'm sorry my english is a little bad but I hope you understand.
Dude, do you think it makes sense or not?
that's okay.
I do not think any beginner to AI will ever be able to create Jarvis, and that they should not even try.
I'm sorry, but I don't know what you mean by "make sense". You will have to rephrase if we are going to continue.
I know you know what you mean, but I sincerely do not.
I haven't exactly started yet
You should not try to make Jarvis. You will give up before you are successful.
Then let me ask differently, do you think this project can be done, that is, is it possible to make real high-level jarvis?
It might be possible for someone with a very, very advanced understanding of AI to be able to create something like that. You would need many years of full-time experience in AI before you can get anywhere close.
If you're interested in AI and chat bots, there are other projects you can undertake that are attainable
This is not the one.
I have the necessary ai chat apis
I have 3 chat ai api and 1 imagine ai api
can you give an example
you can make a basic chatbot using markov chains
markov chains are a mathematical concept, not a library.
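Since it is a concept and not a library, a word-level version fits in plain Python. This is a minimal sketch of a first-order chain (next word depends only on the current word). The corpus is made up and far too small to produce anything chat-like.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, seed=None):
    """Walk the chain, picking a random recorded successor at each step."""
    rng = random.Random(seed)
    word, output = start, [start]
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran"  # made-up training text
chain = build_chain(corpus)
print(generate(chain, "the", seed=0))
```

Repeated successors stay in the list on purpose, so more frequent transitions are picked more often.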
I "animated" gradient descent (nothing groundbreaking, I just find it interesting)
that's cool
can you do it for 3d?
I don't have any data for it
Nor do I know how to generate 3d data
you can make arbitrary data points in 3d space and assign an arbitrary decision boundary, and then animate gradient descent finding it
Just random point?
right
I'll have a look
- make arbitrary points in 3d space
- make an arbitrary decision boundary
- assign each point to "yes" or "no" based on which side of that arbitrary boundary they're on
- run your code for finding the boundary
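The first three steps can be sketched with the standard library alone. The plane parameters, the sampling range and the seed are arbitrary choices for illustration.

```python
import random


def make_dataset(n, w_true, b_true, seed=0):
    """Random 3D points labelled by which side of a plane they fall on."""
    rng = random.Random(seed)
    points = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n)]
    labels = [
        1 if sum(wi * xi for wi, xi in zip(w_true, p)) + b_true > 0 else 0
        for p in points
    ]
    return points, labels


# The plane below is arbitrary; gradient descent should recover something close to it.
w_true, b_true = [1.0, -2.0, 0.5], 0.3
points, labels = make_dataset(200, w_true, b_true)
print(sum(labels), "of", len(labels), "points are labelled 1")
```

Because the labels come from a known plane, the learned w and b can be compared against `w_true` and `b_true` up to a common scale factor.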
I assume I can just add x1**2, x2**2 ,x3**2 to make it curve?
something like that, I think
Thanks for help
... Great, I type annotated and type guarded all my function, and now they only allows 2d array
nvm
I am stupid
3d logistic regression looks fine
Huh
Something is missing
why do some plt function, like plot_surface not appears in autocomplete?
Oh no, you can't animate plot_surface
oo nice
I didn't change any code for that ngl
But I can't animate surface
def animation(frame):
    global w, b, plot
    if plot is not None:
        plot.remove()
    w, b, w_grad, b_grad = ml.grad_descend(
        w,
        b,
        x_train,
        y_train,
        0.1,
        ml.logistic_cost,
    )
    history.append(ml.logistic_cost(w, b, x_train, y_train))
    x1 = jnp.arange(-10, 10, dtype=jnp.float32).reshape(-1, 1)
    x2 = jnp.arange(-10, 10, dtype=jnp.float32).reshape(-1, 1)
    x3 = ((-b) - w[0] * x1 - w[1] * x2) / w[2]
    plot = ax.plot_surface(
        inverse_normalizer(x1, argnums=(0,)),
        inverse_normalizer(x2, argnums=(1,)),
        inverse_normalizer(x3, argnums=(2,)),
    )
    return (plot,)
I don't understand how do I plot a surface
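A possible reason the snippet above does not give a surface: `plot_surface` expects 2D grids for X, Y and Z, while `x1` and `x2` there are both column vectors of the same shape, so `x3` only traces a line. With NumPy or JAX the usual tool is `meshgrid`. The sketch below builds the same kind of grid with plain lists so the idea is visible; the parameters are made up.

```python
def surface_grid(w, b, values):
    """Build 2D X1, X2, X3 grids for the plane w[0]*x1 + w[1]*x2 + w[2]*x3 + b = 0."""
    X1 = [[x1 for x1 in values] for _ in values]   # x1 varies along columns
    X2 = [[x2 for _ in values] for x2 in values]   # x2 varies along rows
    X3 = [
        [(-b - w[0] * x1 - w[1] * x2) / w[2] for x1 in values]
        for x2 in values
    ]
    return X1, X2, X3


w, b = [1.0, 2.0, -4.0], 2.0          # made-up parameters
values = [float(v) for v in range(-2, 3)]
X1, X2, X3 = surface_grid(w, b, values)
print(len(X3), "rows x", len(X3[0]), "columns")
```

Every (x1, x2) pair gets its own x3 value, which is what turns the line into a plane.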
Ignoring plot_surface vs plot
def animation(frame):
    global plot
    if plot is not None:
        plot = []
    gradient_descend.next_epoch()
    x1 = jnp.arange(-10, 10, dtype=jnp.float32).reshape(-1, 1)
    x2 = jnp.arange(-10, 10, dtype=jnp.float32).reshape(-1, 1)
    x3 = (
        (-gradient_descend.b) - gradient_descend.w[0] * x1 - gradient_descend.w[1] * x2
    ) / gradient_descend.w[2]
    plot = ax.plot(
        inverse_normalizer(x1, argnums=(0,)),
        inverse_normalizer(x2, argnums=(1,)),
        inverse_normalizer(x3, argnums=(2,)),
    )
    return (plot,)
Is the function version neater or the class version neater?
i mean jarvis is what got me into programing that and video games and ive almost completed a full one currently am struggleing with making a custom tts voice
i be happy to share it i just completely reworked how my reward system works
The "Jarvis" in the Iron Man movies is more advanced than the most advanced AI that currently exists anywhere, so when someone asks "how do I make Jarvis?", it sounds like they have delusions of grandeur.
If someone wants to attach ChatGPT to a TTS system, that's a very far cry from Jarvis, but it is doable.
i completely agree with that statement jarvis is not buildable with ai the way it is today
chat gpt isnt even close but im saying ive created a smaller manageable verison
do u agree?
I can't get into all my thoughts about that at the moment
I'm a language AI professional.
honestly id be interested to see what you think of mine if you call it awful so be it yk just feedback is helpful
linguistics?
ye
Does anyone know how to use neat
Does anyone know how to access ChatGPT 3.5??
I only get 4o and 4o-mini even when I'm not logged in, I needed 3.5 specifically for completing a project and now it's gone
You should be able to access it via the API, but as they state in https://platform.openai.com/docs/models/gpt-3-5-turbo
As of July 2024, gpt-4o-mini should be used in place of gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast. gpt-3.5-turbo is still available for use in the API.
so I don't see them using it in the website
Hey guys. How would I be able to detect whether someone is looking at their phone using the phone camera itself
I made the code to detect the eyes
but what calculations or idk ml model can I use for checking whether the person is looking at the phone
Asking chatgpt just told me about the Eye aspect ratio but all that does it check whether you are blinking or not
Any other ideas because the only thing I can get from that is that people dont blink as much looking at the phone but thats not close to good enough
And It will potentially never be possible
Even LeCun said that AGI is an illusion but OK
Do you want to elaborate on those ideas?
That's kinda a pb that you don't know who that is
Some people argue that human intelligence isn't general intelligence either, it's just a powerful narrow intelligence
And that intelligence can't be general in principle
I think that's kind of pedantic in this context but it could be a useful distinction sometimes
What's a pb?
Hum kinda, being able to evaluate the quality of an output for example, better context in a unique scenario etc..
Except, I hope, you can use logic
IMO humans can't inherently use logic either, we learn to use basic logic over the course of years of training
And humans make basic mistakes in logic all the time even as adults
I don't know what a convolution kernel is, I'll read about that
we can still use it, there is almost none in LLM now except a sequence on trained data
bro can u tell me how to implement chatgpt api in a discord bot
You can have context, there is self attention and all, but at the end, we can't say it's really intelligent, without training on terabytes costing millions of dollars it's not that clever
Seems to help : https://github.com/antoinelame/GazeTracking
Only problem im running into is a module not found error
after git cloning it
idk why
Not exactly but a lot of useful functions
I would agree that LLMs in their current form don't have "true reasoning" (or they are extremely bad at it), I can't really define what true reasoning is but it's a "you know it when you see it" sort of thing
I saw something nvidia was working on but it's not released to the public
I'm very excited to see the field in the next decade though, things could flop or go in a very interesting way
Yeah it's unfortunate
Btw, it's kind of important to know who made the things you use. Yann LeCun is pretty much the father of CNNs and one of the very top experts
But tbh the "AI that replaces swe, DS" etc... Is a fantasy at the moment for me, we can have the time to be retired before this happens
you have to take researchers' opinions with a grain of salt though
Witten is a genius, but he hasn't really been correct in many ways
I still prefer to take researchers' opinions over those of Nvidia's CEO or Elon Musk
i do agree with LeCun though, but AI's hype isn't really related to AGI
it's because there is a lot of money for large companies, and they dont need to get to agi for it..
scaling up models will make them better (read as: more useful and a replacement for more jobs.), even though it won't make them agi
I'm trying to create a vehicle detection model. As of now it detects, counts, and tracks vehicles using Deep SORT, with pretrained YOLO weights. Now I want to record which path each car takes (for example, at a junction). Any suggestions?
For time series is it necessary to transform 8:50 a.m. July 25, 2024, into year: 2024, month: 7, day: 25, hour: 8, minute: 50 ? (so 5 columns instead of 1)
some models might work if you pass it as a single timestamp, but you must transform it to a number for the model to work with it at all
So you can't just keep it as a datetime, at least need to turn it into unix time
remember that in the end of the day ML models are just ridiculously complicated functions that turn one sequence of numbers into another
they can only work with numbers - anything else must be encoded as numbers (even text, images, videos)
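A minimal sketch of the two encodings discussed above (one Unix timestamp vs. separate calendar columns). The function names are made up for illustration, and treating the naive timestamp as UTC is an assumption, since the question gives no time zone:

```python
from datetime import datetime, timezone


def to_unix_seconds(ts: datetime) -> float:
    """Encode a datetime as one number (seconds since 1970-01-01 UTC)."""
    # Assumption: naive datetimes are interpreted as UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.timestamp()


def to_calendar_features(ts: datetime) -> dict:
    """Encode a datetime as separate numeric columns."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
    }


stamp = datetime(2024, 7, 25, 8, 50)       # 8:50 a.m. July 25, 2024
unix_seconds = to_unix_seconds(stamp)      # one column
features = to_calendar_features(stamp)     # five columns
```

Which one works better probably depends on the model: the single number keeps ordering and spacing, while the split columns make things like hour of day directly visible.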
Agreed. The human brain needs a fraction of the energy and training data for new tasks
We do multimodal, multitask few shot learning by default
Ehhh, I don't know about the training data
How much training data has a 3 year old had to be able to detect edges of objects and identify them as separate objects from something else?
I'm too lazy to type out the entire premise but we simply need way less data
To learn new tasks
Because we don't start from 0
You aren't counting all the frames in this baby's lifespan before that
I come pretrained
So do all foundation models
She's pretraining as well
You need a baby with 0 seconds out of the womb
How much training data do they have compared to a 3 year old though? That's what I'm curious about
I think way less
Ok but that's the uninteresting premise
It teaches us very little
Being able to adapt foundation models with a similar amount of data for new tasks (few shot or even zero shot learning), that's the interesting premise imo
In her thesis she "trains" humans to do a certain task with abstract shapes (so humans don't have a big bias)
And does the same with CNNs
IMO that would have to be performance above human level when it comes to performance to data ratio
There's always the asterisk you can put on ML/DL studies "did you use the right hyperparameters" etc.
But the results are quite clear that humans have no problems in finding the decision boundaries correctly and interpolating/extrapolating to other abstract shapes
Whereas NNs struggle to reason in that space
The same NNs that can outcompete humans on say the imagenet dataset
Right now we're at "NNs can beat us humans at many tasks if we're willing to spend €$£ to create (or finetune) a specialized model for each task"
How is that contradictory? That's two different things
Both are important
how the hell have i managed to fuck up like this
a bad dataset, and terrible training
It's not a goal to know every scientist or whatever, but he is one of the top experts. That doesn't mean everything he says is pitch perfect, but it's always good to know what the most knowledgeable person in a field says about it rather than speculating as simple students/practitioners
He's literally written articles, whatever
Hey guys, I wanna get into medical field but I love AI and math as a whole, but I don't know if it will be worth it or not. Should I keep doing it?
what are you asking would be worth it or not? going into medics, or AI?
Reasoning by quotes while not wanting to know the scientists behind what you use looks funny
i still wanna do medical stuff as my career, but I wanna see if AI is worth my time
i just wanna know if it's still good to learn
I would always encourage people to learn about things that interest them. but you would need to study AI full-time for a few years for it to meaningfully impact what you can do as a medical professional, and you probably don't have time for that if you're going to go into medics.
I have like 4 years before I go to college, is that sufficient time for me to at least do some meaningful AI research?
Thank you!
you won't be able to do any meaningful AI research as a high school student.
unless you can get a summer research internship. I used to work in a lab that took high school interns, but those people wanted to do AI as their primary career focus.
to be clear, I think that you should read/watch more about AI and try implementing basic models, but for your own edification.
research as in personal gain of knowledge, or at a professional level where you publish a thesis?
gain of knowledge and i wanna advance research in AI for biomedical stuff
of course you can do it if self-paced, there's no age limit or an age to begin researching, but publishing and thesis work are at PhD level, so it might be challenging to do right away
@dry raft to clarify, do you want to be a healthcare provider (doctor, surgeon, etc) or some kind of biomedical researcher?
it might be likely that they want to use Ai to improve health care systems
I wanna do neurosurgery, but I wanna help with AI advancement in medical field.
Like disease detection, protein folding, etc.
yeah that could work, start with computer science first as it is shorter than the med path
if you have pre-med programs in your country you should take compsci
yeah, i'm learning linear algebra by stelercus rn
btw thanks stelercus, I'm starting to understand stuff more!
btw i'm an incoming freshman
this person has 4 years before going to college, 4 years is largely enough if they start learning data science and algorithms first
so it is literally possible
thank you man
before you know if something can happen or not, you gotta try first
right now i fw image classification models and do linear algebra
well thats a good start
I want to reimplement a paper sometime soon tho
But I also need to learn some basic pytorch
i get what you mean, but the field of AI they're interested in is related to their main job, which is doable since it's not 2 completely different things
And then you don't forget it either as new data arrives (no catastrophic interference). Other important difference is that the order in which you see them matters, it's not like deep learning in which we randomize our order of inputs (and resample them randomly). (Some) neural networks also lack statistical consistency, you could give two of them the same infinite data and they may never arrive to the same conclusions (initialization matters a lot (and again the order which they are given)).
(Most ML methods really want statistical consistency, for obvious reasons)
About the ordering. There is in-order MNIST, which is sorted instead of shuffled, deep learning completely flops on this (and it can't be fixed by sticking to those methods, although many have tried).
Also note there are no epochs, since it's a single pass through the data.
(This means you can't plot a nice graph for it learning, so you can't directly compare it to deep learning papers)
(not being able to directly compare things prevents a lot of people from ever trying it, it's much easier to have something directly comparable when you are forced to publish (this locks in a lot of science, not just ML, and it's holding back progress))
I need help with some simple code for ai. Please respond if you can help. I am willing to pay. Thanks!
Hello, you are not allowed to offer payment here.
When you ask for help, always give the information needed to help you in the same message.
Am I allowed to say that lol
ohh okay
I have a question about an assignment for a summer educational class on AI, but I haven't even taken machine learning.
It looks relatively simple but the professor never taught us anything about how to code ai.
what are students in this course expected to already know?
He said "It will be for everyone"
nothing
Are all the questions like this? Maybe some questions are optional and you aren't expected to get all the points
are they expected to know python?
This is way too open ended, allowing it to range from trivial to the hardest unsolved problems.
not really because another student in the class never wrote any code in his life
okay. well tbh, I would just ask ChatGPT to produce code for the simplest solution to the problem. because it's unreasonable for the instructor to ask you that.
he said he would check for people using chatgpt
if you have no idea how to approach this problem, it would take several hours of someones time to walk you through all the concepts needed for you to solve it.
This is why I majored in math and not comp sci
I see
do you already "know python"?
Yes I took the same professor for introduction to programming and I made an A
It was all python
What is the most complex Python program you have made?
I made a sudoku solver in python
But ai is different
I asked the best programmers i know but they said they never learned ai
Since the question did not specify which library to use, nor what the quality of the generated images need to be, you can really do anything.
Random noise is a valid submission.
It did not say the AI needs to be good...
Data preprocessing: ignore all data. Model architecture: random uniform noise generator. Training and evaluation: no training, no evaluation. Generated sample visualization: convert the noise to an image using Pillow (or some other image library).
Alternative: it just stores the entire data set and spits out one of them as the generated image.
(comp sci is math, they are just being coy)
LMAOOO
ngl u can prob just fine tune an onnx
model and be like "transfer learning"
how do i do that
a lot of online models tend to have their weights in onnx format so just download weights and load into favorite framework and u can fine tune as needed
Can u help me
what do you think about this post https://stats.stackexchange.com/questions/203288/understanding-almost-all-local-minimum-have-very-similar-function-value-to-the
It would be interesting to read about people's opinions on this, especially in regards to more modern things going on
I myself don't know enough math to understand this
nice. do you understand it conceptually? (approximately)
I think so?
I think it says that with large enough models, you can find the local minimum instead of the global minimum and it works fine
So not the place where the function's output is at its lowest but just one of the places where it's lower than everything around it
And there are many of these points and it's way easier to find them than the global minimum
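To make "local vs global minimum" concrete, here is a toy sketch: plain gradient descent on a made-up 1-D function with two valleys. Everything here (the function, learning rate, start points) is an assumption for illustration. It only shows that the start point decides which valley you land in; the claim that the valleys of large networks are about equally good is about high-dimensional losses and is not demonstrated by a 1-D toy.

```python
def f(x: float) -> float:
    # Toy 1-D "loss" with two valleys; the left one is slightly deeper.
    return (x * x - 1.0) ** 2 + 0.2 * x


def grad_f(x: float) -> float:
    # Analytic derivative of f.
    return 4.0 * x * (x * x - 1.0) + 0.2


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Follow the negative gradient from x0 for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # ends in the deeper (global) valley
x_right = gradient_descent(2.0)   # ends in the shallower (local) valley
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient is (almost) zero, but only the left one is the lowest point overall.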
But then there's some other stuff with the saddle points which I didn't get
yes, quite a striking property
nice one line summary
with large enough models, you can find the local minimum instead of the global minimum and it works fine
i'd say 'local minima' instead of 'local minimum'
Oh okay
(just bc there are many normally)
And global minima too?
i'm unsure what's the convention there !
another phrasing would be 'you can find any local minimum...' maybe
Hi guys
Please correct me if I am not allowed to comment here.
I am a complete newbie.
Is there someone who can help me learn how to build with AutoGen from Microsoft? They have a GitHub page, you can google it. They have tutorials, but I am a beginner and they are hard to follow. I would also like to adjust the code for my use case.
It would also be cool if you play LoL or have a PlayStation console, so we can combine coding with playing and having fun
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap - microsoft/autogen
just remove the 'pay' part, it's not allowed iirc
(it's not that we don't need money.)
🤣
I mean I can buy LOL skins instead of directly transferring
idk anything about it, you could post it in some freelancing platform
probably others here do know though, so wait jic
this plot compares how optimisation algorithms behave near a minimum (or a saddle point), it's really nice https://web.archive.org/web/20160725050440/https://imgur.com/a/Hqolp
Is it related to this^?
it's related to bengio's hypothesis
i.e. higher-dim spaces (in other words nets with many neurons) have a proliferation of saddle points that optimisation algos struggle to get around.
pretty sure there is more recent research on this
Heeey!!!
Good Morning!
Does somebody know the best ML (machine learning) engineer roadmaps to base my studies on?
I appreciate any help
Oooh cool, I'll save the link and check it when I have the time! Thank you for sharing
you are welcome
scan through the pinned messages at top right corner if you are on desktop
(...) both articles believe that in order to reach the global minimizer, one must overcome the optimization challenge of saddle points. The first article just believes that local minima are good enough.
One common folklore belief is that the optimization landscape is similar to that of an egg carton,
XD
@rich moth Hi mate, any chance you can help here?
idk exactly why, each egg pocket should represent a minimum in the analogy, but OP says "that's wrong" in the next paragraph
maybe depends on how OP imagines the box...as some don't have saddle points
yeah what happened?
I need help with this:)
come on ping that , nvm I will check that
This
They have a tutorial on github. Just a sec
at what point you are having problem?
https://github.com/microsoft/OptiGuide
Maybe you can guide me
Large Language Models for Supply Chain Optimization - microsoft/OptiGuide
I don't think so, but wait someone will def.
I know that for anyone who knows Python it is an easy task. Not for me as I am a beginner
in python also?
you are a beginner and directly hopping into AutoGen?
Yes, just check the link they have step by step guide
In python yes. I have R knowledge but its irrelevant
yeah so wait , someone will def. respond to this!
what inputs are you giving to it?
dataset?
link that!!!!!
okay got it
it was the first one
now send the layer code!!
model code
yes
dataset is good, for inference which picture are u giving to it?
( to test the model)
Like I have partitioned it from the same dataset. So technically from it
like for training, val and testing
the dataset is not labelled I guess right?
like which photo has which characters
so what about loss?
Ops sorry. This was the dataset. I sent the wrong link. I changed it - https://www.kaggle.com/datasets/akashguna/large-captcha-dataset
wait wait, what about ur last fc layer??
I think there is a mistake there
Yes exactly . I felt so .
Epoch 99: Test Loss 0.5723
Any idea as to why we use -e . ?
Apparently it should output for 5 characters but it is doing only for one? is it?
why 4 conv2d layers??
and 3rd and 4th are same I guess
3rd converts -> 128 - 256
4th converts -> 256 -256
why>
256 - 512?
it turns the normal installation into an editable installation
yeah!!, but still why 4 layers??, I mean you can but then you have 2more fc layers!!!!
because images are not that complex, ( to add more layers)
maybe for now work with 4 and add 2 fc layers
Wait, so if I change the code in any of my files, I don't need to reinstall the package with the updated code? It's automatic, if that's the correct word?
Oh . I was trying to experiment with it. Not sure but will it over train? Also two fc layers?
and fc1 should be like
fc1 -> 512 - 256
fc2 -> 256 - classes
okay so if you want you can try with one fc layer also!
yeah!
@hard shuttle make changes and send again the model code
I see. What about num_class * num_char?
don't worry about overfitting we have dropout layers if required!!,
first it needs to predict something
okay
they are the last layer! wait lemme focus first!
suppose the captcha have this word -> zvQn
now you have inserted this into model , but the thing is
your model will analyze the picture
and "classify" into only one letter!!!!
here is the catch!!
that's why it was giving u only one letter!
shit happens!
yes. I tried it with val dataset and test sets. Its only predicting one letter. Not all
yeah, which means it is predicting one letter!! but we are giving it more than one!!
and the model is classifying that!!
Oh wait. I meant it was giving 5 characters but only one of them is correct. Let me show you an example
heh?
be clear for yourself first and provide correct info
Sorry. There have been instances where it predicted only one letter of the captcha correctly. It does give 5 characters as output, but only one of them being correct was just a random occurrence. I just tested it again and the issue persists: none of the letters of the captchas from the val or test dataset are correct.
can you give screenshot of it?
The current output is - Actual - 3Cj81 ; Predicted - 1l2M7
ignore accuracy for now, because there is mistake in model!!
we just need to find that , the model is trying to classify only one letter or whole word?
which is good!!
now need to make changes in model
send the current code for model
But you did find an error in the model and asked me to make changes, right?
alright
just the model code?
lemme explain briefly
suppose you created a package (a Python package in a separate dir), and now you are using it in your Python code
normally, if you make changes in the package, you have to reinstall it before your code sees them
but when you install it as editable, your code imports the package straight from its source directory
so any change to the package's Python files is reflected everywhere you use that package, without reinstalling
yeah that layers
is this fine to paste like this?
use this [ ` ] 3 times and then write py
and at end again 3 times
what is input size of image?
.
dimensions?
256*256
yes
which?, just curious for time taken to train as per epochs
1 epoch took 60-80 secs
import torch
import torch.nn as nn


class NewCNN(nn.Module):
    def __init__(self):
        super(NewCNN, self).__init__()
        self.num_class = 62  # 10 digits + 26 lowercase + 26 uppercase
        self.num_char = 5    # 5 characters per CAPTCHA

        # Four conv blocks; each 2x2 max-pool halves the height and width.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.MaxPool2d(2, 2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 128, 3, padding=1),
            nn.MaxPool2d(2, 2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 256, 3, padding=1),
            nn.MaxPool2d(2, 2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 512, 3, padding=1),
            nn.MaxPool2d(2, 2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
        )

        # 256x256 input halved four times -> 16x16 maps with 512 channels.
        self.fc1 = nn.Linear(512 * 16 * 16, 512)
        # One block of num_class logits per character position.
        self.fc2 = nn.Linear(512, self.num_class * self.num_char)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 16 * 16)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x
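A quick sanity check of where the `512 * 16 * 16` in `fc1` comes from, as a sketch in plain Python (no torch needed). The helper `conv_out` is made up for illustration and uses the standard output-size formula for conv/pool layers without dilation; the 256x256 input size is the one stated earlier in the chat:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256                      # input height/width stated in the chat
for _ in range(4):              # four conv + pool stages
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps size
    size = conv_out(size, kernel=2, stride=2)             # 2x2 max-pool halves it

flat_features = 512 * size * size   # channels * height * width fed to fc1
print(size, flat_features)
```

If the input resolution changes, rerunning this gives the number to put into `fc1`.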
which GPU?
gpu-4070
and yeah try now by this code
use this
okay
One more thing. The dataset
My test set contains 824 samples and the training contains 3k samples
is it fine?
I know this is wrong but are we supposed to split the dataset just by copy/paste it manually?
yeah, what if you have millions of images?
for smaller it is fine
but well created dataset comes with split!
yes apparently that dataset has 80k images. I only took one portion of it
are you training or not?
Can i train. The split is irregular i suppose
Can i start?
no it's fine
bs - 128
you can use early stopper
I havent implemented that
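For reference, an early stopper can be very small. This is a generic sketch, not tied to the training code in this chat: the class name, `patience`, `min_delta`, and the loss values are all made up for illustration.

```python
class EarlyStopper:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # improvement: remember it, reset counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement this epoch
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training loop.
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

In a real loop you would usually also save the model weights whenever `best` improves, so you can restore them after stopping.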
it should for first time!
I see
It is a problem if the test loss is not completely below the train loss right
the goal of training model is to reduce the val loss, not to compare train loss and val loss!!
Yeah but the final val loss should be below train loss?
Is it supposed to be like that?
ignore that !!, also plot first both losses
Okay
Can i do that while its on training?
what is current epoch?
6
stop that
alright
ignore
Started
give me code for how are you calculating training and val loss
Too long to send. Ill send you ?
It's not working after clicking paste
after clicking paste, copy the url and give me that
try this-
https://paste.pythondiscord.com/BTAQ
at 5
Is it for the next run?
yeah..
https://stackoverflow.com/questions/66220774/difference-between-the-input-shape-for-a-1d-cnn-2d-cnn-and-3d-cnn
truly great, thanks
I haven't really been able to find good research papers on this topic, but what is the point of polynomial feature expansion in the context of nonlinear features, and if my features have nonlinear relationships, would there be any point in using polynomial feature expansion before using a nonlinear dimensionality reduction like KernelPCA?
So you mean that you're using a method that is capable of finding non-linear relationships and you're wondering why you might do polynomial expansion (or anything else) before?
Can you confirm that's the question? If so, I like it
i would say the strongest motivation is that polynomials have strong theoretical guarantees regarding their approximation capabilities for functions satisfying more or less mild regularity constraints
you can both interpret what you're doing and know how bad the error is going to be
you can also endow polynomial bases with nice properties pretty easily
ofc if you have a deep enough network you never need to do any of this type of pre-processing explicitly. but if you want to use fewer layers, you can treat poly feature expansion as "model-based machine learning" where you know that what you're doing is approximating a function with a low order poly, and then feed that into a network that does other stuff with those features. similar to how wavelet decomposition can shave off several layers in models dealing with data that lends itself well to approximate low dimension wavelet decomps
what comes to mind is the weierstrass approximation theorem, and you can probably find more about poly feature expansion in papers from like 1980~2000 because it's what i'd call a "classical signal processing technique"
to address your question more directly, i would say the point of poly feature expansion is that its behavior in classical settings is well known, but ideally one should have good reasons to believe that a poly expansion will work well for the problem. you're then left with the problem of which basis and what order of poly to use. the first isn't super important. for the latter, you can pick a modest number and then do feature selection/dim reduction to keep the terms that contribute the most
roughly the same as with any other "encoding" approach
I was going to give the basic example of decision trees. They can approximate polynomials
But they're greedy, if you don't form that polynomial explicitly ahead of time as a feature you have no guarantees it'll do the 4-5 consecutive splits you may need (depending on the case)
Explicitly doing it gives your model space to learn other, more relevant things, say interactions with your polynomial and other features
Neural nets are definitely more opaque but the same idea holds there, decision trees are an easy way to make it "visual"
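For anyone following along, a minimal sketch of what "polynomial feature expansion" produces, in plain Python. The function name is made up for illustration, and it leaves out the constant term (libraries often include it as a bias column):

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Return all monomials of the inputs up to `degree`, without the constant term."""
    expanded = []
    for d in range(1, degree + 1):
        # Each index combination is one monomial, e.g. (0, 1) -> x1 * x2.
        for combo in combinations_with_replacement(range(len(row)), d):
            expanded.append(prod(row[i] for i in combo))
    return expanded


row = [2.0, 3.0]                      # two original features: x1, x2
expanded = polynomial_features(row)   # x1, x2, x1^2, x1*x2, x2^2
print(expanded)

# The feature count grows quickly: 28 inputs at degree 2 give 434 columns.
n_columns_28 = len(polynomial_features([1.0] * 28))
```

That growth is why the advice above is to pick a modest degree and follow it with feature selection or dimensionality reduction.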
giving a shot at this if smone wants to join: https://arxiv.org/pdf/1412.0233
umm... so I am starting out with PyTorch and I have a small doubt: is there any rule for how many hidden layers we can make? I see many people adding 3 or 4, so why only 3 or 4 and not maybe 50, since the more the merrier? And what are the cons of using too many hidden layers? Could anyone please explain?
Does anyone know pyspark + NLP?
need help
PS- it will also be helpful if the people who have some idea can tell me an alternative of handling large data for analysis and training models
if there was a "limit", there would be no LLM's
The people who said that maybe meant with respect to Model complexity vs amt of data
there are many other things to consider when training a neural net, like
overfitting, increased computational cost (money vs. prediction quality), and training time
optimization gets harder as we add more layers because there are lots of local minima (advanced optimizers may help, but I think you would still have to iterate over random initializations to find the best minimum), and vanishing gradients are another issue...
damn, i explained more than stack overflow

HELP ^
ah makes sense lol so mostly it's just a hit and trial thingy
well they gave some rules imma look into that later when i have a large dataset ig
vanishing/exploding gradients
ouh so the gradient just disappears?
gradient gets too small (so converges too slowly), or too big (so steps are too big, never converging)
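A heavily simplified sketch of why depth causes this: by the chain rule the gradient reaching the first layer is a product of one factor per layer. Assuming (unrealistically) that every layer contributes the same scalar factor:

```python
def gradient_scale(per_layer_factor: float, depth: int) -> float:
    """Product of `depth` identical per-layer derivative factors (chain rule)."""
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_factor
    return scale


vanishing = gradient_scale(0.5, depth=50)   # each layer halves the gradient
exploding = gradient_scale(1.5, depth=50)   # each layer grows it by 50%
print(vanishing, exploding)
```

With 50 layers the first product is practically zero and the second is in the hundreds of millions, which is the "too small" and "too big" described above. Real networks have matrices instead of scalars, but the multiplication effect is the same idea.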
ouh okay
thanks for that
more complex models like lstm and transformers use techniques to avoid this problem, allowing them to get very deep, and the results are pretty good
what techniques ?
it's how they're built (I don't think I have the ability to provide a clean explanation)
oh its alr i havent reached that part yet so in future
Hi all, I'm trying to move the legend of a seaborn plot to the bottom, below "bill_length_mm", but I don't understand how bbox_to_anchor works...
Thanks in advance
looks like (0,0) is the lower left corner and (1,1) is the upper right corner of the image, and the lower left corner of the legend is placed there
Alright, but what does the (.5, 1) means?
And what do I have to do to place the legend at the bottom instead of the top
as i said, it's the relative location where you want to place the legend
Then I would use (0.5, 0) to place the legend at bottom?
Assuming the legend would "push" the plot to the top
If that makes sense
that's a good question, you'll have to try and see. it might be that you need to use a negative y coordinate instead of 0
bbox_to_anchor=(0.5, 0) puts the legend over the plot, and bbox_to_anchor=(0.5, -1) puts it way below the plot
So I think something like bbox_to_anchor=(0.5, -0.1) would work
bbox_to_anchor=(0.5, -0.3) worked well, thank you for your help!
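For later readers, a small matplotlib-only sketch of how the two arguments interact. The data and labels are made up; the point is that `loc` chooses which point of the legend box gets pinned, and `bbox_to_anchor` says where to pin it in axes fractions. As far as I know, seaborn's `move_legend` accepts the same `loc` / `bbox_to_anchor` pair.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="series a")   # made-up data
ax.plot([0, 1, 2], [0, 2, 3], label="series b")
ax.set_xlabel("bill_length_mm")

# loc picks which point of the legend box is pinned; bbox_to_anchor says where
# (in axes fractions: (0, 0) = lower left of the axes, (1, 1) = upper right).
# Here the legend's top centre sits a bit below the axes, under the x label.
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.3), ncol=2)

# bbox_inches="tight" grows the saved image so the outside legend is not cut off.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", bbox_inches="tight")
```

The legend does not push the plot up by itself; it is drawn outside the axes, so either save with `bbox_inches="tight"` or make room at the bottom of the figure.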
you can counteract this with skip connections
I'd say the reason why people don't go very very deep is occam's razor tbh
But in a practical sense, it's better to train a large model and remove layers than do the inverse
So yeah, build a large network and then shave off as necessary
the largest issue you'll have is that training will take much longer and you'll overfit (but you can counteract this with dropout etc)
idk what dropouts are yet
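Rough idea of dropout, as a plain-Python sketch (the "inverted dropout" variant; the function name and values are made up for illustration, and real frameworks do this on tensors): during training, each activation is zeroed with probability `p` and the survivors are scaled up so the expected value stays the same; at evaluation time nothing is dropped.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)           # evaluation mode: identity
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)                # seeded only to make the run repeatable
activations = [1.0] * 10
dropped = dropout(activations, p=0.5, rng=rng)            # mix of 0.0 and 2.0
unchanged = dropout(activations, p=0.5, training=False)   # same as the input
print(dropped)
```

Because a different random subset of units is silenced on every step, the network cannot rely too heavily on any single unit, which tends to reduce overfitting.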
From a practitioner's pov I rather have 5 normal layers than 50 layers with dropout, batchnorm, skip connections, L2 regularization, ...
ouh i'm still learning those terms
I add a lot of jargon on purpose, not to intimidate you but so you can google the terms
hehe its fine sooner or later i'll learn those terms dw and yea thanks for the clarification !
the ELI5 version of my answer is this: big network => overfit and takes longer to train. You can make it not overfit with tricks but that's more effort than a small net
BUT when starting out on a new task going with a big network is smart, then you can check if you can at least fit the training set correctly. If you can't do that maybe you have a bug
This is less important than my first point though, I'd say this is at intermediate level
yea i'm like trying to do that just to see what happens when i train the model with more layers (practical approach) hehe okay
You'll probably overfit after some epochs
lol idk it became saturated after some epochs
Yep, exactly. I have 28 features and I believe it's enough to directly run dim. reduction, but Im not sure what additional advantages polynomials would give
- Many local minima shouldnt be a problem https://arxiv.org/pdf/1412.0233
- and most random initialisations should get to different minima, but similar in quality..(same article.)
- vanishing gradient isn't that frequent of an issue using ReLU iirc. This depends on the arch. though..
What is your task
oh im just running through a project where i'm trying to classify heatwave risk using demographic factors
Sounds like good ol' gradient boosting does the trick
I'd run that without any feature engineering to set a baseline
yep, so with just the base MLP classification it got ~96 training ~92 validation
My apologies for a stupid question.
But do 1/m and 1/2m make any difference?
Can I mix them together? E.g. 1/m for cost, 1/2m for regularization?
thanks!
.latex
\[\frac{1}{m}\sum_{i=1}^{m} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^{2}+\frac{\lambda}{m}\sum_{j=1}^{n}w_{j}^{2}\]
\centerline{vs}
\[\frac{1}{2m}\sum_{i=1}^{m} (f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^{2}+\frac{\lambda}{2m}\sum_{j=1}^{n}w_{j}^{2}\]
Nice
I assume it makes no difference except that it makes differentiation easier?
would this be a valid implementation for regularization?
(cost+jnp.mean(jnp.pow(w,2)))*lambda_
Nope
cost+(jnp.mean(jnp.pow(w,2)))*lambda_
no, those have the same minimizer
it's just so that the derivative has no dangling 2 in front
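A small numeric check of the "same minimizer" point, as a sketch with made-up data and a 1-D model f(x) = w * x (bias left out to keep it short). Multiplying the whole cost by 1/m or by 1/(2m) only rescales it, so the best w does not move:

```python
def ridge_cost(w, xs, ys, lam, scale):
    """scale * (sum of squared errors + lam * w^2) for the 1-D model f(x) = w * x."""
    sse = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
    return scale * (sse + lam * w * w)


xs = [1.0, 2.0, 3.0, 4.0]          # made-up data
ys = [2.1, 3.9, 6.2, 7.8]
lam = 0.5
m = len(xs)

grid = [i / 1000 for i in range(0, 4001)]   # candidate weights 0.000 .. 4.000
w_full = min(grid, key=lambda w: ridge_cost(w, xs, ys, lam, 1 / m))
w_half = min(grid, key=lambda w: ridge_cost(w, xs, ys, lam, 1 / (2 * m)))
print(w_full, w_half)
```

Mixing the two (1/m on the error term, 1/2m on the penalty) is still a valid cost, but it is the same as using lambda / 2 in the first formula, so the lambda you tune means something different.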
if that's the case, this is ok, right?
Assuming the cost is calculated already
that'd work regardless of whether you put the 1/2 or not as long as the lambda is chosen correctly, sure
I was a bit concerned as I am using mean instead of doing the \frac{\lambda}{m}
it's the same thing
i would have some numerical concerns cuz idk how numpy and jax compute the mean. hopefully the 1/m is multiplied first, otherwise you can have overflows
other than that, the expression itself is correct
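One detail worth spelling out about "it's the same thing", as a plain-Python sketch with made-up numbers: `mean(w**2)` divides by the number of weights n, while the formula divides by the number of training examples m. The two penalties have the same shape and differ only by a constant factor, which can be absorbed into lambda:

```python
def penalty_formula(w, lam, m):
    """(lam / m) * sum_j w_j^2, as in the formula with m training examples."""
    return lam / m * sum(wj * wj for wj in w)


def penalty_mean(w, lam_):
    """lam_ * mean(w^2): divides by the number of weights n, not by m."""
    return lam_ * sum(wj * wj for wj in w) / len(w)


w = [0.5, -1.0, 2.0]     # made-up weights, n = 3
m = 100                  # made-up number of training examples
lam_ = 0.1

# The lambda that makes the formula agree with the mean-based penalty.
equivalent_lam = lam_ * m / len(w)
print(penalty_mean(w, lam_), penalty_formula(w, equivalent_lam, m))
```

So the mean version is fine as long as lambda is tuned for that version; a lambda copied from a course or paper that uses the 1/m form would need rescaling by m / n.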
Should I do mean or sum then divide it?
summing first is what you don't want to do
Convention-wise
so the question is what jax does automatically with mean()
imagine all of the values are, say, a couple hundred million each
and you have several millions of them. the sum will overflow, but the mean can be computed by dividing first
these things make a difference in ML due to the size of the problems
jax/_src/numpy/reductions.py line 743
def _mean(a: ArrayLike, axis: Axis = None, dtype: DTypeLike | None = None,
at any rate, the code and math are technically correct, it's just that the computation breaks down at different places depending on the implementation
return lax.div(
    sum(a, axis, dtype=computation_dtype, keepdims=keepdims, where=where),
    lax.convert_element_type(normalizer, computation_dtype)
).astype(result_dtype)
Sum, then division?
So should I do it manually
If so, by convention, I should also use 1/2m, right?
whatever you prefer is fine, really
Thanks!
jnp.sum(a/a.size)
would this avoid overflow?
yeah
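The overflow argument can be reproduced in plain Python, as a sketch with made-up values. Python floats are float64, so the numbers have to be huge here; with float32 arrays the limit is much lower (roughly 3.4e38), but the mechanism is the same:

```python
import math

values = [1e308] * 4          # each value is finite but close to the float64 limit
n = len(values)

# Sum first, then divide: the running total overflows to inf.
total = 0.0
for v in values:
    total += v
sum_then_divide = total / n

# Divide first, then sum: every intermediate value stays in range.
divide_then_sum = 0.0
for v in values:
    divide_then_sum += v / n

print(sum_then_divide, divide_then_sum)
```

The trade-off is one division per element instead of one in total, and for very small values dividing first can cost some precision, so this is a choice rather than a strict rule.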
!rule 5
5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.
what was malicious?
Yes there are a lot of dependency
But good to know the part that most local minima are similar in quality.
attempting to bypass CAPTCHA
ohh!
@hard shuttle
Want to make a test dataset to evaluate different approaches to STT - is there any reason why I couldn't generate the dataset with TTS? Then maybe chop & screw the output with something like this? https://github.com/iver56/audiomentations
Seems like a relatively quick way to get a labeled dataset to evaluate some simple cases?
Still, just try a basic xgboost
I tried testing it before, but I just couldn't find a good way to prevent validation overfitting
Hmm but ig I'll try again
Do a hyperparameter search on the number of trees
Reducing that should prevent overfitting
For the record, I call all gradient boosting algorithms "xgboost". Which implementation are you using?
Actual xgboost or?
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html I always use this one. You can get away with tuning just max_iter
L1, L2 regularization, max_depth, min_samples_split etc.. And you'll be good
got it, thanks so much!
yeah, that's if their assumptions are valid
I really want to talk with a person who is a legit data analyst because I have questions to ask for a data analysis project portfolio
Just ask your questions immediately please https://dontasktoask.com/. Makes it a bit easier on whoever wants to answer
What exactly should you add to your resume to get a good data analysis job? Also, please tell me what things make you a "special" data analyst
Hello, I want to know roadmaps for AI/ML/DS/DA.
Kindly give me some sources for roadmaps and tutorials
click on that icon at the top of the app (see image below) @eager plume
Yo. About to build a PC for neural nets and AI, so I wanted to ask what's the best CPU for this specific sphere. Currently I don't really understand the difference between AMD Ryzen, Intel Core i, and Intel Xeon. Could anybody explain which one would be the better option here?
Motherboard: A620M AM5
GPU: Gigabyte RTX 3070 (CUDA support)
RAM: DDR5 16-32GB
Storage: SSD 250GB, HDD 1000GB
PSU: 650W
I was thinking about an AMD Ryzen 7700 or an Intel Core i7, 9th-11th gen
But some say Xeon is quite a bit better for massive data, so I don't really get it
intel xeon has some sort of acceleration.
ppl just go for a good GPU since it's more reliable (everyone has to support nvidia.)
see for example https://blog.tensorflow.org/2023/01/optimizing-tensorflow-for-4th-gen-intel-xeon-processors.html
with a cuda-supported GPU, it'll only help some.
Is there CUDA-equivalent for AMD?
ROCm, but it's not exactly accurate to call it an equivalent from what I hear (CUDA is better)
there seems to be a sort of tutorial here https://hub.docker.com/r/rocm/rocm-terminal/
cuda's just got a massive head start on everybody
cause no one except nvidia invested this much into gpu computing
then the AI boom and everyone's rushing to replicate what nvidia's got, obviously it's not that easy
Normally, since the overhead of having the code, gathering training data, the ML libraries failing etc is so high, people go for the easy path.
an interesting middle ground is macOS Metal (M1s and M2s with Apple silicon); for example i got a 2nd hand M1 (actually my prev boss gave it to me, but they're cheap), and it's quite nice for simple stuff.
Tbh CUDA may be more efficient but generally computing on AMD GPUs is cheaper, so normally people can weigh up the extra speed of CUDA vs the cost savings of running on AMD
i mean, if that were the case google would be supporting it
this is very true (iirc where I'm at for 16gb vram AMD's literally 0.75x the price :|)
though I think I also hear about ROCm compat issues especially on windows?
Also worth mentioning, all CPUs have this. It is just SIMD operations. Generally speaking though the AMD chips perform better in this aspect
Probably, I believe torch support for it is still experimental to some extent.
At work we just train on Nvidia GPUs and then convert the model to ONNX which has good ROCm support.
In general I'd say:
- If you don't want to fuck around with AMD drivers and hardware support with ROCm, go with Nvidia GPUs, also I think the AMD gpus are still a little bit lacking, that being said the newer gen being released (soon? now?) seem promising with their increased graphics power.
- AMD Ryzen Zen4 CPU, it will just serve you much better than Intel will. Especially in terms of upgradability and performance.
- Get a bigger SSD, 250GB is nothing nowadays and you will feel it
also 650W for that GPU and CPU is probably going to be low, or at least very close to max capacity
just for completeness... https://en.wikipedia.org/wiki/OneAPI_(compute_acceleration)
oneAPI is an open standard, adopted by Intel, for a unified application programming interface (API) intended to be used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays. It is intended to eliminate the need for developers to maintain separate code bases, multip...
btw, idk if it's just SIMD, since the optimisations are unavailable for intel i-series
Knowledge of some of the relevant tools: power bi or tableau, SQL, Python and maybe Excel. If you're interested in a specific industry (finance, healthcare, logistics, ...) domain knowledge is a big big benefit. On top of that, notions of dimensional modelling, ETL and the basics of data engineering go a long way. Being able to analyze a dataset is one thing but being able to set up everything necessary to do it is also important.
wikipedia landed dark mode
it is just simd
The big reason it isn't supported for most I-series CPUs is because they do not ship most chips with the hardware
well, but then it's not supported in those cpus
Yes, all the more reason not to use intel :P
AMD realistically dominates the SIMD game rn
Not that it matters in the context of AI training because you'll end up using a GPU anyway
but if you're doing KNN or inference, the AMD chips have a significant performance difference over intel
Provided it isn't Zen1 and Zen2, we don't talk about those...
I know that for a data scientist a degree is necessary and a master's is preferred (or maybe borderline mandatory)
But what about a data analyst? Do you think there's a reasonable chance of getting a job as a data analyst without a degree?
Assuming you can learn the relevant skills by yourself and have projects that demonstrate those skills
@buoyant vine can I get your opinion on something? One of the things I've been working on recently is a general purpose inference server. Basically, there's a Python API that wraps sklearn => onnx, torch => onnx, ... and writes them to $storage and gives you an id.
On the Rust side of things it's a basic axum webserver with dynamic routes that correspond to that id. It loads the model from $storage and deserializes the payload (json in my poc) and does the inference.
The idea is to make something really simple for data people that want to deploy models. Once the inference server is running all you need to do is call things Python side
What's left is a small CLI tool and gui
MLflow already has serving but it feels so bloated imo. It pickles models and so on
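A rough sketch of the Python-side "write to $storage, get an id back" step, using a local directory as a stand-in for $storage. `ModelStore`, `put`, and `get` are hypothetical names, and the bytes here are a placeholder for a serialized ONNX model:

```python
import tempfile
import uuid
from pathlib import Path

class ModelStore:
    """Hypothetical stand-in for $storage: one file per model id."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, onnx_bytes: bytes) -> str:
        """Persist a serialized model and return the id used to look it up."""
        model_id = uuid.uuid4().hex
        (self.root / f"{model_id}.onnx").write_bytes(onnx_bytes)
        return model_id

    def get(self, model_id: str) -> bytes:
        """What the server side would read when a request names this id."""
        return (self.root / f"{model_id}.onnx").read_bytes()

store = ModelStore(Path(tempfile.mkdtemp()))
model_id = store.put(b"placeholder-onnx-bytes")
loaded = store.get(model_id)
```

Storing plain ONNX bytes keyed by id means the server never has to unpickle anything, which is the contrast with the pickling complaint above.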
k
Sounds vaguely similar to what we did before we merged the services into a monolith
What do you do now?
Ours is a bit different since it is a realtime web classifier, but we originally had:
- Scraper microservices
- Job submit/front API
- Job manager
- Inference API & model
- Translator service
w/ communication originally via SQS which then went to HTTP microservice calls.
And in the end we moved everything into one application and used channels to act as the internal queues.
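A rough Python sketch of the "one application, channels as internal queues" layout, using `queue.Queue` and threads as stand-ins for whatever channel type the real service uses. The stage names and the placeholder work are invented for illustration:

```python
import queue
import threading

scraped: queue.Queue = queue.Queue(maxsize=100)  # scraper -> classifier
results: queue.Queue = queue.Queue()             # classifier -> caller
STOP = object()                                  # sentinel that shuts a stage down

def scraper(urls):
    for url in urls:
        scraped.put(f"page text for {url}")      # placeholder for a real fetch
    scraped.put(STOP)

def classifier():
    while (page := scraped.get()) is not STOP:
        results.put((page, len(page) % 2))       # placeholder for model inference
    results.put(STOP)

threads = [
    threading.Thread(target=scraper, args=(["a.example", "b.example"],)),
    threading.Thread(target=classifier),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain the output queue until the sentinel shows up.
classified = []
while (item := results.get()) is not STOP:
    classified.append(item)
```

The bounded `maxsize` on the first queue gives backpressure inside the process, which is the role SQS or HTTP retries would otherwise play between separate services.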
Everything is possible, but a degree makes everything easier
Overall it sounds fine, the only issue I have had with ort previously is there are some edge cases:
- If you have two sessions trying to use the GPU it will deadlock both sessions (no idea the cause, but I believe it was memory locking)
- Some models or CPU inference in general must do some internal batching, because no matter the batch size you give, it will always use effectively all the CPU cores you give it in terms of usage.
At some point I thought it must be spinning 1 thread constantly
Makes sense, I'd only be doing the inference API & model part. Although I get you, if these are microservices communicating with http the overhead might be too large for your usecase
yikes for the first bullet point
yeah it is probably fine for your case, our issue was actually AWS things™, where ALB would effectively DDoS the services
this is what i've seen about intel https://intel.github.io/intel-extension-for-tensorflow/latest/get_started.html
it's in the context of pluggable devices, tensorflow.
Yeah, I remember the load balancer being an issue
I think I'll start small (CPU only as well) and expand it gradually
Yeah, I mean in general if you have ORT set up, then adding GPU support is super simple
If you want to docker it, then it'll be a bit of a pain but anything gpu wise is a pain
Why? Isn't it a matter of adding a couple of things to the compose file?
your docker engine needs to have all the GPU stuff configured, especially for nvidia
there is a toolkit you need to install and attach to the docker runtime
there are dockerfiles already with that configured
it is not the docker image side of things
Depends on what you were running on and if the deployment already has it configured
well, you need the cuda drivers in the OS
WSL (at home) and debian (at work). I didn't install docker engine at work so the sysadmin might've gone through the pain for me
do you guys normally do AI related stuff?
Yes
Possibly then, reason I brought it up was if you intended to share it with co workers via something like docker
it can be temperamental
but chances are most people's devices are probably set up already for it to 'just work'
Well you'll certainly find out
I'll look it up for sure
i normally just use conda either through GPU or HPC/cpus.
but the docs on tensorflow docker are very clear, that's why i mentioned it can't be that complex. yeah, others need a cuda config compatible with their os
The most annoying thing you can run into is mismatching cuda versions
yeah but it's not that problematic imho
unless you don't know much (i.e how to download the right version)
versions are backwards compatible
I think in theory they are, but I remember us having some real pain points with them mismatching
now they dropped support for centos 7...
they've got a great version-matrix, and now the version of each package is !=, anyways..
So much wacky tacit knowledge is necessary to do this stuff correctly
i disagree, but it is a pain unless you like to read a lot of random crap
At my new job they just use databricks and let the cloud bill go
We have some machine images and docker files that are just "You shall use these images with this docker base image if you enjoy your sanity"
yikers
i'd guess simd enabled cpus could be useful if you do data-preprocessing outside the network (edit: without the neural network library.)
I mean idk what the cost is
I know we have this habit of being concerned about the cost of something if it is in its own isolated AWS account and they go "Wait this project costs us how much!?" but then if it is in the main account... You could burn through money and no one is going to notice it
Shout out to dynamoDB btw for its truly amazingly expensive on demand cost
Pretty sexy cost reduction graph tho
interesting
