#data-science-and-ml
1 messages Β· Page 110 of 1
If you ever need to do anything webby like deplying mlflow I'd use it ahead of nginx
I've been using traefik, it works well with docker compose, which has been the main orchestrator I use
Traefik is Caddy's main competitor yeah. It also works well with Docker compose (which is what I use as well)
tho the issue here is just that the free machine only has two cores, so the two workers get overrun very easily
Personally I deploy 1 Caddy for all my apps and not 1 per
I'll give it a try for sure, using nginx was a nightmare, especially with docker
Maybe you could look into getting an EC2 instance or similar to permanently host stuff for you
yeah I'll upgrade to a spot instance, use skypilot to get a new machine once it's taken away
it's like 5 bucks per month for one of the good ones, assuming constant usage, which wouldn't really be the case
5 or 10
What do you get for β¬5?
I don't recall, it was one of the cX machines, I just skimmed through to see what price I could get
man doing profiling in python sucks
What are you using?
am trying to see why training is slowly filling up my RAM then my swap, am using memory_profiler
the __call__ to DistilbertModel from huggingface's transformers increments memory usage by 100 mb on each call, so i'm trying to see what exactly is going on
also for some reason memray says peak memory usage is 600 mb, yet memory_profiler shows it going over 12 GB
I always use Scalene https://github.com/plasma-umass/scalene
pretty good experience with it all round
nice had not heard of it, will be trying it out right now
thanks!
Hi so I am trying to train a neural network to detect cells in an image using faster RCNN. I have a dataset consisting of 1300ish such images, that are labelled as well . Each image contains several cells, most of which are red blood cells and the rest of which are infected cells. Is there something pre-existing that I can use to train a network on my dataset of images?
Typically object detection is trained on the 20ish classes from the coco dataset, if you wnat to do anything else (which you are) you'll have to finetune
Please only ask the same question in one place. You can link to the original question in other places.
what do you mean
sorry my bad
new to me. it supports asyncio?
Okay, there's a lot more than 20 https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
with 1300 images you might be in the sweet spot of having enough data to train a good model but not needing a pre-trained base model. how many labels do you have? is it relatively balanced? how big are the images?
Good question, I haven't used it for async (yet)
the images are on average 2000kb and I'm pretty sure all the images are labelled. Here is the dataset if you want to take a look or in case i might be wrong: https://bbbc.broadinstitute.org/BBBC041
but I wouldn't know why
also @coral lotus from a human perspective, how hard is it to distinguish the images? are you interested in getting a useful estimate of the probability distribution over labels, or just minimizing prediction error on labels?
How good are you with object detection/segmentation already?
just minimizing prediction error i guess. my main goal is just to make cell identification as accurate as possible
i havent done anything with object detection or segmentation
oh yeah is this a detection/segmentation task, or are you just classifying the entire image?
Doing 2 things at the same time is > 2x as hard as learning it one by one
detection/segmentation
so this is what an average iamge looks like
If I were you I'd learn about object detection and segmentation first and then learn how to do what you're trying to do after
If you don't know enough about neural nets while you're doing that I'd advise you to do that as well
im gonna be honest, this is for my science fair project thats coming up pretty quick so i dont have much time. But I don't think it should be too hard to do them both?
Gonna be honest and say that I (and most folks) won't want to walk you through the entire thing either but are definitely willing to help if you have specific questions
i was going to use a faster rcnn framework to detect the cells in an image and then a cnn framework to classify the exact cells
@coral lotus if you just want to classify the entire image, this might be on par with mnist (considered easy/solved) depending on how distinct the infected cells are from healthy cells. for actually detecting/counting infected cells it might also be easy but i don't have experience with detection or segmentation and don't want to speculate
I mean i guess its just classifying an image? not completely sure. like if there is one infected cell in an image then the whole image can just be considered as infected
because they are close up images of blood smears of patients
so if there is a single infected cell in an image then that means the patient is infected
fair enough i cant blame you
Do you want to draw boxes on every infected cell or do you want to say "this image has infected cells"?
I mean preferably drawing boxes, but just saying "this image has infected cells" would be enough for my project
As usual I agree with salt rock lamp
"This image has infected cells" is easy
Drawing boxes isn't too hard either but you may have to label data and that's the time consuming part
yeah so like if this is an image, id just it to say "this blood smear shows that the patient likely has malaria"
Unless your dataset is already labelled
Then you can follow a guide such as https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
alright thank you
also by labelled i mean the dataset came with a json file consisting of thousands of lines of this
[{"image": {"checksum": "676bb8e86fc2dbf05dd97d51a64ac0af", "pathname": "/images/8d02117d-6c71-4e47-b50a-6cc8d5eb1d55.png", "shape": {"r": 1200, "c": 1600, "channels": 3}}, "objects": [{"bounding_box": {"minimum": {"r": 1057, "c": 1440}, "maximum": {"r": 1158, "c": 1540}}, "category": "red blood cell"},
guess what: you actually have labeled image segments
at least according to that data
i would still start by just trying to classify the entire image infected or not. easier project, more forgiving
but there should be plenty of introductory material on image segmentation as well. the data already being labeled with bounding boxes should help a lot
but again i would have to defer to other people regarding how to actually build a segmentation model
seems like a great practice task for me to learn π
yeah, that's fine
you'll have to read the guide and plug in the gaps as you go
hello guys i Hope Ur doing well , i was wondering if anyone can send me an interesting dataset medium or bug sized for machine learning and he got some dΓ©cent results After working on It , tysm π
.latex Suppose that $f_1$ is a model with $X$ features and the loss function is denoated as $L_1$, whereas $f_2$ is a quadratic model with $X_2$ features and the loss function is denoted as $L_2$, we want to show that:
$L_2 \geq L_1$
We know that the logistic loss function is:
$L(y,\hat{y})= -ylog(\hat{y}) - (1 - y)log(1 - \hat{y})$
Assume that the predicted probability for $f_1$ is $\hat{y}_1$ and $\hat{y}_2$ for $f_2$ on the same data point $(x_i, y_i)$:
Then, $\hat{y}_1 = f_1(x_i)$ and $\hat{y}_2 = f_2(x_i)$.
Since $X_2$ has the quadratic features of $X$, it contains all the features of $X$ as well. Thus, $X_2$ has at least as many features as $X_1$, which also means $f_2$ is more flexible to fit the data compared to $f_1$.
Because $f_2$ is at least as flexible as $f_1$ and both are optimized to minimize the loss function, we can conclude that:
$L_2 \geq L_1$
heres the question:
.latex .latex Suppose that $f_1$ is a model that optimally fits the data $(X,y)$, and $f_2$ is another model that optimally fits the data $(X_2,y)$, where $X_2$ are the quadratic features of $X$. Then the loss function value obtained by $f_2$ is always going to be at least equal to that for $f_1$. Try to come up with a solid mathematical argument that justifies this claim.
correct idea? @wooden sail
i'd say no
WHAT WHY?!
none of that is true nor useful
by your argument, raising all the entries of x to the 0th power sill also be "at least as flexible"
o.O
i would try doing the math for the scalar case and then generalizing
it's always been about math, you could train models with pen and paper
it'd just take ages...
so teacher gave a line like this in the exam
there were tons of points on each side of the line (two different classes) seperated by that line above
and asked this question:
This line of separation cannot be done using SVM
I said yes
we can't right?
its because SVM needs a linear decision boundary
unless we use a kernel trick
An RBF SVM is still an SVM
I think it's not a great exam question either way. you can make a new feature x' where you can get a linear seperation
yeah
I was confused with that question tbh
i mean it can with kernel trick but also it cant @past meteor
question should have been specific
Anyone knows a place where I can learn AI
youtube
Hi, anyone know what the N and M stand for here?
it means there is a set with N examples x_n, and another for y_m with M examples
Oh duh, thanks lol, appreciate you
Has anyone in this community started using the Google Gemini API following GPT-3, and could you provide insights into its strengths and weaknesses? Specifically, I'm interested in understanding its performance in terms of
- pricing
- speed
- reasoning capabilities
- multilingual understanding
- controllability.
Any feedback would be greatly appreciated.
kindly Tag me
hi anyone that know about apis and python can help me out
i am trying to use novelai api to generate an image
In reviewing the MetaFormer paper. So the conclusion is that a local operation like average pooling can substitute global operations like scaled dot product. And by extension, a kernel conv layer can substitute the avg pooling.
So here's what's bothering me. Did no one ever thought of using CNNs for language modelling ?
Even if that doesn't work well, it looks like a very small step to go from a CNN to a series of (conv + MLP) type of model
It would legit be the second thing I'd try
In fact, I thought the whole idea behind transformers is that CNNs, despite reducing dimensionality, they can't capture long rage relations
So maybe the important feature is the multi headed thing, which granted, I wouldn't easily think of it
Omg they even use the identity map to replace the attention module, I'm sooooo confused
I'm super suss'd out rn ngl
the -> means substitute, so in case of the identity mapping the removed the avg pooling and used an identity, reducing the model to a series of MLPs with layer norm in between
that gives them 74.3%, which is obviously suss because the presence of a given token will not affect any of the others right
the most suss part is the hybrid stages
they sneakly don't include the results for [Attention, Attention, Attention, Attention,]
and there's a clear increase as more attention modules are included
this is for vision though, I need to see if there's anything funky going on with patch embedding, haven't looked into it yet
does exactly what it sounds like it does
they have the code public tho, adding it as a todo to replicate their results and see how the missing row looks like: https://github.com/sail-sg/poolformer
there can still be token interactions if there's a hidden layer
how would you do it? same approach as transformers? mapping to low-dim space to apply the convolution filter, then un-mapping back to the original vector space?
the MLP acts on each individual embedding, so does layer normalization, maybe they changed normalization
No I'd just slide a kernel over the batch, which is what they are doing with avg pooling
Like, in a way, it's a batch of images
I guess I can buy it for the avg pooling, but the identity mapping is suss
conv1d with 50 dimensions?
No like, a NxN kernel
oh i see
A single one ig
I'm sure it was tried, i just wouldn't know π
You'd think it was one of the first things to try right
ask during more normal US hours so one of the actual ML experts can weigh in
I like the interpretation of attention as a "token mixer"
But i think the main issue is likely to be what you said: text is less "local" than an image and you probably can't do without the long range token graph
Yeah could be an explanation, the patches of an image fit together in a very different way
@final kiln https://dennybritz.com/posts/wildml/understanding-convolutional-neural-networks-for-nlp/
When we hear about Convolutional Neural Network (CNNs), we typically think of Computer Vision.
2015
safe to say it has been tried
Pixels close to each other are likely to be semantically related (part of the same object), but the same isnβt always true for words. In many languages, parts of phrases could be separated by several other words.
Could be that it won't work for NLP
They used imagenet, and if they are doing classification, I can see this working
Even with identity, the patch embedding already packs a ton of information into the embedding
Words aren't pixels...
Well they certainly are if you consider the input tokens, it's a 1xsequence_size image
It's not necessarily a semantic relationship though
When you embed the tokens you get something that resembles an image, quite a lot actually
Well this is actually a batch
But I reckon it would look fairly similar if you imshow a sequence of embeddings
here's an idea I haven't seen: stack transformers in front of convolutions. the idea being that the transformer is learning a representation that is optimal for the CNN to operate on, which one would hope implies a kind of "localizing" effect, where you place relevant tokens adjacent to each other so that they can be picked up by a sliding filter
i don't think it would be necessarily better for text generation, but could be interesting linguistically
that's always been my intuition about what transformers do anyway, we've talked about that here before
i wonder if you could do some kind of "semantic filtering" on the output of a transformer stack, and then running a convolutional layer over something like a sliding average of tokens. less grammatical detail, more big picture
i should noodle around with that + nanogpt
or maybe something encoder-only like bert... need our NLP experts' opinion
They sort of explored that idea
In the hybrid section
But ofc, with avg pooling
The residual connections transfer information from earlier layers to later layers, so the order might not matter that much
right, but they didn't have multi-head attention at the time
Yeah there's definitely a ton to explore here
so avg pooling of the attention-ized token sequence specifically
Oh, that's an ablation study on the transformer
They took out the scaled dot product and used an avg pooling
To show that the token mixer is not important
in a way, sure. instead of five transformers, do three transformers followed by two convolutions
oh i was still thinking about text. i never actually looked into metaformer because image-related data never comes up in my work, i just glanced at the paper when you mentioned it
as in, running the same experiment but on a text dataset?
yeah that will be interesting for sure
This one is already with a quadratic form attention, instead of Q K V
The metric tensor one is not done yet, I'm gonna code it in cuda directly to impose all the weird conditions that I need
But the pytorch implementation did well for the Amazon dataset
It achieved close to SOTA performance
If you filter out those models that weren't pre trained beforehand
Like training Bert on next token pred with google level datasets and then fine tuning for sentiment analysis
I'm coding a bunch other attention mechanisms today, including the avg pooling
cool!
It's taking a while because I was figuring out my ML workflow, I definitely got it down now
So after finishing the model stuff and the extra validation steps on the training loop
I gotta setup a couple more datasets and then I can trigger the experiments
Those are gonna take a looong while, so I can finally pause this and give a bit more attention to my job search π
I mean, they did only use one dataset here, so I might do the same
Yeah I might keep it to text classification. The IMBD dataset is very good
Perhaps too good tho
well it's been exciting to watch your progress on all this π I am enjoying it vicariously while I mess around with IoT data and maps at work
Just tried data scraping https://www.city-data.com/ and I made sure to use a timeout between each request, anywhere from 1 to 2 seconds long (random), but my connection still got closed and the IP I was using was blocked. Thankfully I was using a VPN so I can try again, but how the heck is 1-2 seconds between requests too fast? I thought hosts would only deny requests that were like 0.1 seconds or less in between.
I can't make it much longer, because I don't have all that much time to wait (imagine if I was having to do this for a real job; they certainly wouldn't be that patient) and what if it encounters some problem with a random page partway through and I have to do the whole thing over again?
A little frustrated since I'm doing this for a portfolio project that my resume kind of hinges on. Why is 1-2 seconds too little anyway? If I wanted to ddos attack someone (which I don't), common sense would say I'd do it a lot faster than that.
It's been a ton of fun for me for sure. Honestly, I think I'm gonna do more of these long pauses. Definitely not frequently, but yeah, every so and so years I give myself 6 months to do wtv.
unfortunately their ToS prohibits automatic scraping, so we cannot help as per server rules.
https://www.city-data.com/terms.html
This license does not include any right to private or commercial collection, aggregation, copying, duplication, display or derivative use of the Service nor any use of data mining, robots, spiders, or similar data gathering and extraction tools for any purpose unless expressly permitted in advance in a written document signed by us. The sole exception is the limited right provided to general purpose internet search engines and non-commercial public archives that use such tools to gather information for the sole purpose of displaying hyperlinks to the Service, provided they comply with our robots.txt file.
in the future, i advise not basing a school project around ToS violation and circumvention of access control
it's been inspiring for me as well. i should really try to build a couple of 6-month sabbaticals into my financial plans at some point
@turbid drift this doesn't seem like a commercial site. so if you contact the owner of the site, they might be willing to provide you with a dataset.
Yeah, I just sent them an email.
Still frustrated though, web hosts are so overly paranoid.
It should be obvious that I'm not an attacker.
yeah, good luck. this seems like a hell of a lot of work for a volunteer data aggregation project π€ are they making money off of this somehow?
but you are violating their stated ToS and they have every right to try to prevent you from doing that
I know. Hopefully I can find another source if this doesn't work with a more flexible ToS.
I'd just do an analysis on an existing Kaggle project but apparently employers want me to actually come up with my own data and not just use something from there.
There isn't anyone making money off of this; this is purely for my portfolio. It's my first major data analyst portfolio project.
More specifically, I'm wanting to get data on US sister cities, including information like connections between ethnic populations and whether or not they correspond with the countries represented in those sister cities.
For example, do US cities with sister cities in Asia tend to have higher-than-average Asian populations?
Seems easy enough for a beginner project, but challenging enough to demonstrate skills to an employer.
Interesting
What was this, is there some prev message you can quote?
for the USA at least, that kind of data should be publicly and freely available through the US Census and related gov't agencies
given that this seems to be work-sponsored, why don't you bring this up with your manager / mentor / whoever and let them know that you might need to adjust your project, to avoid getting bogged down in a "gray-hat" web scraping task that's unrelated to the actual topic?
Ah, it's not work-sponsored. I don't have a specific job I'm doing this for, this is just to make my portfolio look more attractive to any employers who are looking for data analysts.
oh
just pick a different project then
i get the desire to work on something particularly interesting, but imo there's no point wasting your time with a distraction. if you want to practice webscraping, start on wikipedia, which does permit scraping (within reasonable limits)
but for a data analyst job i think your attention will be better spent elsewhere. web scraping and related tasks can be extremely useful and can make you seem like a wizard. but focus on fundamentals is more important.
So can I just look for a random, but information-rich project on Kaggle and just do a bunch of analysis on it, and have it be good enough for a portfolio project?
that job market is absolutely brutal right now as i'm sure you know. if you have a particular industry of interest, you might be able to gain an advantage by doing a project in that particular industry's domain.
ideally you'd still pick an interesting project with some kind of realistic "research question" that you can answer. you are always somewhat limited with public/low-cost data, but for certain topics (macroeconomics, meteorology) public data abounds, published by the USA and other governments
I honestly don't enjoy data scraping that much, and I only really do it because I feel like I'm pressured to come up with data I gathered myself, otherwise employers won't think I'm desirable.
right, but there's a lot of data out there that you don't need to scrape from the web or call from an API in small batches
in general, getting data yourself can be very important. so i don't want to undersell it too much. but i think you're on the right track in not wanting to spend too much effort on it.
I feel like I might be misunderstanding a lot about the industry too. I'm coming off the Google Data Analytics certificate by the way.
what kinds of jobs are you looking to get? are you looking for your first job in tech / data?
Yeah, just any kind of data analyst/data science job, remote or nearby, involving Python, SQL, R, Microsoft Excel, all of which I can use well.
I have a Bachelor's in CS as well, along with my Google certificates.
So really just my portfolio is stopping me.
great, so you have the programming and technical skills. then you just need to show that you can put together a research question, make useful data visualizations, do some basic statistics, and write a coherent executive summary of your results.
Yeah, and I was on the track to doing that with my sister cities project, and moving along quite nicely with it too. Had already gathered some very useful information.
So I'm overcoming that imposter syndrome.
based on that background you are probably a stronger programmer than 90-95% of data analysts and you might want to consider looking at more of a data science career path if you can get some work experience + a masters degree (ideally filling in the gaps you probably have in math, masters is optional but might be faster than grinding away for years at self study & looks stronger on a resume)
the sister cities thing is great, but unless you can get that data you might have to divert
what about just comparing US cities instead? come up with some kind of comparative analysis, the actual topic is less important than demonstrating that you can come up with interesting questions and answer them coherently
Perhaps. I think another thing I'm afraid of is the thought in the back of my head that someone has probably done/found that info out before, and I'm just re-inventing the wheel, in which an employer might find that out and just accuse me of having copied from somewhere else.
i'd actually encourage spending less time on this particular project (maybe a few afternoons at most? just enough to answer your own question in a nice 2-page writeup) and then maybe go deploy your programming skills on some AI task. that could catch recruiter eyeballs.
of course, but who cares? you're not trying to get published in Econometrica, you're trying to get a job
As someone who's highest level of math is Calculus 1 and discrete math, I heard AI needs at least multivariable calculus doesn't it? I get the concept of gradient descent though, but not the implementation of it to the little details.
yes, but keep in mind that this also puts you at the 80-90th percentile of data "analysts" (as opposed to data "scientists")
that's probably an exaggeration but still, your imposter syndrome seems severe and you haven't even been hired anywhere yet
you are doing fine. scale back the project, learn to embrace everything being slightly fucked, and go get a job
"everything being slightly fucked" is a normal state of affairs in data. the best data analysts are the ones who get stuff done anyway.
your job is to show up and answer useful questions for the business. all you need to do right now is demonstrate that you can do that. the particular choice of research question is only interesting insofar as it demonstrates your ability to think about the real-world context behind the data and come up with an interesting question. but you have plenty of other skills you need to demonstrate too, so don't get hung up on that one aspect in particular.
Can anyone hop in my python post for a sec?
Aaaah I'd have to dig far back. The concept is simple tho, take out Q, K, V and have a metric tensor for calculating the dot product between each token and every other token. Plus a matrix at the start to project the tokens to a lower dimensional space. The rest remains the same
I spent 2 weeks training it thinking the performance was subpar but the dataset was just bad in on itself, best models are getting 65% acc on papers with code, I was getting 55% before overfit
To combat the problem of quadratic scaling of compute with scaling of num tokens?
Why metric tensor tho
The jump from calculus to multivariate calculus is not that hard tho. The nabla operators all have an intuitive interpretation, and even come with descriptive names, "gradient" and "curl".
To make it more interpretable. It also lets me half the number of parameters
Nice
I don't understand, how does that make it more interpretable?
You can view it as constructing pseudo-eucledian metric spaces, which, despite sounding fancy, it's actually easier to picture in the head than keys queries and values
ERROR: Failed building wheel for llama-cpp-python...I want to build a docker image and I got this error message
they noticed that it kind of just pops out of the matrix arithmetic:
Q = X @ Wq
K = X @ Wk
V = X @ Wv
Q @ K.T == (X @ Wq) @ (X @ Wk).T == (X @ Wq) @ (Wk.T @ X.T) == X @ (Wq @ Wk.T) @ X.T
you can set M = (Wq @ Wk.T) and then impose constraints like "M must be a metric tensor" (i.e a distance matrix)
Nice, I'm back on the hackathon grind and I have a top 6 placement (finalist) for my first
β¬3k pot for the winner
congrats!
i never even tried entering a hackathon
I think you'd be great at them
The ones I participate usually get won by a good mix of communication/business skills and tech stuff
So not just 1 or the other
Generally a nice past time that sometimes gets you money and networking
what's the idea, you enter as a small team or individual, and then build something over 2 long sessions?
I've done many. Some are team based and you had to build something in anything ranging from 4 hours to 2 days.
Others you enroll alone. The last one I did was actually one where you enrolled alone, got a random team and had an hour to conjure up an end-to-end data architecture and pitch it
how do you find them to enroll in? meetup.com kind of thing? was it "programming" or data-specific?
I only participate in data science/ML ones. Typically just LinkedIn or even Facebook ads
i'll keep an eye out!
how would you describe the level of participants in a regular DS/ML hackathon?
always wanted to join one but didn't want to be dead weight, especially after not doing DS properly for ages..
Industry professionals
I would just do it. everyone who didn't form teams in advance is taking a risk wrt to their teammates.
that's a fair point
(and the winners will probably be a team of high-performing professionals who formed a team in advance--ngl)
Luckily the ones I participated at post graduating where ones where you didn't form teams ahead of time
But I think the level of the regulars here is probably higher than the people I see participating
Soft skills matter a lot
Personally the only thing I'm sure I can do well is presenting/pitching do that's always my angle
Attention mechanisms done, gonna code a bunch more metrics to get a complete report at the end of each run
Time to start thinking about how I can use the metric tensor symmetry to optimize for speed, and also, how am I gonna code c++ CUDA kernels through rust into torch or, through torch into rust ? Idk, but however they did it to make these rust bindings in the first place, I gotta do the same
I'm actually gonna think about this first before thinking how to code the kernels. That part will be much easier and idk how feasible the integration will be, so that goes first
hmm guys do yall think i should master machine learning? like ive got alot of knowledge and experience with simple software dev but was wondering bout ml and whether its worthwhile or ont
that really seems like a question you need to ask yourself before anyone else, but beside all that, ive heard quite a few times recently that companies are hiring less and less ai/ml/ds devs after the boom that happened 5-10 years ago
have you done it as well? what was your experience like?
I only did one hackathon (the year before covid started), and I gave up before it ended.
I was doing a project on video highlights generation on python, can someone let me know how can i use the text from the video in deciding certain "key" moments from the video?
This would be a challenging beginner project.
You need the transcription of the video to have timestamps associated with each part. I assume you have that.
You'll want to look into techniques for determining which part of a text is most salient.
so you'll want to google stuff like "saliency detection nlp"
i was thinking about how was i even going to stitch them back up, guess i have to to do timestamps, i was just happy the transcription came out fine
Ayy congrats π
Ahh positive definite scalar forms
I'm actually doing some research at the intersection of ML, information theory and algebraic topology atm so this is quite interesting xd
Interesting
Thanks for the illustration!
I think my sklearn wrapper for Torch stinks a bit π
Training time is pretty invariant to the size of the net which means all the time is spent loading data to the GPU
If I care enough I'll refactor the hyperparameter search to use raw torch and the rest my wrapper
Rust for CUDA has been a passive problem I've been looking at for sometime
The cuda ecosystem for rust isn't really mature as far as I could find
Lmk if you make any headway!
I'm not sure how much fancy stuff it'd be possible to do with it since it doesn't really give for very interesting spaces like the ones you see for general relativity. It's more like minkowski spaces and really, the only thing that matters is how it affects the angles between the embeddings.
But like, the point is just that, to get the interest of people who study this kind of math, I'm not entirely sure what kind of insights can be extracted, but I think it's a step in the right direction.
Do you guys think ML on tabular data is a solved problem?
If my job was more industry, less research I'd just create lags for my time series throw it into xgboost and call it day
I'm actually gonna code c++ CUDA and bind it into torch, which I'll then bind into rust
Prodding at something from all the different angles is great, if nothing else we just find a lot of angles that don't work and why they don't work
The client paid for exotic nets, so the client gets exit nets but it's a bit of a waste of time
Why not bind it directly to rust
The improvement vs lags+xgboost is so marginal
maybe you're interested in "information geometry", where they look for manifolds and the corresponding geodesics and metrics to be able to train with less data and measure distances more sensibly
Because this way is easier since I can cheat by looking how they did it for the torch rust bindings I'm using
I guess we can say we're more of a fraction through to 'solved' than we are for vision, audio etc.
They're each other's inverse in the sense collecting data for those is a bit easier but more heavy weight models are required
Yessss exactly
That and I'm also looking into manifolds that give a better "native" representation for particular types of data
Like negatice curvature Riemann manifolds for hierarchical structures (power law)
don't let me trick you into thinking i know what i'm doing though, i just know these things exist. i dabble at most tangentially in that i work a lot with fisher information, which happens to define a metric tensor in special cases
oh lol true ig
Can you share those torch rust bindings btw
I've seen you share code written in that a few times here and it was so sleek and nice (obviously, rust xd)
Is this in any way related to fisher vectors
Information geometry is an interdisciplinary field that applies the techniques of differential geometry to study probability theory and statistics. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to probability distributions.
first time I've actually been interested in studying any kind of stats
And you probably know more than me haha
I invade the algebraic topology territory starting from ML and information theory, that's more my home ground
i had never heard of them, but it looks like it. based on the fisher score
log likelihood and what not
ooh interesting I'll have to look into it
Fisher vectors are more of a toy I like to use from time to time
Very elegant, but DL beats them for most applications
Thanks!
I'm actually yet to see any sign it won't work, I'm replicating a study (made for vision) that claims that the attention mechanism is inconsequential. they at one point even substitute it for an identity mapping
in all likelihood, I think the identity and avg pooling won't work as well as for vision
(as substitutes for attention)
I think I've heard of this paper
but anything else will work and the network doesn't care as long as you give it a way of comparing the tokens
Lots of complaints when it came out π
yeah I've had my own
I haven't dived into it myself tho
True
.
Are you trying to reduce it from quadratic scaling or is that not a concern for you
it's not currently a concern, I've just halved the number of parameters in the attention head and will use the symmetry to reduce the number of operations
I'm more interested in the interpretability and in replicating those guys results
ahh nice fair enough
I've very recently started keeping an eye on ways to reduce the quadratic scaling problem in attention
Massive benefits if we can find a way, but ofcourse it's a difficult task and we're not currently in a place to drop everything else and focus on thay
Has anyone here read the Retentive net paper
I do have a couple ideas
the most straightforward way is to make it a funnel like structure like you do with UNETs
there's no reason for the output dimension to be the same as the input dimension for the attention module
We actually played around with this a bit
Nice
I also suspect you can take half of the network and have it do convolution, like
imagine two branches with a series of attention modules
the first branch does self attention, and the second branch does conv
you can then feed one into the other like you do with encoder decoder
the conv captures local relations, the attention mechanism captures global ones
But the self attention will still scale quadratic right
the hope would be that you wouldn't need to scale as much embedding dimensionality, since part of the burden has been shifted to a different branch
so it doesn't actually completely solve it
Ahh hmmmm
And how will the self attention be incentivized to prioritise only learning global context, just backprop?
potentially via masking of the attention scores
Of the tokens close by? I'm not sure if that's a good idea.
For starters tokens are often sub-words and such
You might only get enough useful information to compute relative global context when taking a bunch of close tokens together
That would be captured by the conv layers, the self attention is followed by cross attention that is fed with the conv layer results
With residual connections info is never really lost
But it would be required in the self attention being computed, itself
Hmm yeah
Uhm, not sure if I understand
Each token would suffer influence from far away tokens only
The self attention computation being performed is on masked vectors? Or masking is done after the computation?
On the attention scores, like you do to make it causal
But really the only way to know if it works is to try it out
I think my next project is gonna be to write the torch bindings for that new programming language
Gleam
I'd make it cool tho, using the compiler to aid the ML dev process
But b4 that I might try this
There's so much cool stuff to do ._.
Yeah ikr π
I just try to find justifications (excuses) to shoe-horn it in my work time π
That's one way to do it
If someone wanna do a chatbot with custom data dm me for join the project
facing a moral dilemma with my personal project
on one hand, i do not want to use AI to work on it because then it feels like less of my baby
on the other... the practicality it offers in eliminating redundancy and whatnot is unmatched, because i'm working with a lot of math and i'm not all that great at python yet
i'm considering a compromise to be only using it to handle the more rote things like big tables/dicts/definitions, or to figure out the harder coding problems that i'm still learning
maybe i could only use it for learning how to code?
idk, thoughts on coding with AI?
I think you'll find you'll get over the "not all that great at python yet" if you put down the AI and muscle through it.
what do you do with fisher information specifically?
as an illustrative example, i have a junior colleague who refuses to put down the AI, and as a result his progress is very slow
it seems causal, because we consistently make more progress in our 1-hour socratic discussion sessions (so what do you think we should do in XYZ situation?) than he does on his own time
he clearly knows how to do the stuff. he just has it in his head somehow that the AI is definitely helping him even when it seemingly is holding him back
I have a similar view on leetcode and similar puzzle questions: the best problems are the one where you're stuck on them for days, and then think your way through it.
The process is the point, not the result
yeah! if he'd just put down the damn AI and think + write notes on pen and paper, he'd make a ton of progress and become very strong
this, yes. he doesn't seem to get that the part where you think hard and you feel like you don't know what you're doing is the part where you're learning
i used to be so afraid to feel like i didn't know what i was doing. it took me so long to embrace that feeling. i'd have been toast in school using AI for everything, even as it is, wolfram alpha did me no favors in learning calculus. i dealt with the consequences of that laziness for years.
so i get it
but, learn from my mistakes
I halfjoke but, perhaps the saving grace of this AI craze, is job security for the rest of us
Rubber duck time.
Also if you don't feel uncomfortable, you are probably not learning, much like how not feeling uncomfortable while working out means you are probably not making progress.
(There is no way to minimize / avoid it, but we try really hard (human nature to avoid things that make us uncomfortable), and so turn to stuff like AI)
(This can be anything else, like watching tutorials instead of actually doing it (just watching is not uncomfortable))
Has anyone in this community started using the Google Gemini API following GPT-3, and could you provide insights into its strengths and weaknesses? Specifically, I'm interested in understanding its performance in terms of
- pricing
- speed
- reasoning capabilities
- multilingual understanding
- controllability.
Any feedback would be greatly appreciated.
kindly Tag me
I'm Stuck on This Project for 3 Days because I want to deploy the Flask app online. So, I can get data from anywhere online but after trying to deploy The Project on almost every Cloud and Hosting site, I'm just facing one error ( 502 Bad Gateway). No matter, if I talk about AWS, Azure, Google app engine, Vercel, Heroku, Netlify, Konbey. Everywhere I'm getting the timeout error however locally the project is working perfect and Through Clis also working but as I deploy successfully and hit end point, it says 502 Bad Gateway.
If any of you have Solution for this then for God's sake pls tell me. I'm Stuck and can't move forward.
ai isnt here to beat us, its to provide more competition, :D
look at estimation bounds and optimal design
very cool project dude
One pattern I'm noticing is that everyone including me thinks that they are using AI right and it's this other group of people that are using it wrong and not learning and being held back.
Which is making me seriously reflect on how I'm using AI.
I don't use copilot, partly because of that, it spits out code, that isn't necessarily that good, and it's just so easy to leave it there as if it were a lib function
So I mostly use chat gpt, and mostly when I notice that I'm looking through the documentation and finding no success or it's just taking too long and it's just easier to just ask the omniscient chat bot about it
Like, what's the difference between reading the answer to a stack overflow question and a chat gpt answer to my specific question ?
The difference is that I had to write it and I had to cross check the answer either way
The me writing it part can be strangely beneficial
Which doesn't happen when you use copilot
Interesting, I don't use copilot but do use chatGPT for 100 % the same reasons
How can I train an ai model to become a support helper? I have all the discord messages saved from the support channel help. In any of this format CSV/TXT/JSON. I want to train the ai on the messages.
Totally forgot to do no grad, so that's why it was overfitting the val data
I'm also getting used to rust
Hey guys wanted to start with Data science can some one help me get best resources and roadmap , tried to search on udemy , coursera , youtube but im confused from where to start
ahh, is that for experiment design? of all classes i regret not taking, that was the #1.
yeah
a lot of my current research deals with stuff like imaging, localization, parameter estimation, etc based on a small number of measurements. like one would do in x-rays and what not
and then there's always the question of "where do i put the sensors?"
100% accuracy achieved >.>
I'm either doing something wrong or I'm going right to the top of the SOTA leaderboard
I reckon I'm doing something wrong
very cool. did you learn that stuff from a book or a course? or just picked it up as you went?

Hello guys, need some help with medical image segmentation. I am working on lung tumor segmentation and the model seems to work well with the train dataset (overfitting, mostly) but the testing does not go well. I am not sure if I am doing things right. I am using pytorch and segmentation models pytorch for Unet models.
I'm trying to force an overfit rn
That would prove no leakage
For anyone curious, average pooling is working here
But I got leakage no doubt, 1e-7 loss on both
my supervisor asked me a really hard question while i was writing my masters thesis, and then he made a suggestive comment pointing toward this stuff as a possible answer
5 years and many books and papers later, here we are
leakage has been found, part of my code still assumed multi processing instead of async, so they used the same duck db connection to create the tables
The tables all had the same name
Just created a uuid for the name of the tables
i'll take resources if you have a couple that helped you in particular!
That wasn't it, which in hindsight makes sense because I'm creating a different connection for each
So.... How am I gonna debug this >.>
Maybe join the two tables via de text field ?
Alright, 123 rows are shared by both tables
This is out of 50k
So it does not explain it yet
I do believe this is in the original dataset
Must be there's no way I mixed this up in such a specific way, and also by only using curl and tar
Matter fact, I'm gonna code the download part right in the pipeline
How do I reshape Table A into Table B? Do I melt, pivot, stack or something else entirely. Many thanks in advanced!
I need your folks help on this:
So Im making an AI assistant for my school project and there are two problems Im facing: (Im using PyCharm btw)
import os
import time
import pyaudio
import playsound
from gtts import gTTS
import openai
import speech_recognition as sr
api_key = "API"
openai.api_key = api_key
lang = 'en'
while True:
def get_audio():
r = sr.Recognizer() - here Pycharm tells me to indent the "r" right here
with sr.Microphone(device_index=1) as source:
audio = r.listen(source)
said = ""
try:
said = r.recognize_google(audio)
print(said)
if "BanglaGPT" in said:
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": said}])
text = completion.choices[0].messages.content
speech = gTTS(text=text, lang=lang, slow=False, tld="com.au")
speech.save = "welcome1.mp3"
playsound.playsound("welcome.mp3")
except Exception:
print("Exception")
return said: - Several problems with this line, It says its out of function, it needs End Statement
get_audio(): - An illegal target for variable annotation, and expression is expected
(Ignore the fact that the name of the AI is BanglaGPT and the language it is supposed to speak is english, its for tests ok?)
go delete/regenerate that API Key ASAP
you should treat them as passwords, if not even more important than that
I did
just editing/deleting the message is not enough - make sure you actually delete the key
Ok so I was able to fix almost all the issues in the code
but now the problem is
It doesnt speak
Smh
statistical signal processing: detection, estimation, and time series analysis by louis l. scharf
fundamentals of statistical signal processing: estimation theory by steven m. kay
spectral analysis of signals by petre stoica and randolph moses
"stack" and "unstack" convert column names into row labels (aka index levels) and vice versa. that is, they operate on labels.
"melt" and "pivot" convert data columns between long and wide format. that is, they operate on data.
both can be used for this reshaping operation. melt and pivot might be easier to reason about though.
in fact, you need both here. first you want to melt this to "long format":
Region | Country | Studies | Date
then pivot this to "wide format" with respect to date:
Region | Country | Studies | Jan 2023 | Feb 2023 | ...
oh i see, very technical for signal processing! i'll take a look at the first one and see if there's anything i can get from it
the application motivates the problems covered in those books, but i think you'll find most of the stuff is easily generalizable
also what constitutes a "signal" is very easy to satisfy π
a large chunk of AI/ML is parameter estimation and estimation theory in a trench coat
that's what i'm hoping to get here. i remember trying to get into this stuff back when i was an undergrad studying economics, for the same reasons you just stated, but i had a hard time connecting to the applications at the time & wasn't strong enough with math yet.
I was parsing one of the classes incorrectly and that led to only one output token, which was 0 and that was the source of it
Now I'm getting a more realistic goes up to 0.5 acc and annoyingly stays there forever
I suspect there's gonna be something in the SQL still, gonna write some unit tests for this
{
"run_name": null,
"experiment_id": 1,
"data": {
"slices": 1,
"batch_size": 256,
"test_source": "***/dataset/test.parquet",
"train_source": "****/train.parquet"
},
"model": {
"depth": 6,
"heads": 10,
"encoding": "tiktoken-gpt2",
"dimension": 64,
"kernel_size": null,
"attention_kind": "quadratic",
"context_window": 300,
"input_vocabolary": 60000,
"output_vocabolary": 5
},
"train": {
"epochs": 100,
"learning_rate": 0.0005
},
"process": {
"use_gpu": true,
"executable_source": "****t"
}
}
in case anyone has any ideas, but hopefully it will be a temporary issue related to the data
that hypothesis is motivated by the fact that it's not overfiting
which might mean I'm messing up my randomization again, maybe I'm mixing up the labels each time
Does not seem to be the case
I have a question to those experienced in Excel. I'm having an issue in a sheet where after new data is generated through my Python script via Openpyxl, I am getting some values in a column with a yellow highlighted cell with bolded font. The thing is I double click on the cell and the formatting in question goes away and returns to looking like the other cells in the column. I know it's not my code because I tested the same exact code on a blank workbook and sheet and got everything without the formatting. Is there some type of hidden check that was in the original Excel template?
If I try and go over the cell and click no fill or try and unbold it, nothing happens. It only returns to normal after double clicking it and then clicking somewhere else.
I found the problem. The person before me left some conditional formatting in there. I ended up removing it. Problem solved.
Alright.
Here's what I'm NOT gonna do, spend two weeks experimenting with this stuff.
Gotta change my approach.
What I'm gonna do instead is go through the literature and see what people did and how. And then replicate that.
Time to PR, wait for the checks to be done, do a release so that my binary gets built automatically and then published and then it's time for a well deserved rest.
Tomorrow I'm gonna freeze some of the interfaces and write integration tests for them. After that I'll collect some papers and also think about the most efficient way to take the derivative of a metric tensor.
right now I'm working on a neat little project on my calculator for AI generated music
what was "this stuff" again?
Hyper parameter exploration
I spent a lot of time just messing about with the parameters, when I should've just looked it up. I think it's gonna be the same thing here
And its also about time I write some tests. Particularly, I'm gonna write some for the training loop itself. Just gonna generate a dataset that oughta be easy to generalize
Hi, can anyone tell my why I get this error?
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 87, in run
pipe = subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Python312\Lib\subprocess.py", line 1538, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] Nie moΕΌna odnaleΕΊΔ okreΕlonego pliku
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\barte\Documents\GitHub\docgpt\main.py", line 10, in <module>
doc = textract.process("spa.pdf")
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\__init__.py", line 79, in process
return parser.process(filename, input_encoding, output_encoding, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 46, in process
byte_string = self.extract(filename, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
raise ex
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\pdf_parser.py", line 21, in extract
return self.extract_pdftotext(filename, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\pdf_parser.py", line 44, in extract_pdftotext
stdout, _ = self.run(args)
^^^^^^^^^^^^^^
File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 95, in run
raise exceptions.ShellError(
textract.exceptions.ShellError: The command `pdftotext spa.pdf -` failed with exit code 127
------------- stdout -------------
------------- stderr -------------```
This is my code:
import textract
import os
from transformers import GPT2TokenizerFast
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
load_dotenv()
doc = textract.process("spa.pdf")
with open('./dataFromPdf.txt', 'w') as f:
f.write(doc.decode('utf-8'))
with open('./dataFromPdf.txt', 'r') as f:
text = f.read()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
def count_tokens(text: str) -> int:
return len(tokenizer.encode(text))
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 512,
chunk_overlap = 24,
length_function = count_tokens,
)
chunks = text_splitter.create_documents([text])
It literally says in the traceback, you gotta read those
Exit code 127
was stuck figuring out why cropping images took about 3-4 seconds per image and why it was running on cpu (100% usage) instead of the cuda device
after some debugging I found the related line:
visible_pixels = crop[crop > 0]
and changed it to this:
mask = crop.gt(0.0).to(crop.dtype)
visible_pixels = mask * crop + (1 - mask)
function execution time went down from 3408ms to just 28ms lol
thats a 99,17% reduction!
love optimizing stuff like this but its rare that I manage to lower it this much
this is why i love programming
on the topic of resources, what do you think of this? http://neuralnetworksanddeeplearning.com/
Hi guys, (very quick question π₯Ί)
I recently downloaded a dataset of images of shotguns, handguns, and knives. I am using this to train a cnn used to detect potential weapons through doorbell camera images or footage. However, I don't know if i should normalize all the images to a certain size.
if I do, then the bound boxes in the corresponding txt file for each image would be skewed.
I have this preprocessing step
def frequency_encode(df: pd.DataFrame, features: str | list[str]=None, inplace=False):
if features is None:
features = df.columns
elif isinstance(features, str):
features = [features]
if not inplace:
df = df.copy()
for feature in features:
frq = df[feature].value_counts() # <-- problem
df[f'{feature}_FrqEncode'] = df[feature].replace(frq.to_dict())
if not inplace:
return df
```I put this in a `FunctionTransformer` in a `Pipeline`, then later I realized that at `# <-- problem`, I should instead somehow store a `df_train` that was seen during `.fit()` and use `df_train[feature].value_counts()` when `.transform()`ing
how do I do this? (while still being able to use a `Pipeline` of course)
You can use the Pipeline class to wrap your custom class with the necessary preprocessing and encoding. You can do it like this
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.expr import FunctionTransformer
from sklearn.pipeline import Pipeline
class DistanceTransformer(BaseEstimator, TransformerMixin):
def __init__(self, metric_func, **kwargs):
self.metric_func = metric_func
self.inplace = True
def fit(self, X, y=None, **kwargs):
if not self.inplace:
raise RuntimeError("Transformers must be called with inplace=True")
X = self.metric_func(X)
return X
def transform(self, X, **kwargs):
if not self.inplace:
raise RuntimeError("Transformers must be called with inplace=True")
X = self.metric_func(X)
return X
class PairwiseDistanceEstimator(BaseEstimator, TransformerMixin):
def __init__(self, **kwargs):
super().__init__()
self.kwargs = kwargs
def fit(self, X):
X = self.get_metric(X)
return X
def transform(self, X):
X = self.get_metric(X)
return X
def get_metric(self, X):
if self.kwargs['metric'] == 'euclidean':
return pairwise_distances(X, metric='euclidean')
if self.kwargs['metric'] == 'cosine':
return pairwise_distances(X, metric='cosine')
raise ValueError(f"Invalid metric '{self.kwargs['metric']}'. Defaulting to Euclidean distance")
class StatedFunctionTransformer(FunctionTransformer):
def fit(self, X: pd.DataFrame, y=None):
def deco(fn):
def wrapper(*args, **kwargs):
kwargs['df_train'] = X
return fn(*args, **kwargs)
return wrapper
self.func = deco(self.func)
return super().fit(X, y)
def frequency_encode(df: pd.DataFrame, features: str | list[str]=None, inplace=False, df_train: pd.DataFrame=...):
if features is None:
features = df.columns
elif isinstance(features, str):
features = [features]
if not inplace:
df = df.copy()
for feature in features:
frq = df_train[feature].value_counts() # <-- problem
df[f'{feature}_FrqEncode'] = df[feature].replace(frq.to_dict())
if not inplace:
return df
```this is what I've settled on for now, if anyone knows of a better/more conventional method, or there's a problem to what I'm doing here, please tell me
I know this isn't DS but you guys use jupyter... I'm reviewing jupyter and the course instructor is describing tooltip uses, when he presses shift tabx3 the tool tip remains open while hes typing, my tooltip closes immediately, what am i doing wrong?
Looks fine. I'd still go for the deep learning book I linked though π it's also hands-on, but it's more topical than the one you linked
tooltip remains open is not a default behaviour, my guess is he is using https://github.com/jupyter-lsp/jupyterlab-lsp or something.
ty
hi, i'm creating a project based on detecting a larvae presence in different water types. I'm getting the data thru sensors such as turbidity, oxygen, ph Level, and temperature. I wanna ask is random forest the way to go to properly detect larvaes depending on the data or is there other better ml algorithms?
some problem lends itself to certain models, e.g. sound and wavelet models, image and CNNs.
RF is a good starting point, it's up to you to search for better models once you establish a baseline model, xgboost has always been a strong contender in kaggle for a reason, i suggest you do some more research in that regard if you aren't familiar.
also sometimes it's not so much about the model you use, but the features you craft - e.g. your problem could potentially be solved by a temporal snapshot (i.e. just sensor values in one instance of time), or an alternative maybe more useful set of features might be some aggregate of sensor values over time (diff, % change maybe?), sometimes it's worth looking into the fundamental aspect of the problem, in this case think about the biological impact of larvae presence (they might make the water warm, "more warm" than usual? idk - not a biologist.. but if so how do you describe that properly?)
Anyone tried out the LLMs in 1.58 bits paper yet?
i second this. although i do think RF is a great default choice for medium-size data with a relatively small number of features
Tests - are important
Im training a housing price prediction model in TensorFlow with dimensions of X (20433 rows Γ 13 columns), loss="mae", optimizer="Adam()".
The problem I am getting is that upon training the loss initially decreases but after some epochs becomes stagnant.
Any suggestions on improving the model, and how many layers should I use?
tf.random.set_seed(42)
model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(32),
tf.keras.layers.Dense(1)
])
model.compile(
loss=tf.keras.losses.mae,
optimizer=tf.keras.optimizers.SGD(lr=0.001),
metrics=["mae"]
)
norm_history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=64)
I know im always gonna enjoy these more cuz it took so much work, but damn these things look good
always check with papers with code to see how people are doing it
but also you are missing your non linear activations I think
non linear activations are used for linear regression problem?
Uhm, I don't know, I would try it yeah
https://www.kaggle.com/datasets/camnugent/california-housing-prices this is the dataset
There was a lot of discussion recently because of the meaning of linear in linear regression
Just try it, can't hurt to just try rite
Okay i'll try and say
this is paying off rn, getting a slow but steady decrease instead of a plateau
Tried it, its worse. Before it was plateauing around 48000, now its 53000
uhm, not much difference really, your loss is very large, try normalizing your data somehow, and adding layer normalization too in between
networks will prefer stuff between 0 and 1, in transformers z-score normalization is used across each batch, followed by a trainable affine
This is the normalization I am already using,
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
which activation did you use ?
im not sure what this does
would be helpful to see your loss graph
relu for both layers
I'm not aware of a loss graph
is it done using Pandas?
ah, the plot of the training and validation loss against step
something of this sort
Loss graph
alright, can you include loss/val and loss/train ?
mine is also not looking too good ngl, the missing ingredient is gonna be the pre trained embedder
woah, are those really per epoch ?
yeahπ
model.compile(
loss=tf.keras.losses.mae,
optimizer=tf.keras.optimizers.SGD(lr=0.001),
metrics=["mae"]
)
so, don't use SGD, use Adam
mean average error, that sounds fishy to me
I dont use keras
let me check this
loss = mean(abs(y_true - y_pred))
try using mean squared error isntead
I think that is mae - mean absolute error
yeah, mean squared error is better
no need for a sqrt
uhm, what is your batch size ?
I've tried mse before it gives huge loss numbers
64
that's fine, the issue is that the data is likely not properly normalized
After using Adam optimizer
the validation and the training loss follow each other very closely here
that can be suss, after many epochs the model should start overfiting
try increasing your model capacity
more layers
u mean the no of layers?
model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(32),
tf.keras.layers.Dense(1)
])
this is quite small
model = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(25)
tf.keras.layers.Dense(1)
])
something like that, plus the activations ofc
you want your model to have more capacity than the dataset requires
could you suggest any activations?
GeLU
okay
ReLU?
okkay
after you've managed to overfit your model, you know you got something that has the power to do the task
you'll then try to cripple it so that it doesn't overfit, or, overfits just a little
you do that using dropouts
One question here, can I make the 1st Dense layer as 13 neurons because the dataset has 13 features?
what are the 13 features ?
longitude latitude housing_median_age total_rooms total_bedrooms population households median_income ocean_proximity_<1H OCEAN ocean_proximity_INLAND ocean_proximity_ISLAND ocean_proximity_NEAR BAY ocean_proximity_NEAR OCEAN
the last 5 were one-hot encoded with pd.Dummies
and what does the output of the model mean ?
alright, let's try to first normalize these, maybe with z-score along each column except for the one hot encodings
This is the output
tho that would kinda make it dependent on the sample
you need to get these in more reasonable ranges
for example house median age, maybe I'd divide every value by 30 or something like that
median house value too, by 300k or something
I actually did normalize before training
NORMALIZATION & STANDARDIZATION
features = ['longitude', 'latitude', 'housing_median_age', 'total_rooms',
'total_bedrooms', 'population', 'households', 'median_income',
'ocean_proximity']
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_test_scaled[0]
array([ 1.16597857, -1.33318189, -0.68338903, -0.76968499, -0.61778743,
-0.79510954, -0.64364484, -0.36439632, -0.89050504, -0.68141436,
-0.01649168, -0.35421275, 2.59982148])
the resulting X_test_scaled is alright?
model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(25)
tf.keras.layers.Dense(1)
])
maybe something like that
negative values are acceptable right?
yeah
make sure the output is also normalized
which I suspect it is not because of your large loss values
Output must be normalized?
This?
yes
if you look here it kidna plateus on the same order of magnitude as those values
For y
scaler = StandardScaler().fit(np.array(y_train).reshape(-1,1))
y_train = scaler.transform(np.array(y_train).reshape(-1,1))
y_test = scaler.transform(np.array(y_test).reshape(-1,1))
this is the scaling I'm using for y values
This is the graph now
It plateaus in its scale
how long does it take to do 100 epochs ?
2 minutes
I think generally ppl dont normalize the output y_values, right?
why not ?
I dont know I've never seen
ive never seen a non-normalized one, usualyl you're ven supposed to interpret the output as a set of probabilities
Do they use it in regression problems?
last time I did a curve fitting like the one you're doing I used normalization on the output
also tailor and fourier features
a small learning rate helped
lets gogoogoooooooooooooooooooo
the weight initialization helped
omg thank god
make it larger
fill your gpu memory as far as possible
I got 84% accuracy, aint gonna need no pretraineds
Yeah but your graph looks smooth
you have no idea how much work that took
gonna do 40 epochs like in the paper
then im just gonna push and do a release
tomorrow is CUDA time yo
you do that with Docker?
packaging n everything...
I'm just now learning that's why I asked
yeah, so I got these base images, which are meant for production use, I then have these github actions workflows that build on top of these images to produce my development and stagin environments
this setup allows me to very quickly switch from developing on my laptop, to developing on an aws machine with gpu
anything that works during development is guaranteed to work during production
and it's all very cost effective because I'm using interruptible instances
docker is very nice, I like it a lot, tho it's a solution to a problem that the industry created
yeah so Docker ensures all the dependencies are packed together so that in production the image can deploy and run anywhere, right?
each docker container is basically its own VM. so yes.
VM is?
virtual machine
an instance of an operating system
Okay, where do I learn docker and all its applications
I just got started with a 3hr video course on yt
a good way to practice would be to build a model, and then create a docker image that, when run as a container, allows users to interact with that model in a jupyter notebook.
which means that you'll need to write a Dockerfile that copies the model into the image, and installs all the Python dependencies
Hmm great...
this is using avg pooling instead of the usual attention mechanism
it would seem that there's something about the metaformer paper
only slightly worse
now I wanna train it for next token prediction, now way it could work for that right
I can believe sentiment analysis, because really, all you need is to count how many bad words and how many nice words
in fact im using identity now instead of the attention
Identity is suss tho, identity can be suss, but also could be not suss due to the aforementioned reasoning
The network will operate on each embedding individually and then average out to one token which is then projected to the output probabilities
This is kinda suss
By installing Docker and doing the official tutorial
Docker ain't perfect, but it's the best we got
noone? ;-;
It would be interesting to see for sure
Might not be able to do it tho, I'm getting this CUDA stuff done this week and next week I gotta shift focus to my job search
I'm leaving the project in a good state tho, easy to extend, all that's really needed is to add a pipeline that generates the right data
Next token prediction is just sequence to sequence without global average pooling
Ah there's stuff that does require major mods
Like machine translation or summarization, which I'm guessing require encoder decoder
I'm gonna start applying like a madman. Likely gonna focus on London cuz the EU market is so small
There's like 5 job openings in Switzerland ._.
I'm rooting for you πͺπͺπͺπͺ
This might interest you.
Look at Belgium, the ML market is really "English friendly"
pandas DataFrame, to select every n-th row
Starting pandas version 2.2.0 it becomes harder to use iloc property to select every n-th row from DataFrame. It happens because iloc got deprecated. What are alternative ways when index has the default form (it was created implicitly by DataFrame constructor called without index-related arguments neither it was modified)?
It happens because iloc got deprecated
what?
Pretty sure that's just misinformation. If not, show proof / link where did you see that.
found it in pandas.DataFrame.iloc API reference
this?
Youβre right only one feature depricated. Sorry
tbh I don't get what they mean by "Returning a tuple from a callable is deprecated.", this doesn't makes sense on this page and I do not see anything about it in the 2.2.0 changelog either
!e oh wait, probably something like ```py
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [1,2,3,4,5,6], "C":[1,2,3,4,5,6]})
test = df.iloc[lambda frame: (len(frame.index)//2, len(frame.columns)//2)]
print(test)
@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | /home/main.py:4: FutureWarning: Returning a tuple from a callable with iloc is deprecated and will be removed in a future version
002 | test = df.iloc[lambda frame: (len(frame.index)//2, len(frame.columns)//2)]
003 | A B C
004 | 3 4 4 4
005 | 1 2 2 2
hmm yep, that doesn't really works like I expected either
(it picked multiple rows instead of a row and a column)
!e that may as well be why it's deprecated lol```py
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [1,2,3,4,5,6], "C":[1,2,3,4,5,6]})
test = df.iloc[(len(df.index)//2, len(df.columns)//2)]
print(test)
@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.
4
still weird that it is not mentioned in https://pandas.pydata.org/docs/whatsnew/v2.2.0.html though
eh, just leaving the Issue that caused it to be deprecated in case anyone is curious https://github.com/pandas-dev/pandas/issues/53533
hey guys has anyone worked on the elec2 dataset and what should i do about drift detecting and what stuff should i apply to improve the performance
thank you !
yeah I'll be looking into almost all European countries where it's customary to use english in the workplace, I'm learning german but it will take quite a while
thank you !
ML6 is a great company, I think you'd be a good fit and you could get in so definitely apply π
sure, I will, thank you for the suggestion !
Being a Machine Learning Engineer at ML6 means you consider yourself as a healthy mix between a machine learning expert, a software engineer, a researcher, and a hacker! π€
very nice
Yup, I've seen many of their talks. They do really cool stuff. They'd be high on my list when I decide to change jobs myself.
aaah, nothing like battling nvidia right in the morning to get the heart pumping
what kind of stuff do they do ?
Cutting edge ML, but practice focused
I battled my sysadmin yesterday for more resources on my VM but the result was a 24h downtime and a wiped machine
I know smart people who refuse to work in software because they don't have the patience for it, and indeed it does really require a lot of it at times
When my colleagues say BS like "bUt WhY nOT rUn oN CpU" I want to quit immediately
Do you mean ML people?
I think there's a lot of anti software snobbery with data scientists but I've mentioned this already
no, people outside the industry altogether, who could get in if they wanted because they have the educational background for a transition
I think the two worst environments are nvidia's and javascript, they compete for first place as the worst dev experience possible
nvidia does it by forcing pytorch to do 8gb docker images and not allowing emulation so that people have to buy their gpu
I think there's also no cross platforming with the layer where cuda resides
I think the Nvidia docker runtime is well done tho, I wonder if it is possible to map nvcc inside the container instead of having to pull their dev image
"no space left on device", there's literally 100gb on that machine and I'm just pulling a docker image
I don't get it
And I just added 50gb, so somehow it's downloading 50 extra gbs
I had pulled it locally first to test it out, says 8gb
No wait, says 18gb
The root volume of the AMI I was using is different from the other AMI, so the volume got mounted but it wasn't root.
Giving it 125gb for good measure tho
An entire morning later, I have the compiler running
The Nvidia stuff with docker drives me nuts
Iβm unfamiliar with this, Whatβs the docker stuff with nvidia?
It is just an inconvenience with using CUDA, the Nvidia libraries and tooling single handily make the docker images enormous and difficult to run reliable across environments (Things like CUDA versions not aligning
)
I am looking forward to Burn's auto fusion system using WGPU becoming a bit more mature since it solves this issue providing you don't need the absolute most efficient and fastest system possible.
Oh, sure, yah we have to build out the images, the whole cuda setup is just a pain. Thought you were saying something else about docker
nah, it isn't specific to docker either, but with docker images you normally care to make them as small as is reasonable since it helps start times among other things, and CUDA just throws that out the window :P
maybe that's the issue, and I should try to build this directly on the machine
but then it breaks the rest of the workflow I think, since im using docker for everything
I think that is normally the best place to start, at least that way you can start to pinpoint what might be causing it
If I remember right, we had some issues with miss matched cuda versions, where the CUDA v11 pytorch image didn't want to work on EC2 for some reason, but the V12 image did
No idea why, didn't question it, just accepted that it was working and agreed to never touch it again
if it's anything like what I experienced it was probly the ARM architecture stuff, don't know if they resolved it but some time back they didn't have arm wheels or arm images
Haven't attempted ARM yet
My end goal is to use the Inf2 EC2 instances, but I still need to write a bunch of bindings and libs to work with the compiler and things
personally I think AWS has actually got a CUDA killer... If they focussed on lib support and integration a bit more
after this morning I think anyone that comes forward with a better dev experience is a cuda killer
but i reckon it's more complicated than that cuz it's also hardware stuff
The inf2 instance are pretty insane compute wise
It is just the lib support that is -_-
I believe they also stopped supporting ONNX which was a weird move imo
can someone help me if they know about topics like nlp, summarization, topic modelling, stuff like that? i'm doing a project and i need to know if there are articles/videos which specifically will be useful to my project
I'm gonna go in steps
- Compile simple c++
- Bind it to rust
- Compile c++ with some torch in it
- Bind it to rust
- Compile c++ with CUDA
- Bind it to rust
- Compile c++ with torch and CUDA
- Bind it to rust
Rn I'm trying to do a python binding using setup() in py. Which might not even be the right direction since I'm not binding it into py like that
If I bind anything into py it would be rust
I can't not use docker, the experience is bad, but it will only be worst without it
Even if some short term relief is achieved, and even that is not guaranteed
See the nano gpt repo, really good to get into transformers
Another possible approach is to try to see how rust behaves when interoping with CUDA, and then see if there's some way to assemble that into a custom layer directly in the torch rust bindings
I did check, and have not found a good way to do it in the rust torch bindings. But I'd definitely have to dig deeper.
They are mostly holding on to a pointer to the tensor and then calling c++ code with it.
Hi people. Is there anything who is here and can have a look on my code?
So I have trying to make a temporal view from my data. 2 variable. Each variables has 6 different areas with (3;2;1 sub-area).
Over the temporal subplots I`ve created, I want to make a mean value as for the area (from the sub-area together). So I want basically but the mean value to be shown for each day as well as the individual data.
I`ve done 2 different experiment. But the data will only be shown in the first experiment and for day 11 of the experiment 2
Experiment 1 (Day 1-5)
Experiment 2 (Day 6-11). So on experiment 2, data for 6-10 is missing, it must be either the way I am calculating the mean value of filtering the data.
Alright so if i got you correctly,
You want the mean value of the all the data from each day right?
That is accurate. But also the individual data.
Alright so just to clarify,
For each day you want the individual data and the mean value of that data right?
Yes.
The mean data for :
URT which would be (URT1, URT2 and URT3) and the individual for each URT1, URT2 and URT3
URC which would be (URC1, URC2 and URC2) and the individual for each URC1, URC2 and URC2 and so on
Alright so when i run your code, it runs and creates the mean value, but can u let me see your raw data so that i can check it real fast?
I can PM you the data.
Problem was not solved. I am open for hearing suggestions.
hey everyone , just got started with computer science and theres a course on database , main ERDs , i was wondering if any of you have sources to free exercises i can try and maybe even software for the diagrams , thanx in advance
can you at least state what dimensions your data has? i see your data looks like this? Experiment, Area (is that the legend?), Day, CH4, and CO2 -- and what are the colors?
ideally you could share a sample dataframe
you might want to ask in #databases , but check the pinned messages & search the chat history in that channel before asking your question. "can someone recommend a course" questions tend to get the same answer over and over.
Is there a stdlib way to do memory profiling?
What do you mean by saying "Dimension" in this contex?
I call it experiment has the same stuff has been done twice, just two different times.
Area would be the legend. The color would represent the individual spots for each area.
I can check out #databases for course.
a "dimension" is like an identifier. so area is a "dimension", day is a "dimension", etc. all the "IDs" that aren't "data", if you will.
Got it.
got it. so your data is like this?
Experiment | Day | Area | Measurement | Value
---------------------------------------------
1 | 1 | URT | Ξ΄13C-Ξ£CO2 | 10.5
...
so you have 1 measurement each of CH4 and CO2, per bucket per day?
or do you have repeats per bucket per day?
The first mention. I have one measurement of CH4 and CO2, per bucket, per day.
okay, so what are you trying to visualize? the daily average across buckets? or you just want to plot that full data?
As far I'm aware, there isn't
What I initially want to plot is the mean value for each bucket given the same name:
Mean value in grey of all URT (URT1, URT2, URT3) and additionally each individual given a color. And so on for the other. CV will only have 1 tho. and CH2 two
mean value across days? experiments? both?
because i thought you only had one measurement from each bucket per experiment per day. so there's nothing to take the mean of
thanks mate
@quaint loom you also mentioned an "individual" -- what's that in this context?
The mean value of ex URT1, URT2 and URT3. The individual would be shown as 1 plot for URT 1, one plot for URT 2 and 1 plot for URT 3
yes but the mean value aggregating across which other variable?
is this the table for one experiment? or is this the whole data?
don't be coy here
I tried to read to get the context, but i didn't get it
Maybe a diagram or some sample showing input and output makes more sense?
Is the same tavke for both experiment.
Could be better that if I choosed the average value instead.
i think you might need to share example data. or step back and provide a more detailed explanation of your data
i feel like you're saying conflicting things
# Columns: Bucket, Day, Experiment, Ξ΄13C-Ξ£CO2, Ξ΄13C-Ξ£CH4
data: pd.DataFrame = ...
data = data.set_index(["Bucket", "Day", "Experiment"])
is this what your data looks like, or no?
How can I share a example of the data similar as you did here?
i want to know what the raw data looks like, i.e. what's in the excel sheet. the screenshot you posted shows one measurement of CH4 and one measurement of CO2 per day, per bucket. it sounds like that is in fact representative of your data. but now you're talking about averaging and i'm trying to figure out what you want to average over.
My internet seem to be a bit slow, so I can`t paste the data somewhere.
Maybe I am also not using the vocability (As you`ve mention to be years ago) not accurate. Not sure if it would be more suitable to use the word average or mean. But I would like to plot the Mean/Average of URT 1, URT2, URT 3 etc for each day.
I will share a picture of the dataset to I find a solution or if you could add me as a friend and I can share the excel file .
plot the Mean/Average of URT 1, URT2, URT 3 etc for each day.
you want to plot that average across all 4 experiments that were carried out each day?
Hmm then any as-good-as-stdlib de facto way?
@quaint loom for each of CH4 and CO2, you want to create a plot with an X axis that says "Day" and a Y axis that's the average CH4 or CO2 level, across all buckets?
or you want the average within each bucket? but then i don't know what you want to average over.
Maybe that is where the issues lays. Because the way you phrase it makes me confused but logical somehow. Just Average/mean given that day1, 2, 3 - 12
# Columns: Bucket, Day, Ξ΄13C-Ξ£CO2, Ξ΄13C-Ξ£CH4
data: pd.DataFrame = ...
data = data.set_index(["Bucket", "Day"])
so your data is like this?
where Bucket and Day uniquely identify 1 measurement of Ξ΄13C-Ξ£CO2 and 1 measurement of Ξ΄13C-Ξ£CH4?
For memory profiling you mean?
Yes
One Graphs with CH4 (Experiment 1), One graph for CO2 (experiment 1) and same for Experiment 2
X axis I want days, Y, the value of CH4 or CO2
Sorry the messages in between had not loaded in for me
These memory-profiler, which is commonly used
Average/mean of all 3 bucket for each day.
Thanks
I've been having a great time with scalene recently
A few of my colleagues have also been testing out memray from Bloomberg, heard it's pretty good
@desert oar But I also want the individual bucket to be represented in the graph, shown with color.
so you want the individual buckets and the average across buckets?
i see. that was not clear
@quaint loom does this work?
# Columns: Bucket, Day, Ξ΄13C-Ξ£CO2, Ξ΄13C-Ξ£CH4
data: pd.DataFrame = ...
fig, axs = plt.subplots(2, 1)
# CO2
ax = axs[0]
ax.set_title("Ξ΄13C-Ξ£CO2")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
ax.scatter(bucket_data["Day"], bucket_data["Ξ΄13C-Ξ£CO2"], label=bucket_name)
ax.legend()
# CH4
ax = axs[1]
ax.set_title("Ξ΄13C-Ξ£CH4")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
ax.scatter(bucket_data["Day"], bucket_data["Ξ΄13C-Ξ£CH4"], label=bucket_name)
ax.legend()
plt.show()
oh i didn't add the averages, hang on
what's this Experiment 1 and Experiment 2...
you didn't say anything about that. i was trying to get clarification but it sounded irrelevant
anyway what i posted above should at least give you the right idea. just need to separately call ax.scatter for each plot
Think about it as different days. I thought I mentioned to you that it was the same thing, just different days.
but you also have day on the X axis
are they days, or are they something else?
Experiment 1 is day 1-6 and experiment 2 is day 7-12
ah
# Columns: Bucket, Day, Ξ΄13C-Ξ£CO2, Ξ΄13C-Ξ£CH4
data: pd.DataFrame = ...
daily_avgs = data.groupby("Day")[["Ξ΄13C-Ξ£CH4", "Ξ΄13C-Ξ£CO2"]].mean()
fig, axs = plt.subplots(2, 1)
# CO2
ax = axs[0]
ax.set_title("Ξ΄13C-Ξ£CO2")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
ax.scatter(bucket_data["Day"], bucket_data["Ξ΄13C-Ξ£CO2"], label=bucket_name)
ax.scatter(daily_avgs["Day"], daily_avgs["Ξ΄13C-Ξ£CO2"], label="average")
ax.legend()
# CH4
ax = axs[1]
ax.set_title("Ξ΄13C-Ξ£CH4")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
ax.scatter(bucket_data["Day"], bucket_data["Ξ΄13C-Ξ£CH4"], label=bucket_name)
ax.scatter(daily_avgs["Day"], daily_avgs["Ξ΄13C-Ξ£CH4"], label="average")
ax.legend()
plt.show()
that should do it with 2 plots, one for each chemical you're measuring
if you want to split it up into 4 plots per day you'll have to do more work
but again you might want to look into seaborn, it automates a lot of this for you
i don't personally like seaborn very much, but it does make it a little easier to make multi-faceted plots like this
Yea, I like seaborn.
this is how you can define the experiment column easily:
data["Experiment"] = np.where(data["Day"] >= 7, 1, 2)
and that lets you now set Experiment as a facet
Dank. I have been filtering the days, so separate them from other day: experiment2_filter = [6, 7, 8, 9, 10, 11] Not sure if you looked at my code
i did but it was 160 lines of stuff that i didn't fully understand, so i hope you'll excuse me for not trying to analyze it too carefully π
All good. Thank you for your input, Salty rocky l@mp
of course. in the future it helps to define your terminology up-front to avoid back and forth interview like this
Im working on the terminology βΊοΈ
Why is numpy.linalg.det() so inaccurate?
or is there something wrong with my algorithm?
(I'm using reduction to upper triangular matrix method)
i got tested cpp code bound to rust, tested in the rust code
it also compiled with torch
now I'm gonna try to get a tensor operation to compute in the cpu via the cpp code
If you are already transforming it into an upper triangular matrix why do you even need to compute the determinant with a function like that in the first place.
I'm using np.linalg.det() for testing if my algorithm works
apparently I'm passing 614/10000 tests
and getting an average of 0.42% of error
wait wait
I don't really understand what you are doing at all where you can conclude that the method is numerically unstable as opposed to whatever your algorithm is.
What is the condition number and error of one of these matrices that are "so inaccurate"?
no numpy was giving me results in the weird floats so i thought it was using heavy optimization tricks which results in inaccuracy
I actually fixed my algorithm now
numpy was correct
9938/10000 passed
my bad
CUDA is compiling into rust, just gotta get the syntax right
But I'm gonna have to do it without torch for now.
No idea how their CUDA API works
So I'm just gonna have float pointer passed to the kernel and play around a bit with it
Just got my first kernel run on a GPU
Didn't do anything
Cuz I didn't code anything, but I can see from Nvidia smi that the process is accessing the GPU as I hit cargo test
This was a lot smoother than expected ngl
Afternoon guys, I'm having trouble getting Cuda working on VS Code using Python 11 + Pipenv. Even though I have a CUDA enabled GPU and installed CUDA toolkit, torch.cuda.is_available() outputs false and that's it. I can't use my GPU in windows 10.
Does anybody know what the propblem might be?
Can you run nvidia-smi ?
i think so
Which command did you use to install pytorch ?
pipenv install torch==2.2.1
That's not the one for GPU
is there a different library for torch gpu?
Well actually now that I'm seeing it would work, but only if your CUDA version is 12.1
I have cuda 12.4.99
uhm you'd think 12.4 is compatible with 12.1
where did you check that it has to be 12.1 specifically?
in the link I sent
not sure if it's possible to downgrade cuda version
I've been using the nvidia docker runtime
it can also be frustrating at times, but it does free me from these kinds of issues
what's this?
it's a docker runtime that gives containers cuda capabilities
you just do --gpus all when running the container and it becomes available
the good part is that pytorch has official docker images
so version mismatches are usually not an issue
which containers are you referring to?
Note the docker deamon also needs to be configured normally to use the nvidia container toolkit
sorry im very new to this
ah I code in aws machines that come with it installed
but last time I had to install it for some deployments I dont recall having to do that much with it
even had a script for it
thanks for all your help, I will try to fix it
this is the worst part of it, dealing with this stuff
I just blindly ran pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 inside of my pipenv virtual environment lol
it's moving something, so we will have to see if it works
I think the source of this mess is, if I'm not mistaken, is that Nvidia won't let us have a CUDA alternative - not too sure about it, I overheard it in a yt video
There's even a video of the creator of Linux flipping them the finger, must be a reason for it >.>
Wait it started to work, but hmmm
None of the installed files are in pipfile
pipfile ?
yeah I'm using a pipenv virtual environment
I see, a requirements.txt replacement
something like that
It's in the title
There's also poetry
I've replaced all this with docker, not sure if it will ever make sense to go back, don't even have a requirements.txt in my project
The dependencies are pinned in the docker file?
Yes
How is it any different than having a requirement file?
I can pin dependencies from 3 or 4 diff languages
There's also env variables and other necessary scripting that can be moved to the docker file
So it centralizes a lot of stuff
Within the docker file itself it's either
Pip install packages...
Vs
Pip install -r requirements.txt
I don't see an improvement
The improvement lies in not having an extra requirements.txt file around and in being able to pin down the dependencies from the other languages and other parts of the project
Lol it just combines both the files
The docker file is also production ready
What does that mean?
I think you are not seeing the big picture
This doesn't make an improvement
Either i have all the packages within the docker file or point the docker file to a requirement file
I don't see an improvement
Rather it makes the docker file more cluttered
I've listed my reasons already, if you have any argument against it I'm happy to listen to it. But at this time this is the best workflow I came up with.
@final kiln this basically
why duplicate effort and have a requirements.txt file when you can just hardcode it into the dockerfile
How is it duplicated though? The docker file can always point to the requirement file, which can get updated independently of each other
It's even more clear what's changed in a pr
you're stating the same thing twice, install these dependencies
like, I'm not gonna argue about this, I prefer stuff in one place
Buddy the docker file will have a pointer to the requirement
Like
Pip install -r requirements.txt
That's it
I don't like it and I have valid reasons to not like it, there's dependencies that have to be installed in a particular order for example
has happened before
Just want to hear your reasoning not an argument
there's dependencies that you only need in a builder step
I don't like having one file for setting up my environment, and then a separate file for setting up another part of it, makes no sense when I can just have it laid out right there within its context
You can have multiple dependencies files
Usually I have like deps folder with main, dev, test etc etc
in general I avoid making people jump around my code, I try to minimize the number of indirections
So having everything in one file is more practical?
Understandable
yes, it is, because it's literally just a list
Who even reads dependencies files at all, i usually look it up once in a quarter to bulk update all together or setup a deps bot

I read it when there's an issue
or when I need to check a version
or when cuda is acting up again
hardcoding is how I do it, you do it different, it's fine
Like core dependencies are usually defined in pyproject.toml, for packages
For apps to make reproducible build, a requirement file is used to pin all dependencies
I hardcode the production dependencies in my docker file and my dev dependencies are installed via github actions workflows
something that just came to my mind: is cgpt, at least initially, speaking so, you could say, eloquently, because those words may have been rarer in the dataset so they were artificially inflated with weights or whatever and that turned out to just make it choose those words more often in the end?
it allows me to code in any machine in a reproduceable way, it fetches me a machine from my cloud provider and exposes me a vscode instance in the browser
This is a common pattern used across many projects, you might wanna look up some oss projects on how they do it
I understand it is a common pattern, but I don't use it as I don't code exclusevely in python
there's a lot of moving parts in my project
which don't fit into that structure
Neither do it, polyglot ftw
I think they just selected a good dataset right, at least in the fine tuning stages
Usually it's not always as complex as we think it is, somebody would have solved the problem we have
If not that will birth a new oss utility π
I'm interoping cuda, c++ and rust with torch permeating all of them
and I also dont have a gpu
if you have a better way of achieving a burn rate of 10cents an hour, I'm all hears tbh
I don't know what lead to this, but having each as a seperate self contained module and using python to interact would be a viable solution
I just decided I wanted to challenge myself
I actually couldn't compile a cuda extension with python, ended up being a lot smoother with rust, go figure
Well good luck engineering, don't forget what you originally tried to solve π€ͺ
no, there's solid reasoning behind each decision, I'm exploring the use of compiled languages for machine learning, I believe that a synergy is possible because the compiler can be used to perform early math checks
rust is being used because the torch cpp interface is not stable, the only job of the rust torch maintainers is to keep that torch interface stable to its users
I decided to code cuda because it's the only way to squeeze out the performance gains from my proposed attention mechanism
unless there's a layer that leverages the symmetry of symetric tensors, but I couldn't even fiind a way of imposing that with pytorch without a lot of overhead
rust runs the training loop in a process, python generates data in another, which means there's never a gpu down time thus being cost effective
My deep learning set up at work finally works π
Still don't have all I need, in the sense I'm being bottlenecked by RAM and CPU and not by GPU/vRAM
Luxury problem
@final kiln what is the actual problem you're trying to solve?
I'm exploring a couple of research questions
I've just laid out one of them
I still didn't get the intent though, the use of certain tech could have it's reasons
the intent is to the use of compiled languages for machine learning, I believe that a synergy is possible because the compiler can be used to perform early math checks
and to also extract results from my new attention mechanism by reproducing the metaformer study
But aren't most ml packages complied to begin with?
the compiler can be used to perform early math checks
Python is used to interact and interface
which are not currently done

I don't think you're trying to understand what I'm saying
You mean to say optimise out steps?
I mean to detect that a matrix operation will not pan out before running a training loop
yes you can have runtime checks
but with a compiled language you can have the compiler carry out the work and not have the overhead when training
I've also felt the need for stronger type checking in the models as my projects grew larger
Still I'm wondering the data/matrix only comes in to place during runtime, how a compiler can find out this at compile time?
you can, for example, declare a tensor to be of a shape Tensor[a, b, c] and use symbolic calculus to infer the rest
I managed to trick the rust compiler into it using macros
On the arch yes
but it's an open ended exploration, idk how far it will get and if rust is even the right language for it