#data-science-and-ml


final kiln
#

sometimes the runner is my laptop, so I need to be careful

#

never heard of caddy, the automatic https thing sounds appealing

past meteor
#

If you ever need to do anything webby like deploying mlflow I'd use it ahead of nginx

final kiln
#

I've been using traefik, it works well with docker compose, which has been the main orchestrator I use

past meteor
#

Traefik is Caddy's main competitor yeah. It also works well with Docker compose (which is what I use as well)

final kiln
#

tho the issue here is just that the free machine only has two cores, so the two workers get overrun very easily

past meteor
#

Personally I deploy 1 Caddy for all my apps and not 1 per

final kiln
#

I'll give it a try for sure, using nginx was a nightmare, especially with docker

past meteor
#

Maybe you could look into getting an EC2 instance or similar to permanently host stuff for you

final kiln
#

yeah I'll upgrade to a spot instance, use skypilot to get a new machine once it's taken away

#

it's like 5 bucks per month for one of the good ones, assuming constant usage, which wouldn't really be the case

#

5 or 10

past meteor
#

What do you get for €5?

final kiln
#

I don't recall, it was one of the cX machines, I just skimmed through to see what price I could get

long canopy
#

man doing profiling in python sucks

past meteor
#

What are you using?

long canopy
#

am trying to see why training is slowly filling up my RAM then my swap, am using memory_profiler

#

the __call__ to DistilbertModel from huggingface's transformers increments memory usage by 100 mb on each call, so i'm trying to see what exactly is going on

#

also for some reason memray says peak memory usage is 600 mb, yet memory_profiler shows it going over 12 GB

past meteor
#

pretty good experience with it all round

long canopy
#

thanks!

coral lotus
#

Hi so I am trying to train a neural network to detect cells in an image using faster RCNN. I have a dataset consisting of 1300ish such images, that are labelled as well. Each image contains several cells, most of which are red blood cells and the rest of which are infected cells. Is there something pre-existing that I can use to train a network on my dataset of images?

past meteor
serene scaffold
desert oar
past meteor
desert oar
past meteor
coral lotus
past meteor
#

but I wouldn't know why

desert oar
#

also @coral lotus from a human perspective, how hard is it to distinguish the images? are you interested in getting a useful estimate of the probability distribution over labels, or just minimizing prediction error on labels?

past meteor
#

How good are you with object detection/segmentation already?

coral lotus
#

just minimizing prediction error i guess. my main goal is just to make cell identification as accurate as possible

coral lotus
desert oar
#

oh yeah is this a detection/segmentation task, or are you just classifying the entire image?

past meteor
coral lotus
#

so this is what an average image looks like

past meteor
#

If I were you I'd learn about object detection and segmentation first and then learn how to do what you're trying to do after

#

If you don't know enough about neural nets while you're doing that I'd advise you to do that as well

coral lotus
#

im gonna be honest, this is for my science fair project thats coming up pretty quick so i dont have much time. But I don't think it should be too hard to do them both?

past meteor
#

Gonna be honest and say that I (and most folks) won't want to walk you through the entire thing either but are definitely willing to help if you have specific questions

coral lotus
#

i was going to use a faster rcnn framework to detect the cells in an image and then a cnn framework to classify the exact cells

desert oar
#

@coral lotus if you just want to classify the entire image, this might be on par with mnist (considered easy/solved) depending on how distinct the infected cells are from healthy cells. for actually detecting/counting infected cells it might also be easy but i don't have experience with detection or segmentation and don't want to speculate

coral lotus
#

I mean i guess its just classifying an image? not completely sure. like if there is one infected cell in an image then the whole image can just be considered as infected

#

because they are close up images of blood smears of patients

#

so if there is a single infected cell in an image then that means the patient is infected

past meteor
#

Do you want to draw boxes on every infected cell or do you want to say "this image has infected cells"?

coral lotus
#

I mean preferably drawing boxes, but just saying "this image has infected cells" would be enough for my project

past meteor
#

As usual I agree with salt rock lamp

#

"This image has infected cells" is easy

#

Drawing boxes isn't too hard either but you may have to label data and that's the time consuming part

coral lotus
#

yeah so like if this is an image, i'd just want it to say "this blood smear shows that the patient likely has malaria"

past meteor
#

Unless your dataset is already labelled

coral lotus
#

its already labelled

past meteor
coral lotus
#

alright thank you

#

also by labelled i mean the dataset came with a json file consisting of thousands of lines of this

[{"image": {"checksum": "676bb8e86fc2dbf05dd97d51a64ac0af", "pathname": "/images/8d02117d-6c71-4e47-b50a-6cc8d5eb1d55.png", "shape": {"r": 1200, "c": 1600, "channels": 3}}, "objects": [{"bounding_box": {"minimum": {"r": 1057, "c": 1440}, "maximum": {"r": 1158, "c": 1540}}, "category": "red blood cell"},

desert oar
#

at least according to that data

#

i would still start by just trying to classify the entire image infected or not. easier project, more forgiving
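
A minimal sketch of how whole-image labels could be derived from the JSON record quoted above. The two-image sample, its pathnames, and the `"infected"` category name are made up for illustration; the assumption that every category other than `"red blood cell"` counts as infected comes from the dataset description earlier in the chat, not from the file itself.

```python
import json

# Hypothetical two-image sample in the same shape as the record quoted above;
# the pathnames and the "infected" category name are made up for illustration.
SAMPLE = """
[
  {"image": {"pathname": "/images/a.png", "shape": {"r": 1200, "c": 1600, "channels": 3}},
   "objects": [
     {"bounding_box": {"minimum": {"r": 1057, "c": 1440}, "maximum": {"r": 1158, "c": 1540}},
      "category": "red blood cell"}]},
  {"image": {"pathname": "/images/b.png", "shape": {"r": 1200, "c": 1600, "channels": 3}},
   "objects": [
     {"bounding_box": {"minimum": {"r": 10, "c": 20}, "maximum": {"r": 110, "c": 130}},
      "category": "red blood cell"},
     {"bounding_box": {"minimum": {"r": 300, "c": 400}, "maximum": {"r": 420, "c": 510}},
      "category": "infected"}]}
]
"""

HEALTHY = {"red blood cell"}  # assumption: every other category counts as infected

def image_label(record):
    """Return 1 if any object in the image is outside HEALTHY, else 0."""
    return int(any(obj["category"] not in HEALTHY for obj in record["objects"]))

def to_xyxy(box):
    """Convert the (row, column) min/max box to (x_min, y_min, x_max, y_max)."""
    lo, hi = box["minimum"], box["maximum"]
    return (lo["c"], lo["r"], hi["c"], hi["r"])

records = json.loads(SAMPLE)
labels = {rec["image"]["pathname"]: image_label(rec) for rec in records}
boxes = [to_xyxy(obj["bounding_box"]) for obj in records[0]["objects"]]
```

The same `to_xyxy` boxes would be the starting point for a detection model later, since most detection libraries expect x/y corner order rather than row/column.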

#

but there should be plenty of introductory material on image segmentation as well. the data already being labeled with bounding boxes should help a lot

#

but again i would have to defer to other people regarding how to actually build a segmentation model

#

seems like a great practice task for me to learn 😆

past meteor
#

you'll have to read the guide and plug in the gaps as you go

broken arch
#

hello guys, I hope you're doing well. I was wondering if anyone can send me an interesting medium or big sized dataset for machine learning that they got some decent results with after working on it, tysm 🙏

warm copper
#

.latex Suppose that $f_1$ is a model with $X$ features and the loss function is denoted as $L_1$, whereas $f_2$ is a quadratic model with $X_2$ features and the loss function is denoted as $L_2$, we want to show that:

$L_2 \geq L_1$

We know that the logistic loss function is:
$L(y,\hat{y})= -y\log(\hat{y}) - (1 - y)\log(1 - \hat{y})$

Assume that the predicted probability for $f_1$ is $\hat{y}_1$ and $\hat{y}_2$ for $f_2$ on the same data point $(x_i, y_i)$:

Then, $\hat{y}_1 = f_1(x_i)$ and $\hat{y}_2 = f_2(x_i)$.

Since $X_2$ has the quadratic features of $X$, it contains all the features of $X$ as well. Thus, $X_2$ has at least as many features as $X_1$, which also means $f_2$ is more flexible to fit the data compared to $f_1$.

Because $f_2$ is at least as flexible as $f_1$ and both are optimized to minimize the loss function, we can conclude that:

$L_2 \geq L_1$

strange elbowBOT
warm copper
#

heres the question:

#

.latex Suppose that $f_1$ is a model that optimally fits the data $(X,y)$, and $f_2$ is another model that optimally fits the data $(X_2,y)$, where $X_2$ are the quadratic features of $X$. Then the loss function value obtained by $f_2$ is always going to be at least equal to that for $f_1$. Try to come up with a solid mathematical argument that justifies this claim.

strange elbowBOT
warm copper
#

correct idea? @wooden sail

wooden sail
#

i'd say no

warm copper
#

WHAT WHY?!

wooden sail
#

none of that is true nor useful

#

by your argument, raising all the entries of x to the 0th power will also be "at least as flexible"

warm copper
#

o.O

wooden sail
#

i would try doing the math for the scalar case and then generalizing
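
One way the scalar case could be explored numerically, as a sketch under stated assumptions rather than the intended proof: it assumes $X_2$ keeps the original column $x$ alongside $x^2$, and it reads "at least equal" as "training loss no worse". Under that assumption, $f_2$ with zero weight on $x^2$ reproduces $f_1$ exactly, so optimizing further can only lower the logistic loss. The toy data and the helper names `log_loss` and `fit` are made up for illustration.

```python
import math
import random

def log_loss(w, rows, ys):
    """Mean logistic loss, written as log(1 + exp(z)) - y*z and computed stably."""
    total = 0.0
    for row, y in zip(rows, ys):
        z = sum(wi * xi for wi, xi in zip(w, row))
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(ys)

def fit(rows, ys, w0, steps=500, lr=1.0):
    """Plain gradient descent on the mean logistic loss, starting from w0."""
    w, n = list(w0), len(ys)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, ys):
            z = sum(wi * xi for wi, xi in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(row):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1.0 if x * x + rng.gauss(0.0, 0.1) > 0.3 else 0.0 for x in xs]  # toy labels

rows1 = [[x, 1.0] for x in xs]            # features of f_1: [x, 1]
rows2 = [[x, 1.0, x * x] for x in xs]     # features of f_2: [x, 1, x^2]

w1 = fit(rows1, ys, [0.0, 0.0])
loss1 = log_loss(w1, rows1, ys)

w_embedded = w1 + [0.0]                   # f_2 with zero weight on x^2 is exactly f_1
loss_embedded = log_loss(w_embedded, rows2, ys)

w2 = fit(rows2, ys, w_embedded)           # keep optimizing from the embedded point
loss2 = log_loss(w2, rows2, ys)
print(loss1, loss_embedded, loss2)
```

If $X_2$ contained only the squared column, the embedding step would not be available, which is the gap the "0th power" objection points at.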

warm copper
#

when ML is all about math

#

🥲

spring field
#

it's always been about math, you could train models with pen and paper
it'd just take ages...

warm copper
#

so teacher gave a line like this in the exam

#

there were tons of points on each side of the line (two different classes) separated by that line above

#

and asked this question:

#

This line of separation cannot be done using SVM

#

I said yes

#

we can't right?

#

its because SVM needs a linear decision boundary

#

unless we use a kernel trick

past meteor
#

I think it's not a great exam question either way. you can make a new feature x' where you can get a linear separation
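
The exam figure is not in the log, so here is a sketch of the new-feature idea on hypothetical 1-D data: class 1 sits on both sides of class 0, so no single threshold on `x` works, but after adding `x' = x**2` a linear rule does.

```python
# Hypothetical 1-D data: class 1 when |x| > 1, class 0 otherwise.
xs = [-2.0, -1.5, -0.5, 0.0, 0.4, 1.3, 1.8]
ys = [1, 1, 0, 0, 0, 1, 1]

def linear_rule(features, w, b):
    """Linear decision rule: predict 1 when w . features + b > 0."""
    return int(sum(wi * fi for wi, fi in zip(w, features)) + b > 0)

# In the original feature x alone, a single threshold cannot split the classes.
# After adding x' = x**2, the rule "x' - 1 > 0" is linear in (x, x').
lifted = [(x, x * x) for x in xs]
preds = [linear_rule(f, w=(0.0, 1.0), b=-1.0) for f in lifted]
print(preds == ys)  # True
```

A kernel does this lifting implicitly, which is why the answer to the exam question depends on whether kernels are allowed.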

warm copper
#

yeah

#

I was confused with that question tbh

#

i mean it can with kernel trick but also it cant @past meteor

#

question should have been specific

terse quarry
#

Anyone knows a place where I can learn AI

native bough
#

youtube

abstract wasp
#

Hi, anyone know what the N and M stand for here?

wooden sail
abstract wasp
midnight harbor
#

Has anyone in this community started using the Google Gemini API following GPT-3, and could you provide insights into its strengths and weaknesses? Specifically, I'm interested in understanding its performance in terms of

  • pricing
  • speed
  • reasoning capabilities
  • multilingual understanding
  • controllability.
    Any feedback would be greatly appreciated.

kindly Tag me

smoky hamlet
#

Hey um ummm

#

Im gay

modern storm
#

hi anyone that know about apis and python can help me out

#

i am trying to use novelai api to generate an image

final kiln
#

In reviewing the MetaFormer paper. So the conclusion is that a local operation like average pooling can substitute global operations like scaled dot product. And by extension, a kernel conv layer can substitute the avg pooling.

#

So here's what's bothering me. Did no one ever think of using CNNs for language modelling?

#

Even if that doesn't work well, it looks like a very small step to go from a CNN to a series of (conv + MLP) type of model

#

It would legit be the second thing I'd try

#

In fact, I thought the whole idea behind transformers is that CNNs, despite reducing dimensionality, can't capture long range relations

#

So maybe the important feature is the multi headed thing, which granted, I wouldn't easily think of it

#

Omg they even use the identity map to replace the attention module, I'm sooooo confused

#

I'm super suss'd out rn ngl

#

the -> means substitute, so in case of the identity mapping they removed the avg pooling and used an identity, reducing the model to a series of MLPs with layer norm in between

#

that gives them 74.3%, which is obviously suss because the presence of a given token will not affect any of the others right
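
The point about tokens not affecting each other can be checked on a toy example. This is a sketch with made-up helper names, not the paper's code: it perturbs one token and reports which output positions change under an identity mixer versus a simple local average (window size 3 is an arbitrary choice).

```python
def identity_mixer(tokens):
    """No mixing: each token passes through unchanged."""
    return [list(t) for t in tokens]

def avg_pool_mixer(tokens, window=3):
    """Replace each token by the mean of its neighbours within the window."""
    half = window // 2
    out = []
    for i in range(len(tokens)):
        neigh = tokens[max(0, i - half): i + half + 1]
        out.append([sum(col) / len(neigh) for col in zip(*neigh)])
    return out

def changed_positions(mixer, tokens, j):
    """Perturb token j and report which output positions change."""
    bumped = [list(t) for t in tokens]
    bumped[j] = [v + 1.0 for v in bumped[j]]
    a, b = mixer(tokens), mixer(bumped)
    return [i for i in range(len(tokens)) if a[i] != b[i]]

tokens = [[float(i), float(2 * i)] for i in range(6)]  # 6 toy tokens, dim 2
print(changed_positions(identity_mixer, tokens, 2))    # [2]
print(changed_positions(avg_pool_mixer, tokens, 2))    # [1, 2, 3]
```

With the identity mixer, any cross-token information would have to come from somewhere else in the network, such as overlapping patch embeddings or a final pooling layer before the classifier.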

#

the most suss part is the hybrid stages

#

they sneakily don't include the results for [Attention, Attention, Attention, Attention,]

#

and there's a clear increase as more attention modules are included

final kiln
final kiln
desert oar
desert oar
final kiln
desert oar
#

ooooh right

#

yeah good question then

#

i always forget that detail

final kiln
#

Like, in a way, it's a batch of images

#

I guess I can buy it for the avg pooling, but the identity mapping is suss

final kiln
desert oar
#

oh i see

final kiln
#

A single one ig

desert oar
#

I'm sure it was tried, i just wouldn't know 😆

final kiln
#

You'd think it was one of the first things to try right

desert oar
#

ask during more normal US hours so one of the actual ML experts can weigh in

#

I like the interpretation of attention as a "token mixer"

#

But i think the main issue is likely to be what you said: text is less "local" than an image and you probably can't do without the long range token graph

final kiln
#

Yeah could be an explanation, the patches of an image fit together in a very different way

desert oar
#

2015

#

safe to say it has been tried

final kiln
#

Pixels close to each other are likely to be semantically related (part of the same object), but the same isn't always true for words. In many languages, parts of phrases could be separated by several other words.

#

Could be that it won't work for NLP

#

They used imagenet, and if they are doing classification, I can see this working

#

Even with identity, the patch embedding already packs a ton of information into the embedding

crisp raptor
final kiln
crisp raptor
final kiln
#

When you embed the tokens you get something that resembles an image, quite a lot actually

#

Well this is actually a batch

#

But I reckon it would look fairly similar if you imshow a sequence of embeddings

desert oar
#

i don't think it would be necessarily better for text generation, but could be interesting linguistically

#

that's always been my intuition about what transformers do anyway, we've talked about that here before

#

i wonder if you could do some kind of "semantic filtering" on the output of a transformer stack, and then running a convolutional layer over something like a sliding average of tokens. less grammatical detail, more big picture

#

i should noodle around with that + nanogpt

#

or maybe something encoder-only like bert... need our NLP experts' opinion

final kiln
final kiln
#

But ofc, with avg pooling

#

The residual connections transfer information from earlier layers to later layers, so the order might not matter that much

desert oar
final kiln
desert oar
#

so avg pooling of the attention-ized token sequence specifically

final kiln
#

They took out the scaled dot product and used an avg pooling

#

To show that the token mixer is not important

desert oar
desert oar
final kiln
#

I'm replicating it for NLP

#

Quite curious on how it will turn out

desert oar
#

as in, running the same experiment but on a text dataset?

#

yeah that will be interesting for sure

final kiln
#

This one is already with a quadratic form attention, instead of Q K V

desert oar
#

your metric tensor variant right?

#

how did that turn out by the way

final kiln
#

The metric tensor one is not done yet, I'm gonna code it in cuda directly to impose all the weird conditions that I need

#

But the pytorch implementation did well for the Amazon dataset

#

It achieved close to SOTA performance

#

If you filter out those models that weren't pre trained beforehand

#

Like training Bert on next token pred with google level datasets and then fine tuning for sentiment analysis

#

I'm coding a bunch other attention mechanisms today, including the avg pooling

desert oar
#

cool!

final kiln
#

It's taking a while because I was figuring out my ML workflow, I definitely got it down now

#

So after finishing the model stuff and the extra validation steps on the training loop

#

I gotta setup a couple more datasets and then I can trigger the experiments

#

Those are gonna take a looong while, so I can finally pause this and give a bit more attention to my job search 😅

final kiln
# final kiln

I mean, they did only use one dataset here, so I might do the same

#

Yeah I might keep it to text classification. The IMDB dataset is very good

#

Perhaps too good tho

desert oar
turbid drift
#

Just tried data scraping https://www.city-data.com/ and I made sure to use a timeout between each request, anywhere from 1 to 2 seconds long (random), but my connection still got closed and the IP I was using was blocked. Thankfully I was using a VPN so I can try again, but how the heck is 1-2 seconds between requests too fast? I thought hosts would only deny requests that were like 0.1 seconds or less in between.

#

I can't make it much longer, because I don't have all that much time to wait (imagine if I was having to do this for a real job; they certainly wouldn't be that patient) and what if it encounters some problem with a random page partway through and I have to do the whole thing over again?

#

A little frustrated since I'm doing this for a portfolio project that my resume kind of hinges on. Why is 1-2 seconds too little anyway? If I wanted to ddos attack someone (which I don't), common sense would say I'd do it a lot faster than that.

final kiln
desert oar
# turbid drift Just tried data scraping https://www.city-data.com/ and I made sure to use a tim...

unfortunately their ToS prohibits automatic scraping, so we cannot help as per server rules.

https://www.city-data.com/terms.html

This license does not include any right to private or commercial collection, aggregation, copying, duplication, display or derivative use of the Service nor any use of data mining, robots, spiders, or similar data gathering and extraction tools for any purpose unless expressly permitted in advance in a written document signed by us. The sole exception is the limited right provided to general purpose internet search engines and non-commercial public archives that use such tools to gather information for the sole purpose of displaying hyperlinks to the Service, provided they comply with our robots.txt file.

#

in the future, i advise not basing a school project around ToS violation and circumvention of access control

desert oar
#

@turbid drift this doesn't seem like a commercial site. so if you contact the owner of the site, they might be willing to provide you with a dataset.

turbid drift
#

Still frustrated though, web hosts are so overly paranoid.

#

It should be obvious that I'm not an attacker.

desert oar
#

yeah, good luck. this seems like a hell of a lot of work for a volunteer data aggregation project 🤔 are they making money off of this somehow?

desert oar
turbid drift
#

I know. Hopefully I can find another source if this doesn't work with a more flexible ToS.
I'd just do an analysis on an existing Kaggle project but apparently employers want me to actually come up with my own data and not just use something from there.

turbid drift
#

More specifically, I'm wanting to get data on US sister cities, including information like connections between ethnic populations and whether or not they correspond with the countries represented in those sister cities.

#

For example, do US cities with sister cities in Asia tend to have higher-than-average Asian populations?

#

Seems easy enough for a beginner project, but challenging enough to demonstrate skills to an employer.

potent sky
desert oar
#

given that this seems to be work-sponsored, why don't you bring this up with your manager / mentor / whoever and let them know that you might need to adjust your project, to avoid getting bogged down in a "gray-hat" web scraping task that's unrelated to the actual topic?

turbid drift
#

Ah, it's not work-sponsored. I don't have a specific job I'm doing this for, this is just to make my portfolio look more attractive to any employers who are looking for data analysts.

desert oar
#

oh

#

just pick a different project then

#

i get the desire to work on something particularly interesting, but imo there's no point wasting your time with a distraction. if you want to practice webscraping, start on wikipedia, which does permit scraping (within reasonable limits)

#

but for a data analyst job i think your attention will be better spent elsewhere. web scraping and related tasks can be extremely useful and can make you seem like a wizard. but focus on fundamentals is more important.

turbid drift
#

So can I just look for a random, but information-rich project on Kaggle and just do a bunch of analysis on it, and have it be good enough for a portfolio project?

desert oar
#

that job market is absolutely brutal right now as i'm sure you know. if you have a particular industry of interest, you might be able to gain an advantage by doing a project in that particular industry's domain.

desert oar
turbid drift
#

I honestly don't enjoy data scraping that much, and I only really do it because I feel like I'm pressured to come up with data I gathered myself, otherwise employers won't think I'm desirable.

desert oar
#

right, but there's a lot of data out there that you don't need to scrape from the web or call from an API in small batches

#

in general, getting data yourself can be very important. so i don't want to undersell it too much. but i think you're on the right track in not wanting to spend too much effort on it.

turbid drift
#

I feel like I might be misunderstanding a lot about the industry too. I'm coming off the Google Data Analytics certificate by the way.

desert oar
#

what kinds of jobs are you looking to get? are you looking for your first job in tech / data?

turbid drift
#

Yeah, just any kind of data analyst/data science job, remote or nearby, involving Python, SQL, R, Microsoft Excel, all of which I can use well.

#

I have a Bachelor's in CS as well, along with my Google certificates.

#

So really just my portfolio is stopping me.

desert oar
#

great, so you have the programming and technical skills. then you just need to show that you can put together a research question, make useful data visualizations, do some basic statistics, and write a coherent executive summary of your results.

turbid drift
#

Yeah, and I was on the track to doing that with my sister cities project, and moving along quite nicely with it too. Had already gathered some very useful information.

#

So I'm overcoming that imposter syndrome.

desert oar
#

based on that background you are probably a stronger programmer than 90-95% of data analysts and you might want to consider looking at more of a data science career path if you can get some work experience + a masters degree (ideally filling in the gaps you probably have in math, masters is optional but might be faster than grinding away for years at self study & looks stronger on a resume)

#

the sister cities thing is great, but unless you can get that data you might have to divert

#

what about just comparing US cities instead? come up with some kind of comparative analysis, the actual topic is less important than demonstrating that you can come up with interesting questions and answer them coherently

turbid drift
#

Perhaps. I think another thing I'm afraid of is the thought in the back of my head that someone has probably done/found that info out before, and I'm just re-inventing the wheel, in which an employer might find that out and just accuse me of having copied from somewhere else.

desert oar
#

i'd actually encourage spending less time on this particular project (maybe a few afternoons at most? just enough to answer your own question in a nice 2-page writeup) and then maybe go deploy your programming skills on some AI task. that could catch recruiter eyeballs.

#

of course, but who cares? you're not trying to get published in Econometrica, you're trying to get a job

turbid drift
desert oar
#

that's probably an exaggeration but still, your imposter syndrome seems severe and you haven't even been hired anywhere yet

#

you are doing fine. scale back the project, learn to embrace everything being slightly fucked, and go get a job

#

"everything being slightly fucked" is a normal state of affairs in data. the best data analysts are the ones who get stuff done anyway.

#

your job is to show up and answer useful questions for the business. all you need to do right now is demonstrate that you can do that. the particular choice of research question is only interesting insofar as it demonstrates your ability to think about the real-world context behind the data and come up with an interesting question. but you have plenty of other skills you need to demonstrate too, so don't get hung up on that one aspect in particular.

gentle sierra
#

Can anyone hop in my python post for a sec?

final kiln
#

I spent 2 weeks training it thinking the performance was subpar but the dataset was just bad in and of itself, best models are getting 65% acc on papers with code, I was getting 55% before overfit

potent sky
final kiln
final kiln
potent sky
final kiln
serene jolt
#

ERROR: Failed building wheel for llama-cpp-python...I want to build a docker image and I got this error message

desert oar
past meteor
#

Nice, I'm back on the hackathon grind and I have a top 6 placement (finalist) for my first

#

€3k pot for the winner

desert oar
#

i never even tried entering a hackathon

past meteor
#

I think you'd be great at them

#

The ones I participate in usually get won by a good mix of communication/business skills and tech stuff

#

So not just 1 or the other

#

Generally a nice pastime that sometimes gets you money and networking

desert oar
past meteor
desert oar
past meteor
#

I only participate in data science/ML ones. Typically just LinkedIn or even Facebook ads

desert oar
#

i'll keep an eye out!

boreal gale
#

how would you describe the level of participants in a regular DS/ML hackathon?
always wanted to join one but didn't want to be dead weight, especially after not doing DS properly for ages..

past meteor
#

Industry professionals

serene scaffold
boreal gale
#

that's a fair point

serene scaffold
#

(and the winners will probably be a team of high-performing professionals who formed a team in advance--ngl)

past meteor
#

Luckily the ones I participated at post graduating were ones where you didn't form teams ahead of time

#

But I think the level of the regulars here is probably higher than the people I see participating

#

Soft skills matter a lot

#

Personally the only thing I'm sure I can do well is presenting/pitching so that's always my angle

final kiln
#

Attention mechanisms done, gonna code a bunch more metrics to get a complete report at the end of each run

#

Time to start thinking about how I can use the metric tensor symmetry to optimize for speed, and also, how am I gonna code C++ CUDA kernels through Rust into torch or through torch into Rust? Idk, but however they did it to make these Rust bindings in the first place, I gotta do the same

#

I'm actually gonna think about this first before thinking how to code the kernels. That part will be much easier and idk how feasible the integration will be, so that goes first

tiny stag
#

hmm guys do yall think i should master machine learning? like ive got a lot of knowledge and experience with simple software dev but was wondering bout ml and whether its worthwhile or not

grizzled sail
desert oar
serene scaffold
trim jewel
#

I was doing a project on video highlights generation on python, can someone let me know how can i use the text from the video in deciding certain "key" moments from the video?

serene scaffold
#

so you'll want to google stuff like "saliency detection nlp"

trim jewel
#

i was thinking about how was i even going to stitch them back up, guess i have to do timestamps, i was just happy the transcription came out fine
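
A rough sketch of the timestamp idea, assuming the transcription yields `(start_sec, end_sec, text)` segments (a hypothetical format). Plain keyword matching stands in here for whatever saliency scoring ends up being used; the part that carries over is merging overlapping clips before cutting the video.

```python
def pick_highlights(segments, keywords, pad=1.0):
    """Keep transcript segments that mention a keyword and merge overlapping clips.

    segments: list of (start_sec, end_sec, text); keywords: lowercase strings.
    """
    hits = sorted(
        (max(0.0, start - pad), end + pad)
        for start, end, text in segments
        if any(k in text.lower() for k in keywords)
    )
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Made-up transcript segments for illustration
segments = [
    (0.0, 4.0, "welcome back everyone"),
    (4.0, 9.0, "and that is a goal"),
    (9.0, 12.0, "what a goal that was"),
    (30.0, 35.0, "half time now"),
    (50.0, 55.0, "penalty awarded"),
]
clips = pick_highlights(segments, {"goal", "penalty"})
print(clips)  # [(3.0, 13.0), (49.0, 56.0)]
```

The resulting `(start, end)` pairs are what a video tool would need to cut and concatenate the highlight reel.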

potent sky
potent sky
past meteor
#

I think my sklearn wrapper for Torch stinks a bit 😂

Training time is pretty invariant to the size of the net which means all the time is spent loading data to the GPU
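
A small sketch of how that hunch could be measured: accumulate wall-clock time per phase and compare the shares. `load_batch` and `train_step` are hypothetical stand-ins that just sleep; in a real loop they would be the data fetch/transfer and the forward/backward pass.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)

@contextmanager
def timed(name):
    """Accumulate wall-clock time spent inside the block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start

def load_batch():
    time.sleep(0.02)   # hypothetical stand-in for fetching and moving a batch
    return [0.0] * 8

def train_step(batch):
    time.sleep(0.005)  # hypothetical stand-in for forward + backward

for _ in range(5):
    with timed("data"):
        batch = load_batch()
    with timed("compute"):
        train_step(batch)

share = totals["data"] / (totals["data"] + totals["compute"])
print(f"data loading share: {share:.0%}")
```

With real GPU work, kernels are launched asynchronously, so a call such as `torch.cuda.synchronize()` before reading the clock is usually needed for the compute number to mean anything.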

#

If I care enough I'll refactor the hyperparameter search to use raw torch and the rest my wrapper

potent sky
final kiln
# potent sky Ahh positive definite scalar forms I'm actually doing some research at the inter...

I'm not sure how much fancy stuff it'd be possible to do with it since it doesn't really give for very interesting spaces like the ones you see for general relativity. It's more like minkowski spaces and really, the only thing that matters is how it affects the angles between the embeddings.

But like, the point is just that, to get the interest of people who study this kind of math, I'm not entirely sure what kind of insights can be extracted, but I think it's a step in the right direction.

past meteor
#

Do you guys think ML on tabular data is a solved problem?

#

If my job was more industry, less research I'd just create lags for my time series throw it into xgboost and call it day
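
For reference, the lag step is only a few lines. A minimal sketch with a made-up series and a hypothetical helper name `make_lags`: each row holds the previous `n_lags` values and the target is the next value.

```python
def make_lags(series, n_lags):
    """Turn a 1-D series into (X, y): each row holds the previous n_lags values."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

series = [10, 12, 13, 15, 18, 21]   # made-up values
X, y = make_lags(series, n_lags=3)
print(X)  # [[10, 12, 13], [12, 13, 15], [13, 15, 18]]
print(y)  # [15, 18, 21]
```

`X` and `y` can then go into any tabular regressor, for example xgboost's `XGBRegressor`, with the train/test split done by time rather than at random.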

final kiln
potent sky
past meteor
#

The client paid for exotic nets, so the client gets exotic nets but it's a bit of a waste of time

potent sky
past meteor
#

The improvement vs lags+xgboost is so marginal

wooden sail
final kiln
potent sky
past meteor
#

They're each other's inverse in the sense collecting data for those is a bit easier but more heavy weight models are required

potent sky
wooden sail
#

don't let me trick you into thinking i know what i'm doing though, i just know these things exist. i dabble at most tangentially in that i work a lot with fisher information, which happens to define a metric tensor in special cases

potent sky
potent sky
final kiln
#

Information geometry is an interdisciplinary field that applies the techniques of differential geometry to study probability theory and statistics. It studies statistical manifolds, which are Riemannian manifolds whose points correspond to probability distributions.
first time I've actually been interested in studying any kind of stats

potent sky
#

And you probably know more than me haha
I invade the algebraic topology territory starting from ML and information theory, that's more my home ground

wooden sail
#

log likelihood and what not

potent sky
#

ooh interesting I'll have to look into it
Fisher vectors are more of a toy I like to use from time to time
Very elegant, but DL beats them for most applications

potent sky
final kiln
#

in all likelihood, I think the identity and avg pooling won't work as well as for vision

#

(as substitutes for attention)

potent sky
final kiln
#

but anything else will work and the network doesn't care as long as you give it a way of comparing the tokens

potent sky
#

Lots of complaints when it came out 😂

final kiln
potent sky
#

I haven't dived into it myself tho

final kiln
potent sky
#

Are you trying to reduce it from quadratic scaling or is that not a concern for you

final kiln
#

it's not currently a concern, I've just halved the number of parameters in the attention head and will use the symmetry to reduce the number of operations
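
A guess at what exploiting the symmetry could look like, not the actual implementation discussed here: if the score is a quadratic form `x_i^T M x_j` with a symmetric matrix `M`, then `score(i, j) == score(j, i)`, so only the upper triangle needs computing. A symmetric `d x d` matrix also has `d * (d + 1) / 2` free entries, which is one way a parameter count could drop. The matrix `M` and the tokens below are made up.

```python
def symmetric_scores(tokens, M):
    """Scores s[i][j] = x_i^T M x_j for symmetric M, filling only the upper triangle."""
    n = len(tokens)
    d = len(M)
    # M x_j for every token, computed once
    Mx = [[sum(M[a][b] * x[b] for b in range(d)) for a in range(d)] for x in tokens]
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            val = sum(xi * mj for xi, mj in zip(tokens[i], Mx[j]))
            s[i][j] = val
            s[j][i] = val  # symmetry: x_i^T M x_j == x_j^T M x_i
    return s

M = [[2.0, 0.5], [0.5, 1.0]]                      # hypothetical symmetric matrix
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]     # toy embeddings
scores = symmetric_scores(tokens, M)
print(scores)
```

A symmetric score matrix cannot be combined with a causal mask without breaking that symmetry in the final weights, so how much work is saved in practice depends on the task.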

#

I'm more interested in the interpretability and in replicating those guys results

potent sky
#

ahh nice fair enough
I've very recently started keeping an eye on ways to reduce the quadratic scaling problem in attention
Massive benefits if we can find a way, but of course it's a difficult task and we're not currently in a place to drop everything else and focus on that

#

Has anyone here read the Retentive net paper

final kiln
#

I do have a couple ideas

#

the most straightforward way is to make it a funnel like structure like you do with UNETs

#

there's no reason for the output dimension to be the same as the input dimension for the attention module

potent sky
#

Nice

final kiln
#

I also suspect you can take half of the network and have it do convolution, like

#

imagine two branches with a series of attention modules

#

the first branch does self attention, and the second branch does conv

#

you can then feed one into the other like you do with encoder decoder

#

the conv captures local relations, the attention mechanism captures global ones

potent sky
#

But the self attention will still scale quadratic right

final kiln
#

the hope would be that you wouldn't need to scale as much embedding dimensionality, since part of the burden has been shifted to a different branch

#

so it doesn't actually completely solve it

potent sky
#

Ahh hmmmm

#

And how will the self attention be incentivized to prioritise only learning global context, just backprop?

final kiln
#

potentially via masking of the attention scores

potent sky
final kiln
#

With residual connections info is never really lost

potent sky
final kiln
#

Uhm, not sure if I understand

#

Each token would suffer influence from far away tokens only

potent sky
#

The self attention computation being performed is on masked vectors? Or masking is done after the computation?

final kiln
#

On the attention scores, like you do to make it causal
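A minimal NumPy sketch of masking on the attention scores, in the spirit of the message above. The `far_only_mask` helper and its `window` parameter are hypothetical names made up for illustration; the causal mask is only there for comparison.

```python
import numpy as np


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; mask[i, j] is True where token i may attend to token j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Blocked pairs get -inf, so they receive exactly zero weight after the softmax.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def causal_mask(n):
    """Token i sees tokens 0..i only."""
    return np.tril(np.ones((n, n), dtype=bool))


def far_only_mask(n, window):
    """Hypothetical 'global only' mask: token i sees tokens more than `window` positions away.

    Every row needs at least one True entry, otherwise the softmax is undefined.
    """
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) > window


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))  # 8 tokens, embedding dimension 4
far_mask = far_only_mask(8, window=1)
out, far_weights = masked_attention(x, x, x, far_mask)
_, causal_weights = masked_attention(x, x, x, causal_mask(8))
```

Note that the score matrix is still n by n here, so a mask like this changes what the attention branch can learn, not its quadratic cost.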

#

But really the only way to know if it works is to try it out

#

I think my next project is gonna be to write the torch bindings for that new programming language

#

Gleam

#

I'd make it cool tho, using the compiler to aid the ML dev process

final kiln
#

There's so much cool stuff to do ._.

potent sky
#

I just try to find justifications (excuses) to shoe-horn it in my work time 😂

final kiln
#

That's one way to do it

lapis sequoia
#

If anyone wants to build a chatbot with custom data, DM me to join the project

hollow reef
#

facing a moral dilemma with my personal project
on one hand, i do not want to use AI to work on it because then it feels like less of my baby
on the other... the practicality it offers in eliminating redundancy and whatnot is unmatched, because i'm working with a lot of math and i'm not all that great at python yet

i'm considering a compromise to be only using it to handle the more rote things like big tables/dicts/definitions, or to figure out the harder coding problems that i'm still learning
maybe i could only use it for learning how to code?

idk, thoughts on coding with AI?

left tartan
#

I think you'll find you'll get over the "not all that great at python yet" if you put down the AI and muscle through it.

desert oar
#

it seems causal, because we consistently make more progress in our 1-hour socratic discussion sessions (so what do you think we should do in XYZ situation?) than he does on his own time

#

he clearly knows how to do the stuff. he just has it in his head somehow that the AI is definitely helping him even when it seemingly is holding him back

left tartan
#

The process is the point, not the result

desert oar
#

yeah! if he'd just put down the damn AI and think + write notes on pen and paper, he'd make a ton of progress and become very strong

desert oar
#

i used to be so afraid to feel like i didn't know what i was doing. it took me so long to embrace that feeling. i'd have been toast in school using AI for everything, even as it is, wolfram alpha did me no favors in learning calculus. i dealt with the consequences of that laziness for years.

#

so i get it

#

but, learn from my mistakes

left tartan
#

I half-joke, but perhaps the saving grace of this AI craze is job security for the rest of us

iron basalt
#

Also if you don't feel uncomfortable, you are probably not learning, much like how not feeling uncomfortable while working out means you are probably not making progress.

#

(There is no way to minimize / avoid it, but we try really hard (human nature to avoid things that make us uncomfortable), and so turn to stuff like AI)

#

(This can be anything else, like watching tutorials instead of actually doing it (just watching is not uncomfortable))

midnight harbor
#

Has anyone in this community started using the Google Gemini API following GPT-3, and could you provide insights into its strengths and weaknesses? Specifically, I'm interested in understanding its performance in terms of

  • pricing
  • speed
  • reasoning capabilities
  • multilingual understanding
  • controllability.
    Any feedback would be greatly appreciated.

kindly Tag me

pseudo pasture
#

I've been stuck on this project for 3 days because I want to deploy a Flask app online so I can get data from anywhere. After trying to deploy it on almost every cloud and hosting site, I keep hitting one error: 502 Bad Gateway. It doesn't matter whether it's AWS, Azure, Google App Engine, Vercel, Heroku, Netlify, or Konbey; everywhere I get the timeout error. Locally the project works perfectly, and it also works through the CLIs, but once I deploy successfully and hit the endpoint, it says 502 Bad Gateway.

If any of you has a solution for this then for God's sake pls tell me. I'm stuck and can't move forward.

https://github.com/saqib772/sportsodds

GitHub

Betting Sports Odds for NBA Games. Contribute to saqib772/sportsodds development by creating an account on GitHub.

hollow mortar
wooden sail
final kiln
#

I don't use copilot, partly because of that, it spits out code, that isn't necessarily that good, and it's just so easy to leave it there as if it were a lib function

#

So I mostly use chat gpt, and mostly when I notice that I'm looking through the documentation and finding no success or it's just taking too long and it's just easier to just ask the omniscient chat bot about it

#

Like, what's the difference between reading the answer to a stack overflow question and a chat gpt answer to my specific question ?

#

The difference is that I had to write it and I had to cross check the answer either way

#

The me writing it part can be strangely beneficial

#

Which doesn't happen when you use copilot

past meteor
spare scarab
#

How can I train an ai model to become a support helper? I have all the discord messages saved from the support channel help. In any of this format CSV/TXT/JSON. I want to train the ai on the messages.

final kiln
#

Totally forgot to do no grad, so that's why it was overfitting the val data

#

I'm also getting used to rust

nocturne narwhal
#

Hey guys, I want to get started with data science. Can someone help me find the best resources and a roadmap? I tried searching on Udemy, Coursera, and YouTube, but I'm confused about where to start

desert oar
wooden sail
#

yeah

#

a lot of my current research deals with stuff like imaging, localization, parameter estimation, etc based on a small number of measurements. like one would do in x-rays and what not

#

and then there's always the question of "where do i put the sensors?"

final kiln
#

100% accuracy achieved >.>

final kiln
#

I'm either doing something wrong or I'm going right to the top of the SOTA leaderboard

#

I reckon I'm doing something wrong

desert oar
pseudo pasture
sturdy slate
#

Hello guys, need some help with medical image segmentation. I am working on lung tumor segmentation and the model seems to work well with the train dataset (overfitting, mostly) but the testing does not go well. I am not sure if I am doing things right. I am using pytorch and segmentation models pytorch for Unet models.

final kiln
#

I'm trying to force an overfit rn

#

That would prove no leakage

#

For anyone curious, average pooling is working here

#

But I got leakage no doubt, 1e-7 loss on both

wooden sail
#

5 years and many books and papers later, here we are

final kiln
#

leakage has been found, part of my code still assumed multi processing instead of async, so they used the same duck db connection to create the tables

#

The tables all had the same name

#

Just created a uuid for the name of the tables
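A small sketch of the uuid-per-table idea. It uses the standard library's `sqlite3` as a stand-in for the shared DuckDB connection (an assumption made so the example runs anywhere), and `unique_table_name` is a hypothetical helper, not code from the project.

```python
import sqlite3
import uuid


def unique_table_name(prefix: str) -> str:
    """Build a collision-free table name; `prefix` must be a trusted identifier."""
    # uuid4().hex is 32 hex characters, so the result is a valid unquoted SQL identifier.
    return f"{prefix}_{uuid.uuid4().hex}"


conn = sqlite3.connect(":memory:")  # stand-in for the connection shared by the async tasks
train_table = unique_table_name("train")
test_table = unique_table_name("test")

for name, rows in ((train_table, [("a",), ("b",)]), (test_table, [("c",)])):
    conn.execute(f"CREATE TABLE {name} (body TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", rows)

train_count = conn.execute(f"SELECT COUNT(*) FROM {train_table}").fetchone()[0]
test_count = conn.execute(f"SELECT COUNT(*) FROM {test_table}").fetchone()[0]
```

Because each task gets its own table name, two tasks sharing one connection can no longer write into the same table by accident.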

desert oar
final kiln
#

So.... How am I gonna debug this >.>

#

Maybe join the two tables via the text field ?

#

Alright, 123 rows are shared by both tables

#

This is out of 50k

#

So it does not explain it yet
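For reference, the overlap check can be sketched as a set intersection on the text column. This again uses `sqlite3` as a stand-in for DuckDB, with made-up table and column names (`train`, `test`, `body`) and toy rows. For scale, 123 shared rows out of 50k is about 0.25%, which is far too little to account for near-perfect accuracy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (body TEXT, label INTEGER)")
conn.execute("CREATE TABLE test (body TEXT, label INTEGER)")
conn.executemany(
    "INSERT INTO train VALUES (?, ?)",
    [("good film", 1), ("bad film", 0), ("fine", 1)],
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("bad film", 0), ("great", 1)],
)

# INTERSECT keeps the distinct texts that appear in both tables.
shared = conn.execute(
    "SELECT COUNT(*) FROM (SELECT body FROM train INTERSECT SELECT body FROM test)"
).fetchone()[0]
test_rows = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
leak_fraction = shared / test_rows
```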

#

I do believe this is in the original dataset

#

Must be. There's no way I mixed this up in such a specific way, and also by only using curl and tar

#

Matter fact, I'm gonna code the download part right in the pipeline

little arrow
#

How do I reshape Table A into Table B? Do I melt, pivot, stack or something else entirely? Many thanks in advance!

valid quartz
#

I need your folks help on this:

So Im making an AI assistant for my school project and there are two problems Im facing: (Im using PyCharm btw)

import os
import time
import pyaudio
import playsound
from gtts import gTTS
import openai
import speech_recognition as sr

api_key = "API"

openai.api_key = api_key

lang = 'en'

while True:
def get_audio():

r = sr.Recognizer() - here Pycharm tells me to indent the "r" right here
with sr.Microphone(device_index=1) as source:
    audio = r.listen(source)
    said = ""

    try:
        said = r.recognize_google(audio)
        print(said)
        if "BanglaGPT" in said:
            completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": said}])
            text = completion.choices[0].messages.content
            speech = gTTS(text=text, lang=lang, slow=False, tld="com.au")
            speech.save = "welcome1.mp3"
            playsound.playsound("welcome.mp3")

    except Exception:
        print("Exception")

return said: - Several problems with this line,  It says its out of function, it needs End Statement

get_audio(): - An illegal target for variable annotation, and expression is expected

(Ignore the fact that the name of the AI is BanglaGPT and the language it is supposed to speak is english, its for tests ok?)

agile cobalt
agile cobalt
#

just editing/deleting the message is not enough - make sure you actually delete the key

valid quartz
#

alr

#

done

valid quartz
#

Ok so I was able to fix almost all the issues in the code

#

but now the problem is

#

It doesnt speak

#

Smh
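A hedged sketch of how the assistant could be structured so that the function body, the `return`, and the loop are in the right places. The microphone, recognizer, chat API, and text-to-speech are replaced by plain callables (`listen`, `transcribe`, `ask_model`, `speak` are hypothetical stand-ins) so the structure can be run without hardware or an API key.

```python
def get_audio(listen, transcribe):
    """Return the transcript of one utterance, or "" if recognition fails."""
    said = ""
    try:
        said = transcribe(listen())
        print(said)
    except Exception as exc:
        # Print the actual error instead of a bare "Exception" so failures are visible.
        print(f"Exception: {exc!r}")
    return said  # the return belongs inside the function body


def respond(said, ask_model, speak, wake_word="BanglaGPT"):
    """Ask the model and speak its reply when the wake word was heard."""
    if wake_word not in said:
        return None
    reply = ask_model(said)
    speak(reply)
    return reply


# With the real libraries, the callables would wrap calls such as:
#   ask_model: completion.choices[0].message.content  (attribute is `message`, not `messages`)
#   speak:     speech.save("welcome.mp3") followed by playsound.playsound("welcome.mp3")
#              (`save` is a method, and both calls must use the same file name)
# and the main loop would be:  while True: respond(get_audio(...), ...)

spoken = []
heard = get_audio(lambda: b"raw-audio", lambda audio: "hey BanglaGPT how are you")
reply = respond(heard, lambda text: text.upper(), spoken.append)
ignored = respond("no wake word here", lambda text: text.upper(), spoken.append)
```

In the posted code, `speech.save = "welcome1.mp3"` assigns a string instead of saving the file, and the broad `except` hides the resulting error, which would be consistent with the assistant staying silent.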

wooden sail
desert oar
# little arrow How do I reshape Table A into Table B? Do I melt, pivot, stack or something else...

"stack" and "unstack" convert column names into row labels (aka index levels) and vice versa. that is, they operate on labels.

"melt" and "pivot" convert data columns between long and wide format. that is, they operate on data.

both can be used for this reshaping operation. melt and pivot might be easier to reason about though.

in fact, you need both here. first you want to melt this to "long format":

Region | Country | Studies | Date

then pivot this to "wide format" with respect to date:

Region | Country | Studies | Jan 2023 | Feb 2023 | ...
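The melt-then-pivot steps above can be sketched as follows. The original tables are not visible in this log, so `table_a` here is a guess at the shape (one column per study, one row per date); the study names and the `Value` column are made up for illustration.

```python
import pandas as pd

# Hypothetical Table A: one row per (Region, Country, Date), one column per study.
table_a = pd.DataFrame(
    {
        "Region": ["EU", "EU", "EU", "EU"],
        "Country": ["FR", "FR", "DE", "DE"],
        "Date": ["Jan 2023", "Feb 2023", "Jan 2023", "Feb 2023"],
        "Study X": [1, 2, 3, 4],
        "Study Y": [5, 6, 7, 8],
    }
)

# Step 1: melt to long format -> Region | Country | Date | Studies | Value
long_df = table_a.melt(
    id_vars=["Region", "Country", "Date"],
    var_name="Studies",
    value_name="Value",
)

# Step 2: pivot to wide format with one column per date.
# pivot() sorts the new columns as strings, so "Feb 2023" comes before "Jan 2023".
table_b = long_df.pivot(
    index=["Region", "Country", "Studies"], columns="Date", values="Value"
).reset_index()
table_b.columns.name = None
```

If the date columns need chronological order, converting `Date` to a real datetime before pivoting is one option.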
desert oar
wooden sail
#

the application motivates the problems covered in those books, but i think you'll find most of the stuff is easily generalizable

#

also what constitutes a "signal" is very easy to satisfy 😛

#

a large chunk of AI/ML is parameter estimation and estimation theory in a trench coat

desert oar
#

that's what i'm hoping to get here. i remember trying to get into this stuff back when i was an undergrad studying economics, for the same reasons you just stated, but i had a hard time connecting to the applications at the time & wasn't strong enough with math yet.

final kiln
#

I was parsing one of the classes incorrectly and that led to only one output token, which was 0 and that was the source of it

#

Now I'm getting something more realistic: it goes up to 0.5 acc and annoyingly stays there forever

#

I suspect there's gonna be something in the SQL still, gonna write some unit tests for this

#
{
    "run_name": null,
    "experiment_id": 1,
    "data": {
        "slices": 1,
        "batch_size": 256,
        "test_source": "***/dataset/test.parquet",
        "train_source": "****/train.parquet"
    },
    "model": {
        "depth": 6,
        "heads": 10,
        "encoding": "tiktoken-gpt2",
        "dimension": 64,
        "kernel_size": null,
        "attention_kind": "quadratic",
        "context_window": 300,
        "input_vocabolary": 60000,
        "output_vocabolary": 5
    },
    "train": {
        "epochs": 100,
        "learning_rate": 0.0005
    },
    "process": {
        "use_gpu": true,
        "executable_source": "****t"
    }
}
#

in case anyone has any ideas, but hopefully it will be a temporary issue related to the data

#

that hypothesis is motivated by the fact that it's not overfitting

#

which might mean I'm messing up my randomization again, maybe I'm mixing up the labels each time

jagged latch
#

I have a question to those experienced in Excel. I'm having an issue in a sheet where after new data is generated through my Python script via Openpyxl, I am getting some values in a column with a yellow highlighted cell with bolded font. The thing is I double click on the cell and the formatting in question goes away and returns to looking like the other cells in the column. I know it's not my code because I tested the same exact code on a blank workbook and sheet and got everything without the formatting. Is there some type of hidden check that was in the original Excel template?

#

If I try and go over the cell and click no fill or try and unbold it, nothing happens. It only returns to normal after double clicking it and then clicking somewhere else.

jagged latch
#

I found the problem. The person before me left some conditional formatting in there. I ended up removing it. Problem solved.

final kiln
#

Alright.

#

Here's what I'm NOT gonna do, spend two weeks experimenting with this stuff.

Gotta change my approach.

What I'm gonna do instead is go through the literature and see what people did and how. And then replicate that.

#

Time to PR, wait for the checks to be done, do a release so that my binary gets built automatically and then published and then it's time for a well deserved rest.

#

Tomorrow I'm gonna freeze some of the interfaces and write integration tests for them. After that I'll collect some papers and also think about the most efficient way to take the derivative of a metric tensor.

crisp raptor
#

right now I'm working on a neat little project on my calculator for AI generated music

final kiln
#

I spent a lot of time just messing about with the parameters, when I should've just looked it up. I think it's gonna be the same thing here

#

And its also about time I write some tests. Particularly, I'm gonna write some for the training loop itself. Just gonna generate a dataset that oughta be easy to generalize
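A minimal sketch of that kind of training-loop test, using a tiny pure-Python linear model rather than the actual project code. The dataset generator, the `train` function, and the thresholds are all assumptions chosen so that a correct loop must pass.

```python
import random


def make_linear_dataset(n, seed):
    """Noise-free y = 3x + 1, which any working training loop should fit almost exactly."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 1.0 for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, epochs=500, lr=0.1):
    """Plain full-batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


train_xs, train_ys = make_linear_dataset(200, seed=0)
held_xs, held_ys = make_linear_dataset(50, seed=1)  # separate seed, so no shared rows by construction

initial_loss = mse(0.0, 0.0, train_xs, train_ys)
w, b = train(train_xs, train_ys)
train_loss = mse(w, b, train_xs, train_ys)
held_out_loss = mse(w, b, held_xs, held_ys)
```

A test like this catches a broken loop (loss not decreasing) separately from data problems such as leakage or shuffled labels.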

zealous spear
#

Hi, can anyone tell my why I get this error?

File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 87, in run
    pipe = subprocess.Popen(
           ^^^^^^^^^^^^^^^^^
  File "C:\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\barte\Documents\GitHub\docgpt\main.py", line 10, in <module>
    doc = textract.process("spa.pdf")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\__init__.py", line 79, in process
    return parser.process(filename, input_encoding, output_encoding, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
    raise ex
  File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\pdf_parser.py", line 21, in extract
    return self.extract_pdftotext(filename, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\pdf_parser.py", line 44, in extract_pdftotext
    stdout, _ = self.run(args)
                ^^^^^^^^^^^^^^
  File "C:\Users\barte\AppData\Roaming\Python\Python312\site-packages\textract\parsers\utils.py", line 95, in run
    raise exceptions.ShellError(
textract.exceptions.ShellError: The command `pdftotext spa.pdf -` failed with exit code 127
------------- stdout -------------
------------- stderr -------------
#

This is my code:

import textract
import os
from transformers import GPT2TokenizerFast
from langchain.text_splitter import RecursiveCharacterTextSplitter

from dotenv import load_dotenv

load_dotenv()

doc = textract.process("spa.pdf")

with open('./dataFromPdf.txt', 'w') as f:
    f.write(doc.decode('utf-8'))

with open('./dataFromPdf.txt', 'r') as f:
    text = f.read()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text))

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 512,
    chunk_overlap  = 24,
    length_function = count_tokens,
)

chunks = text_splitter.create_documents([text])
final kiln
#

Exit code 127
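Exit code 127 conventionally means "command not found", which matches the `FileNotFoundError` in the traceback: textract shells out to `pdftotext` (shipped with Poppler or Xpdf) and the executable is not on `PATH`. A small sketch of checking for it up front; `find_missing_tools` is a hypothetical helper name.

```python
import shutil


def find_missing_tools(tools):
    """Return the external commands from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(["pdftotext"])
if missing:
    print(f"Install these and add them to PATH before calling textract: {missing}")
```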

oblique comet
#

was stuck figuring out why cropping images took about 3-4 seconds per image and why it was running on cpu (100% usage) instead of the cuda device
after some debugging I found the related line:

visible_pixels = crop[crop > 0]

and changed it to this:

mask = crop.gt(0.0).to(crop.dtype)
visible_pixels = mask * crop + (1 - mask)

function execution time went down from 3408ms to just 28ms lol
thats a 99.18% reduction!

love optimizing stuff like this but its rare that I manage to lower it this much
this is why i love programming
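One caveat worth checking: the two expressions do not compute the same thing. The sketch below uses NumPy as a stand-in for the tensor library (an assumption; the indexing semantics shown are the same idea). A plausible reason for the speedup, not confirmed in the thread, is that boolean-mask indexing produces a result whose size depends on the data, while the arithmetic version keeps a fixed shape.

```python
import numpy as np

crop = np.array([[0.0, 0.5], [2.0, -1.0]])

# Original line: keeps only the positive entries, flattened to 1-D.
selected = crop[crop > 0]

# Replacement: same shape as `crop`, non-positive entries become 1.0.
mask = (crop > 0).astype(crop.dtype)
filled = mask * crop + (1 - mask)
```

Whether the replacement is a valid substitute depends on what is done with `visible_pixels` afterwards, for example a mean over it would change while a product would not.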

spring field
vapid storm
#

Hi guys, (very quick question 🥺)
I recently downloaded a dataset of images of shotguns, handguns, and knives. I am using this to train a cnn used to detect potential weapons through doorbell camera images or footage. However, I don't know if i should normalize all the images to a certain size.

if I do, then the bound boxes in the corresponding txt file for each image would be skewed.
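If the boxes are stored in pixel coordinates, they can be rescaled together with the image. A minimal sketch, assuming `(x_min, y_min, x_max, y_max)` boxes and a plain resize without padding; `scale_box` is a hypothetical helper. If the txt files instead hold coordinates normalized to the image size (as YOLO-style labels usually do), a plain resize leaves them valid as they are.

```python
def scale_box(box, old_size, new_size):
    """Rescale a pixel-coordinate box when the image is resized.

    box: (x_min, y_min, x_max, y_max); old_size and new_size: (width, height).
    """
    scale_x = new_size[0] / old_size[0]
    scale_y = new_size[1] / old_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)


resized_box = scale_box((100, 50, 300, 250), old_size=(640, 480), new_size=(320, 240))
```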

jaunty helm
#

I have this preprocessing step

def frequency_encode(df: pd.DataFrame, features: str | list[str]=None, inplace=False):
    if features is None:
        features = df.columns
    elif isinstance(features, str):
        features = [features]

    if not inplace:
        df = df.copy()
    for feature in features:
        frq = df[feature].value_counts()  # <-- problem
        df[f'{feature}_FrqEncode'] = df[feature].replace(frq.to_dict())
    if not inplace:
        return df

I put this in a `FunctionTransformer` in a `Pipeline`, then later I realized that at `# <-- problem`, I should instead somehow store a `df_train` that was seen during `.fit()` and use `df_train[feature].value_counts()` when `.transform()`ing
how do I do this? (while still being able to use a `Pipeline` of course)
lapis sequoia
#

You can use the Pipeline class to wrap your custom class with the necessary preprocessing and encoding. You can do it like this

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline

class DistanceTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, metric_func, **kwargs):
        self.metric_func = metric_func
        self.inplace = True

    def fit(self, X, y=None, **kwargs):
        if not self.inplace:
            raise RuntimeError("Transformers must be called with inplace=True")

        # fit() returns the estimator itself so a Pipeline can chain fit/transform
        return self

    def transform(self, X, **kwargs):
        if not self.inplace:
            raise RuntimeError("Transformers must be called with inplace=True")
    
        X = self.metric_func(X)
        return X

class PairwiseDistanceEstimator(BaseEstimator, TransformerMixin):
    def __init__(self, **kwargs):
        super().__init__()
        self.kwargs = kwargs

    def fit(self, X, y=None):
        # nothing to learn here; return self as the scikit-learn API expects
        return self

    def transform(self, X):
        X = self.get_metric(X)
        return X

    def get_metric(self, X):
        if self.kwargs['metric'] == 'euclidean':
            return pairwise_distances(X, metric='euclidean')
        if self.kwargs['metric'] == 'cosine':
            return pairwise_distances(X, metric='cosine')
        raise ValueError(f"Invalid metric '{self.kwargs['metric']}'. Expected 'euclidean' or 'cosine'.")
jaunty helm
# jaunty helm I have this preprocessing step ```py def frequency_encode(df: pd.DataFrame, feat...
class StatedFunctionTransformer(FunctionTransformer):
    def fit(self, X: pd.DataFrame, y=None):
        def deco(fn):
            def wrapper(*args, **kwargs):
                kwargs['df_train'] = X
                return fn(*args, **kwargs)
            return wrapper
        self.func = deco(self.func)
        return super().fit(X, y)

def frequency_encode(df: pd.DataFrame, features: str | list[str]=None, inplace=False, df_train: pd.DataFrame=...):
    if features is None:
        features = df.columns
    elif isinstance(features, str):
        features = [features]

    if not inplace:
        df = df.copy()
    for feature in features:
        frq = df_train[feature].value_counts()  # <-- problem
        df[f'{feature}_FrqEncode'] = df[feature].replace(frq.to_dict())
    if not inplace:
        return df

this is what I've settled on for now, if anyone knows of a better/more conventional method, or there's a problem to what I'm doing here, please tell me
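A more conventional route, sketched under the assumption that the inputs are pandas DataFrames: a small custom transformer that learns the counts in `fit()` and only looks them up in `transform()`. The class name `FrequencyEncoder` and the choice to encode unseen categories as 0 are illustrative decisions, not an established scikit-learn API.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class FrequencyEncoder(BaseEstimator, TransformerMixin):
    """Adds '<feature>_FrqEncode' columns using counts learned from the training data."""

    def __init__(self, features=None):
        self.features = features

    def fit(self, X, y=None):
        if self.features is None:
            features = list(X.columns)
        elif isinstance(self.features, str):
            features = [self.features]
        else:
            features = list(self.features)
        # Learned state gets a trailing underscore, following scikit-learn convention.
        self.features_ = features
        self.counts_ = {f: X[f].value_counts() for f in features}
        return self

    def transform(self, X):
        X = X.copy()
        for f in self.features_:
            # Categories never seen during fit() map to NaN; encode them as 0 (an assumption).
            X[f"{f}_FrqEncode"] = X[f].map(self.counts_[f]).fillna(0).astype(int)
        return X


train = pd.DataFrame({"city": ["a", "a", "b"]})
test = pd.DataFrame({"city": ["a", "b", "c"]})

pipe = Pipeline([("freq", FrequencyEncoder("city"))])
pipe.fit(train)
encoded = pipe.transform(test)
```

Compared with wrapping `self.func` inside `fit()`, this keeps the fitted state as plain attributes, so refitting replaces the counts instead of stacking another wrapper around the function.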
mellow vector
#

I know this isn't DS but you guys use jupyter... I'm reviewing jupyter and the course instructor is describing tooltip uses, when he presses shift tabx3 the tool tip remains open while hes typing, my tooltip closes immediately, what am i doing wrong?

past meteor
boreal gale
mellow vector
#

ty

red kraken
#

hi, i'm creating a project based on detecting a larvae presence in different water types. I'm getting the data thru sensors such as turbidity, oxygen, ph Level, and temperature. I wanna ask is random forest the way to go to properly detect larvaes depending on the data or is there other better ml algorithms?

boreal gale
# red kraken hi, i'm creating a project based on detecting a larvae presence in different wat...

some problems lend themselves to certain models, e.g. sound and wavelet models, image and CNNs.

RF is a good starting point, it's up to you to search for better models once you establish a baseline model, xgboost has always been a strong contender in kaggle for a reason, i suggest you do some more research in that regard if you aren't familiar.

also sometimes it's not so much about the model you use, but the features you craft - e.g. your problem could potentially be solved by a temporal snapshot (i.e. just sensor values in one instance of time), or an alternative maybe more useful set of features might be some aggregate of sensor values over time (diff, % change maybe?), sometimes it's worth looking into the fundamental aspect of the problem, in this case think about the biological impact of larvae presence (they might make the water warm, "more warm" than usual? idk - not a biologist.. but if so how do you describe that properly?)

potent sky
#

Anyone tried out the LLMs in 1.58 bits paper yet?

desert oar
final kiln
#

Tests - are important

final kiln
#

found one

#

the pre trained embeddings part might be critical

slate crystal
#

Im training a housing price prediction model in TensorFlow with dimensions of X (20433 rows × 13 columns), loss="mae", optimizer="Adam()".
The problem I am getting is that upon training the loss initially decreases but after some epochs becomes stagnant.

Any suggestions on improving the model, and how many layers should I use?

#

tf.random.set_seed(42)

model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(32),
tf.keras.layers.Dense(1)
])

model.compile(
loss=tf.keras.losses.mae,
optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
metrics=["mae"]
)

norm_history = model.fit(X_train_scaled, y_train, epochs=100, batch_size=64)

final kiln
#

I know im always gonna enjoy these more cuz it took so much work, but damn these things look good

final kiln
#

There was a lot of discussion recently because of the meaning of linear in linear regression

#

Just try it, can't hurt to just try rite

slate crystal
#

Okay i'll try and say

final kiln
slate crystal
final kiln
#

networks will prefer stuff roughly between 0 and 1; in transformers, z-score normalization (layer norm) is applied across the features of each token, followed by a trainable affine

slate crystal
#

This is the normalization I am already using,

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

final kiln
final kiln
#

would be helpful to see your loss graph

slate crystal
slate crystal
#

is it done using Pandas?

final kiln
#

something of this sort

slate crystal
#

Loss graph

final kiln
#

mine is also not looking too good ngl, the missing ingredient is gonna be the pre trained embedder

slate crystal
#

just a minute i'll try to do it

#

This will do right?

final kiln
slate crystal
#

yeah 😅

final kiln
#

model.compile(
loss=tf.keras.losses.mae,
optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
metrics=["mae"]
)

#

so, don't use SGD, use Adam

#

mean average error, that sounds fishy to me

#

I dont use keras

#

let me check this

#

loss = mean(abs(y_true - y_pred))

#

try using mean squared error instead

slate crystal
final kiln
#

yeah, mean squared error is better

#

no need for a sqrt

#

uhm, what is your batch size ?

slate crystal
#

I've tried mse before it gives huge loss numbers
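A small worked example of why MSE looks huge on an unscaled target. The prices are made up; the point is only that squaring errors measured in dollars gives numbers in dollars squared, and that the square root (RMSE) changes the reported value but not which model minimizes it.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house values and predictions, each off by 50,000.
y_true = [200_000.0, 350_000.0, 120_000.0]
y_pred = [250_000.0, 300_000.0, 170_000.0]

mae_dollars = mae(y_true, y_pred)
mse_dollars = mse(y_true, y_pred)
rmse_dollars = math.sqrt(mse_dollars)

# The same errors after dividing the target by 100,000.
scale = 100_000.0
mse_scaled = mse([t / scale for t in y_true], [p / scale for p in y_pred])
```

Here MAE and RMSE are both 50,000 while MSE is 2.5 billion; after rescaling the target, the same fit gives an MSE of 0.25.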

slate crystal
final kiln
slate crystal
#

After using Adam optimizer

final kiln
#

the validation and the training loss follow each other very closely here

#

that can be suss, after many epochs the model should start overfitting

#

try increasing your model capacity

#

more layers

slate crystal
#

u mean the no of layers?

final kiln
#

model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(32),
tf.keras.layers.Dense(1)
])

this is quite small

#

model = tf.keras.Sequential([
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(25),
tf.keras.layers.Dense(1)
])

#

something like that, plus the activations ofc
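Why the activations matter, as a NumPy sketch with random weights (shapes chosen to match a 13-feature input; everything else is illustrative): stacking Dense layers without activations collapses into a single linear layer, so extra layers add no capacity until a nonlinearity sits between them.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 13))  # 5 samples, 13 features

w1, b1 = rng.normal(size=(13, 32)), rng.normal(size=32)
w2, b2 = rng.normal(size=(32, 1)), rng.normal(size=1)

# Two Dense layers with no activation in between...
stacked = (x @ w1 + b1) @ w2 + b2

# ...equal one Dense layer with merged weights and bias.
w_single = w1 @ w2
b_single = b1 @ w2 + b2
single = x @ w_single + b_single


def gelu(z):
    """GELU, tanh approximation."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


# With a nonlinearity the two-layer network is no longer a single linear map.
with_activation = gelu(x @ w1 + b1) @ w2 + b2
```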

#

you want your model to have more capacity than the dataset requires

slate crystal
#

could you suggest any activations?

final kiln
#

GeLU

slate crystal
final kiln
slate crystal
#

okkay

final kiln
#

after you've managed to overfit your model, you know you got something that has the power to do the task

#

you'll then try to cripple it so that it doesn't overfit, or, overfits just a little

#

you do that using dropouts
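For reference, a sketch of what a dropout layer does during training, written in NumPy ("inverted" dropout, the common formulation). This is an illustration of the idea, not the Keras implementation; in Keras the ready-made layer is `tf.keras.layers.Dropout`.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return x  # dropout is switched off at inference time
    keep = rng.random(x.shape) >= rate
    # Dividing by (1 - rate) keeps the expected activation unchanged.
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.5, rng=rng)
untouched = dropout(activations, rate=0.5, rng=rng, training=False)
```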

slate crystal
slate crystal
#

longitude latitude housing_median_age total_rooms total_bedrooms population households median_income ocean_proximity_<1H OCEAN ocean_proximity_INLAND ocean_proximity_ISLAND ocean_proximity_NEAR BAY ocean_proximity_NEAR OCEAN

the last 5 were one-hot encoded with pd.get_dummies

final kiln
#

and what does the output of the model mean ?

slate crystal
#

the output is median_house_value

final kiln
#

alright, let's try to first normalize these, maybe with z-score along each column except for the one hot encodings

slate crystal
#

This is the output

final kiln
#

tho that would kinda make it dependent on the sample

#

you need to get these in more reasonable ranges

#

for example house median age, maybe I'd divide every value by 30 or something like that

#

median house value too, by 300k or something

slate crystal
#

I actually did normalize before training

NORMALIZATION & STANDARDIZATION

features = ['longitude', 'latitude', 'housing_median_age', 'total_rooms',
'total_bedrooms', 'population', 'households', 'median_income',
'ocean_proximity']

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_test_scaled[0]
array([ 1.16597857, -1.33318189, -0.68338903, -0.76968499, -0.61778743,
-0.79510954, -0.64364484, -0.36439632, -0.89050504, -0.68141436,
-0.01649168, -0.35421275, 2.59982148])

final kiln
#

oh okay

#

right, so now for the network then

slate crystal
final kiln
#

model = tf.keras.Sequential([
tf.keras.layers.Dense(13),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(100),
tf.keras.layers.Dense(50),
tf.keras.layers.Dense(25),
tf.keras.layers.Dense(1)
])

#

maybe something like that

slate crystal
final kiln
#

make sure the output is also normalized

#

which I suspect it is not because of your large loss values

slate crystal
slate crystal
final kiln
#

yes

final kiln
slate crystal
#

For y

scaler = StandardScaler().fit(np.array(y_train).reshape(-1,1))
y_train = scaler.transform(np.array(y_train).reshape(-1,1))
y_test = scaler.transform(np.array(y_test).reshape(-1,1))

this is the scaling I'm using for y values
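When the target is standardized, predictions and errors come out in standard-deviation units and need to be mapped back to dollars for reporting. A NumPy sketch with made-up numbers, doing by hand what `StandardScaler` does for a single column. It is also worth keeping the target scaler under its own name (say `y_scaler`, hypothetical) so it does not overwrite the `scaler` fitted on the features.

```python
import numpy as np

# Hypothetical training targets in dollars.
y_train = np.array([150_000.0, 250_000.0, 350_000.0])

y_mean = y_train.mean()
y_std = y_train.std()  # population standard deviation, as StandardScaler uses
y_train_scaled = (y_train - y_mean) / y_std

# Hypothetical model outputs in scaled units, mapped back to dollars.
pred_scaled = np.array([0.0, 1.0])
pred_dollars = pred_scaled * y_std + y_mean

# A MAE measured on the scaled target converts to dollars by multiplying with the std.
mae_scaled = 0.3
mae_dollars = mae_scaled * y_std
```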

#

This is the graph now

final kiln
#

awesome

#

uhm

slate crystal
#

It plateaus in its scale

final kiln
#

how long does it take to do 100 epochs ?

slate crystal
#

2 minutes

final kiln
#

let's go nuts then, add moar capacity

#

try to go as far as the gpu lets you

slate crystal
#

I think generally ppl dont normalize the output y_values, right?

final kiln
#

why not ?

slate crystal
#

I dont know I've never seen

final kiln
#

ive never seen a non-normalized one, usually you're even supposed to interpret the output as a set of probabilities

slate crystal
#

That is for classification

#

right?

final kiln
#

it's also for image segmentation

#

which is classification in disguise

slate crystal
#

Do they use it in regression problems?

final kiln
#

last time I did a curve fitting like the one you're doing I used normalization on the output

#

also Taylor and Fourier features

#

a small learning rate helped

#

lets gogoogoooooooooooooooooooo

#

the weight initialization helped

#

omg thank god

slate crystal
#

Nicee

#

But I must figure out where it is lagging now

final kiln
#

fill your gpu memory as far as possible

#

I got 84% accuracy, aint gonna need no pretraineds

slate crystal
#

Yeah but your graph looks smooth

final kiln
#

you have no idea how much work that took

#

gonna do 40 epochs like in the paper

#

then im just gonna push and do a release

#

tomorrow is CUDA time yo

slate crystal
#

you do that with Docker?

final kiln
#

which part ?

#

im using docker extensively

slate crystal
#

packaging n everything...

slate crystal
final kiln
#

yeah, so I got these base images, which are meant for production use, I then have these github actions workflows that build on top of these images to produce my development and staging environments

#

this setup allows me to very quickly switch from developing on my laptop, to developing on an aws machine with gpu

#

anything that works during development is guaranteed to work during production

#

and it's all very cost effective because I'm using interruptible instances

final kiln
slate crystal
#

yeah so Docker ensures all the dependencies are packed together so that in production the image can deploy and run anywhere, right?

serene scaffold
slate crystal
#

VM is?

serene scaffold
#

virtual machine
an instance of an operating system

slate crystal
#

Okay, where do I learn docker and all its applications

#

I just got started with a 3hr video course on yt

serene scaffold
#

a good way to practice would be to build a model, and then create a docker image that, when run as a container, allows users to interact with that model in a jupyter notebook.

#

which means that you'll need to write a Dockerfile that copies the model into the image, and installs all the Python dependencies

slate crystal
#

Hmm great...

final kiln
#

this is using avg pooling instead of the usual attention mechanism

#

it would seem that there's something about the metaformer paper

#

only slightly worse

#

now I wanna train it for next token prediction, no way it could work for that right

#

I can believe sentiment analysis, because really, all you need is to count how many bad words and how many nice words

#

in fact im using identity now instead of the attention

final kiln
#

Identity is suss tho, identity can be suss, but also could be not suss due to the aforementioned reasoning

#

The network will operate on each embedding individually and then average out to one token which is then projected to the output probabilities
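A NumPy sketch of that setup: the same weights are applied to every token, the results are averaged, and the average is projected to class probabilities. Weights are random and the sizes (300 tokens, dimension 64, 5 classes) are only picked to echo the config posted earlier. The final check shows the consequence: with no mixing between positions, shuffling the tokens does not change the output.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def classify(tokens, w_hidden, w_out):
    # Per-token layer: identical weights for every position, no interaction between tokens.
    hidden = np.maximum(tokens @ w_hidden, 0.0)
    # Global average pooling collapses the sequence into one vector.
    pooled = hidden.mean(axis=0)
    return softmax(pooled @ w_out)


rng = np.random.default_rng(0)
tokens = rng.normal(size=(300, 64))
w_hidden = rng.normal(size=(64, 64)) / 8.0
w_out = rng.normal(size=(64, 5)) / 8.0

probs = classify(tokens, w_hidden, w_out)
shuffled_probs = classify(tokens[rng.permutation(300)], w_hidden, w_out)
```

Such a model behaves like a learned bag of words, which fits the earlier point that sentiment analysis can get by on counting nice and bad words, and also why it would not be expected to work for next token prediction.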

#

This is kinda suss

past meteor
#

Docker ain't perfect, but it's the best we got

potent sky
potent sky
final kiln
#

I'm leaving the project in a good state tho, easy to extend, all that's really needed is to add a pipeline that generates the right data

#

Next token prediction is just sequence to sequence without global average pooling

#

Ah there's stuff that does require major mods

#

Like machine translation or summarization, which I'm guessing require encoder decoder

final kiln
#

There's like 5 job openings in Switzerland ._.

odd meteor
# final kiln I'm gonna start applying like a madman. Likely gonna focus on London cuz the EU ...

I'm rooting for you 💪💪💪💪
This might interest you.

https://jobs.inverid.com/ml-ops-engineer/en

Inverid - creators of ReadID

Is innovation in your DNA? Do you love tinkering with the latest technologies, and do you understand that security is very important? Do you know what it means to create trusted scalability for our software? Then we might be looking for you!

past meteor
#

Look at Belgium, the ML market is really "English friendly"

open raven
#

pandas DataFrame, to select every n-th row

Starting with pandas version 2.2.0 it becomes harder to use the iloc property to select every n-th row from a DataFrame, because iloc got deprecated. What are alternative ways when the index has the default form (it was created implicitly by the DataFrame constructor called without index-related arguments, and it was not modified afterwards)?

agile cobalt
agile cobalt
open raven
#

found it in pandas.DataFrame.iloc API reference

agile cobalt
open raven
#

You're right, only one feature is deprecated. Sorry
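
For the original question, a short sketch (the frame and the step `n` are made up for illustration): plain positional slicing with iloc is not affected by the deprecation, and with the default index a modulo mask selects the same rows.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex.
df = pd.DataFrame({"A": range(10), "B": range(10, 20)})

n = 3  # hypothetical step

# Positional slice: works regardless of what the index labels are.
every_nth = df.iloc[::n]

# Label-based mask: equivalent only while the index is the default 0..len-1.
same_rows = df[df.index % n == 0]

print(every_nth)
```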

agile cobalt
#

tbh I don't get what they mean by "Returning a tuple from a callable is deprecated.", this doesn't make sense on this page and I do not see anything about it in the 2.2.0 changelog either

#

!e oh wait, probably something like ```py
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [1,2,3,4,5,6], "C":[1,2,3,4,5,6]})
test = df.iloc[lambda frame: (len(frame.index)//2, len(frame.columns)//2)]
print(test)
```

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

001 | /home/main.py:4: FutureWarning: Returning a tuple from a callable with iloc is deprecated and will be removed in a future version
002 |   test = df.iloc[lambda frame: (len(frame.index)//2, len(frame.columns)//2)]
003 |    A  B  C
004 | 3  4  4  4
005 | 1  2  2  2
agile cobalt
#

hmm yep, that doesn't really work like I expected either
(it picked multiple rows instead of a row and a column)

#

!e that may as well be why it's deprecated lol ```py
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [1,2,3,4,5,6], "C":[1,2,3,4,5,6]})
test = df.iloc[(len(df.index)//2, len(df.columns)//2)]
print(test)
```

arctic wedgeBOT
#

@agile cobalt :white_check_mark: Your 3.12 eval job has completed with return code 0.

4
agile cobalt
broken arch
#

hey guys has anyone worked on the elec2 dataset and what should i do about drift detecting and what stuff should i apply to improve the performance

final kiln
final kiln
past meteor
#

ML6 is a great company, I think you'd be a good fit and you could get in so definitely apply 🙂

final kiln
#

sure, I will, thank you for the suggestion !

#

Being a Machine Learning Engineer at ML6 means you consider yourself as a healthy mix between a machine learning expert, a software engineer, a researcher, and a hacker! 🤖

very nice

past meteor
#

Yup, I've seen many of their talks. They do really cool stuff. They'd be high on my list when I decide to change jobs myself.

final kiln
#

aaah, nothing like battling nvidia right in the morning to get the heart pumping

final kiln
past meteor
#

Cutting edge ML, but practice focused

#

I battled my sysadmin yesterday for more resources on my VM but the result was a 24h downtime and a wiped machine

final kiln
#

I know smart people who refuse to work in software because they don't have the patience for it, and indeed it does really require a lot of it at times

past meteor
#

When my colleagues say BS like "bUt WhY nOT rUn oN CpU" I want to quit immediately

past meteor
#

I think there's a lot of anti software snobbery with data scientists but I've mentioned this already

final kiln
#

I think the two worst environments are nvidia's and javascript, they compete for first place as the worst dev experience possible

#

nvidia does it by forcing pytorch to do 8gb docker images and not allowing emulation so that people have to buy their gpu

#

I think there's also no cross platforming with the layer where cuda resides

#

I think the Nvidia docker runtime is well done tho, I wonder if it is possible to map nvcc inside the container instead of having to pull their dev image

#

"no space left on device", there's literally 100gb on that machine and I'm just pulling a docker image

#

I don't get it

#

And I just added 50gb, so somehow it's downloading 50 extra gbs

#

I had pulled it locally first to test it out, says 8gb

#

No wait, says 18gb

#

The root volume of the AMI I was using is different from the other AMI, so the volume got mounted but it wasn't root.

#

Giving it 125gb for good measure tho

#

An entire morning later, I have the compiler running

final kiln
#

I believe

#

This is gonna require some thinking

buoyant vine
#

The Nvidia stuff with docker drives me nuts

left tartan
#

I'm unfamiliar with this. What's the docker stuff with nvidia?

buoyant vine
#

It is just an inconvenience with using CUDA, the Nvidia libraries and tooling single-handedly make the docker images enormous and difficult to run reliably across environments (Things like CUDA versions not aligning sadge )

#

I am looking forward to Burn's auto fusion system using WGPU becoming a bit more mature since it solves this issue providing you don't need the absolute most efficient and fastest system possible.

left tartan
#

Oh, sure, yah we have to build out the images, the whole cuda setup is just a pain. Thought you were saying something else about docker

buoyant vine
#

nah, it isn't specific to docker either, but with docker images you normally care to make them as small as is reasonable since it helps start times among other things, and CUDA just throws that out the window :P

final kiln
#

maybe that's the issue, and I should try to build this directly on the machine

#

but then it breaks the rest of the workflow I think, since im using docker for everything

buoyant vine
#

I think that is normally the best place to start, at least that way you can start to pinpoint what might be causing it

#

If I remember right, we had some issues with mismatched cuda versions, where the CUDA v11 pytorch image didn't want to work on EC2 for some reason, but the V12 image did

#

No idea why, didn't question it, just accepted that it was working and agreed to never touch it again

final kiln
#

if it's anything like what I experienced it was probably the ARM architecture stuff, don't know if they resolved it but some time back they didn't have arm wheels or arm images

buoyant vine
#

Haven't attempted ARM yet

#

My end goal is to use the Inf2 EC2 instances, but I still need to write a bunch of bindings and libs to work with the compiler and things

#

personally I think AWS has actually got a CUDA killer... If they focussed on lib support and integration a bit more

final kiln
#

after this morning I think anyone that comes forward with a better dev experience is a cuda killer

#

but i reckon it's more complicated than that cuz it's also hardware stuff

buoyant vine
#

The inf2 instances are pretty insane compute wise

#

It is just the lib support that is -_-

#

I believe they also stopped supporting ONNX which was a weird move imo

trim jewel
#

can someone help me if they know about topics like nlp, summarization, topic modelling, stuff like that? i'm doing a project and i need to know if there are articles/videos which specifically will be useful to my project

final kiln
#

I'm gonna go in steps

  1. Compile simple c++
  2. Bind it to rust
  3. Compile c++ with some torch in it
  4. Bind it to rust
  5. Compile c++ with CUDA
  6. Bind it to rust
  7. Compile c++ with torch and CUDA
  8. Bind it to rust
#

Rn I'm trying to do a python binding using setup() in py. Which might not even be the right direction since I'm not binding it into py like that

#

If I bind anything into py it would be rust

#

I can't not use docker, the experience is bad, but it will only be worse without it

#

Even if some short term relief is achieved, and even that is not guaranteed

final kiln
#

Another possible approach is to try to see how rust behaves when interoping with CUDA, and then see if there's some way to assemble that into a custom layer directly in the torch rust bindings

#

I did check, and have not found a good way to do it in the rust torch bindings. But I'd definitely have to dig deeper.

They are mostly holding on to a pointer to the tensor and then calling c++ code with it.

quaint loom
#

Hi people. Is there anyone here who can have a look at my code?

#

So I have been trying to make a temporal view of my data. 2 variables. Each variable has 6 different areas with (3;2;1 sub-areas).

Over the temporal subplots I've created, I want to show a mean value for each area (from its sub-areas together). So basically I want the mean value to be shown for each day as well as the individual data.

I've done 2 different experiments. But the data is only shown for the first experiment and for day 11 of experiment 2

Experiment 1 (Day 1-5)
Experiment 2 (Day 6-11). So for experiment 2, data for days 6-10 is missing; it must be either the way I am calculating the mean value or the way I am filtering the data.

Code: https://paste.pythondiscord.com/JBVA

lapis sequoia
#

Alright so if i got you correctly,

You want the mean value of the all the data from each day right?

quaint loom
lapis sequoia
quaint loom
lapis sequoia
quaint loom
quaint spade
#

hey everyone , just got started with computer science and theres a course on database , main ERDs , i was wondering if any of you have sources to free exercises i can try and maybe even software for the diagrams , thanx in advance

desert oar
#

ideally you could share a sample dataframe

desert oar
potent sky
#

Is there a stdlib way to do memory profiling?
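
The standard library does ship `tracemalloc`, which traces allocations made through Python's memory allocator. A minimal sketch with a made-up workload follows; note that memory allocated directly by native extensions may not show up, which is one reason third-party profilers exist.

```python
import tracemalloc

tracemalloc.start()

# Hypothetical workload: allocate a list of small byte strings.
data = [bytes(1000) for _ in range(10_000)]

current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current={current} bytes, peak={peak} bytes")

# Show the three source lines responsible for the most traced memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```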

quaint loom
desert oar
desert oar
#

got it. so your data is like this?

Experiment | Day | Area | Measurement | Value
---------------------------------------------
1          | 1   | URT  | δ13C-ΣCO2   | 10.5
...
quaint loom
desert oar
#

so you have 1 measurement each of CH4 and CO2, per bucket per day?

#

or do you have repeats per bucket per day?

quaint loom
desert oar
raw mortar
quaint loom
#

What I initially want to plot is the mean value for each bucket given the same name:

Mean value in grey of all URT (URT1, URT2, URT3) and additionally each individual given a color. And so on for the other. CV will only have 1 tho. and CH2 two
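
A small pandas sketch of that per-area mean, assuming one measurement per bucket per day. The sample values, the short `CO2` column name and the rule "area = bucket name without its trailing number" are assumptions for illustration; the real sheet may name things differently.

```python
import pandas as pd

# Hypothetical sample: one measurement per bucket per day.
data = pd.DataFrame({
    "Bucket": ["URT1", "URT2", "URT3", "CV1", "URT1", "URT2", "URT3", "CV1"],
    "Day": [1, 1, 1, 1, 2, 2, 2, 2],
    "CO2": [10.0, 12.0, 14.0, 8.0, 11.0, 13.0, 15.0, 9.0],
})

# Derive the area by stripping the trailing replicate number ("URT1" -> "URT").
data["Area"] = data["Bucket"].str.replace(r"\d+$", "", regex=True)

# Mean across the buckets that share an area, per day.
area_means = data.groupby(["Area", "Day"], as_index=False)["CO2"].mean()
print(area_means)
```

The resulting `area_means` rows can then be drawn in grey next to the individual buckets.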

desert oar
#

because i thought you only had one measurement from each bucket per experiment per day. so there's nothing to take the mean of

desert oar
#

@quaint loom you also mentioned an "individual" -- what's that in this context?

quaint loom
desert oar
desert oar
# quaint loom

is this the table for one experiment? or is this the whole data?

#

don't be coy here

raw mortar
#

I tried to read to get the context, but i didn't get it

#

Maybe a diagram or some sample showing input and output makes more sense?

quaint loom
quaint loom
desert oar
#

i feel like you're saying conflicting things

#
# Columns: Bucket, Day, Experiment, δ13C-ΣCO2, δ13C-ΣCH4
data: pd.DataFrame = ...

data = data.set_index(["Bucket", "Day", "Experiment"])

is this what your data looks like, or no?

quaint loom
quaint loom
arctic wedgeBOT
#
Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

desert oar
# quaint loom https://paste.pythondiscord.com/JBVA

i want to know what the raw data looks like, i.e. what's in the excel sheet. the screenshot you posted shows one measurement of CH4 and one measurement of CO2 per day, per bucket. it sounds like that is in fact representative of your data. but now you're talking about averaging and i'm trying to figure out what you want to average over.

quaint loom
# desert oar i want to know what the raw data looks like, i.e. what's in the excel sheet. the...

My internet seems to be a bit slow, so I can't paste the data somewhere.

Maybe I am also not using the vocabulary (as you've mentioned to be years ago) accurately. Not sure if it would be more suitable to use the word average or mean. But I would like to plot the Mean/Average of URT 1, URT2, URT 3 etc for each day.

I will share a picture of the dataset until I find a solution, or if you could add me as a friend I can share the excel file.

desert oar
quaint loom
potent sky
#

Hmm then any as-good-as-stdlib de facto way?

desert oar
#

@quaint loom for each of CH4 and CO2, you want to create a plot with an X axis that says "Day" and a Y axis that's the average CH4 or CO2 level, across all buckets?

#

or you want the average within each bucket? but then i don't know what you want to average over.

quaint loom
desert oar
#
# Columns: Bucket, Day, δ13C-ΣCO2, δ13C-ΣCH4
data: pd.DataFrame = ...

data = data.set_index(["Bucket", "Day"])

so your data is like this?

#

where Bucket and Day uniquely identify 1 measurement of δ13C-ΣCO2 and 1 measurement of δ13C-ΣCH4?

raw mortar
potent sky
#

Yes

quaint loom
potent sky
#

Sorry the messages in between had not loaded in for me

raw mortar
quaint loom
potent sky
raw mortar
#

I've been having a great time with scalene recently

#

A few of my colleagues have also been testing out memray from Bloomberg, heard it's pretty good

quaint loom
#

@desert oar But I also want the individual bucket to be represented in the graph, shown with color.

desert oar
#

so you want the individual buckets and the average across buckets?

desert oar
#

i see. that was not clear

#

@quaint loom does this work?

import matplotlib.pyplot as plt
import pandas as pd

# Columns: Bucket, Day, δ13C-ΣCO2, δ13C-ΣCH4
data: pd.DataFrame = ...

fig, axs = plt.subplots(2, 1)

# CO2: one scatter series per bucket
ax = axs[0]
ax.set_title("δ13C-ΣCO2")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
    ax.scatter(bucket_data["Day"], bucket_data["δ13C-ΣCO2"], label=bucket_name)
ax.legend()

# CH4: one scatter series per bucket
ax = axs[1]
ax.set_title("δ13C-ΣCH4")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
    ax.scatter(bucket_data["Day"], bucket_data["δ13C-ΣCH4"], label=bucket_name)
ax.legend()

plt.show()
#

oh i didn't add the averages, hang on

quaint loom
desert oar
#

what's this Experiment 1 and Experiment 2...

#

you didn't say anything about that. i was trying to get clarification but it sounded irrelevant

#

anyway what i posted above should at least give you the right idea. just need to separately call ax.scatter for each plot

quaint loom
#

Think about it as different days. I thought I mentioned to you that it was the same thing, just different days.

desert oar
#

are they days, or are they something else?

quaint loom
desert oar
#

ah

#
import matplotlib.pyplot as plt
import pandas as pd

# Columns: Bucket, Day, δ13C-ΣCO2, δ13C-ΣCH4
data: pd.DataFrame = ...

# reset_index() keeps Day as a regular column so it can be plotted below
daily_avgs = data.groupby("Day")[["δ13C-ΣCH4", "δ13C-ΣCO2"]].mean().reset_index()

fig, axs = plt.subplots(2, 1)

# CO2: individual buckets plus the daily average across buckets
ax = axs[0]
ax.set_title("δ13C-ΣCO2")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
    ax.scatter(bucket_data["Day"], bucket_data["δ13C-ΣCO2"], label=bucket_name)
ax.scatter(daily_avgs["Day"], daily_avgs["δ13C-ΣCO2"], label="average")
ax.legend()

# CH4: individual buckets plus the daily average across buckets
ax = axs[1]
ax.set_title("δ13C-ΣCH4")
ax.set_ylabel("Value")
ax.set_xlabel("Day")
for bucket_name, bucket_data in data.groupby("Bucket"):
    ax.scatter(bucket_data["Day"], bucket_data["δ13C-ΣCH4"], label=bucket_name)
ax.scatter(daily_avgs["Day"], daily_avgs["δ13C-ΣCH4"], label="average")
ax.legend()

plt.show()

that should do it with 2 plots, one for each chemical you're measuring

#

if you want to split it up into 4 plots per day you'll have to do more work

#

but again you might want to look into seaborn, it automates a lot of this for you

#

i don't personally like seaborn very much, but it does make it a little easier to make multi-faceted plots like this

desert oar
#

this is how you can define the experiment column easily:

data["Experiment"] = np.where(data["Day"] <= 5, 1, 2)  # days 1-5 are experiment 1, days 6-11 are experiment 2
#

and that lets you now set Experiment as a facet

quaint loom
desert oar
quaint loom
desert oar
quaint loom
tawdry plover
#

Why is numpy.linalg.det() so inaccurate?

#

or is there something wrong with my algorithm?

#

(I'm using reduction to upper triangular matrix method)
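
For comparison, a sketch of the upper-triangular reduction with partial pivoting, checked against `numpy.linalg.det` with a tolerance. This is not the code discussed here, just one common way to write the method; `det_by_elimination` is a hypothetical name.

```python
import numpy as np


def det_by_elimination(a):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = np.array(a, dtype=float)  # work on a float copy
    n = m.shape[0]
    det = 1.0
    for col in range(n):
        # Pick the largest pivot in this column for numerical stability.
        pivot = col + int(np.argmax(np.abs(m[col:, col])))
        if m[pivot, col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            m[[col, pivot]] = m[[pivot, col]]
            det = -det  # a row swap flips the sign
        det *= m[col, col]
        # Eliminate the entries below the pivot.
        factors = m[col + 1:, col] / m[col, col]
        m[col + 1:, col:] -= np.outer(factors, m[col, col:])
    return det


rng = np.random.default_rng(0)
a = rng.integers(-5, 6, size=(4, 4))
mine, reference = det_by_elimination(a), np.linalg.det(a)

# Compare with a tolerance rather than ==, since both results are floats.
print(mine, reference, np.isclose(mine, reference))
```

Comparing floats with `==` in a test harness is a frequent source of "failed" cases that are really rounding differences.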

final kiln
#

i got tested cpp code bound to rust, tested in the rust code

#

it also compiled with torch

#

now I'm gonna try to get a tensor operation to compute in the cpu via the cpp code

merry ridge
tawdry plover
#

apparently I'm passing 614/10000 tests

#

and getting an average of 0.42% of error

#

wait wait

merry ridge
#

I don't really understand what you are doing at all where you can conclude that the method is numerically unstable as opposed to whatever your algorithm is.

#

What is the condition number and error of one of these matrices that are "so inaccurate"?
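
A quick way to check this with NumPy, using a hypothetical example matrix (an 8x8 Hilbert matrix, a classic ill-conditioned case, not one of the matrices from the tests above):

```python
import numpy as np

# Hypothetical example: build an 8x8 Hilbert matrix, H[i, j] = 1 / (i + j - 1).
n = 8
i = np.arange(1, n + 1)
hilbert = 1.0 / (i[:, None] + i[None, :] - 1)

# A large condition number means small input errors are strongly amplified.
print("condition number:", np.linalg.cond(hilbert))
print("determinant:", np.linalg.det(hilbert))
```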

tawdry plover
#

no numpy was giving me results in the weird floats so i thought it was using heavy optimization tricks which results in inaccuracy

#

I actually fixed my algorithm now

#

numpy was correct

#

9938/10000 passed

#

my bad

final kiln
#

But I'm gonna have to do it without torch for now.

#

No idea how their CUDA API works

#

So I'm just gonna have float pointer passed to the kernel and play around a bit with it

final kiln
#

Just got my first kernel run on a GPU

#

Didn't do anything

#

Cuz I didn't code anything, but I can see from Nvidia smi that the process is accessing the GPU as I hit cargo test

#

This was a lot smoother than expected ngl

lapis sequoia
#

Afternoon guys, I'm having trouble getting CUDA working in VS Code using Python 3.11 + Pipenv. Even though I have a CUDA enabled GPU and installed the CUDA toolkit, torch.cuda.is_available() outputs False and that's it. I can't use my GPU in Windows 10.
Does anybody know what the problem might be?
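
A small diagnostic sketch that is often a useful first step here. `describe_torch_cuda` is a hypothetical helper; it only reads attributes PyTorch exposes and assumes nothing about the machine.

```python
def describe_torch_cuda():
    """Collect the facts that usually explain a False from is_available()."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        # The version string often carries a "+cpu" or "+cuXYZ" build suffix.
        "torch_version": torch.__version__,
        # None means the installed build has no CUDA support at all.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_cuda()
print(info)
```

If `built_with_cuda` is `None`, the installed wheel is CPU-only, and installing the CUDA toolkit separately will not change that.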

lapis sequoia
lapis sequoia
final kiln
lapis sequoia
final kiln
lapis sequoia
final kiln
final kiln
#

uhm you'd think 12.4 is compatible with 12.1

lapis sequoia
final kiln
#

in the link I sent

lapis sequoia
final kiln
#

I've been using the nvidia docker runtime

#

it can also be frustrating at times, but it does free me from these kinds of issues

lapis sequoia
final kiln
#

it's a docker runtime that gives containers cuda capabilities

#

you just do --gpus all when running the container and it becomes available

#

the good part is that pytorch has official docker images

#

so version mismatches are usually not an issue

lapis sequoia
buoyant vine
#

Note the docker daemon also needs to be configured normally to use the nvidia container toolkit

lapis sequoia
#

sorry im very new to this

final kiln
#

ah I code in aws machines that come with it installed

#

but last time I had to install it for some deployments I dont recall having to do that much with it

#

even had a script for it

lapis sequoia
final kiln
lapis sequoia
#

it's moving something, so we will have to see if it works

final kiln
#

I think the source of this mess, if I'm not mistaken, is that Nvidia won't let us have a CUDA alternative - not too sure about it, I overheard it in a yt video

#

There's even a video of the creator of Linux flipping them the finger, must be a reason for it >.>

lapis sequoia
final kiln
#

pipfile ?

lapis sequoia
final kiln
#

I see, a requirements.txt replacement

lapis sequoia
final kiln
#

It's in the title

#

There's also poetry

#

I've replaced all this with docker, not sure if it will ever make sense to go back, don't even have a requirements.txt in my project

raw mortar
final kiln
#

Yes

raw mortar
#

How is it any different than having a requirement file?

final kiln
#

I can pin dependencies from 3 or 4 diff languages

#

There's also env variables and other necessary scripting that can be moved to the docker file

#

So it centralizes a lot of stuff

raw mortar
#

Within the docker file itself it's either
Pip install packages...
Vs
Pip install -r requirements.txt

#

I don't see an improvement

final kiln
#

The improvement lies in not having an extra requirements.txt file around and in being able to pin down the dependencies from the other languages and other parts of the project

raw mortar
#

Lol it just combines both the files

final kiln
#

The docker file is also production ready

raw mortar
final kiln
raw mortar
#

Either i have all the packages within the docker file or point the docker file to a requirement file

#

I don't see an improvement

#

Rather it makes the docker file more cluttered

final kiln
#

I've listed my reasons already, if you have any argument against it I'm happy to listen to it. But at this time this is the best workflow I came up with.

final kiln
raw mortar
#

It's even more clear what's changed in a pr

final kiln
#

like, I'm not gonna argue about this, I prefer stuff in one place

raw mortar
final kiln
#

has happened before

raw mortar
final kiln
#

there's dependencies that you only need in a builder step

#

I don't like having one file for setting up my environment, and then a separate file for setting up another part of it, makes no sense when I can just have it laid out right there within its context

raw mortar
#

You can have multiple dependencies files

final kiln
#

yes, I can have a lot of things

#

doesn't mean it's the most practical way to do it

raw mortar
#

Usually I have like deps folder with main, dev, test etc etc

final kiln
#

in general I avoid making people jump around my code, I try to minimize the number of indirections

raw mortar
final kiln
raw mortar
#

Who even reads dependencies files at all, i usually look it up once in a quarter to bulk update all together or setup a deps bot

final kiln
#

or when I need to check a version

#

or when cuda is acting up again

#

hardcoding is how I do it, you do it different, it's fine

raw mortar
#

Like core dependencies are usually defined in pyproject.toml, for packages

raw mortar
final kiln
spring field
#

something that just came to my mind: is cgpt, at least initially, speaking so, you could say, eloquently, because those words may have been rarer in the dataset so they were artificially inflated with weights or whatever and that turned out to just make it choose those words more often in the end?

final kiln
#

it allows me to code in any machine in a reproducible way, it fetches me a machine from my cloud provider and exposes me a vscode instance in the browser

raw mortar
final kiln
#

there's a lot of moving parts in my project

#

which don't fit into that structure

raw mortar
#

Neither do it, polyglot ftw

final kiln
raw mortar
final kiln
#

and I also dont have a gpu

#

if you have a better way of achieving a burn rate of 10cents an hour, I'm all ears tbh

raw mortar
final kiln
#

I actually couldn't compile a cuda extension with python, ended up being a lot smoother with rust, go figure

raw mortar
final kiln
#

rust is being used because the torch cpp interface is not stable, the only job of the rust torch maintainers is to keep that torch interface stable to its users

#

I decided to code cuda because it's the only way to squeeze out the performance gains from my proposed attention mechanism

#

unless there's a layer that leverages the symmetry of symmetric tensors, but I couldn't even find a way of imposing that with pytorch without a lot of overhead
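
To illustrate what "imposing symmetry" can look like, here is a NumPy sketch (not PyTorch, and not the mechanism being built here): only the n(n+1)/2 free parameters are stored, and the full matrix is rebuilt by scattering them into both triangles. That scatter step is the kind of overhead mentioned above.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

# Store only the n*(n+1)/2 free parameters of a symmetric n x n matrix.
params = rng.normal(size=n * (n + 1) // 2)

rows, cols = np.triu_indices(n)
sym = np.zeros((n, n))
sym[rows, cols] = params  # fill the upper triangle (diagonal included)
sym[cols, rows] = params  # mirror it into the lower triangle

print(np.allclose(sym, sym.T))
```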

#

rust runs the training loop in a process, python generates data in another, which means there's never a gpu down time thus being cost effective

past meteor
#

My deep learning set up at work finally works 🙏

#

Still don't have all I need, in the sense I'm being bottlenecked by RAM and CPU and not by GPU/vRAM

#

Luxury problem

raw mortar
#

@final kiln what is the actual problem you're trying to solve?

final kiln
#

I've just laid out one of them

raw mortar
#

I still didn't get the intent though, the use of certain tech could have its reasons

final kiln
raw mortar
#

But aren't most ml packages compiled to begin with?

final kiln
raw mortar
#

Python is used to interact and interface

final kiln
#

which are not currently done

raw mortar
final kiln
#

I don't think you're trying to understand what I'm saying

raw mortar
final kiln
#

yes you can have runtime checks

#

but with a compiled language you can have the compiler carry out the work and not have the overhead when training

#

I've also felt the need for stronger type checking in the models as my projects grew larger

raw mortar
#

Still, I'm wondering: the data/matrix only comes into play at runtime, so how can a compiler find this out at compile time?

final kiln
#

I managed to trick the rust compiler into it using macros

final kiln
#

but it's an open ended exploration, idk how far it will get and if rust is even the right language for it