#data-science-and-ml | Python | Page 113

winter nacelle Mar 25, 2024, 12:15 PM

#

Why can't I upload the PDF?

#

Okay, so I can't upload a PDF file

#

I will drop the link to access the paper

#

Please take a look

#

Weather Forecasting using Incremental K-means Clustering - arXiv https://arxiv.org/pdf/1406.4756

civic elm Mar 25, 2024, 12:42 PM

#

Am I wrong here? Are low-resource language models just glorified token-compressed lookup tables, i.e. next-word prediction models with low temperature and low top-k?

pure pond Mar 25, 2024, 12:54 PM

#

what language models? If you mean LLMs, then what lookup table? What are the keys in the table you're imagining?

civic elm Mar 25, 2024, 1:05 PM

#

like, if the language model is fine-tuned on a small set of Q&A pairs and we take the prompt as the input, the prompt would basically be the key to the table

#

and then we write our inference code to only have a few sets of allowed keywords

#

then it's just a chat based knowledge vault, essentially

final kiln Mar 25, 2024, 2:43 PM

#

current draft, a bit more refined. I still haven't decided on some details of the notation, so it might not be consistent yet; gotta look up what people usually use and use that

#

not gonna be making a lot of major advances cuz I'm also writing my resume rn

pale thunder Mar 25, 2024, 2:52 PM

#

When dealing with the Gini index for the purposes of deciding a split in a decision tree, you compute the Gini index of the samples on each side of the split, then take a weighted average of the two indices. However, a Gini index is supposed to be the probability of a sample being misclassified, that is, the probability that `random.choice(samples).class_ != random.choice(samples).class_`, so the correct way to compute this for the two splits would be a different formula entirely. Why is the weighted average used?

hollow sentinel Mar 25, 2024, 3:08 PM

#

@serene scaffold I think I did something cool on my own (for once)

#

I parsed this Excel file into a pandas DataFrame. I hate parsing Excel files into pandas DataFrames.

#

import pandas as pd

df2 = pd.read_excel(
    "/Users/rahuldas/Desktop/Tortilla Dataset/statistic_id1345446_corn-tortillas-consumer-price-index-change-in-mexico-2021-2023.xlsx",
    sheet_name="Data",
    skiprows=[0, 1, 2, 3, 4],  # skip the first five rows of the sheet
    names=["Months", "Percentage Amount"],
    usecols=[0, 1],  # keep only the first two columns
)
print(df2.head())

lapis sequoia Mar 25, 2024, 4:45 PM

#

how would I get the following to work in Python? This is some mincer thing.

carmine girder Mar 25, 2024, 4:52 PM

#

Can anyone tell me how I should get started with TensorFlow?

past meteor Mar 25, 2024, 5:23 PM

#

pale thunder When dealing with the gini index for the purposes of deciding a split in a decis...

Because you want to take the proportions into account

#

You wouldn't want the tree to make splits that split off a single instance into a leaf each time. You'd much prefer splits that can split off a large number of instances

pale thunder Mar 25, 2024, 5:29 PM

#

past meteor Because you want to add the proportions to it

the correct way to compute this for the two splits would be a different formula entirely
turns out I am just straight up wrong, this is in fact the correct formula.
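A quick numerical check of that conclusion, as a minimal Python sketch (the helper names `gini`, `split_gini` and `two_step_mismatch` are illustrative, not from any library). The weighted average is the law of total probability: pick a random sample from the parent node, which lands on a given side with probability n_side / n, then compare it with a random sample (with replacement) from that same side. The mismatch probability of that two-step draw equals the weighted Gini of the split, which brute-force enumeration confirms.

```python
from collections import Counter
from itertools import product


def gini(labels):
    """Gini impurity: P(two draws with replacement have different classes)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Weighted average of the child impurities, as used to score a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def two_step_mismatch(left, right):
    """Brute force: choose a side with probability n_side / n, then draw two
    samples with replacement from that side and check whether classes differ."""
    n = len(left) + len(right)
    total = 0.0
    for side in (left, right):
        if not side:
            continue
        pairs = list(product(side, repeat=2))
        mismatch = sum(a != b for a, b in pairs) / len(pairs)
        total += len(side) / n * mismatch
    return total


left, right = [0, 0, 1], [1]
print(split_gini(left, right))         # ≈ 0.333
print(two_step_mismatch(left, right))  # same value
```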

mild grotto Mar 25, 2024, 10:54 PM

#

I noticed I have an off-by-one error (the area between blue and red has a disjointed connection where it meets the south pole area) 😦 😦 😦

#

That's... going to be annoying to debug

#

I suspect it's related to pyproj

#

or bigger

#

I think the south pole is actually correct, and it's the prime meridian that has the off-by-one (as well as the north pole), since I think the black line at the north pole should be 1 pixel to the left, which would look more symmetric with the south pole

#

Though... wait. If black is 0, and red is 1...

Doesn't this mean my longitude is increasing clockwise around the globe? Doesn't it go the other way?

#

😦 Oh no, there are more bugs than I thought

abstract scroll Mar 26, 2024, 2:17 AM

#

Made this AI that runs an LLM locally through Python and gave it some speech recognition for commands; still very much a work in progress

Context: it's replying with "xdd" and "short and bad" answers because, for the sake of debugging and testing, I made it behave like that (I'm running the LLM locally off a bad CPU, so I wanted to keep the response time lowish). I know there are still some bugs with the text settings (fixing that rn)

Overall, I'm very happy with it so far. In the future I'd probably switch back to Microsoft Azure speech recognition and speech synthesis, and probably buy a GeForce graphics card with CUDA for faster responses (and bigger models)

lofty thorn Mar 26, 2024, 5:38 AM

#

abstract scroll Made this AI that runs a LLM locally through Python, gave it some speech recogni...

cool

terse kindle Mar 26, 2024, 8:39 AM

#

I am trying to create an AI to forecast household electricity consumption, appliance by appliance, for a month. I have asked all the chatbots to write code to create a model using a suitable algorithm, but I've still been facing problems, as I don't have strong knowledge of this. Are there any free resources to learn from, particularly for my project, or any existing research paper to learn from?

twilit elk Mar 26, 2024, 9:34 AM

#

terse kindle I am trying to create an AI to forecast household electricity consumption applia...

You can try looking on Kaggle; it hosts competitions and projects, and often it also includes guides with code provided. There's a high chance a similar project already exists on Kaggle, meaning you can look at other people's code, or even look up a YouTube video

#

Quick question here: I'm doing a project and I'm currently in the pre-processing stage of my data. After assessing correlation, I noticed that there are most likely non-linear relationships between features. Does anyone know some techniques to uncover these non-linear relationships so that I can perform feature selection?

terse kindle Mar 26, 2024, 9:45 AM

#

twilit elk You can try and look on the website Kaggle, it hosts competitions and projects, ...

I've tried all those, but it's still hard to find the dataset and the right algorithm for this specific project goal.

#

I later created a dataset using ChatGPT and applied all the suitable algorithms, but the accuracy is low. Every attempt I've ever made was through ChatGPT-provided code.

jaunty helm Mar 26, 2024, 9:59 AM

#

twilit elk Quick question here, im doing a project and im currently in the pre-processing s...

there are metrics other than (Pearson) correlation you can try, like Kendall's, Spearman's, mutual information, etc.

#

in fact, if you look at the docs of `DataFrame.corr`, you can see that you can specify `method=...` to use the aforementioned Kendall/Spearman
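A small sketch of `DataFrame.corr` with its three built-in `method` values, on synthetic data made up for illustration. Spearman and Kendall are rank-based, so they score any monotonic relationship (here `exp(x)`) as a perfect 1.0 while Pearson does not; none of the three, however, detects a non-monotonic dependence such as `x**2` on a symmetric range, which is one reason they can all come out about the same.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 500)
df = pd.DataFrame({
    "x": x,
    "exp_x": np.exp(x),   # monotonic but nonlinear in x
    "x_squared": x ** 2,  # non-monotonic in x
})

# Correlation of x with each derived column, for every method pandas supports.
corr_table = {
    method: df.corr(method=method).loc["x", ["exp_x", "x_squared"]].to_dict()
    for method in ("pearson", "spearman", "kendall")
}
for method, values in corr_table.items():
    print(method, {k: round(v, 2) for k, v in values.items()})
```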

twilit elk Mar 26, 2024, 10:01 AM

#

Sure, I'll give it a try. Didn't notice the different methods. The thing is also that it's time-series data, so that might also play a role

#

I checked it; there is not much difference between the correlation measures, they're all still approximately the same

jaunty helm Mar 26, 2024, 10:06 AM

#

twilit elk Sure, ill give it a try. Didnt notice the different methods. The thing is also t...

I've also seen people recommend not doing manual feature selection and leaving it to regularization
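A minimal sketch of the "leave it to regularization" idea, assuming a regression target and synthetic data: an L1 penalty (`Lasso`) drives the coefficients of uninformative features to exactly zero, so the fitted model does the selection. For a ±1 classification target, `LogisticRegression(penalty="l1", solver="liblinear")` is the analogous choice. Features should be on comparable scales because the penalty treats all coefficients alike, and the `alpha` here is arbitrary; it would normally be tuned (e.g. with `LassoCV`).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # columns 0-1 informative, 2-4 pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)

model = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(model.coef_)  # indices of features with nonzero weight
print("coefficients:", model.coef_.round(3))
print("kept features:", kept)
```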

twilit elk Mar 26, 2024, 10:07 AM

#

I see, but isn't that highly dependent on what models you choose?

jaunty helm Mar 26, 2024, 10:07 AM

#

twilit elk I checked it, there is not much difference between corr measures, all still appr...

I guess you can still check mutual information
there are 2 versions: the first one's for regression and the second for classification

scikit-learn

sklearn.feature_selection.mutual_info_regression

Examples using sklearn.feature_selection.mutual_info_regression: Comparison of F-test and mutual information

scikit-learn

sklearn.feature_selection.mutual_info_classif

Examples using sklearn.feature_selection.mutual_info_classif: Selecting dimensionality reduction with Pipeline and GridSearchCV
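A minimal sketch of why mutual information can catch what correlation coefficients miss, on synthetic data invented for this example: `y` depends on `x_signal` only through `x_signal**2`, so Pearson correlation is near zero, yet `mutual_info_regression` (a k-nearest-neighbour estimator) still ranks it far above an unrelated noise column. `random_state` is fixed because the estimator adds a little random noise to continuous features; `mutual_info_classif` has the same interface for a discrete target.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 1000
x_signal = rng.uniform(-3, 3, n)
x_noise = rng.uniform(-3, 3, n)
y = x_signal ** 2 + rng.normal(0, 0.1, n)  # non-monotonic dependence on x_signal

X = np.column_stack([x_signal, x_noise])
mi = mutual_info_regression(X, y, random_state=0)
pearson = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

print("mutual information:", mi.round(3))
print("pearson r:", np.round(pearson, 3))
```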

twilit elk Mar 26, 2024, 10:08 AM

#

jaunty helm I guess you can still check [mutual information](https://scikit-learn.org/stable...

I'll try the classif one, since my target variable is either 1 or -1. We'll see if it holds any new info.

jaunty helm Mar 26, 2024, 10:10 AM

#

twilit elk I see, but isnt that highly dependent on what models you choose

there's also the option of using metrics/models to select features for you instead of doing it manually
e.g. SelectKBest

scikit-learn

sklearn.feature_selection.SelectKBest

Examples using sklearn.feature_selection.SelectKBest: Release Highlights for scikit-learn 1.1 Pipeline ANOVA SVM Univariate Feature Selection Concatenating multiple feature extraction methods Selec...
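A sketch of `SelectKBest` driven by `mutual_info_classif`, on made-up data with a ±1 target like the one described above. `functools.partial` is used only to pin `random_state`, so the MI estimate and therefore the selection are reproducible; `k=2` is an arbitrary choice for the example.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                # 6 candidate features
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # only features 0 and 1 matter

score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=2).fit(X, y)

print("MI scores:", selector.scores_.round(3))
print("selected:", selector.get_support(indices=True))
X_reduced = selector.transform(X)            # keeps the 2 best columns
```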

#

some guide on the sklearn site

scikit-learn

Model-based and sequential feature selection

This example illustrates and compares two approaches for feature selection: SelectFromModel which is based on feature importance, and SequentialFeatureSelector which relies on a greedy approach. We...
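For the greedy approach mentioned in that guide, a minimal sketch with `SequentialFeatureSelector`, again on synthetic ±1 data (the estimator, `n_features_to_select=2` and `cv=3` are arbitrary choices). Forward selection starts from no features and repeatedly adds whichever one most improves the cross-validated score; because it refits the model many times per step, it can get slow with ~140 candidate features.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # only features 0 and 1 matter

sfs = SequentialFeatureSelector(
    LogisticRegression(),
    n_features_to_select=2,
    direction="forward",  # add one feature at a time
    cv=3,
)
sfs.fit(X, y)
selected = sfs.get_support(indices=True)
print("selected:", selected)
```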

twilit elk Mar 26, 2024, 10:17 AM

#

jaunty helm I guess you can still check [mutual information](https://scikit-learn.org/stable...

mutual_info_classif yields more features that have some importance, although the importance values are all below 0.02. Note: I have about 140 variables, so I figured the importance would be spread out, but I hoped for a few very important features. On to SelectKBest!

hallow sphinx Mar 26, 2024, 11:55 AM

#

Hey, I want to start learning about AI development. I have ~3 years of experience in programming, which I learned by myself. How should I get started with data science and AI? Can somebody point me to the recommended resources for absolute beginners in this field?

serene scaffold Mar 26, 2024, 1:27 PM

#

hallow sphinx Hey, I want to start learning about AI development. I have ~3 years experience i...

!resources data science

arctic wedge BOT Mar 26, 2024, 1:27 PM

#

Resources

The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.

final kiln Mar 26, 2024, 1:43 PM

#

Hopefully I'll be able to do summarization with decoder-only blocks; not sure if I'll have time to code cross-attention

untold dove Mar 26, 2024, 2:57 PM

#

https://discord.com/channels/267624335836053506/1222192150150779020

#

is anyone able to assist me here? I was directed to this channel

slender ledge Mar 26, 2024, 6:33 PM

#

hallow sphinx Hey, I want to start learning about AI development. I have ~3 years experience i...

Try setting up the OpenAI client on your computer and then go through some of their tutorials

dawn whale Mar 27, 2024, 2:18 AM

#

yo, I have a regression model that outputs 36 dimensions, each of them being a human body length

I want to compare how well the model did on each body length, so I just calculated the MAE between its predictions and the ground truth, but the longer body lengths tend to have higher errors (which isn't surprising)

So I wanted to ask: how should I normalize it? Should I use RAE (relative absolute error) or just divide by the mean of the ground truth?

fickle shale Mar 27, 2024, 5:35 AM

#

Any AI/DS devs here?

#

Need some career advice

#

How can I become a good AI dev, and how can I start?

#

I want to strengthen my fundamentals first, so how can I start with AI

#

and work on the fundamentals?

#

Thank you, your advice is appreciated!

deep bough Mar 27, 2024, 7:43 AM

#

Hi, has anyone worked with Flowise AI?

brave cobalt Mar 27, 2024, 7:52 AM

#

has anybody worked with the pytrends library?

final kiln Mar 27, 2024, 8:02 AM

#

@brave cobalt @deep bough just ask your questions, don't ask to ask

deep bough Mar 27, 2024, 8:02 AM

#

I am facing an issue connecting custom tools to webhooks

final kiln Mar 27, 2024, 8:03 AM

#

fickle shale ```How can i become good ai dev and how can i start```

depends on how deep you wanna go: if you wanna do research, you'll need the math; if you wanna do high-level gluing of AI components, you can do it with good software knowledge and a surface-level understanding of how AI works

#

I personally feel like knowing at least a bit of multivariate calculus is necessary to understand concepts like gradient descent, or just what gradients are. But multivariate calculus is not much harder than normal calculus, especially if you don't get into the advanced stuff

deep bough Mar 27, 2024, 8:05 AM

#

deep bough I am facing issue with connecting custom tools with webhooks

Here is my Flowise AI workflow

#

->

#

And here is my custom tool query

final kiln Mar 27, 2024, 8:06 AM

#

deep bough ->

interesting, which part is not working? Do you have an error somewhere?

final kiln Mar 27, 2024, 8:07 AM

#

deep bough And here is my custom tool query

aah, JavaScript. What does the error print out?

#

I both miss it and hate it, how is that possible

deep bough Mar 27, 2024, 8:07 AM

#

custom tool is not activating during the chat

final kiln Mar 27, 2024, 8:07 AM

#

deep bough custom tool is not activating during the chat

right, but what does console.error print out?

deep bough Mar 27, 2024, 8:09 AM

#

I'm trying to make an appointment chatbot: while chatting, it should ask the user their name, and that name will be pulled with the help of a custom tool (given the name property and a JS query, which is right), so it should pull the name and post it to the webhook

#

🙂🙂

#

no error

#

while chatting I am giving my name, but it's not activating the custom tool

final kiln Mar 27, 2024, 8:10 AM

#

dawn whale yo, I have some regression model, that outputs 36 output dimensions, each of the...

maybe z-score normalization? Like, look up the tabulated average and standard deviation for all humans (they'd be estimates ofc)

deep bough Mar 27, 2024, 8:10 AM

#

deep bough while chatting I am giving name but its not activating the custom tool

and it's not even connecting to the webhook

final kiln Mar 27, 2024, 8:11 AM

#

deep bough while chatting I am giving name but its not activating the custom tool

ah okay, so it's ChatGPT that is not picking up on your prompt to create a request in the first place?

deep bough Mar 27, 2024, 8:11 AM

#

final kiln ah okay so it's chat gpt that is not picking up on your prompt to create a reque...

I guess yes

final kiln Mar 27, 2024, 8:11 AM

#

deep bough I guess yes

ask it whether it knows about the tool

#

if it doesn't, then maybe the tool isn't made available to it in the first place

#

but I guess this is the challenge of working with LLMs; they're very unpredictable

#

maybe reduce the temperature to 0, or whatever parameter controls the output sampling

deep bough Mar 27, 2024, 8:12 AM

#

it's having a normal conversation as it should, so that means the OpenAI tool is working

final kiln Mar 27, 2024, 8:12 AM

#

deep bough its having a normal conversating as it should so it means OpenAi tool is working

yeah, but that doesn't mean it knows about the endpoints, right?

deep bough Mar 27, 2024, 8:13 AM

#

maybe, just a second, let me try

#

YAA IT'S WORKING

#

THANKS

final kiln Mar 27, 2024, 8:34 AM

#

Awesome

#

I've rewritten the gradients, now I just have to code them into the CUDA code

#

the first one is looking kinda suss tho

#

cuz I did a whole thing just to get to this, to avoid computing extra stuff

#

and one does not look like the derivative of the other

final kiln Mar 27, 2024, 8:46 AM

#

final kiln I've rewritten the gradients, now I just have to code them into the cuda code

I think I gotta lower the indices on the deltas in the first line of the first equation, because there's an implied sum with Mkk'

#

yeah, that was the case; dunno why I'm operating on the original expression anyway

#

I'm keeping it like this, but again, the only way I'm gonna know this is right is with a unit test on a fully coded layer

lapis sequoia Mar 27, 2024, 9:28 AM

#

Hello, I am creating a 1-layer neural network using NumPy that is trying to learn the AND gate, and something is wrong: I am doing the forward pass and adjusting the weights, but the outputs are wrong

#

I am doing it for my school project, and if someone can help me with it I would be really grateful

small wedge Mar 27, 2024, 9:35 AM

#

lapis sequoia

your sigmoid derivative is wrong

#

it should be sigmoid(z) * (1 - sigmoid(z))

tidal bough Mar 27, 2024, 9:36 AM

#

I don't think so; they pass y to it after all.

#

(more like, the argument of sigmoid_der should be called y and not z)

lapis sequoia Mar 27, 2024, 9:41 AM

#

small wedge your sigmoid derivative is wrong

`forward` already returns a sigmoid, so I just pass its output into the sigmoid derivative so I don't calculate the sigmoid again in it

small wedge Mar 27, 2024, 9:43 AM

#

oh, I didn't see that

lapis sequoia Mar 27, 2024, 9:44 AM

#

ok, I missed adding the biases in forward

tidal bough Mar 27, 2024, 9:44 AM

#

lapis sequoia

I think your backprop is wrong: dW should involve np.dot(X, error), not np.dot(self.weights, error) (unless I can't take derivatives this early in the morning).

small wedge Mar 27, 2024, 9:44 AM

#

shouldn't the derivative of input.dot(weights) be transposed?

lapis sequoia Mar 27, 2024, 9:45 AM

#

it started to work after I added biases and changed the learning rate

final kiln Mar 27, 2024, 9:48 AM

#

lapis sequoia

Your error is a vector, shouldn't it be a scalar?

tidal bough Mar 27, 2024, 9:51 AM

#

the variable name is slightly misleading but that part of the formula is correct I think

lapis sequoia Mar 27, 2024, 9:51 AM

#

tidal bough i think your backprop is wrong - dW should involve `np.dot(X, error)`, not `np.d...

ok, you're right. I checked the school materials I got from my teacher, changed it, and it's working properly
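Pulling the fixes from this thread together, a minimal NumPy sketch of a single-layer sigmoid network learning AND, assuming squared-error loss and full-batch updates (learning rate, epoch count and seed are arbitrary). The derivative is computed from the sigmoid's output `y` as `y * (1 - y)`, the weight gradient uses the transposed input `X.T @ delta`, and there is a bias. Without a bias, the input `(0, 0)` always produces `sigmoid(0) = 0.5`, so AND cannot be fit.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_der_from_output(y):
    """Derivative of the sigmoid written in terms of its output y = sigmoid(z)."""
    return y * (1.0 - y)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (4, 2)
t = np.array([[0], [0], [0], [1]], dtype=float)              # AND targets

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 1))
b = np.zeros((1, 1))
lr = 1.0

for _ in range(10_000):
    y = sigmoid(X @ W + b)                        # forward pass, shape (4, 1)
    error = y - t                                 # dL/dy for L = 0.5 * sum((y - t)**2)
    delta = error * sigmoid_der_from_output(y)    # dL/dz
    dW = X.T @ delta                              # shape (2, 1)
    db = delta.sum(axis=0, keepdims=True)         # shape (1, 1)
    W -= lr * dW
    b -= lr * db

pred = sigmoid(X @ W + b)
print(pred.round(3).ravel())
```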

final kiln Mar 27, 2024, 10:10 AM

#

tidal bough the variable name is slightly misleading but that part of the formula is correct...

OK, I think I'm actually more confused by the fact that I don't know the dimensions of anything, but I think I see what's going on: the np.dot will do a matrix multiplication

hallow sphinx Mar 27, 2024, 10:20 AM

#

final kiln I've rewritten the gradients, now I just have to code them into the cuda code

I am just beginning AI and this scares me

final kiln Mar 27, 2024, 10:21 AM

#

hallow sphinx I am just beginning AI and this scares me

I'm tryna scare someone into hiring me

#

it's on my resume

#

but like, you shouldn't worry, this is super specialized stuff

hallow sphinx Mar 27, 2024, 10:21 AM

#

Is this what I have to go through in my college if I am pursuing AI/ML? 😭

final kiln Mar 27, 2024, 10:22 AM

#

hallow sphinx Is this what I have to go through in my college if I am pursuing AI/ML? 😭

I don't think tensor calculus is part of it

hallow sphinx Mar 27, 2024, 10:22 AM

#

good to know

#

What math do I need for AI/ML? Anyone have a playlist for that?

tidal bough Mar 27, 2024, 10:25 AM

#

maybe not a lot of tensor calculus but I'd be surprised to not see linear algebra and a bit of multivariate calculus in an ML track

final kiln Mar 27, 2024, 10:30 AM

#

I wish ML papers used it tho, a lot of stuff goes under-specified with normal matrix notation

rocky ridge Mar 27, 2024, 10:37 AM

#

hallow sphinx What math do I need for AI/ML? Anyone have a playlist for that?

calculus

#

linear algebra

#

vectors

#

Matrices
Statistics

full pilot Mar 27, 2024, 10:54 AM

#

hi, I'm kind of new to machine learning, and I'm not sure if this is the right chat to ask such questions either. I can't seem to get results that look right to me. I am trying to predict the daily peat collection quantity based on daily weather statistics. I've tried multiple scikit-learn models, parameter tuning, and a StandardScaler, but nothing really works out the way I think it should. Can you recommend any models or just give any tips for this situation?

#

here's an example of what the full dataset looks like. "total_qty" is the peat collected that day

bronze robin Mar 27, 2024, 11:06 AM

#

Using matplotlib, how can I plot a figure like this, where I fit two plots in the same axes frame, i.e. one above and the other one below?

fickle shale Mar 27, 2024, 11:07 AM

#

final kiln depends on how deep you wanna go, if you wanna do research you'll need the math,...

I want to learn so I can get a job as a fresher, and I want strong fundamentals

fickle shale Mar 27, 2024, 11:08 AM

#

bronze robin Using matplotlib how can I plot such figure where I can fit two plots in same ax...

Read the matplotlib or seaborn documentation; I forget how we can create that

boreal gale Mar 27, 2024, 11:10 AM

#

bronze robin Using matplotlib how can I plot such figure where I can fit two plots in same ax...

are you looking for twinx/twiny? https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.twinx.html

bronze robin Mar 27, 2024, 11:21 AM

#

boreal gale are you looking for twinx/twinxy? https://matplotlib.org/stable/api/_as_gen/matp...

Kind of, but the y-axis reference (0) is shifted for both plots so that they don't coincide

bronze robin Mar 27, 2024, 11:22 AM

#

boreal gale are you looking for twinx/twinxy? https://matplotlib.org/stable/api/_as_gen/matp...

I will look into it more, thank you

boreal gale Mar 27, 2024, 11:22 AM

#

bronze robin Kind of but the y axis reference (0) is shifted for both plots that they shouldn...

configure the ylims and ticks, I guess? Unless other matplotlib gurus have other ideas, that is.
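One way to do the ylim idea, as a sketch rather than the only option (data, limits and tick positions are made up): put the second series on a `twinx()` axis and give the two y-axes offset limits, so each series occupies its own half of the same frame and each axis's 0 sits at a different height. `plt.subplots(2, 1, sharex=True)` with the vertical spacing removed is the usual alternative if two separate frames are acceptable.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np


def make_stacked_twin_plot():
    x = np.linspace(0, 10, 200)
    top_data = 0.5 + 0.5 * np.sin(x)     # values in [0, 1]
    bottom_data = 0.5 + 0.5 * np.cos(x)  # values in [0, 1]

    fig, ax_top = plt.subplots()
    ax_bottom = ax_top.twinx()           # shares the x-axis, new y-axis on the right

    ax_top.plot(x, top_data, color="tab:blue")
    ax_top.set_ylim(-1, 1)               # data in [0, 1] fills the upper half
    ax_top.set_yticks([0, 0.5, 1])       # only label the upper half

    ax_bottom.plot(x, bottom_data, color="tab:red")
    ax_bottom.set_ylim(0, 2)             # data in [0, 1] fills the lower half
    ax_bottom.set_yticks([0, 0.5, 1])    # only label the lower half

    return fig, ax_top, ax_bottom


fig, ax_top, ax_bottom = make_stacked_twin_plot()
```

Calling `fig.savefig(...)` (or `plt.show()` with an interactive backend) then renders it.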

deep bough Mar 27, 2024, 12:19 PM

#

Hi, how can I integrate a Flowise AI chatbot with WhatsApp?

I also want to store user input in Excel

dawn whale Mar 27, 2024, 12:34 PM

#

final kiln maybe z-score normalization ? like, look for the tabulated average and standard ...

ah ok, in this case this would be mathematically the same as dividing by standard deviation (when calculating MAE loss), thanks!
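A minimal NumPy sketch of the per-dimension normalizations discussed here, on tiny made-up arrays where the second "body length" is simply ten times the first. Dividing the MAE by each dimension's standard deviation is the same as computing the MAE on z-scored values; RAE divides the total absolute error by that of a baseline that always predicts the mean; dividing by the mean is the third option from the question. Using the ground-truth statistics with `ddof=0` is an assumption here; tabulated population statistics could be substituted.

```python
import numpy as np

# Rows are samples, columns are output dimensions (e.g. two body lengths).
y_true = np.array([[10.0, 100.0],
                   [20.0, 200.0],
                   [30.0, 300.0],
                   [40.0, 400.0]])
y_pred = y_true + np.array([[1.0, 10.0],
                            [-1.0, -10.0],
                            [1.0, 10.0],
                            [-1.0, -10.0]])

abs_err = np.abs(y_pred - y_true)
mae = abs_err.mean(axis=0)                    # raw MAE per dimension

mean = y_true.mean(axis=0)
std = y_true.std(axis=0)                      # ddof=0 (population std)

mae_over_std = mae / std                      # "z-score" normalization
rae = abs_err.sum(axis=0) / np.abs(y_true - mean).sum(axis=0)
mae_over_mean = mae / mean

# MAE computed on z-scored data equals MAE / std, because the mean cancels.
z_true = (y_true - mean) / std
z_pred = (y_pred - mean) / std
mae_z = np.abs(z_pred - z_true).mean(axis=0)

print("MAE:", mae)
print("MAE / std:", mae_over_std)
print("RAE:", rae)
print("MAE / mean:", mae_over_mean)
```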

lapis sequoia Mar 27, 2024, 2:30 PM

#

hallow sphinx Is this what I have to go through in my college if I am pursuing AI/ML? 😭

depends on the kind of AI job you want

#

you can work with AI without needing to use all the math tools, I mean, to develop a new deep learning architecture layout to solve a problem.
but it's still always highly recommended to have good statistics skills

#

but if you want to do research and create new types of algorithms, like optimization, implement libraries from scratch, or something related, you will need a good math background

final kiln Mar 27, 2024, 3:13 PM

#

dawn whale ah ok, in this case this would be mathematically the same as dividing by standar...

Doing it directly on the input is better tho; it reduces the risk of overflow during inference and all other sorts of floating-point-related trouble

lofty thorn Mar 27, 2024, 3:31 PM

#

please explain this

#

I didn't get the formula

final kiln Mar 27, 2024, 4:01 PM

#

which one

lofty thorn Mar 27, 2024, 4:02 PM

#

both

final kiln Mar 27, 2024, 4:02 PM

#

what part of the first one confuses you

lofty thorn Mar 27, 2024, 4:03 PM

#

what are j, n, p?

final kiln Mar 27, 2024, 4:05 PM

#

p is the percentile, j is the index of the data point, and n is the total number of data points

#

the author is arguing that the definition is ambiguous because there are many such P's
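To see the ambiguity concretely, a small sketch with NumPy, whose `np.percentile` exposes several definitions through its `method` argument (NumPy 1.22 and later; older versions call it `interpolation`). On the same four data points the 25th percentile comes out as 1, 1.5, 1.75 or 2 depending on the rule. `nearest_rank` is a hand-written version of one common textbook definition, not a NumPy function.

```python
import math

import numpy as np

data = [1, 2, 3, 4]
p = 25

results = {
    method: float(np.percentile(data, p, method=method))
    for method in ("linear", "lower", "higher", "nearest", "midpoint")
}


def nearest_rank(values, percent):
    """Smallest value with at least `percent`% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(percent / 100 * len(ordered))  # 1-based rank j
    return ordered[max(rank, 1) - 1]


results["nearest_rank"] = nearest_rank(data, p)
for name, value in results.items():
    print(f"{name:>12}: {value}")
```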

lofty thorn Mar 27, 2024, 4:10 PM

#

sorry, I don't get it..
I'm new to stats

#

can someone guide me?

final kiln Mar 27, 2024, 4:17 PM

#

you gotta give a starting point by trying to understand: throw out a hypothesis, draw something, see what's the earliest thing in the text you understand, and go from there

lofty thorn Mar 27, 2024, 4:26 PM

#

how do I understand this formula..
I mean, I know the basic stats, like mean deviation, standard deviation, median absolute deviation, etc.
I get the interquartile range but don't know why we measure percentiles

lofty thorn Mar 27, 2024, 4:26 PM

#

lofty thorn please explain this

and I can't understand the formula...

final kiln Mar 27, 2024, 4:27 PM

#

just like the mean, std, etc., it's just another way to characterize the data without having to look at the entire thing

lofty thorn Mar 27, 2024, 4:28 PM

#

ok..

#

can you please decode the formula for me?

final kiln Mar 27, 2024, 4:30 PM

#

it's usually best if you do that yourself, otherwise you'll always be dependent on someone else to learn a new bit of information

#

I can help you get unstuck if you get stuck in a specific place, but otherwise you should be able to study on your own

lofty thorn Mar 27, 2024, 4:31 PM

#

ok fine

final kiln Mar 27, 2024, 4:32 PM

#

https://en.wikipedia.org/wiki/Percentile -

Percentile

In statistics, a k-th percentile, also known as percentile score or centile, is a score below which a given percentage k of scores in its frequency distribution falls ("exclusive" definition) or a score at or below which a given percentage falls ("inclusive" definition).
Percentiles are expressed in the same unit of measurement as the input sco...

final kiln Mar 27, 2024, 4:49 PM

#

final kiln im keeping it like this, but again only way im gonna know this is right is with ...

this project has a nice MLOps pipeline from dev to prod: it has data pipelines using Prefect and deployments, uses Python, Rust, C++ and CUDA, and has fancy math

so for my next trick, im gonna try to get it published somewhere

lofty thorn Mar 27, 2024, 4:50 PM

#

final kiln https://en.wikipedia.org/wiki/Percentile -

this is too much detail

final kiln Mar 27, 2024, 4:51 PM

#

lofty thorn this is too much detail

try searching about it on Google

#

but also, there's this

final kiln Mar 27, 2024, 4:52 PM

#

lofty thorn this is too much detail

https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap/percentiles-cumulative-relative-frequency/v/calculating-percentile

#

Khan Academy is really good for getting your basics

carmine wharf Mar 27, 2024, 4:54 PM

#

Hi everybody, do you think it is worth passing the TensorFlow certification now that they are gonna end it? And more generally, do you think it's a nice certification to get, or is there a better one?

flat token Mar 27, 2024, 4:55 PM

#

hallow sphinx What math do I need for AI/ML? Anyone have a playlist for that?

You just need linear algebra and statistical learning theory, which is still just linear algebra. Tensor calculus is not relevant. Tensors, as you will learn in linear algebra, are just bilinear mappings, and they are used because in topics like RL you're mapping matrices of different rank to try to populate Q matrices and such. Either way, an undergrad degree will only teach you how to use it, not how to research it. Topics like that are learned later

lapis sequoia Mar 27, 2024, 4:56 PM

#

what plotting libraries do people use for Jupyter?

flat token Mar 27, 2024, 4:56 PM

#

Matplotlib, what else

#

Matplotlib for plotting, cv2 for image generation, etc.

final kiln Mar 27, 2024, 4:57 PM

#

flat token U just need linear algebra and statistical learning theory which is still just l...

you actually also need calculus

#

not tensor tho

jagged latch Mar 27, 2024, 4:58 PM

#

flat token Matplot for plotting cv2 for image generation etc

There's also Plotly and Shiny for dashboarding.

flat token Mar 27, 2024, 4:58 PM

#

Gradients have strict linear algebra relationships, and a derivative is nothing more than a Picard iteration, not a derivative in computer terms, so you don't. It's implied you learn it before linear algebra, but it's not necessary for utilizing packages, which is what most "AI" people do anyways

final kiln Mar 27, 2024, 4:59 PM

#

flat token Gradients have strict linear algebra relationships and a derivative is nothing m...

uhm

#

you gotta know what a derivative is

flat token Mar 27, 2024, 4:59 PM

#

As a matter of fact, SVMs in C++ don't use gradient descent at all, bc it's a slow, shitty method that is weak

final kiln Mar 27, 2024, 4:59 PM

#

that's the meaning of gradient

#

as in, gradient descent

flat token Mar 27, 2024, 5:00 PM

#

Ur computer doesn't take a derivative

final kiln Mar 27, 2024, 5:00 PM

#

flat token Ur computer doesn't take a derivative

it also doesn't see geometry

flat token Mar 27, 2024, 5:00 PM

#

When u do gradient descent, it's performing a Picard iteration

final kiln Mar 27, 2024, 5:00 PM

#

yet, it renders it

flat token Mar 27, 2024, 5:01 PM

#

Ahhh, it can take in geometrical images if you do an embedding into R^2

final kiln Mar 27, 2024, 5:01 PM

#

it actually understands a very small set of instructions, all things considering

flat token Mar 27, 2024, 5:01 PM

#

Which is also a linear algebra relationship

final kiln Mar 27, 2024, 5:01 PM

#

it understands +, -, if

flat token Mar 27, 2024, 5:01 PM

#

Because there is a topological mapping between any hashmap and R^2

#

I.e. dict{key, value} -> R^2

final kiln Mar 27, 2024, 5:02 PM

#

you need to know what a gradient is in order to understand the concept of gradient descent, or am I missing something

#

how do you understand backpropagation without knowing about the chain rule?
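To make both sides of this exchange concrete, a minimal Python sketch on a made-up one-parameter model `f(w) = (w*x - y)**2`. The gradient comes from the chain rule (outer derivative `2*(w*x - y)` times inner derivative `x`) and is checked against a central finite difference. Gradient descent is then written as repeated application of the map `g(w) = w - lr * f'(w)`, i.e. a fixed-point iteration whose fixed point is where `f'(w) = 0`; either way, the map `g` is built from the derivative.

```python
def f(w, x=2.0, y=6.0):
    """Squared error of a one-weight linear model on a single point."""
    return (w * x - y) ** 2


def grad_f(w, x=2.0, y=6.0):
    """Chain rule: d/dw (u**2) with u = w*x - y gives 2*u * du/dw = 2*u*x."""
    return 2.0 * (w * x - y) * x


def numeric_grad(func, w, h=1e-5):
    """Central finite difference, used only to check grad_f."""
    return (func(w + h) - func(w - h)) / (2 * h)


def gradient_descent(w0, lr=0.05, steps=100):
    w = w0
    for _ in range(steps):
        w = w - lr * grad_f(w)  # w_{k+1} = g(w_k) with g(w) = w - lr * f'(w)
    return w


w_star = gradient_descent(0.0)
print(grad_f(1.0), numeric_grad(f, 1.0))  # analytic vs numeric at w = 1
print(w_star)                             # approaches w = y / x = 3
```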

flat token Mar 27, 2024, 5:02 PM

#

There are many algorithms that don't use gradient descent that are much faster.

hallow sphinx Mar 27, 2024, 5:02 PM

#

flat token U just need linear algebra and statistical learning theory which is still just l...

I found a maths series on YouTube by freeCodeCamp. It teaches maths along with how to do it in Python, using libraries like SymPy etc.

Should I learn from it or should I just learn normal maths and later learn how to implement it in Python?

flat token Mar 27, 2024, 5:03 PM

#

You should buy the book or go online and pirate Linear Algebra Done Right

final kiln Mar 27, 2024, 5:03 PM

#

flat token There are many algorithms that don't use gradient descent that are much faster.

I don't think you understand your claim when you say calculus is not needed

flat token Mar 27, 2024, 5:03 PM

#

And treat it like ur bible

#

U don't need calculus to do machine learning. You can use it to do some - but no u don't need it

hallow sphinx Mar 27, 2024, 5:03 PM

#

flat token You should buy the book or go online and pirate linear algebra done right

No need to learn "Python bindings" for it rn??

final kiln Mar 27, 2024, 5:04 PM

#

flat token U don't need calculus to do machine learning. You can use it to do some - but no...

but you need linear algebra?

#

they go hand in hand imo, you need both

lapis sequoia Mar 27, 2024, 5:04 PM

#

flat token There are many algorithms that don't use gradient descent that are much faster.

I tried training my model using conjugate gradient from SciPy, and it turns out it tweaks every single parameter one by one, sets it back, and tweaks the next one

#

what else is there?
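The one-parameter-at-a-time tweaking described above is most likely finite differencing rather than something specific to conjugate gradient: when `scipy.optimize.minimize` gets no `jac`, it estimates the gradient numerically by perturbing each parameter in turn. A small sketch on a made-up convex quadratic (the call-counting helper `make_counted_objective` is illustrative only) shows that passing an analytic gradient via `jac=` reaches the same minimum with fewer objective evaluations.

```python
import numpy as np
from scipy.optimize import minimize

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])  # symmetric positive definite
b = np.ones(5)


def make_counted_objective():
    """Return f(x) = 0.5 x^T A x - b^T x plus a dict counting its calls."""
    calls = {"n": 0}

    def objective(x):
        calls["n"] += 1
        return 0.5 * x @ A @ x - b @ x

    return objective, calls


def gradient(x):
    return A @ x - b  # analytic gradient of the quadratic above


f_fd, calls_fd = make_counted_objective()
res_fd = minimize(f_fd, np.zeros(5), method="CG")  # gradient by finite differences

f_an, calls_an = make_counted_objective()
res_an = minimize(f_an, np.zeros(5), method="CG", jac=gradient)

expected = np.linalg.solve(A, b)  # exact minimizer solves A x = b
print("objective calls, finite differences:", calls_fd["n"])
print("objective calls, analytic gradient:", calls_an["n"])
```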

hallow sphinx Mar 27, 2024, 5:05 PM

#

flat token You should buy the book or go online and pirate linear algebra done right

Any book you'd recommend?

flat token Mar 27, 2024, 5:05 PM

#

Within gradient descent there are a shitton

final kiln Mar 27, 2024, 5:05 PM

#

I don't agree with your assessment

flat token Mar 27, 2024, 5:05 PM

#

Conjugate gradient descent is just one. Stochastic gradient descent, vanilla gradient descent, primal-dual conjugate gradient descent, dual gradient descent

final kiln Mar 27, 2024, 5:05 PM

#

I think calculus is a fundamental subject to study

flat token Mar 27, 2024, 5:06 PM

#

Oh don't get me wrong, it is, but it's too low-level to understand what's going on

final kiln Mar 27, 2024, 5:06 PM

#

flat token Oh don't get me wrong it is, but it's too low level too understand what's going ...

I understand that if you just wanna do high-level gluing of AI components, you don't need to understand the fundamentals

flat token Mar 27, 2024, 5:07 PM

#

Exactly my point

final kiln Mar 27, 2024, 5:07 PM

#

but that's why I always preface that it depends on what you wanna do, how deep you wanna go

flat token Mar 27, 2024, 5:07 PM

#

Undergrads glue

lapis sequoia Mar 27, 2024, 5:07 PM

#

flat token Conjugate gradient descent is just one. Stochastic gradient descent, vanilla gra...

but stochastic and vanilla descent are the normal derivative-based gradient descent

flat token Mar 27, 2024, 5:07 PM

#

Yes, you didn't see the end of what I said: I said for undergrads (my assumption was that he was an undergrad)

flat token Mar 27, 2024, 5:08 PM

#

lapis sequoia but stochastic and vanilla descents are the normal differential based gradient d...

They are different. There is only one vanilla; there are subtleties in each u cannot ignore
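Both use the same derivative. The main difference is how much data each update sees. Here is a small NumPy sketch on synthetic linear-regression data (all values made up). Vanilla (full-batch) gradient descent takes one step per pass over every sample. Minibatch SGD takes many cheaper, noisier steps per pass:

```python
import numpy as np

# Synthetic data: y = X @ true_w + small noise (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

def grad(w, Xb, yb):
    # Gradient of 0.5 * mean((Xb @ w - yb) ** 2) with respect to w
    return Xb.T @ (Xb @ w - yb) / len(yb)

def vanilla_gd(lr=0.1, epochs=200):
    w = np.zeros(3)
    for _ in range(epochs):
        w -= lr * grad(w, X, y)            # one step per full pass over the data
    return w

def sgd(lr=0.05, epochs=20, batch=32, seed=1):
    r = np.random.default_rng(seed)
    w = np.zeros(3)
    for _ in range(epochs):
        idx = r.permutation(len(y))        # reshuffle every epoch
        for start in range(0, len(y), batch):
            b = idx[start:start + batch]
            w -= lr * grad(w, X[b], y[b])  # noisy step from one minibatch
    return w

w_gd, w_sgd = vanilla_gd(), sgd()
print(w_gd, w_sgd)
```

On a problem this small both end up at essentially the same weights. The differences in step noise, learning-rate sensitivity and cost per update matter much more on large datasets.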

hallow sphinx Mar 27, 2024, 5:08 PM

#

Do I need to understand curves too?

flat token Mar 27, 2024, 5:08 PM

#

U need to go through linear algebra done right

#

From pinker to stinker

hallow sphinx Mar 27, 2024, 5:08 PM

#

alright

flat token Mar 27, 2024, 5:08 PM

#

That's ur job nothing else it's crucial

#

To u progressing

final kiln Mar 27, 2024, 5:09 PM

#

flat token U need to go through linear algebra done right

So that's my issue then, how can you say that linear algebra is needed but calculus is not

#

They're both first year subjects that always happen in the same semester

#

Cuz they're both fundamental

flat token Mar 27, 2024, 5:10 PM

#

Because not all machine learning methods need calculus, but all machine learning needs linear algebra. But u misinterpreted what I meant when I said what is needed for high lvl work in the space

#

Calculus is very fundamental and even helps with linear algebra

#

But there are numerous methods in machine learning that do not use calculus and are actually faster as a result

final kiln Mar 27, 2024, 5:10 PM

#

flat token Because not all machine learning methods need calculus but all machine learning ...

Show me an opt algo that doesn't use calculus so I see what you mean

flat token Mar 27, 2024, 5:11 PM

#

Any SVM that uses kernelization and abuses linear separability

#

And then u can utilize the same problem using nonlinear separability
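For context on "kernelization": a kernel computes an inner product in a higher-dimensional feature space without ever building that space. Below is a quick numeric check using the degree-2 polynomial kernel, chosen here for illustration. Note that fitting an SVM on top of such a kernel still means solving an optimization problem, typically a quadratic program.

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for a 2-D input (no bias term)
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    # Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly_kernel(x, z)          # computed in the original 2-D space
k_explicit = float(phi(x) @ phi(z))     # computed in the 3-D feature space
print(k_implicit, k_explicit)
```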

#

And it can be implemented in c++ as well

#

So it's way faster than Python, which is a snail language

#

I also do cutting-edge research tho, so applying these techniques is much more difficult than just abusing a package, and there is something to be said about the time it takes to create and build something not really making up for the speed improvements

final kiln Mar 27, 2024, 5:13 PM

#

flat token Any SVM that uses kernelization and abuses linear separability

Recent algorithms for finding the SVM classifier include sub-gradient descent and coordinate descent. Both techniques have proven to offer significant advantages over the traditional approach when dealing with large, sparse datasets—sub-gradient methods are especially efficient when there are many training examples, and coordinate descent when the dimension of the feature space is high.
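The sub-gradient approach from that quote fits in a few lines. This is a Pegasos-style stochastic sub-gradient update for a linear SVM without a bias term. It runs on synthetic, linearly separable data, and λ, the iteration count and the data are arbitrary choices for illustration:

```python
import numpy as np

def pegasos(X, y, lam=0.01, iters=5000, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM (no bias).

    y must contain labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, iters + 1):
        i = rng.integers(len(y))          # pick one random training sample
        eta = 1.0 / (lam * t)             # decaying step size
        margin = y[i] * (X[i] @ w)        # margin under the current weights
        w *= (1 - eta * lam)              # sub-gradient step for the L2 regulariser
        if margin < 1:                    # hinge loss active: add its sub-gradient
            w += eta * y[i] * X[i]
    return w

# Two synthetic clusters, separable by a line through the origin
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, size=(50, 2)), rng.normal(-2, 0.5, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

w = pegasos(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(w, accuracy)
```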

flat token Mar 27, 2024, 5:13 PM

#

But I can't and don't really use packages anyway, so I'm just used to doing things by hand the right way

#

Ahh yes the curse of dimensionality

final kiln Mar 27, 2024, 5:13 PM

#

Sounds like you'd be robbing yourself of a lot of tooling by not knowing calculus

flat token Mar 27, 2024, 5:14 PM

#

Well we are talking abt something very different now

#

I'm just saying: to know how to use the packages and write real machine learning software, an undergrad just needs a strong base in coding (basic and intermediate stuff), discrete structures and the like, and linear algebra to understand feature spaces

#

Now if they wanted to know the underpinnings of anything else, then ofc calculus is mandatory

final kiln Mar 27, 2024, 5:16 PM

#

flat token I'm just saying the only math an undergrad must know to know how to use the pack...

The reason why I disagree is that I believe both subjects are important to have a high level understanding of a lot of the core concepts in ML

flat token Mar 27, 2024, 5:16 PM

#

High lvl, and that's where I would totally agree, but we aren't talking high lvl

#

And calculus is not just derivatives

final kiln Mar 27, 2024, 5:16 PM

#

If we're not talking high level, then I'd argue you'd need more math

flat token Mar 27, 2024, 5:17 PM

#

There is a lot more meat and potatoes there as well and besides on a daily basis most people would never even touch partial derivatives anyway.

#

Just call a package and be on their way

final kiln Mar 27, 2024, 5:18 PM

#

You're touching one every time you train a model

#

I argue, it's best to know you're doing so, that's all

flat token Mar 27, 2024, 5:18 PM

#

Ofc it's always good to know what ur doing but as I said linear algebra done right is self contained so it will teach him the building blocks anyway

final kiln Mar 27, 2024, 5:19 PM

#

Ah that idk I don't know that book. Can't really judge

flat token Mar 27, 2024, 5:19 PM

#

It's the canonical text; I've taught from it many times and I stand by it always

#

It's always my rec because it's extensive, advanced, and self-contained, with perfect explanations from a real mathematician, not some of these apezoid linear books that are a joke

final kiln Mar 27, 2024, 5:21 PM

#

I don't recall which book I used for linear algebra, but I learned my calculus from Spivak

#

Long time ago, still remember it lol

flat token Mar 27, 2024, 5:22 PM

#

Yeah Stewart has a good one too, but I learned calculus in high school so I don't remember the specifics of the book I used

#

But Early Transcendentals by Stewart or whatever it's called is good

#

Mostly bc of the good parts on multivariable calculus included in it, which are crucial if u want to understand gradient descent at that basic lvl

#

Then obviously Pfaffenberger or Rudin to understand what ur even doing in calculus, but that's just the mathematician in me

flat token Mar 27, 2024, 5:24 PM

#

final kiln I don't recall which book I used for linear algebra, but I learned my calculus f...

Are you also a PhD?

final kiln Mar 27, 2024, 5:24 PM

#

flat token Are you also a PhD ?

No I ran away from academia

#

But was gonna enroll yea

flat token Mar 27, 2024, 5:24 PM

#

Understandable I c why people don't like it

#

I'm a PhD in applied math at UIUC, which may make it seem weird that I'd say u don't need to understand all the math, but I think undergrads mostly just want a job

#

And don't need to actually know anything

#

Just let people like me write packages for them

final kiln Mar 27, 2024, 5:27 PM

#

Yeah ig it can be easy to forget that these are hard subjects if you're young and learning stuff for the first time. To me they're like stuff I've learned on my first semester, and not even close to the level of mental punishment I had to go through afterwards

flat token Mar 27, 2024, 5:27 PM

#

Exactly u also seem like u liked it a lot and were willing to pursue it at a high lvl

#

Most people seriously don't care so when I offer guidance -> I do that first and if I get pushback then I'm like ok u wanna really learn? Then do this this and this

final kiln Mar 27, 2024, 5:28 PM

#

Yeah you gotta like it

flat token Mar 27, 2024, 5:28 PM

#

Like, I'd recommend to most people: if they really wanna do machine learning? Gotta learn dynamical systems, graph theory, algorithms and recursive structures, etc

#

Lots of probability theory too, bc most research is now in MDPs and the like

final kiln Mar 27, 2024, 5:30 PM

#

Whelp, I didn't do half of those, I did physics

#

Tho ig you can say I picked up algos during my thesis

flat token Mar 27, 2024, 5:31 PM

#

Yeh I hate physics but I'm starting to release my beef against it

final kiln Mar 27, 2024, 5:31 PM

#

I also had a ton of stats and probs cuz I did a minor in maths, I literally prefer that someone punch me before having to hear another intro to the subject >.>

final kiln Mar 27, 2024, 5:32 PM

#

flat token Yeh I hate physics but I'm starting to release my beef against jt

Mathematicians get triggered by the handwaviness in physics, and it's kinda funny

flat token Mar 27, 2024, 5:32 PM

#

Yes I do

#

It's very frustrating

#

But I try to not remember that they are different fields and in reality physics is crucial

final kiln Mar 27, 2024, 5:34 PM

#

flat token But I try to not remember that they are different fields and in reality physics ...

Physicists say the exact same thing about maths

#

I know the perfect video for this but I lost it

flat token Mar 27, 2024, 5:35 PM

#

Yeh u know mathematicians could learn something from other fields about not wasting our time so much on things that don't matter but in the same vein everyone could learn from maths that everything matters

#

That's why I do applied math and research RL and computational graph theory; both topics actually have serious real-world applications, so it's both for the love of math and actually trying to further civilization

#

Instead of just picking my butt trying to prove crazy analysis or algebraic stuff that's meaningless

#

I meant math not maths I hate it when people call it maths

#

But I'm American so xd

final kiln Mar 27, 2024, 5:40 PM

#

Honestly at some point physics becomes super similar to math, especially at the forefront of research. They're all just doing math that 90% of the time doesn't really relate to their day to day experience so it ends up being similar to just inventing more math

They say that's how the field became stale with string theory, a lot of math, no experiments. That side of things will come back once they have better hardware ig, but til then I'm not sure if they can do anything.

final kiln Mar 27, 2024, 5:41 PM

#

flat token That's why I do applied and research RL and computational graph theory both topi...

Super interesting stuff, did you do math first and then go into ML research?

flat token Mar 27, 2024, 5:43 PM

#

Well ironically despite what I said earlier which was guiding for the other person, math is ML and is critical to understanding it (assuming u want to at a high lvl)

#

So yes I did do math first I did my undergrad at NYU in mathematics

final kiln Mar 27, 2024, 5:45 PM

#

flat token Well ironically despite what I said earlier which was guiding for the other pers...

Ah when I said high level I meant it more in the computer science sense, in the "abstracting away the details" way

flat token Mar 27, 2024, 5:45 PM

#

But the ML research I do is itself a math problem so

final kiln Mar 27, 2024, 5:46 PM

#

That's funny cuz I see it as a complete analogue to physics where you get your hypothesis and model and test it against experiment

flat token Mar 27, 2024, 5:46 PM

#

Well it depends I guess on if ur just doing one stage of it or the whole thing or whatever

final kiln Mar 27, 2024, 5:47 PM

#

At least in deep learning it seems to be very experimental. You don't know, nor can you prove, anything 90% of the time; you just kinda gotta test it out

flat token Mar 27, 2024, 5:48 PM

#

Well, I wouldn't go that far. In my research, 100% of the time I prove first, then I implement, then I go back to the drawing board and try something new

#

It also speaks to how cutting edge the work being done is. Mine is all in deep reinforcement learning, so it's very cutting edge

#

But I had to do the math first and then I'll have to do more math later down the road when I'm done working with the toy problem I've been playing with

final kiln Mar 27, 2024, 5:49 PM

#

The papers I read are usually about transformers or semantic segmentation stuff

#

None of them are particularly mathy

flat token Mar 27, 2024, 5:49 PM

#

Ahh if u want some super good readings do

#

R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN (Detectron)

#

That's 4 super interesting papers right there; I'm about to give a presentation on them along with my implementations

final kiln Mar 27, 2024, 5:50 PM

#

final kiln current draft, bit more refined, still haven't decided on some details of the no...

I'm doing this stuff rn

#

I feel like it's 100x more mathy than what you'd find on the literature it will refer to

#

Which includes Attention Is All You Need, which was a super impactful study

#

But idk, I'm just following my interests

final kiln Mar 27, 2024, 5:53 PM

#

flat token Rcnn fast rcnn faster rcnn mask rcnn (detectron)

I'll look into it, I feel like I've read about it b4

flat token Mar 27, 2024, 5:53 PM

#

What these papers choose to include is also field dependent

#

Sometimes it's published also from the math perspective but u might just be looking at the comp sci perspective or something

#

Idk the specific problem tho can't know every problem

final kiln Mar 27, 2024, 5:54 PM

#

I think in that sphere at least the computer science perspective is crucial cuz it directly relates to the $ you need to train at large scales

flat token Mar 27, 2024, 5:54 PM

#

True, Detectron2 I couldn't get to work cuz my computer only has 2 GPUs

#

And u need 8 to optimally train it

final kiln Mar 27, 2024, 5:55 PM

#

o.o

#

I mean it depends

flat token Mar 27, 2024, 5:55 PM

#

I could've written better code to make it use both my CPUs, so 2 CPUs and 2 GPUs, but I didn't have time

#

So I half assed it

final kiln Mar 27, 2024, 5:55 PM

#

To replicate the Attention Is All You Need one, you also need a lot of GPUs and like a week or something nonstop

#

A good direction for research is to try to find ways to democratize all this stuff. Industry is already moving in that direction

flat token Mar 27, 2024, 5:57 PM

#

Yeh I wouldn't bet on that

#

Too much money to invest no one is going to want to fork it over for free

#

Electricity ain't free 🤷‍♂️

final kiln Mar 27, 2024, 5:57 PM

#

https://www.theverge.com/2024/3/23/24109511/stability-ai-ceo-emad-mostaque-resignation-decentralized-ai

The Verge

Stability AI CEO resigns to “pursue decentralized AI”

There was a lot of drama in the AI startup world this week.

final kiln Mar 27, 2024, 5:57 PM

#

flat token Electricity ain't free 🤷‍♂️

it's cheaper than renting a GPU on Amazon

#

there are ways to do it

flat token Mar 27, 2024, 5:58 PM

#

Wouldn't matter to me anyway I have access to Delta at UIUC so I have a massive supercomputer XD

final kiln Mar 27, 2024, 5:58 PM

#

flat token Wouldn't matter to me anyway I have access to Delta at UIUC so I have a massive ...

I keep hearing that kind of stuff xddd

flat token Mar 27, 2024, 5:58 PM

#

And my advisor headed the project so I get to work on it a lot which is lit

#

Yeh it's huge

#

And u can multi GPU and multi CPU train

final kiln Mar 27, 2024, 5:58 PM

#

I suppose that could explain the disconnect between you guys and industry needs

flat token Mar 27, 2024, 5:58 PM

#

They built it for both

#

Yeh I mean I also have access to a supercomputer in industry bc I work in quant finance

#

And they have way more money than even my school which has billions xd

final kiln Mar 27, 2024, 5:59 PM

#

by industry I would mean like, the startup scene which is arguably a crucial backbone for innovation

#

would be nice if Mistral AI didn't have to sell out to MSFT to keep doing what they do

#

or if Anthropic didn't need so much investment from the big dogs

flat token Mar 27, 2024, 6:01 PM

#

True, I don't know much abt the startup scene tho

#

I'm a big dog unfortunately

#

Or fortunately depending who u ask xd

#

Research is great and all but research don't pay bills

final kiln Mar 27, 2024, 6:02 PM

#

i dont care either way, they both have their place imo

flat token Mar 27, 2024, 6:02 PM

#

True the little guys in this space have actually a lot of impact

#

Cuz they have to force innovation with minimal resources

#

Vs just abusing supercomputers

#

But then ig when quantum computing comes out all this is gonna be irrelevant anyway

final kiln Mar 27, 2024, 6:03 PM

#

it's also harder to mobilize a company like google, or an old timey institution like a uni

flat token Mar 27, 2024, 6:03 PM

#

Yeh well universities have like tons of people doing different problems so they don't really give a shit

final kiln Mar 27, 2024, 6:03 PM

#

in this space they seem to be behind tho

#

most impactful stuff has come from companies and not unis

flat token Mar 27, 2024, 6:04 PM

#

Well big companies use our stuff

#

And at places like Nvidia those PhDs also teach

#

So it's like a combined effort if anything

#

With every1 doing a specific part of the pipeline

final kiln Mar 27, 2024, 6:05 PM

#

uhm, I suppose so, wouldn't make sense any other way anyway

#

but still, the major innovations I can recall seem to come from the free market, tho ofc the knowledge has to come from somewhere and it comes from the faculties

final kiln Mar 27, 2024, 6:58 PM

#

@flat token I found the video - https://www.youtube.com/watch?v=NzK11DrRdks - not suitable for all audiences, math majors be advised

YouTube

Ron & Math

S-Tier Acrobatics Calculus That Physicists Do

This video is inspired by an example from "Professor Stewart's Casebook of Mathematical Mysteries".

#

"as someone with a physics background, I can't resist expand anything that is expandable" 🤣

wooden sail Mar 27, 2024, 7:09 PM

#

https://en.wikipedia.org/wiki/Neumann_series just for a little completeness, cuz that video was painful lmao

Neumann series

A Neumann series is a mathematical series of the form

$\sum_{k=0}^{\infty} T^{k}$, where $T$ is an operator and $T^{k} := T^{k-1} \circ T$ its $k$ times repeated application.
This generalizes the geometric series.
The series is named after the mathematician Carl Neumann...
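The geometric-series analogy is easy to check numerically for a matrix. When the spectral radius of T is below 1, the partial sums I + T + T² + … converge to (I − T)⁻¹. The check below uses NumPy and an arbitrary 2×2 matrix whose eigenvalues are 0.5 and 0.1:

```python
import numpy as np

def neumann_inverse(T, terms=200):
    """Approximate (I - T)^{-1} by the partial sum I + T + T^2 + ... (requires spectral radius of T < 1)."""
    n = T.shape[0]
    total = np.eye(n)
    power = np.eye(n)
    for _ in range(1, terms):
        power = power @ T      # T^k, built by repeated application of T
        total += power
    return total

T = np.array([[0.2, 0.1],
              [0.3, 0.4]])
approx = neumann_inverse(T)
exact = np.linalg.inv(np.eye(2) - T)
print(approx)
print(exact)
```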

mint palm Mar 27, 2024, 7:18 PM

#

please help:
i am getting error:
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same

#

i have tried the following:

```python
        # Excerpt from the model class: this part runs in __init__, self.enc is the image encoder.
        def _convert_weights_to_fp16(l):
            if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
                l.weight.data = l.weight.data.half()
                if l.bias is not None:
                    l.bias.data = l.bias.data.half()

            if isinstance(l, nn.MultiheadAttention):
                for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
                    tensor = getattr(l, attr)
                    if tensor is not None:
                        tensor.data = tensor.data.half()

            for name in ["text_projection", "proj"]:
                if hasattr(l, name):
                    attr = getattr(l, name)
                    if attr is not None:
                        attr.data = attr.data.half()
        # .half() only changes the dtype; the weights stay on the CPU (torch.HalfTensor),
        # which is exactly the mismatch the RuntimeError reports.
        self.enc.apply(_convert_weights_to_fp16)
        
    def forward(self, x, window_of_500_patch, visualize, epoch_num, batch_idx, idx):
        
        with torch.no_grad():
            with torch.cuda.amp.autocast():
                # print(x.dtype)
                # x_f = x.float()
                #! (N, 197, 768) => pick [CLS] => (N, 768)
                # The input is created on the GPU (and cast to fp16 by autocast),
                # but self.enc was never moved to the GPU.
                out = self.enc.forward(torch.rand(2, 3, 224, 224).cuda(), output_hidden_states=True)
```
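The error names a `torch.cuda.HalfTensor` input and a `torch.HalfTensor` weight. The dtypes already match; only the device differs, so the usual fix is to move the whole module to the input's device with `.to(device)` (or `.cuda()`) before the forward pass. In the minimal sketch below, a plain `nn.Conv2d` stands in for the real encoder (an assumption), and it falls back to CPU/fp32 when no GPU is present:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 convolutions are reliably supported on the GPU; use fp32 on the CPU fallback.
dtype = torch.float16 if device == "cuda" else torch.float32

enc = nn.Conv2d(3, 8, kernel_size=3)        # stand-in for the real encoder
enc = enc.to(device=device, dtype=dtype)    # moves AND casts every parameter and buffer

x = torch.rand(2, 3, 224, 224, device=device, dtype=dtype)
with torch.no_grad():
    out = enc(x)
print(out.shape, out.dtype, out.device)
```

Once the weights are already in fp16, the `autocast` context is probably not needed on top of the manual cast, though that depends on the rest of the model.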

final kiln Mar 27, 2024, 7:30 PM

#

wooden sail https://en.wikipedia.org/wiki/Neumann_series just for a little completeness, cuz...

Honestly it can be really awful if you're studying advanced physics from books cuz they'll do these mental acrobatics and not tell you about them so you kinda have to go back and forth endlessly til you figure it out

royal crest Mar 27, 2024, 11:32 PM

#

what would be a good way to visualise a graph network? and should i do the visualisation pre or post clustering?

peak pine Mar 27, 2024, 11:55 PM

#

hey, so this is a super simple problem, but im trying to import a csv to remove the first few rows. My csv is in the same folder and the first few rows contain text and other things. I've been using this code and keep getting the error that there are no columns to parse

#

#

do you know why this isn't working
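The code itself was a screenshot, so it isn't reproduced here. For reference, `pandas.read_csv` can skip leading junk lines with `skiprows`. "No columns to parse from file" generally means pandas found no data at all, for example because the file is empty or every line was skipped. The minimal sketch below uses a made-up file held in memory:

```python
import io
import pandas as pd

# Made-up export: two lines of report text before the real header row.
raw = (
    "Quarterly export generated by some tool\n"
    "Notes: values in USD\n"
    "date,value\n"
    "2024-01-01,1.5\n"
    "2024-01-02,2.0\n"
)

# With a real file this would be pd.read_csv("your_file.csv", skiprows=2);
# skiprows=2 drops the first two lines, so "date,value" becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(df)
```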

hollow sentinel Mar 27, 2024, 11:58 PM

#

https://www.usatoday.com/story/money/markets/2021/05/02/corn-agricultural-commodities-price-increase-cornbread-corn-tortillas/4916054001/

USA TODAY

Corn tortilla prices are increasing in Mexico. Here's what American...

Tortillas and corn chips are about to get pricier. Here's why your favorite staples and snacks aren't going to be bargain-basement deals this summer.

#

further details supporting my initial hypothesis

#

i need to find more datasets though

hollow sentinel Mar 28, 2024, 12:22 AM

#

#

PMAIZMTUSDM is the global price of corn. there's a huge jump for it in 2022.

karmic trail Mar 28, 2024, 3:40 AM

#

I am very confused about how the deep Q-learning algorithm can actually work. Specifically, I am confused about how the loss function can mean anything. How can you be sure that reward + gamma * targetModel's prediction will guide you in the right direction if the target model's prediction can be completely off? Thank you!

brave cobalt Mar 28, 2024, 3:44 AM

#

can selenium access this inspect tab? im planning to get the XHRs that have "multi" in their name

raw mortar Mar 28, 2024, 4:11 AM

#

royal crest what would be a good way to visualise a graph network? and should i do the visua...

https://networkx.org/documentation/stable/reference/drawing.html
It internally uses other plotting packages.
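On the "pre or post" question, one common option is to draw after clustering. This assumes the clustering is community detection on the graph itself. Compute the clusters first, then use one fixed layout and colour the nodes by cluster. The sketch below uses networkx's modularity-based communities on the built-in karate-club graph as a stand-in for the real network:

```python
import matplotlib.pyplot as plt
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def draw_clustered(G, seed=42):
    # Cluster first, then map every node to its community index
    communities = greedy_modularity_communities(G)
    cluster_of = {n: i for i, c in enumerate(communities) for n in c}
    pos = nx.spring_layout(G, seed=seed)   # fixed seed -> reproducible layout
    fig, ax = plt.subplots(figsize=(6, 6))
    nx.draw(G, pos, ax=ax,
            node_color=[cluster_of[n] for n in G.nodes],
            cmap=plt.cm.tab10, node_size=80, with_labels=False)
    return fig, cluster_of

G = nx.karate_club_graph()
fig, cluster_of = draw_clustered(G)
```

Drawing before clustering is still handy as a sanity check on the raw structure. The coloured version is usually the one that communicates the result.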

magic dune Mar 28, 2024, 6:58 AM

#

Has anyone worked with Time series

flat token Mar 28, 2024, 7:37 AM

#

karmic trail I am very confused on how the deep Q learning algorithm can acutely work. Specif...

Deep Q-learning is verification of a Bellman equation

#

It's possible to teater out from not having enough iterations but that problem is fixed in a lot of different ways

#

For one, the scheduler modifies the learning rate as the explore and exploit phases progress, to continuously attempt to decrease the loss (which is ultimately the point)

#

Through this you build up your Q table and that completes the problem formulation

#

There are a lot more nitty-gritty details but I don't know your background. This is quite advanced and not something you can cursorily look at, learn and know, which is why you may be having some difficulty understanding the underpinnings
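On the original question of why a target built from the model's own, possibly wrong, estimate still works: the real reward term anchors the target, and correct values spread backwards from rewarding transitions as updates repeat. Tabular Q-learning shows this most clearly, so it is used here as a simpler stand-in for deep Q-learning. DQN regresses a network toward the same target r + γ·max Q(s′, a′), usually computed with a periodically synced target network. The sketch below runs on a made-up 5-state chain where moving right into state 4 pays 1 and ends the episode:

```python
import numpy as np

N_STATES, TERMINAL = 5, 4
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    # a == 1 moves right, a == 0 moves left (state 0 is a wall)
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == TERMINAL else 0.0
    return s2, r, s2 == TERMINAL

def q_learning(episodes=2000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))            # initial estimates are all "wrong" (zero)
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            a = int(rng.integers(2))       # random behaviour policy (Q-learning is off-policy)
            s2, r, done = step(s, a)
            # Bootstrapped target: real reward + discounted estimate of the next state
            target = r if done else r + GAMMA * Q[s2].max()
            Q[s, a] += ALPHA * (target - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
print(np.round(Q, 3))
```

The learned "right" values approach γ^(3−s), the true optimal values, even though every target early on was built from estimates that were zero.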

final kiln Mar 28, 2024, 8:28 AM

#

final kiln im keeping it like this, but again only way im gonna know this is right is with ...

alright, today I'm chunking away some time to code these

#

..

lapis sequoia Mar 28, 2024, 8:54 AM

#

in `nn.Module.register_full_backward_hook`, this is what the hook looks like:

`hook(module, grad_input, grad_output) -> tuple(Tensor) or None`

why are grad_input and grad_output tuples with one element?

hallow sphinx Mar 28, 2024, 8:56 AM

#

Hey, so when we train an AI, it needs to store the data. So how do AIs which play any game work? Is the data or something distributed along with the AI?

final kiln Mar 28, 2024, 9:00 AM

#

hallow sphinx Hey, so when we train an AI, it needs to store the data. So how do AIs which pla...

information is encoded in the weights

hallow sphinx Mar 28, 2024, 9:00 AM

#

weights?

#

What is that?

final kiln Mar 28, 2024, 9:02 AM

#

hallow sphinx What is that?

in deep learning, models are super complicated functions

as an example you can look at a super simple function

f(x) = mx+b

in this case, m and b are the weights, which you can change to make the function be the shape that you want

in deep learning models it's the exact same concept, but the function is super complicated
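To make the "weights" idea concrete, here is a pure-Python sketch on made-up points generated from y = 3x + 1. It starts m and b at 0 and repeatedly nudges them against the partial derivatives of the mean squared error until f(x) = mx + b fits. Training a deep model does the same thing with vastly more weights:

```python
# Made-up data generated from y = 3x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]

m, b = 0.0, 0.0          # the two weights ("knobs"), starting from arbitrary values
lr = 0.02                # step size
n = len(xs)

for _ in range(5000):
    # Partial derivatives of mean((m*x + b - y)^2) with respect to m and b
    dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
    m -= lr * dm
    b -= lr * db

print(round(m, 3), round(b, 3))  # → 3.0 1.0
```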

hallow sphinx Mar 28, 2024, 9:03 AM

#

final kiln in deeplearning models are super complicated functions as an example you can lo...

looks complicated

final kiln Mar 28, 2024, 9:03 AM

#

lapis sequoia why in `nn.Module.register_full_backward_hook`, thats what the hook is: `hook(...

potentially just for compatibility with older parts of the program, only way to know for sure is see it stated in the docs or ask one of the maintainers I think
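Another possible reason: a module's `forward` can take several tensors and return several. The hook therefore receives tuples with one entry per tensor input or output, and a single-input module like `nn.Linear` simply gets 1-tuples. The small check below uses a made-up two-input module:

```python
import torch
from torch import nn

class AddTwo(nn.Module):
    """Toy module with two tensor inputs (illustrative only)."""
    def forward(self, a, b):
        return a + b

sizes = {}

def record(module, grad_input, grad_output):
    # Store how many gradient entries arrived for the module's inputs and outputs
    sizes[type(module).__name__] = (len(grad_input), len(grad_output))

lin, add = nn.Linear(4, 3), AddTwo()
lin.register_full_backward_hook(record)
add.register_full_backward_hook(record)

x = torch.randn(2, 4, requires_grad=True)
y = torch.randn(2, 3, requires_grad=True)
add(lin(x), y).sum().backward()
print(sizes)
```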

final kiln Mar 28, 2024, 9:05 AM

#

hallow sphinx looks complicated

you can also think of it as a big machine, with millions to billions of knobs, which you can alter so that the behaviour of the machine changes, you alter these knobs until the machine does what you want, and thus, the information gets encoded in them

iron basalt Mar 28, 2024, 9:47 AM

#

hallow sphinx Hey, so when we train an AI, it needs to store the data. So how do AIs which pla...

https://www.youtube.com/watch?v=sw7UAZNgGg8

YouTube

Vsauce2

The Game That Learns

By the 1950s, science fiction was beginning to become reality: machines didn’t just calculate; they began to learn. Machine calculating was out. Machine learning was in. But we had to start small.

Donald Michie’s “Machine Educable Noughts And Crosses Engine” -- MENACE -- was composed of 304 separate matchboxes that each depicted a possible stat...

final kiln Mar 28, 2024, 10:13 AM

#

this is the first one

#

i feel like there's one too many if statements, but im leaving it cuz im never not lost in the middle of all this stuff

final kiln Mar 28, 2024, 11:25 AM

#

im almost done with this stuff i think, taking a while ngl

deep bough Mar 28, 2024, 12:09 PM

#

@final kiln hii, do you have any idea how I can connect a Flowise chatbot with WhatsApp?

#

or, say, can I integrate it with WhatsApp?
I still want to extract the user name and email from the chat, and that part is working with a custom tool

final kiln Mar 28, 2024, 12:11 PM

#

deep bough <@935270247366271027> hii you have any idea how I can connect flowise chatbot wi...

you probably can't, last I recall WhatsApp doesn't allow it

deep bough Mar 28, 2024, 12:12 PM

#

similar to botpress

final kiln Mar 28, 2024, 12:12 PM

#

you can probably do it on Telegram

deep bough Mar 28, 2024, 12:12 PM

#

with Meta developer I got a WhatsApp Business ID

#

wait lemme show something

final kiln Mar 28, 2024, 12:13 PM

#

even with that, idk, I wanted to do a similar thing so I could have ChatGPT on my WhatsApp, but wasn't able to

deep bough Mar 28, 2024, 12:16 PM

#

I can send message and receive using make.com

final kiln Mar 28, 2024, 12:17 PM

#

interesting

deep bough Mar 28, 2024, 12:18 PM

#

so is there any way I can send this user and system message in it

final kiln Mar 28, 2024, 12:18 PM

#

deep bough I can send message and receive using make.com

don't they expose an API?

deep bough Mar 28, 2024, 12:18 PM

#

final kiln dont they expose an api ?

yaa, they require a WhatsApp Business token

final kiln Mar 28, 2024, 12:19 PM

#

deep bough yaa they require whatsapp business token

then you can just use that api to fetch and send data

#

should be a pretty small Python script

deep bough Mar 28, 2024, 12:19 PM

#

deep bough so is there any way I can send this user and system message in it

but how can I fetch these user and system messages?

final kiln Mar 28, 2024, 12:20 PM

#

deep bough but How I can fetch this user and system messages

that will depend on the api of the other thing you're using

deep bough Mar 28, 2024, 12:20 PM

#

means?

final kiln Mar 28, 2024, 12:20 PM

#

it's probably described in the documentation

#

like you gotta see how API A works, how API B works, and then you glue them with a script
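Here is a rough shape such a glue script could take. Both APIs are passed in as plain functions so the sketch doesn't depend on any particular SDK: `ask_llm` stands for API B (for example a Flowise or OpenAI call) and `send_reply` for API A (for example the WhatsApp Business API). All names are hypothetical, and the email regex is a deliberately simple approximation:

```python
import re
from typing import Callable

# Simple, permissive email pattern (an approximation, not RFC-complete)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def handle_incoming(sender: str, text: str,
                    ask_llm: Callable[[str], str],
                    send_reply: Callable[[str, str], None]) -> dict:
    """Glue one incoming chat message (API A) to the LLM backend (API B)."""
    email = EMAIL_RE.search(text)          # pull out an email if the user typed one
    answer = ask_llm(text)                 # API B: get the bot's answer
    send_reply(sender, answer)             # API A: send it back to the user
    return {"sender": sender,
            "email": email.group(0) if email else None,
            "answer": answer}

# Usage with stubs in place of the real API calls:
outbox = []
result = handle_incoming(
    "15550001", "hi, my email is jane@example.com",
    ask_llm=lambda text: "Thanks! You wrote: " + text,     # stub for API B
    send_reply=lambda to, msg: outbox.append((to, msg)),   # stub for API A
)
print(result)
```

In a real deployment, `handle_incoming` would be called from whatever receives the incoming-message webhook, and the stubs would be replaced by the documented HTTP calls of each service.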

deep bough Mar 28, 2024, 12:21 PM

#

API A you mean the WhatsApp API, and API B

#

??

final kiln Mar 28, 2024, 12:22 PM

#

deep bough so is there any way I can send this user and system message in it

API B means whatever you're using to interact with the language model, be it the OpenAI API directly, or whatever this is

deep bough Mar 28, 2024, 12:22 PM

#

right now I am passing messages directly

#

there is no API B

final kiln Mar 28, 2024, 12:23 PM

#

you gotta find one

deep bough Mar 28, 2024, 12:23 PM

#

😭 yess

final kiln Mar 28, 2024, 12:23 PM

#

likely in the docs

deep bough Mar 28, 2024, 12:23 PM

#

flowise is such a bitch

#

🙂

final kiln Mar 28, 2024, 12:24 PM

#

why are you using it

#

you could use OpenAI directly

deep bough Mar 28, 2024, 12:24 PM

#

client requirement

#

it could easily be done with a plain Python script

#

like, I just have to make a chatbot using an LLM

final kiln Mar 28, 2024, 12:24 PM

#

maybe you can make it as thin as possible and do the rest in py

deep bough Mar 28, 2024, 12:25 PM

#

okk thanks for helping Ill try

lapis sequoia Mar 28, 2024, 2:41 PM

#

I came across a Reddit post saying matplotlib sucks, and so many people agreed

#

but what else do I use???

#

I tried a whole bunch of libs; they keep deleting my plots after i save my notebook and close and reopen it

#

i do agree matplotlib is annoying last time all i needed was major and minor ticks for different y limits and it was so painful

#

also, I am looking for some plot that I can update in place in a Jupyter notebook, like the fastprogress plot. Is there anything like that?

final kiln Mar 28, 2024, 2:43 PM

#

haven't found a good direct alternative tbh, but yeah the API for matplotlib could be better for sure

#

there's plotly

#

I think this is one of those tasks where you just ask gpt for some boilerplate code and modify it to your needs

serene scaffold Mar 28, 2024, 2:46 PM

#

final kiln haven't found a good direct alternative tbh, but yeah the API for matplotlib cou...

it's hard--and frankly terrifying--to imagine how it could be worse.

final kiln Mar 28, 2024, 2:48 PM

#

I have like, two or three functions I use from it, plot, scatter, figure and subplot

#

never dared to go farther

#

oh and hist

hollow sentinel Mar 28, 2024, 2:51 PM

#

i use seaborn

#

which is built... on top of matplotlib

lapis sequoia Mar 28, 2024, 2:52 PM

#

I've made a class that I can say class.plot(x) or class.imshow(x) any amount of times and then when I say class.show() it creates a figure with all the things I added to it and I dont have to deal with figures and axes
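A class like that can be sketched in a few lines. The version below is hypothetical: `DeferredPlot` is not the poster's actual code, and it assumes each queued call gets its own subplot. Calls are queued, and `show()` builds one grid of subplots at the end, so the caller never handles figures or axes directly:

```python
import matplotlib.pyplot as plt

class DeferredPlot:
    """Hypothetical helper: queue plot/imshow calls, build one figure at the end."""

    def __init__(self):
        self._items = []                    # (method name, args, kwargs) per call

    def plot(self, *args, **kwargs):
        self._items.append(("plot", args, kwargs))
        return self                         # allow chaining: dp.plot(a).plot(b)

    def imshow(self, *args, **kwargs):
        self._items.append(("imshow", args, kwargs))
        return self

    def show(self, ncols=3):
        n = len(self._items)
        nrows = max(1, -(-n // ncols))      # ceiling division
        fig, axes = plt.subplots(nrows, ncols, squeeze=False,
                                 figsize=(4 * ncols, 3 * nrows))
        flat = axes.ravel()
        for ax, (kind, args, kwargs) in zip(flat, self._items):
            getattr(ax, kind)(*args, **kwargs)   # replay the queued call on its own axes
        for ax in flat[n:]:
            ax.axis("off")                  # hide unused grid cells
        plt.show()
        return fig
```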

#

but the giant drawback is still that it's not interactive: I need to plot 20 lines and I have no idea which is which from the legend

#

I liked Bokeh (from little usage) but the plots get deleted after reloading the notebook

hallow sphinx Mar 28, 2024, 2:58 PM

#

Why use conda over pip

lapis sequoia Mar 28, 2024, 3:00 PM

#

hallow sphinx Why use conda over pip

it's the only Python environment manager that I was able to install and figure out

lapis sequoia Mar 28, 2024, 3:03 PM

#

hallow sphinx Why use conda over pip

also when u install pytorch it seems to be better at installing CUDA

hallow sphinx Mar 28, 2024, 3:04 PM

#

I was watching an AI and data science course and they recommended installing conda

final kiln Mar 28, 2024, 3:07 PM

#

I used to use it a lot before having my life dominated by docker

lapis sequoia Mar 28, 2024, 3:08 PM

#

hallow sphinx I was watching AI and data science course and they recommended to install conda

conda is considered to be "for datascience" but idk why

final kiln Mar 28, 2024, 3:08 PM

#

Anaconda comes with all the scientific packages

lapis sequoia Mar 28, 2024, 3:09 PM

#

yeah but they could make pip come with all of it

hallow sphinx Mar 28, 2024, 3:09 PM

#

yea

final kiln Mar 28, 2024, 3:09 PM

#

Not really, python is a general purpose language, AI is one of a hundred uses

lapis sequoia Mar 28, 2024, 3:09 PM

#

they could make anapip for AI users

#

the conda part I don't think is specifically for AI

final kiln Mar 28, 2024, 3:10 PM

#

I don't wanna argue against my own self interest here, but idk if we'll ever get special treatment like that

lapis sequoia Mar 28, 2024, 3:10 PM

#

just a nice environment manager, and it can install some packages without breaking dependencies, unlike pip

final kiln Mar 28, 2024, 3:11 PM

#

lapis sequoia just a nice environment manager and can istall some packages without breaking de...

Yeah I see what you mean. Python does have something like that built in, with venv

#

https://docs.python.org/3/library/venv.html

Python documentation

venv — Creation of virtual environments

Source code: Lib/venv/ The venv module supports creating lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories. A virtual en...

#

But I've never used it a lot cuz I used conda or docker
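For comparison, `venv` ships with Python itself as a standard-library module; it is not part of pip. The usual entry point is `python -m venv .venv`, and the same thing is available from code. The minimal sketch below creates a throwaway environment in a temporary folder:

```python
import os
import tempfile
import venv

# Create a throwaway environment in a temp folder (location is illustrative).
env_dir = os.path.join(tempfile.mkdtemp(), ".venv")
venv.create(env_dir, with_pip=False)   # with_pip=True would also bootstrap pip via ensurepip
print(sorted(os.listdir(env_dir)))     # pyvenv.cfg plus the bin/ (or Scripts/) and lib folders
```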

#

There's also this one https://python-poetry.org/

Poetry - Python dependency management and packaging made easy

Python dependency management and packaging made easy

#

Which is the yarn/npm analogue of py

#

Arguably better than conda, but depends on your tastes

abstract rune Mar 28, 2024, 4:51 PM

#

I want to clarify a doubt regarding linear regression.
I came across 2 ways to solve the problem:

1. gradient descent
2. a mathematical equation: $w = (XX^T)^{-1}(XY)$

is this correct and are both used in real life?

#

is the second method called OLS?

orchid kayak Mar 28, 2024, 5:06 PM

#

Hey there, not sure if this is the right place to ask, but I'm facing an issue and wondering if anyone has run into this before
I'm managing an air-gapped environment where part of my job is making Jupyter notebooks accessible to data scientists. We mostly do this through JupyterLab images on K8s, but we also provide the ability to work with VS Code

We configured an image that has vscode-server, so they can SSH into the remote container and leverage robust hardware while conveniently working from VS Code. But we only considered people working with regular Python files in VS Code

Some clients requested the ability to work with Jupyter notebooks in VS Code over remote SSH. We figured it'd be a simple case of installing the Microsoft Jupyter extension for VS Code, installing IPython, ipykernel and jupyter on our images, and installing the Python extension.

However, for some reason, the Jupyter extension doesn't detect any Jupyter kernels. It doesn't even detect that Python is installed, which is the weirdest part because it clearly is: I can run Python code with the Python extension.

Does anyone have an idea what the problem is? I am using VS Code 1.82.2

wooden sail Mar 28, 2024, 5:43 PM

#

abstract rune is the second method called OLS?

yes and yes
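A minimal NumPy sketch of both methods fitting the same synthetic data (the data, learning rate, and iteration count are illustrative assumptions). The closed form is computed by solving the linear system X^T X w = X^T y rather than explicitly inverting X^T X, which is the numerically safer habit; gradient descent minimizes the mean squared error iteratively and lands on the same coefficients:

```python
import numpy as np


def ols_normal_equation(X, y):
    # Solve (X^T X) w = X^T y; solving is more stable than forming the inverse
    return np.linalg.solve(X.T @ X, X.T @ y)


def ols_gradient_descent(X, y, lr=0.1, n_iters=2000):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iters):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept column + 2 features
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

w_closed = ols_normal_equation(X, y)
w_gd = ols_gradient_descent(X, y)
```

Roughly, the closed form is convenient when the number of features is modest; iterative solvers (gradient descent and its stochastic variants) become attractive when the data is too large to form or factor X^T X comfortably.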

toxic mortar Mar 28, 2024, 6:41 PM

#

I want to semantically group information from independent documents that share the same context. For example, if there are 50 hedge fund reports, the ideal output is "two advisers predict that stock X will increase while one predicts a decrease", etc...

I am pretty new to this, so I tried embedding and clustering, which gave somewhat poor results but set me on the path of exploring more in that direction.

Recently, I've found out about BERTopic and topic modelling. I think this is huge and that I am closer than ever to solving this. My BERTopic stack looks like:

embedding_model: all-MiniLM-L6-v2
representation_model:  [KeyBERTInspired(top_n_words=25), MaximalMarginalRelevance(diversity=0.4)]
vectorizer_model: CountVectorizer(stop_words="english")

I want to either:
A) Parse every document and run fit and transform on it separately, so I have a list of every document's topics. Then, for documents connected by topic, use something like ChatGPT to find the specific overlaps.
B) Run all the documents together as a knowledge base to see mutual topics and, based on the output, search for relevant parts in the documents.

Bonus question:

Should I split documents into semantically grouped parts, or should I keep one element per document?

Thanks
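On the bonus question, a common pattern for option B is to split each report into smaller chunks (paragraphs here), remember which report each chunk came from, and then look for topics that show up in more than one report. A minimal sketch of that bookkeeping, assuming blank-line-separated paragraphs; `split_into_chunks` and `topics_shared_across_docs` are hypothetical helpers, and the topic ids would come from BERTopic's `fit_transform`, which labels outliers as -1:

```python
import re
from collections import defaultdict


def split_into_chunks(docs, min_chars=40):
    """Split each document on blank lines; return the chunks and the source doc index of each."""
    chunks, doc_ids = [], []
    for doc_id, text in enumerate(docs):
        for para in re.split(r"\n\s*\n", text):
            para = para.strip()
            if len(para) >= min_chars:  # drop headers, page numbers and other tiny fragments
                chunks.append(para)
                doc_ids.append(doc_id)
    return chunks, doc_ids


def topics_shared_across_docs(topics, doc_ids):
    """Map each topic id to the sorted docs it appears in, keeping only topics seen in 2+ docs."""
    docs_by_topic = defaultdict(set)
    for topic, doc_id in zip(topics, doc_ids):
        if topic != -1:  # -1 is BERTopic's outlier topic
            docs_by_topic[topic].add(doc_id)
    return {t: sorted(d) for t, d in docs_by_topic.items() if len(d) > 1}


# Usage with BERTopic (not run here):
# chunks, doc_ids = split_into_chunks(reports)
# topics, _ = topic_model.fit_transform(chunks)
# shared = topics_shared_across_docs(topics, doc_ids)
```

The chunks behind each shared topic are then a much smaller, focused input for an LLM to compare (e.g. who is bullish vs bearish on the same stock).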

abstract rune Mar 28, 2024, 6:46 PM

#

wooden sail yes and yes

Thanks!

past meteor Mar 28, 2024, 9:17 PM

#

Yaay, I won the hackathon I spoke about recently

versed pilot Mar 28, 2024, 9:22 PM

#

lapis sequoia I came across reddit post saying matplotlib sucks and so many people agreed

Choose the libraries that work for you, whatever they say on Reddit. Matplotlib is behind a lot of other stuff (pandas plot, seaborn, etc.), so it's worth learning a bit about it even if you don't use it directly
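A small illustration of that point: `DataFrame.plot` draws with matplotlib and returns a matplotlib `Axes`, so ordinary matplotlib calls customize the result. The data is made up for the example:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 4, 9, 16]})

ax = df.plot(x="x", y="y", marker="o", legend=False)  # pandas plots via matplotlib
ax.set_title("Squares")  # plain matplotlib methods on the returned Axes
ax.set_ylabel("x squared")
ax.grid(True)
fig = ax.get_figure()
plt.close(fig)  # free the figure once done
```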

long canopy Mar 28, 2024, 9:29 PM

#

anyone got any news on distributed inference?

lapis sequoia Mar 28, 2024, 10:44 PM

#

give me ML ideas

flat token Mar 28, 2024, 11:05 PM

#

lapis sequoia conda is considered to be "for data science" but idk why

R is the "data science" language. Most other languages can do many things

serene scaffold Mar 29, 2024, 12:22 AM

#

hallow sphinx Why use conda over pip

in 2024, you can uninstall conda and forget you ever had it.

#

I've been doing ML since 2018 or so, and I've never had conda installed.

agile owl Mar 29, 2024, 1:29 AM

#

Does anyone here have experience designing machine learning pipelines using model parallelism, so that you can effectively have models bigger than one GPU?

#

I am wondering if there are any resources anyone recommends on this topic

floral pine Mar 29, 2024, 3:52 AM

#

lapis sequoia but what else do I use???

try plotly, it's a good visualization library

hallow sphinx Mar 29, 2024, 4:40 AM

#

serene scaffold in 2024, you can uninstall conda and forget you ever had it.

So conda is just a CPython distribution that contains preinstalled packages and tools for data science?

high crow Mar 29, 2024, 5:00 AM

#

hello, im not sure where to post this, so please let me know if there is some other place I should post this but I needed some help understanding this. I get the idea but I don't understand how to do it. How do I for example use the bigram model in this instance?

wooden sail Mar 29, 2024, 5:14 AM

#

i still use conda, it's an easy way of managing python in environments without any permissions. mamba is pretty good

kind loom Mar 29, 2024, 5:31 AM

#

hello guys
so I wrote my BSc final exams a few days ago, and my project defence is coming up in a month. I just want to say I am officially unemployed😂.
I am a Data Scientist and Machine Learning Engineer, been programming since 2021. I am currently exploring NLP and I am working on a TextSentimentAnalysis project, hoping to eventually build Customer Review Analysis software.

I am readily available to take on any role in the Data field. So guys please hit me up lemon_fingerguns_shades

high crow Mar 29, 2024, 6:36 AM

#

sure 👊 😏 👊

woeful breach Mar 29, 2024, 7:05 AM

#

hey, I need some help with data preprocessing

#

anyone free to lend a hand and teach me?

mortal pumice Mar 29, 2024, 9:50 AM

#

hello! I'm writing data processing code that heavily uses the pandas library, and it seems kinda slow. I have no idea how I can optimize it, but maybe someone here can help. Can I post my code here?

past meteor Mar 29, 2024, 9:51 AM

#

mortal pumice hello! I'm writing a data processing code that heavily use pandas library and it...

sure, send it

past meteor Mar 29, 2024, 9:52 AM

#

kind loom hello guys so I wrote my BSc final exams few days ago and my Project defence wou...

Sorry, we don't do look-for-hires but if you want general tips you can check out #career-advice

mortal pumice Mar 29, 2024, 9:57 AM

#

Hope you guys can find something to optimize. 🙂
Here is the main loop of my program:

import pandas as pd
from strategies.Strategy import Strategy


def strategyLoop(df: pd.DataFrame, strategy: Strategy, longTermMAPeriod: int = 200, pipValue: float = 50.0, capital: float = 1000) -> pd.DataFrame:

    CAPITAL = capital  # account balance in $
    inPosition = False
    entryPrice, sl, tp = 0, 0, 0
    slInPips, tpInPips = 0, 0
    lot_size = 0.01  # micro lot
    entryDate = df["datetime"].iloc[0]
    tradesData = []

    # Loop over positions rather than index labels, so that .iloc[i] is correct
    # even if df does not have the default 0..n-1 RangeIndex
    for i in range(longTermMAPeriod + strategy.N, len(df)):

        currentPrice = df["close"].iloc[i]

        if not inPosition:
            inPosition, slInPips, tpInPips, entryPrice, entryDate = strategy.checkIfCanEnterPosition(df, i, CAPITAL)
        else:
            # a return value of 0 from updateSl means "keep the current stop loss"
            newSlInPips = strategy.updateSl(currentPrice, entryPrice, tpInPips)
            if newSlInPips != 0:
                slInPips = newSlInPips
            sl, tp = entryPrice + slInPips, entryPrice + tpInPips
            lose = currentPrice <= sl
            win = tp <= currentPrice
            if lose or win:
                profit = tpInPips * pipValue * lot_size if win else slInPips * pipValue * lot_size
                #print(f"profit {profit}, tpInPips: {tpInPips}, slInPips: {slInPips}")
                CAPITAL += profit
                tradesData.append({
                    "entry_date": entryDate,
                    "exit_date": df["datetime"].iloc[i],
                    "entry_price": entryPrice,
                    "stop_loss": sl,
                    "take_profit": tp,
                    "profit": profit,
                    "capital_after_trade": CAPITAL
                })
                inPosition = False

    return pd.DataFrame(tradesData)

#

and here is a function used in the previous code:


    # Method of the Strategy class; `utility` is the author's own helper module
    # (its import isn't shown in the post)
    def checkIfCanEnterPosition(self, df: pd.DataFrame, i: int, capital: float) -> tuple[bool, float, float, float, str]:
        inPosition, slInPips, tpInPips, entryPrice, entryDate = False, 0, 0, 0, ""

        allowedToTrade = True

        if self.uselongTermMA:
            # only trade when the long-term MA is below the HA open
            allowedToTrade = df["longTermMA"].iloc[i] < df["HA open"].iloc[i]

        if allowedToTrade:
            zoneHalfWidth = (df["close"].iloc[i] / 100) * self.percentZoneFromMA  # percentZoneFromMA % of the price
            shortTermMAZoneMin = df["shortTermMA"].iloc[i] - zoneHalfWidth  # => MA - 3% of the price
            shortTermMAZoneMax = df["shortTermMA"].iloc[i] + zoneHalfWidth  # => MA + 3% of the price

            # did any of the previous N HA closes fall inside the zone around the short-term MA?
            isLastNCandlesInshortTermMAZone = False
            for j in range(i - self.N, i):
                if utility.between(df["HA close"].iloc[j], shortTermMAZoneMin, shortTermMAZoneMax):
                    isLastNCandlesInshortTermMAZone = True
                    break

            if df["shortTermMA"].iloc[i] < df["HA open"].iloc[i] and df["HA color"].iloc[i] == "green" and isLastNCandlesInshortTermMAZone:
                entryDate = df["datetime"].iloc[i]
                entryPrice = df["close"].iloc[i]
                if self.useSR:
                    # stop loss / take profit derived from the S/R key levels
                    isBelowMiddleSR, slInPips, tpInPips = self.determineSlAndTp(capital, entryPrice, self.keyLevels)
                    inPosition = isBelowMiddleSR

                else:
                    slInPips = -utility.getSlInPipsForTrade(
                        invested=capital * self.maxRisk,
                        pipValue=50,  # pip value for the S&P 500 for a standard lot = 50
                        lotSize=0.01  # micro lot
                    )
                    inPosition = True
                    tpInPips = -slInPips  # take profit mirrors the stop loss (1:1)

        return inPosition, slInPips, tpInPips, entryPrice, entryDate

past meteor Mar 29, 2024, 10:01 AM

#

mortal pumice Hope you guys can find something to optimize. 🙂 Here is the main loop of my pro...

Yup, that's what I suspected.

#

You're typically not supposed to loop over data frames

mortal pumice Mar 29, 2024, 10:04 AM

#

past meteor You're typically not supposed to loop over data frames

Yeah but I don't know how to do other way :/

past meteor Mar 29, 2024, 10:05 AM

#

mortal pumice Yeah but I don't know how to do other way :/

How well do you know Pandas yet?

#

There's no wrong answers, I just want to see how best I can help you 😄

mortal pumice Mar 29, 2024, 10:06 AM

#

past meteor How well do you know Pandas yet?

I know you can do queries like in a database, and you can use the apply function, which could maybe improve the speed

#

I used pandas because DataFrames are a very "readable" data structure and easy to use, with a lot of functions, but maybe I should use another data structure to store the data?

past meteor Mar 29, 2024, 10:37 AM

#

mortal pumice I know you can do queries like in a database and you can use apply function that...

So, I'd advise you to learn what functions pandas has to offer, because you can replace your "imperative" code with pandas methods that do the same thing in one line, without writing loops. Looping over DataFrames is what makes it slow

mortal pumice Mar 29, 2024, 11:18 AM

#

past meteor So, I 'd advise you to learn what functions Pandas has to offer because you can ...

Ok I'll try to learn more about pandas functions
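As an example of what that can look like, here is a sketch that computes the entry condition from `checkIfCanEnterPosition` (the non-S/R path) for every row at once instead of row by row. It reuses the column names from the posted code and assumes `utility.between` is inclusive on both ends; `entry_signals` is a hypothetical helper, not part of the original strategy. Row-wise `apply` would not help much here, since it is still a Python-level loop.

```python
import pandas as pd


def entry_signals(df: pd.DataFrame, n: int, percent_zone: float, use_long_term_ma: bool = True) -> pd.Series:
    """Boolean Series: True where the entry condition holds (hypothetical vectorized helper)."""
    half_width = df["close"] / 100 * percent_zone
    zone_min = df["shortTermMA"] - half_width
    zone_max = df["shortTermMA"] + half_width

    # True if any of the previous n HA closes (rows i-n .. i-1) lies inside row i's zone.
    # This is n vectorized comparisons instead of a Python loop over every row.
    in_zone = pd.Series(False, index=df.index)
    for k in range(1, n + 1):
        in_zone |= df["HA close"].shift(k).between(zone_min, zone_max)  # NaN from shift compares False

    signal = in_zone & (df["shortTermMA"] < df["HA open"]) & (df["HA color"] == "green")
    if use_long_term_ma:
        signal &= df["longTermMA"] < df["HA open"]
    return signal
```

The trade-by-trade part (entries, trailing stop, exits) depends on the previous state, so it still needs a loop, but while out of a position that loop only has to look at rows where the signal is True, and it can read prices from `df["close"].to_numpy()` instead of calling `.iloc` on every row.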

long canopy Mar 29, 2024, 11:51 AM

#

anyone done some distributed inference yet? am messing about with PiPPy atm

long canopy Mar 29, 2024, 12:23 PM

#

what is pytorch doing with my threads gahhh

#

so much abstraction i can't see anything

serene scaffold Mar 29, 2024, 2:05 PM

#

hallow sphinx So conda is just a Cpython distribution that contains pre installed packages and...

It's more than that. But in 99.9% of circumstances, you don't need it. You can just use python normally and pip install everything you need.

final kiln Mar 29, 2024, 2:31 PM

#

I do all my development in a cloud computer running a docker container with all the dependencies.

....and sometimes the cloud computer is my own laptop. It makes a lot more sense than it sounds once you actually use it

#

Like, I can onboard a dev in about 20-30min or less. No one else has to install any packages or worry about env

agile owl Mar 29, 2024, 2:45 PM

#

serene scaffold It's more than that. But in 99.9% of circumstances, you don't need it. You can j...

it's a convenient way to spawn new python environments too

serene scaffold Mar 29, 2024, 2:45 PM

#

agile owl it's a convenient way to spawn new python environments too

You can do that with normal python

agile owl Mar 29, 2024, 2:45 PM

#

not if you need to switch python versions

#

which will occasionally be the case with ML and numerical libraries and finding what works

#

also I understand in the latest versions of conda they are switching to the mamba solver if I'm not mistaken

#

so the main reason it was kind of unenjoyable before should be going away

final kiln Mar 29, 2024, 2:48 PM

#

Can't you do it with pyenv?

agile owl Mar 29, 2024, 2:48 PM

#

idk what that is and why people keep pushing it

#

conda has its own repos too

#

which is good for business usage

final kiln Mar 29, 2024, 2:48 PM

#

I'm not pushing it, I'm saying that I think you can manage py versions with pyenv

agile owl Mar 29, 2024, 2:48 PM

#

A lot of people recommend it but I've never heard of it anywhere but this server

final kiln Mar 29, 2024, 2:48 PM

#

I favour docker over all of this personally

final kiln Mar 29, 2024, 2:49 PM

#

agile owl A lot of people recommend it but I've never heard of it anywhere but this server

I think poetry uses it, might've been how I found out about it

#

Not too sure tho

agile owl Mar 29, 2024, 2:50 PM

#

in my professional life however, I've encountered conda independently several times

#

does that really mean anything? idk maybe

final kiln Mar 29, 2024, 2:50 PM

#

Same, but I've encountered all of them I think

#

Tho docker is everywhere; I haven't been in any project where docker isn't used

#

In some way shape or form, there's always been docker

agile owl Mar 29, 2024, 2:52 PM

#

I use docker, but only when I'm getting ready to make my stuff portable. I think running everything out of docker from the beginning of development sounds like a pain in the ass

final kiln Mar 29, 2024, 2:53 PM

#

Sir, you'd be 100% correct

#

But - after spending the time and getting it right for the first time, it has been a breeze. Never going back.

agile cobalt Mar 29, 2024, 2:54 PM

#

agile owl not if you need to switch python versions

py -m venv .venv, py -3.11 -m venv .venv? assuming you have the version you want installed

final kiln Mar 29, 2024, 2:54 PM

#

agile owl I use docker but just when I'm getting ready to make my stuff portable I think r...

Like, there's a way to do it in which it is not troublesome. It just took me a long time to fine tune it.

agile owl Mar 29, 2024, 2:54 PM

#

isn't py some weird program that only exists on windows?

#

I've never used that either I don't know what it is and only heard of it here

agile cobalt Mar 29, 2024, 2:54 PM

#

you can do the same thing with python3 in linux as far as I know

agile owl Mar 29, 2024, 2:55 PM

#

yeah but you need to then manually install python versions

final kiln Mar 29, 2024, 2:55 PM

#

On that note, why do we have so many names

agile owl Mar 29, 2024, 2:55 PM

#

which is the entire point of what I was getting at

#

don't manage python version installations yourself

#

just treat python like a package

#

with a tool that can pull it

agile cobalt Mar 29, 2024, 2:56 PM

#

you still need to keep track of which versions you are using; pretending that it works like magic is just sweeping problems under the carpet

final kiln Mar 29, 2024, 2:57 PM

#

No, I think he means like, you can have several 3.11 installations done via conda, like it provides the API so you don't have to do it

agile cobalt Mar 29, 2024, 2:57 PM

#

agile owl with a tool that can pull it

I guess I can understand that much, but I still don't consider it enough of an upside to use conda

also, technically you can install python via the command line? definitely overly complicated territory though

agile owl Mar 29, 2024, 3:01 PM

#

I understand that there is actually a way to have models bigger than a single GPU's VRAM if you pipeline the model into different pieces, where each piece fits on a single GPU

#

I want to learn how to do this

#

does anyone have any resources on how this sort of thing can be done
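The basic idea can be sketched in PyTorch: put the first group of layers on one device, the rest on another, and move the activations between them in `forward`. This is plain (naive) model parallelism, so only one GPU is busy at a time; pipeline approaches such as GPipe, or the PiPPy project mentioned earlier in the channel, additionally split each batch into micro-batches so the stages can overlap. The layer sizes and the placeholder loss are arbitrary, and the sketch falls back to CPU when fewer than two GPUs are available:

```python
import torch
import torch.nn as nn


class TwoStageModel(nn.Module):
    """Naive model parallelism: stage0 lives on dev0, stage1 on dev1."""

    def __init__(self, dev0: torch.device, dev1: torch.device):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU()).to(dev0)
        self.stage1 = nn.Sequential(nn.Linear(64, 10)).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.stage0(x.to(self.dev0))
        return self.stage1(h.to(self.dev1))  # activations are copied between devices here


if torch.cuda.device_count() >= 2:
    devices = (torch.device("cuda:0"), torch.device("cuda:1"))
else:
    devices = (torch.device("cpu"), torch.device("cpu"))  # fallback so the sketch runs anywhere

model = TwoStageModel(*devices)
out = model(torch.randn(8, 32))
loss = out.pow(2).mean()  # placeholder loss for the example
loss.backward()  # autograd sends gradients back across the device boundary
```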

toxic mortar Mar 29, 2024, 3:03 PM

#

How do you decide how to fill NaNs in ur dataset?

#

always mean?

final kiln Mar 29, 2024, 3:03 PM

#

agile owl I understand that there is actually a way to have models bigger than a single GP...

Interesting problem, haven't thought about it myself.

The only technique I know that is in that vein is gradient accumulation, which is pretty simple to implement and quite effective
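For reference, gradient accumulation runs several small micro-batches, lets their gradients add up in the `.grad` buffers, and only then takes an optimizer step, trading time for activation memory. It does not shrink the model's parameters or optimizer state, so on its own it won't fit a model that is larger than one GPU. A minimal PyTorch sketch (toy linear model and random data, both illustrative) that checks the accumulated gradient matches a single full-batch pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()  # mean reduction
X, y = torch.randn(32, 16), torch.randn(32, 1)


def accumulated_grads(model, X, y, micro_batch):
    """Accumulate gradients over micro-batches; weights make the sum equal the full-batch mean loss."""
    model.zero_grad()
    for xb, yb in zip(X.split(micro_batch), y.split(micro_batch)):
        loss = loss_fn(model(xb), yb) * (len(xb) / len(X))
        loss.backward()  # .grad buffers add up across calls until zero_grad()
    return [p.grad.clone() for p in model.parameters()]


def full_batch_grads(model, X, y):
    model.zero_grad()
    loss_fn(model(X), y).backward()
    return [p.grad.clone() for p in model.parameters()]


acc = accumulated_grads(model, X, y, micro_batch=8)
full = full_batch_grads(model, X, y)
# In a training loop: call optimizer.step() and optimizer.zero_grad() after each accumulation round.
```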

final kiln Mar 29, 2024, 3:04 PM

#

toxic mortar How do you decide how to fill NaNs in ur dataset?

Ig the first question I'd ask is why they are there: do they mean anything, is it missing data, and if so, can I get away with removing data points that contain NaN, etc.

agile owl Mar 29, 2024, 3:05 PM

#

toxic mortar How do you decide how to fill NaNs in ur dataset?

there is in principle a different approach for each dataset and each model

#

interpretation of missing data can be different depending on the context

#

in some cases you might want to impute it using something like KNN

#

in some cases you know what it means and should encode a dummy variable

toxic mortar Mar 29, 2024, 3:06 PM

#

From a dataset with [15716 rows x 16 columns], the number of rows without NaN in any column is 2955

#

That's <20%

#

Considerable amount of NaNs

final kiln Mar 29, 2024, 3:06 PM

#

Can you get away with removing the columns that contain NaNs?

toxic mortar Mar 29, 2024, 3:06 PM

#

Missing value NaNs

agile owl Mar 29, 2024, 3:06 PM

#

I bet it's a subset of columns that are most often missing

final kiln Mar 29, 2024, 3:07 PM

#

I mean, how much info can it hold if it's mostly NaNs, right?

agile owl Mar 29, 2024, 3:07 PM

#

if it's just randomly missing data, I think imputing it using KNN is actually a good idea

#

but if it's a few columns then you might just want to drop them

toxic mortar Mar 29, 2024, 3:07 PM

#

Distribution of missing values is like this:

Date - 0 NaNs
Issuer - 0 NaNs
Symbol - 0 NaNs
Exchange - 0 NaNs
Amount - 248 NaNs
Security - 4 NaNs
Performance_1Qtr_After_Deal - 1087 NaNs
Performance_1Yr_After_Deal - 938 NaNs
Performance_to_Current - 0 NaNs
Market Cap - 652 NaNs
Forward P/E - 4841 NaNs
PEG Ratio - 11353 NaNs
Price/Sales - 4184 NaNs
ROE - 3414 NaNs
Debt-to-Equity Ratio - 5152 NaNs
Net Income - 3376 NaNs
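For reference, per-column counts like these and the "rows without NaN in any column" figure can be produced with pandas. A minimal sketch on a made-up frame (the column names are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "Forward P/E": [12.5, np.nan, 30.1, np.nan],
    "ROE": [0.15, 0.08, np.nan, 0.22],
})

nan_per_column = df.isna().sum()          # NaN count for each column
nan_share = df.isna().mean()              # fraction of NaNs per column
complete_rows = len(df.dropna())          # rows with no NaN in any column
complete_share = complete_rows / len(df)  # 2955 / 15716 ≈ 0.19 for the dataset discussed here
```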

#

what do you mean, maru? Are you suggesting using clustering to predict a value range that I can plug into the missing data?

#

for each NaN column?

agile owl Mar 29, 2024, 3:09 PM

#

there are things built into sklearn to do this

final kiln Mar 29, 2024, 3:09 PM

#

toxic mortar Distribution of missing values is like this: ``` Date - 0 NaNs Issuer - 0 NaNs S...

Idk actually, seems evenly distributed almost

agile owl Mar 29, 2024, 3:09 PM

#

i forgot what it's called exactly

#

but with those you probably can't do it

#

because just from my personal knowledge
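The scikit-learn tools being referred to are probably the imputers in `sklearn.impute`, for example `KNNImputer`, which fills each missing value with the mean of that feature over the nearest rows (distances use only the features both rows have); `add_indicator=True` also appends a was-missing flag for each column that had NaNs during fit. A minimal sketch on made-up numbers; whether imputation is appropriate at all still depends on why the values are missing:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

imputer = KNNImputer(n_neighbors=2, add_indicator=True)
X_filled = imputer.fit_transform(X)
# First 3 columns: the features, with NaNs replaced by the mean of the 2 nearest rows.
# Last 2 columns: 0/1 indicators for the features that had missing values (columns 0 and 2).
```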

jaunty helm Mar 29, 2024, 3:09 PM

#

toxic mortar How do you decide how to fill NaNs in ur dataset?

context is important
e.g. if I have a row whose basementArea is nan, but the value in hasBasement is False, then the nan is probably because there's no basement, and I'd fill that with a 0
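That kind of context-aware rule is a one-liner in pandas. A minimal sketch using the `hasBasement`/`basementArea` columns from the example (the data is made up): only the NaNs explained by "no basement" are filled, and any other NaNs are left alone:

```python
import numpy as np
import pandas as pd

houses = pd.DataFrame({
    "hasBasement": [True, False, True, False],
    "basementArea": [50.0, np.nan, np.nan, 0.0],
})

# NaN area with no basement means "zero area"; NaN area *with* a basement is genuinely unknown
no_basement = ~houses["hasBasement"]
houses.loc[no_basement, "basementArea"] = houses.loc[no_basement, "basementArea"].fillna(0.0)
```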

agile owl Mar 29, 2024, 3:09 PM

#

all those missing ratios are probably cases where earnings are negative or zero

#

or sales are negative or zero

#

so they just make the ratio a nan

toxic mortar Mar 29, 2024, 3:09 PM

#

Agreed, that's why I can't simply drop NaN-dense columns or just use the mean value

agile owl Mar 29, 2024, 3:10 PM

#

you should one hot encode the missing variables

#

like

#

if you need to make them some value for your model do that

#

but then also have a column that says "this row had this column replaced because it was nan"

#

as an indicator variable
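A minimal pandas sketch of that pattern: add a 0/1 "was missing" column before filling, so the model can still tell filled values apart from real ones. `fill_with_indicator`, the `_missing` suffix, and the median fill are illustrative choices (scikit-learn's `SimpleImputer(add_indicator=True)` does the same inside a pipeline):

```python
import numpy as np
import pandas as pd


def fill_with_indicator(df, column, fill_value):
    """Return a copy with `column` filled and a new `<column>_missing` 0/1 indicator column."""
    out = df.copy()
    out[f"{column}_missing"] = out[column].isna().astype(int)  # record missingness first
    out[column] = out[column].fillna(fill_value)
    return out


deals = pd.DataFrame({"PEG Ratio": [1.0, np.nan, 3.0, np.nan]})
filled = fill_with_indicator(deals, "PEG Ratio", deals["PEG Ratio"].median())
```

For ratios that are undefined rather than unknown, the filled number itself carries little meaning; the indicator column is what tells the model "this ratio didn't exist here".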

toxic mortar Mar 29, 2024, 3:11 PM

#

It's a little more complex than that, and it isn't the point of my question. I just want suggestions on how to think structurally about filling in missing values, or what I can do with them

final kiln Mar 29, 2024, 3:11 PM

#

I mean you can if they just signal that the data point is incomplete, but if it means something otherwise, you can just encode it somehow

agile owl Mar 29, 2024, 3:11 PM

#

you should not fill in those missing values

#

they have a semantic meaning in finance

#

they are probably nan for a reason

#

you can't use zero for negative earnings

final kiln Mar 29, 2024, 3:12 PM

#

Maybe try to dig in the dataset docs, if there's one

toxic mortar Mar 29, 2024, 3:12 PM

#

Is this a specific use case where I shouldn't try to fill them in, or is that the general go-to?

agile owl Mar 29, 2024, 3:12 PM

#

it's because we know that the nans probably exist for a reason and aren't just missing data

#

the reason is that when companies have no sales or no earnings, those ratios are undefined

#

and filling them with any number would be inappropriate in a sense

#

that's why the indicator variable is important

final kiln Mar 29, 2024, 3:14 PM

#

The answer then seems to be that you gotta acquire domain specific knowledge about your problem and try to make a decision that makes sense

toxic mortar Mar 29, 2024, 3:14 PM

#

Yeah that makes sense

#

thanks @agile owl @final kiln

#

also, what do you mean by encoding missing values?

#

I would one-hot encode enums

#

Because I know that 0 = item1, 1 = item2, etc...

final kiln Mar 29, 2024, 3:16 PM

#

Depends on the model, for example if I was using a language model I'd just use a special token that represents NaN

toxic mortar Mar 29, 2024, 3:16 PM

#

How about in neural nets

#

or like regressions/forests

final kiln Mar 29, 2024, 3:17 PM

#

Perhaps a numerical value that I know for sure won't appear anywhere else, maybe a -1 if all other numbers are positive, idk actually

You can also do one hot encoding

#

And you can actually also do the same thing as with language models

#

Which is to use an embeddings table

toxic mortar Mar 29, 2024, 3:18 PM

#

would it make sense, for example, to normalize it to [0, 1] and then use -1 to encode the missing values?

final kiln Mar 29, 2024, 3:18 PM

#

Maybe, I can't say for sure. Normalizing tends to be a pretty good idea tho
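A sketch of what that suggestion could look like: min-max scale the observed values to [0, 1] (pandas ignores NaN when computing min and max) and then put -1 in the gaps, so the marker can't collide with a real value. `minmax_with_sentinel` is a hypothetical helper; in a real pipeline the min and max should come from the training split only, and pairing the sentinel with an indicator column is common because how well a model copes with a sentinel is an empirical question:

```python
import numpy as np
import pandas as pd


def minmax_with_sentinel(s: pd.Series, sentinel: float = -1.0) -> pd.Series:
    lo, hi = s.min(), s.max()  # NaN is skipped by default
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    scaled = (s - lo) / span  # observed values now lie in [0, 1]; NaN stays NaN
    return scaled.fillna(sentinel)


roe = pd.Series([10.0, np.nan, 30.0, 20.0])
encoded = minmax_with_sentinel(roe)
```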

toxic mortar Mar 29, 2024, 3:18 PM

#

ok, yeah ig

#

I wont know until i try it

#

tyty man

final kiln Mar 29, 2024, 3:19 PM

#

toxic mortar I wont know until i try it

For sure, but also, to spare yourself some suffering, try searching through Papers with Code to see if anyone has solved your problem or a similar one

#

You'd be surprised at how much time you can save

toxic mortar Mar 29, 2024, 3:20 PM

#

Where do you look

#

hugging face?

final kiln Mar 29, 2024, 3:20 PM

#

paperswithcode.com

#

It shows you the state of the art in the various areas of deep learning

#

And you might even find your dataset there

toxic mortar Mar 29, 2024, 3:21 PM

#

neat! thats what i need

agile owl Mar 29, 2024, 3:26 PM

#

how you encode missing values can also depend on the model you are using

#

some models just natively handle nan

#

which is the best imo

long canopy Mar 29, 2024, 3:45 PM

#

what framework are you guys using to run a pytorch model as a server/daemon?

toxic mortar Mar 29, 2024, 3:50 PM

#

https://stackoverflow.com/questions/64739281/how-should-i-handle-nan-values-in-a-finance-df

#

#

If that's true, can someone explain to me why this is the case?

#

Also, I think this is extremely nicely formulated:

You can create a machine learning model without using the column and use its performance as a baseline, then benchmark the performance (accuracy) of each step against that baseline.
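A minimal sketch of that baseline idea with scikit-learn: cross-validate a model without the incomplete column, then with one candidate treatment of it, and keep the treatment only if it beats the baseline. The synthetic data, the linear model, and the "median fill + indicator" candidate are illustrative assumptions; putting the imputer inside the pipeline means it is refit on each training fold, so the held-out fold doesn't leak into the fill values:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)
x2_observed = x2.copy()
x2_observed[rng.random(n) < 0.3] = np.nan  # 30% of the candidate column is missing

# Baseline: leave the incomplete column out entirely (default scoring for regressors is R^2)
baseline = cross_val_score(LinearRegression(), x1.reshape(-1, 1), y, cv=5).mean()

# Candidate: keep the column, median-fill it, and add a was-missing indicator
candidate_model = make_pipeline(SimpleImputer(strategy="median", add_indicator=True), LinearRegression())
candidate = cross_val_score(candidate_model, np.column_stack([x1, x2_observed]), y, cv=5).mean()
```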

final kiln Mar 29, 2024, 4:44 PM

#

Idk, I'm very wary of anything that means generating new data to fill in the gaps

#

My instinct tells me to just drop columns and data points rather than add synthetic data like that. Ideally the model would have some way of encoding "missing data", cuz that in and of itself could be a bit of information, right? "When these values are missing, the output tends to be a certain value"

What that person says at the start is very true in my experience. Often the data quality is much more important than the model. Ig you can totally choose the wrong model, but if you have bad data not even the best SOTA models will help you out.

#

If I were to directly modify data points from my dataset, I would need a pristine justification to myself

orchid forge Mar 29, 2024, 4:57 PM

#

guys

wooden sail Mar 29, 2024, 4:57 PM

#

final kiln My instinct tells me to just drop columns and data points than to add synthetic ...

this is not quite right either. you can think of it as being two sliders, one for the model and another for the data. the worse one is, the better the other has to be. the issue is that most people do black box ML, i.e. the model is completely made up, and so data is everything

orchid forge Mar 29, 2024, 4:58 PM

#

im doing a project in data analysis

#

i need a lil bit of help

#

i wish i could talk with someone and share my screen and stuff

final kiln Mar 29, 2024, 5:00 PM

#

wooden sail this is not quite right either. you can think of it as being two sliders, one fo...

What I'm saying is that you can have a fancy model and it won't work cuz the data is bad, and it's waaay easier to have a fancy model than to have good data.

orchid forge Mar 29, 2024, 5:00 PM

#

but I accidentally left this server, and now I have to wait 3 days to get voice verification

final kiln Mar 29, 2024, 5:01 PM

#

Not only that, at least the deep learning models I've been using are very resilient; I can butcher them and still get good results if my data is of high quality >.>

long canopy Mar 29, 2024, 5:01 PM

#

what are you guys using for serving?

wooden sail Mar 29, 2024, 5:02 PM

#

final kiln Not only that, at least the deeplearning models I've been using are very resilie...

sure, this would be the extreme case where the model slider is set close to 0 but the data is good

#

you can use bad data with a good model and well-motivated regularization to account for data errors and that works too

orchid forge Mar 29, 2024, 5:02 PM

#

why is no one replying to me

wooden sail Mar 29, 2024, 5:02 PM

#

orchid forge i wish i could talk with someone and share my screen and stuff

because there is nothing we can do about this

#

if you have questions, by all means go ahead and ask

final kiln Mar 29, 2024, 5:03 PM

#

I've spent countless hours working around bad data and gotten nothing, then I got better data and it got solved in like 30 min

#

It happens a lot to me

wooden sail Mar 29, 2024, 5:03 PM

#

for complicated phenomena, it's very difficult to make a good model

final kiln Mar 29, 2024, 5:03 PM

#

I suppose the data slider is more important, that's how I feel

orchid forge Mar 29, 2024, 5:03 PM

#

wooden sail if you have questions, by all means go ahead and ask

would you help me?

wooden sail Mar 29, 2024, 5:03 PM

#

in those cases you're kinda screwed without good data

final kiln Mar 29, 2024, 5:04 PM

#

But ig it can depend on the problem

wooden sail Mar 29, 2024, 5:04 PM

#

orchid forge would you help me?

i can't know if you don't ask your question

final kiln Mar 29, 2024, 5:04 PM

#

Maybe I've only worked with problems where data is more important

serene scaffold Mar 29, 2024, 5:04 PM

#

@wooden sail why is it?

wooden sail Mar 29, 2024, 5:04 PM

#

why is what

serene scaffold Mar 29, 2024, 5:04 PM

#

them

#

try showing the dataset and say what the task is

orchid forge Mar 29, 2024, 5:05 PM

#

wooden sail i can't know if you don't ask your question

I have a dataset and they've given me some tasks. I've done most of the work except for two tasks, and nothing is helpful, not even ChatGPT

wooden sail Mar 29, 2024, 5:05 PM

#

you're still not asking a question

serene scaffold Mar 29, 2024, 5:05 PM

#

imagine that you are the person trying to help you. what would that person need to know to say something helpful

orchid forge Mar 29, 2024, 5:06 PM

#

serene scaffold try showing the dataset and say what the task is

how can i show you my dataset?

long canopy Mar 29, 2024, 5:06 PM

#

this is one of those english, m*, do you speak it moments

serene scaffold Mar 29, 2024, 5:06 PM

#

orchid forge how can i show you my dataset?

what is the structure of it? text files? images?

orchid forge Mar 29, 2024, 5:06 PM

#

excel

serene scaffold Mar 29, 2024, 5:06 PM

#

long canopy this is one of those english, m*, do you speak it moments

no need to be rude

serene scaffold Mar 29, 2024, 5:06 PM

#

orchid forge excel

is it xlsx or csv?

orchid forge Mar 29, 2024, 5:06 PM

#

its normal column and row data

serene scaffold Mar 29, 2024, 5:06 PM

#

the file itself: what is the extension?

orchid forge Mar 29, 2024, 5:06 PM

#

serene scaffold is it xlsx or csv?

xlsx

#

can i send it here?

serene scaffold Mar 29, 2024, 5:07 PM

#

okay. show a screenshot that shows the names of each column and the first few rows (and nothing else--don't include a bunch of other stuff on your screen)

orchid forge Mar 29, 2024, 5:08 PM

#

okay wait

serene scaffold Mar 29, 2024, 5:09 PM

#

Please stop trying to upload documents. Please post a screenshot.

orchid forge Mar 29, 2024, 5:10 PM

#

oh no i cant send the xlsx file here

orchid forge Mar 29, 2024, 5:10 PM

#

serene scaffold Please stop trying to upload documents. Please post a screenshot.

can i send u personally?

serene scaffold Mar 29, 2024, 5:10 PM

#

No

#

If you're willing to upload the whole xlsx file here, I'm not sure why a screenshot would be an issue

orchid forge Mar 29, 2024, 5:11 PM

#

the thing is idk how to take a screenshot

final kiln Mar 29, 2024, 5:11 PM

#

wooden sail you can use bad data with a good model and well-motivated regularization to acco...

I'd need an example of this tbh, cuz like, I don't think it's advisable to just invent new data out of nowhere, especially if it's purely motivated by stats and not from some understanding of the underlying phenomena

long canopy Mar 29, 2024, 5:11 PM

#

so... what do you guys serve models with

serene scaffold Mar 29, 2024, 5:12 PM

#

orchid forge the thing is idk how to take a screenshot

what operating system are you on

wooden sail Mar 29, 2024, 5:12 PM

#

final kiln I'd need an example of this tbh, cuz like, I don't think it's advisable to just ...

well yeah, just making data up is always a bad idea 😛

orchid forge Mar 29, 2024, 5:12 PM

#

intel

final kiln Mar 29, 2024, 5:13 PM

#

wooden sail well yeah, just making data up is always a bad idea 😛

Not necessarily, I can think of situations where it is okay

wooden sail Mar 29, 2024, 5:13 PM

#

i mean randomly making it up with no motivation behind how you made it up

orchid forge Mar 29, 2024, 5:13 PM

#

64 bit

serene scaffold Mar 29, 2024, 5:13 PM

#

orchid forge intel

are you on windows, mac, or linux

orchid forge Mar 29, 2024, 5:13 PM

#

windows

wooden sail Mar 29, 2024, 5:13 PM

#

things like missing data are ok as long as there is some notion of "structure" or "low dimensionality" underlying the data

serene scaffold Mar 29, 2024, 5:13 PM

#

orchid forge windows

okay, open the "snipping tool"

long canopy Mar 29, 2024, 5:14 PM

#

wooden sail i mean randomly making it up with no motivation behind how you made it up

label images according to whether the 192nd pixel's color hex is prime

final kiln Mar 29, 2024, 5:14 PM

#

wooden sail things like missing data are ok as long as there is some notion of "structure" o...

Ok wait, I think I didn't understand what you said. You said regularization to account for errors, what do you mean by that ?

orchid forge Mar 29, 2024, 5:14 PM

#

serene scaffold okay, open the "snipping tool"

omg yeah got it

serene scaffold Mar 29, 2024, 5:14 PM

#

orchid forge omg yeah got it

great. Remember to only use screenshots to share information that you cannot share as text. Text is always preferable to screenshots.

final kiln Mar 29, 2024, 5:15 PM

#

wooden sail you can use bad data with a good model and well-motivated regularization to acco...

In here

wooden sail Mar 29, 2024, 5:15 PM

#

final kiln Ok wait, I think I didn't understand what you said. You said regularization to a...

depends on the type of error, but some concrete examples include the data being "bad" due to noise. if you know the noise statistics, then you can do something about it
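A minimal numpy sketch of one standard way to use known noise statistics. It assumes Gaussian noise with known variance σ² and a zero-mean Gaussian prior on the weights with variance τ². Under those assumptions, the MAP estimate of a linear model is ridge (Tikhonov) regression with λ = σ²/τ². The data, σ and τ below are made up for illustration, and the Gaussian assumptions are added here, not stated in the chat.

```python
import numpy as np

def map_linear_fit(X, y, noise_var, prior_var):
    """MAP estimate for y = X @ w + noise, with noise ~ N(0, noise_var) and
    prior w ~ N(0, prior_var * I). Equivalent to ridge with lam = noise_var / prior_var."""
    lam = noise_var / prior_var
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) w = X^T y rather than forming an explicit inverse
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3  # assumed known noise std (e.g. measured by sampling the instrument)
y = X @ w_true + rng.normal(scale=sigma, size=50)

w_hat = map_linear_fit(X, y, noise_var=sigma**2, prior_var=1.0)
print(w_hat)
```

The noisier the data relative to the prior, the larger λ gets and the more the fit is pulled toward zero. That is how the regularization "accounts for" the error.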

#

if the data has no noise but parts are missing, and you know it follows a "simple"/predictable structure, that's also fine

#

a combination of the two is also ok

#

this would mean you know the model is "simple" and you also have a statistical model
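Here is a minimal sketch of the "parts are missing but the structure is simple" case. It assumes "simple" means the data matrix is approximately low rank. The gaps are filled by alternating between a rank-k SVD approximation and re-imposing the observed entries, an iterative-SVD ("hard impute") scheme. The rank is an assumption you have to supply. With noisy observations you would typically choose a smaller rank or shrink the singular values instead of keeping them exactly.

```python
import numpy as np

def low_rank_impute(M, mask, rank, n_iters=200):
    """Fill entries where mask is False, assuming M is roughly rank `rank`."""
    filled = np.where(mask, M, M[mask].mean())  # start missing entries at the observed mean
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-k approximation
        filled = np.where(mask, M, approx)  # keep observed values, update only the missing ones
    return filled

rng = np.random.default_rng(1)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # exactly rank 2
mask = rng.random(truth.shape) > 0.3  # roughly 70% of entries observed
observed = np.where(mask, truth, np.nan)

recovered = low_rank_impute(observed, mask, rank=2)
print("mean abs error on missing entries:", np.abs(recovered - truth)[~mask].mean())
```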

final kiln Mar 29, 2024, 5:16 PM

#

wooden sail depends on the type of error, but some concrete examples include the data being ...

Okay yeah that actually makes total sense and I've done that many times especially with data coming out of instruments, you know the noise cuz you can literally sample it.

wooden sail Mar 29, 2024, 5:17 PM

#

if you also know your model is wrong because it's a little too simple, but it usually performs well, there is a way to measure "mismatch". you can try to make simplified models with fewer parameters that are "usually" "almost correct"

#

these tend to be robust to data errors, but in exchange the maximum "resolution" is poor

#

never too wrong, but also never quite right

final kiln Mar 29, 2024, 5:18 PM

#

Would you say it makes sense to try to identify noise in the data even tho you can't actually sample it ? Maybe using stats you see that there's like a random component to it and you remove it

My instinct is to not remove it, cuz I'm ignorant about what it is and where it came from

orchid forge Mar 29, 2024, 5:18 PM

#

serene scaffold great. Remember to only use screenshots to share information that you cannot s...

#

serene scaffold Mar 29, 2024, 5:19 PM

#

orchid forge

okay. now explain what the task is. be specific, so that we don't have to interview you.

wooden sail Mar 29, 2024, 5:19 PM

#

final kiln Would you say it makes sense to try to identify noise in the data even tho you c...

yeah you can't even remove it if you don't know what it is. there are some decompositions that, although blackboxy in that they use deep learning, are based on the idea of decomposing the data into a "simple"/"structured" component and another "random" one with lots of detail that cannot be trusted
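The deep-learning decompositions mentioned above won't fit in a chat snippet. The underlying idea can still be shown with a deliberately simple stand-in, which is not one of the methods referred to. The sketch splits a 1-D signal into a smooth "structured" part with a moving average and treats the residual as the "random" part. The residual's spread then serves as a rough noise estimate. The window length is an assumption, and it encodes what you believe "structure" means.

```python
import numpy as np

def split_structure_noise(signal, window=15):
    """Return (smooth, residual) using a centered moving average."""
    kernel = np.ones(window) / window
    # mode="same" keeps the length; values near the edges are biased by implicit zero padding
    smooth = np.convolve(signal, kernel, mode="same")
    return smooth, signal - smooth

rng = np.random.default_rng(2)
t = np.linspace(0, 4 * np.pi, 500)
true_noise_std = 0.2
x = np.sin(t) + rng.normal(scale=true_noise_std, size=t.size)

smooth, residual = split_structure_noise(x)
interior = slice(15, -15)  # ignore the zero-padded edges
# Roughly the injected noise level (slightly less, since each point is part of its own average)
print(residual[interior].std())
```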

orchid forge Mar 29, 2024, 5:20 PM

#

serene scaffold okay. now explain what the task is. be specific, so that we don't have to interv...

Task: Geographic Analysis
Plot the locations of restaurants on a map using longitude and latitude coordinates

serene scaffold Mar 29, 2024, 5:20 PM

#

orchid forge Task: Geographic Analysis Plot the locations of restaurants on a map using long...

what tool are they asking you to use?

orchid forge Mar 29, 2024, 5:20 PM

#

python ofc

orchid forge Mar 29, 2024, 5:21 PM

#

serene scaffold what tool are they asking you to use?

they didnt mention that

#

i just have to do the work with python thats all

final kiln Mar 29, 2024, 5:22 PM

#

I gotta do some upskilling on non-deeplearning ML

I hate stats tho >.>

serene scaffold Mar 29, 2024, 5:22 PM

#

orchid forge python ofc

okay, try using this: https://geopandas.org/en/latest/gallery/create_geopandas_from_pandas.html

orchid forge Mar 29, 2024, 5:23 PM

#

ok

serene scaffold Mar 29, 2024, 5:23 PM

#

final kiln I gotta do some upskilling on non-deeplearning ML I hate stats tho >.>

but non-DL ML is just stats 😮

#

(and DL is just doing calculus on the stats)

orchid forge Mar 29, 2024, 5:24 PM

#

also you're so kind @serene scaffold

#

you have so much patience for someone who is dumb like me lol

serene scaffold Mar 29, 2024, 5:26 PM

#

@orchid forge no problem. do you know how to install stuff (like numpy, pandas, etc)?

orchid forge Mar 29, 2024, 5:26 PM

#

ya lol

#

ofc

serene scaffold Mar 29, 2024, 5:26 PM

#

you'll need pandas and geopandas. and probably openpyxl, to open the excel file as a dataframe

final kiln Mar 29, 2024, 5:28 PM

#

serene scaffold but non-DL ML is just stats 😮

Ig I'm being unfair to the subject. I think that stats alone is extremely dry and unappealing, but when it's coupled with a subject it becomes something really good. Some of the most profound ideas that I've had the pleasure to put in my head are statistical in nature

orchid forge Mar 29, 2024, 5:28 PM

#

serene scaffold @orchid forge no problem. do you know how to install stuff (like numpy,...

im using google colab i hope i could import those libraries

serene scaffold Mar 29, 2024, 5:29 PM

#

orchid forge im using google colab i hope i could import those libraries

yeah, you just need to do `!pip install pandas geopandas openpyxl` as a cell.

orchid forge Mar 29, 2024, 5:30 PM

#

k

orchid forge Mar 29, 2024, 5:34 PM

#

serene scaffold yeah, you just need to do `!pip install pandas geopandas openpyxl` as a cell.

you're an admin woowww

serene scaffold Mar 29, 2024, 5:34 PM

#

orchid forge you're an admins woowww

yeah, I spend way too much time here

#

😄

orchid forge Mar 29, 2024, 5:35 PM

#

its nice that you help people

serene scaffold Mar 29, 2024, 5:35 PM

#

I only do it to offset what a horrible person I am in every other aspect of my life

orchid forge Mar 29, 2024, 5:37 PM

#

no you're not, if you're helping someone like me i think you're the coolest yet very humble person

serene scaffold Mar 29, 2024, 5:38 PM

#

anyway, this isn't about me. what progress have you made towards making the "maps"?

orchid forge Mar 29, 2024, 5:38 PM

#

i just imported the libraries

#

and now writing the further code

#

god i can't write a single code i wanna cry

serene scaffold Mar 29, 2024, 5:44 PM

#

orchid forge god i can't write a single code i wanna cry

start with just loading the excel data as a dataframe, and then converting that dataframe to a geodataframe
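A minimal sketch of those two steps, following the pattern on the linked geopandas gallery page. The column names `Longitude`/`Latitude` and the inline sample rows are assumptions standing in for the real spreadsheet. With the actual file you would start from `pd.read_excel(...)`, which needs openpyxl for .xlsx.

```python
import geopandas as gpd
import pandas as pd

# Stand-in for: df = pd.read_excel("restaurants.xlsx")  (hypothetical filename)
df = pd.DataFrame(
    {
        "Restaurant Name": ["A", "B", "C"],
        "Longitude": [121.03, -74.00, 77.21],
        "Latitude": [14.56, 40.71, 28.61],
    }
)

# Build point geometries from the coordinate columns (x = longitude, y = latitude);
# EPSG:4326 means plain WGS 84 longitude/latitude
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)
print(gdf.head())

# To draw the points (requires matplotlib): gdf.plot(color="red", markersize=5)
```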

orchid forge Mar 29, 2024, 5:45 PM

#

ok

#

look

serene scaffold Mar 29, 2024, 5:48 PM

#

okay, what does gdf look like when it prints?

orchid forge Mar 29, 2024, 5:49 PM

#

#

like this

#

😕

serene scaffold Mar 29, 2024, 5:49 PM

#

looks good to me. keep following the example from the geopandas website

#

it looks like you have cities from all over the world, so you can skip the part where they restrict the map to just South America.

orchid forge Mar 29, 2024, 5:50 PM

#

okay

#

god idk what to do now

serene scaffold Mar 29, 2024, 5:52 PM

#

orchid forge god idk what to do now

https://geopandas.org/en/latest/gallery/create_geopandas_from_pandas.html
you're on #5 now

orchid forge Mar 29, 2024, 5:52 PM

#

serene scaffold https://geopandas.org/en/latest/gallery/create_geopandas_from_pandas.html you're...

what does that mean?

serene scaffold Mar 29, 2024, 5:53 PM

#

orchid forge what does that mean?

orchid forge Mar 29, 2024, 5:53 PM

#

ok

serene scaffold Mar 29, 2024, 5:57 PM

#

I think you can just remove the .clip([ ]) part

orchid forge Mar 29, 2024, 5:58 PM

#

ok

long canopy Mar 29, 2024, 5:58 PM

#

it's an absolute frigging pain to download these huge models

#

is there no better alternative than git clone? it keeps messing up

#

forget it, i'll download the parts file by file

desert oar Mar 29, 2024, 5:58 PM

#

long canopy it's an absolute frigging pain to download these huge models

huggingface?

long canopy Mar 29, 2024, 5:58 PM

#

desert oar huggingface?

yeah but it uses git clone

orchid forge Mar 29, 2024, 5:59 PM

#

loookkkkkkkkkk

desert oar Mar 29, 2024, 5:59 PM

#

interesting. git is historically really bad at very large binary files.

orchid forge Mar 29, 2024, 5:59 PM

#

i did it

long canopy Mar 29, 2024, 5:59 PM

#

desert oar interesting. git is historically really bad at very large binary files.

it is absolute scheisse

#

and my instances keep dying because they run out of memory

#

from a DOWNLOAD

desert oar Mar 29, 2024, 5:59 PM

#

long canopy it is absolute scheisse

it's just not a good fit for the data model. git is meant to work well with lots of small and medium-sized files containing mostly text.

long canopy Mar 29, 2024, 6:00 PM

#

any alternatives? otherwise i'm just going to make a shell script that downloads a list of URLs
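If you go the "list of URLs" route, the main fix for the out-of-memory problem is to stream each file to disk in chunks rather than holding it in RAM. Below is a minimal stdlib sketch, in Python rather than shell to match the channel. The URLs and destination directory are placeholders. It does no retries or checksum verification, and the "resume" is only a skip of files that already finished.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def download_all(urls, dest_dir, chunk_size=16 * 1024 * 1024):
    """Download each URL into dest_dir, streaming in chunks so memory use stays flat."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        target = dest / Path(urlparse(url).path).name
        if target.exists():  # crude resume: skip files that already finished
            paths.append(target)
            continue
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            # copies at most chunk_size bytes at a time instead of reading the whole body
            shutil.copyfileobj(resp, out, length=chunk_size)
        tmp.rename(target)  # only give the file its final name once it is fully written
        paths.append(target)
    return paths
```

Usage would look like `download_all(list_of_shard_urls, "weights/")`, where `list_of_shard_urls` is whatever list of file URLs you collected.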

serene scaffold Mar 29, 2024, 6:00 PM

#

orchid forge loookkkkkkkkkk

YAY

orchid forge Mar 29, 2024, 6:00 PM

#

omgggggg

final kiln Mar 29, 2024, 6:01 PM

#

long canopy any alternatives? otherwise i'm just going to make a shell script that downloads...

Might as well just download it directly from the hugging face web UI

serene scaffold Mar 29, 2024, 6:01 PM

#

desert oar interesting. git is historically really bad at very large binary files.

huggingface makes you install an extra git module to handle large binaries

desert oar Mar 29, 2024, 6:01 PM

#

serene scaffold huggingface makes you install an extra git module to handle large binaries

git-lfs? or something else?

final kiln Mar 29, 2024, 6:01 PM

#

Oh I didn't know that

serene scaffold Mar 29, 2024, 6:01 PM

#

yes

desert oar Mar 29, 2024, 6:01 PM

#

I've only just started using HF (via sagemaker)

final kiln Mar 29, 2024, 6:02 PM

#

Then how is it messing up ?

#

git lfs is pretty good

long canopy Mar 29, 2024, 6:02 PM

#

I think it's a memory leak?

orchid forge Mar 29, 2024, 6:02 PM

#

@desert oar you're a genius tho

long canopy Mar 29, 2024, 6:02 PM

#

last instance died from out of memory

#

i was downloading starcoder2

final kiln Mar 29, 2024, 6:02 PM

#

Disk memory or ram memory?

long canopy Mar 29, 2024, 6:02 PM

#

ram

#

16 GB ram instance

final kiln Mar 29, 2024, 6:02 PM

#

Interesting

desert oar Mar 29, 2024, 6:02 PM

#

orchid forge @desert oar you're a genius tho

I'm just a chunk of salt but thanks

final kiln Mar 29, 2024, 6:03 PM

#

Try increasing your swap file

long canopy Mar 29, 2024, 6:03 PM

#

final kiln Interesting

total download is 65 GB, my instance had 75 GB + 16 GB RAM

final kiln Mar 29, 2024, 6:03 PM

#

But no ideas beyond that

long canopy Mar 29, 2024, 6:03 PM

#

final kiln Try increasing your swap file

will just get a bigger instance if it comes to this lol

orchid forge Mar 29, 2024, 6:03 PM

#

desert oar I'm just a chunk of salt but thanks

oops not u

long canopy Mar 29, 2024, 6:03 PM

#

cloud makes swap obsolete heheh

orchid forge Mar 29, 2024, 6:03 PM

#

i mean @serene scaffold

final kiln Mar 29, 2024, 6:03 PM

#

long canopy cloud makes swap obsolete heheh

You do pay more for more ram

long canopy Mar 29, 2024, 6:04 PM

#

final kiln You do pay more for more ram

0.01 USD more for the hour lol

final kiln Mar 29, 2024, 6:04 PM

#

But in any case, 16gb should be more than enough

serene scaffold Mar 29, 2024, 6:04 PM

#

desert oar I'm just a chunk of salt but thanks

whenever I see salt rock lamps for sale at stores, I feel bad for your trapped brethren

final kiln Mar 29, 2024, 6:04 PM

#

long canopy 0.01 USD more for the hour lol

Woah, I don't believe that

long canopy Mar 29, 2024, 6:04 PM

#

final kiln But in any case, 16gb should be more than enough

i know right? something is wrong

long canopy Mar 29, 2024, 6:04 PM

#

final kiln Woah, I don't believe that

yeah check out google cloud spot instances

final kiln Mar 29, 2024, 6:04 PM

#

16 to 32 for .01?

#

Ah spot, yeah possible

long canopy Mar 29, 2024, 6:04 PM

#

final kiln 16 to 32 a .01?

let me tell you exactly

final kiln Mar 29, 2024, 6:04 PM

#

Spot is an auction market so that does happen

orchid forge Mar 29, 2024, 6:04 PM

#

i wanna be a coder like you @serene scaffold

final kiln Mar 29, 2024, 6:05 PM

#

I've been using 32gb of ram cuz it's cheaper than the 16gb

long canopy Mar 29, 2024, 6:05 PM

#

final kiln I've been using 32gb of ram cuz its cheaper than the 16gb

0.09 extra USD per hour for 16 to 32

#

half that for spot

#

why do people even buy computers anymore lol

serene scaffold Mar 29, 2024, 6:06 PM

#

orchid forge i wanna be a coder like you @serene scaffold

seems like you're on your way

final kiln Mar 29, 2024, 6:06 PM

#

Never saw just a 10-cent increase like that tbh

#

But yeah in that case might as well, right

long canopy Mar 29, 2024, 6:07 PM

#

final kiln Never saw a just 10cent increase like that tbh

gcp

long canopy Mar 29, 2024, 6:08 PM

#

final kiln Never saw a just 10cent increase like that tbh

oh half price for quad core non-spot

final kiln Mar 29, 2024, 6:08 PM

#

long canopy why do people even buy computers anymore lol

My computer has been turned into a glorified terminal for cloud machines

long canopy Mar 29, 2024, 6:08 PM

#

nice

long canopy Mar 29, 2024, 6:08 PM

#

final kiln My computer has been turned into a glorified terminal for cloud machines

lol mine too

final kiln Mar 29, 2024, 6:09 PM

#

I could legit just code from my cellphone browser

#

I'd click some buttons on my GitHub Actions workflows and it gives me a web link that opens VS Code in the browser

long canopy Mar 29, 2024, 6:09 PM

#

i'm going to start fully transitioning to cloud lol

final kiln Mar 29, 2024, 6:10 PM

#

It's worth it for sure, and you don't even need to be confined there cuz your computer can also be part of the list of machines right

#

So you get a perfectly reproducible env across any machine if you do it right

#

But mostly cloud tho

orchid forge Mar 29, 2024, 6:10 PM

#

serene scaffold seems like you're on your way

i think i could learn a lot from you

long canopy Mar 29, 2024, 6:10 PM

#

final kiln It's worth it for sure, and you don't even need to be confined there cuz your co...

this

#

but also it's made me feel that turning off my computer at night is a waste lol

#

i still literally have a hard time believing this cloud stuff is available

#

and it is so FRIGGIN CHEAP

serene scaffold Mar 29, 2024, 6:11 PM

#

orchid forge i think i could learn a lot from you

you can probably learn from any of the people who frequent this channel

orchid forge Mar 29, 2024, 6:11 PM

#

hmm

long canopy Mar 29, 2024, 6:12 PM

#

before the end of the day I'm going to try running distributed inference with starcoder2 on 80 E2 1GB instances, wish me luck

long canopy Mar 29, 2024, 6:13 PM

#

long canopy before the end of the day I'm going to try running distributed inference with st...

also it's going to be free because I haven't used my compute this month lololo

final kiln Mar 29, 2024, 6:14 PM

#

long canopy and it is so FRIGGIN CHEAP

I feel that the way I got this stuff set up is so good that I could just turn it into a product and sell it. Something cloud-agnostic and not dependent on GitHub Actions. You'd just need to install an agent on your machine and you'd get the whole thing: a cloud-agnostic env from dev to prod, with spot-instance pricing only, cuz fault tolerance is pretty easy to account for

long canopy Mar 29, 2024, 6:15 PM

#

final kiln I feel that the way I got this stuff setup is so good that I could just turn it ...

yeah i mean the time investment to get this stuff working is non negligible

#

i've been at it nonstop for like the last month and it's still not good enough for prod

final kiln Mar 29, 2024, 6:15 PM

#

Would 100% blow up the current competition that doesn't even have GPU, let alone give you the option to not use the cloud if you don't want

#

About 70-80% of the pricing too

long canopy Mar 29, 2024, 6:16 PM

#

final kiln About 70-80% of the pricing too

dude way less

orchid forge Mar 29, 2024, 6:16 PM

#

i have another task ..... Analyze the ratings and popularity of different restaurant chains

#

for the same dataset

final kiln Mar 29, 2024, 6:16 PM

#

long canopy dude way less

No.

I'd need my cut huehuehue

#

Alas, my brain is not smart and prefers to do ML research for free

lapis inlet Mar 29, 2024, 6:19 PM

#

Hey, I was trying to use the BERT model for one of my applications, but I can't seem to install the tensorflow-text library. I'm currently using Python 3.12. Any suggestions?

serene scaffold Mar 29, 2024, 6:22 PM

#

orchid forge

you should change all the Yes/No data to True and False. and you can probably ignore "Rating color" and "Rating text", since those are just non-numeric versions of "Aggregate rating".
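Here is a minimal pandas sketch of that cleanup. The Yes/No column names in the sample (`Has Table booking`, `Has Online delivery`) are guesses at what the spreadsheet contains. `Rating color`, `Rating text` and `Aggregate rating` are the names used above. The function detects Yes/No columns instead of hard-coding them, so the guessed names don't matter.

```python
import pandas as pd

def clean_restaurants(df):
    """Drop the redundant rating columns and turn Yes/No columns into booleans."""
    df = df.drop(columns=["Rating color", "Rating text"], errors="ignore")
    for col in df.columns:
        values = set(df[col].dropna().unique())
        # Only convert columns whose non-missing values are exactly "Yes"/"No"
        if values and values <= {"Yes", "No"}:
            df[col] = df[col].map({"Yes": True, "No": False})
    return df

raw = pd.DataFrame(
    {
        "Restaurant Name": ["A", "B"],
        "Has Table booking": ["Yes", "No"],  # hypothetical column name
        "Has Online delivery": ["No", "No"],  # hypothetical column name
        "Aggregate rating": [4.1, 3.2],
        "Rating color": ["Green", "Orange"],
        "Rating text": ["Very Good", "Average"],
    }
)
clean = clean_restaurants(raw)
print(clean)
```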

orchid forge Mar 29, 2024, 6:22 PM

#

ok

#

how to code it

serene scaffold Mar 29, 2024, 6:28 PM

#

I have to head out for a bit

serene scaffold Mar 29, 2024, 6:28 PM

#

orchid forge i have another task ..... Analyze the ratings and popularity of different restau...

what "analysis" do they want you to do, anyway? just looking for patterns?

orchid forge Mar 29, 2024, 6:29 PM

#

ya i guess

#

i think they just want me to do it all by myself they are not specific with things

#

i think im free to analyze it the way i want to
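Since the analysis is open-ended, here is one reasonable starting point, as a sketch rather than what the assignment necessarily expects. It treats rows that share a `Restaurant Name` as one chain. It then compares each chain's average `Aggregate rating` with its total `Votes`, using votes as a popularity proxy. `Votes` is an assumed column name, and so is `Restaurant Name`.

```python
import pandas as pd

def chain_summary(df, min_outlets=2):
    """Per-chain outlet count, mean rating and total votes, most-voted chains first."""
    summary = (
        df.groupby("Restaurant Name")
        .agg(
            outlets=("Aggregate rating", "count"),  # outlets that have a rating
            mean_rating=("Aggregate rating", "mean"),
            total_votes=("Votes", "sum"),
        )
        .loc[lambda s: s["outlets"] >= min_outlets]  # a single outlet isn't really a chain
        .sort_values("total_votes", ascending=False)
    )
    return summary

sample = pd.DataFrame(
    {
        "Restaurant Name": ["Cafe X", "Cafe X", "Diner Y", "Diner Y", "Diner Y", "Solo Z"],
        "Aggregate rating": [4.0, 4.4, 3.0, 3.5, 4.0, 4.9],
        "Votes": [100, 300, 50, 60, 70, 999],
    }
)
print(chain_summary(sample))
```

From there, plotting `mean_rating` against `total_votes` is a natural next step for spotting chains that are popular but poorly rated, or the reverse.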

orchid forge Mar 29, 2024, 6:31 PM

#

serene scaffold I have to head out for a bit

k thanks for helping, you rock!!!

final kiln Mar 29, 2024, 6:36 PM

#

long canopy dude way less

You know what, screw it, I'm submitting the idea to Y Combinator. I gotta be trying everything, and I suppose this kinda counts as a job application

opaque mantle Mar 29, 2024, 8:24 PM

#

Which library should we use to create data insights for a python project

#

please ping me if someone answers

serene scaffold Mar 29, 2024, 8:55 PM

#

opaque mantle Which library should we use to create data insights for a python project

What is a "data insight"? Pandas is the most popular library for reading tabular data (like excel or CSV). And matplotlib is (unfortunately) the main one for creating data visualizations.

lime aspen Mar 29, 2024, 9:00 PM

#

Hi. I have a containerised PyTorch model that returns a prediction for a given input. The problem is that when the container starts up (on a serverless GPU platform), most of its time goes into importing packages and dependencies, like `import torch` and modules from elsewhere in the project directory.
Is it possible to reduce this time? And is it possible to reduce the load time of the PyTorch model itself?

My inference finishes in under 5 seconds, but because the packages aren't imported yet, these imports make the total time about 5x longer. How do I solve this?

Any help is appreciated!

serene scaffold Mar 29, 2024, 9:02 PM

#

lime aspen Hi. I have a containerised pytorch model which does a prediction when you give ...

every time you send a request to that container, is it re-loading pytorch and the model in a fresh python process, every time?

lime aspen Mar 29, 2024, 9:03 PM

#

Yes, because the server spins down when it isn't in use (i.e. isn't warm).

serene scaffold Mar 29, 2024, 9:03 PM

#

lime aspen Yes. As the server spins down if it is not in use. Or isn't warm.

the model needs to always be fully loaded and ready to go, or there's nothing you can do to make it fast.
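The "load once, serve many" pattern the reply points at looks roughly like this. The expensive imports and model loading happen once per process, at module level or behind a cache, and every request reuses the result. `load_model` and `handle_request` are hypothetical stand-ins for whatever the serving platform calls, and the sleep simulates slow imports and loading. This only helps while the process stays alive, which is why a cold-started container pays the cost again.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1)
def load_model():
    """Hypothetical expensive setup (heavy imports + reading weights). Runs once per process."""
    time.sleep(0.5)  # stand-in for `import torch` + loading the checkpoint
    return lambda x: x * 2  # stand-in for the real model's forward pass

def handle_request(x):
    model = load_model()  # cached after the first call
    return model(x)

start = time.perf_counter()
handle_request(1)  # cold: pays the loading cost
cold = time.perf_counter() - start

start = time.perf_counter()
handle_request(2)  # warm: reuses the already-loaded model
warm = time.perf_counter() - start
print(f"cold {cold:.3f}s, warm {warm:.6f}s")
```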

lime aspen Mar 29, 2024, 9:04 PM

#

Really? 😭

#

Isn't there a way to deconstruct a pytorch model such that we can save it as a file and load it as is? Kinda like a cache but saved on disk?

serene scaffold Mar 29, 2024, 9:06 PM

#

Perhaps

#

I don't think it will ultimately be as satisfying as keeping the model warm.

lime aspen Mar 29, 2024, 9:09 PM

#

I came across this article:

https://medium.com/ibm-data-ai/how-to-load-pytorch-models-340-times-faster-with-ray-8be751a6944c

But this is no longer supported, so I thought of another way to deconstruct the PyTorch model: first strip out the weights as numpy arrays and copy the model structure, as described in the article. Serializing the weights to a JSON file is doable, but I can't figure out how to save the model structure (with the weights removed) as JSON, since it isn't serializable. If the model scaffolding could be saved somehow, it could be loaded in a few hundred ms instead of the 8-10 seconds it currently takes.
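One way around the "structure isn't serializable" problem is to keep the structure in code and store only the weights. Below is a numpy-only sketch of a swapped-in technique, not the Ray approach from the article. Each weight array is saved as an uncompressed .npy file and reopened with `np.load(..., mmap_mode="r")`, which memory-maps the file so the OS pages data in lazily instead of reading everything up front. The two-layer `forward` function and the weight names are hypothetical. With PyTorch, the analogous split is saving `model.state_dict()` and re-instantiating the module class in code before calling `load_state_dict`.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def save_weights(weights, directory):
    """Write each named array to its own .npy file plus a small JSON index of names/shapes."""
    d = Path(directory)
    d.mkdir(parents=True, exist_ok=True)
    index = {}
    for name, arr in weights.items():
        np.save(d / f"{name}.npy", np.ascontiguousarray(arr))
        index[name] = list(arr.shape)
    (d / "index.json").write_text(json.dumps(index))

def load_weights(directory):
    """Memory-map every array; pages are read by the OS only as they are touched."""
    d = Path(directory)
    index = json.loads((d / "index.json").read_text())
    return {name: np.load(d / f"{name}.npy", mmap_mode="r") for name in index}

def forward(weights, x):
    """The 'structure' lives in code: a hypothetical 2-layer ReLU MLP rebuilt from named arrays."""
    h = np.maximum(0, x @ weights["fc1_w"] + weights["fc1_b"])
    return h @ weights["fc2_w"] + weights["fc2_b"]

rng = np.random.default_rng(0)
weights = {
    "fc1_w": rng.normal(size=(4, 8)),
    "fc1_b": np.zeros(8),
    "fc2_w": rng.normal(size=(8, 2)),
    "fc2_b": np.zeros(2),
}
cache_dir = tempfile.mkdtemp()  # stand-in for a directory baked into the container image
save_weights(weights, cache_dir)
loaded = load_weights(cache_dir)
x = rng.normal(size=(3, 4))
print(np.allclose(forward(loaded, x), forward(weights, x)))  # True
```

Note that this speeds up reading weights only. It does nothing about the cost of `import torch` itself.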


charred light Mar 29, 2024, 10:04 PM

#

Random forest is primarily a bagging algorithm since it averages the trees right? On the same note, it also can be boosting (i.e. gradient boosted RF).

proud maple Mar 29, 2024, 10:35 PM

#

What could a graph like this signify? My validation loss is massive, but training loss is only around 1.
I'm doing multi-class classification with FastAI with 9 classes.
My model is using DenseNet121 and FastAI's vision_learner function.
My loss is Weighted Cross Entropy Loss.
I think it might be due to the wrong activation function, but I'm not entirely sure (I'm using ReLU).

I'm really new to ML and would be grateful for any help.

supple inlet Mar 29, 2024, 11:49 PM

#

im trying to run this on a P40 (24 GB) with CUDA 12.2, NVIDIA driver 535.161.07 and PyTorch 2.2.1:

```py
import torch
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
feature_extractor = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```

but im getting this error:

RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

mild grotto Mar 30, 2024, 1:59 AM

#

just wanted to share this gif I made: an equirectangular projection of a cubed sphere with Gaussians applied at each step and moving particles

#

Using: python, numpy, pyproj, and scipy (oh and cv2)

wooden sail Mar 30, 2024, 5:21 AM

#

mild grotto just wanted to share this gif I made: an equirectangular projection of a cubed s...

nice

wispy junco Mar 30, 2024, 5:30 AM

#

heyy guys, how to enable autocompletion in jupyter notebooks? ( I mean auto closing parenthesis)

wispy junco Mar 30, 2024, 7:11 AM

#

I mean auto closing parenthesis, if it makes sense

wispy junco Mar 30, 2024, 7:15 AM

#

wispy junco heyy guys, how to enable autocompletion in jupyter notebooks? ( I mean auto clos...

It can be found in settings > auto close brackets

opaque mantle Mar 30, 2024, 7:30 AM

#

serene scaffold What is a "data insight"? Pandas is the most popular library for reading tabular...

Thanks 🙂

#

But why unfortunately

#


orchid forge Mar 30, 2024, 8:49 AM

#

hey

orchid forge Mar 30, 2024, 8:52 AM

#

mild grotto just wanted to share this gif I made: an equirectangular projection of a cubed s...

wow

long canopy Mar 30, 2024, 10:08 AM

#

@final kiln did you find any discord server where people talk about cloud dev? not finding much

#

the AWS discord server is good tho but specific to AWS

final kiln Mar 30, 2024, 10:12 AM

#

long canopy @final kiln did you find any discord server where people talk about cl...

Haven't searched for it

rocky ridge Mar 30, 2024, 11:12 AM

#

https://discord.com/channels/267624335836053506/1223579820055724072

slow lynx Mar 30, 2024, 2:08 PM

#

I know it is kind of a useless question, but is ReLU actually better than sigmoid in NN forward_prop?

serene scaffold Mar 30, 2024, 2:11 PM

#

slow lynx I know it is kind of a useless question, but is ReLU actually better then sigmoi...

it's difficult to say what makes one activation function "better" than another. it depends on the situation. and even then, it might not be straightforward to explain why one activation function seems to work better.

the important thing is that it's a non-linear function
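A quick numpy illustration of why the non-linearity is the important part. Without one, two stacked linear layers collapse into a single linear layer, so the extra depth buys nothing. Biases are omitted for brevity, and the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 5)), rng.normal(size=(5, 2))
x = rng.normal(size=(10, 3))

no_activation = (x @ W1) @ W2
collapsed = x @ (W1 @ W2)  # one equivalent 3x2 layer
with_relu = np.maximum(0, x @ W1) @ W2  # ReLU in between breaks the collapse

print(np.allclose(no_activation, collapsed))  # True: same as a single linear map
```

Any non-linear activation (ReLU, sigmoid, tanh, ...) prevents this collapse. Which one trains better is the situational part.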

slow lynx Mar 30, 2024, 2:12 PM

#

Okay i already thought so

#

Just got into NNs, but i'm learning along the way as i develop

final kiln Mar 30, 2024, 2:27 PM

#

slow lynx Just got in to NN, but i'm learning on the way as i develop

Also beware of the random initialization of the weights, it has been the deciding factor for me many times

slow lynx Mar 30, 2024, 2:29 PM

#

Really, how could you counter randomness?

serene scaffold Mar 30, 2024, 2:31 PM

#

slow lynx Really, how could you counter randomness?

you don't.

slow lynx Mar 30, 2024, 2:32 PM

#

Soo, your model could work, but randomness can actually f it up?

final kiln Mar 30, 2024, 2:32 PM

#

slow lynx Really, how could you counter randomness?

Wdym ?

eager oriole Mar 30, 2024, 2:32 PM

#

Hi guys

#

Wanted to know what it was like being too stupid to understand what a pointer is

serene scaffold Mar 30, 2024, 2:32 PM

#

slow lynx Soo, your model could work, but randomness can actually f it up?

you start with randomly initialized weights, and make slight adjustments to them during each training iteration. it doesn't mean that your model might "randomly suck"

eager oriole Mar 30, 2024, 2:32 PM

#

Can you tell me

slow lynx Mar 30, 2024, 2:33 PM

#

eager oriole Wanted to know what it was like being too stupid to understand what a pointer is

Not everyone here only programs in python bru

serene scaffold Mar 30, 2024, 2:33 PM

#

eager oriole Wanted to know what it was like being too stupid to understand what a pointer is

pointers are part of programming languages. not AI. see #❓｜how-to-get-help

eager oriole Mar 30, 2024, 2:33 PM

#

No shit

slow lynx Mar 30, 2024, 2:33 PM

#

I code in C

eager oriole Mar 30, 2024, 2:33 PM

#

The only channel that's actually active

slow lynx Mar 30, 2024, 2:33 PM

#

So yeah i know what a pointer is

serene scaffold Mar 30, 2024, 2:33 PM

#

eager oriole The only channel that's actually active

that doesn't mean you can ask an off-topic question.

#

!rule 7

arctic wedgeBOT Mar 30, 2024, 2:34 PM

#

Rules

7. Keep discussions relevant to the channel topic. Each channel's description tells you the topic.

final kiln Mar 30, 2024, 2:34 PM

#

slow lynx Really, how could you counter randomness?

What I mean is that there are various forms of random init, and choosing one over the other has sometimes been the final step in making my model train correctly

slow lynx Mar 30, 2024, 2:35 PM

#

final kiln What I mean is that there are various forms of random init, and choosing one ove...

It shouldn't really matter right because the model will do back_prop and change the weights accordingly right?

final kiln Mar 30, 2024, 2:36 PM

#

slow lynx It shouldn't really matter right because the model will do back_prop and change ...

I don't know all the reasons it might matter, but the one I can think of is if you do a random init where your values would make the model susceptible to overflow when computing any of the steps

slow lynx Mar 30, 2024, 2:36 PM

#

Ahh yeah alright

final kiln Mar 30, 2024, 2:37 PM

#

But if you look in the torch documentation you might see the various inits and links to the corresponding research paper

slow lynx Mar 30, 2024, 2:38 PM

#

I can see, but for random init between -0.5 and 0.5 it shouldn't be much of a problem i guess

final kiln Mar 30, 2024, 2:38 PM

#

slow lynx I can see, but for random init between -0.5 and 0.5 it shouldn't be much of a pr...

I mean it shouldn't

slow lynx Mar 30, 2024, 2:38 PM

#

I hear a "but..." in that message 😅

final kiln Mar 30, 2024, 2:39 PM

#

But experience has taught me otherwise

#

Like I'm just saying, if you're stuck, this may be one of the knobs you gotta look at
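For reference, the two most common schemes scale the random values by the layer's fan-in/fan-out instead of using a fixed range like [-0.5, 0.5]. Glorot/Xavier uniform draws from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). He/Kaiming normal uses std = sqrt(2 / fan_in) and was designed with ReLU in mind. PyTorch's versions live in `torch.nn.init`, e.g. `xavier_uniform_` and `kaiming_normal_`. The numpy sketch below compares all three, with depth and width chosen arbitrarily. It pushes random inputs through a deep ReLU stack and shows how the scale of the activations explodes, vanishes, or stays put depending on the init.

```python
import numpy as np

rng = np.random.default_rng(0)

def fixed_uniform(fan_in, fan_out, a=0.5):
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def xavier_uniform(fan_in, fan_out):
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def activation_std_after(init, depth=20, width=512):
    """Push random inputs through `depth` ReLU layers and report the final activation std."""
    h = rng.normal(size=(256, width))
    for _ in range(depth):
        h = np.maximum(0, h @ init(width, width))
    return h.std()

for init in (fixed_uniform, xavier_uniform, he_normal):
    print(init.__name__, activation_std_after(init))
```

With 512 inputs per unit, [-0.5, 0.5] multiplies the activation variance by about 21 per ReLU layer. Xavier (tuned for tanh/linear layers) halves it per layer, and He keeps it roughly constant. This is one concrete way the init can make or break training even though backprop adjusts the weights afterwards.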

slow lynx Mar 30, 2024, 2:40 PM

#

Well yeah i have encountered overflow already but that was because my sensor data wasn't normalised xD

#

So it was working with sensor data with a max of 255
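A minimal sketch of the fix described: scale raw 0-255 readings into [0, 1] before they reach the network, or standardize them using statistics computed on the training set only. The 0-255 range comes from the message above. The sample array is made up.

```python
import numpy as np

def minmax_scale(x, lo=0.0, hi=255.0):
    """Map readings from the sensor's known range [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

def standardize(train, other):
    """Zero-mean/unit-variance scaling using statistics from the training data only."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant features
    return (train - mean) / std, (other - mean) / std

raw = np.array([[0, 128, 255], [64, 32, 200]], dtype=float)
print(minmax_scale(raw))
```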

slow lynx Mar 30, 2024, 2:40 PM

#

final kiln Like I'm just saying, if you're stuck, this may be one of the knobs you gotta lo...

Good advice

final kiln Mar 30, 2024, 2:40 PM

#

The explanation I gave was just the first plausible thing I could think of, idk if it's the actual reason, there's papers on this stuff

slow lynx Mar 30, 2024, 2:41 PM

#

Alright well good to know

#

Is there btw any tutorial that actually explains back prop any good? It is still kind of magic in my eyes

final kiln Mar 30, 2024, 2:42 PM

#

slow lynx Is there btw any tutorial that actually explains back prop any good? It is still...

3blue1brown has a good playlist on it

When going through any explanation, just remember that you're just doing the chain rule
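To make "it's just the chain rule" concrete, the sketch below takes a single sigmoid neuron with a squared-error loss and differentiates it by hand. The result is checked against a finite-difference estimate. The numbers are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def grads(w, b, x, y):
    # Forward pass, keeping the intermediates
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: multiply the local derivatives along the path (chain rule)
    dL_da = a - y  # d/da of 0.5 * (a - y)^2
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0  # z = w*x + b
    return dL_da * da_dz * dz_dw, dL_da * da_dz * dz_db

w, b, x, y = 0.7, -0.2, 1.5, 1.0
dw, db = grads(w, b, x, y)

# Central finite differences as an independent check
eps = 1e-6
dw_numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
db_numeric = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(dw, dw_numeric)
print(db, db_numeric)
```

A full network does the same thing, just with more intermediates per layer and matrices instead of scalars.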

slow lynx Mar 30, 2024, 2:44 PM

#

alr. thanks

lapis sequoia Mar 30, 2024, 3:07 PM

#

nothing else?!

```py
import seaborn as sns

titanic = sns.load_dataset('titanic')
```

serene scaffold Mar 30, 2024, 3:09 PM

#

lapis sequoia nothing else?! ```py titanic = sns.load_dataset('titanic') ```

nothing else what?

feral kernel Mar 30, 2024, 5:15 PM

#

Hi, i'm trying to fine-tune Mistral on a custom dataset, but it is not working... I'm getting the error `name 'tokenized_val_dataset' is not defined`, but I did define it. https://github.com/huggingface/transformers/issues/29966


serene scaffold Mar 30, 2024, 5:46 PM

#

please always give code as text

#

!code

arctic wedgeBOT Mar 30, 2024, 5:46 PM

#

Formatting code on Discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

fathom tide Mar 30, 2024, 6:44 PM

#

Does anyone know how to fix this?

serene scaffold Mar 30, 2024, 6:47 PM

#

fathom tide Does anyone know how to fix this?

Hello, please always give text as actual text. not screenshots.

#

if you get an import error of the form "cannot import name x from y", it means that you do have y installed, but y doesn't have x

#

langchain is actively developed, so this might be a version mismatch.
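A quick way to check which version is actually installed, so you can compare it against what a tutorial used, is `importlib.metadata` from the standard library. Minimal sketch; pass the distribution name you pip-installed.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed distribution version, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("langchain:", installed_version("langchain"))
```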

fathom tide Mar 30, 2024, 6:48 PM

#

Is there any way to fix it?

serene scaffold Mar 30, 2024, 6:49 PM

#

are you following a tutorial, or what?

fathom tide Mar 30, 2024, 6:49 PM

#

yeah

serene scaffold Mar 30, 2024, 6:49 PM

#

please link the tutorial

fathom tide Mar 30, 2024, 6:50 PM

#

serene scaffold please link the tutorial

https://www.youtube.com/watch?v=Iyh6ftlZ2Q0 im using this tutorial and just googling a bit to try and give my llm a search tool to use


serene scaffold Mar 30, 2024, 6:51 PM

#

fathom tide Does anyone know how to fix this?

I don't see that import statement in the tutorial. What did you read that gave you the expectation that it would work?

fathom tide Mar 30, 2024, 6:52 PM

#

serene scaffold I don't see that import statement in the tutorial. what did you read that gave y...

wait, I meant this tutorial https://www.youtube.com/watch?v=QI3HrPz7ZlI

YouTube

Vuk Rosić (LevelUpNow.dev)

Serp API + LangChain: Give Internet Access To GPT-4


serene scaffold Mar 30, 2024, 6:53 PM

#

fathom tide wait i meant this tutorial https://www.youtube.com/watch?v=QI3HrPz7ZlI

that video is six months old, so you'll have to figure out which version of langchain was the newest when that video came out
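Here is one way to do that lookup, sketched under some assumptions. PyPI's JSON API (`https://pypi.org/pypi/<project>/json`) returns a `releases` mapping from each version string to its uploaded files. Each file carries an `upload_time` timestamp in UTC. The helper names are hypothetical. Releases are selected by their first upload date, so pre-releases are not filtered out.

```python
import json
import urllib.request
from datetime import datetime


def fetch_releases(project):
    """Download the `releases` mapping from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["releases"]


def newest_release_before(releases, cutoff):
    """Return the version whose first upload is the latest one on or before `cutoff`.

    `releases` maps version -> list of file dicts, each with an "upload_time"
    string in YYYY-MM-DDTHH:MM:SS form (UTC, no timezone suffix).
    """
    best_version, best_time = None, None
    for ver, files in releases.items():
        if not files:  # some releases have no uploaded files
            continue
        first_upload = min(datetime.fromisoformat(f["upload_time"]) for f in files)
        if first_upload <= cutoff and (best_time is None or first_upload > best_time):
            best_version, best_time = ver, first_upload
    return best_version
```

For example, `newest_release_before(fetch_releases("langchain"), datetime(2023, 9, 30))` would return the newest release as of 30 September 2023. The right cutoff depends on the video's actual upload date.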

#

looks like it was this one https://github.com/langchain-ai/langchain/tree/v0.0.283


#

so you'd need to do `pip install git+https://github.com/langchain-ai/langchain.git@v0.0.283`

#

if that doesn't work, copy and paste the entire error message as text.
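Sometimes the error is caught inside your own code, for example in a loop or a logging handler, rather than printed to the terminal. In that case, `traceback.format_exc()` from the standard library returns the full traceback as a string you can paste. This is a minimal sketch. `run_and_capture` and `broken` are hypothetical names.

```python
import traceback


def run_and_capture(func, *args, **kwargs):
    """Call func; on failure return the full traceback text instead of raising."""
    try:
        func(*args, **kwargs)
    except Exception:
        # Same text traceback.print_exc() would print for the active exception.
        return traceback.format_exc()
    return None


def broken():
    # Hypothetical failing call, standing in for the import that breaks.
    from json import does_not_exist  # noqa: F401


report = run_and_capture(broken)
print(report)  # paste this whole text, not a screenshot of it
```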

#

@fathom tide

fathom tide Mar 30, 2024, 6:59 PM

#

@serene scaffold thanks, but I'm looking for a different way that works with the newest version

serene scaffold Mar 30, 2024, 7:01 PM

#

okay

long canopy Mar 30, 2024, 7:38 PM

#

@final kiln what database tech have you been using?

#

need to start thinking about organizing my logs, metrics and data lol

final kiln Mar 30, 2024, 7:39 PM

#

long canopy @final kiln what database tech have you been using?

In which context?

#

For logging training metrics and such, I've been using MLflow, which I connected to a managed PostgreSQL DB in AWS
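MLflow's `mlflow server --backend-store-uri` accepts a SQLAlchemy-style database URI, so pointing it at a managed PostgreSQL instance comes down to building one. Here is a sketch with placeholder credentials and host. The helper name is hypothetical. `quote_plus` escapes characters such as `@` and `/` in the password. A PostgreSQL driver (e.g. `psycopg2`) also has to be installed alongside MLflow.

```python
from urllib.parse import quote_plus


def postgres_backend_uri(user, password, host, database, port=5432):
    """Build a SQLAlchemy-style PostgreSQL URI, e.g. for MLflow's backend store.

    User and password are percent-encoded so characters like '@' or '/'
    don't break the URI.
    """
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; a real managed instance has its own endpoint and credentials.
uri = postgres_backend_uri("mlflow", "p@ss/word", "db.example.com", "mlflow")
print(uri)  # postgresql://mlflow:p%40ss%2Fword@db.example.com:5432/mlflow
# Then, for example:  mlflow server --backend-store-uri "<uri>"
```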

#

For vector DBs, I've so far used OpenSearch and Qdrant, and I recommend Qdrant for sure, hands down

#

Though there's a potential benefit to using PostgreSQL as a vector DB, because you can get a managed solution in AWS

#

Postgres has a vector extension (pgvector), but I haven't used it
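Vector stores like Qdrant, OpenSearch's k-NN search, and pgvector all answer a nearest-neighbour query over embeddings. Below is a brute-force sketch of that query using cosine similarity. The 3-dimensional vectors and document ids are toy data. Real systems typically replace the linear scan with an approximate index such as HNSW. In pgvector, for instance, `<=>` is the cosine-distance operator.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Exact nearest neighbours: score every stored vector, keep the best k ids."""
    scored = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy 3-d "embeddings" keyed by made-up document ids.
store = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.9, 0.1, 0.0], "doc_c": [0.0, 1.0, 0.0]}
print(top_k([1.0, 0.05, 0.0], store))  # ['doc_a', 'doc_b']
```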

#

Qdrant also has a managed offering

#

But it's a smaller and more recent company, so there's more risk

#

And for normal stuff I've used MySQL

long canopy Mar 30, 2024, 7:42 PM

#

hm, I see. Have you heard of Redis? Apparently it's a unified model for both vector + SQL

#

thanks for the comments btw

final kiln Mar 30, 2024, 7:42 PM

#

Yeah there's redis too

#

I use Redis a lot and it's really good; it may actually be the best thing I've used so far

#

Like it has never given me trouble

#

I just set it up and I forget it exists

#

You don't get that a lot. OpenSearch, for example, is a pain in the butt; I had so much trouble with it

#

MySQL is very good too, ofc, but I think it's needlessly complicated to set up replication and other advanced stuff

#

Though you're gonna want to go managed at that point

long canopy Mar 30, 2024, 7:44 PM

#

hm, but then why don't you switch to Redis only?

final kiln Mar 30, 2024, 7:45 PM

#

Redis is an in-memory DB; it wasn't designed to persist data, and it's also NoSQL

#

Though I think you can set it up to persist data

#

It's also very useful for locking across processes, cuz it's single-threaded
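Here is a sketch of the usual single-instance Redis locking pattern. It assumes a redis-py `Redis` client, specifically its `set(..., nx=True, px=...)` and `eval(script, numkeys, *keys_and_args)` methods. `SET` with `NX` only succeeds if the key is absent. A small Lua script releases the lock only if the key still holds the caller's random token. This works because Redis runs each command, and each script, atomically. The lock name, TTL, and helper names are hypothetical. For locks spanning several Redis nodes, the Redlock algorithm in the Redis docs is the usual reference.

```python
import uuid

# Lua: delete the key only if it still holds our token, so we never release
# a lock that expired and was then acquired by another process.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


def acquire_lock(client, name, ttl_ms=30_000):
    """Try once to take the lock; return our token on success, None otherwise."""
    token = uuid.uuid4().hex
    # SET name token NX PX ttl_ms: only succeeds if the key does not exist yet,
    # and the TTL frees the lock if this process dies while holding it.
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, name, token):
    """Release the lock only if we still own it; return True if it was deleted."""
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1
```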

long canopy Mar 30, 2024, 7:46 PM

#

final kiln Redis is in memory db, it wasn't designed to persist data, it also is noSQL

it has an SQL module

final kiln Mar 30, 2024, 7:46 PM

#

long canopy it has an sql module

Didn't know that

long canopy Mar 30, 2024, 7:46 PM

#

yeah it also has a couple of persistence options

final kiln Mar 30, 2024, 7:46 PM

#

I mean if it does SQL I might try to use it

long canopy Mar 30, 2024, 7:47 PM

#

I'm wondering whether I should go full Redis or learn the other techs individually

final kiln Mar 30, 2024, 7:47 PM

#

I built this huge multi-container application

past meteor Mar 30, 2024, 7:49 PM

#

long canopy need to start thinking about organizing my logs, metrics and data lol

I use:

* Postgres
* Optuna
* MinioDB
* MLflow
* Tensorboard

long canopy Mar 30, 2024, 7:50 PM

#

past meteor I use: * Postgres * Optuna * MinioDB * MLflow * Tensorboard

hm no overlap between MLflow and Tensorboard?

past meteor Mar 30, 2024, 7:50 PM

#

different use cases

long canopy Mar 30, 2024, 7:50 PM

#

never heard of MinioDB, will look it up

past meteor Mar 30, 2024, 7:50 PM

#

It's just on-premise AWS S3