#data-science-and-ml
1 messages · Page 172 of 1
Oh ok
I'm in Idaho in the USA
What's your next plan for your study
I might do a dbt Fundamentals and a data certification
For a digit classification project is there really much data exploration you can do?
not really. and data exploration won't tell you anything you don't already know. like, 1 sometimes looks like 7. 3 sometimes looks like 8.
some people do a closed 4 that looks like a 9
What's the best data analytics certificate since I'm about to get a diploma in a month. School doesn't teach that much and I need the skill. Any of these things worth it? Trying to get much within a month. Is Python mandatory?
is Python mandatory?
the best way to answer this is to look at job listings and see. Chances are, they'll all say Python, or maybe "Python or R".
Usually they use python for ML right?
@serene scaffold, is ML hard in jobs? I took a course and usually use six similar charts and calculations, which I copy and paste but make some small changes to it.
you usually need a masters to stand out. I'm not sure how what you said about charts relates to ML.
Like graphs
making data visualizations isn't ML.
I'm using the phone but this is one of the homework
Plots
this is just data analysis. not ML.
Oh ok
did you train a model at any point?
Its been while. This is the one?
yes, that involves model training.
They course I took is like beginner of ML?
yes, beginner for sure.
👍🏻
@serene scaffold, What's the best approach to get some skills for data analytics? My goal is to project by learning the DBT Fundamental if you recommend and maybe a certification for my resume
Good approach?
from what I understand, certs aren't very valuable for analyst jobs.
how much python should i know before starting data science
you should know all the basics. if you start doing something with data science, and you're confused by the semantics of a for loop, or something like that, you'd need to back track
I have prior experience in programming so shouldn't take too long to learn
I want build fundamental
I need to know where to start
@serene scaffold Tell me where to start. I know bits by bits from different area
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
I know that data science use Python and Data analysts not really @serene scaffold
I struggle with Python. Took 3 course and it always a headache for me
Hey guys, I've been very interested in studying and learning about ML/AI architectures, algorithms, and implementation. I've been reading research papers and been digging into some open source code bases to both deepen my understanding of the architecture, but also grow knowledge and comptence with python and these types of codebases in general.
Im just wondering does anyone here have any similar interests and positions that can offer me advice in this process.
Thank you
sorry for late reply thats annoying how does it compare to other thinking models and also you think we would be able it with system commands or finetuning?
what model are you using or you using chatgpt api?
I've only done some simple tests on the website, usually I wait a few weeks so the software issues are resolved before running local
for example a fix to the chat template a few days ago by unsloth reports that this shoots it up in the polyglot benchmark from 40ish to 60ish
my initial impressions:
- the
sorry I can't help with thattrigger frequency isn't too bad. if your question is complex and needs reasoning, it seems to actually reason about that rather than getting stuck on thinking about policies. - if you can get past the censor, its actual abilities aren't too bad.
- people have said that these feel like
phi, as in their inherent world knowledge is really bad and are meant for agentic use. haven't tested this myself yet. - when it comes to creative writing... yeah probably not gonna test further, all the issues about
ossshow up here and it's not worth it
the most interesting thing about the recent open models is they're all large MoEs with small active parameters - this means you can reasonably run them on consumer hardware, a good chunk of the model can go on normal ram and still run at respectable speeds. these include gpt-oss of course, but also glm, qwen, ernie, etc.
this also means that gpt-oss is less competing against other reasoning models, but these MoE models; on that front I think I see more people preferring qwen's or ernie's similar sized model over the oss 20b
the oss 120b currently has a niche of being very big but having tiny active params even compared against other MoEs, which might mean you can do pure cpu inference with this one (though I don't have enough normal ram to test myself)
Shall look at a job in data entry or reporting assistance/clerk in order to become a data analyst since most require some type of experience?
ive got a plot of precision and recall values for different thresholds
does the threshold value we now choose just depend on our goal?
as in if we want high precision we may choose a higher threshold and vice versa for recall?
I’ll be only finding jobs that need Atleast 1-3 experience 🤔
what are some models i can try for digit classification?
Im currently doing support vector machine classifier (SVC), random forest classifier, K nearest neighbours classifier
That’s really interesting thanks for the insight
yes
maybe try gradient boosting classifier too? it's a kind of decision tree ensemble method
also do hyperparameter tuning if you haven't already, it's possible you might be able to squeeze a little extra performance out of the classifier with better hypermarameters
what is better approach to combining function calling and structured outputs?
So i have a llm, that has some tools it can call. After it thinks task is finished it has to give final combined output in a structured response format.
- Use structured output explictly. I validated that gemini-2.5 flash can call both tools and give final structured output.
- Create a function called give_final_output(response_format: ResponseFormat). where ResponseFormat is pydantic representation of output. Then i can stop llm when it asks to call this function and use the arguments as final output.
Please, I am trying to learn python and leaning towards AI Engineering. Where can I find resources to help me master python. Would be better if these resource is in a jupyter notebook or colab or in any other interactive form that allows me to read code and practice mine right after each learned concept. Would be nice if it contains loads of exercises for me to do.
yeah torture yourself by studying the PyTorch api, the keras api and the tensorflow api then tell chatgpt to create you a colab workbook
I am a beginner. Any resource for absolute beginners to learn the basics of python from beginner to advanced?
Kaggle learn
You can pass your mathematics. It will be important in the future. There is a lot of difference between 90 %and 95 %model success. But if you are just learning, you won't feel that much difference.
University education dives into a lot of mathematics. (Including statistics)
Alright. Thank you very much for this.
Hello everyone. I've been working as a Software Developer and Data Engineer for a few years now. However I have a master in AI with a focus on ML and Deep Learning. I miss doing maths and analysis and I've been looking at job-postings as an AI/ML specialist. Many of these require practical experience with PyTorch and Tensorflow. Now I obiouvly have used Tensorflow as part of my studies, but that was before Keras was part of it and I'm sure both libraries have evolved a lot in the past 6 years.
My question is, how would go about "relearning" and catching up with Modern ML practices and to learn PyTorch (I believe it's the more popular framework now, right?) to an adequate level?
Hi guys,
anyone can suggest more advanced Object Detection like YOLO or Detectron, which are up to date and still maintained?
i would say pytorch is the more popular one now indeed, as for relearning id advice to do small projects with the framework maybe check whats needed in the field u want to apply in.
What's keras?
Hey i have to write a review paper on it give me a suggestion my domain will he al/ml
Any experienced AI/Ml engineers?
I've been tryna train models on a dataset I've used and it just doesn't stop overfitting. Reaches like 99% accuracy within the 1st epoch. Idk what to do, I tried GroupShuffleSplit, dropout, Data augmentation, early stopping. Idk what to do. Im just exhausted.
Which one is good enough to learn. I wasn’t fun of reading? I rather do tutorials and watching videos
Overfitting often means you have too many parameters for the problem. Consider reducing the width of your layers (assuming you’re making NNs), and maybe working with a training set that you introduce some sort of ‘noise’ to.
The other thing is that 99% on epoch 1 is only a problem if you’re not also getting 99% on your test set 😄 Some problems are easily solved with these methods.
There could also be data issues too, unless there's more you haven't mentioned there isn't enough information to conclude that overfitting is the problem, what dataset are you using and what set is the 99% accuracy coming from?
guys, I am looking to train a vehicle price prediction model(among various types of vehicles) using randomforest. I scraped 200k+ data from a dealership website, filtered 50K with no price information and did further processing like - deleted unneccessary fields -
[ 'title', 'retail_price', 'location', 'state', 'is_auction', 'is_for_sale', 'dealer', 'description', 'has_images', 'image_count', 'video_count', '360_image_count', 'specs.Serial Number', 'specs.Stock Number', "listing_date"]
So now the dataset has 150K records with [Category, Manufacturer, Year, Model, Condition(1: new, 0: used), Price, Hours] (1305 records doesn't have Model info, 35773 doesn't have Hours info)
The result is MAE: 30K USD, which is not good.
Any tweeks to improve the result?
I appreciate your help in advance.
I also posted my issue in #1035199133436354600
I'm wondering if it might be a data issue
What is Hours?
I guess I'm wondering if it's reasonable to predict the price of a car from the information you have in the data, and if more might help
For example, I'm curious if state is relevant, since I'd think a car in California or New York might be more expensive than a car in Utah or Alabama
Also I'm wondering if is_auction has implications for the price, that is, if it means the price listed is a starting bid rather than the final sale price
It's possible you could get better results with fine tuning or trying different types of models, but I am curious about your feature selection process and how some of the removed features were determined to be unnecessary
See above for some thoughts
me when I fine-tune a model trained on a naturally massively imbalanced dataset with a uniformly balanced dataset and there's indication of improvement during training and validation, but benchmarking on a fresh naturally massively imbalanced dataset shows significant drop in performance across a slew of metrics
so now I'll be using a different architecture (but that has more reasons beyond what I stated above of course)
yes
previous experience indicated that to be the case, yes
yo @spring field what happen?
the core issue was bad performance for one of the classes (with the lowest representation) in the imbalanced dataset, now, the model that was being fine-tuned was actually fed sub-par data of that class in particular during the initial training
however, given the low representation, fine-tuning on a naturally imbalanced dataset would likely produce only marginal improvements, so a decision was made to balance the fine-tuning dataset
the experience that such an approach would work came from previous experiments and likely some papers, though those other experiments were done on a different architecture
but again, there are other reasons beyond these latest failed fine-tuning experiments to move to different architecture
happen when? 
is that like a "what's up?" type question? 😅
In which case, uhh, nothing much I suppose? idk, haven't talked here that much lately, but I'm doing great
lol you were typing for long time
I thought your cat was testing the keyboard
lol
ah, I suppose a crucial detail here is also that the new architecture will be initially trained only on that one class
yes
UNET do be cool, but nowadays transformers have taken over pretty much everything, lol
There's Mamba though, wonder how it will fare in this environment
moved to #1035199133436354600
how can I increase performance of catboost classification model, any new features or addition
if there was a one-size-fits-all solution to this, everyone would just do that every time.
can you be as obnoxiously specific as you can about the nature of the problem and the model's current performance?
@vagrant oyster if you ask a question in more than one place, please direct people to one of those places, to reduce duplication of effort.
Thanks for pushing me to be specific — here’s what I’ve done so far:
Preprocessing / Feature Engineering:
All features converted to categorical strings
Log-transform of skewed numerics (abs(skew) > 1)
Sine/cosine transforms for cyclical time features
Numeric features binned into 15 categories
Missing values filled with sentinel "NA"
Passed all feature indices to cat_features
Model setup:
CatBoostClassifier (Logloss, AUC, learning_rate=0.08, depth=7, iterations=1000, od_type='Iter', od_wait=100)
Stratified 5-fold CV + 80/20 validation split
Current best validation ROC AUC: 0.89
Tried already:
Hyperparam sweeps (depth, learning_rate, l2_leaf_reg, border_count)
Feature removal based on permutation importance
Various categorical binning/grouping methods
Looking for:
Specific CatBoost tricks beyond the basics — e.g., CTR features, one_hot_max_size tuning, target statistics on grouped features, interaction features between high-importance vars, or advanced quantization settings.
do you have a confusion matrix for the current performance?
MY current script doesn’t produce a confusion matrix — it only calculates AUC, should I use this?
No. try changing your script to make a confusion matrix.
ok doing that
(128,830) → True Negatives (TN)
(3,072) → False Positives (FP)
(6,965) → False Negatives (FN)
(11,133) → True Positives (TP) - As per confusion matrix
so it's binary classification?
yes
how many positive samples are there and how many negative samples are there?
in the data being used
yes, how many of each
0 = 659512
1 = 90488
considering all this, is the problem an abudence of false negative or of false positives?
false negatives are 6,965 and false positives are 3,072 as per confusion matrix
look at the instances that are getting misclassified and see what they have in common
anyone there
need help
im working on mosdac [ISRO] data scraping i need to scrape all the data from the missions section in json format so i can create a knowledge graph but the sub sections in the missions catogory are so unpredictable some migh contain images sub headings and tables other time its just text idk how to do it please HELPPPPPPPPPPPPPPPPPP!!!!
please anyone bro
Hello guys
hello
hello
can anyone please help me if youre online
hello
@past bramble @obsidian plume
Greetings!
The Programming Club shall be organizing CredTech - a FinTech Hackathon in association with Deep Root Investments from 16th to 22nd August 2025. The hackathon, based on the applications of Machine Learning and Development in Finance involves tackling real-world challenges by utilizing cutting-edge technology.
Deep Root Investments is a forward-looking investment management firm specializing in credit risk strategies enhanced by artificial intelligence and machine learning. They combine deep expertise in credit science with cutting-edge technology to uncover mispriced credit opportunities and deliver differentiated risk-adjusted returns. Core Specializations of the firm include:
- Credit Risk Arbitrage
- AI/ML-Powered Credit Assessment
- Advanced Credit Risk Measurement
This is happening and i am looking for team members of indian origin
Prerequisites
Some ai/ml knowledge and enthusiasm
I do too now days later actually just wanted to say ty
guys what languages do i need for ai n machine learning ik python is one of em but im currently learning c coz of college so is that significant in learning it or should i just take it as another academic sub
the dominant language for ml now is python
specifically, python with libraries that are written in a low level language, but unless you're one of the maintainers you really don't have to worry about that
Can anyone provide me a link of Resources to AGI advancements and Open Source LLM advancements in relations to like Gemini competitors etc handling large codebases?
so except for python is there anything else i need?? any language or smth??
it's less about the language and more about the math behind it if you want to understand ml
ik math foundation is imp for ml but ive only started leaning coding so im not sure what to do after python
Look use sklearn
Its fast enough for you to make models simple ones as well as ensemble models
anyone wanna help me fix my auto email generator? 😭
Hey everyone im new to this community and discord thing
I am a cse student and i am clueless of what to do with my life
I heard about data analysis and wanted to know if it is a good option?
Just learn some C and you will understand a lot of other languages from that if you want to learn another language.
(And how things like compilers, linkers, etc work)
After that you can get into others pretty easily and start reading some open source projects (the ML libraries themselves (or even Python itself)).
This is not nearly as difficult as the math needed, it's just an extra detail that you need to learn if you want to work on these libraries or make your own.
(Or just want to improve at programming in general)
It depends
Python programming (incl libraries) is enough in many cases and 'after python' can take years of practice like any other language really.
hey.. i couldnt work out the sync net part.. i couldnt even run it on server with gpu without crashing,, is there any other way to identify the speaking person in video
Hello
Hey everyone
Anyone here interested to participate in a credtech hackathon
Pre requisites - Some ai/ml knowledge (sklearn) and enthusiasm
This is the info about the hackathon
Greetings!
The Programming Club shall be organizing CredTech - a FinTech Hackathon in association with Deep Root Investments from 16th to 22nd August 2025. The hackathon, based on the applications of Machine Learning and Development in Finance involves tackling real-world challenges by utilizing cutting-edge technology.
Deep Root Investments is a forward-looking investment management firm specializing in credit risk strategies enhanced by artificial intelligence and machine learning. They combine deep expertise in credit science with cutting-edge technology to uncover mispriced credit opportunities and deliver differentiated risk-adjusted returns. Core Specializations of the firm include:
- Credit Risk Arbitrage
- AI/ML-Powered Credit Assessment
- Advanced Credit Risk Measurement
what are the rewards
Lemme tell one sec
@serene dew
Its not quite of a much big hackathon but these are according to 1st,2nd and 3rd
571 Usd
285 Usd
171 Usd
anyone one know how to do the FinBERT fine turning?
here are some notebooks I wrote a while ago that fine-tune BERT. https://github.com/center-for-threat-informed-defense/tram/tree/main/model-development
well i am just starting out machine learning
i had a simple problem to make a single neuron learn that when input >5 the activation must be one otherwise zero
i was trying to analyize the behaviour of stochastic and mini batch descent and too my surprise the mini batch approach was less effective , can someone explain me why ?
the boundary condition is b/W =- 5 (ideal solution), i have obviously filtered out the outliers
I'm reading a book and they leverage conda. I am most familiar with simply setting up a python venv
Is it necessary to swap to conda for this? Or can I stick to just using venvs?
No, don't switch to conda.
When you get a moment could you expand why? I'm looking at a very large conda yaml file and there's a lot of "dependencies" that are prior to the - pip: section of dependencies, and that's both confusing and concerning
Code that actually requires conda to run is getting increasingly rare to the point that I don't think there's any compelling reason for DS/AI people to use something different than the rest of the python community.
I have never used conda since starting in 2018. I work for a research company, and conda is banned here.
I see, that's refreshing to hear. This book I'm reading seems to use conda a lot however.. maybe I can slug through it and try making it work with venvs instead
Thanks!
How old is the book?
Yeah, seems pretty unlikely that their code requires it.
can somebody help
you can run a good chunk of the model on cpu and still have nice speed
it's a big advantage with these small active param moes
Anyone here can help?
How do you get rid the the subplot error on the bottom?
For the figure I did
ax1,ax2 = plt.subplots(2,1,
layout='constrained')
ax1 = plt.subplot(2,1,1)
ax1.plot(PPID, color='red', label= 'PPI Energy')
ax1.set_xlabel("PPI Energy")
ax1. set_ylabel("Index 1982-1984=100")
ax1. legend()```
I then did the same for the second plot
Except add a super title on top
but I keep getting extra numbers in the x axis
I followed the Stack Overflow suggestion but it didn't work
when you mean extra numbers, do you want the lines on the graph to not have any space on the left and right?
Like on the bottom of the second graph there's two scales
Like 0-1 and then the date column
That is your ax2?
Yeah
Can you post the code for that?
Sure give a minute
If you were to try to replicate the code you'll need a FRED account though
I just want to see the code, I can always fake data if needed
Can you comment out the sharex=ax1 for a second and see what the bottom graph looks like?
Sure but I do want to connect the zoom for both graphs
So here's the interesting part that's kinda hard to solve,
But the zoom goes away. Where I can't zoom both graphs simultaneously
The labels correct themselves but the zoom is gone
So I'm not sure how to keep the shared zoom and make sure the axis doesn't create a overlap.
Okay, so the secondary ticks/labels only appear with the sharex
Try setting the sharex when you create the figure
I can't add that the plt.figure part so do I add that elsewhere? I can't add it to ax2 either
I get a error
Ah sorry, it's when you create the subplots
plt.subplots(2, 1, layout="constrained", sharex=True)
That actually worked
Thanks
I did lose my x axis on my top plot but it's not a big deal since they have the same axis
can anyone helpp :c
@inland mulch You've not posted any code so it's impossible for anyone to know what you might have done wrong
def activation(inputx,weight,bias):
def sigmoid(x):
return (1/(1+np.e**-x))
return sigmoid((weight*inputx) + bias)
def modelN():
global weight,bias
weight=np.random.uniform(-1.5,1.5)
bias=np.random.uniform(-5,5)
weight_0=weight
bias_0=bias
def gradient(param,inputx,activation,expected):
def rectify(param):
if param =="W" : return 2*(activation-expected)*inputx*activation*(1-activation)
if param == "b" :return 2*(activation-expected)*1*activation*(1-activation)
rectifyVal=rectify(param=param)
return rectifyVal
def correction(param,hyperparam,rectifyVal):
return (param - (hyperparam*rectifyVal))
def epoch(number):
global weight,bias
if number == 0 : return
for j in range(number):
for i in range(len(trainL)):
training_data=trainL[i][0]
expected=trainL[i][1]
activation_value=activation(inputx=training_data,weight=weight,bias=bias)
rectify_W=gradient(param="W",inputx=training_data,expected=expected,activation=activation_value)
rectify_b=gradient(param="b",inputx=training_data,expected=expected,activation=activation_value)
weight=correction(param=weight,hyperparam=0.1,rectifyVal=rectify_W)
bias=correction(param=bias,hyperparam=0.1,rectifyVal=rectify_b)
epoch(100)
return [bias/weight,weight,bias,weight_0,bias_0] # used for graph plotting
I don't have time to dig into it fully but I don't see where either the stochastic vs minibatch aspect comes in here. You're iterating over each of the training items individually so there's no batch involved.
Guys I’ve learnt the math (statistics) and python behind AI and Machine Learning. I also know neural networks and the weights and biases... but idk how to continue to be able to make things like the MNIST data set number recogniser and other basic AI projects. But how do I learn these things? Any tutorials and stuff to learn this? (I don’t like YT vids for learning but idm anything)
I'm not surprised minibatch isn't doing as well for a problem with a fixed single correct solution
For SGD the model is considering all points at once, for minibatch it's a smaller random subset
oops sorry , i thought iterating and correcting after each training example is called stochastic desent ( thats wha i've been told)
So the weights tend to get pulled around more in suboptimal directions with minibatch
by " all points at once" do you mean our true batch descent that considers all training examples at once and takes an average
or desent after each example ?
No sorry I'm the one that got them mixed up, sgd is one at a time
This is correct
Hm how are you evaluating the performance of the model other than whether it finds the ideal weights?
yeah so i was told that sgd should give us most noise because its not a true desent and makes the path (descent noisy)
but considering over 1000 final values
i dont see that , it makes the solutions most close and dense to ideal
im not
im only considering how close it is to the ideal solution
because the number of epocs are same for all : 100
I don’t really understand your gradient operation (possibly because it’s hard to read the code on my phone) but if you were processing a minibatch I’d expect to see code where it calculates gradients in several instances, averages them, and applies that. It would probably benefit from a different learning rate too.
just so we're on the same page, by stochastic you mean training on 1 sample at a time, and mini-batch batch_size samples at a time (where batch_size < dataset_size), yeah?
it has been observed in many places that too large batch_size can lead to quality degradation, and I don't think there's a precise answer to why this happens currently
yup i got my signal
tbh i am overcomplicating a simple problem
well for mini batch i had other code
def mini_batch(number):
global weight,bias
if number == 0: return
for k in range(number):
for i in range(int(len(trainL)/batch_size)): # since divide give 3.0
rectify_W_sum=0
rectify_b_sum=0
for j in range(batch_size):
training_data=trainL[(4*i)+j][0]
expected=trainL[(4*i)+j][1]
activation_value=activation(input=training_data,weight=weight,bias=bias)
rectify_W=gradient(param="W",input=training_data,expected=expected,activation=activation_value)
rectify_b=gradient(param="b",input=training_data,expected=expected,activation=activation_value)
rectify_W_sum+=rectify_W
rectify_b_sum+=rectify_b
weight=correction(param=weight,hyperparam=0.1,rectifyVal=(rectify_W_sum/batch_size))
bias=correction(param=bias,hyperparam=0.1,rectifyVal=rectify_b_sum/batch_size)
but dw, i should not think very much for this simple problem
First thing I would do is refactor the code so that the two solutions share as much code as possible.
The other alternative is just to use minibatch with a batch size of 1 - that is just stochastic gradient descent anyway
Try kaggle's beginner competitions; they also usually come with a simple course guiding you through the basics
unrelated but doesnt large batch size push the solution to be more precise , i.e more stable descent cuz true descent invloves taking in the average of all data at once ,i.e batch size of len(data)
Yes, the ‘ideal’ is to train on the whole set at once, i.e. a 100% batch size, but it’s disproportionately slow to do that
more precise is one way to look at it, and also more deterministic
at the extreme when batch_size == training_dataset_size, you follow a path completely determined by the gradient, which means you take 0 "wrong" steps and it converges fast, but it may lead you to a local optimum
at batch_size < training_dataset_size, you can imagine you sometimes take "wrong" steps, converging slower, but these wrong steps can also knock you out of local optimums
Training on the whole set as one batch converges fast in terms of iterations but incredibly slow in terms of real world time
(I’m ignoring the local optima problem though)
guys you have a complete roadmap for ai?
yeah, it is usually unreasonably expensive (computationally, but also lots of memory) to actually calculate the gradient for the entire dataset, so you really don't see batch training anywhere
ah i see
also i am moving on from this simple problem and advancing towards the 3b1b neural network playlist , where will try to code the digit recognition model myself , that will def take a while , cuz i will be DEFINITELY experimenting with EVERYTHING there cuz its so complex
but after that i dont have any definite path for where to go
any suggestions?
and also mentioned by Kylotan already, the difference between sgd, minibatch, and batch is tiny, you can just view them as training with 1 or batch_size or training_dataset_size samples at once, so in practice it's not really distinguished in most libraries
No such thing as a complete road map, as it’s not a solved problem.
No such thing as fully mastering a programming language either.
Just get stuck in and learn as you go.
Sorry I had to step away for a moment, i think there are infinitely many solutions to the problem, aren't there? Because if you're just trying to see if a number is over a threshold you can scale it with the weight and learn a corresponding intercept
With any kind of descent algorithm it's going to descend on a solution but not necessarily any one in particular
The solution it finds is more up to the random initial weights
yup there are , the solution is b/W =-5 , and ultimatley , i have deduced considering the simplicity of the problem that comparing the performance of different descents is moot cuz the differnce is tiny
I do think it's interesting that some approaches seem to land on the "ideal" more often than others though
depends on what you want to do ig? you could look into other ml algorithms for example
the field is vast that a definitive path I don't think exists
ahh tbh i want to explore and eventually research in the discipline
ok
I mean if you want to get into research I think it's reasonable that you should probably read the recent papers
choose a subfield, see what people have been doing, probably need to learn math to understand what they're talking about
Hmm ok ty!
Hello
I am trying to making the dashboard using data but i feel so boring to understand the column and purpose of value in normal when i start learning it feel so excited now i feel bored to make project why it happens to me
Hello all I would love to review your portfolio or resume especially if you are an entry level professional for Data Science or analytics. I have been exploring working with data and just completed an analyst internship over the summer. No degree so I would love to get an idea of how to show my experience and skills as valuable. If there’s a resource you would like to point me to that would be great too!
my vae implementation for curvature is actually working!
hi im a beginner at coding python , im aware of the basics, how can i learn python specific to data engineering which will help me excel in it?
Try Datacamp!
thx
Cool!!
It's been a while since I've last poked my head into building models, does anyone have any references or interesting posts to read through specifcally looking at training and/or distilling a encoder-decoder model for predicting next words? Or more specifically predicting associated words and phrases to an input
In my mind i'm thinking it might be possible to use one of the OSS generative AI models and retune & distil it to target that application, but I can't remember basically anything around that lol
thinking like input -> output:
Tiger Woods -> Golf, Sports ChampionPrada -> Shoes, Designer brandAdvertising -> Ad, Marketing
What about this video by sebastian lague:
https://www.youtube.com/watch?v=hfMk-kjRv4c
this is like my 3rd time trying to use it
base (non-instruct) models try to predict what would be the next word in a sentence, but if you mean 'next' as in 'similar' not sure
one option could be just embedding each word in a dictionary then using semantic search tbh
I could do a training setup similar to GLOVE I guess
biggest pain would be getting the dictionary coverage though
a normal LLM pipeline would probably be easier to ship and deploy though
Hi GenAI enthusiasts, I think you all faced similar problems while working for any AI agent.
I really love to hear from you on how you approach the solution of the below cases.
- Data storing and retrieval approach also called agent memory
- Long term
- Short term - Reduce token while LLM call
- Measure accuracy (testing)
Any suggestion from any of you can help all of us.
Hey guys, I’m new here and just released my first open source package onto PyPi. It’s a declarative, object oriented approach to creating LLM agents. I would love any feedback if anyone wants to take a look!
https://rmikulec.github.io/pyAgentic
quick note: it only supports OpenAI right now, but I have plans in the roadmap to add other services and local model support
Ya! It totally does have some similarities. I think where pydantic-ai took a more functional "fastapi" - type approach, this sticks with object oriented code. I did this to have an easy way to create inheritance hierarchies, as well as the ability to create mixins with AgentExtentions
Another feature that is nice, user's dont need to create an instance of the Agent itself in order to gain access to things like the pydantic response model, tools definitions, etc.
Obviously it cant compete with pydantic-ai cause that is already impressive itself, but I thought i'd give a shot at it as a fun side-project
For sure, tool-calling can definitely be finicky. Working on implementing structured output support right now actually, its pretty amazing how well it works haha
Is it true that llms are just better than sota ocr engines a lot of the time
LLMs are not intrinsically capable of understanding images. If you ask ChatGPT to transcribe the text in an image, it's actually delegating that to a system with OCR capability
do i have to learn R language for ml or is python enough
Any of the things you might want to do with R can almost certainly be done just as easily in python
Are you sure about this? I thought it embedded the image using neural networks
Like, it uses a completely different technique than traditional OCR
Hi
I suspect it's farming it off to an OCR model just because of how image embeddings work
I am guessing but they're probably good enough at this point that images with some words in them can probably embed the content of the text in the embedding space
but if you're looking at something like a PDF or image that is nothing but text, at the end of the day an image embedding is of a fixed size and can only contain so much information
and the shape of the embedding space also has to describe everything else that the image model is trained to recognize
I started 3 months after I had started learning python (and failed miserably) but now I managed to make a neural network using numpy, and a lot of the knowledge I gained was during those first months (it also taught me a lot about python) :)
src/transformers/models/mllama/modeling_mllama.py lines 164 to 177
# Copied from transformers.models.clip.modeling_clip.CLIPMLP with CLIP->MllamaVision
class MllamaVisionMLP(nn.Module):
def __init__(self, config):
super().__init__()
self.config = config
self.activation_fn = ACT2FN[config.hidden_act]
self.fc1 = nn.Linear(config.hidden_size, config.intermediate_size)
self.fc2 = nn.Linear(config.intermediate_size, config.hidden_size)
def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
hidden_states = self.fc1(hidden_states)
hidden_states = self.activation_fn(hidden_states)
hidden_states = self.fc2(hidden_states)
return hidden_states```
Hi, has anyone here enrolled for Andrew NG Coursera, courses of machine learning and deep learning and related. They have stopped the audit option and I really wanted to complete the course.
hello is there anyone did langchain project?
How is my deep learning model doing ?
Guys, what is the best way of learning AIML ?
i see, can u share it with me how u are doing it?
i see
i think u hv a great approach
Then wat should I do is there a better way to predict
how close to infinity have you got?
hello..i am new so i wanna learn AI do rn i am doing python and ml so any guidance
resoures or roadmap
??
https://www.kaggle.com/learn/python
this might be helpful for you
i am learning python fine...tahnks for resouce...i need guidance what to learn to ...persue in AI...
is there any other strategy then making an ML or deep learning model ?
which is better ?
I see but do you recommed a place to look for articles ?
What bout where to use which ML algo or deep learning one I have some idea but needs more indept knowledge
guys which things should i download in my laptop before starting AI and Machine learning , i already know the basics of python , i just started
dont forget to use docker bro otherwise u r whole system might crash
if it takes a lot of memory suddenly
what should i learn first for ai and ML ? i already know the basics of python , and i dont wanna spend more time on python cause my main goal is AI .
do some basic data exploration with something like pandas.
new to ai in python what should i learn first like i wanna make a chatbot or a nureal network
yea but whats pandas , my university will start in two weeks and i wanna get head start , which course on youtube is best?
you can look it up.
as an AI engineer, you'll be looking up things you've never heard of a few times a day.
ok
thanks .
😭 ,,now i am confuse can someone plz guide me
tryna familiarize myself in data science, is the free first chapter from datacamp enough? cuz im broke
"enough" for what?
it's not necessary to learn both Python and R.
if its enough information to know how python is applied to statistics, something like that
very unlikely. any statistical computation can be done in python.
it might be a good place to start.
ooh thanks, thats good to know.
Guys new to ml where do i start(ik python on its own) jyst no ml libraries
Working on pytorch rn
"Deep Learning with Pytorch" is a good intro imo
I am not sure if this is the right place to ask this, but has anyone made a neural network with asm?
I'm sure it's been done, but there's not much point these days.
I've been looking for the past 2 hours-- I couldn't find any.
Probably because it's not worth doing or sharing.
... you only saying that when I've done half of it already?
okay
NNs are perfectly suited to running on GPUs. Even the most efficient CPU implementation can't come close. So there's not much point. Better to ensure the CPU part is optimised for ease of use, e.g. written in Python, and that the actual processing is offloaded to the GPU.
hey guys, im using LASSO within a DML framework, should i fit my nuisance fits separately?
How should I go about making a image set
I'm using cats also how long should I take for each image to find?
Because I was investing in making a web scraper so I can scrape the web for all these images and get my data set without driving myself nuts
What I mean is going and finding each cat image to put in my data set or should I just use one made
Well I would gain the knowledge of how to make my own dataset
How can I evaluate the quality
bro just use class
Well I have just started machine learning and I didn't knew people write neural networks in classes.. is there any specific reason??
In real-world and large-scale projects it is always better to use Object-Oriented Programming (OOP) and classes.
When you're a beginner you can build simple models without using classes But as projects become more complex and large-scale using OOP with classes becomes essential OOP provides a structured way to manage complexity making your code easier to maintain, understand and scale
hi guys
how would I make my own models or know how to use existing ML algos I have learned basics of Machine Learning I want to dive deeper into algo trading, quant, HFT etc
I am looking into research papers is there any recommendations ?
so I'm going over neural networks and find myself in a spot of bother, I'm using the iris dataset for context and not setting aside any validation or test data, just trying to sort out the moving pieces and this code wall is mostly fine
class MyNetwork(nn.Module):
def __init__(self, n_hidden_layers, neurons_per_layer, n_features, n_predictions):
super().__init__()
self.layer_dict = nn.ModuleDict()
self.n_h_l = n_hidden_layers
self.layer_dict['input'] = nn.Linear(n_features, neurons_per_layer)
# hidden layers
for _ in range(self.n_h_l):
self.layer_dict[f'hidden_{_}'] = nn.Linear(neurons_per_layer, neurons_per_layer)
self.layer_dict['output'] = nn.Linear(neurons_per_layer, n_predictions)
def forward(self, features):
features = self.layer_dict['input'](features)
features.relu_()
for _ in range(self.n_h_l):
features = self.layer_dict[f'hidden_{_}'](features)
features.relu_()
features = self.layer_dict['output'](features)
return features
my_model = MyNetwork(2,3,4,3)
def train_my_model(the_model, lr = 0.01, n_epochs = 3000):
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(the_model.parameters(), lr = lr)
for i in range(n_epochs):
y_hat = the_model(features)
loss = loss_func(y_hat, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
predictions = the_model(features)
greatest_prediction = torch.argmax(predictions, axis=1)
accuracy = 100*torch.mean((greatest_prediction == labels).float())
print(accuracy)
train_my_model(my_model)
what I'm confused by is sort of my own labels, I have features feeding into the forward method, and then again when calculating y_hat and I'm a bit confused by what's happening throughout
It’s the same thing. When you ‘call’ your model, Pytorch uses the model’s ’forward’ method to determine the output
I guess what's bugging me is that my code makes it look redundant, why do I specify in both places which data set is used?
not data set but tensor...
I sus i'm shadowing but i don't know what's under the hood so i have to ask
oh, so my course used convention "x" which was not at all helpful for learning how to write my model class
X is fine, that’s the standard input to any mathematical function. Not super descriptive, but still.
I’m not entirely sure what you mean about your data set. I guess it’s a bit confusing because you refer to ‘features’ which are usually coming from a data set you iterate over and pass minibatches to ‘forward’ in turn.
the instructor used def forward(self, x): this was the same thing claude reported is the convention when i dug into it a bit more
had just been a few days since I watched the vid, figure this is code worth commiting to memory
scikit-learn
I'm at my PC now rather than on my phone so I can see your code a bit clearer now 😄
I wouldn't have a layer_dict - I'm unsure what value that gives you over simply having each layer as a member variable. It probably complicates matters a bit.
I guess it allows you to have that loop inside forward but, to be honest, to begin with I'd just have a list of hidden layers instead.
Not that it really affects the problem - I'm just a firm believer in simplifying the code when it doesn't do what is expected.
The only other big question I have is that I don't see any sort of batching going on. Each epoch is meant to go over the whole training set, but you don't have a training set, just features, and it's not clear what that is exactly. Most likely there's a loop missing here, where you extract those input features from each element in the dataset, either one by one or in batches
nah it was never meant too, it's just the pace this instructor is setting. It's a comfortable pace but I do find myself screaming at my screen sometimes.
I'm sure he'll incorporate batches and validation in the next section.
what are the two places where you specify the tensor ? I don't understand
it was my forward method that confused me
does it still confuses you or is everything ok ?
I still think there's something missing here. If there's no batch then it's not really an 'epoch'. If there's only a single piece of data it's hard to train on it. In particular, your accuracy measurement appears to assume that you have multiple predictions, but if you only have 1 piece of data, then you only have 1 prediction.
welp, I'll know where to ask when I get confused again
How many time did it take you to be in this level of coding
this isn't really much coding, i spent years learning core python and years working in a statistics heavy field
but you could totally skip Data structures and various other subjects I've dug into
only took about a week to learn the maths and a week writing my first ANN, already being comfortable with python and statistics
data science ai bros, am i able to scale my outcome and controls for LASSO, compute weighted penalty loadings (RLASSO), then rescale residuals found from this prediction
idk the theory , but in terms of tight confidence intervals, it fcking works
Can any of you DM me if you know about LangChain agents?
why don't you just ask your question about LangChain? that way people don't have to DM you to find out if they know enough about LangChain to help you.
Hello, I'm hoping someone can help me with a weird LangChain RAG issue.
I'm building a chatbot to read PDFs. My code successfully finds and loads the FAISS vector store from Google Cloud Storage, and the logs even say it's finding the right document chunks
But when the RAG chain tries to build the prompt, the context is completely empty. The final answer is always "no text was provided."
I feel like I've fixed everything else (GCS paths, file IDs, unpickling errors). Has anyone had this problem before, where the system finds the documents but their actual text content is missing? I'm at a dead end and would appreciate any ideas.
Thank you!
Always show code
Don't wait for someone to ask you to show the code. Just include it in your first message. You can use our paste bin if you need to.
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
Hello guys, i'm from Brazil and want to learn python for data science, which path do you recommend for me to start studying? and which courses??
Allen Downey's https://greenteapress.com/wp/elements-of-data-science/
with ELO scoring, when it is transitive (A > B > C; -> A>C) are they referring to separate LLMs and there performance? Like, say for example in terms of food I guess, eggs > Waffles > Pizza(E>P; would have to always hold); it would mean you like eggs more than pizza and waffles more than pizza but transitivity would be violated if someone liked pizza more than eggs or the person aid the likes waffles more than eggs. They are just doing that with the performance of separate llms, right?
NVM, transitivity is not guaranteed.
guys could u tell me wht im doing wrong here ?
im practicing using neural networks for predictions but it has only close to 80% accuracy
wht can i improve to make it better ?
i mean thats only 30% than a random guess
idk just doesn't feel sufficent
especially if u were to use this for any real world purposes
80% could be really good for the problem, I'd suggest comparing what you have against a simple baseline like logistic regression to see how much the neural network is performing against it
also since this is a simple binary classification problem I'd do more of an investigation into the precision and recall and not focus solely on the accuracy
oh im sorry i just used the share button
dont really know much abt kaggle sharing lemme fix it
ok now you should be able to see it
seems like u can only view it when im not running it which is a little inconvinient
hm yeah I would definitely look into a confusion matrix
I just noticed the dataset is imbalanced and about 80% of the data is the negative class
I mean in general 80% isn't terrible as far as accuracy goes, but also the model could achieve 80% accuracy by only predicting the negative class
oh
true
i forgot to test for class imabalances
try outputting the f1 score during training or something else that balances precision and recall
but also if you are going to do a neural network, I do recommend comparing it against a simple logistic regressor just to give you some context for the results you see
good point ill try tht out
also this is a smaller point but I strongly recommend a train, test, and validation set so you can meaningfully compare the results
thanks for the advise 🙂
i used a validation split does that not cover it ?
you used two splits, train/test
i ran it for close to 120 epochs and there were no differences in accuracy or validation scores
if you're going to do hyperparameter tuning, a baseline like I suggested, or tweak the neural network like @final kiln suggested, you need a validation set to compare them
train on the training set, use the validation set to find the best-performing model, then perform a final evaluation on the test set only on the best-performing model
tweaking the neural network is ok but you really need to figure out the imbalance and find a baseline
ehhh
this looks like an introductory project, trying to encode the surname is a lot to deal with
100% the imbalance is the biggest issue here
all this stuff about tweaking the network and doing embeddings of people's names and stuff isn't going to help if 80% of the dataset is the negative class and no one has seen the confusion matrix yet
the most important thing right now is understanding what that 80% accuracy figure means and getting a better picture of the predictions the model is making, maybe doing some precision/recall curve work, and then tweak the network
i get wht ur saying i hv a call rn so ill get back to the matrix afterwards and notify u
Does anyone have a really messy dataset(s) with atleast 100k records? I need one for my project.

this is what i got from my classification report
as u said the class imbalance probably contributes since there's a really low precision for one of the classes
Of the things it said were positive, only 4% were
But it correctly identified 62% of the positive class overall
So if in reading this right, it's saying a lot of things are positive that aren't
You might consider defining class weights, it will allow examples from the minority class to exert a greater influence on the model's weights when they update
There are also techniques for balancing the dataset
I would start with that just to make sure you're giving the model the best opportunity to learn the data, but I would also consider other models besides neural networks too
Unless this is just an exercise to learn neural networks
Newb here, sorry in advance if this is the wrong channel for this question! I'm working on a simple RAG PoC: Currently I'm using the basics, RecursiveCharacterTextSplitter from langchain_text_splitters, along with WebBaseLoader from langchain_community.document_loaders to extract content from a webpage, turn it into embeddings, and index in a vector store. Basically RAG 101 stuff.
My question is related to trying to figure out a useful way to identify where in a page a particular "chunk" is pulled from. A very long web page for example, might have 10 "chunks" in it, but when one of those chunks if indexed / added to a vector database, it isn't really related to a particular part of the page. I am familiar with the web / HTML / etc, so realize this is perhaps an impossible problem to solve (there may not be a reasonable "hook" that a tool could use to programmatically identify a "chunk" in a way that's useful in the output), but I was curious if anybody is aware of any tooling / strategy to help identify where on a page a chunk would come from, so for example in the final UI I might be able to somehow identify where in a particular webpage the content was referenced from?
hi i wanna be a data analyst/scientist i have some python fundamentals and now im learning pandas
but im not so perfect in excel (i know some formulas and pivot tables a little bit) also i did not learned tableu or powerbi whatever
so am i doing wrong
cause i feed more happy when im working about python
(btw im not even a university stundent i will start this year)
yeah it was
oh i see will look into it
thanks guys for the help 🙂
One more thing, it's a useful exercise to put yourself in the position of the bank and think about what precision and recall mean for you in this problem, and which is more important
Assuming you aren't already familiar with the precision and recall trade-off, I I highly recommend using this problem as an excuse to study it
hi i need some help, im trying to use rag for my llm and im on the data prep section, im was about to chunk but should i tokenize first, then chunk then embed? i thought that if i tokenize after wouldnt the chunks end up becoming larger in size? i was planning on adding some overlap too... also should i use an embedding model or do it manually? im a beginner w rag so was thinking if doing it from scratch as much as possible would help me understand the process better or idk what do u think?
im also new but i think when chunking you can refrence a source
which chunking strategy to use will vary depending on which kind of document you're working with
for some things you could just separate into paragraphs without tokenizing at all, for others you might want to take multiple overlapping regions of tokens
same goes for whenever or not to embed
How do you want to search later?
if hardcoded filters or traditional full text search suffice for your use case, no need
If you want semantic search, you'll need to embed and preferably create a index in whichever vector db you choose
depending on your use case, the ideal chunk size could be anywhere from a single sentence to an entire pdf
do u have a resource i could use as a guide?
i data scraped some websites and some of the txt files arent too long, but some are, a lot of them have bullet points, this is the place i extracted data from:
https://www.nationaleatingdisorders.org/risk-factors/
Learn about the risk factors of eating disorders. Visit the Resource Center at the National Eating Disorders Association.
trying to run from pyspark.sql import SparkSession/spark = SparkSession.builder.appName('test').getOrCreate() just hangs in my ipynb file forever anyone know a fix?
hi
Hello and welcome to our wonderful data science chat
does people generally talk about ML or data science at here
Both
i got a question
How is anyone supposed to answer it if you don't say what it is?
i was waiting u to listen
anyways
i've seen lots of LLM NLP models on huggingface before
I'm on a walk right now, but you don't need me specifically. There are lots of knowledgeable people in this community
People come and go pretty quickly throughout the day. They respond when they check the channel and see a message that they know how to answer
LlamaIndex docs and deeplearning.ai mini-courses can serve as inspiration
Sup
need help with ai agent pretty decent already just need finishign stuff
Say all the things someone would need to give you meaningful help without asking you anything else
I agree with @final kiln , and also I would recommend that if you aren't already familiar with lists and dictionaries and other Python language features, you should take the time to understand them before getting too far into Pandas
They are very much their own thing independent of Pandas, and it's better to understand them on their own terms instead of how they are used by Pandas
To give you an example, to reword your explanation of {}, the dictionary's keys are used as column names and its values become the values inside each column
If that doesn't make sense, especially the key/value terminology, definitely go study dictionaries
I don't understand oversampling. Can someone explain? I understand that if we have 7000 of x and 4000 of y then there will be bias as there will be a greater incentive to prioritise x, but from what i know, oversampling will change it so that there is 7000 of x and 7000 of y. But won't that alter the original data and lead to inaccurate results?
and won't that change the meaning of the data?
anyone beginner in data science? like who knows pandas and matplotlib
there will be a greater incentive to prioritise x
there won't be if you for example weigh differently how much getting a certain class right matters
alter the original data and lead to inaccurate results
it will amplify the bias of the data, yeah
in fact in my experience oversampling only ever makes things worse
how does AI exactly work because i thought of making an AI and so far all im
doing is just giving it responses that im writing manually and making the AI randomly choose from the responses it's allowed this seems very basic and I feel like im doing something wrong in a way
Click here to see this code in our pastebin.
the letters took so long to do btw
your approach is like expert systems, i.e. for possible inputs you're manually setting some rules on how the AI should react
modern "AI" like llms like chatgpt, use machine learning trained on a massive text dataset to learn how text should behave and generate convincing text that way
if my approcach is like expert systems does that mean im an expert at AI
dont bigger AI's and companies use ways to search the web and pull resources summarize them and such
expert systems is just the name of what I'm describing, it was popular before ML approaches took over because the latter is easier to scale
what specifically are you thinking of?
when I use ai for suggestions or if i ask it to help me understand something it shows me its sources that its pulling the responses from and giving me what I asked
like if i ask how to start javascript it shows the sources it pulls from like reddit and then shows me the summarized result
so chatgpt and the like
then not much is different, they're still llms, just that there's some utility code to search the web, include the search results in the llm's context, then generate
thats interesting but idk how to do that and my AI is just basic conversation
because it's hard to start from scratch
you need a huge quality dataset and a lot of computing power to even begin thinking of training one from scratch
so the only reason im not gonna get far anytime soon is just because it requires just a massive dataset of possible responses
basically your approach and the modern big ai company approach is very different
chatgpt doesn't specifically check something like if "name is" in user_input anywhere, instead it's trained to learn, for some given text (context), what's the next most likely text piece to appear
right now you can also download a model trained by others and just run it; running it requires a lot less (but still a considerable amount) of compute compared to training
so I can become similar ot a big AI company like openai i just need millions of if statements
there's not really a good point of reference on how millions of if statements would perform, but my gut feeling is that you probably need a lot more
hence the reason people are using ML approaches
ima get to work on manually writing my billions of if statements
thanks for the idea
I found the reason I was creating sequences on the fly that's why it was taking so much time after saving sequences in start and saving them now it takes only 3-4 mins per epoch .I reduced image size as well to 256 x 256
yes you are correct I fixed that as well now I am predict t9,t10,t11 and input is t to t8
yes first 8 are input and later 3 are outputs so its rolling window first sequence is t to t8 as inputs and t9 to t11 as outputs,next sequence will be t1 to t9 as inputs and then t10 to t12 as outputs and so on for whole data set
these are the plots
from what I realised
Epoch [25/25] Train Loss: 12.5875 Train MAE: 12.5875 Val Loss: 8.5209 Val MAE: 8.5209 LR: 0.000100 my Loss and Mae both are same for all epochs 😭
so it's normal for both to be the same, if they are the same thing
train and loss are different
but of course your two plots are identical
(Or are you using different data for both plots ?)
yes
So like after training loop I added val loop for predictions and evaluation
I might have messed something
but if you do that, you compute val on the val data and train on the train data
so both images are the same but show 2 different plots
that should not be the case
as what I did to stating 70% as training and than from remaining 30% took 15% val and 15% test
I doubt it should be same
sure
Before that let me try again
I used gpt to do it
I will first write it on my own
This looks OK, it's common for validation loss to be slightly worse than training loss. also l1 loss and MAE loss are the same thing, how are you computing them?
ok yeah that makes sense
loss_fn = torch.nn.L1Loss() using this
def compute_mae(outputs, targets):
return torch.mean(torch.abs(outputs - targets)).item()
ok yeah you're just doing the same computation twice
https://docs.pytorch.org/docs/stable/generated/torch.nn.L1Loss.html "Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y."
but data is different
wait how so
for train its first 70% data
you have "loss over epochs" and "MAE over epochs"
and for val 70:85
i'm saying the "loss over epochs" and "MAE over epochs" graphs look the same because they're both computing the same thing, but the train/validation loss in each graph is different because they are different datasets
right? or is there some difference I'm not seeing
ok
but data is different so results should be different right?
'''pbar = tqdm(loader, desc="Training", leave=False)
for X, y in pbar:
X, y = X.to(device), y.to(device)
batch_size = X.size(0)
optimizer.zero_grad()
outputs = model(X)
loss = loss_fn(outputs, y)
loss.backward()
optimizer.step()
# Track
total_loss += loss.item() * batch_size
total_sq_error += compute_mse_sum(outputs, y)
num_samples += batch_size
pbar.set_postfix({"batch_loss": f"{loss.item():.4f}"})
avg_loss = total_loss / num_samples
avg_mse = total_sq_error / num_samples
return avg_loss, avg_mse'''
there are two dimensions to think about here
one is L1 vs MAE, the other is training vs validation
this is for training I replaced mae with mse
oh
and this is for val def validate_one_epoch(model, loader, loss_fn, device):
model.eval()
total_loss = 0.0
total_sq_error = 0.0
num_samples = 0
with torch.no_grad():
pbar = tqdm(loader, desc="Validation", leave=False)
for X, y in pbar:
X, y = X.to(device), y.to(device)
batch_size = X.size(0)
outputs = model(X)
loss = loss_fn(outputs, y)
total_loss += loss.item() * batch_size
total_sq_error += compute_mse_sum(outputs, y)
num_samples += batch_size
pbar.set_postfix({"batch_loss": f"{loss.item():.4f}"})
avg_loss = total_loss / num_samples
avg_mse = total_sq_error / num_samples
return avg_loss, avg_mse
so in loader I give train_loader and val_loader
what is compute_mse_sum
train_loss, train_mse = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
val_loss, val_mse = validate_one_epoch(model, val_loader, loss_fn, device)
its name of func
def compute_mse_sum(outputs, targets):
return torch.sum((outputs - targets) ** 2).item()
why not just use the mse_loss() function? this seems like it's more complicated than it needs to be
got it but they will do same thing i guess
I see
from what I can read, you compute the loss and the mae/mse with the same training data
and same for validation
so of course the plots are identical
no no I am using different data here check this
train_loss, train_mse = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
val_loss, val_mse = validate_one_epoch(model, val_loader, loss_fn, device)
different loaders
it's not in this piece of code
are you sure you are using it ?
here
pbar = tqdm(loader, desc="Validation", leave=False)
for X, y in pbar:
loader
its before the loop
ok but you are computing both the loss and the mse in the same function
total_loss += loss.item() * batch_size
total_sq_error += compute_mse_sum(outputs, y)
oh ok so we dont do it each batch wise?
no you can do it for each batch that's not the point
but those two lines are one after the other, working with the same loader
yes those are for trainloss and trainsqerror
I am little confused as I am doing trainloss and trainsqerror and after that I am doing valloss and valsqerror
yes that's fine
but you are doing it on the same loader, so you get the same function
def train_one_epoch(model, loader, loss_fn, optimizer, device):
model.train()
total_loss = 0.0
total_sq_error = 0.0
num_samples = 0
pbar = tqdm(loader, desc="Training", leave=False)
for X, y in pbar:
X, y = X.to(device), y.to(device)
batch_size = X.size(0)
optimizer.zero_grad()
outputs = model(X)
loss = loss_fn(outputs, y)
loss.backward()
optimizer.step()
# Track
total_loss += loss.item() * batch_size
total_sq_error += compute_mse_sum(outputs, y)
num_samples += batch_size
pbar.set_postfix({"batch_loss": f"{loss.item():.4f}"})
avg_loss = total_loss / num_samples
avg_mse = total_sq_error / num_samples
return avg_loss, avg_mse
def validate_one_epoch(model, loader, loss_fn, device):
model.eval()
total_loss = 0.0
total_sq_error = 0.0
num_samples = 0
with torch.no_grad():
pbar = tqdm(loader, desc="Validation", leave=False)
for X, y in pbar:
X, y = X.to(device), y.to(device)
batch_size = X.size(0)
outputs = model(X)
loss = loss_fn(outputs, y)
total_loss += loss.item() * batch_size
total_sq_error += compute_mse_sum(outputs, y)
num_samples += batch_size
pbar.set_postfix({"batch_loss": f"{loss.item():.4f}"})
avg_loss = total_loss / num_samples
avg_mse = total_sq_error / num_samples
return avg_loss, avg_mse
for epoch in range(1, num_epochs + 1):
print(f"\nEpoch {epoch}/{num_epochs}")
train_loss, train_mse = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
val_loss, val_mse = validate_one_epoch(model, val_loader, loss_fn, device)
so, only one loader for mse and loss
1 for train, one for val, and on each of them you compute the loss and the mse
so if you use the mse as the loss, you'll get the exact same values
okie I a m confused I will go through the code once properly and update you
to simplify, here is what you are doing:
import mse_function
loss_function = mse_function
for i in range(epochs):
#training
output = model(x_train)
loss = loss_function(output, y_train)
mse = mse_function(output, y_train)
loss.backward()
optimizer.step()
loss_list_train.append(loss.item())
mse_list_train.append(loss.item())
#validation
output = model(x_val)
loss = loss_function(output, y_val)
mse = mse_function(output, y_val)
loss_list_val.append(loss.item())
mse_list_val.append(loss.item())
So obviously, the content of both loss_list_val and mse_list_val are the same
(I removed the batching for the sake of simplicity)
Oh okay
they are calculated by same function
as loss function and mse function are same
Oh dang that was so dumb
yes they are, that's what we talked about at the very beginning
do that but with mse as the loss and mae on the right and you'll get different plots
got it I am updating the code and keep it on run
Epoch 1/50 [Train]: 22%|███▊ | 168/757 [00:58<03:23, 2.90it/s, Batch Loss=78844.4609, Batch MAE=280.3940]
lets see what happens looks crazy high lol
you are accumulating an MSE so it's normal
yes
MSE is square of MAE in order of magnitude
yes
I kept it for 50epochs it will take around 2 hours
check this out
Recent advancements in data-driven weather forecasting models have delivered deterministic models that outperform the leading operational forecast systems based on traditional, physics-based...
This paper proposes a simple modification to the mean squared error loss function that eliminates the problem of overly-smooth fine scales in data-driven weather forecasts.
Imma go to sleep now
no
Majority are hard coded
Deterministic models
check this one The WRF model
aah okay yeah
🙂
here from data-driven they meant ml dl models
haha
nah its fine
now that I think of everything is data driven approach
I want to start ML but I dont really see anyone talk any code. Its great to understand the concept on how an ML works but I never seem to pick up how to actually code one. Do you guys have any recommendations to learning actual code?
hi, i have copied an api based ai assistant from a youtuber
now i want to add some features like it can help making 3d models and can perform functions on command while editing
Like how to code algorithms?
Or are you asking how to use them in practice
How many images do I need again for making my own dataset because may as well learn how to make one from ground zero in case I can't find one
Thank you for yesterday
results are fine as well
will look in different loss functions
It looks like its smoothing the image
Looking in that
yes
its actually good as its able to detect the structure
input are simialr images
this is an example
as you can see there are some changes
just a min let me look
Here
these are predictions on train -first sequence
predictions for this sequence
yes
satellite imagery
of INSAT
so these are clouds you can see it forming .changing shapes and dispersing
wait really?
wait its gone
yes
exactly
yeah
I am gonna go now and look what I can do to get sharp image
now I see its not actually learning
in input features edge is same
30epochs after that it went like flat
I was in logs
its movie around 7.1 to 7.5
I will try that but its from paper
one with 1024 bottleneck
alright imma go play with it
my code for getting mood based on color
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd
''' ----- CONFIG ----- '''
MODEL_PATH = r"Main\Machine Learning\model\ColorMood_SVMV1.pkl" # folder to save/load model
DATASET_PATH = r'Main\Machine Learning\data\ColorMoodDataSet.csv'
# region PREPARE Data
''' ----- LOAD DATA -----'''
ds = pd.read_csv(DATASET_PATH, comment='#')
X = ds.drop('mood',axis=1)
y = ds['mood']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# endregion
# region SVM Classifier
# Initialize the SVM classifier with One-vs-One strategy
svm_ovo = SVC(decision_function_shape='ovo')
svm_ovo.fit(X_train, y_train)
# endregion
# region Test
# Predict and evaluate the One-vs-One model
y_pred_ovo = svm_ovo.predict(X_test)
print("One-vs-One Accuracy:", accuracy_score(y_test, y_pred_ovo))
# endregion
# region Save Model
import joblib
# save
joblib.dump(svm_ovo, MODEL_PATH)
print("✅ Model saved at:", MODEL_PATH)
# endregion
this has accuracy of 1
but for the dataset
PrevNote,Note,Tempo,Mood,Dur,NextNote
60, 65, 120, 2, 0.5, 67
62, 67, 120, 2, 0.5, 69
65, 70, 120, 2, 0.5, 60
67, 60, 120, 2, 0.5, 62
69, 62, 120, 2, 0.5, 64
65, 60, 120, 2, 0.5, 60
62, 67, 120, 2, 0.5, 62
65, 70, 120, 2, 0.5, 65
67, 60, 120, 2, 0.5, 67
69, 62, 120, 2, 0.5, 69
it has accuracy of 0 any particular method i should use to train my model on music notes?
How many different mood classes are there? Also you're going to want to standardize your input features first, the columns are all at different scales and that's going to mess with most models
So I have total of 4 moods
The one program I shared is getting mood from r,g,b values
That went well next I tried with notes
So from C4 to A4 are the notes I want it to predict which note falls in the progression so I feed 1st and 2nd note I want the 3rd note
I'm not sure I understand your problem, the code you posted is using PrevNote, Note, Tempo, Dur, and NextNote to predict Mood
You're wanting to change it to predict something else? What is this new data that's getting a 0 on accuracy?
Oh I see you're giving mood, note, etc as input and trying to predict nextnote
How many different notes are in the dataset, it's all c4 to a4? How much data do you have and how are you preprocessing mood? Also I would reiterate that standardizing the features is a good idea
https://youtu.be/4bCrNl4Bx1M?si=OWj-lCMf_VQ2ili2
I found this video that might help me
Not sure how the code would work tho
In this episode of the AI show Erika explains how to create deep learning models with music as the input. She begins by describing the problem of generating music by specifically describing how she generated the appropriate features from a midi file. She then describes the deep learning model she used in order to generate music. Learn more:
Blo...
hey guys
I am completely new to coding other than the basics and just want to learn by observing i dont have much value to provide
Are good resources for AI in general pined into the channel?
I want to create an AI that plays a bit complicated match 3 game but not sure how to approach this complex project and I didn't find much on this topic it's mostly entertainment or something unrelated
I'm not familiar with Match 3 but, assuming the game can't be solved with a straightforward algorithm, reinforcement learning is a good option
but that is a big complicated machine learning topic and I wouldn't recommend making that your introduction to machine learning
assuming you're familiar with Python already, I really like the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", and I think it has a section on reinforcement learning
there are also some resources in the pinned messages
Match 3 is a genre of games think of candy crush for example
I might be asking a stupid question but I always get confused between the terminology of AI and how it learns they all seems same to me
is that game real-time or turn-based?
It's based on turn you've a set number of movies and you must reach to the score
so it's possible to fail to reach the score by making bad moves
Yes rather by getting less score or not getting extra movies or not using items you have at disposal
you might start by coding something that uses a heuristic. you'd need a way to assign a point value to how good a certain move is in the context of the current board. then at each turn, your code makes whichever move has the maximum value.
so a really good concept to familiarize yourself with is something called the state space, because this is going to determine whether you can reasonably solve the game with an algorithm versus needing to use a probabilistic method like reinforcement learning
also that
the state space just refers to how many possible states the game can be in, something like tic-tac-toe has very few but chess has an impossibly huge amount
if the state space of the game is relatively small you can solve it just by doing an exhaustive search over all possible outcomes
if it's too large to do that, the heuristic idea starts becoming more appealing
This sounds cool and all but I have zero experience with designing stuff with python, at most I just know some syntax I thought of going through books first to get a bit of an understanding of reinforcement learning
and then of course if it is quite large and complicated it could be a good use case for reinforcement learning
get a very solid grasp of python and computer science before you start trying to do this, this kind of work involves a lot of computer science and machine learning ideas
By very solid you mean I should study it for couple of years before going into reinforcement learning? I wanted to do that but to me learning stuff without a clear roadmap is confusing and discourage me easily
Do you recommend the road map shown in roadmap.sh/ai?
idk what it says, you have to have an account to use it
Ok I have had an idea didn't try it yet but though of using markov chain
So
Sudo code
Key = (previous note, curr note, type, number, mood)
Value = {next note : prob}
For data in getdata(data.csv):
dict[key][Next note] += 1
Convert to prob
For key, next_counts in dict.items()
Total = sum(next_count.values()
Chain[key] = {note: count/total for note, count in next_count.items()
This should give a set of notes and their probability that can play based on the input params(key)
Am I going in the right direction?
Also if I want to predict a data that was not in this set how do I do that
That pseudocode looks correct for what you're trying to do, but this approach cannot deal with (prev-note, curr-note, type, number, mood) tuples it hasn't seen before
this is a more limited variant of what you were doing with the sklearn model before
totally worth doing as an exercise, but the support vector machine was estimating the relationship between the input data and output note in a way that this method can't
Anyone know how realistic it is get into machine learning if you are majoring in electrical and comp engineering?
you often need a masters degree to get jobs in ML, so you could potentially use your undergraduate degree as a springboard for that.
I don't see why you couldn't, what are you wanting to do with it?
I'm trying to us ece as a fail safe
but also, if you know you want to do ML, why are you in ECE?
I don't understand your reasoning. if your goal is to get a job in A, why would you study B for fear that you won't get a job in A?
i kind of expected compujter engineering to give me a foundation
are jobs in ECE more plentiful, or something?
yeah
hmmm
I'm also not 100% sure if ML is for me
i'm still kind of in the deciding phase for what I want to pursue
the only concepts from ECE that I actually think about as an ML engineer are different kinds of floating point representations.
but I'm not designing ML hardware at NVIDIA.
hmmm
that's normal.
tell your advisor that you're interested in ML and find out if there's a way you can take an introductory ML course for an elective.
great
I'm also taking data structures
that's not really part of ML, but yay
this
is it possible to learn machine learning on my own?
I'm confused. are you taking a data structures course and an ML course? two different courses?
I mean all learning is ultimately self-learning. but you pretty much won't get an ML job without formal ML credentials.
data structures guaranteed
I can take an ML elective as well
great
hmmm
so if i get a certification
no
hmmm
it needs to be from a university
coursera et al are fine for upskilling, but nobody cares about those ML certs.
so if lets say I want to go into machine learning what major would I look at
very likely CS.
On the scale of academic history, ML is very new, and has historically just been a niche area of computer science (which itself is also very new).
if you see "masters degree in {data science, machine learning, artificial intelligence}", it's probably low-key predatory
wdym predatory
they're designed to extract money from people who are desperate to switch into a currently-hyped career type, and don't necessarily impart a credential that employers perceive as having any value.
so mainly comp sci is what i'm looking for
yes. if CS is part of the engineering school of your university, it should be easy to switch.
this is normal. ideally, university should be about figuring out what you do and don't like.
thats the thing though
i enjoy a ton of things engineering and coding related
to be honest I'm mainly going into ML for the money but I also know that i'm interested in this type of thing as well
its the same thing as electrical and computer engineering except the job market is way more consistent but also doesn't reach values as high as ML
so i tried ml to identify music notes as in i feed the previous note and current note and it predicts the next note i was stuck for a while but then i tried markov chain
it works but it cant predict new data only ones i train it on
i am using chordonomicon for the dataset just smaller set for testing that has been transfromed in this format
is chordonomicon good enough to get most of the notes right but if i wanted to generlize do i use SVM?
Type,Number,PrevNote,CurrNote,Nextnote
1,1,F,C,E7
1,1,C,E7,Am
1,1,E7,Am,C
...
import pandas as pd
from collections import defaultdict
''' ----- CONFIG ----- '''
DATASET_PATH = r"Main\Machine Learning\data\chordonomicon_SmallPrepared.csv"
ds = pd.read_csv(DATASET_PATH)
# region CREATE TRANSITION DICT
# Dictionary: (Type, Number, PrevNote, CurrNote) -> {NextNote : count}
transitions = defaultdict(lambda: defaultdict(int))
for _, row in ds.iterrows():
key = (row["Type"], row["Number"], row["PrevNote"], row["CurrNote"])
transitions[key][row["NextNote"]] += 1
# endregion
# region CONVERT COUNTS TO PROBABILITY
# Dictionary: (Type, Number, PrevNote, CurrNote) -> {NextNote : probability}
markov_chain = {}
for key, next_count in transitions.items():
Total = sum(next_count.values())
markov_chain[key] = {note: count/Total for note,count in next_count.items()}
# endregion
# region LOOKUP
state = (2, 1, 'F', 'G') # Type, Number, PrevNote, CurrNote
print(markov_chain.get(state, {}))
# endregion
model = keras.Sequential([
Input(shape=(12,)),
Dense(6, activation='relu'),
Dropout(0.1),
Dense(6, activation='relu'),
Dropout(0.1),
Dense(1, activation='sigmoid')
])
model.compile(optimizer='adamW', loss=BinaryFocalLoss(gamma=2), metrics=['accuracy'])
this model performed better with 85% accuracy both on test and training data than the model below
model = keras.Sequential([
Input(shape=(12,)),
Dense(100, activation='relu', kernel_regularizer='l2'),
Dropout(0.2),
Dense(100, activation='relu', kernel_regularizer='l2'),
Dense(1, activation='sigmoid', kernel_regularizer='l2')
])
model.compile(optimizer='adamW', loss=BinaryFocalLoss(gamma=2), metrics=['accuracy'])
this model only got 79% accuracy, despite having more neurons, why is this ? i even added regularizers
Could be overfitting, could be randomness (did you try different seeds ?)
It is also weird that your last layer is a Dense(1) while you compute an accuracy
Dense(1) indicate you are doing a regression, but accuracy is a classification metric. What are you trying to achieve ?
12 -> 100 -> dropout => 100 -> 1?
many neuron is not always bring high accuracy, remake dense structure, and using dropout 0.1
your second code was overfitting actually
first one total param : 127
second one total param : 11,501
Hello guys I'm new in data science
Is kaggle a good platform for learn data science?
I don't know about the courses, but the datasets and examples are pretty good
I agree
Honestly it is pretty hard to learn data science especially in logic from kaggle for the first time(my opinion), I am self-taught.
You should rely on other sources too, youtube have some good courses, maybe someone else here have some recommendations
Great idea, thanks for suggestions
how did u get these values?
oh that makes sense
was trying to reduce overfitting using the regularizers
Hello everyone, is there any quant traders in the server ?
with neural networks it feels like im just taking shots in the dark sometimes
like add a dropout layer here, arbitrarily change regularizers, add or change the amount of layers in the NN
i get the concepts behind each of them, but the amount of variability seems honestly overwhelming
so I can't understand how to properly approach making a neural network
any ideas on how to improve this ?
will look into it thanks
what are some of the main youtubers or courses (free ones) that you'd reccomend for learning data science ?
im currently using 3blue1brown for theoretical stuff and the mathematics of it, freecodeacademy, codebasics and stat quest
I'm working on a problem in pytorch that involves comparing a model trained with both autoregression and teacher forcing, and because of how the model works, I'm incrementally accumulating a hidden tensor of outputs then attending to it before giving that output to a decoder. so something like this in pseudocode:
context = torch.zeros(batch, num_timesteps, 32, device=inputs.device)
for i in range(num_timesteps):
context[:, i] = encoder(inputs)
output = decoder(attention(context))
I'm training in mixed-precision mode and I noticed the context tensor is float32, but the encoder outputs are float16, so the encoder outputs are being upcast to float32. then when the context is attended to, the results are cast back down to float16
I don't see a lot in the documentation about whether it's good practice to create tensors like this of the correct dtype, is it ok to ignore this or should I be proactively creating the context tensor as float16 if mixed precision is active?
It probably doesn't matter, your memory usage isn't optimized but it doesn't really matter
yeah it does take up more memory, but the alternative is substantially slower
I was mainly looking at this from a performance perspective because the main loop of the model is quite slow and I'm trying to squeeze out as much performance as possible
I tried doing %timeit and it does not seem to make a difference
tensor_32_hist = torch.zeros(1000, 1000, 1000, dtype=torch.float, device="cuda")
torch_16_in = torch.rand(1000, 1000, device="cuda")
def run():
for i in range(1000):
tensor_32_hist[:, i] = torch_16_in
%timeit run()
14.8 ms ± 250 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
tensor_16_hist = torch.zeros(1000, 1000, 1000, dtype=torch.half, device="cuda")
torch_16_in = torch.rand(1000, 1000, device="cuda")
def run():
for i in range(1000):
tensor_16_hist[:, i] = torch_16_in
%timeit run()
11 ms ± 138 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
i'm not at a scale where that few ms is going to make a difference lol
this is a more memory efficient solution and it is quite slow
torch_16_in = torch.rand(1000, 1, 1000, device="cuda")
def run():
hist = []
for i in range(1000):
hist.append(torch_16_in)
torch.concat(hist, dim=1)
%timeit run()
4.7 s ± 260 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)
even then it's eventually going to take up the same amount of space as the faster solution, but not for the whole loop
oh whoops I forgot to set torch_16_in to the Half type but it still doesn't make a difference when I do
Guys wth is markov model
did you look it up? this channel isn't a good replacement for a search engine. but if you already tried to look it up and don't understand something, this channel is a good resource for asking specific questions.
I am tryna understand it for 2 hours now LITERALLY
And this isn't helping anyway
it might help to learn what a Markov chain is first. that is, you have a model of some thing X that evolves over time. and the "Markov property" is that the probability distribution of X at the current time only depends on the value of X at the previous time.
Pr(X[t] | X[0], X[1], ..., X[t-1]) = Pr(X[t] | X[t-1])
a Markov model is broadly a probability model with that property
Is it something like the anchor affect where one guess affectsthe others to anchor near the first one
it's not about a guess, it's about the actual outcome
think of it like following directions on a map
Ooo
Sure
it doesn't matter how you got to where you are, right? all that matters is where you are now, and the next turn you need to make.
the next turn you need to make is independent of the 100 turns before you got to where you are
that's one example of the markov property
So next turn ain't dependent of the previous
yes, but it does depend on the current location
Oo
(this is actually how Kalman filtering works if you've heard of that, the Kalman filter is a Markov model)
Lemme check what it is
don't worry about it. it might be more of a distraction than a help, if you don't already know what it is
there is also a very important type of model called a hidden markov model where you don't actually observe the "state" X. instead of you observe something else "Y" that depends on X but is not the same thing as X
that is: X[t] only depends on X[t-1], and Y[t] only depends on X[t]
I'm already confused tbh
are you not very familiar with math notation?
ok. in any case stick with the Markov model for now, don't worry about the "hidden" version until you understand the regular version
Yeah okay thx mate cheers
you might want to look up "Markov chains" as well. it gets complicated quickly but the intro-level examples should be easy for anyone to understand
I was looking that up only but then when they started explaining nuclear fission with markov chain i rlly got confused
yeah, it can get hard to think about
it can be a challenging topic. remember to slow down and work through things step by step. a good textbook is worth more than 100 blog posts and wikipedia pages
Should I get a book on it
I found a book on it by Joshua Chapman and ima start it now
can anyone help me learn ai/ml
im beginner and im so fascinated to learn it
anybody their to help me
??
there are resources in the pins
I am also working on a LSTM+PPO, been having difficulty getting DDP/torchrun (that creates a run script) parellelization to utilize the full CPU. 1mil model params, 50 epochs, taking 10hrs a trial at 12% CPU on a 16core lol. The more DDP processeses it just uses more RAM and the same CPU lol. Any suggestions?
what does your dataloader configuration look like? this seems like the kind of thing that could happen if the loaders can't assemble batches enough fast for the model
also some details around the model could be relevant, I'm not familiar with PPO specifically but looking it up it seems like it's a RL thing, how are you computing the cost function and training the LSTM? I'm assuming it isn't possible to train with teacher forcing if the state needs to be modified and fed back into the LSTM at each timestep
train_sampler = DistributedSampler(train_dataset, num_replicas=world_size, rank=rank)
val_sampler = DistributedSampler(val_dataset, num_replicas=world_size, rank=rank)
train_loader = DataLoader(
train_dataset, batch_size=batch_size, sampler=train_sampler,
shuffle=(train_sampler is None), num_workers=num_workerz, #tried 4 too same result
pin_memory=True if device.type == 'cuda' else False,
persistent_workers=True if num_workerz > 0 else False
)
val_loader = DataLoader(
val_dataset, batch_size=batch_size, sampler=val_sampler,
shuffle=False, num_workers=num_workerz, #tried 4 too
pin_memory=True if device.type == 'cuda' else False,
persistent_workers=True if num_workerz > 0 else False
)```
actually this one isnt a ppo just an lstm
what about the model code?
the model.train or the model class or the LSTM data class?
how about just the constructor forward function and we can go more into it if necessary?
Click here to see this code in our pastebin.
I guess I'm just trying to figure out how the LSTM is being used and if it should be extremely optimized, or if you're doing a step-by-step thing
yeah this looks pretty straightforward
Then i launch it using a script & subprocesses wtih DDP
Click here to see this code in our pastebin.
wait what is this, it's a script that outputs a script and then runs it?
i'm assuming this was written by a LLM too
sry dood I don't really want to try to parse through this :\
good luck
all good lol
hey where's the resource page?
!res
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
this?
yes it used to be there in the # list?
gotcha, this is the machine learning channel, you might have an easier time navigating if you start in https://discord.com/channels/267624335836053506/267624335836053506
what do you guys think are the best coursera courses for learning AIML
or maybe is there one that isn't from there, or a playlist, anything?
mmm. someone does maybe know how to write an algo which will see relations between ordered number sets?
i mean, dataset like that:
8 77 90 34
6 55 80 17
44 99 100 51
22 111 70 34
...
as far i saw (by googling) it should be unsupervised model to assign tags to have relations but unsure here…
!rule 9 We don't allow offering payment
ah okay
maybe try to run it on a gpu ?
Like on google collab
what do you mean by relations exactly?
are you looking to cluster? this is probably what the googling results mean by unsupervised model, you'll get groups where data instances in same/different groups will be similar/different according to a metric
and using your 4 column dataset as example, if you're looking for something like trying to find if there's relation between say column 0 and 2, then have you tried just statistical measures like pearson correlation, info gain, etc?
i'm looking for something that will find relations between numbers in dataset
between which numbers?
are you trying to group different data points together? are you trying to see if some column A correlates with column B?
i'm trying to find correlations between sets of numbers (less or more random, like, for a game :) )
so it can for example see that in that example first numbers are dividable by 2 and last ones by 17…
I haven't heard of any off the shelf solutions that would do this at least
and like calculate some expressions like idk. 2n*7+20 (where n as in number in array of sets) for each piece of set…
and also, there are basically infinite possible rules you can fit to some random dataset
like you can find an unlimited number of 4-variable equations where half of your dataset would satisfy the equation and the other half not or something
i know…
like just wanted to make something like "akinator" but for maths where player inputs n numbers related and it says what should be next, then to keep results in file and make answers also personalized for player…
something like recently was added to excel i mean, doesn't need to be exact, as i say - it's just for a game…
you can make test manually mulitple rules and say you don't know if none of them work
I mean, make a few regressions, test some divisibility, check some easy known series like fibonacci
that kind of things
i'd rather have it say something than nothing even if is far from that number tbh…
I'd probably just do what tanguy said, set a few manual rules to check
at worst, you can just fit a n-degree polynomial to an n-number input and that will always produce a correct answer
I mean, if your rules didn't find it, then have it say a random rule, or the closest
not to say this can't be interesting though, when it comes to optimizing rule checking
for example, you can check gcd(a, b, c, d) != 1 for if the numbers are divisible by something (other than the trivial case of divisible by 1)
for a harder example, the berlekamp-massey algorithm will automatically find the shortest linear recurrence of a sequence of numbers (like the fibonacci sequence of F_n = F_n-1 + F_n-2)
thx
i am building expense tracking using llms. i discovered langchain and langgraph. which should i use? also should i use 2 llms, one for tool call and one for actual user interaction?
I'm planning a rather ambitious AI, and I wanted to get you guys' opinion
I want to build a diffusion model for magic the gathering decks. Diffusion is a great choice because it excels at learning the implicit structure of data while also exploring with maximum creativity. A deck of magic the gathering has an internal structure, but not one that can be easily explained in words
The problem with diffusion though is that can't output real cards. It'll output continuous values. Specifically, I'll pass in a 100 x D matrix where 100 is the number of cards and D is the number of values in each card's vector
I'll add noise, then denoise. During inference I'll have the use select some seed cards and diffuse the rest from noise
What I'm thinking is that to convert the diffuser's output to actual cards, I could use a transformer and iterative refinement
@final cobalt this makes sense to me.
so each individual output from stable difussion is an array of shape (D,)? and then you have to reshape that to the intended dimensions of the card?
I just figured that the dimensions of the desired image is an inference parameter.
Note: not stable diffusion, just diffusion in general
You can add noise to and then remove it from a tensor of any shape
I started the khan academy course for linear algebra and after this one i plan to also go trough both the one for statistics and calc but there's a lot of content and i realised that its gonna take a lot of time to learn this part. So i wonder if i should just focus on the math and nothing else for a while or also do something else as well? I learned the basics of pandas numpy and a bit matplot so far tho nothing too much
did a pandas project on whats the biggest winning factor in LOL and some other little programs as well
What do you guys use for parallellization, just DDP?
Linear algebra is hard fyi
It was the only class I've ever had to take twice - though in my defence, my first teacher spoke in broken English and was very difficult to understand
XD It was so stressful that one day I just stood up and walked out
ik a bit linear from hs but nothing too hard but ik its a really important topic for what i want to do so i'll need to spend a lot of time here
Anyway - you don't need the math to succeed in ML, but it definitely helps. My advice would be to learn the math at an appropriate pace but also work on your practical ML skills at the same time
so i should learn ML concepts as well as the math?
many sources recommend that u first learn the math but that will take a while so i came here to ask
i watched the first 2 videos and they were great but i thought i can go trough khan academy and use 3blue as a second resource to clarify and solidify the concepts i'll learn
the visuals are really helpful
I guess I'm just saying that if you wanted to you could start using ML libraries without learning the math first. Advance on two fronts
tbh i'll get into the math and try to get a good grasp of it since i wanna understand what im doing but also try some MLalgorithms and see how they work
"The Math" is three terms of calculus and at least one of linear algebra
but i'll be starting school next week and there's so much content to cover so ig i'll try to do 2h a day
So, three terms worth of work
i got linear algebra starting this year as a subject i think so that's great
thats true i'll most likely go to uni so it wont be a waste of time for sure
I have a special loathing for linear algebra
Though this might have more to do with that teacher I had than anything
thats very true
He was very, very Ukrainian
Very impatient, and spoke in very broken English
There was also this girl in the class who was very autistic. Now, don't think I've got anything against that, but one of her quirks was that when she was frustrated she just talked endlessly
ik a bit ukrainian since my language has simular words and way too many people in these regions speak broken english
So on one hand you could barely understand the teacher, on the other hand you had this girl talking non stop for the entire lecture
there's an autistic girl in my class and when shes nervous on exams she starts panicking which can be very distracting so i get u
ego is a huge factor
some teachers dont understand they are on the same side with the students
at least they should be
Oh!
Sadly he doesn't have a linear algebra course, but
This Channel is dedicated to quality mathematics education. It is absolutely FREE so Enjoy! Videos are organized in playlists and are course specific. If they have helped you, consider Support:
You may find and support me at Patreon.com/Professorleonard
Please consider "Whitelisting" this Channel on your AdBlock if it is enabled.
Your su...
Professor leonard (aka professor sexy, as I call him) is your god with respect to calculus
I didn't even bother going to my calculus classes. I just learned from this guy and showed up for the exams
ill check his chanel out
Glory unto professor leonard
some schools are trying to implement more collaboration but id think its working too well
it is, i hope more schools start doing that
thanks a lot for the answers
does anyone have any free courses that i could sign up for to learn more about machine learning beyond google searches
I strongly recommend ChatGPT
You can ask it to mak you one. Better yet, though, rather than trying to "learn ai" I suggest you try to build one
Autoencoders for images are a good start
You'll need a powerful GPU though
ehhh don't do this, if you don't already know about machine learning you aren't going to be equipped to protect yourself against things it tells you to do that are incorrect or unusual
if you're into books I strongly recommend "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow", and there are more resources pinned in the channel
You're not exactly wrong, but I would argue that learning from LLMs is a skill in and of itself
One that we'll all have to get good at
yeah, and a major part of working effectively with LLMs is understanding what you're asking about to the point that when it tells you incorrect things, you can recognize it and work around it
hey, i wanna use langchain/langgraph for my project. But since its paid service, can u guys give me a review about langchain and langgraph. Is it a good and recommended framework?
That’s what it’s best to understand FIRST and then use the LLMs to help or fill in knowledge later
definitely agree w you that you shouldn’t be using LLMs to learn at first, only to supplement
Langchain and langgraph is free and open source
What's paid is some of their logging solution but you don't need to use it to use langchain and langgraph
I've not tried it myself, but I do see a good amount of people who don't like langchain, most citing bad abstractions
I mean at the end of the day all you're doing is just organizing text to send to an llm, you probably don't even need any framework for that
I have a lenovo loq 15 with an rtx 4050 6gb, will it be enough??
perfectly fine for learning, you won't be able to do big complex models with it but you'll get a lot of mileage out of that before you hit a wall
also any study of machine learning is going to include the classical non-neural models like linear regression, SVMs, decision trees, and so on, and most implementations of them don't even use a GPU
and even then some smaller neural networks don't benefit from a GPU, you probably won't need it until you start studying very deep neural networks or any of the more complex types like RNNs and CNNs
Should be plenty!
I also want to suggest, again, starting with an autoencoder
Is summarizes the basic concepts you'll be working with - convolution, linear layers, the symbolic nature NNs
And it opens the door for lots of fun stuff, like using compressed latents are conditioning for other models, diffusion, etc
If you're going to investigate non neural methods, I recommend GOAP
It basically means making a graph which works as a map from current state A to desired state B (and scoring each edge on the way from A to B), and then using A* pathfinding to find the shortest path
Bonus: nodes along the path can be made to degrade into primitives when prerequisites aren't met
GOAP isn't machine learning? I wouldn't consider GOAP to be an alternative to a neural network, it seems like it's in an entirely different problem domain
not to say it couldn't make use of machine learning, but just that in a typical sense of looking at what machine learning is trying to accomplish, a support vector machine or decision tree is more similar to a neural network
tysm!!
Is there someone who can help me? I have an AI that I built in Google AI Studio that I'm trying to connect to streamer.bot - I have no idea if I have the coding or syntax right, I'm completely new to this.
what kind of model for which task?
if you just want to have machine learning for the sake of having it, you can use something simple from scikit-learn
if you want large language models (aka chatgpt-ish chatbot), use an API such as Google Gemini's instead of trying to host it locally
Hey folks, any resource recommendation for anomaly detection in time series data? (Prometheus)
i just wanna find a good place to figure out how to figure out how machine learning works
100% agree. Main advantage is the personalization where the LLM can adapt to the style of learning of the user while also being relatively factual.
XD They really are unavoidable these days
A sensible attitude, but it seems LLMs have won out
It seems you can even rely on them as complicated if statements, filters for data, in living code
The only question these days is having a reasonable GPU
Also, why tf am I awake at 7AM!?
I'm building an MtG platform
And it seems ChatGPT/DeepSeek is capable of acting as an AI opponent
Magic the Gathering
it doesn't look like you need a LLM for that
make actions in a card game
s** sorry that was not for you then
my bad
I'd like to reiterate my amazement and displeasure
Why tf am I awake at 7AM!?!?!?!?!?
As a rule that's good reasoning
But I drank 7 very strong beers last night
I should be conked
it starts to be a little off topic....
Sure, here are simple solutions to solve your alcohol induced circadian rhythm disruption:
- Stop drinking
- Stop sleeping
Would you want an example of how to use these solutions or maybe some solutions to fix your drug addiction-induced cholesterol?
Hi people, I'm new to AI programming, having learned Python these last few years mostly to gain access to ML libraries such as Torch and Tensorflow.
I want to explore writing real time AI agents to play games, similar to this now 10 year old video about using neural networks to play Mario. https://www.youtube.com/watch?v=qv6UVOQ0F44
However, I wanted to ask if anyone is experienced in this area and would recommend the best way of going about this in the modern era (not 10 years ago). What libraries are favored, are there any good resources for this topic?
I want to explore building and training AI agents to play relatively simple 2d games at a high level of skill, with my highest goal being that they be competetive against high skill human players in PVP.
When I first began researching this, OpenAI playground was a common recommendation, but that was quite a while ago.
So far I have found myself favoring working with Tensorflow, as it's being integrated with Unity, indicating an emphasis on game development. I am not a Unity developer myself though.
Using the high level API Keras, I have gotten the impression that it is not much harder to prototype in than with Torch.
Architecturally all I really know is that I will be serializing my complete game state and running it separately from the rendering, so that a neural network can have parameters to act upon to achieve the goal, and so the game simulation can run quickly to facilitate many repetitions and training.
Anyways any recommendations greatly appreciated. Thanks.
MarI/O is a program made of neural networks and genetic algorithms that kicks butt at Super Mario World.
Source Code: https://gist.github.com/SethBling/598639f8d5e8afb5453a0b9519be51ff
"NEAT" Paper: http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf
Some relevant Wikipedia links:
https://en.wikipedia.org/wiki/Neuroevolution
https://en.wik...
If you want to do like what you see in the video, you'll want to start studying reinforcement learning
my usual recommendation for this is starting with the simplest formulation of it, which is multi-armed bandits, and that will get you started on really important concepts like rewards, regret, and the exploration/exploitation tradeoff
you can play simple games like tic-tac toe with Q-learning, which can be done with or without neural methods, and will introduce you to more complex action selection
I can recommend a textbook but I don't know if you're intending to do a very rigorous study of it or if you're wanting to be more hands on
this is a great library though: https://github.com/Farama-Foundation/Gymnasium
note that this is just reinforcement learning which I have some experience with, but the mario thing you posted was made with genetic algorithms, which isn't quite the same thing
If you decide to study reinforcement learning, you'll find a lot of examples online using this library for learning to play a bunch of different games and toy problems
hey can i get some guidence on making a chatbot for a project
Study reinforcement learning: https://mitpress.mit.edu/9780262039246/reinforcement-learning/