#data-science-and-ml
it’s like 20 terabytes all together
I have a master's, so what PhD-level math do I need, according to the previous message(s)?
and do the people who implemented these scikit-learn things come from applied math? I mean the more advanced ones, not the ones you learn at university
so these books like math for machine learning or math for deep learning?
lmao, yeah I'd get an external hard drive and then use PILGRIM and OS to get the files one at a time when needed during runtime
doing preprocessing on the data might also help but it would take a while for 9 million images
So it would be a bad idea to store them on s3? why?
keep the kernel size 3x3, very rarely you need anything else
don't use stride or dilation (unless you are doing wavenet)
just add padding of 1 (if I remember correctly).
this way you keep the dimensions the same between layers, so you can add them ResNet-style.
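A quick way to sanity-check the "3x3 kernel, padding 1" advice is the usual convolution output-size formula, out = floor((n + 2p - d(k - 1) - 1) / s) + 1. A minimal sketch; `conv_out_size` is a made-up helper name, not something from a library:

```python
def conv_out_size(n, kernel=3, stride=1, padding=1, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# 3x3 kernel, stride 1, padding 1: the size is unchanged, so residual adds line up
for n in (28, 64, 224):
    print(n, "->", conv_out_size(n))

# a 5x5 kernel needs padding 2 to keep the size; stride 2 roughly halves it
print(conv_out_size(64, kernel=5, padding=2))
print(conv_out_size(64, kernel=3, stride=2, padding=1))
```

This only covers plain convolutions; transposed convolutions (as used for upscaling in GANs) follow a different formula.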
I hate doing this math too
its way easier and faster and permanent
i have no idea what the cost of s3 cloud storage is but i do know its a monthly cost and a hard drive is only an up front cost
I mean sth like lars,omp etc
Orthogonal Matching Pursuit (OMP) and Least Angle Regression (LARS)?
I'm not sure how they are particularly relevant?
a quick calculation I did shows that it costs about $460 for a month of standard S3 at 20 terabytes
and a 20 terabyte hard drive costs about $800-900
so if you're going to be using it for long-term storage it's better to buy the hard drive I guess
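Putting the two quoted figures side by side as a rough break-even estimate. This is only a sketch using the numbers from the messages above; it ignores request/egress fees on the S3 side and backups or drive failure on the hard drive side:

```python
import math

s3_monthly_usd = 460                 # figure quoted above for 20 TB of standard S3
drive_cost_range_usd = (800, 900)    # figure quoted above for a 20 TB drive

def breakeven_months(drive_cost, monthly_cost):
    """First whole month at which cumulative S3 spend reaches the drive price."""
    return math.ceil(drive_cost / monthly_cost)

for cost in drive_cost_range_usd:
    print(f"${cost} drive pays for itself after {breakeven_months(cost, s3_monthly_usd)} month(s)")
```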
Ur Indonesian?
Yes
is there a PDF of the user guide, or at least a single HTML page of it? I mean scikit-learn, I want to skim it
I want to ask about interviews
I'll take your advice, for now after long time I ended up with this:
I'm doing GANs, so I used a 5x5 kernel and a stride of 2 in the transposed convolutions to upscale each time. I kept the padding as 'same'.
btw, I'm not familiar with GAN, so I could be wrong. Generally speaking, unless there is a very specific reason, why not just use the existing architectures?
what architectures? If you mean the layers and structure, I want to learn how they work by building things myself
man I don't know if this will work, even Kaggle's P100 GPU ran into a memory error. Reduced my batch size for images from 32 -> 16 -> 8 now. Even this is taking too long.
~3 minutes for each batch
I hope the result will be good
I accidentally added cmap='gray' to see results every epoch 💀
It's already been through 8 epochs for 25 minutes, I can't change it now
i was thinking why it's all black and white
That's true, but personally I would prefer to get started by
- finding a few famous architectures,
- going through the source code to make sure I absolutely understand every single line and WHY,
- tweaking them to see what works / fails, usually to answer the why question: why do they do it this way and not some other way? Let's try the other way and see. Sometimes you figure out why they do it that way, sometimes you just become an inventor.
Going from scratch is very useful, but also painful.
that's a great way to learn as well, to each their own. I won't stick to my method for too long if the results aren't as good. I'll listen to yours if I need so
What have you learned so far?
I put nearly everything I learned so far online https://www.arianprabowo.com/research-and-publications https://scholar.google.com/citations?user=ozZvUN4AAAAJ
it's just approx. 400 pages related to supervised and unsupervised learning, so it's doable to read
the rest is examples and API reference, 2.5k pages in total
What about storing it in Glacier
oh I see you are in geometric deep learning, so to do, for example, deep rendering I need a scene graph, so a GNN?
I'm not sure if this question is addressed to me. And if it is, I still don't understand the question.
yes, for example if someone does deep rendering, do they need a scene graph for it?
I mean when it's just one object, a CNN is enough
Worried that If I store it on a hard drive it will get corrupted
this is interesting
but for the scene he uses just a CNN
and he says in the readme that to make triangles you need an RNN (so sequences)
that would take forever to get each file at runtime no?
Yeah this is my concern
Because the files have to be redownloaded right?
I'm not sure how to answer your questions, but I have a few comments.
Firstly, when I say geometric deep learning, I usually refer to non-euclidean geometry.
Secondly, I am not familiar with neural rendering. I have read some papers and I think they are really interesting, but I have never used it, so I can't make any practical suggestions.
Finally, it seems that the best approach is using radiance fields instead of CNNs https://paperswithcode.com/task/neural-rendering
Hey, I want to evaluate the quality of my document corpus. Quality means that it should provide information, be coherent, etc. My corpus could be in any language. For the moment I tokenize my text and compute Shannon entropy, but I want to measure it in a better way
If someone could help me I would be very grateful
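For reference, the entropy baseline described above could look roughly like this. It is a sketch only: whitespace tokenisation and the `normalized_entropy` helper are assumptions, and token entropy measures lexical diversity, not coherence or informativeness:

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Entropy in bits of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_entropy(tokens):
    """Entropy divided by its maximum, log2(vocab size), giving a value in [0, 1]."""
    vocab = len(set(tokens))
    if vocab < 2:
        return 0.0
    return shannon_entropy(tokens) / math.log2(vocab)

doc = "the cat sat on the mat the end"
tokens = doc.split()  # assumption: naive whitespace tokenisation
print(shannon_entropy(tokens), normalized_entropy(tokens))
```

Normalising by vocabulary size makes scores from documents of different lengths and languages a bit more comparable, though it is still a crude signal.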
yeah, that would be an issue
if you want speed you need to have direct access to them
so neither of the deep storage models will work for you
you might be able to use the infrequent access model though, but I think you'd be using standard if you're going to be using the images a lot for training
and if that's the case, secondary storage connected to the computer is a lot easier to work with and cheaper over time as it only incurs an up-front cost
but its up to you what suits your needs
How do I ensure data doesn’t get corrupted
lead box /j
not sure, i havent worked with that amount of data before
but if it's just training data, I think adding a small detector before loading each file would work well enough
because if a single file amongst 9 million gets corrupted, as long as you can stop it getting into the network, then you should be fine
To detect if the file is corrupted?
during the preprocessing stage of loading the file for use in training etc.
when loading and processing it, if it was corrupted it would cause a runtime error
so place some tests to check and stop those types of files, and then continue with a different file
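The "catch the error and move on to a different file" idea could be sketched like this. `load_skipping_corrupt` and `load_fn` are hypothetical names; the actual decoding function depends on your image library:

```python
import logging

def load_skipping_corrupt(paths, load_fn):
    """Yield (path, item) for each path that load_fn can read; log and skip the rest."""
    for path in paths:
        try:
            item = load_fn(path)
        except Exception as exc:  # broad on purpose: any decode error means "skip this file"
            logging.warning("skipping %s: %s", path, exc)
            continue
        yield path, item

# toy stand-in for a real image decoder, just to show the control flow
def fake_decoder(path):
    if "broken" in path:
        raise OSError("image file is truncated")
    return f"pixels of {path}"

for path, item in load_skipping_corrupt(["a.jpg", "broken.jpg", "b.jpg"], fake_decoder):
    print(path, item)
```

With Pillow, `load_fn` could be something like `lambda p: Image.open(p).convert("RGB")`, which forces a full decode; treat that as an assumption to verify on your own data. Logging the skipped paths also gives you a list of files to re-download later.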
So here’s what I want to do:
I have a bunch of images of cans, I want to segment just the can and embed that image for similarity search later so if someone uploads a can it’ll find the exact brand etc.
I was thinking yolo to draw the bounding box around the can (some images don’t have cans at all), then SAM to segment it.
Does this approach make sense? Or is there a better way
that relu function looks weird, usually you would implement it as np.maximum(x, 0)
arent they effectively the same thing? or is it doing some vectorisation shenanigans im missing?
you can also write it as np.where(x > 0, x, 0) but that's a bit slower afaik
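A small comparison of the NumPy spellings in play here. Note that the element-wise function is `np.maximum`; `np.max(x, 0)` treats the second argument as an axis and reduces along it instead:

```python
import numpy as np

x = np.array([[-2.0, 0.5], [3.0, -1.0]])

relu_maximum = np.maximum(x, 0)      # element-wise max against 0
relu_where = np.where(x > 0, x, 0)   # same values, built from a boolean mask

print(relu_maximum)
print(np.array_equal(relu_maximum, relu_where))
print(np.max(x, 0))  # NOT relu: maximum of each column, shape (2,)
```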
That's so weird, I had to think about that for a little bit
i forgot i was messing with the mod function during the day lol
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def relu(x):
    # scalar version, kept for reference:
    #   if x > 0: return x
    #   return 0
    return np.where(x > 0, x, 0)
def leaky_relu(x):
    # scalar version, kept for reference:
    #   if x > 0: return x
    #   return 0.5 * x
    return np.where(x > 0, x, x*0.5)
def activationfunction(x):
    f = 0  # 0 = sigmoid, 1 = relu, 2 = leaky relu
    if f == 0:
        return sigmoid(x)
    elif f == 1:
        return relu(x)
    elif f == 2:
        return leaky_relu(x)
def sigmoid_derivative(x):
    # only correct if x is already the sigmoid output, not the pre-activation
    return x * (1 - x)
def relu_derivative(x):
    # scalar version: 1 if x > 0 else 0
    return np.where(x > 0, 1, 0)
def leaky_relu_derivative(x):
    # scalar version: 1 if x > 0 else 0.5
    return np.where(x > 0, 1, 0.5)
this is what i was using before hand
Ok, that makes a bit more sense
and then it just loops, constantly outputting [ 0, 0] with a loss value of 0.5
can you check the distribution of all the weights during training?
not currently
maybe a boxplot of them using matplotlib
or just calculate and print the min/max/mean/stddev
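Printing those summary numbers could look like the sketch below. `weight_stats` is a made-up helper, and `layers` stands in for whatever list of weight arrays the network keeps:

```python
import numpy as np

def weight_stats(layers):
    """Return min/max/mean/std and the NaN count for each layer's weight array."""
    stats = []
    for i, w in enumerate(layers):
        w = np.asarray(w, dtype=float)
        stats.append({
            "layer": i,
            "min": float(w.min()),
            "max": float(w.max()),
            "mean": float(w.mean()),
            "std": float(w.std()),
            "nan": int(np.isnan(w).sum()),
        })
    return stats

rng = np.random.default_rng(0)
layers = [rng.normal(size=(2, 3)), rng.normal(size=(3, 2))]  # stand-in weights
for row in weight_stats(layers):
    print(row)
```

Calling this once per epoch (or every few batches) makes it easy to spot the step where values blow up or turn into NaN.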
https://paste.pythondiscord.com/EXIQ
heres the initialisation weights and biases
hmm, what are all those 1's at the end?
oh wait i got it to print the data during runtime and everything immediately gets set to NaN for some reason
the biases
oh, is that standard practice?
idk, it's just what I do, works fine for sigmoid
We can check back with it once the NaNs are gone I guess
i set the biases to zero
cause i had them as zero before
and the NaNs are gone
oh wait i had it running sigmoid nevermind
(L106) I think it's because it's dividing by zero here ```py
derivativeA = -(target / activations[-1]) + (1 - target) / (1 - activations[-1])
```
since it's really common for the activation to be 0
I'm not sure how one would fix that though
well thats kinda silly
Let me refresh my memory on backprop real quick
Make an ANN from scratch
```py
# AA = previous layer, A = current layer
# W = weights, B = biases
# dX_dY = del X / del Y
def backprop_layer(W, B, Z, AA, dC_dA):
    # Z = W*AA + B
    dZ_dW = AA; dZ_dAA = W; dZ_dB = 1
    # A = activation(Z)
    dA_dZ = activation_derivative(Z)
    dC_dW = dC_dA * dA_dZ * dZ_dW
    dC_dAA = dC_dA * dA_dZ * dZ_dAA
    dC_dB = dC_dA * dA_dZ * dZ_dB
    return dC_dW, dC_dB, dC_dAA
``` I think this was the gist of it?
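On the divide-by-zero in the cross-entropy derivative mentioned above: one common workaround is to clip the activation away from exactly 0 and 1 before dividing. A sketch, with `bce_derivative` and the `eps` value as assumptions rather than anything from the original code:

```python
import numpy as np

def bce_derivative(target, activation, eps=1e-7):
    """d(BCE)/d(activation), with the activation clipped into [eps, 1 - eps]."""
    a = np.clip(activation, eps, 1 - eps)
    return -(target / a) + (1 - target) / (1 - a)

# an activation of exactly 0 no longer produces inf or NaN
print(bce_derivative(np.array([0.0, 1.0]), np.array([0.0, 0.0])))
```

If the output layer is a sigmoid, the combined derivative of the loss with respect to the pre-activation simplifies to `activation - target`, which avoids the division entirely. Binary cross-entropy also expects outputs strictly between 0 and 1, so a ReLU output layer is likely to keep hitting this case.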
@odd stratus how did you come up with the formula in your code?
mish mashing a bunch of stuff from youtube cause the math was bonkers to try understand the first try lmao
also wdym formulat?
is that a typo or smthn
I would approach it by implementing backprop for one layer then taking it backwards till the first layer
I think this is pretty close, except for some matrix multiplications here and there
the main problem was trying to make it so that it can scale to have any layer sizes and depth like i wanted
once you implement it for one layer, you can call it in a loop to make it general
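To make the "one layer, then call it in a loop" idea concrete, here is a deliberately tiny sketch: every layer has a single unit with a sigmoid, and the cost is 0.5 * (output - target)^2. All names are illustrative; a real network would replace the scalar products with matrix products:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, biases):
    """Return the activations [a0 = x, a1, ..., aL] for a chain of 1-unit layers."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(w * activations[-1] + b))
    return activations

def backward(activations, weights, target):
    """Walk the layers in reverse, applying the chain rule once per layer."""
    grads_w, grads_b = [], []
    dC_dA = activations[-1] - target          # from C = 0.5 * (a_L - target)^2
    for layer in reversed(range(len(weights))):
        a, a_prev = activations[layer + 1], activations[layer]
        dC_dZ = dC_dA * a * (1 - a)           # sigmoid'(z), written via its output a
        grads_w.append(dC_dZ * a_prev)        # dZ/dW is the previous activation
        grads_b.append(dC_dZ)                 # dZ/dB is 1
        dC_dA = dC_dZ * weights[layer]        # dZ/dAA is W, handed to the layer below
    return grads_w[::-1], grads_b[::-1]

weights, biases = [0.5, -1.2, 0.8], [0.1, 0.0, -0.3]
acts = forward(0.7, weights, biases)
print(backward(acts, weights, target=1.0))
```

The loop body is the same three products as the single-layer version; only `dC_dA` is carried from one iteration to the next.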
hmm this is just chain rule
Yes, backprop is essentially backwards chain rule: https://www.3blue1brown.com/lessons/backpropagation-calculus
to go from D to A you go to C,B like in graph
where D is end A is start
so DC, CB, BA is DA
there is reference in grokking machine learning about these multiplying of partials etc
Appendix B Math behind gradient descent
yes this is just calculating partials and substituting and multiplying
i don't like kaggle notebooks
my 3 hours of gpu "memory error"
i had saved checkpoints but after reloading they weren't there
hi
how would i train an AI to speak like a friend of mine? he gracefully supplied me with some of his writings (hes a literature nerd) and i thought it would be funny to train an AI that could imitate his works
train from scratch??
( need to learn more then )
OR
use pre-trained models!
pre-trained sounds like a good idea. i have like 20,000 words to train it on
20000 words??
I guess it should be context related for you right?
then only try a simple text model and train for your context
what does that mean
like for example, suppose I am training a model which will act as my resume chatbot, so if you ask it about me or my skills, it will give you that info
this is considered "context", to make it personalised
ah ok i get it now
how and where do i find one of those?
ahh, search that
or if you get more confused share here, so that others can also help you
about that particular model
whats up @unkempt apex ? Long time no see. What have you been working on these days?
done, road extraction from satellite images
you said, we will do something together?, why you were not online these days?
@rich moth ???
Learning AI in University, anyone have a good youtube channel for learning fundamentals?
Currently learning efficient tree / graph searches. Using pruning and cost eval functions.
Working on stuff like game theory, minimax, alpha-beta pruning, etc. So like basic basics
basic basics? Check out CS50 for AI
Like a Senior in college who is taking his first ai courses. Taken lots of theory and algorithms classes, but never really worked / developed in AI. Ty, Will Check it out
https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi is also a nice primer
what knowledge is needed to understand this
Is 85-15 class balance in a binary classification problem bad enough for logreg to predict all 0s?
if so, how would i solve this?
Unbalanced data only affects the model intercept. If you can guess the true class proportions, or god tells you, you can apply the correction to the intercept
Also look at the ROC to help w/ thresholding
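A sketch of what an intercept correction can look like, assuming the model was fit on a sample whose positive rate (`train_pos_rate`) differs from the real one (`true_pos_rate`), for example after undersampling to 50/50. The function names are made up; the shift itself is the standard log-odds prior correction:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def corrected_intercept(intercept, train_pos_rate, true_pos_rate):
    """Shift a logistic-regression intercept from the training class balance to the real one."""
    return intercept - logit(train_pos_rate) + logit(true_pos_rate)

def predict_proba(score_without_intercept, intercept):
    """Sigmoid of (w . x + intercept), with the w . x part passed in directly."""
    return 1 / (1 + math.exp(-(score_without_intercept + intercept)))

# a model fit on a balanced sample, deployed where only 15% are positives
b = corrected_intercept(0.0, train_pos_rate=0.5, true_pos_rate=0.15)
print(b, predict_proba(0.0, b))
```

If the model is fit on the full 85-15 data there is nothing to correct; in that case moving the decision threshold (guided by the ROC curve) is usually the simpler lever.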
so, I should just pick out a balanced sample and use that for training
or at least that'd be the easiest way
There's no need. You could use an ensemble method if you want
such as RFC?
xgboost and if you want to explore it more, you can look at using different weights for each class
that assumes your goal is the highest prediction accuracy, the model will pretty much be a black box
Im just trying to explore a few different algorithms
xgboost is very commonly used in industry
and trying to understand why some dont perform that well, such as log reg. My first assumption was the class imbalance
there could be other reasons such as not having a linear correlator with the features
you could look at something like decision trees as well
would tf be worth trying as well?
tensorflow doesn't mean anything
We took a big vacation and I just focused on some other projects around the house. Oh, I got Baldur's gate 3 with some work buddies, that took over a month of my life. Ya, I did! Always looking to work on something, I recently started tinkering again with that capture the flag game using pygame and ML to train ai agents using q-learning and some other stuff. Also still messing around with the AI model that can learn and generate images using captions
did you work on that game? on what part?
Ya, I made a lot of changes, still debugging it though. I got caught up training this model and haven't had a chance to mess with it since Monday.
The players are supposed to learn from the environment now, and also interact with obstacles and collaborate with teammates
if anyone is familiar with transformers.pipeline, is there a way to natively map a pipeline over multiple inputs?
using a threadpool works quite well, but i'm wondering if there isn't already a built in way
```py
from concurrent.futures import ThreadPoolExecutor
from transformers import pipeline
model_pipeline = pipeline(
    "text-classification", model="model"
)
with ThreadPoolExecutor() as executor:
    results = executor.map(model_pipeline, list_of_strings)
``` better than this i mean
oh, i can just pass the list to the pipeline it looks like
anyone use darts for time series forecasting
Is it normal to forget parameters and functions?
has anyone worked with Labelbox? I'm trying to export my annotated images. I don't want to export in JSON format, I want mask images
Hey, how can I use a set of multiple CSV files as the training dataset to feed into my LSTM network?
Or in other words, I want to use multiple CSV Files as training data for LSTM. How can i do it?
I do not want to concatenate all the CSV Files
Are you using hugging face for the CSV files or are they already in a directory?
When you got time to take a look at it we can build a github page or something together.
they are already in a directory
which tools should I use to create a multi-class segmentation dataset?
You can create something with torch.utils.data . Check out Dataset and DataLoader.
Thanks
prolly create a custom data loader?
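One way to avoid concatenating the CSV files is to read them one at a time and cut sliding windows inside each file, so no sequence ever spans two files. The sketch below uses only the standard library; `iter_windows`, the all-numeric columns and the header row are assumptions about the data:

```python
import csv
from pathlib import Path

def iter_windows(csv_dir, window, has_header=True):
    """Yield (inputs, target) pairs file by file; windows never span two files."""
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            if has_header:
                next(reader, None)  # skip the column names
            rows = [[float(v) for v in row] for row in reader if row]
        # each window of `window` rows predicts the row right after it
        for start in range(len(rows) - window):
            yield rows[start:start + window], rows[start + window]

# usage sketch: for inputs, target in iter_windows("data/", window=30): ...
```

The same logic can be wrapped in a `torch.utils.data.IterableDataset` (or precomputed into a map-style `Dataset`) so that a `DataLoader` handles batching.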
What are the specifics for your project in terms of multi-class segmentation?
That's what I'd look into.
my project is image segmentation and I'm using U-Net for it. I have more than one feature per image, so I'm looking for some tools to create the dataset
I have tried labelbox, apeer etc
but none of them is giving me the desired result
thank you 🙂
Torch, numpy, labelImg and check out opencv
Try labelImg
not really sure
ig i have tried labelimg
hmm... vgg image annotator?
yeah I have tried it. I think I'd have to go through a long process in that case: VGG annotator will give me the image annotations in COCO JSON format, and after that I have to convert them into masks.
im still getting a bit lost here tbh
i kinda suck at backpropagation
i understood the forward propagation lmao
is Leetcode still relevant for DS/ML/AI or is that mostly asked for SDE roles? I’d like to know if I’m wasting my time grinding LC
can you use another tool to convert the mask format into something that works for training a unet model?
pycocotools? not sure : \
mostly just asking for internships
which tools u would recommend me?
ok
They are essential for getting the job. LC is absolutely useless after getting into the job
Im interested in hearing more about what you've been working on.
Has anyone seen @Lisan Al Gaib
Does anyone know what that is? It imports my libraries, but I'm scared it will crash when I have the astronomy Olympiad with computers in a week.
read message
We'll call C the quadratic cost function; it's also sometimes known as the mean squared error or just MSE.
I'm confused. Both of them are MSE but different?
nvm i got it
finally i'm able to achieve it. I don't know how much it's correct. Let's see. Thanks for your help
thanks
hee left!
anyone knows a good library or downloadable model that I can use in python for converting speech to text?
whisper is probably the best one, but you need a GPU to use it.
yea and also that I don't wanna leverage open ai 😭
thanks
it is open source and completely local, I wouldn't even consider it leveraging open ai?
(also not sure if I'd consider openai significantly worse than amazon, meta, google etc. - you should leverage open source as much as you can regardless of its source imo)
okk
Can whisper work offline?
if it's on your computer, then yes
true
You could use it via an API, in which case you don't need a GPU, nor do you have to download model weights or run anything resource intensive yourself, or you can download and run it locally.
If you download and run it yourself, you do not rely on any online services (after downloading everything) at all
Thanks @agile cobalt @serene scaffold
???
I'm actually a beginner
you can watch this video without anything
this video teaches every common detail for you
Recommend me something
I learned Python basics 2-3 years ago and I would like to revise the Python topics and learn machine learning like a professional
and I searched and found a roadmap for ML
and I'm following the steps in the ML roadmap I found
I found the roadmap on this channel
@tiny bluff I'm trying the basic understanding with stats and SQL first before touching python
is there any1 who is familiar with qlearning that could help me in how to pick my alpha, gamma, epsilon and epsilon decay? Im not sure how to determine what values they should be
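There is no universally right answer for these values, but it may help to see where each one enters the algorithm. A tabular sketch; the defaults (alpha 0.1, gamma 0.99, epsilon decaying from 1.0 to 0.05) are only commonly used starting points and assumptions here, and all function names are made up:

```python
import random

def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Exponentially decayed exploration rate, floored at eps_min."""
    return max(eps_min, eps_start * decay ** step)

def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the best known action."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) a fraction alpha toward reward + gamma * max Q(s', .)."""
    best_next = max(q[next_state].values()) if next_state in q else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

q = {"s": {"left": 0.0, "right": 0.0}, "t": {"left": 1.0, "right": 2.0}}
print(q_update(q, "s", "right", reward=1.0, next_state="t"))
print([round(epsilon_at(n), 3) for n in (0, 100, 500, 1000)])
```

Roughly: alpha controls how fast old estimates are overwritten, gamma how much future reward counts, and the decay rate how many episodes are spent exploring. A practical approach is to pick the decay so epsilon reaches its floor somewhere around the middle of training, then tune from there.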
Hi guys, what topics are typically required for ML interns to be confident with it
look up positions you would apply to and see what they're asking.
you'll probably want at least some statistics, linear algebra and basic numpy syntax/usage though
What else does Mosh do?
I don't know actually, I only deal with machine learning, but you can search
Hello
This is a very basic question but I am still in the earlier stages of wrapping my head around the relevant details.
I'm a soon to be second year AI and Datasci student engaged in the RSNA 2024 Lumbar Spine Degenerative Classification purely for the learning curves.
https://www.kaggle.com/competitions/rsna-2024-lumbar-spine-degenerative-classification
A peer of mine, perhaps correctly, says that we have to split the images into training, test and validation classifications. He wants to do this using code that randomly selects images and puts them into any one of the 3 categories.
However the competition already presents testing and training datasets with, I'm sure I remember correctly but couldn't find the documentation that details it, a final unseen set of images that it performs the classification on so as to determine the effectiveness of the model.
Also nowhere in the EfficientNet sample can I see anything that does that classification.
https://www.kaggle.com/code/charlesexiaviour/rsna-efficientnet-starter-notebook
I think I am right here in that, in terms of testing and validation, the images are already classified, and it's only through a dictionary that some of the images need the conditions and planes added to them.
Thanks for any and all help, any clarification will help a great deal.
is anyone up to challenge to code some sort of algorithm that analyses students requirements (14 students for now) and creates schedule (monday - friday, time 13:00 - 21:00 with 15 minutes break.) i can send you chart with the information from the students (with false names, only time will be correct nothing else)
Adding to what Etrotta said...
I remembered gathering the job description of about 9 companies I wanted to intern for, then I used a spreadsheet to track the common skills mentioned by those companies.
This gave me a clear idea on what my area of weakness was and what I needed to further improve on.
Is there any way to parse this which degrades gracefully under morphisms? https://www.partnersincareoahu.org/vacancy-grid-2024
What if "but what about the poor AIs" is merely a sophisticated metaphor for "but what about the middle class"? https://www.marktechpost.com/2024/08/21/megaagent-a-practical-ai-framework-designed-for-autonomous-cooperation-in-large-scale-llm-agent-systems
What's the difference between scikit-learn and other machine learning libraries?
scikit-learn helps you to train, evaluate and run inference using a bunch of 'traditional' ML models such as linear regression, decision trees, and random forests
pytorch / tensorflow / keras are focused specifically on Neural Networks, though they support a lot of different architectures for them
there are a few dozens of others somewhat popular libraries you'll see, and hundreds of niche libraries
e.g. numpy can be used for nearly any operation involving multi-dimensional arrays (vectors / matrices / so on), jax is similar to numpy but includes automatic differentiation, transformers & diffusers are focused specifically on running inference for popular models, and there are a lot of libraries that are just wrappers on top of others
they also have varying levels of support for running things on the CPU vs GPU, but I'm not gonna go into detail about that
i want to improve the old code i made its a simplified NEAT (on my bio) and i have no idea how to do it someone please assist me
Find a roadmap with actual videos and lessons, including projects. @tiny bluff @spare forum
Is that better roadmap?
Just don't be afraid to start tbh, there is no absolute roadmap, resources, etc. Every time I've spent time searching for roadmaps and such, nothing ended up getting done; every time I applied freestyle learning, I did projects etc. and learned the most
anyone have good sources to learn order precedence?
we got given this but i have no clue what this is trying to say
i assume down the list = order
but is there a reason bitwise not is higher up than the others?
precedence is highest topmost, lower precedence as it goes downwards
each column tells you what that operation applies on
so bitwise and happens before things like logic operators?
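Assuming the chart is Python's precedence table (other languages differ, e.g. in C `==` binds tighter than `&`), a few one-liners make the ordering visible. The variable names are only for illustration:

```python
not_first = ~1 & 3            # parsed as (~1) & 3: unary ~ binds tighter than &
and_first = ~(1 & 3)          # parentheses force the & to happen first
cmp_after_and = 6 & 3 == 2    # parsed as (6 & 3) == 2: & binds tighter than ==
forced = 6 & (3 == 2)         # what you get if the comparison goes first
logic_last = 1 & 0 or 5       # parsed as (1 & 0) or 5: and/or/not come last

print(not_first, and_first, cmp_after_and, forced, logic_last)
# 2 -2 True 0 5
```

Unary operators like `~` sit near the top because they attach to a single operand before any binary operator gets to combine it with something else.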
no, roadmaps in general are pretty bad
the best way to learn is by doing projects
Tbh everytime ppl search for roadmaps for weeks and end up doing very little
what does variational mean? I relate it with probability and some prior, is that good thinking?
Sounds like someone wants their homework done for them
So, basically I trying to install intel-extension-for-pytorch, But I'm encountering huge errors, Full log: https://paste.pythondiscord.com/JBBQ.
Any solutions?
You do not have a C++ compiler it seems
Did you install G++ or some other compiler? @mystic ruin
I think I don't have
do that 😛
or just look in some glossary?
hi
How can i make gpt2 model to generate questions from answers? I have list of text messages and random conversations, I'm trying to convert them to Q-A type
Type your answer: The capital of france is paris.
Generated Question: Given the following statement, generate a relevant question: 'The capital of france is paris.'.
"If you use the word paris, you may get a similar answer. The word is a synonym for 'posterior, adverbial, pungent, repugnant, distressing,
objectionable'. But if you use the word adverbial, adverbial, pungent, repugnant, distressing, objectionable, you will get the same
prompt = f"Given the following statement, generate a relevant question: '{input_text}'."
what am i doing wrong??
Expected output: What is the capital of france?
@proven inlet gpt2 isn't instruction-following like ChatGPT is
But doesn't chatgpt use gpt in it?
It just keeps generating text that's probable to follow whatever you pass to it
ChatGPT is a gpt model that's tuned to be interactive and instruction following
oh
How can i tune a gpt model to chatbot with texts but not Q-A types?
chatgpt used text mostly to train afaik
For a time series prediction which model would be more ideal? other than LSTM
Gpt is trained entirely on text. There is nothing that has any meaning to gpt except text.
You'd probably have an easier time with a "small" language model like mixtral 7b
Yes gpt is a LLM
I know that.
I'm actually trying to build a chatbot, so a small language model would not be enough I guess
Mixtral is better at instruction following than non-chat gpt models
It's still a "large" language model. But the L in LLM is meaningless now.
can i finetune gpt2 to become a basic chatbot?
You don't have enough training data or time for that
is 5k list of messages enough for that
Not even close.
Oh.
The amount of training data and compute time required to create and tune these models is astronomical
That's why only large companies like meta are putting out LLMs. Everyone else is innovating by finding creative ways to prompt them.
just out of curiosity, could I train gpt2 with pure text but not Q-A type? With 1B different texts, e.g.
How many words?
over 5B
And what would you be training it to do?
Chatbot
So you'd be fine tuning it to produce text that follows a certain structure. Namely dialogue structure
Which is what ChatGPT is
You might be able to do it with that many words.
But they don't have to be Q-A format or do they?
Like can i use training data for wikipedia and books
But not dialogues
If you train it on Wikipedia, it will generate content that's structured like a Wikipedia article
And it probably won't behave naturally if you ask it a question in a conversational way
But it also won't be repeating me right? When i ask for what is the capital of France for example
will it continue my sentence?
if not, what makes it not continue the sentence?
If you prompt gpt2 with "the capital of France is", it will probably finish the sentence correctly.
You have to tune it on text that is structured as the kinds of interactions that you want to have with it
But you probably don't have enough data or compute time for that
So you should probably use an existing language model that is interactive, like mixtral
Okay thanks, I'll use mixtral
Guys I want to know whether aws provides any free services which can be used in ml?
Free trial with limited access, not like free forever
(AWS sagemaker)
gcp provide free credits for new accounts which is 300€ equivalent
How long is that access for?
I believe 1 year
After that they charge?
Yes
Have you tried azure?
Nope mainly aws, gcp and databricks (just for learning)
So for a year I can use it using one email and create another account with another email to get another year for free?
You still have to put in a credit card and such, so probably not so easy, and the usage is very limited; pretty much it's only okay for side projects and learning
may I use tensorflow on windows on python 12
you have my permission
also Python 12? 
anyway, apparently on Windows the latest TF versions only get GPU support through WSL, because they dropped native Windows GPU support after TF 2.10? not entirely sure, but something along those lines
basically yes, but only through WSL
yeah it works only on python 11 not python 12 I'm not sure why
I personally am only on Python 3
@wooden sail @iron basalt just want to confirm my understanding here is correct. The attention model does Q * K to update "words that represent each other". The model has no actual understanding of this. What it's doing is changing the weights so the Q * K (attention between each words) becomes better over time. This is simply a matter of running dot product on all words in the corpus numerous times to find a relationship between them. This relationship can somehow be captured by dot product attention, because that represents cosine similarity, but ultimately the reason the model can converge to this representation is because backprop will adjust the weights of the model to better create Q and K vectors. When the model makes a mistake, it will adjust the weights, do Q * K again, and the newest iteration of Q * K will be a slightly better "relationship" capture between words
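A minimal single-head sketch of the computation being described, with random matrices standing in for the learned projections (`w_q`, `w_k`, `w_v` are what backprop would adjust in a real model). Two caveats on the wording above: the raw dot product only matches cosine similarity up to the vector norms, and attention is computed between the tokens of one input sequence at a time rather than over the whole corpus:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention; x has shape (tokens, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of every query with every key
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                                   # 4 tokens, toy embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))   # stand-ins for learned weights
out, weights = attention(x, w_q, w_k, w_v)
print(out.shape, weights.sum(axis=-1))
```

Training never edits the attention weights directly; gradients flow back through the softmax into `w_q`, `w_k` and `w_v`, and the next forward pass produces slightly different scores.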
you cant use cuda with python 3.12 on windows without wsl
use 3.11 if you dont want to deal with wsl
This is from my first epoch on the multi-modal learning system I've been working on, where I’m combining a VQ-VAE model for image reconstruction with feature aggregation using CLIP for text-image alignment, and BLIP for generating descriptive captions. So far the results seems promising
what is wsl?
linux inside windows?
yup
Open the microsoft store and search for WSL
I want to start somewhere before doing any projects. Need some knowledge
it's pretty easy to install these days. let me know if you have any questions
is it heavyweight or requires a lot of set up? I'm planning to change my pc so if it is as I said I'll do when I get the new pc
Plunder is back!
Not at all. You enable a few things like Hyper-V I believe, but after you enable those options in Windows you can install a bunch of different distro types
why not to directly use Linux?
Whats up buddy! Just watching some tv, got my model training right now. What are you doing?
( it's morning here ) just hop onto PC, now will learn about BERT, any suggestions?
I think you can even just install WSL from the MS store and it enables it for you.
After that you can search for the distro and version you want
I messed around with a few BERT models .(https://huggingface.co/docs/transformers/model_doc/bert)
What are you trying to do with it ?
sounds easier, let me search
there's a lot of "Ubuntu" results, which one do I use
surprising, huh?
wait it says "Ubuntu 22.04.3 LTS" is already installed
open a terminal and type wsl
guess it's already installed ```bash
wsl
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
```
wonder how
just to learn!
depending on how you installed wsl, it comes with ubuntu by default
cool beans! ya Im not sure how it got installed, but lemme know if you got any questions. It works great. Its nice having the option to do both in one place.
Oh that reminds me I was going to setup a plex server on my laptop.
I'm sure I was running into errors when I used tensorflow on python 3.12, which is why I installed 3.11
weird I tried running tensorflow on 3.12 venv now it isn't raising any errors now
this was the error I had some days ago:
thanks a lot helping me out!
i wanna show my new GAN I created (based on scenery images)
looks great.
on which dataset did u train this?
it's a competition dataset from kaggle, I don't remember the exact name, it has pics of scenery
nah its for non school things
I was trying to make a c++ implementation of the BM25 information retrieval algorithm and make a wrapper to it using cython, and was comparing my results against those from this library https://github.com/dorianbrown/rank_bm25
Interestingly, for one of the variants, BM25L, the results I got were different, and after quite a bit of debugging it turned out that if I copy the source code of the library and then run the tests, I get the same results. I get different results only when I use it as a pip package, and I was very curious about the reason for such behavior.
It turns out, after comparing the code of the pip-installed package against the source code on GitHub, that there was a small difference in the formula used. I don't know how pip packages are made, so it is still a mystery to me how such an error happened, but this seems to be the reason, unless someone here can shed more light on it.
I totally know what that means
hey @hot obsidian can tell me about what thing i have to learn in data science or have you source where i can learn it
data science is very broad
there are resources in pinned
are there any free inference options?
as in LLMs? run local; some on openrouter are also free if you want to try that
there are very 'small' models like gemma-2b, phi, minitron-4b, etc. that don't need that good of a GPU (the 3 mentioned above can all run comfortably on a 4 GB VRAM card with quantization)
CPU inference is also an option if you're desperate, then you're not limited by the GPU, but CPU clock speed & ram & ram speed
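A back-of-the-envelope check of the "fits in 4 GB with quantization" claim, as a minimal sketch. It counts only the weights (the KV cache and activations need extra room), and the parameter counts are assumed round numbers, not the exact sizes of the models named above.

```python
# Rough weights-only memory estimate.
# The parameter counts below are assumed round numbers for illustration.
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

for name, params in [("~2B model", 2e9), ("~4B model", 4e9)]:
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: {size:.1f} GB")
```

By this estimate a ~4B model does not fit in 4 GB at 16-bit, but its 4-bit weights take roughly half of it.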
are there any libraries to get text embeddings?
I'd assume libraries that focus on inference would allow you to do that
so check ollama, transformers(huggingface) ig
I can't find any docs for transformers module, do you know where to find them?
thanks!
The transformers module is specifically for hugging face?
it's maintained by huggingface (& community) and has easy integration with it
I see.
damn 800 floats for a single text
quite a big vector
I was planning to try making a small text model using embeddings and conversations data
I mean each base model should have differing embedding sizes
found this leaderboard https://huggingface.co/spaces/mteb/leaderboard
might be helpful to you
ohh cool
both LLMs and image detection models, I tried to run them locally on my laptop but it’s just not good enough. Tried hugging face inference endpoints but it kept declining my card so Im looking for a free alternative
I mean you're not gonna get "good free" models so
why lend you compute for free when they can ask for a subscription / pay per token
precisely, how do you go about running some model with the help of flask. Like a simple input output type of web application for lets say an image detection model
dunno, you'll have to ask someone more knowledgeable for specifics
but I don't imagine it to be too different from everything else
Thanks alot, i'll look for tutorials while Im at it
is there any recommendations to get a team to work with on any pet project and way to run projects not on PC?
Where is federated learning actually used?
There are some projects like https://github.com/bigscience-workshop/petals, but idk how widely used they are in practice though
I see it spoken of a lot in things like EHR records
I also saw a use case of a streaming service using it for their recommender system
@agile cobalt I wanted something an application where hardware is used
what do you mean?
like for instance applying federated learning on an edge device for instance
like this
the amount of processing power micro controllers have is really low compared to GPUs... you'd need thousands of them in order to match one GPU used in data centers, and the latency & amount of data you'd have to transfer between them makes it pretty impractical
even running inference on micro controllers is already hard
you might be able to continuously fine-tune a small model in a micro controller, but I wouldn't expect to see anyone using them for federated training
To be fair, with those specs like 4GB RAM that doesn't really look like a microcontroller, that's a SBC, like a Raspberry Pi, for example
its a microprocessor
This is my best run yet, just on the first epoch. The colors and shapes actually look decent, and the loss from all the components is steady. This is my best version so far.
someone please help me with some tutorial or good book to get started with data science
this platform is perfect for what I've been working on. I've developed a text-image multimodal model that's just 60 MB, so it's ideal for embedding and staying lightweight. It integrates CLIP for text-image alignment, BLIP for text generation, and SentenceTransformers for embeddings
did anyone use WALDO? I'm having trouble finding the model files ( like they do not exist )
this WALDO btw https://github.com/stephansturges/WALDO
spent months building a object detector neural network library from scratch to finally achieve this holy
Hi, what are the main issues people usually face with data scientists? From the client's side of things
I thought I'd do some research since I don't have enough data/experience about it myself
"client's side of things"?
The ones hiring/in need of the data scientists' services
wherever you look you'll find pretty biased views in multiple ways, but maybe try looking at some freelancing offers & some Kaggle competitions
I see. Are you a data scientist?
Great to hear
any idea on where i can get a corpus of python-related words? for now i've resolved to extracting things from the source code directly like imports, function names, assignments but i would like more general stuff
opensource ??
for a starter I'm thinking of using single numbers to represent each word instead of vectors (text embeddings)
are there any existing algorithms to convert words to a number? I want to make my own encoder/decoder to go back and forth easily
first thing that hit me was using indices and ascii of each character, math operations on it to come up with unique numbers for each word
then it hit me there might be cases where it's not unique as well
Define python related?
well stuff that occurs in python or in the docs i guess
You could use AST to parse the stdlib and grab whatever you want?
But I think your question is: does such a corpus already exist
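A minimal sketch of the AST idea, assuming you only want imported modules, function/class names, variable names and attribute names. `collect_names` and the sample source are made up for illustration; to build a bigger corpus you could run it over every `.py` file in the stdlib directory.

```python
import ast

def collect_names(source: str) -> set[str]:
    """Collect imports, function/class names, variable and attribute names."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.add(node.module)
            names.update(alias.name for alias in node.names)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names

# Tiny made-up module used as input
sample = (
    "import os\n"
    "from collections import Counter\n"
    "\n"
    "def count(path):\n"
    "    return Counter(os.listdir(path))\n"
)
print(sorted(collect_names(sample)))
```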
i've already done something quite similar, but it doesn't quite grab a lot
correct
my google fu fails there
My answer is, not that I know of. Maybe someone else can pitch in 😄
i've gone so far as processing the source code of programs i'm reading and building small corpora out of them but still not quite enough sadly
What are you trying to do?
removing gibberish from LLM output
correct output contains a lot of python terms, so i also use the python corpus to filter out what's not gibberish
Well done
So this is from 10 epochs. Everything seems to be improving gradually. Its learning, but its slow going. I might need to play with the learning rates a bit more but i think Its gonna take a long time to train
impressive, it looks like it will reconstruct the same image for same prompts, is that expected?
is vae from scratch hard to do?
I saw for example building in keras but its rather simple and its was not from scratch
now I'm reading an introduction to variational autoencoders from Kingma, Welling
anyone have a large plain text file for LLM ?
containing? I have looked for a conversations dataset on kaggle, it's a csv btw
just a large amount of text is all
web scrape bunch of Wikipedia and save it on text file
i just copy pasted the lord of the rings lmaoo
project gutenberg maybe, alice in wonderland etc., not sure
You can start with a list from wikipedia https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research#Internet
This is publicly available https://pile.eleuther.ai/
The Pile is a 825 GiB diverse, open source language modelling data set that consists of 22 smaller, high-quality datasets combined together.
is 825 GB large enough?
oh very cool
toooo large lmao, ive only got 20GB of storage on my computer lmao
Well, I guess you want a small one then? hahaha
I am trying to setup pytorch for my A770 GPU, I followed the docs, got this error when importing pytorch:
```
PS C:\kanemoto\vscode\llm> python .\main.py
Traceback (most recent call last):
  File "C:\kanemoto\vscode\llm\main.py", line 1, in <module>
    import torch
  File "C:\Users\kanemoto\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\__init__.py", line 139, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\kanemoto\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
```
The `backend_with_compiler.dll` exists in its path.
I have the latest Microsoft Visual C++ Redistributable installed.
Any idea?
yeah, just large enough so that it can get a bunch of speech, but not too large that its gonna take a while or break the computer
nah doesnt satisfy my hunger
@odd stratus u building an LLM?
yeah, im gonna try to
How about this https://data.commoncrawl.org/crawl-data/CC-MAIN-2024-33/index.html ? Is it big enough?
how did you install pytorch? #packaging-and-distribution might also help
I used this command to install the libraries:
```bash
python -m pip install torch==2.1.0.post3 torchvision==0.16.0.post3 torchaudio==2.1.0.post3 intel-extension-for-pytorch==2.1.40+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
```
(from the docs)
did you pass the sanity check?
hey guys should I buy collab pro and cloud storage for training? Is it worth it?
Depends on ur requirements
why do u need colab pro, in the first place?
aha 69TB nice it's perfect
me too, are you gonna convert text to embeddings?
Hi, so I want to classify if a number between 1 and 100 is even or odd. Now I want to achieve that with the most simple MLP.
```py
import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self):
        super(SimpleClassifier, self).__init__()
        # One input node, two hidden nodes, one output node
        self.hidden = nn.Linear(1, 2)  # From input to two hidden nodes
        self.output = nn.Linear(2, 1)  # From two hidden nodes to output

    def forward(self, x):
        # Forward pass: input -> hidden layer (ReLU activation) -> output (Sigmoid activation)
        x = torch.relu(self.hidden(x))  # Apply ReLU to the hidden layer
        x = torch.sigmoid(self.output(x))  # Sigmoid to get the output between 0 and 1
        return x
```
I don't get much better than 50% accuracy i.e. guessing. :D
Here's my training loop:
```py
def train_model(model, criterion, optimizer, dataloader, epochs=100):
    for epoch in range(epochs):
        epoch_loss = 0.0
        for inputs, labels in dataloader:
            # Zero the parameter gradients
            optimizer.zero_grad()
            # Forward pass
            outputs = model(inputs)
            # Compute loss
            loss = criterion(outputs, labels)
            # Add L1 regularization over all parameters
            l1_loss = 0
            l1_weight = 0.001
            for param in model.parameters():
                l1_loss += torch.sum(torch.abs(param))
            loss += l1_weight * l1_loss
            # Backward pass and optimize
            loss.backward()
            optimizer.step()
            # Accumulate loss
            epoch_loss += loss.item()
```
What could I improve? I really wanna keep the MLP this simple
hmm maybe it's just not possible mathematically? I basically have two linear functions, I wouldn't know how I could do it by hand
You need an activation function
there's relu
maybe I just write my own using modulo and basically hardcode it ^^
Aha, I didn't see that. Then you likely need to increase the number of parameters
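A rough way to see why more parameters are needed: with one input and two ReLU hidden units, the value going into the sigmoid is a piecewise linear function of x with at most three linear pieces, and each piece can cross the 0.5 decision threshold at most once. So the predicted label can flip at most 3 times over the whole number line, while even/odd on 1..100 needs 99 flips. The sketch below checks this on random weights; the weight ranges and the helper names are arbitrary choices for illustration.

```python
import random

def tiny_mlp_logit(x, w1, b1, w2, b2):
    """Pre-sigmoid output of a 1 -> 2 -> 1 ReLU network for a scalar input x."""
    hidden = [max(0.0, w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def label_flips(labels):
    """Count how often the label changes between consecutive inputs."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

xs = list(range(1, 101))
parity = [x % 2 for x in xs]
print("flips needed for even/odd:", label_flips(parity))

random.seed(0)
worst = 0
for _ in range(2000):
    # Arbitrary random weights; sigmoid(z) > 0.5 is the same as z > 0
    w1 = [random.uniform(-5, 5) for _ in range(2)]
    b1 = [random.uniform(-500, 500) for _ in range(2)]
    w2 = [random.uniform(-5, 5) for _ in range(2)]
    b2 = random.uniform(-500, 500)
    preds = [int(tiny_mlp_logit(x, w1, b1, w2, b2) > 0) for x in xs]
    worst = max(worst, label_flips(preds))
print("most flips any random 1-2-1 ReLU net produced:", worst)
```

Feeding the number in a different form (for example its binary digits) changes the picture, since the last bit alone decides parity, but that is a different input encoding than the single scalar used here.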
I just need some model that has very distinct grads
But not so much so as to memorize which number is even and odd
yeah of course but I wanted to make a minimal example for something. I wanted to find a classification that for a given input has very distinct gradients
yes, I think the "or one of its dependencies" part is the issue (re: the pytorch error)
I'm curious about how well neural networks can extrapolate anyway
Lots of chance it will not work with larger numbers
I couldn't even think of how to do it manually but anyway found something else that might work
it's compulsory right ?? to convert text to embeddings?
ya, its expected. when you give it the same input, it should reliably produce similar outputs.
it took a couple months, but I added a manifold autoencoder and attention aggregation as well as clip and blip to help with the text-image alignment and caption generation and enough trial and error to kill a horse
who made it compulsory? we can use our own ways if we wish to, not that I know any other ways yet
nice
compulsory means, effective way to pass tokens!
btw what are other ways also?
besides embeddings?
yeah
nothing I can think of as effective, not really. But what if you stacked embeddings of tokens from a sentence or sequence to form a larger image-like structure
ah so you mixed things
exactly
I'm asking because I see the calculus-of-variations (variational) inspiration, and wonder if it is as difficult in code as in the math formulation
there is a lot of derivation
for example I saw a derivation of q or p on Wikipedia, I don't remember which
i bypassed a lot of complexities using a vector quantizer to represent the latent space
I am having some troubles with tensorflow. I am loading tf_flowers dataset using tensorflow_datasets. The moment I run the jupyter cell and load it, 1.9 GB of 4 GB of my dedicated VRAM gets used which was all free before, the total size of the dataset is just around 233 MB, Also, when I try to train some models with single dense layer only and 128 neurons, I get ResourceExhaustedError saying Out Of Memory while only 2.1 GB of my dedicated VRAM is used and 1.9 GB is still left. How do I deal with this without restarting the kernel each time?
Hi talents, I installed the Anaconda 2024 version and I'm using Jupyter Notebook. It's too slow, does anyone else have this problem?
I used the old version and it's not as slow as this
Too many variables to comment, but it's unlikely just changing anaconda version had any measurable impact. What are you doing with your notebook?
That's pretty vague
why ? what do you mean ?
He's asking you to be specific.
I started on kaggle Few days ago what do I need to know before starting the titanic competition
I just finished the introduction to programming course by Alexis Cook
do you know the story of the titanic, and how life boat seats were allocated?
domain knowledge really shines here.
I know the story and I would consider going back to refresh my memory on it
I may even try seeing the movie again
After that what's the next step?
the movie won't help.
OK then, YouTube resources would do right?
you only need to understand how it was decided who would get on the lifeboats.
and you should be able to manipulate tabular data with pandas to highlight those determining factors.
I don't know that course. you might do the kaggle pandas tutorial.
Will the knowledge on the introduction to programming do?
.
OK thanks the help, I'll keep you updated on the development
what are embeddings?
im just gonna try go letter by letter
You got me thinking of a different type of technique. Instead of passing standard embeddings, I'm stacking them to create an image-like representation. I made a CNN that reshapes the embeddings into a 2D grid and applies convolutions to extract patterns and integrates it with the image data. I integrated it into the project I've been working on and it's training now
Thats cool but seems a bit counter intuitive to me, since the intuition behind convolution is that it gives you information about the neighbors of an "anchor datum". In other words, it would give you information relating to the position of the embeddings on the grid and the neighbors surrounding your anchor, which doesnt really make sense for embeddings in the same way it would for pixels. But I'll be interested to see if the results you get are good nonetheless.
Is there some specific reason you built it like this like it's used in a paper or are you just throwing stuff at the wall for research?
the attention mechanism should (hopefully) be taking care of the relationships between the words already
good point, but the reason I'm trying this is I'm hoping the CNN can learn to capture the higher level patterns and relationships between the token embeddings even if it's not strictly spatial. I don't know if it will pan out, but I figured it's worth a shot.
Gotcha
you might want to look into how the "standard" embeddings are constructed
so im new to a.i. what sort of layers and systems should i be implementing and using?
share the results! (just like you always do), it will be interesting to see them
not really, figuring it out
💀 well no
I hope you know what tokens are in LLMs
each token gets converted into vectors of n dimensions
basically an array of n dimensions containing floats
two tokens with same meaning will have similar vectors, such as boy and male
when you perform math operations you will quite often get the same result
example:
distance = King - man
now we can use it this way:
woman + distance
which is equal to Queen
watch 3b1b video of Deep learning, then you will understand it deeply!
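A toy version of the King/Queen arithmetic, as a minimal sketch. The 3-d vectors are hand-made for illustration (axes roughly: royalty, maleness, femaleness); real embeddings are learned and have hundreds of dimensions, and the analogy only holds approximately there.

```python
import math

# Hand-made 3-d vectors, purely illustrative (not from any real model)
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# distance = King - man
distance = [k - m for k, m in zip(vectors["king"], vectors["man"])]
# woman + distance
candidate = [w + d for w, d in zip(vectors["woman"], distance)]

# The word whose vector is closest in direction to the result
nearest = max(vectors, key=lambda word: cosine(vectors[word], candidate))
print(nearest)
```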
This is original text
priknik horn red electric air horn compressor interior dual tone trumpet loud compatible with sx
and this is tokenized from BERT normal tokenizer
```
'##k',
'##nik',
'horn',
'red',
'electric',
'air',
'horn',
'compressor',
'interior',
'dual',
'tone',
'trumpet',
'loud',
'compatible',
'with',
's',
'##x']```
is it good? and why is '##' being added to some of the pieces?
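On the '##': BERT's WordPiece tokenizer splits a word it doesn't know into smaller pieces that are in its vocabulary, and every piece that continues a word (rather than starting one) gets the '##' prefix. That is why 'sx' comes out as 's', '##x'. Below is a minimal sketch of the greedy longest-match idea; `toy_vocab` is a made-up vocabulary, not BERT's real one.

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the '##' prefix
            if piece in vocab:
                break
            end -= 1
        if end == start:  # no piece matched at this position
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical tiny vocabulary for illustration
toy_vocab = {"horn", "trump", "##et", "s", "##x"}
for word in ["horn", "trumpet", "sx", "zz"]:
    print(word, "->", wordpiece(word, toy_vocab))
```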
oh yeah, i remembered seeing something like that but i had no idea how it works
Those vectors are created by some ways idk, but you can use the same text embeddings from an already existing open source model
watching through the 3b1b videos
i know what i want to do and how it works, i just dont know how to do it or what i would need to do to start doing it
does the a.i. learn the vectors itself through training, or are the vectors premade upon loading into the perceptron?
I was thinking of using it and then I saw the size of one vector for one of the models was "800", it's huge to me
the vectors are premade based on existing data
interestingg
i have no idea how i would use or gain the vectors though lmao
using existing vectors is what we can do, I still think they are large
@odd stratus when you said letter by letter, are you passing in ascii values? How are you going to pass them?
ascii values squashed to the scale of 0-1 or smthn
and the output can be a 128 vector output or smthn the a.i. can chose from
Im sure it'll struggle creating words. Passing words instead might be better.
I was thinking of coming up with some algorithm that converts words to numbers and back, still thinking
I have a really bad idea
I make a list of words, everytime I come accross a new word I append it
and the indices will be the values I pass in to train the model and get the output
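That "append new words to a list and use the index" idea is basically a vocabulary lookup, and it gives you the encoder/decoder for free. A minimal sketch, where the `Vocabulary` class and the whitespace splitting are assumptions for illustration:

```python
class Vocabulary:
    """Maps each new word to the next free integer id, and back."""

    def __init__(self):
        self.word_to_id = {}
        self.id_to_word = []

    def add(self, word):
        # Append the word only the first time it is seen
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.id_to_word)
            self.id_to_word.append(word)
        return self.word_to_id[word]

    def encode(self, text):
        return [self.add(word) for word in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)

vocab = Vocabulary()
ids = vocab.encode("how are you how are things")
print(ids)
print(vocab.decode(ids))
```

Unlike a formula over ASCII codes, the ids are unique by construction. Keep in mind the ids are just labels: word 3 is not "bigger" or "closer to word 4" in any meaningful way, which is one reason models usually turn them into vectors before doing math on them.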
Is there any solution to this...?
so restarting session works?
but then slowly slowly as you move forward ( run more code ) , it gives you this error right?
Yes
elaborate more what are u doing in that code!
I mean, how u are loading dataset and all
are u using Dataloader class?
I am just using tfds.load for tf_flowers dataset with batch_size 8.
It is. giving a Dataset object.
Doing some normalisation on image, and training a sequential model with a flatten layer 64 neuron dense layer and softmax output (used Adam optimizer).
only these?
Here is the code to load dataset:
```py
import tensorflow as tf
import tensorflow_datasets as tfds

BATCH_SIZE = 16  # Later changed to 8 but could not solve the problem
IMG_WIDTH = 128
IMG_HEIGHT = 128

builder = tfds.builder("tf_flowers")
builder.download_and_prepare(download_dir=r"D:\tensorflow_datasets")
train_ds, test_ds = builder.as_dataset(
    split=["train[:80%]", "train[80%:]"],
    shuffle_files=True,
    batch_size=BATCH_SIZE
)
class_names = builder.info.features["label"].names
print(class_names)

def preprocess_images(image_batch):
    # Scale the uint8 pixels to floats in [0, 1]. This has to happen before resizing:
    # resize already returns float32, so converting afterwards would not rescale anything
    images = tf.image.convert_image_dtype(image_batch["image"], tf.float32)
    # Resize the images
    images = tf.image.resize(images, (IMG_HEIGHT, IMG_WIDTH))
    # Format expected by the `fit` method
    return (images, image_batch["label"])

prepared_train_ds = train_ds.map(preprocess_images, num_parallel_calls=tf.data.AUTOTUNE)
prepared_test_ds = test_ds.map(preprocess_images, num_parallel_calls=tf.data.AUTOTUNE)
```
Model code:
```py
model2 = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(len(class_names), activation="softmax")
])
model2.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=["accuracy"]
)
```
I later changed the dense layer neurons from 64 to 16 to resolve the error, but I couldn't.
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
I am currently not able to reproduce the error, but from a previous training, here is the error:
ResourceExhaustedError: {{function_node __wrapped__Mul_device_/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:Mul]
I copied it from my GPT prompt where I first asked about this problem. I am unable to provide the full traceback.
not able to reproduce the error?? what?? then share the current error
Yeah, the situation is very random...
have u tried all this?
ResourceExhaustedError docs?
it says reduce batch size
Raised when some resource has been exhausted while running operation.
so out of memory, as I supposed
but he says he have 4gb vram
Well, I tried that, but it turns out that tensorflow might be keeping models in GPU memory until the kernel is shut down/restarted...
So, reducing batch_size didn't work for me.
because it is helpful for ourselves
wait wait, try the same notebook on kaggle or colab
not familiar with tf, but maybe
```py
prepared_train_ds = train_ds.map(preprocess_images, num_parallel_calls=tf.data.AUTOTUNE)
```
this part's doing copies and so your gpu can't hold all of the data?
wtfff
I also tried: tf.keras.backend.clear_session() but didn't release the memory.
cause python would have to hold both train_ds and prepared_train_ds (and the test ones)
For example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space. If running into ResourceExhaustedError due to out of memory (OOM), try to use smaller batch size or reduce dimension size of model weights.
Well, on my physical disk the whole dataset size is around 233 MB, but it uses 1.9 GB of my GPU memory when I load it.
assuming it's copying, if you did like
```py
train_ds = train_ds.map(preprocess_images, num_parallel_calls=tf.data.AUTOTUNE)
```
the unprocessed data could be garbage collected, which would reduce memory use
dunno
maybe the data is compressed so when you load it it takes more memory than it might seem
Yes
and all of that is just 278 mb
no it's only 221 mb
another option as I said, try to run the same code on colab now
with the GPU they provide
profiling would be helpful I think
Okay I am trying...
if the error doesn't occur there, upgrade your GPU then 😂
I must try pytorch
While loading the dataset in Colab it takes no GPU memory, the usage remains constant to 0.1 GB out of 16 GB but in my system it instantly consumes 1.9 GB of dedicated GPU VRAM (I have RTX 3050 with 4 GB dedicated VRAM). Why that might be...?
is porting from torch(lua) relatively easy to pytorch?
because I see some deep render in torch and want do it in pytorch
ordinals?
trying to improve my image generation model, looks good from epoch 8 :)
im using ordinals for the letter inputs and outputs
looks better and better I think
And I am using TensorFlow 2.10, as that is the last version with native GPU support on Windows.
to find bottlenecks etc., use a profiler
what's that?
I assume ordinal numbers but not sure
ordinal numbers and encoding text doesn't relate
hmm but I saw somewhere this term ordinals, forgot where
maybe OrdinalEncoder
looks like make sense
to preserve inherent ordering
hm
using GAN?
is it giving any error on collab?
No, not till now....
I suppose
which model? are u using , I have tried U-Net!
wdym? GAN doesn't explain it?
u using GAN now right?
yeah
how's your structure of Generator then?
bunch of CNNs
generally people make similar to CNN
yeah that's what I mean, but we can also make it similar to U-Net
that's a new one
cGAN !
conditional GAN? I guess I made my number model that way
yup
!e
print(ord("E"))
print(chr(69))
:white_check_mark: Your 3.12 eval job has completed with return code 0.
001 | 69
002 | E
well that's ASCII
I don't think it'll be effective
dunno I haven't tried
btw a question
_____________
I need outputs from a neural network from a set of numbers, which each represent a word. How can I make it that the network only outputs from the set I have defined?
Example:
I have the set:
```py
[0.1, 0.2, 0.3, 0.4, 0.5]
```
The output:
```py
[0.3, 0.1, 0.4]
```
or
```py
[0.3, 0.1, 0.4, 0, 0] # padding on the right
```
The output size isn't fixed, since a conversation response can be of any size.
How can I go about making such an output layer?
Where did those numbers come from?
Look up "one hot encoding": each word is encoded as a vector of all 0s, with 1 in the position corresponding to the word. So your input sequence is a sequence of vectors, not of numbers.
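A minimal sketch of one hot encoding for a tiny vocabulary. The word list is hypothetical; the point is that each word becomes a vector with a single 1, and decoding is just finding the position of the largest value.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1 at the given position."""
    vec = [0] * size
    vec[index] = 1
    return vec

# Hypothetical vocabulary for illustration
words = ["hello", "how", "are", "you", "<pad>"]
word_to_index = {word: i for i, word in enumerate(words)}

sentence = ["how", "are", "you"]
encoded = [one_hot(word_to_index[word], len(words)) for word in sentence]
for word, vec in zip(sentence, encoded):
    print(word, vec)

# Decoding: the position of the largest value gives the word back
decoded = [words[vec.index(max(vec))] for vec in encoded]
print(decoded)
```

The same decoding step works on a network's output even when the values are not exactly 0 and 1, which is how the output gets restricted to words in the set.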
I think they mean the output should only output those in the set
I will be encoding the words into numbers, I want to input numbers for learning and experimenting purpose.
Thanks for the idea on one hot encoding, I didn't think I could use that here. For now I still want to try on numbers first before vectors
its getting a 90% accuracy after 30000 epochs lmaoo
wait, 30k epochs seriously? 😂
i have no idea if thats good or not but it only takes about 30 minutes
im basing the concepts off of the image generating a.i's where the a.i. only needs to predict one letter at a time to create a full "image" but the image being the output text
and my input data is the entire movie scene script for The Fellowship of The Ring lmaooo
ive restarted a few times and trained my model a bit in each to see how it works
it seems to follow two trends
- it repeatedly outputs a single letter after initialising but around epoch 3000 it starts choosing different letters
- it either
a. starts getting everything correct
b. or it starts averaging results and getting incorrect output
could just be random initialising data
but it does get really accurate results when its initialised data is lucky lmao 50/50
lol have you tried it out?
can you show it's responses
30k in 30 minutes is fast ngl
testing different stuff to get it to do a full generated output
these are the layer sizes [101,1000,300,500,500,258] output layer is 258
for example I think I should not do sth like "added vae from scratch" but rather more modular messages?
like added encoder
added decoder
or not use git would be faster
when I'm not using git I code faster
I know how to use it but dont know when, at what messages to have
for a sec I read 101,1000,300,500,500,258 as a single 20 digit number 🤣
jus say whatever you did
how do we utilize tensorflow gpu on pycharm
i tried every possible way, but I can't find the right solution
writing git messages is like naming variables 😂
lmaooooo
Correct, but there's no way to do that. One hot encoding is how you do that. On the other hand, if the output is a real number within some range, there are things you can do to constrain the range of the output. But you can't put arbitrary constraints on the output beyond that. If you try, you run into some fundamental trickiness of the real numbers, among other problems
Unfortunately, one hot encoding is precisely how you encode a fixed set of numbers in a model. You are mapping words to integers, and then mapping those integers to elements in a vector. There are other ways to do it that are mostly used in research fields like psychology, but for the purposes of machine learning they are equivalent, so one hot encoding is preferred because it's the simplest and easiest to interpret
It looks like you're trying to use numbers other than integers, maybe decimal numbers within some range? Consider that 0.1, 0.2, ... 1.0 are identical to 1, 2, ..., 10 -- you just divide everything by 10
So without loss of generality, you can always transform a finite set of numbers to natural numbers counting up from 1 or 0 as desired
It turns out that this is true even for the rational numbers. digging into that is the content of a course in real analysis
I hate to tell you not to experiment with something, but at least hopefully you understand now why people do what they do (and don't do what you're trying to do)
Sorry to derail Salt's excellent explanation, but a bit of a question around training LLMs or at least, looking for guidance around what approach to take:
I'm currently looking to try to build a model that predicts the next set of relevant tokens, up to N tokens for Y variants, where N and Y are small (think maybe 10 at most), and where it is trying to predict the most relevant tokens based on an input training dataset that varies in size.
I guess it technically falls under generative AI but it has some caveats:
- The aim is not to produce accurate grammar or longer sentences, just tokens.
- The system does not want to do KNN or other semantic search type of logic to get the most relevant tokens, i.e. RAG is out of the question.
I haven't tried it yet, but I wondered if you could take some basic encoder-decoder model and fine-tune it on the new dataset, forcing it to generate the tokens related to that dataset only. But I'm not sure if that is the right or most efficient way to do so.
What do you guys suggest for mlops? ZenML, MLFlow or something else?
Had a good experience with both MLFlow and Neptune (SAAS)
What do you think about ZenML?
Haven't tried it so can't really say
Could you just generate embeddings for your new dataset and insert them into the pretrained model?
here's a relevant paper on this technique, although they were testing different languages https://openreview.net/pdf?id=MsjB2ohCJO1
I have reconsidered after the way you explained it. I know how to use one hot encoding to provide input, but what about the output? I'm not aware of activation functions or any solution for receiving output in this way
what are you trying to do?
hey, given a sentence, is there any way to figure out which chapter (textbook) or topic (predetermined) it is from? from research online I was told to use BERT, but is there any simpler way? looks like I have to train BERT with quite some data to begin with
make a small LLM type of model. do you need more context?
the way LLM's choose a word is by having a softmax across their entire vocabulary
the token with the highest probability is chosen
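A minimal sketch of that output step: the last layer produces one raw score (logit) per vocabulary entry, softmax turns the scores into probabilities, and you pick from them. Taking the highest one is called greedy decoding; many systems sample from the distribution instead. The vocabulary and logits below are made up.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up vocabulary and raw scores from an imaginary final layer
vocab = ["i", "am", "fine", "<eos>"]
logits = [0.2, 2.5, 1.0, -1.0]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")

# Greedy choice: the token with the highest probability
best = vocab[probs.index(max(probs))]
print("chosen:", best)
```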
BERT is just another language model, as is GPT.
But I would avoid using language models for this, if you can get away with it. You could instead figure out what the "keywords" are for each chapter, and make a decision based on which of those keywords appear in the sentence.
or, if you have the whole textbook available, you can just... find the sentence in the textbook.
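A minimal sketch of the keyword approach. The topics and keyword sets are invented for illustration; in practice you would build them from the chapters themselves, for example from the words that are frequent in one chapter and rare in the others.

```python
import re

# Hypothetical keyword lists per topic
topic_keywords = {
    "geography": {"port", "river", "container", "climate", "mountain"},
    "history": {"empire", "treaty", "dynasty", "revolution", "war"},
}

def guess_topic(sentence):
    """Return the topic whose keywords appear most often, or None if none match."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {topic: len(words & keywords) for topic, keywords in topic_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_topic("The treaty ended the war between the two kingdoms"))
print(guess_topic("The river feeds the largest container port in the region"))
print(guess_topic("I like towels"))
```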
sometimes I need to identify what the sentences is about
Consider this sentence:
I'd never want to go anywhere without my wonderful towel.
Is this sentence about "I" or "towel"?
Its about "I"
so, you just need to identify the grammatical subject of each sentence?
but for sentence analysis that I am doing, towel was more appropriate answer
sorry for confusion
What about the output size? how do they create text such that their content doesn't exceed the max limit and it's constructed accordingly by stopping with punctuations. picking tokens with highest probability until you reach a stop punctuation before hitting the max length?
the model outputs 1 token at a time
you request from it as many tokens as you want (input -> output1 -> input + output1 -> output2 -> ...)
you could use punctuation as a way to stop if you want
it doesn't really matter
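a rough sketch of that loop, using a toy lookup table in place of a real model. The table `NEXT_TOKEN_PROBS` and the function `generate` are invented for illustration; a real LLM conditions on the whole sequence so far, not just the last token:

```python
import random

# Toy stand-in for a language model: last token -> next-token probabilities.
# The numbers are made up for illustration.
NEXT_TOKEN_PROBS = {
    "<start>": {"i'm": 1.0},
    "i'm": {"fine": 0.9, ".": 0.1},
    "fine": {".": 0.8, "thanks": 0.2},
    "thanks": {".": 1.0},
}

def generate(max_tokens=10, stop_tokens=(".",), greedy=True, seed=0):
    """Produce one token at a time until a stop token or the length limit."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        if greedy:
            # Pick the most probable next token.
            next_token = max(probs, key=probs.get)
        else:
            # Sample in proportion to the probabilities instead.
            next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_token)
        if next_token in stop_tokens:
            break
    return tokens[1:]

print(generate())  # ["i'm", 'fine', '.']
```

the max length is just the loop bound, and the stop punctuation is just an early `break`, so both limits live outside the model itself.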
for example given
---
Consider the following pairs:
1. Port of Rotterdam: First major port in Europe registered as a company
2. Port of Shanghai: Largest privately owned port in the world
3. Port of Singapore: Largest container port in the world
How many of the above pairs are correctly matched?
(a) Only one pair
(b) Only two pairs
(c) All three pairs
---
I have to determine if this question is from geography or history
are you saying it guesses the next word?
that's how LLMs work yes
you can't have a. punctuation just. anywhere
so I train it with a dataset where the input is a text and the output is the word it's supposed to guess?
that's weird I have to figure out how to do that when the dataset I have is conversation pairs
in essence yes but it gets a bit more complicated with transformer architecture. Are you planning on using that or are you just gonna make a simple one with LSTM or something?
can you give an example of an input/label pair in your dataset?
I had found this on kaggle
It's an experiment, I won't be using it which is why I am trying to be different from the way normal LLMs work
modern llms use transformers and multihead attention all that
but you can make something like this with simple RNN stuff like LSTM or GRU
yeah so if you wanted to have a chat bot that can generate novel conversations that don't exist in its dataset you'd probably wanna go the softmax route and feed it stuff like "I'm fine, how about yourself? " -> "I'm fine, how about yourself? I'm" -> "I'm fine, how about yourself? I'm " etc.
the big issue you'll probably run into here if you've never played with this kinda NLP before is probably stop words
your dataset is not massive, and there might be a lot of tokens that appear very often, like "i'm", "i've", or even spaces, so the model can easily settle into a local minimum where it just spams the same word over and over as its output. There are two schools of thought on dealing with this: either remove common stopwords altogether from the dataset to avoid having the model break during training (this unfortunately leads to the model not being able to accurately generate those stopwords without further fine tuning), or just leave the stop words in and pray to any gods that will listen that it doesn't break.
you do in fact use one-hot encoding for outputs as well. that's the standard technique for classification in all cases, not just for text (where you are "classifying" each output token with a word). the difference is that you don't get strict 1 and 0 values -- you get a score in each vector element, and conventionally we treat the highest-scoring element as 1 and all the others as 0. ideally you would use the softmax function to ensure that the scores are all between 0 and 1, and they all add up to 1, which helps ensure that the output is sane, aids interpretation, and allows you to use loss functions that treat the output as a multinomial probability model, which is exactly what we have here
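a small sketch of what that looks like numerically. The three-word vocabulary and the raw scores are invented for illustration:

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(probs, target_index):
    """Loss against a one-hot target: minus the log-probability of the true class."""
    return -np.log(probs[target_index])

vocab = ["cat", "dog", "towel"]        # toy vocabulary
scores = np.array([1.0, 2.0, 4.0])     # raw network outputs (logits)
probs = softmax(scores)
predicted = vocab[int(np.argmax(probs))]

print(probs.round(3), predicted)
print(cross_entropy(probs, target_index=2))  # small loss when "towel" is the true word
```

with a one-hot target, cross-entropy only looks at the probability assigned to the correct class, which is why the two are usually paired.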
i suggest taking a look at the classic word2vec model: it's a good entry point into a lot of these concepts and still forms the conceptual basis for a lot of what we do in ML with text even 10+ years after the model came out
(most of the ideas in word2vec are based on older ideas in ML and statistics but at that point you're going very deep into the fundamentals, which is a good thing, but probably unsatisfying if you want to just play around and build some toy projects)
aren't all the big LLMs trained on next-token prediction anyway?
as far as i understand, that's precisely what "GPT" is/was: a decoder-only model with a huge number of parameters trained on a huge amount of data turns out to be great at generating text
i kind of disagree. i think a language model is a reasonable approach to obtain a good-quality document embedding. BERT in particular has a small context window so you might need to do something like compute word vectors by sliding the context window across each chapter. then you can just do KNN or train a classifier on the document vectors
source: we used BERT vectors at work for text classification shortly after the model came out, and it improved our results compared to other vector embeddings
and we didn't fine-tune, we just used the off-the-shelf model weights
but as a learning exercise, yeah i think using pre-trained vectors starves you of an opportunity to explore and experiment and practice with building your own things
oh Im getting some ideas now, thanks a lot!
when im training my a.i.
it isnt outputting quality answers
im training it to predict the next letter in a sequence
however instead of outputting the next predicted letter
its output vector is just an average of the training data
e.g. if 25% of the output was the letter e and 10% was the letter a
its output isnt accurate and instead constantly outputs e as e is the most correct average
how do i prevent it?
Yep, but the goal of most of the existing models is to predict human text as such, i.e. with things like grammatical correctness and sentence formatting, which we don't really want.
The goal is that it needs to be fast and lightweight, so it can't do things like RAG or anything that ends up involving running both the model and then KNN on top of that.
the primary objective is supplementing keyword search queries with keywords & phrases, but most systems like word2vec or GloVe, etc... are trained on general (normally English) text, making them liable to predict words that don't exist in the corpus
Does anyone here work with AI in healthcare, or is anyone interested?
can you show your code?
Looks interesting, I will have a peek at this, ty!
why not train your own word vectors? it's super fast and easy with fasttext
Hmm possibly, how well does that work with predicting phrases of text though?
not enough source data?
oh, not well because it's cbow and skipgram neither of which is what you want i think
possibly, the source data itself is a black box, because it depends ultimately on who is using the engine
different users will have bigger or smaller indexes
how much text do you have? maybe you can use nanogpt
that is: use the basic transformer architecture for its original purpose of sequence modeling, forget all the LLM stuff
i haven't seen this embedding replacement technique that waterfall posted though, so maybe that's promising
it definitely sounds like it might help you, from the abstract
Yeah need to dig into it, effectively the biggest issue here is the amount of compute required. The goal is a supplemental system which can periodically train on the user's search corpus and then gets used to help supplement search queries
giving you an illusion of hybrid or semantic search
but without the ANN/KNN related activities
In theory you could use word2vec and Glove on some pre-compiled (small) index, but I'm not sure how well they work when trying to form or predict phrases of 2 or 3 words
Does this imply that RAG normally uses KNN? I don't know anything about RAG besides "you want an LLM to produce accurate outputs about fish, so you use RAG with fish articles and hope/assume that your source is accurate"
That's my understanding so far
normally RAG has some sort of database that provides context to the LLM
which is normally in some form of vector search
doesn't have to be, but it is very common
have you looked into any sparse encoding search techniques? or would that still be too computationally costly?
And KNN is a form of vector search?
it is still realistically very computationally expensive
Yes
Thank you!
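for reference, a brute-force sketch of KNN as vector search: score every stored vector against the query and keep the top k. The 2-D vectors are invented stand-ins for real embeddings, and `knn_search` is a made-up helper name. ANN libraries approximate this same result without scanning everything:

```python
import numpy as np

def knn_search(query, doc_vectors, k=2):
    """Return indices of the k documents most similar to the query (cosine similarity)."""
    docs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    similarities = docs @ q
    # Sort by descending similarity and keep the first k indices.
    return np.argsort(-similarities)[:k]

# Toy "embeddings"; real ones would come from an embedding model.
doc_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.0])

print(knn_search(query, doc_vectors))  # [0 1]
```

the cost is one dot product per stored vector per query, which is where the extra compute compared with a keyword index comes from.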
The issue is also the fact that it slows down time to search and ingesting times.
Currently in the landscape, trying to do hybrid search with something like sparse encoding or just ANN/KNN, you end up using 10-100x more compute than a regular keyword based system would, and often end up scanning a lot more data in the process.
The flip side is often people don't actually want the full semantic behaviour, they just want some similar keywords or terms or phrases to be included in the results when searching for something like "high heels" for example. Adding vector search often ends up meaning you need a GPU instance to quickly embed all your data and respond to queries quickly, and you then also see a much sharper increase in costs when your dataset grows, and your searches slow down because building the indexes takes longer.
yeahh
do you have any ML experience or knowledge prior to this?
nah
Hi guys can I get an AI roadmap recommendation
if im being honest i just need the steps lol
i rlly dont need to create it i just want to write about it
but i sorta want to understand it
ok lemme give it a read ty
https://www.3blue1brown.com/topics/neural-networks watch the first 4 videos here at least then pick a course you like
Here are 2 that are popular
https://see.stanford.edu/Course/CS229
https://developers.google.com/machine-learning/crash-course/
ok tyy
Just finished an evaluation step on my model. I had to make a bunch of changes to get it working still got some tweaking todo probably. Ill let it run for a bit then we can see some results.
Honestly, for the first reconstruction this is one of the best ive seen.
good day everyone, i'm not familiar with GPUs so i want to ask, since i want to make use of google colab to train a model that's based on a vision transformer from scratch.
using the google colab T4 GPU or the google colab TPU v2-8
which one would you advice to train the vision transformer?
If you don't know why you want to use a tensor processing unit (TPU), just use the GPU.
thanks man
Dopamine
OMGGG.... I'm currently training with the GPU T4 and I'm not even gonna lie, its so awesome.
i use my laptop CPU(16 gb ram, core i7 and 3.0ghz) to train it normally before but i will stop every other tasks just because I'm scared my system doesn't blow up or crash. but now, omg, its as if I'm doing nothing. i cant even hear my laptop fan make any sound, i can literally type freely without any lag. and its fasttttt!!!!!!!!!!!!!!!!!!!!!!!
i'm so saving up for a real time GPU
What should I use for a kernel for a converted image matrix my apologies
the computation is happening on a google server rack somewhere, so you shouldn't notice a resource spike on your laptop.
if you buy a computer with a GPU, and you do machine learning on that GPU, you probably will hear the fans go up, and you might not be able to do other things on your computer while it's training.
yh, thats another fact.
quick question, my laptop has a GPU but im unable to train tensorflow on it so i use my CPU instead. now here comes the question. say i get an external GPU, like the big Nvidia RTX and the likes, will i have any issue with the training?
You just need a GPU that supports CUDA, which is pretty much exactly NVIDIA GPUs. But the deep learning that people have been doing for the last two-or-so years can't effectively be done on gaming-tier GPUs.
your money is probably better spent renting cloud compute.
hey, i just finished epoch 1, and in less than 5 mins (between leaving here after my question and coming back to type this), im done with epoch two. omg, this is so beautiful. i'm so happy about this, im literally sad right now i might cry, because this project has made me go through hell
sounds like you're experiencing a lot right now.
its just so beautiful man. been trying to balance school with this project. but with this, its a game changer.
thanks for this reply Stelercus, truly helpful, i'm grateful
I don't think I was especially helpful, but I hope you'll remember this moment the next time you feel like a challenge is insurmountable.

trust me, I will
is that supposed to be a tear of joy?
you can use 
you're always of help. Thank you
Any opinions on groq ? Im trying to use it in my saas but not quite sure if that would be the best
Computer vision: I cannot open 2 cameras at the same time.
Everything worked fine on my Windows 11 laptop, then I transferred all my code to my Linux / Ubuntu machine. When I only open one camera with cv2.VideoCapture(0) it works fine. All my different cameras work fine with index 0. But when I plug in 2 cameras and try cv2.VideoCapture(0) and cv2.VideoCapture(1) at the same time I get this error message:
[ WARN:0@0.008] global cap_v4l.cpp:999 open VIDEOIO(V4L2:/dev/video1): can't open camera by index [ERROR:0@0.408] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range Error: Failed to capture image.
I also tried index 2, 3 and 4, and it gives me the same error, while there are 3 cameras plugged in my laptop
Btw google and chatgpt weren't of any help.
Thank you in advance for your help :)
that likely means your model isn't complex enough to fit your problem, try a larger model, also try other optimizers like Adam or RMSProp
i have written a U-Net model for image segmentation. When i run my model, I'm getting loss as nan. I don't know why i'm getting nan
i even have checked my input value
im working on a data science project on colab with some friends. one of our datasets is a 9gb csv file. is there any way to import/load it into colab to work on it as a dataframe? or how should i go about working with this massive file?
maybe read it in chunks
sorry how do i read it in in chunks if i need to have it uploaded somewhere first..?
you can load the file into your google drive and mount the drive in colab
though for a file that size, you may or may not need a paid tier of either google drive or colab
if the data is obtained from some website/API, you'd have to process it as you obtain it
what do u mean by process it as i obtain ?
clean it as it imports?
yeah, process it in chunks
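a minimal sketch of chunked processing with pandas, assuming you only need an aggregate and not the whole table in memory at once. `mean_by_chunks`, the `price` column, and the tiny in-memory CSV are made up for the demo; with the real file you'd pass its path instead:

```python
import io
import pandas as pd

def mean_by_chunks(csv_source, column, chunksize=100_000):
    """Compute a column mean without loading the whole CSV at once."""
    total, count = 0.0, 0
    # With chunksize set, read_csv yields DataFrames of at most that many rows.
    for chunk in pd.read_csv(csv_source, usecols=[column], chunksize=chunksize):
        values = chunk[column].dropna()  # clean each chunk as it arrives
        total += values.sum()
        count += len(values)
    return total / count if count else float("nan")

# Tiny in-memory stand-in for the 9 GB file.
demo = io.StringIO("price,qty\n10,1\n20,2\n30,3\n")
print(mean_by_chunks(demo, "price", chunksize=2))  # 20.0
```

on Colab the file would typically sit on a mounted Drive (`from google.colab import drive; drive.mount('/content/drive')`) and you'd pass that path as `csv_source`. `usecols` also helps a lot, since columns you never read never take up memory.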
Hey guys do have any ideas
How to use machine learning for iot project
Or real time projects
That's a cool one
I saw an old project that predicted poses through walls using wifi signal data
https://www.researchgate.net/publication/359177684_Image_Generation_A_Review there are a handful of techniques that don't involve an autoencoder or a GAN like variational u-nets, but you can see like 95% of the image generation research follows some variation of an AE or GAN
explain more with code sir!
There are bunch of projects online
I think i have issues with my data. When i used some kaggle dataset. It was working. But when i used my dataset. It wasn't
see, then maybe print first 5 rows from your dataset
Strange thing is that I'm getting loss value as nan
Yeah mask value contains only 0 and 255
then as simple, it's how you are calculating and on what thing you are calculating
I still not able to figure. Where this is causing nan
then share some info, so others can also take a look at that
Ok
What sorts of things i can share with u?
I said already, first 5 rows from dataset, and maybe code on how you are calculating loss
Ok
I wanted to make my scenery image generator more realistic. I used a GAN so I was looking for better ways. If that's the best way, I wonder if changing my neural network structure would help. My second try involved adding more layers, and the results were even worse
original image array
```
0 116.743820 129.932129 140.204529 123.849365 110.228264 104.687317
1 118.085236 128.580093 133.103256 111.298531 115.019913 110.951637
2 99.976089 112.731461 117.565979 125.454437 116.122879 115.366837
3 117.441841 130.380569 128.740417 114.199303 128.042313 137.160263
4 140.550476 141.988953 121.252663 107.138397 132.045837 136.520050
```
mask image
```
0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 0 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0
```
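import glob
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping  # assuming tf.keras; the paste didn't show its imports

root_train_dir = "D:\\feature-extraction\\assets\\train"
root_test_dir = "D:\\feature-extraction\\assets\\test"
# sorted() keeps each image lined up with its mask (assumes matching file names);
# glob order is not guaranteed
train_x = sorted(glob.glob(root_train_dir+"\\images\\" + "*.npy"))
train_y = sorted(glob.glob(root_train_dir+"\\masks\\" + "*.npy"))
test_x = sorted(glob.glob(root_test_dir+"\\images\\" + "*.npy"))
test_y = sorted(glob.glob(root_test_dir+"\\masks\\" + "*.npy"))
def load_data(x, y):
    # load every .npy file and stack them into one array each
    X = np.array([np.load(i) for i in x])
    Y = np.array([np.load(j) for j in y])
    return X, Y
callbacks=[
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
]
X_train , Y_train = load_data(train_x, train_y)
X_test , Y_test = load_data(test_x, test_y)
print(X_train.shape, Y_train.shape)
# model is the U-Net defined elsewhere (not shown in the paste)
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test),epochs = 10, batch_size=8, callbacks=callbacks)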
your dataset are images?
What were you thinking my apologies
hmm looks like you have somewhere division by zero?
divide by zero will result in NaN
the hyperparams of the model/fitting method might also be set incorrectly, causing the model parameters to blow up
maybe use np.isnan to print X and Y to see if it contains any NaN values
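something like this sketch could do that check. It also rescales the data, on the assumption that the images are in the 0-255 range and the masks are binary 0/255 going into a sigmoid + binary cross-entropy setup; the loss function was never shown, so that part is a guess and this may not be the actual cause of the NaN. `report_and_normalize` is a made-up helper name:

```python
import numpy as np

def report_and_normalize(X, Y):
    """Print basic sanity stats, then scale images to [0, 1] and masks to {0, 1}."""
    for name, arr in (("X", X), ("Y", Y)):
        print(name, "NaN:", np.isnan(arr).any(), "inf:", np.isinf(arr).any(),
              "min:", arr.min(), "max:", arr.max())
    X = X.astype("float32") / 255.0   # assumes 8-bit pixel range
    Y = (Y > 127).astype("float32")   # 0/255 mask -> 0/1 labels
    return X, Y

# Tiny stand-in arrays; the real ones would be X_train and Y_train.
X_demo = np.array([[116.7, 129.9], [0.0, 255.0]])
Y_demo = np.array([[0, 255], [0, 0]])
X_norm, Y_norm = report_and_normalize(X_demo, Y_demo)
print(X_norm.max(), Y_norm.max())
```

labels of 255 fed to a loss that expects values in [0, 1] are a common way to get a loss that blows up, so it's worth ruling out before touching the model.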
I have a question do I need to make a lot of pathways for a neuron to teach a neural network sorry because I got pillow installed in my network so I can put any image and turn it into an array my apologies
check out convolutionnal layers
It's in the same file as the Python code I'm trying the final destination Plus the image I'm sorry
Anyone ever play around with OpenAI Gym? I want to test my AI logic in unique environments. I made a CTF game using pygame but I wanted to try something different.
It is that referring to a YouTuber sorry
No but you can possibly make your own environment using pygame, just find some sprites. It's not the same but you could add some grabbing logic to objects to make it able to move blocks to wall off entrances from enemies etc sorry
@rich moth I'm sorry
#===(imports)===#
from PIL import Image
import numpy as np
from matplotlib.image import imread
#==============#
image_array = imread('C:\Users\Willo\Desktop\ais\eye0.png')
array =np.array(image_array)
X = array
print(X.shape)
what's the difference between RAG and AI search? And is this RAG? if not what should I google to learn how to make this?
I made a capture the flag game using pygame for it, I just wanted to try out some other things with it
What specifically a 3D because I believe there's a function that you could use to make 2D games although I don't know too much about training I'm just trying to make a neural network at the beginning although I didn't make a training simulation for Paul I'm sorry
You can embed all that data into, say, something like an elasticsearch index. Then you can build an AI pipeline around it so you can query the data.
what's the difference between that and RAG vector store embeddings
What do I need for the convolution to get it so that the image can be turned into an array I'm sorry
Hey, I would like to get address, number and web site data from my saved places in google maps the saved as a .csv file. How can I do that without Google API? (Ex places list: https://maps.app.goo.gl/bsxbhgW9zvXzSa8n9)
What do you think I may need to do because I did everything to open up the image and turn it into an array which then I can add weights to plus the bias my apologies
its just a little more advanced , but really any vector DB. but if you want to build one I would do it with haystack and elasticsearch.
Okay that’s sota? And what’s haystack?
That’s the vector db?
okay and just wondering how would citations be done?
is that through the prompt (return me the citations) or is that through indexing metadata (done through code/software engineering)?
ya citation in RAG pipelines can be done through indexing the metadata.
I built one, but the UI is minimal and it looks like crap
can you link me the code?
and biggest problem with RAG rn is hallucination concerns yeah?
Found out why the image wasn't showing its shape forgot 2 back slash
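for anyone hitting the same thing, a short sketch of the usual ways to write a Windows path in Python. The path is the one from the snippet above and nothing is read from disk here:

```python
from pathlib import PureWindowsPath

# A plain string like 'C:\Users\...' is a SyntaxError in Python 3,
# because \U starts a unicode escape. Three ways around it:
escaped = 'C:\\Users\\Willo\\Desktop\\ais\\eye0.png'  # doubled backslashes
raw = r'C:\Users\Willo\Desktop\ais\eye0.png'          # raw string, no escaping
forward = 'C:/Users/Willo/Desktop/ais/eye0.png'       # forward slashes also work on Windows

print(escaped == raw)                                        # True
print(PureWindowsPath(raw) == PureWindowsPath(forward))      # True
print(PureWindowsPath(raw).name)                             # eye0.png
```

raw strings are usually the least error-prone option when pasting a path copied from Explorer.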
honestly, mine doesnt hallucinate. believe it or not.
So how many areas for weights should I have sorry because it does say three color channels but there's Image size (194,259)
Nice
Thank you
What should I use for the kernel?
Not sure what you're asking.
To slide over the image to help it come to a decision about what it is. I'm trying to recreate an experiment: I heard of an AI that was shown images of people who may or may not have heart problems in the future, and it reliably told the biological sex. I'm trying to create that, and for a convolutional neural network to take images and recognize them you have to have a kernel that goes over the image, sliding across the array my apologies
Yes I'm sorry
In this video we'll create a Convolutional Neural Network (or CNN), from scratch in Python. We'll go fully through the mathematics of that layer and then implement it. We'll also implement the Reshape Layer, the Binary Cross Entropy Loss, and the Sigmoid Activation. Finally, we'll use all these objects to make a neural network capable of classif...
its part of torchvision, transforms
?
Any videos that may have any use my apologies
do you know what RAG is? if you do, can you explain what it is according to your understanding?
yeah I know what rag is but i'm wondering if a product like this was built without rag
basically asking if there's more methods to local-document AI search than RAG
if you got the code on git can you link?
RAG has to have a document retrieval component, but RAG in itself is not a document retrieval component.
If someone says "oh we need some way to search for documents", and someone else says "ok let's use RAG", that doesn't solve the problem. you need to already know how you can retrieve documents in order to create a RAG system.
yeah you're saying the retrieval is bm25/KNN vector search
ok I'm asking if there's other conversational search types besides rag
that are conversational? Not that I know of.
even if someone claimed that there were, I'd want to understand how it works before I agree that it's not RAG.
so you're saying RAG is the only solution to things like perplexity rn
I didn't say anything like that.
what is "perplexity", in this context?
ah
yeah have you ever tried it?
I think the way it works is they return the google search API results then summarize the answers through prompt engineering and cite their sources
seems kinda easy technically? 2B valuation
Does anyone know how to make an efficient kernel using numpy my apologies
I haven't built a github page for it yet.
Thats what mine looks like
What is the best way of getting Data for the neural network to do its job sorry
this question is too abstract to be answered.
What I mean is sliding it all across the image and getting the values to put into a relu function for each individual value sure it will slow it down but it might learn to go to the next layer and then the next layer and then the next layer and it will tell me what it is I know it's an over simplification I'm trying to explain it to not be abstract my apologies
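a bare-bones sketch of that sliding step in numpy, for a single-channel image, no padding, stride 1. The Sobel-style kernel values and the random 194x259 image are placeholders; in a real CNN the kernel values are learned. `conv2d`, `conv2d_fast`, and `relu` are made-up helper names:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(image, kernel):
    """Slide the kernel over a 2D image with explicit loops (easy to read, slow)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the kernel elementwise and sum it up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv2d_fast(image, kernel):
    """Same result, vectorized: build all patches as a view, then contract with the kernel."""
    windows = sliding_window_view(image, kernel.shape)  # shape (out_h, out_w, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    return np.maximum(x, 0)

image = np.random.default_rng(0).random((194, 259))   # stand-in for one color channel
kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # placeholder values
features = relu(conv2d_fast(image, kernel))
print(features.shape)  # (192, 257)
```

strictly speaking this is cross-correlation (the kernel isn't flipped), which is what deep learning libraries call convolution. For a 3-channel image you'd use one kernel slice per channel and sum the results.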
maybe check out. https://www.youtube.com/watch?v=vT1JzLTH4G4
Lecture 1 gives an introduction to the field of computer vision, discussing its history and key challenges. We emphasize that computer vision encompasses a wide variety of different tasks, and that despite the recent successes of deep learning we are still a long way from realizing the goal of human-level visual intelligence.
Keywords: Computer...
Thank you
np
V dense can you send the code
helllo
63.3s 12 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
63.3s 13 I0000 00:00:1726126970.029600 62 service.cc:145] XLA service 0x7e5a04003a40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
63.3s 14 I0000 00:00:1726126970.029656 62 service.cc:153] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
63.5s 15 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
63.5s 16 I0000 00:00:1726126970.029600 62 service.cc:145] XLA service 0x7e5a04003a40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
63.5s 17 I0000 00:00:1726126970.029656 62 service.cc:153] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
64.8s 18 I0000 00:00:1726126971.486784 62 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
65.0s 19 I0000 00:00:1726126971.486784 62 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
79.4s 20 2024-09-12 07:43:06.118002: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape infunctional_1_1/dropout_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
79.6s 21 2024-09-12 07:43:06.118002: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape infunctional_1_1/dropout_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
145.9s 22 2024-09-12 07:44:12.611558: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape infunctional_1_1/dropout_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
146.1s 23 2024-09-12 07:44:12.611558: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape infunctional_1_1/dropout_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
not the first time I get these warnings/messages. I want to know what the reason is
bruhh why tf ??
im pretty happy with the results i got
the a.i. managed to write out this sentence in perfect order
it learnt to write and spell letter by letter
i think it knows that quick brown fox jumps over lazy dog
perhapsss
what else does it know
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

anyone know about this??
loss.backward(retain_graph=True)
I tried this option also
but still error
training with batch_size = 32
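the code wasn't posted, so this is only a guess, but one common cause of that error is carrying a tensor (e.g. an RNN hidden state) from one batch into the next without detaching it. A minimal PyTorch sketch with a made-up GRU setup and random data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

hidden = None
losses = []
for step in range(3):
    x = torch.randn(32, 5, 4)   # random batch of 32 sequences, 5 steps each
    y = torch.randn(32, 1)
    out, hidden = rnn(x, hidden)
    loss = nn.functional.mse_loss(head(out[:, -1]), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Cut the link to this batch's graph. Without this line, the next
    # backward() would try to go through the already-freed graph and
    # raise the "backward through the graph a second time" error.
    hidden = hidden.detach()
    losses.append(loss.item())

print(losses)
```

`retain_graph=True` only hides the symptom in this situation, and each step would then backpropagate through every earlier batch, so memory grows over time.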
any resources to learn about training steps in neural network, forward propagation, backward propagation, loss and gradient in detail?
bruhh, you know that kid 17 year old who has made 5 hour video on maths for DL
A complete guide to the mathematics behind neural networks and backpropagation.
In this lecture, I aim to explain the mathematical phenomena, a combination of linear algebra and optimization, that underlie the most important algorithm in data science today: the feed forward neural network.
Through a plethora of examples, geometrical intuitio...
this is lit.....
nice i wanted in more detail
more detail? still>?
nop sarcasm, thanks!
trying to teach it the bee movie script next lmao
Anyone here who's create a (CNN) because I can use some help with the creation of colonels my apologies
wdym?
colonels? | who's create?
u helping CNN maker , or u wanna see that
i have tried and i didn't get any nan value
do you mean "kernels"?
Hello guys , so I have this project idea of building an ai model which takes the map of building , example lets say a mall, and it should give me the directions for a particular store in the mall
Like for example , I want to visit the nike store in the mall
It should give the directions to that store from anywhere inside the mall
You could say like its a mini version of Google Maps
So if someone could give some ideas , as how to proceed?
you need to have an a.i. program that you can run
if you want to use preexisting infrastructure theres a lot of libraries e.g. tensorflow etc.
or make one yourself
then you need to design the layers and layer sizes, then you need to turn your task into a well defined set of outputs
then you need to take in data in a well defined way such that it can be mapped onto the output data
e.g. x+10 = y
input x output y
then once you have a lot of training data, test and train the a.i. until you get results you want
you might want to look up "path finding" algorithms -- this is a classic AI thing that long predates (and generally doesn't require) deep learning
An ai model which selects the best nearest algorithm? like Dijkstra's?
wdym by "best" here?
i think dijkstra is optimal regarding complexity for the most general path finding problem
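a small sketch of Dijkstra on a toy mall, assuming the floor plan has already been turned into a graph of locations with walking distances. The `mall` layout, its distances, and the `shortest_path` helper are all invented for illustration:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights."""
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph[node].items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(queue, (new_d, neighbor))
    if goal not in dist:
        return None, float("inf")
    # Walk back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Hypothetical mall: nodes are locations, weights are walking distances in meters.
mall = {
    "entrance": {"food court": 40, "escalator": 25},
    "escalator": {"entrance": 25, "nike": 30, "food court": 20},
    "food court": {"entrance": 40, "escalator": 20, "nike": 35},
    "nike": {"escalator": 30, "food court": 35},
}

print(shortest_path(mall, "entrance", "nike"))  # (['entrance', 'escalator', 'nike'], 55)
```

the hard part of the project is then turning a map image into that graph; the routing itself needs no training data.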
man making the game is harder than the AI part.
No, the path-finding itself is the AI here
Always lmao
And balancing rewards to actually get your agents playing your game instead of finding a niche and exploiting it is just as hard as making the agent too
Yes
That's amazing how do you have a reward system so I can add that to my convolutional neural network my apologies
what would this reward function do for your neural network
Being a point higher than the human player
what game is the CNN playing
A pong because there's two simple outputs up and down but it has to know where the ball is sorry
I broke it down into something simple pong because it only has two values that you would really need one for up and zero for down
im using a DQN with both self and cross-attention mechanisms. i represent player states and team dynamics with vectors, and those vectors are aggregated using attention layers to create dynamic behaviors for the agents. The self-attention focuses on individual agent features, while cross-attention helps coordinate actions based on interactions with teammates and opponents.
are you talking about a reward system for a CNN?
Well I was thinking that but it's probably not what's needed for a CNN so I might try training one on games first because I can build any game that I want and I can have it trained on the data that's found so I can get a better idea a feel for how to train them in the future should be a reward based or shipping just how it figures it out by itself my apologies
Can a CNN do that because I know some neural networks need training with reinforcement learning sorry
like a CNN-DQN?
DQN?
deep q-network
Yes a deep learning network
What do you want to do with it? Whats your end goal?
Teach a deep neural network to do anything but to start maybe games sorry
Well if I can't get convolution together maybe some type of deep reinforced game playing Network that I could train to do multiple different games old school and new school sorry
I only have the training site made, I just need to figure out how to make the network. I don't know if that needs to be a CNN or if it can just be a regular network that's been put into deep learning my apologies
So I made a simple DQN with a CNN . It actually works pretty damn well lol
i just install gym https://github.com/openai/gym
Has anyone tried it before?
this is pretty rad https://www.gymlibrary.dev/environments/atari/adventure/