#data-science-and-ml
Hi! Is it okay to talk about machine learning here?
Yep
read channel description
has anyone ever had experience working with raw ECG data?
@lyric furnace mate, just keep learning Python and understand when you can use loops, if/else, aggregation operations, etc. Even though Python has libraries, trust me... I've seen ML code that uses for loops and is something like 200+ lines, even after using libraries
sin is actually surprisingly good btw
so i read up, and DuckDB is optimized for analytical queries on giant databases, whereas SQLite is optimized for single-row writes
which is why DuckDB is so slow for logging
but nested dicts are still 10 times faster, though I haven't figured out (and thus haven't tested) how to do threaded or async database writes
when they remove the GIL it will be much easier
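A minimal sketch of the "threaded database writes" idea using only the standard library, assuming the goal is logging training metrics: one background thread owns the SQLite connection and drains a queue in batched transactions, so the training loop only pays for a `queue.put`. The class name, table schema and `batch_size` are illustrative, not from any library.

```python
import os
import queue
import sqlite3
import tempfile
import threading


class BatchedSQLiteLogger:
    """Hypothetical helper: callers enqueue rows; one background thread owns
    the SQLite connection and writes them in batched transactions."""

    _STOP = object()  # sentinel telling the writer thread to flush and exit

    def __init__(self, path: str, batch_size: int = 500):
        self.path = path
        self.batch_size = batch_size
        self.q = queue.Queue()
        self.thread = threading.Thread(target=self._writer, daemon=True)
        self.thread.start()

    def log(self, step: int, name: str, value: float) -> None:
        # Cheap for the caller: no disk I/O happens on this thread.
        self.q.put((step, name, value))

    def close(self) -> None:
        # Ask the writer to flush whatever is left, then wait for it.
        self.q.put(self._STOP)
        self.thread.join()

    def _writer(self) -> None:
        # The connection is created and used only inside this thread, which is
        # what sqlite3's default check_same_thread=True requires.
        conn = sqlite3.connect(self.path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics (step INTEGER, name TEXT, value REAL)"
        )
        rows = []
        while True:
            item = self.q.get()
            if item is self._STOP:
                break
            rows.append(item)
            if len(rows) >= self.batch_size:
                self._flush(conn, rows)
        self._flush(conn, rows)
        conn.close()

    @staticmethod
    def _flush(conn: sqlite3.Connection, rows: list) -> None:
        if rows:
            with conn:  # commits the whole batch as a single transaction
                conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)
            rows.clear()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "metrics.db")
    logger = BatchedSQLiteLogger(db_path, batch_size=100)
    for step in range(1000):
        logger.log(step, "loss", 1.0 / (step + 1))
    logger.close()
    check = sqlite3.connect(db_path)
    print(check.execute("SELECT COUNT(*) FROM metrics").fetchone()[0])
    check.close()
```

The writer is a daemon thread, so rows still sitting in the queue are lost if the program exits without calling `close()`.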
Hey, does anyone know about semantic search engines?
What are your go-to methods for evaluating your classification model's performance on a huge unseen dataset?
do you mean the sine function? didn't get it
yeah
interesting, i haven't seen it used, and read a few posts yesterday that didn't seem too positive
basically saying the useful part reduces to tanh
the periodicity didn't really help, if i understood correctly
i got the same results with sine as with ReLU in segmentation, but maybe it makes some tasks harder
another interesting function is the modulus (abs(output)); haven't tried it
interesting
also there is a learnable piecewise function https://github.com/PiotrDabkowski/torchpwl
Piecewise Linear Functions (PWL) implementation in PyTorch - PiotrDabkowski/torchpwl
i've got this paper in the pipeline https://openreview.net/pdf?id=Sks3zF9eg
talks about that
also I tried an activation function that takes the maximum between the first half and the second half of the channels, plus the minimum, and concatenates them; it worked, although it wasn't as good as ReLU
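A minimal NumPy sketch of that activation, under the reading that the first half of the channels is paired elementwise with the second half: the output is `concat(max(a, b), min(a, b))`. This is essentially what the literature calls MaxMin (GroupSort with group size 2). The function name and the channels-last layout are assumptions.

```python
import numpy as np


def maxmin(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Split `axis` into halves a, b and return concat(max(a, b), min(a, b)).

    The output has the same shape as the input and is just a per-pair
    reordering of the input values, so it is piecewise linear like ReLU.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("channel count must be even")
    a, b = np.split(x, 2, axis=axis)
    return np.concatenate([np.maximum(a, b), np.minimum(a, b)], axis=axis)


x = np.array([[1.0, -2.0, 3.0, 0.5]])  # halves: a = [1, -2], b = [3, 0.5]
print(maxmin(x).tolist())  # [[3.0, 0.5, 1.0, -2.0]]
```

Because it only reorders values, it never zeroes anything out, which is one plausible reason it behaves differently from ReLU.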
its crazy how you can give it any weird model and it will find how to use that model
by it i mean gradient descent
that repo is piecewise linear units, right?
yeah
with any number of segments, that's somewhat new to me
this paper explores many of them, didn't check it yet either https://arxiv.org/pdf/1710.05941
(...) While sinusoidal activation functions have been successfully used for specific applications, they remain largely ignored (...)
[we] describe how the presence of infinitely many and shallow local minima emerges from the architecture.
(...) by showing that for several network architectures the presence of the periodic cycles is largely ignored (...)
etc.
may not be the best paper though (and may be incorrect.), just one i found.
i heard Cyc is a knowledge base, but can i use it to train my model? how can i get the code?
can anyone help me with this error
In the future, please always show code and other text as text. Not as a screenshot.
This error message means that your val_logs variable refers to None. Not to a dict.
i think your generator may have run out of data
who are you talking to when you say that?
@lapis sequoia
how can I fix that?
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
you need to make sure that steps per execution * epochs is less than the number of batches; it's a common error. just read the docs for PyDataset
okay
the generator's values are consumed only once (until it runs out of data), so that's why. in the case of TensorFlow you can use .repeat(); idk if there is anything like that for PyDataset.
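A plain-Python sketch (no Keras or TensorFlow; `batches` and `repeat` are illustrative names) of why this happens: a one-shot generator yields each batch once, so asking it for `steps_per_epoch * epochs` batches over several epochs stops after the first pass. Restarting it, which is roughly what `tf.data.Dataset.repeat()` does, keeps it going.

```python
import itertools
import math


def batches(n_samples: int, batch_size: int):
    """One-shot generator: yields index batches once, then is exhausted."""
    for start in range(0, n_samples, batch_size):
        yield list(range(start, min(start + batch_size, n_samples)))


def repeat(make_iter, count=None):
    """Restart a fresh iterator `count` times (forever if count is None)."""
    passes = itertools.count() if count is None else range(count)
    for _ in passes:
        yield from make_iter()


n_samples, batch_size, epochs = 100, 32, 5
steps_per_epoch = math.ceil(n_samples / batch_size)  # 4 batches, the last one partial
wanted = steps_per_epoch * epochs  # 20 batches over the whole fit

once = batches(n_samples, batch_size)
got_once = sum(1 for _ in itertools.islice(once, wanted))

forever = repeat(lambda: batches(n_samples, batch_size))
got_repeated = sum(1 for _ in itertools.islice(forever, wanted))

print(got_once, got_repeated)  # 4 20
```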
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
tf.random.set_seed(42)
# preprocess the data: rescale pixel values from 0-255 to 0-1
train_datagen = ImageDataGenerator(rescale = 1./255)
valid_datagen = ImageDataGenerator(rescale = 1./255)
train_dir = '/content/pizza_steak/train'
test_dir = '/content/pizza_steak/test'
# import data from directories and turn them into batches
train_data = train_datagen.flow_from_directory(directory = train_dir, batch_size=32,
target_size=(224, 224), class_mode="binary", seed=42)
test_data = valid_datagen.flow_from_directory(directory = test_dir, batch_size=32,
target_size=(224, 224), class_mode="binary", seed=42)
# Build a CNN model
model_1 = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters=10, kernel_size=3, activation='relu', input_shape=(224,224,3)),
tf.keras.layers.Conv2D(10, 3, activation = 'relu'),
tf.keras.layers.MaxPool2D(pool_size=2, padding='valid'),
tf.keras.layers.Conv2D(10, 3, activation = 'relu'),
tf.keras.layers.Conv2D(10, 3, activation = 'relu'),
tf.keras.layers.MaxPool2D(2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(1, activation='sigmoid')
])
#compile our model
model_1.compile(loss='binary_crossentropy',
optimizer = tf.keras.optimizers.Adam(),
metrics=['accuracy'])
history_1 = model_1.fit(train_data,
epochs=5,
steps_per_epoch = len(train_data),
validation_data = test_data,
validation_steps = len(test_data))
@lapis sequoia please read this: #data-science-and-ml message
i think this could work (not fully sure.):
steps_per_epoch = (len(train_data) // batch_size) // epochs,
not working
did you add //epochs
yes
i think that steps per execution * batch size has to be less than the size of the training data
the same applies to validation steps
actually you could try both with the number 30 and check @lapis sequoia ?
otherwise you may have to ask either in a separate question in #1035199133436354600 or maybe in a forum (or wait for others to help)
by taking batch size of 30?
no sorry, by passing into steps per execution a number smaller than the number of batches generated (not the batch size.) @lapis sequoia
same applies to validation_steps: use a number smaller than the number of test batches generated (it should actually be less than len(data)//batch_size in each case, if i understand correctly).
thanks, it worked
no problem, you've got several other errors
@lapis sequoia can you please explain to me what the issue was with the previous code?
i think it expects an Input layer
yeah let me see if i can find a thread explaining it, i can't now
yeah sure
quite similar: https://stackoverflow.com/questions/59864408/tensorflowyour-input-ran-out-of-data @lapis sequoia
if you ever use tf.data.Dataset instead, it's got this option (extracted from link.):
"If you're using a tf.data.Dataset, you can also add the repeat() method, but be careful: it will loop indefinitely (unless you specify a number)"
it can also happen if you're using XLA (incomplete batches break it); then use drop_remainder=True. not your case though.
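For the `drop_remainder=True` point, a plain-Python sketch of what the flag changes. It mimics the semantics of `tf.data.Dataset.batch` and is not TensorFlow code: without the flag the last batch can be smaller, and a batch shape that changes at the end of an epoch is what XLA's static-shape compilation tends to dislike.

```python
def batch(items, batch_size, drop_remainder=False):
    """Split items into consecutive batches; optionally drop a short last batch."""
    out = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and out and len(out[-1]) < batch_size:
        out.pop()
    return out


data = list(range(10))
print([len(b) for b in batch(data, 4)])                       # [4, 4, 2]
print([len(b) for b in batch(data, 4, drop_remainder=True)])  # [4, 4]
```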
Guys, what's better for analyzing academic papers: Claude or GPT-4?
idk if this is the channel to ask.
In a comparative assessment of Claude 3 Opus and GPT-4’s capabilities, Claude 3 Opus generally demonstrates superior performance across a spectrum of tasks that test for knowledge and reasoning abilities. Claude 3 Opus consistently outperforms GPT-4, with an especially notable advantage in complex reasoning and coding tasks, suggesting it is bet...
Claude then; Sonnet is actually higher, apparently https://www.artificialintelligence-news.com/news/anthropics-claude-3-5-sonnet-beats-gpt-4o-most-benchmarks/
what do they do for benchmarks
no idea, maybe they say more in the original post @lapis sequoia https://www.anthropic.com/news/claude-3-5-sonnet
summary on sine vs tanh (paper is here https://openreview.net/pdf?id=Sks3zF9eg), pretty interesting read
tl;dr: sine seems better for intuitively periodic tasks (like addition), and comparable to tanh in standard cases
(not surprising that it kinda works in many tasks, but not due to periodicity.)
is random grid search just straight up better than quasi-random search?
because it has the lowest discrepancy possible
and quasi-random search is better than random search
so random grid search is the best?
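A small pure-Python sketch of one reason the answer is usually "no": a full grid covers the space evenly, but its projection onto each single hyperparameter collapses to very few distinct values (and beyond one dimension a regular grid is not actually the lowest-discrepancy point set). With a budget of 9 trials in 2-D, a 3x3 grid tries only 3 values per hyperparameter, while 9 points of a Halton sequence (used here as an example quasi-random sequence) try 9. That matters when only one or two hyperparameters drive the result.

```python
import itertools


def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of i."""
    result, scale = 0.0, 1.0
    while i > 0:
        scale /= base
        result += scale * (i % base)
        i //= base
    return result


def halton(n: int, bases=(2, 3)):
    """First n points (skipping index 0) of the Halton sequence in [0, 1)^d."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def grid(per_axis: int, dims: int = 2):
    """Full grid with points at cell centres."""
    ticks = [(k + 0.5) / per_axis for k in range(per_axis)]
    return list(itertools.product(ticks, repeat=dims))


def distinct_per_axis(points):
    """How many different values each coordinate (hyperparameter) takes."""
    return [len({p[d] for p in points}) for d in range(len(points[0]))]


print(distinct_per_axis(grid(3)))    # [3, 3]
print(distinct_per_axis(halton(9)))  # [9, 9]
```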
#===[imports]===#
import numpy as np
#===============#
X = np.array([0.1, 0.2, 0.3, 0.4])
converted_data0=np.asarray(X)
print(converted_data0)
is there a question?
how can i get the array to collect the data and run it through the network?
arrays don't "do things".
and what network?
yeah, I think the best example is making a neural network predict sin(x) from x, and sin activation works the best for it
the neurons
there is nothing in your code that appears to be neurons.
im working on the array to increase the speed; this was to test its use before joining the main code
Are you communicating with us through an automated translator?
no
There is no reason to have converted_data0=np.asarray(X) in your code. X is already an array.
Try writing more code that represents layers of a network, and write code to send an array through the network.
#===[imports]===#
import sys
import numpy as np
import matplotlib
#===============#
#===[neuron network]===#
np.random.seed(0)
X = [[1, 2 ,3,2.5],
[2.0,5.0,-1.0, 2.0],
[-1.5, 2.7, 3.3, -0.8]]
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
layer0 = Layer_Dense(4,5)
layer1 = Layer_Dense(5,9)
layer2 = Layer_Dense(9,4)
layer3 = Layer_Dense(4,2)
layer0.forward(X)
layer1.forward(layer0.output)
layer2.forward(layer1.output)
layer3.forward(layer2.output)
print(layer3.output)
did you verify that this works?
yes
[[ 5.86410565e-03 4.20239779e-05]
[ 4.60184756e-03 2.41869992e-03]
[ 1.37659937e-02 -1.03951813e-02]]
my apoliges
do you understand why it works?
yes, it's the outputs combined from the neurons, getting all possible outputs from the set inputs. my apoliges
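One way to see what the four `Layer_Dense` layers compute: with no activation function between them, the whole stack is still a single affine map `X @ W + b`. A NumPy sketch that re-declares a variant of `Layer_Dense` taking an explicit random generator (an assumption, so it runs on its own), folds the layers into one weight matrix, and checks that it gives the same output. This is why networks put a nonlinearity such as ReLU between layers.

```python
import numpy as np


class Layer_Dense:
    # Same layer as in the chat, but with an explicit random generator.
    def __init__(self, n_inputs, n_neurons, rng):
        self.weights = 0.10 * rng.standard_normal((n_inputs, n_neurons))
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases


rng = np.random.default_rng(0)
X = np.array([[1.0, 2.0, 3.0, 2.5],
              [2.0, 5.0, -1.0, 2.0],
              [-1.5, 2.7, 3.3, -0.8]])
layers = [Layer_Dense(4, 5, rng), Layer_Dense(5, 9, rng),
          Layer_Dense(9, 4, rng), Layer_Dense(4, 2, rng)]
for layer in layers:
    # Non-zero biases so the check below also covers the bias terms.
    layer.biases = rng.standard_normal(layer.biases.shape)

# Forward pass, layer by layer.
out = X
for layer in layers:
    layer.forward(out)
    out = layer.output

# Fold the stack: (x W1 + b1) W2 + b2 = x (W1 W2) + (b1 W2 + b2), and so on.
W = np.eye(X.shape[1])
b = np.zeros((1, X.shape[1]))
for layer in layers:
    W = W @ layer.weights
    b = b @ layer.weights + layer.biases

print(np.allclose(out, X @ W + b))  # True: four linear layers == one affine map
```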
why do you end every message with "my apoliges"?
segilopa ym wonk t'nod i
what?
I thought it was another language but it's just reversed lol
are you trolling us?
no my apoliges
I take time out of my work day to answer questions here. so please do not shitpost.
my apoliges
so, what do you want to do now?
make it able to learn colors from images and other things too
can you give an example of an image and what color you want it to learn?
why is that color the learned output for that image?
fox faces and the color green to start: green because you recommended it, and foxes because their faces are unique in shape and proportions
what the hell is going on !!😂
so you want it to recognize that the image contains a fox face, and when that's the case, you want the model to output green?
maybe i should teach it colors first my apoliges
bruhhh....
you don't have to keep apologizing
what would it mean to teach colors to the model?
bro is high on something I guess ..
to be fair, I did tell them to start with green. #1253470566107709480 message
and in that post, he also apologizes..
if it were given a photo, it could discern the colors in that photo, for example:
list of colors
primarily green
secondary blue
third white
fourth is brown
my apoliges
bruhhh..
my apoliges
im sorry
i can figure out the rest, i just need help with one color my apoliges
@unkempt apex
writing assignments sir!
just post the questions properly, so other guys will look into it
without apologies
I just need help with one color input and then I can use a different output later
you don't need ML for that. you can write code that tells you the color of each pixel and gives you the count for each color
you'll want to cluster them so that shades of what you consider green, blue, yellow, etc. are counted the same.
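A minimal NumPy sketch of that suggestion, assuming the image is already loaded as an RGB array of shape (H, W, 3), e.g. via Pillow with `np.asarray(Image.open(path).convert("RGB"))`. Instead of a learned clustering such as k-means, it uses the simplest variant: snap every pixel to the nearest colour in a small hand-picked palette and count. The palette and its RGB values are illustrative.

```python
import numpy as np

# Hypothetical reference palette; extend or tune the RGB values as needed.
PALETTE = {
    "green": (0, 160, 0),
    "blue": (0, 0, 200),
    "white": (255, 255, 255),
    "brown": (120, 70, 20),
    "black": (0, 0, 0),
}


def color_counts(image: np.ndarray) -> dict:
    """Count pixels per palette colour (nearest neighbour in RGB space)."""
    names = list(PALETTE)
    centers = np.array([PALETTE[n] for n in names], dtype=np.float64)  # (K, 3)
    pixels = image.reshape(-1, 3).astype(np.float64)  # (N, 3)
    # Squared distance from every pixel to every palette colour: (N, K).
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(names))
    # Most common colour first: "primarily green, secondary blue, ..."
    order = np.argsort(-counts, kind="stable")
    return {names[i]: int(counts[i]) for i in order if counts[i] > 0}


# Tiny synthetic 2x3 image: three greenish pixels, two bluish, one white.
img = np.array([[[10, 150, 20], [0, 170, 5], [5, 140, 10]],
                [[10, 20, 210], [0, 5, 190], [250, 250, 250]]], dtype=np.uint8)
print(color_counts(img))  # {'green': 3, 'blue': 2, 'white': 1}
```

For large photos the (N, K) distance matrix is still small with a 5-colour palette, but downscaling the image first is an easy speed-up.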
im sorry
why are you sorry
for bothering you
should i learn pytorch or tensorflow first
you chose to do so, didn't you?
need help with this, if you read the post it has details
ive been ignored for hours, any help is appreciated
I'm not sure what point you're making. I was in the process of helping that person, and they posted a message where the letters were reversed, and I asked them not to shitpost.
pytorch and tensorflow are two libraries that do the same thing. it's not a foregone conclusion that you need to know both. I recommend focusing on one.
but I also recommend learning a lot of other things before you get anywhere near neural networks.
I know the math behind backprop and all that good stuff but which one should I focus on
I use pytorch every day and have never been asked to use tensorflow in over five years.
Do u have a job in ML? What do u do cuz I wanna pursue a career in it
I work in language ai
Research or industry? Also, how much regex do you use? I recently found myself with a project that initially sounded like it would need NER but regex worked very well.
come on , always pytorch..
any data mining projects recommended?
whats the point of data mining/hoarding
to gather data and, mostly, to make sense of unstructured data. Like whether you need that column in your dataset. Example: you want to find the average height of 11-21-year-olds, so you take a lot of data that contains their names, age, sex, BMI, address, etc. Now, which of these do you want, what's the dtype of the data, do you need to create more columns? This is mostly what data mining looks like
Hi, how can I solve this error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\programs\anaconda3\Lib\site-packages\matplotlib\backends\web_backend\js\mpl.js'
for this code:
%matplotlib notebook
plt.plot(y_test,label='Real values')
plt.plot(california_y_predicted,label='guess values')
plt.legend();
I'm not sure that's a good example though; that shows it's good at predicting its own behaviour.
But if it indeed is good at tasks with some periodicity built-in, then good enough for the sine.
this paper has a lot of cool stuff on activations, if anyone wants to spend their time on it; an LLM could certainly summarise it though.
https://arxiv.org/pdf/1710.05941
is there anyone who knows machine learning libraries like TensorFlow, PyTorch, scikit-learn? #data-science-and-ml
are people here more in the camp of illusionists, materialists, reductionists, panpsychists, dualists,... ?
Ask your question don't ask to ask
im illusionist
interesting, sad to lose Dennett
yeah
but I haven't read him
i've only read Parfit
I just think if you are a materialist and not an illusionist, then you can't not be a dualist
have you read Marvin Minsky? and what by Parfit?
Reasons and Persons by Parfit
haven't read anything else on phil of mind
i thought the personal identity chapter from Reasons and Persons was good because i agreed with it
this seems a nice article about meta learning https://jameskle.com/writes/meta-learning-is-all-you-need
ill take a look one day, didn't know that guy, seems very interesting.
basically he takes the teleportation paradox and then goes very deep on a whole bunch of similar arguments that convincingly prove counterintuitive things about personal identity
actually he invented the teleportation paradox
and his other view is utilitarianism, and he also has a whole bunch of very interesting and weird paradoxes, even though i don't care too much about ethics
that's cool, what are your main areas of interest @lapis sequoia ?
I think just paradoxes and thought experiments in general, because it's very interesting how counterintuitive they are
nice. recently read 'I Am a Strange Loop' by Douglas Hofstadter; he loves paradoxes. i like them too.
nice, i want to read it, and The Weirdness of the World by Eric Schwitzgebel
a review on amazon says:
The word "Bizarre" is used 188 times on its 360-odd pages...
but can still be good
i actually haven't even read the reviews, i just liked the title
Is there anyone who can provide a learning path for an AI/ML engineer, from zero to hero?
what do you like?
all are OK
like, find some topic that you like, sports, movies, etc, and then do data mining on that topic
I use regex a lot, but mostly to parse semi-structured data.
I do research in industry. I don't work for a university.
"do you need to create more columns?" you're talking about feature engineering. data mining is when you form insights based on analysis of large amounts of data.
Does anyone know somewhere that I can download/mine large amount of resumes? I am thinking of making an anonymous resume dataset for SWOT analysis
idk but noticed there are several discord communities just about that
Hi guys. I need books (or other resources) to learn Data Structures and Algorithms.
Please recommend.
wrong channel; see #algos-and-data-structs
dropout 4all
yeah sorry got a bit mismatched
So is dropout commonly used? From "it's used in basically every NN" to "it's almost never used", how common is it?
I think it's used pretty often; read system papers and develop your own sense for this.
there are some rules for when to apply it, but it's a common regulariser, and mostly useful for networks that are prone to overfitting (that's why it was invented.)
it may not work well with ReLUs in very deep networks, but im unsure whether this is fully established.
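A minimal NumPy sketch of (inverted) dropout as it is commonly implemented today: during training each unit is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so the expected activation is unchanged; at inference it is the identity. The original paper instead scaled the weights by the retention probability at test time, which is equivalent in expectation.

```python
import numpy as np


def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero units with probability p, rescale the rest."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= p  # True with probability 1 - p
    return x * keep / (1.0 - p)


rng = np.random.default_rng(0)
h = np.ones((1000, 100))
out = dropout(h, p=0.5, training=True, rng=rng)
print(out.mean())         # close to 1.0: expectation preserved
print((out == 0).mean())  # close to 0.5: fraction of dropped units
```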
I see, thank you both
can anyone please help me with my langchain?
https://discord.com/channels/267624335836053506/1273305302954934334
you're welcome, you may find the original paper's abstract readable https://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
it's got many top researchers right there.
Hello, I'm searching for a DE (data engineering) mentor. Is this a good place to ask?
how do I even answer this:
"What is your experience managing batch and incremental data ingestion processes?"
is incremental live data?
I don't know much. I know Python well but not much pipes and data flows.
As I see, I need to know SQL well and be able to work with ETL in the cloud.
have the basics of SQL + data modeling, be familiar with notions like data warehouse, data lake... and also some common tools
yeah updating..
?
and also this one
blue arrows represent conv
yeah like that!
that paper is really nice.
dataset is taking so much time to upload
on kaggle
what if I restart my session, will that 4GB dataset be gone?
shit then what's the best way?
I have already
download in Kaggle?
how?
where should i start? i need help. i already know intermediate Python and CustomTkinter; i need some help starting projects in data science and AI
I don't think you can have a national flag icon.
if you wish to get in contact with the moderation team, you can DM @sonic vapor
I think I might have come up with a machine learning concept; I don't think I've heard this concept anywhere else.
Concept: The machine gets a bunch of information, and if the information is relevant to the task, it will store and keep the info. Else, if it isn't relevant to the task, it will save it just in case it's useful for another task. Else, it will delete the info as it's not needed.
The moderator says it is OK.
between relevant and irrelevant what is the "else"
I'm also not sure how that's related to machine learning, the machine learning would be determining whether information is relevant or not
For example, lets say your friend tells you he picked up a pen. It's information but it's extremely useless.
There are 3 sections.
Useful - Useful to the task
Junk - Stores the info, in case it's relevant to another task
Trash - Completely useless information
I think this concept can go well with reinforcement learning. Dealing with the information efficiently..?
Sorry, I don't know much about machine learning, but the concept of it intrigues me.
sorry, i wasnt done explaining
There are 3 sections.
Useful
Junk
I'm not quite following
we're back to two states, initially you implied at least 3 different states (relevant, irrelevant, else (which isn't really possible, since the other two states should cover everything already))
ur the one that is getting mad
im learning tensors
i learnt how to make tensors 😂
and what makes the information included in the example you gave completely useless?
It's useless because it's not relevant to any task and the information given cannot do any other tasks.
Although, if he asks you if he picked up a pen, it can be useful information.
exactly, so, how do you determine whether it's potentially useful?
it either is useful or it is not
True, true
Then I would need a system that can detect fake information
Because in that scenario, fake information can be relevant.
I think you're asking: how can I detect which variables are significant and which ones are noise (irrelevant). Is that right? Aka feature importance
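A minimal NumPy sketch of one standard way to measure that, permutation importance: fit a model, then shuffle one feature at a time and see how much the error grows; noise features barely matter. A plain least-squares linear model stands in for "the model", and the synthetic data (feature 0 matters a lot, feature 1 a little, feature 2 is pure noise) is made up for the example.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef


def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Mean increase in MSE when each column of X is shuffled independently."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between feature j and y
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)
model = fit_linear(X, y)
print(permutation_importance(model, X, y, rng).round(2))
```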
I'm especially interested in learning GBDT and XGBoost (and CRF), if any projects cover them it would be better
Yes.
Although, I was thinking of storing all its knowledge in a list.
A method I thought of to get rid of the fake information is to put the program through a test and see if it can successfully complete it with no errors. Once it passed, it will keep the information.
Though, I would constantly need to create a test.
That's why I need to come up with an efficient method that can make sure the machine doesn't store false information.
I based this upon how we learn. Let's say we read a book: we absorb all its information and store it as useful and good. And when we encounter fake info, we can just dismiss it with our knowledge from the book and disprove it.
But I'm still struggling on how this method would be useful in.
Sorting data?
Fake is probably not the word you mean then; fake means (to me) inaccurate or misleading data, vs 'irrelevant' (noise).
Yeah. Sorry about that. Got a little off track.
This honestly sounds like it can be used in sorting data
I wish I knew more about machine learning. Anyone got any resources I can use? Thanks.
does anyone have any tips on making template matching of characters more reliable? I want to be able to identify characters in a game menu, which has a consistent font. Currently I'm using cv2.matchTemplate alongside a collection of rendered characters in that font. To my eye the characters look to be about the same size as the ones in the image, and I'm using Image.convert to make sure the colors match. Any ideas?
ok, so
- find a topic that you like. If you need a list of topics, go to https://paperswithcode.com/sota
- find a dataset in that topic
- go crazy
Thanks for passing this along
How is the Geospy AI model trained? What data did they use and how can it be done?
can anyone help me find a pretrained model for a medical chatbot
for what uses?
there are some that predict diseases based on symptoms
Exactly that
without getting too deep into what i want to do, it's basically a "machine learning algorithm" that can differentiate between blood slides with cancer and without cancer, using a control dataset and a dataset that has cancer. how can i achieve this?
what does that imply?
how deep you want to go?
Is this list good enough? https://kidger.site/thoughts/just-know-stuff/
nvm, I deleted the dataset and uploaded it again, now it works
I'll try it myself as well, is the model any good?
can anyone help me in my langchain problem?
https://discord.com/channels/267624335836053506/1273677397220261960
Hey guys, who is good at machine learning and can help me with a project? Anyone?
always ask your actual question. don't ask to ask.
okay
this is quite cool, though hard (for me) to fully get https://en.wikipedia.org/wiki/Universal_approximation_theorem
In the mathematical theory of artificial neural networks, universal approximation theorems are theorems of the following form: Given a family of neural networks, for each function f from a certain function space, there exists a sequence of neural networks ...
I just want to learn Reinforcement Learning
then learn that
alr, sounds good
..
@spare forum the code is in pic
do you need that class code of how I am loading data?
is it datasets.ImageFolder from torchvision, or handmade?
yeah from torchvision
the dataset class inherits from Dataset in torch.utils.data
check result maybe idk
I don't really know the problem
I would do like from torchvision import datasets then the code is very similar like datasets.ImageFolder(root="..." , transform = train_transform)
but something went off with this I guess
that was very stupid error
the train images were .jpg and I was checking for .png😂
☠️
so now, the project is road extraction from satellite images
where
/train -> satellite images and masks (labels)
/test -> sat images
/valid -> sat images
so how can I train my model?
because we can't calculate the validation loss: there are no masks to validate against, or even to compare with
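If only /train has masks, one common workaround (an assumption about the setup, not something settled in the thread) is to hold out part of /train as a labelled validation set and keep /valid and /test for visual checks or submission. A stdlib sketch that pairs images with masks by a shared id and splits them reproducibly; the `_sat.jpg` / `_mask.png` naming is hypothetical, so adjust it to the real files.

```python
import random
import tempfile
from pathlib import Path


def paired_files(train_dir: Path, img_suffix="_sat.jpg", mask_suffix="_mask.png"):
    """Return (image, mask) path pairs matched by their shared id prefix."""
    pairs = []
    for img in sorted(train_dir.glob(f"*{img_suffix}")):
        stem = img.name[: -len(img_suffix)]
        mask = train_dir / f"{stem}{mask_suffix}"
        if mask.exists():  # images without a mask are skipped
            pairs.append((img, mask))
    return pairs


def train_val_split(pairs, val_fraction=0.15, seed=42):
    """Shuffle reproducibly and hold out val_fraction of the labelled pairs."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_val = max(1, int(len(pairs) * val_fraction))
    return pairs[n_val:], pairs[:n_val]


if __name__ == "__main__":
    # Demo on a throwaway folder with 10 fake image/mask pairs.
    root = Path(tempfile.mkdtemp())
    for i in range(10):
        (root / f"{i}_sat.jpg").touch()
        (root / f"{i}_mask.png").touch()
    train, val = train_val_split(paired_files(root), val_fraction=0.2)
    print(len(train), len(val))  # 8 2
```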
how to do backpropagation with tensorflow
this is what I have so far
class Conv2d(Layer):
    def __init__(self, depth, kernel_shape=[3, 3], stride=1, variance="He"):
        self.kernel_shape = np.array(kernel_shape)
        self.variance = variance
        self.depth = depth
        self.stride = stride

    def forward(self, input_activations, training=True):
        output_activations = self.biases.copy()
        # for i, kernels in enumerate(self.kernels):
        #     for kernel, channel in zip(kernels, input_activations):
        #         output_activations[i] += scipy.signal.correlate2d(channel, kernel, "valid")
        start_time = time.time()
        output_activations += tf.nn.conv2d(
            input_activations.reshape(*input_activations.shape, -1).T,
            np.flip(self.kernels.T, (0, 1)),
            strides=[1, 1, 1, 1],
            padding='VALID'
        )[0].numpy().T
        end_time = time.time()
        if training:
            self.output_activations = output_activations
        return output_activations

    def backward(self, input_activations, node_values):
        new_node_values = np.zeros(input_activations.shape)
        kernels_gradient = np.zeros(self.kernels.shape)
        for i, (kernels, kernel_node_values) in enumerate(zip(self.kernels, node_values)):
            for j, (image, kernel) in enumerate(zip(input_activations, kernels)):
                kernels_gradient[i, j] = scipy.signal.correlate2d(image, kernel_node_values, "valid")
                new_node_values[j] += scipy.signal.convolve2d(kernel_node_values, kernel, "full")
        kernels_biases_gradient = node_values
        return new_node_values, [kernels_gradient, kernels_biases_gradient]
The forward pass is really fast, but the full convolution in the backward pass is really slow, and I can't figure out how to write it using TensorFlow, which is much faster
I didn't see a response but you want to look at classification models. I haven't read this but it looks like a good start: https://jonaac.github.io/works/deepxgboost.html
Does anyone here have ChatGPT Advanced Voice and would be willing to help do something like https://www.youtube.com/watch?v=MB-IGShzNzA but for geolocating UK accents?
nvm I figured it out
def forward(self, input_activations, training=True):
    output_activations = self.biases.copy()
    output_activations += tf.nn.conv2d(
        input_activations.reshape(*input_activations.shape, -1).T,
        np.flip(self.kernels.T, (0, 1)),
        strides=[1, 1, 1, 1],
        padding='VALID'
    )[0].numpy().T
    if training:
        self.output_activations = output_activations
    return output_activations

def backward(self, input_activations, node_values):
    new_node_values = np.zeros(input_activations.shape)
    kernels_gradient = np.zeros(self.kernels.shape)
    new_node_values = tf.nn.conv2d_backprop_input(
        [1, *input_activations.shape[::-1]],
        filters = self.kernels.T,
        out_backprop = node_values.reshape(*node_values.shape, -1).T,
        strides = [1, 1, 1, 1],
        padding = "VALID",
    ).numpy()[0].T
    kernels_gradient = tf.nn.conv2d_backprop_filter(
        input_activations.reshape(*input_activations.shape, -1).T,
        self.kernels.shape[::-1],
        out_backprop = node_values.reshape(*node_values.shape, -1).T,
        strides = [1, 1, 1, 1],
        padding = "VALID",
    ).numpy().T
    kernels_biases_gradient = node_values
    return new_node_values, [kernels_gradient, kernels_biases_gradient]
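A NumPy sketch (single channel, stride 1, no TensorFlow or SciPy) that checks the two identities the scipy version of `backward` relies on, for a forward pass `Y = correlate_valid(X, K)`: the kernel gradient is `correlate_valid(X, dY)` and the input gradient is `convolve_full(dY, K)`. It can serve as a slow reference when comparing against the `tf.nn.conv2d_backprop_*` outputs; the helper names are illustrative.

```python
import numpy as np


def correlate2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'valid' cross-correlation, i.e. what a conv layer computes going forward."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def convolve2d_full(g: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'full' convolution: zero-pad g by kernel_size - 1, correlate with the flipped kernel."""
    kh, kw = k.shape
    padded = np.pad(g, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return correlate2d_valid(padded, k[::-1, ::-1])


def numerical_grad(f, arr: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Central finite differences of the scalar function f at arr."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        step = np.zeros_like(arr)
        step[idx] = eps
        grad[idx] = (f(arr + step) - f(arr - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 5))  # one input channel
k = rng.normal(size=(3, 2))  # one kernel
g = rng.normal(size=(4, 4))  # upstream gradient dL/dY, same shape as Y


def loss(x_, k_):
    # Any L with dL/dY = g works; L = sum(g * Y) is the simplest.
    return np.sum(g * correlate2d_valid(x_, k_))


grad_x = convolve2d_full(g, k)    # dL/dX, like the "full" convolve2d in backward()
grad_k = correlate2d_valid(x, g)  # dL/dK, like the "valid" correlate2d in backward()

print(np.allclose(grad_x, numerical_grad(lambda a: loss(a, k), x), atol=1e-6),
      np.allclose(grad_k, numerical_grad(lambda a: loss(x, a), k), atol=1e-6))
```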
@lapis swift
thanks ill check it out
quite a beautiful theorem, the universal approximation theorem in 2 short paragraphs
Any continuous function on a compact subset of Euclidean space can be approximated arbitrarily well by a neural network with one hidden layer, given enough neurons (finitely many for any fixed accuracy, but with no upper bound).
Interestingly, many versions of the theorem state that the activation function must be non-polynomial, which many papers seem to ignore (they test polynomial functions and fail).
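For reference, one common statement of the arbitrary-width case (Leshno, Lin, Pinkus and Schocken, 1993), written with the C, W, b names that come up later in the thread. σ is applied elementwise, and the width k is finite but depends on ε:

```latex
\begin{aligned}
&\sigma:\mathbb{R}\to\mathbb{R}\ \text{continuous and not a polynomial};\quad K\subset\mathbb{R}^n\ \text{compact};\quad f\in C(K,\mathbb{R}^m).\\
&\forall\,\varepsilon>0\ \ \exists\,k\in\mathbb{N},\ W\in\mathbb{R}^{k\times n},\ b\in\mathbb{R}^{k},\ C\in\mathbb{R}^{m\times k}:\\
&\qquad \sup_{x\in K}\ \big\lVert f(x)-C\,\sigma(Wx+b)\big\rVert<\varepsilon .
\end{aligned}
```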
Hello, I am currently working on a research project and I've run into a problem with my minimax algorithm. Some backstory: the project aims to integrate minimax strategies into the selection phase of the MCTS algorithm implemented in Michael Hu's AlphaZero "clone/model". The MCTS used in AlphaZero differs from a traditional MCTS algorithm in these ways:
1.) After the search reaches a leaf node, there is no rollout. Instead, AlphaZero uses the neural network to evaluate the board position and uses that as an estimated game result to update the statistics in the search tree.
2.) When expanding a leaf node, all children are expanded in a single operation, rather than one child at a time as in standard MCTS. This means that after expansion, a leaf node immediately becomes fully expanded.
3.) AlphaZero uses a slightly different UCT formula to select the best child during the selection phase, which incorporates the prior action probabilities output by the neural network.
There's a lot of code, but the main issue I'm having is that my minimax function, for some reason, doesn't work: it does not select the correct max or min values based on the evaluation at the terminal state.
The minimax function with alpha-beta pruning:
def minimax(
    env,
    node: Node,
    depth: int,
    alpha: float,
    beta: float,
    maximizing_player: bool,
    eval_func: Callable[[np.ndarray], Tuple[Iterable[np.ndarray], float]]
) -> float:
    # Base case: if we reach the maximum depth or the node is terminal (not expanded)
    if depth == 0 or not node.is_expanded:
        # Use the environment's observation and eval_func for terminal state evaluation
        observation = env.observation()  # Get the observation from the environment
        _, value = eval_func(observation)
        return value

    # Get the legal moves (i.e., child nodes that are expanded)
    legal_moves = np.where(node.child_N > 0)[0]

    if maximizing_player:
        max_eval = float('-inf')
        for move in legal_moves:
            child_node = node.children[move]
            eval = minimax(env, child_node, depth - 1, alpha, beta, False, eval_func)
            max_eval = max(max_eval, eval)
            alpha = max(alpha, eval)
            if beta <= alpha:
                break  # Beta cut-off
        return max_eval
    else:
        min_eval = float('inf')
        for move in legal_moves:
            child_node = node.children[move]
            eval = minimax(env, child_node, depth - 1, alpha, beta, True, eval_func)
            min_eval = min(min_eval, eval)
            beta = min(beta, eval)
            if beta <= alpha:
                break  # Alpha cut-off
        return min_eval
i wish 5+ lines code blocks were collapsed by default
could i do that?
i don't think so, not your fault at all; you can paste a link but then less people would see it, so it's fine :-)
The selection function:
def best_child(
    env,
    node: Node,
    legal_actions: np.ndarray,
    c_puct_base: float,
    c_puct_init: float,
    child_to_play: int,
    eval_func: Callable[[np.ndarray], Tuple[Iterable[np.ndarray], Iterable[float]]],
    alpha: float = 0.5,
    minimax_depth: int = 2,
) -> Node:
    if not node.is_expanded:
        raise ValueError('Expand leaf node first.')

    ucb_scores = -node.child_Q() + node.child_U(c_puct_base, c_puct_init)

    # Initialize the minimax scores for legal actions
    minimax_values = np.full_like(ucb_scores, fill_value=-9999.0, dtype=np.float32)

    # Apply minimax to the legal actions only
    for move in range(len(legal_actions)):
        if legal_actions[move] == 1:
            if move in node.children:
                minimax_values[move] = minimax(env, node.children[move], minimax_depth, float('-inf'), float('inf'),
                                               node.to_play == 1, eval_func)
            else:
                # If the child node does not exist, treat it as a leaf with no Minimax value
                minimax_values[move] = 0

    # Combine the UCB scores and Minimax values with a weighted sum
    combined_scores = (1 - alpha) * ucb_scores + alpha * minimax_values

    # Exclude illegal actions by setting the combined scores to -9999
    combined_scores = np.where(legal_actions == 1, combined_scores, -9999)

    # Select the move with the highest combined score
    move = np.argmax(combined_scores)

    assert legal_actions[move] == 1

    if move not in node.children:
        node.children[move] = Node(to_play=child_to_play, num_actions=node.num_actions, move=move, parent=node)

    return node.children[move]
Thanks in advance for any help🙏
hmm, if a classical use of PCA was face recognition,
and PCA is dimensionality reduction and its equivalent is the autoencoder, are autoencoders used for face recognition?
I meant extracting features with pca
hmm but with neural networks there is no need for feature extraction
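Re the PCA question: using PCA for feature extraction (the eigenfaces approach) means projecting each flattened image onto the top principal components and using those coordinates as features. A minimal NumPy sketch; the random low-rank matrix is an assumption standing in for real face images:

```python
import numpy as np


def pca_fit(X: np.ndarray, n_components: int):
    """Return the mean and the top principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the principal directions -> low-dimensional features."""
    return (X - mean) @ components.T


def pca_inverse(Z, mean, components):
    """Map features back to the original (pixel) space."""
    return Z @ components + mean


rng = np.random.default_rng(0)
# Stand-in for 200 flattened 16x16 "images" that mostly live on a 10-D subspace
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 256))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 256))

mean, comps = pca_fit(X, n_components=10)
features = pca_transform(X, mean, comps)          # 10 numbers per image
recon_err = np.mean((pca_inverse(features, mean, comps) - X) ** 2)
print(features.shape, recon_err)
```

A linear autoencoder trained with MSE learns the same subspace as PCA (though not necessarily the same axes); nonlinear autoencoders generalise that idea.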
stuck at this error
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-3-87b76c9d6778> in <cell line: 12>()
10
11 from mrcnn.config import Config
---> 12 from mrcnn.model import modellib, utils
/content/mrcnn/model.py in <module>
21 import keras.backend as K
22 import keras.layers as KL
---> 23 import keras.engine as KE
24 import keras.models as KM
25
ModuleNotFoundError: No module named 'keras.engine'
i'm trying to use mask rcnn code. But i don't know why it's giving me this error
What is Batching of LLM Jobs, How Can It Reduce LLM Inference Cost, and How Can It Help Overcome Challenges Like Rate Limiting and GPU Utilization?
In this article, I explained all the above concepts. Please have a read and let me know your thoughts.
https://blog.cuminai.com/unlocking-the-power-of-job-batching-transforming-ai-workloads-2220b8c05e4f
maybe that means that all components of C, W, b can be found?
no, the fn just needs to be continuous
that was my 1st interp. but i think it just means that sigma would get undefined constants multiplying it
if it means that though, it makes sense that it's calling it sigma, and those were proven much later
ReLU, GeLU and so on, in fact some were proven only recently!
and also for discontinuous functions (2023)
it's the form most often quoted; the 1st theorem was proven only for sigmoid iirc
it was extended to relu etc
Also, certain non-continuous activation functions can be used to approximate a sigmoid function, which then allows the above theorem to apply to those functions. For example, the step function works. In particular, this shows that a perceptron network with a single infinitely wide hidden layer can approximate arbitrary functions.
true, i thought it was for relus; those were proven later, i may get the paragraph
are you on the wikipage?
im not sure why, but the first was proven for sigmoids, this is the line:
The first examples were the arbitrary width case. George Cybenko in 1989 proved it for sigmoid activation functions
the paper is paid though
maybe but they can be used though, it's just proven later on from what i read there, then that's why we use XLUs
yes, plus people use x^2 as well from what i understand
you may need to check that paper, kinda funny to put it under a paywall since it's got 30K citations, well maybe that's why
for me visually, it makes sense that ReLUs can approximate any fn, more than sigmoids
Hi,
I am working on a small project related to RAG and am stuck (Apparently cuz I don't know much)
I used mxbai-embed-large as embeddings and Chroma db as Vector store all goes well to this point.
Issue: When I try to retrieve data with similarity threshold it returns 0 docs and without threshold and k it always returns 4 docs no matter the query.
What is it that I am doing wrong?
Here's my code:
Vector Store Creation File:
# Load Docs and then store embeddings in the Chroma DB
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
embeddings = OllamaEmbeddings(
base_url="http://43.204.231.131:11434",
model="mxbai-embed-large",
)
loader = PyMuPDFLoader("./data/aliceShort.pdf")
data = loader.load()
# print(len(data))
text_splitter = RecursiveCharacterTextSplitter(
# Set a really small chunk size, just to show.
chunk_size=300,
chunk_overlap=100,
length_function=len,
add_start_index=True,
)
chunks = text_splitter.split_documents(data)
print(f"Split {len(data)} documents into {len(chunks)} chunks.")
db = Chroma.from_documents(chunks, embeddings,persist_directory="./chroma_langchain_db")
query = "Who is Alice?"
docs = db.similarity_search(query)
print(docs[0].page_content)
Query File:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma
embeddings = OllamaEmbeddings(
base_url="http://65.2.37.27:11434",
model="mxbai-embed-large",
)
db = Chroma(persist_directory="./chroma_langchain_db", embedding_function=embeddings)
query_text="Who is Alice?"
retriever = db.as_retriever(
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.1})
docs = retriever.invoke(query_text)
print(len(docs))
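Two likely causes, offered as guesses rather than a diagnosis. Without a threshold, LangChain's `similarity_search` and a default `as_retriever()` return the `k=4` nearest chunks no matter how relevant they are, hence always 4 docs. With `similarity_score_threshold`, distances are first converted to relevance scores, and LangChain's own notes say the "correct" conversion depends on the metric and on whether the embeddings are unit-normed; if the scores come out on a different scale than assumed, every one can fall below even 0.1. Printing `db.similarity_search_with_relevance_scores(query_text, k=10)` shows the actual scores. The NumPy sketch below uses toy vectors and a hypothetical `relevance` function (not LangChain's code) to reproduce both symptoms:

```python
import numpy as np


def l2_distances(query, doc_vecs):
    return np.linalg.norm(doc_vecs - query, axis=1)


def top_k(query, doc_vecs, k=4):
    """Plain top-k search: always returns k indices, however poor the matches."""
    return [int(i) for i in np.argsort(l2_distances(query, doc_vecs))[:k]]


def relevance(dist):
    """Hypothetical distance-to-score map that only makes sense for unit vectors."""
    return 1.0 - dist / np.sqrt(2)


def threshold_search(query, doc_vecs, score_threshold):
    scores = relevance(l2_distances(query, doc_vecs))
    return [int(i) for i in np.argsort(-scores) if scores[i] >= score_threshold]


def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Toy "embeddings" with large norms; doc 0 points the same way as the query
query = np.array([10.0, 0.0, 0.0])
docs = np.array([[30.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0],
                 [-10.0, 0.0, 0.0], [0.0, -10.0, 0.0], [0.0, 0.0, -10.0]])

print(top_k(query, docs))                               # 4 indices for any query
print(threshold_search(query, docs, 0.1))               # → [] (every score is negative)
print(threshold_search(unit(query), unit(docs), 0.1))   # → [0] once vectors are unit length
```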
this is also quite interesting lol:
Notice also that the neural network is only required to approximate within a compact set K. The proof does not describe how the function would be extrapolated outside of the region.
Well, i guess it does badly outside the region
i read it as: the approximation will be accurate within the training set (or the area sufficiently covered by it)
yeah, but it's limited unless you have all the data
so you may learn any fn fitting the data in a subarea
i don't mean infinite data, i mean all the data
uhmm..to me it's got a more practical reading as well. since we normally have a subset of the data, that may be fully linear (described by a line/plane/etc), and you may find f for that region K, in any-dimensional space
but that's f for K which is the universe for the model, since it hasn't seen outside K
no, the dataset (datapoints) would get g to approach f
so one works backwards and the dataset points define K for g which will approach to f for K
then any datapoint outside K will never be predicted correctly unless one gets lucky
that leaky relu and such can approximate any fn seems intuitive,
imagine that you put all weights to 0 apart from some (what ReLU does.),
and they add up to a local line that is 'tangent' to the real fn; for the offset, one's got the bias.
(the infinite neurons of the single layer are the infinite segments)
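The "segments plus bias" intuition can be checked directly: one hidden layer of ReLUs with hand-set weights (no training) reproduces the piecewise-linear interpolant of a target on K exactly. Hidden unit i switches on at knot i and changes the slope there; the output bias is the offset at the left end of K. A NumPy sketch, with `sin` on K = [0, π] as an arbitrary example:

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def build_relu_net(f, a, b, n_units):
    """Hand-set one-hidden-layer ReLU net equal to the piecewise-linear
    interpolant of f on n_units + 1 equally spaced knots in [a, b]."""
    knots = np.linspace(a, b, n_units + 1)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)
    # Unit i changes the slope at knot i by this amount
    out_weights = np.concatenate([[slopes[0]], np.diff(slopes)])
    hidden_biases = -knots[:-1]      # hidden unit i computes relu(x - knot_i)
    out_bias = values[0]             # the offset at the left end of K

    def net(x):
        hidden = relu(np.asarray(x)[..., None] + hidden_biases)
        return hidden @ out_weights + out_bias

    return net


a, b = 0.0, np.pi                    # the compact set K = [0, pi]
inside = np.linspace(a, b, 1000)
outside = np.linspace(b, 2 * np.pi, 1000)
for n in (4, 16, 64):
    net = build_relu_net(np.sin, a, b, n)
    err_in = np.max(np.abs(net(inside) - np.sin(inside)))
    err_out = np.max(np.abs(net(outside) - np.sin(outside)))
    print(f"{n:3d} units  max error on K: {err_in:.4f}  beyond K: {err_out:.2f}")
```

Inside K the error shrinks roughly like 1/n² as units are added; beyond K the net just continues its last segment as a straight line, which is the extrapolation point made earlier about the compact set.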
apparently this is why polynomials fail, i've no idea what it means
Paper is MULTILAYER FEEDFORWARD NETWORKS WITH A NON-POLYNOMIAL ACTIVATION FUNCTION CAN APPROXIMATE ANY FUNCTION sorry it's in caps
this is another "proof" of universality for maxout networks, it's visual. i can't get it very well, but seems intuitively right?
Isn't that not so new?
yup same, that's why i shared the paper's title below
but Density is mentioned in the wikipage as well
idk what that is either
what's new is the proof for discontinuous fns i think; i can share it if anyone wants to check, it's way too complex for me
I see because the theorem with some strict assumptions (continuous ? ) is older, I have it somewhere in my course (not the entire proof just the result lol)
yes, that's correct, so you can find out there that NNs cant approximate some specific fns (discontinuous)
i mean some crazy fns f, if that's a reply to me: https://www.reddit.com/r/programming/comments/z23f05/comment/ixeg9os/ (just the comments!)
Hey guys,
I’ve created a multiclass classification model and trained it on a labeled dataset. It went pretty well on the local dataset tbh, and I’m now looking to soft-launch it into prod. The input data will be converted into an n-dimensional input vector, which won’t form a convex or regular shape when plotted (at least my EDA shows that). Since I can’t foresee every possible model input, the model won’t handle every scenario perfectly, which I guess is okay, but I am aiming for a broad use case. This will lead to a number of false positives, which I want to iteratively add to my training corpus to improve the model over time.
I’m looking for an efficient approach to identify and manage these false positives. I was thinking about:
1) Randomly sample a subset of the data and label it manually to verify whether each prediction is a true positive or a false positive.
2) Get user feedback to identify misclassified ones.
3) Use clustering techniques with metrics like Silhouette score, Davies-Bouldin Index, Calinski-Harabasz Index (CH), Normalized Mutual Information (NMI), or the Dunn Index.
4) Combine 1) and 3)? Identify some false positives and then use clustering to find similar ones that are possibly also false positives
My end goal is to create a pipeline that will iteratively improve over time. How would you approach this problem? Thanks!
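Options 1 and 4 combine naturally. A minimal NumPy sketch, assuming you can get the n-dimensional input vector for each prediction: draw a per-class random sample for manual review, then rank the still-unreviewed items by distance to the confirmed false positives so the most similar ones get reviewed next. A nearest-neighbour distance stands in for the clustering step here, and the function names are hypothetical:

```python
import numpy as np


def sample_for_review(pred_labels, per_class, rng):
    """Option 1: a random sample of predictions from every predicted class."""
    picked = []
    for cls in np.unique(pred_labels):
        idx = np.flatnonzero(pred_labels == cls)
        chosen = rng.choice(idx, size=min(per_class, idx.size), replace=False)
        picked.extend(chosen.tolist())
    return sorted(picked)


def rank_by_similarity_to_fps(X, fp_idx, candidate_idx):
    """Option 4: order candidates by distance to their nearest confirmed false positive."""
    fps = X[fp_idx]                                   # (n_fp, d)
    cands = X[candidate_idx]                          # (n_cand, d)
    dists = np.linalg.norm(cands[:, None, :] - fps[None, :, :], axis=2)
    nearest = dists.min(axis=1)                       # distance to the closest FP
    return [int(candidate_idx[i]) for i in np.argsort(nearest)]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                        # stand-in input vectors
preds = rng.integers(0, 3, size=500)                  # stand-in predicted classes

review = sample_for_review(preds, per_class=20, rng=rng)
# Pretend the reviewers flagged the first five reviewed items as false positives
confirmed_fps = review[:5]
remaining = np.setdiff1d(np.arange(len(X)), review)
next_to_review = rank_by_similarity_to_fps(X, confirmed_fps, remaining)[:50]
```

Each round, the newly confirmed false positives go into the training corpus and into `confirmed_fps`, which gives the iterative loop described above.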
Happy Friday, August 16: It’s all about AI and Automation! 🎉
Hello, I am new here :)
I need your collective wisdom for an AI challenge! 🧠
My mission: Describe images with AI, and I’ve set my sights on LLaVA.
The issue: I’m a bit lost on how to choose the best approach! 🏊♂️
Quick context:
• I previously used OpenRouter (which used Fireworks)
• But it’s no longer available 😢
• I’m looking to use Python
• I struggled this morning with PyTorch (persistent DLL file issues) 😅
• My laptop doesn’t have a powerful graphics card
What I’m looking for:
- An API rather than a local solution (too complicated for me right now)
- Cost-effective options
- Technically simple solutions
I’ve already explored a few options:
• Replicate
• Hugging Face
• Fal.ai
• Google Colab...
But I’m a bit confused by all these options and their differences... 🤔
Questions for you, wise developers:
• What would be the best API for using LLaVA in my case?
• How can I navigate through all the variations of LLaVA?
• Do you have a simple comparison of the models (efficiency/cost)?
• Are there other options I might have missed?
I don’t want to dive in headfirst without understanding all possibilities first. Basically, how would you go about researching and choosing the best option?
Thanks in advance for your insights! 🙏✨
Please excuse my English if it’s not perfect, as I’m not a native speaker.
any idea how to get started with machine learning without heavy math background?
study math 
you will probably want to understand a little of calculus and linear algebra, at least enough to understand why things work
Please excuse my English if it’s not perfect, as I’m not a native speaker.
are you sure you aren't a native speaker?
check fast.ai
check the pinned mml-book (mathematics for machine learning)
my problem is I don't know how little I should learn, because I'm not keen on long studying; I learn fast when I'm doing something
my take is that you would rather get started and build an intuition with a library that makes stuff for you
bc that makes learning easier afterwards, in a way decoupling terminology from concepts.
like sklearn or tensorflow?
now you know, with my audio ;)
like fast.ai
got it thank you very much
you can also check pytorch slowly, it's great and has got many examples
thank you very much
np, don't just listen to my advice though; that's one angle, the book suggested to you is another, and the book is very good.
either fal or replicate should work fine
if you need to test something with 0 costs you can try using the Gradio API for a Hugging Face Space that uses Llava like https://huggingface.co/spaces/llava-hf/llava-4bit
i didn't understand much of those posts (a lot of jargon for me to parse.) but this is somewhat simpler, it's just a small piece of those posts
• How can I navigate through all the variations of LLaVA?
Using a different model should be as simple as changing the name of the model in one line of code
If you mean finding all variations that exist, browse through Hugging Face models or the models page of the API provider you plan to use
• Do you have a simple comparison of the models (efficiency/cost)?
You can look up benchmarks, but never expect benchmark performance to be a very good estimate of performance on real tasks; you must test and benchmark it on your own tasks
The post starts
By the Stone-Weierstrass theorem,
and the image i linked is the theorem.. :-(
(from https://arxiv.org/pdf/1302.4389v4, neat paper imho)
The theorem says:
In mathematical analysis, the Weierstrass approximation theorem states that every continuous function defined on a closed interval [a, b] can be uniformly approximated as closely as desired by a polynomial function.
The paper/image does the same, but replaces polynomial with PWL (piecewise linear.)
(bc maxout networks can approximate any PWL, they say they are universal fn approximators.)
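The maxout building block is easy to see numerically: a maxout unit outputs the max of k affine functions, which is always a convex piecewise-linear function, so with tangent lines as the pieces it approximates a convex target such as x². The paper then gets general continuous functions as a difference of two such convex pieces. A NumPy sketch with hand-picked (not learned) pieces:

```python
import numpy as np


def maxout(x, weights, biases):
    """A single maxout unit on scalar inputs: max_j (w_j * x + b_j)."""
    return np.max(np.outer(x, weights) + biases, axis=1)


def tangent_pieces(points):
    """Tangent lines of x**2 at the given points: slope 2p, intercept -p**2."""
    points = np.asarray(points, dtype=float)
    return 2 * points, -points ** 2


xs = np.linspace(-2, 2, 401)
for k in (3, 9, 33):
    w, b = tangent_pieces(np.linspace(-2, 2, k))
    err = np.max(np.abs(maxout(xs, w, b) - xs ** 2))
    print(f"{k:2d} pieces  max error: {err:.4f}")
```

With tangent points spaced h apart, the largest gap sits midway between two of them and equals (h/2)², so the error falls quadratically as pieces are added.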
Can someone help me on this??? 😒😒😒
sota in CIFAR-100 not improved in several years apparently?
we should try to beat it :-). They've got a neat API https://github.com/paperswithcode/paperswithcode-client
API Client for paperswithcode.com. Contribute to paperswithcode/paperswithcode-client development by creating an account on GitHub.
Hey guys!
quick question
Are you aware of any AutoML libraries that take advantage of CUDA or GPU?
I am working with AutoGluon at the moment, seems to be CPU intensive even though I am using GPU parameters. Is there anything that correctly integrates with Nvidia CUDA as per your knowledge?
If so, please drop a reply! Thanks!
did you install the gpu version and set the configuration to use the gpu?
most standard/popular ML libraries CAN use gpus, but require you to set it up correctly
yeah, I am using autogluon[all]
I don't have a GPU but I know you can use it with autogluon
I see. It seems to be keen on using CPU but lemme look into it further
different models and optimizers require you to tell them explicitly to use the gpu
It uses the CPU by default, and it uses all available cores with AutoGluon
thank you very much @agile cobalt !🙏
for your valuable response: there is a lot of information in it, and I really like the sentence I quoted above, because we really do get lost in all the proposed models. That's why I wondered how experienced developers go about it, which is not at all my case: I'm just trying to run a simple Python script!
As a beginner, I basically depend on the information that you and AI assistants give me to guide me, and in some cases I wasted too much time on useless choices when there was a very simple two-click solution. It's never easy in my position to know which way to start, but thanks to your information I am already better equipped.
damn
the amount of optimizers in nevergrad is crazy
I am looping through all of them to see which one is the best
the results are in
I asked all of them to solve a maze
the worst one goes to "HaltonSearchPlusMiddlePoint" (which is quasi random search and idk what is middle point)
the best one is LargeDiagCMA (evolutionary strategy)
whats crazy is
Accelerated random search is better than all of them
and no library implements it
300 algos though thats insane
Actually I just realized that its not true
because I gave them all 10 seconds, and nevergrad is a bit slower and does 10 times fewer iterations (even random search)
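Since nevergrad gets through fewer iterations in the same 10 seconds, a fairer comparison fixes the number of objective evaluations instead of wall-clock time. A minimal sketch in plain NumPy (not nevergrad's API): pure random search vs. accelerated random search as I understand it (sample in a box around the current best, reset the radius after an improvement, shrink it after a failure, restart it when it gets too small), both given the same evaluation budget. The quadratic objective stands in for the maze, and the radii and shrink factor are arbitrary choices:

```python
import numpy as np


def sphere(x):
    """Toy objective standing in for the maze score: minimum 0 at the origin."""
    return float(np.sum(x ** 2))


def random_search(f, dim, budget, bound, rng):
    best = np.inf
    for _ in range(budget):
        best = min(best, f(rng.uniform(-bound, bound, size=dim)))
    return best


def accelerated_random_search(f, dim, budget, bound, rng, shrink=2.0, r_min=1e-6):
    x = rng.uniform(-bound, bound, size=dim)
    fx = f(x)
    r_max = bound
    r = r_max
    for _ in range(budget - 1):              # one evaluation already spent on x
        cand = np.clip(x + rng.uniform(-r, r, size=dim), -bound, bound)
        fc = f(cand)
        if fc < fx:                          # success: move there, reset the radius
            x, fx, r = cand, fc, r_max
        else:                                # failure: contract the search radius
            r /= shrink
            if r < r_min:                    # too small: restart with a wide radius
                r = r_max
    return fx


budget, dim, bound = 2000, 5, 5.0            # same evaluation budget for both
rs = random_search(sphere, dim, budget, bound, np.random.default_rng(0))
ars = accelerated_random_search(sphere, dim, budget, bound, np.random.default_rng(0))
print(f"random search: {rs:.4f}   accelerated random search: {ars:.6f}")
```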
I want to repost my #data-science-and-ml message periodically, maybe once every couple of weeks, until someone responds. Is that ok?
No
https://corbin-c.medium.com/
Check out the MLP article. Is it accurate enough??
Ok, then please let me ask about classifiers in general for https://www.accenthelp.com/blogs/accenthelpblog/british-isles-accent-map
British Isles Accent Map When people talk about a ‘British accent’, they tend to be thinking of the upper class Received Pronunciation accent. But what you might not realize is that the UK has a huge variety of accents, and a higher level of linguistic diversity than many other countries. These range from the lilt of t
It's possible in principle if you have enough samples of each accent. You might need hours of audio and dozens of distinct speakers per accent.
It would be especially helpful if the dataset had speakers who contributed audio samples in more than one accent
Is there a training data set for accent identification?
Hi
in yolo
model = YOLO("yolov8n.yaml")
model = YOLO("yolov8n.pt")
model = YOLO("yolov8n.yaml").load("yolov8n.pt")
same as
model = YOLO("yolov8n.pt")
?
in the first example, we create a model from scratch and transfer the params of the pretrained yolo model
In the second one, we directly use the pretrained model
Is there any difference between them or are they just same?
some ways u could find out: 1. log the model's weights + arch, 2. inspect the config file, 3. Try on a sample x and compare, 4. Read the docs. But as a guess i'd expect so. @simple tapir
the yaml file may be for re-training (or creating a model from scratch.), but can't say for sure.
Heuristics and example for when and how to consider dropout: https://www.kaggle.com/code/pavansanagapati/what-is-dropout-regularization-find-out
Hi, I am looking for a Python-oriented AI notes PPT presentation , like python basics then numpy, pandas, matplotlib libraries.
Thanks
I don't have one, but fyi none of what you said (np, pd, matplot) are inherently AI
you might have some luck broadening your search to just 'data science' or something
Hey guys, Have a question on how I can run my program through the input from my phone's camera
But didnt seem to work for me
Hey, can someone tell me all the major steps and their minor steps, in order? Like what comes first and what follows. Where do we start from? Data gathering > wrangling or ELT pipelines > preprocessing (and which part of it), etc. I am very confused about the process in bits: e.g. transformation itself is part of preprocessing, but what else is part of it, and at what point in a project does it come?
for ai, ds and dl
The quadratic loss assigns more importance to outliers than to the true data due to its square nature, so alternatives like the Huber, Log-Cosh and SMAE losses are used when the data has many large outliers.
never used the Huber loss but it seems quite common
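The outlier sensitivity is easiest to see in the gradients, i.e. how hard each residual pulls on the fit: under the quadratic loss the pull grows linearly with the residual, while Huber caps it at `delta` and log-cosh at 1. A NumPy sketch; `delta=1.0` and the residuals are arbitrary:

```python
import numpy as np


def mse(r):
    return r ** 2


def huber(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def log_cosh(r):
    """log(cosh(r)) in an overflow-safe form: ~r**2/2 near 0, ~|r| - log(2) far out."""
    a = np.abs(r)
    return a + np.log1p(np.exp(-2.0 * a)) - np.log(2.0)


# Gradients w.r.t. the residual = how hard each point pulls on the fit
def mse_grad(r):
    return 2.0 * r


def huber_grad(r, delta=1.0):
    return np.clip(r, -delta, delta)


def log_cosh_grad(r):
    return np.tanh(r)


residuals = np.array([0.1, -0.3, 0.2, 0.5, -0.2, 10.0])   # last one is an outlier
for name, loss, grad in [("MSE", mse, mse_grad), ("Huber", huber, huber_grad),
                         ("log-cosh", log_cosh, log_cosh_grad)]:
    print(f"{name:8s} outlier loss {loss(residuals)[-1]:7.2f}   "
          f"outlier pull {grad(residuals)[-1]:6.2f}")
```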
im looking for some good book on constructing/designing/handcrafting loss functions w examples
See the pinned messages on this channel, maybe they can help you
thanks for the heads up, actually one of the books seems to have some stuff at least
@unkempt apex can I use this model with a flask script?
This is a bit of a huge task. Accent and dialect are two different things. Accent is how you'd pronounce u in cup, or whether you pronounce the r in car etc. Dialect goes into local vocabulary and maybe even grammar/syntax e.g. supposedly in some places people still said "thou" until fairly recently.
neat article, quite hard at parts https://en.wikipedia.org/wiki/Loss_function; sharing in case anyone wants to discuss it :-)
Hey, I have a AI+Cybersec Hackathon Problem Statement
I'd like if anyone could give their insights as to how they would approach this and how you would interpret this
1. Automated data collection from RAW images (forensic images) and other formats using disk imaging tools
2. Automate the scanning and analysis of data, including files, system logs, registry entries, network activity etc.
3. Identify indicators of compromise (IOCs) and related suspicious activities
4. Integrate AI/ML algorithms for anomaly detection and pattern recognition. The AI/ML feature should incorporate a scoring system and recommendation engine that allow investigators to quickly focus on the important artifacts.
5. User-friendly review options should include interactive timelines and graphical summaries, while comprehensive reporting capabilities should allow exports in various formats such as PDF, JSON, and CSV.
Emphasis on the 3rd and 4th point
Thanks
correct me if im wrong, ig we have to make an Anomaly detection like tool for Real time packets.
but the scoring system part is kinda confusing(pt4)
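On the scoring-system part of point 4: one reading is that each artifact (a process, file, log line, connection...) gets an anomaly score, and the tool sorts by it so investigators look at the top of the list first. A deliberately simple sketch using a robust z-score (median/MAD) per numeric feature; the feature names and values are hypothetical, and a real tool would likely use a learned detector instead:

```python
import numpy as np


def robust_z(X):
    """Per-feature robust z-score: (x - median) / (1.4826 * MAD)."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, 1e-9, mad)          # avoid division by zero
    return (X - med) / (1.4826 * mad)


def anomaly_scores(X):
    """One score per artifact: its largest absolute robust z-score."""
    return np.max(np.abs(robust_z(X)), axis=1)


# Hypothetical per-process features: [bytes_sent_kb, files_touched, registry_writes]
features = np.array([
    [12, 3, 0],
    [15, 4, 1],
    [11, 2, 0],
    [14, 5, 1],
    [13, 3, 0],
    [900, 250, 40],      # exfiltration-like outlier
])
scores = anomaly_scores(features.astype(float))
ranking = np.argsort(-scores)                     # most suspicious first
print(ranking[0], scores.round(1))
```

The "recommendation" part could then be as simple as reporting which feature produced each artifact's top score.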
Hello everyone, I am a student. I want to work on a product in machine learning. I know C, C++, Java and Python. I have also been learning from O'Reilly books and YouTube channels. How should I get started?
hi
Answer the question based on the above context: {question}"""
SYSTEM_PROMPT = """Based on the following context: {context}, please recommend the best tools for the question: {question}. Provide the tool names only in a Python list format."""
is this good for my AI RAG? Anything to add or remove to make it a good prompt, prompt-engineering-wise?
sry for late reply!
wdym by using this model?
you can do everything with it, finetune , inference(give input and receive output)
but for flask, then you have to make an endpoint!
just like we use chatGPT api!
but if you are using for yourself then just use
I was hoping to try it on a locally hosted web application
Im quite new to this
str(helper_llm.invoke(f"write the very short summarize & combine of the DuckDuckGo Search Result's Without Loosing Detail, Result:\n\n\n\n{result}").content)
I am using LangChain; the above is the prompt. How can I tell the AI not to include "Here is a short summary and combination of the DuckDuckGo search results without losing detail:"?
Current result:
Here is a short summary and combination of the DuckDuckGo search results without losing detail:\n\n**Summary:** OpenAI\'s CEO Sam ... interviews, including of members of the OpenAI Board of Directors.
Result I want:
I can use any model that is available on Groq
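If rewording the prompt (e.g. "reply with the summary only, no introduction") doesn't reliably stop the preamble, a small post-processing step can drop a leading "Here is ...:" line from the `.content` string. A sketch; the regex assumes the preamble is a single first line starting with "Here is" or "Here's" and ending in a colon, which matches the result above but may not match every model's phrasing:

```python
import re

# One leading line such as "Here is a short summary ...:" plus the blank lines after it
PREAMBLE = re.compile(r"^\s*here(?: is|'s)\b[^\n]*:\s*", re.IGNORECASE)


def strip_preamble(text: str) -> str:
    """Remove a single 'Here is ...:' preamble line, if present."""
    return PREAMBLE.sub("", text, count=1)


raw = ("Here is a short summary and combination of the DuckDuckGo search results "
       "without losing detail:\n\n**Summary:** OpenAI's CEO Sam ...")
print(strip_preamble(raw))  # → **Summary:** OpenAI's CEO Sam ...
```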
AI
have you loaded that model?
@radiant shadow @left tartan here too ^
What are your favorite prompt tips when using language and code models?
For me it’s « PEP8 style format » after a Python request
What are you trying to do?
Getting more quality outputs from local models I use daily
It’s a question with broad applications
Is this only for code generation? Because that doesn't go without saying.
It's gonna be bad code anyway, just gaining some time but for anything serious no copy and paste
I'm just curious, is this what you're expected to do throughout the internship? (it just doesn't seem like you'd be doing much of "generative" AI)
Looks like the internship is about building a data pipeline. It doesn't look like you'll be doing anything with any variety of AI. But your pipeline might support people who will.
Are the skills gained from this sort of thing (building a data pipeline and working with data pipelines) usually useful if one wishes to work in the ML field? Do people there still do these things?
Or is it usually completely separate people building the data pipelines and making models?
Yes, they're very relevant
In many jobs and roles (all the ones I've had) they were one person doing both
You can have a data engineer without a data scientist / ML engineer but not vice versa. If your future employer makes the mistake of hiring a ML person without a data engineer then you'll have to (be willing to) do both
I see, thank you!
I disagree, the outlines around classification have been drawn decades ago, by those measuring the first and second formants of vowel shifts: https://www.cambridge.org/core/journals/journal-of-the-international-phonetic-association/article/abs/formant-frequencies-of-vowels-in-13-accents-of-the-british-isles/857541BE2E95A40117CBF24DE5836F6E
Hi, how are you? Can you please tell me what to learn next? I'm a little bit confused. I completed a Python bootcamp. Which field is best: Data Science, Data Analyst, Cyber Security, or AI Engineer? About myself: my name is Danish, and I did a BS in Information Management from Punjab University Lahore. Thanks!
So that limits the scope only to accent, not to dialect, and only to vowels
I don’t know how
then learn to do that
Hi guys is it hard to build an image generation model without the use of any gpt?
Hey guys
I just got an idea but don't know where to start from.
So we use a postgresql DB
I was wondering if someone could guide me on how I can like just give a chat prompt for people, and an LLM model could understand based on the schema and table descriptions that I am going to provide.
I want to train the model with the database schema and its descriptions.
What I want to do is help people avoid giving information to all the common platforms they use, like ChatGPT, Claude, etc., and just train my own model. This way the users don't have to keep explaining things to the AI to get answers.
I am a professional, however im very new to all these so just wanted to know if this is something already done or any tutorials that could help me with it.
did you look into variational autoencoders? (encodes into latent vector, continuous space, and decodes to new data.)
Yes, encoder-decoder, but is it hard to build a model that can handle at least 1% of the data?
ive no idea im afraid, but maybe smone else here knows
Because I have understood the way it works, but then I wondered whether I can build any GPT model from scratch that does at least 2% of the job
I ran a model 100 times in dev, and I want to set a limit somewhat above the highest observed value so that I can account for performance in prod. What is an appropriate percentage above the upper bound?
e.g. runtimes of 10-100 seconds in dev. There are also RAM constraints, as 100% of RAM is being used
what's the % I should add above the upper bound that's not too extreme a case
10% seems too much if the model takes 10 s, and too much if it takes 1 week
I want my data to determine my extremes.
So that when I am working on my code in dev, and test it. It will throw error if it takes more than the acceptable range
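One way to let the data set the extreme is to make the margin proportional to the spread of the dev timings rather than a fixed percentage of their maximum, so it stays small for tight 10 s runs and large for noisy long ones. A sketch, assuming roughly stable run-to-run variation; `k=3` is an arbitrary starting point, and the timings below are synthetic:

```python
import numpy as np


def runtime_limit(runtimes, k=3.0):
    """Limit = max observed runtime + k * sample std of the observed runtimes."""
    runtimes = np.asarray(runtimes, dtype=float)
    return runtimes.max() + k * runtimes.std(ddof=1)


def check_runtime(elapsed, limit):
    """Fail loudly in dev when a run falls outside the data-driven range."""
    if elapsed > limit:
        raise RuntimeError(f"run took {elapsed:.1f}s, above the limit of {limit:.1f}s")


rng = np.random.default_rng(0)
dev_runs = rng.uniform(10, 100, size=100)   # synthetic stand-in for the 100 dev timings
limit = runtime_limit(dev_runs)
check_runtime(95.0, limit)                  # within range: no error
```

The same idea works for peak RAM: record it per run and derive a limit the same way.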
it's quite weird that the loss fn can be considered as just a fn you want to minimise, vs having some relationship to statistics
i wish one day ill understand that :-(
seems how wikipedia starts the page about Loss
maybe learning how linear regression can be seen as an optimisation problem (least squares, by calculus or linear algebra approach.) or some statistics problem can help, idk.
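For linear regression the two views meet: minimising the mean squared error (the optimisation view) gives the same weights as maximum likelihood under Gaussian noise with fixed variance (the statistics view), because that negative log-likelihood is the MSE up to constants. A NumPy sketch on synthetic data that gets the weights once from least squares and once by gradient descent on the MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])   # bias column + one feature
true_w = np.array([0.5, 2.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)                  # Gaussian noise

# 1) Closed form: least squares (solves the normal equations X^T X w = X^T y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Gradient descent on the mean squared error
w = np.zeros(2)
lr = 0.1
for _ in range(5000):
    grad = 2.0 / n * X.T @ (X @ w - y)      # gradient of mean((Xw - y)**2)
    w -= lr * grad

print(w_lstsq, w)   # the two estimates agree closely
```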
what is this percentage for?
Trying to figure out which OS to use for data engineering before I jump into learning
I've been a web dev for few years now and will transition to data by new year
I am comfy using any and all win/linux/macos
I just wanna know like what is the preference in workstation/laptop setups real data engies use
100% would require the processing power of idk , maybe a lot , so used the 2%
I use Linux, does the job for you, not much of a problem installing and running dependencies, and it handles environments too.
What kind of machine do you own? Do you work on a workstation or a laptop, and what are the specs?
Okay, I have the HP Pavilion gaming laptop, 16 GB RAM, 512 GB SSD and a GTX 1650, and I have dual-booted it into Windows and Linux
this is what I am trying to avoid, working on win with wsl lol
hoping that unix based MacBook with m3 chip will do it for me
and for any ml or ds tasks, I use colab
One of the best tools
Macbook should run perfectly well
I really really hope so, cuz I have no experience with data engineering tools and hoping that there will be no issues
like compatability and stuff idk, just wanna take the job with me to cafe if I want to
and MacBooks are awesome for that
pretty cool wiki article https://en.wikipedia.org/wiki/Empirical_risk_minimization
Empirical risk minimization is a principle in statistical learning theory which defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practi...
I barely understood anything lol
Only windows could be a pain, the rest is ok
and yet the best data engi I know in real life uses 6th gen i5 windows PC... lol
1st paragraph?
Most companies force the OS on you anyway. It's just a pain for Spark and things like that; it's fine but not the best
Yes okay, I think that explains the term Empirical Risk
Background? i.e 2nd section
So should I get a MacBook Pro M3Pro chip 18gb ram lol
ive got a mac mini w silicon chip, they are quite nice and cheap from what i see (2nd hand)
Sounds good
if one assumes P(x,y) -or joint probability distribution- exists it seems that P(y|x) (or conditional distribution) makes sense for DL. That should be what the model ends up estimating.
P(y|x) is just a slice of P(x,y)
i don't think the pixels are independent variables though..
hey guys does anyone know any ai tools that automatically create flashcards for study? i like quizlet but it doesn't include everything
We rarely know the P(y|x) distribution; when we do, that's exactly where we can theoretically use the naive Bayes algorithm
isn't this the trained model? i.e f(X); say for a classification task.
im trying to map this fn to DL as well (the Risk); i'd say the integral is normally a sum and L the loss, but i can't see dP(x,y) mapping to anything
if one has \int sin(x)dx then maps to \sum sin(x) delta
dP(x y) could mean it's for classification or regression, it's a measure, dw it's just a very general way of writing
We don't know an explicit conditional distribution, if we know P(y|x) then we can explicity do a model with it let's say for binary classification we could write it simply with P(y=0|x),
wait isn't that 1/n ? ig it's not..
i dont understand, if we run many examples x through f(x) and f is the NN, and the output is interpreted as probability, it seems to me P(y|x)~f(x) ?
Mostly for code generation, inspiration and learning, not copy-paste.
the classifier has to return a class (or a number for regression); in this case, theoretically, the classifier is what you get when you apply the argmax on top
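A small check of the "model output ≈ P(y|x)" reading: generate binary labels from a known P(y=1|x), fit a logistic model by minimising the mean cross-entropy, compare its outputs with the true conditional probabilities, then take the argmax (for two classes, a 0.5 threshold) to get the classifier. A NumPy sketch; the true parameters 1.5 and -0.5 are made up:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-3, 3, size=n)
true_p = sigmoid(1.5 * x - 0.5)                 # the (normally unknown) P(y=1 | x)
y = (rng.uniform(size=n) < true_p).astype(float)

# Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean cross-entropy
w, b, lr = 0.0, 0.0, 0.5
for _ in range(3000):
    p = sigmoid(w * x + b)
    w -= lr * np.mean((p - y) * x)              # d(cross-entropy)/dw
    b -= lr * np.mean(p - y)                    # d(cross-entropy)/db

grid = np.linspace(-3, 3, 7)
print(np.round(sigmoid(w * grid + b), 2))       # model's estimate of P(y=1|x)
print(np.round(sigmoid(1.5 * grid - 0.5), 2))   # true P(y=1|x)
predicted_class = (sigmoid(w * grid + b) > 0.5).astype(int)  # argmax over {0, 1}
```

For K classes the same picture holds with a softmax output and an argmax over K probabilities.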
very easy
just get a pretained model, done.
lol
now if he asks "from scratch" let's run away
now if he asks "but it has to be good" I'll answer: if you have few million dollars, you can do it by funding me and I'll do it for you
he meant general pretrained model :-)
what's a general pretained model?
any pretrained model, it was just a bad joke
lol ok haha,
but yeah anyway it's obviously the goal to approximate this
nice, thanks. i still dont really get it but am closer than before, effectively lowering my loss i think
what are you talking about btw?
im trying to understand a neural network in statistical terms as opposed to an optimised function, or something close to that @fiery bane
this article is so far the simplest description ive found, https://en.wikipedia.org/wiki/Empirical_risk_minimization
though not quite complete.
it applies to all supervised learning tbf
Isn't loss basically the difference between true predictions (true positives, true negatives) and false predictions (false positives, false negatives)?
So this is what you're trying to minimize: 0 loss -> your estimate lines up with the true values 100%
(I'm a noob so take this with a mountain of salt)
like this? https://deeplearningtheory.com/
Official website for The Principles of Deep Learning Theory, a Cambridge University Press book.
yeah, 100%
but it's possible to see it statistically
oh no, no that book lol
if edward witten likes it, i wont understand it
but yeah, it includes a lot of fantastic stuff
lol ias people
How about this? https://yann.lecun.com/exdb/publis/orig/lecun-06.pdf
that looks good, nice to see some physics formulas there
thank you both @fiery bane @spare forum
It would be easier to find something like a master's degree course
compared to what?
Finding articles etc...
Well, maybe all he needed was that one article I posted lol
Just saying, bc on those courses you have a bit of everything centralized with the most important, may be heavier maths tho
that's true.
I think the best combination is if there's a text book, and a course based on just that textbook
finally understood the formula (approx)
the risk minimisation formula is:
- weighing the error (loss) with the probability of that instance,
- adding it all up (an integral over the x and y vectors), aka the risk or probability of error,
- and minimising it (wrt the weights).
(in practice, it ends up being the standard mean-loss minimisation by backpropagation.)
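Written out, the steps above amount to the following (notation is ours, not from the thread: `f_w` is the network with weights `w`, `L` the loss, `P` the unknown joint distribution of inputs `x` and labels `y`, and `(x_i, y_i)` the `n` training examples). The true risk is an integral against `P`; empirical risk minimisation replaces `P` with the training set, which turns the integral into the mean loss:

```latex
R(w) = \int L\big(f_w(x), y\big)\, dP(x, y)
\;\approx\;
\hat{R}_n(w) = \frac{1}{n}\sum_{i=1}^{n} L\big(f_w(x_i), y_i\big),
\qquad
\hat{w} = \arg\min_{w} \hat{R}_n(w)
```

The `dP(x, y)` form is where a Stieltjes-type (or Lebesgue) integral comes in: it weights each loss by probability without requiring `P` to have a density. For a fixed `w`, the law of large numbers is what justifies treating the empirical risk as an estimate of the true risk.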
no, but this paper really blew my mind https://papers.nips.cc/paper_files/paper/1991/file/ff4d5fbbafdf976cfdc032e3bde78de5-Paper.pdf
i dont know much and need to go in steps
never heard of this before https://en.wikipedia.org/wiki/Riemann–Stieltjes_integral but it was useful for it
I mean, sounds like this is the kind of things that you want: https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
actually looks fantastic, great plot quality
last few days i read 5 papers and saved about 50
XD im falling behind
do you want more?
no, i just want to understand
good luck!
thank you, same :-)
I don't need luck.
I need miracles T__T
u theist?
yea sure
do AI/ML positions ask for LC during interviews?
getting error while installing packages with pip
on aws ec2 instance
```
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pip/
```
this one
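A quick way to confirm the diagnosis, as a minimal sketch: run this with the same interpreter pip uses (`python3 -m pip --version` shows which one). If the import fails, that Python was most likely compiled without OpenSSL support, which typically happens when Python is built from source before the OpenSSL development headers are installed. `ssl_available` is a hypothetical helper name.

```python
import sys


def ssl_available() -> bool:
    """Return True if this interpreter can import its ssl module (pip needs it for HTTPS)."""
    try:
        import ssl  # noqa: F401  (only checking that the import works)
    except ImportError:
        return False
    return True


print(sys.executable, "-> ssl available:", ssl_available())
```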
how do PyTorch, TensorFlow and other neural network frameworks save the weights and biases so efficiently? I'm making my own module from scratch and I notice that my file sizes are in the GBs, and all I did was copy the YOLO v1 architecture
https://github.com/TheonlyIcebear/Neural-Net-Framework/blob/main/utils/network.py
is it just that the data is compressed using some algorithm?
I'm not interested in classifying dialect, just accent, partly because I believe simply identifying the vowels' first and second formants will provide as much geolocation information as more detailed examination of speech. It still requires dictation with phonetic time points (e.g. a first pass with a STT service like AssemblyAI, a second and third pass for forced alignment of words and phonemes with PocketSphinx, and a fourth pass putting each voiced phoneme into a formants extractor.) That could build a classifier with enough training data.
Unfortunately, nobody seems to have labeled training data of just "people native to $CITY, UK saying things". Is it even possible that doesn't exist somewhere yet?
just leaving this here.
zlib
thx
What are you comparing with? Maybe the model you're looking at is quantized and your version isn't?
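On the file-size question, a rough sketch of why size depends mostly on *how* the numbers are written rather than on compression: frameworks store each parameter as raw binary (4 bytes for float32), whereas writing floats as text costs several times more bytes per value. The array below is random stand-in data, not real YOLO weights, and `raw_size_bytes` / `saved_sizes` are hypothetical helpers for illustration.

```python
import io
import zlib

import numpy as np


def raw_size_bytes(n_params: int, dtype=np.float32) -> int:
    """Bytes needed to store n_params values of the given dtype, with no overhead."""
    return n_params * np.dtype(dtype).itemsize


def saved_sizes(weights: np.ndarray) -> dict:
    """Compare a binary .npy dump, a plain-text dump, and a zlib-compressed .npy dump."""
    binary = io.BytesIO()
    np.save(binary, weights)  # small header + itemsize bytes per value
    text = io.StringIO()
    np.savetxt(text, weights.reshape(-1, 1))  # one decimal number per line
    return {
        "npy": len(binary.getvalue()),
        "text": len(text.getvalue().encode()),
        "npy+zlib": len(zlib.compress(binary.getvalue())),
    }


rng = np.random.default_rng(0)
w = rng.standard_normal(100_000).astype(np.float32)  # stand-in for one layer's weights
print(raw_size_bytes(w.size))  # 400000
print(saved_sizes(w))
```

Random-looking float weights usually compress only modestly with zlib, so switching to a binary float32 format (or quantizing, as suggested above) tends to matter more than compression.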
to improve my model's training time... is it enough if I just install an eGPU?
I'm using a MacBook
and I'm planning to deploy the model in Docker, then connect the model via an API to my EVE-NG
because I want the model to analyse time series data from my lab, which resides inside EVE-NG
EVE-NG is virtual environment for computer networking
can you guys tell me what your development environment is for working or fine tuning models? I as a student use a gaming laptop until i blew my gpu a while ago, it wasn't that strong (GTX 1650) but it got the work done, but now i only have integrated graphics to work with which is abhorrent, so... what are some places where i can migrate my project to, to get gpu access?
If you Want to train, like there are some options as kaggle, colab
But then working with it maybe you need to adapt for cloud gpu providers or maybe AWS things
I have tried both (AWS and Colab), it depends on what you want to do with it and how big your model is
this project is that to use the model to analyse my network lab
well i usually work with py files and not notebooks, are notebooks capable of OOP's?
Yeah , after finishing your code , you can download that notebook as .py also
also, don't worry about the network, it's just my own small network lab project, and I've been majoring in computer networking for my degree
It's python in all case, notebooks just separate chunks of code and outputs
anyone ?
Wait , some one will respond
You want to decrease training time?
okay guess i will learn google colab then
Then it literally depends on which eGPU you use
for finetuning models using pytorch will i require colab pro?
yh
Colab Pro only gives you GPU access for longer!
You can still fine-tune within time limits
for model like SVM is it CPU or GPU focused?
interesting paper this one https://arxiv.org/pdf/2012.05208
Most ML models don't need a GPU
I used to learn SVM on my cpu
what CPU did you use back then? ...and what data? Was it kind of bad, or so-so?
sneak peek for the curious
i mean asked you that time
i mean i want to connect the model
to an application which is used for my virtual computer networking lab
What do you mean by connecting the model?
You can host that model and integrate API calls (just like GPT)
Or maybe add the model to the lab, and then use it with short Python code
EVE-NG computer networking virtual environment
Is that web app?
app
Ahh, then I am not familiar with that
I guess you have to use API then by hosting your model
an application; my computer networking lab is inside the app and I want to connect the model to analyse my lab's traffic data
Have you ever tried integrating API calls in an app?
Any app
Oooh, that's really interesting
I should read that paper
nice, if you do we can discuss it
at least the intro, idk how complex it gets later so i might not be able to discuss the rest XD
You can also integrate the model into the app itself, but only if the model is very small, and again that would be the wrong approach
wait... you can't use terminal in colab? how do you import libraries?
Lol, nice question, Colab does this for you
Just import it in code
damn i am dumb
paid colab has terminal
It's somewhat reasonable that AI won't work out of distribution, but does it learn generalisable units that make it easy to learn out of distribution? The answer is: to some extent yes (fine-tuning and other approaches), and no (they can't solve ARC challenges).
Why does this happen?
but most of those guys now seem to disagree (see Bengio and Karpathy)!
I don't know, I feel like this is the sort of question the leading experts are trying to solve and I don't have the knowledge
I mean, fundamentally you should be able to learn to generalize based on limited information because humans do it. That's the thing.
So maybe the approaches we are using are just not yet good enough, like we haven't discovered how to make machines that "learn concepts" in the way that allows for this sort of generalization
Some people would say "just throw more compute at it" but idk about that 🥴
Yes, I agree, it's quite puzzling
that paper says:
In this work, we will adopt a more unified approach that addresses these problems from within the framework of connectionism.
we'll see (the "problems" are of creating more abstract, symbolic units; and "connectionists" is just standard deep learning.)
visually, it looks like this (the last bit is similar to Marvin Minsky's diagrams.):
Hi fellow data scientists
Bois is anyone interested in helping me in a project
Abt voice keyword detection
It's for a competition. I need help in vc
Hmmm, so basically every object corresponds to a "mental object" and that's what allows us to reason about things abstractly, like "this object is like this relative to this object"
This is my understanding
And NNs don't currently have that capability
imho the notion of "object" isn't that difficult to learn, isn't SAM (Meta's Segment Anything Model.) excellent at that?
im not sure whether it knows an object from a part of it, though, but does not confuse them in a way..
I don't know anything about SAM
But also "object" is kind of a really vague term
yeah, there are papers about what an object is...
Like, anything is an object. You can say that any part of an object is an object, any property of an object is an object, any action is an object, etc.
If we're talking about a "mental object", which again, I'm not defining well so maybe what I'm saying doesn't make sense 🥴
you can directly use sam
it's worth checking it https://segment-anything.com/demo
Oh nice, I'll look into it later
If it's computer vision, it's more about detecting objects right?
I think the paper talked more about the ability to reason about objects (from reading the beginning)
Oh cool
yes, but for that you need segregation
one problem seems that networks have all the information merged
Which is the first step here right
exactly
and the reasoning is the composability
so i'd do: SAM => NN 1 => NN 2 say
NN 1 may not be necessary actually. that's only for CV
Yeah I think that would make sense to have multiple models working together
in my mind binding problem == not having segregation
Guys is there someone who's willing to collaborate with me on a keyword detection project I don't have much knowledge in that field any help is appreciated
read sects 1 & 2, will read 3 tomorrow likely
Nice, I only read section 1 so far
ended up reading 3 as well, not all the details though, and will instead read 4 + 5 tom.
Any ESRGAN expert here??
Hello, remember to never ask to ask. always ask your actual question.
I am asking who is a ESRGAN expert here? So I can get to know about upscaling and stuff
right. don't ask who knows about ESRGAN. ask a question that someone who knows about ESRGAN could start answering.
by waiting for someone who thinks they know about ESRGAN to present themselves, you're creating extra steps for that person if they ever appear, and preventing other people from potentially helping.
Hi, does anyone know how I can plot a line for average increase over time using one axis with Matplotlib?
Usually I do it based on 2 axis but cannot get any methods to work using 1
example data = [10,20,12,14,12,9,15,18,12,10,15,14,17,10,20]
The other axis is the days. Example: ["10th July", "11th July", "12th July", "15th July", "16th July", "20th July"]
np.mean()
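`np.mean()` gives a single number; to draw an "average increase over time" line from one list of values, one option (a sketch, not the only reading of the question) is to use the value positions as the x axis and fit a straight line with `np.polyfit`, whose slope is the average increase per entry. A running average is included as a second option. `trend_lines` and `plot_trend` are hypothetical helper names; matplotlib is imported only inside the plotting function.

```python
import numpy as np


def trend_lines(values):
    """Return (x positions, running average, least-squares trend line) for a 1-D series."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))                       # positions 0..n-1 serve as the time axis
    running_avg = np.cumsum(y) / (x + 1)        # mean of all values seen so far
    slope, intercept = np.polyfit(x, y, deg=1)  # slope = average increase per entry
    return x, running_avg, slope * x + intercept


def plot_trend(values, labels=None):
    import matplotlib.pyplot as plt  # local import: the maths above runs without matplotlib

    x, running_avg, trend = trend_lines(values)
    plt.plot(x, values, marker="o", label="data")
    plt.plot(x, running_avg, label="running average")
    plt.plot(x, trend, linestyle="--", label="linear trend")
    if labels is not None:
        plt.xticks(x, labels, rotation=45)  # e.g. one day string per value
    plt.legend()
    plt.tight_layout()
    plt.show()
```

For example, `plot_trend(data)` with the example list; pass `labels=days` only when `days` has one label per value (the example `days` list above is shorter than `data`).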
I want to know about ESRGAN upscaling of images and videos on Google Colab???
paperspace is pretty great
someone who would know about ESRGAN could not answer this question without asking follow-up questions, please just ask your actual question
My Google Colab shows this whenever I upscale videos using ESRGAN
How to solve this issue??
this appears completely unrelated to ESRGAN apart from it being somewhat involved in the process as a whole 
and there's not enough information to answer your question, you haven't defined a name 
to fix this, you would define this name
if it works locally but not on google colab, consider it being an issue with environment compatibility, for example, you're using a newer or older version on google colab than locally
I got this Colab notebook from someone, but they are unreachable. You also have to mount Google Drive, then specify the input folder and the model's path. After that the upscaling begins. But this is not working with videos; on images it works fine.
Should I send you the Colab notebook for corrections????
how did you get it "from someone" if they are unreachable?
the error is pretty clear on what the issue is and the issue appears to be in your notebook
because in some code path the name pre_upscale is not defined, yet used
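A minimal, hypothetical reproduction of that kind of error (the real notebook isn't shown here, so `process_broken`, `process_fixed` and `upscale` are made-up names): a variable assigned only inside one branch is unbound whenever the other branch runs. Inside a function Python raises `UnboundLocalError`, a subclass of `NameError`; at the top level of a notebook cell it is a plain `NameError`. The fix is to make sure every code path assigns the name before it is used.

```python
def upscale(frame):
    """Hypothetical stand-in for the real ESRGAN call; it just tags its input."""
    return f"upscaled({frame})"


def process_broken(path: str):
    if path.endswith((".png", ".jpg")):
        pre_upscale = path  # only assigned for images...
    return upscale(pre_upscale)  # ...so a video path reaches this line with the name unbound


def process_fixed(path: str):
    if path.endswith((".png", ".jpg")):
        pre_upscale = path
    else:
        # Hypothetical: a real notebook would extract the video's frames here.
        pre_upscale = f"frames-of:{path}"
    return upscale(pre_upscale)  # every path has now defined pre_upscale
```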
Before I joined a community but it somehow got deleted or discontinued. Some user wrote me this but after that they don't even respond to messages and emails
What should I use instead???
I would assume the same thing, but it's just not defined in that code path
and you still haven't provided more information
well, I suppose you asked whether you should and I didn't respond to that...
anyway, paste your code here: https://paste.pythondiscord.com
Should I send you my Colab??
!paste no, paste it here
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
I have pasted the Colab code, how do I share it with you?
send the link to the paste
Could you kindly review and correct what's necessary
I would be grateful to you
can you point me to where in the code is pre_upscale defined?
Ok
Line number 56
In video upscaling
which line of code is that?
do you have some understanding of how Python works? because I feel not and if that's the case, I would advise you to start from the beginning, you can check out the resources linked below to get started
!resources
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
Could you correct it??
as I said, it's not defined there, it's only referenced there, you need to define it first
and since you don't quite seem to understand that, I feel as though you should pick up on the basics before attempting such endeavours, that'll make it easier for you down the road too
How to define it???
I'm afraid it won't be that simple, for one, I have no idea what that function is supposed to do, and two, it would take quite a bit of effort to define it (probably), so, again, I would suggest you start with the basics and slowly work your way up
Could you refer me to someone??
^ the resources here are fantastic
first time using a U-Net model, so this is after 20 epochs,
but the mask should only contain the lines for road
nnuh uh
not too sure what that is, but is that just an edge detector?
if so it seems to be doing its job
ofc edge detector for roads, and then drawing those lines
like this one, where it is only creating lines
what was the input for that?
it seems like it is only detecting the lines over a certain thickness, so likely needs more training time / model capacity or its a dataset issue
idk tho im prolly wrong
like this
num_classes = 1
is it okay?
because I only want that white lines?
supervised or unsupervised?
if these are the masks then you probably don't want to have a grayscale mask as an output, you want to convert pixels above a threshold to pure white and below the threshold to pure black and perchance calculate the loss with that
```python
image = Image.open(img_path).convert('RGB')
mask = Image.open(mask_path).convert('L')
```
this is the code I used while loading dataset
so all masks are grayscale
so do you think this is bothering it?
I meant that you convert the output of the network to black and white, instead of having values in between
or I should use mask images as it is?
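For the ground-truth side, a minimal sketch assuming the masks are white lines on a black background: binarize the grayscale mask into 0.0/1.0 targets, which is what a sigmoid output trained with binary cross-entropy expects. The 128 cut-off and the helper name `binarize_mask` are assumptions; you would call it as `binarize_mask(np.asarray(Image.open(mask_path).convert('L')))`.

```python
import numpy as np


def binarize_mask(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Turn a grayscale (0..255) road mask into float32 targets of 0.0 / 1.0."""
    # Pixels brighter than the threshold count as "road line" (1.0), the rest as background (0.0).
    return (np.asarray(gray) > threshold).astype(np.float32)
```

If masks are resized with bilinear interpolation (the default for `transforms.Resize` in recent torchvision versions), the line edges turn gray again, so binarize after resizing or resize masks with nearest-neighbour interpolation.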
not understood!
like here with the predicted mask if the value of a pixel is say greater than 150, you just set it to 255 and if the value is less than 150, you just set it to 0
okay okay got it now
lemme see how it can be done
lemme know how it goes, I'm curious as well 😁
```python
import numpy as np
import torch
from PIL import Image


def predict_single_image(img_path, model, transform, device, threshold=150):
    image = Image.open(img_path).convert('RGB')
    image = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(image)
        output = torch.sigmoid(output)  # probabilities in [0, 1]
    output = output.squeeze().cpu().numpy()
    # now applying threshold (given on the 0-255 scale, so rescale it to [0, 1])
    binary_mask = (output > (threshold / 255.0)).astype(np.uint8) * 255
    return binary_mask
```
is it good?
nah, still not getting correct output
you can just use np.where
```python
binary_mask = np.where(output > (threshold / 255.0), 255, 0)
```
okay so no matter what I change the threshold, it still gives me this
tried 0.5 also
this what?
are you using this?
yeah
then it, uhh, doesn't make sense, if something were off with the threshold, you'd be getting either a completely white or a completely black image
(did you save the code?)
yeah auto save on vs code
well, surely something's not running
did you rerun the code?
how are you displaying the image?
can you go into a debugger and look at it?
```python
import torch
import torch.nn as nn
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
from model import UNet
from customDataset import CustomDataset

model_path = 'best_model.pth'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = UNet(n_class=1)
checkpoint = torch.load(model_path, map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(device)
model.eval()

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])


def predict_single_image(img_path, model, transform, device):
    image = Image.open(img_path).convert('RGB')
    image = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(image)
        output = torch.sigmoid(output)
    output = output.squeeze().cpu().numpy()
    return output


test_image_path = 'random.jpg'
predicted_mask = predict_single_image(test_image_path, model, transform, device)

# Visualize the result
original_image = Image.open(test_image_path)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(original_image)
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(predicted_mask, cmap='gray')
plt.title('Predicted Mask')
plt.axis('off')
plt.show()
```
this is the whole test.py if you want
also, why use a debugger?
ofc
so, uhh, where in this do you have that np.where?
debugger is an amazing tool
wtff sorry
I accidentally made the changes in train.py
it's late night here and I am still awake with half open eyes😂
need to sleep but after this testing
okay threshold is working
but the result isn't accurate
for example, setting 150, giving full black image
perchance need to lower it
but also, try training on that mask
use it for calculating the loss from the ground truth and such
training on mask?
how?
by specifying mask_image with threshold?
mmm, I'm not sure, maybe what I'm thinking of is more suited for a metric instead of a loss, I was thinking of essentially using log loss and comparing the masks you get from the model after applying this threshold to the ground truth image
is it a problem in training? Because testing seems simple now (just use random.jpg and generate the mask with the model being trained)
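The metric idea above can be sketched like this: threshold the predicted probabilities, then compare with the ground-truth mask using intersection-over-union (IoU) and Dice, two standard overlap scores for segmentation. This is a swapped-in suggestion rather than what was proposed in the thread (which mentioned log loss); the 0.5 threshold and the `mask_scores` name are assumptions. Thin road lines make plain pixel accuracy misleading, because an all-black prediction already scores high, whereas IoU and Dice only reward overlap with the lines.

```python
import numpy as np


def mask_scores(prob: np.ndarray, target: np.ndarray, threshold: float = 0.5, eps: float = 1e-7) -> dict:
    """IoU and Dice between a thresholded probability map and a binary ground-truth mask."""
    pred = prob > threshold  # boolean predicted mask
    gt = target > 0.5        # boolean ground truth (assumes 0/1 targets)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = (inter + eps) / (union + eps)                  # eps: two empty masks score 1.0
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    return {"iou": float(iou), "dice": float(dice)}
```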
I mean, clearly the model has either not had enough training or the training was ineffective
okay will give a look at it, thanks for the time
hey guys, some friends and I are working on some hackathon projects this month in the DS/ML space. if anyones interested in joining in shoot me a message!
Specifically Open AI doesn't have free API as far as I know, it's paid
You could look at Hugging Face, I've heard they have some free API there (with a different model). It's going to be limited of course but for testing it should be fine
Thank u so much man ❤️ .
Another question please
are there any free alternatives to Google Colab Pro with more RAM? I can't work with only 12.7 GB of RAM
you can't use it to host models, no.
the only other colab-like platform I know of is kaggle notebooks. but either way, no one is going to give you unlimited free compute.
Hugging Face Spaces is pretty generous for model deployment tbh
I don't think you're going to find >12GB RAM for free without strings anywhere though
hey, I just joined. Can anyone suggest a library for transforming excel files into markdown which preserve as much as possible of the original formatting? I've done openpyxl -> pandas -> markdown, but you lose a lot of formatting there.
Have you looked at Github Codespaces? https://docs.github.com/en/codespaces/overview
Account plan | Storage per month | Core hours per month
GitHub Free | 15 GB-month | 120
GitHub Pro | 20 GB-month | 180
Hii
Can someon help me with something
I am getting my data like this
But i want it like this
data2.head()
or even just data2 ?
i wanted to make a network by reworking a tutorial
idk much about the matrix and vector operations involved but i know mostly how it should work
my problem is that my code isnt really learning. it stops at like 60% precision
code: https://paste.pythondiscord.com/JY7A
can any1 tell what i messed up?
tutorial: http://neuralnetworksanddeeplearning.com/chap1.html
i used the same settings and he starts at 90% i arrive at 60%-70%
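for the activation function derivative, try this maybe, tell me if it's good:
```python
import numpy as np


def sigmoid_prime(z: np.ndarray[float]) -> np.ndarray[float]:
    # derivative of the sigmoid: s'(z) = s(z) * (1 - s(z)); sigmoid() is defined elsewhere in the network code
    s = sigmoid(z)  # compute the sigmoid once and reuse it
    return s * (1 - s)
```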
thx checking rn
Give me 10 minutes to write code for your evaluation function to make sure it's computing accuracy correctly
i can wait thx
also, as i see it, that should be a speedup, but it only avoids running it twice; it doesn't really change the fact that my score seems to cap at 60.7%
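```python
def evaluate(self, test_data):
    # self.feedsforward must match the name of the network's forward-pass method
    # np.argmax(y) assumes one-hot labels; for integer labels compare with y directly
    test_results = [(np.argmax(self.feedsforward(x)), np.argmax(y)) for (x, y) in test_data]
    return sum(int(x == y) for (x, y) in test_results) / len(test_data)
```
(Evaluation function)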
try @cosmic willow
you right
not that you need to really optimize an evaluation function
making sure its correct is important
but yea
ofc ofc, I just mean it's a tiny tiny part of your runtime if you're training a model
yeah
how could i implement printing the train accuracy too? i feel like it may only be learning that.
the function spartan gave you does calculate accuracy, just print it
on the test data not the train data
u can add a method to calculate training accuracy
similar to evaluate
and you can update the learn method to print the accuracy on both the training and test datasets
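A minimal sketch of that suggestion, written as a standalone function so the same code scores both datasets. It assumes each dataset is a list of `(x, y)` pairs and accepts labels either as integers or as one-hot vectors (in the linked tutorial's loader, training labels are one-hot while test labels are integers, so the two need different handling). `accuracy` and `label_index` are hypothetical names.

```python
import numpy as np


def label_index(y) -> int:
    """Accept either an integer class label or a one-hot (column) vector."""
    y = np.asarray(y)
    return int(y) if y.ndim == 0 else int(np.argmax(y))


def accuracy(predict, data) -> float:
    """Fraction of (x, y) pairs where argmax(predict(x)) matches the label."""
    correct = sum(int(np.argmax(predict(x)) == label_index(y)) for x, y in data)
    return correct / len(data)
```

Inside the training loop this would look like `print(accuracy(self.feedforward, training_data), accuracy(self.feedforward, test_data))`, using whatever your forward method is called; wrap `zip(...)` datasets in `list(...)` first so `len()` works.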
finally got my first data science job!
that is great man
the train and test accuracy seem to be the same, but it still starts at around 60%, goes up to 67%, and fluctuates there
congrats
congrats
Gaming GPUs and enterprise ML GPUs are now two separate classes, basically
Yes but enterprise GPUs are pretty expensive.
depends on what you want to run, I've seen some things that require >20GB
I was hoping to work with smaller LLMs like Llama-3-8B and train neural nets
Do you need it tho
basically, any CUDA-enabled GPU is good enough for whatever you can fit on it. And if you're training/fine-tuning a model, you need to factor in the memory footprint of that as well.
If you're trying to fine-tune an interactive LLM that came out within the last year, gaming-tier GPUs might not be enough.
Hmm, even an 8B model?
calculate how much VRAM you would need to load the 8B param model, and then how much extra room you would need for fine-tuning.
if that can fit on a 4080, yay. if not, you might be able to quantize it.
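The back-of-the-envelope calculation suggested above, as a rough sketch: weight memory is parameter count times bytes per parameter. The 16 bytes/param figure for full fine-tuning is a commonly quoted rule of thumb for Adam in mixed precision, not an exact number; activations, KV cache and framework overhead all come on top, and 4-bit formats carry a little extra for their scaling factors. The helper names are hypothetical.

```python
import math

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weights_gb(n_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def full_finetune_gb(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Rule of thumb for full fine-tuning with Adam in mixed precision.

    ~16 bytes/param = fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
    + two fp32 Adam moments (4 + 4). Activations come on top.
    """
    return n_params * bytes_per_param / 1e9


n = 8e9  # an "8B" model
for p in BYTES_PER_PARAM:
    print(f"{p:>9}: {weights_gb(n, p):5.1f} GB for weights")
print(f"full fine-tune (Adam, mixed precision): ~{full_finetune_gb(n):.0f} GB + activations")
```

By this estimate an 8B model in fp16 needs about 16 GB for the weights alone, which already fills a 16 GB card such as the RTX 4080, while a 4-bit copy needs roughly 4 GB plus overhead; full fine-tuning is far beyond a single gaming GPU.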
You can use computational resources from any cloud provider; you don't need to have the latest GPU at home
I've never done the math on that, but buying compute time on an enterprise GPU for the specific experiments that you want to do is probably going to be cheaper than buying a gaming GPU.
(I don't train models on my gaming computer because then I wouldn't be able to game while I wait for the model to train.)
Yeah
There is no way it's economical to buy an enterprise GPU for this, it's more of a geeky satisfying thing
(which is fine)
true
use a cloud gaming service to free up your GPU /s
Can I combine a bunch of these with NVlink or SLI:
https://www.amazon.ca/dp/B09SJ2BZ85?ref_=cm_sw_r_cp_ud_dp_QGCCXX2FHAP0HQX8S34R
Actually, 2 of them would also do the job
24GB VRAM
you'll get a performance penalty each time a computation spans more than one device
I see, so I'd benefit when tasks are larger than 12GB?
By tasks I mean models and Datasets
having two 12GB GPUs is worse than having one 24GB GPU, because data will occasionally need to move from one device to the other.
Yes, true, but it would save me some money.
Whereas a 4090 would be nearly 1k$ more.
sure. I'm just letting you know that that's how it works.
How much of a performance impact could I face?
I'm not sure
Ohk, thanks though.
Either this or 2 used 3090s.
Cause new ones are very expensive.
there's also techniques like qlora or tools like unsloth to help reduce vram requirements
for running the llm you need a lot less
Yeah, I thought LoRA was only to reduce training times tho?
does it help with VRAM limitations as well?
qlora is like training on quantized models, and quantized models are smaller obv
I thought LoRA was when you changed and trained 1-3% of the model's weights or something.
the q in qlora is quantized
so you're doing stuff on a quantized model, and quantized models are smaller than not-quantized models, thus takes less vram
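To put numbers on the LoRA part: LoRA freezes a weight matrix `W` and trains only a low-rank update `B @ A`, so for a `d_out x d_in` matrix it adds `rank * (d_out + d_in)` trainable parameters. Gradients and optimizer state are kept only for those, which saves memory as well as time; QLoRA additionally stores the frozen `W` in 4-bit. The 4096 size and the ranks below are illustrative assumptions, not Llama-3-8B's exact layer shapes, and `lora_params` is a hypothetical helper.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    The update is B @ A with B of shape (d_out, rank) and A of shape (rank, d_in).
    """
    return rank * (d_out + d_in)


d = 4096      # hypothetical size of one square projection matrix
full = d * d  # parameters in the frozen matrix
for r in (8, 16, 64):
    extra = lora_params(d, d, r)
    print(f"rank {r:>2}: {extra:,} trainable vs {full:,} frozen ({extra / full:.2%})")
```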
Oh okay!
here's a table on llama factory
https://github.com/hiyouga/LLaMA-Factory#hardware-requirement
I see thanks!
actually, I see people buying tesla P40s for inference, not sure how they are when it comes to training
they're pretty old, but has 24gb vram and definitely cheaper than a 4090
Oh, let me check that out
I found this for $200
It seems to have 24GB RAM
4992 cores sounds pretty low?
Seems to have more than a T4
Not bad
not good either
but not bad, especially for the price.
4080 SUPER was 10240?
hmm not as low as I thought
Indeed
but won't the gen of the CUDA cores also play a role?
or are they all the same?
that's a k80, even older than a p40
nonetheless, these are all old cards, and thus support an old version of CUDA, so I'd check compatibility at least before purchasing
idk
also I have no idea what this means but from a review on the amazon page: This is NOT a graphics card, it is a graphics accelerator
like it's kinda special hardware and it doesn't function as your normal gaming gpu for example
^This makes sense
YB, may I ask why do you prefer buying your own GPU instead of buying compute time?
Oh okay
No particular reason but I'd like running things locally, idk why.
I see
so i'd need a GPU to go with this?
no, but I think you need a power converter or something
I'm not too knowledgeable either and you should probably research it just in case
I think I know a server with people who are familiar with this kinda stuff.
you're definitely better off asking them about it then 😅
and gl
why have you linked this?
just thought it was interesting
Is this related to the paper about how NNs don't have separate mental representations that you linked before?
yeah, last part of the paper, they say that perceptions are the basis of concepts
this is less related, but really crazy https://en.wikipedia.org/wiki/Ideasthesia
(that the trigger of some perceptions or the why is semantic, as its believed in the case of synesthesia.)
That's interesting, I wonder what a "perception" would be for an NN
I've heard some people say that one of the major things that prevent these models from being closer to human performance is that they don't "learn on the fly", so to speak. And humans do, you learn something from every perception
But I'm not sure how human brains do this, is this just because biological neurons are extremely different from artificial "neurons" or is there something else
guys, what is the best environment to train and work on a chatbot?
yes, that's discussed in the paper, but they just describe it more or less like you did, there isn't a clear solution
A Linux machine with a large GPU.
> that they don't "learn on the fly", so to speak.
i think some are trying related stuff to solve the arc problem (by chollet; he offers 1M prize.)
not sure though, i vaguely remember.
i dont have a large gpu
so clearly i need a cloud env or smtg like this
Yes
what do u suggest
aws, if you want free then use kaggle or colab
okay so right one is predicted mask , but it's not that accurate
thanks
Use cv2.Canny for it
for what? predicted image>?
and why?
It has better edge detection
heh? bruhh I am using U-Net model to train the images, so why to put canny here ?
paperspace is really nice (mostly paid, but there are certain free things as well)
have you tried paperspace? they're pretty nice
currently on aws! on T4 gpu
Isn't this a proof that in NNs the inputs can be considered random variables ? https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables#In_machine_learning
but what random variable??
it could only be bounded within datasets
paperspace is also nice though
the random variable is the set of pixels (for images), each time for example.
each time you withdraw an image it comes from the same distribution
i.e the training set
so it is iid imho
is every pixel fully independent?
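A small sketch of the distinction being discussed: i.i.d. is usually an assumption about whole examples (each image drawn independently from the same distribution), and sampling minibatch indices uniformly with replacement from the training set makes each draw i.i.d. from the empirical distribution. It says nothing about pixels within one image, which are typically strongly correlated. The random-walk "image" below is synthetic stand-in data, and both helper names are hypothetical.

```python
import numpy as np


def iid_minibatch(n_examples: int, batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Indices drawn independently and uniformly (with replacement) from the training set."""
    return rng.integers(0, n_examples, size=batch_size)


def neighbour_correlation(img: np.ndarray) -> float:
    """Pearson correlation between each pixel and its right-hand neighbour."""
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])


rng = np.random.default_rng(0)
batch_idx = iid_minibatch(n_examples=50_000, batch_size=32, rng=rng)
# Synthetic "smooth image": a random walk along each row, standing in for a natural image.
smooth = np.cumsum(rng.standard_normal((64, 64)), axis=1)
print(neighbour_correlation(smooth))  # close to 1: neighbouring pixels are far from independent
```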