#data-science-and-ml
what you showed should be a way to do it
>>> t.tensor(0.4)
tensor(0.4000)
like how much total sales of each genre is
i wanna add global column of every row which has genre as 'Action'
do you have a sales field?
yes
see this
Global is total sales
over all regions
i wanna add global sales of all the games which fall under the genre 'Action'
oh wait we have been equating wrong column all this time
this will actually work lol
working
checks out too
thankss
136.85 Billion Dollars
nice:))
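For reference, a minimal sketch of the genre total discussed above. The column names `Genre` and `Global` come from the DataFrame printed later in this log; the three rows are made-up sample data, not the real dataset.

```python
import pandas as pd

# Made-up rows; only the column names match the real dataset.
df = pd.DataFrame({
    "Game": ["A", "B", "C"],
    "Genre": ["Action", "Sports", "Action"],
    "Global": [10.5, 7.0, 4.5],
})

# Total Global sales for one genre: boolean mask on Genre, then sum the Global column.
action_total = df.loc[df["Genre"] == "Action", "Global"].sum()

# Total Global sales for every genre at once.
totals_by_genre = df.groupby("Genre")["Global"].sum()

print(action_total)
print(totals_by_genre)
```

Comparing against the wrong column in the mask (the mistake mentioned above) would silently give an empty selection and a sum of 0.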
Hm?
@blazing vale please don't call out specific people. ask a complete question that anyone who knows the answer could start answering.
Ahh okay
Yoo
Line 167,166,128
Can you tell me whats the issue with em?
you have to say what they currently do, and how it's different from what you want them to do.
The output is at the end of the code
Its giving me blank
No output for the condition on line 167
Sir nedbat told me there is smthg wrong with line 128. But i cant figure it out
I have made it so complex i can’t figure out myself what wrong i am doing 💀
i asked chatgpt. lol it figured out easily. i am so dumbbb
it was a minor mistake
what hugging face training data will make anime or cartoon-like art? I have dreamlike-anime but it is often not accurate. I am familiar with safetensors
has someone played with the new version of midjourney?
By training data, do you mean any other diffusion-based model hosted on HF? Most of these SD models, including dreamlike-anime, were pretrained on very large private datasets. You will hardly find any data that can be used to train a similarly performing model.
Would anyone be interested in doing a deep learning project with me it will be a self dataset building ai dm me if interested 😁
anyone here ever deal with interactive graph visualization? e.g., the ability to zoom in or zoom out, or obtain info by hovering over a node
I found an article the other day that described different methodologies of how output data could be organized, and now I can't find that article. Perhaps you all can help me figure out the appropriate keywords so I can find it again.
The two concepts were one that involved many rows, as opposed to organizing output data into fewer rows, more columns. It's probably easier to show a couple of output tables in csv format:
```
1,Pressure,50
2,Pressure,60
3,Pressure,70
1,Flow,10
2,Flow,20
3,Flow,30
```
As opposed to:
```
Scenario,Pressure,Flow
1,50,10
2,60,20
3,70,30
```
I'm trying to figure out which is "better", what the pros/cons are for organizing data in either method, and so forth. I know the answer to this is "it depends" but unless I know what terms to use to search on the internet I won't be able to learn anything about it.
Please let me know if this doesn't make sense and I'll do my best to clarify things more. Also, please tag me if you respond so I'll get an alert. Thank you!
i might learn javascript just for the libraries available for graph visualization
Wide and narrow (sometimes un-stacked and stacked, or wide and tall) are terms used to describe two different presentations for tabular data.
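To make the two layouts concrete, here is a small pandas sketch that converts between them. The wide table is the one from the question; the narrow column names `Variable` and `Value` are assumptions, since the header of the first table was not shown.

```python
import pandas as pd

# The "wide" table from the question: one row per scenario, one column per variable.
wide = pd.DataFrame({
    "Scenario": [1, 2, 3],
    "Pressure": [50, 60, 70],
    "Flow": [10, 20, 30],
})

# Wide -> narrow (long): one row per (scenario, variable) pair.
# "Variable" and "Value" are assumed column names.
narrow = wide.melt(id_vars="Scenario", var_name="Variable", value_name="Value")

# Narrow -> wide again.
wide_again = (
    narrow.pivot(index="Scenario", columns="Variable", values="Value")
    .reset_index()
)

print(narrow)
print(wide_again)
```

Useful search terms are "wide vs long format", "tidy data", and "melt / pivot".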
We are trying to analyse this huge dataset, which is supposed to predict the number of days for the treatment of a patient.
We have tried multiple models like GradientBoost, CatBoost, LinearRegression, LGBM, XGB etc.
We have also done feature selection including Mutual Info Feature Selection, Correlation, Variance Threshold. And other stuff like Normalization as well as Outlier Detection.
We have had the best results with Gradient Boost Regressor. We have done some hypertuning via GridSearchCV.
What is our next best step to reduce Root Mean Sq. error
The dataset looks like this 👇
You actually have to look at the data imo
Imo it's something you can't do with summary statistics etc. I'd try and understand what you're working with first
This will lead you to feature engineering etc.
https://github.com/magjac/d3-graphviz maybe this? I have used it for visualization purposes. Pretty cool. Also allows render/zoom features.
Regularization is also an option you can pursue if you have a lot of (useless) variables. In my experience Extra trees works well if you have many correlated predictors as well.
yeah, currently investigating non-python alternatives at the moment
literally nothing for interactive networks in python atm
am investigating whether the Gephi Toolkit is worthwhile and more efficient than javascript approaches; the toolkit uses OpenGL
```py
def maxs2(genre, region):
    print(df[(df['Genre'] == genre) & (df[region] == df[region].max())])
```
returning empty dataframe
just like my brain
max doesn't return a mask
but i used == before it. wont it check?
I think pyvis is also good for visualisation if you are using notebooks/ python. I can't recall if it support interactive visuals or not.
Maximum and Minimum Sales made by each Genre across each Region and ROW Sales
trying to do this now
oh I see my bad I think it should be fine
it returns empty dataframe 😭
Enter genre: Action
Enter Region: North America
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []
output
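One likely reason for the empty result: `df[region].max()` is the maximum over the whole column, so the combined mask only matches if the chosen genre happens to hold the overall maximum. A minimal sketch of filtering by genre first, using made-up rows and a hypothetical helper name `max_sales_rows`:

```python
import pandas as pd

# Made-up rows; only the column names mirror the real dataset.
df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Action", "Sports", "Sports"],
    "North America": [3.0, 5.0, 9.0, 1.0],
})

def max_sales_rows(df, genre, region):
    """Return the rows of `genre` whose `region` sales equal that genre's maximum."""
    subset = df[df["Genre"] == genre]
    return subset[subset[region] == subset[region].max()]

# Original approach: the column-wide max (9.0) belongs to a Sports game,
# so no Action row matches and the result is empty.
original = df[(df["Genre"] == "Action") & (df["North America"] == df["North America"].max())]

best_action = max_sales_rows(df, "Action", "North America")
print(original.empty)
print(best_action)
```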
had a look at vis.js, also seems like a cool library
If the end goal is reducing rmse, maybe try out strategies like cv based ensemble if there are no constraints on time. If samples are less, maybe seed averaging for more robust predictions, more feature engineering?
Why does everyone use gradient descent for backpropagation or determining weights? There are so many other faster ways: simulated annealing, batched quasi-Newton methods, genetic algorithms, random weights plus some adjusting, and other metaheuristics. Is it because it is easier, has more papers on it, is easier to scale and more stable? I feel like to progress even more in ML, people need to transition beyond normal back-propagation and to a new architecture.
Could you elaborate on CV based ensemble? (no, there are no constraints on time)
why does BERT base uncased need 12+ GB GPU memory to train
when using a standard AdamW optimizer, doesn't it represent each parameter as 8 bytes
so 110 Million Parameters x 8 bytes, 880 MB, + 1x for gradient + 1x for optimizer state
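A rough version of that arithmetic, as a sketch only. It assumes float32 values (4 bytes each, not 8), about 110 million parameters, and that AdamW keeps two moving averages per parameter; none of these numbers were measured on the setup in question.

```python
# Back-of-the-envelope estimate; every number here is an assumption, not a measurement.
PARAMS = 110_000_000      # approximate parameter count of BERT base
BYTES_FP32 = 4            # one float32 value

weights = PARAMS * BYTES_FP32
gradients = PARAMS * BYTES_FP32
adam_state = 2 * PARAMS * BYTES_FP32   # AdamW keeps two moving averages per parameter

static_bytes = weights + gradients + adam_state
static_gb = static_bytes / 1e9
print(f"weights + gradients + optimizer state: {static_gb:.2f} GB")
```

Under these assumptions the fixed cost is under 2 GB, so most of the remaining usage is probably activations kept for the backward pass, which grow with batch size and sequence length, plus whatever the CUDA caching allocator is holding on to.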
@valid wind show your training code
and in particular, when you move the training data onto the gpu
yeah let me show it
tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
train_dataset = datasets.Dataset.from_pandas(train_df, preserve_index=False)

def tokenize_texts(texts):
    global tokenizer
    q1rows = texts['question1']
    q2rows = texts['question2']
    return tokenizer(q1rows, q2rows, truncation=True)

tokenized_data = train_dataset.map(tokenize_texts, batched=True)
cast_features = tokenized_data.features.copy()
cast_features['is_duplicate'] = ClassLabel(num_classes=2, names=['not_duplicate', 'duplicate'], names_file=None, id=None)
tokenized_data = tokenized_data.remove_columns(['question1', 'question2', 'id', 'qid1', 'qid2'])
tokenized_data = tokenized_data.rename_column('is_duplicate', 'labels')
tokenized_data = tokenized_data.train_test_split(test_size=0.2)
training_args = TrainingArguments(
    "./quora-bert",
    evaluation_strategy="epoch",
    save_strategy='no',
    report_to='none',
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
please edit to include a py on the first line with the backticks
```py
code
```
ty
sorry I forgot to add that
@valid wind
you don't need global to read the global scope. only to write to it. so global tokenizer is unnecessary.
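A tiny illustration of that rule, using made-up names (`label`, `counter`) rather than anything from the training script:

```python
counter = 0
label = "tokenizer"

def read_only():
    # Reading a module-level name needs no declaration.
    return label.upper()

def increment():
    # Rebinding a module-level name does need `global`.
    global counter
    counter += 1

increment()
print(read_only(), counter)
```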
try lowering the batch size and see if you don't run out of GPU memory.
no, I don't run out of GPU memory training this on a 16 GB card. however, when I run my own training loop, even with batch size 32, it only requires around 12 GB of memory
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
looks like your paste messed up the indentation.
yep, fixing it now
but it might be just because I'm clearing my cuda cache
@serene scaffold https://paste.pythondiscord.com/DIVA
this should be better
gotta do a work thing. I should be back in like ten minutes
np, thanks for helping me
# Run the forward pass of the model
logits = model(
    input_ids=batch['ids'].to(device, dtype=torch.long),
    attn_mask=batch['mask'].to(device, dtype=torch.long),
    pred_indicator=batch['pred'].to(device, dtype=torch.long),
)
loss = loss_function(logits.transpose(2, 1), batch['targets'].to(device, dtype=torch.long))
@valid wind this potentially saves memory because it never creates extra references to all those cuda tensors in that scope
you also have a syntax error on line 33
that makes sense
and yeah the code runs, I must have done something while fixing the indentation
yeah it was an error while pasting it or something
that's not in the original code
can you give a minimal example of your data? are you looking for the row with the max sales for the given genre, in the given region?
normally i'd recommend just 3 columns here ("long" format): genre, region, and sales. instead it looks like you have a separate column for sales in every region ("wide" format).
if you set up your data with indexes, you can do this as simply as df.loc[(genre, region), "sales"].max() or similar
but that requires setting up your data correctly, which requires you to share information about how you constructed this data
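A sketch of what that long-format lookup could look like. The table, its lowercase column names, and the sales figures are all hypothetical; the real data would first have to be reshaped from its per-region columns.

```python
import pandas as pd

# Hypothetical long-format table: one row per (game, region) pair.
long_df = pd.DataFrame({
    "game": ["A", "A", "B", "B", "C", "C"],
    "genre": ["Action", "Action", "Action", "Action", "Sports", "Sports"],
    "region": ["Europe", "Japan", "Europe", "Japan", "Europe", "Japan"],
    "sales": [3.0, 1.0, 5.0, 2.0, 9.0, 4.0],
})

# Index by genre and region; sorting keeps label lookups fast.
indexed = long_df.set_index(["genre", "region"]).sort_index()

# Highest sales figure for one genre in one region.
top_sales = indexed.loc[("Action", "Europe"), "sales"].max()

# Same question for every (genre, region) pair at once.
top_by_group = long_df.groupby(["genre", "region"])["sales"].max()

print(top_sales)
print(top_by_group)
```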
i second what zestar said. all the fancy machine learning algo stuff comes after you've done a thorough inspection and analysis of the data. you will also want to form a coherent understanding of how the data was collected and how these measurements were obtained. otherwise you're just flailing around, which doesn't actually work in most cases, despite the breathless hype of companies that want to sell you machine learning APIs, cloud compute, etc.
how could I programmatically control the zooming/centering of a pyplot graph while it is actually being rendered? in a sort of REPL fashion
@long canopy you mean like control these things programmatically instead of in the gui? https://matplotlib.org/stable/users/explain/figure/interactive.html
ah! this is it, thank you
thanks for the reply. its actually my dumb mistake. i checked the whole dataset in excel and found out there were no suitable matches for the info i was giving in input lol
So basically i am working on this big dataset which has 826 rows and 9 columns. i made a small search system using pandas to access rows according to user input. However, suppose there are no matches. How can i make it so that, instead of returning the output below, it just prints "No results found"?
```
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []
```
anyone?
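One way to do this is to check the `empty` attribute of the filtered DataFrame before printing. A minimal sketch with made-up rows and a hypothetical `search` helper:

```python
import pandas as pd

# Made-up rows standing in for the real 826-row dataset.
df = pd.DataFrame({
    "Game": ["A", "B"],
    "Genre": ["Action", "Sports"],
    "Global": [10.5, 7.0],
})

def search(df, genre):
    """Return a printable result, or a friendly message when nothing matches."""
    result = df[df["Genre"] == genre]
    if result.empty:
        return "No results found"
    return result.to_string(index=False)

print(search(df, "Action"))
print(search(df, "Puzzle"))
```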
Scraping google most certainly violates their terms of use (something we can't help with). Have you considered one of their APIs? Or open street map?
Note that Google maps specifically prohibits caching or storing their outputs for use in your own database
People do it all the time of course, but as per server rules we officially are not allowed to help with anything resembling that
what kind of data specifically? maybe open street map / nominatim or geonames can help
i mean, if you're just looking to practice, you can probably use google maps, yelp, foursquare, bing, facebook, etc
openstreetmap has some of that but it depends on volunteer input
well, writing a python app to fetch data from an api is good practice. not related to the channel topic, but good practice all the same
I am assuming you have a defined split for cross validation. In that case, you can try blending predictions from multiple experiments. You can also do a weighted ensemble, w1*pred1 + w2*pred2 + ... + wn*predn; these weights could be tuned with a search tool, e.g. Optuna. Ensembling becomes more effective when you have diverse predictions (low correlation).
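A small sketch of the weighted blend with two models and a plain grid over the weight. The targets and predictions are synthetic stand-ins; in practice the weight would be tuned on out-of-fold predictions from the cross-validation split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up validation targets and two imperfect models' predictions.
y_val = rng.normal(size=200)
pred_a = y_val + rng.normal(scale=0.5, size=200)
pred_b = y_val + rng.normal(scale=0.5, size=200)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Grid over the blend weight w: blend = w * pred_a + (1 - w) * pred_b.
weights = np.linspace(0.0, 1.0, 21)
scores = [rmse(y_val, w * pred_a + (1 - w) * pred_b) for w in weights]
best_w = float(weights[int(np.argmin(scores))])
best_rmse = min(scores)

print(best_w, best_rmse, rmse(y_val, pred_a), rmse(y_val, pred_b))
```

Because the grid includes w = 0 and w = 1, the best blend can never score worse on this validation set than either model alone; the gain shrinks as the two sets of errors become more correlated.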
i have an np ndarray B of shape (N, K, K), and a of shape (N,)
i want to find the weighted sum of the N KxK matrices in B, where a stores the weights
np.dot(B.T, a).T seems to work on its own but if i decorate the function its in with numba.njit i get this error
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function dot>) found for signature:
>>> dot(array(float64, 3d, F), array(float64, 1d, C))
im guessing its because B.T becomes fortran style because of the transpose, can i make np.dot work without the transpose or do i need to write a loop myself
minimum repro
import numpy as np
from numba import njit

@njit # works if this is removed
def f():
    a = np.ones((10,))
    B = np.ones((10, 30, 30))
    return np.dot(B.T, a).T

print(f())
I think you will need to write an explicit loop for performing the B transpose; numba should be able to optimize that. Quite straightforward.
sad
was np.einsum("i,ijk->jk", a, B) too slow?
looks like numba doesnt support einsum
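Two ways to write the weighted sum without a transposed 3-D operand, checked here against `einsum` in plain NumPy. Whether a given Numba version compiles them under `njit` is an assumption that was not tested here; the explicit loop is the more conservative candidate.

```python
import numpy as np

def weighted_sum_loop(B, a):
    """Sum a[n] * B[n] over n with an explicit loop."""
    out = np.zeros(B.shape[1:], dtype=B.dtype)
    for n in range(B.shape[0]):
        out += a[n] * B[n]
    return out

def weighted_sum_reshape(B, a):
    """Same result as one 1-D by 2-D dot product on a flattened view of B."""
    N, K1, K2 = B.shape
    return np.dot(a, B.reshape(N, K1 * K2)).reshape(K1, K2)

rng = np.random.default_rng(0)
a = rng.random(10)
B = rng.random((10, 30, 30))

expected = np.einsum("i,ijk->jk", a, B)
print(np.allclose(weighted_sum_loop(B, a), expected))
print(np.allclose(weighted_sum_reshape(B, a), expected))
```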
I have a dataset. For example, it contains age, money, job, salary etc. How can i give points per user, between 0 and 1? Unfortunately i dont have target data
hi, is anyone aware of a pre-implemented fast way to have numpy materialize an element's path, if one array column holds the parent indexes
aka given something like
tree_array = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9]
when asking for the path of index 9, returning [4, 1, 0], and when asking for 19 it's [9, 4, 1, 0]
its an easy python loop but i'm working with about a dozen million elements and i would like to have a pipeline of extract path, get indexes of another array based on the path and return that - i cant find a premade implementation of that type of tree walking
Do any of you guys know how to setup a local dbpedia triple store?
I'm not sure that I fully understand the desired behavior, but if it's easy to solve with a python loop, is there a reason why numba isn't viable?
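For reference, the plain Python loop in question might look like the sketch below. It assumes the root is the node that is its own parent (index 0 in the posted array), and `path_to_root` is a made-up name. With the array as posted, the walk from 19 passes through 9, 4, 1 and ends at 0.

```python
def path_to_root(parents, index):
    """Follow parent pointers from `index` until reaching a node that is its own parent."""
    path = []
    node = index
    while parents[node] != node:
        node = parents[node]
        path.append(node)
    return path

tree_array = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9]

print(path_to_root(tree_array, 9))
print(path_to_root(tree_array, 19))
```

A loop of this shape uses only integer indexing, which is the kind of code a JIT compiler such as numba is designed for.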
Take a look at the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization
looks like theres a standardize keyword argument that you can use to change that behavior.
A preprocessing layer which maps text features to integer sequences.
Optional specification for standardization to apply to the input text. Values can be:
None: No standardization.
"lower_and_strip_punctuation": Text will be lowercased and all punctuation removed.
"lower": Text will be lowercased.
"strip_punctuation": All punctuation will be removed.
Callable: Inputs will be passed to the callable function, which should be standardized and returned.
Keras TextVectorization does have a parameter named standardize. By default it's set to standardize='lower_and_strip_punctuation'
You can set it to None to keep the same case
Sorry missed Stelercus response.
By the mouth of two shall every word be established.
im looking for a preexisting optimized tree walker,
sorry that I don't have the answer
I really like pytest, though
class NanoGPT(nn.Module):
    def __init__(self, params):
        super(NanoGPT, self).__init__()
        self.sequence_encoder = SequenceEncoder(params)
        self.transformer_1 = Transformer(params)
        self.transformer_2 = Transformer(params)
        self.transformer_3 = Transformer(params)
        self.norm = LayerNormalization(params)
        self.lm_weights = RandParameter(params.coordinates, params.tokens)

    def forward(self, sequence):
        sentence = self.sequence_encoder(sequence)
        sentence = self.transformer_1(sentence)
        sentence = self.transformer_2(sentence)
        sentence = self.transformer_3(sentence)
        sentence = self.norm(sentence)
        sentence = sentence @ self.lm_weights
        # last bit ...
        return sentence
I'm almost done !! only the logits thing left
after this im gonna teach it to sort letters
class NanoGPT(nn.Module):
    def __init__(self, params):
        super(NanoGPT, self).__init__()
        self.sequence_encoder = SequenceEncoder(params)
        self.transformer_1 = Transformer(params)
        self.transformer_2 = Transformer(params)
        self.transformer_3 = Transformer(params)
        self.norm = LayerNormalization(params)
        self.lm_weights = RandParameter(params.coordinates, params.tokens)

    def forward(self, sequence):
        sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
        sentence = self.transformer_1(sentence)
        sentence = self.transformer_2(sentence)
        sentence = self.transformer_3(sentence)
        sentence = self.norm(sentence)
        logits = sentence @ self.lm_weights
        max_values, max_indices = logits.max(dim=2)
        shifted = logits - max_values.unsqueeze(2)
        exponentiated = torch.exp(shifted)
        return torch.sum(exponentiated, dim = 2)
aight imma train it now
class NanoGPT(nn.Module):
    def __init__(self, params):
        super(NanoGPT, self).__init__()
        self.sequence_encoder = SequenceEncoder(params)
        self.transformer_1 = Transformer(params)
        self.transformer_2 = Transformer(params)
        self.transformer_3 = Transformer(params)
        self.norm = LayerNormalization(params)
        self.lm_weights = RandParameter(params.coordinates, params.tokens)

    def forward(self, sequence):
        sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
        sentence = self.transformer_1(sentence)
        sentence = self.transformer_2(sentence)
        sentence = self.transformer_3(sentence)
        sentence = self.norm(sentence)
        logits = sentence @ self.lm_weights
        max_values, max_indices = logits.max(dim=2)
        shifted = logits - max_values.unsqueeze(2)
        exponentiated = torch.exp(shifted)
        probs = exponentiated / torch.sum(exponentiated, dim = 2).unsqueeze(2)
        return probs

params = ModelParameters(
    # The dimension of a vector embedding
    coordinates = 3*1000,
    # The number of tokens in the vocabulary
    tokens = 3,
    # The maximum number of words in a sentence (context window)
    words = 10,
)

def generate_data(batches = 2000):
    for _ in range(30):
        sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
        sorted_matrix, sorted_indices = torch.sort(sequence, dim=1) # Sort along columns
        encoding = torch.eye(params.tokens).to("cuda")
        yield sequence, encoding[sorted_matrix]

nanoGPT = NanoGPT(params).to("cuda")
optimizer = torch.optim.Adam(nanoGPT.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
torch.autograd.set_detect_anomaly(True)

for epoch in range(100):
    nanoGPT.train()
    for batch, targets in generate_data():
        optimizer.zero_grad()
        inputs = batch.to("cuda")
        outputs = nanoGPT(inputs)
        loss = loss_function(outputs, targets.to("cuda"))
        loss.backward()
        optimizer.step()
    print(loss)
is not learning very well 🙃
i got it i got it, wasn't actually creating batches:
sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
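The hand-written softmax in `forward` above subtracts the per-row maximum before exponentiating. A NumPy sketch of the same trick, using made-up logits, shows that the rows sum to 1 and that very large logits stay finite:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Softmax with the max-subtraction trick used in forward() above."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exponentiated = np.exp(shifted)
    return exponentiated / exponentiated.sum(axis=axis, keepdims=True)

# Shape (batches, words, tokens), mirroring the dim=2 reductions in the model.
logits = np.array([[[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]]])
probs = softmax(logits, axis=2)

print(probs.sum(axis=2))   # every row sums to 1
print(probs[0, 1])         # large logits stay finite thanks to the shift
```

Note that PyTorch's `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so passing it the probabilities returned by `forward` applies softmax twice; that may also have contributed to the slow learning.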
Hello, I just got into AI & genetic learning, so I made this messy rushed prototype for testing & getting a hold of the basics, but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it. Could someone take a look & point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ
Hey , why does pytorch nn conv2d model keep telling me too many values to unpack, i changed to code to accept 4 channels , but it still says expected two channels
can you send a code snippet and mark the line where it gives you this error (mark it with a comment)? (I don't have pytorch currently installed, but I worked with it a bit)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01)
```py
import torch
import numpy as np

# Define the custom loss function for complex numbers
def custom_complex_loss(input, target):
    input_magnitude = torch.abs(input)
    target_magnitude = torch.abs(target)
    loss = torch.mean((target_magnitude - input_magnitude) ** 2)
    return loss

def train_model(early_stopping=True, epochs=50):
    epoch_train_losses = []
    epoch_test_losses = []
    current_best_loss = np.inf
    early_stopping_counter = 0
    early_stopping_patience = 2
    for epoch in range(epochs):
        for i, batch in enumerate(train_loader, 0):
            real_inputs, imag_inputs, real_labels, imag_labels = batch
            inputs = torch.complex(real_inputs, imag_inputs)
            labels = torch.complex(real_labels, imag_labels)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = custom_complex_loss(outputs, labels)
            loss.backward()
            optimizer.step()
```
```
Cell In[97], line 51
     48             break
     50 # Assuming net, train_loader, test_loader, optimizer, and device are defined
---> 51 train_model(early_stopping=True, epochs=50)
Cell In[97], line 21, in train_model(early_stopping, epochs)
     19 for epoch in range(epochs):
     20     for i, batch in enumerate(train_loader, 0):
---> 21         real_inputs, imag_inputs, real_labels, imag_labels = batch
     22         inputs = torch.complex(real_inputs, imag_inputs)
     23         labels = torch.complex(real_labels, imag_labels)
ValueError: too many values to unpack (expected 4)
```
torch.Size([4, 1, 500, 500]) also my data has been fourier transformed. It should be 4 why does it say it expects 2, maybe somewhere it should have 4 inputs but it has two. But i changed to four but it didnt understand it so i did torch complex
ok, so you can try to do print(type(batch)) and print(len(batch))
then print(type(batch[0])) and len(batch[0])
maybe batch contains some iterables like list or tuple or something
you need to see what your train_loader does to your data and see the shape of its outputs or something
@feral kernel basically the whole process of debugging this is to go to the source of your data (in your case, batch) and see how it is processed and why is it the shape it is
do some detective work and like go back on the stages of your data flow, see what happened at every stage with a print at its end - that's what I do to identify problems and their causes when I debug my code
I did that already, still the same error. i know i need to change the batch so 4 dimensions will fit but i get another error.. chatgpt is not helping at all, same error over and over. Went to the source of the error, tried to change the inputs, but for some reason still the same error
I already printed the batch and the shape, i asked gpt to change the shape of the data to fit but for some reason it can't do it
The error message "too many values to unpack" happens when the value on the right has more elements than there are names on the left to receive them.
so your train_loader doesn't do what you would want it to do
so rather than reshaping the batch, see what your train_loader does
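A torch-free sketch of what is probably going on: if the dataset returns a single item per index (no labels), each batch is just a stack of samples, and unpacking it into four names iterates over the samples. Plain lists stand in for tensors here, and `make_batches` is a hypothetical stand-in for `DataLoader`.

```python
# Plain lists stand in for tensors; the "dataset" returns one item per index, no labels.
dataset = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

def make_batches(items, batch_size):
    """Hypothetical stand-in for DataLoader: yields slices of `items`."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

batch = next(make_batches(dataset, batch_size=5))

try:
    real_inputs, imag_inputs, real_labels, imag_labels = batch  # 5 samples, 4 names
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(batch), len(batch))
print(error_message)
```

For the four-way unpack to work, the dataset's `__getitem__` would have to return a tuple of four things per sample, so that each batch arrives as four stacked groups.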
I know that, i dont understand why cant gpt change it so the batch loader to accept dimension of 4. I need manually do it lol
coz gpt is a funny guy, it's generative - i.e. creative
creativity is not always accuracy so yeah if it gets stuck in the same idea of bad code, it's over
is the train_loader a function or something?
and is it something from a library or custom defined?
It is not very creative , it just regurgitates and mixes info
It is the input after it has been resized. It is just some cfloat tensors
?
that's the train_loader?
Im not home, i need to check the code… the train_loader process and loads the tensor
train_dataset = CustomTensorDataset(root_dir=train_root, transform=resize_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
see what the DataLoader does
maybe that's the issue
so basically the train_loader variable is a list or something
a list of lists or tuples with 2 elements
and you want it to be of 4 elements
where is the DataLoader function from? some pytorch component?
Im not home, but i have some histories loaded.
import os
import torch
from torch.utils.data import Dataset

class CustomTensorDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.file_names = [f for f in os.listdir(root_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        file_path = os.path.join(self.root_dir, self.file_names[idx])
        tensor = torch.load(file_path)
        # Ensure the tensor has a channel dimension
        tensor = tensor.unsqueeze(0) if tensor.dim() == 2 else tensor
        if self.transform:
            tensor = self.transform(tensor)
        return tensor
I'm confused, I don't see DataLoader written here
It should be some pytorch function or a function gpt and i defined from before
that was the relevant part for the array you wanted to unpack
I know i cant find where it is defined in my history. i need to go home and find it on my laptop.
is it this thing?
from torch.utils.data import DataLoader
train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
Yes a pytorch library
ok, so it doesn't seem to affect the shape of your input data, it just "slices" it
Thanks a lot
so just to be sure, print the shape of one element from train_dataset and train_loader
and if the DataLoader doesn't mess with the shape, then your problem stems from the train_dataset, maybe your processing does not give you some shape 4 data or so
wait, I think training data might be the inputs only. so you forgot the labels lol
I overlooked that
so yeah, it's normal to unpack 2 values only.
anyway, see what your training data looks like - if it has both inputs and labels or whatever
I did print my train_loader. The labels variable is the same data as input.
I defined labels, labels = torch.complex(real_labels, imag_labels)
Lol, gpt writes undefined code , then i have to correct it. I think i might need to define real labels , it defined imag_labels = batch
any idea why there is a faint blue line in the background that vaguely resembles the dark blue line? what does it mean and how can i remove it? Graphed with seaborn using lineplot with only the training loss (val loss is not present)
call the function 20k times and store the results
it was working but when I reopened colab it stopped working
Maybe range is the array
Great
Don't use names of built-in functions and classes for arrays
Or anything else
i haven't done that
You probably did and deleted the code where you did
But with notebooks, deleting code that you already ran doesn't undo it
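A minimal sketch of how this can happen. A plain dict stands in for the notebook's global namespace, which is an assumption made for illustration; in a real notebook that state lives in the kernel.

```python
namespace = {}  # stands in for the notebook's global namespace

# "Cell 1", later deleted from the notebook, rebinds the built-in name.
exec("range = [0, 1, 2]", namespace)

# "Cell 2" still sees the list, even though cell 1 is gone from the page.
try:
    exec("values = list(range(3))", namespace)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Removing the name (like `del range`, or restarting the kernel) exposes the built-in again.
del namespace["range"]
exec("values = list(range(3))", namespace)

print(error_message)
print(namespace["values"])
```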
well thanks for help, glad it works now
Yw
whatever i did
There is no other explanation
The chances of other possible causes are next to none.
are you sure the underlying data is actually getting changed? because it might just be that google sheets is displaying it for you as an excel-style sheet
What's the point of a bias in a network? Can't I just change the weights and get the exact same thing?
tensor([[0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
[0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
[0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2],
[0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
[0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]], device='cuda:0')
that's coming from the gpt I trained, it sorts arrays
I'm really getting annoyed by coding, I have spent 80-90 hours trying to get this data to run on this custom neural network, it is not working. It has errors one after another... It is insane, i'm still on downsampling, i haven't even started training and tuning the weights and weight generators yet...Man this will take forever. Rewriting neural networks with a custom backpropagation and custom activation function and custom transformed dataset is hard. Man I need to learn more pytorch and python...
no. as an exercise, try to fit the line y=3x+1 using linear regression with no intercept
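The exercise can be tried in a few lines of NumPy. This sketch fits y = 3x + 1 once with only a slope and once with a slope plus a bias column; the sample points are arbitrary.

```python
import numpy as np

x = np.arange(0.0, 10.0)
y = 3 * x + 1

# No intercept: the design matrix has only the x column.
slope_only, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# With intercept: add a column of ones for the bias term.
design = np.column_stack([x, np.ones_like(x)])
with_bias, *_ = np.linalg.lstsq(design, y, rcond=None)

residual_no_bias = float(np.abs(x * slope_only[0] - y).max())
residual_bias = float(np.abs(design @ with_bias - y).max())

print(slope_only, residual_no_bias)
print(with_bias, residual_bias)
```

Without the bias the fitted line has to pass through the origin, so no choice of slope can reproduce the point (0, 1).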
@desert oar after a ton of tensor indices shenanigans (I used an optimization that merges Q, K, and V for the three heads into one matrix ), I managed to place it into the xMx.T form during inference time:
Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
input = tensor([0, 0, 1, 0, 0, 1, 1, 0, 2, 0, 1], device='cuda:0')
output = tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2], device='cuda:0')
the first three prints are from the three heads, to confirm that it is in fact doing it
this is also not symmetric, so it's actually not a correct interpretation, wonder what will happen if I force it to be symmetric and positive definite
how are you measuring the induced distance?
so you trained the normal way, then precomputed M for inference?
as a quick test, you should get the same outputs from both versions, given the same input
up to a few decimal places of course
M is just the product of two projections, right? does it need to be symmetric? it's not a correlation matrix
X M Xt should however be a correlation matrix
It's more of an induced dot product, and even then, it's still not correct because M doesn't satisfy the conditions to be a metric tensor.
I'm just taking the embeddings and putting them in x of xMx.T
Yes I checked that
The end result is a tensor of integers tho, didn't compare the rest of it
It doesn't need to be, but to maintain the nice picture of M being a metric tensor it needs to be symmetric and xMx.T needs to result in positive values
It might be super worth it to insist on making M a metric tensor somehow, cuz then we have access to a ton of math that exists out there to describe high dimensional non-Euclidean spaces
Like, if the network doesn't mind doing it, or even the performance doesn't drop in any way, this way would be a lot more interpretable
yeah but are they identical?
They are the same result
nice
BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')
AFTER M
Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')
copied the last print on accident ._.
in any case, the next step is to see if I can train it without Wk and Wq, directly using M, and then im gonna see if i can force M to be a metric
Yes, I'm hoping the network can induce a metric on the space of embeddings. If it does, we can start talking about the latent spaces in terms of distances and angles and areas and etc, which, at least for my brain, is a lot more intuitive to talk about.
well maybe my linear algebra is lacking here, but why should the composition of two projections be a metric tensor?
There's no reason, I'm just gonna do away with the projections and keep M as the only learnable parameter, and then force M to be metric, I might test out other constraints, but this one would be the more interesting to me
It might be too restrictive for the network though, but there's only one way to find out
i'm not familiar with metric tensors in general. that X M X.T has a nice interpretation in that case? aside from being a correlation matrix
or do the elements of M take on some specific interpretation?
I'm gonna see if I do some benchmarks of Wq, Wk vs M to see if I found an optimization, but I doubt it since I suspect that the researchers that came up with this started from xMx.T and then split it into Wq and Wk for optimization purposes
i'm not sure about that actually. my impression is that the QKV system was always meant to be the "soft lookup" mechanism
So the metric tensor defines a dot product:
dot(x, y) = xMy^T
If you batch it, the final matrix will just be a table of dot products
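A small sketch of that idea in NumPy. The matrix M below is built to be symmetric positive definite purely for illustration (it is not a trained parameter), and `dot_m` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4
A = rng.normal(size=(dim, dim))
M = A @ A.T + 1e-3 * np.eye(dim)   # symmetric positive definite by construction

def dot_m(x, y, M):
    """Dot product induced by M for two 1-D vectors."""
    return x @ M @ y

X = rng.normal(size=(3, dim))      # three vectors stacked as rows
table = X @ M @ X.T                # table[i, j] == dot_m(X[i], X[j], M)
```

With a symmetric M the table is symmetric too, and the diagonal holds each vector's squared length under M.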
would have to look into the literature from 2016 and earlier to see the history of the idea before the big attention is all you need paper
right, that's a pretty nice way to interpret the Q K.T operation
are you retraining nanogpt or something?
ah, was gonna attach the code but the python bot didnt let me
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])
print("AFTER M")
nanoGPT.finish()
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("input =", inputs[3])
print("output =", max_indices[3])
this compares the output
I'm gonna clean it up a bit and put it in its own repo because now there's gonna be a lot of experiments
I'm too curious about the metric thing, but it isn't really a priority
the actual thing I have to do now is to branch the model so that it can take in two inputs
also train it on a corpus of text like Shakespeare
and start thinking about how I'm gonna do the whole voice thing
the whole thing is gonna be inside a simple web app that im gonna code, it's gonna be like:
http/ws traffic <---> traefik service <-> go + htmx + tmpl <-> fastapi + model
all in compose
nice, there should be plenty of examples and prior art for that, again it's kind of what transformers were originally designed to do
i'm impressed at all the projects you've made time for, i barely have time to sit down and think through a project idea
being unemployed has its perks ig
hah, enjoy it
as long as you have the money
there was a period of time when i had no living expenses and no job, my biggest regret was not milking that for longer
being in a LCOL country and remoting to HCOL allows me to do these kinds of breaks to upskill
tho this is the first and last time tbh, im hoping to land an on site ML Eng job to get me both out of the country and out of the house >.>
fair enough. just remember to enjoy the moment of freedom!
these are also great projects for ML Eng
Hello I just got into AI & genetic learning so I made this messy rushed prototype for testing & getting a hold of the basics but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it, could someone take a look & point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ
Hello everyone!
ok so, i have a python assistant and i am adding a new feature to it.. called "tasks".. its like the to-do lists.. and i connected it to the database and all and i even managed it to get the task and inform me when its time.. i want it to run in the background when the another script is running foreground.. and check if the time matches. how do i make it run in the background and run another script foreground
Is there any reason why you're not doing the standard breadth first search?
can anyone recommend me a kaggle competition / dataset / something like that to get beginner experience with finetuning bert models? The task doesnt matter so much I guess, I want to get pytorch experience mainly (coming from knowing a lot of theory of transformers), but something in fashion would be nice, so related to rag or clustering
try fine-tuning a BERT model for named entity recognition or sequence classification.
(I am a computational linguist professionally--those are the first two things for which I fine-tuned BERT to learn pytorch.)
how does a transformer work for NER? do you have labeled regions in a token sequence and it emits a sequence like [null, null, ENTITY, ENTITY, null, ENTITY]?
you add another linear layer for the number of classes in the data, and each output is a tensor of shape (batch_size, sequence_length, class). For the sequence_length dimension, all sequences are right-padded with padding tokens to make them as long as the longest sequence in the batch. And then each element represents the probability that the token for that "row" belongs to the class for that "column". You need a "column" to represent the null class (the token is not an entity)
also, since BERT does subtokenization, you usually get better results if you label all but the first subtoken as null.
So if you tokenize "That is unbelievable" as ["That", "is", "un", "##believ", "##able"], and you're classifying parts of speech for pronouns and adjectives, the classifications would be [PRON, null, ADJ, null, null]
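That labelling rule can be sketched in plain Python. The function name and the input layout (one list of subtokens per word) are assumptions for illustration, not the API of any tokenizer library:

```python
def align_labels(word_labels, subtokens_per_word, null_label="null"):
    """Give each word's label to its first subtoken and null to the rest."""
    aligned = []
    for label, subtokens in zip(word_labels, subtokens_per_word):
        aligned.append(label)
        aligned.extend([null_label] * (len(subtokens) - 1))
    return aligned

subtokens_per_word = [["That"], ["is"], ["un", "##believ", "##able"]]
word_labels = ["PRON", "null", "ADJ"]
labels = align_labels(word_labels, subtokens_per_word)
print(labels)
```

The result has one label per subtoken, which is the length the model's output needs along the sequence dimension (before padding).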
@pure pond take note of that ^
makes sense. so the output for one sequence in the batch is a distribution of class weights at each token?
class 1 class 2 class 3
token1: ...
token2: ...
Indeed
one thing i've actually never been sure about for these models is how text generation works
or in this case, output generation
do you feed in a sequence one token at a time? like predict(seq[:1]), predict(seq[:2]), ...?
I've been needing to learn more about that since interactive LLMs became ascendant
i know bert isn't really meant for that, you just stick in a sequence and get a sequence of outputs, right?
for BERT, yes
you can actually fine-tune GPT-2 for anything you might want to fine-tune a BERT variant for, and the API is the same because of hugging face. but the inner workings presumably are not.
This is a notebook I wrote a few months ago that does both. https://github.com/center-for-threat-informed-defense/tram/blob/main/model-development/train_single_label.ipynb
interesting. afaik the main difference is that gpt-style models mask off future tokens while bert-style models don't. is that right?
the idea being that gpt-style models are meant to only scan backwards through the sequence, whereas bert-style models are meant to examine the sequence as a whole
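A rough sketch of that difference, using NumPy and random scores in place of a real model. `attention_weights` is a hypothetical helper; real implementations apply the same kind of mask inside each attention head:

```python
import numpy as np

def attention_weights(scores, causal):
    """Row-wise softmax over scores, optionally hiding future positions."""
    scores = scores.astype(float).copy()
    if causal:
        n = scores.shape[-1]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
        scores[future] = -np.inf                             # future tokens get zero weight
    scores -= scores.max(axis=-1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(2).normal(size=(5, 5))
bert_style = attention_weights(scores, causal=False)  # every token sees every token
gpt_style = attention_weights(scores, causal=True)    # token i sees tokens 0..i only
```

In the causal case the first token can only attend to itself, so its row is all weight on position 0.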
is this supposed to happen
I was trying to convert my dataframe to all strings
and when I check the dtype I get all objects
by default pandas encodes strings as plain python objects. "object" dtype means that your series is internally just a python list (in this case a list of strings, but the data could actually be any type)
i suggest pd.StringDtype() instead of str
i think you can also pass "string" as a shorthand
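A short sketch of that suggestion, with a made-up two-column dataframe. It assumes a pandas version recent enough to ship `StringDtype`:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a1", "b22"], "count": [1, 2]})

as_string = df.astype("string")       # "string" is shorthand for pd.StringDtype()
print(as_string.dtypes)

# The .str accessor covers most regex-style cleanup without leaving pandas.
cleaned = as_string["name"].str.replace(r"\d+", "", regex=True)
```

Non-string columns such as `count` are converted to their text form as part of the cast.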
!d pandas StringDtype
!d pandas.StringDtype
class pandas.StringDtype(storage=None)
Extension dtype for string data.
Warning
StringDtype is considered experimental. The implementation and parts of the API may change without warning.
Ok thanks, how did you set up that experiment though? So youre using a pretrained encoder from the sounds of it? Where did you get that from? And the dataset?
they say it's experimental, but it's been stable for a while now
And yeah thats right. Openais gpts are decoders, which means they dont pay attention to the future tokens in the sequence. It doesnt mean that they dont produce embeddings which I see a lot of people get confused about. Each iteration they'll take in the whole sequence, get embeddings from attention and mlp layers, then produce a prediction for the next token in the sequence. Then they repeat, looping over those steps, adding the new token to the end of the input sequence each time (to get at your other comment). When training decoders, you mask out all the future tokens, and just predict the first mask. Eg if you have a sentence of 9 tokens, you can split that up into 8 training examples, the first you have token 1 and predict token 2, the 2nd you have 1 and 2 and predict 3, and so on (up to predicting token 9). https://www.youtube.com/watch?v=kCc8FmEb1nY is a super great video, you can just watch the 30m if you dont care about the details of attention layers even
auto-regressive means it "consumes" the output for the next input. Thats how people usually say it, I dont like that phrasing though, because it sounds like the output is used up or something. So to me, auto-regressive is that the outputs get added to the input, as you repeatedly loop over the (growing) input
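Both halves of that (splitting a sequence into training examples, and looping at inference time) fit in a few lines of plain Python. The "model" here is a hypothetical stand-in function, not a trained network:

```python
def training_pairs(tokens):
    """Split one sequence into (context, next_token) examples."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def generate(next_token, prompt, steps):
    """Greedy auto-regressive loop: append each prediction to the input."""
    sequence = list(prompt)
    for _ in range(steps):
        sequence.append(next_token(sequence))  # the model sees the whole sequence so far
    return sequence

# Hypothetical stand-in for a trained model: predicts (last token + 1) mod 5.
toy_model = lambda seq: (seq[-1] + 1) % 5

pairs = training_pairs([10, 11, 12, 13])
generated = generate(toy_model, prompt=[3], steps=4)
```

A sequence of n tokens yields n - 1 pairs, because the first token has no context to be predicted from.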
This is a really great post from the other day about llm inferencing https://vgel.me/posts/faster-inference/
I ment what's the point
dont produce embeddings
by this you mean embeddings for the complete sequence, right?
well, thats right, if by complete sequence you mean for all time iterations, not just up to and including the current time iteration. But what I meant when I said that, was that some people hear encoder vs decoder, and then wonder how a decoder model can work when just given tokens as the input, because they think no encoder means no embeddings
idk about that specifically, but GPT is one-directional and BERT is bi-directional, yes.
i see, that makes sense
BERT models come with a tokenizer and you have to use that tokenizer.
gpt is bidirectional, but up to a cutoff token. Token 1 can attend to token 5, if both already exist
i think (actually im not sure anymore)
and my team created the dataset. but I wouldn't use it for what you're doing.
a tokenizer sure but thats not the weights of the bert itself
20354 kb
you'd be fine-tuning an existing BERT model
why do you keep posting uncropped pictures
Because I can
okay? it seems like spam.
yeah thats what I mean, where do you get the pretrained bert? Also, when training your ner classifier, did you freeze the bert and just train the head? Or no
I need help with my activation
you get the pretrained BERT from huggingface. and I'm pretty sure that training can modify all the weights, including the existing BERT weights and the linear output layer that you add.
My brain keep output the same thing
I have to go do Christmas stuff
everyone be good.
I think it has something to do with my activation function
ok ty for the help
Can you help me
So my output is the same every time
No that's not it
My output is always the same and I think it's the activation function
what output
so?
I don't want that
why not
It's not supposed to do it
theres many reasons why it could be doing that
Because it's supposed to selecting one thing
i dont wanna explain every possible thing
I thinks it's the activation function though
you mean every output class has the same value?
bro install discord on your machine
What?
Oh
The y is set to the last amount of inputs going into the hidden network
So I can get a range from lest output and set it between 0-1
And the inputs are randomized from the beginning
So they shouldn't be all the same
I know the weight
Is it because the weights aren't different
ok I have a suggestion, try changing the activation function to figure out what's going wrong. Change it to just add up all the input values it's receiving. And then change that to add up 2* each value. And make sure each change has the expected effect on your outputs. You should find a clue doing that
I'm scraping some data from a website and it's all legal as per their robots.txt but the way they have their site set up is that I have to access a page for every single record I want to view and in my case that's like 100,000 records. So does anyone know how slow I should make my program run to prevent them from thinking I'm trying to DDOS their site?
looks like it's still experimental because the module doesn't have it
what version of pandas do you have? it's been available for years...
also you can just use .astype like before
yep but can't do re.sub or tokenization since my dataframe isn't registered as a string
well my jupyter notebook app icon on windows has a red x next to it
which I've heard means that it's likely corrupted
maybe I should be more concerned with that
alright, I got some possibly interesting results, the left side is the correlation matrices (or dot product tables) of the model with the normal Q, K, V. The right side is the same, but for the model using only M where M is forced to be symmetric.
for example, head 0 metric 0, it's pretty clear that it is just a number 0 detector (0 is the first token)
while head 0 metric 1 is a number 2 detector
these are the actual tensors, left side is WqWk.T right side is M
I collapsed the model into a single self attention module and reduced the number of dimensions, these matrices are tables of actual distances between the vectors in the metric spaces that the network is creating
the first matrix essentially encodes the sort order between the numbers 0, 1, 2
not sure if this only happens in the metric tensor network, gonna try with the other one too just to be sure
this is from the normal Q, K, V network
yellow = larger distance
ok, so the yellow on the rightmost side is occurring because the matrix allows for negative numbers, the diagonals are always 0 because the vectors are subtracted from themselves
I see this as an absolute win because I can clearly interpret what is happening on the first image
while in the second image im not sure if I can do the same on any of the tables
Hello, can anyone give me a Datascience, AI/ML and DL roadmap. I wish to deploy my own model and shine my resume. I also wish to make some great research papers and do github stuff in ds mostly. I am very bad in python and this all. May I please get some roadmap/advice/resource I can follow?
I am a datascience student but wish to learn from scratch(didnt pay attention till now) and redo the math as well. I am asking for any solid advice. I am interested in NLP, video generation etc
I wish to become proficient in all the AI/ML related things
see the first pinned message
What is the difference between df.loc['column'] and df.loc[df['column']]
both of these are pretty weird in my opinion - df.loc[thing] is for getting a row with key thing, not a column. and hence df.loc[df['column']] will also fail unless entries of df['column'] are keys of the dataframe's index, which seems like an unusual situation.
what are the 3 different matrices here?
and what are the axes? there are only 3 tokens in the input?
the former selects the row labeled "column", the latter selects rows with labels matching the contents of the column "column"
the 2nd thing is very likely not something you ever need to do
so you just constrained it to be symmetric, but let the diagonals be negative & didn't try to make it positive definite, right?
Axis numbers are meaningless, I'm not being very rigorous.
I'm comparing every allowed token with each other. There's three tokens, A, B and C and the model sorts a sequence of 11 tokens.
The values of the matrices are:
u = vocab[i] - vocab[j]
value_ij = uMu.T
The three matrices come from the three self attention heads in the self attention module.
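For one head, that table can be computed in a single NumPy expression. The embeddings and M below are random placeholders (the real ones come from the trained model), and M is built as W W.T only to keep the example symmetric:

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = rng.normal(size=(3, 4))        # hypothetical embeddings for tokens A, B, C
W = rng.normal(size=(4, 4))
M = W @ W.T                            # one symmetric matrix, standing in for a head

diff = vocab[:, None, :] - vocab[None, :, :]          # diff[i, j] = vocab[i] - vocab[j]
table = np.einsum("ijk,kl,ijl->ij", diff, M, diff)    # table[i, j] = u M u.T
```

The diagonal is zero because each vector is subtracted from itself, and the table is symmetric because swapping i and j only flips the sign of u.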
so second can be done with df.query
ah okay, they all say "head 0" so i was confused
sort of... what are you actually trying to do here?
I only constrained it to be symmetric but it seems the model ends up making it positive.
i actually don't know: can .query access the index?
Ah oops
I am not trying to do anything rn lol. Just exploring
the ones across the row, not down the column
Should be Module 0 Head 0, Module 0 Head 1 Module 0 Head 2.
df.loc[ idx , col ] in general selects rows with labels in idx and columns with labels in col
@final kiln really interesting experiment. where did your input sequences come from?
i wouldn't have any idea how to generate realistic synthetic data for something like this
can you force it to be positive semidefinite by using the cholesky decomposition somehow?
So how would you change values where by in Column A it equals bob for example?
not sure if setting L L.T = M is enough
Oh, it's actually easy if you think about it, so each letter maps to an index, A -> 0, B -> 1 and C -> 2, so I create a random sequence of size 11, [0, 1, 1, 0, 2, ...] And then sort it with a normal sort Algo to serve as ground truth
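Something like this would produce such pairs. It is a sketch of the described setup, with hypothetical names, not the actual training script:

```python
import random

def make_example(length=11, vocab_size=3, rng=random):
    """One (input, target) pair: a random token sequence and its sorted version."""
    tokens = [rng.randrange(vocab_size) for _ in range(length)]
    return tokens, sorted(tokens)

rng = random.Random(0)                 # seeded so the data is reproducible
inputs, target = make_example(rng=rng)
```

Since the ground truth comes from an ordinary sort, an unlimited number of training pairs can be generated on the fly.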
That's what I'm using at the moment
yes, .loc also allows boolean series for selecting rows by true/false.
eq_bob = df["a"] == "bob"
df.loc[eq_bob]
But idk if it forces it to be positive, maybe it does ?
oh it's sorting, right
Is this the easiest way?
i think so, it's kinda like squaring
it's the standard way. your other option is df.query
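Here is a small sketch of both options on a made-up dataframe, including the assignment form for changing values in the matching rows (the column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"a": ["bob", "alice", "bob"], "score": [1, 2, 3]})

eq_bob = df["a"] == "bob"            # boolean Series, one entry per row
df.loc[eq_bob, "score"] = 0          # change values only in the matching rows

bobs = df.query("a == 'bob'")        # same row selection, written as a query string
```

Passing a column name as the second argument to `.loc` is what limits the assignment to that column.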
ok cool
however if you want to select rows where the row label (aka "index") is a specific value, you use .loc directly: df.loc["bob"]
often if you have some kind of unique identifier, id number, timestamp, etc it's a good idea to use that for the row labels. it makes some things a lot tidier and a lot faster
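For example, with a made-up table where the name is unique and used as the row label:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["bob", "alice"], "age": [31, 27], "city": ["Oslo", "Lima"]}
).set_index("name")                        # use the unique name as the row label

row = df.loc["bob"]                        # one row, returned as a Series
cell = df.loc["alice", "city"]             # one row label, one column label
block = df.loc[["bob", "alice"], ["age"]]  # lists of row labels and column labels
```

Lookups by row label avoid building a boolean mask over the whole column each time.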
Just checked, it is, that explains why they all come positive, which is nice. I think the other property that I need is that it must be non-degenerate
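A quick numerical check of both points, as a sketch with a random lower-triangular L rather than trained weights. The quadratic form of M = L L.T is a squared length, so it cannot be negative, and a zero on the diagonal of L is one way M ends up degenerate:

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 4
L = np.tril(rng.normal(size=(dim, dim)))   # free lower-triangular parameters
M = L @ L.T                                # symmetric by construction

# x M x.T equals the squared length of x L, so it can never be negative.
x = rng.normal(size=dim)
quad = x @ M @ x

# A zero on the diagonal of L makes M singular (degenerate).
L_bad = L.copy()
L_bad[0, 0] = 0.0
M_bad = L_bad @ L_bad.T
```

So the factorization alone gives positive semidefinite; non-degeneracy additionally needs L to be invertible, which for a triangular L means no zeros on its diagonal.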
again im not sure you can get that with cholesky decomp
I need to review my linear algebra >.>
But it seems that yes
Then the experiment is complete. And I've halved the number of parameters in each head because it's a symmetric matrix
i'm really surprised this isn't an established technique, if you have code I'd love to see it so I can play around with it
Sure, I'll send you in a bit.
I've just taken Wq and matmul it with itself
And deleted Wk
Next step really is see if this scales, benchmark it and all that stuff.
If no one has ever done this I'm gonna write a small paper on it
yeah that's what i wanted to try, running nanogpt with it
definitely worth a paper with your results
and if someone else already did it, then you'll find out 😆
next time you ask for help in 2 different places please link to your other conversation for context
I def wanna find out before committing to benchmarks
Ok thank you
# prepare for matrix multiplication
q_b3wk = q_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
# k_b3wk = k_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
v_b3wk = v_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
# perform matrix multiplication
attention_scores_b3ww = q_b3wk @ q_b3wk.transpose(-1, -2)
this is the key difference
notation goes like
name_of_tensor_bwk, b = batch, w = words, k = coordinates // 3
it's tensor notation
v_b3wk = shape of (batch, 3, words, coordinates // 3)
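The same reshaping and the Q Q.T scores can be reproduced in NumPy, which makes the shapes easy to inspect. `reshape` and `transpose` stand in for PyTorch's `view` and `transpose` here, and the sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(5)
batch, words, coordinates, heads = 2, 11, 12, 3
x_bwc = rng.normal(size=(batch, words, coordinates))
Wq = rng.normal(size=(coordinates, coordinates))

q_bwc = x_bwc @ Wq
# Split the last axis into heads, then move the head axis in front of words.
q_b3wk = q_bwc.reshape(batch, words, heads, coordinates // heads).transpose(0, 2, 1, 3)

# Scores use q on both sides, so no key projection is involved.
attention_scores_b3ww = q_b3wk @ q_b3wk.transpose(0, 1, 3, 2)
```

Each head's score matrix comes out symmetric, which is the property the experiment relies on.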
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])
this will print a result after training
or is it? i guess you're just recycling the variable name
i'd call them W or something like that
they already include the matrix mul
yeah i see it
(xWq)(xWq).T = QQ.T
at least gpt4 cant find anything of the sort
I'm skeptical about how far this can be taken tho, if I can teach it to do Shakespeare without loss of efficiency I'll be very excited indeed
anyone here ever worked with image or object detection?
any one around to toy with an idea?
the what?
hi
https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&pp=iAQB is this playlist good enough to follow? ( considering it's 7 years old now )
The objective of this course is to give you a holistic understanding of machine learning, covering theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms.
In this series, we'll be covering linear regression, K Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, a...
If anyone followed it , kindly share your feedback
Me
Hi everyone, I received a hiring challenge which is pretty lame I think. Basically it states that I have to find the pattern in the numbers which pop up on the website continuously. I'm confident that I can create an ML model, but can anyone suggest ideas on how I can use JavaScript to predict the label simultaneously while the numbers appear? The link for the website is below
you did find the hidden link to the full dataset, right?
No, Idk how to do... I don't know webdev
sounds like you won't get hired, then
if you look at the source, there's a comment suggesting to find what makes the numbers change. by looking at the JS script in the debugger (which even has a source map, no decompilation needed) you can find the link to the full dataset in a comment.
Hi, I have an algorithm that detects trendlines for the "low" and "high" levels of a signal in a dataframe, for a time window that I can set.
After I have the trendlines' slopes saved into the dataframe, I try to find which trendline pairs are parallel and: ascending, descending or horizontal.
Can anyone hop on a call with me to look through the code? It's a bit long to explain
Just so you know, it's pretty unlikely that anyone will join a call with you. If they do, that's great. But you should probably ask the most focused question that you can and show relevant code as text.
how do I paste code here that will look more like code and not plain text?
!code
^ permanently remember the information here
Here are the functions I use to:
- classify the trend patterns (wedges, channels, triangles) from the trendlines' slopes
- get the dataframe index of the N-th occurrence of a certain trend pattern that was identified by the previous function
https://paste.pythondiscord.com/HLCA
My problem is that the trendlines for channels don't seem to look too parallel, even though sensitivity = 0.001 and the condition for descending channel is abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity
In the second pic, it says I have slopes -0.06 and -0.01 which give an absolute difference of 0.05 that's greater than 0.01, which is 10*sensitivity.
I don't know where my function failed.
so maybe look further up in your code? if the slopes are not what you expected, maybe you made a mistake in that part of the code
Hey guys, sorry for asking such a beginner question. But I have a confusion. Do I need to learn a lot of math before learning ml and do I need SciKitLearn before I touch something like a Tensorflow ?
What would a typical roadmap look like
guys, what did i do wrong.
i wanted to look like this
but instead, it look like this
My first suspicion was the part that "marks" the trendline pattern with 1, but I then input the upper and lower slopes into the if conditions and got "nothing":
sensitivity = 0.001
slmin = -0.06
slmax = -0.01
if abs(slmin - slmax) < sensitivity*10 and slmin > sensitivity and slmax > sensitivity:
    print("chan_asc")
# chan_desc is appended 1 when slopes are almost parallel and both of them are below a small negative value
elif abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity:
    print("chan_desc")
# chan_hor is appended 1 when slopes are almost parallel and both of them have a really small amount (around 0)
elif abs(slmin - slmax) < sensitivity*10 and abs(slmin) < sensitivity and abs(slmax) < sensitivity:
    print("chan_hor")
# tri_desc is appended 1 when slmin is very small and positive and slmax is negative
elif slmin > sensitivity and slmax < -sensitivity and abs(slmin) < sensitivity:
    print("tri_desc")
# tri_asc is appended 1 when slmax is very small and negative and slmin is positive
elif slmax < -sensitivity and slmin > sensitivity and abs(slmax) < sensitivity:
    print("tri_asc")
# wed_desc is appended 1 when slmin is negative and slmax is negative but slmin is greater than slmax
elif slmin < -sensitivity and slmax < -sensitivity and slmax < slmin:
    print("wed_desc")
# wed_asc is appended 1 when slmin is positive and slmax is positive but slmin is greater than slmax
elif slmin > sensitivity and slmax > sensitivity and slmin > slmax:
    print("wed_asc")
else:
    print("nothing")
I also tried to see if the plotting function calculates the trendlines differently than the function that adds their slope and intercept to the dataframe, but they use the exact same method on the exact same time frame, so that's no issue either. I'm literally out of ideas on what to test
So all this time, I've been "trapped" by my curiosity about Transformers architecture. And it turns out, to study the Transformer architecture, I had to go back and learn the creation of machine translation models using RNN and RNN+Bahdanau attention.
As a result, I have now built three machine translation models using the TensorFlow library. However, when I compare the results, it turns out that the machine translation model using the Transformer architecture has much lower accuracy compared to the other two RNN models, only around 55%. Here are the details:
Classic RNN
loss: 0.1082 - accuracy: 0.9762 - val_loss: 0.4065 - val_accuracy: 0.9304
RNN + Bahdanau Attention
loss: 0.2247 - accuracy: 0.9593 - val_loss: 0.5537 - val_accuracy: 0.9100
Transformer
loss: 0.9383 - accuracy: 0.7688 - val_loss: 2.6770 - val_accuracy: 0.5565
On the one hand, when I look at the results of others who have made machine translation with the Transformer architecture, the validation loss values they get are similar to mine, around 2.4 (this output is generated when using the PyTorch library).
My question is, why does the accuracy of the Transformer architecture seem much lower than that of the RNN models? Initially, I thought this might be because the accuracy metric used for model performance measurement is "accuracy," which, when creating a machine translation model, should use other metrics, such as using BLEU Score. However, on the other hand, when I want to use BLEU Score as the accuracy metric, my computer is not capable of running it.
Is this "accuracy" metric not reliable, so that's why the RNN model performs much better than the Transformer? or what?
How deep did you go with regards to the transformer architecture learning?
Merry Christmas guys! Today is a good day to close your laptop and enjoy the Christmas celebration with your family and friends. 😁👌
Bleu is likely to be more suitable here @bold timber
Is your model tuned to the dataset ?
I think so, but my computer is not capable of running it if I use BLEU score as an accuracy metric😅
can you elaborate more about this?
When it comes to neural machine translation, the evaluation metric isn't the conventional accuracy score.
Metrics like BLEU, GLEU, METEOR, Perplexity, COMET, etc are used.
It's strange to learn you're unable to compute BLEU score on your pc. What exactly happens when you try calculating BLEU? It throws an error message or kernel dies or ??
Note: the context here is the machine translation model for english-french
No. But when I try to use BLEU score as a metric, I need to set run_eagerly = True, which makes the kernel die (otherwise, it raises an error)
Can I see the code snippet where you're computing BLEU?
If you're using the bleu module from NLTK and it's not working for you (it should work under normal circumstance though), then try using sacreBLEU
Hi, thanks for the suggestion! I will try to understand it first.
Hello everyone, Just completed notebook link here https://www.kaggle.com/code/nishchay331/n6-house-prices-advanced-regression-techniques. I'll be glad if you take some time to go through this and point my mistakes as I am a beginner. Suggestions are highly appreciated. Thank you.
do you know how to use the bleu module from NLTK as an accuracy metric in the machine translation model with tensorflow?
Once you've trained your NMT model, and you'd like to evaluate the model performance, you just need the translated text (the sentence predicted by your model) and the actual text from the target language (the correct sentence you expect your model to predict) to compute the BLEU score.
from nltk.translate.bleu_score import sentence_bleu
reference = ['It was raining heavily today'.split()]
candidate = 'It It It is raining heavily'.split()
print(sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0)))
We have different types of bleu computation; sentence_bleu, corpus_bleu... This will provide more clarity https://machinelearningmastery.com/calculate-bleu-score-for-text-python/
BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations. Although developed for translation, it can be used to evaluate text generated for a suite of natural language processing tasks. In this tutorial, you will discover the BLEU score for evaluating and scoring...
Does this mean that the BLEU score is not included in the metrics during model compilation, like model.compile(metrics=[BLEU])?
I have no idea if it's possible to implement it that way in TensorFlow. I use PyTorch not TF.
You might wanna check TensorFlow documentation. Or perhaps, someone who uses TensorFlow could offer a much better feedback on this.
Alright, thank you for the respond you provided
By the way, how do you determine weights in sentence_bleu?
The BLEU score calculation in NLTK gives us leverage to compute either the cumulative or individual BLEU scores of different n-grams.
By default, sentence_bleu() calculates the cumulative 4-gram BLEU score, also called BLEU-4. The weights for BLEU-4 are (0.25, 0.25, 0.25, 0.25) (remember, 1/4 == 0.25)
Now, in the example I showed earlier, the weights are (0.5, 0.5, 0, 0) because I'm interested in computing BLEU-2 a.k.a bigram bleu score (remember 1/2 == 0.5 )
If you check the attached link in my previous response, it has a section called **Individual vs. Cumulative BLEU scores** where this concept is explained in a more detailed way
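To make the role of the weights concrete, here is a simplified pure-Python sketch of the standard cumulative BLEU formula for a single reference. It is not NLTK's implementation (no smoothing, one reference only), and the function names are made up:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, candidate, n):
    """Clipped n-gram precision against a single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / max(sum(cand_counts.values()), 1)

def bleu(reference, candidate, weights):
    """Weighted geometric mean of n-gram precisions times the brevity penalty."""
    log_mean = 0.0
    for n, weight in enumerate(weights, start=1):
        if weight == 0:
            continue                      # zero-weight orders are ignored
        p = modified_precision(reference, candidate, n)
        if p == 0:
            return 0.0                    # no smoothing in this sketch
        log_mean += weight * math.log(p)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)

reference = "It was raining heavily today".split()
candidate = "It It It is raining heavily".split()
bleu2 = bleu(reference, candidate, weights=(0.5, 0.5, 0, 0))
print(round(bleu2, 4))
```

For this pair the clipped unigram precision is 3/6 and the bigram precision is 1/5, so with weights (0.5, 0.5, 0, 0) the score is the square root of their product.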
Hi everyone I'm want to fine tune a MultiModal llm which can generate icons based on input text, I'm not able to find open-source models for it.. any idea?
aah I see, thank you so much!
so i seem to have found 2 ways to break ChatGPT 3.5
is it common? by break i mean bypassing the explicit rule
You mean prompts?
no i just straight up asked questions about something then kept slighty changing the subject
it worked
twice
you can report it directly to OpenAI, but it is not ultra surprising
that sort of stuff (using ai tools / prompt engineering / their vulnerabilities) is not particularly on topic for this server though
and what outcome did you expect for those inputs? are you sure your if/else conditions are correct?
what kind of experience are you looking for?
a computer who is programed to tell the truth, and is not restricted always
i understand there are some topics which it should avoid but it seems its always lying or just not telling the whole truth, or just trying as much as possible to give you 1 single opinion
That's not really how these models work, there is no concept of the truth to them
They are predicting the most likely text to come after whatever you give them
Even if they were trained on 100% factual and true datasets (which they absolutely are not) they could still give you false information
yes i know that but it's so restricted
also, outside of pure logic, truth isn't really so well defined, and especially on topics a lot of people care about. By that I mean, people disagree with each other, and not because all but 1 are intentionally being malicious
you dont even have to look to politics, think of the legal industry. We try to create a set of rules, where if you do x then y happens. But it doesnt work like some complex algorithm in a deterministic way, the quality of your lawyer has a huge impact, and facts if they exist are interpreted
so, how do you even measure how truthful an llm is?
and sure, there are things that are widely accepted to be true, or at least by communities who should understand the topic well, but there is so much grey area everywhere that it doesn't help that much (and you're not interested in restriction of topics)
@midnight root
and if you really just want an unrestricted model now, get Mixtral 8x7B running
check out https://youtu.be/zjkBMFhNj_g?t=2774
anyway
I (think I) know a fair amount of deep learning theory but I want to get some practical experience. I'm doing a kaggle competition and im wondering, how do most people do their data processing? By that I mean setting up something to feature engineer each input, not so much the mining. Do you just write whatever function to do whtever you want, or is the industry standard to make (heavy?) use of sklearn pipelines or whatever? I guess it depends on the task, but lets say youre working on your own on a relatively static and manageable data set, ie not worrying about big data streams in the cloud or something
actually, for my question, does anyone have any examples of like a github repo where they implemented an ML pipeline? Which was done either for a job interview kind of scale or while preparing? That'd be perfect
Idk y I haven't used this before, MLFlow is amazing
It's running on an ec2 instance and connects to S3.
I'm gonna do three experiments, one for normal GPT, another for a slightly modified GPT and another for a full variation on the transformer where I also incorporate a couple other ideas I have
I'll train them on the same Shakespeare dataset and compare performances, and also how the performances scale with model size
I've already put some thought into how I'm gonna do the multimodal thing. It's gonna be a voice input to text output because one step at a time. The idea is to take whisper, invert the decoder (somehow) so that I have a text to embedding translator, then use llama to generate text, so I can use it to fine tune another llama to understand the embedding. Turning llama into a decoder for the whisper encoder.
cant you just get embeddings from whisper?
and how will you feed embeddings into llama
It's encoder decoder, I'm just gonna take from the bottleneck
I'm gonna add something to the initial layers to adapt for the new input type
Greatest challenge will be to invert whispers decoder
output_hidden_states=True apparently
oops didnt mean to reply to that, how do I undo that?
whatever
It's not so easy, you can get to the bottleneck (which I think is what that does im not sure), but the problem is that the decoder has two inputs
also I dont think finetuning llama will work for this, youd have to (pre)train it all
I'll try it all the same, if it doesn't work I'm sure the lessons learned will inform my next step
ofc
I will point out this though https://arxiv.org/abs/2305.11206 that in my view was a big landmark in the shift to thinking that finetuning doesnt really teach the model any new information (or how to handle new data formats in this case), which is pretty mainstream now. Check out the start of section 2 in the paper. Also https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts is pretty good from around that same time
as yeah I see people parrot this a lot but not really know where its all come from
Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
I guess that's a fair point, it's more saying that finetuning brings out what the model already knows, but I spent about a third of the year just finetuning llamas, trying to get them to have a better understanding of the domain of my company's industry, and didn't get that far. It's why there's been such a hard pivot to RAG now (imo)
that's the point of fine tuning tho, I don't wanna introduce knowledge, that's too expensive, I want it to adapt to a new input type
I mean sure, I'm not really an expert, I could (in theory) tell you details of how it went at my company, but its also just our team and a fairly small one. I see finetuning as getting the model to output in more predictable ways, and in ways you design. What youre doing is more the other side, getting the model to take an input that it's unfamiliar with. So instead of condensing whatever region of the embedding space represents its "understanding" in different ways, you want it to handle unfamiliar regions of the space. But tbh, Ive never tried anything like that, and cant think of any experiments like that either that I know about at least
so yeah this is a cool thing to try actually, hit me up if it goes well lol
but it might just be a hopefully easy enough to do transformation to get the embeddings to what llama knows
tbh there's probably good reason why gpt4 uses a translation network for its vision thing instead of finetuning the starting layers
or at least, that's what I heard how they do it, not gpt4 specifically ofc cuz that's not open sourced
the reason I wanna fine tune is that the embedding will contain other useful information that I know language models understand but can't get from most text prompts
yeah, thats where my gut feeling is that pretraining will be better suited than finetuning kicks in. But again, idk really
yeah idk, interesting problem
like, I know GPT4 "understands" sadness, conceptually, it can describe it, adapt prompts if you tell it you are sad. but you need to do extra work for the prompt to include emotions
a speech embedding will already contain all that
and since the language model also already contains some representation of it
I'm hoping that it will just sort of adapt to it
this is late stage stuff tho
I think the same....it might give optimistic looking results after a bunch of fine-tuning but pre-training is where it's at.
the LLM already has a capability based on a ton of parameters trained on? understanding structure of natural language, not numbers in an embedding space
interesting approach tho, backwards_propagation
i've thought about this problem a bunch too, the information lost in the original embeddings when going for multimodality
but this would occur only in "put-together" multi-modal networks so to say right, not in ones designed for multimodality
which is a long winded way to say that that should be your pre-training task
The embedding spaces are the things that actually encode the structure of natural language.
The information is extracted from the embeddings, so it must encode meaning.
yes but LLMs are not trained to decode structure from embedding space inputs, they're trained to decode structure and meaning from natural language inputs
The text input is first translated into an embedding, using a table of embeddings that the network learns.
I personally don't like arguing like this, I've banged my head against the wall so many times by trying to guess what's possible or what's not using my own intuition.
xd i get it
I've learned to accept that I just can't predict what works until I try it
you'd still need a pre-trained joint embedding space model at minimum right
but that isn't a huge ask
I'm just gonna let the first layers change. Possibly apply a decaying learning rate as a function of model depth
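One way to picture the "decaying learning rate as a function of model depth" idea: give every layer its own learning rate that shrinks geometrically with depth, so the layers closest to the new input adapt the most. This is only a sketch; `base_lr`, `decay` and the layer count are made-up values, not anything from the actual setup.

```python
def layerwise_lrs(num_layers, base_lr=1e-4, decay=0.8):
    """Learning rate per layer: layer 0 (closest to the input) gets base_lr,
    each deeper layer gets the previous rate multiplied by `decay`."""
    return [base_lr * decay ** depth for depth in range(num_layers)]

lrs = layerwise_lrs(num_layers=4)
for depth, lr in enumerate(lrs):
    print(f"layer {depth}: lr = {lr:.2e}")
```

In PyTorch, rates like these would usually be handed to the optimizer as one parameter group per layer, each with its own `lr`.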
with the amount of unexpected results, or results that are only explained after they're observed, in ML, that's a reasonable approach imo
interesting
lmk if it shows promise!
Sure, I always post here what im doing in case people want to chip in. Always super helpful to discuss and hear new ideas as I'm not very experienced in NLP
ah nice! I used to be pretty regular here but life gets in the way lol, still drop in from time to time
I think youre in agreement though, backprop is saying that there will be a network that tries to translate whisper embeddings to llama embeddings, getting rid of the first part of llama that turns the sequence into embeddings. And the hope being that this (can) learn how to do that shift. In a sense, instead of whisper -> text -> llama, it'll be last embeddings of whisper -> translation network -> first embeddings of llama, so avoiding destroying information when you convert to text in the middle
I think this is all fine in theory, but my intuition is that llama wont really make use of the extra information whisper knows about from finetuning, it needs pretraining as a fresh language model
no reason not to start with finetuning though considering compute (btw backprop, in case you haven't come across it, try LoRA)
Your last paragraph is what I think the joint space training would be useful for imo, that I mentioned
LoRA and QLoRA
ah ok, wasnt familiar with the term
Though I wonder for the joint space embeddings
Is it necessary to pre train a full model
Or can we use an adapter network in some sense
Interesting stuff
Also I think currently backprop is trying to avoid using a separate translation network and hoping the first few layers of llama will adapt to doing that on fine tuning
This is a part I'm not very confident about
Hey guys, i know its a pretty dumb question but what is x_train and y_train and why we always have to have both x and y (e.g: x_val, y_val; x_test, y_test)?
for each instance, y is the property that you want the model to output, and X are the properties it can use to do that.
So if your data set has these properties about a house: {square footage, number of rooms, neighborhood, price}
and you want to be able to predict the price
then price is the y value, and {square footage, number of rooms, neighborhood} are the X values.
make sense so far?
yeah
ty
So if i want a model to predict a number for each image i need a matrix with all pixel (x) and the number on image (y)?
the whole image (the array of pixels) would be the X value, and the number that image represents would be the y, yes.
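To make the X/y split concrete, here is a tiny sketch using the house example from above. The numbers are invented purely for illustration.

```python
# Each row is one house: (square footage, number of rooms, neighborhood, price).
houses = [
    (1400, 3, "north", 250_000),
    (2000, 4, "south", 340_000),
    (850, 2, "north", 150_000),
    (1700, 3, "east", 300_000),
]

# X: the properties the model may look at; y: the property it should predict.
X = [row[:3] for row in houses]
y = [row[3] for row in houses]

# Hold out the last row for testing. X and y are always split the same way
# so that each X row stays paired with its y value.
X_train, X_test = X[:3], X[3:]
y_train, y_test = y[:3], y[3:]

print(X_train[0], "->", y_train[0])
```

That pairing is why the names always come in twos (`x_train`/`y_train`, `x_val`/`y_val`, `x_test`/`y_test`): each split needs both the inputs and the answers that go with them.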
yw
How does a model that does what https://thispersondoesnotexist.com/ does work? All ML problems (very few) I've done have had inputs and outputs, this just has an output. How does training a model like that work? (I'm not looking for an in depth explanation though I'll happily take that, but more a few pointers on what to research)
i'm not an expert in this field, but my broad understanding is that the layers get progressively bigger as you get towards the end of image, culminating in a full-size image output
There is no visible input because the site only runs the trained generator, which is fed random noise rather than anything from the user. Also, it is a type of generative adversarial network.
I got one who only sort of has glasses
lol I got the most generic looking dad
Oh okay thank you I'll look into that
Thankfully thispersondoesnotexist.com has been improved a lot, I think both with newer models, and importantly better data.
I remember there used to be a good share of NSFW images generated a few years ago
Which makes it awkward when you're excited trying to show someone this cool thing about your field 😅
Can someone explain what's going on in this validation curve? I think it shows underfitting since the training and cross validation score are still steadily increasing. That right?
Anyone know how to fix this?
My df styling seems messed up
Columns are taking a lot of width and trying to expand fully
The default itself has changed. I haven't run any other cell
Hey there! Im trying to implement a GMM. Im a bit struggling to implement the pdf for e step. This is my current implementation
```py
import numpy as np
def pdf(data, variance, mu):
    covariance = np.diag(variance)
    return np.exp(-0.5 * (data - mu).T @ np.linalg.inv(covariance) @ (data - mu)) / np.sqrt((2 * np.pi)**data.shape[1] * np.linalg.det(covariance))
```
i read about it in a paper covering the basics about multivariate data analysis, but translating it into code is always a different thing
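One likely snag in the snippet above: if `data` has shape `(n_samples, n_features)`, transposing the whole matrix makes the shapes not line up, because the quadratic form has to be evaluated per row. A minimal sketch of a row-wise version, assuming a diagonal covariance given as a 1-D `variance` vector (the function name and the test values are mine, not from the original):

```python
import numpy as np

def diag_gaussian_pdf(data, variance, mu):
    """Density of N(mu, diag(variance)) at every row of `data`.

    data: (n_samples, n_features); variance and mu: (n_features,).
    """
    diff = data - mu                              # broadcasts over rows
    # With a diagonal covariance the quadratic form is a weighted sum of squares.
    mahalanobis = np.sum(diff ** 2 / variance, axis=1)
    # det(diag(v)) is the product of the diagonal entries.
    norm = np.sqrt((2 * np.pi) ** data.shape[1] * np.prod(variance))
    return np.exp(-0.5 * mahalanobis) / norm

data = np.array([[0.0, 0.0], [1.0, 2.0]])
densities = diag_gaussian_pdf(data, variance=np.array([1.0, 4.0]), mu=np.array([0.0, 0.0]))
print(densities)
```

For the E-step this would be called once per mixture component, and the resulting densities weighted by the mixing coefficients and normalized per row.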
You didn't tell us which score is displayed on your Y-axis. RMSE or Coefficient of Determination (R2) or ??
Well, I'm gonna presume y-axis is the loss. From the learning curve, you could see there's a convergence between your train and cv score. As expected, the loss tends to increase as the number of neighbours increases.
It's certainly not overfitting. I won't say it's underfitting either.
While the goal is to train a model that makes 0% mistake in its prediction, your model prediction on train data happens to be making ~35% error in its prediction. ( You know it could have been way worse depending on your predefined threshold. For example, for some people, if their model was making, say, >= 70% error on the train data they'll shutdown the whole thing and retrain. For some, >= 50% error on the train data is enough to trigger a complete overhaul, which would of course, require training a new model)
i need some help with computer vision
i got to take keyboard inputs for the game from the hand gestures
any appropriate library for that to help me out with it
pls ping for the reply
Check out mediapipe
i am using the same for the hand gestures part , can you provide with some documentation for the same , as i cant find it
https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#get_started
Almost tailored to your use-case ig.
What issues are you facing?
now that i figured out by using pyautogui , and defined the cases for the press baar when we open a closed hand , but as soon as i open the hand the fps drops to 0
heres the code for that
oh ig , we cant send python file here
Hey, i'm kinda stuck here, i'm trying to use open ai whisper for transcription, but i'm having trouble loading the model:
I'm just using the basic transformers code from Hugging Face:
import whisper
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
print("Downloading the model...")
direct = "model/"
model_1 = AutoModelForSpeechSeq2Seq.from_pretrained('openai/whisper-base', cache_dir=direct)
processor = AutoProcessor.from_pretrained('openai/whisper-base', cache_dir=direct)
model = whisper.load_model(model_1)
print("Model loaded. Transcribing test audio...")
result = model.transcribe("audio_test.mp3")
print(result["text"])
It seems that i can't directly load it (i'm just doing it wrong, that's all i know), but i'm not sure what to do. Do i need to use transformers?
Please critique my chart
Can somebody smarter tell me why my probabilities are all over the place. They oscillate between 0 and 20, which is a behaviour I can't explain: Code https://paste.pythondiscord.com/6K6Q logFile https://pastebin.com/RrcNg1tw
less clutter on y axis
can you pare this down at all? it's just a huge log output. also i suggest using https://paste.pythondiscord.com which isn't covered in malicious ads
thanks for the code. ideally you can include an error message to make it easier for people to help. include the entire "traceback" section, which tells you where exactly the error occurred in your code
looks good, i wouldn't mind a faint horizontal line in the background at 0
thats the error i get
model = whisper.load_model(model_1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\__init__.py", line 135, in load_model
elif os.path.isfile(name):
^^^^^^^^^^^^^^^^^^^^
File "<frozen genericpath>", line 30, in isfile
TypeError: stat: path should be string, bytes, os.PathLike or integer, not WhisperForConditionalGeneration
@desert oar
im setting up some more infra for the model training, after some research I found that the most efficient way is to use amazon spot instances
so im gonna setup a self hosted github actions runner that will be responsible for baby sitting the spot instances
b4 that I need to setup spotty, which is the thing that'll actually handle them
spot instances are like ec2 instances, but they can be brought down any time by aws, making them cheaper, spotty will essentially handle the fault tolerance aspect of it, it's specifically made for ML training
```py
processor = AutoProcessor.from_pretrained('openai/whisper-base', cache_dir=direct)
model = whisper.load_model(model_1)
```
Just use model = whisper.load_model('model_name'), where model_name is one of the names from whisper.available_models(), e.g. "base" (not the Hugging Face id "openai/whisper-base"). The load_model function expects a string or path, while you are passing an AutoModel instance from HF as the argument.
I don't think you even need AutoProcessor for transcribing with model.transcribe in whisper models.
Just read about this, super interesting, I might use it instead of directly altering the weights. Just to see if I understood, instead of altering the weights directly, you get something of the form
Y = W_fixed X + (delta W) X
Where delta W = AB, two trainable matrices, one initialized to 0 and one initialized to a gaussian sample of N(0,1).
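A small numpy sketch of that forward pass, to check the shapes. All sizes are illustrative, and I'm writing the update as `B @ A` so the dimensions line up; which factor gets the zeros is just naming.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2      # illustrative sizes, rank << d_in, d_out

W_fixed = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(rank, d_in))          # trainable, Gaussian init
B = np.zeros((d_out, rank))                # trainable, zero init

def lora_forward(x):
    # y = W_fixed x + (B A) x, computed without forming the full d_out x d_in update
    return W_fixed @ x + B @ (A @ x)

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W_fixed @ x))
```

Because one factor starts at zero, the update is zero at initialization and the adapted layer begins by reproducing the frozen one exactly. Here the two factors hold 28 numbers against 48 in the full weight, and the gap grows quickly with layer size.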
Yes, the idea is that the network will know to adapt existing representations to the new kind of input which includes representations of the same thing but in a different "language".
but what do i do with the model_1 variable?
You can also remove that.
for whisper.load_model can i just specifiy the directory of the model
```
one of the official model names listed by `whisper.available_models()`, or
path to a model checkpoint containing the model dimensions and the model state_dict.
```
whisper/__init__.py line 99
```py
def load_model(
```
You can also use the help function to print the docstring of methods.
help(whisper.load_model)
Hi guys. I just made my first nn. Its a number classification model. I would be really grateful if you gave me some tips / things to improve. Here is my code in google.collab:
https://colab.research.google.com/drive/1S65G8iwa1XgwaGEiK_Sr6_ag70vNflsd?usp=sharing
Yeah basically, the key is also that A and B are low rank, so the embeddings are projected into a low rank space, and the training has the job of figuring out which projection is the most helpful. It's interesting that such a loss of potential information often has very small effects on performance
Yeah it seems that it can load a checkpoint file but i don't see any??? i tried:
model = whisper.load_model("model/models--openai--whisper-base")
and i get RuntimeError: Model model/models--openai--whisper-base not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']
Not trying to be dismissive but try just pasting your code into chatgpt and asking it
@outer widget ??
i just realized that i don't need to use transformers, if i tell it to load for example base.en it will download the required file automatically, however i get this error now:
FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
File "C:\Users\Default.DESKTOP-KA61FU5\Desktop\Talk2GPT\Talk2GPT.py", line 27, in <module>
result = model.transcribe("audio_test.mp3")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\transcribe.py", line 122, in transcribe
mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
audio = load_audio(audio)
^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 58, in load_audio
out = run(cmd, capture_output=True, check=True).stdout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\subprocess.py", line 1024, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python311\Lib\subprocess.py", line 1493, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
@final kiln Do you know anything
Yes you need to first read the error and understand it. Sometimes the solution is obvious, other times requires some trial and error.
I suppose it's like zip compression, it removes unnecessary info
makes you wonder how big llms really need to be
is anyone interested in RAG? I'm reading through https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/446422 and dont really understand why/what logit theyre feeding into their binary classifier head (the steps under the figure).
So, they have the question with the retrieved context. The way I (try to) understand it (which also goes against other things they say they do so I know I'm wrong) is that they take that context+question sequence, then append the answer to the sequence, run the sequence through the model to get the logits for every possible token in the vocab, and feed those logits into a classifier head? Why would this give a helpful measure of how correct the answer is? Surely the LLM is happy to spit out the next token anyway, I dont see what relevance specific logits there has?
I guess my first question is, is it right that they only forward through the network 5 times for each question (because of the 5 answers)? Just to make sure I'm on the same page
Then 2, is it just a thing that I have to accept that they can classify over the logits for each answer? I dont see where they got the idea from, what the intuition is. If it was me I'd like to try to get the logits for the model producing tokens A/B/C/D/E with whatever setup is needed for that
actually I think I have the right understanding of WHAT theyre doing, but still unsure as to WHY
Hmm, it seems it has to do with FFMPEG, i have it in my c drive under FFMPEG, and it got it on Path enviorment variable but it still wont work,
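The traceback ends inside `subprocess`, so the missing file is most likely the `ffmpeg` executable that whisper's `load_audio` tries to launch, not the mp3. A quick way to check what the Python process itself can see (a sketch, the helper name is mine):

```python
import shutil

def find_ffmpeg():
    """Return the full path of the ffmpeg executable visible to this Python
    process, or None if it is not on PATH."""
    return shutil.which("ffmpeg")

ffmpeg_path = find_ffmpeg()
if ffmpeg_path is None:
    print("ffmpeg not found on PATH; restart the terminal after editing PATH")
else:
    print("ffmpeg found at", ffmpeg_path)
```

If this prints "not found" even though PATH was edited, two things worth checking: the PATH entry has to be the folder that directly contains `ffmpeg.exe`, and terminals opened before the change keep the old PATH.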
also I find it weird that they do this
if tokenizer.pad_token is None:
    if tokenizer.unk_token is not None:
        tokenizer.pad_token = tokenizer.unk_token
    else:
        tokenizer.pad_token = tokenizer.eos_token
ie pad with unk, am I right in thinking thats weird?
ah I think Ive found the answer to my question from above, the diagram doesnt really explain but
inst = f"Answer: {row['answer']}\n###\nIs this answer correct? "
instructions.append(inst)
now I can accept the logits being useful
@heavy sigil when you read the error, what did you think of this line? result = model.transcribe("audio_test.mp3") in the context of FileNotFoundError ?
it's in the same folder as the .py file
and i'm running the py from command line on that folder
put this above the line that errors
import sys
print(sys.path)
and then google pythonpath
and also try changing the code to use the absolute path
Hmm, the error is happening from the library not my code
and sorry not pythonpath, current working directory
import os
print(os.getcwd())
I have my working directory set correctly
can you with open("audio_test.mp3") directly above the line that errors?
???
the error is not caused by script
i need it done locally
well just with a test file to rule out os issues?
wdym?
why does it need to be local
lets take this to dm also as im concsious of spamming this chat
Sorry, the y-axis is the RMSE. My take is that it's overfitting badly for k<20 and underfits for k>20
Yes it's overfitting at k < 20. The learning curve didn't show what's happening beyond k > 100.
Hmm yes, it's worth trying definitely, I'm just not confident it'll give the best results out of the methods available to us because I don't think the embedding space geometry holds the similar common structure and patterns found across different human languages
LoRA (and more generally, PEFT) is pretty powerful!
I use it quite often now, instead of the domain specific adapters we were engineering before
Can anybody help me study these following topics
- Mamba architecture
- Liquid Time Constant Networks
Im also looking into hyperdimensional computing and q star so any help there would be nice
I've never heard of these. This sounds pretty advanced.
Yes, it is
Anyone have an good papers on NEAT?
That's cool and all but I followed a tutorial and was able to train a nn for mnist (I don't understand how it works at all just copied the dudes code)
you need to use .format with the string not the function
and you haven't given any placeholder for the string also
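Guessing at the shape of the original snippet (it isn't shown here), the two points above look roughly like this:

```python
value = 42

# Wrong: .format is called on the return value of print(), which is None,
# so this raises AttributeError:
#     print("The answer is {}").format(value)

# Right: call .format on the string, and give it a {} placeholder to fill.
message = "The answer is {}".format(value)
print(message)
```

Without the `{}` in the string, `.format` has nowhere to put the value and the text is printed unchanged.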
Yeah I figured it out thank you
alr
I want to ask: does it print both print statements if the condition is true? Can't I make it print one if it is true and the other if it is false?
worth trying to move from data engineer to data scientist ?
anyone know how to solve this problem?
any in-depth guide on customizing graph layouts in Networkx?
do you understand what the error message is telling you
@serene scaffold thanks for ur response bro but it's already solved
@past meteor I thought of an idea based on an article I read about time vectors being used in language models. Suppose we have entirely real-valued parameters as codons of an evolutionary algorithm applied by an agent over a time series. For each period n in a periodization of the training period, use the EA to find the best-fitting model. Then, we can look for time-dependent vectors in the parameters and use them to extrapolate into the future.
I'm on vacation till like the 5th of Jan, tag me afterwards 😄
sorry enjoy vacation
Hi, I'm trying to learn ml using project-based learning, but I'm facing some troubles. When I get a project idea, like a spam email classifier, how do I go about building it without just searching up how to do it and blindly copy and pasting code? If I fall into this trap, I'm usually unable to provide my input and that becomes troublesome.
I think something good can be to just do one project where you're essentially copying and pasting code but instead of just doing that try to understand what each step does. Then on a second project, try to do these steps on your own for a different project
Then you can ensure that you've learned it
why q star? Because of some openai shitposts? anyway https://www.youtube.com/watch?v=nOBm4aYEYR4 https://www.youtube.com/watch?v=9dSkvxS2EB0
S4 as a precursor to mamba if you havent come across it also
oh I was actually doing something like that lol
just needed validation for it
i find banning myself from using copy + paste when in the copy code mode helps a bit, even if I just copy it out by hand
This is a long shot, but i have to code this:
https://gyazo.com/4c62c2a417578ef6e51f617d1e633ad4
import numpy as np
x = []
y = []
z = []
prodl = 1
summ = 0
sumi = 0
jj = -1
n = 2
for i in range(n+1):
    for m in range(n+1):
        for l in range(n+1):
            if l != m and l != i and l != jj:
                prodl = prodl * (n/2 - l) / (jj - l)
                x.append(prodl)
        if m != i and m != jj:
            summ = summ + 1 / (jj-m)
            y.append(summ)
    if i != jj:
        sumi = sumi + 1 / (jj-i)
        z.append(sumi)
print(np.prod(x) * np.sum(y) * np.sum(z))
This is my code, however for the last product of all it just gives an array full of 0
this is because n/2 - l = 0 for some l, and since it's a running product everything after that is 0
so is the equation just wrong then?
in my case n = 2, so when l = 1 the whole product is 0
this makes 0 sense
the equation just breaks
even worse i get '-0.0'
-0?
for some reason, people have stopped using the Gephi Toolkit since around 2021 (last posts on their forums), but what the heck have people replaced it with?
Hello all, I am new to python and have some questions related to dimensionality reduction. I have a textual dataset about publications in the field of information visualization, my target is to apply suitable dimensionality reduction methods to visualize the large amount of data against 2 dimensions. So far, I have used the CountVectorizer and TfidfVectorizer to generate the tf and tf-idf vectors for 2 of my columns. Currently I am trying out different methods like PCA, TruncatedSVD and LLE. I am still unsure what makes which dimensionality reduction method more suitable for the task. When I used LLE, I got a really sparse visualization of only 3 points, so I assumed that all other points are overlapping and hence I see only 3, but I am unsure. If anyone can help me understand more about dimensionality reduction it would be great. I would appreciate any helpful resources. Thank you😊 .
The best approach is to stop thinking about code. You want to know what techniques to use, where to get data, etc
So you can look up spam classification for example, then you'll read about naive bayes classification and spam datasets. Then you do EDA, scripting to process data, and finally model fitting, but all of that is driven by figuring out conceptually what you need to do. The code becomes only a means to an end
At which point figuring out the code itself becomes a lot easier, because you'll already know what you need to do
UMAP is kind of the go-to dimension reduction technique for high-dimensional data
Thanks for the reply, I am actually working on a lab assignment and need to use at least 2 methods, I have already tried PCA, TruncatedSVD, t-SNE and LLE. The visualizations looked really dense and there was so much overlap in the scatterplot so I started to experiment with the min_df parameter for the TfidfVectorizer to see how it affects the visualization or if it makes it better. I will give UMAP a try.
Yo is anyone here in digital or computational humanities, I'm trying to relearn some corpus analytics stuff b/c i want to create a model for comparing the communication patterns or repetition of words and images specific to political topics and compare the averages i get throughout my data set to determine what comics are more political than others on a macro scale, and on a microscale seeing what words are specific to each comic that defines whatever political topic i'm exploring in the text.
For each comic: replace all the words in their sub-corpus with the "base" form of each word, which is called the lemma. "run" is the lemma of "running". "cat" is the lemma of "cats". this process is called lemmatization. This normalizes each word, so that you can count how frequent each one is.
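A toy sketch of that counting step. The lemma table here is a hand-written stand-in I made up for illustration; a real project would use a proper lemmatizer from an NLP library instead of a dictionary.

```python
from collections import Counter

# Hypothetical hand-written lemma table; a real project would use a
# lemmatizer from a library instead of a dictionary like this.
LEMMAS = {"running": "run", "ran": "run", "cats": "cat", "votes": "vote", "voted": "vote"}

def lemma_counts(text):
    """Lowercase, split on whitespace, map each word to its lemma, and count."""
    words = text.lower().split()
    return Counter(LEMMAS.get(word, word) for word in words)

counts = lemma_counts("Cats voted and the cat votes running ran")
print(counts.most_common(3))
```

Running this per comic gives one frequency table per sub-corpus, which is the thing you can then compare across comics.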
feels like test-driven development could be used to make LLMs write programs that are a bit more sophisticated, no?
or, has anyone worked on the idea of making packages, basic types, basic classes, etc., be considered tokens?
I dont know if this is the right sub, but i have the following question:
I'm trying to discriminate between images of numbers, and I want to define an image property that describes the number of enclosures in the image. For example the image of an 8 has 2 enclosures, whereas the images of 6 and 9 have exactly 1 enclosure.
see images i supplied: the 7 has no enclosures, the 4 has 1 enclosure and the Q has 1 enclosure. How would i write a python function that calculates the amount of enclosures present in an image?
You can use breadth first search: iterate over all pixels of value 0, and for each pixel do a BFS to visit every neighbor of value 0. If the resulting region does not touch the edge of the image, increase the enclosure count by one. Always keep track of which pixels have already been visited so you can skip them.
You can also invert the image, perform connected components and exclude those that touch the edge of the image and then count what's left.
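The BFS version could look something like this. It's a sketch that assumes the image is already binarized into a list of rows with 0 for background and 1 for ink, and that strokes are thick enough that 4-connected background can't leak through them.

```python
from collections import deque

def count_enclosures(image):
    """Count background regions (pixels equal to 0) that do not touch the
    image border. `image` is a list of rows of 0/1 values, 1 = ink."""
    rows, cols = len(image), len(image[0])
    visited = [[False] * cols for _ in range(rows)]
    enclosures = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 0 or visited[r][c]:
                continue
            # BFS over this connected background region (4-connectivity).
            touches_edge = False
            queue = deque([(r, c)])
            visited[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_edge = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 0 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_edge:
                enclosures += 1
    return enclosures

eight = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(count_enclosures(eight))  # prints 2
```

For real scans you'd threshold the grayscale image first, and probably ignore tiny regions so that a few noisy pixels don't count as an enclosure.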
heyuhm
i started getting this new error from autoscraper, its not giving an output anymore, just empty list after scraping
you'd need some kind of vector encoding for each, which might be tricky.
i suppose you could use one transformer-based model to obtain "function vectors" and then another to work on high-level sequences thereof
This LSTM shows some fluctuation in the validation MSE loss but it's getting to the point where I'm happy with the loss. Should I be concerned?
Also the validation loss is lower than training, but from what I understand that's because of my dropout rate
someone can help me editing a google colab project, Whisper Youtube, to use insanely fast whisper instead of just whisper ?
does anyone here know tensor algebra? 👀
i'm trying to understand how to express arbitrary tensor contraction in terms of blas routines
for example, a tensor product between (2,3,4) and (3,4,5) can be expressed as a matrix multiplication, by reshaping the two tensors into matrices (2, 12) and (12,5), then multiplying these two matrices
but i can't quite figure out how to do it with a contraction like (2,3,4) and (5, 4, 6), where the contracted dimension is in the middle of the tensor...
any resources wrt that would also be appreciated
You are asking how the contra and covariant transformation properties are applied?
not exactly?
i'm trying to come up with a general algorithm to express tensor contraction in terms of BLAS routines (that is, only using matrix multiplication, vector products, tensor reshaping, transposing and such)
hmm...no zero shot solutions for you but i can point you in a direction that might help
invariant inner products between vectors and covectors under transformation is how i'd look at mapping tensor ops -> matrix ops
and if you want a general algorithm you'd want something that is valid regardless of the coordinate system
and for a general algorithm it might be interesting to explore if you could derive a calculus function that sums over the indices akin to an integral (sort of 🙂)
contraction is very similar to integration over a set of indices (or dimensions)
sry i couldnt be more help ☕
interesting 👀
check out Einstein Summation
what's kinda frustrating is that this algorithm already exists and is well known, bc this is how libraries like numpy and pytorch implement this
but how tho 😭
oh you want to know how it works?
i mean, i understand how einsum works, i'm implementing my own tensor contraction algorithm that way rn, but i don't understand how that would help expressing einstein summations in terms of blas routines
A = np.random.rand(2, 3, 4)
B = np.random.rand(5, 4, 6)
B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)
something like this?
this can be directly correlated to einsum
closely
does this give the same result as
numpy.tensordot(A, B, axes=[[2],[1]])
?
well yes
i don't know numpy well, so i'm kinda confused by the arguments of .reshape(), i thought the argument was just the desired shape?
in the case (beforce your example) we wanted to reshare the transposed tensor to align the 3rd dimension of A with the second dimension of B for contraction
reshape sorry
so the manual approach i shared gives more control
because in your case you're not explicitly specifying the reshaping and permutation of dimensions. tensordot performs a contraction over the specified axes, which are the axes along which you want to contract the tensors: [[2],[1]]
so 3rd dimension of A (index 2) and second dimension of B (index 1)
which exactly what the manual approach i shared does
if i had to guess (and i've never looked honestly) numpy.tensordot is probably more efficient 🙂
i see
but this does get me thinking
btw, the way numpy does it, reshaping things doesn't acc affect the data in any way, it's simply a reordering of dimensions and only has an effect on how you read/write to it, right?
ok... so hmm
if you wanted something more general you'd want to be able to handle any valid combination of contraction axes specified as pairs
and you'd want to magically (i say that word instead of automation) infer the shape and reshape and permutate without manual intervention
let me try something real quick
i already have an algorithm that does this, but it's pretty wonky and as a result very slow
which is why i'm trying to express it in terms of blas
sec let me open up jupyter
i want to benchmark a general case
i'll use your inputs if that works
btw, i checked your algorithm, and it gives a different result from just doing tensordot (unless i messed smth up)
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)
print(np.tensordot(A, B, axes=[[2], [1]]))
print(result_tensor)
thats yours right?
that's your code, i just changed A and B to have concrete values
to easily compare results between runs
yes
numpy must be doing smth more than just reshaping the tensors into desired matrix shapes
bc the following two examples of tensor contraction
(2,3,4)@(5,4,6) and (2,3,4)@(5,6,4) would be mapped into the same matrix product, but technically should have different results...
no 👀
my issue is that i can't yet express arbitrary tensor contraction product in terms of blas, but yes
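For what it's worth, here is a sketch of the general transpose + reshape + matmul recipe, checked against `np.tensordot`. The function name `contract_via_matmul` is made up for illustration. The key point is that `reshape` never moves an axis; it only reinterprets the existing element order. So the contracted axes have to be transposed to the end of the first tensor and to the front of the second before flattening. Reshaping a (5, 6, 4) array straight to (4, -1) scrambles the elements, which would explain a mismatch with tensordot:

```python
import numpy as np


def contract_via_matmul(a, b, axes_a, axes_b):
    """Contract a and b over paired axes using transpose + reshape + one matmul."""
    a = np.asarray(a)
    b = np.asarray(b)
    free_a = [ax for ax in range(a.ndim) if ax not in axes_a]
    free_b = [ax for ax in range(b.ndim) if ax not in axes_b]

    # Move contracted axes to the END of a and to the FRONT of b.
    a_t = np.transpose(a, free_a + list(axes_a))
    b_t = np.transpose(b, list(axes_b) + free_b)

    free_shape_a = [a.shape[ax] for ax in free_a]
    free_shape_b = [b.shape[ax] for ax in free_b]
    k = int(np.prod([a.shape[ax] for ax in axes_a]))

    # Flatten to 2-D so a single matrix product (GEMM) does the work.
    a_mat = a_t.reshape(-1, k)
    b_mat = b_t.reshape(k, -1)
    return (a_mat @ b_mat).reshape(free_shape_a + free_shape_b)


A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

# Contracted axis in the middle of B.
out = contract_via_matmul(A, B, [2], [1])
print(out.shape, np.array_equal(out, np.tensordot(A, B, axes=[[2], [1]])))

# Two contracted axes at once: (2,3,4) x (3,4,5) -> (2,5).
C = np.arange(3 * 4 * 5).reshape(3, 4, 5)
out2 = contract_via_matmul(A, C, [1, 2], [0, 1])
print(out2.shape, np.array_equal(out2, np.tensordot(A, C, axes=[[1, 2], [0, 1]])))
```

As far as I know this is essentially what `np.tensordot` does internally, so the only non-BLAS work is the transposes (which may force a copy when the result is reshaped).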
TensorLy is a pure tensor library w/ blas based contractions
i see
the goal of my project was to understand how to acc implement all those ops, so this kinda loses the purpose 😦
wdym
not the one i showed you only
huh, you two are new
welcome to our wonderful data science and AI chat
ty 🙂
this question does acc tie into ml, bc efficient tensor contraction lies at the core of tensor autodiff, so this is on topic lol
I didn't say it wasn't
also i have another fun idea to test
👀
uhh kinda busy tbh
since it can compile it down to machine code
Hi can anyone help me
numpy uses hyperoptimised blas routines, so even being 2x-3x slower than numpy is a success in my book tbh
you have to say exactly what it is that you need help with before anyone can even try
@topaz turtle im debugging some code.. my contracted tensor1 resulted slightly different dimensions than tensor2 😄
in my very general approach that i'm experimenting
tensor math not for the faint of heart..
Ok wow...
i really thought this idea was sound
ahh
seems like maybe its because i'm using numpy's matmul
on tensors that dont have compatible inner dimensions.
looks like long form math is the only way
Hey @topaz turtle you know how BLAS is a 4-letter acronym?
I came up with one for this contraction math long-form called TRIS 🙂
Transform, reshape, iterate, and sum
@topaz turtle Ok sorry it took a lot longer than I thought
coding a completely general blas transform was harder than I thought
@topaz turtle
import numpy as np
import time
###### falcon wings axis permutations to simulate blas w/ tensor contractions
###### v1 with numpy arrays
# 100,000 Iterations
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
contraction_axes = [[1, 2], [0, 1]]
# Tensor Contraction BLAS experiments
# TRIS - transform reshape iterate sum :)
def long_form_third_grade_math_tensor_contraction(tensor1, tensor2, contraction_axes):
    # we will use numpy for its efficient arrays but will implement the tensor math
    tensor1 = np.asarray(tensor1)
    tensor2 = np.asarray(tensor2)
    # T (Tris)
    permuted_axes1 = [axis for axis in range(tensor1.ndim) if axis not in contraction_axes[0]] + contraction_axes[0]
    permuted_axes2 = [axis for axis in range(tensor2.ndim) if axis not in contraction_axes[1]] + contraction_axes[1]
    tensor1 = np.transpose(tensor1, permuted_axes1)
    tensor2 = np.transpose(tensor2, permuted_axes2)
    # R (tRis)
    new_shape1 = tensor1.shape[:-len(contraction_axes[0])] + (-1,)
    reshaped_tensor1 = tensor1.reshape(new_shape1)
    new_shape2 = (-1,) + tensor2.shape[len(contraction_axes[1]):]
    reshaped_tensor2 = tensor2.reshape(new_shape2)
    result_shape = tensor1.shape[:-len(contraction_axes[0])] + tensor2.shape[len(contraction_axes[1]):]
    result_tensor = np.zeros(result_shape)
    # trIs (Iterate)
    for i in np.ndindex(result_tensor.shape[:-1]):
        for j in range(result_tensor.shape[-1]):
            sum_over_axes = 0
            for k in range(reshaped_tensor1.shape[-1]):
                # triS (Sum)
                sum_over_axes += reshaped_tensor1[i + (k,)] * reshaped_tensor2[(k,) + (j,)]
            result_tensor[i + (j,)] = sum_over_axes
    return result_tensor
start_time = time.time()
for letsgooo in range(100000):
    result = long_form_third_grade_math_tensor_contraction(A, B, contraction_axes)
general_hundy = time.time() - start_time
print(f"falcon slowbie: {general_hundy:.4f} seconds")
falcon slowbie: 11.3511 seconds
And now for numpy's
import numpy as np
import time
# Numpulous einstanamous
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
start_time = time.time()
for _ in range(100000):
    result_tensor_einsum = np.einsum('ijk,lmn->iln', A, B)
einsum_time = time.time() - start_time
einsum_time_str = f"einsum smasher: {einsum_time:.4f} seconds"
einsum_time_str
einsum smasher: 4.3604 seconds
dont forget to SMASH that einsum
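One caveat when comparing against the earlier tensordot call: in einsum notation the contracted axes have to share a letter. The string `'ijk,lmn->iln'` has no shared letters, so j, k and m are each summed independently, which is a different operation from `tensordot(A, B, axes=[[2], [1]])`. A quick check, using the same A and B as above (the variable names are just for illustration):

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

# Reusing the letter k ties A's last axis to B's middle axis.
via_einsum = np.einsum("ijk,lkn->ijln", A, B)
via_tensordot = np.tensordot(A, B, axes=[[2], [1]])
print(via_einsum.shape, np.array_equal(via_einsum, via_tensordot))

# With no shared letters, each dropped index is summed on its own,
# so the result factors into two separate sums.
independent = np.einsum("ijk,lmn->iln", A, B)
factored = np.einsum("i,ln->iln", A.sum(axis=(1, 2)), B.sum(axis=1))
print(independent.shape, np.array_equal(independent, factored))
```

So a fair timing comparison would probably want both sides computing the same (2, 3, 5, 6) result.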
Would anyone be interested in doing a deep learning project with me it will be a self dataset building DL ai dm me if interested 😁
Any ideas how to start with machine learning? Maybe learn pandas, flask, numpy first?
pandas is good for a simple up to complex dataset
i would suggest tensorflow in my opinion
https://roadmap.sh/ai-data-scientist
just check off all the boxes on this site and you'll be good to go
pytorch imo
that one is a little more complex. to start, go simpler
I've finished setting up the workflow and all the aws stuff, Im now waiting for amazon to accept my quota increase request so I can connect everything up
that is github actions
the actions workflow is gonna babysit the gpu spot instances, and the spot instances are gonna train GPT and send the info to MLFlow, I deployed it on the free tier aws instance
is this your whisper thing?
yes, I'm setting up the infra for doing experiments, there's a couple things I wanna try out regarding a possible modification of the self attention mechanism
nice gl
ty
@topaz turtle was that helpful...
I want to create a project to add to my portfolio since it's empty right now.
I'm new to data science and know very basic python. Any suggestions on what kind of projects I could do? Or if you know of any examples, that'd be helpful too!
Also, should I do it on VS Code or Google Colab? Those are the two platforms I'm familiar with
how are you handling interruptions to the spot instance? saving the model at every epoch?
I found this thing: https://github.com/spotty-cloud/spotty
It automates all the details
ideally, pick something you already have an interest in (gaming, cooking, sports, whatever) and look for interesting questions to ask, and data that you can use to answer those questions. once you have a question to answer and some data to work with, you can go in pretty much any direction from there
it doesn't have to be super clever or detailed. but all successful real-world projects start with a question to answer and some data to work with
Upon setting it up, it uploads code to a bucket it creates automatically. I reckon it will also cache docker checkpoints or something like that
if you're stuck for ideas, check out kaggle. for example the Titanic dataset and the associated survival classification task is an excellent beginner project
But haven't yet been able to use it because AWS defaults the quota for GPU instances to 0, so I'll have to wait 24h til they increase it
nice, let me check this out. might need it for work some day
Def worth, spot instances are a loooot cheaper
fortunately until now i've been able to do it all right on my company macbook
pretty powerful machine, and even does gpu albeit much slower than an actual gpu
That's cool, I've been relying on kaggle and colab for free GPU. But I reckon that once I start doing large text datasets + larger gpts this will be important.
sorry, i got quite drunk 😭😭😭😭😭😭😭😭
i’ll check it tmr 🥰🥰🥰🥰
wtf
it’s a thing that ppl do? 😭😭
hey, i was wondering if there's a resources page for this topic
i mean data sc/analyis
check the pinned messages
Has anyone ever used matplotlib for 3d plot ? I have two planes in 3d and the dot product of their normal vectors is literally 0, and legendary library plot them to be legit parallel, i am so confused..
Hi, anyone have experience with deep q reinforcement learning? I have a few questions

("asking to ask" creates extra steps that decrease the chances your question will ever get answered.)
thanks for the advice!
How does a Deep Q Network for reinforcement learning learn at all?
To my understanding, you have a minibatch of random experiences, where each experience is in the format (current_state, action, reward, next_state). The way a DQN works is that you pass in the current game state (current_state) and get an estimate of the q-values for each action. Then you choose the q-value of the action you took and store it in a variable. Let's call it current_q_for_action_taken for now.
Then, with a separate target network, we pass in next_state and get q_values for each action. We can then calculate target_q_for_action_taken as follows: target_q_for_action_taken = reward+max(q's from target network)
Then, we can calculate the MSE loss for updating the other model (not the target model) as follows: (target_q_for_action_taken-current_q_for_action_taken)^2.
Then, using gradient descent and backpropagation, we can update that network's weights and biases (I think it updates biases)
then, every n steps we transfer the weights from this network over to the target network
my question is: how does all this allow this network to estimate an optimal q-function? It seems to me that the network will just flail around randomly adjusting weights and biases but never learning the optimal q-function to accurately map current game states to accurate q-values
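One way to see why this isn't random flailing: the reward term in the target is real information from the environment, and each update pulls Q(s, a) toward reward + (discounted) best estimate for the next state. Values near rewarding or terminal states get anchored first and then propagate backwards. Below is a toy sketch, not a real DQN: a lookup table stands in for the network, the 4-state chain environment is invented for illustration, and it assumes the usual discount factor gamma (0.9 here) and no bootstrapping past terminal states, neither of which appears in the description above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, lr = 4, 2, 0.9, 0.5
terminal = n_states - 1


def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, terminal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == terminal else 0.0
    return reward, next_state, next_state == terminal


q_online = np.zeros((n_states, n_actions))  # stands in for the trained network
q_target = q_online.copy()                  # stands in for the target network

for update in range(2000):
    # Sample a random experience (state, action, reward, next_state, done).
    s = rng.integers(0, terminal)  # non-terminal states only
    a = rng.integers(0, n_actions)
    r, s_next, done = step(s, a)

    # TD target from the frozen copy; no bootstrap past a terminal state.
    target = r + (0.0 if done else gamma * q_target[s_next].max())

    # Gradient step on 0.5 * (target - Q(s, a))**2 with respect to Q(s, a).
    q_online[s, a] += lr * (target - q_online[s, a])

    if update % 20 == 0:
        q_target = q_online.copy()  # periodic sync

print(np.round(q_online[:terminal], 3))
```

For this chain the true values for "move right" are 0.81, 0.9 and 1.0 from states 0, 1 and 2, and the table should settle close to them even though every experience was sampled at random. A neural network replaces the table with a function approximator, which is where the instability (and the need for the target network) comes from.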
What does it mean if when you split your training data into periods and the vector of the fit parameters over time have significant autocorrelation/time series dynamics. Does it mean you should fit a per-period model on some arbitrary periodization of your training data or should you try to find the latent variable that explains the time series dynamics and make one model for all periods?
for example, the optimal value of one parameter seems to oscillate back and forth quite predictably from one period to the next the way I'm splitting up the data
I’ve done a lot of research and the answer has pretty much been a toss up. PyTorch vs Tensorflow what is the best to learn for a beginner in the Machine Learning field (fully fluent in Python already)
I am looking to develop a model that can recognize a pattern in a database of a food journal. My dad, who had cancer, still has stomach issues, and he has 2 years' worth of food entries and bowel movements. I am trying to develop a model that can tell him which foods he can and cannot eat based on the log.
It seems that I should be using a classification model, but I am not sure what the best way/how to approach this general solution
I don't think most people care that much. but no one really talks about perceptrons except when discussing the history of AI.
It's contradicting itself. You can bring this up as an edit to the Wikipedia page.
Isn't the Heaviside a non-linear function ?
a linear function is a polynomial of degree one or less, including the zero polynomial
which is not the case for the step function, doesnt fit into ax+b
https://en.wikipedia.org/wiki/Piecewise_linear_function - this would be the correct terminology I think
In mathematics and statistics, a piecewise linear, PL or segmented function is a real-valued function of a real variable, whose graph is composed of straight-line segments.
interestingly, your algorithm is as fast on my laptop as numpy's 👀
Yea it's pretty good
I'm going to implement it in C later
as I actually have some ideas for it
ooh interesting
for transformer math
acc the reason i asked in the first place is bc i need it for a C library i'm writing lol
at the moment I'm finishing up some code that instantly analyzes a models layers
an open source ones:D
btw, there is a potential problem w/ testing on such small tensors
when i tested my own algorithm, i learned that for small tensors there's a very small disparity in performance, but once i tested on tensors with ~million elements, it revealed that the time disparity in reality is acc like a 100 😭
just an example
does anyone know if libraries like numpy acc move data around when doing a transpose (to preserve linearity of data access)
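A quick way to check this for NumPy specifically: a transpose only swaps the strides and returns a view on the same buffer, and data gets copied later only if an operation needs a layout the strides can't express (for example, flattening transposed axes). A small sketch:

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
s = a.itemsize

t = np.transpose(a, (0, 2, 1))
print(np.shares_memory(a, t))     # True: same buffer, no data moved
print(a.strides == (12 * s, 4 * s, s))
print(t.strides == (12 * s, s, 4 * s))   # only the strides were swapped
print(t.flags["C_CONTIGUOUS"])    # False: reading t row by row jumps around in memory

# Reshaping the contiguous original is still a view...
flat_view = a.reshape(2, 12)
print(np.shares_memory(a, flat_view))    # True

# ...but merging the transposed axes cannot be described by strides, so NumPy copies.
flat_copy = t.reshape(2, 12)
print(np.shares_memory(a, flat_copy))    # False
```

So in the transpose + reshape + matmul recipe, the actual data movement tends to happen at the reshape (or inside the matrix product), not at the transpose itself.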
I started with TensorFlow because it appeared somewhat more beginner-friendly to me, however, since I tried PyTorch I never went back to TensorFlow.
Idk how to explain it lol but PyTorch gives you this liberty to explore and navigate your ship however you deem fit. So long as you enjoyed OOP when you learned Python, you will most likely enjoy PyTorch.
More so, you could as well leverage PyTorch Lightning; which is more like a wrapper for PyTorch models, to even go brrrrrrrrrrrrrr while driving your ship.
I'd say, just start with any one framework that comes easy to you. It doesn't even have to make sense why you pick one over the other, just get started already.
I'm sorry to hear about your Dad's stomach issues. Hopefully, what you're trying to build helps him get better.
Goodluck 💯
There's an error in their definition.
Perceptron uses a threshold function which is a linear function. It becomes MLP when you introduce at least one hidden layer and the activation function changes from a threshold function to a non-linear activation function like ReLU, SWISH, Phish, Tanh etc.
Ive had much better experience w pytorch as well. Have you ever tried building tensorflow from source w tensor rt and cuda w clang? I have. Its not pleasant.
I think calling the threshold function linear is a misnomer in and of itself, especially because ReLU is being considered non-linear too. They are both piecewise linear, but neither is strictly linear
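A linear map has to satisfy f(a + b) = f(a) + f(b), and a single counterexample is enough to rule that out. A tiny check (the helper names are made up for illustration):

```python
import numpy as np


def heaviside(x):
    """Threshold / step function: 1 for x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)


def relu(x):
    return np.maximum(x, 0.0)


def scale_by_three(x):
    return 3.0 * x


def is_additive_at(f, a, b):
    """Check f(a + b) == f(a) + f(b) for one pair of inputs."""
    return bool(np.isclose(f(a + b), f(a) + f(b)))


a, b = 1.0, -2.0
print(is_additive_at(heaviside, a, b))       # False: step(-1) = 0 but step(1) + step(-2) = 1
print(is_additive_at(relu, a, b))            # False: relu(-1) = 0 but relu(1) + relu(-2) = 1
print(is_additive_at(scale_by_three, a, b))  # True
```

Both the step function and ReLU fail the test at the same point, which matches the "piecewise linear, but not linear" description.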
Training going. Time to go chill:)
this thing ended up not working out, I tried using the t2.micro just to get it going, it starts the spot instances and takes care of a lot of stuff, but then it gets stuck on some unknown signal that google can't find.
im just gonna code a simple master-worker setup myself using boto3, I can use it to start the spot instance and run a script that starts the training process, the worker will drop messages to an aws queue. spot instances give a 2 min warning, which I can use to save state and restart where I left off on some other spot instance
Yeah that's true. You're right
y does stuff that sound simple always turn out to be complicated >.>
need to rethink this
to be fair to tf, pytorch isn't all that great in that regard as well
for example, last time i checked, if you wanted pytorch w/ vulkan support, you had to build it from source, not to mention that the only guide/documentation wrt that is a pytorch blogpost from 3 years ago
If I'm just probing to see whether a reinforcement algorithm might have merit, how many timesteps should I invest in training before deciding whether the problem is suitable or not
doing anything that involves nvidia driver or cuda toolkit dependencies from source sounds like nightmare fuel
I'm lucky if my nvidia-smi wants to come out to play
instead of giving me the NVML mismatch error
does anyone else get that joy of an error of libcupti not being found so you need to install a 1.x version of torch?
question: if my spot instance goes down, should I restart from the last epoch or from the last best epoch ?
given your seed won't it just go in the same direction again
no, the batches are usually randomized, even with the same seed it wouldn't go in the same direction since it's being initialized at a different starting epoch
finally got my first successful master-worker run
this was hard work ngl
now I gotta make it fault tolerant
how do you debug if your results are actually different every time you run it
the right pattern to use is to have mlflow store the state and then draw decisions from that
I want to explore state space, randomness is usually a good feature since it ensures that it is properly explored
this is only for contracting axes at the end of the first tensor and at the start of the second tensor? a la (2,3,4) x (3,4,5)?
right?
tho I can see the value of being able to exactly reproduce the results
not entirely sure how it could be done with this setup tho
guess I'd need to decide on the batches for all epochs a priori and save it somewhere
im going for last epoch then
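On the reproducibility point, one option that avoids storing every batch up front is to derive each epoch's shuffle from (base seed, epoch number), so a resumed run regenerates exactly the batches it would have seen. A sketch, assuming the checkpoint records the epoch number; `epoch_batches` and the sizes below are hypothetical:

```python
import numpy as np


def epoch_batches(base_seed, epoch, n_samples, batch_size):
    """Shuffled index batches for one epoch, determined only by (base_seed, epoch)."""
    rng = np.random.default_rng([base_seed, epoch])
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]


BASE_SEED, N_SAMPLES, BATCH_SIZE = 1234, 10, 4

# Uninterrupted run: epochs 0..4.
full_run = {e: epoch_batches(BASE_SEED, e, N_SAMPLES, BATCH_SIZE) for e in range(5)}

# Run interrupted after epoch 2, resumed on another instance from the checkpoint.
resume_from = 3
resumed = {e: epoch_batches(BASE_SEED, e, N_SAMPLES, BATCH_SIZE) for e in range(resume_from, 5)}

same = all(
    np.array_equal(x, y)
    for e in resumed
    for x, y in zip(full_run[e], resumed[e])
)
print(same)
```

This only covers data order. Model weights, optimizer state and any other random state (dropout, augmentation) would also need to be checkpointed for a resumed run to match exactly.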
i am trying to load a scikit learn model from 2019 via joblib, but i am getting errors (probably because i am using newer python version now than what it was created it in) I have tried setting up a docker image and replicate environment from then, but that gave me other errors (some opencv errors). Can anybody help me load this model into a modern version of python and scikit learn? I can pay for it
Hello everyone.
I wanted to ask, does anyone know any tutorial/course machine learning, where they teach how to further train and improve a model in the last remaining percentages of the loss error. I struggle finding even one on YouTube or a literature about techniques or things to do to improve it further, other than to train it longer or change the learning rate.
Show the error that you currently want help with.
Don't offer money.
The reason you're not finding resources to answer that question is that there are no broadly applicable answers. Look at the instances that your model isn't learning and think about why that might be.
Thanks for responding.
Does it have to do with the data itself, or maybe there are too many outliers in the data? When I compare some of the predicted values with their true targets, some are 5 times smaller (when I'm supposed to get 15K it gives me 3K) and some are 2 times bigger than their true targets.
I normalized the features to the same range at first, and that didn't help. Then I normalized the targets to the same scale as the features to avoid exploding gradients, but I'm still having this issue.
You also know that numpy doesn't support unlimited dimensionality either right?
i think it's like 64 tho, isn't it?
After 32 dimensions, many of numpy's functions hit a wall
i mean ye, but i'm still trying to generalise this to at least 4 or 5 😭
you know how right?
not really
well the more i think about it the more i'm thinking my idea is pretty unique
I havent actually seen another example of doing it that way but not saying it hasnt been done
i'm exploring the idea of skipping the expense of the transposition step by performing scattered data access on sub-tensors of each tensor that are large enough to fit into L1 cache (so the non-linear data access due to the transposition of the axes doesn't incur any performance overhead)
yeah that would work