#data-science-and-ml


blazing vale
#

here

small wedge
#

what you showed should be a way to do it

>>> t.tensor(0.4)
tensor(0.4000)
buoyant vine
#

FacePalm yeah that would work

#

I kept trying torch.Tensor

small wedge
#

ah lol

#

hm

#

what are you trying to sum here?

blazing vale
#

like how much total sales of each genre is

#

i wanna add global column of every row which has genre as 'Action'

small wedge
#

so you want the number of records right?

#

what about .count

blazing vale
#

nope

#

i wanna add total sales

#

lemme show u

small wedge
#

do you have a sales field?

blazing vale
#

yes

#

see this

#

Global is total sales

#

over all regions

#

i wanna add global sales of all the games which fall under the genre'Action'

small wedge
#

oh

#
print(df[df['Genre']=='Action']['Global'].sum())
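
The one-liner above totals a single genre. As a minimal sketch, assuming a frame with `Genre` and `Global` columns like the one described (the rows below are made up for illustration), `groupby` gives the total for every genre in one pass:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the conversation.
df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Global": [10.0, 4.5, 2.5, 1.0],
})

# Total for one genre: boolean mask on Genre, then sum the Global column.
action_total = df.loc[df["Genre"] == "Action", "Global"].sum()

# Totals for every genre at once.
totals_by_genre = df.groupby("Genre")["Global"].sum()

print(action_total)
print(totals_by_genre)
```

The `.loc[mask, column]` form does the same thing as chaining `df[mask][column]`, but selects rows and column in a single step.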
blazing vale
#

oh wait we have been equating wrong column all this time

#

this will actually work lol

#

working

#

checks out too

#

thankss

#

136.85 Billion Dollars

#

nice:))

blazing vale
blazing vale
#

U there?

small wedge
serene scaffold
#

@blazing vale please don't call out specific people. ask a complete question that anyone who knows the answer could start answering.

blazing vale
#

Line 167,166,128

#

Can you tell me whats the issue with em?

serene scaffold
blazing vale
#

The output is at the end of the code

#

Its giving me blank

#

No output for the condition on line 167

#

Sir nedbat told me there is smthg wrong with line 128. But i cant figure it out

#

I have made it so complex i can’t figure out myself what wrong i am doing 💀

#

i asked chatgpt. lol it figured out easily. i am so dumbbb

#

it was a minor mistake

agile jackal
#

what hugging face training data will make anime or cartoonlike art? I have dreamlike-anime but it is not often accurate. I am familiar with safetensors

fresh eagle
#

has someone played with the new version of midjourney?

outer widget
candid spruce
#

Would anyone be interested in doing a deep learning project with me it will be a self dataset building ai dm me if interested 😁

long canopy
#

anyone here ever deal with interactive graph visualization? e.g., the ability to zoom in or zoom out, or obtain info by hovering over a node

timid kiln
#

I found an article the other day that described different methodologies of how output data could be organized, and now I can't find that article. Perhaps you all can help me figure out the appropriate keywords so I can find it again.

The two concepts were one that involved many rows, as opposed to organizing output data into less rows, more columns. Probably be easier to show a couple output tables in csv format:

```
1,Pressure,50
2,Pressure,60
3,Pressure,70
1,Flow,10
2,Flow,20
3,Flow,30
```
As opposed to:
```
Scenario,Pressure,Flow
1,50,10
2,60,20
3,70,30
```

I'm trying to figure out which is "better", what the pros/cons are for organizing data in either method, and so forth.  I know the answer to this is "it depends" but unless I know what terms to use to search on the internet I won't be able to learn anything about it.

Please let me know if this doesn't make sense and I'll do my best to clarify things more.  Also, please tag me if you respond so I'll get an alert.  Thank you!
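
The usual search terms for these two layouts are "long" (or "tidy") format for the many-rows version and "wide" format for the one-column-per-variable version. A minimal sketch of converting between them with pandas, using the numbers from the tables above; the column names `Variable` and `Value` for the long table are assumptions, since the first table has no header:

```python
import pandas as pd

# Long format: one row per (scenario, variable) pair.
long_df = pd.DataFrame({
    "Scenario": [1, 2, 3, 1, 2, 3],
    "Variable": ["Pressure"] * 3 + ["Flow"] * 3,
    "Value": [50, 60, 70, 10, 20, 30],
})

# Long -> wide: each distinct Variable becomes its own column.
wide_df = long_df.pivot(index="Scenario", columns="Variable", values="Value").reset_index()

# Wide -> long: fold the variable columns back into rows.
back_to_long = wide_df.melt(id_vars="Scenario", var_name="Variable", value_name="Value")

print(wide_df)
print(back_to_long)
```

Long format tends to be easier to filter, group, and extend with new variables; wide format tends to be easier to read and to feed into models that expect one row per observation.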
long canopy
#

i might learn javascript just for the libraries available for graph visualization

past meteor
spare junco
#

We are trying to analyse this huge dataset, which is supposed to predict the number of days for the treatment of a patient.

We have tried multiple models like GradientBoost, CatBoost, LinearRegression, LGBM, XGB etc.

We have also done feature selection including Mutual Info Feature Selection, Correlation, Variance Threshold. And other stuff like Normalization as well as Outlier Detection.

We have had the best results with Gradient Boost Regressor. We have done some hypertuning via GridSearchCV.

What is our next best step to reduce Root Mean Sq. error

The dataset looks like this 👇

past meteor
#

Imo it's something you can't do with summary statistics etc. I'd try and understand what you're working with first

#

This will lead you to feature engineering etc.

outer widget
past meteor
#

Regularization is also an option you can pursue if you have a lot of (useless) variables. In my experience Extra trees works well if you have many correlated predictors as well.

long canopy
#

literally nothing for interactive networks in python atm

#

am investigating whether the Gephi Toolkit is worthwhile and more efficient than javascript approaches; the toolkit uses OpenGL

blazing vale
#
```py
def maxs2(genre,region):
    print(df[(df['Genre']==genre)&(df[region]==df[region].max())])
```
#

returning empty dataframe

#

just like my brain

small wedge
#

max doesn't return a mask
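
To spell that out: `df[region].max()` is a single number, the maximum over the whole column across every genre, so the two conditions only overlap when the overall best seller happens to be in the requested genre. A minimal sketch of one fix, filtering by genre first and taking the maximum of that subset; the sample rows and the helper name `max_in_genre` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample rows; column names follow the output shown in the chat.
df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "North America": [3.0, 9.0, 5.0, 1.0],
})

def max_in_genre(df, genre, region):
    # Restrict to the genre first, then compare against that subset's own maximum.
    subset = df[df["Genre"] == genre]
    return subset[subset[region] == subset[region].max()]

# Original logic: the column-wide maximum (9.0) belongs to a Sports game,
# so no Action row can match and the result is empty.
original = df[(df["Genre"] == "Action") & (df["North America"] == df["North America"].max())]

fixed = max_in_genre(df, "Action", "North America")
print(original.empty)
print(fixed)
```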

blazing vale
outer widget
blazing vale
#

Maximum and Minimum Sales made by each Genre across each Region and ROW Sales

#

trying to do this now
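
One possible way to get the minimum and maximum for every genre and region in a single call, sketched on made-up rows and assuming the region columns are numeric:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the conversation.
df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Action", "Sports", "Sports"],
    "North America": [3.0, 5.0, 9.0, 2.0],
    "Rest of World": [0.5, 1.5, 0.2, 0.8],
})

regions = ["North America", "Rest of World"]

# One row per genre, one (region, statistic) column pair per region.
stats = df.groupby("Genre")[regions].agg(["min", "max"])
print(stats)
```

The result has two-level column labels, so a single value is read with a tuple, for example `stats.loc["Action", ("North America", "max")]`.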

small wedge
blazing vale
#

it returns empty dataframe 😭

#
Enter genre: Action
Enter Region: North America
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []
#

output

long canopy
red solar
#

hello everyone.

#

I'm interested in machine learning.

#

So, which one help me?

outer widget
feral kernel
#

Why does everyone use gradient descent with backpropagation for determining weights when there are so many other faster ways: simulated annealing, batched quasi-Newton methods, genetic algorithms, random weights plus some adjusting, and other metaheuristics? Is it because it is easier, has more papers on it, is easier to scale, and is more stable? I feel like to progress even more in ML, people need to move beyond normal backpropagation to a new architecture.

spare junco
valid wind
#

why does BERT base uncased need 12+ GB GPU memory to train

#

when using a standard AdamW optimizer, doesn't it represent each parameter as 8 bytes

#

so 110 Million Parameters x 8 bytes, 880 MB, + 1x for gradient + 1x for optimizer state
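
A rough back-of-the-envelope version of that estimate, assuming plain fp32 training (4 bytes per value rather than 8) and AdamW keeping two moment buffers per parameter. The 110 million figure is the one quoted above; everything else is an assumption:

```python
# Approximate parameter count quoted in the chat for bert-base-uncased.
params = 110_000_000
bytes_per_value = 4  # fp32, assumed

weights = params * bytes_per_value
gradients = params * bytes_per_value
adam_states = 2 * params * bytes_per_value  # first and second moment estimates

total_gb = (weights + gradients + adam_states) / 1e9
print(f"weights + gradients + optimizer state: {total_gb:.2f} GB")
```

Under these assumptions the model-side state is well under 2 GB, so most of a 12 GB footprint would have to come from activations saved for the backward pass (which grow with batch size and sequence length), plus framework overhead and allocator caching.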

serene scaffold
#

@valid wind show your training code

#

and in particular, when you move the training data onto the gpu

valid wind
#

yeah let me show it

#
tokenizer=transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
model=transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
train_dataset = datasets.Dataset.from_pandas(train_df, preserve_index=False)


def tokenize_texts(texts):
    global tokenizer
    q1rows=texts['question1']
    q2rows=texts['question2']
    return tokenizer(q1rows, q2rows, truncation=True)



tokenized_data = train_dataset.map(tokenize_texts, batched=True)
cast_features = tokenized_data.features.copy()
cast_features['is_duplicate'] = ClassLabel(num_classes=2, names=['not_duplicate', 'duplicate'], names_file=None, id=None)
tokenized_data=tokenized_data.remove_columns(['question1','question2', 'id', 'qid1', 'qid2'])
tokenized_data=tokenized_data.rename_column('is_duplicate', 'labels')
tokenized_data=tokenized_data.train_test_split(test_size=0.2)
training_args = TrainingArguments("./quora-bert", evaluation_strategy="epoch", save_strategy='no', report_to='none', num_train_epochs=3, per_device_train_batch_size=32, per_device_eval_batch_size=32)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
serene scaffold
#

please edit to include a py on the first line with the backticks

#

```py
code
```

#

ty

valid wind
#

sorry I forgot to add that

serene scaffold
#

@valid wind
you don't need global to read the global scope. only to write to it. so global tokenizer is unnecessary.

try lowering the batch size and see if you don't run out of GPU memory.

valid wind
#

no I don't run out of GPU memory training this, when using 16GB, however, when I run my own training loop, with even 32 batch size this only requires 12 GB around of memory

serene scaffold
#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

valid wind
serene scaffold
#

looks like your paste messed up the indentation.

valid wind
#

yep, fixing it now

#

but it might be just because I'm clearing my cuda cache

#

this should be better

serene scaffold
#

gotta do a work thing. I should be back in like ten minutes

valid wind
serene scaffold
#
         # Run the forward pass of the model
         logits = model(
             input_ids=batch['ids'].to(device, dtype=torch.long),
             attn_mask=batch['mask'].to(device, dtype=torch.long),
             pred_indicator=batch['pred'].to(device, dtype=torch.long),
         )
         loss = loss_function(logits.transpose(2,1), batch['targets'].to(device, dtype=torch.long))

@valid wind this potentially saves memory because it never creates extra references to all those cuda tensors in that scope

#

you also have a syntax error on line 33

valid wind
#

yeah it was an error while pasting it or something

#

that's not in the original code

long canopy
#

how come no one suggested Bokeh to me

#

it does exactly what I need

desert oar
#

normally i'd recommend just 3 columns here ("long" format): genre, region, and sales. instead it looks like you have a separate column for sales in every region ("wide" format).

#

if you set up your data with indexes, you can do this as simply as df.loc[(genre, region), "sales"].max() or similar

#

but that requires setting up your data correctly, which requires you to share information about how you constructed this data
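
A sketch of what that could look like, assuming a wide table with one sales column per region. The sample rows and the `Region` / `Sales` names are made up for illustration:

```python
import pandas as pd

# Hypothetical wide table: one sales column per region.
wide = pd.DataFrame({
    "Game": ["A", "B", "C"],
    "Genre": ["Action", "Action", "Sports"],
    "North America": [3.0, 5.0, 9.0],
    "Europe": [2.0, 1.0, 4.0],
})

# Wide -> long: one row per (game, region) pair.
long = wide.melt(id_vars=["Game", "Genre"], var_name="Region", value_name="Sales")

# Index by genre and region so lookups become a single .loc call.
indexed = long.set_index(["Genre", "Region"]).sort_index()

best = indexed.loc[("Action", "Europe"), "Sales"].max()
print(best)
```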

desert oar
# spare junco We are trying to analyse this huge dataset, which is supposed to predict the num...

i second what zestar said. all the fancy machine learning algo stuff comes after you've done a thorough inspection and analysis of the data. you will also want to form a coherent understanding of how the data was collected and how these measurements were obtained. otherwise you're just flailing around, which doesn't actually work in most cases, despite the breathless hype of companies that want to sell you machine learning APIs, cloud compute, etc.

long canopy
#

how could I programmatically control the zooming/centering of a pyplot graph while it is actually being rendered? in a sort of REPL fashion

desert oar
blazing vale
#

So basically i am working on this big dataset which has 826 rows and 9 columns. i made a small search system using pandas to access rows according to user input. However suppose there are no matches. How can i make it in a way that instead of returning this output given below for no results it just prints "No results found"
```
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []
```

#

anyone?
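
One way to handle the no-match case is to check `DataFrame.empty` before printing. A minimal sketch on made-up rows; the helper name `search` is hypothetical:

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the conversation.
df = pd.DataFrame({
    "Game": ["A", "B"],
    "Genre": ["Action", "Sports"],
    "Global": [10.0, 4.5],
})

def search(df, column, value):
    matches = df[df[column] == value]
    # .empty is True when the filtered frame has no rows.
    if matches.empty:
        return "No results found"
    return matches.to_string(index=False)

print(search(df, "Genre", "Racing"))
print(search(df, "Genre", "Action"))
```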

left tartan
#

Scraping google most certainly violates their terms of use (something we can't help with). Have you considered one of their APIs? Or open street map?

desert oar
#

Note that Google maps specifically prohibits caching or storing their outputs for use in your own database

#

People do it all the time of course, but as per server rules we officially are not allowed to help with anything resembling that

#

what kind of data specifically? maybe open street map / nominatim or geonames can help

#

i mean, if you're just looking to practice, you can probably use google maps, yelp, foursquare, bing, facebook, etc

#

openstreetmap has some of that but it depends on volunteer input

#

well, writing a python app to fetch data from an api is good practice. not related to the channel topic, but good practice all the same

outer widget
rugged mist
#

i have an np ndarray B of shape (N, K, K), and a of shape (N,)
i want to find the weighted sum of the N KxK matrices in B, where a stores the weights
np.dot(B.T, a).T seems to work on its own but if i decorate the function its in with numba.njit i get this error

Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function dot>) found for signature:
 
 >>> dot(array(float64, 3d, F), array(float64, 1d, C))

im guessing its because B.T becomes fortran style because of the transpose, can i make np.dot work without the transpose or do i need to write a loop myself

#

minimum repro

import numpy as np
from numba import njit

@njit  # works if this is removed
def f():
    a = np.ones((10,))
    B = np.ones((10, 30, 30))
    return np.dot(B.T, a).T

print(f())
outer widget
rugged mist
#

sad

untold bloom
#

was np.einsum("i,ijk->jk", a, B) too slow?

rugged mist
#

looks like numba doesnt support einsum
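
Two ways to compute the weighted sum without transposing `B`, sketched in plain NumPy and checked against the `einsum` result. The explicit loop uses only basic array operations, and the reshape version stays within a 1-D by 2-D `np.dot`, so both should be candidates for `numba.njit`, but that is an assumption and is not exercised here:

```python
import numpy as np

def weighted_sum_loop(B, a):
    # Accumulate a[n] * B[n] over the first axis.
    out = np.zeros((B.shape[1], B.shape[2]))
    for n in range(B.shape[0]):
        out += a[n] * B[n]
    return out

def weighted_sum_reshape(B, a):
    # Flatten each KxK matrix into a row, take one vector-matrix product,
    # then restore the KxK shape. B must be C-contiguous for a cheap reshape.
    n, k1, k2 = B.shape
    return np.dot(a, B.reshape(n, k1 * k2)).reshape(k1, k2)

rng = np.random.default_rng(0)
a = rng.normal(size=10)
B = rng.normal(size=(10, 30, 30))

expected = np.einsum("i,ijk->jk", a, B)
print(np.allclose(weighted_sum_loop(B, a), expected))
print(np.allclose(weighted_sum_reshape(B, a), expected))
```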

hollow flicker
#

I have a dataset. For example, it contains age, money, job, salary, etc. How can I give each user a score between 0 and 1? Unfortunately I don't have target data

radiant cipher
#

hi,anyone aware of a pre-implemented fast way to have numpy materialize a element path, if one array column is the parent indexes

aka given something like

tree_array = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9]

when asking for the path of the index 9, returning [4, 1, 0] and when asking for 19 it's [9, 4, 1]

it's an easy python loop but i'm working with about a dozen million elements and i would like to have a pipeline of extract path, get indexes of another array based on the path and return that - i can't find a premade implementation of that type of tree walking
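
Not a premade implementation, but a sketch of how the walk could be batched with NumPy so that many queries advance one level per step. It assumes the root is the node that is its own parent (index 0 here); the function names are made up. Note that with this array the full path for 19 comes out as [9, 4, 1, 0], so the [9, 4, 1] above presumably leaves off the root.

```python
import numpy as np

tree_array = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9])

def path_to_root(parents, index):
    # Plain loop for a single query: follow parent links until the root.
    path = []
    node = index
    while parents[node] != node:
        node = int(parents[node])
        path.append(node)
    return path

def ancestors_batch(parents, indices):
    # One fancy-indexing step per tree level, for all queries at once.
    # Rows are queries, columns are successive ancestors, padded with -1.
    parents = np.asarray(parents)
    current = np.asarray(indices)
    columns = []
    while True:
        nxt = parents[current]
        moved = nxt != current
        if not moved.any():
            break
        columns.append(np.where(moved, nxt, -1))
        current = nxt
    if not columns:
        return np.empty((current.size, 0), dtype=parents.dtype)
    return np.stack(columns, axis=1)

print(path_to_root(tree_array, 9))
print(ancestors_batch(tree_array, [9, 19]))
```

The batched version does as many passes as the tree is deep, so it pays off when the tree is shallow relative to the number of queries.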

red oriole
#

Do any of you guys know how to setup a local dbpedia triple store?

serene scaffold
#
Optional specification for standardization to apply to the input text. Values can be:
None: No standardization.
"lower_and_strip_punctuation": Text will be lowercased and all punctuation removed.
"lower": Text will be lowercased.
"strip_punctuation": All punctuation will be removed.
Callable: Inputs will passed to the callable function, which should be standardized and returned.
outer widget
#

Keras TextVectorization does have a parameter named standardize. By default it's set to standardize='lower_and_strip_punctuation'
You can set it to None to keep the same case

#

Sorry missed Stelercus response.

serene scaffold
radiant cipher
serene scaffold
final kiln
#
class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    sentence = sentence @ self.lm_weights


    # last bit ... 

    return sentence

I'm almost done !! only the logits thing left

#

after this im gonna teach it to sort letters

final kiln
#
class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    logits = sentence @ self.lm_weights

    max_values, max_indices = logits.max(dim=2)
    shifted = logits - max_values.unsqueeze(2)
    exponentiated = torch.exp(shifted)
    return torch.sum(exponentiated, dim = 2)

aight imma train it now

final kiln
#
class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    logits = sentence @ self.lm_weights

    max_values, max_indices = logits.max(dim=2)
    shifted = logits - max_values.unsqueeze(2)
    exponentiated = torch.exp(shifted)
    probs = exponentiated / torch.sum(exponentiated, dim = 2).unsqueeze(2)
    return probs



params = ModelParameters(
  # The dimension of a vector embedding
  coordinates = 3*1000,

  # The number of tokens in the vocabulary
  tokens = 3,

  # The maximum number of words in a sentence (context window)
  words = 10,
)



def generate_data(batches = 2000):
  for _ in range(30):
    sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
    sorted_matrix, sorted_indices = torch.sort(sequence, dim=1)  # Sort along columns
    encoding = torch.eye(params.tokens).to("cuda")
    yield sequence, encoding[sorted_matrix]

nanoGPT = NanoGPT(params).to("cuda")
optimizer = torch.optim.Adam(nanoGPT.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
torch.autograd.set_detect_anomaly(True)
for epoch in range(100):
  nanoGPT.train()

  for batch, targets in generate_data():
    optimizer.zero_grad()
    inputs = batch.to("cuda")
    outputs = nanoGPT(inputs)
    loss = loss_function(outputs, targets.to("cuda"))
    loss.backward()
    optimizer.step()
  print(loss)
#

is not learning very well 🙃

#

i got it i got it, wasn't actually creating batches:

sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
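
A separate thing that may be worth checking in the training loop above: `nn.CrossEntropyLoss` applies log-softmax internally, so it expects raw logits rather than probabilities, and for inputs with extra dimensions it expects the class dimension at position 1. A sketch with made-up shapes and random logits standing in for the model output, using class-index targets instead of one-hot rows:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, words, tokens = 4, 10, 3

# Stand-in for the model output: raw scores, no softmax applied.
logits = torch.randn(batch, words, tokens)

# Targets as class indices: the sorted input sequence, shape (batch, words).
sequence = torch.randint(0, tokens, (batch, words))
targets, _ = torch.sort(sequence, dim=1)

loss_function = nn.CrossEntropyLoss()

# Option 1: flatten so every word position is one classification example.
loss_flat = loss_function(logits.reshape(-1, tokens), targets.reshape(-1))

# Option 2: move the class dimension to position 1, giving (batch, tokens, words).
loss_transposed = loss_function(logits.transpose(1, 2), targets)

print(loss_flat.item(), loss_transposed.item())
```

Both options compute the same mean loss; the second matches the `logits.transpose(2,1)` pattern used earlier in this channel.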

onyx forge
#

Hello I just got into AI & genetic learning so I made this messy rushed prototype for testing & getting a hold of the basics but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it, could someone take a look & point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ

feral kernel
#

Hey , why does pytorch nn conv2d model keep telling me too many values to unpack, i changed to code to accept 4 channels , but it still says expected two channels

whole zephyr
feral kernel
#
```py
import numpy as np
import torch
import torch.nn as nn

model = BesselCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01)

# Define the custom loss function for complex numbers
def custom_complex_loss(input, target):
    input_magnitude = torch.abs(input)
    target_magnitude = torch.abs(target)
    loss = torch.mean((target_magnitude - input_magnitude) ** 2)
    return loss

def train_model(early_stopping=True, epochs=50):
    epoch_train_losses = []
    epoch_test_losses = []
    current_best_loss = np.inf
    early_stopping_counter = 0
    early_stopping_patience = 2

    for epoch in range(epochs):
        for i, batch in enumerate(train_loader, 0):
            real_inputs, imag_inputs, real_labels, imag_labels = batch
            inputs = torch.complex(real_inputs, imag_inputs)
            labels = torch.complex(real_labels, imag_labels)

            optimizer.zero_grad()
            outputs = net(inputs)

            loss = custom_complex_loss(outputs, labels)
            loss.backward()
            # Note: torch.optim.LBFGS.step() requires a closure argument
            optimizer.step()
```
feral kernel
# feral kernel ```model = BesselCNN() criterion = nn.CrossEntropyLoss() optimizer = torch.optim...

```
Cell In[97], line 51
     48                     break
     50 # Assuming net, train_loader, test_loader, optimizer, and device are defined
---> 51 train_model(early_stopping=True, epochs=50)

Cell In[97], line 21, in train_model(early_stopping, epochs)
     19 for epoch in range(epochs):  
     20     for i, batch in enumerate(train_loader, 0):
---> 21         real_inputs, imag_inputs, real_labels, imag_labels = batch
     22         inputs = torch.complex(real_inputs, imag_inputs)
     23         labels = torch.complex(real_labels, imag_labels)

ValueError: too many values to unpack (expected 4)
```
feral kernel
whole zephyr
#

ok, so you can try to do print(type(batch)) and print(len(batch))

#

then print(type(batch[0])) and len(batch[0])

#

maybe batch contains some iterables like list or tuple or something

#

you need to see what your train_loader does to your data and see the shape of its outputs or something

#

@feral kernel basically the whole process of debugging this is to go to the source of your data (in your case, batch) and see how it is processed and why is it the shape it is

do some detective work and like go back on the stages of your data flow, see what happened at every stage with a print at its end - that's what I do to identify problems and their causes when I debug my code

feral kernel
feral kernel
whole zephyr
#

The error message "too many values to unpack (expected 4)" happens when the thing you're unpacking holds more elements than the number of names on the left-hand side.

#

so your train_loader doesn't do what you would want it to do

#

so rather than reshaping the batch, see what your train_loader does

feral kernel
whole zephyr
#

coz gpt is a funny guy, it's generative - i.e. creative

#

creativity is not always accuracy so yeah if it gets stuck in the same idea of bad code, it's over

#

is the train_loader a function or something?

#

and is it something from a library or custom defined?

feral kernel
#

It is the input after it has been resized. It is just some cfloat tensors

whole zephyr
#

that's the train_loader?

feral kernel
#

I'm not home, I need to check the code… the train_loader processes and loads the tensor

feral kernel
whole zephyr
#

see what the DataLoader does

#

maybe that's the issue

#

so basically the train_loader variable is a list or something

#

a list of lists or tuples with 2 elements

#

and you want it to be of 4 elements

#

where is the DataLoader function from? some pytorch component?

feral kernel
# whole zephyr where is the DataLoader function from? some pytorch component?

I'm not home, but I have some histories loaded.
import os
import torch
from torch.utils.data import Dataset

class CustomTensorDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.file_names = [f for f in os.listdir(root_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        file_path = os.path.join(self.root_dir, self.file_names[idx])
        tensor = torch.load(file_path)

        # Ensure the tensor has a channel dimension
        tensor = tensor.unsqueeze(0) if tensor.dim() == 2 else tensor

        if self.transform:
            tensor = self.transform(tensor)

        return tensor
whole zephyr
feral kernel
whole zephyr
feral kernel
whole zephyr
#

is it this thing?

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

whole zephyr
#

ok, so it doesn't seem to affect the shape of your input data, it just "slices" it

whole zephyr
#

so just to be sure, print the shape of one element from train_dataset and train_loader

#

and if the DataLoader doesn't mess with the shape, then your problem stems from the train_dataset, maybe your processing does not give you some shape 4 data or so

#

wait, I think training data might be the inputs only. so you forgot the labels lol

#

I overlooked that

#

so yeah, it's normal to unpack 2 values only.

#

anyway, see what your training data looks like - if it has both inputs and labels or whatever

feral kernel
#

I did print my train_loader. The labels variable is the same data as input.
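
If each dataset item is a single tensor, as in the `CustomTensorDataset` shown above, then each batch is one stacked array whose first dimension is the batch size, and unpacking it into four names iterates over that dimension. A sketch that reproduces the error with a NumPy array standing in for the batch; the shape and the batch size of 64 are assumptions:

```python
import numpy as np

# Stand-in for one batch: 64 complex-valued items, each of shape (1, 8, 8).
batch = np.zeros((64, 1, 8, 8), dtype=np.complex64)

try:
    # Unpacking iterates over the first axis, which yields 64 items, not 4.
    real_inputs, imag_inputs, real_labels, imag_labels = batch
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# When the batch holds only the complex inputs, the real and imaginary parts
# can be read directly instead of unpacked.
real_inputs, imag_inputs = batch.real, batch.imag
print(real_inputs.shape, imag_inputs.shape)
```

Getting labels out of the loader as well would need `__getitem__` to return them alongside the input, for example as a tuple.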

feral kernel
gloomy crow
#

any idea why there is a faint blue line in the background that vaguely resembles the dark blue line? what does it mean and how can i remove it? Graphed with seaborn using lineplot with only the training loss (val loss is not present)

crystal phoenix
#

what???

#

some sort of bug

serene scaffold
#

It's not a bug

crystal phoenix
#

call the function 20k times and store the results

#

it was working but when I reopened colab it stopped to work

serene scaffold
#

Lottery isn't a function. It's an array

#

Don't reuse names

crystal phoenix
serene scaffold
#

Maybe range is the array

crystal phoenix
#

i restarted the session and it worked

#

:P

#

weird

serene scaffold
#

Great
Don't use names of built-in functions and classes for arrays

#

Or anything else

crystal phoenix
#

i haven't done that

serene scaffold
#

But with notebooks, deleting code that you already ran doesn't undo it
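
A small simulation of that situation. A dict stands in for the notebook kernel's global namespace and each `exec` call plays the role of running one cell; the `lottery` function body is made up for illustration:

```python
# A dict stands in for the kernel's global namespace (illustration only;
# a real notebook keeps this state for you between cells).
kernel = {}

# Cell 1: define a function.
exec("def lottery():\n    return 7", kernel)

# Cell 2: reuse the same name for a list of results.
exec("lottery = [lottery() for _ in range(3)]", kernel)

# Deleting cell 2 from the notebook does not touch the kernel state:
# the name still points at the list, so calling it fails.
try:
    exec("lottery()", kernel)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Re-running cell 1 (or restarting the session) rebinds the name to the function.
exec("def lottery():\n    return 7", kernel)
restored = kernel["lottery"]()
print(restored)
```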

crystal phoenix
#

well thanks for help, glad it works now

serene scaffold
#

Yw

crystal phoenix
#

whatever i did

serene scaffold
#

There is no other explanation

#

The chances of other possible causes are next to none.

serene scaffold
#

are you sure the underlying data is actually getting changed? because it might just be that google sheets is displaying it for you as an excel-style sheet

queen junco
#

What's the point of a bias in a network? Can't I just change the weights and get the exact same thing?
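
One way to see the difference: without a bias, a linear unit always outputs 0 for a zero input, whatever the weights are. A sketch fitting y = 2x + 1 by least squares with and without a bias term; the data is made up for illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# No bias: the model is y ≈ w * x, which is forced through the origin.
w_only, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
pred_no_bias = x * w_only[0]

# With bias: append a column of ones so the model is y ≈ w * x + b.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)
pred_with_bias = w * x + b

print("no bias:  ", pred_no_bias)
print("with bias:", pred_with_bias)
```

No choice of weight fixes the prediction at x = 0 in the first model, while the second recovers the line exactly.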

final kiln
#
tensor([[0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
        [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]], device='cuda:0')

that's coming from the gpt I trained, it sorts arrays

feral kernel
# feral kernel --------------------------------------------------------------------------- ```V...

I'm really getting annoyed by coding, I have spent 80-90 hours trying to get this data to run on this custom neural network, it is not working. It has errors one after another... It is insane, i'm still on downsampling, i haven't even started training and tuning the weights and weight generators yet...Man this will take forever. Rewriting neural networks with a custom backpropagation and custom activation function and custom transformed dataset is hard. Man I need to learn more pytorch and python...

desert oar
final kiln
#

@desert oar after a ton of tensor indices shenanigans (I used an optimization that merges Q, K, and V for the three heads into one matrix ), I managed to place it into the xMx.T form during inference time:

Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
input = tensor([0, 0, 1, 0, 0, 1, 1, 0, 2, 0, 1], device='cuda:0')
output = tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2], device='cuda:0')

the first three prints are from the three heads, to confirm that it is in fact doing it

final kiln
#

uhm

#

that's not induced distance, that's induced dot product rite

final kiln
# final kiln

this is also not symmetric, so it's actually not a correct interpretation, wonder what will happen if I force it to be symmetric and positive definite

desert oar
desert oar
# final kiln

so you trained the normal way, then precomputed M for inference?

#

as a quick test, you should get the same outputs from both versions, given the same input

#

up to a few decimal places of course

#

M is just the product of two projections, right? does it need to be symmetric? it's not a correlation matrix

#

X M Xt should however be a correlation matrix
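
A small numerical check of the idea being discussed, assuming a single attention head with scores (X Wq)(X Wk)^T and ignoring the usual scaling and softmax. The shapes and random weights are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
words, d_model, d_head = 5, 8, 4

X = rng.normal(size=(words, d_model))    # one embedding per row
Wq = rng.normal(size=(d_model, d_head))  # query projection
Wk = rng.normal(size=(d_model, d_head))  # key projection

# Usual form: project to queries and keys, then take dot products.
scores_qk = (X @ Wq) @ (X @ Wk).T

# Merged form: fold both projections into one d_model x d_model matrix.
M = Wq @ Wk.T
scores_m = X @ M @ X.T

same_scores = np.allclose(scores_qk, scores_m)
is_symmetric = np.allclose(M, M.T)

# The symmetric part of M gives the same value for x M x^T on a single vector,
# because the antisymmetric part cancels there.
M_sym = 0.5 * (M + M.T)
same_diagonal = np.allclose(np.diag(X @ M_sym @ X.T), np.diag(scores_m))

print(same_scores, is_symmetric, same_diagonal)
```

With independent random Wq and Wk, M is not symmetric, so the scores between two different positions depend on which one plays the query and which the key.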

final kiln
final kiln
#

The end result is a tensor of integers tho, didn't compare the rest of it

final kiln
#

It might be super worth it to insist on making M a metric tensor somehow, cuz then we have access to a ton of math that exists out there to describe high dimensional non-Euclidean spaces

#

Like, if the network doesn't mind doing it, or even the performance doesn't drop in any way, this way would be a lot more interpretable

desert oar
final kiln
desert oar
#

nice

final kiln
#

BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')
AFTER M
Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')

#

copied the last print on accident ._.

#

in any case, the next step is to see if I can train it without Wk and Wq, directly using M, and then im gonna see if i can force M to be a metric
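
One common way to force a matrix to be symmetric and positive definite is to parametrize it as A A^T plus a small multiple of the identity, where A is unconstrained. A NumPy sketch of that construction; the sizes, the `eps` value, and the `distance` helper are assumptions, and in a real model A would be the trainable parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 6

# Unconstrained matrix (would be the learned parameter; random here).
A = rng.normal(size=(d_model, d_model))
eps = 1e-3

# A @ A.T is symmetric positive semidefinite; adding eps * I makes it
# strictly positive definite.
M = A @ A.T + eps * np.eye(d_model)

eigenvalues = np.linalg.eigvalsh(M)

def distance(u, v):
    # Distance induced by M: sqrt((u - v)^T M (u - v)).
    diff = u - v
    return np.sqrt(diff @ M @ diff)

x = rng.normal(size=d_model)
y = rng.normal(size=d_model)
print(eigenvalues.min() > 0, distance(x, y), distance(y, x))
```

With M built this way, the induced distance is symmetric, zero only between identical vectors, and satisfies the triangle inequality.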

desert oar
#

interesting

#

does it make sense that it should be?

final kiln
# desert oar does it make sense that it should be?

Yes, I'm hoping the network can induce a metric on the space of embeddings. If it does, we can start talking about the latent spaces in terms of distances and angles and areas and etc, which, at least for my brain, is a lot more intuitive to talk about.

desert oar
final kiln
desert oar
#

ah, i see

#

interesting idea for sure

final kiln
#

It might be too restrictive for the network though, but there's only one way to find out

desert oar
#

i'm not familiar with metric tensors in general. that X M X.T has a nice interpretation in that case? aside from being a correlation matrix

#

or do the elements of M take on some specific interpretation?

final kiln
#

I'm gonna see if I do some benchmarks of Wq, Wk vs M to see if I found an optimization, but I doubt it since I suspect that the researchers that came up with this started from xMx.T and then split it into Wq and Wk for optimization purposes

desert oar
#

i'm not sure about that actually. my impression is that the QKV system was always meant to be the "soft lookup" mechanism

final kiln
desert oar
#

would have to look into the literature from 2016 and earlier to see the history of the idea before the big attention is all you need paper

desert oar
#

are you retraining nanogpt or something?

final kiln
#

ah, was gonna attach the code but the python bot didnt let me

#
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])
print("AFTER M")
nanoGPT.finish()
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("input =", inputs[3])
print("output =", max_indices[3])

this compares the output

#

I'm gonna clean it up a bit and put it in its own repo because now there's gonna be a lot of experiments

#

I'm too curious about the metric thing, but it isn't really a priority

#

the actual thing I have to do now is to branch the model so that it can take in two inputs

#

also train it on a corpus of text like Shakespeare

#

and start thinking about how I'm gonna do the whole voice thing

#

the whole thing is gonna be inside a simple web app that im gonna code, it's gonna be like:

http/ws traffic <---> traefik service <-> go + htmx + tmpl <-> fastapi + model

all in compose

desert oar
#

i'm impressed at all the projects you've made time for, i barely have time to sit down and think through a project idea

final kiln
desert oar
#

hah, enjoy it

#

as long as you have the money

#

there was a period of time when i had no living expenses and no job, my biggest regret was not milking that for longer

final kiln
#

being in a LCOL country and remoting to HCOL allows me to do these kinds of breaks to upskill

#

tho this is the first and last time tbh, im hoping to land an on site ML Eng job to get me both out of the country and out of the house >.>

desert oar
#

fair enough. just remember to enjoy the moment of freedom!

#

these are also great projects for ML Eng

onyx forge
#

Hello I just got into AI & genetic learning so I made this messy rushed prototype for testing & getting a hold of the basics but I cant get it to actually learn anything, its goal is to move towards the cookie emoji. I have no idea if it’s a problem with the agent architecture itself or the way I’m trying to teach it, could someone take a look & point out what I’m doing wrong? https://paste.pythondiscord.com/5PZQ

mint plume
#

Hello everyone!

brittle storm
#

ok so, i have a python assistant and i am adding a new feature to it.. called "tasks".. its like the to-do lists.. and i connected it to the database and all and i even managed it to get the task and inform me when its time.. i want it to run in the background when the another script is running foreground.. and check if the time matches. how do i make it run in the background and run another script foreground

past meteor
pure pond
#

can anyone recommend me a kaggle competition / dataset / something like that to get beginner experience with finetuning bert models? The task doesnt matter so much I guess, I want to get pytorch experience mainly (coming from knowing a lot of theory of transformers), but something in fashion would be nice, so related to rag or clustering

serene scaffold
#

(I am a computational linguist professionally--those are the first two things for which I fine-tuned BERT to learn pytorch.)

desert oar
serene scaffold
# desert oar how does a transformer work for NER? do you have labeled regions in a token sequ...

you add another linear layer for the number of classes in the data, and each output is a tensor of shape (batch_size, sequence_length, class). For the sequence_length dimension, all sequences are right-padded with padding tokens to make them as long as the longest sequence in the batch. And then each element represents the probability that the token for that "row" belongs to the class for that "column". You need a "column" to represent the null class (the token is not an entity)

#

also, since BERT does subtokenization, you usually get better results if you label all but the first subtoken as null.

#

So if you tokenize "That is unbelievable" as ["That", "is", "un", "##believ", "##able"], and you're classifying parts of speech for pronouns and adjectives, the classifications would be [PRON, null, ADJ, null, null]
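
A rough sketch of that labeling rule in plain Python, in case it helps. The `align_labels` helper and the hand-written subtoken lists are made up for illustration; with a real Hugging Face fast tokenizer you'd typically get the word-to-subtoken mapping from `word_ids()` rather than writing it by hand.

```python
def align_labels(word_pieces, word_labels, null_label="null"):
    """Give each word's label to its first subtoken and null to the rest.

    word_pieces: list of lists, the subtokens of each word (hypothetical format)
    word_labels: one label per word
    """
    aligned = []
    for pieces, label in zip(word_pieces, word_labels):
        aligned.append(label)                              # first subtoken keeps the label
        aligned.extend([null_label] * (len(pieces) - 1))   # remaining subtokens are null
    return aligned


pieces = [["That"], ["is"], ["un", "##believ", "##able"]]
labels = ["PRON", "null", "ADJ"]
print(align_labels(pieces, labels))
# ['PRON', 'null', 'ADJ', 'null', 'null']
```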

#

@pure pond take note of that ^

desert oar
#
          class 1    class 2    class 3
token1:  ...
token2:  ...
desert oar
#

one thing i've actually never been sure about for these models is how text generation works

#

or in this case, output generation

#

do you feed in a sequence one token at a time? like predict(seq[:1]), predict(seq[:2]), ...?

serene scaffold
#

I've been needing to learn more about that since interactive LLMs became ascendant

desert oar
#

i know bert isn't really meant for that, you just stick in a sequence and get a sequence of outputs, right?

serene scaffold
#

for BERT, yes

#

you can actually fine-tune GPT-2 for anything you might want to fine-tune a BERT variant for, and the API is the same because of hugging face. but the inner workings presumably are not.

This is a notebook I wrote a few months ago that does both. https://github.com/center-for-threat-informed-defense/tram/blob/main/model-development/train_single_label.ipynb

GitHub

TRAM is an open-source platform designed to advance research into automating the mapping of cyber threat intelligence reports to MITRE ATT&CK®. - center-for-threat-informed-defense/tram

desert oar
#

the idea being that gpt-style models are meant to only scan backwards through the sequence, whereas bert-style models are meant to examine the sequence as a whole

signal holly
#

is this supposed to happen
I was trying to convert my dataframe to all strings
and when I check the dtype I get all objects

desert oar
#

i suggest pd.StringDtype() instead of str

#

i think you can also pass "string" as a shorthand

#

!d pandas StringDtype

#

!d pandas.StringDtype

arctic wedgeBOT
#

class pandas.StringDtype(storage=None)
Extension dtype for string data.

Warning

StringDtype is considered experimental. The implementation and parts of the API may change without warning.
pure pond
desert oar
#

they say it's experimental, but it's been stable for a while now
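
quick sketch of the difference, assuming a small made-up dataframe. The "object" result in the comment is what pandas 2.x reports for `astype(str)`; I haven't checked every version.

```python
import pandas as pd

# Made-up data purely for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, None]})

as_str = df.astype(str)          # pandas 2.x reports dtype "object" here
as_string = df.astype("string")  # shorthand for pd.StringDtype()

print(as_str.dtypes)
print(as_string.dtypes)
```

with the "string" dtype, missing values stay missing (`<NA>`) instead of being turned into the literal text "nan".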

pure pond
# desert oar interesting. afaik the main difference is that gpt-style models mask off future ...

And yeah thats right. OpenAI's GPTs are decoders, which means they dont pay attention to the future tokens in the sequence. It doesnt mean that they dont produce embeddings, which I see a lot of people get confused about. Each iteration they'll take in the whole sequence, get embeddings from attention and mlp layers, then produce a prediction for the next token in the sequence. Then they repeat, looping over those steps, adding the new token to the end of the input sequence each time (to get at your other comment). When training decoders, you mask out all the future tokens, and just predict the first masked one. Eg if you have a sentence of 8 tokens, you can split that up into 7 training examples: the first you have token 1 and predict token 2, the 2nd you have 1 and 2 and predict 3, and so on (up to predicting token 8). https://www.youtube.com/watch?v=kCc8FmEb1nY is a super great video, you can just watch the 30m if you dont care about the details of attention layers even

#

auto-regressive means it "consumes" the output for the next input. Thats how people usually say it, I dont like that phrasing though, because it sounds like the output is used up or something. So to me, auto-regressive is that the outputs get added to the input, as you repeatedly loop over the (growing) input
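
A toy sketch of both ideas, with everything here made up for illustration: `toy_model` is a stand-in for a real decoder, and integers stand in for tokens. A sequence of n tokens gives n - 1 (context, target) pairs, and generation just keeps appending the prediction to the input.

```python
def next_token_examples(tokens):
    """Split one sequence into (context, target) next-token training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def generate(model, prompt, n_new):
    """Auto-regressive loop: append each prediction to the input and repeat."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))  # the model sees the whole (growing) sequence
    return seq


def toy_model(seq):
    # Stand-in for a real decoder: "predicts" the last token plus one.
    return seq[-1] + 1


pairs = next_token_examples([1, 2, 3, 4, 5, 6, 7, 8])
print(len(pairs), pairs[0], pairs[-1])
print(generate(toy_model, [1, 2, 3], n_new=3))
```

In real training you wouldn't loop over the pairs like this; as far as I understand it, the causal mask lets one forward pass score every position at once.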

desert oar
pure pond
#

well, thats right, if by complete sequence you mean for all time iterations, not just up to and including the current time iteration. But what I meant when I said that, was that some people hear encoder vs decoder, and then wonder how a decoder model can work when just given tokens as the input, because they think no encoder means no embeddings

serene scaffold
serene scaffold
pure pond
#

gpt is bidirectional, but up to a cutoff token. Token 1 can attend to token 5, if both already exist

#

i think (actually im not sure anymore)

serene scaffold
#

and my team created the dataset. but I wouldn't use it for what you're doing.

queen junco
#

Bro

#

3 mil lines

pure pond
queen junco
#

20354 kb

serene scaffold
queen junco
serene scaffold
queen junco
#

Because I can

serene scaffold
#

okay? it seems like spam.

pure pond
#

yeah thats what I mean, where do you get the pretrained bert? Also, when training your ner classifier, did you freeze the bert and just train the head? Or no

queen junco
serene scaffold
queen junco
#

My brain keep output the same thing

serene scaffold
#

I have to go do Christmas stuff
everyone be good.

queen junco
#

I think it has something to do with my activation function

pure pond
#

ok ty for the help

queen junco
pure pond
#

whats the problem?

#

20mb too big?

queen junco
#

So my output is the same every time

#

No that's not it

#

My output is always the same and I think it's the activation function

pure pond
#

what output

queen junco
#

The neural network output

#

They're all the same

pure pond
#

so?

queen junco
#

I don't want that

pure pond
#

why not

queen junco
#

It's not supposed to do it

pure pond
#

theres many reasons why it could be doing that

queen junco
#

Because it's supposed to be selecting one thing

pure pond
#

i dont wanna explain every possible thing

queen junco
#

I think it's the activation function though

pure pond
#

you mean every output class has the same value?

queen junco
#

Yes

#

Lemme get the function rq

pure pond
#

bro install discord on your machine

queen junco
#

What?

#

Oh

#

The y is set to the last amount of inputs going into the hidden network

#

So I can get a range from lest output and set it between 0-1

#

And the inputs are randomized from the beginning

#

So they shouldn't be all the same

#

I know the weight

#

Is it because the weights aren't different

pure pond
#

ok I have a suggestion, try changing the activation function to figure out whats going wrong. Change it to just add up all the input values it's receiving. And then change that to add up 2* each value. And make sure each change has the expected effect on your outputs. You should find a clue doing that

slow vigil
#

I'm scraping some data from a website and it's all legal as per their robots.txt but the way they have their site set up is that I have to access a page for every single record I want to view and in my case that's like 100,000 records. So does anyone know how slow I should make my program run to prevent them from thinking I'm trying to DDOS their site?

signal holly
desert oar
#

also you can just use .astype like before

signal holly
signal holly
#

maybe I should be more concerned with that

final kiln
#

alright, I got some possibly interesting results, the left side is the correlation matrices (or dot product tables) of the model with the normal Q, K, V. The right side is the same, but for the model using only M where M is forced to be symmetric.

#

for example, head 0 metric 0, it's pretty clear that it is just a number 0 detector (0 is the first token)

#

while head 0 metric 1 is a number 2 detector

#

these are the actual tensors, left side is WqWk.T right side is M

#

I collapsed the model into a single self attention module and reduced the number of dimensions, these matrices are tables of actual distances between the vectors in the metric spaces that the network is creating

#

the first matrix essentially encodes the sort order between the numbers 0, 1, 2

#

not sure if this only happens in the metric tensor network, gonna try with the other one too just to be sure

#

this is from the normal Q, K, V network

#

yellow = larger distance

#

ok, so the yellow on the right most side is occurring because the matrix allows for negative numbers, the diagonals are always 0 because the vectors are subtracted with themselves

#

I see this as an absolute win because I can clearly interpret what is happening on the first image

#

while in the second image im not sure if I can do the same on any of the tables

indigo wing
#

Hello, can anyone give me a Datascience, AI/ML and DL roadmap. I wish to deploy my own model and shine my resume. I also wish to make some great research papers and do github stuff in ds mostly. I am very bad in python and this all. May I please get some roadmap/advice/resource I can follow?

I am a datascience student but wish to learn from scratch(didnt pay attention till now) and redo the math as well. I am asking for any solid advice. I am interested in NLP, video generation etc

#

I wish to become proficient in all the AI/ML related things

shut slate
#

What is the difference between df.loc['column'] and df.loc[df['column']]

tidal bough
shut slate
#

don't get it lol

#

sorry

desert oar
#

and what are the axes? there are only 3 tokens in the input?

desert oar
#

the 2nd thing is very likely not something you ever need to do

desert oar
final kiln
# desert oar and what are the axes? there are only 3 tokens in the input?

Axis numbers are meaningless, I'm not being very rigorous.

I'm comparing every allowed token with each other. There's three tokens, A, B and C and the model sorts a sequence of 11 tokens.

The values of the matrices are:

u = vocab[i] - vocab[j]

value_ij = uMu.T

The three matrices come from the three self attention heads in the self attention module.
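
Roughly how one of those tables can be computed, as a NumPy sketch rather than my actual code. The sizes, the random `vocab` embeddings and the way `M` is built here (as W W.T, so it is symmetric) are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(3, 4))   # 3 tokens (A, B, C), 4-dim embeddings (made up)
W = rng.normal(size=(4, 4))
M = W @ W.T                       # assumed symmetric "metric" matrix

# diff[i, j] = vocab[i] - vocab[j], shape (3, 3, 4)
diff = vocab[:, None, :] - vocab[None, :, :]

# table[i, j] = u M u.T with u = vocab[i] - vocab[j]
table = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
print(table.round(3))
```

The diagonal comes out as 0 because each vector is subtracted from itself, and with this particular `M` the table is symmetric and non-negative.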

shut slate
desert oar
desert oar
final kiln
desert oar
#

i actually don't know: can .query access the index?

final kiln
shut slate
#

I am not trying to do anything rn lol. Just exploring

desert oar
final kiln
#

Should be Module 0 Head 0, Module 0 Head 1, Module 0 Head 2.

desert oar
#

@final kiln really interesting experiment. where did your input sequences come from?

#

i wouldn't have any idea how to generate realistic synthetic data for something like this

#

can you force it to be positive semidefinite by using the cholesky decomposition somehow?

shut slate
#

So how would you change values where by in Column A it equals bob for example?

desert oar
#

not sure if setting L L.T = M is enough

final kiln
final kiln
desert oar
final kiln
#

But idk if it forces it to be positive, maybe it does ?
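
A quick numerical check of that, as a sketch with a random made-up `L`: building M as L L.T gives a symmetric matrix whose eigenvalues are never negative, since x M x.T is just the squared length of x L. It is only strictly positive definite if `L` has full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.tril(rng.normal(size=(4, 4)))  # lower-triangular factor (random, for illustration)
M = L @ L.T

eigenvalues = np.linalg.eigvalsh(M)   # eigvalsh is meant for symmetric matrices
x = rng.normal(size=(1, 4))

print(np.allclose(M, M.T))            # symmetric
print(eigenvalues.min() >= -1e-10)    # no negative eigenvalues (up to rounding)
print(np.allclose(x @ M @ x.T, np.sum((x @ L) ** 2)))
```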

desert oar
desert oar
shut slate
#

ok cool

desert oar
#

however if you want to select rows where the row label (aka "index") is a specific value, you use .loc directly: df.loc["bob"]

#

often if you have some kind of unique identifier, id number, timestamp, etc it's a good idea to use that for the row labels. it makes some things a lot tidier and a lot faster
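
a small sketch of both cases, with a made-up dataframe and column names:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": ["bob", "alice", "bob"], "score": [1, 2, 3]})

# Boolean mask: change "score" wherever column A equals "bob".
df.loc[df["A"] == "bob", "score"] = 0

# Row labels: make A the index, then select by label directly.
by_name = df.set_index("A")
bob_rows = by_name.loc["bob"]

print(df)
print(bob_rows)
```

the mask form answers "rows where a column has this value", the label form answers "rows whose index is this value".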

final kiln
desert oar
final kiln
#

But it seems that yes

#

Then the experiment is complete. And I've halved the number of parameters in each head because it's a symmetric matrix

desert oar
#

i'm really surprised this isn't an established technique, if you have code I'd love to see it so I can play around with it

final kiln
#

Sure, I'll send you in a bit.

#

I've just taken Wq and matmul it with itself

#

And deleted Wk

#

Next step really is see if this scales, benchmark it and all that stuff.

#

If no one has ever done this I'm gonna write a small paper on it

desert oar
#

yeah that's what i wanted to try, running nanogpt with it

#

definitely worth a paper with your results

#

and if someone else already did it, then you'll find out 😆

desert oar
# shut slate ok cool

next time you ask for help in 2 different places please link to your other conversation for context

final kiln
shut slate
#

Ok thank you

final kiln
#
        # prepare for matrix multiplication
        q_b3wk = q_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
        # k_b3wk = k_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
        v_b3wk = v_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
      
        # perform matrix multiplication
        attention_scores_b3ww = q_b3wk @ q_b3wk.transpose(-1, -2)

this is the key difference

#

notation goes like

#

name_of_tensor_bwk, b = batch, w = words, k = coordinates // 3

#

it's tensor notation

#

v_b3wk = shape of (batch, 3, words, coordinates // 3)

desert oar
#

makes sense

#

hmmm, wait. that's not the same thing

final kiln
#
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
logits = F.softmax(outputs, dim=-1)
max_values, max_indices = logits.max(dim=-1)
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])

this will print a result after training

desert oar
#

or is it? i guess you're just recycling the variable name

#

i'd call them W or something like that

final kiln
#

they already include the matrix mul

desert oar
#

yeah i see it

final kiln
#

(xWq)(xWq).T = QQ.T
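
Same identity as a NumPy sketch for a single head, with made-up sizes and random weights: the scores from the shared projection equal x M x.T with M = Wq Wq.T, and the score matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))    # 5 tokens, 8-dim embeddings (illustrative sizes)
Wq = rng.normal(size=(8, 4))   # the single shared projection

Q = x @ Wq
scores = Q @ Q.T               # analogous to q @ q.transpose(-1, -2) for one head
M = Wq @ Wq.T                  # the implied symmetric matrix

print(np.allclose(scores, x @ M @ x.T))
print(np.allclose(scores, scores.T))
```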

desert oar
#

right

#

i'm on my phone now so i'll check the full code later. thanks!

final kiln
#

at least gpt4 cant find anything of the sort

#

Im skeptical about how far this can be taken tho, if I can teach it to do Shakespeare without loss of efficiency I'll be very excited indeed

magic island
#

anyone here ever worked with image or object detection?

pale hemlock
#

any one around to toy with an idea?

white hedge
#

hi

lament pine
#

https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&pp=iAQB is this playlist good enough to follow? ( considering it's 7 years old now )

The objective of this course is to give you a holistic understanding of machine learning, covering theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms.

In this series, we'll be covering linear regression, K Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, a...

#

If anyone followed it , kindly share your feedback

vapid garden
#

Hi everyone, I received a hiring challenge which is pretty lame I think. Basically it states that I have to find the pattern in the numbers which pop up on the website continuously. I'm confident that I can create an ML model, but can anyone suggest ideas on how I can use JavaScript to predict the label while the numbers appear? The link for the website is below

https://superbhai.in/hiring-challenge-unscramble?m=n

tidal bough
#

you did find the hidden link to the full dataset, right?

vapid garden
#

No, Idk how to do... I don't know webdev

tidal bough
#

sounds like you won't get hired, then
if you look at the source, there's a comment suggesting to find what makes the numbers change. by looking at the JS script in the debugger (which even has a source map, no decompilation needed) you can find the link to the full dataset in a comment.

whole zephyr
#

Hi, I have an algorithm that detects trendlines for the "low" and "high" levels of a signal in a dataframe, for a time window that I can set.

After I have the trendlines' slopes saved into the dataframe, I try to find which trendline pairs are parallel and: ascending, descending or horizontal.

Can anyone hop on a call with me to look through the code? It's a bit long to explain

serene scaffold
whole zephyr
arctic wedgeBOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold
#

^ permanently remember the information here

whole zephyr
#

Here are the functions I use to:

  • classify the trend patterns (wedges, channels, triangles) from the trendlines' slopes
  • get the dataframe index of the N-th occurrence of a certain trend pattern that was identified by the previous function
    https://paste.pythondiscord.com/HLCA
#

My problem is that the trendlines for channels don't seem to look too parallel, even though sensitivity = 0.001 and the condition for descending channel is abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity

In the second pic, it says I have slopes -0.06 and -0.01 which give an absolute difference of 0.05 that's greater than 0.01, which is 10*sensitivity.

I don't know where my function failed.

desert oar
rigid cape
#

Hey guys, sorry for asking such a beginner question. But I have a confusion. Do I need to learn a lot of math before learning ml and do I need SciKitLearn before I touch something like a Tensorflow ?

What would a typical roadmap look like

grizzled locust
#

guys, what did i do wrong.

#

i wanted it to look like this

#

but instead, it looks like this

whole zephyr
# desert oar so maybe look further up in your code? if the slopes are not what you expected, ...

My first suspicion was the part that "marks" the trendline pattern with 1, but I then input the upper and lower slopes into the if conditions and got "nothing":

sensitivity = 0.001
slmin = -0.06
slmax = -0.01
if abs(slmin - slmax) < sensitivity*10 and slmin > sensitivity and slmax > sensitivity:
    print("chan_asc")
# chan_desc is appended 1 when slopes are almost parallel and both of them are below a small negative value
elif abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity:
    print("chan_desc")
# chan_hor is appended 1 when slopes are almost parallel and both of them have a really small amount (around 0)
elif abs(slmin - slmax) < sensitivity*10 and abs(slmin) < sensitivity and abs(slmax) < sensitivity:
    print("chan_hor")
# tri_desc is appended 1 when slmin is very small and positive and slmax is negative
elif slmin > sensitivity and slmax < -sensitivity and abs(slmin) < sensitivity:
    print("tri_desc")
# tri_asc is appended 1 when slmax is very small and negative and slmin is positive
elif slmax < -sensitivity and slmin > sensitivity and abs(slmax) < sensitivity:
    print("tri_asc")
# wed_desc is appended 1 when slmin is negative and slmax is negative but slmin is greater than slmax
elif slmin < -sensitivity and slmax < -sensitivity and slmax < slmin: 
    print("wed_desc")
# wed_asc is appended 1 when slmin is positive and slmax is positive but slmin is greater than slmax
elif slmin > sensitivity and slmax > sensitivity and slmin > slmax:  
    print("wed_asc")
else:
    print("nothing")
#

I also tried to see if the plotting function calculates the trendlines differently than the function that adds their slope and intercept to the dataframe, but they use the exact same method on the exact same time frame, so that's no issue either. I'm literally out of ideas on what to test
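
One way to narrow it down might be to pull the channel checks into a small function and test it on known slope pairs. This is only a sketch: `classify_channel` is a made-up name, and it mirrors just the three channel conditions from the snippet above.

```python
def classify_channel(slmin, slmax, sensitivity=0.001):
    """Return the channel label for a pair of slopes, or None if not a channel."""
    tolerance = sensitivity * 10
    if abs(slmin - slmax) >= tolerance:
        return None  # slopes are not close enough to count as parallel
    if slmin > sensitivity and slmax > sensitivity:
        return "chan_asc"
    if slmin < -sensitivity and slmax < -sensitivity:
        return "chan_desc"
    if abs(slmin) < sensitivity and abs(slmax) < sensitivity:
        return "chan_hor"
    return None


print(classify_channel(-0.06, -0.01))    # the pair from the second pic
print(classify_channel(-0.06, -0.055))   # a pair that really is nearly parallel
```

If this returns None for (-0.06, -0.01) but the dataframe still marks that window as chan_desc, then the slopes stored in the dataframe are probably not the ones being plotted. Separately, the tri_desc and tri_asc conditions as written look like they can never be true, since `slmin > sensitivity` and `abs(slmin) < sensitivity` contradict each other (same for slmax).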

bold timber
#

So all this time, I've been "trapped" by my curiosity about Transformers architecture. And it turns out, to study the Transformer architecture, I had to go back and learn the creation of machine translation models using RNN and RNN+Bahdanau attention.

As a result, I have now built three machine translation models using the TensorFlow library. However, when I compare the results, it turns out that the machine translation model using the Transformer architecture has much lower accuracy compared to the other two RNN models, only around 55%. Here are the details:

Classic RNN
loss: 0.1082 - accuracy: 0.9762 - val_loss: 0.4065 - val_accuracy: 0.9304

RNN + Bahdanau Attention
loss: 0.2247 - accuracy: 0.9593 - val_loss: 0.5537 - val_accuracy: 0.9100

Transformer
loss: 0.9383 - accuracy: 0.7688 - val_loss: 2.6770 - val_accuracy: 0.5565

On the one hand, when I look at the results of others who have made machine translation with the Transformer architecture, the validation loss values they get are similar to mine, around 2.4 (this output is generated when using the PyTorch library).

My question is, why does the accuracy of the Transformer architecture seem much lower than that of the RNN models? Initially, I thought this might be because the accuracy metric used for model performance measurement is "accuracy," which, when creating a machine translation model, should use other metrics, such as using BLEU Score. However, on the other hand, when I want to use BLEU Score as the accuracy metric, my computer is not capable of running it.

Is this "accuracy" metric not reliable, so that's why the RNN model performs much better than the Transformer? or what?

proud wing
#

How deep did you go with regards to the transformer architecture learning?

odd meteor
#

Merry Christmas guys! Today is a good day to close your laptop and enjoy the Christmas celebration with your family and friends. 😁👌

proud wing
#

Bleu is likely to be more suitable here @bold timber

#

Is your model tuned to the dataset ?

bold timber
bold timber
odd meteor
bold timber
bold timber
odd meteor
# bold timber No. But when I try to use BLEU score as a metrics, I need to set run_eagerly = T...

Can I see the code snippet where you're computing BLEU?

If you're using the bleu module from NLTK and it's not working for you (it should work under normal circumstance though), then try using sacreBLEU

https://github.com/mjpost/sacreBLEU

GitHub

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons - GitHub - mjpost/sacrebleu: Reference BLEU implementation that auto-dow...

bold timber
dusk tide
bold timber
odd meteor
# bold timber do you know how to using the bleu module from NLTK as an accuracy metrics in the...

Once you've trained your NMT model, and you'd like to evaluate the model performance, you just need the translated text (the sentence predicted by your model) and the actual text from the target language (the correct sentence you expect your model to predict) to compute the BLEU score.

from nltk.translate.bleu_score import sentence_bleu
reference = ['It was raining heavily today'.split()]
candidate = 'It It It is raining heavily'.split()
print(sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0)))

We have different types of bleu computation; sentence_bleu, corpus_bleu... This will provide more clarity https://machinelearningmastery.com/calculate-bleu-score-for-text-python/

BLEU, or the Bilingual Evaluation Understudy, is a score for comparing a candidate translation of text to one or more reference translations. Although developed for translation, it can be used to evaluate text generated for a suite of natural language processing tasks. In this tutorial, you will discover the BLEU score for evaluating and scoring...

bold timber
odd meteor
bold timber
bold timber
odd meteor
# bold timber By the way, how do you determine weights in sentence_bleu?

The BLEU score calculation in NLTK gives us leverage to compute either the cumulative or individual BLEU scores of different n-grams.

By default, sentence_bleu() calculates the cumulative 4-gram BLEU score, also called BLEU-4. The weights for BLEU-4 are (0.25, 0.25, 0.25, 0.25) (remember, 1/4 == 0.25)

Now, in the example I showed earlier, the weights are (0.5, 0.5, 0, 0) because I'm interested in computing BLEU-2 a.k.a bigram bleu score (remember 1/2 == 0.5 )

If you check the attached link in my previous response, it has a section called **Individual vs. Cumulative BLEU scores** where this concept is explained in more detail
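
To make the pattern concrete, here is a tiny helper that builds those weight tuples. `bleu_weights` is a hypothetical name, not part of NLTK; the zero padding just mirrors the 4-tuple used in the example above, and the result is meant to be passed as `weights=` to `sentence_bleu`.

```python
def bleu_weights(n, max_n=4):
    """Uniform cumulative weights for BLEU-n, padded with zeros up to max_n."""
    if not 1 <= n <= max_n:
        raise ValueError("n must be between 1 and max_n")
    return tuple([1 / n] * n + [0] * (max_n - n))


for n in range(1, 5):
    print(n, bleu_weights(n))
```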

vapid garden
#

Hi everyone, I want to fine-tune a multimodal LLM which can generate icons based on input text, but I'm not able to find open-source models for it.. any idea?

bold timber
midnight root
#

so i seem to have found 2 ways to break chatgpt 3.5

#

is it common? by break i mean by passing the explicit rule

small wedge
midnight root
#

no i just straight up asked questions about something then kept slighty changing the subject

#

it worked

#

twice

agile cobalt
#

you can report it directly to OpenAI, but it is not ultra surprising
that sort of stuff (using ai tools / prompt engineering / their vulnerabilities) is not particularly on topic for this server though

midnight root
#

also chatgpt does lie a lot

#

i just wish there is a way to get actual ai experience

desert oar
desert oar
midnight root
#

a computer who is programed to tell the truth, and is not restricted always

#

i understand there are some topics which it should avoid but it seems its always lying or just not telling the whole truth, or just trying as much as possible to give you 1 single opinion

small wedge
#

That's not really how these models work, there is no concept of the truth to them

#

They are predicting the most likely text to come after whatever you give them

#

Even if they were trained on 100% factual and true datasets (which they absolutely are not) they could still give you false information

midnight root
pure pond
#

also, outside of pure logic, truth isnt really so well defined, and especially on topics a lot of people care about. By that I mean, people disagree with each other, and not because all but 1 are intentionally being malicious

#

you dont even have to look to politics, think of the legal industry. We try to create a set of rules, where if you do x then y happens. But it doesnt work like some complex algorithm in a deterministic way, the quality of your lawyer has a huge impact, and facts if they exist are interpreted

#

so, how do you even measure how truthful an llm is?

#

and sure there things that are widely accepted to be true, or at least by communities who should understand the topic well, but there is so much grey area everywhere that it doesnt help that much (and youre not interested in restriction of topics)

#

@midnight root

#

and if you really just want an unrestricted model now, get mixtral 8x7b running

pure pond
#

anyway

#

I (think I) know a fair amount of deep learning theory but I want to get some practical experience. I'm doing a kaggle competition and im wondering, how do most people do their data processing? By that I mean setting up something to feature engineer each input, not so much the mining. Do you just write whatever function to do whatever you want, or is the industry standard to make (heavy?) use of sklearn pipelines or whatever? I guess it depends on the task, but lets say youre working on your own on a relatively static and manageable data set, ie not worrying about big data streams in the cloud or something

pure pond
#

actually, for my question, does anyone have any examples of like a github repo where they implemented an ML pipeline? Which was done either for a job interview kind of scale or while preparing? That'd be perfect

final kiln
#

Idk y I haven't used this before, MLFlow is amazing

#

It's running on an ec2 instance and connects to S3.

#

I'm gonna do three experiments, one for normal GPT, another for a slightly modified GPT and another for a full variation on the transformer where I also incorporate a couple other ideas I have

#

I'll train them on the same Shakespeare dataset and compare performances, and also how the performances scale with model size

#

I've already put some thought into how I'm gonna do the multimodal thing. It's gonna be a voice input to text output because one step at a time. The idea is to take whisper, invert the decoder (somehow) so that I have a text to embedding translator, then use llama to generate text, so I can use it to fine tune another llama to understand the embedding. Turning llama into a decoder for the whisper encoder.

pure pond
#

cant you just get embeddings from whisper?

#

and how will you feed embeddings into llama

final kiln
final kiln
#

Greatest challenge will be to invert whispers decoder

final kiln
#

It's not so easy, you can get to the bottleneck (which I think is what that does im not sure), but the problem is that the decoder has two inputs

pure pond
#

also I dont think finetuning llama will work for this, youd have to (pre)train it all

final kiln
#

I'll try it all the same, if it doesn't work I'm sure the lessons learned will inform my next step

pure pond
#

ofc

#

ah yeah, I see people parrot this a lot without really knowing where it all comes from

final kiln
#

Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.

pure pond
#

I guess that's a fair point, it's more saying that finetuning brings out what the model already knows, but I spent about a third of the year just finetuning llamas, trying to get them to have a better understanding of the domain of my company's industry, and didn't get that far. It's why there's been such a hard pivot to RAG now (imo)

final kiln
pure pond
#

I mean sure, I'm not really an expert, I could (in theory) tell you details of how it went at my company, but its also just our team and a fairly small one. I see finetuning as getting the model to output in more predictable ways, and in ways you design. What youre doing is more the other side, getting the model to take an input that it's unfamiliar with. So instead of condensing whatever region of the embedding space represents its "understanding" in different ways, you want it to handle unfamiliar regions of the space. But tbh, Ive never tried anything like that, and cant think of any experiments like that either that I know about at least

#

so yeah this is a cool thing to try actually, hit me up if it goes well lol

#

but it might just take a (hopefully easy enough) transformation to map the embeddings to what llama knows

final kiln
#

or at least, that's how I heard they do it, not gpt4 specifically ofc cuz that's not open sourced

#

the reason I wanna fine tune is that the embedding will contain other useful information that I know language models understand but can't get from most text prompts

pure pond
#

yeah, that's where my gut feeling that pretraining will be better suited than finetuning kicks in. But again, idk really

#

yeah idk, interesting problem

final kiln
#

like, I know GPT4 "understands" sadness, conceptually, it can describe it, adapt prompts if you tell it you are sad. but you need to do extra work for the prompt to include emotions

#

a speech embedding will already contain all that

#

and since the language model also already contains some representation of it

#

I'm hoping that it will just sort of adapt to it

#

this is late stage stuff tho

potent sky
#

interesting approach tho, backwards_propagation
i've thought about this problem a bunch too, the information lost in the original embeddings when going for multimodality
but this would occur only in "put-together" multi-modal networks so to say right, not in ones designed for multimodality

#

which is a long winded way to say that that should be your pre-training task

final kiln
potent sky
#

yes but not as input

#

and when you remove the input layer the embedding space shifts

final kiln
#

The information is extracted from the embeddings, so it must encode meaning.

potent sky
#

yes but LLMs are not trained to decode structure from embedding space inputs, they're trained to decode structure and meaning from natural language inputs

final kiln
#

The text input is first translated into an embedding, using a table of embeddings that the network learns.

#

I personally don't like arguing like this, I've banged my head against the wall so many times by trying to guess what's possible or what's not using my own intuition.

potent sky
#

xd i get it

final kiln
#

I've learned to accept that I just can't predict what works until I try it

potent sky
#

you'd still need a pre-trained joint embedding space model at minimum right

#

but that isn't a huge ask

final kiln
#

I'm just gonna let the first layers change. Possibly apply a decaying learning rate as a function of model depth

potent sky
#

lmk if it shows promise!

final kiln
#

Sure, I always post here what im doing in case people want to chip in. Always super helpful to discuss and hear new ideas as I'm not very experienced in NLP

potent sky
#

ah nice! I used to be pretty regular here but life gets in the way lol, still drop in from time to time

pure pond
# potent sky and when you remove the input layer the embedding space shifts

I think youre in agreement though, backprop is saying that there will be a network that tries to translate whisper embeddings to llama embeddings, getting rid of the first part of llama that turns the sequence into embeddings. And the hope being that this (can) learn how to do that shift. In a sense, instead of whisper -> text -> llama, it'll be last embeddings of whisper -> translation network -> first embeddings of llama, so avoiding destroying information when you convert to text in the middle

I think this is all fine in theory, but my intuition is that llama wont really make use of the extra information whisper knows about from finetuning, it needs pretraining as a fresh language model

#

no reason not to start with finetuning though considering compute (btw backprop, in case you havent come across it, try LORA)

potent sky
#

Your last paragraph is what I think the joint space training would be useful for imo, that I mentioned

#

LoRA and QLora

pure pond
#

ah ok, wasnt familiar with the term

potent sky
#

Though I wonder for the joint space embeddings
Is it necessary to pre train a full model
Or can we use an adapter network in some sense

#

Interesting stuff

potent sky
frigid owl
#

Hey guys, i know it's a pretty dumb question but what are x_train and y_train, and why do we always have to have both x and y (e.g. x_val, y_val; x_test, y_test)?

serene scaffold
#

for each instance, y is the property that you want the model to output, and X are the properties it can use to do that.

#

So if your data set has these properties about a house: {square footage, number of rooms, neighborhood, price}
and you want to be able to predict the price
then price is the y value, and {square footage, number of rooms, neighborhood} are the X values.
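
To make the house example concrete, here is a minimal sketch with made-up numbers (the array contents and the 3/1 split are assumptions for illustration only). It shows why the names come in pairs: whatever rows go into X_train, the matching labels have to go into y_train.

```python
import numpy as np

# Hypothetical toy data, one row per house:
# columns = square footage, number of rooms, neighborhood id, price
houses = np.array([
    [1400, 3, 0, 240_000],
    [2000, 4, 1, 410_000],
    [850, 2, 0, 150_000],
    [1700, 3, 2, 330_000],
])

X = houses[:, :-1]  # the properties the model is allowed to look at
y = houses[:, -1]   # the one property it should learn to predict

# Split the rows the same way for X and y so every input keeps its own label.
X_train, X_test = X[:3], X[3:]
y_train, y_test = y[:3], y[3:]

print(X_train.shape, y_train.shape)
```

The same pairing applies to x_val/y_val and x_test/y_test: each split is just a different subset of rows.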

#

make sense so far?

frigid owl
#

yeah

#

ty

#

So if i want a model to predict a number for each image, i need a matrix with all the pixels (X) and the number in the image (y)?

serene scaffold
#

the whole image (the array of pixels) would be the X value, and the number that image represents would be the y, yes.

frigid owl
#

Oh okay

#

Again thank you so much

serene scaffold
#

yw

hidden sapphire
#

How does a model that does what https://thispersondoesnotexist.com/ does work? All ML problems (very few) I've done have had inputs and outputs, this just has an output. How does training a model like that work? (I'm not looking for an in-depth explanation, though I'll happily take that, but more a few pointers on what to research)

desert oar
soft dock
serene scaffold
#

I got one who only sort of has glasses

hidden sapphire
hidden sapphire
potent sky
#

Thankfully thispersondoesnotexist.com has been improved a lot, I think both with newer models, and importantly better data.
I remember there used to be a good share of NSFW images generated a few years ago
Which makes it awkward when you're excited trying to show someone this cool thing about your field 😅

spark tartan
#

Can someone explain what's going on in this validation curve? I think it shows underfitting since the training and cross validation score are still steadily increasing. That right?

lapis sequoia
#

Anyone know how to fix this?

#

My df styling seems messed up

#

Columns are taking a lot of width and trying to expand fully

lapis sequoia
#

The default itself has changed. I haven't run any other cell

oblique quarry
#

Hey there! I'm trying to implement a GMM. I'm struggling a bit to implement the pdf for the E step. This is my current implementation ```py
import numpy as np

def pdf(data, variance, mu):
    covariance = np.diag(variance)
    return np.exp(-0.5 * (data - mu).T @ np.linalg.inv(covariance) @ (data - mu)) / np.sqrt((2 * np.pi)**data.shape[1] * np.linalg.det(covariance))```

#

i read about it in a paper covering the basics about multivariate data analysis, but translating it into code is always a different thing
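
One thing that may be tripping this up, assuming data is an (N, D) array (which data.shape[1] suggests): `(data - mu).T @ inv(cov) @ (data - mu)` then multiplies a (D, N) matrix by a (D, D) one, which only lines up when N == D. Below is a minimal sketch of a per-row density for a diagonal covariance plus the E step built on it. The names diag_gaussian_pdf and e_step and the toy numbers are made up for illustration, and a real implementation would probably work in log space to avoid underflow.

```python
import numpy as np

def diag_gaussian_pdf(data, variance, mu):
    """Density of N(mu, diag(variance)) at each row of data, shape (N, D)."""
    diff = data - mu
    # With a diagonal covariance the quadratic form is a weighted sum of squares.
    exponent = -0.5 * np.sum(diff**2 / variance, axis=1)
    norm = np.sqrt((2 * np.pi) ** data.shape[1] * np.prod(variance))
    return np.exp(exponent) / norm

def e_step(data, weights, means, variances):
    """Responsibilities r[n, k] = P(component k | x_n)."""
    weighted = np.stack(
        [w * diag_gaussian_pdf(data, v, m)
         for w, m, v in zip(weights, means, variances)],
        axis=1,
    )
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical toy setup: two components in 2D.
data = np.array([[0.0, 0.0], [0.1, -0.2], [5.0, 5.0]])
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])

resp = e_step(data, weights, means, variances)
print(resp.round(3))
```

Each row of resp sums to 1, and the points near the origin should end up almost entirely assigned to the first component.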

odd meteor
# spark tartan Can someone explain what's going on in this validation curve? I think it shows u...

You didn't tell us which score is displayed on your Y-axis. RMSE or Coefficient of Determination (R2) or ??

Well, I'm gonna presume y-axis is the loss. From the learning curve, you could see there's a convergence between your train and cv score. As expected, the loss tends to increase as the number of neighbours increases.

It's certainly not overfitting. I won't say it's underfitting either.

While the goal is to train a model that makes 0% mistake in its prediction, your model prediction on train data happens to be making ~35% error in its prediction. ( You know it could have been way worse depending on your predefined threshold. For example, for some people, if their model was making, say, >= 70% error on the train data they'll shutdown the whole thing and retrain. For some, >= 50% error on the train data is enough to trigger a complete overhaul, which would of course, require training a new model)

solid seal
#

i need some help with computer vision

#

i got to take keyboard inputs for the game from the hand gestures

#

any appropriate library for that to help me out with it

#

pls ping for the reply

potent sky
solid seal
solid seal
#

heres the code for that

#

oh ig , we cant send python file here

heavy sigil
#

Hey, i'm kinda stuck here, i'm trying to use OpenAI whisper for transcription, but i'm having trouble loading the model.
I'm just using the basic transformers code from Hugging Face:
import whisper
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

print("Downloading the model...")
direct = "model/"
model_1 = AutoModelForSpeechSeq2Seq.from_pretrained('openai/whisper-base', cache_dir=direct)
processor = AutoProcessor.from_pretrained('openai/whisper-base', cache_dir=direct)
model = whisper.load_model(model_1)
print("Model loaded. Transcribing test audio...")
result = model.transcribe("audio_test.mp3")
print(result["text"])
It seems that i can't directly load it (i'm just doing it wrong, that's all i know), but i'm not sure what to do. Do i need to use transformers?

quaint crescent
#

Please critique my chart

oblique quarry
quaint crescent
#

less clutter on y axis

desert oar
heavy sigil
# desert oar thanks for the code. ideally you can include an error message to make it easier ...

thats the error i get

model = whisper.load_model(model_1)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\__init__.py", line 135, in load_model
elif os.path.isfile(name):
^^^^^^^^^^^^^^^^^^^^
File "<frozen genericpath>", line 30, in isfile
TypeError: stat: path should be string, bytes, os.PathLike or integer, not WhisperForConditionalGeneration

heavy sigil
#

@desert oar

final kiln
#

im setting up some more infra for the model training, after some research I found that the most efficient way is to use amazon spot instances

#

so im gonna setup a self hosted github actions runner that will be responsible for baby sitting the spot instances

#

b4 that I need to setup spotty, which is the thing that'll actually handle them

#

spot instances are like ec2 instances, but they can be brought down any time by aws, making them cheaper, spotty will essentially handle the fault tolerance aspect of it, it's specifically made for ML training

outer widget
#

I don't think you even need AutoProcessor to transcribe with model.transcribe in whisper models.

final kiln
final kiln
heavy sigil
outer widget
heavy sigil
#

for whisper.load_model can i just specify the directory of the model

outer widget
arctic wedgeBOT
#

whisper/__init__.py line 99

def load_model(
final kiln
frigid owl
pure pond
heavy sigil
#

Yeah it seems that it can load a checkpoint file but i don't see any??? i tried:
model = whisper.load_model("model/models--openai--whisper-base")
and i get RuntimeError: Model model/models--openai--whisper-base not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

pure pond
heavy sigil
#

@outer widget ??

heavy sigil
#

i just realized that i don't need to use transformers, if i tell it to load for example base.en it will download the required file automatically, however i get this error now:

FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
File "C:\Users\Default.DESKTOP-KA61FU5\Desktop\Talk2GPT\Talk2GPT.py", line 27, in <module>
result = model.transcribe("audio_test.mp3")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\transcribe.py", line 122, in transcribe
mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
audio = load_audio(audio)
^^^^^^^^^^^^^^^^^
File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 58, in load_audio
out = run(cmd, capture_output=True, check=True).stdout
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\subprocess.py", line 548, in run
with Popen(*popenargs, **kwargs) as process:
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\subprocess.py", line 1024, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\Python311\Lib\subprocess.py", line 1493, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
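
Reading the traceback bottom-up: the FileNotFoundError comes out of subprocess.run, called from whisper's load_audio, so the file Windows can't find is most likely the program being launched rather than audio_test.mp3. As far as I know, openai-whisper shells out to the ffmpeg command-line tool to decode audio, so a quick thing to check is whether ffmpeg is on PATH. A minimal sketch (find_ffmpeg is a made-up helper name):

```python
import shutil

def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)

path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; install it, then reopen the terminal")
else:
    print(f"ffmpeg found at {path}")
```

If this prints the "not found" message, installing ffmpeg and adding its folder to PATH is the first thing to try.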

heavy sigil
#

@final kiln Do you know anything

final kiln
final kiln
pure pond
#

makes you wonder how big llms really need to be

#

is anyone interested in RAG? I'm reading through https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/446422 and dont really understand why/what logit theyre feeding into their binary classifier head (the steps under the figure).

So, they have the question with the retrieved context. The way I (try to) understand it (which also goes against other things they say they do so I know I'm wrong) is that they take that context+question sequence, then append the answer to the sequence, run the sequence through the model to get the logits for every possible token in the vocab, and feed those logits into a classifier head? Why would this give a helpful measure of how correct the answer is? Surely the LLM is happy to spit out the next token anyway, I dont see what relevance specific logits there has?

I guess my first question is, is it right that they only forward through the network 5 times for each question (because of the 5 answers)? Just to make sure I'm on the same page

Then 2, is it just a thing that I have to accept that they can classify over the logits for each answer? I dont see where they got the idea from, what the intuition is. If it was me I'd like to try to get the logits for the model producing tokens A/B/C/D/E with whatever setup is needed for that

#

actually I think I have the right understanding of WHAT theyre doing, but still unsure as to WHY

heavy sigil
pure pond
#

also I find it weird that they do this

    if tokenizer.pad_token is None:
        if tokenizer.unk_token is not None:
            tokenizer.pad_token = tokenizer.unk_token
        else:
            tokenizer.pad_token = tokenizer.eos_token

ie pad with unk, am I right in thinking thats weird?

#

ah I think Ive found the answer to my question from above, the diagram doesnt really explain but

        inst = f"Answer: {row['answer']}\n###\nIs this answer correct? "
        instructions.append(inst)

now I can accept the logits being useful

#

@heavy sigil when you read the error, what did you think of this line? result = model.transcribe("audio_test.mp3") in the context of FileNotFoundError ?

heavy sigil
#

it's in the same folder as the .py file

#

and i'm running the py from command line on that folder

pure pond
#

put this above the line that errors

import sys
print(sys.path)
#

and then google pythonpath

#

and also try changing the code to use the absolute path

heavy sigil
#

Hmm, the error is happening from the library not my code

pure pond
#

and sorry not pythonpath, current working directory

import os
print(os.getcwd())
heavy sigil
#

I have my working directory set correctly

pure pond
#

can you with open("audio_test.mp3") directly above the line that errors?

heavy sigil
#

???

heavy sigil
pure pond
#

idk maybe its not windows compatible

#

try doing it on a google colab

heavy sigil
#

Looks like it is?

#

i really can't

heavy sigil
pure pond
#

well just with a test file to rule out os issues?

heavy sigil
#

wdym?

pure pond
#

why does it need to be local

#

let's take this to dm also as im conscious of spamming this chat

spark tartan
odd meteor
potent sky
rare osprey
#

Can anybody help me study these following topics

  • Mamba architecture
  • Liquid Time Constant Networks
#

Im also looking into hyperdimensional computing and q star so any help there would be nice

serene scaffold
rare osprey
#

Yes, it is

magic dune
#

Anyone have an good papers on NEAT?

hidden sapphire
pearl barn
#

It shows me this error, what have I done wrong??

exotic smelt
# pearl barn

you need to use .format with the string not the function

#

and you haven't given any placeholder for the string also
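
A small sketch of both points, assuming the screenshot had something like print("...").format(x) in it (the variable name and text here are made up):

```python
total = 42  # hypothetical value

# Wrong: print() returns None, so .format ends up being called on None
# print("The total is").format(total)   # AttributeError

# Right: call .format on the string itself, with a {} placeholder in it
message = "The total is {}".format(total)
print(message)
```

An f-string like f"The total is {total}" does the same thing with less typing.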

pearl barn
#

Yeah I figured it out thank you

exotic smelt
#

alr

pearl barn
#

I wanna ask, does it run both print statements if the condition is true? Can't I make it print one if it's true and the other if it's false?

swift torrent
#

worth trying to move from data engineer to data scientist ?

broken palm
#

anyone know how to solve this problem?

long canopy
#

any in-depth guide on customizing graph layouts in Networkx?

serene scaffold
broken palm
agile owl
#

@past meteor I thought of an idea based on an article I read about time vectors being used in language models. Suppose we have entirely real-valued parameters as codons of an evolutionary algorithm applied by an agent over a time series. For each period n in a periodization of the training period, use the EA to find the best-fitting model. Then, we can look for time-dependent vectors in the parameters and use them to extrapolate into the future.

past meteor
agile owl
#

sorry enjoy vacation

signal holly
#

Hi, I'm trying to learn ml using project-based learning, but I'm facing some troubles. When I get a project idea, like a spam email classifier, how do I go about building it without just searching up how to do it and blindly copy and pasting code? If I fall into this trap, I'm usually unable to provide my input and that becomes troublesome.

valid wind
#

Then you can ensure that you've learned it

pure pond
#

S4 as a precursor to mamba if you havent come across it also

signal holly
pure pond
#

i find banning myself from using copy + paste when in the copy code mode helps a bit, even if I just copy it out by hand

umbral charm
#
import numpy as np

x = []
y = []
z = []
prodl = 1
summ = 0
sumi = 0
jj = -1
n = 2
for i in range(n+1):
    for m in range(n+1):
        for l in range(n+1):
            if l != m and l != i and l != jj:
                prodl = prodl * (n/2 - l) / (jj - l)
                x.append(prodl)
        if m != i and m != jj:
            summ = summ +  1 / (jj-m)
            y.append(summ)
    if i != jj:
        sumi = sumi + 1 / (jj-i)
    z.append(sumi)

print(np.prod(x) * np.sum(y) * np.sum(z))
#

This is my code, however for the last product of all it just gives an array full of 0

#

this is because n/2 - l = 0 at some point, and since it's a running product, everything after that is 0

#

so is the equation just wrong then?

#

in my case n = 2, so when l = 1 the whole product is 0

#

this makes 0 sense

#

the equation just breaks

#

even worse i get '-0.0'

#

-0?
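
The -0.0 is not a bug by itself: floats keep a sign bit even for zero, so a zero factor times (or divided by) a negative number comes out as -0.0. A quick sketch with a made-up factor:

```python
import math

product = 0.0 * -2.5  # a zero factor times a negative one

print(product)                      # shown with its sign
print(product == 0.0)               # still compares equal to zero
print(math.copysign(1.0, product))  # exposes the sign bit
```

So the sign is cosmetic here; the real issue is the factor n/2 - l hitting 0 and wiping out the running product.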

long canopy
#

for some reason, people have stopped using the Gephi Toolkit since around 2021 (last posts on their forums), but what the heck have people replaced it with?

zealous swan
#

Hello all, I am new to python and have some questions related to dimensionality reduction. I have a textual dataset about publications in the field of information visualization, my target is to apply suitable dimensionality reduction methods to visualize the large amount of data against 2 dimensions. So far, I have used the CountVectorizer and TfidfVectorizer to generate the tf and tf-idf vectors for 2 of my columns. Currently I am trying out different methods like PCA, TruncatedSVD and LLE. I am still unsure what makes which dimensionality reduction method more suitable for the task. When I used LLE, I got a really sparse visualization of only 3 points, so I assumed that all other points are overlapping and hence I see only 3, but I am unsure. If anyone can help me understand more about dimensionality reduction it would be great. I would appreciate any helpful resources. Thank you😊 .

desert oar
#

So you can look up spam classification for example, then you'll read about naive bayes classification and spam datasets. Then you do EDA, scripting to process data, and finally model fitting, but all of that is driven by figuring out conceptually what you need to do. The code becomes only a means to an end

#

At which point figuring out the code itself becomes a lot easier, because you'll already know what you need to do

desert oar
zealous swan
# desert oar UMAP is kind of the go-to dimension reduction technique for high-dimensional dat...

Thanks for the reply, I am actually working on a lab assignment and need to use at least 2 methods, I have already tried PCA, TruncatedSVD, t-SNE and LLE. The visualizations looked really dense and there was so much overlap in the scatterplot so I started to experiment with the min_df parameter for the TfidfVectorizer to see how it affects the visualization or if it makes it better. I will give UMAP a try.

elder falcon
#

Yo is anyone here in digital or computational humanities, I'm trying to relearn some corpus analytics stuff b/c i want to create a model for comparing the communication patterns or repetition of words and images specific to political topics and compare the averages i get throughout my data set to determine what comics are more political than others on a macro scale, and on a microscale seeing what words are specific to each comic that defines whatever political topic i'm exploring in the text.

serene scaffold
long canopy
#

feels like test-driven development could be used to make LLMs write programs that are a bit more sophisticated, no?

#

or, has anyone worked on the idea of making packages, basic types, basic classes, etc., be considered tokens?

flint umbra
#

I dont know if this is the right sub, but i have the following question:

I'm trying to discriminate between images of numbers, and I want to define an image property that describes the number of enclosures in the image. For example the image of an 8 has 2 enclosures, whereas the images of 6 and 9 have exactly 1 enclosure.

see images i supplied: the 7 has no enclosures, the 4 has 1 enclosure and the Q has 1 enclosure. How would i write a python function that calculates the amount of enclosures present in an image?

final kiln
#

You can also invert the image, perform connected components and exclude those that touch the edge of the image and then count what's left.
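
A rough sketch of that idea in plain numpy, assuming the image has already been binarized so that True means ink (count_enclosures and the tiny bitmap are made up for illustration). It flood-fills each background region and only counts the ones that never reach the border.

```python
from collections import deque
import numpy as np

def count_enclosures(ink):
    """Count background regions that are fully surrounded by ink."""
    background = ~np.asarray(ink, dtype=bool)  # invert: True where there is no ink
    rows, cols = background.shape
    seen = np.zeros_like(background)
    holes = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not background[r0, c0] or seen[r0, c0]:
                continue
            # Flood-fill one background component (4-connectivity).
            touches_edge = False
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_edge = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and background[nr, nc] and not seen[nr, nc]):
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            if not touches_edge:
                holes += 1
    return holes

# Hypothetical 7x5 bitmap of an "8".
eight = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=bool)
print(count_enclosures(eight))
```

On real images scipy.ndimage.label can do the component labelling for you; thresholding and noise cleanup would need to happen before this step.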

hardy depot
#

heyuhm

#

i started getting this new error from autoscraper, its not giving an output anymore, just empty list after scraping

desert oar
azure compass
#

This LSTM shows some fluctuation in the validation MSE loss but it's getting to the point where I'm happy with the loss. Should I be concerned?

#

Also the validation loss is lower than training, but from what I understand that's because of my dropout rate

devout creek
#

someone can help me editing a google colab project, Whisper Youtube, to use insanely fast whisper instead of just whisper ?

topaz turtle
#

does anyone here know tensor algebra? 👀

#

i'm trying to understand how to express arbitrary tensor contraction in terms of blas routines

#

for example, a tensor product between (2,3,4) and (3,4,5) can be expressed as a matrix multiplication, by reshaping the two tensors into matrices (2, 12) and (12,5), then multiplying these two matrices
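
That easy case can be checked in a few lines against np.tensordot (the arange inputs are just placeholder data):

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Flatten the two contracted axes (3, 4) into one axis of length 12.
as_matmul = A.reshape(2, 12) @ B.reshape(12, 5)

# Reference: contract the last two axes of A with the first two axes of B.
reference = np.tensordot(A, B, axes=2)

print(np.array_equal(as_matmul, reference))
```

It works without any transposing because the contracted axes are already last in A and first in B.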

#

but i can't quite figure out how to do it with a contraction like (2,3,4) and (5, 4, 6), where the contracted dimension is in the middle of the tensor...

#

any resources wrt that would also be appreciated

proud wing
topaz turtle
proud wing
#

hmm...no zero shot solutions for you but i can point you in a direction that might help

#

invariant inner products between vectors and covectors under transformation is how i'd look at mapping tensor ops -> matrix ops

#

and if you want a general algorithm you'd want something that is valid regardless of the coordinate system

#

and for a general algorithm it might be interesting to explore if you could derive a calculus function that sums over the indices akin to an integral (sort of 🙂)

#

contraction is very similar to integration over a set of indices (or dimensions)

#

sry i couldnt be more help ☕

proud wing
#

check out Einstein Summation

topaz turtle
#

what's kinda frustrating is that this algorithm already exists and is well known, bc this is how libraries like numpy and pytorch implement this

proud wing
#

yea

#

numpy uses einstein sums to optimize contractions

topaz turtle
#

but how tho 😭

proud wing
#

oh you want to know how it works?

topaz turtle
#

i mean, i understand how einsum works, i'm implementing my own tensor contraction algorithm that way rn, but i don't understand how that would help with expressing einstein summations in terms of blas routines

proud wing
#
A = np.random.rand(2, 3, 4)
B = np.random.rand(5, 4, 6)

B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)

something like this?

proud wing
#

closely

topaz turtle
#

does this give the same result as

numpy.tensordot(A, B, axes=[[2],[1]])

?

proud wing
#

well yes

topaz turtle
#

i don't know numpy well, so i'm kinda confused by the arguments of .reshape(), i thought the arguments were just the desired shape?

proud wing
#

in the case (beforce your example) we wanted to reshare the transposed tensor to align the 3rd dimension of A with the second dimension of B for contraction

#

reshape sorry

#

so the manual approach i shared gives more control

#

because your version doesn't explicitly specify the reshaping and permutation of dimensions.. tensordot performs a contraction over the specified axes, i.e. the axes along which you want to contract the tensors: [[2],[1]]

so 3rd dimension of A (index 2) and second dimension of B (index 1)

#

which is exactly what the manual approach i shared does

#

if i had to guess (and i've never looked honestly) numpy.tensordot is probably more efficient 🙂

topaz turtle
#

i see

proud wing
#

but this does get me thinking

topaz turtle
#

btw, the way numpy does it, reshaping things doesn't acc affect the data in any way, it's simply a reordering of dimensions and only has an effect on how you read/write to it, right?
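
Roughly yes, with one catch: a transpose only changes how the same buffer is indexed, but a reshape after a transpose often has to copy. A small check with np.shares_memory (the shapes are just the ones from this thread):

```python
import numpy as np

B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

flat = B.reshape(20, 6)          # contiguous array: reshape returns a view
swapped = B.transpose(0, 2, 1)   # transpose only changes strides: also a view
merged = swapped.reshape(30, 4)  # axes can't be merged in place here: a copy

print(np.shares_memory(B, flat))
print(np.shares_memory(B, swapped))
print(np.shares_memory(B, merged))
```

So the permute-then-reshape step before a matrix product usually costs one real data movement, even though each call looks free.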

proud wing
#

ok... so hmm

#

if you wanted something more general you'd want to be able to handle any valid combination of contraction axes specified as pairs

#

and you'd want to magically (i say that word instead of automation) infer the shape and reshape and permutate without manual intervention

#

let me try something real quick

topaz turtle
#

i already have an algorithm that does this, but it's pretty wonky and as a result very slow

#

which is why i'm trying to express it in terms of blas

proud wing
#

sec let me open up jupyter

#

i want to benchmark a general case

#

i'll use your inputs if that works

topaz turtle
#

btw, i checked your algorithm, and it gives a different result from just doing tensordot (unless i messed smth up)

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)

print(np.tensordot(A, B, axes=[[2], [1]]))
print(result_tensor)
proud wing
#

thats yours right?

topaz turtle
#

that's your code, i just changed A and B to have concrete values

proud wing
#

oh right

#

but yours at the bottom

topaz turtle
#

to easily compare results betwen runs

topaz turtle
#

numpy must be doing smth more than just reshaping the tensors into desired matrix shapes

proud wing
#

also do you know about TensorLy?

#

if my goal was to make something blazing fast btw

topaz turtle
#

bc the following two examples of tensor contraction

(2,3,4)@(5,4,6) and (2,3,4)@(5,6,4) would be mapped into the same matrix product, but technically shoudl have different results...
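
What tells the two cases apart is the permutation done before the reshape: the contracted axis of B has to be moved to the front first, otherwise reshaping to (4, 30) just re-chunks the flat data. A sketch for the (2,3,4) with (5,4,6) case, checked against tensordot (placeholder arange data):

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

# A's contracted axis is already last, so it flattens to (6, 4) directly.
A_mat = A.reshape(-1, 4)

# Move B's contracted axis (axis 1) to the front before flattening:
# (5, 4, 6) -> (4, 5, 6) -> (4, 30).
B_mat = np.transpose(B, (1, 0, 2)).reshape(4, -1)

result = (A_mat @ B_mat).reshape(2, 3, 5, 6)

print(np.array_equal(result, np.tensordot(A, B, axes=[[2], [1]])))
```

For a (5,6,4) tensor the only change would be the permutation, (2, 0, 1) instead of (1, 0, 2), which is why the two contractions give different results even though the matrix shapes match.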

proud wing
#

I'd write the tensor operations in C/C++

#

and call the blas routines

topaz turtle
topaz turtle
proud wing
#

TensorLy is a pure tensor library w/ blas based contractions

topaz turtle
#

i see

#

the goal of my project was to understand how to acc implement all those ops, so this kinda loses the purpose 😦

proud wing
#

ah

#

mine will do that

topaz turtle
proud wing
#

not the one i showed you only

serene scaffold
#

huh, you two are new
welcome to our wonderful data science and AI chat

proud wing
#

Hi @serene scaffold thx so much

#

hope you're having a happy holiday 🙂

topaz turtle
#

this question does acc tie into ml, bc efficient tensor contraction lies at the core of tensor autodiff, so this is on topic lol

serene scaffold
#

I didn't say it wasn't

topaz turtle
#

fair 🙂

#

do you by chance have any suggestions wrt the question? 👀

proud wing
#

also i have another fun idea to test

topaz turtle
#

👀

proud wing
#

i'm assuming my func will be slower than numpy

#

but i want to experiment with numba

serene scaffold
proud wing
#

since it can compile it down to machine code

late salmon
#

Hi can anyone help me

topaz turtle
#

numpy uses hyperoptimised blas routines, so even being 2x-3x slower than numpy is a success in my book tbh

serene scaffold
proud wing
#

@topaz turtle im debugging some code.. my contracted tensor1 ended up with slightly different dimensions than tensor2 😄

#

in my very general approach that i'm experimenting

#

tensor math not for the faint of heart..

#

Ok wow...

#

i really thought this idea was sound

#

ahh

#

seems like maybe its because i'm using numpy's matmul

#

on tensors that dont have compatible inner dimensions.

#

looks like long form math is the only way

#

Hey @topaz turtle you know how BLAS is a 4-letter acronym?

#

I came up with one for this contraction math long-form called TRIS 🙂

#

Transform, reshape, iterate, and sum

proud wing
#

@topaz turtle Ok sorry it took a lot longer than I thought

#

coding a completely general blas transform was harder than I thought

#

@topaz turtle

import numpy as np
import time
###### falcon wings axis permutations to simulate blas w/ tensor contractions
###### v1 with numpy arrays 
# 100,000 Iterations 

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
contraction_axes = [[1, 2], [0, 1]]
# Tensor Contraction BLAS experiments
# TRIS - transform reshape iterate sum :)

def long_form_third_grade_math_tensor_contraction(tensor1, tensor2, contraction_axes):

    # we will use numpy for its efficient arrays but will implement the tensor math
    tensor1 = np.asarray(tensor1)
    tensor2 = np.asarray(tensor2)

    # T (Tris)
    permuted_axes1 = [axis for axis in range(tensor1.ndim) if axis not in contraction_axes[0]] + contraction_axes[0]
    permuted_axes2 = [axis for axis in range(tensor2.ndim) if axis not in contraction_axes[1]] + contraction_axes[1]
    
    tensor1 = np.transpose(tensor1, permuted_axes1)
    tensor2 = np.transpose(tensor2, permuted_axes2)

    # R (tRis)
    new_shape1 = tensor1.shape[:-len(contraction_axes[0])] + (-1,)  
    reshaped_tensor1 = tensor1.reshape(new_shape1)

    new_shape2 = (-1,) + tensor2.shape[len(contraction_axes[1]):]
    reshaped_tensor2 = tensor2.reshape(new_shape2)

    result_shape = tensor1.shape[:-len(contraction_axes[0])] + tensor2.shape[len(contraction_axes[1]):]
    result_tensor = np.zeros(result_shape)

    # trIs (Iterate)
    for i in np.ndindex(result_tensor.shape[:-1]):
        for j in range(result_tensor.shape[-1]):
            sum_over_axes = 0
            for k in range(reshaped_tensor1.shape[-1]):
                # triS (Sum)
                sum_over_axes += reshaped_tensor1[i + (k,)] * reshaped_tensor2[(k,) + (j,)]
            result_tensor[i + (j,)] = sum_over_axes

    return result_tensor

start_time = time.time()

for letsgooo in range(100000):
    result = long_form_third_grade_math_tensor_contraction(A, B, contraction_axes)
general_hundy = time.time() - start_time
print(f"falcon slowbie: {general_hundy:.4f} seconds")

falcon slowbie: 11.3511 seconds
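
For comparison, here is a minimal sketch of the same transform/reshape idea, with the contracted axes of the second tensor moved to the front and the inner loops replaced by a single matrix product, checked against `np.tensordot`. The helper name `tris_contract` and the choice to contract A's axis 2 with B's axis 1 (the only pair of equal-length axes in these two shapes) are assumptions for illustration, not the code that was timed above.

```python
import numpy as np

def tris_contract(a, b, axes_a, axes_b):
    """Contract axes_a of `a` with axes_b of `b` (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    free_a = [ax for ax in range(a.ndim) if ax not in axes_a]
    free_b = [ax for ax in range(b.ndim) if ax not in axes_b]

    # The contracted dimensions have to agree pairwise.
    if [a.shape[ax] for ax in axes_a] != [b.shape[ax] for ax in axes_b]:
        raise ValueError("contracted axes have mismatched lengths")

    # Transform: free axes first for a, contracted axes first for b.
    a_t = np.transpose(a, free_a + list(axes_a))
    b_t = np.transpose(b, list(axes_b) + free_b)

    # Reshape: collapse both operands to 2-D matrices.
    free_shape_a = a_t.shape[:len(free_a)]
    free_shape_b = b_t.shape[len(axes_b):]
    k = int(np.prod([a.shape[ax] for ax in axes_a], dtype=int))
    a_mat = a_t.reshape(-1, k)
    b_mat = b_t.reshape(k, -1)

    # Iterate and sum: one matrix product does both.
    return (a_mat @ b_mat).reshape(free_shape_a + free_shape_b)

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
out = tris_contract(A, B, [2], [1])
expected = np.tensordot(A, B, axes=([2], [1]))
print(out.shape, np.array_equal(out, expected))
```

Comparing against `np.tensordot` on a couple of shapes is a cheap way to catch axis-ordering mistakes before timing anything.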
#

And now for numpy's

#
import numpy as np
import time
# Numpulous einstanamous 
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

start_time = time.time()
for _ in range(100000):
    result_tensor_einsum = np.einsum('ijk,lmn->iln', A, B)
einsum_time = time.time() - start_time

einsum_time_str = f"einsum smasher: {einsum_time:.4f} seconds"
einsum_time_str

einsum smasher: 4.3604 seconds
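
One thing worth checking in this benchmark: in `'ijk,lmn->iln'` no index is shared between the two operands, so `j`, `k`, and `m` are each summed on their own rather than contracted against each other. A small sketch of the difference (the alternative subscript string `'ijk,lkn->ijln'` is only a guess at which contraction was intended):

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

# No shared index: this is an outer product of two independent sums.
summed = np.einsum('ijk,lmn->iln', A, B)
outer = np.einsum('i,ln->iln', A.sum(axis=(1, 2)), B.sum(axis=1))
print(summed.shape, np.array_equal(summed, outer))

# Shared index k: A's last axis is contracted with B's middle axis.
contracted = np.einsum('ijk,lkn->ijln', A, B)
print(contracted.shape,
      np.array_equal(contracted, np.tensordot(A, B, axes=([2], [1]))))
```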
pure pond
#

dont forget to SMASH that einsum

proud wing
#

ahh

#

always gotta smash

candid spruce
#

Would anyone be interested in doing a deep learning project with me it will be a self dataset building DL ai dm me if interested 😁

primal agate
#

Any ideas how to start with machine learning? Maybe learn pandas, flask, numpy first?

candid spruce
candid spruce
primal agate
#

thanks for opinion

#

and advice

proud wing
pure pond
#

pytorch imo

candid spruce
final kiln
#

I've finished setting up the workflow and all the aws stuff, Im now waiting for amazon to accept my quota increase request so I can connect everything up

#

that is github actions

#

the actions workflow is gonna babysit the gpu spot instances, and the spot instances are gonna train GPT and send the info to MLflow, which I deployed on the free tier aws instance

pure pond
#

is this your whisper thing?

final kiln
#

yes, I'm setting up the infra for doing experiments, there's a couple things I wanna try out regarding a possible modification of the self attention mechanism

pure pond
#

nice gl

final kiln
#

ty

proud wing
#

@topaz turtle was that helpful...

dense yarrow
#

I want to create a project to add to my portfolio since it's empty right now.
I'm new to data science and know very basic python. Any suggestions on what kind of projects I could do? Or if you know of any examples, that'd be helpful too!

#

Also, should I do it in VS Code or Google Colab? Those are the two platforms I'm familiar with

desert oar
final kiln
#

It automates all the details

desert oar
#

it doesn't have to be super clever or detailed. but all successful real-world projects start with a question to answer and some data to work with

final kiln
desert oar
#

if you're stuck for ideas, check out kaggle. for example the Titanic dataset and the associated survival classification task is an excellent beginner project

final kiln
#

But haven't yet been able to use it because AWS defaults the quota for GPU instances to 0, so I'll have to wait 24h til they increase it

desert oar
final kiln
desert oar
#

pretty powerful machine, and even does gpu albeit much slower than an actual gpu

final kiln
#

That's cool, I've been relying on kaggle and colab for free GPU. But I reckon that once I start doing large text datasets + larger gpts this will be important.

topaz turtle
#

i’ll check it tmr 🥰🥰🥰🥰

topaz turtle
lapis sequoia
#

bro im a kid

#

i can relate

rugged zinc
#

hey, i was wondering if there's a resources page for this topic

#

i mean data science/analysis

desert oar
jolly horizon
#

Has anyone ever used matplotlib for 3d plots? I have two planes in 3d and the dot product of their normal vectors is literally 0, yet the legendary library plots them as legit parallel, i am so confused..

jade bloom
#

Hi, anyone have experience with deep q reinforcement learning? I have a few questions

serene scaffold
#

("asking to ask" creates extra steps that decrease the chances your question will ever get answered.)

jade bloom
#

thanks for the advice!

#

How does a Deep Q Network for reinforcement learning learn at all?

To my understanding, you have a minibatch of random experiences, where each experience is in the format (current_state, action, reward, next_state). The way a DQN works is that you pass in the current game state (current_state) and get an estimate of the q-values for each action. Then you choose the q-value of the action you took and store it in a variable; let's call it current_q_for_action_taken for now.

Then, with a separate target network, we pass in next_state and get q-values for each action. We can then calculate target_q_for_action_taken as follows: target_q_for_action_taken = reward + max(q's from target network)

Then, we can calculate the MSE loss for updating the other model (not the target model) as follows: (target_q_for_action_taken - current_q_for_action_taken)^2.

Then, using gradient descent and backpropagation, we can update that network's weights and biases (I think it updates biases).

then, every n steps we transfer the weights from this network over to the target network

my question is: how does all this allow this network to estimate an optimal q-function? It seems to me that the network will just flail around randomly adjusting weights and biases but never learning the optimal q-function to accurately map current game states to accurate q-values
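
One way to see why this is not random flailing: the reward term is real information, and every update pulls Q(s, a) toward the reward plus the best current estimate for the next state, so correct values spread backwards from the rewarding transitions. Below is a minimal tabular sketch of that same update on a made-up 5-state chain, with a table standing in for the network and no replay buffer or target network. The environment, `gamma`, `alpha`, and the step count are all assumptions for illustration. Note that the usual target also includes a discount factor `gamma` and skips the bootstrap term on terminal transitions, both of which the formula above leaves out.

```python
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = left, 1 = right
gamma, alpha = 0.9, 0.5         # discount factor and step size (assumed values)
rng = np.random.default_rng(0)

def step(state, action):
    """Toy chain: reaching the right end gives reward 1 and ends the episode."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done

Q = np.zeros((n_states, n_actions))
state = 0
for _ in range(5000):
    action = int(rng.integers(n_actions))        # purely random behaviour policy
    nxt, reward, done = step(state, action)
    # TD target: reward + gamma * max_a' Q(next, a'), no bootstrap at terminal
    target = reward + (0.0 if done else gamma * Q[nxt].max())
    Q[state, action] += alpha * (target - Q[state, action])
    state = 0 if done else nxt

print(np.round(Q[:-1], 3))
print(Q[:-1].argmax(axis=1))    # greedy action per non-terminal state
```

Even though the agent here acts completely at random, the table settles on values where moving right is preferred in every state, because each target is anchored by an observed reward somewhere down the chain. A DQN does the same thing with a function approximator, which is where the target network and replay buffer come in to keep the updates stable.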

agile owl
#

What does it mean if, when you split your training data into periods, the vector of fit parameters over time has significant autocorrelation/time series dynamics? Does it mean you should fit a per-period model on some arbitrary periodization of your training data, or should you try to find the latent variable that explains the time series dynamics and make one model for all periods?

#

for example, the optimal value of one parameter seems to oscillate back and forth quite predictably from one period to the next the way I'm splitting up the data

upbeat linden
#

I’ve done a lot of research and the answer has pretty much been a toss up. PyTorch vs Tensorflow what is the best to learn for a beginner in the Machine Learning field (fully fluent in Python already)

I am looking to develop a model that can recognize a pattern in a database of a food journal. My dad, who had cancer, still has stomach issues, and he has 2 years' worth of food entries and bowel movements. I am trying to develop a model that can tell him which foods he can and cannot eat based on the log.

It seems that I should be using a classification model, but I am not sure what the best way/how to approach this general solution

serene scaffold
#

I don't think most people care that much. but no one really talks about perceptrons except when discussing the history of AI.

iron basalt
#

It's contradicting itself. You can bring this up as an edit to the Wikipedia page.

final kiln
#

Isn't the Heaviside a non-linear function ?

#

a linear function is a polynomial of degree one or less, including the zero polynomial

which is not the case for the step function, doesnt fit into ax+b
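
A quick numeric check of that point, assuming the common convention that the step function returns 1 at zero: a function of the form ax+b has a constant slope, and a linear map in the strict sense satisfies f(x + y) = f(x) + f(y). The step function does neither.

```python
import numpy as np

def step(x):
    """Heaviside step with step(0) = 1 (one common convention)."""
    return np.heaviside(x, 1.0)

x, y = 1.0, 2.0
print(step(x + y), step(x) + step(y))      # additivity fails

xs = np.array([-2.0, -1.0, 1.0, 2.0])
slopes = np.diff(step(xs)) / np.diff(xs)
print(slopes)                              # slope is not constant
```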

final kiln
topaz turtle
proud wing
#

Yea it's pretty good

#

I'm going to implement it in C later

#

as I actually have some ideas for it

topaz turtle
#

ooh interesting

proud wing
#

for transformer math

topaz turtle
#

acc the reason i asked in the first place is bc i need it for a C library i'm writing lol

proud wing
#

at the moment I'm finishing up some code that instantly analyzes a models layers

#

open source ones :D

topaz turtle
#

👀

#

feel free to ping me when/if you get to doing that 🙂

topaz turtle
#

when i tested my own algorithm, i learned that for small tensors there's a very small disparity in performance, but once i tested on tensors with ~a million elements, it revealed that the time disparity in reality is acc like 100x 😭

proud wing
#

just an example

topaz turtle
#

does anyone know if libraries like numpy acc move data around when doing a transpose (to preserve linearity of data access)
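
For reference, NumPy's transpose does not move any data as far as I know: it returns a view over the same buffer with the shape and strides permuted, and a copy only happens when something asks for contiguous memory (for example `np.ascontiguousarray`, or a reshape that cannot be expressed with strides). A quick check:

```python
import numpy as np

a = np.arange(2 * 3 * 4, dtype=np.int64).reshape(2, 3, 4)
t = a.transpose(2, 0, 1)

print(a.strides, t.strides)          # same numbers, permuted
print(np.shares_memory(a, t))        # the transpose is a view
print(t.flags['C_CONTIGUOUS'])       # no longer laid out row-major

c = np.ascontiguousarray(t)          # this is where data actually moves
print(np.shares_memory(a, c), c.strides)
```

So whether a later operation sees linear memory access depends on whether it makes such a contiguous copy first or walks the strided view directly.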

odd meteor
# upbeat linden I’ve done a lot of research and the answer has pretty much been a toss up. PyTor...

I started with TensorFlow because it appeared somewhat more beginner-friendly to me, however, since I tried PyTorch I never went back to TensorFlow.

Idk how to explain it lol but PyTorch gives you this liberty to explore and navigate your ship however you deem fit. So long as you enjoyed OOP when you learned Python, you will most likely enjoy PyTorch.

More so, you could as well leverage PyTorch Lightning; which is more like a wrapper for PyTorch models, to even go brrrrrrrrrrrrrr while driving your ship.

I'd say, just start with whichever framework comes easy to you. It doesn't even have to make sense why you pick one over the other, just get started already.

I'm sorry to hear about your Dad's stomach issues. Hopefully, what you're trying to build helps him get better.

Good luck 💯

odd meteor
#

There's an error in their definition.

Perceptron uses a threshold function which is a linear function. It becomes MLP when you introduce at least one hidden layer and the activation function changes from a threshold function to a non-linear activation function like ReLU, SWISH, Phish, Tanh etc.

proud wing
#

I've had much better experience w pytorch as well. Have you ever tried building tensorflow from source w TensorRT and CUDA w clang? I have. It's not pleasant.

final kiln
proud wing
#

Training going. Time to go chill:)

final kiln
# final kiln I found this thing: https://github.com/spotty-cloud/spotty

this thing ended up not working out, I tried using the t2.micro just to get it going, it starts the spot instances and takes care of a lot of stuff, but then it gets stuck on some unknown signal that google can't find.

im just gonna code a simple master-worker setup myself using boto3, I can use it to start the spot instance and run a script that starts the training process, the worker will drop messages to an aws queue. spot instances give a 2 min warning, which I can use to save state and restart where I left off on some other spot instance

odd meteor
final kiln
#

y does stuff that sounds simple always turn out to be complicated >.>

#

need to rethink this

topaz turtle
agile owl
#

If I'm just probing to see whether a reinforcement learning algorithm might have merit, how many timesteps should I invest in training before deciding whether the problem is suitable or not?

#

doing anything that involves nvidia driver or cuda toolkit dependencies from source sounds like nightmare fuel

#

I'm lucky if my nvidia-smi wants to come out to play

#

instead of giving me the NVML mismatch error

#

does anyone else get that joy of an error of libcupti not being found so you need to install a 1.x version of torch?

final kiln
#

question: if my spot instance goes down, should I restart from the last epoch or from the last best epoch ?

agile owl
#

given your seed won't it just go in the same direction again

final kiln
#

no, the batches are usually randomized, even with the same seed it wouldn't go in the same direction since it's being initialized at a different starting epoch

#

finally got my first successful master-worker run

#

this was hard work ngl

#

now I gotta make it fault tolerant

agile owl
#

how do you debug if your results are actually different every time you run it

final kiln
#

the right pattern to use is to have mlflow store the state and then draw decisions from that

final kiln
topaz turtle
#

right?

final kiln
#

tho I can see the value of being able to exactly reproduce the results

#

not entirely sure how it could be done with this setup tho

#

guess I'd need to decide on the batches for all epochs a priori and save it somewhere
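
One possible way to avoid storing every batch up front, sketched under the assumption that the shuffling is done with NumPy: derive each epoch's generator from the pair (base seed, epoch number), so a worker that resumes at epoch N draws the same order the interrupted worker would have. `BASE_SEED`, `epoch_batches`, and the sizes are hypothetical names and values.

```python
import numpy as np

BASE_SEED = 1234  # hypothetical; would be logged alongside the run

def epoch_batches(epoch, n_samples, batch_size, base_seed=BASE_SEED):
    """Return the shuffled index batches for one epoch, reproducibly."""
    rng = np.random.default_rng([base_seed, epoch])  # seed depends on the epoch
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# A fresh process resuming at epoch 3 sees the same batches as the original run.
first_run = epoch_batches(3, n_samples=10, batch_size=4)
resumed = epoch_batches(3, n_samples=10, batch_size=4)
print(all(np.array_equal(a, b) for a, b in zip(first_run, resumed)))
```

This only pins down the data order. Weight initialisation, dropout masks, and non-deterministic GPU kernels would need their own handling for a run to be exactly reproducible.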

#

im going for last epoch then

wooden condor
#

i am trying to load a scikit learn model from 2019 via joblib, but i am getting errors (probably because i am using a newer python version now than the one it was created in). I have tried setting up a docker image to replicate the environment from then, but that gave me other errors (some opencv errors). Can anybody help me load this model into a modern version of python and scikit learn? I can pay for it

crisp shuttle
#

Hello everyone.

I wanted to ask, does anyone know any machine learning tutorial/course where they teach how to further train and improve a model in the last remaining percentages of the loss error? I struggle to find even one on YouTube, or any literature about techniques or things to do to improve it further, other than to train it longer or change the learning rate.

serene scaffold
serene scaffold
crisp shuttle
# serene scaffold The reason you're not finding resources to answer that question is that there ar...

Thanks for responding.

Does it have to do with the data itself, or maybe there are too many outliers in the data? When I compare some of the predicted values with their true targets, some are 5 times smaller (when I'm supposed to get 15K it gives me 3K) and some are 2 times bigger than their true targets.

I normalized the features to the same range at first, which didn't help; then I normalized the targets to the same scale as the features to avoid exploding gradients, and I'm still having this issue.

proud wing
topaz turtle
proud wing
#

After 32 dimensions, many of numpy's functions hit a wall

topaz turtle
#

i mean ye, but i'm still trying to generalise this to at least 4 or 5 😭

proud wing
#

you know how right?

topaz turtle
#

not really

proud wing
#

well the more i think about it the more i'm thinking my idea is pretty unique

#

I havent actually seen another example of doing it that way but not saying it hasnt been done

topaz turtle
#

i'm exploring the idea of skipping the expense of the transposition step by performing scattered data access on sub-tensors of each tensor that are large enough to fit into L1 cache (so the non-linear data access due to the transposition of the axes doesn't incur any performance overhead)