#data-science-and-ml | Python | Page 93

blazing vale Dec 21, 2023, 12:15 PM

#

here

small wedge Dec 21, 2023, 12:16 PM

#

what you showed should be a way to do it

>>> t.tensor(0.4)
tensor(0.4000)

buoyant vine Dec 21, 2023, 12:17 PM

#

FacePalm yeah that would work

#

I kept trying torch.Tensor

small wedge Dec 21, 2023, 12:17 PM

#

ah lol

#

hm

#

what are you trying to sum here?

blazing vale Dec 21, 2023, 12:18 PM

#

like how much total sales of each genre is

#

i wanna add global column of every row which has genre as 'Action'

small wedge Dec 21, 2023, 12:19 PM

#

so you want the number of records right?

#

what about .count

blazing vale Dec 21, 2023, 12:20 PM

#

nope

#

i wanna add total sales

#

lemme show u

small wedge Dec 21, 2023, 12:20 PM

#

do you have a sales field?

blazing vale Dec 21, 2023, 12:20 PM

#

yes

#

see this

#

Global is total sales

#

over all regions

#

i wanna add global sales of all the games which fall under the genre'Action'

small wedge Dec 21, 2023, 12:21 PM

#

oh

#

print(df[df['Genre']=='Action']['Global'].sum())

blazing vale Dec 21, 2023, 12:22 PM

#

oh wait we have been equating wrong column all this time

#

this will actually work lol

#

working

#

checks out too

#

thankss

#

136.85 Billion Dollars

#

nice:))

blazing vale Dec 21, 2023, 2:48 PM

#

blazing vale Dec 21, 2023, 2:52 PM

#

small wedge oh

Heyy

#

U there?

small wedge Dec 21, 2023, 3:24 PM

#

blazing vale U there?

Hm?

serene scaffold Dec 21, 2023, 3:25 PM

#

@blazing vale please don't call out specific people. ask a complete question that anyone who knows the answer could start answering.

blazing vale Dec 21, 2023, 3:25 PM

#

serene scaffold @blazing vale please don't call out specific people. ask a complete ques...

Ahh okay

blazing vale Dec 21, 2023, 3:25 PM

#

small wedge Hm?

Yoo

#

https://pym.dev/p/34ha6/

#

Line 167,166,128

#

Can you tell me whats the issue with em?

serene scaffold Dec 21, 2023, 3:28 PM

#

blazing vale Line 167,166,128

you have to say what they currently do, and how it's different from what you want them to do.

blazing vale Dec 21, 2023, 3:28 PM

#

The output is at the end of the code

#

It's giving me a blank output

#

No output for the condition on line 167

#

Sir nedbat told me there is something wrong with line 128, but I can't figure it out

#

I have made it so complex that I can't figure out myself what I'm doing wrong 💀

#

i asked chatgpt. lol it figured out easily. i am so dumbbb

#

it was a minor mistake

agile jackal Dec 21, 2023, 3:33 PM

#

What Hugging Face training data will make anime or cartoon-like art? I have dreamlike-anime, but it isn't often accurate. I am familiar with safetensors.

fresh eagle Dec 21, 2023, 3:33 PM

#

Has anyone played with the new version of Midjourney?

outer widget Dec 21, 2023, 4:56 PM

#

agile jackal what hugging face training data will make anime or cartoonlike art I have dreaml...

By training data, do you mean any other diffusion-based model hosted on HF? Most of these SD models, including dreamlike-anime, were pretrained on very large private datasets. You will hardly find any data that can be used to train a similarly performing model.

candid spruce Dec 21, 2023, 5:17 PM

#

Would anyone be interested in doing a deep learning project with me? It will be a self-dataset-building AI. DM me if interested 😁

long canopy Dec 21, 2023, 5:39 PM

#

anyone here ever deal with interactive graph visualization? e.g., the ability to zoom in or zoom out, or obtain info by hovering over a node

timid kiln Dec 21, 2023, 5:44 PM

#

I found an article the other day that described different methodologies of how output data could be organized, and now I can't find that article. Perhaps you all can help me figure out the appropriate keywords so I can find it again.

The two concepts were one that involved many rows, as opposed to organizing output data into fewer rows and more columns. It's probably easier to show a couple of output tables in CSV format:

```
1,Pressure,50
2,Pressure,60
3,Pressure,70
1,Flow,10
2,Flow,20
3,Flow,30
```
As opposed to:
```
Scenario,Pressure,Flow
1,50,10
2,60,20
3,70,30
```

I'm trying to figure out which is "better", what the pros/cons are for organizing data in either method, and so forth.  I know the answer to this is "it depends" but unless I know what terms to use to search on the internet I won't be able to learn anything about it.

Please let me know if this doesn't make sense and I'll do my best to clarify things more.  Also, please tag me if you respond so I'll get an alert.  Thank you!

long canopy Dec 21, 2023, 5:52 PM

#

i might learn javascript just for the libraries available for graph visualization

past meteor Dec 21, 2023, 5:54 PM

#

timid kiln I found an article the other day that described different methodologies of how o...

https://en.wikipedia.org/wiki/Wide_and_narrow_data

Wide and narrow data

Wide and narrow (sometimes un-stacked and stacked, or wide and tall) are terms used to describe two different presentations for tabular data.
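Those terms map directly onto pandas: `pivot` turns narrow (long) data into wide, and `melt` turns it back. A minimal sketch using the two tables from the question; the column names `Variable` and `Value` for the narrow table are assumptions, since that snippet has no header row.

```python
import pandas as pd

# The narrow table from the question (header names Variable/Value are assumed)
narrow = pd.DataFrame({
    "Scenario": [1, 2, 3, 1, 2, 3],
    "Variable": ["Pressure"] * 3 + ["Flow"] * 3,
    "Value": [50, 60, 70, 10, 20, 30],
})

# Narrow -> wide: one row per scenario, one column per variable
wide = (
    narrow.pivot(index="Scenario", columns="Variable", values="Value")
    .reset_index()
    .rename_axis(columns=None)
)

# Wide -> narrow again: one row per (scenario, variable) pair
back = wide.melt(id_vars="Scenario", var_name="Variable", value_name="Value")

print(wide)
```

Roughly: the wide form is convenient for reading and for per-row arithmetic across variables, while the narrow form makes it easy to add new variables without changing the schema and is what `groupby` and most plotting libraries prefer.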

spare junco Dec 21, 2023, 6:13 PM

#

We are trying to analyse this huge dataset, which we want to use to predict the number of days a patient's treatment will take.

We have tried multiple models like GradientBoost, CatBoost, LinearRegression, LGBM, XGB etc.

We have also done feature selection, including mutual-information feature selection, correlation, and variance threshold, plus other steps such as normalization and outlier detection.

We have had the best results with the Gradient Boosting Regressor. We have done some hyperparameter tuning via GridSearchCV.

What is our next best step to reduce the root mean squared error (RMSE)?

The dataset looks like this 👇

past meteor Dec 21, 2023, 6:32 PM

#

spare junco We are trying to analyse this huge dataset, which is supposed to predict the num...

You actually have to look at the data imo

#

Imo it's something you can't do with summary statistics etc. I'd try and understand what you're working with first

#

This will lead you to feature engineering etc.

outer widget Dec 21, 2023, 6:33 PM

#

long canopy anyone here ever deal with interactive graph visualization? e.g., the ability to...

https://github.com/magjac/d3-graphviz maybe this? I have used it for visualization purposes. Pretty cool. Also allows render/zoom features.

past meteor Dec 21, 2023, 6:33 PM

#

Regularization is also an option you can pursue if you have a lot of (useless) variables. In my experience Extra trees works well if you have many correlated predictors as well.

long canopy Dec 21, 2023, 6:34 PM

#

outer widget https://github.com/magjac/d3-graphviz maybe this? I have used it for visualizati...

yeah, I'm investigating non-Python alternatives at the moment

#

literally nothing for interactive networks in python atm

#

am investigating whether the Gephi Toolkit is worthwhile and more efficient than javascript approaches; the toolkit uses OpenGL

blazing vale Dec 21, 2023, 6:35 PM

#

```py
def maxs2(genre,region):
    print(df[(df[('Genre')]==genre)&(df[region]==df[region].max())])
```

#

returning empty dataframe

#

just like my brain

small wedge Dec 21, 2023, 6:35 PM

#

max doesn't return a mask

blazing vale Dec 21, 2023, 6:36 PM

#

small wedge max doesn't return a mask

but i used == before it. wont it check?

outer widget Dec 21, 2023, 6:36 PM

#

long canopy literally nothing for interactive networks in python atm

I think pyvis is also good for visualisation if you are using notebooks/Python. I can't recall if it supports interactive visuals or not.

blazing vale Dec 21, 2023, 6:37 PM

#

Maximum and Minimum Sales made by each Genre across each Region and ROW Sales

#

trying to do this now
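For "maximum and minimum sales by genre across each region", a single `groupby` with two aggregations may be enough. A minimal sketch, assuming the region columns are named as in the `Empty DataFrame` output in this thread (with "Rest of World" as the ROW column); the rows themselves are made up.

```python
import pandas as pd

# Hypothetical sample rows; the column names follow the dataset's region columns
df = pd.DataFrame({
    "Game": ["A", "B", "C", "D"],
    "Genre": ["Action", "Action", "Sports", "Sports"],
    "North America": [5.0, 3.0, 9.0, 1.0],
    "Europe": [2.0, 4.0, 1.0, 6.0],
    "Japan": [0.5, 0.1, 0.3, 0.2],
    "Rest of World": [1.0, 0.5, 0.7, 0.4],
})

regions = ["North America", "Europe", "Japan", "Rest of World"]

# One row per genre; columns are (region, "min") and (region, "max") pairs
summary = df.groupby("Genre")[regions].agg(["min", "max"])
print(summary)
```

A single value can then be read with e.g. `summary.loc["Action", ("North America", "max")]`.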

small wedge Dec 21, 2023, 6:37 PM

#

blazing vale but i used == before it. wont it check?

oh I see my bad I think it should be fine

blazing vale Dec 21, 2023, 6:37 PM

#

it returns empty dataframe 😭

#

Enter genre: Action
Enter Region: North America
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []

#

output
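One likely reason the mask can come back empty: `df[region].max()` is the maximum of that region over *all* genres, so the `&` only matches when the overall best seller in that region happens to belong to the requested genre. A sketch that takes the maximum within the genre instead; `maxs2_fixed` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the Empty DataFrame output above
df = pd.DataFrame({
    "Game": ["A", "B", "C"],
    "Genre": ["Action", "Action", "Sports"],
    "North America": [5.0, 3.0, 9.0],
})

def maxs2_fixed(df, genre, region):
    # Filter to the genre first, then compare against the max *within* that genre
    subset = df[df["Genre"] == genre]
    return subset[subset[region] == subset[region].max()]

# The original mask: genre matches AND sales equal the max over ALL genres
# (9.0 here, which belongs to a Sports game), so nothing matches
original = df[(df["Genre"] == "Action") & (df["North America"] == df["North America"].max())]
print(original.empty)
print(maxs2_fixed(df, "Action", "North America"))
```

Comparing with `==` against the subset's max keeps every row tied for first place, mirroring the style of the original function.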

long canopy Dec 21, 2023, 6:39 PM

#

outer widget I think pyvis is also good for visualisation if you are using notebooks/ python....

had a look at vis.js, also seems like a cool library

red solar Dec 21, 2023, 6:39 PM

#

hello everyone.

#

I'm interested in machine learning.

#

So, which one would help me?

outer widget Dec 21, 2023, 6:40 PM

#

spare junco We are trying to analyse this huge dataset, which is supposed to predict the num...

If the end goal is reducing RMSE, maybe try strategies like a CV-based ensemble, if there are no time constraints. If there are few samples, maybe seed averaging for more robust predictions, and more feature engineering?

feral kernel Dec 21, 2023, 6:47 PM

#

Why does everyone use gradient descent with backpropagation to determine weights when there are so many other, faster ways: simulated annealing, batched quasi-Newton methods, genetic algorithms, random weights plus some adjusting, and other metaheuristics? Is it because it is easier, has more papers on it, and is easier to scale and more stable? I feel like to progress even further in ML, people need to move beyond plain backpropagation and toward a new architecture.

spare junco Dec 21, 2023, 6:47 PM

#

outer widget If the end goal is reducing rmse, maybe try out strategies like cv based ensembl...

Could you elaborate on CV based ensemble? (no, there are no constraints on time)

valid wind Dec 21, 2023, 7:56 PM

#

why does BERT base uncased need 12+ GB GPU memory to train

#

when using a standard AdamW optimizer, doesn't it represent each parameter as 8 bytes

#

so 110 million parameters x 8 bytes = 880 MB, + 1x for gradients + 1x for optimizer state
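A rough back-of-the-envelope version of that arithmetic, assuming plain fp32 training (4 bytes per value) and an AdamW that keeps two state tensors per parameter (the running first and second moments). Under those assumptions the static footprint is weights + gradients + 2 optimizer states, about 16 bytes per parameter. This is an estimate, not a measurement.

```python
def static_training_memory_gb(n_params: int, bytes_per_value: int = 4,
                              optimizer_states: int = 2) -> float:
    """Rough size of weights + gradients + optimizer state, in GB (1e9 bytes).

    Assumes every tensor is stored in fp32 and an Adam/AdamW-style optimizer
    that stores `optimizer_states` extra tensors the size of the weights.
    """
    copies = 1 + 1 + optimizer_states  # weights, gradients, optimizer states
    return n_params * bytes_per_value * copies / 1e9

print(static_training_memory_gb(110_000_000))  # 1.76
```

So the static part is under 2 GB. Most of the rest of a 12+ GB run is plausibly activations saved for the backward pass, which grow with batch size and padded sequence length (attention scores grow quadratically with length), plus the CUDA context and PyTorch's caching allocator, which makes `nvidia-smi` report reserved rather than currently used memory.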

serene scaffold Dec 21, 2023, 8:01 PM

#

@valid wind show your training code

#

and in particular, when you move the training data onto the gpu

valid wind Dec 21, 2023, 8:01 PM

#

yeah let me show it

#

```py
import torch
import transformers
import datasets
from datasets import ClassLabel
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# train_df (a pandas DataFrame) and compute_metrics are defined elsewhere
tokenizer=transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
model=transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
train_dataset = datasets.Dataset.from_pandas(train_df, preserve_index=False)


def tokenize_texts(texts):
    global tokenizer
    q1rows=texts['question1']
    q2rows=texts['question2']
    # Encode each question pair as one sequence; padding is left to the collator
    return tokenizer(q1rows, q2rows, truncation=True)



tokenized_data = train_dataset.map(tokenize_texts, batched=True)
cast_features = tokenized_data.features.copy()
cast_features['is_duplicate'] = ClassLabel(num_classes=2, names=['not_duplicate', 'duplicate'], names_file=None, id=None)
# Note: cast_features is never applied (e.g. via tokenized_data.cast(cast_features))
tokenized_data=tokenized_data.remove_columns(['question1','question2', 'id', 'qid1', 'qid2'])
tokenized_data=tokenized_data.rename_column('is_duplicate', 'labels')
tokenized_data=tokenized_data.train_test_split(test_size=0.2)
training_args = TrainingArguments("./quora-bert", evaluation_strategy="epoch", save_strategy='no', report_to='none', num_train_epochs=3, per_device_train_batch_size=32, per_device_eval_batch_size=32)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data['train'],
    eval_dataset=tokenized_data['test'],
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    tokenizer=tokenizer,
)
trainer.train()  # call the method; without the parentheses nothing is trained
```

serene scaffold Dec 21, 2023, 8:03 PM

#

please edit it to include py on the first line with the backticks

#

```py
code
```

#

ty

valid wind Dec 21, 2023, 8:03 PM

#

sorry I forgot to add that

serene scaffold Dec 21, 2023, 8:06 PM

#

@valid wind
you don't need global to read the global scope. only to write to it. so global tokenizer is unnecessary.

try lowering the batch size and see if you don't run out of GPU memory.
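A tiny illustration of the point about `global` (the names here are hypothetical): reading a module-level name from inside a function just works, and the declaration is only needed when the function rebinds that name.

```python
counter = 0

def read_counter():
    # Reading a module-level name: no `global` declaration needed
    return counter

def increment_counter():
    # Rebinding a module-level name: `global` is required; without it the
    # assignment makes `counter` local and `counter += 1` raises UnboundLocalError
    global counter
    counter += 1

increment_counter()
print(read_counter())  # 1
```

`tokenize_texts` only reads `tokenizer`, so it falls in the first case.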

valid wind Dec 21, 2023, 8:06 PM

#

No, I don't run out of GPU memory training this on 16 GB. However, when I run my own training loop, even with a batch size of 32, it only requires around 12 GB of memory.

serene scaffold Dec 21, 2023, 8:07 PM

#

!paste

arctic wedgeBOT Dec 21, 2023, 8:07 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.

valid wind Dec 21, 2023, 8:07 PM

#

https://paste.pythondiscord.com/KJQQ

serene scaffold Dec 21, 2023, 8:08 PM

#

looks like your paste messed up the indentation.

valid wind Dec 21, 2023, 8:11 PM

#

yep, fixing it now

#

but it might be just because I'm clearing my cuda cache

#

@serene scaffold https://paste.pythondiscord.com/DIVA

#

this should be better

serene scaffold Dec 21, 2023, 8:13 PM

#

gotta do a work thing. I should be back in like ten minutes

valid wind Dec 21, 2023, 8:14 PM

#

serene scaffold gotta do a work thing. I should be back in like ten minutes

np, thanks for helping me

serene scaffold Dec 21, 2023, 8:31 PM

#

```py
# Run the forward pass of the model
logits = model(
    input_ids=batch['ids'].to(device, dtype=torch.long),
    attn_mask=batch['mask'].to(device, dtype=torch.long),
    pred_indicator=batch['pred'].to(device, dtype=torch.long),
)
loss = loss_function(logits.transpose(2,1), batch['targets'].to(device, dtype=torch.long))
```

@valid wind this potentially saves memory because it never creates extra references to all those cuda tensors in that scope

#

you also have a syntax error on line 33

valid wind Dec 21, 2023, 8:32 PM

#

serene scaffold ```py # Run the forward pass of the model logits = model( ...

that makes sense

valid wind Dec 21, 2023, 8:32 PM

#

serene scaffold you also have a syntax error on line 33

and yeah the code runs, I must have done something while fixing the indentation

#

yeah it was an error while pasting it or something

#

that's not in the original code

long canopy Dec 21, 2023, 8:56 PM

#

how come no one suggested Bokeh to me

#

it does exactly what I need

desert oar Dec 21, 2023, 9:04 PM

#

blazing vale ```py def maxs2(genre,region): print(df[(df[('Genre')]==genre)&(df[region]==...

can you give a minimal example of your data? are you looking for the row with the max sales for the given genre, in the given region?

#

normally i'd recommend just 3 columns here ("long" format): genre, region, and sales. instead it looks like you have a separate column for sales in every region ("wide" format).

#

if you set up your data with indexes, you can do this as simply as df.loc[(genre, region), "sales"].max() or similar

#

but that requires setting up your data correctly, which requires you to share information about how you constructed this data
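A sketch of the long-format setup described above. It assumes the wide column names from the `Empty DataFrame` output earlier (only two regions kept for brevity) and made-up sales numbers.

```python
import pandas as pd

# Hypothetical miniature of the wide table
wide = pd.DataFrame({
    "Game": ["A", "B", "C"],
    "Genre": ["Action", "Action", "Sports"],
    "North America": [5.0, 3.0, 9.0],
    "Europe": [2.0, 4.0, 1.0],
})

# Wide -> long: one row per (game, region) with a single Sales column
long = wide.melt(id_vars=["Game", "Genre"], var_name="Region", value_name="Sales")

# Index by (Genre, Region); sorting the MultiIndex keeps .loc lookups fast
indexed = long.set_index(["Genre", "Region"]).sort_index()

# Max sales for one genre in one region is now a single lookup
print(indexed.loc[("Action", "North America"), "Sales"].max())  # 5.0
```

With this layout, "max per genre per region" for everything at once becomes `long.groupby(["Genre", "Region"])["Sales"].max()`.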

desert oar Dec 21, 2023, 9:07 PM

#

spare junco We are trying to analyse this huge dataset, which is supposed to predict the num...

i second what zestar said. all the fancy machine learning algo stuff comes after you've done a thorough inspection and analysis of the data. you will also want to form a coherent understanding of how the data was collected and how these measurements were obtained. otherwise you're just flailing around, which doesn't actually work in most cases, despite the breathless hype of companies that want to sell you machine learning APIs, cloud compute, etc.

long canopy Dec 21, 2023, 9:12 PM

#

how could I programmatically control the zooming/centering of a pyplot graph while it is actually being rendered? in a sort of REPL fashion

desert oar Dec 21, 2023, 9:17 PM

#

@long canopy you mean like control these things programmatically instead of in the gui? https://matplotlib.org/stable/users/explain/figure/interactive.html

long canopy Dec 21, 2023, 9:18 PM

#

desert oar @long canopy you mean like control these things programmatically instea...

ah! this is it, thank you
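A minimal sketch of the REPL workflow, assuming interactive mode via `plt.ion()`: zooming or re-centering amounts to changing the axis limits and asking the canvas to redraw. The `zoom` helper and the sine data are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.ion()  # interactive mode: figures update without blocking the REPL

fig, ax = plt.subplots()
x = np.linspace(0, 10, 200)
ax.plot(x, np.sin(x))

def zoom(ax, center_x, center_y, half_width, half_height):
    # Zooming/centering is just a change of axis limits around a center point
    ax.set_xlim(center_x - half_width, center_x + half_width)
    ax.set_ylim(center_y - half_height, center_y + half_height)
    # Ask the canvas to redraw at the next opportunity
    ax.figure.canvas.draw_idle()

zoom(ax, center_x=5, center_y=0, half_width=1, half_height=0.5)
```

Calling `zoom(...)` again from the prompt re-centers the open figure.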

blazing vale Dec 21, 2023, 10:03 PM

#

desert oar can you give a minimal example of your data? are you looking for the row with th...

thanks for the reply. It's actually my dumb mistake. I checked the whole dataset in Excel and found out there were no suitable matches for the info I was giving as input lol

#

So basically I am working on this big dataset which has 826 rows and 9 columns. I made a small search system using pandas to access rows according to user input. However, suppose there are no matches. How can I make it so that, instead of returning the output below when there are no results, it just prints "No results found"?
```
Empty DataFrame
Columns: [Game, Year, Genre, Publisher, North America, Europe, Japan, Rest of World, Global]
Index: []
```

#

anyone?
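`DataFrame.empty` is the usual check for "no rows". A minimal sketch, where `format_results` is a hypothetical helper and `results` stands for whatever filtered frame the search produces:

```python
import pandas as pd

def format_results(results: pd.DataFrame) -> str:
    # .empty is True when the frame has no rows (or no columns)
    if results.empty:
        return "No results found"
    return results.to_string()
```

In the search code this would look like `print(format_results(df[mask]))` in place of `print(df[mask])`.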

left tartan Dec 21, 2023, 11:59 PM

#

Scraping google most certainly violates their terms of use (something we can't help with). Have you considered one of their APIs? Or open street map?

desert oar Dec 22, 2023, 12:11 AM

#

Note that Google maps specifically prohibits caching or storing their outputs for use in your own database

#

People do it all the time of course, but as per server rules we officially are not allowed to help with anything resembling that

#

what kind of data specifically? maybe open street map / nominatim or geonames can help

#

i mean, if you're just looking to practice, you can probably use google maps, yelp, foursquare, bing, facebook, etc

#

openstreetmap has some of that but it depends on volunteer input

#

well, writing a python app to fetch data from an api is good practice. not related to the channel topic, but good practice all the same

outer widget Dec 22, 2023, 5:08 AM

#

spare junco Could you elaborate on CV based ensemble? (no, there are no constraints on time)

I am assuming you have a defined split for cross-validation. In that case, you can try blending predictions from multiple experiments. You can also do a weighted ensemble, w1 × pred1 + w2 × pred2 + ... + wn × predn, where the weights could be tuned with a search algorithm, e.g. grid search or Optuna. An ensemble becomes more effective when you have diverse predictions (low correlation).
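A sketch of the weighted-ensemble idea, using a plain grid search in place of Optuna. It assumes you already have out-of-fold (OOF) predictions from each model's CV run stacked as the columns of one array; `best_blend_weights` is a hypothetical helper.

```python
import numpy as np
from itertools import product

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def best_blend_weights(oof_preds, y_true, step=0.1):
    """Grid-search non-negative weights summing to 1 for blending OOF predictions.

    oof_preds: array of shape (n_samples, n_models), one column per model.
    """
    n_models = oof_preds.shape[1]
    grid = np.round(np.arange(0.0, 1.0 + step / 2, step), 10)
    best_w, best_score = None, np.inf
    for w in product(grid, repeat=n_models):
        if not np.isclose(sum(w), 1.0):
            continue  # only consider convex combinations
        w = np.array(w)
        score = rmse(y_true, oof_preds @ w)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

The grid grows as (1/step + 1) ** n_models, so this is only practical for a handful of models; beyond that, a sampler such as Optuna's is the more scalable route. Weights tuned on the same OOF predictions they are scored on can overfit slightly, so a held-out check is worthwhile.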

rugged mist Dec 22, 2023, 6:27 AM

#

i have an np ndarray B of shape (N, K, K), and a of shape (N,)
i want to find the weighted sum of the N KxK matrices in B, where a stores the weights
np.dot(B.T, a).T seems to work on its own but if i decorate the function its in with numba.njit i get this error

Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function dot>) found for signature:
 
 >>> dot(array(float64, 3d, F), array(float64, 1d, C))

im guessing its because B.T becomes fortran style because of the transpose, can i make np.dot work without the transpose or do i need to write a loop myself

#

minimum repro

import numpy as np
from numba import njit

@njit  # works if this is removed
def f():
    a = np.ones((10,))
    B = np.ones((10, 30, 30))
    return np.dot(B.T, a).T

print(f())

outer widget Dec 22, 2023, 6:37 AM

#

rugged mist i have an np ndarray `B` of shape (N, K, K), and `a` of shape (N,) i want to fin...

I think you will need to write an explicit loop in place of the B transpose; numba should be able to optimize that. Quite straightforward.

rugged mist Dec 22, 2023, 6:38 AM

#

sad

untold bloom Dec 22, 2023, 7:34 AM

#

was np.einsum("i,ijk->jk", a, B) too slow?

rugged mist Dec 22, 2023, 8:01 AM

#

looks like numba doesnt support einsum
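Two transpose-free ways to write the weighted sum sum_i a[i] * B[i], shown in plain NumPy and checked against the `einsum` version. The explicit loop is the form suggested above and is the kind of code numba compiles well. Whether `@njit` also accepts the reshape-plus-`@` version depends on numba's `np.dot` support for contiguous 1-D × 2-D float arrays (which relies on SciPy's BLAS); that is an assumption not verified here.

```python
import numpy as np

def weighted_sum_reshape(a, B):
    # Collapse the trailing KxK axes so the contraction is a plain
    # (N,) @ (N, K*K) vector-matrix product; no transpose, so B stays C-contiguous
    n, k1, k2 = B.shape
    return (a @ B.reshape(n, k1 * k2)).reshape(k1, k2)

def weighted_sum_loop(a, B):
    # Explicit accumulation: out = sum_i a[i] * B[i]
    out = np.zeros(B.shape[1:], dtype=B.dtype)
    for i in range(B.shape[0]):
        out += a[i] * B[i]
    return out
```

Outside numba, `np.tensordot(a, B, axes=1)` computes the same (K, K) result in one call.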

hollow flicker Dec 22, 2023, 11:35 AM

#

I have a dataset. For example, it contains age, money, job, salary, etc. How can I give each user a score between 0 and 1? Unfortunately, I don't have target data.

radiant cipher Dec 22, 2023, 1:52 PM

#

Hi, is anyone aware of a pre-implemented, fast way to have NumPy materialize an element's path when one array column holds the parent indexes?

aka given something like

tree_array = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9]

When asking for the path of index 9, it should return [4, 1, 0], and when asking for 19 it's [9, 4, 1].

It's an easy Python loop, but I'm working with about a dozen million elements, and I would like a pipeline that extracts the path, gets the indexes of another array based on the path, and returns that. I can't find a premade implementation of that type of tree walking.
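One way to avoid a per-node Python loop (a different technique from the numba suggestion in the replies): walk all queried nodes up the tree together, one vectorized gather per level, so the Python loop runs once per tree level rather than once per node. It assumes the root is its own parent (as `tree_array[0] == 0` suggests) and that there are no other cycles; `ancestor_paths` is a hypothetical name. Under that reading, index 9 gives [4, 1, 0] and index 19 gives [9, 4, 1, 0]; the message above lists [9, 4, 1] for 19, so adjust the stopping rule if a fixed depth was intended.

```python
import numpy as np

def ancestor_paths(parents, queries, root=0):
    """Ancestors of each query node, nearest first, as a 2-D array padded with -1.

    Assumes parents[i] is the parent of node i, parents[root] == root,
    and the parent graph has no other cycles.
    """
    parents = np.asarray(parents)
    current = np.asarray(queries).copy()
    active = current != root  # queries that still have ancestors to visit
    steps = []
    while active.any():
        # Move every still-active query one level up in a single gather
        current = np.where(active, parents[current], current)
        steps.append(np.where(active, current, -1))
        active = current != root
    if not steps:
        return np.full((current.size, 0), -1)
    return np.stack(steps, axis=1)
```

The result can index another array directly, e.g. `other[np.clip(paths, 0, None)]`, then masked with `paths >= 0` to drop the padding. Memory is one integer per query per level.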

red oriole Dec 22, 2023, 3:42 PM

#

Do any of you guys know how to setup a local dbpedia triple store?

serene scaffold Dec 22, 2023, 3:53 PM

#

radiant cipher hi,anyone aware of a pre-implemented fast way to have numpy materialize a eleme...

I'm not sure that I fully understand the desired behavior, but if it's easy to solve with a python loop, is there a reason why numba isn't viable?

#

Take a look at the docs here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization
It looks like there's a standardize keyword argument that you can use to change that behavior.

TensorFlow

tf.keras.layers.TextVectorization | TensorFlow v2.14.0

A preprocessing layer which maps text features to integer sequences.

#

Optional specification for standardization to apply to the input text. Values can be:
None: No standardization.
"lower_and_strip_punctuation": Text will be lowercased and all punctuation removed.
"lower": Text will be lowercased.
"strip_punctuation": All punctuation will be removed.
Callable: Inputs will passed to the callable function, which should be standardized and returned.

outer widget Dec 22, 2023, 5:51 PM

#

Keras TextVectorization does have a parameter named standardize. By default it's set to standardize='lower_and_strip_punctuation'.
You can set it to None to keep the same case.

#

Sorry missed Stelercus response.

serene scaffold Dec 22, 2023, 6:52 PM

#

outer widget Sorry missed Stelercus response.

By the mouth of two shall every word be established.

radiant cipher Dec 22, 2023, 10:33 PM

#

serene scaffold I'm not sure that I fully understand the desired behavior, but if it's easy to s...

im looking for a preexisting optimized tree walker,

serene scaffold Dec 22, 2023, 10:34 PM

#

radiant cipher im looking for a preexisting optimized tree walker,

sorry that I don't have the answer Sadge I really like pytest, though

final kiln Dec 23, 2023, 1:00 AM

#

class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    sentence = sentence @ self.lm_weights


    # last bit ... 

    return sentence

I'm almost done !! only the logits thing left

#

after this im gonna teach it to sort letters

final kiln Dec 23, 2023, 1:29 AM

#

class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    logits = sentence @ self.lm_weights

    max_values, max_indices = logits.max(dim=2)
    shifted = logits - max_values.unsqueeze(2)
    exponentiated = torch.exp(shifted)
    return torch.sum(exponentiated, dim = 2)

aight imma train it now

final kiln Dec 23, 2023, 2:04 AM

#

class NanoGPT(nn.Module):

  def __init__(self, params):
    super(NanoGPT, self).__init__()
    self.sequence_encoder = SequenceEncoder(params)
    self.transformer_1 = Transformer(params)
    self.transformer_2 = Transformer(params)
    self.transformer_3 = Transformer(params)
    self.norm = LayerNormalization(params)
    self.lm_weights = RandParameter(params.coordinates, params.tokens)

  def forward(self, sequence):
    sentence: Float[Tensor, "words coordinates"] = self.sequence_encoder(sequence)
    sentence = self.transformer_1(sentence)
    sentence = self.transformer_2(sentence)
    sentence = self.transformer_3(sentence)
    sentence = self.norm(sentence)
    logits = sentence @ self.lm_weights

    max_values, max_indices = logits.max(dim=2)
    shifted = logits - max_values.unsqueeze(2)
    exponentiated = torch.exp(shifted)
    probs = exponentiated / torch.sum(exponentiated, dim = 2).unsqueeze(2)
    return probs



params = ModelParameters(
  # The dimension of a vector embedding
  coordinates = 3*1000,

  # The number of tokens in the vocabulary
  tokens = 3,

  # The maximum number of words in a sentence (context window)
  words = 10,
)



def generate_data(batches = 2000):
  for _ in range(30):
    sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
    sorted_matrix, sorted_indices = torch.sort(sequence, dim=1)  # Sort along columns
    encoding = torch.eye(params.tokens).to("cuda")
    yield sequence, encoding[sorted_matrix]

nanoGPT = NanoGPT(params).to("cuda")
optimizer = torch.optim.Adam(nanoGPT.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
torch.autograd.set_detect_anomaly(True)
for epoch in range(100):
  nanoGPT.train()

  for batch, targets in generate_data():
    optimizer.zero_grad()
    inputs = batch.to("cuda")
    outputs = nanoGPT(inputs)
    loss = loss_function(outputs, targets.to("cuda"))
    loss.backward()
    optimizer.step()
  print(loss)

#

is not learning very well 🙃

#

i got it i got it, wasn't actually creating batches:

sequence: Int[Tensor, "batches words"] = torch.randint(0, params.tokens, (1, params.words,)).to("cuda")
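Separately from the batching fix, one more thing worth checking in the loop above: `nn.CrossEntropyLoss` applies log-softmax to its input itself, so it expects raw logits (with the class dimension at dim 1, i.e. `(batch, tokens, words)` for sequence outputs), whereas `forward` returns probabilities. The NumPy sketch below shows the stable softmax that `forward` hand-rolls, and a cross-entropy computed directly from logits via log-sum-exp; both function names are hypothetical.

```python
import numpy as np

def stable_softmax(logits):
    # Subtract the per-row max before exponentiating so exp() cannot overflow;
    # this is the same shift the forward pass above performs by hand
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, targets):
    # logits: (batch, words, tokens); targets: (batch, words) integer class ids
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return float(-picked.mean())
```

Feeding even a perfect one-hot probability vector [1, 0, 0] back in as "logits" gives a loss of -log(e / (e + 2)) ≈ 0.55 for 3 tokens, so the loss can never approach 0. In PyTorch terms, returning `logits` and calling `loss_function(outputs.transpose(1, 2), sorted_matrix)` with integer targets would presumably avoid the double softmax.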

onyx forge Dec 23, 2023, 3:22 AM

#

Hello, I just got into AI and genetic learning, so I made this messy, rushed prototype for testing and getting a hold of the basics, but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it. Could someone take a look and point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ

feral kernel Dec 23, 2023, 8:30 AM

#

Hey, why does my PyTorch nn.Conv2d model keep telling me "too many values to unpack"? I changed the code to accept 4 channels, but it still says it expected two channels.

whole zephyr Dec 23, 2023, 8:54 AM

#

feral kernel Hey , why does pytorch nn conv2d model keep telling me too many values to unpac...

can you send a code snippet and mark the line where it gives you this error (mark it with a comment)? (I don't have pytorch currently installed, but I worked with it a bit)

feral kernel Dec 23, 2023, 10:24 AM

#

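```py
import numpy as np
import torch
import torch.nn as nn

model = BesselCNN()  # BesselCNN is defined elsewhere in the notebook
criterion = nn.CrossEntropyLoss()  # not used below; custom_complex_loss is used instead
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01)

# Define the custom loss function for complex numbers
def custom_complex_loss(input, target):
    # Compares magnitudes only, so phase differences do not contribute
    input_magnitude = torch.abs(input)
    target_magnitude = torch.abs(target)
    loss = torch.mean((target_magnitude - input_magnitude) ** 2)
    return loss

def train_model(early_stopping=True, epochs=50):
    # net and train_loader are defined elsewhere; net must be the same object
    # as model, otherwise the optimizer updates parameters net never uses
    epoch_train_losses = []
    epoch_test_losses = []
    current_best_loss = np.inf
    early_stopping_counter = 0
    early_stopping_patience = 2

    for epoch in range(epochs):
        for i, batch in enumerate(train_loader, 0):
            real_inputs, imag_inputs, real_labels, imag_labels = batch
            inputs = torch.complex(real_inputs, imag_inputs)
            labels = torch.complex(real_labels, imag_labels)

            # LBFGS may re-evaluate the loss several times per step,
            # so optimizer.step() needs a closure that recomputes it
            def closure():
                optimizer.zero_grad()
                outputs = net(inputs)
                loss = custom_complex_loss(outputs, labels)
                loss.backward()
                return loss

            loss = optimizer.step(closure)
```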

feral kernel Dec 23, 2023, 10:24 AM

#

feral kernel ```model = BesselCNN() criterion = nn.CrossEntropyLoss() optimizer = torch.optim...

```
Cell In[97], line 51
     48                     break
     50 # Assuming net, train_loader, test_loader, optimizer, and device are defined
---> 51 train_model(early_stopping=True, epochs=50)

Cell In[97], line 21, in train_model(early_stopping, epochs)
     19 for epoch in range(epochs):  
     20     for i, batch in enumerate(train_loader, 0):
---> 21         real_inputs, imag_inputs, real_labels, imag_labels = batch
     22         inputs = torch.complex(real_inputs, imag_inputs)
     23         labels = torch.complex(real_labels, imag_labels)

ValueError: too many values to unpack (expected 4)```

feral kernel Dec 23, 2023, 10:28 AM

#

whole zephyr can you send a code snippet and mark the line where it gives you this error (mar...

torch.Size([4, 1, 500, 500]). Also, my data has been Fourier transformed. It should be 4, so why does it say it expects 2? Maybe somewhere it should have 4 inputs but it has two. I changed it to four, but it didn't accept that, so I used torch.complex.

whole zephyr Dec 23, 2023, 10:42 AM

#

ok, so you can try to do print(type(batch)) and print(len(batch))

#

then print(type(batch[0])) and len(batch[0])

#

maybe batch contains some iterables like list or tuple or something

#

you need to see what your train_loader does to your data and see the shape of its outputs or something

#

@feral kernel basically the whole process of debugging this is to go to the source of your data (in your case, batch) and see how it is processed and why is it the shape it is

do some detective work and like go back on the stages of your data flow, see what happened at every stage with a print at its end - that's what I do to identify problems and their causes when I debug my code

feral kernel Dec 23, 2023, 10:48 AM

#

whole zephyr @feral kernel basically the whole process of debugging this is to go to ...

I did that already, still the same error. I know I need to change the batch so 4 dimensions will fit, but then I get another error. ChatGPT is not helping at all, same error over and over. I went to the source of the error and tried to change the inputs, but for some reason it's still the same error.

feral kernel Dec 23, 2023, 10:49 AM

#

whole zephyr ok, so you can try to do print(type(batch)) and print(len(batch))

I already printed the batch and the shape. I asked GPT to change the shape of the data to fit, but for some reason it can't do it.

whole zephyr Dec 23, 2023, 10:49 AM

#

The error message "too many values to unpack" happens when you tried to pull out more elements from the tuple than existed.

#

so your train_loader doesn't do what you would want it to do

#

so rather than reshaping the batch, see what your train_loader does
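A minimal sketch of the mechanism behind that advice: each batch is whatever the Dataset's `__getitem__` returns, collated by `DataLoader`. If `__getitem__` returns a single tensor, each batch is one tensor of shape `(batch_size, ...)`, and unpacking it into four names iterates over the batch dimension, which is one way to get "too many values to unpack (expected 4)". Returning a tuple makes each batch a sequence with one batched tensor per tuple element. `PairedComplexDataset` and the random complex data are hypothetical stand-ins; the real labels would come from wherever the targets are stored.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class PairedComplexDataset(Dataset):
    """Hypothetical dataset: each item is an (input, label) pair of complex tensors."""

    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # Returning a tuple makes the default collation yield
        # [batched_inputs, batched_labels] for each batch
        return self.inputs[idx], self.labels[idx]

# Synthetic stand-in data: 10 complex 1x8x8 "images" and matching labels
inputs = torch.randn(10, 1, 8, 8, dtype=torch.cfloat)
labels = torch.randn(10, 1, 8, 8, dtype=torch.cfloat)
loader = DataLoader(PairedComplexDataset(inputs, labels), batch_size=4)

for batch_inputs, batch_labels in loader:
    print(batch_inputs.shape, batch_labels.shape)

# For contrast: unpacking one batched tensor iterates over its first dimension
single_tensor_batch = torch.zeros(32, 1, 8, 8)
try:
    a, b, c, d = single_tensor_batch
except ValueError as err:
    print("ValueError:", err)
```

Since the data is already complex (cfloat), returning complex tensors directly also removes the need for the `torch.complex(real, imag)` step.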

feral kernel Dec 23, 2023, 10:50 AM

#

whole zephyr The error message "too many values to unpack" happens when you tried to pull out...

I know that; I don't understand why GPT can't change the batch loader to accept 4 dimensions. I need to do it manually lol

whole zephyr Dec 23, 2023, 10:50 AM

#

coz gpt is a funny guy, it's generative - i.e. creative

#

creativity is not always accuracy so yeah if it gets stuck in the same idea of bad code, it's over

#

is the train_loader a function or something?

#

and is it something from a library or custom defined?

feral kernel Dec 23, 2023, 10:55 AM

#

whole zephyr creativity is not always accuracy so yeah if it gets stuck in the same idea of b...

It is not very creative, it just regurgitates and mixes info

#

It is the input after it has been resized. It's just some cfloat tensors

whole zephyr Dec 23, 2023, 10:57 AM

#

feral kernel It is the input after it has been resized. It is just some cfloat tensors

?

#

that's the train_loader?

feral kernel Dec 23, 2023, 10:59 AM

#

I'm not home, I need to check the code… the train_loader processes and loads the tensors

feral kernel Dec 23, 2023, 11:10 AM

#

whole zephyr that's the train_loader?

train_dataset = CustomTensorDataset(root_dir=train_root, transform=resize_transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

whole zephyr Dec 23, 2023, 11:14 AM

#

see what the DataLoader does

#

maybe that's the issue

#

so basically the train_loader variable is a list or something

#

a list of lists or tuples with 2 elements

#

and you want it to be of 4 elements

#

where is the DataLoader function from? some pytorch component?

feral kernel Dec 23, 2023, 11:19 AM

#

whole zephyr where is the DataLoader function from? some pytorch component?

I'm not home, but I have some histories loaded.

import os

import torch
from torch.utils.data import Dataset

class CustomTensorDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.file_names = [f for f in os.listdir(root_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.file_names)

    def __getitem__(self, idx):
        file_path = os.path.join(self.root_dir, self.file_names[idx])
        tensor = torch.load(file_path)

        # Ensure the tensor has a channel dimension
        tensor = tensor.unsqueeze(0) if tensor.dim() == 2 else tensor

        if self.transform:
            tensor = self.transform(tensor)

        # Only the input tensor is returned (no label), so each DataLoader batch is a single tensor
        return tensor

whole zephyr Dec 23, 2023, 11:20 AM

#

feral kernel Im not home , but i have some histories loaded. import os from torch.utils.data ...

I'm confused, I don't see DataLoader written here

feral kernel Dec 23, 2023, 11:21 AM

#

whole zephyr see what the DataLoader does

It should be some pytorch function or a function gpt and i defined from before

whole zephyr Dec 23, 2023, 11:21 AM

#

feral kernel It should be some pytorch function or a function gpt and i defined from before

that was the relevant part for the array you wanted to unpack

feral kernel Dec 23, 2023, 11:21 AM

#

whole zephyr I'm confused, I don't see DataLoader written here

I know, I can't find where it is defined in my history. I need to go home and find it on my laptop.

whole zephyr Dec 23, 2023, 11:22 AM

#

is it this thing?

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

feral kernel Dec 23, 2023, 11:22 AM

#

whole zephyr is it this thing? from torch.utils.data import DataLoader train_dataloader = D...

Yes a pytorch library

whole zephyr Dec 23, 2023, 11:23 AM

#

ok, so it doesn't seem to affect the shape of your input data, it just "slices" it into batches: each sample keeps its shape, and stacking the samples adds a leading batch dimension

feral kernel Dec 23, 2023, 11:24 AM

#

whole zephyr ok, so it doesn't seem to affect the shape of your input data, it just "slices" ...

Thanks a lot

whole zephyr Dec 23, 2023, 11:24 AM

#

so just to be sure, print the shape of one element from train_dataset and train_loader

#

and if the DataLoader doesn't mess with the shape, then your problem stems from the train_dataset, maybe your processing does not give you some shape 4 data or so
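A minimal sketch of that check, assuming PyTorch is installed. `InputsOnlyDataset` is a hypothetical in-memory stand-in that mirrors `CustomTensorDataset` (2-D samples, a channel dimension added in `__getitem__`, no labels returned); the sizes are made up:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InputsOnlyDataset(Dataset):
    """Hypothetical stand-in for CustomTensorDataset: returns only the input tensor."""

    def __init__(self, n_samples=10, size=8):
        self.data = torch.randn(n_samples, size, size)  # 2-D samples, like the .pt files

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        tensor = self.data[idx]
        # Add a channel dimension, as in the original __getitem__
        return tensor.unsqueeze(0) if tensor.dim() == 2 else tensor


dataset = InputsOnlyDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=True)

sample = dataset[0]
batch = next(iter(loader))
print("one sample:", type(sample).__name__, tuple(sample.shape))  # (C, H, W)
print("one batch: ", type(batch).__name__, tuple(batch.shape))    # (B, C, H, W)
```

With the default collate function, the DataLoader stacks samples along a new first dimension, so `(1, 8, 8)` samples come out as `(4, 1, 8, 8)` batches. Since only one tensor is returned per sample, each batch is a single tensor rather than an `(inputs, labels)` pair.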

#

wait, I think training data might be the inputs only. so you forgot the labels lol

#

I overlooked that

#

so yeah, it's normal to unpack 2 values only.

#

anyway, see what your training data looks like - if it has both inputs and labels or whatever
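If the training loop expects `for inputs, labels in train_loader`, each dataset item has to be a pair. A sketch assuming PyTorch; `PairDataset` is hypothetical, and it uses the input itself as the label only because the labels were reportedly the same data as the input here. Swap in the real target:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PairDataset(Dataset):
    """Hypothetical dataset whose items are (input, label) pairs."""

    def __init__(self, n_samples=10, size=8):
        self.data = torch.randn(n_samples, 1, size, size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = x.clone()  # placeholder target: replace with the real label for your task
        return x, y


loader = DataLoader(PairDataset(), batch_size=4)

# The default collate turns a list of (x, y) pairs into a pair of stacked batches,
# so unpacking into two names works.
for inputs, labels in loader:
    print(tuple(inputs.shape), tuple(labels.shape))
    break
```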

feral kernel Dec 23, 2023, 11:32 AM

#

I did print my train_loader. The labels variable is the same data as input.

feral kernel Dec 23, 2023, 11:32 AM

#

whole zephyr wait, I think training data might be the inputs only. so you forgot the labels l...

I defined labels, labels = torch.complex(real_labels, imag_labels)
Lol, gpt writes undefined code, then I have to correct it. I think I might need to define real_labels; it defined imag_labels = batch
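For reference, `torch.complex(real, imag)` builds a complex tensor from two real-valued tensors of the same float dtype; if the batch is already `cfloat`, its parts are available as `.real` and `.imag`. A small sketch assuming PyTorch, with a hypothetical random complex batch:

```python
import torch

# Hypothetical complex-valued batch (B, C, H, W), like the cfloat tensors here
batch = torch.randn(4, 1, 8, 8, dtype=torch.cfloat)

real_part = batch.real  # float32 view of the real parts
imag_part = batch.imag  # float32 view of the imaginary parts

# torch.complex expects real-valued (half/float/double) inputs of the same dtype
labels = torch.complex(real_part, imag_part)
print(labels.dtype, tuple(labels.shape))
```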

gloomy crow Dec 23, 2023, 12:07 PM

#

any idea why there is a faint blue line in the background that vaguely resembles the dark blue line? what does it mean and how can i remove it? Graphed with seaborn using lineplot with only the training loss (val loss is not present)

crystal phoenix Dec 23, 2023, 1:59 PM

#

what???

#

some sort of bug

serene scaffold Dec 23, 2023, 2:02 PM

#

crystal phoenix what???

What is that code supposed to do

#

It's not a bug

crystal phoenix Dec 23, 2023, 2:02 PM

#

call the function 20k times and store the results

#

it was working, but when I reopened Colab it stopped working

serene scaffold Dec 23, 2023, 2:03 PM

#

Lottery isn't a function. It's an array

#

Don't reuse names

crystal phoenix Dec 23, 2023, 2:04 PM

#

serene scaffold Dec 23, 2023, 2:04 PM

#

Maybe range is the array

crystal phoenix Dec 23, 2023, 2:05 PM

#

i restarted the session and it worked

#

:P

#

weird

serene scaffold Dec 23, 2023, 2:05 PM

#

Great
Don't use names of built-in functions and classes for arrays

#

Or anything else

crystal phoenix Dec 23, 2023, 2:06 PM

#

i haven't done that

serene scaffold Dec 23, 2023, 2:06 PM

#

crystal phoenix weird

You probably did and deleted the code where you did

#

But with notebooks, deleting code that you already ran doesn't undo it
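A small plain-Python sketch of how that happens in a notebook: once a cell rebinds a built-in name like `range`, the new binding stays in the namespace even after that cell's code is deleted, until it is removed with `del` or the kernel is restarted. The list values are made up:

```python
import builtins

# A cell that (perhaps long ago) rebinds a built-in name to data
range = [4, 8, 15]

try:
    range(3)  # later cells now hit the list, not the built-in
except TypeError as err:
    shadow_error = str(err)

# Removing the global binding makes the built-in visible again
del range
restored = list(range(3))
print(shadow_error)
print(restored)
```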

crystal phoenix Dec 23, 2023, 2:07 PM

#

well thanks for help, glad it works now

serene scaffold Dec 23, 2023, 2:07 PM

#

Yw

crystal phoenix Dec 23, 2023, 2:07 PM

#

whatever i did

serene scaffold Dec 23, 2023, 2:07 PM

#

There is no other explanation

#

The chances of other possible causes are next to none.

onyx forge Dec 23, 2023, 2:14 PM

#

Hello I just got into AI & genetic learning so I made this messy rushed prototype for testing & getting a hold of the basics, but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it. Could someone take a look & point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ

serene scaffold Dec 23, 2023, 3:34 PM

#

are you sure the underlying data is actually getting changed? because it might just be that google sheets is displaying it for you as an excel-style sheet

queen junco Dec 23, 2023, 5:09 PM

#

What's the point of a bias in a network? Can't I just change the weights and get the exact same thing?

final kiln Dec 23, 2023, 7:27 PM

#

tensor([[0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2],
        [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2],
        [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2],
        [0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
        [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
        [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]], device='cuda:0')

that's coming from the gpt I trained, it sorts arrays

feral kernel Dec 23, 2023, 7:37 PM

#

feral kernel --------------------------------------------------------------------------- ```V...

I'm really getting annoyed by coding. I have spent 80-90 hours trying to get this data to run on this custom neural network and it is not working. It has errors one after another... It is insane, I'm still on downsampling, I haven't even started training and tuning the weights and weight generators yet... Man, this will take forever. Rewriting neural networks with a custom backpropagation, a custom activation function, and a custom transformed dataset is hard. Man, I need to learn more PyTorch and Python...

desert oar Dec 23, 2023, 8:08 PM

#

queen junco What's the point of a bias in a network? Can't I just change the weights and get t...

no. as an exercise, try to fit the line y=3x+1 using linear regression with no intercept
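Here's a sketch of that exercise with NumPy least squares (the x grid is made up and symmetric around 0). Without an intercept the model is forced through the origin, so it cannot reproduce the `+1` no matter what weight it picks:

```python
import numpy as np

x = np.linspace(-5, 5, 11)
y = 3 * x + 1

# No intercept: y ≈ w * x, a line forced through the origin
X_no_bias = x[:, None]
w_no_bias, *_ = np.linalg.lstsq(X_no_bias, y, rcond=None)

# With intercept: add a column of ones so the model is y ≈ w * x + b
X_bias = np.column_stack([x, np.ones_like(x)])
w_bias, *_ = np.linalg.lstsq(X_bias, y, rcond=None)

err_no_bias = np.abs(X_no_bias @ w_no_bias - y).max()
err_bias = np.abs(X_bias @ w_bias - y).max()
print(w_no_bias, err_no_bias)  # slope 3, every prediction off by 1
print(w_bias, err_bias)        # slope 3, intercept 1, ~zero error
```

The bias is the column of ones: it gives the model an output shift that doesn't depend on the input, which no choice of weight on x alone can provide.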

final kiln Dec 23, 2023, 9:46 PM

#

@desert oar after a ton of tensor indices shenanigans (I used an optimization that merges Q, K, and V for the three heads into one matrix), I managed to place it into the xMx.T form during inference time:

Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
input = tensor([0, 0, 1, 0, 0, 1, 1, 0, 2, 0, 1], device='cuda:0')
output = tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2], device='cuda:0')

the first three prints are from the three heads, to confirm that it is in fact doing it
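A NumPy sketch of why the two forms match, with made-up sizes and random weights: for one head, the score matrix Q Kᵀ = (X W_q)(X W_k)ᵀ = X (W_q W_kᵀ) Xᵀ, so M = W_q W_kᵀ can be precomputed once. Comparing the float scores with a tolerance, rather than only the argmax tokens, is a stricter check:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 11, 16, 4  # hypothetical sizes

X = rng.standard_normal((seq_len, d_model))  # token embeddings, one row per token
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))

scores_qk = (X @ W_q) @ (X @ W_k).T  # usual form: Q K^T
M = W_q @ W_k.T                      # precomputed, shape (d_model, d_model)
scores_m = X @ M @ X.T               # x M x^T form

print(np.allclose(scores_qk, scores_m))  # equal up to float rounding
print(np.linalg.matrix_rank(M))          # at most d_head
```

M has shape (d_model, d_model) but rank at most d_head, so when d_head < d_model the factored W_q, W_k form stores fewer numbers (2·d_model·d_head vs d_model², here 128 vs 256). M built this way is also not symmetric in general.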

final kiln Dec 23, 2023, 10:02 PM

#

uhm

#

that's not induced distance, that's induced dot product rite

final kiln Dec 23, 2023, 10:20 PM

#

final kiln

this is also not symmetric, so it's actually not a correct interpretation, wonder what will happen if I force it to be symmetric and positive definite

desert oar Dec 23, 2023, 10:56 PM

#

final kiln that's not induced distance, that's induced dot product rite

how are you measuring the induced distance?

desert oar Dec 23, 2023, 10:56 PM

#

final kiln

so you trained the normal way, then precomputed M for inference?

#

as a quick test, you should get the same outputs from both versions, given the same input

#

up to a few decimal places of course

#

M is just the product of two projections, right? does it need to be symmetric? it's not a correlation matrix

#

X M Xt should however be a correlation matrix

final kiln Dec 23, 2023, 11:00 PM

#

desert oar how are you measuring the induced distance?

It's more of an induced dot product, and even then, it's still not correct because M doesn't satisfy the conditions to be a metric tensor.

I'm just taking the embeddings and putting them in x of xMx.T

final kiln Dec 23, 2023, 11:01 PM

#

desert oar as a quick test, you should get the same outputs from both versions, given the s...

Yes I checked that

#

The end result is a tensor of integers tho, didn't compare the rest of it

final kiln Dec 23, 2023, 11:02 PM

#

desert oar M is just the product of two projections, right? does it need to be symmetric? i...

It doesn't need to be, but to maintain the nice picture of M being a metric tensor, it needs to be symmetric and xMx.T needs to be positive for every nonzero x

#

It might be super worth it to insist on making M a metric tensor somehow, cuz then we have access to a ton of math that exists out there to describe high-dimensional non-Euclidean spaces

#

Like, if the network doesn't mind doing it, or even the performance doesn't drop in any way, this way would be a lot more interpretable

desert oar Dec 23, 2023, 11:06 PM

#

final kiln The end result is a tensor of integers tho, didn't compare the rest of it

yeah but are they identical?

final kiln Dec 23, 2023, 11:07 PM

#

desert oar yeah but are they identical?

They are the same result

desert oar Dec 23, 2023, 11:08 PM

#

nice

final kiln Dec 23, 2023, 11:09 PM

#

BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')
AFTER M
Using metric tensor thing
Using metric tensor thing
Using metric tensor thing
BEFORE M
input = tensor([2, 2, 2, 1, 2, 1, 0, 0, 1, 1, 1], device='cuda:0')
output = tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2], device='cuda:0')

#

copied the last print on accident ._.

#

in any case, the next step is to see if I can train it without Wk and Wq, directly using M, and then im gonna see if i can force M to be a metric

desert oar Dec 23, 2023, 11:13 PM

#

interesting

#

does it make sense that it should be?

final kiln Dec 23, 2023, 11:18 PM

#

desert oar does it make sense that it should be?

Yes, I'm hoping the network can induce a metric on the space of embeddings. If it does, we can start talking about the latent spaces in terms of distances and angles and areas and etc, which, at least for my brain, is a lot more intuitive to talk about.

desert oar Dec 23, 2023, 11:24 PM

#

final kiln Yes, I'm hoping the network can induce a metric on the space of embeddings. If i...

well maybe my linear algebra is lacking here, but why should the composition of two projections be a metric tensor?

final kiln Dec 23, 2023, 11:26 PM

#

desert oar well maybe my linear algebra is lacking here, but why should the composition of ...

There's no reason, I'm just gonna do away with the projections and keep M as the only learnable parameter, and then force M to be metric, I might test out other constraints, but this one would be the more interesting to me
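One common way to keep a learnable M symmetric positive definite is to learn an unconstrained matrix A and build M = A Aᵀ + εI. A NumPy sketch with a random A standing in for the learnable parameter (in PyTorch, A would be the `nn.Parameter` and M would be rebuilt in `forward`); this is one standard parametrization, not necessarily the one that ends up being used here:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, eps = 8, 1e-3  # hypothetical size and jitter

A = rng.standard_normal((d_model, d_model))  # unconstrained parameter
M = A @ A.T + eps * np.eye(d_model)          # symmetric positive definite by construction

x = rng.standard_normal(d_model)
quad = x @ M @ x                             # x M x^T = |A^T x|^2 + eps |x|^2 > 0

print(np.allclose(M, M.T))
print(np.linalg.eigvalsh(M).min() > 0)
print(quad > 0)
```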

desert oar Dec 23, 2023, 11:28 PM

#

ah, i see

#

interesting idea for sure

final kiln Dec 23, 2023, 11:30 PM

#

It might be too restrictive for the network though, but there's only one way to find out

desert oar Dec 23, 2023, 11:30 PM

#

i'm not familiar with metric tensors in general. that X M X.T has a nice interpretation in that case? aside from being a correlation matrix

#

or do the elements of M take on some specific interpretation?

final kiln Dec 23, 2023, 11:31 PM

#

I'm gonna see if I do some benchmarks of Wq, Wk vs M to see if I found an optimization, but I doubt it since I suspect that the researchers that came up with this started from xMx.T and then split it into Wq and Wk for optimization purposes

desert oar Dec 23, 2023, 11:33 PM

#

i'm not sure about that actually. my impression is that the QKV system was always meant to be the "soft lookup" mechanism

final kiln Dec 23, 2023, 11:33 PM

#

desert oar i'm not familiar with metric tensors in general. that X M X.T has a nice interpr...

So the metric tensor defines a dot product:

dot(x, y) = xMy^T

If you batch it, the final matrix will just be a table of dot products
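Written out, with x and y as row vectors (matching the xMyᵀ notation above) and X, Y stacking them as rows:

```latex
\begin{aligned}
\langle x, y \rangle_M &= x\,M\,y^\top \\
(X M Y^\top)_{ij} &= x_i\,M\,y_j^\top = \langle x_i, y_j \rangle_M \\
d_M(x, y) &= \sqrt{(x - y)\,M\,(x - y)^\top}
\end{aligned}
```

The last line is only a proper distance when M is symmetric and positive definite (x M xᵀ > 0 for every nonzero x), which is the condition discussed above.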

desert oar Dec 23, 2023, 11:33 PM

#

would have to look into the literature from 2016 and earlier to see the history of the idea before the big Attention Is All You Need paper

desert oar Dec 23, 2023, 11:34 PM

#

final kiln So the metric tensor defines a dot product: dot(x, y) = xMy^T If you batch it...

right, that's a pretty nice way to interpret the Q K.T operation

#

are you retraining nanogpt or something?

final kiln Dec 23, 2023, 11:35 PM

#

ah, was gonna attach the code but the python bot didnt let me

#

https://paste.pythondiscord.com/66GQ

#

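import torch.nn.functional as F

# Run the model as trained (separate Wq/Wk projections)
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
probs = F.softmax(outputs, dim=-1)  # per-position token probabilities
max_values, max_indices = probs.max(dim=-1)  # most likely token at each position
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])
print("AFTER M")
nanoGPT.finish()  # custom method from the paste: switches the heads to the x M x^T path
inputs = batch.to("cuda")
outputs = nanoGPT(inputs)
probs = F.softmax(outputs, dim=-1)
max_values, max_indices = probs.max(dim=-1)
print("input =", inputs[3])
print("output =", max_indices[3])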

this compares the output

#

I'm gonna clean it up a bit and put it in its own repo because now there's gonna be a lot of experiments

#

I'm too curious about the metric thing, but it isn't really a priority

#

the actual thing I have to do now is to branch the model so that it can take in two inputs

#

also train it on a corpus of text like Shakespeare

#

and start thinking about how I'm gonna do the whole voice thing

#

the whole thing is gonna be inside a simple web app that im gonna code, it's gonna be like:

http/ws traffic <---> traefik service <-> go + htmx + tmpl <-> fastapi + model

all in compose

desert oar Dec 23, 2023, 11:43 PM

#

final kiln the actual thing I have to do now is to branch the model so that it can take in ...

nice, there should be plenty of examples and prior art for that, again it's kind of what transformers were originally designed to do

#

i'm impressed at all the projects you've made time for, i barely have time to sit down and think through a project idea

final kiln Dec 23, 2023, 11:44 PM

#

desert oar i'm impressed at all the projects you've made time for, i barely have time to si...

being unemployed has its perks ig

desert oar Dec 23, 2023, 11:44 PM

#

hah, enjoy it

#

as long as you have the money

#

there was a period of time when i had no living expenses and no job, my biggest regret was not milking that for longer

final kiln Dec 23, 2023, 11:45 PM

#

being in a LCOL country and remoting to HCOL allows me to do these kinds of breaks to upskill

#

tho this is the first and last time tbh, im hoping to land an on site ML Eng job to get me both out of the country and out of the house >.>

desert oar Dec 24, 2023, 12:10 AM

#

fair enough. just remember to enjoy the moment of freedom!

#

these are also great projects for ML Eng

onyx forge Dec 24, 2023, 2:36 AM

#

Hello I just got into AI & genetic learning so I made this messy rushed prototype for testing & getting a hold of the basics, but I can't get it to actually learn anything. Its goal is to move towards the cookie emoji. I have no idea if it's a problem with the agent architecture itself or the way I'm trying to teach it. Could someone take a look & point out what I'm doing wrong? https://paste.pythondiscord.com/5PZQ

mint plume Dec 24, 2023, 3:15 AM

#

Hello everyone!

brittle storm Dec 24, 2023, 9:08 AM

#

ok so, I have a Python assistant and I am adding a new feature to it called "tasks". It's like a to-do list. I connected it to the database and all, and I even managed to get it to fetch the task and inform me when it's time. I want it to run in the background while another script is running in the foreground, and check if the time matches. How do I make it run in the background and run another script in the foreground?
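A minimal sketch of one way to do it within a single Python process: run the reminder check in a background thread with `threading`, while the main thread keeps running the assistant. `get_due_tasks` and the in-memory task list are hypothetical stand-ins for the database query; if the foreground part really is a separate script, launching it with `subprocess` is another route.

```python
import threading
import time
from datetime import datetime, timedelta


def reminder_loop(get_due_tasks, notify, stop_event, poll_seconds=1.0):
    """Check for due tasks every `poll_seconds` until `stop_event` is set."""
    while not stop_event.is_set():
        for task in get_due_tasks(datetime.now()):
            notify(task)
        stop_event.wait(poll_seconds)  # sleeps, but wakes early if asked to stop


# Hypothetical in-memory tasks standing in for the database
tasks = [("stretch", datetime.now() + timedelta(seconds=0.2))]
fired = []


def get_due_tasks(now):
    """Return (and remove) tasks whose time has come."""
    due = [task for task in tasks if task[1] <= now]
    for task in due:
        tasks.remove(task)
    return due


stop = threading.Event()
worker = threading.Thread(
    target=reminder_loop,
    args=(get_due_tasks, lambda task: fired.append(task[0]), stop),
    kwargs={"poll_seconds": 0.05},
    daemon=True,  # don't keep the program alive just for reminders
)
worker.start()

time.sleep(0.5)  # stand-in for the foreground assistant doing its own work
stop.set()
worker.join()
print(fired)
```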

past meteor Dec 24, 2023, 9:38 AM

#

onyx forge Hello I just got into AI & genetic learning so I made this messy rushed prototyp...

Is there any reason why you're not doing the standard breadth first search?
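For reference, a breadth-first search on a grid finds a shortest path to the goal directly. A minimal sketch; the grid layout, `#` walls, and the `A`/`C` markers for the agent and the cookie are all made up for illustration:

```python
from collections import deque

GRID = [
    "A..#....",
    ".#.#.##.",
    ".#...#..",
    ".####.#.",
    "......#C",
]


def find(grid, ch):
    """Return the (row, col) of the first occurrence of `ch`."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not in grid")


def bfs_path(grid, start, goal):
    """Shortest 4-directional path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


path = bfs_path(GRID, find(GRID, "A"), find(GRID, "C"))
print(len(path) - 1, "moves")
```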

pure pond Dec 24, 2023, 5:11 PM

#

can anyone recommend me a kaggle competition / dataset / something like that to get beginner experience with fine-tuning BERT models? The task doesn't matter so much I guess, I want to get PyTorch experience mainly (coming from knowing a lot of theory of transformers), but something in fashion would be nice, so related to RAG or clustering

serene scaffold Dec 24, 2023, 5:27 PM

#

pure pond can anyone recommend me a kaggle competition / dataset / something like that to ...

try fine-tuning a BERT model for named entity recognition or sequence classification.

#

(I am a computational linguist professionally--those are the first two things for which I fine-tuned BERT to learn pytorch.)

desert oar Dec 24, 2023, 5:29 PM

#

serene scaffold try fine-tuning a BERT model for named entity recognition or sequence classifica...

how does a transformer work for NER? do you have labeled regions in a token sequence and it emits a sequence like [null, null, ENTITY, ENTITY, null, ENTITY]?

serene scaffold Dec 24, 2023, 5:32 PM

#

desert oar how does a transformer work for NER? do you have labeled regions in a token sequ...

you add another linear layer for the number of classes in the data, and each output is a tensor of shape (batch_size, sequence_length, num_classes). For the sequence_length dimension, all sequences are right-padded with padding tokens to make them as long as the longest sequence in the batch. And then, after a softmax, each element represents the probability that the token for that "row" belongs to the class for that "column". You need a "column" to represent the null class (the token is not an entity)
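A NumPy sketch of those shapes, with random numbers standing in for BERT's last hidden layer and the added linear layer (all sizes hypothetical). The head produces raw scores (logits); a softmax over the last axis turns each token's row into class probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, hidden, num_classes = 2, 5, 8, 3  # class 0 = "null" (not an entity)

hidden_states = rng.standard_normal((batch_size, seq_len, hidden))  # encoder output stand-in
W = rng.standard_normal((hidden, num_classes))                      # added linear layer
b = np.zeros(num_classes)

logits = hidden_states @ W + b  # (batch_size, seq_len, num_classes)

# Softmax over the class axis: each token's row sums to 1
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Second sequence is shorter and right-padded; the mask says which positions are real
attention_mask = np.array([[1, 1, 1, 1, 1],
                           [1, 1, 1, 0, 0]])
predictions = np.where(attention_mask == 1, probs.argmax(axis=-1), -1)  # -1 marks padding
print(logits.shape, predictions)
```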

#

also, since BERT does subtokenization, you usually get better results if you label all but the first subtoken as null.

#

So if you tokenize "That is unbelievable" as ["That", "is", "un", "##believ", "##able"], and you're classifying parts of speech for pronouns and adjectives, the classifications would be [PRON, null, ADJ, null, null]
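A plain-Python sketch of that labeling rule. `word_ids` maps each subtoken to the index of the word it came from (`None` for special tokens); Hugging Face fast tokenizers can produce such a list via `word_ids()`, but here it's written by hand for the example above. Training code often uses -100 instead of a "null" label for these positions so the loss ignores them; this sketch keeps "null" to match the example:

```python
def align_labels(word_labels, word_ids, fill="null"):
    """Label only the first subtoken of each word; other subtokens get `fill`."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(fill)  # special token or continuation subtoken
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# "That is unbelievable" -> ["That", "is", "un", "##believ", "##able"]
word_labels = ["PRON", "null", "ADJ"]  # one label per word
word_ids = [0, 1, 2, 2, 2]             # word index of each subtoken
print(align_labels(word_labels, word_ids))  # ['PRON', 'null', 'ADJ', 'null', 'null']
```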

#

@pure pond take note of that ^

desert oar Dec 24, 2023, 5:43 PM

#

serene scaffold you add another linear layer for the number of classes in the data, and each out...

makes sense. so the output for one sequence in the batch is a distribution of class weights at each token?

#

          class 1    class 2    class 3
token1:  ...
token2:  ...

serene scaffold Dec 24, 2023, 5:44 PM

#

desert oar makes sense. so the output for one sequence in the batch is a distribution of cl...

Indeed

desert oar Dec 24, 2023, 5:44 PM

#

one thing i've actually never been sure about for these models is how text generation works

#

or in this case, output generation

#

do you feed in a sequence one token at a time? like predict(seq[:1]), predict(seq[:2]), ...?

serene scaffold Dec 24, 2023, 5:45 PM

#

I've been needing to learn more about that since interactive LLMs became ascendant

desert oar Dec 24, 2023, 5:45 PM

#

i know bert isn't really meant for that, you just stick in a sequence and get a sequence of outputs, right?

serene scaffold Dec 24, 2023, 5:45 PM

#

for BERT, yes

#

you can actually fine-tune GPT-2 for anything you might want to fine-tune a BERT variant for, and the API is the same because of Hugging Face. but the inner workings presumably are not.

This is a notebook I wrote a few months ago that does both. https://github.com/center-for-threat-informed-defense/tram/blob/main/model-development/train_single_label.ipynb

desert oar Dec 24, 2023, 5:57 PM

#

serene scaffold you can actually fine-tune GPT-2 for anything you might want to fine-tune a BERT...

interesting. afaik the main difference is that gpt-style models mask off future tokens while bert-style models don't. is that right?

#

the idea being that gpt-style models are meant to only scan backwards through the sequence, whereas bert-style models are meant to examine the sequence as a whole
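A NumPy sketch of that difference, using random scores for a hypothetical 4-token sequence: a GPT-style causal mask blocks attention to later positions before the softmax, while a BERT-style model lets every token attend to every (non-padding) token:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4
scores = rng.standard_normal((T, T))  # raw attention scores, row = query token


def softmax_rows(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


causal_mask = np.tril(np.ones((T, T), dtype=bool))  # token i may see tokens 0..i
gpt_weights = softmax_rows(np.where(causal_mask, scores, -np.inf))
bert_weights = softmax_rows(scores)  # no causal mask: full bidirectional attention

print(np.round(gpt_weights, 2))  # zeros above the diagonal
```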

signal holly Dec 24, 2023, 6:00 PM

#

is this supposed to happen
I was trying to convert my dataframe to all strings
and when I check the dtype I get all objects

desert oar Dec 24, 2023, 6:05 PM

#

signal holly is this supposed to happen I was trying to convert my dataframe to all strings ...

by default pandas encodes strings as plain python objects. "object" dtype means that your series is internally just an array of pointers to python objects (in this case strings, but the data could actually be any type)

#

i suggest pd.StringDtype() instead of str

#

i think you can also pass "string" as a shorthand
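A short sketch, assuming a reasonably recent pandas, with made-up data. Passing `"string"` gives the dedicated string dtype; note that the `.str` accessor (regex replace, splitting into tokens) also works on columns from `astype(str)` (object dtype in pandas 2.x) as long as they hold Python strings:

```python
import pandas as pd

df = pd.DataFrame({"text": ["Hello, World!", "pandas  rocks"], "n": [1, 2]})

as_object = df.astype(str)       # values become Python str
as_string = df.astype("string")  # same as df.astype(pd.StringDtype())
print(as_object.dtypes)
print(as_string.dtypes)

# Regex cleanup and whitespace tokenization through the .str accessor
cleaned = as_string["text"].str.replace(r"[^\w\s]", "", regex=True).str.lower()
tokens = cleaned.str.split()
print(tokens.tolist())  # [['hello', 'world'], ['pandas', 'rocks']]

# .str works on the astype(str) column too
print(as_object["text"].str.lower().tolist())
```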

#

!d pandas StringDtype

#

!d pandas.StringDtype

arctic wedgeBOT Dec 24, 2023, 6:05 PM

#

pandas.StringDtype


class pandas.StringDtype(storage=None)
Extension dtype for string data.

Warning

StringDtype is considered experimental. The implementation and parts of the API may change without warning.

pure pond Dec 24, 2023, 6:05 PM

#

serene scaffold @pure pond take note of that ^

Ok thanks, how did you set up that experiment though? So you're using a pretrained encoder from the sounds of it? Where did you get that from? And the dataset?

desert oar Dec 24, 2023, 6:06 PM

#

they say it's experimental, but it's been stable for a while now

pure pond Dec 24, 2023, 6:09 PM

#

desert oar interesting. afaik the main difference is that gpt-style models mask off future ...

And yeah, that's right. OpenAI's GPTs are decoders, which means they don't pay attention to the future tokens in the sequence. It doesn't mean that they don't produce embeddings, which I see a lot of people get confused about. Each iteration they take in the whole sequence, get embeddings from the attention and MLP layers, then produce a prediction for the next token in the sequence. Then they repeat, looping over those steps and adding the new token to the end of the input sequence each time (to get at your other comment). When training decoders, you mask out all the future tokens and just predict the first masked token. E.g. if you have a sentence of 9 tokens, you can split that up into 8 training examples: in the first you have token 1 and predict token 2, in the 2nd you have tokens 1 and 2 and predict 3, and so on (up to predicting token 9). https://www.youtube.com/watch?v=kCc8FmEb1nY is a super great video, you can just watch the 30m if you don't care about the details of attention layers even
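A plain-Python sketch of that split with 9 made-up token ids: every prefix predicts the token right after it, giving 8 examples. With a causal mask, a decoder gets all of them from a single shifted pair of sequences in one forward pass:

```python
tokens = [5, 9, 2, 7, 7, 1, 3, 8, 4]  # hypothetical token ids for a 9-token sentence

# One (context, next_token) example per position after the first
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[:3]:
    print(context, "->", target)

# The same supervision as two aligned sequences: position i of `inputs`
# is trained to predict position i of `targets`.
inputs, targets = tokens[:-1], tokens[1:]
print(len(pairs), inputs, targets)
```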

#

auto-regressive means it "consumes" the output for the next input. That's how people usually say it; I don't like that phrasing though, because it sounds like the output is used up or something. So to me, auto-regressive means that the outputs get added to the input, as you repeatedly loop over the (growing) input
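The generation loop described above, as a plain-Python sketch. `toy_model` is a hypothetical stand-in for a trained decoder that looks at the whole sequence so far and returns the next token id (here it just counts up mod 10); a real model would pick from its predicted probabilities, e.g. greedily or by sampling:

```python
def generate(next_token, prompt, max_new_tokens, eos=None):
    """Autoregressive loop: each predicted token is appended to the input for the next step."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(sequence)  # model sees the whole (growing) sequence
        sequence.append(token)
        if token == eos:
            break
    return sequence


def toy_model(sequence):
    """Hypothetical stand-in for a decoder: predicts (last token + 1) mod 10."""
    return (sequence[-1] + 1) % 10


print(generate(toy_model, [7], max_new_tokens=5))          # [7, 8, 9, 0, 1, 2]
print(generate(toy_model, [7], max_new_tokens=5, eos=0))   # stops after emitting 0
```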

#

This is a really great post from the other day about llm inferencing https://vgel.me/posts/faster-inference/

queen junco Dec 24, 2023, 6:13 PM

#

desert oar no. as an exercise, try to fit the line y=3x+1 using linear regression with no i...

I meant what's the point

desert oar Dec 24, 2023, 6:13 PM

#

pure pond And yeah thats right. Openais gpts are decoders, which means they dont pay atten...

dont produce embeddings
by this you mean embeddings for the complete sequence, right?

pure pond Dec 24, 2023, 6:17 PM

#

well, that's right, if by complete sequence you mean for all time iterations, not just up to and including the current time iteration. But what I meant when I said that was that some people hear encoder vs decoder, and then wonder how a decoder model can work when just given tokens as the input, because they think no encoder means no embeddings

serene scaffold Dec 24, 2023, 6:17 PM

#

desert oar interesting. afaik the main difference is that gpt-style models mask off future ...

idk about that specifically, but GPT is one-directional and BERT is bi-directional, yes.

desert oar Dec 24, 2023, 6:17 PM

#

pure pond well, thats right, if by complete sequence you mean for all time iterations, not...

i see, that makes sense

serene scaffold Dec 24, 2023, 6:17 PM

#

pure pond Ok thanks, how did you set up that experiment though? So youre using a pretraine...

BERT models come with a tokenizer and you have to use that tokenizer.

pure pond Dec 24, 2023, 6:17 PM

#

gpt is bidirectional, but up to a cutoff token. Token 1 can attend to token 5, if both already exist

#

i think (actually im not sure anymore)

serene scaffold Dec 24, 2023, 6:18 PM

#

and my team created the dataset. but I wouldn't use it for what you're doing.

queen junco Dec 24, 2023, 6:18 PM

#

💀

#

Bro

#

3 mil lines

pure pond Dec 24, 2023, 6:19 PM

#

serene scaffold BERT models come with a tokenizer and you have to use that tokenizer.

a tokenizer sure but thats not the weights of the bert itself

queen junco Dec 24, 2023, 6:19 PM

#

20354 kb

serene scaffold Dec 24, 2023, 6:19 PM

#

pure pond a tokenizer sure but thats not the weights of the bert itself

you'd be fine-tuning an existing BERT model

queen junco Dec 24, 2023, 6:19 PM

#

serene scaffold Dec 24, 2023, 6:19 PM

#

queen junco

why do you keep posting uncropped pictures

queen junco Dec 24, 2023, 6:20 PM

#

Because I can

serene scaffold Dec 24, 2023, 6:20 PM

#

okay? it seems like spam.

pure pond Dec 24, 2023, 6:20 PM

#

yeah that's what I mean, where do you get the pretrained BERT? Also, when training your NER classifier, did you freeze the BERT and just train the head? Or no

queen junco Dec 24, 2023, 6:21 PM

#

serene scaffold okay? it seems like spam.

I need help with my activation

serene scaffold Dec 24, 2023, 6:21 PM

#

pure pond yeah thats what I mean, where do you get the pretrained bert? Also, when trainin...

you get the pretrained BERT from huggingface. and I'm pretty sure that training can modify all the weights, including the existing BERT weights and the linear output layer that you add.
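For the freeze-or-not question, a sketch of both options in PyTorch. The tiny `encoder` is a hypothetical stand-in for a pretrained BERT body (with Hugging Face you'd load the real one); freezing just turns off gradients for its parameters so only the new head is updated:

```python
import torch
from torch import nn

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))  # stand-in for BERT
head = nn.Linear(32, 3)  # new classification layer (3 hypothetical classes)
model = nn.Sequential(encoder, head)


def count_trainable(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


full_finetune = count_trainable(model)  # every weight, including the encoder, is updated

encoder.requires_grad_(False)          # freeze the encoder
head_only = count_trainable(model)     # only the head is updated now

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
print(full_finetune, head_only)
```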

queen junco Dec 24, 2023, 6:21 PM

#

My brain keeps outputting the same thing

serene scaffold Dec 24, 2023, 6:21 PM

#

I have to go do Christmas stuff
everyone be good.

queen junco Dec 24, 2023, 6:21 PM

#

I think it has something to do with my activation function

pure pond Dec 24, 2023, 6:22 PM

#

ok ty for the help

queen junco Dec 24, 2023, 6:22 PM

#

pure pond ok ty for the help

Can you help me

pure pond Dec 24, 2023, 6:22 PM

#

whats the problem?

#

20mb too big?

queen junco Dec 24, 2023, 6:22 PM

#

So my output is the same every time

#

No that's not it

#

My output is always the same and I think it's the activation function

pure pond Dec 24, 2023, 6:23 PM

#

what output

queen junco Dec 24, 2023, 6:23 PM

#

The neural network output

#

They're all the same

pure pond Dec 24, 2023, 6:23 PM

#

so?

queen junco Dec 24, 2023, 6:24 PM

#

I don't want that

pure pond Dec 24, 2023, 6:24 PM

#

why not

queen junco Dec 24, 2023, 6:24 PM

#

It's not supposed to do it

pure pond Dec 24, 2023, 6:24 PM

#

theres many reasons why it could be doing that

queen junco Dec 24, 2023, 6:24 PM

#

Because it's supposed to be selecting one thing

pure pond Dec 24, 2023, 6:24 PM

#

i dont wanna explain every possible thing

queen junco Dec 24, 2023, 6:24 PM

#

I think it's the activation function though

pure pond Dec 24, 2023, 6:25 PM

#

you mean every output class has the same value?

queen junco Dec 24, 2023, 6:25 PM

#

Yes

#

Lemme get the function rq

pure pond Dec 24, 2023, 6:26 PM

#

bro install discord on your machine

queen junco Dec 24, 2023, 6:27 PM

#

What?

#

Oh

#

The y is set to the last amount of inputs going into the hidden network

#

So I can get a range from lest output and set it between 0-1

#

And the inputs are randomized from the beginning

#

So they shouldn't be all the same

#

I know the weight

#

Is it because the weights aren't different

pure pond Dec 24, 2023, 6:36 PM

#

ok I have a suggestion: try changing the activation function to figure out what's going wrong. Change it to just add up all the input values it's receiving. Then change that to add up 2 * each value. Make sure each change has the expected effect on your outputs. You should find a clue doing that
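A plain-Python sketch of that test on a single dense layer; the inputs and weights are made up. Swapping the activation for the raw sum and then 2 times the sum should change the outputs in an obvious way. It also shows one common reason outputs look identical: large weighted sums push a sigmoid to about 1.0 for every neuron:

```python
import math


def dense_layer(inputs, weights, activation):
    """One fully connected layer: activation(weighted sum) for each output neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]


inputs = [0.2, 0.7, 0.1]                            # hypothetical input
weights = [[30.0, 40.0, 50.0], [60.0, 20.0, 90.0]]  # hypothetical, deliberately large

identity = dense_layer(inputs, weights, lambda s: s)      # raw sums: differ per neuron
doubled = dense_layer(inputs, weights, lambda s: 2 * s)   # should be exactly 2x the sums
sigmoid = dense_layer(inputs, weights, lambda s: 1 / (1 + math.exp(-s)))

print(identity)  # about [39, 35]
print(doubled)   # about [78, 70]
print(sigmoid)   # both ~1.0: the sigmoid saturates, so the outputs look identical
```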

slow vigil Dec 24, 2023, 6:49 PM

#

I'm scraping some data from a website and it's all legal as per their robots.txt but the way they have their site set up is that I have to access a page for every single record I want to view and in my case that's like 100,000 records. So does anyone know how slow I should make my program run to prevent them from thinking I'm trying to DDOS their site?
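There's no universal safe rate, but a simple approach is a fixed minimum delay between requests, and honoring a `Crawl-delay` line if their robots.txt has one (`urllib.robotparser.RobotFileParser.crawl_delay()` can read it). A sketch with a hypothetical `fetch` function so it runs without network access:

```python
import time


def polite_fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, keeping at least `delay_seconds` between request starts."""
    results = []
    last_start = None
    for url in urls:
        if last_start is not None:
            wait = delay_seconds - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        results.append(fetch(url))
    return results


# Hypothetical fetch that just records when each request started
started = []


def fake_fetch(url):
    started.append(time.monotonic())
    return f"<html for {url}>"


urls = [f"https://example.com/record/{i}" for i in range(3)]
pages = polite_fetch_all(urls, fake_fetch, delay_seconds=0.05)
gaps = [b - a for a, b in zip(started, started[1:])]
print(len(pages), [round(g, 3) for g in gaps])
```

At a 1-second delay, 100,000 pages take at least 100,000 seconds, roughly 28 hours.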

signal holly Dec 24, 2023, 7:32 PM

#

desert oar by default pandas encodes strings as plain python objects. "object" dtype means ...

looks like it's still experimental because the module doesn't have it

desert oar Dec 24, 2023, 7:40 PM

#

signal holly looks like it's still experimental because the module doesn't have it

what version of pandas do you have? it's been available for years...

#

also you can just use .astype like before

signal holly Dec 24, 2023, 7:42 PM

#

desert oar also you can just use `.astype` like before

yep but can't do re.sub or tokenization since my dataframe isn't registered as a string
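For reference, an `object` column that holds Python `str` values already works with the `.str` accessor and with `re.sub` via `.apply`; `.astype("string")` converts it to pandas' dedicated `StringDtype`. A small sketch with a made-up `text` column:

```python
import re
import pandas as pd

df = pd.DataFrame({"text": ["Hello, world!", "Pandas: strings?"]})
print(df["text"].dtype)   # object by default in pandas 1.x/2.x

# Regex replacement through the .str accessor
df["clean"] = df["text"].str.replace(r"[^\w\s]", "", regex=True)

# ...or re.sub applied element-wise
df["clean_re"] = df["text"].apply(lambda s: re.sub(r"[^\w\s]", "", s))

# Simple whitespace tokenization into lists of lowercase tokens
df["tokens"] = df["clean"].str.lower().str.split()

# Explicit conversion to the dedicated string dtype
df["text_str"] = df["text"].astype("string")
print(df["text_str"].dtype)
print(df[["clean", "tokens"]])
```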

signal holly Dec 24, 2023, 7:43 PM

#

desert oar what version of pandas do you have? it's been available for years...

well my jupyter notebook app icon on windows has a red x next to it
which I've heard means that it's likely corrupted

#

maybe I should be more concerned with that

final kiln Dec 24, 2023, 7:47 PM

#

alright, I got some possibly interesting results, the left side is the correlation matrices (or dot product tables) of the model with the normal Q, K, V. The right side is the same, but for the model using only M where M is forced to be symmetric.

#

for example, head 0 metric 0, it's pretty clear that it is just a number 0 detector (0 is the first token)

#

while head 0 metric 1 is a number 2 detector

#

these are the actual tensors, left side is WqWk.T right side is M

#

I collapsed the model into a single self attention module and reduced the number of dimensions, these matrices are tables of actual distances between the vectors in the metric spaces that the network is creating

#

the first matrix essentially encodes the sort order between the numbers 0, 1, 2

#

not sure if this only happens in the metric tensor network, gonna try with the other one too just to be sure

#

this is from the normal Q, K, V network

#

yellow = larger distance

#

ok, so the yellow on the rightmost side is occurring because the matrix allows for negative numbers; the diagonals are always 0 because each vector is subtracted from itself

#

I see this as an absolute win because I can clearly interpret what is happening in the first image

#

while in the second image I'm not sure if I can do the same with any of the tables

indigo wing Dec 24, 2023, 8:36 PM

#

Hello, can anyone give me a data science, AI/ML and DL roadmap? I wish to deploy my own model and polish my resume. I also wish to write some great research papers and do GitHub stuff, mostly in data science. I am very bad at Python and all of this. May I please get a roadmap/advice/resources I can follow?

I am a data science student, but I wish to learn from scratch (I didn't pay attention until now) and redo the math as well. I am asking for any solid advice. I am interested in NLP, video generation, etc.

#

I wish to become proficient in all the AI/ML related things

final kiln Dec 24, 2023, 8:41 PM

#

indigo wing Hello, can anyone give me a data science, AI/ML and DL roadmap? I wish to deploy ...

see the first pinned message

shut slate Dec 24, 2023, 8:55 PM

#

What is the difference between df.loc['column'] and df.loc[df['column']]?

tidal bough Dec 24, 2023, 9:02 PM

#

shut slate What is the difference between df.loc['column'] and df.loc[df...

both of these are pretty weird in my opinion - df.loc[thing] is for getting a row with key thing, not a column. and hence df.loc[df['column']] will also fail unless entries of df['column'] are keys of the dataframe's index, which seems like an unusual situation.

shut slate Dec 24, 2023, 9:03 PM

#

don't get it lol

#

sorry

desert oar Dec 24, 2023, 10:14 PM

#

final kiln I collapsed the model into a single self attention module and reduced the number...

what are the 3 different matrices here?

#

and what are the axes? there are only 3 tokens in the input?

desert oar Dec 24, 2023, 10:15 PM

#

shut slate What is the difference between df.loc['column'] and df.loc[df...

the former selects the row labeled "column", the latter selects rows with labels matching the contents of the column "column"

#

the 2nd thing is very likely not something you ever need to do
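A small illustration with made-up data (the frame and its labels are hypothetical): `df.loc[label]` looks up a row by its index label, while `df.loc[df["col"]]` uses each *value* in the column as a row label, so it only works when those values exist in the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["bob", "amy", "cal"], "next": ["y", "x", "x"]},
    index=["x", "y", "z"],
)

# The row whose index label is "y"
print(df.loc["y"])

# Rows whose labels are the values in df["next"]: "y", "x", "x"
print(df.loc[df["next"]])

# A column is selected with df["name"] or df.loc[:, "name"], not df.loc["name"]
print(df.loc[:, "name"])

try:
    df.loc["name"]   # "name" is a column label, not a row label
except KeyError as err:
    print("KeyError:", err)
```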

desert oar Dec 24, 2023, 10:17 PM

#

final kiln alright, I got some possibly interesting results, the left side is the correlati...

so you just constrained it to be symmetric, but let the diagonals be negative & didn't try to make it positive definite, right?

final kiln Dec 24, 2023, 10:17 PM

#

desert oar and what are the axes? there are only 3 tokens in the input?

Axis numbers are meaningless, I'm not being very rigorous.

I'm comparing every allowed token with every other one. There are three tokens, A, B and C, and the model sorts a sequence of 11 tokens.

The values of the matrices are:

u = vocab[i] - vocab[j]

value_ij = u M u.T

The three matrices come from the three self attention heads in the self attention module.
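A minimal NumPy sketch of how one such table could be computed, assuming an embedding table `vocab` of shape (tokens, d) and a per-head matrix M = W W.T; the random values are placeholders, not the trained model's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d, k = 3, 8, 4                   # tokens A, B, C; embedding dim; head dim
vocab = rng.normal(size=(n_tokens, d))     # placeholder token embeddings
W = rng.normal(size=(d, k))                # placeholder head weights
M = W @ W.T                                # symmetric (d, d) metric

def distance_table(vocab, M):
    """value[i, j] = u M u.T with u = vocab[i] - vocab[j]."""
    n = len(vocab)
    table = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            u = vocab[i] - vocab[j]
            table[i, j] = u @ M @ u
    return table

table = distance_table(vocab, M)
print(np.round(table, 3))
```

The diagonal is 0 (u is the zero vector), the table is symmetric, and with M = W W.T every entry is non-negative because u M u.T equals the squared norm of u W.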

shut slate Dec 24, 2023, 10:17 PM

#

desert oar the former selects the row labeled `"column"`, the latter selects rows with labe...

so second can be done with df.query

desert oar Dec 24, 2023, 10:18 PM

#

final kiln Axis numbers are meaningless, I'm not being very rigorous. I'm comparing every ...

ah okay, they all say "head 0" so i was confused

desert oar Dec 24, 2023, 10:18 PM

#

shut slate so second can be done with df.query

sort of... what are you actually trying to do here?

final kiln Dec 24, 2023, 10:18 PM

#

desert oar so you just constrained it to be symmetric, but let the diagonals be negative & ...

I only constrained it to be symmetric but it seems the model ends up making it positive.

desert oar Dec 24, 2023, 10:18 PM

#

i actually don't know: can .query access the index?

final kiln Dec 24, 2023, 10:19 PM

#

desert oar ah okay, they all say "head 0" so i was confused

Ah oops

shut slate Dec 24, 2023, 10:19 PM

#

I am not trying to do anything rn lol. Just exploring

desert oar Dec 24, 2023, 10:19 PM

#

final kiln Ah oops

the ones across the row, not down the column

final kiln Dec 24, 2023, 10:20 PM

#

Should be Module 0 Head 0, Module 0 Head 1 Module 0 Head 2.

desert oar Dec 24, 2023, 10:20 PM

#

shut slate I am not trying to do anything rn lol. Just exploring

df.loc[ idx , col ] in general selects rows with labels in idx and columns with labels in col

#

@final kiln really interesting experiment. where did your input sequences come from?

#

i wouldn't have any idea how to generate realistic synthetic data for something like this

#

can you force it to be positive semidefinite by using the Cholesky decomposition somehow?

shut slate Dec 24, 2023, 10:24 PM

#

So how would you change values where column A equals bob, for example?

desert oar Dec 24, 2023, 10:25 PM

#

not sure if setting L L.T = M is enough

final kiln Dec 24, 2023, 10:25 PM

#

desert oar @final kiln really interesting experiment. where did your input sequen...

Oh, it's actually easy if you think about it: each letter maps to an index, A -> 0, B -> 1 and C -> 2, so I create a random sequence of size 11, [0, 1, 1, 0, 2, ...], and then sort it with a normal sorting algorithm to serve as ground truth
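That data generation could be sketched like this, assuming NumPy rather than whatever final kiln actually used; the sequence length (11) and the A/B/C -> 0/1/2 mapping come from the message above, and `make_sort_batch` is a hypothetical name.

```python
import numpy as np

LETTERS = "ABC"   # A -> 0, B -> 1, C -> 2

def make_sort_batch(batch_size, seq_len=11, n_tokens=3, seed=None):
    """Random token sequences plus their sorted versions as targets."""
    rng = np.random.default_rng(seed)
    inputs = rng.integers(0, n_tokens, size=(batch_size, seq_len))
    targets = np.sort(inputs, axis=1)   # ground truth from an ordinary sort
    return inputs, targets

inputs, targets = make_sort_batch(4, seed=0)
print("".join(LETTERS[t] for t in inputs[0]), "->",
      "".join(LETTERS[t] for t in targets[0]))
```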

final kiln Dec 24, 2023, 10:25 PM

#

desert oar not sure if setting `L L.T = M` is enough

That's what I'm using at the moment

desert oar Dec 24, 2023, 10:26 PM

#

shut slate So how would you change values where column A equals bob, for example?

yes, .loc also allows boolean series for selecting rows by true/false.

eq_bob = df["a"] == "bob"
df.loc[eq_bob]
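The same boolean mask also works for *changing* values, which was the original question: pass the mask as the row selector and a column label as the column selector, `df.loc[mask, col] = value`. A sketch with made-up columns `a` and `score`:

```python
import pandas as pd

df = pd.DataFrame({"a": ["bob", "amy", "bob"], "score": [1, 2, 3]})

eq_bob = df["a"] == "bob"   # boolean Series, one True/False per row

# Select: rows where column "a" equals "bob"
print(df.loc[eq_bob])

# Change: set "score" only in those rows
df.loc[eq_bob, "score"] = 0
print(df)
```

Doing the assignment in a single `.loc` call avoids chained indexing like `df[eq_bob]["score"] = 0`, which can end up modifying a temporary copy instead of `df`.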

final kiln Dec 24, 2023, 10:26 PM

#

But idk if it forces it to be positive, maybe it does ?

desert oar Dec 24, 2023, 10:26 PM

#

final kiln Oh, it's actually easy if you think about it, so each letter maps to an index, A...

oh it's sorting, right

shut slate Dec 24, 2023, 10:26 PM

#

desert oar yes, `.loc` also allows boolean series for selecting rows by true/false. ```py e...

Is this the easiest way?

desert oar Dec 24, 2023, 10:26 PM

#

final kiln But idk if it forces it to be positive, maybe it does ?

i think so, it's kinda like squaring

desert oar Dec 24, 2023, 10:26 PM

#

shut slate Is this the easiest way?

it's the standard way. your other option is df.query

shut slate Dec 24, 2023, 10:27 PM

#

ok cool

desert oar Dec 24, 2023, 10:27 PM

#

however if you want to select rows where the row label (aka "index") is a specific value, you use .loc directly: df.loc["bob"]

#

often if you have some kind of unique identifier, id number, timestamp, etc it's a good idea to use that for the row labels. it makes some things a lot tidier and a lot faster
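For example, with a hypothetical `user_id` column as the unique identifier:

```python
import pandas as pd

df = pd.DataFrame({"user_id": [101, 102, 103], "name": ["bob", "amy", "cal"]})

# Use the unique identifier as the row labels
df = df.set_index("user_id")

print(df.loc[102])           # whole row, looked up by label
print(df.loc[102, "name"])   # a single value
```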

final kiln Dec 24, 2023, 10:28 PM

#

desert oar i think so, it's kinda like squaring

Just checked, it is; that explains why they all come out positive, which is nice. I think the other property that I need is that it must be non-degenerate
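A quick numerical check of both properties, as a sketch with random placeholder factors: M = L L.T is always symmetric positive semidefinite, and it is non-degenerate (positive definite) exactly when L has rank d. So if each head's factor has shape d x (d/3), as the per-head split in the code later in the thread suggests, that head's M has rank at most d/3 and is necessarily degenerate.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

def describe(L):
    """Properties of M = L @ L.T for a (d, k) factor L."""
    M = L @ L.T
    eig = np.linalg.eigvalsh(M)   # real eigenvalues of the symmetric M
    return {
        "symmetric": bool(np.allclose(M, M.T)),
        "psd": bool(eig.min() > -1e-10),
        # rank(M) == rank(L); M is non-degenerate only if this equals d
        "rank": int(np.linalg.matrix_rank(L)),
    }

full = rng.normal(size=(d, d))        # square factor: full rank (almost surely)
thin = rng.normal(size=(d, d // 3))   # e.g. one head's d x (d / 3) projection

print(describe(full))
print(describe(thin))
```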

desert oar Dec 24, 2023, 10:29 PM

#

final kiln Just checked, it is, that explains why they all come positive, which is nice. I ...

again I'm not sure you can get that with Cholesky decomp

final kiln Dec 24, 2023, 10:31 PM

#

desert oar again I'm not sure you can get that with Cholesky decomp

I need to review my linear algebra >.>

#

But it seems that yes

#

Then the experiment is complete. And I've halved the number of parameters in each head because it's a symmetric matrix

desert oar Dec 24, 2023, 10:33 PM

#

i'm really surprised this isn't an established technique, if you have code I'd love to see it so I can play around with it

final kiln Dec 24, 2023, 10:33 PM

#

Sure, I'll send you in a bit.

#

I've just taken Q = xWq and multiplied it by its own transpose

#

And deleted Wk

#

Next step really is to see if this scales, benchmark it and all that stuff.

#

If no one has ever done this I'm gonna write a small paper on it

desert oar Dec 24, 2023, 10:35 PM

#

yeah that's what i wanted to try, running nanogpt with it

#

definitely worth a paper with your results

#

and if someone else already did it, then you'll find out 😆

desert oar Dec 24, 2023, 10:37 PM

#

shut slate ok cool

next time you ask for help in 2 different places please link to your other conversation for context

final kiln Dec 24, 2023, 10:39 PM

#

desert oar and if someone else already did it, then you'll find out 😆

I def wanna find out before committing to benchmarks

shut slate Dec 24, 2023, 10:39 PM

#

Ok thank you

final kiln Dec 24, 2023, 10:41 PM

#

desert oar yeah that's what i wanted to try, running nanogpt with it

https://paste.pythondiscord.com/W6BA

#

        # prepare for matrix multiplication
        q_b3wk = q_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
        # k_b3wk = k_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
        v_b3wk = v_bwc.view(batch, words, 3, coordinates // 3).transpose(1, 2)
      
        # perform matrix multiplication
        attention_scores_b3ww = q_b3wk @ q_b3wk.transpose(-1, -2)

this is the key difference

#

notation goes like

#

name_of_tensor_bwk, b = batch, w = words, k = coordinates // 3

#

it's tensor notation

#

v_b3wk = shape of (batch, 3, words, coordinates // 3)
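A NumPy translation of the snippet above (the original is PyTorch), to show the shapes and the key property: because the scores are Q Q.T instead of Q K.T, each head's score matrix is symmetric before any masking or softmax. The sizes are placeholders, `wq` stands in for the query projection, and any scaling or masking applied later in the original code isn't reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, words, coordinates = 2, 5, 12     # coordinates must be divisible by 3 heads
x_bwc = rng.normal(size=(batch, words, coordinates))
wq = rng.normal(size=(coordinates, coordinates))   # placeholder query projection

q_bwc = x_bwc @ wq
# (batch, words, 3, coordinates // 3) -> (batch, 3, words, coordinates // 3)
q_b3wk = q_bwc.reshape(batch, words, 3, coordinates // 3).transpose(0, 2, 1, 3)

# Tied scores: Q Q.T per head, shape (batch, 3, words, words)
attention_scores_b3ww = q_b3wk @ q_b3wk.transpose(0, 1, 3, 2)

print(attention_scores_b3ww.shape)
```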

desert oar Dec 24, 2023, 10:44 PM

#

makes sense

#

hmmm, wait. that's not the same thing

final kiln Dec 24, 2023, 10:44 PM

#

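import torch.nn.functional as F  # F.softmax below comes from here

inputs = batch.to("cuda")
outputs = nanoGPT(inputs)                  # logits over the vocabulary (last dim)
probs = F.softmax(outputs, dim=-1)         # optional: argmax is the same on logits or probs
max_values, max_indices = probs.max(dim=-1)
print("BEFORE M")
print("input =", inputs[3])
print("output =", max_indices[3])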

this will print a result after training

desert oar Dec 24, 2023, 10:44 PM

#

or is it? i guess you're just recycling the variable name

#

i'd call them W or something like that

final kiln Dec 24, 2023, 10:45 PM

#

they already include the matrix mul

desert oar Dec 24, 2023, 10:45 PM

#

yeah i see it

final kiln Dec 24, 2023, 10:45 PM

#

(xWq)(xWq).T = QQ.T
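Spelled out, as a short derivation assuming Q = XW_q with no bias term: tying K to Q is the same as scoring with one symmetric matrix M = W_q W_q^T, which is also why the distance values discussed earlier come out non-negative.

```latex
\begin{aligned}
QQ^\top &= (XW_q)(XW_q)^\top = X\,W_q W_q^\top\,X^\top = X M X^\top,
  \qquad M = W_q W_q^\top = M^\top,\\
u M u^\top &= (uW_q)(uW_q)^\top = \lVert uW_q \rVert^2 \ge 0
  \quad \text{for any row vector } u.
\end{aligned}
```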

desert oar Dec 24, 2023, 10:47 PM

#

right

#

i'm on my phone now so i'll check the full code later. thanks!

final kiln Dec 24, 2023, 10:59 PM

#

at least GPT-4 can't find anything of the sort

#

I'm skeptical about how far this can be taken, though. If I can teach it to do Shakespeare without loss of efficiency, I'll be very excited indeed

magic island Dec 25, 2023, 12:40 AM

#

anyone here ever worked with image or object detection?

pale hemlock Dec 25, 2023, 1:07 AM

#

anyone around to toy with an idea?

onyx forge Dec 25, 2023, 1:36 AM

#

past meteor Is there any reason why you're not doing the standard breadth first search?

the what?

white hedge Dec 25, 2023, 3:12 AM

#

hi

lament pine Dec 25, 2023, 11:42 AM

#

https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&pp=iAQB is this playlist good enough to follow? ( considering it's 7 years old now )

YouTube

sentdex

Practical Machine Learning Tutorial with Python Intro p.1

#

If anyone has followed it, kindly share your feedback

vapid garden Dec 25, 2023, 2:07 PM

#

magic island anyone here ever worked with image or object detection?

Me

#

Hi everyone, I received a hiring challenge which I think is pretty lame. Basically, I have to find the pattern in the numbers that pop up continuously on the website. I'm confident that I can create an ML model, but can anyone suggest how I can use JavaScript to predict the label while the numbers are appearing? The link to the website is below:

https://superbhai.in/hiring-challenge-unscramble?m=n

tidal bough Dec 25, 2023, 2:13 PM

#

you did find the hidden link to the full dataset, right?

vapid garden Dec 25, 2023, 3:13 PM

#

No, idk how to do that... I don't know web dev

tidal bough Dec 25, 2023, 3:15 PM

#

~~sounds like you won't get hired, then~~
if you look at the source, there's a comment suggesting to find what makes the numbers change. by looking at the JS script in the debugger (which even has a source map, no decompilation needed) you can find the link to the full dataset in a comment.

whole zephyr Dec 25, 2023, 8:42 PM

#

Hi, I have an algorithm that detects trendlines for the "low" and "high" levels of a signal in a dataframe, over a time window that I can set.

After I have the trendlines' slopes saved into the dataframe, I try to find which trendline pairs are parallel and: ascending, descending or horizontal.

Can anyone hop on a call with me to look through the code? It's a bit long to explain

serene scaffold Dec 25, 2023, 8:51 PM

#

whole zephyr Hi, I have an algorithm that detects trendlines the "low" and "high" levels of a...

Just so you know, it's pretty unlikely that anyone will join a call with you. If they do, that's great. But you should probably ask the most focused question that you can and show relevant code as text.

whole zephyr Dec 25, 2023, 8:52 PM

#

serene scaffold Just so you know, it's pretty unlikely that anyone will join a call with you. If...

how do I paste code here that will look more like code and not plain text?

serene scaffold Dec 25, 2023, 8:52 PM

#

whole zephyr how do I paste code here that will look more like code and not plain text?

!code

arctic wedgeBOT Dec 25, 2023, 8:52 PM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold Dec 25, 2023, 8:52 PM

#

^ permanently remember the information here

whole zephyr Dec 25, 2023, 8:58 PM

#

Here are the functions I use to:

- classify the trend patterns (wedges, channels, triangles) from the trendlines' slopes
- get the dataframe index of the N-th occurrence of a certain trend pattern that was identified by the previous function

https://paste.pythondiscord.com/HLCA

#

My problem is that the trendlines for channels don't look very parallel, even though sensitivity = 0.001 and the condition for a descending channel is abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity

In the second pic, it says I have slopes -0.06 and -0.01 which give an absolute difference of 0.05 that's greater than 0.01, which is 10*sensitivity.

I don't know where my function failed.

desert oar Dec 25, 2023, 11:56 PM

#

whole zephyr My problem is that the trendlines for channels don't seem to look too parallel, ...

so maybe look further up in your code? if the slopes are not what you expected, maybe you made a mistake in that part of the code

rigid cape Dec 26, 2023, 5:21 AM

#

Hey guys, sorry for asking such a beginner question, but I'm confused. Do I need to learn a lot of math before learning ML, and do I need scikit-learn before I touch something like TensorFlow?

What would a typical roadmap look like?

grizzled locust Dec 26, 2023, 7:09 AM

#

guys, what did i do wrong.

#

i wanted it to look like this

#

but instead, it looks like this

whole zephyr Dec 26, 2023, 7:42 AM

#

desert oar so maybe look further up in your code? if the slopes are not what you expected, ...

My first suspicion was the part that "marks" the trendline pattern with 1, but I then input the upper and lower slopes into the if conditions and got "nothing":

sensitivity = 0.001
slmin = -0.06
slmax = -0.01
if abs(slmin - slmax) < sensitivity*10 and slmin > sensitivity and slmax > sensitivity:
    print("chan_asc")
# chan_desc is appended 1 when slopes are almost parallel and both of them are below a small negative value
elif abs(slmin - slmax) < sensitivity*10 and slmin < -sensitivity and slmax < -sensitivity:
    print("chan_desc")
# chan_hor is appended 1 when slopes are almost parallel and both of them have a really small amount (around 0)
elif abs(slmin - slmax) < sensitivity*10 and abs(slmin) < sensitivity and abs(slmax) < sensitivity:
    print("chan_hor")
# tri_desc is appended 1 when slmin is very small and positive and slmax is negative
elif slmin > sensitivity and slmax < -sensitivity and abs(slmin) < sensitivity:
    print("tri_desc")
# tri_asc is appended 1 when slmax is very small and negative and slmin is positive
elif slmax < -sensitivity and slmin > sensitivity and abs(slmax) < sensitivity:
    print("tri_asc")
# wed_desc is appended 1 when slmin is negative and slmax is negative but slmin is greater than slmax
elif slmin < -sensitivity and slmax < -sensitivity and slmax < slmin: 
    print("wed_desc")
# wed_asc is appended 1 when slmin is positive and slmax is positive but slmin is greater than slmax
elif slmin > sensitivity and slmax > sensitivity and slmin > slmax:  
    print("wed_asc")
else:
    print("nothing")
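One way to make those rules easier to debug is to put them in a function that returns every label whose condition holds, instead of stopping at the first `elif`. The sketch below copies the conditions above unchanged (the function name is hypothetical). It also makes one problem visible: the `tri_desc` and `tri_asc` conditions can never be true, because each requires a slope to be both larger than `sensitivity` in magnitude and smaller than it (e.g. `slmin > sensitivity and abs(slmin) < sensitivity`).

```python
def matching_patterns(slmin, slmax, sensitivity=0.001):
    """Return every pattern label whose condition (copied from above) holds."""
    s = sensitivity
    parallel = abs(slmin - slmax) < s * 10
    rules = {
        "chan_asc": parallel and slmin > s and slmax > s,
        "chan_desc": parallel and slmin < -s and slmax < -s,
        "chan_hor": parallel and abs(slmin) < s and abs(slmax) < s,
        # Unreachable: needs slmin > s and abs(slmin) < s at the same time
        "tri_desc": slmin > s and slmax < -s and abs(slmin) < s,
        # Unreachable: needs slmax < -s and abs(slmax) < s at the same time
        "tri_asc": slmax < -s and slmin > s and abs(slmax) < s,
        "wed_desc": slmin < -s and slmax < -s and slmax < slmin,
        "wed_asc": slmin > s and slmax > s and slmin > slmax,
    }
    return [name for name, hit in rules.items() if hit]

print(matching_patterns(-0.06, -0.01))    # [] -> "nothing", as observed
print(matching_patterns(-0.02, -0.015))   # ['chan_desc']
```

Listing all matches also shows that some rules overlap: for two positive, nearly parallel slopes such as (0.02, 0.015), both `chan_asc` and `wed_asc` hold, so only the order of the `elif` chain decides which one is recorded.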

#

I also tried to see if the plotting function calculates the trendlines differently than the function that adds their slope and intercept to the dataframe, but they use the exact same method on the exact same time frame, so that's no issue either. I'm literally out of ideas on what to test

bold timber Dec 26, 2023, 8:17 AM

#

So all this time, I've been "trapped" by my curiosity about Transformers architecture. And it turns out, to study the Transformer architecture, I had to go back and learn the creation of machine translation models using RNN and RNN+Bahdanau attention.

As a result, I have now built three machine translation models using the TensorFlow library. However, when I compare the results, it turns out that the machine translation model using the Transformer architecture has much lower accuracy compared to the other two RNN models, only around 55%. Here are the details:

Classic RNN
loss: 0.1082 - accuracy: 0.9762 - val_loss: 0.4065 - val_accuracy: 0.9304

RNN + Bahdanau Attention
loss: 0.2247 - accuracy: 0.9593 - val_loss: 0.5537 - val_accuracy: 0.9100

Transformer
loss: 0.9383 - accuracy: 0.7688 - val_loss: 2.6770 - val_accuracy: 0.5565

On the one hand, when I look at the results of others who have made machine translation with the Transformer architecture, the validation loss values they get are similar to mine, around 2.4 (this output is generated when using the PyTorch library).

My question is, why does the accuracy of the Transformer architecture seem much lower than that of the RNN models? Initially, I thought this might be because the metric used to measure model performance is "accuracy", whereas machine translation models should be evaluated with other metrics, such as the BLEU score. However, when I try to use the BLEU score as the metric, my computer is not capable of running it.

Is this "accuracy" metric unreliable, and is that why the RNN models seem to perform much better than the Transformer? Or is it something else?

proud wing Dec 26, 2023, 8:32 AM

#

How deep did you go with regards to the transformer architecture learning?

odd meteor Dec 26, 2023, 8:33 AM

#

Merry Christmas guys! Today is a good day to close your laptop and enjoy the Christmas celebration with your family and friends. 😁👌

proud wing Dec 26, 2023, 8:35 AM

#

BLEU is likely to be more suitable here @bold timber

#

Is your model tuned to the dataset ?

bold timber Dec 26, 2023, 8:40 AM

#

proud wing BLEU is likely to be more suitable here @bold timber

I think so, but my computer isn't capable of running it if I use the BLEU score as an accuracy metric 😅

bold timber Dec 26, 2023, 8:40 AM

#

proud wing Is your model tuned to the dataset ?

can you elaborate more about this?

odd meteor Dec 26, 2023, 8:45 AM

#

bold timber So all this time, I've been "trapped" by my curiosity about Transformers archite...

When it comes to neural machine translation, the evaluation metric isn't the conventional accuracy score.

Metrics like BLEU, GLEU, METEOR, Perplexity, COMET, etc are used.

It's strange to learn you're unable to compute BLEU score on your pc. What exactly happens when you try calculating BLEU? It throws an error message or kernel dies or ??

bold timber Dec 26, 2023, 8:47 AM

#

bold timber So all this time, I've been "trapped" by my curiosity about Transformers archite...

Note: the context here is the machine translation model for english-french

bold timber Dec 26, 2023, 8:50 AM

#

odd meteor When it comes to neural machine translation, the evaluation metric isn't the con...

No. But when I try to use the BLEU score as a metric, I need to set run_eagerly = True, which makes the kernel die (otherwise, it throws an error)

odd meteor Dec 26, 2023, 8:58 AM

#

bold timber No. But when I try to use the BLEU score as a metric, I need to set run_eagerly = T...

Can I see the code snippet where you're computing BLEU?

If you're using the bleu module from NLTK and it's not working for you (it should work under normal circumstances, though), then try using sacreBLEU

https://github.com/mjpost/sacreBLEU

bold timber Dec 26, 2023, 9:13 AM

#

odd meteor Can I see the code snippet where you're computing BLEU? If you're using the b...

Hi, thanks for the suggestion! I will try to understand it first.

dusk tide Dec 26, 2023, 12:25 PM

#

Hello everyone, I just completed a notebook, link here: https://www.kaggle.com/code/nishchay331/n6-house-prices-advanced-regression-techniques. I'll be glad if you take some time to go through it and point out my mistakes, as I am a beginner. Suggestions are highly appreciated. Thank you.

bold timber Dec 26, 2023, 2:07 PM

#

odd meteor Can I see the code snippet where you're computing BLEU? If you're using the b...

do you know how to use the BLEU module from NLTK as an accuracy metric in a machine translation model with TensorFlow?

odd meteor Dec 26, 2023, 2:25 PM

#

bold timber do you know how to use the BLEU module from NLTK as an accuracy metric in a...

Once you've trained your NMT model and you'd like to evaluate its performance, you just need the translated text (the sentence predicted by your model) and the actual text in the target language (the correct sentence you expect your model to predict) to compute the BLEU score.

from nltk.translate.bleu_score import sentence_bleu
reference = ['It was raining heavily today'.split()]
candidate = 'It It It is raining heavily'.split()
print(sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0)))

There are different types of BLEU computation: sentence_bleu, corpus_bleu... This will provide more clarity: https://machinelearningmastery.com/calculate-bleu-score-for-text-python/

bold timber Dec 26, 2023, 2:28 PM

#

odd meteor Once you've trained your NMT model, and you'd like to evaluate the model perform...

Does this mean that the BLEU score is not included in the metrics during model compilation, like model.compile(metrics=[BLEU])?

odd meteor Dec 26, 2023, 2:36 PM

#

bold timber Does this mean that the BLEU score is not included in the metrics during model c...

I have no idea if it's possible to implement it that way in TensorFlow. I use PyTorch not TF.

You might wanna check the TensorFlow documentation. Or perhaps someone who uses TensorFlow could offer better feedback on this.

bold timber Dec 26, 2023, 2:39 PM

#

odd meteor I have no idea if it's possible to implement it that way in TensorFlow. I use Py...

Alright, thank you for the response

bold timber Dec 26, 2023, 3:25 PM

#

odd meteor I have no idea if it's possible to implement it that way in TensorFlow. I use Py...

By the way, how do you determine weights in sentence_bleu?

odd meteor Dec 26, 2023, 3:52 PM

#

bold timber By the way, how do you determine weights in sentence_bleu?

The BLEU score calculation in NLTK lets us compute either the cumulative or the individual BLEU scores for different n-grams.

By default, sentence_bleu() calculates the cumulative 4-gram BLEU score, also called BLEU-4. The weights for BLEU-4 are (0.25, 0.25, 0.25, 0.25) (remember, 1/4 == 0.25)

Now, in the example I showed earlier, the weights are (0.5, 0.5, 0, 0) because I'm interested in computing BLEU-2 a.k.a bigram bleu score (remember 1/2 == 0.5 )

If you check the link in my previous response, it has a section called **Individual vs. Cumulative BLEU scores** where this concept is explained in more detail
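To see what the weights actually do, here is a small pure-Python sketch of sentence-level BLEU: clipped (modified) n-gram precisions combined as a weighted geometric mean, multiplied by a brevity penalty. It is a simplified illustration without smoothing (n-gram orders with weight 0 are skipped), not NLTK's implementation. For the earlier example, the clipped unigram precision is 3/6 = 0.5 and the bigram precision is 1/5 = 0.2, so weights (0.5, 0.5) give sqrt(0.5 * 0.2), about 0.316.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(references, candidate, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often
    as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def simple_bleu(references, candidate, weights=(0.25, 0.25, 0.25, 0.25)):
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue                  # zero-weight orders don't contribute
        p = modified_precision(references, candidate, n)
        if p == 0:
            return 0.0                # no smoothing in this sketch
        log_sum += w * math.log(p)
    # Brevity penalty against the reference length closest to the candidate's
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)

reference = ['It was raining heavily today'.split()]
candidate = 'It It It is raining heavily'.split()
print(simple_bleu(reference, candidate, weights=(0.5, 0.5)))     # BLEU-2
print(simple_bleu(reference, candidate, weights=(1, 0, 0, 0)))   # BLEU-1
```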

vapid garden Dec 26, 2023, 3:52 PM

#

Hi everyone, I want to fine-tune a multimodal LLM that can generate icons based on input text, but I'm not able to find open-source models for it. Any ideas?

bold timber Dec 26, 2023, 3:53 PM

#

odd meteor The BLEU score calculation in NLTK gives us leverage to compute either the cumul...

aah I see, thank you so much!

midnight root Dec 26, 2023, 4:04 PM

#

so i seem to have found 2 ways to break ChatGPT 3.5

#

is it common? by break i mean bypassing the explicit rule

small wedge Dec 26, 2023, 4:06 PM

#

midnight root so i seem to have found 2 ways to break ChatGPT 3.5

You mean prompts?

midnight root Dec 26, 2023, 4:06 PM

#

no i just straight up asked questions about something, then kept slightly changing the subject

#

it worked

#

twice

agile cobalt Dec 26, 2023, 4:12 PM

#

you can report it directly to OpenAI, but it is not ultra surprising
that sort of stuff (using ai tools / prompt engineering / their vulnerabilities) is not particularly on topic for this server though

midnight root Dec 26, 2023, 4:14 PM

#

also ChatGPT does lie a lot

#

i just wish there was a way to get actual ai experience

desert oar Dec 26, 2023, 4:15 PM

#

whole zephyr My first suspicion was the part that "marks" the trendline pattern with 1, but I...

and what outcome did you expect for those inputs? are you sure your if/else conditions are correct?

desert oar Dec 26, 2023, 4:15 PM

#

midnight root i just wish there was a way to get actual ai experience

what kind of experience are you looking for?

midnight root Dec 26, 2023, 4:18 PM

#

a computer that is programmed to tell the truth, and isn't always restricted

#

i understand there are some topics which it should avoid, but it seems like it's always lying, or just not telling the whole truth, or just trying as hard as possible to give you one single opinion

small wedge Dec 26, 2023, 4:30 PM

#

That's not really how these models work, there is no concept of the truth to them

#

They are predicting the most likely text to come after whatever you give them

#

Even if they were trained on 100% factual and true datasets (which they absolutely are not) they could still give you false information

midnight root Dec 26, 2023, 4:32 PM

#

small wedge They are predicting the most likely text to come after whatever you give them

yes i know that but it's so restricted

pure pond Dec 26, 2023, 5:24 PM

#

also, outside of pure logic, truth isn't really so well defined, especially on topics a lot of people care about. By that I mean people disagree with each other, and not because all but one of them are intentionally being malicious

#

you don't even have to look to politics, think of the legal industry. We try to create a set of rules where if you do x then y happens. But it doesn't work like some complex algorithm in a deterministic way: the quality of your lawyer has a huge impact, and facts, if they exist, are interpreted

#

so, how do you even measure how truthful an llm is?

#

and sure, there are things that are widely accepted to be true, or at least by communities who should understand the topic well, but there is so much grey area everywhere that it doesn't help that much (and you're not interested in restriction of topics)

#

@midnight root

#

and if you really just want an unrestricted model now, get Mixtral 8x7B running

pure pond Dec 26, 2023, 5:29 PM

#

midnight root is it common? by break i mean bypassing the explicit rule

check out https://youtu.be/zjkBMFhNj_g?t=2774

#

anyway

#

I (think I) know a fair amount of deep learning theory, but I want to get some practical experience. I'm doing a Kaggle competition and I'm wondering how most people do their data processing. By that I mean setting up something to feature-engineer each input, not so much the mining. Do you just write whatever functions you need, or is the industry standard to make (heavy?) use of sklearn pipelines or similar? I guess it depends on the task, but let's say you're working on your own on a relatively static and manageable dataset, i.e. not worrying about big data streams in the cloud or something
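For a sense of what the sklearn route looks like, a minimal sketch with made-up columns and toy data: a `ColumnTransformer` applies per-column preprocessing, and a `Pipeline` chains it with the model, so the transformations are fit on the training data and then reused unchanged on new data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.DataFrame({
    "age": [22, 35, None, 58, 41, 30],
    "city": ["a", "b", "a", "c", "b", "a"],
    "label": [0, 1, 0, 1, 1, 0],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing ages
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

model.fit(train[["age", "city"]], train["label"])

# New data, including a missing age and a city never seen during training
test = pd.DataFrame({"age": [27, None], "city": ["c", "z"]})
print(model.predict(test))
```

Hand-written feature functions can still live inside this structure by wrapping them in `sklearn.preprocessing.FunctionTransformer`, so the choice isn't strictly either/or.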

pure pond Dec 26, 2023, 5:36 PM

#

rigid cape Hey guys, sorry for asking such a beginner question. But I have a confusion. Do ...

https://blog.paperspace.com/a-practical-guide-to-deep-learning-in-6-months/

#

actually, for my question, does anyone have any examples of like a github repo where they implemented an ML pipeline? Which was done either for a job interview kind of scale or while preparing? That'd be perfect

final kiln Dec 26, 2023, 5:48 PM

#

Idk why I haven't used this before, MLflow is amazing

#

It's running on an ec2 instance and connects to S3.

#

I'm gonna do three experiments: one for normal GPT, another for a slightly modified GPT, and another for a full variation on the transformer where I also incorporate a couple of other ideas I have

#

I'll train them on the same Shakespeare dataset and compare performances, and also how the performances scale with model size

#

I've already put some thought into how I'm gonna do the multimodal thing. It's gonna be voice input to text output, because one step at a time. The idea is to take Whisper and invert the decoder (somehow) so that I have a text-to-embedding translator, then use Llama to generate text, so I can use it to fine-tune another Llama to understand the embedding, turning Llama into a decoder for the Whisper encoder.

pure pond Dec 26, 2023, 6:00 PM

#

can't you just get embeddings from whisper?

#

and how will you feed embeddings into llama?

final kiln Dec 26, 2023, 6:01 PM

#

pure pond cant you just get embeddings from whisper?

It's an encoder-decoder; I'm just gonna take the embeddings from the bottleneck

final kiln Dec 26, 2023, 6:01 PM

#

pure pond and how will you feed embeddings into llama

I'm gonna add something to the initial layers to adapt for the new input type

#

The greatest challenge will be inverting Whisper's decoder

pure pond Dec 26, 2023, 6:03 PM

#

final kiln I'm gonna add something to the initial layers to adapt for the new input type

output_hidden_states=True, apparently

#

oops, didn't mean to reply to that, how do I undo it?

#

whatever

#

https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel.forward.output_hidden_states

final kiln Dec 26, 2023, 6:05 PM

#

It's not so easy: you can get to the bottleneck (which I think is what that does, I'm not sure), but the problem is that the decoder has two inputs

pure pond Dec 26, 2023, 6:05 PM

#

also I don't think finetuning Llama will work for this, you'd have to (pre)train it all

final kiln Dec 26, 2023, 6:06 PM

#

I'll try it all the same, if it doesn't work I'm sure the lessons learned will inform my next step

pure pond Dec 26, 2023, 6:07 PM

#

ofc

#

I will point out this paper though: https://arxiv.org/abs/2305.11206. In my view it was a big landmark in the shift towards thinking that finetuning doesn't really teach the model any new information (or, in this case, how to handle new data formats), which is a pretty mainstream view now. Check out the start of section 2 in the paper. https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts, from around the same time, is also pretty good

#

as, yeah, I see people parrot this a lot without really knowing where it all comes from

final kiln Dec 26, 2023, 6:17 PM

#

Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.

pure pond Dec 26, 2023, 6:19 PM

#

I guess that's a fair point; it's more saying that finetuning brings out what the model already knows. But I spent about a third of the year just finetuning Llamas, trying to get them to better understand the domain of my company's industry, and didn't get that far. It's why there's been such a hard pivot to RAG now (imo)

final kiln Dec 26, 2023, 6:20 PM

#

pure pond I guess thats a fair point, its more saying that finetuning brings out what the ...

that's the point of fine-tuning tho: I don't wanna introduce knowledge, that's too expensive, I want it to adapt to a new input type

pure pond Dec 26, 2023, 6:24 PM

#

I mean sure, I'm not really an expert. I could (in theory) tell you details of how it went at my company, but it's also just our team, and a fairly small one. I see finetuning as getting the model to output in more predictable ways, and in ways you design. What you're doing is more the other side: getting the model to take an input that it's unfamiliar with. So instead of condensing whatever region of the embedding space represents its "understanding" in different ways, you want it to handle unfamiliar regions of the space. But tbh, I've never tried anything like that, and can't think of any experiments like that either, at least none that I know about

#

so yeah this is a cool thing to try actually, hit me up if it goes well lol

#

but it might just take a (hopefully easy enough) transformation to get the embeddings into something Llama knows

final kiln Dec 26, 2023, 6:25 PM

#

pure pond I mean sure, I'm not really an expert, I could (in theory) tell you details of h...

tbh there's probably a good reason why GPT-4 uses a translation network for its vision thing instead of finetuning the starting layers

#

or at least, that's how I heard it's done in general; not GPT-4 specifically ofc, cuz that's not open source

#

the reason I wanna fine tune is that the embedding will contain other useful information that I know language models understand but can't get from most text prompts

pure pond Dec 26, 2023, 6:29 PM

#

yeah, that's where my gut feeling that pretraining would be better suited than finetuning kicks in. But again, idk really

#

yeah idk, interesting problem

final kiln Dec 26, 2023, 6:31 PM

#

like, I know GPT-4 "understands" sadness conceptually: it can describe it, and it adapts if you tell it you are sad. But you need to do extra work for the prompt to include emotions

#

a speech embedding will already contain all that

#

and since the language model also already contains some representation of it

#

I'm hoping that it will just sort of adapt to it

#

this is late stage stuff tho

potent sky Dec 26, 2023, 6:58 PM

#

pure pond yeah, thats where my gut feeling is that pretraining will be better suited than ...

I think the same... it might give optimistic-looking results after a bunch of fine-tuning, but pre-training is where it's at.
The LLM's capability comes from a ton of parameters trained on understanding the structure of natural language, not numbers in an embedding space

#

interesting approach tho, backwards_propagation
I've thought about this problem a bunch too: the information lost in the original embeddings when going for multimodality.
But this would only occur in "put-together" multimodal networks, so to speak, right? Not in ones designed for multimodality

#

which is a long winded way to say that that should be your pre-training task

final kiln Dec 26, 2023, 7:01 PM

#

potent sky I think the same....it might give optimistic looking results after a bunch of fi...

The embedding spaces are the things that actually encode the structure of natural language.

potent sky Dec 26, 2023, 7:02 PM

#

yes but not as input

#

and when you remove the input layer the embedding space shifts

final kiln Dec 26, 2023, 7:02 PM

#

The information is extracted from the embeddings, so it must encode meaning.

potent sky Dec 26, 2023, 7:03 PM

#

yes but LLMs are not trained to decode structure from embedding space inputs, they're trained to decode structure and meaning from natural language inputs

final kiln Dec 26, 2023, 7:04 PM

#

The text input is first translated into an embedding, using a table of embeddings that the network learns.

#

I personally don't like arguing like this, I've banged my head against the wall so many times by trying to guess what's possible or what's not using my own intuition.

potent sky Dec 26, 2023, 7:05 PM

#

xd i get it

final kiln Dec 26, 2023, 7:05 PM

#

I've learned to accept that I just can't predict what works until I try it

potent sky Dec 26, 2023, 7:05 PM

#

you'd still need a pre-trained joint embedding space model at minimum right

#

but that isn't a huge ask

final kiln Dec 26, 2023, 7:06 PM

#

I'm just gonna let the first layers change. Possibly apply a decaying learning rate as a function of model depth
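
A minimal sketch of a depth-dependent learning rate, assuming layers are indexed from the input upward (index 0 first) and the rate shrinks geometrically with depth, so the first layers adapt the most. The function name, the `decay` value and the layer list are hypothetical; the output is a list of param-group dicts in the `{"params": ..., "lr": ...}` shape that `torch.optim` optimizers accept.

```python
from typing import Any, Dict, List, Sequence


def depthwise_param_groups(
    layers: Sequence[Any], base_lr: float, decay: float = 0.8
) -> List[Dict[str, Any]]:
    """Build optimizer param groups whose learning rate decays with depth.

    Layer 0 (closest to the input) gets base_lr; layer i gets base_lr * decay**i.
    Each element of `layers` stands in for one block's parameters.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay should be in (0, 1]")
    return [
        {"params": params, "lr": base_lr * decay**depth}
        for depth, params in enumerate(layers)
    ]


# Hypothetical three-block model; strings stand in for parameter lists.
groups = depthwise_param_groups(["embed", "block0", "block1"], base_lr=1e-4, decay=0.5)
for g in groups:
    print(g["params"], g["lr"])
```

With PyTorch you would replace each string with that block's parameters (e.g. `list(block.parameters())`) and pass the result to an optimizer such as `torch.optim.AdamW(groups)`.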

potent sky Dec 26, 2023, 7:06 PM

#

final kiln I've learned to accept that I just can't predict what works until I try it

with the amount of unexpected results, or results that are only explained after they're observed, in ML, that's a reasonable approach imo

potent sky Dec 26, 2023, 7:07 PM

#

final kiln I'm just gonna let the first layers change. Possibly apply a decaying learning r...

interesting

#

lmk if it shows promise!

final kiln Dec 26, 2023, 7:08 PM

#

Sure, I always post what I'm doing here in case people want to chip in. It's always super helpful to discuss and hear new ideas, as I'm not very experienced in NLP

potent sky Dec 26, 2023, 7:10 PM

#

ah nice! I used to be pretty regular here but life gets in the way lol, still drop in from time to time

pure pond Dec 26, 2023, 7:34 PM

#

potent sky and when you remove the input layer the embedding space shifts

I think you're in agreement though. backprop is saying that there will be a network that tries to translate Whisper embeddings to Llama embeddings, getting rid of the first part of Llama that turns the sequence into embeddings, the hope being that this network can learn to do that shift. In a sense, instead of Whisper -> text -> Llama, it'll be last embeddings of Whisper -> translation network -> first embeddings of Llama, which avoids destroying information by converting to text in the middle

I think this is all fine in theory, but my intuition is that Llama won't really make use of the extra information Whisper knows about just from finetuning; it needs pretraining as a fresh language model
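
A minimal numpy sketch of the "translation network" idea, assuming the simplest possible adapter: a single learned linear map applied to each frame of the audio encoder's last hidden states. The class name is hypothetical, the weights are random rather than trained, and the widths (512 in, 4096 out) are placeholders for the encoder and LM hidden sizes.

```python
import numpy as np


class LinearAdapter:
    """Hypothetical translation layer: encoder states -> LM input embeddings.

    Maps a (seq_len, d_in) array of audio-encoder hidden states to
    (seq_len, d_out) vectors that would be fed to the language model in place
    of its token-embedding lookup. Weights are random here; in practice they
    would be trained.
    """

    def __init__(self, d_in: int, d_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Scaled init keeps the output variance roughly independent of d_in.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out))
        self.b = np.zeros(d_out)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        return h @ self.W + self.b


# Placeholder sizes: 10 frames, 512-wide encoder, 4096-wide LM.
adapter = LinearAdapter(d_in=512, d_out=4096)
encoder_states = np.random.default_rng(1).normal(size=(10, 512))
lm_inputs = adapter(encoder_states)
print(lm_inputs.shape)  # (10, 4096)
```

During training the LM could stay frozen (or be LoRA-tuned) while the adapter learns; a small MLP instead of one linear layer is one possible variation.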

#

no reason not to start with finetuning though, considering compute (btw backprop, in case you haven't come across it, try LoRA)

potent sky Dec 26, 2023, 7:37 PM

#

Your last paragraph is what I think the joint-space training I mentioned would be useful for

#

LoRA and QLoRA

pure pond Dec 26, 2023, 7:38 PM

#

ah ok, wasnt familiar with the term

potent sky Dec 26, 2023, 7:38 PM

#

Though I wonder, for the joint-space embeddings,
is it necessary to pre-train a full model,
or can we use an adapter network in some sense?

#

Interesting stuff

potent sky Dec 26, 2023, 7:40 PM

#

pure pond I think youre in agreement though, backprop is saying that there will be a netwo...

Also I think currently backprop is trying to avoid using a separate translation network and hoping the first few layers of llama will adapt to doing that on fine tuning
This is a part I'm not very confident about

frigid owl Dec 26, 2023, 8:14 PM

#

Hey guys, I know it's a pretty dumb question, but what are x_train and y_train, and why do we always have to have both x and y (e.g. x_val, y_val; x_test, y_test)?

serene scaffold Dec 26, 2023, 8:18 PM

#

frigid owl Hey guys, i know its a pretty dumb question but what is x_train and y_train and ...

#

for each instance, y is the property that you want the model to output, and X are the properties it can use to do that.

#

So if your data set has these properties about a house: {square footage, number of rooms, neighborhood, price}
and you want to be able to predict the price
then price is the y value, and {square footage, number of rooms, neighborhood} are the X values.
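
A small sketch of that split in plain Python, using made-up house records (the numbers are invented for illustration): `X` holds the feature columns, `y` holds the price, and one shuffle of row indices produces aligned `X_train`/`y_train` and `X_test`/`y_test` pairs. A validation split works the same way.

```python
import random

# Made-up records for illustration only.
houses = [
    {"sqft": 850, "rooms": 2, "neighborhood": "north", "price": 210_000},
    {"sqft": 1200, "rooms": 3, "neighborhood": "south", "price": 305_000},
    {"sqft": 1600, "rooms": 4, "neighborhood": "north", "price": 390_000},
    {"sqft": 700, "rooms": 1, "neighborhood": "east", "price": 150_000},
    {"sqft": 2000, "rooms": 5, "neighborhood": "south", "price": 480_000},
]

feature_names = ["sqft", "rooms", "neighborhood"]  # categorical columns need encoding for most models
X = [[h[name] for name in feature_names] for h in houses]  # inputs
y = [h["price"] for h in houses]                           # target

# Shuffle row indices once, then slice X and y with the same indices so rows stay paired.
indices = list(range(len(houses)))
random.Random(42).shuffle(indices)
n_test = 1
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]
```

Row i of `X_train` always belongs with `y_train[i]`; scikit-learn's `train_test_split` does the same bookkeeping for you.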

#

make sense so far?

frigid owl Dec 26, 2023, 8:21 PM

#

yeah

#

ty

#

So if I want a model to predict the number in each image, I need a matrix with all the pixels (X) and the number in the image (y)?

serene scaffold Dec 26, 2023, 8:22 PM

#

the whole image (the array of pixels) would be the X value, and the number that image represents would be the y, yes.
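
A tiny numpy sketch of what that looks like for a batch of digit images, assuming 28x28 grayscale images (the pixel values and labels here are random/made up): each image becomes one row of `X`, and `y` holds one label per row.

```python
import numpy as np

# Hypothetical batch: 4 grayscale 28x28 images and the digit shown in each.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
labels = np.array([7, 2, 1, 0])

# Each image becomes one row of 784 pixel values (X); labels stay 1-D (y).
X = images.reshape(len(images), -1).astype(np.float32) / 255.0
y = labels
print(X.shape, y.shape)  # (4, 784) (4,)
```

Flattening suits dense (fully connected) models; convolutional models usually keep the 2-D image shape instead.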

frigid owl Dec 26, 2023, 8:23 PM

#

Oh okay

#

Again thank you so much

serene scaffold Dec 26, 2023, 8:24 PM

#

yw

hidden sapphire Dec 27, 2023, 3:30 AM

#

How does a model like the one behind https://thispersondoesnotexist.com/ work? All the ML problems I've done (very few) have had inputs and outputs; this just has an output. How does training a model like that work? (I'm not looking for an in-depth explanation, though I'll happily take one, more a few pointers on what to research.)

desert oar Dec 27, 2023, 4:39 AM

#

hidden sapphire How does a model that does what https://thispersondoesnotexist.com/ does work? ...

I'm not an expert in this field, but my broad understanding is that the layers get progressively bigger as you get towards the end of the network, culminating in a full-size image output

soft dock Dec 27, 2023, 4:39 AM

#

hidden sapphire How does a model that does what https://thispersondoesnotexist.com/ does work? ...

There is no input because the model is no longer training. Also, it is a type of generative adversarial network.

https://github.com/NVlabs/stylegan
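
Worth adding: at generation time a GAN generator does take an input, a randomly sampled latent vector (StyleGAN's is 512-dimensional), so each new sample gives a different face. A toy numpy sketch of that interface, with random weights standing in for a trained generator and an 8x8 output instead of a full-size image (all sizes hypothetical); the upsampling step mirrors the idea that spatial size grows towards the output.

```python
import numpy as np


class ToyGenerator:
    """Untrained stand-in for a GAN generator: latent vector -> image."""

    def __init__(self, latent_dim: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim
        # Project the latent vector to a small 4x4 "feature map".
        self.W = rng.normal(0.0, 1.0 / np.sqrt(latent_dim), size=(latent_dim, 4 * 4))

    def __call__(self, z: np.ndarray) -> np.ndarray:
        x = np.tanh(z @ self.W).reshape(4, 4)
        # Nearest-neighbour upsampling 4x4 -> 8x8: spatial size grows towards
        # the output (real generators also apply learned convolutions here).
        return np.kron(x, np.ones((2, 2)))


gen = ToyGenerator()
rng = np.random.default_rng(123)
img_a = gen(rng.normal(size=gen.latent_dim))  # new random z -> new "image"
img_b = gen(rng.normal(size=gen.latent_dim))
print(img_a.shape)  # (8, 8)
```

Training is where the adversarial part comes in: a discriminator learns to tell generated images from real ones, and the generator is updated to fool it.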

serene scaffold Dec 27, 2023, 4:57 AM

#

I got one who only sort of has glasses

hidden sapphire Dec 27, 2023, 5:02 AM

#

serene scaffold I got one who only sort of has glasses

lol I got the most generic looking dad

hidden sapphire Dec 27, 2023, 5:02 AM

#

soft dock There is no input because the model is no longer training. Also, it is a type of...

Oh okay thank you I'll look into that

potent sky Dec 27, 2023, 5:49 AM

#

Thankfully thispersondoesnotexist.com has been improved a lot, I think both with newer models, and importantly better data.
I remember there used to be a good share of NSFW images generated a few years ago
Which makes it awkward when you're excited trying to show someone this cool thing about your field 😅

spark tartan Dec 27, 2023, 8:00 AM

#

Can someone explain what's going on in this validation curve? I think it shows underfitting, since the training and cross-validation scores are still steadily increasing. Is that right?

lapis sequoia Dec 27, 2023, 8:28 AM

#

Anyone know how to fix this?

#

My df styling seems messed up

#

Columns are taking a lot of width and trying to expand fully

lapis sequoia Dec 27, 2023, 9:10 AM

#

The default itself has changed. I haven't run any other cell

oblique quarry Dec 27, 2023, 9:38 AM

#

Hey there! I'm trying to implement a GMM, and I'm struggling a bit to implement the pdf for the E-step. This is my current implementation:
```py
import numpy as np

def pdf(data, variance, mu):
    covariance = np.diag(variance)
    return np.exp(-0.5 * (data - mu).T @ np.linalg.inv(covariance) @ (data - mu)) / np.sqrt((2 * np.pi)**data.shape[1] * np.linalg.det(covariance))
```

#

i read about it in a paper covering the basics about multivariate data analysis, but translating it into code is always a different thing
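
One likely issue, assuming `data` is an (n, d) array: `(data - mu).T @ inv(cov)` is (d, n) @ (d, d), which only lines up when n happens to equal d, and even then the result is a (d, d) matrix rather than one quadratic form per sample. A vectorized sketch for the diagonal-covariance case (function names are hypothetical): with a diagonal covariance the quadratic form reduces to squared differences divided by the variances, and the determinant is the product of the variances. Working in log space avoids underflow in higher dimensions.

```python
import numpy as np


def diag_gaussian_logpdf(data: np.ndarray, variance: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Log-density of each row of `data` under N(mu, diag(variance)).

    data: (n, d), variance: (d,), mu: (d,). Returns shape (n,).
    """
    data = np.atleast_2d(data)
    d = data.shape[1]
    diff = data - mu                                  # (n, d)
    mahalanobis = np.sum(diff**2 / variance, axis=1)  # one quadratic form per row
    log_det = np.sum(np.log(variance))                # log det of diag(variance)
    return -0.5 * (d * np.log(2 * np.pi) + log_det + mahalanobis)


def diag_gaussian_pdf(data: np.ndarray, variance: np.ndarray, mu: np.ndarray) -> np.ndarray:
    return np.exp(diag_gaussian_logpdf(data, variance, mu))
```

These are densities rather than probabilities, so values above 1 are normal when the variances are small.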

odd meteor Dec 27, 2023, 1:24 PM

#

spark tartan Can someone explain what's going on in this validation curve? I think it shows u...

You didn't tell us which score is displayed on your Y-axis. RMSE or Coefficient of Determination (R2) or ??

Well, I'm gonna presume the y-axis is the loss. From the learning curve, you can see a convergence between your train and CV scores. As expected, the loss tends to increase as the number of neighbours increases.

It's certainly not overfitting. I won't say it's underfitting either.

While the goal is to train a model that makes no mistakes in its predictions, your model happens to be making ~35% error on the training data. (You know, it could have been way worse depending on your predefined threshold. For example, some people would shut down the whole thing and retrain if their model was making, say, >= 70% error on the training data. For others, >= 50% error on the training data is enough to trigger a complete overhaul, which would of course require training a new model.)

solid seal Dec 27, 2023, 2:14 PM

#

i need some help with computer vision

#

I have to generate keyboard inputs for the game from hand gestures

#

Is there an appropriate library to help me out with that?

#

pls ping for the reply

potent sky Dec 27, 2023, 4:05 PM

#

solid seal any appropriate library for that to help me out with it

Check out mediapipe

solid seal Dec 27, 2023, 4:07 PM

#

potent sky Check out mediapipe

I'm using that for the hand gestures part. Can you point me to some documentation for it? I can't find any

potent sky Dec 27, 2023, 4:11 PM

#

solid seal i am using the same for the hand gestures part , can you provide with some docum...

https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#get_started

Almost tailored to your use-case ig.
What issues are you facing?

solid seal Dec 27, 2023, 4:15 PM

#

potent sky https://developers.google.com/mediapipe/solutions/vision/gesture_recognizer#get_...

Now I've figured it out using pyautogui and defined the cases for pressing the bar when a closed hand opens, but as soon as I open my hand, the FPS drops to 0
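
Without the code it's hard to say why the FPS drops, but one thing worth ruling out is sending the key press on every frame while the hand stays open (each pyautogui call also sleeps for `pyautogui.PAUSE` seconds, 0.1 by default). A hedged sketch of edge triggering, which fires only on the closed-to-open transition; `EdgeTrigger` and `simulate` are hypothetical names, `is_open` would come from the gesture recognizer, and the callback could wrap something like `pyautogui.press("space")` if it's the space bar.

```python
from typing import Callable, Iterable, List


class EdgeTrigger:
    """Call `on_open` once per closed -> open transition, not once per frame."""

    def __init__(self, on_open: Callable[[], None]):
        self.on_open = on_open
        self.was_open = False

    def update(self, is_open: bool) -> None:
        if is_open and not self.was_open:
            self.on_open()
        self.was_open = is_open


def simulate(frames: Iterable[bool]) -> List[int]:
    """Return the frame indices at which a press would fire."""
    fired: List[int] = []
    frame_index = [0]
    trigger = EdgeTrigger(on_open=lambda: fired.append(frame_index[0]))
    for i, is_open in enumerate(frames):
        frame_index[0] = i
        trigger.update(is_open)
    return fired


# closed, closed, open, open, open, closed, open -> presses at frames 2 and 6
print(simulate([False, False, True, True, True, False, True]))
```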

#

heres the code for that

#

oh, I guess we can't send Python files here

heavy sigil Dec 27, 2023, 4:43 PM

#

Hey, I'm kinda stuck here. I'm trying to use OpenAI Whisper for transcription, but I'm having trouble loading the model.
I'm just using the basic transformers code from Hugging Face:
```py
import whisper
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

print("Downloading the model...")
direct = "model/"
model_1 = AutoModelForSpeechSeq2Seq.from_pretrained('openai/whisper-base', cache_dir=direct)
processor = AutoProcessor.from_pretrained('openai/whisper-base', cache_dir=direct)
model = whisper.load_model(model_1)
print("Model loaded. Transcribing test audio...")
result = model.transcribe("audio_test.mp3")
print(result["text"])
```
It seems that I can't load it directly (I'm just doing it wrong, that's all I know, but I'm not sure what to do). Do I need to use transformers?

quaint crescent Dec 27, 2023, 4:48 PM

#

Please critique my chart

oblique quarry Dec 27, 2023, 4:50 PM

#

Can somebody smarter than me tell me why my probabilities are all over the place? They oscillate between 0 and 20, which is behaviour I can't explain. Code: https://paste.pythondiscord.com/6K6Q Log file: https://pastebin.com/RrcNg1tw
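
Values up to 20 suggest the logged numbers may be raw densities rather than probabilities: a Gaussian pdf can exceed 1 when the variance is small (a 1-D normal with standard deviation 0.02 has a peak density of about 19.9). If so, the E-step still has to turn the weighted densities into responsibilities that sum to 1 per sample. A sketch assuming you already have an (n, K) array of per-component log-densities (the function name is hypothetical); it normalizes with log-sum-exp for numerical stability.

```python
import numpy as np


def responsibilities(log_dens: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """E-step: posterior P(component k | x_i) from log-densities.

    log_dens: (n, K) values of log N(x_i | mu_k, Sigma_k); weights: (K,) mixing weights.
    Returns an (n, K) array whose rows sum to 1.
    """
    log_joint = log_dens + np.log(weights)               # log(pi_k * N(x_i | k))
    shift = np.max(log_joint, axis=1, keepdims=True)     # log-sum-exp shift
    log_norm = shift + np.log(np.sum(np.exp(log_joint - shift), axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)


# Densities above 1 are fine as inputs; the output rows are proper probabilities.
dens = np.array([[19.9, 0.5], [0.1, 3.0]])
r = responsibilities(np.log(dens), np.array([0.5, 0.5]))
print(r.sum(axis=1))
```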

quaint crescent Dec 27, 2023, 5:12 PM

#

less clutter on y axis

desert oar Dec 27, 2023, 5:48 PM

#

oblique quarry Can somebody smarter tell me why my probabilties are all over the place. The oss...

can you pare this down at all? it's just a huge log output. also i suggest using https://paste.pythondiscord.com which isn't covered in malicious ads

desert oar Dec 27, 2023, 5:49 PM

#

heavy sigil Hey, i'm kinda stuck here, i'm trying to use open ai whisper for transcription, ...

thanks for the code. ideally you can include an error message to make it easier for people to help. include the entire "traceback" section, which tells you where exactly the error occurred in your code

desert oar Dec 27, 2023, 5:49 PM

#

quaint crescent less clutter on y axis

looks good, i wouldn't mind a faint horizontal line in the background at 0

heavy sigil Dec 27, 2023, 6:11 PM

#

desert oar thanks for the code. ideally you can include an error message to make it easier ...

that's the error I get:

```
    model = whisper.load_model(model_1)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\__init__.py", line 135, in load_model
    elif os.path.isfile(name):
         ^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 30, in isfile
TypeError: stat: path should be string, bytes, os.PathLike or integer, not WhisperForConditionalGeneration
```

heavy sigil Dec 27, 2023, 6:49 PM

#

@desert oar

final kiln Dec 27, 2023, 6:56 PM

#

I'm setting up some more infra for model training. After some research, I found that the most cost-efficient way is to use Amazon spot instances

#

so I'm gonna set up a self-hosted GitHub Actions runner that will be responsible for babysitting the spot instances

#

b4 that I need to set up spotty, which is the thing that'll actually handle them

#

spot instances are like EC2 instances, but AWS can bring them down at any time, which makes them cheaper; spotty essentially handles the fault-tolerance aspect, and it's specifically made for ML training

outer widget Dec 27, 2023, 7:00 PM

#

heavy sigil thats the error i get model = whisper.load_model(model_1) ^^^^^...

```py
processor = AutoProcessor.from_pretrained('openai/whisper-base', cache_dir=direct)
model = whisper.load_model(model_1)
```
Just use model = whisper.load_model(model_name), where model_name is one of Whisper's own model names such as "base" (not the Hugging Face id "openai/whisper-base"). The load_model function expects a string or path, while you are passing it the Hugging Face model object that AutoModelForSpeechSeq2Seq returned.

#

I don't think you even need AutoProcessor for transcribing with model.transcribe in Whisper models.

final kiln Dec 27, 2023, 7:21 PM

#

pure pond no reason not to start with finetuning though considering compute (btw backprop,...

Just read about this, super interesting, I might use it instead of directly altering the weights. Just to see if I understood, instead of altering the weights directly, you get something of the form

Y = W_fixed X + (delta W) X

where delta W = AB, the product of two trainable matrices, one initialized to 0 and the other to a Gaussian sample from N(0,1).
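
A numpy sketch of that forward pass, following the LoRA paper's convention of writing the update as delta W = BA, with A Gaussian-initialized and B zero-initialized, so the adapted layer starts out identical to the frozen one. The class name, shapes, rank r and the alpha/r scaling values are illustrative assumptions; in training only A and B would receive gradients.

```python
import numpy as np


class LoRALinear:
    """y = W0 x + (alpha / r) * B A x, with W0 frozen and A, B trainable."""

    def __init__(self, W0: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = W0.shape
        rng = np.random.default_rng(seed)
        self.W0 = W0                                   # frozen pretrained weight
        self.A = rng.normal(0.0, 1.0, size=(r, d_in))  # Gaussian init
        self.B = np.zeros((d_out, r))                  # zero init -> delta W = 0 at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in). Going through the r-dim bottleneck (x A^T, then B^T)
        # never materializes the full d_out x d_in update matrix.
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T


rng = np.random.default_rng(1)
W0 = rng.normal(size=(16, 32))
layer = LoRALinear(W0, r=4)
x = rng.normal(size=(3, 32))
print(np.allclose(layer(x), x @ W0.T))  # True: B = 0, so the output matches the frozen layer
```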

final kiln Dec 27, 2023, 7:23 PM

#

potent sky Also I think currently backprop is trying to avoid using a separate translation ...

Yes, the idea is that the network will know to adapt existing representations to the new kind of input which includes representations of the same thing but in a different "language".

heavy sigil Dec 27, 2023, 7:43 PM

#

outer widget ```model_1 = AutoModelForSpeechSeq2Seq.from_pretrained('openai/whisper-base', ca...

but what do i do with the model_1 variable?

outer widget Dec 27, 2023, 7:44 PM

#

heavy sigil but what do i do with the model_1 variable?

You can also remove that.

heavy sigil Dec 27, 2023, 7:45 PM

#

for whisper.load_model, can I just specify the directory of the model?

outer widget Dec 27, 2023, 7:49 PM

#

heavy sigil for whisper.load_model can i just specifiy the directory of the model

https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/__init__.py#L99

```
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
```

arctic wedgeBOT Dec 27, 2023, 7:49 PM

#

whisper/__init__.py line 99

```py
def load_model(
```

final kiln Dec 27, 2023, 7:56 PM

#

heavy sigil for whisper.load_model can i just specifiy the directory of the model

You can also use the help function to print the docstring of methods.

help(whisper.load_model)

frigid owl Dec 27, 2023, 8:04 PM

#

Hi guys. I just made my first NN. It's a number classification model. I would be really grateful if you gave me some tips / things to improve. Here is my code in Google Colab:
https://colab.research.google.com/drive/1S65G8iwa1XgwaGEiK_Sr6_ag70vNflsd?usp=sharing

pure pond Dec 27, 2023, 8:14 PM

#

final kiln Just read about this, super interesting, I might use it instead of directly alte...

Yeah basically, the key is also that A and B are low rank, so the embeddings are projected into a low rank space, and the training has the job of figuring out which projection is the most helpful. It's interesting that such a loss of potential information often has very small effects on performance
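
To put numbers on the "low rank" part: for a d_out x d_in weight, the update BA with inner dimension r has rank at most r and needs r(d_in + d_out) trainable numbers instead of d_in x d_out. A quick check with illustrative sizes (4096 x 4096 with r = 8 for the count, a smaller 256 x 256 for the rank check; these sizes are assumptions, not taken from any particular model).

```python
import numpy as np


def lora_param_counts(d_out: int, d_in: int, r: int) -> tuple:
    """(full fine-tune params, LoRA params) for one d_out x d_in weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


full, lora = lora_param_counts(4096, 4096, r=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%

# The update B @ A can never have rank above r, whatever B and A contain.
rng = np.random.default_rng(0)
B = rng.normal(size=(256, 8))
A = rng.normal(size=(8, 256))
print(np.linalg.matrix_rank(B @ A))  # 8
```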

heavy sigil Dec 27, 2023, 8:15 PM

#

Yeah it seems that it can load a checkpoint file but i don't see any??? i tried:
model = whisper.load_model("model/models--openai--whisper-base")
and i get RuntimeError: Model model/models--openai--whisper-base not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

pure pond Dec 27, 2023, 8:16 PM

#

frigid owl Hi guys. I just made my first nn. Its a number classification model. I would be ...

Not trying to be dismissive but try just pasting your code into chatgpt and asking it

heavy sigil Dec 27, 2023, 8:22 PM

#

@outer widget ??

heavy sigil Dec 27, 2023, 8:49 PM

#

I just realized that I don't need to use transformers: if I tell it to load, for example, base.en, it will download the required file automatically. However, I get this error now:

```
FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
  File "C:\Users\Default.DESKTOP-KA61FU5\Desktop\Talk2GPT\Talk2GPT.py", line 27, in <module>
    result = model.transcribe("audio_test.mp3")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\transcribe.py", line 122, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
    audio = load_audio(audio)
            ^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.DESKTOP-KA61FU5\AppData\Roaming\Python\Python311\site-packages\whisper\audio.py", line 58, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python311\Lib\subprocess.py", line 1493, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
```

heavy sigil Dec 27, 2023, 9:27 PM

#

@final kiln Do you know anything

final kiln Dec 27, 2023, 9:29 PM

#

heavy sigil <@935270247366271027> Do you know anything

Yes you need to first read the error and understand it. Sometimes the solution is obvious, other times requires some trial and error.

final kiln Dec 27, 2023, 9:37 PM

#

pure pond Yeah basically, the key is also that A and B are low rank, so the embeddings are...

I suppose it's like zip compression, it removes unnecessary info

pure pond Dec 27, 2023, 9:44 PM

#

makes you wonder how big llms really need to be

#

is anyone interested in RAG? I'm reading through https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/446422 and don't really understand why/which logits they're feeding into their binary classifier head (the steps under the figure).

So, they have the question with the retrieved context. The way I (try to) understand it (which also goes against other things they say they do, so I know I'm wrong) is that they take that context+question sequence, append the answer to the sequence, run the sequence through the model to get the logits for every possible token in the vocab, and feed those logits into a classifier head? Why would this give a helpful measure of how correct the answer is? Surely the LLM is happy to spit out the next token anyway; I don't see what relevance the specific logits there have.

I guess my first question is: is it right that they only do a forward pass through the network 5 times for each question (because of the 5 answers)? Just to make sure I'm on the same page.

Then second: is it just a thing I have to accept, that they can classify over the logits for each answer? I don't see where they got the idea from, or what the intuition is. If it were me, I'd try to get the logits for the model producing the tokens A/B/C/D/E, with whatever setup is needed for that
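
A sketch of that alternative (scoring the answer letters directly), assuming you already have the model's next-token logits at the last position of a prompt that lists the options and ends with something like "Answer:". The function name, vocab size, token ids and logits below are made up; with a real tokenizer you would look up each letter's id (watching for leading-space variants).

```python
import numpy as np


def score_options(next_token_logits: np.ndarray, option_token_ids: dict) -> dict:
    """Softmax restricted to the answer-letter tokens at the final position.

    next_token_logits: (vocab_size,) logits for the token after the prompt.
    option_token_ids: e.g. {"A": id_A, ...}; real ids come from the tokenizer.
    """
    letters = list(option_token_ids)
    picked = np.array([next_token_logits[option_token_ids[letter]] for letter in letters])
    probs = np.exp(picked - picked.max())
    probs /= probs.sum()
    return dict(zip(letters, probs))


# Hypothetical 10-token vocab with made-up ids and logits, for illustration only.
logits = np.array([0.1, 2.0, -1.0, 0.5, 3.2, 0.0, 1.1, -0.3, 0.9, 0.2])
ids = {"A": 1, "B": 3, "C": 4, "D": 6, "E": 8}
scores = score_options(logits, ids)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:3])  # ['C', 'A', 'D']
```

This needs one forward pass per question instead of one per answer, at the cost of relying on the model to follow the letter format.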

#

actually I think I have the right understanding of WHAT theyre doing, but still unsure as to WHY

heavy sigil Dec 27, 2023, 9:59 PM

#

final kiln Yes you need to first read the error and understand it. Sometimes the solution i...

Hmm, it seems it has to do with FFmpeg. I have it on my C: drive under FFMPEG, and I've added it to the Path environment variable, but it still won't work
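
The traceback ends in whisper's `load_audio`, which launches the `ffmpeg` executable through `subprocess`; `[WinError 2]` from `CreateProcess` there means Windows could not find `ffmpeg` itself, not the mp3. A small pre-flight check along those lines, using only the standard library (the function name is hypothetical). A terminal or IDE opened before PATH was edited still sees the old PATH, so restart it after changing the variable, and make sure PATH points at the folder that actually contains `ffmpeg.exe` (often a `bin` subfolder).

```python
import shutil
from pathlib import Path


def preflight_transcription(audio_path: str) -> Path:
    """Raise a readable error before model.transcribe() if something is missing."""
    # whisper decodes audio by launching the `ffmpeg` executable, so it must be on PATH.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError(
            "ffmpeg not found on PATH; add the folder containing ffmpeg(.exe) "
            "and restart the terminal/IDE"
        )
    audio = Path(audio_path).resolve()  # resolved against the current working directory
    if not audio.is_file():
        raise FileNotFoundError(f"audio file not found: {audio}")
    return audio
```

Usage would be `audio = preflight_transcription("audio_test.mp3")` followed by `model.transcribe(str(audio))`.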

pure pond Dec 27, 2023, 10:02 PM

#

also I find it weird that they do this

    if tokenizer.pad_token is None:
        if tokenizer.unk_token is not None:
            tokenizer.pad_token = tokenizer.unk_token
        else:
            tokenizer.pad_token = tokenizer.eos_token

i.e. pad with unk; am I right in thinking that's weird?

#

ah I think Ive found the answer to my question from above, the diagram doesnt really explain but

        inst = f"Answer: {row['answer']}\n###\nIs this answer correct? "
        instructions.append(inst)

now I can accept the logits being useful

#

@heavy sigil when you read the error, what did you think of this line? result = model.transcribe("audio_test.mp3") in the context of FileNotFoundError ?

heavy sigil Dec 27, 2023, 10:06 PM

#

it's in the same folder as the .py file

#

and i'm running the py from command line on that folder

pure pond Dec 27, 2023, 10:06 PM

#

put this above the line that errors

import sys
print(sys.path)

#

and then google pythonpath

#

and also try changing the code to use the absolute path

heavy sigil Dec 27, 2023, 10:07 PM

#

Hmm, the error is happening from the library not my code

pure pond Dec 27, 2023, 10:09 PM

#

and sorry not pythonpath, current working directory

import os
print(os.getcwd())

pure pond Dec 27, 2023, 10:09 PM

#

heavy sigil Hmm, the error is happening from the library not my code

so?

heavy sigil Dec 27, 2023, 10:09 PM

#

I have my working directory set correctly

pure pond Dec 27, 2023, 10:09 PM

#

can you try opening it with open("audio_test.mp3") directly above the line that errors?

heavy sigil Dec 27, 2023, 10:15 PM

#

???

heavy sigil Dec 27, 2023, 10:15 PM

#

pure pond can you with open("audio_test.mp3") directly above the line that errors?

the error is not caused by my script

pure pond Dec 27, 2023, 10:17 PM

#

idk maybe its not windows compatible

#

try doing it on a google colab

heavy sigil Dec 27, 2023, 10:17 PM

#

Looks like it is?

#

i really can't

heavy sigil Dec 27, 2023, 10:18 PM

#

pure pond try doing it on a google colab

i need it done locally

pure pond Dec 27, 2023, 10:18 PM

#

well just with a test file to rule out os issues?

heavy sigil Dec 27, 2023, 10:18 PM

#

wdym?

pure pond Dec 27, 2023, 10:18 PM

#

why does it need to be local

#

let's take this to DMs, also as I'm conscious of spamming this chat

spark tartan Dec 27, 2023, 11:24 PM

#

odd meteor You didn't tell us which score is displayed on your Y-axis. RMSE or Coefficient ...

Sorry, the y-axis is the RMSE. My take is that it's overfitting badly for k < 20 and underfitting for k > 20

odd meteor Dec 27, 2023, 11:59 PM

#

spark tartan Sorry, the y-axis is the RMSE. My take is that it's overfitting badly for k<20 a...

Yes it's overfitting at k < 20. The learning curve didn't show what's happening beyond k > 100.
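
A small numpy sketch of why a kNN validation curve tends to look like this, on synthetic 1-D data (the data, split sizes and k values are made up): with k = 1 each training point is its own nearest neighbour, so training RMSE is exactly 0 while validation RMSE is higher (overfitting); as k grows, each prediction averages over more points, and both errors climb towards those of a very smooth model (underfitting).

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Mean target of the k nearest training points (Euclidean) for each query row."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic regression data (hypothetical): noisy sine wave.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

for k in (1, 5, 20, 50, 150):
    train_err = rmse(y_tr, knn_predict(X_tr, y_tr, X_tr, k))
    val_err = rmse(y_va, knn_predict(X_tr, y_tr, X_va, k))
    print(f"k={k:3d}  train RMSE={train_err:.3f}  val RMSE={val_err:.3f}")
```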

potent sky Dec 28, 2023, 1:36 AM

#

final kiln Yes, the idea is that the network will know to adapt existing representations to...

Hmm yes, it's definitely worth trying. I'm just not confident it'll give the best results of the methods available to us, because I don't think the embedding space geometry holds the common structure and patterns found across different human languages

potent sky Dec 28, 2023, 1:39 AM

#

final kiln Just read about this, super interesting, I might use it instead of directly alte...

LoRA (and more generally, PEFT) is pretty powerful!
I use it quite often now, instead of the domain specific adapters we were engineering before

rare osprey Dec 28, 2023, 2:49 AM

#

Can anybody help me study the following topics?

- Mamba architecture
- Liquid Time Constant Networks

#

I'm also looking into hyperdimensional computing and Q* (q star), so any help there would be nice

serene scaffold Dec 28, 2023, 2:59 AM

#

rare osprey Can anybody help me study these following topics - Mamba architecture - Liquid ...

I've never heard of these. This sounds pretty advanced.

rare osprey Dec 28, 2023, 3:00 AM

#

Yes, it is

magic dune Dec 28, 2023, 6:12 AM

#

Anyone have any good papers on NEAT?

hidden sapphire Dec 28, 2023, 7:49 AM

#

rare osprey Can anybody help me study these following topics - Mamba architecture - Liquid ...

That's cool and all, but I followed a tutorial and was able to train a NN for MNIST (I don't understand how it works at all, I just copied the dude's code)

pearl barn Dec 28, 2023, 10:17 AM

#

It shows me this error. What have I done wrong??

exotic smelt Dec 28, 2023, 10:47 AM

#

pearl barn

you need to use .format with the string not the function

#

and you haven't put any placeholder in the string either
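The screenshot isn't included here, so this is a guess at the mistake being described: calling `.format` on the result of `print(...)` (which is `None`) instead of on the string, and a string with no `{}` placeholder for the value to go into. `name` is a made-up variable for illustration.

```python
name = "Ada"

# Wrong: print(...) returns None, so chaining .format onto it raises
# AttributeError: 'NoneType' object has no attribute 'format'
# print("Hello").format(name)

# Right: call .format on the string itself, which needs a {} placeholder
message = "Hello, {}!".format(name)
print(message)  # Hello, Ada!

# An f-string produces the same text
print(f"Hello, {name}!")
```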

pearl barn Dec 28, 2023, 10:48 AM

#

Yeah I figured it out thank you

exotic smelt Dec 28, 2023, 10:48 AM

#

alr

pearl barn Dec 28, 2023, 10:54 AM

#

I want to ask: does it print both print statements if the condition is true? Can't I choose to print one if it's true and the other if it's false?
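Assuming the code has two separate `if` statements (or a `print` outside any `if`), both can run. An `if`/`else` picks exactly one branch. A minimal sketch; `check` and the threshold value are hypothetical:

```python
def check(value, threshold=10):
    # if/else runs exactly one branch: the first when the condition is True,
    # the else branch otherwise, so only one message is printed per call.
    if value > threshold:
        message = "above the threshold"
    else:
        message = "at or below the threshold"
    print(message)
    return message

check(15)  # prints: above the threshold
check(3)   # prints: at or below the threshold
```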

swift torrent Dec 28, 2023, 10:55 AM

#

Is it worth trying to move from data engineer to data scientist?

broken palm Dec 28, 2023, 11:16 AM

#

anyone know how to solve this problem?

long canopy Dec 28, 2023, 1:32 PM

#

any in-depth guide on customizing graph layouts in NetworkX?

serene scaffold Dec 28, 2023, 3:03 PM

#

broken palm anyone know how to solve this problem?

do you understand what the error message is telling you

broken palm Dec 28, 2023, 3:05 PM

#

serene scaffold do you understand what the error message is telling you

@serene scaffold thanks for your response bro, but it's already solved

agile owl Dec 28, 2023, 3:08 PM

#

@past meteor I thought of an idea based on an article I read about time vectors being used in language models. Suppose we have entirely real-valued parameters as codons of an evolutionary algorithm applied by an agent over a time series. For each period n in a periodization of the training period, use the EA to find the best-fitting model. Then, we can look for time-dependent vectors in the parameters and use them to extrapolate into the future.

past meteor Dec 28, 2023, 3:09 PM

#

agile owl <@260493929047130113> I thought of an idea based on an article I read about time...

I'm on vacation till like the 5th of Jan, tag me afterwards 😄

agile owl Dec 28, 2023, 3:10 PM

#

sorry, enjoy your vacation

signal holly Dec 28, 2023, 6:42 PM

#

Hi, I'm trying to learn ml using project-based learning, but I'm facing some troubles. When I get a project idea, like a spam email classifier, how do I go about building it without just searching up how to do it and blindly copy and pasting code? If I fall into this trap, I'm usually unable to provide my input and that becomes troublesome.

valid wind Dec 28, 2023, 6:55 PM

#

signal holly Hi, I'm trying to learn ml using project-based learning, but I'm facing some tro...

I think something good can be to just do one project where you're essentially copying and pasting code, but instead of just doing that, try to understand what each step does. Then, on a second project, try to do these steps on your own

#

Then you can ensure that you've learned it

pure pond Dec 28, 2023, 7:08 PM

#

rare osprey Im also looking into hyperdimensional computing and q star so any help there wou...

why q star? Because of some openai shitposts? anyway https://www.youtube.com/watch?v=nOBm4aYEYR4 https://www.youtube.com/watch?v=9dSkvxS2EB0

#

S4 as a precursor to Mamba, if you haven't come across it

signal holly Dec 28, 2023, 7:29 PM

#

valid wind I think something good can be to just do one project where you're essentially co...

oh I was actually doing something like that lol
just needed validation for it

pure pond Dec 28, 2023, 7:32 PM

#

i find banning myself from using copy + paste when in the copy code mode helps a bit, even if I just copy it out by hand

umbral charm Dec 28, 2023, 8:37 PM

#

This is a long shot, but i have to code this:
https://gyazo.com/4c62c2a417578ef6e51f617d1e633ad4


#

import numpy as np

x = []
y = []
z = []
prodl = 1
summ = 0
sumi = 0
jj = -1
n = 2
for i in range(n+1):
    for m in range(n+1):
        for l in range(n+1):
            if l != m and l != i and l != jj:
                prodl = prodl * (n/2 - l) / (jj - l)
                x.append(prodl)
        if m != i and m != jj:
            summ = summ +  1 / (jj-m)
            y.append(summ)
    if i != jj:
        sumi = sumi + 1 / (jj-i)
    z.append(sumi)

print(np.prod(x) * np.sum(y) * np.sum(z))

#

This is my code, however for the last product of all it just gives an array full of 0

#

this is because if n/2 - l = 0, then since it's a product of everything, everything after that is 0

#

so is the equation just wrong then?

#

in my case n = 2, so when l = 1 the whole product is 0

#

this makes 0 sense

#

the equation just breaks

#

even worse i get '-0.0'

#

-0?
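The image with the formula isn't available, so this is an assumption: the loop nesting above (a sum over i, inside it a sum over m, inside that a product over l) matches the second derivative of a Lagrange basis polynomial ℓ_j at x on the nodes 0..n. Read that way, each product belongs to one (i, m) pair, has to be restarted at 1.0 for every pair, and is then weighted and added. A zero factor (x equal to some node l) only zeroes the terms that contain it. Collecting every partial product into one list and computing np.prod(x) * np.sum(y) * np.sum(z) instead multiplies all of them together, which is why a single zero wiped out the whole result. The -0.0 is IEEE 754 signed zero (e.g. 0.0 divided by a negative number); it compares equal to 0.0 and isn't a separate bug. A sketch; `nested_sum_product` is a hypothetical name:

```python
def nested_sum_product(n, j, x):
    """Hypothetical helper for the assumed formula, over integer nodes 0..n:

        sum_{i != j} 1/(j - i) * sum_{m != i, j} 1/(j - m)
                                * prod_{l != i, j, m} (x - l) / (j - l)
    """
    total = 0.0
    for i in range(n + 1):
        if i == j:
            continue
        inner = 0.0
        for m in range(n + 1):
            if m in (i, j):
                continue
            prod = 1.0  # fresh product for this (i, m) pair
            for l in range(n + 1):
                if l not in (i, j, m):
                    prod *= (x - l) / (j - l)
            inner += prod / (j - m)  # weight and add, don't multiply together
        total += inner / (j - i)
    return total

# About 1.0, even though the (x - 1) factor is zero inside some terms
print(nested_sum_product(3, 0, 1.0))
```

Note that with jj = -1, as in the original code, no node in range(n + 1) ever equals jj, so the `!= jj` checks there never skip anything.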

long canopy Dec 28, 2023, 10:06 PM

#

for some reason, people have stopped using the Gephi Toolkit since around 2021 (last posts on their forums), but what the heck have people replaced it with?

zealous swan Dec 28, 2023, 10:36 PM

#

Hello all, I am new to Python and have some questions related to dimensionality reduction. I have a textual dataset about publications in the field of information visualization, and my goal is to apply suitable dimensionality reduction methods to visualize the large amount of data in 2 dimensions. So far, I have used CountVectorizer and TfidfVectorizer to generate the tf and tf-idf vectors for 2 of my columns. Currently I am trying out different methods like PCA, TruncatedSVD and LLE. I am still unsure what makes one dimensionality reduction method more suitable for the task than another. When I used LLE, I got a really sparse visualization of only 3 points, so I assumed that all the other points are overlapping and that's why I see only 3, but I am unsure. If anyone can help me understand more about dimensionality reduction, that would be great. I would appreciate any helpful resources. Thank you 😊
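A quick way to test the overlap guess is to count how many distinct 2-D positions the embedding actually contains, after rounding away floating-point noise. A sketch; `count_distinct_points` is a hypothetical helper, and `emb` is a made-up stand-in for the real LLE output (an (n_samples, 2) array such as scikit-learn's `fit_transform` returns with `n_components=2`):

```python
import numpy as np

def count_distinct_points(embedding, decimals=6):
    """Number of distinct rows in a 2-D embedding after rounding."""
    rounded = np.round(np.asarray(embedding, dtype=float), decimals)
    return np.unique(rounded, axis=0).shape[0]

# Toy stand-in: 6 documents that collapsed onto 3 locations
emb = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.5],
                [1.0, 0.5], [1.0, 0.5], [-0.3, 2.0]])
print(count_distinct_points(emb), "distinct positions for", len(emb), "points")
```

If the count is far below the number of documents, the points really are stacked on top of each other rather than missing.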

desert oar Dec 28, 2023, 11:34 PM

#

signal holly Hi, I'm trying to learn ml using project-based learning, but I'm facing some tro...

The best approach is to stop thinking about code. You want to know what techniques to use, where to get data, etc

#

So you can look up spam classification for example, then you'll read about naive bayes classification and spam datasets. Then you do EDA, scripting to process data, and finally model fitting, but all of that is driven by figuring out conceptually what you need to do. The code becomes only a means to an end

#

At which point figuring out the code itself becomes a lot easier, because you'll already know what you need to do

desert oar Dec 28, 2023, 11:40 PM

#

zealous swan Hello all, I am new to python and have some questions related to dimensionality ...

UMAP is kind of the go-to dimension reduction technique for high-dimensional data

zealous swan Dec 29, 2023, 1:15 AM

#

desert oar UMAP is kind of the go-to dimension reduction technique for high-dimensional dat...

Thanks for the reply, I am actually working on a lab assignment and need to use at least 2 methods, I have already tried PCA, TruncatedSVD, t-SNE and LLE. The visualizations looked really dense and there was so much overlap in the scatterplot so I started to experiment with the min_df parameter for the TfidfVectorizer to see how it affects the visualization or if it makes it better. I will give UMAP a try.

elder falcon Dec 29, 2023, 1:59 AM

#

Yo, is anyone here in digital or computational humanities? I'm trying to relearn some corpus analytics stuff b/c i want to create a model that compares the communication patterns, or the repetition of words and images, specific to political topics. On a macro scale, I'd compare the averages i get throughout my dataset to determine which comics are more political than others; on a micro scale, I'd see which words specific to each comic define whatever political topic i'm exploring in the text.

serene scaffold Dec 29, 2023, 2:37 AM

#

elder falcon Yo is anyone here in digital or computational humanities, I'm trying to relearn ...

For each comic: replace all the words in their sub-corpus with the "base" form of each word, which is called the lemma. "run" is the lemma of "running". "cat" is the lemma of "cats". this process is called lemmatization. This normalizes each word, so that you can count how frequent each one is.
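A minimal sketch of lemmatize-then-count in plain Python. The `LEMMAS` table is a hand-written stand-in for illustration only; in practice a lemmatizer such as spaCy (`token.lemma_`) or NLTK's `WordNetLemmatizer` would produce the lemmas. Counting per comic would mean calling this once on each comic's sub-corpus.

```python
from collections import Counter

# Hypothetical lemma table, for illustration only
LEMMAS = {"running": "run", "ran": "run", "runs": "run", "cats": "cat"}

def lemma_counts(tokens):
    """Lowercase each token, map it to its lemma (or keep it), and count."""
    return Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)

counts = lemma_counts("The cats ran while the cat was running".split())
print(counts.most_common(3))
```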

long canopy Dec 29, 2023, 7:01 AM

#

feels like test-driven development could be used to make LLMs write programs that are a bit more sophisticated, no?

#

or, has anyone worked on the idea of making packages, basic types, basic classes, etc., be considered tokens?

flint umbra Dec 29, 2023, 9:10 AM

#

I don't know if this is the right sub, but I have the following question:

I'm trying to discriminate between images of numbers, and I want to define an image property that describes the number of enclosures in the image. For example, the image of an 8 has 2 enclosures, whereas the images of 6 and 9 have exactly 1 enclosure.

See the images I supplied: the 7 has no enclosures, the 4 has 1 enclosure and the Q has 1 enclosure. How would I write a Python function that calculates the number of enclosures present in an image?

final kiln Dec 29, 2023, 11:56 AM

#

flint umbra I dont know if this is the right sub, but i have the following question: Im try...

You can use breadth-first search: iterate over all pixels of value 0, and for each unvisited one do a BFS that visits every neighbor of value 0. If the region found this way does not touch the edge of the image, increase the enclosure count by one. Always keep track of which pixels have already been visited so you can skip them.

#

You can also invert the image, perform connected components and exclude those that touch the edge of the image and then count what's left.
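The breadth-first-search approach described above, as a plain-Python sketch. It assumes an already binarized image given as a list of rows where 1 is ink and 0 is background; real scans would need thresholding first. Background pixels are connected 4-ways, so a diagonal ink stroke still seals a hole. `count_enclosures` and the tiny test glyphs are made up for illustration. The connected-components variant would do the same thing with a labeling function (for example `scipy.ndimage.label` on the inverted image) instead of the hand-written BFS.

```python
from collections import deque

def count_enclosures(img):
    """Count background (0) regions that never touch the image border."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    enclosures = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] != 0 or seen[r][c]:
                continue
            # BFS over this background region, noting any border contact
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                enclosures += 1
    return enclosures

EIGHT = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(count_enclosures(EIGHT))  # 2
```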

hardy depot Dec 29, 2023, 1:54 PM

#

heyuhm

#

i started getting this new error from autoscraper: it's not giving an output anymore, just an empty list after scraping

desert oar Dec 29, 2023, 1:58 PM

#

long canopy or, has anyone worked on the idea of making packages, basic types, basic classes...

you'd need some kind of vector encoding for each, which might be tricky.

i suppose you could use one transformer-based model to obtain "function vectors" and then another to work on high-level sequences thereof

azure compass Dec 29, 2023, 3:44 PM

#

This LSTM shows some fluctuation in the validation MSE loss, but it's getting to the point where I'm happy with the loss. Should I be concerned?

#

Also, the validation loss is lower than the training loss, but from what I understand that's because of my dropout rate

devout creek Dec 29, 2023, 4:09 PM

#

Can someone help me edit a Google Colab project, Whisper Youtube, to use insanely fast whisper instead of just whisper?

topaz turtle Dec 29, 2023, 4:16 PM

#

does anyone here know tensor algebra? 👀

#

i'm trying to understand how to express arbitrary tensor contraction in terms of blas routines

#

for example, a tensor product between (2,3,4) and (3,4,5) can be expressed as a matrix multiplication by reshaping the two tensors into matrices (2, 12) and (12, 5), then multiplying these two matrices

#

but i can't quite figure out how to do it with a contraction like (2,3,4) and (5, 4, 6), where the contracted dimension is in the middle of the tensor...

#

any resources wrt that would also be appreciated
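The reshape trick in the first example works because the contracted axes (sizes 3 and 4) are the trailing axes of the first tensor and the leading axes of the second, in the same order, so row-major flattening merges them into one combined index of size 12 on both sides. A NumPy sketch with random data, checked against `np.tensordot` (`axes=2` means "last two axes of A with first two axes of B"):

```python
import numpy as np

A = np.random.rand(2, 3, 4)
B = np.random.rand(3, 4, 5)

# Contracted axes are already adjacent and in matching order,
# so a plain reshape turns the contraction into one matmul.
via_matmul = A.reshape(2, 12) @ B.reshape(12, 5)   # shape (2, 5)
via_tensordot = np.tensordot(A, B, axes=2)

print(np.allclose(via_matmul, via_tensordot))      # True
```

The middle-axis case breaks exactly this precondition, which is why it needs an axis permutation before the reshape.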

proud wing Dec 29, 2023, 4:48 PM

#

topaz turtle any resources wrt that would also be appreciated

You are asking how the contra and covariant transformation properties are applied?

topaz turtle Dec 29, 2023, 4:54 PM

#

proud wing You are asking how the contra and covariant transformation properties are applie...

not exactly?

i'm trying to come up with a general algorithm to express tensor contraction in terms of BLAS routines (that is, only using matrix multiplication, vector products, tensor reshaping, transposing and such)

proud wing Dec 29, 2023, 4:59 PM

#

hmm...no zero shot solutions for you but i can point you in a direction that might help

#

invariant inner products between vectors and covectors under transformation is how i'd look at mapping tensor ops -> matrix ops

#

and if you want a general algorithm you'd want something that is valid regardless of the coordinate system

#

and for a general algorithm it might be interesting to explore whether you could derive a calculus-style function that sums over the indices, akin to an integral (sort of 🙂)

#

contraction is very similar to integration over a set of indices (or dimensions)

#

sry i couldnt be more help ☕

topaz turtle Dec 29, 2023, 5:09 PM

#

proud wing and for a general algorithm it might be interesting to explore if you could deri...

interesting 👀

proud wing Dec 29, 2023, 5:09 PM

#

check out Einstein Summation

topaz turtle Dec 29, 2023, 5:09 PM

#

what's kinda frustrating is that this algorithm already exists and is well known, bc this is how libraries like numpy and pytorch implement this

proud wing Dec 29, 2023, 5:10 PM

#

yea

#

numpy uses einstein sums to optimize contractions

topaz turtle Dec 29, 2023, 5:10 PM

#

but how tho 😭

proud wing Dec 29, 2023, 5:10 PM

#

oh you want to know how it works?

topaz turtle Dec 29, 2023, 5:11 PM

#

i mean, i understand how einsum works, i'm implementing my own tensor contraction algorithm that way rn, but i don't understand how that would help express Einstein summations in terms of blas routines

proud wing Dec 29, 2023, 5:12 PM

#

import numpy as np

A = np.random.rand(2, 3, 4)
B = np.random.rand(5, 4, 6)

B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)

something like this?

proud wing Dec 29, 2023, 5:13 PM

#

topaz turtle i mean, i understand how einsum works, i'm implementing my own tensor contractio...

this can be directly correlated to einsum

#

closely

topaz turtle Dec 29, 2023, 5:14 PM

#

does this give the same result as

numpy.tensordot(A, B, axes=[[2],[1]])

?

proud wing Dec 29, 2023, 5:15 PM

#

well yes

topaz turtle Dec 29, 2023, 5:15 PM

#

i don't know numpy well, so i'm kinda confused by the arguments of .reshape(), i thought the arguments were just the desired shape?

proud wing Dec 29, 2023, 5:16 PM

#

in this case (before your example) we wanted to reshare the transposed tensor to align the 3rd dimension of A with the second dimension of B for contraction

#

reshape sorry

#

so the manual approach i shared gives more control

#

because your version doesn't explicitly specify the reshaping and permutation of dimensions. tensordot performs a contraction over the specified axes, which are the axes along which you want to contract the tensors: [[2],[1]]

so 3rd dimension of A (index 2) and second dimension of B (index 1)

#

which is exactly what the manual approach i shared does

#

if i had to guess (and i've never looked honestly) numpy.tensordot is probably more efficient 🙂
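On the `.reshape()` question: the arguments are the desired shape, and at most one of them may be -1, meaning "whatever size makes the total element count work out". Reshape keeps the elements in row-major (C) order, so it can merge or split adjacent axes but cannot change which axis comes first; that is what `np.transpose` / `np.moveaxis` are for. A small check:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# -1 asks NumPy to infer that dimension from the total size (24 here)
print(a.reshape(-1, 4).shape)  # (6, 4)
print(a.reshape(4, -1).shape)  # (4, 6)

# reshape re-reads elements in row-major order; it never moves an axis.
# The first row of reshape(4, -1) is simply the first six elements:
print(a.reshape(4, -1)[0])     # [0 1 2 3 4 5]
```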

topaz turtle Dec 29, 2023, 5:19 PM

#

i see

proud wing Dec 29, 2023, 5:19 PM

#

but this does get me thinking

topaz turtle Dec 29, 2023, 5:20 PM

#

btw, the way numpy does it, reshaping things doesn't acc affect the data in any way, it's simply a reordering of dimensions and only has an effect on how you read/write to it, right?
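Mostly yes: for a contiguous array, `reshape` returns a view, i.e. a new shape/strides header over the same buffer, so no data moves. (Reshape doesn't reorder dimensions, though; that's `transpose`, which is also a view.) One caveat: if the requested shape can't be expressed with strides over the existing buffer, for example after a transpose, NumPy quietly returns a copy instead. A quick check:

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)            # `a` is contiguous, so this is a view
print(np.shares_memory(a, b))  # True: no data was copied

b[0, 0] = 100                  # writing through the view...
print(a[0])                    # ...changes the original buffer: 100
```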

proud wing Dec 29, 2023, 5:21 PM

#

ok... so hmm

#

if you wanted something more general you'd want to be able to handle any valid combination of contraction axes specified as pairs

#

and you'd want to magically (i say that word instead of automation) infer the shape and reshape and permute without manual intervention

#

let me try something real quick

topaz turtle Dec 29, 2023, 5:22 PM

#

i already have an algorithm that does this, but it's pretty wonky and as a result very slow

#

which is why i'm trying to express it in terms of blas

proud wing Dec 29, 2023, 5:24 PM

#

sec let me open up jupyter

#

i want to benchmark a general case

#

i'll use your inputs if that works

topaz turtle Dec 29, 2023, 5:25 PM

#

btw, i checked your algorithm, and it gives a different result from just doing tensordot (unless i messed smth up)

import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

B_permuted = np.transpose(B, (0, 2, 1))
result_tensor = np.matmul(A.reshape(-1, 4), B_permuted.reshape(4, -1)).reshape(2, 3, 5, 6)

print(np.tensordot(A, B, axes=[[2], [1]]))
print(result_tensor)

proud wing Dec 29, 2023, 5:26 PM

#

that's yours, right?

topaz turtle Dec 29, 2023, 5:26 PM

#

that's your code, i just changed A and B to have concrete values

proud wing Dec 29, 2023, 5:26 PM

#

oh right

#

but yours at the bottom

topaz turtle Dec 29, 2023, 5:26 PM

#

to easily compare results between runs

topaz turtle Dec 29, 2023, 5:26 PM

#

proud wing but yours at the bottom

yes

#

numpy must be doing smth more than just reshaping the tensors into desired matrix shapes

proud wing Dec 29, 2023, 5:28 PM

#

also do you know about TensorLy?

#

if my goal was to make something blazing fast btw

topaz turtle Dec 29, 2023, 5:28 PM

#

bc the following two examples of tensor contraction

(2,3,4)@(5,4,6) and (2,3,4)@(5,6,4) would be mapped into the same matrix product, but technically should have different results...
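Why the earlier snippet disagreed with tensordot: `np.transpose(B, (0, 2, 1))` gives shape (5, 6, 4), i.e. the contracted axis ends up last, and `reshape(4, -1)` then just re-reads that array in row-major order, so its rows are not indexed by the contracted axis. B's contracted axis has to be moved to the front before flattening. That same axis move is what tells (5, 4, 6) and (5, 6, 4) apart: a bare reshape would treat them identically, the transpose does not. A sketch; `contract_last_with` is a hypothetical helper:

```python
import numpy as np

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)   # contracted axis (size 4) in the middle
C = np.arange(5 * 6 * 4).reshape(5, 6, 4)   # contracted axis (size 4) last

def contract_last_with(A, B, axis):
    """Contract A's last axis with axis `axis` of B using one matmul."""
    B_front = np.moveaxis(B, axis, 0)        # contracted axis first
    rest = B_front.shape[1:]
    out = A.reshape(-1, A.shape[-1]) @ B_front.reshape(A.shape[-1], -1)
    return out.reshape(A.shape[:-1] + rest)

r1 = contract_last_with(A, B, 1)
r2 = contract_last_with(A, C, 2)
print(np.array_equal(r1, np.tensordot(A, B, axes=[[2], [1]])))  # True
print(np.array_equal(r2, np.tensordot(A, C, axes=[[2], [2]])))  # True

# The earlier version, for comparison: contracted axis moved to the end
old = (A.reshape(-1, 4) @ np.transpose(B, (0, 2, 1)).reshape(4, -1)).reshape(2, 3, 5, 6)
print(np.array_equal(old, r1))  # False
```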

proud wing Dec 29, 2023, 5:28 PM

#

I'd write the tensor operations in C/C++

#

and call the blas routines

topaz turtle Dec 29, 2023, 5:28 PM

#

proud wing also do you know about TensorLy?

no 👀

topaz turtle Dec 29, 2023, 5:29 PM

#

proud wing I'd write the tensor operations in C/C++

my issue is that i can't yet express arbitrary tensor contraction product in terms of blas, but yes

proud wing Dec 29, 2023, 5:29 PM

#

TensorLy is a pure tensor library w/ blas based contractions

topaz turtle Dec 29, 2023, 5:29 PM

#

i see

#

the goal of my project was to understand how to acc implement all those ops, so this kinda loses the purpose 😦

proud wing Dec 29, 2023, 5:30 PM

#

ah

#

mine will do that

topaz turtle Dec 29, 2023, 5:30 PM

#

proud wing mine will do that

wdym

proud wing Dec 29, 2023, 5:30 PM

#

not the one i showed you only

serene scaffold Dec 29, 2023, 5:30 PM

#

huh, you two are new
welcome to our wonderful data science and AI chat

proud wing Dec 29, 2023, 5:31 PM

#

Hi @serene scaffold thx so much

#

hope you're having a happy holiday 🙂

topaz turtle Dec 29, 2023, 5:32 PM

#

serene scaffold huh, you two are new welcome to our wonderful data science and AI chat

ty 🙂

#

this question does acc tie into ml, bc efficient tensor contraction lies at the core of tensor autodiff, so this is on topic lol

serene scaffold Dec 29, 2023, 5:33 PM

#

I didn't say it wasn't

topaz turtle Dec 29, 2023, 5:33 PM

#

fair 🙂

#

do you by chance have any suggestions wrt the question? 👀

proud wing Dec 29, 2023, 5:34 PM

#

also i have another fun idea to test

topaz turtle Dec 29, 2023, 5:34 PM

#

👀

proud wing Dec 29, 2023, 5:34 PM

#

i'm assuming my func will be slower than numpy

#

but i want to experiment with numba

serene scaffold Dec 29, 2023, 5:34 PM

#

topaz turtle do you by chance have any suggestions wrt the question? 👀

uhh kinda busy tbh

proud wing Dec 29, 2023, 5:34 PM

#

since it can compile it down to machine code

late salmon Dec 29, 2023, 5:34 PM

#

Hi can anyone help me

topaz turtle Dec 29, 2023, 5:34 PM

#

numpy uses hyperoptimised blas routines, so even being 2x-3x slower than numpy is a success in my book tbh

serene scaffold Dec 29, 2023, 5:52 PM

#

late salmon Hi can anyone help me

you have to say exactly what it is that you need help with before anyone can even try

proud wing Dec 29, 2023, 5:59 PM

#

@topaz turtle im debugging some code.. my contracted tensor1 ended up with slightly different dimensions than tensor2 😄

#

in my very general approach that i'm experimenting

#

tensor math not for the faint of heart..

#

Ok wow...

#

i really thought this idea was sound

#

ahh

#

seems like maybe its because i'm using numpy's matmul

#

on tensors that don't have compatible inner dimensions.

#

looks like long form math is the only way

#

Hey @topaz turtle you know how BLAS is a 4-letter acronym?

#

I came up with one for this contraction math long-form called TRIS 🙂

#

Transform, reshape, iterate, and sum

proud wing Dec 29, 2023, 7:01 PM

#

@topaz turtle Ok sorry it took a lot longer than I thought

#

coding a completely general blas transform was harder than I thought

#

@topaz turtle

import numpy as np
import time
###### falcon wings axis permutations to simulate blas w/ tensor contractions
###### v1 with numpy arrays 
# 100,000 Iterations 

A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)
contraction_axes = [[1, 2], [0, 1]]  
# Tensor Contraction BLAS experiments
# TRIS - transform reshape iterate sum :)

def long_form_third_grade_math_tensor_contraction(tensor1, tensor2, contraction_axes):

    # we will use numpy for its efficient arrays but will implement the tensor math
    tensor1 = np.asarray(tensor1)
    tensor2 = np.asarray(tensor2)

    # T (Tris)
    permuted_axes1 = [axis for axis in range(tensor1.ndim) if axis not in contraction_axes[0]] + contraction_axes[0]
    permuted_axes2 = [axis for axis in range(tensor2.ndim) if axis not in contraction_axes[1]] + contraction_axes[1]
    
    tensor1 = np.transpose(tensor1, permuted_axes1)
    tensor2 = np.transpose(tensor2, permuted_axes2)

    # R (tRis)
    new_shape1 = tensor1.shape[:-len(contraction_axes[0])] + (-1,)  
    reshaped_tensor1 = tensor1.reshape(new_shape1)

    new_shape2 = (-1,) + tensor2.shape[len(contraction_axes[1]):]
    reshaped_tensor2 = tensor2.reshape(new_shape2)

    result_shape = tensor1.shape[:-len(contraction_axes[0])] + tensor2.shape[len(contraction_axes[1]):]
    result_tensor = np.zeros(result_shape)

    # trIs (Iterate)
    for i in np.ndindex(result_tensor.shape[:-1]):
        for j in range(result_tensor.shape[-1]):
            sum_over_axes = 0
            for k in range(reshaped_tensor1.shape[-1]):
            # triS (Sum)
                sum_over_axes += reshaped_tensor1[i + (k,)] * reshaped_tensor2[(k,) + (j,)]
            result_tensor[i + (j,)] = sum_over_axes

    return result_tensor

start_time = time.time()

for letsgooo in range(100000):
    result = long_form_third_grade_math_tensor_contraction(A, B, contraction_axes)
general_hundy = time.time() - start_time
print(f"falcon slowbie: {general_hundy:.4f} seconds")

falcon slowbie: 11.3511 seconds

#

And now for numpy's

#

import numpy as np
import time
# Numpulous einstanamous 
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(5 * 4 * 6).reshape(5, 4, 6)

start_time = time.time()
for _ in range(100000):
    result_tensor_einsum = np.einsum('ijk,lmn->iln', A, B)
einsum_time = time.time() - start_time

einsum_time_str = f"einsum smasher: {einsum_time:.4f} seconds"
print(einsum_time_str)

einsum smasher: 4.3604 seconds
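Two things in the general function above are worth flagging. After the transpose, tensor2's contracted axes sit at the end, but `new_shape2 = (-1,) + shape[n:]` treats them as if they were at the front. Also, `contraction_axes = [[1, 2], [0, 1]]` pairs axes of sizes (3, 4) with (5, 4), which can't be contracted, so the loop runs without error but reads unrelated elements. Separately, `'ijk,lmn->iln'` shares no index between A and B, so einsum sums each tensor on its own rather than contracting them, and the two timings measure different operations. Below is a sketch of the usual transpose-reshape-matmul scheme (sometimes called TTGT), checked against `np.tensordot` and `np.einsum`; `contract` is a hypothetical name, not a NumPy function:

```python
import numpy as np

def contract(a, b, axes_a, axes_b):
    """Contract axes_a of `a` with axes_b of `b` (paired in order) via one matmul.

    Free axes of `a` go first and its contracted axes last; contracted axes of
    `b` go first, in the matching order, and its free axes last. Both operands
    then flatten to 2-D and a single matrix product does all the summing.
    """
    axes_a, axes_b = list(axes_a), list(axes_b)
    if [a.shape[i] for i in axes_a] != [b.shape[i] for i in axes_b]:
        raise ValueError("contracted axes must have matching sizes")
    free_a = [i for i in range(a.ndim) if i not in axes_a]
    free_b = [i for i in range(b.ndim) if i not in axes_b]
    k = int(np.prod([a.shape[i] for i in axes_a]))  # combined contracted size
    a2 = np.transpose(a, free_a + axes_a).reshape(-1, k)
    b2 = np.transpose(b, axes_b + free_b).reshape(k, -1)
    out_shape = tuple(a.shape[i] for i in free_a) + tuple(b.shape[i] for i in free_b)
    return (a2 @ b2).reshape(out_shape)

A = np.random.rand(2, 3, 4)
B = np.random.rand(5, 4, 6)
r = contract(A, B, [2], [1])
print(r.shape)                                               # (2, 3, 5, 6)
print(np.allclose(r, np.tensordot(A, B, axes=[[2], [1]])))  # True
print(np.allclose(r, np.einsum("ijk,lkn->ijln", A, B)))      # True
```

The `a2 @ b2` line is the only place any arithmetic happens, so in C this maps to one GEMM call plus whatever copies the two transposes require.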

pure pond Dec 29, 2023, 7:20 PM

#

dont forget to SMASH that einsum

proud wing Dec 29, 2023, 7:21 PM

#

ahh

#

always gotta smash

candid spruce Dec 29, 2023, 7:21 PM

#

Would anyone be interested in doing a deep learning project with me? It will be a self-dataset-building DL AI. DM me if interested 😁

primal agate Dec 29, 2023, 7:21 PM

#

Any ideas how to start with machine learning? Maybe learn pandas, Flask, NumPy first?

pure pond Dec 29, 2023, 7:21 PM

#

https://blog.paperspace.com/a-practical-guide-to-deep-learning-in-6-months/ is good imo

candid spruce Dec 29, 2023, 7:22 PM

#

primal agate Any ideas how to start with machine learning? Maybe learn pandas, flask,numbpy f...

pandas is good for a simple up to complex dataset

candid spruce Dec 29, 2023, 7:22 PM

#

primal agate Any ideas how to start with machine learning? Maybe learn pandas, flask,numbpy f...

i would suggest tensorflow in my opinion

primal agate Dec 29, 2023, 7:22 PM

#

thanks for opinion

#

and advice

proud wing Dec 29, 2023, 7:22 PM

#

primal agate thanks for opinion

https://roadmap.sh/ai-data-scientist
just check off all the boxes on this site and you'll be good to go


pure pond Dec 29, 2023, 7:22 PM

#

pytorch imo

candid spruce Dec 29, 2023, 7:23 PM

#

pure pond pytorch imo

that one is a little more complex to start with; go simpler

final kiln Dec 29, 2023, 7:23 PM

#

I've finished setting up the workflow and all the aws stuff, Im now waiting for amazon to accept my quota increase request so I can connect everything up

#

that is github actions

#

the actions workflow is gonna babysit the gpu spot instances, and the spot instances are gonna train GPT and send the info to MLflow, which I deployed on a free tier aws instance

pure pond Dec 29, 2023, 7:25 PM

#

is this your whisper thing?

final kiln Dec 29, 2023, 7:26 PM

#

yes, I'm setting up the infra for doing experiments, there's a couple things I wanna try out regarding a possible modification of the self attention mechanism

pure pond Dec 29, 2023, 7:28 PM

#

nice gl

final kiln Dec 29, 2023, 7:30 PM

#

ty

proud wing Dec 29, 2023, 7:39 PM

#

@topaz turtle was that helpful...

dense yarrow Dec 29, 2023, 8:41 PM

#

I want to create a project to add to my portfolio since it's empty right now.
I'm new to data science and know very basic python. Any suggestions on what kind of projects I could do? Or if you know of any examples, that'd be helpful too!

#

Also, should I do it in VS Code or Google Colab? Those are the two platforms I'm familiar with

desert oar Dec 29, 2023, 9:01 PM

#

final kiln the actions workflow is gonna babysit the gpu spot instances, and the spot insta...

how are you handling interruptions to the spot instance? saving the model at every epoch?

final kiln Dec 29, 2023, 9:02 PM

#

desert oar how are you handling interruptions to the spot instance? saving the model at eve...

I found this thing: https://github.com/spotty-cloud/spotty


#

It automates all the details

desert oar Dec 29, 2023, 9:02 PM

#

dense yarrow I want to create a project to add to my portfolio since it's empty right now. I...

ideally, pick something you already have an interest in (gaming, cooking, sports, whatever) and look for interesting questions to ask, and data that you can use to answer those questions. once you have a question to answer and some data to work with, you can go in pretty much any direction from there

#

it doesn't have to be super clever or detailed. but all successful real-world projects start with a question to answer and some data to work with

final kiln Dec 29, 2023, 9:03 PM

#

final kiln I found this thing: https://github.com/spotty-cloud/spotty

Upon setting it up, it uploads code to a bucket it creates automatically. I reckon it will also cache docker checkpoints or something like that

desert oar Dec 29, 2023, 9:04 PM

#

if you're stuck for ideas, check out kaggle. for example the Titanic dataset and the associated survival classification task is an excellent beginner project

final kiln Dec 29, 2023, 9:04 PM

#

But haven't yet been able to use it because AWS defaults the quota for GPU instances to 0, so I'll have to wait 24h til they increase it

desert oar Dec 29, 2023, 9:04 PM

#

final kiln I found this thing: https://github.com/spotty-cloud/spotty

nice, let me check this out. might need it for work some day

final kiln Dec 29, 2023, 9:04 PM

#

desert oar nice, let me check this out. might need it for work some day

Def worth, spot instances are a loooot cheaper

desert oar Dec 29, 2023, 9:05 PM

#

final kiln Def worth, spot instances are a loooot cheaper

fortunately until now i've been able to do it all right on my company macbook

#

pretty powerful machine, and even does gpu albeit much slower than an actual gpu

final kiln Dec 29, 2023, 9:07 PM

#

That's cool, I've been relying on kaggle and colab for free GPU. But I reckon that once I start doing large text datasets + larger gpts this will be important.

topaz turtle Dec 29, 2023, 9:23 PM

#

proud wing <@879726954356572160> was that helpful...

sorry, i got quite drunk 😭😭😭😭😭😭😭😭

#

i’ll check it tmr 🥰🥰🥰🥰

lapis sequoia Dec 29, 2023, 10:12 PM

#

topaz turtle sorry, i got quite drunk 😭😭😭😭😭😭😭😭

wtf

topaz turtle Dec 29, 2023, 10:15 PM

#

lapis sequoia wtf

it’s a thing that ppl do? 😭😭

lapis sequoia Dec 29, 2023, 10:18 PM

#

bro im a kid

#

i can relate

rugged zinc Dec 30, 2023, 12:26 AM

#

hey, i was wondering if there's a resources page for this topic

#

i mean data sc/analyis

desert oar Dec 30, 2023, 12:49 AM

#

rugged zinc hey, i was wondering if there's a resources page for this topic

check the pinned messages

jolly horizon Dec 30, 2023, 1:14 AM

#

Has anyone used matplotlib for 3D plots? I have two planes in 3D and the dot product of their normal vectors is literally 0, yet this legendary library plots them as if they were legit parallel. I am so confused..

jade bloom Dec 30, 2023, 1:33 AM

#

Hi, anyone have experience with deep q reinforcement learning? I have a few questions

serene scaffold Dec 30, 2023, 1:33 AM

#

jade bloom Hi, anyone have experience with deep q reinforcement learning? I have a few ques...

JustAsk

#

("asking to ask" creates extra steps that decrease the chances your question will ever get answered.)

jade bloom Dec 30, 2023, 1:34 AM

#

thanks for the advice!

#

How does a Deep Q Network for reinforcement learning learn at all?

To my understanding, you have a minibatch of random experiences, where each experience is in the format (current_state, action, reward, next_state). The way a DQN works is that you pass in the current game state (current_state) and get an estimate of the q-values for each action. Then you choose the q-value of the action you took and store it in a variable; let's call it current_q_for_action_taken for now

then, with a separate target network, we pass in next_state and get q_values for each action. We can then calculate target_q_for_action_taken as follows: target_q_for_action_taken = reward+max(q's from target network)

Then, we can calculate the MSE loss for updating the other model(not target model) as follows: (target_q_for_action_taken-current_q_for_action_taken)^2.

Then, using gradient descent and backpropagation, we can update that network's weights and biases (I think it updates biases)

then, every n steps we transfer the weights from this network over to the target network

my question is: how does all this allow this network to estimate an optimal q-function? It seems to me that the network will just flail around randomly adjusting weights and biases but never learning the optimal q-function to accurately map current game states to accurate q-values
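The update described above is the core of DQN, with two details the standard formulation adds: a discount factor gamma on the bootstrapped term, and no bootstrapped term at all when next_state is terminal. As for why it learns: the target r + gamma * max Q(s', ·) already contains one real reward, so each update pulls Q(s, a) toward a slightly better-informed estimate, and reward information propagates backwards through the state space over many updates (a Bellman fixed-point iteration). In the tabular case this is known to converge under standard conditions; with neural networks there's no such guarantee, which is why the target network and replay buffer are there to stabilize it. A NumPy sketch of the target and loss computation only (the gradient step is left to whatever framework you use); the function names and the tiny batch are made up:

```python
import numpy as np

def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), dropping the bootstrap term
    for transitions where the episode ended (done == 1)."""
    best_next = next_q_target.max(axis=1)   # target network's best Q at s'
    return rewards + gamma * (1.0 - dones) * best_next

def dqn_loss(q_online, actions, targets):
    """Mean squared TD error on the online network's Q for the actions taken."""
    q_taken = q_online[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)

# Made-up batch: 2 transitions, 3 actions; the second one ended the episode
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q = np.array([[0.5, 2.0, 1.0], [3.0, 0.0, 0.0]])   # from the target network
q_online = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])  # from the online network
actions = np.array([1, 0])

targets = td_targets(rewards, next_q, dones, gamma=0.9)
print(targets)  # [2.8 0. ]
print(dqn_loss(q_online, actions, targets))
```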

agile owl Dec 30, 2023, 2:08 AM

#

What does it mean if when you split your training data into periods and the vector of the fit parameters over time have significant autocorrelation/time series dynamics. Does it mean you should fit a per-period model on some arbitrary periodization of your training data or should you try to find the latent variable that explains the time series dynamics and make one model for all periods?

#

for example, the optimal value of one parameter seems to oscillate back and forth quite predictably from one period to the next the way I'm splitting up the data

upbeat linden Dec 30, 2023, 4:57 AM

#

I’ve done a lot of research and the answer has pretty much been a toss up. PyTorch vs Tensorflow what is the best to learn for a beginner in the Machine Learning field (fully fluent in Python already)

I am looking to develop a model that can recognize a pattern in a database of a food journal. My dad, who had cancer, still has stomach issues, and he has 2 years' worth of food entries and bowel movements. I am trying to develop a model that can tell him which foods he can and cannot eat based on the log.

It seems that I should be using a classification model, but I am not sure what the best way/how to approach this general solution

serene scaffold Dec 30, 2023, 5:06 AM

#

I don't think most people care that much. but no one really talks about perceptrons except when discussing the history of AI.

iron basalt Dec 30, 2023, 5:26 AM

#

It's contradicting itself. You can bring this up as an edit to the Wikipedia page.

final kiln Dec 30, 2023, 8:15 AM

#

Isn't the Heaviside step function non-linear?

#

a linear function is a polynomial of degree one or less, including the zero polynomial

which is not the case for the step function; it doesn't fit the form ax + b
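The "fits ax + b" criterion can be checked numerically. Any f(x) = ax + b satisfies f((x + y)/2) = (f(x) + f(y))/2, so a single failing pair rules it out. In this small sketch (helper names are illustrative), both the step function and ReLU fail at x = -1, y = 1, while an actual line passes.

```python
def heaviside(x):
    """Step function: 1 for x >= 0, else 0 (the convention at x = 0 varies)."""
    return 1.0 if x >= 0 else 0.0

def relu(x):
    return max(0.0, x)

def midpoint_affine(f, x, y):
    """Every f(x) = a*x + b satisfies this for all x, y; one failure rules it out.
    Exact == is fine for these small values; use math.isclose for general floats."""
    return f((x + y) / 2) == (f(x) + f(y)) / 2

print(midpoint_affine(heaviside, -1.0, 1.0))            # False: 1.0 vs 0.5
print(midpoint_affine(relu, -1.0, 1.0))                 # False: 0.0 vs 0.5
print(midpoint_affine(lambda t: 2 * t + 3, -1.0, 1.0))  # True: 3.0 vs 3.0
```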

final kiln Dec 30, 2023, 8:32 AM

#

https://en.wikipedia.org/wiki/Piecewise_linear_function - this would be the correct terminology I think

Piecewise linear function

In mathematics and statistics, a piecewise linear, PL or segmented function is a real-valued function of a real variable, whose graph is composed of straight-line segments.

topaz turtle Dec 30, 2023, 9:04 AM

#

proud wing <@879726954356572160> ```python import numpy as np import time ###### falcon wi...

interestingly, your algorithm is as fast on my laptop as numpy's 👀

proud wing Dec 30, 2023, 9:04 AM

#

Yea it's pretty good

#

I'm going to implement it in C later

#

as I actually have some ideas for it

topaz turtle Dec 30, 2023, 9:04 AM

#

ooh interesting

proud wing Dec 30, 2023, 9:04 AM

#

for transformer math

topaz turtle Dec 30, 2023, 9:04 AM

#

acc the reason i asked in the first place is bc i need it for a C library i'm writing lol

proud wing Dec 30, 2023, 9:04 AM

#

at the moment I'm finishing up some code that instantly analyzes a model's layers

#

open-source ones :D

topaz turtle Dec 30, 2023, 9:05 AM

#

👀

#

feel free to ping me when/if you get to doing that 🙂

topaz turtle Dec 30, 2023, 9:08 AM

#

proud wing Yea it's pretty good

btw, there is a potential problem w/ testing on such small tensors

#

when i tested my own algorithm, i learned that for small tensors there's only a very small performance disparity, but once i tested on tensors with ~a million elements, it turned out the real time disparity is actually more like 100x 😭
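To avoid being misled by tiny inputs, you can time each candidate across several sizes and keep the best of a few repeats. The sketch below does this with `timeit` and NumPy. `np.matmul` and an `np.einsum` call are stand-ins for the two implementations being compared, and the sizes are arbitrary.

```python
import timeit

import numpy as np

def benchmark(fns, sizes, repeat=3, number=1, seed=0):
    """Best-of-`repeat` seconds per call for each fn(a, b) on n x n float64 inputs.

    fns: dict mapping a label to a callable taking two arrays.
    Returns {n: {label: seconds}}.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        a = rng.standard_normal((n, n))
        b = rng.standard_normal((n, n))
        results[n] = {
            label: min(timeit.repeat(lambda: fn(a, b), repeat=repeat, number=number)) / number
            for label, fn in fns.items()
        }
    return results

# Stand-in implementations; raise n toward 1000 (~a million elements) if time allows.
candidates = {
    "matmul": np.matmul,
    "einsum": lambda a, b: np.einsum("ij,jk->ik", a, b),
}
for n, times in benchmark(candidates, sizes=[8, 64, 512]).items():
    print(n, {label: f"{t:.2e} s" for label, t in times.items()})
```

Taking the minimum rather than the mean filters out noise from other processes, which matters most for the small sizes.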

proud wing Dec 30, 2023, 9:22 AM

#

just an example

topaz turtle Dec 30, 2023, 10:01 AM

#

does anyone know if libraries like numpy actually move data around when doing a transpose (to preserve linearity of data access)?
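In NumPy, `.T` / `np.transpose` doesn't move any data. It returns a view over the same buffer with the shape and strides permuted, so the access pattern simply becomes strided. Data is rearranged only when a copy is made, either explicitly (e.g. `np.ascontiguousarray`) or when an operation such as `reshape` can't be expressed as a view. A small check:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # C-contiguous (row-major)
t = a.T                                           # view: strides swapped, no copy

print(a.strides, t.strides)                              # (32, 8) (8, 32)
print(np.shares_memory(a, t))                            # True
print(t.flags["C_CONTIGUOUS"], t.flags["F_CONTIGUOUS"])  # False True

c = np.ascontiguousarray(t)                              # explicit copy into row-major order
print(np.shares_memory(a, c), c.flags["C_CONTIGUOUS"])   # False True
```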

odd meteor Dec 30, 2023, 10:43 AM

#

upbeat linden I’ve done a lot of research and the answer has pretty much been a toss up. PyTor...

I started with TensorFlow because it seemed somewhat more beginner-friendly to me; however, once I tried PyTorch, I never went back to TensorFlow.

Idk how to explain it lol, but PyTorch gives you the liberty to explore and navigate your ship however you see fit. As long as you enjoyed OOP when you learned Python, you will most likely enjoy PyTorch.

What's more, you could also use PyTorch Lightning, which is essentially a wrapper around PyTorch models, to go even more brrrrrrrrrrrrrr while driving your ship.

I'd say, just start with any one framework that comes easily to you. It doesn't even have to make sense why you pick one over the other; just get started already.

I'm sorry to hear about your Dad's stomach issues. Hopefully, what you're trying to build helps him get better.

Good luck 💯

odd meteor Dec 30, 2023, 11:02 AM

#

There's an error in their definition.

A perceptron uses a threshold function, which is a linear function. It becomes an MLP when you introduce at least one hidden layer and the activation function changes from a threshold function to a non-linear activation function like ReLU, Swish, Phish, Tanh, etc.
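Here is a concrete version of the perceptron/MLP difference. A single unit with a step activation can learn AND using the classic perceptron rule, but it can't represent XOR, because XOR isn't linearly separable. One hidden ReLU layer fixes that. This is an illustrative sketch: the XOR weights are hand-picked rather than learned, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation of the classic perceptron."""
    return 1 if z >= 0 else 0

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Rosenblatt perceptron rule: w += lr * (target - prediction) * x."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(X, y):
            err = target - step(w1 * x1 + w2 * x2 + b)
            w1 += lr * err * x1
            w2 += lr * err * x2
            b += lr * err
    return lambda x1, x2: step(w1 * x1 + w2 * x2 + b)

def relu(z):
    return max(0, z)

def mlp_xor(x1, x2):
    """One hidden ReLU layer with hand-picked (not learned) weights."""
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1)
    return h1 - 2 * h2

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND, XOR = [0, 0, 0, 1], [0, 1, 1, 0]
print([train_perceptron(X, AND)(*x) for x in X])  # [0, 0, 0, 1]
print([train_perceptron(X, XOR)(*x) for x in X])  # can never equal XOR
print([mlp_xor(*x) for x in X])                   # [0, 1, 1, 0]
```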

proud wing Dec 30, 2023, 11:15 AM

#

I've had a much better experience with PyTorch as well. Have you ever tried building TensorFlow from source with TensorRT and CUDA using Clang? I have. It's not pleasant.

final kiln Dec 30, 2023, 11:20 AM

#

odd meteor There's an error in their definition. Peceptron uses a threshold function whic...

I think calling the threshold function linear is a misnomer in and of itself, especially since ReLU is considered non-linear too; they are both piecewise linear, but neither is strictly linear

proud wing Dec 30, 2023, 11:21 AM

#

Training going. Time to go chill:)

final kiln Dec 30, 2023, 11:27 AM

#

final kiln I found this thing: https://github.com/spotty-cloud/spotty

this thing ended up not working out. I tried using a t2.micro just to get it going; it starts the spot instances and takes care of a lot of stuff, but then it gets stuck on some unknown signal that Google can't find anything about.

I'm just gonna code a simple master-worker setup myself using boto3. I can use it to start the spot instance and run a script that starts the training process, and the worker will drop messages into an AWS queue. Spot instances give a 2-minute warning, which I can use to save state and restart where I left off on another spot instance.
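Here is a minimal sketch of the save-and-resume part. It assumes the EC2 instance-metadata endpoint that AWS documents for spot interruption notices (`spot/instance-action`, queried via IMDSv2), which returns 404 until a notice exists. The JSON checkpoint is a hypothetical placeholder for real model and optimizer state, `train_one_epoch` stands in for the actual training step, and the queue messaging is left out. If an epoch takes longer than the 2-minute warning, the check would need to run every N steps instead of once per epoch.

```python
import json
import os
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service

def spot_interruption_pending(timeout=1.0):
    """True if EC2 has scheduled this spot instance for stop/termination.

    IMDSv2: fetch a session token, then query spot/instance-action,
    which returns 404 until an interruption notice exists.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(action_req, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def train_resumable(ckpt_path, n_epochs, train_one_epoch, interruption_pending):
    """Resume from ckpt_path if it exists, checkpoint after every epoch,
    and stop early (returning done=False) once an interruption is pending."""
    state = {"epoch": 0, "model": None}  # placeholder for real model/optimizer state
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    while state["epoch"] < n_epochs:
        state["model"] = train_one_epoch(state["model"], state["epoch"])
        state["epoch"] += 1
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)  # atomic rename: no half-written checkpoint
        if interruption_pending():
            return state, False
    return state, True
```

On the instance, you'd pass `spot_interruption_pending` as `interruption_pending` and keep the checkpoint on storage that outlives the instance, so the next worker picks up from the last completed epoch.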

odd meteor Dec 30, 2023, 11:32 AM

#

final kiln I think calling the threshold function linear is a misnomer in on itself, especi...

Yeah that's true. You're right

final kiln Dec 30, 2023, 1:17 PM

#

why does stuff that sounds simple always turn out to be complicated >.>

#

need to rethink this

topaz turtle Dec 30, 2023, 1:27 PM

#

proud wing Ive had much better experience w pytorch as well. Have you ever tried building t...

to be fair to tf, pytorch isn't all that great in that regard either

for example, last time i checked, if you wanted pytorch w/ vulkan support, you had to build it from source, not to mention that the only guide/documentation wrt that is a pytorch blogpost from 3 years ago

arctic wedgeBOT Dec 30, 2023, 2:50 PM

#

:incoming_envelope: :ok_hand: applied timeout to @brave thistle until Dec 30, 2023, 3:00 PM (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

agile owl Dec 30, 2023, 3:50 PM

#

If I'm just probing to see whether a reinforcement learning algorithm might have merit, how many timesteps should I invest in training before deciding whether the problem is suitable?

#

doing anything that involves nvidia driver or cuda toolkit dependencies from source sounds like nightmare fuel

#

I'm lucky if my nvidia-smi wants to come out to play

#

instead of giving me the NVML mismatch error

#

does anyone else get the joy of a "libcupti not found" error, so that you need to install a 1.x version of torch?

final kiln Dec 30, 2023, 4:44 PM

#

question: if my spot instance goes down, should I restart from the last epoch or from the best epoch so far?

agile owl Dec 30, 2023, 5:00 PM

#

given your seed, won't it just go in the same direction again?

final kiln Dec 30, 2023, 5:03 PM

#

no, the batches are usually randomized; even with the same seed it wouldn't go in the same direction, since it's being initialized at a different starting epoch

#

finally got my first successful master-worker run

#

this was hard work ngl

#

now I gotta make it fault tolerant

agile owl Dec 30, 2023, 5:04 PM

#

how do you debug if your results are actually different every time you run it?

final kiln Dec 30, 2023, 5:04 PM

#

the right pattern to use is to have mlflow store the state and then draw decisions from that

final kiln Dec 30, 2023, 5:05 PM

#

agile owl how do you debug if your results are actually different every time you run it

I want to explore the state space; randomness is usually a good feature, since it ensures the space is properly explored

topaz turtle Dec 30, 2023, 5:10 PM

#

proud wing <@879726954356572160> ```python import numpy as np import time ###### falcon wi...

this is only for contracting axes at the end of the first tensor and at the start of the second tensor? à la (2,3,4) x (3,4,5)?

#

right?
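For reference, NumPy's own `np.tensordot` covers that (2,3,4) x (3,4,5) case with `axes=2`, which contracts the last two axes of the first tensor against the first two of the second. It also accepts explicit axis lists for non-adjacent contractions. `np.einsum` spells out the same contraction and works as a correctness check when generalising a custom kernel to more dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4))
b = rng.standard_normal((3, 4, 5))

# Trailing axes of a (3, 4) against leading axes of b (3, 4) -> shape (2, 5)
c = np.tensordot(a, b, axes=2)
c_ref = np.einsum("ijk,jkl->il", a, b)
print(c.shape, np.allclose(c, c_ref))  # (2, 5) True

# Arbitrary axis pairs: a's axis 1 with d's axis 1, a's axis 2 with d's axis 0
d = rng.standard_normal((4, 3))
e = np.tensordot(a, d, axes=([1, 2], [1, 0]))
print(e.shape, np.allclose(e, np.einsum("ijk,kj->i", a, d)))  # (2,) True
```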

final kiln Dec 30, 2023, 5:16 PM

#

tho I can see the value of being able to exactly reproduce the results

#

not entirely sure how it could be done with this setup tho

#

guess I'd need to decide on the batches for all epochs a priori and save it somewhere

#

im going for last epoch then
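One way to get reproducible resumes without saving batch orders up front is to derive each epoch's shuffle from `(seed, epoch)` alone. Restarting at epoch k then regenerates exactly the order an uninterrupted run would have used. This is a sketch with NumPy's `Generator`, and the helper names are hypothetical. A fully identical rerun would also need the model/optimizer state and any other randomness (e.g. dropout) restored.

```python
import numpy as np

def epoch_order(seed, epoch, n_samples):
    """Shuffled sample indices for one epoch, derived only from (seed, epoch)."""
    rng = np.random.default_rng([seed, epoch])  # independent stream per epoch
    return rng.permutation(n_samples)

def iter_batches(order, batch_size):
    """Split an index order into consecutive batches (the last may be smaller)."""
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

# Resuming at epoch 3 reproduces what an uninterrupted run would have used.
uninterrupted = [epoch_order(seed=42, epoch=e, n_samples=10) for e in range(5)]
resumed = epoch_order(seed=42, epoch=3, n_samples=10)
print(np.array_equal(uninterrupted[3], resumed))  # True
print([b.tolist() for b in iter_batches(resumed, 4)])
```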

wooden condor Dec 30, 2023, 5:42 PM

#

I am trying to load a scikit-learn model from 2019 via joblib, but I am getting errors (probably because I'm now using a newer Python version than the one it was created in). I have tried setting up a Docker image to replicate the environment from back then, but that gave me other errors (some OpenCV errors). Can anybody help me load this model into a modern version of Python and scikit-learn? I can pay for it.

crisp shuttle Dec 30, 2023, 6:06 PM

#

Hello everyone.

I wanted to ask: does anyone know of a machine learning tutorial or course that teaches how to keep training and improving a model over the last few remaining percent of loss? I'm struggling to find even one on YouTube, or any literature on techniques to improve a model further other than training it longer or changing the learning rate.

serene scaffold Dec 30, 2023, 6:42 PM

#

wooden condor i am trying to load a scikit learn model from 2019 via joblib, but i am getting ...

Show the error that you currently want help with.
Don't offer money.

serene scaffold Dec 30, 2023, 6:43 PM

#

crisp shuttle Hello everyone. I wanted to ask, does anyone know any tutorial/course machine l...

The reason you're not finding resources to answer that question is that there are no broadly applicable answers. Look at the instances that your model isn't learning and think about why that might be.

crisp shuttle Dec 30, 2023, 6:52 PM

#

serene scaffold The reason you're not finding resources to answer that question is that there ar...

Thanks for responding.

Does it have to do with the data itself, or maybe with there being too many outliers in the data? When I compare some of the predicted values with their true targets, some predictions are 5 times smaller (when I'm supposed to get 15K it gives me 3K) and some are 2 times bigger than their true targets.

At first I normalized the features to the same range; that didn't help. Then I normalized the targets to the same scale as the features to avoid exploding gradients, but I'm still having this issue.
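Following the suggestion to look at the instances the model gets wrong, here is a sketch of that kind of error analysis for a regression target. It ranks examples by how far off the prediction is in ratio terms, |log(pred/true)|, so the 5x-too-small and 2x-too-big cases surface first and the rows behind them can be inspected by hand. It assumes strictly positive targets and predictions. The function name and the example values are made up for illustration.

```python
import numpy as np

def worst_instances(y_true, y_pred, k=5):
    """Indices of the k predictions furthest off in ratio terms, plus pred/true.

    Ranks by |log(pred / true)|, so 5x too small and 5x too big count the same.
    Assumes strictly positive targets and predictions.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_ratio = np.log(y_pred / y_true)
    idx = np.argsort(-np.abs(log_ratio))[:k]
    return idx, np.exp(log_ratio[idx])

# Made-up values: look at the rows behind the worst ratios.
idx, ratios = worst_instances([15000, 100, 200, 300], [3000, 110, 190, 600], k=2)
print(idx.tolist(), ratios.round(3).tolist())  # [0, 3] [0.2, 2.0]
```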

proud wing Dec 30, 2023, 7:18 PM

#

topaz turtle this is only for contracting axes at the end of the first tensor and at the star...

You also know that numpy doesn't support unlimited dimensionality either, right?

topaz turtle Dec 30, 2023, 7:18 PM

#

proud wing You also know that numpy does't support unlimited dimensionality either right?

i think it's like 64 tho, isn't it?

proud wing Dec 30, 2023, 7:18 PM

#

After 32 dimensions, many of numpy's functions hit a wall
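The exact limit depends on the NumPy version. It was 32 through NumPy 1.x and, as far as I know, was raised to 64 in NumPy 2.0. Rather than rely on either number, you can probe the installed build directly (`max_ndim` is a hypothetical helper):

```python
import numpy as np

def max_ndim():
    """Largest ndim this NumPy build will accept when creating an array."""
    n = 1
    while True:
        try:
            np.zeros((1,) * (n + 1))
        except ValueError:   # raised once ndim exceeds the build's limit
            return n
        n += 1

print(np.__version__, max_ndim())
```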

topaz turtle Dec 30, 2023, 7:18 PM

#

i mean ye, but i'm still trying to generalise this to at least 4 or 5 😭

proud wing Dec 30, 2023, 7:19 PM

#

you know how right?

topaz turtle Dec 30, 2023, 7:19 PM

#

not really

proud wing Dec 30, 2023, 7:19 PM

#

well, the more i think about it, the more i'm thinking my idea is pretty unique

#

I haven't actually seen another example of doing it that way, but I'm not saying it hasn't been done

topaz turtle Dec 30, 2023, 7:20 PM

#

i'm exploring the idea of skipping the expense of the transposition step by performing scattered data access on sub-tensors of each tensor that are large enough to fit into L1 cache (so the non-linear data access due to the transposition of the axes doesn't incur any performance overhead)
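Here is a rough sketch of that blocking idea for the 2-D case. It computes `a @ b.T` directly in tiles, reading `b`'s rows as-is, so the full transpose is never materialized. NumPy does the per-tile products here, so this only illustrates the loop structure a C version might use, not its speed. The tile size is an assumption: with float64, a 32x32 tile is 8 KiB, so three tiles (two inputs plus the output block) fit in a 32 KiB L1.

```python
import numpy as np

def tiled_matmul_bt(a, b, tile=32):
    """Compute a @ b.T block by block without materializing b.T.

    Both operands are read row-wise in tile x tile blocks; only tile-sized
    views of b are transposed. NumPy does each block product here, so this
    shows the loop structure a C kernel might use, not its speed.
    """
    n, k = a.shape
    m, k_b = b.shape
    if k != k_b:
        raise ValueError("a and b must have the same number of columns")
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):  # accumulate over the contracted axis
                out[i:i + tile, j:j + tile] += a[i:i + tile, p:p + tile] @ b[j:j + tile, p:p + tile].T
    return out

rng = np.random.default_rng(0)
a, b = rng.standard_normal((70, 50)), rng.standard_normal((40, 50))
print(np.allclose(tiled_matmul_bt(a, b, tile=16), a @ b.T))  # True
```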