#data-science-and-ml | Python | Page 53

verbal venture Mar 19, 2023, 3:06 PM

#

pandas is throwing me a 'could not convert string to float: '?' error when I try to do print(df['horsepower'].median()). Does anyone know why? When I run this in PyCharm I get the error, but when I run it in a Jupyter notebook it works.

queen cradle Mar 19, 2023, 3:10 PM

#

You should check very carefully whether you're working with the same data in PyCharm and your Jupyter notebook. It sounds like they're out of sync.

hasty mountain Mar 19, 2023, 4:13 PM

#

Does anyone know a trick to check if a sentence has special characters(like comma) and isolate that character from nearby words?
I'm trying to preprocess a text to be used for a small-scale GPT, and I don't want to remove those special characters from my input text because, well, I want the model to learn when to use them. However, my vocabulary list doesn't have those characters attached to words(like this,), only isolated tokens ['like', 'this', ','].

#

Oh yes...regular expressions module... yert

gritty heart Mar 19, 2023, 4:16 PM

#

hasty mountain Does anyone know a trick to check if a sentence has special characters(like comm...

string slicing?

hasty mountain Mar 19, 2023, 4:16 PM

#

gritty heart string slicing?

I don't know. Maybe.

gritty heart Mar 19, 2023, 4:16 PM

#

u could use it to isolate the characters

#

do u want to completely leave them out or just make them special in some sort of way like adding 2 spaces after them or a symbol after them?

hasty mountain Mar 19, 2023, 4:17 PM

#

I want to add a space before and after that character

#

this, ----> this ,

gritty heart Mar 19, 2023, 4:18 PM

#

so whenever this AI uses those words it just adds a space?

hasty mountain Mar 19, 2023, 4:19 PM

#

Nah, it's just so I don't get an error because my vocabulary was made like ['this', ','] and not like ['this,']

tidal bough Mar 19, 2023, 4:20 PM

#

hasty mountain Does anyone know a trick to check if a sentence has special characters(like comm...

So the sort of Byte-Pair Encoding used in LLMs? There's probably some good tokenizer for this (maybe in something like spacy?) but as a stopgap, multiple str.partitions come to mind.

hasty mountain Mar 19, 2023, 4:20 PM

#

I didn't really want something to return a list, though

#

There's sentence.replace(',', ' , '), but...if I use that for multiple characters, it gets messy
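A minimal sketch of the "space before and after" idea that returns a string rather than a list, using a single `re.sub` call instead of chained `replace` calls. The helper name `space_out` and the default punctuation set are assumptions for illustration; extend the set to whatever the vocabulary actually contains.

```python
import re

# Hypothetical default set of characters to isolate; adjust to the vocabulary.
PUNCT = ",;.!?"

def space_out(text: str, chars: str = PUNCT) -> str:
    """Put a space on both sides of every character in `chars`."""
    # re.escape makes regex-special characters safe inside the character class.
    pattern = "([" + re.escape(chars) + "])"
    spaced = re.sub(pattern, r" \1 ", text)
    # Collapse the doubled spaces that the substitution leaves behind.
    return " ".join(spaced.split())

print(space_out("like this, and that; ok?"))
```

Calling `.split()` on the result gives the isolated tokens if a list turns out to be useful after all.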

tidal bough Mar 19, 2023, 4:21 PM

#

something like

def partition_all(text: str, separators: list[str]) -> list[str]:
    tokens = [text]
    for sep in separators:
        tokens = [part for token in tokens for part in token.partition(sep) if part]
    return tokens

is what I'm thinking. probably very slow though.

hasty mountain Mar 19, 2023, 4:21 PM

#

hasty mountain I didn't really want something to return a list, though

Wait, I'll review some things. Maybe returning a list would be better, after all

Nah, maybe not much

tidal bough Mar 19, 2023, 4:24 PM

#

oh right, that doesn't quite work because partition only does one

#

one needs re.split instead

#

!e

import re
def tokenize(text: str, separators: list[str]) -> list[str]:
    # re.escape lets regex-special separators such as "." or "?" be passed as-is
    tokens = [text]
    for sep in separators:
        tokens = [part for token in tokens for part in re.split(f"({re.escape(sep)})", token) if part]
    return tokens
print(tokenize(
    "Nah, it's just so I don't get an error because my vocabulary was made like ['this', ','] and not like ['this,']",
    list(";, '"),
))

arctic wedgeBOT Mar 19, 2023, 4:26 PM

#

@tidal bough :white_check_mark: Your 3.11 eval job has completed with return code 0.

['Nah', ',', ' ', 'it', "'", 's', ' ', 'just', ' ', 'so', ' ', 'I', ' ', 'don', "'", 't', ' ', 'get', ' ', 'an', ' ', 'error', ' ', 'because', ' ', 'my', ' ', 'vocabulary', ' ', 'was', ' ', 'made', ' ', 'like', ' ', '[', "'", 'this', "'", ',', ' ', "'", ',', "'", ']', ' ', 'and', ' ', 'not', ' ', 'like', ' ', '[', "'", 'this', ',', "'", ']']

tidal bough Mar 19, 2023, 4:26 PM

#

this looks good to me

hasty mountain Mar 19, 2023, 4:27 PM

#

Thanks!

wooden sail Mar 19, 2023, 4:36 PM

#

my best guess is that the curvature of your loss function is very steep near that local minimum, so it starts behaving poorly. some things to notice with SGD are that you have no guarantee the loss will actually decrease at each iteration, and the convergence of all gradient methods depends on the curvature of the loss. the larger the curvature, the smaller the step size has to be

#

maybe try with a smaller step size first
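The link between curvature and step size can be seen on a toy problem. This sketch uses plain (non-stochastic) gradient descent on the one-dimensional quadratic f(x) = 0.5 * c * x**2, which is an assumption chosen for simplicity and not the loss discussed here. For this function each update multiplies x by (1 - lr * c), so the iterates shrink only when lr < 2 / c.

```python
def gradient_descent(curvature: float, lr: float, steps: int = 50, x0: float = 1.0) -> float:
    """Minimise f(x) = 0.5 * curvature * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        grad = curvature * x   # derivative of 0.5 * c * x**2
        x -= lr * grad         # same as x = (1 - lr * curvature) * x
    return abs(x)

# Same step size, different curvature: fine on the flat bowl, divergent on the steep one.
print(gradient_descent(curvature=1.0, lr=0.03))
print(gradient_descent(curvature=100.0, lr=0.03))
# Shrinking the step size restores convergence on the steep bowl.
print(gradient_descent(curvature=100.0, lr=0.001))
```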

serene scaffold Mar 19, 2023, 4:39 PM

#

wooden sail my best guess is that the curvature of your loss function is very steep near tha...

are there optimizers intended to overcome this? like AdamW?

merry fern Mar 19, 2023, 4:39 PM

#

verbal venture pandas is throwing me a 'could not convert string to float: '?' error when I try...

There's a string in your column

wooden sail Mar 19, 2023, 4:42 PM

#

serene scaffold are there optimizers intended to overcome this? like AdamW?

yeah so, my understanding of adam is that it's a form of gradient with momentum, which avoids the gradient suddenly changing directions

merry fern Mar 19, 2023, 4:44 PM

#

Try df.info() whenever you get a TypeError and then go from there.

I think you could do something like this to find the strings:
df[df['horsepower'].str.contains('^[a-zA-Z]$')]
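Note that the pattern `'^[a-zA-Z]$'` matches only strings that consist of a single letter, so it would not flag a `'?'` placeholder. A hedged alternative is to ask pandas which cells fail numeric conversion. The toy frame below is an assumption standing in for the real data.

```python
import pandas as pd

# Toy stand-in for the real data; the '?' mirrors the value in the error message.
df = pd.DataFrame({"horsepower": ["130", "165", "?", "150"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
as_number = pd.to_numeric(df["horsepower"], errors="coerce")

# Rows where parsing failed even though the original cell was not empty.
bad_rows = df[as_number.isna() & df["horsepower"].notna()]
print(bad_rows)

# One option: keep the coerced column; median() skips NaN by default.
df["horsepower"] = as_number
print(df["horsepower"].median())
```

Passing `na_values="?"` to `pd.read_csv` at load time is another route, and a difference in how the file is loaded could be one reason the notebook and the script disagree.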

hasty mountain Mar 19, 2023, 4:52 PM

#

wooden sail yeah so, my understanding of adam is that it's a form of gradient with momentum,...

PyTorch docs say Adam uses some kind of EMA for the gradients and gradients**2

#

(Or so I remember)
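A minimal scalar sketch of the Adam update described above: one exponential moving average of the gradient, one of the squared gradient, plus bias correction. The function name `adam_step` is made up for illustration, the default hyperparameters are the commonly quoted ones, and weight decay (the "W" in AdamW) is left out.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# On the first step the update size is close to lr whether the gradient is big or small.
big, _, _ = adam_step(1.0, grad=100.0, m=0.0, v=0.0, t=1)
small, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1)
print(1.0 - big, 1.0 - small)
```

Dividing by the root of the squared-gradient average is what makes the step roughly scale-free, which is part of why Adam tends to be less sensitive to steep regions than plain SGD.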

serene scaffold Mar 19, 2023, 4:52 PM

#

verbal venture pandas is throwing me a 'could not convert string to float: '?' error when I try...

if your "horsepower" column has non-numeric information in it (like what unit it's using), that needs to be removed, or put in a different column.

verbal venture Mar 19, 2023, 5:31 PM

#

serene scaffold if your "horsepower" column has non-numeric information in it (like what unit it...

It worked in jupyter notebook tho. Wouldn’t it still throw the same error there?

verbal venture Mar 19, 2023, 5:31 PM

#

merry fern There's a string in your column

Will try this, thanks

wooden sail Mar 19, 2023, 5:36 PM

#

does it still work in your notebook if you rerun everything from top to bottom in one go?

serene scaffold Mar 19, 2023, 5:40 PM

#

verbal venture It worked in jupyter notebook tho. Wouldn’t it still throw the same error there?

jupyter notebooks make it incredibly easy to throw in extra steps that won't necessarily be re-done the next time you run the notebook.

verbal venture Mar 19, 2023, 5:41 PM

#

wooden sail does it still work in your notebook if you rerun everything from top to bottom i...

Yeah..

serene scaffold Mar 19, 2023, 5:41 PM

#

verbal venture Yeah..

then there's some way that your non-notebook code is different from the code in the notebook.

#

also, when wooden sail says "rerun everything from top to bottom in one go", that includes restarting the notebook kernel.

#

think of all the variables and functions in your notebook as existing in a dict (because they actually do). each cell can change what's in that dict. deleting the cell or changing what's in it and re-running it doesn't undo what changes you made to the global state dict.

#

whereas restarting the kernel clears everything out of the global state dict, and you start over fresh.
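The "global state dict" picture can be imitated in a few lines. In this sketch a plain dict stands in for the kernel's namespace and `run_cell` is a hypothetical helper. It illustrates the idea only and is not how Jupyter is implemented.

```python
# A plain dict stands in for the kernel's global namespace (illustration only).
kernel_globals = {}

def run_cell(source: str) -> None:
    """Execute one 'cell' against the shared namespace."""
    exec(source, kernel_globals)

run_cell("values = [1, 2, '?']")                      # cell 1: load raw data
run_cell("values = [v for v in values if v != '?']")  # cell 2: cleanup, later deleted
run_cell("total = sum(values)")                       # cell 3: works thanks to cell 2
print(kernel_globals["total"])

# "Restart the kernel", then rerun only the cells that still exist (1 and 3).
kernel_globals.clear()
run_cell("values = [1, 2, '?']")
try:
    run_cell("total = sum(values)")
    rerun_failed = False
except TypeError:
    rerun_failed = True
print(rerun_failed)
```

Before the restart, cell 3 succeeds only because the deleted cleanup cell already changed the shared state. After the restart the same cells fail, which is the kind of mismatch a standalone script would also hit.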

ocean holly Mar 19, 2023, 5:51 PM

#

How do i use gpu on jupyter lab ? I am training AI with cpu . and it is so slow

serene scaffold Mar 19, 2023, 6:05 PM

#

ocean holly How do i use gpu on jupyter lab ? I am training AI with cpu . and it is so slow

this isn't a jupyter question. it's a "what library are you using" and "what GPU do you have" question.

ocean holly Mar 19, 2023, 6:05 PM

#

serene scaffold this isn't a jupyter question. it's a "what library are you using" and "what GPU...

i am using tensorflow and i got 3080 ti

#

and 3090 at school

serene scaffold Mar 19, 2023, 6:06 PM

#

ocean holly i am using tensorflow and i got 3080 ti

I don't use tensorflow, but this is the guide for using a GPU with it https://www.tensorflow.org/guide/gpu

mellow pendant Mar 19, 2023, 6:19 PM

#

Hi all, I have a question that is more about statistics. I have an Excel file that contains transaction information. The file has separate sheets for each year since 2011 and contains columns for the business name, customer name, line of business, sub line of business, and revenue. I want to look at specific businesses and see if diversifying their products has helped with revenue. I have looked at it a few different ways and wanted to know if it would make sense to look at the standard deviation of the percentage makeup of the sub lines of business as a measure of whether a company has diversified. An example: if a certain company had an 80%/10%/10% breakdown of the sub lines of business, where one takes up 80% of the records, the standard deviation would be higher than if it were something like 50%/30%/20%, since the values would be closer to the mean. Does it make sense to do it that way? I feel like I am missing something.
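A small sketch of the proposed measure using the two breakdowns from the question. For a fixed number of sub lines n, the population variance of the shares equals sum(share**2) / n - 1 / n**2, so the standard deviation ranks companies the same way as the sum of squared shares (often called the Herfindahl index). The helper names are made up for illustration.

```python
from statistics import pstdev

def share_std(shares):
    """Population standard deviation of the share vector."""
    return pstdev(shares)

def sum_of_squares(shares):
    """Sum of squared shares: 1.0 for a single line, 1/n for an even split."""
    return sum(s * s for s in shares)

concentrated = [0.8, 0.1, 0.1]
balanced = [0.5, 0.3, 0.2]

for shares in (concentrated, balanced):
    print(shares, round(share_std(shares), 4), round(sum_of_squares(shares), 4))

# A company with a single sub line has a share vector of [1.0] and a
# standard deviation of 0, which would wrongly look perfectly diversified.
print(share_std([1.0]))
```

The last line hints at what may be missing: the standard deviation depends on how many sub lines are counted, so comparisons only hold up when every company is scored over the same set of sub lines, with zero shares included.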

languid mortar Mar 19, 2023, 8:24 PM

#

I had a data scaling question and a database question: if I was storing JSON data and webpage HTML in a db, what is an efficient db backend to use? Plain old MySQL, Redis, or something else? I'd like to build this app to be scalable from the beginning so I don't have to change anything later. It will mostly be used as a cache. Thanks for the guidance

#

oh theres a database channel nevermind my bad

violet gull Mar 19, 2023, 8:30 PM

#

wooden sail maybe try with a smaller step size first

0.001 isnt small enough?

wooden sail Mar 19, 2023, 8:31 PM

#

the number depends entirely on the loss function, the network, and the data. there is no 1 number that always works

#

try dividing by 10 or 100 and see if it behaves any better or different. if not, then we have to give some thought to what the reason might be

violet gull Mar 19, 2023, 8:37 PM

#

@wooden sail 0.0001

#

but all that did was slow the rate down so if i did more than 200 iterations it would probably go back to doing the unga bunga dance

velvet mountain Mar 19, 2023, 8:49 PM

#

does someone know a good alternative to MLflow that is reliable? I tried unsuccessfully to set it up with FTP but ran into massive failures (and I'm not the only one, according to their issue tracker). so I'm a bit tired of them. any other good tool in the scope of managing model deployments ?

gilded bobcat Mar 19, 2023, 8:49 PM

#

Hi all I had a question on a personal project im working on:

I have animal types, their outcome (adopted, not adopted), etc...

I find that most animals get adopted in 2 months or less, however I get roughly .01% being adopted in 2 years, 3 years, 5 years.... These extreme observations are true and are not errors. I fear keeping them will ruin all my inputs to my models as the standardization values will be influenced by these outliers, however I don't want to drop them because they are informative!

Curious on what I should do? Leaning on dropping anyway?

velvet mountain Mar 19, 2023, 8:52 PM

#

gilded bobcat Hi all I had a question on a personal project im working on: I have animal type...

random question, but : are you sure normalising is a good idea?

clearly the distribution is not normal : are you sure you control the distribution after a renormalisation? 🙂

gilded bobcat Mar 19, 2023, 8:54 PM

#

velvet mountain random question, but : are you sure normalising is a good idea? clearly the dis...

Hmmm good question.... Now I am all turned around hahaha....

Like I want to normalize all my variables so they're in a similar scale before learning. Demeaning/normalizing would make my distributions normal (now they're typically 0 to X for days in pound, age, etc...) but keeping in the outliers would screw up what the centered value would be?

#

idk if my logic even sound to you lol

#

I guess I could standardize to the median and not mean, win-win?

violet gull Mar 19, 2023, 8:56 PM

#

i did 2000 iterations and 0.0001 lr

#

it still broke

velvet mountain Mar 19, 2023, 8:56 PM

#

gilded bobcat Hmmm good question.... Now I am all turned around hahaha.... Like I want to nor...

yeah it's quite rare people in machine learning are statistician enough to really care about statistical errors they commit, so I guess we don't care much

#

median sounds like a nice thing to at least try

gilded bobcat Mar 19, 2023, 8:57 PM

#

I could always share my screen and show you my logic but idk if you wanna hear my rambling lol

velvet mountain Mar 19, 2023, 8:57 PM

#

std deviation is also not robust to outliers, so watch out here too
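A sketch of the "standardize to the median" idea, pairing the median with the interquartile range so that neither the centre nor the spread is driven by the few very long stays. The `days` values are invented for illustration.

```python
from statistics import mean, median, pstdev, quantiles

def standard_scale(xs):
    """Classic z-score: centre on the mean, divide by the standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def robust_scale(xs):
    """Centre on the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(xs, n=4)
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]

# Made-up days-in-shelter values with one extreme but genuine observation.
days = [5, 12, 20, 30, 45, 60, 1825]

print([round(z, 2) for z in standard_scale(days)])
print([round(z, 2) for z in robust_scale(days)])
```

With the z-score the six typical values are squeezed into a narrow band because the outlier inflates both the mean and the standard deviation. With the robust version they keep a usable spread and the outlier simply stays large. scikit-learn's `RobustScaler` follows the same median and IQR idea.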

gilded bobcat Mar 19, 2023, 8:57 PM

#

Yeah fair ty ty

#

I'll just push forward and hope for the best 🙂

queen cradle Mar 19, 2023, 9:10 PM

#

@gilded bobcat My advice is, don't try to fit a normal distribution. Instead, try a gamma distribution. They're more appropriate to your situation.

gilded bobcat Mar 19, 2023, 9:14 PM

#

queen cradle @gilded bobcat My advice is, don't try to fit a normal distribution. Inst...

this makes me wonder, will the distribution even matter if I just use non-parametric models like decision trees ?

queen cradle Mar 19, 2023, 9:17 PM

#

If you're ultimately going to use a non-parametric model, then I'd say the only reason to try to fit a parametric distribution in the first place is so that you have some idea of what the data looks like.

#

If a gamma distribution fits well, then you learn something. If it doesn't fit well, and if you can identify where it doesn't fit well, then you learn something else.

#

That is, you can fit a parametric distribution as a kind of exploratory tool, instead of because you want to shoehorn the data into something parametric.

gilded bobcat Mar 19, 2023, 9:21 PM

#

Makes sense, do you have advice on how to ensure it looks like a gamma dist other than inspection? Ngl I am used to shoving my data into normal and moving on lol

queen cradle Mar 19, 2023, 9:22 PM

#

For exploratory purposes, inspection is a good way to go. For example, plot the density of the fitted gamma distribution over a histogram or KDE of the data. Or make a Q-Q plot of the data versus the fitted gamma distribution.

gilded bobcat Mar 19, 2023, 9:25 PM

#

Got it! Visually it looks good, let me know if you agree:

queen cradle Mar 19, 2023, 9:27 PM

#

Huh, those are interesting. They all seem to have distinct elbows.

gilded bobcat Mar 19, 2023, 9:27 PM

#

Like the ~750 spot on "adopt"?

queen cradle Mar 19, 2023, 9:28 PM

#

Yeah. They all seem to have elbows at about that height.

hasty mountain Mar 19, 2023, 9:51 PM

#

Hey guys, can someone help me with Pretraining of a Transformer?
I know that the Unsupervised Learning phase of neural networks is mostly to train the "feature extracting" layers, with the objective of minimizing information entropy to make things easier for the classifier. I can see that quite easily for image models. But how can I do that for a Transformer?
Should I use as "information entropy" output the Encoder output?

But then, GPT-1 had only the Decoder part, right? Shouldn't I use something related to the Decoder for this?

hasty mountain Mar 19, 2023, 10:14 PM

#

ChatGPT told me that the Transformer would be trained to predict whether 2 generated sentences are consecutive or not...but it also gets pretty messed up with that information.

#

Uh...ok...I don't get it...it would be a CrossEntropy, ok? But what would be the targets?

silk marsh Mar 19, 2023, 10:33 PM

#

So what math Field should I learn for AI except statistics

mild elk Mar 19, 2023, 10:44 PM

#

does anyone know why this is not working

#

I just converted the numpy array X_test_MinMax into a dataframe called a

#

and then i just want to get the dataframe when the "Island" column is 1.0 but it shows NaN values

austere swift Mar 19, 2023, 10:56 PM

#

silk marsh So what math Field should I learn for AI except statistics

linear algebra and calculus are also very important

austere swift Mar 19, 2023, 10:57 PM

#

hasty mountain Uh...ok...I don't get it...it would be a CrossEntropy, ok? But what would be the...

the inputs are the previous tokens in the sequence, and the outputs are the probabilities of what the next token should be

#

for example if you give it "I really love programming in", it would give you what it thinks the next token should be, which could be something like: 0.5 python, 0.3 C++, 0.1 java, 0.05 rust, ...

hasty mountain Mar 19, 2023, 10:59 PM

#

austere swift for example if you give it "I really love programming in", it would give you wha...

Ok, that's for text generation, but what about the loss?

austere swift Mar 19, 2023, 10:59 PM

#

this is if you're using word tokenization, there's also a bunch of other tokenization methods that are subword (so tokens represent parts of words) so the tokens could be something like "th-" or "ex-"

hasty mountain Mar 19, 2023, 11:00 PM

#

In a supervised learning configuration, the loss would be CrossEntropy(model_output, target_text).
But what about unsupervised, where there's no labels, no targets?

austere swift Mar 19, 2023, 11:00 PM

#

the target is the actual next word in the sequence

#

because your training data is a bunch of text, you already know what the next word is

#

so the label is just that next word
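A tiny sketch of where the targets come from: the text itself is shifted by one position, and the loss at each position is the cross-entropy of the predicted distribution against the token that actually came next. The token list and the probabilities reuse the example sentence above. The dict-based `cross_entropy` is a simplification of what a framework computes from logits.

```python
import math

tokens = ["I", "really", "love", "programming", "in", "python"]

# The input at position t is everything up to t; the target is the token at t + 1.
pairs = [(tokens[: t + 1], tokens[t + 1]) for t in range(len(tokens) - 1)]
for context, target in pairs:
    print(context, "->", target)

def cross_entropy(predicted: dict, target: str) -> float:
    """Negative log-probability the model assigned to the true next token."""
    return -math.log(predicted[target])

# Hypothetical model output for the context "I really love programming in".
predicted = {"python": 0.5, "C++": 0.3, "java": 0.1, "rust": 0.05}
loss = cross_entropy(predicted, "python")
print(loss)
```

No human labelling is involved: every sentence supplies its own targets, which is why the setup is called self-supervised.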

hasty mountain Mar 19, 2023, 11:01 PM

#

That doesn't look like unsupervised learning to me

austere swift Mar 19, 2023, 11:01 PM

#

it's self-supervised learning

hasty mountain Mar 19, 2023, 11:02 PM

#

Ok, but I want to use unsupervised learning for pre-training

austere swift Mar 19, 2023, 11:02 PM

#

there is no unsupervised learning for that, people call it "unsupervised" because there are no explicit labels, but it's technically self-supervised

hasty mountain Mar 19, 2023, 11:06 PM

#

Ugh... Then I see no difference from an unsupervised learning configuration and a supervised learning for Transformer

#

Working with images is way easier... and clarified

gilded bobcat Mar 19, 2023, 11:18 PM

#

I have a NLP-ish type of question.... I have a feature called animal breeds, many (if not most) of these breeds are sparse (like 1 or 2 animals per breeds). Could embedding these with a pretrained model (like GloVe) be a good idea? Would it be able to understand the similarity between "German Shepard Mix" and "German Shepard" and "Pitbull?" My end goal is to use breed to predict if an animal will be adopted

serene scaffold Mar 19, 2023, 11:23 PM

#

gilded bobcat I have a NLP-ish type of question.... I have a feature called animal breeds, man...

You're trying to predict breeds, with what features?

gilded bobcat Mar 19, 2023, 11:24 PM

#

serene scaffold You're trying to predict breeds, with what features?

I am trying to predict if an animal will be adopted using some predictors.... One of them being breed, I feel as if it will provide some great explanatory power, but if I OHE it itll be incredibly sparse. I thought maybe I could make embeddings instead and use those for prediction.

#

To make it more confusing some animals are "short hair tabby" and others are "short hair tabby mix"

serene scaffold Mar 19, 2023, 11:25 PM

#

gilded bobcat I am trying to predict if an animal will be adopted using some predictors.... On...

What are the predictors?

gilded bobcat Mar 19, 2023, 11:27 PM

#

Y is adoption (dichotomous), possible X's are: Age (continuous), Animal Type (categorical), Breed (categorical), ,Color (categorical), Intake Reason (categorical), Intake Sex (categorical), Intake Conditional (categorical)

#

I will prob drop color its even worse than breed

serene scaffold Mar 19, 2023, 11:27 PM

#

gilded bobcat Y is adoption (dichotomous), possible X's are: Age (continuous), Animal Type (ca...

Those things you just said are called features, just so you know. That's what I was asking for earlier.

gilded bobcat Mar 19, 2023, 11:29 PM

#

serene scaffold Those things you just said are called features, just so you know. That's what I ...

Got it

serene scaffold Mar 19, 2023, 11:29 PM

#

Anyway, I wouldn't do the word embedding thing. I'll explain why later.

austere swift Mar 19, 2023, 11:34 PM

#

Just putting this out there as well, a lot of those features will likely not get you much information for the model to predict the breed

#

color and type I think would make sense, but intake reason, sex, and age likely have little to no predictive power

gilded bobcat Mar 19, 2023, 11:35 PM

#

Sorry I might have been unclear. I want to use breed as a feature to predict if an animal will get adopted.

austere swift Mar 19, 2023, 11:36 PM

#

ah okay that makes sense

austere swift Mar 19, 2023, 11:36 PM

#

gilded bobcat I have a NLP-ish type of question.... I have a feature called animal breeds, man...

I read this as "i'm trying to predict animal breeds"

gilded bobcat Mar 19, 2023, 11:36 PM

#

Yeah let me edit, I think serene scaffold read it the same

austere swift Mar 19, 2023, 11:37 PM

#

I think it would make sense to kind of merge some of those breeds together into the same category if they're very similar

gilded bobcat Mar 19, 2023, 11:38 PM

#

Here is an idea of what they look like:

austere swift Mar 19, 2023, 11:38 PM

#

like "german shepard mix" and "german shepard" would be merged into "german shepard" etc

gilded bobcat Mar 19, 2023, 11:38 PM

#

I would agree but my pain is like I think for dogs being 'mixed' actually matters.... A purebred german shepard is wildly different from a mix. Moreover, I am just unsure how german shepard/lab is different from a lab/german shepard... I could def break it up though
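One way to "break it up" while keeping the mix information is to split each breed string into an unordered set of component breeds plus a separate mix flag. The sketch assumes the labels use a trailing " Mix" and "/" between components, as in the examples mentioned here. The function name `parse_breed` is made up.

```python
def parse_breed(raw: str):
    """Return (set of component breeds, is_mix) for one breed label."""
    name = raw.strip().lower()
    # Treat both an explicit "mix" suffix and a slash-separated label as mixed.
    is_mix = name.endswith(" mix") or "/" in name
    if name.endswith(" mix"):
        name = name[: -len(" mix")]
    # A frozenset ignores order, so "a/b" and "b/a" compare equal.
    parts = frozenset(p.strip() for p in name.split("/"))
    return parts, is_mix

for label in ["German Shepard", "German Shepard Mix", "German Shepard/Lab", "Lab/German Shepard"]:
    print(label, "->", parse_breed(label))
```

The component set can then be binned or encoded per breed, and the mix flag becomes its own binary feature instead of multiplying the number of categories.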

austere swift Mar 19, 2023, 11:39 PM

#

yeah if the mix would non-negligibly affect the chances of being adopted then it should be kept

gilded bobcat Mar 19, 2023, 11:39 PM

#

but like Chicken and Chicken mix? Wtf is a chicken mix??

#

I have another Q on feature selection if thats okay

#

I plan to do feature selection prior to building my model, probably like RFE.... Should I include my OHE categoricals when I do this? If so, if it says one categorical value (like A, B, C and it says A is useless) is useless then should I drop all my dummies for that categorical?

serene scaffold Mar 20, 2023, 1:10 AM

#

anyway @gilded bobcat, you could use word vectors to see if the names of the breeds form discernible clusters in that vector space, and treat any breeds that are part of the same cluster as the same breed. But that's basically just binning. If you want to create bins of breeds, you can already do that using whatever bins you want.

serene scaffold Mar 20, 2023, 1:11 AM

#

gilded bobcat I plan to do feature selection prior to building my model, probably like RFE.......

you have to decide what features you're going to feed into the model before you feed them into the model, yes. even if you decide to feed all the features into the model, that's just feature selection where you select all the features.

gilded bobcat Mar 20, 2023, 1:11 AM

#

I see, honestly worth a shot or at least a fun way to practice my clustering techniques.... With 4k+ breeds I need a better way to automate this than deciding by hand

serene scaffold Mar 20, 2023, 1:11 AM

#

gilded bobcat I see, honestly worth a shot or atleast a fun way to practice my clustering tech...

well, you're certainly free to. you can use kmeans clustering on the resultant vectors and see what you get.

gilded bobcat Mar 20, 2023, 1:12 AM

#

Ty 🙂

serene scaffold Mar 20, 2023, 1:12 AM

#

yw

serene scaffold Mar 20, 2023, 1:12 AM

#

gilded bobcat I plan to do feature selection prior to building my model, probably like RFE.......

you might also be conflating feature selection with feature encoding.

gilded bobcat Mar 20, 2023, 1:14 AM

#

serene scaffold you have to decide what features you're going to feed into the model before you ...

I might be confused, what if I have a categorical with A, B, and C values only. I go ahead and OHE these so that I now have three columns in my feature dataset. I then run a feature selection technique over all my possible features. If my feature selection was to say "A and B are really important but C is useless" should I just drop the whole categorical variable or drop the one hot encoded C column?

#

I say drop all of that categorical variable, but curious none the less

serene scaffold Mar 20, 2023, 1:14 AM

#

what does ohe stand for

gilded bobcat Mar 20, 2023, 1:14 AM

#

one hot encode/dummy them out

serene scaffold Mar 20, 2023, 1:14 AM

#

ah right. I've never seen that as an acronym for some reason.

#

also there's no established meaning for "feature dataset" as a separate thing from "dataset".

#

"A and B are really important but C is useless" should I just drop the whole categorical variable or drop the one hot encoded C column?
why would you drop A and B if they're important?

gilded bobcat Mar 20, 2023, 1:17 AM

#

Because they're all dummies within the same categorical variable, I have no good reason to say I should/shouldn't, but it feels like I am tossing out a 1/3 of a variable and not sure if that's okay.

serene scaffold Mar 20, 2023, 1:18 AM

#

my guess is that the model would just learn to ignore C anyway, but it depends on the model and properties of your dataset.

mild elk Mar 20, 2023, 1:41 AM

#

mild elk does anyone know why this is not working

can someone answer my question, i am still very confused on why it is not working

gilded bobcat Mar 20, 2023, 1:45 AM

#

mild elk can someone answer my question, i am still very confused on why it is not workin...

try a.loc[a.island == 1]
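The original code is not shown, so this is only a guess at one common way to end up with NaN: comparing the whole frame to a value masks individual cells, while comparing a single column produces a row filter. The column names and values below are assumptions. Note that attribute access is case-sensitive, so a column named "Island" is not reachable as `a.island`.

```python
import numpy as np
import pandas as pd

# Hypothetical scaled array and column names standing in for X_test_MinMax.
X = np.array([[0.0, 0.2], [1.0, 0.7], [1.0, 0.4]])
a = pd.DataFrame(X, columns=["Island", "Bill"])

# Masking with the whole frame keeps the shape; non-matching cells become NaN.
whole_frame_mask = a[a == 1.0]
print(whole_frame_mask)

# Masking with one column's boolean Series keeps only the matching rows.
rows = a[a["Island"] == 1.0]
print(rows)

# For floats that came out of a scaler, a tolerance-based comparison is safer.
rows_close = a[np.isclose(a["Island"], 1.0)]
print(rows_close)
```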

mild elk Mar 20, 2023, 1:55 AM

#

it still does not work

mild elk Mar 20, 2023, 1:56 AM

#

gilded bobcat try a.loc[a.island == 1]

charred frost Mar 20, 2023, 2:27 AM

#

hey I got a quick question for pandas, I want to add a dataframe to another to the right of it

#

so add it column wise but keep it as is

#

keep the row names etc

#

so do this

       |  A                                                    |  B

after appending B to A

       |  A                      |  B

gilded bobcat Mar 20, 2023, 2:48 AM

#

mild elk

I'm not too sure unless I can see that MinMax array. Could you send example data?

charred frost Mar 20, 2023, 2:59 AM

#

charred frost so do this | A ...

if anyone knows how to do this for dfs please let me know

#

seems like something really basic but I can't find an easy way to do this
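For placing one DataFrame to the right of another while keeping the row labels, `pd.concat` with `axis=1` is a likely fit. The frames below are toy stand-ins for A and B.

```python
import pandas as pd

# Toy frames sharing the same row labels.
A = pd.DataFrame({"a1": [1, 2], "a2": [3, 4]}, index=["r1", "r2"])
B = pd.DataFrame({"b1": [5, 6]}, index=["r1", "r2"])

# axis=1 places B's columns to the right of A's, aligning rows on the index.
side_by_side = pd.concat([A, B], axis=1)
print(side_by_side)
```

Rows are matched by index label rather than by position, so labels present in only one frame produce NaN in the other frame's columns. `A.join(B)` is a close alternative when A's rows should decide what is kept.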

plush jungle Mar 20, 2023, 3:30 AM

#

does anyone know why the matplotlib window might be not responding when I use:

plt.pause(0.001)

iron basalt Mar 20, 2023, 3:38 AM

#

plush jungle does anyone know why the matplotlib window might be not responding when I use: `...

Have you tried pausing for longer?

plush jungle Mar 20, 2023, 3:39 AM

#

I tried 0.1, but that didn't work either

#

same thing happens with zero pause

iron basalt Mar 20, 2023, 3:39 AM

#

How much work is it doing? Are you plotting a lot? Matplotlib is slow.

plush jungle Mar 20, 2023, 3:39 AM

#

nah, it's a single data point on the first iteration

#

def plot_rewards(show_result=False):
    plt.figure(1)

    rewards_t = torch.tensor(episode_rewards, dtype=torch.float)
    if show_result:
        plt.title('Result')
    else:
        plt.clf()
        plt.title('Training...')
    plt.xlabel('Episode')
    plt.ylabel('Reward')
    plt.plot(rewards_t.numpy())
    # Take 10 episode averages and plot them too
    if len(rewards_t) >= 10:
        means = rewards_t.unfold(0, 10, 1).mean(1).view(-1)
        means = torch.cat((torch.zeros(9), means))  # 10-episode windows need 9 zeros of padding to line up with the episode axis
        plt.plot(means.numpy())
    
    plt.pause(0.001)  # pause a bit so that plots are updated
    #plt.show()
    if is_ipython:
        if not show_result:
            display.display(plt.gcf())
            display.clear_output(wait=True)
        else:
            display.display(plt.gcf())

#

not on ipython

iron basalt Mar 20, 2023, 3:44 AM

#

Is interactive mode on?

plush jungle Mar 20, 2023, 3:46 AM

#

iron basalt Is interactive mode on?

yes it is

#

ooh ok, now it's actually plotting data once I up the wait to 1, but it's still not responding when I click on it

#

I guess that's not a super big problem, but it is annoying

iron basalt Mar 20, 2023, 3:51 AM

#

Use FuncAnimation instead.

plucky bolt Mar 20, 2023, 5:49 AM

#

Anyone here read this book?

#

Wondering if it is beginner friendly

half quest Mar 20, 2023, 8:24 AM

#

Hey, I have a lot of data that will be key-value pairs. Both the keys and values will be integers. There will probably be either millions, or maybe billions, of key-value pairs. What would be the best way to store this data so Python could get a value with a key efficiently?

wooden sail Mar 20, 2023, 9:08 AM

#

probably with a database if you don't want to store all the data in memory
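As one concrete version of the database suggestion, the standard library's `sqlite3` can hold integer pairs on disk and look them up through the primary-key index. The table name `kv` and the generated pairs are assumptions. An in-memory database is used only to keep the sketch self-contained.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; use a file path for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key INTEGER PRIMARY KEY, value INTEGER NOT NULL)")

# Insert in bulk inside one transaction; a generator avoids holding all pairs in memory.
pairs = ((k, k * k) for k in range(100_000))
with conn:
    conn.executemany("INSERT INTO kv (key, value) VALUES (?, ?)", pairs)

def lookup(key: int):
    """Return the value stored for `key`, or None when the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

print(lookup(12), lookup(-1))
```

If everything fits in RAM, a plain `dict` is simpler and faster. The database route mainly helps once the data no longer fits.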

young granite Mar 20, 2023, 9:37 AM

#

Hey, im currently trying out sklearns pipeline and wondered if theres an easy way to implement preprocessing and postprocessing into it.

young granite Mar 20, 2023, 9:38 AM

#

plucky bolt Anyone here read this book?

they got bunch of books similar to this and in general i think those are good books, that said i did not read that particular book

foggy yarrow Mar 20, 2023, 10:25 AM

#

Does anyone have experience with extracting data from uneven grid?

sullen flicker Mar 20, 2023, 10:40 AM

#

I got a question regarding data labeling:

I have a few large datasets of customer feedback which I would wish to label for text classification. However, my time and resources are limited, so I do not have the capacity to label the whole dataset. Therefore, I am interested in algorithms that help me to create labels with only a fraction of the data by utilizing unsupervised/semi-supervised techniques and accepting some noise in the labels.

So my questions would be: Which approaches would you recommend? How would you solve this problem? What state of the art algorithms /papers exist on this topic?

young granite Mar 20, 2023, 10:42 AM

#

currently i use functions for preprocessing but would want to store all steps into the pipeline to have a complete "model", but i cant implement the functions directly to the pipeline cause it starts with a df then transformations etc. and i struggle to get the right approach

#

i got n dfs with the shape (300, 2) and transform them into a df where n rows are present and 60+ cols

#

from this df the cols are my features and i got another df with 20 cols which are my targets, n is always the ID of the df in all cases

cold snow Mar 20, 2023, 1:13 PM

#

hello any there

#

i need a small help

serene scaffold Mar 20, 2023, 1:15 PM

#

cold snow i need a small help

don't ask to ask. be sure that you post a complete, answerable question all at once.

cold snow Mar 20, 2023, 1:15 PM

#

ok

#

actually i am creating an AI bot. after importing the csv file i got a msg like this .

#

ValueError Traceback (most recent call last)
<ipython-input-21-6250f5fee32f> in <module>
----> 1 df.sample(6)

1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/sample.py in sample(obj_len, size, replace, weights, random_state)
148 raise ValueError("Invalid weights: weights sum to zero")
149
--> 150 return random_state.choice(obj_len, size=size, replace=replace, p=weights).astype(
151 np.intp, copy=False
152 )

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: a must be greater than 0 unless no samples are taken

#

i am using this csv file

#

📎 zerotwo.csv

#

i am using google colab to create this bot

serene scaffold Mar 20, 2023, 1:20 PM

#

@cold snow df.sample(6) worked for me when I did it with your csv, so you may have inadvertently overwritten your df variable with the wrong thing.

cold snow Mar 20, 2023, 1:21 PM

#

how to correct it

#

actually i am a beginner so i am asking

serene scaffold Mar 20, 2023, 1:21 PM

#

I haven't seen enough of your code to know. try restarting the notebook kernel, and then do df.sample(6) immediately after the df is created.
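
For reference, numpy raises this `ValueError` when it is asked to draw from zero items, so one likely cause (an assumption, since the full notebook was not posted) is that `df` has no rows by the time `df.sample(6)` runs. A minimal sketch of a guard, where `safe_sample` is a hypothetical helper and not part of pandas:

```python
import pandas as pd


def safe_sample(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Sample up to n rows, failing with a clear message if df is empty.

    Hypothetical helper for illustration; not part of pandas.
    """
    if len(df) == 0:
        raise ValueError("DataFrame is empty: check the step that built it")
    # Never ask for more rows than exist (sampling without replacement).
    return df.sample(n=min(n, len(df)))


full = pd.DataFrame({"name": ["a", "b", "c"], "line": ["hi", "hello", "bye"]})
empty = full[full["name"] == "nobody"]  # a filter that matches nothing

print(len(safe_sample(full, 6)))  # at most 3 rows exist
print(len(empty))                 # 0 rows: sample() would have nothing to draw
```

Printing `len(df)` or `df.shape` right after the frame is created is usually enough to confirm or rule out this cause.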

cold snow Mar 20, 2023, 1:22 PM

#

i will try it

#

bro its still the same

#

let me send my ipynb file

arctic wedgeBOT Mar 20, 2023, 1:26 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold Mar 20, 2023, 1:27 PM

#

cold snow let me send my ipynb file

ipynb files are hard to read unless you open a notebook server on your computer, so just copy and paste the code out of it.

cold snow Mar 20, 2023, 1:27 PM

#

ok

#

context window of size 7
n = 7

for i in data[data.name == 'KIRTHIN waifu'].index:
if i < n:
continue
row = []
prev = i - 1 - n # we additionally substract 1, so row will contain current responce and 7 previous responces
for j in range(i, prev, -1):
row.append(data.line[j])
contexted.append(row)

columns = ['response', 'context']
columns = columns + ['context/' + str(i) for i in range(n - 1)]

df = pd.DataFrame.from_records(contexted, columns=columns)

serene scaffold Mar 20, 2023, 1:29 PM

#

cold snow context window of size 7 n = 7 for i in data[data.name == 'KIRTHIN waifu'].inde...

what is that loop intended to do? because you should never be writing loops like that.
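
One way to build the same table without indexing row by row is to stack shifted copies of the column. A minimal sketch, assuming `data` has `name` and `line` columns as in the pasted snippet, a default 0..N-1 index, and no missing values in `line`; `build_context_frame` is a hypothetical name:

```python
import pandas as pd


def build_context_frame(data: pd.DataFrame, speaker: str, n: int = 7) -> pd.DataFrame:
    """Return one row per `speaker` line: the response plus the n previous lines."""
    columns = ["response", "context"] + [f"context/{i}" for i in range(n - 1)]
    # shift(k) aligns each row with the line spoken k rows earlier.
    wide = pd.DataFrame({col: data["line"].shift(k) for k, col in enumerate(columns)})
    # Keep the speaker's rows; dropna() removes rows with fewer than n predecessors.
    return wide[data["name"] == speaker].dropna().reset_index(drop=True)


data = pd.DataFrame(
    {
        "name": ["A", "B", "A", "B", "A", "B"],
        "line": ["l0", "l1", "l2", "l3", "l4", "l5"],
    }
)
df = build_context_frame(data, speaker="B", n=2)
print(df)
```

If no row matches `speaker`, the result is an empty frame, which would also make a later `df.sample(6)` fail.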

cold snow Mar 20, 2023, 1:30 PM

#

actually its a pre made command line

#

i am editing and using it

#

and the context is the problem i think

#

@serene scaffold are you there bro

cold snow Mar 20, 2023, 2:00 PM

#

@serene scaffold

#

bro

#

anybody help

#

me

#

@barren otter

serene scaffold Mar 20, 2023, 2:21 PM

#

cold snow <@107226564030156800>

please don't ping random people. I'm busy right now with work.

cold snow Mar 20, 2023, 2:21 PM

#

ok

merry fern Mar 20, 2023, 2:49 PM

#

cold snow context window of size 7 n = 7 for i in data[data.name == 'KIRTHIN waifu'].inde...

Can you format your code using ```

cold snow Mar 20, 2023, 2:50 PM

#

bro i cant understand

#

pls explain clearly

merry fern Mar 20, 2023, 2:51 PM

#

cold snow pls explain clearly

https://pythondiscord.com/pages/asking-good-questions/

Python Discord - Asking Good Questions

A guide for how to ask good questions in our community.

#

```py
n = 7

for i in data[data.name == 'KIRTHIN waifu'].index:
  if i < n:
    continue
  row = []
  prev = i - 1 - n  # we additionally subtract 1, so row will contain the current response and 7 previous responses
  for j in range(i, prev, -1):
    row.append(data.line[j])
  contexted.append(row)

columns = ['response', 'context']
columns = columns + ['context/' + str(i) for i in range(n - 1)]

df = pd.DataFrame.from_records(contexted, columns=columns)
```

cold snow Mar 20, 2023, 3:11 PM

#

well yes

#

😅

#

!paste

cold snow Mar 20, 2023, 3:28 PM

#

@serene scaffold bro i fixed my problem

#

but again new one appeared that is

serene scaffold Mar 20, 2023, 3:28 PM

#

cold snow but again new one appeared that is

that's how debugging goes. getting a new error is progress.

cold snow Mar 20, 2023, 3:28 PM

#

create dataset suitable for our model
def construct_conv(row, tokenizer, eos = True):
flatten = lambda l: [item for sublist in l for item in sublist]
conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
conv = flatten(conv)
return conv

class ConversationDataset(Dataset):
def init(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

    block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

    directory = args.cache_dir
    cached_features_file = os.path.join(
        directory, args.model_type + "_cached_lm_" + str(block_size)
    )

    if os.path.exists(cached_features_file) and not args.overwrite_cache:
        logger.info("Loading features from cached file %s", cached_features_file)
        with open(cached_features_file, "rb") as handle:
            self.examples = pickle.load(handle)
    else:
        logger.info("Creating features from dataset file at %s", directory)

        self.examples = []
        for _, row in df.iterrows():
            conv = construct_conv(row, tokenizer)
            self.examples.append(conv)

        logger.info("Saving features into cached file %s", cached_features_file)
        with open(cached_features_file, "wb") as handle:
            pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

def __len__(self):
    return len(self.examples)

def __getitem__(self, item):
    return torch.tensor(self.examples[item], dtype=torch.long)

serene scaffold Mar 20, 2023, 3:28 PM

#

cold snow create dataset suitable for our model def construct_conv(row, tokenizer, eos = T...

!code

arctic wedgeBOT Mar 20, 2023, 3:28 PM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

serene scaffold Mar 20, 2023, 3:28 PM

#

be sure to always use this from now on ^

cold snow Mar 20, 2023, 3:29 PM

#

ok

#

!code

serene scaffold Mar 20, 2023, 3:29 PM

#

arctic wedge

you have to read this message

cold snow Mar 20, 2023, 3:30 PM

#

ok

lapis sequoia Mar 20, 2023, 3:30 PM

#

Hello chat

cold snow Mar 20, 2023, 3:30 PM

#

i used pastebin but how to use that in here

lapis sequoia Mar 20, 2023, 3:30 PM

#

Woah, CS grad?

cold snow Mar 20, 2023, 3:30 PM

#

merry fern ```py n = 7 for i in data[data.name == 'KIRTHIN waifu'].index: if i < n: ...

like this

lapis sequoia Mar 20, 2023, 3:30 PM

#

What college?

serene scaffold Mar 20, 2023, 3:31 PM

#

cold snow i used pastebin but how to use that in here

it says so in the message from the Python bot

serene scaffold Mar 20, 2023, 3:31 PM

#

arctic wedge

this message

serene scaffold Mar 20, 2023, 3:31 PM

#

lapis sequoia What college?

an east coast US one

lapis sequoia Mar 20, 2023, 3:32 PM

#

Woah

#

Can you show me your setup

#

I have so many questions

#

I'm planning to study CS in the future

#

Right now just focusing on school and a little bit of programming when I have time

serene scaffold Mar 20, 2023, 3:33 PM

#

lapis sequoia Right now just focusing on school and a little bit of programming when I have ti...

you're in nepal, right? I don't really know how it works in nepal.

#

cool flag though 🇳🇵

#

try asking in #career-advice if there are any developers in india or nepal who know what to do

lapis sequoia Mar 20, 2023, 3:34 PM

#

serene scaffold you're in nepal, right? I don't really know how it works in nepal.

I'm planning to do IB so I can go abroad

#

I don't see much potential here

cold snow Mar 20, 2023, 3:34 PM

#

NameError Traceback (most recent call last)
<ipython-input-12-a654172287f5> in <module>
6 return conv
7
----> 8 class ConversationDataset(Dataset):
9 def init(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):
10

<ipython-input-12-a654172287f5> in ConversationDataset()
7
8 class ConversationDataset(Dataset):
----> 9 def init(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):
10
11 block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

NameError: name 'PreTrainedTokenizer' is not defined

#

bro this is the error i got

lapis sequoia Mar 20, 2023, 3:35 PM

#

Code

cold snow Mar 20, 2023, 3:35 PM

#

any way to solve it

lapis sequoia Mar 20, 2023, 3:35 PM

#

Send code

cold snow Mar 20, 2023, 3:35 PM

#

lapis sequoia Send code

above

serene scaffold Mar 20, 2023, 3:35 PM

#

cold snow NameError Traceback (most recent call last) <ipy...

you never imported PreTrainedTokenizer, so you'll have to figure out what that is and where to import it from.

cold snow Mar 20, 2023, 3:36 PM

#

how to import it

serene scaffold Mar 20, 2023, 3:36 PM

#

an import statement. but you have to figure out where it's located.

#

you must have other import statements in your code to use as an example

cold snow Mar 20, 2023, 3:36 PM

#

i c

#

let me try, thanks again

lapis sequoia Mar 20, 2023, 3:38 PM

#

serene scaffold you never imported `PreTrainedTokenizer`, so you'll have to figure out what that...

Dataset?

#

He hasn't defined Dataset either

odd steeple Mar 20, 2023, 3:40 PM

#

How to get started in developing an ai model like chatgpt

serene scaffold Mar 20, 2023, 3:42 PM

#

odd steeple How to get started in developing an ai model like chatgpt

are you okay with the model you create being orders of magnitude less impressive than ChatGPT?

odd steeple Mar 20, 2023, 3:42 PM

#

serene scaffold are you okay with the model you create being orders of magnitude less impressive...

I just wanna do it for the fun

serene scaffold Mar 20, 2023, 3:44 PM

#

odd steeple I just wanna do it for the fun

look into how to create a basic language model. because language models are what ChatGPT is based on.

odd steeple Mar 20, 2023, 3:44 PM

#

Language models means like NLP? Just asking

lapis sequoia Mar 20, 2023, 3:45 PM

#

LMAO WHY DO I HAVE THIS

serene scaffold Mar 20, 2023, 3:45 PM

#

odd steeple Language models means like NLP? Just asking

nlp is the part of AI that deals with natural language. so language models are part of nlp, in the same way that addition is part of math.

cold snow Mar 20, 2023, 3:47 PM

#

stelercus bro i solved tokenizer error also

#

i reloaded kernel and forgot to install pip transformers and install tokenizer

#

reloaded

#

ran the command in the runtime and the problem is solved

odd steeple Mar 20, 2023, 4:00 PM

#

serene scaffold nlp is the part of AI that deals with natural language. so language models are p...

Can you do it in python or some other language

serene scaffold Mar 20, 2023, 4:03 PM

#

odd steeple Can you do it in python or some other language

pretty much all of nlp is done in python, but conceptually, you can use any language you want.

odd steeple Mar 20, 2023, 4:14 PM

#

serene scaffold pretty much all of nlp is done in python, but conceptually, you can use any lang...

Okay I'm gonna try it and reach you back

serene scaffold Mar 20, 2023, 4:14 PM

#

odd steeple Okay I'm gonna try it and reach you back

sure, but I only really have time to answer very specific questions.

odd steeple Mar 20, 2023, 4:15 PM

#

I understood brother

heavy crow Mar 20, 2023, 6:03 PM

#

Do you guys know of any papers on caption to image search? Along the lines of clip or lit but if possible a bit more efficient and accurate ofc haha

#

I feel like lit, clip, align, blip, blip2 are all more about captioning and classification whereas I am looking only for caption to image search

#

Please ping me if you know any cool research:)

lapis sequoia Mar 20, 2023, 6:48 PM

#

Does anyone here know how to use OpenAI gym library in python for reinforcement learning? I've been having trouble using an environment to train an AI on.

storm geyser Mar 20, 2023, 10:11 PM

#

anyone want to chat and code

hasty mountain Mar 20, 2023, 11:22 PM

#

lapis sequoia Does anyone here know how to use OpenAI gym library in python for reinforcement ...

The idea is basically this:

import retro
import time
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines.common import set_global_seeds
from stable_baselines.common.callbacks import CheckpointCallback
from wrapper import wrapper

# Pre-saved states
states = ["ChunLiVsBlanka.1star", "ChunLiVsBalrog.1star", "ChunLiVsBison.1star", "ChunLiVsChunLi.1star", "ChunLiVsDhalsim.1star",
            "ChunLiVsGuille.1star", "ChunLiVsHonda.1star", "ChunLiVsKen.1star", "ChunLiVsRyu.1star", "ChunLiVsSagat.1star", "ChunLiVsVega.1star",
            "ChunLiVsZahgief.1star"]

env = retro.make(game="StreetFighterIISpecialChampionEdition-Genesis", state="ChunLiVsBlanka.1star")
#env = retro.make(game="DonkeyKongCountry2-Snes")
env = wrapper(env)
#model = PPO2.load("D:/Python/Projects/Hakisa/rl_model_1000000_steps")

# Checkpoint callback: it has to exist before model.learn() receives it
checkpoint = CheckpointCallback(save_freq=100000, save_path="D:/Python/Projects/Hakisa/Donkey_Kong")

obs = env.reset()
total_reward = []
steps = 0
done = False

model = PPO2(policy="CnnPolicy", env=env, gamma=0.99, n_steps=64, learning_rate=3e-9, vf_coef=0.5, verbose=1)
start = time.time()
model.learn(total_timesteps=1000000000, log_interval=100, reset_num_timesteps=True, callback=checkpoint)

# Play one episode with the trained policy
while not done:
    env.render()
    action, state = model.predict(obs)
    obs, reward, done, info = env.step(action)
    #steps += 1
    total_reward.append(reward)
    time.sleep(0.05)

# Don't use these
#env.render()
#env.close()

end = time.time()

print("Duration (hours): ", (end-start)/3600)
print(f"Total Reward: {sum(total_reward)}")

#

Also, env.render() and env.close() have some bugs. If you call env.close() it'll simply close your window directly...but when you call env.render(), it'll already render the game window and then close it.

#

Oh yes... I forgot to mention...this one is gym retro, which is more focused on retro games...but I suppose the original gym might have an idea that is close to this.

lapis sequoia Mar 20, 2023, 11:37 PM

#

thanks so much!

#

this helped alot!

gilded bobcat Mar 21, 2023, 2:25 AM

#

Hello my DS friends, curious on your take:

Would you use a feature selection technique if you already have regularization in your model?

thorn swift Mar 21, 2023, 2:30 AM

#

gilded bobcat Hello my DS friends, curious on your take: Would you do a feature selection te...

why not?

gilded bobcat Mar 21, 2023, 2:31 AM

#

thorn swift why not?

I guess best case scenario it'll make your model faster but won't improve it over just regularization (cause regularization would have dropped the same variables anyway!), at worst you'll over generalize and make a worse model... This is my guess tho

#

I guess I'll do both and report back haha

plucky bolt Mar 21, 2023, 3:38 AM

#

Anyone here use anaconda?

thorn swift Mar 21, 2023, 3:38 AM

#

plucky bolt Anyone here use anaconda?

i used to, i remember it looked cool

plucky bolt Mar 21, 2023, 3:39 AM

#

thorn swift i used to, i remember it looked cool

I am trying to install it on my laptop and gave it a non-default install location. Think it can cope with that? I mean it gave me the choice! lol

thorn swift Mar 21, 2023, 3:40 AM

#

it shouldn't be a problem

plucky bolt Mar 21, 2023, 3:51 AM

#

thorn swift it shouldn't be a problem

so far it seems to have all installed fine. Thanks 🙂

mint palm Mar 21, 2023, 4:18 AM

#

scikit learn
how do i pronounce it, i heard multiple pronunciations, what do you use?

#

have heard*

thorn swift Mar 21, 2023, 4:50 AM

#

mint palm scikit learn how do i pronounce it, i heard multiple pronunciations, what do you...

"sci" (like science) - "kit" (like kitkat) - "learn" (like learn)

mint palm Mar 21, 2023, 4:59 AM

#

some people say skeeet learn

thorn swift Mar 21, 2023, 5:00 AM

#

yea thats probably how it was meant

soft badge Mar 21, 2023, 12:00 PM

#

Guys what is the path for be expert in IA?

odd meteor Mar 21, 2023, 12:47 PM

#

plucky bolt Anyone here use anaconda?

I do 👀

lapis sequoia Mar 21, 2023, 12:48 PM

#

so for some fun AI stuff im currently working on a SC2 AI for their deep learning ladder and i have gotten this error

"c:/Users/Redux/Documents/python VB/hi.py"
Traceback (most recent call last):
  File "c:\Users\Redux\Documents\python VB\hi.py", line 1, in <module>
    from pysc2.agents import base_agent
  File "C:\Users\Redux\AppData\Local\Programs\Python\Python311\Lib\site-packages\pysc2\agents\base_agent.py", line 20, in <module>
    from pysc2.lib import actions
  File "C:\Users\Redux\AppData\Local\Programs\Python\Python311\Lib\site-packages\pysc2\lib\actions.py", line 27, in <module>
    from s2clientprotocol import spatial_pb2 as sc_spatial
  File "C:\Users\Redux\AppData\Local\Programs\Python\Python311\Lib\site-packages\s2clientprotocol\spatial_pb2.py", line 16, in <module>
    from s2clientprotocol import common_pb2 as s2clientprotocol_dot_common__pb2
  File "C:\Users\Redux\AppData\Local\Programs\Python\Python311\Lib\site-packages\s2clientprotocol\common_pb2.py", line 32, in <module>
    _descriptor.EnumValueDescriptor(
  File "C:\Users\Redux\AppData\Local\Programs\Python\Python311\Lib\site-packages\google\protobuf\descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

not sure if this is the right place to post this but do you guys have any insight on how to correct this error?
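
The second workaround listed in the error message can be applied from Python itself, as long as the variable is set before anything imports protobuf. A minimal sketch, assuming it sits at the very top of the script; the first workaround (installing protobuf 3.20.x or lower) avoids the slower pure-Python parser instead:

```python
import os

# Must run before pysc2 / s2clientprotocol (and thus google.protobuf) is imported.
# This selects the pure-Python protobuf implementation, which the error message
# notes is much slower than the default one.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# from pysc2.agents import base_agent  # import only after the variable is set
print(os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"])
```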

odd meteor Mar 21, 2023, 12:48 PM

#

soft badge Guys what is the path for be expert in IA?

AI you mean?

serene scaffold Mar 21, 2023, 12:57 PM

#

mint palm scikit learn how do i pronounce it, i heard multiple pronunciations, what do you...

sigh-kit-learn

#

the sci is the same as science, and I'm not aware of a variety of English where the "c" is pronounced.

lapis sequoia Mar 21, 2023, 12:58 PM

#

English is a very weird language for sure haha

serene scaffold Mar 21, 2023, 12:58 PM

#

lapis sequoia English is a very weird language for sure haha

it's actually not uniquely weird. every language has irregularity.

lapis sequoia Mar 21, 2023, 1:00 PM

#

I couldn't speak to that as i only speak English xD I just know the weird things with English, of which there are a lot, for example ate and eight, knife, science and some other examples of English being interesting

rough lava Mar 21, 2023, 1:23 PM

#

Hey hey people
Anyone here has any experience with bias (e.g. gender bias) in text based data ?
Looking to build a binary classifier to detect bias as a first step
So here are my 3 questions

Could anyone suggest any open source annotated/labelled datasets? (labels --> 1: biased, 2: non-biased)
What other methods would you recommend if any?
In conjunction with 2, I know of word embeddings but never actually used them. Are they implemented/trained with NN mostly ?

Sorry for the long post ^^

P.S. Using python/anaconda and mostly interested of 1)

serene scaffold Mar 21, 2023, 1:23 PM

#

lapis sequoia I couldn't speak to that as i only speak English so xD I just know the weird thi...

these are cherry-picked examples that only pertain to spelling. and spelling isn't the only property of a language that can be irregular. (in some senses, spelling isn't even a property of the language at all.)

soft badge Mar 21, 2023, 1:33 PM

#

odd meteor AI you mean?

yeah, because in brasil we say IA

serene scaffold Mar 21, 2023, 1:41 PM

#

soft badge yeah, because in brasil we say IA

people are more likely to know what you mean in this server if you say AI

soft badge Mar 21, 2023, 1:42 PM

#

okay

#

yeah

#

so I haven't studied AI in depth yet

#

but my goal is to create automations with AI.

#

either with computer vision or even within some software.

#

what do you recommend studying on the internet?

wooden sail Mar 21, 2023, 1:49 PM

#

if you really wanna be an expert, you should start by learning the basics of python and learning math

#

AI is math, and the more math you know, the better stuff you'll be able to do with it. separately, python is currently the most popular language for AI due to the large community and available modules, among other things

rough lava Mar 21, 2023, 1:55 PM

#

soft badge yeah, because in brasil we say IA

there are plenty of courses for you to start with, at low cost (e.g. edx ones ~300) or even completely free (e.g. MIT uploads lots to youtube)
You could check those courses and, if interested, then delve deeper into math as @wooden sail said, for strong foundations. (Math will help you remove the feeling of wondering "why" in most cases of AI)

soft badge Mar 21, 2023, 1:57 PM

#

wooden sail if you really wanna be an expert, you should start by learning the basics of pyt...

uhum... thanks Edd

serene scaffold Mar 21, 2023, 1:58 PM

#

soft badge uhum... thanks Edd

why the hesitation?

soft badge Mar 21, 2023, 1:58 PM

#

rough lava there are plenty of courses for you to start, with small to little cost (e.g. ed...

uhum.. thanks Chonky

serene scaffold Mar 21, 2023, 1:58 PM

#

getting AI advice from Edd is like getting a million dollars, or something.

wooden sail Mar 21, 2023, 1:58 PM

#

maybe that's the regional variant of mhm

soft badge Mar 21, 2023, 1:58 PM

#

it's my way of understanding

serene scaffold Mar 21, 2023, 1:58 PM

#

interesting

soft badge Mar 21, 2023, 2:00 PM

#

but it's also important to know about communication between systems, so you know how to pull data from a given system and apply a right model?

wooden sail Mar 21, 2023, 2:01 PM

#

that's also true. there's layers to AI, and that's on the system design level. you may or may not have to deal with that at all depending on which part of AI you want to focus on

#

for very large tasks, it's not just one person. you have people building a pipeline, people doing math, people coding the models, people sifting through data, etc

#

it's not realistic for one person to do all of these for a large model, but they can be good skills to have. in that sense, getting familiar with databases (something i've never done in my life) and generally with linux (because everything runs on linux) are good ideas

soft badge Mar 21, 2023, 2:04 PM

#

oh yeah

rough lava Mar 21, 2023, 2:05 PM

#

do you have lots of working experience ? 😮 @wooden sail

wooden sail Mar 21, 2023, 2:05 PM

#

depends on what you call work experience 😛

weary flint Mar 21, 2023, 2:05 PM

#

wooden sail depends on what you call work experience 😛

love ur pfp

soft badge Mar 21, 2023, 2:06 PM

#

wooden sail that's also true. there's layers to AI, and that's on the system design level. y...

but in my head I think, for example, I don't know if it is possible to do this, but my idea, for example, is to create software with AI that has automation

rough lava Mar 21, 2023, 2:06 PM

#

wooden sail it's not realistic for one person to do all of these for a large model, but they...

^^'
Asking cuz I feel the same about dbs

wooden sail Mar 21, 2023, 2:07 PM

#

i do signal processing stuff, which usually has me dealing with the math part, but not necessarily handling data

rough lava Mar 21, 2023, 2:07 PM

#

oh knife-edge model on matlab and stuff like that?

wooden sail Mar 21, 2023, 2:09 PM

#

sure. i've never explicitly used that diffraction model, but similar stuff

rough lava Mar 21, 2023, 2:12 PM

#

Interesting and love reading code about those things
But I prefer to stay away from math parts if possible
Nlp seems fun for the time being, except when I need to hunt for datasets...

odd meteor Mar 21, 2023, 2:13 PM

#

soft badge Guys what is the path for be expert in IA?

Programing ==> Python
Theory ==> Mathematics and Statistics
Domain Knowledge ==> Dependent on your area of specialty, the industry you work, etc

soft badge Mar 21, 2023, 2:14 PM

#

deep, its big truly

#

sometimes i stay just on language

wooden sail Mar 21, 2023, 2:16 PM

#

if you're in HS, different math topics might be more accessible than others

#

many schools cover calculus toward the end, but not much on statistics and linear algebra

#

the good thing is that basic probability and a lot of linalg are independent from most other stuff you learn in school, so you could jump right into them

odd meteor Mar 21, 2023, 2:20 PM

#

rough lava Interesting and love reading code about those things But I prefer to stay away f...

Since you find NLP fun, do you like reading the theory part of NLP?

rough lava Mar 21, 2023, 2:21 PM

#

At first I did, but due to having to read papers most of the time, not anymore

#

at least not in volumes
if I have the time to spread them more in my day, yeah depending on the NLP topic

violet gull Mar 21, 2023, 5:17 PM

#

Can I train an image classification cnn on 60 images per class?

agile cobalt Mar 21, 2023, 5:19 PM

#

depends on the model, whether you are planning to train from scratch or fine-tune, as well as how easy your images are to tell apart

violet gull Mar 21, 2023, 5:48 PM

#

agile cobalt depends on the model, whether you are planning to train from scratch or fine tu...

AlexNet

#

So decently big

olive stone Mar 21, 2023, 6:03 PM

#

Hey, when creating an ML model can the validation data be the same as the training data?

wooden sail Mar 21, 2023, 6:12 PM

#

olive stone Hey, when creating an ML model can the validation data be the same as the traini...

that's usually a pretty bad idea

#

there are recent papers showing that some neural network architectures reach zero training loss under mild conditions, and this in general says nothing about how well the network generalizes

#

it's a recipe for overfitting

olive stone Mar 21, 2023, 6:14 PM

#

I see, then having separate data for validation is better?

wooden sail Mar 21, 2023, 6:14 PM

#

i'd say necessary, not just better, if you want a useful network outside of the training data

#

unless you only ever need the network to work on the training data. there are special cases where this makes sense

agile cobalt Mar 21, 2023, 6:16 PM

#

there are even recommendations to have a third data set to see how the network will work with completely unseen data after you reach a final model
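
The three-set idea can be sketched with the standard library alone. This is only an illustration: `three_way_split` and the 60/20/20 proportions are assumptions, not something from the discussion.

```python
import random


def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]                  # touched only once, at the very end
    val = shuffled[n_test:n_test + n_val]     # used to tune and compare models
    train = shuffled[n_test + n_val:]         # the only part the model is fit on
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Because the three lists never share an item, a good validation or test score cannot come from memorizing training examples.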

empty inlet Mar 21, 2023, 6:24 PM

#

Hello, good afternoon

#

how to solve this inequality with sympy? 4 <= 3x - 2 < 13

#

tried a lot of solvers and also can't find an example on the internet

wooden sail Mar 21, 2023, 6:28 PM

#

hmm

#

oops

#

ugh just do two inequalities and find the intersection lol

agile cobalt Mar 21, 2023, 6:31 PM

#

iirc pandas also requires you to use (a < b) & (b < c) (or use methods like .between) instead of supporting a < b < c, I guess that the way python handles a == b == c, a < b < c etc doesn't allow as much customisation
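
The reason is that Python evaluates `a < b < c` as `(a < b) and (b < c)`, and `and` needs a single True/False from its left operand, which an element-wise result cannot give. A small sketch with a hypothetical `Elementwise` class standing in for a pandas Series (it is not a pandas type):

```python
class Elementwise:
    """Toy stand-in for an array type whose comparisons work element by element."""

    def __init__(self, values):
        self.values = list(values)

    def __lt__(self, other):   # self < other
        return Elementwise(v < other for v in self.values)

    def __gt__(self, other):   # self > other; also used for `other < self`
        return Elementwise(v > other for v in self.values)

    def __and__(self, other):  # the & operator can be overloaded...
        return Elementwise(a and b for a, b in zip(self.values, other.values))

    def __bool__(self):        # ...but `and` calls bool(), which must be one value
        raise ValueError("truth value of an element-wise result is ambiguous")


b = Elementwise([1, 5, 9])

both = (b > 2) & (b < 8)
print(both.values)  # [False, True, False]

try:
    2 < b < 8          # expands to (2 < b) and (b < 8)
except ValueError as err:
    print("chained comparison failed:", err)
```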

wooden sail Mar 21, 2023, 6:32 PM

#

!e

from sympy import solve_univariate_inequality, Symbol
x = Symbol('x')
s1 = solve_univariate_inequality(4 <= 3*x - 2, x)
s2 = solve_univariate_inequality(3*x - 2 < 13, x)
print(s1 & s2)

arctic wedgeBOT Mar 21, 2023, 6:32 PM

#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

(-oo < x) & (2 <= x) & (x < 5) & (x < oo)

wooden sail Mar 21, 2023, 6:32 PM

#

i thought it would simplify it, somehow

agile cobalt Mar 21, 2023, 6:32 PM

#

oo? is that their infinite representation?

wooden sail Mar 21, 2023, 6:32 PM

#

when printing, yeah

empty inlet Mar 21, 2023, 6:33 PM

#

nice solution... I was trying to solve it all together.... thank you for the help. Really appreciate it

wooden sail Mar 21, 2023, 6:33 PM

#

i guess we can pass the parameter extended_real = False to get rid of the infinities

#

!e

from sympy import solve_univariate_inequality, Symbol
x = Symbol('x')
s1 = solve_univariate_inequality(4 <= 3*x - 2, x, extended_real=False)
s2 = solve_univariate_inequality(3*x - 2 < 13, x, extended_real=False)
print(s1 & s2)

arctic wedgeBOT Mar 21, 2023, 6:35 PM

#

@wooden sail :x: Your 3.11 eval job has completed with return code 1.

001 | Traceback (most recent call last):
002 |   File "/home/main.py", line 3, in <module>
003 |     s1 = solve_univariate_inequality(4 <= 3*x - 2, x, extended_real=False)
004 |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
005 | TypeError: solve_univariate_inequality() got an unexpected keyword argument 'extended_real'

wooden sail Mar 21, 2023, 6:35 PM

#

hmmm

#

the API is kinda bad for this, the argument is there for rational polynomials. oh well, no matter

empty inlet Mar 21, 2023, 6:39 PM

#

it's very tricky to get without documentation... and I also have a lot of books here. Not one example like that

wooden sail Mar 21, 2023, 6:40 PM

#

https://docs.sympy.org/latest/modules/solvers/inequalities.html

#

the docs are there, i used the last example to cook something up

empty inlet Mar 21, 2023, 6:43 PM

#

in the book's solution the interval is closed at 2 and open at 5

#

[2, 5)

wooden sail Mar 21, 2023, 6:43 PM

#

arctic wedge @wooden sail :white_check_mark: Your 3.11 eval job has completed with r...

yeah, same as here
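
To get the answer directly as an interval instead of a chain of relations, `solve_univariate_inequality` accepts `relational=False`, and the two solution sets can then be intersected. A short sketch of that variation on the eval snippet earlier in this thread:

```python
from sympy import Interval, Symbol, solve_univariate_inequality

x = Symbol("x", real=True)

# Each call returns a set (an Interval) rather than a relational expression.
left = solve_univariate_inequality(4 <= 3 * x - 2, x, relational=False)
right = solve_univariate_inequality(3 * x - 2 < 13, x, relational=False)

solution = left.intersect(right)
print(solution)  # Interval.Ropen(2, 5), i.e. [2, 5)
```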

junior sun Mar 21, 2023, 6:45 PM

#

does anyone know any alternative to GPT 3 cuz i used all of my tokens and uh let's say i have some paypal issues

wooden sail Mar 21, 2023, 6:45 PM

#

alternative in what sense?

junior sun Mar 21, 2023, 6:46 PM

#

basically it can do the same stuff but it's free

#

i'm fairly new to AI and stuff so i just want to try it out and see if i can use it in one of my projects

#

so

#

i don't want to pay rn

#

i know that this isn't about openai and gpt but i thought this might be the place where i can find people who have knowledge about this kind of stuff

agile cobalt Mar 21, 2023, 6:50 PM

#

perhaps chatgpt or bloom

#

that thing?

#

does not sound like the weights are public
~~and let's not talk about leaked models~~

olive stone Mar 21, 2023, 6:54 PM

#

When dealing with a dataset of images, is it always recommended to apply normalization on the images?

hasty mountain Mar 21, 2023, 7:07 PM

#

wooden sail AI is math, and the more math you know, the better stuff you'll be able to do wi...

AI is math
To think I don't like complex math, and I've chosen exactly the programming field with most math grumpchib

hasty mountain Mar 21, 2023, 7:08 PM

#

wooden sail for very large tasks, it's not just one person. you have people building a pipel...

||As an intern researcher, I prefer to consider there's no excuse to not do everything...because that might be the case yert ||

iron basalt Mar 21, 2023, 7:31 PM

#

hasty mountain *AI is math* To think I don't like complex math, and I've chosen exactly the pro...

I think you will find (and this is entirely subjective), that the more interesting a programming field is, the more math is probably involved (there are other kinds of interesting, such as designing a beautiful web page, but since this is programming i'm assuming a specific type of interest).

wooden sail Mar 21, 2023, 7:32 PM

#

this is your daily reminder that CS is originally a branch of mathematics

#

the compute in computer science is from computability theory

iron basalt Mar 21, 2023, 7:32 PM

#

wooden sail this is your daily reminder that CS is originally a branch of mathematics

If only you could get a math major with it...

#

(Also it makes the "science" in "computer science" extra wrong)

#

("computer math?")

wooden sail Mar 21, 2023, 7:35 PM

#

computer meth

warm copper Mar 21, 2023, 7:41 PM

#

hello everyone

#

does anyone know pyspark here

#

I can't seem to get pyspark to work with pycharm

#

reee

serene scaffold Mar 21, 2023, 7:44 PM

#

warm copper I can't seem to get pyspark to work with pycharm

you have to give more information to get help. what did you try to do, what did you expect to happen, and what happened instead? (and if you show code or error messages--which is strongly encouraged--don't post them as screenshots.)

warm copper Mar 21, 2023, 7:45 PM

#

#1087821537991729292 message

#

its in this channel @serene scaffold

weak briar Mar 21, 2023, 7:48 PM

#

Hi all, I'm having problems with a task I was given for a course. For context, I'm a physics student in their first semester, and I'm taking a course to be able to use python in my branch. I'm afraid that the task is a bit above my level in physics though, so I'm not sure how I should continue... I've plotted the data, but I don't know how I could continue with the rest of the task- so if anyone could at least help me with a periodogram, I'd be really thankful!! MafuBow

short heart Mar 21, 2023, 9:17 PM

#

I need help with pandas ASAP, I would appreciate help. Basically I have df with 3 cols: userID, value, itemID. I need to do the following: group by user, pick the biggest value and assign itemID, which corresponds to value to all userIDs. How can I do this?
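
A minimal sketch of one way to read the request above, assuming it means "for each user, find the row with the largest value and copy that row's itemID onto every row of that user". The sample data and the `top_itemID` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with the three columns from the question.
df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 2],
    "value": [0.2, 0.9, 0.5, 0.1, 0.4],
    "itemID": ["a", "b", "c", "d", "e"],
})

# For each user, the row label where "value" is largest.
best_rows = df.groupby("userID")["value"].idxmax()

# Turn those row labels into the itemID stored on each of those rows.
best_item = best_rows.map(df["itemID"])

# Broadcast the winning itemID back onto every row of the same user.
df["top_itemID"] = df["userID"].map(best_item)
```

If ties matter, note that `idxmax` keeps the first row with the maximum value.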

serene scaffold Mar 21, 2023, 9:54 PM

#

short heart I need help with pandas ASAP, I would appreciate help. Basically I have df with ...

to get help ASAP, please do print(df.sample(10).to_dict('list')) and put the text (no screenshots) in the chat in your next message.

short heart Mar 21, 2023, 9:54 PM

#

serene scaffold to get help ASAP, please do `print(df.sample(10).to_dict('list'))` and put the t...

doesnt matter now, solved it

serene scaffold Mar 21, 2023, 9:55 PM

#

short heart doesnt matter now, solved it

for future reference, copy-and-pastable dataframe examples increase the likelihood that you'll get help fast.

charred light Mar 21, 2023, 9:58 PM

#

I have a 1 x n df (n is always going to be even). Columns are named column1_null_count, column1_not_null_count, column2_null_count, column2_not_null_count, ...

How can I turn this into:
column1 | column1_null_count | column1_not_null_count
column2 | column2_null_count | column2_not_null_count

Without having to do some kind of looping and split on '_'

serene scaffold Mar 21, 2023, 9:59 PM

#

charred light I have a 1 x n df (n is always going to be even). Columns are named column1_null...

see my above messages about copy-and-pastable examples.

#

but do print(df.T.head(10).to_dict('list'))

#

you also don't want to have dataframes where the number of columns is the one that varies.

#

please ping me if/when you do that.

boreal gale Mar 21, 2023, 10:05 PM

#

charred light I have a 1 x n df (n is always going to be even). Columns are named column1_null...

ha managed to sort your null count query out eh?

i would take the underlying numpy values, reshape and assign another column for the extra column name

charred light Mar 21, 2023, 10:06 PM

#

boreal gale ha managed to sort your null count query out eh? i would take the underlying nu...

Unfortunately, had to hard code it with some help of python generating the query.

#

Essentially ended up as a large sum(case when XXXX is null then 1 else 0 end) XXXX_count_nulls, count(XXXX) as XXXX_count_not_nulls for each column. It's not great, but no way around it. Ends up as a 1 x n df.
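
A rough sketch of how such a query could be generated from a list of column names, in case it helps anyone reading along. The function name, table name, and column names are hypothetical, and identifiers are pasted in unescaped, so this assumes trusted column names.

```python
def null_count_query(table, columns):
    """Build one SELECT that returns null / not-null counts for every column."""
    parts = []
    for col in columns:
        # Rows where the column is NULL.
        parts.append(f"sum(case when {col} is null then 1 else 0 end) as {col}_count_nulls")
        # count(col) skips NULLs, so this is the not-null count.
        parts.append(f"count({col}) as {col}_count_not_nulls")
    return "select " + ", ".join(parts) + f" from {table}"


query = null_count_query("my_table", ["col1", "col2"])
print(query)
```

The result is a single row with two values per column, which is the 1 x n shape described above.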

boreal gale Mar 21, 2023, 10:08 PM

#

" If it's stupid but it works, it's not stupid " 🤷

charred light Mar 21, 2023, 10:09 PM

#

serene scaffold but do `print(df.T.head(10).to_dict('list'))`

This shows up as. {0: [0, 10, 0, 10, 1, 9, 3, 7, 10, 0]}

#

More like I already spent an hour googling and couldn't find a built in function w/ SQL that does that.

#

My other option was to pull the entire table, but that takes longer.

serene scaffold Mar 21, 2023, 10:10 PM

#

charred light This shows up as. ` {0: [0, 10, 0, 10, 1, 9, 3, 7, 10, 0]} `

do print(df.head().to_dict()) instead.

boreal gale Mar 21, 2023, 10:11 PM

#

i would take the underlying numpy values, reshape and assign another column for the extra column name
meaning something like

import pandas as pd
df = pd.DataFrame({'col_1_a': [1], 'col_1_b': [1], 'col_2_a': [2], 'col_2_b': [3]})
pd.DataFrame(df.values.reshape(2,-1), columns=['a', 'b']).assign(column=df.columns[::2].str.rsplit('_', n=1).str[0])

(i am all ears for a neat actual pandas solution if anyone has one 🙏 )

serene scaffold Mar 21, 2023, 10:12 PM

#

I might have one once I get the print result 😛

charred light Mar 21, 2023, 10:14 PM

#

serene scaffold do `print(df.head().to_dict())` instead.

{'col1_a' : {0: 0}, 'col1_b': {0: 10}, 'col2_a' : {0: 0}, 'col2_b': {0: 10}, 'col3_a' : {0: 1}, 'col3_b': {0: 9}, ...}

serene scaffold Mar 21, 2023, 10:14 PM

#

charred light {'col1_a' : {0: 0}, 'col1_b': {0: 10}, 'col2_a' : {0: 0}, 'col2_b': {0: 10}, 'co...

thanks, one moment.

charred light Mar 21, 2023, 10:17 PM

#

boreal gale > i would take the underlying numpy values, reshape and assign another column fo...

Well, df.values.reshape(-1,2) I think brings me 90% there, I just need to add in the index w/ the column name.

#

Nvm, above works if I swapped the reshape.

serene scaffold Mar 21, 2023, 10:22 PM

#

@charred light

In [23]: df
Out[23]:
   col1_a  col1_b  col2_a  col2_b  col3_a  col3_b
0       0      10       0      10       1       9

In [24]: df2 = df.T.reset_index()

In [25]: df2
Out[25]:
    index   0
0  col1_a   0
1  col1_b  10
2  col2_a   0
3  col2_b  10
4  col3_a   1
5  col3_b   9

In [26]: df3 = df2['index'].str.extract(r"col(\d+)_(\w+)")

In [27]: df3
Out[27]:
   0  1
0  1  a
1  1  b
2  2  a
3  2  b
4  3  a
5  3  b

In [28]: df3['num'] = df2[0]

In [29]: df3
Out[29]:
   0  1  num
0  1  a    0
1  1  b   10
2  2  a    0
3  2  b   10
4  3  a    1
5  3  b    9

In [37]: df3.pivot_table(columns=1, index=0, values='num')
Out[37]:
  num
    a   b
1   0  10
2   0  10
3   1   9
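
The same steps as the session above, collected into one script that can be pasted and run. This is only a sketch: it uses `pivot` rather than `pivot_table` (an assumption that each column/suffix pair appears once), and the `name`, `col`, `kind`, and `num` labels are chosen here for readability.

```python
import pandas as pd

# The 1 x n frame from the printed dict above.
df = pd.DataFrame({"col1_a": [0], "col1_b": [10], "col2_a": [0],
                   "col2_b": [10], "col3_a": [1], "col3_b": [9]})

# Transpose so every original column becomes a row, then name the two columns.
long_df = df.T.reset_index()
long_df.columns = ["name", "num"]

# Split "col1_a" into the column number ("1") and the suffix ("a").
long_df[["col", "kind"]] = long_df["name"].str.extract(r"col(\d+)_(\w+)")

# One row per original column, one output column per suffix.
wide = long_df.pivot(index="col", columns="kind", values="num")
print(wide)
```

Note that `str.extract` returns strings, so the `col` index holds "1", "2", "3" rather than integers.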

#

CC @boreal gale

boreal gale Mar 21, 2023, 10:26 PM

#

oh yeah, good one 👍 i was thinking of .T.reset_index() but my brain was fried and it just eluded my mind

charred light Mar 21, 2023, 10:31 PM

#

serene scaffold <@287800670441046018> ```py In [23]: df Out[23]: col1_a col1_b col2_a col2...

Thanks.

#

I also just realized, I actually don't really need the not_null count. ASfacepalm Could have just gotten the full count once at the start.

sterile wyvern Mar 21, 2023, 10:41 PM

#

What if the returns do not make a continuous function?

boreal gale Mar 21, 2023, 10:41 PM

#

sterile wyvern What if the returns do not make a continuous function?

what's "the returns"?

sterile wyvern Mar 21, 2023, 11:18 PM

#

boreal gale what's "the returns"?

TestWindowLen = [90, 125, 256]
CheckDays = [ 1,15, 30 ]
std_dev=[25, 50, 100]

rancid ruin Mar 22, 2023, 2:33 AM

#

What is backpropagating/how do you do it?

whole gazelle Mar 22, 2023, 5:03 AM

#

Hi! Has anybody here worked with object detectors in webapps?

I'm trying to integrate YOLOv8 into my React Project. I have code for frontend and backend. Im thinking that Im gonna need 3 CLIs for this since I need 1 each for client and server and another for the Object Detector. Do any of you have any ideas as to how I could accomplish this?

olive stone Mar 22, 2023, 9:37 AM

#

olive stone When dealing with a dataset of images, is it always recommended to apply normali...

Any clue?

wooden sail Mar 22, 2023, 9:38 AM

#

yes
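
For anyone wondering what "normalization" usually means in practice here, a small sketch with NumPy. The random array stands in for a real image dataset, and the choice between simple scaling and per-channel standardization is an assumption that depends on the model.

```python
import numpy as np

# Stand-in for a batch of 8 RGB images, 32x32, with uint8 pixel values.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

# Option 1: scale pixel values into the range [0, 1].
scaled = images.astype(np.float32) / 255.0

# Option 2: per-channel standardization (zero mean, unit variance).
mean = scaled.mean(axis=(0, 1, 2))
std = scaled.std(axis=(0, 1, 2))
standardized = (scaled - mean) / std
```

The mean and standard deviation would normally be computed on the training set only and then reused for validation and test images.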

boreal gale Mar 22, 2023, 9:47 AM

#

sterile wyvern TestWindowLen = [90, 125, 256] CheckDays = [ 1,15, 30 ] std_dev=[25, 50, 100]

i don't think that's an issue. have a look at https://optuna.org/ if you just want a package to do the hyperparameter optimisation for you. otherwise you will need to dive into some papers to fully understand what's going on

Optuna

Optuna - A hyperparameter optimization framework

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.

sterile wyvern Mar 22, 2023, 11:31 AM

#

boreal gale i don't think that's an issue. have a look at https://optuna.org/ if you just wa...

What do you use? For bayesian optimisation*

boreal gale Mar 22, 2023, 11:37 AM

#

sterile wyvern What do you use? For bayesian optimisation*

historically i have been just using https://github.com/fmfn/BayesianOptimization

but currently playing with optuna which i linked above.

sterile wyvern Mar 22, 2023, 11:40 AM

#

boreal gale historically i have been just using https://github.com/fmfn/BayesianOptimization...

What do you think of my results? Are they continuous?

sterile wyvern Mar 22, 2023, 11:41 AM

#

sterile wyvern TestWindowLen = [90, 125, 256] CheckDays = [ 1,15, 30 ] std_dev=[25, 50, 100]

^

boreal gale Mar 22, 2023, 11:58 AM

#

well firstly it's important to be clear about what it is that you are showing.
are these hyperparameters of your models?
are these some sort of output of your models?

sterile wyvern Mar 22, 2023, 12:20 PM

#

boreal gale well firstly it's important to really distinguish properly what is it that you a...

The hyperparameters.

boreal gale Mar 22, 2023, 12:21 PM

#

so the hyperparameters you have shown look like the components you would need to create a grid in grid search.
without knowing the model you are working with, i can't tell if they are continuous or not.

sterile wyvern Mar 22, 2023, 12:24 PM

#

boreal gale so the hyperparameters you have shown looks like components you would need to cr...

Im evaluating this. (What model to use (but the function being continuous is a factor))

#

So I've been told that random search should be fast and feasible. It was said random search can be combined with other optimization techniques like Bayesian optimization.

#

How do you determine if a function is continuous? @boreal gale

boreal gale Mar 22, 2023, 12:28 PM

#

sorry i am really confused as to what you are trying to do. perhaps it's a language barrier or there is some knowledge gap somewhere

#

what's the thing you are trying to model?
what are your inputs? where are they from? what do they mean in the real world?
what are your output(s)? where are they from? what do they mean in the real world?

#

> How do you determine if a function is continuous?
layman explanation is probably "a function that does not have discontinuities" or "something that you can draw with one stroke of a pen, as opposed to something that requires you to lift your pen"

sterile wyvern Mar 22, 2023, 12:45 PM

#

I have a program that uses a grid search to find the optimal parameters for another function. I'm trying to find a better way to get optimal values. @boreal gale

#

So I'm wondering if a random search can be used/applied to this.

boreal gale Mar 22, 2023, 12:48 PM

#

> another function
what is this "another function"? without knowing this i can't comment on whether the parameters (i.e. the hyperparameters) are continuous. do you see what i am getting at?

#

> So I'm wondering if a random search can be used/applied to this.
yes, random search is always an option.

sterile wyvern Mar 22, 2023, 12:55 PM

#

boreal gale > another function what is this "another function"? without knowing this i can't...

What do you need to know about the function to say if it's continuous?

boreal gale Mar 22, 2023, 1:02 PM

#

i don't know. but preferably the entire definition.

#

also are you asking whether the hyperparameters are continuous or the function itself? those are two different things.

mild dirge Mar 22, 2023, 1:06 PM

#

If you can use grid search, then obviously you can sample random points on that grid to test your model with. So yes, you are able to use random search.
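
A minimal sketch of that idea using the three lists posted earlier as the grid. The `score` function is a made-up placeholder for whatever the real evaluation (e.g. the in-sample test) returns, and the sample size of 10 is arbitrary.

```python
import itertools
import random

# The grid from the earlier message.
grid = {
    "TestWindowLen": [90, 125, 256],
    "CheckDays": [1, 15, 30],
    "std_dev": [25, 50, 100],
}


def score(params):
    # Hypothetical stand-in for the real evaluation; higher is better.
    return -(abs(params["TestWindowLen"] - 125)
             + abs(params["CheckDays"] - 15)
             + abs(params["std_dev"] - 50))


# Grid search would evaluate every one of these points.
all_points = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]

# Random search evaluates only a random subset of them.
rng = random.Random(0)
sampled = rng.sample(all_points, k=10)
best = max(sampled, key=score)
print(best, score(best))
```

With only 27 grid points the saving is small; the approach pays off when the grid is too large to evaluate in full.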

sterile wyvern Mar 22, 2023, 1:22 PM

#

boreal gale also are you asking whether the hyperparamters are continuous or the function it...

Considering my hyperparameters, I thought you could work out whether the results, e.g. for (90, 1, 100), are continuous.

analog cipher Mar 22, 2023, 1:23 PM

#

Hello

#

if there is someone who's good at numpy, can I talk to you in private ?

wooden sail Mar 22, 2023, 1:25 PM

#

i'm decent at numpy, but won't go in dms

analog cipher Mar 22, 2023, 1:25 PM

#

u do u

wooden sail Mar 22, 2023, 1:25 PM

#

we discourage dms, as the server is for helping in the server

analog cipher Mar 22, 2023, 1:25 PM

#

k

wooden sail Mar 22, 2023, 1:29 PM

#

pretty much

#

the composition through different layers makes that a terrible exercise in the chain rule

#

deep learning frameworks evaluate lazily. they create a computational graph that makes the forward pass more efficient, and automatic differentiation easier

#

i think someone had linked you before to a website on how to construct and traverse computation graphs. if you don't use autodiff, you either construct the graph yourself in code, or do the derivatives on paper
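
To make "construct the graph yourself" a bit more concrete, here is a toy reverse-mode sketch for scalars with only `+` and `*`. The `Value` class and its attribute names are invented for this example; real frameworks are far more general and do not necessarily work this way internally.

```python
class Value:
    """A scalar node in a computation graph."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents      # nodes this value was computed from
        self.grad_fns = grad_fns    # chain-rule step for each parent

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order nodes so that parents come before the nodes built from them.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, accumulating gradients.
        for node in reversed(order):
            for parent, fn in zip(node.parents, node.grad_fns):
                parent.grad += fn(node.grad)


x = Value(3.0)
w = Value(2.0)
b = Value(1.0)
y = w * x + b * x   # dy/dx = w + b, dy/dw = x, dy/db = x
y.backward()
print(x.grad, w.grad, b.grad)
```

Each operation records its inputs and a local derivative, and `backward` applies the chain rule once per edge instead of expanding it by hand.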

tidal bough Mar 22, 2023, 1:34 PM

#

tensorflow used to do that, didn't it

#

before version 2 or so. when pytorch just began, its killer feature was implicit computational graphs rather than explicit.

#

(then TF learned to do that too)

terse kindle Mar 22, 2023, 2:59 PM

#

does anybody have resource link where I can learn about Handwriting recognition using Deep Learning ?

grand warren Mar 22, 2023, 3:03 PM

#

https://www.youtube.com/watch?v=aircAruvnKk&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv

YouTube

3Blue1Brown

But what is a neural network? | Chapter 1, Deep learning

#

3 blue one brown explains it

serene scaffold Mar 22, 2023, 3:35 PM

#

@terse kindle ^

terse kindle Mar 22, 2023, 3:36 PM

#

Will look into it. Thank you so much

wet blade Mar 22, 2023, 4:39 PM

#

dataframe\pandas or numpy

wooden sail Mar 22, 2023, 4:54 PM

#

yes, that would just make it easier for you to make your own autodiff

#

i don't know how in depth you wanna go into the making your own deep learning stuff

#

fair enough. in python, i would recommend stuff like jax, pytorch, and tensorflow for this task. you can also do it with sympy, but sympy is pretty slow

tidal bough Mar 22, 2023, 5:43 PM

#

jax, notably, is very annoying to build on windows

plush jungle Mar 22, 2023, 7:14 PM

#

are there any obvious bugs in this DQN back propagation code I wrote? I mostly used the pytorch example but I changed a couple things so it would fit my tensor shapes and I'm worried the reason it's diverging is cause I have some bug

#

def optimize_model():

    if len(memory) < BATCH_SIZE:
        return
    batch = memory.sample(BATCH_SIZE)
        
    state_batch, reward_batch, next_state_batch, terminal_batch = zip(*batch)

    state_batch = torch.stack(tuple(state for state in state_batch))
    reward_batch = torch.stack(reward_batch)
    next_state_batch = torch.stack(tuple(state for state in next_state_batch))

    if torch.cuda.is_available():
        state_batch = state_batch.cuda()
        reward_batch = reward_batch.cuda()
        next_state_batch = next_state_batch.cuda()

    q_values = policy_net(state_batch)
    policy_net.eval()
    
    with torch.no_grad():            
        next_prediction_batch = target_net(next_state_batch)

    y_batch = torch.cat(
        tuple(reward if terminal else reward + GAMMA * prediction for reward, terminal, prediction in
              zip(reward_batch, terminal_batch, next_prediction_batch)))

    optimizer.zero_grad()

    loss = loss_function(q_values, y_batch.float())
    loss.backward()

    torch.nn.utils.clip_grad_value_(policy_net.parameters(), 100)
    optimizer.step()

warm copper Mar 22, 2023, 7:28 PM

#

I assume no one works with pyspark on Mac here?

#

No one managed to answer the error I keep getting 😅

serene scaffold Mar 22, 2023, 7:40 PM

#

Traceback (most recent call last):
  File "/Users/kadiraltunel/PythonProjects/Lab1/main.py", line 17, in <module>
    df = spark.createDataFrame(data=data, schema=col_names)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/sql/session.py", line 894, in createDataFrame
    return self._create_dataframe(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/sql/session.py", line 938, in _create_dataframe
    jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/rdd.py", line 3113, in _to_java_object_rdd
    return self.ctx._jvm.SerDeUtil.pythonToJava(rdd._jrdd, True)
                                                ^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/rdd.py", line 3505, in _jrdd
    wrapped_func = _wrap_function(
                   ^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/rdd.py", line 3362, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/rdd.py", line 3345, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
                      ^^^^^^^^^^^^^^^^^^
  File "/Users/kadiraltunel/Documents/Spark/python/pyspark/serializers.py", line 468, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range

^ this is the error message they're referring to

warm copper Mar 22, 2023, 7:43 PM

#

The code doesn’t have any issues @serene scaffold

#

It’s the pyspark causing issues on my Mac

serene scaffold Mar 22, 2023, 7:43 PM

#

warm copper The code doesn’t have any issues <@253696366952316929>

if the error message of yours that I just posted isn't the one that you currently need help with, you have to show the new error message.

warm copper Mar 22, 2023, 7:44 PM

#

Oh that’s the error

#

I’m just saying that so that people don’t try to find out if the code is wrong. That’s my teacher’s code which works on his computer

#

I’m using Pycharm on MacBook Air M2

#

My teacher couldn’t figure it out either but then he doesn’t use Mac @serene scaffold

novel python Mar 22, 2023, 8:53 PM

#

anyone used to plotly dash in python here?

boreal gale Mar 22, 2023, 9:12 PM

#

@warm copper what python version are you using?

because i recall there was someone who dug up a jira issue for you which shows a similar error for python 3.11 iirc, and it was quickly dismissed for some reason, all without actually checking anything as far as i can see.

warm copper Mar 22, 2023, 9:31 PM

#

I am using the latest version of python @boreal gale

#

I even added the paths

sterile wyvern Mar 22, 2023, 9:41 PM

#

boreal gale i don't know. but preferably the entire definition.

But if the function is what you need then I would have to ask more about your background. Its in financial space. Are you open to pms?

boreal gale Mar 22, 2023, 9:54 PM

#

warm copper I am using the latest version of python <@231160898872410123>

what's your spark version? i presume 3.3.2?
python 3.11 does not work with pyspark 3.3.2; only pyspark 3.4+ (which is unreleased as of writing this) works.
please downgrade to python 3.10 as a workaround for now.
ref: https://github.com/apache/spark/pull/38987
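
A small guard along these lines can make the mismatch obvious before Spark fails with a pickling error. It only encodes the one rule stated above (Python 3.11+ needs Spark 3.4+); the function name and the hard-coded Spark version are assumptions for illustration, not a full compatibility matrix.

```python
import sys


def pyspark_supports(python_version, spark_version):
    """Return False for the combination discussed above: Python 3.11+ with Spark < 3.4."""
    if python_version >= (3, 11) and spark_version < (3, 4):
        return False
    return True


# Assumed Spark version for this sketch; replace with the installed one.
SPARK_VERSION = (3, 3, 2)

if not pyspark_supports(sys.version_info[:2], SPARK_VERSION):
    print("This Python is too new for this Spark; try a Python 3.10 interpreter.")
```

In PyCharm this would mean pointing the project interpreter at a 3.10 environment.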

misty flint Mar 22, 2023, 9:56 PM

#

tidal bough jax, notably, is very annoying to build on windows

i have a meme for you in #ot1-perplexing-regexing

boreal gale Mar 22, 2023, 9:57 PM

#

sterile wyvern But if the function is what you need then I would have to ask more about your ba...

i have a no dm policy, sorry.

also, the function being continuous or not has no relevance to whether you can use bayes opt or not. why do you want to know if your function is continuous anyway?

edit: meh i can't type, i missed out an important no in the middle of the sentence 😂

austere swift Mar 22, 2023, 9:57 PM

#

tidal bough jax, notably, is very annoying to build on windows

this might be a controversial take but I think anybody who has the know-how to use jax and who is doing complex applications that would need it (since it's oriented towards applications where you'd want complete control over pretty much every operation) would likely already be using linux due to the headache that windows brings

zenith plover Mar 22, 2023, 10:35 PM

#

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import cv2
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
from keras import backend as K
from keras.layers import Conv2D,MaxPooling2D,UpSampling2D,Input,BatchNormalization,LeakyReLU
from keras.layers.merge import concatenate
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
import tensorflow
import tensorflow.compat.v1 as tf

tensorflow.random.set_seed(123)
session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
tf.keras.backend.set_session(sess)
tensorflow.random.set_seed(2)
np.random.seed(1)

print(os.listdir("../deneme/dataset/dataset_updated/"))

#

LEARNING_RATE = 0.001
Model_Colourization.compile(optimizer=Adam(learning_rate=LEARNING_RATE),
                            loss='mean_squared_error')
Model_Colourization.summary()

#

I am getting an error in the compile part. How to fix

sterile wyvern Mar 22, 2023, 10:36 PM

#

boreal gale i have a no dm policy, sorry. also, the function being continuous or not has no...

I was told the results need to be continuous to use random search.

boreal gale Mar 22, 2023, 10:41 PM

#

sterile wyvern I was told the results need to be continuous to use random search.

hmm okay, again we need to clarify what is "the results"

in hyperparameter optimisation, you obviously have some metric you are trying to maximise/minimise, if that's what you are calling "the results" (since your application is finance, let's just assume this is your profit for example) then i don't think that statement is correct, random search doesn't have any requirements like that.

random search is literally trying some hyperparameter configurations and seeing how well they perform; whether the metric you are trying to maximise/minimise is continuous or not shouldn't matter, i think.

however if that statement is aimed towards bayesian optimisation, then maybe there is some truth to it. i have never dealt with a discrete metric to optimise for. but i can kinda see why the metric being discrete might be an issue.

sterile wyvern Mar 22, 2023, 10:43 PM

#

boreal gale hmm okay, again we need to clarify what is "the results" in hyperparameter opti...

Results from the function that gives you the best hyper parameters we are trying to improve.

boreal gale Mar 22, 2023, 10:46 PM

#

sterile wyvern Results from the function that gives you the best hyper parameters we are trying...

do you mind writing that in your native language just this once? i really think there is a language barrier here, as i am failing to understand that.

sterile wyvern Mar 22, 2023, 10:46 PM

#

boreal gale do you mind writing that in your native language just this once? i really think ...

What languages do you speak?

#

Besides english

boreal gale Mar 22, 2023, 10:47 PM

#

just english and chinese

sterile wyvern Mar 22, 2023, 10:47 PM

#

lol

#

Lets stick to english.

sterile wyvern Mar 22, 2023, 10:47 PM

#

sterile wyvern Results from the function that gives you the best hyper parameters we are trying...

Whats confusing you?

#

I have a function Im using to get hyperparameters.

#

To use random search on this function the results should be continuous.

#

I can be wrong.

boreal gale Mar 22, 2023, 10:49 PM

#

> I have a function Im using to get hyperparameters.
okay, let's call this function the hyperparameters-optimiser from now on.

sterile wyvern Mar 22, 2023, 10:49 PM

#

Ok.

boreal gale Mar 22, 2023, 10:50 PM

#

> To use random search on this function the results should be continuous.
is "this function" here the hyperparameters-optimiser?

sterile wyvern Mar 22, 2023, 10:50 PM

#

boreal gale > To use random seach on this function the results should be continuous. is "thi...

Yes.

boreal gale Mar 22, 2023, 10:51 PM

#

when you say on, did you mean in?

sterile wyvern Mar 22, 2023, 10:52 PM

#

boreal gale when you say `on`, did you mean `in`?

Would you use an intercooler in or on a car?

boreal gale Mar 22, 2023, 10:52 PM

#

meaning.. "using random search as the core logic of this hyperparameter-optimiser"

sterile wyvern Mar 22, 2023, 10:52 PM

#

boreal gale meaning.. "using random search as the core logic of this hyperparameter-optimise...

Yes.

boreal gale Mar 22, 2023, 10:52 PM

#

okie dokie

sterile wyvern Mar 22, 2023, 10:52 PM

#

🙂

boreal gale Mar 22, 2023, 10:53 PM

#

now that only leaves "the results" as my only source of confusion.
i assume "the results" means the evaluation metric of the model?
the model here is not the hyperparameter-optimiser

sterile wyvern Mar 22, 2023, 10:55 PM

#

boreal gale now that only leaves "the results" as my only source of confusion. i assume "the...

Or you think the results should be from the function that the optimized values are used in?

boreal gale Mar 22, 2023, 10:56 PM

#

when you use the word "the results", my mind just draws a blank, hence the confusion.

#

but if you do mean the evaluation metric your hyperparameter-optimiser will be working to maximise, then my above reply is relevant
#data-science-and-ml message

sterile wyvern Mar 22, 2023, 10:57 PM

#

what makes more sense to you?

sterile wyvern Mar 22, 2023, 10:58 PM

#

boreal gale when you use the word "the results", my mind just draws a blank, hence the confu...

The results being the best values from the hyperparameter-optimiser or the performance of the function that it's being used in?

boreal gale Mar 22, 2023, 10:59 PM

#

"performance of the function that its being used in" makes more sense, yeah (but i can't just assume that's what you meant)

sterile wyvern Mar 22, 2023, 11:00 PM

#

boreal gale "performance of the function that its being used in" makes more sense, yeah (but...

Well i use it in something called an insample test.

sterile wyvern Mar 22, 2023, 11:01 PM

#

sterile wyvern Well i use it in something called an insample test.

The values would be ratios and percentages.

boreal gale Mar 22, 2023, 11:06 PM

#

okay, random search should be fine for reasons stated above.

as for the issue of continuous or not, ratios/percentages sound pretty continuous to me, unless the components of the ratio are themselves bounded and discrete

e.g. say if your ratio is X:Y, and X can only ever take value (1,2,3,4,5) and Y only take value (5,6,7,8,9), then that ratio doesn't sound continuous to me.
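
A quick way to see why that example is discrete is to enumerate every value the ratio can take. This sketch just uses the X and Y values from the message above.

```python
from fractions import Fraction

# The only values X and Y are allowed to take in the example.
xs = [1, 2, 3, 4, 5]
ys = [5, 6, 7, 8, 9]

# Every distinct value X/Y can take; a set removes duplicates such as 2/6 and 3/9.
ratios = sorted({Fraction(x, y) for x in xs for y in ys})
print(len(ratios), ratios[0], ratios[-1])
```

However the inputs are combined, the ratio can only land on this finite list of points, so there are gaps between neighbouring values.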

sterile wyvern Mar 22, 2023, 11:08 PM

#

boreal gale okay, random search should be fine for reasons stated above. as for the issue o...

Google Sharpe Ratio.

boreal gale Mar 22, 2023, 11:08 PM

#

it's continuous then

arctic wedgeBOT Mar 22, 2023, 11:12 PM

#

Hey @vocal fractal!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

vocal fractal Mar 22, 2023, 11:16 PM

#

https://paste.pythondiscord.com/fefapididi getting this error (pasted at the end of code). Can someone please help

sterile wyvern Mar 22, 2023, 11:20 PM

#

How often do you use random search?

#

@boreal gale

#

Do you use insample out of sample split?

#

80/20 for example

boreal gale Mar 22, 2023, 11:23 PM

#

vocal fractal https://paste.pythondiscord.com/fefapididi getting this error (pasted at t...

RSI as in Relative Strength Index?
presumably it's because when calculating RSI, you had to discard some data because you don't have 14 (or whatever your window is) days worth of history for RSI calculation, and you will have less rows compared to the original dataframe

when you try to assign non-scalar (i.e. not just one value) data (in a data structure that is without pandas index information) back into the dataframe, the length has to match the original dataframe.
in this case you are indeed assigning non-scalar data without pandas index information (a numpy array) back into a dataframe and the length doesn't match, hence an error occurs, specifically "ValueError: Length of values (3312) does not match length of index (3326)" - notice how the two values are off by 14!
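
A small reproduction of that behaviour, using a rolling mean as a stand-in for the RSI calculation (the pasted code is not reproduced here, so the column names and window are assumptions). It shows the failing assignment and one way around it: assign a Series so pandas can align on the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": np.arange(20, dtype=float)})
window = 14

# Stand-in indicator: the first rows have no full window, so they are dropped.
values = df["close"].rolling(window).mean().dropna()

error = None
try:
    # A plain NumPy array carries no index, so its length must equal len(df).
    df["indicator"] = values.to_numpy()
except ValueError as exc:
    error = str(exc)
print(error)

# A Series keeps its index, so pandas aligns it and fills the gaps with NaN.
df["indicator"] = values
```

Keeping the result as a Series (or padding the array with NaN at the front) avoids the length check entirely.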

#

if this isn't homework, maybe looking into using TA-Lib instead of homebrewing the RSI calculation is worthwhile.
https://pypi.org/project/TA-Lib/

PyPI

TA-Lib

Python wrapper for TA-Lib

boreal gale Mar 22, 2023, 11:26 PM

#

sterile wyvern How often you use a random search?

not that often. i generally prefer grid search/bayes opt, grid search if i am feeling lazy, bayes opt if i know it's going to be tough finding a good param.

vocal fractal Mar 22, 2023, 11:26 PM

#

@boreal gale Thank you so much! I have so much to learn!

sterile wyvern Mar 22, 2023, 11:28 PM

#

boreal gale not that often. i generally prefer grid search/bayes opt, grid search if i am fe...

Isn't Bayes a version of random search?

boreal gale Mar 22, 2023, 11:28 PM

#

sterile wyvern Do you use insample out of sample split?

yes to this, but a bit more complicated usually. it's more like this from the sklearn docs, there is more than one split going on, a la cross validation
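
The figure from the sklearn docs did not survive here, so as a rough illustration of "more than one split", this is a plain-Python sketch of k-fold index generation. The function name is hypothetical, and for time-ordered financial data a forward-only scheme may be more appropriate than shuffling folds freely.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; every sample is in a test fold exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each hyperparameter configuration is then scored on every test fold and the scores are averaged, rather than relying on a single 80/20 split.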

boreal gale Mar 22, 2023, 11:29 PM

#

sterile wyvern Isnt Bayes a vrsion of random search?

ohh.. i wouldn't consider bayes opt as random search, it's more "search guided by some bayesian model" (which imo isn't random at all)

i see how one can consider that as random though!

sterile wyvern Mar 22, 2023, 11:31 PM

#

boreal gale ohh.. i wouldn't consider bayes opt as random search, it's more "search guided b...

"random search can be combined with other optimization techniques like Bayesian optimization"

boreal gale Mar 22, 2023, 11:32 PM

#

👌

sterile wyvern Mar 22, 2023, 11:35 PM

#

Which is fast and powerful?

#

Why not use random search?

#

Is Bayesian optimization fast?

boreal gale Mar 22, 2023, 11:36 PM

#

random search is very luck based.
bayes opt at least has some theory as to why it should work

#

though it's really worth noting, hyperparameter optimisation is not a silver bullet.
sometimes, one's data/model is just not up to scratch, no matter how you tune your hyperparameter, the model is still not going to perform to your satisfaction
had that happen to me once or twice: my data was just shit, and no matter what i did, i couldn't make anything useful from it

#

our mega thread just drowned out some poor guy's help request, #data-science-and-ml message
could someone help this poor soul D:

hasty mountain Mar 22, 2023, 11:45 PM

#

Can someone give me some help with unsupervised learning applied to neural networks?
I'm currently trying to test a Minimum Entropy Loss from a recent paper, which has the objective of making the model learn to minimize the information entropy of a given input more explicitly, even allowing it to "pre-classify" the input (inputs with a similar entropy level are more likely to be from the same class; e.g. in CIFAR100, ambulance pics tend to have the same minimum entropy).

However, I'm having the problem that...my model loss is actually increasing after each epoch, not decreasing(if not to say that the entropy minimization doesn't seem to make sense at all). I suppose this means the model isn't being consistent with the entropy minimization.

Can someone give me some ideas on what could be causing this? Is there any more "consecrated" loss function for this task, just so I can use it as a control compared to this MinEntLoss?

#

(Yes, I have reviewed my code quite a few times to make sure I'm implementing the loss and unsupervised task correctly)
Also, the model is a ResNet extracting features from 100x100x3 images into 512 features.

warm copper Mar 22, 2023, 11:57 PM

#

bruh I think its my teacher's code @boreal gale

boreal gale Mar 23, 2023, 12:03 AM

#

warm copper bruh I think its my teacher's code @boreal gale

i had a look at it, it's been a while since i touched pyspark but i do believe the code is alright.

i really do recommend at least try to install python3.10 (and pyspark-related libraries on it) and give the same code another go.

#

i have now replicated the issue in python3.11
https://paste.pythondiscord.com/gamayuqubi

exact same code running on python3.10 is fine
https://paste.pythondiscord.com/oxatizalit

boreal gale Mar 23, 2023, 12:15 AM

#

boreal gale i have now replicated the issue in python3.11 https://paste.pythondiscord.com/g...

@warm copper 👆

warm copper Mar 23, 2023, 12:16 AM

#

so its the python?

boreal gale Mar 23, 2023, 12:16 AM

#

indeed. python 3.11 is not supported yet.

warm copper Mar 23, 2023, 12:16 AM

#

oh interesting

boreal gale Mar 23, 2023, 12:17 AM

#

ref: https://github.com/apache/spark/pull/38987 as mentioned in my previous message
https://github.com/apache/spark/pull/38987#issuecomment-1343650267 is the most interesting part, py3.11 is only supported from spark 3.4 onward (assuming they don't revert the change obviously)

warm copper Mar 23, 2023, 12:19 AM

#

lol

#

3.4?

#

is py 3.4 out?

boreal gale Mar 23, 2023, 12:19 AM

#

spark 3.4 not python 3.4

but to answer your question 3.4 was out a long long time ago

warm copper Mar 23, 2023, 12:20 AM

#

yeah

#

im on 3.3

boreal gale Mar 23, 2023, 12:20 AM

#

it's not out yet, i am unsure what's their release schedule

boreal gale Mar 23, 2023, 12:21 AM

#

boreal gale what's your spark version? i presume 3.3.2? python 3.11 does not work with 3.3....

hence my recommendation to downgrade to python3.10 here

warm copper Mar 23, 2023, 12:21 AM

#

ugh

#

thats gonna suck

#

this code worked tho

#

interestingly

boreal gale Mar 23, 2023, 12:22 AM

#

if you haven't heard of pyenv, it might be worth looking into it, it will lessen the burden of installing python/switching python version.

warm copper Mar 23, 2023, 12:22 AM

#

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum, desc

spark = SparkSession.builder.appName(
    'Covid').getOrCreate()

covid = spark.read.csv('/Users/kadiraltunel/PycharmProjects/covid-us.csv', sep=',',
                       inferSchema=True, header=True)

covid.show(50)

covid.groupBy('date').agg(sum('cases'), sum('deaths')).orderBy('date').show()
covid.groupBy('state').agg(sum('cases'), sum('deaths')).orderBy(desc('sum(cases)')).show()

covid.select(sum(covid.cases)).show()

#

I wonder if you get the same results as I do tho

#

when you run it

boreal gale Mar 23, 2023, 12:23 AM

#

i don't have access to your csv, so it's gonna blow up, but i will most likely get the same behaviour as you i believe.

warm copper Mar 23, 2023, 12:24 AM

#

https://github.com/nytimes/covid-19-data/blob/master/us-counties-2022.csv

GitHub

covid-19-data/us-counties-2022.csv at master · nytimes/covid-19-data

A repository of data on coronavirus cases and deaths in the U.S. - covid-19-data/us-counties-2022.csv at master · nytimes/covid-19-data

#

this is the data

boreal gale Mar 23, 2023, 12:25 AM

#

as to why it only breaks when running the dataframe example script?
it's because of a spark internal: creating a dataframe from manually supplied data requires a shuffle of the data (or at least the shuffle function), so the random.Random referenced here https://github.com/apache/spark/pull/38987/files gets pickled for transport to another process. (basically this is how your python code gets transported to the executor in spark, via something called a pickle; you might also see cloudpickle or dill, which are very similar and built upon pickle.) seeing as this class no longer exists, code running on python3.11 blows up.
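To make the "pickled for transport" part concrete, here is a minimal sketch using only the standard library. It is not Spark's actual code path; `random.Random` is used purely as an illustration, and the hand-written payload pointing at `NoSuchClass` is a made-up name. The point is that pickle stores a class as "module + name", so the receiving side must be able to look up that same name.

```python
import pickle
import random

# Pickling a class does not copy its code; it stores the module and the name.
payload = pickle.dumps(random.Random)
print(payload)

# Unpickling looks that name up again and returns the very same object.
restored = pickle.loads(payload)
print(restored is random.Random)

# If the name cannot be found on the receiving side, loading fails.
# This hand-written protocol-0 payload points at a name that does not exist.
missing = b"crandom\nNoSuchClass\n."
failed = False
try:
    pickle.loads(missing)
except (AttributeError, pickle.UnpicklingError) as exc:
    failed = True
    print("unpickling failed:", exc)
```

So when a name that used to exist is gone in a newer Python, anything that relies on pickling a reference to it can break in the same way.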

warm copper Mar 23, 2023, 12:25 AM

#

Im confused because the number of cases seem way too high

#

oh I see @boreal gale

boreal gale Mar 23, 2023, 12:30 AM

#

warm copper Im confused because the number of cases seem way too high

i assume this is similar, ran it on python3.10

#

anyway, you can tell your professor 3.11 can't run the dataframe script properly at the moment. you can link https://github.com/apache/spark/pull/38987 and/or https://issues.apache.org/jira/browse/SPARK-41125 if he asks for proof/reasons

also, if that's the only script that doesn't run, there is not much reason to downgrade 🤷 (my advice to downgrade was based on my understanding that spark just plainly doesn't work at all in python3.11, which was obviously wrong)

sonic comet Mar 23, 2023, 1:19 AM

#

#voice-chat-text-0 message

warm copper Mar 23, 2023, 2:40 AM

#

weird that there are over 31 billion cases right? @boreal gale
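One possible explanation, sketched under an assumption: if the `cases` column in that file is a running (cumulative) total per county per day, which is how the NYT repository describes its counts, then summing every row adds the same cases again for each later date. The rows below are made up and only mimic that shape.

```python
# Toy rows in the same shape as the county file: (date, county, cumulative cases).
rows = [
    ("2022-01-01", "A", 100),
    ("2022-01-02", "A", 110),
    ("2022-01-03", "A", 125),
    ("2022-01-01", "B", 40),
    ("2022-01-02", "B", 40),
    ("2022-01-03", "B", 50),
]

# Summing every row counts the same cases again on each later day.
naive_total = sum(cases for _, _, cases in rows)

# Keeping only the latest row per county gives the actual running total.
latest = {}
for date, county, cases in sorted(rows):
    latest[county] = cases  # later dates overwrite earlier ones
true_total = sum(latest.values())

print(naive_total, true_total)
```

In Spark terms that would mean filtering to the most recent date (or taking the max per county) before summing, rather than summing the whole column.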

rapid oriole Mar 23, 2023, 2:42 AM

#

Hey guys, I need help with a customer churn prediction model for my group project in my marketing course. Basically we have a database of 1.2 million customers that made purchases at various retailers. A specific retailer has been assigned to us, so we now have 591k customers that have made at least one purchase at this retailer. We would like to create a binary variable called 'churn' that takes the value 1 if the last purchase was more than 18 months ago, 0 if not. We have data for 36 months (2019-2021). We would like to predict customer churn. I intend to split the data randomly into train and test, however my question is: After fitting the model.. What do I do? Obviously I can't try to predict customers from my training set, so we can ignore them.. What about the rest? How do I apply my model so we can confidently say: This cluster of customers is at risk of churning?

Edit: After thinking, I had this idea: after training the model, I create a new dataframe containing only customers that are still active and haven't churned yet. Then I fit the model, and every customer that is predicted to churn will be part of my group of customers that are at risk of churning?

edgy falcon Mar 23, 2023, 2:53 AM

#

Hi!, can someone help me with the next problem:
I'm trying to use a Transformer XL layer in my model, and when I used the argument "**kwargs" it told me it's not defined, but the documentation uses it. Help me please

soft badge Mar 23, 2023, 3:20 AM

#

Anyone know a Discord community focused on ChatGPT and other AI?

lapis sequoia Mar 23, 2023, 3:26 AM

#

is there a way to version control jupyter notebooks on github that doesnt make the diffs insane and huge

#

or does kaggle or hugging face have something smarter

night pasture Mar 23, 2023, 6:38 AM

#

hello i am new here and i am searching for how to become data science programmer can anyone suggest what do i learn first

plush jungle Mar 23, 2023, 6:39 AM

#

night pasture hello i am new here and i am searching for how to become data science programmer...

do you know python?

night pasture Mar 23, 2023, 6:39 AM

#

plush jungle do you know python?

yes

plush jungle Mar 23, 2023, 6:40 AM

#

have you taken any machine learning courses in university?

night pasture Mar 23, 2023, 6:40 AM

#

plush jungle do you know python?

nope i am just learning from home

plush jungle Mar 23, 2023, 6:41 AM

#

well the first thing to do would be get super familiar with all the basics of python. data scientists use almost exclusively python

night pasture Mar 23, 2023, 6:41 AM

#

plush jungle well the first thing to do would be get super familiar with all the basics of py...

yes i watched a lot of tuto and i don't know what to do next

plush jungle Mar 23, 2023, 6:42 AM

#

learning by doing is the best way. pick a project and test your skills. preferably a data science project

#

there are three main things data scientists do:

gather data
clean data
apply machine learning/statistics to data

night pasture Mar 23, 2023, 6:44 AM

#

thanks i will try to learn those

plush jungle Mar 23, 2023, 6:45 AM

#

if you're looking for data that's already gathered and mostly cleaned, kaggle.com is a great resource

night pasture Mar 23, 2023, 6:47 AM

#

plush jungle if you're looking for data that's already gathered and mostly cleaned, kaggle.co...

yes

hasty mountain Mar 23, 2023, 6:56 AM

#

edgy falcon Hi!, can someone help me with the next problem: Im trying to use a Transformer ...

**kwargs is not really a proper argument, it's just extra arguments that could be added.

import torch

def init_optimizer(params, **kwargs):
    # defaults; any extra keyword arguments (e.g. eps=1e-6) are merged in
    arguments = {
        'lr': 0.001,
        'betas': (0.9, 0.999),
        **kwargs,
    }
    return torch.optim.Adam(params, **arguments)

Considering that torch.optim.Adam() accepts lr, betas and eps as arguments, you could call init_optimizer(model.parameters(), eps=1e-6), for example (with model being your network). Here eps would arrive through **kwargs: an extra keyword argument that init_optimizer doesn't define by name.
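The same idea without any library, as a small sketch: `make_config` and `describe` are hypothetical names invented for this example, with `describe` standing in for something like an optimizer constructor. It also shows why typing `**kwargs` literally in a call fails: in documentation it is a placeholder for "any other keyword arguments", not a variable that already exists in your code.

```python
def make_config(lr=0.001, betas=(0.9, 0.999), **kwargs):
    """Collect the known settings plus any extra keyword arguments."""
    config = {"lr": lr, "betas": betas}
    config.update(kwargs)  # extras such as eps=1e-6 end up here
    return config


def describe(lr, betas, eps=1e-8):
    """Stand-in for a function like an optimizer constructor (hypothetical)."""
    return f"lr={lr}, betas={betas}, eps={eps}"


config = make_config(eps=1e-6)
print(config)
print(describe(**config))  # ** unpacks the dict into keyword arguments

# Writing describe(**kwargs) here would raise NameError, because no variable
# called kwargs exists at this point; it only exists inside make_config.
```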

edgy falcon Mar 23, 2023, 6:58 AM

#

Thank u bro

mint palm Mar 23, 2023, 7:46 AM

#

pca, tSNE (visualisation, dimensionality reduction), contingency tables, uni/bi/multi variate analysis, what other statistics should i learn before my interview?
I will do micro/macro F1, confusion matrix, AUC, ROC etc, too

#

from libraries what important functions should i know?

cloud finch Mar 23, 2023, 9:25 AM

#

I will be grateful if you take a look at #1088401092326477824 asparkles

rough lava Mar 23, 2023, 11:10 AM

#

Anyone here knows any free/open-source datasets regarding bias ?

rough lava Mar 23, 2023, 11:28 AM

#

lemon_sentimental

rough lava Mar 23, 2023, 1:03 PM

#

Should I maybe dm another channel here? lemon_thinking

lavish wave Mar 23, 2023, 1:40 PM

#

Can anyone help me in resolving this error?

serene scaffold Mar 23, 2023, 1:50 PM

#

lavish wave Can anyone help me in resolving this error?

remember to always give code and errors as text, so that we can copy and paste them as needed.

#

though I suspect that the model you're trying to load is just invalid.

lapis sequoia Mar 23, 2023, 2:04 PM

#

Hi I want to train a regression model.
Should I find the optimal degree first on a default model?
And then take care of regularisation and other parameters later with the optimal degree of the regression model?

agile cobalt Mar 23, 2023, 2:10 PM

#

use regularisation from the start.

lapis sequoia Mar 23, 2023, 2:17 PM

#

agile cobalt use regularisation from the start.

hmm, that will be a really complex loop ig. Because regularisation CV loop, plus an outer loop of degrees

#

Does that mean if you have n hyperparameters you would always need a n-nested loop?

agile cobalt Mar 23, 2023, 2:19 PM

#

you do not have to perform a full grid search on all parameters, if there are too many hyper parameters to tune just use a random search instead of grid search or fix some of them
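A rough sketch of both options with only the standard library. The search space and the `score` function are made up for the demo; in practice `score` would fit the model and return a cross-validation score. Note that even the full grid needs just one flat loop, not one nested loop per hyperparameter.

```python
import itertools
import random

# Hypothetical search space: the names and values are illustrative only.
space = {
    "degree": [1, 2, 3, 4],
    "alpha": [0.001, 0.01, 0.1, 1.0],
    "l1_ratio": [0.1, 0.5, 0.9],
}


def score(params):
    """Stand-in for 'fit the model and return a CV score' (made up)."""
    return (
        -abs(params["degree"] - 2)
        - abs(params["alpha"] - 0.1)
        - abs(params["l1_ratio"] - 0.5)
    )


# Grid search: one flat loop over the Cartesian product of all value lists.
names = list(space)
grid = [dict(zip(names, combo)) for combo in itertools.product(*space.values())]
best_grid = max(grid, key=score)

# Random search: evaluate only a fixed budget of sampled combinations.
rng = random.Random(0)
budget = 10
samples = [
    {name: rng.choice(values) for name, values in space.items()}
    for _ in range(budget)
]
best_random = max(samples, key=score)

print(len(grid), best_grid)
print(len(samples), best_random)
```

The grid grows multiplicatively with every added hyperparameter, while the random-search budget stays whatever you set it to.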

lapis sequoia Mar 23, 2023, 2:20 PM

#

In this case it's just 2, so it will work

#

Once I did a 4 level grid search, that took a very very long time so I ended up fixing one of the parameters to just an acceptable value

agile cobalt Mar 23, 2023, 2:23 PM

#

remember that if you are way too picky about your hyper parameters you may end up effectively overfitting to your test set

lapis sequoia Mar 23, 2023, 2:23 PM

#

hmm

lapis sequoia Mar 23, 2023, 4:40 PM

#

my column looks like this, do I perform power transformation on them to make them normal for polynomial regression

#

I don't really know when to use it and if there's any downsides to just always using it. Chatgpt says to compare performance before and after using it. But that's just another 'hyperparameter' to tune then.

#

@agile cobalt

agile cobalt Mar 23, 2023, 4:48 PM

#

lapis sequoia @agile cobalt

your 'question' being about when to use "power transformation"?

lapis sequoia Mar 23, 2023, 4:51 PM

#

ye

agile cobalt Mar 23, 2023, 4:52 PM

#

honestly I have never seen the term power transformation before and taking a quick look at wikipedia I don't get it, but for features that typically scale exponentially like population or money (such as the GDP column) you may want to consider using log(), while for things that scale linearly you'll probably want to not use any way too fancy methods
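A small sketch of what the log does to a feature that grows by a constant factor. The values are made up, and the `skewness` helper is written out by hand here rather than taken from a library.

```python
import math
import statistics


def skewness(values):
    """Skewness as the mean of cubed z-scores (0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / std) ** 3 for v in values])


# Made-up values that grow by a constant factor, like money or population.
gdp_like = [1e3, 1e4, 1e5, 1e6, 1e7, 1e8]
logged = [math.log10(v) for v in gdp_like]

print(skewness(gdp_like))  # strongly right-skewed
print(skewness(logged))    # roughly symmetric after the log
```

For a feature that already scales linearly, the same transform would mostly just distort it, which matches the advice above.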

lapis sequoia Mar 23, 2023, 4:52 PM

#

hmm

#

what's a way to check for growth rate of something

#

what plot will it reflect in

agile cobalt Mar 23, 2023, 4:53 PM

#

you don't

#

if anywhere, it might be reflected in the distribution of the data

#

but that is something that you should know about the data you are dealing with, not something you'll infer from the data

lapis sequoia Mar 23, 2023, 4:55 PM

#

What if it's all just black box data with no labels

#

or you can't understand the labels

agile cobalt Mar 23, 2023, 4:57 PM

#

then you should not be using that data at all?

#

model interpretability is already bad enough as-is, I cannot recommend using data you don't understand

lapis sequoia Mar 23, 2023, 5:03 PM

#

I see

boreal gale Mar 23, 2023, 5:20 PM

#

if your question is about when to use power transformation, imo you should use it when your model assumes normality and your data doesn't follow a normal distribution (how you determine if your data is roughly normal is another question, QQ plots and the kolmogorov smirnov test are pretty common)

if your model doesn't require/assume normality, then there are less reasons to use it but sometimes it is indeed useful, i feel this is all very context-dependent.

also power transform impacts the interpretability of your model, which might be an issue. but you can always use SHAP to recover some if not all interpretability.

lapis sequoia Mar 23, 2023, 5:22 PM

#

boreal gale if your question is about when to use power transformation, imo you should use i...

Polynomial regression doesn't assume normality I think. But I was taught that it does

boreal gale Mar 23, 2023, 5:27 PM

#

i actually have no idea 🤔 my stats has degraded a lot since leaving uni

lapis sequoia Mar 23, 2023, 5:28 PM

#

what did you study

boreal gale Mar 23, 2023, 5:28 PM

#

stats 😂

wooden sail Mar 23, 2023, 5:28 PM

#

if you do it via linear least squares, poly regression does indeed assume normality

#

there's more than one way to find the coefficients of a polynomial

#

least squares always assumes normally distributed observations, i.i.d.

lapis sequoia Mar 23, 2023, 5:31 PM

#

hmm, I think sklearn does it the least square way, doesn't it?

wooden sail Mar 23, 2023, 5:31 PM

#

most likely, but i can't say for sure 😛

lapis sequoia Mar 23, 2023, 5:31 PM

#

That's what I am taught as well

#

But I also used regularisation, but that doesn't affect much ig?

wooden sail Mar 23, 2023, 5:32 PM

#

depends on the kind of regularization

lapis sequoia Mar 23, 2023, 5:32 PM

#

I did elastic net

#

l1+l2

wooden sail Mar 23, 2023, 5:33 PM

#

you can usually think of regularized least squares as assuming there is AWGN, and the regularization terms are equivalent to assuming your coefficients are random and come from a special distribution

#

with l1 and l2, that'd be some weird combo of laplace and gaussian priors

lapis sequoia Mar 23, 2023, 5:34 PM

#

I have 38 features now and with degree 3 that becomes a lot of features and it's taking long to run

#

don't people have tons of features irl?'

#

That would make polynomial regression inefficient

#

beyond maybe 2 degrees
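To put numbers on that: a full polynomial expansion of n features up to degree d has C(n + d, d) terms, counting the constant term. A quick sketch (`n_poly_terms` is just a helper name made up for this example):

```python
from math import comb


def n_poly_terms(n_features, degree):
    """Number of monomials of total degree <= degree, including the constant."""
    return comb(n_features + degree, degree)


for degree in (1, 2, 3, 10):
    print(degree, n_poly_terms(38, degree))
```

With 38 inputs that is already over ten thousand columns at degree 3, which is why the fit gets slow so quickly.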

boreal gale Mar 23, 2023, 5:37 PM

#

don't people have tons of features irl?'
yes, but most people don't use polynomial regression, at least from my past experience

lapis sequoia Mar 23, 2023, 5:40 PM

#

why is it there then

#

What's popular mostly used algorithms

#

Cool graph isn't it

wooden sail Mar 23, 2023, 5:41 PM

#

you usually only use poly regression for modestly low degree polynomials

lapis sequoia Mar 23, 2023, 5:41 PM

#

My teacher made it for 10 degrees, but I have tons of features

wooden sail Mar 23, 2023, 5:41 PM

#

because it turns out it's a fairly challenging problem

#

it involves a toeplitz matrix with a terrible condition number

#

you can run into issues involving numerical stability, or if direct inversion is impossible, slow convergence

lapis sequoia Mar 23, 2023, 5:47 PM

#

@wooden sail

agile cobalt Mar 23, 2023, 5:51 PM

#

lapis sequoia @wooden sail

friendly reminder:

wooden sail Mar 23, 2023, 5:52 PM

#

lapis sequoia @wooden sail

as i said, AWGN

#

the additive error is normally distributed

thick viper Mar 23, 2023, 6:13 PM

#

#

Any ideas?🤣

agile cobalt Mar 23, 2023, 6:14 PM

#

Challenge portfolio work
what does that means?

thick viper Mar 23, 2023, 6:19 PM

#

It’s just a task, it’s my homework

#

I’m trying to do the challenge task because I wanna try to learn it better but I literally have no clue

wooden sail Mar 23, 2023, 6:20 PM

#

you'll have to brush up on your joint and conditional probabilities

thick viper Mar 23, 2023, 6:38 PM

#

Nevermind I got it :so

#

Had to do it on paper haha

mossy lance Mar 23, 2023, 6:39 PM

#

in pytorch, does anyone know how to combine a sequence of multidimensional tensors? 65536 4x4x4 tensors that i want to reshape into 8x8x8 tensors

serene scaffold Mar 23, 2023, 6:49 PM

#

mossy lance in pytorch, does anyone know how to combine a sequence of multidimensional tenso...

more information is needed. to go from (4, 4, 4) to (8, 8, 8), you'll be combining multiple (4, 4, 4) tensors to create each (8, 8, 8) one. and how they're going to go together matters, or you'll end up with meaningless tensors.

mint palm Mar 23, 2023, 7:15 PM

#

can interviewer ask me "without checking syntax" write code for something?

#

what am i expected to generally write without reference? TO give you an idea of role: i am applying for "Sr. data scientist" role at startup

scarlet kite Mar 23, 2023, 7:21 PM

#

beginner question. what is the bias used for? also is it automatically added or not?

low musk Mar 23, 2023, 7:27 PM

#

scarlet kite beginner question. what is the bias used for? also is it automatically added or ...

i think u select it

#

by yourself

agile cobalt Mar 23, 2023, 7:28 PM

#

I'm guessing the b in y = x*w + b?

low musk Mar 23, 2023, 7:28 PM

#

bias in neural networks

low musk Mar 23, 2023, 7:28 PM

#

agile cobalt I'm guessing the `b` in `y = x*w + b`?

yeah

agile cobalt Mar 23, 2023, 7:28 PM

#

usually it is just a feature with value 1 for all records

keen loom Mar 23, 2023, 7:29 PM

#

thick viper

commit

mint palm Mar 23, 2023, 7:29 PM

#

scarlet kite beginner question. what is the bias used for? also is it automatically added or ...

yeah added automatically but can be turned off

low musk Mar 23, 2023, 7:29 PM

#

mint palm yeah added automatically but can be turned off

what does added automatically mean

thick viper Mar 23, 2023, 7:29 PM

#

keen loom commit

What?

agile cobalt Mar 23, 2023, 7:29 PM

#

without it, you would always get y=0 for Xs = [0, 0, 0, 0, 0, ..., 0, 0, 0]
some machine learning libraries might add it automatically for you, while others may require for you to add it yourself

mint palm Mar 23, 2023, 7:29 PM

#

low musk what does added automatically mean

usually its wx+b but we can do wx

#

am i hired, lmao?

keen loom Mar 23, 2023, 7:30 PM

#

thick viper What?

nvm

low musk Mar 23, 2023, 7:30 PM

#

mint palm usually its ``wx+b`` but we can do ``wx``

oh

#

i thought it was just some random value u had to choose

#

in my project the value of bias didnt matter much 🤔

scarlet kite Mar 23, 2023, 7:32 PM

#

thanks

low musk Mar 23, 2023, 7:32 PM

#

hmm I didnt use any library i so thats why i had to select a random value

agile cobalt Mar 23, 2023, 7:32 PM

#

the bias "feature" always has a value of 1
the bias weight is initialised randomly and learned by the network, just like all other feature weights
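A tiny sketch of that equivalence in plain Python. The function names and numbers are made up; the point is only that `w . x + b` and "append a constant 1 to the inputs and give it its own weight" compute the same thing.

```python
def predict_with_bias(weights, bias, xs):
    """y = w . x + b"""
    return sum(w * x for w, x in zip(weights, xs)) + bias


def predict_with_constant_feature(weights, xs):
    """Same model, but the bias is the weight of an extra input fixed at 1."""
    return sum(w * x for w, x in zip(weights, xs + [1.0]))


weights, bias = [0.5, -2.0], 0.25
xs = [4.0, 1.0]

print(predict_with_bias(weights, bias, xs))
print(predict_with_constant_feature(weights + [bias], xs))

# With all-zero inputs the output is exactly the bias; without a bias it
# would always be 0, whatever the weights are.
print(predict_with_bias(weights, bias, [0.0, 0.0]))
```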

low musk Mar 23, 2023, 7:33 PM

#

idk what is feature

agile cobalt Mar 23, 2023, 7:33 PM

#

one of the names for your input data

low musk Mar 23, 2023, 7:33 PM

#

oh

#

someone said I am not implementing something, can you explain what it is? i didnt watch the video they suggested yet https://media.discordapp.net/attachments/1086687200516788315/1088500719037976657/image.png?width=1440&height=240

#

gradient descent

#

i didnt implement it

#

but my program still works

#

98% of the time it guesses it correctly

#

what else matters?

#

but it improved?

agile cobalt Mar 23, 2023, 7:38 PM

#

there are dozens if not hundreds of different ways to measure how well a model is doing

low musk Mar 23, 2023, 7:38 PM

#

when u start training it has accuracy 0.5

#

eventually reaches 0.98

agile cobalt Mar 23, 2023, 7:39 PM

#

!paste can you show your code?

arctic wedgeBOT Mar 23, 2023, 7:39 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

low musk Mar 23, 2023, 7:39 PM

#

https://github.com/tusharhero/perceptron

GitHub

GitHub - tusharhero/perceptron: The Perceptron (👁️-👁️)

The Perceptron (👁️-👁️). Contribute to tusharhero/perceptron development by creating an account on GitHub.

scarlet kite Mar 23, 2023, 7:40 PM

#

is an activation function used after the last layer of the network?

low musk Mar 23, 2023, 7:41 PM

#

idk what that means yet

scarlet kite Mar 23, 2023, 7:41 PM

#

i was just asking because i dont know the answer

low musk Mar 23, 2023, 7:41 PM

#

oh i thought u were asking about my code

#

😅

agile cobalt Mar 23, 2023, 7:42 PM

#

not sure if it counts as actual gradient descent, but I guess that this bit trains it

arctic wedgeBOT Mar 23, 2023, 7:42 PM

#

algorithm.py lines 135 to 145

if product + bias > 0:
    predict_shape = shapes[0]
    if shape != predict_shape:
        weight = addition(weight, image_list, +1)
else:
    predict_shape = shapes[1]
    if shape != predict_shape:
        weight = addition(weight, image_list, -1)

if shape == predict_shape:
    correct_guesses += 1

agile cobalt Mar 23, 2023, 7:42 PM

#

but +1 on their suggestion to use numpy

low musk Mar 23, 2023, 7:42 PM

#

is it neccesary

agile cobalt Mar 23, 2023, 7:43 PM

#

would probably run at least 10~100x faster, and if you want to actually work with data science or do anything even hobby level of seriousness you'll 100% need to use numpy and friends

low musk Mar 23, 2023, 7:43 PM

#

arctic wedge `algorithm.py` lines 135 to 145 ```py if product + bias > 0: predict_shape =...

this part tweaks the weight
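For comparison, a minimal sketch of the classic perceptron learning rule, which that snippet resembles. This is not the code from the linked repository: the function names, the toy data (logical OR) and the +1/-1 labels are all assumptions made for the demo. One difference worth noting is that the bias is updated during training too instead of being picked by hand.

```python
def train_perceptron(samples, epochs=20):
    """Classic perceptron rule: only update when a sample is misclassified."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for xs, target in samples:  # target is +1 or -1
            activation = sum(w * x for w, x in zip(weights, xs)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != target:
                # Nudge the weights toward (or away from) the input.
                weights = [w + target * x for w, x in zip(weights, xs)]
                bias += target  # the bias is learned as well
    return weights, bias


def predict(weights, bias, xs):
    """Return +1 or -1 depending on which side of the boundary xs falls."""
    return 1 if sum(w * x for w, x in zip(weights, xs)) + bias > 0 else -1


# Toy, linearly separable data (logical OR), made up for the demo.
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(data)
print(weights, bias)
print([predict(weights, bias, xs) for xs, _ in data])
```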

low musk Mar 23, 2023, 7:44 PM

#

agile cobalt would probably run at least 10~100x faster, and if you want to actually work wit...

its already fast but for my next project I definitely will

#

how do I learn it properly is there a beginner's book for this

agile cobalt Mar 23, 2023, 7:45 PM

#

I'd recommend taking a course like Andrew Ng's machine learning introduction on coursera, or at least following something like 3Blue1Brown's videos or https://course.fast.ai

low musk Mar 23, 2023, 7:46 PM

#

i am just starting linear algebra at school 💀

#

how do they identify more than 2 things

agile cobalt Mar 23, 2023, 7:48 PM

#

watch it to find out /s

low musk Mar 23, 2023, 7:49 PM

#

/s ?

agile cobalt Mar 23, 2023, 7:49 PM

#

sarcasms

low musk Mar 23, 2023, 7:49 PM

#

ohh

agile cobalt Mar 23, 2023, 7:50 PM

#

usually you'll do something like
1-10 how much of a circles is it
1-10 how much of a rectangle is it
1-10 how much of a triangle is it

low musk Mar 23, 2023, 7:50 PM

#

agile cobalt usually you'll do something like ``` 1-10 how much of a circles is it 1-10 how m...

how do i get these values in my current program?

agile cobalt Mar 23, 2023, 7:51 PM

#

you don't | you rewrite a lot of things

low musk Mar 23, 2023, 7:51 PM

#

how do they get those values

agile cobalt Mar 23, 2023, 7:51 PM

#

actually seriously this time, watch the video to find out

low musk Mar 23, 2023, 7:52 PM

#

yeah no good night

#

i have school tom

scarlet kite Mar 23, 2023, 7:52 PM

#

is an activation function used after the last layer of the network?

#

or just for hiudde

#

hidden

agile cobalt Mar 23, 2023, 7:52 PM

#

low musk yeah no good night

to put it simply, it is not simple 🤷
do prioritise school though

mild dirge Mar 23, 2023, 7:55 PM

#

scarlet kite is an activation function used after the last layer of the network?

Usually after every layer

#

And not every layer in the same model needs to have the same activation

#

You will often see hidden layers each having ReLU activation, and the last layer Softmax f.e.
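A short sketch of those two activations in plain Python. The class labels and raw scores are made up; in a real network the scores would be the outputs of the last layer. This is also roughly how a model picks between more than two classes: one score per class, softmax to turn them into probabilities, then take the largest.

```python
import math


def relu(values):
    """Hidden-layer activation: negative values are clipped to 0."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Output activation: turns raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for three classes.
labels = ["circle", "rectangle", "triangle"]
scores = [2.0, 0.5, -1.0]
probs = softmax(scores)

print(relu([-1.5, 0.0, 3.0]))
print(probs)
print(labels[probs.index(max(probs))])  # the class with the highest probability
```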

scarlet kite Mar 23, 2023, 8:09 PM

#

and when i use random data for a network, is there a way of seeing the predictions vs. the real data?

#

@mild dirge

edgy falcon Mar 24, 2023, 1:24 AM

#

Hi! someone can help me with this error in a Transformer XL layer on tensorflow:
TypeError: tf__call() missing 1 required positional argument: 'relative_position_encoding'

Here's the layer

    vocab_size=140,
    num_layers=6,
    hidden_size=256,
    num_attention_heads=30,
    head_size=5,
    inner_size=30,
    dropout_rate=0.2,
    attention_dropout_rate=0.2,
    initializer="glorot_uniform",
    two_stream=True,
    tie_attention_biases=True,
    memory_length=30,
    reuse_length=30,
    inner_activation='relu'
)(embedding_1)

hasty mountain Mar 24, 2023, 2:37 AM

#

edgy falcon Hi! someone can help me with this error in a Transformer XL layer on tensorflow:...

I suppose you probably have to pass a tuple (embedding_1, relative_positional_encoding) instead of just (embedding_1)

edgy falcon Mar 24, 2023, 2:40 AM

#

I'll try it, thank u bro

brittle pivot Mar 24, 2023, 3:25 AM

#

I have a pandas dataframe with a time column, a power column and a frequency column. The power and frequency are measured at 0.05s intervals. The frequency that is measured is based on a target frequency, where the frequency will jump to a value and then be held for 60 seconds. how can I split the dataframe based on these frequency jumps?

serene scaffold Mar 24, 2023, 3:40 AM

#

brittle pivot I have a pandas dataframe with a time column, a power column and a frequency col...

What do you mean by split?

brittle pivot Mar 24, 2023, 3:42 AM

#

I want to detect when the frequency changes, i.e. f1 and take a slice from index[0] to index[f1], then from index[f1] to index[f2] and so on
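One way to sketch that, shown on a plain list so it runs without pandas. The readings and the tolerance are made up, and `split_on_jumps` is a hypothetical helper: since a measured frequency is noisy, a new segment only starts when the value changes by more than the tolerance from one sample to the next.

```python
def split_on_jumps(values, tolerance):
    """Return (start, stop) index pairs; a new segment starts at each big jump."""
    if not values:
        return []
    starts = [0]
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > tolerance:
            starts.append(i)
    stops = starts[1:] + [len(values)]
    return list(zip(starts, stops))


# Made-up readings: held near 50, then 55, then 60, with small noise.
frequency = [50.0, 50.1, 49.9, 55.0, 55.1, 54.9, 55.0, 60.0, 60.1]
segments = split_on_jumps(frequency, tolerance=1.0)
print(segments)
for start, stop in segments:
    print(frequency[start:stop])
```

In pandas the same idea is usually written as a group id, something like `(df["frequency"].diff().abs() > tolerance).cumsum()` passed to `groupby` (assuming the column is called `frequency`), which yields one sub-dataframe per held value.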

lofty dagger Mar 24, 2023, 5:25 AM

#

import plotly.graph_objects as go

lat = ["22.290222"]
lon = ["73.167065"]

fig = go.Figure(go.Scattermapbox(
    lat=lat,
    lon=lon,
    mode="markers",
    marker=go.scattermapbox.Marker(
        size=10,
        color='red'
    ),
    text=['Location'],
))

fig.update_mapboxes(style="open-street-map")
fig.write_html("/tmp/temp.html")

anyone knows how i would change the shape of the marker to that of a bus?

untold flicker Mar 24, 2023, 5:47 AM

#

Hi I'm just trying to create a time-series neural network. I don't know what format to put time into my data? Do I convert it into seconds and input my data as a 3-D tensor, samples, seconds, features or do I keep it in date and time format

lapis sequoia Mar 24, 2023, 7:18 AM

#

How can I remove the dependent variable from my list of Xs? Here is what I got so far

#

                                paste(feature.names, 
                                      collapse = ' + ')))

#

Type ~ RI + Na + Mg + Al + Si + K + Ca + Ba + Fe + Type

#

Nvm I figured it out

clever summit Mar 24, 2023, 8:59 AM

#

Hello

#

Can you help me?

#

I'm keeping on hitting this error:

~\AppData\Local\Temp\ipykernel_7264\806691498.py in <module>
     27 incCount5=0
     28 incCount_reset=0
---> 29 start_time=time.time()
     30 
     31 net = cv2.dnn.readNetFromDarknet(model_config,model_weights)

AttributeError: 'float' object has no attribute 'time'

What did i do wrong? What should i do?

wooden sail Mar 24, 2023, 9:16 AM

#

clever summit I'm keeping on hitting this error: ```AttributeError ...

it looks like you imported the time module, but somewhere in your code you created a variable called time and assigned a float to it. this essentially destroyed your imported time module

#

call the variable or the module a different name
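A minimal reproduction of that situation, as a sketch (the variable names are made up; the original notebook code isn't shown here):

```python
import time

start_time = time.time()  # fine: `time` is still the module

time = time.time()  # oops: the name `time` now refers to a float

error_message = None
try:
    time.time()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# One fix: give the value its own name and leave `time` for the module.
import time  # re-binds the name to the module again

elapsed = time.time() - start_time
print(elapsed)
```

In a notebook the bad assignment may live in an earlier cell, so the name stays overwritten until the import cell is run again or the kernel is restarted.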

mild dirge Mar 24, 2023, 9:18 AM

#

Yeah, we're solving it in their help channel

clever summit Mar 24, 2023, 10:19 AM

#

Hello, i need help again

#

This time about opencv dnn error

#

error                                     Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7264\3346177036.py in <module>
     87     #classIds, confs, bbox = net.detect(img,confThreshold=thres)
     88     print(classIds,bbox)
---> 89     blob=cv2.dnn.blobFromImage(img,1/255,(wght_hght_target,wght_hght_target),[0,0,0,0],1,crop=False)
     90     net.setInput(blob)
     91     LayerNames=net.getLayerNames()

error: OpenCV(4.7.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\resize.cpp:4062: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

#

I am currently using yolov3-320.cfg as config and yolov3-320.weights as weight

#

What is happening here?

queen cradle Mar 24, 2023, 1:23 PM

#

untold flicker Hi I'm just trying to create a time-series neural network. I don't know what for...

You should usually not keep this kind of time data in date and time format. Converting to seconds since an epoch is fine; or to milliseconds, microseconds, or nanoseconds. I would only recommend date and time formats if you need to know the original timezone; but I haven't been in a situation where I needed to worry about that much, so others may have a more informed opinion.
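A small sketch of that conversion with the standard library. The timestamps are made up and are assumed to be in UTC; with real data the format string and timezone would need to match your source.

```python
from datetime import datetime, timezone

# Made-up timestamps; they are assumed to be in UTC for this example.
raw = ["2023-03-24 13:23:00", "2023-03-24 13:23:30", "2023-03-24 13:25:00"]

parsed = [
    datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    for text in raw
]

# Seconds since the Unix epoch: plain numbers a model can consume.
epoch_seconds = [dt.timestamp() for dt in parsed]

# Often more convenient: seconds relative to the first sample.
relative_seconds = [s - epoch_seconds[0] for s in epoch_seconds]

print(epoch_seconds)
print(relative_seconds)
```

The relative form keeps the numbers small, which tends to be friendlier for a network than raw epoch values in the billions.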

low musk Mar 24, 2023, 1:57 PM

#

https://discord.com/channels/267624335836053506/1088823771617566822

silent spade Mar 24, 2023, 2:40 PM

#

Has anyone tried deploying a NLP model that uses nltk wordnet or stopwords to AWS Lambda?

serene scaffold Mar 24, 2023, 3:26 PM

#

silent spade As anyone tried deploying a NLP model that uses nltk wordnet or stopwords to AWS...

your question probably has more to do with Lambda than nltk. but you should always ask complete questions that someone who knows the answer can start answering right away.

silent spade Mar 24, 2023, 4:02 PM

#

ok my apologies.

I have built a lambda function that uses NLTK to preprocess text before being used in my classification model. The function needs to use NLTK's stopwords, punkt and wordnet libraries to work. I am having issues with the lambda function being able to download the libraries upon execution. Everything works fine locally, but when deployed to AWS it doesnt download the files to the right directory. Has anyone come across this issue before?

dusty valve Mar 24, 2023, 4:11 PM

#

I wanted to display a cnn layer weights in mplt,the shape is (3,3,3,32), can i do it?

agile cobalt Mar 24, 2023, 4:13 PM

#

silent spade ok my apologies. I have built a lambda function that uses NLTK to preprocess t...

usually you would avoid doing anything particularly heavy in serverless environments
if you can, try modifying the install location so that you can just download locally and include it as part of your source code so that it can just read from disk without having to download anything later

silent spade Mar 24, 2023, 4:28 PM

#

I have tried to change the install location to a /tmp/ directory, but the function doesnt want to search that directory for the libraries.

arctic wedgeBOT Mar 24, 2023, 4:53 PM

#

Rules

9. Do not offer or ask for paid work of any kind.

low musk Mar 24, 2023, 5:24 PM

#

should I run some random apk from a discord server 🤔

serene scaffold Mar 24, 2023, 5:34 PM

#

You can't recruit for paid opportunities or business projects here, so please remove your message.

#

!warn 807551900417130537 We've asked you before not to recruit for projects like this. This is your last warning about this, so please contact @sonic vapor if you need any further clarification about what is or isn't appropriate.

arctic wedgeBOT Mar 24, 2023, 5:35 PM

#

:incoming_envelope: :ok_hand: applied warning to @shell sequoia.

midnight grotto Mar 24, 2023, 6:01 PM

#

how tough is it to make some sort of ai that simply has to choose between apis to use for results on the basis of the text provided?

tidal bough Mar 24, 2023, 6:02 PM

#

"some sort of ai" can mean a lot of things, including "an if statement" :p

#

depends what kind of accuracy you want, and what kind of task you have in detail.

iron basalt Mar 24, 2023, 6:17 PM

#

https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/

Stephen Wolfram Writings

Stephen Wolfram

ChatGPT Gets Its “Wolfram Superpowers”!

Wolfram plugin gives computationally accurate answers to ChatGPT queries using Wolfram's data knowledgebase and language. Custom visualizations are given as well. Stephen Wolfram explains how it works.

#

A bit late on this news, but this is something nice.

wooden sail Mar 24, 2023, 6:18 PM

#

oh that's fantastic

tidal bough Mar 24, 2023, 6:19 PM

#

heh, I saw this suggested as a possible way to improve gpt4's capabilities, like, yesterday

#

since it's not good at math, but pretty good at delegating to tools

iron basalt Mar 24, 2023, 6:19 PM

#

If it could read your input files / directories it could be a full DS tool.

wooden sail Mar 24, 2023, 6:19 PM

#

yeah this makes a lot more sense. basically use it to translate your natural language queries to formal mathematical ones and back

iron basalt Mar 24, 2023, 6:20 PM

#

(And remote databases that you point it at)

bright pasture Mar 24, 2023, 6:53 PM

#

ValueError: num_samples should be a positive integer value, but got num_samples=0

I'm trying to run something called Lora_SVC, and it gives me this error when I try to train.

edgy falcon Mar 24, 2023, 7:16 PM

#

Hi! I need some help please. I'm passing the argument relative_position_encoding to a Transformer XL layer (in tensorflow) like this:
relative_position_encoding=(None, 300, 256)
But i get the next error:

Dimension value must be integer or None or have an index method, got value 'TensorShape([])' with type '<class 'tensorflow.python.framework.tensor_shape.TensorShape'>'

So whats wrong?

misty flint Mar 24, 2023, 10:28 PM

#

silent spade ok my apologies. I have built a lambda function that uses NLTK to preprocess t...

yes. it is a pain. you should try to build a custom container image with the libraries already inside.

#

kekHands aws

hasty mountain Mar 25, 2023, 1:56 AM

#

Guys, when preprocessing text data for training a Transformer model, should I add a <Start-Of-Sentence> token to my target sentence?
So at the first iteration in a sentence, the model must predict the <SOS> token before predicting any actual word?

#

It feels a bit weird, since the <SOS> token is inserted by default during inference...

hasty mountain Mar 25, 2023, 4:13 AM

#

Uh... I suppose when the Transformer is implemented correctly, vanishing gradients isn't that much of a problem?

glad otter Mar 25, 2023, 4:39 AM

#

I have a log file that may or may not contain error lines. I want to write code that can understand each error line and print just one line for each unique error, with no duplicates.

Example

Input file :
Leakage value 1.2 for circuit 1 is greater than the standard

1)Leakage value 0.9 for circuit 2 is greater than standard

2)Capacitance is huge in circuit 3

3)Capacitance is huge in circuit 4

4)Capacitance is huge in circuit 5

5)Capacitance is huge in circuit 6

6)High delay in circuit 7

Output: must be the unique ignoring instance information like circuit number or certain value

Leakage value 1.2 for circuit 1 is greater than the standard
Capacitance is huge in circuit 3
High delay in circuit 7

The log file may contain more than 10000 errors, but there might be just 10 unique errors, as shown in the output.

Can anyone suggest a library, or a place to start from? Is it possible to make code clever enough to determine these things?
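
One possible starting point, sketched with only the standard library: mask the instance details (numbers) in each line to get a message template, then keep the first line seen per template. The regular expressions, the `<N>` placeholder and the helper names are assumptions made for illustration. Lines that differ in wording rather than in numbers (for example "than standard" versus "than the standard") would not be merged by this and would need some fuzzy matching on top, for instance with `difflib`.

```python
import re

# Assumption: instance details are plain integers or decimals in the message.
NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Assumption: lines may start with a list marker such as "3)".
LIST_MARKER = re.compile(r"^\s*\d+\)\s*")


def template_of(line: str) -> str:
    """Reduce a log line to its message template by masking numbers."""
    text = LIST_MARKER.sub("", line.strip())
    return NUMBER.sub("<N>", text)


def unique_errors(lines):
    """Return the first line seen for each distinct template, in input order."""
    seen = set()
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key = template_of(line)
        if key not in seen:
            seen.add(key)
            result.append(LIST_MARKER.sub("", line.strip()))
    return result


sample = [
    "Leakage value 1.2 for circuit 1 is greater than the standard",
    "1)Leakage value 0.9 for circuit 2 is greater than the standard",
    "2)Capacitance is huge in circuit 3",
    "3)Capacitance is huge in circuit 4",
    "6)High delay in circuit 7",
]
for line in unique_errors(sample):
    print(line)
```

Because only a set of templates is kept in memory, this should scale to logs with many thousands of lines.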

keen kestrel Mar 25, 2023, 4:57 AM

#

Can anyone suggest a vendor that offers VMs with GPUs like V100 / A10 / A100 at a decent price? This is for my personal learning on training deep learning models on public data, no privacy / enterprise features necessary.

misty flint Mar 25, 2023, 5:28 AM

#

keen kestrel Anyone can suggest me which vendor that offers VM with gpu like V100 / A10 / A10...

check this resource out https://fullstackdeeplearning.com/cloud-gpus/

Cloud GPUs Comparison Table

Detailed comparison table of cloud GPU providers for deep learning.

#

courtesy of the FSDL folks

#

blobpray 🥞

#

thanks josh tobin and charles frye and fam

keen kestrel Mar 25, 2023, 5:38 AM

#

misty flint check this resource out https://fullstackdeeplearning.com/cloud-gpus/

Thanks, did not know someone put compilation of vendors

hasty mountain Mar 25, 2023, 5:47 AM

#

hasty mountain Uh... I suppose when the Transformer is implemented correctly, vanishing gradien...

Indeed...when I use the correct hyperparameters, things tend to get better...not perfect and far from chatGPT, but still py_guido

misty flint Mar 25, 2023, 5:47 AM

#

keen kestrel Thanks, did not know someone put compilation of vendors

you can thank the Full Stack DL folks

#

DoggoKek

#

their online course is also really good

obsidian moth Mar 25, 2023, 8:24 AM

#

First of all, hi. I'd like to know what role Python plays in data science, and how?

How do I become a data scientist using the Python language?

white sun Mar 25, 2023, 9:03 AM

#

maximum knowledge required in DS&A (PYTHON) for Data Sc.?

white sun Mar 25, 2023, 9:03 AM

#

white sun maximum knowledge required in DS&A (PYTHON) for Data Sc.?

#1035199133436354600

wooden sail Mar 25, 2023, 9:06 AM

#

wdym by "maximum knowledge"?

mint palm Mar 25, 2023, 9:32 AM

#

I had an interview yesterday, went well.
One anomalous question was following:

HIM: why F1 is HM?
ME: To punish score even if one of precision or recall is low even when other might be high.
HIM: so why cant we use F1=precision X recall

Now, I wasn't able to come up with an explanation. He told me it had something to do with Harmonic motion that we learn in high school.
Does anyone know why F1 can't be precision X recall?

mild dirge Mar 25, 2023, 10:58 AM

#

It's the harmonic mean between precision and recall

#

https://en.wikipedia.org/wiki/Harmonic_mean

Harmonic mean

In mathematics, the harmonic mean is one of several kinds of average, and in particular, one of the Pythagorean means. It is sometimes appropriate for situations when the average rate is desired.
The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations. As a simple example, t...

#

And this is probably what they meant with that motion:

#

@mint palm

#

Here you can see the resulting f1 score (yellow is 1, dark blue is 0) for the harmonic mean, and simply multiplying them
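
A small numeric sketch of the same comparison, for readers without the plots. The function names are made up for illustration. Both formulas punish a low precision or recall, but the harmonic mean stays on the same scale as its inputs (equal precision and recall of x give a score of x), while the plain product gives x squared.

```python
def f1_harmonic(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


def product_score(precision: float, recall: float) -> float:
    """The alternative from the interview question: plain multiplication."""
    return precision * recall


for p, r in [(1.0, 1.0), (0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    print(f"P={p} R={r} harmonic={f1_harmonic(p, r):.3f} product={product_score(p, r):.3f}")
```

With precision and recall both at 0.5, the harmonic mean reports 0.5 while the product reports 0.25, which would make a mediocre-but-balanced classifier look much worse than either of its component scores.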

queen vector Mar 25, 2023, 11:16 AM

#

Hello, I am using requests-html for web scraping, but when I encounter a <div class=""> no children are returned, although when inspecting the page there is an img tag. How do I get that img tag's src data?

edgy falcon Mar 25, 2023, 3:36 PM

#

Hi! I need some help please. I'm passing the argument relative_position_encoding to a Transformer XL layer (in tensorflow) like this:
relative_position_encoding=(None, 300, 256)
But i get the next error:

Dimension value must be integer or None or have an index method, got value 'TensorShape([])' with type '<class 'tensorflow.python.framework.tensor_shape.TensorShape'>'

So whats wrong?

light quiver Mar 25, 2023, 4:14 PM

#

hey,does anyone use simpy in simulations???

hasty mountain Mar 25, 2023, 4:56 PM

#

Hey guys, I'm trying to make a vectorizer model. The model has 4 fully connected layers, receives features from an image and then generates a vector. It works fine for generating vectors with dimensions (Batch, vector_size).
However, I want to generate 2 dimensional vectors to see how things will work and also to be able to plot the model performance and visualize how things are going(like it's done with PCA and tSNE), but I don't know how to do this without getting an output where the first vector number will be exactly equal to the second.
My code looks like this:

x = self.neuronA(x)
x = self.Relu(x)
x = self.neuronB(x)
x = self.Relu(x)
x = self.neuronC(x)
x = self.Relu(x)
features_embedding = self.neuronD(x)

return features_embedding

I want to make something similar to Pytorch/Keras embedding layer:

import torch
from torch import nn

test = torch.randint(0, 10, (1, 5)) # (Batch, n_features)

embed = nn.Embedding(10, 10) # 10 possible indices, 10 embedding dimensions

out = embed(test) # (Batch, n_features, embedding_dimension) = (1, 5, 10)

Any tip or suggestion?

#

(ChatGPT suggested that I simply reshape my output, but this doesn't seem to make sense mathematically)

mild dirge Mar 25, 2023, 5:01 PM

#

Well, reshaping is the first thing I thought of too. It will generate 50 output values. You can interpret it as a 1d vector, or 5x10 f.e.

#

So what do you want different from a 2d output than from a 1d with same number of elements?

#

And with PCA you would normally get a 1d vector output with 2 elements. Such that you can plot it as a 2d x-y graph
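
To make the reshaping point concrete, here is a plain-Python sketch (no PyTorch needed). The helper name and the stand-in values are assumptions for illustration: the same 50 numbers can be read as a flat vector or as a 5 x 10 grid, and reshaping does not change any value.

```python
def reshape_2d(values, rows, cols):
    """View a flat list as rows x cols without changing any value."""
    if len(values) != rows * cols:
        raise ValueError("rows * cols must equal the number of values")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


flat = [i / 10 for i in range(50)]  # stand-in for 50 network outputs
grid = reshape_2d(flat, 5, 10)      # same 50 numbers, viewed as 5 x 10

# Flattening the grid again gives back exactly the original values.
assert [v for row in grid for v in row] == flat
print(len(grid), len(grid[0]))
```

For a 2D scatter plot, on the other hand, what is needed is an output with just two elements per sample, which is a choice about the size of the last layer rather than about reshaping.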

hasty mountain Mar 25, 2023, 5:04 PM

#

mild dirge So what do you want different from a 2d output than from a 1d with same number o...

I was thinking about something like a spatial representation, where the first element, the 5 there, would be the X coordinate (in a space between -1 and 1, where the closer the value is to -1, the more it's related to idea A, and the closer it is to 1, the more it's related to non-A).
The same would go for the 10, the Y coordinate.

#

I think the correct term would be "spatial vector representation", or something like that...

mild dirge Mar 25, 2023, 5:05 PM

#

So an embedding where you want similar inputs to be close in the output space as well

hasty mountain Mar 25, 2023, 5:05 PM

#

mild dirge And with PCA you would normally get a 1d vector output with 2 elements. Such tha...

Yes, that's what I want. Should I simply apply reshape to my 1d vector to get 2 elements?

mild dirge Mar 25, 2023, 5:05 PM

#

Yes, you could just make the output a 1d vector of two elements

hasty mountain Mar 25, 2023, 5:06 PM

#

pithink

#

Oh, ok. Now that I think about it...it's a bit like how we do to create images with linear layers... We get a 1-d output, and simply apply reshape to get a 2-d or 3-d array

mild dirge Mar 25, 2023, 5:07 PM

#

You want the output to be an image?

hasty mountain Mar 25, 2023, 5:07 PM

#

Apply dimensionality reduction to a dimensionality reduction?

hasty mountain Mar 25, 2023, 5:07 PM

#

mild dirge You want the output to be an image?

No, it's just a comparison

#

Ok then... That was easier than I expected. Thanks!

boreal gale Mar 25, 2023, 5:42 PM

#

could you type out the expected output by hand in form of a dataframe please?

#

and can i assume column Two prefix will match One? i.e. if Two is 'A-123' then One must be A?

boreal gale Mar 25, 2023, 6:02 PM

#

okay perfect, gimme a moment!

#

okay, long story short is that there is just no out of the box way to do this merge natively using just the toolbox pandas provides

doing any naive merge and then filling in the blanks seems to be making it harder on yourself.

i first assume you know how to truncate Two from df_data into the string before ;, i will call this truncated Two
imo, your best bet would be then to compute the correct join key in your df_data first, i.e. first check if the corresponding truncated Two exists in df_keys, if it does, great, use that truncated Two as is, if not then the join key would be None/NaN (since your wildcard join is indicated by None/NaN)
(edit: by join key, i meant one part of the actual join condition you will be using, namely how you match up the Twos from both dataframe, since One is already known to be equal from both dataframe, we pay no extra attention to it)

all together this would be

import numpy as np
import pandas as pd

df_data['truncated_two'] = df_data['Two'].str.split(';').str[0]  # > i first assume you know how to truncate Two from df_data into the string before ;, i will call this truncated Two

df_data['joinable_two'] =  np.where(
    df_data['truncated_two'].isin(df_keys['Two']),  # >  first check if the corresponding truncated `Two` exists in `df_keys`
    df_data['truncated_two'],  # > if it does, great, use that truncated `Two` as is
    None  # > if not then the join key would be `None`/`NaN` (since your wildcard join is indicated by `None`/`NaN`)
)
    

pd.merge(
    df_keys,
    df_data,
    left_on=['One', 'Two'],
    right_on=['One', 'joinable_two'],
    how='right',
)[['One', 'Two_x', 'Target', 'Total']]

iron basalt Mar 25, 2023, 6:14 PM

#

mint palm I had an interview yesterday, went well. One anomalous question was following: ...

If you play around with it I think it becomes immediately apparent why harmonic mean is chosen. Sometimes math is about crafting a function that behaves how you want / looks right (which graphics programmers do for their job). https://www.desmos.com/calculator/sdldtw1zy7

Desmos

Desmos | Graphing Calculator

agile cobalt Mar 25, 2023, 6:15 PM

#

there's also the alternative of building a multiindex instead of using merge() / join() but I probably shouldn't really recommend it:
import pandas as pd
...
df_data["Two"] = df_data["Two"].str.split(";", n=1).str[0]
mapping = df_keys.set_index(["One", "Two"])["Target"]
keys_to_map = pd.MultiIndex.from_frame(df_data[["One", "Two"]])

values = keys_to_map.map(mapping)

result = df_data.assign(Target=values.fillna(0))
print(result)

primal linden Mar 25, 2023, 6:41 PM

#

This is the last cell of a project I've been working on in jupyter notebook. I added it specifically because a blogger said it only requires pandas/numpy, and I have no other visualizations in the notebook. Upon running it in a virtual environment it turns out the blogger lied to me and it requires matplotlib.

My question is, do you fine folks think it is worth including matplotlib just for this one, rinky dink visualization, or should I just remove it altogether because the values are already discussed in the cells prior?

tidal bough Mar 25, 2023, 6:44 PM

#

You need matplotlib for background_gradient? huh, weird

#

oh, I guess it's for the colormap.

primal linden Mar 25, 2023, 6:46 PM

#

I'll try it without the cmap.

tidal bough Mar 25, 2023, 6:46 PM

#

looking at the source code, it uses matplotlib unconditionally

#

https://github.com/pandas-dev/pandas/blob/v1.5.3/pandas/io/formats/style.py#L3930-L3936

arctic wedgeBOT Mar 25, 2023, 6:46 PM

#

pandas/io/formats/style.py lines 3930 to 3936

with _mpl(Styler.background_gradient) as (plt, mpl):
    smin = np.nanmin(gmap) if vmin is None else vmin
    smax = np.nanmax(gmap) if vmax is None else vmax
    rng = smax - smin
    # extend lower / upper bounds, compresses color range
    norm = mpl.colors.Normalize(smin - (rng * low), smax + (rng * high))
    from pandas.plotting._matplotlib.compat import mpl_ge_3_6_0

tidal bough Mar 25, 2023, 6:47 PM

#

IMO, matplotlib is so common you might as well install it. Your choice, though, it's not like the gradient is even very noticeable here on a 2x2 table.

primal linden Mar 25, 2023, 6:48 PM

#

I appreciate the input, as well as others'!

tawdry ruin Mar 25, 2023, 8:25 PM

#

Is anyone interested to do leetcode questions together starting from easy level?
We can do by our own approaches and then have a discussion on concepts!

serene scaffold Mar 25, 2023, 8:53 PM

#

tawdry ruin Is anyone interested to do leetcode questions together starting from easy level?...

Try #algos-and-data-structs

thin palm Mar 25, 2023, 10:36 PM

#

https://astrobitez.com/

Astro Bitez

Astro Bitez - Unique and Delicious Snacks from Around the World

Discover the world's most unique and delicious snacks at Astro Bites! From savory to sweet, our selection of exotic snacks will take your taste buds on a journey you'll never forget. Shop now and satisfy your cravings for something different!

queen cradle Mar 25, 2023, 10:50 PM

#

thin palm https://astrobitez.com/

!rule 6

arctic wedgeBOT Mar 25, 2023, 10:50 PM

#

Rules

6. Do not post unapproved advertising.

frank sinew Mar 26, 2023, 12:12 AM

#

What are some ways I can implement a bot to my game using an Ai (training and usage)? The background is that each client is in control of an ev3 mindstorm robot, but the robot can also run on ai if there are no players, doing stuff like moving around in the real world and shoot other robots. The data the robots have is the position of the other robots which i get from the aruco markers in opencv from the camera pointing down on all of them

#

I am limited to one main phone camera in third person, which points down on the aruco marker on top of each robot. Each robot also has a camera in front (first person).

worldly dawn Mar 26, 2023, 12:23 AM

#

frank sinew What are some ways I can implement a bot to my game using an Ai (training and us...

I would suggest to downscope your problem if that's your first foray into that area

#

Like maybe starting simple with a small 2d simulation

karmic valley Mar 26, 2023, 1:25 AM

#

is chatgpt good to write code

serene scaffold Mar 26, 2023, 2:00 AM

#

karmic valley is chatgpt good to write code

sometimes it produces correct results, and sometimes it produces mostly-correct results. but if it produces mostly-correct results, and you have no idea which part to fix, then it doesn't really help.

#

A lot of people overrate its abilities.

edgy falcon Mar 26, 2023, 2:14 AM

#

How can i solve this error:
Dimension value must be integer or None or have an index method, got value 'TensorShape([])' with type '<class 'tensorflow.python.framework.tensor_shape.TensorShape'>'

On this: relative_position_encoding=(None, 300, None)

is how im passing the argument to a Transformer XL layer on tensorflow

charred light Mar 26, 2023, 2:33 AM

#

serene scaffold sometimes it produces correct results, and sometimes it produces mostly-correct ...

Probably should just have this as a bot command lmao

serene scaffold Mar 26, 2023, 2:33 AM

#

charred light Probably should just have this as a bot command lmao

I'll just make a selfbot to reply that every time someone says chatgpt /s

supple pine Mar 26, 2023, 3:10 AM

#

serene scaffold I'll just make a selfbot to reply that every time someone says chatgpt /s

Ask chatgpt to code it

serene scaffold Mar 26, 2023, 3:12 AM

#

supple pine Ask chatgpt to code it

I asked chatgpt to code a discord bot thing a few weeks ago, and it was surprisingly good, but it had some extra lines that were crap. I think its training data included non-overlapping versions of dpy

supple pine Mar 26, 2023, 3:13 AM

#

serene scaffold I asked chatgpt to code a discord bot thing a few weeks ago, and it was surprisi...

I think the type of task discord bots do it’s well suited for

white pendant Mar 26, 2023, 3:25 AM

#

serene scaffold I asked chatgpt to code a discord bot thing a few weeks ago, and it was surprisi...

GPT-4 can do independent reasoning. It's not blindly fetching code.

#

For example

#

I was just asking if it could help me design a version of Simpletron that ran on a 640 x 480 screen

#

Here is a fragment of the answer it came up with

#



First, let's lay out a plan for how the display and the memory should be organized for the 640x480 resolution:

Display: The Simpletron display will have a resolution of 640x480 pixels. Each pixel can be represented by a single memory address in Simpletron's memory. To store the entire display, we need 640x480 = 307,200 memory addresses. You can choose to use 16-bit or 32-bit memory addresses, depending on your requirements.

Memory: Simpletron typically has a fixed amount of memory, for example, 1000 memory addresses. However, for our design, we need to accommodate 307,200 memory addresses for the display alone. You can choose to expand Simpletron's memory to fit the display requirements or use a separate memory space for the display.

Now, let's outline some basic instructions for the Simpletron display:

SET_PIXEL: This instruction will set a pixel at the specified X and Y coordinates to a specific color. The operand will contain the X, Y, and color information.

CLEAR_SCREEN: This instruction will clear the entire screen by setting all pixels to a specific color. The operand will contain the color information.

DRAW_LINE: This instruction will draw a line between two points with a specific color. The operand will contain the X1, Y1, X2, Y2, and color information.

DRAW_RECT: This instruction will draw a rectangle with a specified color at the given X, Y coordinates, and dimensions. The operand will contain the X, Y, width, height, and color information.

#

This is something I asked about that is not present anywhere (or not anywhere that I know of)

#

And yet, look how intricate this is

queen cradle Mar 26, 2023, 3:32 AM

#

Sure, it's intricate. But it says you can choose to use 16-bit memory addresses when you have 307,200 things to store.

white pendant Mar 26, 2023, 3:32 AM

#

queen cradle Sure, it's intricate. But it says you can choose to use 16-bit memory addresses ...

It's not 307,200 things to store

#

It's 307,200 memory addresses

queen cradle Mar 26, 2023, 3:32 AM

#

To store the entire display, we need 640x480 = 307,200 memory addresses.

white pendant Mar 26, 2023, 3:33 AM

#

The 307,200 memory addresses is for simulating the display

queen cradle Mar 26, 2023, 3:33 AM

#

Each pixel can be represented by a single memory address in Simpletron's memory.

white pendant Mar 26, 2023, 3:34 AM

#

Yes, that's correct

#

It's an 8-bit pixel

queen cradle Mar 26, 2023, 3:34 AM

#

So you need 307,200 bytes. But 16-bit addresses can only index 65,536 separate items.

#

ChatGPT is clearly confused about arithmetic here.

white pendant Mar 26, 2023, 3:35 AM

#

Um...

white pendant Mar 26, 2023, 3:35 AM

#

queen cradle So you need 307,200 bytes. But 16-bit addresses can only index 65,536 separate i...

It's you who is confused

#

Yes, you need 307,200 memory addresses for the memory itself

#

But each memory address can hold a 16-bit value

queen cradle Mar 26, 2023, 3:36 AM

#

That only gets you to 131,072 bytes.
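
The arithmetic in this exchange can be checked in a few lines. This is only a sketch of the numbers quoted above; the helper names are made up for illustration.

```python
def addressable_locations(address_bits: int) -> int:
    """How many distinct locations an address of this width can index."""
    return 2 ** address_bits


def min_address_bits(locations: int) -> int:
    """Smallest address width that can index the given number of locations."""
    return (locations - 1).bit_length()


pixels = 640 * 480                        # one location per pixel
with_16_bits = addressable_locations(16)  # locations reachable with 16-bit addresses
bytes_if_16_bit_cells = with_16_bits * 2  # even if every cell holds 2 bytes

print(pixels, with_16_bits, bytes_if_16_bit_cells, min_address_bits(pixels))
```

Under these assumptions, a 16-bit address space falls well short of the 307,200 locations, whether each cell holds one byte or two.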

white pendant Mar 26, 2023, 3:39 AM

#

Q: What is the math for the display again? Can you please clarify? If each memory address stores an 8-bit address, how come the total reserved is 307,200 bytes?

A: Apologies for the confusion earlier. You are correct that if each memory address stores an 8-bit value, the calculation for the total memory reserved for the display would be different. Let me clarify the math for the display:

For a 640x480 resolution display, there are a total of 640 x 480 = 307,200 pixels. If we use 8 bits per pixel (256 colors), each pixel will require 1 byte of memory.

Now, if each memory address stores an 8-bit value (1 byte), then we would need one memory address per pixel. Therefore, the total memory required for the display would be:

307,200 pixels * 1 byte per pixel = 307,200 bytes

So, in this case, the total memory reserved for the display would still be 307,200 bytes (or 307.2 KB), as each pixel requires 1 byte of memory, and there are 307,200 pixels in a 640x480 resolution display.

Please note that this calculation assumes an 8-bit color depth. If you use a different color depth, such as 16-bit (65,536 colors) or 24-bit (16.7 million colors), the memory requirements for the display would change accordingly.

#

@queen cradle How's that?

queen cradle Mar 26, 2023, 3:42 AM

#

It doesn't acknowledge that it made a mistake in claiming that you could use a 16-bit address space. Though to be fair, you didn't specifically ask it about that. Also to be fair, it wouldn't matter to me if you did. ChatGPT isn't good with arithmetic; there are plenty of examples of this, and yours is just one more.

#

I think I've said all I have to say here.

white pendant Mar 26, 2023, 3:42 AM

#

@queen cradle The mistake was on me though, not on GPT

#

Because it originally worded as such:

#

Display: The Simpletron display will have a resolution of 640x480 pixels. Each pixel can be represented by a single memory address in Simpletron's memory. To store the entire display, we need 640x480 = 307,200 memory addresses. You can choose to use 16-bit or 32-bit memory addresses, depending on your requirements.

#

I suppose the last paragraph could be reworded to add: "Please note my calculation assumes 8-bit color depth. If you choose a different resolution, your requirements will change."

queen cradle Mar 26, 2023, 3:44 AM

#

Okay.

edgy falcon Mar 26, 2023, 4:53 AM

#

Hi!, how can i solve this error:
Dimension value must be integer or None or have an index method, got value 'TensorShape([])' with type '<class 'tensorflow.python.framework.tensor_shape.TensorShape'>'

On this: relative_position_encoding=(None, 300, None)

is how im passing the argument to a Transformer XL layer on tensorflow

whole gazelle Mar 26, 2023, 7:48 AM

#

Hi! Has anybody here worked with YOLOv8? I'm trying to save the values in xywh format using the save_txt=True CLI argument, but the output is currently what I assume to be normalized:

2 0.839807 0.165415 0.12882 0.0654616
24 0.850087 0.551329 0.193253 0.089764
2 0.840522 0.179473 0.128972 0.088213
0 0.535866 0.103385 0.0689186 0.0563577
2 0.898594 0.135384 0.202476 0.0797681
0 0.364594 0.115743 0.08385 0.058203
2 0.957544 0.171258 0.0844107 0.0878528
2 0.0187968 0.179325 0.0375859 0.107676
2 0.935403 0.13661 0.128447 0.0786718
0 0.80963 0.272964 0.101042 0.200643
0 0.471424 0.116067 0.212462 0.208825
0 0.686469 0.351023 0.275836 0.676815
0 0.310915 0.28648 0.20025 0.400011
0 0.245834 0.285782 0.200336 0.39382
0 0.533272 0.447903 0.397006 0.667959
0 0.090301 0.226236 0.180229 0.413001
36 0.442348 0.725502 0.411456 0.163376
36 0.642305 0.625472 0.274271 0.159934

#

Unless some of you know how to convert this to the xywh format that i need
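
If the rows are in the usual YOLO label layout (class, then x center, y center, width and height as fractions of the image size), converting them to pixels is a multiplication by the image dimensions. The sketch below works under that assumption; the function name and the 640 x 480 image size are hypothetical, so substitute the real size of each image.

```python
def denormalize_line(line: str, img_w: int, img_h: int):
    """Convert 'class xc yc w h' with 0-1 values into pixel units."""
    parts = line.split()
    cls, xc, yc, w, h = parts[:5]  # ignore any extra columns, e.g. a confidence
    return (
        int(cls),
        float(xc) * img_w,  # x center in pixels
        float(yc) * img_h,  # y center in pixels
        float(w) * img_w,   # box width in pixels
        float(h) * img_h,   # box height in pixels
    )


# Hypothetical image size; the first row is taken from the output above.
print(denormalize_line("2 0.839807 0.165415 0.12882 0.0654616", 640, 480))
```

If a top-left corner is needed instead of the center, subtract half the width from the x value and half the height from the y value.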

serene scaffold Mar 26, 2023, 11:26 AM

#

white pendant GPT-4 can do independent reasoning. It's not blindly fetching code.

I know it's not blindly fetching code. That doesn't mean that it's infallible.

silent spade Mar 26, 2023, 12:58 PM

#

misty flint yes. it is a pain. you should try to build a custom container image with the lib...

Yeah I managed to figure it out. I did create a docker image with the required packages. The issue was that upon execution, the function will download the stop words and other NLTK libraries needed for preprocessing. It tried downloading the files to directories that can’t be modified for some reason. So I had to have it download to a temp directory and manually point the NLTK function to look in that particular directory. It was a pain
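
A rough sketch of that workaround, for anyone hitting the same problem. The `/tmp/nltk_data` location, the package tuple and the helper name are assumptions for illustration; `nltk.download(..., download_dir=...)` and the `nltk.data.path` search list are the pieces doing the work.

```python
import os

# Assumption: /tmp is the writable location inside the function's environment.
NLTK_DIR = os.path.join("/tmp", "nltk_data")
PACKAGES = ("stopwords", "punkt", "wordnet")


def ensure_nltk_data(packages=PACKAGES, target=NLTK_DIR):
    """Download NLTK resources into target and make NLTK search there."""
    import nltk  # imported lazily so this module loads even without nltk

    os.makedirs(target, exist_ok=True)
    if target not in nltk.data.path:
        nltk.data.path.append(target)  # tell NLTK where to look
    for name in packages:
        nltk.download(name, download_dir=target, quiet=True)
    return target
```

Calling `ensure_nltk_data()` once at module import time, outside the handler, means warm invocations reuse whatever is already in the directory. Baking the data into the image at build time avoids the download at run time altogether.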

serene scaffold Mar 26, 2023, 1:29 PM

#

silent spade Yeah I managed to figure it out. I did create a docker image with the required p...

you can have the code to do that in the Dockerfile, like RUN python -c "import nltk; nltk.download('punkt')"

hasty mountain Mar 26, 2023, 1:53 PM

#

white pendant ```Sure! I can help you get started with designing a version of Simpletron that ...

All this...and it still can't help me make decent GANs hyperlemon

#

Or make a Diffusion model

mild dirge Mar 26, 2023, 1:54 PM

#

Whats the point of this part in a pytorch dataset? isn't index just always an integer?

hasty mountain Mar 26, 2023, 1:54 PM

#

Also...do diffusion models work with audio data, for audio generation? pithink

hasty mountain Mar 26, 2023, 1:56 PM

#

mild dirge Whats the point of this part in a pytorch dataset? isn't index just always an in...

Honestly, I never used that, but I guess that's because, for a batch of size N, the dataloader will call __getitem__() N times

#

So, if your batch has size 8, dataloader will be like

batch = []
for i in range(8):
  batch.append(dataset.__getitem__(i))  # equivalent to dataset[i]

mild dirge Mar 26, 2023, 1:57 PM

#

Right, but in that case it would be an integer

#

So why would it ever expect a tensor

hasty mountain Mar 26, 2023, 1:58 PM

#

Hm... maybe because of .iloc?

#

pithink

#

Yeah, I don't know either

wooden sail Mar 26, 2023, 2:04 PM

#

that would be my guess as well

#

for example in numpy and pandas, indexing with a list yields different results from indexing with another numpy array

#

and you may compute indices using other pytorch functions

mild dirge Mar 26, 2023, 2:17 PM

#

I've been writing this entire dataset for pytorch

import glob
import os
import random

from torch.utils.data import Dataset

class VegetableDataset(Dataset):
    def __init__(self, dataset_path, nr_images_per_class=None, transform=None):
        """
        :param dataset_path: Path to the dataset containing all images
        :param nr_images_per_class: The number of images per class that are loaded
        :param transform: transform to be applied to samples
        """
        # Initialize some instance variables
        self.dataset_path = dataset_path
        self.nr_images_per_class = nr_images_per_class
        self.transform = transform

        # Compose a dict with a list of paths for each image class
        image_path_dict = {}
        for class_path in glob.glob(os.path.join(dataset_path, '*')):
            class_image_paths = list(glob.glob(os.path.join(class_path, '*')))
            random.shuffle(class_image_paths)

            if nr_images_per_class is not None:
                class_image_paths = class_image_paths[:nr_images_per_class]

            class_name = os.path.basename(os.path.normpath(class_path))
            image_path_dict[class_name] = class_image_paths

        # Put the images of the dict into a single list together with the labels
        self.paths_and_labels = []
        for idx, class_name in enumerate(sorted(image_path_dict.keys())):
            for image_path in image_path_dict[class_name]:
                self.paths_and_labels.append((image_path, idx))

        # Shuffle this list to randomize the order of images fed to
        random.shuffle(self.paths_and_labels)

    def __len__(self):
        return len(self.paths_and_labels)

    def __getitem__(self, idx):
        image_path, label = self.paths_and_labels[idx]
        image = ...

#

Turns out, I can just use this

from torchvision.datasets import ImageFolder

dataset = ImageFolder('vegetable_data/')

#

Does anyone know if I can make it so it only grabs x nr of images per class, instead of all of them?

wooden sail Mar 26, 2023, 2:28 PM

#

you'd then use a dataloader

#

imagefolder does not load the contents to memory

#

check out this https://pytorch.org/tutorials/beginner/basics/data_tutorial.html and this https://debuggercafe.com/pytorch-imagefolder-for-training-cnn-models/ for some examples on dataloaders with imagefolder

#

you can tell the dataloader to load some amount (batch size) of images each time, selected at random from a source of images (imagefolder)

#

these dataloaders usually include augmentation capabilities btw. tensorflow has something similar as well

mild dirge Mar 26, 2023, 2:37 PM

#

Yeah, but I want to limit the "entire dataset" to only contain x nr of images per class. Not just adjust the batch size when loading the images in

#

I want to train multiple cnns and then combine the results using some election rules, so I want each one to be trained on some random set of images

#

But I already wrote the custom dataset, I'll just continue that so I can personalize it anyways

wooden sail Mar 26, 2023, 2:41 PM

#

i'm pretty sure there should be some parameter for that, but i don't use pytorch so i don't know which one. the rule of thumb is that, if it seems like a common enough problem, it already has a solution 😛

serene scaffold Mar 26, 2023, 2:42 PM

#

wooden sail i'm pretty sure there should be some parameter for that, but i don't use pytorch...

you use tensorflow, then? or jax?

wooden sail Mar 26, 2023, 2:42 PM

#

i use jax, but never for this sort of stuff. i usually generate my own data synthetically and rarely work on measured data

#

i did do some tensorflow courses at some point but i've never used it for anything myself other than the usual mnist, fashion-mnist, hand signs, etc that everyone does while learning

#

off the top of my head, a solution could be to make a list of dataloaders per class, but that only makes sense if you first split the data into directories per class

#

seems you can write a sampler function for dataloader and pass that as an arg too https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler but yeah, since you already made your tool 😛
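One way to cap the number of images per class without a custom dataset is to pick the indices up front and hand them to `torch.utils.data.Subset` or `torch.utils.data.SubsetRandomSampler`. The sketch below only does the index selection in plain Python, so it runs without PyTorch. The helper name `sample_indices_per_class` is hypothetical, and the `targets` list stands in for the `targets` attribute of an `ImageFolder` (one class index per image).

```python
import random
from collections import defaultdict


def sample_indices_per_class(targets, n_per_class, seed=None):
    """Return shuffled dataset indices with at most n_per_class per label."""
    rng = random.Random(seed)

    # Group the dataset indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(targets):
        by_class[label].append(index)

    # Randomly keep up to n_per_class indices from every class.
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        chosen.extend(indices[:n_per_class])

    rng.shuffle(chosen)
    return chosen


# Stand-in for ImageFolder(...).targets: one class index per image.
targets = [0] * 10 + [1] * 3 + [2] * 7
subset_indices = sample_indices_per_class(targets, n_per_class=5, seed=0)

# With PyTorch installed, these indices could then be wrapped as:
#   from torch.utils.data import Subset
#   small_dataset = Subset(dataset, subset_indices)
```

Using a different seed per classifier would give each one its own random subset, which fits the ensemble idea mentioned earlier.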

mild dirge Mar 26, 2023, 2:50 PM

#

Yeah I suppose it doesn't even matter that the classes are perfectly balanced, I could just iterate through a set nr of batches per classifier. But it's good to know that there is already stuff out there for image datasets.

serene scaffold Mar 26, 2023, 2:51 PM

#

is Jax more explicit than pytorch when it comes to adjusting the weights? because I find loss.backward() and optim.step() to be weird and implicit.

wooden sail Mar 26, 2023, 2:53 PM

#

as explicit as you like

#

you can compute the gradients and do whatever you like with them before updating the parameters

#

jax by default is just numpy with jit and autodiff

#

there's also the optax module that works much like pytorch and tf. you tell it which optimizer you like and it handles the rest
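The explicit pattern described above (compute the gradients, modify them, then update the parameters yourself) can be sketched in plain Python. This is not JAX code: the gradient is written by hand where JAX would derive it through autodiff (for example with `jax.grad`), and the model, data, clipping limit, and learning rate are made up for illustration.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_loss(w, xs, ys):
    """Hand-written derivative of loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def clip(value, limit):
    """Clamp value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with w = 2
w = 0.0
learning_rate = 0.05

for step in range(200):
    g = grad_loss(w, xs, ys)      # 1. compute the gradient explicitly
    g = clip(g, limit=1.0)        # 2. do anything you like with it
    w = w - learning_rate * g     # 3. update the parameter yourself
```

Nothing is hidden in an optimizer object here: every step between the gradient and the new parameter value is visible in the loop.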

serene scaffold Mar 26, 2023, 2:56 PM

#

is it at least as fast as pytorch?

wooden sail Mar 26, 2023, 2:56 PM

#

should be comparable

serene scaffold Mar 26, 2023, 2:56 PM

#

hmm, maybe I'll try it for my next project that uses neural networks

wooden sail Mar 26, 2023, 2:56 PM

#

that's in general a terrible idea, i think

serene scaffold Mar 26, 2023, 2:57 PM

#

why