arctic wedgeBOT Apr 27, 2023, 6:14 PM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

lapis sequoia Apr 27, 2023, 6:15 PM

#

tweet_df['Neutral_count'] = tweet_df['sentiment'].apply(lambda x: 1 if x == 'Neutral' else 0)
tweet_df['Positive_count'] = tweet_df['sentiment'].apply(lambda x: 1 if x == 'Positive' else 0)
tweet_df['Negative_count'] = tweet_df['sentiment'].apply(lambda x: 1 if x == 'Negative' else 0)
tweet_df.head()

#

It's simply supposed to keep track of the number of positive, neutral, and negative values in the sentiment column

serene scaffold Apr 27, 2023, 6:15 PM

#

can you do print(tweet_df['sentiment'].head())?

lapis sequoia Apr 27, 2023, 6:16 PM

#

past meteor Apr 27, 2023, 6:16 PM

#

Will you endlessly tune on validation?

serene scaffold Apr 27, 2023, 6:16 PM

#

lapis sequoia

Please do not show screenshots of text anymore.

lapis sequoia Apr 27, 2023, 6:16 PM

#

sure

serene scaffold Apr 27, 2023, 6:17 PM

#

Anyway, all you need to do is tweet_df['sentiment'].value_counts(). but you don't want to put that into tweet_df

#

because there aren't different sentiment counts for each row. it's counts for the whole dataframe.
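A minimal sketch of the `value_counts()` suggestion above, assuming a small made-up `tweet_df` (the sample rows are hypothetical). The result is a separate Series with one count per label, which is why it is kept outside `tweet_df`:

```python
import pandas as pd

# Hypothetical sample data standing in for the real tweet_df.
tweet_df = pd.DataFrame(
    {"sentiment": ["Positive", "Neutral", "Positive", "Negative", "Positive"]}
)

# One count per distinct label for the whole column, stored outside tweet_df.
sentiment_counts = tweet_df["sentiment"].value_counts()
print(sentiment_counts)
```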

lapis sequoia Apr 27, 2023, 6:18 PM

#

what about if i want to store cumsum

#

would this work
```py
tweet_df['positive_sent'] = (tweet_df['sentiment'] == 'Positive').cumsum()
tweet_df['negative_sent'] = (tweet_df['sentiment'] == 'Negative').cumsum()
tweet_df['neutral_sent'] = (tweet_df['sentiment'] == 'Neutral').cumsum()
```

#

to store the running sum

serene scaffold Apr 27, 2023, 6:19 PM

#

try it and see

#

tweet_df['sentiment'].eq('Positive').cumsum() -- this notation would also work. I think it looks cleaner, but that's just me.

lapis sequoia Apr 27, 2023, 6:20 PM

#

it didn't work, I'll try yours

serene scaffold Apr 27, 2023, 6:20 PM

#

it won't have a different result

past meteor Apr 27, 2023, 6:22 PM

#

Yeah but was this on your first go?

lapis sequoia Apr 27, 2023, 6:22 PM

#

serene scaffold it won't have a different result

would I also have to store it in a new df?

serene scaffold Apr 27, 2023, 6:23 PM

#

lapis sequoia would I also have to store it in a new df?

I don't think so.

vague briar Apr 27, 2023, 6:23 PM

#

how do I get WhatsApp seen data? With Beautiful Soup or Selenium?

lapis sequoia Apr 27, 2023, 6:23 PM

#

odd.

#

well, I can't post a screenshot to show my df head, but it's just all zeroes

serene scaffold Apr 27, 2023, 6:23 PM

#

lapis sequoia well, I can't post a screenshot to show my df head, but it's just all zeroes

you can show dataframes as text by doing print(df.head().to_dict('list'))

vague briar Apr 27, 2023, 6:24 PM

#

vague briar how do I get WhatsApp seen data? With Beautiful Soup or Selenium?

I tried with bs4 to get with class

#

But I can't

past meteor Apr 27, 2023, 6:24 PM

#

Worst case scenario, you've tuned to solve your validation set, haven't you?

#

Yes, but so long as you don't have completely separate data, you don't know if that'll be consistently lowering

lapis sequoia Apr 27, 2023, 6:26 PM

#

{'id': [1651643206474317824, 1651643242595729408, 1651643261784563719, 1651643301555064835, 1651643310782533632], 'user_name': ['MAKS_Diogenes', 'crypto__wire', 'eurexcoinLTD', 'BitcoinCourant', 'spaziocrypto'], 'text': ['Im thinking about making 50 seed phrases - place 10000 sats on each - then I will engrave the phrases on metal with… https://t.co/SR5Tdq6sY1', 'Crypto Winter Is Over |  Mass Bitcoin Adoption | Meme Coins Continue To ... https://t.co/rd1K6tb6sZ #BABYPEPE #memecoins #Bitcoin', '1: Bitcoin price is $29293.55 (0.66% 1h)\n2: Ethereum price is $1906.26 (0.44% 1h)\n3: Tether price is $1.00 (-0.01%… https://t.co/qCEQZgdptU', 'An easy way to run your own Bitcoin node is by using Bitcoin Core\nhttps://t.co/NyAtlHbWTB', 'Pufff, banking system gone... #bitcoin https://t.co/zQSjQls5qx'], 'sentiment': [' Positive', ' Positive', ' Neutral', ' Neutral', ' Positive'], 'positive_sent': [0, 0, 0, 0, 0], 'negative_sent': [0, 0, 0, 0, 0], 'neutral_sent': [0, 0, 0, 0, 0]}

past meteor Apr 27, 2023, 6:26 PM

#

Like it might, but it might not

lapis sequoia Apr 27, 2023, 6:30 PM

#

```py
tweet_df['positive_sent'] = tweet_df['sentiment'].eq('Positive').cumsum()
tweet_df['negative_sent'] = tweet_df['sentiment'].eq('Negative').cumsum()
tweet_df['neutral_sent'] = tweet_df['sentiment'].eq('Neutral').cumsum()
print(tweet_df.head().to_dict('list'))
```
my code for this

lapis sequoia Apr 27, 2023, 6:31 PM

#

lapis sequoia

Is it because the datatype is an object and not a string?

lapis sequoia Apr 27, 2023, 7:13 PM

#

😦

#

I found the issue is that there was a space before the sentiment

#

'sentiment': [' Positive', ' Positive', ' Neutral', ' Neutral', ' Positive']
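A short sketch of the fix this points to, assuming the only problem is the leading space: `' Positive'` never equals `'Positive'`, so every comparison is `False` and the running sums stay at zero. The `object` dtype was likely not the cause, since that is how pandas commonly stores strings. The five values below are copied from the printed dict; everything else is illustrative.

```python
import pandas as pd

# Same sentiment values as in the printed dict, leading spaces included.
tweet_df = pd.DataFrame(
    {"sentiment": [" Positive", " Positive", " Neutral", " Neutral", " Positive"]}
)

# With the stray space, no row matches and the running sum is all zeros.
broken = tweet_df["sentiment"].eq("Positive").cumsum()

# Strip surrounding whitespace once, then compute the running counts.
tweet_df["sentiment"] = tweet_df["sentiment"].str.strip()
tweet_df["positive_sent"] = tweet_df["sentiment"].eq("Positive").cumsum()
tweet_df["negative_sent"] = tweet_df["sentiment"].eq("Negative").cumsum()
tweet_df["neutral_sent"] = tweet_df["sentiment"].eq("Neutral").cumsum()
print(tweet_df)
```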

tawdry flint Apr 27, 2023, 7:21 PM

#

is there a website to learn machine learning, which is free for Students?

past meteor Apr 27, 2023, 7:26 PM

#

An Introduction to Statistical Learning is the book I'd recommend. It's free, but the code examples and labs are in R. A Python version is coming out soon.

wooden sail Apr 27, 2023, 7:27 PM

#

tawdry flint is there a website to learn machine learning, which is free for Students?

if you have proof of your student status, you can apply to "financial aid" on coursera. this allows you to get the certificates for free. just keep in mind the application takes like 2 weeks to get processed

tawdry flint Apr 27, 2023, 7:29 PM

#

Thanks

lapis sequoia Apr 27, 2023, 8:00 PM

#

wooden sail if you have proof of your student status, you can apply to "financial aid" on co...

yes and for each "course" you have to apply lol

#

ex: https://www.coursera.org/professional-certificates/ibm-data-science

#

has 10 courses, so you apply 10 times

wooden sail Apr 27, 2023, 8:02 PM

#

indeed, but it's free

#

beggars can't be choosers

agile cobalt Apr 27, 2023, 8:05 PM

#

there are also fast.ai's and sklearn's courses, but like anything 100% free they have no certificates

edgy jacinth Apr 27, 2023, 8:20 PM

#

when it comes to my model rewriting its own pathways and creating neurons, how do I go about that?

naive peak Apr 27, 2023, 8:28 PM

#

are there good libraries to convert somewhat complex XLSX files into json?

#

was it like pandas or something?

hushed wave Apr 27, 2023, 8:59 PM

#

hi guys

#

# Set the path of the image folder
image_folder_path = "/content/gdrive/MyDrive/video_frames3"

# Define the list of emotions to detect
emotion_labels = ["neutral", "happy", "sad", "surprise", "angry", "fear", "disgust"]

# Create an empty DataFrame to store the emotion data
emotion_df = pd.DataFrame(columns=["Image"] + emotion_labels + ["Dominant Emotion"])

# Loop through the images in the folder
for image_filename in os.listdir(image_folder_path):
    if image_filename.endswith(".jpg") or image_filename.endswith(".png"):
        # Load the image using DeepFace and check if a face is detected
        image_path = os.path.join(image_folder_path, image_filename)
        detected_faces = DeepFace.extract_faces(image_path)
        
        if len(detected_faces) == 0:
            # If no face is detected, skip to the next image
            continue
        
        # Perform emotion detection using DeepFace
        emotions = DeepFace.analyze(image_path, actions=['emotion'])
        dominant_emotion = DeepFace.analyze(image_path, actions=["dominant_emotion"])
        
        # Append the emotion data to the DataFrame
        emotion_data = {"Image": image_filename}
        for label in emotion_labels:
            emotion_data[label] = emotions["emotion"].get(label)
        try:
            dominant_emotion_label = dominant_emotion[0].get("dominant_emotion")
        except:
            dominant_emotion_label = "None"
        emotion_data["Dominant Emotion"] = dominant_emotion_label
        emotion_df = emotion_df.append(emotion_data, ignore_index=True)

# Save the emotion data to a CSV file
emotion_df.to_csv("emotion_data.csv", index=False)

Any changes recommended for this because atm it's giving me this error

#

TypeError: list indices must be integers or slices, not str
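One possible reading of this error, offered as an assumption and not a confirmed diagnosis: recent DeepFace releases appear to return a list with one result dict per detected face from `DeepFace.analyze`, so `emotions["emotion"]` would index a list with a string. The sketch below does not call DeepFace at all. `build_row` is a hypothetical helper, and `fake_result` is a hand-written stand-in for the assumed result shape, in which `dominant_emotion` is a key of the `emotion` analysis and not a separate action.

```python
def build_row(image_filename, analysis, emotion_labels):
    """Flatten one analysis result into a flat dict (hypothetical helper)."""
    # Assumption: newer DeepFace versions return a list (one dict per face),
    # older ones a single dict. In the list case, use the first face.
    face = analysis[0] if isinstance(analysis, list) else analysis
    row = {"Image": image_filename}
    for label in emotion_labels:
        row[label] = face["emotion"].get(label)
    row["Dominant Emotion"] = face.get("dominant_emotion", "None")
    return row


emotion_labels = ["neutral", "happy", "sad", "surprise", "angry", "fear", "disgust"]

# Hand-written stand-in for an analysis result; not real DeepFace output.
fake_result = [
    {"emotion": {"neutral": 10.0, "happy": 85.0, "sad": 5.0}, "dominant_emotion": "happy"}
]

row = build_row("frame_001.jpg", fake_result, emotion_labels)
print(row)
```

Collecting such rows in a plain list and calling `pd.DataFrame(rows)` once at the end would also avoid `DataFrame.append`, which was removed in pandas 2.0.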

edgy jacinth Apr 27, 2023, 9:08 PM

#

Does anyone have a pre trained model to a certain extent I can use as a baseline for mine? Or willing to help me make one

thorn swift Apr 27, 2023, 11:42 PM

#

edgy jacinth Does anyone have a pre trained model to a certain extent I can use as a baseline...

for what?

edgy jacinth Apr 27, 2023, 11:42 PM

#

working ai assistant, just need some baseline

serene scaffold Apr 27, 2023, 11:45 PM

#

edgy jacinth Does anyone have a pre trained model to a certain extent I can use as a baseline...

you can download large language models like GPT2 from huggingface, but even if you did, it would be tons of work to create an AI assistant.

edgy jacinth Apr 27, 2023, 11:47 PM

#

serene scaffold you can download large language models like GPT2 from huggingface, but even if y...

call me stony hark!!!

serene scaffold Apr 27, 2023, 11:48 PM

#

edgy jacinth call me stony hark!!!

I will not do that. what is your motivation for wanting to create an AI assistant, and do you have any (fairly specific) examples of what you want it to do?

edgy jacinth Apr 27, 2023, 11:55 PM

#

serene scaffold I will not do that. what is your motivation for wanting to create an AI assistan...

yes, and one day I'll get the world to acknowledge me! uses are an integration into AR glasses hardware (requires the AI portion to be done ofc)

serene scaffold Apr 28, 2023, 12:22 AM

#

edgy jacinth yes, and one day I'll get the world to acknowledge me! uses are an integration in...

that's a fine goal. but you'll need to start smaller. an AI assistant that is actually useful would require a lot of components, and you probably don't know ML fundamentals.

edgy jacinth Apr 28, 2023, 12:24 AM

#

shiit they know

#

im workin on it parental figure believe in me!

round kettle Apr 28, 2023, 12:54 AM

#

Any advice for someone making a career pivot from civil engineering to the data sector? Finished my M.S. and have experience with computational modeling and HPC but want to score a role as a junior level data engineer, analyst, or some sort of developer in this industry. I know I have a lot to learn and I’m excited to do so, but just need some guidance on how to get started. Willing to DM LinkedIn or resume for context.

slim lance Apr 28, 2023, 12:59 AM

#

Maybe find a volunteer role to get experience? https://www.datakind.org/do-good-with-data

queen cradle Apr 28, 2023, 1:25 AM

#

https://learn.microsoft.com/en-us/office/open-xml/open-xml-sdk

round kettle Apr 28, 2023, 1:27 AM

#

@slim lance never thought of that, that’s a really clever idea!

lapis sequoia Apr 28, 2023, 1:41 AM

#

Can someone who's fluent with PySpark & data handling please ping me? I need to ask how to handle 120 GB worth of data in a JSON file using PySpark. I need to clean that data and then put it into my MongoDB, but I don't understand how I'd read so much data with PySpark.
So if someone's familiar with PySpark, please ping me (a DM would be better). It would be a major help.

slim lance Apr 28, 2023, 2:44 AM

#

round kettle @slim lance never thought of that, that’s a really clever idea!

The other thing to do is learn to drive the api of every single service you use with Python. e.g. - Gmail, Gsheets, Trello, Jira, Slack, Discord, etc. Not specific to DE/DS but great coding practice. (At least it has been for me.)

round kettle Apr 28, 2023, 2:50 AM

#

slim lance The other thing to do is learn to drive the api of every single service you use ...

I like this idea too 👍

rugged comet Apr 28, 2023, 2:56 AM

#

Code

df = spark.read.option("header", True).csv(*[f"/FileStore/tables/deck_data_{num}.csv" for num in range(500000, 2500001, 500000)])

Error

ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near '/': extra input '/'(line 1, pos 0)

== SQL ==
/FileStore/tables/deck_data_1000000.csv
^^^

What is going on here? I'm not using a SQL cell, I'm using a Python cell. This is in databricks by the way.

#

I'm trying to read multiple csv files at once by unpacking a list of file paths.
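A possible explanation, stated as an assumption: `DataFrameReader.csv` takes the path (a string or a list of strings) as its first argument and `schema` as its second, so unpacking the list with `*` would pass the second path as a schema string, which Spark then tries to parse as a DDL schema. That would match the error pointing at `deck_data_1000000.csv`, the second path. Passing the list itself avoids this. `read_deck_data` is a hypothetical wrapper and expects an existing `SparkSession`.

```python
# Build the same five paths as in the original snippet.
paths = [
    f"/FileStore/tables/deck_data_{num}.csv"
    for num in range(500000, 2500001, 500000)
]


def read_deck_data(spark):
    """Read all files into one DataFrame (hypothetical wrapper).

    The list is passed as a single argument, without * unpacking, so the
    remaining paths are not mistaken for the schema parameter.
    """
    return spark.read.option("header", True).csv(paths)


print(paths)
```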

slim lance Apr 28, 2023, 3:32 AM

#

Anyone have experience running Athena queries from python?

#

(I’m wondering if it’s worth the extra step)

lapis sequoia Apr 28, 2023, 7:27 AM

#

I have a dataset with data about 50 stores in US. I need to predict revenue for all of them at once. Dataset looks like this: 10 lines for 1 store, 15 for second, 35 for third etc. Which model should I use?

#

I have one Y and three X

cold osprey Apr 28, 2023, 7:27 AM

#

what is lines?

#

rows of data or?

lapis sequoia Apr 28, 2023, 7:28 AM

#

Rows in csv

cold osprey Apr 28, 2023, 7:28 AM

#

im guessing one per year or

lapis sequoia Apr 28, 2023, 7:28 AM

#

One per month

cold osprey Apr 28, 2023, 7:28 AM

#

u need to be clearer on what data u have

#

and u want to predict next month?

lapis sequoia Apr 28, 2023, 7:29 AM

#

Preferably upcoming 6 months

cold osprey Apr 28, 2023, 7:29 AM

#

would probably start with linear regression

#

quite little data

lapis sequoia Apr 28, 2023, 7:30 AM

#

But how I can do it for all stores at once?

#

I built a model in ARIMAX that is working only when I use dataset containing one store

cold osprey Apr 28, 2023, 7:33 AM

#

yeah ig it doesnt rly make sense to do all stores at once

#

unless u mean the totality?

lapis sequoia Apr 28, 2023, 7:33 AM

#

No, I need to see results separately

#

I know, but it’s a requirement from my boss

cold osprey Apr 28, 2023, 7:34 AM

#

hes enforcing that u only use one model?

#

it doesnt make sense

#

one store may have 5x less revenue than another lol

#

ig u can merge ur data together with some column denoting which store it is

#

seems like a bad idea if stores can have quite different revenues

lapis sequoia Apr 28, 2023, 7:35 AM

#

Or maybe I can do it with a loop? So I will be running predictions for one store only

cold osprey Apr 28, 2023, 7:36 AM

#

sure

#

not familiar with arimax

earnest widget Apr 28, 2023, 7:37 AM

#

Has anyone worked with T-SNE visualization? I am having a hard time trying to understand what it actually shows. I have two classes.

cold osprey Apr 28, 2023, 7:37 AM

#

do u have a train bit and also a test bit?

lapis sequoia Apr 28, 2023, 7:37 AM

#

Yes

cold osprey Apr 28, 2023, 7:37 AM

#

i would use a diff model for each store

#

so u would run fit for each store
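The "one fit per store, in a loop" idea could be sketched as below. This swaps in a plain straight-line trend over a month index (the linear regression starting point mentioned above), not ARIMAX, and it ignores the three X variables. The column names `store`, `month_idx`, and `revenue`, the function `forecast_per_store`, and the data are all hypothetical.

```python
import numpy as np
import pandas as pd


def forecast_per_store(df, horizon=6):
    """Fit a separate straight-line trend per store and extrapolate it."""
    forecasts = {}
    for store, group in df.groupby("store"):
        group = group.sort_values("month_idx")
        # Least-squares line through this store's own history only.
        slope, intercept = np.polyfit(group["month_idx"], group["revenue"], deg=1)
        last = group["month_idx"].max()
        future = np.arange(last + 1, last + 1 + horizon)
        forecasts[store] = slope * future + intercept
    return forecasts


# Hypothetical data: two stores with different history lengths.
df = pd.DataFrame({
    "store": ["A"] * 4 + ["B"] * 3,
    "month_idx": [0, 1, 2, 3, 0, 1, 2],
    "revenue": [100.0, 110.0, 120.0, 130.0, 20.0, 25.0, 30.0],
})

forecasts = forecast_per_store(df, horizon=2)
print(forecasts)
```

Because each store gets its own fit, differences in revenue scale between stores do not leak into each other's predictions, and the results stay separate per store.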

lapis sequoia Apr 28, 2023, 7:39 AM

#

But is there any model that would fit my needs? Let’s assume that stores are connected somehow and using all of them for training will give better predictions

cold osprey Apr 28, 2023, 7:42 AM

#

no idea

lapis sequoia Apr 28, 2023, 7:44 AM

#

I will try to google that again

#

Thank you! ❤️

modern belfry Apr 28, 2023, 8:04 AM

#

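```py
from numpy import ndarray
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# DataframeColumns is defined elsewhere in the poster's project
def similarity_matrix_blocking_code(class_dataframe) -> ndarray:
    # using sklearn algorithms
    tfidf_vectorize = TfidfVectorizer(stop_words='english')
    anime_matrix = tfidf_vectorize.fit_transform(class_dataframe[DataframeColumns.COMBINED_FEATURES])
    return cosine_similarity(anime_matrix)
```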

#

I have this code for basic similarity recommendations

#

the only problem is high cpu usage because 10-15k items are being processed here

#

which crashes my container on vps ;--;

#

any suggestions?
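One possible direction, assuming the crash comes from building the full dense N×N similarity matrix (not confirmed in the thread) and that only the top few matches per item are needed: compute similarities one block of rows at a time and keep just the indices of the best matches. `top_k_similar`, `k`, and `chunk_size` are hypothetical names. The sketch relies on `TfidfVectorizer` L2-normalising rows by default, so a dot product of two rows equals their cosine similarity.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def top_k_similar(texts, k=10, chunk_size=1000):
    """Return, for each text, the indices of its k most similar other texts."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    n = tfidf.shape[0]
    k = min(k, n - 1)
    top = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, chunk_size):
        # Only a (chunk_size x n) dense block exists at any one time.
        block = (tfidf[start:start + chunk_size] @ tfidf.T).toarray()
        rows = np.arange(block.shape[0])
        block[rows, start + rows] = -1.0  # never recommend an item to itself
        top[start:start + block.shape[0]] = np.argsort(-block, axis=1)[:, :k]
    return top


texts = [
    "naruto ninja village",
    "ninja village hokage",
    "cooking pasta recipe",
    "pasta recipe italian",
]
top = top_k_similar(texts, k=1, chunk_size=2)
print(top)
```

A smaller `chunk_size` trades speed for lower peak memory, which may matter more than CPU on a small VPS container.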

mild dirge Apr 28, 2023, 8:28 AM

#

!rule 6 @floral comet remove that message pls

arctic wedgeBOT Apr 28, 2023, 8:28 AM

#

Rules

6. Do not post unapproved advertising.

deft robin Apr 28, 2023, 8:32 AM

#

Hey folks, I'm looking for resources to build something in ML/AI with IoT sensor data. Any suggestions? Anomaly detection seems to be a common application. Any other applications? Currently planning to focus on Power consumption data or Temperature/Humidity data of a factory

cold osprey Apr 28, 2023, 8:45 AM

#

time series prediction?

#

demand prediction maybe

#

where are the IoT sensors located

deft robin Apr 28, 2023, 10:46 AM

#

cold osprey where are the IoT sensors located

A power sensor connected to a machine and Temp, Humidity, Air Quality sensor in the same room. It's just a scenario as I want to understand how AI/ML can be used in this situation

#

Why can't I just set a range of values that are OK and if the values start going out of that range I could consider it an anomaly and send an alert?

tidal bough Apr 28, 2023, 10:48 AM

#

If you have to set a range of values manually, that's not ML, that's just an if-statement 🙂

#

Anomaly detection is basically feeding a model a lot of normal data and having the model learn from it what the normal ranges for the values are.
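A minimal sketch of that idea, using scikit-learn's `IsolationForest` as one example detector (the thread does not name a specific model). The readings are synthetic, generated around the 500 W figure that comes up in the discussion, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical "normal" power readings: around 500 W with a little noise.
normal_watts = rng.normal(loc=500.0, scale=5.0, size=(1000, 1))

# The model only ever sees normal data and learns what it looks like.
model = IsolationForest(random_state=0).fit(normal_watts)

# predict returns 1 for inliers and -1 for outliers.
labels = model.predict(np.array([[500.0], [200.0]]))
print(labels)
```

No range was set by hand here; the boundary comes from the training data, which is the difference from a fixed if-statement.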

deft robin Apr 28, 2023, 10:51 AM

#

tidal bough If you have to set a range of values manually, that's not ML, that's just an if-...

Yeah, let's say a machine normally uses 500W. If the machine is not working properly it may use 200W. I can use an if-else to do this. I want to understand in what way ML can help me here

tidal bough Apr 28, 2023, 10:58 AM

#

Here's a short overview of how simple (non-neural-network) anomaly detection works: https://scikit-learn.org/stable/modules/outlier_detection.html

cold osprey Apr 28, 2023, 11:01 AM

#

I think it's to alert before something goes wrong

tidal bough Apr 28, 2023, 11:02 AM

#

Though since your data has timestamps attached, it'd be losing a lot of context to consider your data points independently: like, it might be that none of the measurements are weird on their own, but a specific sequence of several measurements in a row is anomalous. If you want to take that into account, you need anomaly detection on time series as shimmer mentioned, which I know little about (though apparently Azure has an implementation, https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/anomaly-detection)

cold osprey Apr 28, 2023, 11:02 AM

#

Say too high of a humidity and the machine won't run properly

earnest widget Apr 28, 2023, 12:27 PM

#

I am trying to do feature extraction and visualisation for an image dataset with classification in mind, does anyone have any ideas on any methods to go about this? So far I have used Kmeans clustering to highlight false positives and such but I am looking for other ways to visualize it.

cold osprey Apr 28, 2023, 1:02 PM

#

glass estuary Apr 28, 2023, 1:05 PM

#

Hi everyone! Could someone tell me if there are any repos, tools that I can use to generate images from text on windows with AMD gpu?

queen cradle Apr 28, 2023, 1:29 PM

#

earnest widget Has anyone worked with T-SNE visualization? I am having a hard time trying to un...

This output means you don't have much, if any, separation between your classes. You can try UMAP (sometimes it succeeds where t-SNE fails, and vice versa) but I wouldn't expect it to perform much better. Your classes are probably not separated well enough.

glacial iris Apr 28, 2023, 1:31 PM

#

https://tenor.com/view/0001-gif-25597406

Tenor

earnest widget Apr 28, 2023, 1:33 PM

#

queen cradle This output means you don't have much, if any, separation between your classes. ...

Oh okay, I just have two classes and I guess it is not able to classify well based on the graph? Also, the images are put into separate sub-folders for each class.

queen cradle Apr 28, 2023, 1:33 PM

#

deft robin Why can't I just set a range of values that are OK and if the values start going...

This is a more reliable solution than AI/ML. People sometimes think that AI/ML techniques are better because they're sophisticated. But they can also be more fragile.

If simply setting a range doesn't work, there is a lot of statistical literature on "statistical process control" or "statistical quality control". The material is quite classical at this point; it was developed starting about 100 years ago. It might give you a little more sensitivity while still remaining robust to ordinary variation.
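As a rough illustration of the statistical process control idea, the sketch below derives control limits from a baseline of known-good readings and not from hand-picked numbers. It is a simplified stand-in, assuming limits of the mean plus or minus three sample standard deviations; real control charts often estimate the spread differently. The function names and readings are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Lower and upper limits as mean -/+ n_sigma sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - n_sigma * spread, center + n_sigma * spread


def out_of_control(readings, limits):
    """Return the readings that fall outside the control limits."""
    low, high = limits
    return [x for x in readings if x < low or x > high]


# Hypothetical baseline of power readings (watts) from a healthy machine.
baseline_watts = [498.0, 502.0, 500.0, 499.0, 501.0, 500.0, 503.0, 497.0]

limits = control_limits(baseline_watts)
alerts = out_of_control([500.5, 200.0, 499.0, 512.0], limits)
print(limits, alerts)
```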

queen cradle Apr 28, 2023, 1:34 PM

#

earnest widget Oh okay, I just have two classes and I guess it is not able to classify well bas...

Yes, it looks like whatever method you used for classification has failed. You'll need to re-evaluate your methods and look for mistakes or ways in which they could be improved.

earnest widget Apr 28, 2023, 1:35 PM

#

queen cradle Yes, it looks like whatever method you used for classification has failed. You'l...

Thing is, I didn't even classify yet. I just wanted to extract the features and view it.

queen cradle Apr 28, 2023, 1:36 PM

#

This is labeled data?

earnest widget Apr 28, 2023, 1:36 PM

#

Yeah, it's images put into their respective class sub-folders: container and no_container.

#

I mean it's kind of my first time trying these visualisations out so that's why I am trying to figure it out.

queen cradle Apr 28, 2023, 1:38 PM

#

It sounds like the features you extracted aren't powerful enough to distinguish the two classes.

earnest widget Apr 28, 2023, 1:38 PM

#

queen cradle It sounds like the features you extracted aren't powerful enough to distinguish ...

Oh alright, maybe I should try a different model perhaps?

queen cradle Apr 28, 2023, 1:40 PM

#

earnest widget Oh alright, maybe I should try a different model perhaps?

Maybe? I don't know what you've tried or what's appropriate for your data. And feature engineering is a kind of art.

earnest widget Apr 28, 2023, 1:41 PM

#

queen cradle Maybe? I don't know what you've tried or what's appropriate for your data. And f...

Yeah true. I just tried YOLOv5s; maybe it is more suited to object detection instead. Maybe I should try a better-suited model like VGG or ResNet.

deft robin Apr 28, 2023, 1:41 PM

#

queen cradle This is a more reliable solution than AI/ML. People sometimes think that AI/ML t...

True, I was looking at a TinyML video on DigiKey's Youtube channel. They had to re-train the model whenever they moved the sensor because the data changes based on the location of the sensor. Seems a bit too much work for something that can be achieved in a simpler way.
I'm trying to understand if there is something I don't know or see about ML/AI

earnest widget Apr 28, 2023, 1:42 PM

#

queen cradle Maybe? I don't know what you've tried or what's appropriate for your data. And f...

But yeah feature engineering is quite hard ngl (at least for me).

surreal solstice Apr 28, 2023, 1:42 PM

#

Hey guys, long-time lurker here. I had a question about some pandas functionality.

#

Doesn't seem possible to groupby then aggregate a custom function over multiple columns?

#

Basically, I need to be able to define, in one groupby/agg statement, the sum & the division of two separate columns and save them as a new column, and then calculate the sum of other columns.

#

This way the sum & division, and the sums is/are done over the grouped variables.

boreal gale Apr 28, 2023, 1:45 PM

#

could you give us a concrete example?

cold osprey Apr 28, 2023, 1:47 PM

#

https://stackoverflow.com/questions/26812763/applying-a-custom-groupby-aggregate-function-to-output-a-binary-outcome-in-panda

#

some people forget google exists sometimes

boreal gale Apr 28, 2023, 1:48 PM

#

being snide doesn't really add to the conversation constructively 🙂

surreal solstice Apr 28, 2023, 1:51 PM

#

cold osprey https://stackoverflow.com/questions/26812763/applying-a-custom-groupby-aggregate...

I appreciate the sarcasm, but I've already looked at this.

surreal solstice Apr 28, 2023, 1:52 PM

#

boreal gale being snide doesn't really add to the conversation constructively 🙂

Yes -- just a moment ry.

cold osprey Apr 28, 2023, 1:53 PM

#

does each aggregation use the same group by columns?

#

or different

surreal solstice Apr 28, 2023, 1:53 PM

#

Unfortunately not. They are different

cold osprey Apr 28, 2023, 1:53 PM

#

then u can loop over them or smth

surreal solstice Apr 28, 2023, 1:53 PM

#

Just a moment, I'm typing up an example so I can't respond, thanks

#

df = pd.DataFrame({'location': ['backyard', 'store', 'bank', 'backyard', 'backyard', 'bank', 'store'],
                   'is_orange': [1, 1, 0, 0, 1, 0, 1],
                   'is_non_orange': [0, 0, 1, 1, 0, 1, 0],
                   'melons':     [73, 81, 94, 174, 23, 71, 65]})

lapis sequoia Apr 28, 2023, 1:57 PM

#

@serene scaffold @mighty patio Sorry for the late response. So i am designing with the ezdxf library a figure, i have seen that i can export that figure and save it because ezdxf has this functionality implemented. I can do that through matplotlib. Now i have also to make a report that has that figure inside. I wanted to surpass the step of saving the figure from the ezdxf script i have and load it into the report docx file. So in summary i want to keep the figure in memory without saving it localy and then load it directly into the docx file.

surreal solstice Apr 28, 2023, 1:58 PM

#

Alright, so given this DataFrame, what I'd like to do is something like this: sorry this is pseudocode

df.sort_values(['location']).groupby(['location']).agg(
    'total orange/non-orange' : df['is_orange'] + df['is_non_orange'],
    'percent_orange'          : df['is_orange'] / (df['is_orange'] + df['is_non_orange']),
    'sum_melons'              : sum(df['melons'])

cold osprey Apr 28, 2023, 1:59 PM

#

seems like u shud be able to define custom agg functions to use

#

and/or combining columns

surreal solstice Apr 28, 2023, 2:00 PM

#

Right, that's the idea. Basically, the requirements are forcing me to make this into one table. There are posts about custom agg functions for one column, but I unfortunately need this for multiple columns.

#

We are able to do this in legacy statistical software pretty easily, strangely enough.

cold osprey Apr 28, 2023, 2:01 PM

#

u can make the sums first

#

and then do like `df['percent_orange'] = df['is_orange'] / (df['is_orange'] + df['is_non_orange'])` ?

mild dirge Apr 28, 2023, 2:02 PM

#

surreal solstice Just a moment, I'm typing up an example so I can't respond, thanks

Btw, not to be that guy that says "just google it", but in my experience chatgpt is pretty good for this exact purpose of finding pandas functions to transform dataframes by giving an example and some explanation.

surreal solstice Apr 28, 2023, 2:02 PM

#

That's a great idea @mild dirge , I'll definitely look into it after this.

#

Thank you

surreal solstice Apr 28, 2023, 2:03 PM

#

cold osprey and then do like ```py df['percent_orange'] = df['is_orange'] / (df['is_orange'...

Yeah, this is exactly the kind of thing I was thinking as well: just define the complex stuff up front and just sum, right? But if the weights (in our case, location) are different, then that would lead to incorrect roll-ups.

cold osprey Apr 28, 2023, 2:03 PM

#

```
          is_orange  is_non_orange  melons  percent_orange
location
backyard          2              1     270        0.666667
bank              0              2     165        0.000000
store             2              0     146        1.000000
```

surreal solstice Apr 28, 2023, 2:04 PM

#

But that is exactly the kind of output I am looking for @cold osprey , yeah

boreal gale Apr 28, 2023, 2:04 PM

#

surreal solstice Alright, so given this DataFrame, what I'd like to do is something like this: so...

this is slightly odd.

df['is_orange'] + df['is_non_orange']
seems to be operation between two series, which returns a series
sum(df['melons'])
seems to be a scalar

how do you expect the result to look? basically i want to know what is the output after your desired aggregation

cold osprey Apr 28, 2023, 2:05 PM

#

i mean i got something

#

but not sure what u mean by the weights and rolling up

boreal gale Apr 28, 2023, 2:06 PM

#

oh i see, you just didn't add all the sums in, right?

surreal solstice Apr 28, 2023, 2:06 PM

#

That's a really good question @boreal gale . The best I can understand it looking at this legacy code, we want a sum of our melons grouped by location. So the bank would have 165 melons, and so on. Then, the df['is_orange'] + df['is_non_orange'], you are correct, this is my bad and it is an abuse of notation. I'd like to sum up each value within the groups.

#

So if I was trying to explain it, it would be: for each individual, sum up all of the is_orange and is_non_orange values pertaining to that location, and present each sum in the groupby table.

#

I hope that makes sense !

cold osprey Apr 28, 2023, 2:07 PM

#

yeah i think u can do it with sums and defining new columns based on those

boreal gale Apr 28, 2023, 2:08 PM

#

understood, sorry if i am being pedantic

cold osprey Apr 28, 2023, 2:08 PM

#

```
          is_orange  is_non_orange  melons  percent_orange  total orange/non-orange
location
backyard          2              1     270        0.666667                        3
bank              0              2     165        0.000000                        2
store             2              0     146        1.000000                        2
```
is my final output
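The code behind this table was not posted, so the following is a reconstruction under that caveat: sum every column per location first, then derive the ratio from the summed columns. Computing the percentage after the roll-up is what keeps it correct when locations have different numbers of rows.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["backyard", "store", "bank", "backyard", "backyard", "bank", "store"],
    "is_orange": [1, 1, 0, 0, 1, 0, 1],
    "is_non_orange": [0, 0, 1, 1, 0, 1, 0],
    "melons": [73, 81, 94, 174, 23, 71, 65],
})

# Step 1: plain sums per location.
summary = df.groupby("location")[["is_orange", "is_non_orange", "melons"]].sum()

# Step 2: derived columns computed from the already-summed values.
summary["total orange/non-orange"] = summary["is_orange"] + summary["is_non_orange"]
summary["percent_orange"] = summary["is_orange"] / summary["total orange/non-orange"]
print(summary)
```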

surreal solstice Apr 28, 2023, 2:08 PM

#

not at all. and noted @cold osprey , I'll take a look

surreal solstice Apr 28, 2023, 2:09 PM

#

boreal gale understood, sorry if i am being pedantic

Not at all, it was a great question. I was using badly-written pseudo code haha

#

In the meantime @cold osprey I will try to implement this. @boreal gale definitely let me know what you think.

#

Thanks everyone for your help

boreal gale Apr 28, 2023, 2:16 PM

#

surreal solstice Not at all, it was a great question. I was using badly-written pseudo code haha

!e i would use something like this, but it's worth remembering shimmer's point - precomputing anything where sensible on the global dataframe level (though i don't think there is any here)

import pandas as pd

df = pd.DataFrame({'location': ['backyard', 'store', 'bank', 'backyard', 'backyard', 'bank', 'store'],
                   'is_orange': [1, 1, 0, 0, 1, 0, 1],
                   'is_non_orange': [0, 0, 1, 1, 0, 1, 0],
                   'melons':     [73, 81, 94, 174, 23, 71, 65]})



def stats(df_subgroup):
    return pd.Series({
        'total_oranges': (df_subgroup['is_non_orange'] + df_subgroup['is_orange']).sum(),
        'melons': (df_subgroup['melons']).sum(),
    })


print(df.groupby('location').apply(stats))

arctic wedgeBOT Apr 28, 2023, 2:16 PM

#

@boreal gale :white_check_mark: Your 3.11 eval job has completed with return code 0.

          total_oranges  melons
location                       
backyard              3     270
bank                  2     165
store                 2     146

surreal solstice Apr 28, 2023, 2:17 PM

#

Ah, I didn't think to use apply or the custom function like this at all. So the Series object can in effect contain 2 series.

boreal gale Apr 28, 2023, 2:18 PM

#

the returned series in the custom function describe one row at a time like this

cold osprey Apr 28, 2023, 2:18 PM

#

not sure if there can be a speed up if u precompute sums then create new cols

#

may well be slower

#

i just deleted my code lel

surreal solstice Apr 28, 2023, 2:20 PM

#

yeah I'm unsure of how the groupby is optimized.

#

@boreal gale do you think your answer is worth posting on SO?

boreal gale Apr 28, 2023, 2:22 PM

#

there might very well be better solutions out there - this is just the way i prefer to do it
at the end of the day, if it solves a particular issue it's probably worth being posted 🤷

surreal solstice Apr 28, 2023, 2:22 PM

#

Got it, I can post my question then

cold osprey Apr 28, 2023, 2:22 PM

#

fair comparison or?

#

10x speedup

surreal solstice Apr 28, 2023, 2:23 PM

#

interesting, let me try it

cold osprey Apr 28, 2023, 2:23 PM

#

but need second dataframe or dropping old columns if new 'calculated' columns are placed in same dataframe

#

ah wait

#

lemme put the sum groupby in the same cell

#

unfair comparison

boreal gale Apr 28, 2023, 2:25 PM

#

i would run benchmark on the actual problem instead of in this microbenchmark-esque way, but 10x does sound significant

surreal solstice Apr 28, 2023, 2:27 PM

#

agree

#

I think they're both fair shakes at the problem

boreal gale Apr 28, 2023, 2:28 PM

#

how did you get df2?

cold osprey Apr 28, 2023, 2:28 PM

#

boreal gale how did you get `df2`?

just created it

#

empty dataframe

boreal gale Apr 28, 2023, 2:28 PM

#

could you give me code for doing that?

#

literally empty? no index no column?

surreal solstice Apr 28, 2023, 2:29 PM

#

Ry, did you have a way to calculate the percentage oranges?

cold osprey Apr 28, 2023, 2:29 PM

#

trynna figure out how to increase the loops and runs manually

#

numbers inconsistent across runs

boreal gale Apr 28, 2023, 2:30 PM

#

could you post cell 17 as in here please?

cold osprey Apr 28, 2023, 2:30 PM

#

boreal gale could you post cell 17 as in here please?

thats ur stats function

#

sec running benchmark with more loops n runs

#

will send notebook

#

nvm cant send notebook LOL

#

best i can do

#

looks like 2x

#

lemme try with df2 declaration in the same cell

#

ah

surreal solstice Apr 28, 2023, 2:34 PM

#

df = pd.DataFrame({
    'location' : ['backyard', 'store', 'bank', 'backyard', 'backyard', 'bank', 'store'],
    'is_orange': [1, 1, 0, 0, 1, 0, 1],
    'is_non_orange': [0, 0, 1, 1, 0, 1, 0],
    'melons': [73, 81, 94, 174, 23, 71, 65]
})

def stats(df_subgroup):
    return pd.Series({
        'total_oranges': (df_subgroup['is_non_orange'] + df_subgroup['is_orange']).sum(),
        'percentage_oranges': (df_subgroup['is_orange'] / (df_subgroup['is_non_orange'] + df_subgroup['is_orange'])).mean(),
        'melons': df_subgroup['melons'].sum(),
    })

location    total_oranges    percentage_oranges    melons
backyard              3.0                  0.66       270
bank                  2.0                  0.00       165
store                 2.0                  1.00       146

cold osprey Apr 28, 2023, 2:34 PM

#

no diff this time around

surreal solstice Apr 28, 2023, 2:35 PM

#

surreal solstice ``` df = pd.DataFrame({ 'location' : ['backyard', 'store', 'bank', 'backyard...

I believe these are the correct results. Shimmer do you get the same thing?

cold osprey Apr 28, 2023, 2:36 PM

#

uh

#

          is_orange  is_non_orange  melons  percent_orange  total orange/non-orange
location
backyard          2              1     270        0.666667                        3
bank              0              2     165        0.000000                        2
store             2              0     146        1.000000                        2

#

seems like i do

surreal solstice Apr 28, 2023, 2:37 PM

#

yeah seems like it

#

Tell you what, I'll make an SO question here, and I'll send the link to you guys in a few

#

Both answers seem to work for me as well

cold osprey Apr 28, 2023, 2:39 PM

#

ye its roughly doing the same thing

surreal solstice Apr 28, 2023, 2:39 PM

#

I think your version is good but what do you think about memory overhead

cold osprey Apr 28, 2023, 2:39 PM

#

not exactly sure how apply with custom function works under the hood

surreal solstice Apr 28, 2023, 2:39 PM

#

yeah same

cold osprey Apr 28, 2023, 2:40 PM

#

i think memory shud be about the same

#

using apply will 'delete' the old df/unneeded columns only when it returns the new one so

#

same thing as df/df2 and then manually deleting df

#

or using same df and then dropping the unneeded cols

#

haha i wanna up the runs and loops just to see how far i can push it

boreal gale Apr 28, 2023, 2:43 PM

#

surreal solstice ``` df = pd.DataFrame({ 'location' : ['backyard', 'store', 'bank', 'backyard...

oh, the aggregation doesn't get more complicated than that?

surreal solstice Apr 28, 2023, 2:44 PM

#

In my case, it doesn't, but you've asked a good question. What if it did?

boreal gale Apr 28, 2023, 2:46 PM

#

if agg('sum') doesn't give you sufficient information for your further aggregation then you are potentially stuck with apply
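For the sample data in this thread, plain per-group sums do carry enough information, so the "precompute, then derive" route discussed above can be sketched as follows. This is a minimal sketch on the toy dataframe only; the name `result` is made up for illustration, and it is not the code cold osprey benchmarked (that code was not posted as text).

```py
import pandas as pd

df = pd.DataFrame({
    'location': ['backyard', 'store', 'bank', 'backyard', 'backyard', 'bank', 'store'],
    'is_orange': [1, 1, 0, 0, 1, 0, 1],
    'is_non_orange': [0, 0, 1, 1, 0, 1, 0],
    'melons': [73, 81, 94, 174, 23, 71, 65],
})

# One vectorized pass: sum every numeric column per location.
sums = df.groupby('location')[['is_orange', 'is_non_orange', 'melons']].sum()

# Derive the remaining statistics from the per-group sums.
total = sums['is_orange'] + sums['is_non_orange']
result = pd.DataFrame({
    'total_oranges': total,
    'percentage_oranges': sums['is_orange'] / total,
    'melons': sums['melons'],
})
print(result)
```

Because everything after the `groupby(...).sum()` operates on whole columns, no Python function is called per group, which is the usual reason this style tends to be faster than `apply` with a custom function.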

surreal solstice Apr 28, 2023, 2:47 PM

#

Right. Also, it seems that if you need to do some sort of multiplicative thing like *, / , then you'd have to use the mean or median function to retrieve the correct value

#

Which kinda makes sense, it's sort of what shimmer is doing.

cold osprey Apr 28, 2023, 2:47 PM

#

ah thats what the mean was for

#

i was wondering what it was there for but it worked

surreal solstice Apr 28, 2023, 2:48 PM

#

yeah exactly. Basically it's kinda creating your new columns and broadcasting the same value to each subgroup, so it takes the 'mean' of the subgroup which is just all the same numbers
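A quick check on the sample data suggests a slightly different reading of what `.mean()` does here. Every row has `is_orange + is_non_orange == 1`, so the row-wise ratio is just the 0/1 flag itself, and its mean per group is the share of orange rows. The second dataframe (`counts`) is a hypothetical example, not from the chat, showing that a mean of row ratios stops matching the ratio of sums once rows hold counts instead of flags.

```py
import pandas as pd

df = pd.DataFrame({
    'location': ['backyard', 'store', 'bank', 'backyard', 'backyard', 'bank', 'store'],
    'is_orange': [1, 1, 0, 0, 1, 0, 1],
    'is_non_orange': [0, 0, 1, 1, 0, 1, 0],
})

# Row-wise ratio: the denominator is 1 on every row, so this equals is_orange.
ratio = df['is_orange'] / (df['is_non_orange'] + df['is_orange'])
same_as_flag = bool((ratio == df['is_orange']).all())

# Mean of the 0/1 values per group == share of orange rows in that group.
share = ratio.groupby(df['location']).mean()

# Hypothetical rows holding counts instead of 0/1 flags.
counts = pd.DataFrame({'is_orange': [1, 9], 'is_non_orange': [1, 1]})
mean_of_ratios = (counts['is_orange'] / counts.sum(axis=1)).mean()   # (0.5 + 0.9) / 2
ratio_of_sums = counts['is_orange'].sum() / counts.to_numpy().sum()  # 10 / 12
```

So for flag columns either formulation works, while for count columns the ratio of per-group sums is usually the quantity people mean by "percentage".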

#

@cold osprey , @boreal gale , does this sound right? [pandas]: I need to groupby on a column, then define multiple (including some custom) aggregation functions.

cold osprey Apr 28, 2023, 2:49 PM

#

ye seems like a good way to do it

#

that way anyone can just modify the agg function and it will apply to everywhere it is used

surreal solstice Apr 28, 2023, 2:50 PM

#

Yeah, agree

#

I will see if I can add that point too once I've asked the question and you guys have answered

boreal gale Apr 28, 2023, 2:51 PM

#

yeah that sounds sensible as a title, also i gotta go, have fun 🙂

surreal solstice Apr 28, 2023, 2:55 PM

#

@boreal gale , @cold osprey : https://stackoverflow.com/questions/76130797/pandas-groupby-on-columns-then-define-multiple-including-some-custom-agg

Stack Overflow

[pandas]: Groupby on column(s), then define multiple (including som...

I need to be able to define, in one groupby/agg statement, the sum & the division of two separate columns and save them as a new column, and then calculate the sum of another column.
Mods, please

#

Posted, please answer and decide who will get the accepted answer. Thanks again so much for your guys's help

cold osprey Apr 28, 2023, 2:56 PM

#

haha idt i have a stackoverflow account

surreal solstice Apr 28, 2023, 2:58 PM

#

Got it, would be great if you could upload your answer. If not I can do that as soon as I finish up w work

#

And give you credit

cold osprey Apr 28, 2023, 2:58 PM

#

haha its fine yeah u can upload my ans too

surreal solstice Apr 28, 2023, 3:11 PM

#

done, I gave you credit as shimmer from the Python Discord server, I hope that is enough

cold osprey Apr 28, 2023, 3:12 PM

#

i dont care much for credit but thanks

surreal solstice Apr 28, 2023, 3:14 PM

#

Of course. Thanks so much for your help, not often you come across something like this

surreal solstice Apr 28, 2023, 3:14 PM

#

surreal solstice @boreal gale , @cold osprey : https://stackoverflow.com/questi...

@boreal gale , I'll let you post your answer to that link if you want, otherwise I will post your approach & also credit you

past meteor Apr 28, 2023, 3:50 PM

#

Any of you ever used transformers for multivariate time series analysis, if so how was your experience? I'm not sure how I feel about it since attention without positional encodings is permutation-equivariant (it has no built-in notion of order). Not sure we have enough data for stuff like temporal fusion transformers either.

#

I'm not sure there's any merit to doing this at all - do people just apply them to time series because they are sequences?

rugged comet Apr 28, 2023, 5:04 PM

#

Code

df = spark.read.option("header", True).csv(*[f"/FileStore/tables/deck_data_{num}.csv" for num in range(500000, 2500001, 500000)])

Error

ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near '/': extra input '/'(line 1, pos 0)

== SQL ==
/FileStore/tables/deck_data_1000000.csv
^^^

What is going on here? I'm not using a SQL cell, I'm using a Python cell. This is in databricks by the way. I'm trying to read multiple csv files at once by unpacking a list of file paths.
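One likely explanation, offered as a hedged reading of the traceback rather than a confirmed diagnosis: `DataFrameReader.csv` takes the path (a string or a list of strings) as its first parameter and `schema` as its second, so unpacking with `*` sends the second file path into `schema`, where it gets parsed as a schema string. That would match the `== SQL ==` section showing the `deck_data_1000000.csv` path. A minimal sketch, where `read_decks` is a hypothetical helper and `spark` is the session Databricks already provides:

```py
paths = [
    f"/FileStore/tables/deck_data_{num}.csv"
    for num in range(500000, 2500001, 500000)
]


def read_decks(spark, paths):
    """Read several CSV files into one DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    # Pass the list itself. csv(*paths) would put paths[1] into the
    # second positional parameter, which is `schema`.
    return spark.read.option("header", True).csv(paths)
```

In a notebook this would be called as `df = read_decks(spark, paths)`.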

hollow crane Apr 28, 2023, 5:05 PM

#

Someone please help me: there is one cell with an error I don't understand. Please check the link below and help me out. PS: I am a beginner practicing Python data science.

You will find the code and the error at this link:
https://colab.research.google.com/drive/1LE1RLYrl1pCWfoMPbmWiVFebg3a99pB-?usp=sharing

Google Colaboratory

tranquil gust Apr 28, 2023, 5:35 PM

#

rugged comet Code ```py df = spark.read.option("header", True).csv(*[f"/FileStore/tables/deck...

df = spark.read.option("header", True).csv(*[f"file:/FileStore/tables/deck_data_{num}.csv" for num in range(500000, 2500001, 500000)])

warm jungle Apr 28, 2023, 5:45 PM

#

Suppose I have an Nx2 array: e.g

In [92]: a
Out[92]: 
array([[3, 7],
       [2, 4],
       [0, 9]])

I want to treat each row as representing the start and end points of a consecutive sequence of elements in another array. How can I efficiently extract those sequences? Obviously they'll be different lengths, so the result can't be an array, but it could be a list of arrays. So e.g. if I have

In [98]: x
Out[98]: array([7, 1, 5, 2, 0, 4, 8, 9, 6, 3])

I want to get:

In [100]: r
Out[100]: [array([2, 0, 4, 8]), array([5, 2]), array([7, 1, 5, 2, 0, 4, 8, 9, 6])]

I can do it by looping over a and forming slices from each row, but is there something that doesn't involve looping in python?
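For reference, the looping baseline mentioned here can be written as a list comprehension. This sketch reproduces the arrays from the question and treats each row as a half-open `[start, stop)` pair, which is what the expected output implies. The resulting arrays are views into `x`, so no element data is copied.

```py
import numpy as np

a = np.array([[3, 7],
              [2, 4],
              [0, 9]])
x = np.array([7, 1, 5, 2, 0, 4, 8, 9, 6, 3])

# One basic slice per row; basic slicing returns a view, not a copy.
r = [x[start:stop] for start, stop in a]

# Each piece shares memory with x.
all_views = all(np.shares_memory(piece, x) for piece in r)
```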

tidal bough Apr 28, 2023, 6:39 PM

#

warm jungle Suppose I have an Nx2 array: e.g ``` n [92]: a Out[92]: array([[3, 7], ...

I don't think so, since the output is a list.

#

tried a numba function - it's exactly as fast as the python implementation, probably because a list is involved.

rugged comet Apr 28, 2023, 7:38 PM

#

tranquil gust df = spark.read.option("header", True).csv(*[f"file:/FileStore/tables/deck_data_...

I don't think this works.
I'm having a new error in spark though. I'm trying to reformat some csvs.
Code

for num in range(500000, 2500001, 500000):
    path = f"/FileStore/tables/deck_data_{num}.csv"
    with open(path, "r") as f:
        reader = csv.reader(f)
        with open(f"deck_data_{num}_formatted.csv", "w", newline="") as f:
            writer = csv.writer(f, delimiter="|")
            for row in reader:
                writer.writerow(row)

Error


FileNotFoundError: [Errno 2] No such file or directory: '/FileStore/tables/deck_data_500000.csv'

I'm sure the files are in the dbfs. It seems like spark won't let me open them.
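A possible explanation, stated with some caution: Spark readers resolve `/FileStore/...` against DBFS, while Python's built-in `open()` only sees the driver's local file system. On clusters that expose the DBFS FUSE mount, the same file is reachable locally under a `/dbfs` prefix. Whether that mount exists on a given Community Edition runtime is something to verify (for example with `os.path.exists("/dbfs/FileStore")`); if it does not, copying the file to local disk with `dbutils.fs.cp` first is the usual fallback. The helpers `to_local_path` and `reformat_csv` below are hypothetical names for this sketch.

```py
import csv


def to_local_path(dbfs_path):
    """Map '/FileStore/...' or 'dbfs:/FileStore/...' to '/dbfs/FileStore/...'.

    Assumes the cluster exposes DBFS through the /dbfs FUSE mount.
    """
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return "/dbfs" + dbfs_path


def reformat_csv(src, dst, delimiter="|"):
    """Rewrite a comma-separated file with a different delimiter."""
    # Separate names for the two handles avoid shadowing one with the other.
    with open(src, "r", newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter=delimiter)
        writer.writerows(csv.reader(fin))
```

Usage would look like `reformat_csv(to_local_path(path), to_local_path(out_path))`, with `out_path` also pointing into DBFS so the result is visible to Spark afterwards.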

tranquil gust Apr 28, 2023, 7:39 PM

#

rugged comet I don't think this works. I'm having a new error in spark though. I'm trying to...

You must enter "file:///c:/..." style in my opinion.

rugged comet Apr 28, 2023, 7:40 PM

#

tranquil gust You must enter "file:///c:/..." style in my opinion.

Not really sure what you mean by that. This is in databricks remember.

FileNotFoundError: [Errno 2] No such file or directory: 'file:///c:/FileStore/tables/deck_data_500000.csv'

tranquil gust Apr 28, 2023, 7:43 PM

#

When you input path, couldn't you see the file? like this.

rugged comet Apr 28, 2023, 7:44 PM

#

tranquil gust When you input path, couldn't you see the file? like this.

Databricks doesn't have IntelliSense as far as I'm aware. So no, I can't see the filename autocomplete.

#

#

I'm using databricks community edition.

tranquil gust Apr 28, 2023, 8:01 PM

#

rugged comet I'm using databricks community edition.

If the path is fine, in other words if the file exists, the code should work.

#

rugged comet Apr 28, 2023, 8:02 PM

#

The file system on databricks is a bit different than a regular hard drive.

next valley Apr 28, 2023, 8:14 PM

#

Never used databricks but can you traverse whatever file system they use?

tranquil gust Apr 28, 2023, 8:15 PM

#

Did you check whether the file exists?

vagrant ginkgo Apr 28, 2023, 8:54 PM

#

Anyone here super familiar with matplotlib?

rugged comet Apr 28, 2023, 8:59 PM

#

next valley Never used databricks but can you traverse whatever file system they use?

That's basically what I'm trying to figure out.

rugged comet Apr 28, 2023, 8:59 PM

#

tranquil gust Did you check whether the file exists?

Yes.

warm jungle Apr 28, 2023, 9:03 PM

#

tidal bough tried a numba function - it's exactly as fast as the python implementation, prob...

Although the output is a list, it should be possible to do the iteration over the inputs quicker. I'll have a play with cython and numba...

wooden sail Apr 28, 2023, 9:03 PM

#

you could probably rewrite the code that generates that list of lists to generate slices instead

tidal bough Apr 28, 2023, 9:04 PM

#

my approach would probably be to rethink whether you really need that list of arrays. like, maybe you can use slices directly?

warm jungle Apr 28, 2023, 9:07 PM

#

Not sure what you mean exactly, a slice, of itself, isn't the data - I need to get my hands on the data. The resulting arrays will be views, so no need to actually copy the underlying memory

tidal bough Apr 28, 2023, 9:08 PM

#

I mean that maybe you can rewrite whatever function consumes this list to take an array of pairs instead, and take slices using that array, which can be numbified.

tranquil gust Apr 28, 2023, 9:12 PM

#

rugged comet Yes.

I haven't used Databricks either, so I can't give you any more advice. Sorry.

rugged comet Apr 28, 2023, 9:12 PM

#

tranquil gust I haven't used Databricks either, so I can't give you any more advice. ...

It's okay. Thanks for trying to help.

warm jungle Apr 28, 2023, 9:13 PM

#

tidal bough I mean that maybe you can rewrite whatever function consumes this list to take a...

It's the "taking slices" bit that I'm trying to solve...

tidal bough Apr 28, 2023, 9:14 PM

#

warm jungle It's the "taking slices" bit that I'm trying to solve...

Well no, not quite, you're trying to then put these slices into a list. I'm saying that maybe you can construct these slices right before usage.

#

Like, instead of having a function that takes a list of variable-length arrays, have a function that takes an (N,2) shaped array, and, inside the function, take slices using these pairs. That way, the whole thing can be numbified much better than creating a list of numpy arrays can be.
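A sketch of what that could look like, assuming (purely for illustration) that the consumer only needs a per-segment sum. `segment_sums` takes the `(N, 2)` array directly and slices inside the loop; written this way it is the kind of function that can typically be handed to `numba.njit`, though that is not tested here. `segment_sums_prefix` is an alternative for the special case of sums that avoids the loop entirely with a prefix-sum array.

```py
import numpy as np


def segment_sums(x, bounds):
    """Sum x[start:stop] for each (start, stop) row of bounds."""
    out = np.empty(len(bounds), dtype=x.dtype)
    for i in range(len(bounds)):
        start, stop = bounds[i, 0], bounds[i, 1]
        out[i] = x[start:stop].sum()
    return out


def segment_sums_prefix(x, bounds):
    """Same result without a Python loop, using cumulative sums."""
    prefix = np.concatenate(([0], np.cumsum(x)))
    return prefix[bounds[:, 1]] - prefix[bounds[:, 0]]


a = np.array([[3, 7], [2, 4], [0, 9]])
x = np.array([7, 1, 5, 2, 0, 4, 8, 9, 6, 3])
print(segment_sums(x, a), segment_sums_prefix(x, a))
```

The point is that no list of variable-length arrays is ever materialized; the pairs themselves are the data structure that gets passed around.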

glacial iris Apr 28, 2023, 10:54 PM

#

https://tenor.com/view/lightning-struck-by-gif-14902359

Tenor

narrow crane Apr 28, 2023, 11:58 PM

#

hey

#

i'm looking for some advice on data science here #1101656351744196688

orchid sky Apr 29, 2023, 12:32 AM

#

What is the most used function for AI in Python

next valley Apr 29, 2023, 12:38 AM

#

orchid sky What is the most used function for AI in Python

Too vague and also doesn't mean much, but if u insist imo matrix multiplication

orchid sky Apr 29, 2023, 12:38 AM

#

Yes, let's go with that

next valley Apr 29, 2023, 12:40 AM

#

Then again i guess a more suitable answer would be tensor multiplication lol

orchid sky Apr 29, 2023, 12:41 AM

#

If I can ask, how do you get an idea of what to program with AI if you want to create a new tool?

serene scaffold Apr 29, 2023, 12:55 AM

#

even if you're starting with a pretrained model, you would still need a GPU to fine tune it for your use case. but you can get some free GPU compute at google colab.

#

@high iron what are you trying to do?

#

chatbots are not a good first project.

next valley Apr 29, 2023, 1:18 AM

#

Honestly, just learn multivariable calculus and an intro to linear algebra and everything else is somewhat simple to understand

next valley Apr 29, 2023, 1:39 AM

#

Training a 1 stack/layer transformer ~~which I am assuming you will be doing anyway because that's what everyone only talks about these days~~ on hardware such as a 3080 takes about 30 minutes on a dataset of 51785 training, 1803 test, and 1193 validation examples,
in mini-batches of 64,
for 20 epochs.
Each example has 86 features and we'll infer up to 81 labels max.

The 3080 had its power draw capped to 85%.

This should be faster with a decoder-only model ~~because that's what everyone does, but it makes sense since the MHA confusion matrix for the encoder shows that it's fucking useless~~ as you remove about half the trainable parameters

#

so technically you don't really need hardware, the problem is data, which everyone seems to forget, for some reason

vital cedar Apr 29, 2023, 2:50 AM

#

LinearRegression.predict()

This returns a list with a bunch of decimal values when I used it on my x_test data, what are these numbers for and how can they be used? (Linear regression model from sklearn library)

serene scaffold Apr 29, 2023, 2:54 AM

#

vital cedar ```py LinearRegression.predict() ``` This returns a list with a bunch of decimal...

it returns a list of predictions, where the nth element of the list is the prediction for the nth element of x_test.

#

you have to know what x_test represents and what the model is intended to do to make sense of the output.

vital cedar Apr 29, 2023, 2:55 AM

#

serene scaffold you have to know what `x_test` represents and what the model is intended to do t...

Well it is a spam filter project, one column shows the email, the other has a value of 1 or 0 (0 is not spam, 1 is spam)

serene scaffold Apr 29, 2023, 2:55 AM

#

vital cedar Well it is a spam filter project, one column shows the email, the other has a va...

if the model is supposed to tell you if the email is spam or not, then the "is spam" value should not be part of the x data.

vital cedar Apr 29, 2023, 2:56 AM

#

serene scaffold if the model is supposed to tell you if the model is spam or not, than the "is s...

Yes, the "is spam" is not present, only the number

serene scaffold Apr 29, 2023, 2:57 AM

#

@vital cedar how did you represent the emails for the purposes of linear regression?

vital cedar Apr 29, 2023, 2:57 AM

#

serene scaffold <@817603098649821215> how did you represent the emails for the purposes of linea...

Count Vectorizer

serene scaffold Apr 29, 2023, 3:01 AM

#

vital cedar Count Vectorizer

and the outputs from predict are what? numbers between 0 and 1?

vital cedar Apr 29, 2023, 3:01 AM

#

serene scaffold and the outputs from predict are what? numbers between 0 and 1?

[ 0.63555711  0.28661988  0.63555711 ... -0.36245224 -1.77000725
 -0.47555284]

#

They don't seem to be between 0 and 1

serene scaffold Apr 29, 2023, 3:02 AM

#

vital cedar ```py [ 0.63555711 0.28661988 0.63555711 ... -0.36245224 -1.77000725 -0.47555...

this is an array, not a list.

vital cedar Apr 29, 2023, 3:03 AM

#

serene scaffold this is an array, not a list.

My bad

thorn swift Apr 29, 2023, 3:04 AM

#

vital cedar They don't seem to be between 0 and 1

youll need to do transformations to force between 0 and 1, I did a spam model the other day and just cut it at .5 (>.5 = 1) (<=.5 =0)

thorn swift Apr 29, 2023, 3:05 AM

#

vital cedar They don't seem to be between 0 and 1

https://www.kaggle.com/code/joseacortes/detecting-spam-with-different-models

Detecting Spam with different models

Explore and run machine learning code with Kaggle Notebooks | Using data from SMS SPAM Detection Dataset

vital cedar Apr 29, 2023, 3:12 AM

#

thorn swift youll need to do transformations to force between 0 and 1, I did a spam model th...

What do these values represent exactly?

#

I thought it would return 0 or 1

thorn swift Apr 29, 2023, 3:13 AM

#

not if you use linear regression, you basically fitted a line on a graph

#

if you want to use a model like that you can do a random forest i guess but its basically the same thing with the cutoff i wrote just built in
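To make the difference concrete, here is a minimal sketch with scikit-learn on a handful of made-up messages (the texts and labels are hypothetical, not from vital cedar's dataset). `LinearRegression.predict` returns unbounded real numbers that have to be thresholded by hand, whereas a classifier such as `LogisticRegression` returns 0/1 labels from `predict` and probabilities from `predict_proba`.

```py
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: 1 = spam, 0 = not spam.
texts = [
    "win a free prize now", "free cash click now", "claim your free prize",
    "lunch at noon tomorrow", "see you at the meeting", "notes from the meeting",
]
labels = [1, 1, 1, 0, 0, 0]

X = CountVectorizer().fit_transform(texts).toarray()

# Regression: real-valued scores, not restricted to [0, 1].
reg_scores = LinearRegression().fit(X, labels).predict(X)
reg_labels = (reg_scores > 0.5).astype(int)  # manual cutoff

# Classification: labels and probabilities come built in.
clf = LogisticRegression().fit(X, labels)
clf_labels = clf.predict(X)               # array of 0s and 1s
clf_proba = clf.predict_proba(X)[:, 1]    # estimated P(spam), within [0, 1]
```

Predictions are made on the training data here only to keep the sketch short; a real evaluation would use held-out messages.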

mint palm Apr 29, 2023, 6:53 AM

#

if line 49 says both have the same dimensions, why does line 50 give no error on view while line 52 says RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

#

agile cobalt Apr 29, 2023, 7:03 AM

#

mint palm if line 49 says both are same dimensions, how do line 50 gives no error on ``vie...

it sounds like an issue related to the stride? (or something about the underlying array being non-contiguous, see https://github.com/cezannec/capsule_net_pytorch/issues/4)

did you try just using the other function the error message tells you to?

mint palm Apr 29, 2023, 7:15 AM

#

agile cobalt it sounds like an issue related to the stride? (or something about the underlyin...

i havent tried those, it would look inconsistent in code to do so, i dont see why they are different, and one is compatible with view and other not
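Two tensors can report the same shape and still differ in memory layout, which is what `view` cares about. The sketch below uses small made-up tensors (not the ones from the screenshot): `b` is a transposed view, so it has the same shape as `a` but is not contiguous, and `view` refuses to flatten it while `reshape` copies when it has to.

```py
import torch

a = torch.arange(6).reshape(2, 3)      # contiguous, shape (2, 3)
b = torch.arange(6).reshape(3, 2).t()  # also shape (2, 3), but a transposed view

same_shape = a.shape == b.shape        # True
a_flat = a.view(6)                     # fine: memory is laid out row by row

try:
    b.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True                 # strides are incompatible with a flat view

b_flat = b.reshape(6)                  # works, copying if needed
b_flat_again = b.contiguous().view(6)  # equivalent: make a contiguous copy first
```

Checking `tensor.is_contiguous()` and `tensor.stride()` on both tensors from lines 50 and 52 would show whether this is what is happening in the original code.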

vital cedar Apr 29, 2023, 7:58 AM

#

thorn swift not if you use linear regression, you basically fitted a line on a graph

Well, what are those values and how can I use them?

thorn swift Apr 29, 2023, 8:03 AM

#

vital cedar Well, what are those values and how can I use them?

The higher the number the more your robot thinks it’s spam

vital cedar Apr 29, 2023, 8:03 AM

#

thorn swift The higher the number the more your robot thinks it’s spam

Thank you 👍

lapis sequoia Apr 29, 2023, 8:09 AM

#

anyone familiar with pyspark here?

#

please i need guidance

#

i have a project which consists of a 100GB json file, i need to process it; clean it and then insert it in to mongoDB
can someone please guide me, im not actually sure what to do here

vital cedar Apr 29, 2023, 8:36 AM

#

thorn swift The higher the number the more your robot thinks it’s spam

I'm messing with the numbers to get one that is accurate enough to differentiate between spam and not spam

if regModel.predict(msg) > 1:

For now it's 1 but is there a good or accurate way to find the number?
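One common way to pick the cutoff is to sweep candidate thresholds over scores from a held-out validation set and keep the one that maximizes a metric you care about. The sketch below uses F1 and entirely hypothetical scores and labels; `f1_at` is a made-up helper name.

```py
import numpy as np

# Hypothetical validation scores from predict() and their true labels.
scores = np.array([0.64, 0.29, 0.91, -0.36, 1.20, -0.48, 0.55, 0.10])
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 0])


def f1_at(threshold):
    """F1 score when everything scoring above `threshold` is called spam."""
    pred = scores > threshold
    tp = int((pred & (y_true == 1)).sum())
    fp = int((pred & (y_true == 0)).sum())
    fn = int((~pred & (y_true == 1)).sum())
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


candidates = np.linspace(scores.min(), scores.max(), 50)
best = max(candidates, key=f1_at)
print(best, f1_at(best))
```

Which metric to maximize depends on the application: for a spam filter, a false positive (a real email sent to spam) is often considered worse than a false negative, which would argue for a precision-weighted criterion instead of plain F1.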

copper patio Apr 29, 2023, 1:48 PM

#

I want to start learning machine learning but I am not sure which framework to use, any suggestions? I am thinking either Pytorch or Tensorflow...

surreal solstice Apr 29, 2023, 2:13 PM

#

My advice is to learn PyTorch. It will save you headaches down the road, despite TensorFlow being easier to use OOB.

cold osprey Apr 29, 2023, 2:14 PM

#

+1 pytorch

#

esp if ure on windows

surreal solstice Apr 29, 2023, 2:28 PM

#

Also shimmer I was thinking. Your approach yesterday doesn't actually add significant space overhead because pandas doesn't create deep copies by default.

past meteor Apr 29, 2023, 2:33 PM

#

surreal solstice Also shimmer I was thinking. Your approach yesterday doesn't actually add signif...

Has your thing been solved?

#

I normally have a decorator laying somewhere that lets you compute an arbitrary function per group of your df

thorn swift Apr 29, 2023, 2:51 PM

#

copper patio I want to start learning machine learning but I am not sure which framework to u...

TensorFlow cause of its built in training loops and nothing else it’s mostly just my preference

#

I just want more people to use tensorflow tbh

thorn swift Apr 29, 2023, 2:53 PM

#

vital cedar I'm messing with the numbers to get one that is accurate enough to differentiate...

Read my earlier messages

vital cedar Apr 29, 2023, 2:57 PM

#

thorn swift youll need to do transformations to force between 0 and 1, I did a spam model th...

Transformations? How?

thorn swift Apr 29, 2023, 3:03 PM

#

.5 cutoff. Here: https://youtu.be/xG-E--Ak5jg

YouTube

Simplilearn

Classification In Machine Learning | Machine Learning Tutorial | Py...

past meteor Apr 29, 2023, 3:36 PM

#

copper patio I want to start learning machine learning but I am not sure which framework to u...

Honestly, starting with scikit-learn might be a good idea

#

Especially if you're working with tabular data that should be the place to start.

copper patio Apr 29, 2023, 4:54 PM

#

copper patio I want to start learning machine learning but I am not sure which framework to u...

Which one is normally easier to learn?

hasty mountain Apr 29, 2023, 5:13 PM

#

copper patio Which one is normally easier to learn?

Pytorch

#

The degree of difficulty is something like:
Keras < Pytorch < Tensorflow

Keras is the highest-level API; you assemble your model like Lego pieces.
Tensorflow is the lowest level; you have to do many things manually (not all, though).
Pytorch is the middle ground.

mild dirge Apr 29, 2023, 5:34 PM

#

I'd say tensorflow and pytorch are at the same level

#

They have a lot of overlap

#

Easiest to learn will def be keras, as it is just: import model -> fit model -> use model. But you don't learn a lot from it.

hot blade Apr 29, 2023, 7:26 PM

#

when im feature engineering an imbalanced dataset, should i apply pca before or after resampling?

past meteor Apr 29, 2023, 8:04 PM

#

hot blade when im feature engineering an imbalanced dataset, should i apply pca before or ...

I'd make sure you want to resample first

#

It's fair to just do stuff as usual and then select your operating point manually by looking at ROC, PR, DET, ... based on your application

hot blade Apr 29, 2023, 8:12 PM

#

past meteor I'd make sure you want to resample first

im going for binary classification

#

one class is 95% of the dataset lol

#

so specificity and all that would be horrendous without resampling

past meteor Apr 29, 2023, 8:13 PM

#

You need to compute all of those metrics over several operating points (decision thresholds)

mild dirge Apr 29, 2023, 9:05 PM

#

hot blade when im feature engineering an imbalanced dataset, should i apply pca before or ...

You want to PCA first, because you want to know the projection of the original data. And then you can resample

#

Otherwise the PCA is affected by the resampling
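A minimal sketch of that ordering, on synthetic data with a 95%/5% class split (the data, the component count, and simple random oversampling are all assumptions for illustration). PCA is fitted on the original feature distribution, and only the projected minority rows are duplicated afterwards. In practice both steps would be fitted on the training split only.

```py
import numpy as np
from sklearn.decomposition import PCA
from sklearn.utils import resample

# Synthetic stand-in for an imbalanced training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.array([0] * 190 + [1] * 10)

# 1. Fit PCA on the original, un-resampled data.
pca = PCA(n_components=3).fit(X)
Z = pca.transform(X)

# 2. Oversample the minority class in the projected space.
Z_maj, Z_min = Z[y == 0], Z[y == 1]
Z_min_up = resample(Z_min, replace=True, n_samples=len(Z_maj), random_state=0)

Z_bal = np.vstack([Z_maj, Z_min_up])
y_bal = np.array([0] * len(Z_maj) + [1] * len(Z_min_up))
```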

clear siren Apr 29, 2023, 9:22 PM

#

hiii

#

is there anybody who is good with flask and python , I need a bit of help

serene scaffold Apr 29, 2023, 9:27 PM

#

clear siren is there anybody who is good with flask and python , I need a bit of help

This is the data science channel, so try #web-development . But please always ask a complete question that someone can start answering. Not if someone knows about a topic.

serene scaffold Apr 30, 2023, 12:03 AM

#

@dapper flame this is the data science channel. kindly remove your messages from this channel and try in #web-development.

#

I see you also asked in #async-and-concurrency. that's fine. but please ask your question in only one channel, so that no one answers a question that was answered somewhere else.

dapper flame Apr 30, 2023, 1:13 AM

#

Ok no problem sorry

#

Thank you for your feedback

valid wind Apr 30, 2023, 1:42 AM

#

Are any of you guys good with using pytorch, for some reason I'm getting a dimension mismatch and I don't really know why

serene scaffold Apr 30, 2023, 1:47 AM

#

valid wind Are any of you guys good with using pytorch, for some reason I'm getting a dimen...

show code and error

valid wind Apr 30, 2023, 2:06 AM

#

serene scaffold show code and error

Ok so just a bit of context first, I'm simply trying to implement UNet for audio source separation and I'm using the musdb dataset and accompanying package in order to do it

#

https://paste.pythondiscord.com/alabaqaqut

#

There is the entirety of the code that I'm using

#

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-81-b99926cdd359> in <cell line: 1>()
      7         print("Target shape:", target.shape)
      8   train_unet_func(umet_model, reshaped_train_loader, optimizer, device, epoch, tb_writer)
----> 9   test_unet_func(umet_model, test_unet, device, epoch, tb_writer)
     10   umet_model.cpu()
     11   state_dict = umet_model.state_dict()

<ipython-input-79-badf5cb3f78b> in test_unet_func(model, test_data, device, epoch, tb_writer)
     38 
     39             x_padded, (left, right) = padding(x)
---> 40             right = x_padded.size(1) - right
     41             mask = model(x_padded.unsqueeze(0)).squeeze(0)[:, :, left:right]
     42             y = mask * x.unsqueeze(0)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

#

Also, I printed out the input shape after every step of forward while debugging and some other areas, I don't know how much it will be useful to you, but I have included the paste of it below:
https://paste.pythondiscord.com/zilegevuqe
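The message "expected to be in range of [-1, 0], but got 1" is what PyTorch reports when a dimension index of 1 is used on a 1-D tensor, so `x_padded` is most likely 1-D at that point in `test_unet_func`. The sketch below reproduces the error with a stand-in tensor; what `padding` actually returns is not shown in the thread, so the shape here is an assumption.

```py
import torch

x_padded = torch.zeros(16)  # hypothetical 1-D stand-in for the padded input

try:
    x_padded.size(1)        # a 1-D tensor only has dim 0 (or -1)
    raised = False
except IndexError:
    raised = True

length = x_padded.size(-1)  # -1 always refers to the last dimension
x_2d = x_padded.unsqueeze(0)  # shape (1, 16): now size(1) is valid
```

Printing `x.shape` and `x_padded.shape` just before line 40 in both the training and test loops should show where the test data loses a dimension compared with the training batches.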

arctic wedgeBOT Apr 30, 2023, 8:16 AM

#

:incoming_envelope: :ok_hand: applied timeout to @stiff matrix until <t:1682843160:f> (10 minutes) (reason: duplicates spam - sent 4 duplicate messages).

The <@&831776746206265384> have been alerted for review.

bleak sky Apr 30, 2023, 9:04 AM

#

I'm working with a random forest classification model and when I implement it on an external validation dataset (like the test set), i'm getting a recall (sensitivity) of 100%. Is it fine to have a high recall like this?

past meteor Apr 30, 2023, 9:21 AM

#

bleak sky I'm working with a random forest classification model and when I implement it on...

that's for you to decide

#

You can get 100 % recall on a given class by just predicting everything belongs to that class.

bleak sky Apr 30, 2023, 9:23 AM

#

I'm working on binary classes... and the rest of the parameters seems to be what I expected

#

Accuracy = 83.33333333333334
Sensitivity = 100.0
Specificity = 66.66666666666666
Precision = 75.0
ROC = 83.33333333333334
MCC = 70.71067811865476
f1 = 85.71428571428571

#

I feel like this can still serve the purpose.. even with high recall

#

care to share your opinion....?

past meteor Apr 30, 2023, 9:32 AM

#

bleak sky Accuracy = 83.33333333333334 Sensitivity = 100.0 Specificity = 66.66666666666666...

Accuracy, precision, recall, specificity, sensitivity, ... all have intuitive, real world meanings. If I were you I'd ask myself the question "what would make my classifier a good one" and then look up the definitions of these metrics. I can't decide this for you, it's problem dependent 🙂
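As a worked example of those definitions, the reported numbers are consistent with a confusion matrix of TP=3, FN=0, TN=2, FP=1, or any multiple of it. These counts are reverse-engineered from the posted metrics and are an assumption, not something bleak sky shared. If they are right, the test set may be very small, which would be worth keeping in mind when reading a 100% recall.

```py
import math

# Assumed counts that reproduce the reported metrics.
tp, fn, tn, fp = 3, 0, 2, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
sensitivity = tp / (tp + fn)                 # recall: share of positives found
specificity = tn / (tn + fp)                 # share of negatives correctly rejected
precision = tp / (tp + fp)                   # share of predicted positives that are real
f1 = 2 * precision * sensitivity / (precision + sensitivity)
balanced = (sensitivity + specificity) / 2   # matches the reported "ROC" value
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

for name, value in [("Accuracy", accuracy), ("Sensitivity", sensitivity),
                    ("Specificity", specificity), ("Precision", precision),
                    ("ROC", balanced), ("MCC", mcc), ("f1", f1)]:
    print(f"{name} = {100 * value:.2f}")
```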

bleak sky Apr 30, 2023, 9:33 AM

#

past meteor Accuracy, precision, recall, specificity, sensitivity, ... all have intuitive, r...

yeahhh... okayy.. Thank you

lapis sequoia Apr 30, 2023, 10:35 AM

#

Hello everyone

plot_decision_boundary(model=model_4,
                       X=X,
                       y=y)

With this code im trying to check the decision boundary from the latest model Im building

#

This is the error I received

ValueError: Exception encountered when calling layer 'sequential_10' (type Sequential).

Input 0 of layer "dense_15" is incompatible with the layer: expected min_ndim=2, found ndim=1. Full shape received: (None,)

Call arguments received by layer 'sequential_10' (type Sequential):
  • inputs=('tf.Tensor(shape=(None,), dtype=float32)', 'tf.Tensor(shape=(None,), dtype=float32)')
  • training=False
  • mask=None
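The `inputs=(...)` line shows the model receiving a tuple of two 1-D tensors rather than a single 2-D array of shape `(n_points, 2)`. The body of `plot_decision_boundary` is not shown, so the following is only a guess at the relevant step: helpers like this usually build a mesh grid and need to stack the two coordinate arrays column-wise before calling `model.predict`. The data below is a hypothetical stand-in.

```py
import numpy as np

# Hypothetical 2-feature data standing in for X.
X = np.random.default_rng(0).normal(size=(100, 2))

x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))

# Stack into one (n_points, 2) array: one row per grid point.
grid = np.c_[xx.ravel(), yy.ravel()]

# y_pred = model.predict(grid)  # `model` would be the fitted Keras model
```

If the helper already does this, the other thing to check is `X` itself: it should be a single array with two columns, not a tuple or list of two 1-D arrays.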

granite falcon Apr 30, 2023, 11:10 AM

#

Hi all need some help in data science. Question.

lapis sequoia Apr 30, 2023, 12:04 PM

#

granite falcon Hi all need some help in data science. Question.

What is it

lapis sequoia Apr 30, 2023, 12:11 PM

#

granite falcon Hi all need some help in data science. Question.

I forgot to ask😅

What data am I missing currently? Why is it not working?

granite falcon Apr 30, 2023, 12:21 PM

#

lapis sequoia What is it

@lapis sequoia hi davs, I have DMed you, please check

lone plaza Apr 30, 2023, 2:47 PM

#

Hello, I'm fairly new to ml. I'd consider myself an intermediate to advanced python developer (although I see myself somewhere in the middle). But I have almost no experience in ml. I learn best by finding somebody who can guide me. Anyone willing to spend some time to help me hop on the train?

#

Maybe I should mention that I'm really interested in the math behind it but I dunno if it's really worth it to learn it from scratch when there are already so many libraries etc

past meteor Apr 30, 2023, 3:09 PM

#

@lone plaza https://mml-book.github.io/ (Mathematics for Machine Learning), then https://www.statlearning.com/ (An Introduction to Statistical Learning) and https://arxiv.org/abs/2106.11342 (Dive into Deep Learning). Ideally you should actually do projects etc. while reading these.

#

Each of them has a "who is this book for" section, let that help you decide whether or not you want to go into the maths.

zealous hollow Apr 30, 2023, 3:45 PM

#

lone plaza Hello, I'm fairly new to ml. I'd consider myself as a intermediate to advanced p...

hello falk
if you manage to find an outline or a guide, my case is very similar to yours
but i have a little bit of experience in it
i did some projects 😂

#

can you tell me about it as well

buoyant vine Apr 30, 2023, 4:27 PM

#

Hi all,

I'm currently trying to work out a bit of an issue i'm having mapping some numpy operations to Rust and have encountered an interesting behaviour which has dumbfounded me.

Say I have an array:

data = np.array([
    np.full(5, 0.20, dtype=np.float64),
    np.full(5, 25.7, dtype=np.float64),
    np.full(5, 3.0, dtype=np.float64),
    np.full(5, 0.9, dtype=np.float64),
], dtype=np.float64)

And it's an f64, when I then apply the following ops to it:

    hyperplane_vector = np.empty(dim, dtype=np.float64)
    for d in range(dim):
        hyperplane_vector[d] = (data[left, d] / left_norm) - (
            data[right, d] / right_norm
        )

Where left, right and their respective norms are:

left = 0
right = 1
left_norm = norm(data[left])  # L2 norm
right_norm = norm(data[right])  # L2 norm

Numpy will produce an array of 0.0

interesting [0. 0. 0. 0. 0.]

But if this array becomes a float32 we get:

interesting [-5.0820086e-09 -5.0820086e-09 -5.0820086e-09 -5.0820086e-09 -5.0820086e-09]

And this is confusing the fuck out of me: is the accuracy dropping somewhere, is the f64 result correct and it should be zero, or is something else going on?

#

The reason why I'm a bit confused is that when I port this over to Rust, the f32 array produces the same values as numpy's float64 behaviour.
If I force it to become an f64 array and do all of that in double precision, I get a number close to the f32 value in numpy but off by a tad, which can honestly just be put down to rounding error.
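For anyone who wants to poke at this, a self-contained repro might look like the sketch below. It assumes `norm` is `np.linalg.norm` and `dim` is the row length, neither of which is stated above; the helper name `hyperplane` is made up. Both rows are constant vectors, so after dividing by the L2 norm every entry is mathematically 1/sqrt(5) and the exact difference is zero. Anything non-zero comes from rounding, and float32 only carries about 7 significant decimal digits.

```python
import numpy as np


def hyperplane(data, left=0, right=1):
    """Difference of two L2-normalised rows, mirroring the loop above."""
    # Assumption: norm == np.linalg.norm (the original `norm` is not shown).
    left_norm = np.linalg.norm(data[left])
    right_norm = np.linalg.norm(data[right])
    dim = data.shape[1]
    out = np.empty(dim, dtype=np.float64)
    for d in range(dim):
        out[d] = (data[left, d] / left_norm) - (data[right, d] / right_norm)
    return out


values = [0.20, 25.7, 3.0, 0.9]
data64 = np.array([np.full(5, v) for v in values], dtype=np.float64)
data32 = data64.astype(np.float32)

result64 = hyperplane(data64)
result32 = hyperplane(data32)
print("float64:", result64)
print("float32:", result32)
```

The exact float32 numbers can differ between implementations depending on where values get promoted to double precision, so a Rust port would probably need to use the same dtype at every step to match numpy closely.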

lone plaza Apr 30, 2023, 4:30 PM

#

past meteor <@1083413473159041165> https://mml-book.github.io/ and then https://www.statlea...

Wow, dude! Appreciate you!

zealous hollow Apr 30, 2023, 6:31 PM

#

hi all,
so basically, i am working on a project right now and need some help, more like guidance

i want to predict temperature values based on inputs such as date and humidity (percentage values)
first, are these enough inputs?
second, i am using one hot encoding (not sure if that's the right name), basically taking the date as Day 1, 2, 3, 4, 5, ...

third, which algorithm will be best?
i have worked with SVM (RBF) using only dates and i found the results quite promising,
but when i introduced humidity values the results were worse
the mean error was 1.xxxx before and then it went up to something like 144.xxxx
pearson correlation values were
-.67 for temp and humidity
0.5xx for temp and day

sterile belfry Apr 30, 2023, 6:33 PM

#

hi all, I've been asked to run random_state 10 times and take the mean of them. Do I just code it like this? I'm new to all this and really struggling to get my head round it
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.20, random_state=10)

agile cobalt Apr 30, 2023, 6:36 PM

#

no.

look up cross validation

mild dirge Apr 30, 2023, 6:38 PM

#

They probably want you to do 10 random runs, so you should set the seed to a different value each time. @sterile belfry
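A minimal sketch of that reading of the assignment, assuming scikit-learn is available. The synthetic data and `LogisticRegression` are placeholders for your own `X`, `Y` and model, not something taken from the question:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data; replace with your own X and Y.
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)

scores = []
for seed in range(10):  # ten different seeds -> ten different splits
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.20, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
    scores.append(model.score(X_test, Y_test))  # accuracy on this split

mean_score = float(np.mean(scores))
print(f"mean accuracy over {len(scores)} runs: {mean_score:.3f}")
```

Cross validation, as suggested above, is the more systematic version of the same idea.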

zealous hollow Apr 30, 2023, 6:40 PM

#

zealous hollow hi all, so basically, i am working on a project right now and need some help mor...

data

#

processed data

zealous hollow Apr 30, 2023, 6:42 PM

#

past meteor <@1083413473159041165> https://mml-book.github.io/ and then https://www.statlea...

thanks from me as well

rigid cape Apr 30, 2023, 7:04 PM

#

hi

mild dirge Apr 30, 2023, 7:05 PM

#

Oy

#

Have you already watched a video by 3 blue 1 brown @rigid cape ?

rigid cape Apr 30, 2023, 7:05 PM

#

watching it

mild dirge Apr 30, 2023, 7:06 PM

#

The two videos I sent I actually highly recommend watching. The one by sebastian lague is pretty chill, and he explains in pretty simple terms.

#

The 3b1b goes a bit more in-depth on the maths

sullen reef Apr 30, 2023, 7:17 PM

#

mild dirge The two videos I sent I actually highly recommend watching. The one by sebastian...

I agree those two are good ones

tall tulip Apr 30, 2023, 8:28 PM

#

I'm working on a time series dataset. When I plot it with Plotly's line plot, the time shows up like this. I want to format it so it looks a little better. I've tried everything I know, but nothing has worked. Can anyone help me format this datetime?

elfin stirrup May 1, 2023, 4:05 AM

#

can someone help me with this question? i got 7 datapoints for cluster 1 but it was incorrect

Screenshot_2023-05-01_at_12.00.45_AM.jpg

magic dune May 1, 2023, 4:16 AM

#

elfin stirrup can someone help me with this question? i got 7 datapoints for cluster 1 but it ...

are you using Euclidean distance

elfin stirrup May 1, 2023, 4:17 AM

#

magic dune are you using Euclidean distance

Manhattan distance

magic dune May 1, 2023, 4:18 AM

#

elfin stirrup Manhattan distance

show formula

elfin stirrup May 1, 2023, 4:19 AM

#

magic dune show formula

After performing the initial clustering and re-estimating the centroids using the Manhattan distance as a distance metric, we obtain:

Cluster 1: [2, 1, 4, 5, 3]

Cluster 2: [8, 6, 28, 12, 9, 7, 10]

The new centroids are:

K1_new = (2+1+4+5+3)/5 = 3

K2_new = (8+6+28+12+9+7+10)/7 = 11.42857143

#

[3, 6, 2, 1, 4, 5, 7] = C1

#

7 datapoints

elfin stirrup May 1, 2023, 4:26 AM

#

magic dune show formula

Distance to K1_new:

[0.67, 5.67, 3.67, 1.67, 2.67, 4.67, 24.33, 8.33, 5.33, 2.33, 3.33, 6.33]

Distance to K2_new:

[4.5, 5.5, 15.17, 9.5, 11.5, 8.17, 15.83, 3.83, 3.83, 7.17, 4.17, 2.83]

#

then i used the distances to get the c1 points

magic dune May 1, 2023, 4:26 AM

#

ok

elfin stirrup May 1, 2023, 4:27 AM

#

i got 7 but it's incorrect

magic dune May 1, 2023, 4:27 AM

#

elfin stirrup i got 7 but it's incorrect

do you know correct answer or no

elfin stirrup May 1, 2023, 4:28 AM

#

magic dune do you know correct answer or no

i dont

#

doesnt give me answer

magic dune May 1, 2023, 4:29 AM

#

|x1 - x2| + |y1 - y2|

#

@elfin stirrup

#

this is manhattan

elfin stirrup May 1, 2023, 4:29 AM

#

magic dune this is manhattan

yes i did that

magic dune May 1, 2023, 4:36 AM

#

for my centroids

#

wait

#

wrong

elfin stirrup May 1, 2023, 4:45 AM

#

magic dune for my centroids

?

magic dune May 1, 2023, 4:53 AM

#

elfin stirrup ?

second one I am getting 28

#

but idk

#

feels like too much

elfin stirrup May 1, 2023, 4:54 AM

#

magic dune second one I am getting 28

Cluster 1: [3, 2, 1, 4, 5, 7, 9]

Cluster 2: [8, 6, 28, 12, 10]

#

this is what i got for both clusters

elfin stirrup May 1, 2023, 4:54 AM

#

magic dune feels like too much

yeah i dont think so

wooden sail May 1, 2023, 5:48 AM

#

elfin stirrup Cluster 1: [3, 2, 1, 4, 5, 7, 9] Cluster 2: [8, 6, 28, 12, 10]

your solution seems ok, the result depends on whether you use <= or <, since it looks like one distance is repeated

#

!e

import numpy as np
x = np.array([3,8,6,2,1,4,28,12,9,5,7,10])
c1 = 2
c2 = 8

for i in range(2):
    d1 = np.abs(x-c1)
    d2 = np.abs(x-c2)
    print(d1)
    print(d2)
    clusters = d1 < d2
    if i == 1:
        break
    c1 = np.mean(x[clusters])
    c2 = np.mean(x[np.logical_not(clusters)])

print(f"{np.sum(clusters)} points belong to cluster c1 with centroid {c1}")
print(f"{np.sum(np.logical_not(clusters))} points belong to cluster c2 " + 
      f"with centroid {c2}")

arctic wedgeBOT May 1, 2023, 5:53 AM

#

@wooden sail ✅ Your 3.11 eval job has completed with return code 0.

[ 1  6  4  0  1  2 26 10  7  3  5  8]
[ 5  0  2  6  7  4 20  4  1  3  1  2]
[ 0.5  5.5  3.5  0.5  1.5  1.5 25.5  9.5  6.5  2.5  4.5  7.5]
[ 7.625  2.625  4.625  8.625  9.625  6.625 17.375  1.375  1.625  5.625
  3.625  0.625]
6 points belong to cluster c1 with centroid 2.5
6 points belong to cluster c2 with centroid 10.625

wooden sail May 1, 2023, 5:53 AM

#

notice in the first iteration, there's a point with the same distance to both cluster centroids. the result depends on which one you pick for that iteration

#

if we use <= instead of <, we get 7 points as you said
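To make the tie explicit, here is a small pure-Python sketch of the same two-step procedure with the tie-breaking rule as a parameter. The function name `cluster_one` and the flag `ties_to_c1` are made up for illustration; the data and starting centroids are the ones from the snippet above.

```python
def cluster_one(points, c1, c2, ties_to_c1):
    """Assign 1-D points, update both centroids once, then assign again.

    Returns the points that end up in cluster 1.
    """

    def in_c1(p, a, b):
        d1, d2 = abs(p - a), abs(p - b)  # in 1-D, Manhattan distance is |p - c|
        return d1 <= d2 if ties_to_c1 else d1 < d2

    first = [p for p in points if in_c1(p, c1, c2)]
    rest = [p for p in points if not in_c1(p, c1, c2)]
    new_c1 = sum(first) / len(first)
    new_c2 = sum(rest) / len(rest)
    return [p for p in points if in_c1(p, new_c1, new_c2)]


x = [3, 8, 6, 2, 1, 4, 28, 12, 9, 5, 7, 10]
strict = cluster_one(x, 2, 8, ties_to_c1=False)    # same rule as d1 < d2
inclusive = cluster_one(x, 2, 8, ties_to_c1=True)  # same rule as d1 <= d2
print(len(strict), sorted(strict))
print(len(inclusive), sorted(inclusive))
```

The tied point is 5, which is at distance 3 from both starting centroids. Sending it to the second cluster gives 6 points in cluster 1; sending it to the first gives 7, because the updated centroids then pull 7 into cluster 1 as well.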

elfin stirrup May 1, 2023, 5:59 AM

#

wooden sail if we use <= instead of <, we get 7 points as you said

oh ok so i put the wrong question lol.

Screenshot_2023-05-01_at_12.00.30_AM.jpg

#

it was similar but i got 6 points for this

#

Cluster 1: [3, 2, 1, 4, 5]
Cluster 2: [8, 6, 28, 12, 9, 7, 10]

New centroid K1 = mean([3, 2, 1, 4, 5]) = 3
New centroid K2 = mean([8, 6, 28, 12, 9, 7, 10]) = 11.4

Cluster 1: [3, 2, 1, 4, 5, 7]
Cluster 2: [8, 6, 28, 12, 9, 10]

New centroid K1 = mean([3, 2, 1, 4, 5, 7]) = 3.67
New centroid K2 = mean([8, 6, 28, 12, 9, 10]) = 12.17

Cluster 1: [3, 2, 1, 4, 5, 7]
Cluster 2: [8, 6, 28, 12, 9, 10]

earnest widget May 1, 2023, 9:50 AM

#

I have used a PCA technique for my image dataset and it shows that the points of the two classes overlap. Does it mean there is a high degree of similarity?

past meteor May 1, 2023, 9:51 AM

#

earnest widget I have used a PCA technique for my image dataset and it shows the two classes po...

The first 2 principal components capture the most variance but it's not guaranteed that your classes can be separated along those 2 dimensions

#

They could be similar in the first 2 and dissimilar in the others, hard to tell. Maybe you can try LDA, it's more or less a supervised version of PCA.

earnest widget May 1, 2023, 9:53 AM

#

LDA?

past meteor May 1, 2023, 9:54 AM

#

Linear discriminant analysis. I assume you use scikit-learn? There's an LDA classifier there (LinearDiscriminantAnalysis). What you want to call is the transform method.
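For reference, LDA in scikit-learn is `LinearDiscriminantAnalysis` (linear discriminant analysis). A minimal sketch of using it as a supervised projection, with random stand-in data rather than real image features:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Stand-in for extracted features: 2 classes, 100 samples each, 20 features.
features = np.vstack(
    [rng.normal(0.0, 1.0, (100, 20)), rng.normal(0.5, 1.0, (100, 20))]
)
labels = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis()
projected = lda.fit_transform(features, labels)  # uses the labels, unlike PCA
print(projected.shape)
```

One thing to keep in mind: LDA yields at most `n_classes - 1` components, so with two classes the projection is one-dimensional and is easier to show as a histogram per class than as a 2-D scatter plot.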

earnest widget May 1, 2023, 9:55 AM

#

Oh okay alright, I am just extracting the features through MobileNet and then running them through the visualizations. I tried t-SNE as well, made no sense.

past meteor May 1, 2023, 9:55 AM

#

earnest widget Oh okay alright, I am just extracting the features through mobileNet and then ru...

So you're doing features => PCA => plot?

earnest widget May 1, 2023, 9:57 AM

#

past meteor So you're doing features => PCA => plot?

Yeah so normalize the images, resize, put in dataloader etc. => Run the MobileNet for image feature extraction => PCA => plot

past meteor May 1, 2023, 9:57 AM

#

Do you use a `StandardScaler` before you do your PCA?

earnest widget May 1, 2023, 9:57 AM

#

Idk if I am missing any steps.

earnest widget May 1, 2023, 9:58 AM

#

past meteor Do you use a `Standardscaler` before you do your PCA

No. Supposed to do that as well?

past meteor May 1, 2023, 9:58 AM

#

Yes you are 🙂 `make_pipeline(StandardScaler(), PCA(2))`
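Spelled out with imports, that one-liner could look like the sketch below. The random `features` array is a stand-in for the extracted features and is not from the thread:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in features whose columns live on very different scales.
features = rng.normal(size=(50, 10)) * np.arange(1, 11)

# Standardise every column to zero mean / unit variance, then keep 2 components.
pipeline = make_pipeline(StandardScaler(), PCA(2))
pca_result = pipeline.fit_transform(features)
print(pca_result.shape)
```

Without the scaler, PCA tends to be dominated by whichever columns have the largest variance, which is why scaling first is the usual default.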

earnest widget May 1, 2023, 9:59 AM

#

Oh okay.

#

But will that make any difference or should I go straight with LDA?

past meteor May 1, 2023, 9:59 AM

#

Personally I would go with PCA first again

earnest widget May 1, 2023, 10:00 AM

#

I am assuming the scaling function should be used with LDA, T-SNE etc?

earnest widget May 1, 2023, 10:00 AM

#

past meteor Personally I would go with PCA first again

Okay cool.

past meteor May 1, 2023, 10:02 AM

#

earnest widget I am assuming the scaling function should be used with LDA, T-SNE etc?

Many methods require feature scaling but not all of them. The rest can feel free to correct me if they disagree, but when in doubt you can rescale, because skipping scaling when it is needed hurts more than scaling when it isn't.

#

You should also use the exact image normalization that your pretrained model used. For example, some are trained on [-1, 1] (typically) others on [0, 1] so you should consult the docs to see what they did and mirror that.

earnest widget May 1, 2023, 10:03 AM

#

Oh alright. So this StandardScaler method is usually done right before using PCA or other methods?

earnest widget May 1, 2023, 10:04 AM

#

past meteor You should also use the exact image normalization that your pretrained model use...

You mean the size?

past meteor May 1, 2023, 10:04 AM

#

earnest widget Oh alright. So this StandardScaler method is usually done right before using PCA...

Yes indeed

past meteor May 1, 2023, 10:07 AM

#

earnest widget You mean the size?

Typically pixels are in [0, 255] but models are trained on [0,1] or [-1,1]
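The two ranges mentioned here are just different linear rescalings of the 8-bit pixel values. A tiny sketch (the helper names are made up; the actual convention to use is whatever the pretrained model's docs say):

```python
def to_unit_range(pixel):
    """Map an 8-bit pixel value from [0, 255] to [0, 1]."""
    return pixel / 255.0


def to_symmetric_range(pixel):
    """Map an 8-bit pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


for value in (0, 128, 255):
    print(value, to_unit_range(value), to_symmetric_range(value))
```

Many ImageNet models additionally subtract a per-channel mean and divide by a per-channel standard deviation after the [0, 1] step, which is why mirroring the documented preprocessing matters.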

earnest widget May 1, 2023, 10:08 AM

#

past meteor Typically pixels are in [0, 255] but models are trained on [0,1] or [-1,1]

Oh okay alright, let me just try out this scaling function and I will let you know.

past meteor May 1, 2023, 10:10 AM

#

I think you're already doing this since you mentioned you normalize and then resize. Just check the docs to see what they normalized with in training

earnest widget May 1, 2023, 10:12 AM

#

So for MobileNet I am using this docs, https://pytorch.org/vision/main/models/generated/torchvision.models.mobilenet_v3_small.html#torchvision.models.mobilenet_v3_small It seems like the model was trained on 256x256.

earnest widget May 1, 2023, 10:13 AM

#

past meteor Yes indeed

Also, the scaling option did not work either. Looks like the same thing.

past meteor May 1, 2023, 10:15 AM

#

"The inference transforms are available at MobileNet_V3_Small_Weights.IMAGENET1K_V1.transforms " Ideally you need to run this transformation instead of manually resizing / normalising

past meteor May 1, 2023, 10:16 AM

#

earnest widget Also, the scaling option did not work as well. Looks like the same thing.

The image is exactly the same as the previous one? Can I maybe see the code?

earnest widget May 1, 2023, 10:17 AM

#

past meteor "The inference transforms are available at `MobileNet_V3_Small_Weights.IMAGENET1...

Oh this is new to me. Let me just check it.

earnest widget May 1, 2023, 10:17 AM

#

past meteor The image is exactly the same as the previous one? Can I maybe see the code?

Yeah sure. Hold on.

#

So this code takes the features and labels:

features = []
labels = []
for images, label in train_dataloader:
    with torch.no_grad():
        outputs = model(images)
        features.append(outputs.numpy())
        labels.append(label.numpy())
features = np.concatenate(features, axis=0)
labels = np.concatenate(labels, axis=0)

I get 3488,1000 as the output. 3488 is the number of samples and 1000 is the features.

#

Then the scaling:

scaler = StandardScaler()  # assumed definition; `scaler` is not defined in the snippet as posted
features_2d = scaler.fit_transform(features)

#

%matplotlib inline

pca = PCA(n_components=2)
pca_result = pca.fit_transform(features_2d)
print(pca_result.shape) # (3488,2)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(pca_result[:, 0], pca_result[:, 1], c=labels, cmap=ListedColormap(colors))

# View the plot
plt.show()

#

This is the visualization.

#

Also, I did not freeze any layers. I think the model.eval() command does that.

past meteor May 1, 2023, 10:26 AM

#

Hmmm then I'm not sure because this seems to be OK. Could be the transforms that are going wrong. I doubt it, but it's good practice to use the ones they suggest in the docs for pretrained models. If it's not that it could be that the features from mobilenet are inadequate (try Xception for example, but the model is a lot larger) OR that it is really your data OR that the first 2 PC's are not discriminative

earnest widget May 1, 2023, 10:27 AM

#

Hmm okay.

#

So the docs just say to put the weights like this:

model = models.mobilenet_v3_small(pretrained=True, weights="MobileNet_V3_Small_Weights.IMAGENET1K_V1")

#

Cause it accepts the params.

#

Let me try it now.

#

I remove the normalization and resizing.

#

But idk how that would make a significant difference.

past meteor May 1, 2023, 10:30 AM

#

Yes, that's one thing but the other one (bottom paragraph of the docs) is doing all of the same transformations they did. They make it simple by offering you MobileNet_V3_Small_Weights.IMAGENET1K_V1.transforms on this link: https://pytorch.org/vision/main/models/generated/torchvision.models.mobilenet_v3_small.html#torchvision.models.mobilenet_v3_small

earnest widget May 1, 2023, 10:33 AM

#

past meteor Yes, that's one thing but the other one (bottom paragraph of the docs) is doing ...

Yeah I saw this but they have not explicitly shown where to add it to.

past meteor May 1, 2023, 10:37 AM

#

earnest widget Yeah I saw this but they have not explicitly shown where to add it to.

Wherever you were resizing and normalizing before, you can replace it with that

earnest widget May 1, 2023, 10:38 AM

#

past meteor Wherever you were resizing and normalizing before, you can replace it with that

You mean my transform function?

transform = transforms.Compose(
    [
        transforms.Resize((IMG_HEIGHT, IMG_WIDTH)),  # Resize the images to (224, 224)
        transforms.ToTensor(),  # Convert the images to PyTorch tensors
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        ),  # Normalize the images
    ]
)

past meteor May 1, 2023, 10:38 AM

#

earnest widget You mean my transform function? ```python transform = transforms.Compose( [...

yes, this

earnest widget May 1, 2023, 10:42 AM

#

past meteor yes, this

Ah okay, so I don't think that works because the transform function does not accept that module.

transform = transforms.Compose(
    [
        transforms.Resize((IMG_HEIGHT, IMG_WIDTH)),  # Resize the images to (224, 224)
        transforms.ToTensor(),  # Convert the images to PyTorch tensors
        # transforms.Normalize(
        #     mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        # ),  # Normalize the images
        transforms.MobileNet_V3_Small_Weights.IMAGENET1K_V1,
    ]
)

#

So I think I can apply the transformations which they have done through the transform function and then keep the weights param.

lapis sequoia May 1, 2023, 10:45 AM

#

past meteor May 1, 2023, 10:45 AM

#

earnest widget Ah okay, so I don't think that works because the transform function does not acc...

Maybe I'm explaining it poorly. Either way, you can just remove that part then because you hard coded the numbers which is fine as well I guess?

#

Using MobileNet_V3_Small_Weights.IMAGENET1K_V1.transforms (the .transforms part is missing in your snippet) just applies all the steps you've done.

#

I actually got to go now, if I were you I'd just try a different model at this point (Xception, Resnet)

earnest widget May 1, 2023, 10:49 AM

#

past meteor I actually got to go now, if I were you I'd just try a different model at this p...

Thanks a lot for your help. Will update you when I can.🤗

maiden widget May 1, 2023, 10:56 AM

#

can someone explain what is graph_execution_error and how can i overcome it ?

#

i have used tflearn to make a model for disease recognition in pomegranate. when i load the model and try to predict, it gives correct output at first but for second try it gives graph execution error

mild dirge May 1, 2023, 11:29 AM

#

You should probably check the shapes of your input and output data, and the input and output shape of your model. @maiden widget

maiden widget May 1, 2023, 11:59 AM

#

thanks

#

actually, the model needed to be closed or restarted before using it again

hot blade May 1, 2023, 1:17 PM

#

using keras, what's the difference between enabling return_sequences and having several nodes in the output layer?

lapis sequoia May 1, 2023, 1:26 PM

#

Anyone here know linear & logistic regression?

wooden sail May 1, 2023, 1:28 PM

#

lapis sequoia Anyone here know linear & logistic regression?

what's up? you got any specific questions?

graceful inlet May 1, 2023, 1:29 PM

#

hi guys, i use EMNIST balanced dataset(47 classes) in order to train my CNN in keras and i get an accuracy equal to 83%. however, when i make predictions with my model on new data(images drawn by me), i get inaccurate detections all the time(except for letter X for some reason lol).

inputs = Input(shape=(width, height, 1))
    x = Conv2D(filters=32, kernel_size=(3,3), activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.001))(inputs)
    x = MaxPooling2D(pool_size=(2,2 ))(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=64, kernel_size=(3,3), activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.001))(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(filters=128, kernel_size=(3,3), activation="relu", kernel_regularizer=tf.keras.regularizers.l1_l2(0.001) )(x)
    x = BatchNormalization()(x)
    x = Flatten()(x)
    x = Dropout(.5)(x)
    outputs = Dense(NB_CLASSES, activation="softmax")(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

this is what my model looks like. what could be the reason behind the incorrect predictions? i preprocess the data correctly, so i have no idea what causes the problem.

#

width = 28; height = 28 and NB_CLASSES = 47

#

little note: i trained the model on 15 epochs and i set the batch size to 16

graceful inlet May 1, 2023, 2:09 PM

#

apparently it works but i have to rotate the image counter clockwise by 90 like bruh 😭
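For anyone hitting the same thing: the fix suggests the hand-drawn images were oriented differently from the training images, so the input has to be brought into the same orientation before predicting. A minimal NumPy sketch of a 90 degree counter-clockwise rotation, using a tiny made-up array instead of a real 28x28 image:

```python
import numpy as np

# Tiny stand-in for a 28x28 grayscale image.
image = np.array([[1, 2],
                  [3, 4]])

rotated = np.rot90(image)  # one 90-degree counter-clockwise turn
print(rotated)
```

Whether a rotation alone is enough or a flip is also needed depends on how the dataset was loaded, so it is worth plotting one training image next to one of your own to compare.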

kind moth May 1, 2023, 2:14 PM

#

Hey can someone help with my code, I have an issue, here's my code:

import numpy as np
from keras.models import load_model
from PIL import Image
import time

# model
model = load_model('model.h5')

noise = np.random.randn(1, 512, 512, 3)

delay_time = 30  # seconds

for i in range(10):
    generated_images = model.predict(noise)

    # convert the image array
    generated_image = Image.fromarray(np.uint8(generated_images[0]*255))

    # save the image
    generated_image.save('generated_image.png')

    # delay
    time.sleep(delay_time)

For some reason when I try to run it, it just generates a 1 by 1 white pixel image.
Thanks for any help!

weary dust May 1, 2023, 3:54 PM

#

hi guys. i'm new to programming and i want to get into it
i saw this video 1 month ago https://www.youtube.com/watch?v=WtEYMELvRHI&t=53s&ab_channel=AtleFjellangSæther
i searched a lot about this and i want to make robots like this
from what i found on the internet, i need to learn matlab, python, machine learning, arduino and raspberry pi
are they good for making one like this?

YouTube

Atle Fjellang Sæther

Autonomous Self-Learning Robot (Q-Learning)

This video illustrates the work performed in the
context of our bachelor's thesis.

The project was conducted in collaboration
with Oslo and Akershus University College of
Applied Sciences.

The purpose of the thesis has been to elucidate
the main methods of self-learning systems, and
develop a self-learning algorithm for an
appropriate de...

agile cobalt May 1, 2023, 4:50 PM

#

weary dust hi guys. im new to programming i want to get into programming i saw this video 1...

you shouldn't need matlab
arduino or raspberry pi: probably no need to learn both
it is a bit harder to use python on arduino than on raspberry pi

do not underestimate machine learning
but yes, it should be possible to do something like that with python + raspberry pi + machine learning, but it is by no means a simple project

#

(to clarify, by "do not underestimate machine learning" I mean, it is significantly deep - might be like three times harder compared to the other items you'd have to learn if you were to understand what each part of the system is doing with a reasonable depth.... though if you just stick with using it as a black box, copy pasting from some tutorial and editing without trying to understand what's happening under the hood, which tbh is perfectly fine, it might not be that bad)

ocean swallow May 1, 2023, 5:42 PM

#

you guys used langchain?

#

It is very elementary to find the related text part given a question, but what if the related text is itself related to another text, so that the text, and hence all the relevant information, becomes too big to feed to AI models?

lapis sequoia May 1, 2023, 5:49 PM

#

guys any idea on how i can perform exploratory data analysis on huge datasets without my analysis being biased? because if i reduce the dataset then my analysis starts to lose integrity

umbral delta May 1, 2023, 8:38 PM

#

so i have this:

df = pd.concat([pd.read_csv('./Sales_Data/'+file) for file in listdir('./Sales_Data')])
df.dropna(how='any')
df['Price Each'] = pd.to_numeric(df['Price Each'])

which causes this error:

ValueError: Unable to parse string "Price Each"

could someone please explain why this is happening
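The message means that some cell in the `Price Each` column literally contains the text "Price Each". One plausible cause, which is an assumption since the CSV files aren't shown, is a header row repeated inside the data. Separately, `df.dropna(how='any')` returns a new frame, so its result is discarded in the snippet above. A minimal sketch with a made-up frame:

```python
import pandas as pd

# Made-up data imitating a header row repeated in the middle of the file.
df = pd.DataFrame({"Price Each": ["11.95", "Price Each", "99.99", None]})

# errors="coerce" turns unparseable strings into NaN instead of raising.
df["Price Each"] = pd.to_numeric(df["Price Each"], errors="coerce")

# dropna returns a new DataFrame, so assign the result back.
df = df.dropna(how="any")
print(df)
```

Before coercing, it can be worth inspecting the offending rows, e.g. by filtering on `df["Price Each"] == "Price Each"`, to confirm what they actually are.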

torn hull May 1, 2023, 9:11 PM

#

hey guys, how do we evaluate a recommendation system?

uncut wasp May 1, 2023, 11:11 PM

#

Hello, I am trying to run DiffMorph (https://github.com/volotat/DiffMorph/) on my mac. However I ran into some error

To run this program I tried running it by doing the following commands:

pip install -r requirements.txt
python morph.py

GitHub - volotat/DiffMorph: Image morphing without reference points by applying warp maps and optimizing over them.

#

however, while installing the packages I ran into an error: the required numpy version needed a higher Python version.
So I switched from Python 3.6 to Python 3.10.
I reran the same commands, but I got a new error while running pip install -r requirements.txt:

ERROR: Could not find a version that satisfies the requirement tensorflow==2.9.1 (from versions: none)
ERROR: No matching distribution found for tensorflow==2.9.1

How do I install tensorflow without getting this error?

earnest widget May 2, 2023, 5:44 AM

#

past meteor I actually got to go now, if I were you I'd just try a different model at this p...

Just wanted to let you know that I have tried out the LDA visualization and this is what I came up with. I think I know why the points keep coming closer: the features within each image are similar, so that's why there is an overlap of the points. That makes sense, right?

lapis sequoia May 2, 2023, 6:19 AM

#

this is my code

plt.plot(neigh, acc_score, marker="o", markeredgecolor = 'black', markerfacecolor = 'red')
plt.xlabel("Number of neighbors")
plt.ylabel("Accuracy score")

and this below is my graph. How can i add the annotation on the markers only or make the x axis scale a bit more detailed?

mild dirge May 2, 2023, 7:26 AM

#

You can do plt.xticks(range(51), range(51)) @lapis sequoia

lapis sequoia May 2, 2023, 7:27 AM

#

mild dirge You can do `plt.xticks(range(51), range(51))` <@456226577798135808>

that's great, thank you.

mild dirge May 2, 2023, 7:27 AM

#

Can also do plt.grid() so you can see where it aligns

somber panther May 2, 2023, 8:52 AM

#

pandas dataframe, use it as an object or like a dictionary?

wooden sail May 2, 2023, 8:53 AM

#

wdym by "as an object"? dicts are also objects

somber panther May 2, 2023, 8:54 AM

#

just because series can be accessed with dataframe["header"] and dataframe.header

#

was wondering if one is prefered

wooden sail May 2, 2023, 8:56 AM

#

do you mean dataframe.head? those do different things

untold bloom May 2, 2023, 8:57 AM

#

df["..."] always works, df.name sometimes works; so in "real" code, one might want to prefer the first

#

but the latter is easier to type, so there's that

#

the latter works only sometimes because of name clashes, e.g., if you have a column named "info" or "sum", it would fail and resolve to the DataFrame attribute of that name instead

#

if you have a column name that is not a valid Python identifier, e.g., one with spaces in it, it will fail too

#

so in short, IMHO, prefer df["..."] except in one-off quick trials on frames you perform

#

df["..."] also has the big advantage of immediately conveying that you are selecting a column
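A small sketch of those failure cases, using a made-up frame whose column names are chosen to trigger them:

```python
import pandas as pd

df = pd.DataFrame({"sum": [1, 2], "unit price": [3.0, 4.0], "qty": [5, 6]})

print(df["sum"].tolist())         # bracket access returns the column named "sum"
print(callable(df.sum))           # attribute access finds the DataFrame.sum method instead
print(df["unit price"].tolist())  # a name with a space only works with brackets
print(df.qty.tolist())            # attribute access works for a plain, non-clashing name
```

Assignment is another place where brackets are the safer habit: creating a new column needs `df["new"] = ...`, since `df.new = ...` only sets an attribute on the object.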

young granite May 2, 2023, 9:22 AM

#

if i have a linear regression model and do a residual plot after prediction, and in this plot i see x:y pairs of x:-x, how can i adapt the model to reduce this phenomenon?

mild dirge May 2, 2023, 9:31 AM

#

So it predicts the y value too low, and this error is proportional to x? @young granite

young granite May 2, 2023, 9:32 AM

#

mild dirge So it predicts the y value too low, and this error is proportional to x? <@38575...

i would want something like this:

#

but i get a linear trend

#

i did check the IDs of the "outlier" sets for all targets and see that if they follow this linear trend for one feature they often do it for more

#

i wouldnt say its proportional

#

i think its just not good represented by the model

mild dirge May 2, 2023, 9:35 AM

#

The residuals just show you the error for different x values, which is what the top plot shows. I don't see any linear trend in that, nor do I see why you would want to fit a linear regression model on the residuals.

young granite May 2, 2023, 9:35 AM

#

for some targets i got fewer datasets

young granite May 2, 2023, 9:36 AM

#

mild dirge The residuals just show you the error for different x values, which is what the ...

u misunderstood me i guess, i do a LR on my dataset 2k data, from that i get 50 features and want to generate 10 targets

#

these pics are just examples

#

the green dots go from -x:x through 0

mild dirge May 2, 2023, 9:38 AM

#

I'm sorry, I don't think I can help

young granite May 2, 2023, 9:38 AM

#

mild dirge I'm sorry, I don't think I can help

thanks for trying did u understood the problem now or do i need to rephrase more?

somber panther May 2, 2023, 10:35 AM

#

untold bloom df["..."] always works, df.... sometimes works; so in "real" code, one might wan...

this is helpful, thanks

wooden sail May 2, 2023, 11:06 AM

#

young granite u misunderstood me i guess, i do a LR on my dataset 2k data, from that i get 50...

can you explain a little more? we do linear regression to find 50 parameters that explain your 2000 data examples; how many dimensions do the data observations have each?

young granite May 2, 2023, 11:08 AM

#

wooden sail can you explain a little more? we do linear regression to find 50 parameters tha...

i got a dataset of 2k datasets, each containing 800x2 datapoints which are reduced to 50 features to predict 10 targets

wooden sail May 2, 2023, 11:11 AM

#

and you're doing linear regression on those 50 features? or?

young granite May 2, 2023, 11:13 AM

#

yes

wooden sail May 2, 2023, 11:16 AM

#

ok, so you reduce it to 2000 samples, each one being a vector of 50 features. and you wanna do regression on that. what are these targets you want to predict, and how many parameters are you using in the linear regression?

young granite May 2, 2023, 11:20 AM

#

wooden sail ok, so you reduce it to 2000 samples, each one being a vector of 50 features. an...

m/z values, i currently run simple LR from scikitlearn without any parameters

wooden sail May 2, 2023, 11:23 AM

#

i'm still not understanding what you're trying to do, sorry

young granite May 2, 2023, 11:25 AM

#

wooden sail i'm still not understanding what you're trying to do, sorry

where am i losing u?

cold osprey May 2, 2023, 11:26 AM

#

2000 samples

#

50 features

wooden sail May 2, 2023, 11:27 AM

#

i'm trying to figure out the size of the matrix we're working with, but i'm not sure what you're calling a feature here

young granite May 2, 2023, 11:29 AM

#

dataset: 2k
1 set contains a df of shape: 800row x 2col
each set is reduced to: 50 features
target are: 10 m/z values

cold osprey May 2, 2023, 11:30 AM

#

Wait what

wooden sail May 2, 2023, 11:30 AM

#

and you wanna do a different regression for each of the 2k datasets?

cold osprey May 2, 2023, 11:31 AM

#

What's that 800 x 2 in a set

young granite May 2, 2023, 11:31 AM

#

wooden sail and you wanna do a different regression for each of the 2k datasets?

so i got 2000x50 features for one LR model

young granite May 2, 2023, 11:31 AM

#

cold osprey What's that 800 x 2 in a set

spectroscopy data

cold osprey May 2, 2023, 11:31 AM

#

How did u get the 50 features

#

oh

wooden sail May 2, 2023, 11:43 AM

#

and what are you passing to sklearn's LinearRegression.fit()?

young granite May 2, 2023, 11:44 AM

#

my features

#

and their m/z

wooden sail May 2, 2023, 11:45 AM

#

ok. so the 2000x50 array of wavelet coefficients, and whatever this m/z is

#

then the linear regression learns 51 parameters, ok

young granite May 2, 2023, 11:46 AM

#

now back to the original question hahahaha

#

sorry for the circumstances

wooden sail May 2, 2023, 11:47 AM

#

in the plot you showed above, what is this "fitted value" you put on the x axis?

#

the predicted m/z?

young granite May 2, 2023, 11:47 AM

#

yes

#

residual is: real-pred

#

so when x:-x occurs the model has problems

#

and i want to figure out a way to improve that other than to switch model

wooden sail May 2, 2023, 11:48 AM

#

what does your notation x:-x mean

young granite May 2, 2023, 11:48 AM

#

x:y coordinates

wooden sail May 2, 2023, 11:48 AM

#

ok

young granite May 2, 2023, 11:48 AM

#

pred:-pred

#

linear downwards trend

wooden sail May 2, 2023, 11:49 AM

#

that does indeed indicate model mismatch and not noise

#

you can try to whiten the data before doing linear regression

#

but if the relationship between the data is not linear, no amount of preprocessing will help
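
A minimal sketch of what "whitening" could look like here, assuming PCA whitening done with plain NumPy. The helper name `whiten`, the `eps` guard, and the synthetic 2000 x 5 matrix are illustrative assumptions, not the actual wavelet features from the discussion.

```python
import numpy as np

def whiten(X, eps=1e-12):
    """PCA-whiten the columns of X: zero mean, approximately identity covariance."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes and rescale each axis to unit variance.
    # eps only guards against division by a numerically zero eigenvalue.
    return X_centered @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: correlated columns (hypothetical data).
mixing = np.eye(5) + 0.5 * np.triu(np.ones((5, 5)), k=1)
X = rng.normal(size=(2000, 5)) @ mixing
X_white = whiten(X)
cov_white = np.cov(X_white, rowvar=False)
```

For ordinary least squares with an intercept, an invertible linear transform of the features changes the coefficients but not the fitted values, so whitening probably matters most for conditioning or for regularized models. It will not remove a trend caused by a genuinely nonlinear relationship.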

young granite May 2, 2023, 11:52 AM

#

could it be due to lack of certain m/z values

#

so that it will decrease over time

#

cause thats my assumption

wooden sail May 2, 2023, 11:52 AM

#

probably not tbh

young granite May 2, 2023, 11:53 AM

#

if a m/z is in 90% its good represented but not for 10%

#

mhhh

#

i came to this conclusion cause i tried a simple CNN aswell which resulted in a similar trend

somber panther May 2, 2023, 1:24 PM

#

so Series.count() isn't what i was expecting, whats the trick?

#

need to collect the number of "Black" in a series

cold osprey May 2, 2023, 1:25 PM

#

groupby count
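
A small sketch of the counting options, on made-up data (the `colors` series is hypothetical). `Series.count()` returns the number of non-null entries, which is likely why it was not what was expected; comparing against the value and summing the booleans counts one specific value, and `value_counts()` gives the count for every value at once.

```python
import pandas as pd

# Made-up example data; the real column is not shown in the chat.
colors = pd.Series(["Black", "Gray", "Black", None, "Red"])

n_non_null = colors.count()               # counts non-null entries, not "Black"
n_black = int((colors == "Black").sum())  # True counts as 1, False as 0
per_value = colors.value_counts()         # one count per distinct value
```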

willow quest May 2, 2023, 1:27 PM

#

I'm being asked to find a 'nice looking' representation of how close one number is to one another, preferably condensed to a 0-1 or 0-100 scale (it's not a group/population, just a series of A and B data points that are unrelated to other A and B data points, so cannot apply normalization).

So when A=100 and B=100, the score is like 1 or 100. But when A or B is different, they want to know the 'distance' so to speak, without the sign. So e.g. A=50, B=100 = 0.5 or 50 score. But if A=1000000 and B=100, then the score would be 0.000x or similar. Anyone has any ideas? My stats background is in life sciences so I'm a bit lost here 😅

hoary wigeon May 2, 2023, 1:52 PM

#

Hello everyone, I need help with understanding the use case of Shapley values..

First of all, is it possible to calculate record-level Shapley values? (record in the sense of an individual observation used for training or testing the model)

untold cliff May 2, 2023, 2:15 PM

#

Does the imbalanced datasets problem concern only the target variable? Like if i have an imbalanced feature, do i have to deal with it?

cold osprey May 2, 2023, 2:15 PM

#

wdym imbalanced feature?

#

like skewed?

untold cliff May 2, 2023, 2:16 PM

#

cold osprey wdym imbalanced feature?

Sorry, i meant categorical features.

cold osprey May 2, 2023, 2:16 PM

#

oh

#

idt u need to do anything

untold cliff May 2, 2023, 2:18 PM

#

cold osprey idt u need to do anything

Ok so that would be a problem only if the imbalance isn't really representative of the population i guess?

tidal bough May 2, 2023, 2:24 PM

#

willow quest I'm being asked to find a 'nice looking' representation of how close one number ...

Well, you could take |A-B| and apply to it any function that maps [0, ∞) to [0,1). For example, arctan (times a constant).

#

atan(|A-B|)*2/π would be 0 for A=B, and approaches 1 as |A-B| approaches infinity.

#

the logistic function (aka the sigmoid, 1/(1+exp(-x))) is another choice for the function. Though that one is 1/2 at 0, so you'd have to rescale it.
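
A rough sketch of the arctan idea above. The function names and the `scale` parameter are assumptions added for illustration; `scale` just controls how quickly the score saturates. Since the request was for 1 when A=B, the second helper flips the distance into a closeness score.

```python
import math

def distance_score(a, b, scale=1.0):
    """0 when a == b, approaching 1 as |a - b| grows (scale is an assumed knob)."""
    return math.atan(abs(a - b) / scale) * 2 / math.pi

def closeness_score(a, b, scale=1.0):
    """1 when a == b, approaching 0 as |a - b| grows."""
    return 1.0 - distance_score(a, b, scale)
```

With the default `scale=1.0` the score saturates quickly for large differences, so something like `closeness_score(50, 100, scale=100)` may read more naturally; the right scale depends on the data.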

willow quest May 2, 2023, 2:35 PM

#

tidal bough `atan(|A-B|)*2/π` would be 0 for A=B, and approaches 1 as |A-B| approaches infin...

thank you! going to read in to the mapping ability rn

uncut wasp May 2, 2023, 2:37 PM

#

uncut wasp Hello, I am trying to run DiffMorph (https://github.com/volotat/DiffMorph/) on m...

can somebody help me with this

lapis sequoia May 2, 2023, 2:42 PM

#

Heyy

uncut wasp May 2, 2023, 2:43 PM

#

Hi @lapis sequoia

upper bridge May 2, 2023, 2:47 PM

#

following this tutorial:https://www.geeksforgeeks.org/disease-prediction-using-machine-learning/ and in code ```py

# Training the models on whole data
final_svm_model = SVC()
final_nb_model = GaussianNB()
final_rf_model = RandomForestClassifier(random_state=18)
final_svm_model.fit(X, y)
final_nb_model.fit(X, y)
final_rf_model.fit(X, y)

# Reading the test data
test_data = pd.read_csv("./dataset/Testing.csv").dropna(axis=1)

test_X = test_data.iloc[:, :-1]
test_Y = encoder.transform(test_data.iloc[:, -1])

# Making prediction by take mode of predictions
# made by all the classifiers
svm_preds = final_svm_model.predict(test_X)
nb_preds = final_nb_model.predict(test_X)
rf_preds = final_rf_model.predict(test_X)

final_preds = [mode([i,j,k])[0][0] for i,j,
               k in zip(svm_preds, nb_preds, rf_preds)]

print(f"Accuracy on Test dataset by the combined model: {accuracy_score(test_Y, final_preds)*100}")

cf_matrix = confusion_matrix(test_Y, final_preds)
plt.figure(figsize=(12,8))

sns.heatmap(cf_matrix, annot = True)
plt.title("Confusion Matrix for Combined Model on Test Dataset")
plt.show()
``` i get error y contains previously unseen labels: 'Fungal infection' why is that? my dataset contains the label

GeeksforGeeks

Disease Prediction Using Machine Learning - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

misty flint May 2, 2023, 3:02 PM

#

if anyone is interested in DE https://www.linkedin.com/posts/benjaminrogojan_data-engineering-study-guide-outline-make-activity-7059166721567297536-rBDL

Benjamin Rogojan on LinkedIn: Data Engineering Study Guide - Outlin...

Interviewing for any technical position generally requires preparing, studying, and long, all-day interviews.

This is why I put together a data engineer…

upper bridge May 2, 2023, 3:23 PM

#

upper bridge following this tutorial:https://www.geeksforgeeks.org/disease-prediction-using-m...

Help please

mint palm May 2, 2023, 4:29 PM

#

i removed loss.backward, and removed shuffle, passed same sample at test and train time, BUT LOSS comes out to be DIFFERENT. HOWWWWWW?

#

This model is f**king with me, i am fed up.

serene scaffold May 2, 2023, 6:13 PM

#

mint palm i removed loss.backward, and removes shuffle, passed same sample at test and tra...

I'm sorry that this is frustrating for you. if you want help, you might get it if you show the code (not as a screenshot).

mint palm May 2, 2023, 6:45 PM

#

serene scaffold I'm sorry that this is frustrating for you. if you want help, you might get it i...

It happens, i figured it out, it was the random crop that was making the difference.
One more thing, i noticed my model was not improving. I made some changes to a model(added a new backbone and a transformer encoder) and before that it was training as expected.
Considering Implementation is not an issue, can you please tell me what to try?
few more observation:

loss is decreasing during training
loss isnt decreasing during testing
sometimes test accuracy reduces

Should i try changing hyperparameters? if so, what?

hasty mountain May 2, 2023, 8:45 PM

#

mint palm It happens, i figured it out, it was the random crop that was making the differe...

I mean...if you removed the loss.backward() during training, the loss won't decrease because you're not backpropagating the gradients.

#

Isn't the idea to train the model on train samples, then evaluate it on test sample to check how things are going?
Its loss won't decrease during test. The loss in the test section will only decrease after a train section.

#

Also, the "sometimes test accuracy reduces", I suppose the cause for that might be similar to why sometimes, after a batch iteration (or even an epoch) the loss might increase instead of decreasing. It's just the stochastic gradient nature. The model might be optimized into a worse point accidentally (or, for the accuracy, towards being overfit), but then fix that afterwards.

pallid badge May 2, 2023, 9:21 PM

#

Hi everybody

#

I was wondering if you can recommend good exercises for scipy, numpy, matplotlib including algorithm development

agile cobalt May 2, 2023, 9:28 PM

#

not sure about scipy, but for numpy+matplotlib you could try implementing some simple algorithms like k-means clustering

waxen tusk May 2, 2023, 9:29 PM

#

Does anyone know of any good math courses focused around data science/ML principles?

agile cobalt May 2, 2023, 9:32 PM

#

I've heard about https://www.deeplearning.ai/courses/mathematics-for-machine-learning-and-data-science-specialization but haven't tried it myself so cannot really vouch for it

Mathematics for Machine Learning and Data Science Specialization

A beginner-friendly specialization where you'll master the fundamental mathematics toolkit of machine learning: calculus, linear algebra, statistics, and probability.

waxen tusk May 2, 2023, 9:32 PM

#

Ty

agile cobalt May 2, 2023, 9:32 PM

#

there are also 3blue1brown's videos on youtube

pallid badge May 2, 2023, 9:40 PM

#

agile cobalt there are also 3blue1brown's videos on youtube

For what purpose?

agile cobalt May 2, 2023, 9:41 PM

#

learning math?
they have some very nice visualisations

#

ah, yeah that was in response to Whip, not to you

cunning agate May 2, 2023, 9:47 PM

#

hello guys where can i find some ai and ml project to work on it

hazy sequoia May 2, 2023, 10:19 PM

#

cunning agate hello guys where can i find some ai and ml project to work on it

You can check on Kaggle

untold cliff May 2, 2023, 10:33 PM

#

@willow quest @tidal bough Here's what chatgpt said which is nice and simple i guess:
One option could be to calculate the ratio between the two numbers and then scale it to a 0-1 or 0-100 range. For example, if A=50 and B=100, the ratio is 0.5, which can be scaled to a 50 out of 100 score or a 0.5 out of 1 score. Similarly, if A=1000000 and B=100, the ratio is 10000, which can be scaled to a 0.0001 out of 1 score or a 0.01 out of 100 score.

Another option could be to take the logarithm of the ratio between the two numbers, which would compress the range of values and make it easier to compare across different magnitudes. For example, if A=50 and B=100, the logarithm of the ratio would be -0.301, which could be scaled to a 30 out of 100 score or a 0.3 out of 1 score. If A=1000000 and B=100, the logarithm of the ratio would be 4.605, which could be scaled to a 0.046 out of 1 score or a 4.6 out of 100 score.

Ultimately, the choice of method would depend on the specific requirements of the task and the preferences of the stakeholders involved.
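
A minimal sketch of the ratio idea, assuming both numbers are strictly positive. Taking the smaller over the larger keeps the score in (0, 1] regardless of which of A or B is bigger; the name `ratio_score` is hypothetical.

```python
def ratio_score(a, b):
    """Smaller value divided by larger value: 1.0 when equal, near 0 when far apart.

    Assumes a and b are strictly positive; zero or negative inputs are rejected.
    """
    if a <= 0 or b <= 0:
        raise ValueError("ratio_score assumes strictly positive inputs")
    return min(a, b) / max(a, b)
```

This reproduces the examples from the question: A=50, B=100 gives 0.5, and A=1000000, B=100 gives 0.0001. Multiply by 100 for a 0-100 scale.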

somber panther May 2, 2023, 11:25 PM

#

could use a good video covering pandas if anyone has suggestions

untold cliff May 2, 2023, 11:33 PM

#

somber panther could use a good video covering pandas if anyone has suggestions

A playlist: https://youtube.com/playlist?list=PL9oKUrtC4VP7ry0um1QOUUfJBXKnkf-dA

YouTube

Python Pandas For Your Grandpa

So easy, your grandpa could learn it! A free course on Python Pandas. If you like this and want to support my development of future courses, consider 1) Liki...

cloud marsh May 2, 2023, 11:36 PM

#

I'm using pyenv with virtualenv. I have a ROCm GPU and i'm running the command on pytorch.org/get-started/locally.

i've set --no-cache-dir and ensured it's pulling from indexes in the proper order. it's downloading the linux manylinux wheels and then downloads the nvidia_cuda_cu11 wheels anyways.

does it just download that anyways? or am i not specifying things correctly?

somber panther May 2, 2023, 11:48 PM

#

man... 100days course has me using pandas before numpy, all the resources for pandas seem to suggest that i'm doing this out of order...

somber panther May 2, 2023, 11:50 PM

#

untold cliff A playlist: https://youtube.com/playlist?list=PL9oKUrtC4VP7ry0um1QOUUfJBXKnkf-dA

I bookmarked this, thankyou

untold cliff May 2, 2023, 11:52 PM

#

somber panther I bookmarked this, thankyou

He does have an equally good playlist for numpy as well btw.

somber panther May 2, 2023, 11:53 PM

#

Yeah I'm going to run through these exercises and just type what im told for now, i'll probably go through those before I start my DS course

serene scaffold May 2, 2023, 11:58 PM

#

somber panther man... 100days course has me using pandas before numpy, all the resources for pa...

learning pandas first is fine. the main hurdle with learning either is to not write for loops.

maiden widget May 3, 2023, 3:46 AM

#

i am making a model to classify an image in 5 class

network = input_data(shape=input_shape)
network = conv_2d(network, 32, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 128, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 5, activation='softmax')
network = regression(network, optimizer='adam',loss='categorical_crossentropy', learning_rate=0.001)

#

can i use anything else rather than regression for output layer ?

lapis sequoia May 3, 2023, 3:47 AM

#

Is it run program or is just example?

maiden widget May 3, 2023, 3:49 AM

#

this is the model i am using to train the model, but my teacher is saying regression is used for prediction not classification

earnest widget May 3, 2023, 6:53 AM

#

How can I get both my labels to show?

# Plot the data using different colors for each class
fig, ax = plt.subplots()
scatter = ax.scatter(features_2d[:, 0], features_2d[:, 1], c=labels)

plt.legend(loc='upper right', labels=['Container', 'No_Container'])

# Set the title and show the plot
ax.set_title("LDA Visualization")
plt.show()

This only shows one label and not the other.

untold cliff May 3, 2023, 8:36 AM

#

earnest widget How can I get both my labels to show? ```python # Plot the data using different...

What does your labels variable contain?(the one used for c)

mint palm May 3, 2023, 8:40 AM

#

easy way to convert first tensor to second?

#

currently i do: sims = sims.reshape(2, 3*2).t().view(2, 3, 2)

earnest widget May 3, 2023, 8:44 AM

#

untold cliff What does your labels variable contain?(the one used for c)

Two classes 0 and 1. 0 is container and 1 is no_container.

tidal bough May 3, 2023, 8:47 AM

#

mint palm currently i do: ``sims = sims.reshape(2, 3*2).t().view(2, 3, 2)``

arr.reshape((2,2,3)).transpose((1,2,0)) would do, for one.

mint palm May 3, 2023, 8:48 AM

#

tidal bough `arr.reshape((2,2,3)).transpose((1,2,0))` would do, for one.

is it faster also? or just cleaner?

tidal bough May 3, 2023, 8:48 AM

#

who knows, probably about the same

mint palm May 3, 2023, 8:49 AM

#

ok, i am gonna use your, looks better

mint palm May 3, 2023, 8:53 AM

#

tidal bough `arr.reshape((2,2,3)).transpose((1,2,0))` would do, for one.

got no overload error for transpose

tidal bough May 3, 2023, 8:54 AM

#

probably it's done slightly differently in torch than in numpy, maybe .transpose(1,2,0) or whatever
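
A quick NumPy check, on a made-up 12-element array, that the two reshaping routes discussed above produce the same values. The PyTorch side is only described in comments: there, `Tensor.transpose` swaps exactly two dimensions, and the multi-axis reordering is `Tensor.permute(1, 2, 0)`, which is the likely cause of the "no overload" error.

```python
import numpy as np

arr = np.arange(12)  # stand-in for the flattened `sims` values (hypothetical data)

# Route 1: mirrors sims.reshape(2, 3*2).t().view(2, 3, 2)
original = arr.reshape(2, 6).T.reshape(2, 3, 2)

# Route 2: mirrors arr.reshape((2,2,3)).transpose((1,2,0))
# In torch this would be written as sims.reshape(2, 2, 3).permute(1, 2, 0).
suggested = arr.reshape(2, 2, 3).transpose(1, 2, 0)

same = np.array_equal(original, suggested)
```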

boreal gale May 3, 2023, 8:55 AM

#

earnest widget How can I get both my labels to show? ```python # Plot the data using different...

the crux of the issue is that you have only called ax.scatter once (plt.legend seems to be matching against all the plots you have plotted so far in a best-effort manner, seeing as you only plotted once, there can only be one legend entry)
can you try using two ax.scatter, one for the individual classes on their own, so two in total

earnest widget May 3, 2023, 8:58 AM

#

boreal gale the crux of the issue is that you have only called `ax.scatter` once (`plt.legen...

Like this?


# Plot the data using different colors for each class
fig, ax = plt.subplots()
scatter1 = ax.scatter(features_2d[:, 0], c=labels)
scatter2 = ax.scatter(features_2d[:, 1], c=labels)

# 0 is container and 1 is no container
plt.legend(*scatter.legend_elements(), loc="upper right", title="Classes")

# Set the title and show the plot
ax.set_title("LDA Visualization")
plt.show()

boreal gale May 3, 2023, 9:02 AM

#

no, you need to select rows where the corresponding label is 0 and plot and then repeat (subbing 0 with 1), do you know how to do that?

earnest widget May 3, 2023, 9:04 AM

#

Yeah so currently it is getting all the rows of column 0.

#

Which is one of my classes.

wooden sail May 3, 2023, 9:08 AM

#

each column is a class?

boreal gale May 3, 2023, 9:08 AM

#

0 is indeed one of your classes, but plotting just features_2d[:, 0] is not going to work. you haven't specified the y argument, also think about what is features_2d[:, 0], did it really select all rows of class 0?

earnest widget May 3, 2023, 9:10 AM

#

boreal gale 0 is indeed one of your classes, but plotting just `features_2d[:, 0]` is not go...

Yeah it does not, it prints the first element in the array.

       -1.2610931 ,  0.05956531], dtype=float32)```

boreal gale May 3, 2023, 9:12 AM

#

earnest widget Yeah it does not, it prints the first element in the array. ```array([ 0.9578631...

"first element in the array" is not quite correct.

it's printing the red box, the first "column" of the array

earnest widget May 3, 2023, 9:12 AM

#

I get all with just features_2d[:]

         0.17381935,  0.69571465],
       [-0.68409383,  1.4792389 ,  0.50605595, ..., -1.1951295 ,
        -0.1751789 ,  0.39030302],
       [-1.2169716 ,  0.4897305 ,  0.03768349, ..., -0.6940949 ,
        -1.2008945 ,  0.4508954 ],
       ...,
       [-1.0407516 , -0.3269264 ,  0.41814092, ..., -1.038974  ,
        -1.4509413 ,  1.5210139 ],
       [-1.2610931 ,  0.33272272,  0.88745356, ..., -1.881851  ,
        -1.0341158 ,  2.5147882 ],
       [ 0.05956531, -0.46555987,  1.7835644 , ..., -0.74892455,
        -2.190432  ,  1.3660964 ]], dtype=float32)

boreal gale May 3, 2023, 9:13 AM

#

what you want is something like this
assuming all the red box is where the corresponding class is 0

#

"where the corresponding class is 0" meaning in the label array, the corresponding entry is 0
e.g. the 3rd element in the array is 0 for the first red box

earnest widget May 3, 2023, 9:16 AM

#

boreal gale "where the corresponding class is 0" meaning in the label array, the correspondi...

Yeah so 0 and 1 for the respective images, so it should show as [0] in the red boxes?

boreal gale May 3, 2023, 9:17 AM

#

i didn't really understand what you meant there, mind elaborating?

earnest widget May 3, 2023, 9:17 AM

#

You said in the red box is where the corresponding class is, so it will show as [0,0], like that?

wooden sail May 3, 2023, 9:18 AM

#

what shape is your data? i think ry means the data is of size N x 3 and the 3rd column is the class

boreal gale May 3, 2023, 9:19 AM

#

i was operating under the assumption you have two arrays, a 2D array for the features, and a 1D array for the labels/classes

earnest widget May 3, 2023, 9:19 AM

#

So my features_2d is 3488,1000
Labels is 3488.

#

That is the shape.

earnest widget May 3, 2023, 9:20 AM

#

boreal gale i was operating under the assumption you have two arrays, a 2D array for the fea...

Yeah I have it as a list.

boreal gale May 3, 2023, 9:20 AM

#

this is what i meant re. corresponding entry (i didn't highlight all 1s obviously)

earnest widget May 3, 2023, 9:20 AM

#

Yeah yeah that

#

Like that is how I have my labels:

[0 0 0 0 0 0 0 0 0 0]```

#

For each image.

#

Now I understood what you mean.

boreal gale May 3, 2023, 9:21 AM

#

you lost me at the word image hmm

#

but do you know how to select those rows with label 1?

wooden sail May 3, 2023, 9:22 AM

#

ok that also works. you can make an array of indices based on the labels, and use those to index the rows

earnest widget May 3, 2023, 9:23 AM

#

boreal gale you lost me at the word image hmm

For each image, I am having a label of 1 and 0 respectively. That's what I mean.

wooden sail May 3, 2023, 9:23 AM

#

maybe something like

indices = labels == 0
features_2d[indices, :]

#

indices is a boolean array with True for rows with a label of 0, and we can use that to weed out the class 0

#

something similar can be done for the class 1

#

here's a MWE

#

!e

import numpy as np
data2d = np.random.normal(size=(4,3))
labels = np.array([0,1,0,0])
print(data2d)
print(labels)
indices = labels == 0
print(data2d[indices, :])

arctic wedgeBOT May 3, 2023, 9:26 AM

#

@wooden sail :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | [[-0.59429971 -0.92207244  0.48890026]
002 |  [ 0.22585973 -0.70344427 -0.46252381]
003 |  [ 0.46275675  1.01961049 -1.97198535]
004 |  [ 0.11588553  0.40670295  0.83123672]]
005 | [0 1 0 0]
006 | [[-0.59429971 -0.92207244  0.48890026]
007 |  [ 0.46275675  1.01961049 -1.97198535]
008 |  [ 0.11588553  0.40670295  0.83123672]]

earnest widget May 3, 2023, 9:26 AM

#

Okay yeah that helps bring out the class 0 but what is indices for then? To make it include only 0?

wooden sail May 3, 2023, 9:27 AM

#

to make it include only class 0

earnest widget May 3, 2023, 9:27 AM

#

Yeah class 0, okay.

wooden sail May 3, 2023, 9:27 AM

#

what ry has been saying all this time is that your plot is not split by classes, from what i understand. and they're trying to help you do that

earnest widget May 3, 2023, 9:28 AM

#

Yeah I realized that when I was checking array that it was not showing correctly.

#

So I gotta redo my scatterplot.

muted crypt May 3, 2023, 9:30 AM

#

Anyone familiar with neural networks here?

earnest widget May 3, 2023, 9:31 AM

#

@boreal gale Going back to your statement earlier, what did you mean by two ax.scatter?

boreal gale May 3, 2023, 9:34 AM

#

earnest widget @boreal gale Going back to your statement earlier, what did you mean by...

#

it smells slightly to not have legend defined where your scatter plot is though, hence i would prefer this

earnest widget May 3, 2023, 9:43 AM

#

boreal gale it smells slightly to not have legend defined where your scatter plot is though,...

Yeah I was also getting confused with which label was what. But wait, I am just trying to fully understand this. So the features_2d_0 will contain all of the indices of class 0 right? Then in the first ax.scatter, what is that doing?

boreal gale May 3, 2023, 9:51 AM

#

> features_2d_0 will contain all of the indices of class 0 right
it contains all rows which are of class 0
> Then in the first ax.scatter, what is that doing?
it plots a scatter plot, using the 0th column as x and 1st column as y, for all points of class 0.
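
A sketch of the per-class scatter described above, assuming `features_2d` is a NumPy array and `labels` holds 0/1 values. The random data here is made up, and the `Agg` backend is only selected so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
features_2d = rng.normal(size=(20, 2))  # made-up stand-in for the LDA features
labels = np.asarray([0, 1] * 10)        # np.asarray matters if labels is a list

fig, ax = plt.subplots()
for class_id, name in [(0, "Container"), (1, "No_Container")]:
    rows = features_2d[labels == class_id]          # rows of this class only
    ax.scatter(rows[:, 0], rows[:, 1], label=name)  # one scatter call per class
legend = ax.legend(loc="upper right", title="Classes")
ax.set_title("LDA Visualization")
```

Each `ax.scatter` call carries its own `label`, so the legend gets one entry per class and the class-to-name mapping sits right next to the plotting call.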

earnest widget May 3, 2023, 9:59 AM

#

boreal gale > features_2d_0 will contain all of the indices of class 0 righ it contains all...

Oh okay, I get it now. Understood well.

#

Thanks a lot. @boreal gale @wooden sail

weary dust May 3, 2023, 10:37 AM

#

hi , i am currently learning arduino and matlab i want to make a robot like this https://www.youtube.com/watch?v=WtEYMELvRHI&ab_channel=AtleFjellangSæther. is arduino and matlab enough for this? and should i change matlab with python

YouTube

Atle Fjellang Sæther

Autonomous Self-Learning Robot (Q-Learning)

This video illustrates the work performed in the
context of our bachelor's thesis.

The project was conducted in collaboration
with Oslo and Akershus University College of
Applied Sciences.

The purpose of the thesis has been to elucidate
the main methods of self-learning systems, and
develop a self-learning algorithm for an
appropriate de...


harsh stump May 3, 2023, 12:27 PM

#

guys does this data looks or seems to be white noise ?

wooden sail May 3, 2023, 12:31 PM

#

you can check. the main properties are zero mean, constant variance, and uncorrelatedness

#

subtract the mean value and check whether those properties hold
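
A rough sketch of those checks in NumPy, assuming a 1-D series. Comparing the variance of the two halves and looking at the lag-1 autocorrelation are simplifications chosen for illustration, not a formal test; the helper name and the synthetic `noise` series are hypothetical.

```python
import numpy as np

def white_noise_summary(x):
    """Crude diagnostics: mean, variance of each half, and lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    half = len(x) // 2
    return {
        "mean": x.mean(),
        "var_first_half": centered[:half].var(),
        "var_second_half": centered[half:].var(),
        # Correlation between the series and itself shifted by one step.
        "lag1_autocorr": np.corrcoef(centered[:-1], centered[1:])[0, 1],
    }

rng = np.random.default_rng(0)
noise = rng.normal(size=5000)  # synthetic white noise for comparison
summary = white_noise_summary(noise)
```

For white noise the two half-variances should be close and the lag-1 autocorrelation near zero; a variance that grows over time, as suspected in the reply below, would show up as a clearly larger second-half variance.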

harsh stump May 3, 2023, 12:34 PM

#

harsh stump guys does this data looks or seems to be white noise ?

i mainly think it is not due to the std, it seems to be increasing by time

rough lava May 3, 2023, 12:47 PM

#

Hello people,
Question for Dropout
Is spatialDropout used in bi-lstm? or just cnn?
Cuz so far it helped my model a lot

foggy harness May 3, 2023, 3:34 PM

#

I am working on an object detection project to detect road defects and I already have the data.

There are main class and subclasses, for example one of the main class being cracks and the subclasses being multiple line crack, Hairline Crack , Block Crack and etc.

Right now we are trying out Grounding DINO on this but it is giving a lot of noise and detecting things that are not cracks.

Currently, I am stuck in terms of accuracy/mAP, and any tips/advice to approach this object detection task would be greatly appreciated. Since cracks tend to be similar, I feel simply just using an object detection model would not be enough for good performance

Here is an example of the classes:

Cracks
---Transverse crack
---Longitudinal crack
---Multi crack
---Alligator crack
---Block crack
---Rigid pavement crack
Potholes
---Wet pothole
---Pothole with cracks
---Dry pothole

iron basalt May 3, 2023, 6:42 PM

#

weary dust hi , i am currently learning arduino and matlab i want to make a robot like this...

I recommend starting with the Arduino Programming Language (the default it comes with), which is similar to C++. Make some projects with that, no machine learning. Then learn some Python to run on your PC, make some projects with that, no machine learning. Then, if you still want to get into machine learning, you will need to learn some mathematics and play around with some machine learning libraries in Python, there are many resources for that.

#

Make sure to read the documentation: https://docs.arduino.cc/learn/starting-guide/getting-started-arduino

Getting Started with Arduino | Arduino Documentation

An introduction to hardware, software tools, and the Arduino API.

faint mist May 3, 2023, 7:22 PM

#

Hello everyone, I am trying to build a multistep LSTM-DNN model to forecast gold prices

#

Mainly it is a a regression problem with timeseries data

uncut wasp May 3, 2023, 7:24 PM

#

foggy harness I am working on an object detection project to detect road defects and I already...

That's really cool what you're doing. Is it open source?

faint mist May 3, 2023, 7:24 PM

#

What I am struggling to setup is the supervised dataset

past meteor May 3, 2023, 7:28 PM

#

faint mist What I am struggling to setup is the supervised dataset

have a look at TSfresh

faint mist May 3, 2023, 7:30 PM

#

Interesting, but I can see its more about feature extraction which is indeed part of the setup but not really what I am asking for

#

lets assume that for now, the only feature we have is the historical price

#

using the last 30 days of data to forecast the next 7 days
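
A minimal sketch of that supervised setup in plain NumPy, assuming a single univariate price series. The function name `make_windows` and the toy `prices` array are illustrative; each sample pairs 30 consecutive prices with the 7 that follow.

```python
import numpy as np

def make_windows(prices, n_in=30, n_out=7):
    """Turn a 1-D series into (X, y): n_in past values -> next n_out values."""
    prices = np.asarray(prices, dtype=float)
    n_samples = len(prices) - n_in - n_out + 1
    if n_samples < 1:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([prices[i:i + n_in] for i in range(n_samples)])
    y = np.stack([prices[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, y

prices = np.arange(100.0)  # toy series standing in for historical gold prices
X, y = make_windows(prices)
```

For an LSTM, `X` would typically get a trailing feature axis, for example `X[..., None]` with shape `(samples, 30, 1)`. Splitting into train and test by time rather than at random avoids leaking future prices into training.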

past meteor May 3, 2023, 7:32 PM

#

Yeah Tsfresh has transforms to make your dataset "rolling" as well

faint mist May 3, 2023, 7:32 PM

#

I see. Let me check it out

past meteor May 3, 2023, 7:33 PM

#

faint mist I see. Let me check it out

https://tsfresh.readthedocs.io/en/latest/text/forecasting.html

cosmic lynx May 3, 2023, 8:51 PM

#

If python has a hard time with computationally demanding tasks, why is it popular for AI development?

agile cobalt May 3, 2023, 8:53 PM

#

cosmic lynx If python has a hard time with computationally demanding tasks, why is it popula...

all of the computationally demanding parts are handled by libraries like numpy or pytorch, which can do the operations in a way as efficient as they would be if done in C/C++

#

ofc, that requires using these libraries and writing code in a way that makes good use of the features they offer though

#

furthermore, when you add in GPUs / TPUs to the mix, doing it in python and letting pytorch/tensorflow optimise how to make good use of the GPU/TPU can make it orders of magnitudes faster
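
A tiny illustration of the point about libraries, on made-up data: both functions compute the same sum of squares, but the first runs the loop in the Python interpreter while the second hands the whole array to NumPy's compiled code.

```python
import numpy as np

values = np.arange(10_000, dtype=np.float64)  # toy data

def sum_squares_loop(xs):
    """Pure-Python loop: every multiply and add goes through the interpreter."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_squares_numpy(xs):
    """Same computation as a single call into NumPy's compiled routines."""
    return float(np.dot(xs, xs))

loop_result = sum_squares_loop(values)
numpy_result = sum_squares_numpy(values)
```

Timing the two (for example with `timeit`) on larger arrays is a simple way to see the gap on your own machine.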

cosmic lynx May 3, 2023, 8:56 PM

#

Okay thanks

somber panther May 4, 2023, 1:37 AM

#

so I'm just starting with pandas and have encountered some discrepancies between what the tutor is using and how the docs propose elements are extracted. Specifically, the tutor will grab a "cell" (my wording) and wrap it in int() to achieve element extraction, but the docs would have you `x_value = row_data.loc[row_data.index[0], 'x']` vs `x_value = int(row_data.x)`
and I'm getting the sense that using the tutor's approach is overlooking an important aspect of data manipulation

#

My question, then, is there a right and wrong way? why?

grand mason May 4, 2023, 2:03 AM

#

cosmic lynx If python has a hard time with computationally demanding tasks, why is it popula...

One reason is the abundance of data science/ai libraries

#

Have you heard about the programming language Mojo? It's a superset of python, meant for high performance AI operations. I think it came out yesterday

#

I said came out, but it was just announced as a new project :D

agile cobalt May 4, 2023, 2:09 AM

#

somber panther so I'm just starting with pandas and have encountered some discrepencies between...

which "tutor" specifically are you referring to?
some website/tutorial or a real person that is teaching you?

somber panther May 4, 2023, 2:09 AM

#

its the 100days course, angela yu

agile cobalt May 4, 2023, 2:10 AM

#

haven't seen it myself but yeah, without more context, the way they're doing it doesn't makes much sense

the two pieces of code you showed are for completely different purposes though

#

in `x_value = row_data.loc[row_data.index[0], 'x']`, row_data shouldn't be an actual row (pandas.Series), but rather a data frame (pandas.DataFrame)

somber panther May 4, 2023, 2:11 AM

#

its for used in a dataframe state, x coor, y coor

agile cobalt May 4, 2023, 2:12 AM

#

in x_value = int(row_data.x), row data should be a series, and you're taking the x value out of it, then converting it from a pandas/numpy float or integer to a python integer

somber panther May 4, 2023, 2:12 AM

#

my code for row data is row_data = state_coors[state_coors["state"] == guess]

agile cobalt May 4, 2023, 2:12 AM

#

somber panther my code for row data is row_data = state_coors[state_coors["state"] == guess]

if state coors is a dataframe, then the second code should throw an error

#

!e ```py
import pandas as pd
df = pd.DataFrame({'state': [1, 2, 3]})
rows = df[df['state'] == 2]
print(rows.shape, rows, type(rows), sep='\n')

somber panther May 4, 2023, 2:13 AM

#

she may have converted it to a series, i haven't watched the video since yesterday

arctic wedgeBOT May 4, 2023, 2:13 AM

#

@agile cobalt :white_check_mark: Your 3.11 eval job has completed with return code 0.

001 | (1, 1)
002 |    state
003 | 1      2
004 | <class 'pandas.core.frame.DataFrame'>

somber panther May 4, 2023, 2:13 AM

#

maybe i need to be more specific

#

her method of extracting an int was just completely different than what i got from docs

agile cobalt May 4, 2023, 2:14 AM

#

there are a bunch of different ways to do more or less same thing tbf

#

like, these two would give you the same result:
```py
df.iloc[0, 0]
df.loc[df.index[0], df.columns[0]]
```

in case df = pd.DataFrame({'A': [10]})

```py
df.loc[0, 'A']
```

somber panther May 4, 2023, 2:15 AM

#

yeah no doubt I realize im kind of splitting hairs

agile cobalt May 4, 2023, 2:16 AM

#

the biggest difference would be the type of the number you're getting out of that though

somber panther May 4, 2023, 2:16 AM

#

I guess i'm concerned that im not seeing something core to data analysis with python

#

they're both ints, here's my complete code

#

```py
import turtle
import pandas as pd

screen = turtle.Screen()
screen.title("State Test")
image = "blank_states_img.gif"
screen.addshape(image)
turtle.shape(image)
state_coors = pd.read_csv("50_states.csv")
state_names = state_coors["state"].tolist()
print(state_names)


def get_mouse_click_coor(x, y):
    print(x, y)


def user_answer():
    return screen.textinput(title="Guess the states", prompt="Guess a state")
while True:
    guess = user_answer()
    if guess in state_names:
        row_data = state_coors[state_coors["state"] == guess]
        x_value = row_data.loc[row_data.index[0], 'x']
        y_value = row_data.loc[row_data.index[0], 'y']
        print("x_value:", x_value)
        print("y_value:", y_value)
        state_pointer = turtle.Turtle()
        state_pointer.shape("circle")
        state_pointer.color("black")
        state_pointer.penup()
        state_pointer.goto(x_value,y_value)

turtle.mainloop()
```

#

is the difference that 1 is numpy int and the other is core python?

agile cobalt May 4, 2023, 2:18 AM

#

pretty much
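To make the numpy-versus-Python distinction concrete, here is a minimal sketch. The two-row table and its coordinates are made up for illustration (the real 50_states.csv is not shown in this chat), and `x_np`, `x_py` and `x_item` are just illustrative names.

```py
import pandas as pd

# Hypothetical stand-in for the CSV: same column names, invented values.
state_coors = pd.DataFrame(
    {"state": ["Ohio", "Texas"], "x": [176, -38], "y": [52, -106]}
)

# Boolean filtering keeps the result a DataFrame, even with a single match.
row_data = state_coors[state_coors["state"] == "Ohio"]

# .loc with a row label and a column label returns a numpy scalar.
x_np = row_data.loc[row_data.index[0], "x"]

# Two ways to get a built-in Python int out of the same cell.
x_py = int(row_data["x"].iloc[0])
x_item = row_data["x"].item()  # only valid when exactly one element is present

print(type(row_data).__name__)
print(type(x_np), type(x_py), type(x_item))
```

The numpy scalar behaves like an int in arithmetic and comparisons, so converting is only needed when a downstream library insists on built-in Python numbers.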

somber panther May 4, 2023, 2:19 AM

#

would that ever matter?

agile cobalt May 4, 2023, 2:19 AM

#

turtle might complain if you pass a numpy number to it, not sure

somber panther May 4, 2023, 2:19 AM

#

nah it worked fine

#

from my code?

agile cobalt May 4, 2023, 2:22 AM

#

eh, never mind

#

I was trying to demonstrate something but that something shouldn't really matter to you

somber panther May 4, 2023, 2:22 AM

#

I don't want to waste your time with something that's unimportant 99.9% of the time

#

will be helpful to keep in the back of my head, ty

agile cobalt May 4, 2023, 2:24 AM

#

back to the topic... as long as it works, you can just assume that it is a stylistic choice from the author

#

oftentimes you'll see like 5+ methods to do the same thing

#

and as for which one is the "right" way, usually you should stick with the official documentation

somber panther May 4, 2023, 2:25 AM

#

yeah i get it, just growing pains, it took me 3 hours to get to the same outcome as row_data.x

dusty bay May 4, 2023, 2:50 AM

#

Hi, I'm a novice programmer. I want to display a dataframe coming from a csv file using an object oriented programming method. Here I show the code.
"""

#

```py
import pandas as pd
import matplotlib.pyplot as plt

class csv2df():

    def __init__(self):
        df = pd.read_csv("RMS level.csv")
```

#

Explanation please. Thank You
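One possible shape for this, as a sketch only: the class name `CsvFrame`, the `show` method and the `time`/`rms` columns are invented for illustration, and an in-memory string stands in for "RMS level.csv" so the example runs anywhere. The key point is that the DataFrame has to be stored on `self`; a plain local variable inside `__init__` disappears when the method returns.

```py
import io

import pandas as pd


class CsvFrame:
    """Loads a CSV once and keeps the resulting DataFrame on the instance."""

    def __init__(self, source):
        # Store the frame as an attribute so other methods can reach it.
        self.df = pd.read_csv(source)

    def show(self, n=5):
        """Print and return the first n rows."""
        preview = self.df.head(n)
        print(preview)
        return preview


# Hypothetical data standing in for the real file; pass a file path instead.
sample = io.StringIO("time,rms\n0,0.12\n1,0.15\n2,0.11\n")
viewer = CsvFrame(sample)
preview = viewer.show()
```

With a real file the call would be `CsvFrame("RMS level.csv").show()`.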

robust stratus May 4, 2023, 3:27 AM

#

Is this a day in the life of a data scientist? They collect data and create a chart for it?

serene scaffold May 4, 2023, 3:36 AM

#

robust stratus Is this a day in the life of a data scientist? They collect data and create a ch...

that would be the easiest day in the life of a data scientist.

robust stratus May 4, 2023, 3:46 AM

#

serene scaffold that would be the easiest day in the life of a data scientist.

Are you a Data Scientist?

serene scaffold May 4, 2023, 3:59 AM

#

robust stratus Are you a Data Scientist?

I'm a computational linguist.

somber panther May 4, 2023, 4:36 AM

#

robust stratus Is this a day in the life of a data scientist? They collect data and create a ch...

I think you would probably want to change "data" to "science" to better reflect the complexity of the occupation

#

or title the chart with "Science" perhaps

golden vapor May 4, 2023, 5:01 AM

#

skills required to become a data scientist

somber panther May 4, 2023, 5:10 AM

#

golden vapor skills required to become a data scientist

Is that a question? I think advanced statistics is one requirement, calculus maybe

somber panther May 4, 2023, 5:13 AM

#

golden vapor skills required to become a data scientist

Realistically though, I have heard that there's not much standardization, there's a lot of cross-over between data analyst, product analyst and data scientist

golden vapor May 4, 2023, 5:17 AM

#

somber panther Is that a question? I think advanced statistics is one requirement, calculus maybe

no like i meant programming langs

somber panther May 4, 2023, 5:17 AM

#

python or r, and sql

golden vapor May 4, 2023, 5:18 AM

#

alright ty

jade plaza May 4, 2023, 5:41 AM

#

I wanted to run a question by a few... I have an opportunity to buy a friends computer for ML/data analytics work...

AMD Threadripper 3970x
MSI RTX 3080 10G
128GB (4x 32GB) DDR4 3200MHz
Gigabyte TRX40 Motherboard
970 Evo Plus NVMe 500GB
970 Evo NVMe 500GB
Corsair HX1200 PSU
Noctua NH-U14S

For $2350 USD -- how much value is in the 3970X now? Only concern for the above is locked into TRX40 which won't work with zen3 etc. PC part picker of the above tells me its $5700 but not sure how accurate that is.

faint mist May 4, 2023, 6:56 AM

#

Not sure if this is a good deal given it is a used condition

jade plaza May 4, 2023, 7:00 AM

#

Fair call

#

One factor is that I'm located in the south pacific on an island called Australia.

cloud marsh May 4, 2023, 8:48 AM

#

how can i find out why pip refuses to install a package? i'm seeing the result when i run pip_search tensorflow-rocm but i can't run pip install tensorflow-rocm it shows 'no matching distribution'
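A common cause of "no matching distribution" is that none of the files published for the package match the running interpreter or platform. A small standard-library sketch for printing what pip has to match against (the function name `interpreter_summary` is made up here):

```py
import platform
import sys
import sysconfig


def interpreter_summary():
    """Collect the interpreter details that wheel filenames are matched against."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "implementation": platform.python_implementation(),
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
    }


summary = interpreter_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

Compare the output with the wheel filenames listed on the package's PyPI download page. Running pip with extra verbosity (for example `pip install -vv tensorflow-rocm`) may also show which candidate files were skipped and why.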

cold osprey May 4, 2023, 8:55 AM

#

i think it may be a python version thing

#

https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/308

GitHub

Installing tensorflow-rocm with pip / Python 3.7 installs an empty ...

Using Python 3.7, the command: $ pip3 install --user tensorflow-rocm attempts to download and install the following file: https://files.pythonhosted.org/packages/d6/b5/f14d48711d276f2391c944ecda0bc...

muted crypt May 4, 2023, 8:55 AM

#

yes, I had a similar issue

cold osprey May 4, 2023, 8:55 AM

#

https://pypi.org/project/tensorflow-rocm/

PyPI

tensorflow-rocm

TensorFlow is an open source machine learning framework for everyone.

#

hmm but seems like it does support 3.8++

muted crypt May 4, 2023, 8:57 AM

#

Does anyone have an idea of why the predicted results look like this? They seem to be flipped along the horizontal axis as well as scaled down? (I've used LSTM)

cyan magnet May 4, 2023, 9:13 AM

#

i m having this error can anyone help me to resolve this i googled about it and asked gpt4 and got to know it cuz of version mismatch of PyQt and opencv but idk which is the suitable version for opencv and PyQt

#

#

does anyone have a solution to this problem?

cold osprey May 4, 2023, 10:28 AM

#

how many of a, b, c (s) are there?

#

is it 15? or are there 15 unique combinations of this

#

some are both a and b?

#

so there could be multiple procedures applied to the same patient(row)?

#

something like this seems good

#

if u have 1 million rows, should be fine to have 15 more cols

#

another way is to one hot encode the combination of procedures, which will be more than 15

#

Yeah don't think there ie

cyan magnet May 4, 2023, 11:11 AM

#

cyan magnet i m having this error can anyone help me to resolve this i googled about it and ...

@barren otter@agile hearth help

cold osprey May 4, 2023, 11:33 AM

#

Anyone got any resources on how to make dashboards look better? 😅

#

Design, colours etc

tall tulip May 4, 2023, 11:39 AM

#

After taking the difference in timeseries data I got negative values, can change it to positive?

boreal gale May 4, 2023, 11:51 AM

#

tall tulip After taking the difference in timeseries data I got negative values, can change...

while we can of course tell you how to make it positive from a data manipulation perspective, i think it would be beneficial to know why do you want to change it to positive as that might not be the correct thing to do in the grand scheme of things (grand scheme of things as in the overall goal of completing your coursework/delivering value at work).

tall tulip May 4, 2023, 11:54 AM

#

boreal gale while we can of course tell you how to make it positive from a data manipulation...

I take the diff to make the data stationary, and I resample the data to hourly from 5 min timestamps. Now I'm just asking, should I change it to positive or not?

boreal gale May 4, 2023, 11:55 AM

#

ah, i obviously lack caffeine in my body to read properly, heh.

no, you shouldn't make it positive.
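A quick way to see why the sign has to stay, sketched with invented numbers: the differences are what let you rebuild the original series, and taking absolute values destroys that.

```py
from itertools import accumulate

# Made-up hourly values; any series with a drop in it shows the same effect.
series = [10.0, 12.0, 11.0, 15.0, 14.0]

# First-order difference: negative entries mark the hours where the value fell.
diffs = [later - earlier for earlier, later in zip(series, series[1:])]

# Cumulative sum of the signed differences recovers the original series.
restored = list(accumulate(diffs, initial=series[0]))

# Doing the same with absolute differences gives a different, ever-rising series.
broken = list(accumulate((abs(d) for d in diffs), initial=series[0]))

print("diffs:   ", diffs)
print("restored:", restored)
print("broken:  ", broken)
```

Any forecast made on the differenced scale has to be inverted the same way, so the negative values carry real information.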

tall tulip May 4, 2023, 11:58 AM

#

and 1 more question: when I run the KPSS and ADF tests to check whether the data is stationary, KPSS says the data is stationary and ADF says it is not. After taking the diff, KPSS says the data is not stationary and ADF says it is.

#

Like both tests give opposite answers @boreal gale

cold osprey May 4, 2023, 11:58 AM

#

u malaysian?

tall tulip May 4, 2023, 11:59 AM

#

cold osprey u malaysian?

Me??

cold osprey May 4, 2023, 11:59 AM

#

tall tulip Me??

yeye

tall tulip May 4, 2023, 11:59 AM

#

Nope

cold osprey May 4, 2023, 11:59 AM

#

where u from

tall tulip May 4, 2023, 11:59 AM

#

Pakistan

cold osprey May 4, 2023, 11:59 AM

#

ah

tall tulip May 4, 2023, 11:59 AM

#

Yep

boreal gale May 4, 2023, 12:01 PM

#

so to recap you have only taken the first order difference, and performed ADF and KPSS test on the differenced time series?

have you looked into whether you have trend in your data?

earnest widget May 4, 2023, 12:11 PM

#

Has anyone had any experience with image adaptive GAN reconstruction?

lavish lagoon May 4, 2023, 12:12 PM

#

Does anyone have any experience with image enhancement using cv? I have a project including ocr and the video quality is poor and needs enhancement and I have tried everything I could find and I am really lost

tall tulip May 4, 2023, 12:16 PM

#

boreal gale so to recap you have only taken the first order difference, and performed ADF an...

Yes data have daily seasonality, and also increasing trend. Yes I've take only first order difference but it makes data stationary after first order difference

robust stratus May 4, 2023, 12:17 PM

#

somber panther I think you would probably want to change "data" to "science" to better reflect ...

What are some examples of Data Science work?

#

Do they predict the sales of products for a company or something?

boreal gale May 4, 2023, 12:18 PM

#

tall tulip Yes data have daily seasonality, and also increasing trend. Yes I've take only f...

you might want to de-trend it first? then ADF and KPSS should hopefully agree with each other
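Some background on why the two tests can disagree: ADF's null hypothesis is a unit root (non-stationary), while KPSS's null hypothesis is stationarity, so "KPSS says stationary, ADF says not" is often read as a trend-stationary series. The actual data isn't shown here, so that reading is only a guess. A minimal sketch of removing a linear trend with numpy, on a synthetic series with a made-up slope and a 24-step cycle:

```py
import numpy as np


def remove_linear_trend(values):
    """Subtract the least-squares straight line from a 1-D series."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, deg=1)
    return values - (slope * t + intercept)


# Synthetic hourly data: upward trend plus a daily cycle (values are invented).
t = np.arange(48)
series = 0.5 * t + 3.0 + np.sin(2 * np.pi * t / 24)

detrended = remove_linear_trend(series)
leftover_slope = np.polyfit(t, detrended, deg=1)[0]
print(f"slope left after detrending: {leftover_slope:.2e}")
```

The detrended series still contains the daily seasonality, which is what the seasonal part of a SARIMA model would be meant to handle.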

tall tulip May 4, 2023, 12:19 PM

#

boreal gale you might want to de-trend it first? then ADF and KPSS should hopefully agree wi...

I want to make the data stationary, so I will check the ARIMA model performance on our data

#

ARIMA or SARIMA

somber panther May 4, 2023, 12:25 PM

#

robust stratus What are some examples of Data Science work?

I couldn't say but i'm sure you could google some examples of data science

boreal gale May 4, 2023, 12:33 PM

#

robust stratus What are some examples of Data Science work?

- data collection: collect data from various sources, including databases, APIs, and the web.
- data cleaning: raw data is often noisy and incomplete - part of the job is to preprocess and clean the data to remove outliers, fill in missing values, and standardise data formats.
- exploratory data analysis (EDA): using visualisation tools and stats to explore and summarise the data, identify patterns (in a primitive way(?)), and detect outliers.
- feature engineering: creating new features from the existing data in order to enhance the predictive power of models.
- model development: use machine learning algorithms/traditional stats models to build predictive models that can make somewhat accurate predictions on unseen new data.
- model evaluation: test and evaluate the performance of models to ensure their accuracy (among other evaluation metrics) and effectiveness.
- deployment: data scientists deploy the model into production environments to make data-driven decisions.
- monitoring: data scientists monitor the model's performance over time, detect any anomalies or changes in data patterns (e.g. population drift), and update the model accordingly.

depending on the organisation you work for, your focus might vary among these 8 tasks.
i assume most DS won't be collecting data/deploying their own model.

also, people always say cleaning the data takes 80% of your time, and making models takes 20% of your time.
i personally think this is just a meme, it depends on your org and your task really.

muted crypt May 4, 2023, 12:43 PM

#

Hey @boreal gale do you mind if I ask a question about that problem that you helped me solve to find the error between two trajectories where you used DTW?

boreal gale May 4, 2023, 12:45 PM

#

sure, what's up?

muted crypt May 4, 2023, 12:47 PM

#

boreal gale sure, what's up?

if you remember, we had the trajectories and one of them was shifted in time so dtw helped to find the closest distance. But is there a way to find the most optimal time shift that minimizes the error?

#

Referring to this, as the real trajectory (red) seems to be shifted to the left and it would be nice to find what's the most ideal shift so both of them can be considered to be happening at the same time

#

In this example is quite clear but there are other cases where it's not so easy to tell where to shift one of the sequences

boreal gale May 4, 2023, 12:51 PM

#

But is there a way to find the most optimal time shift that minimizes the error?
just so we are on the same page, what is time shift?
is it a constant shift in time? or a dynamic shift in time?

because DTW assumes there could be a dynamic shift in time, as in time 1 in series A could be matched to time 1+4 in series B and time 2 in A could be matched to time 2+10 in B (the +4 and +10 there is not a constant)
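A small pure-Python sketch of what "dynamic shift" means. This is the textbook dynamic-programming DTW on 1-D sequences, not necessarily the variant or library used earlier in this conversation, and the two sequences are invented:

```py
def dtw_path(a, b):
    """Return (total cost, warping path) for two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Walk back from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


series_a = [0, 1, 2, 3, 2, 1, 0]
series_b = [0, 0, 1, 2, 3, 3, 3, 2, 1, 0]  # same shape, stretched in places

distance, path = dtw_path(series_a, series_b)
offsets = sorted({j - i for i, j in path})
print("distance:", distance)
print("offsets used along the path:", offsets)
```

The set of offsets has more than one value, which is the point: DTW lets the time offset change along the sequence instead of fixing one shift for the whole series.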

muted crypt May 4, 2023, 12:53 PM

#

boreal gale > But is there a way to find the most optimal time shift that minimizes the err...

just a constant time, or at least that's my approach to what I think is a good way of representing them on the same time axis and considering the best alignment between both of them

boreal gale May 4, 2023, 12:54 PM

#

oh! if it's constant time, then i don't think DTW is a right approach. i need to re-read the paper to fully confirm this though...
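For a single constant shift, one simple alternative is a brute-force search over integer lags, keeping the lag with the lowest mean squared error on the overlapping part. This is only a sketch under assumptions: both series are 1-D, sampled at the same fixed rate, and the function name `best_constant_shift` and the test signal are made up.

```py
import numpy as np


def best_constant_shift(intended, real, max_lag):
    """Find the lag (in samples) so that real[t + lag] best matches intended[t]."""
    intended = np.asarray(intended, dtype=float)
    real = np.asarray(real, dtype=float)
    best_lag, best_err = 0, np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = intended[: len(intended) - lag], real[lag:]
        else:
            a, b = intended[-lag:], real[: len(real) + lag]
        n = min(len(a), len(b))
        err = float(np.mean((a[:n] - b[:n]) ** 2))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err


# Synthetic check: the "real" series is the intended one delayed by 7 samples.
signal = np.sin(np.linspace(0, 4 * np.pi, 220))
intended = signal[10:210]
real = signal[3:203]

lag, err = best_constant_shift(intended, real, max_lag=20)
print(f"best lag: {lag} samples, error at that lag: {err:.2e}")
```

A positive lag means the real series arrives later than the intended one. For a 3D trajectory the same search could be run on the summed error over all coordinates.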

muted crypt May 4, 2023, 12:55 PM

#

boreal gale oh! if it's constant time, then i don't think DTW is a right approach. i need to...

because what I am doing is considering these trajectories as time series and then making a model that, given the intended trajectory as input, can sort of predict what the real trajectory will be, based on what it has learned from the data. Do you think that's possible?

boreal gale May 4, 2023, 12:58 PM

#

i see, you are making a "real trajectory synthesiser"

give the thing the intended trajectory, and you can expect a simulated "real trajectory" from it

i don't see anything that screams "no" to me here - but there is a very real possibility that this is one of those "unknown unknown" thing to me, i don't know what could fail until i attempt it

muted crypt May 4, 2023, 1:01 PM

#

boreal gale i see, you are making a "real trajectory synthesiser" give the thing the intend...

fair! I've been trying to use neural networks. Like LSTM for now but the results aren't as I was expecting :(

#

I mean it's possible as there are a lot of papers doing so but I can't get my head around how do they do it

#

The best I could do is something like this (green is the prediction, orange is the real truth, and blue is the intended)

#

(of course the prediction matches closely the intended trajectory but the problem is time once again)

boreal gale May 4, 2023, 1:04 PM

#

from the description of your task, LSTM sounds sensible as a component in your NN.
also there is this concept of GAN, i wonder if you can incorporate it into your NN somehow

(it's important to note my knowledge to NN is almost purely academic, i haven't done any "real" NN development)

boreal gale May 4, 2023, 1:06 PM

#

muted crypt The best I could do is something like this (green is the prediction, orange is t...

if i just look at this by itself, i would argue your NN hasn't learned enough to give you a decent predicted real trajectory 🤔

#

are you feeding it intended trajectory as features and the unshifted real trajectory you have observed as targets?

muted crypt May 4, 2023, 1:06 PM

#

I'll take a look at it! I mean i have this as a thesis to deliver soon and I'm not a computer scientist or something so my skills are quite low (that's why you see me asking all time here! thanks for saving me!) and having to learn about NN is just tough

muted crypt May 4, 2023, 1:09 PM

#

boreal gale are you feeding it intended trajectory as features and the unshifted real trajec...

so what i've tried here is so wrong but I was desperate to see something happen. This has just been trained with some concatenated latitude-positions from different flights over time. That's it, just a column and the time stamp. But the thing is that I can't seem to train it with the real data as then it has to become an input which I don't have for future intended flights

#

it's either that or I have understood that wrong. As far as I know they usually take previous information to predict the future, but if I feed it all the data, then it asks me for the input to be the same size and I don't have things such as the error of a trajectory that hasn't been flown

boreal gale May 4, 2023, 1:11 PM

#

since you have a real world understanding of your problem, you can potentially think about why the real trajectory is different to the intended trajectory, then come up with features that correspond to the underlying cause of deviation to intended trajectory. and feed those into the NN

e.g. is it about to turn a super tight corner? is it at max acceleration? i don't actually know the physics of drones so this is just me guessing.

muted crypt May 4, 2023, 1:14 PM

#

boreal gale since you have a real world understanding of your problem, you can potentially t...

it's mainly that the drone doesn't follow exactly the intended trajectory in terms of time. Say that it has to reach a turn at time=10 but the drone arrives at time=12, then somehow the drone speeds up and gets to the next point at time=20 instead of the intended time=18. This leads to the most error. Of course, in the turns the error is higher, as we could see from the dtw thing that you did! i marked the highest error points in the trajectory and these were the turns

#

these drones not perfectly adapting to the time marks of the intended path makes the trajectories very different and I thought that shifting them would make sense


boreal gale May 4, 2023, 1:17 PM

#

muted crypt so what i've tried here is so wrong but I was desperate to see something happen....

oh, you might want to normalise your input first, as i think having the raw lat lng will impede the NN from learning.

instead of latlng pairs like (62,15), (63,16)
you might instead want (0,0), (1,1)

because that path is virtually the same as (100,100),(101,101) (which normalises to the same (0,0), (1,1) pair), but your NN won't see it that way unless you normalise

i know it's not actually the same, because of the way that lat lng works, but it's close enough. also, >90 for both of them is obviously wrong lol

muted crypt May 4, 2023, 1:17 PM

#

(in 3D they look veeery similar though as we ignore the time variable)

robust stratus May 4, 2023, 1:18 PM

#

boreal gale - data collection: collect data from various sources, including databases, APIs,...

Nice explanation. I guess data scientists were useful for predicting the rise in COVID-19 cases.

muted crypt May 4, 2023, 1:18 PM

#

boreal gale oh, you might want to normalise your input first, as i think having the raw lat ...

I've done something like that with a Scaler, which puts it between 0 and 1 but I suppose it is not the same, right?

boreal gale May 4, 2023, 1:18 PM

#

muted crypt I've done something like that with a Scaler, which puts it between 0 and 1 but I...

not quite the same

muted crypt May 4, 2023, 1:19 PM

#

boreal gale oh, you might want to normalise your input first, as i think having the raw lat ...

what do you mean that the NN won't see it?

boreal gale May 4, 2023, 1:20 PM

#

"won't see it that way"

as in behaviour located at (62,15), (63,16) and (100,100),(101,101) could be treated differently

muted crypt May 4, 2023, 1:20 PM

#

doesn't normalizing make (62,15), (63,16) == (100,100),(101,101) ?

#

oh

boreal gale May 4, 2023, 1:20 PM

#

no it doesn't, if you are using scaler

#

consider if we only have those two "paths"
it (the scaler) would normalise to something like
(0,0),(0.1,0.1) and (0.99,0.99),(1,1)
which is not the same as treating both path as (0,0), (1,1)

muted crypt May 4, 2023, 1:21 PM

#

so (62,15), (63,16) becomes (0,0), (1,1). What does (100,100),(101,101) become?

boreal gale May 4, 2023, 1:22 PM

#

both should be (0,0), (1,1) imo

#

basically your input should be what's the trajectory relative to where you started, not the trajectory of actual absolute position on earth

muted crypt May 4, 2023, 1:23 PM

#

then it is like subtracting the initial point to the rest of the points. Otherwise all the trajectories would have so many points in common?

boreal gale May 4, 2023, 1:24 PM

#

then it is like subtracting the initial point to the rest of the points.
bingo

#

otherwise your NN could be learning different things for flights that happen in a different starting location

muted crypt May 4, 2023, 1:25 PM

#

but my flights all happen in the same area

boreal gale May 4, 2023, 1:26 PM

#

then that effect is somewhat reduced

muted crypt May 4, 2023, 1:26 PM

#

boreal gale > then it is like subtracting the initial point to the rest of the points. bing...

plotting this makes all flights trajectories' overlap

boreal gale May 4, 2023, 1:27 PM

#

muted crypt plotting this makes all flights trajectories' overlap

hmm? that's not a problem if we are just using that as training data for NN?

muted crypt May 4, 2023, 1:28 PM

#

#

muted crypt May 4, 2023, 1:29 PM

#

boreal gale hmm? that's not a problem if we are just using that as training data for NN?

You mean doing that?

boreal gale May 4, 2023, 1:31 PM

#

ermm.. i guess?

basically you want to take each trajectory - i assume it's 2D np array (call it path), and subtract the starting position

path_deltas = path - path[0]
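The two example paths from above, worked through as a sketch. The min-max step is written by hand here as a stand-in for whatever Scaler was actually used, which is an assumption.

```py
import numpy as np

path_a = np.array([[62.0, 15.0], [63.0, 16.0]])
path_b = np.array([[100.0, 100.0], [101.0, 101.0]])

# Relative coordinates: subtract each path's own starting point.
relative_a = path_a - path_a[0]
relative_b = path_b - path_b[0]

# Global min-max scaling: one minimum and maximum per column over all points.
all_points = np.vstack([path_a, path_b])
col_min, col_max = all_points.min(axis=0), all_points.max(axis=0)
scaled_a = (path_a - col_min) / (col_max - col_min)
scaled_b = (path_b - col_min) / (col_max - col_min)

print("relative:", relative_a.tolist(), relative_b.tolist())
print("scaled:  ", scaled_a.round(3).tolist(), scaled_b.round(3).tolist())
```

After subtracting the start, both paths are the same array, while the scaled versions still sit at opposite ends of the 0 to 1 range. The two steps can be combined: subtract the start first, then scale the deltas.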

muted crypt May 4, 2023, 1:32 PM

#

My assumption was that this could be treated as a time series

boreal gale May 4, 2023, 1:32 PM

#

oh

#

why though

#

each individual flights should be treated as one time series "observation" imo

young granite May 4, 2023, 1:34 PM

#

u want to forecast so u dont have to do new measurments?

muted crypt May 4, 2023, 1:35 PM

#

boreal gale each individual flights should be treated as one time series "observation" imo

yes makes sense now that you say it. Then the question is that we should also have the real trajectory in the same "observation" too, right?

boreal gale May 4, 2023, 1:36 PM

#

muted crypt yes makes sense now that you say it. Then the question is that we should also ha...

not sure what you mean here

muted crypt May 4, 2023, 1:36 PM

#

young granite u want to forecast so u dont have to do new measurments?

It's for drone development: having the intended path, we need to know what it will "really" look like when the drone flies it, so problems can be avoided

past meteor May 4, 2023, 1:37 PM

#

boreal gale - data collection: collect data from various sources, including databases, APIs,...

I've truly spent several months just collecting and cleaning data (clinical trial) pithink

muted crypt May 4, 2023, 1:37 PM

#

boreal gale not sure what you mean here

for the NN to learn, it kind of need to associate the intended trajectory to the real one so it learns what differs from one to another? like providing the input (intended) and output (real)

boreal gale May 4, 2023, 1:38 PM

#

past meteor I've truly spent several months just collecting and cleaning data (clinical tria...

heh, there are always exceptions 😉

young granite May 4, 2023, 1:38 PM

#

past meteor I've truly spent several months just collecting and cleaning data (clinical tria...

cleaning data is an iterative process 😄 (always coming back to it Q_Q)

past meteor May 4, 2023, 1:38 PM

#

I guess it depends on the domain but the 80/20 thing is probably true if you're in an applied science kind of domain where you're responsible for designing the study, collecting data, ...

young granite May 4, 2023, 1:39 PM

#

if u get data from someone else the struggle begins 😄

past meteor May 4, 2023, 1:39 PM

#

young granite cleaning data is an iterative process 😄 (always coming back to it Q_Q)

Yup, agree but for some reason we did the lump sum in the beginning

boreal gale May 4, 2023, 1:39 PM

#

young granite if u get data from someone else the struggle begins 😄

omg. that reminds me... dealing with census data always give me a headache 😡

young granite May 4, 2023, 1:39 PM

#

boreal gale omg. that reminds me... dealing with census data always give me a headache 😡

hahahaha

muted crypt May 4, 2023, 1:40 PM

#

boreal gale ermm.. i guess? basically you want to take each trajectory - i assume it's 2D n...

this leads something like that

past meteor May 4, 2023, 1:40 PM

#

Last time I dealt with census data the results were so bad I contacted our government and they said "oopsie we made a mistake, sending you the new dataset in a minute."

young granite May 4, 2023, 1:40 PM

#

government be like : shits on fire yo

past meteor May 4, 2023, 1:42 PM

#

The entire thing was a joke. We had a large cross-section of a significant % of the population. Yes they hashed some fields but I'm pretty sure if you tried you could de-anonymize most people. (We obviously did not because this is illegal.)

young granite May 4, 2023, 1:42 PM

#

past meteor The entire thing was a joke. We had a large cross-section of a significant % of ...

FBI open the door

muted crypt May 4, 2023, 1:42 PM

#

past meteor The entire thing was a joke. We had a large cross-section of a significant % of ...

don't be shy, share the dataset :)

past meteor May 4, 2023, 1:43 PM

#

muted crypt don't be shy, share the dataset :)

Nope. I deleted it after the work was done as you should ❤️

muted crypt May 4, 2023, 1:43 PM

#

that's what I wanted to hear, good job man

boreal gale May 4, 2023, 1:43 PM

#

muted crypt for the NN to learn, it kind of need to associate the intended trajectory to the...

OHH, i think i see the issue, i think you might be using LSTM slightly wrong 🤔

the more NN-literate folks can probably comment.

i think you are stitching paths like that because you aren't connecting hidden state of the LSTM units to output units?
i think it's possible to connect hidden state of the LSTM units to output units and have these output units predict your "real" path

past meteor May 4, 2023, 1:45 PM

#

What is your task @muted crypt ?

muted crypt May 4, 2023, 1:45 PM

#

this sounds good but I'd be lying if I said that i know how to do this

boreal gale May 4, 2023, 1:45 PM

#

in your input to LSTM there should never be real trajectory, the simulated real trajectory should be generated by these output units and compared against the real real trajectory.

muted crypt May 4, 2023, 1:46 PM

#

past meteor What is your task @muted crypt ?

I've been weeks with no progress :(

past meteor May 4, 2023, 1:46 PM

#

Can you explain what you want to do?

muted crypt May 4, 2023, 1:46 PM

#

boreal gale in your input to LSTM there should *never* be real trajectory, the simulated rea...

oh this changes so many things

#

I mean in the training dataset there has to be the real trajectory, marked as the label (y)? at least that's what I have

past meteor May 4, 2023, 1:48 PM

#

You want to predict the trajectory of an object through time given a starting position?

young granite May 4, 2023, 1:48 PM

#

so input should be ur expected drone trajectory and that output should be compared with ur real trajectory
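A sketch of how the training arrays could be laid out under that framing: one flight per row, intended trajectory only in the input, flown trajectory only in the target. The helper `pad_and_stack`, the zero padding with a mask, and the random flights are all assumptions for illustration, not something taken from the actual datasets.

```py
import numpy as np


def pad_and_stack(flights, pad_value=0.0):
    """Stack variable-length flights into (n_flights, max_steps, n_features)."""
    max_steps = max(len(flight) for flight in flights)
    n_features = flights[0].shape[1]
    batch = np.full((len(flights), max_steps, n_features), pad_value, dtype=float)
    mask = np.zeros((len(flights), max_steps), dtype=bool)
    for i, flight in enumerate(flights):
        batch[i, : len(flight)] = flight
        mask[i, : len(flight)] = True  # True marks real (non-padded) steps
    return batch, mask


rng = np.random.default_rng(1)

# Invented flights: each row is (lat, lon, alt, time), lengths differ per flight.
intended_flights = [rng.normal(size=(n, 4)) for n in (5, 8, 6)]
real_flights = [f + 0.1 * rng.normal(size=f.shape) for f in intended_flights]

X, mask = pad_and_stack(intended_flights)  # model input
y, _ = pad_and_stack(real_flights)         # training target only

print("X:", X.shape, "y:", y.shape, "real steps:", int(mask.sum()))
```

The mask is there so that a loss function can ignore the padded steps; whether that is needed depends on how the flights are resampled.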

muted crypt May 4, 2023, 1:48 PM

#

past meteor Can you explain what you want to do?

I have 2 datasets: one of drone intended trajectories and another of the real trajectories (already flown following the intended ones). From this, I have to develop a model that, provided an intended trajectory, predicts how the real one will be

boreal gale May 4, 2023, 1:48 PM

#

someone else can probably guide you better, since i only have academic experience with NN.
post your current attempt and potentially with some example data would really maximise traction.

anyhow, i gotta get back to the fun task of scraping data for now 👋

young granite May 4, 2023, 1:49 PM

#

boreal gale someone else can probably guide you better, since i only have academic experienc...

bs4 ftw

muted crypt May 4, 2023, 1:49 PM

#

past meteor You want to predict the trajectory of an object through time given a starting po...

given the whole intended trajectory

#

from the green one (intended path) as input, predict the red (real flown trajectory) one

young granite May 4, 2023, 1:50 PM

#

this is misleading

#

so predicted is real?

#

u only want to compare pred with real wont u?

muted crypt May 4, 2023, 1:50 PM

#

not compare

past meteor May 4, 2023, 1:51 PM

#

So you have 3 input sequences (intended) and your output needs to be 3 output sequences (flown)?

muted crypt May 4, 2023, 1:51 PM

#

literally provide the green plot and get the red one (or similar)

young granite May 4, 2023, 1:51 PM

#

ah lel

#

u didnt build a model just yet?

#

so its green(input) and red(target)

#

but u didnt do calc. just yet?

muted crypt May 4, 2023, 1:52 PM

#

past meteor So you have 3 input sequences (intended) and your output needs to be 3 output se...

Input essentially is a sequence of points (3D points and time stamp) and then the output should also be (3D points and time)

past meteor May 4, 2023, 1:52 PM

#

Do you expect the sequences to be independent of each other or not? E.g., the intended altitude has an effect on the flown longitude?

muted crypt May 4, 2023, 1:52 PM

#

young granite u didnt build a model just yet?

I've done a LSTM model, but it fails

muted crypt May 4, 2023, 1:53 PM

#

past meteor Do you expect the sequences to be independent of each other or not? E.g., the in...

ideally that would be good yeah!

young granite May 4, 2023, 1:53 PM

#

muted crypt ideally that would be good yeah!

did u check that?

muted crypt May 4, 2023, 1:53 PM

#

this is probably more illustrative


young granite May 4, 2023, 1:54 PM

#

muted crypt this is probably more illustrative

indeed 😄

muted crypt May 4, 2023, 1:54 PM

#

(being lat,long,alt)

young granite May 4, 2023, 1:54 PM

#

muted crypt this is probably more illustrative

here the fit is way better than in ur other example, is it the same?

muted crypt May 4, 2023, 1:55 PM

#

young granite did u check that?

I'd like to start without taking this into consideration, since it seems like an extra hard step, but I'd love to do that eventually

past meteor May 4, 2023, 1:55 PM

#

From my understanding you have a quite normal RNN set-up where you map intended -> flown

young granite May 4, 2023, 1:55 PM

#

u can even choose to give a 3d array into the NN

past meteor May 4, 2023, 1:55 PM

#

You should start by sanity checking your model - use a training set of size 1 and get 0 loss

muted crypt May 4, 2023, 1:56 PM

#

young granite here the fit is way better than in ur other example, is it the same?

yes, see that in the 2D we have time, while in 3D we don't. So in 3D we don't see the time shift (which is what makes the 2D plots look so different)

young granite May 4, 2023, 1:56 PM

#

muted crypt yes, see that in the 2D we have time, while in 3D we don't. So in 3D we don't se...

so drone is just slow 🗿 😄, jokes aside try what zestar suggested and maybe vary the input dimensions

muted crypt May 4, 2023, 1:56 PM

#

haha facts

cold osprey May 4, 2023, 1:57 PM

#

red just looks like its lagging behind green
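A quick way to check the lag idea, assuming the delay is roughly constant over the flight (which may not hold), is to shift one coordinate of the flown series against the intended one and see which shift gives the smallest error. `estimate_lag` is a hypothetical helper and the data below is synthetic:

```python
import numpy as np

def estimate_lag(intended, flown, max_lag):
    """Return the shift (in samples) of `flown` relative to `intended`
    that minimizes the mean squared error, searching 0..max_lag."""
    errors = []
    for lag in range(max_lag + 1):
        a = intended[: len(intended) - lag]   # intended[0 : T-lag]
        b = flown[lag:]                       # flown[lag : T]
        errors.append(np.mean((a - b) ** 2))
    return int(np.argmin(errors))

# Synthetic example: the "flown" altitude is the intended one delayed by 7 samples,
# holding the first value during the delay.
t = np.linspace(0.0, 10.0, 200)
intended_alt = np.sin(t)
true_lag = 7
flown_alt = np.concatenate(
    [np.full(true_lag, intended_alt[0]), intended_alt[:-true_lag]]
)

print(estimate_lag(intended_alt, flown_alt, max_lag=20))
```

If a single shift explains most of the error, a plain delay is a reasonable baseline to compare any learned model against.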

muted crypt May 4, 2023, 1:57 PM

#

past meteor You should start by sanity checking your model - use a training set of size 1 an...

can you elaborate more on that?

cold osprey May 4, 2023, 1:57 PM

#

and theres some kind of 'startup' time

past meteor May 4, 2023, 1:57 PM

#

For whatever DNN I'm making I always try to overfit a single example as proof that my architecture is bug-free
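A minimal sketch of that check, assuming PyTorch and a plain LSTM (the framework, the `Seq2Seq` class, the layer sizes and the synthetic "flight" below are all assumptions for illustration, not the actual model in question):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical single flight: T points with 4 normalized features
# (time, lat, lon, alt). The "flown" target is a synthetic delayed copy.
T = 50
t = torch.linspace(0.0, 1.0, T)
intended = torch.stack(
    [t, torch.sin(6 * t), torch.cos(6 * t), t ** 2], dim=1
).unsqueeze(0)                                   # (1, T, 4)
flown = torch.roll(intended, shifts=3, dims=1)   # crude stand-in for a delay
flown[:, :3] = intended[:, :1]                   # hold the first point during the delay

class Seq2Seq(nn.Module):
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x):
        out, _ = self.lstm(x)     # (batch, T, hidden)
        return self.head(out)     # (batch, T, n_features)

model = Seq2Seq()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Train on the single example only; the loss should keep dropping toward 0.
losses = []
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(intended), flown)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

If the loss plateaus well above zero on one example, the usual suspects are unnormalized inputs, mismatched input/target shapes, or a learning rate that is too small or too large.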

muted crypt May 4, 2023, 1:57 PM

#

cold osprey red just looks like its lagging behind green

yes but not always

muted crypt May 4, 2023, 1:58 PM

#

past meteor For whatever DNN I'm making I always try to overfit a single example as proof th...

so a single flight that would be?

past meteor May 4, 2023, 1:58 PM

#

Yes

young granite May 4, 2023, 1:58 PM

#

for a POC yes

cold osprey May 4, 2023, 1:59 PM

#

so the model kinda represents the quirks of this particular drone flying

#

certain delays, route simplification etc it may have

muted crypt May 4, 2023, 1:59 PM

#

past meteor Yes

I've done that

past meteor May 4, 2023, 1:59 PM

#

muted crypt I've done that

What is blue and what is yellow?

muted crypt May 4, 2023, 1:59 PM

#

cold osprey so the model kinda represents the quirks of this particular drone flying

that's exactly the idea!

muted crypt May 4, 2023, 2:00 PM

#

past meteor What is blue and what is yellow?

yellow is the real (true) trajectory and blue is what the model predicted on the training flight

young granite May 4, 2023, 2:00 PM

#

reverse engineering the algo in the drone? 😄

cold osprey May 4, 2023, 2:00 PM

#

young granite reverse engineering the algo in the drone? 😄

haha thats what im thinking also

past meteor May 4, 2023, 2:00 PM

#

muted crypt yellow is the real (true) and the blue what it has predicted from the train

Did you train it on just one flight?

muted crypt May 4, 2023, 2:00 PM

#

yes

past meteor May 4, 2023, 2:00 PM

#

If you're not hitting 0 loss on 1 flight there's something wrong