#data-science-and-ml | Python | Page 417

young plume Jul 3, 2022, 10:54 PM

#

So how can i use the implementation to create a RL neural network?

serene scaffold Jul 3, 2022, 10:55 PM

#

Do you know the difference between supervised and unsupervised learning?

young plume Jul 3, 2022, 10:56 PM

#

I only know how to give it a set of inputs and have it predict an output from a different input. But I would like it to learn from its inputs by itself, so unsupervised, I guess

#

Not too familiar with the terms, just the code

serene scaffold Jul 3, 2022, 10:56 PM

#

Reinforcement learning is actually separate from either of those

young plume Jul 3, 2022, 10:57 PM

#

Ah

serene scaffold Jul 3, 2022, 10:57 PM

#

young plume Not too familiar with the terms, just the code

You have to know the terms.

young plume Jul 3, 2022, 10:57 PM

#

I am learning now lol

serene scaffold Jul 3, 2022, 10:58 PM

#

Right. Anyway, the first neural network one usually learns is a feed forward neural network

#

Usually for a supervised classification task

young plume Jul 3, 2022, 11:00 PM

#

Now how can i convert one to a RL network?

#

Phone die? :(

serene scaffold Jul 3, 2022, 11:02 PM

#

No. I'm playing Wii with my sisters

#

It was my turn

young plume Jul 3, 2022, 11:02 PM

#

Oh okie

serene scaffold Jul 3, 2022, 11:03 PM

#

To flap like a chicken

young plume Jul 3, 2022, 11:03 PM

#

You really remind me of my brother lol.

serene scaffold Jul 3, 2022, 11:03 PM

#

Is he hot?

young plume Jul 3, 2022, 11:03 PM

#

Hes my brother

serene scaffold Jul 3, 2022, 11:04 PM

#

Anyway, I don't think you can just use a feed forward neural network for RL. I can't think of how that would work

iron basalt Jul 3, 2022, 11:04 PM

#

young plume Now how can i convert one to a RL network?

You can do RL without a neural network, the neural network serves a specific purpose to expand the capabilities of the RL algorithm beyond toy examples (neural networks do this for a lot of algorithms, not just RL). So you need to first learn RL.
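To make "RL without a neural network" concrete, here is a minimal sketch of tabular Q-learning on a made-up five-cell corridor. The environment, the constant names, and the hyperparameters are assumptions chosen for illustration; they do not come from the conversation or from the book mentioned later.

```python
import random

N_STATES = 5          # cells 0..4 on a line; cell 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    """Move one cell; reaching the goal pays 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values with the Q-learning update rule."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so a purely random behaviour policy is enough here.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q_table = train()
# Greedy action for every non-goal cell.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

The table `q_table` plays the role a neural network would take over once the state space is too large to enumerate, which is the point made above.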

young plume Jul 3, 2022, 11:04 PM

#

Ok

#

Ill get to that

serene scaffold Jul 3, 2022, 11:05 PM

#

I agree. I would first think of something you can teach an agent to do with RL, and then implement it.

iron basalt Jul 3, 2022, 11:05 PM

#

For that, there is the classic book, written by those that invented it.

serene scaffold Jul 3, 2022, 11:05 PM

#

Woah

#

What book is it

iron basalt Jul 3, 2022, 11:05 PM

#

Finding link.

young plume Jul 3, 2022, 11:06 PM

#

I miss my brother lol

iron basalt Jul 3, 2022, 11:06 PM

#

https://www.amazon.com/Reinforcement-Learning-Introduction-Adaptive-Computation/dp/0262039249/ref=sr_1_1?keywords=sutton+barto+reinforcement+learning&qid=1656889564&sprefix=sutton+barto%2Caps%2C79&sr=8-1


Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series)

#

(They took RL from psychology and made it mathematical)

#

(it's the classic goto book)

young plume Jul 3, 2022, 11:08 PM

#

Kind of a bit out of my price range, so ill set aside a fund. But thank you for informing me of it. Hopefully some freelance work can get me enough

#

Trying to keep a good amount in the bank for interest

iron basalt Jul 3, 2022, 11:09 PM

#

https://en.wikipedia.org/wiki/Richard_S._Sutton


#

"Sutton is considered one of the founders of modern computational reinforcement learning,[1] having several significant contributions to the field, including temporal difference learning and policy gradient methods. "

#

(Btw it also covers a bit of neuroscience of how actual neurons do it, and they can do several things that DL can't that make them way better at it, it's in one of the last chapters, I highly recommend not skipping that part)

tidal bough Jul 3, 2022, 11:12 PM

#

young plume Kind of a bit out of my price range, so ill set aside a fund. But thank you for ...

I think this book is available for free from the official site or something

#

seems so: http://incompleteideas.net/book/the-book.html

iron basalt Jul 3, 2022, 11:13 PM

#

Yeah you can probably find a PDF.

young plume Jul 3, 2022, 11:13 PM

#

Okie

tidal bough Jul 3, 2022, 11:13 PM

#

oh, you can always find a pdf of any book, I mean this one is even official 🙂

misty flint Jul 3, 2022, 11:32 PM

#

serene scaffold Is he hot?

~~asking the real questions~~

#

RunFail

misty flint Jul 3, 2022, 11:33 PM

#

iron basalt You can do RL without a neural network, the neural network serves a specific pur...

makes sense. then you can do deep RL

#

CLe_FeelsEvilLurk

#

jk

#

~~maybe~~

misty flint Jul 3, 2022, 11:34 PM

#

tidal bough seems so: http://incompleteideas.net/book/the-book.html

pepeStudy

velvet rampart Jul 3, 2022, 11:39 PM

#

Please I need help with this

serene scaffold Jul 4, 2022, 12:04 AM

#

velvet rampart Please I need help with this

Please don't ask people to read a screenshot of text and infer what your problem is.

tropic matrix Jul 4, 2022, 12:36 AM

#

what would be a good model architecture for a DNN regression model?
in my dataset, I have:

4400 features inputted
approx 23m samples of data (raw, not split into training, i'm using a 64% train, 16% val, 20% test split)
1 output neuron

what i'm mainly looking for is how many hidden layers and neurons per hidden layer I should need for training

sharp crescent Jul 4, 2022, 3:56 AM

#

young plume I miss my brother lol

I miss my ex lol

velvet rampart Jul 4, 2022, 5:35 AM

#

serene scaffold Please don't ask people to read a screenshot of text and infer what your problem...

So how should I show what's wrong when asking for help, please?

solemn dragon Jul 4, 2022, 7:18 AM

#

Hi there. When plotting a box plot with plotly express is there a way to only keep the output values when exporting to HTML?

Let me explain :

If I generate a box plot from a dataframe with 1 million records, the output file keeps all 1 million records in its JavaScript, whereas I'm only interested in the min, max, median, Q1 and Q3 values.
Ideally the output file should only contain those five values (and some outliers if need be?)

Right now my solution is to manually plot the box plot from a dataframe that contains the BoxPlot info
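The manual approach described here can be sketched as computing the five summary numbers once, so that only they need to be written to the exported file. The helper name `box_summary` and the sample data are made up for illustration. As far as I know, Plotly's `go.Box` trace accepts precomputed `q1`, `median`, `q3`, `lowerfence` and `upperfence` values, but treat that as an assumption to check against the Plotly documentation.

```python
import statistics


def box_summary(values):
    """Return the five numbers a box plot needs, computed from the raw values."""
    # method="inclusive" treats the data as the whole population, so the
    # quartiles always fall between the observed minimum and maximum.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Stand-in for one numeric column of the large dataframe.
summary = box_summary(list(range(1, 102)))
print(summary)
```

Only `summary` would then be handed to the plotting call, so the exported HTML no longer carries the raw records.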

unique flame Jul 4, 2022, 7:30 AM

#

tropic matrix what would be a good model architecture for a DNN regression model? in my datase...

Add/remove layers or neurons>Does it work well?>Repeat first step.

tranquil sage Jul 4, 2022, 8:16 AM

#

What are the possible reasons that precision, recall, and F1-score come out as 0 when I have 27 samples for class 1?
Are there too few samples?

tranquil sage Jul 4, 2022, 8:21 AM

#

tranquil sage What's the possible reasons caused precision, recall, f1-score turned 0 while I ...

steady basalt Jul 4, 2022, 10:25 AM

#

Word cloud from Twitter?

weary ridge Jul 4, 2022, 10:44 AM

#

template matching in opencv?

#

anyone

steady basalt Jul 4, 2022, 11:33 AM

#

Seems about right

unique flame Jul 4, 2022, 12:09 PM

#

Does anyone have articles on distance estimation using object detection? These people have something, but not the distance estimation part yet..

D. Qiao and F. Zulkernine, "Vision-based Vehicle Detection and Distance Estimation," 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 2836-2842, doi: 10.1109/SSCI47803.2020.9308364.

arctic wedgeBOT Jul 4, 2022, 12:41 PM

#

wordcloud v1.8.2.2

A little word cloud generator

steady basalt Jul 4, 2022, 12:50 PM

#

Guys

#

I just discovered the iPhones search bar ability to search text in photos

#

Holy… shit!!!

vast yacht Jul 4, 2022, 1:06 PM

#

I need help with Apache Airflow; I'm still browsing Stack Overflow for this. I've been trying to create two custom operators: one gets information and returns it as a dictionary, and the other receives that dictionary and prints the results. I'm stuck on how to share information between the two operators, since both run through execute() of BaseOperator. I tried XComs but still didn't achieve what I want

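from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import BaseOperator


class HelloOperator(BaseOperator):
    def __init__(self, info, **kwargs) -> None:
        # Only the remaining keyword arguments (task_id, ...) go to BaseOperator.
        super().__init__(**kwargs)
        # Keep the value on the instance so execute() can read it.
        self.info = info

    def execute(self, context):
        # message = f'Your information: {self.info}'
        # print(message)
        # To read the upstream result instead, it can be pulled from XCom:
        # context['ti'].xcom_pull(task_ids='get_info_task')
        return self.info


class GetInformationOperator(BaseOperator):
    def __init__(self, name: str, age: int, **kwargs) -> None:
        super().__init__(**kwargs)
        self.name = name
        self.age = age

    def execute(self, context):
        # The return value of execute() is pushed to XCom by default.
        return {
            'name': self.name,
            'age': self.age
        }


default_args = {
    'owner': 'Trang Nguyen',
    'retries': 5,
    'retry_delay': timedelta(minutes=5)  # the key uses an underscore, not a hyphen
}

with DAG(
    dag_id='custom-operator_v1',
    default_args=default_args,
    description='this is my custom operator',
    start_date=datetime(2022, 7, 4),
    schedule_interval='@daily'
) as dag:
    get_info_task = GetInformationOperator(
        task_id='get_info_task',
        name='Cheng',
        age=22
    )

    hello_task = HelloOperator(
        task_id='greet_task',
        info='???'
    )

    get_info_task >> hello_task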

versed gulch Jul 4, 2022, 1:59 PM

#

does anyone know how to extract OME-XML metadata from czi images in python?

tacit basin Jul 4, 2022, 2:06 PM

#

weary ridge template matching in opencv?

Check pyimagesearch

coarse nacelle Jul 4, 2022, 2:18 PM

#

Where do I start with regex any book or courses?

misty flint Jul 4, 2022, 2:18 PM

#

sounds just like twitter

#

but thats pretty funny

#

kekHands

#

still valuable data

#

nonetheless haha

tropic matrix Jul 4, 2022, 3:25 PM

#

unique flame Add/remove layers or neurons>Does it work well?>Repeat first step.

should i jus start at some arbitrary number?

#

just*

mild dirge Jul 4, 2022, 3:29 PM

#

tropic matrix should i jus start at some arbitrary number?

You should probably look at another recent project that did something similar, and see what they chose, and use that as starting point

wicked grove Jul 4, 2022, 4:46 PM

#

Hello,i have a doubt while using tensorflow and pytorch

#

Im trying to plot the model using add_graph

#

Im using colab but i keep getting an error ,that the only output should be tensors

misty flint Jul 4, 2022, 5:58 PM

#

i like spacy and nltk. havent tried sparknlp.

upper scaffold Jul 4, 2022, 6:01 PM

#

Hello, people. I am a beginner in programming and I would like to know your opinions on which learning pathway you consider best for learning AI really well and deeply. I am a person who likes to build the foundations of what I want to learn and understand what I am doing, so I would be really grateful for any help 🙂

odd meteor Jul 4, 2022, 6:03 PM

#

What are you trying to do? SpaCy and NLTK are great.

wicked grove Jul 4, 2022, 6:05 PM

#

wicked grove Im using colab but i keep getting an error ,that the only output should be tenso...

RuntimeError: Only tensors, lists, tuples of tensors, or dictionary of tensors can be output from traced functions

odd meteor Jul 4, 2022, 6:05 PM

#

upper scaffold Hello, people. I am a beginner in programming and I would like to know your opin...

If you have the time, start by learning Python (not just python for data science) once you're done with python, you can then move to Data Science.

You can use Udemy or Coursera

upper scaffold Jul 4, 2022, 6:16 PM

#

odd meteor If you have the time, start by learning Python (not just python for data science...

OK nice! I will check it, thank you very much 👌

obsidian pumice Jul 4, 2022, 6:19 PM

#

Can anyone recommend any good Python tools for getting into reinforcement learning and making RL agents?

#

I've been having difficulties getting TensorForce to even import

misty flint Jul 4, 2022, 6:22 PM

#

TensorForce

#

CLf_HyperThonk

#

never tried it

#

have you tried checking your versions

obsidian pumice Jul 4, 2022, 6:23 PM

#

I mean my script couldn't even find the package

misty flint Jul 4, 2022, 6:23 PM

#

that sounds like a directory error

obsidian pumice Jul 4, 2022, 6:23 PM

#

We were in the same environment and everything

#

Conda was a mistake

misty flint Jul 4, 2022, 6:24 PM

#

hmm now i see why Stel recommends against conda for beginners

#

pithink

#

have you tried using just google colab

obsidian pumice Jul 4, 2022, 6:24 PM

#

Never heard of it

misty flint Jul 4, 2022, 6:24 PM

#

try it

#

just know you need to run
!pip install <your library> before you can import libraries

odd meteor Jul 4, 2022, 6:27 PM

#

You'd have to scrape Twitter data first using an API so you can gather enough tweets that capture the specific kind of tweet(s) whose sentiment you want to predict.

After performing the sentiment analysis, if you'd wanna take it a nudge further, then look into ABSA (Aspect-Based Sentiment Analysis)

Finally, since this is a long term work as you've mentioned, I'd recommend you look into Adversarial Text Attack in NLP if you have more "Whys" 😊

serene scaffold Jul 4, 2022, 6:39 PM

#

misty flint hmm now i see why Stel recommends against conda for beginners

hmm now i see why Stel recommends against conda ~~for beginners~~
ftfy

weary ridge Jul 4, 2022, 7:12 PM

#

does anyone knows template matching?

serene scaffold Jul 4, 2022, 7:13 PM

#

weary ridge does anyone knows template matching?

don't ask to ask

weary ridge Jul 4, 2022, 7:15 PM

#

okay

#

i have wrote a code

#

which is highlighting wrong boxes

#

i d like to know the reason

#

is there anyone to check?

mild pecan Jul 4, 2022, 7:17 PM

#

Can someone help me with a credit risk task in Python? Please PM me

weary ridge Jul 4, 2022, 7:21 PM

#

serene scaffold don't ask to ask

i can attach the highlighted boxes if you d like to see

serene scaffold Jul 4, 2022, 7:23 PM

#

mild pecan Can someone help me with a credit risk task in Python? Please PM me

try asking your question in this channel. people don't want to have to DM you to find out if the question is one they can answer.

mild pecan Jul 4, 2022, 7:23 PM

#

serene scaffold try asking your question in this channel. people don't want to have to DM you to...

I'm not comfortable sharing sensitive data with 50k people. If someone is willing to help I'm sure they would DM 🙂

serene scaffold Jul 4, 2022, 7:24 PM

#

mild pecan I'm not comfortable sharing sensitive data with 50k people. If someone is willing t...

you still have to give enough information for people to know in advance if the question is something they can help with.

mild pecan Jul 4, 2022, 7:26 PM

#

serene scaffold you still have to give enough information for people to know in advance if the q...

Ok. It's a task about assessing default risk of loans. Basically I have to construct a regression model using test and training data sets

weary ridge Jul 4, 2022, 7:35 PM

#

serene scaffold don't ask to ask

i ve asked in detail

serene scaffold Jul 4, 2022, 7:38 PM

#

weary ridge i ve asked in detail

I wasn't volunteering to answer your question once you had asked it, necessarily. it's just that no one would volunteer to answer until the question was asked.

serene scaffold Jul 4, 2022, 7:38 PM

#

weary ridge i have wrote a code

can you show the code?

serene scaffold Jul 4, 2022, 7:38 PM

#

weary ridge which is highlighting wrong boxes

and can you show which boxes are highlighted, and explain which ones you want to be highlighted?

worldly dawn Jul 4, 2022, 7:40 PM

#

mild pecan I'm not comfortable sharing sensitive data with 50k people. If someone is willing t...

you should not feel confident sharing them with a stranger in DMs either

lapis sequoia Jul 4, 2022, 7:45 PM

#

serene scaffold and can you show which boxes are highlighted, and explain which ones you want to...

can you like make an neural network ai that can connect to google

#

like it has access to see links videos and stuff on google like us

serene scaffold Jul 4, 2022, 7:47 PM

#

lapis sequoia like it has access to see links videos and stuff on google like us

if you make the neural network in Python, and there's a YouTube API for Python, then yes.

mild pecan Jul 4, 2022, 7:48 PM

#

Ok, Lets say I have a column "Data Type", containing values "1", "2", "3",... How do I create a categorical column out of this? Lets say the column contains 3 different data types. This means I have to create 2 categorical columns. How do I do this with the Pandas package?

serene scaffold Jul 4, 2022, 7:49 PM

#

mild pecan Ok, Lets say I have a column "Data Type", containing values "1", "2", "3",... Ho...

try taking that Series and putting .astype('category') on it.

lapis sequoia Jul 4, 2022, 7:55 PM

#

lapis sequoia like it has access to see links videos and stuff on google like us

but can anyone make that type of neural network?

mild pecan Jul 4, 2022, 7:56 PM

#

serene scaffold try taking that Series and putting `.astype('category')` on it.

Not quite sure how you mean that. Lets say there are only 2 data types in the column. "A" and "B". I have to delete column "Data Type" and instead create a column "A" which will have 1's and 0's

df['DATA_TYPE'] = df["A"].astype("category")

This yields an error for me

serene scaffold Jul 4, 2022, 7:57 PM

#

mild pecan Not quite sure how you mean that. Lets say there are only 2 data types in the co...

if you get an error, please always show the error, instead of just saying that you got one. I have no way of knowing what the error is unless you tell me.

mild pecan Jul 4, 2022, 7:57 PM

#

df['A'] = df["DATA_TYPE"].astype("category")

#

should be this way

#

No this doesnt create 1's and 0's

#

I think I have to use a dummy function

serene scaffold Jul 4, 2022, 8:00 PM

#

mild pecan I think I have to use a dummy function

not necessarily. but I'm still unclear on what you're trying to do.

mild pecan Jul 4, 2022, 8:00 PM

#

df["A"] = pd.get_dummies(df["DATA_TYPE"])

#

also doesnt work

serene scaffold Jul 4, 2022, 8:02 PM

#

mild pecan also doesnt work

I'm going to let someone else try to answer this. statements like "x doesn't work" aren't helpful unless it's clear what x does, and how it's different from what you wanted.

mild pecan Jul 4, 2022, 8:05 PM

#

OK. Let me try to explain it better.
I have a (tidy) data frame from a .csv file with 10 different columns and endless rows. For example column "Job", "Data Type", "Salary", "Education", etc.
I want to focus on the "Data Type" column for now. This column contains only "A"s and "B"s. I want to make this column categorical, meaning that I want to delete the column "Data Type" and replace it with a new column "A" that is derived from the "Data Type" column. This new column "A" only has 1's and 0's. For example, if in the column "Data Type" in row 4 there was an "A", then, in the new column "A", I want to see a 1 there. If there was a "B", I want to see a 0 there. Hope this was clearer. This all relates to basic regression modelling

wooden sail Jul 4, 2022, 8:09 PM

#

you'd wanna add as many columns as you have distinct categories in that case

mild pecan Jul 4, 2022, 8:11 PM

#

Yes, but since I have only two distinct categories, I only need 1 new column. If the Data Type is A, there will be a 1, and if it's B, there will be a 0 in the new column. So I don't have to create two new columns

wooden sail Jul 4, 2022, 8:12 PM

#

if you have only 2, then yes

#

you'd given 3 in your original example, so i'd gone with that

mild pecan Jul 4, 2022, 8:12 PM

#

wooden sail you'd given 3 in your original example, so i'd gone with that

Yep sorry, I thought it was three, but it's actually 2

wooden sail Jul 4, 2022, 8:14 PM

#

can you do something like myseries = df.pop('col_label')

mild pecan Jul 4, 2022, 8:15 PM

#

It's df = pd.concat([df, pd.get_dummies(df['A'])], axis=1)
I just found it with google after making my question clearer 😄
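For the two-category case described above, a minimal sketch of building the single 0/1 column could look like this. The column name `DATA_TYPE` follows the earlier messages, while the `Salary` values and the variable names are invented for the example.

```python
import pandas as pd

# Toy frame standing in for the CSV data.
df = pd.DataFrame({"DATA_TYPE": ["A", "B", "A", "B"], "Salary": [10, 20, 30, 40]})

# Option 1: compare against one category and cast the booleans to integers.
indicator = (df["DATA_TYPE"] == "A").astype(int)

# Option 2: build dummy columns, keep only "A", and drop the original column.
dummies = pd.get_dummies(df["DATA_TYPE"], dtype=int)
encoded = pd.concat([df.drop(columns=["DATA_TYPE"]), dummies[["A"]]], axis=1)
print(encoded)
```

Keeping only one of the two dummy columns avoids carrying a second column that is just the complement of the first.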

wooden sail Jul 4, 2022, 8:15 PM

#

then apply a function to that series, and do df['category_label'] = result_of_operation_on_series

mild pecan Jul 4, 2022, 8:15 PM

#

Thank you nonetheless

wooden sail Jul 4, 2022, 8:15 PM

#

nice

mild pecan Jul 4, 2022, 8:18 PM

#

how do I delete multiple columns at once?

wooden sail Jul 4, 2022, 8:19 PM

#

df.drop(columns=['my','labels','to','drop']) perhaps?

mild pecan Jul 4, 2022, 8:25 PM

#

thx

mild pecan Jul 4, 2022, 9:19 PM

#

Task: Define 'REFERENCE_DATE’ and ‘DEFAULT_DATE’ as date variables

#

how do I do this?

#

those two are columns

#

If it appears as datetime64[ns] after running df.dtypes, does this mean they are now defined as date variables?

#

I ran

df['REFERENCE_DATE'] = pd.to_datetime(df['REFERENCE_DATE'])

before

mild pecan Jul 4, 2022, 9:48 PM

#

Is there maybe a more active Data Science related Discord server?

serene scaffold Jul 4, 2022, 10:17 PM

#

mild pecan Is there maybe a more active Data Science related Discord server?

You feel that this channel isn't active enough?

serene scaffold Jul 4, 2022, 10:18 PM

#

mild pecan If it appears as datetime64[ns] after running df.dtypes, does this mean they are...

It means that the type of data contained in that column is datetime values. Whether or not that's what you want, I'm not sure.

#

The data contained within a DataFrame are not "variables"
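As a small sketch of what the conversion and the dtype check look like for both columns: the column names come from the task statement above, and the date values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "REFERENCE_DATE": ["2022-01-31", "2022-02-28"],
    "DEFAULT_DATE": ["2022-03-15", None],   # a missing date becomes NaT
})

for col in ("REFERENCE_DATE", "DEFAULT_DATE"):
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)

# Programmatic version of "look at df.dtypes and see a datetime64 dtype".
is_date = {col: pd.api.types.is_datetime64_any_dtype(df[col]) for col in df.columns}
```

Once a column has a datetime dtype, the `.dt` accessor (for example `df["REFERENCE_DATE"].dt.year`) becomes available on it.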

lapis sequoia Jul 4, 2022, 11:02 PM

#

anyone here can make neural network if so please do tell me

serene scaffold Jul 4, 2022, 11:05 PM

#

lapis sequoia anyone here can make neural network if so please do tell me

There are a lot of kinds of neural networks. Your question is underspecified

misty flint Jul 4, 2022, 11:06 PM

#

serene scaffold > hmm now i see why Stel recommends against conda ~~for beginners~~ ftfy

kekHands

#

tbh i heard a podcast today about how bad conda was in a production environment

lapis sequoia Jul 4, 2022, 11:06 PM

#

serene scaffold There are a lot of kinds of neural networks. Your question is underspecified

well i need one that is an ai that i can connect to google

misty flint Jul 4, 2022, 11:06 PM

#

way too bulky

serene scaffold Jul 4, 2022, 11:07 PM

#

lapis sequoia well i need one that is an ai that i can connect to google

Once it "connects to Google", what is it going to do?

#

Because AIs don't just accumulate arbitrary knowledge

#

You have to have a very specific idea of what you're trying to do.

lapis sequoia Jul 4, 2022, 11:09 PM

#

well that i and smn else will do but first we need to do the first part

serene scaffold Jul 4, 2022, 11:10 PM

#

If you don't have a clear idea of what you're trying to do, and you can't communicate it, no one can make a neural network that suits your purposes

tidal bough Jul 4, 2022, 11:10 PM

#

~~something something coherent extrapolated volition :p~~

lapis sequoia Jul 4, 2022, 11:13 PM

#

serene scaffold If you don't have a clear idea of what you're trying to do, and you can't commun...

the idea is when the neural is made me and my pal are gonna make it self learning ai using python and by that we are gonna make it learn from google and if we can make it create what it learnt

serene scaffold Jul 4, 2022, 11:14 PM

#

lapis sequoia the idea is when the neural is made me and my pal are gonna make it self learnin...

So, neural networks aren't sponges that can just soak up knowledge from anything. They're mathematical constructs that approximate functions.

lapis sequoia Jul 4, 2022, 11:15 PM

#

well we are gonna use that to make it take info from google and learn and keep all its learnings in a type of encrypted file

#

so it doesn't have storage issues

serene scaffold Jul 4, 2022, 11:16 PM

#

lapis sequoia well we are gonna use that to make it take info from google and learn and keep a...

Sorry, but none of this is going to work. I would suggest you try a different project with a more coherent goal.

lapis sequoia Jul 4, 2022, 11:16 PM

#

well tell then how would you make an ai

serene scaffold Jul 4, 2022, 11:17 PM

#

An AI that does what?

mild pecan Jul 4, 2022, 11:17 PM

#

serene scaffold It means that the type of data contained in that column is datetime values. Whet...

Thats what I was confused about. What do they mean with "Define column x as a date variable"? would df['REFERENCE_DATE'] = pd.to_datetime(df['REFERENCE_DATE']) be wrong here?

serene scaffold Jul 4, 2022, 11:17 PM

#

Each AI has a very specific thing that it does

lapis sequoia Jul 4, 2022, 11:17 PM

#

mild pecan Thats what I was confused about. What do they mean with "Define column x as a da...

self learns like i said

hollow sentinel Jul 4, 2022, 11:17 PM

#

it sounds like you don't have a problem statement

lapis sequoia Jul 4, 2022, 11:17 PM

#

that's literally what i was explaining up there

serene scaffold Jul 4, 2022, 11:17 PM

#

mild pecan Thats what I was confused about. What do they mean with "Define column x as a da...

I don't know what the person who wrote that question thinks those words mean.

serene scaffold Jul 4, 2022, 11:18 PM

#

lapis sequoia self learns like i said

Self learns what? This isn't a coherent problem statement. It sounds like you need to spend more time learning about what AI is in general, so that you can come up with project ideas that make sense in terms of what AI actually is.

hollow sentinel Jul 4, 2022, 11:18 PM

#

https://machinelearningmastery.com/how-to-define-your-machine-learning-problem/

Machine Learning Mastery

How to Define Your Machine Learning Problem

The first step in any project is defining your problem. You can use the most powerful and shiniest algorithms available, but the results will be meaningless if you are solving the wrong problem. In this post you will learn the process for thinking deeply about your problem before you get started. This is unarguably the […]

#

this should be pinned imo

mild pecan Jul 4, 2022, 11:21 PM

#

serene scaffold I don't know what the person who wrote that question thinks those words mean.

I mean its a super simple task no?

#

I think I'm making this way more complicated than it should be!

serene scaffold Jul 4, 2022, 11:23 PM

#

mild pecan I mean its a super simple task no?

I'm sure it's a simple task, but if we don't have shared definition of what "variable" means, that's going to make communication difficult.

What you did is probably correct. Can you ask the person who told you to do it to confirm?

#

Again, the data in a DataFrame aren't variables. They're just data, or elements of the DataFrame. Variables are names for objects in the python environment.

iron basalt Jul 4, 2022, 11:27 PM

#

lapis sequoia self learns like i said

Why do you want it to "self learn"? What is the end goal? What is done with the knowledge?

hollow sentinel Jul 4, 2022, 11:27 PM

#

are you trying to predict a numerical outcome?

#

are you trying to label an instance as a particular class?

iron basalt Jul 4, 2022, 11:28 PM

#

Forget everything you think you know about AI and whatever. Just specify what the goal is first. Then we can discuss if Ai is even required, and if it is, how to go about it.

#

Be specific about the goal, start general and add more details.

mild dirge Jul 4, 2022, 11:29 PM

#

iron basalt Forget everything you think you know about AI and whatever. Just specify what th...

This reminds me of a friend helping some business owner, and one of the things the owner put on the list of to-dos was to "use AI" with no explanation at all lol

iron basalt Jul 4, 2022, 11:29 PM

#

Do not DM me.

#

I will block all DMs.

#

Unless I ask to be DMed.

iron basalt Jul 4, 2022, 11:33 PM

#

mild dirge This reminds me of a friend helping some business owner, and one of the things t...

"I want to go to the moon, and it needs to use a train." - You are hiring experts to help you with your business because you admit to not knowing how to do it or don't have the time, so it's only fair that you make no assumptions about the process or it will sound like that quote.

lapis sequoia Jul 4, 2022, 11:34 PM

#

iron basalt Why do you want it to "self learn"? What is the end goal? What is done with the ...

what is done with the knowledge it helps you make stuff or shows you the code you need cause not everyone can learn so it helps learning and showing so it benefits

tidal bough Jul 4, 2022, 11:34 PM

#

strange, I thought these days it's usually "it needs to use the blockchain" :p

iron basalt Jul 4, 2022, 11:35 PM

#

lapis sequoia what is done with the knowledge it helps you make stuff or shows you the code yo...

This is beginning to sound like Github copilot, is this correct?

lapis sequoia Jul 4, 2022, 11:35 PM

#

uh not sure what that it is

iron basalt Jul 4, 2022, 11:36 PM

#

https://github.com/features/copilot/

GitHub

GitHub Copilot · Your AI pair programmer

GitHub Copilot works alongside you directly in your editor, suggesting whole lines or entire functions for you.

lapis sequoia Jul 4, 2022, 11:37 PM

#

hmm interesting i see what you mean but if i just get the neural network to work i can get the rest done too

hollow sentinel Jul 4, 2022, 11:37 PM

#

tidal bough strange, I thought these days it's usually "it needs to use the blockchain" :p

https://tenor.com/view/why-not-both-why-not-take-both-gif-11478682

Tenor

iron basalt Jul 4, 2022, 11:37 PM

#

tidal bough strange, I thought these days it's usually "it needs to use the blockchain" :p

Picture of train sold as an NFT.

mild dirge Jul 4, 2022, 11:37 PM

#

lapis sequoia hmm interesting i see what you mean but if i just get the neural network to work...

If you still ask this, then you don't understand what we are saying to you

hollow sentinel Jul 4, 2022, 11:37 PM

#

you want a neural network that can help you code?

lapis sequoia Jul 4, 2022, 11:39 PM

#

okie i see i confused everyone, give me a bit to write a better explanation

candid pollen Jul 5, 2022, 12:10 AM

#

hello i have a question, in this layer snippet (Conv1D(3,5, activation='relu', input_shape=(200,3))) it has 291 params, how do i explain this manually?
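A hedged note on the arithmetic: for a Keras-style `Conv1D` layer with a bias term, each filter holds `kernel_size × input_channels` weights plus one bias, and the sequence length (200 here) does not affect the count. The helper below is a made-up name used only to show that formula. Applied to the snippet as written it gives 48, so a total of 291 would have to come from a different setting (a kernel size of 32 with the same filters and channels happens to give 291) or from counting more than this one layer.

```python
def conv1d_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Trainable parameters of a 1D convolution layer (hypothetical helper)."""
    weights = kernel_size * in_channels * filters   # one kernel per filter
    biases = filters if use_bias else 0             # one bias per filter
    return weights + biases


# Conv1D(3, 5, input_shape=(200, 3)): 3 filters, kernel size 5, 3 input channels.
as_written = conv1d_param_count(filters=3, kernel_size=5, in_channels=3)

# Same layer with kernel size 32 instead of 5.
wider_kernel = conv1d_param_count(filters=3, kernel_size=32, in_channels=3)

print(as_written, wider_kernel)
```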

hollow sentinel Jul 5, 2022, 12:23 AM

#

another mini data sci project done

#

https://tenor.com/view/another-one-bites-the-dust-queen-dance-dancing-gif-8171986

Tenor

#

how do you like that jason brownlee

#

but the models probably overfitted even with cross validation

#

oh no

#

i'm dumb

#

i was using regression models for a classification problem

#

that was embarrassing lol

#

dude why are people starring my repo

nova pollen Jul 5, 2022, 3:03 AM

#

@royal garnet what's your question

royal garnet Jul 5, 2022, 3:04 AM

#

I have a dataframe consisting of a bunch of sessions for an event spanning several days.

#

There is a start datetime and end datetime column - and I want to somehow get new data frames for each day

#

But the catch is, I am writing a program that can take any csv as input - so I won't always know what the dates are.

#

Just that there could be 1 or more days worth of dates.

#

Is that something that can be done?

nova pollen Jul 5, 2022, 3:06 AM

#

can you give an example

arctic wedgeBOT Jul 5, 2022, 3:06 AM

#

:incoming_envelope: :ok_hand: applied mute to @royal garnet until <t:1656991001:f> (9 minutes and 59 seconds) (reason: discord_emojis rule: sent 80 emojis in 10s).

nova pollen Jul 5, 2022, 3:06 AM

#

oops

delicate apex Jul 5, 2022, 3:07 AM

#

well it pings the mods whenever it mutes someone, at least

nova pollen Jul 5, 2022, 3:07 AM

#

mhm

silent fable Jul 5, 2022, 3:07 AM

#

looks like a mistake was made

#

!unmute 165943073040236544

arctic wedgeBOT Jul 5, 2022, 3:07 AM

#

:incoming_envelope: :ok_hand: pardoned infraction mute for @royal garnet.

silent fable Jul 5, 2022, 3:07 AM

#

sorry about that

nova pollen Jul 5, 2022, 3:08 AM

#

thanks luna

silent fable Jul 5, 2022, 3:08 AM

#

!paste use the pasting service to avoid this issue again

arctic wedgeBOT Jul 5, 2022, 3:08 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

nova pollen Jul 5, 2022, 3:08 AM

#

@royal garnet we're back

silent fable Jul 5, 2022, 3:08 AM

#

@royal garnet

silent fable Jul 5, 2022, 3:08 AM

#

nova pollen thanks luna

of course nod

crude bluff Jul 5, 2022, 3:10 AM

#

i think we're not going to see him again even after unmuting him.

#

poor guy

hybrid sierra Jul 5, 2022, 3:12 AM

#

Man's traumatized

nova pollen Jul 5, 2022, 3:12 AM

#

we're chatting in dms now, dont worry :p

royal garnet Jul 5, 2022, 3:15 AM

#

Oops

#

I'm back

#

I think I have something to try - but a quick follow-on question. To confirm, can pandas group dataframes by datetime objects?

#

say, by each individual day in a column made up of parsed dates

royal garnet Jul 5, 2022, 4:04 AM

#

Oh man this guy's video just made my day.
https://www.youtube.com/watch?v=cUArbPdzR_c

YouTube

Reuven Lerner

Grouping on dates in pandas

Pandas has great support for dates and times — and that extends to its grouping capabilities, too. In this video, I show you how to group on datetime fields, both indirectly (by creating a new column) and directly in the call to "groupby". This video continues my previous one, in which I introduce grouping in pandas.

Jupyter notebooks from what...


wicked grove Jul 5, 2022, 4:17 AM

#

hello im using pytorch and colab, how can i visualize my model's architecture

lapis sequoia Jul 5, 2022, 4:44 AM

#

does data science include ips?

#

https://paste.pythondiscord.com/uzinabulav

#

gets all info on a ip or a websites ip

royal garnet Jul 5, 2022, 5:21 AM

#

Okay, wtf am I doing wrong here.

grouped = evt.groupby('day')

grouped.get_group('day')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/max/python/venv/pm-toolbox/scratch.ipynb Cell 11' in <cell line: 1>()
----> 1 grouped.get_group('day')

File ~/python/venv/pm-toolbox/lib/python3.10/site-packages/pandas/core/groupby/groupby.py:747, in BaseGroupBy.get_group(self, name, obj)
    745 inds = self._get_index(name)
    746 if not len(inds):
--> 747     raise KeyError(name)
    749 return obj._take_with_is_copy(inds, axis=self.axis)

KeyError: 'day'

#

I am following the pandas documentation - and it just won't work and I'm about ready to toss my laptop out the window.

serene scaffold Jul 5, 2022, 5:27 AM

#

royal garnet Okay, wtf am I doing wrong here. ```python grouped = evt.groupby('day') groupe...

If you are grouping by the day column, then "day" isn't going to be the name of a group. One of the unique values in the day column will be.
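
A minimal sketch of that point, using a made-up frame in place of the real `evt` (the column values here are invented for illustration). The keys that `get_group` accepts are the distinct values found in the `day` column, and iterating over the groupby gives one frame per day without knowing the dates in advance:

```python
import pandas as pd

# Hypothetical stand-in for the real `evt` frame; only the "day" column matters here.
evt = pd.DataFrame(
    {
        "day": ["2022-06-14", "2022-06-14", "2022-06-15"],
        "session": ["SESSION 1", "SESSION 2", "SESSION 3"],
    }
)

grouped = evt.groupby("day")

# The group names are the distinct values in the column, not the column label.
group_names = list(grouped.groups)

# Pull out one day's rows by passing one of those values.
first_day = grouped.get_group("2022-06-14")

# Or iterate to get one DataFrame per day without knowing the dates up front.
frames_by_day = {day: frame for day, frame in grouped}
```

So `grouped.get_group('day')` fails simply because no row has the literal value `'day'` in that column.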

royal garnet Jul 5, 2022, 5:30 AM

#

oh

#

damn, I feel dumb.

serene scaffold Jul 5, 2022, 5:30 AM

#

It's okay

royal garnet Jul 5, 2022, 5:32 AM

#

Pandas has so far proven to be the most challenging thing to learn...

#

Is that normal - or am I just not getting something?

serene scaffold Jul 5, 2022, 5:34 AM

#

It's different from the rest of python

lapis sequoia Jul 5, 2022, 5:34 AM

#

Okay so question

iron basalt Jul 5, 2022, 5:36 AM

#

royal garnet Is that normal - or am I just not getting something?

It's normal. It's big with lots of stuff in it and it uses various things like operator overloading abuse, multiple types allowed per function (implicit function overloading), and abstractions upon abstractions.

lapis sequoia Jul 5, 2022, 5:37 AM

#

I have a pandas dataframe in which column('SOP') has numbers 0 to 100. It also has another column called "open" with numbers in it.

I want to create a new column where whenever the column SOP == 0, it takes the value of open the last time SOP was equal to 0 and subtracts it from the open value of the current row.

How can I do this?

I can show code if this is confusing for you.

#

I've been stuck on this for literally 8 hours

royal garnet Jul 5, 2022, 5:40 AM

#

Makes me wonder if maybe there is an easier way to approach solving this. I'm working with csvs and then I want to define some functions to pull certain bits of information based on conditions - and then populate that information and write to a spreadsheet or save it to a db. Right now, I am just trying to find a certain unique string in a row, and then for each day find the minimum time for that given unique string.

iron basalt Jul 5, 2022, 5:40 AM

#

iron basalt It's normal. It's big with lots of stuff in it and it uses various things like o...

In addition it's building on top of Numpy which is already its whole own thing to learn.

iron basalt Jul 5, 2022, 5:42 AM

#

royal garnet Makes me wonder if maybe there is an easier way to approach solving this. I'm w...

It would be simple to do with some manual loops and such, but Python is slow, so to do it fast you need to know how to do it with whatever functions Pandas/Numpy provide.

royal garnet Jul 5, 2022, 5:43 AM

#

My datasets are rarely longer than 500 ish rows

iron basalt Jul 5, 2022, 5:43 AM

#

Well, you can do it manually first, see how it goes, and then maybe try to find how to do the same thing faster later.

lapis sequoia Jul 5, 2022, 5:43 AM

#

Ok, I solved my problem 🙂

royal garnet Jul 5, 2022, 5:43 AM

#

But screw it - I've already put 2 days into learning this - may as well keep on cracking

iron basalt Jul 5, 2022, 5:44 AM

#

If you are used to Pandas/Numpy then it becomes easier to some extent to do it with the functions provided, but there is a learning curve before that point.

royal garnet Jul 5, 2022, 5:44 AM

#

Am I on the right track with using groups to figure this out?

iron basalt Jul 5, 2022, 5:46 AM

#

Give an example table or print yours if you can share it.

#

(head)

royal garnet Jul 5, 2022, 5:46 AM

#

I can't, it has pii info on it.

iron basalt Jul 5, 2022, 5:48 AM

#

Make up some table and put it here.

#

Pandas examples often use animals.

royal garnet Jul 5, 2022, 5:48 AM

#

Give me a moment, I'm putting something together

#

@iron basalt Something like this:

Session Start Date/Time End Date/Time Session Name Session ID Speaker Code Full Name Email Address
2022-06-14 13:00:00 2022-06-14 14:15:00 SESSION 1 4009a82f-eaa7-4068-919e-55ee38ee64b5 UUID FULLNAME EMAIL
2022-06-14 13:00:00 2022-06-14 14:15:00 SESSION 1 4009a82f-eaa7-4068-919e-55ee38ee64b5 UUID FULLNAME EMAIL
2022-06-14 13:00:00 2022-06-14 14:15:00 SESSION 1 4009a82f-eaa7-4068-919e-55ee38ee64b5 UUID FULLNAME EMAIL
2022-06-14 13:00:00 2022-06-14 14:15:00 SESSION 1 4009a82f-eaa7-4068-919e-55ee38ee64b5 UUID FULLNAME EMAIL
2022-06-14 13:00:00 2022-06-14 14:15:00 SESSION 1 4009a82f-eaa7-4068-919e-55ee38ee64b5 UUID FULLNAME EMAIL

#

I want an output that shows me the first session, on each date in the DF for each speaker code (which is a uuid)

#

We're only seeing one session repeated - because in this case that first session has more than 5 speakers

#

Does that make sense?

iron basalt Jul 5, 2022, 5:57 AM

#

So everything the same except the last 3 columns?

#

(5 speakers)

royal garnet Jul 5, 2022, 5:57 AM

#

no, the df goes on for 100+ more rows

iron basalt Jul 5, 2022, 5:57 AM

#

Yeah I mean for what is shown here.

royal garnet Jul 5, 2022, 5:57 AM

#

Multiple date times, session names, and session ids

#

Oh yes

#

Correct

iron basalt Jul 5, 2022, 5:59 AM

#

You can try to first add a column for the date and a column for the time to split that up.

#

Unless they are already separate columns.

royal garnet Jul 5, 2022, 6:01 AM

#

They aren't

#

and I was thinking that would be a good idea just now

#

can I do that right at the read_csv part of my code?

#


evt = pd.read_csv(
    sessions,
    sep="\t",
    encoding="utf-8-sig",
    usecols=session_columns,
    converters={"Speaker Code": lambda x: extract_speaker_codes(x, spk)},
)[session_columns]  # [session_columns] at the end here preserves the desired column order
evt = evt.explode("Speaker Code")

iron basalt Jul 5, 2022, 7:03 AM

#

royal garnet ```python evt = pd.read_csv( sessions, sep="\t", ...

parse_dates
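
A small sketch of what `parse_dates` does, assuming the two datetime columns are named as in the example table. The data is invented and read from an in-memory string rather than the real file. Once the columns are real datetimes, the `.dt` accessor can split out the day and the time:

```python
import io
import pandas as pd

# Made-up tab-separated data using the column names from the example table.
raw = (
    "Session Start Date/Time\tEnd Date/Time\tSession Name\n"
    "2022-06-14 13:00:00\t2022-06-14 14:15:00\tSESSION 1\n"
    "2022-06-15 07:00:00\t2022-06-15 09:15:00\tSESSION 2\n"
)

evt = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    parse_dates=["Session Start Date/Time", "End Date/Time"],
)

# With real datetimes, the .dt accessor can split out the calendar day and the time.
evt["day"] = evt["Session Start Date/Time"].dt.date
evt["start_time"] = evt["Session Start Date/Time"].dt.time
```

The `day` and `start_time` column names are just placeholders for this sketch.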

iron basalt Jul 5, 2022, 7:17 AM

#

royal garnet I want an output that shows me the first session, on each date in the DF for eac...

  Session Start Date/Time        End Date/Time Session Name                            Session ID Speaker Code Full Name Email Address
0     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 1    NAME 1       EMAIL 1
1     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 1    NAME 1       EMAIL 1
2     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 1    NAME 1       EMAIL 1
3     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 1    NAME 1       EMAIL 1
4     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 1    NAME 1       EMAIL 1
5     2022-06-14 13:00:00  2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5       UUID 2    NAME 2       EMAIL 2
6     2022-06-15 07:00:00  2022-06-15 09:15:00    SESSION 2  1009a82f-eaa7-40ba-919e-55eeabee64b5       UUID 2    NAME 2       EMAIL 2
7     2022-06-15 11:00:00  2022-06-15 12:15:00    SESSION 3  2222a82f-eaa7-40ba-919e-55eeabee64b5       UUID 2    NAME 2       EMAIL 2
--------------------------------
                                            End Date/Time Session Name                            Session ID Full Name Email Address
Session Start Date/Time Speaker Code                                                                                                
2022-06-14 13:00:00     UUID 1        2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5    NAME 1       EMAIL 1
                        UUID 2        2022-06-14 14:15:00    SESSION 1  4009a82f-eaa7-4068-919e-55ee38ee64b5    NAME 2       EMAIL 2
2022-06-15 07:00:00     UUID 2        2022-06-15 09:15:00    SESSION 2  1009a82f-eaa7-40ba-919e-55eeabee64b5    NAME 2       EMAIL 2
2022-06-15 11:00:00     UUID 2        2022-06-15 12:15:00    SESSION 3  2222a82f-eaa7-40ba-919e-55eeabee64b5    NAME 2       EMAIL 2
If this is what you want then you can do it with groupby.

#

grouping = df.groupby(["Session Start Date/Time", "Speaker Code"])
print(grouping.first())

#

Except you can split the date up to further get what you want (day / time).

#

(Right now showing every session of each day)

#

(You only want by day, not time)
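
One way the by-day version could look, as a sketch on invented data. It assumes "first session" means the earliest start time, so the frame is sorted before grouping, and it groups on the calendar day instead of the full timestamp:

```python
import pandas as pd

# Toy data in the shape of the example table; values are invented.
df = pd.DataFrame(
    {
        "Session Start Date/Time": pd.to_datetime(
            [
                "2022-06-14 13:00:00",
                "2022-06-14 13:00:00",
                "2022-06-15 11:00:00",
                "2022-06-15 07:00:00",
            ]
        ),
        "Session Name": ["SESSION 1", "SESSION 1", "SESSION 3", "SESSION 2"],
        "Speaker Code": ["UUID 1", "UUID 2", "UUID 2", "UUID 2"],
    }
)

# Sort by start time so that first() picks the earliest session in each group.
df = df.sort_values("Session Start Date/Time")

# Group on the calendar day (not the full timestamp) and the speaker.
day = df["Session Start Date/Time"].dt.date.rename("day")
first_sessions = df.groupby([day, "Speaker Code"]).first()
```

Here `UUID 2` has two sessions on 2022-06-15, and only the 07:00 one (SESSION 2) survives in `first_sessions`.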

noble arrow Jul 5, 2022, 9:41 AM

#

Hi there, I'm studying about keras modelling from two different articles and trying my best to understand how linear probing works
So far I found an article that has this series of code on linear probing in a defined function:

# Single dense layer for linear probing
model.linear_probe = K.Sequential(
    [layers.Input(shape=(width,)), layers.Dense(10)], name="linear_probe"
)

model.encoder.summary()
model.projection_head.summary()
model.linear_probe.summary()

I'm wondering how can I better translate this define function code into this:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
input_layer = Dense(32, input_shape=(8,))
model.add(input_layer)
hidden_layer = Dense(64, activation='relu')
model.add(hidden_layer)
output_layer = Dense(8)
model.add(output_layer)

#

I think my first step can be:

model.linear_probe = K.Sequential()
input_layer = Dense(10, input_shape=(width,))
model.linear_probe.add(input_layer)

I think? I'll also try my best to figure out the width part as well too

unique flame Jul 5, 2022, 9:55 AM

#

I have a question concerning K-fold cross validation for image classification. I am using the function "image_dataset_from_directory" and put validation split on 0.3. I then want to create three instances where the validation data would consist of the first part of the data, then the middle part of the data and then the final part of the data. I was thinking of putting shuffle to "True" and changing the seed each time (e.g. seed=0, seed=100, seed=1000), but I don't think that's correct.

So anyone know a better way to do cross validation on image classification?

mild dirge Jul 5, 2022, 9:57 AM

#

For cross validation you take the entire training data and split it into evenly sized folds

#

A regular value is 5 folds, so each time 80% is training, 20% is validation

#

and that way you have used every bit of your training data for training (4 times) and for validation (1 time)

#

@unique flame

#

I am not really sure what that validation split of 0.3 means, maybe that is for splitting the entire dataset into training and testing?

unique flame Jul 5, 2022, 10:01 AM

#

Yes it is splitting it in training and testing. Well i am splitting into training and validation. For testing I add unlabeled images.

mild dirge Jul 5, 2022, 10:02 AM

#

Okay so you got 30% testing (we keep that untouched until after we are done with the entire training process) and 70% training right

unique flame Jul 5, 2022, 10:02 AM

#

I could set the validation split to 0.2 and do a 5 fold, but like you said every part of the data should have been part of the validation set. And right now I don't know how to do that

mild dirge Jul 5, 2022, 10:02 AM

#

And your question is how to split training into train and validation multiple times?

#

Such that it uses all data for training and testing at least once

unique flame Jul 5, 2022, 10:03 AM

#

yes

mild dirge Jul 5, 2022, 10:03 AM

#

So k-fold cross validation doesn't take a "validation split", it takes an amount of folds

#

like 5

#

#

Here blue is validation for each split, and red/pinkish is training

#

And the test set is kept completely separate

unique flame Jul 5, 2022, 10:06 AM

#

yes, so you mean i should be using another function to load the data?

mild dirge Jul 5, 2022, 10:06 AM

#

No, you used the function "image_dataset_from_directory" and put validation split on 0.3.

#

So that means you already split it into training (green) and testing (purple)

#

Now you need to split training into multiple folds for each split of k-fold cross validation

#

Some pseudo-code for this would be like:

entire_training_data = ...  # the 70% training portion, as a list of samples
for split in range(5):
  split_train = []
  split_valid = []
  for idx, sample in enumerate(entire_training_data):
    # Every 5th sample, offset by the split number, goes to validation
    if idx % 5 == split:
      split_valid.append(sample)
    else:
      split_train.append(sample)

  # Code for training and validation

#

This is assuming there is no pattern in the order of the data
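
If the data might be ordered (for example all images of one class first), a shuffled variant avoids that assumption. This is only a sketch with NumPy; `kfold_indices` is a made-up helper name, and it works on sample indices rather than on the images themselves:

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; every sample is in validation exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)      # breaks any ordering pattern in the data
    folds = np.array_split(shuffled, n_folds)  # list of near-equal index chunks
    for i in range(n_folds):
        valid_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, valid_idx

splits = list(kfold_indices(n_samples=10, n_folds=5))
```

The shuffle happens once, before the folds are cut, so the folds never overlap. Changing the seed for each split instead would let the same sample land in several validation sets. scikit-learn's `KFold` offers the same kind of index splitting if that library is available.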

unique flame Jul 5, 2022, 10:13 AM

#

Thanks, I'll try this. Brain was thinking loud for a few sec

magic mason Jul 5, 2022, 11:41 AM

#

Hello everyone

#

i have an assessment in which i have to implement k-means clustering in python which will read and cluster data
but only using numpy and csv.

#

i dont know about this subject but it is my core subject so i have to study it

#

can anyone provide me any source or help, so i am able to do this

#

i know regarding k-mean clustering but dont know how to do coding part and what if i watch videos that use any other library, will that help me?

#

As i cant find any video which only uses one of those two libraries

mild dirge Jul 5, 2022, 11:45 AM

#

You can do it without either of those two libraries

#

If you understand how k-means clustering works, you can load in the data, and then use some simple for loops to perform iterations of k-means clustering

#

And later simplify it with numpy

#

@magic mason

#

You could also just check if numpy has a certain function that you would think is useful, or maybe just check out a general intro to numpy

#

Stuff like np.mean could be useful f.e.
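
For reference, a rough sketch of what the NumPy version could end up looking like. The function name, the random initialisation from data points, and the toy data are all assumptions for illustration. Loading the CSV is left out; the points are assumed to already be in a 2D float array:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, labels

# Two obvious blobs, made up for the example.
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(data, k=2)
```

The two steps inside the loop (assign each point to its nearest centroid, then recompute each centroid with a mean) are the same ones you would write with plain for loops first.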

magic mason Jul 5, 2022, 11:48 AM

#

Thanks i will have a look

near matrix Jul 5, 2022, 11:53 AM

#

Hi all!

I've made a package to read and write sklearn objects blueprints to YAML.

The goal is to make experiment tracking more convenient.

https://github.com/matheusccouto/scikit-yaml

GitHub

GitHub - matheusccouto/scikit-yaml: Define Scikit-Learn objects usi...

Define Scikit-Learn objects using YAML. Contribute to matheusccouto/scikit-yaml development by creating an account on GitHub.

tidal bough Jul 5, 2022, 12:32 PM

#

I think this is explode

#

looks like explode needs elements to be lists though, not dicts

#

ah, and also not quite the right output, hmm

#

That'd probably work

karmic valley Jul 5, 2022, 12:37 PM

#

hi

#

i am trying to add up all the pixel colours and then divide by number of pixels in this list. however when i do print(colour_average) i am getting [6319.198711063373, 6403.701396348013, 5679.463480128894]. these numbers are much bigger than 255

#

color_sum = [0,0,0]
for coord in coord_list:
    row = coord[0]
    col = coord[1]
    for new_row in range(0, row):
        pixel = im[new_row][col]
        color_sum[0] += pixel[0]
        color_sum[1] += pixel[1]
        color_sum[2] += pixel[2]

color_average = [color_sum[0]/len(coord_list), color_sum[1]/len(coord_list), color_sum[2]/len(coord_list)]
print(color_average)

tidal bough Jul 5, 2022, 12:38 PM

#

the number of elements you're taking the mean over isn't len(coord_list)

#

it's the sum of coord[0] for coord in coord_list

#

the easiest way to fix that would be to just do count += 1 every time you take a pixel into account, and divide by that at the end.
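
A sketch of that fix on a tiny made-up image (nested lists standing in for the array from `cv2.imread`). The only real change from the posted loop is the `count` variable:

```python
# Tiny made-up "image": 3 rows x 2 columns of 3-channel pixels.
im = [
    [[10, 20, 30], [40, 50, 60]],
    [[20, 30, 40], [50, 60, 70]],
    [[0, 255, 0], [0, 255, 0]],
]
# Hypothetical marked coordinates as [row, col], like coord_list in the posted code.
coord_list = [[2, 0], [2, 1]]

color_sum = [0, 0, 0]
count = 0  # number of pixels actually added to color_sum
for row, col in coord_list:
    for new_row in range(0, row):
        pixel = im[new_row][col]
        for channel in range(3):
            # int() keeps the running total a plain Python int even for NumPy uint8 pixels.
            color_sum[channel] += int(pixel[channel])
        count += 1

color_average = [total / count for total in color_sum]
```

Dividing by `count` keeps each average between 0 and 255. If no pixel is ever visited, `count` stays 0 and the division fails, so real code would need to guard against that.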

wooden sail Jul 5, 2022, 12:41 PM

#

if you can describe the desired effect a little more clearly, we can come up with a 2-liner using numpy, too

#

what exactly do you want to average?

#

though i think python also has a mean() in its statistics module

karmic valley Jul 5, 2022, 12:41 PM

#

ill show the whole code so it makes a bit more sense

wooden sail Jul 5, 2022, 12:41 PM

#

no no, that'll make it worse if the code is long

#

just the high level idea

mild dirge Jul 5, 2022, 12:41 PM

#

Can you close your help-channel if you are getting help here @karmic valley Someone is trying to help you there too

karmic valley Jul 5, 2022, 12:42 PM

#

import cv2
from PIL import Image

im = cv2.imread(r"C:\Users\guest\Documents\Education\University Imperial\Module 3\TrackingAI outfolder\test\plot\234496_1024.png", cv2.IMREAD_UNCHANGED)

coord_list = []

for row in range(len(im)):
    for col in range(len(im[row])):
        if im[row][col][2] >= 200 and im[row][col][0] < 100 and im[row][col][1] < 100:
            im[row][col][1] = 255
            im[row][col][0] = 0
            im[row][col][2] = 0

            coord_list.append([row, col])



color_sum = [0,0,0]
for coord in coord_list:
    row = coord[0]
    col = coord[1]
    for new_row in range(0, row):
        pixel = im[new_row][col]
        color_sum[0] += pixel[0]
        color_sum[1] += pixel[1]
        color_sum[2] += pixel[2]

color_average = [color_sum[0]/len(coord_list), color_sum[1]/len(coord_list), color_sum[2]/len(coord_list)]
print(color_average)

cv2.imwrite("output_graph.png", im)
pil_im = Image.open("output_graph.png", 'r')
pil_im.show()

#

oh okay

wooden sail Jul 5, 2022, 12:42 PM

#

you want the average of each of r, g, and b of an image?

karmic valley Jul 5, 2022, 12:44 PM

#

so the whole code is this. it basically looks at an image and wherever there is a red line it notes its coordinate. then it converts red line to green line.

next part of code then is meant to look at those coordinates and work out the average pixel rgb colour below the line but its not working

wooden sail Jul 5, 2022, 12:44 PM

#

can you clarify "below the line"

karmic valley Jul 5, 2022, 12:44 PM

#

so was trying to add up all the rgb pixel values below line and then divide by above line.

yes i will show example 2secs

#

#

so the red line is what im referring to

wooden sail Jul 5, 2022, 12:47 PM

#

so, you find where there is a red pixel, and then you want the average r, g, and b for the rest of the column below that pixel?

karmic valley Jul 5, 2022, 12:50 PM

#

wooden sail so, you find where there is a red pixel, and then you want the average r, g, and...

yes exactly.

#

so will be working out average rgb for everything under red line

tidal bough Jul 5, 2022, 12:50 PM

#

I think you straight up can't get a speedup via apply/np.vectorize unless you're using numpy ufuncs

karmic valley Jul 5, 2022, 12:51 PM

#

but i think my code is a bit wrong because its giving a massive number

#

not between 0 and 255

wooden sail Jul 5, 2022, 12:51 PM

#

it's certainly wrong if it's giving you a large number

tidal bough Jul 5, 2022, 12:51 PM

#

tidal bough the number of elements you're taking the mean over isn't `len(coord_list)`

^

karmic valley Jul 5, 2022, 12:52 PM

#

so the problem lies somewhere in here


color_sum = [0,0,0]
for coord in coord_list:
    row = coord[0]
    col = coord[1]
    for new_row in range(0, row):
        pixel = im[new_row][col]
        color_sum[0] += pixel[0]
        color_sum[1] += pixel[1]
        color_sum[2] += pixel[2]


color_average = [color_sum[0]/len(coord_list), color_sum[1]/len(coord_list), color_sum[2]/len(coord_list)]
print(color_average)

wooden sail Jul 5, 2022, 12:53 PM

#

what i would do is call np.array(the_image) first to get a 3d array. then something like np.mean(my_array[row_with_redpix+1:,current_col,:], axis=0)
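
Roughly what that could look like across several columns, as a sketch. The array and the `coords` list are invented; in the real script the array would come from `cv2.imread` and the coordinates from the red-pixel search:

```python
import numpy as np

# Made-up 4x2 image with 3 channels; real code would get this from cv2.imread.
my_array = np.arange(4 * 2 * 3, dtype=np.uint8).reshape(4, 2, 3)

# Hypothetical (row, col) positions of the marked pixels, one per column.
coords = [(0, 0), (1, 1)]

# Collect every pixel strictly below each marked pixel in its column.
below = [my_array[row + 1:, col, :] for row, col in coords]
pixels = np.concatenate(below, axis=0)

# Mean over the pixel axis leaves one value per channel, always within 0..255.
color_average = pixels.mean(axis=0)
```

Row 0 is the top of the image, so `row + 1:` selects the rows underneath the marked pixel, whereas `range(0, row)` in the posted loop walks the rows above it.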

karmic valley Jul 5, 2022, 12:54 PM

#

sorry im new to coding. took me 3months to write these 20 lines of code lol.

so where exactly do i write this np.array

wooden sail Jul 5, 2022, 12:54 PM

#

hmm in that case it's probably better if you don't use numpy arrays, but debug your code instead

karmic valley Jul 5, 2022, 12:55 PM

#

i think maybe the maths/logic behind this part of code not right but cant figure it out

#

for new_row in range(0, row):
    pixel = im[new_row][col]
    color_sum[0] += pixel[0]
    color_sum[1] += pixel[1]
    color_sum[2] += pixel[2]

tidal bough Jul 5, 2022, 12:57 PM

#

Actually, looks like it's not quite zero-speedup? The code for non-ufuncs is complicated:
https://github.com/pandas-dev/pandas/blob/e8093ba372f9adfe79439d90fe74b0b5b6dea9d6/pandas/core/apply.py#L1128-L1147=
but looks like it ends up using map_infer, which is a Cython function:
https://github.com/pandas-dev/pandas/blob/7e23a37e1c5bda81234801a6584563e2880769eb/pandas/_libs/lib.pyx#L2869=
So it should be a bit faster than a Python loop at least, even when using normal Python functions. Probably. Needs measurements.

#

(That's about apply, note. np.vectorize, I remember from reading the source code, does literally just use a normal Python loop when applied to a Python function. Unless they changed that.)

serene scaffold Jul 5, 2022, 1:02 PM

#

You can use dask 😂

#

Did you concat on the wrong axis?

#

What's the index? Because concat joins on that.

#

BlobFearSweat

#

I was making a joke. But dask can process a bunch of independent csv files as one DataFrame, provided that they have the same schema

#

And it can distribute operations across an arbitrary number of cores.

wooden sail Jul 5, 2022, 1:18 PM

#

😌 reminds me of that meme, "i paid for the full computer, i'm gonna use all of it"

primal shuttle Jul 5, 2022, 1:19 PM

#

Seems like it's more of a "Gonna use all of it, whether I like it or not"

#

😉

hallow turret Jul 5, 2022, 1:31 PM

#

guys, I would like to start with ML but i dont knw how.

#

help

#

please

#

:3

primal shuttle Jul 5, 2022, 1:32 PM

#

Forgive the slightly ironic comment: do you have a machine, @hallow turret ?

#

If you do, start learning 😇

hallow turret Jul 5, 2022, 1:33 PM

#

dude

primal shuttle Jul 5, 2022, 1:34 PM

#

It's just there are so many resources online that any google search gives you so much information on how to go from zero to hero it's not anyone's but your task to personalise learning to your needs

hallow turret Jul 5, 2022, 1:37 PM

#

nice

#

i see

primal shuttle Jul 5, 2022, 1:38 PM

#

I mean no disrespect, it's very difficult to advise anything on this - it's like a medical student asking what kind of doctor they want to be - nobody can make the decision for them

steady basalt Jul 5, 2022, 1:40 PM

#

hallow turret guys, I would like to start with ML but i dont knw how.

Try coding an ml project using public data set

#

UCI is good

#

And of course , use python

#

Unless u only know other Lang

#

R is passable but not as flexible

#

C++ is a joke for 99% of ml needs

hallow turret Jul 5, 2022, 1:44 PM

#

bruh im just starting with ai and python can you recommend me how should i start learning ai...

#

https://tenor.com/view/awkward-umm-what-what-gif-14694719

Tenor

primal shuttle Jul 5, 2022, 1:46 PM

#

https://www.python.org/about/gettingstarted/ - that's a good starting point for python, way back when I was learning through datacamp, but I haven't checked them out in years, then pick a project you are interested in and do it - be ready to stomach lots of frustration 🙂 @hallow turret

Python.org

Python For Beginners

The official home of the Python Programming Language

jolly knoll Jul 5, 2022, 2:02 PM

#

Hello people. I recently came across Approximate Nearest Neighbour and was wondering, if I have a master dataset that consists of datasets A,B,C; is it theoretically possible to ensure my output is only from dataset C?

frail thistle Jul 5, 2022, 2:14 PM

#

A bit advanced yet simple question:

While returning the result of a layer to a variable tensor, how can I make that tensor require grad?

self.X = self.conv(x)

I want X to record grads

tidal bough Jul 5, 2022, 2:16 PM

#

frail thistle A bit advanced yet simple question: While returning the result of a layer to a ...

.requires_grad_(), I believe.
https://pytorch.org/docs/stable/generated/torch.Tensor.requires_grad_.html#torch.Tensor.requires_grad_

frail thistle Jul 5, 2022, 2:17 PM

#

Thank you, however, to my knowledge, that X must be pre defined

tidal bough Jul 5, 2022, 2:17 PM

#

Not sure what you mean. Whenever you assign any tensor to self.X, mark that tensor as requiring grad.

#

If you assign to self.X in many places and that's annoying, you can use a property to automate it.

frail thistle Jul 5, 2022, 2:18 PM

#

Oh you are right since grad will be created in backwards pass!

#

The problem was that I don't want to predefine "X" and to be able to record grads

#

Somehow was thinking I need to do requires grad at the assignment

serene scaffold Jul 5, 2022, 2:25 PM

#

please don't ask people to read screenshots of text

#

!code

arctic wedgeBOT Jul 5, 2022, 2:25 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

frail thistle Jul 5, 2022, 2:26 PM

#

tidal bough Not sure what you mean. Whenever you assign any tensor to `self.X`, mark that te...

After the assignment, using the .requires_grad_ doesn't track the grads

onyx tulip Jul 5, 2022, 2:26 PM

#

        ('rf', RandomForestClassifier()),
        ('abc', AdaBoostClassifier()),
        ('svc', SVC())]

bstc = StackingClassifier(estimators=bestimators, final_estimator=LogisticRegression())

stc_params = {
    'rf__n_estimators': [100,150, 200],
    'rf__criterion': ['entropy'],
    'rf__bootstrap': [True],
    'rf__oob_score': [True],
    'rf__max_depth': [10],
    'rf__random_state': [5],
    'abc__base_estimator': [DecisionTreeClassifier],
    'abc__n_estimators': [100, 150, 200],
    'abc__learning_rate': [1.0],
    'abc__random_state': [5],
    'svc__C': [1.0],
    'svc__kernel': ['rbf'],
    'svc__gamma': ['auto'],
    'svc__random_state':[5],
    'final_estimator__penalty':['l2'],
    'final_estimator__C':[1.0],
    'final_estimator__fit_intercept': [True],
    'final_estimator__solver': ['liblinear']
}

stc_gs = GridSearchCV(estimator=bstc_ ,param_grid=stc_params, cv=5, n_jobs=4)
stc_gs.fit(X_train, y_train)

onyx tulip Jul 5, 2022, 2:38 PM

#

onyx tulip ```py ...

bump

tidal bough Jul 5, 2022, 2:39 PM

#

frail thistle After the assignment, using the .requires_grad_ doesn't track the grads

Do you maybe mean that you need to also track the grad of applying self.conv? If so, you need to apply .requires_grad_() to x, before you do self.conv(x).

jolly knoll Jul 5, 2022, 2:42 PM

#

jolly knoll Hello people. I recently came across Approximate Nearest Neighbour and was wonde...

Anyone knows about this? Would appreciate a reply regarding this!

fallow remnant Jul 5, 2022, 2:42 PM

#

question: if I'm using a linear lasso model to train against a right skewed data set, should I set my alpha to 0.01, 0.1, or 1?

mild dirge Jul 5, 2022, 2:49 PM

#

onyx tulip bump

?

onyx tulip Jul 5, 2022, 2:55 PM

#

onyx tulip ```py ...

anyone knows how to combine stacking estimators and gridsearch cv together?

#

keep getting an error. TypeError: Cannot clone object. You should provide an instance of scikit-learn estimator instead of a class.

gloomy glen Jul 5, 2022, 3:08 PM

#

how to process a document using LayoutLM model

#

cant understand where to give the input image or how to process

#

can anyone please guide me

steady basalt Jul 5, 2022, 3:23 PM

#

onyx tulip ```py ...

What’s with double underscores

mild dirge Jul 5, 2022, 3:23 PM

#

@onyx tulip Where is bstc_ defined?

#

Also you haven't given the full error traceback
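
Going only by the snippet as posted, a likely cause of that TypeError is `'abc__base_estimator': [DecisionTreeClassifier]`, which puts the class itself in the grid instead of an instance like `DecisionTreeClassifier()`. GridSearchCV clones estimators while fitting, and scikit-learn's `clone` refuses a bare class. A minimal sketch of just that behaviour, separate from the stacking setup:

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

# Passing the class itself (no parentheses) is what clone() rejects.
try:
    clone(DecisionTreeClassifier)
    class_error = None
except TypeError as exc:
    class_error = str(exc)

# Passing an instance works; clone() returns a fresh unfitted copy with the same parameters.
instance_copy = clone(DecisionTreeClassifier(max_depth=1))
```

On the double underscores: that is scikit-learn's syntax for nested parameters, `<estimator name>__<parameter>`, so `rf__n_estimators` means the `n_estimators` parameter of the estimator registered as `'rf'`. The posted call also passes `bstc_` while the stacking classifier is assigned to `bstc`, which would be a separate NameError unless `bstc_` is defined elsewhere.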

fallow remnant Jul 5, 2022, 3:26 PM

#

fallow remnant question: if I'm using a linear lasso model to train against a right skewed data...

^

proper salmon Jul 5, 2022, 4:00 PM

#

I've been working on making an AI chat bot for discord, is anyone interested in trying it out?

#

It utilizes modified version of GPT-3

main fox Jul 5, 2022, 4:00 PM

#

proper salmon I've been working on making an AI chat bot for discord, is anyone interested in ...

Has it improved since last time?

royal garnet Jul 5, 2022, 4:00 PM

#

Did it turn out racist like the Microsoft one?

#

(kidding)

proper salmon Jul 5, 2022, 4:01 PM

#

main fox Has it improved since last time?

Absolutely

proper salmon Jul 5, 2022, 4:01 PM

#

royal garnet Did it turn out racist like the Microsoft one?

nerd

mild dirge Jul 5, 2022, 4:01 PM

#

Is it sentient?

proper salmon Jul 5, 2022, 4:01 PM

#

No it's very neutral suh

proper salmon Jul 5, 2022, 4:01 PM

#

mild dirge Is it sentient?

Arguably, wanna try?

mild dirge Jul 5, 2022, 4:01 PM

#

Where can I try it?

main fox Jul 5, 2022, 4:01 PM

#

GPT-3 seems to have some sort of censor for racism, to some extent at least.

proper salmon Jul 5, 2022, 4:01 PM

#

I just need to invite you to my server if that's okay

royal garnet Jul 5, 2022, 4:02 PM

#

You should let it loose on some unsuspecting server, and just see what happens.

proper salmon Jul 5, 2022, 4:02 PM

#

GIGACHAD

royal garnet Jul 5, 2022, 4:02 PM

#

Like that guy that unleashed the bot on 4chan.

#

seychelles chan or whatever it was.

proper salmon Jul 5, 2022, 4:03 PM

#

Never heard of it

main fox Jul 5, 2022, 4:03 PM

#

I like the one that makes greentext posts

royal garnet Jul 5, 2022, 4:03 PM

#

Oh dude

#

https://www.youtube.com/watch?v=efPrtcLdcdM

YouTube

Yannic Kilcher

This is the worst AI ever

#gpt4chan #4chan #ai

GPT-4chan was trained on over 3 years of posts from 4chan's "politically incorrect" (/pol/) board.
(and no, this is not GPT-4)

EXTRA VIDEO HERE: https://www.youtube.com/watch?v=dQw4w9WgXcQ

Website (try the model here): https://gpt-4chan.com
Model (no longer available): https://huggingface.co/ykilcher/gpt-4chan
Code: http...


proper salmon Jul 5, 2022, 4:03 PM

#

Ohhhh yeah that bot

#

It's funny

royal garnet Jul 5, 2022, 4:04 PM

#

If I wasn't at work - I'd jump on and play with the bot - but sadly I'm not really free atm.

proper salmon Jul 5, 2022, 4:04 PM

#

It's cool nerd

royal garnet Jul 5, 2022, 4:07 PM

#

What kind of hardware do you need to train an ai model anyway? I'd be curious to play with some - but I just have a mid-range gaming pc.

#

Let me rephrase - what kind of hardware is needed to train one in a reasonable amount of time.

proper salmon Jul 5, 2022, 4:10 PM

#

AI operations run from GPU memory, so system memory isn't usually a bottleneck and servers typically have 128 to 512 GB of DRAM.

#

Regarding time though... that can take a long time

gloomy glen Jul 5, 2022, 4:17 PM

#

gloomy glen how to process a document using LayoutLM model

https://huggingface.co/docs/transformers/model_doc/layoutlm

LayoutLM

royal garnet Jul 5, 2022, 4:17 PM

#

I've got an rtx 3070 - can that be used?

proper salmon Jul 5, 2022, 4:18 PM

#

Yeah I don't see why not

#

But I'd recommend just renting a GPT-3 language model rather than trying to train one yourself, if that's what you're trying to do

serene scaffold Jul 5, 2022, 4:32 PM

#

royal garnet I've got an rtx 3070 - can that be used?

that GPU is CUDA-enabled, so you can use it for any CUDA computation up to its memory limit, which I believe is 8GB.

royal garnet Jul 5, 2022, 4:33 PM

#

Correct - but is 8gb enough for any sort of meaningful ai model training?

serene scaffold Jul 5, 2022, 4:36 PM

#

it really depends. I suspect that most state-of-the-art models for a given task use significantly more than 8GB, because organizations that can afford the talent to develop those models can also afford top-tier hardware. but that doesn't mean that similar performance couldn't possibly be achieved with smaller models.
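
For a rough sense of whether 8GB is enough, a back-of-the-envelope estimate helps. The sketch below assumes 32-bit weights and an Adam-style optimizer that keeps two extra values per parameter. It ignores activations, which often dominate, so treat the numbers as a lower bound. The function name and the example parameter counts are made up for illustration.

```python
def training_memory_gb(n_params, bytes_per_param=4, optimizer_states=2):
    """Rough lower bound: weights + gradients + optimizer states, no activations."""
    # One copy for weights, one for gradients, plus the optimizer's extra values
    # (assumption: two per parameter, as with Adam's running moments).
    copies = 1 + 1 + optimizer_states
    return n_params * bytes_per_param * copies / 1024**3


# Hypothetical model sizes, chosen only to show the scaling.
for label, n_params in [("25M", 25_000_000), ("350M", 350_000_000), ("1.5B", 1_500_000_000)]:
    print(f"{label} parameters: ~{training_memory_gb(n_params):.1f} GB before activations")
```

Under these assumptions each parameter costs about 16 bytes during training, so an 8GB card is comfortable for small models and runs out well before the billion-parameter range.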

manic heron Jul 5, 2022, 4:53 PM

#

interested if anyone has feedback here: https://www.reddit.com/r/Python/comments/vs2b6d/i_analyzed_1835_hospital_price_lists_so_you_didnt/?

r/Python - I analyzed 1835 hospital price lists so you didn't have to

0 votes and 0 comments so far on Reddit

#

https://github.com/alecstein/dolt_datascience/blob/master/hospitals_v3/cleanup/cleaning-hospitals-dataset.ipynb

GitHub

dolt_datascience/cleaning-hospitals-dataset.ipynb at master · alecs...

notebooks used to analysis projects. Contribute to alecstein/dolt_datascience development by creating an account on GitHub.

#

i'm not a real programmer, so any and all criticism is welcome

misty flint Jul 5, 2022, 5:42 PM

#

i'm not a real programmer
me

#

kekHands

night sequoia Jul 5, 2022, 5:54 PM

#

Hey guys ! I have written a kaggle notebook on Training Models ( a chapter in Hand's on Machine learning book ) and I have added the key points in that lesson and have explained the code , have a look at it and give your feedback . Cheers! LINK : https://www.kaggle.com/code/supreeth888/training-models-hand-s-on-machine-learning/notebook

Training Models-Hand's-On Machine Learning

Explore and run machine learning code with Kaggle Notebooks | Using data from No attached data sources

hollow sentinel Jul 5, 2022, 6:01 PM

#

!pastebin

arctic wedgeBOT Jul 5, 2022, 6:01 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 5, 2022, 6:01 PM

#

https://paste.pythondiscord.com/pocehuyufe

#

Traceback (most recent call last):
File "/Users/rahuldas/Desktop/ICH-CAHPS Survey Analysis/ICH-CAHPS Survey Analysis.py", line 27, in <module>
], axis = 1)
File "/Users/rahuldas/Library/Python/3.7/lib/python/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/rahuldas/Library/Python/3.7/lib/python/site-packages/pandas/core/frame.py", line 4913, in drop
errors=errors,
File "/Users/rahuldas/Library/Python/3.7/lib/python/site-packages/pandas/core/generic.py", line 4150, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/rahuldas/Library/Python/3.7/lib/python/site-packages/pandas/core/generic.py", line 4185, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/rahuldas/Library/Python/3.7/lib/python/site-packages/pandas/core/indexes/base.py", line 6017, in drop
raise KeyError(f"{labels[mask]} not found in axis")
KeyError: "['Lower box percent of patients-providing information to patients'\n 'Lower box percent of patients-rating of the nephrologist'\n 'Lower box percent of patients-rating of the dialysis center staff'\n 'Top box percent of patients-rating of the dialysis center staff'\n 'Middle box percent of patients-rating of the dialysis facility'] not found in axis"

#

the key error means the column doesn't exist in the dataframe

#

but i know it exists

#

!pastebin

arctic wedgeBOT Jul 5, 2022, 6:07 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 5, 2022, 6:07 PM

#

https://paste.pythondiscord.com/odayulagod

#

does it all have to be on one line?

hollow sentinel Jul 5, 2022, 6:30 PM

#

or maybe it’s bc there’s a typo?

#

i don’t see a typo here

mild dirge Jul 5, 2022, 6:44 PM

#

hollow sentinel i don’t see a typo here

Not really sure, but wouldn't it be simpler to just take the columns that you do actually want

#

instead of dropping 90% of them
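
A minimal sketch of that suggestion, selecting the wanted columns instead of dropping the rest. The toy DataFrame is hypothetical, and the trailing space in one header is only an assumed cause of the "not found in axis" KeyError; the real file may differ.

```python
import pandas as pd

# Hypothetical stand-in for the survey data.
df = pd.DataFrame({
    "Star rating of the nephrologist": [4, 3],
    "Star rating of the dialysis facility ": [5, 2],  # note the trailing space
    "Lower box percent of patients-rating of the nephrologist": [10, 20],
})

wanted = [
    "Star rating of the nephrologist",
    "Star rating of the dialysis facility",
]

# Labels that would raise KeyError if used as-is.
missing = [col for col in wanted if col not in df.columns]
print("not found:", missing)

# Strip stray whitespace from the headers, then select only what is needed.
df.columns = df.columns.str.strip()
subset = df[wanted]
print(subset)
```

Printing `df.columns.tolist()` and comparing it against the requested labels is usually the quickest way to see why a column that "exists" is not found.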

hollow sentinel Jul 5, 2022, 6:54 PM

#

yeah that’s true

lofty elk Jul 5, 2022, 7:18 PM

#

Should I learn MatPlotLib or Plotly ?

hollow sentinel Jul 5, 2022, 7:41 PM

#

!pastebin

arctic wedgeBOT Jul 5, 2022, 7:41 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 5, 2022, 7:41 PM

#

https://paste.pythondiscord.com/utepuhijeq

#

another key error

#

bruh

#

oh

#

i just wanted to select those specific features

#

is there a way to do it?

#

ohh

steady basalt Jul 5, 2022, 7:49 PM

#

If you’re modelling multiple linear regression of a continuous variable against a binary variable plus confounders, does it have to be a generalised model?

hollow sentinel Jul 5, 2022, 7:57 PM

#

!pastebin

arctic wedgeBOT Jul 5, 2022, 7:57 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 5, 2022, 7:58 PM

#

https://paste.pythondiscord.com/yagihoqofu

#

hmm

#

is it bc profit v non profit is not 1 or 0?

steady basalt Jul 5, 2022, 8:03 PM

#

And is the equation y’=B0 + x1B1

hushed sail Jul 5, 2022, 8:26 PM

#

Hi everyone! I need to use this model for one of my applications: https://github.com/hukenovs/hagrid
Should I be looking for a powerful PC to train the model? And why can't they just upload the models as files?

Thanks in advance 🙂

GitHub

GitHub - hukenovs/hagrid: HAnd Gesture Recognition Image Dataset

HAnd Gesture Recognition Image Dataset. Contribute to hukenovs/hagrid development by creating an account on GitHub.

mild dirge Jul 5, 2022, 8:29 PM

#

They already supply pre-trained models @hushed sail

#

https://pytorch.org/tutorials/beginner/saving_loading_models.html

#

It downloads a .pth file, and this link shows how to load such a model I believe

#

And some of these models are very light-weight, so it should even be possible on a laptop I think

hushed sail Jul 5, 2022, 8:31 PM

#

I didn't even notice. Thank you very much, you helped me a lot!

mild dirge Jul 5, 2022, 8:32 PM

#

I'm not completely sure how to load the model if you don't have the model class

hushed sail Jul 5, 2022, 8:32 PM

#

mild dirge I'm not completely sure how to load the model if you don't have the model class

It's in the repository

mild dirge Jul 5, 2022, 8:32 PM

#

They have a demo.py file that loads a pre-trained model It seems, so probably look into that

#

Oh right, yeah it's in there

hushed sail Jul 5, 2022, 8:33 PM

#

Yeah, thanks 🙂

mild dirge Jul 5, 2022, 9:33 PM

#

Yeah, if you only want data of that category, you should filter it beforehand

#

@mild pecan

#

You should also keep x and y together when splitting into training and testing, such that Y still matches with X

#

And then you could just separate them, as they would be in the same order

mild pecan Jul 5, 2022, 9:36 PM

#

That was exactly my thought process, so what I suggested sounds right, even though it seems to be against the order of the task?

mild dirge Jul 5, 2022, 9:37 PM

#

Not really sure how you "set a column as target variable"

#

normally you do something like this

y_col = 'annual_premium'
y = insurance_df[y_col]
X = insurance_df[insurance_df.columns.drop(y_col)]

#

Which is just making two new dataframes

#

one for y, and one for X
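
A small sketch of the two points above, using pandas only: split the rows first while X and y are still in one frame, then separate the target. The column names other than `annual_premium` and all the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'annual_premium' name comes from the snippet above.
insurance_df = pd.DataFrame({
    "age": [23, 35, 41, 52, 29, 60, 47, 33],
    "bmi": [21.0, 27.5, 30.1, 25.2, 22.8, 31.4, 26.0, 24.3],
    "annual_premium": [1200, 1800, 2500, 2700, 1400, 3900, 2600, 1700],
})
y_col = "annual_premium"

# Split rows while features and target are still together, so they stay aligned.
train_df = insurance_df.sample(frac=0.75, random_state=0)
test_df = insurance_df.drop(train_df.index)

# Only now separate the target from the features.
X_train, y_train = train_df.drop(columns=y_col), train_df[y_col]
X_test, y_test = test_df.drop(columns=y_col), test_df[y_col]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Because both halves of each pair come from the same rows, the indexes of X and y match by construction.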

mild pecan Jul 5, 2022, 9:39 PM

#

This relates to regression models. DEFAULT_FLAG becomes the target variable which will be predicted with the help of the other 9 columns/variables

mild dirge Jul 5, 2022, 9:41 PM

#

I understand the meaning of X and y, I just don't see how to "mark it" in a pandas dataset

#

It seems to me that you would just create two new dataframes

#

That's how I've been doing it at least

mild pecan Jul 5, 2022, 9:45 PM

#

mild dirge It seems to me that you would just create two new dataframes

Yep, thats what I am doing too

mild dirge Jul 5, 2022, 10:05 PM

#

This seems to just re-iterate what I already thought though right?

#

There's not really a method to "mark a column as target variable"

#

It's just splitting it into two dataframes

#

Not really sure what you are trying to show

#

Yes, that is what I looked at

#

iris_X, iris_y = datasets.load_iris(return_X_y=True) This is how they define X and y

#

as two separate variables, not in 1 dataframe

#

So that confirms what I said yes

rough mountain Jul 5, 2022, 11:09 PM

#

I was recently reminded of https://botnik.org/content/harry-potter.html and was wondering how you would approach something like this today. Transformers are currently all the rage, but they seem poor at generating large amounts of text. I also doubt fine-tuning would work well in a fantasy setting (most of its learning has been done with text from our real world). LSTMs seem to remain a decent option. A text GAN seems perfect for something like this, but I've heard mixed reviews.

proper salmon Jul 5, 2022, 11:11 PM

#

With GPT-4 on the horizon, an upgrade to any GPT-3 chatbot should be easy if the api stays the same.

steady basalt Jul 5, 2022, 11:16 PM

#

GPT4 nxt yr?

#

do u think that a couple of companies are cornering the language model market?

#

i wonder what the future holds for nlp beyond gpt4, i doubt it can get much more advanced

#

im weighing my options of specialising/training in NLP or CV, can only rly choose one to focus on

charred light Jul 5, 2022, 11:30 PM

#

If I have a dataset of online orders, and I'm predicting profit. Logically speaking I can't use the column sales right? Since that would be basically feature leakage?

rough mountain Jul 5, 2022, 11:35 PM

#

What's in the sales column?

serene scaffold Jul 5, 2022, 11:41 PM

#

charred light If I have a dataset of online orders, and I'm predicting profit. Logically speak...

are you doing time series forecasting?

proper salmon Jul 6, 2022, 12:00 AM

#

steady basalt GPT4 nxt yr?

Rumor that it comes out next month

charred light Jul 6, 2022, 12:01 AM

#

serene scaffold are you doing time series forecasting?

No, just predicting Profit. Given variables from here: https://www.kaggle.com/datasets/vivek468/superstore-dataset-final

proper salmon Jul 6, 2022, 12:01 AM

#

And yeah I don't doubt that a couple companies would corner that language model.

#

GPT3 is already super expensive I can't imagine how expensive GPT4 would be

charred light Jul 6, 2022, 12:03 AM

#

Since profit = Sales - Cost, there is correlation between the two. To me, doesn't make sense to use sales.

#

It's not really possible to do time series analysis since the time periods are not uniform.

hollow sentinel Jul 6, 2022, 12:16 AM

#

!pastebin

arctic wedgeBOT Jul 6, 2022, 12:16 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 6, 2022, 12:16 AM

#

https://paste.pythondiscord.com/nesozadifu

#

!pastebin

arctic wedgeBOT Jul 6, 2022, 12:18 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Jul 6, 2022, 12:18 AM

#

https://paste.pythondiscord.com/suyowokule

#

i think the solution here is to turn the profit v non profit column to ones and zeroes

#

the problem is that would split it into two columns

#

so how do i keep it as one column with ones and zeroes?

#

i could commit a sin and iterate through the entire column, change "Profit" to 1 and "Non-profit" to 0

#

i honestly don't know

#

i could use .replace
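
One way to keep it as a single 0/1 column is sketched below. It uses `Series.map` with a dictionary in place of `.replace`; the column name and the values are assumptions, since the real data is only in the paste links.

```python
import pandas as pd

# Hypothetical column name and values.
df = pd.DataFrame({"Profit or Non-Profit": ["Profit", "Non-profit", "Profit"]})

# Explicit mapping; any label missing from the dict becomes NaN, which makes
# unexpected spellings easy to spot.
mapping = {"Profit": 1, "Non-profit": 0}
df["is_profit"] = df["Profit or Non-Profit"].map(mapping)

print(df)
```

No loop over the rows is needed, and the result stays in one column rather than the two that one-hot encoding would produce.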

#

holy fucking shit

#

i did it

#

am smort

#

there is a surprisingly strong classification model to predict if a dialysis facility is profit or nonprofit with ratings with these features "Star rating of nephrologists' communication and caring", "Star rating of quality of dialysis center care and operations",
"Star rating of providing information to patients", "Star rating of the nephrologist", "Star rating of the dialysis center staff",
"Star rating of the dialysis facility"

#

shit

#

my model overfitted

thorny aurora Jul 6, 2022, 2:15 AM

#

i have a question, so i'm trying to make a machine learning model and i have an input that is one of two values('L' or 'R'), do i have to one hot encode them or can i just convert to 0 and 1?

serene scaffold Jul 6, 2022, 2:34 AM

#

thorny aurora i have a question, so i'm trying to make a machine learning model and i have an ...

yes

#

well actually no

#

you would encode L as [1, 0] and R as [0, 1]. or vice versa.

thorny aurora Jul 6, 2022, 2:34 AM

#

okay

#

wait, can you convert a value in a column to an array? or do you mean split them into two columns

serene scaffold Jul 6, 2022, 2:36 AM

#

thorny aurora wait, can you convert a value in a column to an array? or do you mean split them...

the point is that for as many unique values there can be, there are vectors with that many elements that are all 0. and then you assign each value an index in those vectors. and for whichever value a given vector is intended to represent, you make the element at that value's index 1.

#

how you accomplish that is up to you.
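
The description above can be sketched in plain Python as follows. The helper name `one_hot` is made up for illustration, and the categories are sorted only so that the index assignment is reproducible.

```python
def one_hot(values):
    """Return (categories, vectors) with one 0/1 vector per input value."""
    categories = sorted(set(values))                      # one slot per unique value
    index = {cat: i for i, cat in enumerate(categories)}  # value -> slot position
    vectors = []
    for value in values:
        vec = [0] * len(categories)   # start with all zeros
        vec[index[value]] = 1         # set the slot for this value
        vectors.append(vec)
    return categories, vectors


cats, encoded = one_hot(["L", "R", "R", "L"])
print(cats)
print(encoded)
```

With only two categories the second element is always one minus the first, so a single 0/1 column carries the same information.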

cold saddle Jul 6, 2022, 2:42 AM

#

For data science questions, is it best to ask for help here or in a help channel? I'm looking for assistance with my neuralprophet model

novel python Jul 6, 2022, 2:45 AM

#

You can ask here

cold saddle Jul 6, 2022, 2:49 AM

#

I asked in #help-cake

#

With neuralprophet I am trying to forecast my shipping container volume. The issue is that since my date information is rubbish, I have many days that show 0. To address this I thought to group the data by week, which should be a sufficient fix as I want a weekly forecast anyway. The issue is that the forecast goes below 0, which is not possible. What is the proper way, if any, to address this?

thorny aurora Jul 6, 2022, 2:57 AM

#

serene scaffold how you accomplish that is up to you.

one hot encoding and just changing the column values to 0 or 1 got the same result

cold saddle Jul 6, 2022, 3:13 AM

#

My MAE is also very high 😦

mystic gulch Jul 6, 2022, 3:33 AM

#

I'm looking for help with openAI, is this the right spot? or is there a better server? I just want to know how to get the summarization on openai to return a summary that is a complete sentence.

cold saddle Jul 6, 2022, 3:55 AM

#

dont ask to ask i have learned

slate cave Jul 6, 2022, 5:43 AM

#

I'm putting this in AI because it's speech related. I am looking for something to decompose speech into International Phonetic Alphabet (IPA). I ran across a great project named Allosaurus that did exactly what I wanted but it has a few limitations - in particular it gives back durations that are all a fixed time. This causes problems. The use-case is to map spoken words into visemes (think like animations or vtubers). Amazon Polly returned good data but it was only on generated speech. Papagayo is an open source project that sort of accomplishes the same thing but it's manual.

Anyone know of anything I should try?

royal garnet Jul 6, 2022, 7:14 AM

#

In Pandas, how do I select a row based on a condition, and then cast that entire row to a list? That condition being, say, the min value in a datetime column?

#

or in a for loop, append that row to a new df

tidal bough Jul 6, 2022, 7:16 AM

#

e.g. df[df["datetime"]==df["datetime"].min()]. This will be a slice of the original dataframe. Note that it might have more than one row, if the min value repeats more than once.

royal garnet Jul 6, 2022, 7:17 AM

#

That is confusing to me - why are we doing df[df['column']?

#

instead of say df['column']

#

Oh wait - its a conditional statement inside the brackets?

#

In plain english what is that line of code saying exactly?

tidal bough Jul 6, 2022, 7:19 AM

#

Comparisons on a Series result in a Series of booleans.

#

So df["datetime"]==df["datetime"].min() is a Series of booleans - for each element, whether it's equal to df["datetime"].min().

#

That Series can then be used as an index to select only these rows.
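
A short sketch of the steps just described, on a made-up three-row frame. The column name `datetime` follows the messages above; `value` and the dates are invented.

```python
import pandas as pd

# Hypothetical data; the minimum date appears twice on purpose.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2022-07-03", "2022-07-01", "2022-07-01"]),
    "value": [10, 20, 30],
})

# Step 1: the comparison yields one boolean per row.
mask = df["datetime"] == df["datetime"].min()
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows where it is True.
earliest = df[mask]

# Each matching row as a plain list.
rows_as_lists = earliest.values.tolist()

# Alternative: idxmin gives the label of the first minimum, .loc fetches that row.
first_row = df.loc[df["datetime"].idxmin()].tolist()
print(first_row)
```

The mask version returns every row that ties for the minimum, while the `idxmin` version returns only the first one.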

royal garnet Jul 6, 2022, 7:20 AM

#

Ahh I see

#

hence df[that whole comparison operation]

#

you're saying: in this df, select any row where df['datetime'] == df['datetime'].min() is true

delicate apex Jul 6, 2022, 7:21 AM

#

royal garnet In Pandas, how do I select a row based on a condition, and then cast that entire...

~~!d pandas.DataFrame.idxmin~~
(bot not like my summon, see next one)
https://stackoverflow.com/a/10202789
SO post is for max, but the min analog seems to be something like df.loc[df['COL_GO_HERE'].idxmin()].tolist()

Stack Overflow

Find row where values for column is maximal in a pandas DataFrame

How can I find the row for which the value of a specific column is maximal?

df.max() will give me the maximal value for each column, I don't know how to get the corresponding row.

#

!d pandas.DataFrame.idxmin

arctic wedgeBOT Jul 6, 2022, 7:21 AM

#

pandas.DataFrame.idxmin


DataFrame.idxmin(axis=0, skipna=True)
Return index of first occurrence of minimum over requested axis.

NA/null values are excluded.

royal garnet Jul 6, 2022, 7:23 AM

#

Thanks, looks like two methods to do what I want - I'll experiment with those!

#

and thanks @tidal bough for the nice explanation of what your suggestion is doing.

celest vine Jul 6, 2022, 7:38 AM

#

How to extract customers from a sales dataset who have purchased from the website more than once? Basically repeat customers

#

Someone please help with the logic
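
One possible shape for the logic is to count orders per customer and keep those with more than one. The sketch below uses only the standard library and assumes one record per order with a `customer_id` field; both the field names and the data are hypothetical.

```python
from collections import Counter

# Hypothetical order records, one per order.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C2"},
    {"order_id": 3, "customer_id": "C1"},
    {"order_id": 4, "customer_id": "C3"},
    {"order_id": 5, "customer_id": "C2"},
]

# Count how many orders each customer placed.
order_counts = Counter(order["customer_id"] for order in orders)

# Repeat customers are those with more than one order.
repeat_customers = sorted(cid for cid, n in order_counts.items() if n > 1)
print(repeat_customers)
```

If the dataset has one row per line item instead of one per order, the count should be over distinct order ids per customer rather than over rows.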

serene scaffold Jul 6, 2022, 8:22 AM

#

slate cave I'm putting this in AI because speech related. I am looking for something to dec...

I work as a computational linguist, and I've never heard of a model that can do this. You might try asking in a server that's even more specialized.

#

Also, the level of detail in the transcription matters

astral vigil Jul 6, 2022, 8:51 AM

#

hi everyone, i have a question. i'm still new to machine learning and my first project is to make predictions using regression. i think i have several issues in my model after reading some papers about multicollinearity, and there is a method to check for it called VIF (found it on the internet). does multicollinearity really affect the model's accuracy, or is it going to cause problems for the model in the future? btw i used the OLS method

urban prism Jul 6, 2022, 10:11 AM

#

I'm trying to use a data generator in pytorch. Is there a way I can work around splitting my folders into train and validation while using dataloaders? I separated my image files into train and validation by paths (X_test = [path/image1.jpg, path/image2.jpg], Y_test = [class1, class2], X_train = [path/image3.jpg, path/image4.jpg]...) But torch datasets require a root path like e_dataset = datasets.ImageFolder(root='e_data/train', transform=data_transform). Is there a way I can work around separating my image folder into train/val/test?

mild dirge Jul 6, 2022, 1:46 PM

#

urban prism I'm trying to use a data generator in pytorch. Is there way I can work around sp...

Why do you want to work around that? Seems like an organized way to store your data

#

Wouldn't it be simpler to just organize your data in that way

mighty condor Jul 6, 2022, 2:10 PM

#

Weird pandas column naming thing happening..? Why am I not able to name a column like this?
uwo["PR-Q10-1"]=df.loc["PR-Q10-1"].apply(foos.PR_Q10_1)
That will completely bug, and it won't even add the column, but if I name the new column like this
uwo["PR-Q10-"]=df.loc["PR-Q10-1"].apply(foos.PR_Q10_1)
it will add the new column and work as normal...? I have other columns with names ending in "-1" as well...? What's happening here?

#

It's also not a case of it working and just hiding the column: the rank doesn't change, so I know it's not just hiding it, and if I name it without "-1", the rank does increase.

slate cave Jul 6, 2022, 2:15 PM

#

serene scaffold I work as a computational linguist, and I've never heard of a model that can do ...

Thanks. If you're interested in seeing an implementation, this is Allosaurus:
https://github.com/xinjli/allosaurus

GitHub

GitHub - xinjli/allosaurus: Allosaurus is a pretrained universal ph...

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages - GitHub - xinjli/allosaurus: Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

urban prism Jul 6, 2022, 3:02 PM

#

mild dirge Wouldn't it be simpler to just organize your data in that way

It's not a personal project, though. That part isn't much up to me 😅

mild dirge Jul 6, 2022, 4:08 PM

#

@urban prism You can make a custom DataSet class in pytorch, this way you can make a DataSet for your train and test

#

Or a single DataSet class with a flag for train or test or whatever

#

https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
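
A minimal sketch of such a dataset built from explicit path and label lists, so no train/val folder layout is needed. To keep it runnable without PyTorch or image files, the class below does not subclass `torch.utils.data.Dataset` and uses a stand-in loader; in real use it would presumably subclass `Dataset` and load images with something like PIL. The class name, the loader, and the labels are all hypothetical.

```python
class PathListDataset:
    """Map-style dataset over explicit (path, label) lists.

    Assumption: in real code this would subclass torch.utils.data.Dataset.
    Only __len__ and __getitem__ are needed for a map-style dataset.
    """

    def __init__(self, paths, labels, loader, transform=None):
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        self.loader = loader          # callable: path -> image
        self.transform = transform    # optional callable applied to the image

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        sample = self.loader(self.paths[idx])
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[idx]


def fake_loader(path):
    # Stand-in for real image loading so the sketch runs without files.
    return f"<pixels of {path}>"


# One instance per split, built from the lists that already exist.
train_ds = PathListDataset(["path/image3.jpg", "path/image4.jpg"], ["class1", "class2"], fake_loader)
test_ds = PathListDataset(["path/image1.jpg", "path/image2.jpg"], ["class1", "class2"], fake_loader)

print(len(train_ds), train_ds[0])
```

Each split then gets its own instance, which can be handed to a `DataLoader` in the usual way.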

celest vine Jul 6, 2022, 4:16 PM

#

I have a column in my dataset which contains phone numbers. The majority of them are 10 digits, but some have a country code in front like +91 and some have an extra 0 in front of them. How do I remove these extra +91 and zeros?
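
One simple approach, sketched with the standard library: strip everything that is not a digit, then keep the last 10 digits. This assumes every valid number is a 10-digit national number, so prefixes such as +91 or a leading 0 are the only extras; the function name and sample numbers are made up.

```python
import re


def normalize_phone(raw):
    """Keep digits only, then keep the last 10 (assumes 10-digit national numbers)."""
    digits = re.sub(r"\D", "", str(raw))  # drop '+', spaces, dashes, etc.
    if len(digits) > 10:
        digits = digits[-10:]             # drop country code or leading zero
    return digits


# Hypothetical sample values.
for raw in ["+91 9876543210", "09876543210", "9876543210", "98765-43210"]:
    print(raw, "->", normalize_phone(raw))
```

In pandas the same function could be applied to the column with `.apply`, and numbers that still come out shorter than 10 digits are worth inspecting by hand.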

steady basalt Jul 6, 2022, 4:34 PM

#

Anyone else here would fail a math exam?

#

Am I the only fake data scientist who couldnt pass second year hs maths?

#

Realised I don’t have the time to learn it, shud i swap to SWE? a lot of people basically infer that I couldn’t become a DS

lapis sequoia Jul 6, 2022, 4:36 PM

#

steady basalt Anyone else here would fail a math exam?

I heard from a seminar on learning to code, "You don't hate math, you hate math class"

spare briar Jul 6, 2022, 4:37 PM

#

so you couldn't solve systems of equations? don't know about functions?

steady basalt Jul 6, 2022, 4:37 PM

#

I couldn’t answer anything beyond first year

#

I don’t have the methods

spare briar Jul 6, 2022, 4:37 PM

#

If you want to be DS you definitely need to learn through high school math

steady basalt Jul 6, 2022, 4:37 PM

#

Never learnt beyond linalg and calc1 intros

steady basalt Jul 6, 2022, 4:38 PM

#

spare briar If you want to be DS you definitely need to learn through high school math

There’s literally 0 chance I have time for that

spare briar Jul 6, 2022, 4:38 PM

#

It would be a few hours a week for a few months

#

not too bad

steady basalt Jul 6, 2022, 4:38 PM

#

I can use sklearn, tensorflow and produce projects in inferential statistics but I couldn’t pass pen and paper calculations

steady basalt Jul 6, 2022, 4:38 PM

#

spare briar It would be a few hours a week for a few months

That’s huge BS

spare briar Jul 6, 2022, 4:39 PM

#

shrug

steady basalt Jul 6, 2022, 4:39 PM

#

It took me about 40 hours to finish basic linear algebra

#

Exam papers cover far far more than these topics

spare briar Jul 6, 2022, 4:40 PM

#

Look I'm sure you could get a data job without knowing these things, but it would be limiting, and if you don't want to suck this is prerequisite knowledge

steady basalt Jul 6, 2022, 4:40 PM

#

Trig, sequences, geometry

#

All sorts

#

I wud fail every time

spare briar Jul 6, 2022, 4:41 PM

#

Then start putting in the hours to learn

#

It will take how long it takes

steady basalt Jul 6, 2022, 4:41 PM

#

How is it even relevant ?

spare briar Jul 6, 2022, 4:41 PM

#

It is necessary to go beyond superficial understanding of what you are doing

steady basalt Jul 6, 2022, 4:41 PM

#

Geometry and sequences?

#

Really?

spare briar Jul 6, 2022, 4:41 PM

#

yeah

steady basalt Jul 6, 2022, 4:41 PM

#

Trigonometry?

spare briar Jul 6, 2022, 4:41 PM

#

obviously not necessary for everything

steady basalt Jul 6, 2022, 4:41 PM

#

How?

#

Producing production code requires zero understanding of those topics

spare briar Jul 6, 2022, 4:42 PM

#

I personally would not hire, that is one data point

steady basalt Jul 6, 2022, 4:42 PM

#

Even a graduate?

spare briar Jul 6, 2022, 4:42 PM

#

Your job is production code + domain understanding + good models

steady basalt Jul 6, 2022, 4:43 PM

#

Lol, being able to pass a math exam has no impact on those three

spare briar Jul 6, 2022, 4:43 PM

#

I would expect a DS I hire to be able to read, understand, implement and improve on ML research papers

steady basalt Jul 6, 2022, 4:43 PM

#

So long as you understand backprop, matrices, vectors and integration

spare briar Jul 6, 2022, 4:44 PM

#

there are different levels

#

like I said, I'm sure you could get a job

#

but don't you want to be good

steady basalt Jul 6, 2022, 4:44 PM

#

Why would you need to have the ability to have methods to work out exams?

steady basalt Jul 6, 2022, 4:44 PM

#

spare briar I would expect a DS I hire to be able to read, understand, implement and improve...

Data scientist to work on research papers? That isn’t a data scientist that’s a ml research scientist

spare briar Jul 6, 2022, 4:44 PM

#

to implement methods from papers

#

understand them completely

#

use them to solve our problems

steady basalt Jul 6, 2022, 4:45 PM

#

Sorry, you work for google?

spare briar Jul 6, 2022, 4:45 PM

#

comparable

steady basalt Jul 6, 2022, 4:46 PM

#

I’d be able to learn DSA to get into ur company as an SWE in half of the time to pass ur math exam

spare briar Jul 6, 2022, 4:46 PM

#

sure then do that

steady basalt Jul 6, 2022, 4:46 PM

#

U think it’s a good idea?

spare briar Jul 6, 2022, 4:46 PM

#

based on this attitude, if you dont want to put in time to learn prerequisites, you will never put in the time to be excellent

steady basalt Jul 6, 2022, 4:47 PM

#

Putting in the time, that’s what, an entire year of studying with all my free time

spare briar Jul 6, 2022, 4:47 PM

#

more

wooden sail Jul 6, 2022, 4:47 PM

#

what you're saying is you can copy paste stuff, but don't understand how or why it works, or how or when to use it

steady basalt Jul 6, 2022, 4:47 PM

#

For neural networks? Sort of

spare briar Jul 6, 2022, 4:47 PM

#

When I decided to switch to ML I basically spent 6+ hrs/day for 2 years

#

and I already had math BS and published in physics

wooden sail Jul 6, 2022, 4:47 PM

#

well, you might imagine that limits your options, right?

steady basalt Jul 6, 2022, 4:48 PM

#

spare briar When I decided to switch to ML I basically spent 6+ hrs/day for 2 years

That isn’t healthy when ur already working full time

spare briar Jul 6, 2022, 4:48 PM

#

Well it's what I did

#

just telling you how it is

steady basalt Jul 6, 2022, 4:49 PM

#

Good job there’s no math entrance exam at most companies

#

I cud learn while working …

#

Instead of before

spare briar Jul 6, 2022, 4:49 PM

#

They will ask you questions about methods that you wouldn't be able to answer

steady basalt Jul 6, 2022, 4:49 PM

#

spare briar They will ask you questions about methods that you wouldn't be able to answer

I’ve had one interview at a big four lately and they just asked stats

mild dirge Jul 6, 2022, 4:49 PM

#

You would think so..

steady basalt Jul 6, 2022, 4:49 PM

#

Which was easy enough

spare briar Jul 6, 2022, 4:50 PM

#

exactly, and job postings usually ask for masters or "equivalent experience" as a minimum

steady basalt Jul 6, 2022, 4:50 PM

#

They didn’t ask any pen and paper calculations and equation solving

#

I have masters almost finished

spare briar Jul 6, 2022, 4:50 PM

#

We don't ask pen and paper but would want you to walk us through your understanding of relevant algorithms

steady basalt Jul 6, 2022, 4:51 PM

#

I could easily do that, is it enough?

spare briar Jul 6, 2022, 4:51 PM

#

I don't think you could do that

steady basalt Jul 6, 2022, 4:51 PM

#

Like I said earlier, my problem was sitting a final year exam and failing it lately

#

No, I literally could

#

Especially with study beforehand

mild dirge Jul 6, 2022, 4:51 PM

#

spare briar comparable

So what company, if I may ask?

steady basalt Jul 6, 2022, 4:51 PM

#

This is knowledge you can obtain over time without practising problems

spare briar Jul 6, 2022, 4:52 PM

#

mild dirge So what company, if I may ask?

I am a ml research engineer at a SV computer vision company

steady basalt Jul 6, 2022, 4:52 PM

#

spare briar I don't think you could do that

The only non pure stats question asked was how knn works

#

Could you tell me what sort of q u ask?

#

For junior data scientist

spare briar Jul 6, 2022, 4:54 PM

#

in our last interview we asked about linear regression (probabilistic view), SVD (and application to our domain), then basic deep learning questions, deep computer vision architectures, derive variational autoencoder, self-supervised learning

steady basalt Jul 6, 2022, 4:54 PM

#

I wasn’t saying earlier, that I don’t understand this stuff and how it works - i do. But I don’t have literal solving methods required to pass exams

#

Where you literally write out your solution line by line

steady basalt Jul 6, 2022, 4:55 PM

#

spare briar in our last interview we asked about linear regression (probabilistic view), SVD...

I highly doubt that any company asks those to junior DSs

wooden sail Jul 6, 2022, 4:55 PM

#

can you describe the conditions under which the least squares approach is optimal for linear regression?

spare briar Jul 6, 2022, 4:56 PM

#

we asked those of a recent masters grad

dull granite Jul 6, 2022, 4:56 PM

#

spare briar and I already had math BS and published in physics

Applied math👀

steady basalt Jul 6, 2022, 4:56 PM

#

wooden sail can you describe the conditions under which the least squares approach is optima...

Nope

dull granite Jul 6, 2022, 4:56 PM

#

Got lin alg and number theory next sem so hopefully this degree turns out well thinkmon

steady basalt Jul 6, 2022, 4:56 PM

#

spare briar we asked those of a recent masters grad

Position title?

spare briar Jul 6, 2022, 4:56 PM

#

Data Scientist 1

wooden sail Jul 6, 2022, 4:57 PM

#

steady basalt Nope

i would say this is like the first thing you learn, which ofc requires stats, linalg, and calculus/optimization

steady basalt Jul 6, 2022, 4:57 PM

#

wooden sail i would say this is like the first thing you learn, which ofc requires stats, li...

What’s the answer?

spare briar Jul 6, 2022, 4:57 PM

#

the deep learning questions probably would not be asked unless the company works on related problems but others are fair game

wooden sail Jul 6, 2022, 4:59 PM

#

steady basalt What’s the answer?

when the noise follows a distribution described by its mean and covariance, and the mean is 0 and the covariance is a scaled identity matrix

#

pops up rather naturally when looking at the log-likelihood

lapis sequoia Jul 6, 2022, 5:00 PM

#

Hi, I have a question on the DeepMind lectures by David Silver. It's about the forward view and backward view of TD(lambda). Just to confirm: if we ignore the idea of eligibility traces, these two are the same algorithm, right? It's just that the former waits for the future in order to update "now", while the backward view is like a recursive program where "now" is the furthest function call, right?
From one side, the forward view of TD(lambda) looks basically like a fusion of Monte Carlo and TD,
and the backward view is like TD(lambda) but reversed.
But at the same time, my mind says it's different because the backward view uses eligibility traces.

spare briar Jul 6, 2022, 5:01 PM

#

our candidate answered gaussian distributed noise then showed how the likelihood function gives L2 loss, then we followed up about how to justify regularization and they added a prior

#

this was good enough for us, we followed up some of the details edd mentioned and they showed understanding
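
For readers following along, the argument described above can be written out roughly as follows. This is the standard textbook derivation, sketched under the assumption of i.i.d. zero-mean Gaussian noise and a zero-mean Gaussian prior on the weights. The symbols ($X$, $w$, $\sigma$, $\tau$, $n$) are notation chosen here, not taken from the conversation.

```latex
\begin{align*}
y &= Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \\
\log p(y \mid X, w) &= -\frac{1}{2\sigma^2} \lVert y - Xw \rVert_2^2 - \frac{n}{2} \log(2\pi\sigma^2) \\
\hat{w}_{\mathrm{ML}} &= \arg\max_w \log p(y \mid X, w) = \arg\min_w \lVert y - Xw \rVert_2^2 \\[6pt]
w &\sim \mathcal{N}(0, \tau^2 I) \\
\hat{w}_{\mathrm{MAP}} &= \arg\min_w \; \lVert y - Xw \rVert_2^2 + \frac{\sigma^2}{\tau^2} \lVert w \rVert_2^2
\end{align*}
```

The first three lines show the L2 loss falling out of the log-likelihood. The last two show how a Gaussian prior turns into an L2 (ridge) penalty with weight $\sigma^2 / \tau^2$.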

steady basalt Jul 6, 2022, 5:02 PM

#

Well, I’m content not joining your faang research team for a few years anyway.. gives me time to learn

#

Most companies take graduates without such hard questions

#

This sort of knowledge is learnable and memorisable without being able to solve equations in exams

spare briar Jul 6, 2022, 5:03 PM

#

you should know this if you've read any intro ML book

#

which is basically the minimum bar

steady basalt Jul 6, 2022, 5:04 PM

#

I haven’t read any ML books, during my masters it’s been mostly coding and stats

#

I will def get around to an ml book thoxxx

#

…

wooden sail Jul 6, 2022, 5:04 PM

#

you should probably pick one up, but you'll definitely wanna brush up some earlier maths first

steady basalt Jul 6, 2022, 5:04 PM

#

Which is the best one?

spare briar Jul 6, 2022, 5:04 PM

#

bishop

wooden sail Jul 6, 2022, 5:05 PM

#

i'd recommend gilbert strang's linalg

steady basalt Jul 6, 2022, 5:05 PM

#

I mean something not extremely hard to get into off the bat

#

Like you said, that info is in intro to ml

#

Are you referring to Pattern Recognition and Machine Learning?

spare briar Jul 6, 2022, 5:07 PM

#

hastie and tibshirani is another popular option

steady basalt Jul 6, 2022, 5:07 PM

#

Is there anywhere I can preview it

#

I don’t want to buy a book and open the third page and be hit with equations I can’t understand

spare briar Jul 6, 2022, 5:07 PM

#

you can get pdfs for free of both through google

steady basalt Jul 6, 2022, 5:08 PM

#

Link?

spare briar Jul 6, 2022, 5:08 PM

#

i think its against server tos just google name and "pdf"

steady basalt Jul 6, 2022, 5:08 PM

#

I googled it and got Amazon

#

Ah found it

wooden sail Jul 6, 2022, 5:09 PM

#

it's the first result when you google it 😛

lapis sequoia Jul 6, 2022, 5:10 PM

#

lapis sequoia Hi, I have a question on the deepmind lectures by David Silver. its about the f...

someone help me

#

on this question I have

steady basalt Jul 6, 2022, 5:11 PM

#

Wow this book is insane

#

Hmm quite good

#

I’m very glad I was at least taught probability in class

#

What’s the level of calculus required?

spare briar Jul 6, 2022, 5:14 PM

#

It definitely isn't easy but if you can get through and understand this book you'll be at the level of a strong ML masters graduate

steady basalt Jul 6, 2022, 5:14 PM

#

I can integrate a very very simple equation only

#

Especially not with a lot of surds fractions powers and multiple variables

spare briar Jul 6, 2022, 5:15 PM

#

just the core ideas

steady basalt Jul 6, 2022, 5:15 PM

#

I’m not great with functions

spare briar Jul 6, 2022, 5:15 PM

#

if there is a hard integral it will be in intro chapters or appendix

wooden sail Jul 6, 2022, 5:16 PM

#

it's very likely that you won't have to integrate anything by hand anyway, only some special results are important there, e.g. related to expectation and moments, energy-like quantities, and integral transforms

spare briar Jul 6, 2022, 5:16 PM

#

doing the integration isn't the point, the point is the concepts anyways

steady basalt Jul 6, 2022, 5:16 PM

#

wooden sail it's very likely that you won't have to integrate anything by hand anyway, only ...

Yeah but we’ve been discussing whether the ability to do it is required

#

Entire talks about math exam

spare briar Jul 6, 2022, 5:17 PM

#

you don't need to know all of the weird integration tricks from calc 2 or anything

steady basalt Jul 6, 2022, 5:17 PM

#

Concepts aren’t something you need to grind out practise questions

spare briar Jul 6, 2022, 5:17 PM

#

just "as needed"

steady basalt Jul 6, 2022, 5:17 PM

#

We were specifically talking about being able to get the “tricks” and pass calc algebra exams

wooden sail Jul 6, 2022, 5:17 PM

#

even so, you'll never run into an integral that requires super fancy tricks and you have to solve by hand unless you're taking a course on integral calc/calc2, so don't worry about it at that level

steady basalt Jul 6, 2022, 5:17 PM

#

This guy just said without being able to pass said exam I wudnt be hired by him

#

That’s the entire topic

wooden sail Jul 6, 2022, 5:18 PM

#

understanding special properties is what is usually evaluated, not doing a weird integral or antiderivative

steady basalt Jul 6, 2022, 5:18 PM

#

I mean, that I can learn, I can read alot and study… that’s different to solving

wooden sail Jul 6, 2022, 5:18 PM

#

they won't evaluate you on calc 2, but rather on recognizing an integral is equivalent to a special transform, or that special results can be applied to immediately simplify it

steady basalt Jul 6, 2022, 5:18 PM

#

And this is about solving ability

wooden sail Jul 6, 2022, 5:18 PM

#

it's about solving ability in the specific context

#

you could cook up an arbitrarily messed up integral that no one in the world can solve, phd in maths or no

#

that's beside the point

#

you need to know the skills for what you're aiming for

steady basalt Jul 6, 2022, 5:19 PM

#

Yeah but otherwise I feel like I’m memorising math facts without truely understanding

#

And that’s essentially what is being inferred against me; u cant solve at a low level u can’t be a good DS

#

I CAN memorise all this information and concept

wooden sail Jul 6, 2022, 5:20 PM

#

yes but you're not solving low level problems because you're failing to notice what is important

steady basalt Jul 6, 2022, 5:20 PM

#

What is

wooden sail Jul 6, 2022, 5:21 PM

#

you need a strong grasp on earlier concepts, really understanding them

#

rote computing does not necessarily equate to understanding

steady basalt Jul 6, 2022, 5:21 PM

#

But u can’t understand if u can’t compute right?

wooden sail Jul 6, 2022, 5:22 PM

#

that's absolutely wrong

#

especially considering several concepts don't even have any computation attached to them

steady basalt Jul 6, 2022, 5:23 PM

#

So in your opinion, even if I’d fail a final year math exam I could still be a decent DS?

wooden sail Jul 6, 2022, 5:24 PM

#

depends on final year at which level

steady basalt Jul 6, 2022, 5:24 PM

#

That’s the opposite of what someone just said

#

HS so calc 3 I believe is American level?

wooden sail Jul 6, 2022, 5:24 PM

#

if it's final year HS, you have a ton of ground to make up

steady basalt Jul 6, 2022, 5:24 PM

#

Or as they say here, A2

#

Core 4

wooden sail Jul 6, 2022, 5:25 PM

#

tbh the grades are overall not really important if you really understand the concepts, but you also said that wasn't the case

steady basalt Jul 6, 2022, 5:25 PM

#

Take a look at AS core 4 exams

#

A2 sorry not as

#

It’s a2 c4 maths

#

They also have further maths which is lin Alg

#

For me it’s unbelievably hard

#

By c4

wooden sail Jul 6, 2022, 5:26 PM

#

getting bad grades and struggling with a topic are two separate things

steady basalt Jul 6, 2022, 5:26 PM

#

C1,2 and possibly 3 are fine

#

https://revisionmaths.com/sites/mathsrevision.net/files/imce/Questionpaper-Unit4(6666)-June2018.pdf

#

It’s about 70% for a C

#

Thoughts?

wooden sail Jul 6, 2022, 5:30 PM

#

i'd say it seems rough for high school, but these are all things you should be capable of

steady basalt Jul 6, 2022, 5:31 PM

#

https://revisionmaths.com/sites/mathsrevision.net/files/imce/6669_01_que_20160627.pdf here’s

wooden sail Jul 6, 2022, 5:31 PM

#

they're basic undergrad maths you'd pick up in first year at latest

steady basalt Jul 6, 2022, 5:31 PM

#

I love what I do it’s fun but now I wana swap to engineering and just code now

#

Bcs that shit would take way too long to get a good grade on

wooden sail Jul 6, 2022, 5:32 PM

#

well, switching to engineering means you'll need to learn what they learn in engineering 😛

steady basalt Jul 6, 2022, 5:32 PM

#

I could absolutely learn to code well

wooden sail Jul 6, 2022, 5:32 PM

#

these maths are the basic foundation to do the actual work later on

steady basalt Jul 6, 2022, 5:32 PM

#

I bet they are - and I couldn’t get higher than 30% marks

#

Which is a certified fail

#

60% is pass minimum I think

wooden sail Jul 6, 2022, 5:34 PM

#

then you gotta sink some time into it

steady basalt Jul 6, 2022, 5:38 PM

#

Maybe when I start working I will yes

#

Hopefully the bar will be lower to get into companies than this dudes faang

#

So start working and get experience and on the side learn that

wooden sail Jul 6, 2022, 5:39 PM

#

i would expect it to get higher, since everyone wants to jump into these fields with as little preparation as possible

steady basalt Jul 6, 2022, 5:39 PM

#

Higher in a while

#

Not in a couple months haha

#

I have an offer to be analytics consultant also which is much less mathematical

#

But I don’t rly wana do it

#

I think it’s paid bad

steady basalt Jul 6, 2022, 5:55 PM

#

https://www.reddit.com/r/datascience/comments/u8to9z/employed_data_scientists_and_ml_engineers_if_you/

r/datascience - Employed data scientists and ML engineers: If you w...

314 votes and 204 comments so far on Reddit

#

Well makes me feel slightly better…

wooden sail Jul 6, 2022, 6:01 PM

#

if you don't wanna learn it, don't. no one will force you lol

#

you might also wanna read up on confirmation bias

steady basalt Jul 6, 2022, 6:02 PM

#

It’s not that I don’t want to, it’s that I may struggle to while working full time and having other commitments

#

And knowing that it seems like a very scary idea to try work as a DS if I will not be capable to get jobs or do jobs

#

Especially since I’m finishing uni in 2 months

#

There’s no plan b

#

Except either consulting (cringe) or data engineering

#

And the convo started with me saying maybe I shud just focus purely on coding then

mild dirge Jul 6, 2022, 6:05 PM

#

What's cringe about consulting?

steady basalt Jul 6, 2022, 6:06 PM

#

I associate it with really annoying business jargon people but that’s just my bias

#

I know this one guy and he says touch base like 12 times an hour no joke

#

I’m not really sure it’s for me, and it pays pretty badly too iirc

mild dirge Jul 6, 2022, 6:07 PM

#

steady basalt I associate it with really annoying business jargon people but that’s just my bi...

"Innovative"

steady basalt Jul 6, 2022, 6:11 PM

#

Agile synergetic circle back and strategise

#

Got it?

rough mountain Jul 6, 2022, 6:15 PM

#

I want to train an LSTM on a body of text. Is there a way I'm supposed to break the text down into trainable data?

mild dirge Jul 6, 2022, 6:17 PM

#

This seems to explain most of the basics

#

https://towardsdatascience.com/word-embeddings-and-the-chamber-of-secrets-lstm-gru-tf-keras-de3f5c21bf16

Medium

Word Embeddings and the chamber of secrets| LSTM | GRU | tf.keras

The final destination to intuitively understand word embeddings… finally

rough mountain Jul 6, 2022, 6:18 PM

#

mild dirge This seems to explain most of the basics

Thanks a lot. I have found a lot of stuff on using embeddings, but nothing on the data prep for them

misty flint Jul 6, 2022, 6:31 PM

#

just heard a podcast from the ceo founder of this company, and it sounds like it's pretty promising https://venturebeat.com/2022/03/16/hidden-door-reveals-its-ai-powered-narrative-game-building-platform/

VentureBeat

Rachel Kaser

Hidden Door reveals its AI-powered narrative game platform

Hidden Door, a new studio, today announced its launch. It also announced its first product.

#

“We like to think of it as Roblox meets D&D, where you have the vibe of a tabletop RPG where you and your friends are telling a story together. You’re also playing with the AI narrator, who’s sort of like our AI dungeon master, who’s building a world out of the choices that you make as you play.”

mild dirge Jul 6, 2022, 8:08 PM

#

Does anyone have experience running something like DALL-E on Google Colab?

#

How quick does it run when using pro+, and can it run with just pro?

rough mountain Jul 6, 2022, 8:13 PM

#

I can convert a word to a vector with embeddings, but how can I do it in reverse.

serene scaffold Jul 6, 2022, 8:19 PM

#

rough mountain I can convert a word to a vector with embeddings, but how can I do it in reverse...

it's non-trivial to go in reverse, because a given embedding probably won't match up exactly with an embedding in your vector space

wooden sail Jul 6, 2022, 8:22 PM

#

you can think of vectors in the original encoded space as being in R^n, and vectors in the space after the embedding as being in R^m, usually with the condition m << n. you can only go in the opposite direction if the dimension of the subspace of R^n spanned by the words in your text has dimension <= m, or if you encode text that happens to have few enough unique words that it happens to satisfy some identifiability condition when paired with the matrix that does the embedding

#

there's usually no unique way of going back except under special conditions
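
A common practical workaround, not spelled out in the chat, is to skip exact inversion and look up the vocabulary word whose embedding is closest to the generated vector. The sketch below assumes a made-up five-word vocabulary and a random embedding table. `nearest_word`, `vocab` and `embeddings` are hypothetical names used only for illustration.

```python
import numpy as np


def nearest_word(vector, embeddings, vocab):
    """Return the vocabulary word whose embedding has the highest cosine similarity."""
    emb_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    vec_unit = vector / np.linalg.norm(vector)
    similarities = emb_unit @ vec_unit
    return vocab[int(np.argmax(similarities))]


rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
# Hypothetical embedding table of shape (vocab_size, embedding_dim).
embeddings = rng.normal(size=(len(vocab), 8))

# A generated vector rarely matches a row exactly, so perturb one row slightly
# and check which word it lands closest to.
noisy = embeddings[1] + 0.05 * rng.normal(size=8)
print(nearest_word(noisy, embeddings, vocab))
```

This sidesteps the uniqueness problem by only ever returning vectors that already exist in the table, at the cost of a search over the vocabulary.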

rough mountain Jul 6, 2022, 8:29 PM

#

I want to set up the model in such a way that it writes a whole sentence instead of one word at a time. Normally people use one hot encoding, but it doesn't really work that well here.

wooden sail Jul 6, 2022, 8:31 PM

#

it could be doable as long as the sentence satisfies the identifiability condition

rough mountain Jul 6, 2022, 8:31 PM

#

Well I definitely don't know how to do it.

wooden sail Jul 6, 2022, 8:32 PM

#

should be more or less equivalent to pseudo inverting the embedding matrix, do you have any way to get ahold of its entries?

rough mountain Jul 6, 2022, 8:33 PM

#

Keras has a get layer weights function.

#

I've heard the embedding layer is basically a dense layer

wooden sail Jul 6, 2022, 8:33 PM

#

yep

#

a dense layer is the same as a dense matrix

#

if we call that M, a matrix that does the embedding, we are interested in x such that Mx = v, where v is the embedded vector and you want to solve for x. M is going to be a fat matrix (more columns than rows), meaning it is underdetermined and the equation has either no or infinitely many solutions

#

the reason people use one hot encoding here is that that inherently makes x sparse. then you can find the unique sparsest solution x by adding in sparse regularization

#

using something like combinations of syllables is less likely to have a sparse representation, which is more memory efficient, but also more difficult to invert for many reasons. it's more difficult to build prior info to find a unique sol, distances between words are not uniform, making the matrix poorly conditioned

#

so if your goal is to generate similar text, i can see the merits of using one hot

#

that being said, nothing stops you from trying both (other than time constraints)

#

for the inversion, maybe scipy or scikit learn has something like a lasso regressor

#

scikit has one, yes https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

scikit-learn

sklearn.linear_model.Lasso

Examples using sklearn.linear_model.Lasso: Release Highlights for scikit-learn 0.23, Compressive sensing: tomography reconstruction with L1 prior (Lasso) Co...
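
The inversion idea above (solve Mx = v for a sparse x) can be sketched with scikit-learn's `Lasso`. Everything here is an assumption for illustration: `M` is a random stand-in for a real embedding matrix, the sizes are arbitrary, and `alpha=0.01` is just a small value that happens to suit this toy problem.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
vocab_size, embed_dim = 100, 32            # more columns than rows: a "fat" matrix
M = rng.normal(size=(embed_dim, vocab_size))

true_index = 42
x_true = np.zeros(vocab_size)
x_true[true_index] = 1.0                   # one-hot encoded word
v = M @ x_true                             # its embedded vector

# Rows of M play the role of samples, vocabulary entries the role of features.
# positive=True encodes the prior knowledge that one-hot entries are non-negative.
model = Lasso(alpha=0.01, fit_intercept=False, positive=True, max_iter=10_000)
model.fit(M, v)

recovered_index = int(np.argmax(model.coef_))
print(recovered_index, np.count_nonzero(model.coef_))
```

The L1 penalty is what singles out the sparse solution among the infinitely many x that satisfy Mx = v. The recovered coefficient is shrunk slightly below 1, which is why the sketch decodes with `argmax` rather than comparing against exact one-hot values.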

rough mountain Jul 6, 2022, 8:43 PM

#

The goal is to generate text based on the style of a writer (their writings being the training data). I figured out how to make a gan with a LSTM generator (for reasons I don't really understand the generator has to output vectors, so I have to train it with the real data being autoencoded) But I have no way of getting the final text out of the decoder right now.

iron basalt Jul 6, 2022, 8:43 PM

#

lapis sequoia Hi, I have a question on the deepmind lectures by David Silver. its about the f...

They are equivalent for off-line. http://incompleteideas.net/book/ebook/node76.html

7.4 Equivalence of Forward and Backward Views

wooden sail Jul 6, 2022, 8:44 PM

#

rough mountain The goal is to generate text based on the style of a writer (their writings bein...

i'm guessing you're managing to generate the sentence in the embedded space?

iron basalt Jul 6, 2022, 8:44 PM

#

On-line learning is a whole other unsolved thing. Although we can expect it to be very similar for on-line. As shown in the post, you can modify the definition to get it to work.
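
For reference, the two views being compared can be written side by side. This is a sketch in the notation of Silver's lectures and the Sutton and Barto chapter linked above, for the tabular case. The infinite sum assumes the usual convention that every $n$-step return reaching past the end of the episode equals the full Monte Carlo return.

```latex
\begin{align*}
\text{Forward view:}\quad
G_t^{(n)} &= R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n}) \\
G_t^{\lambda} &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)} \\
V(S_t) &\leftarrow V(S_t) + \alpha \left( G_t^{\lambda} - V(S_t) \right) \\[6pt]
\text{Backward view:}\quad
\delta_t &= R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \\
E_t(s) &= \gamma \lambda E_{t-1}(s) + \mathbf{1}(S_t = s), \qquad E_0(s) = 0 \\
V(s) &\leftarrow V(s) + \alpha\, \delta_t\, E_t(s) \quad \text{for all } s \\[6pt]
\text{Off-line equivalence:}\quad
\sum_{t=1}^{T} \alpha\, \delta_t\, E_t(s) &= \sum_{t=1}^{T} \alpha \left( G_t^{\lambda} - V(S_t) \right) \mathbf{1}(S_t = s)
\end{align*}
```

The last line is the sense in which the views agree: summed over a whole episode with the value function held fixed, the total update to each state is the same. The eligibility trace is the mechanism that lets the backward view reach that total incrementally.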

hollow sentinel Jul 6, 2022, 8:44 PM

#

i was trying to webscrape and i got so frustrated i ended up faking my data

#

😦

rough mountain Jul 6, 2022, 8:45 PM

#

wooden sail i'm guessing you're managing to generate the sentence in the embedded space?

Well I am embedding the inputs into the autoencoder, using the embeddings seems like the only good way to get text back out.

#

without the output being the word count * vocab

wooden sail Jul 6, 2022, 8:47 PM

#

one way that is blackboxy is to extend the autoencoder to produce the vector pre-embedding, for instance

#

but if you already have the embedded output, you can "invert" the embedding matrix, as mentioned above (more like solving an inverse problem, really)

rough mountain Jul 6, 2022, 8:48 PM

#

if I invert the embedding matrix how do I get the word itself out

wooden sail Jul 6, 2022, 8:48 PM

#

if you invert the embedding matrix, this allows you to map an embedded input into an encoded input

#

and an encoded input can be decoded with the same function you used to encode it

#

whatever you used for your word to vector conversion should have a decode function, that's no problem

rough mountain Jul 6, 2022, 8:49 PM

#

ok. Thanks a lot 🙂

iron basalt Jul 6, 2022, 8:50 PM

#

iron basalt On-line learning is a whole other unsolved thing. Although we can expect it to b...

Example (7.3) demonstrates one of the key differences between off-line and on-line: "Note that the on-line algorithm works better over a broader range of parameters. This is often found to be the case for on-line methods.".

wooden sail Jul 6, 2022, 8:50 PM

#

words -> encoded vectors -> (this operation is lossy) embedded vectors -> whatever your code does to generate new embedded vectors -> (this inversion is the difficult one) -> encoded output -> (your initial encoder should have a decoder) output sentence @rough mountain

#

at least that's how i see it in my head
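
The pipeline described above can be sketched end to end on a toy vocabulary. This is only an illustration under assumptions: the eight-word `vocab`, the random matrix `M`, and the helpers `encode` and `decode` are all hypothetical, and the pseudo-inverse stands in for "the difficult inversion".

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def encode(word):
    """Word -> one-hot vector in the encoded space."""
    x = np.zeros(len(vocab))
    x[word_to_index[word]] = 1.0
    return x


def decode(x):
    """Vector in the encoded space -> word with the largest entry."""
    return vocab[int(np.argmax(x))]


rng = np.random.default_rng(0)
M = rng.normal(size=(4, len(vocab)))       # hypothetical embedding: 8 dims -> 4 dims
M_pinv = np.linalg.pinv(M)                 # minimum-norm "inverse" of the fat matrix

sentence = ["the", "cat", "sat"]
embedded = [M @ encode(word) for word in sentence]    # lossy step
estimates = [M_pinv @ v for v in embedded]            # one of infinitely many solutions
recovered = [decode(x_hat) for x_hat in estimates]    # heuristic: may not match the input
print(recovered)
```

Each estimate reproduces its embedded vector exactly when pushed back through `M`, but it is dense rather than one-hot, so the final `argmax` is not guaranteed to return the original word. That gap is why the discussion above leans on sparsity priors.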

twilit wave Jul 6, 2022, 9:14 PM

#

Hey, I'm training a homemade AI model on some basic sentences to analyze them for their real meaning; is there a corpus of simple subject-predicate sentences in a library somewhere?

mild pecan Jul 6, 2022, 9:18 PM

#

How do I know how to handle NaNs? I have columns like "Amount due on existing mortgage", "Value of current property", "Years at present job", "Number of major derogatory reports"
"Number of delinquent credit lines", "Age of oldest trade line in months", "Number of recent credit lines"

How do I know if I should use mean/median, a kNN imputer, or an iterative imputer?

mild dirge Jul 6, 2022, 10:10 PM

#

Maybe not super relevant, but I created this video using some text to image AI generator 😛

#

bold timber Jul 6, 2022, 10:19 PM

#

Suppose we have 4 training datasets from Kaggle: A_train, B_train, C_train, and D_train. A_train contains all of the columns of B_train, C_train, and D_train. How should we process these datasets?

Also, A_train is large, about 3 million rows. Should we merge A_train with the other datasets to end up with a smaller dataset, or do something else?

crisp wing Jul 6, 2022, 10:31 PM

#

bold timber If we have 4 train datasets from Kaggle like A_train, B_train, C_train, and D_tr...

You could look into something like dask. I've only used it on top of a netcdf/hdf5-oriented layer module called xarray

#

It probably still requires a buttload of disk usage, as much as you'd need in ram if you loaded it "normally"

#

the data i mean

bold timber Jul 6, 2022, 10:39 PM

#

what is h5df?

But if you had datasets like that, what would you do? Would you merge the data first, or select columns by dropping some from A_train?

crisp wing Jul 6, 2022, 10:46 PM

#

bold timber what is h5df? But if you have the dataset like that, what could you can do? Are...

HDF5 is a hierarchical data format kinda like json, I guess, but it's irrelevant.

Again, I used XArray, but it works on top of the Dask module. Dask allows you to read in datafiles too large to fit in memory and process them in "chunks", sounded like something you wanted.
This helped me perform SVD on a total of 120 gigs of data without having to sacrifice any data.
Other than that I can't help you, I'm not a ML expert, sorry.

From their page, maybe this helps:
https://examples.dask.org/machine-learning/training-on-large-datasets.html

#

Or perhaps this one:
https://ipython-books.github.io/511-performing-out-of-core-computations-on-large-arrays-with-dask/
Their array type should be compatible with pandas, but I'm not sure of how to convert them

IPython Cookbook - 5.11. Performing out-of-core computations on lar...

IPython Cookbook,

bold timber Jul 6, 2022, 10:49 PM

#

crisp wing Hdf5 is a hierarchal dataformat kinda like json, I guess, but it's irrelevant. ...

Yeah, I already read about h5df before. But, how to use it when dealing with CSV files?

crisp wing Jul 6, 2022, 10:50 PM

#

docs:
https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html

#

You probably need to play around with it, especially if you start specifying chunk size. I don't think it can handle discrepancies in data, so if you have NaN at different positions in your chunks (i.e. without pattern) it can fail the process, you need to account for that

bold timber Jul 6, 2022, 10:57 PM

#

I imported the dataset like this, but this is what I get. Why does it only show the data types?

#

How to show all values in dataset?

crisp wing Jul 6, 2022, 10:58 PM

#

What's the shape of your data?

#

I think it just states you got a crapton of data by this. You could affirm individual elements by accessing them just like any pandas dataframe, reading from their docs

bold timber Jul 6, 2022, 11:02 PM

#

crisp wing What's the shape of your data?

the dataset contains 3 million rows and 73 columns

crisp wing Jul 6, 2022, 11:03 PM

#

That's probably why then, it's not gonna list that. npartitions is the number of "chunks" the data is placed into. So when dask works with your data, it loads it in chunks

#

Also, as I remember, dask doesn't perform any of its operations (perhaps even .read_csv()) until it has to, so if you need to debug various operations, or use the data outside dask operations, you may need to call .compute()

bold timber Jul 6, 2022, 11:09 PM

#

But actually, I don't really understand how HDF5 works. Going back to my question: when we have 4 datasets, as I said before, how should we process them? Should we use only A_train, which has all of the columns, or can we merge it with the others to get a smaller dataset?

#

I think I will just read the data the normal way, with read_csv

crisp wing Jul 6, 2022, 11:11 PM

#

bold timber But actually, I don't really understand with h5df works. But if we back again to...

If they have the same variables you could load them into the same dask array using read_csv. I think you could just list them all in a tuple, or perhaps use the globbing, if it fits your files' names

#

You can also join them by various union types and things like that, but I'll leave that up to you, I honestly can't remember any of that, sorry

bold timber Jul 6, 2022, 11:15 PM

#

A_train has all of the columns of the other datasets. B_train, C_train, and D_train are subsets of the A_train dataset

crisp wing Jul 6, 2022, 11:17 PM

#

bold timber But actually, I don't really understand with h5df works. But if we back again to...

Think I misunderstood, to process a dask array look into the docs, I can't help you with that, sorry, but they got a whole section called ML-dask on their page, I'm sure you'll find something

EDIT: start here maybe:
https://ml.dask.org/cross_validation.html

bold timber Jul 6, 2022, 11:39 PM

#

crisp wing Think I misunderstood, to process a dask array look into the docs, I can't help ...

Ok thank you for discussion

rapid gull Jul 7, 2022, 12:34 AM

#

Hey all! So I recently came across a TON of stamps, and I am trying to create a dB of them. Because there are literally thousands, I am hoping to be able to take a photo of multiple stamps and have my app split them into individuals. Are there any API's or SDK's or algorithms that anybody knows of that could help me do this?

brave sand Jul 7, 2022, 12:39 AM

#

Quick question, does anyone here have any experience with the algorithm QMIX? In the linked repository, I am trying to find where the monotonicity constraint is implemented.
https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/qmix.py

GitHub

Popular-RL-Algorithms/qmix.py at master · quantumiracle/Popular-RL-...

PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet.. - Popular-RL-Algorithms/qmix.py at master ·...

iron basalt Jul 7, 2022, 1:42 AM

#

brave sand Quick question, does anyone here have any experience with the algorithm QMIX? In...

https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/qmix.py#L243-L250

arctic wedgeBOT Jul 7, 2022, 1:42 AM

#

qmix.py lines 243 to 250

```python
w1 = self.hyper_w_1(states).abs() if self.abs else self.hyper_w_1(states)  # [#batch*#sequence, action_shape*self.embed_dim*#agent]
b1 = self.hyper_b_1(states)  # [#batch*#sequence, self.embed_dim]
w1 = w1.view(-1, self.n_agents*self.action_shape, self.embed_dim)  # [#batch*#sequence, #agent*action_shape, self.embed_dim]
b1 = b1.view(-1, 1, self.embed_dim)   # [#batch*#sequence, 1, self.embed_dim]
hidden = F.elu(torch.bmm(agent_qs, w1) + b1)  # [#batch*#sequence, 1, self.embed_dim]

# Second layer
w_final = self.hyper_w_final(states).abs() if self.abs else self.hyper_w_final(states)  # [#batch*#sequence, self.embed_dim]
```

iron basalt Jul 7, 2022, 1:42 AM

#

.abs()

#

The weights of the mixing network are produced by separate hypernetworks. Each hypernetwork takes the state s as input and generates the weights of one layer of the mixing network. Each hypernetwork consists of a single linear layer, followed by an absolute activation function, to ensure that the mixing network weights are non-negative

#

To enforce the monotonicity constraint of (5), the weights (but not the biases) of the mixing network are restricted to be non-negative

#

http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf
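
To see why `.abs()` on the weights is enough, here is a minimal NumPy sketch of a two-layer mixing function. It is not the repository's code: the random arrays stand in for hypernetwork outputs, there is a single sample with no batching, and `mix`, `elu` and the sizes are names and values chosen for the example.

```python
import numpy as np


def elu(x):
    """ELU activation, which is monotonically increasing."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def mix(agent_qs, w1_raw, b1, w2_raw, b2):
    """Combine per-agent Q-values into Q_tot using non-negative weights."""
    w1 = np.abs(w1_raw)                  # same role as .abs() in the snippet above
    w2 = np.abs(w2_raw)
    hidden = elu(agent_qs @ w1 + b1)     # biases are left unconstrained
    return float(hidden @ w2 + b2)


rng = np.random.default_rng(0)
n_agents, embed_dim = 3, 8
# Stand-ins for what the hypernetworks would produce for one state.
w1_raw = rng.normal(size=(n_agents, embed_dim))
b1 = rng.normal(size=embed_dim)
w2_raw = rng.normal(size=embed_dim)
b2 = float(rng.normal())

agent_qs = rng.normal(size=n_agents)
q_tot = mix(agent_qs, w1_raw, b1, w2_raw, b2)

# Raising any single agent's Q-value never lowers Q_tot.
for i in range(n_agents):
    bumped = agent_qs.copy()
    bumped[i] += 0.5
    print(i, mix(bumped, w1_raw, b1, w2_raw, b2) >= q_tot)
```

Because every weight is non-negative and ELU is increasing, each layer preserves ordering, so Q_tot is non-decreasing in every agent's Q-value regardless of the biases.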

brave sand Jul 7, 2022, 1:44 AM

#

iron basalt https://github.com/quantumiracle/Popular-RL-Algorithms/blob/master/qmix.py#L243-...

bc w1 and w2 is being constrained there correct?

iron basalt Jul 7, 2022, 1:44 AM

#

They can't be negative.

brave sand Jul 7, 2022, 1:44 AM

#

also, how do you know everything?

#

and ur username is Squiggle it’s hard to take you srsly lol

iron basalt Jul 7, 2022, 1:46 AM

#

I choose my usernames arbitrarily. I would generate a hash but that makes it hard for someone to refer to me.

brave sand Jul 7, 2022, 1:48 AM

#

iron basalt I choose my usernames arbitrarily. I would generate a hash but that makes it har...

how did you know it was that line?

iron basalt Jul 7, 2022, 1:48 AM

#

brave sand how did you know it was that line?

I knew that the weights needed to be non-negative.

#

They are chosen by a "hyper" network.

#

And those lines were in the class QMix, which is the mixing network that needs the constraint.

brave sand Jul 7, 2022, 1:51 AM

#

Oh I understand now

misty flint Jul 7, 2022, 2:11 AM

#

brave sand and ur username is Squiggle it’s hard to take you srsly lol

you dont know squiggle. out of everyone in this channel, squiggle is one of the most knowledgeable. hands down.

#

blobpray

#

i even have in my notes for squiggle: "basically knows everything"

#

DoggoKek

worthy pagoda Jul 7, 2022, 4:23 AM

#

Hey guys, first time posting in here. Had a question. Currently working on a fairly large dataset (options data) - and have a column with a bunch of expiration dates. Now I only want to filter the column to show the expirations on a Friday. Do I need to incorporate this into a loop?
I have made the column into a datetime format and have tried selecting the expiration on only day 5 (Friday) but no luck. Pasting 2 screenshots for reference.
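
No loop should be needed for this. Below is a minimal pandas sketch using a made-up `expiration` column, since the screenshots are not available. One thing worth checking, as a guess at the cause: pandas numbers weekdays from Monday=0, so Friday is 4 and a filter on 5 would select Saturdays.

```python
import pandas as pd

# Hypothetical stand-in for the options data.
df = pd.DataFrame(
    {
        "expiration": ["2022-07-08", "2022-07-11", "2022-07-15", "2022-07-20"],
        "strike": [100, 105, 110, 115],
    }
)
df["expiration"] = pd.to_datetime(df["expiration"])

# dt.dayofweek uses Monday=0 ... Sunday=6, so Friday is 4.
fridays = df[df["expiration"].dt.dayofweek == 4]
print(fridays)
```

The boolean mask is computed for the whole column at once, which is usually much faster than iterating over rows on a large dataset.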

#

#

lapis sequoia Jul 7, 2022, 5:27 AM

#

iron basalt They are equivalent for off-line. http://incompleteideas.net/book/ebook/node76.h...

yoo, thank you for noticing my question

#

https://tenor.com/view/kristenbell-crying-ellen-gif-4150129

Tenor

*crying*


latent glacier Jul 7, 2022, 5:28 AM

#

PLS IS ANYBODY FAMILIAR WITH LASSO REGRESSION AND FINDING THE OPTIMAL ALPHA??

#

I NEED HELP PLS 😭😭

wooden sail Jul 7, 2022, 5:28 AM

#

what is "alpha" here?

lapis sequoia Jul 7, 2022, 5:29 AM

#

I think it's the weight that the lasso penalty term is multiplied by

latent glacier Jul 7, 2022, 5:29 AM

#

omg yes 👏

lapis sequoia Jul 7, 2022, 5:29 AM

#

lapis sequoia yoo, thank you for noticing my question

back to this question, what about for an online case tho

lapis sequoia Jul 7, 2022, 5:30 AM

#

latent glacier omg yes 👏

I think its a hyperparameter and you just gonna play with it, right?

wooden sail Jul 7, 2022, 5:31 AM

#

if you can show the equation you're using, i can take a look. idk if you mean what's normally called the "lambda" parameter

#

can you show your version of the lasso problem?

latent glacier Jul 7, 2022, 5:33 AM

#

i’m just a high school intern and i got thrown into this 😭😭

#

wooden sail Jul 7, 2022, 5:33 AM

#

ah ok, the sklearn one. then yes, it's the sparsity regularization weight

latent glacier Jul 7, 2022, 5:33 AM

#

i don’t know what i’m doing 💀 i’m just guessing the alpha value

wooden sail Jul 7, 2022, 5:34 AM

#

well, there are 2 common ways

#

one of them is exactly as you're doing it: you generate a list of alpha values, and then evaluate which one gave you the "best" result in some sense. you keep that one. this is how it's done when you use an algorithm that needs an explicit value of alpha

#

an alternative is to use an algorithm that can find it automatically

latent glacier Jul 7, 2022, 5:35 AM

#

pls whichever one is easier to do 🧎🏻‍♀️

wooden sail Jul 7, 2022, 5:36 AM

#

probably what you're already doing. the answer is: try many different alphas and keep the best

#

if you know the values x that solve Ax = y ahead of time, you can check the distance between x and your estimate to pick alpha. if not, then the distance between Ax and y also works, though not as well
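
A rough sketch of the "try many alphas and keep the best" approach with sklearn's `Lasso`. The data is synthetic and the names `A`, `y`, `x_true` are placeholders. Scoring on a held-out split is an assumption added here, not something stated above; if you score Ax against y on the same data you fit on, the smallest alpha tends to win.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: only the first 3 of 20 coefficients are nonzero.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.1 * rng.normal(size=200)

# Hold out part of the data so each alpha is scored on rows it was not fit on.
A_train, A_val, y_train, y_val = train_test_split(
    A, y, test_size=0.25, random_state=0
)

# Candidate alphas on a log scale; the range is a guess and depends on the data.
alphas = np.logspace(-4, 1, 30)
val_errors = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(A_train, y_train)
    # Mean squared distance between the prediction (Ax) and y on held-out rows.
    val_errors.append(np.mean((model.predict(A_val) - y_val) ** 2))

best_alpha = alphas[int(np.argmin(val_errors))]
print("best alpha:", best_alpha)
```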

latent glacier Jul 7, 2022, 5:37 AM

#

what is the typical range?

wooden sail Jul 7, 2022, 5:45 AM

#

ah right. that one is annoying because it depends on the actual algorithm. the method i'm familiar with is as follows (though it might not work for you, we'll have to try). you have your matrix A and the vector y, yeah? it turns out that some solvers use "soft thresholding" in their iterations. the amount that entries are soft thresholded by is the product of alpha with an internal learning rate that is also used. you can compute the product A^T y and find the element with the largest absolute value. call this quantity w, for instance. then you can set alpha to w * c, where c is a number between 0 and 1

#

setting c to 0 should remove the sparse regularization entirely, and setting it to 1 will make the output fully sparse, i.e. a vector of zeros

#

then all you have to do is test values of c between 0 and 1
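
A small sketch of that scaling trick with sklearn's `Lasso`, on synthetic data. One assumption to flag: sklearn's objective divides the squared error by `2 * n_samples`, so `w` is divided by `n_samples` here, and `fit_intercept=False` is set so the plain A^T y formula applies. Other solvers may need a different scaling. c = 0 is skipped because sklearn advises against `alpha=0` for `Lasso`.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data: y depends only on the first column of A.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
y = 3.0 * A[:, 0] + 0.1 * rng.normal(size=100)

# Largest absolute entry of A^T y, rescaled for sklearn's 1/(2*n_samples) factor.
n_samples = A.shape[0]
w = np.max(np.abs(A.T @ y)) / n_samples

# Scan c in (0, 1]: larger c means more coefficients driven to zero.
for c in [1.0, 0.5, 0.1, 0.01]:
    model = Lasso(alpha=c * w, fit_intercept=False, max_iter=10_000).fit(A, y)
    print(f"c={c}: {np.count_nonzero(model.coef_)} nonzero coefficients")
```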

latent glacier Jul 7, 2022, 5:49 AM

#

oh 🥹🥹😭😭😭 goodbye sleep

#

what if i did the other way where it would automatically find it for me?

wooden sail Jul 7, 2022, 5:51 AM

#

yeah but you'd have to use a different solver. i know cvx can do this. idk how to do it with sklearn

iron basalt Jul 7, 2022, 5:53 AM

#

lapis sequoia back to this question, what about for an online case tho

It tends to be close, but it's not equivalent as in the off-line case. Not without modifying the definitions. It's addressed in the link.

latent glacier Jul 7, 2022, 5:54 AM

#

oh okay thanks anyways!!

wooden sail Jul 7, 2022, 5:54 AM

#

latent glacier oh okay thanks anyways!!

ah, bingo. sklearn's LassoCV can do this with cross validation

#

use that

lapis sequoia Jul 7, 2022, 5:55 AM

#

iron basalt It tends to be close, but it's not equivalent as in the off-line case. Not witho...

ahh really, I think I skimmed a bit too much. will read it more thoroughly

latent glacier Jul 7, 2022, 5:56 AM

#

omg thanks but how would i even start?

wooden sail Jul 7, 2022, 5:57 AM

#

by reading the documentation :p the function should do pretty much everything for you
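
As a possible starting point, a minimal `LassoCV` sketch on synthetic data. `X`, `y` and `coef_true` are placeholders for whatever the real dataset looks like, and `cv=5` is just a common choice. After fitting, the selected value is in `alpha_`.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic stand-in data: only the first 3 of 20 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
coef_true = np.zeros(20)
coef_true[:3] = [2.0, -1.5, 1.0]
y = X @ coef_true + 0.1 * rng.normal(size=200)

# LassoCV builds its own grid of alphas and picks one by 5-fold cross validation.
model = LassoCV(cv=5, max_iter=10_000).fit(X, y)

print("chosen alpha:", model.alpha_)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```

The fitted `model` can then be used like any other sklearn regressor, e.g. `model.predict(X_new)`.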