wicked grove Jan 9, 2022, 5:50 AM

#

so should i add a flatten layer

#

these are the last 2 layers of my model

#

```
                                )

 top_activation (Activation)    (None, 17, 17, 2304  0           ['top_bn[0][0]']
                                )

 avg_pool (GlobalAveragePooling  (None, 2304)        0           ['top_activation[0][0]']
 2D)

 top_dropout (Dropout)          (None, 2304)         0           ['avg_pool[0][0]']

 predictions (Dense)            (None, 1000)         2305000     ['top_dropout[0][0]']
```

stone marlin Jan 9, 2022, 5:51 AM

#

I'm not sure, I'm not great at NN stuff yet, someone else in here should be able to answer you though. :']

wicked grove Jan 9, 2022, 5:51 AM

#

alrightt,thankss:))

wicked grove Jan 9, 2022, 6:00 AM

#

stone marlin I'm not sure, I'm not great at NN stuff yet, someone else in here should be able...

i think the article made sense, i had to add GlobalAveragePooling2D so that it maps to the new dense layer

#

but i am getting another error

#

```
logits and labels must have the same first dimension, got logits shape [32,3] and labels shape [96]
     [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
 (defined at C:\Users\Urja\anaconda3\lib\site-packages\keras\backend.py:5113)
```
i haven't even used logits
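
One possible reading of this error, offered as a guess since the data pipeline isn't shown: 96 is exactly 32 × 3, which is what you would get if the labels are one-hot encoded (one row of 3 values per image) while `sparse_categorical_crossentropy` expects one integer class id per image. "Logits" is just the name TensorFlow uses for the model's output, so the message can appear even if you never wrote that word. The sketch below uses plain Python lists and a hypothetical helper name to show the two label layouts:

```python
# Assumption: a batch of 32 images and 3 classes, matching the shapes in the error.
BATCH_SIZE, NUM_CLASSES = 32, 3

# One-hot labels: one row per image, with a single 1 marking the class.
one_hot = [[1 if c == i % NUM_CLASSES else 0 for c in range(NUM_CLASSES)]
           for i in range(BATCH_SIZE)]

# Flattening one-hot labels gives 32 * 3 = 96 values, the "labels shape [96]" above.
flattened = [value for row in one_hot for value in row]


def one_hot_to_class_ids(rows):
    """Hypothetical helper: turn rows like [0, 1, 0] into integer ids like 1."""
    return [row.index(1) for row in rows]


# Sparse labels: one integer per image, which is what the sparse loss expects.
class_ids = one_hot_to_class_ids(one_hot)

print(len(flattened))  # 96
print(len(class_ids))  # 32
```

If that is the cause, the usual options are to feed integer labels, or to keep the one-hot labels and switch the loss to `categorical_crossentropy`.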

stone marlin Jan 9, 2022, 6:03 AM

#

I'm not sure, but this SO comes up when I google the error. https://stackoverflow.com/questions/49161174/tensorflow-logits-and-labels-must-have-the-same-first-dimension

Stack Overflow

Tensorflow : logits and labels must have the same first dimension

I am new in tensoflow and I want to adapt the MNIST tutorial https://www.tensorflow.org/tutorials/layers with my own data (images of 40x40).
This is my model function :

def cnn_model_fn(features,

wicked grove Jan 9, 2022, 6:09 AM

#

stone marlin I'm not sure, but this SO comes up when I google the error. https://stackoverfl...

Thank you soo much!!

arctic wedgeBOT Jan 9, 2022, 10:44 AM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1641725672:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

stuck holly Jan 9, 2022, 12:11 PM

#

Anyone here use vscode?

wicked grove Jan 9, 2022, 12:23 PM

#

Hello, i have a problem...im doing transfer learning for a dataset with 3390 images
The time taken for an epoch is an hour ://

#

Is there any way i can improve the speed

#

After the first epoch the training stopped because I had no memory left on my C drive

upbeat prism Jan 9, 2022, 2:13 PM

#

@wicked grove I'm no expert myself and no idea about the time since I never did such a thing. I suggest you do the following:

If you have a GPU use cuda to train your NN
Enable the profiler of whatever NN library you use, or alternatively measure it yourself. E.g. in your loop for 1 epoch I guess you have something like "train()" and "evaluate()", and in e.g. train() you might have model.predict(batch) and optimizer.step() or whatever. You have several big function calls; measure each of them and check which one takes the longest. Also make sure you measure the duration of your file reads/writes.
Generally, file I/O (writing/reading a file) is slow. You also want to minimize communication between CPU memory (your RAM) and CUDA. So make sure that you move as much data to CUDA memory as possible. I can only give details here if you use PyTorch's Dataset class, but basically: try to store your images in HDF5 format. Use h5py for that. (It's probably quite a bit of work.)

Are you sure you run out of space on your C drive and not out of memory? (Maybe you run out of memory and your OS creates a swap file? No idea how windows works really, sorry.) My guess is that you somehow read too much into memory.

E.g. if you run out of memory on linux, linux will create a file and dump memory into the file, basically making memory reads extremely!!! slow. Could be a reason but I'm really guessing here because as I said, I have no idea about windows. Make sure to monitor your memory.

To put it simply: Make basic measurements to find the bottleneck and then try to fix it.

arctic wedgeBOT Jan 9, 2022, 2:15 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1641738329:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

upbeat prism Jan 9, 2022, 2:16 PM

#

Maybe a plan of action. What I'd do: import time, then put start = time.time() and end = time.time() around the basic function calls like the optimizer step, model evaluation, and file I/O, and print(end - start) for each of them. Also open Task Manager and look at the memory usage. If it gets close to 100%, you need to fix how you read data. Also make sure you use CUDA.
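
A minimal sketch of that measuring idea, using only the standard library. It swaps in `time.perf_counter()` for `time.time()` because that clock is intended for measuring intervals. `load_batch` and `train_step` are hypothetical stand-ins for whatever the real training loop calls, and the sleeps only simulate work:

```python
import time
from contextlib import contextmanager

timings = {}  # label -> total seconds spent in that step


@contextmanager
def timed(label):
    """Add the wall-clock time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins for the real steps of one epoch.
def load_batch():
    time.sleep(0.02)  # pretend file I/O


def train_step():
    time.sleep(0.01)  # pretend forward/backward pass


for _ in range(3):
    with timed("file IO"):
        load_batch()
    with timed("train step"):
        train_step()

# Slowest step first.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds:.3f}s")
```

Summing per label rather than printing every call keeps the output short enough to compare steps at a glance.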

upbeat dove Jan 9, 2022, 3:15 PM

#

How do I make a many to one RNN network that can take an infinitely long string

#

In tensorflow

boreal summit Jan 9, 2022, 3:37 PM

#

Hello everyone, I'm on this task for sentiment analysis for Amazon reviews. So I have the dataset loaded on Colab, it has 3 columns namely class_index (1 to 5), review title and review text. I am asked to perform EDA on this dataset and I honestly do not know what to do next. I'm wondering if I should predict the class index or what?

serene scaffold Jan 9, 2022, 4:16 PM

#

boreal summit Hello everyone, I'm on this task for sentiment analysis for Amazon reviews. So I...

by EDA, do you mean "exploratory data analysis"? well, look at the data, see what is in it that could be used to signify which class a given review has been assigned.

#

and yes, your goal is to design a system that could predict the "class index" given only the review title and the review text.

#

can you think of what features you could use?

boreal summit Jan 9, 2022, 4:25 PM

#

serene scaffold can you think of what features you could use?

Thanks for the response. I would use a pretrained model, but I was wondering if I could use just the review text or I have to use both.

serene scaffold Jan 9, 2022, 4:26 PM

#

boreal summit Thanks for the response. I would use a pretrained model, but I was wondering if ...

I don't know why you're doing this task. Did someone tell you that you have to use the title?

#

I assume you can do it however you want. You could also concatenate the two.

boreal summit Jan 9, 2022, 4:27 PM

#

Yes, I was asked to perform EDA on the dataset and report the final performance metrics for my approach. Also suggest ways in which I can improve my model.

#

So I'm guessing I would have to create a model to predict the class index, then show the accuracy & loss, and some other metrics and suggest ways to improve the model.

#

I'm working on it already.

serene scaffold Jan 9, 2022, 4:28 PM

#

Sounds good to me

boreal summit Jan 9, 2022, 4:28 PM

#

Thanks.

serene scaffold Jan 9, 2022, 4:28 PM

#

I assume you're familiar with NLP basics like tokenization?

boreal summit Jan 9, 2022, 4:28 PM

#

Sure.

serene scaffold Jan 9, 2022, 4:28 PM

#

Sure, as in yes?

boreal summit Jan 9, 2022, 4:28 PM

#

Yes.

serene scaffold Jan 9, 2022, 4:28 PM

#

lemon_hyperpleased

boreal summit Jan 9, 2022, 4:29 PM

#

I'm also familiar with embedding, conv1d. So I'll just try out different stuffs and see what works.

#

👍🏿

unborn temple Jan 9, 2022, 4:46 PM

#

Uh, I would like to learn whole math and how it works behind machine learning. Any suggestions to resources

#

I'm not good at reading research papers, so if there is any alternatives

vast thunder Jan 9, 2022, 4:47 PM

#

Something is pinned about math. I guess this https://mml-book.com/

Mathematics for Machine Learning

serene scaffold Jan 9, 2022, 4:49 PM

#

unborn temple I'm not good at reading research papers, so if there is any alternatives

research papers are intended to be read by other experts. They also aren't always that well written. So don't feel bad if you find them confusing. One technique is to read the abstract, section headings, and conclusion a few times before trying to read the whole paper.

unborn temple Jan 9, 2022, 4:52 PM

#

Hmm, yeah I will try it, but is there any easier way like books mentioned above, for deep learning algorithm, deep reinforcement algorithms, new algorithms?

serene scaffold Jan 9, 2022, 4:53 PM

#

unborn temple Hmm, yeah I will try it, but is there any easier way like books mentioned above,...

do you go to a school/university, or are employed by a tech company? If so, I would see if you have access to O'Reilly's online library.

#

they have deep learning books with Python examples.

unborn temple Jan 9, 2022, 4:53 PM

#

serene scaffold do you go to a school/university, or are employed by a tech company? If so, I wo...

I'm currently, a first year college student in Data science and Artificial intelligence

serene scaffold Jan 9, 2022, 4:54 PM

#

unborn temple I'm currently, a first year college student in Data science and Artificial intel...

great. see if your college/university gives you that access. Mine did.

unborn temple Jan 9, 2022, 4:55 PM

#

serene scaffold great. see if your college/university gives you that access. Mine did.

Uh, in my country, covid lockdown has been placed, so I couldnt access the library in my college. So, I'm searching for online resources

#

It will continue like maybe min 3 to Max 8 months

serene scaffold Jan 9, 2022, 4:56 PM

#

unborn temple Uh, in my country, covid lockdown has been placed, so I couldnt access the libra...

well, it's O'Reilly's online library, so it's a matter of seeing if your university will let you open an account with O'Reilly.

#

in my case, it was a matter of checking the website for the university library. Though it might be as easy as trying to create an account using your university email.

#

https://www.oreilly.com/

O'Reilly Media - Technology and Business Training

Gain technology and business knowledge and hone your skills with learning resources created and curated by O'Reilly's experts: live online training, video, books, our platform has content from 200+ of the world's best publishers.

unborn temple Jan 9, 2022, 4:59 PM

#

serene scaffold in my case, it was a matter of checking the website for the university library. ...

I strongly suspect they have one, what to do if have to learn on my own?

serene scaffold Jan 9, 2022, 4:59 PM

#

unborn temple I strongly suspect they have one, what to do if have to learn on my own?

I'm not sure I follow. You're asking how to get ahold of online books, right? There are a bunch on the website I just mentioned; you just have to try to make an account with your university email and see if your university has a contract with them.

#

If you need help finding books that aren't behind a paywall, I can try to help with that, instead.

pearl grove Jan 9, 2022, 5:00 PM

#

Anyone knows why this is happening? Code works in cell 1 but not in cell 2

unborn temple Jan 9, 2022, 5:00 PM

#

serene scaffold If you need help finding books that aren't behind a paywall, I can try to help w...

That would help me

serene scaffold Jan 9, 2022, 5:00 PM

#

pearl grove Anyone knows why this is happening? Code works in cell 1 but not in cell 2

Can you show the whole error message as text (not a screenshot)?

unborn temple Jan 9, 2022, 5:01 PM

#

It could be a utf-16 character kinda error

grave frost Jan 9, 2022, 5:01 PM

#

unborn temple I strongly suspect they have one, what to do if have to learn on my own?

There are many blogposts and YT videos out there to break stuff down very well

serene scaffold Jan 9, 2022, 5:01 PM

#

unborn temple That would help me

https://www.deeplearningbook.org/

grave frost Jan 9, 2022, 5:01 PM

#

try with the "Mathematics for Machine Learning" book. with a little googling, it should get you far with just High school math knowledge

pearl grove Jan 9, 2022, 5:02 PM

#

serene scaffold Can you show the whole error message as text (not a screenshot)?

```
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
C:\Users\****\AppData\Local\Temp/ipykernel_20900/2342191653.py in <module>
----> 1 Year1 = pd.read_csv(r'..\datasets\2001.csv')

C:\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
312
313 return wrapper

C:\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
584 kwds.update(kwds_defaults)
585
--> 586 return _read(filepath_or_buffer, kwds)
587
588
```

serene scaffold Jan 9, 2022, 5:02 PM

#

grave frost try with the "Mathematics for Machine Learning" book. with a little googling, it...

that was suggested earlier in this conversation, interestingly enough

pearl grove Jan 9, 2022, 5:02 PM

#

```
C:\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
480
481 # Create the parser.
--> 482 parser = TextFileReader(filepath_or_buffer, **kwds)
483
484 if chunksize or iterator:

C:\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
809 self.options["has_index_names"] = kwds["has_index_names"]
810
--> 811 self._engine = self._make_engine(self.engine)
812
813 def close(self):

C:\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
1038 )
1039 # error: Too many arguments for "ParserBase"
-> 1040 return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
1041
1042 def _failover_to_python(self):

C:\Anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
67 kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
68 try:
---> 69 self._reader = parsers.TextReader(self.handles.handle, **kwds)
70 except Exception:
71 self.handles.close()

C:\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

C:\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

C:\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

C:\Anaconda3\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 343: invalid continuation byte
```

serene scaffold Jan 9, 2022, 5:02 PM

#

guess we should add that to our website

grave frost Jan 9, 2022, 5:02 PM

#

Also if you can afford it @unborn temple nnfs.io is a very popular starter with code and simply explained math to start you off

#

that alone should give a very strong foundation to your knowledge

serene scaffold Jan 9, 2022, 5:02 PM

#

@pearl grove did you check what the encoding is for the csv file? guess it's not utf-8, or something
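
One rough way to check, sketched with the standard library only: read the file as raw bytes and see which encodings decode it without error. The candidate list is an assumption (common Western European encodings), the sample bytes are made up, and `first_working_encoding` is a hypothetical helper name:

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Made-up sample containing byte 0xe4, the byte named in the error above.
sample = b"year,city\n2001,M\xe4rz\n"

print(first_working_encoding(sample))  # cp1252
print(sample.decode("cp1252"))
```

For a real file the bytes would come from `open(path, "rb").read()`, and the result can be passed on as `pd.read_csv(path, encoding=...)`. A successful decode is only a hint: `latin-1` accepts every byte, so it never fails even when it is the wrong choice.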

pearl grove Jan 9, 2022, 5:03 PM

#

serene scaffold @pearl grove did you check what the encoding is for the csv file? gues...

how do I do that?

pearl grove Jan 9, 2022, 5:04 PM

#

serene scaffold @pearl grove did you check what the encoding is for the csv file? gues...

But wait, when I write this code instead, `Year1 = pd.read_csv('..\datasets\2001.csv')`, I get the File not found error

serene scaffold Jan 9, 2022, 5:04 PM

#

pearl grove how do I do that?

I'm not sure how to do it on colab, unfortunately

serene scaffold Jan 9, 2022, 5:04 PM

#

pearl grove But wait when I write this code instead `Year1 = pd.read_csv('..\datasets\2001.c...

try switching the backslashes to forward slashes?

#

I think colab uses a linux environment, but I'm not sure

unborn temple Jan 9, 2022, 5:05 PM

#

grave frost Also if you can afford it @unborn temple nnfs.io is a very popular start...

Yeah I'm aware of it, its a good one for neural networks, sentdex, even has a video series for free, it helps in understanding normal Neural networks, but I need deep reinforcement learning algorithm math, and new algorithm math

pearl grove Jan 9, 2022, 5:05 PM

#

serene scaffold try switching the backslashes to forward slashes?

it still gives the unicode error

unborn temple Jan 9, 2022, 5:05 PM

#

serene scaffold I think colab uses a linux environment, but I'm not sure

Yeah you are right it's linux

pearl grove Jan 9, 2022, 5:05 PM

#

serene scaffold I think colab uses a linux environment, but I'm not sure

Im using jupyter

serene scaffold Jan 9, 2022, 5:05 PM

#

pearl grove Im using jupyter

jupyter isn't the same as the operating system.

pearl grove Jan 9, 2022, 5:06 PM

#

serene scaffold jupyter isn't the same as the operating system.

oh nvm then, I thought colab was an IDE, havent heard of it before

serene scaffold Jan 9, 2022, 5:06 PM

#

is it possible to share the colab so I can look?

grave frost Jan 9, 2022, 5:06 PM

#

unborn temple Yeah I'm aware of it, its a good one for neural networks, sentdex, even has a vi...

the ideas and basics are the same

serene scaffold Jan 9, 2022, 5:06 PM

#

pearl grove oh nvm then, I thought colab was an IDE, havent heard of it before

colab is an online environment; it's not an IDE per se.

pearl grove Jan 9, 2022, 5:07 PM

#

serene scaffold is it possible to share the colab so I can look?

Um how do I do that...

unborn temple Jan 9, 2022, 5:08 PM

#

pearl grove it still gives the unicode error

Try reading using Csv module, it might be slower, but it will tell where is the error

pearl grove Jan 9, 2022, 5:08 PM

#

unborn temple Try reading using Csv module, it might be slower, but it will tell where is the ...

wdym by csv module

serene scaffold Jan 9, 2022, 5:08 PM

#

there's a csv module that comes with python

#

!docs csv

arctic wedgeBOT Jan 9, 2022, 5:08 PM

#

csv

Source code: Lib/csv.py

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

pearl grove Jan 9, 2022, 5:08 PM

#

sorry I'm very new to coding so still unaware of the jargon

unborn temple Jan 9, 2022, 5:09 PM

#

https://www.geeksforgeeks.org/working-csv-files-python/

GeeksforGeeks

Working with csv files in Python - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

#

Try this

#

@pearl grove try putting your file path in filename variable in that code
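
A small sketch of how the built-in csv module could help here, assuming the file is in an ASCII-compatible encoding (so not UTF-16). Both function names are hypothetical. The first scans the raw bytes to report which lines are not valid UTF-8, and the second reads the rows once an encoding has been chosen:

```python
import csv


def find_undecodable_lines(path, encoding="utf-8"):
    """Return the 1-based numbers of lines that fail to decode with `encoding`."""
    bad_lines = []
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad_lines.append(lineno)
    return bad_lines


def read_rows(path, encoding):
    """Read every row of a CSV file using an explicitly chosen encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

Looking at the reported lines in a text editor usually makes it obvious which characters (accented letters, currency symbols) are involved, which in turn hints at the right encoding.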

pearl grove Jan 9, 2022, 5:11 PM

#

unborn temple @pearl grove try putting your file path in filename variable in that c...

okay I'll do that, thanks

pearl grove Jan 9, 2022, 5:13 PM

#

unborn temple @pearl grove try putting your file path in filename variable in that c...

    File "C:\Users\****\AppData\Local\Temp/ipykernel_20900/1996697643.py", line 1
      filename = "C:\Users\Mohammed Haris\Documents\Prog for DS\Coursework\datasets\2000.csv"
      ^
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

pearl grove Jan 9, 2022, 5:13 PM

#

unborn temple @pearl grove try putting your file path in filename variable in that c...

It didnt show error when i just typed in the actual file name

unborn temple Jan 9, 2022, 5:13 PM

#

pearl grove `File "C:\Users\****\AppData\Local\Temp/ipykernel_20900/1996697643.py", line 1 ...

https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python try this

Stack Overflow

UnicodeDecodeError when reading CSV file in Pandas with Python

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error...
File "C:\Importer\src\dfman\importer.py", line 26, in impo...

#

This might help you faster

#

    filename = r"C:\Users\Mohammed Haris\Documents\Prog for DS\Coursework\datasets\2000.csv"

unborn temple Jan 9, 2022, 5:16 PM

#

unborn temple filename = r"C:\Users\Mohammed Haris\Documents\Prog for DS\Cours...

This is the correct file path; the one you used is not
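
For context, both path errors in this thread come from backslash escapes in ordinary string literals. The sketch below uses placeholder folder names (not the real path) to show what Python does with them and three spellings that avoid the problem:

```python
from pathlib import PureWindowsPath

# In a normal (non-raw) literal, "\200" is an octal escape for one character,
# so the "\2001.csv" at the end of a path silently becomes "\x80" + "1.csv".
# That is a plausible reason for the earlier "File not found" error.
mangled = "\2001.csv"
print(len(mangled))  # 6, not 9

# "C:\Users" fails even earlier: "\U" must be followed by 8 hex digits,
# hence the "truncated \UXXXXXXXX escape" SyntaxError.

# Three spellings of the same path:
raw = r"C:\Users\example\datasets\2000.csv"         # raw string, single backslashes
escaped = "C:\\Users\\example\\datasets\\2000.csv"  # normal string, doubled backslashes
forward = "C:/Users/example/datasets/2000.csv"      # forward slashes

print(raw == escaped)                                    # True
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```

Note that a raw string takes single backslashes; doubling them inside `r"..."` puts two backslashes into the path.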

lapis sequoia Jan 9, 2022, 5:16 PM

#

Hii all

pearl grove Jan 9, 2022, 5:18 PM

#

unborn temple This the correct file path, the one you used is not

I tried that too but same error

pearl grove Jan 9, 2022, 5:18 PM

#

unborn temple This might help you faster

k lemme check that

untold yew Jan 9, 2022, 5:21 PM

#

I have 32 images for my object classifier. Is 30 for training and 2 for testing fine?
32 images of all objects if I might add

unborn temple Jan 9, 2022, 5:21 PM

#

untold yew I have 32 images for my object classifier. Is 30 for training and 2 for testing ...

Uhh, that depends on the problem you are working

untold yew Jan 9, 2022, 5:22 PM

#

unborn temple Uhh, that depends on the problem you are working

3 objects, just infront of a white background

unborn temple Jan 9, 2022, 5:22 PM

#

Is the problem hard or easy

untold yew Jan 9, 2022, 5:22 PM

#

not really hard

#

it has to decide what object it is

#

from 3 very different ones, in front of a white wall

#

the objects are a tennis ball, nasal spray and hand cream

#

which all look very different

#

(in color and size)

unborn temple Jan 9, 2022, 5:25 PM

#

Hmm, you could increase your dataset with methods like cropping the images (without cutting out the objects), changing the colours, or stretching the images

untold yew Jan 9, 2022, 5:25 PM

#

note, it doesnt need to be very good yet, im following a tutorial and this is the first one

unborn temple Jan 9, 2022, 5:25 PM

#

Are you using tensorflow?

untold yew Jan 9, 2022, 5:26 PM

#

yes

#

tf1

unborn temple Jan 9, 2022, 5:27 PM

#

Then wait a sec, I will tell you a preprocessing method

untold yew Jan 9, 2022, 5:27 PM

#

okay

#

I labeled all the images already btw

#

1 label, 1 picture | 32 times

unborn temple Jan 9, 2022, 5:28 PM

#

https://www.pyimagesearch.com/2021/06/28/data-augmentation-with-tf-data-and-tensorflow/

PyImageSearch

Adrian Rosebrock

Data augmentation with tf.data and TensorFlow - PyImageSearch

In this tutorial, you will learn two methods to incorporate data augmentation into your “tf.data” pipeline using Keras and TensorFlow.

#

Here are some methods to increase the dataset

untold yew Jan 9, 2022, 5:29 PM

#

I think it does augmentation automatically

#

atleast he said that

#

in the video

unborn temple Jan 9, 2022, 5:30 PM

#

Hmm, then, 25 for training, remaining for testing

untold yew Jan 9, 2022, 5:31 PM

#

btw someone just told me I should use 70/30 for training/testing
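
A 70/30 split of 32 images can be sketched with the standard library as below. The file names are hypothetical placeholders, and `split_dataset` is an illustrative helper rather than anything from the tutorial being followed:

```python
import random


def split_dataset(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the 32 labeled images.
images = [f"img_{i:02d}.jpg" for i in range(32)]
train, test = split_dataset(images, test_fraction=0.3)

print(len(train), len(test))  # 22 10
```

With so few images and 3 classes, it may be worth checking that every class shows up in the test set; a purely random split does not guarantee that.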

untold yew Jan 9, 2022, 5:31 PM

#

unborn temple Hmm, then, 25 for training, remaining for testing

okay I will do that

unborn temple Jan 9, 2022, 5:31 PM

#

untold yew btw someone just told me I should use 70/30 for training/testing

There are cases of training on 15 images and getting awesome results

#

It all depends on complexity

untold yew Jan 9, 2022, 5:33 PM

#

I see

#

I put all the images in, im gonna train now

unborn temple Jan 9, 2022, 5:34 PM

#

untold yew I put all the images in, im gonna train now

Are you fine tuning a trained model, or creating one from scratch?

untold yew Jan 9, 2022, 5:34 PM

#

unborn temple Are you fine tuning a trained model, or creating one from scratch?

fine tuning I think

#

cause I picked from model zoo

#

but not sure

#

im new to this

unborn temple Jan 9, 2022, 5:35 PM

#

It might take some time then

untold yew Jan 9, 2022, 5:35 PM

#

it will

#

it did when I tried yesterday too

#

but thats fine I planned that

unborn temple Jan 9, 2022, 5:35 PM

#

So okay, good luck on your project

untold yew Jan 9, 2022, 5:35 PM

#

thanks!

untold yew Jan 9, 2022, 6:22 PM

#

@unborn temple there doesnt seem to be a lot happening here except it having saved the first checkpoint 40 min ago

#

even though it is using 20% cpu

#

and directml has connected to my gpu

lapis sequoia Jan 9, 2022, 6:33 PM

#

Hello is it possible to compare the values of two columns of different length in two different dataframes?

df['habitants'] = habitants_df.loc[(habitants_df['Municipality'].apply(lambda x: str(x).split(' ', 1)[-1]) == df['cityTown']), 'Total']

Getting: raise ValueError("Can only compare identically-labeled Series objects") ValueError: Can only compare identically-labeled Series objects

#

df has approximately 53k rows whereas habitants_df has 501 rows

azure talon Jan 9, 2022, 6:40 PM

#

What is a great way to learn Python ML?

serene scaffold Jan 9, 2022, 7:13 PM

#

@lapis sequoia you can't compare series that don't have identical sets of indices. You have to use the eq method and specify a fill value

#

!docs pandas.Series.eq

arctic wedgeBOT Jan 9, 2022, 7:13 PM

#

pandas.Series.eq


```
Series.eq(other, level=None, fill_value=None, axis=0)
```
Return Equal to of series and other, element-wise (binary operator eq).

Equivalent to `series == other`, but with support to substitute a fill_value for missing data in either one of the inputs.
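
The goal in the question looks more like a lookup than an element-wise comparison, so a different technique may fit better: build a name-to-total mapping from the small frame and apply it to the large one with `Series.map`. This is a sketch under that assumption. The column names come from the question, but the values are made up:

```python
import pandas as pd

# Made-up stand-ins for the two frames in the question.
habitants_df = pd.DataFrame({
    "Municipality": ["001 Alpha", "002 Beta City", "003 Gamma"],
    "Total": [1200, 5300, 800],
})
df = pd.DataFrame({"cityTown": ["Beta City", "Alpha", "Alpha", "Delta"]})

# Strip the leading code from "Municipality", as the original lambda does.
names = habitants_df["Municipality"].astype(str).str.split(" ", n=1).str[-1]

# Build a name -> Total lookup (names must be unique), then map it onto df.
lookup = pd.Series(habitants_df["Total"].to_numpy(), index=names)
df["habitants"] = df["cityTown"].map(lookup)

print(df)
```

Rows whose `cityTown` has no match (like "Delta" here) end up as NaN, which makes unmatched names easy to spot.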

vale isle Jan 9, 2022, 7:15 PM

#

I get the error "You must compile your model before training/testing. Use model.compile(optimizer, loss)." on the last line

I compiled it... anyone spots my mistake?

model_2 = tf.keras.Sequential([
    data_augmentation ,
    layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2), strides=2),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2), strides=2),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2), strides=2),
    layers.Flatten(),
    layers.Dense(64,activation='relu'),
    layers.Dense(5, activation='softmax'),
])

model_2.compile(optimizer='adamax',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

history_3 = model_2.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs,
    callbacks=[es]
)


adamax_nodropout_es = model_2.evaluate(val_ds)

serene scaffold Jan 9, 2022, 7:16 PM

#

@vale isle you re defined model_2

lapis sequoia Jan 9, 2022, 7:16 PM

#

azure talon What is a great way to learn Python ML?

depends on many things, like your prior experience and learning style

vale isle Jan 9, 2022, 7:17 PM

#

serene scaffold @vale isle you re defined model_2

I copy pasted my code wrong from my notebook (its in different cells), edited it now

serene scaffold Jan 9, 2022, 7:18 PM

#

vale isle I copy pasted my code wrong from my notebook (its in different cells), edited it...

So you're using a notebook? Make sure you're executing the cells in the right order. Did you compile it in the same cell where it was defined?

vale isle Jan 9, 2022, 7:19 PM

#

serene scaffold So you're using a notebook? Make sure you're executing the cells in the right or...

i tried both, in the same cell and in a different cell

#

i'm going to restart and run the complete notebook. i guess i messed up with some copy pastes trying out different parameters

serene scaffold Jan 9, 2022, 7:21 PM

#

I'd have to look at the notebook and know the exact execution order. But I would restart the kernel and do everything in one cell until you're sure it's working

vale isle Jan 9, 2022, 7:21 PM

#

allright, thanks!

rose pasture Jan 9, 2022, 7:22 PM

#

Hi, can someone explain what happens when the transform() method is used please? I don't understand what it does and the purpose of it. I have an example here https://paste.pythondiscord.com/vubuyurewo.yaml

stone marlin Jan 9, 2022, 7:49 PM

#

In the case of the Scaler, the Scaler "fits" onto your data and gets a mean and a standard deviation (which are given by the mean_, scale_ parts below). It then computes the corresponding z-score for whatever you put in when you call transform.

In [1]: from sklearn.preprocessing import StandardScaler
In [2]: import numpy as np
In [3]: scaler = StandardScaler()
In [4]: x = np.array([10, 10, 10, 12, 10, 12, 9, 9, 8, 8]).reshape(-1, 1)
In [7]: scaler.fit(x)

In [10]: scaler.mean_
Out[10]: array([9.8])

In [11]: scaler.scale_
Out[11]: array([1.32664992])

In [13]: scaler.transform(np.array([9.8, 12, 15, 20]).reshape(-1, 1))
Out[13]:
array([[0.        ],
       [1.6583124 ],
       [3.91964748],
       [7.68853929]])

#

In general, there are two parts to most estimators / processors in Sklearn: fit and transform. Fit will take in training data and figure out what it needs to do with the data (compute means, stdevs, or, in the case of models, the model itself). Transform will transform the data into a new form to be used wherever. Scaling, for example, will scale the data nicely. PCA will apply PCA to the data.

Pipelining in sklearn makes heavy use of this and it might "click" better to see things in terms of pipelines:
https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

scikit-learn

sklearn.pipeline.Pipeline

Examples using sklearn.pipeline.Pipeline: Feature agglomeration vs. univariate selection Feature agglomeration vs. univariate selection, Pipeline ANOVA SVM Pipeline ANOVA SVM, Poisson regression an...

lapis sequoia Jan 9, 2022, 7:55 PM

#

Bayesian optimization is a little more sophisticated. It assumes that a specific probability distribution, which is typically a Gaussian distribution is underlying the performance of model architectures. So you use observations from tested architectures to constrain the probability distribution and guide the selection of the next option. This allows us to build up an architecture stochastically based on the test results and the constrained distribution.
I don't understand what "which is typically a Gaussian distribution is underlying the performance of model architectures" means.
Also, what does "to constrain the probability distribution and guide the selection of the next option" mean?

stone marlin Jan 9, 2022, 7:57 PM

#

Bayesian optimization is a little rough to understand. The gist (greatly simplified) is:

You assume that something has a certain distribution (like maybe a normal curve). But then you get more data and you notice it's a little skewed. So you update the distribution. Then the distribution tells you what to do next. You do it, you get more data which changes the distribution a bit more --- so you update it and continue.
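
The "update the distribution as data comes in" step can be sketched with a textbook normal-normal update. This is only the belief-update part, not a full Bayesian optimizer (those typically pair a surrogate model such as a Gaussian process with a rule for picking the next trial). The function name and all numbers are made up for illustration:

```python
def update_normal_belief(prior_mean, prior_var, observation, noise_var):
    """Return the posterior (mean, variance) after one noisy observation.

    Conjugate update for a normal prior and a normal likelihood
    with known noise variance.
    """
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    posterior_mean = posterior_var * (prior_mean / prior_var + observation / noise_var)
    return posterior_mean, posterior_var


# Start unsure (mean 0, variance 1), then observe two scores of 2.0.
mean, var = 0.0, 1.0
for score in [2.0, 2.0]:
    mean, var = update_normal_belief(mean, var, score, noise_var=1.0)
    print(f"belief after seeing {score}: mean={mean:.3f}, variance={var:.3f}")
```

Each observation pulls the mean toward the data and shrinks the variance, which is the sense in which the observations "constrain" the distribution.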

serene scaffold Jan 9, 2022, 8:20 PM

#

@lapis sequoia @vale isle did you both figure out what you were working on?

rose pasture Jan 9, 2022, 8:25 PM

#

stone marlin In general, there are two parts to most estimators / processors in Sklearn: fit ...

The transform method will transform the data into a new form that I need, by figuring it out by itself? The z-score is used to make sense of the given data?

stone marlin Jan 9, 2022, 8:28 PM

#

It does figure it out by itself given the data you put into the "fit" method. StandardScaler, for example, when you "fit" the data, it'll find the mean and stdev of that data. Then, whenever you "transform" new data, it'll scale it according to the mean and stdev it found.

The z-score is a standard thing in statistics that maps some value to a standard normal distribution. I think this site might explain what the z-score is and how it's used:
https://www.simplypsychology.org/z-score.html

The gist is, it tells you how many standard deviations away from the mean a data point is. So if you transform a number and get "1" back, it was 1 stdev above the mean. If you get -1.6 back, it was 1.6 stdev below the mean.
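
As a rough sketch of that calculation with plain NumPy (the helper name `z_score` and the sample values are made up; `np.std` defaults to the population standard deviation, which is also what StandardScaler uses):

```python
import numpy as np

train = np.array([1.0, 2.0, 3.0, 4.0])   # the data you "fit" on
mean, std = train.mean(), train.std()    # the two numbers a standard scaler learns

def z_score(x):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std

print(z_score(train))                    # z-scores of the training data itself
print(z_score(np.array([2.5, 5.0])))     # new values scaled with the same mean/std
```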

rose pasture Jan 9, 2022, 8:35 PM

#

stone marlin It does figure it out by itself given the data you put into the "fit" method. S...

Is there a way to find the range of the stdev transformed data? Thank you once again for the great explanation! I understand it much better now.

stone marlin Jan 9, 2022, 8:36 PM

#

No problem, if you haven't done stats in a while it can be a bit confusing, and you slowly get used to the fit-transform stuff in sklearn.

What do you mean by the range?

rose pasture Jan 9, 2022, 8:42 PM

#

stone marlin No problem, if you haven't done stats in a while it can be a bit confusing, and ...

Yeah haven't done stats in a while. I might have to go through my notes on stats again just to refresh my memory.

Like the min max of the stdev of the given data.

stone marlin Jan 9, 2022, 8:43 PM

#

Like, a confidence interval? The mean and standard deviation of data are both floats. For example,

In [14]: a = np.array([1, 2, 3, 4])
In [15]: a.mean()
Out[15]: 2.5
In [16]: a.std()
Out[16]: 1.118033988749895

rose pasture Jan 9, 2022, 8:54 PM

#

stone marlin Like, a confidence interval? The mean and standard deviation of data are both f...

Let's say you transform that np.array([1,2,3,4]).reshape(-1,1)
you'd get this:
https://paste.pythondiscord.com/iyerurihax.lua
So the range of the transformed data would be from -1.34164079 to 1.34164079, how do you go about finding that?

azure talon Jan 9, 2022, 8:54 PM

#

lapis sequoia depends on many things, like your prior experience and learning style

No experience, and learning style being hands on, so I guess through projects.

stone marlin Jan 9, 2022, 8:55 PM

#

Well, you could find the range of the transformed training data by doing .max(), .min(), but there won't be a max or a min in general. The StandardScaler can transform a number to any number between -infty and infty.
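
A small sketch of both halves of that answer, assuming scikit-learn is available. The "new" values are invented to show that data outside the training range maps outside the training min/max.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([1, 2, 3, 4], dtype=float).reshape(-1, 1)
scaler = StandardScaler().fit(train)

scaled_train = scaler.transform(train)
print(scaled_train.min(), scaled_train.max())   # range of the transformed training data

new = np.array([[-50.0], [100.0]])              # made-up values far outside the training data
print(scaler.transform(new).ravel())            # these land far outside that range
```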

rose pasture Jan 9, 2022, 8:57 PM

#

stone marlin Well, you could find the range of the transformed training data by doing `.max()...

Oh i see so there's no point in finding that. Thanks!!

lapis sequoia Jan 9, 2022, 9:08 PM

#

serene scaffold <@456226577798135808> you can't compare series that don't have identical sets of...

Thanks for the clarification, Can I set a multi-index on the columns that I want to compare within the dataframe that has fewer rows and then perform a .loc with column values of the other dataframe? This is what I mean:

multi_indexed_habitants_df = habitants_df.set_index(['Municipality', 'Period']).sort_index(ascending=True)
df['habitants'] = multi_indexed_habitants_df.loc[(df['cityTown'], pd.to_datetime(df['startDate'], format='%Y-%m-%dT%H:%M:%S').dt.year), 'Total']

serene scaffold Jan 9, 2022, 9:10 PM

#

lapis sequoia Thanks for the clarification, Can I set a multi-index on the columns that I want...

wow, there's a lot going on here. can you run this and show me the resultant text?

for d in (df, habitants_df):
    print(d.head().to_dict('list'), d.index)

#

Please ping me when you've done that.

#

also, please run it before the code that you show me, so that I can see what the data looks like before this

lapis sequoia Jan 9, 2022, 9:12 PM

#

serene scaffold Please ping me when you've done that.

Sure

slow vigil Jan 9, 2022, 9:16 PM

#

Is there any way to use a function that returns two separate values with rolling.apply() ?

#

in pandas

#

df['A'], df['B'] = series.rolling(7).apply(my_func)

#

something like this?

serene scaffold Jan 9, 2022, 9:18 PM

#

slow vigil `df['A'], df['B'] = series.rolling(7).apply(my_func)`

when you apply a function, don't call it. you want to pass the actual function

slow vigil Jan 9, 2022, 9:18 PM

#

ohh ok, but can I use one that returns two values

serene scaffold Jan 9, 2022, 9:19 PM

#

and no, a function applied with rolling has to return a numeric type.

slow vigil Jan 9, 2022, 9:19 PM

#

oh I know what I could do

#

I could apply rolling inside the function to two separate series

#

and then return them

#

or

#

no

serene scaffold Jan 9, 2022, 9:20 PM

#

roller = series.rolling(7)
df['A'], df['B'] = roller.apply(my_func), roller.apply(other_func)
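
A runnable version of that idea, as a sketch: `my_func` and `other_func` here are stand-ins (the real ones were never shown), and the window of 3 is only to keep the toy output short. Each function receives one window and returns a single number.

```python
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

def my_func(window):
    # hypothetical statistic: spread of the window
    return window.max() - window.min()

def other_func(window):
    # hypothetical statistic: mean of the window
    return window.mean()

df = pd.DataFrame(index=series.index)
roller = series.rolling(3)
# Pass the functions themselves (no parentheses); each call yields one column.
df["A"], df["B"] = roller.apply(my_func), roller.apply(other_func)
print(df)
```

The first two rows are NaN because a full window of 3 is not available yet.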

slow vigil Jan 9, 2022, 9:20 PM

#

you know what I mean

serene scaffold Jan 9, 2022, 9:21 PM

#

@lapis sequoia still working on it?

lapis sequoia Jan 9, 2022, 9:22 PM

#

serene scaffold <@456226577798135808> still working on it?

I have it I'm just trying to pass it prettier

serene scaffold Jan 9, 2022, 9:22 PM

#

lapis sequoia I have it I'm just trying to pass it prettier

it doesn't matter as I'm going to copy and paste it into a REPL

vale isle Jan 9, 2022, 9:24 PM

#

serene scaffold <@456226577798135808> <@!65514056395530240> did you both figure out what you wer...

Wow, thank you for asking! Yes i solved it, thanks for your help :))

lapis sequoia Jan 9, 2022, 9:24 PM

#

serene scaffold it doesn't matter as I'm going to copy and paste it into a REPL

There u have it

#

{'incidenceId': [56125, 56123, 56122, 56121, 56120], 'sourceId': [1, 1, 1, 1, 1], 'incidenceType': ['Accidente', 'Accidente', 'Accidente', 'Seguridad vial', 'Seguridad vial'], 'autonomousRegion': ['Euskadi', 'Euskadi', 'Euskadi', 'Euskadi', 'Euskadi'], 'province': ['BIZKAIA', 'ARABA', 'GIPUZKOA', 'BIZKAIA', 'GIPUZKOA'], 'carRegistration': ['BI', 'VI', 'SS', 'BI', 'SS'], 'cause': ['Alcance', 'Alcance', 'Alcance', 'Averï¿½a', 'Averï¿½a'], 'cityTown': ['Zeanuri', 'Vitoria-Gasteiz', 'Eskoriatza', 'Barakaldo', 'Villabona'], 'startDate': ['2022-01-08T13:21:08', '2022-01-08T13:36:43', '2022-01-08T13:38:31', '2022-01-08T12:27:20', '2022-01-08T10:52:05'], 'incidenceLevel': ['Green', 'Yellow', 'Yellow', 'Green', 'Green'], 'road': ['N-240', 'N-622', 'AP-1', 'A-8', 'N-1'], 'pkStart': [39.0, 5.0, 121.0, 124.0, 444.0], 'pkEnd': [39.0, 5.0, 121.0, 124.0, 444.0], 'direction': ['BILBAO', 'BILBAO', 'Madrid', 'CANTABRIA', 'IRï¿½N'], 'latitude': [43.06164, 42.88119, 43.01685, 43.28869, 43.19736],
'longitude': [-2.70798, -2.69503, -2.541243, -3.0047, -2.0412], 'incidenceName': [None, None, None, None, None], 'endDate':
[None, None, None, None, None]}
Int64Index([ 0, 2, 3, 4, 5, 7, 8, 9, 10,
11,
...
57334, 57335, 57337, 57338, 57339, 57340, 57341, 57343, 57344,
57345],
dtype='int64', length=43493)
{'Municipios': ['Agurain/Salvatierra', 'Agurain/Salvatierra', 'Alegrï¿½a-Dulantzi', 'Alegrï¿½a-Dulantzi', 'Amurrio'], 'Sexo': ['Total', 'Total', 'Total', 'Total', 'Total'], 'Periodo': [2021, 2020, 2021, 2020, 2021], 'Total': ['5.029', '5.038', '2.925', '2.935', '10.307'], 'Provincia': ['Araba', 'Araba', 'Araba', 'Araba', 'Araba']}
RangeIndex(start=0, stop=502, step=1)

serene scaffold Jan 9, 2022, 9:24 PM

#

thanks 😄

#

@lapis sequoia wow, are you Basque?

lapis sequoia Jan 9, 2022, 9:26 PM

#

Yes, I am

serene scaffold Jan 9, 2022, 9:27 PM

#

wow, I've never met a Basque person surprisedPika

upbeat dove Jan 9, 2022, 9:27 PM

#

Can I make a Sequential take any length of an input (for an RNN network)

serene scaffold Jan 9, 2022, 9:27 PM

#

pretty sure you have to use something like an RNN for that, but I'm not completely sure

upbeat dove Jan 9, 2022, 9:28 PM

#

Yeah I am

#

My nn will consist of Embedding -> LSTM (no return sequences) -> Dense

serene scaffold Jan 9, 2022, 9:28 PM

#

@lapis sequoia can you explain what comparison you're trying to do? population for towns at each period?

lapis sequoia Jan 9, 2022, 9:28 PM

#

serene scaffold <@456226577798135808> can you explain what comparison you're trying to do? popul...

Sure

lapis sequoia Jan 9, 2022, 9:29 PM

#

azure talon No experience, and learning style being hands on, so I guess through projects.

Kaggle is the place to go then, they have both free courses and lots of projects https://www.kaggle.com/learn/intro-to-machine-learning

Learn Intro to Machine Learning Tutorials

Learn the core ideas in machine learning, and build your first models.

upbeat dove Jan 9, 2022, 9:29 PM

#

upbeat dove My nn will consist of `Embedding` -> `LSTM (no return sequences)` -> `Dense`

Since I'm not returning a sequence, in theory, I should be able to give as long of an input as I please

lapis sequoia Jan 9, 2022, 9:31 PM

#

Basically I'm trying to create a new column called habitants within the incidences dataframe that contains the number of habitants of that particular city where the incidence took place, for that I compare Municipality and cityTown which are the same but with different names for each dataframe and then I extract the year from the startDate column and compare it to the Period column which is just the year.

serene scaffold Jan 9, 2022, 9:32 PM

#

lapis sequoia Basically I'm trying to create a new column called `habitants` within the incide...

oh okay, you just have to do a merge operation

#

so first I did df['year'] = pd.to_datetime(df['startDate'], format='%Y-%m-%dT%H:%M:%S').dt.year

lapis sequoia Jan 9, 2022, 9:33 PM

#

serene scaffold oh okay, you just have to do a `merge` operation

Maybe it could be accomplished easily as you said with a merge

lapis sequoia Jan 9, 2022, 9:33 PM

#

serene scaffold so first I did `df['year'] = pd.to_datetime(df['startDate'], format='%Y-%m-%dT%H...

I see

#

With that you have a new column with the year within incidences df

serene scaffold Jan 9, 2022, 9:33 PM

#

!docs pandas.DataFrame.merge

arctic wedgeBOT Jan 9, 2022, 9:33 PM

#

pandas.DataFrame.merge


DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

serene scaffold Jan 9, 2022, 9:35 PM

#

So you need to pick which columns represent the same thing in both dataframes

#

which one means the same thing as Municipios?

#

cityTown? or province?

lapis sequoia Jan 9, 2022, 9:36 PM

#

serene scaffold which one means the same thing as Municipios?

cityTown

serene scaffold Jan 9, 2022, 9:36 PM

#

okay great

#

so you will have something like left_on=['Municipios', 'Periodo'], right_on=['cityTown', 'year']

#

to show which columns are used to match the rows.
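
One way the merge could look, as a sketch on toy data (the actual solution was not posted in the chat). The rows are made up, the incidences frame is used as the left side so `left_on` / `right_on` are swapped relative to the message above, and the conversion of `Total` assumes the "." is a thousands separator, as the sample output suggests.

```python
import pandas as pd

# Toy stand-ins for the two frames (values are made up / abbreviated).
df = pd.DataFrame({
    "cityTown": ["Amurrio", "Amurrio", "Zeanuri"],
    "startDate": ["2021-03-01T10:00:00", "2020-07-15T08:30:00", "2021-01-08T13:21:08"],
})
habitants_df = pd.DataFrame({
    "Municipios": ["Amurrio", "Amurrio"],
    "Periodo": [2021, 2020],
    "Total": ["10.307", "10.300"],   # strings with "." as thousands separator
})

df["year"] = pd.to_datetime(df["startDate"], format="%Y-%m-%dT%H:%M:%S").dt.year
habitants_df["habitants"] = (
    habitants_df["Total"].str.replace(".", "", regex=False).astype(int)
)

merged = df.merge(
    habitants_df[["Municipios", "Periodo", "habitants"]],
    how="left",                                # keep every incidence, matched or not
    left_on=["cityTown", "year"],
    right_on=["Municipios", "Periodo"],
)
print(merged[["cityTown", "year", "habitants"]])
```

With `how="left"`, towns or years missing from the population table end up with NaN in `habitants` instead of being dropped.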

lapis sequoia Jan 9, 2022, 9:38 PM

#

I see and then once I do the merge I can query those columns as usual

serene scaffold Jan 9, 2022, 9:38 PM

#

yes

lapis sequoia Jan 9, 2022, 9:38 PM

#

spicy move not gonna lie

#

Well I'm gonna implement that and see what I get

serene scaffold Jan 9, 2022, 9:39 PM

#

I have to go, but I wrote down the solution. So see if you can figure it out

#

if you try and can't figure it out, I will show you.

lapis sequoia Jan 9, 2022, 9:40 PM

#

serene scaffold I have to go, but I wrote down the solution. So see if you can figure it out

Yeah sure I will try it tomorrow. Thank you very much

#

If you wanna learn basque let me know😅

stone marlin Jan 9, 2022, 10:56 PM

#

This isn't really data science or ai, you may have better luck in one of the help channels, Viking.

eager cloak Jan 9, 2022, 11:01 PM

#

stone marlin This isn't really data science or ai, you may have better luck in one of the hel...

mb wasnt supposed to send it here xD

#

i tried clicking on #discord-bots and mustve accidentally clicked here

lapis sequoia Jan 9, 2022, 11:38 PM

#

stone marlin Bayesian optimization is a little rough to understand. The gist (greatly simpli...

Thanks for the answer

#

When the distance between observations grows, supervised learning becomes more difficult because predictions for new samples are less likely to be based on learning from similar training examples.
What is meant by "observations" here?

#

Is that input variables?

cursive onyx Jan 10, 2022, 12:20 AM

#

can someone send the source code of a premade chatbot AI? would mean a lot

#

in python

proper wren Jan 10, 2022, 12:49 AM

#

Who wants to work on a project involving ML and AI?

serene scaffold Jan 10, 2022, 1:13 AM

#

proper wren Who wants to work on a project involving ML and AI?

Why don't you say what the project is? Remember that you can't recruit for closed-source projects or business ventures.

proper wren Jan 10, 2022, 1:15 AM

#

serene scaffold Why don't you say what the project is? Remember that you can't recruit for close...

Oh. Well, we're trying in the end to make like an alexa-type robot and soon have hardware for it... but i guess that counts as closed-source?

serene scaffold Jan 10, 2022, 1:15 AM

#

proper wren Oh. Well, we're trying in the end to make like an alexa-type robot and soon have ...

if you do it on github, then it's open-source

proper wren Jan 10, 2022, 1:15 AM

#

serene scaffold if you do it on github, then it's open-source

Ok. We are doing it on github

serene scaffold Jan 10, 2022, 1:15 AM

#

in a public repository? link?

proper wren Jan 10, 2022, 1:16 AM

#

serene scaffold in a public repository? link?

getting it from dekriel

serene scaffold Jan 10, 2022, 1:16 AM

#

who is dekriel

proper wren Jan 10, 2022, 1:17 AM

#

https://github.com/Dekriel/pocket

GitHub

GitHub - Dekriel/pocket: socket but we didn't bother using it

socket but we didn't bother using it. Contribute to Dekriel/pocket development by creating an account on GitHub.

proper wren Jan 10, 2022, 1:17 AM

#

serene scaffold who is dekriel

@stark kiln

#

not much progress yet tho

earnest fog Jan 10, 2022, 1:24 AM

#

Why is the csv file being read like this instead of displaying the columns and rows nicely, or is the csv file broken?

#

Like this

earnest fog Jan 10, 2022, 1:25 AM

#

earnest fog Like this

This is the desired outcome

#

not the other

proper wren Jan 10, 2022, 1:26 AM

#

earnest fog Why is the csv file being read like this instead of displaying the columns and r...

Maybe something went wrong with the code? or the website formatting is different?

earnest fog Jan 10, 2022, 1:27 AM

#

data_files = [file for file in os.listdir('./dataframes')]

both_subjects = pd.DataFrame()

for file in data_files:
    df = pd.read_csv('./dataframes/' + file)
    both_subjects = pd.concat([both_subjects, df])

both_subjects.head()

#

this is the code

proper wren Jan 10, 2022, 1:28 AM

#

hm

#

i think thats the website

earnest fog Jan 10, 2022, 1:30 AM

#

Any idea how to fix it?

serene scaffold Jan 10, 2022, 1:53 AM

#

earnest fog ```py data_files = [file for file in os.listdir('./dataframes')] both_subjects ...

try this:

both_subjects = pd.concat((pd.read_csv(f'./dataframes/{file}', sep=';') for file in data_files), axis=1)

#

your separator appears to be a ;, whereas , is the default.

#

also, you should avoid design like both_subjects = pd.concat([both_subjects, df]). this involves copying every single cell of both_subjects every time, and is thus horribly inefficient
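
A self-contained sketch of the "read everything, concatenate once" pattern with a `;` separator. In-memory strings stand in for the files in `./dataframes` (their contents are made up), and the frames are stacked row-wise like the original loop did; pass `axis=1` instead only if the files should sit side by side as columns.

```python
import io
import pandas as pd

# Two in-memory stand-ins for the CSV files (contents are made up).
fake_files = [
    io.StringIO("name;score\nana;1\nben;2\n"),
    io.StringIO("name;score\ncarl;3\n"),
]

# Read every file first, then concatenate once instead of growing a frame in a loop.
frames = [pd.read_csv(f, sep=";") for f in fake_files]
both_subjects = pd.concat(frames, ignore_index=True)
print(both_subjects)
```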

midnight fossil Jan 10, 2022, 2:01 AM

#

Hi, I'm trying to add the column "Days" to the column "Received". However, idk how to make Python add it as days within the month, e.g. 7 + 06.12.21 = 13.12.21

#

The library im using is pandas
Any help would be really appreciated

serene scaffold Jan 10, 2022, 2:02 AM

#

@midnight fossil one column is of floating point numbers, and one is of strings. are you trying to do addition or string concatenation?

#

oh I see. they're both wrong.

#

Days needs to be a timedelta, and Received needs to be a datetime.

midnight fossil Jan 10, 2022, 2:03 AM

#

I've heard of the datetime library before

#

but im not quite sure what you mean by that

#

sorry im a bit of a noob

#

like from dateutil.relativedelta import relativedelta?

serene scaffold Jan 10, 2022, 2:04 AM

#

a datetime is an exact time in the real world ("16.12.21"), and a timedelta is a duration ("7 days")

midnight fossil Jan 10, 2022, 2:05 AM

#

Ok, thanks. I guess I'll watch a video on how to convert "Received" to a datetime

serene scaffold Jan 10, 2022, 2:05 AM

#

can you do print(df.head().to_dict('list')) and copy and paste the text into this chat?

#

I will not accept a screenshot

midnight fossil Jan 10, 2022, 2:06 AM

#

import pandas as df
df = df.read_excel('C:/Users/enesi/OneDrive/Amazon accounting.xlsx')
Total = df['Days '] + df['Received']
print(df.head().to_dict('list'))

serene scaffold Jan 10, 2022, 2:06 AM

#

you have to copy and paste the result of the print statement, not the code.

midnight fossil Jan 10, 2022, 2:06 AM

#

oh, my bad😆

serene scaffold Jan 10, 2022, 2:06 AM

#

C:/Users/enesi/OneDrive/Amazon accounting.xlsx is not a file that I have, so this is how I can figure out what is in it

#

in a way that is useful

midnight fossil Jan 10, 2022, 2:06 AM

#

result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for +: 'float' and 'str'

serene scaffold Jan 10, 2022, 2:07 AM

#

do it immediately after you define df

#

before the line that causes the error

midnight fossil Jan 10, 2022, 2:08 AM

#

'Amazon product testing': [1, 2, 3, 4, 5], 'Platform': ['Telegram', 'Telegram', 'Telegram', 'Telegram', 'Telegram'], 'Seller name': ['Mikki Mikki', 'Wangzekun', 'Chari', 'E S', 'Mikki Mikki'], 'Order date': ['03.12.21', '03.12.21', '04.12.21', '05.12.21', '05.12.21'], 'Order num': ['302-9210629-2828348', '302-8699596-8046740', '302-4841467-2222755', '302-4797916-9617919', '302-1684012-4898734'], 'product name': ['Uhrenladegerät', 'PC controller', 'GROJAT smart uhr', 'Aufsteckbürsten', 'Styles pen'], 'price': ['12,99', '27,99', '45,99', '7,99', '27,98'], 'Days ': [7.0, 5.0, 7.0, 5.0, 7.0], 'Received': ['06.12.21', '08.12.21', '08.12.21', '08.12.21', '08.12.21'], 'Reviewed': [1.0, 1.0, 1.0, 1.0, 1.0], 'refund status': [1.0, 1.0, 1.0, 1.0, 1.0], 'sold price': [nan, nan, nan, nan, nan], 'Add. Info': [nan, nan, 'picture', nan, nan], 'Trustworthy': ['YES', 'YES', 'YES', 'YES', nan]}

#

it just lists everything in the excel file

serene scaffold Jan 10, 2022, 2:09 AM

#

There's no opening {, but I'm going to assume that's the only character that's missing.

midnight fossil Jan 10, 2022, 2:09 AM

#

yeah

serene scaffold Jan 10, 2022, 2:09 AM

#

If there are missing columns in this, I will never be able to figure that out if you don't tell me.

midnight fossil Jan 10, 2022, 2:09 AM

#

forgot to copy that

#

everything is there except the {

serene scaffold Jan 10, 2022, 2:17 AM

#

>>> df2 = df.assign(
    **{
        'Order date': pd.to_datetime(df['Order date'], format='%d.%m.%y'),  # Convert `Order date` (DD.MM.YY) to a timestamp
        'Days': pd.to_timedelta(df['Days '], unit='D')  # convert `Days ` to a duration, without the extra space
    }
).drop('Days ', axis=1)  # Drop the column that has the extra space

@midnight fossil look at this

midnight fossil Jan 10, 2022, 2:17 AM

#

Damn

#

Thanks a lot

#

quick question, what do the 2 ** mean?

serene scaffold Jan 10, 2022, 2:18 AM

#

good question! it's an esoteric python thing.
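
For what the two asterisks do, a tiny plain-Python sketch (the function `describe` and its arguments are made up): `**` in front of a dict unpacks it into keyword arguments. It is handy with `assign` because a column name like `Order date` contains a space and so cannot be written as a normal keyword.

```python
def describe(name, days):
    return f"{name}: {days} days"

kwargs = {"name": "order", "days": 7}

# These two calls are equivalent: ** unpacks the dict into keyword arguments.
a = describe(name="order", days=7)
b = describe(**kwargs)
print(a == b)
```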

midnight fossil Jan 10, 2022, 2:18 AM

#

midnight fossil Jan 10, 2022, 2:19 AM

#

serene scaffold good question! it's an esoteric python thing.

oh, cool

serene scaffold Jan 10, 2022, 2:19 AM

#

yes, I know what two asterisks you're referring to, lol

#

also remove the >>> as those are part of a REPL

#

which is a type of python environment

#

you're probably not using that.

midnight fossil Jan 10, 2022, 2:19 AM

#

Yeah, i figured.

#

Thanks again for the help

serene scaffold Jan 10, 2022, 2:20 AM

#

anyway, from there, Days and Order date are in formats that you can work with as actual time representations

#

since Order date becomes an exact timestamp, you can add durations to it and get an updated timestamp
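
Applied to the original question (adding `Days ` to `Received`), a sketch might look like this. The two rows are copied from the pasted data, the date format is assumed to be DD.MM.YY (which matches the 7 + 06.12.21 = 13.12.21 example), and `Due` is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({
    "Days ": [7.0, 5.0],                  # note the trailing space, as in the sheet
    "Received": ["06.12.21", "08.12.21"],
})

received = pd.to_datetime(df["Received"], format="%d.%m.%y")  # strings -> timestamps
days = pd.to_timedelta(df["Days "], unit="D")                 # floats -> durations
df["Due"] = received + days               # timestamp + duration = later timestamp
print(df["Due"].dt.strftime("%d.%m.%y").tolist())
```

Giving an explicit `format` avoids the day and month being swapped when the day is 12 or lower.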

midnight fossil Jan 10, 2022, 2:28 AM

#

I still got this error for some reason:

#

I think its because you created a dataframe but I dont need a dataframe since im trying to get python to read off of an excel document

serene scaffold Jan 10, 2022, 8:13 AM

#

midnight fossil I think its because you created a dataframe but I dont need a dataframe since im...

you did import pandas as df when you should have done import pandas as pd. if you import pandas as df, and then immediately name your first DataFrame df, then you won't be able to use the to_datetime or to_timedelta functions.

You are creating a DataFrame of the excel document.

tall drum Jan 10, 2022, 9:46 AM

#

Hi, I have three numpy arrays, x and y coordinates and deflections in z coordinate, each of size about 2500. I tried to do meshgrid for x/y coordinates which was successful but somehow when I try to do np.meshgrid(x,y,z) it uses all the memory on my machine (180GB).
Is this normal or is there a better way maybe?

wicked grove Jan 10, 2022, 10:37 AM

#

upbeat prism <@!696373334119546890> I'm no expert myself and no idea about the time since I n...

Heyy i just saw this
Thank you soo much for the detailed answer
I do have a gpu on my laptop but idk how powerful it is
Yes i will check CUDA out,but i switched to google colab and i really hope that works
Ah okay you are right i ran out of memory ( like my c drive was full)

#

I checked it ,apparently there is intel(R) UHD 620 and radeon graphics (I'm kinda confused about this)

void peak Jan 10, 2022, 11:43 AM

#

wicked grove Heyy i just saw this Thank you soo much for the detailed answer I do have a g...

Dam, how does that device work with that

severe folio Jan 10, 2022, 11:45 AM

#

Would it be unwise to try and learn how to do machine learning/ai with python (I have only probably written 400 lines of code max)

lapis sequoia Jan 10, 2022, 12:07 PM

#

severe folio Would it be unwise to try and learn how to do machine learning/ai with python (I...

in general I would advise this:
1 learn Python basics
2 small projects with only Python (preferably no frameworks)
3 learn data science basics, think Numpy, Pandas and Matplotlib
4 learn ML/AI
because it is an advanced topic

#

this is just my two cents, I'm still somewhere at point 2/3 myself

severe folio Jan 10, 2022, 12:08 PM

#

lapis sequoia in general I would advise this: 1 learn Python basics 2 small projects with only...

Thanks 👍

warm jungle Jan 10, 2022, 12:08 PM

#

wicked grove I checked it ,apparently there is intel(R) UHD 620 and radeon graphics (I'm kind...

It's not uncommon for computers to have a graphics chip on the motherboard as well as another GPU

lapis sequoia Jan 10, 2022, 12:09 PM

#

severe folio Thanks 👍

you're welcome !

gentle lion Jan 10, 2022, 12:15 PM

#

I have a neural network that should try to classify images of chairs into 18 different classes. More specifically, given an image of a chair, it should be able to predict its rotation. Each class is a certain rotation degree of a chair. For example, as you can see in the image, all chairs in the class "260 degrees" are rotated 260 degrees. I have the same 1.8k images in each class, but rotated differently of course. Now after 16 epochs, my model's accuracy has not improved. It's still guessing with an accuracy of 5.5% (which is random guessing if there are 18 classes). I'll also include my neural network's structure. Does anyone know some key things to make this better? (maybe even a different approach, as it should ideally predict more angles and not only angles divisible by 20)

night gorge Jan 10, 2022, 12:48 PM

#

I plotted a boxplot using seaborn library.
For "Iris-setosa",
Q1 is 3.1,
Q2 is 3.4,
Q3 is 3.6
IQR value is Q3-Q1 = 0.5

According to theory, lower and upper plot whiskers(extended thin line) should be on (Q1 - 1.5xIQR) and (Q3 + 1.5xIQR) respectively.
That would give us values 2.35 and 4.35 respectively.

But as you can see in the box plot for "Iris-setosa", the starting and ending points of the whiskers are not at those values. I have also specified that the whisker value should be 1.5 when calling boxplot. Why does this happen?

#

wicked grove Jan 10, 2022, 1:37 PM

#

Can someone please tell me how i can increase the ram on google colab

serene scaffold Jan 10, 2022, 1:46 PM

#

tall drum Hi, I have three numpy arrays, x and y coordinates and deflections in z coordina...

is it possible to do it as a sparse array?
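
Some back-of-the-envelope arithmetic for why the three-way meshgrid explodes, plus a small sketch of NumPy's own `sparse=True` option. The array sizes in the demo are shrunk to 4 and 5, and float64 values are assumed.

```python
import numpy as np

n = 2500
# Dense np.meshgrid(x, y, z) returns three arrays of n**3 float64 values each.
dense_bytes = 3 * n**3 * 8
print(f"{dense_bytes / 1e9:.0f} GB")

# Small demo: sparse=True returns thin, broadcastable arrays instead of full grids.
x = np.linspace(0.0, 1.0, 4)
y = np.linspace(0.0, 1.0, 5)
xs, ys = np.meshgrid(x, y, sparse=True)
print(xs.shape, ys.shape)
grid = xs + ys                 # broadcasting still yields the full (5, 4) grid on demand
```

If each z value is a deflection measured at one (x, y) point, z is data on the grid rather than a third axis, so a 3-D meshgrid may not be needed at all.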

serene scaffold Jan 10, 2022, 1:48 PM

#

wicked grove Can someone please tell me how i can increase the ram on google colab

I found two articles confirming that you can use this weird trick: https://towardsdatascience.com/double-your-google-colab-ram-in-10-seconds-using-these-10-characters-efa636e646ff

Medium

Double Your Google Colab RAM in 10 Seconds Using a Tiny Line of Code

Also, a coding challenge!

#

though keep in mind that as a free service, they're not going to keep granting you more ram indefinitely

#

also, if you're using a deep learning library, I would confirm that your tensors are actually on the GPU and not in the CPU.

wicked grove Jan 10, 2022, 1:56 PM

#

serene scaffold I found two articles confirming that you can use this weird trick: https://towar...

i read these articles and tried their code but ig colab changed their policy cause that option isnt there anymore

wicked grove Jan 10, 2022, 1:57 PM

#

serene scaffold also, if you're using a deep learning library, I would confirm that your tensors...

yes i am using tensorflow...but all my data is loaded in numpy arrays

serene scaffold Jan 10, 2022, 1:58 PM

#

wicked grove yes i am using tensorflow...but all my data is loaded in numpy arrays

why don't you convert them to cuda tensors?

wicked grove Jan 10, 2022, 1:59 PM

#

and run it on my local pc?yes somebody suggested that to me, but idk if cuda works for radeon graphics

serene scaffold Jan 10, 2022, 2:00 PM

#

wicked grove and run it on my local pc?yes somebody suggested that to me, but idk if cuda wor...

no, in colab. if I understand correctly, you're not using the GPU in colab

#

numpy arrays can't go on the GPU, so they'll just take up all your RAM.

wicked grove Jan 10, 2022, 2:00 PM

#

yeah i am not even tho i have activated it

serene scaffold Jan 10, 2022, 2:00 PM

#

so that's probably the problem

wicked grove Jan 10, 2022, 2:00 PM

#

serene scaffold numpy arrays can't go on the GPU, so they'll just take up all your RAM.

ohhh

serene scaffold Jan 10, 2022, 2:00 PM

#

use cuda tensors. in colab.

wicked grove Jan 10, 2022, 2:00 PM

#

serene scaffold so that's probably the problem

yes you are rightt

wicked grove Jan 10, 2022, 2:01 PM

#

serene scaffold use cuda tensors. in colab.

yeah somebody suggested CUDA yesterday...do you think it is better to use my local pc (has 8gb ram) or do everything on colab?

wicked grove Jan 10, 2022, 2:02 PM

#

serene scaffold use cuda tensors. in colab.

alrightt

#

ill google a few videos and try it,thank youu!!

#

@serene scaffold can i use sklearn's train_test_split on tensors?

serene scaffold Jan 10, 2022, 2:06 PM

#

wicked grove <@!253696366952316929> can i use sklearn's train_test_split on tensors?

are you using pytorch or tensorflow

wicked grove Jan 10, 2022, 2:06 PM

#

tensorflow

serene scaffold Jan 10, 2022, 2:07 PM

#

there's a few solutions given here: https://stackoverflow.com/questions/41859605/split-tensor-into-training-and-test-sets

Stack Overflow

Split tensor into training and test sets

Let's say I've read in a textfile using a TextLineReader. Is there some way to split this into train and test sets in Tensorflow? Something like:

def read_my_file_format(filename_queue):
reader ...

untold yew Jan 10, 2022, 2:16 PM

#

this code would work locally and just detect objects I hold infront of the webcam. However, I am doing it in google colab and it just stops after 0 seconds and never shows anything. I suppose that has to do with the capture device not being accessed. How do I access my webcam from google colab with opencv?

wicked grove Jan 10, 2022, 2:22 PM

#

serene scaffold there's a few solutions given here: https://stackoverflow.com/questions/41859605...

Thank youu!! I have another question
On colab i will use the gpu for the tensors and then change the runtime to none otherwise ?

serene scaffold Jan 10, 2022, 2:22 PM

#

wicked grove Thank youu!! I have another question On colab i will use the gpu for the tensor...

I've never actually used colab, so I'm not completely sure.

night gorge Jan 10, 2022, 3:55 PM

#

night gorge I plotted a boxplot using seaborn library. For "Iris-setosa", Q1 is 3.1, Q2 i...

anyone?

lapis sequoia Jan 10, 2022, 4:27 PM

#

Hey guys, I have two concerns in the graph. Could someone guide me.

1. How do I make the years (x-axis) tilt when I'm using plt.stackplot?
2. What scale should be used to make the values on the y-axis more interpretable?

#

gentle lion Jan 10, 2022, 5:39 PM

#

I am trying to make a neural network that can predict the rotation of a chair. I have right now 18 different categories that each contain 1.7k chair images of different degrees ( so 1 category contains 1.7k images of chairs rotated 20 degrees, another for 40 degrees and so on)

#

Does anyone know some CNN structure to make this work?

gentle lion Jan 10, 2022, 5:41 PM

#

gentle lion I have a neural network that should try to classify images of chairs into 18 dif...

I have tried something here which did not work

wicked grove Jan 10, 2022, 7:08 PM

#

i have an array as 512,512 how can i reshape it to 512,512,1

fair oracle Jan 10, 2022, 7:24 PM

#

hii is anyone well versed in pandas

#

dataframes

stone marlin Jan 10, 2022, 7:42 PM

#

wicked grove i have an array as 512,512 how can i reshape it to 512,512,1

Try reshape.

In [19]: a = np.ones(shape=(20, 15))
In [20]: a = a.reshape(20, 15, 1)
In [21]: a.shape
Out[21]: (20, 15, 1)
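
For the 512 x 512 case specifically, a short sketch of three equivalent ways to add the trailing axis (the zero-filled array is just a placeholder for the real image):

```python
import numpy as np

img = np.zeros((512, 512))           # placeholder for the real array

a = img.reshape(512, 512, 1)         # explicit target shape
b = img[..., np.newaxis]             # index with a new trailing axis
c = np.expand_dims(img, axis=-1)     # same thing via a helper
print(a.shape, b.shape, c.shape)
```

The last two do not need the sizes spelled out, which helps when the image dimensions vary.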

boreal bear Jan 10, 2022, 7:52 PM

#

Hello all, I need help with a problem:

I have a Pandas dataframe and want to create a conditional column based on the "USA State" column. I want to use multiple conditions to yield a column with a rep name. I have created lists associated with each rep and want to write the rep's name in a column if the state is in that rep's list.

df1["Rep Name"] = ["Darrin" if i in Darrin else "None" for i in df1["State"]]

This works but I want it to work with multiple names. How would I do this?

serene scaffold Jan 10, 2022, 7:55 PM

#

@boreal bear I'm on mobile, but the solution doesn't involve any loops. Look into masking with conditionals.
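
A sketch of what masking could look like here, with made-up reps and states since the real lists were not posted. The loop runs once per rep, not once per row; a dict lookup with `map` is shown alongside as an alternative that gives the same result.

```python
import pandas as pd

# Hypothetical rep -> states lists, standing in for the lists in the question.
reps = {
    "Darrin": ["TX", "OK"],
    "Maria": ["CA", "NV"],
}
df1 = pd.DataFrame({"State": ["TX", "CA", "NY", "NV"]})

# Masking: start with a default, then overwrite the rows matching each rep's states.
df1["Rep Name"] = "None"
for rep, states in reps.items():
    df1.loc[df1["State"].isin(states), "Rep Name"] = rep

# Alternative: invert to state -> rep and do a single vectorized lookup.
state_to_rep = {state: rep for rep, states in reps.items() for state in states}
via_map = df1["State"].map(state_to_rep).fillna("None")
print(df1)
```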

serene scaffold Jan 10, 2022, 7:59 PM

#

boreal bear Hello all, I need help with a problem: I have a Pandas dataframe and want to cr...

can you do print(df.head().to_dict('list')) and copy and paste the text into this chat?

stone marlin Jan 10, 2022, 9:06 PM

#

night gorge I plotted a boxplot using seaborn library. For "Iris-setosa", Q1 is 3.1, Q2 i...

For a boxplot, your whiskers won't be exactly on (q1 - 1.5 * iqr, q3 + 1.5 * iqr), since, otherwise, boxplots would always be symmetric. Instead, each whisker is on the max/min value which is not an outlier. An outlier, in this case, is defined as anything below q1 - 1.5 * iqr or above q3 + 1.5 * iqr.

(Also, I think your values for q1, q2, q3 may be off, I get slightly different ones using the same data and functions.)

q1: 3.2
q2: 3.4
q3: 3.675
IQR: 0.475
Whisker bounds: (2.9, 4.2)

Which means, in this case, the bounds should be at 2.9 and 4.2 respectively, which seems to be the case in the plot.
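
The rule described above can be sketched in a few lines of NumPy. This is a simplified illustration with made-up numbers, not the Iris data and not seaborn's exact code path:

```python
import numpy as np

def whisker_ends(data):
    """Return (low, high) whisker positions using the 1.5 * IQR rule."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    limit = 1.5 * (q3 - q1)
    # Whiskers sit on actual observations inside the fences,
    # not on the fences themselves.
    low = data[data >= q1 - limit].min()
    high = data[data <= q3 + limit].max()
    return low, high

# Made-up sample with one low and one high outlier.
sample = [1.0, 3.0, 3.1, 3.2, 3.4, 3.5, 3.7, 3.8, 6.0]
print(whisker_ends(sample))
```

Here 1.0 and 6.0 fall outside the fences, so the whiskers land on the nearest remaining observations on each side.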

Refs:
https://www.simplypsychology.org/boxplots.html
https://github.com/mwaskom/seaborn/blob/e04b07eb3df135511e71e556c2bd34ef59ba08ba/seaborn/categorical.py#L1288-L1294

arctic wedgeBOT Jan 10, 2022, 9:06 PM

#

seaborn/categorical.py lines 1288 to 1294

def draw_box_lines(self, ax, data, support, density, center):
    """Draw boxplot information at center of the density."""
    # Compute the boxplot statistics
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    whisker_lim = 1.5 * (q75 - q25)
    h1 = np.min(data[data >= (q25 - whisker_lim)])
    h2 = np.max(data[data <= (q75 + whisker_lim)])

lapis sequoia Jan 10, 2022, 9:57 PM

#

Hey guys, How can i plot the two values on the yaxis as shown in this figure?

blazing anchor Jan 10, 2022, 10:00 PM

#

I'm having some trouble appending DFs. I get the error Reindexing only valid with uniquely valued Index objects. The dfs have multiindex (date, company identifier) and the combo is unique. Date will repeat, company identifier will repeat, but the combo will not. I cannot reindex the dfs as I need the index values. I have a sample in a kaggle notebook that I can share. The dfs share some columns, but each have unique columns as well.

serene scaffold Jan 10, 2022, 11:10 PM

#

blazing anchor I'm having some trouble appending DFs. I get the error `Reindexing only valid wi...

Are you sure you're not trying to concatenate the two?

blazing anchor Jan 10, 2022, 11:12 PM

#

serene scaffold Are you sure you're not trying to concatenate the two?

I had 130+ columns, didn't realize that some column names are duplicated. That's why it didn't want to concat. Now I have to figure out and name columns with a unique identifier. Thank you for your response.
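
For anyone hitting the same error, a small sketch of how the duplicated names can be found and made unique. The frame and the `_1` suffix scheme are made up for illustration:

```python
import pandas as pd

# Toy frame with a repeated column name (hypothetical data).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["price", "volume", "price"])

# columns.duplicated() marks the second and later occurrences of each name.
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)

# One way to make names unique: append a running counter to repeats.
seen = {}
new_names = []
for name in df.columns:
    count = seen.get(name, 0)
    new_names.append(name if count == 0 else f"{name}_{count}")
    seen[name] = count + 1
df.columns = new_names
print(df.columns.tolist())
```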

charred umbra Jan 11, 2022, 1:19 AM

#

gentle lion I have a neural network that should try to classify images of chairs into 18 dif...

maybe try new activation functions (eg. softplus, GELU, swish, mish, phish)

mild dirge Jan 11, 2022, 1:27 AM

#

gentle lion I have a neural network that should try to classify images of chairs into 18 dif...

Also seems like instead of separating it into completely separate classes, you would have it predict a single value which is the rotation

#

It should learn the rotation of the chair, instead of which "class" it belongs to
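
One caveat if you go the single-value route: 340 degrees and 0 degrees are close as rotations but far apart as numbers. A common workaround is to predict (sin, cos) of the angle and convert back afterwards. A rough sketch of the target encoding only, assuming the 18 classes are 20 degrees apart starting at 0 (that spacing is an assumption, adjust to your labels):

```python
import numpy as np

# 18 classes assumed to be 20 degrees apart (0, 20, ..., 340).
class_ids = np.arange(18)
angles_deg = class_ids * 20.0

# Encode each angle as a point on the unit circle so that 340 and 0
# end up close together, which a raw angle target would not capture.
radians = np.deg2rad(angles_deg)
targets = np.stack([np.sin(radians), np.cos(radians)], axis=1)  # shape (18, 2)

# Decode a (sin, cos) pair back to degrees in [0, 360).
decoded = np.rad2deg(np.arctan2(targets[:, 0], targets[:, 1])) % 360.0
print(np.allclose(decoded, angles_deg))
```

The network would then have two regression outputs instead of 18 class scores.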

lapis sequoia Jan 11, 2022, 1:29 AM

#

gentle lion I have a neural network that should try to classify images of chairs into 18 dif...

Hi have you found solution to this?

mild dirge Jan 11, 2022, 1:29 AM

#

That way it would also be easier to generalize it to more rotation values

lapis sequoia Jan 11, 2022, 2:54 AM

#

Does anyone know if its a good idea to normalize all my data per column at the start or should I normalize them after I split them into train/test data

#

I am not sure how it works

serene scaffold Jan 11, 2022, 3:23 AM

#

lapis sequoia Does anyone know if its a good idea to normalize all my data per column at the s...

You want to normalize all the data the same way

#

If you normalize the test data differently, the model won't understand what it's looking at

lapis sequoia Jan 11, 2022, 3:47 AM

#

Okay thank you very much, I am doing an ML project and it's my first one so I have a lot of questions. I've really just been opening and closing tabs for the last 10 hours.

lapis sequoia Jan 11, 2022, 3:47 AM

#

serene scaffold If you normalize the test data differently, the model won't understand what it's...

Do you think I should normalize the target variable too

serene scaffold Jan 11, 2022, 3:48 AM

#

Idk

stone marlin Jan 11, 2022, 3:50 AM

#

When I do my preprocessing, I make a pipeline and fit my normalizers / scalers / whatever only on training data. If my training data is representative, then my test data should scale the same way with the same parameters.

#

If you fit your scaler with test data, that's using test data for training, which isn't an excellent practice.

lapis sequoia Jan 11, 2022, 3:51 AM

#

I see, so what I did is I normalized all my data between 0 and 1 and then split it

#

So in a way the test data are affected by the normalization

#

But only if they do contain the maximum/minimum value

stone marlin Jan 11, 2022, 3:52 AM

#

When making a model, you're probably never going to want to use your test set (or validation set, in the case of CV) for any preprocessing.

#

I honestly don't think it would affect too much of the preprocessing, but looking at or manipulating the test data in the parameter fitting / training step is not a good practice to get into.

#

So, if you want to do some kind of normalization, it should be like this:

1. Split data into train/test (or train/validation for CV).
2. Fit your normalizer / whatever on your training data.
3. Make the model and train it on the training data.

Now you have two things: your normalizer and your model. When you score the test data, you want to put it through both. This is called pipelining.

See: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

#

See the example there. They do a StandardScaler and an SVC (which is a type of model). Then they fit that, which tells the scaler how to scale everything and the model how to classify. Then it scores on the test set, running the testing sets through the pipeline.
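
A minimal sketch of that pattern, along the lines of the scikit-learn docs example. The data is synthetic (`make_classification`) and the step names are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# fit() learns the scaling parameters and the classifier from training data only.
pipe.fit(X_train, y_train)

# score() reuses the training-set scaling on the test set before predicting.
test_accuracy = pipe.score(X_test, y_test)
print(test_accuracy)
```

The point is that the test set never influences the scaler: its mean and variance come from `X_train` alone.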

lapis sequoia Jan 11, 2022, 3:55 AM

#

I see so I should split my data first

#

I havent checked cross-validation yet but I think I have to do it for my project

stone marlin Jan 11, 2022, 3:56 AM

#

IMO, best practice when you get data is kind'a check it out and get a test set / validation set ready at the beginning.

lapis sequoia Jan 11, 2022, 3:56 AM

#

• Preprocessing: normalization
• Learning: training, cross-validation
• Diagnostics: testing, accuracy, loss
I need to do all of this for multiple linear regression

stone marlin Jan 11, 2022, 3:57 AM

#

It's all good, you still should split with CV to get a training set and a validation set, but some people (for whatever reason) don't use a validation set. Validation sets are pretty much just "test sets" for CV. At least, they're used in the same way.

#

Yep. So, you'd pretty much follow something similar to the pipeline example here. Then at the end, to test, you'd use whatever the metrics you want, here: https://scikit-learn.org/stable/modules/model_evaluation.html

lapis sequoia Jan 11, 2022, 3:57 AM

#

I am not sure if I need to split validation though, my data is time-related its a stock price

#

And I have read in the course that i need to get train data to be older

stone marlin Jan 11, 2022, 3:58 AM

#

It's up to you. Some people split for CV, some people don't.

lapis sequoia Jan 11, 2022, 3:58 AM

#

CV is cross validation?

stone marlin Jan 11, 2022, 3:58 AM

#

Yes, Cross Validation.

#

For time-series, this can get a bit tricky. I'd be interested to know how some of the DS people in here handle train/test or CV on their time-series. There's many ways to do it.

#

I use older data for train and newer data for test, but I also normalize with respect to trend and seasonality first. But some people do the opposite thing: they don't normalize and they test older, train newer.

#

I've read both sides claim their way was better so, you know, who knows.
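
For the "older data for train, newer data for test" variant, scikit-learn ships `TimeSeriesSplit`, which builds CV folds that way. A small sketch on a toy index, just to show what the folds look like:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered observations (index 0 is the oldest).
X = np.arange(12).reshape(-1, 1)

splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    # Each fold trains on the past and tests on the block right after it.
    print("train:", train_idx, "test:", test_idx)
```

Every fold's training indices come strictly before its test indices, so the model is never validated on data older than what it was trained on.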

lapis sequoia Jan 11, 2022, 4:00 AM

#

I see I am doing this for a course and prof says train to be older so I guess I wont have a problem with deciding :p

stone marlin Jan 11, 2022, 4:01 AM

#

Haha, yeah, I honestly think it makes more sense, but they're both probably fine ways to do it.

lapis sequoia Jan 11, 2022, 4:01 AM

#

But I have seen some tutorials where they normalize before splitting I am still not sure

stone marlin Jan 11, 2022, 4:01 AM

#

When you say normalize, in this case, for time series, what normalization are you applying?

lapis sequoia Jan 11, 2022, 4:02 AM

#

date isnt normalized its just works like index, all the prices and volume is normalized

#

its open/close/max/min

stone marlin Jan 11, 2022, 4:02 AM

#

Like, trend + seasonality, or minmaxscaler?

lapis sequoia Jan 11, 2022, 4:02 AM

#

prices

#

minmax

stone marlin Jan 11, 2022, 4:02 AM

#

Oh, got'cha.

#

Ehhh. I would probably still split first and do it all with a pipeline, but if it's a course it's prob not a big deal if you normalize first, if that makes things significantly easier for you.

lapis sequoia Jan 11, 2022, 4:04 AM

#

can i use the same minmaxscale to scale multiple columns and then reuse the multiplier of each column to the test data to normalize them with the same "multiplier"?

stone marlin Jan 11, 2022, 4:06 AM

#

Yeah, that's what you usually will do in timeseries stuff. You'll scale on a lot of train data, then you'll apply that to the test data as well when you score that.

lapis sequoia Jan 11, 2022, 4:07 AM

#

so the project states that i need to do cv, should i split the data to 70-10-20 train/val/test?

stone marlin Jan 11, 2022, 4:07 AM

#

I'm not sure how good minmax scaling on stock data will be (as it is notoriously non-seasonal and variable w/rt short-term trends).

#

You won't need a test set anymore for CV, just the number of splits to use.

lapis sequoia Jan 11, 2022, 4:08 AM

#

Learning: training, cross-validation
so if i want to do these 2 things I will need to resplit the data?

stone marlin Jan 11, 2022, 4:08 AM

#

The easiest way might be to do something like... Make a training set and a test set, something like 70-30. Then you can use that for training without CV. For CV, you can just plug in the training set, and use the test set as your validation set.

#

That way you're still splitting up data nicely, but you only have to split it once and not worry about it too much.
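
One way to read that setup, sketched on synthetic data: hold out the newest 30% once, cross-validate only inside the older 70%, and touch the held-out part at the very end. The data, the coefficients and the choice of `TimeSeriesSplit` with 5 folds are all assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, time-ordered data standing in for the real features/target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# Chronological 70/30 split: the oldest 70% is for training, no shuffling.
cut = int(len(X) * 0.7)
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]

model = make_pipeline(MinMaxScaler(), LinearRegression())

# Cross-validate inside the training portion only.
cv_scores = cross_val_score(model, X_train, y_train, cv=TimeSeriesSplit(n_splits=5))

# Final check on the held-out, most recent 30%.
holdout_r2 = model.fit(X_train, y_train).score(X_test, y_test)
print(cv_scores.mean(), holdout_r2)
```

Because the scaler is inside the pipeline, it gets refit on the training part of every fold automatically.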

lapis sequoia Jan 11, 2022, 4:09 AM

#

Oh so I will never need the 3 sets in for the same process

#

okay

#

I havent read the cv thing yet so thats why i have questions

stone marlin Jan 11, 2022, 4:10 AM

#

Yeah, no problem. Some people here might do things differently, so they might have good advice as well.

lapis sequoia Jan 11, 2022, 4:36 AM

#

I am trying to store the fitted values from the minmaxscaler so i can simply transform train data

stone marlin Jan 11, 2022, 4:36 AM

#

Q for you NN People. I started to look into NNs --- neat stuff! But I wanted to know your workflow. When you get data and a task where you want to use a NN, do you build it up yourself, or look for pre-built ones, or what's the deal? I feel like it'd be annoying to build it up all the time, but there are "recipes" to make various types, so, you know, who knows. I'm interested in your take.

warped creek Jan 11, 2022, 4:37 AM

#

do you guys need to know math for ml?

stone marlin Jan 11, 2022, 4:37 AM

#

lapis sequoia I am trying to store the fitted values from the minmaxscaler so i can simply tra...

If you're using sklearn, you can do something like:

from sklearn.preprocessing import MinMaxScaler
minmaxscaler = MinMaxScaler()
scaled_train = minmaxscaler.fit_transform(training_data_stuff)

and then the next time you wanna use it (like on the test data) you can do

scaled_test = minmaxscaler.transform(test_data_stuff)

#

The "fit" part is the part that "saves" the fitted values, so you only need to call transform on the stuff.
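
A small worked sketch with two made-up columns, showing where the fitted per-column values end up and what reusing them on test data looks like:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-column data (e.g. a price column and a volume column).
train = np.array([[10.0, 100.0], [20.0, 300.0], [30.0, 500.0]])
test = np.array([[25.0, 200.0], [40.0, 600.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)   # learns per-column min and max

# The fitted per-column values live on the scaler object.
print(scaler.data_min_, scaler.data_max_)

# Reuse the same per-column parameters on the test data.
test_scaled = scaler.transform(test)
print(test_scaled)
```

Note that test values outside the training range come out below 0 or above 1, which is expected when the scaler has only seen the training data.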

stone marlin Jan 11, 2022, 4:39 AM

#

warped creek do you guys need to know math for ml?

Depends on what you want to do with it. I'm a strong believer that one needs at least a basic understanding of calc, linear algebra, and stats to have a career in DS, but to do projects or even to do more of the software-engineering side, you may not need to know as much.

lapis sequoia Jan 11, 2022, 4:39 AM

#

Do you think I should normalize the target variable too?

#

I think thats where errors were popping from

#

cuz I didn't do this for y_train/y_test too with the same scaler

#

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

#

also should i ignore this warning?

warped creek Jan 11, 2022, 4:41 AM

#

stone marlin Depends on what you want to do with it. I'm a strong believer that one needs at...

ah thats good to hear

stone marlin Jan 11, 2022, 4:44 AM

#

I'm not sure, but everything you do for the training set you should do for the test set as well. Yeahhh, that warning usually means you're assigning through chained indexing somewhere, something like,

df[row_indexer][col_indexer] = value
and it wants you to use

df.loc[row_indexer, col_indexer] = value
instead, so the assignment hits the original DataFrame and not a temporary copy. I try to fix it when I see it. The code will usually still run with that warning, but the assignment may silently not take effect.
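
A tiny sketch of the `.loc` form the warning asks for, on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "flag": [0, 0, 0]})

# Chained indexing such as df[df["price"] > 20]["flag"] = 1 assigns into
# a temporary object, so the original frame may be left unchanged.

# A single .loc call selects rows and column at once and assigns in place.
df.loc[df["price"] > 20, "flag"] = 1
print(df)
```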

wicked grove Jan 11, 2022, 6:05 AM

#

hello, how can i use Conv2D on numpy arrays

#

i want to apply Conv2D on X_train before passing it to my model

odd meteor Jan 11, 2022, 6:28 AM

#

warped creek do you guys need to know math for ml?

You need to know Math and Statistics actually. It's not really much of a big deal since you don't need to know the entirety of both subjects before getting started with ML.

wicked grove Jan 11, 2022, 6:35 AM

#

odd meteor You need to know Math and Statistics actually. It's not really much of a big dea...

Heyy,Will my cpu crash if i convolve tensors on it??
I ran this code and the pc started hanging soo much
X_train = tf.nn.Conv2D(X_train)

odd meteor Jan 11, 2022, 6:36 AM

#

stone marlin **Q for you NN People**. I started to look into NNs --- neat stuff! But I want...

I think it depends on what you're working on. Thanks to Transfer Learning for actually being a thing. It saves time and makes the modelling process a lot easier and faster.

odd meteor Jan 11, 2022, 6:43 AM

#

wicked grove Heyy,Will my cpu crash if i convolve tensors on it?? I ran this code and the pc...

Idk bruh 😀... If it's notoriously hanging then it's a sign you might be over loading it.

If you perceive the workload is too much for your machine, you might wanna use Colab.

What's the RAM size of your PC?
Do you have GPU?

wicked grove Jan 11, 2022, 6:45 AM

#

odd meteor Idk bruh 😀... If it's notoriously hanging then it's a sign you might be over lo...

Yes it hangs notoriously, the mouse pointer stops working
My ram size is 8 gb ,i have gpu but it's AMD so i cant use CUDA
I wanna use Colab but it keeps crashing cause it runs out of ram

lapis sequoia Jan 11, 2022, 6:55 AM

#

hey, is anybody here...who can help me with pandas dataframes

#

i am willing to concatenate 3 dataframes below each other vertically ... using pandas

#

how about https://datacarpentry.org/python-ecology-lesson/05-merging-data/ ?

lapis sequoia Jan 11, 2022, 7:00 AM

#

lapis sequoia i am willing to concatenate 3 dataframes below each other vertically ... using p...

Thanks, i did it myself 👍 ignore_index=True was needed to be given as parameter to pd.concat for natural behavior...
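
For reference, a minimal sketch of that call. The three frames are made up:

```python
import pandas as pd

# Three small frames with the same columns (made-up values).
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})
c = pd.DataFrame({"x": [4, 5]})

# Without ignore_index the result keeps each frame's own 0..n labels.
stacked = pd.concat([a, b, c], ignore_index=True)
print(stacked.index.tolist())
```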

#

great !

#

@odd meteor just tell me, are you gonna text something related to me? .... cause I'm waiting for you as you're typing...XD

odd meteor Jan 11, 2022, 7:06 AM

#

wicked grove Yes it hangs notoriously, the mouse pointer stops working My ram size is 8 gb ,...

At this point, you might wanna consider purchasing the paid version of Colab 😀

On a more serious note, isn't there a way to switch to GPU instead of using CPU? I think there should be a way.

My PC's GPU is Iris XE. So I can't utilize CUDA either although I have thunderbolt4 port to connect eGPU.

If it's any consolation, not everyone with Nvidia GPU can afford to train heavy/deep NN on their pc. Most people still utilize colab.

odd meteor Jan 11, 2022, 7:07 AM

#

lapis sequoia @odd meteor just tell me, are you gonna text something related to me ...

😊

lapis sequoia Jan 11, 2022, 7:08 AM

#

odd meteor 😊

Ooh, i thought. You would text something related to pandas merging data frames. that is why i was waiting for your text... *i know im very stupid

odd meteor Jan 11, 2022, 7:09 AM

#

lapis sequoia Ooh, i thought. You would text something related to pandas merging data frames. ...

No you're not. I just figured you've already resolved the issue ✌️

lapis sequoia Jan 11, 2022, 7:10 AM

#

btw, do you use PyTorch or Tensorflow.. ?

odd meteor Jan 11, 2022, 7:12 AM

#

lapis sequoia btw, do you use `PyTorch` or `Tensorflow`.. ?

TensorFlow but currently learning PyTorch

lapis sequoia Jan 11, 2022, 7:12 AM

#

odd meteor TensorFlow but currently learning PyTorch

i was in this stage 5months ago

#

i think pytorch is better with RNNs than tensorflow

odd meteor Jan 11, 2022, 7:17 AM

#

To me, no framework is superior to the other. TensorFlow is a lot easier, that's why I picked it first. Ooh, Keras is fun as well.

wicked grove Jan 11, 2022, 7:50 AM

#

odd meteor At this point, you might wanna consider purchasing the paid version of Colab 😀 ...

The paid version of colab allocates 25 GB of RAM. Will that be sufficient for a dataset of shape (3390,512,512,3) for transfer learning?

odd meteor Jan 11, 2022, 7:54 AM

#

wicked grove The paid version of colab allocates 25 GB of ram will that be sufficient for a d...

Honestly, I have no idea. I haven't used the paid version yet. Others who have used it might have a better answer.

wicked grove Jan 11, 2022, 7:55 AM

#

Alrightt, thanks😁

lapis sequoia Jan 11, 2022, 8:13 AM

#

hello

#

I'm trying to read these irregular tables into a dataframe

#

I dont know how to go about it

#

please help ~

#

it has like.. row data and column data

lapis sequoia Jan 11, 2022, 9:41 AM

#

help plox

night gorge Jan 11, 2022, 10:16 AM

#


```py
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear').fit(x_train, y_train)
```
.
ytrain is one hot encoded (it contains 1 among 3  values)
. 
But while running it gives error
```y should be a 1d array, got an array of shape (105, 3) instead.```
.
How to fix this?

lapis sequoia Jan 11, 2022, 10:25 AM

#

lapis sequoia it has like.. row data and column data

does it though ? Where are the column headers ?

lapis sequoia Jan 11, 2022, 10:27 AM

#

lapis sequoia does it though ? Where are the column headers ?

the things you see in row 3 in the first table and row 10 in second table

#

they are column headers

lapis sequoia Jan 11, 2022, 11:05 AM

#

then I'd start with deleting lines 1, 2, 8 and 9 if they're not relevant

upbeat prism Jan 11, 2022, 11:24 AM

#

wicked grove I checked it ,apparently there is intel(R) UHD 620 and radeon graphics (I'm kind...

Not all GPUs support CUDA. Disk space is not the same as memory: disk space is your HDD or SSD, while memory is your RAM or the GPU's RAM. If you run out of RAM, the computer may start using your HDD/SSD as RAM (swap), which is very slow.

In any case, make sure you understand why you run out of memory. E.g. if your network has an input size of, let's say, 10 numbers (doubles) and you have a batch size of 1000, then you can compute approximately how much memory you need: 10 * 1000 * 8 bytes = 80,000 bytes, 8 bytes because a double is 8 bytes (64 bits).

Now you will need a bit more memory than that, since you might store some other things too. Coding mistakes can also lead to running out of memory. Maybe you have too much input => out of memory. Make sure you understand why you run out of memory.
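
The same back-of-the-envelope estimate as a few lines of Python. The helper name is made up, and the float32 dtype for the image dataset is an assumption (the thread never states the dtype):

```python
import math

def estimate_bytes(shape, bytes_per_value):
    """Raw size of a dense array: number of elements times bytes per element."""
    return math.prod(shape) * bytes_per_value

# Batch of 1000 samples, 10 doubles each (8 bytes per double).
batch_bytes = estimate_bytes((1000, 10), 8)

# The (3390, 512, 512, 3) dataset mentioned earlier, assuming float32 (4 bytes).
dataset_bytes = estimate_bytes((3390, 512, 512, 3), 4)

print(batch_bytes, round(dataset_bytes / 2**30, 2), "GiB")
```

This only counts the raw array. Intermediate copies (e.g. from `np.repeat` or dtype conversions) come on top of it.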

upbeat prism Jan 11, 2022, 11:27 AM

#

serene scaffold there's a few solutions given here: https://stackoverflow.com/questions/41859605...

@wicked grove

what I use is pytorch's subset.

        # Read samples dataset
        samplesDS = SamplesDataset(args.samples_file,
            device=args.device)

        # Make a 80/20 split for training/eval data
        k = len(samplesDS)
        train_indices = np.arange(0, int(k * 0.8), dtype='int')
        validation_indices = np.arange(int(k * 0.8), k, dtype='int')
        TrainDS = torch.utils.data.Subset(samplesDS, train_indices)
        ValidDS = torch.utils.data.Subset(samplesDS, validation_indices)

(I don't shuffle my data before splitting in this example)

Not sure if it's the best way though.

upbeat prism Jan 11, 2022, 11:29 AM

#

night gorge ```from sklearn.svm import SVC svm_model_linear = SVC(kernel = 'linear').fit(x_...

the error is clear. SVC wants y to be a 1-D array with one label per sample, but your y_train has shape (105, 3), i.e. one one-hot row per sample, something like [[1,0,0],[0,1,0],[0,0,1],...]

Just pass one label per sample instead. You probably don't need the one-hot encoding here at all.

lapis sequoia Jan 11, 2022, 11:30 AM

#

lapis sequoia then I'd start with deleting lines 1, 2, 8 and 9 if they're not relevant

they are relevant.. it's weird because it's like the field name is in one cell on those lines and the data for that field is in the subsequent cell

wicked grove Jan 11, 2022, 11:39 AM

#

upbeat prism not all GPU support CUDA. Disk space is not the same as memory. Disk space is yo...

Thank youu!! Yess so the data i was trying to store was 10.6GiB and it was running out of ram on colab as well
I'm planning on buying colab pro which has 25 GB and i hope that works

wicked grove Jan 11, 2022, 11:40 AM

#

upbeat prism @wicked grove what I use is pytorch's subset. ```py # Read s...

Ohhh okayy, i converted my tensor back to array and used sklearn's train_test_split

#

Yeahh i have amd gpu and it isnt supported by CUDA

lapis sequoia Jan 11, 2022, 11:42 AM

#

lapis sequoia they are relevant.. it's weird because it's like the field name is in one cell ...

then I'm afraid I cannot help you, there are so many abbreviations that it's hard to see what it's all about

odd meteor Jan 11, 2022, 11:51 AM

#

night gorge ```from sklearn.svm import SVC svm_model_linear = SVC(kernel = 'linear').fit(x_...

Is there any reason why you onehot encoded the response variable? Even if you had to encode the target, ensure it's still in 1D

Just like the error message reads, your response variable is supposed to be in 1d space

wicked grove Jan 11, 2022, 11:59 AM

#

night gorge ```from sklearn.svm import SVC svm_model_linear = SVC(kernel = 'linear').fit(x_...

I got the same error,you can reshape y

gentle lion Jan 11, 2022, 12:08 PM

#

lapis sequoia Hi have you found solution to this?

I have now switched to regression (predicting the rotation as a single value), as it would indeed make more sense. I am training some different CNN structures, but my PC is very slow so it takes a long time. I will try on a better PC soon and then I will know if one of these structures works

night gorge Jan 11, 2022, 12:26 PM

#

odd meteor Is there any reason why you onehot encoded the response variable? Even if you ha...

I am a beginner, I am working on iris dataset.
.
We have 150 rows of Sepal Width, Sepal Length,Petal Width, Petal Length and its corresponding classification ("Iris-virginica","Iris-versicolor" or "Iris-setosa")
.
I need to make a model which will predict "classification" depending on values of SW, SL, PW, PL
.
I made:
x = SW, SL, PW, PL
y = "classification"
.
Then since y is a categorical data, I encoded with y=pd.get_dummies(y).
.
Then I split x and y and tried to model. Where can it go wrong?

pastel valley Jan 11, 2022, 12:36 PM

#

yo is there any library like keras's ImageDataGenerator, but where instead of flip, zoom and shift I can apply color space transformations

lapis sequoia Jan 11, 2022, 12:38 PM

#

lapis sequoia then I'm afraid I cannot help you, there are so many abbreviations that it's har...

its not abbreviations, it's column name

lapis sequoia Jan 11, 2022, 12:50 PM

#

lapis sequoia its not abbreviations, it's column name

let's agree to disagree 🙂

upbeat prism Jan 11, 2022, 1:06 PM

#

wicked grove Thank youu!! Yess so the data i was trying to store was 10.6GiB and it was runni...

10.6 GiB shouldn't be a problem though, no? How much do you read at once?

Think about it: You basically have one loop which loops over the epochs and each epoch does training + evaluation and maybe some other minor things right? While training, you loop over those 10.6 GiB of data right? Now here you don't have to read all at once! You can read 1GiB, train it, read next 1GiB, train it. Once done with training, you go to validation and do the same.

You might have to change how you store the data though. If you store the data in one big normal file like csv or txt or whatever people use, then reading it in 1 GiB chunks may not be straightforward.

How do you store the data?
how do you read the data?
How much data do you process at once?

#

E.g. my training data currently is 30gb or something like that and test data is 2x80gb

#

and my GPU has 3.7GB ram.

upbeat prism Jan 11, 2022, 1:09 PM

#

pastel valley yo are there any library llike keras which has imagedatagenerator but instead of...

I have no idea but maybe you find something here https://pytorch.org/vision/stable/index.html ?

odd meteor Jan 11, 2022, 1:40 PM

#

night gorge I am a beginner, I am working on iris dataset. . We have 150 rows of Sepal Width...

Since you have 3 classes in your response variable, you can simply encode the 3 species of flowers as, 0,1,2 respectively. And then do a multi-class classification
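
A rough end-to-end sketch of that, using the copy of Iris bundled with scikit-learn rather than your CSV. It also shows how to get back from an already one-hot encoded y (as produced by `pd.get_dummies`) to one integer label per row:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X = iris.data
y_names = pd.Series(iris.target_names[iris.target])   # species as strings

# If the labels are already one-hot encoded (e.g. via pd.get_dummies),
# argmax along the columns recovers one integer class per row.
one_hot = pd.get_dummies(y_names)
y = one_hot.to_numpy().argmax(axis=1)                  # shape (150,), values 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = SVC(kernel="linear").fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(y_train.shape, accuracy)
```

As far as I know, `SVC` also accepts the string labels directly, so skipping the encoding entirely and passing the species column as y should work too.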

vast thunder Jan 11, 2022, 2:18 PM

#

Guys is Brain.js a good library for AI? I'm not necessarily gonna do super advanced stuff. Just some kaggle datasets ig?

wicked grove Jan 11, 2022, 2:26 PM

#

upbeat prism 10.6 GiB shouldn't be a problem though, no? How much do you read at once? Thin...

The problem i faced was allocating that much data in numpy arrays
It ran out of ram then
Initially i was able to run 1 epoch but then it crashed later
It is extremely slow and difficult to load all the data in a numpy array idk why
The data i have is 3390,528,528,1
And then i use np.repeat to make it 3390,528,528,3 and that's when it starts hanging

upbeat prism Jan 11, 2022, 2:53 PM

#

wicked grove The problem i faced was allocating that much data in numpy arrays It ran out of...

How do you store the data?

wicked grove Jan 11, 2022, 2:53 PM

#

npz file

upbeat prism Jan 11, 2022, 2:54 PM

#

do you use pytorch? keras? why npz?

wicked grove Jan 11, 2022, 2:54 PM

#

keras

#

i just found it easier to load it in npz after preprocessing

wicked grove Jan 11, 2022, 2:54 PM

#

wicked grove keras

tensorflow.keras

upbeat prism Jan 11, 2022, 2:57 PM

#

I don't know keras or npz, but basically what you do is load the whole 10GB at once, right? Then you triple it => you'd need 30GB of memory, which you probably don't have.

What you can do is to only read a piece of the data, feed it to your network, load new data, feed it again. You might have to store your data in chunks of npz files or just use HDF?

see e.g. https://keras.io/api/data_loading/
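
A framework-agnostic sketch of the "read a piece, feed it, read the next piece" idea using plain NumPy. It assumes the data is saved as an uncompressed `.npy` file instead of `.npz`, since as far as I know `mmap_mode` only applies to `.npy`. The function name, file name and tiny array are made up:

```python
import os
import tempfile

import numpy as np

def iter_batches(path, batch_size):
    """Yield 3-channel float32 batches from a .npy file without loading it all."""
    # mmap_mode="r" maps the file; only the slices we touch are read into RAM.
    data = np.load(path, mmap_mode="r")
    for start in range(0, data.shape[0], batch_size):
        batch = np.asarray(data[start:start + batch_size], dtype=np.float32)
        # Repeat the single channel per batch instead of for the whole dataset.
        yield np.repeat(batch, 3, axis=-1)

# Tiny stand-in for the real (3390, 528, 528, 1) array.
path = os.path.join(tempfile.mkdtemp(), "images.npy")
np.save(path, np.zeros((10, 8, 8, 1), dtype=np.float32))

shapes = [batch.shape for batch in iter_batches(path, batch_size=4)]
print(shapes)
```

This way only one tripled batch is in memory at a time. A generator like this could then be wrapped by whatever data-loading utility the framework provides.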

wicked grove Jan 11, 2022, 3:02 PM

#

upbeat prism I don't know keras nor npz but basically what you do is you load the whole 10GB ...

no no i have around 3.55 gb of data which i load at once and then i triple it which becomes 10.6

#

thank youu!!:)) i will check that

upbeat prism Jan 11, 2022, 3:04 PM

#

how much memory do you have?

wicked grove Jan 11, 2022, 3:12 PM

#

i have 8 gb ram

#

google colab pro offers 25gb

molten wedge Jan 11, 2022, 6:22 PM

#

for computer vision, is it possible to take five different pictures of the same object and also train the model to understand that these five different pictures are actually the same object?

iron basalt Jan 11, 2022, 6:22 PM

#

molten wedge for computer vision, is it possible to take five different pictures of the same ...

For human made objects it can be done without ML.

molten wedge Jan 11, 2022, 6:23 PM

#

I am building a grading system which will grade diamonds based on the initial image clicked. I have to assume that multiple images from different angles will be best able to grade as opposed to a single image. However is it possible to tell the model that for example these five images are of the same diamond clicked from different angles?

iron basalt Jan 11, 2022, 6:24 PM

#

Yes.

#

You can also just do video and rotate it depending on how good the model is.

molten wedge Jan 11, 2022, 6:26 PM

#

thank you for your answer. What are the limitations of using a video? I have to assume that prices in the video would take much much longer time than processing a single or two images. '

#

that processing*

iron basalt Jan 11, 2022, 6:26 PM

#

If you know where the diamond is relative to the camera, you can do it without ML, since it's nice and rigid, not something like a pile of mud.

molten wedge Jan 11, 2022, 6:27 PM

#

I see.. Let's say we do not know where the diamond is relative to the camera. Is it possible to train a model on a dataset of, say, 360° videos of 10,000 diamonds? Can you ballpark how long it would take?

#

or if this is something that is even reasonable…

iron basalt Jan 11, 2022, 6:29 PM

#

A model to segment the image for the diamond and then another to assess its quality can work. Or you can do a single end-to-end model. Or some combination of various models that are experts that each find out something and finally you take that output and simply apply some fuzzy logic. There are multiple ways.

#

Getting the training data will probably be the most difficult part.

molten wedge Jan 11, 2022, 6:31 PM

#

That is true....pretty much impossible to find a training data like that

#

so just so that I understand clearly... It indeed is possible to tell a model when it is looking at the same thing from different angles without obtaining any kind of 3D scan or 3D data or some other complicated stuff

iron basalt Jan 11, 2022, 6:34 PM

#

It can be done with camera alone. 3D scans make it a lot easier though.

molten wedge Jan 11, 2022, 6:35 PM

#

Got it...thanks for helping me.

safe elk Jan 11, 2022, 6:37 PM

#

Search for Photogrammetry algo or software might help

molten wedge Jan 11, 2022, 6:38 PM

#

problem is that 3D scans capture voxel data but they do not capture colour characteristics... Unless I'm mistaken

safe elk Jan 11, 2022, 6:41 PM

#

Ah yes and your subject is transparent ...photogrammetry has troubles there too

molten wedge Jan 11, 2022, 6:41 PM

#

yeah....

safe elk Jan 11, 2022, 6:41 PM

#

Lemme think

molten wedge Jan 11, 2022, 6:42 PM

#

sure thanks

#

how can I answer the following question:
for (z) e.g. 1000 number of images given (x) resolution (e.g. 512x512 or 1024x1024) what amount of (y) time it would take to train a neural network using Google colab?

#

I think this would make it easier for me to go ahead

iron basalt Jan 11, 2022, 6:48 PM

#

Not nearly as long as it takes to get those 1000 images I would think. It also depends a lot on which model / method is chosen.

#

If your model can perform online learning, then technically the training would take as long as it takes you to get each sample, since it could learn it on the spot for each image you get.

molten wedge Jan 11, 2022, 6:53 PM

#

Let's say I have 200 images at 2048x2048 resolution of 200 diamonds... are you saying it would take 1 second per image to process and learn?

iron basalt Jan 11, 2022, 6:53 PM

#

Standard deep learning usually can't do online learning though, so if you want to stick with that, which most people know about, then it might take a while to get a good model.

molten wedge Jan 11, 2022, 6:53 PM

#

can you please elaborate what you mean when you say it depends on which model/method? I am quite new to this aspect of Python and am trying to learn more

#

sorry if these are too many questions...

iron basalt Jan 11, 2022, 6:56 PM

#

Some methods require a lot of samples and resampling (to avoid forgetting), while others can instantly and permanently learn things (one-shot). While the latter is the ideal and what the future holds, the current most popular methods can't do it yet, or at least not very well. But also within those slower methods, some may still require more samples, or less, and more processing power / time or less.

#

One thing that obviously controls how long it will take is how large the model is.

#

But a larger model might give better results.

#

In this regard, datascience is more like alchemy than chemistry, more of an art than a science (despite the name), and really just requires some testing. Try some small models that don't take very long to train to get a feel for how long it might take and how well it can perform.

#

Predicting it upfront is really hard since not only does the choice of model and model hyper-parameters matter, but also hardware configurations, and desired results (how well does it perform).

#

(Also without having done a similar problem with that exact same setup, it's even harder)

lapis sequoia Jan 11, 2022, 7:03 PM

#

Hello help me pls

molten wedge Jan 11, 2022, 7:04 PM

#

got it. thankyou so much for all your responses, I think I'm starting to understand why my questions don't have a straight answer. Truly appreciate you taking the time to tell me all of thiss

robust jungle Jan 11, 2022, 7:27 PM

#

how viable is it to make a dataset by separating frames of a few videos

#

assuming the environments / lightings were varied along with the angles

#

note: specifically talking about image recognition

rapid pawn Jan 11, 2022, 8:02 PM

#

anyone know if google colab free tier has a better gpu, or is a 3080 better at training networks?

#

ping me plz thank you

olive shore Jan 11, 2022, 8:18 PM

#

```py
import pandas as pd
from sklearn import preprocessing
from sklearn.naive_bayes import GaussianNB


df = pd.read_csv('../input/heart-failure-prediction/heart.csv')


Age = df.Age.tolist()
Sex = df.Sex.tolist()
CPT = df.ChestPainType.tolist()
RestingBP = df.RestingBP.tolist()
Cholesterol = df.Cholesterol.tolist()
FastingBS = df.FastingBS.tolist()
RestingECG = df.RestingECG.tolist()
MaxHR = df.MaxHR.tolist()
ExerciseAngina = df.ExerciseAngina.tolist()
Oldpeak = df.Oldpeak.tolist()
ST_Slope = df.ST_Slope.tolist()
HeartDisease = df.HeartDisease.tolist()

#encoding strings into integers in order to calculate the probabilities.


le = preprocessing.LabelEncoder()

Sex_E=le.fit_transform(Sex)
CPT_E=le.fit_transform(CPT)
RestingECG_E=le.fit_transform(RestingECG)
ExerciseAngina_E=le.fit_transform(ExerciseAngina)
ST_Slope_E=le.fit_transform(ST_Slope)

variables = zip(Age,Sex,CPT_E,RestingBP,Cholesterol,FastingBS,RestingECG_E,MaxHR,ExerciseAngina_E,Oldpeak,ST_Slope_E)


model = GaussianNB()
model.fit(variables,HeartDisease)
predicted= model.predict([[0,1]])

print(predicted)
```

I am trying to use Bayes' theorem for heart conditions. This dataset has 12 columns; one is the heart condition column. I am getting an error about reshaping the array if my data has a single feature. What do I do?

austere swift Jan 11, 2022, 8:31 PM

#

olive shore ```py import pandas as pd from sklearn import preprocessing from sklearn.naive_b...

could you give some details on the data in the csv file as well as a paste of the full error message

olive shore Jan 11, 2022, 8:34 PM

#

Ok

#

https://www.kaggle.com/fedesoriano/heart-failure-prediction is the dataset

Heart Failure Prediction Dataset

11 clinical features for predicting heart disease events.

#

If you need like specific data I will give it to you in a second I need to get on my laptop

#

I was using the instructions

#

First ones

#

https://www.datacamp.com/community/tutorials/naive-bayes-scikit-learn

DataCamp Community

Sklearn Naive Bayes Classifier Python: Gaussian Naive Bayes Scikit-...

Sklearn Naive Bayes Classifier Python. Learn how to build & evaluate a Gaussian Naive Bayes Classifier using Python's Scikit-learn package.

storm blade Jan 11, 2022, 9:33 PM

#

How can I learn Data Science and AI for free? it can also be cheap but like my wallet is struggling to survive sooo...

olive shore Jan 11, 2022, 9:34 PM

#

there are so many free sources out there, like fast.ai's course and courses on coursera and datacamp

brave sand Jan 12, 2022, 12:42 AM

#

how do I create a custom dataset for image detection?

#

as in what kind of images go in?

#

I just take hundreds of pictures of the object?

heady spoke Jan 12, 2022, 1:22 AM

#

hi

serene scaffold Jan 12, 2022, 2:00 AM

#

brave sand how do I create a custom dataset for image detection?

So, the model needs to be able to identify that certain regions of the image depict a certain thing? That kind of thing?

brave sand Jan 12, 2022, 2:14 AM

#

serene scaffold So, the model needs to be able to identify that certain regions of the image dep...

like in a real time video stream, it needs to be able to detect a certain image

olive shore Jan 12, 2022, 2:29 AM

#

is anyone good with data science or AI and could help me with my issue

stone marlin Jan 12, 2022, 2:31 AM

#

Try doing .reshape(-1, 1) on whatever it's throwing the error on, but you should probably also do the "10 minutes to pandas" guide, since most of the manipulation you're doing can be done with dataframe operations.

olive shore Jan 12, 2022, 2:36 AM

#

i did it,it doesnt work. They are saying do that if its one feature. doesnt one feature mean one column?

serene scaffold Jan 12, 2022, 2:38 AM

#

brave sand like in a real time video stream, it needs to be able to detect a certain image

detect certain images. do you mean it has to pick out entire frames, or identify things in the video?

olive shore Jan 12, 2022, 2:38 AM

#

serene scaffold detect certain images. do you mean it has to pick out entire frames, or identify...

do you know how to fix my issue?

serene scaffold Jan 12, 2022, 2:39 AM

#

olive shore do you know how to fix my issue?

no

#

```py
Sex_E=le.fit_transform(Sex)
CPT_E=le.fit_transform(CPT)
RestingECG_E=le.fit_transform(RestingECG)
ExerciseAngina_E=le.fit_transform(ExerciseAngina)
ST_Slope_E=le.fit_transform(ST_Slope)
```

every time you call fit_transform, you reset the encoder, which means you can't transform instances of the same feature again.

brave sand Jan 12, 2022, 2:41 AM

#

serene scaffold detect certain images. do you mean it has to pick out entire frames, or identify...

so like I have a drone 50 feet in the air. I have to detect a big red target mat in real time and fly over. Best way of doing this?

serene scaffold Jan 12, 2022, 2:42 AM

#

brave sand so like I have a drone 50 feet in the air. I have to detect a big red target mat...

this problem is called object detection, not image detection. see if you can find resources about it with that in mind.

brave sand Jan 12, 2022, 2:44 AM

#

serene scaffold this problem is called object detection, not image detection. see if you can fin...

ah sorry. so I did something with a pre-trained SSD MobileNet, but now I have to train my own neural network and label and create my own dataset. any tips for creating my own dataset?

serene scaffold Jan 12, 2022, 2:44 AM

#

I don't know how to do that, sorry.

brave sand Jan 12, 2022, 2:45 AM

#

same, couldn’t find any tips online

safe elk Jan 12, 2022, 2:45 AM

#

https://pjreddie.com/darknet/yolo/

YOLO: Real-Time Object Detection

You only look once (YOLO) is a state-of-the-art, real-time object detection system.

#

We tried that and trained

#

You do have to label the regions of interest for training data

#

https://cloudxlab.com/blog/label-custom-images-for-yolo/

CloudxLab Blog

How to label custom images for YOLO - YOLO 3 | CloudxLab Blog

In this blog we will show how to label custom images for making your own YOLO detector. We have other blogs that cover how to setup Yolo with Darknet, running object detection on images, videos and live CCTV streams. If you want to detect items not covered by the general model, you need custom training. … Continue reading "How to label custom im...

#

So you need photos of the target taken from a distance then do as above

brave sand Jan 12, 2022, 2:48 AM

#

can yolo recognize a custom dataset?

#

alright thanks

safe elk Jan 12, 2022, 2:49 AM

#

It needs the annotated images and there are annotation tools

#

Get a fast machine with a good gpu on or offline

stone marlin Jan 12, 2022, 2:50 AM

#

We've seen a lot of YOLO in the past few days, dang.

#

Also, I finished up that NN deeplearning ai course, it was pretty good. I did the basic keras/tensorflow tutorial, and that was also fun. It was nice building up little dealies and messing around with it. I didn't make anything substantial, but I did get it to classify some article text into a few subjects [sports, tech, and finance], which is better than nothing!

#

Thanks for recommending it to me, y'all. I feel like I know "something" about NNs now. Enough to know what to google, anyhow, if I ever need'em again.

desert oar Jan 12, 2022, 2:54 AM

#

stone marlin We've seen a lot of YOLO in the past few days, dang.

typical for this channel

stone marlin Jan 12, 2022, 2:54 AM

#

Next project: relearn PySpark junk so I don't look like a fool when I need to do it. :''']

#

Ugh, the worst thing about PySpark is honestly the logs. The rest is fine. The logs are just a gd nightmare. You make a typo and it's 50 pages of logs.

desert oar Jan 12, 2022, 2:56 AM

#

alas, jvm tracebacks

olive shore Jan 12, 2022, 2:56 AM

#

serene scaffold no

so are you saying I need to put this after everytime I encode something

#

```py
le = preprocessing.LabelEncoder()
```

serene scaffold Jan 12, 2022, 2:57 AM

#

olive shore so are you saying I need to put this after everytime I encode something

I would first establish that every feature (a) needs to be encoded and that (b) the LabelEncoder is the right encoder for the job

#

but in either case, every feature needs its own encoder, unless you never want to encode instances of that feature again.
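A minimal sketch of the one-encoder-per-feature idea, done with dataframe operations instead of lists and `zip`. The column names are borrowed from the heart dataset, but the rows, the `encoders` dict, and the `new_patient` frame are made-up assumptions for illustration only:

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# Toy rows (hypothetical values), not the real Kaggle data.
df = pd.DataFrame({
    "Age": [40, 49, 37, 48, 54, 39],
    "Sex": ["M", "F", "M", "F", "M", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY", "NAP", "NAP"],
    "MaxHR": [172, 156, 98, 108, 122, 170],
    "HeartDisease": [0, 1, 0, 1, 0, 0],
})

X = df.drop(columns="HeartDisease").copy()
y = df["HeartDisease"]

# One fitted encoder per string column, kept so new rows can be encoded the same way.
encoders = {}
for col in ["Sex", "ChestPainType"]:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

model = GaussianNB()
model.fit(X, y)  # X is 2D: one row per patient, one column per feature

# A new row must have the same columns, in the same order, encoded with the same encoders.
new_patient = pd.DataFrame(
    {"Age": [45], "Sex": ["F"], "ChestPainType": ["ASY"], "MaxHR": [110]}
)
for col, enc in encoders.items():
    new_patient[col] = enc.transform(new_patient[col])

predicted = model.predict(new_patient)
print(predicted)
```

Two caveats: scikit-learn documents `LabelEncoder` as meant for target labels (`OrdinalEncoder` is the feature-side equivalent), and `GaussianNB` treats the encoded integers as ordinary numbers, so whether this encoding is appropriate is still the open question from point (b).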

olive shore Jan 12, 2022, 2:59 AM

#

ok

#

yeah so the stuff I am encoding are strings and they need to be put into integers in order for them to be calculated into the probability

#

i used the same encoding method they used here

#

https://www.datacamp.com/community/tutorials/naive-bayes-scikit-learn

serene scaffold Jan 12, 2022, 3:00 AM

#

so MaxHR is a string?

#

because that sounds wrong.

olive shore Jan 12, 2022, 3:00 AM

#

serene scaffold ```py Sex_E=le.fit_transform(Sex) CPT_E=le.fit_transform(CPT) RestingECG_E=le.fi...

all of the stuff here are strings

serene scaffold Jan 12, 2022, 3:01 AM

#

why would a heartrate be a string

#

I just checked the dataset and it's a number.

olive shore Jan 12, 2022, 3:02 AM

#

which one?

#

maxHR?

serene scaffold Jan 12, 2022, 3:02 AM

#

yes. several of the features are numbers, actually.

olive shore Jan 12, 2022, 3:02 AM

#

```py
Sex_E=le.fit_transform(Sex)
CPT_E=le.fit_transform(CPT)
RestingECG_E=le.fit_transform(RestingECG)
ExerciseAngina_E=le.fit_transform(ExerciseAngina)
ST_Slope_E=le.fit_transform(ST_Slope)
```

those are the only one that are being encoded

#

maxHR isnt there?

#

wait are the stuff that go here

```py
variables = zip(Age,Sex,CPT_E,RestingBP,Cholesterol,FastingBS,RestingECG_E,MaxHR,ExerciseAngina_E,Oldpeak,ST_Slope_E)
```

only supposed to be the encoded ones?

serene scaffold Jan 12, 2022, 3:03 AM

#

I see

stone marlin Jan 12, 2022, 3:03 AM

#

That is the most things I've ever seen zipped. :'']

olive shore Jan 12, 2022, 3:03 AM

#

stone marlin That is the most things I've ever seen zipped. :'']

i dont need to zip them right?

#

i just followed exactly what they did in datacamp

#

idk if I messed something up though

stone marlin Jan 12, 2022, 3:09 AM

#

I'm only half paying attention, I'm sorry, Mr. Einstein, I've got to finish up a submission. I think a lot of what you want to do can be done with df operations, though.

wicked grove Jan 12, 2022, 3:37 AM

#

can someone pls tell me how i can resolve this

#

Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

olive shore Jan 12, 2022, 3:43 AM

#

what is the code?

gray swallow Jan 12, 2022, 6:37 AM

#

Guys can I have any recommendations for tutorials in AI and ML for beginners like absolute beginner

fair locust Jan 12, 2022, 9:44 AM

#

```py
rows, columns = (3, 6)
fig, axes = plt.subplots(nrows=rows, ncols=columns)
for i, key in enumerate(df.keys()):
    plot = df.boxplot(column=key, ax=axes[i%rows, i//rows])
    plt.setp(axes, xticks=[], yticks=[])

plt.show()
```

How can I add titles to this?

#

df.hist returns an ndarray

#

And I can't just use plt.title

prisma jay Jan 12, 2022, 9:51 AM

#

What makes df.groupby([some columns]).sum().reset_index() return a zero-row df?

olive patio Jan 12, 2022, 12:57 PM

#

Hey guys

#

I'm trying to do something with shap

#

```py
import shap
batch = next(iter(test_dl))
images, _ = batch

background = images[:100].to(device)
test_images = images[100:105].to(device)

e = shap.DeepExplainer(model, background)
shap_values = e.shap_values(images)

shap_numpy = [np.swapaxes(np.swapaxes(s, 1, -1), 1, 2) for s in shap_values]
test_numpy = np.swapaxes(np.swapaxes(test_images.cpu().numpy(), 1, -1), 1, 2)
shap.image_plot(shap_numpy, -test_numpy)
```

#

I'm getting an error -The size of tensor a (512) must match the size of tensor b (2048) at non-singleton dimension 1

#

This is my model -

```py
class PredsModel(ImageClassificationBase):
    def __init__(self, num_classes, pretrained=True):
        super().__init__()
        # Use a pretrained model
        self.network = models.resnet50(pretrained=pretrained)
        # Replace last layer
        self.network.fc = nn.Linear(self.network.fc.in_features, num_classes)

    def forward(self, xb):
        return self.network(xb)
```

#

How do I solve this? Thanks

fair locust Jan 12, 2022, 1:02 PM

#

fair locust ```py rows, columns = (3, 6) fig, axes = plt.subplots(nrows=rows, ncols=columns)...

^

tardy badger Jan 12, 2022, 1:39 PM

#

does anyone know a good course to learn Pyspark?

lapis sequoia Jan 12, 2022, 2:08 PM

#

So I am using Tensorflow for making 2 classifiers, I first placed 70% data into training, then 15% data into validation and 15% in test. I used that data for training 2 classifiers. Then I create new dataset in the same way and again train two classifiers. I do that 5 times. I got accuracy for each phase of testing.

#

Now I would like to use a statistical test for comparing my models

#

What statistical test do you propose?

nocturne hedge Jan 12, 2022, 2:50 PM

#

Hello, I have a question regarding neural networks. Does every input neuron get the whole input vector, or just one element of it?

vast thunder Jan 12, 2022, 4:03 PM

#

Guys is https://www.w3schools.com/ai/ a good tutorial source for ML?

W3Schools offers free online tutorials, references and exercises in all the major languages of the web. Covering popular subjects like HTML, CSS, JavaScript, Python, SQL, Java, and many, many more.

desert oar Jan 12, 2022, 4:13 PM

#

prisma jay What makes df.groupby([some columns]).sum().reset_index() return a zero-row df?

provide a sample of data that reproduces this problem

desert oar Jan 12, 2022, 4:14 PM

#

fair locust ```py rows, columns = (3, 6) fig, axes = plt.subplots(nrows=rows, ncols=columns)...

you should be able to set the title on each Axes object, otherwise use plt.suptitle to set a title for the whole figure

#

i think you can also invoke fig.suptitle if you want to use the OO interface
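A small sketch of both suggestions, assuming a toy three-column frame and a 1x3 grid rather than the original 3x6 layout (the data and the title text are placeholders). `Axes.set_title` labels each subplot and `fig.suptitle` labels the whole figure:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the real dataframe.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [5, 3, 1, 7]})

rows, columns = (1, 3)
# squeeze=False keeps axes 2D even when there is only one row.
fig, axes = plt.subplots(nrows=rows, ncols=columns, squeeze=False)

for i, key in enumerate(df.keys()):
    ax = axes[i % rows, i // rows]
    df.boxplot(column=key, ax=ax)
    ax.set_title(key)          # per-subplot title
    ax.set_xticks([])
    ax.set_yticks([])

sup = fig.suptitle("Box plots per column")  # one title for the whole figure
titles = [ax.get_title() for ax in axes.ravel()]
print(titles)
```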

desert oar Jan 12, 2022, 4:15 PM

#

tardy badger does anyone know a good course to learn Pyspark?

good question. i learned the basics from a coworker...

hot slate Jan 12, 2022, 4:21 PM

#

Hi everyone, in pandas.Series, how can I overload the == operator?

For example:

```py
s = pd.Series(['abc', 'daf', 'ghi'])
if (s == 'a').any():
  print('True')
```

I want to overload the == operator to perform some regular expression. The result that I want is:

True
True
False

serene scaffold Jan 12, 2022, 4:22 PM

#

you don't need to overload the operator for that. what regular expression operation are you trying to do?

#

@hot slate

#

!e

```py
import pandas as pd
s = pd.Series(['abc', 'daf', 'ghi'])
result = s.str.contains('a')
print(result)
```

arctic wedgeBOT Jan 12, 2022, 4:23 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | 0     True
002 | 1     True
003 | 2    False
004 | dtype: bool

serene scaffold Jan 12, 2022, 4:24 PM

#

and you can add .any() to the end of that if you want a single bool.

hot slate Jan 12, 2022, 4:24 PM

#

oh thanks alot

#

I didn't know the .str.contains before

serene scaffold Jan 12, 2022, 4:24 PM

#

!docs pandas.Series.str

arctic wedgeBOT Jan 12, 2022, 4:24 PM

#

pandas.Series.str


```py
Series.str()
```
Vectorized string functions for Series and Index.

NAs stay NA unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.

Examples

```py
>>> s = pd.Series(["A_Str_Series"])
>>> s
0    A_Str_Series
dtype: object
```...

serene scaffold Jan 12, 2022, 4:25 PM

#

the str accessor has tons of good stuff.

#

most string methods or regex functions, you can do to the whole series via the str accessor.

hot slate Jan 12, 2022, 4:26 PM

#

thanks

#

this is extremely helpful and informative for me

serene scaffold Jan 12, 2022, 4:28 PM

#

any time 💚

lapis sequoia Jan 12, 2022, 4:29 PM

#

So I am using Tensorflow for making 2 classifiers, I first placed 70% data into training, then 15% data into validation and 15% in test. I used that data for training 2 classifiers. Then I create new dataset in the same way and again train two classifiers. I do that 5 times. I got accuracy for each phase of testing.
Now I would like to use a statistical test for comparing my models
What statistical test do you propose?
If that's important I have maybe 850 photos per class

vagrant monolith Jan 12, 2022, 4:32 PM

#

Hello i have a timeseries type date i want to extract the year only, how can i do this ?

serene scaffold Jan 12, 2022, 4:34 PM

#

vagrant monolith Hello i have a timeseries type date i want to extract the year only, how can i d...

df['Date'].dt.year

stone marlin Jan 12, 2022, 4:34 PM

#

Another sweet accessor, dt.

lapis sequoia Jan 12, 2022, 4:35 PM

#

stone marlin Another sweet accessor, `dt`.

Do you know maybe answer for my question? I remember you previously commented something about deep learning on my post

stone marlin Jan 12, 2022, 4:35 PM

#

What statistical testing would you like to do?

olive patio Jan 12, 2022, 4:35 PM

#

guys can someone answer this please https://stackoverflow.com/questions/70684946/the-size-of-tensor-a-512-must-match-the-size-of-tensor-b-2048-at-non-singlet

Stack Overflow

The size of tensor a (512) must match the size of tensor b (2048) a...

I'm trying to use shap to improve the explainability of my model. This is the code-
import shap
batch = next(iter(test_dl))
images, _ = batch

background = images[:100].to(device)
test_images = im...

stone marlin Jan 12, 2022, 4:35 PM

#

"This model is statistically more accurate in some regard to this other one."?

vagrant monolith Jan 12, 2022, 4:36 PM

#

lapis sequoia Jan 12, 2022, 4:36 PM

#

stone marlin What statistical testing would you like to do?

I would like to compare my two models, I am not sure what statistical test would be best

vagrant monolith Jan 12, 2022, 4:36 PM

#

Thing is its not a datetime value its a time series

robust jungle Jan 12, 2022, 4:36 PM

#

I'm trying to make a bot to parry a kick in a game. The bot needs to be able to recognize when it will be hit by the kick in real time, how should I approach this?

desert oar Jan 12, 2022, 4:37 PM

#

vagrant monolith Thing is its not a datetime value its a time series

what do you mean by "a time series"

#

post your data, or a sample thereof

#

or at least tell us what the dtype is and give some example values

vagrant monolith Jan 12, 2022, 4:38 PM

#

heres the dataset

📎 coin_Dogecoin.csv

stone marlin Jan 12, 2022, 4:39 PM

#

No DMs please, keep everything public.

lapis sequoia Jan 12, 2022, 4:39 PM

#

stone marlin No DMs please, keep everything public.

Ok

stone marlin Jan 12, 2022, 4:40 PM

#

In re: to statistical tests, you've got a few models and you want to compare them. So, you want to kind of say, "This one has X accuracy (or whatever), and this one has Y accuracy (or whatever), so this one is better." That kind of thing? There's a large number of ways to do this sort of thing, so I'm trying to narrow down what you want.

serene scaffold Jan 12, 2022, 4:41 PM

#

vagrant monolith

what type is that column currently? are those all strings? because strings aren't proper datetimes.

#

!docs pandas.to_datetime

arctic wedgeBOT Jan 12, 2022, 4:41 PM

#

pandas.to_datetime

```py
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
```
Convert argument to datetime.

serene scaffold Jan 12, 2022, 4:41 PM

#

you'll have to use this to parse them out. here's the docs for the format= mini-language: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

lapis sequoia Jan 12, 2022, 4:41 PM

#

stone marlin In re: to statistical tests, you've got a few models and you want to compare the...

Yeah in my case are two models

stone marlin Jan 12, 2022, 4:42 PM

#

Remind me which type of classifier you've decided to use on them?

lapis sequoia Jan 12, 2022, 4:42 PM

#

One is my custom made CNN other is InceptionV3 with transfer learning

vagrant monolith Jan 12, 2022, 4:43 PM

#

@serene scaffold

#

Im sorry im new and kinda lost

serene scaffold Jan 12, 2022, 4:44 PM

#

vagrant monolith <@!253696366952316929>

so, they are strings. proper datetimes are stored as numbers.

#

the pd.to_datetime function will help you.

stone marlin Jan 12, 2022, 4:45 PM

#

The usual way that I know of (maybe someone else in here knows more) to test is called McNemar's test. It's a chi-square test which compares error rates in the two models.

#

There's some other ones, but they're usually fairly specific.

serene scaffold Jan 12, 2022, 4:45 PM

#

it might even work if you just do pd.to_datetime(df['Date'])

vagrant monolith Jan 12, 2022, 4:45 PM

#

I've already tried it no luck

#

serene scaffold Jan 12, 2022, 4:45 PM

#

so you'll have to provide the format=, which is where you say what each part of the string means.

#

oh

#

so it worked, you just have to write it back to the dataframe

#

```py
market_cap_doge['Date'] = pd.to_datetime(market_cap_doge['Date'])
```

stone marlin Jan 12, 2022, 4:47 PM

#

McNemar's test is given here: https://en.wikipedia.org/wiki/McNemar's_test and it might help you out. There are other tests --- like t- and z- score tests for some models, but the recent criticism iirc has been that they violate basic assumptions of t- and z-, so the recommendation was to not use them unless literally nothing else would work.
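As a rough illustration of what McNemar's test computes, here is a from-scratch sketch. It assumes you have, for the same test samples, a correct/incorrect flag from each model; the counts below are invented, and the function name is hypothetical. Only the samples where the two models disagree enter the statistic:

```python
import math

def mcnemar(a_correct, b_correct):
    """McNemar's chi-square statistic (no continuity correction) and p-value.

    a_correct / b_correct: per-sample booleans for model A and model B
    on the same test set.
    """
    # b: A right, B wrong.  c: A wrong, B right.
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if not x and y)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up example: 80 samples both get right, 15 only A gets right, 5 only B gets right.
a_flags = [True] * 80 + [True] * 15 + [False] * 5
b_flags = [True] * 80 + [False] * 15 + [True] * 5
stat, p_value = mcnemar(a_flags, b_flags)
print(stat, p_value)
```

With few disagreements, the chi-square approximation gets shaky, so treat this as a sketch of the idea rather than a recommendation for any particular sample size.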

#

If anyone else knows better than me, let me know, since I don't do statistical testing all that much on my models. I know, I know, it's bad.

vagrant monolith Jan 12, 2022, 4:47 PM

#

@serene scaffold it workeddd !! thanks so much!!

serene scaffold Jan 12, 2022, 4:47 PM

#

@vagrant monolith pandas objects work differently from the rest of Python. most pandas functions/methods return new objects, without changing existing ones

#

so, pd.to_datetime returns a new Series. it won't change the DataFrame that that Series came from unless you tell it to.
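A short sketch of that behaviour, using a made-up three-row frame (the dates and prices are placeholders, not values from the attached CSV). The conversion only sticks once the result is assigned back to the column, after which the `dt` accessor works:

```python
import pandas as pd

# Hypothetical rows; the real file's contents are not shown here.
df = pd.DataFrame({
    "Date": ["2021-01-03 23:59:59", "2021-06-15 23:59:59", "2022-01-10 23:59:59"],
    "Close": [0.0098, 0.32, 0.15],
})

converted = pd.to_datetime(df["Date"])  # returns a new Series
# The original column is untouched at this point: still text, not datetimes.
still_text = not pd.api.types.is_datetime64_any_dtype(df["Date"])

df["Date"] = converted          # write the result back to the dataframe
df["Year"] = df["Date"].dt.year  # now the dt accessor is available
print(df)
```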

lapis sequoia Jan 12, 2022, 4:48 PM

#

stone marlin McNemar's test is given here: https://en.wikipedia.org/wiki/McNemar%27s_test and...

What you mean by -t and -z?

vagrant monolith Jan 12, 2022, 4:49 PM

#

serene scaffold so, `pd.to_datetime` returns a new Series. it won't change the DataFrame that th...

Ohhh i see so it's not an instance method that modifies the existing value

#

unless u tell it so

#

i see now thanks a bunch

stone marlin Jan 12, 2022, 4:49 PM

#

Student t-test and Z-test, which are two of the "big" tests in statistics. https://en.wikipedia.org/wiki/Z-test It's the statistical test most stats classes will lead off with since it's fairly robust and usually pretty good.

serene scaffold Jan 12, 2022, 4:49 PM

#

vagrant monolith Ohhh i see so it's not an instance method that modifies the existing value

instance methods in pandas work the way that I said. they return a new instance, without changing the one that you called the method from.

lapis sequoia Jan 12, 2022, 4:49 PM

#

Going to eat, I already read something so I will respond later

stone marlin Jan 12, 2022, 4:50 PM

#

Good luck. I don't know much about this, so maybe someone else will be able to chime in later.

vagrant monolith Jan 12, 2022, 4:50 PM

#

@serene scaffold Oh ii get it now that makes sense

serene scaffold Jan 12, 2022, 4:51 PM

#

vagrant monolith <@!253696366952316929> Oh ii get it now that makes sense

it's confusing at first, but it's actually very useful once you get used to it. it makes it easier to keep track of how your data changes through your program.

#

soon you will understand 😄

vagrant monolith Jan 12, 2022, 4:53 PM

#

@serene scaffold yeaa i see how it can be helpful you don't wanna end up with modified data everytime

lime sigil Jan 12, 2022, 4:56 PM

#

How can I detect how likely it is that a string has the same meaning as another string?
Like I have a sentence "The first programming language was Fortran " and "Fortran was the first programming language"
For us they say the same, I need to detect it via Python

serene scaffold Jan 12, 2022, 4:56 PM

#

unrelated, but does anyone know of a library for taking existing voice audio and making it sound higher or lower? one that just changes the pitch isn't sufficient as there's more to voice quality than that. it's apparently very difficult to Google for because there's too much noise (voice synthesis libraries, general audio manipulation libraries, etc.)

serene scaffold Jan 12, 2022, 5:02 PM

#

lime sigil How can I detect how likely it is that a string as the same meaning as another s...

what kind of computer will you be running this on? you might be able to use models for sentence similarity.
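If a model turns out to be too heavy for the server, a much cruder baseline is word overlap. The sketch below is not a sentence-similarity model; it is a plain Jaccard score over lowercased word sets (the function name is made up), shown only as a cheap starting point:

```python
import re

def token_jaccard(a, b):
    """Share of distinct words the two strings have in common (0.0 to 1.0)."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

score = token_jaccard(
    "The first programming language was Fortran ",
    "Fortran was the first programming language",
)
print(score)

# Limitation: word order is ignored, so these also score as identical.
reordered = token_jaccard("dog bites man", "man bites dog")
print(reordered)
```

It handles the Fortran example because both sentences use exactly the same words, but it cannot see synonyms or word order, which is where a proper sentence-similarity model would be needed.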

brave granite Jan 12, 2022, 5:14 PM

#

how to find location of data in excel file using python

uneven oracle Jan 12, 2022, 5:19 PM

#

Can anybody support on this.

Randomly place 20 points within a unit square.
Find the two points that are closest to each other and compute their distance. Find the two points that are farthest from each other and compute their distance. Code these calculations from scratch; do not use a packaged function.
Repeat (1) to (2) r=100 times. Collect the closest and farthest pairs. Plot all pairs on a scatter plot, with blue points for closest pairs and red points for farthest pairs. Report the average closest- and farthest-pair distances on the scatter plot.
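A rough sketch of the computational part of this task (steps 1 to 3, without the plot), using only the standard library. The fixed seed and all names are arbitrary choices for illustration; the pair search is a brute-force loop over every pair, with the distance written out by hand:

```python
import math
import random

def closest_and_farthest(points):
    """Brute-force search over all pairs; returns (distance, p, q) for each extreme."""
    closest = None
    farthest = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            d = math.sqrt(dx * dx + dy * dy)
            if closest is None or d < closest[0]:
                closest = (d, points[i], points[j])
            if farthest is None or d > farthest[0]:
                farthest = (d, points[i], points[j])
    return closest, farthest

rng = random.Random(0)  # fixed seed so the run is repeatable (an arbitrary choice)
closest_pairs, farthest_pairs = [], []
for _ in range(100):
    # 20 random points in the unit square.
    pts = [(rng.random(), rng.random()) for _ in range(20)]
    c, f = closest_and_farthest(pts)
    closest_pairs.append(c)
    farthest_pairs.append(f)

avg_closest = sum(c[0] for c in closest_pairs) / len(closest_pairs)
avg_farthest = sum(f[0] for f in farthest_pairs) / len(farthest_pairs)
print(avg_closest, avg_farthest)
```

The collected `closest_pairs` and `farthest_pairs` hold the point coordinates needed for the blue and red scatter plot.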

lime sigil Jan 12, 2022, 5:21 PM

#

serene scaffold what kind of computer will you be running this on? you might be able to use mode...

linux server

#

I need it for a discord bot

opal fern Jan 12, 2022, 5:36 PM

#

#

```py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader as data
from tensorflow.keras.models import load_model
import streamlit as st

# the time of start and close

start = '2010-01-01'
end = '2019-12-31'

st.title('stock Trend Prediction')
user_input = st.text_input('Enter stock Ticker', 'AAPL')
#using datareader to take data, 'AAPL'is the company ticket
df = data.DataReader(user_input,'yahoo', start, end)

#describing data
st.subheader('Data from 2010 - 2019')
st.write(df.describe())
```

#

error : ModuleNotFoundError: No module named 'keras.models'

quasi parcel Jan 12, 2022, 5:42 PM

#

have you tried

#

uninstalling and installing?

lapis sequoia Jan 12, 2022, 5:47 PM

#

stone marlin The usual way that I know of (maybe someone else in here knows more) to test is ...

Did you say McNemar's test because in my case a non-parametric test should be done? If so, why does my case call for a non-parametric test?

quasi parcel Jan 12, 2022, 5:47 PM

#

i have an issue and its troubling me since days
the issue is in this sample csv
https://paste.pythondiscord.com/nufixokoka.yaml
i need to change these following columns type
product_category_id is in string need to parse to list of integer
product_category is in string need to parse in list of String
product_ids is in string need to parse to list of string
i tried ast.literal_eval
df_explode = piv_pdp.assign(second=piv_pdp.product_ids.str.split(","))
when i executed this literal_eval code it was giving this traceback

```
1         [240623]
2         [286313]
3         [285627]
4         [312021]
Name: product_ids, Length: 390202, dtype: object
```

#

can anyone please help

median idol Jan 12, 2022, 5:48 PM

#

Could someone please help, as I'm trying to use replace in my dataset but part that I'm trying to replace remains the same:

#

```py
df['floor'] = df['floor'].replace('-',0)
df['floor'] = df['floor'].astype('int64')

col_to_replace = ['hoa', 'rent amount', 'property tax', 'fire insurance']

for i in col_to_replace:
    df[i] = df[i].astype('string')
    df[i] = df[i].replace('R$', ' ')

df.head()
```

#

#

The output remains the same

timber sky Jan 12, 2022, 5:58 PM

#

Hi, I am training a model out of a sensordataset from kaggle. I guess somehow I am doing something wrong 😄

lapis sequoia Jan 12, 2022, 6:03 PM

#

Howdy y’all

#

Could anyone answer a question and provide a few insights for me? It’d be greatly appreciated

stone marlin Jan 12, 2022, 6:04 PM

#

lapis sequoia Did you say McNemar's test because in my case non-parametric test should be done...

I don't know why, but I assumed it was a classifier between two classes, but I'm not sure if that is the case --- if it isn't, I'm not exactly sure what test they do for multi-class modeling. And, no, it didn't have anything to do with parametric-ness.

lapis sequoia Jan 12, 2022, 6:04 PM

#

I’m a bit mentorless and just trying to understand a project I have for my boot camp. Sorry if it’s noobish.

lapis sequoia Jan 12, 2022, 6:05 PM

#

stone marlin I don't know why, but I assumed it was a classifier between two classes, but I'm...

I don't know why, but I assumed it was a classifier between two classes
It's multi class classifier

stone marlin Jan 12, 2022, 6:05 PM

#

Oof, I guess don't ask to ask is blocked. Either way, "Don't ask to ask", scoby, just ask.

lapis sequoia Jan 12, 2022, 6:06 PM

#

Damn, fucking statistics makes my life hard...

stone marlin Jan 12, 2022, 6:06 PM

#

Yeah, in that case I'm not sure what statistical test would be good. I imagine that McNemars could be extended, but I've never done it.

serene scaffold Jan 12, 2022, 6:06 PM

#

median idol ```df['floor'] = df['floor'].replace('-',0) df['floor'] = df['floor'].astype('in...

you have to remove the non-numeric characters, and then convert it to a numeric type.

df['hoa'].replace(r'[\$R,]', '', regex=True).astype(float)
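
A minimal sketch of why the original loop leaves the columns unchanged, assuming the cells look like `R$2,065` (the sample values below are made up). Without `regex=True`, `Series.replace` only swaps cells whose entire value equals `'R$'`, so nothing matches; the regex version strips the characters wherever they occur and the cast then succeeds.

```python
import pandas as pd

# Hypothetical sample values; the real dataset's formatting is an assumption.
df = pd.DataFrame({"hoa": ["R$2,065", "R$1,200", "R$0"]})

# Whole-cell match: no cell is exactly "R$", so nothing changes.
unchanged = df["hoa"].replace("R$", " ")

# Regex match on substrings: drop "R", "$" and "," wherever they appear,
# then convert the remaining digits to a numeric type.
cleaned = df["hoa"].replace(r"[\$R,]", "", regex=True).astype(float)

print(unchanged.tolist())
print(cleaned.tolist())
```

The same pattern can be applied inside the `for i in col_to_replace:` loop, one column at a time.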

lapis sequoia Jan 12, 2022, 6:06 PM

#

I’m just starting a Data Science program and basically I have this rubric for an assignment.

timber sky Jan 12, 2022, 6:07 PM

#

Why does the accuracy sink when I train a neural network? Like with every step, until it's almost 0 😄

serene scaffold Jan 12, 2022, 6:07 PM

#

are you sure you're not talking about loss?

lapis sequoia Jan 12, 2022, 6:07 PM

#

not looking for someone to do it for me just… a bit intimidated.

#

I was curious how y’all might approach this.

Im pretty familiar w python and it’s our first project. I just wanted some insights from those who are more knowledgeable than myself. It’s a data science ML boot camp. This is our first assignment.

#

It’s a simple dataset really.

serene scaffold Jan 12, 2022, 6:08 PM

#

lapis sequoia I was curious how y’all might approach this. Im pretty familiar w python and i...

any reason you can't copy and paste the text in that screenshot? it's easier for people to help you when they're dealing with text and it doesn't take that much more work on your part.

lapis sequoia Jan 12, 2022, 6:09 PM

#

It’s just my desktop. I’ll do that.

timber sky Jan 12, 2022, 6:09 PM

#

serene scaffold are you sure you're not talking about loss?

?

stone marlin Jan 12, 2022, 6:10 PM

#

The first step for any good DS project is to do EDA (exploratory data analysis) and to make a bunch of graphs and things. By the end of that, you should be able to answer part 1. I'd take things step-by-step.

robust jungle Jan 12, 2022, 6:10 PM

#

How can I go about making a bot to predict when something will happen via video (e.g. when a falling ball will hit the ground).

robust jungle Jan 12, 2022, 6:11 PM

#

timber sky ?

loss is pretty much inaccuracy
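
That shorthand is loose: loss and accuracy usually move in opposite directions, but they are different quantities. A small sketch with made-up predicted probabilities (the function names `accuracy` and `log_loss` are defined here for illustration, not taken from any library) shows two sets of predictions with identical accuracy and clearly different cross-entropy loss.

```python
import math

# Made-up binary labels and predicted probabilities of class 1.
labels = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]
hesitant = [0.6, 0.4, 0.55, 0.51]

def accuracy(probs, labels):
    # Fraction of predictions on the correct side of the 0.5 threshold.
    return sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels):
    # Mean binary cross-entropy: penalises low confidence in the true class.
    return -sum(math.log(p if y else 1 - p) for p, y in zip(probs, labels)) / len(labels)

print(accuracy(confident, labels), log_loss(confident, labels))
print(accuracy(hesitant, labels), log_loss(hesitant, labels))
```

Both sets classify every example correctly, yet the hesitant one has the higher loss, which is why a training log normally shows loss going down while accuracy goes up.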

lapis sequoia Jan 12, 2022, 6:11 PM

#

Description
Objective

Explore the dataset to identify differences between the customers of each product. You can also explore relationships between the different attributes of the customers. You can approach it from any other line of questioning that you feel could be relevant for the business. The idea is to get you comfortable working in Python.

You are expected to do the following :

Come up with a customer profile (characteristics of a customer) of the different products
Perform univariate and multivariate analyses
Generate a set of insights and recommendations that will help the company in targeting new customers.

Data Dictionary

The data is about customers of the treadmill product(s) of a retail store called Cardio Good Fitness. It contains the following variables-

Product - The model no. of the treadmill
Age - Age of the customer in no of years
Gender - Gender of the customer
Education - Education of the customer in no. of years
Marital Status - Marital status of the customer
Usage - Avg. # times the customer wants to use the treadmill every week
Fitness - Self rated fitness score of the customer (5 - very fit, 1 - very unfit)
Income - Income of the customer
Miles- Miles that a customer expects to run
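
One possible way to start on a brief like this, sketched with pandas. Only the column names come from the data dictionary above; the rows and the product codes `A`, `B`, `C` are invented placeholders, so the numbers mean nothing.

```python
import pandas as pd

# Made-up rows using column names from the data dictionary;
# product codes and values are placeholders, not the real data.
data = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "C", "C"],
    "Age": [22, 25, 31, 28, 40, 35],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Male"],
    "Income": [35000, 38000, 52000, 50000, 90000, 85000],
    "Miles": [60, 75, 90, 100, 160, 180],
})

# Univariate: distribution of a single column.
product_counts = data["Product"].value_counts()

# Multivariate: average numeric profile of the customers of each product.
profile = data.groupby("Product")[["Age", "Income", "Miles"]].mean()

# Two categorical columns against each other.
gender_by_product = pd.crosstab(data["Product"], data["Gender"])

print(product_counts, profile, gender_by_product, sep="\n\n")
```

Tables like `profile` are a rough first draft of a "customer profile" per product; plots of the same groupings would be the natural next step.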

serene scaffold Jan 12, 2022, 6:11 PM

#

robust jungle How can I go about making a bot to predict when something will happen via video ...

how narrow is the scope of what it needs to predict? because otherwise it sounds like you're edging "true AI".

timber sky Jan 12, 2022, 6:11 PM

#

basically I see it live and it goes down every second it does another step

robust jungle Jan 12, 2022, 6:12 PM

#

serene scaffold how narrow is the scope of what it needs to predict? because otherwise it sounds...

quite narrow

#

another example

#

say there was an attack in a game you needed to dodge

#

you already knew what attack it was you needed to dodge, and your goal was to dodge that one attack

lapis sequoia Jan 12, 2022, 6:13 PM

#

Sorry for the text block, I’m not so worried about completing the project or anything of that sort. Just wanted insights about how y’all might approach this,

trying to learn as much as I can without being in a vacuum

#

@stone marlin I will read about different non parametric stat tests...can I ask you something if I don't understand something?

lapis sequoia Jan 12, 2022, 6:13 PM

#

stone marlin The first step for any good DS project is to do EDA (exploratory data analysis) ...

Thank you

robust jungle Jan 12, 2022, 6:13 PM

#

it needs to:
realize that the attack is being used
find out when it will land (it already knows how long the animation is, it needs to figure out how far behind it was based on where it is in the animation)
avoid the attack

stone marlin Jan 12, 2022, 6:15 PM

#

lapis sequoia @stone marlin I will read about different non parametric stat tests...c...

You should ask the room in general, I may not be around or asleep, and I'm not an expert on non-param stats. :'[ Someone here may be though.

lapis sequoia Jan 12, 2022, 6:16 PM

#

stone marlin You should ask the room in general, I may not be around or asleep, and I'm not a...

Yeah, it seems that you are only one who is familiar with that

stone marlin Jan 12, 2022, 6:16 PM

#

I'm only tangentially familiar with it, unfortunately, and I may not have time to do the necessary research to answer your questions.

#

I will skim back up and if no one answers I'll try my best.

lapis sequoia Jan 12, 2022, 6:17 PM

#

stone marlin I will skim back up and if no one answers I'll try my best.

Nice

stone marlin Jan 12, 2022, 6:17 PM

#

Please no DMs, y'all, keep it in public chat.

#

(Not you, Luka, haha.)

lapis sequoia Jan 12, 2022, 6:20 PM

#

stone marlin (Not you, Luka, haha.)

Haha, I can DM you?

#

why is this so intimidating

stone marlin Jan 12, 2022, 6:20 PM

#

No, no, I meant, not this time, you.

#

No DMs from anyone, please. Everything in public chat.

lapis sequoia Jan 12, 2022, 6:20 PM

#

😅 it was me

#

Lol

stone marlin Jan 12, 2022, 6:21 PM

#

It's all good, it's one of those old-man vs younger peeps things. I'm an angry ancient guy.

lapis sequoia Jan 12, 2022, 6:21 PM

#

For example, Student’s t-test for two independent samples is reliable only if each sample follows a normal distribution and if sample variances are homogeneous.

#

So I should calculate normal distribution and variance of each image?

stone marlin Jan 12, 2022, 6:22 PM

#

You could do this --- I've read recently that this is violated but, honestly, I think most people still do the t-test / z-test.

lapis sequoia Jan 12, 2022, 6:22 PM

#

stone marlin You could do this --- I've read recently that this is violated but, honestly, I ...

Why it's violated?

#

Also, this two (normal distribution and sample variances) are for students tests

stone marlin Jan 12, 2022, 6:23 PM

#

IIRC, it's that it's not homoskedastic, but I'd have to read the paper again.

#

Lemme see if I can find an example of this.

lapis sequoia Jan 12, 2022, 6:24 PM

#

stone marlin Lemme see if I can find an example of this.

Thanks man

stone marlin Jan 12, 2022, 6:27 PM

#

https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/ Here's some examples of the tests. I'm not 100% sure how to do it with NNs, but with Cross Validation stuff you'll usually take the means of the slices, create a distribution with those, and test that the distributions are different using a difference-of-means.

#

(There is also a chi-square version but I've literally never seen it used, so idk about it.)

lapis sequoia Jan 12, 2022, 6:35 PM

#

Hello I need a help

#

pls

robust jungle Jan 12, 2022, 6:35 PM

#

lapis sequoia Hello I need a help

what do you need help with?

lapis sequoia Jan 12, 2022, 6:36 PM

#

With .csv files

robust jungle Jan 12, 2022, 6:36 PM

#

what about them?

lapis sequoia Jan 12, 2022, 6:36 PM

#

Problems with appending

robust jungle Jan 12, 2022, 6:36 PM

#

this may be the wrong channel, but I would still be happy to help

lapis sequoia Jan 12, 2022, 6:37 PM

#

I am using pandas library

lapis sequoia Jan 12, 2022, 6:37 PM

#

stone marlin https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-...

Thanks, I read several articles from that site about this topic but didn't find that one

#

@stone marlin Hmm, this is for numeric data in all examples

stone marlin Jan 12, 2022, 6:47 PM

#

Maybe I'm misunderstanding what you're trying to test --- there are many, many potential ways to "test" models to see which is "better", and there's various things better means.

lapis sequoia Jan 12, 2022, 6:48 PM

#

stone marlin Maybe I'm misunderstanding what you're trying to test --- there are many, many p...

Well, in my case accuracy of particular class isn't important, so I track accuracy

#

I thought to set hypothesis

stone marlin Jan 12, 2022, 6:49 PM

#

Wait, it's not important so you track it?

lapis sequoia Jan 12, 2022, 6:49 PM

#

accuracy of X > accuracy of Y

lapis sequoia Jan 12, 2022, 6:50 PM

#

stone marlin Wait, it's not important so you track it?

English isn't my native language, sorry. I mean, I follow accuracy as metric how I compare two models

stone marlin Jan 12, 2022, 6:50 PM

#

It's okay, I just was unsure what you meant.

lapis sequoia Jan 12, 2022, 6:50 PM

#

I wanted to say I don't pay attention to accuracy of particular class

#

So I thought that it's valid to set hypothesis as model X has greater acc than model Y

stone marlin Jan 12, 2022, 6:50 PM

#

Okay, so you have two models. You are looking at accuracy for model X and model Y. And you want a test to say, "This model performs better, in terms of this."

#

Right, exactly. So, that's exactly the test.

#

But, there's one big issue there. You don't have a standard deviation with just one run of the model.

#

You have just one value: the accuracy.

lapis sequoia Jan 12, 2022, 6:52 PM

#

stone marlin You have just one value: the accuracy.

Yeah, why would I need standard deviation?

#

For knowing my samples follow normal distribution?

stone marlin Jan 12, 2022, 6:53 PM

#

Well, right now, you've set up

H0: mean(X) == mean(Y)
HA: mean(X) != mean(Y)

I thought you noted you were going to do a t-test to test this.

lapis sequoia Jan 12, 2022, 6:56 PM

#

stone marlin Well, right now, you've set up ``` H0: mean(X) == mean(Y) HA: mean(X) != mean(...

Haha, I didn't know that I set up that 😄

#

I thought you noted you were going to do a t-test to test this.
I read that in using t-test

stone marlin Jan 12, 2022, 6:56 PM

#

Oh, whoops, you did >. But same deal.

lapis sequoia Jan 12, 2022, 6:56 PM

#

Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.

#

What does it mean that samples are independent?

stone marlin Jan 12, 2022, 6:57 PM

#

The gist for hypothesis testing is you calculate a "test statistic" and then you try to see if the corresponding p-value is small or not. [I'm leaving a LOT out here, because this could cover the last third of a stats course.]

lapis sequoia Jan 12, 2022, 6:57 PM

#

Independent samples are samples that are selected randomly so that its observations do not depend on the values other observations.

#

How did you conclude that I set
H0: mean(X) == mean(Y)
HA: mean(X) != mean(Y)

stone marlin Jan 12, 2022, 6:59 PM

#

Sorry, above you said >. In this case, though, it should probably be two-sided.

#

I thought to set hypothesis
accuracy of X > accuracy of Y

#

This is what you noted before.

lapis sequoia Jan 12, 2022, 6:59 PM

#

stone marlin ``` I thought to set hypothesis accuracy of X > accuracy of Y ```

Yeah but how does that relate to

#

H0: mean(X) == mean(Y)
HA: mean(X) != mean(Y)

stone marlin Jan 12, 2022, 7:00 PM

#

Ah, I see, I used "mean" which threw you off probably.

#

I was getting ahead of myself here. The idea is that you cannot do this test for one value of accuracy. Otherwise your test is just "this is greater than this".

desert oar Jan 12, 2022, 7:01 PM

#

stone marlin The gist for hypothesis testing is you calculate a "test statistic" and then you...

i'll elaborate further on this because i feel strongly about it:

hypothesis testing works by setting up a "null hypothesis". you then figure out how unlikely/unusual/rare/extreme your data is, assuming that the null hypothesis is true. if it turns out that your data is very unlikely/unusual/rare/extreme when the null hypothesis is assumed to be true, then this is taken as evidence against the null hypothesis. and we reject the null hypothesis when that evidence exceeds a pre-determined threshold.

the evidence is usually the p-value, and the threshold is the size of the test (often written as α)
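
To make the "how extreme is the data if the null is true" idea concrete, here is a small sketch using an exact sign-flip permutation test. This is a different, non-parametric technique swapped in for illustration, not the t-test discussed elsewhere in the chat, and the accuracy differences are invented.

```python
from itertools import product

# Hypothetical per-run accuracy differences (model X minus model Y).
diffs = [0.02, 0.03, 0.01, 0.04, 0.02]
observed = sum(diffs) / len(diffs)

# Under the null hypothesis "no difference", each difference is as likely
# to be negative as positive, so enumerate every possible sign assignment.
null_means = [
    sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
    for signs in product([1, -1], repeat=len(diffs))
]

# p-value: share of null outcomes at least as extreme as the observed one.
p_value = sum(m >= observed for m in null_means) / len(null_means)

alpha = 0.05  # size of the test, fixed before looking at the data
print(p_value, p_value < alpha)
```

With five runs there are only 32 sign patterns, so the smallest achievable one-sided p-value is 1/32; more runs would give the test more resolution.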

stone marlin Jan 12, 2022, 7:02 PM

#

Thank you, Salt, haha. I'm going to also have to step back because work is picking up in a few mins so feel free to chime in.

lapis sequoia Jan 12, 2022, 7:02 PM

#

stone marlin I was getting ahead of myself here. The idea is that you cannot do this test fo...

So how I should test this? What's your proposal?

stone marlin Jan 12, 2022, 7:02 PM

#

The usual way to do it is to get multiple values for accuracy, and then you've got a set of accuracy values for each model. At that point, you have a mean and a standard deviation for the accuracy of both models.

lapis sequoia Jan 12, 2022, 7:03 PM

#

stone marlin The usual way to do it is to get _multiple_ values for accuracy, and then you've...

Yeah I have multi values of accuracy actually

stone marlin Jan 12, 2022, 7:03 PM

#

You can then perform a t-test (since you have the mean and stdev of accuracies for both models), and you conclude that either the means of the accuracies are the same OR they are different.

#

(Actually: you either reject the null hypothesis or you fail to reject it, but, in this case it's probably going to be rejected.)

#

Good, so you've got some values for accuracy from each model

#

So you get the means + stdevs from those, and with those you perform a t-test.
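
A rough sketch of that computation, assuming five accuracy values per model (the numbers are invented). It computes the equal-variance t statistic by hand from the means and sample variances and compares it with `scipy.stats.ttest_ind`, which also returns a two-sided p-value.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies from five train/test splits per model.
acc_x = np.array([0.91, 0.89, 0.92, 0.90, 0.93])
acc_y = np.array([0.88, 0.87, 0.90, 0.86, 0.89])

n_x, n_y = len(acc_x), len(acc_y)
mean_x, mean_y = acc_x.mean(), acc_y.mean()
var_x, var_y = acc_x.var(ddof=1), acc_y.var(ddof=1)  # sample variances

# Equal-variance (pooled) two-sample t statistic.
pooled_var = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
t_manual = (mean_x - mean_y) / np.sqrt(pooled_var * (1 / n_x + 1 / n_y))

# Same test via SciPy, which also returns the two-sided p-value.
t_scipy, p_value = stats.ttest_ind(acc_x, acc_y, equal_var=True)

print(t_manual, t_scipy, p_value)
```

This treats the two sets of accuracies as independent samples; if both models were scored on the same splits, a paired test is usually the better fit.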

lapis sequoia Jan 12, 2022, 7:05 PM

#

@stone marlin You mean this test Paired Student’s t-test?

#

But what if two means of two paired samples are significantly different?

#

I don't understand how this test relates to what I want to test - whether model X has greater accuracy than model Y

stone marlin Jan 12, 2022, 7:07 PM

#

https://www.investopedia.com/terms/t/t-test.asp I don't know exactly how scipy does it, so I'd use the formulas here. I'd prob use the "Equal Variance" t-test for this.

Investopedia

T-Test Definition

A t-test is a type of inferential statistic used to determine if there is a significant difference between the means of two groups, which may be related in certain features.

#

The gist is like this: what if, by a fluke, your accuracy in model X was higher than model Y. Then model X is better right? Not necessarily. So you want to try a few different times to see if the one time you trained it wasn't a "fluke".

#

Like, you might have gotten lucky with data the first time and model X was really good, but it was terrible every other time.

#

I've got to go to a meeting, but I'd say the following: if this is an assignment, you might want to ask the teacher / TA what they want from this, there are MANY things that we could do to test it. If not, I wouldn't worry about testing right now, esp if you don't know hypothesis testing.

lapis sequoia Jan 12, 2022, 7:11 PM

#

stone marlin I've got to go to a meeting, but I'd say the following: if this is an assignment...

Good luck with meeting. I will read more about hypothesis testing

#

@stone marlin

The gist is like this: what if, by a fluke, your accuracy in model X was higher than model Y.
I think that's not a case in my case. What I did was to randomize data, placed 70% in training, 15% in validation and 15% in testing. Then I trained first model and tested it, then trained second model and tested it. I then again randomized data, placed 70% in training, 15% in validation and 15% in testing, trained first model and tested it, then trained second model and tested it...and I did that 4 more times, so I have 5 accuracies

stone marlin Jan 12, 2022, 7:15 PM

#

(Right: that's not the case. That's what you're trying to show explicitly with this test.)

lapis sequoia Jan 12, 2022, 7:16 PM

#

stone marlin (Right: that's _not_ the case. That's what you're trying to show explicitly wit...

Ok, just to check it, so you propose to use Paired Student’s t-test?

stone marlin Jan 12, 2022, 7:16 PM

#

I'd say that's what I'd use. Salt may know more. I will say this is not a common thing most DS people deal with --- at least in my field.
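
A minimal sketch of the paired version, assuming both models were evaluated on the same five random splits as described above (the accuracy values are invented). `scipy.stats.ttest_rel` works on the per-split differences, which is the same as a one-sample t-test of those differences against zero.

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies; position i in both arrays is the same data split.
acc_x = np.array([0.91, 0.89, 0.92, 0.90, 0.93])
acc_y = np.array([0.88, 0.87, 0.90, 0.86, 0.89])

# Paired test on the per-split differences (two-sided by default).
t_stat, p_two_sided = stats.ttest_rel(acc_x, acc_y)

# Equivalent formulation: one-sample t-test of the differences against 0.
t_check, p_check = stats.ttest_1samp(acc_x - acc_y, 0.0)

print(t_stat, p_two_sided)
```

For the one-sided hypothesis "X is more accurate than Y", recent SciPy versions (1.6 and later) accept `alternative='greater'` on `ttest_rel`. With only five runs the normality assumption is hard to check, so the result is best read as indicative.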

#

(okay, now I'm really gone.)

lapis sequoia Jan 12, 2022, 7:17 PM

#

stone marlin (okay, now I'm really gone.)

Thanks a lot for help, man! I appreciate that!

#

@desert oar are you here?

urban meadow Jan 12, 2022, 7:50 PM

#

dumb question, how to use numpy to convert an array of this kind to this kind?
[[255 255 255 ... 255 255 255]] -> [[[255 255 255] [255 255 255] [255 255 255]]]
so effectively for each element i do: 255 -> [255 255 255]

serene scaffold Jan 12, 2022, 7:55 PM

#

urban meadow dumb question, how to use numpy to convert an array of this kind to this kind? [...

looks like you're trying to reshape it. try arr.reshape(1, -1, 3). It won't work if the number of elements isn't evenly divisible by 3.

#

when you reshape an array, -1 means "the rest".

#

I made you a little example.

In [8]: np.repeat(255, 9)
Out[8]: array([255, 255, 255, 255, 255, 255, 255, 255, 255])

In [9]: arr = _

In [10]: arr.reshape(1, -1, 3)
Out[10]:
array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]])

urban meadow Jan 12, 2022, 8:00 PM

#

ty for the answer, just digesting

serene scaffold Jan 12, 2022, 8:01 PM

#

do you know how array shapes work?

#

not just reshaping them, but what the shape of an array is, in general

urban meadow Jan 12, 2022, 8:02 PM

#

ok i think i tried reshape but one problem is that i don't want to have 3 elements, for each element I want to replace it with an array of size 3 that has the same number

#

the real question is i have an image that is being read by opencv but the image is 2 colors, white/red and it keeps being read as a grayscale and it's not being read as rgb/bgr/whatever and every time i try to mask the mask image is grayscale

lapis sequoia Jan 12, 2022, 8:05 PM

#

Yo guys,

#

So I want to find the Average income for each gender in my dataset.

#

# Reading Data from relative directory
data = pd.read_csv("CardioGoodFitness.csv")

# Converting Gender values to integers for later use. Female = 0, Male = 1

data['Gender'].replace('Female', 0, inplace=True)
data['Gender'].replace('Male', 1, inplace=True)

#

ran this: then this,

#

data.groupby("Income")["Gender"].mean().sort_values(ascending=True)

#

got this:

#

Income
55713 0.0
65220 0.0
62535 0.0
53536 0.0
52291 0.0
...
48658 1.0
48556 1.0
31836 1.0
68220 1.0
104581 1.0
Name: Gender, Length: 62, dtype: float64

#

How can I consolidate this so its not ascending

#

or sorted, i guess i just delete the end of that command?
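
The command above groups by `Income` and averages `Gender`, which is probably the reverse of what was intended. A small sketch of the other direction, with invented rows (only the column names come from the assignment); it also works on the original `'Female'`/`'Male'` strings, so the 0/1 encoding is not needed for this step.

```python
import pandas as pd

# Made-up rows; only the column names come from the assignment.
data = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Male"],
    "Income": [50000, 60000, 54000, 48000, 72000],
})

# Group by the category, then average the numeric column.
mean_income = data.groupby("Gender")["Income"].mean()
print(mean_income)
```

The result has one row per gender, so there is nothing left to sort or consolidate.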

serene scaffold Jan 12, 2022, 8:10 PM

#

urban meadow ok i think i tried reshape but one problem is that i don't want to have 3 elemen...

like this?

In [47]: np.tile(np.arange(10), (3, 1)).T
Out[47]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4],
       [5, 5, 5],
       [6, 6, 6],
       [7, 7, 7],
       [8, 8, 8],
       [9, 9, 9]])

grand imp Jan 12, 2022, 8:11 PM

#

Hey, is there a way to train a Neural network on hundreds of human conversations, and make it be able to respond to inputs with a human-like tone?

serene scaffold Jan 12, 2022, 8:12 PM

#

grand imp Hey, is there a way to train a Neural network on hundreds of human conversations...

I would train it to input/output text, and then handle the speech synthesis separately

grand imp Jan 12, 2022, 8:13 PM

#

Well the input is text based and not speech

serene scaffold Jan 12, 2022, 8:15 PM

#

so, you're asking how to make a conversational chat bot? those are notoriously difficult because when humans have conversations, the things that they say are informed by a large body of knowledge and experiences. robots don't have that.

urban meadow Jan 12, 2022, 8:16 PM

#

@serene scaffold yes can I just feed in my array to np.arrange like np.arrange(imagearray)? that would be perfect
@lapis sequoia you can just clean it up after line per line. then do string manipulation/convert to int. u can use grep or just make a for loop and check if it's a number or not to make your income per person if speed isn't a concern

grand imp Jan 12, 2022, 8:16 PM

#

Exactly, but how about training it on thousands if not millions of conversations (yes I know it will take really long but it's possible) so that it gains information of the conversations

lapis sequoia Jan 12, 2022, 8:16 PM

#

Its not for that I have to

#

create business insights for a project

grand imp Jan 12, 2022, 8:17 PM

#

So it will compare your input with the others and see which output is most similar to the correct input, and give that which already contains the knowledge you are looking for.

serene scaffold Jan 12, 2022, 8:17 PM

#

urban meadow @serene scaffold yes can I just feed in my array to np.arrange like np.arra...

no, np.arange takes an integer n and returns a one-dimensional array of every integer from [0, n). you can't just pass random stuff to it.
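
A short sketch of the difference, with a made-up three-element array standing in for real pixel values: `np.arange` only generates a range of integers, whereas repeating values you already have means passing the array itself to something like `np.tile`.

```python
import numpy as np

# np.arange(n) yields the integers in the half-open interval [0, n).
first_five = np.arange(5)

# To repeat existing values, pass the array itself instead.
values = np.array([255, 0, 255])      # placeholder values
tripled = np.tile(values, (3, 1)).T   # each value becomes a row of 3 copies

print(first_five)
print(tripled)
```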

urban meadow Jan 12, 2022, 8:17 PM

#

ah i see

atomic leaf Jan 12, 2022, 8:17 PM

#

Hey guys! I hope I am not disturbing your convo too much c:
I am making a CAPTCHA solver with pytorch, but I can't create a dataset that has both the image/captcha and the label/target in the dataset/dataloader. Can someone assist me on this? ❤️

lapis sequoia Jan 12, 2022, 8:17 PM

#

im crying

grand imp Jan 12, 2022, 8:17 PM

#

we all are

serene scaffold Jan 12, 2022, 8:17 PM

#

atomic leaf Hey guys! I hope I am not disturbing your convo too much c: I am making a CAPTCHA...

We can't help you with a captcha solver, sorry.

lapis sequoia Jan 12, 2022, 8:17 PM

#

I just wanna see if the men on average are making more than the women in this hypothetical scenario

serene scaffold Jan 12, 2022, 8:18 PM

#

@atomic leaf keep in mind that asking for help with captcha solvers is against the rules, so don't do that in the future.

lapis sequoia Jan 12, 2022, 8:18 PM

#

trying to build a customer profile for a project

#

fake dataset

atomic leaf Jan 12, 2022, 8:18 PM

#

serene scaffold @atomic leaf keep in mind that asking for help with captcha solvers is...

oh, yea okay. My bad! Thanks for the answer

lapis sequoia Jan 12, 2022, 8:19 PM

#

going to make a business recommendation on marketing to men or women more based on customers income provided

urban meadow Jan 12, 2022, 8:20 PM

#

i mean what's stopping you from calling income for men -> create average by cleaning up dataset, calling income from women -> create average then compare?

grand imp Jan 12, 2022, 8:20 PM

#

atomic leaf oh, yea okay. My bad! Thanks for the answer

Just change the question to "needing help with image recognition" lmao

urban meadow Jan 12, 2022, 8:21 PM

#

@serene scaffold is it possible to clone my image array and then read into a new array of size 3? too dumb to figure out the syntax, something like a1, a2, a3, and then answerarray[a1,a2,a3]

#

not size 3 but each element is size 3

serene scaffold Jan 12, 2022, 8:22 PM

#

urban meadow @serene scaffold is it possible to clone my image array and then read int...

so the array is currently of shape (n, m), but you want to change it to (n, m, 3), where each slice in the third dimension is the same?

spice hamlet Jan 12, 2022, 8:24 PM

#

hi guys. i saw a few vids about ai in games (geometry dash for example) and wondered how an ai can teach itself how to play the game, without even having a variable that tells it (good/bad). im talking about genetic algorithms

serene scaffold Jan 12, 2022, 8:25 PM

#

serene scaffold so the array is currently of shape `(n, m)`, but you want to change it to `(n, m...

This is the solution, assuming that is the question:

In [59]: np.arange(12).reshape(4, 3)
Out[59]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [60]: arr = _

In [61]: np.dstack([arr] * 3)
Out[61]:
array([[[ 0,  0,  0],
        [ 1,  1,  1],
        [ 2,  2,  2]],

       [[ 3,  3,  3],
        [ 4,  4,  4],
        [ 5,  5,  5]],

       [[ 6,  6,  6],
        [ 7,  7,  7],
        [ 8,  8,  8]],

       [[ 9,  9,  9],
        [10, 10, 10],
        [11, 11, 11]]])

urban meadow Jan 12, 2022, 8:25 PM

#

@serene scaffold so for each element X in the original array, in the new array is [X X X] so not really the same because Ihave 2 colors

serene scaffold Jan 12, 2022, 8:25 PM

#

urban meadow @serene scaffold so for each element X in the original array, in the new a...

I can't answer the question effectively without knowing the number of dimensions in the array.

urban meadow Jan 12, 2022, 8:26 PM

#

source image/array is (315, 375) target is (315, 375, 3)

serene scaffold Jan 12, 2022, 8:26 PM

#

so, aren't you repeating the array three times into a new dimension?

#

because that's what I just did.
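
Applied to the shapes from the question, the same idea looks like this. The array below is a synthetic stand-in for the real image, and the three expressions are equivalent ways of copying the single channel into a new last axis.

```python
import numpy as np

# Stand-in for the (315, 375) single-channel image from the question.
gray = np.zeros((315, 375), dtype=np.uint8)
gray[100:200, 150:250] = 255

# Three equivalent ways to copy the channel into a new last axis.
stacked = np.dstack([gray] * 3)
repeated = np.repeat(gray[:, :, np.newaxis], 3, axis=2)
stacked_last = np.stack([gray, gray, gray], axis=-1)

print(stacked.shape)
```

If the image is already loaded with OpenCV, `cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)` should give the same three-channel result.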

urban meadow Jan 12, 2022, 8:27 PM

#

yes

#

ok i see it tyty

lapis sequoia Jan 12, 2022, 8:28 PM

#

serene scaffold I made you a little example. ```py In [8]: np.repeat(255, 9) Out[8]: array([255,...

What does -1 means here? Also, I am not sure why you use 1 for first parameter and 3 for last parameter?

serene scaffold Jan 12, 2022, 8:29 PM

#

lapis sequoia What does -1 means here? Also, I am not sure why you use 1 for first parameter a...

when you reshape an array, -1 means "the rest"

lapis sequoia Jan 12, 2022, 8:29 PM

#

serene scaffold when you reshape an array, -1 means "the rest"

Yeah I read that but couldn't understand what "the rest" would mean in that context

serene scaffold Jan 12, 2022, 8:29 PM

#

but that code didn't actually solve the asker's question.

serene scaffold Jan 12, 2022, 8:31 PM

#

lapis sequoia Yeah I read that but couldn't understand what "the rest" would mean in that cont...

the size of an array is the product of the length of each axis. when you reshape an array, the size has to remain the same. so the -1 represents whatever integer, if there is one, completes the product.

#

so if you have an array of size 12, you can reshape it to (2, -1, 2), and then -1 gets interpreted as 3, because 2 * 3 * 2 is 12.
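
The same rule as a runnable sketch: NumPy fills in `-1` with whatever length keeps the total size unchanged, and raises an error when no integer fits.

```python
import numpy as np

arr = np.arange(12)  # size 12

# -1 is inferred so that the product of the axis lengths stays 12.
a = arr.reshape(2, -1, 2)   # 2 * ? * 2 == 12  ->  ? is 3
b = arr.reshape(-1, 4)      # ? * 4 == 12      ->  ? is 3

print(a.shape, b.shape)

# No integer completes 5 * ? == 12, so NumPy raises ValueError.
try:
    arr.reshape(5, -1)
except ValueError as exc:
    print("reshape failed:", exc)
```

Only one axis can be `-1` in a given call, since two unknowns would leave the shape ambiguous.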

urban meadow Jan 12, 2022, 8:32 PM

#

in a philosophical perspective u use -1 because you can never get to -1 (unless you go backwards/use a negative step). even if you use an arbitrarily high number you will end up getting to it at some point

#

ok i never knew that about numpy

serene scaffold Jan 12, 2022, 8:33 PM

#

I think they just picked -1 somewhat arbitrarily.

lapis sequoia Jan 12, 2022, 8:34 PM

#

@serene scaffold I see. Thanks for explanation. Btw, are you maybe familiar with statistical tests?

serene scaffold Jan 12, 2022, 8:36 PM

#

like t tests?

lapis sequoia Jan 12, 2022, 8:38 PM

#

serene scaffold like t tests?

Yes

serene scaffold Jan 12, 2022, 8:41 PM

#

lapis sequoia Yes

there's a few t test implementations in scipy

lapis sequoia Jan 12, 2022, 8:42 PM

#

serene scaffold there's a few t test implementations in scipy

Okay, so I have few accuracies for my two models (each prediction from same training and test dataset)

#

I want to make sure that particular model that has greater mean is really better

#

What test do you propose that I use?

serene scaffold Jan 12, 2022, 8:44 PM

#

so you have two arrays, each representing scores from two models on the same data? you can use this: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

#

this assumes that each prediction is independent.

lapis sequoia Jan 12, 2022, 8:45 PM

#

serene scaffold this assumes that each prediction is independent.

Yes, and I don't know whether my predictions are independent and identically distributed

#

I read what identically distributed mean

lapis sequoia Jan 12, 2022, 8:46 PM

#

lapis sequoia Yes, and I don't know whether my predictions are independent and identically dis...

I think they are independent, because previous or next prediction didn't influence on current prediction

#

What do you think? @serene scaffold

serene scaffold Jan 12, 2022, 8:48 PM

#

I'm not really sure; I don't have time to dive in, unfortunately

lapis sequoia Jan 12, 2022, 8:53 PM

#

@stone marlin Are you available maybe? I don't know if my data is identically distributed, googled and still not sure

serene scaffold Jan 12, 2022, 9:04 PM

#

lapis sequoia @stone marlin Are you available maybe? I don't know if my data is identi...

please don't ping people asking for help. if they're not actively reading the channel, assume they aren't available.

#

they already told you that: #data-science-and-ml message

lapis sequoia Jan 12, 2022, 9:05 PM

#

serene scaffold please don't ping people asking for help. if they're not actively reading the ch...

Hmm, tbh, that doesn't make sense to me...can you explain, why I wouldn't tag someone?

lapis sequoia Jan 12, 2022, 9:06 PM

#

serene scaffold they already told you that: https://discord.com/channels/267624335836053506/3666...

Yeah but if I ask something and tag him that doesn't mean that he can just reply

serene scaffold Jan 12, 2022, 9:07 PM

#

lapis sequoia Hmm, tbh, that doesn't make sense to me...can you explain, why I wouldn't tag so...

it's rude to ping people to draw attention to your question. all help given here is volunteer-driven, and community members deserve to only be pinged to draw their attention to ongoing conversations that they've chosen to participate in

lapis sequoia Jan 12, 2022, 9:08 PM

#

serene scaffold it's rude to ping people to draw attention to your question. all help given here...

it's rude to ping people to draw attention to your question.
Why it's rude?

#

and community members deserve to only be pinged to draw their attention to ongoing conversations that they've chosen to participate in
Well, I wanted to ask him about something what we talked about

serene scaffold Jan 12, 2022, 9:08 PM

#

lapis sequoia > it's rude to ping people to draw attention to your question. Why it's rude?

because they get a notification on all of their devices, and for those who frequently volunteer to help, that gets noisy very quickly.

Please DM @sonic vapor if you have any other questions about this.

serene scaffold Jan 12, 2022, 9:09 PM

#

lapis sequoia > and community members deserve to only be pinged to draw their attention to ong...

but they already told you not to do that when they said to direct your questions to the whole channel.

#

If what you are currently typing pertains to this, please send it to @sonic vapor

lapis sequoia Jan 12, 2022, 9:12 PM

#

Yeah ok, I don't have time for that

#

I will follow rules

lapis sequoia Jan 12, 2022, 9:42 PM

#

Hello I have an existing project and I want to set a dev environment with conda because I'm using libraries compiled in C, and what started as a relatively small project has turned into a big one. I'm unsure whether I should install Anaconda in Windows Subsystem for Linux or install it regularly on Windows, since I have never used conda before and I'm not sure which would be the better choice

rapid pawn Jan 12, 2022, 10:35 PM

#

lapis sequoia Hello I have an existing project and I want to set a dev environment with conda ...

i would either dual boot ubuntu or another linux distro, or install conda natively on windows, if you have gigantic projects, just to avoid the jankiness of WSL. i would recommend a native linux environment if resources are tight, since it tends to take up less RAM and VRAM when running without a GUI, whereas windows' own GUI consumes a lot of both

desert oar Jan 12, 2022, 10:40 PM

#

lapis sequoia Hello, I have an existing project and I want to set up a dev environment with conda ...

i would avoid using anaconda and stick with plain conda ("miniconda") if possible

#

it works fine in powershell/cmd, you don't need a VM or WSL

#

in general building packages is nontrivial in windows, whereas in a linux-based environment you typically have a sensible build toolchain already set up

#

but if you are using pre-compiled conda packages, you should be fine working directly in windows

modest mulch Jan 12, 2022, 10:48 PM

#

does anyone know how to approximate/calculate the median or nth quantile of a very large dataset that can't fit into memory at once? preferably in batches rather than having to iterate through each sample.
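One way to approach the question above, as a minimal sketch rather than a definitive answer: make two passes over the batches, first to find the min/max and count, then to fill a fixed-size histogram, and read the quantile off the cumulative counts. The names `approx_quantile` and `make_batches` are hypothetical, the data is assumed to be re-readable (e.g. from disk), and the error is bounded by roughly one bin width. The inner counting loop is plain Python here; `numpy.histogram` with a fixed `range` could replace it to process each batch in one vectorized call.

```python
def approx_quantile(make_batches, q, bins=1000):
    """Approximate the q-th quantile (0 <= q <= 1) of data read in batches.

    make_batches is a zero-argument callable returning a fresh iterable of
    batches (lists of numbers), so the data can be scanned twice.
    """
    # Pass 1: global min, max and sample count.
    lo, hi, n = float("inf"), float("-inf"), 0
    for batch in make_batches():
        if not batch:
            continue
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
        n += len(batch)
    if n == 0:
        raise ValueError("no data")
    if lo == hi:
        return lo

    # Pass 2: histogram with equal-width bins over [lo, hi].
    width = (hi - lo) / bins
    counts = [0] * bins
    for batch in make_batches():
        for x in batch:
            i = min(int((x - lo) / width), bins - 1)  # clamp x == hi
            counts[i] += 1

    # Walk the cumulative counts and interpolate inside the target bin.
    target = q * n
    cum = 0
    for i, c in enumerate(counts):
        if c and cum + c >= target:
            frac = (target - cum) / c
            return lo + (i + frac) * width
        cum += c
    return hi


# Example: 10,000 values delivered in 10 batches of 1,000.
def make_batches():
    return ([float(v) for v in range(s, s + 1000)] for s in range(0, 10000, 1000))

median_estimate = approx_quantile(make_batches, 0.5)
```

More bins give a tighter estimate at the cost of memory. If only a single pass over the data is possible, a streaming sketch such as t-digest is the usual alternative.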

lapis sequoia Jan 12, 2022, 11:10 PM

#

Thank you very much for both explanations, I will definitely follow your leads tomorrow when I set up my development environment. I will probably stick to installing "Miniconda" instead of Anaconda natively in Windows since I think that I will be using packages that are accessible with Miniconda

#

I have one more question though

lapis sequoia Jan 12, 2022, 11:17 PM

#

lapis sequoia <@!199950202252165120> > The gist is like this: what if, by a fluke, your accur...

Is my data identically distributed?

#

I'm planning to use jupyter notebooks to display relevant information in a user friendly manner thanks to markdown. I have never used jupyter notebook before so I wonder how I can set up my conda environment to be able to interact with jupyter notebooks

desert oar Jan 12, 2022, 11:22 PM

#

lapis sequoia Thank you very much for both explanations, I will definitely follow your leads to...

anaconda is just conda with a bunch of stuff included by default. miniconda is conda only, it's only "mini" because it's a minimal installation; the underlying package manager is identical

desert oar Jan 12, 2022, 11:22 PM

#

lapis sequoia I'm planning to use jupyter notebooks to display relevant information in a user ...

it depends a little on how the notebooks will be shared/hosted/run

#

jupyter itself is a client/server setup

#

the actual code is run by a "jupyter kernel", which acts as the server

#

and you interact with a "jupyter frontend", which acts as the client

#

ipykernel (part of the ipython project) is the standard python kernel. you install this into your conda environment with all your dependencies

#

jupyter notebook or jupyterlab are frontends. these can run any kernel anywhere on your system. you can install them right into your project, in which case no additional setup is required. but if you are hosting this over the web, you might want to run a centralized instance (e.g. jupyterhub), in which case you will need to set up a "kernel spec" that tells the jupyter frontend how to start and connect to your desired jupyter kernel

#

(it can be a bit confusing because jupyter notebook is itself a server. but with respect to the jupyter protocol, it's a client)

#

jupyter kernel  <->  jupyter notebook  <->  user's browser
  (conda env)           (anywhere)            (anywhere)
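To make the "kernel spec" mentioned above concrete, here is a minimal sketch of what one contains: a `kernel.json` file telling the frontend which command starts the kernel. The interpreter path, display name and helper names below are hypothetical, and the file is written to a temporary directory purely for illustration. In practice, running `python -m ipykernel install --user --name <name>` from inside the conda environment generates this file for you.

```python
import json
import tempfile
from pathlib import Path


def build_kernel_spec(python_exe, display_name):
    """Return the contents of a kernel.json for an ipykernel-based kernel."""
    return {
        # Command the frontend runs to start the kernel; the frontend fills
        # in {connection_file} with the path of the connection info file.
        "argv": [python_exe, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


def write_kernel_spec(spec, kernel_dir):
    """Write spec as kernel.json inside kernel_dir and return the file path."""
    kernel_dir = Path(kernel_dir)
    kernel_dir.mkdir(parents=True, exist_ok=True)
    path = kernel_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


# Hypothetical interpreter path inside a conda env called "myproject".
spec = build_kernel_spec("/opt/conda/envs/myproject/bin/python", "Python (myproject)")

# Written to a throwaway directory here; a real spec lives in one of the
# directories listed by `jupyter kernelspec list`.
spec_path = write_kernel_spec(spec, Path(tempfile.mkdtemp()) / "myproject")
```

The key point is that `argv[0]` points at the python inside the conda environment, so the frontend can live anywhere while the code still runs with the project's dependencies.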

rapid pawn Jan 12, 2022, 11:28 PM

#

you can even use jupyter notebook in pycharm

#

which enhances the experience massively

#

also if you decide to stick with the browser, i suggest jupyter lab instead of jupyter notebook

#

since jupyter lab is the way forward

desert oar Jan 12, 2022, 11:29 PM

#

it's worth being precise about the terminology: pycharm can act as a jupyter frontend/client, and it can read and edit the same file format as jupyter notebook

rapid pawn Jan 12, 2022, 11:29 PM

#

yes exactly

#

jupyter notebook files have the extension .ipynb

#

which stands for ipython notebook iirc

#

in my experience, if you have a massive project, jupyter notebook or jupyter lab in a browser alone gets really messy

#

because it lacks a lot of IDE functionality such as stepwise debugging, auto association of variables, suggestive contexts etc

lapis sequoia Jan 12, 2022, 11:32 PM

#

desert oar it depends a little on how the notebooks will be shared/hosted/run

We will use it mainly during development to see a more visual representation of the data, and also to show it to the client so that they better understand how the process works. I don't think we are planning to host it alongside the code, which will probably be installed globally as a python package on a server hosted by the client

desert oar Jan 12, 2022, 11:34 PM

#

lapis sequoia We will use it mainly during development to see in a more visual way the represe...

you should look into JupyterHub

#

it's a way to host jupyter notebooks with user authentication

#

you can serve it over http and put it behind your company's domain

lapis sequoia Jan 12, 2022, 11:37 PM

#

Wow, thank you so much to all of you for the in-depth explanations. I will be reading through the messages to make sure that I have a proper understanding of everything so that tomorrow I can start setting up the environment

#

And I will definitely propose the idea of JupyterHub to my organization

stone marlin Jan 12, 2022, 11:41 PM

#

As noted before by both myself and now by Sterlercus, please do not ping me, it pings on all my devices.

#

I'm not sure if your data is iid, but you can assume it is for the sake of this problem.

#

I also am going to be busy for a while, I've got to finish a project fairly quickly, so unfortunately I will be unable to help for a bit.