#data-science-and-ml

cedar sky Oct 14, 2020, 10:28 AM

#

I am using the shakesperean data

#

I get a good accuracy in the training data

#

But the text generated is random

#

I use character wise text generation

#

ROMEO: hNk.HWg$zjq?CxlG$GWbjeOu!Byou$-svgZkf;xY bjgcT$i3ypDUwOgnWdmCVpkLHBpE3f:GCQWW
zE
P
W:Y.C vYzGzESOz,'

#

This is what it generates when I feed in "ROMEO: "

#

Epoch 1/100
139/139 [==============================] - 130s 938ms/step - loss: 1.1619 - accuracy: 0.6327

#

I am using tensorflow

mild topaz Oct 14, 2020, 11:18 AM

#

Traceback (most recent call last):
  File "image_classification.py", line 185, in <module>
    axs[0][1].imshow(x_batch[i].reshape(imageDimensions[0], imageDimensions[1]))
IndexError: index 14 is out of bounds for axis 0 with size 14```

glossy vale Oct 14, 2020, 12:29 PM

#

ix is not working.

mild topaz Oct 14, 2020, 1:09 PM

#

my training process @hasty grail plz check @sage idol @lapis sequoia

📎 unknown.png

hasty grail Oct 14, 2020, 1:11 PM

#

I'm busy rn, maybe someone else can help

mild topaz Oct 14, 2020, 1:12 PM

#

ok np . plz ping me when u back @hasty grail @sage idol @lapis sequoia 🙂

arctic wedgeBOT Oct 14, 2020, 1:41 PM

#

Hey @cedar sky!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

whole sage Oct 14, 2020, 2:12 PM

#

Hey

#

Does anyone knows why I get an Infeasible in optimization status

#

📎 Screenshot_20201014-092105_WhatsApp.jpg

#

Please help me

wide oxide Oct 14, 2020, 2:21 PM

#

Do we have a link for projects?

serene scaffold Oct 14, 2020, 2:59 PM

#

@void anvil if you're sure that any instance of a punctuation character can be removed, you can use a pretty simple regular expression to substitute all of them with an empty string

#

return the string without the punctuation characters?

#

strings are immutable so a function like re.sub will return a new string rather than modify the one you passed

#

you don't care whether what is mutable?

#

it's relevant to your question though

#

oh I see

#

so it's removing punctuation when it spell checks it, and you don't want that?

#

are you required to use this particular library?

#

It looks like this is a known issue with this library

#

someone on github posted their own workaround: https://github.com/mammothb/symspellpy/issues/7

GitHub

Punctuations getting removed · Issue #7 · mammothb/symspellpy

Hello. Great work! For inputs with punctuations, There are many novel, innovaive, and empirical anayysis availaible. we get outputs like there are many novel innovative and empirical analysis avail...

#

right, someone posted a workaround

#

it's on that page

#

alright

#

If you know where after which tokens the punctuation is supposed to go, you could save that information and then add them back in, but it appears that it's not guaranteed that the number of tokens in matches the number of tokens out.

#

I'm looking to see if there's another library

#

https://spacy.io/universe/project/contextualSpellCheck
this will very likely be slower but it looks like what you want

Contextual Spell Check

Contextual Spell Check · spaCy Universe

Contextual spell correction using BERT (bidirectional representations)

serene scaffold Oct 14, 2020, 3:36 PM

#

@void anvil sorry I couldn't be more helpful. A lot of NLP projects are moving towards computationally heavy solutions.

vocal sequoia Oct 14, 2020, 4:36 PM

#

Hello
I have a issue in neural networks can someone help me?
I obtained the dataset for Video Clips from ChaLearn Competition 2017.
we divided the video into Images (frames) and Audio (wav) files.
I have annotations in a pickle file
How do I read the dataset into keras ?

dapper fern Oct 14, 2020, 4:37 PM

#

Hey guys anyone learning ml, doing kaggle wanna coop? I'm beginner with some knowledge of theory, somewhat ok at python and stuff.

green basin Oct 14, 2020, 5:41 PM

#

Hi all, I have a question regarding the application of DS but I'm not sure it belongs here. The gist is that I have an existing model that predicts labels for data (derived from a video), and now I want to write those labels onto the source video but I am not sure about the best way to do this. Is this the correct channel to ask something like this? (My apologies if this is not the right way to do this, I don't normally come to the help section) (PMs welcome).

tight hawk Oct 14, 2020, 6:07 PM

#

I got an excel sheet with dates in a column in format dd/mm/yyyy
for example
07/07/2020

when i read this excel sheet using read_excel
pandas by default will read this column as int64 tyoe and the df will show value of 44019 instead of 07/07/2020

This happens for all the date values

How can i fix this ?

grave frost Oct 14, 2020, 6:24 PM

#

@green basin Can you elaborate?

green basin Oct 14, 2020, 6:24 PM

#

@tight hawk https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
Look under "dtype" or event "parse_dates". I would try using the optional argument "dtype" with a value that is a dict. The key for the dict should be the col name and the value should be the data type. If you can't find a suitable date data type, try parsing as a string then use dataframe['myNewDateCol'] = dataframe['myOriginalDateCol'].map(lambda x: ... to parse the date to how you want to do it. At least that's how someone told me to do it once. Don't take my word as gospel.

grave frost Oct 14, 2020, 6:26 PM

#

@vocal sequoia What is the format of the text? Does it have some <end> or <start> sequences? Is it natively in JSON?

green basin Oct 14, 2020, 6:27 PM

#

@grave frost Sure. Let's assume that I have a Pandas DataFrame with 1 row for every 1 frame in my source video. Let's say that I have 1 column that has my string label value where I want to add that string as text ON TOP of the source video. I have tried using cv2 module; however, all implementations of adding text on top of video (that I have seen using cv2) requires writing each frame (as an image) to disk, adding the text onto the frame, then zipping all of those frames-with-text together to make a new video. I am hoping to find an alternative solution that does not require writing those frames to disk because it is so memory and CPU intensive.

grave frost Oct 14, 2020, 6:30 PM

#

@green basin I get a lot of hits on Google for reading images directly in memory using cv2. https://jdhao.github.io/2019/07/06/python_opencv_pil_image_to_bytes/ Have you tried these methods?

Convert PIL or OpenCV Image to Bytes without Saving to Disk

Introduction
Sometimes, we may want an in-memory jpg or png image that is represented as
binary data. But often, what we have got is image in OpenCV (Numpy ndarray) or
PIL Image format. In this post, I will share how to convert Numpy image or PIL
Image object to binary data wi...

green basin Oct 14, 2020, 6:33 PM

#

You know what, I've seen that web page before and it didn't answer my question before, but it has inspired me to try something different that I had not thought of. I will let you know how I fare 🙂

vocal sequoia Oct 14, 2020, 6:38 PM

#

@vocal sequoia What is the format of the text? Does it have some <end> or <start> sequences? Is it natively in JSON?
@grave frost

Dataset has videos
And annotations I extracted in a df
What I got was 5 columns with big5 personality traits (which is what I want to predict )
And 1 col with the name of the video

green basin Oct 14, 2020, 6:58 PM

#

@grave frost so I dont think cv2.imencode helped from what I see. It looks like it just serializes it but that's not where I'm having trouble. The trouble im having is after that, after i have read in a video frame (as a numpy array of (width, height, colour-depth)-shape), I'm trying to add text to that video frame and every other frame and then save a new video with those labeled frames but without having to write those frames to disk

grave frost Oct 14, 2020, 7:03 PM

#

So you do not want disk to be involved in any case?

#

@cedar sky How many epochs did you train it for?

#

@green basin Well I am no expert in this matter but I highly doubt any lib is there which accomplishes this. It seems to require very low level programming for this task.
However I don't understand where the disk is involved in the processing part. You would read img as an array, edit it according to your needs and save it. Only step 3 requires disk. Why would you write it to HDD in 1 or 2?

green basin Oct 14, 2020, 7:09 PM

#

It's just the way i've seen it done on every tutorial for cv2. I know, it sounds ridiculous. I kept getting errors about a week ago, took a break to fix something else, and now I've returned to the problem and I'm attacking it all over again. I'm going to try 1 more approach and hopefully it works. I'll let you know if i have any more questions and I'll try to be more specific. Thank you for your time.

grave frost Oct 14, 2020, 7:11 PM

#

@mild topaz Your training looks definitely wrong. For epoch 12 I don't expect val_acc to be 100% along with accuracy. Are you using dropout layers and are you sure you are not overfitting (by a very large margin it seems)

lyric canopy Oct 14, 2020, 7:11 PM

#

Did you mean to ping me? Maybe you mean something else

grave frost Oct 14, 2020, 7:11 PM

#

oh wait, so sorry 🤦‍♂️

#

It was the other jr guy

hollow sentinel Oct 14, 2020, 7:14 PM

#

does plot.ly count as data science

#

bc I wanna do some data visualization

grave frost Oct 14, 2020, 7:15 PM

#

It does

#

Though it is a general topic so you can find help in a ton of other channels too

hollow sentinel Oct 14, 2020, 7:15 PM

#

for plotly?

#

like what

grave frost Oct 14, 2020, 7:15 PM

#

I mean data visualization

hollow sentinel Oct 14, 2020, 7:15 PM

#

oh true

#

yeah i'm only on step one tho loading in a data set and creating a database

green basin Oct 14, 2020, 7:24 PM

#

@grave frost Final question since you seem knowledgeable: can you suggest a module for building an app to modify hyper parameters when building classifiers?
I have some lab-mates who understand the hyper parameters, but aren't coders, and they want to mess with the model parameters iteratively on some sort of GUI. Django is overkill but Streamlit isn't developed enough to satisfy our needs. Any suggestions for a middle ground?

rigid vector Oct 14, 2020, 7:54 PM

#

Hi ! I have a computational science question. I am working to solve an ODE of the form dy/dt=f(y,t), with f a function which depend of combination of sin(y) and polynomial function of t as t*(t-2) but I cannot find any exemple on the web which seems like my question. Is someone can help ?

storm sandal Oct 14, 2020, 7:58 PM

#

@rigid vector this seems more like a DiffEQ problem than a Data Science problem.

rigid vector Oct 14, 2020, 8:02 PM

#

hmm ok, I am going to search the right chanel where to write my question

#

thanks

heady hatch Oct 14, 2020, 8:12 PM

#

Hey all,

Currently I'm working on a dataset with couple million rows in csv format, and I think I'm running out of memory while trying to train the model.

My input pipeline is:

read data into pandas
filter rows, extract text and target
stratify split data into train and test
tokenize text, convert text to seq, then pad sequences
then throw it into model

I'm unsure if I'm batching, I did set the batch_size variable during fitting.

Couple options I'm looking into right now is data subsampling and using generators.

#

Any advice?

wise garden Oct 14, 2020, 9:00 PM

#

What's that popular website that has free modules on ML

#

someone told me but I forgot

#

jk found it

heady hatch Oct 14, 2020, 9:14 PM

#

Oh that's super cool. I'll look into it.
Thank you @void anvil .

Does anyone know a TF/Keras solution?

#

I think I can lazy load it without the library though.

#

I'm not looking for visualization or anything, just needed to load it into model.

#

I'm still trying it out. But what I've done thus far is just saving the preprocessed data as h5 format onto drive and load the preprocessed data instead.

#

I'll be spreading words about vaex though, I think my colleague will appreciate it.

shadow harbor Oct 14, 2020, 9:32 PM

#

Heyy everyone, I am new to coding and I am planning to take up data science or data analysis as a career option. Any tips and tricks plsss?

desert oar Oct 14, 2020, 10:00 PM

#

@shadow harbor long road, a lot to learn. be patient and keep working. you are likely making progress even if you don't feel like you are.

shadow harbor Oct 14, 2020, 10:01 PM

#

thanks @desert oar , any tips in particular regarding what to learn as a beginner and what to follow? It'll be great help!

velvet thorn Oct 14, 2020, 10:59 PM

#

@glad mulch what od yo uexpect the result to look like?

serene scaffold Oct 14, 2020, 11:36 PM

#

@void anvil that's pretty annoying. sorry there aren't any better alternatives

hollow sentinel Oct 14, 2020, 11:43 PM

#

where do you guys look for data sets

#

data.gov?

#

or kaggle

austere swift Oct 14, 2020, 11:43 PM

#

kaggle

#

sometimes github has some good ones

hollow sentinel Oct 14, 2020, 11:44 PM

#

i just want to do some data visualization with plot.ly

#

but would that require setting up a frontend and backend

#

i'm a noob w this stuff if it wasn't obvious

#

https://www.kaggle.com/jeffreybraun/chipotle-locations

Chipotle Locations

A CSV file for all Chipotle Locations in the US

#

I want to use that CSV show how many chipotle restaurants are in each location

#

or each state

#

but idk how to get started

#

can you put a CSV in a database?

#

I tried looking at some github projects but i couldn't figure it out

#

did i just kill the vibe

quiet vessel Oct 15, 2020, 12:02 AM

#

nah its wednesday

hollow sentinel Oct 15, 2020, 12:03 AM

#

oh ok good i thought you guys would be like nah we don't help with projects

#

i just want to make a pie chart of all the data with each State being a part of the pie chart

#

very simple ik but i just wanted a project on my resume

bright turret Oct 15, 2020, 12:04 AM

#

i never got an answer to my 3d histogram question, i wonder if they prefer pies

hollow sentinel Oct 15, 2020, 12:05 AM

#

3d histogram?👀

#

yeah well everyone here has lives and can't always consistently respond haha

bright turret Oct 15, 2020, 12:05 AM

#

ya like a time series of histograms plotted along the z axis

hollow sentinel Oct 15, 2020, 12:06 AM

#

@glad mulch wdym

#

it's there in the kaggle dataset

#

i'm confusion

bright turret Oct 15, 2020, 12:07 AM

#

you can put a csv in a database

#

but that's not exactly python

hollow sentinel Oct 15, 2020, 12:07 AM

#

yeah it's SQL

bright turret Oct 15, 2020, 12:07 AM

#

you can make a pie chart using matplotlib

#

and the csv

#

without getting a database involved

#

what is ur question

hollow sentinel Oct 15, 2020, 12:07 AM

#

do you need a frontend and a backend for it

#

bc i don't know how to set those up

bright turret Oct 15, 2020, 12:08 AM

#

https://matplotlib.org/3.1.1/gallery/pie_and_polar_charts/pie_features.html

#

https://www.anaconda.com/

Anaconda

Anaconda | The World's Most Popular Data Science Platform

Anaconda is the birthplace of Python data science. We are a movement of data scientists, data-driven enterprises, and open source communities.

hollow sentinel Oct 15, 2020, 12:09 AM

#

do you need a frontend to create the graph from the csv

bright turret Oct 15, 2020, 12:10 AM

#

lets drop the jargon for a moment

hollow sentinel Oct 15, 2020, 12:10 AM

#

do you need any javascript at all

bright turret Oct 15, 2020, 12:10 AM

#

i do not believe so

#

you need to learn the numpy and pandas libraries

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

hollow sentinel Oct 15, 2020, 12:10 AM

#

i know a tiny bit of numpy

bright turret Oct 15, 2020, 12:11 AM

#

pandas can read your csv and import it into a dataframe

#

from there u can use matplotlib to make a pie chart

hollow sentinel Oct 15, 2020, 12:11 AM

#

ok i'll watch some tutorials

#

yeah the kaggle csv has it too

#

so i'll just do what @bright turret is saying

#

it

#

it'll be a good exercise

#

thanks guys

bright turret Oct 15, 2020, 12:14 AM

#

good luck, take it slowly

#

slowly

#

slowly

hollow sentinel Oct 15, 2020, 12:15 AM

#

yeah there's some videos for data visualization

#

can you do it on repl.it

#

yes right it's just import statements

bright turret Oct 15, 2020, 12:16 AM

#

never heard of that

hollow sentinel Oct 15, 2020, 12:16 AM

#

https://repl.it/~

repl.it

Log In

Repl.it is a simple yet powerful online IDE, Editor, Compiler, Interpreter, and REPL. Code, compile, run, and host in 50+ programming languages: Clojure, Haskell, Kotlin, QBasic, Forth, LOLCODE, BrainF, Emoticon, Bloop, Unlambda, JavaScript, CoffeeScript, Scheme, APL, Lua, Pyt...

#

online IDE

bright turret Oct 15, 2020, 12:16 AM

#

i just use anacondas

hollow sentinel Oct 15, 2020, 12:16 AM

#

anaconda is weird on my mac

bright turret Oct 15, 2020, 12:16 AM

#

oh RIP

hollow sentinel Oct 15, 2020, 12:16 AM

#

F

bright turret Oct 15, 2020, 12:16 AM

#

F

hollow sentinel Oct 15, 2020, 12:17 AM

#

literally any IDE is wack on my mac

#

it's like the universe wants me to use repl.it

bright turret Oct 15, 2020, 12:18 AM

#

Windows 4 lyfe or something better comes along

#

But at least i'm not the biggest noob here, so thanks

hollow sentinel Oct 15, 2020, 12:18 AM

#

thank you for helping me @bright turret

bright turret Oct 15, 2020, 12:18 AM

#

you're welcome

hollow sentinel Oct 15, 2020, 12:18 AM

#

i'll come back w more questions

bright turret Oct 15, 2020, 12:19 AM

#

i'm sure you will

hollow sentinel Oct 15, 2020, 12:33 AM

#

omg guys

#

i just made my first graph

#

i didn't know it would work on repl.it

hollow sentinel Oct 15, 2020, 1:05 AM

#

sike i switched to anaconda

heady hatch Oct 15, 2020, 1:08 AM

#

Hey people, anyone familiar with tensorflow dataset api?

https://www.tensorflow.org/api_docs/python/tf/data/Dataset

I came across this error while trying to train my model.
The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (140, 8))

I wasn't too sure what was wrong.

My model is

Model: "functional_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         [(None, 140)]             0         
_________________________________________________________________
embedding_5 (Embedding)      (None, 140, 50)           1000000   
_________________________________________________________________
bidirectional_5 (Bidirection (None, 140, 100)          40400     
_________________________________________________________________
global_max_pooling1d_5 (Glob (None, 100)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 50)                5050      
_________________________________________________________________
dropout_5 (Dropout)          (None, 50)                0         
_________________________________________________________________
dense_11 (Dense)             (None, 8)                 408       
=================================================================
Total params: 1,045,858
Trainable params: 1,045,858
Non-trainable params: 0
_________________________________________________________________

Shape of the training features is (14292, 140) and shape of label is (14292,).

TensorFlow

tf.data.Dataset | TensorFlow Core v2.3.0

Represents a potentially large set of elements.

#

Please let me know if you need more information.

#

To clarify,

net = lstm(...)

train = tf.data.Dataset.from_tensor_slices((xtrain, ytrain))
test = tf.data.Dataset.from_tensor_slices((xtest, ytest))

callbacks = []
callbacks.append(EarlyStopping(monitor="val_loss", min_delta=0.01, patience=3, verbose=1, mode="auto"))

net.compile(loss='sparse_categorical_crossentropy', 
            optimizer='adam', 
            metrics=['accuracy'])

history = net.fit(train,
                  epochs=5,
                  batch_size=16,
                  callbacks=callbacks,
                  validation_data=test)

This was how I trained the model using Dataset API.

#

I think I might have figured it out! But I'm currently fitting on the whole dataset to see if there's overfitting or not.

I'll have to report back after!

rain crescent Oct 15, 2020, 1:27 AM

#

can anyone suggest materials for learn numpy in detail

lapis sequoia Oct 15, 2020, 1:29 AM

#

literally any IDE is wack on my mac
@hollow sentinel dont use an IDE then

hollow sentinel Oct 15, 2020, 1:34 AM

#

y'all i got a question

#

https://www.kaggle.com/jeffreybraun/chipotle-locations

Chipotle Locations

A CSV file for all Chipotle Locations in the US

#

i'm trying to graph this with each state as it's own slice in a pie chart

#

yes i've looked at the documentation

#

i'm stuck

#

you can't put a dataframe into a piechart?

wheat pilot Oct 15, 2020, 1:41 AM

#

how do you compute leave one out cross validation error for 1 nearest neighbors if the data is multifeatured and each feature is categorical?

magic quail Oct 15, 2020, 2:55 AM

#

Hi.
I have a dataframe, I am trying to convert it to parquet. By default the column datatype is varchar, that is string, but it contains numeric values to that is interpreted as float. Before converting the data frame to parquet how can I make sure, that all values of the column are made as strings?

heady hatch Oct 15, 2020, 3:01 AM

#

One way I can think of is cast the types to string first before the conversion.

#

Something along the lines of

df['col'].astype(str)

magic quail Oct 15, 2020, 3:04 AM

#

I have tried it

#

Produces the same error

#

@heady hatch

heady hatch Oct 15, 2020, 3:06 AM

#

Oh what error are you getting?

#

@magic quail

blazing rain Oct 15, 2020, 3:19 AM

#

Hey all. I'm currently training in Python for DS/ML/AI and would like to specialize in Spatial Data Analysis. I have a formal university degree with majors in Geography, Economics, and International Studies. Does anyone have suggestions for projects / additional training in this specific area? I've done more generalized bootcamps in Py, NN, etc. but would like to get some experience with spatial data in ML/AI, including display via GIS. Any suggestions are much appreciated!

austere swift Oct 15, 2020, 3:20 AM

#

You could try something out with predicting weather patterns with ML

#

noaa has some good datasets on that

hollow sentinel Oct 15, 2020, 3:21 AM

#

guys i'm so confused

#

i'm trying to make a pie chart out of a dataframe

blazing rain Oct 15, 2020, 3:21 AM

#

Definitely open to that. Hoping to find something that might offer guided projects like that to help learn workflow, process, etc.

hollow sentinel Oct 15, 2020, 3:21 AM

#

https://www.kaggle.com/jeffreybraun/chipotle-locations

Chipotle Locations

A CSV file for all Chipotle Locations in the US

#

that's what i'm usiing

#

i want to make a pie chart where each state has a slice

#

my problem is getting the data into the pie chart

#

plt.title("My Awesome Pie Chart")
state_data =sample_data["state"]
state_data
plt.pie(state_data, labels = state_data)
plt.show()

#

ValueError Traceback (most recent call last)
<ipython-input-28-d963c0f29800> in <module>
2 state_data =sample_data["state"]
3 state_data
----> 4 plt.pie(state_data, labels = state_data)
5 plt.show()

~/opt/anaconda3/lib/python3.7/site-packages/matplotlib/pyplot.py in pie(x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame, rotatelabels, data)
2786 wedgeprops=wedgeprops, textprops=textprops, center=center,
2787 frame=frame, rotatelabels=rotatelabels, **({"data": data} if
-> 2788 data is not None else {}))
2789
2790

~/opt/anaconda3/lib/python3.7/site-packages/matplotlib/init.py in inner(ax, data, *args, **kwargs)
1597 def inner(ax, *args, data=None, **kwargs):
1598 if data is None:
-> 1599 return func(ax, *map(sanitize_sequence, args), **kwargs)
1600
1601 bound = new_sig.bind(ax, *args, **kwargs)

~/opt/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_axes.py in pie(self, x, explode, labels, colors, autopct, pctdistance, shadow, labeldistance, startangle, radius, counterclock, wedgeprops, textprops, center, frame, rotatelabels)
2963 # The use of float32 is "historical", but can't be changed without
2964 # regenerating the test baselines.
-> 2965 x = np.asarray(x, np.float32)
2966 if x.ndim != 1 and x.squeeze().ndim <= 1:
2967 cbook.warn_deprecated(

~/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87

#

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in array(self, dtype)
752 dtype='datetime64[ns]')
753 """
--> 754 return np.asarray(self.array, dtype)
755
756 # ----------------------------------------------------------------------

~/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87

~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/numpy_.py in array(self, dtype)
182
183 def array(self, dtype=None) -> np.ndarray:
--> 184 return np.asarray(self._ndarray, dtype=dtype)
185
186 _HANDLED_TYPES = (np.ndarray, numbers.Number)

~/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
83
84 """
---> 85 return array(a, dtype, copy=False, order=order)
86
87

ValueError: could not convert string to float: 'Alabama'

#

there's my error

#

i've read documentation and looked all over the place i can't figure it out

magic quail Oct 15, 2020, 3:24 AM

#

@heady hatch
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Expected a string or bytes dtype, got float64', 'Conversion failed for column NOTES with type float64')

heady hatch Oct 15, 2020, 3:24 AM

#

Can you show me how you're casting the dataframe?

hollow sentinel Oct 15, 2020, 3:25 AM

#

wdym by casting

#

creating?

blazing rain Oct 15, 2020, 3:25 AM

#

Looks like you need to use ['state'] as your labels but plot the values associated with the ['state'] index column

hollow sentinel Oct 15, 2020, 3:25 AM

#

what

austere swift Oct 15, 2020, 3:25 AM

#

@hollow sentinel the first part of the plt.pie should be the size of the pie slices

#

so thats why you get that error, you cant put strings into it

#

and I don't understand what you're trying to do anyways, you're putting state_data in for both arguments

magic quail Oct 15, 2020, 3:27 AM

#

@heady hatch not possible now, I'll dm you later?

heady hatch Oct 15, 2020, 3:27 AM

#

Ping me here.

hollow sentinel Oct 15, 2020, 3:27 AM

#

i want to show how many chipotles there are in each state

#

so what would you put for the size then

#

@austere swift

heady hatch Oct 15, 2020, 3:28 AM

#

You can groupby the state and count it.

hollow sentinel Oct 15, 2020, 3:28 AM

#

how do you do that

#

i'm confusion

heady hatch Oct 15, 2020, 3:29 AM

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.count.html

#

https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby

Stack Overflow

Get statistics for each group (such as count, mean, etc) using pand...

I have a data frame df and I use several columns from it to groupby:

df['col1','col2','col3','col4'].groupby(['col1','col2']).mean()
In the above way I almost get the table (data frame) that I ne...

austere swift Oct 15, 2020, 3:29 AM

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

hollow sentinel Oct 15, 2020, 3:30 AM

#

idk how to use that

#

sample_data = pd.read_csv("chipotle_stores.csv")

#

sorry if i'm being annoying i genuinely don't know

#

i'm new to data science

#

this is just for a small project i wanted to do

magic quail Oct 15, 2020, 3:33 AM

#

@heady hatch ok

#

Will you be available after 4-5 hours? @heady hatch

heady hatch Oct 15, 2020, 3:34 AM

#

Nope, will be asleep in 2 hours.

#

@hollow sentinel So after you read the data via read_csv, try sample_data.head(). That'll give you an idea of what the columns look like.

#

How new are you to data science, @hollow sentinel?

hollow sentinel Oct 15, 2020, 3:34 AM

#

uh just started yesterday @heady hatch

magic quail Oct 15, 2020, 3:34 AM

#

Ok my df is also read from csv

hollow sentinel Oct 15, 2020, 3:35 AM

#

i've only used plot.ly before this w a frontend/backend

heady hatch Oct 15, 2020, 3:35 AM

#

Are you familiar with statistics at all?

hollow sentinel Oct 15, 2020, 3:35 AM

#

yeah i'm taking a class in it

heady hatch Oct 15, 2020, 3:36 AM

#

So I'm going to walk you through your goal, which is to make a pie chart.

#

Okay so you say you want to make a pie chart. What do you want the pie chart to show?

hollow sentinel Oct 15, 2020, 3:36 AM

#

how much chipotle restaurants there are in each state

heady hatch Oct 15, 2020, 3:37 AM

#

Okay so you want to show how many restaurants are there in each state.

Now what kind of statistics will you need for this visualization?

hollow sentinel Oct 15, 2020, 3:37 AM

#

there's a state column in the csv

#

i was thinking like california is there ____ times and I wanted to show california's slice in the pie chart

heady hatch Oct 15, 2020, 3:38 AM

#

Okay right right.

#

To clarify, what kind of numerical data will you need?

hollow sentinel Oct 15, 2020, 3:38 AM

#

how many times each state is there

magic quail Oct 15, 2020, 3:38 AM

#

matplotlib

heady hatch Oct 15, 2020, 3:38 AM

#

You have a column with a bunch of states in them, right? What can you gather from it?

#

like you said, how many times each state is in there.

hollow sentinel Oct 15, 2020, 3:39 AM

#

that's how many chipotle resturants each state has

heady hatch Oct 15, 2020, 3:39 AM

#

So this is called a count, might have some other names. However for the purpose of this conversation, we'll call it count.

#

So we want to get a count of how many states there are in some data, right?

hollow sentinel Oct 15, 2020, 3:39 AM

#

yes

heady hatch Oct 15, 2020, 3:40 AM

#

Okay so

#

You have couple ways of doing this.

#

We'll start simple.

#

let's say your dataframe is called data.

data = pd.read_csv(...)

and you have a column in there called states.

You can access it via data['states'].

hollow sentinel Oct 15, 2020, 3:41 AM

#

yeah that's what i did w state_data =sample_data["state"]

heady hatch Oct 15, 2020, 3:41 AM

#

Try that in the terminal and tell me what it shows you?

#

Take a look at the actual column, not just assigning it.

hollow sentinel Oct 15, 2020, 3:41 AM

#

it shows each state

heady hatch Oct 15, 2020, 3:42 AM

#

Okay.

hollow sentinel Oct 15, 2020, 3:42 AM

#

alabama to wyoming

#

and it skips a bunch of rows

heady hatch Oct 15, 2020, 3:42 AM

#

Cool cool cool.

#

So I think dataframe has a function called value_counts.

So you can try something like

states_data.value_counts().

#

Let me know what that shows you.

hollow sentinel Oct 15, 2020, 3:43 AM

#

it shows california 421, texas 226

#

so that's how many times california and texas shows up

#

and all the other states

heady hatch Oct 15, 2020, 3:43 AM

#

Just two of them?

#

Ahh okay.

#

So now you have a dataframe

hollow sentinel Oct 15, 2020, 3:44 AM

#

more than just two states

heady hatch Oct 15, 2020, 3:44 AM

#

with states as the index, and the count of each state as the values.

hollow sentinel Oct 15, 2020, 3:44 AM

#

all the way to wyoming as 1

heady hatch Oct 15, 2020, 3:45 AM

#

Now you can access these values as such

states_idx = states_data.value_counts().index
states_val = states_data.value_counts().values

#

Now take a look at states_idx and states_val.

#

Tell me what you see.

hollow sentinel Oct 15, 2020, 3:46 AM

#

for states_idx i see nothing

#

just a blank graph

#

for states_val i have AttributeError: 'str' object has no attribute 'value_counts'

heady hatch Oct 15, 2020, 3:47 AM

#

uh

#

What does states_data.value_counts() evaluate to again?

#

btw you're going to need to adapt this to your code

#

Because I'm just using generic variable names.

hollow sentinel Oct 15, 2020, 3:47 AM

#

am i allowed to send screenshots here

heady hatch Oct 15, 2020, 3:48 AM

#

paste the code instead in triple backticks.

hollow sentinel Oct 15, 2020, 3:48 AM

#

California        421
Texas             226
Ohio              193
Florida           177
New York          160
Illinois          144
Virginia          107
Pennsylvania       96
Maryland           94
Arizona            85
Colorado           79
Minnesota          71
New Jersey         69
North Carolina     65
Massachusetts      62
Georgia            61
Washington         43
Indiana            40
Missouri           39
Michigan           39
Oregon             32
Kansas             30
Nevada             29
Tennessee          26
Connecticut        24
Kentucky           21
Washington DC      21
South Carolina     21
Wisconsin          20
Alabama            15

#

it goes on

heady hatch Oct 15, 2020, 3:48 AM

#

Okay cool.

#

and now print out the value_counts().index?

hollow sentinel Oct 15, 2020, 3:49 AM

#

Index(['California', 'Texas', 'Ohio', 'Florida', 'New York', 'Illinois',
       'Virginia', 'Pennsylvania', 'Maryland', 'Arizona', 'Colorado',
       'Minnesota', 'New Jersey', 'North Carolina', 'Massachusetts', 'Georgia',
       'Washington', 'Indiana', 'Missouri', 'Michigan', 'Oregon', 'Kansas',
       'Nevada', 'Tennessee', 'Connecticut', 'Kentucky', 'Washington DC',
       'South Carolina', 'Wisconsin', 'Alabama', 'Oklahoma', 'Utah',
       'Louisiana', 'Nebraska', 'Iowa', 'Delaware', 'Rhode Island',
       'New Mexico', 'New Hampshire', 'Arkansas', 'West Virginia', 'Maine',
       'Idaho', 'Montana', 'Vermont', 'Mississippi', 'North Dakota',
       'Wyoming'],
      dtype='object')

heady hatch Oct 15, 2020, 3:49 AM

#

Hm I'm not too sure what you mean by just a blank graph.

#

Because you're showing me the index right now.

hollow sentinel Oct 15, 2020, 3:49 AM

#

oh yeah forget about that i fixed it

heady hatch Oct 15, 2020, 3:49 AM

#

Okay so what about value_counts().values?

hollow sentinel Oct 15, 2020, 3:50 AM

#

'str' object has no attribute 'value_counts

heady hatch Oct 15, 2020, 3:50 AM

#

Uh

#

Can you show me your exact code?

hollow sentinel Oct 15, 2020, 3:50 AM

#

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-20d3e9a2843a> in <module>
      3 states_idx = state_data.value_counts().index
      4 states_idx
----> 5 states_val = state.value_counts().values
      6 states_val
      7 

AttributeError: 'str' object has no attribute 'value_counts'

heady hatch Oct 15, 2020, 3:50 AM

#

Is it supposed to be state.value_counts()?

hollow sentinel Oct 15, 2020, 3:50 AM

#

state_data =sample_data["state"]
state_data.value_counts()
states_idx = state_data.value_counts().index
states_idx
states_val = state.value_counts().values
states_val

#plt.pie(state_data, labels = state_data)
#plt.show()

heady hatch Oct 15, 2020, 3:50 AM

#

Because the line above is state_data.

#

Shouldn't it be state_data.value_counts()?

hollow sentinel Oct 15, 2020, 3:51 AM

#

isn't that what i copy pasted

#

does it say states somewhere it's not supposed to

#

i have been staring at the word state for too long

heady hatch Oct 15, 2020, 3:52 AM

#

📎 unknown.png

#

📎 unknown.png

#

I think on line 5.

hollow sentinel Oct 15, 2020, 3:53 AM

#

so it should be state_val

heady hatch Oct 15, 2020, 3:53 AM

#

It should be states_data.value_counts().

#

instead of states.value_counts().

#

so

#

on your assignment.

hollow sentinel Oct 15, 2020, 3:54 AM

#

i'm so confused

heady hatch Oct 15, 2020, 3:54 AM

#

I am confused of what you're confused about. hahaha

#

So

#

Take a look at line 5.

hollow sentinel Oct 15, 2020, 3:54 AM

#

i have no line numbers on anaconda

heady hatch Oct 15, 2020, 3:54 AM

#

Look to the right of the = sign.

#

Okay

#

Look to the right of your = sign.

#

Look at what you're assigning.

#

Go through it, character by character.

hollow sentinel Oct 15, 2020, 3:55 AM

#

OHHHHHHH

heady hatch Oct 15, 2020, 3:55 AM

#

now you have a list of your states and the values of the count of the states.

hollow sentinel Oct 15, 2020, 3:55 AM

#

state_data =sample_data["state"]
states_data.value_counts()
states_idx = states_data.value_counts().index
states_idx
states_val = state.value_counts().values
states_val

#plt.pie(state_data, labels = state_data)
#plt.show()

heady hatch Oct 15, 2020, 3:55 AM

#

Now you can plot it via plt.pie.

hollow sentinel Oct 15, 2020, 3:56 AM

#

did i fix it this time

heady hatch Oct 15, 2020, 3:56 AM

#

You still haven't fixed it.

hollow sentinel Oct 15, 2020, 3:56 AM

#

oh my god

#

i'm blind

#

is it the second line?

heady hatch Oct 15, 2020, 3:56 AM

#

📎 unknown.png

#

Tell me if these two are the same.

hollow sentinel Oct 15, 2020, 3:56 AM

#

no

heady hatch Oct 15, 2020, 3:56 AM

#

Why not?

hollow sentinel Oct 15, 2020, 3:56 AM

#

one has data.value

#

and one has value_counts()

heady hatch Oct 15, 2020, 3:57 AM

#

They both have value_counts.

#

📎 unknown.png

hollow sentinel Oct 15, 2020, 3:57 AM

#

eys

#

yes

heady hatch Oct 15, 2020, 3:57 AM

#

Okay so you have these two lines.

states_idx = states_data.value_counts().index

states_val = state.value_counts().values

#

Right?

hollow sentinel Oct 15, 2020, 3:57 AM

#

yes

heady hatch Oct 15, 2020, 3:57 AM

#

Now for the first one

#

it's

= states_data.value_counts() right?

#

And for the second one, it's = state.value_counts().

#

= states_data.value_counts()
= state.value_counts()

#

Look at where they're different.

hollow sentinel Oct 15, 2020, 3:59 AM

#

one has _data

heady hatch Oct 15, 2020, 3:59 AM

#

right.

#

So what's state.value_counts()?

#

Why are you using state for one and state_data for the other?

hollow sentinel Oct 15, 2020, 4:00 AM

#

uhhh

#

why am i getting confused by this

heady hatch Oct 15, 2020, 4:00 AM

#

I don't know, I don't even know why you wrote state instead of state_data.

#

I think if you can answer that, you might be able to answer your confusion.

hollow sentinel Oct 15, 2020, 4:01 AM

#

where did i write state intead of state_data

#

i must be blind

heady hatch Oct 15, 2020, 4:01 AM

#

Maybe you're getting tired.

#

Work on it tomorrow.

hollow sentinel Oct 15, 2020, 4:01 AM

#

noooooo

heady hatch Oct 15, 2020, 4:02 AM

#

I'm going to get ready for bed.

#

Good night.

hollow sentinel Oct 15, 2020, 4:02 AM

#

ok

velvet thorn Oct 15, 2020, 4:02 AM

#

states_idx = states_data.value_counts().index
states_idx
states_val = state.value_counts().values

^ there

hollow sentinel Oct 15, 2020, 4:02 AM

#

no that doesn't work either

#

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-42-625cfd47ba59> in <module>
      1 state_data =sample_data["state"]
----> 2 states_data.value_counts()
      3 states_idx = states_data.value_counts().index
      4 states_idx
      5 states_val = state.value_counts().values

NameError: name 'states_data' is not defined

heady hatch Oct 15, 2020, 4:03 AM

#

You might need to be more familiar with Python. Maybe look over basics of Python, you're making mistakes on the syntax.

hollow sentinel Oct 15, 2020, 4:03 AM

#

idek what i'm messing up on

#

ugh i just wanted a pie chart

austere swift Oct 15, 2020, 4:04 AM

#

its state_data not states_data

hollow sentinel Oct 15, 2020, 4:05 AM

#

📎 unknown.png

#

i thought that would fix it

austere swift Oct 15, 2020, 4:07 AM

#

you still spelled it wrong again

hollow sentinel Oct 15, 2020, 4:08 AM

#

what line

austere swift Oct 15, 2020, 4:08 AM

#

the line its pointing to

hollow sentinel Oct 15, 2020, 4:08 AM

#

state_data =sample_data["state"]
state_data.value_counts()
states_idx = state_data.value_counts().index
states_idx
state_val = state.value_counts().values

#plt.pie(state_data, labels = state_data)
#plt.show()

#

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-45-4b34edeae852> in <module>
      3 states_idx = state_data.value_counts().index
      4 states_idx
----> 5 state_val = state.value_counts().values
      6 
      7 #plt.pie(state_data, labels = state_data)

AttributeError: 'str' object has no attribute 'value_counts'

velvet thorn Oct 15, 2020, 4:09 AM

#

like two people have said...why do you have a state and a state_data?

austere swift Oct 15, 2020, 4:09 AM

#

^

velvet thorn Oct 15, 2020, 4:09 AM

#

no that doesn't work either
@hollow sentinel and I'm not saying that'll work; I'm pointing out the inconsistency in your code

hollow sentinel Oct 15, 2020, 4:10 AM

#

so it should be ```python
state = sammple_data["state]

#

sample

#

idk i think i should go to bed i'm not thinking straight

velvet thorn Oct 15, 2020, 4:11 AM

#

like at the very least, you should be able to tell that your variable names are all over the place

#

which leads to confusion.

hollow sentinel Oct 15, 2020, 4:11 AM

#

i have been staring at words on a screen for too long

#

i'm sorry

velvet thorn Oct 15, 2020, 4:12 AM

#

how about you go take a break

#

and look at it again in the morning or evening or whatever is a few hours away in your timezone

#

think that might be more productive

hollow sentinel Oct 15, 2020, 4:12 AM

#

yeah i agree

#

this project doesn't have to be finished today

velvet thorn Oct 15, 2020, 4:13 AM

#

yup

#

haste makes waste

tawdry bobcat Oct 15, 2020, 5:26 AM

#

How can I find out the dataset format for DialoGPT? (https://github.com/microsoft/DialoGPT)

mild topaz Oct 15, 2020, 6:04 AM

#

Your training looks definitely wrong. For epoch 12 I don't expect val_acc to be 100% along with accuracy. Are you using dropout layers and are you sure you are not overfitting (by a very large margin it seems)
@grave frost my code here https://paste.pythondiscord.com/hajamafano.coffeescript

#

also i have very less amount of data to train a model for e.g say i have 3 classes, each class have approx only 7-10 images@grave frost plz ping me when u back.

mild topaz Oct 15, 2020, 6:40 AM

#

my training process here https://paste.pythondiscord.com/pisamuvuci.diff plz have look @grave frost

hasty grail Oct 15, 2020, 7:20 AM

#

unless you're doing few-shot learning I don't think your model would perform very well

#

that's way too little data

mild topaz Oct 15, 2020, 7:23 AM

#

@hasty grail hello

#

i can understand , but actually i have very very less data

hasty grail Oct 15, 2020, 7:23 AM

#

have you tried data augmentation?

#

you should

mild topaz Oct 15, 2020, 7:27 AM

#

yess i am using python dataGen = ImageDataGenerator(width_shift_range = 0.1, height_shift_range = 0.1, zoom_range = 0.2, shear_range = 0.1, rotation_range = 10) see this @hasty grail

hasty grail Oct 15, 2020, 7:27 AM

#

how many samples do you have per epoch? (not batches)

mild topaz Oct 15, 2020, 7:28 AM

#

steps_per_epoch_val = 10 @hasty grail is this u are talking?

hasty grail Oct 15, 2020, 7:28 AM

#

no

#

samples

#

which is batch_size * num_batches

mild topaz Oct 15, 2020, 7:30 AM

#

see in training -12 samples, val -3samples, test -4samples @hasty grail

hasty grail Oct 15, 2020, 7:33 AM

#

and how many classes are there?

mild topaz Oct 15, 2020, 7:33 AM

#

3 classes only @hasty grail

hasty grail Oct 15, 2020, 7:35 AM

#

you should increase the amount of data you have via data augmentation

mild topaz Oct 15, 2020, 7:35 AM

#

i am using data augmentation this in my code @hasty grail

hasty grail Oct 15, 2020, 7:36 AM

#

but you aren't increasing the amount of samples you have

mild topaz Oct 15, 2020, 7:38 AM

#

means incresing the images (image data) u are saying here ? @hasty grail

hasty grail Oct 15, 2020, 7:38 AM

#

yes

#

a better alternative would be for you to gather more data thouhg

mild topaz Oct 15, 2020, 7:39 AM

#

u mean to collect more image data @hasty grail

hasty grail Oct 15, 2020, 7:42 AM

#

yes

mild topaz Oct 15, 2020, 7:51 AM

#

thanks 🙂 @hasty grail

obtuse skiff Oct 15, 2020, 8:18 AM

#

how hard would it be to implement a convolutional nueral network from scratch?

#

Only using numpy

eager heath Oct 15, 2020, 8:19 AM

#

Not that hard

#

But still quite maths heavy

hasty grail Oct 15, 2020, 8:27 AM

#

numpy by itself doesn't use GPU though so it'll be rather slow

mild topaz Oct 15, 2020, 9:12 AM

#

@hasty grail hii, val_loss should be less that val_accuracy is this the method in training a NN model ? is this correct?

hasty grail Oct 15, 2020, 9:16 AM

#

you're not describing a method, seems that you're describing a goal

mild topaz Oct 15, 2020, 9:17 AM

#

i am using a CNN @hasty grail

hasty grail Oct 15, 2020, 9:17 AM

#

it's possible that val_loss > val_accuracy depending on how they are computed, it is not absolutely necessary that the loss is smaller than accuracy

mild topaz Oct 15, 2020, 9:17 AM

#

also, how to decide how much epoch should be keep to train a CNN model ?

hasty grail Oct 15, 2020, 9:17 AM

#

You should search on Google for that

#

(Try figuring stuff out for yourself)

mild topaz Oct 15, 2020, 9:25 AM

#

i have trained a model now, what u can say on this? @hasty grail python Epoch 47/50 10/10 [==============================] - 12s 1s/step - loss: 0.1458 - accuracy: 0.8947 - val_loss: 0.0019 - val_accuracy: 1.0000 Epoch 48/50 10/10 [==============================] - 12s 1s/step - loss: 0.1911 - accuracy: 0.9286 - val_loss: 0.0241 - val_accuracy: 1.0000 Epoch 49/50 10/10 [==============================] - 12s 1s/step - loss: 0.1833 - accuracy: 0.9286 - val_loss: 0.0051 - val_accuracy: 1.0000 Epoch 50/50 10/10 [==============================] - 18s 2s/step - loss: 0.1468 - accuracy: 0.9737 - val_loss: 0.0093 - val_accuracy: 1.0000 test score: 0.3029577136039734 test accuracy: 0.800000011920929

grave frost Oct 15, 2020, 9:27 AM

#

@obtuse skiff I agree with DarkLight, if you have an Nvidia GPU (even those cheap laptop ones) better use Cupy for your implementation.

#

@mild topaz Model seems to be overfitting. But from the test accuracy, I don't think more can be expected from it, seeing the amount of data

obtuse skiff Oct 15, 2020, 9:33 AM

#

Thanks for the recommendation

#

Will do

#

How would you parallel the layers?

#

Nodes in the layers

hasty grail Oct 15, 2020, 9:34 AM

#

that really really depends on the model architecture, I am not aware of model parallelism being implemented even for the popular ML frameworks

odd yoke Oct 15, 2020, 9:35 AM

#

assuming you talk about parallelizing the computations within the same layer, numpy will do it for you if you use it correctly (basically, without using loops)

mild topaz Oct 15, 2020, 9:35 AM

#

to prevent overfitting i need more image data, Correct ? @grave frost

odd yoke Oct 15, 2020, 9:36 AM

#

that's one way, yes, there are many techniques you can use to limit overfitting without needing to add more data

#

simplest one is regularization

hasty grail Oct 15, 2020, 9:36 AM

#

given that they only have 10 batches I think they do need to add more data xD

odd yoke Oct 15, 2020, 9:36 AM

#

yeah probably then

obtuse skiff Oct 15, 2020, 9:37 AM

#

Kk

#

Ty

grave frost Oct 15, 2020, 9:38 AM

#

@green basin Hyperparameter optimization isn't exactly something to be done on a GUI (at least in my experience I have not seen such a lib). You need to understand that optimization takes a lot of time, 7-ish days is kinda expected if you want maximum performance. Since most hyperparameters will be numerical (save a few others like loss and optimizers) most efficient way is to research the optimizers/loss, test them out manually (since I assume you do not have enough resources to train model for a few days continously).
Seeing as to the research part and the fact that parameters are numerical, GUI will not help you or your lab-mates very much. There are plenty of methods online for h-optimization (I personally like talos since it is extremely simple and bare-bones).

"Messing" hyperparameters can be done manually but it is too inefficient and draining task so better to be automated by some library for that. You could make an app that consists of a start button to start the training but apart from that ML is a bit technical stuff - even if your labmates don't like coding, they can learn a bit o' maths and research for you the best loss and optimizer for the job.

So my recommendation is to just use a lib (don't worry it's pretty easy) and automate the boring stuff. You may tune some basic parameters like Learning rate, optimizer hyperparametrs etc. (like 4-5 parameters for a start) so you training would be done in about a day with a decent GPU. 🙂

#

simplest one is regularization
@odd yoke I don't think they are allowed to modify the program in any way, only the data....

odd yoke Oct 15, 2020, 9:40 AM

#

oh, that'll teach me not to read

mild topaz Oct 15, 2020, 9:46 AM

#

given that they only have 10 batches I think they do need to add more data xD
@hasty grail correct

grave frost Oct 15, 2020, 9:48 AM

#

So why didn't you do Image augmentation? Not using Keras built-ins but with some other dedicated lib like imguag???

hasty grail Oct 15, 2020, 9:49 AM

#

they are already doing it apparently

#

but I guess they didn't actually increase the size of the dataset

#

regardless, having more real data is better than having more artificial ones

#

especially for a dataset this small

#

Finally - I spent hours just to get this to display nicely as a (roughly) 1:1 image regardless of the spatial dimensions and value range of the data 😫 (Don't worry the data here is just a placeholder xD)

📎 unknown.png

grave frost Oct 15, 2020, 9:55 AM

#

@hasty grail augmentation can give upto 10x the data but I don't see that much substantial data with the OP, even assuming he started with only a couple of photos. Since there are many filters, he could easily boost his accuracy to be atleast a little bit better that the current 80%...

mild topaz Oct 15, 2020, 10:03 AM

#

Epoch 48/50
10/10 [==============================] - 17s 2s/step - loss: 0.0196 - accuracy: 1.0000 - val_loss: 3.3961e-04 - val_accuracy: 1.0000
Epoch 49/50
10/10 [==============================] - 17s 2s/step - loss: 0.0123 - accuracy: 1.0000 - val_loss: 2.0785e-04 - val_accuracy: 1.0000
Epoch 50/50
10/10 [==============================] - 19s 2s/step - loss: 0.0082 - accuracy: 1.0000 - val_loss: 1.0205e-04 - val_accuracy: 1.0000
test score:  4.928431034088135
test accuracy:  0.0```   when again i train a model @grave frost

#

i get the above results

grave frost Oct 15, 2020, 10:03 AM

#

uh-oh that looks pretty bad

mild topaz Oct 15, 2020, 10:04 AM

#

becoz of less image data ? @grave frost

grave frost Oct 15, 2020, 10:04 AM

#

I am not very sure but apparently the model has overfitted to such an extent that it has lost all its ability to generalize. Though, you can confirm this with Darklight or lgneous just to be on the safe side...

#

Yes, lack of data is one factor in the overfitting

hasty grail Oct 15, 2020, 10:19 AM

#

Test accuracy 0 oof

mild topaz Oct 15, 2020, 10:21 AM

#

yes @hasty grail

cedar sky Oct 15, 2020, 10:40 AM

#

Epoch 48/50
10/10 [==============================] - 17s 2s/step - loss: 0.0196 - accuracy: 1.0000 - val_loss: 3.3961e-04 - val_accuracy: 1.0000
Epoch 49/50
10/10 [==============================] - 17s 2s/step - loss: 0.0123 - accuracy: 1.0000 - val_loss: 2.0785e-04 - val_accuracy: 1.0000
Epoch 50/50
10/10 [==============================] - 19s 2s/step - loss: 0.0082 - accuracy: 1.0000 - val_loss: 1.0205e-04 - val_accuracy: 1.0000
test score:  4.928431034088135
test accuracy:  0.0```   when again i train a model @grave frost

@mild topaz What is the size of your vaalidation set

#

Evenif it hadoverfit badly it would be able to have an test acc of more than 0

#

It is an issue with the fundamentals of the code

grave frost Oct 15, 2020, 10:48 AM

#

Evenif it hadoverfit badly it would be able to have an test acc of more than 0
@cedar sky Not necessarliy - the OP had like 5 testing samples. It could be possible that due to an unlucky conjunction he got test accuracy of 0.

cedar sky Oct 15, 2020, 10:49 AM

#

@cedar sky Not necessarliy - the OP had like 5 testing samples. It could be possible that due to an unlucky conjunction he got test accuracy of 0.
@grave frost The chance of that happening is really low

#

for a normal task the initialized weights would have abt a 30% accuracy

grave frost Oct 15, 2020, 10:54 AM

#

That is a very wrong assumption there. Initialized weights are often set at 0 or 1 which anyone can change; It does not gurantee that you would get 30% accuracy without training. If you do not train a model then the expected accuracy would be the empirical probablity calculated by the data, not on the weights themselves as the model would then be simply guessing at random.

#

Though in practice untrained models sometimes perform even worse than the theoretical outcome due to it's stochastic nature

hasty grail Oct 15, 2020, 11:08 AM

#

They mentioned that there are 3 classes, so naively, the chance of it getting all 5 wrong would be (2 / 3) ** 5 = 13.17%, which honestly isn't all that low

azure marsh Oct 15, 2020, 11:16 AM

#

📎 unknown.png

#

how can I get avg weekly sale for every product?

hasty grail Oct 15, 2020, 11:34 AM

#

Try searching "pandas get average of column" on Google

#

Being able to find out the solution yourself will save you the pain of having to wait for someone to respond xD

azure marsh Oct 15, 2020, 11:36 AM

#

I am searching

#

but I just cannot find a way to get weekly average

hasty grail Oct 15, 2020, 11:37 AM

#

If you want the weekly average I think you can use groupby

azure marsh Oct 15, 2020, 11:38 AM

#

I think I will try to add another column

#

that displays week instead of the date

hasty grail Oct 15, 2020, 11:41 AM

#

take a look at this

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html#pandas.Series.resample

azure marsh Oct 15, 2020, 11:41 AM

#

Thanks a lot

hasty grail Oct 15, 2020, 11:41 AM

#

np

grave frost Oct 15, 2020, 1:03 PM

#

They mentioned that there are 3 classes, so naively, the chance of it getting all 5 wrong would be (2 / 3) ** 5 = 13.17%, which honestly isn't all that low
yeah, but he was getting good test accuracy before so I assume his code must be working. Also, he was getting 100% val accuracy which isn't very common. Why would a model get 100% val acc but 0 test acc? Probably because it was way overfitted or either some part of code is wrong. But it was working before I don't see why the code would suddenly stop working, unless he forgot to shuffle the data...

mild topaz Oct 15, 2020, 1:49 PM

#

https://paste.pythondiscord.com/vopasohisi.coffeescript my code here plz check @grave frost

hollow sentinel Oct 15, 2020, 2:04 PM

#

import pandas as pd
from matplotlib import pyplot as plt 
sample_data = pd.read_csv("chipotle_stores.csv")

#

sample_data = pd.read_csv("chipotle_stores.csv")
sample_data = pd.read_csv("chipotle_stores.csv")

NameError Traceback (most recent call last)
<ipython-input-7-e3615d995c74> in <module>
----> 1 sample_data = pd.read_csv("chipotle_stores.csv")

NameError: name 'pd' is not defined

#

uhhhhhhh

#

this was working fine last night

#

i checked and there's nothing wrong with the folder

#

i don't understand

#

i'm confusion

mild topaz Oct 15, 2020, 2:12 PM

#

try pip install pandas @hollow sentinel

hollow sentinel Oct 15, 2020, 2:13 PM

#

in the terminal right

#

that doesn't do anythnig

#

it says pandas is already in anaconda

#

📎 unknown.png

#

ughhhhh this is such bullshit i just wanted a pie chart mann

#

i swear to god it was working last night

#

i don't wanna use repl.it repl.it is mad sus

#

HEY IT WORKS AGAIN

#

it's like my mac just forgot i have pandas

#

📎 unknown.png

#

@heady hatch I did it chief

#

um i think i have to make a legend

heady hatch Oct 15, 2020, 3:20 PM

#

Congratulations.

hollow sentinel Oct 15, 2020, 3:22 PM

#

it looks ugly

#

i think a bar graph would be better

#

x_pos = [i for i, _ in enumerate(x)]

#

what does that mean

#


x_pos = [i for i, _ in enumerate(states_idx)]
plt.bar = (x_pos, stateValues, color = "blue")
plt.xlabel("States")
plt.ylabel("Restaurants")
plt.title("States with the Largest Amount of Chipotle Restaurants")
plt.xticks(x_pos, states_idx )
plt.show()

#

what's wrong with the syntax with color = "blue"

pale thunder Oct 15, 2020, 3:32 PM

#

you cannot have assignment in tuples

hollow sentinel Oct 15, 2020, 3:32 PM

#

so just take it out

#

it's weird bc i'm looking at this

#

📎 unknown.png

pale thunder Oct 15, 2020, 3:33 PM

#

bar = (x_pos
bar(x_pos

hollow sentinel Oct 15, 2020, 3:34 PM

#

one has an equal sign and one doesn't

#

got it thank you

#

created the bar chart

#

it looks mad ugly too

#

📎 unknown.png

#

who knew data viz was so sus

austere swift Oct 15, 2020, 3:48 PM

#

honestly i think a really good way to visualize this would be a choropleth

#

you can do those in plotly pretty easily

#

its like a map that has a color bar from like 2 different colors and the more towards one color the higher the value

#

so you could make a map of all the states and have it so that the one that has the most chipotles would be one color and then a gradient down to the one with the least chipotles

hollow sentinel Oct 15, 2020, 3:51 PM

#

yeah but plot.ly sus

#

it requires a frontend and a backend

austere swift Oct 15, 2020, 3:51 PM

#

you dont need a frontend and a backend lol

hollow sentinel Oct 15, 2020, 3:51 PM

#

no?

#

oh

#

WHAT

#

my freshman year of coding was a lie

austere swift Oct 15, 2020, 3:51 PM

#

i've done choropleths with plotly many times

#

its really simple

#

https://plotly.com/python/choropleth-maps/

Choropleth Maps

How to make choropleth maps in Python with Plotly.

hollow sentinel Oct 15, 2020, 3:53 PM

#

that is smart

austere swift Oct 15, 2020, 3:55 PM

#

basically something like this

import plotly.express as px
import pandas as pd
import json
from urllib.request import urlopen

print("Loading Data")
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    geojson = json.load(response)

sample_data = pd.read_csv("chipotle_stores.csv")

fig = px.choropleth(sample_data, geojson=geojson, locations="state")

that probably wont work but something like that

hollow sentinel Oct 15, 2020, 3:56 PM

#

📎 unknown.png

austere swift Oct 15, 2020, 3:56 PM

#

yeah actually what i sent wont work cus the geojson is for counties lol

hollow sentinel Oct 15, 2020, 3:57 PM

#

should I put it on github as is

#

it's a work in progress

austere swift Oct 15, 2020, 3:57 PM

#

yeah you could

hollow sentinel Oct 15, 2020, 3:57 PM

#

ok good i can say i visualized data w two charts and i'm working to add another

#

ty @austere swift

#

can i ping you if i have questions about the chloropath

austere swift Oct 15, 2020, 3:58 PM

#

sure

hollow sentinel Oct 15, 2020, 3:58 PM

#

thanks you and the others have been really helpful

#

much appreciated haha

hollow sentinel Oct 15, 2020, 4:14 PM

#

alright great it's on github

heady hatch Oct 15, 2020, 4:18 PM

#

Hey guys.

Right now I'm playing around with the dataset and sequence API on both CPU and GPU to see the speed difference.

I noticed CPU is running much quicker than GPU, which I'm thinking I'm using GPU wrong.

Any advice here?

CPU is running at an hour per epoch while GPU is running at 2 hours per epoch.

austere swift Oct 15, 2020, 4:19 PM

#

code?

heady hatch Oct 15, 2020, 4:19 PM

#

Which part would you like?

#

So this is the model fitting.

callbacks = []
callbacks.append(EarlyStopping(monitor="val_loss", min_delta=0.01, patience=2, verbose=1, mode="auto"))
callbacks.append(ModelCheckpoint(
    filepath='weights.{epoch:02d}-{val_loss:.2f}.hdf5',
    save_weights_only=True,
    monitor='val_loss',
    mode='auto',
    save_best_only=True))

# with tf.device('/GPU:0'):

net.compile(loss='sparse_categorical_crossentropy', 
            optimizer='adam', 
            metrics=['accuracy'])

history = net.fit(train,
                  epochs=5,
                  verbose=1,
                  callbacks=callbacks,
                  validation_data=test)

#

This is the model.

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 140)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 140, 50)           1000000   
_________________________________________________________________
bidirectional (Bidirectional (None, 140, 100)          40400     
_________________________________________________________________
global_max_pooling1d (Global (None, 100)               0         
_________________________________________________________________
dense (Dense)                (None, 50)                5050      
_________________________________________________________________
dropout (Dropout)            (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 408       
=================================================================
Total params: 1,045,858
Trainable params: 1,045,858
Non-trainable params: 0
_________________________________________________________________

#

This is the prepared data.

processed_data_path = Path('/kaggle/input/russian-troll-problem-compilation/processed_dataset.h5')
with h5py.File(processed_data_path, 'r') as f:
    input_shape = f['x_train'][:].shape
    num_classes = len(np.unique(f['y_train'][:]))
    
    train = TextSequence(f['x_train'][:], f['y_train'][:], batch_size=256)
    test = TextSequence(f['x_test'][:], f['y_test'][:], batch_size=256)
#     train = Dataset.from_tensor_slices((f['x_train'][:], f['y_train'][:])).batch(256)
#     test = Dataset.from_tensor_slices((f['x_test'][:], f['y_test'][:])).batch(256)
    embed_matrix = f['embedding'][:]

#

x_train is shape (some large number, 140) and y_train is (some large number,).

#

Let me know if you need some other code.

#

This is being run on Kaggle.

austere swift Oct 15, 2020, 4:23 PM

#

have you verified that its actually running on gpu?

#

like checked gpu mem utilization and stuff

#

cus it could be saying that but then had some cuda error and run on cpu instead

#

but that still wouldnt explain the time difference...

heady hatch Oct 15, 2020, 4:24 PM

#

I thought it was running it on GPU since the monitor had GPU usage.

#

But let me actual double check.

#

Any other way of checking if GPU is being used?

austere swift Oct 15, 2020, 4:24 PM

#

tf.config.list_physical_devices('GPU')

heady hatch Oct 15, 2020, 4:27 PM

#

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)
    
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

#Name: /physical_device:GPU:0   Type: GPU
#Num GPUs Available:  1

austere swift Oct 15, 2020, 4:27 PM

#

ok so it sees the gpu then

heady hatch Oct 15, 2020, 4:27 PM

#

It's okay to run it on GPU with this syntax, right?

with tf.device('/GPU:0'):

    net.compile(loss='sparse_categorical_crossentropy', 
                optimizer='adam', 
                metrics=['accuracy'])

    history = net.fit(train,
                      epochs=5,
                      verbose=1,
                      callbacks=callbacks,
                      validation_data=test)

austere swift Oct 15, 2020, 4:27 PM

#

yes

boreal summit Oct 15, 2020, 4:32 PM

#

Guys, I have a little issue here. I'm trying to use make_column_transformer from sklesen.compose to transform a column. I also imported simpleImputer to handle nan values. The code gives me an error saying, input contains Nan

#

Don't know what's wrong.

heady hatch Oct 15, 2020, 4:32 PM

#

What does your pipeline look like?

#

My guess is it's hitting nans before the simpleimputer.

boreal summit Oct 15, 2020, 4:33 PM

#

📎 20201015_173340.jpg

austere swift Oct 15, 2020, 4:34 PM

#

can you copy paste your code?

boreal summit Oct 15, 2020, 4:34 PM

#

Ooh, okay. Let me try something else. @heady hatch

#

@austere swift I'm typing from my phone.

hollow sentinel Oct 15, 2020, 4:52 PM

#

fig = go.Figure(data=go.Chloropeth(
locations = states_idx
z = stateValues.astype(float)
locationmode = 'USA-states
colorscale = 'Reds',
colorbar_title = "Restaurants per State",
))

fig.update_layout(
title_text = "States with the Largest Amount of Chipotle Restaurants",
geoscope = "usa")

fig.show()

#

File "<ipython-input-2-5694b0fa71b6>", line 29
stateValues.astype(float)
^
SyntaxError: invalid syntax

boreal summit Oct 15, 2020, 4:52 PM

#

Forgive me, I'm a student and still learning.

hollow sentinel Oct 15, 2020, 4:53 PM

#

📎 unknown.png

#

i'm going off that

#

idk what's wrong

#

any help is much appreciated

#

idk what's off i'm trying to follow the doc to a T

misty lake Oct 15, 2020, 4:55 PM

#

Can anyone enlighten me about architecting a machine learning application as micro services or show me an example?

hollow sentinel Oct 15, 2020, 4:55 PM

#

does anyone know what's going on i'm very confused

grave frost Oct 15, 2020, 5:03 PM

#

@misty lake You mean deploying a trained ML model to cloud and serve it via REST API's or something?

hollow sentinel Oct 15, 2020, 5:04 PM

#

guys i genuinely don't know how to fix my code

#

i think i'm close but like invalid syntax what and how

#

in the doc the variable z is never used

#

i don't even know why they assign it to z

grave frost Oct 15, 2020, 5:06 PM

#

@hollow sentinel You might get responses quicker if you explain your problem/error, what you have tried, and your goal clearly in one message. Multiple messages look pretty weird

hollow sentinel Oct 15, 2020, 5:08 PM

#

ok so I'm trying to make a chloropeth which is like a map. I pasted my code before and it's saying I have an invalid syntax error. I tried taking it out but then it moves onto the next line and says invalid syntax too.

grave frost Oct 15, 2020, 5:08 PM

#

ok, can you copy paste your whole traceback and the offending code here?

hollow sentinel Oct 15, 2020, 5:08 PM

#

traceback??

grave frost Oct 15, 2020, 5:08 PM

#

The error

hollow sentinel Oct 15, 2020, 5:09 PM

#

File "<ipython-input-7-e51a0b6a67dc>", line 30
    locationmode = 'USA-states",
               ^
SyntaxError: invalid syntax

#

fig = go.Figure(data=go.Chloropeth(
locations = states_idx
#z = stateValues.astype(float)
locationmode = 'USA-states",
colorscale = 'Reds',
colorbar_title = "Restaurants per State",
))

fig.update_layout(
title_text = "States with the Largest Amount of Chipotle Restaurants",
geoscope = "usa")

fig.show

grave frost Oct 15, 2020, 5:10 PM

#

See do you use an IDE?

hollow sentinel Oct 15, 2020, 5:11 PM

#

yeah i'm using Jupyter notebook

#

on anaconda

grave frost Oct 15, 2020, 5:11 PM

#

ohk, if you are a beginner just don't use them yet please. Usually you can find such errors yourself.

hollow sentinel Oct 15, 2020, 5:11 PM

#

i don't like repl.it

grave frost Oct 15, 2020, 5:11 PM

#

Now tell me, does the syntax look a little off to you, like a missing bracket or a possible comma missing?

hollow sentinel Oct 15, 2020, 5:11 PM

#

but ok

#

it looks like mismatched quotes to me

#

locationmode = 'USA-states",

#

but i changed it and that doesn't work either

grave frost Oct 15, 2020, 5:12 PM

#

is there a comma before that line? i.e after states_idx? python doesn't point to the exact absolute error, usually it is around somewhere there

#

and is there a line where the comma is not needed, say the line where you will specify no argument next?

hollow sentinel Oct 15, 2020, 5:15 PM

#

hang on i'm running into issues w my terminal

#

module 'plotly.graph_objects' has no attribute 'Chloropeth'

grave frost Oct 15, 2020, 5:16 PM

#

If you are a beginner, I highly discourage Jupyter Notebooks. I myself use JN's every day but you need to have a little experience in programming to use it and avoid basic mistakes. You will also find that IDE's also provided code completion (or you can use kite with your notebook). So it would provide you with docs and complete your own code, find basic mistakes etc. all of which are very productive.

hollow sentinel Oct 15, 2020, 5:16 PM

#

i had my freshman year in coding i

#

i'm pretty new but not like i started coding last night

#

i'm just rusty

grave frost Oct 15, 2020, 5:16 PM

#

Right. Now did you google the error first to find out what fixes are suggested?

#

Well, if you are rusty then you shldn't be on JN in the first place

#

And it is no matter of embarrassment that you are having basic problems. Everyone has them at first 🙂 IDE is just to help you.

hollow sentinel Oct 15, 2020, 5:18 PM

#

it's annoying how JN doesn't have line numbers

#

ok so i found this

#

https://stackoverflow.com/questions/51251188/attributeerror-module-plotly-has-no-attribute-plotly

Stack Overflow

AttributeError: module 'plotly' has no attribute 'plotly'

I don't understand why this snippet of code does not work. The data is fictional, I just want to be able to make time-series visualization with plotly.

This module once worked for me in a Kaggle k...

#

and they said to import plot.ploty

grave frost Oct 15, 2020, 5:18 PM

#

so try it out 🙂

hollow sentinel Oct 15, 2020, 5:19 PM

#

ImportError: 
The plotly.plotly module is deprecated,
please install the chart-studio package and use the
chart_studio.plotly module instead.

grave frost Oct 15, 2020, 5:19 PM

#

Uh-oh. Now if you another problem, just google it

hollow sentinel Oct 15, 2020, 5:20 PM

#

yeah i found this

#

https://github.com/plotly/plotly.py/issues/1660

GitHub

plotly.plotly import error message · Issue #1660 · plotly/plotly.py

Should have caught this before RC1, sorry... This message: I think should provide a little more context and say something like: The plotly.plotly module is deprecated. To create figures using the C...

grave frost Oct 15, 2020, 5:20 PM

#

In programming it's like an infinite amount of errors, and all you can do is just keep correcting them. Over time you will make less mistakes.

hollow sentinel Oct 15, 2020, 5:20 PM

#

yeah i know that

grave frost Oct 15, 2020, 5:20 PM

#

Though I hate it myself 🙂

hollow sentinel Oct 15, 2020, 5:20 PM

#

how do you exit out of a terminal that's currently running jupyter notebook

grave frost Oct 15, 2020, 5:21 PM

#

what OS are you on?

#

I believe ctrl-c twice does the trick

austere swift Oct 15, 2020, 5:21 PM

#

you can just do ctrl-c once

#

thats just a general "cancel program" code

grave frost Oct 15, 2020, 5:22 PM

#

yeah then you have to press "y" and enter then also

hollow sentinel Oct 15, 2020, 5:22 PM

#

it's saying to download plotly 3.10.0

grave frost Oct 15, 2020, 5:22 PM

#

better to do it twice and shut down everything

austere swift Oct 15, 2020, 5:22 PM

#

no?

#

just doing it once you dont have to do y or anything

#

loll

grave frost Oct 15, 2020, 5:23 PM

#

Hmm I am on linux and I do it twice 🙂 maybe its old

hollow sentinel Oct 15, 2020, 5:23 PM

#

The plotly.plotly module is deprecated,
please install the chart-studio package and use the
chart_studio.plotly module instead.

#

i tried the fix

#

it didn't work

#

this is sus

#

https://community.plotly.com/t/solved-update-to-plotly-4-0-0-broke-application/26526/2

Plotly Community Forum

[solved] Update to plotly-4.0.0 broke application

Okay, I eventually found this guide: (don’t know what took me so long) And the key part which did break my application is that I needed to use conda install -c plotly chart-studio instead of pip install. Then a simple import chart_studio.plotly in my graph_definitio...

grave frost Oct 15, 2020, 5:25 PM

#

~~then just kick it out~~ You have to try many fixes. Hopefully one will work

hollow sentinel Oct 15, 2020, 5:27 PM

#

yeah i'm googling all over the place

#

nope still didn't work

main rain Oct 15, 2020, 5:28 PM

#

hey so im getting an error message when trying to install the pytorch module

glass jetty Oct 15, 2020, 5:29 PM

#

@main rain Take a look at https://stackoverflow.com/questions/56859803/modulenotfounderror-no-module-named-tools-nnwrap and https://github.com/pytorch/pytorch/issues/4827

Stack Overflow

ModuleNotFoundError: No module named 'tools.nnwrap'

I am trying to import a package "torch".
For same, I tried to install it using pip command as below, installation even started but after few seconds it got error

below is the command that I execu...

GitHub

ModuleNotFoundError: No module named 'torch' · Issue #4827 · pytorc...

OS: macOS High Sierra version 10.13.2 PyTorch version: How you installed PyTorch (conda, pip, source): pip3 Python version: Python 3.6.0 :: Anaconda 4.3.0 (x86_64) CUDA/cuDNN version: No GPU I succ...

main rain Oct 15, 2020, 5:29 PM

#

ohh thanks a lot imma read that

hollow sentinel Oct 15, 2020, 5:30 PM

#

no problem

#

the same error

#

module 'plotly.graph_objs' has no attribute 'Chloropeth'

#

nope nothing i try does anything

#

i'm on the third page of google

main rain Oct 15, 2020, 5:36 PM

#

@glass jetty thanks it worked by running
pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

misty lake Oct 15, 2020, 5:39 PM

#

@misty lake You mean deploying a trained ML model to cloud and serve it via REST API's or something?
@grave frost Not really but without using cloud services. Also architecting example .

dull musk Oct 15, 2020, 6:07 PM

#

anyone can recommend me a dataset to get companies information like name and domain ?

coral walrus Oct 15, 2020, 6:14 PM

#

heya. using pandas, is there a way to generate an empty dataframe if the code below doesn't return any values?

data = pd.read_excel(r'{path}\some_data.xlsx', sheet_name='any')

heady hatch Oct 15, 2020, 6:18 PM

#

You can always do pd.DataFrame() then append to it.

coral walrus Oct 15, 2020, 6:19 PM

#

so something like if data is empty: new = pd.DataFrame() and add a single empty row to it or something like that

heady hatch Oct 15, 2020, 6:20 PM

#

Depending on your situation, you don't even need to add a single empty row.

#

Just straight up df = pd.DataFrame().

coral walrus Oct 15, 2020, 6:21 PM

#

I'll give it a try

heady hatch Oct 15, 2020, 6:22 PM

#

Btw it might help to talk about what you're trying to do because as of now the problem you're asking about is "how do I create an empty DataFrame", which the answer is pd.DataFrame().

But that might not be the actual problem you're facing.

#

http://xyproblem.info/

The XY Problem

Asking about your attempted solution rather than your actual problem

coral walrus Oct 15, 2020, 6:22 PM

#

I think you misunderstand

heady hatch Oct 15, 2020, 6:23 PM

#

heya. using pandas, is there a way to generate an empty dataframe if the code below doesn't return any values?
data = pd.read_excel(r'{path}\some_data.xlsx', sheet_name='any')

@coral walrus Could you clarify what you mean exactly here?

coral walrus Oct 15, 2020, 6:23 PM

#

if I write

data = pd.read_excel(r'{path}\some_data.xlsx', sheet_name='any')
df = pd.DataFrame(data)

I'll create a new df and load it with the data I import from some_data.xlsx

#

but if there's no data in some_data.xlsx, df = pd.DataFrame(data) will fail

#

so I need to check first if any data is imported from the workbook, and if no data is imported, then I need to create an empty dataframe

heady hatch Oct 15, 2020, 6:26 PM

#

If the excelsheet doesn't have any data in it, the read_excel should have no data in it.

#

In [2]: pd.read_excel('Workbook1.xlsx')
Out[2]:
Empty DataFrame
Columns: []
Index: []

#

I created an excelsheet without data, and as you can see.

coral walrus Oct 15, 2020, 6:27 PM

#

I might know what I'm doing wrong. thanks for trying anyway @heady hatch DankMemes

hollow sentinel Oct 15, 2020, 10:24 PM

#

fig = go.Figure(data=go.Chloropeth(
locations = states_idx,
#z = stateValues.astype(float)
locationmode = "USA-states",
colorscale = "Reds",
colorbar_title = "Restaurants per State",
))

fig.update_layout(
title_text = "States with the Largest Amount of Chipotle Restaurants",
geoscope = "usa")

fig.show

#

this gives me an error saying

#

AttributeError Traceback (most recent call last)
<ipython-input-6-33c0f5439965> in <module>
25 plt.show()
26
---> 27 fig = go.Figure(data=go.Chloropeth(
28 locations = states_idx,
29 #z = stateValues.astype(float)

AttributeError: module 'plotly.graph_objs' has no attribute 'Chloropeth'

#

i've looked for why and they suggested doing a bunch of import statements and pip install plot.ly

#

import pandas as pd
from matplotlib import pyplot as plt
import chart_studio.plotly as py
from plotly import graph_objs as go
import plotly.plotly as py

#

I genuinely have no clue on how to fix it I've gone to like the fifth page of Google too

#

@austere swift any ideas?

austere swift Oct 15, 2020, 10:31 PM

#

its plotly express

hollow sentinel Oct 15, 2020, 10:32 PM

#

what

austere swift Oct 15, 2020, 10:33 PM

#

the graph objs one isnt really good for this purpose you should use the plotly express one

#

its in the using built in country and state geometries

hollow sentinel Oct 15, 2020, 10:33 PM

#

import plotly.express as px

austere swift Oct 15, 2020, 10:33 PM

#

yes, then px.choropleth

hollow sentinel Oct 15, 2020, 10:34 PM

#

📎 unknown.png

austere swift Oct 15, 2020, 10:34 PM

#

the second example

#

with the US and the built in state geometries

hollow sentinel Oct 15, 2020, 10:34 PM

#

second example wehre

austere swift Oct 15, 2020, 10:34 PM

#

scroll down

#

just below the one you showed an ss of

hollow sentinel Oct 15, 2020, 10:35 PM

#

polar coordinates?

austere swift Oct 15, 2020, 10:35 PM

#

no

hollow sentinel Oct 15, 2020, 10:35 PM

#

oh

austere swift Oct 15, 2020, 10:35 PM

#

this one

📎 unknown.png

#

youll need to do a little preprocessing on your dataset though

hollow sentinel Oct 15, 2020, 10:36 PM

#

are you on this https://plotly.com/python/plotly-express/

Plotly Express

Plotly Express is a terse, consistent, high-level API for creating figures.

austere swift Oct 15, 2020, 10:36 PM

#

no

#

https://plotly.com/python/choropleth-maps/#using-builtin-country-and-state-geometries

Choropleth Maps

How to make choropleth maps in Python with Plotly.

#

the link i sent you earlier remember

#

you could create a second DF that just has [state abbreviation, state name, number of chipotles] as columns

#

then do something like this

import plotly.express as px

fig = px.choropleth(df, locations="state_abv", locationmode="USA-states", color="num_chipotles", scope="usa")
fig.show()

#

state_abv and num_chipotles being the names of the columns in your df

#

could be whatever you want

#

but the nice thing about px is that you dont need to do like df["state_abv"] for locations and color and stuff

#

you could just put the string and since the first argument is the df itll take the string index of that automatically

#

thats the basic idea, so you just need to preprocess the df and make a second df that just has the state abbreviations and the number of chipotles

#

which would be pretty easy

#

also you could include hover data, so that when you hover over it itll show the state name and the number of chipotles

#

that was probably a lot that i just said lmk if you need clarification on anything

hollow sentinel Oct 15, 2020, 10:41 PM

#

I don't know how to create the new df

austere swift Oct 15, 2020, 10:41 PM

#

so you created a variable with a list of the names of states and another one with the number of chipotles in each one right

hollow sentinel Oct 15, 2020, 10:42 PM

#

yes

#

that's stateValues and states_idx

austere swift Oct 15, 2020, 10:43 PM

#

which one is the state names and which one is chipotles per state?

hollow sentinel Oct 15, 2020, 10:43 PM

#

stateValues is chipotles per state

#

states_idx is state names

#

@storm sandal quit facepalming i'm new to this

austere swift Oct 15, 2020, 10:45 PM

#

so you could create a dict then make a df from the dict

#

or you could make a df directly from the vars

#

but youd have to transpose the lists first if you wanna make a df directly from those

#

which isnt that hard tbh

hollow sentinel Oct 15, 2020, 10:47 PM

#

how about I just not make a chloropath it sounds hard

#

lmao

#

I got no clue what to do

austere swift Oct 15, 2020, 10:47 PM

#

its honestly a lot easier than it sounds

#

making a dict would probably be easer so you could do this

data = {"states": states_idx, "num_chipotles": stateValues}
df = pd.DataFrame(data)

hollow sentinel Oct 15, 2020, 10:50 PM

#

yeah but doesn't it require states abbreviated

austere swift Oct 15, 2020, 10:51 PM

#

oh yeah forgot abt that

#

you could find some dict online for states to state abbreviations

#

https://gist.github.com/rogerallen/1583593 bam

Gist

A Python Dictionary to translate US States to Two letter codes

A Python Dictionary to translate US States to Two letter codes - us_state_abbrev.py

hollow sentinel Oct 15, 2020, 10:52 PM

#

nice but idk how to use that in my code

#

sorry i'm new to all of this

storm sandal Oct 15, 2020, 10:53 PM

#

@hollow sentinel right there with ya. doesn't mean I can't cringe at your (or my own) expense.

austere swift Oct 15, 2020, 10:53 PM

#

state_abbreviations = [us_state_abbrev[state] for state in states_idx]

#

then incorporate that in the above code

#

state_abbreviations = [us_state_abbrev[state] for state in states_idx]
data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
df = pd.DataFrame(data)

#

and you could save that df as a csv so you dont have to put that whole dict in your code

#

so you could preprocess it in the python shell or something then have the csv ready to import in your code

hollow sentinel Oct 15, 2020, 10:56 PM

#

yep this is way over my head

#

lmao

#

it's not that complicated I must be burnt out

austere swift Oct 15, 2020, 10:57 PM

#

so essentially the first line is a list comp to convert states_idx to state abbreviations, then the second line creates a dict for all the data, then the third line just makes a df from that dict

hollow sentinel Oct 15, 2020, 10:58 PM

#

yeah it's saying that us_state_abbrev is not defined

austere swift Oct 15, 2020, 10:58 PM

#

thats the dict from the link i sent you

#

you have to put that dict in your code somewhere too

hollow sentinel Oct 15, 2020, 10:58 PM

#

the whole dict

#

alright

#

ok i did it

austere swift Oct 15, 2020, 10:59 PM

#

yeah thats why i'm saying you should just paste it in a python shell prompt and do all that code to make the df in ipython, then save it as a csv once then you can import it in your normal code

#

i gtg now but that should be enough to get you started

hollow sentinel Oct 15, 2020, 11:00 PM

#

thank you @austere swift

hollow sentinel Oct 15, 2020, 11:15 PM

#

also guys i spelled Choropleth wrong in my code

#

so i have a new error now

arctic wedgeBOT Oct 15, 2020, 11:16 PM

#

Hey @hollow sentinel!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

hollow sentinel Oct 15, 2020, 11:16 PM

#


ValueError: 
    Invalid value of type 'plotly.graph_objs.Choropleth' received for the 'data' property of 
        Received value: Choropleth({
    'colorbar': {'title': {'text': 'Restaurants per State'}},
    'colorscale': 'Reds',
    'locationmode': 'USA-states',
    'locations': array(['California', 'Texas', 'Ohio', 'Florida', 'New York', 'Illinois',
                        'Virginia', 'Pennsylvania', 'Maryland', 'Arizona', 'Colorado',
                        'Minnesota', 'New Jersey', 'North Carolina', 'Massachusetts', 'Georgia',
                        'Washington', 'Indiana', 'Missouri', 'Michigan', 'Oregon', 'Kansas',
                        'Nevada', 'Tennessee', 'Connecticut', 'Washington DC', 'South Carolina',
                        'Kentucky', 'Wisconsin', 'Alabama', 'Oklahoma', 'Utah', 'Nebraska',
                        'Iowa', 'Louisiana', 'Delaware', 'New Mexico', 'Rhode Island',
                        'New Hampshire', 'Arkansas', 'West Virginia', 'Maine', 'Idaho',
                        'Montana', 'Vermont', 'North Dakota', 'Mississippi', 'Wyoming'],
                       dtype=object)
})

hollow sentinel Oct 15, 2020, 11:35 PM

#

i think that's because it should be abbreviated state names

empty cloud Oct 15, 2020, 11:46 PM

#

Anyone have any problem with plotly
I tried it on vscode and jupyter but to no avail
It just shows a blank graph on Jupyter notebook online

#

Oh vscode it says module is missing even though I installed everything

stable otter Oct 16, 2020, 12:58 AM

#

Im quite new to machine learning/deep learning. I use pytorch and would like to work on some projects to get better. Any suggestions

lapis sequoia Oct 16, 2020, 12:59 AM

#

Hey

#

i have like 40k of valid phone numbers and like 100k of invalid ones there is anyway to train the model and predict or generate like valid phone numbers ? with Reinforcement Learning or some kind of Machine learning ?

#

Thanks!

austere swift Oct 16, 2020, 1:16 AM

#

@lapis sequoia well the thing is phone numbers dont really have a good pattern to figure out whether theyre valid or not

#

other than the area code theyre mostly random

#

so you can't really create a machine learning model to predict it or anything

#

@stable otter what projects are you looking to do?

stable otter Oct 16, 2020, 1:21 AM

#

anything that can get me good practice and more experience in this area

#

i tend to forget what ive learned if i dont implement it

#

@austere swift

lapis sequoia Oct 16, 2020, 1:28 AM

#

@lapis sequoia well the thing is phone numbers dont really have a good pattern to figure out whether theyre valid or not
@austere swift Yeah that's the problem but reinforsement learning i think will be good for this but how i can setup a environement ?

austere swift Oct 16, 2020, 1:31 AM

#

reinforcement learning still wont work unless theres some sort of pattern

lapis sequoia Oct 16, 2020, 1:39 AM

#

Yeah

#

Thanks!

sly grove Oct 16, 2020, 2:09 AM

#

https://computersciencehub.io/datascience/must-have-data-scientist-skill-writing-skill/

Computer Science Hub

Must have Data Scientist skill Writing Skill

Almost everyone is talking about how important is Programming or Statistic skills are for Data Science job, rather form my experience it's not that instead it's Writing Skills.

vital yarrow Oct 16, 2020, 2:47 AM

#

hello can anyone help me with the matplotlib library

#

my plot contains two y axes (implemented it as a subplot), and i cant find a way to make it so my two graphs doesn't overlap

#

📎 unknown.png

#

messy graph, but i highlighted in blue where the red plot overlaps with the black plot.

#

i tried changing the y_lim of my axes but all it does is change the y axis range but the plot shape stays exactly the same (so it python just automatically scales it)

hasty grail Oct 16, 2020, 2:57 AM

#

are you trying to plot one graph per subplot?

vital yarrow Oct 16, 2020, 2:58 AM

#

er not quite sure what you mean by that im not really good at subplots (or matplot in general)

#

heres my code

#

📎 unknown.png

#

im just making 1 figure that has two datasets plotted and 2 y-axes

#

and from what i see on google it seems like subplots can implement this with the ax.twinx() function so i went with it lol

hasty grail Oct 16, 2020, 3:05 AM

#

Oh I have never tried doing that so idk, maybe someone else can help you

vital yarrow Oct 16, 2020, 3:06 AM

#

hmm okay

#

i think the most odd part is no matter how i set my y limit python automatically scales my entire plot so that it fits with the figure. Id imagine i can make some artificial white space between the two plots if i change one of the y-axes range...maybe its just a subplot thing

clear mulch Oct 16, 2020, 3:30 AM

#

try to increase the figure size by using plt.figure(figsize=(10,10))

#

I hope this helps

📎 Capture.PNG

solid mantle Oct 16, 2020, 4:17 AM

#

Any bayesians here?

fierce shadow Oct 16, 2020, 4:57 AM

#

hey, does anybody knows what should I keep as a loss function if I want to create a CNN which takes image as an input and returns image as ouput

#

and how can I make keras model return image in the first place?

hasty grail Oct 16, 2020, 5:43 AM

#

if you're rescaling all of your pixels to the range [0, 1] you can get the model to output pixels in said range by using the sigmoid activatoin

#

as for the loss function you can use binary_crossentropy

tawny oak Oct 16, 2020, 6:55 AM

#

Hey everybody, I need your help, I have an excel table which has 10000 rows. each one has one sentence used different words and I want to find same meaning sentences on rows. For example 1000rows are about brake system on 10000 rows. or %10 is about brake system. how should i go? do you have any advice?

heady hatch Oct 16, 2020, 6:57 AM

#

Hmm I can only offer couple machine learning methods.

#

You can take some form of vector representations of each sentences, reduce the dimension with SVD, and then use KNN.

#

Or you can go deep learning methods where you encode the sentences in some kind of embeddings and then calculate similarity based on those embeddings.

tawny oak Oct 16, 2020, 6:59 AM

#

okay thank you so mush

#

much

heady hatch Oct 16, 2020, 7:08 AM

#

Hey guys question before I dive deep into a naive computation.

I'm creating a similarity search engine.
I have a huge dataset of text, I'm planning on encoding each text into Roberta's embeddings, then calculating similarities based on those embeddings.

However it's going to iterate through every single text embedding combination, ie text 1 embedding and text 2...

For all the text data, which I have around 2 million data points.

Is this how you guys would go about it?

hasty grail Oct 16, 2020, 7:15 AM

#

You can try storing the embeddings in a cKDTree but as SciPy has mentioned:

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

#

I assume you're already vectorizing the calculations

heady hatch Oct 16, 2020, 7:16 AM

#

Wait, to clarify. what do you mean by vectorizing the calculations?

As in the code to calculate similarity is vectorized?

#

On the other hand, I wonder if I can try my suggestion to the guy above.

Take the embeddings, reduce the dimensions, and then go from there.

hasty grail Oct 16, 2020, 7:21 AM

#

As in the code to calculate similarity is vectorized?
Yes, aka you don't use for loops

heady hatch Oct 16, 2020, 7:22 AM

#

Then yup, mhm!

hasty grail Oct 16, 2020, 7:27 AM

#

Maybe you want to take a look at Locality-Sensitive Hashing

heady hatch Oct 16, 2020, 7:31 AM

#

Thank you so much, looking into these algorithms now.

lapis sequoia Oct 16, 2020, 10:28 AM

#

hey

#

can someone help me please ?? ❤️

#

im struggling like 2 hour's

#

import urllib.request
def isitup():
try:
if(urllib.request.urlopen("http://www.instagram.com").getcode()==200):
print("it's up")
except:
print("Error caught")

def getinstagram():
import json
import requests
inputuser_name = input("Username: ")
user=(inputuser_name + '_name/?__a=1')
solditems = requests.get('https://www.instagram.com/',user) # (your url)
data = solditems.json()
with open(input("Enter Filename: " 'wb') as f:
f.write(data)

getinstagram()

#

that's my source code

#

and i want to rename the save file with the inputuser name

#

but im getting a error.

brittle agate Oct 16, 2020, 11:09 AM

#

Wait a minute...

#

📎 unknown.png

#

https://tenor.com/view/loading-cat-thinking-wait-what-gif-15922897

Tenor

halcyon vale Oct 16, 2020, 3:13 PM

#

https://www.linkedin.com/posts/thinam-tamang-3b12831a2_66daysofdata-datascience-machinelearning-activity-6722859389633490944-wCF_

Thinam Tamang posted on LinkedIn

Day 43 of #66DaysOfData! with Ken Jee

In my journey of Natural Language Processing, Today I have read and Implemented about MLP Classifier, The Training...

vital yarrow Oct 16, 2020, 3:29 PM

#

@clear mulch thanks for the tip but unfortunately it didnt solve my issue

#

if anyone has any other suggestions on how to make it so two matplotlib plots dont overlap plz ping/dm me

ripe forge Oct 16, 2020, 3:36 PM

#

Curious about where you hit these kinds of bottlenecks in python. The issue being, if it's a memory issue, no programming language can help, and if it's purely a performance issue, the python libraries that utilize C or similar low level code should have been sufficient I would presume. Having said that, I personally have no experience with julia

#

Interesting. A case this specific I'm curious how long Julia will take

#

The immediate issue I see is that you're having a decent amount of IO and then you're probably using native python datatypes for this computation.

#

Inherently python datatypes have to do a lot more checks for things like simple addition. It's one reason why numpy and pandas are usually recommended over native python for number crunching

#

So I can see the issue here. I'm curious how cython would fare

#

If you can rephrase your work into numpy arrays, it's definitely worth a shot.

#

(I just personally don't know how to do that here. Not too proficient at it)

#

Can't hurt to try, though the first issue is figuring out if this problem can even be expressed as arrays first.

#

My limited understanding of cython is that you could always just create and compile the function you were interested in for doing the dirty work

#

Leaving the rest of your code cython free and good to go as-is

molten hamlet Oct 16, 2020, 4:18 PM

#

my plot contains two y axes (implemented it as a subplot), and i cant find a way to make it so my two graphs doesn't overlap
@vital yarrow what exactly you mean by two graphs doesn't overlap?

vital yarrow Oct 16, 2020, 4:19 PM

#

📎 unknown.png

#

the blue circles are areas where my two plots overlap (the red plot and black plot)

#

im trying to either shift up the red plot or move down the black plot so theres no overlap (theres white space in between the peaks)

#

📎 unknown.png

#

@molten hamlet

grave frost Oct 16, 2020, 4:43 PM

#

@vital yarrow Why don't you just adjust the data points at that peak where is overlaps?

molten hamlet Oct 16, 2020, 4:48 PM

#

@vital yarrow Why don't you just adjust the data points at that peak where is overlaps?
@grave frost cause it has to do same with axis

vital yarrow Oct 16, 2020, 4:52 PM

#

i cant adjust individual datapoints since this data is for a lab report, i can only scale the data

#

i found out i was looking at the wrong axis so imma fix it and see if it solves my problem

#

ill ping you guys if i hit a deadend again

molten hamlet Oct 16, 2020, 5:00 PM

#

yes but you could shift all data, and move axis to start at 0.05 or something

hollow sentinel Oct 16, 2020, 5:30 PM

#

state =sample_data["state"]
stateValues = state.value_counts()
stateValues
#counts of each state is stateValues

states_idx = state.value_counts().index
states_idx
#each state name

plt.pie(stateValues, labels = states_idx,)
#labels are the state name

plt.title("States with the Largest Amount of Chipotle Restaurants")
plt.show()

x_pos = [i for i, _ in enumerate(states_idx)]
#each state name
plt.bar (x_pos, stateValues, color = "blue")
#create the bar chart
plt.style.use('ggplot')
plt.xlabel("States")
plt.ylabel("Restaurants")
plt.title("States with the Largest Amount of Chipotle Restaurants")
plt.xticks(x_pos, states_idx )
plt.show()
state_abbreviations = [us_state_abbrev[state] for state in states_idx]
data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
df = pd.DataFrame(data)

fig = go.Figure(data=go.Choropleth(
locations = states_idx,
#z = stateValues.astype(float)
locationmode = "USA-states",
colorscale = "Reds",
colorbar_title = "Restaurants per State",
))

fig.update_layout(
title_text = "States with the Largest Amount of Chipotle Restaurants",
geoscope = "usa")

fig.show

#

states = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'
}

#

KeyError                                  Traceback (most recent call last)
<ipython-input-8-f84fb4d75306> in <module>
     85 }
     86 
---> 87 state_abbreviations = [us_state_abbrev[state] for state in states_idx]
     88 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     89 df = pd.DataFrame(data)

<ipython-input-8-f84fb4d75306> in <listcomp>(.0)
     85 }
     86 
---> 87 state_abbreviations = [us_state_abbrev[state] for state in states_idx]
     88 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     89 df = pd.DataFrame(data)

KeyError: 'California'

#

does anyone know why i'm getting this key error for california

#

i tried taking it out and that didn't fix it

#

i need state abbreviations to create my chloropeth

#

key error happens when you refer to something that's not in the dictionary but california is clearly in the dictionary

odd yoke Oct 16, 2020, 5:36 PM

#

you need to invert your dict

hollow sentinel Oct 16, 2020, 5:36 PM

#

uhhhh

odd yoke Oct 16, 2020, 5:36 PM

#

eg py {"Wyoming": "WY"}

hollow sentinel Oct 16, 2020, 5:36 PM

#

ohhh so switch keys and values

odd yoke Oct 16, 2020, 5:37 PM

#

dictionaries in python are a key/value map, they're not what's called "bi-directional", you can't get a key from a value (easily)

#

(you can but it's annoying and ruins the purpose of dicts)

hollow sentinel Oct 16, 2020, 5:37 PM

#

ok so how do you invert it

odd yoke Oct 16, 2020, 5:38 PM

#

name_to_abbrev = {v: k for k, v in states.items()}```

hollow sentinel Oct 16, 2020, 5:39 PM

#

ok thanks

#

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-f8d5727657f6> in <module>
     87 name_to_abbrev = {v: k for k, v in states.items()}
     88 
---> 89 state_abbreviations = [name_to_abbrev[state] for state in states_idx]
     90 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     91 df = pd.DataFrame(data)

<ipython-input-10-f8d5727657f6> in <listcomp>(.0)
     87 name_to_abbrev = {v: k for k, v in states.items()}
     88 
---> 89 state_abbreviations = [name_to_abbrev[state] for state in states_idx]
     90 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     91 df = pd.DataFrame(data)

KeyError: 'Washington DC'

#

yeah this is the other key error i ran into

#

i put in your snippet of code to reverse the keys and values

#

idk what's wrong this error has been chasing me for the past day

odd yoke Oct 16, 2020, 5:53 PM

#

There is no Washington DC in your dict

hollow sentinel Oct 16, 2020, 6:04 PM

#

Oh that’s sus

#

ok I’ll add it in

#

KeyError                                  Traceback (most recent call last)
<ipython-input-11-6bdf779ccf89> in <module>
     88 name_to_abbrev = {v: k for k, v in states.items()}
     89 
---> 90 state_abbreviations = [name_to_abbrev[state] for state in states_idx]
     91 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     92 df = pd.DataFrame(data)

<ipython-input-11-6bdf779ccf89> in <listcomp>(.0)
     88 name_to_abbrev = {v: k for k, v in states.items()}
     89 
---> 90 state_abbreviations = [name_to_abbrev[state] for state in states_idx]
     91 data = {"states": states_idx, "num_chipotles": stateValues, "state_abv": state_abbreviations}
     92 df = pd.DataFrame(data)

KeyError: 'Washington DC'

#

I added in a washington DC

#

 'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'DC': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'

#

this is bothering me

hollow sentinel Oct 16, 2020, 6:29 PM

#

I keep having a recurring error of KeyError: 'Washington'
when "Washington" : "WA" is already there

#

please ping me if you figure this out bc i have been staring at this for the past 2 hours and I still can't

grave frost Oct 16, 2020, 7:24 PM

#

error is KeyError: 'Washington DC' yet I don't see Washington DC, only washington

hollow sentinel Oct 16, 2020, 7:49 PM

#

that didn't work either

#

ugh

velvet thorn Oct 16, 2020, 11:01 PM

#

that didn't work either
@hollow sentinel why not just use a set comprehension to find the elements present in one but not the other

hollow sentinel Oct 16, 2020, 11:02 PM

#

@velvet thorn what’s set

velvet thorn Oct 16, 2020, 11:05 PM

#

@velvet thorn what’s set
@hollow sentinel like a list but it can only have unique elements

#

and unordered

hollow sentinel Oct 16, 2020, 11:10 PM

#

@velvet thorn idk man I have no clue how to do that

#

I think I’m gonna give up on making a chloropeth and start doing beginner data science projects instead

velvet thorn Oct 16, 2020, 11:18 PM

#

@velvet thorn idk man I have no clue how to do that
@hollow sentinel okay so like

#

state_abbreviations = [name_to_abbrev[state] for state in states_idx] isn't working, right

#

because there are keys in states_idx that are not in name_to_abbrev

#

so to find all those keys...

#

{state for state in states_idx if state not in name_to_abbrev}

hollow sentinel Oct 16, 2020, 11:19 PM

#

ok I’ll try it

#

Thank you @velvet thorn

lapis sequoia Oct 17, 2020, 12:15 AM

#

I am currently working on a neural network and got the error: self._open(**self.request.kwargs.copy())

TypeError: _open() got an unexpected keyword argument 'channels'

#

I don't have an open function so I don't know how this error got associated here

jolly plank Oct 17, 2020, 4:38 AM

#

is this the place where i ask a random question

#

because i dont know where to go

glossy osprey Oct 17, 2020, 4:55 AM

#

Hi everyone

#

I'm beginning date science

lapis sequoia Oct 17, 2020, 4:56 AM

#

Data scientist or developer

glossy osprey Oct 17, 2020, 4:56 AM

#

I wanna to be a data scientist

#

*be

lapis sequoia Oct 17, 2020, 4:57 AM

#

Ir data minner

glossy osprey Oct 17, 2020, 4:57 AM

#

Sorry?

lapis sequoia Oct 17, 2020, 4:57 AM

#

Ml

#

Do you know ai

glossy osprey Oct 17, 2020, 4:58 AM

#

I don't know nothing

#

Lol

#

I just know the simple syntax of python 's language

lapis sequoia Oct 17, 2020, 4:59 AM

#

Including English grammer

glossy osprey Oct 17, 2020, 4:59 AM

#

Kkkkkk sorry, I'm from Brazil

#

Kkkkkk

austere swift Oct 17, 2020, 4:59 AM

#

@glossy osprey what kinda data science are you gonna try and go into?

#

like statistics, machine learning, etc

lapis sequoia Oct 17, 2020, 5:00 AM

#

Me

austere swift Oct 17, 2020, 5:00 AM

#

no xvandao

glossy osprey Oct 17, 2020, 5:01 AM

#

I'm trying get in the date science

lapis sequoia Oct 17, 2020, 5:02 AM

#

I dont like to be a developer its not interesting to make and design things

glossy osprey Oct 17, 2020, 5:02 AM

#

I'm using the R and Python

#

Python I started 7 months ago

#

But I just learned the simple syntax

lapis sequoia Oct 17, 2020, 5:04 AM

#

Then learn common packages

glossy osprey Oct 17, 2020, 5:04 AM

#

I studied until POO

#

Poo right?

#

Here we call of poo

#

Wait

#

Please

#

A moment

lapis sequoia Oct 17, 2020, 5:05 AM

#

Poo or pee

austere swift Oct 17, 2020, 5:05 AM

#

I think they mean OOP

glossy osprey Oct 17, 2020, 5:05 AM

#

object-oriented programming

lapis sequoia Oct 17, 2020, 5:06 AM

#

Poo 😂

austere swift Oct 17, 2020, 5:06 AM

#

yeah OOP

glossy osprey Oct 17, 2020, 5:06 AM

#

Kkkkkkkkk sorry

austere swift Oct 17, 2020, 5:06 AM

#

its alr

glossy osprey Oct 17, 2020, 5:06 AM

#

In Portuguese be poo

lapis sequoia Oct 17, 2020, 5:06 AM

#

Ok

austere swift Oct 17, 2020, 5:06 AM

#

well have you learned like statistics and stuff?

glossy osprey Oct 17, 2020, 5:06 AM

#

"Programação orientada ao objeto "
"object-oriented programming"
Kkkkk

#

well have you learned like statistics and stuff?
@austere swift yes

#

Do you recommend to use Python or R?

austere swift Oct 17, 2020, 5:07 AM

#

okay well once you have the math and stuff down you should start learning about the different methods and stuff to do data science

#

I havent used R that much and even when i did it was like 2 years ago so idk but either one would work

#

python is better if you wanna delve into like machine learning and stuff though

glossy osprey Oct 17, 2020, 5:08 AM

#

There is a big discussion about who is the "better" here kkk

lapis sequoia Oct 17, 2020, 5:09 AM

#

Whats r

austere swift Oct 17, 2020, 5:09 AM

#

A programming language

lapis sequoia Oct 17, 2020, 5:10 AM

#

Is there any marketplace for that

#

Should i know statistics to learn ml

austere swift Oct 17, 2020, 5:12 AM

#

ml doesnt really need statistics that much, but sometimes the preprocessing for it does

#

ml is mostly calculus and linear algebra that you need to know

glossy osprey Oct 17, 2020, 5:13 AM

#

I havent used R that much and even when i did it was like 2 years ago so idk but either one would work
@austere swift I'm I'm studying Science Economics and, in my college, We use very much the R

lapis sequoia Oct 17, 2020, 5:14 AM

#

I know algerbra but i forget statistics

glossy osprey Oct 17, 2020, 5:15 AM

#

When I beginned the college, I learned the python first

#

But now I'm using the twice languages

lapis sequoia Oct 17, 2020, 5:16 AM

#

I learned java first

glossy osprey Oct 17, 2020, 5:16 AM

#

I learned java first
@lapis sequoia it's great!!

lapis sequoia Oct 17, 2020, 5:17 AM

#

Precisely two langs

glossy osprey Oct 17, 2020, 5:22 AM

#

Yeah

#

Now I'm going to sleep

jolly plank Oct 17, 2020, 5:23 AM

#

hello does anyone know pandas library

glossy osprey Oct 17, 2020, 5:23 AM

#

Here is 2:23 A.M

jolly plank Oct 17, 2020, 5:23 AM

#

in python

glossy osprey Oct 17, 2020, 5:23 AM

#

Bye friends

jolly plank Oct 17, 2020, 5:24 AM

#

can somebody help me with a question on pandas

#

for python

#

hello...

austere swift Oct 17, 2020, 5:36 AM

#

@jolly plank whats your question

jolly plank Oct 17, 2020, 5:36 AM

#

I need help with a question that keeps on giving me an error

#

it requires pandas

#

here is the question screenshot

#

📎 unknown.png

#

and here is my code that im having error on

#

📎 unknown.png

#

@austere swift hello...

#

hello is anyone here

jolly plank Oct 17, 2020, 6:05 AM

#

can someone help me

lapis sequoia Oct 17, 2020, 6:29 AM

#

@jolly plank can you please send the error clearly? its too small to read

bitter harbor Oct 17, 2020, 7:11 AM

#

you've cut off your code but taxi_zones_pickup isn't a key

proper fable Oct 17, 2020, 8:03 AM

#

Hi, Im new at datascience. would you give me some hint about how can I explore the dataset from here

https://www.kaggle.com/claudiodavi/superhero-set#

I have to submit this exploration for my assignment that say

"Do exploration and make
up the question!"

Super Heroes Dataset

How's your average superhero?

sample_data = pd.read_csv("chipotle_stores.csv") sample_data = pd.read_csv("chipotle_stores.csv")

sample_data = pd.read_csv("chipotle_stores.csv")
sample_data = pd.read_csv("chipotle_stores.csv")