#data-science-and-ml | Python | Page 272

lapis sequoia Nov 29, 2020, 7:35 AM

#

Do you all use PyTorch or TF for modelling?

ripe forge Nov 29, 2020, 9:32 AM

#

Whichever has an easier github repo at the time 😛

covert spire Nov 29, 2020, 10:31 AM

#

Where can i find data science projects to do as a beginner?

#

Like, is there an archive or smth for it

still delta Nov 29, 2020, 11:07 AM

#

Can someone suggest a challenge for me?

#

I want to take part in a challenge.

ashen socket Nov 29, 2020, 11:11 AM

#

@still delta @covert spire You can try kaggle. It has everything you might need. It has datasets for you to use. It has solutions. It has contests and a lot more.

still delta Nov 29, 2020, 11:12 AM

#

Is there any team working?

vapid burrow Nov 29, 2020, 11:27 AM

#

Consider joining a code jam

#

You can form teams and work together

paper nacelle Nov 29, 2020, 1:23 PM

#

Hi. I need help with ARIMA.
I am using the code below to find p, d & q. (I saw it somewhere; it worked fine with other data but not with the data I'm currently trying.) It is to predict the number of daily covid cases in a country.
`import pmdarima as pm  # assuming pm is pmdarima

model = pm.auto_arima(train, start_p=1, start_q=1,
                      test='adf',            # use adftest to find optimal 'd'
                      max_p=3, max_q=3,      # maximum p and q
                      m=1,                   # frequency of series
                      d=None,                # let model determine 'd'
                      seasonal=False,        # No Seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

print(model.summary())`

It is giving me the following error:
`ValueError: Input contains NaN, infinity or a value too large for dtype('float64').`

#

I have removed all NaN values tho....
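The error message above also mentions infinity, so dropping NaN alone may not be enough. A minimal sketch of a finiteness check follows; `toy_train` and `report_non_finite` are made-up names for illustration, not part of the original code.

```python
import numpy as np

def report_non_finite(values):
    """Count NaN, infinite and finite entries in a 1-D sequence."""
    arr = np.asarray(values, dtype="float64")
    return {
        "nan": int(np.isnan(arr).sum()),
        "inf": int(np.isinf(arr).sum()),
        "finite": int(np.isfinite(arr).sum()),
    }

# Hypothetical toy series standing in for the real training data.
toy_train = [10.0, 12.0, float("inf"), 15.0, float("nan")]
report = report_non_finite(toy_train)
print(report)

# Keep only the finite values before handing the series to a model.
clean = np.asarray(toy_train, dtype="float64")[np.isfinite(toy_train)]
print(clean)
```

If the counts show infinite values, they often come from an earlier transformation such as a division by zero or a log of zero, which is worth checking before fitting.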

#

it is very inaccurate rn :

📎 unknown.png

lapis sequoia Nov 29, 2020, 1:53 PM

#

guys, how do i train a cnn with more than 2 classes???

livid quartz Nov 29, 2020, 2:03 PM

#

Any ideas on how to plot a NumPy array? I've tried pcolor but that plots a mirrored version of the array and imshow() grid lines dont surround the array values properly

covert spire Nov 29, 2020, 3:06 PM

#

ashen socket @still delta @covert spire You can try kaggle. It has everyth...

Thanks bro i'll look at it

vague portal Nov 29, 2020, 4:50 PM

#

Thanks! I've found a module which is a pipeline for processing text https://spacy.io/usage/spacy-101#pipelines, this is pretty similar to what I'm trying to achieve but with dataframes instead. https://github.com/explosion/spaCy/blob/master/spacy/pipeline/pipes.pyx --> this is the source code but I'm struggling to understand how they've built and organised the classes 😦

GitHub

explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython - explosion/spaCy

spaCy 101: Everything you need to know · spaCy Usage Documentation

The most important concepts, explained in simple terms

wintry olive Nov 29, 2020, 5:25 PM

#

Sounds good. I'm going to focus on researching statistics & probability concepts for awhile. I understand how the comp sci processes work and what the models are trying to do. I could trial and error through it and my vision is reliant on chaos theory initial conditions for emergent fractal simulations not end to end engineered simulations. but Id like to have a decent understanding of statistics and probability concepts nevertheless. so i can help with validations, bias and variables et al.

#

speaking of which; is probability just another statistic? or are all statistics just a probability assessment?

azure stump Nov 29, 2020, 5:36 PM

#

https://asr373.medium.com/will-there-be-a-million-job-positions-for-data-scientists-4f5d50d1b04e

Medium

Will there be a Million job positions for data scientists?

Is Data Science a Build-up?😲

wintry olive Nov 29, 2020, 5:40 PM

#

of course there is an uncertainty principle to factor into this and that considering most of my understanding of statistics/probability comes from casual study of physics so symmetries/asymmetries, entropy, distributions and standard of deviations.

azure stump Nov 29, 2020, 5:41 PM

#

https://medium.com/analytics-vidhya/without-knowing-these-you-cant-be-a-data-scientist-b88deaba9533
it is based on functions of pandas

Medium

Without knowing these you can’t be a Data Scientist.

Pandas Functions

wintry olive Nov 29, 2020, 5:46 PM

#

yup with financial and business intelligence/analytics at the top

summer cobalt Nov 29, 2020, 5:49 PM

#

Can someone rate my code from a purely engineering perspective?

#

Link: https://github.com/aaditkapoor/icdata/blob/main/infer_and_convert/infer_and_convert.py

GitHub

aaditkapoor/icdata

icdata or "I see data" is a set of programs or code snippets that enable a data scientist/analyst to perform most of the heavy lifted data wrangling tasks in a more concise way. ...

#

Thanks!

verbal light Nov 29, 2020, 7:04 PM

#

Hi, I know the basics of machine learning and R-CNN theory. I want to make an object recognition program with Google/custom images. Can you recommend some algorithms and example implementations with TensorFlow? I know that Fast R-CNN and Faster R-CNN are better/harder, but I think there would be more examples, so I'm open to suggestions. I'm trying to work on vest.ai machines because of my computer's specs.

wintry olive Nov 29, 2020, 7:50 PM

#

ahh the keyword is Vision Transformer not pixel word

#

📎 unknown.png

#

hmm not sure @verbal light I tried to look if there were any APIs to use google image or bing image search engine

verbal light Nov 29, 2020, 8:05 PM

#

i mean something like "Open Images Dataset V6" https://storage.googleapis.com/openimages/web/download.html

wintry olive Nov 29, 2020, 8:07 PM

#

ahh thats like a corpus of images

#

in sets already

#

not sure man ive been all NLP

#

whoa thats big data too

#

one thing NLP can do with character search and semantics of collocates et all is allow for the creation of smaller sub sets of virtual corpora

#

i suppose you could extract subsets based on this:

#

📎 unknown.png

#

idk

#

heres a good old fashion cnn course

#

https://www.coursera.org/projects/tensorflow-for-cnns-object-recognition

Coursera

TensorFlow for CNNs: Object Recognition

Offered by Coursera Project Network. This guided project course is part of the "Tensorflow for Convolutional Neural Networks" series, and this series presents material that builds on the second course of DeepLearning.AI TensorFlow Developer Professional Certificate, which will help learners reinforce their skills and build more projects with Ten...

ionic isle Nov 29, 2020, 8:25 PM

#

Mathematics for Machine Learning
This is a 400-page free book about the mathematics needed for machine learning. It covers the things you need to know in order to get started with machine learning.
https://mml-book.com/

south quest Nov 29, 2020, 8:26 PM

#

ionic isle **Mathematics for Machine Learning** This is a 400-page free book about the math...

trim imp Nov 29, 2020, 8:43 PM

#

Hi, I have a question that is hard to explain, but I'll try. Can I summarize text from a PDF and then classify the main topic? For example, some text has different topics divided into multiple articles. I want the topics with a summary. I hope that's clear. How can I do it using Python, and which library would be useful? Thanks

livid quartz Nov 29, 2020, 8:43 PM

#

ionic isle **Mathematics for Machine Learning** This is a 400-page free book about the math...

Is any prior knowledge assumed?

#

Does anyone know how to change the figure size when using plt.subplot?

#

It doesn't work the same as plt.subplots()
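One way to handle this, sketched under the assumption that the plots are built with the pyplot state machine: `plt.subplot` itself takes no `figsize`, so the figure is created first with `plt.figure(figsize=...)` and the subplots are then added to it. The data and sizes below are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Create the figure with the wanted size (width, height in inches) first.
fig = plt.figure(figsize=(10, 4))

# plt.subplot then adds axes to the current figure.
ax1 = plt.subplot(1, 2, 1)
ax1.plot([0, 1, 2], [0, 1, 4])

ax2 = plt.subplot(1, 2, 2)
ax2.plot([0, 1, 2], [4, 1, 0])

print(fig.get_size_inches())
```

For a figure that already exists, `plt.gcf().set_size_inches(10, 4)` should give the same result.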

south quest Nov 29, 2020, 8:58 PM

#

livid quartz Is any prior knowledge assumed?

Have a read of Who Is the Target Audience? in the book

#

basically they claim high school maths

lapis sequoia Nov 29, 2020, 9:36 PM

#

hi. Could u help me making a cnn for image classification? All the examples ive seen are about cat/dogs with already image dataset from keras. But i want to use my own data set, and there are more than 2 categories

oblique vine Nov 29, 2020, 9:37 PM

#

@up just use other model than binary-crossentropy

#

use categorical with number of categories specified

#

or sparse-categorical 😛

lapis sequoia Nov 29, 2020, 9:39 PM

#

thats the name of the model i need?

oblique vine Nov 29, 2020, 9:39 PM

#

idk, most likely

#

find any digit recognition tutorial and read the code

#

it will be most likely something really similar to google one, but with categorical model
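The names mentioned above are loss functions rather than models. In Keras, `categorical_crossentropy` expects one-hot labels and `sparse_categorical_crossentropy` expects integer class indices, usually with a final `Dense` layer of N units and a softmax activation. The sketch below is plain Python, not Keras code, and uses made-up logits only to show that the two losses compute the same quantity.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Loss when the label is a one-hot vector such as [0, 1, 0, 0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def sparse_categorical_crossentropy(label, probs):
    """Loss when the label is an integer class index such as 1."""
    return -math.log(probs[label])

# Hypothetical network output for one image and 4 classes.
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)

loss_one_hot = categorical_crossentropy([1.0, 0.0, 0.0, 0.0], probs)
loss_sparse = sparse_categorical_crossentropy(0, probs)
print(loss_one_hot, loss_sparse)
```

The choice between the two mostly depends on how the labels are stored; labels derived from folder names are often plain integers, which suits the sparse variant.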

wintry olive Nov 29, 2020, 9:51 PM

#

i do have ideas for computer vision but without doing more research i have no way to determine how relevant, viable or outlandish the ideas are. until i finish working with NLP someone run a highly experimental unsupervised learning model with the mandelbrot set zoom as the input image. id love to take a look at what the model sees

#

and apply it to this:

#

📎 unknown.png

cold yarrow Nov 29, 2020, 11:00 PM

#

Clustering is cool

lapis sequoia Nov 30, 2020, 12:25 AM

#

guys my train data seems like this

#

https://gyazo.com/10ad185f8027af44c0e9e2edb9200a6f

Gyazo

#

how can i pass that as train data for a cnn?

#

i already got the labels, which are basically the name of the folders

#

but inside each folder there are images

#

how can i tell the cnn "all the images from this folder correspond to this label"

#

tf.keras.preprocessing.image_dataset_from_directory turns image files sorted into class-specific folders into a labeled dataset of image tensors.

#

is this what i want?

#

well, it says i dont have such a function

austere swift Nov 30, 2020, 3:01 AM

#

lapis sequoia ``tf.keras.preprocessing.image_dataset_from_directory`` turns image files sorted...

yeah that's what you'd want

#

and you'd set label mode to whatever label mode you'd want to use, by default it's sparse labels

lapis sequoia Nov 30, 2020, 3:14 AM

#

yeah but that method isnt implemented i think

#

not on tensorflow 2

#

The specific function (tf.keras.preprocessing.image_dataset_from_directory) is not available under TensorFlow v2.1.x or v2.2.0 yet.

#

so... could u help me to do it manually?

mossy dragon Nov 30, 2020, 5:34 AM

#

@trim imp

#

nltk would be useful

#

im not sure about summarizing, but you can def do topic modeling

#

Topic modeling is an unsupervised machine learning technique that's capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.

obtuse skiff Nov 30, 2020, 10:10 AM

#

Hello, Im working on utilizing tripletloss for MNIST. I got something running and the Loss for the Training and Validation is getting smaller as expected every epoch, but the Accuracy is sticking around 18-20% and its just basic MNIST so something is def wrong,
I have a basic 2 conv layer 3 fc layer architecture. I put the anchor, pos and neg through that model, then put those results into the TripletMarginLoss on pytorch

any recommendations on what I can do?

wary sand Nov 30, 2020, 10:33 AM

#

hello, im looking for an A.I server

#

im interested in A.I and python

#

any suggestions?

gaunt tusk Nov 30, 2020, 10:34 AM

#

https://discord.gg/CbVJYtz

wary sand Nov 30, 2020, 10:35 AM

#

im already in that one, but im looking for a small and you know less active server

gaunt tusk Nov 30, 2020, 10:35 AM

#

That makes absolutely no sense

#

You would rather a small inactive server over one with 9000+ people

wary sand Nov 30, 2020, 10:35 AM

#

no, this server is very active and often people dont reply to my questions so

#

thats why i need a small server

ocean mountain Nov 30, 2020, 11:37 AM

#

Hii

#

Can anyone help with this

#

📎 IMG_20201130_170245_004.jpg

#

🙄

fallow prism Nov 30, 2020, 11:41 AM

#

use pandas please

ocean mountain Nov 30, 2020, 11:41 AM

#

ocean mountain

Done it

#

Already

#

But same problem

fallow prism Nov 30, 2020, 11:42 AM

#

The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

file = open(filename, encoding="utf8")

https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character

Stack Overflow

UnicodeDecodeError: 'charmap' codec can't decode byte X in position...

I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

Traceback (most recent cal...
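When the encoding is unknown, one pragmatic approach is to try a short list of candidates in order. The sketch below does that; the helper name and the candidate list are assumptions for illustration. Note that `latin-1` accepts any byte, so a result decoded with it may still be wrong text.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes the file."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.read(), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings} could decode {path!r}")

# Toy file containing byte 0x90, which cannot start a UTF-8 sequence
# and is unmapped in Python's cp1252 codec.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as handle:
    handle.write(b"abc\x90def")

text, used = read_text_with_fallback(demo_path)
print(used)
```

With pandas the same idea applies through the `encoding` argument of `pd.read_csv`.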

ocean mountain Nov 30, 2020, 11:43 AM

#

Thanks

#

I am trying this now

fallow prism Nov 30, 2020, 11:43 AM

#

your file is using another encoding and you have to find out which one it is

ocean mountain Nov 30, 2020, 11:43 AM

#

Thanks 👍

fallow prism Nov 30, 2020, 11:43 AM

#

😸

ocean mountain Nov 30, 2020, 11:44 AM

#

I am implementing it

#

😌

fallow prism Nov 30, 2020, 11:44 AM

#

do you know a site to learn about CNNs?

ocean mountain Nov 30, 2020, 11:44 AM

#

Nop bro

#

Are you in Hacktoberfest grp bro ?

fallow prism Nov 30, 2020, 11:44 AM

#

what is that?

#

hahaha

ocean mountain Nov 30, 2020, 11:45 AM

#

An open source event grp

fallow prism Nov 30, 2020, 11:45 AM

#

tell me more

ocean mountain Nov 30, 2020, 11:45 AM

#

Which happens in October every year

#

More than 70k Dev's take part in it

#

It's the biggest open source event

fallow prism Nov 30, 2020, 11:46 AM

#

grumpchib but is november now

ocean mountain Nov 30, 2020, 11:47 AM

#

Yep

copper kindle Nov 30, 2020, 12:16 PM

#

Any expertise using the orange software for data analysis and visualization ? what is the difference between t-SNE block and manifold t-SNE block? why both results are not the same ?

lapis sequoia Nov 30, 2020, 1:47 PM

#

lapis sequoia ``The specific function (tf.keras.preprocessing.image_dataset_from_directory) is...

can someone help me to implement it?

lapis sequoia Nov 30, 2020, 1:58 PM

#

lapis sequoia ``tf.keras.preprocessing.image_dataset_from_directory`` turns image files sorted...

https://stackoverflow.com/questions/54921711/interactive-labeling-of-images-in-jupyter-notebook

Interesting post about image labeling in Python. Maybe you can extract the important part for your function

Stack Overflow

Interactive labeling of images in jupyter notebook

I have a list of pictures:

pictures = {im1,im2,im3,im4,im5,im6}
Where

im1:
im2:
im3:
im4:
im5:
im6:
I want to assign the pictures to labels (1,2,3,4 etc.)

For instance, here pictures 1 t...

fallow prism Nov 30, 2020, 2:06 PM

#

what does it mean when the shape of an array is (n,)?

#

why ','?

lapis sequoia Nov 30, 2020, 2:09 PM

#

fallow prism whats mean the shape of one array is (n,)?

Be precise. In which context is it used? Documentation or code?

leaden vessel Nov 30, 2020, 2:13 PM

#

lf help with periodogram of sinusoidal signals with normalized frequency and dB power

#

signal:

N = 1024; f1 = 500; f2 = 1200; fs = 8000 #Hz
n = np.arange(N)
Sn = 0.5*np.sin(2*np.pi*n*f1/fs) + np.sin(2*np.pi*n*f2/fs)

#

I have to generate PSD (power spectral density) with and without Hann window

#

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.periodogram.html
returns V**2/Hz
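A possible way to get both PSDs in dB over normalized frequency is sketched below. It reuses the signal defined above and `scipy.signal.periodogram`; the helper `psd_db`, the choice of normalizing by the Nyquist frequency (so the axis runs from 0 to 1) and the small floor that avoids `log10(0)` are assumptions, not requirements from the task.

```python
import numpy as np
from scipy import signal

N = 1024; f1 = 500; f2 = 1200; fs = 8000  # Hz
n = np.arange(N)
Sn = 0.5*np.sin(2*np.pi*n*f1/fs) + np.sin(2*np.pi*n*f2/fs)

def psd_db(x, fs, window):
    """Periodogram in dB against frequency normalized to Nyquist (0..1)."""
    freqs, pxx = signal.periodogram(x, fs=fs, window=window, scaling="density")
    norm_freq = freqs / (fs / 2)
    # pxx is in V**2/Hz; 10*log10 converts power to dB.
    pxx_db = 10 * np.log10(np.maximum(pxx, 1e-20))
    return norm_freq, pxx_db

nf_rect, db_rect = psd_db(Sn, fs, "boxcar")  # no window (rectangular)
nf_hann, db_hann = psd_db(Sn, fs, "hann")    # Hann window

# Frequency of the strongest peak in each estimate, converted back to Hz.
print(nf_rect[np.argmax(db_rect)] * fs / 2)
print(nf_hann[np.argmax(db_hann)] * fs / 2)
```

The two arrays can then be plotted with `plt.plot(nf_rect, db_rect)` and `plt.plot(nf_hann, db_hann)` to compare the leakage with and without the window.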

lapis sequoia Nov 30, 2020, 2:17 PM

#

lapis sequoia https://stackoverflow.com/questions/54921711/interactive-labeling-of-images-in-j...

Hmm, this is not what I was looking for, I believe. I have my dataset like this https://gyazo.com/10ad185f8027af44c0e9e2edb9200a6f and each folder has the images. All the images in each folder have the folder name as their label. I was planning on using about 80% of the images in each folder as training data for that specific label, and the rest for validation. I just don't know how to tell that to Keras.

Gyazo
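The folder-per-label layout described above can be turned into labeled train and validation lists by hand. The sketch below uses only the standard library; `split_by_folder`, the suffix list and the 80/20 default are assumptions for illustration, and loading the image files into tensors is left out.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def split_by_folder(root, train_fraction=0.8, seed=0):
    """Label each image by its folder name and split every class separately."""
    rng = random.Random(seed)
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    train, val = [], []
    for label, name in enumerate(class_names):
        files = sorted(
            f for f in (root / name).iterdir()
            if f.suffix.lower() in IMAGE_SUFFIXES and not f.name.startswith(".")
        )
        rng.shuffle(files)  # shuffle so the split is not ordered by file name
        cut = int(len(files) * train_fraction)
        train += [(str(f), label) for f in files[:cut]]
        val += [(str(f), label) for f in files[cut:]]
    return class_names, train, val
```

Calling `split_by_folder("path/to/dataset")` returns the class names plus two lists of `(file_path, label_index)` pairs. On TensorFlow versions without `image_dataset_from_directory`, `ImageDataGenerator(validation_split=0.2).flow_from_directory(...)` with `subset="training"` or `subset="validation"` may be a ready-made alternative.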

fallow prism Nov 30, 2020, 2:24 PM

#

lapis sequoia Be precise. In which context is it used? Documentation or code?

for example, I run my_array.shape, where 'my_array' is a NumPy array, and its shape is (102,) and not (102, 1), for example. Why?

clever vapor Nov 30, 2020, 2:29 PM

#

hey

#

so guys i wanna make a program that does not specify my needs

#

do any of you have any recommendations django?

#

or anythihg else?

lapis sequoia Nov 30, 2020, 2:35 PM

#

fallow prism for example i run the code my_array.shape() where 'my_array' is a numpy array, t...

Now I understand the question. Okay, I'll make it easy.

An array with shape (500, 1) is 2-dimensional: 500 rows and 1 column.
LIKE --> np.array([[1],[2],[3]...[n]])

An array with shape (500,) is 1-dimensional and has 500 elements.
LIKE --> np.array([1,2,3 ... n])

#

It just says you have an array with n elements and not a 2-dimensional array with n x m elements. The trailing comma is there because the shape is a tuple with a single entry.
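A short sketch of the difference, using a small made-up array: `(5,)` is a one-element tuple (the comma is what makes it a tuple), and `reshape` and `ravel` convert between the 1-D and the column form.

```python
import numpy as np

flat = np.arange(5)            # 1-D array with 5 elements
column = flat.reshape(-1, 1)   # 2-D array with 5 rows and 1 column
back = column.ravel()          # flattened to 1-D again

print(flat.shape, flat.ndim)
print(column.shape, column.ndim)
print(back.shape)

# (5,) is a tuple with one entry; (5) without the comma is just the int 5.
print(isinstance((5,), tuple), isinstance((5), int))
```

Some libraries expect the column form for features and the flat form for labels, which is where the distinction usually starts to matter.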

#

@fallow prism if you need more, i can send you some good StackOverflow links

fallow prism Nov 30, 2020, 2:46 PM

#

of course!! thank you @lapis sequoia you were clear

lapis sequoia Nov 30, 2020, 2:47 PM

#

fallow prism of course!! thank you @lapis sequoia you were clear

No problem. Better than study today 🙃

fallow prism Nov 30, 2020, 2:59 PM

#

hahaha it's worse to study on a tuesday

wintry olive Nov 30, 2020, 3:18 PM

#

i found this; https://neo4j.com/

Neo4j Graph Database Platform

Neo4j Graph Platform – The Leader in Graph Databases

Neo4j is the graph database platform powering mission-critical enterprise applications like artificial intelligence, fraud detection and recommendations.

#

Seems like a lot of adoption for this graph data platform

#

id converge all your datasets there

#

aside from learning playgrounds or research and development

#

all your startups career business medical IT security et all

final scaffold Nov 30, 2020, 3:30 PM

#

Hi! I need a quick help regarding group by in pandas

wintry olive Nov 30, 2020, 3:30 PM

#

unless there is a better option?

final scaffold Nov 30, 2020, 3:31 PM

#

📎 IMG_20201130_210134.jpg

#

Dataset looks like this^ ...ignore column event time.

#

I want to groupby the dataset by install time, event name, campaign, and siteid...and sum event revenue and add a new column which counts rows of event name
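The grouping described above could look roughly like this. The real column names are only visible in the attached screenshot, so `install_time`, `event_name`, `campaign`, `siteid` and `event_revenue` are assumed spellings, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the dataset from the screenshot.
events = pd.DataFrame({
    "install_time": ["2020-11-01", "2020-11-01", "2020-11-01", "2020-11-02"],
    "event_name": ["purchase", "purchase", "login", "purchase"],
    "campaign": ["A", "A", "A", "B"],
    "siteid": [1, 1, 1, 2],
    "event_revenue": [5.0, 7.5, 0.0, 3.0],
})

summary = (
    events
    .groupby(["install_time", "event_name", "campaign", "siteid"])
    .agg(
        total_revenue=("event_revenue", "sum"),  # summed revenue per group
        event_count=("event_revenue", "size"),   # number of rows per group
    )
    .reset_index()
)
print(summary)
```

`size` counts every row in the group, including rows where the revenue is missing; `count` would skip missing values instead.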

wintry olive Nov 30, 2020, 3:50 PM

#

there is this snag: 12. No Export. You agree and certify that neither the Product nor any other technical data received from Neo4j,~~~~

lapis sequoia Nov 30, 2020, 3:51 PM

#

final scaffold

https://docs.aspose.com/cells/java/grouping-and-ungrouping-rows-and-columns-in-python/

Aspose Documentation

Grouping and Ungrouping Rows and Columns in Python

Class Libraries & REST APIs for the developers to manipulate & process Files from Word, Excel, PowerPoint, Visio, PDF, CAD & several other categories in Web, Desktop or Mobile apps. Develop & deploy on Windows, Linux, MacOS & Android platforms.

wintry olive Nov 30, 2020, 3:55 PM

#

for a startup that wants to step up and build their own platform after using the graph dataset that might be an issue everything up to that part of the agreement was solid. The question is tho would their platform handle model cache, validation test scores and metrics so data plus set plus model card plus

wide void Nov 30, 2020, 4:00 PM

#

Hello everyone! Im having a bit of an issue and was wondering if you can help me. Im trying to train a neural network. Im having issues with fitting my model. When it goes into the directory where my images are it prepends "._" before my image name and can't for the life of me figure out why.

swift gyro Nov 30, 2020, 4:01 PM

#

Are you using any libraries? Keras / Tensorflow?

wide void Nov 30, 2020, 4:01 PM

#

Both

wintry olive Nov 30, 2020, 4:02 PM

#

oh yeah neo4Jj = awesomeness

swift gyro Nov 30, 2020, 4:02 PM

#

Can you give a short run down of the model?

#

I've had this problem before and I think it was something with a conv2d, but I don't know how I fixed it. Programming 101

wide void Nov 30, 2020, 4:04 PM

#

Im using a mobilenet I fine tuned by removing the last 6 layers and added a Dense layer at the end as output

swift gyro Nov 30, 2020, 4:04 PM

#

Linear activation, adam?

wide void Nov 30, 2020, 4:04 PM

#

softmax, adam

swift gyro Nov 30, 2020, 4:05 PM

#

have you added breakpoints and tried to see where the name changes?

wide void Nov 30, 2020, 4:06 PM

#

Negative. im a bit of a noob. I'll try that now.

swift gyro Nov 30, 2020, 4:07 PM

#

Kk. (Don't worry, I'm no developer, just a High schooler with youtube and LinkedIn Learning, also a noob)

wide void Nov 30, 2020, 4:12 PM

#

Im not sure what Im looking at. Is it ok to post the error on here?
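On the "._" file names mentioned at the start of this exchange: one possible cause, which is an assumption and not confirmed in the thread, is that the image folder was copied or unzipped from macOS. macOS can leave `._name` companion files next to the real files, and a loader that lists every file will pick them up. A minimal sketch for listing only the real images follows; `list_images` and the suffix list are made up for illustration.

```python
from pathlib import Path

def list_images(directory, suffixes=(".jpg", ".jpeg", ".png")):
    """Return image paths under directory, skipping '._' companion files."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file()
        and not p.name.startswith("._")
        and p.suffix.lower() in suffixes
    )
```

If the listing shows such companion files, deleting them (or building the dataset from the filtered list) may be enough for the training step to read the folder.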

azure stump Nov 30, 2020, 4:22 PM

#

https://medium.com/analytics-vidhya/data-lakes-vs-data-warehouses-7813d6563280?sk=484c9f4a7e7efb8ab2f1f052b4dfe190

Medium

Data Lakes vs Data Warehouses

Is there really a difference between them?😮

wintry olive Nov 30, 2020, 4:30 PM

#

https://grandstack.io/ yoooo

Build Fullstack GraphQL Applications With Ease | GRANDstack

#

not only is it a multi-code editor on the datagraph side but this architect app lets you build dataset with point and click & code

#

thats kind of what I was thinking of

earnest forge Nov 30, 2020, 4:40 PM

#

What is this sort of visualisation called?

#

📎 unknown.png

wintry olive Nov 30, 2020, 4:41 PM

#

wait its

#

scatterplot with meta waveform over top

#

that is neat

#

if you go up one more layer or dimension guess what it is....

#

the initial start of a statistical fractal

#

probably sounds more useful then it is or poetic really

earnest forge Nov 30, 2020, 4:49 PM

#

I am curious how to make this waveform

wintry olive Nov 30, 2020, 4:53 PM

#

its the first example I have seen its like a graph layered over a graph the waveform itself is probably statistical deviation from zero but i saw its relation to the point cloud right away

earnest forge Nov 30, 2020, 4:55 PM

#

oh. I see you are decently competent in statistics and maths too, right? I've got a question related to percentiles, though...

wintry olive Nov 30, 2020, 4:56 PM

#

be neat if the waveform could animate although its more like....

#

a standing wave 🙂

earnest forge Nov 30, 2020, 4:56 PM

#

considering this graph, I can see a correlation. anyway

wintry olive Nov 30, 2020, 4:57 PM

#

📎 unknown.png

#

i briefly looked at statistics earlier still have to study

split eagle Nov 30, 2020, 5:00 PM

#

I am working with a pandas df in a Jupyter notebook and am trying to drop rows on the condition that (df['overall_status'] == 'Recruiting') and df['Fraction accrued'] is NaN. I have tried using the functions .isna(), .isnull(), and also tried df['Fraction accrued'].replace('', np.nan, inplace=True) followed by df['Fraction accrued'] == 'True'. I get the error: "unhashable type: 'list'". Here's my full code:

#

index_names = df_cancer_drop.drop([((df_cancer_drop['overall_status']=='Recruiting') & (df_cancer_drop['Fraction accrued'].isna()))].index)
df_cancer=df_cancer_drop.drop(index_names,inplace=True)

#

How can I correctly write the logical statement to drop these rows?
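A likely fix, sketched on made-up rows: in the code above the square brackets build a plain Python list around the mask, so `.index` there refers to the list's own `index` method rather than a pandas Index, and `drop(..., inplace=True)` returns `None`. Filtering with the negated boolean mask avoids both problems.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the real data.
df_cancer_drop = pd.DataFrame({
    "overall_status": ["Recruiting", "Recruiting", "Completed", "Completed"],
    "Fraction accrued": [0.4, np.nan, np.nan, 0.9],
})

# True for the rows that should be removed.
to_drop = (
    (df_cancer_drop["overall_status"] == "Recruiting")
    & df_cancer_drop["Fraction accrued"].isna()
)

# Keep everything that does not match the condition.
df_cancer = df_cancer_drop[~to_drop]
print(df_cancer)
```

The same result can be written with `drop` as `df_cancer_drop.drop(df_cancer_drop[to_drop].index)`, without `inplace=True` when the result is assigned to a new name.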

earnest forge Nov 30, 2020, 5:01 PM

#

I have the following array
a = [ 1, -9, -15, -11, -19, 2, -15, 3, 8, -8, -5, -14, -5, 1, -19]

And when I'm computing np.percentile(a, 99)
I get this confusing output: 7.299999999999997

#

Shouldn't it simply return -19?

wintry olive Nov 30, 2020, 5:02 PM

#

yeah there is a correlation im just not sure exactly what to call it perhaps values of y but if data visualization is analytics what is that telling me...

copper kindle Nov 30, 2020, 5:05 PM

#

earnest forge I have the following array `a = [ 1, -9, -15, -11, -19, 2, -15, 3, 8, -...

nope its correct.

copper kindle Nov 30, 2020, 5:06 PM

#

earnest forge I have the following array `a = [ 1, -9, -15, -11, -19, 2, -15, 3, 8, -...

a = [  1,  -9, -15, -11, -19,   2, -15,   3,   8,  -8,  -5, -14,  -5, 1, -19]

a = sorted(a)
print(np.percentile(a, 99))

to calculate percentiles by hand, your data must be sorted as well.

earnest forge Nov 30, 2020, 5:06 PM

#

I've only learned the basics of percentiles, such as 25/50/75, so how is it computed for the less 'boring' values?

copper kindle Nov 30, 2020, 5:07 PM

#

once your data is sorted you can calculate correct percentiles by hand. But numpy calculates the correct percentiles prolly by sorting the array during calculation.

earnest forge Nov 30, 2020, 5:07 PM

#

oh, i got it. after sorting the array I got 8 as the last element and 3 as the penultimate. so it explains it now
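The by-hand calculation can be sketched in plain Python. It assumes the linear-interpolation rule that NumPy uses by default: the percentile sits at fractional position `q/100 * (n - 1)` in the sorted data and is interpolated between its two neighbours. `percentile_linear` is a made-up helper name.

```python
import math

def percentile_linear(values, q):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)               # sorting happens inside the function
    rank = (q / 100) * (len(ordered) - 1)  # fractional position in the sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

a = [1, -9, -15, -11, -19, 2, -15, 3, 8, -8, -5, -14, -5, 1, -19]
print(percentile_linear(a, 99))
```

For this array the position is 0.99 * 14 = 13.86, which lies between the sorted values 3 and 8, so the result is 3 + 0.86 * 5, about 7.3. The order of the input does not change the result, because the function sorts it first.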

#

thanks 😄

wintry olive Nov 30, 2020, 5:08 PM

#

can numpy cancel out the positive and negative integers?

earnest forge Nov 30, 2020, 5:08 PM

#

wintry olive its the first example I have seen its like a graph layered over a graph the wave...

btw, I grabbed this picture from sklearn site, there was a comparison of different scalers

wintry olive Nov 30, 2020, 5:09 PM

#

i just noticed scalar on the graph data platform

copper kindle Nov 30, 2020, 5:09 PM

#

earnest forge thanks 😄

happy to help 😄

wintry olive Nov 30, 2020, 5:09 PM

#

first example i have seen

#

i got a bit excited 🙂

#

if i do cancel them out would that reduce noise or would i lose value?

copper kindle Nov 30, 2020, 5:12 PM

#

earnest forge

The data value plotting may show the correlation (there can be a positive correlation if noise is reduced) and the x,y graphs might show the data distributions.

wicked meadow Nov 30, 2020, 5:15 PM

#

Trying to work on a python script that will save a string as a .sql file, anybody know how to go about this?

austere swift Nov 30, 2020, 5:18 PM

#

sql seems more of a #databases thing to me

copper kindle Nov 30, 2020, 5:18 PM

#

wicked meadow Trying to work on a python script that will save a string as a .sql file, anybo...

try this

with open('your_said_.sql', 'w') as file:
  file.write("stuff you want to write")
file.close()

austere swift Nov 30, 2020, 5:19 PM

#

copper kindle try this ```py with open('your_said_.sql', 'w') as file: file.write("stuff yo...

you don't need the file.close() since you have the with block

copper kindle Nov 30, 2020, 5:20 PM

#

austere swift you don't need the file.close() since you have the with block

isnt it better to close the file too ?

austere swift Nov 30, 2020, 5:20 PM

#

with automatically closes it thats why
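A small sketch that makes the point above visible: the file object reports `closed` as `False` inside the `with` block and `True` right after it. The temporary path and the SQL text are placeholders.

```python
import os
import tempfile

target = os.path.join(tempfile.mkdtemp(), "example.sql")

with open(target, "w") as handle:
    handle.write("SELECT 1;\n")
    closed_inside = handle.closed  # still open inside the block

closed_after = handle.closed       # closed automatically on leaving the block
print(closed_inside, closed_after)
```

The file is also closed when the block is left through an exception, which is the main reason the `with` form is usually preferred over a manual `close()`.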

copper kindle Nov 30, 2020, 5:20 PM

#

austere swift with automatically closes it thats why

you learn something new everyday haha. niceeeee 👍

wicked meadow Nov 30, 2020, 5:20 PM

#

thanks i'll try that

austere swift Nov 30, 2020, 5:20 PM

#

yep :)

wicked meadow Nov 30, 2020, 5:22 PM

#

I assume I'll be able to pass variables as the names of the files?

copper kindle Nov 30, 2020, 5:22 PM

#

wicked meadow I assume I'll be able to pass variables as the names of the files?

yes you can.

#

but you need to concatenate .sql with the variable aswell.

for example your_variable + ".sql"

wicked meadow Nov 30, 2020, 5:23 PM

#

Okay cool. How will that work with the quotation marks?

copper kindle Nov 30, 2020, 5:23 PM

#

wait a minute.

#

your variable holds the sql commands ?

wicked meadow Nov 30, 2020, 5:24 PM

#

no i just want the variable to be the names of the file

copper kindle Nov 30, 2020, 5:24 PM

#

alright then the aobe thing works.

#

above*
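Building the file name from a variable, as discussed above, can be sketched like this. `save_sql` and its arguments are made-up names; the f-string does the same job as concatenating the name with `".sql"`.

```python
import tempfile
from pathlib import Path

def save_sql(directory, name, sql_text):
    """Write sql_text to '<name>.sql' inside directory and return the path."""
    path = Path(directory) / f"{name}.sql"
    path.write_text(sql_text, encoding="utf-8")
    return path

out_dir = tempfile.mkdtemp()  # placeholder output folder
table_name = "customers"      # the variable that becomes the file name
saved = save_sql(out_dir, table_name, "SELECT * FROM customers;\n")
print(saved.name)
```

The quotation marks only delimit the string literal `".sql"` in the source code; they do not end up in the file name.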

wicked meadow Nov 30, 2020, 5:25 PM

#

Okay cool thanks

stone tangle Nov 30, 2020, 6:16 PM

#

In a django project where could I manipulate data for data science? any help is much appreciated!

#

also sorry if this is not the right channel

calm forge Nov 30, 2020, 6:20 PM

#

You could just @/# something here but if your question is about data science I would leave it

stone tangle Nov 30, 2020, 6:21 PM

#

ok cool

calm forge Nov 30, 2020, 6:24 PM

#

sure

lapis sequoia Nov 30, 2020, 6:27 PM

#

I know you can make a Django website that changes data live when your data changes. I would think you can also change the data from the website too; it sounds a little more advanced, though.

stone tangle Nov 30, 2020, 6:27 PM

#

yea I am kinda a noob

lapis sequoia Nov 30, 2020, 6:29 PM

#

I suggest learning the basics of Django before doing that type of stuff, so it will be less complicated when you get there and there will be fewer errors to deal with. It will probably save days of your life.

#

I am sure you knew that, but you can never be too sure.

stone tangle Nov 30, 2020, 6:30 PM

#

ok, I might try a udemy course or something

tight dove Nov 30, 2020, 7:28 PM

#

Are JSON and JSON-stat different formats? Do I need to use a different library to handle JSON-stat?

plucky spindle Nov 30, 2020, 7:45 PM

#

@stone tangle You could also check out Streamlit or Flask, it's a bit easier to do things like Machine Learning as a Service (MLaaS)

stone tangle Nov 30, 2020, 7:50 PM

#

ok will do

snow compass Nov 30, 2020, 8:26 PM

#

wait this might be a better place for pandas dataframe questions. I want to iterate through a list of dataframes. I have a number of functions where the function takes the dataframe as an argument. But! I want to do different things depending on which dataframe in the list is being put into the function.


```python
#define my functions first
def function1(df):
    if df.name == df1: #doesn't actually work!!!
        #do thing
    elif df.name == df2:
        #do thing differently

#same basic structure for the rest of my functions

df_list = [df1, df2, df3, df4, df5, df6]

for df in df_list:
    df.name = df #(doesn't actually work)
    function1(df=df)
    function2(df=df)
```
is basically what my stuff looks like.

but I can't do it that way. 

A pandas *series* can be given a name attribute but not a dataframe. 

if df.name == df1: **#doesn't actually work!!!**
*ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any(), or a.all().*

So of course I google the error and check the top SO links. I try to create a dictionary as a top reply suggests. 

```python
dfs = {'some_label' : df} #is what they type out
```

but when I try to use df.name = dfs[df] or dfs = {df1 : 'df1' , df2 : 'df2'} I get TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed from both of those. I must not be using a dictionary right or this isn't a good solution.

I would like to be able to keep the inside of my functions along the lines of
if df1 then a
elif df2 then b
elif df3 then c

but, well, the ways I've gone about this are giving me error messages (tried .name, tried making a dict.) help?
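The dictionary idea from the Stack Overflow answer works when the label is the key and the DataFrame is the value; DataFrames cannot be used as keys because they are not hashable. A minimal sketch with made-up frames and a made-up branch per label:

```python
import pandas as pd

# Hypothetical frames standing in for df1, df2, ...
df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"x": [10, 20]})

# Label -> DataFrame, so the label travels alongside the data.
frames = {"df1": df1, "df2": df2}

def function1(name, df):
    """Branch on the label instead of comparing DataFrames with ==."""
    if name == "df1":
        return df["x"].sum()
    elif name == "df2":
        return df["x"].mean()
    return len(df)

results = {name: function1(name, df) for name, df in frames.items()}
print(results)
```

If the functions must keep taking only the DataFrame, an identity check such as `if df is df1:` should also avoid the ambiguous-truth-value error, since `is` compares objects rather than their contents element by element.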

civic fractal Nov 30, 2020, 9:48 PM

#

Anyone familiar with pandas willing to help out?

#

for a min

haughty ingot Nov 30, 2020, 9:48 PM

#

like that @spark dirge

📎 unknown.png

civic fractal Nov 30, 2020, 9:53 PM

#

civic fractal Anyone familiar with pandas willing to help out?

https://stackoverflow.com/questions/65076439/how-to-get-rid-of-rows-with-pandas-in-a-csv-where-the-value-of-cells-in-a-specif

Stack Overflow

How to get rid of rows with pandas in a CSV where the value of cell...

I'm trying to filter through a CSV and make a new CSV which is the exact same except for it gets rid of any rows that have a value of greater than 100 billion in the 'marketcap' column.
The code I've

snow compass Nov 30, 2020, 9:57 PM

#

oh! muskrat

#

maybe you can use WHERE? or REPLACE?
numpy.where(condition[, x, y])
Where True, yield x, otherwise yield y.

like, iterate through the column and the condition is true then keep but false replace with nan and then remove nans?

I feel like I had to do something like this once, hang on let me go over my recent projects

#

df = df.loc[df['marketcap'] <= 100000000000]

#

@civic fractal what happens when you try that?

snow compass Nov 30, 2020, 10:12 PM

#

haughty ingot like that @spark dirge

scropie that for me?

haughty ingot Nov 30, 2020, 10:17 PM

#

no

#

this for @spark dirge

#

like that @spark dirge

📎 unknown.png

spark dirge Nov 30, 2020, 10:18 PM

#

haughty ingot no

not sure what you are trying to do. looking for context.

snow compass Nov 30, 2020, 10:20 PM

#

I'm just trying to get some outside eyes on my dataframe problem and then elongatedmuskrat posted after me so I tried to help with their problem and now I'm just chilling here

spark dirge Nov 30, 2020, 10:21 PM

#

df_list = [df1, df2, df3, df4, df5, df6]
df_mp = {}
for df in df_list:
  df_mp[df.name] = some_func(df)
print(df_mp)

#

You just want a list of dataframes sent through a function and in a collection?

civic fractal Nov 30, 2020, 10:32 PM

#

snow compass maybe you can use WHERE? or REPLACE? numpy.where(condition[, x, y]) Where True, ...

I'll try that now. Sorry I went offline for a bit haha

boreal summit Nov 30, 2020, 10:33 PM

#

@spark dirge what you tryna do exactly?

snow compass Nov 30, 2020, 10:37 PM

#

the inside of some of my functions look like

if df == df1:
    #do thing
if df == df2:
    #do other thing

but I get that value error. I thought I could assign a name to each df and then instead it's if df.name == '' then do thing

but that hasn't worked

I went to SO and read up a similar problem and the person was advised to make a dictionary and I feel I must be doing something wrong because THAT gives me an error
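
For context on both errors: `df == df1` compares element by element and returns a whole DataFrame of booleans, so `if` cannot reduce it to a single True/False, and a DataFrame cannot be a dictionary key because it is not hashable. A rough sketch of the dictionary idea with the labels as keys and the frames as values follows; `process` and the sample frames are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical frames standing in for df1, df2, ...
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Labels are the keys (strings are hashable); the frames are the values
dfs = {"df1": df1, "df2": df2}

def process(name, df):
    # Branch on the label rather than comparing DataFrames with ==
    if name == "df1":
        return df["x"].sum()
    elif name == "df2":
        return df["x"].mean()
    raise KeyError(name)

results = {name: process(name, df) for name, df in dfs.items()}
```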

#

https://stackoverflow.com/questions/31727333/get-the-name-of-a-pandas-dataframe/31727504#31727504

In many situations, a custom attribute attached to a pd.DataFrame object is not necessary. In addition, note that pandas-object attributes may not serialize. So pickling will lose this data.

Instead, consider creating a dictionary with appropriately named keys and access the dataframe via dfs['some_label'].

df = pd.DataFrame()

dfs = {'some_label': df}

Stack Overflow

Get the name of a pandas DataFrame

How do I get the name of a DataFrame and print it as a string?

Example:

boston (var name assigned to a csv file)

import pandas as pd
boston = pd.read_csv('boston.csv')

print('The winner is te...

spark dirge Nov 30, 2020, 10:45 PM

#

boreal summit @spark dirge what you tryna do exactly?

I'm trying to help Silver and scropie. not sure what they are trying to do though.

velvet thorn Nov 30, 2020, 11:12 PM

#

@snow compass two simple ways

#

which more or less lead to the same thing

#

create a list of (df, function) tuples

#

and iterate through that

snow compass Nov 30, 2020, 11:23 PM

#

I mean, I need all of my dataframes to be put through all of the functions. it's just I need to write to a different row depending on which dataframe is being run, for example.

real wigeon Nov 30, 2020, 11:26 PM

#

how to change the timezone in a timestamp column

#

pandas

#

im pulling a report from my db

#

and need to change the time to est (everyone using this app will be on est)

austere swift Dec 1, 2020, 12:06 AM

#

!d pandas.Series.dt.tz_convert

arctic wedgeBOT Dec 1, 2020, 12:06 AM

#

`pandas.Series.dt.tz_convert`

`Series.dt.tz_convert(*args, **kwargs)`
Convert tz-aware Datetime Array/Index from one time zone to another.

Parameters  **tz**str, pytz.timezone, dateutil.tz.tzfile or NoneTime zone for time. Corresponding timestamps would be converted to this time zone of the Datetime Array/Index. A tz of None will convert to UTC and remove the timezone information.

Returns  Array or Index   Raises  TypeErrorIf Datetime Array/Index is tz-naive.

See also

[`DatetimeIndex.tz`](pandas.DatetimeIndex.tz.html#pandas.DatetimeIndex.tz "pandas.DatetimeIndex.tz")A timezone that has a variable offset from UTC.

[`DatetimeIndex.tz_localize`](pandas.DatetimeIndex.tz_localize.html#pandas.DatetimeIndex.tz_localize "pandas.DatetimeIndex.tz_localize")Localize tz-naive DatetimeIndex to a given time zone, or remove timezone from a tz-aware DatetimeIndex.

Examples

With the tz parameter, we can change the DatetimeIndex to other time zones:... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.tz_convert.html#pandas.Series.dt.tz_convert)

austere swift Dec 1, 2020, 12:06 AM

#

@real wigeon

#

thats from converting time-zone aware columns

#

if your column isnt time-zone aware you'd need to make it time-zone aware

#

!d pandas.Series.dt.tz_localize

arctic wedgeBOT Dec 1, 2020, 12:07 AM

#

`pandas.Series.dt.tz_localize`

`Series.dt.tz_localize(*args, **kwargs)`
Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.

This method takes a time zone (tz) naive Datetime Array/Index object and makes this time zone aware. It does not move the time to another time zone. Time zone localization helps to switch from time zone aware to time zone unaware objects.

Parameters  **tz**str, pytz.timezone, dateutil.tz.tzfile or NoneTime zone to convert timestamps to. Passing `None` will remove the time zone information preserving local time.

**ambiguous**‘infer’, ‘NaT’, bool array, default ‘raise’When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.

• ‘infer’ will attempt to infer fall dst-transition hours based on order
... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.tz_localize.html#pandas.Series.dt.tz_localize)

real wigeon Dec 1, 2020, 12:18 AM

#

Thank you @austere swift

wintry olive Dec 1, 2020, 12:25 AM

#

https://media.discordapp.net/attachments/366673247892275221/783009575917846538/unknown.png

#

I havent had a chance to research scalar models yet but the meta waveform could be a way to get a glance at the distribution or symmetry between the two axis without having to scope out the numbers

#

a visual aid layer not a meta or extrapolation or just a visual aid for the deviation from zero or the symmetry of the two almost as if some law of large numbers set

#

the distribution is symmetrical but the vertical waveform is front whereas the horizontal waveform is in the middle

hollow gull Dec 1, 2020, 12:31 AM

#

snow compass wait this might be a better place for pandas dataframe questions. I want to iter...

It seems like you are being sort of particular about how you do this without letting us know why, so expect a lot of solutions that don't quite hit your requirements (because we don't know them / why they exist)

#define my functions first
def function1(df, dfname):
    if dfname == 'df1': #compare the label string, not the DataFrame
        pass #do thing
    elif dfname == 'df2':
        pass #do thing differently

#same basic structure for the rest of my functions

dict_dfs = dict()
dict_dfs['df1'] = df1
dict_dfs['df2'] = df2
dict_dfs['df3'] = df3
dict_dfs['df4'] = df4

for dfname in dict_dfs.keys():
    function1(df=dict_dfs[dfname], dfname=dfname)

wintry olive Dec 1, 2020, 12:33 AM

#

the offset could be the rate of the axis range the vertical increase by 1 so its waveform is always front where as the horizonal increases an even amount along the axis so its waveform is right in the middle

hollow gull Dec 1, 2020, 12:33 AM

#

@snow compass Maybe a better way of doing this is to build custom classes where the different functions do different things depending on the class that you pass. But that would only be better if you had multiple dataframes of the same type that should have the same thing done if they are passed to the same function.

wintry olive Dec 1, 2020, 12:34 AM

#

id very much like to see how the graph looks at different intervals same axis range but increases that are unusual

#

the following ideas are highly experimental.

#

i would like to have 2 more graphs in a grid exact opposites counting down from the max established from before. This creates a min max wave

#

a nice symmetrical one at that.

#

id iterate two more times then have the fourth layer be only min/max and target data the rest is truncated as noise.

real wigeon Dec 1, 2020, 1:19 AM

#

austere swift if your column isnt time-zone aware you'd need to make it time-zone aware

how could i check if it's aware

austere swift Dec 1, 2020, 1:22 AM

#

!d pandas.Series.dt.tz

arctic wedgeBOT Dec 1, 2020, 1:22 AM

#

`pandas.Series.dt.tz`

`Series.dt.tz`
Return timezone, if any.

Returns  datetime.tzinfo, pytz.tzinfo.BaseTZInfo, dateutil.tz.tz.tzfile, or NoneReturns None when the array is tz-naive.

austere swift Dec 1, 2020, 1:22 AM

#

if that's None then it's not aware

real wigeon Dec 1, 2020, 1:35 AM

#

doing this

#

timestamps = df["upload_timestamp"].dt.tz
print(timestamps)

resulted in `None`

#

i presume im selecting the df column properly

#

erm im kind of a noob

#

If i remember correctly pandas is kind of weird

red briar Dec 1, 2020, 1:37 AM

#

# assumes: from dateutil import tz

  def convert_timezone(self, x):
       from_zone = tz.gettz('UTC')
       to_zone = tz.gettz('America/New_York')
       return x.replace(tzinfo=from_zone).astimezone(to_zone)

#

      df['Creation Date'] = df['Creation Date'].apply(lambda x:self.convert_timezone(x))
      df['Creation Date'] = df['Creation Date'].apply(lambda x:x.tz_localize(None))

real wigeon Dec 1, 2020, 1:39 AM

#

right but that's just applying that logic to the column

#

doesn't pandas handle that kind of weird, because the result is a series

#

and id need that as a part of the df

#

aren't they two separate entities now

real wigeon Dec 1, 2020, 1:41 AM

#

red briar ``` df['Creation Date'] = df['Creation Date'].apply(lambda x:self.convert_...

wont doing this result in a series object, which is considered separate from the pandas df?

#

or is doing df['Creation Date'] applying it to the df, but only to the column Creation Date

#

i am noob

red briar Dec 1, 2020, 1:43 AM

#

real wigeon or is doing ``df['Creation Date']`` applying it to the df, but only to the colum...

this will apply on ur date column only

real wigeon Dec 1, 2020, 1:50 AM

#

austere swift !d pandas.Series.dt.tz_localize

do i need to localize to my time zone, or mark it as UTC

#

because localize only makes it aware (my data in the db is UTC), it doesn't convert.

serene scaffold Dec 1, 2020, 2:05 AM

#

I have to make a Bayes classifier for a dataset where each object gets one continuous feature and its class label. But how do you even apply Bayes for continuous data?

#

binning?
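
Binning is one option. A common alternative is to model the feature within each class as a Gaussian and plug the density into Bayes' rule in place of a probability. The sketch below assumes the feature is roughly normally distributed per class; the data and the helper names are made up.

```python
import numpy as np

# Made-up training data: one continuous feature and a class label per object
x = np.array([1.0, 1.2, 0.8, 3.0, 3.3, 2.9])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([x[y == c].mean() for c in classes])
stds = np.array([x[y == c].std(ddof=1) for c in classes])

def gaussian_pdf(value, mean, std):
    # Density of a normal distribution, used as the class-conditional likelihood
    return np.exp(-0.5 * ((value - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def predict(value):
    # Posterior is proportional to prior * likelihood; pick the largest
    scores = priors * gaussian_pdf(value, means, stds)
    return classes[np.argmax(scores)]
```

If the per-class histograms look clearly non-Gaussian, binning or a kernel density estimate may fit better.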

austere swift Dec 1, 2020, 2:15 AM

#

real wigeon because ``localize`` only makes it aware (my data in the db is UTC), it doesn't ...

thats what the other function does

#

tz_convert

real wigeon Dec 1, 2020, 2:15 AM

#

yes but im asking

#

make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')

Since my data in the db is `UTC`

#

or should I set it to my local timezone

austere swift Dec 1, 2020, 2:17 AM

#

i mean you do tz_localize and then tz_convert
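
A sketch of the two-step approach, assuming the stored values are naive timestamps that represent UTC. The column name comes from the conversation; the sample value is made up. `tz_localize` only declares which zone the existing wall-clock values are in, and `tz_convert` is the step that shifts them.

```python
import pandas as pd

df = pd.DataFrame({"upload_timestamp": pd.to_datetime(["2020-11-26 23:26:27"])})

# Naive column: .dt.tz is None
was_naive = df["upload_timestamp"].dt.tz is None

# Step 1: declare what zone the stored values are already in (UTC here)
utc = df["upload_timestamp"].dt.tz_localize("UTC")

# Step 2: convert the now-aware values to Eastern time
eastern = utc.dt.tz_convert("America/New_York")

# Assign back so the result is part of the DataFrame
df["upload_timestamp"] = eastern
```

Localizing straight to Eastern would instead treat the stored 23:26 as already being Eastern wall-clock time, which is not what UTC data needs.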

real wigeon Dec 1, 2020, 2:17 AM

#

yes correct

#

but do you localize to UTC

#

or EST

#

the data is in UTC

#

i went with UTC

austere swift Dec 1, 2020, 2:19 AM

#

well in the examples it shows you could use est

#

📎 unknown.png

#

see how after the localization it shows -5:00

#

that means that when it localized with est it assumed the original values were utc

#

so i think you can just use that

#

I'm not completely sure tho lol

real wigeon Dec 1, 2020, 2:20 AM

#

ok cool

#

i mean it says that it does not convert

#

.>

#

alright well, idk how to place that column back into my df

austere swift Dec 1, 2020, 2:25 AM

#

instead of assigning that value to make_timestamps_tz_aware just assign it back to df["upload_timestamp"]

real wigeon Dec 1, 2020, 2:27 AM

#

what do you mean

austere swift Dec 1, 2020, 2:28 AM

#

df["upload_timestamp"] = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')

real wigeon Dec 1, 2020, 2:29 AM

#

oh

#

i was actually going to do this

#

make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)

austere swift Dec 1, 2020, 2:35 AM

#

that works too

real wigeon Dec 1, 2020, 2:37 AM

#

hmm it says though

#

it's not a date time index

#

austere swift Dec 1, 2020, 2:38 AM

#

oh its probably not in datetime format

#

!d pandas.to_datetime

arctic wedgeBOT Dec 1, 2020, 2:38 AM

#

`pandas.to_datetime`

pandas.to_datetime(arg: DatetimeScalar, errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → Union[DatetimeScalar, 'NaTType']
pandas.to_datetime(arg: 'Series', errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → 'Series'
pandas.to_datetime(arg: Union[List, Tuple], errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → DatetimeIndex
Convert argument to datetime.

Parameters  **arg**int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-likeThe object to convert to a datetime.

**errors**{‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
 • If ‘raise’, then invalid parsing will raise an exception.

• If ‘coerce’, then invalid parsing will be set as NaT.

• If ‘ignore’, then invalid parsing will return the input.

**dayfirst**bool, default FalseSpecify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

**yearfirst**bool, default FalseSpecify a date parse order if arg is str or its list-likes.

• If True parses dates with the year first, eg 10/11/12 is parsed as 2010-11-12.

• If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).
... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime)

austere swift Dec 1, 2020, 2:38 AM

#

use that to convert it

#

damn thats long

real wigeon Dec 1, 2020, 2:41 AM

#

uhh

#

when

austere swift Dec 1, 2020, 2:41 AM

#

before the localize stuff

real wigeon Dec 1, 2020, 2:44 AM

#

oh

#

that looks like its applied to the entire df?

austere swift Dec 1, 2020, 2:45 AM

#

no, you can do it on a single column

real wigeon Dec 1, 2020, 2:46 AM

#

update_to_datetime = df["upload_timestamp"].to_datetime

#

err

#

cuz idk

austere swift Dec 1, 2020, 2:46 AM

#

no, its not a function from the df its from pandas

real wigeon Dec 1, 2020, 2:46 AM

#

yeah

austere swift Dec 1, 2020, 2:47 AM

#

so pd.to_datetime(df["upload_timestamp"])

#

and then you'd need to set the format arg so it can see how to format it

#

oh oops typo

#

i put datatime instead of datetime lol

real wigeon Dec 1, 2020, 2:49 AM

#

looking up the formatting

austere swift Dec 1, 2020, 2:50 AM

#

its kinda like datetime strptime

real wigeon Dec 1, 2020, 2:50 AM

#

this is the current format

#

11/26/2020 11:26:27 PM

#

but it's in utc

austere swift Dec 1, 2020, 2:51 AM

#

https://stackoverflow.com/questions/17134716/convert-dataframe-column-type-from-string-to-datetime-dd-mm-yyyy-format

Stack Overflow

Convert DataFrame column type from string to datetime, dd/mm/yyyy f...

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetimes?

real wigeon Dec 1, 2020, 2:52 AM

#

ermm im over thinking

#

it probably accepts

#

mm/dd/yyyy HH:mm:ss

austere swift Dec 1, 2020, 2:52 AM

#

yeah it probably does

#

try it

real wigeon Dec 1, 2020, 2:53 AM

#

im going to try this

#

df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

pd.to_datetime(df["upload_timestamp"], format='mm/dd/yyyy HH:mm:ss')

make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)

austere swift Dec 1, 2020, 2:53 AM

#

well, not that for format

#

thats not how format works

real wigeon Dec 1, 2020, 2:54 AM

#

oh uhh

#

lmao

#

whoops

#

sry sry

#

the % codes

austere swift Dec 1, 2020, 2:54 AM

#

yes

real wigeon Dec 1, 2020, 2:55 AM

#

i dont think this is quite correct

#

pd.to_datetime(df["upload_timestamp"], format='%mm/%dd/%yyyy %HH:%mm:%ss')

austere swift Dec 1, 2020, 2:55 AM

#

no

#

https://www.programiz.com/python-programming/datetime/strftime scroll down a bit and you'll see a list of all the codes

Python strftime() - datetime to string

In this article, you will learn to convert datetime object to its equivalent string in Python with the help of examples. For that, we can use strftime() method. Any object of date, time and datetime can call strftime() to get string from these objects.

#

thats for datetime strftime but i think it should be the same for pandas

snow compass Dec 1, 2020, 3:00 AM

#

hollow gull @snow compass Maybe a better way of doing this is to build custom class...

I will come back to this in the morning. I'll need to look up classes and see if that's my solution. I'm performing the same math and using the same pandas functions on each of the dataframes. I'm just writing to different rows or writing to different worksheets in the workbook I'm writing to.

real wigeon Dec 1, 2020, 3:01 AM

#

dam it's still the same error @austere swift

#

    make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

austere swift Dec 1, 2020, 3:02 AM

#

real wigeon dam it's still the same error @austere swift

it's not an inplace function btw

#

it returns the output

real wigeon Dec 1, 2020, 3:02 AM

#

ohh

austere swift Dec 1, 2020, 3:02 AM

#

so you'd need to assign it back to the original column

#

or assign it to an intermediary variable that you then use for the other modifications

real wigeon Dec 1, 2020, 3:04 AM

#

alrighty

#

testing

#

hmmm

#

same error

austere swift Dec 1, 2020, 3:05 AM

#

code?

real wigeon Dec 1, 2020, 3:06 AM

#

df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

convert_timestamp_to_date_time = pd.to_datetime(df["upload_timestamp"], format="%m/%d/%Y, %H:%M:%S")
make_timestamps_tz_aware = convert_timestamp_to_date_time.dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)

austere swift Dec 1, 2020, 3:07 AM

#

also you forgot %p btw, thats for AM and PM

#

and there shouldnt be a comma in the format

real wigeon Dec 1, 2020, 3:07 AM

#

is it :%p

austere swift Dec 1, 2020, 3:07 AM

#

no it would have space %p

real wigeon Dec 1, 2020, 3:07 AM

#

k

austere swift Dec 1, 2020, 3:08 AM

#

so basically imagine you're writing your time out, but replace all the actual number values with the % codes
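
Applied to a value like `11/26/2020 11:26:27 PM`, that rule could look like the sketch below. One detail worth checking against the strptime docs: the AM/PM marker `%p` only takes effect together with the 12-hour code `%I`, so `%H` is swapped for `%I` here. The sample string is the one posted earlier in the conversation.

```python
import pandas as pd

raw = pd.Series(["11/26/2020 11:26:27 PM"])

# %m/%d/%Y = month/day/4-digit year, %I = 12-hour clock, %p = AM/PM
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")
```

`pd.to_datetime` returns a new Series, so the result has to be assigned to a variable or back to the column.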

real wigeon Dec 1, 2020, 3:08 AM

#

i see

#

alright but the error states something about the index

austere swift Dec 1, 2020, 3:10 AM

#

well test that out

#

it could just be the format error

real wigeon Dec 1, 2020, 3:11 AM

#

apparently localize and convert only works on the index

#

https://stackoverflow.com/questions/26089670/unable-to-apply-methods-on-timestamps-using-series-built-ins

Stack Overflow

Unable to apply methods on timestamps using Series built-ins

On the following series:

0 1411161507178
1 1411138436009
2 1411123732180
3 1411167606146
4 1411124780140
5 1411159331327
6 1411131745474
7 1411151831454
8 1411152487758
...

#

yeah same error

#

hmmmmm that doesnt really help

trim oar Dec 1, 2020, 3:25 AM

#

Hello guys, I know it depends on the problem, but how would you go about finding the appropriate number of layers?

#

As well as nodes?

#

Like a baseline number

austere swift Dec 1, 2020, 3:26 AM

#

theres no real way to just figure out how many you need

#

that's the whole concept of hyperparameter tuning

#

you just have to test stuff out and see how it goes

#

I'd recommend trying to use a model that already works for your baseline, like a premade model

#

then tweak from there

trim oar Dec 1, 2020, 3:28 AM

#

I know hyperparameter with GridSearch when doing classical ML. How would you do it with TensorFlow?

austere swift Dec 1, 2020, 3:29 AM

#

if you're using keras you can use keras tuner

#

https://www.tensorflow.org/tutorials/keras/keras_tuner

TensorFlow

Introduction to the Keras Tuner | TensorFlow Core

trim oar Dec 1, 2020, 3:34 AM

#

Thank you!

real wigeon Dec 1, 2020, 3:48 AM

#

yeah so im still getting the same error

#

TypeError: index is not a valid DatetimeIndex or PeriodIndex
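
This error usually points at `tz_convert` being called directly on a Series: `Series.tz_convert` works on the Series' index, while `Series.dt.tz_convert` works on the values. A small sketch with a made-up timestamp:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2020-11-26 23:26:27"])).dt.tz_localize("UTC")

try:
    s.tz_convert("America/New_York")      # acts on the index (0, 1, ...) and fails
    failed = False
except TypeError:
    failed = True

converted = s.dt.tz_convert("America/New_York")  # acts on the values
```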

#

progress

#

syntax stuff

real wigeon Dec 1, 2020, 4:12 AM

#

alright well... i managed to download in xls format just the timestamp column..

#

and I mistakenly stripped the hours/seconds info

fallow thunder Dec 1, 2020, 4:38 AM

#

Hi. How can I make this matplotlib figure bigger in the y axis without changing the ylim? Since the limit of the values in the y axis is 1.

📎 Screen_Shot_2020-12-01_at_12.33.50_AM.png

#

This is the code used to generate the figure:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig = plt.figure(figsize=(10,2))

ax = fig.add_subplot(1,1,1, aspect='equal')

# Low
x = [0,0,9,11]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

# Medium
x = [10,12,15,17]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

# High
x = [16,18,20,20]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

ax.set_xlim([0,20])
ax.set_ylim([0,2])

plt.show()

trim oar Dec 1, 2020, 4:43 AM

#

I'm not exactly sure about your code, but you can set the ticks with plt.yticks(array)

#

Say your array is range(1, 10, 1), then you can call plt.yticks(range(1, 10, 1))

#

Don't know if that helps

#

Increase figsize as well?

fallow thunder Dec 1, 2020, 4:57 AM

#

I tried increasing the figsize but the height of the figure doesn't change

#

nvm, it was aspect='equal', I forgot to remove it. Thanks for the help anyway!

lapis sequoia Dec 1, 2020, 4:59 AM

#

beginner question but in numpy rather than creating the matrix from scratch is there a way I can call an empty matrix of specified size?

#

exp: i could call a 3 x 2 matrix full of 0s with the values to change later
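
For anyone landing here later, the usual call for this is `np.zeros` with a shape tuple (a minimal sketch):

```python
import numpy as np

m = np.zeros((3, 2))   # 3 rows, 2 columns, all 0.0
m[0, 1] = 5.0          # fill values in later
```

`np.empty((3, 2))` also exists, but it leaves the memory uninitialized, so the contents are arbitrary until every cell is assigned.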

lapis sequoia Dec 1, 2020, 5:15 AM

#

nvm found answer

ivory panther Dec 1, 2020, 5:25 AM

#

Good night to everybody. Does anybody have an idea how to transform this plot so that it shows the shape of the curves better?

📎 newplot_-_2020-11-30T232020.608.png

#

Without the need to show two different pictures.

lapis sequoia Dec 1, 2020, 6:23 AM

#

increase window width if thats an option

#

or take more windows

high badge Dec 1, 2020, 6:55 AM

#

is singular value decomposition (SVD) solely for linear regression or can it perform on other models like the Gradient Descent algorithms can?

high badge Dec 1, 2020, 7:26 AM

#

nvm

sleek robin Dec 1, 2020, 1:29 PM

#

hey guys, in backpropagation, if we're using cross-entropy as the loss function, why is the error term in the output layer computed as [y - (output activation)]? isn't that the partial derivative of a mean squared error loss func with respect to output activation, rather than cross-entropy? i keep seeing it even if the loss function isn't MSE
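
One likely explanation, assuming the output layer is a softmax (or sigmoid) paired with cross-entropy: the derivative of the loss with respect to the pre-activation z simplifies to (activation - y), which has the same form as the MSE derivative with respect to the activation, and [y - activation] is just its negative. The sketch below checks that numerically with arbitrary numbers.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.5, -1.0, 2.0])   # arbitrary pre-activations
y = np.array([0.0, 0.0, 1.0])    # one-hot target

analytic = softmax(z) - y        # claimed gradient dL/dz

# Central finite differences as an independent check
eps = 1e-6
basis = np.eye(3)
numeric = np.array([
    (cross_entropy(z + eps * basis[i], y) - cross_entropy(z - eps * basis[i], y)) / (2 * eps)
    for i in range(3)
])
```

With MSE the same (activation - y) factor appears, but it is then multiplied by the derivative of the activation function; with softmax plus cross-entropy that extra factor cancels out.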

snow compass Dec 1, 2020, 2:54 PM

#

hollow gull It seems like you are being sort of particular about how you do this without let...

How did I miss this last night?? I saw your second ping and not this one. Is this what gm meant? because now that makes sense.

Sorry I didn't realize I was being particular about this. I think I still don't have the best handle I need on the jargon? like, using words as correctly as possible to their coding definition.

I'm gonna try this out and see if that does the thing. and hopefully have a better understanding of why >.>;;

ornate valve Dec 1, 2020, 4:55 PM

#

hi! can anyone help me with np.trapz for calculating the area under a curve? I've been doing some research, but all the examples use random data and I don't know how to incorporate my own data.
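
A minimal sketch with explicit data instead of random numbers: the function takes the y values first and the matching x coordinates second. The numbers are made up; any two equal-length arrays (for example two DataFrame columns converted with `.to_numpy()`) would slot in the same way.

```python
import numpy as np

# Made-up measurements; replace with your own x and y arrays of equal length
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever exists
trapezoid = getattr(np, "trapezoid", None) or np.trapz

area = trapezoid(y, x)  # y values first, then the x coordinates
```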

glacial rune Dec 1, 2020, 4:56 PM

#

I have a dictionary of dictionaries:

{
'A': {'spread': .., 'mid': ..},
'B': {'spread': .., 'mid': ..},
...
}

Where there are usually 3-15 keys. I need the most performant way of finding the minimum spread AND the N largest mids - I've currently got the min spread as best = min(prices.values(), key=lambda x: x['spread']) then best_spread = best['spread']
I'm not sure how to find the N largest mids in the most performant way - but I do put the mids in a numpy array as I need to find their median or mean.
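
One possible sketch using the standard library's `heapq.nlargest`; the keys and numbers are placeholders. With only 3-15 entries, `sorted(...)[-N:]` or a NumPy sort will probably perform about the same, so it is worth timing on real data.

```python
import heapq

# Hypothetical prices; keys and numbers are placeholders
prices = {
    "A": {"spread": 0.4, "mid": 101.0},
    "B": {"spread": 0.1, "mid": 99.5},
    "C": {"spread": 0.3, "mid": 102.5},
    "D": {"spread": 0.2, "mid": 100.0},
}

best_spread = min(p["spread"] for p in prices.values())

N = 2
largest_mids = heapq.nlargest(N, (p["mid"] for p in prices.values()))
```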

grave frost Dec 1, 2020, 5:11 PM

#

Well, does anyone know why when we use TPUs, PyTorch uses the System RAM for loading the model rather than the internal TPU Vram or the GPU RAM??

split eagle Dec 1, 2020, 5:16 PM

#

I'm trying to drop rows that contain specific words within a column from my df. I tried creating an index and dropping the index, but I got an error saying that since it included more than 6 items it was too large and couldn't be used. I have just tried the following code, which I adapted from Stack Overflow:

#

tox = ['toxic','toxicity','toxicities', 'deaths','fatal','patient~ safety','safety issue', 'safety monitoring', 'safety data', 'safety measures', 'safety related', 'safety reasons', 'safety concern', 'safety and efficacy']
df_test1 = df_test1[-df_test1['why_stopped'].isin(tox)]

#

This doesn't return any errors, but the size of my df_test1 hasn't changed.

#

How might I get this this to successfully drop rows that contain the terms in tox from df_test1?
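
A likely cause: `isin` only matches cells whose entire value equals one of the terms, so a longer sentence that merely contains "toxic" is not matched. A sketch using `str.contains` for substring matching follows; the sample rows and the shortened term list are made up, and `~` is the usual operator for inverting a boolean mask.

```python
import re
import pandas as pd

tox = ["toxic", "deaths", "fatal", "safety issue"]  # shortened term list

df_test1 = pd.DataFrame({"why_stopped": [
    "Stopped due to toxicity in cohort 2",
    "Slow enrollment",
    "Sponsor decision after safety issue",
    None,
]})

# One regex that matches any term as a substring, case-insensitively
pattern = "|".join(re.escape(term) for term in tox)
mask = df_test1["why_stopped"].str.contains(pattern, case=False, na=False)

# ~ inverts the boolean mask: keep rows that do NOT contain a term
df_kept = df_test1[~mask]
```

`na=False` treats missing values as "no match", so rows with an empty `why_stopped` are kept rather than raising an error.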

ornate valve Dec 1, 2020, 5:19 PM

#

ivory panther Good nigth to everybody. Does anybody have an idea to transform this plot so tha...

maybe you can try plotting on a grid of subplots: axs[1], then axs[2], etc. you'll have the data in subplots within the same image.

real wigeon Dec 1, 2020, 5:25 PM

#

i have a dataset that I manipulate some timezone data on
it manipulates just one column
however I'm trying to output the entire data set, not just the timestamp column, to xls
currently it's just exporting the xls file
im using pandas

df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

# %I (12-hour clock) is needed for %p (AM/PM) to take effect
convert_timestamp_to_date_time = pd.to_datetime(df["upload_timestamp"], format="%m/%d/%Y %I:%M:%S %p")
make_timestamps_tz_aware = convert_timestamp_to_date_time.dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.dt.tz_convert('America/New_York')
remove_time_zone = make_timestamps_tz_est.dt.tz_localize(None)

#remove_time_zone = make_timestamps_tz_est.apply(lambda a: pd.to_datetime(a).date())


remove_time_zone.to_excel('staffDashboard/output.xlsx', index=False)
#print(cursor.mogrify(get_results, (formatted_start_date, formatted_end_date)))
connection.close()
cursor.close()
return send_file('output.xlsx', attachment_filename=f"{formatted_start_date}-{formatted_end_date}_survey_results.xlsx", as_attachment=True)
how do i go from referencing just the column, to merging it into the dataframe
and then exporting that dataframe
like I said, currently it just exports the column
do i just replace the old column
and export the new df

woven tundra Dec 1, 2020, 5:30 PM

#

@real wigeon

You're splitting out that column, running it through functions and then exporting just the column.

Add it back to the df with

df["converted_timestamp"] = remove_time_zone

And then export the df

df.to_excel("output.xlsx", index=False)

real wigeon Dec 1, 2020, 5:30 PM

#

doing this

#

df["converted_timestamp"] = remove_time_zone

#

wont that assign a new column, since the name is different

woven tundra Dec 1, 2020, 5:31 PM

#

yes, if you want to replace the upload_timestamp column with the column full of converted info change the name to "upload_timestamp"

real wigeon Dec 1, 2020, 5:31 PM

#

ok gotcha

#

let me test

#

I thought it was something simple like that

woven tundra Dec 1, 2020, 5:32 PM

#

sure let me know, ping me if it doesn't work so I get a notification

grave frost Dec 1, 2020, 5:34 PM

#

Well, does anyone know why when we use TPUs, PyTorch uses the System RAM for loading the model rather than the internal TPU Vram or the GPU RAM??

real wigeon Dec 1, 2020, 5:37 PM

#

@woven tundra that worked

#

thank you

woven tundra Dec 1, 2020, 5:37 PM

#

awesome

#

no worries

livid quartz Dec 1, 2020, 5:38 PM

#

Does anyone know how to convert an array with values dtype = 'timedelta64[ns]' to days?

real wigeon Dec 1, 2020, 5:38 PM

#

although it does.... this weird thing... where the query range is like x-y but y wont be included

woven tundra Dec 1, 2020, 5:39 PM

#

real wigeon although it does.... this weird thing... where the query range is like x-y but y...

i don't understand, will need more context

real wigeon Dec 1, 2020, 5:40 PM

#

i think it's a mysql thing

woven tundra Dec 1, 2020, 5:40 PM

#

oh okay, is it included in the input file?

real wigeon Dec 1, 2020, 5:40 PM

#

im thinking it might now be

#

i dont believe mysql is inclusive i think is the term

#

like if i ask it to query all data points between 5am and 6am, it will go all the way up to 5:59am, but not include the 6am

woven tundra Dec 1, 2020, 5:41 PM

#

livid quartz Does anyone know how to convert an array with values dtype = 'timedelta64[ns]' t...

if for eg it's stored in column 'datedif' in a dataframe called 'df'

Do df['datedif'].dt.days
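
If the values live in a plain NumPy array rather than a DataFrame column, a sketch of two options (sample values made up):

```python
import numpy as np

deltas = np.array([86_400_000_000_000, 129_600_000_000_000], dtype="timedelta64[ns]")

# Dividing by one day gives float days (fractions preserved)
days_float = deltas / np.timedelta64(1, "D")

# Casting to day resolution truncates to whole days
days_whole = deltas.astype("timedelta64[D]").astype(int)
```

The second value here is 1.5 days, so the two approaches differ: 1.5 from the division, 1 from the cast.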

real wigeon Dec 1, 2020, 5:42 PM

#

yeah i did some searching

#

its a mysql thing

woven tundra Dec 1, 2020, 5:43 PM

#

cool cool

real wigeon Dec 1, 2020, 5:43 PM

#

and its because i didnt specify seconds

#

lol

woven tundra Dec 1, 2020, 5:43 PM

#

i can't be a lot of help on the mysql front 🤷🏻‍♂️

real wigeon Dec 1, 2020, 5:43 PM

#

no worries you've been helpful

#

im just typing for the sake of it

livid quartz Dec 1, 2020, 5:51 PM

#

woven tundra if for eg it's stored in column `'datedif'` in a dataframe called '`df` Do `df[...

thanks

ivory panther Dec 1, 2020, 6:04 PM

#

ornate valve maybe you can try plot in matrix axs[1]. then axs[2]. etc.... you are going to h...

I had thought the same, but since I will use these pics for a paper, I need to save space because I have more plots with the same problem :/

azure stump Dec 1, 2020, 6:06 PM

#

https://medium.com/analytics-vidhya/knowing-these-can-really-make-you-better-in-understanding-predictive-analytics-fd54ad622fcf?sk=b21cbd1eda797742b74b538b8b030461

Medium

Knowing these can really make you better in understanding Predictiv...

Like Artificial Intelligence, predictive analytics is not a new concept at all.

snow compass Dec 1, 2020, 6:24 PM

#

hollow gull It seems like you are being sort of particular about how you do this without let...

is this a dumb way of handling this?

dfs = [df1,df2,df3,df4,df5,df6]
fn2 = [3,14,25,36,47,58]
fn3 = [3,14,25,36,47,58]
fn4 = [3,19,35,51,67,83]
fn5 = [1,6,11,16,21,26]
fn6 = [6,7,8,9,10,11]
for n in range(6):
  # lists are indexed with [], not called with ()
  fxn1(dfs[n])
  fxn2(dfs[n], fn2[n])
  fxn3(dfs[n], fn3[n])
  fxn4(dfs[n], fn4[n])
  fxn5(dfs[n], fn5[n])
  fxn6(dfs[n], fn6[n])

ornate valve Dec 1, 2020, 6:27 PM

#

ivory panther I Had thought the same but given that I will use these picks for a paper, I need...

https://www.pythoninformer.com/python-libraries/matplotlib/line-plots/ // and maybe through this?

Line plot styles in Matplotlib

paper nacelle Dec 1, 2020, 6:47 PM

#

the cell has executed but i dont see the map

📎 unknown.png

#

am using plotly express

#

in jupyter

sleek robin Dec 1, 2020, 6:51 PM

#

if it's a white square, try restarting the notebook

#

i had that a couple of times in jupyter with plotly

gray phoenix Dec 1, 2020, 7:05 PM

#

Does anyone know where I would be able to learn time series analysis?

Cost isnt too big of an issue since i would be getting my employer to pay for it.

fallow prism Dec 1, 2020, 7:26 PM

#

split eagle This doesn't return any errors, but the size of my df_test1 hasn't changed.

the size hasn't changed, but what about the data inside?

#

it's possible that it fills with NaN values after the drops

#

because your dataframe has a fixed size

split eagle Dec 1, 2020, 7:29 PM

#

@fallow prism I'll inspect the data real quick. Give me a sec.

#

@fallow prism I have examined the df and the cells that I intended to drop remain.

keen crest Dec 1, 2020, 7:34 PM

#

Posting in this channel because my issue includes the use of a dataframe, but please direct me to the correct channel if I posted incorrectly. Can anyone help me fix this error? I don't understand why my list isn't being accepted as column names, even though my variable used is a list with four elements. My list is printed in cmd as ['owner', 'series', 'name', 'image']

📎 image0.png 📎 image1.png

fallow prism Dec 1, 2020, 7:36 PM

#

split eagle @fallow prism I have examined the df and the cells that I intended to d...

is your column just a word or string?

#

if is just a word try this

#

df_test1['why_stopped'] = df_test1['why_stopped'].apply(lambda x: x if x not in tox else None)

#

or make a new column and replace the first column later

lapis sequoia Dec 1, 2020, 7:45 PM

#

New Medium Article Published. Introduction to NumPy in Python. Exploring Operations and Arrays in NumPy, The Numerical Python Library. Let me know what you think! https://medium.com/analytics-vidhya/introduction-to-numpy-in-python-db8aa7ffd91f

Medium

Introduction to NumPy in Python

Exploring Operations and Arrays in NumPy, The Numerical Python Library

serene scaffold Dec 1, 2020, 7:46 PM

#

@lapis sequoia this is something that you wrote?

lapis sequoia Dec 1, 2020, 7:46 PM

#

@serene scaffold Yes

serene scaffold Dec 1, 2020, 7:50 PM

#

@lapis sequoia very nice. I'm looking at the section on joining. You mention using .join but it looks like it's np.concatenate that you use

lapis sequoia Dec 1, 2020, 7:53 PM

#

serene scaffold @lapis sequoia very nice. I'm looking at the section on joining. You ment...

Oops, you are correct, I just published the fixed version. Thank you for that feedback!

serene scaffold Dec 1, 2020, 7:53 PM

#

lapis sequoia Oops, you are correct, I just published the fixed version. Thank you for that fe...

cool! 💥

split eagle Dec 1, 2020, 7:56 PM

#

@lobon22 A string.

lapis sequoia Dec 1, 2020, 8:39 PM

#

hey

#

i keep getting this error

#

    result = self.forward(*input, **kwargs)
  File "/Users/ashley/Deeplearning/fresh_vs_rotton.py", line 67, in forward
    x = F.max_pool2d(self.relu(self.conv1(x_1)), 2)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 2-dimensional input of size [1176, 512] instead

#

idk how to fix it

#

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)
        self.relu = nn.ReLU()


    def forward(self, x_1):
        x = F.max_pool2d(self.relu(self.conv1(x_1)), 2)
        x = F.max_pool2d(self.relu(self.conv2(x)), 2)
        x = x.view(-1, 8 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

fallow prism Dec 1, 2020, 9:30 PM

#

split eagle `tox = ['toxic','toxicity','toxicities', 'deaths','fatal','patient~ safety','saf...

that minus '-' has to be '~'

#

https://appdividend.com/2020/01/22/python-pandas-how-to-remove-rows-in-dataframe/ i am dead

AppDividend

Krunal

Python Pandas: How To Remove Rows and Columns In DataFrame

Python Pandas dataframe drop() is an inbuilt function that is used to drop the rows. The drop() removes the row based on an index provided to that function.

spiral peak Dec 1, 2020, 9:35 PM

#

So I have a dataset made up of 3 columns. Not every column has data for every row, but I'd still like to compute an average for that row, even if it's just using the 1 column. How do I do that?
.mean() is giving me NaN for the rows that have NaN values in one of the columns and I don't remember how to get around this.

pearl vine Dec 1, 2020, 9:37 PM

#

Define "get around" -- how do you want Nan to be treated? Ignore the value, i.e., average the non-NaN values, presumably deducting them from the count? Treat them as zero?

spiral peak Dec 1, 2020, 9:37 PM

#

ignore the value if NaN
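A small sketch of a row-wise average that ignores NaN, assuming three hypothetical columns `a`, `b`, `c`. pandas skips NaN in `mean` by default (`skipna=True`), so no extra handling is needed:

```python
import numpy as np
import pandas as pd

# Hypothetical data: not every column has a value for every row.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [2.0, np.nan, np.nan],
    "c": [3.0, 5.0, np.nan],
})

# axis=1 averages across columns; NaN values are left out of both sum and count.
row_mean = df[["a", "b", "c"]].mean(axis=1)
```

A row where every column is NaN would still come out as NaN, since there is nothing to average.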

serene scaffold Dec 1, 2020, 9:49 PM

#

I want triangle functions in numpy that, for a given range, return 1.0 in the middle of the range, 0.0 at the ends, and np.nan outside the range. But I can only find stuff about making n-arrays with that distribution.

#

I guess I can make them myself with np.vectorize or something

split eagle Dec 1, 2020, 9:50 PM

#

👍

spiral peak Dec 1, 2020, 9:52 PM

#

spiral peak So I have a dataset made up of 3 columns. Not every column has data for every ro...

Hi, I'm dumb. .mean() works, I just have an inability to spell

lapis sequoia Dec 1, 2020, 10:56 PM

#

serene scaffold I want triangle functions in numpy that, for a given range, return `1.0` in the ...

Don't know if this is helpful since it returns 2D arrays. https://stackoverflow.com/questions/39951392/python-plot-triangular-function-into-2d-arrays

Stack Overflow

python, plot triangular function into 2d arrays

I'm new to python, in this case, I want to put my function into 2d arrays, so I can plot the function. Here is my triangle function, I'm using it for fuzzy logic:

def triangle (z,a,b,c):
if (z...

cedar sun Dec 1, 2020, 11:36 PM

#

on the examples, when training a nn. What is X_train, Y_train, X_validation and Y_Validation?

#

X is a list with the training data (images or what ever) and Y another list of the same size with the labels for each X?

velvet thorn Dec 2, 2020, 12:04 AM

#

snow compass is this a dumb way of handling this? ```python dfs = [df1,df2,df3,df4,df5,df6] f...

what is this 🥴

velvet thorn Dec 2, 2020, 12:05 AM

#

serene scaffold I want triangle functions in numpy that, for a given range, return `1.0` in the ...

wait what do you mean by that

#

do you have an example

velvet thorn Dec 2, 2020, 12:05 AM

#

cedar sun X is a list with the training data (images or what ever) and Y another llist of ...

usually not a list

#

but some sort of container

velvet thorn Dec 2, 2020, 12:05 AM

#

cedar sun on the examples, when training a nn. What is X_train, Y_train, X_validation and ...

you train your model on some data, then you check if it's working on other data that your model hasn't seen before. the latter is validation data.
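As a rough illustration of the train/validation idea, here is a sketch with NumPy arrays. The data is random, and the 80/20 ratio and variable names are assumptions rather than anything a particular framework requires:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 10 samples with 4 features each, one label per sample.
X = rng.random((10, 4))
y = rng.integers(0, 2, size=10)

# Shuffle the indices once so X and y stay aligned, then cut at 80%.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, valid_idx = idx[:cut], idx[cut:]

X_train, y_train = X[train_idx], y[train_idx]
X_validation, y_validation = X[valid_idx], y[valid_idx]
```

The model is fit on `X_train`/`y_train` only; `X_validation`/`y_validation` are held back to check how it does on data it has not seen.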

cedar sun Dec 2, 2020, 12:06 AM

#

if not a list, what?

serene scaffold Dec 2, 2020, 12:06 AM

#

velvet thorn wait what do you mean by that

I'm semi-afk but I can write a better explanation later.

velvet thorn Dec 2, 2020, 12:06 AM

#

cedar sun if not a list, what?

numpy array, pandas DataFrame (backed by said array), or TensorFlow/PyTorch's containers

cedar sun Dec 2, 2020, 12:06 AM

#

okey okey

#

i knew images must be np arrays

#

thats easy cuz i think opencv loads images as np arrays, right?

covert cedar Dec 2, 2020, 12:21 AM

#

Hey guys, I am trying to assign the strike and expiration date from a row to all of the results its values spawn, would I use inheritance to solve this?

#

I tried .at but it did not work correctly

tough citrus Dec 2, 2020, 12:58 AM

#

is this the place to talk about neural nets

velvet thorn Dec 2, 2020, 1:33 AM

#

tough citrus is this the place to talk about neural nets

yes

velvet thorn Dec 2, 2020, 1:33 AM

#

covert cedar Hey guys, I am trying to assign the strike and expiration date from a row to all...

what?

covert cedar Dec 2, 2020, 1:34 AM

#

velvet thorn what?

I just want to pass the UIDs of a row to the rows it spawns from an api call

velvet thorn Dec 2, 2020, 1:35 AM

#

covert cedar I just want to pass the UIDs of a row to the rows it spawns from an api call

what is "spawns"

#

you're going to need to give more details

covert cedar Dec 2, 2020, 1:36 AM

#

Sorry. For each row of my df, it has a unique option. The values are then passed to a robin_stocks method that returns roughly 210k rows to the 1 input. I need all 210k to be directly traceable back to the 1 input

velvet thorn Dec 2, 2020, 1:36 AM

#

covert cedar Sorry. For each row of my df, it has a unique option. The values are then passed...

what do you mean by "directly traceable"?

#

like do you want all the results in one big DataFrame

#

and have an additional column

#

to indicate the source?

covert cedar Dec 2, 2020, 1:37 AM

#

So if input 1 is ID 1, I want to pass that 1 to all 210k

#

Yes

velvet thorn Dec 2, 2020, 1:37 AM

#

do you know what a join is?

covert cedar Dec 2, 2020, 1:37 AM

#

Yes

velvet thorn Dec 2, 2020, 1:37 AM

#

yup

#

that's what you want

covert cedar Dec 2, 2020, 1:37 AM

#

Thats why UID

#

To join on

velvet thorn Dec 2, 2020, 1:37 AM

#

left join on that

covert cedar Dec 2, 2020, 1:37 AM

#

but

#

I cant get the value to populate

velvet thorn Dec 2, 2020, 1:37 AM

#

covert cedar I cant get the value to populate

what do you mean

covert cedar Dec 2, 2020, 1:39 AM

#

📎 1201.PNG

#

See the NAN

#

Symbol is provided by the response from robin_stocks

velvet thorn Dec 2, 2020, 1:40 AM

#

hm

#

okay so

#

what are you joining on?

covert cedar Dec 2, 2020, 1:41 AM

#

Nothing yet

#

Trying to be able to

#

1 to many

velvet thorn Dec 2, 2020, 1:41 AM

#

if you haven't joined

#

why are there null values

covert cedar Dec 2, 2020, 1:41 AM

#

    a = f.at[i,'strike']
    c = f.at[i,'xpire']
    df4.at[i,'strike'] = a
    df4.at[i,'xpire'] = c

#

is how I did it

velvet thorn Dec 2, 2020, 1:42 AM

#

huh.

#

wait

#

I

#

actually don't get why you did that

#

that looks like a loop.

#

why do you have a loop?

covert cedar Dec 2, 2020, 1:42 AM

#

YEs

#

It was

#

for i in tqdm(range(len(df2))):
    df4 = df4.append(r.options.get_option_historicals(f.loc[i]['symbol'], f.loc[i]['xpire'], f.loc[i]['strike'], 'call', interval='5minute', span='week', bounds='regular', info=None))

    a = f.at[i,'strike']
    c = f.at[i,'xpire']
    df4.at[i,'strike'] = a
    df4.at[i,'xpire'] = c

velvet thorn Dec 2, 2020, 1:42 AM

#

I'm going to assume

#

r.options.get_option_historicals is the said function?

covert cedar Dec 2, 2020, 1:43 AM

#

Yes

velvet thorn Dec 2, 2020, 1:43 AM

#

that's

#

a pretty weird way to do things

#

let me think for a bit

#

show me a bit of df2

#

in text

#

not picture form

#

my gut feel is that you should use df2.apply

covert cedar Dec 2, 2020, 1:44 AM

#

Yeah I tried that

velvet thorn Dec 2, 2020, 1:44 AM

#

with pd.concat

#

and then join on common columns

covert cedar Dec 2, 2020, 1:45 AM

#

df2 is made like this

#


for i in tqdm(range(len(df))):
    df2 = df2.append(r.options.find_tradable_options(df.loc[i]['Symbol'],expirationDate=None, strikePrice=None, optionType=None, info=None))

velvet thorn Dec 2, 2020, 1:45 AM

#

you should really

#

avoid append in a loop

#

probably df.transform would be appropriate

covert cedar Dec 2, 2020, 1:47 AM

#

would it be like

#

df2.transform(lambda x: r.stuff(x['1'],x['2'],))

velvet thorn Dec 2, 2020, 1:54 AM

#

actually, no

#

more like df['Symbol'].transform

#

since you're only using that column

covert cedar Dec 2, 2020, 1:55 AM

#

It takes in 3 values, the df2 gen works ok
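One way to keep every returned row traceable to its input, sketched under assumptions: `fetch_history` is a hypothetical stand-in for the real API call, and the column names mirror the ones used above. Each result is tagged with the inputs that produced it, collected in a list, and concatenated once at the end instead of appending inside the loop (`DataFrame.append` was also removed in pandas 2.0):

```python
import pandas as pd

def fetch_history(symbol, xpire, strike):
    """Hypothetical stand-in for the API call; returns a list of dict rows."""
    return [{"symbol": symbol, "close_price": p} for p in (1.0, 1.5)]

# Hypothetical input frame: one option per row.
f = pd.DataFrame({
    "symbol": ["AAA", "BBB"],
    "xpire": ["2020-12-18", "2021-01-15"],
    "strike": [100.0, 50.0],
})

frames = []
for row in f.itertuples(index=False):
    hist = pd.DataFrame(fetch_history(row.symbol, row.xpire, row.strike))
    # Tag every returned row with the inputs that produced it.
    hist["strike"] = row.strike
    hist["xpire"] = row.xpire
    frames.append(hist)

# One concat at the end is much cheaper than growing a frame per iteration.
df4 = pd.concat(frames, ignore_index=True)
```

Assigning a scalar to a column broadcasts it to all rows of that chunk, which avoids the NaN values that come from writing only to row `i` with `.at`.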

serene scaffold Dec 2, 2020, 2:13 AM

#

@velvet thorn native numpy support for this:

class TriangleFunc:

    def __init__(self, start, end):
        self._start = start
        self._end = end
        self._mid = ((end - start) / 2) + start
        self._slope = 1 / ((end - start) / 2)

    def __call__(self, x):
        if not (self._start <= x <= self._end):
            return np.nan
        slope = self._slope if x <= self._mid else -self._slope
        return slope * (x - self._start)

except where I get the slope right for the right side of the midpoint

#

I'm making a fuzzy controller

#

problem is I don't think you can vectorize methods

#

looks like vectorizing doesn't improve performance so I guess it's a moot point.
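For comparison, a vectorized sketch of the triangle function described earlier (1.0 at the midpoint, 0.0 at both ends, NaN outside the range). The function name and signature are illustrative, not an existing NumPy API:

```python
import numpy as np

def triangle(x, start, end):
    """Triangle membership: 1.0 at the midpoint, 0.0 at the ends, NaN outside."""
    x = np.asarray(x, dtype=float)
    mid = (start + end) / 2
    half_width = (end - start) / 2
    # Falls linearly from 1 at the midpoint to 0 at either end.
    y = 1.0 - np.abs(x - mid) / half_width
    inside = (x >= start) & (x <= end)
    return np.where(inside, y, np.nan)

values = triangle([0.0, 0.5, 1.0, 2.0, 3.0], start=0.0, end=2.0)
```

Using the distance from the midpoint handles both sides with one expression, so there is no separate slope for the right-hand side to get wrong.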

civic fractal Dec 2, 2020, 2:57 AM

#

https://stackoverflow.com/questions/65100587/is-there-any-way-to-make-long-iterative-codes-in-python-not-take-progressively-l

Stack Overflow

Is there any way to make long iterative codes in python not take pr...

I noticed that some recent code that I was experimenting with to find the digits of pi to many decimals started off running very quickly getting often thousands of decimals per second (might be

#

I'd appreciate an answer if possible

serene scaffold Dec 2, 2020, 3:01 AM

#

@civic fractal the answer that's already given is quite good

#

it sounds like you're pushing the limits of how numbers are stored on your computer

velvet thorn Dec 2, 2020, 5:46 AM

#

@velvet thorn native numpy support for this:

class TriangleFunc:

    def __init__(self, start, end):
        self._start = start
        self._end = end
        self._mid = ((end - start) / 2) + start
        self._slope = 1 / ((end - start) / 2)

    def __call__(self, x):
        if not (self._start <= x <= self._end):
            return np.nan
        slope = self._slope if x <= self._mid else -self._slope
        return slope * (x - self._start)

except where I get the slope right for the right side of the midpoint
@serene scaffold I must confess I do not see what this code is meant to do

#

🥴

serene scaffold Dec 2, 2020, 5:46 AM

#

@velvet thorn I figured that part out

#

now I'm just trying to plot everything

#

and then I'm 1/3 of the way through the assignment

#

💥 🎆 😢

#

(took two days to get this far)

#

(due at 4pm)

velvet thorn Dec 2, 2020, 5:52 AM

#

ah, assignments

#

atb! 👋

serene scaffold Dec 2, 2020, 5:57 AM

#

this is wrong :((((((((((((((

📎 unknown.png

velvet thorn Dec 2, 2020, 6:17 AM

#

what is that supposed to be

lapis sequoia Dec 2, 2020, 1:04 PM

#

Hello! Does anyone here know anything about data mining using Python? I have an assignment I have to do.
Here's the kinda stuff we have to cover...

📎 unknown.png

#

If anyone can help let me know! 😁

#

Just @lapis sequoia me

#

And this is using Anaconda if that means anything

torpid cave Dec 2, 2020, 2:18 PM

#

Hi @lapis sequoia , your task seems simple and the explanation on what is expected is quite good, let us know if you need any help

#

Anaconda is just a Python distribution that has the relevant libraries/packages (however you call it) and its dependencies sort of installed

lapis sequoia Dec 2, 2020, 2:25 PM

#

Yeah I think so far it has been pretty straight forward, I suppose I'm just kinda worried that it seems too simple and that it's like a trick question or something?

#

📎 unknown.png

#

Like so far this is what I have

torpid cave Dec 2, 2020, 2:26 PM

#

The outliers one looks quite fun

lapis sequoia Dec 2, 2020, 2:28 PM

#

Oh that one I have no idea where to even begin honestly

#

Maybe you could help me with that

#

I imagine most of the marks are going towards that question

#

📎 unknown.png

#

Is this right @torpid cave ?

torpid cave Dec 2, 2020, 2:34 PM

#

I would just present one number instead of creating the table though

lapis sequoia Dec 2, 2020, 2:39 PM

#

What number though?

#

I don't get it 😂

torpid cave Dec 2, 2020, 2:39 PM

#

haha so your correlation is -0.1

#

You show a correlation table instead of the correlation between 2 variables, that is why that number is repeated

#

So instead of showing that matrix I would try to get just the -0.109

#

But it is just a personal preference thb

#

tbh
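A sketch of the difference between the correlation table and a single number, using made-up columns `x` and `y`:

```python
import pandas as pd

# Hypothetical data: y is exactly 2 * x, so the correlation is 1.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
})

# Full matrix: symmetric, with 1.0 on the diagonal, so each pair shows up twice.
matrix = df.corr()

# Just the one coefficient between the two variables.
single = df["x"].corr(df["y"])
```

The off-diagonal entries of `matrix` are the same value as `single`, which is why the number appears repeated in the table.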

lapis sequoia Dec 2, 2020, 2:40 PM

#

How do you know it's the -0.109

#

For the second one is it 0.927 then?

red hound Dec 2, 2020, 2:41 PM

#

I have an assignment to show my understanding of boosting and bagging concepts. The report requires me to provide various examples of boosting and bagging. Do you think it is ethical to use sample code from xgboost or scikit to show how ada boost, xgboost, etc. works?

lapis sequoia Dec 2, 2020, 2:41 PM

#

You'd definitely have to reference it

#

Don't take stuff from online without referencing it because you're inherently implying it's all your work then

red hound Dec 2, 2020, 2:41 PM

#

Of course I will reference but shouldnt be an issue after that right

#

Since the goal is not to improve a given model just to show the understanding of these concepts

lapis sequoia Dec 2, 2020, 2:42 PM

#

I mean I've never heard of xgboost or scikit before, but if the website or your lecturer doesn't declare that you can't do that then I guess it isn't an issue?

red hound Dec 2, 2020, 2:43 PM

#

Cool the TA references the site and recommends checking it out

#

Thanks just wanted a second opinion

torpid cave Dec 2, 2020, 2:43 PM

#

Yeah reference everything

#

Even your lecturer

lapis sequoia Dec 2, 2020, 2:43 PM

#

I don't know what boosting or bagging is but I guess it's not too small or simple to create an example yourself?

#

Our lecturers say not to reference them

#

I think it's kinda cringy when you do

#

When you like quote them from a class...

torpid cave Dec 2, 2020, 2:44 PM

#

I am from the school that references ppt slides

lapis sequoia Dec 2, 2020, 2:44 PM

#

Hmmm

torpid cave Dec 2, 2020, 2:44 PM

#

Rules were quite strict in grad school

red hound Dec 2, 2020, 2:44 PM

#

i tend to reference the book used in class thats about it

#

undergraduate most students here dont cite properly

torpid cave Dec 2, 2020, 2:44 PM

#

tbh I don't remember citing much in undergrad

#

But this was quite a while ago

lapis sequoia Dec 2, 2020, 2:45 PM

#

I think there has to be some sort of line because mostly 99% of everything we know came from somewhere else, and if we were to reference everything it would be kinda tedious...

torpid cave Dec 2, 2020, 2:45 PM

#

And I did engineering

lapis sequoia Dec 2, 2020, 2:45 PM

#

I think for the most part your lecturers understand that most of what you're saying came from them anyway

#

Unless you specify otherwise

torpid cave Dec 2, 2020, 2:45 PM

#

In grad school... I did at least 20 references per paper

lapis sequoia Dec 2, 2020, 2:45 PM

#

Damn...

#

I think in grad school it's a bit different though

torpid cave Dec 2, 2020, 2:45 PM

#

Yep

lapis sequoia Dec 2, 2020, 2:45 PM

#

Because your work may get a bit more public and attention

#

And so it's kinda necessary to show your sources

red hound Dec 2, 2020, 2:46 PM

#

most of my reports have like 5 and 90% of them are from blogs

lapis sequoia Dec 2, 2020, 2:46 PM

#

As opposed to undergrad where your work is really only gonna be seen by your lecturer

torpid cave Dec 2, 2020, 2:46 PM

#

Depends on the subject as well I guess

red hound Dec 2, 2020, 2:46 PM

#

also on the TA. Most cant be bothered to check really

torpid cave Dec 2, 2020, 2:47 PM

#

For example I would not reference how to get correlations... but I would reference testing for heteroskedasticity

red hound Dec 2, 2020, 2:47 PM

#

what major did you do graduate studies if i may ask?

torpid cave Dec 2, 2020, 2:47 PM

#

BsC Engineering - MsC Applied Economics

#

So yeah

red hound Dec 2, 2020, 2:48 PM

#

Ahh cool aight thanks guys I should be fine if I reference the samples

torpid cave Dec 2, 2020, 2:48 PM

#

Yeah, reference as much as you can, you never lose much and you might impress your lecturer if he cares about that shit

lapis sequoia Dec 2, 2020, 2:50 PM

#

But not referencing could be a serious offense 😬

#

@torpid cave

#

📎 unknown.png

#

This is just a shot in the dark at this point

#

I have no idea if this is correct or not

#

📎 unknown.png

#

For this point ^

livid quartz Dec 2, 2020, 3:29 PM

#

Would t-SNE be useful for visualising this dataset (https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease) ? In the documentation it says to use PCA if the data has a large amount of variables, but i'm not too sure what constitutes as a large amount...

lapis sequoia Dec 2, 2020, 3:31 PM

#

you can use either, usually t-SNE would be preferable for extremely high dimensional data

#

t-SNE/PCA would work fine by the looks of it

livid quartz Dec 2, 2020, 3:31 PM

#

Thanks 🙂

lapis sequoia Dec 2, 2020, 3:52 PM

#

is it okay to upload sensitive data as a private dataset on kaggle?

#

for some reason the TPU on colab doesn't work well while reading data from drive

cedar sun Dec 2, 2020, 3:57 PM

#

guys, i got this loop to load the data set:

for pok in os.listdir(datadir):
    path = os.path.join(datadir, pok)
    images = os.listdir(path)
    amount = len(images)
    for i in range(amount):
        img_array = cv2.imread(os.path.join(path, images[i]), 0)
        new_array = cv2.resize(img_array, dimension)
        if i < amount * 0.8:
            train_data.append([new_array, pok])
            train_label.append([pok])
        else:
            valid_data.append([new_array, pok])
            valid_label.append([pok])

#

but it takes a while to complete. Can i run it once, export it somewhere and somehow, and the next times i just load it?

lapis sequoia Dec 2, 2020, 4:14 PM

#

save the dataset, there are many formats

#

pickle, npy, npz, you can write it to a text file. If its a numpy array best options for you are npy and npz
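A minimal sketch of caching the loaded data as `.npz`, assuming the images all share one size after resizing. The arrays, labels, and file name here are placeholders:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the lists built by the loading loop.
train_data = [np.zeros((64, 64), dtype=np.uint8), np.ones((64, 64), dtype=np.uint8)]
train_label = ["class_a", "class_b"]

path = os.path.join(tempfile.gettempdir(), "dataset_cache.npz")

# Run once: stack the images into one array and save it next to the labels.
np.savez_compressed(path, images=np.stack(train_data), labels=np.array(train_label))

# Later runs: load the cache instead of re-reading every image file.
with np.load(path) as cache:
    images = cache["images"]
    labels = cache["labels"]
```

Keeping the images and labels as two aligned arrays also means the label does not have to be stored alongside each image in the data list.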

cedar sun Dec 2, 2020, 4:39 PM

#

numpy array are only the images

#

new_array

#

since opencv loads them as numpy array

#

pok is just a string

#

also, i am thinking. train_data doesnt need to have the label if train_label exists

#

or train_label shouldnt exists. Right?

lapis sequoia Dec 2, 2020, 4:41 PM

#

Hey ! I'm using matplotlib to display activities with bars and legends, but some text is overlapping, any idea why ?

#

📎 unknown.png

#

Well, I know why

#

but I don't know how to fix it

#

also, you noticed the hours on the bottom don't exactly display hours from 00:00 to 24:00, do you know how I may be able to fix this ?

arctic wedgeBOT Dec 2, 2020, 4:43 PM

#

Hey @lapis sequoia!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

lapis sequoia Dec 2, 2020, 4:43 PM

#

https://paste.pythondiscord.com/uqitimozen.sql

#

i'm very new to matplotlib so don't understand everything in there, I copy pasted a chunk of code from stackoverflow to get the structure

carmine bough Dec 2, 2020, 4:51 PM

#

Hey, is someone familiar with opencv and a little machine learning?

frozen moth Dec 2, 2020, 4:56 PM

#

what's up guys

#

anyone know a good way to classify job seniority?

lapis sequoia Dec 2, 2020, 4:57 PM

#

📎 unknown.png

#

Anyone know how to do this? 🤔

#

And this is coming from a dataset where I have a bunch of values for petal width and length.

frozen moth Dec 2, 2020, 4:58 PM

#

iris?

lapis sequoia Dec 2, 2020, 4:58 PM

#

carmine bough Hey, is someone familiar with opencv and a little machine learning?

just ask your question directly imo

#

📎 unknown.png

#

It's the name of the data set

#

count for which the condition is met / total number of combinations

frozen moth Dec 2, 2020, 4:59 PM

#

^this

carmine bough Dec 2, 2020, 4:59 PM

#

Well I have a video and I need to recognize and display the poses left arm up and right arm up and I don't quite know how to do it

frozen moth Dec 2, 2020, 4:59 PM

#

create new feat width*length

#

df['new_feat'] = df['width'] + df['length']

lapis sequoia Dec 2, 2020, 5:00 PM

#

* not + right?

lapis sequoia Dec 2, 2020, 5:00 PM

#

carmine bough Well I have a video and I need to recognize and display the poses left arm up an...

try posenet and look for hand gesture recognition/ pose detection models on github

carmine bough Dec 2, 2020, 5:00 PM

#

lapis sequoia try posenet and look for hand gesture recognition/ pose detection models on gith...

thanks

frozen moth Dec 2, 2020, 5:02 PM

#

lapis sequoia * not + right?

yes

#

sorry

#

and then count them

#

len(df[df['new_feat'] > 1])

#

divide that value by the amount of entries in your df

lapis sequoia Dec 2, 2020, 5:03 PM

#

📎 unknown.png

frozen moth Dec 2, 2020, 5:03 PM

#

i.e. and then count them
len(df[df['new_feat'] > 1]) / df.shape[0]
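Put together, the count-and-divide approach could look like the sketch below. The column names and values are assumptions for illustration, not the actual dataset:

```python
import pandas as pd

# Hypothetical petal measurements.
df = pd.DataFrame({
    "petal_width": [0.2, 1.5, 2.0],
    "petal_length": [1.4, 4.5, 5.1],
})

product = df["petal_width"] * df["petal_length"]

# Method 1: count the rows meeting the condition, divide by the total.
by_count = len(df[product > 1]) / len(df)

# Method 2: the mean of a boolean Series is the fraction of True values.
by_mean = (product > 1).mean()
```

Both give the same empirical probability; the second is just a shorter way to write the first.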

lapis sequoia Dec 2, 2020, 5:03 PM

#

🤔

#

What is the df.shape[0]?

#

What does that mean?

#

len(df)

#

What's the difference?

frozen moth Dec 2, 2020, 5:04 PM

#

its the size of your dataframe (i.e how many samples of petals u have)

lapis sequoia Dec 2, 2020, 5:04 PM

#

theres no difference, its the number of entries

frozen moth Dec 2, 2020, 5:04 PM

#

its the same thing

#

sorry it's just a habit of mine

lapis sequoia Dec 2, 2020, 5:05 PM

#

📎 unknown.png

#

Gg

#

So what other method could I use though?

frozen moth Dec 2, 2020, 5:05 PM

#

therefore 2/3 are bigger than 3

#

than 1**

lapis sequoia Dec 2, 2020, 5:06 PM

#

question says two methods HMM

#

Hmm is right

#

📎 unknown.png

#

I took a stab at this question also

#

but meh

#

📎 unknown.png

#

I have no idea if that's right

#

but in what respect, a difference formula (mathematical approach) or a different way to query the data frame

#

you could train a logistic regression model which gives probability that your condition is true, given that class 1: product >1 class 2: product <=1

#

📎 unknown.png

#

there was also this question

#

Which I have no idea about

#

Are outliars judged by their distance from the average?

frozen moth Dec 2, 2020, 5:10 PM

#

from the line

lapis sequoia Dec 2, 2020, 5:11 PM

#

And and what point is the max?

#

What line though?

#

What is the line

#

their distance from the linear regression line

#

📎 unknown.png

#

This?

#

I see a lot of lines here...

#

😳

frozen moth Dec 2, 2020, 5:11 PM

#

the furthest one

lapis sequoia Dec 2, 2020, 5:12 PM

#

The question says an outliar is identified as the point with maximum distance

#

but like what is the max distance?

#

There can be more than one outliar right?

frozen moth Dec 2, 2020, 5:12 PM

#

distance perpendicular to the line

magic dune Dec 2, 2020, 5:13 PM

#

I am working on a linear regression line can anyone help??? please!?

frozen moth Dec 2, 2020, 5:13 PM

#

lapis sequoia There can be more than one outliar right?

that is true, but for your case the question says that the outlier is the furthest away

lapis sequoia Dec 2, 2020, 5:13 PM

#

Ahhh

frozen moth Dec 2, 2020, 5:13 PM

#

honestly it's a sh*t definition for an outlier

lapis sequoia Dec 2, 2020, 5:13 PM

#

ahaha

#

usually the outliers problem isnt so easy xD

#

but i think its a training exercise so its ok

#

But I just need to check each point and get whichever is furthest from the line, right?

frozen moth Dec 2, 2020, 5:14 PM

#

exactly

lapis sequoia Dec 2, 2020, 5:14 PM

#

Just for loop through the data set

#

But

#

How do I get the distance from the line?

#

What do I say to get that?

#

euclidean distance

magic dune Dec 2, 2020, 5:16 PM

#

does anyone kind of understand linear regression because I am stuck

lapis sequoia Dec 2, 2020, 5:16 PM

#

Not at all

#

Ahahaa

frozen moth Dec 2, 2020, 5:16 PM

#

you have your x value (length) and your y value (width)

when you take your x value and put it into your LR eqn ^y = mx + b

you compare the real value y with the predicted value ^y

#

max(|y - ^y|): do it for all of them and take out the maximum one
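A sketch of that outlier search with NumPy, using made-up points. It fits the line with `np.polyfit`, takes the absolute vertical residual for each point, and also computes the perpendicular distance, which is the residual scaled by a constant and so picks the same point:

```python
import numpy as np

# Hypothetical data: the fourth point sits well off the trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. length
y = np.array([1.1, 1.9, 3.2, 9.0, 5.1])   # e.g. width

# Least-squares line y_hat = m * x + b.
m, b = np.polyfit(x, y, 1)
y_hat = m * x + b

# Vertical distance of each point from the line.
residual = np.abs(y - y_hat)

# Perpendicular distance from the same line: |y - y_hat| / sqrt(1 + m^2).
perpendicular = residual / np.sqrt(1 + m**2)

# Index of the point furthest from the line.
outlier = int(np.argmax(residual))
```

No explicit loop is needed, because the subtraction and `argmax` work on the whole array at once.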

magic dune Dec 2, 2020, 5:17 PM

#

frozen moth you have your x value (length) and your y value (width when you take your x val...

thx

#

that really helps

frozen moth Dec 2, 2020, 5:17 PM

#

np

lapis sequoia Dec 2, 2020, 5:20 PM

#

lapis sequoia is it okay to upload sensitive data as a private dataset on kaggle?

anyone know this?

frozen moth Dec 2, 2020, 5:21 PM

#

no clue

#

is the data NDA stuff?

lapis sequoia Dec 2, 2020, 5:22 PM

#

yes

frozen moth Dec 2, 2020, 5:22 PM

#

then i wouldn't

#

even being private

lapis sequoia Dec 2, 2020, 5:23 PM

#

not my first choice either but I'm having issues on colab TPU

serene scaffold Dec 2, 2020, 5:23 PM

#

Is there a way to transform this dataframe:

           0    1
0   0.435752  0.0
1   0.296690  0.0
2   0.737365  2.0
3   0.332111  1.0
4   0.030198  1.0

into this:

0     1 
0.0   0.435752  0.296690
1.0   0.332111  0.030198
2.0   0.737365

#

I know it's no longer rectangular data

frozen moth Dec 2, 2020, 5:24 PM

#

split the df and then merge?

#

nvm read it wrong

serene scaffold Dec 2, 2020, 5:25 PM

#

I thought it might be the pivot method

lapis sequoia Dec 2, 2020, 5:26 PM

#

have you tried groupby

serene scaffold Dec 2, 2020, 5:26 PM

#

I didn't think that would have plotting functionality. I'm making density distribution plots for three classes.

rustic dew Dec 2, 2020, 5:27 PM

#

in pivot you need to have unique indices

lapis sequoia Dec 2, 2020, 5:30 PM

#

I'm thinking groupby column 1 and make a function that returns values having the number, maybe would take some more editing to get the column name in order

#

let me try and get back to you
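One possible take on the groupby idea, sketched with the five rows from the example. Grouping on column `1` and collecting column `0` into lists gives one ragged entry per class, and a second step pads it into a rectangular frame:

```python
import pandas as pd

df = pd.DataFrame({
    0: [0.435752, 0.296690, 0.737365, 0.332111, 0.030198],
    1: [0.0, 0.0, 2.0, 1.0, 1.0],
})

# One list of values per class label (a Series of lists, not rectangular).
grouped = df.groupby(1)[0].apply(list)

# Optional: pad the shorter lists with NaN to get a regular DataFrame back.
wide = pd.DataFrame(grouped.tolist(), index=grouped.index)
```

For plotting one distribution per class, iterating over `df.groupby(1)[0]` directly may be enough, without reshaping at all.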

radiant ingot Dec 2, 2020, 5:48 PM

#

Hey everyone, I'm working with time series data and could use some opinions on the best way to format dates. I have to choose between datetime.datetime or numpy.datetime64 objects.

#

Leaning towards the native datetime library, but I thought that datetime64 may play nicer with certain models? Anyone run into this before?

rustic dew Dec 2, 2020, 5:49 PM

#

I'd say if you use numpy for everything, roll with np.datetime64, if pandas, use pandas own datetimes, if mix or not sure, go with datetime.datetime

#

worst-case-scenario, you can always convert
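A short sketch of converting between the three representations, with an arbitrary example timestamp:

```python
from datetime import datetime

import numpy as np
import pandas as pd

dt = datetime(2020, 12, 2, 17, 48)

# datetime.datetime -> numpy.datetime64 (microsecond precision)
dt64 = np.datetime64(dt)

# numpy.datetime64 -> pandas.Timestamp
ts = pd.Timestamp(dt64)

# pandas.Timestamp -> datetime.datetime
back = ts.to_pydatetime()

# numpy.datetime64 -> datetime.datetime, without going through pandas
direct = dt64.item()
```

Note that `.item()` only returns a `datetime.datetime` for microsecond or coarser precision; a nanosecond `datetime64` comes back as an integer, so going through `pd.Timestamp` is the safer route for values taken out of a pandas column.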

radiant ingot Dec 2, 2020, 5:50 PM

#

Right now we use a mix of pandas and numpy

#

Thanks, appreciate the thoughts

rustic dew Dec 2, 2020, 5:52 PM

#

although a bit old, but mostly still valid SO answer on converting: https://stackoverflow.com/a/13704307

Stack Overflow

Converting between datetime, Timestamp and datetime64

How do I convert a numpy.datetime64 object to a datetime.datetime (or Timestamp)?

In the following code, I create a datetime, timestamp and datetime64 objects.

import datetime
import numpy as np
...

#

so practically, you can choose whichever you like :) personally, I prefer native datetime.datetime, not sure why...

radiant ingot Dec 2, 2020, 5:53 PM

#

Yeah I'm a bit spoiled because we were working in R before and the lubridate package made my life so easy haha

arctic wedgeBOT Dec 2, 2020, 5:55 PM

#

Hey @fallow prism!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia Dec 2, 2020, 6:09 PM

#

serene scaffold Is there a way to transform this dataframe: 0 1 0 0.435752...

could you solve?

serene scaffold Dec 2, 2020, 6:09 PM

#

lapis sequoia could you solve?

I haven't solved that yet

lapis sequoia Dec 2, 2020, 6:14 PM

#

is there a way to convert list to a dataframe row

lapis sequoia Dec 2, 2020, 6:33 PM

#

I export my env to a .yaml and import it into Anaconda Navigator, then I cd to the path and find out that my project folder is missing. So I go back to my laptop, cp the folder, mv it to the cloud, and then cp it to the path. Any idea how to do this faster, or how do you do it?

#

whats the objective

#

I work on my laptop most of the time but I decided to work the project on the desktop.

#

I'm curious about if there is a better or faster way to achieve this.

radiant ingot Dec 2, 2020, 6:41 PM

#

Could this be done with git?

#

I'm no expert at these things but we use git (via github) for version control on all our code, perhaps you could just commit your env yaml as well and then pull whenever you want to work on a new machine

lapis sequoia Dec 2, 2020, 6:45 PM

#

I will look into it, I think it could work, thanks for the idea.

lapis sequoia Dec 2, 2020, 6:59 PM

#

serene scaffold I haven't solved that yet

i got close but couldnt get exactly the same

#

nvm that

serene scaffold Dec 2, 2020, 7:05 PM

#

lapis sequoia i got close but couldnt get exactly the same

ended up doing pd.DataFrame({cls: normalized_train[normalized_train[1] == cls].iloc[:, 0] for cls in {0., 1., 2.}})
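For comparison, a sketch of the groupby route suggested earlier, assuming the same two-column layout as the sample frame (values in column 0, class labels in column 1). The names `df`, `grouped` and `wide` are made up here. Unlike the dict-of-Series version, this one left-aligns each class's values and pads the short rows with NaN:

```python
import pandas as pd

# The sample frame from the question: values in column 0, class in column 1.
df = pd.DataFrame({
    0: [0.435752, 0.296690, 0.737365, 0.332111, 0.030198],
    1: [0.0, 0.0, 2.0, 1.0, 1.0],
})

# Collect the values of column 0 into one list per class.
grouped = df.groupby(1)[0].apply(list)

# One row per class; shorter rows are padded with NaN.
wide = pd.DataFrame(grouped.tolist(), index=grouped.index)
```

For density plots per class, the ragged `grouped` Series may already be enough, since each entry can be plotted on its own.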

#

@umbral oracle We don't allow people to recruit for paid opportunities of any kind here.

vagrant parcel Dec 2, 2020, 7:36 PM

#

Hey guys, has anyone used DataQuest? I'm thinking about paying for the pro year, but I wanted to hear from someone who has used it (if this is not the place, where can I talk about it?)

umbral oracle Dec 2, 2020, 7:37 PM

#

serene scaffold @umbral oracle We don't allow people to recruit for paid opportunities o...

Oh sorry!

lapis sequoia Dec 2, 2020, 7:38 PM

#

vagrant parcel Hey guys, has anyone used DataQuest? I'm thinking about paying for the pro year, but...

If you study, you can get the pro version for free. Ask your Prof.

vagrant parcel Dec 2, 2020, 7:41 PM

#

I don't think my college here in Brasil has a partnership with them... But I'll ask anyway

frozen moth Dec 2, 2020, 7:43 PM

#

lapis sequoia Not at all

📎 1.png

frozen moth Dec 2, 2020, 7:43 PM

#

frozen moth

📎 2.png

frozen moth Dec 2, 2020, 7:43 PM

#

frozen moth

📎 3.png

frozen moth Dec 2, 2020, 7:43 PM

#

frozen moth

📎 4.png

#

Guys has anyone here classified job seniority based on job descriptions? [NLP]

prime cloud Dec 2, 2020, 8:16 PM

#

I am trying to implement an environmental sound classifier using the urban sounds 8k data set, but my validation loss seems to grow with the epochs. Any idea why?

📎 unknown.png

#

The reference paper I am using gets about 74% accuracy

#

https://arxiv.org/pdf/1608.04363.pdf

solemn oracle Dec 2, 2020, 8:45 PM

#

I just moved to a new computer and am having trouble getting pandas to show my graphs in atom. It says it’s finished but shows nothing

#

Anything simple I’m missing?

#

Far as I can tell, I’m just doing df.plot()

lapis sequoia Dec 2, 2020, 9:25 PM

#

Hey guys, so there’s this job opening for “Artificial Intelligence Engineer” role at this company that I am thinking I should apply to... this is the job post ... any tips on how I should prepare for that and what to study... I am fairly new to this

📎 image0.png

austere swift Dec 2, 2020, 9:37 PM

#

that would be more of a question to ask #career-advice

lapis sequoia Dec 2, 2020, 9:38 PM

#

Ahh okay sry

#

Thought I’d ask the data science people for some resources or tips

austere swift Dec 2, 2020, 9:39 PM

#

its alr, just application and job stuff is more in that realm, although one tip i'd give you is to have some sort of example project you could show them

frozen moth Dec 2, 2020, 9:40 PM

#

i study data science engineering and I'm not quite sure what an AI engineer is

#

i would assume that an AI eng would have to know NLP, be comfortable with algorithms such as A* and be able to work out constraint satisfaction problems etc, but the description for that job seems to be something a data scientist would do?

#

or maybe not

#

honestly idk

austere swift Dec 2, 2020, 9:43 PM

#

its more of someone who can make machine learning/deep learning models to run in the field

lapis sequoia Dec 2, 2020, 9:43 PM

#

I think they just mean data science/machine learning

frozen moth Dec 2, 2020, 9:43 PM

#

fair enough

lapis sequoia Dec 2, 2020, 9:45 PM

#

Know any good resource for learning some of the maths related to data science

#

Forgot most of my university maths 😅

frozen moth Dec 2, 2020, 9:46 PM

#

it's basically statistics

#

and machine learning (SVM, LR, DT, RF, ANN, NB, etc.)

#

brush up on your multivariate analysis and statistics

lapis sequoia Dec 2, 2020, 9:50 PM

#

Hmmm

frozen moth Dec 2, 2020, 9:50 PM

#

were you looking for something more specific?

lapis sequoia Dec 2, 2020, 9:51 PM

#

So my only experience with data science was like 2.5-3 years ago at my 5-6 months internship... was getting the hang of it until I stopped and life continued

frozen moth Dec 2, 2020, 9:51 PM

#

whats your background?

lapis sequoia Dec 2, 2020, 9:51 PM

#

Computer science degree and currently working in ASP.Net

#

But I kept using python here and there for automation and scripting

frozen moth Dec 2, 2020, 9:52 PM

#

yea data science is mostly scripting

#

since you're compsci i assume your programming skills are good

#

so i'd say focus on the math and some info viz

#

the math you require is, like I said, stats, multivariate analysis and all that ML mumbo jumbo

#

you've got some pretty neat O'Reilly textbooks that focus on the math behind data science

#

you can torrent them for free

#

at http://libgen.li/

lapis sequoia Dec 2, 2020, 9:55 PM

#

They say statistics, probability theory, machine learning algorithms and data modeling

#

In the post

frozen moth Dec 2, 2020, 9:56 PM

#

yup sounds about right

lapis sequoia Dec 2, 2020, 9:56 PM

#

And python data science stack, I’ve only used like pandas, numpy and some scikit learn from what I remember at my internship

#

Is this what they mean with that

frozen moth Dec 2, 2020, 9:58 PM

#

idk tbh but it must be

waxen birch Dec 2, 2020, 9:58 PM

#

📎 unknown.png

frozen moth Dec 2, 2020, 9:58 PM

#

you've got the python software packages that are common thru out all DS: sklearn, numpy, pandas, matplotlib

waxen birch Dec 2, 2020, 9:59 PM

#

hello, given data like this in a csv, i would like to create a df with a period of time, in this case: Doctorid1 period 12:00-12:16

frozen moth Dec 2, 2020, 9:59 PM

#

and then you have the ML ones like tensorflow/keras, and sklearn,

waxen birch Dec 2, 2020, 9:59 PM

#

using pandas and groupby, does anyone have any clues? 🙂

frozen moth Dec 2, 2020, 9:59 PM

#

the info viz stuff: seaborn, yellowbrick, dash plotly etc

#

the NLP ones like spaCy and NLTK

lapis sequoia Dec 2, 2020, 10:00 PM

#

Uff yeah those I remember from my internship the NLP ones

frozen moth Dec 2, 2020, 10:00 PM

#

then more specific ones ... for example if you're dealing with networks you'd use NetworkX, powerlaw etc.

#

i guess through practice you'll start accumulating knowledge on these libraries

frozen moth Dec 2, 2020, 10:01 PM

#

frozen moth you've got the python software packages that are common thru out all DS: sklearn...

but these ones are a must

#

thats like your foundation

lapis sequoia Dec 2, 2020, 10:03 PM

#

Right, lets see what I can do... the sucky thing is that my laptop is broken so I only have the PC at work to try and squeeze some learning while no one is looking 😅

frozen moth Dec 2, 2020, 10:03 PM

#

good luck there buddi

#

a couple of good places to start are kaggle.com and https://towardsdatascience.com/

lapis sequoia Dec 2, 2020, 10:05 PM

#

Nice, was looking also at a site called analytics vidhya

#

Don’t know if they’re good

frozen moth Dec 2, 2020, 10:06 PM

#

https://math2510.coltongrainger.com/books/2017-bruce-and-bruce-pratical-statistics-for-data-scientists.pdf

#

^^^ apparently you don't even have to download the textbook

#

its all there

frozen moth Dec 2, 2020, 10:07 PM

#

lapis sequoia Nice, was looking also at a site called analytics vidhya

🤔 didn't know this one, deffo gonna check it out

lapis sequoia Dec 2, 2020, 10:08 PM

#

Havent looked into them much but was reading a medium article by them

kind jungle Dec 2, 2020, 10:12 PM

#

can someone please explain to me what is wrong

#

this just baffles me

#

📎 unknown.png

spark stag Dec 2, 2020, 10:14 PM

#

those aren't plain " quotes you're using, so python doesn't see data.csv as a string. it sees a variable name with some other type of quote character in front of it (causing the invalid character error)

south hedge Dec 2, 2020, 10:15 PM

#

kind jungle this just baffles me

the () should contain the name of the file

kind jungle Dec 2, 2020, 10:15 PM

#

it does

#

jamiesaunders was right

lapis sequoia Dec 2, 2020, 10:15 PM

#

Fix the quotes

#

Yeah

kind jungle Dec 2, 2020, 10:15 PM

#

the first quote was apparently a "LATIN SMALL LETTER A WITH CIRCUMFLEX "

trim oar Dec 2, 2020, 10:15 PM

#

Problem is I don't understand how it interpreted the quotes like that
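One plausible explanation, offered as a guess since the screenshot is not available: the line was pasted with curly "smart" quotes, and the UTF-8 bytes of a curly quote were then read as Windows-1252, which turns the quote into a sequence starting with "â". The sketch below uses a made-up pasted line to show how to name the offending characters and how that "â" can come about:

```python
import unicodedata

# Hypothetical line as it might look after pasting from a web page.
pasted = "pandas.read_csv(\u201cdata.csv\u201d)"

# Name every non-ASCII character so the offender is easy to spot.
suspects = [(ch, unicodedata.name(ch)) for ch in pasted if ord(ch) > 127]

# Swap the curly quotes for plain ASCII double quotes.
fixed = pasted.replace("\u201c", '"').replace("\u201d", '"')

# UTF-8 bytes of a curly opening quote, misread as Windows-1252.
misread = "\u201c".encode("utf-8").decode("cp1252")
first_char_name = unicodedata.name(misread[0])
```

Retyping the quotes by hand in the editor is the quickest fix; the script is mainly useful for finding out what the character actually is.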

south hedge Dec 2, 2020, 10:16 PM

#

kind jungle can someone please explain to me what is wrong

remove the (.csv), but instead ("data_csv"). that should do the trick

kind jungle Dec 2, 2020, 10:16 PM

#

according to a character identifier

#

it works now

#

thx

#

:)

fallow prism Dec 2, 2020, 10:35 PM

#

how can i make dataframe.head() show me all rows?

austere swift Dec 2, 2020, 11:17 PM

#

why not do print(dataframe) instead?

waxen birch Dec 2, 2020, 11:20 PM

#

with this kind of data, using pandas, i need to print a period of time in one row (cell), in this case it should be 12:00 - 12:16. any clues? 😄

📎 unknown.png
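The attached data is not visible here, so the following is only a sketch under assumptions: one row per appointment, with a doctor id column and a time-of-day column. The column names `doctor_id` and `time` and all the values are invented for illustration:

```python
import pandas as pd

# Made-up stand-in for the CSV contents.
visits = pd.DataFrame({
    "doctor_id": ["Doctorid1", "Doctorid1", "Doctorid1", "Doctorid2", "Doctorid2"],
    "time": ["12:00", "12:08", "12:16", "09:30", "09:45"],
})

# Parse the times so that min/max compare chronologically.
visits["time"] = pd.to_datetime(visits["time"], format="%H:%M")

# Earliest and latest time per doctor.
span = visits.groupby("doctor_id")["time"].agg(["min", "max"])

# Put the whole period into a single cell.
span["period"] = (
    span["min"].dt.strftime("%H:%M") + "-" + span["max"].dt.strftime("%H:%M")
)
```

With real data, `pd.read_csv` would replace the hand-built frame, and the time format string would need to match the file.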

torpid cave Dec 2, 2020, 11:21 PM

#

Sorry @lapis sequoia I went to sleep

#

Still need help?

fallow prism Dec 2, 2020, 11:41 PM

#

austere swift why not do `print(dataframe)` instead?

I had not thought of it

#

still cut it

#

my problem is the width, i need more width for each row or wrap rows

#

dataframe.apply(print) and that is all

#

or Series.apply(print)

#

thanks !

#

oh, that doesn't work 😢

#

📎 unknown.png

#

those 3 dots

#

i don't like them

#

a['descripcion_del_hecho - Final'][:5].apply(print) that works fine for me i guess the other ways is mor difficult

#

more*

#

😅

#

pd.options.display.max_colwidth=None

#

that works better
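A small sketch of the same setting applied only temporarily, assuming pandas 1.0 or newer (where `None` means "no limit" for `display.max_colwidth`). The frame and its `description` column are made up:

```python
import pandas as pd

long_text = "x" * 120
df = pd.DataFrame({"description": [long_text]})

# Default settings cut long cells off and end them with "...".
short_view = repr(df)

# Lift the limits only inside the with-block.
with pd.option_context(
    "display.max_colwidth", None,
    "display.max_rows", None,
    "display.max_columns", None,
):
    full_view = repr(df)
```

`display.max_rows` set to `None` is the part that covers the earlier question about showing every row; `df.head()` itself only ever returns the first few rows unless given a larger `n`.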

river yarrow Dec 3, 2020, 12:11 AM

#

anyone with kaggle competition experience?

blazing bridge Dec 3, 2020, 12:13 AM

#

are you asking if someone wants to do a kaggle competition with you

river yarrow Dec 3, 2020, 12:13 AM

#

Do I have the right to edit the notebook after a competition deadline in Kaggle is over?

blazing bridge Dec 3, 2020, 12:13 AM

#

not sure

river yarrow Dec 3, 2020, 12:15 AM

#

I found

#

You can make a submission at any time and as many times as you like, but we will only consider your latest submission before the deadline.

magic dune Dec 3, 2020, 12:23 AM

#

I need help writing a linear regression code can someone help???

#

@glad mulch here is my code, I want to make a linear regression line
```py
import os

import pandas
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

root = os.path.dirname(os.path.abspath(__file__))
data_dir = os.path.join(root, "data")
fig_dir = os.path.join(data_dir, "figs")


def make_plt(x, y, df):
    x_list = df[x].to_list()
    y_list = df[y].to_list()
    x_train, x_test, y_train, y_test = train_test_split(
        x_list, y_list, test_size=0.2, random_state=42
    )
    linear = LinearRegression()  # created, but not fitted or drawn yet
    # draw the points first, then the legend, so the labels match the data
    plt.scatter(x_train, y_train, label="train")
    plt.scatter(x_test, y_test, label="test")
    plt.title("Coding Books")
    plt.legend()
    os.makedirs(fig_dir, exist_ok=True)  # savefig fails if the folder is missing
    plt.savefig(os.path.join(fig_dir, f"{x}-{y}.png"))
    plt.close()


def main():
    data_raw = os.path.join(data_dir, "prog_book.csv")
    raw_df = pandas.read_csv(data_raw)
    raw_df["Reviews"] = raw_df["Reviews"].str.replace(",", "")
    raw_df["Reviews"] = raw_df["Reviews"].astype(int)
    # plot price versus rating plot, step #1: turn columns into lists
    lists = ["Rating", "Reviews", "Number_Of_Pages", "Type", "Price"]

    for col in lists:
        for col2 in lists:
            if col2 != col:
                make_plt(col, col2, raw_df)

    # step #2: use plt.plot to plot the lists
    print(lists)
    print(type(lists[0]))

    # step #3: export the plot to a pdf

    # regression lines


if __name__ == "__main__":
    main()
```

#

I do not know how to make the line

#

I know the different equations but other than that I have no idea what I am supposed to do
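A minimal sketch of how the missing line could be drawn, assuming both columns are numeric (a text column such as a category would need encoding first). The function name `plot_with_fit` and the demo data are made up; the idea is to fit `LinearRegression` on the training split and plot its predictions over the x range:

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, the figure is only saved

import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def plot_with_fit(x_values, y_values, out_path):
    # scikit-learn expects a 2-D feature array: one row per sample.
    X = np.asarray(x_values, dtype=float).reshape(-1, 1)
    y = np.asarray(y_values, dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit on the training split only.
    model = LinearRegression().fit(X_train, y_train)

    # Predict along the full x range to get the points of the line.
    line_x = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
    line_y = model.predict(line_x)

    plt.scatter(X_train[:, 0], y_train, label="train")
    plt.scatter(X_test[:, 0], y_test, label="test")
    plt.plot(line_x[:, 0], line_y, color="red", label="fit")
    plt.legend()
    plt.savefig(out_path)
    plt.close()
    return model


# Demo with synthetic data that lies exactly on y = 2x + 1.
x_demo = np.arange(20)
y_demo = 2 * x_demo + 1
demo_path = os.path.join(tempfile.gettempdir(), "demo-fit.png")
model = plot_with_fit(x_demo, y_demo, demo_path)
```

After fitting, `model.coef_[0]` is the slope and `model.intercept_` is the intercept, which is where the familiar line equation shows up.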

#

thank you so much

#

you're a big life saver

neat dew Dec 3, 2020, 1:30 AM

#

can anyone help me install tensorflow on IDLE? i seem to keep getting callback errors when attempting to import and need it working for a school assignment 😦

cedar sun Dec 3, 2020, 2:07 AM

#

ValueError: Input 0 is incompatible with layer conv2d_1: expected ndim=4, found ndim=3

#

i am getting this error

#

my images are black and white

#

img_array = cv2.imread(os.path.join(path, images[i]), 0)

#

opening them with 0 turns them into black and white i guess

#

```py
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=dimension))
```

#

dimension = (64, 64)

#

where is the error?

sharp stump Dec 3, 2020, 2:12 AM

#

the number of dimensions is different i guess ¯_(ツ)_/¯
you could use stack overflow...
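To spell the dimension mismatch out: with the default channels-last layout, a Keras `Conv2D` layer works on batches shaped (batch, height, width, channels), so `input_shape` needs three entries, and grayscale images read with flag 0 come back as plain 2-D arrays with no channel axis. A sketch of adding that axis, using zeros as a stand-in for the loaded images (the batch size of 10 is arbitrary):

```python
import numpy as np

# Stand-in for a stack of 64x64 grayscale images from cv2.imread(path, 0).
images = np.zeros((10, 64, 64), dtype=np.uint8)

# Add a trailing channel axis and scale pixel values to [0, 1].
x = images[..., np.newaxis].astype("float32") / 255.0

# Per-sample shape to pass as input_shape: (64, 64, 1) instead of (64, 64).
dimension = x.shape[1:]
```

With `dimension = (64, 64, 1)` and `x` as the training input, the layer should see the 4-D batch it expects.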