#data-science-and-ml


lapis sequoia
#

Do you all use PyTorch or TF for modelling?

ripe forge
#

Whichever has an easier github repo at the time 😛

covert spire
#

Where can i find data science projects to do as a beginner?

#

Like, is there an archive or smth for it

still delta
#

Can someone suggest a challenge for me?

#

I want to take part in a challenge.

ashen socket
#

@still delta @covert spire You can try kaggle. It has everything you might need. It has datasets for you to use. It has solutions. It has contests and a lot more.

still delta
#

Is there any team working?

vapid burrow
#

Consider joining a code jam

#

You can form teams and work together

paper nacelle
#

Hi. I need help with ARIMA.
I am using the code below to find p, d and q. (I saw it somewhere; it worked fine with another dataset but not with the one I'm trying currently.) It is to predict the number of daily COVID cases in a country.
```py
import pmdarima as pm

model = pm.auto_arima(train, start_p=1, start_q=1,
                      test='adf',        # use the ADF test to find the optimal 'd'
                      max_p=3, max_q=3,  # maximum p and q
                      m=1,               # frequency of series
                      d=None,            # let the model determine 'd'
                      seasonal=False,    # no seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

print(model.summary())
```
It is giving me the following error:
`ValueError: Input contains NaN, infinity or a value too large for dtype('float64').`

#

I have removed all NaN values tho....
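
One thing that may be worth checking: the error message also covers infinite values, and dropping NaN does not remove those. The sketch below assumes `train` is a pandas Series; the sample values and the helper name `find_non_finite` are made up for illustration, and the real cause could still be something else (for example a transform applied before fitting).

```python
import numpy as np
import pandas as pd

def find_non_finite(series: pd.Series) -> pd.Series:
    """Return the entries that are NaN, +inf, -inf or not numeric at all."""
    values = pd.to_numeric(series, errors="coerce")  # non-numeric entries become NaN
    return series[~np.isfinite(values)]

# Hypothetical training series containing both NaN and infinite values.
train = pd.Series([10.0, 12.0, np.inf, 15.0, np.nan, -np.inf, 20.0])

bad = find_non_finite(train)  # rows that would trigger the ValueError
clean = train.replace([np.inf, -np.inf], np.nan).dropna()  # keep finite values only
```

If `bad` is empty on the real data, the non-finite values are probably being produced later, inside a preprocessing step rather than in the raw series.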

lapis sequoia
#

guys, how do i train a cnn with more than 2 classes???

livid quartz
#

Any ideas on how to plot a NumPy array? I've tried pcolor but that plots a mirrored version of the array and imshow() grid lines dont surround the array values properly

vague portal
#

Thanks! I've found a module which is a pipeline for processing text https://spacy.io/usage/spacy-101#pipelines, this is pretty similar to what I'm trying to achieve but with dataframes instead. https://github.com/explosion/spaCy/blob/master/spacy/pipeline/pipes.pyx --> this is the source code but I'm struggling to understand how they've built and organised the classes 😦

wintry olive
#

Sounds good. I'm going to focus on researching statistics & probability concepts for awhile. I understand how the comp sci processes work and what the models are trying to do. I could trial and error through it and my vision is reliant on chaos theory initial conditions for emergent fractal simulations not end to end engineered simulations. but Id like to have a decent understanding of statistics and probability concepts nevertheless. so i can help with validations, bias and variables et al.

#

speaking of which; is probability just another statistic? or are all statistics just a probability assessment?

wintry olive
#

of course there is an uncertainty principle to factor into this and that considering most of my understanding of statistics/probability comes from casual study of physics so symmetries/asymmetries, entropy, distributions and standard of deviations.

wintry olive
#

yup with financial and business intelligence/analytics at the top

summer cobalt
#

Can someone rate my code from a purely engineering perspective?

#

Thanks!

verbal light
#

Hi, I know the basics of machine learning and R-CNN theory. I want to make an object recognition program with Google/custom images. Can you recommend some algorithms and example implementations with TensorFlow? I know that Fast R-CNN and Faster R-CNN are better/harder, but I think there would be more examples, so I'm open to suggestions. I'm trying to work on vest.ai machines because of my computer's specs.

wintry olive
#

ahh the keyword is Vision Transformer not pixel word

#

hmm not sure @verbal light I tried to look if there were any APIs to use google image or bing image search engine

verbal light
wintry olive
#

ahh thats like a corpus of images

#

in sets already

#

not sure man ive been all NLP

#

whoa thats big data too

#

one thing NLP can do with character search and semantics of collocates et all is allow for the creation of smaller sub sets of virtual corpora

#

i suppose you could extract subsets based on this:

#

idk

#

heres a good old fashion cnn course

ionic isle
#

Mathematics for Machine Learning
This is a 400-page free book about the mathematics needed for machine learning. It covers the things you need to know in order to get started with machine learning.
https://mml-book.com/

trim imp
#

Hi, I have a question and it is hard to explain but I am trying. Can I summarize text from PDF and then classified the main topic? For example, some text has different topics divided into multiple articles. I want the topics with summary. So I hope that’s clear. How can I do it using Python? And which library would be useful. Thanks

livid quartz
#

Does anyone know how to change figure size in using plt.subplot?

#

It doesn't work the same as plt.subplots()
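
A possible way around this, sketched under the assumption that plain matplotlib is in use: `plt.subplot` takes no `figsize` argument, but the figure it draws into can be created with one first, or resized afterwards. The plotted data here is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Create the figure with the wanted size first, then add subplots to it.
fig = plt.figure(figsize=(10, 4))

ax_left = plt.subplot(1, 2, 1)
ax_left.plot([0, 1, 2], [0, 1, 4])

ax_right = plt.subplot(1, 2, 2)
ax_right.plot([0, 1, 2], [4, 1, 0])

# An existing figure can also be resized after the fact.
fig.set_size_inches(12, 5)
```

`plt.gcf().set_size_inches(...)` does the same thing when no handle to the figure was kept.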

south quest
#

basically they claim high school maths

lapis sequoia
#

hi. Could u help me making a cnn for image classification? All the examples ive seen are about cat/dogs with already image dataset from keras. But i want to use my own data set, and there are more than 2 categories

oblique vine
#

@up just use other model than binary-crossentropy

#

use categorical with number of categories specified

#

or sparse-categorical 😛
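
For context, these are names of loss functions rather than models: in Keras they are `categorical_crossentropy` (one-hot labels) and `sparse_categorical_crossentropy` (integer labels). The NumPy sketch below is not Keras code; it only illustrates, with made-up logits, that the two compute the same quantity from differently encoded labels.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into per-class probabilities, row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(one_hot, probs):
    """Loss when labels are one-hot vectors, e.g. [0, 0, 1]."""
    return -np.sum(one_hot * np.log(probs), axis=1)

def sparse_categorical_crossentropy(labels, probs):
    """Loss when labels are plain class indices, e.g. 2."""
    return -np.log(probs[np.arange(len(labels)), labels])

# Two samples, three classes (hypothetical network outputs).
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
probs = softmax(logits)

labels = np.array([0, 2])      # integer labels
one_hot = np.eye(3)[labels]    # the same labels, one-hot encoded

loss_one_hot = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(labels, probs)
```

Either way, the output layer needs one unit per class with a softmax activation instead of the single sigmoid unit used in the cat/dog examples.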

lapis sequoia
#

thats the name of the model i need?

oblique vine
#

idk, most likely

#

find any digit recognition tutorial and read the code

#

it will be most likely something really similar to google one, but with categorical model

wintry olive
#

i do have ideas for computer vision but without doing more research i have no way to determine how relevant, viable or outlandish the ideas are. until i finish working with NLP someone run a highly experimental unsupervised learning model with the mandelbrot set zoom as the input image. id love to take a look at what the model sees

#

and apply it to this:

cold yarrow
#

Clustering is cool

lapis sequoia
#

guys my train data seems like this

#

how can i pass that as train data for a cnn?

#

i already got the labels, which are basically the name of the folders

#

but inside each folder there are images

#

how can i tell the cnn "all the images from this folder correspond to this label"

#

tf.keras.preprocessing.image_dataset_from_directory turns image files sorted into class-specific folders into a labeled dataset of image tensors.

#

is this what i want?

#

well, it says i dont have such a function

austere swift
#

and you'd set label mode to whatever label mode you'd want to use, by default it's sparse labels

lapis sequoia
#

yeah but that method isnt implemented i think

#

not on tensorflow 2

#

The specific function (tf.keras.preprocessing.image_dataset_from_directory) is not available under TensorFlow v2.1.x or v2.2.0 yet.

#

so... could u help me to do it manually?

mossy dragon
#

@trim imp

#

nltk would be useful

#

im not sure about summarizing, but you can def do topic modeling

#

Topic modeling is an unsupervised machine learning technique that's capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.

obtuse skiff
#

Hello, Im working on utilizing tripletloss for MNIST. I got something running and the Loss for the Training and Validation is getting smaller as expected every epoch, but the Accuracy is sticking around 18-20% and its just basic MNIST so something is def wrong,
I have a basic 2 conv layer 3 fc layer architecture. I put the anchor, pos and neg through that model, then put those results into the TripletMarginLoss on pytorch

any recommendations on what I can do?

wary sand
#

hello, im looking for an A.I server

#

im interested in A.I and python

#

any suggestions?

gaunt tusk
wary sand
#

im already in that one, but im looking for a small and you know less active server

gaunt tusk
#

That makes absolutely no sense

#

You would rather a small inactive server over one with 9000+ people

wary sand
#

no, this server is very active and often people dont reply to my questions so

#

thats why i need a small server

ocean mountain
#

Hii

#

Can anyone help with this

#

🙄

fallow prism
#

use pandas please

ocean mountain
#

Already

#

But same problem

fallow prism
#

The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

file = open(filename, encoding="utf8")

https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character
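
If the encoding is unknown, one rough approach is to try a few candidates in order. This is only a sketch: the helper name and the candidate list are assumptions, and because `latin-1` accepts every byte, a successful decode with it does not prove the text is right.

```python
import tempfile
from pathlib import Path

def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn; return the text and the encoding that worked."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.read(), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")

# Demo on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.txt"
    utf8_file.write_bytes("café".encode("utf-8"))
    text_a, enc_a = read_text_with_fallback(utf8_file)

    # 0xE9 followed by a space is invalid UTF-8, and 0x90 is undefined in cp1252.
    odd_file = Path(tmp) / "odd.txt"
    odd_file.write_bytes(b"caf\xe9 \x90")
    text_b, enc_b = read_text_with_fallback(odd_file)
```

With pandas, the same idea applies through the `encoding` argument of `pd.read_csv`.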

ocean mountain
#

Thanks

#

I am trying this now

fallow prism
#

your file is using another encoding and you have to find out which one it is

ocean mountain
#

Thanks 👍

fallow prism
#

😸

ocean mountain
#

I am implementing it

#

😌

fallow prism
#

do you know of a site for learning about CNNs?

ocean mountain
#

Nop bro

#

Are you in Hactoberfest grp bro ?

fallow prism
#

what is that?

#

hahaha

ocean mountain
#

An open source event grp

fallow prism
#

tell me more

ocean mountain
#

Which happens in October every year

#

More than 70k Dev's take part in it

#

It's the biggest open source event

fallow prism
#

grumpchib but is november now

ocean mountain
#

Yep

copper kindle
#

Any expertise using the orange software for data analysis and visualization ? what is the difference between t-SNE block and manifold t-SNE block? why both results are not the same ?

lapis sequoia
lapis sequoia
# lapis sequoia ``tf.keras.preprocessing.image_dataset_from_directory`` turns image files sorted...

https://stackoverflow.com/questions/54921711/interactive-labeling-of-images-in-jupyter-notebook

Intresting Post about Image Labeling in Python. Maybe you can extract the important part for your function

fallow prism
#

what does it mean when the shape of an array is (n,)?

#

why ','?

lapis sequoia
leaden vessel
#

lf help with periodogram of sinusoidal signals with normalized frequency and dB power

#

signal:

N = 1024; f1 = 500; f2 = 1200; fs = 8000 #Hz
n = np.arange(N)
Sn = 0.5*np.sin(2*np.pi*n*f1/fs) + np.sin(2*np.pi*n*f2/fs)
#

I have to generate PSD (power spectral density) with and without Hann window
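
A NumPy-only sketch of one way to do this, reusing the signal above. The scaling convention (one-sided PSD per unit of normalized frequency in cycles/sample) and the helper name `periodogram_db` are assumptions; `scipy.signal.periodogram` would be the usual ready-made alternative.

```python
import numpy as np

N = 1024; f1 = 500; f2 = 1200; fs = 8000  # Hz
n = np.arange(N)
Sn = 0.5 * np.sin(2 * np.pi * n * f1 / fs) + np.sin(2 * np.pi * n * f2 / fs)

def periodogram_db(x, window):
    """One-sided periodogram in dB versus normalized frequency (cycles/sample)."""
    spectrum = np.fft.rfft(x * window)
    psd = np.abs(spectrum) ** 2 / np.sum(window ** 2)  # normalize by window energy
    psd[1:-1] *= 2  # fold in negative frequencies (len(x) is even, last bin is Nyquist)
    freqs = np.fft.rfftfreq(len(x))  # 0 .. 0.5 cycles/sample
    return freqs, 10 * np.log10(psd + 1e-20)  # small offset avoids log10(0)

freqs, psd_rect_db = periodogram_db(Sn, np.ones(N))   # no window (rectangular)
_, psd_hann_db = periodogram_db(Sn, np.hanning(N))    # Hann window
```

Plotting `psd_rect_db` and `psd_hann_db` against `freqs` should show peaks near f1/fs = 0.0625 and f2/fs = 0.15, with the Hann version trading a wider main lobe for much lower leakage around the second tone.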

lapis sequoia
# lapis sequoia https://stackoverflow.com/questions/54921711/interactive-labeling-of-images-in-j...

mmm this is not what i was looking for i believe. I have my dataset like this https://gyazo.com/10ad185f8027af44c0e9e2edb9200a6f and each folder has the images. All the images in each folder have the folder name as label. I was planning on using like 80% of the images of each folder as train for that specific label, and the rest for validation. I just dont know how to tell that to keras
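
A manual version of that 80/20 split can be sketched with the standard library alone. Everything named here (`split_by_folder`, the suffix list, the demo folder names) is hypothetical; the output is just lists of `(file path, integer label)` pairs that a loading step of your choice would then read.

```python
import random
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def split_by_folder(root, train_fraction=0.8, seed=0):
    """Label every image by its parent folder and split each class separately."""
    rng = random.Random(seed)
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    train, val = [], []
    for label, name in enumerate(class_names):
        files = sorted(
            f for f in (root / name).iterdir()
            if f.suffix.lower() in IMAGE_SUFFIXES and not f.name.startswith(".")
        )
        rng.shuffle(files)
        cut = int(len(files) * train_fraction)
        train += [(str(f), label) for f in files[:cut]]  # first 80% of this class
        val += [(str(f), label) for f in files[cut:]]    # remaining 20%
    return class_names, train, val

# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cats", "dogs", "birds"):
        (Path(tmp) / name).mkdir()
        for i in range(10):
            (Path(tmp) / name / f"{i}.jpg").touch()
    class_names, train, val = split_by_folder(tmp)
```

Newer TensorFlow releases ship `image_dataset_from_directory` with `validation_split` and `subset` arguments, so upgrading may make the manual split unnecessary.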

fallow prism
clever vapor
#

hey

#

so guys i wanna make a program that does not specify my needs

#

do any of you have any recommendations django?

#

or anythihg else?

lapis sequoia
#

It just says you have a 1-D array with n elements, not a 2-D array with n × m elements
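
A short NumPy illustration of the point: the comma is simply how Python writes a tuple with a single element, so `(5,)` is a shape with one dimension of length 5.

```python
import numpy as np

a = np.arange(5)           # 1-D array: [0 1 2 3 4]
shape_1d = a.shape         # (5,)  one axis, five elements

column = a.reshape(-1, 1)  # 2-D array with 5 rows and 1 column
row = a.reshape(1, -1)     # 2-D array with 1 row and 5 columns

# (5) without the comma is just the integer 5 in parentheses, not a tuple.
not_a_tuple = (5)
one_element_tuple = (5,)
```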

#

@fallow prism if you need more, i can send you some good StackOverflow links

fallow prism
#

of course!! thank you @lapis sequoia you were clear

lapis sequoia
fallow prism
#

hahaha it's worse to study on a tuesday

wintry olive
#

Seems like a lot of adoption for this graph data platform

#

id converge all your datasets there

#

aside from learning playgrounds or research and development

#

all your startups career business medical IT security et all

final scaffold
#

Hi! I need a quick help regarding group by in pandas

wintry olive
#

unless there is a better option?

final scaffold
#

Dataset looks like this^ ...ignore column event time.

#

I want to groupby the dataset by install time, event name, campaign, and siteid...and sum event revenue and add a new column which counts rows of event name
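
One way this could look with named aggregation. The column names (`install_time`, `event_name`, `campaign`, `siteid`, `event_revenue`) and the sample rows are guesses, since the screenshot is not part of the log.

```python
import pandas as pd

# Hypothetical rows standing in for the real dataset.
df = pd.DataFrame({
    "install_time": ["2020-11-01", "2020-11-01", "2020-11-01", "2020-11-02"],
    "event_name": ["purchase", "purchase", "install", "purchase"],
    "campaign": ["c1", "c1", "c1", "c2"],
    "siteid": ["s1", "s1", "s1", "s2"],
    "event_revenue": [10.0, 5.0, 0.0, 7.5],
})

summary = (
    df.groupby(["install_time", "event_name", "campaign", "siteid"], as_index=False)
      .agg(
          total_revenue=("event_revenue", "sum"),  # summed revenue per group
          event_count=("event_revenue", "size"),   # number of rows per group
      )
)
```

`size` counts every row in the group, including rows where the revenue is missing; `count` would skip those.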

wintry olive
#

there is this snag: 12. No Export. You agree and certify that neither the Product nor any other technical data received from Neo4j,~~~~

lapis sequoia
wintry olive
#

for a startup that wants to step up and build their own platform after using the graph dataset that might be an issue everything up to that part of the agreement was solid. The question is tho would their platform handle model cache, validation test scores and metrics so data plus set plus model card plus

wide void
#

Hello everyone! Im having a bit of an issue and was wondering if you can help me. Im trying to train a neural network. Im having issues with fitting my model. When it goes into the directory where my images are it prepends "._" before my image name and can't for the life of me figure out why.
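
One possible cause, assuming the images were copied from a Mac: macOS writes `._name` companion files (AppleDouble metadata) next to the real files on some file systems, and a directory scan then picks them up as if they were images. A sketch for finding and removing them, with a hypothetical helper name and a throwaway demo folder:

```python
import tempfile
from pathlib import Path

def find_appledouble_files(root):
    """List files whose names start with '._' anywhere under root."""
    return sorted(p for p in Path(root).rglob("._*") if p.is_file())

# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    class_dir = Path(tmp) / "cats"
    class_dir.mkdir()
    (class_dir / "cat1.jpg").touch()
    (class_dir / "._cat1.jpg").touch()  # stand-in for a macOS metadata file

    stray = find_appledouble_files(tmp)
    stray_names = [p.name for p in stray]
    for path in stray:
        path.unlink()                   # delete the metadata files
    remaining = sorted(p.name for p in class_dir.iterdir())
```

It is worth listing the matches before deleting anything on the real dataset.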

swift gyro
#

Are you using any libraries? Keras / Tensorflow?

wide void
#

Both

wintry olive
#

oh yeah neo4Jj = awesomeness

swift gyro
#

Can you give a short run down of the model?

#

I've had this problem before and I think it was somehing with a conv2d but don't know how I fixed it. Programming 101

wide void
#

Im using a mobilenet I fine tuned by removing the last 6 layers and added a Dense layer at the end as output

swift gyro
#

Linear activation, adam?

wide void
#

softmax, adam

swift gyro
#

have you added breakpoints and tried to see where the name changes?

wide void
#

Negative. im a bit of a noob. I'll try that now.

swift gyro
#

Kk. (Don't worry, I'm no developer, just a High schooler with youtube and LinkedIn Learning, also a noob)

wide void
#

Im not sure what Im looking at. Is it ok to post the error on here?

wintry olive
#

not only is it a multi-code editor on the datagraph side but this architect app lets you build dataset with point and click & code

#

thats kind of what I was thinking of

earnest forge
#

How is this sort of visualisation called?

wintry olive
#

wait its

#

scatterplot with meta waveform over top

#

that is neat

#

if you go up one more layer or dimension guess what it is....

#

the initial start of a statistical fractal

#

probably sounds more useful then it is or poetic really

earnest forge
#

I am curious how to make this waveform

wintry olive
#

its the first example I have seen its like a graph layered over a graph the waveform itself is probably statistical deviation from zero but i saw its relation to the point cloud right away

earnest forge
#

oh. I see you are decently competent in statistics and maths too, right? I've got a question related to percentiles, though...

wintry olive
#

be neat if the waveform could animate although its more like....

#

a standing wave 🙂

earnest forge
#

considering this graph, I can see a correlation. anyway

wintry olive
#

i briefly looked at statistics earlier still have to study

split eagle
#

I am working with a pandas df in a Jupyter notebook and am trying to drop rows on the condition that df['overall_status'] == 'Recruiting' and df['Fraction accrued'] is NaN. I have tried using the functions .isna(), .isnull(), and also tried df['Fraction accrued'].replace('', np.nan, inplace=True) followed by df['Fraction accrued'] == 'True'. I get the error: "unhashable type: 'list'". Here's my full code:

#

index_names = df_cancer_drop.drop([((df_cancer_drop['overall_status']=='Recruiting') & (df_cancer_drop['Fraction accrued'].isna()))].index)
df_cancer=df_cancer_drop.drop(index_names,inplace=True)

#

How can I correctly write the logical statement to drop these rows?
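
A sketch of one way to write it, using made-up rows with the same column names. The error most likely comes from wrapping the boolean mask in a list and handing it to `drop`, which expects index labels; separately, `drop(..., inplace=True)` returns `None`, so assigning its result loses the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
df_cancer_drop = pd.DataFrame({
    "overall_status": ["Recruiting", "Recruiting", "Completed", "Completed"],
    "Fraction accrued": [np.nan, 0.4, np.nan, 1.0],
})

# Rows to remove: still recruiting AND no accrual fraction recorded.
to_drop = (
    (df_cancer_drop["overall_status"] == "Recruiting")
    & df_cancer_drop["Fraction accrued"].isna()
)

# Option 1: keep everything that does not match the mask.
df_cancer = df_cancer_drop[~to_drop]

# Option 2: drop by index labels, without inplace so a frame is returned.
df_cancer_alt = df_cancer_drop.drop(index=df_cancer_drop.index[to_drop])
```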

earnest forge
#

I have the following array
a = [ 1, -9, -15, -11, -19, 2, -15, 3, 8, -8, -5, -14, -5, 1, -19]

And when I'm computing np.percentile(a, 99)
I get this confusing output: 7.299999999999997

#

Shouldn't it simply return -19?

wintry olive
#

yeah there is a correlation im just not sure exactly what to call it perhaps values of y but if data visualization is analytics what is that telling me...

copper kindle
earnest forge
#

I've learned only the basics of percentiles, such as 25/50/75, so how is it computed for not 'boring' values?

copper kindle
#

once your data is sorted you can calculate correct percentiles by hand. But numpy calculates the correct percentiles prolly by sorting the array during calculation.

earnest forge
#

oh, i got it. after sorting the array I got 8 as the last element and 3 as the penultimate. so it explains it now
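
The by-hand calculation can be sketched as below, assuming NumPy's default linear interpolation: the 99th percentile sits at fractional position 0.99 × (n − 1) in the sorted array, here between the last two values 3 and 8.

```python
import numpy as np

a = [1, -9, -15, -11, -19, 2, -15, 3, 8, -8, -5, -14, -5, 1, -19]

s = np.sort(a)
rank = 0.99 * (len(s) - 1)   # fractional position in the sorted array
lower = int(np.floor(rank))  # index of the value just below that position
fraction = rank - lower      # how far towards the next value

# Interpolate linearly between the two neighbouring sorted values.
manual = s[lower] + fraction * (s[lower + 1] - s[lower])
from_numpy = np.percentile(a, 99)
```

Both come out at roughly 7.3; the trailing digits in the original output are ordinary floating-point rounding.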

#

thanks 😄

wintry olive
#

can numpy cancel out the positive and negative integers?

earnest forge
wintry olive
#

i just noticed scalar on the graph data platform

copper kindle
wintry olive
#

first example i have seen

#

i got a bit excited 🙂

#

if i do cancel them out would that reduce noise or would i lose value?

copper kindle
# earnest forge

The data value plotting may show the correlation (there can be a positive correlation if noise is reduced) and the x,y graphs might show the data distributions.

wicked meadow
#

Trying to work on a python script that will save a string as a .sql file, anybody know how to go about this?

austere swift
copper kindle
austere swift
copper kindle
austere swift
#

with automatically closes it thats why

copper kindle
wicked meadow
#

thanks i'll try that

austere swift
#

yep :)

wicked meadow
#

I assume I'll be able to pass variables as the names of the files?

copper kindle
#

but you need to concatenate .sql with the variable aswell.

for example your_variable + ".sql"

wicked meadow
#

Okay cool. How will that work with the quotation marks?

copper kindle
#

wait a minute.

#

your varialbe holds the sql commands ?

wicked meadow
#

no i just want the variable to be the names of the file

copper kindle
#

alright then the aobe thing works.

#

above*

wicked meadow
#

Okay cool thanks
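
Put together, the suggestion could look like the sketch below. The helper name `save_sql`, the query text and the file name are placeholders.

```python
import tempfile
from pathlib import Path

def save_sql(query, name, directory="."):
    """Write the query string to <directory>/<name>.sql and return the path."""
    path = Path(directory) / f"{name}.sql"  # the variable supplies the file name
    with open(path, "w", encoding="utf-8") as handle:  # 'with' closes the file
        handle.write(query)
    return path

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    report_name = "daily_report"
    saved = save_sql("SELECT * FROM users;", report_name, directory=tmp)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
```

The quotation marks only delimit the `".sql"` string literal in the code; they do not end up in the file name.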

stone tangle
#

In a django project where could I manipulate data for data science? any help is much appreciated!

#

also sorry if this is not the right channel

calm forge
#

You could just @/# something here but if your question is about data science I would leave it

stone tangle
#

ok cool

calm forge
#

sure

lapis sequoia
#

I know you can make a django website that changes data live when your data changes, I would think you can also change the data on the website to, sounds a little more advanced tho

stone tangle
#

yea I am kinda a noob

lapis sequoia
#

i suggest learning the basics of django before doing that type of stuff, so it will be less complicated when you get there and there will be fewer errors to deal with; it will probably save days of your life.

#

I am sure you knew that but you can never be too sure.

stone tangle
#

ok, I might try a udemy course or somthing

tight dove
#

are JSON and JSON-stat different formats? do I need to use a different library to handle json-stat?

plucky spindle
#

@stone tangle You could also check out Streamlit or Flask, it's a bit easier to do things like Machine Learning as a Service (MLaaS)

stone tangle
#

ok will do

snow compass
#

wait this might be a better place for pandas dataframe questions. I want to iterate through a list of dataframes. I have a number of functions where the function takes the dataframe as an argument. But! I want to do different things depending on which dataframe in the list is being put into the function.


```python
#define my functions first
def function1(df):
    if df.name == df1: #doesn't actually work!!!
        pass #do thing
    elif df.name == df2:
        pass #do thing differently

#same basic structure for the rest of my functions

df_list = [df1, df2, df3, df4, df5, df6]

for df in df_list:
    df.name = df #(doesn't actually work)
    function1(df=df)
    function2(df=df)
```
is basically what my stuff looks like.

but I can't do it that way. 

A pandas *series* can be given a name attribute but not a dataframe. 

if df.name == df1: **#doesn't actually work!!!**
*ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any(), or a.all().*

So of course I google the error and check the top SO links. I try to create a dictionary as a top reply suggests. 

```python
dfs = {'some_label' : df} #is what they type out
```

but when I try to use df.name = dfs[df] or dfs = {df1 : 'df1' , df2 : 'df2'} I get TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed from both of those. I must not be using a dictionary right or this isn't a good solution.

I would like to be able to keep the inside of my functions along the lines of
if df1 then a
elif df2 then b
elif df3 then c

but, well, the ways I've gone about this are giving me error messages (tried .name, tried making a dict.) help?

civic fractal
#

Anyone familiar with pandas willing to help out?

#

for a min

haughty ingot
civic fractal
snow compass
#

oh! muskrat

#

maybe you can use WHERE? or REPLACE?
numpy.where(condition[, x, y])
Where True, yield x, otherwise yield y.

like, iterate through the column and the condition is true then keep but false replace with nan and then remove nans?

I feel like I had to do something like this once, hang on let me go over my recent projects

#

df = df.loc[df['marketcap'] <= 1000000000]

#

@civic fractal what happens when you try that?

snow compass
haughty ingot
#

no

#

this for @spark dirge

spark dirge
snow compass
#

I'm just trying to get some outside eyes on my dataframe problem and then elongatedmuskrat posted after me so I tried to help with their problem and now I'm just chilling here

spark dirge
#
df_list = [df1, df2, df3, df4, df5, df6]
df_mp = {}
for df in df_list:
  df_mp[df.name] = some_func(df)
print(df_mp)
#

You just want a list of dataframes sent through a function and in a collection?

civic fractal
boreal summit
#

@spark dirge what you tryna do exactly?

snow compass
#

the inside of some of my functions look like

if df == df1:
    #do thing
if df == df2:
    #do other thing

but I get that value error. I thought I could assign a name to each df and then instead it's if df.name == '' then do thing

but that hasn't worked

I went to SO and read up a similar problem and the person was advised to make a dictionary and I feel I must be doing something wrong because THAT gives me an error

#

https://stackoverflow.com/questions/31727333/get-the-name-of-a-pandas-dataframe/31727504#31727504

In many situations, a custom attribute attached to a pd.DataFrame object is not necessary. In addition, note that pandas-object attributes may not serialize. So pickling will lose this data.

Instead, consider creating a dictionary with appropriately named keys and access the dataframe via dfs['some_label'].

df = pd.DataFrame()

dfs = {'some_label': df}

spark dirge
velvet thorn
#

@snow compass two simple ways

#

which more or less lead to the same thing

#

create a list of (df, function) tuples

#

and iterate through that

snow compass
#

I mean, I need all of my dataframes to be put through all of the functions. it's just I need to write to a different row depending on which dataframe is being run, for example.

real wigeon
#

how to change the timezone in a timestamp column

#

pandas

#

im pulling a report from my db

#

and need to change the time to est (everyone using this app will be on est)

austere swift
#

!d pandas.Series.dt.tz_convert

arctic wedgeBOT
#
```py
Series.dt.tz_convert(*args, **kwargs)
```
Convert tz-aware Datetime Array/Index from one time zone to another.

Parameters: **tz** (str, pytz.timezone, dateutil.tz.tzfile or None). Time zone for time. Corresponding timestamps would be converted to this time zone of the Datetime Array/Index. A tz of None will convert to UTC and remove the timezone information.

Returns: Array or Index. Raises: TypeError if Datetime Array/Index is tz-naive.

See also

[`DatetimeIndex.tz`](pandas.DatetimeIndex.tz.html#pandas.DatetimeIndex.tz "pandas.DatetimeIndex.tz"): A timezone that has a variable offset from UTC.

[`DatetimeIndex.tz_localize`](pandas.DatetimeIndex.tz_localize.html#pandas.DatetimeIndex.tz_localize "pandas.DatetimeIndex.tz_localize"): Localize tz-naive DatetimeIndex to a given time zone, or remove timezone from a tz-aware DatetimeIndex.

Examples

With the tz parameter, we can change the DatetimeIndex to other time zones:... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.tz_convert.html#pandas.Series.dt.tz_convert)
austere swift
#

@real wigeon

#

thats from converting time-zone aware columns

#

if your column isnt time-zone aware you'd need to make it time-zone aware

#

!d pandas.Series.dt.tz_localize

arctic wedgeBOT
#
```py
Series.dt.tz_localize(*args, **kwargs)
```
Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.

This method takes a time zone (tz) naive Datetime Array/Index object and makes this time zone aware. It does not move the time to another time zone. Time zone localization helps to switch from time zone aware to time zone unaware objects.

Parameters: **tz** (str, pytz.timezone, dateutil.tz.tzfile or None). Time zone to convert timestamps to. Passing `None` will remove the time zone information preserving local time.

**ambiguous** (‘infer’, ‘NaT’, bool array, default ‘raise’). When clocks moved backward due to DST, ambiguous times may arise. For example in Central European Time (UTC+01), when going from 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the ambiguous parameter dictates how ambiguous times should be handled.

• ‘infer’ will attempt to infer fall dst-transition hours based on order
... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.tz_localize.html#pandas.Series.dt.tz_localize)
real wigeon
#

Thank you @austere swift

wintry olive
#

I havent had a chance to research scalar models yet but the meta waveform could be a way to get a glance at the distribution or symmetry between the two axis without having to scope out the numbers

#

a visual aid layer not a meta or extrapolation or just a visual aid for the deviation from zero or the symmetry of the two almost as if some law of large numbers set

#

the distribution is symmetrical but the vertical waveform is front whereas the horizontal waveform is in the middle

hollow gull
# snow compass wait this might be a better place for pandas dataframe questions. I want to iter...

It seems like you are being sort of particular about how you do this without letting us know why, so expect a lot of solutions that don't quite hit your requirements (because we don't know them / why they exist)

#define my functions first
def function1(df, dfname):
    if dfname == 'df1':
        pass #do thing
    elif dfname == 'df2':
        pass #do thing differently

#same basic structure for the rest of my functions

dict_dfs = dict()
dict_dfs['df1'] = df1
dict_dfs['df2'] = df2
dict_dfs['df3'] = df3
dict_dfs['df4'] = df4

for dfname in dict_dfs.keys():
    function1(df=dict_dfs[dfname], dfname=dfname)
wintry olive
#

the offset could be the rate of the axis range the vertical increase by 1 so its waveform is always front where as the horizonal increases an even amount along the axis so its waveform is right in the middle

hollow gull
#

@snow compass Maybe a better way of doing this is to build custom classes where the different functions do different things depending on the class that you pass. But that would only be better if you had multiple dataframes of the same type that should have the same thing done if they are passed to the same function.

wintry olive
#

id very much like to see how the graph looks at different intervals same axis range but increases that are unusual

#

the following ideas are highly experimental.

#

i would like to have 2 more graphs in a grid exact opposites counting down from the max established from before. This creates a min max wave

#

a nice symmetrical one at that.

#

id iterate two more times then have the fourth layer be only min/max and target data the rest is truncated as noise.

real wigeon
austere swift
arctic wedgeBOT
#
```py
Series.dt.tz
```
Return timezone, if any.

Returns: datetime.tzinfo, pytz.tzinfo.BaseTZInfo, dateutil.tz.tz.tzfile, or None. Returns None when the array is tz-naive.
austere swift
#

if that's None then it's not aware

real wigeon
#

doing this

#
```py
timestamps = df["upload_timestamp"].dt.tz
print(timestamps)
```
resulted in ``None``
#

i presume im selecting the df column properly

#

erm im kind of a noob

#

If i remember correctly pandas is kind of weird

red briar
#
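```py
from dateutil import tz

def convert_timezone(self, x):
    # treat the naive timestamp as UTC, then convert it to US Eastern time
    from_zone = tz.gettz('UTC')
    to_zone = tz.gettz('America/New_York')
    return x.replace(tzinfo=from_zone).astimezone(to_zone)
```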

#
      df['Creation Date'] = df['Creation Date'].apply(lambda x:self.convert_timezone(x))
      df['Creation Date'] = df['Creation Date'].apply(lambda x:x.tz_localize(None))

real wigeon
#

right but that's just applying that logic to the column

#

doesn't pandas handle that kind of weird, because the result is a series

#

and id need that as a part of the df

#

aren't they two separate entities now

real wigeon
#

or is doing df['Creation Date'] applying it to the df, but only to the column Creation Date

#

i am noob

red briar
real wigeon
#

because localize only makes it aware (my data in the db is UTC), it doesn't convert.

serene scaffold
#

I have to make a Bayes classifier for a dataset where each object gets one continuous feature and its class label. But how do you even apply Bayes for continuous data?

#

binning?
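
Binning is one option. Another common one is to model the feature inside each class with a Gaussian and compare the class posteriors (Gaussian naive Bayes). The NumPy sketch below assumes a single continuous feature and uses made-up data; it does not handle a class whose values are all identical (zero standard deviation).

```python
import numpy as np

def fit_gaussian_bayes(x, y):
    """Estimate mean, standard deviation and prior for each class."""
    params = {}
    for c in np.unique(y):
        xc = x[y == c]
        params[c] = (xc.mean(), xc.std(), len(xc) / len(x))
    return params

def predict(params, x):
    """Pick the class with the highest log posterior for every value in x."""
    classes = list(params)
    log_post = np.stack([
        np.log(prior)                          # log P(class)
        - 0.5 * np.log(2 * np.pi * std ** 2)   # Gaussian normalization term
        - (x - mean) ** 2 / (2 * std ** 2)     # Gaussian exponent
        for mean, std, prior in (params[c] for c in classes)
    ])
    return np.array(classes)[log_post.argmax(axis=0)]

# Hypothetical training data: one feature, two classes.
x_train = np.array([0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.1, 5.3, 4.9, 5.0])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

params = fit_gaussian_bayes(x_train, y_train)
predictions = predict(params, np.array([1.05, 4.7]))
```

scikit-learn's `GaussianNB` implements the same idea for any number of features.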

austere swift
#

tz_convert

real wigeon
#

yes but im asking

#
```py
make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')
```
Since my data in the db is ``UTC``
#

or should I set it to my local timezone

austere swift
#

i mean you do tz_localize and then tz_convert

real wigeon
#

yes correct

#

but do you localize to UTC

#

or EST

#

the data is in UTC

#

i went with UTC

austere swift
#

well in the examples it shows you could use est

#

see how after the localization it shows -5:00

#

that means that when it localized with est it assumed the original values were utc

#

so i think you can just use that

#

I'm not completely sure tho lol

real wigeon
#

ok cool

#

i mean it says that it does not convert

#

.>

#

alright well, idk how to place that column back into my df

austere swift
#

instead of assigning that value to make_timestamps_tz_aware just assign it back to df["upload_timestamp"]

real wigeon
#

what do you mean

austere swift
#

df["upload_timestamp"] = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')

real wigeon
#

oh

#

i was actually going to do this

#
```py
make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)
```
austere swift
#

that works too

real wigeon
#

hmm it says though

#

it's not a date time index

austere swift
#

oh its probably not in datetime format

#

!d pandas.to_datetime

arctic wedgeBOT
#
```py
pandas.to_datetime(arg: DatetimeScalar, errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → Union[DatetimeScalar, ‘NaTType’]
pandas.to_datetime(arg: ‘Series’, errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → ’Series’
pandas.to_datetime(arg: Union[List, Tuple], errors: str = '...', dayfirst: bool = '...', yearfirst: bool = '...', utc: Optional[bool] = '...', format: Optional[str] = '...', exact: bool = '...', unit: Optional[str] = '...', infer_datetime_format: bool = '...', origin='...', cache: bool = '...') → DatetimeIndex
```
Convert argument to datetime.

Parameters: **arg** (int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like). The object to convert to a datetime.

**errors** ({‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’)

• If ‘raise’, then invalid parsing will raise an exception.

• If ‘coerce’, then invalid parsing will be set as NaT.

• If ‘ignore’, then invalid parsing will return the input.

**dayfirst** (bool, default False). Specify a date parse order if arg is str or its list-likes. If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10. Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).

**yearfirst** (bool, default False). Specify a date parse order if arg is str or its list-likes.

• If True parses dates with the year first, eg 10/11/12 is parsed as 2010-11-12.

• If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil).
... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime)
austere swift
#

use that to convert it

#

damn thats long

real wigeon
#

uhh

#

when

austere swift
#

before the localize stuff

real wigeon
#

oh

#

that looks like its applied to the entire df?

austere swift
#

no, you can do it on a single column

real wigeon
#

update_to_datetime = df["upload_timestamp"].to_datetime

#

err

#

cuz idk

austere swift
#

no, its not a function from the df its from pandas

real wigeon
#

yeah

austere swift
#

so pd.to_datetime(df["upload_timestamp"])

#

and then you'd need to set the format arg so it can see how to format it

#

oh oops typo

#

i put datatime instead of datetime lol

real wigeon
#

looking up the formatting

austere swift
#

its kinda like datetime strptime

real wigeon
#

this is the current format

#

11/26/2020 11:26:27 PM

#

but it's in utc

austere swift
real wigeon
#

ermm im over thinking

#

it probably accepts

#

mm/dd/yyyy HH:mm:ss

austere swift
#

yeah it probably does

#

try it

real wigeon
#

im going to try this

#
```py
df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

pd.to_datetime(df["upload_timestamp"], format='mm/dd/yyyy HH:mm:ss')

make_timestamps_tz_aware = df["upload_timestamp"].dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)
```
austere swift
#

well, not that for format

#

thats not how format works

real wigeon
#

oh uhh

#

lmao

#

whoops

#

sry sry

#

yhe %% s

austere swift
#

yes

real wigeon
#

i dont think this is quite correct

#

pd.to_datetime(df["upload_timestamp"], format='%mm/%dd/%yyyy %HH:%mm:%ss')

austere swift
#

no

#

thats for datetime strftime but i think it should be the same for pandas

real wigeon
#

dam it's still the same error @austere swift

#
    make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')
austere swift
#

it returns the output

real wigeon
#

ohh

austere swift
#

so you'd need to assign it back to the original column

#

or assign it to an intermediary variable that you then use for the other modifications

real wigeon
#

alrighty

#

testing

#

hmmm

#

same error

austere swift
#

code?

real wigeon
#
```py
df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

convert_timestamp_to_date_time = pd.to_datetime(df["upload_timestamp"], format="%m/%d/%Y, %H:%M:%S")
make_timestamps_tz_aware = convert_timestamp_to_date_time.dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.tz_convert('US/East')

make_timestamps_tz_est.to_excel('location/output.xlsx', index=False)
```
austere swift
#

also you forgot %p btw, thats for AM and PM

#

and there shouldnt be a comma in the format

real wigeon
#

is it :%p

austere swift
#

no it would have space %p

real wigeon
#

k

austere swift
#

so basically imagine you're writing your time out, but replace all the actual number values with the % codes
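
A minimal sketch of that "replace the numbers with % codes" idea, using the sample value posted earlier in this conversation. One assumption to flag: it uses `%I` (12-hour clock) rather than `%H`, because in Python's `strptime` the `%p` AM/PM code only changes the hour when it is paired with `%I`.

```python
from datetime import datetime

# Sample value from the column, as posted earlier in the conversation.
raw = "11/26/2020 11:26:27 PM"

# Write the timestamp out, then swap each number for its % code:
#   11/26/2020 11:26:27 PM
#   %m/%d/%Y   %I:%M:%S %p
# %I is the 12-hour clock; %p (AM/PM) only affects the hour alongside %I.
fmt = "%m/%d/%Y %I:%M:%S %p"

parsed = datetime.strptime(raw, fmt)
print(parsed)  # 2020-11-26 23:26:27
```

The same format string can be passed as `format=` to `pd.to_datetime`.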

real wigeon
#

i see

#

alright but the error states something about the index

austere swift
#

well test that out

#

it could just be the format error

real wigeon
#

apparently localize and convert only works on the index

#

yeah same error

#

hmmmmm that doesnt really help

trim oar
#

Hello guys, I know it depends on the problem, but how would you go about finding the appropriate number of layers?

#

As well as nodes?

#

Like a baseline number

austere swift
#

theres no real way to just figure out how many you need

#

that's the whole concept of hyperparameter tuning

#

you just have to test stuff out and see how it goes

#

I'd recommend trying to use a model that already works for your baseline, like a premade model

#

then tweak from there

trim oar
#

I know hyperparameter with GridSearch when doing classical ML. How would you do it with TensorFlow?

austere swift
#

if you're using keras you can use keras tuner

trim oar
#

Thank you!

real wigeon
#

yeah so im still getting the same error

#

TypeError: index is not a valid DatetimeIndex or PeriodIndex
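
A likely cause, sketched under assumptions: calling `tz_convert` directly on a Series acts on its index, while the `.dt` accessor acts on the values. The rows below are made up and the column name is taken from the snippets above; the last step drops the zone again because Excel cannot store timezone-aware datetimes.

```python
import pandas as pd

# Made-up rows; only the column name comes from the conversation.
df = pd.DataFrame({
    "upload_timestamp": ["11/26/2020 11:26:27 PM", "11/27/2020 01:05:00 AM"],
    "email": ["a@example.com", "b@example.com"],
})

# Parse the strings (12-hour clock, so %I with %p).
ts = pd.to_datetime(df["upload_timestamp"], format="%m/%d/%Y %I:%M:%S %p")

# Every timezone step goes through .dt so it applies to the values, not the index.
ts = ts.dt.tz_localize("UTC")
ts = ts.dt.tz_convert("America/New_York")

# Drop the zone (keeping local wall time) and write the result back to the column.
df["upload_timestamp"] = ts.dt.tz_localize(None)
print(df)
```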

#

progress

#

syntax stuff

real wigeon
#

alright well... i managed to download in xls format just the timestamp column..

#

and I mistakenly stripped the hours/seconds info

fallow thunder
#

This is the code used to generate the figure:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig = plt.figure(figsize=(10,2))

ax = fig.add_subplot(1,1,1, aspect='equal')

# Low
x = [0,0,9,11]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

# Medium
x = [10,12,15,17]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

# High
x = [16,18,20,20]
y = [0,1,1,0]
ax.add_patch(patches.Polygon(xy=list(zip(x,y)), fill=False))

ax.set_xlim([0,20])
ax.set_ylim([0,2])

plt.show()
trim oar
#

I'm not exactly sure about your code, but you can set the ticks with plt.yticks(array)

#

Say your array is range(1, 10, 1), then you can call plt.yticks(range(1, 10, 1))

#

Don't know if that helps

#

Increase figsize as well?

fallow thunder
#

I tried increasing the figsize but the height of the figure doesn't change

#

nvm, it was aspect='equal', I forgot to remove it. Thanks for the help anyway!

lapis sequoia
#

beginner question but in numpy rather than creating the matrix from scratch is there a way I can call an empty matrix of specified size?

#

exp: i could call a 3 x 2 matrix full of 0s with the values to change later
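
For reference, a small sketch of what is being asked for: `np.zeros` takes the shape as a tuple and returns an array of zeros that can be filled in afterwards. The values assigned below are arbitrary.

```python
import numpy as np

# 3 rows, 2 columns, every entry 0.0
m = np.zeros((3, 2))

# Change values later, one cell or one row at a time.
m[0, 1] = 5.0
m[2] = [7.0, 8.0]
print(m)
```

`np.empty((3, 2))` also exists, but it leaves the memory uninitialised, so `np.zeros` is the safer default.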

lapis sequoia
#

nvm found answer

ivory panther
#

Without the need to show two different pictures.

lapis sequoia
#

increase window width if thats an option

#

or take more windows

high badge
#

is singular value decomposition (SVD) solely for linear regression or can it perform on other models like the Gradient Descent algorithms can?

high badge
#

nvm

sleek robin
#

hey guys, in backpropagation, if we're using cross-entropy as the loss function, why is the error term in the output layer computed as [y - (output activation)]? isn't that the partial derivative of a mean squared error loss func with respect to output activation, rather than cross-entropy? i keep seeing it even if the loss function isn't MSE
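
A short derivation that may explain why the same term shows up, assuming a softmax output layer with one-hot targets (the sigmoid plus binary cross-entropy case works out the same way). Here $z$ is the pre-activation, $a$ the output activation and $y$ the target.

```latex
\begin{aligned}
a_k &= \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad
L = -\sum_k y_k \log a_k, \qquad \sum_k y_k = 1 \\
\frac{\partial a_k}{\partial z_i} &= a_k\,(\delta_{ki} - a_i) \\
\frac{\partial L}{\partial z_i}
  &= -\sum_k \frac{y_k}{a_k}\, a_k\,(\delta_{ki} - a_i)
   = -y_i + a_i \sum_k y_k
   = a_i - y_i
\end{aligned}
```

So the term is the derivative of cross-entropy with respect to the pre-activation $z$, not the activation $a$; the softmax Jacobian cancels the $1/a_k$ factor. For MSE, $L = \tfrac{1}{2}\sum_k (y_k - a_k)^2$ gives $\partial L / \partial a_i = a_i - y_i$, which has the same form, and $y - a$ is simply the negative of that gradient.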

snow compass
# hollow gull It seems like you are being sort of particular about how you do this without let...

How did I miss this last night?? I saw your second ping and not this one. Is this what gm meant? because now that makes sense.

Sorry I didn't realize I was being particular about this. I think I still don't have the best handle I need on the jargon? like, using words as correctly as possible to their coding definition.

I'm gonna try this out and see if that does the thing. and hopefully have a better understanding of why >.>;;

ornate valve
#

hi! , anyone can help me with np.trapz for calculate area under the curve ? ive been doing some research but all the examples contains random data and i dont know how to incorporate my data.
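
A minimal sketch with hand-typed numbers standing in for your own data: put the x values and y values in two arrays of the same length (for example two DataFrame columns via `.to_numpy()`) and pass both. The fallback line is there because NumPy 2.0 renamed `trapz` to `trapezoid`.

```python
import numpy as np

# Made-up measurements standing in for your own two columns.
x = np.array([0.0, 1.0, 2.0, 4.0])   # e.g. time (spacing need not be even)
y = np.array([0.0, 2.0, 2.0, 0.0])   # e.g. signal value at each x

# NumPy 2.0 renamed trapz to trapezoid; use whichever this install has.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Area under y(x) by the trapezoidal rule.
area = trapezoid(y, x=x)
print(area)  # 5.0
```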

glacial rune
#

I have a dictionary of dictionaries:

{
'A': {'spread': .., 'mid': ..},
'B': {'spread': .., 'mid': ..},
...
}

Where there are usually 3-15 keys. I need the most performant way of finding the minimum spread AND the N largest mids - I've currently got the min spread as best = min(prices.values(), key=lambda x: x['spread']) then best_spread = best['spread']
I'm not sure how to find the N largest mids in the most performant way - but I do put the mids in a numpy array as I need to find their median or mean.
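
One possible approach, sketched with made-up numbers: for only 3-15 entries, the standard library's `heapq.nlargest` on a generator is probably hard to beat, since it avoids building a NumPy array first. That is a guess about performance, so it is worth timing on real data.

```python
import heapq

# Made-up quotes in the shape described above.
prices = {
    "A": {"spread": 0.4, "mid": 101.0},
    "B": {"spread": 0.1, "mid": 99.5},
    "C": {"spread": 0.3, "mid": 103.2},
    "D": {"spread": 0.2, "mid": 100.1},
}
n = 2

# Minimum spread: take the value directly instead of the whole inner dict.
best_spread = min(q["spread"] for q in prices.values())

# N largest mids, returned in descending order.
top_mids = heapq.nlargest(n, (q["mid"] for q in prices.values()))

print(best_spread, top_mids)  # 0.1 [103.2, 101.0]
```

If the mids are already in a NumPy array for the median or mean, `np.sort(mids)[-n:]` is an alternative on that array.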

grave frost
#

Well, does anyone know why when we use TPUs, PyTorch uses the System RAM for loading the model rather than the internal TPU Vram or the GPU RAM??

split eagle
#

I'm trying to drop rows that contain specific words within a column from my df. I tried creating an index and dropping the index, but I got an error saying that since it included more than 6 items it was too large and couldn't be used. I have just tried the following code, which I adapted from Stack Overflow:

#

tox = ['toxic','toxicity','toxicities', 'deaths','fatal','patient~ safety','safety issue', 'safety monitoring', 'safety data', 'safety measures', 'safety related', 'safety reasons', 'safety concern', 'safety and efficacy']
df_test1 = df_test1[-df_test1['why_stopped'].isin(tox)]

#

This doesn't return any errors, but the size of my df_test1 hasn't changed.

#

How might I get this this to successfully drop rows that contain the terms in tox from df_test1?
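
A sketch of one way to do this, with invented rows and a shortened term list. `isin` checks whether the whole cell equals one of the terms, so a cell like "Stopped due to toxicity" never matches "toxic"; a substring test with `str.contains` is probably what is wanted here.

```python
import re
import pandas as pd

# Invented rows; the column name comes from the question.
df_test1 = pd.DataFrame({"why_stopped": [
    "Stopped due to toxicity",
    "Low accrual",
    "Safety concern raised by DSMB",
    None,
    "Funding withdrawn",
]})
tox = ["toxic", "deaths", "fatal", "safety concern"]  # shortened list

# One regex that matches any term; re.escape guards special characters.
pattern = "|".join(re.escape(term) for term in tox)

# True where the cell contains any term (case-insensitive); NaN counts as no match.
mask = df_test1["why_stopped"].str.contains(pattern, case=False, na=False)

# Keep only the rows that do NOT match.
df_test1 = df_test1[~mask]
print(df_test1)
```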

ornate valve
real wigeon
#

i have a dataset that I manipulate some timezone data on
it manipulates just one column
however I'm trying to output the entire data set, not just the timestamp column, to xls
currently it's just exporting the xls file
im using pandas

```py
df = pd.DataFrame(query_resolution, columns=['upload_timestamp', 'email', 'was_this_a_pandemic_related_call',
                                             'what_was_the_call', 'was_the_inquiry_resolved'])

convert_timestamp_to_date_time = pd.to_datetime(df["upload_timestamp"], format="%m/%d/%Y %H:%M:%S %p")
make_timestamps_tz_aware = convert_timestamp_to_date_time.dt.tz_localize(tz='UTC', ambiguous='infer')
make_timestamps_tz_est = make_timestamps_tz_aware.dt.tz_convert('America/New_York')
remove_time_zone = make_timestamps_tz_est.dt.tz_localize(None)

#remove_time_zone = make_timestamps_tz_est.apply(lambda a: pd.to_datetime(a).date())


remove_time_zone.to_excel('staffDashboard/output.xlsx', index=False)
#print(cursor.mogrify(get_results, (formatted_start_date, formatted_end_date)))
connection.close()
cursor.close()
return send_file('output.xlsx', attachment_filename=f"{formatted_start_date}-{formatted_end_date}_survey_results.xlsx", as_attachment=True)
```
how do i go from referencing just the column, to merging it into the dataframe
and then exporting that dataframe
like I said, currently it just exports the column
do i just replace the old column
and export the new df
woven tundra
#

@real wigeon

You're splitting out that column, running it through functions and then exporting just the column.

Add it back to the df with

df["converted_timestamp"] = remove_time_zone

And then export the df

df.to_excel("output.xlsx", index=False)

real wigeon
#

doing this

#

df["converted_timestamp"] = remove_time_zone

#

wont that assign a new column, since the name is different

woven tundra
#

yes, if you want to replace the upload_timestamp column with the column full of converted info change the name to "upload_timestamp"

real wigeon
#

ok gotcha

#

let me test

#

I thought it was something simple like that

woven tundra
#

sure let me know, ping me if it doesn't work so I get a notification

grave frost
#

Well, does anyone know why when we use TPUs, PyTorch uses the System RAM for loading the model rather than the internal TPU Vram or the GPU RAM??

real wigeon
#

@woven tundra that worked

#

thank you

woven tundra
#

awesome

#

no worries

livid quartz
#

Does anyone know how to convert an array with values dtype = 'timedelta64[ns]' to days?
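
Two common options, sketched on a made-up array: divide by a one-day `timedelta64` to get fractional days, or cast to day units to get whole days.

```python
import numpy as np

# Made-up durations: 1 day and 1.5 days, stored as nanoseconds.
deltas = np.array([86_400_000_000_000, 129_600_000_000_000],
                  dtype="timedelta64[ns]")

# Fractional days as float64.
days_float = deltas / np.timedelta64(1, "D")

# Whole days: casting to day units discards the partial day.
days_whole = deltas.astype("timedelta64[D]").astype(int)

print(days_float, days_whole)
```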

real wigeon
#

although it does.... this weird thing... where the query range is like x-y but y wont be included

woven tundra
real wigeon
#

i think it's a mysql thing

woven tundra
#

oh okay, is it included in the input file?

real wigeon
#

im thinking it might not be

#

i dont believe mysql is inclusive i think is the term

#

like if i ask it to query all data points between 5am and 6am, it will go all the way up to 5:59am, but not include the 6am

woven tundra
real wigeon
#

yeah i did some searching

#

its a mysql thing

woven tundra
#

cool cool

real wigeon
#

and its because i didnt specify seconds

#

lol

woven tundra
#

i can't be a lot of help on the mysql front 🤷🏻‍♂️

real wigeon
#

no worries you've been helpful

#

im just typing for the sake of it

paper nacelle
#

am using plotly express

#

in jupyter

sleek robin
#

if it's a white square, try restarting the notebook

#

i had that a couple of times in jupyter with plotly

gray phoenix
#

Does anyone know where I would be able to learn time series analysis?

Cost isnt too big of an issue since i would be getting my employer to pay for it.

fallow prism
#

it's possible that it fills with NaN values after the drops

#

because your dataframe has a fixed size

split eagle
#

@fallow prism I'll inspect the data real quick. Give me a sec.

#

@fallow prism I have examined the df and the cells that I intended to drop remain.

keen crest
#

Posting in this channel because my issue includes the use of a dataframe, but please direct me to the correct channel if I posted incorrectly. Can anyone help me fix this error? I don't understand why my list isn't being accepted as column names, even though my variable used is a list with four elements. My list is printed in cmd as ['owner', 'series', 'name', 'image']

fallow prism
#

if it's just a word try this

#

df_test1['why_stopped'] = df_test1['why_stopped'].apply(lambda x: x if x not in tox else None)

#

or make a new column and replace the first column later

lapis sequoia
serene scaffold
#

@lapis sequoia this is something that you wrote?

lapis sequoia
#

@serene scaffold Yes

serene scaffold
#

@lapis sequoia very nice. I'm looking at the section on joining. You mention using .join but it looks like it's np.concatenate that you use

lapis sequoia
split eagle
#

@lobon22 A string.

lapis sequoia
#

hey

#

i keep getting this error

#
    result = self.forward(*input, **kwargs)
  File "/Users/ashley/Deeplearning/fresh_vs_rotton.py", line 67, in forward
    x = F.max_pool2d(self.relu(self.conv1(x_1)), 2)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/Users/ashley/Deeplearning/venv/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 2-dimensional input of size [1176, 512] instead
#

idk how to fix it

#
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 8, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(8 * 8 * 8, 32)
        self.fc2 = nn.Linear(32, 2)
        self.relu = nn.ReLU()


    def forward(self, x_1):
        x = F.max_pool2d(self.relu(self.conv1(x_1)), 2)
        x = F.max_pool2d(self.relu(self.conv2(x)), 2)
        x = x.view(-1, 8 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x
fallow prism
spiral peak
#

So I have a dataset made up of 3 columns. Not every column has data for every row, but I'd still like to compute an average for that row, even if it's just using the 1 column. How do I do that?
.mean() is giving me NaN for the rows that have NaN values in one of the columns and I don't remember how to get around this.

pearl vine
#

Define "get around" -- how do you want Nan to be treated? Ignore the value, i.e., average the non-NaN values, presumably deducting them from the count? Treat them as zero?

spiral peak
#

ignore the value if NaN
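
A minimal sketch with made-up columns `a`, `b`, `c`: `DataFrame.mean(axis=1)` averages across each row and skips NaN by default (`skipna=True`), so a NaN result usually means either the whole row is NaN or the mean was computed another way, for example by adding the columns and dividing by 3. That second point is only a guess about what happened here.

```python
import numpy as np
import pandas as pd

# Made-up data: some rows are missing one or two of the three values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [2.0, np.nan, np.nan],
    "c": [3.0, 6.0, np.nan],
})

# Row-wise mean over the three columns; NaN cells are ignored.
df["row_mean"] = df[["a", "b", "c"]].mean(axis=1)
print(df)
```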

serene scaffold
#

I want triangle functions in numpy that, for a given range, return 1.0 in the middle of the range, 0.0 at the ends, and np.nan outside the range. But I can only find stuff about making n-arrays with that distribution.

#

I guess I can make them myself with np.vectorize or something

split eagle
#

👍

spiral peak
lapis sequoia
# serene scaffold I want triangle functions in numpy that, for a given range, return `1.0` in the ...
cedar sun
#

on the examples, when training a nn. What is X_train, Y_train, X_validation and Y_Validation?

#

X is a list with the training data (images or what ever) and Y another list of the same size with the labels for each X?

velvet thorn
#

do you have an example

velvet thorn
#

but some sort of container

velvet thorn
cedar sun
#

if not a list, what?

serene scaffold
velvet thorn
cedar sun
#

okey okey

#

i knew images must be np arrays

#

thats easy cuz i think opencv loads images as np arrays, right?

covert cedar
#

Hey guys, I am trying to assign the strike and expiration date from a row to all of the results its values spawn, would I use inheritance to solve this?

#

I tried .at but it did not work correctly

tough citrus
#

is this the place to talk about neural nets

covert cedar
velvet thorn
#

you're going to need to give more details

covert cedar
#

Sorry. For each row of my df, it has a unique option. The values are then passed to a robin_stocks method that returns roughly 210k rows to the 1 input. I need all 210k to be directly traceable back to the 1 input

velvet thorn
#

like do you want all the results in one big DataFrame

#

and have an additional column

#

to indicate the source?

covert cedar
#

So if input 1 is ID 1, I want to pass that 1 to all 210k

#

Yes

velvet thorn
#

do you know what a join is?

covert cedar
#

Yes

velvet thorn
#

yup

#

that's what you want

covert cedar
#

Thats why UID

#

To join on

velvet thorn
#

left join on that

covert cedar
#

but

#

I cant get the value to populate

velvet thorn
covert cedar
#

See the NAN

#

Symbol is provided by the response from robin_stocks

velvet thorn
#

hm

#

okay so

#

what are you joining on?

covert cedar
#

Nothing yet

#

Trying to be able to

#

1 to many

velvet thorn
#

if you haven't joined

#

why are there null values

covert cedar
#
```py
    a = f.at[i,'strike']
    c = f.at[i,'xpire']
    df4.at[i,'strike'] = a
    df4.at[i,'xpire'] = c
```
#

is how I did it

velvet thorn
#

huh.

#

wait

#

I

#

actually don't get why you did that

#

that looks like a loop.

#

why do you have a loop?

covert cedar
#

YEs

#

It was

#
```py
for i in tqdm(range(len(df2))):

    df4 = df4.append(r.options.get_option_historicals(f.loc[i]['symbol'], f.loc[i]['xpire'], f.loc[i]['strike'], 'call', interval='5minute', span='week', bounds='regular', info=None))

    a = f.at[i,'strike']
    c = f.at[i,'xpire']
    df4.at[i,'strike'] = a
    df4.at[i,'xpire'] = c
```
velvet thorn
#

I'm going to assume

#

r.options.get_option_historicals is the said function?

covert cedar
#

Yes

velvet thorn
#

that's

#

a pretty weird way to do things

#

let me think for a bit

#

show me a bit of df2

#

in text

#

not picture form

#

my gut feel is that you should use df2.apply

covert cedar
#

Yeah I tried that

velvet thorn
#

with pd.concat

#

and then join on common columns

covert cedar
#

df2 is made like this

#

```py
for i in tqdm(range(len(df))):
    df2 = df2.append(r.options.find_tradable_options(df.loc[i]['Symbol'],expirationDate=None, strikePrice=None, optionType=None, info=None))
```
velvet thorn
#

you should really

#

avoid append in a loop

#

probably df.transform would be appropriate

covert cedar
#

would it be like

#
df2.transform(lambda x: r.stuff(x['1'],x['2'],))
velvet thorn
#

actually, no

#

more like df['Symbol'].transform

#

since you're only using that column

covert cedar
#

It takes in 3 values, the df2 gen works ok
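
A sketch of the "collect, tag, then concat" pattern suggested above. `fetch_history` is a hypothetical stand-in for the real API call, and the column names are taken from the loop posted earlier. Each result is tagged with its source row before being stored, so every returned row stays traceable to its input. The NaNs in the earlier attempt probably come from `df4.at[i, ...]`, which writes to the single row labelled `i` rather than to all rows returned for that input.

```python
import pandas as pd


def fetch_history(symbol, xpire, strike):
    """Hypothetical stand-in for the real API call; returns a list of dicts."""
    return [{"symbol": symbol, "close_price": strike + i} for i in range(3)]


# Made-up inputs: one option per row.
options = pd.DataFrame({
    "symbol": ["AAA", "BBB"],
    "xpire": ["2021-01-15", "2021-02-19"],
    "strike": [10.0, 20.0],
})

frames = []
for uid, row in enumerate(options.itertuples(index=False)):
    hist = pd.DataFrame(fetch_history(row.symbol, row.xpire, row.strike))
    # Tag every returned row with the input it came from.
    frames.append(hist.assign(uid=uid, strike=row.strike, xpire=row.xpire))

# One concat at the end instead of appending inside the loop.
history = pd.concat(frames, ignore_index=True)
print(history)
```

Building a list and concatenating once also sidesteps `DataFrame.append`, which was removed in pandas 2.0.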

serene scaffold
#

@velvet thorn native numpy support for this:

class TriangleFunc:

    def __init__(self, start, end):
        self._start = start
        self._end = end
        self._mid = ((end - start) / 2) + start
        self._slope = 1 / ((end - start) / 2)

    def __call__(self, x):
        if not (self._start <= x <= self._end):
            return np.nan
        slope = self._slope if x <= self._mid else -self._slope
        return slope * (x - self._start)

except where I get the slope right for the right side of the midpoint

#

I'm making a fuzzy controller

#

problem is I don't think you can vectorize methods

#

looks like vectorizing doesn't improve performance so I guess it's a moot point.
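
For what it's worth, a sketch of a version that works on whole arrays without `np.vectorize`. Writing the value as `1 - |x - mid| / half_width` handles both sides of the midpoint at once, which avoids the slope sign problem mentioned above. The function name and signature are made up for illustration.

```python
import numpy as np


def triangle(x, start, end):
    """Triangular membership: 0 at both ends, 1 at the midpoint, NaN outside."""
    x = np.asarray(x, dtype=float)
    mid = (start + end) / 2
    half_width = (end - start) / 2
    # Rises linearly from start to mid, then falls linearly from mid to end.
    inside = 1.0 - np.abs(x - mid) / half_width
    return np.where((x >= start) & (x <= end), inside, np.nan)


# Values at 0, 9, 10, 11, 12 for a triangle on [10, 12]: nan, nan, 0, 1, 0
print(triangle([0, 9, 10, 11, 12], 10, 12))
```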

civic fractal
#

I'd appreciate an answer if possible

serene scaffold
#

@civic fractal the answer that's already given is quite good

#

it sounds like you're pushing the limits of how numbers are stored on your computer

velvet thorn
#

@velvet thorn native numpy support for this:

class TriangleFunc:

    def __init__(self, start, end):
        self._start = start
        self._end = end
        self._mid = ((end - start) / 2) + start
        self._slope = 1 / ((end - start) / 2)

    def __call__(self, x):
        if not (self._start <= x <= self._end):
            return np.nan
        slope = self._slope if x <= self._mid else -self._slope
        return slope * (x - self._start)

except where I get the slope right for the right side of the midpoint
@serene scaffold I must confess I do not see what this code is meant to do

#

🥴

serene scaffold
#

@velvet thorn I figured that part out

#

now I'm just trying to plot everything

#

and then I'm 1/3 of the way through the assignment

#

💥 🎆 😢

#

(took two days to get this far)

#

(due at 4pm)

velvet thorn
#

ah, assignments

#

atb! 👋

serene scaffold
velvet thorn
#

what is that supposed to be

lapis sequoia
#

Hello! Does anyone here know anything about data mining using Python? I have an assignment I have to do.
Here's the kinda stuff we have to cover...

#

If anyone can help let me know! 😁

#

Just @lapis sequoia me

#

And this is using Anaconda if that means anything

torpid cave
#

Hi @lapis sequoia , your task seems simple and the explanation on what is expected is quite good, let us know if you need any help

#

Anaconda is just a Python distribution that has the relevant libraries/packages (however you call it) and its dependencies sort of installed

lapis sequoia
#

Yeah I think so far it has been pretty straight forward, I suppose I'm just kinda worried that it seems too simple and that it's like a trick question or something?

#

Like so far this is what I have

torpid cave
#

The outliers one looks quite fun

lapis sequoia
#

Oh that one I have no idea where to even begin honestly

#

Maybe you could help me with that

#

I imagine most of the marks are going towards that question

#

Is this right @torpid cave ?

torpid cave
#

I would just present one number instead of creating the table though

lapis sequoia
#

What number though?

#

I don't get it 😂

torpid cave
#

haha so your correlation is -0.1

#

You show a correlation table instead of the correlation between 2 variables, that is why that number is repeated

#

So instead of showing that matrix I would try to get just the -0.109

#

But it is just a personal preference thb

#

tbh

lapis sequoia
#

How do you know it's the -0.109

#

For the second one is it 0.927 then?

red hound
#

I have an assignment to show my understanding of boosting and bagging concepts. The report requires me to provide examples of various examples of boosting and bagging. Do you think it is ethical to use sample code from xgboost or scikit to show how ada boost, xgboost,etc. works?

lapis sequoia
#

You'd definitely have to reference it

#

Don't take stuff from online without referencing it because you're inherently implying it's all your work then

red hound
#

Of course I will reference but shouldnt be an issue after that right

#

Since the goal is not to improve a given model just to show the understanding of these concepts

lapis sequoia
#

I mean I've never heard of xgboost or scikit before, but if the website or your lecturer doesn't declare that you can't do that then I guess it isn't an issue?

red hound
#

Cool the TA references the site and recommends checking it out

#

Thanks just wanted a second opinion

torpid cave
#

Yeah reference everything

#

Even your lecturer

lapis sequoia
#

I don't know what boosting or bagging is but I guess it's not too small or simple to create an example yourself?

#

Our lecturers say not to reference them

#

I think it's kinda cringy when you do

#

When you like quote them from a class...

torpid cave
#

I am from the school that references ppt slides

lapis sequoia
#

Hmmm

torpid cave
#

Rules were quite strict in grad school

red hound
#

i tend to reference the book used in class thats about it

#

undergraduate most students here dont cite properly

torpid cave
#

tbh I don't remember citing much in undergrad

#

But this was quite a while ago

lapis sequoia
#

I think there has to be some sort of line because mostly 99% of everything we know came from somewhere else, and if we were to reference everything it would be kinda tedious...

torpid cave
#

And I did engineering

lapis sequoia
#

I think for the most part your lecturers understand that most of what you're saying came from them anyway

#

Unless you specify otherwise

torpid cave
#

In grad school... I did at least 20 references per paper

lapis sequoia
#

Damn...

#

I think in grad school it's a bit different though

torpid cave
#

Yep

lapis sequoia
#

Because your work may get a bit more public and attention

#

And so it's kinda necessary to show your sources

red hound
#

most of my reports have like 5 and 90% of them are from blogs

lapis sequoia
#

As opposed to undergrad where your work is really only gonna be seen by your lecturer

torpid cave
#

Depends on the subject as well I guess

red hound
#

also on the TA. Most cant be bothered to check really

torpid cave
#

For example I would not reference how to get correlations... but I would reference testing for heteroskedasticity

red hound
#

what major did you do graduate studies if i may ask?

torpid cave
#

BSc Engineering - MSc Applied Economics

#

So yeah

red hound
#

Ahh cool aight thanks guys I should be fine if I reference the samples

torpid cave
#

Yeah, reference as much as you can, you never lose much and you might impress your lecturer if he cares about that shit

lapis sequoia
#

But not referencing could be a serious offense 😬

#

@torpid cave

#

This is just a shot in the dark at this point

#

I have no idea if this is correct or not

#

For this point ^

livid quartz
lapis sequoia
#

you can use either, usually t-SNE would be preferable for extremely high dimensional data

#

t-SNE/PCA would work fine by the looks of it

livid quartz
#

Thanks 🙂

lapis sequoia
#

is it okay to upload sensitive data as a private dataset on kaggle?

#

for some reason the TPU on colab doesn't work well while reading data from drive

cedar sun
#

guys, i got this loop to load the data set:

```py
for pok in os.listdir(datadir):
    path = os.path.join(datadir, pok)
    images = os.listdir(path)
    amount = len(images)
    for i in range(amount):
        img_array = cv2.imread(os.path.join(path, images[i]), 0)
        new_array = cv2.resize(img_array, dimension)
        if i < amount * 0.8:
            train_data.append([new_array, pok])
            train_label.append([pok])
        else:
            valid_data.append([new_array, pok])
            valid_label.append([pok])
```
#

but it takes a while to complete. Can i run it once, export it somewhere and somehow, and the next times i just load it?

lapis sequoia
#

save the dataset, there are many formats

#

pickle, npy, npz, you can write it to a text file. If its a numpy array best options for you are npy and npz
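
A sketch of the npz route, assuming the images are stacked into one array and the labels kept in a second array of the same length. The random data, label names and cache path are invented for illustration.

```python
import os
import tempfile
import numpy as np

# Stand-ins for the real data: 10 grayscale 64x64 images and their labels.
images = np.random.randint(0, 256, size=(10, 64, 64), dtype=np.uint8)
labels = np.array(["pikachu"] * 5 + ["eevee"] * 5)

# Run the slow loading loop once, then cache both arrays in a single file.
cache = os.path.join(tempfile.gettempdir(), "dataset_cache.npz")
np.savez_compressed(cache, images=images, labels=labels)

# On later runs, load the cache instead of re-reading every image.
with np.load(cache) as data:
    loaded_images = data["images"]
    loaded_labels = data["labels"]

print(loaded_images.shape, loaded_labels[:2])
```

Keeping images and labels in separate, aligned arrays also answers the question of whether the image array needs to carry the label: it does not.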

cedar sun
#

numpy array are only the images

#

new_array

#

since opencv loads them as numpy array

#

pok is just a string

#

also, i am thinking. train_data doesnt need to have the label if train_label exists

#

or train_label shouldnt exists. Right?

lapis sequoia
#

Hey ! I'm using matplotlib to display activities with bars and legends, but some text is overlapping, any idea why ?

#

Well, I know why

#

but I don't know how to fix it

#

also, you noticed the hours on the bottom don't exactly display hours from 00:00 to 24:00, do you know how I may be able to fix this ?

arctic wedgeBOT
lapis sequoia
#

i'm very new to matplotlib so don't understand everything in there, I copy pasted a chunk of code from stackoverflow to get the structure

carmine bough
#

Hey, is someone familiar with opencv and a little machine learning?

frozen moth
#

what's up guys

#

anyone know a good way to classify job seniority?

lapis sequoia
#

Anyone know how to do this? 🤔

#

And this is coming from a dataset where I have a bunch of values for petal width and length.

frozen moth
#

iris?

lapis sequoia
#

It's the name of the data set

#

count for which the condition is met / total number of combinations

frozen moth
#

^this

carmine bough
#

Well I have a video and I need to recognize and display the poses left arm up and right arm up and I don't quite know how to do it

frozen moth
#

create new feat width*length

#

df['new_feat'] = df['width'] + df['length']

lapis sequoia
#
`*` not `+` right?
lapis sequoia
frozen moth
#

sorry

#

and then count them

#

len(df[df['new_feat'] > 1])

#

divide that value by the amount of entries in you df

lapis sequoia
frozen moth
#

i.e. count them and divide:
len(df[df['new_feat'] > 1]) / df.shape[0]
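
The same calculation as a short sketch on three made-up rows, with assumed column names `width` and `length`. Taking the mean of a boolean Series gives the fraction of True values directly, so no separate count and divide is needed.

```python
import pandas as pd

# Made-up petal measurements.
df = pd.DataFrame({"width": [0.2, 1.0, 2.0], "length": [1.4, 4.5, 5.0]})

# Product of the two features for each row.
product = df["width"] * df["length"]

# Fraction of rows where the product exceeds 1 (mean of True/False values).
share = (product > 1).mean()
print(share)
```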

lapis sequoia
#

🤔

#

What is the df.shape[0]?

#

What does that mean?

#

len(df)

#

What's the difference?

frozen moth
#

its the size of your dataframe (i.e how many samples of petals u have)

lapis sequoia
#

theres no difference, its the number of entries

frozen moth
#

its the same thing

#

sorry it's just a habit of mine

lapis sequoia
#

Gg

#

So what other method could I use though?

frozen moth
#

therefore 2/3 are bigger than 3

#

than 1**

lapis sequoia
#

question says two methods HMM

#

Hmm is right

#

I took a stab at this question also

#

but meh

#

I have no idea if that's right

#

but in what respect, a difference formula (mathematical approach) or a different way to query the data frame

#

you could train a logistic regression model which gives probability that your condition is true, given that class 1: product >1 class 2: product <=1

#

there was also this question

#

Which I have no idea about

#

Are outliars judged by their distance from the average?

frozen moth
#

from the line

lapis sequoia
#

And and what point is the max?

#

What line though?

#

What is the line

#

their distance from the linear regression line

#

This?

#

I see a lot of lines here...

#

😳

frozen moth
#

the furthest one

lapis sequoia
#

The question says an outliar is identified as the point with maximum distance

#

but like what is the max distance?

#

There can be more than one outlier right?

frozen moth
#

distance perpendicular to the line

magic dune
#

I am working on a linear regression line can anyone help??? please!?

frozen moth
lapis sequoia
#

Ahhh

frozen moth
#

honestly it's a sh*t definition for an outlier

lapis sequoia
#

ahaha

#

usually the outliers problem isnt so easy xD

#

but i think its a training exercise so its ok

#

But I just need to check each point and get whichever is furthest from the line, right?

frozen moth
#

exactly

lapis sequoia
#

Just for loop through the data set

#

But

#

How do I get the distance from the line?

#

What do I say to get that?

#

euclidean distance

magic dune
#

does anyone kind of understand linear regression because I am stuck

lapis sequoia
#

Not at all

#

Ahahaa

frozen moth
#

you have your x value (length) and your y value (width)

when you take your x value and put it into your LR eqn ^y = mx + b

you compare the real value y with the predicted value ^y

#

max(|y - ^y|): do it for all of them and take out the maximum one
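
A minimal sketch of that recipe, assuming made-up numbers and using `np.polyfit` for the line fit (the exercise may expect a different fitting routine). It uses the vertical distance |y - ^y|. The perpendicular distance to the line is that same value divided by sqrt(1 + m²), a constant, so both pick the same point.

```python
import numpy as np

# Made-up lengths (x) and widths (y); the fourth point is deliberately off the trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 8.0, 5.1])

# Fit y_hat = m * x + b by least squares.
m, b = np.polyfit(x, y, deg=1)
y_hat = m * x + b

# Vertical distance of each point from the line (absolute residual).
residuals = np.abs(y - y_hat)

# The point with the largest residual is the "outlier" under this definition.
outlier_index = int(np.argmax(residuals))
print(outlier_index, x[outlier_index], y[outlier_index])
```

No explicit for loop is needed; the array operations handle every point at once.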

frozen moth
#

np

frozen moth
#

no clue

#

is the data NDA stuff?

lapis sequoia
#

yes

frozen moth
#

then i wouldn't

#

even being private

lapis sequoia
#

not my first choice either but I'm having issues on colab TPU

serene scaffold
#

Is there a way to transform this dataframe:

           0    1
0   0.435752  0.0
1   0.296690  0.0
2   0.737365  2.0
3   0.332111  1.0
4   0.030198  1.0

into this:

0     1 
0.0   0.435752  0.296690
1.0   0.332111  0.030198
2.0   0.737365
#

I know it's no longer rectangular data

frozen moth
#

split the df and then merge?

#

nvm read it wrong

serene scaffold
#

I thought it might be the pivot method

lapis sequoia
#

have you tried groupby

serene scaffold
#

I didn't think that would have plotting functionality. I'm making density distribution plots for three classes.

rustic dew
#

in pivot you need to have unique indices

lapis sequoia
#

I'm thinking groupby column 1 and make a function that returns values having the number, maybe would take some more editing to get the column name in order
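
One way the groupby idea could be sketched, using the five rows posted above. Grouping the values in column 0 by the labels in column 1 gives one list per class; spreading those lists into columns is an optional extra step that pads the shorter groups with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    0: [0.435752, 0.296690, 0.737365, 0.332111, 0.030198],
    1: [0.0, 0.0, 2.0, 1.0, 1.0],
})

# One list of values per class label in column 1.
grouped = df[0].groupby(df[1]).apply(list)

# Spread the lists into columns; shorter groups are padded with NaN.
wide = pd.DataFrame(grouped.tolist(), index=grouped.index)
print(wide)
```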

#

let me try and get back to you

radiant ingot
#

Hey everyone, I'm working with time series data and could use some opinions on the best way to format dates. I have to choose between datetime.datetime or numpy.datetime64 objects.

#

Leaning towards the native datetime library, but I thought that datetime64 may play nicer with certain models? Anyone run into this before?

rustic dew
#

I'd say if you use numpy for everything, roll with np.datetime64; if pandas, use pandas' own datetimes; if it's a mix or you're not sure, go with datetime.datetime

#

worst-case-scenario, you can always convert
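
To illustrate the "you can always convert" point, here is a small sketch of moving one made-up timestamp between the three types. The variable names are only for illustration.

```python
from datetime import datetime

import numpy as np
import pandas as pd

native = datetime(2020, 5, 17, 12, 30)   # made-up timestamp

as_numpy = np.datetime64(native)          # microsecond-precision datetime64
as_pandas = pd.Timestamp(native)          # pandas' own scalar type

# And back again to the native type.
from_numpy = as_numpy.astype(datetime)
from_pandas = as_pandas.to_pydatetime()

# A whole column at once: pandas stores datetimes as datetime64 under the hood.
series = pd.to_datetime(pd.Series(["2020-05-17", "2020-05-18"]))
print(as_numpy, as_pandas, series.dtype)
```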

radiant ingot
#

Right now we use a mix of pandas and numpy

#

Thanks, appreciate the thoughts

rustic dew
#

so practically, you can choose whichever you like :) personally, I prefer native datetime.datetime, not sure why...

radiant ingot
#

Yeah I'm a bit spoiled because we were working in R before and the lubridate package made my life so easy haha

serene scaffold
lapis sequoia
#

is there a way to convert list to a dataframe row
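
A minimal sketch of two common ways to do this, with made-up values and hypothetical column names: wrap the list in another list to build a one-row frame, or assign it to a new label with `.loc` to append it to an existing frame that has a default integer index.

```python
import pandas as pd

row = [5.1, 3.5, "setosa"]                 # made-up values
columns = ["length", "width", "species"]   # hypothetical column names

# A one-row DataFrame: wrap the list in another list so it is read as a row.
one_row = pd.DataFrame([row], columns=columns)

# Appending the list as a new row to an existing frame with a default index.
df = pd.DataFrame([[4.9, 3.0, "setosa"]], columns=columns)
df.loc[len(df)] = row
print(df)
```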

lapis sequoia
#

I export my env to a .yaml file and import it into Anaconda Navigator, then I cd to the path and find that my project folder is missing. So I go back to my laptop, copy the folder, move it to the cloud, and then copy it to the path. Any idea how to do this faster, or how do you do it?

#

whats the objective

#

I work on my laptop most of the time but I decided to work the project on the desktop.

#

I'm curious about if there is a better or faster way to achieve this.

radiant ingot
#

Could this be done with git?

#

I'm no expert at these things but we use git (via github) for version control on all our code, perhaps you could just commit your env yaml as well and then pull whenever you want to work on a new machine

lapis sequoia
#

I will look into it, I think it could work, thanks for the idea.

lapis sequoia
#

nvm that

serene scaffold
#

@umbral oracle We don't allow people to recruit for paid opportunities of any kind here.

vagrant parcel
#

Hey guys, has anyone used DataQuest? I'm thinking about paying for the pro year, but I wanted to hear from someone who has used it (if this is not the place, where can I talk about it?)

lapis sequoia
vagrant parcel
#

I don't think my college here in Brasil has a partnership with them... But I'll ask anyway

frozen moth
frozen moth
frozen moth
frozen moth
#

Guys has anyone here classified job seniority based on job descriptions? [NLP]

prime cloud
#

I am trying to implement an environmental sound classifier using the urban sounds 8k data set but it seems like my validation loss seems to grow with the epochs. Any idea why?

#

The reference paper I am using gets about 74% accuracy

solemn oracle
#

I just moved to a new computer and am having trouble getting pandas to show my graphs in atom. It says it’s finished but shows nothing

#

Anything simple I’m missing?

#

Far as I can tell, I’m just doing df.plot()
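
One common cause, offered as a guess since the script itself is not shown: when the code runs as a plain script rather than in a notebook, `df.plot()` only draws onto a matplotlib figure, and nothing appears until `plt.show()` is called. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"value": [1, 3, 2, 5]})  # made-up data

ax = df.plot()   # draws onto a matplotlib Axes; a plain script opens no window yet
plt.show()       # hands control to the GUI backend so the figure is displayed
```

If the new machine has no GUI backend installed for matplotlib, `plt.show()` will still show nothing, so that is worth checking too.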

lapis sequoia
#

Hey guys, so there’s this job opening for “Artificial Intelligence Engineer” role at this company that I am thinking I should apply to... this is the job post ... any tips on how I should prepare for that and what to study... I am fairly new to this

austere swift
lapis sequoia
#

Ahh okay sry

#

Thought I’d ask the data science people for some resources or tips

austere swift
#

its alr, just application and job stuff is more in that realm, although one tip i'd give you is to have some sort of example project you could show them

frozen moth
#

i study data science engineering and I'm not quite sure what an AI engineer is

#

i would assume that an AI eng would have to know NLP, be comfortable with algorithms such as A*, and be able to work out constraint satisfaction problems etc., but the description for that job seems to be something a data scientist would do?

#

or maybe not

#

honestly idk

austere swift
#

its more of someone who can make machine learning/deep learning models to run in the field

lapis sequoia
#

I think they just mean data science/machine learning

frozen moth
#

fair enough

lapis sequoia
#

Know any good resource for learning some of the maths related to data science

#

Forgot most of my university maths 😅

frozen moth
#

it's basically statistics

#

and machine learning (SVM, LR, DT, RF, ANN, NB, etc.)

#

brush up on your multivariate analysis and statistics

lapis sequoia
#

Hmmm

frozen moth
#

were you looking for something more specific?

lapis sequoia
#

So my only experience with data science was like 2.5-3 years ago at my 5-6 months internship... was getting the hang of it until I stopped and life continued

frozen moth
#

whats your background?

lapis sequoia
#

Computer science degree and currently working in ASP.Net

#

But I kept using python here and there for automation and scripting

frozen moth
#

yea data science is mostly scripting

#

since you're compsci i assume your programming skills are good

#

so i'd say focus on the math and some info viz

#

the math you require is, like I said, stats, multivariate analysis and all that ML mumbo jumbo

#

you've got some pretty neat O'Reilly textbooks that focus on the math behind data science

#

you can torrent them for free

lapis sequoia
#

They say statistics, probability theory, machine learning algorithms and data modeling

#

In the post

frozen moth
#

yup sounds about right

lapis sequoia
#

And python data science stack, I’ve only used like pandas, numpy and some scikit learn from what I remember at my internship

#

Is this what they mean with that

frozen moth
#

idk tbh but it must be

waxen birch
frozen moth
#

you've got the python software packages that are common thru out all DS: sklearn, numpy, pandas, matplotlib

waxen birch
#

hello, I have data like this in a csv and would like to create a df with a period of time per doctor, in this case: Doctorid1, period 12:00-12:16

frozen moth
#

and then you have the ML ones like tensorflow/keras and sklearn

waxen birch
#

using pandas and groupby, does anyone have any clues? 🙂

frozen moth
#

the info viz stuff: seaborn, Yellowbrick, Plotly Dash etc

#

the NLP ones like spaCy and NLTK

lapis sequoia
#

Uff yeah those I remember from my internship the NLP ones

frozen moth
#

then more specific ones ... for example if you're dealing with networks you'd use NetworkX, powerlaw etc.

#

i guess through practice you'll start accumulating knowledge on these libraries

frozen moth
#

thats like your foundation

lapis sequoia
#

Right, lets see what I can do... the sucky thing is that my laptop is broken so I only have the PC at work to try and squeeze some learning while no one is looking 😅

frozen moth
#

good luck there buddi

lapis sequoia
#

Nice, was looking also at a site called analytics vidhya

#

Don’t know if they’re good

frozen moth
#

^^^ apparently you don't even have to download the textbook

#

its all there

frozen moth
lapis sequoia
#

Havent looked into them much but was reading a medium article by them

kind jungle
#

can someone please explain to me what is wrong

#

this just baffles me

spark stag
#

those aren't " characters you're using, so it doesn't see data.csv as a string; it sees it as a variable name with some other type of quote in front (causing the invalid character error)

south hedge
kind jungle
#

it does

#

jamiesaunders was right

lapis sequoia
#

Fix the quotes

#

Yeah

kind jungle
#

the first quote was apparently a "LATIN SMALL LETTER A WITH CIRCUMFLEX "
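
A small sketch of what is probably going on, assuming the pasted code contained typographic ("curly") quotes. Python rejects them as invalid characters. The UTF-8 bytes of an opening curly quote, when read with a Windows-1252 style encoding, start with â, which would explain why a character identifier reports that letter.

```python
import unicodedata

# The same call written with typographic quotes and with plain ASCII quotes.
curly = "pandas.read_csv(\u201cdata.csv\u201d)"
straight = 'pandas.read_csv("data.csv")'


def parses(source):
    """Return True if Python can parse the expression (it is never executed)."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


print(parses(curly), parses(straight))

# Opening curly quote, encoded as UTF-8 and then misread as Windows-1252.
misread = "\u201c".encode("utf-8").decode("cp1252")
print(unicodedata.name(misread[0]))
```

Retyping the quotes by hand in the editor, instead of copying them from a web page or document, is usually enough to fix it.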

trim oar
#

Problem is I don't understand how it interpreted the quotes like that

south hedge
kind jungle
#

according to a character identifier

#

it works now

#

thx

#

:)

fallow prism
#

how can I get dataframe.head() to show me the whole row?

austere swift
#

why not do print(dataframe) instead?

waxen birch
#

with this kind of data, using pandas, I need to print a period of time in one row (cell); in this case it should be 12:00 - 12:16. Any clues? 😄
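
Since the data itself is not shown, the sketch below assumes a hypothetical layout with one row per visit, a `doctor_id` column, and a `time` column. Under that assumption, a groupby with `min` and `max` gives the start and end of each doctor's period, which can then be joined into one cell.

```python
import pandas as pd

# Hypothetical layout: one row per visit, with a doctor id and a time of day.
df = pd.DataFrame({
    "doctor_id": ["Doctorid1", "Doctorid1", "Doctorid1", "Doctorid2"],
    "time": ["12:00", "12:07", "12:16", "09:30"],
})
df["time"] = pd.to_datetime(df["time"], format="%H:%M")

# Earliest and latest time per doctor.
span = df.groupby("doctor_id")["time"].agg(["min", "max"])

# Join them into a single "start - end" cell.
span["period"] = (
    span["min"].dt.strftime("%H:%M") + " - " + span["max"].dt.strftime("%H:%M")
)
print(span["period"])
```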

torpid cave
#

Sorry @lapis sequoia I went to sleep

#

Still need help?

fallow prism
#

still cut it

#

my problem is the width, i need more width for each row or wrap rows

#

dataframe.apply(print) and that is all

#

or Series.apply(print)

#

thanks !

#

oh, that doesn't work 😢

#

those 3 dots

#

I don't like them

#

a['descripcion_del_hecho - Final'][:5].apply(print) that works fine for me i guess the other ways is mor difficult

#

more*

#

😅

#

pd.options.display.max_colwidth=None

#

that work better
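
For reference, a short sketch of that option with a made-up long cell. `pd.option_context` is used here so the setting only applies inside the `with` block; assigning `pd.options.display.max_colwidth = None` as above changes it for the whole session instead.

```python
import pandas as pd

df = pd.DataFrame({"text": ["x" * 120]})  # made-up long cell

# Default display truncates long cells with "...".
short = repr(df)

# Temporarily lift the column-width limit.
with pd.option_context("display.max_colwidth", None):
    full = repr(df)

print(short)
print(full)
```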

river yarrow
#

anyone with kaggle competition experience?

blazing bridge
#

are you asking if someone wants to do a kaggle competition with you

river yarrow
#

Do I have the right to edit the notebook after a competition deadline in Kaggle is over?

blazing bridge
#

not sure

river yarrow
#

I found

#

You can make a submission at any time and as many times as you like, but we will only consider your latest submission before the deadline.

magic dune
#

I need help writing a linear regression code can someone help???
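
For a question like this, a minimal sketch of fitting and drawing a regression line with scikit-learn might look like the following. The numbers are made up (stand-ins for two numeric columns), and the output path in the last comment is hypothetical. The main thing to watch is that `LinearRegression` has to be instantiated and fitted, and that it expects a 2-D feature array.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend, since the figure is saved rather than shown

import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression

# Made-up stand-ins for two numeric columns such as page count and price.
x = np.array([100, 200, 300, 400, 500], dtype=float)
y = np.array([12.0, 19.5, 31.0, 38.5, 52.0])

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features).
model = LinearRegression().fit(x.reshape(-1, 1), y)
slope, intercept = model.coef_[0], model.intercept_

# Draw the points, then the fitted line across the observed x range.
fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, model.predict(x_line.reshape(-1, 1)), color="red", label="fit")
ax.legend()
# fig.savefig("regression.png")  # hypothetical output path
```

The fitted line is y = slope * x + intercept, so the two values can also be printed or reused elsewhere.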

#

@glad mulch here is my code, I want to make a linear regression line
```py
import os

import pandas
from matplotlib import pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

root = os.path.dirname(__file__)
data_dir = os.path.join(root, "data")
fig_dir = os.path.join(data_dir, "figs")


def make_plt(x, y, df):
    x_list = df[x].to_list()
    y_list = df[y].to_list()
    x_train, x_test, y_train, y_test = train_test_split(
        x_list, y_list, test_size=0.2, random_state=42
    )
    linear = linear_model.LinearRegression()  # created but not fitted yet
    plt.title("Coding Books")
    plt.scatter(x_train, y_train)
    plt.scatter(x_test, y_test)
    # legend() comes after the scatter calls so it has something to label
    plt.legend(["train", "test"])
    plt.savefig(os.path.join(fig_dir, f"{x}-{y}.png"))
    plt.close()


def main():
    data_raw = os.path.join(data_dir, "prog_book.csv")
    raw_df = pandas.read_csv(data_raw)
    raw_df["Reviews"] = raw_df["Reviews"].str.replace(",", "")
    raw_df["Reviews"] = raw_df["Reviews"].astype(int)
    # plot price versus rating, step #1: turn columns into lists
    lists = ["Rating", "Reviews", "Number_Of_Pages", "Type", "Price"]

    for col in lists:
        for col2 in lists:
            if col2 != col:
                make_plt(col, col2, raw_df)

    # step #2: use plt.plot to plot the lists
    print(lists)
    print(type(lists[0]))

    # step #3: export the plot to a pdf

    # regression lines


if __name__ == "__main__":
    main()
```

#

I do not know how to make the line

#

I know the different equations but other than that I have no idea what I am supposed to do

#

thank you so much

#

you're a big lifesaver

neat dew
#

can anyone help me install tensorflow on IDLE? i seem to keep getting callback errors when attempting to import and need it working for a school assignment 😦

cedar sun
#

ValueError: Input 0 is incompatible with layer conv2d_1: expected ndim=4, found ndim=3

#

i am getting this error

#

my images are black and white

#

img_array = cv2.imread(os.path.join(path, images[i]), 0)

#

opening them with 0 turns them into black and white i guess

#

```py
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=dimension))
```
#

dimension = (64, 64)

#

where is the error?

sharp stump
#

dimension amount is different i guess ¯_(ツ)_/¯
you could use stack overflow...
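
A likely explanation, sketched with NumPy only and made-up placeholder images: by default `Conv2D` expects batches shaped (batch, height, width, channels), so grayscale images loaded as 2-D arrays need an explicit channel axis, and `input_shape` should then be `(64, 64, 1)` rather than `(64, 64)`. The array below stands in for the images read with `cv2.imread(path, 0)`.

```python
import numpy as np

# Stand-in for a stack of grayscale images as loaded by cv2.imread(path, 0):
# each image is 2-D, so the stack is (n_images, height, width).
images = np.zeros((10, 64, 64), dtype=np.uint8)

# Add a trailing channel axis so each sample is (height, width, channels).
x = images[..., np.newaxis].astype("float32") / 255.0

# input_shape describes one sample without the batch axis.
dimension = x.shape[1:]
print(x.shape, dimension)
```

With that change, `input_shape=dimension` in the `Conv2D` layer above would receive a 3-tuple, and the training array would have the four dimensions the layer asks for.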