#data-science-and-ml | Python | Page 237

desert parcel Jul 23, 2020, 10:12 AM

#

so x_test is the stuff inside the .csv and it uses the things it came up with in x_train and compares them against x_test?

spark stag Jul 23, 2020, 10:12 AM

#

so you are using a linear regression model, which tries to fit the data given to it by creating a line of best fit through the data. if you think of it as a 2d graph for now, what the model is trying to do is find the best gradient and intercept of a line so that this line of best fit goes through the data
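
A minimal sketch of the "best gradient and intercept" idea, assuming a handful of made-up points and using NumPy's `np.polyfit` rather than the sklearn model from the chat. The data values are invented purely for illustration.

```python
import numpy as np

# Hypothetical 2-D data: x values and noisy y values lying roughly on a line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# A degree-1 least-squares fit returns the gradient first, then the intercept.
gradient, intercept = np.polyfit(x, y, deg=1)

print(f"gradient  = {gradient:.3f}")
print(f"intercept = {intercept:.3f}")

# The fitted line can then be used to predict y for a new x.
new_x = 5.0
print(f"predicted y at x={new_x}: {gradient * new_x + intercept:.3f}")
```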

desert parcel Jul 23, 2020, 10:12 AM

#

alright yeah

spark stag Jul 23, 2020, 10:14 AM

#

x_train and x_test are just the data from x defined at the top but split into 2 groups. it takes most of that data for training but reserves some for testing its performance on at the end

desert parcel Jul 23, 2020, 10:15 AM

#

so the percentage it gives out at the end

#

are the y values?

spark stag Jul 23, 2020, 10:18 AM

#

so let's say for example you had some data like array([-1.15878911, 2.93868307, -1.59251035, 2.96522191, -1.47123134, 2.73263764, 2.17527494, 2.90636932]). this would be the data in x at the beginning (there would be values in y for the true values as well). when you pass this to sklearn.model_selection.train_test_split(), it takes some of the data away, e.g. -1.59251035, to be used for testing (x_test) at the end, and the rest of it is kept for training on (x_train). the 0.1 (test_size=0.1) means 10% of the data is held out for testing, so the other 90% is used for training

#

you don't give all of the data to the model because then you don't know if it is just memorizing the input so you test it on new data to see how well it can make predictions on new data
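
A small sketch of the split described above, assuming the eight example x values from the chat and a made-up y (the targets here are invented so the example runs on its own). `random_state` is only set so the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# The example feature values quoted in the chat, as a single-feature column.
x = np.array([-1.15878911, 2.93868307, -1.59251035, 2.96522191,
              -1.47123134, 2.73263764, 2.17527494, 2.90636932]).reshape(-1, 1)

# Hypothetical "true" values, invented for illustration only.
y = 2.0 * x.ravel() + 1.0

# test_size=0.1 holds out about 10% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=0
)

print("training rows:", x_train.shape[0])
print("testing rows: ", x_test.shape[0])
```

With only eight rows, 10% rounds up to a single held-out row, so seven rows remain for training.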

desert parcel Jul 23, 2020, 10:19 AM

#

ahhh

#

this is a lot clearer

#

so that's what

#

train_test_split() does

#

so for example

spark stag Jul 23, 2020, 10:21 AM

#

yes, it partitions all the data you give it so it can be used for training and then checking if your model has done a good job at finding general patterns in the data

desert parcel Jul 23, 2020, 10:21 AM

#

```py
14 x = np.array(data.drop([predict], 1))
15 y = np.array(data[predict])
16
17 x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
18 linear = linear_model.LinearRegression()
```

#

ignore the 14s, and 15s

#

so if instead of x and y at the first two rows

#

if I did like

#

dataA and dataB

#

then on the train, and test lines

#

would it be

#

dataA_train, dataA_test, and so on?

spark stag Jul 23, 2020, 10:23 AM

#

sort of yes, but y isn't data the model uses to make predictions with, it's only used as a comparison at the end to see how well that prediction did, so the model knows how it should change to make better future predictions

desert parcel Jul 23, 2020, 10:23 AM

#

oh so then

#

y is like the answer sheet

#

it compares its own results (x) with the answers (y)?

amber hound Jul 23, 2020, 10:25 AM

#

Hello there! I'm new to the server and I'm coming with a few questions about an error I'm getting using cosine similarity (for a Count Vectorizer Matrix) and linear kernel (TFIDF vectorizer matrix) of Sklearn. I'm getting a Memory Error each time I try to compute the similarities as my data-frame is too big (df.shape = (394592, 7)) and I don't know how to approach this problem. Any help would be appreciated!

spark stag Jul 23, 2020, 10:25 AM

#

for example if I was trying to teach you to classify 2 new animals, the data x could be an image of them, and the labels y could be which animal that is. you would make a prediction from the image I showed you (x data), then I would tell you what that animal actually was (y labels). this way you learn from your mistakes. this is called supervised learning

desert parcel Jul 23, 2020, 10:27 AM

#

ahh

#

so like your answers against an answer sheet then

#

Yeah that makes sense

#

This is gonna be one really long course to finish

spark stag Jul 23, 2020, 10:28 AM

#

it's important to note here that x isn't the prediction but the data, e.g. the image. your prediction is made based on the data x but it's not the same thing

desert parcel Jul 23, 2020, 10:29 AM

#

oh.

#

then what is the prediction then

#

y_train or y_test?

#

Or am I still not getting it

spark stag Jul 23, 2020, 10:32 AM

#

the prediction isn't any of the data you store, that is just data. the prediction is based on those variables, but because the model may interpret the data differently as it learns, the predictions cannot be stored beforehand. they are made by the model in linear.fit and linear.score, then it checks against the true value y what it should have predicted

#

i haven't used sklearn much but see if the model (linear) has a predict method (or something similar). if so you can print out a single data item, print out linear.predict(<that item>), then print out the label (y) for that item. it may help you understand the difference between them
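
sklearn's `LinearRegression` does have a `predict` method, so the experiment suggested above could look roughly like this. It is a sketch on synthetic data (a noiseless line invented for the example), not the course's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, invented for illustration: y depends on x along an exact line.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(50, 1))
y = 3.0 * x.ravel() + 2.0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=0
)

linear = LinearRegression()
linear.fit(x_train, y_train)          # learns gradient and intercept from x_train / y_train

item = x_test[:1]                     # one data item, kept 2-D: shape (1, 1)
prediction = linear.predict(item)     # the model's guess for that item

print("data item (x):     ", item[0])
print("prediction:        ", prediction[0])
print("true label (y):    ", y_test[0])
print("R^2 on test data:  ", linear.score(x_test, y_test))
```

The three printed values are different things: the input, what the model computed from it, and the stored answer it is compared against.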

desert parcel Jul 23, 2020, 10:35 AM

#

alright i'll mess around with this

#

You can help the other person

amber hound Jul 23, 2020, 10:41 AM

#

@amber hound I've heard of Gensim for the TFIDF case and I'm searching information on how to implement it but I don't know what to do about the Count Vectorizer and Cosine Similarity memory error.

dull turtle Jul 23, 2020, 11:17 AM

#

hello, i am using pycountry module.
i am getting "country_code" : "IND" in request.
how can i use it in my code to get the name of the respective country
my code here
modelType = pycountry.countries.get(alpha_3='country_code').name

desert oar Jul 23, 2020, 12:45 PM

#

@dull turtle you should use one of our general purpose help channels. See #❓｜how-to-get-help

#

@amber hound you want the entire pairwise similarity matrix for almost 400k vectors?

#

That's like trillions of nonzero entries in the similarity matrix. No surprise it won't fit in RAM

#

Sorry not trillions. Many billions though

#

So multiply that by 64 bit or 32 bit for minimum RAM usage

#

What do you actually want to do with that

river thistle Jul 23, 2020, 1:38 PM

#

In SageMath, is it possible to calculate probabilities of multiple sets, using P(x) notation?

desert oar Jul 23, 2020, 4:26 PM

#

no @void anvil it sends data back and forth between python and an R process

#

so yes you still need R

chilly geyser Jul 23, 2020, 4:33 PM

#

@amber hound Yes, your df is too big.
Consider that your distance matrix would be 394592^2 = 155702846464 entries. Assuming 1 byte each, that's 155 GB. Typically you'd store reals or floats, which are 4 bytes. Assuming you store above-diagonal only since it's symmetric, you'd still need approx 2x of 155GB or 300GB+

There is no sane or normal computer with approx. 300GB RAM, assuming it's processed in RAM.

I'd recommend you somehow make your df smaller with some qualitative reasoning

I must admit IDK why your df has 7 columns though.
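
The arithmetic above can be checked with a few lines of plain Python. The 4-byte and 8-byte element sizes are the usual float32 and float64 widths, and "GB" here means 10^9 bytes.

```python
n = 394_592                      # number of rows in the data-frame

full_entries = n * n             # every pair, including the diagonal
upper_entries = n * (n - 1) // 2 # strictly above the diagonal (symmetric matrix)

def gigabytes(num_bytes):
    """Convert a byte count to decimal gigabytes."""
    return num_bytes / 1e9

print("entries in full matrix:", full_entries)
print("full matrix, 1 byte each:   %.1f GB" % gigabytes(full_entries))
print("full matrix, float32:       %.1f GB" % gigabytes(full_entries * 4))
print("upper triangle, float32:    %.1f GB" % gigabytes(upper_entries * 4))
print("upper triangle, float64:    %.1f GB" % gigabytes(upper_entries * 8))
```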

desert oar Jul 23, 2020, 4:35 PM

#

for context, the 32 core general purpose ML server that my team uses has 256 GB of RAM

chilly geyser Jul 23, 2020, 4:35 PM

#

That's some serious investment mannnnnn

desert oar Jul 23, 2020, 4:35 PM

#

it's a physical on-prem machine, so it's not like they're paying through the nose for some cloud services

#

but yes

#

this is neither a sane nor normal setup 😆

chilly geyser Jul 23, 2020, 4:36 PM

#

I'm pretty sure I can get access to supercomputing for highly parallelizable loads, though I can't say how much it'd cost. Not too sure about specific high-corecount single-machine-ish things

desert oar Jul 23, 2020, 4:36 PM

#

but yeah point being you'd basically have to parallelize the distance computation and dump the results to disk periodically

#

which again begs the question: what exactly do you need such a huge distance matrix for?
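
One way to avoid materialising the whole matrix is to compute it block by block and keep only what is needed from each block, for example the k most similar rows. The sketch below assumes dense NumPy vectors and cosine similarity. The function name, `k`, and `chunk_size` are illustrative choices, and a real TF-IDF matrix would more likely be sparse.

```python
import numpy as np

def top_k_cosine_chunked(vectors, k=5, chunk_size=1000):
    """For each row, return indices and similarities of its k most similar other rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)   # unit length: dot product == cosine
    n = unit.shape[0]
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=unit.dtype)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        sims = unit[start:stop] @ unit.T           # only a (chunk, n) block is in memory
        rows = np.arange(stop - start)
        sims[rows, np.arange(start, stop)] = -np.inf   # ignore each row's similarity to itself

        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]   # k best, unordered
        part_sims = sims[rows[:, None], part]
        order = np.argsort(-part_sims, axis=1)                # order the k best
        top_idx[start:stop] = part[rows[:, None], order]
        top_sim[start:stop] = part_sims[rows[:, None], order]
    return top_idx, top_sim

rng = np.random.default_rng(0)
demo = rng.normal(size=(500, 16))                  # random stand-in data
idx, sim = top_k_cosine_chunked(demo, k=5, chunk_size=128)
print(idx.shape, sim.shape)
```

Memory then grows with `chunk_size * n` plus `n * k` rather than `n * n`. Each block could also be written to disk instead of being reduced to its top k.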

chilly geyser Jul 23, 2020, 4:37 PM

#

I hope there's a better way to do the distance-matrix thing though. Scaling by n^2 in memory means it's quite impossible to develop large-scale things

desert oar Jul 23, 2020, 4:37 PM

#

not that i know of. it's a fundamental limitation of techniques that require a full distance matrix

chilly geyser Jul 23, 2020, 4:37 PM

#

Large distance matrices are always useful

desert oar Jul 23, 2020, 4:38 PM

#

yeah but for what? are you going to compute hdbscan on it or something?

#

obviously if you are building a database it makes sense that you'd want to construct an index of some kind

#

that actually might be better

#

is there a general-purpose on-disk "vector database" for doing neighbor queries and stuff?

chilly geyser Jul 23, 2020, 4:42 PM

#

Well I have used distance matrices in 2 instances:

node-A to node-B distance in a graph. The more different locations, the more nodes you get. And also generalises to anything that 'nodes' can be, which can scale to very large numbers
NLP word vectors. A larger vocabulary is better than a smaller vocabulary

#

word-vec A dist to word-vec B is useful IMO

#

You'd need the whole distance matrix if you want to query any pair, or alternatively you could just query specific word vectors/elements themselves to get closest for just one I guess, which should scale linearly in terms of memory
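
A sketch of the "query one vector against all the others" case, assuming a small made-up set of 2-D vectors. Only one similarity per stored vector is held at a time, which is why memory scales linearly.

```python
import numpy as np

def nearest_neighbors(query, vectors, k=3):
    """Return indices and cosine similarities of the k vectors closest to query."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = unit @ q                  # one similarity per stored vector, shape (n,)
    best = np.argsort(-sims)[:k]     # indices of the k largest similarities
    return best, sims[best]

# Hypothetical "word vectors", invented for the example.
vectors = np.array([[1.0, 0.0],
                    [0.9, 0.1],
                    [0.0, 1.0],
                    [-1.0, 0.0]])

idx, sims = nearest_neighbors(np.array([1.0, 0.0]), vectors, k=2)
print(idx, sims)
```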

desert oar Jul 23, 2020, 4:46 PM

#

yes, this is what indexes are for

#

go look at how e.g. fasttext does it, that's all written in very simple readable C++

#

and just computing distance for a specific pair of vectors isn't that slow

chilly geyser Jul 23, 2020, 4:49 PM

#

I doubt you'd want a specific specific pair

#

For a specificvector-to-all-other-vectors I can see applications yes

#

Also I must be really late to the party, but I found out about fastText from what you just said

#

Welp, guess there's always new things, although I do NLP more for hobby (what's a close word to "xyz"?) than anything

desert oar Jul 23, 2020, 5:07 PM

#

yeah there are a few of these "oddball" ML tools out there

#

vowpal wabbit, fasttext, starspace

#

the latter 2 are facebook research products

chilly geyser Jul 23, 2020, 5:07 PM

#

Interestingly I don't see fastText data in CC0 license 😦

desert oar Jul 23, 2020, 5:07 PM

#

software typically isnt CC licensed

chilly geyser Jul 23, 2020, 5:08 PM

#

I meant the word vectors themselves

desert oar Jul 23, 2020, 5:08 PM

#

ah

#

https://fasttext.cc/docs/en/english-vectors.html


#

CC BY-SA 3.0

chilly geyser Jul 23, 2020, 5:08 PM

#

That said I'm surprised a Gigaword-trained data is dedicated to Public Domain

#

Yeah it's BY-SA, which is copyleft-forcing

#

Which means no commercialisation

desert oar Jul 23, 2020, 5:09 PM

#

that isn't true at all

#

that would be NC

#

this is not a NC license

chilly geyser Jul 23, 2020, 5:09 PM

#

Oh

#

Right

desert oar Jul 23, 2020, 5:09 PM

#

it isn't even "viral" like the GPL

chilly geyser Jul 23, 2020, 5:09 PM

#

Ah I got confused

#

So it's not very functionally different from MIT/BSD

desert oar Jul 23, 2020, 5:09 PM

#

im not sure how CC BY-SA handles derivative works

#

so you need to check

#

well no... if you modify the data you must share it under BY-SA

chilly geyser Jul 23, 2020, 5:10 PM

#

Ah, I guess MIT is more permissive?

desert oar Jul 23, 2020, 5:10 PM

#

so for the data itself, it is viral like GPL

chilly geyser Jul 23, 2020, 5:10 PM

#

It's definitely easier to self-train

desert oar Jul 23, 2020, 5:11 PM

#

but for derivative works e.g. a software product that uses said data internally, i don't know

#

well the data itself is under the license too

#

so a trained model would be a derivative work of the source data

chilly geyser Jul 23, 2020, 5:11 PM

#

probably not at the same scale or with the exact same data

desert oar Jul 23, 2020, 5:11 PM

#

ah derivative work i think still requires SA

#

"Adaptation" means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License. For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered an Adaptation for the purpose of this License.

#

https://creativecommons.org/licenses/by-sa/3.0/legalcode

#

"Distribute" means to make available to the public the original and copies of the Work or Adaptation, as appropriate, through sale or other transfer of ownership.

chilly geyser Jul 23, 2020, 5:13 PM

#

I just realised Wikipedia is CC-BY-SA 3.0 legally speaking too

desert oar Jul 23, 2020, 5:13 PM

#

yes

chilly geyser Jul 23, 2020, 5:13 PM

#

Welllllllll I guess I'm never working for an NLP company

desert oar Jul 23, 2020, 5:13 PM

#

i dont understand your issue

#

you want to do hobby projects?

#

who cares, you arent distributing them

chilly geyser Jul 23, 2020, 5:14 PM

#

Nah my hobby projects are obviously doable

desert oar Jul 23, 2020, 5:14 PM

#

you want to start a company and use other people's free data? sucks, follow the license terms

chilly geyser Jul 23, 2020, 5:14 PM

#

But for any monetary purposes I'd need to declare it comes via training on Wiki, for example

#

Yeah hmm

desert oar Jul 23, 2020, 5:14 PM

#

that's my guess as to why they bothered using CC BY-SA 3.0

#

because the source wikipedia data is, and their dataset + trained model are derivative works

#

also english gigaword has a whole license agreement attached to it

#

not sure what public domain version there is

chilly geyser Jul 23, 2020, 5:16 PM

#

Yeahhhh

#

I got a feeling that it can't be sent to Public Domain ...

desert oar Jul 23, 2020, 5:18 PM

#

are you looking for a public domain dataset?

chilly geyser Jul 23, 2020, 5:18 PM

#

That'd be nice

#

I'd be able to publish that under permissive licenses on Github without troubles 😛

#

Ah well I should have just used my original Wikinews dump

📎 unknown.png

#

But honestly Wikinews is very small.

serene scaffold Jul 23, 2020, 5:23 PM

#

@desert oar going line-by-line worked but produced an output with fewer lines but a much larger file size

#

I'm thoroughly confused.

mellow spruce Jul 23, 2020, 5:40 PM

#

@desert oar That worked out. Thank you so much!!
@mellow spruce Hi it's me again. This worked great, I have a question tho. I want to apply this calculation to two different columns, i.e. I want to calculate the idle time between when an activity is over and the next activity starts for these different groups I have created. I tried time_diff_byname=lf.groupby('name')['Time','Time 2'].apply(lambda y:y['Time'].shift(-1)-y['Time2'] But that didn't work

desert oar Jul 23, 2020, 5:41 PM

#

first of all, write [['Time', 'Time 2']]

#

actually that should do it

#

or you can write lf[['Time', 'Time 2', 'name']].groupby('name')

mellow spruce Jul 23, 2020, 5:43 PM

#

That did it king. much thanks

river thistle Jul 23, 2020, 5:47 PM

#

In SageMath, is it possible to calculate probabilities of multiple sets, using P(x) notation?

desert oar Jul 23, 2020, 5:48 PM

#

idk if anyone here uses sage :/ i saw you've asked this a few times

earnest wadi Jul 23, 2020, 5:57 PM

#

Hello, Im having some trouble using a dataset I have created with tensor flow

im getting this error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

this is the code:

train_data = np.asarray(train_data)
train_labels = np.asarray(train_labels)
test_data = np.asarray(test_data)
test_labels = np.asarray(test_labels)
return (train_data, train_labels), (test_data, test_labels)


model.fit(train_data, train_labels, epochs=10)

#

ask for any other stuff u need

mellow spruce Jul 23, 2020, 5:59 PM

#

So when I tried to add time_diff_byname to the data frame with lf['Time_diff']=time_diff_byname.reset_index(level-1, drop=True) it gives me this error message cannot reindex from a duplicate axis any ideas to append this column?

desert oar Jul 23, 2020, 6:19 PM

#

@earnest wadi what does train_data and test_data contain?

#

@mellow spruce does time_diff_byname.index contain duplicate values?

#

.reset_index(level=-1, drop=True)?

#

can you show what time_diff_byname.head() contains

mellow spruce Jul 23, 2020, 6:23 PM

#


```
      4    -1 days +23:10:10
      8    -1 days +22:49:50
      12                 NaT
Mary  2    -1 days +22:22:35
```

#

It might contain duplicates

desert oar Jul 23, 2020, 6:27 PM

#

ah

#

im actually not sure how it would, tbh

#

oh did i steer you wrong on this too

#

does lf itself have duplicate indices?

#

or no

mellow spruce Jul 23, 2020, 6:29 PM

#

nop, I haven't set indices to the df yet

desert oar Jul 23, 2020, 6:29 PM

#

can you send me some sample data again

mellow spruce Jul 23, 2020, 6:30 PM

#


```py
   'tool':['Hammer', 'Drill','Wipes', 'Driver', 'Drill','Wipes','Hammer', 'Driver','Driver', 'Drill','Hammer', 'Drill', 'Drill','Wipes','Hammer', 'Driver'],
   'Time':['13:40:31','13:20:33','13:05:00','12:15:28','12:00:00','11:43:35','11:27:35','11:17:22','11:10:10','10:59:11','10:22:15','10:12:10','10:00:00','09:55:05','09:45:45','09:16:35']}

lf=pd.DataFrame(data=d)

lf['Time']=pd.to_timedelta(lf['Time'])
```

desert oar Jul 23, 2020, 6:33 PM

#

so, groupby on series and dataframes have different semantics

#

with respect to how the indices are constructed at the end

#

which i did not realize

#

try level=0 instead of level=-1

#


In [31]: time_diff_byname
Out[31]:
name
John     0    12:00:00
         4    11:10:10
         8    10:00:00
         12        NaT
Mary     2    11:27:35
         6    10:22:15
         10   09:45:45
         14        NaT
Peter    1    11:43:35
         5    10:59:11
         9    09:55:05
         13        NaT
Richard  3    11:17:22
         7    10:12:10
         11   09:16:35
         15        NaT

this is what i get

#

which means the first (0th) level of the index needs to go

#

not the last

#

that was my mistake
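
The index mismatch discussed here comes from the group key being added as an extra index level. An alternative that sidesteps it, not the approach used in the chat, is the group-wise `shift`, whose result keeps the original row index and can be assigned straight back. The small frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: two people, activity times in descending order.
lf = pd.DataFrame({
    "name": ["John", "Mary", "John", "Mary", "John", "Mary"],
    "Time": pd.to_timedelta(
        ["13:40:31", "13:05:00", "12:00:00", "11:27:35", "11:10:10", "10:22:15"]
    ),
})

# shift(-1) within each group: every row gets the next Time of the same person.
# The result is indexed like lf itself, so no reset_index is needed.
lf["next_time"] = lf.groupby("name")["Time"].shift(-1)

# Difference between a row's Time and the same person's next Time.
lf["Time_diff"] = lf["next_time"] - lf["Time"]
print(lf)
```

The last row of each group has no following row, so it gets `NaT`, matching the `NaT` entries in the output shown above.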

mellow spruce Jul 23, 2020, 6:34 PM

#

lf['Time_diff']=time_diff_byname.reset_index(level-1, drop=True) in here?

#

that worked out correctly, thank you so much master!!1

earnest wadi Jul 23, 2020, 6:43 PM

#

@desert oar
they both look like lots of these

  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0] ```

there is a corresponding word in a word index for each number

#

it cant print the whole thing

```
 [   4    5   14 ...    0    0    0]
 [  22   23    5 ...    0    0    0]
 ...
 [1080   32    5 ...    0    0    0]
 [  89    5   25 ...    0    0    0]
 [ 448   59   76 ...    0    0    0]]
[[ 330  366 1032 ...    0    0    0]
 [1125   22  615 ...    0    0    0]
 [1142  134    5 ...    0    0    0]
 ...
 [ 126 2402  128 ...    0    0    0]
 [  33 2419  248 ...    0    0    0]
 [ 128 2430   22 ...    0    0    0]]
```

desert oar Jul 23, 2020, 6:44 PM

#

is that a 2d array?

#

it might be an array of lists

#

which the error message hints at
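
An "Unsupported object type" message is consistent with the array having `dtype=object`, that is, a 1-D array whose elements are lists or arrays rather than numbers. A quick check along these lines can confirm it. The `describe` helper and the sample rows are hypothetical.

```python
import numpy as np

def describe(arr):
    """Summarise dtype and shape; for object arrays also the element types and lengths."""
    info = {"dtype": str(arr.dtype), "shape": arr.shape}
    if arr.dtype == object:
        info["element_types"] = sorted({type(el).__name__ for el in arr})
        info["lengths"] = sorted({len(el) for el in arr})
    return info

# Rows of different lengths: NumPy can only hold these as an object array.
ragged = np.array([[4, 5, 6], [4, 5, 14, 15], [22, 23]], dtype=object)

# Rows of equal length: a proper 2-D numeric array.
rect = np.array([[4, 5, 6, 0], [4, 5, 14, 15], [22, 23, 0, 0]])

print(describe(ragged))
print(describe(rect))
```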

earnest wadi Jul 23, 2020, 6:44 PM

#

oh yeah that would actually make sense

#

ill iterate through it and make sure they are all arrays

#

still getting this ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

#

I did this:

for i in range(len(test_labels)):
        for r in range(len(test_labels[i])):
            test_labels[i][r] = int(test_labels[i][r])
        test_labels[i] = np.array(test_labels[i])

desert oar Jul 23, 2020, 6:48 PM

#

you still have an array of arrays

#

where is this data coming from?

#

how is it constructed?

#

seems like you ought to flatten this to a 2d array

earnest wadi Jul 23, 2020, 6:50 PM

#

the data is from a file structured like this

data [['4', '5', '6', '7', '8', '5', '4', '11'], ['4', '5', '14', '15', '16', '5', '4', '6', '20', '21'], ['22', '23', '5', '25', '6', '27', '28', '29', '30', '31'], ['32', '33', '34', '35', '6', '37', '22', '39', '25', '41', '42', '43', '44', '45'], ['46', '47', '22', '49', '50', '5', '52', '47', '54', '22', '49', '57', '41', '59', '5', '61', '62', '22', '50', '5', '66'], ['67', '68', '69', '22', '71', '72', '73', '42', '75', '76', '33', '78'], ['79', '47', '81', '16', '83', '84', '85', '86', '44', '5'], ['89', '5', '25', '92', '93', '16', '95', '96', '97', '44', '47'], ['100', '5', '102', '103', '104', '44', '106', '16', '5', '109', '110', '111'], ['5', '113', '42', '6', '116', '117', '118', '119', '6', '121', '16', '123', '5', '125', '126', '127', '128', '129', '130'], ['89', '5', '133', '134', '135', '8', '137', '138', '126', '140'], ['125', '33', '143', '144', '8', '146', '147', '44', '149', '150'], ['32', '5', '153', '6', '155', '156', '157', '42', '6', '160'], ['32', '5', '153', '6', '165', '157', '42', '6', '169'], ['125', '33', '172', '173', '16', '137', '176', '177'], ['100', '5', '180', '126', '182', '16', '5', '185', '186', '187', '44', '47'], ['4', '5', '129', '193', '16', '195', '6', '197', '198', '119', '129', '201', '202', '47', '22', '205', '206', '5', '208'], ['32', '22', '153', '6', '7', '214', '215', '216', '217', '218', '219', '5'], ['89', '5', '133', '134', '225', '16', '22', '25', '5', '6', '231'], ['47', '233', '5', '125', '25', '6', '238', '233', '240', '6', '242', '233', '244', '6', '246'], ['137', '248', '249', '5', '251', '47', '253', '254', '33', '143']]```

I then split it up into train and test data

#

how do I flatten it

desert oar Jul 23, 2020, 6:51 PM

#

that's what the data looks like in the file?

#

what kind of horrible data format is that

earnest wadi Jul 23, 2020, 6:51 PM

#

yes

#

haha

#

it's literally just the list

#

I slapped it in there

desert oar Jul 23, 2020, 6:52 PM

#

:/

#

use json

#

please

earnest wadi Jul 23, 2020, 6:52 PM

#

alr haha

#

I will learn that for the future

#

point is

desert oar Jul 23, 2020, 6:52 PM

#

or .npy

earnest wadi Jul 23, 2020, 6:52 PM

#

I can extract the data back in to a python list

desert oar Jul 23, 2020, 6:52 PM

#

https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html#module-numpy.lib.format

#

yeah but i shudder to imagine how

#

either way you need to make sure this is correctly handled as a 2d matrix, not a 1d array of 1d arrays
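
A sketch of the two storage options mentioned above: JSON for the uneven-length lists, and `.npy` for an array that is already rectangular. The file names and sample values are made up, and a temporary directory is used so nothing is left behind.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical uneven-length sequences of word indices.
sequences = [[4, 5, 6, 7], [4, 5, 14], [22, 23, 5, 25, 6]]

# A rectangular (already padded) array, suitable for the .npy format.
padded = np.array([[4, 5, 6, 7, 0],
                   [4, 5, 14, 0, 0],
                   [22, 23, 5, 25, 6]])

with tempfile.TemporaryDirectory() as tmp:
    # JSON keeps nested lists and their integer values as they are.
    json_path = os.path.join(tmp, "sequences.json")
    with open(json_path, "w") as f:
        json.dump(sequences, f)
    with open(json_path) as f:
        loaded = json.load(f)

    # .npy stores dtype and shape along with the data.
    npy_path = os.path.join(tmp, "sequences.npy")
    np.save(npy_path, padded)
    restored = np.load(npy_path)

print(loaded)
print(restored.shape, restored.dtype)
```

Either format removes the need to split and evaluate text by hand when reading the data back.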

earnest wadi Jul 23, 2020, 6:53 PM

#

literally just reading it, splitting it at the \n then at the first " " to remove data

#

alright so how do I do that

desert oar Jul 23, 2020, 6:53 PM

#

then evaling it right?

#

blah

earnest wadi Jul 23, 2020, 6:54 PM

#

how can I make it a 2d matrix

desert oar Jul 23, 2020, 6:55 PM

#

ah

#

wait

#

these all have different lengths

earnest wadi Jul 23, 2020, 6:56 PM

#

oh, they are all converted to the same length by keras later down the line

#

hang on

#

[ 4 5 6 7 8 5 4 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

#

like this

desert oar Jul 23, 2020, 7:01 PM

#

so you need to 1) read the data, 2) pad them with 0s, 3) convert to np array

#

is that right?

earnest wadi Jul 23, 2020, 7:01 PM

#

by "need" do you mean thats what im doing?

desert oar Jul 23, 2020, 7:02 PM

#

well that's what you should be doing

earnest wadi Jul 23, 2020, 7:02 PM

#

I think I'm np arraying after padding

#

nope

#

padding is last currently

#

should I change

desert oar Jul 23, 2020, 7:02 PM

#

yes

#

if you try to make a numpy array out of uneven length lists

#

it will never make a 2d array from that

earnest wadi Jul 23, 2020, 7:03 PM

#

okay

desert oar Jul 23, 2020, 7:03 PM

#

it will always be an array of arrays
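
The order of operations described here (pad first, then convert) could be sketched as follows. The `pad_to_2d` name and the sample rows are illustrative, and the zero padding mirrors the padded output shown in the chat.

```python
import numpy as np

def pad_to_2d(rows, pad_value=0):
    """Right-pad uneven-length rows and return a 2-D integer array."""
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_value, dtype=np.int64)
    for i, r in enumerate(rows):
        out[i, :len(r)] = [int(v) for v in r]   # also converts '4' -> 4
    return out

# Hypothetical rows of string tokens with different lengths.
rows = [["4", "5", "6"], ["4", "5", "14", "15"], ["22", "23"]]

padded = pad_to_2d(rows)
print(padded)
print(padded.shape, padded.dtype)
```

Because every row has the same length before the array is built, the result is a single numeric 2-D array rather than an object array of separate rows.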

earnest wadi Jul 23, 2020, 7:04 PM

#

[[ 4 5 6 ... 0 0 0]
[ 4 5 14 ... 0 0 0]
[ 22 23 5 ... 0 0 0]
...
[1080 32 5 ... 0 0 0]
[ 89 5 25 ... 0 0 0]
[ 448 59 76 ... 0 0 0]]

#

it's now padded before being converted to an array

#

but still getting an error

desert oar Jul 23, 2020, 7:07 PM

#

show the error

#

same one?

earnest wadi Jul 23, 2020, 7:07 PM

#

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

desert oar Jul 23, 2020, 7:09 PM

#

data_txt = """data [['4', '5', '6', '7', '8', '5', '4', '11'], ['4', '5', '14', '15', '16', '5', '4', '6', '20', '21'], ['22', '23', '5', '25', '6', '27', '28', '29', '30', '31'], ['32', '33', '34', '35', '6', '37', '22', '39', '25', '41', '42', '43', '44', '45'], ['46', '47', '22', '49', '50', '5', '52', '47', '54', '22', '49', '57', '41', '59', '5', '61', '62', '22', '50', '5', '66'], ['67', '68', '69', '22', '71', '72', '73', '42', '75', '76', '33', '78'], ['79', '47', '81', '16', '83', '84', '85', '86', '44', '5'], ['89', '5', '25', '92', '93', '16', '95', '96', '97', '44', '47'], ['100', '5', '102', '103', '104', '44', '106', '16', '5', '109', '110', '111'], ['5', '113', '42', '6', '116', '117', '118', '119', '6', '121', '16', '123', '5', '125', '126', '127', '128', '129', '130'], ['89', '5', '133', '134', '135', '8', '137', '138', '126', '140'], ['125', '33', '143', '144', '8', '146', '147', '44', '149', '150'], ['32', '5', '153', '6', '155', '156', '157', '42', '6', '160'], ['32', '5', '153', '6', '165', '157', '42', '6', '169'], ['125', '33', '172', '173', '16', '137', '176', '177'], ['100', '5', '180', '126', '182', '16', '5', '185', '186', '187', '44', '47'], ['4', '5', '129', '193', '16', '195', '6', '197', '198', '119', '129', '201', '202', '47', '22', '205', '206', '5', '208'], ['32', '22', '153', '6', '7', '214', '215', '216', '217', '218', '219', '5'], ['89', '5', '133', '134', '225', '16', '22', '25', '5', '6', '231'], ['47', '233', '5', '125', '25', '6', '238', '233', '240', '6', '242', '233', '244', '6', '246'], ['137', '248', '249', '5', '251', '47', '253', '254', '33', '143']]"""

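import numpy as np
# Strip the leading "data " label, then evaluate the list literal (trusted input only).
data = eval(data_txt.split(' ', maxsplit=1)[1])
max_len = max(len(rec) for rec in data)
# Convert each token to int and right-pad every record with zeros up to max_len.
data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
data = np.array(data)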

works for me

#

resulting shape is (21, 21)

earnest wadi Jul 23, 2020, 7:10 PM

#

damn ok so uh

#

thats just like

#

what i was trying to do, but not small brain

desert oar Jul 23, 2020, 7:10 PM

#

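def parse_horrible_format(data_txt):
    # data_txt must be the raw string from the file, e.g. "data [['4', '5'], ...]".
    # Drop the leading "data " label, then evaluate the list literal (trusted input only).
    data = eval(data_txt.split(' ', maxsplit=1)[1])
    max_len = max(len(rec) for rec in data)
    # Convert each token to int and right-pad every record with zeros up to max_len.
    data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
    data = np.array(data)
    return data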

here's a function to do it 🙂

#

yes, small brain good

earnest wadi Jul 23, 2020, 7:10 PM

#

hahaha

#

parse horrible format

#

so if I input """data [['4', '5', '6', '7', '8', '5', '4', '11'], ['4', '5', '14', '15', '16', '5', '4', '6', '20', '21'], ['22', '23', '5', '25', '6', '27', '28', '29', '30', '31'], ['32', '33', '34', '35', '6', '37', '22', '39', '25', '41', '42', '43', '44', '45'], ['46', '47', '22', '49', '50', '5', '52', '47', '54', '22', '49', '57', '41', '59', '5', '61', '62', '22', '50', '5', '66'], ['67', '68', '69', '22', '71', '72', '73', '42', '75', '76', '33', '78'], ['79', '47', '81', '16', '83', '84', '85', '86', '44', '5'], ['89', '5', '25', '92', '93', '16', '95', '96', '97', '44', '47'], ['100', '5', '102', '103', '104', '44', '106', '16', '5', '109', '110', '111'], ['5', '113', '42', '6', '116', '117', '118', '119', '6', '121', '16', '123', '5', '125', '126', '127', '128', '129', '130'], ['89', '5', '133', '134', '135', '8', '137', '138', '126', '140'], ['125', '33', '143', '144', '8', '146', '147', '44', '149', '150'], ['32', '5', '153', '6', '155', '156', '157', '42', '6', '160'], ['32', '5', '153', '6', '165', '157', '42', '6', '169'], ['125', '33', '172', '173', '16', '137', '176', '177'], ['100', '5', '180', '126', '182', '16', '5', '185', '186', '187', '44', '47'], ['4', '5', '129', '193', '16', '195', '6', '197', '198', '119', '129', '201', '202', '47', '22', '205', '206', '5', '208'], ['32', '22', '153', '6', '7', '214', '215', '216', '217', '218', '219', '5'], ['89', '5', '133', '134', '225', '16', '22', '25', '5', '6', '231'], ['47', '233', '5', '125', '25', '6', '238', '233', '240', '6', '242', '233', '244', '6', '246'], ['137', '248', '249', '5', '251', '47', '253', '254', '33', '143']]"""

#

it will return

#

train_data that will work with tf?

desert oar Jul 23, 2020, 7:11 PM

#

no idea, but it will definitely do the padding and np.array conversion for you
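
A variant of the same idea that uses `ast.literal_eval` instead of `eval`, so only Python literals are accepted. The function name and the short sample string are made up for the example. Whether the result is accepted by TensorFlow is not verified here, only that it is a 2-D integer array.

```python
import ast

import numpy as np

def parse_labelled_list(txt, pad_value=0):
    """Parse text like "data [['4', '5'], ['6']]" into a padded 2-D int array."""
    # Split off the leading label ("data") and keep the list literal after it.
    _, literal = txt.split(" ", maxsplit=1)
    records = ast.literal_eval(literal)          # safe: evaluates literals only
    max_len = max(len(rec) for rec in records)
    padded = [
        [int(v) for v in rec] + [pad_value] * (max_len - len(rec))
        for rec in records
    ]
    return np.array(padded, dtype=np.int64)

sample = "data [['4', '5', '6'], ['22', '23'], ['7']]"
arr = parse_labelled_list(sample)
print(arr)
print(arr.shape, arr.dtype)
```

If the data is already a Python list rather than a string, the split and `literal_eval` steps can be skipped and only the padding applied.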

earnest wadi Jul 23, 2020, 7:11 PM

#

alright

#

ill get that in

#

and see what happens

#

@desert oar your function doesn't work entirely

AttributeError: 'list' object has no attribute 'split'

desert oar Jul 23, 2020, 7:21 PM

#

data_txt should be a string

#

obviously this won't work on a list..

earnest wadi Jul 23, 2020, 7:22 PM

#

what string should it be

desert oar Jul 23, 2020, 7:26 PM

#

this

"""data [['4', '5', '6', '7', '8', '5', '4', '11'], ['4', '5', '14', '15', '16', '5', '4', '6', '20', '21'], ['22', '23', '5', '25', '6', '27', '28', '29', '30', '31'], ['32', '33', '34', '35', '6', '37', '22', '39', '25', '41', '42', '43', '44', '45'], ['46', '47', '22', '49', '50', '5', '52', '47', '54', '22', '49', '57', '41', '59', '5', '61', '62', '22', '50', '5', '66'], ['67', '68', '69', '22', '71', '72', '73', '42', '75', '76', '33', '78'], ['79', '47', '81', '16', '83', '84', '85', '86', '44', '5'], ['89', '5', '25', '92', '93', '16', '95', '96', '97', '44', '47'], ['100', '5', '102', '103', '104', '44', '106', '16', '5', '109', '110', '111'], ['5', '113', '42', '6', '116', '117', '118', '119', '6', '121', '16', '123', '5', '125', '126', '127', '128', '129', '130'], ['89', '5', '133', '134', '135', '8', '137', '138', '126', '140'], ['125', '33', '143', '144', '8', '146', '147', '44', '149', '150'], ['32', '5', '153', '6', '155', '156', '157', '42', '6', '160'], ['32', '5', '153', '6', '165', '157', '42', '6', '169'], ['125', '33', '172', '173', '16', '137', '176', '177'], ['100', '5', '180', '126', '182', '16', '5', '185', '186', '187', '44', '47'], ['4', '5', '129', '193', '16', '195', '6', '197', '198', '119', '129', '201', '202', '47', '22', '205', '206', '5', '208'], ['32', '22', '153', '6', '7', '214', '215', '216', '217', '218', '219', '5'], ['89', '5', '133', '134', '225', '16', '22', '25', '5', '6', '231'], ['47', '233', '5', '125', '25', '6', '238', '233', '240', '6', '242', '233', '244', '6', '246'], ['137', '248', '249', '5', '251', '47', '253', '254', '33', '143']]"""

#

if you are starting with the literal list

[['4', '5', '6', '7', '8', '5', '4', '11'], ['4', '5', '14', '15', '16', '5', '4', '6', '20', '21'], ['22', '23', '5', '25', '6', '27', '28', '29', '30', '31'], ['32', '33', '34', '35', '6', '37', '22', '39', '25', '41', '42', '43', '44', '45'], ['46', '47', '22', '49', '50', '5', '52', '47', '54', '22', '49', '57', '41', '59', '5', '61', '62', '22', '50', '5', '66'], ['67', '68', '69', '22', '71', '72', '73', '42', '75', '76', '33', '78'], ['79', '47', '81', '16', '83', '84', '85', '86', '44', '5'], ['89', '5', '25', '92', '93', '16', '95', '96', '97', '44', '47'], ['100', '5', '102', '103', '104', '44', '106', '16', '5', '109', '110', '111'], ['5', '113', '42', '6', '116', '117', '118', '119', '6', '121', '16', '123', '5', '125', '126', '127', '128', '129', '130'], ['89', '5', '133', '134', '135', '8', '137', '138', '126', '140'], ['125', '33', '143', '144', '8', '146', '147', '44', '149', '150'], ['32', '5', '153', '6', '155', '156', '157', '42', '6', '160'], ['32', '5', '153', '6', '165', '157', '42', '6', '169'], ['125', '33', '172', '173', '16', '137', '176', '177'], ['100', '5', '180', '126', '182', '16', '5', '185', '186', '187', '44', '47'], ['4', '5', '129', '193', '16', '195', '6', '197', '198', '119', '129', '201', '202', '47', '22', '205', '206', '5', '208'], ['32', '22', '153', '6', '7', '214', '215', '216', '217', '218', '219', '5'], ['89', '5', '133', '134', '225', '16', '22', '25', '5', '6', '231'], ['47', '233', '5', '125', '25', '6', '238', '233', '240', '6', '242', '233', '244', '6', '246'], ['137', '248', '249', '5', '251', '47', '253', '254', '33', '143']]

then feel free to delete the first line where you split the string and eval

earnest wadi Jul 23, 2020, 7:44 PM

#

oh

#

I see

#

@desert oar still the same error
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

desert oar Jul 23, 2020, 7:47 PM

#

show your code..

earnest wadi Jul 23, 2020, 7:50 PM

#

```python
def import_data(dsdir):
    ds = open(f"{dsdir}.cds", "r", encoding='cp1252')
    dataset = ds.read()
    ds.close()

    dataset = dataset.split("\n")
    splitup = []
    for part in dataset:
        splitup.append(part.split(" ", 1))

    splitup[1][1] = splitup[1][1].replace("\'", "\"")
    splitup[2][1] = splitup[2][1].replace("\'", "\"")
    data = json.loads(splitup[1][1])
    labels = json.loads(splitup[2][1])

    train_data = (data[:len(data)//2])
    train_labels = (data[:len(labels)//2])
    test_data = (data[len(data)//2:])
    test_labels = (data[len(labels)//2:])

    def parse_horrible_format(data):
        max_len = max(len(rec) for rec in data)
        data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
        data = np.array(data)
        return data    
    
    train_data = parse_horrible_format(train_data)
    test_data = parse_horrible_format(test_data)

    

    return (train_data, train_labels), (test_data, test_labels)```

#

(train_data, train_labels), (test_data, test_labels) = ds.import_data("Pickup Lines - Insults")


word_index = ds.get_word_index("Pickup Lines - Insults")


word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3

reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, "?") for i in text])

train_data = keras.preprocessing.sequence.pad_sequences(train_data,
                                                        value=word_index["<PAD>"],
                                                        padding="post",
                                                        maxlen=256)

test_data = keras.preprocessing.sequence.pad_sequences(test_data,
                                                        value=word_index["<PAD>"],
                                                        padding="post",
                                                        maxlen=256)

train_data = np.array(train_data)
train_labels = np.array(train_labels)
test_data = np.array(test_data)
test_labels = np.array(test_labels)

print (train_data)

vocab_size = 88000

model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(64, activation=tf.nn.relu))
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))

model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["acc"])

x_val = train_data[:10000]
partial_x_train = train_data[10000:]

y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

model.fit(train_data, train_labels, epochs=10)

#

first block is in my package, 2nd is the main neural net script
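One thing worth checking in the first block: `train_labels` and `test_labels` are sliced from `data`, not from `labels`, so the labels handed to Keras would be the ragged token lists themselves, which could explain the "Unsupported object type list" error. A minimal sketch of a split that keeps the two in step, assuming `labels` is a list parallel to `data` (the `split_half` helper is hypothetical, not part of the original package):

```python
def split_half(data, labels):
    """Split two parallel sequences into (train, test) halves at the same index."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    mid = len(data) // 2
    # Slice data from data and labels from labels, using one shared midpoint.
    return (data[:mid], labels[:mid]), (data[mid:], labels[mid:])


# Made-up sample records, only for illustration.
data = [["4", "5"], ["6"], ["7", "8", "9"], ["10"]]
labels = [1, 0, 1, 0]
(train_data, train_labels), (test_data, test_labels) = split_half(data, labels)
print(train_labels, test_labels)
```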

desert oar Jul 23, 2020, 7:52 PM

#

parse_horrible_format already does the padding FYI

earnest wadi Jul 23, 2020, 7:53 PM

#

so I should delete keras.pre... etc

desert oar Jul 23, 2020, 7:53 PM

#

i mean, there is quite a bit more action happening here

earnest wadi Jul 23, 2020, 7:53 PM

#

or does it not matter too much

desert oar Jul 23, 2020, 7:53 PM

#

frankly i dont use keras so i have no idea what much of this code does

earnest wadi Jul 23, 2020, 7:54 PM

#

ooh

desert oar Jul 23, 2020, 7:54 PM

#

the basic problem is: train_data and train_labels must be numpy arrays of floats, ints, et al

#

not arrays of arrays

#

so whatever processing you do, at the end of the day you must make sure that you are feeding "flat" numpy arrays to keras
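To make the "flat" versus "array of arrays" distinction concrete, here is a small NumPy sketch. The sample values and the `pad_to_matrix` helper are made up for illustration; it mirrors the padding idea from `parse_horrible_format` rather than reproducing it.

```python
import numpy as np

ragged = [[4, 5, 6], [4, 5], [22]]

# Rows of different lengths can only be stored as a 1-D array of Python lists.
as_objects = np.array(ragged, dtype=object)
print(as_objects.shape, as_objects.dtype)


def pad_to_matrix(records, pad_value=0):
    """Right-pad every record to the longest length and return a 2-D array."""
    max_len = max(len(rec) for rec in records)
    return np.array([list(rec) + [pad_value] * (max_len - len(rec)) for rec in records])


padded = pad_to_matrix(ragged)
print(padded.shape, padded.dtype)
```

The first array is the kind of input Keras cannot turn into a tensor; the second is a plain numeric matrix.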

earnest wadi Jul 23, 2020, 7:54 PM

#

but you dont know how I can flatten my data to work

#

ill try looking some more stuff up

desert oar Jul 23, 2020, 7:56 PM

#

i dont know because i dont know what your data looks like before you pass it to keras

#

its possible/likely that pad_sequences is returning an array of lists or something

earnest wadi Jul 23, 2020, 7:57 PM

#

idk because this code was working with an official keras dataset, I basically just tried my best to replicate the format of the index, labels and data

desert oar Jul 23, 2020, 7:58 PM

#

is this "tf keras" or "keras keras" btw?

#

model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(64, activation=tf.nn.relu))
model.add(keras.layers.Dense(32, activation=tf.nn.relu))
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))

this looks so easy even i could probably do it and i am legitimately very stupid

earnest wadi Jul 23, 2020, 8:00 PM

#

this is

#

tf keras

#

from tensorflow import keras

desert oar Jul 23, 2020, 8:00 PM

#

cool

#

i should look into tf 2.0

#

ive used pytorch more, tf 1.0 was too cumbersome

earnest wadi Jul 23, 2020, 8:01 PM

#

yeah i managed to get my head around the code they provided really easily to make this:

desert oar Jul 23, 2020, 8:01 PM

#

i also really hate that they called their "keras-style" library "keras"

#

they should have come up with a different name

earnest wadi Jul 23, 2020, 8:01 PM

#

📎 unknown.png

desert oar Jul 23, 2020, 8:01 PM

#

cool

earnest wadi Jul 23, 2020, 8:01 PM

#

all im tryna do now is make an easy and modular way to train it on any text dataset you want

#

straight from a .txt fike

#

file*

#

basically

#

my question

#

after your function you made for parsing

#

is the output a 2d matrix

#

or is it still array of arrays

desert oar Jul 23, 2020, 8:04 PM

#

my function should return a 2d array

#

i.e. the length of .shape should be 2

earnest wadi Jul 23, 2020, 8:04 PM

#

ill check

#

len(train_data.shape) ?

#

yes

#

it is 2

desert oar Jul 23, 2020, 8:07 PM

#

thats called a "clean room" reimplementation

#

when you copy the logic but none of the source code

#

im in a meeting, but i know something about this topic

#

ill @ you later

earnest wadi Jul 23, 2020, 8:10 PM

#

alright, ive done some testing, the shape is definitely 2 all the way through the code @desert oar

visual violet Jul 23, 2020, 9:20 PM

#

ok so

#

i have

#

📎 unknown.png

#

can i be smart enough to take average of percentage difference every month?

desert oar Jul 23, 2020, 9:31 PM

#

data.set_index('Date').resample('1M')['Percentage Difference'].mean()

like that?

#

or ```python
data.resample('1M', on='Date')['Percentage Difference'].mean()
```

#

@visual violet ^

visual violet Jul 23, 2020, 9:34 PM

#

insanely smart

#

thanks!

#

Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

#

hmm type error

desert oar Jul 23, 2020, 9:35 PM

#

Convert your Date column to an actual date or timestamp type

#

Not a string

visual violet Jul 23, 2020, 9:36 PM

#

so something like this?

#

df_newyork['Date Time'] = pd.to_datetime(df_newyork['Date Time'])

desert oar Jul 23, 2020, 9:37 PM

#

Should work, im on mobile now so cant check
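A minimal sketch of the whole fix, with made-up sample data: convert the string column with `pd.to_datetime`, then resample. It uses the `'MS'` (month start) alias so that it runs on both old and new pandas; the `'1M'` alias in the messages above labels each bin by month end instead, and recent pandas versions spell that `'ME'`.

```python
import pandas as pd

# Made-up sample data; the column names follow the messages above.
data = pd.DataFrame({
    "Date": ["2020-01-05", "2020-01-20", "2020-02-10"],
    "Percentage Difference": [1.0, 3.0, 5.0],
})

# Strings cannot be resampled; convert them to real timestamps first.
data["Date"] = pd.to_datetime(data["Date"])

# One mean per calendar month, labelled by the first day of the month.
monthly = data.resample("MS", on="Date")["Percentage Difference"].mean()
print(monthly)
```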

visual violet Jul 23, 2020, 9:52 PM

#

@desert oar dude it worked

#

thanks for the second code

severe island Jul 23, 2020, 10:53 PM

#

is there any way I can get a twitter dataset?

#

i dont have the time to scrape one, are there any repos online that can provide twitter dataset?

flat quest Jul 23, 2020, 10:56 PM

#

you could always do a quick search

I'm sure theres some on kaggle or uni websites @severe island

vernal sierra Jul 23, 2020, 11:28 PM

#

Has anyone ever worked with neural machine translation? I got an error where it says runtime error: dimension specified as 0 but tensor has no dimensions.

silk knot Jul 24, 2020, 12:39 AM

#

someone told me to ask it here so here I go

#

https://paste.pythondiscord.com/abinalowir.coffeescript Hi I want to convert a table I made to a pd.DataFrame(), take a look at the script I used, if that helps.

#

since the whole thing should work just fine if the table would just be in pd.DataFrame format 😅

#

please ping me if you got a solution

lapis sequoia Jul 24, 2020, 1:30 AM

#

Hi, how can i fix this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

#

Hi so basically im attempting to reshape my 1D array to become a 2D array, but I keep getting an error that says my object doesnt have the Attribute array.reshape(1, -1), even though my console told me to reshape my data using array.reshape(1, -1).

lapis sequoia Jul 24, 2020, 1:57 AM

#

how can i fix it: len() of unsized object

rancid brook Jul 24, 2020, 2:27 AM

#

Read the error and then understand why its giving that error

#

@lapis sequoia You need to post your code

desert oar Jul 24, 2020, 2:31 AM

#

@silk knot can you show an example of what rows and data contain? also try to avoid using variable names like list which shadow important built-in names

silk knot Jul 24, 2020, 2:32 AM

#

something like this but I think its fixed already

📎 unknown.png

#

And good point

#

But I did have another issue you might could help me with 😅

#

ValueError: Shape of passed values is (33905, 36), indices imply (36, 36)

#

I dont know if you mind being tagged so im just gonna do it once, let me know if it bothers you @desert oar

desert oar Jul 24, 2020, 2:46 AM

#

once is fine, thanks for asking

#

if i disappear for a while you can tag me again although i will be heading offline soon

#

show the code that produced that error?

#

usually that means you have mismatched sizes

#

e.g. ```python
pd.Series([1,2], index=list('abcdefg'))
```

produces a similar error

silk knot Jul 24, 2020, 2:48 AM

#

https://paste.pythondiscord.com/delabonira.coffeescript

#

the thing is, I don't remember ever making it 36, 36

#

I did have 36 samples however, so its not a random number I just don't know where I implied it

desert oar Jul 24, 2020, 2:58 AM

#

what line

silk knot Jul 24, 2020, 3:00 AM

#

I think it goes wrong in 74

desert oar Jul 24, 2020, 3:01 AM

#

makes sense

silk knot Jul 24, 2020, 3:01 AM

#

📎 unknown.png

desert oar Jul 24, 2020, 3:01 AM

#

pca_data is not the covariance matrix...

#

it's your full data projected into PC space

#

so N rows and 36 columns

#

whereas you specified index=columns, columns=labels both of which evidently contain 36 elements
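The mismatch can be reproduced with stand-in data. Here a random array plays the role of `pca_data` (100 rows instead of 33905, 4 components instead of 36), and the `PC1`, `PC2`, ... names are placeholders, since the original script is only linked, not shown:

```python
import numpy as np
import pandas as pd

n_rows, n_components = 100, 4
pca_data = np.random.default_rng(0).normal(size=(n_rows, n_components))
labels = [f"PC{i + 1}" for i in range(n_components)]

# Using the component labels as the row index implies a 4 x 4 frame,
# but the data has 100 rows, so pandas refuses.
try:
    pd.DataFrame(pca_data, index=labels, columns=labels)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Label only the columns; the rows keep one entry per sample.
pca_df = pd.DataFrame(pca_data, columns=labels)
print(pca_df.shape)
```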

silk knot Jul 24, 2020, 3:02 AM

#

uhu

#

well thank you I should be able to fix it, after I get some sleep

#

some nights, before you know it its 5am

desert oar Jul 24, 2020, 3:07 AM

#

@void anvil arch? that's a library on pypi?

#

so it is https://pypi.org/project/arch/

PyPI

arch

ARCH for Python

#

nice

#

oh

#

not about clean room code but

#

i wanted to clarify that copyright applies to code

#

specifically to the code

#

not to the algorithm

#

because the algorithm as you point out is part of nature

#

and so can be patented, but not copyrighted, because it is not a creative work

#

however the code is copyrightable, and not patentable, because it is considered a creative work

#

(under US law)

#

this is why licenses such as the GPL are interesting - they apply to the source code, but because of the license terms also have implications for consumers of the software itself

#

thats actually a good question

#

my guess is that it's like paraphrasing someone else's paper without citing their work

#

right

#

i would imagine the same is true for code

#

that isn't necessarily true. this is why clean room implementations exist.

#

for plausible deniability that any source code which happens to be in common is a coincidence

#

i'm not sure about that

#

i'm sure there is plenty of case law on that subject

#

i have a friend who is an attorney in this field

#

i can ask how this all works

#

he will likely say "this scenario is absurd and would never happen in real life" and it would take me 10 mins to get an answer

#

if there is exactly one way and only one way to implement an algorithm in a particular language, e.g. C, i imagine that it would not be considered copyrightable, or that using the same code is fair use

#

so much of US IP law is in the form of case law and precedent

#

so its really really hard to know even if you are brave enough to wade through the statutes

#

so i can ask him but i cant guarantee a good answer

#

i think in general if you don't willfully commit copyright infringement, most codebases are sufficiently complicated and distinct enough that you won't get copyright trolled

#

how what works?

#

you must distribute your contributions under the same license as the original.

#

i'm not sure if the CC BY-SA license can be interpreted to mean that software compiled from the licensed source code must also be distributed under CC BY-SA

#

i suspect that it can't, or isn't

#

and this is also why you should use a code-specific license for your code, so as to remove the ambiguity

#

wait who is using CC for code

#

why would you do that

#

other than like... contributions to rosetta code

#

i suspect that compiled software falls outside the scope of "Adapted Material" in the 4.0 version

#

and of "Derivative Work" in the 2.0 version

#

that's a bit like if you printed out the script to Arcadia, ran it through a paper shredder, then used the pieces to make a collage

#

is that a derivative work of Arcadia?

#

is it?

#

i mean, is it a derivative work?

#

i mean ignore the fact that snippets of copyrighted material are visible in the collage

#

the fact that it's physically made of Arcadia i think isn't alone enough to call it derivative

#

its a good question

#

if you arent distributing it then the GPL at least doesnt care

#

its more interesting if you are building a public-facing API with a closed-source backend

#

are you "distributing" something based on the GPL'ed code?

#

you aren't physically distributing software, but you are providing access to that software

#

right. if providing an API were determined to be "distributing" it would be the legal equivalent of finding P=NP or breaking RSA

#

a court discovery order would change that in a hurry

#

you wouldn't have to look far

#

what if they give a tech talk at a python conference about how they use your GPL library to go super duper fast

#

im not saying thats available in every case, but it could be enough to make an example out of someone

#

plus theres some software which is both GPL and almost literally ubiquitous

#

although then you're basically saying "statistically this person is likely to be using my software" which has a lot of uncomfortable implications for criminal law

#

right, idk if or how much they differ in that respect

#

you might also be able to avoid the "we have cause because 99% of businesses use this"

#

look at job postings for example

#

do they list django? they're probably using django

#

having a case like this succeed in courts would be a true apotheosis of the free software movement

#

not sure its even a good thing

#

or if its obviated by some other legal doctrine or statute

#

ok that one probably will show up in court

#

in the next 5 years

#

great question

#

yep

#

funny i literally just read this article tonight

#

https://www.theatlantic.com/ideas/archive/2020/07/scotus-congress-trust/614380/

The Atlantic

SCOTUS Doesn’t Trust Congress—And That’s a Problem for American Gov...

The past decade has witnessed a dangerous trend: a judicial branch that expresses deep suspicion of the legislative branch’s competence and motives.

#

IP law is such a mess already, now throw in datasets and trained models

#

increasingly complicated blends of the above with source code and patentable algorithms

#

incoming: a German court rules that articles written by AI are considered derivative works of the AI developer

#

i can't wait for that hacker news thread

#

...i think its past my bedtime

#

im having bizarre copyright fantasies

#

likewise, ill try to remember to ask about some of this

rapid plank Jul 24, 2020, 4:34 AM

#

how do I get numpy or matplotlib in vscode?

limpid oak Jul 24, 2020, 6:40 AM

#

`def f(row):
    try:
        return Polygon([(pt['Longitude'], pt['Latitude']) for pt in json.loads(row['PlotGeoFence'])])
    except:
        return numpy.nan

InputFile['geofence_poly'] = InputFile.apply(f, axis=1)`

#

i have this function which returns NaN if it fails, but it is setting all rows to NaN

#

but i want to keep the other column data also and only set NaN where it fails

#

any help will be appreciated
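A bare `except:` hides why each row fails, so if every row comes back NaN the real error never shows up. A hedged sketch of one way to narrow it down, using made-up rows and plain coordinate tuples as a stand-in for `Polygon` (the `parse_geofence` name and the `PlotId` column are hypothetical):

```python
import json

import numpy as np
import pandas as pd


def parse_geofence(raw):
    """Return a list of (lon, lat) tuples, or NaN if the cell cannot be parsed."""
    try:
        return [(pt["Longitude"], pt["Latitude"]) for pt in json.loads(raw)]
    except (TypeError, ValueError, KeyError) as exc:
        # Show the reason instead of silently swallowing it.
        print(f"could not parse {raw!r}: {exc!r}")
        return np.nan


frame = pd.DataFrame({
    "PlotId": [1, 2, 3],
    "PlotGeoFence": [
        '[{"Longitude": 1.0, "Latitude": 2.0}, {"Longitude": 3.0, "Latitude": 4.0}]',
        "not json",
        None,
    ],
})

# Apply to the one column, so the other columns are left untouched.
frame["geofence_points"] = frame["PlotGeoFence"].apply(parse_geofence)
print(frame[["PlotId", "geofence_points"]])
```

If the printed reason is the same for every row, that usually points at the input column rather than at individual bad records.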

#

📎 NaN_Error.JPG

desert parcel Jul 24, 2020, 10:06 AM

#

```py
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
model = keras.Sequential([
                         keras.layers.Flatten(input_shape=(28, 28)),
                         keras.layers.Dense(128, activation=tf.nn.relu),
                         keras.layers.Dense(10, activation=tf.nn.softmax)
])```

#

Could someone explain the 2nd layer?

#

the video says it in a very complicated way and I don't really get it

#

So is it right to say that it has 128 functions that will determine what that thing is?

simple violet Jul 24, 2020, 11:23 AM

#

Does anyone know the most attractive data to scrape with Python, from any links or useful data?

snow grove Jul 24, 2020, 12:03 PM

#

guys can you help me with this```
pca=dt.corr() #dt is my data

#this works fine
k=10
col=pca.nlargest(k,'SalePrice').index
d=np.corrcoef(dt[col].values.T) #presence of transpose here
sb.heatmap(d,annot=True)

#this doesn't work (it keeps running until RAM usage is exceeded and it crashes, gives no output)
k=10
col=pca.nlargest(k,'SalePrice').index
d=np.corrcoef(dt[col].values)
sb.heatmap(d,annot=True)
```
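`np.corrcoef` treats each row as a variable by default (`rowvar=True`). With the transpose, the 10 selected columns become 10 rows and the result is 10 x 10. Without it, every row of `dt` counts as a variable, so the result is N x N for N rows, and an annotated heatmap of that size is likely what eats the RAM. A small sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=(200, 10))  # stand-in for dt[col].values: 200 rows, 10 columns

by_column = np.corrcoef(values.T)                      # columns as variables
by_row = np.corrcoef(values)                           # rows as variables
same_as_transpose = np.corrcoef(values, rowvar=False)  # columns as variables, no transpose

print(by_column.shape, by_row.shape)
```

Passing `rowvar=False` gives the same matrix as transposing first.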

grizzled oriole Jul 24, 2020, 12:47 PM

#

how do I get numpy or matplotlib in vscode?
@rapid plank
Install numpy and matplotlib in your Python environment and import it...

desert oar Jul 24, 2020, 1:01 PM

#

@desert parcel do you understand how a neural network works at a basic level?

#

a "dense" layer in keras means that every input is connected to every node. in this case there are 128 nodes

#

this is also called a "fully connected" layer
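In code terms, a dense layer is a matrix multiply plus a bias, followed by the activation. A rough NumPy sketch of what `Dense(128, activation=relu)` computes for one flattened 28x28 image; the weights here are random placeholders, whereas Keras learns them during training:

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))  # stand-in for one MNIST image
x = image.reshape(-1)         # Flatten: 28 * 28 = 784 inputs

weights = rng.normal(scale=0.05, size=(784, 128))  # one weight per (input, node) pair
bias = np.zeros(128)                               # one bias per node

# Each of the 128 nodes takes a weighted sum of all 784 inputs, then applies ReLU.
hidden = np.maximum(0.0, x @ weights + bias)
print(hidden.shape)
```

So "128 functions" is a fair first picture: each node is one weighted sum of all the inputs, and the next layer combines those 128 results.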

desert parcel Jul 24, 2020, 1:02 PM

#

Oh no I do not, the video i'm watching doesn't go that detailed

desert oar Jul 24, 2020, 1:02 PM

#

ah

desert parcel Jul 24, 2020, 1:02 PM

#

It's this one

desert oar Jul 24, 2020, 1:03 PM

#

i strongly recommend the 3blue1brown series that explains how neural networks operate

desert parcel Jul 24, 2020, 1:03 PM

#

https://www.youtube.com/watch?v=KNAWp2S3w94&t=318s

YouTube

TensorFlow

Intro to Machine Learning (ML Zero to Hero - Part 1)

Machine Learning represents a new paradigm in programming, where instead of programming explicit rules in a language such as Java or C++, you build a system which is trained on data to infer the rules itself. But what does ML actually look like? In part one of Machine Learning...

desert oar Jul 24, 2020, 1:03 PM

#

https://www.youtube.com/watch?v=aircAruvnKk

YouTube

3Blue1Brown

But what is a Neural Network? | Deep learning, chapter 1

desert parcel Jul 24, 2020, 1:03 PM

#

ohh

#

I couldn't find a good one I just thought that i'd stick to the official tensorflow yt

#

And I keep redoing my notes

#

which is really annoying

desert oar Jul 24, 2020, 1:03 PM

#

dont forget that TF is a software library

#

they will be focused on teaching you the software

#

although it looks like they are doing a good job at introducing you to the concept

#

they might go back and explain layers later

desert parcel Jul 24, 2020, 1:04 PM

#

yeah

#

they jump around the code a bit

desert oar Jul 24, 2020, 1:04 PM

#

this actually seems like a very nice gentle introduction

desert parcel Jul 24, 2020, 1:04 PM

#

which one?

desert oar Jul 24, 2020, 1:04 PM

#

the TF one

desert parcel Jul 24, 2020, 1:04 PM

#

Oh

#

yeah I like it

#

I can make notes and simplify it

desert oar Jul 24, 2020, 1:04 PM

#

but the 3blue1brown video will probably be enlightening

desert parcel Jul 24, 2020, 1:04 PM

#

Ah I'll watch it while eating lunch lol

#

Which will be tomorrow haha

desert oar Jul 24, 2020, 1:04 PM

#

i don't see where they use the Dense(128) thing

desert parcel Jul 24, 2020, 1:04 PM

#

since it's night already

#

Let me get my code

#

hold on

#

https://colab.research.google.com/drive/1HWAq_nxJUZVq90MZUfpjR65c_rpGuEMX

Google Colaboratory

desert oar Jul 24, 2020, 1:05 PM

#

eventually you will want to learn the math as well

desert parcel Jul 24, 2020, 1:05 PM

#

the math huh

#

well I don't like math a lot sometimes

desert oar Jul 24, 2020, 1:05 PM

#

yep. the whole idea of a "neuron" is a conceptual aid to understanding the model and doesn't really have much to do with how the model actually works

#

it's all math underneath

desert parcel Jul 24, 2020, 1:06 PM

#

most of the text in the notes is just me paraphrasing what the guy said

desert oar Jul 24, 2020, 1:06 PM

#

you dont need to understand it all, but the more you do understand the more interesting problems you can solve

desert parcel Jul 24, 2020, 1:06 PM

#

Well I don't seem too excited to get into the mathy side of things lol

desert oar Jul 24, 2020, 1:06 PM

#

i can't access that link from work

desert parcel Jul 24, 2020, 1:06 PM

#

Yeah that makes sense

#

Oh you're at work?

desert oar Jul 24, 2020, 1:06 PM

#

i'm just wondering where you got the idea to use this

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

desert parcel Jul 24, 2020, 1:07 PM

#

The youtube video

#

I copied it from the notebook they linked

desert oar Jul 24, 2020, 1:07 PM

#

oh, the part 2?

desert parcel Jul 24, 2020, 1:07 PM

#

Yeah part 2

desert oar Jul 24, 2020, 1:07 PM

#

ok

desert parcel Jul 24, 2020, 1:07 PM

#

I watched the 2 of them b2b and keep rewatching them

#

I wanna get good at the start first

desert oar Jul 24, 2020, 1:07 PM

#

the 3blue1brown video should help explain what's happening with this

desert parcel Jul 24, 2020, 1:07 PM

#

Alright then

desert oar Jul 24, 2020, 1:07 PM

#

and it will also show you where the math part comes in

desert parcel Jul 24, 2020, 1:07 PM

#

Alright I'll give that a watch then

desert oar Jul 24, 2020, 1:07 PM

#

cool

earnest wadi Jul 24, 2020, 1:25 PM

#

@desert oar hey, have gotten quite a bit further with my problem, i got some help from other people and shuffled some stuff up, however now im having a slight problem with your function

#

def parse_horrible_format(data):
        max_len = max(len(rec) for rec in data)
        data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
        data = np.array(data).astype(np.float32)
        return data

Traceback (most recent call last):
  File "c:/Users/Silv3/OneDrive/Desktop/datasetup/datasetup/hillbilly.py", line 12, in <module>
    (train_data, train_labels), (test_data, test_labels) = ds.import_data("Pickup Lines - Insults")
  File "c:\Users\Silv3\OneDrive\Desktop\datasetup\datasetup\datasetup.py", line 258, in import_data
    labels = parse_horrible_format(labels)
  File "c:\Users\Silv3\OneDrive\Desktop\datasetup\datasetup\datasetup.py", line 253, in parse_horrible_format
    data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
  File "c:\Users\Silv3\OneDrive\Desktop\datasetup\datasetup\datasetup.py", line 253, in <listcomp>
    data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
  File "c:\Users\Silv3\OneDrive\Desktop\datasetup\datasetup\datasetup.py", line 253, in <listcomp>
    data = [[int(x) for x in rec] + [0] * (max_len - len(rec)) for rec in data]
ValueError: invalid literal for int() with base 10: 'O'

desert oar Jul 24, 2020, 1:26 PM

#

that's a capital letter O, not a digit 0

#

how did that get into your data?

earnest wadi Jul 24, 2020, 1:26 PM

#

wheres the O coming from wtf

#

i have no idea

desert oar Jul 24, 2020, 1:26 PM

#

👻 spooky

earnest wadi Jul 24, 2020, 1:26 PM

#

oh

#

yeah I have no idea

#

labels is literally just 1's and 0's

#

oooohh

#

I see

#

labels is this @desert oar

'Do you like Star Wars? Because Yoda only one for me!': '1', 'Call me Shrek because I’m head ogre heels for you.': '1', 'Wanna go bowling? I thought it might be right up your alley.': '1', 'Excuse me, I just noticed you noticing me and I just wanted to give you notice that I noticed you too.': '1', 'If your heart was a prison, I would like to be sentenced for life.': '1', 'I love you like a pig loves not being bacon.': '1', 'Are you parents bakers? Because you are a cutie pie.': '1', 'Are you a cat? Cause you are purrrfect': '1'}

I had to shorten it because its like 90 000 characters

#

so the first one is

#

{'Of course I talk like an idiot. How else could you understand me?': '0'

#

hence the "O"
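Put differently: iterating over a dict yields its keys, and iterating over a key yields single characters, so `int(x)` ends up being called on `'O'`. A small sketch with two of the entries above; pulling the values out directly is an assumption about what the labels array should contain:

```python
import numpy as np

labels = {
    "Of course I talk like an idiot. How else could you understand me?": "0",
    "Are you a cat? Cause you are purrrfect": "1",
}

first_record = next(iter(labels))  # iterating a dict gives the keys (the sentences)
first_char = first_record[0]       # iterating a string gives single characters

try:
    int(first_char)
except ValueError as exc:
    print(exc)

# The labels themselves are the dict values.
label_array = np.array([int(value) for value in labels.values()], dtype=np.float32)
print(label_array)
```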

desert parcel Jul 24, 2020, 1:32 PM

#

...

#

nice joke

earnest wadi Jul 24, 2020, 1:33 PM

#

?

desert parcel Jul 24, 2020, 1:33 PM

#

lol i am stealing that

earnest wadi Jul 24, 2020, 1:33 PM

#

oh haha

#

yeah its a dataset of insults and pickup lines

desert parcel Jul 24, 2020, 1:34 PM

#

Oh here let me give you mine

earnest wadi Jul 24, 2020, 1:34 PM

#

?

desert parcel Jul 24, 2020, 1:35 PM

#

```py
annoying = ['Is it now?', 'Nope', 'Not really', 'I say not', 'lol git gud', 'Naaaa']
query = ['Yes that is correct.', f"""I agree with which was last said""", "That's correct",
        "I'm not sure, you tell me.", "Oh please you're smarter than that.","Figure it out.",
        "I'm not google.", "You think I know everything?",
        "I'm not going to say it, now that you want me to say it.",
        "lol good luck figuring it out on your own","why would I know.",
        "Sure. If that's what you wanna think."]
what_is = [
    "I'm not sure, you tell me.", "Oh please you're smarter than that.",
    "Figure it out.", "I'm not google.", "You think I know everything?",
    "I'm not going to say it, now that you want me to say it.",
    "lol good luck figuring it out on your own",
    "why would I know.", "Sure. If that's what you wanna think."
]```

desert oar Jul 24, 2020, 1:35 PM

#

@earnest wadi you need to use a totally different function to parse that

desert parcel Jul 24, 2020, 1:35 PM

#

Lol It's just my bot being sassy

earnest wadi Jul 24, 2020, 1:35 PM

#

haha

desert parcel Jul 24, 2020, 1:35 PM

#

I wanted to implement this by learning ML and stuff

desert oar Jul 24, 2020, 1:35 PM

#

you really really really need to use json or some standard format

desert parcel Jul 24, 2020, 1:35 PM

#

I'll let you guys talk lol

desert oar Jul 24, 2020, 1:35 PM

#

this is turning into a mess to read data from disk

earnest wadi Jul 24, 2020, 1:35 PM

#

thats not a weird format

#

its a python dictionary

#

before all the numpy stuff

data is a list
index is a dict
labels is a dict

desert oar Jul 24, 2020, 1:42 PM

#

yes but you are dumping literal python objects

#

parsing that is inevitably a mess

#

eval is not a good idea

#

use json instead
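A minimal sketch of what that could look like: write the three objects into one JSON file and read them back with `json.load`, with no string splitting or `eval`. The file name, the key names and the sample contents are placeholders.

```python
import json
import os
import tempfile

dataset = {
    "data": [["4", "5", "6"], ["4", "5"]],
    "index": {"hello": 4, "world": 5},
    "labels": {"some pickup line": "1", "some insult": "0"},
}

path = os.path.join(tempfile.mkdtemp(), "dataset.json")

# Write everything in one standard format...
with open(path, "w", encoding="utf-8") as handle:
    json.dump(dataset, handle)

# ...and get the same lists and dicts back when loading.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)

print(loaded == dataset)
```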

lapis sequoia Jul 24, 2020, 1:51 PM

#

Hi, when i run this code :

📎 dd.PNG

#

i got this error:

📎 aa.PNG

desert parcel Jul 24, 2020, 2:13 PM

#

```py
X = np.array([-1.5, 0, 3.5], dtype=float)
Y = np.array([0, 3, 10], dtype=float)

newModel = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
newModel.compile(optimizer='sgd', lose='mean_squared_error')
newModel.fit(X, Y, epochs=20)

print(Model.predict([1.0]))```

#

This is the error I am getting

📎 unknown.png

#

I am not sure how to fix it

#

why does this work ```py
X = np.array([-1.5, 0, 3.5], dtype=float)
Y = np.array([0, 3, 10], dtype=float)

newModel = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
newModel.compile(optimizer='sgd', loss='mean_squared_error')
newModel.fit(X, Y, epochs=250)

print(newModel.predict([1.0]))```

desert oar Jul 24, 2020, 2:24 PM

#

look at the error @desert parcel

#

what does it say?

desert parcel Jul 24, 2020, 2:31 PM

#

Model must be made and compiled with the same DistStart

#

and a type error

desert oar Jul 24, 2020, 2:51 PM

#

@desert parcel the error message is the last line

#

where it says TypeError

autumn veldt Jul 24, 2020, 4:38 PM

#

excuse me guys, can someone help me, currently im learning something about Image Classification using google colab. since im learning it by watching Youtube Video, i got some problem and i can't solve it, can anyone help me?

desert oar Jul 24, 2020, 4:39 PM

#

just ask your question

#

!ask

arctic wedgeBOT Jul 24, 2020, 4:39 PM

#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

desert oar Jul 24, 2020, 4:39 PM

#

see our guide to good questions above

autumn veldt Jul 24, 2020, 4:40 PM

#

so i just put my question here right?

silk axle Jul 24, 2020, 4:40 PM

#

Yes @autumn veldt

autumn veldt Jul 24, 2020, 4:41 PM

#

ok, lemme prepare my question first

#

📎 dc2.PNG

#

!ask so, i was learning something about image classification using feature GLCM + SVM method. where i put my dataset into csv file. after reading the dataset im trying to see how much the accuracy that i can got from it (it's only showing one time training with 0.70 accuracy), now the problem that i want to ask is, how to put something like 5-20 training with different accuracy (epoch = 20) in one time run? im so sorry if my english so bad tho

📎 dc1.PNG

autumn veldt Jul 24, 2020, 4:55 PM

#

if its possible, can u guys give me some link or whatever that i can learn about my problem too, thanks

desert oar Jul 24, 2020, 5:05 PM

#

you dont need the !ask command, that just shows you info about asking good questions

flat quest Jul 24, 2020, 5:07 PM

#

still don't really understand what you're trying to do @autumn veldt

autumn veldt Jul 24, 2020, 5:16 PM

#

So, I ran DataValidation.csv through GLCM+SVM and got 0.70 training accuracy from a single run (the accuracy is calculated for only one data out of the many available). The problem is that DataValidation.csv has 100+ data, and I want to show the accuracy for at least 20 of those 100, expecting a different accuracy result for each.

flat quest Jul 24, 2020, 5:27 PM

#

what do you mean by one data?

A single row? @autumn veldt

autumn veldt Jul 24, 2020, 5:28 PM

#

yea, single row

drifting hemlock Jul 24, 2020, 5:35 PM

#

Hello, I need some suggestions. I've trained a model to predict the probability of a client cancelling their services with us, using our current client list. If I deploy that model to production behind an API to do live predictions, and I pass it one of the clients that was in the training set, wouldn't that be biased?

flat quest Jul 24, 2020, 5:37 PM

#

then just get 20 rows and run them through the model @autumn veldt, and check accuracy?

autumn veldt Jul 24, 2020, 5:43 PM

#

thats the problem @flat quest , how to get 20 rows on one run?

visual violet Jul 24, 2020, 6:13 PM

#

guys

#

so i calculated percentage difference from 2019 to 2020

flat quest Jul 24, 2020, 6:39 PM

#

just select all 20 rows xd @autumn veldt

data[0:20]

#

and run that through the model
that should work unless that particular model doesn't work on batched data
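
A minimal sketch of the slicing suggestion above, assuming scikit-learn's `SVC` and random stand-in features in place of the real GLCM data. The arrays `X` and `y`, the linear kernel, and the choice to train on the remaining 80 rows are all assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for 100 rows of GLCM features with a binary label (made-up data).
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train on rows 20..99, then push rows 0..19 through the model in one call.
model = SVC(kernel="linear").fit(X[20:], y[20:])
batch_pred = model.predict(X[0:20])

acc = accuracy_score(y[0:20], batch_pred)   # one accuracy for the 20-row batch
correct = batch_pred == y[0:20]             # per-row hit or miss
print(acc)
print(correct)
```

`predict` takes a 2D array of shape (n_rows, n_features), so all 20 rows go through at once; `correct` shows row by row which predictions matched.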

autumn veldt Jul 24, 2020, 6:52 PM

#

ill try it soon, btw thanks @flat quest

lapis sequoia Jul 24, 2020, 7:26 PM

#

How can I fix this white draw?

📎 hh.PNG

limpid raft Jul 24, 2020, 7:53 PM

#

In CNN models, why is it more accurate to have Conv layers and then some dense layers instead of only Conv layers?

queen barn Jul 24, 2020, 7:55 PM

#

What's the best machine learning model that can take in a decent-sized number of rows with many nominal categorical variables and a few output variables to determine predicting factors?

drifting hemlock Jul 24, 2020, 8:11 PM

#

@void anvil you mean training the model on a small subset of the clients and running the predictions on the total population?

#

What if my dataset is small? Like 3,000~?

ebon nebula Jul 24, 2020, 8:30 PM

#

Hello all. Can someone recommend me a good book/course/site to start learning machine learning?

bitter harbor Jul 24, 2020, 9:35 PM

#

Learning about linear algebra and stats would be the place to start

eager arrow Jul 24, 2020, 10:48 PM

#

sup nerds

#

i just got back from banging your girlfriends

desert parcel Jul 24, 2020, 10:59 PM

#

@desert parcel Are you free to upload that dataset somewhere / link me to it? It looks awesome to play with
@void anvil Well I made the dataset myself I just came up with a random linear line equation.

queen barn Jul 24, 2020, 11:12 PM

#

I realize my question above was rather vague, but if anyone has the time to spare, I just need help figuring out how to go about analyzing this data. We have a product with many configurations, and rather than going through and checking each individual feature or permutation of the features, I'd like to run it through a model that can identify correlation to a binary good or bad column.

#

I have found many methods that seem almost on point for what I'm looking for, but not quite on the nose.

flat quest Jul 24, 2020, 11:29 PM

#

what kind of product is this? @queen barn

#

@limpid raft well technically you can. And it does work reasonably well. Remember Conv nets are basically 2d dense nets with a limited scope.

But at some point you need to switch from a 2d output to a 1D. That can be done through maxpool, etc. but dense layers tend to work better.

Conv nets may focus too much on low level features. Since each filter is like 3 x 3, low level local features (ex blue dot in left bottom corner) may be prioritized over a higher level feature like an oval face.

desert parcel Jul 24, 2020, 11:39 PM

#

Do I need a GPU for this?

#

I have been using google colab for this

#

But do I really need a GPU?

queen barn Jul 24, 2020, 11:53 PM

#

what kind of product is this? @queen barn
@flat quest I'm not sure why that's relevant. I'd prefer not to share the specifics of the product, but I'd be more than willing to explain or provide an example of the data structure.

flat quest Jul 25, 2020, 12:27 AM

#

@queen barn Ah yeah, was only asking because like you said question was pretty vague. The type of model to use is heavily dependent on what kinda product and data ur working with.

You said configurations: are these config files for software? Or different setups for products, as in machines, etc.?

queen barn Jul 25, 2020, 12:47 AM

#

No it's a physical product that has about 95 different attributes that can be changed by customization request. I'm trying to find a correlation between some of these customizations and a quality failure.

#

@flat quest I hope that helps a bit

nimble lotus Jul 25, 2020, 1:13 AM

#

I am trying to convert a column in a dataframe from an object to a string but it is not working? Could someone explain how come?

bitter harbor Jul 25, 2020, 1:35 AM

#

because the column is a column/a 1D array

#

you'll have to take every element and append/concat them together
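
One possible explanation, sketched under the assumption that the column already holds Python strings: pandas has traditionally stored plain strings with dtype `object`, so a column can contain only strings and still report `object`. Since pandas 1.0 there is also a dedicated `string` dtype. The column name `colour` is made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "blue"]})  # hypothetical column

print(df["colour"].dtype)                   # how pandas labels the column by default

converted = df["colour"].astype("string")   # dedicated string dtype (pandas >= 1.0)
print(converted.dtype)

# The values themselves are ordinary Python strings either way.
values_are_str = all(isinstance(v, str) for v in df["colour"])
print(values_are_str)
```

If the goal is one joined string rather than a string-typed column, `"".join(df["colour"])` would be the concatenation route mentioned above.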

wise garden Jul 25, 2020, 2:36 AM

#

Getting an error when fitting data to simple neural net( input, 2 hidden, 1 output, all dense layers): Error when checking input: expected dense_34_input to have shape (518994,) but got array with shape (1,). My input is 32950 by 63 and I flattened it to fit it to the network. Not sure why it's showing (1,). Anyone know what I'm doing wrong?

flat quest Jul 25, 2020, 3:10 AM

#

well a very basic way (this would be something like a baseline model) would be to simply throw in all these attributes as features.

As for getting those categories into a numerical format: since it's only 95 attributes, one-hot encoding should work fine.

Another method would be to use embeddings.

You should probably start with a standard dense model and see how well it performs. Since there's no temporal data involved, you won't need RNNs or LSTMs. Transformers will tend to work better than standard dense models, but they're much more compute heavy.

@queen barn
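
A small sketch of the one-hot step using `pandas.get_dummies`. The attribute names (`finish`, `handle`) and the `failed` column are invented stand-ins for the real customization data:

```python
import pandas as pd

# Hypothetical customization attributes and a binary quality outcome.
df = pd.DataFrame({
    "finish": ["matte", "gloss", "matte", "satin"],
    "handle": ["left", "right", "right", "left"],
    "failed": [0, 1, 0, 1],
})

# Each category value becomes its own 0/1 column.
features = pd.get_dummies(df[["finish", "handle"]])
target = df["failed"]

print(features.columns.tolist())
print(features.shape)
```

The resulting `features` table is fully numeric, so it can be passed to a dense model or any other estimator that expects numbers.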

#

@wise garden

So if you print the flattened input what shape does it have? It should be (batch_size, features)
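
To illustrate the shape check, here is a NumPy-only sketch, assuming each of the 32950 rows is one sample with 63 features (the zero-filled array is a placeholder for the real data):

```python
import numpy as np

n_samples, n_features = 32950, 63        # sizes quoted in the question
X = np.zeros((n_samples, n_features), dtype=np.float32)

flat = X.flatten()                       # collapses everything into one long vector
print(flat.shape)                        # 1D: the sample axis is gone
print(X.shape)                           # 2D: (batch_size, features)

first_batch = X[:32]                     # slicing rows keeps the feature axis
print(first_batch.shape)
```

Under that assumption the array would go to the network unflattened, with the first dense layer expecting 63 inputs per sample.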

old thorn Jul 25, 2020, 4:03 AM

#

can anyone here explain neural networks to me like I'm five? I got into Machine Learning about a month ago and Python like 2 months ago. I use google colab if that will help u help me better

#

Im really struggling to grasp the concepts of RNN's, CNN's, KNN's, and ANN's. I just don't understand how they work

#

Please PM me if possible

desert parcel Jul 25, 2020, 4:13 AM

#

```py
import numpy as np
import tensorflow as tf
from tensorflow import keras

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

DataA = np.array([1, 2, 3, 4, 5, 6, 7], dtype=int)
DataB = np.array([6, 7, 8, 9, 10, 11, 12], dtype=int)
# DataB = DataA + 5


model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(DataA, DataB, epochs=250)
print("-"*20)
print(model.predict([9]))
```

#

Output:

📎 unknown.png

#

Why is the output 15 though if it's 9 shouldn't it be 14

old thorn Jul 25, 2020, 4:34 AM

#

You might not have enough data for it to be accurate maybe? but dont take my advice, ask someone more experienced @desert parcel

desert parcel Jul 25, 2020, 4:47 AM

#

But I do though

#

In another one I copied from the web

#

it has less epochs and less data but is more accurate than what i'm using

old thorn Jul 25, 2020, 4:48 AM

#

huh

#

well idk

#

that was my only answer, gl on ur quest

desert parcel Jul 25, 2020, 4:48 AM

#

Yeah alright

tame fractal Jul 25, 2020, 5:24 AM

#

What exactly were you expecting?

#

why should it be 14?

desert parcel Jul 25, 2020, 5:28 AM

#

DataB = DataA + 5

#

in the comment over there

#

This is making 2 tuples right?

📎 unknown.png

tame fractal Jul 25, 2020, 5:31 AM

#

no

#

it's literally just adding 5 to every element of DataA

#

it's an estimator and the model you're using isn't predicting well

#

because it's an estimator you're usually not going to get exact values like that

#

try using a different model and test your result
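
One way to see where a value near 15 could come from is to replay the training by hand. This sketch assumes the Keras defaults (learning rate 0.01 for `'sgd'`, and batch size 32, so all 7 samples fit in one batch and each epoch is a single update). As a simplification it starts the weight at 0 instead of a random value, so the exact numbers will differ from a real Keras run:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = x + 5


def train(steps, lr=0.01, w=0.0, b=0.0):
    """Full-batch gradient descent on mean squared error for y_hat = w * x + b."""
    for _ in range(steps):
        err = w * x + b - y
        w -= lr * 2 * np.mean(err * x)   # gradient of MSE with respect to w
        b -= lr * 2 * np.mean(err)       # gradient of MSE with respect to b
    return w, b


w_short, b_short = train(250)      # same number of updates as epochs=250
w_long, b_long = train(20000)      # far more updates, for comparison

pred_short = w_short * 9 + b_short
pred_long = w_long * 9 + b_long
print(w_short, b_short, pred_short)
print(w_long, b_long, pred_long)
```

Under these assumptions, 250 updates leave the slope above 1 and the intercept below 5, so the prediction for 9 overshoots 14; with many more updates it settles at 14. More epochs or a larger learning rate would be the first things to try.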

desert parcel Jul 25, 2020, 5:41 AM

#

So the equation is backwards?

tame fractal Jul 25, 2020, 5:43 AM

#

no?

desert parcel Jul 25, 2020, 5:45 AM

#

Could you explain it again

lapis sequoia Jul 25, 2020, 8:37 AM

#

Hello all. Can someone recommend me a good book/course/site to start learning machine learning?
@ebon nebula Sorry to answer you a little bit late;
I recommend you sololearn application that is also available in web.
It has a specific section named "Machine learning".
I'm learning python by sololearn and it teaches everything from 0 to 100.
If it's the first time you're hearing about sololearn, it's better to start with android or ios version not web application.

#

and sorry for wall.

ebon nebula Jul 25, 2020, 8:44 AM

#

You have nothing to be sorry for. Thank you for the suggestion. I will check it out asap.

spark stag Jul 25, 2020, 8:55 AM

#

@desert parcel one issue you probably have is that you only have 7 different inputs to train the model on (a very small amount for a neural network), but you are training for 250 epochs, which is a lot, so your model is probably overfitting to the data being passed to it. Having said that, I would have thought the weight would update towards 1 with the bias being 5, and as you only have 1 weight I don't think this can overfit. I would still recommend using more training data and probably fewer epochs.

#

one other note is that this is not the kind of problem neural networks are good at solving; you will probably get better results using some sort of regression. I was just highlighting above why the results may not be what you were expecting and how to try to amend these issues for other models
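
As a sketch of the "some sort of regression" route, an ordinary least-squares line fit with `numpy.polyfit` recovers the rule directly from the same 7 points (the lowercase variable names are just renamed copies of `DataA` and `DataB`):

```python
import numpy as np

data_a = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
data_b = data_a + 5

# Degree-1 fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(data_a, data_b, deg=1)
prediction = slope * 9 + intercept
print(slope, intercept, prediction)
```

Because the data lies exactly on a line, the fit gives a slope of about 1 and an intercept of about 5 up to floating-point error, so the prediction for 9 comes out at about 14 with no training loop involved.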

desert parcel Jul 25, 2020, 9:28 AM

#

ohhh yeah

#

I was just using a Y=MX+C equation since that is used in the example

#

I base most of my test models on that equation

prisma verge Jul 25, 2020, 11:09 AM

#

how does one get into deep learning without much higher math?
i've been looking at tf2.0 for a really long time, and it seems really simple. i do understand how a dataset is processed in most cases, but when it comes to model building, choosing an optimizer, etc. then i'm just lost. so many models, loss functions, and every model needs a number of parameters that i don't understand how to calculate. my brain can't find any patterns in this.
tl;dr how do i acquire basic knowledge of deep learning without diving into higher math and linear algebra? modern frameworks seem to make it possible, but i just don't understand which optimizers/layers to use in which situation, how to get the needed numbers of params, etc

wise garden Jul 25, 2020, 11:52 AM

#

@flat quest I had a few too many last night and didn't realize I was passing the wrong tensor through my pipeline lol got it figured out

visual violet Jul 25, 2020, 12:50 PM

#

guys

#

if i use resample function of pandas

#

it returns a series?

desert parcel Jul 25, 2020, 12:56 PM

#

how does one get into deep learning without much higher math?
i've been looking at tf2.0 for a really long time, and it seems really simple. i do understand how a dataset is processed in most cases, but when it comes to model building, choosing an optimizer, etc. then i'm just lost. so many models, loss functions, and every model needs a number of parameters that i don't understand how to calculate. my brain can't find any patterns in this.
tl;dr how do i acquire basic knowledge of deep learning without diving into higher math and linear algebra? modern frameworks seem to make it possible, but i just don't understand which optimizers/layers to use in which situation, how to get the needed numbers of params, etc
@prisma verge You just need HS level math up to how to differentiate and how to do matrix addition, subtraction, and multiplication. PyTorch can however do all of that for you with functions.

#

But knowing the concepts does help

lapis sequoia Jul 25, 2020, 1:04 PM

#

`

#

noise = np.random.uniform(-1,1(observations,1))
targets = 2*xs - 3*zs + 5 + noise

#

why do i get an error for this

#

please some body help

#

the error is int is not callable

silk axle Jul 25, 2020, 1:06 PM

#

1(observations,1)

#

As the error says, you can't call ints

#

Which is what you're doing there

#

You might've meant 1, (observations, 1) idk
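
For reference, a sketch of the call with the comma in place. `1(observations,1)` is parsed as calling the integer `1` like a function, which is what raises the "int is not callable" error; with the comma, low, high and size are three separate arguments. The `xs` and `zs` definitions mirror the ones from the same script:

```python
import numpy as np

observations = 1000
xs = np.random.uniform(low=-10, high=10, size=(observations, 1))
zs = np.random.uniform(-10, 10, (observations, 1))

# low=-1, high=1, size=(observations, 1): note the comma after the 1.
noise = np.random.uniform(-1, 1, (observations, 1))
targets = 2 * xs - 3 * zs + 5 + noise

print(noise.shape, targets.shape)
```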

lapis sequoia Jul 25, 2020, 1:09 PM

#

TypeError: Level type mismatch: month

#

noo not that one

#

noise = np.random.uniform(-1,1(observations,1))
targets = 2*xs - 3*zs + 5 + noise

#

this one

#

gives me an error at line 1 stating int is not callable

silk axle Jul 25, 2020, 1:24 PM

#

Read what I said

#

I literally answered that

#

@lapis sequoia

lapis sequoia Jul 25, 2020, 1:24 PM

#

observations = 1000
xs = np.random.uniform(low=-10,high=10,size=(observations,1))
zs = np.random.uniform(-10,10,(observations,1))
inputs = np.column_stack((xs,zs))

#

but this gives no error

silk axle Jul 25, 2020, 1:25 PM

#

Because you aren't calling an integer there

lapis sequoia Jul 25, 2020, 1:25 PM

#

yeah yeah got it

#

i missed a comma there

#

-1,1,(observation)

#

yeah got it thank you so much !

visual violet Jul 25, 2020, 1:33 PM

#

x = temperature.Temperature.resample('D').mean()

#

temperature is a dataframe

#

so x should be a series
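
A quick way to check this is to build a small frame and inspect the result. The sketch below assumes a `DatetimeIndex` and invents 48 hourly readings; only the `Temperature` column name comes from the line above:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings covering two full days.
index = pd.date_range("2020-01-01 00:00", "2020-01-02 23:00", periods=48)
temperature = pd.DataFrame({"Temperature": np.arange(48, dtype=float)}, index=index)

x = temperature.Temperature.resample("D").mean()   # resampling one column
whole = temperature.resample("D").mean()           # resampling the whole frame

print(type(x))
print(type(whole))
print(x)
```

Resampling a single column gives a Series with one value per day, while resampling the whole frame gives a DataFrame.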

wanton bough Jul 25, 2020, 1:39 PM

#

will anaconda work better on my 2gb ram pc

visual violet Jul 25, 2020, 1:41 PM

#

better than wat

desert parcel Jul 25, 2020, 1:46 PM

#

use google colab they let you use their free gpu or whatever

#

it's just faster and it saves time from downloading all the modules

visual violet Jul 25, 2020, 2:00 PM

#

📎 unknown.png

#

temperature_array is pm2.5- an air pollutant 2017-2019
pm25_array is just average temperature 2017-2019

#

what does this graph mean

river wing Jul 25, 2020, 2:17 PM

#

i have a ongoing test can anyone please help me

signal sluice Jul 25, 2020, 4:48 PM

#

my_dict = {'100001':{
  'forename':'John',
  'surname':'Smith'
  },
'100002':{
  'forename': 'Alice',
  'surname': 'Van Gogh'
  }
}

I have a dictionary as such where one ID corresponds to two values. In pandas, I have a column which has an ID for each record - I want to split this into two columns of the forename and surname.
What's the most idiomatic way to do this?

#

I have very little experience with pandas

bleak fox Jul 25, 2020, 4:54 PM

#

df = (pd.DataFrame.from_dict(my_dict)).T

#

for @signal sluice

📎 unknown.png

signal sluice Jul 25, 2020, 5:07 PM

#

oh no, I mean that i have a column in an existing dataframe which needs to be replaced by the two other columns

#

thank you though

bleak fox Jul 25, 2020, 5:08 PM

#

Can u share it... And output expected so that I can share you code

gloomy thistle Jul 25, 2020, 5:09 PM

#

Hey guys, I gotta form a prediction model that predicts one main variable based on its dependence on 4 different parameters ... do I use step-wise regression or multi-polynomial regression... or anything else?

bleak fox Jul 25, 2020, 5:10 PM

#

@gloomy thistle multi-pol will be good.

gloomy thistle Jul 25, 2020, 5:11 PM

#

I would also like to know how much of each parameter is also tallied into it.. how do I do that?

bleak fox Jul 25, 2020, 5:11 PM

#

You can use PCA or LDA to find it out

#

So use all your variables and run PCA; it will generate a new matrix where you can select the top N columns, which carry the impact of x1, x2, ..., xn

gloomy thistle Jul 25, 2020, 5:12 PM

#

Cool, I'll look it up, thanks Kapil !

bleak fox Jul 25, 2020, 5:13 PM

#

Welcome bro

visual violet Jul 25, 2020, 5:18 PM

#

temperature_test =  pd.read_csv ('C:/Users/dotha/PythonNotebook/File/temperature (2020) NYC.csv')
pm25_predicted_list = []
for i in temperature_test['temperature'].values.reshape(-1, 1):
    predicted_value = linear_regressor.predict(i)
    pm25_predicted_list.append(predicted_value)
pm25_predicted_list

#

please help

#

error: Expected 2D array, got 1D array instead: Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

#

i did try both way

signal sluice Jul 25, 2020, 5:20 PM

#

(sorry for the bump hb ill delete after) and ty for the help kapil

bleak fox Jul 25, 2020, 5:21 PM

#

use the old generated df which i shared and use pd.merge(your_df, my_df, key = 'id') to merge it

signal sluice Jul 25, 2020, 5:22 PM

#

awesome, tysm

bleak fox Jul 25, 2020, 5:24 PM

#

welcome bro

visual violet Jul 25, 2020, 5:25 PM

#

the temperature_test looks like this

#

📎 unknown.png

#

i dont understand why i am getting the errors

bleak fox Jul 25, 2020, 5:27 PM

#

@visual violet, try only .values

visual violet Jul 25, 2020, 5:29 PM

#

📎 unknown.png

#

yea i tried that one too

bleak fox Jul 25, 2020, 5:29 PM

#

Use .predict([i])

#

And rest as your shared code..
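
Why `[i]` helps: looping over an array of shape (n, 1) hands `predict` one row of shape (1,) at a time, which is 1D, and wrapping it in a list makes it 2D again. A sketch of the loop-free alternative, assuming scikit-learn's `LinearRegression` and made-up numbers in place of the real temperature and PM2.5 data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-in for the training data: pm25 = 40 - temperature.
temps_train = np.array([10.0, 15.0, 20.0, 25.0]).reshape(-1, 1)
pm25_train = np.array([30.0, 25.0, 20.0, 15.0])
linear_regressor = LinearRegression().fit(temps_train, pm25_train)

temps_test = np.array([12.0, 18.0, 22.0])
one_row = temps_test.reshape(-1, 1)[0]        # shape (1,): what the loop was passing

# One call on the whole 2D column instead of a Python loop.
pm25_predicted = linear_regressor.predict(temps_test.reshape(-1, 1))
print(one_row.shape)
print(pm25_predicted)
```

The result is already an array with one prediction per row, so there is no list to append to.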

visual violet Jul 25, 2020, 5:31 PM

#

📎 unknown.png

#

nvm it worked

#

no error but the list is blank

#

📎 unknown.png

bleak fox Jul 25, 2020, 5:33 PM

#

Just put a print(predicted_value) after your predict code line

#

Let's see what is output of your regressor

#

Man u r appending pm* list not m* list.... Correct that you will find your answer

visual violet Jul 25, 2020, 5:34 PM

#

lmao you are right

#

omg

#

thanks a lot Kapil

signal sluice Jul 25, 2020, 5:37 PM

#

oh, i get an error saying that key is not a recognised keyword parameter in pd.merge?

bleak fox Jul 25, 2020, 5:39 PM

#

Which IDE you are using?

#

Ok for my dataframe write 1 more line df['id']= df.index
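
Putting the pieces together as a sketch: `pd.merge` names its join column with `on=`, not `key=`. The `records` frame and its `score` column are hypothetical stand-ins for the existing dataframe:

```python
import pandas as pd

my_dict = {
    "100001": {"forename": "John", "surname": "Smith"},
    "100002": {"forename": "Alice", "surname": "Van Gogh"},
}

# Hypothetical existing frame with an ID per record.
records = pd.DataFrame({"id": ["100002", "100001", "100002"], "score": [3, 5, 4]})

# One row per ID, then move the IDs from the index into an "id" column.
names = pd.DataFrame.from_dict(my_dict, orient="index")
names.index.name = "id"
names = names.reset_index()

merged = records.merge(names, on="id", how="left")
print(merged)
```

`how="left"` keeps every row of `records` in its original order and fills in the two name columns; IDs missing from the dictionary would get NaN.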

signal sluice Jul 25, 2020, 5:41 PM

#

oh yeh

#

that worked tyty

sinful dock Jul 25, 2020, 5:54 PM

#

for this dataframe, I'm trying to subtract consecutive times in the df1['JOB TIME'] column and express the result in minutes in the df1['delta_t'] column. Why is my code failing to do the conversion? No traceback ```md
#Here is my first function to calculate the time delta between consecutive rows
#Goal was to express the timedelta in minutes (see the last two comments)
def time_diff1(col1, col2):
    if col1 is np.datetime64('NaT'):
        pass
    else:
        diff = col2 - col1
        #trying to get minutes here but doesn't work
        return np.timedelta64(diff, 'm')

#Also Tried these without luck
#return diff.dt.total_seconds()/60

df1['delta_t'] = np.vectorize(time_diff1)(df1['Previous Time'], df1['JOB TIME'])

df1.head()
JOB TIME COMMENT Values Previous Time delta_t
1 2017-11-20 01:23:00 d_P = 1259.08 psi 1259.08 2017-11-20 00:13:00 0 days 01:10:00
2 2017-12-02 13:24:00 Offset_Pressure 0.00 2017-11-20 01:23:00 12 days 12:01:00
3 2017-12-02 16:21:00 d_P = 4142.57 psi 4142.57 2017-12-02 13:24:00 0 days 02:57:00
4 2017-12-03 02:57:00 Offset_Pressure 0.00 2017-12-02 16:21:00 0 days 10:36:00
5 2017-12-03 04:03:00 d_P = 539.38 psi 539.38 2017-12-03 02:57:00 0 days 01:06:00```

bleak fox Jul 25, 2020, 6:16 PM

#

@sinful dock in function use diff=pd.Timedelta(col2- col1).minutes

#

@lapis sequoia pandas shows only the top 5 and bottom 5 rows, but it actually holds all rows... So you can go ahead and continue your operation

lapis sequoia Jul 25, 2020, 6:18 PM

#

I figured it out that's why I deleted

#

but I have another issue

bleak fox Jul 25, 2020, 6:18 PM

#

Tell me davidd

lapis sequoia Jul 25, 2020, 6:18 PM

#

I want to remove these numbers on the side

31975                                    Zimbabwe
31976                                    Zimbabwe
31977                                    Zimbabwe
31978                                    Zimbabwe
31979                                    Zimbabwe
31980                                    Zimbabwe
31981                                    Zimbabwe
31982                                    Zimbabwe
31983                                    Zimbabwe
31984                                    Zimbabwe
31985                                    Zimbabwe
31986                                    Zimbabwe
31987                                    Zimbabwe
31988                                    Zimbabwe
31989                                    Zimbabwe
31990                                    Zimbabwe
31991                                    Zimbabwe
31992                                    Zimbabwe
31993                                    Zimbabwe
31994                                    Zimbabwe
31995                                    Zimbabwe
31996                                    Zimbabwe
31997                                    Zimbabwe
31998                                    Zimbabwe
31999                                    Zimbabwe
32000                                    Zimbabwe
32001                                    Zimbabwe
32002                                    Zimbabwe
32003                                    Zimbabwe
32004                                    Zimbabwe
32005                                    Zimbabwe
32006                                    Zimbabwe
32007                                    Zimbabwe
32008                                    Zimbabwe
32009                                    Zimbabwe
32010                                    Zimbabwe
32011                                    Zimbabwe
32012                                    Zimbabwe

#

not sure how

bleak fox Jul 25, 2020, 6:20 PM

#

They are just index....

#

While exporting, pass index=False and you won't find them in your file
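
A short sketch of both cases, printing and exporting. The three-row frame and the `country` column name are made up to mimic the output above:

```python
import io

import pandas as pd

df = pd.DataFrame({"country": ["Zimbabwe"] * 3}, index=[31975, 31976, 31977])

# Printing without the index column.
text = df.to_string(index=False)
print(text)

# Exporting without the index column (written to memory here instead of a file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The index is still part of the DataFrame; `index=False` only leaves it out of the printed or exported text.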

lapis sequoia Jul 25, 2020, 6:22 PM

#

yep I put index=False and it worked

#

thanks

sick swan Jul 25, 2020, 6:22 PM

#

@bleak fox Bhai how are you so sure when he didn't give column names? (I'm new so just asking)

bleak fox Jul 25, 2020, 6:23 PM

#

@sick swan your question is about which questions reply... Tell me I'll explain

sick swan Jul 25, 2020, 6:23 PM

#

Do i really need to be a math GOD in order to DS? or can I.... Know how things work and know how to apply things, would that be enough?

#

@sick swan your question is about which questions reply... Tell me I'll explain
@bleak fox the Zimbabwe guy

#

I mean I'm good at math, no probs, but not GOD tier

bleak fox Jul 25, 2020, 6:24 PM

#

You can start with whatever you know in Math but keep on learning while practicing

#

@sick swan you can go ahead... DS is an art... And it requires a mindset more than math and programming

#

@sick swan who is Zinbabwe guy?

sick swan Jul 25, 2020, 6:25 PM

#

I have done an IBM professional specialization, but it was clearly visible that they didnt dig deep into math

bleak fox Jul 25, 2020, 6:26 PM

#

Man, if DS were as simple as calling an sklearn API then everyone would have done that.... Correct?

sick swan Jul 25, 2020, 6:26 PM

#

@bleak fox

I want to remove these numbers on the side

31975                                    Zimbabwe
31976                                    Zimbabwe
31977                                    Zimbabwe

this one

bleak fox Jul 25, 2020, 6:27 PM

#

@sick swan he shared it first but deleted... From where i got that these numbers are index

sick swan Jul 25, 2020, 6:27 PM

#

@bleak fox xD exactly, that's why I was trying to do a lotta math

lapis sequoia Jul 25, 2020, 6:27 PM

#

you do like this

print(df.to_string(index=False))

@sick swan

bleak fox Jul 25, 2020, 6:27 PM

#

I figured it out that's why I deleted
@lapis sequoia @sick swan check this

sick swan Jul 25, 2020, 6:27 PM

#

rn i need to revise my python so HARD

#

Ohk

lapis sequoia Jul 25, 2020, 6:28 PM

#

here's what code I have just so you can see in full

""" Modules """
import pandas as pd

# Read excel spreadsheet.
dataframe = pd.read_excel("../covid19data.xlsx", usecols="G")

pd.set_option("display.max_rows", None, "display.max_columns", None)


def countries():
    """ This prints all of the countries in the spreadsheet """
    countries = dataframe.to_string(index=False)
    print(countries)


countries()

sick swan Jul 25, 2020, 6:28 PM

#

damn im glad i found this community

lapis sequoia Jul 25, 2020, 6:28 PM

#

same

#

it is helpful

sick swan Jul 25, 2020, 6:29 PM

#

more like, its so good to have like minded people, everywhere i look, people are just freaking clueless

#

in the sense they dont take their future seriously

#

or dont yet have the realisation

bleak fox Jul 25, 2020, 6:30 PM

#

Yup thanks to creators

strange igloo Jul 25, 2020, 6:33 PM

#

Hey so, I'm trying to get some data from a redis website. (It has a .php ending) and then cache that data and then put it into either google sheets or some kind of analysis tool. I'm really new to this so I'm flying blind.

I can implement this myself in python but I don't want to reinvent the wheel. What kind of stuff do you (more experienced people) all use for this kind of thing?

bleak fox Jul 25, 2020, 6:49 PM

#

@strange igloo use selenium

sick swan Jul 25, 2020, 6:49 PM

#

GOD

bleak fox Jul 25, 2020, 6:50 PM

#

@sick swan GOD

sick swan Jul 25, 2020, 6:50 PM

#

:3

strange igloo Jul 25, 2020, 6:56 PM

#

Alright, I'll try and figure out selenium

sinful dock Jul 25, 2020, 7:02 PM

#

@bleak fox
def time_diff1(col1, col2):
    if col1 is np.datetime64('NaT'):
        pass
    else:
        diff = pd.Timedelta(col2- col1).minutes
        #trying to get minutes here but doesn't work
        return diff

Unsure if datetime can take any self and I believe arguments need to be inside because it is failing to vectorize
<ipython-input-48-23b332169aaf> in time_diff1(col1, col2)
5 pass
6 else:
----> 7 diff = pd.Timedelta(col2- col1).minutes
8 #trying to get minutes here but doesn't work
9 return diff

AttributeError: 'Timedelta' object has no attribute 'minutes'

bleak fox Jul 25, 2020, 7:03 PM

#

Use seconds/60

#

Instead of minutes

#

().seconds/60
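
One caveat worth checking: `.seconds` holds only the part of the duration left over after whole days, so a gap like `12 days 12:01:00` would lose its 12 days. A sketch of a vectorized alternative using `.dt.total_seconds()`, rebuilt from the first two rows shown earlier (the `delta_min` and `seconds_only` column names are made up):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Previous Time": pd.to_datetime(["2017-11-20 00:13:00", "2017-11-20 01:23:00"]),
    "JOB TIME": pd.to_datetime(["2017-11-20 01:23:00", "2017-12-02 13:24:00"]),
})

delta = df1["JOB TIME"] - df1["Previous Time"]        # column of timedeltas

df1["delta_min"] = delta.dt.total_seconds() / 60      # whole duration in minutes
df1["seconds_only"] = delta.dt.seconds / 60           # drops the days component

print(df1[["delta_min", "seconds_only"]])
```

Subtracting the two columns directly also avoids `np.vectorize`, since pandas applies the operation to every row at once.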

sinful dock Jul 25, 2020, 7:05 PM

#

yay!!

#

Thanks, documentation on these is quite confusing

bleak fox Jul 25, 2020, 7:07 PM

#

@sinful dock welcome buddy

sinful wharf Jul 25, 2020, 7:10 PM

#

hi

#

anyone here, have a quick stats question

bleak fox Jul 25, 2020, 7:16 PM

#

I can try

#

@sinful wharf

sinful wharf Jul 25, 2020, 7:17 PM

#

stats noob btw

bleak fox Jul 25, 2020, 7:17 PM

#

I am noob * 100...😂😂😂

sinful wharf Jul 25, 2020, 7:17 PM

#

so i have some survey data, likert scale (ordinal)

#

and i want to test for significance differences between some nominal groups

#

which test should i be using?

bleak fox Jul 25, 2020, 7:19 PM

#

P-value test...

sinful wharf Jul 25, 2020, 7:20 PM

#

yea they all spit out a p-value right?

#

or am i missing something

bleak fox Jul 25, 2020, 7:21 PM

#

ANOVA

sinful wharf Jul 25, 2020, 7:21 PM

#

kruskal wallis anova?

bleak fox Jul 25, 2020, 7:25 PM

#

Yup

sinful wharf Jul 25, 2020, 7:25 PM

#

prof said that was wrong

hardy apex Jul 25, 2020, 7:41 PM

#

I want gpt-3 to write my jq filters or xpath predicates!!

#

holy **** that would be cool

coarse spire Jul 26, 2020, 12:23 AM

#

Anyone have any ideas on what I can do with a bunch of twitch VOD chat logs?

marble jasper Jul 26, 2020, 12:43 AM

#

be open to possible violation of GDPR?

#

aside from that, depends what you want to do. make a bot that responds in some way to VOD? automated moderation? some kind of game?

coarse spire Jul 26, 2020, 12:49 AM

#

Hmm, not sure. Trying to glean some insight from the data. I saw someone said they were able to select timestamps based on chat to pick out highlights in the stream.

#

Right now, I'm just making a word cloud and picking out the most popular emotes. Originally, I wanted to do topic modelling and sort out chat comments and figure out how they are reacting to the stream, but kind of stuck on that since I'm not sure what to look for, really.

marble jasper Jul 26, 2020, 12:53 AM

#

sure, sounds like you could do that

#

topic modelling, I'm not sure how you'd output that

coarse spire Jul 26, 2020, 12:53 AM

#

Well, so I was able to do that. It generated its own topics but they don't really mean much lol

marble jasper Jul 26, 2020, 12:53 AM

#

but you could make use of pre-trained models to help, you'd just get a large vector

coarse spire Jul 26, 2020, 12:54 AM

#

Like one of the topics is one of the emotes "Pog" so it's mostly comments that just say "Pog"

marble jasper Jul 26, 2020, 12:54 AM

#

I guess you could see when the topic changed, but to do anything more may require you to train something

coarse spire Jul 26, 2020, 12:55 AM

#

when the topic changed?

#

Currently, my first run of it was using the same process as a post analyzing tweets. Twitch comments seem like a whole different beast since it's so much shorter and dependent on the comments and stream.

bitter harbor Jul 26, 2020, 2:03 AM

#

I guess you could see when the topic changed, but to do anything more may require you to train something
the issue with this is there won't be a clear divide between topics, as well as having to deal with 'garbage' comments not related to the topic/general 'sum' of topics

#

Idk how you'd approach it

coarse spire Jul 26, 2020, 2:11 AM

#

Going to see if I can find a correlation between chat and the stream by plotting chat "acceleration" minute by minute by counting the difference in comments. If I see a spike, there may be something there that the stream responded to.

#

Hmm, is there a way to get the data that's plotted by a histogram?

#

I have an array of timestamps and I want to group them by minute. The histogram plot does that for me but won't give me data back like

hist_data = {"bin_1": 200, "bin_2": 300}

#

Oh, got it. It was just bins = timestamps // 60

#

Then it's easy to group
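
A sketch of that grouping step, assuming the timestamps are whole seconds since the start of the stream (the sample values are invented). `numpy.bincount` is used here so that minutes with no comments still show up as 0:

```python
import numpy as np

# Made-up comment timestamps, in seconds since the stream started.
timestamps = np.array([5, 42, 59, 61, 75, 130, 131, 140, 179, 185])

minute = timestamps // 60            # which minute each comment falls in
counts = np.bincount(minute)         # comments per minute, empty minutes included

hist_data = {f"bin_{i}": int(c) for i, c in enumerate(counts)}
change = np.diff(counts)             # minute-to-minute change in comment count

print(hist_data)
print(change)
```

`change` is the minute-by-minute difference in comment count described above, so a large positive value marks a sudden burst of chat.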

tidal bough Jul 26, 2020, 2:15 AM

#

https://numpy.org/doc/stable/reference/generated/numpy.histogram.html#numpy.histogram can help you I think

#

oh, you just need the bins, not to bin the data before plotting.

#

oh, and actually

coarse spire Jul 26, 2020, 2:16 AM

#

No, I still need to bin the data lol

#

Thank you 🙂

tidal bough Jul 26, 2020, 2:16 AM

#

the docs for pyplot.hist say they return the values

#

https://matplotlib.org/3.2.2/api/_as_gen/matplotlib.pyplot.hist.html

📎 unknown.png
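
For the data without the plot, `numpy.histogram` returns the counts and bin edges directly; `pyplot.hist` returns the same two arrays plus the drawn patches. A sketch with invented timestamps and one-minute bins:

```python
import numpy as np

# Made-up comment timestamps, in seconds since the stream started.
timestamps = np.array([5, 42, 59, 61, 75, 130, 131, 140, 179, 185])

edges = np.arange(0, 241, 60)        # bin edges at 0, 60, 120, 180, 240 seconds
counts, returned_edges = np.histogram(timestamps, bins=edges)

print(counts)
print(returned_edges)
```

Each entry of `counts` is the number of timestamps between two neighbouring edges, which is the per-minute data the plot was built from.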

coarse spire Jul 26, 2020, 2:16 AM

#

Ooh, I'll try that then. Thanks

arctic cliff Jul 26, 2020, 2:23 AM

#

Dumb question, What's the best way to practice Pandas ?

coarse spire Jul 26, 2020, 2:23 AM

#

Now I have this image of the difference in the number of comments per minute. Gotta use some math to figure out what the cut-off point between signal and noise should be.

📎 index.png

arctic cliff Jul 26, 2020, 2:23 AM

#

Sorry for interrupting :/

coarse spire Jul 26, 2020, 2:23 AM

#

No worries.

#

I would look at some videos of pandas tutorials at first. They guide you through the basics

#

Get some data you want to play around with and try to follow along

arctic cliff Jul 26, 2020, 2:25 AM

#

I'm following a book, But It doesn't have much challenges so

#

Ah-

#

Thanks ! :D

#

Btw

#

Does kaggle have challenges on datasets?

coarse spire Jul 26, 2020, 2:25 AM

#

Yep tons of them.

#

I would look into the datacamp series of videos. They cover a lot of pandas stuff

arctic cliff Jul 26, 2020, 2:26 AM

#

Thanks a lot

coarse spire Jul 26, 2020, 2:26 AM

#

This might help. I skipped through it but it gives a nice overview. https://www.youtube.com/watch?v=zyIN3SE11V0

arctic cliff Jul 26, 2020, 2:34 AM

#

If I know all of these foundations, Am I ready to lean on myself and try to play with some datasets?

coarse spire Jul 26, 2020, 2:34 AM

#

Yeah

arctic cliff Jul 26, 2020, 2:34 AM

#

Because I do..

#

Great!

#

Quick question, What I will be doing now is called Data analysis ?

coarse spire Jul 26, 2020, 2:35 AM

#

The most important thing to know is not to use something like a for loop to go through the whole dataset. Numpy/Pandas have their own faster ways of doing calculations on large sets of data
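A rough illustration of the difference, using a tiny made-up DataFrame (the column names `price`, `qty` and `total` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Slow pattern: visiting every row in a Python-level loop.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized pattern: one expression over whole columns, executed in compiled code.
df["total"] = df["price"] * df["qty"]
print(df)
```

Both give the same numbers. On three rows the speed difference is invisible, but it grows quickly with the size of the dataset.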

bitter harbor Jul 26, 2020, 2:35 AM

#

^^ I find doing random stuff that'll never be useful/never be used is super useful for learning not only fundamentals but some extra stuff as well

coarse spire Jul 26, 2020, 2:35 AM

#

Yep, I agree.

bitter harbor Jul 26, 2020, 2:36 AM

#

Quick question, What I will be doing now is called Data analysis ?
yes

#

data analysis is more of a generalization tho

arctic cliff Jul 26, 2020, 2:36 AM

#

The most important thing to know is not to use something like a for loop to go through the whole dataset. Numpy/Pandas have their own faster ways of doing calculations on large sets of data
@coarse spire
I might not use them again in my entire life lol

#

^^ I find doing random stuff that'll never be useful/never be used is super useful for learning not only fundamentals but some extra stuff as well
@bitter harbor Can you give me an example of this ?

bitter harbor Jul 26, 2020, 2:37 AM

#

I've found numpy's useful in a lot of things

#

well for example matrix manipulation can be used for data science, but it's also super useful for game dev

arctic cliff Jul 26, 2020, 2:38 AM

#

How's it useful for Game Dev?

coarse spire Jul 26, 2020, 2:39 AM

#

A common thing is filtering out data. On the project I'm working on, I had to remove a bunch of comments made by bots so I would do df[~df.comment.str.contains("some bot message")] Super common if you are just looking at the data.
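Spelled out on a toy DataFrame (the comments below are invented; only the `comment` column name and the filter expression come from the message above):

```python
import pandas as pd

df = pd.DataFrame({
    "comment": [
        "hello chat",
        "some bot message: follow me",
        "nice play",
        "some bot message",
    ]
})

# Boolean Series: True wherever the comment contains the bot text.
# regex=False treats the pattern as a plain substring.
is_bot = df.comment.str.contains("some bot message", regex=False)

# ~ flips the mask element-wise, so only the non-bot rows are kept.
humans = df[~is_bot]
print(humans)
```

If the column can contain missing values, passing `na=False` to `str.contains` keeps the mask strictly boolean.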

bitter harbor Jul 26, 2020, 2:39 AM

#

take tetris as an example

#

you could store the position of each piece as a separate variable (plz never do this, I've seen it and it's awful)

#

or you could use matrices

arctic cliff Jul 26, 2020, 2:40 AM

#

A common thing is filtering out data. On the project I'm working on, I had to remove a bunch of comments made by bots so I would do df[~df.comment.str.contains("some bot message")] Super common if you are just looking at the data.
@coarse spire
I didn't think ~ will be that useful xD

bitter harbor Jul 26, 2020, 2:40 AM

#

like if avg in row == 1, remove row and bring all values above it, down one
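A sketch of that row-clearing rule with NumPy, assuming the board is a 2D array of 0s (empty) and 1s (filled). The board layout is made up for illustration:

```python
import numpy as np

board = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],   # full row, should be cleared
    [1, 0, 1, 1],
])

# A row is full when the average of its cells is exactly 1.
full = board.mean(axis=1) == 1

# Keep the rows that are not full, then pad the top with empty rows
# so everything above a cleared row drops down.
kept = board[~full]
empty = np.zeros((full.sum(), board.shape[1]), dtype=board.dtype)
board = np.vstack([empty, kept])
print(board)
```

`board.all(axis=1)` would work as the full-row test too, the mean version just mirrors the description above.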

coarse spire Jul 26, 2020, 2:41 AM

#

lol yeah, for pandas, it's used to negate whole series

#

instead of not

arctic cliff Jul 26, 2020, 2:41 AM

#

you could store the position of each piece as a separate variable (plz never do this, I've seen it and it's awful)
@bitter harbor
just imagine this

#

Yeah this makes a whole lot of sense to me now ..

bitter harbor Jul 26, 2020, 2:41 AM

#

I've seen it and it's awful

coarse spire Jul 26, 2020, 2:42 AM

#

Ah, that's cool. You can even make an ascii version of tetris and basically just print the matrix out.

arctic cliff Jul 26, 2020, 2:42 AM

#

When I first tried to play around with ~
I used it on a number and yeah.. My head started to hurt
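For anyone else hitting this: on a plain Python int, `~` is bitwise NOT, which works out to `-x - 1`. On a boolean pandas Series it flips every element instead. A tiny sketch (the Series values are made up):

```python
import pandas as pd

# Bitwise NOT on an int: ~x equals -x - 1.
print(~5)

# On a boolean Series, ~ negates element by element.
mask = pd.Series([True, False, True])
flipped = ~mask
print(flipped.tolist())
```

Plain `not mask` doesn't do this. It raises a `ValueError` because the truth value of a whole Series is ambiguous.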

bitter harbor Jul 26, 2020, 2:42 AM

#

like you could use lists, but lists are essentially 1d matrices

arctic cliff Jul 26, 2020, 2:43 AM

#

I have another important question-

#

Do companies ask you to tell them specific things about their data?
Or is it up to you to warn them about specific things and give tips to improve their companies?

bitter harbor Jul 26, 2020, 2:44 AM

#

depends on your position *and values ig

arctic cliff Jul 26, 2020, 2:45 AM

#

DS

coarse spire Jul 26, 2020, 2:45 AM

#

Yeah, there are business/data analysts who create reports about the data.

arctic cliff Jul 26, 2020, 2:46 AM

#

About the specific data they need from the dataset ?

coarse spire Jul 26, 2020, 2:46 AM

#

Some places will have you doing predictive analytics, some will want just reporting.

bitter harbor Jul 26, 2020, 2:47 AM

#

it depends on what they want you to do with the data

#

which they'll tell you

coarse spire Jul 26, 2020, 2:47 AM

#

Yeah, really depends on who's asking you to do it.

bitter harbor Jul 26, 2020, 2:47 AM

#

you'll never have to do a bunch of random calculations on a dataset

#

(unless they tell you to do it)

arctic cliff Jul 26, 2020, 2:48 AM

#

So they might tell me to do random calculations and come out with random predictions ?

coarse spire Jul 26, 2020, 2:48 AM

#

They could tell you to do anything 🙂

#

Probably won't

arctic cliff Jul 26, 2020, 2:49 AM

#

This is really confusing

#

Like I'm giving you my money, Buy me whatever you want -

coarse spire Jul 26, 2020, 2:50 AM

#

So, yeah, at my company, the person they're "giving money" to would be the Machine Learning teams.

arctic cliff Jul 26, 2020, 2:50 AM

#

Well I'm complaining because I won't know what to predict from a current dataset I have if I don't search the calculations 😂

coarse spire Jul 26, 2020, 2:50 AM

#

Who usually come up with models to forecast some outcomes.

arctic cliff Jul 26, 2020, 2:50 AM

#

Who usually come up with models to forecast some outcomes.
@coarse spire Oh..

#

Wait

#

This is how ML works ?

coarse spire Jul 26, 2020, 2:51 AM

#

A part of it, yeah.

#

A very common part of it.

arctic cliff Jul 26, 2020, 2:51 AM

#

Correct me if I'm wrong,
It can predict random info that I don't know about as a DS from a dataset

coarse spire Jul 26, 2020, 2:51 AM

#

Well, when you say random info, I'm not sure what you mean.

arctic cliff Jul 26, 2020, 2:52 AM

#

Let's say I have a dataset about heights and widths of squares

coarse spire Jul 26, 2020, 2:52 AM

#

A common example is figuring out house prices based on their square footage.

arctic cliff Jul 26, 2020, 2:52 AM

#

..

#

Holy

#

Moly

#

I'm speechless

bitter harbor Jul 26, 2020, 2:53 AM

#

that's actually a common example in describing the difference between coords and vectors

arctic cliff Jul 26, 2020, 2:54 AM

#

What's the difference?

#

If it's related to statistics, I can watch an explanation rn

coarse spire Jul 26, 2020, 2:54 AM

#

The housing price example? Makes sense since you don't just want to start at 0,0.

arctic cliff Jul 26, 2020, 2:55 AM

#

Btw

#

What's the difference between sklearn and tensorflow ?

coarse spire Jul 26, 2020, 2:55 AM

#

I like this playlist for stats. https://www.youtube.com/playlist?list=PL8dPuuaLjXtNM_Y-bUAhblSAdWRnmBUcr

arctic cliff Jul 26, 2020, 2:55 AM

#

Because I don't have any idea..

bitter harbor Jul 26, 2020, 2:55 AM

#


"Similarly, while vectors generally imply some kind of linear space and the use of linear operations (i.e. linear algebra) they can be used to describe other data which may not have a linear basis."

coarse spire Jul 26, 2020, 2:55 AM

#

Ah that makes sense then.

bitter harbor Jul 26, 2020, 2:55 AM

#

also a coordinate is always a vector but a vector is not always a coordinate.

arctic cliff Jul 26, 2020, 2:56 AM

#

This is really helpful ! @coarse spire

coarse spire Jul 26, 2020, 2:56 AM

#

You're trying to find a vector to predict other housing prices

bitter harbor Jul 26, 2020, 2:56 AM

#

^

coarse spire Jul 26, 2020, 2:56 AM

#

Which is what linear regression is.
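A bare-bones version of the house price example, using `numpy.polyfit` to fit the line. The square footages and prices are invented and sit exactly on a line, so this is only a sketch of the idea, not realistic data:

```python
import numpy as np

# Hypothetical training data: square footage and sale price.
sqft = np.array([800, 1000, 1200, 1500, 2000], dtype=float)
price = 50_000 + 150 * sqft   # made-up prices that follow a perfect line

# Fit price = slope * sqft + intercept by least squares (degree-1 polynomial).
slope, intercept = np.polyfit(sqft, price, deg=1)

# Use the fitted line to predict the price of an unseen 1800 sq ft house.
predicted = slope * 1800 + intercept
print(slope, intercept, predicted)
```

sklearn's `LinearRegression` fits the same kind of model, with the added convenience of handling several input features at once.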

arctic cliff Jul 26, 2020, 2:56 AM

#

I'm confused ..

coarse spire Jul 26, 2020, 2:57 AM

#

Yeah, so sklearn has a bunch of machine learning algorithms and tensorflow has a lot of deep learning algorithms.

#

I think that's....about what you need to know unless you need to use one of them, then you would go deeper. 🙂

arctic cliff Jul 26, 2020, 2:57 AM

#

@bitter harbor ?

📎 8HLzlQ1lXcuEAAAAAASUVORK5CYII.png

#

I think that's....about what you need to know unless you need to use one of them, then you would go deeper. 🙂
@coarse spire I might start sklearn after getting familiar with matplotlib, Do you think it's a good idea ?

coarse spire Jul 26, 2020, 2:58 AM

#

Yes

#

Well, it depends on your task, but if you just want to learn about the datascience ecosystem then machine learning with sklearn is great.

bitter harbor Jul 26, 2020, 2:59 AM

#

(a1,a2) would be the coords, [a1] [a2] would be the vector

arctic cliff Jul 26, 2020, 2:59 AM

#

OH

#

I got this !

bitter harbor Jul 26, 2020, 3:00 AM

#

📎 unknown.png

#

like that

#

I know it'd be a lot more work but i'd suggest learning about basic linear algebra, stats, and neural nets before looking at ml libraries

arctic cliff Jul 26, 2020, 3:01 AM

#

Can't I learn them along the way ?

coarse spire Jul 26, 2020, 3:01 AM

#

I think Bepples is saying that you should learn it that way 🙂

arctic cliff Jul 26, 2020, 3:01 AM

#

Like if I don't understand a specific point, I go and search it

#

Ah-

coarse spire Jul 26, 2020, 3:01 AM

#

So, the thing about that is, without a solid foundation, you won't even know what you don't know.

#

You might make it harder on yourself when there is a simpler solution.

bitter harbor Jul 26, 2020, 3:02 AM

#

like you can for sure just get a dataset and put it into a template, but it's easier to tell what's going wrong if you actually know what's happening
especially if the error isn't a programming error

coarse spire Jul 26, 2020, 3:02 AM

#

At a certain point, you'll have to do that anyway "I don't know this exact thing so I'll search for it as I go."

#

When I first got interested in data science, I took an online stats course.

#

Then I would try programming simple models and trying to pick apart others.

#

Picking apart Random Forest was fun.

arctic cliff Jul 26, 2020, 3:04 AM

#

This makes sense to me ..

bitter harbor Jul 26, 2020, 3:04 AM

#

thank god 😆

coarse spire Jul 26, 2020, 3:05 AM

#

You could also just try to throw Neural Networks at everything and see what sticks 🙂

arctic cliff Jul 26, 2020, 3:05 AM

#

😂 I think that will be horrible

bitter harbor Jul 26, 2020, 3:05 AM

#

^

arctic cliff Jul 26, 2020, 3:05 AM

#

2 days ago, Someone was asking a good question

coarse spire Jul 26, 2020, 3:05 AM

#

That's the fun part.

arctic cliff Jul 26, 2020, 3:05 AM

#

About working as a DS in a great company like Google

#

What libraries he should learn, etc.

bitter harbor Jul 26, 2020, 3:06 AM

#

"great"

arctic cliff Jul 26, 2020, 3:06 AM

#

I don't know it seems to be

bitter harbor Jul 26, 2020, 3:06 AM

#

it's a ds company because it's a data farm

arctic cliff Jul 26, 2020, 3:06 AM

#

Someone said you should know how the libraries work like what's going on behind them then think about working for them

coarse spire Jul 26, 2020, 3:07 AM

#

Eh...

arctic cliff Jul 26, 2020, 3:07 AM

#

I'm glad I didn't ask that question

bitter harbor Jul 26, 2020, 3:07 AM

#

how the libraries work like what's going on behind them
that's a good idea imo

coarse spire Jul 26, 2020, 3:08 AM

#

That's basically what the fundamentals are though. I think they want to know that you know what you're doing with the most common ML algos at least.

#

I don't know if they care too much about the underlying implementation though.

bitter harbor Jul 26, 2020, 3:08 AM

#

well ya they aren't going to ask you how nn's work, but understanding how allows you to do more and do it correctly

arctic cliff Jul 26, 2020, 3:09 AM

#

Oh

coarse spire Jul 26, 2020, 3:09 AM

#

Well, I guess I'm saying that there is a difference between knowing how an NN works and how Tensorflow 2.0 implements an NN layer.

#

Hmm, actually, tensorflow is from google so maybe they do care :p

bitter harbor Jul 26, 2020, 3:10 AM

#

ya the implementation is different because of the language
the concepts of a nn are mostly linear algebra/stats but the implementation is cs

arctic cliff Jul 26, 2020, 3:11 AM

#

Can't wait to post my complicated questions related to DS here >:D

#

One day

coarse spire Jul 26, 2020, 3:12 AM

#

Can't wait!

warm wedge Jul 26, 2020, 3:24 AM

#

I just released libra: a machine learning API that lets you build and train models in one line of code. Check it out here please: https://github.com/Palashio/libra. Would really appreciate any feedback / questions y'all have 🙂

bitter harbor Jul 26, 2020, 3:25 AM

#

@warm wedge #303934982764625920 is a better place to put this

desert parcel Jul 26, 2020, 3:41 AM

#

📎 unknown.png

#

There seems to be an error and I don't understand

#

so does it mean I can do .backward() on scalars only?

#

changed it to this and the output is none

📎 unknown.png

flat quest Jul 26, 2020, 3:46 AM

#

Lol that might get confused with Libra blockchain @warm wedge

desert parcel Jul 26, 2020, 3:47 AM

#

I need a .backward() in order to differentiate something hmm

#

but I can't .backward() something that isn't a scalar

#

So what do I do

flat quest Jul 26, 2020, 3:47 AM

#

You should learn the fundamentals. Not because you don’t have to do searching along the way, but a baseline knowledge is critical if you want to understand why you got a particular result

#

@arctic cliff

bitter harbor Jul 26, 2020, 3:48 AM

#

I think numpy has a function for that

desert parcel Jul 26, 2020, 3:48 AM

#

So what can you do in numpy

#

wait let me rephrase

#

So I should convert a tensor into a numpy array and do a .backward() on that?

#

I'll try it

bitter harbor Jul 26, 2020, 3:49 AM

#

uhh no

#

I'll try to find what it's called

#

does nf.reverse() not work?

desert parcel Jul 26, 2020, 3:53 AM

#

The error gave me a suggestion

#

var.detach().numpy()

#

But that sort of fixed a prolem

flat quest Jul 26, 2020, 3:53 AM

#

C is not a scalar output

#

It’s a vector
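A small PyTorch sketch of the two usual ways around that. The tensor names `a`, `c` and `c2` are hypothetical, and since the original code is only in the screenshots, this is a guess at the shape of the problem rather than a fix for that exact script:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
c = a ** 2   # vector output, so a bare c.backward() raises a RuntimeError

# Option 1: reduce to a scalar first, then differentiate.
c.sum().backward()
grad_from_sum = a.grad.clone()   # d(sum(a^2))/da = 2a

# Option 2: pass an explicit gradient with the same shape as the output.
a.grad = None                    # reset the gradient accumulated by option 1
c2 = a ** 2
c2.backward(gradient=torch.ones_like(c2))
grad_from_vector = a.grad.clone()

# detach() is for getting plain values out (e.g. as a NumPy array);
# it drops the autograd history, so it is not a way to differentiate.
values = c.detach().numpy()
print(grad_from_sum, grad_from_vector, values)
```

Both options give the same gradient here, because passing a vector of ones is equivalent to summing the output before calling `.backward()`.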

desert parcel Jul 26, 2020, 3:53 AM

#

problem i'm trying to recreate the error that gave me the recommendation